From c93d56e4daeb5896dd741de7a47f1dcb6fbdf3d5 Mon Sep 17 00:00:00 2001 From: Jake Broughton Date: Tue, 17 Sep 2024 15:55:11 +0200 Subject: [PATCH 01/21] Add export script --- internal/export.json | 688 +++++++++++++++++++++++++++++++++++++++++++ internal/export.mjs | 49 +++ internal/import.mjs | 75 +++++ package-lock.json | 231 +++++++++++++++ package.json | 2 + 5 files changed, 1045 insertions(+) create mode 100644 internal/export.json create mode 100644 internal/export.mjs create mode 100644 internal/import.mjs diff --git a/internal/export.json b/internal/export.json new file mode 100644 index 00000000..3721aee1 --- /dev/null +++ b/internal/export.json @@ -0,0 +1,688 @@ +[ + { + "slug": "2014/nextflow-meets-docker", + "title": "Reproducibility in Science - Nextflow meets Docker", + "date": "2014-09-09T00:00:00.000Z", + "content": "\nThe scientific world nowadays operates on the basis of published articles.\nThese are used to report novel discoveries to the rest of the scientific community.\n\nBut have you ever wondered what a scientific article is? It is a:\n\n1. defeasible argument for claims, supported by\n2. exhibited, reproducible data and methods, and\n3. explicit references to other work in that domain;\n4. described using domain-agreed technical terminology,\n5. which exists within a complex ecosystem of technologies, people and activities.\n\nHence the very essence of Science relies on the ability of scientists to reproduce and\nbuild upon each other’s published results.\n\nSo how much can we rely on published data? In a recent report in Nature, researchers at the\nAmgen corporation found that only 11% of the academic research in the literature was\nreproducible by their groups [[1](http://www.nature.com/nature/journal/v483/n7391/full/483531a.html)].\n\nWhile many factors are likely at play here, perhaps the most basic requirement for\nreproducibility holds that the materials reported in a study can be uniquely identified\nand obtained, such that experiments can be reproduced as faithfully as possible.\nThis information is meant to be documented in the \"materials and methods\" of journal articles,\nbut as many can attest, the information provided there is often not adequate for this task.\n\n### Promoting Computational Research Reproducibility\n\nEncouragingly scientific reproducibility has been at the forefront of many news stories\nand there exist numerous initiatives to help address this problem. Particularly, when it\ncomes to producing reproducible computational analyses, some publications are starting\nto publish the code and data used for analysing and generating figures.\n\nFor example, many articles in Nature and in the new Elife journal (and others) provide a\n\"source data\" download link next to figures. Sometimes Elife might even have an option\nto download the source code for figures.\n\nAs pointed out by Melissa Gymrek [in a recent post](http://melissagymrek.com/science/2014/08/29/docker-reproducible-research.html)\nthis is a great start, but there are still lots of problems. 
She wrote that, for example, if one wants\nto re-execute a data analyses from these papers, he/she will have to download the\nscripts and the data, to only realize that he/she has not all the required libraries,\nor that it only runs on, for example, an Ubuntu version he/she doesn't have, or some\npaths are hard-coded to match the authors' machine.\n\nIf it's not easy to run and doesn't run out of the box the chances that a researcher\nwill actually ever run most of these scripts is close to zero, especially if they lack\nthe time or expertise to manage the required installation of third-party libraries,\ntools or implement from scratch state-of-the-art data processing algorithms.\n\n### Here comes Docker\n\n[Docker](http://www.docker.com) containers technology is a solution to many of the computational\nresearch reproducibility problems. Basically, it is a kind of a lightweight virtual machine\nwhere you can set up a computing environment including all the libraries, code and data that you need,\nwithin a single _image_.\n\nThis image can be distributed publicly and can seamlessly run on any major Linux operating system.\nNo need for the user to mess with installation, paths, etc.\n\nThey just run the Docker image you provided, and everything is set up to work out of the box.\nResearchers have already started discussing this (e.g. [here](http://www.bioinformaticszen.com/post/reproducible-assembler-benchmarks/),\nand [here](https://bcbio.wordpress.com/2014/03/06/improving-reproducibility-and-installation-of-genomic-analysis-pipelines-with-docker/)).\n\n### Docker and Nextflow: a perfect match\n\nOne big advantage Docker has compared to _traditional_ machine virtualisation technology\nis that it doesn't need a complete copy of the operating system, thus it has a minimal\nstartup time. This makes it possible to virtualise single applications or launch the execution\nof multiple containers, that can run in parallel, in order to speedup a large computation.\n\nNextflow is a data-driven toolkit for computational pipelines, which aims to simplify the deployment of\ndistributed and highly parallelised pipelines for scientific applications.\n\nThe latest version integrates the support for Docker containers that enables the deployment\nof self-contained and truly reproducible pipelines.\n\n### How they work together\n\nA Nextflow pipeline is made up by putting together several processes. Each process\ncan be written in any scripting language that can be executed by the Linux platform\n(BASH, Perl, Ruby, Python, etc). Parallelisation is automatically managed\nby the framework and it is implicitly defined by the processes input and\noutput declarations.\n\nBy integrating Docker with Nextflow, every pipeline process can be executed independently\nin its own container, this guarantees that each of them run in a predictable\nmanner without worrying about the configuration of the target execution platform. 
Moreover the\nminimal overhead added by Docker allows us to spawn multiple container executions in a parallel\nmanner with a negligible performance loss when compared to a platform _native_ execution.\n\n### An example\n\nAs a proof of concept of the Docker integration with Nextflow you can try out the\npipeline example at this [link](https://github.com/nextflow-io/examples/blob/master/blast-parallel.nf).\n\nIt splits a protein sequences multi FASTA file into chunks of _n_ entries, executes a BLAST query\nfor each of them, then extracts the top 10 matching sequences and\nfinally aligns the results with the T-Coffee multiple sequence aligner.\n\nIn a common scenario you generally need to install and configure the tools required by this\nscript: BLAST and T-Coffee. Moreover you should provide a formatted protein database in order\nto execute the BLAST search.\n\nBy using Docker with Nextflow you only need to have the Docker engine installed in your\ncomputer and a Java VM. In order to try this example out, follow these steps:\n\nInstall the latest version of Nextflow by entering the following command in your shell terminal:\n\n curl -fsSL get.nextflow.io | bash\n\nThen download the required Docker image with this command:\n\n docker pull nextflow/examples\n\nYou can check the content of the image looking at the [Dockerfile](https://github.com/nextflow-io/examples/blob/master/Dockerfile)\nused to create it.\n\nNow you are ready to run the demo by launching the pipeline execution as shown below:\n\n nextflow run examples/blast-parallel.nf -with-docker\n\nThis will run the pipeline printing the final alignment out on the terminal screen.\nYou can also provide your own protein sequences multi FASTA file by adding, in the above command line,\nthe option `--query ` and change the splitting chunk size with `--chunk n` option.\n\nNote: the result doesn't have a real biological meaning since it uses a very small protein database.\n\n### Conclusion\n\nThe mix of Docker, GitHub and Nextflow technologies make it possible to deploy\nself-contained and truly replicable pipelines. 
It requires zero configuration and\nenables the reproducibility of data analysis pipelines in any system in which a Java VM and\nthe Docker engine are available.\n\n### Learn how to do it!\n\nFollow our documentation for a quick start using Docker with Nextflow at\nthe following link https://www.nextflow.io/docs/latest/docker.html\n", + "images": [] + }, + { + "slug": "2014/share-nextflow-pipelines-with-github", + "title": "Share Nextflow pipelines with GitHub", + "date": "2014-08-07T00:00:00.000Z", + "content": "\nThe [GitHub](https://github.com) code repository and collaboration platform is widely\nused between researchers to publish their work and to collaborate on projects source code.\n\nEven more interestingly a few months ago [GitHub announced improved support for researchers](https://github.com/blog/1840-improving-github-for-science)\nmaking it possible to get a Digital Object Identifier (DOI) for any GitHub repository archive.\n\nWith a DOI for your GitHub repository archive your code becomes formally citable\nin scientific publications.\n\n### Why use GitHub with Nextflow?\n\nThe latest Nextflow release (0.9.0) seamlessly integrates with GitHub.\nThis feature allows you to manage your code in a more consistent manner, or use other\npeople's Nextflow pipelines, published through GitHub, in a quick and transparent manner.\n\n### How it works\n\nThe idea is very simple, when you launch a script execution with Nextflow, it will look for\na file with the pipeline name you've specified. If that file does not exist,\nit will look for a public repository with the same name on GitHub. If it is found, the\nrepository is automatically downloaded to your computer and the code executed. This repository\nis stored in the Nextflow home directory, by default `$HOME/.nextflow`, thus it will be reused\nfor any further execution.\n\nYou can try this feature out, having Nextflow (version 0.9.0 or higher) installed in your computer,\nby simply entering the following command in your shell terminal:\n\n nextflow run nextflow-io/hello\n\nThe first time you execute this command Nextflow will download the pipeline\nat the following GitHub repository `https://github.com/nextflow-io/hello`,\nas you don't already have it in your computer. It will then execute it producing the expected output.\n\nIn order for a GitHub repository to be used as a Nextflow project, it must\ncontain at least one file named `main.nf` that defines your Nextflow pipeline script.\n\n### Run a specific revision\n\nAny Git branch, tag or commit ID in the GitHub repository can be used to specify a revision,\nthat you want to execute, when running your pipeline by adding the `-r` option to the run command line.\nSo for example you could enter:\n\n nextflow run nextflow-io/hello -r mybranch\n\nor\n\n nextflow run nextflow-io/hello -r v1.1\n\nThis can be very useful when comparing different versions of your project.\nIt also guarantees consistent results in your pipeline as your source code evolves.\n\n### Commands to manage pipelines\n\nThe following commands allows you to perform some basic operations that can be used to manage your pipelines.\nAnyway Nextflow is not meant to replace functionalities provided by the [Git](http://git-scm.com/) tool,\nyou may still need it to create new repositories or commit changes, etc.\n\n#### List available pipelines\n\nThe `ls` command allows you to list all the pipelines you have downloaded in\nyour computer. 
For example:\n\n nextflow ls\n\nThis prints a list similar to the following one:\n\n cbcrg/piper-nf\n nextflow-io/hello\n\n#### Show pipeline information\n\nBy using the `info` command you can show information from a downloaded pipeline. For example:\n\n $ nextflow info hello\n\nThis command prints:\n\n repo name : nextflow-io/hello\n home page : http://github.com/nextflow-io/hello\n local path : $HOME/.nextflow/assets/nextflow-io/hello\n main script: main.nf\n revisions :\n * master (default)\n mybranch\n v1.1 [t]\n v1.2 [t]\n\nStarting from the top it shows: 1) the repository name; 2) the project home page; 3) the local folder where the pipeline has been downloaded; 4) the script that is executed\nwhen launched; 5) the list of available revisions i.e. branches + tags. Tags are marked with\na `[t]` on the right, the current checked-out revision is marked with a `*` on the left.\n\n#### Pull or update a pipeline\n\nThe `pull` command allows you to download a pipeline from a GitHub repository or to update\nit if that repository has already been downloaded. For example:\n\n nextflow pull nextflow-io/examples\n\nDownloaded pipelines are stored in the folder `$HOME/.nextflow/assets` in your computer.\n\n#### Clone a pipeline into a folder\n\nThe `clone` command allows you to copy a Nextflow pipeline project to a directory of your choice. For example:\n\n nextflow clone nextflow-io/hello target-dir\n\nIf the destination directory is omitted the specified pipeline is cloned to a directory\nwith the same name as the pipeline _base_ name (e.g. `hello`) in the current folder.\n\nThe clone command can be used to inspect or modify the source code of a pipeline. You can\neventually commit and push back your changes by using the usual Git/GitHub workflow.\n\n#### Drop an installed pipeline\n\nDownloaded pipelines can be deleted by using the `drop` command, as shown below:\n\n nextflow drop nextflow-io/hello\n\n### Limitations and known problems\n\n- GitHub private repositories currently are not supported Support for private GitHub repositories has been introduced with version 0.10.0.\n- Symlinks committed in a Git repository are not resolved correctly\n when downloaded/cloned by Nextflow Symlinks are resolved correctly when using Nextflow version 0.11.0 (or higher).\n", + "images": [] + }, + { + "slug": "2014/using-docker-in-hpc-cluster", + "title": "Using Docker for scientific data analysis in an HPC cluster", + "date": "2014-11-06T00:00:00.000Z", + "content": "\nScientific data analysis pipelines are rarely composed by a single piece of software.\nIn a real world scenario, computational pipelines are made up of multiple stages, each of which\ncan execute many different scripts, system commands and external tools deployed in a hosting computing\nenvironment, usually an HPC cluster.\n\nAs I work as a research engineer in a bioinformatics lab I experience on a daily basis the\ndifficulties related on keeping such a piece of software consistent.\n\nComputing environments can change frequently in order to test new pieces of software or\nmaybe because system libraries need to be updated. For this reason replicating the results\nof a data analysis over time can be a challenging task.\n\n[Docker](http://www.docker.com) has emerged recently as a new type of virtualisation technology that allows one\nto create a self-contained runtime environment. 
There are plenty of examples\nshowing the benefits of using it to run application services, like web servers\nor databases.\n\nHowever it seems that few people have considered using Docker for the deployment of scientific\ndata analysis pipelines on distributed cluster of computer, in order to simplify the development,\nthe deployment and the replicability of this kind of applications.\n\nFor this reason I wanted to test the capabilities of Docker to solve these problems in the\ncluster available in our [institute](http://www.crg.eu).\n\n## Method\n\nThe Docker engine has been installed in each node of our cluster, that runs a [Univa grid engine](http://www.univa.com/products/grid-engine.php) resource manager.\nA Docker private registry instance has also been installed in our internal network, so that images\ncan be pulled from the local repository in a much faster way when compared to the public\n[Docker registry](http://registry.hub.docker.com).\n\nMoreover the Univa grid engine has been configured with a custom [complex](http://www.gridengine.eu/mangridengine/htmlman5/complex.html)\nresource type. This allows us to request a specific Docker image as a resource type while\nsubmitting a job execution to the cluster.\n\nThe Docker image is requested as a _soft_ resource, by doing that the UGE scheduler\ntries to run a job to a node where that image has already been pulled,\notherwise a lower priority is given to it and it is executed, eventually, by a node where\nthe specified Docker image is not available. This will force the node to pull the required\nimage from the local registry at the time of the job execution.\n\nThis environment has been tested with [Piper-NF](https://github.com/cbcrg/piper-nf), a genomic pipeline for the\ndetection and mapping of long non-coding RNAs.\n\nThe pipeline runs on top of Nextflow, which takes care of the tasks parallelisation and submits\nthe jobs for execution to the Univa grid engine.\n\nThe Piper-NF code wasn't modified in order to run it using Docker.\nNextflow is able to handle it automatically. The Docker containers are run in such a way that\nthe tasks result files are created in the hosting file system, in other\nwords it behaves in a completely transparent manner without requiring extra steps or affecting\nthe flow of the pipeline execution.\n\nIt was only necessary to specify the Docker image (or images) to be used in the Nextflow\nconfiguration file for the pipeline. You can read more about this at [this link](https://www.nextflow.io/docs/latest/docker.html).\n\n## Results\n\nTo benchmark the impact of Docker on the pipeline performance a comparison was made running\nit with and without Docker.\n\nFor this experiment 10 cluster nodes were used. The pipeline execution launches around 100 jobs,\nand it was run 5 times by using the same dataset with and without Docker.\n\nThe average execution time without Docker was 28.6 minutes, while the average\npipeline execution time, running each job in a Docker container, was 32.2 minutes.\nThus, by using Docker the overall execution time increased by something around 12.5%.\n\nIt is important to note that this time includes both the Docker bootstrap time,\nand the time overhead that is added to the task execution by the virtualisation layer.\n\nFor this reason the actual task run time was measured as well i.e. without including the\nDocker bootstrap time overhead. In this case, the aggregate average task execution time was 57.3 minutes\nand 59.5 minutes when running the same tasks using Docker. 
Thus, the time overhead\nadded by the Docker virtualisation layer to the effective task run time can be estimated\nto around 4% in our test.\n\nKeeping the complete toolset required by the pipeline execution within a Docker image dramatically\nreduced configuration and deployment problems. Also storing these images into the private and\n[public](https://registry.hub.docker.com/repos/cbcrg/) repositories with a unique tag allowed us\nto replicate the results without the usual burden required to set-up an identical computing environment.\n\n## Conclusion\n\nThe fast start-up time for Docker containers technology allows one to virtualise a single process or\nthe execution of a bunch of applications, instead of a complete operating system. This opens up new possibilities,\nfor example the possibility to \"virtualise\" distributed job executions in an HPC cluster of computers.\n\nThe minimal performance loss introduced by the Docker engine is offset by the advantages of running\nyour analysis in a self-contained and dead easy to reproduce runtime environment, which guarantees\nthe consistency of the results over time and across different computing platforms.\n\n#### Credits\n\nThanks to Arnau Bria and the all scientific systems admins team to manage the Docker installation\nin the CRG computing cluster.\n", + "images": [] + }, + { + "slug": "2015/innovation-in-science-the-story-behind-nextflow", + "title": "Innovation In Science - The story behind Nextflow", + "date": "2015-06-09T00:00:00.000Z", + "content": "\nInnovation can be viewed as the application of solutions that meet new requirements or\nexisting market needs. Academia has traditionally been the driving force of innovation.\nScientific ideas have shaped the world, but only a few of them were brought to market by\nthe inventing scientists themselves, resulting in both time and financial loses.\n\nLately there have been several attempts to boost scientific innovation and translation,\nwith most notable in Europe being the Horizon 2020 funding program. The problem with these\ntypes of funding is that they are not designed for PhDs and Postdocs, but rather aim to\npromote the collaboration of senior scientists in different institutions. This neglects two\nvery important facts, first and foremost that most of the Nobel prizes were given for\ndiscoveries made when scientists were in their 20's / 30's (not in their 50's / 60's).\nSecondly, innovation really happens when a few individuals (not institutions) face a\nproblem in their everyday life/work, and one day they just decide to do something about it\n(end-user innovation). Without realizing, these people address a need that many others have.\nThey don’t do it for the money or the glory; they do it because it bothers them!\nMany examples of companies that started exactly this way include Apple, Google, and\nVirgin Airlines.\n\n### The story of Nextflow\n\nSimilarly, Nextflow started as an attempt to solve the every-day computational problems we\nwere facing with “big biomedical data” analyses. We wished that our huge and almost cryptic\nBASH-based pipelines could handle parallelization automatically. In our effort to make that\nhappen we stumbled upon the [Dataflow](http://en.wikipedia.org/wiki/Dataflow_programming)\nprogramming model and Nextflow was created.\nWe were getting furious every time our two-week long pipelines were crashing and we had\nto re-execute them from the beginning. 
We, therefore, developed a caching system, which\nallows Nextflow to resume any pipeline from the last executed step. While we were really\nenjoying developing a new [DSL](http://en.wikipedia.org/wiki/Domain-specific_language) and\ncreating our own operators, at the same time we were not willing to give up our favorite\nPerl/Python scripts and one-liners, and thus Nextflow became a polyglot.\n\nAnother problem we were facing was that our pipelines were invoking a lot of\nthird-party software, making distribution and execution on different platforms a nightmare.\nOnce again while searching for a solution to this problem, we were able to identify a\nbreakthrough technology [Docker](https://www.docker.com/), which is now revolutionising\ncloud computation. Nextflow has been one of the first framework, that fully\nsupports Docker containers and allows pipeline execution in an isolated and easy to distribute manner.\nOf course, sharing our pipelines with our friends rapidly became a necessity and so we had\nto make Nextflow smart enough to support [Github](https://github.com) and [Bitbucket](https://bitbucket.org/) integration.\n\nI don’t know if Nextflow will make as much difference in the world as the Dataflow\nprogramming model and Docker container technology are making, but it has already made a\nbig difference in our lives and that is all we ever wanted…\n\n### Conclusion\n\nSummarising, it is a pity that PhDs and Postdocs are the neglected engine of Innovation.\nThey are not empowered to innovate, by identifying and addressing their needs, and to\npotentially set up commercial solutions to their problems. This fact becomes even sadder\nwhen you think that only 3% of Postdocs have a chance to become PIs in the UK. Instead more\nand more money is being invested into the senior scientists who only require their PhD students\nand Postdocs to put another step into a well-defined ladder. In todays world it seems that\nideas, such as Nextflow, will only get funded for their scientific value, not as innovative\nconcepts trying to address a need.\n", + "images": [] + }, + { + "slug": "2015/introducing-nextflow-console", + "title": "Introducing Nextflow REPL Console", + "date": "2015-04-14T00:00:00.000Z", + "content": "\nThe latest version of Nextflow introduces a new _console_ graphical interface.\n\nThe Nextflow console is a REPL ([read-eval-print loop](http://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop))\nenvironment that allows one to quickly test part of a script or pieces of Nextflow code\nin an interactive manner.\n\nIt is a handy tool that allows one to evaluate fragments of Nextflow/Groovy code\nor fast prototype a complete pipeline script.\n\n### Getting started\n\nThe console application is included in the latest version of Nextflow\n([0.13.1](https://github.com/nextflow-io/nextflow/releases) or higher).\n\nYou can try this feature out, having Nextflow installed on your computer, by entering the\nfollowing command in your shell terminal: `nextflow console `.\n\nWhen you execute it for the first time, Nextflow will spend a few seconds downloading\nthe required runtime dependencies. 
When complete the console window will appear as shown in\nthe picture below.\n\n\"Nextflow\n\nIt contains a text editor (the top white box) that allows you to enter and modify code snippets.\nThe results area (the bottom yellow box) will show the executed code's output.\n\nAt the top you will find the menu bar (not shown in this picture) and the actions\ntoolbar that allows you to open, save, execute (etc.) the code been tested.\n\nAs a practical execution example, simply copy and paste the following piece of code in the\nconsole editor box:\n\n echo true\n\n process sayHello {\n\n \"\"\"\n echo Hello world\n \"\"\"\n\n }\n\nThen, in order to evaluate it, open the `Script` menu in the top menu bar and select the `Run`\ncommand. Alternatively you can use the `CTRL+R` keyboard shortcut to run it (`⌘+R` on the Mac).\nIn the result box an output similar to the following will appear:\n\n [warm up] executor > local\n [00/d78a0f] Submitted process > sayHello (1)\n Hello world\n\nNow you can try to modify the entered process script, execute it again and check that\nthe printed result has changed.\n\nIf the output doesn't appear, open the `View` menu and make sure that the entry `Capture Standard\nOutput` is selected (it must have a tick on the left).\n\nIt is worth noting that the global script context is maintained across script executions.\nThis means that variables declared in the global script scope are not lost when the\nscript run is complete, and they can be accessed in further executions of the same or another\npiece of code.\n\nIn order to reset the global context you can use the command `Clear Script Context`\navailable in the `Script` menu.\n\n### Conclusion\n\nThe Nextflow console is a REPL environment which allows you to experiment and get used\nto the Nextflow programming environment. 
By using it you can prototype or test your code\nwithout the need to create/edit script files.\n\nNote: the Nextflow console is implemented by sub-classing the [Groovy console](http://groovy-lang.org/groovyconsole.html) tool.\nFor this reason you may find some labels that refer to the Groovy programming environment\nin this program.\n", + "images": [ + "/img/nextflow-console1.png" + ] + }, + { + "slug": "2015/mpi-like-execution-with-nextflow", + "title": "MPI-like distributed execution with Nextflow", + "date": "2015-11-13T00:00:00.000Z", + "content": "\nThe main goal of Nextflow is to make workflows portable across different\ncomputing platforms taking advantage of the parallelisation features provided\nby the underlying system without having to reimplement your application code.\n\nFrom the beginning Nextflow has included executors designed to target the most popular\nresource managers and batch schedulers commonly used in HPC data centers,\nsuch as [Univa Grid Engine](http://www.univa.com), [Platform LSF](http://www.ibm.com/systems/platformcomputing/products/lsf/),\n[SLURM](https://computing.llnl.gov/linux/slurm/), [PBS](http://www.pbsworks.com/Product.aspx?id=1) and [Torque](http://www.adaptivecomputing.com/products/open-source/torque/).\n\nWhen using one of these executors Nextflow submits the computational workflow tasks\nas independent job requests to the underlying platform scheduler, specifying\nfor each of them the computing resources needed to carry out its job.\n\nThis approach works well for workflows that are composed of long running tasks, which\nis the case of most common genomic pipelines.\n\nHowever this approach does not scale well for workloads made up of a large number of\nshort-lived tasks (e.g. a few seconds or sub-seconds). In this scenario the resource\nmanager scheduling time is much longer than the actual task execution time, thus resulting\nin an overall execution time that is much longer than the real execution time.\nIn some cases this represents an unacceptable waste of computing resources.\n\nMoreover supercomputers, such as [MareNostrum](https://www.bsc.es/marenostrum-support-services/mn3)\nin the [Barcelona Supercomputer Center (BSC)](https://www.bsc.es/), are optimized for\nmemory distributed applications. In this context it is needed to allocate a certain\namount of computing resources in advance to run the application in a distributed manner,\ncommonly using the [MPI](https://en.wikipedia.org/wiki/Message_Passing_Interface) standard.\n\nIn this scenario, the Nextflow execution model was far from optimal, if not unfeasible.\n\n### Distributed execution\n\nFor this reason, since the release 0.16.0, Nextflow has implemented a new distributed execution\nmodel that greatly improves the computation capability of the framework. It uses [Apache Ignite](https://ignite.apache.org/),\na lightweight clustering engine and in-memory data grid, which has been recently open sourced\nunder the Apache software foundation umbrella.\n\nWhen using this feature a Nextflow application is launched as if it were an MPI application.\nIt uses a job wrapper that submits a single request specifying all the needed computing\nresources. 
The Nextflow command line is executed by using the `mpirun` utility, as shown in the\nexample below:\n\n #!/bin/bash\n #$ -l virtual_free=120G\n #$ -q \n #$ -N \n #$ -pe ompi \n mpirun --pernode nextflow run -with-mpi [pipeline parameters]\n\nThis tool spawns a Nextflow instance in each of the computing nodes allocated by the\ncluster manager.\n\nEach Nextflow instance automatically connects with the other peers creating an _private_\ninternal cluster, thanks to the Apache Ignite clustering feature that\nis embedded within Nextflow itself.\n\nThe first node becomes the application driver that manages the execution of the\nworkflow application, submitting the tasks to the remaining nodes that act as workers.\n\nWhen the application is complete, the Nextflow driver automatically shuts down the\nNextflow/Ignite cluster and terminates the job execution.\n\n![Nextflow distributed execution](/img/nextflow-distributed-execution.png)\n\n### Conclusion\n\nIn this way it is possible to deploy a Nextflow workload in a supercomputer using an\nexecution strategy that resembles the MPI distributed execution model. This doesn't\nrequire to implement your application using the MPI api/library and it allows you to\nmaintain your code portable across different execution platforms.\n\nAlthough we do not currently have a performance comparison between a Nextflow distributed\nexecution and an equivalent MPI application, we assume that the latter provides better\nperformance due to its low-level optimisation.\n\nNextflow, however, focuses on the fast prototyping of scientific applications in a portable\nmanner while maintaining the ability to scale and distribute the application workload in an\nefficient manner in an HPC cluster.\n\nThis allows researchers to validate an experiment, quickly, reusing existing tools and\nsoftware components. This eventually makes it possible to implement an optimised version\nusing a low-level programming language in the second stage of a project.\n\nRead the documentation to learn more about the [Nextflow distributed execution model](https://www.nextflow.io/docs/latest/ignite.html#execution-with-mpi).\n", + "images": [] + }, + { + "slug": "2015/the-impact-of-docker-on-genomic-pipelines", + "title": "The impact of Docker containers on the performance of genomic pipelines", + "date": "2015-06-15T00:00:00.000Z", + "content": "\nIn a recent publication we assessed the impact of Docker containers technology\non the performance of bioinformatic tools and data analysis workflows.\n\nWe benchmarked three different data analyses: a RNA sequence pipeline for gene expression,\na consensus assembly and variant calling pipeline, and finally a pipeline for the detection\nand mapping of long non-coding RNAs.\n\nWe found that Docker containers have only a minor impact on the performance\nof common genomic data analysis, which is negligible when the executed tasks are demanding\nin terms of computational time.\n\n_[This publication is available as PeerJ preprint at this link](https://peerj.com/preprints/1171/)._\n", + "images": [] + }, + { + "slug": "2016/best-practice-for-reproducibility", + "title": "Workflows & publishing: best practice for reproducibility", + "date": "2016-04-13T00:00:00.000Z", + "content": "\nPublication time acts as a snapshot for scientific work. 
Whether a project is ongoing\nor not, work which was performed months ago must be described, new software documented,\ndata collated and figures generated.\n\nThe monumental increase in data and pipeline complexity has led to this task being\nperformed to many differing standards, or [lack of thereof](http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0080278).\nWe all agree it is not good enough to simply note down the software version number.\nBut what practical measures can be taken?\n\nThe recent publication describing _Kallisto_ [(Bray et al. 2016)](https://doi.org/10.1038/nbt.3519)\nprovides an excellent high profile example of the growing efforts to ensure reproducible\nscience in computational biology. The authors provide a GitHub [repository](https://github.com/pachterlab/kallisto_paper_analysis)\nthat _“contains all the analysis to reproduce the results in the kallisto paper”_.\n\nThey should be applauded and indeed - in the Twittersphere - they were. The corresponding\nauthor Lior Pachter stated that the publication could be reproduced starting from raw\nreads in the NCBI Sequence Read Archive through to the results, which marks a fantastic\naccomplishment.\n\n

> Hoping people will notice https://t.co/qiu3LFozMX by @yarbsalocin @hjpimentel @pmelsted reproducing ALL the #kallisto paper from SRA→results\n>\n> — Lior Pachter (@lpachter) April 5, 2016\n
\n\n\nThey achieve this utilising the workflow framework [Snakemake](https://bitbucket.org/snakemake/snakemake/wiki/Home).\nIncreasingly, we are seeing scientists applying workflow frameworks to their pipelines,\nwhich is great to see. There is a learning curve, but I have personally found the payoffs\nin productivity to be immense.\n\nAs both users and developers of Nextflow, we have long discussed best practice to ensure\nreproducibility of our work. As a community, we are at the beginning of that conversation\n\n- there are still many ideas to be aired and details ironed out - nevertheless we wished\n to provide a _state-of-play_ as we see it and to describe what is possible with Nextflow\n in this regard.\n\n### Guaranteed Reproducibility\n\nThis is our goal. It is one thing for a pipeline to be able to be reproduced in your own\nhands, on your machine, yet is another for this to be guaranteed so that anyone anywhere\ncan reproduce it. What I mean by guaranteed is that when a given pipeline is executed,\nthere is only one result which can be output.\nEnvisage what I term the _reproducibility triangle_: consisting of data, code and\ncompute environment.\n\n![Reproducibility Triangle](/img/reproducibility-triangle.png)\n\n**Figure 1:** The Reproducibility Triangle. _Data_: raw data such as sequencing reads,\ngenomes and annotations but also metadata such as experimental design. _Code_:\nscripts, binaries and libraries/dependencies. _Environment_: operating system.\n\nIf there is any change to one of these then the reproducibililty is no longer guaranteed.\nFor years there have been solutions to each of these individual components. But they have\nlived a somewhat discrete existence: data in databases such as the SRA and Ensembl, code\non GitHub and compute environments in the form of virtual machines. We think that in the\nfuture science must embrace solutions that integrate each of these components natively and\nholistically.\n\n### Implementation\n\nNextflow provides a solution to reproduciblility through version control and sandboxing.\n\n#### Code\n\nVersion control is provided via [native integration with GitHub](https://www.nextflow.io/docs/latest/sharing.html)\nand other popular code management platforms such as Bitbucket and GitLab.\nPipelines can be pulled, executed, developed, collaborated on and shared. For example,\nthe command below will pull a specific version of a [simple Kallisto + Sleuth pipeline](https://github.com/cbcrg/kallisto-nf)\nfrom GitHub and execute it. The `-r` parameter can be used to specify a specific tag, branch\nor revision that was previously defined in the Git repository.\n\n nextflow run cbcrg/kallisto-nf -r v0.9\n\n#### Environment\n\nSandboxing during both development and execution is another key concept; version control\nalone does not ensure that all dependencies nor the compute environment are the same.\n\nA simplified implementation of this places all binaries, dependencies and libraries within\nthe project repository. In Nextflow, any binaries within the the `bin` directory of a\nrepository are added to the path. 
Also, within the Nextflow [config file](https://github.com/cbcrg/kallisto-nf/blob/master/nextflow.config),\nenvironmental variables such as `PERL5LIB` can be defined so that they are automatically\nadded during the task executions.\n\nThis can be taken a step further with containerisation such as [Docker](https://www.nextflow.io/docs/latest/docker.html).\nWe have recently published [work](https://doi.org/10.7717/peerj.1273) about this:\nbriefly a [dockerfile](https://github.com/cbcrg/kallisto-nf/blob/master/Dockerfile)\ncontaining the instructions on how to build the docker image resides inside a repository.\nThis provides a specification for the operating system, software, libraries and\ndependencies to be run.\n\nThe images themself also have content-addressable identifiers in the form of\n[digests](https://docs.docker.com/engine/userguide/containers/dockerimages/#image-digests),\nwhich ensure not a single byte of information, from the operating system through to the\nlibraries pulled from public repos, has been changed. This container digest can be specified\nin the [pipeline config file](https://github.com/cbcrg/kallisto-nf/blob/master/nextflow.config).\n\n process {\n container = \"cbcrg/kallisto-nf@sha256:9f84012739...\"\n }\n\nWhen doing so Nextflow automatically pulls the specified image from the Docker Hub and\nmanages the execution of the pipeline tasks from within the container in a transparent manner,\ni.e. without having to adapt or modify your code.\n\n#### Data\n\nData is currently one of the more challenging aspect to address. _Small data_ can be\neasily version controlled within git-like repositories. For larger files\nthe [Git Large File Storage](https://git-lfs.github.com/), for which Nextflow provides\nbuilt-in support, may be one solution. Ultimately though, the real home of scientific data\nis in publicly available, programmatically accessible databases.\n\nProviding out-of-box solutions is difficult given the hugely varying nature of the data\nand meta-data within these databases. We are currently looking to incorporate the most\nhighly used ones, such as the [SRA](http://www.ncbi.nlm.nih.gov/sra) and [Ensembl](http://www.ensembl.org/index.html).\nIn the long term we have an eye on initiatives, such as [NCBI BioProject](https://www.ncbi.nlm.nih.gov/bioproject/),\nwith the idea there is a single identifier for both the data and metadata that can be referenced in a workflow.\n\nAdhering to the practices above, one could imagine one line of code which would appear within a publication.\n\n nextflow run [user/repo] -r [version] --data[DB_reference:data_reference] -with-docker\n\nThe result would be guaranteed to be reproduced by whoever wished.\n\n### Conclusion\n\nWith this approach the reproducilbility triangle is complete. But it must be noted that\nthis does not guard against conceptual or implementation errors. It does not replace proper\ndocumentation. What it does is to provide transparency to a result.\n\nThe assumption that the deterministic nature of computation makes results insusceptible\nto irreproducbility is clearly false. We consider Nextflow with its other features such\nits polyglot nature, out-of-the-box portability and native support across HPC and Cloud\nenvironments to be an ideal solution in our everyday work. 
We hope to see more scientists\nadopt this approach to their workflows.\n\nThe recent efforts by the _Kallisto_ authors highlight the appetite for increasing these\nstandards and we encourage the community at large to move towards ensuring this becomes\nthe normal state of affairs for publishing in science.\n\n### References\n\nBray, Nicolas L., Harold Pimentel, Páll Melsted, and Lior Pachter. 2016. “Near-Optimal Probabilistic RNA-Seq Quantification.” Nature Biotechnology, April. Nature Publishing Group. doi:10.1038/nbt.3519.\n\nDi Tommaso P, Palumbo E, Chatzou M, Prieto P, Heuer ML, Notredame C. (2015) \"The impact of Docker containers on the performance of genomic pipelines.\" PeerJ 3:e1273 doi.org:10.7717/peerj.1273.\n\nGarijo D, Kinnings S, Xie L, Xie L, Zhang Y, Bourne PE, et al. (2013) \"Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome.\" PLoS ONE 8(11): e80278. doi:10.1371/journal.pone.0080278\n", + "images": [] + }, + { + "slug": "2016/deploy-in-the-cloud-at-snap-of-a-finger", + "title": "Deploy your computational pipelines in the cloud at the snap-of-a-finger", + "date": "2016-09-01T00:00:00.000Z", + "content": "\n

\nLearn how to deploy and run a computational pipeline in the Amazon AWS cloud with ease\nthanks to Nextflow and Docker containers\n

\n\nNextflow is a framework that simplifies the writing of parallel and distributed computational\npipelines in a portable and reproducible manner across different computing platforms, from\na laptop to a cluster of computers.\n\nIndeed, the original idea, when this project started three years ago, was to\nimplement a tool that would allow researchers in\n[our lab](http://www.crg.eu/es/programmes-groups/comparative-bioinformatics) to smoothly migrate\ntheir data analysis applications in the cloud when needed - without having\nto change or adapt their code.\n\nHowever to date Nextflow has been used mostly to deploy computational workflows within on-premise\ncomputing clusters or HPC data-centers, because these infrastructures are easier to use\nand provide, on average, cheaper cost and better performance when compared to a cloud environment.\n\nA major obstacle to efficient deployment of scientific workflows in the cloud is the lack\nof a performant POSIX compatible shared file system. These kinds of applications\nare usually made-up by putting together a collection of tools, scripts and\nsystem commands that need a reliable file system to share with each other the input and\noutput files as they are produced, above all in a distributed cluster of computers.\n\nThe recent availability of the [Amazon Elastic File System](https://aws.amazon.com/efs/)\n(EFS), a fully featured NFS based file system hosted on the AWS infrastructure represents\na major step in this context, unlocking the deployment of scientific computing\nin the cloud and taking it to the next level.\n\n### Nextflow support for the cloud\n\nNextflow could already be deployed in the cloud, either using tools such as\n[ElastiCluster](https://github.com/gc3-uzh-ch/elasticluster) or\n[CfnCluster](https://aws.amazon.com/hpc/cfncluster/), or by using custom deployment\nscripts. However the procedure was still cumbersome and, above all, it was not optimised\nto fully take advantage of cloud elasticity i.e. the ability to (re)shape the computing\ncluster dynamically as the computing needs change over time.\n\nFor these reasons, we decided it was time to provide Nextflow with a first-class support\nfor the cloud, integrating the Amazon EFS and implementing an optimised native cloud\nscheduler, based on [Apache Ignite](https://ignite.apache.org/), with a full support for cluster\nauto-scaling and spot/preemptible instances.\n\nIn practice this means that Nextflow can now spin-up and configure a fully featured computing\ncluster in the cloud with a single command, after that you need only to login to the master\nnode and launch the pipeline execution as you would do in your on-premise cluster.\n\n### Demo !\n\nSince a demo is worth a thousands words, I've record a short screencast showing how\nNextflow can setup a cluster in the cloud and mount the Amazon EFS shared file system.\n\n\n\n
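For readers who prefer commands to video, the screencast essentially boils down to three steps. The cluster name, node count and pipeline name below are illustrative placeholders rather than the exact values used in the recording, and the `-c` option for the number of instances is assumed from the Nextflow documentation of the time:\n\n    nextflow cloud create my-cluster -c 5\n    ssh <master-node-address>\n    nextflow run <pipeline>\n\nThe `nextflow cloud create` and `nextflow run` commands are the ones recapped below.\n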

\nNote: the EC2 instances' startup delay has been cut from this screencast. It took around\n5 minutes to launch the instances and set up the cluster.\n

\n\nLet's recap the steps showed in the demo:\n\n- The user provides the cloud parameters (such as the VM image ID and the instance type)\n in the `nextflow.config` file.\n\n- To configure the EFS file system you need to provide your EFS storage ID and the mount path\n by using the `sharedStorageId` and `sharedStorageMount` properties.\n\n- To use [EC2 Spot](https://aws.amazon.com/ec2/spot/) instances, just specify the price\n you want to bid by using the `spotPrice` property.\n\n- The AWS access and secret keys are provided by using the usual environment variables.\n\n- The `nextflow cloud create` launches the requested number of instances, configures the user and\n access key, mounts the EFS storage and setups the Nextflow cluster automatically.\n Any Linux AMI can be used, it is only required that the [cloud-init](https://cloudinit.readthedocs.io/en/latest/)\n package, a Java 7+ runtime and the Docker engine are present.\n\n- When the cluster is ready, you can SSH in the master node and launch the pipeline execution\n as usual with the `nextflow run ` command.\n\n- For the sake of this demo we are using [paraMSA](https://github.com/pditommaso/paraMSA),\n a pipeline for generating multiple sequence alignments and bootstrap replicates developed\n in our lab.\n\n- Nextflow automatically pulls the pipeline code from its GitHub repository when the\n execution is launched. This repository includes also a dataset which is used by default.\n [The many bioinformatic tools used by the pipeline](https://github.com/pditommaso/paraMSA#dependencies-)\n are packaged using a Docker image, which is downloaded automatically on each computing node.\n\n- The pipeline results are uploaded automatically in the S3 bucket specified\n by the `--output s3://cbcrg-eu/para-msa-results` command line option.\n\n- When the computation is completed, the cluster can be safely shutdown and the\n EC2 instances terminated with the `nextflow cloud shutdown` command.\n\n### Try it yourself\n\nWe are releasing the Nextflow integrated cloud support in the upcoming version `0.22.0`.\n\nNextflow integrated cloud support is available from version `0.22.0`. To use it just make sure to\nhave this or an higher version of Nextflow.\n\nBare in mind that Nextflow requires a Unix-like operating system and a Java runtime version 7+\n(Windows 10 users which have installed the [Ubuntu subsystem](https://blogs.windows.com/buildingapps/2016/03/30/run-bash-on-ubuntu-on-windows/)\nshould be able to run it, at their risk..).\n\nOnce you have installed it, you can follow the steps in the above demo. 
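\n\nAs a minimal sketch of the settings recapped above, the cloud section of the `nextflow.config` file could look along these lines. The `sharedStorageId`, `sharedStorageMount` and `spotPrice` properties are the ones named in the recap, while the `imageId` and `instanceType` names and all placeholder values are assumptions for illustration; refer to the Nextflow documentation for the exact syntax supported by your version:\n\n    cloud {\n        imageId = 'ami-xxxxxxxx'\n        instanceType = 'm4.large'\n        sharedStorageId = 'fs-xxxxxxxx'\n        sharedStorageMount = '/mnt/efs'\n        spotPrice = 0.10\n    }\n\n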
For your convenience\nwe made publicly available the EC2 image `ami-43f49030` `ami-4b7daa32`\\* (EU Ireland region) used to record this\nscreencast.\n\nAlso make sure you have the following the following variables defined in your environment:\n\n AWS_ACCESS_KEY_ID=\"\"\n AWS_SECRET_ACCESS_KEY=\"\"\n AWS_DEFAULT_REGION=\"\"\n\nReferes to the documentation for configuration details.\n\n\\* Update: the AMI has been updated with Java 8 on Sept 2017.\n\n### Conclusion\n\nNextflow provides state of the art support for cloud and containers technologies making\nit possible to create computing clusters in the cloud and deploy computational workflows\nin a no-brainer way, with just two commands on your terminal.\n\nIn an upcoming post I will describe the autoscaling capabilities implemented by the\nNextflow scheduler that allows, along with the use of spot/preemptible instances,\na cost effective solution for the execution of your pipeline in the cloud.\n\n#### Credits\n\nThanks to [Evan Floden](https://github.com/skptic) for reviewing this post and for writing\nthe [paraMSA](https://github.com/skptic/paraMSA/) pipeline.\n", + "images": [] + }, + { + "slug": "2016/developing-bioinformatics-pipeline-across-multiple-environments", + "title": "Developing a bioinformatics pipeline across multiple environments", + "date": "2016-02-04T00:00:00.000Z", + "content": "\nAs a new bioinformatics student with little formal computer science training, there are\nfew things that scare me more than PhD committee meetings and having to run my code in a\ncompletely different operating environment.\n\nRecently my work landed me in the middle of the phylogenetic tree jungle and the computational\nrequirements of my project far outgrew the resources that were available on our institute’s\n[Univa Grid Engine](https://en.wikipedia.org/wiki/Univa_Grid_Engine) based cluster. Luckily for me,\nan opportunity arose to participate in a joint program at the MareNostrum HPC at the\n[Barcelona Supercomputing Centre](http://www.bsc.es) (BSC).\n\nAs one of the top 100 supercomputers in the world, the [MareNostrum III](https://www.bsc.es/discover-bsc/the-centre/marenostrum)\ndwarfs our cluster and consists of nearly 50'000 processors. However it soon became apparent\nthat with great power comes great responsibility and in the case of the BSC, great restrictions.\nThese include no internet access, restrictive wall times for jobs, longer queues,\nfewer pre-installed binaries and an older version of bash. Faced with the possibility of\nhaving to rewrite my 16 bodged scripts for another queuing system I turned to Nextflow.\n\nStraight off the bat I was able to reduce all my previous scripts to a single Nextflow script.\nAdmittedly, the original code was not great, but the data processing model made me feel confident\nin what I was doing and I was able to reduce the volume of code to 25% of its initial amount\nwhilst making huge improvements in the readability. The real benefits however came from the portability.\n\nI was able to write the project on my laptop (Macbook Air), continuously test it on my local\ndesktop machine (Linux) and then perform more realistic heavy lifting runs on the cluster,\nall managed from a single GitHub repository. The BSC uses the [Load Sharing Facility](https://en.wikipedia.org/wiki/Platform_LSF)\n(LSF) platform with longer queue times, but a large number of CPUs. 
My project on the other\nhand had datasets that require over 100'000 tasks, but the tasks processes themselves run\nfor a matter of seconds or minutes. We were able to marry these two competing interests\ndeploying Nextflow in a [distributed execution manner that resemble the one of an MPI application](/blog/2015/mpi-like-execution-with-nextflow.html).\n\nIn this configuration, the queuing system allocates the Nextflow requested resources and\nusing the embedded [Apache Ignite](https://ignite.apache.org/) clustering engine, Nextflow handles\nthe submission of processes to the individual nodes.\n\nHere is some examples of how to run the same Nextflow project over multiple platforms.\n\n#### Local\n\nIf I wished to launch a job locally I can run it with the command:\n\n nextflow run myproject.nf\n\n#### Univa Grid Engine (UGE)\n\nFor the UGE I simply needed to specify the following in the `nextflow.config` file:\n\n process {\n executor='uge'\n queue='my_queue'\n }\n\nAnd then launch the pipeline execution as we did before:\n\n nextflow run myproject.nf\n\n#### Load Sharing Facility (LSF)\n\nFor running the same pipeline in the MareNostrum HPC environment, taking advantage of the MPI\nstandard to deploy my workload, I first created a wrapper script (for example `bsc-wrapper.sh`)\ndeclaring the resources that I want to reserve for the pipeline execution:\n\n #!/bin/bash\n #BSUB -oo logs/output_%J.out\n #BSUB -eo logs/output_%J.err\n #BSUB -J myProject\n #BSUB -q bsc_ls\n #BSUB -W 2:00\n #BSUB -x\n #BSUB -n 512\n #BSUB -R \"span[ptile=16]\"\n export NXF_CLUSTER_SEED=$(shuf -i 0-16777216 -n 1)\n mpirun --pernode bin/nextflow run concMSA.nf -with-mpi\n\nAnd then can execute it using `bsub` as shown below:\n\n bsub < bsc-wrapper.sh\n\nBy running Nextflow in this way and given the wrapper above, a single `bsub` job will run\non 512 cores in 32 computing nodes (512/16 = 32) with a maximum wall time of 2 hours.\nThousands of Nextflow processes can be spawned during this and the execution can be monitored\nin the standard manner from a single Nextflow output and error files. If any errors occur\nthe execution can of course to continued with [`-resume` command line option](/docs/latest/getstarted.html?highlight=resume#modify-and-resume).\n\n### Conclusion\n\nNextflow provides a simplified way to develop across multiple platforms and removes\nmuch of the overhead associated with running niche, user developed pipelines in an HPC\nenvironment.\n", + "images": [] + }, + { + "slug": "2016/docker-for-dunces-nextflow-for-nunces", + "title": "Docker for dunces & Nextflow for nunces", + "date": "2016-06-10T00:00:00.000Z", + "content": "\n_Below is a step-by-step guide for creating [Docker](http://www.docker.io) images for use with [Nextflow](http://www.nextflow.io) pipelines. This post was inspired by recent experiences and written with the hope that it may encourage others to join in the virtualization revolution._\n\nModern science is built on collaboration. Recently I became involved with one such venture between several groups across Europe. The aim was to annotate long non-coding RNA (lncRNA) in farm animals and I agreed to help with the annotation based on RNA-Seq data. The basic procedure relies on mapping short read data from many different tissues to a genome, generating transcripts and then determining if they are likely to be lncRNA or protein coding genes.\n\nDuring several successful 'hackathon' meetings the best approach was decided and implemented in a joint effort. 
I undertook the task of wrapping the procedure up into a Nextflow pipeline with a view to replicating the results across our different institutions and to allow the easy execution of the pipeline by researchers anywhere.\n\nCreating the Nextflow pipeline ([here](http://www.github.com/cbcrg/lncrna-annotation-nf)) in itself was not a difficult task. My collaborators had documented their work well and were on hand if anything was not clear. However installing and keeping aligned all the pipeline dependencies across different the data centers was still a challenging task.\n\nThe pipeline is typical of many in bioinformatics, consisting of binary executions, BASH scripting, R, Perl, BioPerl and some custom Perl modules. We found the BioPerl modules in particular where very sensitive to the various versions in the _long_ dependency tree. The solution was to turn to [Docker](https://www.docker.com/) containers.\n\nI have taken this opportunity to document the process of developing the Docker side of a Nextflow + Docker pipeline in a step-by-step manner.\n\n###Docker Installation\n\nBy far the most challenging issue is the installation of Docker. For local installations, the [process is relatively straight forward](https://docs.docker.com/engine/installation). However difficulties arise as computing moves to a cluster. Owing to security concerns, many HPC administrators have been reluctant to install Docker system-wide. This is changing and Docker developers have been responding to many of these concerns with [updates addressing these issues](https://blog.docker.com/2016/02/docker-engine-1-10-security/).\n\nThat being the case, local installations are usually perfectly fine for development. One of the golden rules in Nextflow development is to have a small test dataset that can run the full pipeline in minutes with few computational resources, ie can run on a laptop.\n\nIf you have Docker and Nextflow installed and you wish to view the working pipeline, you can perform the following commands to obtain everything you need and run the full lncrna annotation pipeline on a test dataset.\n\n docker pull cbcrg/lncrna_annotation\n nextflow run cbcrg/lncrna-annotation-nf -profile test\n\n[If the following does not work, there could be a problem with your Docker installation.]\n\nThe first command will download the required Docker image in your computer, while the second will launch Nextflow which automatically download the pipeline repository and\nrun it using the test data included with it.\n\n###The Dockerfile\n\nThe `Dockerfile` contains all the instructions required by Docker to build the Docker image. It provides a transparent and consistent way to specify the base operating system and installation of all software, libraries and modules.\n\nWe begin by creating a file `Dockerfile` in the Nextflow project directory. The Dockerfile begins with:\n\n # Set the base image to debian jessie\n FROM debian:jessie\n\n # File Author / Maintainer\n MAINTAINER Evan Floden \n\nThis sets the base distribution for our Docker image to be Debian v8.4, a lightweight Linux distribution that is ideally suited for the task. 
We must also specify the maintainer of the Docker image.\n\nNext we update the repository sources and install some essential tools such as `wget` and `perl`.\n\n RUN apt-get update && apt-get install --yes --no-install-recommends \\\n wget \\\n locales \\\n vim-tiny \\\n git \\\n cmake \\\n build-essential \\\n gcc-multilib \\\n perl \\\n python ...\n\nNotice that we use the command `RUN` before each line. The `RUN` instruction executes commands as if they are performed from the Linux shell.\n\nAlso is good practice to group as many as possible commands in the same `RUN` statement. This reduces the size of the final Docker image. See [here](https://blog.replicated.com/2016/02/05/refactoring-a-dockerfile-for-image-size/) for these details and [here](https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/) for more best practices.\n\nNext we can specify the install of the required perl modules using [cpan minus](http://search.cpan.org/~miyagawa/Menlo-1.9003/script/cpanm-menlo):\n\n # Install perl modules\n RUN cpanm --force CPAN::Meta \\\n YAML \\\n Digest::SHA \\\n Module::Build \\\n Data::Stag \\\n Config::Simple \\\n Statistics::Lite ...\n\nWe can give the instructions to download and install software from GitHub using:\n\n # Install Star Mapper\n RUN wget -qO- https://github.com/alexdobin/STAR/archive/2.5.2a.tar.gz | tar -xz \\\n && cd STAR-2.5.2a \\\n && make STAR\n\nWe can add custom Perl modules and specify environmental variables such as `PERL5LIB` as below:\n\n # Install FEELnc\n RUN wget -q https://github.com/tderrien/FEELnc/archive/a6146996e06f8a206a0ae6fd59f8ca635c7d9467.zip \\\n && unzip a6146996e06f8a206a0ae6fd59f8ca635c7d9467.zip \\\n && mv FEELnc-a6146996e06f8a206a0ae6fd59f8ca635c7d9467 /FEELnc \\\n && rm a6146996e06f8a206a0ae6fd59f8ca635c7d9467.zip\n\n ENV FEELNCPATH /FEELnc\n ENV PERL5LIB $PERL5LIB:${FEELNCPATH}/lib/\n\nR and R libraries can be installed as follows:\n\n # Install R\n RUN echo \"deb http://cran.rstudio.com/bin/linux/debian jessie-cran3/\" >> /etc/apt/sources.list &&\\\n apt-key adv --keyserver keys.gnupg.net --recv-key 381BA480 &&\\\n apt-get update --fix-missing && \\\n apt-get -y install r-base\n\n # Install R libraries\n RUN R -e 'install.packages(\"ROCR\", repos=\"http://cloud.r-project.org/\"); install.packages(\"randomForest\",repos=\"http://cloud.r-project.org/\")'\n\nFor the complete working Dockerfile of this project see [here](https://github.com/cbcrg/lncRNA-Annotation-nf/blob/master/Dockerfile)\n\n###Building the Docker Image\n\nOnce we start working on the Dockerfile, we can build it anytime using:\n\n docker build -t skptic/lncRNA_annotation .\n\nThis builds the image from the Dockerfile and assigns a tag (i.e. a name) for the image. If there are no errors, the Docker image is now in you local Docker repository ready for use.\n\n###Testing the Docker Image\n\nWe find it very helpful to test our images as we develop the Docker file. Once built, it is possible to launch the Docker image and test if the desired software was correctly installed. 
For example, we can test if FEELnc and its dependencies were successfully installed by running the following:\n\n docker run -ti lncrna_annotation\n\n cd FEELnc/test\n\n FEELnc_filter.pl -i transcript_chr38.gtf -a annotation_chr38.gtf \\\n > -b transcript_biotype=protein_coding > candidate_lncRNA.gtf\n\n exit # remember to exit the Docker image\n\n###Tagging the Docker Image\n\nOnce you are confident your image is built correctly, you can tag it, allowing you to push it to [Dockerhub.io](https://hub.docker.com/). Dockerhub is an online repository for docker images which allows anyone to pull public images and run them.\n\nYou can view the images in your local repository with the `docker images` command and tag using `docker tag` with the image ID and the name.\n\n docker images\n\n REPOSITORY TAG IMAGE ID CREATED SIZE\n lncrna_annotation latest d8ec49cbe3ed 2 minutes ago 821.5 MB\n\n docker tag d8ec49cbe3ed cbcrg/lncrna_annotation:latest\n\nNow when we check our local images we can see the updated tag.\n\n docker images\n\n REPOSITORY TAG IMAGE ID CREATED SIZE\n cbcrg/lncrna_annotation latest d8ec49cbe3ed 2 minutes ago 821.5 MB\n\n###Pushing the Docker Image to Dockerhub\n\nIf you have not previously, sign up for a Dockerhub account [here](https://hub.docker.com/). From the command line, login to Dockerhub and push your image.\n\n docker login --username=cbcrg\n docker push cbcrg/lncrna_annotation\n\nYou can test if you image has been correctly pushed and is publicly available by removing your local version using the IMAGE ID of the image and pulling the remote:\n\n docker rmi -f d8ec49cbe3ed\n\n # Ensure the local version is not listed.\n docker images\n\n docker pull cbcrg/lncrna_annotation\n\nWe are now almost ready to run our pipeline. The last step is to set up the Nexflow config.\n\n###Nextflow Configuration\n\nWithin the `nextflow.config` file in the main project directory we can add the following line which links the Docker image to the Nexflow execution. The images can be:\n\n- General (same docker image for all processes):\n\n process {\n container = 'cbcrg/lncrna_annotation'\n }\n\n- Specific to a profile (specified by `-profile crg` for example):\n\n profile {\n crg {\n container = 'cbcrg/lncrna_annotation'\n }\n }\n\n- Specific to a given process within a pipeline:\n\n $processName.container = 'cbcrg/lncrna_annotation'\n\nIn most cases it is easiest to use the same Docker image for all processes. One further thing to consider is the inclusion of the sha256 hash of the image in the container reference. I have [previously written about this](https://www.nextflow.io/blog/2016/best-practice-for-reproducibility.html), but briefly, including a hash ensures that not a single byte of the operating system or software is different.\n\n process {\n container = 'cbcrg/lncrna_annotation@sha256:9dfe233b...'\n }\n\nAll that is left now to run the pipeline.\n\n nextflow run lncRNA-Annotation-nf -profile test\n\nWhilst I have explained this step-by-step process in a linear, consequential manner, in reality the development process is often more circular with changes in the Docker images reflecting changes in the pipeline.\n\n###CircleCI and Nextflow\n\nNow that you have a pipeline that successfully runs on a test dataset with Docker, a very useful step is to add a continuous development component to the pipeline. 
With this, whenever you push a modification of the pipeline to the GitHub repo, the test data set is run on the [CircleCI](http://www.circleci.com) servers (using Docker).\n\nTo include CircleCI in the Nextflow pipeline, create a file named `circle.yml` in the project directory. We add the following instructions to the file:\n\n    machine:\n      java:\n        version: oraclejdk8\n      services:\n        - docker\n\n    dependencies:\n      override:\n\n    test:\n      override:\n        - docker pull cbcrg/lncrna_annotation\n        - curl -fsSL get.nextflow.io | bash\n        - ./nextflow run . -profile test\n\nNext you can sign up to CircleCI, linking your GitHub account.\n\nWithin the GitHub README.md you can add a badge with the following:\n\n    ![CircleCI status](https://circleci.com/gh/cbcrg/lncRNA-Annotation-nf.png?style=shield)\n\n### Tips and Tricks\n\n**File permissions**: When a process is executed by a Docker container, the UNIX user running the process is not you. Therefore any files that are used as an input should have the appropriate file permissions. For example, I had to change the permissions of all the input data in the test data set with:\n\n    find -type f -exec chmod 644 {} \\;\n    find -type d -exec chmod 755 {} \\;\n\n### Summary\n\nThis was my first time building a Docker image and after a bit of trial-and-error the process was surprisingly straightforward. There is a wealth of information available for Docker and the almost seamless integration with Nextflow is fantastic. Our collaboration team is now looking forward to applying the pipeline to different datasets and publishing the work, knowing our results will be completely reproducible across any platform.\n",
    "images": []
  },
  {
    "slug": "2016/enabling-elastic-computing-nextflow",
    "title": "Enabling elastic computing with Nextflow",
    "date": "2016-10-19T00:00:00.000Z",
    "content": "\n

\nLearn how to deploy an elastic computing cluster in the AWS cloud with Nextflow \n

\n\nIn the [previous post](/blog/2016/deploy-in-the-cloud-at-snap-of-a-finger.html) I introduced\nthe new cloud native support for AWS provided by Nextflow.\n\nIt allows the creation of a computing cluster in the cloud in a no-brainer way, enabling\nthe deployment of complex computational pipelines in a few commands.\n\nThis solution is characterised by using a lean application stack which does not\nrequire any third party component installed in the EC2 instances other than a Java VM and the\nDocker engine (the latter it's only required in order to deploy pipeline binary dependencies).\n\n![Nextflow cloud deployment](/img/cloud-deployment.png)\n\nEach EC2 instance runs a script, at bootstrap time, that mounts the [EFS](https://aws.amazon.com/efs/)\nstorage and downloads and launches the Nextflow cluster daemon. This daemon is self-configuring,\nit automatically discovers the other running instances and joins them forming the computing cluster.\n\nThe simplicity of this stack makes it possible to setup the cluster in the cloud in just a few minutes,\na little more time than is required to spin up the EC2 VMs. This time does not depend on\nthe number of instances launched, as they configure themself independently.\n\nThis also makes it possible to add or remove instances as needed, realising the [long promised\nelastic scalability](http://www.nextplatform.com/2016/09/21/three-great-lies-cloud-computing/)\nof cloud computing.\n\nThis ability is even more important for bioinformatic workflows, which frequently crunch\nnot homogeneous datasets and are composed of tasks with very different computing requirements\n(eg. a few very long running tasks and many short-lived tasks in the same workload).\n\n### Going elastic\n\nThe Nextflow support for the cloud features an elastic cluster which is capable of resizing itself\nto adapt to the actual computing needs at runtime, thus spinning up new EC2 instances when jobs\nwait for too long in the execution queue, or terminating instances that are not used for\na certain amount of time.\n\nIn order to enable the cluster autoscaling you will need to specify the autoscale\nproperties in the `nextflow.config` file. For example:\n\n```\ncloud {\n imageId = 'ami-4b7daa32'\n instanceType = 'm4.xlarge'\n\n autoscale {\n enabled = true\n minInstances = 5\n maxInstances = 10\n }\n}\n```\n\nThe above configuration enables the autoscaling features so that the cluster will include\nat least 5 nodes. If at any point one or more tasks spend more than 5 minutes without being\nprocessed, the number of instances needed to fullfil the pending tasks, up to limit specified\nby the `maxInstances` attribute, are launched. On the other hand, if these instances are\nidle, they are terminated before reaching the 60 minutes instance usage boundary.\n\nThe autoscaler launches instances by using the same AMI ID and type specified in the `cloud`\nconfiguration. However it is possible to define different attributes as shown below:\n\n```\ncloud {\n imageId = 'ami-4b7daa32'\n instanceType = 'm4.large'\n\n autoscale {\n enabled = true\n maxInstances = 10\n instanceType = 'm4.2xlarge'\n spotPrice = 0.05\n }\n}\n```\n\nThe cluster is first created by using instance(s) of type `m4.large`. 
Then, when new\ncomputing nodes are required the autoscaler launches instances of type `m4.2xlarge`.\nAlso, since the `spotPrice` attribute is specified, [EC2 spot](https://aws.amazon.com/ec2/spot/)\ninstances are launched, instead of regular on-demand ones, bidding for the price specified.\n\n### Conclusion\n\nNextflow implements an easy though effective cloud scheduler that is able to scale dynamically\nto meet the computing needs of deployed workloads taking advantage of the _elastic_ nature\nof the cloud platform.\n\nThis ability, along the support for spot/preemptible instances, allows a cost effective solution\nfor the execution of your pipeline in the cloud.\n", + "images": [] + }, + { + "slug": "2016/error-recovery-and-automatic-resources-management", + "title": "Error recovery and automatic resource management with Nextflow", + "date": "2016-02-11T00:00:00.000Z", + "content": "\nRecently a new feature has been added to Nextflow that allows failing jobs to be rescheduled,\nautomatically increasing the amount of computational resources requested.\n\n## The problem\n\nNextflow provides a mechanism that allows tasks to be automatically re-executed when\na command terminates with an error exit status. This is useful to handle errors caused by\ntemporary or even permanent failures (i.e. network hiccups, broken disks, etc.) that\nmay happen in a cloud based environment.\n\nHowever in an HPC cluster these events are very rare. In this scenario\nerror conditions are more likely to be caused by a peak in computing resources, allocated\nby a job exceeding the original resource requested. This leads to the batch scheduler\nkilling the job which in turn stops the overall pipeline execution.\n\nIn this context automatically re-executing the failed task is useless because it\nwould simply replicate the same error condition. A common solution consists of increasing\nthe resource request for the needs of the most consuming job, even though this will result\nin a suboptimal allocation of most of the jobs that are less resource hungry.\n\nMoreover it is also difficult to predict such upper limit. In most cases the only way to\ndetermine it is by using a painful fail-and-retry approach.\n\nTake in consideration, for example, the following Nextflow process:\n\n process align {\n executor 'sge'\n memory 1.GB\n errorStrategy 'retry'\n\n input:\n file 'seq.fa' from sequences\n\n script:\n '''\n t_coffee -in seq.fa\n '''\n }\n\nThe above definition will execute as many jobs as there are fasta files emitted\nby the `sequences` channel. Since the `retry` _error strategy_ is specified, if the\ntask returns a non-zero error status, Nextflow will reschedule the job execution requesting\nthe same amount of memory and disk storage. In case the error is generated by `t_coffee` that\nit needs more than one GB of memory for a specific alignment, the task will continue to fail,\nstopping the pipeline execution as a consequence.\n\n## Increase job resources automatically\n\nA better solution can be implemented with Nextflow which allows resources to be defined in\na dynamic manner. By doing this it is possible to increase the memory request when\nrescheduling a failing task execution. 
For example:\n\n process align {\n executor 'sge'\n memory { 1.GB * task.attempt }\n errorStrategy 'retry'\n\n input:\n file 'seq.fa' from sequences\n\n script:\n '''\n t_coffee -in seq.fa\n '''\n }\n\nIn the above example the memory requirement is defined by using a dynamic rule.\nThe `task.attempt` attribute represents the current task attempt (`1` the first time the task\nis executed, `2` the second and so on).\n\nThe task will then request one GB of memory. In case of an error it will be rescheduled\nrequesting 2 GB and so on, until it is executed successfully or the limit of times a task\ncan be retried is reached, forcing the termination of the pipeline.\n\nIt is also possible to define the `errorStrategy` directive in a dynamic manner. This\nis useful to re-execute failed jobs only if a certain condition is verified.\n\nFor example the Univa Grid Engine batch scheduler returns the exit status `140` when a job\nis terminated because it's using more resources than the ones requested.\n\nBy checking this exit status we can reschedule only the jobs that fail by exceeding the\nresources allocation. This can be done with the following directive declaration:\n\n errorStrategy { task.exitStatus == 140 ? 'retry' : 'terminate' }\n\nIn this way a failed task is rescheduled only when it returns the `140` exit status.\nIn all other cases the pipeline execution is terminated.\n\n## Conclusion\n\nNextflow provides a very flexible mechanism for defining the job resource request and\nhandling error events. It makes it possible to automatically reschedule failing tasks under\ncertain conditions and to define job resource requests in a dynamic manner so that they\ncan be adapted to the actual job's needs and to optimize the overall resource utilisation.\n", + "images": [] + }, + { + "slug": "2016/more-fun-containers-hpc", + "title": "More fun with containers in HPC", + "date": "2016-12-20T00:00:00.000Z", + "content": "\nNextflow was one of the [first workflow framework](https://www.nextflow.io/blog/2014/nextflow-meets-docker.html)\nto provide built-in support for Docker containers. A couple of years ago we also started\nto experiment with the deployment of containerised bioinformatic pipelines at CRG,\nusing Docker technology (see [here](<(https://www.nextflow.io/blog/2014/using-docker-in-hpc-cluster.html)>) and [here](https://www.nextplatform.com/2016/01/28/crg-goes-with-the-genomics-flow/)).\n\nWe found that by isolating and packaging the complete computational workflow environment\nwith the use of Docker images, radically simplifies the burden of maintaining complex\ndependency graphs of real workload data analysis pipelines.\n\nEven more importantly, the use of containers enables replicable results with minimal effort\nfor the system configuration. 
The entire computational environment can be archived in a\nself-contained executable format, allowing the replication of the associated analysis at\nany point in time.\n\nThis ability is the main reason that drove the rapid adoption of Docker in the bioinformatic\ncommunity and its support in many projects, like for example [Galaxy](https://galaxyproject.org),\n[CWL](http://commonwl.org), [Bioboxes](http://bioboxes.org), [Dockstore](https://dockstore.org) and many others.\n\nHowever, while the popularity of Docker spread between the developers, its adaption in\nresearch computing infrastructures continues to remain very low and it's very unlikely\nthat this trend will change in the future.\n\nThe reason for this resides in the Docker architecture, which requires a daemon running\nwith root permissions on each node of a computing cluster. Such a requirement raises many\nsecurity concerns, thus good practices would prevent its use in shared HPC cluster or\nsupercomputer environments.\n\n### Introducing Singularity\n\nAlternative implementations, such as [Singularity](http://singularity.lbl.gov), have\nfortunately been promoted by the interested in containers technology.\n\nSingularity is a containers engine developed at the Berkeley Lab and designed for the\nneeds of scientific workloads. The main differences with Docker are: containers are file\nbased, no root escalation is allowed nor root permission is needed to run a container\n(although a privileged user is needed to create a container image), and there is no\nseparate running daemon.\n\nThese, along with other features, such as support for autofs mounts, makes Singularity a\ncontainer engine better suited to the requirements of HPC clusters and supercomputers.\n\nMoreover, although Singularity uses a container image format different to that of Docker,\nthey provide a conversion tool that allows Docker images to be converted to the\nSingularity format.\n\n### Singularity in the wild\n\nWe integrated Singularity support in Nextflow framework and tested it in the CRG\ncomputing cluster and the BSC [MareNostrum](https://www.bsc.es/discover-bsc/the-centre/marenostrum) supercomputer.\n\nThe absence of a separate running daemon or image gateway made the installation\nstraightforward when compared to Docker or other solutions.\n\nTo evaluate the performance of Singularity we carried out the [same benchmarks](https://peerj.com/articles/1273/)\nwe performed for Docker and compared the results of the two engines.\n\nThe benchmarks consisted in the execution of three Nextflow based genomic pipelines:\n\n1. [Rna-toy](https://github.com/nextflow-io/rnatoy/tree/peerj5515): a simple pipeline for RNA-Seq data analysis.\n2. [Nmdp-Flow](https://github.com/nextflow-io/nmdp-flow/tree/peerj5515/): an assembly-based variant calling pipeline.\n3. 
[Piper-NF](https://github.com/cbcrg/piper-nf/tree/peerj5515): a pipeline for the detection and mapping of long non-coding RNAs.\n\nIn order to repeat the analyses, we converted the container images we used to perform\nthe Docker benchmarks to Singularity image files by using the [docker2singularity](https://github.com/singularityware/docker2singularity) tool\n_(this is not required anymore, see the update below)_.\n\nThe only change needed to run these pipelines with Singularity was to replace the Docker\nspecific settings with the following ones in the configuration file:\n\n singularity.enabled = true\n process.container = ''\n\nEach pipeline was executed 10 times, alternately by using Docker and Singularity as\ncontainer engine. The results are shown in the following table (time in minutes):\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
| Pipeline | Tasks | Mean task time (Singularity / Docker) | Mean execution time (Singularity / Docker) | Execution time std dev (Singularity / Docker) | Ratio |
| ----- | ----- | ----- | ----- | ----- | ----- |
| RNA-Seq | 9 | 73.7 / 73.6 | 663.6 / 662.3 | 2.0 / 3.1 | 0.998 |
| Variant call | 48 | 22.1 / 22.4 | 1061.2 / 1074.4 | 43.1 / 38.5 | 1.012 |
| Piper-NF | 98 | 1.2 / 1.3 | 120.0 / 124.5 | 6.9 / 2.8 | 1.038 |
\n\nThe benchmark results show that there isn't any significative difference in the\nexecution times of containerised workflows between Docker and Singularity. In two\ncases Singularity was slightly faster and a third one it was almost identical although\na little slower than Docker.\n\n### Conclusion\n\nIn our evaluation Singularity proved to be an easy to install,\nstable and performant container engine.\n\nThe only minor drawback, we found when compared to Docker, was the need to define the\nhost path mount points statically when the Singularity images were created. In fact,\neven if Singularity supports user mount points to be defined dynamically when the\ncontainer is launched, this feature requires the overlay file system which was not\nsupported by the kernel available in our system.\n\nDocker surely will remain the _de facto_ standard engine and image format for containers\ndue to its popularity and [impressive growth](http://www.coscale.com/blog/docker-usage-statistics-increased-adoption-by-enterprises-and-for-production-use).\n\nHowever, in our opinion, Singularity is the tool of choice for the execution of\ncontainerised workloads in the context of HPC, thanks to its focus on system security\nand its simpler architectural design.\n\nThe transparent support provided by Nextflow for both Docker and Singularity technology\nguarantees the ability to deploy your workflows in a range of different platforms (cloud,\ncluster, supercomputer, etc). Nextflow transparently manages the deployment of the\ncontainerised workload according to the runtime available in the target system.\n\n#### Credits\n\nThanks to Gabriel Gonzalez (CRG), Luis Exposito (CRG) and Carlos Tripiana Montes (BSC)\nfor the support installing Singularity.\n\n**Update** Singularity, since version 2.3.x, is able to pull and run Docker images from the Docker Hub.\nThis greatly simplifies the interoperability with existing Docker containers. You only need\nto prefix the image name with the `docker://` pseudo-protocol to download it as a Singularity image,\nfor example:\n\n singularity pull --size 1200 docker://nextflow/rnatoy\n", + "images": [] + }, + { + "slug": "2017/caw-and-singularity", + "title": "Running CAW with Singularity and Nextflow", + "date": "2017-11-16T00:00:00.000Z", + "content": "\nThis is a guest post authored by Maxime Garcia from the Science for Life Laboratory in Sweden. Max\ndescribes how they deploy complex cancer data analysis pipelines using Nextflow\nand Singularity. 
We are very happy to share their experience across the Nextflow community.\n\n### The CAW pipeline\n\n\"Cancer\n\n[Cancer Analysis Workflow](http://opensource.scilifelab.se/projects/sarek/) (CAW for short) is a Nextflow based analysis pipeline developed for the analysis of tumour: normal pairs.\nIt is developed in collaboration with two infrastructures within [Science for Life Laboratory](https://www.scilifelab.se/): [National Genomics Infrastructure](https://ngisweden.scilifelab.se/) (NGI), in The Stockholm [Genomics Applications Development Facility](https://www.scilifelab.se/facilities/ngi-stockholm/) to be precise and [National Bioinformatics Infrastructure Sweden](https://www.nbis.se/) (NBIS).\n\nCAW is based on [GATK Best Practices](https://software.broadinstitute.org/gatk/best-practices/) for the preprocessing of FastQ files, then uses various variant calling tools to look for somatic SNVs and small indels ([MuTect1](https://github.com/broadinstitute/mutect/), [MuTect2](https://github.com/broadgsa/gatk-protected/), [Strelka](https://github.com/Illumina/strelka/), [Freebayes](https://github.com/ekg/freebayes/)), ([GATK HaplotyeCaller](https://github.com/broadgsa/gatk-protected/)), for structural variants([Manta](https://github.com/Illumina/manta/)) and for CNVs ([ASCAT](https://github.com/Crick-CancerGenomics/ascat/)).\nAnnotation tools ([snpEff](http://snpeff.sourceforge.net/), [VEP](https://www.ensembl.org/info/docs/tools/vep/index.html)) are also used, and finally [MultiQC](http://multiqc.info/) for handling reports.\n\nWe are currently working on a manuscript, but you're welcome to look at (or even contribute to) our [github repository](https://github.com/SciLifeLab/CAW/) or talk with us on our [gitter channel](https://gitter.im/SciLifeLab/CAW/).\n\n### Singularity and UPPMAX\n\n[Singularity](http://singularity.lbl.gov/) is a tool package software dependencies into a contained environment, much like Docker. It's designed to run on HPC environments where Docker is often a problem due to its requirement for administrative privileges.\n\nWe're based in Sweden, and [Uppsala Multidisciplinary Center for Advanced Computational Science](https://uppmax.uu.se/) (UPPMAX) provides Computational infrastructures for all Swedish researchers.\nSince we're analyzing sensitive data, we are using secure clusters (with a two factor authentication), set up by UPPMAX: [SNIC-SENS](https://www.uppmax.uu.se/projects-and-collaborations/snic-sens/).\n\nIn my case, since we're still developing the pipeline, I am mainly using the research cluster [Bianca](https://www.uppmax.uu.se/resources/systems/the-bianca-cluster/).\nSo I can only transfer files and data in one specific repository using SFTP.\n\nUPPMAX provides computing resources for Swedish researchers for all scientific domains, so getting software updates can occasionally take some time.\nTypically, [Environment Modules](http://modules.sourceforge.net/) are used which allow several versions of different tools - this is good for reproducibility and is quite easy to use. 
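In practice, making a tool available in a session on such a system looks something like the following (the module names below are purely illustrative, not the exact modules we rely on):\n\n```\nmodule load bioinfo-tools\nmodule load bwa/0.7.17\n```\n\n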
However, the approach is not portable across different clusters outside of UPPMAX.\n\n### Why use containers?\n\nThe idea of using containers, for improved portability and reproducibility, and more up to date tools, came naturally to us, as it is easily managed within Nextflow.\nWe cannot use [Docker](https://www.docker.com/) on our secure cluster, so we wanted to run CAW with [Singularity](http://singularity.lbl.gov/) images instead.\n\n### How was the switch made?\n\nWe were already using Docker containers for our continuous integration testing with Travis, and since we use many tools, I took the approach of making (almost) a container for each process.\nBecause this process is quite slow, repetitive and I'm lazy like to automate everything, I made a simple NF [script](https://github.com/SciLifeLab/CAW/blob/master/buildContainers.nf) to build and push all docker containers.\nBasically it's just `build` and `pull` for all containers, with some configuration possibilities.\n\n```\ndocker build -t ${repository}/${container}:${tag} ${baseDir}/containers/${container}/.\n\ndocker push ${repository}/${container}:${tag}\n```\n\nSince Singularity can directly pull images from DockerHub, I made the build script to pull all containers from DockerHub to have local Singularity image files.\n\n```\nsingularity pull --name ${container}-${tag}.img docker://${repository}/${container}:${tag}\n```\n\nAfter this, it's just a matter of moving all containers to the secure cluster we're using, and using the right configuration file in the profile.\nI'll spare you the details of the SFTP transfer.\nThis is what the configuration file for such Singularity images looks like: [`singularity-path.config`](https://github.com/SciLifeLab/CAW/blob/master/configuration/singularity-path.config)\n\n```\n/*\nvim: syntax=groovy\n-*- mode: groovy;-*-\n * -------------------------------------------------\n * Nextflow config file for CAW project\n * -------------------------------------------------\n * Paths to Singularity images for every process\n * No image will be pulled automatically\n * Need to transfer and set up images before\n * -------------------------------------------------\n */\n\nsingularity {\n enabled = true\n runOptions = \"--bind /scratch\"\n}\n\nparams {\n containerPath='containers'\n tag='1.2.3'\n}\n\nprocess {\n $ConcatVCF.container = \"${params.containerPath}/caw-${params.tag}.img\"\n $RunMultiQC.container = \"${params.containerPath}/multiqc-${params.tag}.img\"\n $IndelRealigner.container = \"${params.containerPath}/gatk-${params.tag}.img\"\n // I'm not putting the whole file here\n // you probably already got the point\n}\n```\n\nThis approach ran (almost) perfectly on the first try, except a process failing due to a typo on a container name...\n\n### Conclusion\n\nThis switch was completed a couple of months ago and has been a great success.\nWe are now using Singularity containers in almost all of our Nextflow pipelines developed at NGI.\nEven if we do enjoy the improved control, we must not forgot that:\n\n> With great power comes great responsibility!\n\n### Credits\n\nThanks to [Rickard Hammarén](https://github.com/Hammarn) and [Phil Ewels](http://phil.ewels.co.uk/) for comments and suggestions for improving the post.\n", + "images": [ + "/img/CAW_logo.png" + ] + }, + { + "slug": "2017/nextflow-and-cwl", + "title": "Nextflow and the Common Workflow Language", + "date": "2017-07-20T00:00:00.000Z", + "content": "\nThe Common Workflow Language ([CWL](http://www.commonwl.org/)) is a specification for 
defining\nworkflows in a declarative manner. It has been implemented to varying degrees\nby different software packages. Nextflow and CWL share a common goal of enabling portable\nreproducible workflows.\n\nWe are currently investigating the automatic conversion of CWL workflows into Nextflow scripts\nto increase the portability of workflows. This work is being developed as\nthe [cwl2nxf](https://github.com/nextflow-io/cwl2nxf) project, currently in early prototype stage.\n\nOur first phase of the project was to determine mappings of CWL to Nextflow and familiarize\nourselves with how the current implementation of the converter supports a number of CWL specific\nfeatures.\n\n### Mapping CWL to Nextflow\n\nInputs in the CWL workflow file are initially parsed as _channels_ or other Nextflow input types.\nEach step specified in the workflow is then parsed independently. At the time of writing\nsubworkflows are not supported, each step must be a CWL `CommandLineTool` file.\n\nThe image below shows an example of the major components in the CWL files and then post-conversion (click to zoom).\n\n[![Nextflow CWL conversion](/img/cwl2nxf-min.png)](/img/cwl2nxf-min.png)\n\nCWL and Nextflow share a similar structure of defining inputs and outputs as shown above.\n\nA notable difference between the two is how tasks are defined. CWL requires either a separate\nfile for each task or a sub-workflow. CWL also requires the explicit mapping of each command\nline option for an executed tool. This is done using YAML meta-annotation to indicate the position, prefix, etc.\nfor each command line option.\n\nIn Nextflow a task command is defined as a separated component in the `process` definition and\nit is ultimately a multiline string which is interpreted by a command script by the underlying\nsystem. Input parameters can be used in the command string with a simple variable interpolation\nmechanism. This is beneficial as it simplifies porting existing BASH scripts to Nextflow\nwith minimal refactoring.\n\nThese examples highlight some of the differences between the two approaches, and the difficulties\nconverting complex use cases such as scatter, CWL expressions, and conditional command line inclusion.\n\n### Current status\n\nThe cwl2nxf is a Groovy based tool with a limited conversion ability. It parses the\nYAML documents and maps the various CWL objects to Nextflow. Conversion examples are\nprovided as part of the repository along with documentation for each example specifying the mapping.\n\nThis project was initially focused on developing an understanding of how to translate CWL to Nextflow.\nA number of CWL specific features such as scatter, secondary files and simple JavaScript expressions\nwere analyzed and implemented.\n\nThe GitHub repository includes instructions on how to build cwl2nxf and an example usage.\nThe tool can be executed as either just a parser printing the converted CWL to stdout,\nor by specifying an output file which will generate the Nextflow script file and if necessary\na config file.\n\nThe tool takes in a CWL workflow file and the YAML inputs file. It does not currently work\nwith a standalone `CommandLineTool`. The following example show how to run it:\n\n```\njava -jar build/libs/cwl2nxf-*.jar rnatoy.cwl samp.yaml\n```\n\n
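To make the mapping more concrete, the snippet below is a minimal sketch of the kind of Nextflow process described above; the process, channel and tool names are hypothetical and this is not the literal output produced by cwl2nxf:\n\n```\nprocess exampleTask {\n    input:\n    file seq from sequences\n\n    output:\n    file 'out.txt' into results\n\n    script:\n    \"\"\"\n    your_tool --in ${seq} > out.txt\n    \"\"\"\n}\n```\n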
\nSee the GitHub [repository](https://github.com/nextflow-io/cwl2nxf) for further details.\n\n### Conclusion\n\nWe are continuing to investigate ways to improve the interoperability of Nextflow with CWL.\nAlthough still an early prototype, the cwl2nxf tool provides some level of conversion of CWL to Nextflow.\n\nWe are also planning to explore [CWL Avro](https://github.com/common-workflow-language/cwlavro),\nwhich may provide a more efficient way to parse and handle CWL objects for conversion to Nextflow.\n\nAdditionally, a number of workflows in the GitHub repository have been implemented in both\nCWL and Nextflow which can be used as a comparison of the two languages.\n\nThe Nextflow team will be presenting a short talk and participating in the Codefest at [BOSC 2017](https://www.open-bio.org/wiki/BOSC_2017).\nWe are interested in hearing from the community regarding CWL to Nextflow conversion, and would like\nto encourage anyone interested to contribute to the cwl2nxf project.\n", + "images": [] + }, + { + "slug": "2017/nextflow-hack17", + "title": "Nexflow Hackathon 2017", + "date": "2017-09-30T00:00:00.000Z", + "content": "\nLast week saw the inaugural Nextflow meeting organised at the Centre for Genomic Regulation\n(CRG) in Barcelona. The event combined talks, demos, a tutorial/workshop for beginners as\nwell as two hackathon sessions for more advanced users.\n\nNearly 50 participants attended over the two days which included an entertaining tapas course\nduring the first evening!\n\nOne of the main objectives of the event was to bring together Nextflow users to work\ntogether on common interest projects. There were several proposals for the hackathon\nsessions and in the end five diverse ideas were chosen for communal development ranging from\nnew pipelines through to the addition of new features in Nextflow.\n\nThe proposals and outcomes of each the projects, which can be found in the issues section\nof [this GitHub repository](https://github.com/nextflow-io/hack17), have been summarised below.\n\n### Nextflow HTML tracing reports\n\nThe HTML tracing project aims to generate a rendered version of the Nextflow trace file to\nenable fast sorting and visualisation of task/process execution statistics.\n\nCurrently the data in the trace includes information such as CPU duration, memory usage and\ncompletion status of each task, however wading through the file is often not convenient\nwhen a large number of tasks have been executed.\n\n[Phil Ewels](https://github.com/ewels) proposed the idea and led the coordination effort\nwith the outcome being a very impressive working prototype which can be found in the Nextflow\nbranch `html-trace`.\n\nAn image of the example report is shown below with the interactive HTML available\n[here](/misc/nf-trace-report.html). It is expected to be merged into the main branch of Nextflow\nwith documentation in a near-future release.\n\n![Nextflow HTML execution report](/img/nf-trace-report-min.png)\n\n### Nextflow pipeline for 16S microbial data\n\nThe H3Africa Bioinformatics Network have been developing several pipelines which are used\nacross the participating centers. 
The diverse computing resources available across the nodes has led to\nmembers wanting workflow solutions with a particular focus on portability.\n\nWith this is mind, Scott Hazelhurst proposed a project for a 16S Microbial data analysis\npipeline which had [previously been developed using CWL](https://github.com/h3abionet/h3abionet16S/tree/master).\n\nThe participants made a new [branch](https://github.com/h3abionet/h3abionet16S/tree/nextflow)\nof the original pipeline and ported it into Nextflow.\n\nThe pipeline will continue to be developed with the goal of acting as a comparison between\nCWL and Nextflow. It is thought this can then be extended to other pipelines by both those\nwho are already familiar with Nextflow as well as used as a tool for training newer users.\n\n### Nextflow modules prototyping\n\n_Toolboxing_ allows users to incorporate software into their pipelines in an efficient and\nreproducible manner. Various software repositories are becoming increasing popular,\nhighlighted by the over 5,000 tools available in the [Galaxy Toolshed](https://toolshed.g2.bx.psu.edu/).\n\nProjects such as [Biocontainers](http://biocontainers.pro/) aim to wrap up the execution\nenvironment using containers. [Myself](https://github.com/skptic) and [Johan Viklund](https://github.com/viklund)\nwished to piggyback off existing repositories and settled on [Dockstore](https://dockstore.org)\nwhich is an open platform compliant with the [GA4GH](http://genomicsandhealth.org) initiative.\n\nThe majority of tools in Dockstore are written in the CWL and therefore we required a parser\nbetween the CWL CommandLineTool class and Nextflow processes. Johan was able to develop\na parser which generates Nextflow processes for several Dockstore tools.\n\nAs these resources such as Dockstore become mature and standardised, it will be\npossible to automatically generate a _Nextflow Store_ and enable efficient incorporation\nof tools into workflows.\n\n\n\n_Example showing a Nextflow process generated from the Dockstore CWL repository for the tool BAMStats._\n\n### Nextflow pipeline for de novo assembly of nanopore reads\n\n[Nanopore sequencing](https://en.wikipedia.org/wiki/Nanopore_sequencing) is an exciting\nand emerging technology which promises to change the landscape of nucleotide sequencing.\n\nWith keen interest in Nanopore specific pipelines, [Hadrien Gourlé](https://github.com/HadrienG)\nlead the hackathon project for _Nanoflow_.\n\n[Nanoflow](https://github.com/HadrienG/nanoflow) is a de novo assembler of bacterials genomes\nfrom nanopore reads using Nextflow.\n\nDuring the two days the participants developed the pipeline for adapter trimming as well\nas assembly and consensus sequence generation using either\n[Canu](https://github.com/marbl/canu) and [Miniasm](https://github.com/lh3/miniasm).\n\nThe future plans are to finalise the pipeline to include a polishing step and a genome\nannotation step.\n\n### Nextflow AWS Batch integration\n\nNextflow already has experimental support for [AWS Batch](https://aws.amazon.com/batch/)\nand the goal of this project proposed by [Francesco Strozzi](https://github.com/fstrozzi)\nwas to improve this support, add features and test the implementation on real world pipelines.\n\nEarlier work from [Paolo Di Tommaso](https://github.com/pditommaso) in the Nextflow\nrepository, highlighted several challenges to using AWS Batch with Nextflow.\n\nThe major obstacle described by [Tim Dudgeon](https://github.com/tdudgeon) was the requirement\nfor each Docker container to 
have a version of the Amazon Web Services Command Line tools\n(aws-cli) installed.\n\nA solution was to install the AWS CLI tools on a custom AWS image that is used by the\nDocker host machine, and then mount the directory that contains the necessary items into\neach of the Docker containers as a volume. Early testing suggests this approach works\nwith the hope of providing a more elegant solution in future iterations.\n\nThe code and documentation for AWS Batch has been prepared and will be tested further\nbefore being rolled into an official Nextflow release in the near future.\n\n### Conclusion\n\nThe event was seen as an overwhelming success and special thanks must be made to all the\nparticipants. As the Nextflow community continues to grow, it would be fantastic to make these types\nmeetings more regular occasions.\n\nIn the meantime we have put together a short video containing some of the highlights\nof the two days.\n\nWe hope to see you all again in Barcelona soon or at new events around the world!\n\n\n", + "images": [] + }, + { + "slug": "2017/nextflow-nature-biotech-paper", + "title": "Nextflow published in Nature Biotechnology", + "date": "2017-04-12T00:00:00.000Z", + "content": "\nWe are excited to announce the publication of our work _[Nextflow enables reproducible computational workflows](http://rdcu.be/qZVo)_ in Nature Biotechnology.\n\nThe article provides a description of the fundamental components and principles of Nextflow.\nWe illustrate how the unique combination of containers, pipeline sharing and portable\ndeployment provides tangible advantages to researchers wishing to generate reproducible\ncomputational workflows.\n\nReproducibility is a [major challenge](http://www.nature.com/news/reproducibility-1.17552)\nin today's scientific environment. We show how three bioinformatics data analyses produce\ndifferent results when executed on different execution platforms and how Nextflow, along\nwith software containers, can be used to control numerical stability, enabling consistent\nand replicable results across different computing platforms. As complex omics analyses\nenter the clinical setting, ensuring that results remain stable brings on extra importance.\n\nSince its first release three years ago, the Nextflow user base has grown in an organic fashion.\nFrom the beginning it has been our own demands in a workflow tool and those of our users that\nhave driven the development of Nextflow forward. The publication forms an important milestone\nin the project and we would like to extend a warm thank you to all those who have been early\nusers and contributors.\n\nWe kindly ask if you use Nextflow in your own work to cite the following article:\n\n
\nDi Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., & Notredame, C. (2017).\nNextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316–319.\ndoi:10.1038/nbt.3820\n
\n", + "images": [] + }, + { + "slug": "2017/nextflow-workshop", + "title": "Nextflow workshop is coming!", + "date": "2017-04-26T00:00:00.000Z", + "content": "\nWe are excited to announce the first Nextflow workshop that will take place at the\nBarcelona Biomedical Research Park building ([PRBB](https://www.prbb.org/)) on 14-15th September 2017.\n\nThis event is open to everybody who is interested in the problem of computational workflow\nreproducibility. Leading experts and users will discuss the current state of the Nextflow\ntechnology and how it can be applied to manage -omics analyses in a reproducible manner.\nBest practices will be introduced on how to deploy real-world large-scale genomic\napplications for precision medicine.\n\nDuring the hackathon, organized for the second day, participants will have the\nopportunity to learn how to write self-contained, replicable data analysis\npipelines along with Nextflow expert developers.\n\nMore details at [this link](http://www.crg.eu/en/event/coursescrg-nextflow-reproducible-silico-genomics).\nThe registration form is [available here](http://apps.crg.es/content/internet/events/webforms/17502) (deadline 15th Jun).\n\n### Schedule (draft)\n\n#### Thursday, 14 September\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
| Time | Title | Speaker |
| ----- | ----- | ----- |
| 10.00 | Welcome & introduction | Cedric Notredame, Comparative Bioinformatics, CRG, Spain |
| 10.15 | Nextflow: a quick review | Paolo Di Tommaso, Comparative Bioinformatics, CRG, Spain |
| 10.30 | Standardising Swedish genomics analyses using Nextflow | Phil Ewels, National Genomics Infrastructure, SciLifeLab, Sweden |
| 11.00 | Building Pipelines to Support African Bioinformatics: the H3ABioNet Pipelines Project | Scott Hazelhurst, University of the Witwatersrand, Johannesburg, South Africa |
| 11.30 | coffee break | |
| 12.00 | Using Nextflow for Large Scale Benchmarking of Phylogenetic methods and tools | Frédéric Lemoine, Evolutionary Bioinformatics, Institut Pasteur, France |
| 12.30 | Nextflow for chemistry - crossing the divide | Tim Dudgeon, Informatics Matters Ltd, UK |
| 12.50 | From zero to Nextflow @ CRG's Biocore | Luca Cozzuto, Bioinformatics Core Facility, CRG, Spain |
| 13.10 | (to be determined) | |
| 13.30 | Lunch | |
| 14.30 - 18.30 | Hackathon & course | |
\n\n#### Friday, 15 September\n\n
| Time | Title | Speaker |
| ----- | ----- | ----- |
| 9.30 | Computational workflows for omics analyses at the IARC | Matthieu Foll, International Agency for Research on Cancer (IARC), France |
| 10.00 | Medical Genetics at Oslo University Hospital | Hugues Fontanelle, Oslo University Hospital, Norway |
| 10.30 | Inside-Out: reproducible analysis of external data, inside containers with Nextflow | Evan Floden, Comparative Bioinformatics, CRG, Spain |
| 11.00 | coffee break | |
| 11.30 | (title to be defined) | Johnny Wu, Roche Sequencing, Pleasanton, USA |
| 12.00 | Standardizing life sciences datasets to improve studies reproducibility in the EOSC | Jordi Rambla, European Genome-Phenome Archive, CRG |
| 12.20 | Unbounded by Economics | Brendan Bouffler, AWS Research Cloud Program, UK |
| 12.40 | Challenges with large-scale portable computational workflows | Paolo Di Tommaso, Comparative Bioinformatics, CRG, Spain |
| 13.00 | Lunch | |
| 14.00 - 18.00 | Hackathon | |
\n\n
\nSee you in Barcelona!\n\n![Nextflow workshop](/img/nf-workshop.png)\n", + "images": [] + }, + { + "slug": "2017/scaling-with-aws-batch", + "title": "Scaling with AWS Batch", + "date": "2017-11-08T00:00:00.000Z", + "content": "\nThe latest Nextflow release (0.26.0) includes built-in support for [AWS Batch](https://aws.amazon.com/batch/),\na managed computing service that allows the execution of containerised workloads\nover the Amazon EC2 Container Service (ECS).\n\nThis feature allows the seamless deployment of Nextflow pipelines in the cloud by offloading\nthe process executions as managed Batch jobs. The service takes care to spin up the required\ncomputing instances on-demand, scaling up and down the number and composition of the instances\nto best accommodate the actual workload resource needs at any point in time.\n\nAWS Batch shares with Nextflow the same vision regarding workflow containerisation\ni.e. each compute task is executed in its own Docker container. This dramatically\nsimplifies the workflow deployment through the download of a few container images.\nThis common design background made the support for AWS Batch a natural extension for Nextflow.\n\n### Batch in a nutshell\n\nBatch is organised in _Compute Environments_, _Job queues_, _Job definitions_ and _Jobs_.\n\nThe _Compute Environment_ allows you to define the computing resources required for a specific workload (type).\nYou can specify the minimum and maximum number of CPUs that can be allocated,\nthe EC2 provisioning model (On-demand or Spot), the AMI to be used and the allowed instance types.\n\nThe _Job queue_ definition allows you to bind a specific task to one or more Compute Environments.\n\nThen, the _Job definition_ is a template for one or more jobs in your workload. This is required\nto specify the Docker image to be used in running a particular task along with other requirements\nsuch as the container mount points, the number of CPUs, the amount of memory and the number of\nretries in case of job failure.\n\nFinally the _Job_ binds a Job definition to a specific Job queue\nand allows you to specify the actual task command to be executed in the container.\n\nThe job input and output data management is delegated to the user. This means that if you\nonly use Batch API/tools you will need to take care to stage the input data from a S3 bucket\n(or a different source) and upload the results to a persistent storage location.\n\nThis could turn out to be cumbersome in complex workflows with a large number of\ntasks and above all it makes it difficult to deploy the same applications across different\ninfrastructure.\n\n### How to use Batch with Nextflow\n\nNextflow streamlines the use of AWS Batch by smoothly integrating it in its workflow processing\nmodel and enabling transparent interoperability with other systems.\n\nTo run Nextflow you will need to set-up in your AWS Batch account a [Compute Environment](http://docs.aws.amazon.com/batch/latest/userguide/compute_environments.html)\ndefining the required computing resources and associate it to a [Job Queue](http://docs.aws.amazon.com/batch/latest/userguide/job_queues.html).\n\nNextflow takes care to create the required _Job Definitions_ and _Job_ requests as needed.\nThis spares some Batch configurations steps.\n\nIn the `nextflow.config`, file specify the `awsbatch` executor, the Batch `queue` and\nthe container to be used in the usual manner. You may also need to specify the AWS region\nand access credentials if they are not provided by other means. 
For example:\n\n process.executor = 'awsbatch'\n process.queue = 'my-batch-queue'\n process.container = your-org/your-docker:image\n aws.region = 'eu-west-1'\n aws.accessKey = 'xxx'\n aws.secretKey = 'yyy'\n\nEach process can eventually use a different queue and Docker image (see Nextflow documentation for details).\nThe container image(s) must be published in a Docker registry that is accessible from the\ninstances run by AWS Batch eg. [Docker Hub](https://hub.docker.com/), [Quay](https://quay.io/)\nor [ECS Container Registry](https://aws.amazon.com/ecr/).\n\nThe Nextflow process can be launched either in a local computer or a EC2 instance.\nThe latter is suggested for heavy or long running workloads.\n\nNote that input data should be stored in the S3 storage. In the same manner\nthe pipeline execution must specify a S3 bucket as a working directory by using the `-w` command line option.\n\nA final caveat about custom containers and computing AMI. Nextflow automatically stages input\ndata and shares tasks intermediate results by using the S3 bucket specified as a work directory.\nFor this reason it needs to use the `aws` command line tool which must be installed either\nin your process container or be present in a custom AMI that can be mounted and accessed\nby the Docker containers.\n\nYou may also need to create a custom AMI because the default image used by AWS Batch only\nprovides 22 GB of storage which may not be enough for real world analysis pipelines.\n\nSee the documentation to learn [how to create a custom AMI](/docs/latest/awscloud.html#custom-ami)\nwith larger storage and how to setup the AWS CLI tools.\n\n### An example\n\nIn order to validate Nextflow integration with AWS Batch, we used a simple RNA-Seq pipeline.\n\nThis pipeline takes as input a metadata file from the Encode project corresponding to a [search\nreturning all human RNA-seq paired-end datasets](https://www.encodeproject.org/search/?type=Experiment&award.project=ENCODE&replicates.library.biosample.donor.organism.scientific_name=Homo+sapiens&files.file_type=fastq&files.run_type=paired-ended&replicates.library.nucleic_acid_term_name=RNA&replicates.library.depleted_in_term_name=rRNA)\n(the metadata file has been additionally filtered to retain only data having a SRA ID).\n\nThe pipeline automatically downloads the FASTQ files for each sample from the EBI ENA database,\nit assesses the overall quality of sequencing data using FastQC and then runs [Salmon](https://combine-lab.github.io/salmon/)\nto perform the quantification over the human transcript sequences. 
Finally all the QC and\nquantification outputs are summarised using the [MultiQC](http://multiqc.info/) tool.\n\nFor the sake of this benchmark we used the first 38 samples out of the full 375 samples dataset.\n\nThe pipeline was executed both on AWS Batch cloud and in the CRG internal Univa cluster,\nusing [Singularity](/blog/2016/more-fun-containers-hpc.html) as containers runtime.\n\nIt's worth noting that with the exception of the two configuration changes detailed below,\nwe used exactly the same pipeline implementation at [this GitHub repository](https://github.com/nextflow-io/rnaseq-encode-nf).\n\nThe AWS deploy used the following configuration profile:\n\n aws.region = 'eu-west-1'\n aws.client.storageEncryption = 'AES256'\n process.queue = 'large'\n executor.name = 'awsbatch'\n executor.awscli = '/home/ec2-user/miniconda/bin/aws'\n\nWhile for the cluster deployment the following configuration was used:\n\n executor = 'crg'\n singularity.enabled = true\n process.container = \"docker://nextflow/rnaseq-nf\"\n process.queue = 'cn-el7'\n process.time = '90 min'\n process.$quant.time = '4.5 h'\n\n### Results\n\nThe AWS Batch Compute environment was configured to use a maximum of 132 CPUs as the number of CPUs\nthat were available in the queue for local cluster deployment.\n\nThe two executions ran in roughly the same time: 2 hours and 24 minutes when running in the\nCRG cluster and 2 hours and 37 minutes when using AWS Batch.\n\nIt must be noted that 14 jobs failed in the Batch deployment, presumably because one or more spot\ninstances were retired. However Nextflow was able to re-schedule the failed jobs automatically\nand the overall pipeline execution completed successfully, also showing the benefits of a truly\nfault tolerant environment.\n\nThe overall cost for running the pipeline with AWS Batch was **$5.47** ($ 3.28 for EC2 instances,\n$1.88 for EBS volume and $0.31 for S3 storage). This means that with ~ $55 we could have\nperformed the same analysis on the full Encode dataset.\n\nIt is more difficult to estimate the cost when using the internal cluster, because we don't\nhave access to such detailed cost accounting. However, as a user, we can estimate it roughly\ncomes out at $0.01 per CPU-Hour. The pipeline needed around 147 CPU-Hour to carry out the analysis,\nhence with an estimated cost of **$1.47** just for the computation.\n\nThe execution report for the Batch execution is available at [this link](https://cdn.rawgit.com/nextflow-io/rnaseq-encode-nf/db303a81/benchmark/aws-batch/report.html)\nand the one for cluster is available [here](https://cdn.rawgit.com/nextflow-io/rnaseq-encode-nf/db303a81/benchmark/crg-cluster/report.html).\n\n### Conclusion\n\nThis post shows how Nextflow integrates smoothly with AWS Batch and how it can be used to\ndeploy and execute real world genomics pipeline in the cloud with ease.\n\nThe auto-scaling ability provided by AWS Batch along with the use of spot instances make\nthe use of the cloud even more cost effective. 
Running on a local cluster may still be cheaper,\neven if it is non trivial to account for all the real costs of a HPC infrastructure.\nHowever the cloud allows flexibility and scalability not possible with common on-premises clusters.\n\nWe also demonstrate how the same Nextflow pipeline can be _transparently_ deployed in two very\ndifferent computing infrastructure, using different containerisation technologies by simply\nproviding a separate configuration profile.\n\nThis approach enables the interoperability across different deployment sites, reduces\noperational and maintenance costs and guarantees consistent results over time.\n\n### Credits\n\nThis post is co-authored with [Francesco Strozzi](https://twitter.com/fstrozzi),\nwho also helped to write the pipeline used for the benchmark in this post and contributed\nto and tested the AWS Batch integration. Thanks to [Emilio Palumbo](https://github.com/emi80)\nthat helped to set-up and configure the AWS Batch environment and [Evan Floden](https://gitter.im/skptic)\nfor the comments.\n", + "images": [] + }, + { + "slug": "2018/bringing-nextflow-to-google-cloud-wuxinextcode", + "title": "Bringing Nextflow to Google Cloud Platform with WuXi NextCODE", + "date": "2018-12-18T00:00:00.000Z", + "content": "\n
\nThis is a guest post authored by Halli Bjornsson, Head of Product Development Operations at WuXi NextCODE and Jonathan Sheffi, Product Manager, Biomedical Data at Google Cloud.\n\n
\n\nGoogle Cloud and WuXi NextCODE are dedicated to advancing the state of the art in biomedical informatics, especially through open source, which allows developers to collaborate broadly and deeply.\n\nWuXi NextCODE is itself a user of Nextflow, and Google Cloud has many customers that use Nextflow. Together, we’ve collaborated to deliver Google Cloud Platform (GCP) support for Nextflow using the [Google Pipelines API](https://cloud.google.com/genomics/pipelines). Pipelines API is a managed computing service that allows the execution of containerized workloads on GCP.\n\n
\n
\n \n
\n
\n \n
\n
\n\n\nNextflow now provides built-in support for Google Pipelines API which allows the seamless deployment of a Nextflow pipeline in the cloud, offloading the process executions as pipelines running on Google's scalable infrastructure with a few commands. This makes it even easier for customers and partners like WuXi NextCODE to process biomedical data using Google Cloud.\n\n### Get started!\n\nThis feature is currently available in the Nextflow edge channel. Follow these steps to get started:\n\n- Install Nextflow from the edge channel exporting the variables shown below and then running the usual Nextflow installer Bash snippet:\n\n ```\n export NXF_VER=18.12.0-edge\n export NXF_MODE=google\n curl https://get.nextflow.io | bash\n ```\n\n- [Enable the Google Genomics API for your GCP projects](https://console.cloud.google.com/flows/enableapi?apiid=genomics.googleapis.com,compute.googleapis.com,storage-api.googleapis.com).\n\n- [Download and set credentials for your Genomics API-enabled project](https://cloud.google.com/docs/authentication/production#obtaining_and_providing_service_account_credentials_manually).\n\n- Change your `nextflow.config` file to use the Google Pipelines executor and specify the required config values for it as [described in the documentation](/docs/edge/google.html#google-pipelines).\n\n- Finally, run your script with Nextflow like usual, specifying a Google Storage bucket as the pipeline work directory with the `-work-dir` option. For example:\n\n ```\n nextflow run rnaseq-nf -work-dir gs://your-bucket/scratch\n ```\n\n
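For illustration, a minimal `nextflow.config` for the Google Pipelines executor might look like the snippet below. Note this is only a sketch: the project ID and region are placeholder values, and the exact option names available should be checked against the documentation linked in the steps above.\n\n```\n// illustrative configuration only: replace the placeholder values with your own\nprocess.executor = 'google-pipelines'\ngoogle.project = 'your-gcp-project-id'\ngoogle.region = 'europe-west2'\n```\n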
\nYou can find more detailed info about available configuration settings and deployment options at [this link](/docs/edge/google.html).\n\nWe’re thrilled to make this contribution available to the Nextflow community!\n",
    "images": [
      "/img/google-cloud.svg",
      "/img/wuxi-nextcode.jpeg"
    ]
  },
  {
    "slug": "2018/clarification-about-nextflow-license",
    "title": "Clarification about the Nextflow license",
    "date": "2018-07-20T00:00:00.000Z",
    "content": "\nOver the past week there was some discussion on social media regarding the Nextflow license\nand its impact on users' workflow applications.\n\n

… don’t use Nextflow, yo. https://t.co/Paip5W1wgG

— Konrad Rudolph 👨‍🔬💻 (@klmr) July 10, 2018
\n\n\n

This is certainly disappointing. An argument in favor of writing workflows in @commonwl, which is independent of the execution engine. https://t.co/mIbdLQQxmf

— John Didion (@jdidion) July 10, 2018
\n\n\n

GPL is generally considered toxic to companies due to fear of the viral nature of the license.

— Jeff Gentry (@geoffjentry) July 10, 2018
\n\n\n### What's the problem with GPL?\n\nNextflow has been released under the GPLv3 license since its early days [over 5 years ago](https://github.com/nextflow-io/nextflow/blob/c080150321e5000a2c891e477bb582df07b7f75f/src/main/groovy/nextflow/Nextflow.groovy).\nGPL is a very popular open source licence used by many projects\n(like, for example, [Linux](https://www.kernel.org/doc/html/v4.17/process/license-rules.html) and [Git](https://git-scm.com/about/free-and-open-source))\nand it has been designed to promote the adoption and spread of open source software and culture.\n\nWith this idea in mind, GPL requires the author of a piece of software, _derived_ from a GPL licensed application or library, to distribute it using the same license, i.e. GPL itself.\n\nThis is generally good, because this requirement incentivizes the growth of the open source ecosystem and the adoption of open source software more widely.\n\nHowever, this is also a reason for concern for some users and organizations, because it's perceived as too strong a requirement by copyright holders (who may not want to disclose their code) and because it can be difficult to interpret what a \*derived\* application is. See for example\n[this post by Titus Brown](http://ivory.idyll.org/blog/2015-on-licensing-in-bioinformatics.html) in this regard.\n\n#### What's the impact of the Nextflow license on my application?\n\nIf you are not distributing your application based on Nextflow, it doesn't affect you in any way.\nIf you are distributing an application that requires Nextflow to be executed, technically speaking your application is dynamically linking to the Nextflow runtime and it uses routines provided by it. For this reason your application should be released as GPLv3. See [here](https://www.gnu.org/licenses/gpl-faq.en.html#GPLStaticVsDynamic) and [here](https://www.gnu.org/licenses/gpl-faq.en.html#IfInterpreterIsGPL).\n\nHowever, this was not our original intention. We don’t consider workflow applications to be subject to the copyleft obligations of the GPL, even though they may link dynamically to Nextflow functionality through normal calls, and we are not interested in enforcing the license requirement on third-party workflow developers and organizations. Therefore you can distribute your workflow application using the license of your choice. For other kinds of derived applications, the GPL license should be used, though.\n\n\n### That's all?\n\nNo. We are aware that this is not enough and that the GPL licence can impose some limitations on the usage of Nextflow for some users and organizations. For this reason we are working with the CRG legal department to move Nextflow to a more permissive open source license. 
This is primarily motivated by our wish to make it more adaptable and compatible with all the different open source ecosystems, but also to remove any remaining legal uncertainty that using Nextflow through linking with its functionality may cause.\n\nWe are expecting that this decision will be made over the summer so stay tuned and continue to enjoy Nextflow.\n", + "images": [] + }, + { + "slug": "2018/conda-support-has-landed", + "title": "Conda support has landed!", + "date": "2018-06-05T00:00:00.000Z", + "content": "\nNextflow aims to ease the development of large scale, reproducible workflows allowing\ndevelopers to focus on the main application logic and to rely on best community tools and\nbest practices.\n\nFor this reason we are very excited to announce that the latest Nextflow version (`0.30.0`) finally\nprovides built-in support for [Conda](https://conda.io/docs/).\n\nConda is a popular package manager that simplifies the installation of software packages\nand the configuration of complex software environments. Above all, it provides access to large\ntool and software package collections maintained by domain specific communities such as\n[Bioconda](https://bioconda.github.io) and [BioBuild](https://biobuilds.org/).\n\nThe native integration with Nextflow allows researchers to develop workflow applications\nin a rapid and easy repeatable manner, reusing community tools, whilst taking advantage of the\nconfiguration flexibility, portability and scalability provided by Nextflow.\n\n### How it works\n\nNextflow automatically creates and activates the Conda environment(s) given the dependencies\nspecified by each process.\n\nDependencies are specified by using the [conda](/docs/latest/process.html#conda) directive,\nproviding either the names of the required Conda packages, the path of a Conda environment yaml\nfile or the path of an existing Conda environment directory.\n\nConda environments are stored on the file system. By default Nextflow instructs Conda to save\nthe required environments in the pipeline work directory. You can specify the directory where the\nConda environments are stored using the `conda.cacheDir` configuration property.\n\n#### Use Conda package names\n\nThe simplest way to use one or more Conda packages consists in specifying their names using the `conda` directive.\nMultiple package names can be specified by separating them with a space. For example:\n\n```\nprocess foo {\n conda \"bwa samtools multiqc\"\n\n \"\"\"\n your_command --here\n \"\"\"\n}\n```\n\nUsing the above definition a Conda environment that includes BWA, Samtools and MultiQC tools\nis created and activated when the process is executed.\n\nThe usual Conda package syntax and naming conventions can be used. The version of a package can be\nspecified after the package name as shown here: `bwa=0.7.15`.\n\nThe name of the channel where a package is located can be specified prefixing the package with\nthe channel name as shown here: `bioconda::bwa=0.7.15`.\n\n#### Use Conda environment files\n\nWhen working in a project requiring a large number of dependencies it can be more convenient\nto consolidate all required tools using a Conda environment file. This is a file that\nlists the required packages and channels, structured using the YAML format. 
For example:\n\n```\nname: my-env\nchannels:\n - bioconda\n - conda-forge\n - defaults\ndependencies:\n - star=2.5.4a\n - bwa=0.7.15\n```\n\nThe path of the environment file can be specified using the `conda` directive:\n\n```\nprocess foo {\n conda '/some/path/my-env.yaml'\n\n '''\n your_command --here\n '''\n}\n```\n\nNote: the environment file name **must** end with a `.yml` or `.yaml` suffix otherwise\nit won't be properly recognized. Also relative paths are resolved against the workflow\nlaunching directory.\n\nThe suggested approach is to store the the Conda environment file in your project root directory\nand reference it in the `nextflow.config` directory using the `baseDir` variable as shown below:\n\n```\nprocess.conda = \"$baseDir/my-env.yaml\"\n```\n\nThis guarantees that the environment paths is correctly resolved independently of the execution path.\n\nSee the [documentation](/docs/latest/conda.html) for more details on how to configure and\nuse Conda environments in your Nextflow workflow.\n\n### Bonus!\n\nThis release includes also a better support for [Biocontainers](https://biocontainers.pro/). So far,\nNextflow users were able to use container images provided by the Biocontainers community. However,\nit was not possible to collect process metrics and runtime statistics within those images due to the usage\nof a legacy version of the `ps` system tool that is not compatible with the one expected by Nextflow.\n\nThe latest version of Nextflow does not require the `ps` tool any more to fetch execution metrics\nand runtime statistics, therefore this information is collected and correctly reported when using Biocontainers\nimages.\n\n### Conclusion\n\nWe are very excited by this new feature bringing the ability to use popular Conda tool collections,\nsuch as Bioconda, directly into Nextflow workflow applications.\n\nNextflow developers have now yet another option to transparently manage the dependencies in their\nworkflows along with [Environment Modules](/docs/latest/process.html#module) and [containers](/docs/latest/docker.html)\n[technology](/docs/latest/singularity.html), giving them great configuration flexibility.\n\nThe resulting workflow applications can easily be reconfigured and deployed across a range of different\nplatforms choosing the best technology according to the requirements of the target system.\n", + "images": [] + }, + { + "slug": "2018/goodbye-zero-hello-apache", + "title": "Goodbye zero, Hello Apache!", + "date": "2018-10-24T00:00:00.000Z", + "content": "\nToday marks an important milestone in the Nextflow project. We are thrilled to announce three important changes to better meet users’ needs and ground the project on a solid foundation upon which to build a vibrant ecosystem of tools and data analysis applications for genomic research and beyond.\n\n### Apache license\n\nNextflow was originally licensed as GPLv3 open source software more than five years ago. GPL is designed to promote the adoption and spread of open source software and culture. On the other hand it has also some controversial side-effects, such as the one on derivative works and legal implications which make the use of GPL released software a headache in many organisations. We have previously discussed these concerns in this blog post and, after community feedback, have opted to change the project license to Apache 2.0.\n\nThis is a popular permissive free software license written by the Apache Software Foundation (ASF). 
Software distributed with this license requires the preservation of the copyright notice and disclaimer. It allows the freedom to use the software for any purpose, to distribute it, to modify it, and to distribute modified versions of the software without dictating the licence terms of the resulting applications and derivative works. We are sure this licensing model addresses the concerns raised by the Nextflow community and will boost further project developments.\n\n### New release schema\n\nIn the time since Nextflow was open sourced, we have released 150 versions which have been used by many organizations to deploy critical production workflows on a large range of computational platforms and under heavy loads and stress conditions.\n\nFor example, at the Centre for Genomic Regulation (CRG) alone, Nextflow has been used to deploy data intensive computation workflows since 2014, and it has orchestrated the execution of over 12 million jobs totalling 1.4 million CPU-hours.\n\n\"Nextflow\n\nThis extensive use across different execution environments has resulted in a reliable software package, and it's therefore finally time to declare Nextflow stable and drop the zero from the version number!\n\nFrom today onwards, Nextflow will use a 3 monthly time-based _stable_ release cycle. Today's release is numbered as **18.10**, the next one will be on January 2019, numbered as 19.01, and so on. This gives our users a more predictable release cadence and allows us to better focus on new feature development and scheduling.\n\nAlong with the 3-months stable release cycle, we will provide a monthly _edge_ release, which will include access to the latest experimental features and developments. As such, it should only be used for evaluation and testing purposes.\n\n### Commercial support\n\nFinally, for organisations requiring commercial support, we have recently incorporated Seqera Labs, a spin-off of the Centre for Genomic Regulation.\n\nSeqera Labs will foster Nextflow adoption as professional open source software by providing commercial support services and exploring new innovative products and solutions.\n\nIt's important to highlight that Seqera Labs will not close or make Nextflow a commercial project. Nextflow is and will continue to be owned by the CRG and the other contributing organisations and individuals.\n\n### Conclusion\n\nThe Nextflow project has reached an important milestone. In the last five years it has grown and managed to become a stable technology used by thousands of people daily to deploy large scale workloads for life science data analysis applications and beyond. The project is now exiting from the experimental stage.\n\nWith the above changes we want to fulfil the needs of researchers, for a reliable tool enabling scalable and reproducible data analysis, along with the demand of production oriented users, who require reliable support and services for critical deployments.\n\nAbove all, our aim is to strengthen the community effort around the Nextflow ecosystem and make it a sustainable and solid technology in the long run.\n\n### Credits\n\nWe want to say thank you to all the people who have supported and contributed to this project to this stage. First of all to Cedric Notredame for his long term commitment to the project within the Comparative Bioinformatics group at CRG. The Open Bioinformatics Foundation (OBF) in the name of Chris Fields and The Ontario Institute for Cancer Research (OICR), namely Dr Lincoln Stein, for supporting the Nextflow change of license. 
We also thank the CRG TBDO department, and in particular Salvatore Cappadona, for his continued support and advice. Finally, thanks to the user community who, with their feedback and constructive criticism, contribute every day to making this project more stable, useful and powerful.\n",
    "images": [
      "/img/nextflow-release-schema-01.png"
    ]
  },
  {
    "slug": "2018/nextflow-meets-dockstore",
    "title": "Nextflow meets Dockstore",
    "date": "2018-09-18T00:00:00.000Z",
    "content": "\n
\nThis post is co-authored with Denis Yuen, lead of the Dockstore project at the Ontario Institute for Cancer Research.\n
\n\nOne key feature of Nextflow is the ability to automatically pull and execute a workflow application directly from a sharing platform such as GitHub. We realised this was critical to allow users to properly track code changes and releases and, above all, to enable the [seamless sharing of workflow projects](/blog/2016/best-practice-for-reproducibility.html).\n\nNextflow never wanted to implement its own centralised workflow registry because we thought that in order for a registry to be viable and therefore useful, it should be technology agnostic and it should be driven by a consensus among the wider user community.\n\nThis is exactly what the [Dockstore](https://dockstore.org/) project is designed for and for this reason we are thrilled to announce that Dockstore has just released the support for Nextflow workflows in its latest release!\n\n### Dockstore in a nutshell\n\nDockstore is an open platform that collects and catalogs scientific data analysis tools and workflows, starting from the genomics community. It’s developed by the [OICR](https://oicr.on.ca/) in collaboration with [UCSC](https://ucscgenomics.soe.ucsc.edu/) and it is based on the [GA4GH](https://www.ga4gh.org/) open standards and the FAIR principles i.e. the idea to make research data and applications findable, accessible, interoperable and reusable ([FAIR](https://www.nature.com/articles/sdata201618)).\n\n\"Dockstore\n\nIn Dockstore’s initial release of support for Nextflow, users will be able to register and display Nextflow workflows. Many of Dockstore’s cross-language features will be available such as [searching](https://dockstore.org/search?descriptorType=nfl&searchMode=files), displaying metadata information on authorship from Nextflow’s config ([author and description](https://www.nextflow.io/docs/latest/config.html?highlight=author#scope-manifest)), displaying the [Docker images](https://dockstore.org/workflows/github.com/nf-core/hlatyping:1.1.1?tab=tools) used by a workflow, and limited support for displaying a visualization of the [workflow structure](https://dockstore.org/workflows/github.com/nf-core/hlatyping:1.1.1?tab=dag).\n\nThe Dockstore team will initially work to on-board the high-quality [nf-core](https://github.com/nf-core) workflows curated by the Nextflow community. However, all developers that develop Nextflow workflows will be able to login, contribute, and maintain workflows starting with our standard [workflow tutorials](https://docs.dockstore.org/docs/publisher-tutorials/workflows/).\n\nMoving forward, the Dockstore team hopes to engage more with the Nextflow community and integrate Nextflow code in order to streamline the process of publishing Nextflow workflows and draw better visualizations of Nextflow workflows. Dockstore also hopes to work with a cloud vendor to add browser based launch-with support for Nextflow workflows.\n\nFinally, support for Nextflow workflows in Dockstore will also enable the possibility of cloud platforms that implement [GA4GH WES](https://github.com/ga4gh/workflow-execution-service-schemas) to run Nextflow workflows.\n\n### Conclusion\n\nWe welcome the support for Nextflow workflows in the Dockstore platform. 
This is a valuable contribution and presents great opportunities for workflow developers and the wider scientific community.\n\nWe invite all Nextflow developers to register their data analysis applications in the Dockstore platform to make them accessible and reusable to a wider community of researchers.\n", + "images": [ + "/img/dockstore.png" + ] + }, + { + "slug": "2018/nextflow-turns-5", + "title": "Nextflow turns five! Happy birthday!", + "date": "2018-04-03T00:00:00.000Z", + "content": "\nNextflow is growing up. The past week marked five years since the [first commit](https://github.com/nextflow-io/nextflow/commit/c080150321e5000a2c891e477bb582df07b7f75f) of the project on GitHub. Like a parent reflecting on their child attending school for the first time, we know reaching this point hasn’t been an entirely solo journey, despite Paolo's best efforts!\n\nA lot has happened recently and we thought it was time to highlight some of the recent evolutions. We also take the opportunity to extend the warmest of thanks to all those who have contributed to the development of Nextflow as well as the fantastic community of users who consistently provide ideas, feedback and the occasional late night banter on the [Gitter channel](https://gitter.im/nextflow-io/nextflow).\n\nHere are a few neat developments churning out of the birthday cake mix.\n\n### nf-core\n\n[nf-core](https://nf-core.github.io/) is a community effort to provide a home for high quality, production-ready, curated analysis pipelines built using Nextflow. The project has been initiated and is being led by [Phil Ewels](https://github.com/ewels) of [MultiQC](http://multiqc.info/) fame. The principle is that _nf-core_ pipelines can be used out-of-the-box or as inspiration for something different.\n\nAs well as being a place for best-practise pipelines, other features of _nf-core_ include the [cookie cutter template tool](https://github.com/nf-core/cookiecutter) which provides a fast way to create a dependable workflow using many of Nextflow’s sweet capabilities such as:\n\n- _Outline:_ Skeleton pipeline script.\n- _Data:_ Reference Genome implementation (AWS iGenomes).\n- _Configuration:_ Robust configuration setup.\n- _Containers:_ Skeleton files for Docker image generation.\n- _Reporting:_ HTML email functionality and and HTML results output.\n- _Documentation:_ Installation, Usage, Output, Troubleshooting, etc.\n- _Continuous Integration:_ Skeleton files for automated testing using Travis CI.\n\nThere is also a Python package with helper tools for Nextflow.\n\nYou can find more information about the community via the project [website](https://nf-core.github.io), [GitHub repository](https://github.com/nf-core), [Twitter account](https://twitter.com/nf_core) or join the dedicated [Gitter](https://gitter.im/nf-core/Lobby) chat.\n\n
\n\n[![nf-core logo](/img/nf-core-logo-min.png)](https://nf-co.re)\n\n
\n\n### Kubernetes has landed\n\nAs of version 0.28.0 Nextflow now has support for Kubernetes. If you don’t know much about Kubernetes, at its heart it is an open-source platform for the management and deployment of containers at scale. Google led the initial design and it is now maintained by the Cloud Native Computing Foundation. I found the [The Illustrated Children's Guide to Kubernetes](https://www.youtube.com/watch?v=4ht22ReBjno) particularly useful in explaining the basic vocabulary and concepts.\n\nKubernetes looks be one of the key technologies for the application of containers in the cloud as well as for building Infrastructure as a Service (IaaS) and Platform and a Service (PaaS) applications. We have been approached by many users who wish to use Nextflow with Kubernetes to be able to deploy workflows across both academic and commercial settings. With enterprise versions of Kubernetes such as Red Hat's [OpenShift](https://www.openshift.com/), it was becoming apparent there was a need for native execution with Nextflow.\n\nThe new command `nextflow kuberun` launches the Nextflow driver as a _pod_ which is then able to run workflow tasks as other pods within a Kubernetes cluster. You can read more in the documentation on Kubernetes support for Nextflow [here](https://www.nextflow.io/docs/latest/kubernetes.html).\n\n![Nextflow and Kubernetes](/img/nextflow-kubernetes-min.png)\n\n### Improved reporting and notifications\n\nFollowing the hackathon in September we wrote about the addition of HTML trace reports that allow for the generation HTML detailing resource usage (CPU time, memory, disk i/o etc).\n\nThanks to valuable feedback there has continued to be many improvements to the reports as tracked through the Nextflow GitHub issues page. Reports are now able to display [thousands of tasks](https://github.com/nextflow-io/nextflow/issues/547) and include extra information such as the [container engine used](https://github.com/nextflow-io/nextflow/issues/521). Tasks can be filtered and an [overall progress bar](https://github.com/nextflow-io/nextflow/issues/534) has been added.\n\nYou can explore a [real-world HTML report](/misc/nf-trace-report2.html) and more information on HTML reports can be found in the [documentation](https://www.nextflow.io/docs/latest/tracing.html).\n\nThere has also been additions to workflow notifications. Currently these can be configured to automatically send a notification email when a workflow execution terminates. You can read more about how to setup notifications in the [documentation](https://www.nextflow.io/docs/latest/mail.html?highlight=notification#workflow-notification).\n\n### Syntax-tic!\n\nWriting workflows no longer has to be done in monochrome. There is now syntax highlighting for Nextflow in the popular [Atom editor](https://atom.io) as well as in [Visual Studio Code](https://code.visualstudio.com).\n\n
\n\n[![Nextflow syntax highlighting with Atom](/img/atom-min.png)](/img/atom-min.png)\n\n
\n\n[![Nextflow syntax highlighting with VSCode](/img/vscode-min.png)](/img/vscode-min.png)\n\n
\n\nYou can find the Atom plugin by searching for Nextflow in Atoms package installer or clicking [here](https://atom.io/packages/language-nextflow). The Visual Studio plugin can be downloaded [here](https://marketplace.visualstudio.com/items?itemName=nextflow.nextflow).\n\nOn a related note, Nextflow is now an official language on GitHub!\n\n![GitHub nextflow syntax](/img/github-nf-syntax-min.png)\n\n### Conclusion\n\nNextflow developments are progressing faster than ever and with the help of the community, there are a ton of great new features on the way. If you have any suggestions of your killer NF idea then please drop us a line, open an issue or even better, join in the fun.\n\nOver the coming months Nextflow will be reaching out with several training and presentation sessions across the US and Europe. We hope to see as many of you as possible on the road.\n", + "images": [] + }, + { + "slug": "2019/demystifying-nextflow-resume", + "title": "Demystifying Nextflow resume", + "date": "2019-06-24T00:00:00.000Z", + "content": "\n_This two-part blog aims to help users understand Nextflow’s powerful caching mechanism. Part one describes how it works whilst part two will focus on execution provenance and troubleshooting. You can read part two [here](/blog/2019/troubleshooting-nextflow-resume.html)_\n\nTask execution caching and checkpointing is an essential feature of any modern workflow manager and Nextflow provides an automated caching mechanism with every workflow execution. When using the `-resume` flag, successfully completed tasks are skipped and the previously cached results are used in downstream tasks. But understanding the specifics of how it works and debugging situations when the behaviour is not as expected is a common source of frustration.\n\nThe mechanism works by assigning a unique ID to each task. This unique ID is used to create a separate execution directory, called the working directory, where the tasks are executed and the results stored. A task’s unique ID is generated as a 128-bit hash number obtained from a composition of the task’s:\n\n- Inputs values\n- Input files\n- Command line string\n- Container ID\n- Conda environment\n- Environment modules\n- Any executed scripts in the bin directory\n\n### How does resume work?\n\nThe `-resume` command line option allows for the continuation of a workflow execution. It can be used in its most basic form with:\n\n```\n$ nextflow run nextflow-io/hello -resume\n```\n\nIn practice, every execution starts from the beginning. However, when using resume, before launching a task, Nextflow uses the unique ID to check if:\n\n- the working directory exists\n- it contains a valid command exit status\n- it contains the expected output files.\n\nIf these conditions are satisfied, the task execution is skipped and the previously computed outputs are applied. When a task requires recomputation, ie. the conditions above are not fulfilled, the downstream tasks are automatically invalidated.\n\n### The working directory\n\nBy default, the task work directories are created in the directory from where the pipeline is launched. This is often a scratch storage area that can be cleaned up once the computation is completed. A different location for the execution work directory can be specified using the command line option `-w` e.g.\n\n```\n$ nextflow run \n\nThe ANSI log is implicitly disabled when the nextflow is launched in the background i.e. when using the `-bg` option. 
It can also be explicitly disabled using the `-ansi-log false` option or setting the `NXF_ANSI_LOG=false` variable in your launching environment.\n\n#### NCBI SRA data source\n\nThe support for NCBI SRA archive was introduced in the [previous edge release](/blog/2019/release-19.03.0-edge.html). Given the very positive reaction, we are graduating this feature into the stable release for general availability.\n\n#### Sharing\n\nThis version includes also a new Git repository provider for the [Gitea](https://gitea.io) self-hosted source code management system, which is added to the already existing support for GitHub, Bitbucket and GitLab sharing platforms.\n\n#### Reports and metrics\n\nFinally, this version includes important enhancements and bug fixes for the task executions metrics collected by Nextflow. If you are using this feature we strongly suggest updating Nextflow to this version.\n\nRemember that updating can be done with the `nextflow -self-update` command.\n\n### Changelog\n\nThe complete list of changes and bug fixes is available on GitHub at [this link](https://github.com/nextflow-io/nextflow/releases/tag/v19.04.0).\n\n### Contributions\n\nSpecial thanks to all people contributed to this release by reporting issues, improving the docs or submitting (patiently) a pull request (sorry if we have missed somebody):\n\n- [Alex Cerjanic](https://github.com/acerjanic)\n- [Anthony Underwood](https://github.com/aunderwo)\n- [Akira Sekiguchi](https://github.com/pachiras)\n- [Bill Flynn](https://github.com/wflynny)\n- [Jorrit Boekel](https://github.com/glormph)\n- [Olga Botvinnik](https://github.com/olgabot)\n- [Ólafur Haukur Flygenring](https://github.com/olifly)\n- [Sven Fillinger](https://github.com/sven1103)\n", + "images": [] + }, + { + "slug": "2019/troubleshooting-nextflow-resume", + "title": "Troubleshooting Nextflow resume", + "date": "2019-07-01T00:00:00.000Z", + "content": "\n_This two-part blog aims to help users understand Nextflow’s powerful caching mechanism. Part one describes how it works whilst part two will focus on execution provenance and troubleshooting. You can read part one [here](/blog/2019/demystifying-nextflow-resume.html)_.\n\n### Troubleshooting resume\n\nIf your workflow execution is not resumed as expected, there exists several strategies to debug the problem.\n\n#### Modified input file(s)\n\nMake sure that there has been no change in your input files. Don’t forget the unique task hash is computed by taking into account the complete file path, the last modified timestamp and the file size. If any of these change, the workflow will be re-executed, even if the input content is the same.\n\n#### A process modifying one or more inputs\n\nA process should never alter input files. When this happens, the future execution of tasks will be invalidated for the same reason explained in the previous point.\n\n#### Inconsistent input file attributes\n\nSome shared file system, such as NFS, may report inconsistent file timestamp i.e. a different timestamp for the same file even if it has not been modified. There is an option to use the [lenient mode of caching](https://www.nextflow.io/docs/latest/process.html#cache) to avoid this problem.\n\n#### Race condition in a global variable\n\nNextflow does its best to simplify parallel programming and to prevent race conditions and the access of shared resources. One of the few cases in which a race condition may arise is when using a global variable with two (or more) operators. 
For example:\n\n```\nChannel\n .from(1,2,3)\n .map { it -> X=it; X+=2 }\n .println { \"ch1 = $it\" }\n\nChannel\n .from(1,2,3)\n .map { it -> X=it; X*=2 }\n .println { \"ch2 = $it\" }\n```\n\nThe problem with this snippet is that the `X` variable in the closure definition is defined in the global scope. Since operators are executed in parallel, the `X` value can, therefore, be overwritten by the other `map` invocation.\n\nThe correct implementation requires the use of the `def` keyword to declare the variable local.\n\n```\nChannel\n .from(1,2,3)\n .map { it -> def X=it; X+=2 }\n .view { \"ch1 = $it\" }\n\nChannel\n .from(1,2,3)\n .map { it -> def X=it; X*=2 }\n .view { \"ch2 = $it\" }\n```\n\n#### Non-deterministic input channels\n\nWhile dataflow channel ordering is guaranteed i.e. data is read in the same order in which it’s written in the channel, when a process declares as input two or more channels, each of which is the output of a different process, the overall input ordering is not consistent across different executions.\n\nConsider the following snippet:\n\n```\nprocess foo {\n input: set val(pair), file(reads) from reads_ch\n output: set val(pair), file('*.bam') into bam_ch\n \"\"\"\n your_command --here\n \"\"\"\n}\n\nprocess bar {\n input: set val(pair), file(reads) from reads_ch\n output: set val(pair), file('*.bai') into bai_ch\n \"\"\"\n other_command --here\n \"\"\"\n}\n\nprocess gather {\n input:\n set val(pair), file(bam) from bam_ch\n set val(pair), file(bai) from bai_ch\n \"\"\"\n merge_command $bam $bai\n \"\"\"\n}\n```\n\nThe inputs declared in the gather process can be delivered in any order as the execution order of the process `foo` and `bar` is not deterministic due to parallel executions.\n\nTherefore, the input of the third process needs to be synchronized using the `join` operator or a similar approach. The third process should be written as:\n\n```\nprocess gather {\n input:\n set val(pair), file(bam), file(bai) from bam_ch.join(bai_ch)\n \"\"\"\n merge_command $bam $bai\n \"\"\"\n}\n```\n\n#### Still in trouble?\n\nThese are most frequent causes of problems with the Nextflow resume mechanism. If you are still not able to resolve\nyour problem, identify the first process not resuming correctly, then run your script twice using `-dump-hashes`. You can then compare the resulting `.nextflow.log` files (the first will be named `.nextflow.log.1`).\n\nUnfortunately, the information reported by `-dump-hashes` can be quite cryptic, however, with the help of a good _diff_ tool it is possible to compare the two log files to identify the reason for the cache to be invalidated.\n\n#### The golden rule\n\nNever try to debug this kind of problem with production data! This issue can be annoying, but when it happens\nit should be able to be replicated in a consistent manner with any data.\n\nTherefore, we always suggest Nextflow developers include in their pipeline project\na small synthetic dataset to easily execute and test the complete pipeline execution in a few seconds.\nThis is the golden rule for debugging and troubleshooting execution problems avoids getting stuck with production data.\n\n#### Resume by default?\n\nGiven the majority of users always apply resume, we recently discussed having resume applied by the default.\n\nIs there any situation where you do not use resume? Would a flag specifying `-no-cache` be enough to satisfy these use cases?\n\nWe want to hear your thoughts on this. Help steer Nextflow development and vote in the twitter poll below.\n\n

Should -resume⏯️ be the default when launching a Nextflow pipeline?

— Nextflow (@nextflowio) July 1, 2019
\n\n\n
\n*In the following post of this series, we will show how to produce a provenance report using a built-in Nextflow command.*\n", + "images": [] + }, + { + "slug": "2020/cli-docs-release", + "title": "The Nextflow CLI - tricks and treats!", + "date": "2020-10-22T00:00:00.000Z", + "content": "\nFor most developers, the command line is synonymous with agility. While tools such as [Nextflow Tower](https://tower.nf) are opening up the ecosystem to a whole new set of users, the Nextflow CLI remains a bedrock for pipeline development. The CLI in Nextflow has been the core interface since the beginning; however, its full functionality was never extensively documented. Today we are excited to release the first iteration of the CLI documentation available on the [Nextflow website](https://www.nextflow.io/docs/edge/cli.html).\n\nAnd given Halloween is just around the corner, in this blog post we'll take a look at 5 CLI tricks and examples which will make your life easier in designing, executing and debugging data pipelines. We are also giving away 5 limited-edition Nextflow hoodies and sticker packs so you can code in style this Halloween season!\n\n### 1. Invoke a remote pipeline execution with the latest revision\n\nNextflow facilitates easy collaboration and re-use of existing pipelines in multiple ways. One of the simplest ways to do this is to use the URL of the Git repository.\n\n```\n$ nextflow run https://www.github.com/nextflow-io/hello\n```\n\nWhen executing a pipeline using the run command, it first checks to see if it has been previously downloaded in the ~/.nextflow/assets directory, and if so, Nextflow uses this to execute the pipeline. If the pipeline is not already cached, Nextflow will download it, store it in the `$HOME/.nextflow/` directory and then launch the execution.\n\nHow can we make sure that we always run the latest code from the remote pipeline? We simply need to add the `-latest` option to the run command, and Nextflow takes care of the rest.\n\n```\n$ nextflow run nextflow-io/hello -latest\n```\n\n### 2. Query work directories for a specific execution\n\nFor every invocation of Nextflow, all the metadata about an execution is stored including task directories, completion status and time etc. We can use the `nextflow log` command to generate a summary of this information for a specific run.\n\nTo see a list of work directories associated with a particular execution (for example, `tiny_leavitt`), use:\n\n```\n$ nextflow log tiny_leavitt\n```\n\nTo filter out specific process-level information from the logs of any execution, we simply need to use the fields (-f) option and specify the fields.\n\n```\n$ nextflow log tiny_leavitt –f 'process, hash, status, duration'\n```\n\nThe hash is the name of the work directory where the process was executed; therefore, the location of a process work directory would be something like `work/74/68ff183`.\n\nThe log command also has other child options including `-before` and `-after` to help with the chronological inspection of logs.\n\n### 3. Top-level configuration\n\nNextflow emphasizes customization of pipelines and exposes multiple options to facilitate this. The configuration is applied to multiple Nextflow commands and is therefore a top-level option. 
In practice, this means specifying configuration options _before_ the command.\n\nNextflow CLI provides two kinds of config overrides - the soft override and the hard override.\n\nThe top-level soft override \"-c\" option allows us to change the previous config in an additive manner, overriding only the fields included the configuration file.\n\n```\n$ nextflow -c my.config run nextflow-io/hello\n```\n\nOn the other hand, the hard override `-C` completely replaces and ignores any additional configurations.\n\n $ nextflow –C my.config nextflow-io/hello\n\nMoreover, we can also use the config command to inspect the final inferred configuration and view any profiles.\n\n```\n$ nextflow config -show-profiles\n```\n\n### 4. Passing in an input parameter file\n\nNextflow is designed to work across both research and production settings. In production especially, specifying multiple parameters for the pipeline on the command line becomes cumbersome. In these cases, environment variables or config files are commonly used which contain all input files, options and metadata. Love them or hate them, YAML and JSON are the standard formats for human and machines, respectively.\n\nThe Nextflow run option `-params-file` can be used to pass in a file containing parameters in either format.\n\n```\n$ nextflow run nextflow-io/rnaseq -params-file run_42.yaml\n```\n\nThe YAML file could contain the following.\n\n```\nreads : \"s3://gatk-data/run_42/reads/*_R{1,2}_*.fastq.gz\"\nbwa_index : \"$baseDir/index/*.bwa-index.tar.gz\"\npaired_end : true\npenalty : 12\n```\n\n### 5. Specific workflow entry points\n\nThe recently released [DSL2](https://www.nextflow.io/blog/2020/dsl2-is-here.html) adds powerful modularity to Nextflow and enables scripts to contain multiple workflows. By default, the unnamed workflow is assumed to be the main entry point for the script, however, with numerous named workflows, the entry point can be customized by using the `entry` child-option of the run command.\n\n $ nextflow run main.nf -entry workflow1\n\nThis allows users to run a specific sub-workflow or a section of their entire workflow script. For more information, refer to the [implicit workflow](https://www.nextflow.io/docs/latest/dsl2.html#implicit-workflow) section of the documentation.\n\nAdditionally, as of version 20.09.1-edge, you can specify the script in a project to run other than `main.nf` using the command line option\n`-main-script`.\n\n $ nextflow run http://github.com/my/pipeline -main-script my-analysis.nf\n\n### Bonus trick! Web dashboard launched from the CLI\n\nThe tricks above highlight the functionality of the Nextflow CLI. However, for long-running workflows, monitoring becomes all the more crucial. With Nextflow Tower, we can invoke any Nextflow pipeline execution from the CLI and use the integrated dashboard to follow the workflow execution wherever we are. Sign-in to [Tower](https://tower.nf) using your GitHub credentials, obtain your token from the Getting Started page and export them into your terminal, `~/.bashrc` or include them in your `nextflow.config`.\n\n```\n$ export TOWER_ACCESS_TOKEN=my-secret-tower-key\n$ export NXF_VER=20.07.1\n```\n\nNext simply add the \"-with-tower\" child-option to any Nextflow run command. 
A URL with the monitoring dashboard will appear.\n\n```\n$ nextflow run nextflow-io/hello -with-tower\n```\n\n### Nextflow Giveaway\n\nIf you want to look stylish while you put the above tips into practice, or simply like free stuff, we are giving away five of our latest Nextflow hoodie and sticker packs. Retweet or like the Nextflow tweet about this article and we will draw and notify the winners on October 31st!\n\n### About the Author\n\n[Abhinav Sharma](https://www.linkedin.com/in/abhi18av/) is a Bioinformatics Engineer at [Seqera Labs](https://www.seqera.io) interested in Data Science and Cloud Engineering. He enjoys working on all things Genomics, Bioinformatics and Nextflow.\n\n### Acknowledgements\n\nShout out to [Kevin Sayers](https://github.com/KevinSayers) and [Alexander Peltzer](https://github.com/apeltzer) for their earlier efforts in documenting the CLI and which inspired this work.\n\n_The latest CLI docs can be found in the edge release docs at [https://www.nextflow.io/docs/latest/cli.html](https://www.nextflow.io/docs/latest/cli.html)._\n", + "images": [] + }, + { + "slug": "2020/dsl2-is-here", + "title": "Nextflow DSL 2 is here!", + "date": "2020-07-24T00:00:00.000Z", + "content": "\nWe are thrilled to announce the stable release of Nextflow DSL 2 as part of the latest 20.07.1 version!\n\nNextflow DSL 2 represents a major evolution of the Nextflow language and makes it possible to scale and modularise your data analysis pipeline while continuing to use the Dataflow programming paradigm that characterises the Nextflow processing model.\n\nWe spent more than one year collecting user feedback and making sure that DSL 2 would naturally fit the programming experience Nextflow developers are used to.\n\n#### DLS 2 in a nutshell\n\nBackward compatibility is a paramount value, for this reason the changes introduced in the syntax have been minimal and above all, guarantee the support of all existing applications. DSL 2 will be an opt-in feature for at least the next 12 to 18 months. After this transitory period, we plan to make it the default Nextflow execution mode.\n\nAs of today, to use DSL 2 in your Nextflow pipeline, you are required to use the following declaration at the top of your script:\n\n```\nnextflow.enable.dsl=2\n```\n\nNote that the previous `nextflow.preview` directive is still available, however, when using the above declaration the use of the final syntax is enforced.\n\n#### Nextflow modules\n\nA module file is nothing more than a Nextflow script containing one or more `process` definitions that can be imported from another Nextflow script.\n\nThe only difference when compared with legacy syntax is that the process is not bound with specific input and output channels, as was previously required using the `from` and `into` keywords respectively. Consider this example of the new syntax:\n\n```\nprocess INDEX {\n input:\n path transcriptome\n output:\n path 'index'\n script:\n \"\"\"\n salmon index --threads $task.cpus -t $transcriptome -i index\n \"\"\"\n}\n```\n\nThis allows the definition of workflow processes that can be included from any other script and invoked as a custom function within the new `workflow` scope. This effectively allows for the composition of the pipeline logic and enables reuse of workflow components. 
We anticipate this to improve both the speed that users can develop new pipelines, and the robustness of these pipelines through the use of validated modules.\n\nAny process input can be provided as a function argument using the usual channel semantics familiar to Nextflow developers. Moreover process outputs can either be assigned to a variable or accessed using the implicit `.out` attribute in the scope implicitly defined by the process name itself. See the example below:\n\n```\ninclude { INDEX; FASTQC; QUANT; MULTIQC } from './some/module/script.nf'\n\nread_pairs_ch = channel.fromFilePairs( params.reads)\n\nworkflow {\n INDEX( params.transcriptome )\n FASTQC( read_pairs_ch )\n QUANT( INDEX.out, read_pairs_ch )\n MULTIQC( QUANT.out.mix(FASTQC.out).collect(), multiqc_file )\n}\n```\n\nAlso enhanced is the ability to use channels as inputs multiple times without the need to duplicate them (previously done with the special into operator) which makes the resulting pipeline code more concise, fluent and therefore readable!\n\n#### Sub-workflows\n\nNotably, the DSL 2 syntax allows for the definition of reusable processes as well as sub-workflow libraries. The only requirement is to provide a `workflow` name that will be used to reference and declare the corresponding inputs and outputs using the new `take` and `emit` keywords. For example:\n\n```\nworkflow RNASEQ {\n take:\n transcriptome\n read_pairs_ch\n\n main:\n INDEX(transcriptome)\n FASTQC(read_pairs_ch)\n QUANT(INDEX.out, read_pairs_ch)\n\n emit:\n QUANT.out.mix(FASTQC.out).collect()\n}\n```\n\nNow named sub-workflows can be used in the same way as processes, allowing you to easily include and reuse multi-step workflows as part of larger workflows. Find more details [here](/docs/latest/dsl2.html).\n\n#### More syntax sugar\n\nAnother exciting feature of Nextflow DSL 2 is the ability to compose built-in operators, pipeline processes and sub-workflows with the pipe (|) operator! For example the last line in the above example could be written as:\n\n```\nemit:\n QUANT.out | mix(FASTQC.out) | collect\n```\n\nThis syntax finally realizes the Nextflow vision of empowering developers to write complex data analysis applications with a simple but powerful language that mimics the expressiveness of the Unix pipe model but at the same time makes it possible to handle complex data structures and patterns as is required for highly parallelised and distributed computational workflows.\n\nAnother change is the introduction of `channel` as an alternative name as a synonym of `Channel` type identifier and therefore allows the use of `channel.fromPath` instead of `Channel.fromPath` and so on. This is a small syntax sugar to keep the capitazionation consistent with the rest of the language.\n\nMoreover, several process inputs and outputs syntax shortcuts were removed when using the final version of DSL 2 to make it more predictable. For example, with DSL1, in a tuple input or output declaration the component type could be omitted, for example:\n\n```\ninput:\n tuple foo, 'bar'\n```\n\nThe `foo` identifier was implicitly considered an input value declaration instead the string `'bar'` was considered a shortcut for `file('bar')`. 
However, this was a bit confusing especially for new users and therefore using DSL 2, the fully qualified version must be used:\n\n```\ninput:\n tuple val(foo), path('bar')\n```\n\nYou can find more detailed migration notes at [this link](/docs/latest/dsl2.html#dsl2-migration-notes).\n\n#### What's next\n\nAs always, reaching an important project milestone can be viewed as a major success, but at the same time the starting point for challenges and developments. Having a modularization mechanism opens new needs and possibilities. The first one of which will be focused on the ability to test and validate process modules independently using a unit-testing style approach. This will definitely help to make the resulting pipelines more resilient.\n\nAnother important area for the development of the Nextflow language will be the ability to better formalise pipeline inputs and outputs and further decouple for the process declaration. Nextflow currently strongly relies on the `publishDir` constructor for the generation of the workflow outputs.\n\nHowever in the new _module_ world, this approach results in `publishDir` being tied to a single process definition. The plan is instead to extend this concept in a more general and abstract manner, so that it will be possible to capture and redirect the result of any process and sub-workflow based on semantic annotations instead of hardcoding it at the task level.\n\n### Conclusion\n\nWe are extremely excited about today's release. This was a long awaited advancement and therefore we are very happy to make it available for general availability to all Nextflow users. We greatly appreciate all of the community feedback and ideas over the past year which have shaped DSL 2.\n\nWe are confident this represents a big step forward for the project and will enable the writing of a more scalable and complex data analysis pipeline and above all, a more enjoyable experience.\n", + "images": [] + }, + { + "slug": "2020/groovy3-syntax-sugar", + "title": "More syntax sugar for Nextflow developers!", + "date": "2020-11-03T00:00:00.000Z", + "content": "\nThe latest Nextflow version 2020.10.0 is the first stable release running on Groovy 3.\n\nThe first benefit of this change is that now Nextflow can be compiled and run on any modern Java virtual machine,\nfrom Java 8, all the way up to the latest Java 15!\n\nAlong with this, the new Groovy runtime brings a whole lot of syntax enhancements that can be useful in\nthe everyday life of pipeline developers. Let's see them more in detail.\n\n### Improved not operator\n\nThe `!` (not) operator can now prefix the `in` and `instanceof` keywords.\nThis makes for more concise writing of some conditional expression, for example, the following snippet:\n\n```\nlist = [10,20,30]\n\nif( !(x in list) ) {\n // ..\n}\nelse if( !(x instanceof String) ) {\n // ..\n}\n```\n\ncould be replaced by the following:\n\n```\nlist = [10,20,30]\n\nif( x !in list ) {\n // ..\n}\nelse if( x !instanceof String ) {\n // ..\n}\n```\n\nAgain, this is a small syntax change which makes the code a little more\nreadable.\n\n### Elvis assignment operator\n\nThe elvis assignment operator `?=` allows the assignment of a value only if it was not\npreviously assigned (or if it evaluates to `null`). 
Consider the following example:\n\n```\ndef opts = [foo: 1]\n\nopts.foo ?= 10\nopts.bar ?= 20\n\nassert opts.foo == 1\nassert opts.bar == 20\n```\n\nIn this snippet, the assignment `opts.foo ?= 10` would be ignored because the dictionary `opts` already\ncontains a value for the `foo` attribute, while it is now assigned as expected.\n\nIn other words this is a shortcut for the following idiom:\n\n```\nif( some_variable != null ) {\n some_variable = 'Hello'\n}\n```\n\nIf you are wondering why it's called _Elvis_ assignment, well it's simple, because there's also the [Elvis operator](https://groovy-lang.org/operators.html#_elvis_operator) that you should know (and use!) already. 😆\n\n### Java style lambda expressions\n\nGroovy 3 supports the syntax for Java lambda expression. If you don't know what a Java lambda expression is\ndon't worry; it's a concept very similar to a Groovy closure, though with slight differences\nboth in the syntax and the semantic. In a few words, a Groovy closure can modify a variable in the outside scope,\nwhile a Java lambda cannot.\n\nIn terms of syntax, a Groovy closure is defined as:\n\n```\n{ it -> SOME_EXPRESSION_HERE }\n```\n\nWhile Java lambda expression looks like:\n\n```\nit -> { SOME_EXPRESSION_HERE }\n```\n\nwhich can be simplified to the following form when the expression is a single statement:\n\n```\nit -> SOME_EXPRESSION_HERE\n```\n\nThe good news is that the two syntaxes are interoperable in many cases and we can use the _lambda_\nsyntax to get rid-off of the curly bracket parentheses used by the Groovy notation to make our Nextflow\nscript more readable.\n\nFor example, the following Nextflow idiom:\n\n```\nChannel\n .of( 1,2,3 )\n .map { it * it +1 }\n .view { \"the value is $it\" }\n```\n\nCan be rewritten using the lambda syntax as:\n\n```\nChannel\n .of( 1,2,3 )\n .map( it -> it * it +1 )\n .view( it -> \"the value is $it\" )\n```\n\nIt is a bit more consistent. Note however that the `it ->` implicit argument is now mandatory (while when using the closure syntax it could be omitted). Also, when the operator argument is not _single_ value, the lambda requires the\nround parentheses to define the argument e.g.\n\n```\nChannel\n .of( 1,2,3 )\n .map( it -> tuple(it * it, it+1) )\n .view( (a,b) -> \"the values are $a and $b\" )\n```\n\n### Full support for Java streams API\n\nSince version 8, Java provides a [stream library](https://winterbe.com/posts/2014/07/31/java8-stream-tutorial-examples/) that is very powerful and implements some concepts and operators similar to Nextflow channels.\n\nThe main differences between the two are that Nextflow channels and the corresponding operators are _non-blocking_\ni.e. 
their evaluation is performed asynchronously without blocking your program execution, while Java streams are\nexecuted in a synchronous manner (at least by default).\n\nA Java stream looks like the following:\n\n```\nassert (1..10).stream()\n .filter(e -> e % 2 == 0)\n .map(e -> e * 2)\n .toList() == [4, 8, 12, 16, 20]\n\n```\n\nNote, in the above example\n[filter](https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#filter-java.util.function.Predicate-),\n[map](https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#map-java.util.function.Function-) and\n[toList](https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collectors.html#toList--)\nmethods are Java stream operator not the\n[Nextflow](https://www.nextflow.io/docs/latest/operator.html#filter)\n[homonymous](https://www.nextflow.io/docs/latest/operator.html#map)\n[ones](https://www.nextflow.io/docs/latest/operator.html#tolist).\n\n### Java style method reference\n\nThe new runtime also allows for the use of the `::` operator to reference an object method.\nThis can be useful to pass a method as an argument to a Nextflow operator in a similar\nmanner to how it was already possible using a closure. For example:\n\n```\nChannel\n .of( 'a', 'b', 'c')\n .view( String::toUpperCase )\n```\n\nThe above prints:\n\n```\n A\n B\n C\n```\n\nBecause to [view](https://www.nextflow.io/docs/latest/operator.html#filter) operator applied\nthe method [toUpperCase](https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#toUpperCase--)\nto each element emitted by the channel.\n\n### Conclusion\n\nThe new Groovy runtime brings a lot of syntax sugar for Nextflow pipelines and allows the use of modern Java\nruntime which delivers better performance and resource usage.\n\nThe ones listed above are only a small selection which may be useful to everyday Nextflow developers.\nIf you are curious to learn more about all the changes in the new Groovy parser you can find more details in\n[this link](https://groovy-lang.org/releasenotes/groovy-3.0.html).\n\nFinally, a big thanks to the Groovy community for their significant efforts in developing and maintaining this\ngreat programming environment.\n", + "images": [] + }, + { + "slug": "2020/learning-nextflow-in-2020", + "title": "Learning Nextflow in 2020", + "date": "2020-12-01T00:00:00.000Z", + "content": "\nWith the year nearly over, we thought it was about time to pull together the best-of-the-best guide for learning Nextflow in 2020. These resources will support anyone in the journey from total noob to Nextflow expert so this holiday season, give yourself or someone you know the gift of learning Nextflow!\n\n### Prerequisites to get started\n\nWe recommend that learners are comfortable with using the command line and the basic concepts of a scripting language such as Python or Perl before they start writing pipelines. Nextflow is widely used for bioinformatics applications, and the examples in these guides often focus on applications in these topics. However, Nextflow is now adopted in a number of data-intensive domains such as radio astronomy, satellite imaging and machine learning. No domain expertise is expected.\n\n### Time commitment\n\nWe estimate that the speediest of learners can complete the material in around 12 hours. It all depends on your background and how deep you want to dive into the rabbit-hole! 
Most of the content is introductory with some more advanced dataflow and configuration material in the workshops and patterns sections.\n\n### Overview of the material\n\n- Why learn Nextflow?\n- Introduction to Nextflow - AWS HPC Conference 2020 (8m)\n- A simple RNA-Seq hands-on tutorial (2h)\n- Full-immersion workshop (8h)\n- Nextflow advanced implementation Patterns (2h)\n- Other resources\n- Community and Support\n\n### 1. Why learn Nextflow?\n\nNextflow is an open-source workflow framework for writing and scaling data-intensive computational pipelines. It is designed around the Linux philosophy of simple yet powerful command-line and scripting tools that, when chained together, facilitate complex data manipulations. Combined with support for containerization, support for major cloud providers and on-premise architectures, Nextflow simplifies the writing and deployment of complex data pipelines on any infrastructure.\n\nThe following are some high-level motivations on why people choose to adopt Nextflow:\n\n1. Integrating Nextflow in your analysis workflows helps you implement **reproducible** pipelines. Nextflow pipelines follow FDA repeatability and reproducibility guidelines with version-control and containers to manage all software dependencies.\n2. Avoid vendor lock-in by ensuring portability. Nextflow is **portable**; the same pipeline written on a laptop can quickly scale to run HPC cluster, Amazon and Google cloud services, and Kubernetes. The code stays constant across varying infrastructures allowing collaboration and avoiding lock-in.\n3. It is **scalable** allowing the parallelization of tasks using the dataflow paradigm without having to hard-code the pipeline to a specific platform architecture.\n4. It is **flexible** and supports scientific workflow requirements like caching processes to prevent re-computation, and workflow reports to better understand the workflows’ executions.\n5. It is **growing fast** and has **long-term support**. Developed since 2013 by the same team, the Nextflow ecosystem is expanding rapidly.\n6. It is **open source** and licensed under Apache 2.0. You are free to use it, modify it and distribute it.\n\n### 2. Introduction to Nextflow from the HPC on AWS Conference 2020\n\nThis short YouTube video provides a general overview of Nextflow, the motivations behind its development and a demonstration of some of the latest features.\n\n\n\n### 3. A simple RNA-Seq hands-on tutorial\n\nThis hands-on tutorial from Seqera Labs will guide you through implementing a proof-of-concept RNA-seq pipeline. The goal is to become familiar with basic concepts, including how to define parameters, use channels for data and write processes to perform tasks. It includes all scripts, data and resources and is perfect for getting a flavor for Nextflow.\n\n[Tutorial link on GitHub](https://github.com/seqeralabs/nextflow-tutorial)\n\n### 4. Full-immersion workshop\n\nHere you’ll dive deeper into Nextflow’s most prominent features and learn how to apply them. The full workshop includes an excellent section on containers, how to build them and how to use them with Nextflow. The written materials come with examples and hands-on exercises. 
Optionally, you can also follow with a series of videos from a live training workshop.\n\nThe workshop includes topics on:\n\n- Environment Setup\n- Basic NF Script and Concepts\n- Nextflow Processes\n- Nextflow Channels\n- Nextflow Operators\n- Basic RNA-Seq pipeline\n- Containers & Conda\n- Nextflow Configuration\n- On-premise & Cloud Deployment\n- DSL 2 & Modules\n- [GATK hands-on exercise](https://seqera.io/training/handson/)\n\n[Workshop](https://seqera.io/training) & [YouTube playlist](https://www.youtube.com/playlist?list=PLPZ8WHdZGxmUv4W8ZRlmstkZwhb_fencI).\n\n### 5. Nextflow implementation Patterns\n\nThis advanced section discusses recurring patterns and solutions to many common implementation requirements. Code examples are available with notes to follow along with as well as a GitHub repository.\n\n[Nextflow Patterns](http://nextflow-io.github.io/patterns/index.html) & [GitHub repository](https://github.com/nextflow-io/patterns).\n\n### Other resources\n\nThe following resources will help you dig deeper into Nextflow and other related projects like the nf-core community who maintain curated pipelines and a very active Slack channel. There are plenty of Nextflow tutorials and videos online, and the following list is no way exhaustive. Please let us know if we are missing something.\n\n#### Nextflow docs\n\nThe reference for the Nextflow language and runtime. The docs should be your first point of reference when something is not clear. Newest features are documented in edge documentation pages released every month with the latest stable releases every three months.\n\nLatest [stable](https://www.nextflow.io/docs/latest/index.html) & [edge](https://www.nextflow.io/docs/edge/index.html) documentation.\n\n#### nf-core\n\nnf-core is a growing community of Nextflow users and developers. You can find curated sets of biomedical analysis pipelines built by domain experts with Nextflow, that have passed tests and have been implemented according to best practice guidelines. Be sure to sign up to the Slack channel.\n\n[nf-core website](https://nf-co.re)\n\n#### Tower Docs\n\nNextflow Tower is a platform to easily monitor, launch and scale Nextflow pipelines on cloud providers and on-premise infrastructure. 
The documentation provides details on setting up compute environments, monitoring pipelines and launching using either the web graphic interface or API.\n\n[Nextflow Tower documentation](http://help.tower.nf)\n\n#### Nextflow Biotech Blueprint by AWS\n\nA quickstart for deploying a genomics analysis environment on Amazon Web Services (AWS) cloud, using Nextflow to create and orchestrate analysis workflows and AWS Batch to run the workflow processes.\n\n[Biotech Blueprint by AWS](https://aws.amazon.com/quickstart/biotech-blueprint/nextflow/)\n\n#### Running Nextflow by Google Cloud\n\nGoogle Cloud Nextflow step-by-step guide to launching Nextflow Pipelines in Google Cloud.\n\n[Nextflow on Google Cloud ](https://cloud.google.com/life-sciences/docs/tutorials/nextflow)\n\n#### Awesome Nextflow\n\nA collections of Nextflow based pipelines and other resources.\n\n[Awesome Nextflow](https://github.com/nextflow-io/awesome-nextflow)\n\n### Community and support\n\n- Nextflow [Gitter channel](https://gitter.im/nextflow-io/nextflow)\n- Nextflow [Forums](https://groups.google.com/forum/#!forum/nextflow)\n- [nf-core Slack](https://nfcore.slack.com/)\n- Twitter [@nextflowio](https://twitter.com/nextflowio?lang=en)\n- [Seqera Labs](https://www.seqera.io) technical support & consulting\n\nNextflow is a community-driven project. The list of links below has been collated from a diverse collection of resources and experts to guide you in learning Nextflow. If you have any suggestions, please make a pull request to this page on GitHub.\n\nAlso stay tuned for our upcoming post, where we will discuss the ultimate Nextflow development environment.\n", + "images": [] + }, + { + "slug": "2021/5-more-tips-for-nextflow-user-on-hpc", + "title": "Five more tips for Nextflow user on HPC", + "date": "2021-06-15T00:00:00.000Z", + "content": "\nIn May we blogged about [Five Nextflow Tips for HPC Users](/blog/2021/5_tips_for_hpc_users.html) and now we continue the series with five additional tips for deploying Nextflow with on HPC batch schedulers.\n\n### 1. Use the scratch directive\n\nTo allow the pipeline tasks to share data with each other, Nextflow requires a shared file system path as a working directory. When using this model, a common recommendation is to use the node's local scratch storage as the job working directory to avoid unnecessary use of the network shared file system and achieve better performance.\n\nNextflow implements this best-practice which can be enabled by adding the following setting in your `nextflow.config` file.\n\n```\nprocess.scratch = true\n```\n\nWhen using this option, Nextflow:\n\n- Creates a unique directory in the computing node's local `/tmp` or the path assigned by your cluster via the `TMPDIR` environment variable.\n- Creates a [symlink](https://en.wikipedia.org/wiki/Symbolic_link) for each input file required by the job execution.\n- Runs the job in the local scratch path.\n Copies the job output files into the job shared work directory assigned by Nextflow.\n\n### 2. Use -bg option to launch the execution in the background\n\nIn some circumstances, you may need to run your Nextflow pipeline in the background without losing the execution output. In this scenario use the `-bg` command line option as shown below.\n\n```\nnextflow run -bg > my-file.log\n```\n\nThis can be very useful when launching the execution from an SSH connected terminal and ensures that any connection issues don't stop the pipeline. You can use `ps` and `kill` to find and stop the execution.\n\n### 3. 
Disable interactive logging\n\nNextflow has rich terminal logging which uses ANSI escape codes to update the pipeline execution counters interactively. However, this is not very useful when submitting the pipeline execution as a cluster job or in the background. In this case, disable the rich ANSI logging using the command line option `-ansi-log false` or the environment variable `NXF_ANSI_LOG=false`.\n\n### 4. Cluster native options\n\nNextlow has portable directives for common resource requests such as [cpus](https://www.nextflow.io/docs/latest/process.html#cpus), [memory](https://www.nextflow.io/docs/latest/process.html#memory) and [disk](https://www.nextflow.io/docs/latest/process.html#disk) allocation.\n\nThese directives allow you to specify the request for a certain number of computing resources e.g CPUs, memory, or disk and Nextflow converts these values to the native setting of the target execution platform specified in the pipeline configuration.\n\nHowever, there can be settings that are only available on some specific cluster technology or vendors.\n\nThe [clusterOptions](https://www.nextflow.io/docs/latest/process.html#clusterOptions) directive allows you to specify any option of your resource manager for which there isn't direct support in Nextflow.\n\n### 5. Retry failing jobs increasing resource allocation\n\nA common scenario is that instances of the same process may require different computing resources. For example, requesting an amount of memory that is too low for some processes will result in those tasks failing. You could specify a higher limit which would accommodate the task with the highest memory utilization, but you then run the risk of decreasing your job’s execution priority.\n\nNextflow provides a mechanism that allows you to modify the amount of computing resources requested in the case of a process failure and attempt to re-execute it using a higher limit. For example:\n\n```\nprocess foo {\n\n memory { 2.GB * task.attempt }\n time { 1.hour * task.attempt }\n\n errorStrategy { task.exitStatus in 137..140 ? 'retry' : 'terminate' }\n maxRetries 3\n\n script:\n \"\"\"\n your_job_command --here\n \"\"\"\n}\n```\n\nIn the above example the memory and execution time limits are defined dynamically. The first time the process is executed the task.attempt is set to 1, thus it will request 2 GB of memory and one hour of maximum execution time.\n\nIf the task execution fails, reporting an exit status in the range between 137 and 140, the task is re-submitted (otherwise it terminates immediately). This time the value of task.attempt is 2, thus increasing the amount of the memory to four GB and the time to 2 hours, and so on.\n\nNOTE: These exit statuses are not standard and can change depending on the resource manager you are using. Consult your cluster administrator or scheduler administration guide for details on the exit statuses used by your cluster in similar error conditions.\n\n### Conclusion\n\nNextflow aims to give you control over every aspect of your workflow. These Nextflow options allow you to shape how Nextflow submits your processes to your executor, that can make your workflow more robust by avoiding the overloading of the executor. Some systems have hard limits which if you do not take into account, no processes will be executed. 
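\n\nTo give a concrete sketch, a `nextflow.config` that combines a couple of the tips above might look like the following (the QOS name passed via `clusterOptions` is only a placeholder, ask your cluster administrator which options apply on your system):\n\n```\n// run each task in the node's local scratch storage (tip 1)\nprocess.scratch = true\n\n// pass scheduler-native flags that have no dedicated Nextflow directive (tip 4)\nprocess.clusterOptions = '--qos=normal'\n```\n\n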
Being aware of these configuration values and how to use them is incredibly helpful when working with larger workflows.\n", + "images": [] + }, + { + "slug": "2021/5_tips_for_hpc_users", + "title": "5 Nextflow Tips for HPC Users", + "date": "2021-05-13T00:00:00.000Z", + "content": "\nNextflow is a powerful tool for developing scientific workflows for use on HPC systems. It provides a simple solution to deploy parallelized workloads at scale using an elegant reactive/functional programming model in a portable manner.\n\nIt supports the most popular workload managers such as Grid Engine, Slurm, LSF and PBS, among other out-of-the-box executors, and comes with sensible defaults for each. However, each HPC system is a complex machine with its own characteristics and constraints. For this reason you should always consult your system administrator before running a new piece of software or a compute intensive pipeline that spawns a large number of jobs.\n\nIn this series of posts, we will be sharing the top tips we have learned along the way that should help you get results faster while keeping in the good books of your sys admins.\n\n### 1. Don't forget the executor\n\nNextflow, by default, spawns parallel task executions in the computer on which it is running. This is generally useful for development purposes, however, when using an HPC system you should specify the executor matching your system. This instructs Nextflow to submit pipeline tasks as jobs into your HPC workload manager. This can be done adding the following setting to the `nextflow.config` file in the launching directory, for example:\n\n```\nprocess.executor = 'slurm'\n```\n\nWith the above setting Nextflow will submit the job executions to your Slurm cluster spawning a `sbatch` command for each job in your pipeline. Find the executor matching your system at [this link](https://www.nextflow.io/docs/latest/executor.html).\nEven better, to prevent the undesired use of the local executor in a specific environment, define the _default_ executor to be used by Nextflow using the following system variable:\n\n```\nexport NXF_EXECUTOR=slurm\n```\n\n### 2. Nextflow as a job\n\nQuite surely your sys admin has already warned you that the login/head node should only be used to submit job executions and not run compute intensive tasks.\nWhen running a Nextflow pipeline, the driver application submits and monitors the job executions on your cluster (provided you have correctly specified the executor as stated in point 1), and therefore it should not run compute intensive tasks.\n\nHowever, it's never a good practice to launch a long running job in the login node, and therefore a good practice consists of running Nextflow itself as a cluster job. This can be done by wrapping the `nextflow run` command in a shell script and submitting it as any other job. An average pipeline may require 2 CPUs and 2 GB of resources allocation.\n\nNote: the queue where the Nextflow driver job is submitted should allow the spawning of the pipeline jobs to carry out the pipeline execution.\n\n### 3. Use the queueSize directive\n\nThe `queueSize` directive is part of the executor configuration in the `nextflow.config` file, and defines how many processes are queued at a given time. By default, Nextflow will submit up to 100 jobs at a time for execution. Increase or decrease this setting depending your HPC system quota and throughput. For example:\n\n```\nexecutor {\n name = 'slurm'\n queueSize = 50\n}\n```\n\n### 4. 
Specify the max heap size\n\nThe Nextflow runtime runs on top of the Java virtual machine which, by design, tries to allocate as much memory as is available. This is not a good practice in HPC systems which are designed to share compute resources across many users and applications.\nTo avoid this, specify the maximum amount of memory that can be used by the Java VM using the -Xms and -Xmx Java flags. These can be specified using the `NXF_OPTS` environment variable.\n\nFor example:\n\n```\nexport NXF_OPTS=\"-Xms500M -Xmx2G\"\n```\n\nThe above setting instructs Nextflow to allocate a Java heap in the range of 500 MB and 2 GB of RAM.\n\n### 5. Limit the Nextflow submit rate\n\nNextflow attempts to submit the job executions as quickly as possible, which is generally not a problem. However, in some HPC systems the submission throughput is constrained or it should be limited to avoid degrading the overall system performance.\nTo prevent this problem you can use `submitRateLimit` to control the Nextflow job submission throughput. This directive is part of the `executor` configuration scope, and defines the number of tasks that can be submitted per a unit of time. The default for the `submitRateLimit` is unlimited.\nYou can specify the `submitRateLimit` like this:\n\n```\nexecutor {\n submitRateLimit = '10 sec'\n}\n```\n\nYou can also more explicitly specify it as a rate of # processes / time unit:\n\n```\nexecutor {\n submitRateLimit = '10/2min'\n}\n```\n\n### Conclusion\n\nNextflow aims to give you control over every aspect of your workflow. These options allow you to shape how Nextflow communicates with your HPC system. This can make workflows more robust while avoiding overloading the executor. Some systems have hard limits, and if you do not take them into account, it will stop any jobs from being scheduled.\n\nStay tuned for part two where we will discuss background executions, retry strategies, maxForks and other tips.\n", + "images": [] + }, + { + "slug": "2021/configure-git-repositories-with-nextflow", + "title": "Configure Git private repositories with Nextflow", + "date": "2021-10-21T00:00:00.000Z", + "content": "\nGit has become the de-facto standard for source-code version control system and has seen increasing adoption across the spectrum of software development.\n\nNextflow provides builtin support for Git and most popular Git hosting platforms such\nas GitHub, GitLab and Bitbucket between the others, which streamline managing versions\nand track changes in your pipeline projects and facilitate the collaboration across\ndifferent users.\n\nIn order to access public repositories Nextflow does not require any special configuration, just use the _http_ URL of the pipeline project you want to run\nin the run command, for example:\n\n```\nnextflow run https://github.com/nextflow-io/hello\n```\n\nHowever to allow Nextflow to access private repositories you will need to specify\nthe repository credentials, and the server hostname in the case of self-managed\nGit server installations.\n\n## Configure access to private repositories\n\nThis is done through a file name `scm` placed in the `$HOME/.nextflow/` directory, containing the credentials and other details for accessing a particular Git hosting solution. 
You can refer to the Nextflow documentation for all the [SCM configuration file](https://www.nextflow.io/docs/edge/sharing.html) options.\n\nAll of these platforms have their own authentication mechanisms for Git operations which are captured in the `$HOME/.nextflow/scm` file with the following syntax:\n\n```groovy\nproviders {\n\n '' {\n user = value\n password = value\n ...\n }\n\n '' {\n user = value\n password = value\n ...\n }\n\n}\n```\n\nNote: Make sure to enclose the provider name with `'` if it contains a `-` or a\nblank character.\n\nAs of the 21.09.0-edge release, Nextflow integrates with the following Git providers:\n\n## GitHub\n\n[GitHub](https://github.com) is one of the most well known Git providers and is home to some of the most popular open-source Nextflow pipelines from the [nf-core](https://github.com/nf-core/) community project.\n\nIf you wish to use Nextflow code from a **public** repository hosted on GitHub.com, then you don't need to provide credentials (`user` and `password`) to pull code from the repository. However, if you wish to interact with a private repository or are running into GitHub API rate limits for public repos, then you must provide elevated access to Nextflow by specifying your credentials in the `scm` file.\n\nIt is worth noting that [GitHub recently phased out Git password authentication](https://github.blog/2020-12-15-token-authentication-requirements-for-git-operations/#what-you-need-to-do-today) and now requires that users supply a more secure GitHub-generated _Personal Access Token_ for authentication. With Nextflow, you can specify your _personal access token_ in the `password` field.\n\n```groovy\nproviders {\n\n github {\n user = 'me'\n password = 'my-personal-access-token'\n }\n\n}\n```\n\nTo generate a `personal-access-token` for the GitHub platform, follow the instructions provided [here](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token). Ensure that the token has at a minimum all the permissions in the `repo` scope.\n\nOnce you have provided your username and _personal access token_, as shown above, you can test the integration by pulling the repository code.\n\n```\nnextflow pull https://github.com/user_name/private_repo\n```\n\n## Bitbucket Cloud\n\n[Bitbucket](https://bitbucket.org/) is a publicly accessible Git solution hosted by Atlassian. Please note that if you are using an on-premises Bitbucket installation, you should follow the instructions for _Bitbucket Server_ in the following section.\n\nIf your Nextflow code is in a public Bitbucket repository, then you don't need to specify your credentials to pull code from the repository. However, if you wish to interact with a private repository, you need to provide elevated access to Nextflow by specifying your credentials in the `scm` file.\n\nPlease note that Bitbucket Cloud requires your `app password` in the `password` field, which is different from your login password.\n\n```groovy\nproviders {\n\n bitbucket {\n user = 'me'\n password = 'my-app-password'\n }\n\n}\n```\n\nTo generate an `app password` for the Bitbucket platform, follow the instructions provided [here](https://support.atlassian.com/bitbucket-cloud/docs/app-passwords/). 
Ensure that the token has at least `Repositories: Read` permission.\n\nOnce these settings are saved in `$HOME/.nextflow/scm`, you can test the integration by pulling the repository code.\n\n```\nnextflow pull https://bitbucket.org/user_name/private_repo\n```\n\n## Bitbucket Server\n\n[Bitbucket Server](https://www.atlassian.com/software/bitbucket/enterprise) is a Git hosting solution from Atlassian which is meant for teams that require a self-managed solution. If Nextflow code resides in an open Bitbucket repository, then you don't need to provide credentials to pull code from this repository. However, if you wish to interact with a private repository, you need to give elevated access to Nextflow by specifying your credentials in the `scm` file.\n\nFor example, if you'd like to call your hosted Bitbucket server as `mybitbucketserver`, then you'll need to add the following snippet in your `~/$HOME/.nextflow/scm` file.\n\n```groovy\nproviders {\n\n mybitbucketserver {\n platform = 'bitbucketserver'\n server = 'https://your.bitbucket.host.com'\n user = 'me'\n password = 'my-password' // OR \"my-token\"\n }\n\n}\n```\n\nTo generate a _personal access token_ for Bitbucket Server, refer to the [Bitbucket Support documentation](https://confluence.atlassian.com/bitbucketserver/managing-personal-access-tokens-1005339986.html) from Atlassian.\n\nOnce the configuration is saved, you can test the integration by pulling code from a private repository and specifying the `mybitbucketserver` Git provider using the `-hub` option.\n\n```\nnextflow pull https://your.bitbucket.host.com/user_name/private_repo -hub mybitbucketserver\n```\n\nNOTE: It is worth noting that [Atlassian is phasing out the Server offering](https://www.atlassian.com/migration/assess/journey-to-cloud) in favor of cloud product [bitbucket.org](https://bitbucket.org).\n\n## GitLab\n\n[GitLab](https://gitlab.com) is a popular Git provider that offers features covering various aspects of the DevOps cycle.\n\nIf you wish to run a Nextflow pipeline from a public GitLab repository, there is no need to provide credentials to pull code. However, if you wish to interact with a private repository, then you must give elevated access to Nextflow by specifying your credentials in the `scm` file.\n\nPlease note that you need to specify your _personal access token_ in the `password` field.\n\n```groovy\nproviders {\n\n mygitlab {\n user = 'me'\n password = 'my-password' // or 'my-personal-access-token'\n token = 'my-personal-access-token'\n }\n\n}\n```\n\nIn addition, you can specify the `server` fields for your self-hosted instance of GitLab, by default [https://gitlab.com](https://gitlab.com) is assumed as the server.\n\nTo generate a `personal-access-token` for the GitLab platform follow the instructions provided [here](https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html). Please ensure that the token has at least `read_repository`, `read_api` permissions.\n\nOnce the configuration is saved, you can test the integration by pulling the repository code using the `-hub` option.\n\n```\nnextflow pull https://gitlab.com/user_name/private_repo -hub mygitlab\n```\n\n## Gitea\n\n[Gitea server](https://gitea.com/) is an open source Git-hosting solution that can be self-hosted. If you have your Nextflow code in an open Gitea repository, there is no need to specify credentials to pull code from this repository. 
However, if you wish to interact with a private repository, you can give elevated access to Nextflow by specifying your credentials in the `scm` file.\n\nFor example, if you'd like to call your hosted Gitea server `mygiteaserver`, then you'll need to add the following snippet in your `~/$HOME/.nextflow/scm` file.\n\n```groovy\nproviders {\n\n mygiteaserver {\n platform = 'gitea'\n server = 'https://gitea.host.com'\n user = 'me'\n password = 'my-password'\n }\n\n}\n```\n\nTo generate a _personal access token_ for your Gitea server, please refer to the [official guide](https://docs.gitea.io/en-us/api-usage/).\n\nOnce the configuration is set, you can test the integration by pulling the repository code and specifying `mygiteaserver` as the Git provider using the `-hub` option.\n\n```\nnextflow pull https://git.host.com/user_name/private_repo -hub mygiteaserver\n```\n\n## Azure Repos\n\n[Azure Repos](https://azure.microsoft.com/en-us/services/devops/repos/) is a part of Microsoft Azure Cloud Suite. Nextflow integrates natively Azure Repos via the usual `~/$HOME/.nextflow/scm` file.\n\nIf you'd like to use the `myazure` alias for the `azurerepos` provider, then you'll need to add the following snippet in your `~/$HOME/.nextflow/scm` file.\n\n```groovy\nproviders {\n\n myazure {\n server = 'https://dev.azure.com'\n platform = 'azurerepos'\n user = 'me'\n token = 'my-api-token'\n }\n\n}\n```\n\nTo generate a _personal access token_ for your Azure Repos integration, please refer to the [official guide](https://docs.microsoft.com/en-us/azure/devops/organizations/accounts/use-personal-access-tokens-to-authenticate?view=azure-devops&tabs=preview-page) on Azure.\n\nOnce the configuration is set, you can test the integration by pulling the repository code and specifying `myazure` as the Git provider using the `-hub` option.\n\n```\nnextflow pull https://dev.azure.com/org_name/DefaultCollection/_git/repo_name -hub myazure\n```\n\n## Conclusion\n\nGit is a popular, widely used software system for source code management. The native integration of Nextflow with various Git hosting solutions is an important feature to facilitate reproducible workflows that enable collaborative development and deployment of Nextflow pipelines.\n\nStay tuned for more integrations as we continue to improve our support for various source code management solutions!\n", + "images": [] + }, + { + "slug": "2021/introducing-nextflow-for-azure-batch", + "title": "Introducing Nextflow for Azure Batch", + "date": "2021-02-22T00:00:00.000Z", + "content": "\nWhen the Nextflow project was created, one of the main drivers was to enable reproducible data pipelines that could be deployed across a wide range of execution platforms with minimal effort as well as to empower users to scale their data analysis while facilitating the migration to the cloud.\n\nThroughout the years, the computing services provided by cloud vendors have evolved in a spectacular manner. Eight years ago, the model was focused on launching virtual machines in the cloud, then came containers and then the idea of serverless computing which changed everything again. However, the power of the Nextflow abstraction consists of hiding the complexity of the underlying platform. 
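\n\nAs a small illustration of that abstraction, retargeting a pipeline from a local run to an HPC scheduler or a cloud batch service is usually a one-line configuration change (the executor names below are the standard Nextflow ones, shown purely as an example):\n\n```\n// nextflow.config\nprocess.executor = 'slurm'   // or 'awsbatch', 'azurebatch', ...\n```\n\n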
Through the concept of executors, emerging technologies and new platforms can be easily adapted with no changes required to user pipelines.\n\nWith this in mind, we could not be more excited to announce that over the past months we have been working with Microsoft to implement built-in support for [Azure Batch](https://azure.microsoft.com/en-us/services/batch/) into Nextflow. Today we are delighted to make it available to all users as a beta release.\n\n### How does it work\n\nAzure Batch is a cloud-based computing service that allows the execution of highly scalable, container based, workloads in the Azure cloud.\n\nThe support for Nextflow comes in the form of a plugin which implements a new executor, not surprisingly named `azurebatch`, which offloads the execution of the pipeline jobs to corresponding Azure Batch jobs.\n\nEach job run consists in practical terms of a container execution which ships the job dependencies and carries out the job computation. As usual, each job is assigned a unique working directory allocated into a [Azure Blob](https://azure.microsoft.com/en-us/services/storage/blobs/) container.\n\n### Let's get started!\n\nThe support for Azure Batch requires the latest release of Nextflow from the _edge_ channel (version 21.02-edge or later). If you don't have this, you can install it using these commands:\n\n```\nexport NXF_EDGE=1\ncurl get.nextflow.io | bash\n./nextflow -self-update\n```\n\nNote for Windows users, as Nextflow is \\*nix based tool you will need to run it using the [Windows subsystem for Linux](https://docs.microsoft.com/en-us/windows/wsl/install-win10). Also make sure Java 8 or later is installed in the Linux environment.\n\nOnce Nextflow is installed, to run your data pipelines with Azure Batch, you will need to create an Azure Batch account in the region of your choice using the Azure Portal. In a similar manner, you will need an Azure Blob container.\n\nWith the Azure Batch and Blob storage container configured, your `nextflow.config` file should be set up similar to the example below:\n\n```\nplugins {\n id 'nf-azure'\n}\n\nprocess {\n executor = 'azurebatch'\n}\n\nazure {\n batch {\n location = 'westeurope'\n accountName = ''\n accountKey = ''\n autoPoolMode = true\n }\n storage {\n accountName = \"\"\n accountKey = \"\"\n }\n}\n```\n\nUsing this configuration snippet, Nextflow will automatically create the virtual machine pool(s) required to deploy the pipeline execution in the Azure Batch service.\n\nNow you will be able to launch the pipeline execution using the following command:\n\n```\nnextflow run -w az://my-container/work\n```\n\nReplace `` with a pipeline name e.g. nextflow-io/rnaseq-nf and `my-container` with a blob container in the storage account as defined in the above configuration.\n\nFor more details regarding the Nextflow configuration setting for Azure Batch\nrefers to the Nextflow documentation at [this link](/docs/edge/azure.html).\n\n### Conclusion\n\nThe support for Azure Batch further expands the wide range of computing platforms supported by Nextflow and empowers Nextflow users to deploy their data pipelines in the cloud provider of their choice. 
Above all, it allows researchers to scale, collaborate and share their work without being locked into a specific platform.\n\nWe thank Microsoft, and in particular [Jer-Ming Chia](https://www.linkedin.com/in/jermingchia/) who works in the HPC and AI team for having supported and sponsored this open source contribution to the Nextflow framework.\n", + "images": [] + }, + { + "slug": "2021/nextflow-developer-environment", + "title": "6 Tips for Setting Up Your Nextflow Dev Environment", + "date": "2021-03-04T00:00:00.000Z", + "content": "\n_This blog follows up the Learning Nextflow in 2020 blog [post](https://www.nextflow.io/blog/2020/learning-nextflow-in-2020.html)._\n\nThis guide is designed to walk you through a basic development setup for writing Nextflow pipelines.\n\n### 1. Installation\n\nNextflow runs on any Linux compatible system and MacOS with Java installed. Windows users can rely on the [Windows Subsystem for Linux](https://docs.microsoft.com/en-us/windows/wsl/install-win10). Installing Nextflow is straightforward. You just need to download the `nextflow` executable. In your terminal type the following commands:\n\n```\n$ curl get.nextflow.io | bash\n$ sudo mv nextflow /usr/local/bin\n```\n\nThe first line uses the curl command to download the nextflow executable, and the second line moves the executable to your PATH. Note `/usr/local/bin` is the default for MacOS, you might want to choose `~/bin` or `/usr/bin` depending on your PATH definition and operating system.\n\n### 2. Text Editor or IDE?\n\nNextflow pipelines can be written in any plain text editor. I'm personally a bit of a Vim fan, however, the advent of the modern IDE provides a more immersive development experience.\n\nMy current choice is Visual Studio Code which provides a wealth of add-ons, the most obvious of these being syntax highlighting. With [VSCode installed](https://code.visualstudio.com/download), you can search for the Nextflow extension in the marketplace.\n\n![VSCode with Nextflow Syntax Highlighting](/img/vscode-nf-highlighting.png)\n\nOther syntax highlighting has been made available by the community including:\n\n- [Atom](https://atom.io/packages/language-nextflow)\n- [Vim](https://github.com/LukeGoodsell/nextflow-vim)\n- [Emacs](https://github.com/Emiller88/nextflow-mode)\n\n### 3. The Nextflow REPL console\n\nThe Nextflow console is a REPL (read-eval-print loop) environment that allows one to quickly test part of a script or segments of Nextflow code in an interactive manner. This can be particularly useful to quickly evaluate channels and operators behaviour and prototype small snippets that can be included in your pipeline scripts.\n\nStart the Nextflow console with the following command:\n\n```\n$ nextflow console\n```\n\n![Nextflow REPL console](/img/nf-repl-console.png)\n\nUse the `CTRL+R` keyboard shortcut to run (`⌘+R`on the Mac) and to evaluate your code. You can also evaluate by selecting code and use the **Run selection**.\n\n### 4. Containerize all the things\n\nContainers are a key component of developing scalable and reproducible pipelines. We can build Docker images that contain an OS, all libraries and the software we need for each process. 
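\n\nFor example, once such an image is available you can point your pipeline at it from `nextflow.config`, so that every process runs inside the container (the image name below is just an illustration, reusing the public image pulled later in this section):\n\n```\ndocker.enabled = true\nprocess.container = 'quay.io/nextflow/rnaseq-nf:latest'\n```\n\n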
Pipelines are typically developed using Docker containers and tooling as these can then be used on many different container engines such as Singularity and Podman.\n\nOnce you have [downloaded and installed Docker](https://docs.docker.com/engine/install/), try pull a public docker image:\n\n```\n$ docker pull quay.io/nextflow/rnaseq-nf\n```\n\nTo run a Nextflow pipeline using the latest tag of the image, we can use:\n\n```\nnextflow run nextflow-io/rnaseq-nf -with-docker quay.io/nextflow/rnaseq-nf:latest\n```\n\nTo learn more about building Docker containers, see the [Seqera Labs tutorial](https://seqera.io/training/#_manage_dependencies_containers) on managing dependencies with containers.\n\nAdditionally, you can install the VSCode marketplace addon for Docker to manage and interactively run and test the containers and images on your machine. You can even connect to remote registries such as Dockerhub, Quay.io, AWS ECR, Google Cloud and Azure Container registries.\n\n![VSCode with Docker Extension](/img/vs-code-with-docker-extension.png)\n\n### 5. Use Tower to monitor your pipelines\n\nWhen developing real-world pipelines, it can become inevitable that pipelines will require significant resources. For long-running workflows, monitoring becomes all the more crucial. With [Nextflow Tower](https://tower.nf), we can invoke any Nextflow pipeline execution from the CLI and use the integrated dashboard to follow the workflow run.\n\nSign-in to Tower using your GitHub credentials, obtain your token from the Getting Started page and export them into your terminal, `~/.bashrc`, or include them in your nextflow.config.\n\n```\n$ export TOWER_ACCESS_TOKEN=my-secret-tower-key\n```\n\nWe can then add the `-with-tower` child-option to any Nextflow run command. A URL with the monitoring dashboard will appear.\n\n```\n$ nextflow run nextflow-io/rnaseq-nf -with-tower\n```\n\n### 6. nf-core tools\n\n[nf-core](https://nf-co.re/) is a community effort to collect a curated set of analysis pipelines built using Nextflow. The pipelines continue to come on in leaps and bounds and nf-core tools is a python package for helping with developing nf-core pipelines. It includes options for listing, creating, and even downloading pipelines for offline usage.\n\nThese tools are particularly useful for developers contributing to the community pipelines on [GitHub](https://github.com/nf-core/) with linting and syncing options that keep pipelines up-to-date against nf-core guidelines.\n\n`nf-core tools` is a python package that can be installed in your development environment from Bioconda or PyPi.\n\n```\n$ conda install nf-core\n```\n\nor\n\n```\n$ pip install nf-core\n```\n\n![nf-core tools](/img/nf-core-tools.png)\n\n### Conclusion\n\nDeveloper workspaces are evolving rapidly. While your own development environment may be highly dependent on personal preferences, community contributions are keeping Nextflow users at the forefront of the modern developer experience.\n\nSolutions such as [GitHub Codespaces](https://github.com/features/codespaces) and [Gitpod](https://www.gitpod.io/) are now offering extendible, cloud-based options that may well be the future. 
I’m sure we can all look forward to a one-click, pre-configured, cloud-based, Nextflow developer environment sometime soon!\n", + "images": [] + }, + { + "slug": "2021/nextflow-sql-support", + "title": "Introducing Nextflow support for SQL databases", + "date": "2021-09-16T00:00:00.000Z", + "content": "\nThe recent tweet introducing the [Nextflow support for SQL databases](https://twitter.com/PaoloDiTommaso/status/1433120149888974854) raised a lot of positive reaction. In this post, I want to describe more in detail how this extension works.\n\nNextflow was designed with the idea to streamline the deployment of complex data pipelines in a scalable, portable and reproducible manner across different computing platforms. To make this all possible, it was decided the resulting pipeline and the runtime should be self-contained i.e. to not depend on separate services such as database servers.\n\nThis makes the resulting pipelines easier to configure, deploy, and allows for testing them using [CI services](https://en.wikipedia.org/wiki/Continuous_integration), which is a critical best practice for delivering high-quality and stable software.\n\nAnother important consequence is that Nextflow pipelines do not retain the pipeline state on separate storage. Said in a different way, the idea was - and still is - to promote stateless pipeline execution in which the computed results are only determined by the pipeline inputs and the code itself, which is consistent with the _functional_ dataflow paradigm on which Nextflow is based.\n\nHowever, the ability to access SQL data sources can be very useful in data pipelines, for example, to ingest input metadata or to store task executions logs.\n\n### How does it work?\n\nThe support for SQL databases in Nextflow is implemented as an optional plugin component. This plugin provides two new operations into your Nextflow script:\n\n1. `fromQuery` performs a SQL query against the specified database and returns a Nextflow channel emitting them. This channel can be used in your pipeline as any other Nextflow channel to trigger the process execution with the corresponding values.\n2. `sqlInsert` takes the values emitted by a Nextflow channel and inserts them into a database table.\n\nThe plugin supports out-of-the-box popular database servers such as MySQL, PostgreSQL and MariaDB. It should be noted that the technology is based on the Java JDBC database standard, therefore it could easily support any database technology implementing a driver for this standard interface.\n\nDisclaimer: This plugin is a preview technology. Some features, syntax and configuration settings can change in future releases.\n\n### Let's get started!\n\nThe use of the SQL plugin requires the use of Nextflow 21.08.0-edge or later. If are using an older version, check [this page](https://www.nextflow.io/docs/latest/getstarted.html#stable-edge-releases) on how to update to the latest edge release.\n\nTo enable the use of the database plugin, add the following snippet in your pipeline configuration file.\n\n```\nplugins {\n id 'nf-sqldb@0.1.0'\n}\n```\n\nIt is then required to specify the connection _coordinates_ of the database service you want to connect to in your pipeline. 
This is done by adding a snippet similar to the following in your configuration file:\n\n```\nsql {\n db {\n 'my-db' {\n url = 'jdbc:mysql://localhost:3306/demo'\n user = 'my-user'\n password = 'my-password'\n }\n }\n}\n```\n\nIn the above example, replace `my-db` with a name of your choice (this name will be used in the script to reference the corresponding database connection coordinates). Also, provide a `url`, `user` and `password` matching your database server.\n\nYour script should then look like the following:\n\n```\nnextflow.enable.dsl=2\n\nprocess myProcess {\n input:\n tuple val(sample_id), path(sample_in)\n output:\n tuple val(sample_id), path('sample.out')\n\n \"\"\"\n your_command --input $sample_id > sample.out\n \"\"\"\n}\n\nworkflow {\n\n query = 'select SAMPLE_ID, SAMPLE_FILE from SAMPLES'\n channel.sql.fromQuery(query, db: 'my-db') \\\n | myProcess \\\n | sqlInsert(table: 'RESULTS', db: 'my-db')\n\n}\n```\n\nThe above example shows how to perform a simple database query, pipe the results to a fictitious process named `myProcess` and finally store the process outputs into a database table named `RESULTS`.\n\nIt is worth noting that Nextflow allows the use of any number of database instances in your pipeline, simply defining them in the configuration file using the syntax shown above. This could be useful to fetch database data from one data source and store the results into a different one.\n\nAlso, this makes it straightforward to write [ETL](https://en.wikipedia.org/wiki/Extract,_transform,_load) scripts that span across multiple data sources.\n\nFind more details about the SQL plugin for Nextflow at [this link](https://github.com/nextflow-io/nf-sqldb).\n\n## What about the self-contained property?\n\nYou may wonder if adding this capability breaks the self-contained property of Nextflow pipelines which allows them to be run in a single command and to be tested with continuous integration services e.g. GitHub Action.\n\nThe good news is that it does not ... or at least it should not if used properly.\n\nIn fact, the SQL plugin includes the [H2](http://www.h2database.com/html/features.html) embedded in-memory SQL database that is used by default when no other database is provided in the Nextflow configuration file and can be used for developing and testing your pipeline without the need for a separate database service.\n\nTip: Other than this, H2 also provides the capability to access and query CSV/TSV files as SQL tables. Read more about this feature at [this link](http://www.h2database.com/html/tutorial.html?highlight=csv&search=csv#csv).\n\n### Conclusion\n\nThe use of this plugin adds to Nextflow the capability to query and store data into the SQL databases. Currently, the most popular SQL technologies are supported such as MySQL, PostgreSQL and MariaDB. In the future, support for other database technologies e.g. MongoDB, DynamoDB could be added.\n\nNotably, the support for SQL data-stores has been implemented preserving the core Nextflow capabilities to allow portable and self-contained pipeline scripts that can be developed locally, tested through CI services, and deployed at scale into production environments.\n\nIf you have any questions or suggestions, please feel free to comment in the project discussion group at [this link](https://github.com/nextflow-io/nf-sqldb/discussions).\n\nCredits to [Francesco Strozzi](https://twitter.com/fstrozzi) & [Raoul J.P. 
Bonnal](https://twitter.com/bonnalr) for having contributed to this work 🙏.\n", + "images": [] + }, + { + "slug": "2021/setup-nextflow-on-windows", + "title": "Setting up a Nextflow environment on Windows 10", + "date": "2021-10-13T00:00:00.000Z", + "content": "\nFor Windows users, getting access to a Linux-based Nextflow development and runtime environment used to be hard. Users would need to run virtual machines, access separate physical servers or cloud instances, or install packages such as [Cygwin](http://www.cygwin.com/) or [Wubi](https://wiki.ubuntu.com/WubiGuide). Fortunately, there is now an easier way to deploy a complete Nextflow development environment on Windows.\n\nThe Windows Subsystem for Linux (WSL) allows users to build, manage and execute Nextflow pipelines on a Windows 10 laptop or desktop without needing a separate Linux machine or cloud VM. Users can build and test Nextflow pipelines and containerized workflows locally, on an HPC cluster, or their preferred cloud service, including AWS Batch and Azure Batch.\n\nThis document provides a step-by-step guide to setting up a Nextflow development environment on Windows 10.\n\n## High-level Steps\n\nThe steps described in this guide are as follows:\n\n- Install Windows PowerShell\n- Configure the Windows Subsystem for Linux (WSL2)\n- Obtain and Install a Linux distribution (on WSL2)\n- Install Windows Terminal\n- Install and configure Docker\n- Download and install an IDE (VS Code)\n- Install and test Nextflow\n- Configure X-Windows for use with the Nextflow Console\n- Install and Configure GIT\n\n## Install Windows PowerShell\n\nPowerShell is a cross-platform command-line shell and scripting language available for Windows, Linux, and macOS. If you are an experienced Windows user, you are probably already familiar with PowerShell. PowerShell is worth taking a few minutes to download and install.\n\nPowerShell is a big improvement over the Command Prompt in Windows 10. It brings features to Windows that Linux/UNIX users have come to expect, such as command-line history, tab completion, and pipeline functionality.\n\n- You can obtain PowerShell for Windows from GitHub at the URL https://github.com/PowerShell/PowerShell.\n- Download and install the latest stable version of PowerShell for Windows x64 - e.g., [powershell-7.1.3-win-x64.msi](https://github.com/PowerShell/PowerShell/releases/download/v7.1.3/PowerShell-7.1.3-win-x64.msi).\n- If you run into difficulties, Microsoft provides detailed instructions [here](https://docs.microsoft.com/en-us/powershell/scripting/install/installing-powershell-core-on-windows?view=powershell-7.1).\n\n## Configure the Windows Subsystem for Linux (WSL)\n\n### Enable the Windows Subsystem for Linux\n\nMake sure you are running Windows 10 Version 1903 with Build 18362 or higher. You can check your Windows version by select WIN-R (using the Windows key to run a command) and running the utility `winver`.\n\nFrom within PowerShell, run the Windows Deployment Image and Service Manager (DISM) tool as an administrator to enable the Windows Subsystem for Linux. 
To run PowerShell with administrator privileges, right-click on the PowerShell icon from the Start menu or desktop and select \"_Run as administrator_\".\n\n```powershell\nPS C:\\WINDOWS\\System32> dism.exe /online /enable-feature /featurename:Microsoft-Windows-Subsystem-Linux /all /norestart\nDeployment Image Servicing and Management tool\nVersion: 10.0.19041.844\nImage Version: 10.0.19041.1083\n\nEnabling feature(s)\n[==========================100.0%==========================]\nThe operation completed successfully.\n```\n\nYou can learn more about DISM [here](https://docs.microsoft.com/en-us/windows-hardware/manufacture/desktop/what-is-dism).\n\n### Step 2: Enable the Virtual Machine Feature\n\nWithin PowerShell, enable Virtual Machine Platform support using DISM. If you have trouble enabling this feature, make sure that virtual machine support is enabled in your machine's BIOS.\n\n```powershell\nPS C:\\WINDOWS\\System32> dism.exe /online /enable-feature /featurename:VirtualMachinePlatform /all /norestart\nDeployment Image Servicing and Management tool\nVersion: 10.0.19041.844\nImage Version: 10.0.19041.1083\nEnabling feature(s)\n[==========================100.0%==========================]\nThe operation completed successfully.\n```\n\nAfter enabling the Virtual Machine Platform support, **restart your machine**.\n\n### Step 3: Download the Linux Kernel Update Package\n\nNextflow users will want to take advantage of the latest features in WSL 2. You can learn about differences between WSL 1 and WSL 2 [here](https://docs.microsoft.com/en-us/windows/wsl/compare-versions). Before you can enable support for WSL 2, you'll need to download the kernel update package at the link below:\n\n[WSL2 Linux kernel update package for x64 machines](https://wslstorestorage.blob.core.windows.net/wslblob/wsl_update_x64.msi)\n\nOnce downloaded, double click on the kernel update package and select \"Yes\" to install it with elevated permissions.\n\n### STEP 4: Set WSL2 as your Default Version\n\nFrom within PowerShell:\n\n```powershell\nPS C:\\WINDOWS\\System32> wsl --set-default-version 2\nFor information on key differences with WSL 2 please visit https://aka.ms/wsl2\n```\n\nIf you run into difficulties with any of these steps, Microsoft provides detailed installation instructions [here](https://docs.microsoft.com/en-us/windows/wsl/install-win10#manual-installation-steps).\n\n## Obtain and Install a Linux Distribution on WSL\n\nIf you normally install Linux on VM environments such as VirtualBox or VMware, this probably sounds like a lot of work. Fortunately, Microsoft provides Linux OS distributions via the Microsoft Store that work with the Windows Subsystem for Linux.\n\n- Use this link to access and download a Linux Distribution for WSL through the Microsoft Store - https://aka.ms/wslstore.\n\n ![Linux Distributions at the Microsoft Store](/img/ms-store.png)\n\n- We selected the Ubuntu 20.04 LTS release. You can use a different distribution if you choose. Installation from the Microsoft Store is automated. Once the Linux distribution is installed, you can run a shell on Ubuntu (or your installed OS) from the Windows Start menu.\n- When you start Ubuntu Linux for the first time, you will be prompted to provide a UNIX username and password. The username that you select can be distinct from your Windows username. The UNIX user that you create will automatically have `sudo` privileges. 
Whenever a shell is started, it will default to this user.\n- After setting your username and password, update your packages on Ubuntu from the Linux shell using the following command:\n\n ```bash\n sudo apt update && sudo apt upgrade\n ```\n\n- This is also a good time to add any additional Linux packages that you will want to use.\n\n ```bash\n sudo apt install net-tools\n ```\n\n## Install Windows Terminal\n\nWhile not necessary, it is a good idea to install [Windows Terminal](https://github.com/microsoft/terminal) at this point. When working with Nextflow, it is handy to interact with multiple command lines at the same time. For example, users may want to execute flows, monitor logfiles, and run Docker commands in separate windows.\n\nWindows Terminal provides an X-Windows-like experience on Windows. It helps organize your various command-line environments - Linux shell, Windows Command Prompt, PowerShell, AWS or Azure CLIs.\n\n![Windows Terminal](/img/windows-terminal.png)\n\nInstructions for downloading and installing Windows Terminal are available at: https://docs.microsoft.com/en-us/windows/terminal/get-started.\n\nIt is worth spending a few minutes getting familiar with available commands and shortcuts in Windows Terminal. Documentation is available at https://docs.microsoft.com/en-us/windows/terminal/command-line-arguments.\n\nSome Windows Terminal commands you'll need right away are provided below:\n\n- Split the active window vertically: SHIFT ALT =\n- Split the active window horizontally: SHIFT ALT \n- Resize the active window: SHIFT ALT ``\n- Open a new window under the current tab: ALT v (_the new tab icon along the top of the Windows Terminal interface_)\n\n## Installing Docker on Windows\n\nThere are two ways to install Docker for use with the WSL on Windows. One method is to install Docker directly on a hosted WSL Linux instance (Ubuntu in our case) and have the docker daemon run on the Linux kernel as usual. An installation recipe for people that choose this \"native Linux\" approach is provided [here](https://dev.to/bowmanjd/install-docker-on-windows-wsl-without-docker-desktop-34m9).\n\nA second method is to run [Docker Desktop](https://www.docker.com/products/docker-desktop) on Windows. While Docker is more commonly used in Linux environments, it can be used with Windows also. The Docker Desktop supports containers running on Windows and Linux instances running under WSL. Docker Desktop provides some advantages for Windows users:\n\n- The installation process is automated\n- Docker Desktop provides a Windows GUI for managing Docker containers and images (including Linux containers running under WSL)\n- Microsoft provides Docker Desktop integration features from within Visual Studio Code via a VS Code extension\n- Docker Desktop provides support for auto-installing a single-node Kubernetes cluster\n- The Docker Desktop WSL 2 back-end provides an elegant Linux integration such that from a Linux user's perspective, Docker appears to be running natively on Linux.\n\nAn explanation of how the Docker Desktop WSL 2 Back-end works is provided [here](https://www.docker.com/blog/new-docker-desktop-wsl2-backend/).\n\n### Step 1: Install Docker Desktop on Windows\n\n- Download and install Docker Desktop for Windows from the following link: https://desktop.docker.com/win/stable/amd64/Docker%20Desktop%20Installer.exe\n- Follow the on-screen prompts provided by the Docker Desktop Installer. 
The installation process will install Docker on Windows and install the Docker back-end components so that Docker commands are accessible from within WSL.\n- After installation, Docker Desktop can be run from the Windows start menu. The Docker Desktop user interface is shown below. Note that Docker containers launched under WSL can be managed from the Windows Docker Desktop GUI or Linux command line.\n- The installation process is straightforward, but if you run into difficulties, detailed instructions are available [here](https://docs.docker.com/docker-for-windows/install/).\n\n ![Nextflow Visual Studio Code Extension](/img/docker-images.png)\n\n The Docker Engineering team provides an architecture diagram explaining how Docker on Windows interacts with WSL. Additional details are available [here](https://code.visualstudio.com/blogs/2020/03/02/docker-in-wsl2).\n\n ![Nextflow Visual Studio Code Extension](/img/docker-windows-arch.png)\n\n### Step 2: Verify the Docker installation\n\nNow that Docker is installed, run a Docker container to verify that Docker and the Docker Integration Package on WSL 2 are working properly.\n\n- Run a Docker command from the Linux shell as shown below below. This command downloads a **centos** image from Docker Hub and allows us to interact with the container via an assigned pseudo-tty. Your Docker container may exit with exit code 139 when you run this and other Docker containers. If so, don't worry – an easy fix to this issue is provided shortly.\n\n ```console\n $ docker run -ti centos:6\n [root@02ac0beb2d2c /]# hostname\n 02ac0beb2d2c\n ```\n\n- You can run Docker commands in other Linux shell windows via the Windows Terminal environment to monitor and manage Docker containers and images. For example, running `docker ps` in another window shows the running CentOS Docker container.\n\n ```console\n $ docker ps\n CONTAINER ID IMAGE COMMAND CREATED STATUS NAMES\n f5dad42617f1 centos:6 \"/bin/bash\" 2 minutes ago Up 2 minutes \thappy_hopper\n ```\n\n### Step 3: Dealing with exit code 139\n\nYou may encounter exit code `139` when running Docker containers. This is a known problem when running containers with specific base images within Docker Desktop. Good explanations of the problem and solution are provided [here](https://dev.to/damith/docker-desktop-container-crash-with-exit-code-139-on-windows-wsl-fix-438) and [here](https://unix.stackexchange.com/questions/478387/running-a-centos-docker-image-on-arch-linux-exits-with-code-139).\n\nThe solution is to add two lines to a `.wslconfig` file in your Windows home directory. The `.wslconfig` file specifies kernel options that apply to all Linux distributions running under WSL 2.\n\nSome of the Nextflow container images served from Docker Hub are affected by this bug since they have older base images, so it is a good idea to apply this fix.\n\n- Edit the `.wslconfig` file in your Windows home directory. You can do this using PowerShell as shown:\n\n ```powershell\n PS C:\\Users\\ notepad .wslconfig\n ```\n\n- Add these two lines to the `.wslconfig` file and save it:\n\n ```ini\n [wsl2]\n kernelCommandLine = vsyscall=emulate\n ```\n\n- After this, **restart your machine** to force a restart of the Docker and WSL 2 environment. After making this correction, you should be able to launch containers without seeing exit code `139`.\n\n## Install Visual Studio Code as your IDE (optional)\n\nDevelopers can choose from a variety of IDEs depending on their preferences. 
Some examples of IDEs and developer-friendly editors are below:\n\n- Visual Studio Code - https://code.visualstudio.com/Download (Nextflow VSCode Language plug-in [here](https://github.com/nextflow-io/vscode-language-nextflow/blob/master/vsc-extension-quickstart.md))\n- Eclipse - https://www.eclipse.org/\n- VIM - https://www.vim.org/ (VIM plug-in for Nextflow [here](https://github.com/LukeGoodsell/nextflow-vim))\n- Emacs - https://www.gnu.org/software/emacs/download.html (Nextflow syntax highlighter [here](https://github.com/Emiller88/nextflow-mode))\n- JetBrains PyCharm - https://www.jetbrains.com/pycharm/\n- IntelliJ IDEA - https://www.jetbrains.com/idea/\n- Atom – https://atom.io/ (Nextflow Atom support available [here](https://atom.io/packages/language-nextflow))\n- Notepad++ - https://notepad-plus-plus.org/\n\nWe decided to install Visual Studio Code because it has some nice features, including:\n\n- Support for source code control from within the IDE (Git)\n- Support for developing on Linux via its WSL 2 Video Studio Code Backend\n- A library of extensions including Docker and Kubernetes support and extensions for Nextflow, including Nextflow language support and an [extension pack for the nf-core community](https://github.com/nf-core/vscode-extensionpack).\n\nDownload Visual Studio Code from https://code.visualstudio.com/Download and follow the installation procedure. The installation process will detect that you are running WSL. You will be invited to download and install the Remote WSL extension.\n\n- Within VS Code and other Windows tools, you can access the Linux file system under WSL 2 by accessing the path `\\\\wsl$\\`. In our example, the path from Windows to access files from the root of our Ubuntu Linux instance is: [**\\\\wsl$\\Ubuntu-20.04**](file://wsl$/Ubuntu-20.04).\n\nNote that the reverse is possible also – from within Linux, `/mnt/c` maps to the Windows C: drive. You can inspect `/etc/mtab` to see the mounted file systems available under Linux.\n\n- It is a good idea to install Nextflow language support in VS Code. You can do this by selecting the Extensions icon from the left panel of the VS Code interface and searching the extensions library for Nextflow as shown. The Nextflow language support extension is on GitHub at https://github.com/nextflow-io/vscode-language-nextflow\n\n ![Nextflow Visual Studio Code Extension](/img/nf-vscode-ext.png)\n\n## Visual Studio Code Remote Development\n\nVisual Studio Code Remote Development supports development on remote environments such as containers or remote hosts. For Nextflow users, it is important to realize that VS Code sees the Ubuntu instance we installed on WSL as a remote environment. The Diagram below illustrates how remote development works. From a VS Code perspective, the Linux instance in WSL is considered a remote environment.\n\nWindows users work within VS Code in the Windows environment. However, source code, developer tools, and debuggers all run Linux on WSL, as illustrated below.\n\n![The Remote Development Environment in VS Code](/img/vscode-remote-dev.png)\n\nAn explanation of how VS Code Remote Development works is provided [here](https://code.visualstudio.com/docs/remote/remote-overview).\n\nVS Code users see the Windows filesystem, plug-ins specific to VS Code on Windows, and access Windows versions of tools such as Git. 
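\n\nIf you want to double-check the file-system bridge from the Linux side, a quick sanity check is shown below (a minimal sketch; the exact mount entries will vary with your setup):\n\n```bash\n# Windows drives are exposed to WSL as drvfs/9p mounts\ngrep -E 'drvfs|9p' /etc/mtab\n\n# browse the Windows C: drive from the Linux shell\nls /mnt/c/Users\n```\n\n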
If you prefer to develop in Linux, you will want to select WSL as the remote environment.\n\nTo open a new VS Code Window running in the context of the WSL Ubuntu-20.04 environment, click the green icon at the lower left of the VS Code window and select _\"New WSL Window using Distro ..\"_ and select `Ubuntu 20.04`. You'll notice that the environment changes to show that you are working in the WSL: `Ubuntu-20.04` environment.\n\n![Selecting the Remote Dev Environment within VS Code](/img/remote-dev-side-by-side.png)\n\nSelecting the Extensions icon, you can see that different VS Code Marketplace extensions run in different contexts. The Nextflow Language extension installed in the previous step is globally available. It works when developing on Windows or developing on WSL: Ubuntu-20.04.\n\nThe Extensions tab in VS Code differentiates between locally installed plug-ins and those installed under WSL.\n\n![Local vs. Remote Extensions in VS Code](/img/vscode-extensions.png)\n\n## Installing Nextflow\n\nWith Linux, Docker, and an IDE installed, now we can install Nextflow in our WSL 2 hosted Linux environment. Detailed instructions for installing Nextflow are available at https://www.nextflow.io/docs/latest/getstarted.html#installation\n\n### Step 1: Make sure Java is installed (under WSL)\n\nJava is a prerequisite for running Nextflow. Instructions for installing Java on Ubuntu are available [here](https://linuxize.com/post/install-java-on-ubuntu-18-04/). To install the default OpenJDK, follow the instructions below in a Linux shell window:\n\n- Update the _apt_ package index:\n\n ```bash\n sudo apt update\n ```\n\n- Install the latest default OpenJDK package\n\n ```bash\n sudo apt install default-jdk\n ```\n\n- Verify the installation\n\n ```bash\n java -version\n ```\n\n### Step 2: Make sure curl is installed\n\n`curl` is a convenient way to obtain Nextflow. `curl` is included in the default Ubuntu repositories, so installation is straightforward.\n\n- From the shell:\n\n ```bash\n sudo apt update\n sudo apt install curl\n ```\n\n- Verify that `curl` works:\n\n ```console\n $ curl\n curl: try 'curl --help' or 'curl --manual' for more information\n ```\n\n### STEP 3: Download and install Nextflow\n\n- Use `curl` to retrieve Nextflow into a temporary directory and then install it in `/usr/bin` so that the Nextflow command is on your path:\n\n ```bash\n mkdir temp\n cd temp\n curl -s https://get.nextflow.io | bash\n sudo cp nextflow /usr/bin\n ```\n\n- Make sure that Nextflow is executable:\n\n ```bash\n sudo chmod 755 /usr/bin/nextflow\n ```\n\n or if you prefer:\n\n ```bash\n sudo chmod +x /usr/bin/nextflow\n ```\n\n### Step 4: Verify the Nextflow installation\n\n- Make sure Nextflow runs:\n\n ```console\n $ nextflow -version\n\n N E X T F L O W\n version 21.04.2 build 5558\n created 12-07-2021 07:54 UTC (03:54 EDT)\n cite doi:10.1038/nbt.3820\n http://nextflow.io\n ```\n\n- Run a simple Nextflow pipeline. The example below downloads and executes a sample hello world pipeline from GitHub - https://github.com/nextflow-io/hello.\n\n ```console\n $ nextflow run hello\n\n N E X T F L O W ~ version 21.04.2\n Launching `nextflow-io/hello` [distracted_pare] - revision: ec11eb0ec7 [master]\n executor > local (4)\n [06/c846d8] process > sayHello (3) [100%] 4 of 4 ✔\n Ciao world!\n\n Hola world!\n\n Bonjour world!\n\n Hello world!\n ```\n\n### Step 5: Run a Containerized Workflow\n\nTo validate that Nextflow works with containerized workflows, we can run a slightly more complicated example. 
A sample workflow involving NCBI Blast is available at https://github.com/nextflow-io/blast-example. Rather than installing Blast on our local Linux instance, it is much easier to pull a container preloaded with Blast and other software that the pipeline depends on.\n\nThe `nextflow.config` file for the Blast example (below) specifies that process logic is encapsulated in the container `nextflow/examples` available from Docker Hub (https://hub.docker.com/r/nextflow/examples).\n\n- On GitHub: [nextflow-io/blast-example/nextflow.config](https://github.com/nextflow-io/blast-example/blob/master/nextflow.config)\n\n ```groovy\n manifest {\n nextflowVersion = '>= 20.01.0'\n }\n\n process {\n container = 'nextflow/examples'\n }\n ```\n\n- Run the _blast-example_ pipeline that resides on GitHub directly from WSL and specify Docker as the container runtime using the command below:\n\n ```console\n $ nextflow run blast-example -with-docker\n N E X T F L O W ~ version 21.04.2\n Launching `nextflow-io/blast-example` [sharp_raman] - revision: 25922a0ae6 [master]\n executor > local (2)\n [aa/a9f056] process > blast (1) [100%] 1 of 1 ✔\n [b3/c41401] process > extract (1) [100%] 1 of 1 ✔\n matching sequences:\n >lcl|1ABO:B unnamed protein product\n MNDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNS\n >lcl|1ABO:A unnamed protein product\n MNDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNS\n >lcl|1YCS:B unnamed protein product\n PEITGQVSLPPGKRTNLRKTGSERIAHGMRVKFNPLPLALLLDSSLEGEFDLVQRIIYEVDDPSLPNDEGITALHNAVCA\n GHTEIVKFLVQFGVNVNAADSDGWTPLHCAASCNNVQVCKFLVESGAAVFAMTYSDMQTAADKCEEMEEGYTQCSQFLYG\n VQEKMGIMNKGVIYALWDYEPQNDDELPMKEGDCMTIIHREDEDEIEWWWARLNDKEGYVPRNLLGLYPRIKPRQRSLA\n >lcl|1IHD:C unnamed protein product\n LPNITILATGGTIAGGGDSATKSNYTVGKVGVENLVNAVPQLKDIANVKGEQVVNIGSQDMNDNVWLTLAKKINTDCDKT\n ```\n\n- Nextflow executes the pipeline directly from the GitHub repository and automatically pulls the nextflow/examples container from Docker Hub if the image is unavailable locally. The pipeline then executes the two containerized workflow steps (blast and extract). The pipeline then collects the sequences into a single file and prints the result file content when pipeline execution completes.\n\n## Configuring an XServer for the Nextflow Console\n\nPipeline developers will probably want to use the Nextflow Console at some point. The Nextflow Console's REPL (read-eval-print loop) environment allows developers to quickly test parts of scripts or Nextflow code segments interactively.\n\nThe Nextflow Console is launched from the Linux command line. However, the Groovy-based interface requires an X-Windows environment to run. You can set up X-Windows with WSL using the procedure below. A good article on this same topic is provided [here](https://medium.com/javarevisited/using-wsl-2-with-x-server-linux-on-windows-a372263533c3).\n\n- Download an X-Windows server for Windows. In this example, we use the _VcXsrv Windows X Server_ available from source forge at https://sourceforge.net/projects/vcxsrv/.\n\n- Accept all the defaults when running the automated installer. The X-server will end up installed in `c:\\Program Files\\VcXsrv`.\n\n- The automated installation of VcXsrv will create an _\"XLaunch\"_ shortcut on your desktop. 
It is a good idea to create your own shortcut with a customized command line so that you don't need to interact with the XLaunch interface every time you start the X-server.\n\n- Right-click on the Windows desktop to create a new shortcut, give it a meaningful name, and insert the following for the shortcut target:\n\n ```powershell\n \"C:\\Program Files\\VcXsrv\\vcxsrv.exe\" :0 -ac -terminate -lesspointer -multiwindow -clipboard -wgl -dpi auto\n ```\n\n- Inspecting the new shortcut properties, it should look something like this:\n\n ![X-Server (vcxsrc) Properties](/img/xserver.png)\n\n- Double-click on the new shortcut desktop icon to test it. Unfortunately, the X-server runs in the background. When running the X-server in multiwindow mode (which we recommend), it is not obvious whether the X-server is running.\n\n- One way to check that the X-server is running is to use the Microsoft Task Manager and look for the XcSrv process running in the background. You can also verify it is running by using the `netstat` command from with PowerShell on Windows to ensure that the X-server is up and listening on the appropriate ports. Using `netstat`, you should see output like the following:\n\n ```powershell\n PS C:\\WINDOWS\\system32> **netstat -abno | findstr 6000**\n TCP 0.0.0.0:6000 0.0.0.0:0 LISTENING 35176\n TCP 127.0.0.1:6000 127.0.0.1:56516 ESTABLISHED 35176\n TCP 127.0.0.1:6000 127.0.0.1:56517 ESTABLISHED 35176\n TCP 127.0.0.1:6000 127.0.0.1:56518 ESTABLISHED 35176\n TCP 127.0.0.1:56516 127.0.0.1:6000 ESTABLISHED 35176\n TCP 127.0.0.1:56517 127.0.0.1:6000 ESTABLISHED 35176\n TCP 127.0.0.1:56518 127.0.0.1:6000 ESTABLISHED 35176\n TCP 172.28.192.1:6000 172.28.197.205:46290 TIME_WAIT 0\n TCP [::]:6000 [::]:0 LISTENING 35176\n ```\n\n- At this point, the X-server is up and running and awaiting a connection from a client.\n\n- Within Ubuntu in WSL, we need to set up the environment to communicate with the X-Windows server. The shell variable DISPLAY needs to be set pointing to the IP address of the X-server and the instance of the X-windows server.\n\n- The shell script below will set the DISPLAY variable appropriately and export it to be available to X-Windows client applications launched from the shell. This scripting trick works because WSL sees the Windows host as the nameserver and this is the same IP address that is running the X-Server. You can echo the $DISPLAY variable after setting it to verify that it is set correctly.\n\n ```console\n $ export DISPLAY=$(cat /etc/resolv.conf | grep nameserver | awk '{print $2}'):0.0\n $ echo $DISPLAY\n 172.28.192.1:0.0\n ```\n\n- Add this command to the end of your `.bashrc` file in the Linux home directory to avoid needing to set the DISPLAY variable every time you open a new window. This way, if the IP address of the desktop or laptop changes, the DISPLAY variable will be updated accordingly.\n\n ```bash\n cd ~\n vi .bashrc\n ```\n\n ```bash\n # set the X-Windows display to connect to VcXsrv on Windows\n export DISPLAY=$(cat /etc/resolv.conf | grep nameserver | awk '{print $2}'):0.0\n \".bashrc\" 120L, 3912C written\n ```\n\n- Use an X-windows client to make sure that the X- server is working. 
Since X-windows clients are not installed by default, download an xterm client as follows via the Linux shell:\n\n ```bash\n sudo apt install xterm\n ```\n\n- Assuming that the X-server is up and running on Windows, and the Linux DISPLAY variable is set correctly, you're ready to test X-Windows.\n\n Before testing X-Windows, do yourself a favor and temporarily disable the Windows Firewall. The Windows Firewall will very likely block ports around 6000, preventing client requests on WSL from connecting to the X-server. You can find this under Firewall & network protection on Windows. Clicking the \"Private Network\" or \"Public Network\" options will show you the status of the Windows Firewall and indicate whether it is on or off.\n\n Depending on your installation, you may be running a specific Firewall. In this example, we temporarily disable the McAfee LiveSafe Firewall as shown:\n\n ![Ensure that the Firewall is not interfering](/img/firewall.png)\n\n- With the Firewall disabled, you can attempt to launch the xterm client from the Linux shell:\n\n ```bash\n xterm &\n ```\n\n- If everything is working correctly, you should see the new xterm client appear under Windows. The xterm is executing on Ubuntu under WSL but displays alongside other Windows on the Windows desktop. This is what is meant by \"multiwindow\" mode.\n\n ![Launch an xterm to verify functionality](/img/xterm.png)\n\n- Now that you know X-Windows is working correctly turn the Firewall back on, and adjust the settings to allow traffic to and from the required port. Ideally, you want to open only the minimal set of ports and services required. In the case of the McAfee Firewall, getting X-Windows to work required changing access to incoming and outgoing ports to _\"Open ports to Work and Home networks\"_ for the `vcxsrv.exe` program only as shown:\n\n ![Allowing access to XServer traffic](/img/xserver_setup.png)\n\n- With the X-server running, the `DISPLAY` variable set, and the Windows Firewall configured correctly, we can now launch the Nextflow Console from the shell as shown:\n\n ```bash\n nextflow console\n ```\n\n The command above opens the Nextflow REPL console under X-Windows.\n\n ![Nextflow REPL Console under X-Windows](/img/repl_console.png)\n\nInside the Nextflow console, you can enter Groovy code and run it interactively, a helpful feature when developing and debugging Nextflow pipelines.\n\n# Installing Git\n\nCollaborative source code management systems such as BitBucket, GitHub, and GitLab are used to develop and share Nextflow pipelines. To be productive with Nextflow, you will want to install Git.\n\nAs explained earlier, VS Code operates in different contexts. When running VS Code in the context of Windows, VS Code will look for a local copy of Git. When using VS Code to operate against the remote WSL environment, a separate installation of Git installed on Ubuntu will be used. (Note that Git is installed by default on Ubuntu 20.04)\n\nDevelopers will probably want to use Git both from within a Windows context and a Linux context, so we need to make sure that Git is present in both environments.\n\n### Step 1: Install Git on Windows (optional)\n\n- Download the install the 64-bit Windows version of Git from https://git-scm.com/downloads.\n\n- Click on the Git installer from the Downloads directory, and click through the default installation options. During the install process, you will be asked to select the default editor to be used with Git. (VIM, Notepad++, etc.). 
Select Visual Studio Code (assuming that this is the IDE that you plan to use for Nextflow).\n\n ![Installing Git on Windows](/img/git-install.png)\n\n- The Git installer will prompt you for additional settings. If you are not sure, accept the defaults. When asked, adjust the `PATH` variable to use the recommended option, making the Git command line available from Git Bash, the Command Prompt, and PowerShell.\n\n- After installation Git Bash, Git GUI, and GIT CMD will appear as new entries under the Start menu. If you are running Git from PowerShell, you will need to open a new Windows to force PowerShell to reset the path variable. By default, Git installs in C:\\Program Files\\Git.\n\n- If you plan to use Git from the command line, GitHub provides a useful cheatsheet [here](https://training.github.com/downloads/github-git-cheat-sheet.pdf).\n\n- After installing Git, from within VS Code (in the context of the local host), select the Source Control icon from the left pane of the VS Code interface as shown. You can open local folders that contain a git repository or clone repositories from GitHub or your preferred source code management system.\n\n ![Using Git within VS Code](/img/git-vscode.png)\n\n- Documentation on using Git with Visual Studio Code is provided at https://code.visualstudio.com/docs/editor/versioncontrol\n\n### Step 2: Install Git on Linux\n\n- Open a Remote VS Code Window on **\\*WSL: Ubuntu 20.04\\*** (By selecting the green icon on the lower-left corner of the VS code interface.)\n\n- Git should already be installed in `/usr/bin`, but you can validate this from the Ubuntu shell:\n\n ```console\n $ git --version\n git version 2.25.1\n ```\n\n- To get started using Git with VS Code Remote on WSL, select the _Source Control icon_ on the left panel of VS code. Assuming VS Code Remote detects that Git is installed on Linux, you should be able to _Clone a Repository_.\n\n- Select \"Clone Repository,\" and when prompted, clone the GitHub repo for the Blast example that we used earlier - https://github.com/nextflow-io/blast-example. Clone this repo into your home directory on Linux. You should see _blast-example_ appear as a source code repository within VS code as shown:\n\n ![Using Git within VS Code](/img/git-linux-1.png)\n\n- Select the _Explorer_ panel in VS Code to see the cloned _blast-example_ repo. Now we can explore and modify the pipeline code using the IDE.\n\n ![Using Git within VS Code](/img/git-linux-2.png)\n\n- After making modifications to the pipeline, we can execute the _local copy_ of the pipeline either from the Linux shell or directly via the Terminal window in VS Code as shown:\n\n ![Using Git within VS Code](/img/git-linux-3.png)\n\n- With the Docker VS Code extension, users can select the Docker icon from the left code to view containers and images associated with the Nextflow pipeline.\n\n- Git commands are available from within VS Code by selecting the _Source Control_ icon on the left panel and selecting the three dots (…) to the right of SOURCE CONTROL. Some operations such as pushing or committing code will require that VS Code be authenticated with your GitHub credentials.\n\n ![Using Git within VS Code](/img/git-linux-4.png)\n\n## Summary\n\nWith WSL2, Windows 10 is an excellent environment for developing and testing Nextflow pipelines. 
Users can take advantage of the power and convenience of a Linux command line environment while using Windows-based IDEs such as VS-Code with full support for containers.\n\nPipelines developed in the Windows environment can easily be extended to compute environments in the cloud.\n\nWhile installing Nextflow itself is straightforward, installing and testing necessary components such as WSL, Docker, an IDE, and Git can be a little tricky. Hopefully readers will find this guide helpful.\n", + "images": [] + }, + { + "slug": "2022/caching-behavior-analysis", + "title": "Analyzing caching behavior of pipelines", + "date": "2022-11-10T00:00:00.000Z", + "content": "\nThe ability to resume an analysis (i.e. caching) is one of the core strengths of Nextflow. When developing pipelines, this allows us to avoid re-running unchanged processes by simply appending `-resume` to the `nextflow run` command. Sometimes, tasks may be repeated for reasons that are unclear. In these cases it can help to look into the caching mechanism, to understand why a specific process was re-run.\n\nWe have previously written about Nextflow's [resume functionality](https://www.nextflow.io/blog/2019/demystifying-nextflow-resume.html) as well as some [troubleshooting strategies](https://www.nextflow.io/blog/2019/troubleshooting-nextflow-resume.html) to gain more insights on the caching behavior.\n\nIn this post, we will take a more hands-on approach and highlight some strategies which we can use to understand what is causing a particular process (or processes) to re-run, instead of using the cache from previous runs of the pipeline. To demonstrate the process, we will introduce a minor change into one of the process definitions in the the [nextflow-io/rnaseq-nf](https://github.com/nextflow-io/rnaseq-nf) pipeline and investigate how it affects the overall caching behavior when compared to the initial execution of the pipeline.\n\n### Local setup for the test\n\nFirst, we clone the [nextflow-io/rnaseq-nf](https://github.com/nextflow-io/rnaseq-nf) pipeline locally:\n\n```bash\n$ git clone https://github.com/nextflow-io/rnaseq-nf\n$ cd rnaseq-nf\n```\n\nIn the examples below, we have used Nextflow `v22.10.0`, Docker `v20.10.8` and `Java v17 LTS` on MacOS.\n\n### Pipeline flowchart\n\nThe flowchart below can help in understanding the design of the pipeline and the dependencies between the various tasks.\n\n![rnaseq-nf](/img/rnaseq-nf.base.png)\n\n### Logs from initial (fresh) run\n\nAs a reminder, Nextflow generates a unique task hash, e.g. 22/7548fa… for each task in a workflow. The hash takes into account the complete file path, the last modified timestamp, container ID, content of script directive among other factors. If any of these change, the task will be re-executed. Nextflow maintains a list of task hashes for caching and traceability purposes. You can learn more about task hashes in the article [Troubleshooting Nextflow resume](https://nextflow.io/blog/2019/troubleshooting-nextflow-resume.html).\n\nTo have something to compare to, we first need to generate the initial hashes for the unchanged processes in the pipeline. We save these in a file called `fresh_run.log` and use them later on as \"ground-truth\" for the analysis. 
In order to save the process hashes we use the `-dump-hashes` flag, which prints them to the log.\n\n**TIP:** We rely upon the [`-log` option](https://www.nextflow.io/docs/latest/cli.html#execution-logs) in the `nextflow` command line interface to be able to supply a custom log file name instead of the default `.nextflow.log`.\n\n```console\n$ nextflow -log fresh_run.log run ./main.nf -profile docker -dump-hashes\n\n[...truncated…]\nexecutor > local (4)\n[d5/57c2bb] process > RNASEQ:INDEX (ggal_1_48850000_49020000) [100%] 1 of 1 ✔\n[25/433b23] process > RNASEQ:FASTQC (FASTQC on ggal_gut) [100%] 1 of 1 ✔\n[03/23372f] process > RNASEQ:QUANT (ggal_gut) [100%] 1 of 1 ✔\n[38/712d21] process > MULTIQC [100%] 1 of 1 ✔\n[...truncated…]\n```\n\n### Edit the `FastQC` process\n\nAfter the initial run of the pipeline, we introduce a change in the `fastqc.nf` module, hard coding the number of threads which should be used to run the `FASTQC` process via Nextflow's [`cpus` directive](https://www.nextflow.io/docs/latest/process.html#cpus).\n\nHere's the output of `git diff` on the contents of `modules/fastqc/main.nf` file:\n\n```diff\n--- a/modules/fastqc/main.nf\n+++ b/modules/fastqc/main.nf\n@@ -4,6 +4,7 @@ process FASTQC {\n tag \"FASTQC on $sample_id\"\n conda 'bioconda::fastqc=0.11.9'\n publishDir params.outdir, mode:'copy'\n+ cpus 2\n\n input:\n tuple val(sample_id), path(reads)\n@@ -13,6 +14,6 @@ process FASTQC {\n\n script:\n \"\"\"\n- fastqc.sh \"$sample_id\" \"$reads\"\n+ fastqc.sh \"$sample_id\" \"$reads\" -t ${task.cpus}\n \"\"\"\n }\n```\n\n### Logs from the follow up run\n\nNext, we run the pipeline again with the `-resume` option, which instructs Nextflow to rely upon the cached results from the previous run and only run the parts of the pipeline which have changed. As before, we instruct Nextflow to dump the process hashes, this time in a file called `resumed_run.log`.\n\n```console\n$ nextflow -log resumed_run.log run ./main.nf -profile docker -dump-hashes -resume\n\n[...truncated…]\nexecutor > local\n[d5/57c2bb] process > RNASEQ:INDEX (ggal_1_48850000_49020000) [100%] 1 of 1, cached: 1 ✔\n[55/15b609] process > RNASEQ:FASTQC (FASTQC on ggal_gut) [100%] 1 of 1 ✔\n[03/23372f] process > RNASEQ:QUANT (ggal_gut) [100%] 1 of 1, cached: 1 ✔\n[f3/f1ccb4] process > MULTIQC [100%] 1 of 1 ✔\n[...truncated…]\n```\n\n## Analysis of cache hashes\n\nFrom the summary of the command line output above, we can see that the `RNASEQ:FASTQC (FASTQC on ggal_gut)` and `MULTIQC` processes were re-run while the others were cached. To understand why, we can examine the hashes generated by the processes from the logs of the `fresh_run` and `resumed_run`.\n\nFor the analysis, we need to keep in mind that:\n\n1. The time-stamps are expected to differ and can be safely ignored to narrow down the `grep` pattern to the Nextflow `TaskProcessor` class.\n\n2. The _order_ of the log entries isn't fixed, due to the nature of the underlying parallel computation dataflow model used by Nextflow. For example, in our example below, `FASTQC` ran first in `fresh_run.log` but wasn’t the first logged process in `resumed_run.log`.\n\n### Find the process level hashes\n\nWe can use standard Unix tools like `grep`, `cut` and `sort` to address these points and filter out the relevant information:\n\n1. Use `grep` to isolate log entries with `cache hash` string\n2. Remove the prefix time-stamps using `cut -d ‘-’ -f 3`\n3. Remove the caching mode related information using `cut -d ';' -f 1`\n4. 
Sort the lines based on process names using `sort` to have a standard order before comparison\n5. Use `tee` to print the resultant strings to the terminal and simultaneously save to a file\n\nNow, let’s apply these transformations to the `fresh_run.log` as well as `resumed_run.log` entries.\n\n- `fresh_run.log`\n\n```console\n$ cat ./fresh_run.log | grep 'INFO.*TaskProcessor.*cache hash' | cut -d '-' -f 3 | cut -d ';' -f 1 | sort | tee ./fresh_run.tasks.log\n\n [MULTIQC] cache hash: 167d7b39f7efdfc49b6ff773f081daef\n [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 47e8c58d92dbaafba3c2ccc4f89f53a4\n [RNASEQ:INDEX (ggal_1_48850000_49020000)] cache hash: ac8be293e1d57f3616cdd0adce34af6f\n [RNASEQ:QUANT (ggal_gut)] cache hash: d8b88e3979ff9fe4bf64b4e1bfaf4038\n```\n\n- `resumed_run.log`\n\n```console\n$ cat ./resumed_run.log | grep 'INFO.*TaskProcessor.*cache hash' | cut -d '-' -f 3 | cut -d ';' -f 1 | sort | tee ./resumed_run.tasks.log\n\n [MULTIQC] cache hash: d3f200c56cf00b223282f12f06ae8586\n [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 92478eeb3b0ff210ebe5a4f3d99aed2d\n [RNASEQ:INDEX (ggal_1_48850000_49020000)] cache hash: ac8be293e1d57f3616cdd0adce34af6f\n [RNASEQ:QUANT (ggal_gut)] cache hash: d8b88e3979ff9fe4bf64b4e1bfaf4038\n```\n\n### Inference from process top-level hashes\n\nComputing a hash is a multi-step process and various factors contribute to it such as the inputs of the process, platform, time-stamps of the input files and more ( as explained in [Demystifying Nextflow resume](https://www.nextflow.io/blog/2019/demystifying-nextflow-resume.html) blog post) . The change we made in the task level CPUs directive and script section of the `FASTQC` process triggered a re-computation of hashes:\n\n```diff\n--- ./fresh_run.tasks.log\n+++ ./resumed_run.tasks.log\n@@ -1,4 +1,4 @@\n- [MULTIQC] cache hash: dccabcd012ad86e1a2668e866c120534\n- [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 94be8c84f4bed57252985e6813bec401\n+ [MULTIQC] cache hash: c5a63560338596282682cc04ff97e436\n+ [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 54aa712db7c8248e7f31d5fb6535ff9d\n [RNASEQ:INDEX (ggal_1_48850000_49020000)] cache hash: 356aaa7524fb071f258480ba07c67b3c\n [RNASEQ:QUANT (ggal_gut)] cache hash: 169ced0fc4b047eaf91cd31620b22540\n\n\n```\n\nEven though we only introduced changes in `FASTQC`, the `MULTIQC` process was re-run since it relies upon the output of the `FASTQC` process. 
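\n\nFor reference, a comparison like the one above can be produced directly from the two files saved with `tee`, for example with a unified diff (assuming the file names used earlier):\n\n```bash\ndiff -u ./fresh_run.tasks.log ./resumed_run.tasks.log\n```\n\n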
Any task that has its cache hash invalidated triggers a rerun of all downstream steps:\n\n![rnaseq-nf after modification](/img/rnaseq-nf.modified.png)\n\n### Understanding why `FASTQC` was re-run\n\nWe can see the full list of `FASTQC` process hashes within the `fresh_run.log` file\n\n```console\n\n[...truncated…]\nNov-03 20:19:13.827 [Actor Thread 6] INFO nextflow.processor.TaskProcessor - [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 54aa712db7c8248e7f31d5fb6535ff9d; mode: STANDARD; entries:\n 1a0e496fef579b22998f099981b494f9 [java.util.UUID] a11bf24f-638a-42d6-8b50-48d3be637d54\n 195c7faea83c75f2340eb710d8486d2a [java.lang.String] RNASEQ:FASTQC\n 2bea0eee5e384bd6082a173772e939eb [java.lang.String] \"\"\"\n fastqc.sh \"$sample_id\" \"$reads\" -t ${task.cpus}\n \"\"\"\n\n 8e58c0cec3bde124d5d932c7f1579395 [java.lang.String] quay.io/nextflow/rnaseq-nf:v1.1\n 7ec7cbd71ff757f5fcdbaa760c9ce6de [java.lang.String] sample_id\n 16b4905b1545252eb7cbfe7b2a20d03d [java.lang.String] ggal_gut\n 553096c532e666fb42214fdf0520fe4a [java.lang.String] reads\n 6a5d50e32fdb3261e3700a30ad257ff9 [nextflow.util.ArrayBag] [FileHolder(sourceObj:/home/abhinav/rnaseq-nf/data/ggal/ggal_gut_1.fq, storePath:/home/abhinav/rnaseq-nf/data/ggal/ggal_gut_1.fq, stageName:ggal_gut_1.fq), FileHolder(sourceObj:/home/abhinav/rnaseq-nf/data/ggal/ggal_gut_2.fq, storePath:/home/abhinav/rnaseq-nf/data/ggal/ggal_gut_2.fq, stageName:ggal_gut_2.fq)]\n 4f9d4b0d22865056c37fb6d9c2a04a67 [java.lang.String] $\n 16fe7483905cce7a85670e43e4678877 [java.lang.Boolean] true\n 80a8708c1f85f9e53796b84bd83471d3 [java.util.HashMap$EntrySet] [task.cpus=2]\n f46c56757169dad5c65708a8f892f414 [sun.nio.fs.UnixPath] /home/abhinav/rnaseq-nf/bin/fastqc.sh\n[...truncated…]\n\n```\n\nWhen we isolate and compare the log entries for `FASTQC` between `fresh_run.log` and `resumed_run.log`, we see the following diff:\n\n```diff\n--- ./fresh_run.fastqc.log\n+++ ./resumed_run.fastqc.log\n@@ -1,8 +1,8 @@\n-INFO nextflow.processor.TaskProcessor - [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 94be8c84f4bed57252985e6813bec401; mode: STANDARD; entries:\n+INFO nextflow.processor.TaskProcessor - [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 54aa712db7c8248e7f31d5fb6535ff9d; mode: STANDARD; entries:\n 1a0e496fef579b22998f099981b494f9 [java.util.UUID] a11bf24f-638a-42d6-8b50-48d3be637d54\n 195c7faea83c75f2340eb710d8486d2a [java.lang.String] RNASEQ:FASTQC\n- 43e5a23fc27129f92a6c010823d8909b [java.lang.String] \"\"\"\n- fastqc.sh \"$sample_id\" \"$reads\"\n+ 2bea0eee5e384bd6082a173772e939eb [java.lang.String] \"\"\"\n+ fastqc.sh \"$sample_id\" \"$reads\" -t ${task.cpus}\n\n```\n\nObservations from the diff:\n\n1. We can see that the content of the script has changed, highlighting the new `$task.cpus` part of the command.\n2. 
There is a new entry in the `resumed_run.log` showing that the content of the process level directive `cpus` has been added.\n\nIn other words, the diff from log files is confirming our edits.\n\n### Understanding why `MULTIQC` was re-run\n\nNow, we apply the same analysis technique for the `MULTIQC` process in both log files:\n\n```diff\n--- ./fresh_run.multiqc.log\n+++ ./resumed_run.multiqc.log\n@@ -1,4 +1,4 @@\n-INFO nextflow.processor.TaskProcessor - [MULTIQC] cache hash: dccabcd012ad86e1a2668e866c120534; mode: STANDARD; entries:\n+INFO nextflow.processor.TaskProcessor - [MULTIQC] cache hash: c5a63560338596282682cc04ff97e436; mode: STANDARD; entries:\n 1a0e496fef579b22998f099981b494f9 [java.util.UUID] a11bf24f-638a-42d6-8b50-48d3be637d54\n cd584abbdbee0d2cfc4361ee2a3fd44b [java.lang.String] MULTIQC\n 56bfc44d4ed5c943f30ec98b22904eec [java.lang.String] \"\"\"\n@@ -9,8 +9,9 @@\n\n 8e58c0cec3bde124d5d932c7f1579395 [java.lang.String] quay.io/nextflow/rnaseq-nf:v1.1\n 14ca61f10a641915b8c71066de5892e1 [java.lang.String] *\n- cd0e6f1a382f11f25d5cef85bd87c3f4 [nextflow.util.ArrayBag] [FileHolder(sourceObj:/home/abhinav/rnaseq-nf/work/03/23372f156e80deb4d7183c5f509274/ggal_gut, storePath:/home/abhinav/rnaseq-nf/work/03/23372f156e80deb4d7183c5f509274/ggal_gut, stageName:ggal_gut), FileHolder(sourceObj:/home/abhinav/rnaseq-nf/work/25/433b23af9e98294becade95db6bd76/fastqc_ggal_gut_logs, storePath:/home/abhinav/rnaseq-nf/work/25/433b23af9e98294becade95db6bd76/fastqc_ggal_gut_logs, stageName:fastqc_ggal_gut_logs)]\n+ 18966b473f7bdb07f4f7f4c8445be1f5 [nextflow.util.ArrayBag] [FileHolder(sourceObj:/home/abhinav/rnaseq-nf/work/03/23372f156e80deb4d7183c5f509274/ggal_gut, storePath:/home/abhinav/rnaseq-nf/work/03/23372f156e80deb4d7183c5f509274/ggal_gut, stageName:ggal_gut), FileHolder(sourceObj:/home/abhinav/rnaseq-nf/work/55/15b60995682daf79ecb64bcbb8e44e/fastqc_ggal_gut_logs, storePath:/home/abhinav/rnaseq-nf/work/55/15b60995682daf79ecb64bcbb8e44e/fastqc_ggal_gut_logs, stageName:fastqc_ggal_gut_logs)]\n d271b8ef022bbb0126423bf5796c9440 [java.lang.String] config\n 5a07367a32cd1696f0f0054ee1f60e8b [nextflow.util.ArrayBag] [FileHolder(sourceObj:/home/abhinav/rnaseq-nf/multiqc, storePath:/home/abhinav/rnaseq-nf/multiqc, stageName:multiqc)]\n 4f9d4b0d22865056c37fb6d9c2a04a67 [java.lang.String] $\n 16fe7483905cce7a85670e43e4678877 [java.lang.Boolean] true\n```\n\nHere, the highlighted diffs show the directory of the input files, changing as a result of `FASTQC` being re-run; as a result `MULTIQC` has a new hash and has to be re-run as well.\n\n## Conclusion\n\nDebugging the caching behavior of a pipeline can be tricky, however a systematic analysis can help to uncover what is causing a particular process to be re-run.\n\nWhen analyzing large datasets, it may be worth using the `-dump-hashes` option by default for all pipeline runs, avoiding needing to run the pipeline again to obtain the hashes in the log file in case of problems.\n\nWhile this process works, it is not trivial. We would love to see some community-driven tooling for a better cache-debugging experience for Nextflow, perhaps an `nf-cache` plugin? Stay tuned for an upcoming blog post describing how to extend and add new functionality to Nextflow using plugins.\n", + "images": [] + }, + { + "slug": "2022/czi-mentorship-round-1", + "title": "Nextflow and nf-core mentorship, Round 1", + "date": "2022-09-18T00:00:00.000Z", + "content": "\n## Introduction\n\n
\n![Word cloud of scientific interest keywords](/img/mentorships-round1-wordcloud.png)\n_Word cloud of scientific interest keywords, averaged across all applications._\n
\n\nOur recent [The State of the Workflow 2022: Community Survey Results](https://seqera.io/blog/state-of-the-workflow-2022-results/) showed that Nextflow and nf-core have a strong global community with a high level of engagement in several countries. As the community continues to grow, we aim to prioritize inclusivity for everyone through active outreach to groups with low representation.\n\nThanks to funding from our Chan Zuckerberg Initiative Diversity and Inclusion grant we established an international Nextflow and nf-core mentoring program with the aim of empowering those from underrepresented groups. With the first round of the mentorship now complete, we look back at the success of the program so far.\n\nFrom almost 200 applications, five pairs of mentors and mentees were selected for the first round of the program. Over the following four months they met weekly to work on Nextflow based projects. We attempted to pair mentors and mentees based on their time zones and scientific interests. Project tasks were left up to the individuals and so tailored to the mentee's scientific interests and schedules.\n\nPeople worked on things ranging from setting up Nextflow and nf-core on their institutional clusters to developing and implementing Nextflow and nf-core pipelines for next-generation sequencing data. Impressively, after starting the program knowing very little about Nextflow and nf-core, mentees finished the program being able to confidently develop and implement scalable and reproducible scientific workflows.\n\n![Map of mentor / mentee pairs](/img/mentorships-round1-map.png)
\n_The mentorship program was worldwide._\n\n## Ndeye Marième Top (mentee) & John Juma (mentor)\n\nFor the mentorship, Marième wanted to set up Nextflow and nf-core on the servers at the Institut Pasteur de Dakar in Senegal and learn how to develop and contribute to a pipeline. Her mentor was John Juma, from the ILRI/SANBI in Kenya.\n\nTogether, Marième overcame issues with containers and server privileges and developed her local config, learning how to troubleshoot and where to find help along the way. By the end of the mentorship she was able to set up the [nf-core/viralrecon](https://nf-co.re/viralrecon) pipeline for the genomic surveillance analysis of SARS-CoV-2 sequencing data from Senegal as well as 17 other countries in West Africa, ready for submission to [GISAID](https://gisaid.org/). She also got up to speed with the [nf-core/mag](https://nf-co.re/mag) pipeline for metagenomic analysis.\n\n
\n> \"Having someone experienced who can guide you in my learning process. My mentor really helped me understand and focus on the practical aspects since my main concern was having the pipelines correctly running in my institution.\" - Marième Top (mentee)\n\n> \"The program was awesome. I had a chance to impart nextflow principles to someone I have never met before. Fully virtual, the program instilled some sense of discipline in terms of setting and meeting objectives.\" - John Juma (mentor)\n
\n\n## Philip Ashton (mentee) & Robert Petit (mentor)\n\nPhilip wanted to move up the Nextflow learning curve and set up nf-core workflows at Kamuzu University of Health Sciences in Malawi. His mentor was Robert Petit from the Wyoming Public Health Laboratory in the USA. Robert has developed the [Bactopia](https://bactopia.github.io/) pipeline for the analysis of bacterial pipeline and it was Philip’s aim to get this running for his group in Malawi.\n\nRobert helped Philip learn Nextflow, enabling him to independently deploy DSL2 pipelines and process genomes using Nextflow Tower. Philip is already using his new found skills to answer important public health questions in Malawi and is now passing his knowledge to other staff and students at his institute. Even though the mentorship program has finished, Philip and Rob will continue a collaboration and have plans to deploy pipelines that will benefit public health in the future.\n\n
\n> \"I tried to learn nextflow independently some time ago, but abandoned it for the more familiar snakemake. Thanks to Robert’s mentorship I’m now over the learning curve and able to deploy nf-core pipelines and use cloud resources more efficiently via Nextflow Tower.\" - Phil Ashton (mentee)\n\n> \"I found being a mentor to be a rewarding experience and a great opportunity to introduce mentees into the Nextflow/nf-core community. Phil and I were able to accomplish a lot in the span of a few months, and now have many plans to collaborate in the future.\" - Robert Petit (mentor)\n
\n\n## Kalayanee Chairat (mentee) & Alison Meynert (mentor)\n\nKalayanee’s goal for the mentorship program was to set up and run Nextflow and nf-core pipelines at the local infrastructure at the King Mongkut’s University of Technology Thonburi in Thailand. Kalayanee was mentored by Alison Meynert, from the University of Edinburgh in the United Kingdom.\n\nWorking with Alison, Kalayanee learned about Nextflow and nf-core and the requirements for working with Slurm and Singularity. Together, they created a configuration profile that Kalayanee and others at her institute can use - they have plans to submit this to [nf-core/configs](https://github.com/nf-core/configs) as an institutional profile. Now she is familiar with these tools, Kalayanee is using [nf-core/sarek](https://nf-co.re/sarek) and [nf-core/rnaseq](https://nf-co.re/rnaseq) to analyze 100s of samples of her own next-generation sequencing data on her local HPC environment.\n\n
\n> \"The mentorship program is a great start to learn to use and develop analysis pipelines built using Nextflow. I gained a lot of knowledge through this program. I am also very lucky to have Dr. Alison Meynert as my mentor. She is very knowledgeable, kind and willing to help in every step.\" - Kalayanee Chairat (mentee)\n\n> \"It was a great experience for me to work with my mentee towards her goal. The process solidified some of my own topical knowledge and I learned new things along the way as well.\" - Alison Meynert (mentor)\n
\n\n## Edward Lukyamuzi (mentee) & Emilio Garcia-Rios (mentor)\n\nFor the mentoring program Edward’s goal was to understand the fundamental components of a Nextflow script and write a Nextflow pipeline for analyzing mosquito genomes. Edward was mentored by Emilio Garcia-Rios, from the EMBL-EBI in the United Kingdom.\n\nEdward learned the fundamental concepts of Nextflow, including channels, processes and operators. Edward works with sequencing data from the mosquito genome - with help from Emilio he wrote a Nextflow pipeline with an accompanying Dockerfile for the alignment of reads and genotyping of SNPs. Edward will continue to develop his pipeline and wants to become more involved with the Nextflow and nf-core community by attending the nf-core hackathons. Edward is also very keen to help others learn Nextflow and expressed an interest in being part of this program again as a mentor.\n\n
\n> \"Learning Nextflow can be a steep curve. Having a partner to give you a little push might be what facilitates adoption of Nextflow into your daily routine.\" - Edward Lukyamuzi (mentee)\n\n> \"I would like more people to discover and learn the benefits using Nextflow has. Being a mentor in this program can help me collaborate with other colleagues and be a mentor in my institute as well.\" - Emilio Garcia-Rios (mentor)\n
\n\n## Suchitra Thapa (mentee) & Maxime Borry (mentor)\n\nSuchitra started the program to learn about running Nextflow pipelines but quickly moved on to pipeline development and deployment on the cloud. Suchitra and Maxime encountered some technical challenges during the mentorship, including difficulties with internet connectivity and access to computational platforms for analysis. Despite this, with help from Maxime, Suchitra applied her newly acquired skills and made substantial progress converting the [metaphlankrona](https://github.com/suchitrathapa/metaphlankrona) pipeline for metagenomic analysis of microbial communities from Nextflow DSL1 to DSL2 syntax.\n\nSuchitra will be sharing her work and progress on the pipeline as a poster at the [Nextflow Summit 2022](https://summit.nextflow.io/speakers/suchitra-thapa/).\n\n
\n> \"This mentorship was one of the best organized online learning opportunities that I have attended so far. With time flexibility and no deadline burden, you can easily fit this mentorship into your busy schedule. I would suggest everyone interested to definitely go for it.\" - Suchitra Thapa (mentee)\n\n> \"This mentorship program was a very fruitful and positive experience, and the satisfaction to see someone learning and growing their bioinformatics skills is very rewarding.\" - Maxime Borry (mentor)\n
\n\n## Conclusion\n\nFeedback from the first round of the mentorship program was overwhelmingly positive. Both mentors and mentees found the experience to be a rewarding opportunity and were grateful for taking part. Everyone who participated in the program said that they would encourage others to be a part of it in the future.\n\n
\n> \"This is an exciting program that can help us make use of curated pipelines to advance open science. I don't mind repeating the program!\" - John Juma (mentor)\n
\n\n![Screenshot of final zoom meetup](/img/mentorships-round1-zoom.png)\n\nAs the Nextflow and nf-core communities continue to grow, the mentorship program will have long-term benefits beyond those that are immediately measurable. Mentees from the program are already acting as positive role models and contributing new perspectives to the wider community. Additionally, some mentees are interested in being mentors in the future and will undoubtedly support others as our communities continue to grow.\n\nWe were delighted with the high quality of this year’s mentors and mentees. Stay tuned for information about the next round of the Nextflow and nf-core mentorship program. Applications for round 2 will open on October 1, 2022. See [https://nf-co.re/mentorships](https://nf-co.re/mentorships) for details.\n\n

\n[Mentorship Round 2 - Details](https://nf-co.re/mentorships)\n

\n", + "images": [ + "/img/mentorships-round1-wordcloud.png" + ] + }, + { + "slug": "2022/deploy-nextflow-pipelines-with-google-cloud-batch", + "title": "Deploy Nextflow Pipelines with Google Cloud Batch!", + "date": "2022-07-13T00:00:00.000Z", + "content": "\nA key feature of Nextflow is the ability to abstract the implementation of data analysis pipelines so they can be deployed in a portable manner across execution platforms.\n\nAs of today, Nextflow supports a rich variety of HPC schedulers and all major cloud providers. Our goal is to support new services as they emerge to enable Nextflow users to take advantage of the latest technology and deploy pipelines on the compute environments that best fit their requirements.\n\nFor this reason, we are delighted to announce that Nextflow now supports [Google Cloud Batch](https://cloud.google.com/batch), a new fully managed batch service just announced for beta availability by Google Cloud.\n\n### A New On-Ramp to the Google Cloud\n\nGoogle Cloud Batch is a comprehensive cloud service suitable for multiple use cases, including HPC, AI/ML, and data processing. While it is similar to the Google Cloud Life Sciences API, used by many Nextflow users today, Google Cloud Batch offers a broader set of capabilities. As with Google Cloud Life Sciences, Google Cloud Batch automatically provisions resources, manages capacity, and allows batch workloads to run at scale. It offers several advantages, including:\n\n- The ability to re-use VMs across jobs steps to reduce overhead and boost performance.\n- Granular control over task execution, compute, and storage resources.\n- Infrastructure, application, and task-level logging.\n- Improved task parallelization, including support for multi-node MPI jobs, with support for array jobs, and subtasks.\n- Improved support for spot instances, which provides a significant cost saving when compared to regular instance.\n- Streamlined data handling and provisioning.\n\nA nice feature of Google Cloud Batch API, that fits nicely with Nextflow, is its built-in support for data ingestion from Google Cloud Storage buckets. A batch job can _mount_ a storage bucket and make it directly accessible to a container running a Nextflow task. This feature makes data ingestion and sharing resulting data sets more efficient and reliable than other solutions.\n\n### Getting started with Google Cloud Batch\n\nSupport for the Google Cloud Batch requires the latest release of Nextflow from the edge channel (version `22.07.1-edge` or later). If you don't already have it, you can install this release using these commands:\n\n```\nexport NXF_EDGE=1\ncurl get.nextflow.io | bash\n./nextflow -self-update\n```\n\nMake sure your Google account is allowed to access the Google Cloud Batch service by checking the [API & Service](https://console.cloud.google.com/apis/dashboard) dashboard.\n\nCredentials for accessing the service are picked up by Nextflow from your environment using the usual [Google Application Default Credentials](https://github.com/googleapis/google-auth-library-java#google-auth-library-oauth2-http) mechanism. That is, either via the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, or by using the following command to set up the environment:\n\n```\ngcloud auth application-default login\n```\n\nAfter authenticating yourself to Google Cloud, create a `nextflow.config` file and specify `google-batch` as the Nextflow executor. 
You will also need to specify the Google Cloud project where execution will occur and the Google Cloud Storage working directory for pipeline execution.\n\n```\ncat <<EOT > nextflow.config\nprocess.executor = 'google-batch'\nworkDir = 'gs://YOUR-GOOGLE-BUCKET/scratch'\ngoogle.project = 'YOUR GOOGLE PROJECT ID'\nEOT\n```\n\nIn the above snippet replace `YOUR-GOOGLE-BUCKET` with a Google Storage bucket of your choice in which to store the pipeline output data and `YOUR GOOGLE PROJECT ID` with your Google project ID where the computation will be deployed.\n\nWith this information, you are ready to start. You can verify that the integration is working by running the Nextflow “hello” pipeline as shown below:\n\n```\nnextflow run https://github.com/nextflow-io/hello\n```\n\n### Migrating Google Cloud Life Sciences pipelines to Google Cloud Batch\n\nGoogle Cloud Life Sciences users can easily migrate their pipelines to Google Cloud Batch by making just a few edits to their pipeline configuration settings. Simply replace the `google-lifesciences` executor with `google-batch`.\n\nFor each setting having the prefix `google.lifeScience.`, there is a corresponding `google.batch.` setting. Simply update these configuration settings to reflect the new service.\n\nThe usual process directives such as: [cpus](https://www.nextflow.io/docs/latest/process.html#cpus), [memory](https://www.nextflow.io/docs/latest/process.html#memory), [time](https://www.nextflow.io/docs/latest/process.html#time), [machineType](https://www.nextflow.io/docs/latest/process.html#machinetype) are natively supported by Google Cloud Batch, and should not be modified.\n\nFind out more details in the [Nextflow documentation](https://www.nextflow.io/docs/edge/google.html#cloud-batch).\n\n### 100% Open, Built to Scale\n\nThe Google Cloud Batch executor for Nextflow is offered as an open source contribution to the Nextflow project. The integration was developed by Google in collaboration with [Seqera Labs](https://seqera.io/). This is a validation of Google Cloud’s ongoing commitment to open source software (OSS) and a testament to the health and vibrancy of the Nextflow project. We wish to thank the entire Google Cloud Batch team, and Shamel Jacobs in particular, for their support of this effort.\n\n### Conclusion\n\nSupport for Google Cloud Batch further expands the wide range of computing platforms supported by Nextflow. It empowers Nextflow users to easily access cost-effective resources, and take full advantage of the rich capabilities of the Google Cloud. Above all, it enables researchers to easily scale and collaborate, improving their productivity, and resulting in better research outcomes.\n",
    "images": []
  },
  {
    "slug": "2022/evolution-of-nextflow-runtime",
    "title": "Evolution of the Nextflow runtime",
    "date": "2022-03-24T00:00:00.000Z",
    "content": "\nSoftware development is a constantly evolving process that requires continuous adaptation to keep pace with new technologies, user needs, and trends. Likewise, changes are needed in order to introduce new capabilities and guarantee a sustainable development process.\n\nNextflow is no exception. This post will summarise the major changes in the evolution of the framework over the next 12 to 18 months.\n\n### Java baseline version\n\nNextflow runs on top of Java (or, more precisely, the Java virtual machine). So far, Java 8 has been the minimal version required to run Nextflow. 
However, this version was released 8 years ago and is going to reach its end-of-life status at the end of [this month](https://endoflife.date/java). For this reason, as of version 22.01.x-edge and the upcoming stable release 22.04.0, Nextflow will require Java version 11 or later for its execution. This also allows the introduction of new capabilities provided by the modern Java runtime.\n\nTip: If you are confused about how to install or upgrade Java on your computer, consider using [Sdkman](https://sdkman.io/). It’s a one-liner install tool that allows easy management of Java versions.\n\n### DSL2 as default syntax\n\nNextflow DSL2 has been introduced nearly [2 years ago](https://www.nextflow.io/blog/2020/dsl2-is-here.html) (how time flies!) and definitely represented a major milestone for the project. Established pipeline collections such as those in [nf-core](https://nf-co.re/pipelines) have migrated their pipelines to DSL2 syntax.\n\nThis is a confirmation that the DSL2 syntax represents a natural evolution for the project and is not considered to be just an experimental or alternative syntax.\n\nFor this reason, as for Nextflow version 22.03.0-edge and the upcoming 22.04.0 stable release, DSL2 syntax is going to be the **default** syntax version used by Nextflow, if not otherwise specified.\n\nIn practical terms, this means it will no longer be necessary to add the declaration `nextflow.enable.dsl = 2` at the top of your script or use the command line option `-dsl2 ` to enable the use of this syntax.\n\nIf you still want to continue to use DSL1 for your pipeline scripts, you will need to add the declaration `nextflow.enable.dsl = 1` at the top of your pipeline script or use the command line option `-dsl1`.\n\nTo make this transition as smooth as possible, we have also added the possibility to declare the DSL version in the Nextflow configuration file, using the same syntax shown above.\n\nFinally, if you wish to keep the current DSL behaviour and not make any changes in your pipeline scripts, the following variable can be defined in your system environment:\n\n```\nexport NXF_DEFAULT_DSL=1\n```\n\n### DSL1 end-of-life phase\n\nMaintaining two separate DSL implementations in the same programming environment is not sustainable and, above all, does not make much sense. For this reason, along with making DSL2 the default Nextflow syntax, DSL1 will enter into a 12-month end-of-life phase, at the end of which it will be removed. Therefore version 22.04.x and 22.10.x will be the last stable versions providing the ability to run DSL1 scripts.\n\nThis is required to keep evolving the framework and to create a more solid implementation of Nextflow grammar. Maintaining compatibility with the legacy syntax implementation and data structures is a challenging task that prevents the evolution of the new syntax.\n\nBear in mind, this does **not** mean it will not be possible to use DSL1 starting from 2023. 
All existing Nextflow runtimes will continue to be available, and it will be possible for any legacy pipeline to run using the required version available from the GitHub [releases page](https://github.com/nextflow-io/nextflow/releases), or by specifying the version using the NXF_VER variable, e.g.\n\n```\nNXF_VER=21.10.6 nextflow run <pipeline>\n```\n\n### New configuration format\n\nThe configuration file is a key component of the Nextflow framework since it allows workflow developers to decouple the pipeline logic from the execution parameters and infrastructure deployment settings.\n\nThe current Nextflow configuration file mechanism is extremely powerful, but it also has some serious drawbacks due to its _dynamic_ nature that makes it very hard to keep stable and maintainable over time.\n\nFor this reason, we are planning to re-engineer the current configuration component and replace it with a new one with two major goals: 1) continue to provide a rich and human-readable configuration system (so, no YAML or JSON), 2) have a well-defined syntax with a solid foundation that guarantees predictable configurations, simpler troubleshooting and more sustainable maintenance.\n\nCurrently, the most likely options are [Hashicorp HCL](https://github.com/hashicorp/hcl) (as used by Terraform and other Hashicorp tools) and [Lightbend HOCON](https://github.com/lightbend/config). You can read more about this feature at [this link](https://github.com/nextflow-io/nextflow/issues/2723).\n\n### Ignite executor deprecation\n\nThe executor for [Apache Ignite](https://www.nextflow.io/docs/latest/ignite.html) was an early attempt to provide Nextflow with a self-contained, distributed cluster for the deployment of pipelines into HPC environments. However, it had very little adoption over the years, which was not balanced by the increasing complexity of its maintenance.\n\nFor this reason, it was decided to deprecate it and remove it from the default Nextflow distribution. The module is still available as a separate project plugin at [this link](https://github.com/nextflow-io/nf-ignite), however, it will not be actively maintained.\n\n### Conclusion\n\nThis post is focused on the most fundamental changes we are planning to make in the following months.\n\nWith the adoption of Java 11, the full migration of DSL1 to DSL2 and the re-engineering of the configuration system, our purpose is to consolidate the Nextflow technology and lay the foundation for all the new exciting developments and features we are working on. Stay tuned for future blogs about each of them in upcoming posts.\n\nIf you want to learn more about the upcoming changes, reach out to us on [Slack at this link](https://app.slack.com/client/T03L6DM9G).\n",
    "images": []
  },
  {
    "slug": "2022/learn-nextflow-in-2022",
    "title": "Learning Nextflow in 2022",
    "date": "2022-01-21T00:00:00.000Z",
    "content": "\nA lot has happened since we last wrote about how best to learn Nextflow, over a year ago. Several new resources have been released including a new Nextflow [Software Carpentries](https://carpentries-incubator.github.io/workflows-nextflow/index.html) course and an excellent write-up by [23andMe](https://medium.com/23andme-engineering/introduction-to-nextflow-4d0e3b6768d1).\n\nWe have collated some links below from a diverse collection of resources to help you on your journey to learn Nextflow. 
Nextflow is a community-driven project - if you have any suggestions, please make a pull request to [this page on GitHub](https://github.com/nextflow-io/website/tree/master/content/blog/2022/learn-nextflow-in-2022.md).\n\nWithout further ado, here is the definitive guide for learning Nextflow in 2022. These resources will support anyone in the journey from total beginner to Nextflow expert.\n\n### Prerequisites\n\nBefore you start writing Nextflow pipelines, we recommend that you are comfortable with using the command-line and understand the basic concepts of scripting languages such as Python or Perl. Nextflow is widely used for bioinformatics applications, and scientific data analysis. The examples and guides below often focus on applications in these areas. However, Nextflow is now adopted in a number of data-intensive domains such as image analysis, machine learning, astronomy and geoscience.\n\n### Time commitment\n\nWe estimate that it will take at least 20 hours to complete the material. How quickly you finish will depend on your background and how deep you want to dive into the content. Most of the content is introductory but there are some more advanced dataflow and configuration concepts outlined in the workshop and pattern sections.\n\n### Contents\n\n- Why learn Nextflow?\n- Introduction to Nextflow from 23andMe\n- An RNA-Seq hands-on tutorial\n- Nextflow workshop from Seqera Labs\n- Software Carpentries Course\n- Managing Pipelines in the Cloud\n- The nf-core tutorial\n- Advanced implementation patterns\n- Awesome Nextflow\n- Further resources\n\n### 1. Why learn Nextflow?\n\nNextflow is an open-source workflow framework for writing and scaling data-intensive computational pipelines. It is designed around the Linux philosophy of simple yet powerful command-line and scripting tools that, when chained together, facilitate complex data manipulations. Combined with support for containerization, support for major cloud providers and on-premise architectures, Nextflow simplifies the writing and deployment of complex data pipelines on any infrastructure.\n\nThe following are some high-level motivations on why people choose to adopt Nextflow:\n\n1. Integrating Nextflow in your analysis workflows helps you implement **reproducible** pipelines. Nextflow pipelines follow FAIR guidelines with version-control and containers to manage all software dependencies.\n2. Avoid vendor lock-in by ensuring portability. Nextflow is **portable**; the same pipeline written on a laptop can quickly scale to run on an HPC cluster, Amazon and Google cloud services, and Kubernetes. The code stays constant across varying infrastructures allowing collaboration and avoiding lock-in.\n3. It is **scalable** allowing the parallelization of tasks using the dataflow paradigm without having to hard-code the pipeline to a specific platform architecture.\n4. It is **flexible** and supports scientific workflow requirements like caching processes to prevent re-computation, and workflow reports to better understand the workflows’ executions.\n5. It is **growing fast** and has **long-term support** available from Seqera Labs. Developed since 2013 by the same team, the Nextflow ecosystem is expanding rapidly.\n6. It is **open source** and licensed under Apache 2.0. You are free to use it, modify it and distribute it.\n\n### 2. Introduction to Nextflow by 23andMe\n\nThis informative post begins with the basic concepts of Nextflow and builds towards how Nextflow is used at 23andMe. 
It includes a detailed use case for how 23andMe run their imputation pipeline in the cloud, processing over 1 million individuals per day with over 10,000 CPUs in a single compute environment.\n\n👉 [Nextflow at 23andMe](https://medium.com/23andme-engineering/introduction-to-nextflow-4d0e3b6768d1)\n\n### 3. A simple RNA-Seq hands-on tutorial\n\nThis hands-on tutorial from Seqera Labs will guide you through implementing a proof-of-concept RNA-seq pipeline. The goal is to become familiar with basic concepts, including how to define parameters, using channels to pass data around and writing processes to perform tasks. It includes all scripts, input data and resources and is perfect for getting a taste of Nextflow.\n\n👉 [Tutorial link on GitHub](https://github.com/seqeralabs/nextflow-tutorial)\n\n### 4. Nextflow workshop from Seqera Labs\n\nHere you’ll dive deeper into Nextflow’s most prominent features and learn how to apply them. The full workshop includes an excellent section on containers, how to build them and how to use them with Nextflow. The written materials come with examples and hands-on exercises. Optionally, you can also follow with a series of videos from a live training workshop.\n\nThe workshop includes topics on:\n\n- Environment Setup\n- Basic NF Script and Concepts\n- Nextflow Processes\n- Nextflow Channels\n- Nextflow Operators\n- Basic RNA-Seq pipeline\n- Containers & Conda\n- Nextflow Configuration\n- On-premise & Cloud Deployment\n- DSL 2 & Modules\n- [GATK hands-on exercise](https://seqera.io/training/handson/)\n\n👉 [Workshop](https://seqera.io/training) & [YouTube playlist](https://www.youtube.com/playlist?list=PLPZ8WHdZGxmUv4W8ZRlmstkZwhb_fencI).\n\n### 5. Software Carpentry workshop\n\nThe [Nextflow Software Carpentry](https://carpentries-incubator.github.io/workflows-nextflow/index.html) workshop (in active development) motivates the use of Nextflow and [nf-core](https://nf-co.re/) as development tools for building and sharing reproducible data science workflows. The intended audience are those with little programming experience, and the course provides a foundation to comfortably write and run Nextflow and nf-core workflows. Adapted from the Seqera training material above, the workshop has been updated by Software Carpentries instructors within the nf-core community to fit [The Carpentries](https://carpentries.org/) style of training. The Carpentries emphasize feedback to improve teaching materials so we would like to hear back from you about what you thought was both well-explained and what needs improvement. Pull requests to the course material are very welcome.\n\nThe workshop can be opened on [Gitpod](https://gitpod.io/#https://github.com/carpentries-incubator/workflows-nextflow) where you can try the exercises in an online computing environment at your own pace, with the course material in another window alongside.\n\n👉 You can find the course in [The Carpentries incubator](https://carpentries-incubator.github.io/workflows-nextflow/index.html).\n\n### 6. Managing Pipelines in the Cloud - GenomeWeb Webinar\n\nThis on-demand webinar features Phil Ewels from SciLifeLab and nf-core, Brendan Boufler from Amazon Web Services and Evan Floden from Seqera Labs. 
The wide-ranging discussion covers the significance of scientific workflows, examples of Nextflow in production settings and how Nextflow can be integrated with other processes.\n\n👉 [Watch the webinar](https://seqera.io/webinars-and-podcasts/managing-bioinformatics-pipelines-in-the-cloud-to-do-more-science/)\n\n### 7. Nextflow implementation patterns\n\nThis advanced section discusses recurring patterns and solutions to many common implementation requirements. Code examples are available with notes to follow along, as well as a GitHub repository.\n\n👉 [Nextflow Patterns](http://nextflow-io.github.io/patterns/index.html) & [GitHub repository](https://github.com/nextflow-io/patterns).\n\n### 8. nf-core tutorials\n\nA tutorial covering the basics of using and creating nf-core pipelines. It provides an overview of the nf-core framework including:\n\n- How to run nf-core pipelines\n- What are the most commonly used nf-core tools\n- How to make new pipelines using the nf-core template\n- What are nf-core shared modules\n- How to add nf-core shared modules to a pipeline\n- How to make new nf-core modules using the nf-core module template\n- How nf-core pipelines are reviewed and ultimately released\n\n👉 [nf-core usage tutorials](https://nf-co.re/usage/usage_tutorials)\nand [nf-core developer tutorials](https://nf-co.re/developers/developer_tutorials)\n\n### 9. Awesome Nextflow\n\nA collection of awesome Nextflow pipelines.\n\n👉 [Awesome Nextflow](https://github.com/nextflow-io/awesome-nextflow) on GitHub\n\n### 10. Further resources\n\nThe following resources will help you dig deeper into Nextflow and other related projects like the nf-core community, who maintain curated pipelines and a very active Slack channel. There are plenty of Nextflow tutorials and videos online, and the following list is in no way exhaustive. Please let us know if we are missing anything.\n\n#### Nextflow docs\n\nThe reference for the Nextflow language and runtime. These docs should be your first point of reference while developing Nextflow pipelines. The newest features are documented in the edge documentation pages, which follow the monthly edge releases, while the stable documentation follows the stable releases published every three months.\n\n👉 Latest [stable](https://www.nextflow.io/docs/latest/index.html) & [edge](https://www.nextflow.io/docs/edge/index.html) documentation.\n\n#### Seqera Labs docs\n\nAn index of documentation, deployment guides, training materials and resources for all things Nextflow and Tower.\n\n👉 [Seqera Labs docs](https://seqera.io/docs)\n\n#### nf-core\n\nnf-core is a growing community of Nextflow users and developers. You can find curated sets of biomedical analysis pipelines written in Nextflow and built by domain experts. Each pipeline is stringently reviewed and has been implemented according to best practice guidelines. Be sure to sign up to the Slack channel.\n\n👉 [nf-core website](https://nf-co.re) and [nf-core Slack](https://nf-co.re/join)\n\n#### Nextflow Tower\n\nNextflow Tower is a platform to easily monitor, launch and scale Nextflow pipelines on cloud providers and on-premise infrastructure.
The documentation provides details on setting up compute environments, monitoring pipelines and launching using either the web graphic interface, CLI or API.\n\n👉 [Nextflow Tower](https://tower.nf) and [user documentation](http://help.tower.nf).\n\n#### Nextflow Biotech Blueprint by AWS\n\nA quickstart for deploying a genomics analysis environment on Amazon Web Services (AWS) cloud, using Nextflow to create and orchestrate analysis workflows and AWS Batch to run the workflow processes.\n\n👉 [Biotech Blueprint by AWS](https://aws.amazon.com/quickstart/biotech-blueprint/nextflow/)\n\n#### Nextflow Data Pipelines on Azure Batch\n\nNextflow on Azure requires at minimum two Azure services, Azure Batch and Azure Storage. Follow the guides below to set up both services on Azure, and to get your storage and batch account names and keys.\n\n👉 [Azure Blog](https://techcommunity.microsoft.com/t5/azure-compute-blog/running-nextflow-data-pipelines-on-azure-batch/ba-p/2150383) and [GitHub repository](https://github.com/microsoft/Genomics-Quickstart/blob/main/03-Nextflow-Azure/README.md).\n\n#### Running Nextflow by Google Cloud\n\nA step-by-step guide to launching Nextflow Pipelines in Google Cloud.\n\n👉 [Nextflow on Google Cloud](https://cloud.google.com/life-sciences/docs/tutorials/nextflow)\n\n#### Bonus: Nextflow Tutorial - Variant Calling Edition\n\nThis [Nextflow Tutorial - Variant Calling Edition](https://sateeshperi.github.io/nextflow_varcal/nextflow/) has been adapted from the [Nextflow Software Carpentry training material](https://carpentries-incubator.github.io/workflows-nextflow/index.html) and [Data Carpentry: Wrangling Genomics Lesson](https://datacarpentry.org/wrangling-genomics/). Learners will have the chance to learn Nextflow and nf-core basics, to convert a variant-calling bash-script into a Nextflow workflow and to modularize the pipeline using DSL2 modules and sub-workflows.\n\nThe workshop can be opened on [Gitpod](https://gitpod.io/#https://github.com/sateeshperi/nextflow_tutorial.git) where you can try the exercises in an online computing environment at your own pace, with the course material in another window alongside.\n\n👉 You can find the course in [Nextflow Tutorial - Variant Calling Edition](https://sateeshperi.github.io/nextflow_varcal/nextflow/).\n\n### Community and support\n\n- Nextflow [Gitter channel](https://gitter.im/nextflow-io/nextflow)\n- Nextflow [Forums](https://groups.google.com/forum/#!forum/nextflow)\n- Nextflow Twitter [@nextflowio](https://twitter.com/nextflowio?lang=en)\n- [nf-core Slack](https://nfcore.slack.com/)\n- [Seqera Labs](https://www.seqera.io) and [Nextflow Tower](https://tower.nf)\n\n### Credits\n\nSpecial thanks to Mahesh Binzer-Panchal for reviewing the latest revision of this post and contributing the Software Carpentry workshop section.\n", + "images": [] + }, + { + "slug": "2022/nextflow-is-moving-to-slack", + "title": "Nextflow’s community is moving to Slack!", + "date": "2022-02-22T00:00:00.000Z", + "content": "\n
\n\n“Software communities don’t just write code together. They brainstorm feature ideas, help new users get their bearings, and collaborate on best ways to use the software.…conversations need their own place\" - GitHub Satellite Blog 2020\n\n
\n\nThe Nextflow community channel on Gitter has grown substantially over the last few years and today has more than 1,300 members.\n\nI still remember when a former colleague proposed the idea of opening a Nextflow channel on Gitter. At the time, I didn't know anything about Gitter, and my initial response was : \"would that not be a waste of time?\".\n\nFortunately, I took him up on his suggestion and the Gitter channel quickly became an important resource for all Nextflow developers and a key factor to its success.\n\n### Where the future lies\n\nAs the Nextflow community continues to grow, we realize that we have reached the limit of the discussion experience on Gitter. The lack of internal channels and the poor support for threads make the discussion unpleasant and difficult to follow. Over the last few years, Slack has proven to deliver a much better user experience and it is also touted as one of the most used platforms for discussion.\n\nFor these reasons, we felt that it is time to say goodbye to the beloved Nextflow Gitter channel and would like to welcome the community into the brand-new, official Nextflow workspace on Slack!\n\nYou can join today using this link!\n\nOnce you have joined, you will be added to a selection of generic channels. However, we have also set up various additional channels for discussion around specific Nextflow topics, and for infrastructure-related topics. Please feel free to join whichever channels are appropriate to you.\n\nAlong the same lines, the Nextflow discussion forum is moving from Google Groups to the Discussion forum in the Nextflow GitHub repository. We hope this will provide a much better experience for Nextflow users by having a more direct connection with the codebase and issue repository.\n\nThe old Gitter channel and Google Groups will be kept active for reference and historical purposes, however we are actively promoting all members to move to the new channels.\n\nIf you have any questions or problems signing up then please feel free to let us know at info@nextflow.io.\n\nAs always, we thank you for being a part of the Nextflow community and for your ongoing support in driving its development and making workflows cool!\n\nSee you on Slack!\n\n### Credits\n\nThis was also made possible thanks to sponsorship from the Chan Zuckerberg Initiative, the Slack for Nonprofits program and support from Seqera Labs.\n", + "images": [] + }, + { + "slug": "2022/nextflow-summit-2022-recap", + "title": "Nextflow Summit 2022 Recap", + "date": "2022-11-03T00:00:00.000Z", + "content": "\n## Three days of Nextflow goodness in Barcelona\n\nAfter a three-year COVID-related hiatus from in-person events, Nextflow developers and users found their way to Barcelona this October for the 2022 Nextflow Summit. Held at Barcelona’s iconic Agbar tower, this was easily the most successful Nextflow community event yet!\n\nThe week-long event kicked off with 50 people participating in a hackathon organized by nf-core beginning on October 10th. The [hackathon](https://nf-co.re/events/2022/hackathon-october-2022) tackled several cutting-edge projects with developer teams focused on various aspects of nf-core including documentation, subworkflows, pipelines, DSL2 conversions, modules, and infrastructure. The Nextflow Summit began mid-week attracting nearly 600 people, including 165 attending in person and another 433 remotely. The [YouTube live streams](https://summit.nextflow.io/stream/) have now collected over two and half thousand views. 
Just prior to the summit, three virtual Nextflow training events were also run with separate sessions for the Americas, EMEA, and APAC in which 835 people participated.\n\n## An action-packed agenda\n\nThe three-day Nextflow Summit featured 33 talks delivered by speakers from academia, research, healthcare providers, biotechs, and cloud providers. This year’s speakers came from the following organizations:\n\n- Amazon Web Services\n- Center for Genomic Regulation\n- Centre for Molecular Medicine and Therapeutics, University of British Columbia\n- Chan Zukerberg Biohub\n- Curative\n- DNA Nexus\n- Enterome\n- Google\n- Janelia Research Campus\n- Microsoft\n- Oxford Nanopore\n- Quadram Institute BioScience\n- Seqera Labs\n- Quantitative Biology Center, University of Tübingen\n- Quilt Data\n- UNC Lineberger Comprehensive Cancer Center\n- Università degli Studi di Macerata\n- University of Maryland\n- Wellcome Sanger Institute\n- Wyoming Public Health Laboratory\n\n## Some recurring themes\n\nWhile there were too many excellent talks to cover individually, a few themes surfaced throughout the summit. Not surprisingly, SARS-Cov-2 was a thread that wound through several talks. Tony Zeljkovic from Curative led a discussion about [unlocking automated bioinformatics for large-scale healthcare](https://www.youtube.com/watch?v=JZMaRYzZxGU&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=8), and Thanh Le Viet of Quadram Institute Bioscience discussed [large-scale SARS-Cov-2 genomic surveillance at QIB](https://www.youtube.com/watch?v=6jQr9dDaais&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=30). Several speakers discussed best practices for building portable, modular pipelines. Other common themes were data provenance & traceability, data management, and techniques to use compute and storage more efficiently. There were also a few talks about the importance of dataflows in new application areas outside of genomics and bioinformatics.\n\n## Data provenance tracking\n\nIn the Thursday morning keynote, Rob Patro﹘Associate Professor at the University of Maryland Dept. of Computer Science and CTO and co-founder of Ocean Genomics﹘described in his talk “[What could be next(flow)](https://www.youtube.com/watch?v=vNrKFT5eT8U&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=6),” how far the Nextflow community had come in solving problems such as reproducibility, scalability, modularity, and ease of use. He then challenged the community with some complex issues still waiting in the wings. He focused on data provenance as a particularly vexing challenge explaining how tremendous effort currently goes into manual metadata curation.\n\nRob offered suggestions about how Nextflow might evolve, and coined the term “augmented execution contexts” (AECs) drawing from his work on provenance tracking – answering questions such as “what are these files, and where did they come from.” This thinking is reflected in [tximeta](https://github.com/mikelove/tximeta), a project co-developed with Mike Love of UNC. 
Rob also proposed ideas around automating data format conversions analogous to type casting in programming languages explaining how such conversions might be built into Nextflow channels to make pipelines more interoperable.\n\nIn his talk with the clever title “[one link to rule them all](https://www.youtube.com/watch?v=dttkcuP3OBc&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=13),” Aneesh Karve of Quilt explained how every pipeline run is a function of the code, environment, and data, and went on to show how Quilt could help dramatically simplify data management with dataset versioning, accessibility, and verifiability. Data provenance and traceability were also front and center when Yih-Chii Hwang of DNAnexus described her team’s work around [bringing GxP compliance to Nextflow workflows](https://www.youtube.com/watch?v=RIwpJTDlLiE&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=21).\n\n## Data management and storage\n\nOther speakers also talked about challenges related to data management and performance. Angel Pizarro of AWS gave an interesting talk comparing the [price/performance of different AWS cloud storage options](https://www.youtube.com/watch?v=VXtYCAqGEQQ&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=12). [Hatem Nawar](https://www.youtube.com/watch?v=jB91uqUqsRM&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=9) (Google) and [Venkat Malladi](https://www.youtube.com/watch?v=GAIL8ZAMJPQ&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=20) (Microsoft) also talked about cloud economics and various approaches to data handling in their respective clouds. Data management was also a key part of Evan Floden’s discussion about Nextflow Tower where he discussed Tower Datasets, as well as the various cloud storage options accessible through Nextflow Tower. Finally, Nextflow creator Paolo Di Tommaso unveiled new work being done in Nextflow to simplify access to data residing in object stores in his talk “[Nextflow and the future of containers](https://www.youtube.com/watch?v=PTbiCVq0-sE&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=14)”.\n\n## Compute optimization\n\nAnother recurring theme was improving compute efficiency. Several talks discussed using containers more effectively, leveraging GPUs & FPGAs for added performance, improving virtual machine instance type selection, and automating resource requirements. Mike Smoot of Illumina talked about Nextflow, Kubernetes, and DRAGENs and how Illumina’s FPGA-based Bio-IT Platform can dramatically accelerate analysis. Venkat Malladi discussed efforts to suggest optimal VM types based on different standardized nf-core labels in the Azure cloud (process_low, process_medium, process_high, etc.) Finally, Evan Floden discussed [Nextflow Tower](https://www.youtube.com/watch?v=yJpN3fRSClA&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=22) and unveiled an exciting new [resource optimization feature](https://seqera.io/blog/optimizing-resource-usage-with-nextflow-tower/) that can intelligently tune pipeline resource requests to radically reduce cloud costs and improve run speed. Overall, the Nextflow community continues to make giant strides in improving efficiency and managing costs in the cloud.\n\n## Beyond genomics\n\nWhile most summit speakers focused on genomics, a few discussed data pipelines in other areas, including statistical modeling, analysis, and machine learning. 
Nicola Visonà from Università degli Studi di Macerata gave a fascinating talk about [using agent-based models to simulate the first industrial revolution](https://www.youtube.com/watch?v=PlKJ0IDV_ds&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=27). Similarly, Konrad Rokicki from the Janelia Research Campus explained how Janelia are using [Nextflow for petascale bioimaging data](https://www.youtube.com/watch?v=ZjSzx1I76z0&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=18) and why bioimage processing remains a large domain area with an unmet need for reproducible workflows.\n\n## Summit Announcements\n\nThis year’s summit also saw several exciting announcements from Nextflow developers. Paolo Di Tommaso, during his talk on [the future of containers](https://www.youtube.com/watch?v=PTbiCVq0-sE&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=14), announced the availability of [Nextflow 22.10.0](https://github.com/nextflow-io/nextflow/releases/tag/v22.10.0). In addition to various bug fixes, the latest Nextflow release introduces an exciting new technology called Wave that allows containers to be built on the fly from Dockerfiles or Conda recipes saved within a Nextflow pipeline. Wave also helps to simplify containerized pipeline deployment with features such as “container augmentation”; enabling developers to inject new container scripts and functionality on the fly without needing to rebuild the base containers such as a cloud-native [Fusion file system](https://www.nextflow.io/docs/latest/fusion.html). When used with Nextflow Tower, Wave also simplifies authentication to various public and private container registries. The latest Nextflow release also brings improved support for Kubernetes and enhancements to documentation, along with many other features.\n\nSeveral other announcements were made during [Evan Floden’s talk](https://www.youtube.com/watch?v=yJpN3fRSClA&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=22&t=127s), such as:\n\n- MultiQC is joining the Seqera Labs family of products\n- Fusion – a distributed virtual file system for cloud-native data pipelines\n- Nextflow Tower support for Google Cloud Batch\n- Nextflow Tower resource optimization\n- Improved Resource Labels support in Tower with integrations for cost accounting with all major cloud providers\n- A new Nextflow Tower dashboard coming soon, providing visibility across workspaces\n\n## Thank you to our sponsors\n\nThe summit organizers wish to extend a sincere thank you to the event sponsors: AWS, Google Cloud, Seqera Labs, Quilt Data, Oxford Nanopore Technologies, and Element BioSciences. In addition, the [Chan Zuckerberg Initiative](https://chanzuckerberg.com/eoss/) continues to play a key role with their EOSS grants funding important work related to Nextflow and the nf-core community. 
The success of this year’s summit reminds us of the tremendous value of community and the critical impact of open science software in improving the quality, accessibility, and efficiency of scientific research.\n\n## Learning more\n\nFor anyone who missed the summit, you can still watch the sessions or view the training sessions at your convenience:\n\n- Watch post-event recordings of the [Nextflow Summit on YouTube](https://www.youtube.com/playlist?list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32)\n- View replays of the recent online [Nextflow and nf-core training](https://nf-co.re/events/2022/training-october-2022)\n\nFor additional detail on the summit and the preceding nf-core events, also check out an excellent [summary of the event](https://mribeirodantas.xyz/blog/index.php/2022/10/27/nextflow-and-nf-core-hot-news/) written by Marcel Ribeiro-Dantas in his blog, the [Dataist Storyteller](https://mribeirodantas.xyz/blog/index.php/2022/10/27/nextflow-and-nf-core-hot-news/)!\n\n_In this event, we're showcasing some of the results of the project 'Optimization of computational resources for HPC workloads in the cloud using ML/AI' by Seqera Labs S.L. This project has been funded by the European Regional Development Fund (ERDF) of the European Union, coordinated and managed by RED.es, with the aim of carrying out the development of technological entrepreneurship and technological demand, within the framework of the Strategic Action on Digital Economy and Society of the State Program for R&D&I oriented towards societal challenges._\n\n![grant logos](/img/blog-2022-11-03--img1.png)\n", + "images": [] + }, + { + "slug": "2022/nextflow-summit-call-for-abstracts", + "title": "Nextflow Summit 2022", + "date": "2022-06-17T00:00:00.000Z", + "content": "\n[As recently announced](https://twitter.com/nextflowio/status/1534903352810676224), we are super excited to host a new Nextflow community event late this year! The Nextflow Summit will take place **October 12-14, 2022** at the iconic Torre Glòries in Barcelona, with an associated [nf-core hackathon](https://nf-co.re/events/2022/hackathon-october-2022) beforehand.\n\n### Call for abstracts\n\nToday we’re excited to open the call for abstracts! We’re looking for talks and posters about anything and everything happening in the Nextflow world. Specifically, we’re aiming to shape the program into four key areas:\n\n- Nextflow: central tool / language / plugins\n- Community: pipelines / applications / use cases\n- Ecosystem: infrastructure / environments\n- Software: containers / tool packaging\n\nSpeaking at the summit will primarily be in-person, but we welcome posters from remote attendees. Posters will be submitted digitally and available online during and after the event. Talks will be streamed live and be available after the event.\n\n

\n Apply for a talk or poster\n

\n\n### Key dates\n\nRegistration for the event will happen separately, with key dates as follows (subject to change):\n\n- Jun 17: Call for abstracts opens\n- July 1: Registration opens\n- July 22: Call for abstracts closes\n- July 29: Accepted speakers notified\n- Sept 9: Registration closes\n- Oct 10-12: Hackathon\n- Oct 12-14: Summit\n\nAbstracts will be read and speakers notified on a rolling basis, so apply soon!\n\nThe Nextflow Summit will start Weds, Oct 12, 5:00 PM CEST and close Fri, Oct 14, 1:00 PM CEST.\n\n### Travel bursaries\n\nThanks to funding from the Chan Zuckerberg Initiative [EOSS Diversity & Inclusion grant](https://chanzuckerberg.com/eoss/proposals/nextflow-and-nf-core/), we are offering 5 bursaries for travel and accommodation. These will only be available to those who have applied to present a talk or poster and will cover up to $1500 USD, plus registration costs.\n\nIf you’re interested, please select this option when filling the abstracts application form and we will be in touch with more details.\n\n### Stay in the loop\n\nMore information about the summit will be available soon, as we continue to plan the event. Please visit [https://summit.nextflow.io](https://summit.nextflow.io) for details and to sign up to the email list for event updates.\n\n

\n Subscribe for updates\n

\n\nWe will be tweeting about the event using the [#NextflowSummit](http://twitter.com/hashtag/NextflowSummit) hashtag on Twitter. See you in Barcelona!\n", + "images": [] + }, + { + "slug": "2022/rethinking-containers-for-cloud-native-pipelines", + "title": "Rethinking containers for cloud native pipelines", + "date": "2022-10-13T00:00:00.000Z", + "content": "\nContainers have become an essential part of well-structured data analysis pipelines. They encapsulate applications and dependencies in portable, self-contained packages that can be easily distributed. Containers are also key to enabling predictable and [reproducible results](https://www.nature.com/articles/nbt.3820).\n\nNextflow was one of the first workflow technologies to fully embrace [containers](https://www.nextflow.io/blog/2014/using-docker-in-hpc-cluster.html) for data analysis pipelines. Community curated container collections such as [BioContainers](https://biocontainers.pro/) also helped speed container adoption.\n\nHowever, the increasing complexity of data analysis pipelines and the need to deploy them across different clouds and platforms pose new challenges. Today, workflows may comprise dozens of distinct container images. Pipeline developers must manage and maintain these containers and ensure that their functionality precisely aligns with the requirements of every pipeline task.\n\nAlso, multi-cloud deployments and the increased use of private container registries further increase complexity for developers. Building and maintaining containers, pushing them to multiple registries, and dealing with platform-specific authentication schemes are tedious, time consuming, and a source of potential errors.\n\n## Wave – a game changer\n\nFor these reasons, we decided to fundamentally rethink how containers are deployed and managed in Nextflow. Today we are thrilled to announce Wave — a container provisioning and augmentation service that is fully integrated with the Nextflow and Nextflow Tower ecosystems.\n\nInstead of viewing containers as separate artifacts that need to be integrated into a pipeline, Wave allows developers to manage containers as part of the pipeline itself. This approach helps simplify development, improves reliability, and makes pipelines easier to maintain. It can even improve pipeline performance.\n\n## How container provisioning works with Wave\n\nInstead of creating container images, pushing them to registries, and referencing them using Nextflow's [container](https://www.nextflow.io/docs/latest/process.html#container) directive, Wave allows developers to simply include a Dockerfile in the directory where a process is defined.\n\nWhen a process runs, the new Wave plug-in for Nextflow takes the Dockerfile and submits it to the Wave service. Wave then builds a container on-the-fly, pushes it to a destination container registry, and returns the container used for the actual process execution. The Wave service also employs caching at multiple levels to ensure that containers are built only once or when there is a change in the corresponding Dockerfile.\n\nThe registry where images are stored can be specified in the Nextflow config file, along with the other pipeline settings. 
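\n\nAs a rough sketch, assuming the Wave build repository setting described in the Nextflow documentation (the registry value below is only a hypothetical placeholder), such a configuration could look like this:\n\n```\nwave {\n enabled = true\n build {\n  // hypothetical target registry for the container images built by Wave\n  repository = 'quay.io/your-org/wave-builds'\n }\n}\n```\n\n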
This means containers can be served from cloud registries closer to where pipelines execute, delivering better performance and reducing network traffic.\n\n![Wave diagram](/img/wave-diagram.png)\n\n## Nextflow, Wave, and Conda – a match made in heaven\n\n[Conda](https://conda.io/) is an excellent package manager, fully [supported in Nextflow](https://www.nextflow.io/blog/2018/conda-support-has-landed.html) as an alternative to using containers to manage software dependencies in pipelines. However, until now, Conda could not be easily used in cloud-native computing platforms such as AWS Batch or Kubernetes.\n\nWave provides developers with a powerful new way to leverage Conda in Nextflow by using a [conda](https://www.nextflow.io/docs/latest/process.html#conda) directive as an alternative way to provision containers in their pipelines. When Wave encounters the `conda` directive in a process definition, and no container or Dockerfile is present, Wave automatically builds a container based on the Conda recipe using the strategy described above. Wave makes this process exceptionally fast (at least compared to vanilla Conda) by leveraging the [Micromamba](https://github.com/mamba-org/mamba) project under the hood.\n\n## Support for private registries\n\nA long-standing problem with containers in Nextflow was the lack of support for private container registries. Wave solves this problem by acting as an authentication proxy between the Docker client requesting the container and a target container repository. Wave relies on [Nextflow Tower](https://seqera.io/tower/) to authenticate user requests to container registries.\n\nTo access private container registries from a Nextflow pipeline, developers can simply specify their Tower access token in the pipeline configuration file and store their repository credentials in the [Nextflow Tower](https://help.tower.nf/22.2/credentials/overview/) credentials page of their account. Wave will automatically and securely use these credentials to authenticate to the private container registry.\n\n## But wait, there's more! Container augmentation!\n\nBy automatically building and provisioning containers, Wave dramatically simplifies how containers are handled in Nextflow. However, there are cases where organizations are required to use validated containers for security or policy reasons rather than build their own images, but they still need to provide additional functionality, for example adding site-specific scripts or logging agents, while keeping the base container layers intact.\n\nNextflow allows for the definition of pipeline-level (and more recently module-level) scripts executed in the context of the task execution environment. These scripts can be made accessible to the container environment by mounting a host volume. However, this approach only works when using a local or shared file system.\n\nWave solves these problems by dynamically adding one or more layers to an existing container image during the container image download phase from the registry. Developers can use container augmentation to inject an arbitrary payload into any container without re-building it. Wave then recomputes the image's final manifest, adding new layers and checksums on-the-fly, so that the final downloaded image reflects the added content.\n\nWith container augmentation, developers can include a directory called `resources` in pipeline [module directories](https://www.nextflow.io/docs/latest/dsl2.html#module-directory).
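\n\nFor illustration only (the module path and script name here are hypothetical), such a layout might look like this:\n\n```\nmodules/align/\n├── main.nf\n└── resources/\n    └── collect_metrics.sh\n```\n\n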
When the corresponding containerized task is executed, Wave automatically mirrors the content of the resources directory in the root path of the container where it can be accessed by scripts running within the container.\n\n## A sneak preview of Fusion file system\n\nOne of the main motivations for implementing Wave was the ability to easily package a Fusion client in containers to make this important functionality readily available in Nextflow pipelines.\n\nFusion implements a virtual distributed file system and presents a thin client allowing data hosted in AWS S3 buckets to be accessed via the standard POSIX filesystem interface expected by the pipeline tools. This client runs in the task container and is added automatically via the Wave augmentation capability. This makes Fusion functionality available for pipeline execution at runtime.\n\nThis means the Nextflow pipeline can use an AWS S3 bucket as the work directory, and pipeline tasks can access the S3 bucket natively as a local file system path. This is an important innovation as it avoids the additional step of copying files in and out of object storage. Fusion takes advantage of the Nextflow task segregation and idempotent execution model to optimise and speed up file access operations.\n\n## Getting started\n\nWave requires Nextflow version 22.10.0 or later and can be enabled by using the `-with-wave` command line option or by adding the following snippet to your nextflow.config file:\n\n```\nwave {\n enabled = true\n strategy = 'conda,container'\n}\n\ntower {\n accessToken = \"\"\n}\n```\n\nThe use of the Tower access token is not mandatory, however, it is required to enable access to private repositories. The use of authentication also allows higher service rate limits compared to anonymous users. You can run a Nextflow pipeline such as rnaseq-nf with Wave, as follows:\n\n```\nnextflow run nextflow-io/rnaseq-nf -with-wave\n```\n\nThe configuration in the nextflow.config snippet above will enable the provisioning of Wave containers built from the `conda` requirements specified in the pipeline processes.\n\nYou can find additional information and examples in the Nextflow [documentation](https://www.nextflow.io/docs/latest/wave.html) and in the Wave [showcase project](https://github.com/seqeralabs/wave-showcase).\n\n## Availability\n\nThe Wave container provisioning service is available free of charge as a technology preview to all Nextflow and Tower users. Wave supports all major container registries including [Docker Hub](https://hub.docker.com/), [Quay.io](https://quay.io/), [AWS Elastic Container Registry](https://aws.amazon.com/ecr/), [Google Artifact Registry](https://cloud.google.com/artifact-registry) and [Azure Container Registry](https://azure.microsoft.com/en-us/products/container-registry/).\n\nDuring the preview period, anonymous users can build up to 10 container images per day and pull 100 containers per hour. Tower authenticated users can build 100 container images per hour and pull 1000 containers per minute. After the preview period, we plan to make the Wave service available free of charge to academic users and open-source software (OSS) projects.\n\n## Conclusion\n\nSoftware containers greatly simplify the deployment of complex data analysis pipelines. However, there are still many challenges preventing organizations from fully unlocking the potential of this exciting technology.
For too long, containers have been viewed as a replacement for package managers, but they serve a different purpose.\n\nIn our view, it's time to stop viewing containers as monolithic artifacts that are assembled separately from pipeline code. Instead, containers should be viewed simply as an execution substrate facilitating the deployment of the pipeline software dependencies defined via a proper package manager such as Conda.\n\nWave, Nextflow, and Nextflow Tower combine to fully automate the container lifecycle, including the management, provisioning and dependencies of complex data pipelines on demand, while removing unnecessary error-prone manual steps.\n",
    "images": []
  },
  {
    "slug": "2022/turbocharging-nextflow-with-fig",
    "title": "Turbo-charging the Nextflow command line with Fig!",
    "date": "2022-09-22T00:00:00.000Z",
    "content": "\nNextflow is a powerful workflow manager that supports multiple container technologies, cloud providers and HPC job schedulers. It shouldn't be a surprise that such wide-ranging functionality leads to a complex interface, with the drawback of many subcommands and options to remember. For a first-time user (and sometimes even for some long-time users) it can be difficult to remember everything. This is not a new problem for the command-line; even very common applications such as grep and tar are famous for having a bewildering array of options.\n\n![xkcd charge making fun of tar tricky command line arguments](/img/xkcd_tar_charge.png)\nhttps://xkcd.com/1168/\n\nMany tools have sprung up to make the command-line more user-friendly, such as tldr pages and rich-click. [Fig](https://fig.io) is one such tool that adds powerful autocomplete functionality to your terminal. Fig gives you graphical popups with color-coded context that are more dynamic than the shaded text of recent commands or the long blocks of text that appear after pressing tab.\n\nFig is compatible with most terminals, shells and IDEs (such as the VSCode terminal), is fully supported in MacOS, and has beta support for Linux and Windows. On MacOS, you can simply install it with `brew install --cask fig` and then run the `fig` command to set it up.\n\nWe have now added Nextflow for Fig. Thanks to Fig's open source core we were able to contribute specifications in TypeScript that will now be automatically added for anyone installing or updating Fig. Now, with Fig, when you start typing your Nextflow commands, you’ll see autocomplete suggestions based on what you are typing and what you have typed in the past, such as your favorite options.\n\n![GIF with a demo of nextflow log/list subcommands](/img/nxf-log-list-params.gif)\n\nThe Fig autocomplete functionality can also be adjusted to suit your preferences. Suggestions can be displayed in alphabetical order or as a list of your most recent commands. Similarly, suggestions can be displayed all the time or only when you press tab.\n\nThe Fig specification that we've written not only suggests commands and options, but dynamic inputs too. For example, finding previous run names when resuming or cleaning runs is tedious and error-prone. Similarly, pipelines that you’ve already downloaded with `nextflow pull` will be autocompleted if they have been run in the past. You won't have to remember the full names anymore, as Fig generators in the autocomplete allow you to automatically complete the run name after typing a few letters where a run name is expected.
Importantly, this also works for pipeline names!\n\n![GIF with a demo of nextflow pull/run/clean/view/config subcommands](/img/nxf-pull-run-clean-view-config.gif)\n\nFig for Nextflow will make you increase your productivity regardless of your user level. If you run multiple pipelines during your day you will immediately see the benefit of Fig. Your productivity will increase by taking advantage of this autocomplete function for run and project names. For Nextflow newcomers it will provide an intuitive way to explore the Nextflow CLI with built-in help text.\n\nWhile Fig won’t replace the need to view help menus and documentation it will undoubtedly save you time and energy searching for commands and copying and pasting run names. Take your coding to the next level using Fig!\n", + "images": [] + }, + { + "slug": "2023/a-nextflow-docker-murder-mystery-the-mysterious-case-of-the-oom-killer", + "title": "A Nextflow-Docker Murder Mystery: The mysterious case of the “OOM killer”", + "date": "2023-06-19T00:00:00.000Z", + "content": "\nMost support tickets crossing our desks don’t warrant a blog article. However, occasionally we encounter a genuine mystery—a bug so pervasive and vile that it threatens innocent containers and pipelines everywhere. Such was the case of the **_OOM killer_**.\n\nIn this article, we alert our colleagues in the Nextflow community to the threat. We also discuss how to recognize the killer’s signature in case you find yourself dealing with a similar murder mystery in your own cluster or cloud.\n\n\n\n## To catch a killer\n\nIn mid-2022, Nextflow jobs began to mysteriously die. Containerized tasks were being struck down in the prime of life, seemingly at random. By November, the body count was beginning to mount: Out-of-memory (OOM) errors were everywhere we looked!\n\nIt became clear that we had a serial killer on our hands. Unfortunately, identifying a suspect turned out to be easier said than done. Nextflow is rather good at restarting failed containers after all, giving the killer a convenient alibi and plenty of places to hide. Sometimes, the killings went unnoticed, requiring forensic analysis of log files.\n\nWhile we’ve made great strides, and the number of killings has dropped dramatically, the killer is still out there. In this article, we offer some tips that may prove helpful if the killer strikes in your environment.\n\n## Establishing an MO\n\nFortunately for our intrepid investigators, the killer exhibited a consistent _modus operandi_. Containerized jobs on [Amazon EC2](https://aws.amazon.com/ec2/) were being killed due to out-of-memory (OOM) errors, even when plenty of memory was available on the container host. While we initially thought the killer was native to the AWS cloud, we later realized it could also strike in other locales.\n\nWhat the killings had in common was that they tended to occur when Nextflow tasks copied large files from Amazon S3 to a container’s local file system via the AWS CLI. As some readers may know, Nextflow leverages the AWS CLI behind the scenes to facilitate data movement. 
The killer’s calling card was an `[Errno 12] Cannot allocate memory` message, causing the container to terminate with an exit status of 1.\n\n```\nNov-08 21:54:07.926 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'NFCORE_SAREK:SAREK:MARKDUPLICATES:BAM_TO_CRAM:SAMTOOLS_STATS_CRAM (004-005_L3.SSHT82)'\nCaused by:\n Essential container in task exited\n..\nCommand error:\n download failed: s3://myproject/NFTower-Ref/Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.fasta to ./Homo_sapiens_assembly38.fasta [Errno 12] Cannot allocate memory\n```\n\nThe problem is illustrated in the diagram below. In theory, Nextflow should have been able to dispatch multiple containerized tasks to a single host. However, tasks were being killed with out-of-memory errors even though plenty of memory was available. Rather than being able to run many containers per host, we could only run two or three and even that was dicey! Needless to say, this resulted in a dramatic loss of efficiency.\n\n\n\nAmong our crack team of investigators, alarm bells began to ring. We asked ourselves, _“Could the killer be inside the house?”_ Was it possible that Nextflow was nefariously killing its own containerized tasks?\n\nBefore long, reports of similar mysterious deaths began to trickle in from other jurisdictions. It turned out that the killer had struck [Cromwell](https://cromwell.readthedocs.io/en/stable/) also ([see the police report here](https://github.com/aws/aws-cli/issues/5876)). We breathed a sigh of relief that we could rule out Nextflow as the culprit, but we still had a killer on the loose and a series of container murders to solve!\n\n## Recreating the scene of the crime\n\nAs any good detective knows, recreating the scene of the crime is a good place to start. It turned out that our killer had a profile and had been targeting containers processing large datasets since 2020. We came across an excellent [codefresh.io article](https://codefresh.io/blog/docker-memory-usage/) by Saffi Hartal, discussing similar murders and suggesting techniques to lure the killer out of hiding and protect the victims. Unfortunately, the suggested workaround of periodically clearing kernel buffers was impractical in our Nextflow pipeline scenario.\n\nWe borrowed the Python script from [Saffi’s article](https://codefresh.io/blog/docker-memory-usage/) designed to write huge files and simulate the issues we saw with the Linux buffer and page cache. Using this script, we hoped to replicate the conditions at the time of the murders.\n\nUsing separate SSH sessions to the same docker host, we manually launched the Python script from the command line to run in a Docker container, allocating 512MB of memory to each container. This was meant to simulate the behavior of the Nextflow head job dispatching multiple tasks to the same Docker host. We monitored memory usage as each container was started.\n\n```bash\n$ docker run --rm -it -v $PWD/dockertest.py:/dockertest.py --entrypoint /bin/bash --memory=\"512M\" --memory-swap=0 python:3.10.5-slim-bullseye\n```\n\nSure enough, we found that containers began dying with out-of-memory errors. Sometimes we could run a single container, and sometimes we could run two. Containers died even though memory use was well under the cgroups-enforced maximum, as reported by docker stats. 
As containers ran, we also used the Linux `free` command to monitor memory usage and the combined memory used by kernel buffers and the page cache.\n\n## Developing a theory of the case\n\nFrom our testing, we were able to clear both Nextflow and the AWS S3 copy facility since we could replicate the out-of-memory error in our controlled environment independent of both.\n\nWe had multiple theories of the case: **_Was it Colonel Mustard with an improper cgroups configuration? Was it Professor Plum and the size of the SWAP partition? Was it Mrs. Peacock running a Linux 5.20 kernel?_**\n\n_For the millennials and Gen Zs in the crowd, you can find a primer on the CLUE/Cluedo references [here](https://en.wikipedia.org/wiki/Cluedo)_\n\nTo make a long story short, we identified several suspects and conducted tests to clear each suspect one by one. Tests included the following:\n\n- We conducted tests with EBS vs. NVMe disk volumes to see if the error was related to page caches when using EBS. The problems persisted with NVMe but appeared to be much less severe.\n- We attempted to configure a swap partition as recommended in this [AWS article](https://repost.aws/knowledge-center/ecs-resolve-outofmemory-errors), which discusses similar out-of-memory errors in Amazon ECS (used by AWS Batch). AWS provides good documentation on managing container [swap space](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/container-swap.html) using the `--memory-swap` switch. You can learn more about how Docker manages swap space in the [Docker documentation](https://docs.docker.com/config/containers/resource_constraints/).\n- Creating swap files on the Docker host and making swap available to containers using the switch `--memory-swap=\"1g\"` appeared to help, and we learned a lot in the process. Using this workaround we could reliably run 10 containers simultaneously, whereas previously, we could run only one or two. This was a good workaround for static clusters but wasn’t always helpful in cloud batch environments. Creating the swap partition requires root privileges, and in batch environments, where resources may be provisioned automatically, this could be difficult to implement. It also didn’t explain the root cause of why containers were being killed. You can use the commands below to create a swap partition:\n\n```bash\n$ sudo dd if=/dev/zero of=/mnt/2GiB.swap bs=2048 count=1048576\n$ mkswap /mnt/2GiB.swap\n$ swapon /mnt/2GiB.swap\n```\n\n## A break in the case!\n\nOn Nov 16th, we finally caught a break in the case. A hot tip from Seqera Lab’s own [Jordi Deu-Pons](https://github.com/jordeu), indicated the culprit may be lurking in the Linux kernel. He suggested hard coding limits for two Linux kernel parameters as follows:\n\n```bash\n$ echo \"838860800\" > /proc/sys/vm/dirty_bytes\n$ echo \"524288000\" > /proc/sys/vm/dirty_background_bytes\n```\n\nWhile it may seem like a rather unusual and specific leap of brilliance, our tipster’s hypothesis was inspired by this [kernel bug](https://bugzilla.kernel.org/show_bug.cgi?id=207273) description. With this simple change, the reported memory usage for each container, as reported by docker stats, dropped dramatically. 
**Suddenly, we could run as many containers simultaneously as physical memory would allow.** It turns out that this was a regression bug that only manifested in newer versions of the Linux kernel.\n\nBy hardcoding these [kernel parameters](https://docs.kernel.org/admin-guide/sysctl/vm.html), we were limiting the number of dirty pages the kernel could hold before writing pages to disk. When these variables were not set, they defaulted to 0, and the default parameters `dirty_ratio` and `dirty_background_ratio` took effect instead.\n\nIn high-load conditions (such as data-intensive Nextflow pipeline tasks), processes accumulated dirty pages faster than the kernel could flush them to disk, eventually leading to the out-of-memory condition. By hard coding the dirty pages limit, we forced the kernel to flush the dirty pages to disk, thereby avoiding the bug. This also explained why the problem was less pronounced using NVMe storage, where flushing to disk occurred more quickly, thus mitigating the problem.\n\nFurther testing determined that the bug appeared reliably on the newer [Amazon Linux 2 AMI using the 5.10 kernel](https://aws.amazon.com/about-aws/whats-new/2021/11/amazon-linux-2-ami-kernel-5-10/). The bug did not seem to appear when using the older Amazon Linux 2 AMI running the 4.14 kernel version.\n\nWe now had two solid strategies to resolve the problem and thwart our killer:\n\n- Create a swap partition and run containers with the `--memory-swap` flag set.\n- Set `dirty_bytes` and `dirty_background_bytes` kernel variables on the Docker host before launching the jobs.\n\n## The killer is (mostly) brought to justice\n\nAvoiding the Linux 5.10 kernel was obviously not a viable option. The 5.10 kernel includes support for important processor architectures such as Intel® Ice Lake. This bug did not manifest earlier because, by default, AWS Batch was using ECS-optimized AMIs based on the 4.14 kernel. Further testing showed us that the killer could still appear in 4.14 environments, but the bug was harder to trigger.\n\nWe ended up working around the problem for Nextflow Tower users by tweaking the kernel parameters in the compute environment deployed by Tower Forge. This solution works reliably with AMIs based on both the 4.14 and 5.10 kernels. We considered adding a swap partition as this was another potential solution to the problem. However, we were concerned that this could have performance implications, particularly for customers running with EBS gp2 magnetic disk storage.\n\nInterestingly, we also tested the [Fusion v2 file system](https://seqera.io/fusion/) with NVMe disk. Using Fusion, we avoided the bug entirely on both kernel versions without needing to adjust kernel parameters or add a swap partition.\n\n## Some helpful investigative tools\n\nIf you find evidence of foul play in your cloud or cluster, here are some useful investigative tools you can use:\n\n- After manually starting a container, use [docker stats](https://docs.docker.com/engine/reference/commandline/stats/) to monitor the CPU and memory used by each container compared to available memory.\n\n ```bash\n $ watch docker stats\n ```\n\n- The Linux [free](https://linuxhandbook.com/free-command/) utility is an excellent way to monitor memory usage.
You can track total, used, and free memory and monitor the combined memory used by kernel buffers and page cache reported in the _buff/cache_ column.\n\n ```bash\n $ free -h\n ```\n\n- After a container was killed, we executed the command below on the Docker host to confirm why the containerized Python script was killed.\n\n ```bash\n $ dmesg -T | grep -i 'killed process'\n ```\n\n- We used the Linux [htop](https://man7.org/linux/man-pages/man1/htop.1.html) command to monitor CPU and memory usage and double-check the results reported by Docker.\n- You can use the command [systemd-cgtop](https://www.commandlinux.com/man-page/man1/systemd-cgtop.1.html) to validate cgroup settings and ensure you are not running into arbitrary limits imposed by _cgroups_.\n- Related to the _cgroups_ settings described above, you can inspect various memory-related limits directly from the file system. You can also use an alias to make the large numbers associated with _cgroups_ parameters easier to read. For example:\n\n ```bash\n $ alias n='numfmt --to=iec-i'\n $ cat /sys/fs/cgroup/memory/docker/DOCKER_CONTAINER/memory.limit_in_bytes | n\n 512Mi\n ```\n\n- You can clear the kernel buffer and page cache that appears in the _buff/cache_ column reported by the Linux _free_ command using either of these commands:\n\n ```bash\n $ echo 1 > /proc/sys/vm/drop_caches\n $ sysctl -w vm.drop_caches=1\n ```\n\n## The bottom line\n\nWhile we’ve come a long way in bringing the killer to justice, out-of-memory issues still crop up occasionally. It’s hard to say whether these are copycats, but you may still run up against this bug in a dark alley near you!\n\nIf you run into similar problems, hopefully, some of the suggestions offered above, such as tweaking kernel parameters or adding a swap partition on the Docker host, can help.\n\nFor some users, a good workaround is to use the [Fusion file system](https://seqera.io/blog/breakthrough-performance-and-cost-efficiency-with-the-new-fusion-file-system/) instead of Nextflow’s conventional approach based on the AWS CLI. As explained above, the combination of more efficient data handling in Fusion and fast NVMe storage means that dirty pages are flushed more quickly, and containers are less likely to reach hard limits and exit with an out-of-memory error.\n\nYou can learn more about the Fusion file system by downloading the whitepaper [Breakthrough performance and cost-efficiency with the new Fusion file system](https://seqera.io/whitepapers/breakthrough-performance-and-cost-efficiency-with-the-new-fusion-file-system/). If you encounter similar issues or have ideas to share, join the discussion on the [Nextflow Slack channel](https://join.slack.com/t/nextflow/shared_invite/zt-11iwlxtw5-R6SNBpVksOJAx5sPOXNrZg).\n",
    "images": [
      "/img/a-nextflow-docker-murder-mystery-the-mysterious-case-of-the-oom-killer-1.jpg"
    ]
  },
  {
    "slug": "2023/best-practices-deploying-pipelines-with-hpc-workload-managers",
    "title": "Nextflow on BIG IRON: Twelve tips for improving the effectiveness of pipelines on HPC clusters",
    "date": "2023-05-26T00:00:00.000Z",
    "content": "\nWith all the focus on cloud computing, it's easy to forget that most Nextflow pipelines still run on traditional HPC clusters.
In fact, according to our latest [State of the Workflow 2023](https://seqera.io/blog/the-state-of-the-workflow-the-2023-nextflow-and-nf-core-community-survey/) community survey, **62.8%** of survey respondents report running Nextflow on HPC clusters, and **75%** use an HPC workload manager.1 While the cloud is making gains, traditional clusters aren't going away anytime soon.\n\nTapping cloud infrastructure offers many advantages in terms of convenience and scalability. However, for organizations with the capacity to manage in-house clusters, there are still solid reasons to run workloads locally:\n\n- _Guaranteed access to resources_. Users don't need to worry about shortages of particular instance types, spot instance availability, or exceeding cloud spending caps.\n- _Predictable pricing_. Organizations are protected against price inflation and unexpected rate increases by capitalizing assets and depreciating them over time.\n- _Reduced costs_. Contrary to conventional wisdom, well-managed, highly-utilized, on-prem clusters are often less costly per core hour than cloud-based alternatives.\n- _Better performance and throughput_. While HPC infrastructure in the cloud is impressive, state-of-the-art on-prem clusters are still tough to beat.2\n\nThis article provides some helpful tips for organizations running Nextflow on HPC clusters.\n\n## The anatomy of an HPC cluster\n\nHPC Clusters come in many shapes and sizes. Some are small, consisting of a single head node and a few compute hosts, while others are huge, with tens or even hundreds of host computers.\n\nThe diagram below shows the topology of a typical mid-sized HPC cluster. Clusters typically have one or more \"head nodes\" that run workload and/or cluster management software. Cluster managers, such as [Warewulf](https://warewulf.lbl.gov/), [xCAT](https://xcat.org/), [NVIDIA Bright Cluster Manager](https://www.nvidia.com/en-us/data-center/bright-cluster-manager/), [HPE Performance Cluster Manager](https://www.hpe.com/psnow/doc/a00044858enw), or [IBM Spectrum Cluster Foundation](https://www.ibm.com/docs/en/scf/4.2.2?topic=guide-spectrum-cluster-foundation), are typically used to manage software images and provision cluster nodes. Large clusters may have multiple head nodes, with workload management software configured to failover if the master host fails.\n\n\n\nLarge clusters may have dedicated job submission hosts (also called login hosts) so that user activity does not interfere with scheduling and management activities on the head node. In smaller environments, users may simply log in to the head node to submit their jobs.\n\nClusters are often composed of different compute hosts suited to particular workloads.3 They may also have separate dedicated networks for management, internode communication, and connections to a shared storage subsystem. Users typically have network access only to the head node(s) and job submission hosts and are prevented from connecting to the compute hosts directly.\n\nDepending on the workloads a cluster is designed to support, compute hosts may be connected via a private high-speed 100 GbE or Infiniband-based network commonly used for MPI parallel workloads. Cluster hosts typically have access to a shared file system as well. In life sciences environments, NFS filers are commonly used. 
However, high-performance clusters may use parallel file systems such as [Lustre](https://www.lustre.org/), [IBM Spectrum Scale](https://www.ibm.com/docs/en/storage-scale?topic=STXKQY/gpfsclustersfaq.html) (formerly GPFS), [BeeGFS](https://www.beegfs.io/c/), or [WEKA](https://www.weka.io/data-platform/solutions/hpc-data-management/).\n\n[Learn about selecting the right storage architecture for your Nextflow pipelines](https://nextflow.io/blog/2023/selecting-the-right-storage-architecture-for-your-nextflow-pipelines.html).\n\n## HPC workload managers\n\nHPC workload managers have been around for decades. Initial efforts date back to the original [Portable Batch System](https://www.chpc.utah.edu/documentation/software/pbs-scheduler.php) (PBS) developed for NASA in the early 1990s. While modern workload managers have become enormously sophisticated, many of their core principles remain unchanged.\n\nWorkload managers are designed to share resources efficiently between users and groups. Modern workload managers support many different scheduling policies and workload types — from parallel jobs to array jobs to interactive jobs to affinity/NUMA-aware scheduling. As a result, schedulers have many \"knobs and dials\" to support various applications and use cases. While complicated, all of this configurability makes them extremely powerful and flexible in the hands of a skilled cluster administrator.\n\n### Some notes on terminology\n\nHPC terminology can be confusing because different terms sometimes refer to the same thing. Nextflow refers to individual steps in a workflow as a \"process\". Sometimes, process steps spawned by Nextflow are also described as \"tasks\". When Nextflow processes are dispatched to an HPC workload manager, however, each process is managed as a \"job\" in the context of the workload manager.\n\nHPC workload managers are sometimes referred to as schedulers. In this text, we use the terms HPC workload manager, workload manager, and scheduler interchangeably.\n\n## Nextflow and HPC workload managers\n\nNextflow supports at least **14 workload managers**, not including popular cloud-based compute services. This number is even higher if one counts variants of popular schedulers. For example, the Grid Engine executor works with Altair® Grid Engine™ as well as older Grid Engine dialects, including Oracle Grid Engine (previously Sun Grid Engine), Open Grid Engine (OGE), and SoGE (son of Grid Engine). Similarly, the PBS integration works with successors to the original OpenPBS project, including Altair® PBS Professional®, TORQUE, and Altair's more recent open-source version, OpenPBS.4 Workload managers supported by Nextflow are listed below:\n\n\n\nBelow we present some helpful tips and best practices when working with HPC workload managers.\n\n## Some best practices\n\n### 1. Select an HPC executor\n\nTo ensure that pipelines are portable across clouds and HPC clusters, Nextflow uses the notion of [executor](https://nextflow.io/docs/latest/executor.html) to insulate pipelines from the underlying compute environment.
A Nextflow executor determines the system where a pipeline is run and supervises its execution.\n\nYou can specify the executor to use in the [nextflow.config](https://nextflow.io/docs/latest/config.html?highlight=queuesize#configuration-file) file, inline in your pipeline code, or by setting the shell variable `NXF_EXECUTOR` before running a pipeline.\n\n```groovy\nprocess.executor = 'slurm'\n```\n\nExecutors are defined as part of the process scope in Nextflow, so in theory, each process can have a different executor. You can use the [local](https://www.nextflow.io/docs/latest/executor.html?highlight=local#local) executor to run a process on the same host as the Nextflow head job rather than dispatching it to an HPC cluster.\n\n
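As a minimal sketch of how this can look in practice, the configuration below sends every process to the Slurm executor by default while one lightweight step runs locally alongside the head job; the process name `make_report` is purely hypothetical:\n\n```groovy\nprocess {\n    executor = 'slurm'            // default executor for all processes\n\n    // hypothetical lightweight step that runs on the same host as the Nextflow head job\n    withName: make_report {\n        executor = 'local'\n    }\n}\n```\n\nIn practice, most pipelines stick to a single executor and override it only for trivial housekeeping steps.\n\n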
A complete list of available executors is available in the [Nextflow documentation](https://nextflow.io/docs/latest/executor.html). Below is a handy list of executors for HPC workload managers.\n\n| Workload Manager | Executor | License | Documentation |\n| --- | --- | --- | --- |\n| Slurm | `slurm` | Open source | Slurm |\n| IBM Spectrum LSF | `lsf` | Commercial | IBM Spectrum LSF knowledge center |\n| OpenPBS | `pbspro` | Open source | OpenPBS (docs packaged with software) |\n| Altair® Grid Engine™ | `sge` | Commercial | Altair Grid Engine introductory guide |\n| Altair® PBS Professional® | `pbspro` | Commercial | Altair PBS Professional user's guide |\n| Adaptive Computing MOAB | `moab` | Commercial | Adaptive Computing Maui Scheduler5 |\n| Adaptive Computing TORQUE | `pbs` | Open source | Torque administrators guide |\n| HTCondor | `condor` | Open source | HTCondor documentation |\n| Apache Ignite | `ignite` | Open source | Apache Ignite Documentation |\n| HyperQueue | `hyperqueue` | Open source | Docs on GitHub |\n
\n\n### 2. Select a queue\n\nMost HPC workload managers support the notion of queues. In a small cluster with a few users, queues may not be important. However, they are essential in large environments. Cluster administrators typically configure queues to reflect site-specific scheduling and resource-sharing policies. For example, a site may have a short queue that only supports short-running jobs and kills them after 60 seconds. A _night_ queue may only dispatch jobs between midnight and 6:00 AM. Depending on the sophistication of the workload manager, different queues may have different priorities and access to queues may be limited to particular users or groups.\n\nWorkload managers typically have default queues. For example, `normal` is the default queue in LSF, while `all.q` is the default queue in Grid Engine. Slurm supports the notion of partitions that are essentially the same as queues, so Slurm partitions are referred to as queues within Nextflow. You should ask your HPC cluster administrator what queue to use when submitting Nextflow jobs.\n\nLike the executor, queues are part of the process scope. The queue to dispatch jobs to is usually defined once in the `nextflow.config` file and applied to all processes in the workflow as shown below, or it can be set per-process.\n\n```\nprocess {\n queue = 'myqueue'\n executor = 'sge'\n}\n```\n\nSome organizations use queues as a mechanism to request particular types of resources. For example, suppose hosts with the latest NVIDIA A100 or K100 GPUs are in high demand. In that case, a cluster administrator may configure a particular queue called `gpu_queue` to dispatch jobs to those hosts and limit access to specific users. For process steps requiring access to GPUs, the administrator may require submitting jobs to this queue. This is why it is important to consult site-specific documentation or ask your cluster administrator which queues are available.\n\n### 3. Specify process-level resource requirements\n\nDepending on the executor, you can pass various resource requirements for each process/job to the workload manager. Like _executors_ and _queues_, these settings are configured at the process level. Not all executors support the same resource directives, but the settings below are common to most HPC workload managers.\n\n[cpus](https://nextflow.io/docs/latest/process.html#process-cpus) – specifies the number of logical CPUs requested for a particular process/job. A logical CPU maps to a physical processor core or thread depending on whether hyperthreading is enabled on the underlying cluster hosts.\n\n[memory](https://nextflow.io/docs/latest/process.html#process-memory) – different process steps/jobs will typically have different memory requirements. It is important to specify memory requirements accurately because the HPC schedulers use this information to decide how many jobs can execute concurrently on a host. If you overstate resource requirements, you are wasting resources on the cluster.\n\n[time](https://nextflow.io/docs/latest/process.html#process-time) – it is helpful to limit how much time a particular process or job is allowed to run. To avoid jobs hanging and consuming resources indefinitely, you can specify a time limit after which a job will be automatically terminated and re-queued. Time limits may also be enforced at the queue level behind the scenes based on workload management policies. 
If you have long-running jobs, your cluster administrator may ask you to use a particular queue for those Nextflow process steps to prevent jobs from being automatically killed.6\n\nWhen writing pipelines, it is a good practice to consolidate per-process resource requirements in the `nextflow.config` file, and use process selectors to indicate what resource requirements apply to what process steps. For example, in the example below, processes will be dispatched to the Slurm cluster by default. Each process will require two cores, 4 GB of memory, and can run for no more than 10 minutes. For the foo and long-running bar jobs, process-specific selectors can override these default settings as shown below:\n\n```groovy\nprocess {\n executor = 'slurm'\n queue = 'general'\n cpus = 2\n memory = '4 GB'\n time = '10m'\n\n\n withName: foo {\n cpus = 8\n memory = '8 GB'\n }\n\n\n withName: bar {\n queue = 'long'\n cpus = 32\n memory = '8 GB'\n time = '1h 30m'\n }\n}\n```\n\n### 4. Take advantage of workload manager-specific features\n\nSometimes, organizations may want to take advantage of syntax specific to a particular workload manager. To accommodate this, most Nextflow executors provide a `clusterOptions` setting to inject one or more switches to the job submission command line specific to the selected workload manager ([bsub](https://www.ibm.com/docs/en/spectrum-lsf/10.1.0?topic=bsub-options), [msub](http://docs.adaptivecomputing.com/maui/commands/msub.php), [qsub](https://gridscheduler.sourceforge.net/htmlman/htmlman1/qsub.html), etc).\n\nThese scheduler-specific commands can get very detailed and granular. They can apply to all processes in a workflow or only to specific processes. As an LSF-specific example, suppose a deep learning model training workload is a step in a Nextflow pipeline. The deep learning framework used may be GPU-aware and have specific topology requirements.\n\nIn this example, we specify a job consisting of two tasks where each task runs on a separate host and requires exclusive use of two GPUs. We also impose a resource requirement that we want to schedule the CPU portion of each CUDA job in physical proximity to the GPU to improve performance (on a processor core close to the same PCIe or NVLink connection, for example).\n\n```groovy\nprocess {\n withName: dl_workload {\n executor = 'lsf'\n queue = 'gpu_hosts'\n memory = '16B'\n clusterOptions = '-gpu \"num=2:mode=exclusive_process\" -n2 -R \"span[ptile=1] affinity[core(1)]\"'\n }\n}\n```\n\nIn addition to `clusterOptions`, several other settings in the [executor scope](https://nextflow.io/docs/latest/config.html?highlight=queuesize#scope-executor) can be helpful when controlling how jobs behave on an HPC workload manager.\n\n### 5. Decide where to launch your pipeline\n\nLaunching jobs from a head node is common in small HPC clusters. Launching jobs from dedicated job submission hosts (sometimes called login hosts) is more common in large environments. Depending on the workload manager, the head node or job submission host will usually have the workload manager’s client tools pre-installed. These include client binaries such as `sbatch` (Slurm), `qsub` (PBS or Grid Engine), or `bsub` (LSF). Nextflow expects to be able to find these job submission commands on the Linux `PATH`.\n\nRather than launching the Nextflow driver job for a long-running pipeline from the head node or a job submission host, a better practice is to wrap the Nextflow run command in a script and submit the entire workflow as a job. 
An example using LSF is provided below:\n\n```\n$ cat submit_pipeline.sh\n#!/bin/bash\n#BSUB -q Nextflow\n#BSUB -m \"hostgroupA\"\n#BSUB -o out.%J\n#BSUB -e err.%J\n#BSUB -J headjob\n#BSUB -R \"rusage[mem=16GB]\"\nnextflow run nextflow-io/hello -c my.config -ansi-log false\n\n\n$ bsub < submit_pipeline.sh\n```\n\nThe specifics will depend on the cluster environment and how the environment is configured. For this to work, the job submission commands must also be available on the execution hosts to which the head job is dispatched. This is not always the case, so you should check with your HPC cluster administrator.\n\nDepending on the workload manager, check your queue or cluster configuration to ensure that submitted jobs can spawn other jobs and that you do not bump up against hard limits. For example, Slurm allows a job step to spawn up to 512 tasks per node by default.7\n\n### 6. Limit your heap size\n\nSetting the JVM’s max heap size is another good practice when running on an HPC cluster. The Nextflow runtime runs on top of a Java virtual machine which, by design, tries to allocate as much memory as possible. To avoid this, specify the maximum amount of memory that can be used by the Java VM using the `-Xms` and `-Xmx` Java flags.\n\nThese can be specified using the `NXF_OPTS` environment variable.\n\n```bash\nexport NXF_OPTS=\"-Xms512m -Xmx8g\"\n```\n\nThe `-Xms` flag specifies the minimum heap size, and `-Xmx` specifies the maximum heap size. In the example above, the minimum heap size is set to 512 MB, which can grow to a maximum of 8 GB. You will need to experiment with appropriate values for each pipeline to determine how many concurrent head jobs you can run on the same host.\n\nFor more information about memory management with Java, consult this [Oracle documentation regarding tuning JVMs](https://docs.oracle.com/cd/E21764_01/web.1111/e13814/jvm_tuning.htm#PERFM150).\n\n### 7. Use the scratch directive\n\nNextflow requires a shared file system path as a working directory to allow the pipeline tasks to share data with each other. When using this model, a common practice is to use the node's local scratch storage as the working directory.
This avoids cluster nodes needing to simultaneously read and write files to a shared network file system, which can become a bottleneck.\n\nNextflow implements this best practice which can be enabled by adding the following setting in your `nextflow.config` file.\n\n```groovy\nprocess.scratch = true\n```\n\nBy default, if you enable `process.scratch`, Nextflow will use the directory pointed to by `$TMPDIR` as a scratch directory on the execution host.\n\nYou can optionally specify a specific path for the scratch directory as shown:\n\n```groovy\nprocess.scratch = '/ssd_drive/scratch_dir'\n```\n\nWhen the scratch directive is enabled, Nextflow:\n\n- Creates a unique directory for process execution in the supplied scratch directory;\n- Creates a symbolic link in the scratch directory for each input file in the shared work directory required for job execution;\n- Runs the job using the local scratch path as the working directory;\n- Copies output files to the job's shared work directory on the shared file system when the job is complete.\n\nScratch storage is particularly beneficial for process steps that perform a lot of file system I/O or create large numbers of intermediate files.\n\nTo learn more about Nextflow and how it works with various storage architectures, including shared file systems, check out our recent article [Selecting the right storage architecture for your Nextflow pipelines](https://nextflow.io/blog/2023/selecting-the-right-storage-architecture-for-your-nextflow-pipelines.html).\n\n### 8. Launch pipelines in the background\n\nIf you are launching your pipeline from a login node or cluster head node, it is useful to run pipelines in the background without losing the execution output reported by Nextflow. You can accomplish this by using the -bg switch in Nextflow and redirecting _stdout_ to a log file as shown:\n\n```bash\nnextflow run -bg > my-file.log\n```\n\nThis frees up the interactive command line to run commands such as [squeue](https://slurm.schedmd.com/squeue.html) (Slurm) or [qstat](https://gridscheduler.sourceforge.net/htmlman/htmlman1/qstat.html) (Grid Engine) to monitor job execution on the cluster. It is also beneficial because it prevents network connection issues from interfering with pipeline execution.\n\nNextflow has rich terminal logging and uses ANSI escape codes to update pipeline execution counters interactively as the pipeline runs. If you are logging output to a file as shown above, it is a good idea to disable ANSI logging using the command line option `-ansi-log false` or the environment variable `NXF_ANSI_LOG=false`. ANSI logging can also be disabled when wrapping the Nextflow head job in a script and launching it as a job managed by the workload manager as explained above.\n\n### 9. Retry failing jobs after increasing resource allocation\n\nGetting resource requirements such as cpu, memory, and time is often challenging since resource requirements can vary depending on the size of the dataset processed by each job step. If you request too much resource, you end up wasting resources on the cluster and reducing the effectiveness of the compute environment for everyone. On the other hand, if you request insufficient resources, process steps can fail.\n\nTo address this problem, Nextflow provides a mechanism that allows you to modify the amount of computing resources requested in the case of a process failure on the fly and attempt to re-execute it using a higher limit. 
For example:\n\n```groovy\nprocess {\n withName: foo {\n memory = { 2.GB * task.attempt }\n time = { 1.hour * task.attempt }\n\n errorStrategy = { task.exitStatus in 137..140 ? 'retry' : 'terminate' }\n maxRetries = 3\n }\n}\n```\n\nYou can manage how many times a job can be retried and specify different behaviours depending on the exit error code. You will see this automated mechanism used in many production pipelines. It is a common practice to double the resources requested after a failure until the job runs successfully.\n\nFor sites running Nextflow Tower, Tower has a powerful resource optimization facility built in that essentially learns per-process resource requirements from previously executed pipelines and auto-generates resource requirements that can be placed in a pipeline's `nextflow.config` file. By using resource optimization in Tower, pipelines will request only the resources that they actually need. This avoids unnecessary delays due to failed/retried jobs and also uses the shared cluster more efficiently.\n\nTower resource optimizations works with all HPC workload managers as well as popular cloud services. You can learn more about resource optimization in the article [Optimizing resource usage with Nextflow Tower](https://seqera.io/blog/optimizing-resource-usage-with-nextflow-tower/).\n\n### 10. Cloud Bursting\n\nCloud bursting is a configuration method in hybrid cloud environments where cloud computing resources are used automatically whenever on-premises infrastructure reaches peak capacity. The idea is that when sites run out of compute capacity on their local infrastructure, they can dynamically burst additional workloads to the cloud.\n\nWith its built-in support for cloud executors, Nextflow handles bursting to the cloud with ease, but it is important to remember that large HPC sites run other workloads beyond Nextflow pipelines. As such, they often have their own bursting solutions tightly coupled to the workload manager.\n\nCommercial HPC schedulers tend to have facilities for cloud bursting built in. While there are many ways to enable burstings, and implementations vary by workload manager, a few examples are provided here:\n\n- Open source Slurm provides a native mechanism to burst workloads to major cloud providers when local cluster resources are fully subscribed. To learn more, see the Slurm [Cloud Scheduling Guide](https://slurm.schedmd.com/elastic_computing.html).\n- IBM Spectrum LSF provides a cloud resource connector enabling policy-driven cloud bursting to various clouds. See the [IBM Spectrum LSF Resource Connector](https://www.ibm.com/docs/en/spectrum-lsf/10.1.0?topic=lsf-resource-connnector) documentation for details.\n- Altair PBS Professional also provides sophisticated support for cloud bursting to multiple clouds, with cloud cost integration features that avoid overspending in the cloud. [See PBS Professional 2022.1](https://altair.service-now.com/community?sys_id=0e9b07dadbf8d150cfd5f6a4e2961997&view=sp&id=community_blog&table=sn_communities_blog).\n- Adaptive Computing offers [Moab Cloud/NODUS Cloud Bursting](https://support.adaptivecomputing.com/wp-content/uploads/2018/08/Moab_Cloud-NODUS_Cloud_Bursting_datasheet_web.pdf), a commercial offering that works with an extensive set of resource providers including AliCloud, OCI, OpenStack, VMware vSphere, and others.\n\nData handling makes cloud bursting complex. 
Some HPC centers deploy solutions that provide a unified namespace, where on-premises and cloud-based nodes share a consistent view of the same file system.\n\nIf you are in a larger facility, it's worth having a discussion with your HPC cluster administrator. Cloud bursting may be handled automatically for you. You may be able to use the executor associated with your on-premises workload manager, and simply point your workloads to a particular queue. The good news is that Nextflow provides you with tremendous flexibility.\n\n### 11. Fusion file system\n\nTraditionally, on-premises clusters have used a local shared file system such as NFS or Lustre. The new Fusion file system provides an alternative way to manage data.\n\nFusion is a lightweight, POSIX-compliant file system deployed inside containers that provides transparent access to cloud-based object stores such as Amazon S3. While users running pipelines on local clusters may not have considered using cloud storage, doing so has some advantages:\n\n- Cloud object storage is economical for long-term storage.\n- Object stores such as Amazon S3 provide virtually unlimited capacity.\n- Many reference datasets in life sciences already reside in cloud object stores.\n\nIn cloud computing environments, Fusion FS has demonstrated that it can improve pipeline throughput by up to **2.2x** and reduce long-term cloud storage costs by up to **76%**. To learn more about the Fusion file system and how it works, you can download the whitepaper [Breakthrough performance and cost-efficiency with the new Fusion file system](https://seqera.io/whitepapers/breakthrough-performance-and-cost-efficiency-with-the-new-fusion-file-system/).\n\nRecently, Fusion support has been added for selected HPC workload managers including Slurm, IBM Spectrum LSF, and Grid Engine. This is an exciting development as it enables on-premises cluster users to seamlessly run workloads locally using cloud-based storage with minimal configuration effort.\n\n### 12. Additional configuration options\n\nThere are several additional Nextflow configuration options that are important to be aware of when working with HPC clusters. You can find a complete list in the Nextflow documentation in the [Scope executor](https://nextflow.io/docs/latest/config.html#scope-executor) section.\n\n`queueSize` – The queueSize parameter is optionally defined in the `nextflow.config` file or within a process and defines how many Nextflow processes can be queued in the selected workload manager at a given time. By default, this value is set to 100 jobs. In large sites with multiple users, HPC cluster administrators may limit the number of pending or executing jobs per user on the cluster. For example, on an LSF cluster, this is done by setting the parameter `MAX_JOBS` in the `lsb.users` file to enforce per-user or per-group slot limits. If your administrators have placed limits on the number of jobs you can run, you should tune the `queueSize` parameter in Nextflow to match your site-enforced maximums.\n\n`submitRateLimit` – Depending on the scheduler, having many users simultaneously submitting large numbers of jobs to a cluster can overwhelm the scheduler on the head node and cause it to become unresponsive to commands. To mitigate this, if your pipeline submits a large number of jobs, it is a good practice to throttle the rate at which jobs will be dispatched from Nextflow. By default, the job submission rate is unlimited.
If you wanted to allow no more than 50 jobs to be submitted every two minutes, set this parameter as shown:\n\n```groovy\nexecutor.submitRateLimit = '50/2min'\nexecutor.queueSize = 50\n```\n\n`jobName` – Many workload managers have interactive web interfaces or downstream reporting or analysis tools for monitoring or analyzing workloads. A few examples include [Slurm-web](http://rackslab.github.io/slurm-web/introduction.html), [MOAB HPC Suite](https://adaptivecomputing.com/moab-hpc-suite/) (MOAB and Torque), [Platform Management Console](https://www.ibm.com/docs/en/pasc/1.1.1?topic=asc-platform-management-console) (for LSF), [Spectrum LSF RTM](https://www.ibm.com/docs/en/spectrum-lsf-rtm/10.2.0?topic=about-spectrum-lsf-rtm), and [Altair® Access™](https://altair.com/access).\n\nWhen using these tools, it is helpful to associate a meaningful name with each job. Remember, a job in the context of the workload manager maps to a process or task in Nextflow. Use the `jobName` property associated with the executor to give your job a name. You can construct these names dynamically as illustrated below so the job reported by the workload manager reflects the name of our Nextflow process step and its unique ID.\n\n```groovy\nexecutor.jobName = { \"$task.name - $task.hash\" }\n```\n\nYou will need to make sure that generated name matches the validation constraints of the underlying workload manager. This also makes troubleshooting easier because it allows you to cross reference Nextflow log files with files generated by the workload manager.\n\n## The bottom line\n\nIn addition to supporting major cloud environments, Nextflow works seamlessly with a wide variety of on-premises workload managers. If you are fortunate enough to have access to large-scale compute infrastructure at your facility, taking advantage of these powerful HPC workload management integrations is likely the way to go.\n\n
\n\n1While this may sound like a contradiction, remember that HPC workload managers can also run in the cloud.\n\n2A cloud vCPU is equivalent to a thread on a multicore CPU, and HPC workloads often run with hyperthreading disabled for the best performance. As a result, you may need 64 vCPUs in the cloud to match the performance of a 32-core processor SKU on-premises. Similarly, interconnects such as Amazon Elastic Fabric Adapter (EFA) deliver impressive performance. However, even with high-end cloud instance types, its 100 Gbps throughput falls short compared to interconnects such as [NDR InfiniBand](https://www.hpcwire.com/2020/11/16/nvidia-mellanox-debuts-ndr-400-gigabit-infiniband-at-sc20/) and [HPE Cray Slingshot](https://www.nextplatform.com/2022/01/31/crays-slingshot-interconnect-is-at-the-heart-of-hpes-hpc-and-ai-ambitions/), delivering 400 Gbps or more.\n\n3While MPI parallel jobs are less common in Nextflow pipelines, sites may also run fluid dynamics, computational chemistry, or molecular dynamics workloads using tools such as [NWChem](https://www.nwchem-sw.org/) or [GROMACS](https://www.gromacs.org/) that rely on MPI and fast interconnects to facilitate efficient inter-node communication.\n\n4Altair’s open-source OpenPBS is distinct from the original OpenPBS project released in 1998 of the same name.\n\n5MOAB HPC is a commercial product offered by Adaptive Computing. Its scheduler is based on the open source Maui scheduler.\n\n6For some workload managers, knowing how much time a job is expected to run is considered in scheduling algorithms. For example, suppose it becomes necessary to preempt particular jobs when higher priority jobs come along or because of resource ownership issues. In that case, the scheduler may take into account actual vs. estimated runtime to avoid terminating long-running jobs that are closed to completion.\n\n7[MaxTasksPerNode](https://slurm.schedmd.com/slurm.conf.html#OPT_MaxTasksPerNode) setting is configurable in the slurm.conf file.\n\n8Jobs that request large amounts of resource often pend in queues and take longer to schedule impacting productivity as there may be fewer candidate hosts available that meet the job’s resource requirement.\n", + "images": [ + "/img/nextflow-on-big-iron-twelve-tips-for-improving-the-effectiveness-of-pipelines-on-hpc-clusters-1.jpg", + "/img/nextflow-on-big-iron-twelve-tips-for-improving-the-effectiveness-of-pipelines-on-hpc-clusters-2.jpg" + ] + }, + { + "slug": "2023/celebrating-our-largest-international-training-event-and-hackathon-to-date", + "title": "Celebrating our largest international training event and hackathon to date", + "date": "2023-04-25T00:00:00.000Z", + "content": "\nIn mid-March, we conducted our bi-annual Nextflow and [nf-core](https://nf-co.re/) training and hackathon in what was unquestionably our best-attended community events to date. This year we had an impressive **1,345 participants** attend the training from **76 countries**. Attendees came from far and wide — from Algeria to Andorra to Zambia to Zimbabwe!\n\nAmong our event attendees, we observed the following statistics:\n\n- 40% were 30 years old or younger, pointing to a young cohort of Nextflow users;\n- 55.3% identified as male vs. 40% female, highlighting our growing diversity;\n- 68.2% came from research institutions;\n- 71.4% were attending their first Nextflow training event;\n- 96.7% had never attended a Nextflow hackathon.\n\nRead on to learn more about these exciting events. 
If you missed it, you can still [watch the Nextflow & nf-core training](https://www.youtube.com/playlist?list=PL3xpfTVZLcNhoWxHR0CS-7xzu5eRT8uHo) at your convenience.\n\n\n\n## Multilingual training\n\nThis year, we were pleased to offer [Nextflow / nf-core training](https://nf-co.re/events/2023/training-march-2023) in multiple languages: in addition to English, we delivered sessions in French, Hindi, Portuguese, and Spanish.\n\nIn our pre-event registration, **~88%** of respondents indicated they would watch the training in English. However, there turned out to be a surprising appetite for training in other languages. We hope that multilingual training will make Nextflow even more accessible to talented scientists and researchers around the world.\n\nThe training consisted of four separate sessions in **5 languages** for a total of **20 sessions**. As of April 19th, we’ve amassed over **6,600 YouTube views** with **2,300+ hours** of training watched so far. **27%** have watched the non-English sessions, making the effort at translation highly worthwhile.\n\nThank you to the following people who delivered the training: [Chris Hakkaart](https://twitter.com/Chris_Hakk) (English), [Marcel Ribeiro-Dantas](https://twitter.com/mribeirodantas) (Portuguese), [Maxime Garcia](https://twitter.com/gau) (French), [Julia Mir Pedrol](https://twitter.com/juliamirpedrol) and [Gisela Gabernet](https://twitter.com/GGabernet) (Spanish), and [Abhinav Sharma](https://twitter.com/abhi18av) (Hindi).\n\nYou can view the community training sessions on YouTube here:\n\n- [March 2023 Community Training – English](https://www.youtube.com/playlist?list=PL3xpfTVZLcNhoWxHR0CS-7xzu5eRT8uHo)\n- [March 2023 Community Training – Portugese](https://www.youtube.com/playlist?list=PL3xpfTVZLcNhi41yDYhyHitUhIcUHIbJg)\n- [March 2023 Community Training – French](https://www.youtube.com/playlist?list=PL3xpfTVZLcNhiv9SjhoA1EDOXj9nzIqdS)\n- [March 2023 Community Training – Spanish](https://www.youtube.com/playlist?list=PL3xpfTVZLcNhSlCWVoa3GURacuLWeFc8O)\n- [March 2023 Community Training – Hindi](https://www.youtube.com/playlist?list=PL3xpfTVZLcNikun1FrSvtXW8ic32TciTJ)\n\nThe videos accompany the written training material, which you can find at [https://training.nextflow.io/](https://training.nextflow.io/)\n\n## Improved community training resources\n\nAlong with the updated training and hackathon resources above, we’ve significantly enhanced our online training materials available at [https://training.nextflow.io/](https://training.nextflow.io/). Thanks to the efforts of our volunteers, technical training, [Gitpod resources](https://training.nextflow.io/basic_training/setup/#gitpod), and materials for hands-on, self-guided learning are now available in English and Portuguese. Some of the materials are also available in Spanish and French.\n\nThe training comprises a significant set of resources covering topics including managing dependencies, containers, channels, processes, operators, and an introduction to the Groovy language. It also includes topics related to nf-core for users and developers as well as Nextflow Tower. 
Marcel Ribeiro-Dantas describes his experience leading the translation effort for this documentation in his latest nf-core/bytesize [translation talk](https://nf-co.re/events/2023/bytesize_translations).\n\nAdditional educational resources are provided in the recent Seqera Labs blog article, [Learn Nextflow in 2023](https://nextflow.io/blog/2023/learn-nextflow-in-2023.html), posted in February before our latest training event.\n\n## The nf-core hackathon\n\nWe also ran a separate [hackathon](https://nf-co.re/events/2023/hackathon-march-2023) event from March 27th to 29th. This hackathon ran online via Gather, a virtual hosting platform, but for the first time we also asked community members to host local sites. We were blown away by the response, with volunteers coming forward to organize in-person attendance in 16 different locations across the world (and this was before we announced that Seqera would organize pizza for all the sites!). These gatherings had a big impact on the feel of the hackathon, whilst remaining accessible and eco-friendly, avoiding the need for air travel.\n\nThe hackathon was divided into five focus areas: modules, pipelines, documentation, infrastructure, and subworkflows. We had **411** people register, including **278 in-person attendees** at **16 locations**. This is an increase of **38%** compared to the **289** people that attended our October 2022 event. The hackathon was hosted in multiple countries including Brazil, France, Germany, Italy, Poland, Senegal, Serbia, South Africa, Spain, Sweden, the UK, and the United States.\n\nWe would like to thank the many organizations worldwide who provided a venue to host the hackathon and helped make it a resounding success. Besides being an excellent educational event, we resolved many longstanding Nextflow and nf-core issues.\n\n
\n \"Hackathon\n
\n\nYou can access the project reports from each hackathon team over the three-day event compiled in HackMD below:\n\n- [Modules team](https://hackmd.io/A5v4soteQjKywl3UgFa_6g)\n- [Pipelines Team](https://hackmd.io/Bj_MK3ubQWGBD4t0X2KpjA)\n- [Documentation Team](https://hackmd.io/o6AgPTZ7RBGCyZI72O1haA)\n- [Infrastructure Team](https://hackmd.io/uC-mZlEXQy6DaXZdjV6akA)\n- [Subworkflows Team](https://hackmd.io/Udtvj4jASsWLtMgrbTNwBA)\n\nYou can also view ten Hackathon videos outlining the event, introducing an overview of the teams, and daily hackathon activities in the [March 2023 nf-core hackathon YouTube playlist](https://www.youtube.com/playlist?list=PL3xpfTVZLcNhfyF_QJIfSslnxRCU817yc). Check out activity in the nf-core hackathon March 2023 Github [issues board](https://github.com/orgs/nf-core/projects/38/views/16?layout=board) for a summary of what each team worked on.\n\n## A diverse and growing community\n\nWe were particularly pleased to see the growing diversity of the Nextflow and nf-core community membership, enabled partly by support from the Chan Zuckerberg Initiative Diversity and Inclusion grant and our nf-core mentorship programs. You can learn more about our mentorship efforts and exciting efforts of our global team in Chris Hakkaart’s excellent post, [Nextflow and nf-core Mentorship](https://nextflow.io/blog/2023/czi-mentorship-round-2.html) on the Nextflow blog.\n\nThe growing diversity of our community was also reflected in the results of our latest Nextflow Community survey, which you can read more about on the [Seqera Labs blog](https://seqera.io/blog/the-state-of-the-workflow-2023-community-survey-results/).\n\n
\n \"Hackathon\n
\n\n## Looking forward\n\nRunning global events at this scale takes a tremendous team effort. The resources compiled will be valuable in introducing more people to Nextflow and nf-core. Thanks to everyone who participated in this year’s training and hackathon events. We look forward to making these even bigger and better in the future!\n\nThe next community training will be held online September 2023. This will be followed by two Nextflow Summit events with associated nf-core hackathons:\n\n- Barcelona: October 16-20, 2023\n- Boston: November 2023 (dates to be confirmed)\n\nIf you’d like to join, you can register to receive news and updates about the events at [https://summit.nextflow.io/summit-2023-preregistration/](https://summit.nextflow.io/summit-2023-preregistration/)\n\nYou can follow us on Twitter at [@nextflowio](https://twitter.com/nextflowio) or [@nf_core](https://twitter.com/nf_core) or join the discussion on the [Nextflow](https://www.nextflow.io/slack-invite.html) and [nf-core](https://nf-co.re/join) community Slack channels.\n\n
\n \"Hackathon\n
\n\n
\n \"Hackathon\n
\n", + "images": [ + "/img/celebrating-our-largest-international-training-event-and-hackathon-to-date-1.jpg", + "/img/celebrating-our-largest-international-training-event-and-hackathon-to-date-2.jpg", + "/img/celebrating-our-largest-international-training-event-and-hackathon-to-date-3.jpg", + "/img/celebrating-our-largest-international-training-event-and-hackathon-to-date-4.jpg" + ] + }, + { + "slug": "2023/community-forum", + "title": "Introducing community.seqera.io", + "date": "2023-10-18T00:00:00.000Z", + "content": "\nWe are very excited to introduce the [Seqera community forum](https://community.seqera.io/) - the new home of the Nextflow community!\n\n

community.seqera.io

\n\nThe growth of the Nextflow community over recent years has been phenomenal. The Nextflow Slack organization was launched in early 2022 and has already reached a membership of nearly 3,000 members. As we look ahead to growing to 5,000 and even 50,000, we are making a new tool available to the community: a community forum.\n\nWe expect the new forum to coexist with the Nextflow Slack. The forum will be great at medium-format discussion, whereas Slack is largely designed for short-term ephemeral conversations. We want to support this growth of the community and believe the new forum will allow us to scale.\n\nDiscourse is an open-source, web-based platform designed for online community discussions and forum-style interactions. Discourse offers a user-friendly interface, real-time notifications, and a wide range of customization options. It prioritizes healthy and productive conversations by promoting user-friendly features, such as trust levels, gamification, and robust moderation tools. Discourse is well known for its focus on fostering engaging and respectful discussions and already caters to many large developer communities. It’s able to serve immense groups, giving us confidence that it will meet the needs of our growing developer community just as well. We believe that Discourse is a natural fit for the evolution of the Nextflow community.\n\n

\n\nThe community forum offers many exciting new features. Here are some of the things you can expect:\n\n- **Open content:** Content on the new forum is public – accessible without login, indexed by Google, and can be linked to directly. This means that it will be much easier to find answers to your problems, as well as share solutions on other platforms.\n- **Don’t ask the same thing twice:** It’s not always easy to find answers when there’s a lot of content available. The community forum helps you by suggesting similar topics as you write a new post. An upcoming [Discourse AI Bot](https://www.discourse.org/plugins/ai.html) may even allow you to ask questions using natural language in the future!\n- **Stay afloat:** The community forum will ensure developers have a space where they can post without fear that what they write might be drowned out, and where anything that our community finds useful will rise to the top of the list. Discourse will give life to threads with high-quality content that may have otherwise gone unnoticed and lost in a sea of new posts.\n- **Better organized:** The forum model for categories, tags, threads, and quoting forces conversations to be structured. Many questions involve the broader Nextflow ecosystem, tagging with multiple topics will cut through the noise and allow people to participate in targeted and well-labeled discussions. Importantly, maintainers can move miscategorized posts without asking the original author to delete and write again.\n- **Multi-product:** The forum has categories for Nextflow but also [Seqera Platform](https://seqera.io/platform/), [MultiQC](https://seqera.io/multiqc/), [Wave](https://seqera.io/wave/), and [Fusion](https://seqera.io/fusion/). Questions that involve multiple Seqera products can now span these boundaries, and content can be shared between posts easily.\n- **Community recognition:** The community forum will encourage a healthy ecosystem of developers that provides value to everyone involved and rewards the most active users. The new forum encourages positive community behaviors through features such as badges, a trust system, and community moderation. There’s even a [community leaderboard](https://community.seqera.io/leaderboard/)! We plan to gradually introduce additional features over time as adoption grows.\n\nOnline discussion platforms have been the beating heart of the Nextflow community from its inception. The first was a Google groups email list, which was followed by the Gitter instant messaging platform, GitHub Discussions, and most recently, Slack. We’re thrilled to embark on this new chapter of the Nextflow community – let us know what you think and ask any questions you might have in the [“Site Feedback” forum category](https://community.seqera.io/c/community/site-feedback/2)! Join us today at [https://community.seqera.io](https://community.seqera.io/) for a new and improved developer experience.\n\n

Visit the Seqera community forum

\n", + "images": [ + "/img/seqera-community-all.png" + ] + }, + { + "slug": "2023/czi-mentorship-round-2", + "title": "Nextflow and nf-core Mentorship, Round 2", + "date": "2023-04-17T00:00:00.000Z", + "content": "\n## Introduction\n\n
![Mentorship rocket](/img/mentorships-round2-rocket.png)\n\n_Nextflow and nf-core mentorship rocket._
\n\nThe global Nextflow and nf-core community is thriving with strong engagement in several countries. As we continue to expand and grow, we remain committed to prioritizing inclusivity and actively reaching groups with low representation.\n\nThanks to the support of our Chan Zuckerberg Initiative Diversity and Inclusion grant, we established an international Nextflow and nf-core mentoring program. With the second round of the mentorship program now complete, we celebrate the success of the most recent cohort of mentors and mentees.\n\nFrom hundreds of applications, thirteen pairs of mentors and mentees were chosen for the second round of the program. For the past four months, they met regularly to collaborate on Nextflow or nf-core projects. The project scope was left up to the mentees, enabling them to work on any project aligned with their scientific interests and schedules.\n\nMentor-mentee pairs worked on a range of projects that included learning Nextflow and nf-core fundamentals, setting up Nextflow on their institutional clusters, translating Nextflow training material into other languages, and developing and implementing Nextflow and nf-core pipelines. Impressively, despite many mentees starting the program with very limited knowledge of Nextflow and nf-core, they completed the program with confidence and improved their abilities to develop and implement scalable and reproducible scientific workflows.\n\n![Map of mentor and mentee pairs](/img/mentorships-round2-map.png)
\n_The second round of the mentorship program was global._\n\n## Jing Lu (Mentee) & Moritz Beber (Mentor)\n\nJing joined the program with the goal of learning how to develop advanced Nextflow pipelines for disease surveillance at the Guangdong Provincial Center for Diseases Control and Prevention in China. His mentor was Moritz Beber from Denmark.\n\nTogether, Jing and Moritz developed a pipeline for the analysis of SARS-CoV-2 genomes from sewage samples. They also used GitHub and docker containers to make the pipeline more sharable and reproducible. In the future, Jing hopes to use Nextflow Tower to share the pipeline with other institutions.\n\n## Luria Leslie Founou (Mentee) & Sebastian Malkusch (Mentor)\n\nLuria's goal was to accelerate her understanding of Nextflow and apply it to her exploration of the resistome, virulome, mobilome, and phylogeny of bacteria at the Research Centre of Expertise and Biological Diagnostic of Cameroon. Luria was mentored by Sebastian Malkusch, Kolja Becker, and Alex Peltzer from the Boehringer Ingelheim Pharma GmbH & Co. KG in Germany.\n\nFor their project, Luria and her mentors developed a [pipeline](https://github.com/SMLMS/nfml) for mapping multi-dimensional feature space onto a discrete or continuous response variable by using multivariate models from the field of classical machine learning. Their pipeline will be able to handle classification, regression, and time-to-event models and can be used for model training, validation, and feature selection.\n\n## Sebastian Musundi (Mentee) & Athanasios Baltzis (Mentor)\n\nSebastian, from Mount Kenya University in Kenya, joined the mentorship program with the goal of using Nextflow pipelines to identify vaccine targets in Apicomplexan parasites. He was mentored by Athanasios Balzis from the Centre for Genomic Regulation in Spain.\n\nWith Athanasios’s help, Sebastian learned the fundamentals for developing Nextflow pipelines. During the learning process, they developed a [pipeline](https://github.com/sebymusundi/simple_RNA-seq) for customized RNA sequencing and a [pipeline](https://github.com/sebymusundi/AMR_pipeline) for predicting antimicrobial resistance genes. With his new skills, Sebastian plans to keep using Nextflow on a daily basis and start contributing to nf-core.\n\n## Juan Ugalde (Mentee) & Robert Petit (Mentor)\n\nJuan joined the mentorship program with the goal of improving his understanding of Nextflow to support microbial and viral analysis at the Universidad Andres Bello in Chile. Juan was mentored by Robert Petit from the Wyoming Public Health Laboratory in the USA. Robert is an experienced Nextflow mentor who also mentored in Round 1 of the program.\n\nJuan and Robert shared an interest in viral genomics. After learning more about the Nextflow and nf-core ecosystem, Robert mentored Juan as he developed a Nextflow viral amplicon analysis [pipeline](https://github.com/gene2dis/hantaflow). Juan will continue his Nextflow and nf-core journey by sharing his new knowledge with his group and incorporating it into his classes in the coming semester.\n\n## Bhargava Reddy Morampalli (Mentee) & Venkat Malladi (Mentor)\n\nBhargava studies at Massey University in New Zealand and joined the program with the goal of improving his understanding of Nextflow and resolving issues he was facing while developing a pipeline to analyze Nanopore direct RNA sequencing data. 
Bhargava was mentored by Venkat Malladi from Microsoft in the USA.\n\nBhargava and Venkat worked on Bhargava’s [pipeline](https://github.com/bhargava-morampalli/rnamods-nf/) to identify RNA modifications from bacteria. Their successes included advancing the pipeline and making Singularity images for the tools Bhargava was using to make it more reproducible. For Bhargava, the mentorship program was a great kickstart for learning Nextflow and his pipeline development. He hopes to continue to develop his pipeline and optimize it for cloud platforms in the future.\n\n## Odion Ikhimiukor (Mentee) & Ben Sherman (Mentor)\n\nBefore the program, Odion, who is at the University at Albany in the USA, was new to Nextflow and nf-core. He joined the program with the goal of improving his understanding and to learn how to develop pipelines for bacterial genome analysis. His mentor Ben Sherman works for Seqera Labs in the USA.\n\nDuring the program Odion and Ben developed a [pipeline](https://github.com/odionikh/nf-practice) to analyze bacterial genomes for antimicrobial resistance surveillance. They also developed configuration settings to enable the deployment of their pipeline with high and low resources. Odion has plans to share his new knowledge with others in his community.\n\n## Batool Almarzouq (Mentee) & Murray Wham (Mentor)\n\nBatool works at the King Abdullah International Medical Research Center in Saudi Arabia. Her goal for the mentorship program was to contribute to, and develop, nf-core pipelines.\nAdditionally, she aimed to develop new educational resources for nf-core that can support researchers from lowly represented groups. Her mentor was Murray Wham from the ​​University of Edinburgh in the UK.\n\nDuring the mentorship program, Murray helped Batool develop her molecular dynamics pipeline and participate in the 1st Biohackathon in MENA (KAUST). Batool and Murray also found ways to make documentation more accessible and are actively promoting Nextlfow and nf-core in Saudi Arabia.\n\n## Mariama Telly Diallo (Mentee) & Emilio Garcia (Mentor)\n\nMariama Telly joined the mentorship program with the goal of developing and implementing Nextflow pipelines for malaria research at the Medical Research Unit at The London School of Hygiene and Tropical Medicine in Gambia. She was mentored by Emilio Garcia from Platomics in Austria. Emilio is another experienced mentor who joined the program for a second time.\n\nTogether, Mariama Telly and Emilio worked on learning the basics of Nextflow, Git, and Docker. Putting these skills into practice they started to develop a Nextflow pipeline with a docker file and custom configuration. Mariama Telly greatly improved her understanding of best practices and Nextflow and intends to use her newfound knowledge for future projects.\n\n## Anabella Trigila (Mentee) & Matthias De Smet (Mentor)\n\nAnabella’s goal was to set up Nextflow on her institutional cluster at Héritas S.A. in Argentina and translate some bash pipelines into Nextflow pipelines. Anabella was mentored by Matthias De Smet from Ghent University in Belgium.\n\nAnabella and Matthias worked on developing several new nf-core modules. Extending this, they started the development of a [pipeline](https://github.com/atrigila/nf-core-saliva) to process VCFs obtained from saliva samples and a [pipeline](https://github.com/atrigila/nf-core-ancestry) to infer ancestry from VCF samples. 
Anabella has now transitioned from a user to a developer and made multiple contributions to the most recent nf-core hackathon. She also contributed to the Spanish translation of the Nextflow [training material](https://training.nextflow.io/es/).\n\n## Juliano de Oliveira Silveira (Mentee) & Maxime Garcia (Mentor)\n\nJuliano works at the Laboratório Central de Saúde Pública RS in Brazil. He joined the program with the goal of setting up Nextflow at his institution, which led him to learn to write his own pipelines. Juliano was mentored by Maxime Garcia from Seqera Labs in Sweden.\n\nJuliano and Maxime worked on learning about Nextflow and nf-core. Juliano applied his new skills to an open-source bioinformatics program that used Nextflow with a customized R script. Juliano hopes to give back to the wider community and peers in Brazil.\n\n## Patricia Agudelo-Romero (Mentee) & Abhinav Sharma (Mentor)\n\nPatricia's goal was to create, customize, and deploy nf-core pipelines at the Telethon Kids Institute in Australia. Her mentor was Abhinav Sharma from Stellenbosch University in South Africa.\n\nAbhinav helped Patricia learn how to write reproducible pipelines with Nextflow and how to work with shared code repositories on GitHub. With Abhinav's support, Patricia worked on translating a Snakemake [pipeline](https://github.com/agudeloromero/everest_nf) designed for genome virus identification and classification into Nextflow. Patricia is already applying her new skills and supporting others at her institute as they adopt Nextflow.\n\n## Mariana Guilardi (Mentee) & Alyssa Briggs (Mentor)\n\nMariana’s goal was to learn the fundamentals of Nextflow, construct and run pipelines, and help with nf-core pipeline development. Her mentor was Alyssa Briggs from the University of Texas at Dallas in the USA\n\nAt the start of the program, Alyssa helped Mariana learn the fundamentals of Nextflow. With Alyssa’s help, Mariana’s skills progressed rapidly and by the end of the program, they were running pipelines and developing new nf-core modules and the [nf-core/viralintegration](https://github.com/nf-core/viralintegration) pipeline. Mariana also made community contributions to the Portuguese translation of the Nextflow [training material](https://training.nextflow.io/pt/).\n\n## Liliane Cavalcante (Mentee) & Marcel Ribeiro-Dantas (Mentor)\n\nLiliane’s goal was to develop and apply Nextflow pipelines for genomic and epidemiological analyses at the Laboratório Central de Saúde Pública Noel Nutels in Brazil. Her mentor was Marcel Ribeiro-Dantas from Seqera Labs in Brazil.\n\nLiliane and Marcel used Nextflow and nf-core to analyze SARS-CoV-2 genomes and demographic data for public health surveillance. They used the [nf-core/viralrecon](https://nf-co.re/viralrecon) pipeline and made a new Nextflow script for additional analysis and generating graphs.\n\n## Conclusion\n\nAs with the first round of the program, the feedback about the second round of the mentorship program was overwhelmingly positive. All mentees found the experience to be highly beneficial and were grateful for the opportunity to participate.\n\n
\n “Having a mentor guide through the entire program was super cool. We worked all the way from the basics of Nextflow and learned a lot about developing and debugging pipelines. Today, I feel more confident than before in using Nextflow on a daily basis.” - Sebastian Musundi (Mentee)\n
\n\nSimilarly, the mentors also found the experience to be highly rewarding.\n\n
\n “As a mentor, I really enjoyed participating in the program. Not only did I have the chance to support and work with colleagues from lowly represented regions, but also I learned a lot and improved myself through the mentoring and teaching process.” - Athanasios Baltzis (Mentor)\n
\n\nImportantly, all program participants expressed their willingness to encourage others to be part of it in the future.\n\n
\n “The mentorship allows mentees not only to learn nf-core/Nextflow but also a lot of aspects about open-source reproducible research. With your learning, at the end of the mentorship, you could even contribute back to the nf-core community, which is fantastic! I would tell everyone who is interested in the program to go for it.” - Anabella Trigila (Mentee)\n
\n\nAs the Nextflow and nf-core communities continue to grow, the mentorship program will have long-lasting benefits beyond those that can be immediately measured. Mentees from the program have already become positive role models, contributing new perspectives to the broader community.\n\n
\n “I highly recommend this program. Independent if you are new to Nextflow or already have some experience, the possibility of working with amazing people to learn about the Nextflow ecosystem is invaluable. It helped me to improve my work, learn new things, and become confident enough to teach Nextflow to students.” - Juan Ugalde (Mentee)\n
\n\nWe were delighted with the achievements of the mentors and mentees. Applications for the third round are now open! For more information, please visit https://nf-co.re/mentorships.\n", + "images": [ + "/img/mentorships-round2-rocket.png" + ] + }, + { + "slug": "2023/czi-mentorship-round-3", + "title": "Nextflow and nf-core Mentorship, Round 3", + "date": "2023-11-13T00:00:00.000Z", + "content": "\n
\n_Nextflow and nf-core mentorship rocket._\n
\n\nWith the third round of the [Nextflow and nf-core mentorship program](https://nf-co.re/mentorships) now behind us, it's time to pop the confetti and celebrate the outstanding achievements of our latest group of mentors and mentees!\n\nAs with the [first](https://www.nextflow.io/blog/2022/czi-mentorship-round-1.html) and [second](https://www.nextflow.io/blog/2023/czi-mentorship-round-2.html) rounds of the program, we received hundreds of applications from all over the world. Mentors and mentees were matched based on compatible interests and time zones and set off to work on a project of their choosing. Pairs met regularly to work on their projects and reported back to the group to discuss their progress every month.\n\nThe mentor-mentee duos chose to tackle many interesting projects during the program. From learning how to develop pipelines with Nextflow and nf-core to setting up Nextflow on institutional clusters and translating Nextflow training materials into other languages, this cohort of mentors and mentees did it all. Despite the initial challenges, every pair emerged from the program brimming with confidence and a knack for building scalable and reproducible scientific workflows with Nextflow. Way to go, team!\n\n![Map of mentor and mentee pairs](/img/mentorship_3_map.png)
\n_Participants of the third round of the mentorship program._\n\n## Abhay Rastogi and Matthias De Smet\n\nAbhay Rastogi is a Clinical Research Fellow at the All India Institute Of Medical Sciences (AIIMS Delhi). During the program, he wanted to contribute to the [nf-core/sarek](https://github.com/nf-core/sarek/) pipeline. He was mentored by Matthias De Smet, a Bioinformatician at the Center for Medical Genetics in the Ghent University Hospital. Together they worked on developing an nf-core module for Exomiser, a variant prioritization tool for short-read WGS data that they hope to incorporate into [nf-core/sarek](https://github.com/nf-core/sarek/). Keep an eye out for this brand new feature as they continue to work towards implementing it in the [nf-core/sarek](https://github.com/nf-core/sarek/) pipeline!\n\n## Alan Möbbs and Simon Pearce\n\nAlan Möbbs, a Bioinformatics Analyst at MultiplAI, was mentored by Simon Pearce, Principal Bioinformatician at the Cancer Research UK Cancer Biomarker Centre. During the program, Alan wanted to create a custom pipeline that merges functionalities from the [nf-core/rnaseq](https://github.com/nf-core/rnaseq/) and [nf-core/rnavar](https://github.com/nf-core/rnavar/) pipelines. They started their project by forking the [nf-core/rnaseq](https://github.com/nf-core/rnaseq/) pipeline and adding a subworkflow with variant calling functionalities. As the project moved on, they were able to remove tools from the pipeline that were no longer required. Finally, they created some custom definitions for processing samples and work queues to optimize the workflow on AWS. Alan plans to keep working on this project in the future.\n\n## Cen Liau and Chris Hakkaart\n\nCen Liau is a scientist at the Bragato Research Institute in New Zealand, analyzing the epigenetics of grapevines in response to environmental stress. Her mentor was Chris Hakkaart, a Developer Advocate at Seqera. They started the program by deploying the [nf-core/methylseq](https://github.com/nf-core/methylseq/) pipeline on New Zealand’s national infrastructure to analyze data Cen had produced. Afterward, they started to develop a proof of concept methylation pipeline to analyze additional data Cen has produced. Along the way, they learned about nf-core best practices and how to use GitHub to build pipelines collaboratively.\n\n## Chenyu Jin and Ben Sherman\n\nChenyu Jin is a Ph.D. student at the Center for Palaeogenetics of the Swedish Museum of Natural History. She worked with Ben Sherman, a Software Engineer at Seqera. Together they worked towards establishing a workflow for recursive step-down classification using experimental Nextflow features. During the program, they made huge progress in developing a cutting-edge pipeline that can be used for analyzing ancient environmental DNA and reconstructing flora and fauna. Watch this space for future developments!\n\n## Georgie Samaha and Cristina Tuñí i Domínguez\n\nGeorgie Samaha, a bioinformatician from the University of Sydney, was mentored by Cristina Tuñí i Domínguez, a Bioinformatics Scientist at Flomics Biotech SL. During the program, they developed Nextflow configuration files. As a part of this, they built institutional configuration files for multiple national research HPC and cloud infrastructures in Australia.
Towards the end of the mentorship, they [developed a tool for building configuration files](https://github.com/georgiesamaha/configBuilder-nf) that they hope to share widely in the future.\n\n## Ícaro Maia Santos de Castro and Robert Petit\n\nÍcaro Maia Santos is a Ph.D. Candidate at the University of São Paulo. He was mentored by Robert, a Research Scientist from the Wyoming Public Health Lab. After learning the basics of Nextflow and nf-core, they worked on a [metatranscriptomics pipeline](https://github.com/icaromsc/nf-core-phiflow) that simultaneously characterizes microbial composition and host gene expression from RNA sequencing samples. As a part of this process, they used nf-core modules that were already available and developed and contributed new modules to the nf-core repository. Ícaro found that having someone to help him learn and overcome issues as he was developing his pipeline was invaluable for his career.\n\n![phiflow metro map](/img/phiflow_metro_map.png)
\n_Metro map of the phiflow workflow._\n\n## Lila Maciel Rodríguez Pérez and Priyanka Surana\n\nLila Maciel Rodríguez Pérez, from the National Agrarian University in Peru, was mentored by Priyanka Surana, a researcher from the Wellcome Sanger Institute in the UK. Lila and Priyanka focused on building and deploying Nextflow scripts for metagenomic assemblies. In particular, they were interested in the identification of Antibiotic-Resistant Genes (ARG), Metal-Resistant Genes (MRG), and Mobile Genetic Elements (MGE) in different environments, and in figuring out how these genes are correlated. Both Lila and Priyanka spoke highly of each other and how much they enjoyed being a part of the program.\n\n## Luisa Sacristan and Gisela Gabernet\n\nLuisa is an M.Sc. student studying computational biology in the Computational Biology and Microbial Ecology group at Universidad de los Andes in Colombia. She was mentored by Gisela Gabernet, a researcher at Yale Medical School. At the start of the program, Luisa and Gisela focused on learning more about GitHub. They quickly moved on to developing an nf-core configuration file for Luisa’s local university cluster. Finally, they started developing a pipeline for the analysis of custom ONT metagenomic amplicons from coffee beans.\n\n## Natalia Coutouné and Marcel Ribeiro-Dantas\n\nNatalia Coutouné is a Ph.D. Candidate at the University of Campinas in Brazil. Her mentor was Marcel Ribeiro-Dantas from Seqera. Natalia and Marcel worked on developing a pipeline to identify relevant QTL among two or more pool-seq samples. Learning the little things, such as how and where to get help, was a valuable part of the learning process for Natalia. She also found it especially useful to consolidate a “Frankenstein” pipeline she had been using into a cohesive Nextflow pipeline that she could share with others.\n\n## Raquel Manzano and Maxime Garcia\n\nRaquel Manzano is a bioinformatician and Ph.D. candidate at the University of Cambridge, Cancer Research UK Cambridge Institute. She was mentored by Maxime Garcia, a bioinformatics engineer at Seqera. During the program, they spent their time developing the [nf-core/rnadnavar](https://github.com/nf-core/rnadnavar/) pipeline. Initially designed for cancer research, this pipeline identifies a consensus call set from RNA and DNA somatic variant calling tools. Both Raquel and Maxime found the program to be highly rewarding. Raquel’s [presentation](https://www.youtube.com/watch?v=PzGOvqSI5n0) about the rnadnavar pipeline and her experience as a mentee from the 2023 Nextflow Summit in Barcelona is now online.\n\n## Conclusion\n\nWe are thrilled to report that the feedback from both mentors and mentees has been overwhelmingly positive. Every participant, whether mentor or mentee, found the experience extremely valuable and expressed gratitude for the chance to participate.\n\n
\n “I loved the experience and the opportunity to develop my autonomy in nextflow/nf-core. This community is totally amazing!” - Icaro Castro\n
\n\n
\n “I think this was a great opportunity to learn about a tool that can make our day-to-day easier and reproducible. Who knows, maybe it can give you a better chance when applying for jobs.” - Alan Möbbs\n
\n\nThank you to the Chan Zuckerberg Initiative Diversity and Inclusion grant, Seqera, and our fantastic community, who made it possible to run all three rounds of the Nextflow and nf-core mentorship program.\n", + "images": [ + "/img/mentorship_3_sticker.png" + ] + }, + { + "slug": "2023/geraldine-van-der-auwera-joins-seqera", + "title": "Geraldine Van der Auwera joins Seqera", + "date": "2023-10-11T00:00:00.000Z", + "content": "\n\n\nI’m excited to announce that I’m joining Seqera as Lead Developer Advocate. My mission is to support the growth of the Nextflow user community, especially in the USA, which will involve running community events, conducting training sessions, managing communications and working globally with our partners across the field to ensure Nextflow users have what they need to be successful. I’ll be working remotely from Boston, in collaboration with Paolo, Phil and the rest of the Nextflow team.\n\nSome of you may already know me from my previous job at the Broad Institute, where I spent a solid decade doing outreach and providing support for the genomics research community, first for GATK, then for WDL and Cromwell, and eventually Terra. A smaller subset might have come across the O’Reilly book I co-authored, [Genomics in the Cloud](https://www.oreilly.com/library/view/genomics-in-the/9781491975183/).\n\nThis new mission is very much a continuation of my dedication to helping the research community use cutting-edge software tools effectively.\n\n## From bacterial cultures to large-scale genomics\n\nTo give you a brief sense of where I’m coming from, I originally trained as a wetlab microbiologist in my homeland of Belgium, so it’s fair to say I’ve come a long way, quite literally. I never took a computing class, but taught myself Python during my PhD to analyze bacterial plasmid sequencing data (72 kb of Sanger sequence!) and sort of fell in love with bioinformatics in the process. Later, I got the opportunity to deepen my bioinformatics skills during my postdoc at Harvard Medical School, although my overall research project was still very focused on wetlab work.\n\nToward the end of my postdoc, I realized I had become more interested in the software side of things, though I didn’t have any formal qualifications. Fortunately, I was able to take a big leap sideways and found a new home at the Broad Institute, where I was hired as a Bioinformatics Scientist to build out the GATK community, at a time when it was still a bit niche. (It’s a long story that I don’t have time for today, but I’m always happy to tell it over drinks at a conference reception…)\n\nThe GATK job involved providing technical and scientific support to researchers, developing documentation, and teaching workshops about genomics and variant calling specifically. Which is hilarious because at the time I was hired, I had no clue what variant calling even meant! I think I was easily a month or two into the job before that part actually started making a little bit of sense. I still remember the stress and confusion of trying to figure all that out, and it’s something I always carry with me when I think about how to help newcomers to the ecosystem. I can safely say, whatever aspect of this highly multidisciplinary field is causing you trouble, I’ve struggled with it myself at some point.\n\nAnyway, I can’t fully summarize a decade in a couple of paragraphs, but suffice to say, I learned an enormous amount on the job.
And in the process, I developed a passion for helping researchers take maximum advantage of the powerful bioinformatics at their disposal. Which inevitably involves workflows.\n\n## Going with the flow\n\nOver time my responsibilities at the Broad grew into supporting not just GATK, but also the workflow systems people use to run tools like GATK at scale, both on premises and increasingly, on public cloud platforms. My own pipelining experience has been focused on WDL and Cromwell, but I’ve dabbled with most of the mainstream tools in the space.\n\nIf I had a dollar for every time I’ve been asked the question “What’s the best workflow language?” I’d still need a full-time job, but I could maybe take a nice holiday somewhere warm. Oh, and my answer is: whatever gets the work done, plays nice with the systems you’re tied to, and connects you to a community.\n\nThat’s one of the reasons I’ve been watching the growth of Nextflow’s popularity with great interest for the last few years. The amount of community engagement that we’ve seen around Nextflow, and especially around the development of nf-core, has been really impressive.\n\nSo I’m especially thrilled to be joining the Seqera team the week of the [Nextflow Summit](https://summit.nextflow.io/) in Barcelona, because it means I’ll get to meet a lot of people from the community in person during my very first few days on the job. I’m also very much looking forward to participating in the hackathon, which should be a great way for me to get started doing real work with Nextflow.\n\nI’m hoping to see many of you there!\n", + "images": [ + "/img/geraldine-van-der-auwera.jpg" + ] + }, + { + "slug": "2023/introducing-nextflow-ambassador-program", + "title": "Introducing the Nextflow Ambassador Program", + "date": "2023-10-18T00:00:00.000Z", + "content": "\n\n\nWe are excited to announce the launch of the Nextflow Ambassador Program, a worldwide initiative designed to foster collaboration, knowledge sharing, and community growth. It is intended to recognize and support the efforts of our community leaders and marks another step forward in our mission to advance scientific research and empower researchers.\n\nNextflow ambassadors will play a vital role in:\n\n- Sharing Knowledge: Ambassadors provide valuable insights and best practices to help users make the most of Nextflow by writing training material and blog posts, giving seminars and workshops, organizing hackathons and meet-ups, and helping with community support.\n- Fostering Collaboration: As knowledgeable members of our community, ambassadors facilitate connections among users and developers, enabling collaboration on community projects, such as nf-core pipelines, sub-workflows, and modules, among other things, in the Nextflow ecosystem.\n- Community Growth: Ambassadors help expand and enrich the Nextflow community, making it more vibrant and supportive. They are local contacts for new community members and engage with potential users in their region and fields of expertise.\n\nAs community members who already actively contribute to outreach, ambassadors will be supported to extend the work they're already doing. For example, many of our ambassadors run local Nextflow training events – to help with this, the program will include “train the trainer” sessions and give access to our content library with slide decks, templates, and more. Ambassadors can also request stickers and financial support for events they organize (e.g., for pizza). 
Seqera is opening an exclusive travel fund that ambassadors can apply to for help covering travel costs for events where they will present relevant work. Social media content written by ambassadors will be amplified by the Nextflow and nf-core accounts, increasing their reach. Ambassadors will get \"behind the scenes\" access, with insights into running an open-source community, early access to new features, and a great networking experience. The ambassador network will enable members to be kept up-to-date with events and opportunities happening all over the world. To recognize their efforts, ambassadors will receive exclusive swag and apparel, a certificate for their work, and a profile on the ambassador page of our website.\n\n## Meet Our Ambassadors\n\nYou can visit our [Nextflow ambassadors page](https://www.nextflow.io/our_ambassadors.html) to learn more about our first group of ambassadors. You will find their profiles there, highlighting their interests, expertise, and insights they bring to the Nextflow ecosystem.\n\nYou can see snippets about some of our ambassadors below:\n\n#### Priyanka Surana\n\nPriyanka Surana is a Principal Bioinformatician at the Wellcome Sanger Institute, where she oversees the Nextflow development for the Tree of Life program. Over the last almost two years, they have released nine pipelines with nf-core standards and have three more in development. You can learn more about them [here](https://pipelines.tol.sanger.ac.uk/pipelines).\n\nShe’s one of our ambassadors in the UK 🇬🇧 and has already done fantastic outreach work, organizing seminars and bringing many new users to our community! 🤩 In the March Hackathon, she organized a local site with over 70 individuals participating in person, plus over five other events in 2023. The Nextflow community on the Wellcome Genome Campus started in March 2023 with the nf-core hackathon, and now it has grown to over 150 members across 11 different organizations across Cambridge. Currently, they are planning a day-long Nextflow Symposium in December 🤯. They do seminars, workshops, coffee meetups, and trainings. In our previous round of the Nextflow and nf-core mentorship, Priyanka mentored Lila, a graduate student in Peru, to build her first Nextflow pipeline using nf-core tools to analyze bacterial metagenomics data. This is the power of a Nextflow ambassador! Not only growing a local community but helping people all over the world to get the best out of Nextflow and nf-core 🥰.\n\n#### Abhinav Sharma\n\nAbhinav is a PhD candidate at Stellenbosch University, South Africa. As a Nextflow Ambassador, Abhinav has been tremendously active in the Global South, supporting young scientists in Africa 🇿🇦🇿🇲, Brazil 🇧🇷, India 🇮🇳 and Australia 🇦🇺, leading to the growth of local communities. He has contributed to the [Nextflow training in Hindi](https://www.youtube.com/playlist?list=PL3xpfTVZLcNikun1FrSvtXW8ic32TciTJ) and played a key role in integrating African bioinformaticians in the Nextflow and nf-core community and initiatives, showcased by the high participation of individuals in African countries who benefited from mentorship during nf-core Hackathons, Training events and prominent workshops like [VEME, 2023](https://twitter.com/abhi18av/status/1695863348162675042).
In Australia, Abhinav continues to collaborate with Patricia, a research scientist from Telethon Kids Institute, Perth (whom he mentored during the nf-core mentorship round 2), to organize monthly seminars on [BioWiki](https://github.com/TelethonKids/Nextflow-BioWiki) and a bootcamp for local capacity building. In addition, he engages in regular capacity-building sessions in Brazilian institutes such as [Instituto Evandro Chagas](https://www.gov.br/iec/pt-br/assuntos/noticias/curso-contribui-para-criacao-da-rede-norte-nordeste-de-vigilancia-genomica-para-tuberculose-no-iec) (Belém, Brazil) and INI, FIOCRUZ (Rio de Janeiro, Brazil). Last but not least, Abhinav has contributed to the Nextflow community and project in several ways, even to the extent of contributing to the Nextflow code base and plugin ecosystem! 😎\n\n#### Robert Petit\n\nRobert Petit is the Senior Bioinformatics Scientist at the [Wyoming Public Health Laboratory](https://health.wyo.gov/publichealth/lab/) 🦬 and a long-time contributor to the Nextflow community! 🥳 Being a Nextflow Ambassador, Robert has made extensive efforts to grow the Nextflow and nf-core communities, both locally and internationally. Through his work on [Bactopia](https://bactopia.github.io/), a popular and extensive Nextflow pipeline for the analysis of bacterial genomes, Robert has been able to [contribute to nf-core regularly](https://bactopia.github.io/v3.0.0/impact-and-outreach/enhancements/#enhancements-and-fixes). As a Bioconda Core team member, he is always lending a hand when called upon by the Nextflow community, whether it is to add a new recipe or approve a pull request! ⚒️ He has also delivered multiple trainings to the local community in Wyoming, US 🇺🇸, and workshops at conferences, including ASM Microbe. Robert's dedication as a Nextflow Ambassador is best highlighted, and he'll agree, by his active role as a mentor. Robert has acted as a mentor multiple times during virtual nf-core hackathons, and he is the only person to be a mentor in all three rounds of the Nextflow and nf-core mentorship program 😍!\n\nThe Nextflow Ambassador Program is a testament to the power of community-driven innovation, and we invite you to join us in celebrating this exceptional group. In the coming weeks and months, you will hear more from our ambassadors as they continue to share their experiences, insights, and expertise with the community as freshly minted Nextflow ambassadors.\n", + "images": [ + "/img/ambassadors-hackathon.jpeg" + ] + }, + { + "slug": "2023/learn-nextflow-in-2023", + "title": "Learn Nextflow in 2023", + "date": "2023-02-24T00:00:00.000Z", + "content": "\nIn 2023, the world of Nextflow is more exciting than ever! With new resources constantly being released, there is no better time to dive into this powerful tool. From a new [Software Carpentries’](https://carpentries-incubator.github.io/workflows-nextflow/index.html) course to [recordings of multiple nf-core training events](https://nf-co.re/events/training/) to [new tutorials on Wave and Fusion](https://github.com/seqeralabs/wave-showcase), the options for learning Nextflow are endless.\n\nWe've compiled a list of the best resources in 2023 to make your journey to Nextflow mastery as seamless as possible. And remember, Nextflow is a community-driven project.
If you have suggestions or want to contribute to this list, head to the [GitHub page](https://github.com/nextflow-io/) and make a pull request.\n\n## Before you start\n\nBefore learning Nextflow, you should be comfortable with the Linux command line and be familiar with some basic scripting languages, such as Perl or Python. The beauty of Nextflow is that task logic can be written in your language of choice. You will just need to learn Nextflow’s domain-specific language (DSL) to control overall flow.\n\nNextflow is widely used in bioinformatics, so many tutorials focus on life sciences. However, Nextflow can be used for almost any data-intensive workflow, including image analysis, ML model training, astronomy, and geoscience applications.\n\nSo, let's get started! These resources will guide you from beginner to expert and make you unstoppable in the field of scientific workflows.\n\n## Contents\n\n- [Why Learn Nextflow](#why-learn-nextflow)\n- [Meet the Tutorials!](#meet-the-tutorials)\n 1. [Basic Nextflow Community Training](#introduction-to-nextflow-by-community)\n 2. [Hands-on Nextflow Community Training](#nextflow-hands-on-by-community)\n 3. [Advanced Nextflow Community Training](#advanced-nextflow-by-community)\n 4. [Software Carpentry workshop](#software-carpentry-workshop)\n 5. [An introduction to Nextflow course by Uppsala University](#intro-nexflow-by-uppsala)\n 6. [Introduction to Nextflow workshop by VIB](#intro-nextflow-by-vib)\n 7. [Nextflow Training by Curtin Institute of Radio Astronomy (CIRA)](#nextflow-training-cira)\n 8. [Managing Pipelines in the Cloud - GenomeWeb Webinar](#managing-pipelines-in-the-cloud-genomeweb-webinar)\n 9. [Nextflow implementation patterns](#nextflow-implementation-patterns)\n 10. [nf-core tutorials](#nf-core-tutorials)\n 11. [Awesome Nextflow](#awesome-nextflow)\n 12. [Wave showcase: Wave and Fusion tutorials](#wave-showcase-wave-and-fusion-tutorials)\n 13. [Building Containers for Scientific Workflows](#building-containers-for-scientific-workflows)\n 14. [Best Practices for Deploying Pipelines with Nextflow Tower](#best-practices-for-deploying-pipelines-with-nextflow-tower)\n- [Cloud integration tutorials](#cloud-integration-tutorials)\n 1. [Nextflow and AWS Batch Inside the Integration](#nextflow-and-aws-batch-inside-the-integration)\n 2. [Nextflow and Azure Batch Inside the Integration](#nextflow-and-azure-batch-inside-the-integration)\n 3. [Get started with Nextflow on Google Cloud Batch](#get-started-with-nextflow-on-google-cloud-batch)\n 4. [Nextflow and K8s Rebooted: Running Nextflow on Amazon EKS](#nextflow-and-k8s-rebooted-running-nextflow-on-amazon-eks)\n- [Additional resources](#additional-resources)\n 1. [Nextflow docs](#nextflow-docs)\n 2. [Seqera Labs docs](#seqera-labs-docs)\n 3. [nf-core](#nf-core)\n 4. [Nextflow Tower](#nextflow-tower)\n 5. [Nextflow on AWS](#nextflow-on-aws)\n 6. [Nextflow Data pipelines on Azure Batch](#nextflow-data-pipelines-on-azure-batch)\n 7. [Running Nextflow with Google Life Sciences](#running-nextflow-with-google-life-sciences)\n 8. [Bonus: Nextflow Tutorial - Variant Calling Edition](#bonus-nextflow-tutorial-variant-calling-edition)\n- [Community and support](#community-and-support)\n\n

## Why Learn Nextflow
\n\nThere are hundreds of workflow managers to choose from. In fact, Meir Wahnon and several of his colleagues have gone to the trouble of compiling an awesome-workflow-engines list. The workflows community initiative is another excellent source of information about workflow engines.\n\n- Using Nextflow in your analysis workflows helps you implement reproducible pipelines. Nextflow pipelines follow [FAIR guidelines](https://www.go-fair.org/fair-principles/) (findability, accessibility, interoperability, and reuse). Nextflow also supports version control and containers to manage all software dependencies.\n- Nextflow is portable; the same pipeline written on a laptop can quickly scale to run on an HPC cluster, Amazon AWS, Microsoft Azure, Google Cloud Platform, or Kubernetes. With features like [configuration profiles](https://nextflow.io/docs/latest/config.html?#config-profiles), code can be written so that it is 100% portable across different on-prem and cloud infrastructures enabling collaboration and avoiding lock-in.\n- It is massively **scalable**, allowing the parallelization of tasks using the dataflow paradigm without hard-coding pipelines to specific platforms, workload managers, or batch services.\n- Nextflow is **flexible**, supporting scientific workflow requirements like caching processes to avoid redundant computation and workflow reporting to help understand and diagnose workflow execution patterns.\n- It is **growing fast**, and **support is available** from [Seqera Labs](https://seqera.io). The project has been active since 2013 with a vibrant developer community, and the Nextflow ecosystem continues to expand rapidly.\n- Finally, Nextflow is open source and licensed under Apache 2.0. You are free to use it, modify it, and distribute it.\n\n

## Meet the Tutorials!
\n\nSome of the best publicly available tutorials are listed below:\n\n

### 1. Basic Nextflow Community Training
\n\nBasic training for all things Nextflow. Perfect for anyone looking to get to grips with using Nextflow to run analyses and build workflows. This is the primary Nextflow training material used in most Nextflow and nf-core training events. It covers a large number of topics, with both theoretical and hands-on chapters.\n\n[Basic Nextflow Community Training](https://training.nextflow.io/basic_training/)\n\nWe run a free online training event for this course approximately every six months. Videos are streamed to YouTube and questions are handled in the nf-core Slack community. You can watch the recording of the most recent training ([September, 2023](https://nf-co.re/events/2023/training-basic-2023)) in the [YouTube playlist](https://youtu.be/ERbTqLtAkps?si=6xDoDXsb6kGQ_Qa8) below:\n\n
\n \n
\n\n

### 2. Hands-on Nextflow Community Training
\n\nA \"learn by doing\" tutorial with less focus on theory, instead leading through exercises of slowly increasing complexity. This course is quite short and hands-on, great if you want to practice your Nextflow skills.\n\n[Hands-on Nextflow Community Training](https://training.nextflow.io/hands_on/)\n\nYou can watch the recording of the most recent training ([September, 2023](https://nf-co.re/events/2023/training-hands-on-2023/)) below:\n\n
\n \n
\n\n

### 3. Advanced Nextflow Community Training
\n\nMaterial exploring the advanced features of the Nextflow language and runtime, and how to use them to write efficient and scalable data-intensive workflows. This is the Nextflow training material used in advanced training events.\n\n[Advanced Nextflow Community Training](https://training.nextflow.io/advanced/)\n\nYou can watch the recording of the most recent training ([September, 2023](https://nf-co.re/events/2023/training-sept-2023/)) below:\n\n
\n \n
\n\n

### 4. Software Carpentry workshop
\n\nThe [Nextflow Software Carpentry](https://carpentries-incubator.github.io/workflows-nextflow/index.html) workshop (still being developed) explains the use of Nextflow and [nf-core](https://nf-co.re/) as development tools for building and sharing reproducible data science workflows. The intended audience is those with little programming experience. The course provides a foundation to write and run Nextflow and nf-core workflows comfortably. Adapted from the Seqera training material above, the workshop has been updated by Software Carpentries instructors within the nf-core community to fit The Carpentries training style. [The Carpentries](https://carpentries.org/) emphasize feedback to improve teaching materials, so we would like to hear back from you about what you thought was well-explained and what needs improvement. Pull requests to the course material are very welcome.\n\nThe workshop can be opened on [Gitpod](https://gitpod.io/#https://github.com/carpentries-incubator/workflows-nextflow) where you can try the exercises in an online computing environment at your own pace while referencing the course material in another window alongside the tutorials.\n\nYou can find the course in [The Carpentries incubator](https://carpentries-incubator.github.io/workflows-nextflow/index.html).\n\n

### 5. An introduction to Nextflow course from Uppsala University
\n\nThis 5-module course by Uppsala University covers the basics of Nextflow, from running Nextflow pipelines to writing your own pipelines, and even using containers and conda.\n\nThe course can be viewed [here](https://uppsala.instructure.com/courses/51980/pages/nextflow-1-introduction?module_item_id=328997).\n\n

### 6. Introduction to Nextflow workshop by VIB
\n\nWorkshop materials by VIB, written (mainly) in DSL2, that aim to get you familiar with the Nextflow syntax by explaining basic concepts and building a simple RNAseq pipeline. It also highlights reproducibility aspects by adding containers (Docker & Singularity).\n\nThe course can be viewed [here](https://vibbits-nextflow-workshop.readthedocs.io/en/latest/).\n\n

### 7. Nextflow Training by Curtin Institute of Radio Astronomy (CIRA)
\n\nThis training was prepared for physicists and has examples applied to astronomy which may be interesting for Nextflow users coming from this background!\n\nThe course can be viewed [here](https://carpentries-incubator.github.io/Pipeline_Training_with_Nextflow/).\n\n

### 8. Managing Pipelines in the Cloud - GenomeWeb Webinar
\n\nThis on-demand webinar features Phil Ewels from SciLifeLab, nf-core (now also Seqera Labs), Brendan Boufler from Amazon Web Services, and Evan Floden from Seqera Labs. The wide-ranging discussion covers the significance of scientific workflows, examples of Nextflow in production settings, and how Nextflow can be integrated with other processes.\n\n[Watch the webinar](https://seqera.io/events/managing-bioinformatics-pipelines-in-the-cloud-to-do-more-science/)\n\n

### 9. Nextflow implementation patterns
\n\nThis advanced documentation discusses recurring patterns in Nextflow and solutions to many common implementation requirements. Code examples are available with notes to follow along and a GitHub repository.\n\n[Nextflow Patterns](http://nextflow-io.github.io/patterns/index.html) & [GitHub repository](https://github.com/nextflow-io/patterns).\n\n

### 10. nf-core tutorials
\n\nA set of tutorials covering the basics of using and creating nf-core pipelines developed by the team at [nf-core](https://nf-co.re/). These tutorials provide an overview of the nf-core framework, including:\n\n- How to run nf-core pipelines\n- What are the most commonly used nf-core tools\n- How to make new pipelines using the nf-core template\n- What are nf-core shared modules\n- How to add nf-core shared modules to a pipeline\n- How to make new nf-core modules using the nf-core module template\n- How nf-core pipelines are reviewed and ultimately released\n\n[nf-core usage tutorials](https://nf-co.re/docs/usage/tutorials) and [nf-core developer tutorials](https://nf-co.re/docs/contributing/tutorials).\n\n

### 11. Awesome Nextflow
\n\nA collection of awesome Nextflow pipelines compiled by various contributors to the open-source Nextflow project.\n\n[Awesome Nextflow](https://github.com/nextflow-io/awesome-nextflow) on GitHub\n\n

### 12. Wave showcase: Wave and Fusion tutorials
\n\nWave and the Fusion file system are new Nextflow capabilities introduced in November 2022. Wave is a container provisioning and augmentation service fully integrated with the Nextflow ecosystem. Instead of viewing containers as separate artifacts that need to be integrated into a pipeline, Wave allows developers to manage containers as part of the pipeline itself.\n\nTightly coupled with Wave is the new Fusion 2.0 file system. Fusion implements a virtual distributed file system and presents a thin client, allowing data hosted in AWS S3 buckets (and other object stores in the future) to be accessed via the standard POSIX filesystem interface expected by most applications.\n\nWave can help simplify development, improve reliability, and make pipelines easier to maintain. It can even improve pipeline performance. The optional Fusion 2.0 file system offers further advantages, delivering performance on par with FSx for Lustre while enabling organizations to reduce their cloud computing bill and improve pipeline efficiency and throughput. See the [blog article](https://seqera.io/blog/breakthrough-performance-and-cost-efficiency-with-the-new-fusion-file-system/) released in February 2023 explaining the Fusion file system and providing benchmarks comparing Fusion to other data handling approaches in the cloud.\n\n[Wave showcase](https://github.com/seqeralabs/wave-showcase) on GitHub\n\n
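To make this more concrete, below is a minimal, hypothetical sketch of the kind of nextflow.config settings used to switch on Wave and Fusion for a cloud run; the bucket name is a placeholder and not something taken from the article above:\n\n```groovy\n// Hypothetical nextflow.config sketch: enable Wave container provisioning\n// and the Fusion file system. The bucket name is a placeholder.\nwave {\n    enabled = true              // let Wave provision and augment containers for each process\n}\nfusion {\n    enabled = true              // access object storage through the Fusion virtual file system\n}\ndocker.enabled = true           // a container runtime is still required to run the tasks\nworkDir = 's3://my-bucket/work' // placeholder S3 bucket used as the pipeline work directory\n```\n\nThe Wave showcase repository linked above contains fuller examples.\n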

### 13. Building Containers for Scientific Workflows
\n\nWhile not strictly a guide about Nextflow, this article provides an overview of scientific containers and a tutorial on creating your own container and integrating it into a Nextflow pipeline. It also offers some useful tips on troubleshooting containers and publishing them to registries.\n\n[Building Containers for Scientific Workflows](https://seqera.io/blog/building-containers-for-scientific-workflows/)\n\n

### 14. Best Practices for Deploying Pipelines with Nextflow Tower
\n\nWhen building Nextflow pipelines, a best practice is to supply a nextflow_schema.json file describing pipeline input parameters. The benefit of adding this file to your code repository is that if the pipeline is launched using Nextflow Tower, the schema enables an easy-to-use web interface that guides users through the process of parameter selection. While it is possible to craft this file by hand, the nf-core community provides a handy schema build tool. This step-by-step guide explains how to adapt your pipeline for use with Nextflow Tower by using the schema build tool to automatically generate the nextflow_schema.json file.\n\n[Best Practices for Deploying Pipelines with Nextflow Tower](https://seqera.io/blog/best-practices-for-deploying-pipelines-with-nextflow-tower/)\n\n

## Cloud integration tutorials
\n\nIn addition to the learning resources above, several step-by-step integration guides explain how to run Nextflow pipelines on your cloud platform of choice. Some of these tutorials extend to the use of [Nextflow Tower](https://cloud.tower.nf/). Organizations can use the Tower Cloud Free edition to launch pipelines quickly in the cloud. Organizations can optionally use Tower Cloud Professional or run self-hosted or on-premises Tower Enterprise environments as requirements grow. This year, we added Google Cloud Batch to the cloud services supported by Nextflow.\n\n

### 1. Nextflow and AWS Batch — Inside the Integration
\n\nThis three-part series of articles provides a step-by-step guide explaining how to use Nextflow with AWS Batch. The [first of three articles](https://seqera.io/blog/nextflow-and-aws-batch-inside-the-integration-part-1-of-3/) covers AWS Batch concepts, the Nextflow execution model, and explains how the integration works under the covers. The [second article](https://seqera.io/blog/nextflow-and-aws-batch-inside-the-integration-part-2-of-3/) in the series provides a step-by-step guide explaining how to set up the AWS batch environment and how to run and troubleshoot open-source Nextflow pipelines. The [third article](https://seqera.io/blog/nextflow-and-aws-batch-using-tower-part-3-of-3/) builds on what you've learned, explaining how to integrate workflows with Nextflow Tower and share the AWS Batch environment with other users by \"publishing\" your workflows to the cloud.\n\nNextflow and AWS Batch — Inside the Integration ([part 1 of 3](https://seqera.io/blog/nextflow-and-aws-batch-inside-the-integration-part-1-of-3/), [part 2 of 3](https://seqera.io/blog/nextflow-and-aws-batch-inside-the-integration-part-2-of-3/), [part 3 of 3](https://seqera.io/blog/nextflow-and-aws-batch-using-tower-part-3-of-3/))\n\n
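As a rough, hypothetical illustration of where the series ends up, an AWS Batch setup reduces to a few nextflow.config settings along these lines; the queue, region, and bucket names are placeholders for the resources created in the guide:\n\n```groovy\n// Hypothetical nextflow.config sketch for the AWS Batch executor.\n// Queue, region, and bucket names are placeholders.\nprocess.executor = 'awsbatch'            // run each process as an AWS Batch job\nprocess.queue    = 'my-batch-queue'      // AWS Batch job queue to submit to\naws.region       = 'eu-west-1'           // region hosting the Batch compute environment\nworkDir          = 's3://my-bucket/work' // S3 bucket shared by all tasks as the work directory\n```\n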

### 2. Nextflow and Azure Batch — Inside the Integration
\n\nSimilar to the tutorial above, this set of articles does a deep dive into the Nextflow Azure Batch integration. [Part 1](https://seqera.io/blog/nextflow-and-azure-batch-part-1-of-2/) covers Azure Batch and essential concepts, provides an overview of the integration, and explains how to set up Azure Batch and Storage accounts. It also covers deploying a machine instance in the Azure cloud and configuring it to run Nextflow pipelines against the Azure Batch service.\n\n[Part 2](https://seqera.io/blog/nextflow-and-azure-batch-working-with-tower-part-2-of-2/) builds on what you learned in part 1 and shows how to use Azure Batch from within Nextflow Tower Cloud. It provides a walkthrough of how to make the environment set up in part 1 accessible to users through Tower's intuitive web interface.\n\nNextflow and Azure Batch — Inside the Integration ([part 1 of 2](https://seqera.io/blog/nextflow-and-azure-batch-part-1-of-2/), [part 2 of 2](https://seqera.io/blog/nextflow-and-azure-batch-working-with-tower-part-2-of-2/))\n\n
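For orientation, a minimal, hypothetical nextflow.config for Azure Batch looks roughly like the sketch below; the account names, keys, location, and blob container are placeholders, and the articles explain how to obtain the real values:\n\n```groovy\n// Hypothetical nextflow.config sketch for the Azure Batch executor.\n// Account names, keys, location, and container are placeholders.\nprocess.executor = 'azurebatch'\nworkDir = 'az://my-container/work'       // blob container used as the work directory\nazure {\n    storage {\n        accountName = '<storage-account-name>'\n        accountKey  = '<storage-account-key>'\n    }\n    batch {\n        location     = 'westeurope'\n        accountName  = '<batch-account-name>'\n        accountKey   = '<batch-account-key>'\n        autoPoolMode = true              // let Nextflow create the compute pool automatically\n    }\n}\n```\n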

### 3. Get started with Nextflow on Google Cloud Batch
\n\nThis excellent article by Marcel Ribeiro-Dantas provides a step-by-step tutorial on using Nextflow with Google’s new Google Cloud Batch service. Google Cloud Batch is expected to replace the Google Life Sciences integration over time. The article explains how to deploy the Google Cloud Batch and Storage environments in GCP using the gcloud CLI. It then goes on to explain how to configure Nextflow to launch pipelines into the newly created Google Cloud Batch environment.\n\n[Get started with Nextflow on Google Cloud Batch](https://nextflow.io/blog/2023/nextflow-with-gbatch.html)\n\n
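As a quick, hypothetical sketch of where the tutorial lands, pointing Nextflow at Google Cloud Batch comes down to a handful of nextflow.config settings; the project ID, location, and bucket are placeholders:\n\n```groovy\n// Hypothetical nextflow.config sketch for the Google Cloud Batch executor.\n// Project ID, location, and bucket are placeholders.\nprocess.executor = 'google-batch'        // submit each process as a Google Cloud Batch task\ngoogle.project   = 'my-gcp-project'      // GCP project that hosts the Batch jobs\ngoogle.location  = 'europe-west2'        // region where jobs are created\nworkDir          = 'gs://my-bucket/work' // Cloud Storage bucket used as the work directory\n```\n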

### 4. Nextflow and K8s Rebooted: Running Nextflow on Amazon EKS
\n\nWhile not commonly used for HPC workloads, Kubernetes has clear momentum. In this educational article, Ben Sherman provides an overview of how the Nextflow / Kubernetes integration has been simplified by avoiding the requirement for Persistent Volumes (PVs) and Persistent Volume Claims (PVCs). This detailed guide provides step-by-step instructions for using Amazon EKS as a compute environment, including how to configure IAM Roles for Service Accounts (IRSA), now an Amazon EKS best practice.\n\n[Nextflow and K8s Rebooted: Running Nextflow on Amazon EKS](https://seqera.io/blog/deploying-nextflow-on-amazon-eks/)\n\n
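To give a feel for the end result, a hypothetical nextflow.config for this kind of Kubernetes run might look like the sketch below; the namespace, service account, and bucket are placeholders, and the article itself covers the EKS cluster and IRSA setup in detail:\n\n```groovy\n// Hypothetical nextflow.config sketch for the Kubernetes executor on Amazon EKS.\n// Namespace, service account, and bucket are placeholders.\nprocess.executor = 'k8s'\nk8s {\n    namespace      = 'nextflow'          // Kubernetes namespace where task pods are created\n    serviceAccount = 'nextflow-sa'       // service account bound to an IAM role via IRSA\n}\nwave.enabled   = true                    // Wave and Fusion are one way to avoid shared PVs/PVCs\nfusion.enabled = true\nworkDir = 's3://my-bucket/work'          // object storage used instead of a persistent volume\n```\n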

## Additional resources
\n\nThe following resources will help you dig deeper into Nextflow and other related projects like the nf-core community, which maintains curated pipelines and a very active Slack channel. There are plenty of Nextflow tutorials and videos online, and the following list is in no way exhaustive. Please let us know if we are missing anything.\n\n

### 1. Nextflow docs
\n\nThe reference for the Nextflow language and runtime. These docs should be your first point of reference while developing Nextflow pipelines. The newest features are documented in edge documentation pages released every month, with the latest stable releases every three months.\n\nLatest [stable](https://www.nextflow.io/docs/latest/index.html) & [edge](https://www.nextflow.io/docs/edge/index.html) documentation.\n\n

### 2. Seqera Labs docs
\n\nAn index of documentation, deployment guides, training materials, and resources for all things Nextflow and Tower.\n\n[Seqera Labs docs](https://seqera.io/docs/)\n\n

### 3. nf-core
\n\nnf-core is a growing community of Nextflow users and developers. You can find curated sets of biomedical analysis pipelines written in Nextflow and built by domain experts. Each pipeline is stringently reviewed and has been implemented according to best practice guidelines. Be sure to sign up for the Slack channel.\n\n[nf-core website](https://nf-co.re/) and [nf-core Slack](https://nf-co.re/join)\n\n

### 4. Nextflow Tower
\n\nNextflow Tower is a platform to easily monitor, launch, and scale Nextflow pipelines on cloud providers and on-premise infrastructure. The documentation provides details on setting up compute environments, monitoring pipelines, and launching using either the web graphic interface, CLI, or API.\n\n[Nextflow Tower](https://tower.nf/) and [user documentation](http://help.tower.nf/).\n\n

### 5. Nextflow on AWS
\n\nPart of the Genomics Workflows on AWS, Amazon provides a quickstart for deploying a genomics analysis environment on Amazon Web Services (AWS) cloud, using Nextflow to create and orchestrate analysis workflows and AWS Batch to run the workflow processes. While this article is packed with good information, the procedure outlined in the more recent [Nextflow and AWS Batch – Inside the integration](https://seqera.io/blog/nextflow-and-aws-batch-inside-the-integration-part-1-of-3/) series, may be an easier place to start. Some of the steps that previously needed to be performed manually have been updated in the latest integration.\n\n[Nextflow on AWS Batch](https://docs.opendata.aws/genomics-workflows/orchestration/nextflow/nextflow-overview.html)\n\n

### 6. Nextflow Data Pipelines on Azure Batch
\n\nNextflow on Azure requires at minimum two Azure services, Azure Batch and Azure Storage. Follow the guide below developed by the team at Microsoft to set up both services on Azure, and to get your storage and batch account names and keys.\n\n[Azure Blog](https://techcommunity.microsoft.com/t5/azure-compute-blog/running-nextflow-data-pipelines-on-azure-batch/ba-p/2150383) and [GitHub repository](https://github.com/microsoft/Genomics-Quickstart/blob/main/03-Nextflow-Azure/README.md).\n\n

### 7. Running Nextflow with Google Life Sciences
\n\nA step-by-step guide to launching Nextflow pipelines in Google Cloud. Note that this integration process is specific to Google Life Sciences – an offering that pre-dates Google Cloud Batch. If you want to use the newer integration approach, you can also check out the Nextflow blog article [Get started with Nextflow on Google Cloud Batch](https://nextflow.io/blog/2023/nextflow-with-gbatch.html).\n\n[Nextflow on Google Cloud](https://cloud.google.com/life-sciences/docs/tutorials/nextflow)\n\n

### 8. Bonus: Nextflow Tutorial - Variant Calling Edition
\n\nThis [Nextflow Tutorial - Variant Calling Edition](https://sateeshperi.github.io/nextflow_varcal/nextflow/) has been adapted from the [Nextflow Software Carpentry training material](https://carpentries-incubator.github.io/workflows-nextflow/index.html) and [Data Carpentry: Wrangling Genomics Lesson](https://datacarpentry.org/wrangling-genomics/). Learners will have the chance to learn Nextflow and nf-core basics, to convert a variant-calling bash script into a Nextflow workflow, and modularize the pipeline using DSL2 modules and sub-workflows.\n\nThe workshop can be opened on [Gitpod](https://gitpod.io/#https://github.com/sateeshperi/nextflow_tutorial.git), where you can try the exercises in an online computing environment at your own pace, with the course material in another window alongside.\n\nYou can find the course in [Nextflow Tutorial - Variant Calling Edition](https://sateeshperi.github.io/nextflow_varcal/nextflow/).\n\n

## Community and support
\n\n- [Seqera Community Forum](https://community.seqera.io)\n- Nextflow Twitter [@nextflowio](https://twitter.com/nextflowio?lang=en)\n- [Nextflow Slack](https://www.nextflow.io/slack-invite.html)\n- [nf-core Slack](https://nfcore.slack.com/)\n- [Seqera Labs](https://www.seqera.io/) and [Nextflow Tower](https://tower.nf/)\n- [Nextflow patterns](https://github.com/nextflow-io/patterns)\n- [Nextflow Snippets](https://github.com/mribeirodantas/NextflowSnippets)\n", + "images": [] + }, + { + "slug": "2023/nextflow-goes-to-university", + "title": "Nextflow goes to university!", + "date": "2023-07-24T00:00:00.000Z", + "content": "\nThe Nextflow project originated from within an academic research group, so perhaps it’s no surprise that education is an essential part of the Nextflow and nf-core communities. Over the years, we have established several regular training resources: we have a weekly online seminar series called nf-core/bytesize and run hugely popular bi-annual [Nextflow and nf-core community training online](https://www.youtube.com/@nf-core/playlists?view=50&sort=dd&shelf_id=2). In 2022, Seqera established a new community and growth team, funded in part by a grant from the Chan Zuckerberg Initiative “Essential Open Source Software for Science” grant. We are all former bioinformatics researchers from academia and part of our mission is to build resources and programs to support academic institutions. We want to help to provide leading edge, high-quality, [Nextflow](https://www.nextflow.io/) and [nf-core](https://nf-co.re/) training for Masters and Ph.D. students in Bioinformatics and other related fields.\n\nWe recently held one of our first such projects, a collaboration with the [Bioinformatics Multidisciplinary Environment, BioME](https://bioinfo.imd.ufrn.br/site/en-US) at the [Federal University of Rio Grande do Norte (UFRN)](https://www.ufrn.br/) in Brazil. The UFRN is one of the largest universities in Brazil with over 40,000 enrolled students, hosting one of the best-ranked bioinformatics programs in Brazil, attracting students from all over the country. The BioME department runs courses for Masters and Ph.D. students, including a flexible course dedicated to cutting-edge bioinformatics techniques. As part of this, we were invited to run an 8-day Nextflow and nf-core graduate course. Participants attended 5 days of training seminars and presented a Nextflow project at the end of the course. Upon successful completion of the course, participants received graduate program course credits as well as a Seqera Labs certified certificate recognizing their knowledge and hands-on experience 😎.\n\n\n\nThe course participants included one undergraduate student, Master's students, Ph.D. students, and postdocs with very diverse backgrounds. While some had prior Nextflow and nf-core experience and had already attended Nextflow training, others had never used it. Unsurprisingly, they all chose very different project topics to work on and present to the rest of the group. At the end of the course, eleven students chose to undergo the final project evaluation for the Seqera certification. They all passed with flying colors!\n\n Picture with some of the students that attended the course\n\n## Final projects\n\nFinal hands-on projects are very useful not only to practice new skills but also to have a tangible deliverable at the end of the course. It could be the first step of a long journey with Nextflow, especially if you work on a project that lives on after the course concludes. 
Participants were given complete freedom to design a project that was relevant to them and their interests. Many students were very satisfied with their projects and intend to continue working on them after the course conclusion.\n\n### Euryale 🐍\n\n[João Vitor Cavalcante](https://www.linkedin.com/in/joao-vitor-cavalcante), along with collaborators, had developed and [published](https://www.frontiersin.org/articles/10.3389/fgene.2022.814437/full) a Snakemake pipeline for Sensitive Taxonomic Classification and Flexible Functional Annotation of Metagenomic Shotgun Sequences called MEDUSA. During the course, after seeing the huge potential of Nextflow, he decided to fully translate this pipeline to Nextflow, but with a new name: Euryale. You can check the result [here](https://github.com/dalmolingroup/euryale/) 😍 Why Euryale? In Greek mythology, Euryale was one of the three gorgons, a sister to Medusa 🤓\n\n### Bringing Nanopore to Google Batch ☁️\n\nThe Customer Workflows Group at Oxford Nanopore Technologies (ONT) has adopted Nextflow to develop and distribute general-purpose pipelines for its customers. One of these pipelines, [wf-alignment](https://github.com/epi2me-labs/wf-alignment), takes a FASTQ directory and a reference directory and outputs a minimap2 alignment, along with samtools stats and an HTML report. Both samtools stats and the HTML report generated by this pipeline are well suited for Nextflow Tower’s Reports feature. However, [Danilo Imparato](https://www.linkedin.com/in/daniloimparato) noticed that the pipeline lacked support for using Google Cloud as a compute environment and decided to work on this limitation in his [final project](https://github.com/daniloimparato/wf-alignment), which included fixing a few bugs specific to running it on Google Cloud and making the reports available on Nextflow Tower 🤯\n\n### Nextflow applied to Economics! 🤩\n\n[Galileu Nobre](https://www.linkedin.com/in/galileu-nobre-901551187/) is studying Economic Sciences and decided to convert his scripts into a Nextflow pipeline for his [final project](https://github.com/galileunobre/nextflow_projeto_1). The goal of the pipeline is to estimate the demand for health services in Brazil based on data from the 2019 PNS (National Health Survey), by (a) treating this database to contain only the variables we will work with, and (b) running a descriptive analysis to determine the data distribution in order to investigate which models would be most applicable. In the end, two regression models, Poisson and Negative Binomial, are used to estimate the demand. His work is an excellent example of applying Nextflow to fields outside of traditional bioinformatics 😉.\n\n### Whole Exome Sequencing 🧬\n\nFor her [final project](https://github.com/RafaellaFerraz/exome), [Rafaella Ferraz](https://www.linkedin.com/in/rafaella-sousa-ferraz) used nf-core/tools to write a whole-exome sequencing analysis pipeline from scratch. She applied her new skills using nf-core modules and sub-workflows to achieve this and was able to launch and monitor her pipeline using Nextflow Tower. Kudos to Rafaella! 👏🏻\n\n### RNASeq with contamination 🧫\n\nIn her [final project](https://github.com/iaradsouza1/tab-projeto-final), [Iara Souza](https://www.linkedin.com/in/iaradsouza) developed a bioinformatics pipeline that analyzes RNA-Seq data when an extra pre-filtering step is required.
She needed this for analyzing data from RNA-Seq experiments performed in cell culture, where there is a high probability of contamination of the target transcriptome with the host transcriptome. Iara was able to learn how to use nf-core/tools and benefit from all the \"batteries included\" that come with it 🔋😬\n\n### SARS-CoV-2 Genome assembly and lineage classification 🦠\n\n[Diego Teixeira](https://www.linkedin.com/in/diego-go-tex) has been working with SARS-CoV-2 genome assembly and lineage classification. As his final project, he wrote a [Nextflow pipeline](https://github.com/diegogotex/sarscov2_irma_nf) aggregating all tools and analyses he's been doing, allowing him to be much more efficient in his work and have a reproducible pipeline that can easily be shared with collaborators.\n\nIn the nf-core project, there are almost a [thousand modules](https://nf-co.re/modules) ready to plug in your pipeline, together with [dozens of full-featured pipelines](https://nf-co.re/pipelines). However, in many situations, you'll need a custom pipeline. With that in mind, it's very useful to master the skills of Nextflow scripting so that you can take advantage of everything that is available, both building new pipelines and modifying public ones.\n\n## Exciting experience!\n\nIt was an amazing experience to see what each participant had worked on for their final projects! 🤯 They were all able to master the skills required to write Nextflow pipelines in real-life scenarios, which can continue to be used well after the end of the course. For people just starting their adventure with Nextflow, it can feel overwhelming to use nf-core tools with all the associated best practices, but students surprised me by using nf-core tools from the very beginning and having their project almost perfectly fitting the best practices 🤩\n\nWe’d love to help out with more university bioinformatics courses like this. If you think your institution could benefit from such an experience, please don't hesitate to reach out to us at community@seqera.io. We would love to hear from you!\n", + "images": [ + "/img/nextflow-university-class-ufrn.jpg" + ] + }, + { + "slug": "2023/nextflow-summit-2023-recap", + "title": "Nextflow Summit 2023 Recap", + "date": "2023-10-25T00:00:00.000Z", + "content": "\n## Five days of Nextflow Awesomeness in Barcelona\n\nOn Friday, Oct 20, we wrapped up our [hackathon](https://nf-co.re/events/hackathon) and [Nextflow Summit](https://summit.nextflow.io/) in Barcelona, Spain. By any measure, this year’s Summit was our best community event ever, drawing roughly 900 attendees across multiple channels, including in-person attendees, participants in our [#summit-2023](https://nextflow.slack.com/archives/C0602TWRT5G) Slack channel, and [Summit Livestream](https://www.youtube.com/playlist?list=PLPZ8WHdZGxmUotnP-tWRVNtuNWpN7xbpL) viewers on YouTube.\n\nThe Summit drew attendees, speakers, and sponsors from around the world. 
Over the course of the three-day event, we heard from dozens of impressive speakers working at the cutting edge of life sciences from academia, research, healthcare providers, biotechs, and cloud providers, including:\n\n- Australian BioCommons\n- Genomics England\n- Pixelgen Technologies\n- University of Tennessee Health Science Center\n- Amazon Web Services\n- Quantitative Biology Center - University of Tübingen\n- Biomodal\n- Matterhorn Studio\n- Centre for Genomic Regulation (CRG)\n- Heidelberg University Hospital\n- MemVerge\n- University of Cambridge\n- Oxford Nanopore Technologies\n- Medical University of Innsbruck\n- Sano Genetics\n- Institute of Genetics and Development of Rennes, University of Rennes\n- Ardigen\n- ZS\n- Wellcome Sanger Institute\n- SciLifeLab\n- AstraZeneca UK Ltd\n- University of Texas at Dallas\n- Seqera\n\n## The Hackathon – advancing the Nextflow ecosystem\n\nThe week began with a three-day in-person and virtual nf-core hackathon event. With roughly 100 in-person developers, this was twice the size of our largest Hackathon to date. As with previous Hackathons, participants were divided into project groups, with activities coordinated via a single [GitHub project board](https://github.com/orgs/nf-core/projects/47/views/1) focusing on different aspects of [nf-core](https://nf-co.re/) and Nextflow, including:\n\n- Pipelines\n- Modules & subworkflows\n- Infrastructure\n- Nextflow & plugins development\n\nThis year, the focus of the hackathon was [nf-test](https://code.askimed.com/nf-test/), an open-source testing framework for Nextflow pipelines. The team made considerable progress applying nf-test consistently across various nf-core pipelines and modules — and of course, no Hackathon would be complete without a community cooking class, quiz, bingo, a sock hunt, and a scavenger hunt!\n\nFor an overview of the tremendous progress made advancing the state of Nextflow and nf-core in three short days, view Chris Hakkaart’s talk on [highlights from the nf-core hackathon](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-hackathon/).\n\n## The Summit kicks off\n\nThe Summit began on Wednesday Oct 18 with excellent talks from [Australian BioCommons](https://summit.nextflow.io/barcelona/agenda/summit/oct-18-the-national-nextflow-tower-service-for-australian-researchers/) and [Genomics England](https://summit.nextflow.io/barcelona/agenda/summit/oct-18-analysing-ont-long-read-data-for-cancer-with-nextflow/). This was followed by a presentation where [Pixelgen Technologies](https://summit.nextflow.io/barcelona/agenda/summit/oct-18-pixelgen-technologies-heart-nextflow/) described their unique Molecular Pixelation (MPX) technologies and unveiled their new [nf-core/pixelator](https://nf-co.re/pixelator/1.0.0) community pipeline for molecular pixelation assays.\n\nNext, Seqera’s Phil Ewels took the stage providing a series of community updates, including the announcement of a new [Nextflow Ambassador](https://nextflow.io/blog/2023/introducing-nextflow-ambassador-program.html) program, [a new community forum](https://nextflow.io/blog/2023/community-forum.html) at [community.seqera.io](https://community.seqera.io), and the exciting appointment of [Geraldine Van der Auwera](https://nextflow.io/blog/2023/geraldine-van-der-auwera-joins-seqera.html) as lead developer advocate for the Nextflow. 
Geraldine is well known for her work on GATK, WDL, and Terra.bio and is the co-author of the book [Genomics in the Cloud](https://www.oreilly.com/library/view/genomics-in-the/9781491975183/). As Geraldine assumes leadership of the developer advocacy team, Phil will spend more time focusing on open-source development as product manager of open source at Seqera.\n\n
\n \"Hackathon\n
\n\nSeqera’s Evan Floden shared his vision of the modern biotech stack for open science, highlighting recent developments at Seqera, including a revamped [Seqera platform](https://seqera.io/platform/) and new [Data Explorer](https://seqera.io/blog/introducing-data-explorer/) functionality, and offering an exciting glimpse of the new Data Studios feature, now in private preview. You can view [Evan’s full talk here](https://summit.nextflow.io/barcelona/agenda/summit/oct-18-modern-biotech/).\n\nA highlight was the keynote delivered by Erik Garrison of the University of Tennessee Health Science Center. In his talk, [Biological revelations at the frontiers of a draft human pangenome reference](https://summit.nextflow.io/barcelona/agenda/summit/oct-18-erik-garrison/), Erik shared how his team's cutting-edge work applying new computational methods in the context of the Human Pangenome Project has yielded the most complete picture of human sequence evolution available to date.\n\nDay one wrapped up with a surprise [announcement](https://www.globenewswire.com/news-release/2023/10/20/2763899/0/en/Seqera-Sets-Sail-With-Alinghi-Red-Bull-Racing-as-Official-High-Performance-Computing-Supplier.html) that Seqera has been confirmed as the official High-Performance Computing Supplier for Alinghi Red Bull Racing at the [37th America’s Cup](https://www.americascup.com/) in Barcelona. This was followed by an evening reception hosted by [Alinghi Red Bull Racing](https://alinghiredbullracing.americascup.com/).\n\n## Day two starts off on the right foot\n\nDay two kicked off with a brisk sunrise run along the iconic Barcelona Waterfront attended by a team of hardy Summit participants. After that, things kicked into high gear for the morning session with talks on everything from using Nextflow to power [Machine Learning pipelines for materials science](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-machine-learning-for-material-science/) to [standardized frameworks for protein structure prediction](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-proteinfold/) to discussions on [how to estimate the CO2 footprint of pipeline runs](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-nf-co2footprint/).\n\n
\n \"Summit\n
\n\nNextflow creator and Seqera CTO and co-founder Paolo Di Tommaso provided an update on some of the technologies he and his team have been working on including a deep dive on the [Fusion file system](https://seqera.io/fusion/). Paolo also delved into [Wave containers](https://seqera.io/wave/), discussing the dynamic assembly of containers using the [Spack package manager](https://nextflow.io/docs/latest/process.html#spack), echoing a similar theme from AWS’s [Brendan Bouffler](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-brendan-bouffler/) earlier in the day. During the conference, Seqera announced Wave Containers as our latest [open-source](https://github.com/seqeralabs/wave) contribution to the bioinformatics community — a huge contribution to the open science movement.\n\nPaolo also provided an impressive command-line focused demo of Wave, echoing Harshil Patel’s equally impressive demo earlier in the day focused on [seqerakit and automation on the Seqera Platform](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-harshil-patel/). Both Harshil and Paolo showed themselves to be **\"kings of the live demo\"** for their command line mastery under pressure! You can view [Paolo’s talk and demos here](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-paolo-di-tommaso/) and [Harshil’s talk here](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-harshil-patel/).\n\nTalks during day two included [bringing spatial omics to nf-core](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-bringing-spatial-omics-to-nf-core/), a discussion of [nf-validation](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-nf-validation/), and a talk on the [development of an integrated DNA and RNA variant calling pipeline](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-development-of-an-integrated-dna-and-rna-variant-calling-pipeline/).\n\nUnfortunately, there were too many brilliant speakers and topics to mention them all here, so we’ve provided a handy summary of talks at the end of this post so you can look up topics of interest.\n\nThe Summit also featured an exhibition area, and attendees visited booths hosted by [event sponsors](https://summit.nextflow.io/barcelona/sponsors/) between talks and viewed the many excellent [scientific posters](https://summit.nextflow.io/barcelona/posters/) contributed for the event. Following a packed day of sessions that went into the evening, attendees relaxed and socialized with colleagues over dinner.\n\n
\n \"Morning\n
\n\n## Wrapping up\n\nAs things wound to a close on day three, there were additional talks on topics ranging from ZS’s [contributing to nf-core through client collaboration](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-zs/) to [decoding the Tree of Life at Wellcome Sanger Institute](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-tree-of-life/) to [performing large and reproducible GWAS analysis on biobank-scale data](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-gwas/) at Medical University of Innsbruck.\n\nPhil Ewels discussed [future plans for MultiQC](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-multiqc/), and Edmund Miller [shared his experience working on nf-test](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-nf-test-at-nf-core/) and how it is empowering scalable and streamlined testing for nf-core projects.\n\nTo close the event, Evan took the stage a final time, thanking the many Summit organizers and contributors, and announcing the next Nextflow Summit Barcelona, scheduled for **October 21-25, 2024**. He also reminded attendees of the upcoming North American Hackathon and [Nextflow Summit in Boston](https://summit.nextflow.io/boston/) beginning on November 28, 2023.\n\nOn behalf of the Seqera team, thank you to our fellow [sponsors](https://summit.nextflow.io/boston/sponsors/) who helped make the Nextflow Summit a resounding success. This year’s sponsors included:\n\n- AWS\n- ZS\n- Element Biosciences\n- Microsoft\n- MemVerge\n- Pixelgen Technologies\n- Oxford Nanopore\n- Quilt\n- TileDB\n\n## In case you missed it\n\nIf you were unable to attend in person, or missed a talk, you can watch all three days of the Summit on our [YouTube channel](https://www.youtube.com/playlist?list=PLPZ8WHdZGxmUotnP-tWRVNtuNWpN7xbpL).\n\nFor information about additional upcoming events including bytesize talks, hackathons, webinars, and training events, you can visit [https://nf-co.re/events](https://nf-co.re/events) or [https://seqera.io/events/seqera/](https://seqera.io/events/seqera/).\n\nFor your convenience, a handy list of talks from Nextflow Summit 2023 are summarized below.\n\n### Day one (Wednesday Oct 18):\n\n- [The National Nextflow Tower Service for Australian researchers](https://summit.nextflow.io/barcelona/agenda/summit/oct-18-the-national-nextflow-tower-service-for-australian-researchers/) – Steven Manos\n- [Analysing ONT long read data for cancer with Nextflow](https://summit.nextflow.io/barcelona/agenda/summit/oct-18-analysing-ont-long-read-data-for-cancer-with-nextflow/) – Arthur Gymer\n- [Community updates](https://summit.nextflow.io/barcelona/agenda/summit/oct-18-community-updates/) – Phil Ewels\n- [Pixelgen Technologies ❤︎ Nextflow](https://summit.nextflow.io/barcelona/agenda/summit/oct-18-pixelgen-technologies-heart-nextflow/) – John Dahlberg\n- [The modern biotech stack](https://summit.nextflow.io/barcelona/agenda/summit/oct-18-modern-biotech/) – Evan Floden\n- [Biological revelations at the frontiers of a draft human pangenome reference](https://summit.nextflow.io/barcelona/agenda/summit/oct-18-erik-garrison/) – Erik Garrison\n\n### Day two (Thursday Oct 19):\n\n- [It’s been quite a year for research technology in the cloud: we’ve been busy](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-brendan-bouffler/) – Brendan Bouffler\n- [nf-validation: a Nextflow plugin to validate pipeline parameters and input files](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-nf-validation/) - Júlia Mir Pedrol\n- 
[Computational methods for allele-specific methylation with biomodal Duet](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-biomodal-duet/) – Michael Wilson\n- [How to use data pipelines in Machine Learning for Material Science](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-machine-learning-for-material-science/) – Jakob Zeitler\n- [nf-core/proteinfold: a standardized workflow framework for protein structure prediction tools](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-proteinfold/) - Jose Espinosa-Carrasco\n- [Automation on the Seqera Platform](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-harshil-patel/) - Harshil Patel\n- [nf-co2footprint: a Nextflow plugin to estimate the CO2 footprint of pipeline runs](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-nf-co2footprint/) - Sabrina Krakau\n- [Bringing spatial omics to nf-core](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-bringing-spatial-omics-to-nf-core/) - Victor Perez\n- [Bioinformatics at the speed of cloud: revolutionizing genomics with Nextflow and MMCloud](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-bioinformatics-at-the-speed-of-cloud/) - Sateesh Peri\n- [Enabling converged computing with the Nextflow ecosystem](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-paolo-di-tommaso/) - Paolo Di Tommaso\n- [Cluster scalable pangenome graph construction with nf-core/pangenome](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-cluster-scalable-pangenome/) - Simon Heumos\n- [Development of an integrated DNA and RNA variant calling pipeline](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-development-of-an-integrated-dna-and-rna-variant-calling-pipeline/) - Raquel Manzano\n- [Annotation cache: using nf-core/modules and Seqera Platform to build an AWS open data resource](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-annotation-cache/) - Maxime Garcia\n- [Real-time sequencing analysis with Nextflow](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-real-time-sequencing-analysis-with-nextflow/) - Chris Wright\n- [nf-core/sarek: a comprehensive & efficient somatic & germline variant calling workflow](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-nf-core-sarek/) - Friederike Hanssen\n- [nf-test: a simple but powerful testing framework for Nextflow pipelines](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-nf-test-simple-but-powerful/) - Lukas Forer\n- [Empowering distributed precision medicine: scalable genomic analysis in clinical trial recruitment](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-empowering-distributed-precision-medicine/) - Heath Obrien\n- [nf-core pipeline for genomic imputation: from phasing to imputation to validation](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-nf-core-pipeline-for-genomic-imputation/) - Louis Le Nézet\n- [Porting workflow managers to Nextflow at a national diagnostic genomics medical service – strategy and learnings](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-genomics-england/) - Several Speakers\n\n### Day three (Friday Oct 20):\n\n- [Driving discovery: contributing to the nf-core project through client collaboration](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-zs/) - Felipe Almeida & Juliet Frederiksen\n- [Automated production engine to decode the Tree of Life](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-tree-of-life/) - Guoying Qi\n- [Building a community: experiences from one year as\n
a developer advocate](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-community-building/) - Marcel Ribeiro-Dantas\n- [nf-core/raredisease: a workflow to analyse data from patients with rare diseases](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-nf-core-raredisease/) - Ramprasad Neethiraj\n- [Enabling AZ bioinformatics with Nextflow/Nextflow Tower](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-az/) - Manasa Surakala\n- [Bringing MultiQC into a new era](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-multiqc/) - Phil Ewels\n- [nf-test at nf-core: empowering scalable and streamlined testing](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-nf-test-at-nf-core/) - Edmund Miller\n- [Performing large and reproducible GWAS analysis on biobank-scale data](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-gwas/) - Sebastian Schönherr\n- [Highlights from the nf-core hackathon](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-hackathon/) - Chris Hakkaart\n\n_In this event, we're showcasing some of the results of the project 'Optimization of computational resources for HPC workloads in the cloud using ML/AI' by Seqera Labs S.L. This project has been funded by the European Regional Development Fund (ERDF) of the European Union, coordinated and managed by RED.es, with the aim of carrying out the development of technological entrepreneurship and technological demand, within the framework of the Strategic Action on Digital Economy and Society of the State Program for R&D&I oriented towards societal challenges._\n\n![grant logos](/img/blog-2022-11-03--img1.png)\n", + "images": [ + "/img/blog-summit-2023-recap--img1b.jpg", + "/img/blog-summit-2023-recap--img2b.jpg", + "/img/blog-summit-2023-recap--img3b.jpg" + ] + }, + { + "slug": "2023/nextflow-with-gbatch", + "title": "Get started with Nextflow on Google Cloud Batch", + "date": "2023-02-01T00:00:00.000Z", + "content": "\n[We have talked about Google Cloud Batch before](https://www.nextflow.io/blog/2022/deploy-nextflow-pipelines-with-google-cloud-batch.html). Not only that, we were proud to announce Nextflow support to Google Cloud Batch right after it was publicly released, back in July 2022. How amazing is that? But we didn't stop there! The [Nextflow official documentation](https://www.nextflow.io/docs/latest/google.html) also provides a lot of useful information on how to use Google Cloud Batch as the compute environment for your Nextflow pipelines. Having said that, feedback from the community is valuable, and we agreed that in addition to the documentation, teaching by example, and in a more informal language, can help many of our users. So, here is a tutorial on how to use the Batch service of the Google Cloud Platform with Nextflow 🥳\n\n### Running an RNAseq pipeline with Google Cloud Batch\n\nWelcome to our RNAseq tutorial using Nextflow and Google Cloud Batch! RNAseq is a powerful technique for studying gene expression and is widely used in a variety of fields, including genomics, transcriptomics, and epigenomics. In this tutorial, we will show you how to use Nextflow, a popular workflow management tool, to run a proof-of-concept RNAseq pipeline to perform the analysis on Google Cloud Batch, a scalable cloud-based computing platform. For a real Nextflow RNAseq pipeline, check [nf-core/rnaseq](https://github.com/nf-core/rnaseq). 
For the proof-of-concept RNAseq pipeline that we will use here, check [nextflow-io/rnaseq-nf](https://github.com/nextflow-io/rnaseq-nf).\n\nNextflow allows you to easily develop, execute, and scale complex pipelines on any infrastructure, including the cloud. Google Cloud Batch enables you to run batch workloads on Google Cloud Platform (GCP), with the ability to scale up or down as needed. Together, Nextflow and Google Cloud Batch provide a powerful and flexible solution for RNAseq analysis.\n\nWe will walk you through the entire process, from setting up your Google Cloud account and installing Nextflow to running an RNAseq pipeline and interpreting the results. By the end of this tutorial, you will have a solid understanding of how to use Nextflow and Google Cloud Batch for RNAseq analysis. So let's get started!\n\n### Setting up Google Cloud CLI (gcloud)\n\nIn this tutorial, you will learn how to use the gcloud command-line interface to interact with the Google Cloud Platform and set up your Google Cloud account for use with Nextflow. If you do not already have gcloud installed, you can follow the instructions [here](https://cloud.google.com/sdk/docs/install) to install it. Once you have gcloud installed, run the command `gcloud init` to initialize the CLI. You will be prompted to choose an existing project to work on or create a new one. For the purpose of this tutorial, we will create a new project. Name your project \"my-rnaseq-pipeline\". There may be a lot of information displayed on the screen after running this command, but you can ignore it for now.\n\n### Setting up Batch and Storage in Google Cloud Platform\n\n#### Enable Google Batch\n\nAccording to the [official Google documentation](https://cloud.google.com/batch/docs/get-started) _Batch is a fully managed service that lets you schedule, queue, and execute [batch processing](https://en.wikipedia.org/wiki/Batch_processing) workloads on Compute Engine virtual machine (VM) instances. Batch provisions resources and manages capacity on your behalf, allowing your batch workloads to run at scale_.\n\nThe first step is to download the `beta` command group. You can do this by executing:\n\n```bash\n$ gcloud components install beta\n```\n\nThen, enable billing for this project. You will first need to get your account id with\n\n```bash\n$ gcloud beta billing accounts list\n```\n\nAfter that, you will see something like the following appear in your window:\n\n```console\nACCOUNT_ID NAME OPEN MASTER_ACCOUNT_ID\nXXXXX-YYYYYY-ZZZZZZ My Billing Account True\n```\n\nIf you get the error “Service Usage API has not been used in project 842841895214 before or it is disabled”, simply run the command again and it should work. Then copy the account id, and the project id and paste them into the command below. This will enable billing for your project id.\n\n```bash\n$ gcloud beta billing projects link PROJECT-ID --billing-account XXXXXX-YYYYYY-ZZZZZZ\n```\n\nNext, you must enable the Batch API, along with the Compute Engine and Cloud Logging APIs. 
You can do so with the following command:\n\n```bash\n$ gcloud services enable batch.googleapis.com compute.googleapis.com logging.googleapis.com\n```\n\nYou should see a message similar to the one below:\n\n```console\nOperation \"operations/acf.p2-AAAA-BBBBB-CCCC--DDDD\" finished successfully.\n```\n\n#### Create a Service Account\n\nIn order to access the APIs we enabled, you need to [create a Service Account](https://cloud.google.com/iam/docs/creating-managing-service-accounts#iam-service-accounts-create-gcloud) and set the necessary IAM roles for the project. You can create the Service Account by executing:\n\n```bash\n$ gcloud iam service-accounts create rnaseq-pipeline-sa\n```\n\nAfter this, set appropriate roles for the project using the commands below:\n\n```bash\n$ gcloud projects add-iam-policy-binding my-rnaseq-pipeline \\\n--member=\"serviceAccount:rnaseq-pipeline-sa@my-rnaseq-pipeline.iam.gserviceaccount.com\" \\\n--role=\"roles/iam.serviceAccountUser\"\n\n$ gcloud projects add-iam-policy-binding my-rnaseq-pipeline \\\n--member=\"serviceAccount:rnaseq-pipeline-sa@my-rnaseq-pipeline.iam.gserviceaccount.com\" \\\n--role=\"roles/batch.jobsEditor\"\n\n$ gcloud projects add-iam-policy-binding my-rnaseq-pipeline \\\n--member=\"serviceAccount:rnaseq-pipeline-sa@my-rnaseq-pipeline.iam.gserviceaccount.com\" \\\n--role=\"roles/logging.viewer\"\n\n$ gcloud projects add-iam-policy-binding my-rnaseq-pipeline \\\n--member=\"serviceAccount:rnaseq-pipeline-sa@my-rnaseq-pipeline.iam.gserviceaccount.com\" \\\n--role=\"roles/storage.admin\"\n```\n\n#### Create your Bucket\n\nNow it's time to create your Storage bucket, where your input, intermediate, and output files will be hosted and accessed by the Google Batch virtual machines. Your bucket name must be globally unique (across regions). For the example below, the bucket is named rnaseq-pipeline-bckt. However, as this name has now been used, you will have to create a bucket with a different name:\n\n```bash\n$ gcloud storage buckets create gs://rnaseq-pipeline-bckt\n```\n\nNow it's time for Nextflow to join the party! 🥳\n\n### Setting up Nextflow to make use of Batch and Storage\n\n#### Write the configuration file\n\nHere you will set up a simple RNAseq pipeline with Nextflow to be run entirely on Google Cloud Platform (GCP) directly from your local machine.\n\nStart by creating a folder for your project on your local machine, such as “rnaseq-example”. It's important to mention that you can also go fully cloud and use a Virtual Machine for everything we will do here locally.\n\nInside the folder that you created for the project, create a file named `nextflow.config` with the following content (remember to replace PROJECT-ID with the project id you created above):\n\n```groovy\nworkDir = 'gs://rnaseq-pipeline-bckt/scratch'\n\nprocess {\n executor = 'google-batch'\n container = 'nextflow/rnaseq-nf'\n errorStrategy = { task.exitStatus==14 ? 'retry' : 'terminate' }\n maxRetries = 5\n}\n\ngoogle {\n project = 'PROJECT-ID'\n location = 'us-central1'\n batch.spot = true\n}\n```\n\nThe `workDir` option tells Nextflow to use the bucket you created as the work directory. Nextflow will use this directory to stage your input data and store intermediate and final data. Nextflow does not allow you to use the root directory of a bucket as the work directory -- it must be a subdirectory instead.\n
Using a subdirectory is also just a good practice.\n\nThe `process` scope tells Nextflow to run all the processes (steps) of your pipeline on Google Batch and to use the `nextflow/rnaseq-nf` Docker image hosted on DockerHub (default) for all processes. Also, the error strategy will automatically retry any failed tasks with exit code 14, which is the exit code for spot instances that were reclaimed.\n\nThe `google` scope is specific to Google Cloud. You need to provide the project id (don't provide the project name, it won't work!) and a Google Cloud location (leave it as above if you're not sure what to put). In the example above, spot instances are also requested (more info about spot instances [here](https://www.nextflow.io/docs/latest/google.html#spot-instances)), which are cheaper instances that, as a drawback, can be reclaimed at any time if resources are needed by the cloud provider. Based on what we have seen so far, the `nextflow.config` file should contain \"my-rnaseq-pipeline\" as the project id.\n\nUse the command below to authenticate with Google Cloud Platform. Nextflow will use this account by default when you run a pipeline.\n\n```bash\n$ gcloud auth application-default login\n```\n\n#### Launch the pipeline!\n\nWith that done, you’re now ready to run the proof-of-concept RNAseq Nextflow pipeline. Instead of asking you to download it, or copy-paste something into a script file, you can simply provide the GitHub URL of the RNAseq pipeline mentioned at the beginning of [this tutorial](https://github.com/nextflow-io/rnaseq-nf), and Nextflow will do all the heavy lifting for you. This pipeline comes with test data bundled with it, and for more information about it and how it was developed, you can check the public training material developed by Seqera Labs.\n\nOne important thing to mention is that in this repository there is already a `nextflow.config` file with a different configuration, but don't worry about that. You can run the pipeline with the configuration file that we wrote above using the `-c` Nextflow parameter. Run the command line below:\n\n```bash\n$ nextflow run nextflow-io/rnaseq-nf -c nextflow.config\n```\n\nWhile the pipeline stores everything in the bucket, our example pipeline will also download the final outputs to a local directory called `results`, because of how the `publishDir` directive was specified in the `main.nf` script (example [here](https://github.com/nextflow-io/rnaseq-nf/blob/ed179ef74df8d5c14c188e200a37fff61fd55dfb/modules/multiqc/main.nf#L5)). If you want to avoid the egress cost associated with downloading data from a bucket, you can change the `publishDir` to another bucket directory, e.g. `gs://rnaseq-pipeline-bckt/results`.\n\nIn your terminal, you should see something like this:\n\n![Nextflow ongoing run on Google Cloud Batch](/img/ongoing-nxf-gbatch.png)\n\nYou can check the status of your jobs on Google Batch by opening another terminal and running the following command:\n\n```bash\n$ gcloud batch jobs list\n```\n\nBy the end of it, if everything worked well, you should see something like:\n\n![Nextflow run on Google Cloud Batch finished](/img/nxf-gbatch-finished.png)\n\nAnd that's all, folks!\n
😆\n\nYou will find more information about Nextflow on Google Batch in [this blog post](https://www.nextflow.io/blog/2022/deploy-nextflow-pipelines-with-google-cloud-batch.html) and the [official Nextflow documentation](https://www.nextflow.io/docs/latest/google.html).\n\nSpecial thanks to Hatem Nawar, Chris Hakkaart, and Ben Sherman for providing valuable feedback to this document.\n", + "images": [] + }, + { + "slug": "2023/reflecting-on-ten-years-of-nextflow-awesomeness", + "title": "Reflecting on ten years of Nextflow awesomeness", + "date": "2023-06-06T00:00:00.000Z", + "content": "\nThere's been a lot of water under the bridge since the first release of Nextflow in July 2013. From its humble beginnings at the [Centre for Genomic Regulation](https://www.crg.eu/) (CRG) in Barcelona, Nextflow has evolved from an upstart workflow orchestrator to one of the most consequential projects in open science software (OSS). Today, Nextflow is downloaded **120,000+** times monthly, boasts vibrant user and developer communities, and is used by leading pharmaceutical, healthcare, and biotech research firms.\n\nOn the occasion of Nextflow's anniversary, I thought it would be fun to share some perspectives and point out how far we've come as a community. I also wanted to recognize the efforts of Paolo Di Tommaso and the many people who have contributed enormous time and effort to make Nextflow what it is today.\n\n## A decade of innovation\n\nBill Gates is credited with observing that \"people often overestimate what they can do in one year, but underestimate what they can do in ten.\" The lesson, of course, is that real, meaningful change takes time. Progress is measured in a series of steps. Considered in isolation, each new feature added to Nextflow seems small, but they combine to deliver powerful capabilities.\n\nLife sciences has seen a staggering amount of innovation. According to estimates from the National Human Genome Research Institute (NHGRI), the cost of sequencing a human genome in 2013 was roughly USD 10,000. Today, sequencing costs are in the range of USD 200—a **50-fold reduction**.1\n\nA fundamental principle of economics is that _\"if you make something cheaper, you get more of it.\"_ One didn't need a crystal ball to see that, driven by plummeting sequencing and computing costs, the need for downstream analysis was poised to explode. With advances in sequencing technology outpacing Moore's Law, It was clear that scaling analysis capacity would be a significant issue.2\n\n## Getting the fundamentals right\n\nWhen Paolo and his colleagues started the Nextflow project, it was clear that emerging technologies such as cloud computing, containers, and collaborative software development would be important. Even so, it is still amazing how rapidly these key technologies have advanced in ten short years.\n\nIn an [article for eLife magazine in 2021](https://elifesciences.org/labs/d193babe/the-story-of-nextflow-building-a-modern-pipeline-orchestrator), Paolo described how Solomon Hyke's talk \"[Why we built Docker](https://www.youtube.com/watch?v=3N3n9FzebAA)\" at DotScale in the summer of 2013 impacted his thinking about the design of Nextflow. It was evident that containers would be a game changer for scientific workflows. Encapsulating application logic in self-contained, portable containers solved a multitude of complexity and dependency management challenges — problems experienced daily at the CRG and by many bioinformaticians to this day. 
Nextflow was developed concurrent with the container revolution, and Nextflow’s authors had the foresight to make containers first-class citizens.\n\nWith containers, HPC environments have been transformed — from complex environments where application binaries were typically served to compute nodes via NFS to simpler architectures where task-specific containers are pulled from registries on demand. Today, most bioinformatic pipelines use containers. Nextflow supports [multiple container formats](https://www.nextflow.io/docs/latest/container.html?highlight=containers) and runtimes, including [Docker](https://www.docker.com/), [Singularity](https://sylabs.io/), [Podman](https://podman.io/), [Charliecloud](https://hpc.github.io/charliecloud/), [Sarus](https://sarus.readthedocs.io/en/stable/), and [Shifter](https://github.com/NERSC/shifter).\n\n## The shift to the cloud\n\nSome of the earliest efforts around Nextflow centered on building high-quality executors for HPC workload managers. A key idea behind schedulers such as LSF, PBS, Slurm, and Grid Engine was to share a fixed pool of on-premises resources among multiple users, maximizing throughput, efficiency, and resource utilization.\n\nSee the article [Nextflow on BIG IRON: Twelve tips for improving the effectiveness of pipelines on HPC clusters](https://nextflow.io/blog/2023/best-practices-deploying-pipelines-with-hpc-workload-managers.html)\n\nWhile cloud infrastructure was initially \"clunky\" and hard to deploy and use, the idea of instant access and pay-per-use models was too compelling to ignore. In the early days, many organizations attempted to replicate on-premises HPC clusters in the cloud, deploying the same software stacks and management tools used locally to cloud-based VMs.\n\nWith the launch of [AWS Batch](https://aws.amazon.com/batch/) in December 2016, Nextflow’s developers realized there was a better way. In cloud environments, resources are (in theory) infinite and just an API call away. The traditional scheduling paradigm of sharing a finite resource pool didn't make sense in the cloud, where users could dynamically provision a private, scalable resource pool for only the duration of their workload. All the complex scheduling and control policies that tended to make HPC workload managers hard to use and manage were no longer required.3\n\nAWS Batch also relied on containerization, so it only made sense that AWS Batch was the first cloud-native integration to the Nextflow platform early in 2017, along with native support for S3 storage buckets. Nextflow has since been enhanced to support other batch services, including [Azure Batch](https://azure.microsoft.com/en-us/products/batch) and [Google Cloud Batch](https://cloud.google.com/batch), along with a rich set of managed cloud storage solutions. Nextflow’s authors have also embraced [Kubernetes](https://kubernetes.io/docs/concepts/overview/), developed by Google, yet another way to marshal and manage containerized application environments across public and private clouds.\n\n## SCMs come of age\n\nA major trend shaping software development has been the use of collaborative source code managers (SCMs) based on Git. When Paolo was thinking about the design of Nextflow, GitHub had already been around for several years, and DevOps techniques were revolutionizing software. These advances turned out to be highly relevant to managing pipelines. Ten years ago, most bioinformaticians stored copies of pipeline scripts locally. 
Nextflow’s authors recognized what now seems obvious — it would be easier to make Nextflow SCM aware and launch pipelines directly from a code repository. Today, this simple idea has become standard practice. Most users run pipelines directly from GitHub, GitLab, Gitea, or other favorite SCMs.\n\n## Modularization on steroids\n\nA few basic concepts and patterns in computer science appear repeatedly in different contexts. These include iteration, indirection, abstraction, and component reuse/modularization. We have seen a significant shift towards modularization in bioinformatics pipelines, enabled by catalogs of reusable containers. In addition to general-purpose registries such as [Docker Hub](https://hub.docker.com/) and [Quay.io](https://quay.io/), domain-specific efforts such as [biocontainers](https://biocontainers.pro/) have emerged, aimed at curating purpose-built containers to meet the specialized needs of bioinformaticians.\n\nWe have also seen the emergence of platform and language-independent package managers such as [Conda](https://docs.conda.io/en/latest/). Today, almost **10,000** Conda recipes for various bioinformatics tools are freely available from [Bioconda](https://anaconda.org/bioconda/repo). Gone are the days of manually installing software. In addition to pulling pre-built bioinformatics containers from registries, developers can leverage [packages of bioconda](http://bioconda.github.io/conda-package_index.html) recipes directly from the bioconda channel.\n\nThe Nextflow community has helped lead this trend toward modularization in several areas. For example, in 2022, Seqera Labs introduced [Wave](https://seqera.io/wave/). This new service can dynamically build and serve containers on the fly based on bioconda recipes, enabling the two technologies to work together seamlessly and avoiding building and maintaining containers by hand.\n\nWith [nf-core](https://nf-co.re/), the Nextflow community has extended the concept of modularization and reuse one step further. Much as bioconda and containers have made bioinformatics software modular and portable, [nf-core modules](https://nf-co.re/modules) extend these concepts to pipelines. Today, there are **900+** nf-core modules — essentially building blocks with pre-defined inputs and outputs based on Nextflow's elegant dataflow model. Rather than creating pipelines from scratch, developers can now wire together these pre-assembled modules to deliver new functionality rapidly or use any of **80** of the pre-built [nf-core analysis pipelines](https://nf-co.re/pipelines). The result is a dramatic reduction in development and maintenance costs.\n\n## Some key Nextflow milestones\n\nSince the [first Nextflow release](https://github.com/nextflow-io/nextflow/releases/tag/v0.3.0) in July 2013, there have been **237 releases** and **5,800 commits**. Also, the project has been forked over **530** times. There have been too many important enhancements and milestones to list here, but we capture some key developments in the timeline below:\n\n![Nextflow milestones timeline](/img/nextflow_ten_years_graphic.jpg)\n\nAs we look to the future, the pace of innovation continues to increase. It’s been exciting to see Nextflow expand beyond the various _omics_ disciplines to new areas such as medical imaging, data science, and machine learning. We continue to evolve Nextflow, adding new features and capabilities to support these emerging use cases and support new compute and storage environments.\n
I can hardly wait to see what the next ten years will bring.\n\nFor those new to Nextflow and wishing to learn more about the project, we have compiled an excellent collection of resources to help you [Learn Nextflow in 2023](https://nextflow.io/blog/2023/learn-nextflow-in-2023.html).\n\n---\n\n1 [https://www.genome.gov/about-genomics/fact-sheets/Sequencing-Human-Genome-cost](https://www.genome.gov/about-genomics/fact-sheets/Sequencing-Human-Genome-cost)\n2 Coined by Gordon Moore of Intel in 1965, Moore’s Law predicted that transistor density, roughly equating to compute performance, would roughly double every two years. This was later revised in some estimates to 18 months. Over ten years, Moore’s law predicts roughly a 2^5 = 32X increase in performance – less than the ~50-fold decrease in sequencing costs. See [chart here](https://www.genome.gov/sites/default/files/inline-images/2021_Sequencing_cost_per_Human_Genome.jpg).\n3 This included features like separate queues, pre-emption policies, application profiles, and weighted fairshare algorithms.\n", + "images": [ + "/img/nextflow_ten_years_graphic.jpg" + ] + }, + { + "slug": "2023/selecting-the-right-storage-architecture-for-your-nextflow-pipelines", + "title": "Selecting the right storage architecture for your Nextflow pipelines", + "date": "2023-05-04T00:00:00.000Z", + "content": "\n_In this article we present the various storage solutions supported by Nextflow including on-prem and cloud file systems, parallel file systems, and cloud object stores. We also discuss Fusion file system 2.0, a new high-performance file system that can help simplify configuration, improve throughput, and reduce costs in the cloud._\n\nAt one time, selecting a file system for distributed workloads was straightforward. Through the 1990s, the Network File System (NFS), developed by Sun Microsystems in 1984, was pretty much the only game in town. It was part of every UNIX distribution, and it presented a standard [POSIX interface](https://pubs.opengroup.org/onlinepubs/9699919799/nframe.html), meaning that applications could read and write data without modification. Dedicated NFS servers and NAS filers became the norm in most clustered computing environments.\n\nFor organizations that outgrew the capabilities of NFS, other POSIX file systems emerged. These included parallel file systems such as [Lustre](https://www.lustre.org/), [PVFS](https://www.anl.gov/mcs/pvfs-parallel-virtual-file-system), [OpenZFS](https://openzfs.org/wiki/Main_Page), [BeeGFS](https://www.beegfs.io/c/), and [IBM Spectrum Scale](https://www.ibm.com/products/storage-scale-system) (formerly GPFS). Parallel file systems can support thousands of compute clients and deliver more than a TB/sec combined throughput, however, they are expensive, and can be complex to deploy and manage. While some parallel file systems work with standard Ethernet, most rely on specialized low-latency fabrics such as Intel® Omni-Path Architecture (OPA) or InfiniBand. Because of this, these file systems are typically found in only the largest HPC data centers.\n\n## Cloud changes everything\n\nWith the launch of [Amazon S3](https://aws.amazon.com/s3/) in 2006, new choices began to emerge. Rather than being a traditional file system, S3 is an object store accessible through a web API. S3 abandoned traditional ideas around hierarchical file systems. 
Instead, it presented a simple programmatic interface and CLI for storing and retrieving binary objects.\n\nObject stores are a good fit for cloud services because they are simple and scalable to multiple petabytes of storage. Rather than relying on central metadata that presents a bottleneck, metadata is stored with each object. All operations are atomic, so there is no need for complex POSIX-style file-locking mechanisms that add complexity to the design. Developers interact with object stores using simple calls like [PutObject](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObject.html) (store an object in a bucket in return for a key) and [GetObject](https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObject.html) (retrieve a binary object, given a key).\n\nThis simple approach was ideal for internet-scale applications. It was also much less expensive than traditional file systems. As a result, S3 usage grew rapidly. Similar object stores quickly emerged, including Microsoft [Azure Blob Storage](https://azure.microsoft.com/en-ca/products/storage/blobs/), [Open Stack Swift](https://wiki.openstack.org/wiki/Swift), and [Google Cloud Storage](https://cloud.google.com/storage/), released in 2010.\n\n## Cloud object stores vs. shared file systems\n\nObject stores are attractive because they are reliable, scalable, and cost-effective. They are frequently used to store large amounts of data that are accessed infrequently. Examples include archives, images, raw video footage, or in the case of bioinformatics applications, libraries of biological samples or reference genomes. Object stores provide near-continuous availability by spreading data replicas across cloud availability zones (AZs). AWS claims theoretical data availability of up to 99.999999999% (11 9's) – a level of availability so high that it does not even register on most [downtime calculators](https://availability.sre.xyz/)!\n\nBecause they support both near-line and cold storage, object stores are sometimes referred to as \"cheap and deep.\" Based on current [S3 pricing](https://aws.amazon.com/s3/pricing), the going rate for data storage is USD 0.023 per GB for the first 50 TB of data. Users can \"pay as they go\" — spinning up S3 storage buckets and storing arbitrary amounts of data for as long as they choose. Some high-level differences between object stores and traditional file systems are summarized below.\n\n
| | Cloud object stores | Traditional file systems |\n| --- | --- | --- |\n| Interface / access protocol | HTTP-based API | POSIX interface |\n| Cost | $ | $$$ |\n| Scalability / capacity | Practically unlimited | Limited |\n| Reliability / availability | Extremely high | Varies |\n| Performance | Typically lower | Varies |\n| Support for existing application | NO | YES |
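\n\nTo make the comparison above concrete, here is a minimal sketch of what this difference looks like from a pipeline author's point of view (the bucket name and paths below are hypothetical). As explained in the *Data handling in Nextflow* section below, Nextflow resolves either location transparently:\n\n```groovy\n// The same pipeline code works whether params.data points at a local\n// POSIX path or at a cloud object store URI (hypothetical locations).\nparams.data = 's3://my-bucket/data'    // or '/my-shared-filesystem/data'\n\nworkflow {\n    // Nextflow stages the matching files for each task behind the scenes\n    reads_ch = Channel.fromPath(params.data + '/*.fastq.gz')\n    reads_ch.view()\n}\n```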
\n\nThe downside of object storage is that the vast majority of applications are written to work with POSIX file systems. As a result, applications seldom interact directly with object stores. A common practice is to copy data from an object store, perform calculations locally on a cluster node, and write results back to the object store for long-term storage.\n\n## Data handling in Nextflow\n\nUnlike older pipeline orchestrators, Nextflow was built with cloud object stores in mind. Depending on the cloud where pipelines run, Nextflow manages cloud credentials and allows users to provide a path to shared data. This can be a shared file system such as `/my-shared-filesystem/data` or a cloud object store e.g. `s3://my-bucket/data/`.\n\n**Nextflow is exceptionally versatile when it comes to data handling, and can support almost any file system or object store.** Internally, Nextflow uses [executors](https://nextflow.io/docs/latest/executor.html) implemented as plug-ins to insulate pipeline code from underlying compute and storage environments. This enables pipelines to run without modification across multiple clouds regardless of the underlying storage technology.\n\nSuppose an S3 bucket is specified as a location for shared data during pipeline execution. In that case, aided by the [nf-amazon](https://github.com/nextflow-io/nextflow/tree/master/plugins/nf-amazon) plug-in, Nextflow transparently copies data from the S3 bucket to a file system on a cloud instance. Containerized applications mount the local file system and read and write data directly. Once processing is complete, Nextflow copies data to the shared bucket to be available for the next task. All of this is completely transparent to the pipeline and applications. The same plug-in-based approach is used for other cloud object stores such as Azure BLOBs and Google Cloud Storage.\n\n## The Nextflow scratch directive\n\nThe idea of staging data from shared repositories to a local disk, as described above, is not new. A common practice with HPC clusters when using NFS file systems is to use local \"scratch\" storage.\n\nA common problem with shared NFS file systems is that they can be relatively slow — especially when there are multiple clients. File systems introduce latency, have limited IO capacity, and are prone to problems such as “hot spots” and bandwidth limitations when multiple clients read and write files in the same directory.\n\nTo avoid bottlenecks, data is often copied from an NFS filer to local scratch storage for processing. Depending on data volumes, users often use fast solid-state drives or [RAM disks](https://www.mvps.net/docs/how-to-mount-the-physical-memory-from-a-linux-system-as-a-partition/) for scratch storage to accelerate processing.\n\nNextflow automates this data handling pattern with built-in support for a [scratch](https://nextflow.io/docs/latest/process.html?highlight=scratch#scratch) directive that can be enabled or disabled per process. If scratch is enabled, data is automatically copied to a designated local scratch device prior to processing.\n\nWhen high-performance file systems such as Lustre or Spectrum Scale are available, the question of whether to use scratch storage becomes more complicated. Depending on the file system and interconnect, parallel file systems performance can sometimes exceed that of local disk. 
In these cases, customers may set scratch to false and perform I/O directly on the parallel file system.\n\nResults will vary depending on the performance of the shared file system, the speed of local scratch storage, and the amount of shared data to be shuttled back and forth. Users will want to experiment to determine whether enabling scratch benefits pipelines performance.\n\n## Multiple storage options for Nextflow users\n\nStorage solutions used with Nextflow can be grouped into five categories as described below:\n\n- Traditional file systems\n- Cloud object stores\n- Cloud file systems\n- High-performance cloud file systems\n- Fusion file system v2.0\n\nThe optimal choice will depend on your environment and the nature of your applications and compute environments.\n\n**Traditional file systems** — These are file systems typically deployed on-premises that present a POSIX interface. NFS is the most popular choice, but some users may use high-performance parallel file systems. Storage vendors often package their offerings as appliances, making them easier to deploy and manage. Solutions common in on-prem HPC environments include [Network Appliance](https://www.netapp.com/), [Data Direct Networks](https://www.ddn.com/) (DDN), [HPE Cray ClusterStor](https://www.hpe.com/psnow/doc/a00062172enw), and [IBM Storage Scale](https://www.ibm.com/products/storage-scale-system). While customers can deploy self-managed NFS or parallel file systems in the cloud, most don’t bother with this in practice. There are generally better solutions available in the cloud.\n\n**Cloud object stores** — In the cloud, object stores tend to be the most popular solution among Nextflow users. Although object stores don’t present a POSIX interface, they are inexpensive, easy to configure, and scale practically without limit. Depending on performance, access, and retention requirements, customers can purchase different object storage tiers at different price points. Popular cloud object stores include Amazon S3, Azure BLOBs, and Google Cloud Storage. As pipelines execute, the Nextflow executors described above manage data transfers to and from cloud object storage automatically. One drawback is that because of the need to copy data to and from the object store for every process, performance may be lower than a fast shared file system.\n\n**Cloud file systems** — Often, it is desirable to have a shared file NFS system. However, these environments can be tedious to deploy and manage in the cloud. Recognizing this, most cloud providers offer cloud file systems that combine some of the best properties of traditional file systems and object stores. These file systems present a POSIX interface and are accessible via SMB and NFS file-sharing protocols. Like object stores, they are easy to deploy and scalable on demand. Examples include [Amazon EFS](https://aws.amazon.com/efs/), [Azure Files](https://azure.microsoft.com/en-us/products/storage/files/), and [Google Cloud Filestore](https://cloud.google.com/filestore). These file systems are described as \"serverless\" and \"elastic\" because there are no servers to manage, and capacity scales automatically.\n\nComparing price and performance can be tricky because cloud file systems are highly configurable. 
For example, [Amazon EFS](https://aws.amazon.com/efs/pricing/) is available in [four storage classes](https://docs.aws.amazon.com/efs/latest/ug/storage-classes.html) – Amazon EFS Standard, Amazon EFS Standard-IA, and two One Zone storage classes – Amazon EFS One Zone and Amazon EFS One Zone-IA. Similarly, Azure Files is configurable with [four different redundancy options](https://azure.microsoft.com/en-us/pricing/details/storage/files/), and different billing models apply depending on the offer selected. To provide a comparison, Amazon EFS Standard costs $0.08 /GB-Mo in the US East region, which is ~4x more expensive than Amazon S3.\n\nFrom the perspective of Nextflow users, using Amazon EFS and similar cloud file systems is the same as using a local NFS system. Nextflow users must ensure that their cloud instances mount the NFS share, so there is slightly more management overhead than using an S3 bucket. Nextflow users and administrators can experiment with the scratch directive governing whether Nextflow stages data in a local scratch area or reads and writes directly to the shared file system.\n\nCloud file systems suffer from some of the same limitations as on-prem NFS file systems. They often don’t scale efficiently, and performance is limited by network bandwidth. Also, depending on the pipeline, users may need to stage data to the shared file system in advance, often by copying data from an object store used for long term storage.\n\nFor [Nextflow Tower](https://cloud.tower.nf/) users, there is a convenient integration with Amazon EFS. Tower Cloud users can have an Amazon EFS instance created for them automatically via Tower Forge, or they can leverage an existing EFS instance in their compute environment. In either case, Tower ensures that the EFS share is available to compute hosts in the AWS Batch environment, reducing configuration requirements.\n\n**Cloud high-performance file systems** — For customers that need high levels of performance in the cloud, Amazon offers Amazon FSx. Amazon FSx comes in different flavors, including NetApp ONTAP, OpenZFS, Windows File Server, and Lustre. In HPC circles, [FSx for Lustre](https://aws.amazon.com/fsx/lustre/) is most popular delivering sub-millisecond latency, up to 1 TB/sec maximum throughput per file system, and millions of IOPs. Some Nextflow users with data bottlenecks use FSx for Lustre, but it is more difficult to configure and manage than Amazon S3.\n\nLike Amazon EFS, FSx for Lustre is a fully-managed, serverless, elastic file system. Amazon FSx for Lustre is configurable, depending on customer requirements. For example, customers with latency-sensitive applications can deploy FSx cluster nodes with SSD drives. Customers concerned with cost and throughput can select standard hard drives (HDD). HDD-based FSx for Lustre clusters can be optionally configured with an SSD-based cache to accelerate performance. Customers also choose between different persistent file system options and a scratch file system option. Another factor to remember is that with parallel file systems, bandwidth scales with capacity. If you deploy a Lustre file system that is too small, you may be disappointed in the performance.\n\nFSx for Lustre persistent file systems ranges from 125 to 1,000 MB/s/TiB at [prices](https://aws.amazon.com/fsx/lustre/pricing/) ranging from **$0.145** to **$0.600** per GB month. Amazon also offers a lower-cost scratch FSx for Lustre file systems (not to be confused with the scratch directive in Nextflow). 
At this tier, FSx for Lustre does not replicate data across availability zones, so it is suited to short-term data storage. Scratch FSx for Lustre storage delivers **200 MB/s/TiB**, costing **$0.140** per GB month. This is **~75%** more expensive than Amazon EFS (Standard) and **~6x** the cost of standard S3 storage. Persistent FSx for Lustre file systems configured to deliver **1,000 MB/s/TiB** can be up to **~26x** the price of standard S3 object storage!\n\n**Hybrid Cloud file systems** — In addition to the solutions described above, there are other solutions that combine the best of object stores and high-performance parallel file systems. An example is [WekaFS™](https://www.weka.io/) from WEKA. WekaFS is used by several Nextflow users and is deployable on-premises or across your choice cloud platforms. WekaFS is attractive because it provides multi-protocol access to the same data (POSIX, S3, NFS, SMB) while presenting a common namespace between on-prem and cloud resident compute environments. Weka delivers the performance benefits of a high-performance parallel file system and optionally uses cloud object storage as a backing store for file system data to help reduce costs.\n\nFrom a Nextflow perspective, WekaFS behaves like any other shared file system. As such, Nextflow and Tower have no specific integration with WEKA. Nextflow users will need to deploy and manage WekaFS themselves making the environment more complex to setup and manage. However, the flexibility and performance provided by a hybrid cloud file system makes this worthwhile for many organizations.\n\n**Fusion file system 2.0** — Fusion file system is a solution developed by [Seqera Labs](https://seqera.io/fusion) that aims to bridge the gap between cloud-native storage and data analysis workflows. The solution implements a thin client that allows pipeline jobs to access object storage using a standard POSIX interface, thus simplifying and speeding up most operations.\n\nThe advantage of the Fusion file system is that there is no need to copy data between S3 and local storage. The Fusion file system driver accesses and manipulates files in Amazon S3 directly. You can learn more about the Fusion file system and how it works in the whitepaper [Breakthrough performance and cost-efficiency with the new Fusion file system](https://seqera.io/whitepapers/breakthrough-performance-and-cost-efficiency-with-the-new-fusion-file-system/).\n\nFor sites struggling with performance and scalability issues on shared file systems or object storage, the Fusion file system offers several advantages. [Benchmarks conducted](https://seqera.io/whitepapers/breakthrough-performance-and-cost-efficiency-with-the-new-fusion-file-system/) by Seqera Labs have shown that, in some cases, **Fusion can deliver performance on par with Lustre but at a much lower cost.** Fusion is also significantly easier to configure and manage and can result in lower costs for both compute and storage resources.\n\n## Comparing the alternatives\n\nA summary of storage options is presented in the table below:\n\n
| | NFS, Lustre, Spectrum Scale | Amazon S3 | Azure BLOB storage | Google Cloud Storage | Amazon EFS | Amazon FSx for Lustre | Azure Files | Fusion file system 2.0 |\n| --- | --- | --- | --- | --- | --- | --- | --- | --- |\n| Category | Traditional file systems | Cloud object storage | Cloud object storage | Cloud object storage | Cloud file systems | Cloud file systems | Cloud file systems | Fusion FS |\n| Deployment model | Manual | Serverless | Serverless | Serverless | Serverless | Serverless | Serverless | Serverless |\n| Access model | POSIX | Object | Object | Object | POSIX | POSIX | POSIX | POSIX |\n| Clouds supported | On-prem, any cloud | AWS only | Azure only | GCP only | AWS only | AWS only | Azure only | AWS, GCP and Azure ¹ |\n| Requires block storage | Yes | Optional | Optional | Optional | Optional | No | Optional | No |\n| Relative cost | $$ | $ | $ | $ | $$ | $$$ | $$ | $ |\n| Nextflow plugins | - | nf-amazon | nf-azure | nf-google | - | - | - | nf-amazon |\n| Tower support | Yes | Yes, existing buckets | Yes, existing BLOB container | Yes, existing cloud storage bucket | Yes, creates EFS instances | Yes, creates FSx for Lustre instances | File system created manually | Yes, fully automated |\n| Dependencies | Externally configured | | | | | | | Wave, Amazon S3 |\n| Cost model | Fixed price on-prem, instance + block storage costs | GB per month | GB per month | GB per month | Multiple factors | Multiple factors | Multiple factors | GB per month (uses S3) |\n| Level of configuration effort (when used with Tower) | High | Low | Low | Low | Medium (low with Tower) | High (easier with Tower) | Medium | Low |\n| Works best with | Any on-prem cluster manager (LSF, Slurm, etc.) | AWS Batch | Azure Batch | Google Cloud Batch | AWS Batch | AWS Batch | Azure Batch | AWS Batch, Amazon EKS, Azure Batch, Google Cloud Batch ¹ |
\n\n## So what’s the bottom line?\n\nThe choice of storage solution depends on several factors. Object stores like Amazon S3 are popular because they are convenient and inexpensive. However, depending on data access patterns, and the amount of data to be staged in advance, file systems such as EFS, Azure Files or FSx for Lustre can also be a good alternative.\n\nFor many Nextflow users, Fusion file system will be a better option since it offers performance comparable to a high-performance file system at the cost of cloud object storage. Fusion is also dramatically easier to deploy and manage. [Adding Fusion support](https://nextflow.io/docs/latest/fusion.html) is just a matter of adding a few lines to the `nextflow.config` file.\n\nWhere workloads run is also an important consideration. For example, on-premises clusters will typically use whatever shared file system is available locally. When operating in the cloud, you can choose whether to use cloud file systems, object stores, high-performance file systems, Fusion FS, or hybrid cloud solutions such as Weka.\n\nStill unsure what storage solution will best meet your needs? Consider joining our community at [nextflow.slack.com](https://nextflow.slack.com/). You can engage with others, post technical questions, and learn more about the pros and cons of the storage solutions described above.\n",
    "images": []
  },
  {
    "slug": "2023/the-state-of-kubernetes-in-nextflow",
    "title": "The State of Kubernetes in Nextflow",
    "date": "2023-03-10T00:00:00.000Z",
    "content": "\nHi, my name is Ben, and I’m a software engineer at Seqera Labs. I joined Seqera in November 2021 after finishing my Ph.D. at Clemson University. I work on a number of things at Seqera, but my primary role is that of a Nextflow core contributor.\n\nI have run Nextflow just about everywhere, from my laptop to my university cluster to the cloud and Kubernetes. I have written Nextflow pipelines for bioinformatics and machine learning, and I even wrote a pipeline to run other Nextflow pipelines for my [dissertation research](https://github.com/bentsherman/tesseract). While I tried to avoid contributing code to Nextflow as a student (I had enough work already), now I get to work on it full-time!\n\nWhich brings me to the topic of this post: Nextflow and Kubernetes.\n\nOne of my first contributions was a “[best practices guide](https://github.com/seqeralabs/nf-k8s-best-practices)” for running Nextflow on Kubernetes. The guide has helped many people, but for me it provided a map for how to improve K8s support in Nextflow. You see, Nextflow was originally built for HPC, while Kubernetes and cloud batch executors were added later. While Nextflow’s extensible design makes adding features like new executors relatively easy, support for Kubernetes is still a bit spotty.\n\nSo, I set out to make Nextflow + K8s great! Over the past year, in collaboration with talented members of the Nextflow community, we have added all sorts of enhancements to the K8s executor. In this blog post, I’d like to show off all of these improvements in one place. So here we go!\n\n## New features\n\n### Submit tasks as Kubernetes Jobs\n\n_New in version 22.05.0-edge._\n\nNextflow submits tasks as Pods by default, which is sort of a bad practice. In Kubernetes, every Pod should be created through a controller (e.g., Deployment, Job, StatefulSet) so that Pod failures can be handled automatically. For Nextflow, the appropriate controller is a K8s Job.
Using Jobs instead of Pods directly has greatly improved the stability of large Nextflow runs on Kubernetes, and will likely become the default behavior in a future version.\n\nYou can enable this feature with the following configuration option:\n\n```groovy\nk8s.computeResourceType = 'Job'\n```\n\nCredit goes to @xhejtman from CERIT-SC for leading the charge on this one!\n\n### Object storage as the work directory\n\n_New in version 22.10.0._\n\nOne of the most difficult aspects of using Nextflow with Kubernetes is that Nextflow needs a `PersistentVolumeClaim` (PVC) to store the shared work directory, which also means that Nextflow itself must run inside the Kubernetes cluster in order to access this storage. While the `kuberun` command attempts to automate this process, it has never been reliable enough for production usage.\n\nAt the Nextflow Summit in October 2022, we introduced [Fusion](https://seqera.io/fusion/), a file system driver that can mount S3 buckets as POSIX-like directories. The combination of Fusion and [Wave](https://seqera.io/wave/) (a just-in-time container provisioning service) enables you to have your work directory in S3-compatible storage. See the [Wave blog post](https://nextflow.io/blog/2022/rethinking-containers-for-cloud-native-pipelines.html) for an explanation of how it works — it’s pretty cool.\n\nThis functionality is useful in general, but it is especially useful for Kubernetes, because (1) you don’t need to provision your own PVC and (2) you can run Nextflow on Kubernetes without using `kuberun` or creating your own submitter Pod.\n\nThis feature currently supports AWS S3 on Elastic Kubernetes Service (EKS) clusters and Google Cloud Storage on Google Kubernetes Engine (GKE) clusters.\n\nCheck out [this article](https://seqera.io/blog/deploying-nextflow-on-amazon-eks/) over at the Seqera blog for an in-depth guide to running Nextflow (with Fusion) on Amazon EKS.\n\n### No CPU limits by default\n\n_New in version 22.11.0-edge._\n\nWe have changed the default behavior of CPU requests for the K8s executor. Before, a single number in a Nextflow resource request (e.g., `cpus = 8`) was interpreted as both a “request” (lower bound) and a “limit” (upper bound) in the Pod definition. However, setting an explicit CPU limit in K8s is increasingly seen as an anti-pattern (see [this blog post](https://home.robusta.dev/blog/stop-using-cpu-limits) for an explanation). The bottom line is that it is better to specify a request without a limit, because that will ensure that each task has the CPU time it requested, while also allowing the task to use more CPU time if it is available. Unlike other resources like memory and disk, CPU time is compressible — it can be given and taken away without killing the application.\n\nWe have also updated the Docker integration in Nextflow to use [CPU shares](https://www.batey.info/cgroup-cpu-shares-for-docker.html), which is the mechanism used by [Kubernetes](https://www.batey.info/cgroup-cpu-shares-for-kubernetes.html) and [AWS Batch](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html#container_definitions) under the hood to define expandable CPU requests. These changes make the behavior of CPU requests in Nextflow much more consistent across executors.\n\n### CSI ephemeral volumes\n\n_New in version 22.11.0-edge._\n\nIn Kubernetes, volumes are used to provide storage and data (e.g., configuration and secrets) to Pods. 
Persistent volumes exist independently of Pods and can be mounted and unmounted over time, while ephemeral volumes are attached to a single Pod and are created and destroyed alongside it. While Nextflow can use any persistent volume through a `PersistentVolumeClaim`, ephemeral volume types are supported on a case-by-case basis. For example, `ConfigMaps` and `Secrets` are two ephemeral volume types that are already supported by Nextflow.\n\nNextflow now also supports [CSI ephemeral volumes](https://kubernetes.io/docs/concepts/storage/ephemeral-volumes/#csi-ephemeral-volumes). CSI stands for Container Storage Interface, and it is a standard used by Kubernetes to support third-party storage systems as volumes. The most common example of a CSI ephemeral volume is [Secrets Store](https://secrets-store-csi-driver.sigs.k8s.io/getting-started/usage.html), which is used to inject secrets from a remote vault such as [Hashicorp Vault](https://www.vaultproject.io/) or [Azure Key Vault](https://azure.microsoft.com/en-us/products/key-vault/).\n\n_Note: CSI persistent volumes can already be used in Nextflow through a `PersistentVolumeClaim`._\n\n### Local disk storage for tasks\n\n_New in version 22.11.0-edge._\n\nNextflow uses a shared work directory to coordinate tasks. Each task receives its own subdirectory with the required input files, and each task is expected to write its output files to this directory. As a workflow scales to thousands of concurrent tasks, this shared storage becomes a major performance bottleneck. We are investigating a few different ways to overcome this challenge.\n\nOne of the tools we have to reduce I/O pressure on the shared work directory is to make tasks use local storage. For example, if a task takes input file A, produces an intermediate file B, then produces an output file C, the file B can be written to local storage instead of shared storage because it isn’t a required output file. Or, if the task writes an output file line by line instead of all at once at the end, it can stream the output to local storage first and then copy the file to shared storage.\n\nWhile it is far from a comprehensive solution, local storage can reduce I/O congestion in some cases. Provisioning local storage for every task looks different on every platform, and in some cases it is not supported. Fortunately, Kubernetes provides a seamless interface for local storage, and now Nextflow supports it as well.\n\nTo provision local storage for tasks, you must (1) add an `emptyDir` volume to your Pod options, (2) request disk storage via the `disk` directive, and (3) direct tasks to use the local storage with the `scratch` directive. 
Here’s an example:\n\n```groovy\nprocess {\n disk = 10.GB\n pod = [ [emptyDir: [:], mountPath: '/scratch'] ]\n scratch = '/scratch'\n}\n```\n\nAs a bonus, you can also provision an `emptyDir` backed by memory:\n\n```groovy\nprocess {\n memory = 10.GB\n pod = [ [emptyDir: [medium: 'Memory'], mountPath: '/scratch'] ]\n scratch = '/scratch'\n}\n```\n\nNextflow maps the `disk` directive to the [`ephemeral-storage`](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#setting-requests-and-limits-for-local-ephemeral-storage) resource request, which is provided by the [`emptyDir`](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir) volume (another ephemeral volume type).\n\n### Miscellaneous\n\nCheck the [release notes](https://github.com/nextflow-io/nextflow/releases) or the list of [K8s pull requests](https://github.com/nextflow-io/nextflow/pulls?q=is%3Apr+label%3Aplatform%2Fk8s) on Github to see what else has been added. Here are some notable improvements from the past year:\n\n- Support Pod `affinity` ([640cbed4](https://github.com/nextflow-io/nextflow/commit/640cbed4813a34887d4dc10f87fa2e4aa524d055))\n- Support Pod `automountServiceAccountToken` ([1b5908e4](https://github.com/nextflow-io/nextflow/commit/1b5908e4cbbb79f93be2889eec3acfa6242068a1))\n- Support Pod `priorityClassName` ([51650f8c](https://github.com/nextflow-io/nextflow/commit/51650f8c411ba40f0966031035e7a47c036f542e))\n- Support Pod `tolerations` ([7f7cdadc](https://github.com/nextflow-io/nextflow/commit/7f7cdadc6a36d0fb99ef125f6c6f89bfca8ca52e))\n- Support `time` directive via `activeDeadlineSeconds` ([2b6f70a8](https://github.com/nextflow-io/nextflow/commit/2b6f70a8fa55b993fa48755f7a47ac9e1b584e48))\n- Improved control over error conditions ([064f9bc4](https://github.com/nextflow-io/nextflow/commit/064f9bc4), [58be2128](https://github.com/nextflow-io/nextflow/commit/58be2128), [d86ddc36](https://github.com/nextflow-io/nextflow/commit/d86ddc36))\n- Improved support for labels and queue annotation ([9951fcd9](https://github.com/nextflow-io/nextflow/commit/9951fcd9), [4df8c8d2](https://github.com/nextflow-io/nextflow/commit/4df8c8d2))\n- Add support for AWS IAM role for Service Accounts ([62df42c3](https://github.com/nextflow-io/nextflow/commit/62df42c3), [c3364d0f](https://github.com/nextflow-io/nextflow/commit/c3364d0f), [b3d33e3b](https://github.com/nextflow-io/nextflow/commit/b3d33e3b))\n\n## Beyond Kubernetes\n\nWe’ve added tons of value to Nextflow over the past year – not just in terms of Kubernetes support, but also in terms of performance, stability, and integrations with other technologies – and we aren’t stopping any time soon! We have greater ambitions still for Nextflow, and I for one am looking forward to what we will accomplish together. As always, keep an eye on this blog, as well as the [Nextflow GitHub](https://github.com/nextflow-io/nextflow) page, for the latest updates to Nextflow.\n", + "images": [] + }, + { + "slug": "2024/addressing-bioinformatics-core-challenges", + "title": "Addressing Bioinformatics Core Challenges with Nextflow and nf-core", + "date": "2024-09-11T00:00:00.000Z", + "content": "\nI was honored to be invited to the ISMB 2024 congress to speak at the session organised by the COSI (Community of Special Interest) of Bioinformatics Cores. 
This session brought together bioinformatics professionals from around the world who manage bioinformatics facilities in different institutions to share experiences, discuss challenges, and explore solutions for managing and analyzing large-scale biological data. In this session, I had the opportunity to introduce Nextflow, and discuss how its adoption can help bioinformatics cores to address some of the most common challenges they face. From managing complex pipelines to optimizing resource utilization, Nextflow offers a range of benefits that can streamline workflows and improve productivity. In this blog, I'll summarize my talk and share insights on how Nextflow can help overcome some of those challenges, including meeting the needs of a wide range of users or customers, automate reporting, customising pipelines and training.\n\n### Challenge 1: running multiple services\n\n_Challenge description: “I have a wide range of stakeholders, and my pipelines need to address different needs in multiple scientific domains”_\n\nOne of the biggest challenges faced by bioinformatics cores is catering to a diverse range of users with varying applications. On one hand, one might need to run analyses for researchers focused on cancer or human genetics. On the other hand, one may also need to support scientists working with mass spectrometry or metagenomics. Fortunately, the nf-core community has made it relatively easy to tackle these diverse needs with their curated pipelines. These pipelines are ready to use, covering a broad spectrum of applications, from genomics and metagenomics to immunology and mass spectrometry. In one of my slides I showed a non-exhaustive list, which spans genomics, metagenomics, immunology, mass spec, and more: one can find best-practice pipelines for almost any bioinformatics application imaginable, including emerging areas like imaging and spatial-omics. By leveraging this framework, one can not only tap into the expertise of the pipeline developers but also engage with them to discuss specific needs and requirements. This collaborative approach can significantly ease the deployment of a workflow, allowing the user to focus on high-priority tasks while ensuring that the analyses are always up to date and aligned with current best practices.\n\n### Challenge 2: customising applications\n\n_Challenge description: “We often need to customise our applications and pipeline, to meet specific in-house needs of our users”_\n\nWhile ready-to-use applications are a huge advantage, there are times when customisation is necessary. Perhaps the standard pipeline that works for most users doesn't quite meet the specific needs of a facilities user or customer. Fortunately, the nf-core community has got these cases covered. With over 1,300 modules at everyone’s disposal, one can easily compose their own pipeline using the nf-core components and tooling. Should that not be enough though, one can even create a pipeline from scratch using nf-core tools. For instance, one can run a simple command like “nf-core create” followed by the name of the pipeline, and voilà! The software package will create a complete skeleton for the pipeline, filled with pre-compiled code and placeholders to ease customisation. This process is incredibly quick, as I demonstrated in a video clip during the talk, where a pipeline skeleton was created in just a few moments.\n\nOf course, customisation isn't limited to pipelines. It also applies to containers, which are a crucial enabler of portability. 
When it comes to containers, Nextflow users have two options: an easy way and a more advanced approach. The easy way involves using Seqera Containers, a platform that allows anyone to compose a container using tools from bioconda, pypi, and conda-forge. There is no need to log in: just select the tools, and the URL of your container will be made available in no time. One can build containers for either Docker or Singularity, and for different platforms (amd64 or arm64).\n\nIf one is looking for more control, they can use Wave as a command-line tool. This is a powerful tool that can act as an intermediary between the user and a container registry. Wave builds containers on the fly, allowing anyone to pass a `wave` build command as a command substitution inside a `docker run` command. It’s incredibly fast, and builds containers from conda packages in a matter of seconds. Wave, which is also the engine behind Seqera Containers, can be extremely handy for other operations like container augmentation. This feature enables a user to add new layers to existing containers without having to rebuild them, thanks to Docker’s layer-based architecture. One can simply create a folder where configuration files or executable scripts are located, pass the folder to Wave, which will add it as a new layer, and get the URL of the augmented container on the fly.\n\n### Challenge 3: Reporting\n\n_Challenge description: “I need to deliver a clear report of the analysis results, in a format that is accessible and can be used for publication purposes by my users”_\n\nReporting is a crucial aspect of any bioinformatics pipeline, and, as with customisation, Nextflow offers different ways to approach it, suitable for different levels of expertise. The most straightforward solution involves running MultiQC, a tool that collects the output and logs of a wide range of software in a pipeline and generates a nicely formatted HTML report. This is a great option if one wants a quick and easy way to get a summary of their pipeline’s results. MultiQC is a widely used tool that supports a huge (and growing) list of bioinformatics tools and file formats, making it a great choice for many use cases.\n\nHowever, if the developer needs more control over the reporting process or wants to create a custom report that meets some specific needs, it is entirely possible to engineer the reports from scratch. This involves collecting the outputs from various processes in the pipeline and passing them as an input to a process that runs an R Markdown or Quarto script, as in the sketch below.
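As a minimal sketch of this pattern (the process name, the `data_dir` parameter, and the `assets/report.qmd` template are made up for illustration, and Quarto is assumed to be available in the task environment), such a reporting step could look like the following:\n\n```groovy\nprocess RENDER_REPORT {\n    publishDir 'results/report', mode: 'copy'\n\n    input:\n    path 'report.qmd'                  // Quarto template, staged under a fixed name\n    path data_files, stageAs: 'data/*' // collected outputs from upstream processes\n\n    output:\n    path 'report.html'\n\n    script:\n    '''\n    quarto render report.qmd -P data_dir:data --output report.html\n    '''\n}\n```\n\nIn the workflow body one would typically pass the template together with something like a `collect()`-ed channel of upstream outputs; the staged files end up under `data/` and the template decides how to summarise them. The nf-core notebook modules mentioned in the next paragraph provide a more complete, maintained implementation of the same idea.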
R Markdown and Quarto are popular tools for creating dynamic documents that can be parameterised, allowing anyone to customize the content and the layout of a report dynamically.\nBy using this approach, one can create a report that is tailored to your specific needs, including the types of plots and visualizations they want to include, the formatting and layouting, branding, and anything specific one might want to highlight.\n\nTo follow this approach, the user can either create their own customised module, or re-use one of the available notebooks modules in the nf-core repository (quarto [here](https://github.com/nf-core/modules/tree/master/modules/nf-core/quartonotebook), or jupyter [here](https://github.com/nf-core/modules/tree/master/modules/nf-core/jupyternotebook)).\n\n### Challenge 4: Monitoring\n\n_Challenge description: “I need to be able to estimate and optimise runtimes as well as costs of my pipelines, fitting our cost model”_\n\nMonitoring is a critical aspect of pipeline management, and Nextflow provides a robust set of tools to help you track and optimise a pipeline's performance. At its core, monitoring involves tracking the execution of the pipeline to ensure that it's running efficiently and effectively. But it's not just about knowing how long a pipeline takes to run or how much it costs - it's also about making sure each process in the pipeline is using the requested resources efficiently.\nWith Nextflow, the user can track the resources used by each process in your pipeline, including CPU, memory, and disk usage and compare them visually with the resources requested in the pipeline configuration and reserved by each job. This information allows the user to identify bottlenecks and areas for optimisation, so one can fine-tune their pipeline for a better resource consumption. For example, if the user notices that one process is using a disproportionate amount of memory, they can adjust the configuration to better match the actual usage.\n\nBut monitoring isn't just about optimising a pipeline's performance - it's also about reducing the environmental impact where possible. A recently developed Nextflow plugin allows to track the carbon footprint of a pipeline, including the energy consumption and greenhouse gas emissions associated with running that pipeline. This information allows one to make informed decisions about their environmental impact, and gaining better awareness or even adopting greener strategies to computing.\n\nOne of the key benefits of Nextflow’s monitoring system is its flexibility. The user can either use the built-in html reports for trace and pipeline execution, or could monitor a run live by connecting to Seqera Platform and visualising its progress on a graphical interface in real time. More expert or creative users could use the trace file produced by a Nextflow execution, to create their own metrics and visualisations.\n\n### Challenge 5: User accessibility\n\n_Challenge description: “I could balance workloads better, by giving users a certain level of autonomy in running some of my pipelines”_\n\nUser accessibility is a crucial aspect of pipeline development, as it enables users with varying levels of bioinformatics experience to run complex pipelines with ease. One of the advantages of Nextflow, is that a developer can create pipelines that are not only robust and efficient but also user-friendly. 
Allowing your users to run them with a certain level of autonomy might be a good strategy in a bioinformatics core to decentralise straightforward analyses and invest human resources in more complex projects. Empowering a facility’s users to run specific pipelines independently could be a solution to reduce certain workloads.\n\nThe nf-core template includes a parameters schema, which is captured by the nf-core website to create a graphical interface for parameter configuration of the pipelines hosted under the nf-core organisation on GitHub. This interface allows users to fill in the necessary fields for parameters needed to run a pipeline, and allows even users with minimal experience with bioinformatics or command-line interfaces to quickly set up a run. The user can then simply copy and paste the command generated by the webpage into a terminal, and the pipeline will launch as configured. This approach is ideal for users who are familiar with basic computer tasks and have only minimal familiarity with a terminal.\n\nHowever, for users with even less bioinformatics experience, Nextflow and the nf-core template together offer an even more intuitive solution. The pipeline can be added to the launcher of the Seqera Platform, and one can provide users with a comprehensive and user-friendly interface that allows them to launch pipelines with ease. This platform offers a range of features, including access to datasets created from sample sheets and the ability to launch pipelines on a wide range of cloud environments as well as on on-premises HPC. A simple graphical interface simplifies the entire process. In this way, the Seqera Platform provides a seamless and intuitive experience for users, allowing them to run pipelines without requiring extensive bioinformatics knowledge.\n\n### Challenge 6: Training\n\n_Challenge description: “Training my team and especially onboarding new team members is always challenging and requires documentation and good materials”_\n\nThe final challenge we often face in bioinformatics facilities is training. We all know that training is an ongoing issue, not just because of staff turnover and the need to onboard new recruits, but also because the field is constantly evolving. With new tools, techniques, and technologies emerging all the time, it can be difficult to keep up with the latest developments. However, training is crucial for ensuring that pipelines are robust, efficient, and accurate.\n\nFortunately, there are now many resources available to help with training. The Nextflow training website, for example, has been completely rebuilt recently and now offers a wealth of material suitable for everyone, from beginners to experts. Whether you’re just starting out with Nextflow or are already an experienced user, you’ll find plenty of resources to help you improve your skills. From introductory tutorials to advanced guides, the training website has everything you need to get the most out of this workflow manager.\n\nEveryone can access the material at their own pace, but regular training events have been scheduled during the year. Additionally, there is now a network of Nextflow Ambassadors who often organise local training events across the world. Without making comparisons with other solutions, I can easily say that the steep learning curve to get going with Nextflow is just a myth nowadays.
The quality of the training material, the examples available, the frequency of events in person or online you can attend to, and more importantly a welcoming community of users, make learning Nextflow quite easy.\n\nIn my laboratory, usually in a couple of months bachelor students are reasonably confident with the code and with running pipelines and debugging common issues.\n\n### Conclusions\n\nIn conclusion, the presentation at ISMB has gathered quite some interest because I believe it has shown how Nextflow is a powerful and versatile tool that can help bioinformatics cores address those common challenges everyone has experienced. With its comprehensive tooling, extensive training materials, and active community of users, Nextflow offers a complete package that can help people streamline their workflows and improve their productivity.\nAlthough I might be biased on this, I also believe that by adopting Nextflow one also becomes part of a community of researchers and developers who are passionate about bioinformatics and committed to sharing their knowledge and expertise. Beginners not only will have access to a wealth of resources and tutorials, but more importantly to a supportive network of peers who can offer advice and guidance, and which is really fun to be part of.\n", + "images": [] + }, + { + "slug": "2024/ambassador-second-call", + "title": "Open call for new Nextflow Ambassadors closes June 14", + "date": "2024-05-17T00:00:00.000Z", + "content": "\nNextflow Ambassadors are passionate individuals within the Nextflow community who play a more active role in fostering collaboration, knowledge sharing, and engagement. We launched this program at the Nextflow Summit in Barcelona last year, and it's been a great experience so far, so we've been recruiting more volunteers to expand the program. We’re going to close applications in June with the goal of having new ambassadors start in July, so if you’re interested in becoming an ambassador, now is your chance to apply!\n\n\n\nThe program has been off to a great start, bringing together a diverse group of 46 passionate individuals from around the globe. Our ambassadors have done a great job in their dedication to spreading the word about Nextflow, contributing significantly to the community in numerous ways, including writing insightful content, organizing impactful events, conducting training sessions, leading hackathons, and even contributing to the codebase. Their efforts have not only enhanced the Nextflow ecosystem but have also fostered a stronger, more interconnected global community.\n\nTo support their endeavors, we provide our ambassadors with exclusive swag, essential assets to facilitate their work and funding to attend events where they can promote Nextflow. With the end of the first semester fast approaching, we are excited to officially announce the second cohort of the Nextflow Ambassador program will start in July. If you are passionate about Nextflow and eager to make a meaningful impact, we invite you to [apply](http://seqera.typeform.com/ambassadors/) and join our vibrant community of ambassadors.\n\n**Application Details:**\n\n- **Call for Applications:** Open until June 14 (23h59 any timezone)\n- **Notification of Acceptance:** By June 30\n- **Program Start:** July 2024\n\n
\n \"Ambassadors\n
\n\nWe seek enthusiastic individuals ready to take their contribution to the next level through various initiatives such as content creation, event organization, training, hackathons, and more. As an ambassador, you will receive support and resources to help you succeed in your role, including swag, necessary assets, and funding for event participation.\n\nTo apply, please visit our [Nextflow Ambassador Program Application Page](http://seqera.typeform.com/ambassadors/) and submit your application no later than 23h59 June 14 (any timezone). The form shouldn’t take more than a few minutes to complete. We are eager to welcome a new group of ambassadors who will help support the growth and success of the Nextflow community.\n\nThanks to all our current ambassadors for their incredible work and dedication. We look forward to seeing the new ideas and initiatives that the next cohort of ambassadors will bring to the table. Together, let's continue to build a stronger, more dynamic Nextflow community.\n\n[Apply now and become a part of the Nextflow journey!](http://seqera.typeform.com/ambassadors/)\n\n---\n\nStay tuned for more updates and follow us on our [social](https://twitter.com/nextflowio) [media](https://x.com/seqeralabs) [channels](https://www.linkedin.com/company/seqera/posts/) to keep up with the latest news and events from the Nextflow community.\n", + "images": [ + "/img/ambassadors-hackathon.jpeg" + ] + }, + { + "slug": "2024/better-support-through-community-forum-2024", + "title": "Moving toward better support through the Community forum", + "date": "2024-08-28T00:00:00.000Z", + "content": "\nAs the Nextflow community continues to grow, fostering a space where users can easily find help and share knowledge is more important than ever. In this post, we’ll explore our ongoing efforts to enhance the community forum, transitioning from Slack as the primary platform for peer-to-peer support. By improving the forum’s usability and accessibility, we’re aiming to create a more efficient and welcoming environment for everyone. Read on to learn about the changes we’re implementing and how you can contribute to making the forum an even better resource for the community.\n\n\n\n
\n\nOne of the things that impressed me the most when I joined Seqera last year as a developer advocate for the Nextflow community, was how engaged people are, and how much peer-to-peer interaction there is across a vast range of scientific domains, cultures, and geographies. That’s wonderful for a number of reasons, not least of which is that whenever you run into a problem —or you’re trying to do something a bit complicated or new— it’s very likely that there is someone out there who is able and willing to help you figure it out.\n\nFor the past few months, our small team of developer advocates have been thinking about how to nurture that dynamism, and how to further improve the experience of peer-to-peer support as the Nextflow community continues to grow. We’ve come to the conclusion that the best thing we can do is make the [community forum](https://community.seqera.io/) an awesome place to go for help, answers, and resources.\n\n## Why focus on the forum?\n\nIf you’re familiar with the Nextflow Slack workspace, you know there’s a lot of activity there, and the #help channel is always hopping. It’s true, and that’s great, buuuuut using Slack has some important downsides that the forum doesn’t suffer from.\n\nOne of the standout features of the forum is the ability to search past questions and answers really easily. Whether you're browsing directly within the forum, or using Google or some other search engine, you can quickly find relevant information in a way that’s much harder to do on Slack. This means that solutions to common issues are readily accessible, saving you (and the resident experts who have already answered the same question a bunch of times) a whole lot of time and effort.\n\nAdditionally, the forum has no barrier to access— you can view all the content without the need to join yet another app. This open access ensures that everyone can benefit from the wealth of knowledge shared by community members.\n\n## Immediate improvements to the forum’s ease of use\n\nWe’re excited to roll out a few immediate changes to the forum that should make it easier and more pleasant to use.\n\n- We’re introducing a new, sleeker visual design to make navigation and posting more intuitive and enjoyable.\n\n- We’ve reorganized the categories to streamline the process of finding and providing help. Instead of having separate categories for various things (like Nextflow, Wave, Seqera Platform etc), there is now a single \"Ask for help\" category for all topics, eliminating any confusion about where to post your question. Simply put, if you need help, just post in the \"Ask for help\" category. Done.\n\nWe’re also planning to mirror existing categories from the Nextflow Slack workspace, such as the jobs board and shameless promo channels, to make that content more visible and searchable. This will help you find opportunities and promote your work more effectively.\n\n## What you can do to help\n\nThese changes are meant to make the forum a great place for peer-to-peer support for the Nextflow community. 
You can help us improve it further by giving us your feedback about the forum functionality (don’t be shy), by posting your questions in the forum, and of course, if you’re already a Nextflow expert, by answering questions there.\n\nCheck out the [community forum](https://community.seqera.io/) now!\n", + "images": [] + }, + { + "slug": "2024/bioinformatics-growth-in-turkiye", + "title": "Fostering Bioinformatics Growth in Türkiye", + "date": "2024-06-12T00:00:00.000Z", + "content": "\nAfter diving into the Nextflow community, I've seen how it benefits bioinformatics in places like South Africa, Brazil, and France. I'm confident it can do the same for Türkiye by fostering collaboration and speeding up research. Since I became a Nextflow Ambassador, I am happy and excited because I can contribute to this development! Even though our first attempt to organize an introductory Nextflow workshop was online, it was a fruitful collaboration with RSG-Türkiye that initiated our effort to promote more Nextflow in Türkiye. We are happy to announce that we will organize a hands-on workshop soon.\n\n\n\nI am [Kübra Narcı](https://www.ghga.de/about-us/team-members/narci-kuebra), currently employed as a bioinformatician within the [German Human Genome Phenome Archive (GHGA) Workflows workstream](https://www.ghga.de/about-us/how-we-work/workstreams). Upon commencing this position nearly two years ago, I was introduced to Nextflow due to the necessity of transporting certain variant calling workflows here, and given my prior experience with other workflow managers, I was well-suited for the task. Though the initial two months were marked by challenges and moments of frustration, my strong perseverance ultimately led to the successful development of my first pipeline.\n\nSubsequently, owing much to the supportive Nextflow community, my interest, as well as my proficiency in the platform, steadily grew, culminating in my acceptance to the role of Nextflow Ambassador for the past six months. I jumped into the role since it was a great opportunity for GHGA and Nextflow to be connected even more.\n\n
\n \"meme\n
\n\nTransitioning into this ambassadorial role prompted a solid realization: the absence of a dedicated Nextflow community in Türkiye. This revelation was a shock, particularly given my academic background in bioinformatics there, where the community’s live engagement in workflow development is undeniable. Witnessing Turkish contributors within Nextflow and nf-core Slack workspaces further underscored this sentiment. It became evident that what was lacking was a spark for organizing events to ignite the Turkish community, a task I gladly undertook.\n\nWhile I possessed foresight regarding the establishment of a Nextflow community, I initially faced uncertainty regarding the appropriate course of action. To address this, I sought counsel from [Marcel](https://www.twitter.com/mribeirodantas), given his pivotal role in the initiation of the Nextflow community in Brazil. Following our discussion and receipt of valuable insights, it became evident that establishing connections with the appropriate community from my base in Germany was a necessity.\n\nThis attempt led me to meet with [RSG-Türkiye](https://rsgturkey.com). RSG-Türkiye aims to create a platform for students and post-docs in computational biology and bioinformatics in Türkiye. It aims to share knowledge and experience, promote collaboration, and expand training opportunities. The organization also collaborates with universities and the Bioinformatics Council, a recently established national organization as the Turkish counterpart of the ISCB (International Society for Computational Biology) to introduce industrial and academic research. To popularize the field, they have offline and online talk series in university student centers to promote computational biology and bioinformatics.\n\nFollowing our introduction, RSG-Türkiye and I hosted a workshop focusing on workflow reproducibility, Nextflow, and nf-core. We chose Turkish as the language to make it more accessible for participants who are not fluent in English. The online session lasted a bit more than an hour and attracted nearly 50 attendees, mostly university students but also individuals from the research and industry sectors. The strong student turnout was especially gratifying as it aligned with my goal of building a vibrant Nextflow community in Türkiye. I took the opportunity to discuss Nextflow’s ambassadorship and mentorship programs, which can greatly benefit students, given Türkiye’s growing interest in bioinformatics. The whole workshop was recorded and can be viewed on [YouTube](https://www.youtube.com/watch?v=AqNmIkoQrNo&ab_channel=RSG-Turkey).\n\nI am delighted to report that the workshop was a success. It was not only attracting considerable interest but also marked the commencement of a promising journey. Our collaboration with RSG-Türkiye persists, with plans underway for a more comprehensive on-site training session in Türkiye scheduled for later this year. I look forward to more engagement from Turkish participants as we work together to strengthen our community. Hopefully, this effort will lead to more Turkish-language content, new mentor relations from the core Nextflow team, and the emergence of a local Nextflow ambassador.\n\n
\n \"meme\n
\n\n\n## How can I contact the Nextflow Türkiye community?\n\nIf you want to help grow the Nextflow community in Türkiye, join the Nextflow and nf-core Slack workspaces and connect with Turkish contributors in the #region-turkiye channel. Don’t be shy—say hello, and let’s build up the community together! Feel free to contact me if you’re interested in helping organize local hands-on Nextflow workshops. We welcome both advanced users and beginners. By participating, you’ll contribute to the growth of bioinformatics in Türkiye, collaborate with peers, and access resources to advance your research and career.\n",
    "images": [
      "/img/blog-2024-06-12-turkish_workshop1a.png",
      "/img/blog-2024-06-12-turkish_workshop2a.png"
    ]
  },
  {
    "slug": "2024/empowering-bioinformatics-mentoring",
    "title": "Empowering Bioinformatics: Mentoring Across Continents with Nextflow",
    "date": "2024-04-25T00:00:00.000Z",
    "content": "\nIn my journey with the nf-core Mentorship Program, I've mentored individuals from Malawi, Chile, and Brazil, guiding them through Nextflow and nf-core. Despite the distances, my mentees successfully adapted their workflows, contributing to the open-source community. Witnessing the transformative impact of mentorship firsthand, I'm encouraged to continue participating in future mentorship efforts and urge others to join this rewarding experience. But how did it all start?\n\n\n\nI’m [Robert Petit](https://www.robertpetit.com/), a bioinformatician at the [Wyoming Public Health Laboratory](https://health.wyo.gov/publichealth/lab/), in [Wyoming, USA](https://en.wikipedia.org/wiki/Wyoming). If you don’t know where that is, haha that’s fine, I’m pretty sure half the people in the US don’t know either! Wyoming is the 10th largest US state (253,000 km2), but the least populated with only about 580,000 people. It’s home to some very beautiful mountains and national parks, large animals including bears, wolves and the fastest land animal in the northern hemisphere, the Pronghorn. But it’s rural, can get cold (-10 C) and the high wind speeds (some days average 50 kmph, with gusts 100+ kmph) only make it feel colder during the winter (sometimes feeling like -60 C to -40 C). You might be wondering:\n\nHow did some random person from Wyoming get involved in the nf-core Mentorship Program, and end up being the only mentor to have participated in all three rounds?\n\nI’ve been in the Nextflow world for over 7 years now (as of 2024), ever since I first converted a pipeline, [Staphopia](https://staphopia.github.io/), from Ruffus to Nextflow. Eventually, I would develop [Bactopia](https://bactopia.github.io/latest/), one of the leading and longest maintained (5 years now!) Nextflow pipelines for the analysis of bacterial genomes. Through Bactopia, I’ve had the opportunity to help people all around the world get started using Nextflow and analyzing their own bacterial sequencing. It has also allowed me to make numerous contributions to nf-core, mostly through the nf-core/modules. So, when I heard about the opportunity to be a mentor in the nf-core Mentorship Program, I immediately applied.\n\nRound 1! To be honest, I didn’t know what to expect from the program. Only that I would help a mentee with whatever they needed related to Nextflow and nf-core. Then at the first meeting, I learned I would be working with Phil Ashton, the Lead Bioinformatician at Malawi Liverpool Wellcome Trust, in Blantyre, Malawi, and immediately sent him a “Yo!”.
Phil and I had run into each other in the past because when it comes to bacterial genomics, the field is very small! Phil’s goal was to get Nextflow pipelines running on their infrastructure in Malawi to help with their public health response. We would end up using Bactopia as the model. But this mentorship wasn’t just about “running Bactopia”, for Phil it was important we built a basic understanding of how things are working on the back-end with Nextflow. In the end, Phil was able to get Nextflow, and Bactopia running, using Singularity, but also gain a better understanding of Nextflow by writing his own Nextflow code.\n\nRound 2! When Round 2 was announced, I didn’t hesitate to apply again as a mentor. This time, I would be paired up with Juan Ugalde, an Assistant Professor at Universidad Andres Bello in Santiago, Chile. I think Juan and I were both excited by this, as similar to Phil, Juan and I had run into each other (virtually) through MetaSub, a project to sequence samples taken from public transport systems across the globe. Like many during the COVID-19 pandemic, Juan was pulled into the response, during which he began looking into Nextflow for other viruses. In particular, hantavirus, a public health concern due to it being endemic in parts of Chile. Juan had developed a pipeline for hantavirus sequence analysis, and his goal was to convert it into Nextflow. Throughout this Juan got to learn about the nf-core community and Nextflow development, which he was successful at! As he was able to convert his pipeline into Nextflow and make it publicly available as [hantaflow](https://github.com/microbialds/hantaflow).\n\nRound 3! Well Round 3 almost didn’t happen for me, but I’m glad it did happen! At the first meeting, I learned I would be paired with Ícaro Maia Santos de Castro, at the time a PhD candidate at the University of São Paulo, in São Paulo, Brazil. We quickly learned we were both fans of One Piece, as Ícaro’s GitHub picture was Luffy from One Piece, haha and my background included a poster from One Piece. With Ícaro, we were starting with the basics of Nextflow (e.g. the nf-core training materials) with the goal of writing a Nextflow pipeline for his meta-transcriptomics dissertation work. We set the goal to develop his Nextflow pipeline, before an overseas move he had a few months away. He brought so many questions, his motivation never waned, and once he was asking questions about Channel Operators, I knew he was ready to write his pipeline. While writing his pipeline he learned about the nf-core/tools and also got to submit a new recipe to Bioconda, and modules to nf-core. By the end of the mentorship, Ícaro had succeeded in writing his pipeline in Nextflow and making it publicly available at [phiflow](https://github.com/icaromsc/nf-core-phiflow).\n\n
\n \"phiflow\n

Metromap of the phiflow workflow

\n
\n\nThrough all three rounds, I had the opportunity to work with some incredible people! But the awesomeness didn’t end with my mentees. One thing that always stuck out to me was how motivated everyone was, both mentees and mentors. There was a sense of excitement and real progress was being made by every group. After the first round ended, I remember thinking to myself, “how could it get better?” Haha, well it did, and it continued to get better and better in Rounds 2 and 3. I think this is a great testament to the organizers at nf-core that put it all together, the mentors and mentees, and the community behind Nextflow and nf-core.\n\nFor the future mentees in mentorship opportunities! Please don’t let yourself stop you from applying. Whether it’s a time issue, or a fear of not having enough experience to be productive. In each round, we’ve had people from all over the world, starting from the ground with no experience, to some mentees in which I wondered if maybe they should have been a mentor (some mentees did end up being a mentor in the last round!). As a mentee, it is a great opportunity to work directly with a mentor dedicated to seeing you grow and build confidence when it comes to Nextflow and bioinformatics. In addition, you will be introduced to the incredible community that is behind Nextflow and nf-core. I think you will quickly learn there are so many people in this community that are willing to help!\n\nFor the future mentors! It’s always awesome to be able to help others learn, but sometimes the mentor needs to learn too! For me, I found the nf-core Mentorship Program to be a great opportunity to improve my skills as a mentor. But it wasn’t just from working with my mentees. During each round I was surrounded by many great role models in the form of mentors and mentees to learn from. No two groups ever had the same goals, so you really get the chance to see so many different styles of mentorship being implemented, all producing significant results for each mentee. Like I told the mentees, if the opportunity comes up again, take the chance and apply to be a mentor!\n\nThere have now been three rounds of the nf-core Mentorship Program, and I am very proud to have been a mentor in each round! During this I have learned so much and been able to help my mentees and the community grow. I look forward to seeing what the future holds for the mentorship opportunities in the Nextflow community, and I encourage potential mentors and mentees to consider joining the program!\n", + "images": [ + "/img/blog-2024-04-25-mentorship-img1a.png" + ] + }, + { + "slug": "2024/experimental-cleanup-with-nf-boost", + "title": "Experimental cleanup with nf-boost", + "date": "2024-08-08T00:00:00.000Z", + "content": "\n### Backstory\n\nWhen I (Ben) was in grad school, I worked on a Nextflow pipeline called [GEMmaker](https://github.com/systemsgenetics/gemmaker), an RNA-seq analysis pipeline similar to [nf-core/rnaseq](https://github.com/nf-core/rnaseq). We quickly ran into a problem, which is that on large runs, we were running out of storage! As it turns out, it wasn’t the final outputs, but the intermediate outputs (the BAM files, etc) that were taking up so much space, and we figured that if we could just delete those intermediate files sooner, we might be able to make it through a pipeline run without running out of storage. We were far from alone.\n\n\n\nAutomatic cleanup is currently the [oldest open issue](https://github.com/nextflow-io/nextflow/issues/452) on the Nextflow repository. 
For many users, the ability to quickly delete intermediate files makes the difference between a run being possible or impossible. [Stephen Ficklin](https://github.com/spficklin), the creator of GEMmaker, came up with a clever way to delete intermediate files and even “trick” Nextflow into skipping deleted tasks on a resumed run, which you can read about in the GitHub issue. It involved wiring the intermediate output channels to a “cleanup” process, along with a “done” signal from the relevant downstream processes to ensure that the intermediates were deleted at the right time.\n\nThis hack worked, but it required a lot of manual effort to wire up the cleanup process correctly, and it left me wondering whether it could be done automatically. Nextflow should be able to analyze the DAG, figure out when an output file can be deleted, and then delete it! During my time on the Nextflow team, I have implemented this exact idea in a [pull request](https://github.com/nextflow-io/nextflow/pull/3849), but there are still a few challenges to resolve, such as resuming from deleted runs (which is not as impossible as it sounds).\n\n### Introducing nf-boost: experimental features for Nextflow\n\nMany users have told me that they would gladly take the cleanup without the resume, so I found a way to provide the cleanup functionality in a plugin, which I call [nf-boost](https://github.com/bentsherman/nf-boost). This plugin is not just about automatic cleanup – it contains a variety of experimental features, like new operators and functions, that anyone can try today with a few extra lines of config, which is much less tedious than building Nextflow from a pull request. Not every new feature can be implemented via plugin, but for those features that can, it’s nice for the community to be able to try it out before we make it official.\n\nThe nf-boost plugin requires Nextflow v23.10.0 or later. You can enable the experimental cleanup by adding the following lines to your config file:\n\n```groovy\nplugins {\n id 'nf-boost'\n}\n\nboost {\n cleanup = true\n}\n```\n\n### Automatic cleanup: how it works\n\nThe strategy of automatic cleanup is simple:\n\n1. As soon as an output file can be deleted, delete it\n2. An output file can be deleted when (1) all downstream tasks that use the output file as an input have completed AND (2) the output file has been published (if it needs to be published)\n\nIn practice, the conditions for 2(a) are tricky to get right because Nextflow doesn’t know the full task graph from the start (thanks to the flexibility of Nextflow’s dataflow operators). But you don’t have to worry about any of that because we already figured out how to make it work! All you have to do is flip a switch (`boost.cleanup = true`) and enjoy the ride.\n\n### Real-world example\n\nLet’s consider a variant calling pipeline following standard best practices. Sequencing reads are mapped onto the genome, producing a BAM file which will be marked for duplicates, filtered, recalibrated using GATK, etc. This means that, for a given sample, at least four copies of the BAM file will be stored in the work directory. In other words, for an initial paired-end whole-exome sequencing (WES) sample of 12 GB, the work directory will quickly grow to 50 GB just to store the BAM files for one sample, or 100 GB for a paired sample (e.g. germline and tumor).\n\nNow suppose that we want to analyze a cohort of 100 patients – that’s ~10 TB of intermediate data, which is a real problem. 
For some users, it means processing only a few samples at a time, even though they might have the compute capacity to do much more. For others, it means not being able to process even one sample, because the accumulated intermediate data is simply too large. With automatic cleanup, Nextflow should be able to delete the previous BAM as soon as the next BAM is produced, for each sample independently.\n\nWe tested this use-case with a paired WES sample (total input size of 26.8 GB), by tracking the work directory size for a run with and a run without automatic cleanup. The results are shown below.\n\n\"disk\n\n_Note: we also changed the `boost.cleanupInterval` config option to 180 seconds, which was more optimal for our system._\n\nAs expected, we see that without automatic cleanup, the size of the work directory reaches 110 GB when all BAM files are produced and never deleted. On the other hand, when the nf-boost cleanup is enabled, the work directory occasionally peaks at ~50 GB (i.e. no more than two BAM files are stored at the same time), but always returns to ~25 GB, since the previous BAM is deleted immediately after the next BAM is ready. There is no impact on the size of the results (since they are identical) or the total runtime (since cleanup happens in parallel with the workflow itself).\n\nIn this case, automatic cleanup reduced the total storage by 50-75% (depending on how you measure the storage). In general, the effectiveness of automatic cleanup will depend greatly on how you write your pipeline. Here are a few rules of thumb that we’ve come up with so far:\n\n- As your pipeline becomes “deeper” (i.e. more processing steps in sequence), automatic cleanup becomes more effective, because it only needs to keep two steps’ worth of data, regardless of the total number of steps\n- As your pipeline becomes “wider” (i.e. more inputs being processed in parallel), automatic cleanup should have roughly the same level of effectiveness. If some samples take longer to process than others, the peak storage should be lower with automatic cleanup, since the “peaks” for each sample will happen at different times.\n- As you add more dependencies between processes, automatic cleanup becomes less effective, because it has to wait longer before it can delete the upstream outputs. Note that each output is tracked independently, so for example, sending logs to a summary process won’t affect the cleanup of other outputs from that same process.\n\n### Closing thoughts\n\nAutomatic cleanup in nf-boost is an experimental feature, and notably does not support resumability, meaning that the deleted files will simply be re-executed on a resumed run. While we work through these last few challenges, the nf-boost plugin is a nice option for users who want to benefit from what we’ve built so far and don’t need the resumability.\n\nThe nice thing about nf-boost’s automatic cleanup is that it is just a preview of what will eventually be the “official” cleanup feature in Nextflow (when it is merged), so by using nf-boost, you are helping the future of Nextflow directly! 
We hope that this experimental version will help users run workloads that were previously difficult or even impossible, and we look forward to when we can bring this feature home to Nextflow.\n", + "images": [ + "/img/blog-2024-08-08-nfboost-img1a.png" + ] + }, + { + "slug": "2024/how_i_became_a_nextflow_ambassador", + "title": "How I became a Nextflow Ambassador!", + "date": "2024-07-24T00:00:00.000Z", + "content": "\nAs a PhD student in bioinformatics, I aimed to build robust pipelines to analyze diverse datasets throughout my research. Initially, mastering Bash scripting was a time-consuming challenge, but this journey ultimately led me to become a Nextflow Ambassador, engaging actively with the expert Nextflow community.\n\n\n\nMy name is [Firas Zemzem](https://www.linkedin.com/in/firaszemzem/), a PhD student based in [Tunisia](https://www.google.com/search?q=things+to+do+in+tunisia&sca_esv=3b07b09e3325eaa7&sca_upv=1&udm=15&biw=1850&bih=932&ei=AS2eZuqnFpG-i-gPwciJyAk&ved=0ahUKEwiqrOiRsbqHAxUR3wIHHUFkApkQ4dUDCBA&uact=5&oq=things+to+do+in+tunisia&gs_lp=Egxnd3Mtd2l6LXNlcnAiF3RoaW5ncyB0byBkbyBpbiB0dW5pc2lhMgUQABiABDIGEAAYFhgeMgYQABgWGB4yBhAAGBYYHjIGEAAYFhgeMgYQABgWGB4yBhAAGBYYHjIGEAAYFhgeMgYQABgWGB4yCBAAGBYYHhgPSOIGULYDWNwEcAF4AZABAJgBfaAB9gGqAQMwLjK4AQPIAQD4AQGYAgOgAoYCwgIKEAAYsAMY1gQYR5gDAIgGAZAGCJIHAzEuMqAH_Aw&sclient=gws-wiz-serp) working with the Laboratory of Cytogenetics, Molecular Genetics, and Biology of Reproduction at CHU Farhat Hached Sousse. I was specialized in human genetics, focusing on studying genomics behind neurodevelopmental disorders. Hence Developing methods for detecting SNPs and variants related to my work was crucial step for advancing medical research and improving patient outcomes. On the other hand, pipelines integration and bioinformatics tools were essential in this process, enabling efficient data analysis, accurate variant detection, and streamlined workflows that enhance the reliability and reproducibility of our findings.\n\n## The initial nightmare of Bash\n\nDuring my master's degree, I was a steadfast user of Bash scripting. Bash had been my go-to tool for automating tasks and managing workflows in my bioinformatics projects, such as variant calling. Its simplicity and versatility made it an indispensable part of my toolkit. I was writing Bash scripts for various next-generation sequencing (NGS) high-throughput analyses, including data preprocessing, quality control, alignment, and variant calling. However, as my projects grew more complex, I began to encounter the limitations of Bash. Managing dependencies, handling parallel executions, and ensuring reproducibility became increasingly challenging. Handling the vast amount of data generated by NGS and other high-throughput technologies was cumbersome. Using Bash became a nightmare for debugging and maintaining. I spent countless hours trying to make it work, only to be met with more errors and inefficiencies. It was nearly impossible to scale for larger datasets and more complex analyses. Additionally, managing different environments and versions of tools was beyond Bash's capabilities. I needed a solution that could handle these challenges more gracefully.\n\n## Game-Changing Call\n\nOne evening, I received a call from my friend, Mr. HERO, a bioinformatician. As we discussed our latest projects, I vented my frustrations with Bash. Mr. HERO, as I called him, the problem-solver, mentioned a tool called Nextflow. 
He described how it had revolutionized his workflow, making complex pipeline management a breeze. Intrigued, I decided to look into it.\n\n## Diving Into the process\n\nReading the [documentation](https://www.nextflow.io/docs/latest/index.html) and watching [tutorials](https://training.nextflow.io/) were my first steps. Nextflow's approach to workflow management was a revelation. Unlike Bash, Nextflow was designed to address the complexities of modern computational questions. It provided a transparent, declarative syntax for defining tasks and their dependencies and supported parallel execution out of the box. The first thing I did when I decided to convert one of my existing Bash scripts into a Nextflow pipeline was to start experimenting with simple code. Doing this was no small feat. I had to rethink my approach to workflow design and embrace a new way of defining tasks and dependencies. My learning curve was not too steep, so understanding how to translate my Bash logic into Nextflow's domain-specific language (DSL) was not that hard.\n\n## Eureka Moment: First run\n\nThe first time I ran my Nextflow pipeline, I was amazed by how smoothly and efficiently it handled tasks that previously took hours to debug and execute in Bash. Nextflow managed task dependencies, parallel execution, and error handling with ease, resulting in a faster, more reliable, and maintainable pipeline. The ability to run pipelines on different computing environments, from local machines to high-performance clusters and cloud platforms, was a game-changer. Several Nextflow features were particularly valuable: Containerization Support using Docker and Singularity ensured consistency across environments; Error Handling with automatic retry mechanisms and detailed error reporting saved countless debugging hours; Portability and scalability allowed seamless execution on various platforms; Modularity facilitated the reuse and combination of processes across different pipelines, enhancing efficiency and organization; and Reproducibility features, including versioning and traceability, ensured that workflows could be reliably reproduced and shared across different research projects and teams.\n\n
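As a rough illustration of the kind of conversion described above, here is a minimal DSL2 sketch (the tool, container, and file names are only an example, not the actual pipeline):\n\n```groovy\n// One QC step as a Nextflow process: the framework handles scheduling, retries and containers\nprocess FASTQC {\n    container 'biocontainers/fastqc:v0.11.9_cv8'\n\n    input:\n    path reads\n\n    output:\n    path '*_fastqc.*'\n\n    script:\n    \"\"\"\n    fastqc $reads\n    \"\"\"\n}\n\nworkflow {\n    Channel.fromPath(params.reads) | FASTQC\n}\n```\n\nEach file emitted by the channel becomes an independent, parallel task, which is exactly the bookkeeping a Bash loop otherwise has to do by hand.\n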
\n \"meme\n
\n\n## New Horizons: Becoming a Nextflow Ambassador\n\nSwitching from Bash scripting to Nextflow was more than just adopting a new tool. It was about embracing a new mindset. Nextflow’s emphasis on scalability, reproducibility, and ease of use transformed how I approached bioinformatics. The initial effort to learn Nextflow paid off in spades, leading to more robust, maintainable, and scalable workflows. My enthusiasm and advocacy for Nextflow didn't go unnoticed. Recently, I became a Nextflow Ambassador. This role allows me to further contribute to the community, promote best practices, and support new users as they embark on their own Nextflow journeys.\n\n## Future Projects and Community Engagement\n\nCurrently I am working on developing a Nextflow pipeline with my team that will help in analyzing variants, providing valuable insights for medical and clinical applications. This pipeline aims to improve the accuracy and efficiency of variant detection, ultimately supporting better diagnostic for patients with various genetic conditions. As part of my ongoing efforts within the Nextflow community, I am planning a series of projects aimed at developing and sharing advanced Nextflow pipelines tailored to specific genetic rare disorder analyses. These initiative will include detailed tutorials, case studies, and collaborative efforts with other researchers to enhance the accessibility and utility of Nextflow for various bioinformatics applications. Additionally, I plan to host workshops and seminars to spread knowledge and best practices among my colleagues and other researchers. This will help foster a collaborative environment where we can all benefit from the power and flexibility of Nextflow.\n\n## Invitation for researchers over the world\n\nAs a Nextflow Ambassador, I invite you to become part of a dynamic group of experts and enthusiasts dedicated to advancing workflow automation. Whether you're just starting or looking to deepen your knowledge, our community offers invaluable resources, support, and networking opportunities. You can chat with us on the [Nextflow Slack Workspace](https://www.nextflow.io/slack-invite.html) and ask your questions at the [Seqera Community Forum](https://community.seqera.io).\n", + "images": [ + "/img/ZemFiras-nextflowtestpipeline-Blog.png" + ] + }, + { + "slug": "2024/nextflow-24.04-highlights", + "title": "Nextflow 24.04 - Release highlights", + "date": "2024-05-27T00:00:00.000Z", + "content": "\nWe release an \"edge\" version of Nextflow every month and a \"stable\" version every six months. The stable releases are recommended for production usage and represent a significant milestone. The [release changelogs](https://github.com/nextflow-io/nextflow/releases) contain a lot of detail, so we thought we'd highlight some of the goodies that have just been released in Nextflow 24.04 stable. 
Let's get into it!\n\n:::tip\nWe also did a podcast episode about some of these changes!\nCheck it out here: [Channels Episode 41](/podcast/2024/ep41_nextflow_2404.html).\n:::\n\n## Table of contents\n\n- [New features](#new-features)\n - [Seqera Containers](#seqera-containers)\n - [Workflow output definition](#workflow-output-definition)\n - [Topic channels](#topic-channels)\n - [Process eval outputs](#process-eval-outputs)\n - [Resource limits](#resource-limits)\n - [Job arrays](#job-arrays)\n- [Enhancements](#enhancements)\n - [Colored logs](#colored-logs)\n - [AWS Fargate support](#aws-fargate-support)\n - [OCI auto pull mode for Singularity and Apptainer](#oci-auto-pull-mode-for-singularity-and-apptainer)\n - [Support for GA4GH TES](#support-for-ga4gh-tes)\n- [Fusion](#fusion)\n - [Enhanced Garbage Collection](#enhanced-garbage-collection)\n - [Increased File Handling Capacity](#increased-file-handling-capacity)\n - [Correct Publishing of Symbolic Links](#correct-publishing-of-symbolic-links)\n- [Other notable changes](#other-notable-changes)\n\n## New features\n\n### Seqera Containers\n\nA new flagship community offering was revealed at the Nextflow Summit 2024 Boston - **Seqera Containers**. This is a free-to-use container cache powered by [Wave](https://seqera.io/wave/), allowing anyone to request an image with a combination of packages from Conda and PyPI. The image will be built on demand and cached (for at least 5 years after creation). There is a [dedicated blog post](https://seqera.io/blog/introducing-seqera-pipelines-containers/) about this, but it's worth noting that the service can be used directly from Nextflow and not only through [https://seqera.io/containers/](https://seqera.io/containers/)\n\nIn order to use Seqera Containers in Nextflow, simply set `wave.freeze` _without_ setting `wave.build.repository` - for example, by using the following config for your pipeline:\n\n```groovy\nwave.enabled = true\nwave.freeze = true\nwave.strategy = 'conda'\n```\n\nAny processes in your pipeline specifying Conda packages will have Docker or Singularity images created on the fly (depending on whether `singularity.enabled` is set or not) and cached for immediate access in subsequent runs. These images will be publicly available. You can view all container image names with the `nextflow inspect` command.\n\n### Workflow output definition\n\nThe workflow output definition is a new syntax for defining workflow outputs:\n\n```groovy\nnextflow.preview.output = true // [!code ++]\n\nworkflow {\n main:\n ch_foo = foo(data)\n bar(ch_foo)\n\n publish:\n ch_foo >> 'foo' // [!code ++]\n}\n\noutput { // [!code ++]\n directory 'results' // [!code ++]\n mode 'copy' // [!code ++]\n} // [!code ++]\n```\n\nIt essentially provides a DSL2-style approach for publishing, and will replace `publishDir` once it is finalized. It also provides extra flexibility as it allows you to publish _any_ channel, not just process outputs. See the [Nextflow docs](https://nextflow.io/docs/latest/workflow.html#publishing-outputs) for more information.\n\n:::info\nThis feature is still in preview and may change in a future release.\nWe hope to finalize it in version 24.10, so don't hesitate to share any feedback with us!\n:::\n\n### Topic channels\n\nTopic channels are a new channel type introduced in 23.11.0-edge. 
A topic channel is essentially a queue channel that can receive values from multiple sources, using a matching name or \"topic\":\n\n```groovy\nprocess foo {\n output:\n val('foo'), topic: 'my-topic' // [!code ++]\n}\n\nprocess bar {\n output:\n val('bar'), topic: 'my-topic' // [!code ++]\n}\n\nworkflow {\n foo()\n bar()\n\n Channel.topic('my-topic').view() // [!code ++]\n}\n```\n\nTopic channels are particularly useful for collecting metadata from various places in the pipeline, without needing to write all of the channel logic that is normally required (e.g. using the `mix` operator). See the [Nextflow docs](https://nextflow.io/docs/latest/channel.html#topic) for more information.\n\n### Process `eval` outputs\n\nProcess `eval` outputs are a new type of process output which allows you to capture the standard output of an arbitrary shell command:\n\n```groovy\nprocess sayHello {\n output:\n eval('bash --version') // [!code ++]\n\n \"\"\"\n echo Hello world!\n \"\"\"\n}\n\nworkflow {\n sayHello | view\n}\n```\n\nThe shell command is executed alongside the task script. Until now, you would typically execute these supplementary commands in the main process script, save the output to a file or environment variable, and then capture it using a `path` or `env` output. The new `eval` output is a much more convenient way to capture this kind of command output directly. See the [Nextflow docs](https://nextflow.io/docs/latest/process.html#output-type-eval) for more information.\n\n#### Collecting software versions\n\nTogether, topic channels and eval outputs can be used to simplify the collection of software tool versions. For example, for FastQC:\n\n```groovy\nprocess FASTQC {\n input:\n tuple val(meta), path(reads)\n\n output:\n tuple val(meta), path('*.html'), emit: html\n tuple val(\"${task.process}\"), val('fastqc'), eval('fastqc --version'), topic: versions // [!code ++]\n\n \"\"\"\n fastqc $reads\n \"\"\"\n}\n\nworkflow {\n Channel.topic('versions') // [!code ++]\n | unique()\n | map { process, name, version ->\n \"\"\"\\\n ${process.tokenize(':').last()}:\n ${name}: ${version}\n \"\"\".stripIndent()\n }\n | collectFile(name: 'collated_versions.yml')\n | CUSTOM_DUMPSOFTWAREVERSIONS\n}\n```\n\nThis approach will be implemented across all nf-core pipelines, and will cut down on a lot of boilerplate code. Check out the full prototypes for nf-core/rnaseq [here](https://github.com/nf-core/rnaseq/pull/1109) and [here](https://github.com/nf-core/rnaseq/pull/1115) to see them in action!\n\n### Resource limits\n\nThe **resourceLimits** directive is a new process directive which allows you to define global limits on the resources requested by individual tasks. For example, if you know that the largest node in your compute environment has 24 CPUs, 768 GB or memory, and a maximum walltime of 72 hours, you might specify the following:\n\n```groovy\nprocess.resourceLimits = [ cpus: 24, memory: 768.GB, time: 72.h ]\n```\n\nIf a task requests more than the specified limit (e.g. due to [retry with dynamic resources](https://nextflow.io/docs/latest/process.html#dynamic-computing-resources)), Nextflow will automatically reduce the task resources to satisfy the limit, whereas normally the task would be rejected by the scheduler or would simply wait in the queue forever! The nf-core community has maintained a custom workaround for this problem, the `check_max()` function, which can now be replaced with `resourceLimits`. 
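\n\nSince `resourceLimits` behaves like any other directive, it can also be applied selectively from configuration, for example with a process label (a sketch; the label name below is hypothetical):\n\n```groovy\nprocess {\n    // global ceiling for every task\n    resourceLimits = [ cpus: 24, memory: 768.GB, time: 72.h ]\n\n    // a tighter ceiling for one group of processes\n    withLabel: 'process_single' {\n        resourceLimits = [ cpus: 1, memory: 8.GB, time: 4.h ]\n    }\n}\n```\n\n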
See the [Nextflow docs](https://nextflow.io/docs/latest/process.html#resourcelimits) for more information.\n\n### Job arrays\n\n**Job arrays** are now supported in Nextflow using the `array` directive. Most HPC schedulers, and even some cloud batch services including AWS Batch and Google Batch, support a \"job array\" which allows you to submit many independent jobs with a single job script. While the individual jobs are still executed separately as normal, submitting jobs as arrays where possible puts considerably less stress on the scheduler.\n\nWith Nextflow, using job arrays is a one-liner:\n\n```groovy\nprocess.array = 100\n```\n\nYou can also enable job arrays for individual processes like any other directive. See the [Nextflow docs](https://nextflow.io/docs/latest/process.html#array) for more information.\n\n:::tip\nOn Google Batch, using job arrays also allows you to pack multiple tasks onto the same VM by using the `machineType` directive in conjunction with the `cpus` and `memory` directives.\n:::\n\n## Enhancements\n\n### Colored logs\n\n
\n\n**Colored logs** have come to Nextflow! Specifically, the process log which is continuously printed to the terminal while the pipeline is running. Not only is it more colorful, but it also makes better use of the available space to show you what's most important. But we already wrote an entire [blog post](https://nextflow.io/blog/2024/nextflow-colored-logs.html) about it, so go check that out for more details!\n\n
\n\n![New coloured output from Nextflow](/img/blog-nextflow-colored-logs/nextflow_coloured_logs.png)\n\n
\n\n### AWS Fargate support\n\nNextflow now supports **AWS Fargate** for AWS Batch jobs. See the [Nextflow docs](https://nextflow.io/docs/latest/aws.html#aws-fargate) for details.\n\n### OCI auto pull mode for Singularity and Apptainer\n\nNextflow now supports OCI auto pull mode both Singularity and Apptainer. Historically, Singularity could run a Docker container image converting to the Singularity image file format via the Singularity pull command and using the resulting image file in the exec command. This adds extra overhead to the head node running Nextflow for converting all container images to the Singularity format.\n\nNow Nextflow allows specifying the option `ociAutoPull` both for Singularity and Apptainer. When enabling this setting Nextflow delegates the pull and conversion of the Docker image directly to the `exec` command.\n\n```groovy\nsingularity.ociAutoPull = true\n```\n\nThis results in the running of the pull and caching of the Singularity images to the compute jobs instead of the head job and removing the need to maintain a separate image files cache.\n\nSee the [Nextflow docs](https://nextflow.io/docs/latest/config.html#scope-singularity) for more information.\n\n### Support for GA4GH TES\n\nThe [Task Execution Service (TES)](https://ga4gh.github.io/task-execution-schemas/docs/) is an API specification, developed by [GA4GH](https://www.ga4gh.org/), which attempts to provide a standard way for workflow managers like Nextflow to interface with execution backends. Two noteworthy TES implementations are [Funnel](https://github.com/ohsu-comp-bio/funnel) and [TES Azure](https://github.com/microsoft/ga4gh-tes).\n\nNextflow has long supported TES as an executor, but only in a limited sense, as TES did not support some important capabilities in Nextflow such as glob and directory outputs and the `bin` directory. However, with TES 1.1 and its adoption into Nextflow, these gaps have been closed. You can use the TES executor with the following configuration:\n\n```groovy\nplugins {\n id 'nf-ga4gh'\n}\n\nprocess.executor = 'tes'\ntes.endpoint = '...'\n```\n\nSee the [Nextflow docs](https://nextflow.io/docs/latest/executor.html#ga4gh-tes) for more information.\n\n:::note\nTo better facilitate community contributions, the nf-ga4gh plugin will soon be moved from the Nextflow repository into its own repository, `nextflow-io/nf-ga4gh`. To ensure a smooth transition with your pipelines, make sure to explicitly include the plugin in your configuration as shown above.\n:::\n\n## Fusion\n\n[Fusion](https://seqera.io/fusion/) is a distributed virtual file system for cloud-native data pipeline and optimized for Nextflow workloads. Nextflow 24.04 now works with a new release, Fusion 2.3. This brings a few notable quality-of-life improvements:\n\n### Enhanced Garbage Collection\n\nFusion 2.3 features an improved garbage collection system, enabling it to operate effectively with reduced scratch storage. This enhancement ensures that your pipelines run more efficiently, even with limited temporary storage.\n\n### Increased File Handling Capacity\n\nSupport for more concurrently open files is another significant improvement in Fusion 2.3. 
This means that larger directories, such as those used by Alphafold2, can now be utilized without issues, facilitating the handling of extensive datasets.\n\n### Correct Publishing of Symbolic Links\n\nIn previous versions, output files that were symbolic links were not published correctly — instead of the actual file, a text file containing the file path was published. Fusion 2.3 addresses this issue, ensuring that symbolic links are published correctly.\n\nThese enhancements in Fusion 2.3 contribute to a more robust and efficient filesystem for Nextflow users.\n\n## Other notable changes\n\n- Add native retry on spot termination for Google Batch ([`ea1c1b`](https://github.com/nextflow-io/nextflow/commit/ea1c1b70da7a9b8c90de445b8aee1ee7a7148c9b))\n- Add support for instance templates in Google Batch ([`df7ed2`](https://github.com/nextflow-io/nextflow/commit/df7ed294520ad2bfc9ad091114ae347c1e26ae96))\n- Allow secrets to be used with `includeConfig` ([`00c9f2`](https://github.com/nextflow-io/nextflow/commit/00c9f226b201c964f67d520d0404342bc33cf61d))\n- Allow secrets to be used in the pipeline script ([`df866a`](https://github.com/nextflow-io/nextflow/commit/df866a243256d5018e23b6c3237fb06d1c5a4b27))\n- Add retry strategy for publishing ([`c9c703`](https://github.com/nextflow-io/nextflow/commit/c9c7032c2e34132cf721ffabfea09d893adf3761))\n- Add `k8s.cpuLimits` config option ([`3c6e96`](https://github.com/nextflow-io/nextflow/commit/3c6e96d07c9a4fa947cf788a927699314d5e5ec7))\n- Removed `seqera` and `defaults` from the standard channels used by the nf-wave plugin. ([`ec5ebd`](https://github.com/nextflow-io/nextflow/commit/ec5ebd0bc96e986415e7bac195928b90062ed062))\n\nYou can view the full [Nextflow release notes on GitHub](https://github.com/nextflow-io/nextflow/releases/tag/v24.04.0).\n", + "images": [] + }, + { + "slug": "2024/nextflow-colored-logs", + "title": "Nextflow's colorful new console output", + "date": "2024-03-28T00:00:00.000Z", + "content": "\nNextflow is a command-line interface (CLI) tool that runs in the terminal. Everyone who has launched Nextflow from the command line knows what it’s like to follow the console output as a pipeline runs: the excitement of watching jobs zipping off as they’re submitted, the satisfaction of the phrase _\"Pipeline completed successfully!\"_ and occasionally, the sinking feeling of seeing an error message.\n\nBecause the CLI is the primary way that people interact with Nextflow, a little bit of polish can have a big effect. In this article, I’m excited to describe an upgrade for the console output that should make monitoring workflow progress just a little easier.\n\nThe new functionality is available in `24.02-0-edge` and will be included in the next `24.04.0` stable release. You can try it out now by updating Nextflow as follows:\n\n```bash\nNXF_EDGE=1 nextflow self-update\n```\n\n## Background\n\nThe Nextflow console output hasn’t changed much over the 10 years that it’s been around. The biggest update happened in 2018 when \"ANSI logging\" was released in version `18.10.0`. This replaced the stream of log messages announcing each task submission with a view that updates dynamically, giving an overview of each process. This gives an overview of the pipeline’s progress rather than being swamped with thousands of individual task submissions.\n\n
\n \"Nextflow\n
\n\nANSI console output. Nextflow log output from running the nf-core/rnaseq pipeline before (Left) and after (Right) enabling ANSI logging.\n\n
\n
\n\nI can be a little obsessive about tool user interfaces. The nf-core template, as well as MultiQC and nf-core/tools all have coloured terminal output, mostly using the excellent [textualize/rich](https://github.com/Textualize/rich). I’ve also written a couple of general-use tools around this such as [ewels/rich-click](https://github.com/ewels/rich-click/) for Python CLI help texts, and [ewels/rich-codex](https://github.com/ewels/rich-codex) to auto-generate screenshots from code / commands in markdown. The problem with being surrounded by so much colored CLI output is that any tools _without_ colors start to stand out. Dropping hints to the Nextflow team didn’t work, so eventually I whipped up [a proposal](https://github.com/nextflow-io/nextflow/issues/3976) of what the console output could look like using the tools I knew: Python and Rich. Paolo knows me well and [offered up a bait](https://github.com/nextflow-io/nextflow/issues/3976#issuecomment-1568071479) that I couldn’t resist: _\"Phil. I think this a great opportunity to improve your Groovy skills 😆\"._\n\n## Showing what’s important\n\nThe console output shown by Nextflow describes a range of information. Much of it aligns in vertical columns, but not all. There’s also a variety of fields, some of which are more important than others to see at a glance.\n\n
\n \"New\n
\n\nIntroducing: colored console output. Output from running nf-core/rnaseq with the new colors applied (nf-core header removed for clarity).\n\n
\n
\n\nWith some judicious use of the `dim` style, we can make less important information fade into the background. For example, the \"stem\" of the fully qualified process identifiers now step back to allow the process name to stand out. Secondary information such as the number of tasks that were cached, or the executor that is being submitted to, are still there to see but take a back seat. Doing the reverse with some `bold` text helps to highlight the run name – key information for identifying and resuming pipeline runs. Using color allows different fields to be easily distinguished, such as process labels and task hashes. Greens, blues, and reds in the task statuses allow a reader to get an impression of the run progress without needing to read every number.\n\nProbably the most difficult aspect technically was the `NEXTFLOW` header line. I knew I wanted to use the _\"Nextflow Green\"_ here, or as close to it as possible. But colors in the terminal are tricky. What the ANSI standard defines as `green`, `black`, and `blue` can vary significantly across different systems and terminal themes. Some people use a light color scheme and others run in dark mode. This hadn’t mattered much for most of the colors up until this point - I could use the [Jansi](https://github.com/fusesource/jansi) library to use named colors and they should look ok. But for the specific RGB of the _\"Nextflow Green\"_ I had to [hardcode specific ANSI control characters](https://github.com/nextflow-io/nextflow/blob/c9c7032c2e34132cf721ffabfea09d893adf3761/modules/nextflow/src/main/groovy/nextflow/cli/CmdRun.groovy#L379-L389). But it got worse - it turns out that the default Terminal app that ships with macOS only supports 256 colors, so I had to find the closest match (_\"light sea green\"_ if you’re curious). Even once the green was ok, using `black` as the text color meant that it would actually render as white with some terminal color themes and be unreadable. In the end, the header text is a very dark gray.\n\n
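To make that concrete, this is roughly what such control sequences look like from a shell; the RGB values and palette index below are illustrative, not the exact brand colour:\n\n```bash\n# 24-bit truecolor escape: ESC[48;2;R;G;Bm sets an exact RGB background\nprintf '\\e[48;2;13;192;157m\\e[38;2;40;40;40m NEXTFLOW \\e[0m'\n\n# 256-colour fallback for terminals without truecolor support, such as the default macOS Terminal\nprintf '\\e[48;5;78m\\e[38;5;235m NEXTFLOW \\e[0m'\n```\n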
\n \"Testing\n
\n\nTesting color rendering across a wide range of themes in the OS X Terminal app.\n\n
\n
\n\n## More than just colors\n\nWhilst the original intent was focused on using color, it didn’t take long to come up with a shortlist of other niggles that I wanted to fix. I took this project as an opportunity to address a few of these, specifically:\n\n- Make the most of the available width in the terminal:\n - Redundant text is now cut down when the screen is narrow. Specifically the repeated `process >` text, plus other small gains such as replacing the three `...` characters with a single `…` character. The percentage-complete is removed if the window is really narrow. These changes happen dynamically every time the screen refreshes, so should update if you resize the terminal window.\n- Be more selective about which part of process names are truncated:\n - There’s only so much width that can be saved, and fully qualified process names are long. The current Nextflow console output truncates the end of the identifier if there’s no space, but this is the part that varies most between pipeline steps. Instead, we can truncate the start and preserve the process name and label.\n- Don’t show all pending processes without tasks:\n - The existing ANSI logging shows _all_ processes in the pipeline, even those that haven’t had any tasks submitted. If a pipeline has a lot of processes this can push the running processes out of view.\n - Nextflow now tracks the number of available rows in the terminal and hides pending processes once we run out of space. Running processes are always printed.\n\nThe end result is console output that makes the most of the available space in your terminal window:\n\n
\n \"Nextflow\n
\n\nProgress of the nf-core/rnaseq shown across 3 different terminal-width breakpoints, with varying levels of text truncation.\n\n
\n
\n\n## Contributing to Nextflow\n\nDespite building tools that use Nextflow for many years, I’ve spent relatively little time venturing into the main codebase myself. Just as with any contributor, part of the challenge was figuring out how to build Nextflow, how to navigate its code structure and how to write tests. I found it quite a fun experience, so I described and demoed the process in a recent nf-core Bytesize talk titled \"[Contributing to Nextflow](https://nf-co.re/events/2024/bytesize_nextflow_dev)\". You can watch the talk on [YouTube](https://www.youtube.com/watch?v=R0fqk5OS-nw), where I explain the mechanics of forking Nextflow, enhancing, compiling, and testing changes locally, and contributing enhancements back to the main code base.\n\n
\n \n
\n\n## But wait, there’s more!\n\nI’m happy with how the new console output looks, and it seems to have been well received so far. But once the warm glow of the newly merged pull request started to subside, I realized there was more to do. The console output is great for monitoring a running pipeline, but I spend most of my time these days digging through much more verbose `.nextflow.log` files. Suddenly it seemed a little unfair that these didn’t also benefit from a similar treatment.\n\nThis project was a little different because the logs are just files on the disk, meaning that I could approach the problem with whatever code stack I liked. Coincidentally, [Will McGugan](https://github.com/willmcgugan) (author of [textualize/rich](https://github.com/Textualize/rich)) was recently [writing about](https://textual.textualize.io/blog/2024/02/11/file-magic-with-the-python-standard-library/) a side project of his own: [Toolong](https://github.com/textualize/toolong). This is a terminal app built using [Textual](https://www.textualize.io/) which is specifically aimed at viewing large log files. I took it for a spin and it did a great job with Nextflow log files right out of the box, but I figured that I could take it further. At its core, Toolong uses the [Rich](https://github.com/textualize/rich) library to format text and so with a little hacking, I was able to introduce a handful of custom formatters for the Nextflow logs. And voilà, we have colored console output for log files too!\n\n
\n \"Formatting\n
\n\nThe tail end of a `.nextflow.log` file, rendered with `less` (Left) and Toolong (Right). Try finding the warning log message in both!\n\n
\n
\n\nBy using Toolong as a viewer we get much more than just syntax highlighting too - it provides powerful file navigation and search functionality. It also supports tailing files in real time, so you can launch a pipeline in one window and tail the log in another to have the best of both worlds!\n\n
\n \n
\n\nRunning nf-core/rnaseq with the new Nextflow coloured console output (Left) whilst simultaneously tailing the `.nextflow.log` file using `nf-core log` (Right).\n\n
\n
\n\nThis work with Toolong is still in two [open](https://github.com/Textualize/toolong/pull/47) [pull requests](https://github.com/nf-core/tools/pull/2895) as I write this, but hopefully you’ll soon be able to use the `nf-core log` command in a directory where you’ve run Nextflow, and it’ll launch Toolong with any log files it finds.\n", + "images": [ + "/img/blog-nextflow-colored-logs/nextflow_log_with_without_ansi.png", + "/img/blog-nextflow-colored-logs/nextflow_coloured_logs.png", + "/img/blog-nextflow-colored-logs/testing_terminal_themes.png", + "/img/blog-nextflow-colored-logs/nextflow_console_varying_widths.png", + "/img/blog-nextflow-colored-logs/nextflow_logs_side_by_side.png" + ] + }, + { + "slug": "2024/nextflow-nf-core-ancient-env-dna", + "title": "Application of Nextflow and nf-core to ancient environmental eDNA", + "date": "2024-04-17T00:00:00.000Z", + "content": "\nAncient environmental DNA (eDNA) is currently a hot topic in archaeological, ecological, and metagenomic research fields. Recent eDNA studies have shown that authentic ‘ancient’ DNA can be recovered from soil and sediments even as far back as 2 million years ago(1). However, as with most things metagenomics (the simultaneous analysis of the entire DNA content of a sample), there is a need to work at scale, processing the large datasets of many sequencing libraries to ‘fish’ out the tiny amounts of temporally degraded ancient DNA from amongst a huge swamp of contaminating modern biomolecules.\n\n\n\nThis need to work at scale, while also conducting reproducible analyses to demonstrate the authenticity of ancient DNA, lends itself to the processing of DNA with high-quality pipelines and open source workflow managers such as Nextflow. In this context, I was invited to the Australian Center for Ancient DNA (ACAD) at the University of Adelaide in February 2024 to co-teach a graduate-level course on ‘Hands-on bioinformatics for ancient environmental DNA’, alongside other members of the ancient eDNA community. Workshop participants included PhD students from across Australia, New Zealand, and even from as far away as Estonia.\n\n
\n \"Mentor\n © Photo: Peter Mundy and Australian Center for Ancient DNA\n
\n\nWe began the five-day workshop with an overview of the benefits of using workflow managers and pipelines in academic research, which include efficiency, portability, reproducibility, and fault-tolerance, and we then proceeded to introduce the Ph.D. students to installing Nextflow, and configure pipelines for running on different types of computing infrastructure.\n\n
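In practice, much of that configuration boils down to choosing the right profiles when launching a pipeline; the commands below are illustrative, with a hypothetical institutional profile name:\n\n```bash\n# Small bundled test dataset, software provisioned through Singularity\nnextflow run nf-core/eager -profile test,singularity --outdir results\n\n# The same pipeline on an institutional cluster, using a site-specific profile\nnextflow run nf-core/eager -profile my_cluster --input samplesheet.tsv --outdir results\n```\n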
\n \"Review\n © Photo: Peter Mundy and Australian Center for Ancient DNA\n
\n\nOver the next two days, I then introduced two well-established nf-core pipelines: [nf-core/eager](https://nf-co.re/eager) (2) and [nf-core/mag](https://nf-co.re/mag) (3), and explained to students how these pipelines can be applied to various aspects of environmental metagenomic and ancient DNA analysis:\nnf-core/eager is a dedicated ‘swiss-army-knife’ style pipeline for ancient DNA analysis that performs genetic data preprocessing, genomic alignment, variant calling, and metagenomic screening with specific tools and parameters to account for the characteristics of degraded DNA.\nnf-core/mag is a best-practice pipeline for metagenomic de novo assembly of microbial genomes that performs preprocessing, assembly, binning, bin-refinement and validation. It also contains a specific subworkflow for the authentication of ancient contigs.\nIn both cases, the students of the workshops were given practical tasks to set up and run both pipelines on real data, and time was spent exploring the extensive nf-core documentation and evaluating the outputs from MultiQC, both important components that contribute to the quality of nf-core pipelines.\n\nThe workshop was well received by students, and many were eager (pun intended) to start running Nextflow and nf-core pipelines on their own data at their own institutions.\n\nI would like to thank Vilma Pérez at ACAD for the invitation to contribute to the workshop as well as Mikkel Winther Pedersen for being my co-instructor, and the nf-core community for continued support in the development of the pipelines. Thank you also to Tina Warinner for proof-reading this blog post, and I would like to acknowledge [ACAD](https://www.adelaide.edu.au/acad/), the [University of Adelaide Environment Institute](https://www.adelaide.edu.au/environment/), the [Werner Siemens-Stiftung](https://www.wernersiemens-stiftung.ch/), [Leibniz HKI](https://www.leibniz-hki.de/), and [MPI for Evolutionary Anthropology](https://www.eva.mpg.de) for financial support to attend the workshop and support in developing nf-core pipelines.\n\n---\n\n(1) Kjær, K.H., Winther Pedersen, M., De Sanctis, B. et al. A 2-million-year-old ecosystem in Greenland uncovered by environmental DNA. Nature **612**, 283–291 (2022). [https://doi.org/10.1038/s41586-022-05453-y](https://doi.org/10.1038/s41586-022-05453-y)\n\n(2) Fellows Yates, J.A., Lamnidis, T.C., Borry, M., Andrades Valtueña, A., Fagernäs, Z., Clayton, S., Garcia, M.U., Neukamm, J., Peltzer, A.. Reproducible, portable, and efficient ancient genome reconstruction with nf-core/eager. PeerJ 9:10947 (2021) [http://doi.org/10.7717/peerj.10947](http://doi.org/10.7717/peerj.10947)\n\n(3) Krakau, S., Straub, D., Gourlé, H., Gabernet, G., Nahnsen, S., nf-core/mag: a best-practice pipeline for metagenome hybrid assembly and binning, NAR Genomics and Bioinformatics, **4**:1 (2022) [https://doi.org/10.1093/nargab/lqac007](https://doi.org/10.1093/nargab/lqac007)\n", + "images": [ + "/img/blog-2024-04-17-img1a.jpg", + "/img/blog-2024-04-17-img1b.jpg" + ] + }, + { + "slug": "2024/nf-schema", + "title": "nf-schema: the new and improved nf-validation", + "date": "2024-05-01T00:00:00.000Z", + "content": "\nCheck out Nextflow's newest plugin, nf-schema! It's an enhanced version of nf-validation, utilizing JSON schemas to validate parameters and sample sheets. Unlike its predecessor, it supports the latest JSON schema draft and can convert pipeline-generated files. 
But what's the story behind its development?\n\n\n\n`nf-validation` is a well-known Nextflow plugin that uses JSON schemas to validate parameters and sample sheets. It can also convert sample sheets to channels using a built-in channel factory. On top of that, it can create a nice summary of pipeline parameters and can even be used to generate a help message for the pipeline.\n\nAll of this has made the plugin very popular in the Nextflow community, but it wasn’t without its issues. For example, the plugin uses an older version of the JSON schema draft, namely draft `07` while the latest draft is `2020-12`. It also can’t convert any files/sample sheets created by the pipeline itself since the channel factory is only able to access values from pipeline parameters.\n\nBut then `nf-schema` came to the rescue! In this plugin we rewrote large parts of the `nf-validation` code, making the plugin way faster and more flexible while adding a lot of requested features. Let’s see what’s been changed in this new and improved version of `nf-validation`.\n\n# What a shiny new JSON schema draft\n\nTo quote the official JSON schema website:\n\n> “JSON Schema is the vocabulary that enables JSON data consistency, validity, and interoperability at scale.”\n\nThis one sentence does an excellent job of explaining what JSON schema is and why it was such a great fit for `nf-validation` and `nf-schema`. By using these schemas, we can validate pipeline inputs in a way that would otherwise be impossible. The JSON schema drafts define a set of annotations that are used to set some conditions to which the data has to adhere. In our case, this can be used to determine what a parameter or sample sheet value should look like (this can range from what type of data it has to be to a specific pattern that the data has to follow).\n\nThe JSON schema draft `07` already has a lot of useful annotations, but it lacked some special annotations that could elevate our validations to the next level. That’s where the JSON schema draft `2020-12` came in. This draft contained a lot more specialized annotations, like dependent requirements of values (if one value is set, another value also has to be set). Although this example was already possible in `nf-validation`, it was poorly implemented and didn’t follow any consensus specified by the JSON schema team.\n\n
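For example, the dependent-requirement case mentioned above has a dedicated keyword, `dependentRequired`, in the newer draft. A minimal sketch (the field names are illustrative):\n\n```json\n{\n    \"$schema\": \"https://json-schema.org/draft/2020-12/schema\",\n    \"type\": \"object\",\n    \"properties\": {\n        \"fastq_1\": { \"type\": \"string\" },\n        \"fastq_2\": { \"type\": \"string\" }\n    },\n    \"dependentRequired\": {\n        \"fastq_2\": [\"fastq_1\"]\n    }\n}\n```\n\nWith this schema, a sample sheet row that sets `fastq_2` without also providing `fastq_1` fails validation.\n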
\n \"meme\n
\n\n# Bye-bye Channel Factory, hello Function\n\nOne major shortcoming in the `nf-validation` plugin was the lack of the `fromSamplesheet` channel factory to handle files created by the pipeline (or files imported from another pipeline as part of a meta pipeline). That’s why we decided to remove the `fromSamplesheet` channel factory and replace it with a function called `samplesheetToList` that can be deployed in an extremely flexible way. It takes two inputs: the sample sheet to be validated and converted, and the JSON schema used for the conversion. Both inputs can either be a `String` value containing the path to the files or a Nextflow `file` object. By converting the channel factory to a function, we also decoupled the parameter schema from the actual sample sheet conversion. This means all validation and conversion of the sample sheet is now fully done by the `samplesheetToList` function. In `nf-validation`, you could add a relative path to another JSON schema to the parameter schema so that the plugin would validate the file given with that parameter using the supplied JSON schema. It was necessary to also add this for sample sheet inputs as they would not be validated otherwise. Due to the change described earlier, the schema should no longer be given to the sample sheet inputs because they will be validated twice that way. Last, but certainly not least, this function also introduces the possibility of using nested sample sheets. This was probably one of the most requested features and it’s completely possible right now! Mind that this feature only works for YAML and JSON sample sheets since CSV and TSV do not support nesting.\n\n# Configuration sensation\n\nIn `nf-validation`, you could configure how the plugin worked by certain parameters (like `validationSchemaIgnoreParams`, which could be used to exempt certain parameters from the validation). These parameters have now been converted to proper configuration options under the `validation` scope. The `validationSchemaIgnoreParams` has even been expanded into two configuration options: `validation.ignoreParams` and `validation.defaultIgnoreParams`. The former is to be used by the pipeline user to exclude certain parameters from validation, while the latter is to be used by the pipeline developer to set which parameters should be ignored by default. The plugin combines both options so users no longer need to supply the defaults alongside their parameters that need to be ignored.\n\n# But, why not stick to nf-validation?\n\nIn February we released an earlier version of these changes as `nf-validation` version `2.0.0`. This immediately caused massive issues in quite some nf-core pipelines (I think I set a new record of how many pipelines could be broken by one release). This was due to the fact that a lot of pipelines didn’t pin the `nf-validation` version, so all these pipelines started pulling the newest version of `nf-validation`. The pipelines all started showing errors because this release contained breaking changes. For that reason we decided to remove the version `2.0.0` release until more pipelines pinned their plugin versions.\n\n
\n \"meme\n
\n\nSome discussion arose from this and we decided that version `2.0.0` would always cause issues since a lot of older versions of the nf-core pipelines didn’t pin their nf-validation version either, which would mean that all those older versions (that were probably running as production pipelines) would suddenly start breaking. That’s why there seemed to be only one sensible solution: make a new plugin with the breaking changes! And it would also need a new name. We started collecting feedback from the community and got some very nice suggestions. I made a poll with the 5 most popular suggestions and let everyone vote on their preferred options. The last place was tied between `nf-schemavalidator` and `nf-validationutils`, both with 3 votes. In third place was `nf-checker` with 4 votes. The second place belonged to `nf-validation2` with 7 votes. And with 13 votes we had a winner: `nf-schema`!\n\nSo, a fork was made of `nf-validation` that we called `nf-schema`. At this point, the only breaking change was the new JSON schema draft, but some other feature requests started pouring in. That’s the reason why the new `samplesheetToList` function and the configuration options were implemented before the first release of `nf-schema` on the 22nd of April 2024.\n\nAnd to try and mitigate the same issue from ever happening again, we added an automatic warning when the pipeline is being run with an unpinned version of nf-schema:\n\n
\n \"meme\n
\n\n# So, what’s next?\n\nOne of the majorly requested features is the support for nested parameters. The version `2.0.0` already was getting pretty big so I decided not to implement any extra features into it. This is, however, one of the first features that I will try to tackle in version `2.1.0`.\n\nFurthermore, I’d also like to improve the functionality of the `exists` keyword to also work for non-conventional paths (like s3 and azure paths).\n\nIt’s also a certainty that some weird bugs will pop up over time, those will, of course, also be fixed.\n\n# Useful links\n\nHere are some useful links to get you started on using `nf-schema`:\n\nIf you want to easily migrate from nf-validation to `nf-schema`, you can use the migration guide: https://nextflow-io.github.io/nf-schema/latest/migration_guide/\n\nIf you are completely new to the plugin I suggest reading through the documentation: https://nextflow-io.github.io/nf-schema/latest/\n\nIf you need some examples, look no further: https://github.com/nextflow-io/nf-schema/tree/master/examples\n\nAnd to conclude this blog post, here are some very wise words from Master Yoda himself:\n\n
\n \"meme\n
\n", + "images": [ + "/img/blog-2024-05-01-nfschema-img1a.jpg", + "/img/blog-2024-05-01-nfschema-img1b.jpg", + "/img/blog-2024-05-01-nfschema-img1c.png", + "/img/blog-2024-05-01-nfschema-img1d.jpg" + ] + }, + { + "slug": "2024/nf-test-in-nf-core", + "title": "Leveraging nf-test for enhanced quality control in nf-core", + "date": "2024-04-03T00:00:00.000Z", + "content": "\n# The ever-changing landscape of bioinformatics\n\nReproducibility is an important attribute of all good science. This is especially true in the realm of bioinformatics, where software is **hopefully** being updated, and pipelines are **ideally** being maintained. Improvements and maintenance are great, but they also bring about an important question: Do bioinformatics tools and pipelines continue to run successfully and produce consistent results despite these changes? Fortunately for us, there is an existing approach to ensure software reproducibility: testing.\n\n\n\n# The Wonderful World of Testing\n\n> \"Software testing is the process of evaluating and verifying that a software product does what it is supposed to do,\"\n> Lukas Forer, co-creator of nf-test.\n\nSoftware testing has two primary purposes: determining whether an operation continues to run successfully after changes are made, and comparing outputs across runs to see if they are consistent. Testing can alert the developer that an output has changed so that an appropriate fix can be made. Admittedly, there are some instances when altered outputs are intentional (i.e., improving a tool might lead to better, and therefore different, results). However, even in these scenarios, it is important to know what has changed, so that no unintentional changes are introduced during an update.\n\n# Writing effective tests\n\nAlthough having any test is certainly better than having no tests at all, there are several considerations to keep in mind when adding tests to pipelines and/or tools to maximize their effectiveness. These considerations can be broadly categorized into two groups:\n\n1. Which inputs/functionalities should be tested?\n2. What contents should be tested?\n\n## Consideration 1: Testing inputs/functionality\n\nGenerally, software will have a default or most common use case. For instance, the nf-core [FastQC](https://nf-co.re/modules/fastqc) module is commonly used to assess the quality of paired-end reads in FastQ format. However, this is not the only way to use the FastQC module. Inputs can also be single-end/interleaved FastQ files, BAM files, or can contain reads from multiple samples. Each input type is analyzed differently by FastQC, and therefore, to increase your test coverage ([\"the degree to which a test or set of tests exercises a particular program or system\"](https://www.geeksforgeeks.org/test-design-coverage-in-software-testing/)), a test should be written for each possible input. Additionally, different settings can change how a process is executed. For example, in the [bowtie2/align](https://nf-co.re/modules/bowtie2_align) module, aside from input files, the `save_unaligned` and `sort_bam` parameters can alter how this module functions and the outputs it generates. Thus, tests should be written for each possible scenario. When writing tests, aim to consider as many variations as possible. If some are missed, don't worry! Additional tests can be added later. 
Discovering these different use cases and how to address/test them is part of the development process.\n\n## Consideration 2: Testing outputs\n\nOnce test cases are established, the next step is determining what specifically should be evaluated in each test. Generally, these evaluations are referred to as assertions. Assertions can range from verifying whether a job has been completed successfully to comparing the output channel/file contents between runs. Ideally, tests should incorporate all outputs, although there are scenarios where this is not feasible (for example, outputs containing timestamps or paths). In such cases, it's often best to include at least a portion of the contents from the problematic file or, at the minimum, the name of the file to ensure that it is consistently produced.\n\n# Testing in nf-core\n\nnf-core is a community-driven initiative that aims to provide high-quality, Nextflow-based bioinformatics pipelines. The community's emphasis on reproducibility makes testing an essential aspect of the nf-core ecosystem. Until recently, tests were implemented using pytest for modules/subworkflows and test profiles for pipelines. These tests ensured that nf-core components could run successfully following updates. However, at the pipeline level, they did not check file contents to evaluate output consistency. Additionally, using two different testing approaches lacked the standardization nf-core strives for. An ideal test framework would integrate tests at all Nextflow development levels (functions, modules, subworkflows, and pipelines) and comprehensively test outputs.\n\n# New and Improved Nextflow Testing with nf-test\n\nCreated by [Lukas Forer](https://github.com/lukfor) and [Sebastian Schönherr](https://github.com/seppinho), nf-test has emerged as the leading solution for testing Nextflow pipelines. Their goal was to enhance the evaluation of reproducibility in complex Nextflow pipelines. To this end, they have implemented several notable features, creating a robust testing platform:\n\n1. **Comprehensive Output Testing**: nf-test employs [snapshots](https://www.nf-test.com/docs/assertions/snapshots/) for handling complex data structures. This feature evaluates the contents of any specified output channel/file, enabling comprehensive and reliable tests that ensure data integrity following changes.\n2. **A Consistent Testing Framework for All Nextflow Components**: nf-test provides a unified framework for testing everything from individual functions to entire pipelines, ensuring consistency across all components.\n3. **A DSL for Tests**: Designed in the likeness of Nextflow, nf-test's intuitive domain-specific language (DSL) uses 'when' and 'then' blocks to describe expected behaviors in pipelines, facilitating easier test script writing.\n4. **Readable Assertions**: nf-test offers a wide range of functions for writing clear and understandable [assertions](https://www.nf-test.com/docs/assertions/assertions/), improving the clarity and maintainability of tests.\n5. **Boilerplate Code Generation**: To accelerate the testing process, nf-test and nf-core tools feature commands that generate boilerplate code, streamlining the development of new tests.\n\n# But wait… there's more!\n\nThe merits of having a consistent and comprehensive testing platform are significantly amplified with nf-test's integration into nf-core. This integration provides an abundance of resources for incorporating nf-test into your Nextflow development. 
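\n\nFor readers who have not seen the DSL yet, a module-level test is a small Groovy file along these lines (the process name, input, and paths here are hypothetical):\n\n```groovy\nnextflow_process {\n\n    name \"Test FASTQC\"\n    script \"../main.nf\"\n    process \"FASTQC\"\n\n    test(\"Single-end reads\") {\n\n        when {\n            process {\n                \"\"\"\n                input[0] = [ [ id:'test' ], file('test_1.fastq.gz') ]\n                \"\"\"\n            }\n        }\n\n        then {\n            assert process.success\n            assert snapshot(process.out).match()\n        }\n    }\n}\n```\n\nRunning `nf-test test` executes the process with that input, records a snapshot of its output channels, and fails the test if a later run produces different results.\n\n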
Thanks to this collaboration, you can utilize common nf-test commands via nf-core tools and easily install nf-core modules/subworkflows that already have nf-test implemented. Moreover, an [expanding collection of examples](https://nf-co.re/docs/contributing/tutorials/nf-test_assertions) is available to guide you through adopting nf-test for your projects.\n\n# Adding nf-test to pipelines\n\nSeveral nf-core pipelines have begun to adopt nf-test as their testing framework. Among these, [nf-core/methylseq](https://nf-co.re/methylseq/) was the first to implement pipeline-level nf-tests as a proof-of-concept. However, since this initial implementation, nf-core maintainers have identified that the existing nf-core pipeline template needs modifications to better support nf-test. These adjustments aim to enhance compatibility with nf-test across components (modules, subworkflows, workflows) and ensure that tests are included and shipped with each component. A more detailed blog post about these changes will be published in the future.\nFollowing these insights, [nf-core/fetchngs](https://nf-co.re/fetchngs) has been at the forefront of incorporating nf-test for testing modules, subworkflows, and at the pipeline level. Currently, fetchngs serves as the best-practice example for nf-test implementation within the nf-core community. Other nf-core pipelines actively integrating nf-test include [mag](https://nf-co.re/mag), [sarek](https://nf-co.re/sarek), [readsimulator](https://nf-co.re/readsimulator), and [rnaseq](https://nf-co.re/rnaseq).\n\n# Pipeline development with nf-test\n\n**For newer nf-core pipelines, integrating nf-test as early as possible in the development process is highly recommended**. An example of a pipeline that has benefitted from the incorporation of nf-tests throughout its development is [phageannotator](https://github.com/nf-core/phageannotator). Although integrating nf-test during pipeline development has presented challenges, it has offered a unique opportunity to evaluate different testing methodologies and has been instrumental in identifying numerous development errors that might have been overlooked using the previous test profiles approach. Additionally, investing time early on has significantly simplified modifying different aspects of the pipeline, ensuring that functionality and output remain unaffected.\nFor those embarking on creating new Nextflow pipelines, here are a few key takeaways from our experience:\n\n1. **Leverage nf-core modules/subworkflows extensively**. Devoting time early to contribute modules/subworkflows to nf-core not only streamlines future development for you and your PR reviewers but also simplifies maintaining, linting, and updating pipeline components through nf-core tools. Furthermore, these modules will likely benefit others in the community with similar research interests.\n2. **Prioritize incremental changes over large overhauls**. Incremental changes are almost always preferable to large, unwieldy modifications. This approach is particularly beneficial when monitoring and updating nf-tests at the module, subworkflow, and pipeline levels. Introducing too many changes simultaneously can overwhelm both developers and reviewers, making it challenging to track what has been modified and what requires testing. Aim to keep changes straightforward and manageable.\n3. **Facilitate parallel execution of nf-test to generate and test snapshots**. 
By default, nf-test runs each test sequentially, which can make the process of running multiple tests to generate or updating snapshots time-consuming. Implementing scripts that allow tests to run in parallel—whether via a workload manager or in the cloud—can significantly save time and simplify the process of monitoring tests for pass or fail outcomes.\n\n# Community and contribution\n\nnf-core is a community that relies on consistent contributions, evaluation, and feedback from its members to improve and stay up-to-date. This holds true as we transition to a new testing framework as well. Currently, there are two primary ways that people have been contributing in this transition:\n\n1. **Adding nf-tests to new and existing nf-core modules/subworkflows**. There has been a recent emphasis on migrating modules/subworkflows from pytest to nf-test because of the advantages mentioned previously. Fortunately, the nf-core team has added very helpful [instructions](https://nf-co.re/docs/contributing/modules#migrating-from-pytest-to-nf-test) to the website, which has made this process much more streamlined.\n2. **Adding nf-tests to nf-core pipelines**. Another area of focus is the addition of nf-tests to nf-core pipelines. This process can be quite difficult for large, complex pipelines, but there are now several examples of pipelines with nf-tests that can be used as a blueprint for getting started ([fetchngs](https://github.com/nf-core/fetchngs/tree/master), [sarek](https://github.com/nf-core/sarek/tree/master), [rnaseq](https://github.com/nf-core/rnaseq/tree/master), [readsimulator](https://github.com/nf-core/readsimulator/tree/master), [phageannotator](https://github.com/nf-core/phageannotator)).\n\n> These are great areas to work on & contribute in nf-core hackathons\n\nThe nf-core community added a significant number of nf-tests during the recent [hackathon in March 2024](https://nf-co.re/events/2024/hackathon-march-2024). Yet the role of the community is not limited to adding test code. A robust testing infrastructure requires nf-core users to identify testing errors, additional test cases, and provide feedback so that the system can continually be improved. Each of us brings a different perspective, and the development-feedback loop that results from collaboration brings about a much more effective, transparent, and inclusive system than if we worked in isolation.\n\n# Future directions\n\nLooking ahead, nf-core and nf-test are poised for tighter integration and significant advancements. Anticipated developments include enhanced testing capabilities, more intuitive interfaces for writing and managing tests, and deeper integration with cloud-based resources. These improvements will further solidify the position of nf-core and nf-test at the forefront of bioinformatics workflow management.\n\n# Conclusion\n\nThe integration of nf-test within the nf-core ecosystem marks a significant leap forward in ensuring the reproducibility and reliability of bioinformatics pipelines. By adopting nf-test, developers and researchers alike can contribute to a culture of excellence and collaboration, driving forward the quality and accuracy of bioinformatics research.\n\nSpecial thanks to everyone in the #nf-test channel in the nf-core slack workspace for their invaluable contributions, feedback, and support throughout this adoption. 
We are immensely grateful for your commitment and look forward to continuing our productive collaboration.\n", + "images": [] + }, + { + "slug": "2024/nxf-nf-core-workshop-kogo", + "title": "Nextflow workshop at the 20th KOGO Winter Symposium", + "date": "2024-03-14T00:00:00.000Z", + "content": "\nThrough a partnership between AWS Asia Pacific and Japan, and Seqera, Nextflow touched ground in South Korea for the first time with a training session at the Korea Genome Organization (KOGO) Winter Symposium. The objective was to introduce participants to Nextflow, empowering them to craft their own pipelines. Recognizing the interest among bioinformaticians, MinSung Cho from AWS Korea’s Healthcare & Research Team decided to sponsor this 90-minute workshop session. This initiative covered my travel expenses and accommodations.\n\n\n\n
\n  [Image: /img/blog-2024-03-14-kogo-img1a.jpg]\n
\n\nThe training commenced with an overview of Nextflow pipelines, exemplified by the [nf-core/nanoseq](https://nf-co.re/nanoseq/3.1.0) Nextflow pipeline, highlighting the subworkflows and modules. nfcore/nanoseq is a bioinformatics analysis pipeline for Nanopore DNA/RNA sequencing data that can be used to perform base-calling, demultiplexing, QC, alignment, and downstream analysis. Following this, participants engaged in a hands-on workshop using the AWS Cloud9 environment. In 70 minutes, they constructed a basic pipeline for analyzing nanopore sequencing data, incorporating workflow templates, modules, and subworkflows from [nf-core/tools](https://github.com/nf-core/tools). If you're interested in learning more about the nf-core/nanoseq Nextflow pipeline, I recorded a video talking about it in the nf-core bytesize meeting. You can watch it [here](https://www.youtube.com/watch?v=KM1A0_GD2vQ).\n\n
\n  [Image: /img/blog-2024-03-14-kogo-img1b.png]\n
\n\nYou can find the workshop slides [here](https://docs.google.com/presentation/d/1OC4ccgbrNet4e499ShIT7S6Gm6S0xr38_OauKPa4G88/edit?usp=sharing) and the GitHub repository with source code [here](https://github.com/yuukiiwa/nf-core-koreaworkshop).\n\nThe workshop received positive feedback, with participants expressing interest in further sessions to deepen their Nextflow proficiency. Due to this feedback, AWS and the nf-core outreach team are considering organizing small-group local or Zoom training sessions in response to these requests.\n\nIt is imperative to acknowledge the invaluable contributions and support from AWS Korea’s Health Care & Research Team, including MinSung Cho, HyunMin Kim, YoungUng Kim, SeungChang Kang, and Jiyoon Hwang, without whom this workshop would not have been possible. Gratitude is also extended to Charlie Lee for fostering collaboration with the nf-core/outreach team.\n", + "images": [ + "/img/blog-2024-03-14-kogo-img1a.jpg", + "/img/blog-2024-03-14-kogo-img1b.png" + ] + }, + { + "slug": "2024/optimizing-nextflow-for-hpc-and-cloud-at-scale", + "title": "Optimizing Nextflow for HPC and Cloud at Scale", + "date": "2024-01-17T00:00:00.000Z", + "content": "\n## Introduction\n\nA Nextflow workflow run consists of the head job (Nextflow itself) and compute tasks (defined in the pipeline script). It is common to request resources for the tasks via process directives such as `cpus` and `memory`, but the Nextflow head job also requires compute resources. Most of the time, users don’t need to explicitly define the head job resources, as Nextflow generally does a good job of allocating resources for itself. For very large workloads, however, head job resource sizing becomes much more important.\n\nIn this article, we will help you understand how the Nextflow head job works and show you how to tune head job resources such as CPUs and memory for your use case.\n\n\n\n## Head job resources\n\n### CPUs\n\nNextflow uses a thread pool to run native Groovy code (e.g. channel operators, `exec` processes), submit tasks to executors, and publish output files. The number of threads is based on the number of available CPUs, so if you want to provide more compute power to the head job, simply allocate more CPUs and Nextflow will use them. In the [Seqera Platform](https://seqera.io/platform/), you can use **Head Job CPUs** or **Head Job submit options** (depending on the compute environment) to allocate more CPUs.\n\n### Memory\n\nNextflow runs on the Java Virtual Machine (JVM), so it allocates memory based on the standard JVM options, specifically the initial and maximum heap size. 
You can view the default JVM options for your environment by running this command:\n\n```bash\njava -XX:+PrintFlagsFinal -version | grep 'HeapSize\\|RAM'\n```\n\nFor example, here are the JVM options for an environment with 8 GB of RAM and OpenJDK Temurin 17.0.6:\n\n```\n size_t ErgoHeapSizeLimit = 0\n size_t HeapSizePerGCThread = 43620760\n size_t InitialHeapSize = 127926272\n uintx InitialRAMFraction = 64\n double InitialRAMPercentage = 1.562500\n size_t LargePageHeapSizeThreshold = 134217728\n size_t MaxHeapSize = 2044723200\n uint64_t MaxRAM = 137438953472\n uintx MaxRAMFraction = 4\n double MaxRAMPercentage = 25.000000\n size_t MinHeapSize = 8388608\n uintx MinRAMFraction = 2\n double MinRAMPercentage = 50.000000\n uintx NonNMethodCodeHeapSize = 5839372\n uintx NonProfiledCodeHeapSize = 122909434\n uintx ProfiledCodeHeapSize = 122909434\n size_t SoftMaxHeapSize = 2044723200\n```\n\nThese settings (displayed in bytes) show an initial and maximum heap size of ~128MB and ~2GB, or 1/64 (1.5625%) and 1/4 (25%) of physical memory. These percentages are the typical default settings, although different environments may have different defaults. In the Seqera Platform, the default settings are 40% and 75%, respectively.\n\nYou can set these options for Nextflow at runtime, for example:\n\n```bash\n# absolute values\nexport NXF_JVM_ARGS=\"-Xms2g -Xmx6g\"\n\n# percentages\nexport NXF_JVM_ARGS=\"-XX:InitialRAMPercentage=25 -XX:MaxRAMPercentage=75\"\n```\n\nIf you need to provide more memory to Nextflow, you can (1) allocate more memory to the head job and/or (2) use `NXF_JVM_ARGS` to increase the percentage of available memory that Nextflow can use. In the Seqera Platform, you can use **Head Job memory** or **Head Job submit options** (depending on the compute environment) to allocate more memory.\n\n### Disk\n\nThe Nextflow head job is generally responsible for downloading software dependencies and transferring inputs and outputs, but the details vary depending on the environment:\n\n- In an HPC environment, the home directory is typically used to store pipeline code and container images, while the work directory is typically stored in high-performance shared storage. Within the work directory, task inputs are staged from previous tasks via symlinks. Remote inputs (e.g. from HTTP or S3) are first staged into the work directory and then symlinked into the task directory.\n- In a cloud environment like AWS Batch, each task is responsible for pulling its own container image, downloading input files from the work directory (e.g. in S3), and uploading outputs. The head job’s local storage is only used to download the pipeline code.\n\nOverall, the head job uses very little local storage, since most data is saved to shared storage (HPC) or object storage (cloud) rather than the head job itself. However, there are a few specific cases to keep in mind, which we will cover in the following section.\n\n## Common failure modes\n\n### Not enough CPUs for local tasks\n\nIf your workflow has any tasks that use the local executor, make sure the Nextflow head job has enough CPUs to execute these tasks. For example, if a local task requires 4 CPUs, the Nextflow head job should have at least 5 CPUs (the local executor reserves 1 CPU for Nextflow by default).\n\n### Not enough memory for native pipeline code\n\nNextflow pipelines are a combination of native Groovy code (channels, operators, `exec` processes) and embedded shell scripts (`script` processes). 
Native code is executed directly by the Nextflow head job, while tasks with shell scripts are delegated to executors. Typically, tasks are used to perform the “actual” computations, while channels and operators are used to pass data between tasks.\n\nHowever much Groovy code you write, keep in mind that the Nextflow head job needs to have enough memory to execute it at the desired scale. The simplest way to determine how much memory Nextflow needs is to iteratively allocate more memory to the head job until it succeeds (e.g. start with 1 GB, then 2 GB, then 4 GB, and so on). In general, 2-4 GB is more than enough memory for the Nextflow head job.\n\n### Not enough memory to stage and publish files\n\nIn Nextflow, input files can come from a variety of sources: local files, an HTTP or FTP server, an S3 bucket, etc. When an input file is not local, Nextflow automatically stages the file into the work directory. Similarly, when a `publishDir` directive points to a remote path, Nextflow automatically “publishes” the output files using the correct protocol. These transfers are usually performed in-memory.\n\nMany users have encountered head job errors when running large-scale workloads, where the head job runs out of memory while staging or publishing files. While you can try to give more and more memory to Nextflow as in the previous example, you might be able to fix your problem by simply updating your Nextflow version. There have been many improvements to Nextflow over the past few years around file staging, particularly with S3, and overall we have seen fewer out-of-memory errors of this kind.\n\n### Not enough disk storage to build Singularity images\n\nSingularity / Apptainer can download and convert Docker images on the fly, and it uses the head job’s local scratch storage to do so. This is a common pattern in HPC environments, since container images are usually published as Docker images but HPC environments usually require the use of a rootless container runtime like Singularity. In this case, make sure the head job has enough scratch storage to build each image, even if the image is eventually saved to shared storage.\n\nSince Nextflow version [23.10.0](https://github.com/nextflow-io/nextflow/releases/tag/v23.10.0), you can use [Wave](https://seqera.io/wave/) to build Singularity images for you. Refer to the [Nextflow documentation](https://nextflow.io/docs/latest/wave.html#build-singularity-native-images) for more details.\n\nAdditionally, Nextflow version [23.11.0-edge](https://github.com/nextflow-io/nextflow/releases/tag/v23.11.0-edge) introduced support for [Singularity OCI mode](https://docs.sylabs.io/guides/3.1/user-guide/oci_runtime.html), which allows Singularity / Apptainer to use the OCI container format (the same as Docker) instead of having to build and store a SIF container image locally.\n\n### Failures due to head job and tasks sharing local storage\n\nThere are some situations where the head job and tasks may run on the same node and thereby share the node’s local storage, for example, Kubernetes. If this storage becomes full, any one of the jobs might fail first, including the head job. You can avoid this problem by segregating the head job to its own node, or explicitly requesting disk storage for each task so that they each have sufficient storage.\n\n## Virtual threads\n\n[Virtual threads](https://www.infoq.com/articles/java-virtual-threads/) were introduced in Java 19 and finalized in Java 21. 
Whereas threads in Java are normally “platform” threads managed by the operating system, “virtual” threads are user-space threads that share a pool of platform threads. Virtual threads use less memory and can be context-switched faster than platform threads, so an application that uses a fixed-size pool of platform threads (e.g. one thread per CPU) could instead have thousands of virtual threads (one thread per “task”) with the same memory footprint and more flexibility – if a virtual thread is blocked (i.e. waiting on I/O), the underlying platform thread can be switched to another virtual thread that isn’t blocked.\n\nSince Nextflow [23.05.0-edge](https://github.com/nextflow-io/nextflow/releases/tag/v23.05.0-edge), you can enable virtual threads by using Java 19 or later and setting the `NXF_ENABLE_VIRTUAL_THREADS` environment variable to `true`. Since version [23.10.0](https://github.com/nextflow-io/nextflow/releases/tag/v23.10.0), when using Java 21, virtual threads are enabled by default.\n\n### Initial Benchmark: S3 Upload\n\nVirtual threads are particularly useful when there are many I/O-bound tasks, such as uploading many files to S3. So to demonstrate this benefit, we wrote a pipeline… that uploads many files to S3! Here is the core pipeline code:\n\n```groovy\nparams.upload_count = 1000\nparams.upload_size = '10M'\n\nprocess make_random_file {\n publishDir 's3://my-bucket/data/'\n\n input:\n val index\n val size\n\n output:\n path '*.data'\n\n script:\n \"\"\"\n dd \\\n if=/dev/random \\\n of=upload-${size}-${index}.data \\\n bs=1 count=0 seek=${size}\n \"\"\"\n}\n\nworkflow {\n index = Channel.of(1..params.upload_count)\n make_random_file(index, params.upload_size)\n}\n```\n\nThe full source code is available on [GitHub](https://github.com/bentsherman/nf-head-job-benchmark).\n\nWe ran this pipeline across a variety of file sizes and counts, and the results are shown below. Error bars denote +/- 1 standard deviation across three independent trials.\n\nAt larger scales, virtual threads significantly reduce the total runtime, at the cost of higher CPU and memory usage. Considering that the head job resources are typically underutilized anyway, we think the lower time-to-solution is a decent trade!\n\nThe reason why virtual threads are faster in this case is that Nextflow usually spends extra time waiting for files to be published after all tasks have completed. Normally, these publishing tasks are executed by a fixed-size thread pool based on the number of CPUs, but with virtual threads there is no such limit, so Nextflow can fully utilize the available network bandwidth. In the largest case (1000x 100 MB files), virtual threads reduce the runtime by over 30%.\n\n
\n[Figure 1: CPU usage (/img/blog-2024-01-17--s3-upload-cpu.png)]\n\n[Figure 2: Memory usage (/img/blog-2024-01-17--s3-upload-memory.png)]\n\n[Figure 3: Workflow runtime (/img/blog-2024-01-17--s3-upload-walltime.png)]\n
\n\n### Realistic Benchmark: nf-core/rnaseq\n\nTo evaluate virtual threads on a real pipeline, we also ran [nf-core/rnaseq](https://github.com/nf-core/rnaseq) with the `test` profile. To simulate a run with many samples, we upsampled the test dataset to 1000 samples. The results are summarized below:\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
| | Walltime | Memory |\n|---|---|---|\n| Platform threads | 2h 51m | 1.5 GB |\n| Virtual threads | 2h 47m | 1.9 GB |
\n\nAs you can see, the benefit here is not so clear. Whereas the upload benchmark was almost entirely I/O, a typical Nextflow pipeline spends most of its time scheduling compute tasks and waiting for them to finish. These tasks are generally not I/O bound and do not block for very long, so there may be little opportunity for improvement from virtual threads.\n\nThat being said, this benchmark consisted of only two runs of nf-core/rnaseq. We didn’t perform more runs here because they were so large, so your results may vary. In particular, if your Nextflow runs spend a lot of time publishing outputs after all the compute tasks have completed, you will likely benefit the most from using virtual threads. In any case, virtual threads should perform at least as well as platform threads, albeit with higher memory usage in some cases.\n\n## Summary\n\nThe key to right-sizing the Nextflow head job is to understand which parts of a Nextflow pipeline are executed directly by Nextflow, and which parts are delegated to compute tasks. This knowledge will help prevent head job failures at scale.\n\nHere are the main takeaways:\n\n- Nextflow uses a thread pool based on the number of available CPUs.\n- Nextflow uses a maximum heap size based on the standard JVM options, which is typically 25% of physical memory (75% in the Seqera Platform).\n- You can use `NXF_JVM_ARGS` to make more system memory available to Nextflow.\n- The easiest way to figure out how much memory Nextflow needs is to iteratively double the memory allocation until the workflow succeeds (but usually 2-4 GB is enough).\n- You can enable virtual threads in Nextflow, which may reduce overall runtime for some pipelines.\n", + "images": [ + "/img/blog-2024-01-17--s3-upload-cpu.png", + "/img/blog-2024-01-17--s3-upload-memory.png", + "/img/blog-2024-01-17--s3-upload-walltime.png" + ] + }, + { + "slug": "2024/reflecting-ambassador-collaboration", + "title": "Reflecting on a Six-Month Collaboration: Insights from a Nextflow Ambassador", + "date": "2024-06-19T00:00:00.000Z", + "content": "\nAs a Nextflow Ambassador and a PhD student working in bioinformatics, I’ve always believed in the power of collaboration. Over the past six months, I’ve had the privilege of working with another PhD student specializing in metagenomics environmental science. This collaboration began through a simple email after the other researcher discovered my contact information on the ambassadors’ list page. It has been a journey of learning, problem-solving, and mutual growth. I’d like to share some reflections on this experience, highlighting both the challenges and the rewards.\n\n\n\n## Connecting Across Disciplines\n\nOur partnership began with a simple question about running one of nf-core’s metagenomics analysis pipelines. Despite being in different parts of Europe and coming from different academic backgrounds, we quickly found common ground. The combination of our expertise – my focus on bioinformatics workflows and their deep knowledge of microbial ecosystems – created a synergy that enriched our work.\n\n## Navigating Challenges Together\n\nLike any collaboration, ours was not without its difficulties. We faced numerous technical challenges, from optimizing computational resources to troubleshooting pipeline errors. There were moments of frustration when things didn’t work as expected. However, each challenge was an opportunity to learn and grow. Working through these challenges together made them much more manageable and even enjoyable at times. 
We focused on mastering Nextflow in a high-performance computing (HPC) environment, managing large datasets, and conducting comprehensive data analysis. Additionally, we explored effective data visualization techniques to better interpret and present the findings.\nWe leaned heavily on the Nextflow and nf-core community for support. The extensive documentation and guides were invaluable, and the different Slack channels provided real-time problem-solving assistance. Having the possibility of contacting the main developers of the pipeline that was troubling was a great resource that we are fortunate to have. The community’s willingness to share and offer help was a constant source of encouragement, making us feel supported every step of the way.\n\n## Learning and Growing\n\nOver the past six months, we’ve both learned a tremendous amount. The other PhD student became more adept at using and understanding Nextflow, particularly when running the nf-core/ampliseq pipeline, managing files, and handling high-performance computing (HPC) environments. I, on the other hand, gained a deeper understanding of environmental microbiomes and the specific needs of metagenomics research.\nOur sessions were highly collaborative, allowing us to share knowledge and insights freely. It was reassuring to know that we weren’t alone in our journey and that there was a whole community of researchers ready to share their wisdom and experiences. These interactions made our learning process more rewarding.\n\n## Achieving Synergy\n\nOne of the most rewarding aspects of this collaboration has been the synergy between our different backgrounds. Our combined expertise enabled us to efficiently analyze a high volume of metagenomics samples. The journey does not stop here, of course. Now that they have their samples processed, it comes the time to interpret the data, one of my favorite parts. Our work together highlighted the potential for Nextflow and the nf-core community to facilitate research across diverse fields. The collaboration has been a testament to the idea that when individuals from different disciplines come together, they can achieve more than they could alone.\nThis collaboration is poised to result in significant academic contributions. The other PhD student is preparing to publish a paper with the findings enabled by the use of the nf-core/ampliseq pipeline, which will be a key component of their thesis. This paper is going to serve as an excellent example of using Nextflow and nf-core pipelines in the field of metagenomics environmental science.\n\n## Reflecting on the Journey\n\nAs I reflect on these six months, I’m struck by the power of this community in fostering such collaborations. The support network, comprehensive resources, and culture of knowledge sharing have been essential in our success. This experience has reinforced my belief in the importance of open-source bioinformatics and data science communities for professional development and scientific advancement. Through it all, having a collaborator who understood the struggles and celebrated the successes with me made the journey all the more rewarding.\nMoving forward, I’m excited about the potential for more such collaborations. The past six months have been a journey of discovery and growth, and I’m grateful for the opportunity to work with such a dedicated and talented researcher. 
Our work is far from over, and I look forward to continuing this journey, learning more, and contributing to the field of environmental science.\n\n## Join the Journey!\n\nFor those of you in the Nextflow community or considering joining, I encourage you to take advantage of the resources available. Engage with the community, attend webinars, and don’t hesitate to ask questions. Whether you’re a seasoned expert or a curious newcomer, the Nextflow family is here to support you. Together, we can achieve great things.\n", + "images": [] + }, + { + "slug": "2024/reflections-on-nextflow-mentorship", + "title": "One-Year Reflections on Nextflow Mentorship", + "date": "2024-04-10T00:00:00.000Z", + "content": "\nFrom December 2022 to March 2023, I was part of the second cohort of the Nextflow and nf-core mentorship program, which spanned four months and attracted participants globally. I could not have anticipated the extent to which my participation in this program and the associated learning experiences would positively change my professional growth.\nThe mentorship aims to foster collaboration, knowledge exchange, flexible learning, collaborative coding, and contributions to the nf-core community. It was funded by the Chan Zuckerberg Initiative and is guided by experienced mentors in the community.\nIn the upcoming paragraphs, I'll be sharing more details about the program—its structure, the valuable learning experiences it brought, and the exciting opportunities it opened up for me.\n\n\n\n# Meeting my mentor\n\nOne of the most interesting aspects of the mentorship is that the program emphasizes that mentor-mentee pairs share research interests. In addition, the mentor should have significant experience in the areas where the mentee wants to develop. I found this extremely valuable, as it makes the program very flexible while also considering individual goals and interests. My goal as a mentee was to transition from a **Nextflow user to a Nextflow developer**.\n\nI was lucky enough to have Matthias De Smet as a mentor. He is a member of the Center for Medical Genetics in Ghent and has extensive experience working with open-source projects such as nf-core and Bioconda. His experience working in clinical genomics was a common ground for us to communicate, share experiences and build effective collaboration.\n\nDuring my first days, he guided me to the most useful Nextflow resources available online, tailored to my goals. Then, I drafted a pipeline that I wanted to build and attempted to write my first lines of code in Nextflow. We communicated via Slack and Matthias reviewed and corrected my code via GitHub. He introduced me to the supportive nf-core community, to ask for help when needed, and to acknowledge every success along the way.\n\n
\n  [Image: /img/blog-2024-04-10-img1a.png]\n
\n\n# Highlights of the program\n\nWe decided to start small, setting step-by-step goals. Matthias suggested that a doable goal would be to create my first Nextflow module in the context of a broader pipeline I wanted to develop. A module is a building block that encapsulates a specific functionality or task within a workflow. We realized that the tool I wanted to modularize was not available as part of nf-core. The nf-core GitHub has a community-driven collection of Nextflow modules, subworkflows and pipelines for bioinformatics, providing standardized and well-documented modules. The goal, therefore, was to create a module for this missing tool and then submit it as a contribution to nf-core.\n\nFor those unfamiliar, contributing to nf-core requires another member of the community, usually a maintainer, to review your code. As a newcomer, I was obviously curious about how the process would be. In academia, where anonymity often prevails, feedback can occasionally be a bit stringent. Conversely, during my submission to the nf-core project, I was pleasantly surprised that reviewers look for collective improvement, providing quick, constructive and amicable reviews, leading to a positive environment.\n\n
\n  [Image: /img/blog-2024-04-10-img1b.png]\n
\n\nFor my final project in the mentorship program, I successfully ported a complete pipeline from Bash to Nextflow. This was a learning experience that allowed me to explore a diverse range of skills, such as modularizing content, understanding how crucial the meta map is, and creating Docker container images for software. This process not only enhanced my proficiency in Nextflow but also allowed me to interact with and contribute to related projects like Bioconda and BioContainers.\n\n# Life after the mentorship\n\nWith the skills I acquired during the mentorship as a mentee, I proposed and successfully implemented a custom solution in Nextflow for a precision medicine start-up I worked at the time that could sequentially do several diagnostics and consumer-genetics applications in the cloud, resulting in substantial cost savings and increasing flexibility for the company.\nBeyond my immediate projects, I joined a group actively developing an open-source Nextflow pipeline for genetic imputation. This project allowed me to be in close contact with members of the nf-core community working on similar projects, adding new tools to this pipeline, giving and receiving feedback, and continuing to improve my overall Nextflow skills while also contributing to the broader bioinformatics community. You can learn more about this project with the fantastic talk by Louis Le Nézet at Nextflow Summit 2023 [here](https://www.youtube.com/watch?v=GHb2Wt9VCOg).\n\nFinally, I was honored to become a Nextflow ambassador. The program’s goal is to extend the awareness of Nextflow around the world while also building a supportive community. In particular, the South American community is underrepresented, so I serve as a point of contact for any institution or newcomer who wants to implement pipelines with Nextflow.\nAs part of this program, I was invited to speak at the second Chilean Congress of Bioinformatics, where I gave a talk about how Nextflow and nf-core can support scaling bioinformatics projects in the cloud. It was incredibly rewarding to introduce Nextflow to a community for the first time and witness the genuine enthusiasm it sparks among students and attendees for the potential in their research projects.\n\n
\n  [Image: /img/blog-2024-04-10-img1c.png]\n
\n\n# What’s next?\n\nThe comprehensive skill set acquired in my journey proved to be incredibly valuable for my professional development and allowed me to join the ZS Discovery Team as a Senior Bioinformatician. This organization accelerates transformation in research and early development with direct contribution to impactful bioinformatics projects with a globally distributed, multidisciplinary talented team.\n\nIn addition, we organized a local site for the nf-core hackathon in March 2024, the first Nextflow Hackathon in Argentina, fostering a space to advance our skills in workflow management collectively. It was a pleasure to see how beginners got their first PRs approved and how they interacted with the nf-core community for the first time.\n\n
\n  [Image: /img/blog-2024-04-10-img1d.png]\n
\n\nMy current (and probably future!) day-to-day work involves working and developing pipelines with Nextflow, while also mentoring younger bioinformaticians into this language. The commitment to open-source projects remains a cornerstone of my journey and I am thankful that it has provided me the opportunity to collaborate with individuals from diverse backgrounds all over the world.\n\nWhether you're interested in the mentorship program, curious about the hackathon, or simply wish to connect, feel free to reach out at the nf-core Slack!\n", + "images": [ + "/img/blog-2024-04-10-img1a.png", + "/img/blog-2024-04-10-img1b.png", + "/img/blog-2024-04-10-img1c.png", + "/img/blog-2024-04-10-img1d.png" + ] + }, + { + "slug": "2024/training-local-site", + "title": "Nextflow Training: Bridging Online Learning with In-Person Connections", + "date": "2024-05-08T00:00:00.000Z", + "content": "\nNextflow and nf-core provide frequent community training events to new users, which offer an opportunity to get started using and understanding Nextflow, Groovy and nf-core. These events are live-streamed and are available for on-demand viewing on YouTube, but what if you could join friends in person and watch it live?\n\n\n\nLearning something new by yourself can be a daunting task. Having colleagues and friends go through the learning and discovering process alongside you can really enrich the experience and be a lot of fun! With that in mind, we decided to host a get-together for the fundamentals training streams in person. Anybody from the scientific community in and around Heidelberg who wanted to learn Nextflow was welcome to join.\n\nThis year, [Marcel Ribeiro-Dantas](https://twitter.com/mribeirodantas) and [Chris Hakkaart](https://twitter.com/Chris_Hakk) from Seqera held the training over two days, offering the first steps into the Nextflow universe (you can watch it [here](https://www.youtube.com/playlist?list=PL3xpfTVZLcNgLBGLAiY6Rl9fizsz-DTCT)). [Kübra Narcı](https://twitter.com/kubranarci) and [Florian Wünneman](https://twitter.com/flowuenne) hosted a local training site for the recent community fundamentals training in Heidelberg. Kübra is a Nextflow ambassador, working as a bioinformatician and using Nextflow to develop pipelines for the German Human Genome Phenome Archive (GHGA) project in her daily life. At the time, Florian was a Postdoc at the Institute of Computational Biomedicine with Denis Schapiro in Heidelberg, though he has since then joined Seqera as a Bioinformatics Engineer.\n\nWe advertised the event about a month beforehand in our local communities (genomics, transcriptomics, spatial omics among others) to give people enough time to decide whether they want to join. We had quite a bit of interest and a total of 15 people participated. The event took place at the Marsilius Arkaden at the University Clinic campus in Heidelberg. Participants brought their laptops and followed along with the stream, which we projected for everyone, so people could use their laptops exclusively for coding and did not have to switch between stream and coding environment.\n\n
\n  [Image: /img/blog-2024-05-06-training-img1a.jpg]\n\n\n  [Image: /img/blog-2024-05-06-training-img2a.jpg]\n
\n\nThe goal of this local training site was for everyone to follow the fundamentals training sessions on their laptop and be able to ask follow-up questions in person to the room. We also had a few experienced Nextflow users be there for support. There is a dedicated nf-core Slack channel during the training events for people to ask questions, which is a great tool for help. We also found that in-person discussions around topics that remained confusing to participants were really helpful for many people, as they could provide some more context and allow quick follow-up questions. During the course of the fundamentals training, we found ourselves naturally pausing the video and taking the time to discuss with the group. It was particularly great to see new users explaining concepts they just learned to each other.\n\nThis local training site was also an excellent opportunity for new Nextflow users in Heidelberg to get to know each other and make new connections before the upcoming nf-core hackathon, for which there was also a [local site](https://nf-co.re/events/2024/hackathon-march-2024/germany-heidelberg) organized in Heidelberg. It was a great experience to organize a smaller local event to learn Nextflow with the local community. We learned some valuable lessons from this experience, that we will apply for the next local Nextflow gatherings. Advertising a bit earlier will give people more time to spread the word, we would likely aim for 2 months in advance next time. Offering coffee during breaks can go a long way to keep people awake and motivated, so we would try to serve up some hot coffee next time. Finally, having a bit more in-depth introductions (maybe via short posts on a forum) of everyone joining could be an even better ice breaker to foster contacts and collaborations for the future.\n\nThe ability to join training sessions, bytesize talks, and other events from nf-core and Nextflow online is absolutely fantastic and enables the free dissemination of knowledge. However, the opportunity to join a group in person and work through the content together can really enrich the experience and bring people closer together.\n\nIf you're looking for a training opportunity, there will be one in Basel, Switzerland, on June 25 and another one in Cambridge, UK, on September 12. These and other events will be displayed in the [Seqera Events](https://seqera.io/events/) page when it gets closer to the dates of the events.\n\nWho knows, maybe you will meet someone interested in the same topic, a new collaborator or even a new friend in your local Nextflow community!\n", + "images": [ + "/img/blog-2024-05-06-training-img1a.jpg", + "/img/blog-2024-05-06-training-img2a.jpg" + ] + }, + { + "slug": "2024/welcome_ambassadors_20242", + "title": "Join us in welcoming the new Nextflow Ambassadors", + "date": "2024-07-10T00:00:00.000Z", + "content": "\nAs the second semester of 2024 kicks off, I am thrilled to welcome a new cohort of ambassadors to the Nextflow Ambassador Program. This vibrant group joins the dedicated ambassadors who are continuing their remarkable work from the previous semester. Together, they form a diverse and talented team, representing a variety of countries and backgrounds, encompassing both industry and academia.\n\n\n\n## A Diverse and Inclusive Cohort\n\nThis semester, I am proud to announce that our ambassadors hail from over 20 countries, reflecting the increasingly global reach and inclusive nature of the Nextflow community. 
There has historically been a strong presence of Nextflow in the US and Europe, so I would like to extend an especially warm welcome to all those in Asia and the global south who are joining us through the program, from countries such as Argentina, Chile, Brazil, Ghana, Tunisia, Nigeria, South Africa, India, Indonesia, Singapore, and Australia. From seasoned bioinformaticians to emerging data scientists, our ambassadors bring a wealth of expertise and unique perspectives to the program.\n\n## Industry and Academia Unite\n\nOne of the strengths of the Nextflow Ambassador Program is its ability to bridge the gap between industry and academia. This semester, we have an exciting mix of professionals from biotech companies, renowned research institutions, and leading universities. This synergy fosters a rich exchange of ideas, driving innovation and collaboration.\n\n## Spotlight on New Ambassadors\n\nI am particularly happy with this last call for ambassadors. Amazing people were selected, and I would like to highlight a few, though all of them are good additions to the team! For example, while Carson Miller, a PhD Candidate in the Department of Microbiology at the University of Washington, is new to the ambassador program, he has been making impactful contributions to the community for a long time. He hosted a local site for the nf-core Hackathon back in March, wrote a post to the Nextflow blog and has been very active in the nf-core community. The same can be said about Mahesh Binzer-Panchal, a Bioinformatician at NBIS, who has been very active in the community answering technical questions about Nextflow.\n\nThe previous round of ambassadors allowed us to achieve a broad global presence. However, some regions were more represented than others. I am especially thrilled to have new ambassadors in new regions of the globe, For example, Fadinda Shafira and Edwin Simjaya from Indonesia, AI Engineer and Head of AI at Kalbe, respectively. Prior to joining the program, they had already been strong advocates for Nextflow in Indonesia and had conducted Nextflow training sessions!\n\n## Continuing the Good Work\n\nI'm also delighted to see the continuing work of several dedicated ambassadors who have made significant contributions to the program. Abhinav Sharma, a Ph.D. Candidate at Stellenbosch University in South Africa, has been a key community contact in the African continent, and with the support we were able to provide him through the program, he was able to travel around Brazil and visit multiple research groups to advocate for Open Science, Nextflow, and nf-core. Similarly, Kübra Narcı, a bioinformatician at DKFZ in Germany, increased the awareness of [Nextflow in her home country, Türkiye](https://www.nextflow.io/blog/2024/bioinformatics-growth-in-turkiye.html), while also contributing to the [German research community](https://www.nextflow.io/blog/2024/training-local-site.html).\n\nThe program has been shown to welcome a variety of backgrounds and both new and long-time community members. Just last year, Anabella Trigila, a Senior Bioinformatician at ZS in Argentina, was a mentee in the Nextflow and nf-core mentorship program and has quickly become a [key member in Latin America](https://www.nextflow.io/blog/2024/reflections-on-nextflow-mentorship.html). 
Robert Petit, a Bioinformatician at the Wyoming Public Health Laboratory in the US, meanwhile, has been [a contributor for many years](https://www.nextflow.io/blog/2024/empowering-bioinformatics-mentoring.html) and keeps giving back to the community.\n\n## Where we are\n\n
\n  [Image: /img/blog-2024-07-10-img1a.png]\n
\n\n## Looking Ahead\n\nThe upcoming semester promises to be an exciting period of growth and innovation for the Nextflow Ambassador Program. Based on current plans, our ambassadors are set to make sure people worldwide know Nextflow and have all the support they need to use it to advance the field of computational biology, among others. I look forward to seeing the incredible work that will emerge from this talented group.\n\nWelcome, new and continuing ambassadors, to another inspiring semester! Together, we will continue to help push the boundaries of what's possible with Nextflow.\n\nStay tuned for more updates and follow our ambassadors' journeys on the Nextflow blog here and the [Nextflow's Twitter/X account](https://x.com/nextflowio).\n\n
\n[Image: /img/nextflow_ambassador_logo.svg]\n\nAmbassadors are passionate individuals who support the Nextflow community. Interested in becoming an ambassador? Read more about it here.\n
\n", + "images": [ + "/img/blog-2024-07-10-img1a.png", + "/img/nextflow_ambassador_logo.svg" + ] + } +] \ No newline at end of file diff --git a/internal/export.mjs b/internal/export.mjs new file mode 100644 index 00000000..22cc9f4f --- /dev/null +++ b/internal/export.mjs @@ -0,0 +1,49 @@ +import fs from 'fs'; +import path from 'path'; +import matter from 'gray-matter'; +import * as cheerio from 'cheerio'; + +const postsDirectory = path.join(process.cwd(), '../src/content/blog'); +const outputFile = path.join(process.cwd(), 'export.json'); + +function extractImageUrls(content) { + const $ = cheerio.load(content); + const images = []; + $('img').each((i, elem) => { + const src = $(elem).attr('src'); + if (src) images.push(src); + }); + return images; +} + +function getPostsRecursively(dir) { + let posts = []; + const items = fs.readdirSync(dir, { withFileTypes: true }); + + for (const item of items) { + const fullPath = path.join(dir, item.name); + + if (item.isDirectory()) { + posts = posts.concat(getPostsRecursively(fullPath)); + } else if (item.isFile() && item.name.endsWith('.md')) { + const fileContents = fs.readFileSync(fullPath, 'utf8'); + const { data, content } = matter(fileContents); + const images = extractImageUrls(content); + + posts.push({ + slug: path.relative(postsDirectory, fullPath).replace('.md', ''), + title: data.title, + date: data.date, + content: content, + images: images, + }); + } + } + + return posts; +} + +const posts = getPostsRecursively(postsDirectory); + +fs.writeFileSync(outputFile, JSON.stringify(posts, null, 2)); +console.log(`Exported ${posts.length} posts to ${outputFile}`); \ No newline at end of file diff --git a/internal/import.mjs b/internal/import.mjs new file mode 100644 index 00000000..59a967cc --- /dev/null +++ b/internal/import.mjs @@ -0,0 +1,75 @@ +import sanityClient from '@sanity/client'; +import fs from 'fs'; +import path from 'path'; +import axios from 'axios'; +import cheerio from 'cheerio'; + +const client = sanityClient({ + projectId: 'your-project-id', + dataset: 'your-dataset', + token: 'your-write-token', + useCdn: false, +}); + +const postsFile = path.join(process.cwd(), 'blog-posts.json'); +const posts = JSON.parse(fs.readFileSync(postsFile, 'utf8')); + +async function downloadImage(url) { + const response = await axios.get(url, { responseType: 'arraybuffer' }); + return Buffer.from(response.data, 'binary'); +} + +async function uploadImageToSanity(imageBuffer, filename) { + return client.assets.upload('image', imageBuffer, { filename }); +} + +async function replaceImageUrls(content, imageMap) { + const $ = cheerio.load(content); + $('img').each((i, elem) => { + const src = $(elem).attr('src'); + if (src && imageMap[src]) { + $(elem).attr('src', imageMap[src]); + } + }); + return $.html(); +} + +async function migratePosts() { + for (const post of posts) { + const imageMap = {}; + for (const imageUrl of post.images) { + try { + const imageBuffer = await downloadImage(imageUrl); + const filename = path.basename(imageUrl); + const uploadedImage = await uploadImageToSanity(imageBuffer, filename); + imageMap[imageUrl] = uploadedImage.url; + } catch (error) { + console.error(`Failed to process image: ${imageUrl}`, error); + } + } + + const updatedContent = await replaceImageUrls(post.content, imageMap); + + const sanityPost = { + _type: 'post', + title: post.title, + slug: { current: post.slug }, + publishedAt: new Date(post.date).toISOString(), + body: [ + { + _type: 'block', + children: [{ _type: 'span', text: updatedContent }], + }, 
+ ], + }; + + try { + const result = await client.create(sanityPost); + console.log(`Successfully migrated post: ${result.title}`); + } catch (error) { + console.error(`Failed to migrate post: ${post.title}`, error); + } + } +} + +migratePosts().then(() => console.log('Migration complete')); \ No newline at end of file diff --git a/package-lock.json b/package-lock.json index a72a4131..d6414ff0 100644 --- a/package-lock.json +++ b/package-lock.json @@ -14,6 +14,8 @@ "@shikijs/transformers": "^1.6.3", "astro": "^4.14.5", "astro-remark-description": "^1.1.2", + "cheerio": "^1.0.0", + "gray-matter": "^4.0.3", "remark-directive": "^3.0.0", "typescript": "^5.4.5" }, @@ -2293,6 +2295,11 @@ "node": ">=8" } }, + "node_modules/boolbase": { + "version": "1.0.0", + "resolved": "https://registry.npmjs.org/boolbase/-/boolbase-1.0.0.tgz", + "integrity": "sha512-JZOSA7Mo9sNGB8+UjSgzdLtokWAky1zbztM3WRLCbZ70/3cTANmQmOdR7y2g+J0e2WXywy1yS468tY+IruqEww==" + }, "node_modules/boxen": { "version": "7.1.1", "resolved": "https://registry.npmjs.org/boxen/-/boxen-7.1.1.tgz", @@ -2479,6 +2486,46 @@ "url": "https://github.com/sponsors/wooorm" } }, + "node_modules/cheerio": { + "version": "1.0.0", + "resolved": "https://registry.npmjs.org/cheerio/-/cheerio-1.0.0.tgz", + "integrity": "sha512-quS9HgjQpdaXOvsZz82Oz7uxtXiy6UIsIQcpBj7HRw2M63Skasm9qlDocAM7jNuaxdhpPU7c4kJN+gA5MCu4ww==", + "dependencies": { + "cheerio-select": "^2.1.0", + "dom-serializer": "^2.0.0", + "domhandler": "^5.0.3", + "domutils": "^3.1.0", + "encoding-sniffer": "^0.2.0", + "htmlparser2": "^9.1.0", + "parse5": "^7.1.2", + "parse5-htmlparser2-tree-adapter": "^7.0.0", + "parse5-parser-stream": "^7.1.2", + "undici": "^6.19.5", + "whatwg-mimetype": "^4.0.0" + }, + "engines": { + "node": ">=18.17" + }, + "funding": { + "url": "https://github.com/cheeriojs/cheerio?sponsor=1" + } + }, + "node_modules/cheerio-select": { + "version": "2.1.0", + "resolved": "https://registry.npmjs.org/cheerio-select/-/cheerio-select-2.1.0.tgz", + "integrity": "sha512-9v9kG0LvzrlcungtnJtpGNxY+fzECQKhK4EGJX2vByejiMX84MFNQw4UxPJl3bFbTMw+Dfs37XaIkCwTZfLh4g==", + "dependencies": { + "boolbase": "^1.0.0", + "css-select": "^5.1.0", + "css-what": "^6.1.0", + "domelementtype": "^2.3.0", + "domhandler": "^5.0.3", + "domutils": "^3.0.1" + }, + "funding": { + "url": "https://github.com/sponsors/fb55" + } + }, "node_modules/chokidar": { "version": "3.6.0", "resolved": "https://registry.npmjs.org/chokidar/-/chokidar-3.6.0.tgz", @@ -2752,6 +2799,32 @@ "node": ">= 8" } }, + "node_modules/css-select": { + "version": "5.1.0", + "resolved": "https://registry.npmjs.org/css-select/-/css-select-5.1.0.tgz", + "integrity": "sha512-nwoRF1rvRRnnCqqY7updORDsuqKzqYJ28+oSMaJMMgOauh3fvwHqMS7EZpIPqK8GL+g9mKxF1vP/ZjSeNjEVHg==", + "dependencies": { + "boolbase": "^1.0.0", + "css-what": "^6.1.0", + "domhandler": "^5.0.2", + "domutils": "^3.0.1", + "nth-check": "^2.0.1" + }, + "funding": { + "url": "https://github.com/sponsors/fb55" + } + }, + "node_modules/css-what": { + "version": "6.1.0", + "resolved": "https://registry.npmjs.org/css-what/-/css-what-6.1.0.tgz", + "integrity": "sha512-HTUrgRJ7r4dsZKU6GjmpfRK1O76h97Z8MfS1G0FozR+oF2kG6Vfe8JE6zwrkbxigziPHinCJ+gCPjA9EaBDtRw==", + "engines": { + "node": ">= 6" + }, + "funding": { + "url": "https://github.com/sponsors/fb55" + } + }, "node_modules/cssesc": { "version": "3.0.0", "resolved": "https://registry.npmjs.org/cssesc/-/cssesc-3.0.0.tgz", @@ -2850,6 +2923,57 @@ "resolved": "https://registry.npmjs.org/dlv/-/dlv-1.1.3.tgz", "integrity": 
"sha512-+HlytyjlPKnIG8XuRG8WvmBP8xs8P71y+SKKS6ZXWoEgLuePxtDoUEiH7WkdePWrQ5JBpE6aoVqfZfJUQkjXwA==" }, + "node_modules/dom-serializer": { + "version": "2.0.0", + "resolved": "https://registry.npmjs.org/dom-serializer/-/dom-serializer-2.0.0.tgz", + "integrity": "sha512-wIkAryiqt/nV5EQKqQpo3SToSOV9J0DnbJqwK7Wv/Trc92zIAYZ4FlMu+JPFW1DfGFt81ZTCGgDEabffXeLyJg==", + "dependencies": { + "domelementtype": "^2.3.0", + "domhandler": "^5.0.2", + "entities": "^4.2.0" + }, + "funding": { + "url": "https://github.com/cheeriojs/dom-serializer?sponsor=1" + } + }, + "node_modules/domelementtype": { + "version": "2.3.0", + "resolved": "https://registry.npmjs.org/domelementtype/-/domelementtype-2.3.0.tgz", + "integrity": "sha512-OLETBj6w0OsagBwdXnPdN0cnMfF9opN69co+7ZrbfPGrdpPVNBUj02spi6B1N7wChLQiPn4CSH/zJvXw56gmHw==", + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/fb55" + } + ] + }, + "node_modules/domhandler": { + "version": "5.0.3", + "resolved": "https://registry.npmjs.org/domhandler/-/domhandler-5.0.3.tgz", + "integrity": "sha512-cgwlv/1iFQiFnU96XXgROh8xTeetsnJiDsTc7TYCLFd9+/WNkIqPTxiM/8pSd8VIrhXGTf1Ny1q1hquVqDJB5w==", + "dependencies": { + "domelementtype": "^2.3.0" + }, + "engines": { + "node": ">= 4" + }, + "funding": { + "url": "https://github.com/fb55/domhandler?sponsor=1" + } + }, + "node_modules/domutils": { + "version": "3.1.0", + "resolved": "https://registry.npmjs.org/domutils/-/domutils-3.1.0.tgz", + "integrity": "sha512-H78uMmQtI2AhgDJjWeQmHwJJ2bLPD3GMmO7Zja/ZZh84wkm+4ut+IUnUdRa8uCGX88DiVx1j6FRe1XfxEgjEZA==", + "dependencies": { + "dom-serializer": "^2.0.0", + "domelementtype": "^2.3.0", + "domhandler": "^5.0.3" + }, + "funding": { + "url": "https://github.com/fb55/domutils?sponsor=1" + } + }, "node_modules/dset": { "version": "3.1.3", "resolved": "https://registry.npmjs.org/dset/-/dset-3.1.3.tgz", @@ -2890,6 +3014,18 @@ "resolved": "https://registry.npmjs.org/emoji-regex/-/emoji-regex-10.3.0.tgz", "integrity": "sha512-QpLs9D9v9kArv4lfDEgg1X/gN5XLnf/A6l9cs8SPZLRZR3ZkY9+kwIQTxm+fsSej5UMYGE8fdoaZVIBlqG0XTw==" }, + "node_modules/encoding-sniffer": { + "version": "0.2.0", + "resolved": "https://registry.npmjs.org/encoding-sniffer/-/encoding-sniffer-0.2.0.tgz", + "integrity": "sha512-ju7Wq1kg04I3HtiYIOrUrdfdDvkyO9s5XM8QAj/bN61Yo/Vb4vgJxy5vi4Yxk01gWHbrofpPtpxM8bKger9jhg==", + "dependencies": { + "iconv-lite": "^0.6.3", + "whatwg-encoding": "^3.1.1" + }, + "funding": { + "url": "https://github.com/fb55/encoding-sniffer?sponsor=1" + } + }, "node_modules/entities": { "version": "4.5.0", "resolved": "https://registry.npmjs.org/entities/-/entities-4.5.0.tgz", @@ -3446,6 +3582,24 @@ "url": "https://github.com/sponsors/wooorm" } }, + "node_modules/htmlparser2": { + "version": "9.1.0", + "resolved": "https://registry.npmjs.org/htmlparser2/-/htmlparser2-9.1.0.tgz", + "integrity": "sha512-5zfg6mHUoaer/97TxnGpxmbR7zJtPwIYFMZ/H5ucTlPZhKvtum05yiPK3Mgai3a0DyVxv7qYqoweaEd2nrYQzQ==", + "funding": [ + "https://github.com/fb55/htmlparser2?sponsor=1", + { + "type": "github", + "url": "https://github.com/sponsors/fb55" + } + ], + "dependencies": { + "domelementtype": "^2.3.0", + "domhandler": "^5.0.3", + "domutils": "^3.1.0", + "entities": "^4.5.0" + } + }, "node_modules/http-cache-semantics": { "version": "4.1.1", "resolved": "https://registry.npmjs.org/http-cache-semantics/-/http-cache-semantics-4.1.1.tgz", @@ -3459,6 +3613,17 @@ "node": ">=16.17.0" } }, + "node_modules/iconv-lite": { + "version": "0.6.3", + "resolved": "https://registry.npmjs.org/iconv-lite/-/iconv-lite-0.6.3.tgz", + 
"integrity": "sha512-4fCk79wshMdzMp2rH06qWrJE4iolqLhCUH+OiuIgU++RB0+94NlDL81atO7GX55uUKueo0txHNtvEyI6D7WdMw==", + "dependencies": { + "safer-buffer": ">= 2.1.2 < 3.0.0" + }, + "engines": { + "node": ">=0.10.0" + } + }, "node_modules/import-meta-resolve": { "version": "4.1.0", "resolved": "https://registry.npmjs.org/import-meta-resolve/-/import-meta-resolve-4.1.0.tgz", @@ -4805,6 +4970,17 @@ "url": "https://github.com/sponsors/sindresorhus" } }, + "node_modules/nth-check": { + "version": "2.1.1", + "resolved": "https://registry.npmjs.org/nth-check/-/nth-check-2.1.1.tgz", + "integrity": "sha512-lqjrjmaOoAnWfMmBPL+XNnynZh2+swxiX3WUE0s4yEHI6m+AwrK2UZOimIRl3X/4QctVqS8AiZjFqyOGrMXb/w==", + "dependencies": { + "boolbase": "^1.0.0" + }, + "funding": { + "url": "https://github.com/fb55/nth-check?sponsor=1" + } + }, "node_modules/onetime": { "version": "6.0.0", "resolved": "https://registry.npmjs.org/onetime/-/onetime-6.0.0.tgz", @@ -4982,6 +5158,29 @@ "url": "https://github.com/inikulin/parse5?sponsor=1" } }, + "node_modules/parse5-htmlparser2-tree-adapter": { + "version": "7.0.0", + "resolved": "https://registry.npmjs.org/parse5-htmlparser2-tree-adapter/-/parse5-htmlparser2-tree-adapter-7.0.0.tgz", + "integrity": "sha512-B77tOZrqqfUfnVcOrUvfdLbz4pu4RopLD/4vmu3HUPswwTA8OH0EMW9BlWR2B0RCoiZRAHEUu7IxeP1Pd1UU+g==", + "dependencies": { + "domhandler": "^5.0.2", + "parse5": "^7.0.0" + }, + "funding": { + "url": "https://github.com/inikulin/parse5?sponsor=1" + } + }, + "node_modules/parse5-parser-stream": { + "version": "7.1.2", + "resolved": "https://registry.npmjs.org/parse5-parser-stream/-/parse5-parser-stream-7.1.2.tgz", + "integrity": "sha512-JyeQc9iwFLn5TbvvqACIF/VXG6abODeB3Fwmv/TGdLk2LfbWkaySGY72at4+Ty7EkPZj854u4CrICqNk2qIbow==", + "dependencies": { + "parse5": "^7.0.0" + }, + "funding": { + "url": "https://github.com/inikulin/parse5?sponsor=1" + } + }, "node_modules/path-browserify": { "version": "1.0.1", "resolved": "https://registry.npmjs.org/path-browserify/-/path-browserify-1.0.1.tgz", @@ -5545,6 +5744,11 @@ "integrity": "sha512-AUNrbEUHeKY8XsYr/DYpl+qk5+aM+DChopnWOPEzn8YKzOhv4l2zH6LzZms3tOZP3wwdOyc0RmTciyi46HLIuA==", "devOptional": true }, + "node_modules/safer-buffer": { + "version": "2.1.2", + "resolved": "https://registry.npmjs.org/safer-buffer/-/safer-buffer-2.1.2.tgz", + "integrity": "sha512-YZo3K82SD7Riyi0E1EQPojLz7kpepnSQI9IyPbHHg1XXXevb5dJI7tpyN2ADxGcQbHG7vcyRHk0cbwqcQriUtg==" + }, "node_modules/sass-formatter": { "version": "0.7.9", "resolved": "https://registry.npmjs.org/sass-formatter/-/sass-formatter-0.7.9.tgz", @@ -5938,6 +6142,14 @@ "semver": "^7.3.8" } }, + "node_modules/undici": { + "version": "6.19.8", + "resolved": "https://registry.npmjs.org/undici/-/undici-6.19.8.tgz", + "integrity": "sha512-U8uCCl2x9TK3WANvmBavymRzxbfFYG+tAu+fgx3zxQy3qdagQqBLwJVrdyO1TBfUXvfKveMKJZhpvUYoOjM+4g==", + "engines": { + "node": ">=18.17" + } + }, "node_modules/undici-types": { "version": "5.26.5", "resolved": "https://registry.npmjs.org/undici-types/-/undici-types-5.26.5.tgz", @@ -6469,6 +6681,25 @@ "url": "https://github.com/sponsors/wooorm" } }, + "node_modules/whatwg-encoding": { + "version": "3.1.1", + "resolved": "https://registry.npmjs.org/whatwg-encoding/-/whatwg-encoding-3.1.1.tgz", + "integrity": "sha512-6qN4hJdMwfYBtE3YBTTHhoeuUrDBPZmbQaxWAqSALV/MeEnR5z1xd8UKud2RAkFoPkmB+hli1TZSnyi84xz1vQ==", + "dependencies": { + "iconv-lite": "0.6.3" + }, + "engines": { + "node": ">=18" + } + }, + "node_modules/whatwg-mimetype": { + "version": "4.0.0", + "resolved": 
"https://registry.npmjs.org/whatwg-mimetype/-/whatwg-mimetype-4.0.0.tgz", + "integrity": "sha512-QaKxh0eNIi2mE9p2vEdzfagOKHCcj1pJ56EEHGQOVxp8r9/iszLUUV7v89x9O1p/T+NlTM5W7jW6+cz4Fq1YVg==", + "engines": { + "node": ">=18" + } + }, "node_modules/which": { "version": "2.0.2", "resolved": "https://registry.npmjs.org/which/-/which-2.0.2.tgz", diff --git a/package.json b/package.json index 88dcad6b..27b2cbea 100644 --- a/package.json +++ b/package.json @@ -17,6 +17,8 @@ "@shikijs/transformers": "^1.6.3", "astro": "^4.14.5", "astro-remark-description": "^1.1.2", + "cheerio": "^1.0.0", + "gray-matter": "^4.0.3", "remark-directive": "^3.0.0", "typescript": "^5.4.5" }, From 6dc7c39d5620daa8dc2cca270685430fa8c6cbf9 Mon Sep 17 00:00:00 2001 From: Jake Broughton Date: Tue, 17 Sep 2024 17:46:34 +0200 Subject: [PATCH 02/21] Fix import script --- internal/import.mjs | 17 +-- package-lock.json | 313 +++++++++++++++++++++++++++++++++++++++++++- package.json | 2 + 3 files changed, 322 insertions(+), 10 deletions(-) diff --git a/internal/import.mjs b/internal/import.mjs index 59a967cc..159baabe 100644 --- a/internal/import.mjs +++ b/internal/import.mjs @@ -2,16 +2,16 @@ import sanityClient from '@sanity/client'; import fs from 'fs'; import path from 'path'; import axios from 'axios'; -import cheerio from 'cheerio'; +import * as cheerio from 'cheerio'; const client = sanityClient({ - projectId: 'your-project-id', - dataset: 'your-dataset', - token: 'your-write-token', + projectId: 'o2y1bt2g', + dataset: 'seqera', + token: process.env.SANITY_TOKEN, useCdn: false, }); -const postsFile = path.join(process.cwd(), 'blog-posts.json'); +const postsFile = path.join(process.cwd(), 'export.json'); const posts = JSON.parse(fs.readFileSync(postsFile, 'utf8')); async function downloadImage(url) { @@ -35,7 +35,8 @@ async function replaceImageUrls(content, imageMap) { } async function migratePosts() { - for (const post of posts) { + const p = [posts[0]] + for (const post of p) { const imageMap = {}; for (const imageUrl of post.images) { try { @@ -51,9 +52,9 @@ async function migratePosts() { const updatedContent = await replaceImageUrls(post.content, imageMap); const sanityPost = { - _type: 'post', + _type: 'blogPostDev', title: post.title, - slug: { current: post.slug }, + meta: { slug: { current: post.slug } }, publishedAt: new Date(post.date).toISOString(), body: [ { diff --git a/package-lock.json b/package-lock.json index d6414ff0..2ddeecef 100644 --- a/package-lock.json +++ b/package-lock.json @@ -11,9 +11,11 @@ "@astrojs/check": "^0.9.3", "@astrojs/rss": "^4.0.7", "@astrojs/sitemap": "^3.1.6", + "@sanity/client": "^6.21.3", "@shikijs/transformers": "^1.6.3", "astro": "^4.14.5", "astro-remark-description": "^1.1.2", + "axios": "^1.7.7", "cheerio": "^1.0.0", "gray-matter": "^4.0.3", "remark-directive": "^3.0.0", @@ -1714,6 +1716,30 @@ "win32" ] }, + "node_modules/@sanity/client": { + "version": "6.21.3", + "resolved": "https://registry.npmjs.org/@sanity/client/-/client-6.21.3.tgz", + "integrity": "sha512-oE2+4kKRTZhFCc4IIsojkzKF0jIhsSYSRxkPZjScZ1k/EQ3Y2tEcQYiKwvvotzaXoaWsIL3RTpulE+R4iBYiBw==", + "dependencies": { + "@sanity/eventsource": "^5.0.2", + "get-it": "^8.6.4", + "rxjs": "^7.0.0" + }, + "engines": { + "node": ">=14.18" + } + }, + "node_modules/@sanity/eventsource": { + "version": "5.0.2", + "resolved": "https://registry.npmjs.org/@sanity/eventsource/-/eventsource-5.0.2.tgz", + "integrity": "sha512-/B9PMkUvAlUrpRq0y+NzXgRv5lYCLxZNsBJD2WXVnqZYOfByL9oQBV7KiTaARuObp5hcQYuPfOAVjgXe3hrixA==", + "dependencies": { + 
"@types/event-source-polyfill": "1.0.5", + "@types/eventsource": "1.1.15", + "event-source-polyfill": "1.0.31", + "eventsource": "2.0.2" + } + }, "node_modules/@shikijs/core": { "version": "1.6.3", "resolved": "https://registry.npmjs.org/@shikijs/core/-/core-1.6.3.tgz", @@ -1782,6 +1808,24 @@ "resolved": "https://registry.npmjs.org/@types/estree/-/estree-1.0.5.tgz", "integrity": "sha512-/kYRxGDLWzHOB7q+wtSUQlFrtcdUccpfy+X+9iMBpHK8QLLhx2wIPYuS5DYtR9Wa/YlZAbIovy7qVdB1Aq6Lyw==" }, + "node_modules/@types/event-source-polyfill": { + "version": "1.0.5", + "resolved": "https://registry.npmjs.org/@types/event-source-polyfill/-/event-source-polyfill-1.0.5.tgz", + "integrity": "sha512-iaiDuDI2aIFft7XkcwMzDWLqo7LVDixd2sR6B4wxJut9xcp/Ev9bO4EFg4rm6S9QxATLBj5OPxdeocgmhjwKaw==" + }, + "node_modules/@types/eventsource": { + "version": "1.1.15", + "resolved": "https://registry.npmjs.org/@types/eventsource/-/eventsource-1.1.15.tgz", + "integrity": "sha512-XQmGcbnxUNa06HR3VBVkc9+A2Vpi9ZyLJcdS5dwaQQ/4ZMWFO+5c90FnMUpbtMZwB/FChoYHwuVg8TvkECacTA==" + }, + "node_modules/@types/follow-redirects": { + "version": "1.14.4", + "resolved": "https://registry.npmjs.org/@types/follow-redirects/-/follow-redirects-1.14.4.tgz", + "integrity": "sha512-GWXfsD0Jc1RWiFmMuMFCpXMzi9L7oPDVwxUnZdg89kDNnqsRfUKXEtUYtA98A6lig1WXH/CYY/fvPW9HuN5fTA==", + "dependencies": { + "@types/node": "*" + } + }, "node_modules/@types/hast": { "version": "3.0.4", "resolved": "https://registry.npmjs.org/@types/hast/-/hast-3.0.4.tgz", @@ -1820,6 +1864,14 @@ "undici-types": "~5.26.4" } }, + "node_modules/@types/progress-stream": { + "version": "2.0.5", + "resolved": "https://registry.npmjs.org/@types/progress-stream/-/progress-stream-2.0.5.tgz", + "integrity": "sha512-5YNriuEZkHlFHHepLIaxzq3atGeav1qCTGzB74HKWpo66qjfostF+rHc785YYYHeBytve8ZG3ejg42jEIfXNiQ==", + "dependencies": { + "@types/node": "*" + } + }, "node_modules/@types/sax": { "version": "1.2.7", "resolved": "https://registry.npmjs.org/@types/sax/-/sax-1.2.7.tgz", @@ -2264,6 +2316,21 @@ "@types/hast": "^3.0.4" } }, + "node_modules/asynckit": { + "version": "0.4.0", + "resolved": "https://registry.npmjs.org/asynckit/-/asynckit-0.4.0.tgz", + "integrity": "sha512-Oei9OH4tRh0YqU3GxhX79dM/mwVgvbZJaSNaRk+bshkj0S5cfHcgYakreBjrHwatXKbz+IoIdYLxrKim2MjW0Q==" + }, + "node_modules/axios": { + "version": "1.7.7", + "resolved": "https://registry.npmjs.org/axios/-/axios-1.7.7.tgz", + "integrity": "sha512-S4kL7XrjgBmvdGut0sN3yJxqYzrDOnivkBiN0OFs6hLiUam3UPvswUo0kqGyhqUZGEOytHyumEdXsAkgCOUf3Q==", + "dependencies": { + "follow-redirects": "^1.15.6", + "form-data": "^4.0.0", + "proxy-from-env": "^1.1.0" + } + }, "node_modules/axobject-query": { "version": "4.1.0", "resolved": "https://registry.npmjs.org/axobject-query/-/axobject-query-4.1.0.tgz", @@ -2758,6 +2825,17 @@ "integrity": "sha512-dOy+3AuW3a2wNbZHIuMZpTcgjGuLU/uBL/ubcZF9OXbDo8ff4O8yVp5Bf0efS8uEoYo5q4Fx7dY9OgQGXgAsQA==", "optional": true }, + "node_modules/combined-stream": { + "version": "1.0.8", + "resolved": "https://registry.npmjs.org/combined-stream/-/combined-stream-1.0.8.tgz", + "integrity": "sha512-FQN4MRfuJeHf7cBbBMJFXhKSDq+2kAArBlmRBvcvFE5BB1HZKXtSFASDhdlz9zOYwxh8lDdnvmMOe/+5cdoEdg==", + "dependencies": { + "delayed-stream": "~1.0.0" + }, + "engines": { + "node": ">= 0.8" + } + }, "node_modules/comma-separated-tokens": { "version": "2.0.3", "resolved": "https://registry.npmjs.org/comma-separated-tokens/-/comma-separated-tokens-2.0.3.tgz", @@ -2786,6 +2864,11 @@ "node": ">= 0.6" } }, + "node_modules/core-util-is": { + "version": "1.0.3", + 
"resolved": "https://registry.npmjs.org/core-util-is/-/core-util-is-1.0.3.tgz", + "integrity": "sha512-ZQBvi1DcpJ4GDqanjucZ2Hj3wEO5pZDS89BWbkcrvdxksJorwUDDZamX9ldFkp9aw2lmBDLgkObEA4DWNJ9FYQ==" + }, "node_modules/cross-spawn": { "version": "7.0.3", "resolved": "https://registry.npmjs.org/cross-spawn/-/cross-spawn-7.0.3.tgz", @@ -2865,6 +2948,28 @@ "url": "https://github.com/sponsors/wooorm" } }, + "node_modules/decompress-response": { + "version": "7.0.0", + "resolved": "https://registry.npmjs.org/decompress-response/-/decompress-response-7.0.0.tgz", + "integrity": "sha512-6IvPrADQyyPGLpMnUh6kfKiqy7SrbXbjoUuZ90WMBJKErzv2pCiwlGEXjRX9/54OnTq+XFVnkOnOMzclLI5aEA==", + "dependencies": { + "mimic-response": "^3.1.0" + }, + "engines": { + "node": ">=10" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, + "node_modules/delayed-stream": { + "version": "1.0.0", + "resolved": "https://registry.npmjs.org/delayed-stream/-/delayed-stream-1.0.0.tgz", + "integrity": "sha512-ZySD7Nf91aLB0RxL4KGrKHBXl7Eds1DAmEdcoVawXnLD7SDhpNgtuII2aAkg7a7QS41jxPSZ17p4VdGnMHk3MQ==", + "engines": { + "node": ">=0.4.0" + } + }, "node_modules/dequal": { "version": "2.0.3", "resolved": "https://registry.npmjs.org/dequal/-/dequal-2.0.3.tgz", @@ -3118,11 +3223,24 @@ "@types/estree": "^1.0.0" } }, + "node_modules/event-source-polyfill": { + "version": "1.0.31", + "resolved": "https://registry.npmjs.org/event-source-polyfill/-/event-source-polyfill-1.0.31.tgz", + "integrity": "sha512-4IJSItgS/41IxN5UVAVuAyczwZF7ZIEsM1XAoUzIHA6A+xzusEZUutdXz2Nr+MQPLxfTiCvqE79/C8HT8fKFvA==" + }, "node_modules/eventemitter3": { "version": "5.0.1", "resolved": "https://registry.npmjs.org/eventemitter3/-/eventemitter3-5.0.1.tgz", "integrity": "sha512-GWkBvjiSZK87ELrYOSESUYeVIc9mvLLf/nXalMOS5dYrgZq9o5OVkbZAVM06CVxYsCwH9BDZFPlQTlPA1j4ahA==" }, + "node_modules/eventsource": { + "version": "2.0.2", + "resolved": "https://registry.npmjs.org/eventsource/-/eventsource-2.0.2.tgz", + "integrity": "sha512-IzUmBGPR3+oUG9dUeXynyNmf91/3zUSJg1lCktzKw47OXuhco54U3r9B7O4XX+Rb1Itm9OZ2b0RkTs10bICOxA==", + "engines": { + "node": ">=12.0.0" + } + }, "node_modules/execa": { "version": "8.0.1", "resolved": "https://registry.npmjs.org/execa/-/execa-8.0.1.tgz", @@ -3271,6 +3389,38 @@ "node": ">=8" } }, + "node_modules/follow-redirects": { + "version": "1.15.9", + "resolved": "https://registry.npmjs.org/follow-redirects/-/follow-redirects-1.15.9.tgz", + "integrity": "sha512-gew4GsXizNgdoRyqmyfMHyAmXsZDk6mHkSxZFCzW9gwlbtOW44CDtYavM+y+72qD/Vq2l550kMF52DT8fOLJqQ==", + "funding": [ + { + "type": "individual", + "url": "https://github.com/sponsors/RubenVerborgh" + } + ], + "engines": { + "node": ">=4.0" + }, + "peerDependenciesMeta": { + "debug": { + "optional": true + } + } + }, + "node_modules/form-data": { + "version": "4.0.0", + "resolved": "https://registry.npmjs.org/form-data/-/form-data-4.0.0.tgz", + "integrity": "sha512-ETEklSGi5t0QMZuiXoA/Q6vcnxcLQP5vdugSpuAyi6SVGi2clPPp+xgEhuMaHC+zGgn31Kd235W35f7Hykkaww==", + "dependencies": { + "asynckit": "^0.4.0", + "combined-stream": "^1.0.8", + "mime-types": "^2.1.12" + }, + "engines": { + "node": ">= 6" + } + }, "node_modules/fsevents": { "version": "2.3.3", "resolved": "https://registry.npmjs.org/fsevents/-/fsevents-2.3.3.tgz", @@ -3312,6 +3462,23 @@ "url": "https://github.com/sponsors/sindresorhus" } }, + "node_modules/get-it": { + "version": "8.6.5", + "resolved": "https://registry.npmjs.org/get-it/-/get-it-8.6.5.tgz", + "integrity": 
"sha512-o1hjPwrb/icm3WJbCweTSq8mKuDfJlqwbFauI+Pdgid99at/BFaBXFBJZE+uqvHyOVARE4z680S44vrDm8SsCw==", + "dependencies": { + "@types/follow-redirects": "^1.14.4", + "@types/progress-stream": "^2.0.5", + "decompress-response": "^7.0.0", + "follow-redirects": "^1.15.6", + "is-retry-allowed": "^2.2.0", + "progress-stream": "^2.0.0", + "tunnel-agent": "^0.6.0" + }, + "engines": { + "node": ">=14.0.0" + } + }, "node_modules/get-stream": { "version": "8.0.1", "resolved": "https://registry.npmjs.org/get-stream/-/get-stream-8.0.1.tgz", @@ -3634,6 +3801,11 @@ "url": "https://github.com/sponsors/wooorm" } }, + "node_modules/inherits": { + "version": "2.0.4", + "resolved": "https://registry.npmjs.org/inherits/-/inherits-2.0.4.tgz", + "integrity": "sha512-k/vGaX4/Yla3WzyMCvTQOXYeIHvqOKtnqBduzTHpzpQZzAskKMhZ2K+EnBiSM9zGSoIFeMpXKxa4dYeZIQqewQ==" + }, "node_modules/is-alphabetical": { "version": "2.0.1", "resolved": "https://registry.npmjs.org/is-alphabetical/-/is-alphabetical-2.0.1.tgz", @@ -3787,6 +3959,17 @@ "url": "https://github.com/sponsors/sindresorhus" } }, + "node_modules/is-retry-allowed": { + "version": "2.2.0", + "resolved": "https://registry.npmjs.org/is-retry-allowed/-/is-retry-allowed-2.2.0.tgz", + "integrity": "sha512-XVm7LOeLpTW4jV19QSH38vkswxoLud8sQ57YwJVTPWdiaI9I8keEhGFpBlslyVsgdQy4Opg8QOLb8YRgsyZiQg==", + "engines": { + "node": ">=10" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, "node_modules/is-stream": { "version": "3.0.0", "resolved": "https://registry.npmjs.org/is-stream/-/is-stream-3.0.0.tgz", @@ -3823,6 +4006,11 @@ "url": "https://github.com/sponsors/sindresorhus" } }, + "node_modules/isarray": { + "version": "1.0.0", + "resolved": "https://registry.npmjs.org/isarray/-/isarray-1.0.0.tgz", + "integrity": "sha512-VLghIWNM6ELQzo7zwmcg0NmTVyWKYjvIeM83yjp0wRDTmUnrM678fQbcKBo6n2CJEF0szoG//ytg+TKla89ALQ==" + }, "node_modules/isexe": { "version": "2.0.0", "resolved": "https://registry.npmjs.org/isexe/-/isexe-2.0.0.tgz", @@ -4861,6 +5049,25 @@ "node": ">=8.6" } }, + "node_modules/mime-db": { + "version": "1.52.0", + "resolved": "https://registry.npmjs.org/mime-db/-/mime-db-1.52.0.tgz", + "integrity": "sha512-sPU4uV7dYlvtWJxwwxHD0PuihVNiE7TyAbQ5SWxDCB9mUYvOgroQOwYQQOKPJ8CIbE+1ETVlOoK1UC2nU3gYvg==", + "engines": { + "node": ">= 0.6" + } + }, + "node_modules/mime-types": { + "version": "2.1.35", + "resolved": "https://registry.npmjs.org/mime-types/-/mime-types-2.1.35.tgz", + "integrity": "sha512-ZDY+bPm5zTTF+YpCrAU9nK0UgICYPT0QtT1NZWFv4s++TNkcgVaT0g6+4R2uI4MjQjzysHB1zxuWL50hzaeXiw==", + "dependencies": { + "mime-db": "1.52.0" + }, + "engines": { + "node": ">= 0.6" + } + }, "node_modules/mimic-fn": { "version": "4.0.0", "resolved": "https://registry.npmjs.org/mimic-fn/-/mimic-fn-4.0.0.tgz", @@ -4872,6 +5079,17 @@ "url": "https://github.com/sponsors/sindresorhus" } }, + "node_modules/mimic-response": { + "version": "3.1.0", + "resolved": "https://registry.npmjs.org/mimic-response/-/mimic-response-3.1.0.tgz", + "integrity": "sha512-z0yWI+4FDrrweS8Zmt4Ej5HdJmky15+L2e6Wgn3+iK5fWzb6T3fhNFq2+MeTRb064c6Wr4N/wv0DzQTjNzHNGQ==", + "engines": { + "node": ">=10" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, "node_modules/mrmime": { "version": "2.0.0", "resolved": "https://registry.npmjs.org/mrmime/-/mrmime-2.0.0.tgz", @@ -5333,6 +5551,20 @@ "node": ">=6" } }, + "node_modules/process-nextick-args": { + "version": "2.0.1", + "resolved": "https://registry.npmjs.org/process-nextick-args/-/process-nextick-args-2.0.1.tgz", + "integrity": 
"sha512-3ouUOpQhtgrbOa17J7+uxOTpITYWaGP7/AhoR3+A+/1e9skrzelGi/dXzEYyvbxubEF6Wn2ypscTKiKJFFn1ag==" + }, + "node_modules/progress-stream": { + "version": "2.0.0", + "resolved": "https://registry.npmjs.org/progress-stream/-/progress-stream-2.0.0.tgz", + "integrity": "sha512-xJwOWR46jcXUq6EH9yYyqp+I52skPySOeHfkxOZ2IY1AiBi/sFJhbhAKHoV3OTw/omQ45KTio9215dRJ2Yxd3Q==", + "dependencies": { + "speedometer": "~1.0.0", + "through2": "~2.0.3" + } + }, "node_modules/prompts": { "version": "2.4.2", "resolved": "https://registry.npmjs.org/prompts/-/prompts-2.4.2.tgz", @@ -5362,6 +5594,11 @@ "url": "https://github.com/sponsors/wooorm" } }, + "node_modules/proxy-from-env": { + "version": "1.1.0", + "resolved": "https://registry.npmjs.org/proxy-from-env/-/proxy-from-env-1.1.0.tgz", + "integrity": "sha512-D+zkORCbA9f1tdWRK0RaCR3GPv50cMxcrz4X8k5LTSUD1Dkw47mKJEZQNunItRTkWwgtaUSo1RVFRIG9ZXiFYg==" + }, "node_modules/queue-microtask": { "version": "1.2.3", "resolved": "https://registry.npmjs.org/queue-microtask/-/queue-microtask-1.2.3.tgz", @@ -5381,6 +5618,20 @@ } ] }, + "node_modules/readable-stream": { + "version": "2.3.8", + "resolved": "https://registry.npmjs.org/readable-stream/-/readable-stream-2.3.8.tgz", + "integrity": "sha512-8p0AUk4XODgIewSi0l8Epjs+EVnWiK7NoDIEGU0HhE7+ZyY8D1IMY7odu5lRrFXGg71L15KG8QrPmum45RTtdA==", + "dependencies": { + "core-util-is": "~1.0.0", + "inherits": "~2.0.3", + "isarray": "~1.0.0", + "process-nextick-args": "~2.0.0", + "safe-buffer": "~5.1.1", + "string_decoder": "~1.1.1", + "util-deprecate": "~1.0.1" + } + }, "node_modules/readdirp": { "version": "3.6.0", "resolved": "https://registry.npmjs.org/readdirp/-/readdirp-3.6.0.tgz", @@ -5738,12 +5989,25 @@ "queue-microtask": "^1.2.2" } }, + "node_modules/rxjs": { + "version": "7.8.1", + "resolved": "https://registry.npmjs.org/rxjs/-/rxjs-7.8.1.tgz", + "integrity": "sha512-AA3TVj+0A2iuIoQkWEK/tqFjBq2j+6PO6Y0zJcvzLAFhEFIO3HL0vls9hWLncZbAAbK0mar7oZ4V079I/qPMxg==", + "dependencies": { + "tslib": "^2.1.0" + } + }, "node_modules/s.color": { "version": "0.0.15", "resolved": "https://registry.npmjs.org/s.color/-/s.color-0.0.15.tgz", "integrity": "sha512-AUNrbEUHeKY8XsYr/DYpl+qk5+aM+DChopnWOPEzn8YKzOhv4l2zH6LzZms3tOZP3wwdOyc0RmTciyi46HLIuA==", "devOptional": true }, + "node_modules/safe-buffer": { + "version": "5.1.2", + "resolved": "https://registry.npmjs.org/safe-buffer/-/safe-buffer-5.1.2.tgz", + "integrity": "sha512-Gd2UZBJDkXlY7GbJxfsE8/nvKkUEU1G38c1siN6QP6a9PT9MmHB8GnpscSmMJSoF8LOIrt8ud/wPtojys4G6+g==" + }, "node_modules/safer-buffer": { "version": "2.1.2", "resolved": "https://registry.npmjs.org/safer-buffer/-/safer-buffer-2.1.2.tgz", @@ -5922,6 +6186,11 @@ "url": "https://github.com/sponsors/wooorm" } }, + "node_modules/speedometer": { + "version": "1.0.0", + "resolved": "https://registry.npmjs.org/speedometer/-/speedometer-1.0.0.tgz", + "integrity": "sha512-lgxErLl/7A5+vgIIXsh9MbeukOaCb2axgQ+bKCdIE+ibNT4XNYGNCR1qFEGq6F+YDASXK3Fh/c5FgtZchFolxw==" + }, "node_modules/sprintf-js": { "version": "1.0.3", "resolved": "https://registry.npmjs.org/sprintf-js/-/sprintf-js-1.0.3.tgz", @@ -5943,6 +6212,14 @@ "resolved": "https://registry.npmjs.org/stream-replace-string/-/stream-replace-string-2.0.0.tgz", "integrity": "sha512-TlnjJ1C0QrmxRNrON00JvaFFlNh5TTG00APw23j74ET7gkQpTASi6/L2fuiav8pzK715HXtUeClpBTw2NPSn6w==" }, + "node_modules/string_decoder": { + "version": "1.1.1", + "resolved": "https://registry.npmjs.org/string_decoder/-/string_decoder-1.1.1.tgz", + "integrity": 
"sha512-n/ShnvDi6FHbbVfviro+WojiFzv+s8MPMHBczVePfUpDJLwoLT0ht1l4YwBCbi8pJAveEEdnkHyPyTP/mzRfwg==", + "dependencies": { + "safe-buffer": "~5.1.0" + } + }, "node_modules/string-width": { "version": "7.2.0", "resolved": "https://registry.npmjs.org/string-width/-/string-width-7.2.0.tgz", @@ -6041,6 +6318,15 @@ "node": ">=4" } }, + "node_modules/through2": { + "version": "2.0.5", + "resolved": "https://registry.npmjs.org/through2/-/through2-2.0.5.tgz", + "integrity": "sha512-/mrRod8xqpA+IHSLyGCQ2s8SPHiCDEeQJSep1jqLYeEUClOFG2Qsh+4FU6G9VeqpZnGW/Su8LQGc4YKni5rYSQ==", + "dependencies": { + "readable-stream": "~2.3.6", + "xtend": "~4.0.1" + } + }, "node_modules/to-fast-properties": { "version": "2.0.0", "resolved": "https://registry.npmjs.org/to-fast-properties/-/to-fast-properties-2.0.0.tgz", @@ -6101,8 +6387,18 @@ "node_modules/tslib": { "version": "2.6.2", "resolved": "https://registry.npmjs.org/tslib/-/tslib-2.6.2.tgz", - "integrity": "sha512-AEYxH93jGFPn/a2iVAwW87VuUIkR1FVUKB77NwMF7nBTDkDrrT/Hpt/IrCJ0QXhW27jTBDcf5ZY7w6RiqTMw2Q==", - "optional": true + "integrity": "sha512-AEYxH93jGFPn/a2iVAwW87VuUIkR1FVUKB77NwMF7nBTDkDrrT/Hpt/IrCJ0QXhW27jTBDcf5ZY7w6RiqTMw2Q==" + }, + "node_modules/tunnel-agent": { + "version": "0.6.0", + "resolved": "https://registry.npmjs.org/tunnel-agent/-/tunnel-agent-0.6.0.tgz", + "integrity": "sha512-McnNiV1l8RYeY8tBgEpuodCC1mLUdbSN+CYBL7kJsJNInOP8UjDDEwdk6Mw60vdLLrr5NHKZhMAOSrR2NZuQ+w==", + "dependencies": { + "safe-buffer": "^5.0.1" + }, + "engines": { + "node": "*" + } }, "node_modules/type-fest": { "version": "2.19.0", @@ -6322,6 +6618,11 @@ "browserslist": ">= 4.21.0" } }, + "node_modules/util-deprecate": { + "version": "1.0.2", + "resolved": "https://registry.npmjs.org/util-deprecate/-/util-deprecate-1.0.2.tgz", + "integrity": "sha512-EPD5q1uXyFxJpCrLnCc1nHnq3gOa6DZBocAIiI2TaSCA7VCJ1UJDMagCzIkXNsUYfD1daK//LTEQ8xiIbrHtcw==" + }, "node_modules/vfile": { "version": "6.0.2", "resolved": "https://registry.npmjs.org/vfile/-/vfile-6.0.2.tgz", @@ -6817,6 +7118,14 @@ "url": "https://github.com/sponsors/sindresorhus" } }, + "node_modules/xtend": { + "version": "4.0.2", + "resolved": "https://registry.npmjs.org/xtend/-/xtend-4.0.2.tgz", + "integrity": "sha512-LKYU1iAXJXUgAXn9URjiu+MWhyUXHsvfp7mcuYm9dSUKK0/CjtrUwFAxD82/mCWbtLsGjFIad0wIsod4zrTAEQ==", + "engines": { + "node": ">=0.4" + } + }, "node_modules/xxhash-wasm": { "version": "1.0.2", "resolved": "https://registry.npmjs.org/xxhash-wasm/-/xxhash-wasm-1.0.2.tgz", diff --git a/package.json b/package.json index 27b2cbea..1b000710 100644 --- a/package.json +++ b/package.json @@ -14,9 +14,11 @@ "@astrojs/check": "^0.9.3", "@astrojs/rss": "^4.0.7", "@astrojs/sitemap": "^3.1.6", + "@sanity/client": "^6.21.3", "@shikijs/transformers": "^1.6.3", "astro": "^4.14.5", "astro-remark-description": "^1.1.2", + "axios": "^1.7.7", "cheerio": "^1.0.0", "gray-matter": "^4.0.3", "remark-directive": "^3.0.0", From 3b883bac0afe7769bcef8f4fc009e715575380b3 Mon Sep 17 00:00:00 2001 From: Jake Broughton Date: Tue, 17 Sep 2024 18:18:42 +0200 Subject: [PATCH 03/21] Fix block content import --- internal/import.mjs | 9 +++++++-- package-lock.json | 29 +++++++++++++++++++++++------ package.json | 1 + 3 files changed, 31 insertions(+), 8 deletions(-) diff --git a/internal/import.mjs b/internal/import.mjs index 159baabe..d07cec11 100644 --- a/internal/import.mjs +++ b/internal/import.mjs @@ -3,6 +3,9 @@ import fs from 'fs'; import path from 'path'; import axios from 'axios'; import * as cheerio from 'cheerio'; +import { customAlphabet } from 'nanoid' + 
+const nanoid = customAlphabet('0123456789abcdef', 12) const client = sanityClient({ projectId: 'o2y1bt2g', @@ -35,7 +38,7 @@ async function replaceImageUrls(content, imageMap) { } async function migratePosts() { - const p = [posts[0]] + const p = [posts[0], posts[1]] for (const post of p) { const imageMap = {}; for (const imageUrl of post.images) { @@ -51,6 +54,7 @@ async function migratePosts() { const updatedContent = await replaceImageUrls(post.content, imageMap); + const sanityPost = { _type: 'blogPostDev', title: post.title, @@ -59,7 +63,8 @@ async function migratePosts() { body: [ { _type: 'block', - children: [{ _type: 'span', text: updatedContent }], + _key: nanoid(), + children: [{ _type: 'span', text: updatedContent, _key: nanoid() }], }, ], }; diff --git a/package-lock.json b/package-lock.json index 2ddeecef..9b501626 100644 --- a/package-lock.json +++ b/package-lock.json @@ -18,6 +18,7 @@ "axios": "^1.7.7", "cheerio": "^1.0.0", "gray-matter": "^4.0.3", + "nanoid": "^5.0.7", "remark-directive": "^3.0.0", "typescript": "^5.4.5" }, @@ -5110,21 +5111,20 @@ "license": "MIT" }, "node_modules/nanoid": { - "version": "3.3.7", - "resolved": "https://registry.npmjs.org/nanoid/-/nanoid-3.3.7.tgz", - "integrity": "sha512-eSRppjcPIatRIMC1U6UngP8XFcz8MQWGQdt1MTBQ7NaAmvXDfvNxbvWV3x2y6CdEUciCSsDHDQZbhYaB8QEo2g==", + "version": "5.0.7", + "resolved": "https://registry.npmjs.org/nanoid/-/nanoid-5.0.7.tgz", + "integrity": "sha512-oLxFY2gd2IqnjcYyOXD8XGCftpGtZP2AbHbOkthDkvRywH5ayNtPVy9YlOPcHckXzbLTCHpkb7FB+yuxKV13pQ==", "funding": [ { "type": "github", "url": "https://github.com/sponsors/ai" } ], - "license": "MIT", "bin": { - "nanoid": "bin/nanoid.cjs" + "nanoid": "bin/nanoid.js" }, "engines": { - "node": "^10 || ^12 || ^13.7 || ^14 || >=15.0.1" + "node": "^18 || >=20" } }, "node_modules/neotraverse": { @@ -5493,6 +5493,23 @@ "node": "^10 || ^12 || >=14" } }, + "node_modules/postcss/node_modules/nanoid": { + "version": "3.3.7", + "resolved": "https://registry.npmjs.org/nanoid/-/nanoid-3.3.7.tgz", + "integrity": "sha512-eSRppjcPIatRIMC1U6UngP8XFcz8MQWGQdt1MTBQ7NaAmvXDfvNxbvWV3x2y6CdEUciCSsDHDQZbhYaB8QEo2g==", + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/ai" + } + ], + "bin": { + "nanoid": "bin/nanoid.cjs" + }, + "engines": { + "node": "^10 || ^12 || ^13.7 || ^14 || >=15.0.1" + } + }, "node_modules/preferred-pm": { "version": "4.0.0", "resolved": "https://registry.npmjs.org/preferred-pm/-/preferred-pm-4.0.0.tgz", diff --git a/package.json b/package.json index 1b000710..2c8395e7 100644 --- a/package.json +++ b/package.json @@ -21,6 +21,7 @@ "axios": "^1.7.7", "cheerio": "^1.0.0", "gray-matter": "^4.0.3", + "nanoid": "^5.0.7", "remark-directive": "^3.0.0", "typescript": "^5.4.5" }, From 31addd3b76adb4921bdb6c30bd87717f03202c7a Mon Sep 17 00:00:00 2001 From: Jake Broughton Date: Tue, 17 Sep 2024 19:13:33 +0200 Subject: [PATCH 04/21] Image fix --- internal/export.mjs | 10 +++++++--- internal/import.mjs | 29 +++++++++++++++++------------ 2 files changed, 24 insertions(+), 15 deletions(-) diff --git a/internal/export.mjs b/internal/export.mjs index 22cc9f4f..06d63608 100644 --- a/internal/export.mjs +++ b/internal/export.mjs @@ -6,12 +6,16 @@ import * as cheerio from 'cheerio'; const postsDirectory = path.join(process.cwd(), '../src/content/blog'); const outputFile = path.join(process.cwd(), 'export.json'); -function extractImageUrls(content) { +function extractImagePaths(content, postPath) { const $ = cheerio.load(content); const images = []; $('img').each((i, elem) => { 
const src = $(elem).attr('src'); - if (src) images.push(src); + if (src) { + // Convert the src to a path relative to the content root + const imagePath = path.relative(contentRoot, path.resolve(path.dirname(postPath), src)); + images.push(imagePath); + } }); return images; } @@ -28,7 +32,7 @@ function getPostsRecursively(dir) { } else if (item.isFile() && item.name.endsWith('.md')) { const fileContents = fs.readFileSync(fullPath, 'utf8'); const { data, content } = matter(fileContents); - const images = extractImageUrls(content); + const images = extractImagePaths(content, fullPath); posts.push({ slug: path.relative(postsDirectory, fullPath).replace('.md', ''), diff --git a/internal/import.mjs b/internal/import.mjs index d07cec11..0bd91dec 100644 --- a/internal/import.mjs +++ b/internal/import.mjs @@ -1,7 +1,6 @@ import sanityClient from '@sanity/client'; import fs from 'fs'; import path from 'path'; -import axios from 'axios'; import * as cheerio from 'cheerio'; import { customAlphabet } from 'nanoid' @@ -15,11 +14,16 @@ const client = sanityClient({ }); const postsFile = path.join(process.cwd(), 'export.json'); -const posts = JSON.parse(fs.readFileSync(postsFile, 'utf8')); +const contentRoot = path.join(process.cwd(), '../public'); // Adjust this to your content root -async function downloadImage(url) { - const response = await axios.get(url, { responseType: 'arraybuffer' }); - return Buffer.from(response.data, 'binary'); +async function readPosts() { + const data = await fs.promises.readFile(postsFile, 'utf8'); + return JSON.parse(data); +} + +async function readImageFromFileSystem(imagePath) { + const fullPath = path.join(contentRoot, imagePath); + return fs.promises.readFile(fullPath); } async function uploadImageToSanity(imageBuffer, filename) { @@ -38,23 +42,24 @@ async function replaceImageUrls(content, imageMap) { } async function migratePosts() { - const p = [posts[0], posts[1]] + const posts = await readPosts(); + const p = [posts[4]]; + for (const post of p) { const imageMap = {}; - for (const imageUrl of post.images) { + for (const imagePath of post.images) { try { - const imageBuffer = await downloadImage(imageUrl); - const filename = path.basename(imageUrl); + const imageBuffer = await readImageFromFileSystem(imagePath); + const filename = path.basename(imagePath); const uploadedImage = await uploadImageToSanity(imageBuffer, filename); - imageMap[imageUrl] = uploadedImage.url; + imageMap[imagePath] = uploadedImage.url; } catch (error) { - console.error(`Failed to process image: ${imageUrl}`, error); + console.error(`Failed to process image: ${imagePath}`, error); } } const updatedContent = await replaceImageUrls(post.content, imageMap); - const sanityPost = { _type: 'blogPostDev', title: post.title, From 66403e5c3d45df156638649c169a38186b27a564 Mon Sep 17 00:00:00 2001 From: Jake Broughton Date: Wed, 18 Sep 2024 12:13:32 +0200 Subject: [PATCH 05/21] Markdown to PT updates --- internal/import.mjs | 105 +++++++++++++++++++++++++++++++++++--------- package-lock.json | 12 +++++ package.json | 1 + 3 files changed, 97 insertions(+), 21 deletions(-) diff --git a/internal/import.mjs b/internal/import.mjs index 0bd91dec..08c61553 100644 --- a/internal/import.mjs +++ b/internal/import.mjs @@ -1,10 +1,10 @@ import sanityClient from '@sanity/client'; import fs from 'fs'; import path from 'path'; -import * as cheerio from 'cheerio'; -import { customAlphabet } from 'nanoid' +import { customAlphabet } from 'nanoid'; +import { marked } from 'marked'; -const nanoid = 
customAlphabet('0123456789abcdef', 12) +const nanoid = customAlphabet('0123456789abcdef', 12); const client = sanityClient({ projectId: 'o2y1bt2g', @@ -14,7 +14,7 @@ const client = sanityClient({ }); const postsFile = path.join(process.cwd(), 'export.json'); -const contentRoot = path.join(process.cwd(), '../public'); // Adjust this to your content root +const contentRoot = path.join(process.cwd(), '../public'); async function readPosts() { const data = await fs.promises.readFile(postsFile, 'utf8'); @@ -30,15 +30,84 @@ async function uploadImageToSanity(imageBuffer, filename) { return client.assets.upload('image', imageBuffer, { filename }); } -async function replaceImageUrls(content, imageMap) { - const $ = cheerio.load(content); - $('img').each((i, elem) => { - const src = $(elem).attr('src'); - if (src && imageMap[src]) { - $(elem).attr('src', imageMap[src]); - } - }); - return $.html(); +function markdownToPortableText(markdown, imageMap) { + const tokens = marked.lexer(markdown); + return tokens.map(tokenToPortableText.bind(null, imageMap)).filter(Boolean); +} + +function tokenToPortableText(imageMap, token) { + switch (token.type) { + case 'heading': + return { + _type: 'block', + _key: nanoid(), + style: `h${token.depth}`, + children: [{ _type: 'span', text: token.text, _key: nanoid() }], + }; + case 'paragraph': + return { + _type: 'block', + _key: nanoid(), + children: token.tokens.map(inlineTokenToPortableText.bind(null, imageMap)), + }; + case 'image': + const imageUrl = imageMap[token.href] || token.href; + return { + _type: 'image', + _key: nanoid(), + asset: { + _type: 'reference', + _ref: imageUrl.split('-')[1], + }, + alt: token.text, + }; + case 'code': + return { + _type: 'code', + _key: nanoid(), + code: token.text + }; + // Add more cases for other block-level elements as needed + default: + console.warn(`Unsupported token type: ${token.type}`, token); + return null; + } +} + +function inlineTokenToPortableText(imageMap, token) { + switch (token.type) { + case 'text': + return { _type: 'span', text: token.text, _key: nanoid() }; + case 'link': + return { + _type: 'span', + _key: nanoid(), + marks: ['link'], + text: token.text, + data: { href: token.href }, + }; + case 'image': + const imageUrl = imageMap[token.href] || token.href; + return { + _type: 'image', + _key: nanoid(), + asset: { + _type: 'reference', + _ref: imageUrl.split('-')[1], + }, + alt: token.text, + }; + case 'codespan': + return { + _type: 'span', + _key: nanoid(), + marks: ['code'], + text: token.text, + }; + default: + console.warn(`Unsupported inline token type: ${token.type}`); + return { _type: 'span', text: token.raw, _key: nanoid() }; + } } async function migratePosts() { @@ -58,20 +127,14 @@ async function migratePosts() { } } - const updatedContent = await replaceImageUrls(post.content, imageMap); + const portableTextContent = markdownToPortableText(post.content, imageMap); const sanityPost = { _type: 'blogPostDev', title: post.title, meta: { slug: { current: post.slug } }, publishedAt: new Date(post.date).toISOString(), - body: [ - { - _type: 'block', - _key: nanoid(), - children: [{ _type: 'span', text: updatedContent, _key: nanoid() }], - }, - ], + body: portableTextContent, }; try { diff --git a/package-lock.json b/package-lock.json index 9b501626..ff7334c2 100644 --- a/package-lock.json +++ b/package-lock.json @@ -18,6 +18,7 @@ "axios": "^1.7.7", "cheerio": "^1.0.0", "gray-matter": "^4.0.3", + "marked": "^14.1.2", "nanoid": "^5.0.7", "remark-directive": "^3.0.0", "typescript": "^5.4.5" @@ 
-4215,6 +4216,17 @@ "url": "https://github.com/sponsors/wooorm" } }, + "node_modules/marked": { + "version": "14.1.2", + "resolved": "https://registry.npmjs.org/marked/-/marked-14.1.2.tgz", + "integrity": "sha512-f3r0yqpz31VXiDB/wj9GaOB0a2PRLQl6vJmXiFrniNwjkKdvakqJRULhjFKJpxOchlCRiG5fcacoUZY5Xa6PEQ==", + "bin": { + "marked": "bin/marked.js" + }, + "engines": { + "node": ">= 18" + } + }, "node_modules/mdast-util-definitions": { "version": "6.0.0", "resolved": "https://registry.npmjs.org/mdast-util-definitions/-/mdast-util-definitions-6.0.0.tgz", diff --git a/package.json b/package.json index 2c8395e7..818c2d6c 100644 --- a/package.json +++ b/package.json @@ -21,6 +21,7 @@ "axios": "^1.7.7", "cheerio": "^1.0.0", "gray-matter": "^4.0.3", + "marked": "^14.1.2", "nanoid": "^5.0.7", "remark-directive": "^3.0.0", "typescript": "^5.4.5" From 8db8a7c29244a8be10f191e0b6aed985c2af3b51 Mon Sep 17 00:00:00 2001 From: Jake Broughton Date: Wed, 18 Sep 2024 13:00:33 +0200 Subject: [PATCH 06/21] Image import fixes --- internal/import.mjs | 59 +++++++++++++++++++++++++++++++++++++++------ 1 file changed, 52 insertions(+), 7 deletions(-) diff --git a/internal/import.mjs b/internal/import.mjs index 08c61553..3f9a6c7b 100644 --- a/internal/import.mjs +++ b/internal/import.mjs @@ -36,6 +36,7 @@ function markdownToPortableText(markdown, imageMap) { } function tokenToPortableText(imageMap, token) { + switch (token.type) { case 'heading': return { @@ -51,13 +52,17 @@ function tokenToPortableText(imageMap, token) { children: token.tokens.map(inlineTokenToPortableText.bind(null, imageMap)), }; case 'image': - const imageUrl = imageMap[token.href] || token.href; + const image = imageMap[src]; + if (!image?._id) { + console.warn(`Failed to find image for token: ${token.href}`); + return null; + } return { _type: 'image', _key: nanoid(), asset: { _type: 'reference', - _ref: imageUrl.split('-')[1], + _ref: image._id, }, alt: token.text, }; @@ -67,7 +72,33 @@ function tokenToPortableText(imageMap, token) { _key: nanoid(), code: token.text }; - // Add more cases for other block-level elements as needed + case 'html': + if (token.text.includes('/)[0]; + const srcMatch = imgTag.match(/src=(['"])(.*?)\1/); + const altMatch = imgTag.match(/alt=(['"])(.*?)\1/); + const src = srcMatch ? srcMatch[2] : ''; + const alt = altMatch ? 
altMatch[2] : ''; + + const image = imageMap[src]; + if (!image?._id) { + console.warn(`Failed to find image for token: ${token.text}`); + return null; + } + + return { + _type: 'image', + _key: nanoid(), + asset: { + _type: 'reference', + _ref: image._id, + }, + alt, + }; + } else { + console.warn(`Unsupported HTML token: ${token.text}`); + return null; + } default: console.warn(`Unsupported token type: ${token.type}`, token); return null; @@ -77,7 +108,15 @@ function tokenToPortableText(imageMap, token) { function inlineTokenToPortableText(imageMap, token) { switch (token.type) { case 'text': - return { _type: 'span', text: token.text, _key: nanoid() }; + let marks = []; + if (token.bold) marks.push('strong'); + if (token.italic) marks.push('em'); + return { + _type: 'span', + text: token.text, + marks: marks, + _key: nanoid() + }; case 'link': return { _type: 'span', @@ -87,13 +126,17 @@ function inlineTokenToPortableText(imageMap, token) { data: { href: token.href }, }; case 'image': - const imageUrl = imageMap[token.href] || token.href; + const image = imageMap[token.href]; + if (!image?._id) { + console.warn(`Failed to find image for token: ${token.href}`); + return null; + } return { _type: 'image', _key: nanoid(), asset: { _type: 'reference', - _ref: imageUrl.split('-')[1], + _ref: image._id, }, alt: token.text, }; @@ -121,11 +164,13 @@ async function migratePosts() { const imageBuffer = await readImageFromFileSystem(imagePath); const filename = path.basename(imagePath); const uploadedImage = await uploadImageToSanity(imageBuffer, filename); - imageMap[imagePath] = uploadedImage.url; + imageMap[imagePath] = uploadedImage; } catch (error) { console.error(`Failed to process image: ${imagePath}`, error); } } + console.log('Image map:', imageMap); + const portableTextContent = markdownToPortableText(post.content, imageMap); From 1ad77c9756861d7f9f4c258bc2010acf802fe01d Mon Sep 17 00:00:00 2001 From: Jake Broughton Date: Thu, 19 Sep 2024 12:16:54 +0200 Subject: [PATCH 07/21] Sanitize text --- internal/import.mjs | 63 ++++++++++++++++++++++++++++++++++++++------- 1 file changed, 54 insertions(+), 9 deletions(-) diff --git a/internal/import.mjs b/internal/import.mjs index 3f9a6c7b..cb13b9bc 100644 --- a/internal/import.mjs +++ b/internal/import.mjs @@ -35,6 +35,11 @@ function markdownToPortableText(markdown, imageMap) { return tokens.map(tokenToPortableText.bind(null, imageMap)).filter(Boolean); } +function sanitizeText(text) { + // Replace all instances of ' with ' + return text.replace(/'/g, "'"); +} + function tokenToPortableText(imageMap, token) { switch (token.type) { @@ -43,7 +48,7 @@ function tokenToPortableText(imageMap, token) { _type: 'block', _key: nanoid(), style: `h${token.depth}`, - children: [{ _type: 'span', text: token.text, _key: nanoid() }], + children: [{ _type: 'span', text: sanitizeText(token.text), _key: nanoid() }], }; case 'paragraph': return { @@ -95,10 +100,32 @@ function tokenToPortableText(imageMap, token) { }, alt, }; + + } else if (token.text.startsWith(' + item.tokens.map(inlineTokenToPortableText.bind(null, imageMap)) + ), + }; default: console.warn(`Unsupported token type: ${token.type}`, token); return null; @@ -113,7 +140,7 @@ function inlineTokenToPortableText(imageMap, token) { if (token.italic) marks.push('em'); return { _type: 'span', - text: token.text, + text: sanitizeText(token.text), marks: marks, _key: nanoid() }; @@ -122,7 +149,7 @@ function inlineTokenToPortableText(imageMap, token) { _type: 'span', _key: nanoid(), marks: ['link'], - text: 
token.text, + text: sanitizeText(token.text), data: { href: token.href }, }; case 'image': @@ -147,6 +174,12 @@ function inlineTokenToPortableText(imageMap, token) { marks: ['code'], text: token.text, }; + case 'list_item': + return { + _type: 'span', + _key: nanoid(), + text: sanitizeText(token.text), + }; default: console.warn(`Unsupported inline token type: ${token.type}`); return { _type: 'span', text: token.raw, _key: nanoid() }; @@ -155,9 +188,21 @@ function inlineTokenToPortableText(imageMap, token) { async function migratePosts() { const posts = await readPosts(); - const p = [posts[4]]; + const firstTen = posts.slice(0, 10); + const selectedPost = posts.find(p => p.slug === '2016/deploy-in-the-cloud-at-snap-of-a-finger'); + + console.log(''); + console.log(''); + console.log(''); + console.log(''); + console.log(''); + console.log(''); + console.log('🪣 Migrating posts...'); + console.log(''); + - for (const post of p) { + for (const post of [selectedPost]) { + const imageMap = {}; for (const imagePath of post.images) { try { @@ -169,22 +214,22 @@ async function migratePosts() { console.error(`Failed to process image: ${imagePath}`, error); } } - console.log('Image map:', imageMap); - const portableTextContent = markdownToPortableText(post.content, imageMap); + const newSlug = post.slug.split('/').pop(); + const sanityPost = { _type: 'blogPostDev', title: post.title, - meta: { slug: { current: post.slug } }, + meta: { slug: { current: newSlug } }, publishedAt: new Date(post.date).toISOString(), body: portableTextContent, }; try { const result = await client.create(sanityPost); - console.log(`Successfully migrated post: ${result.title}`); + console.log(`✅ Successfully migrated post: ${result.title}`); } catch (error) { console.error(`Failed to migrate post: ${post.title}`, error); } From 2784d00600af113476f3304b76ac82108962ead4 Mon Sep 17 00:00:00 2001 From: Jake Broughton Date: Thu, 19 Sep 2024 13:40:01 +0200 Subject: [PATCH 08/21] Fix links parsing --- internal/import.mjs | 37 ++++++++++++++++++++++++++++--------- 1 file changed, 28 insertions(+), 9 deletions(-) diff --git a/internal/import.mjs b/internal/import.mjs index cb13b9bc..8eea6903 100644 --- a/internal/import.mjs +++ b/internal/import.mjs @@ -51,10 +51,34 @@ function tokenToPortableText(imageMap, token) { children: [{ _type: 'span', text: sanitizeText(token.text), _key: nanoid() }], }; case 'paragraph': + const children = []; + const markDefs = []; + + token.tokens.forEach(t => { + if (t.type === 'link') { + const linkKey = nanoid(); + children.push({ + _type: 'span', + _key: nanoid(), + marks: [linkKey], + text: sanitizeText(t.text) + }); + markDefs.push({ + _key: linkKey, + _type: 'link', + href: t.href + }); + } else { + children.push(inlineTokenToPortableText(imageMap, t)); + } + }); + return { _type: 'block', _key: nanoid(), - children: token.tokens.map(inlineTokenToPortableText.bind(null, imageMap)), + style: 'normal', + children: children, + markDefs: markDefs }; case 'image': const image = imageMap[src]; @@ -145,13 +169,8 @@ function inlineTokenToPortableText(imageMap, token) { _key: nanoid() }; case 'link': - return { - _type: 'span', - _key: nanoid(), - marks: ['link'], - text: sanitizeText(token.text), - data: { href: token.href }, - }; + // This case is now handled in tokenToPortableText + return null; case 'image': const image = imageMap[token.href]; if (!image?._id) { @@ -181,7 +200,7 @@ function inlineTokenToPortableText(imageMap, token) { text: sanitizeText(token.text), }; default: - 
console.warn(`Unsupported inline token type: ${token.type}`); + console.warn(`Unsupported inline token type: ${token.type}`, token); return { _type: 'span', text: token.raw, _key: nanoid() }; } } From a9a2cc85739efd90ee9b3765d7c5f4e773773cd5 Mon Sep 17 00:00:00 2001 From: Jake Broughton Date: Thu, 19 Sep 2024 14:14:27 +0200 Subject: [PATCH 09/21] Import/export fixes & sanitization --- internal/export.json | 2 +- internal/export.mjs | 4 +--- internal/import.mjs | 28 ++++++++++++++++++---------- 3 files changed, 20 insertions(+), 14 deletions(-) diff --git a/internal/export.json b/internal/export.json index 3721aee1..3635ae97 100644 --- a/internal/export.json +++ b/internal/export.json @@ -61,7 +61,7 @@ "slug": "2016/deploy-in-the-cloud-at-snap-of-a-finger", "title": "Deploy your computational pipelines in the cloud at the snap-of-a-finger", "date": "2016-09-01T00:00:00.000Z", - "content": "\n
\nLearn how to deploy and run a computational pipeline in the Amazon AWS cloud with ease\nthanks to Nextflow and Docker containers\n
\n\nNextflow is a framework that simplifies the writing of parallel and distributed computational\npipelines in a portable and reproducible manner across different computing platforms, from\na laptop to a cluster of computers.\n\nIndeed, the original idea, when this project started three years ago, was to\nimplement a tool that would allow researchers in\n[our lab](http://www.crg.eu/es/programmes-groups/comparative-bioinformatics) to smoothly migrate\ntheir data analysis applications in the cloud when needed - without having\nto change or adapt their code.\n\nHowever to date Nextflow has been used mostly to deploy computational workflows within on-premise\ncomputing clusters or HPC data-centers, because these infrastructures are easier to use\nand provide, on average, cheaper cost and better performance when compared to a cloud environment.\n\nA major obstacle to efficient deployment of scientific workflows in the cloud is the lack\nof a performant POSIX compatible shared file system. These kinds of applications\nare usually made-up by putting together a collection of tools, scripts and\nsystem commands that need a reliable file system to share with each other the input and\noutput files as they are produced, above all in a distributed cluster of computers.\n\nThe recent availability of the [Amazon Elastic File System](https://aws.amazon.com/efs/)\n(EFS), a fully featured NFS based file system hosted on the AWS infrastructure represents\na major step in this context, unlocking the deployment of scientific computing\nin the cloud and taking it to the next level.\n\n### Nextflow support for the cloud\n\nNextflow could already be deployed in the cloud, either using tools such as\n[ElastiCluster](https://github.com/gc3-uzh-ch/elasticluster) or\n[CfnCluster](https://aws.amazon.com/hpc/cfncluster/), or by using custom deployment\nscripts. However the procedure was still cumbersome and, above all, it was not optimised\nto fully take advantage of cloud elasticity i.e. the ability to (re)shape the computing\ncluster dynamically as the computing needs change over time.\n\nFor these reasons, we decided it was time to provide Nextflow with a first-class support\nfor the cloud, integrating the Amazon EFS and implementing an optimised native cloud\nscheduler, based on [Apache Ignite](https://ignite.apache.org/), with a full support for cluster\nauto-scaling and spot/preemptible instances.\n\nIn practice this means that Nextflow can now spin-up and configure a fully featured computing\ncluster in the cloud with a single command, after that you need only to login to the master\nnode and launch the pipeline execution as you would do in your on-premise cluster.\n\n### Demo !\n\nSince a demo is worth a thousands words, I've record a short screencast showing how\nNextflow can setup a cluster in the cloud and mount the Amazon EFS shared file system.\n\n\n\n
\nNote: in this screencast it has been cut the Ec2 instances startup delay. It required around\n5 minutes to launch them and setup the cluster.\n
\n\nLet's recap the steps showed in the demo:\n\n- The user provides the cloud parameters (such as the VM image ID and the instance type)\n in the `nextflow.config` file.\n\n- To configure the EFS file system you need to provide your EFS storage ID and the mount path\n by using the `sharedStorageId` and `sharedStorageMount` properties.\n\n- To use [EC2 Spot](https://aws.amazon.com/ec2/spot/) instances, just specify the price\n you want to bid by using the `spotPrice` property.\n\n- The AWS access and secret keys are provided by using the usual environment variables.\n\n- The `nextflow cloud create` launches the requested number of instances, configures the user and\n access key, mounts the EFS storage and setups the Nextflow cluster automatically.\n Any Linux AMI can be used, it is only required that the [cloud-init](https://cloudinit.readthedocs.io/en/latest/)\n package, a Java 7+ runtime and the Docker engine are present.\n\n- When the cluster is ready, you can SSH in the master node and launch the pipeline execution\n as usual with the `nextflow run ` command.\n\n- For the sake of this demo we are using [paraMSA](https://github.com/pditommaso/paraMSA),\n a pipeline for generating multiple sequence alignments and bootstrap replicates developed\n in our lab.\n\n- Nextflow automatically pulls the pipeline code from its GitHub repository when the\n execution is launched. This repository includes also a dataset which is used by default.\n [The many bioinformatic tools used by the pipeline](https://github.com/pditommaso/paraMSA#dependencies-)\n are packaged using a Docker image, which is downloaded automatically on each computing node.\n\n- The pipeline results are uploaded automatically in the S3 bucket specified\n by the `--output s3://cbcrg-eu/para-msa-results` command line option.\n\n- When the computation is completed, the cluster can be safely shutdown and the\n EC2 instances terminated with the `nextflow cloud shutdown` command.\n\n### Try it yourself\n\nWe are releasing the Nextflow integrated cloud support in the upcoming version `0.22.0`.\n\nNextflow integrated cloud support is available from version `0.22.0`. To use it just make sure to\nhave this or an higher version of Nextflow.\n\nBare in mind that Nextflow requires a Unix-like operating system and a Java runtime version 7+\n(Windows 10 users which have installed the [Ubuntu subsystem](https://blogs.windows.com/buildingapps/2016/03/30/run-bash-on-ubuntu-on-windows/)\nshould be able to run it, at their risk..).\n\nOnce you have installed it, you can follow the steps in the above demo. 
For your convenience\nwe made publicly available the EC2 image `ami-43f49030` `ami-4b7daa32`\\* (EU Ireland region) used to record this\nscreencast.\n\nAlso make sure you have the following the following variables defined in your environment:\n\n AWS_ACCESS_KEY_ID=\"\"\n AWS_SECRET_ACCESS_KEY=\"\"\n AWS_DEFAULT_REGION=\"\"\n\nReferes to the documentation for configuration details.\n\n\\* Update: the AMI has been updated with Java 8 on Sept 2017.\n\n### Conclusion\n\nNextflow provides state of the art support for cloud and containers technologies making\nit possible to create computing clusters in the cloud and deploy computational workflows\nin a no-brainer way, with just two commands on your terminal.\n\nIn an upcoming post I will describe the autoscaling capabilities implemented by the\nNextflow scheduler that allows, along with the use of spot/preemptible instances,\na cost effective solution for the execution of your pipeline in the cloud.\n\n#### Credits\n\nThanks to [Evan Floden](https://github.com/skptic) for reviewing this post and for writing\nthe [paraMSA](https://github.com/skptic/paraMSA/) pipeline.\n", + "content": "\n
\nLearn how to deploy and run a computational pipeline in the Amazon AWS cloud with ease\nthanks to Nextflow and Docker containers\n
\n\nNextflow is a framework that simplifies the writing of parallel and distributed computational\npipelines in a portable and reproducible manner across different computing platforms, from\na laptop to a cluster of computers.\n\nIndeed, the original idea, when this project started three years ago, was to\nimplement a tool that would allow researchers in\n[our lab](http://www.crg.eu/es/programmes-groups/comparative-bioinformatics) to smoothly migrate\ntheir data analysis applications in the cloud when needed - without having\nto change or adapt their code.\n\nHowever to date Nextflow has been used mostly to deploy computational workflows within on-premise\ncomputing clusters or HPC data-centers, because these infrastructures are easier to use\nand provide, on average, cheaper cost and better performance when compared to a cloud environment.\n\nA major obstacle to efficient deployment of scientific workflows in the cloud is the lack\nof a performant POSIX compatible shared file system. These kinds of applications\nare usually made-up by putting together a collection of tools, scripts and\nsystem commands that need a reliable file system to share with each other the input and\noutput files as they are produced, above all in a distributed cluster of computers.\n\nThe recent availability of the [Amazon Elastic File System](https://aws.amazon.com/efs/)\n(EFS), a fully featured NFS based file system hosted on the AWS infrastructure represents\na major step in this context, unlocking the deployment of scientific computing\nin the cloud and taking it to the next level.\n\n### Nextflow support for the cloud\n\nNextflow could already be deployed in the cloud, either using tools such as\n[ElastiCluster](https://github.com/gc3-uzh-ch/elasticluster) or\n[CfnCluster](https://aws.amazon.com/hpc/cfncluster/), or by using custom deployment\nscripts. However the procedure was still cumbersome and, above all, it was not optimised\nto fully take advantage of cloud elasticity i.e. the ability to (re)shape the computing\ncluster dynamically as the computing needs change over time.\n\nFor these reasons, we decided it was time to provide Nextflow with a first-class support\nfor the cloud, integrating the Amazon EFS and implementing an optimised native cloud\nscheduler, based on [Apache Ignite](https://ignite.apache.org/), with a full support for cluster\nauto-scaling and spot/preemptible instances.\n\nIn practice this means that Nextflow can now spin-up and configure a fully featured computing\ncluster in the cloud with a single command, after that you need only to login to the master\nnode and launch the pipeline execution as you would do in your on-premise cluster.\n\n### Demo !\n\nSince a demo is worth a thousands words, I've record a short screencast showing how\nNextflow can setup a cluster in the cloud and mount the Amazon EFS shared file system.\n\n\n\n
\nNote: in this screencast it has been cut the Ec2 instances startup delay. It required around\n5 minutes to launch them and setup the cluster.\n
\n\nLet's recap the steps showed in the demo:\n\n- The user provides the cloud parameters (such as the VM image ID and the instance type)\n in the `nextflow.config` file.\n- To configure the EFS file system you need to provide your EFS storage ID and the mount path\n by using the `sharedStorageId` and `sharedStorageMount` properties.\n- To use [EC2 Spot](https://aws.amazon.com/ec2/spot/) instances, just specify the price\n you want to bid by using the `spotPrice` property.\n- The AWS access and secret keys are provided by using the usual environment variables.\n- The `nextflow cloud create` launches the requested number of instances, configures the user and\n access key, mounts the EFS storage and setups the Nextflow cluster automatically.\n Any Linux AMI can be used, it is only required that the [cloud-init](https://cloudinit.readthedocs.io/en/latest/)\n package, a Java 7+ runtime and the Docker engine are present.\n- When the cluster is ready, you can SSH in the master node and launch the pipeline execution\n as usual with the `nextflow run ` command.\n- For the sake of this demo we are using [paraMSA](https://github.com/pditommaso/paraMSA),\n a pipeline for generating multiple sequence alignments and bootstrap replicates developed\n in our lab.\n- Nextflow automatically pulls the pipeline code from its GitHub repository when the\n execution is launched. This repository includes also a dataset which is used by default.\n [The many bioinformatic tools used by the pipeline](https://github.com/pditommaso/paraMSA#dependencies-)\n are packaged using a Docker image, which is downloaded automatically on each computing node.\n- The pipeline results are uploaded automatically in the S3 bucket specified\n by the `--output s3://cbcrg-eu/para-msa-results` command line option.\n- When the computation is completed, the cluster can be safely shutdown and the\n EC2 instances terminated with the `nextflow cloud shutdown` command.\n\n### Try it yourself\n\nWe are releasing the Nextflow integrated cloud support in the upcoming version `0.22.0`.\n\nNextflow integrated cloud support is available from version `0.22.0`. To use it just make sure to\nhave this or an higher version of Nextflow.\n\nBare in mind that Nextflow requires a Unix-like operating system and a Java runtime version 7+\n(Windows 10 users which have installed the [Ubuntu subsystem](https://blogs.windows.com/buildingapps/2016/03/30/run-bash-on-ubuntu-on-windows/)\nshould be able to run it, at their risk..).\n\nOnce you have installed it, you can follow the steps in the above demo. 
For your convenience\nwe made publicly available the EC2 image `ami-43f49030` `ami-4b7daa32`\\* (EU Ireland region) used to record this\nscreencast.\n\nAlso make sure you have the following the following variables defined in your environment:\n\n AWS_ACCESS_KEY_ID=\"\"\n AWS_SECRET_ACCESS_KEY=\"\"\n AWS_DEFAULT_REGION=\"\"\n\nReferes to the documentation for configuration details.\n\n\\* Update: the AMI has been updated with Java 8 on Sept 2017.\n\n### Conclusion\n\nNextflow provides state of the art support for cloud and containers technologies making\nit possible to create computing clusters in the cloud and deploy computational workflows\nin a no-brainer way, with just two commands on your terminal.\n\nIn an upcoming post I will describe the autoscaling capabilities implemented by the\nNextflow scheduler that allows, along with the use of spot/preemptible instances,\na cost effective solution for the execution of your pipeline in the cloud.\n\n#### Credits\n\nThanks to [Evan Floden](https://github.com/skptic) for reviewing this post and for writing\nthe [paraMSA](https://github.com/skptic/paraMSA/) pipeline.\n", "images": [] }, { diff --git a/internal/export.mjs b/internal/export.mjs index 06d63608..29da2d46 100644 --- a/internal/export.mjs +++ b/internal/export.mjs @@ -12,9 +12,7 @@ function extractImagePaths(content, postPath) { $('img').each((i, elem) => { const src = $(elem).attr('src'); if (src) { - // Convert the src to a path relative to the content root - const imagePath = path.relative(contentRoot, path.resolve(path.dirname(postPath), src)); - images.push(imagePath); + images.push(src); } }); return images; diff --git a/internal/import.mjs b/internal/import.mjs index 8eea6903..f41620de 100644 --- a/internal/import.mjs +++ b/internal/import.mjs @@ -35,9 +35,13 @@ function markdownToPortableText(markdown, imageMap) { return tokens.map(tokenToPortableText.bind(null, imageMap)).filter(Boolean); } -function sanitizeText(text) { +function sanitizeText(text, removeLineBreaks = false) { // Replace all instances of ' with ' - return text.replace(/'/g, "'"); + const t = text.replace(/'/g, "'"); + + if (removeLineBreaks) return t.replace(/\n/g, ' '); + + return t } function tokenToPortableText(imageMap, token) { @@ -164,13 +168,10 @@ function inlineTokenToPortableText(imageMap, token) { if (token.italic) marks.push('em'); return { _type: 'span', - text: sanitizeText(token.text), + text: sanitizeText(token.text, true), marks: marks, _key: nanoid() }; - case 'link': - // This case is now handled in tokenToPortableText - return null; case 'image': const image = imageMap[token.href]; if (!image?._id) { @@ -197,7 +198,7 @@ function inlineTokenToPortableText(imageMap, token) { return { _type: 'span', _key: nanoid(), - text: sanitizeText(token.text), + text: sanitizeText(token.text, true), }; default: console.warn(`Unsupported inline token type: ${token.type}`, token); @@ -208,7 +209,11 @@ function inlineTokenToPortableText(imageMap, token) { async function migratePosts() { const posts = await readPosts(); const firstTen = posts.slice(0, 10); - const selectedPost = posts.find(p => p.slug === '2016/deploy-in-the-cloud-at-snap-of-a-finger'); + const selected = [ + '2016/deploy-in-the-cloud-at-snap-of-a-finger', + '2017/caw-and-singularity', + ] + const selectedPosts = posts.filter(post => selected.includes(post.slug)); console.log(''); console.log(''); @@ -216,11 +221,14 @@ async function migratePosts() { console.log(''); console.log(''); console.log(''); - console.log('🪣 Migrating posts...'); + 
console.log(''); + console.log(''); + console.log(''); + console.log('🟢🟢🟢 Migrating posts...'); console.log(''); - for (const post of [selectedPost]) { + for (const post of selectedPosts) { const imageMap = {}; for (const imagePath of post.images) { From dc0689952b37629da8362d04cb60c0e8a6d3f452 Mon Sep 17 00:00:00 2001 From: Jake Broughton Date: Fri, 20 Sep 2024 11:08:53 +0200 Subject: [PATCH 10/21] Attempt to fix lists --- internal/import.mjs | 32 +++++++++++++++++--------------- 1 file changed, 17 insertions(+), 15 deletions(-) diff --git a/internal/import.mjs b/internal/import.mjs index f41620de..ce27528d 100644 --- a/internal/import.mjs +++ b/internal/import.mjs @@ -38,14 +38,14 @@ function markdownToPortableText(markdown, imageMap) { function sanitizeText(text, removeLineBreaks = false) { // Replace all instances of ' with ' const t = text.replace(/'/g, "'"); - + if (removeLineBreaks) return t.replace(/\n/g, ' '); - + return t } function tokenToPortableText(imageMap, token) { - + switch (token.type) { case 'heading': return { @@ -57,7 +57,7 @@ function tokenToPortableText(imageMap, token) { case 'paragraph': const children = []; const markDefs = []; - + token.tokens.forEach(t => { if (t.type === 'link') { const linkKey = nanoid(); @@ -76,7 +76,7 @@ function tokenToPortableText(imageMap, token) { children.push(inlineTokenToPortableText(imageMap, t)); } }); - + return { _type: 'block', _key: nanoid(), @@ -118,7 +118,7 @@ function tokenToPortableText(imageMap, token) { console.warn(`Failed to find image for token: ${token.text}`); return null; } - + return { _type: 'image', _key: nanoid(), @@ -149,8 +149,9 @@ function tokenToPortableText(imageMap, token) { return { _type: 'block', _key: nanoid(), - style: token.ordered ? 'number' : 'bullet', - children: token.items.flatMap(item => + listItem: 'bullet', + style: 'normal', + children: token.items.flatMap(item => item.tokens.map(inlineTokenToPortableText.bind(null, imageMap)) ), }; @@ -166,11 +167,11 @@ function inlineTokenToPortableText(imageMap, token) { let marks = []; if (token.bold) marks.push('strong'); if (token.italic) marks.push('em'); - return { - _type: 'span', - text: sanitizeText(token.text, true), + return { + _type: 'span', + text: sanitizeText(token.text, true), marks: marks, - _key: nanoid() + _key: nanoid() }; case 'image': const image = imageMap[token.href]; @@ -199,6 +200,7 @@ function inlineTokenToPortableText(imageMap, token) { _type: 'span', _key: nanoid(), text: sanitizeText(token.text, true), + marks: [], }; default: console.warn(`Unsupported inline token type: ${token.type}`, token); @@ -226,10 +228,10 @@ async function migratePosts() { console.log(''); console.log('🟢🟢🟢 Migrating posts...'); console.log(''); - + for (const post of selectedPosts) { - + const imageMap = {}; for (const imagePath of post.images) { try { @@ -241,7 +243,7 @@ async function migratePosts() { console.error(`Failed to process image: ${imagePath}`, error); } } - + const portableTextContent = markdownToPortableText(post.content, imageMap); const newSlug = post.slug.split('/').pop(); From d19f1a778c1e3f3ec2941ed91b85cc90e11acf33 Mon Sep 17 00:00:00 2001 From: Jake Broughton Date: Fri, 20 Sep 2024 11:08:58 +0200 Subject: [PATCH 11/21] Add clear function --- internal/clearAll.mjs | 39 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 39 insertions(+) create mode 100644 internal/clearAll.mjs diff --git a/internal/clearAll.mjs b/internal/clearAll.mjs new file mode 100644 index 00000000..ab9c2587 --- /dev/null +++ b/internal/clearAll.mjs @@ 
-0,0 +1,39 @@ +import sanityClient from '@sanity/client'; + +export const client = sanityClient({ + projectId: 'o2y1bt2g', + dataset: 'seqera', + token: process.env.SANITY_TOKEN, + useCdn: false, +}); + +async function deleteAllBlogPostDev() { + try { + // 1. Fetch all documents of type 'blogPostDev' + const query = '*[_type == "blogPostDev"]._id'; + const ids = await client.fetch(query); + + console.log(`Found ${ids.length} blogPostDev documents to delete.`); + + // 2. Delete the documents in batches + const batchSize = 100; // Adjust based on your needs + for (let i = 0; i < ids.length; i += batchSize) { + const batch = ids.slice(i, i + batchSize); + const transaction = client.transaction(); + + batch.forEach(id => { + transaction.delete(id); + }); + + console.log(`Deleting batch ${i / batchSize + 1}...`); + await transaction.commit(); + console.log(`Batch ${i / batchSize + 1} deleted.`); + } + + console.log('All blogPostDev documents have been deleted.'); + } catch (error) { + console.error('Error deleting documents:', error); + } +} + +deleteAllBlogPostDev(); \ No newline at end of file From db1c32cfdaf93efadfff9a58e1900c1afda96c4f Mon Sep 17 00:00:00 2001 From: Jake Broughton Date: Wed, 25 Sep 2024 11:41:01 +0200 Subject: [PATCH 12/21] Add author export/import --- internal/export.json | 346 ++++++++++++++++++++++++++++++---------- internal/export.mjs | 6 +- internal/findPerson.mjs | 21 +++ internal/import.mjs | 17 +- 4 files changed, 299 insertions(+), 91 deletions(-) create mode 100644 internal/findPerson.mjs diff --git a/internal/export.json b/internal/export.json index 3635ae97..b18c7446 100644 --- a/internal/export.json +++ b/internal/export.json @@ -4,28 +4,36 @@ "title": "Reproducibility in Science - Nextflow meets Docker", "date": "2014-09-09T00:00:00.000Z", "content": "\nThe scientific world nowadays operates on the basis of published articles.\nThese are used to report novel discoveries to the rest of the scientific community.\n\nBut have you ever wondered what a scientific article is? It is a:\n\n1. defeasible argument for claims, supported by\n2. exhibited, reproducible data and methods, and\n3. explicit references to other work in that domain;\n4. described using domain-agreed technical terminology,\n5. which exists within a complex ecosystem of technologies, people and activities.\n\nHence the very essence of Science relies on the ability of scientists to reproduce and\nbuild upon each other’s published results.\n\nSo how much can we rely on published data? In a recent report in Nature, researchers at the\nAmgen corporation found that only 11% of the academic research in the literature was\nreproducible by their groups [[1](http://www.nature.com/nature/journal/v483/n7391/full/483531a.html)].\n\nWhile many factors are likely at play here, perhaps the most basic requirement for\nreproducibility holds that the materials reported in a study can be uniquely identified\nand obtained, such that experiments can be reproduced as faithfully as possible.\nThis information is meant to be documented in the \"materials and methods\" of journal articles,\nbut as many can attest, the information provided there is often not adequate for this task.\n\n### Promoting Computational Research Reproducibility\n\nEncouragingly scientific reproducibility has been at the forefront of many news stories\nand there exist numerous initiatives to help address this problem. 
Particularly, when it\ncomes to producing reproducible computational analyses, some publications are starting\nto publish the code and data used for analysing and generating figures.\n\nFor example, many articles in Nature and in the new Elife journal (and others) provide a\n\"source data\" download link next to figures. Sometimes Elife might even have an option\nto download the source code for figures.\n\nAs pointed out by Melissa Gymrek [in a recent post](http://melissagymrek.com/science/2014/08/29/docker-reproducible-research.html)\nthis is a great start, but there are still lots of problems. She wrote that, for example, if one wants\nto re-execute a data analyses from these papers, he/she will have to download the\nscripts and the data, to only realize that he/she has not all the required libraries,\nor that it only runs on, for example, an Ubuntu version he/she doesn't have, or some\npaths are hard-coded to match the authors' machine.\n\nIf it's not easy to run and doesn't run out of the box the chances that a researcher\nwill actually ever run most of these scripts is close to zero, especially if they lack\nthe time or expertise to manage the required installation of third-party libraries,\ntools or implement from scratch state-of-the-art data processing algorithms.\n\n### Here comes Docker\n\n[Docker](http://www.docker.com) containers technology is a solution to many of the computational\nresearch reproducibility problems. Basically, it is a kind of a lightweight virtual machine\nwhere you can set up a computing environment including all the libraries, code and data that you need,\nwithin a single _image_.\n\nThis image can be distributed publicly and can seamlessly run on any major Linux operating system.\nNo need for the user to mess with installation, paths, etc.\n\nThey just run the Docker image you provided, and everything is set up to work out of the box.\nResearchers have already started discussing this (e.g. [here](http://www.bioinformaticszen.com/post/reproducible-assembler-benchmarks/),\nand [here](https://bcbio.wordpress.com/2014/03/06/improving-reproducibility-and-installation-of-genomic-analysis-pipelines-with-docker/)).\n\n### Docker and Nextflow: a perfect match\n\nOne big advantage Docker has compared to _traditional_ machine virtualisation technology\nis that it doesn't need a complete copy of the operating system, thus it has a minimal\nstartup time. This makes it possible to virtualise single applications or launch the execution\nof multiple containers, that can run in parallel, in order to speedup a large computation.\n\nNextflow is a data-driven toolkit for computational pipelines, which aims to simplify the deployment of\ndistributed and highly parallelised pipelines for scientific applications.\n\nThe latest version integrates the support for Docker containers that enables the deployment\nof self-contained and truly reproducible pipelines.\n\n### How they work together\n\nA Nextflow pipeline is made up by putting together several processes. Each process\ncan be written in any scripting language that can be executed by the Linux platform\n(BASH, Perl, Ruby, Python, etc). Parallelisation is automatically managed\nby the framework and it is implicitly defined by the processes input and\noutput declarations.\n\nBy integrating Docker with Nextflow, every pipeline process can be executed independently\nin its own container, this guarantees that each of them run in a predictable\nmanner without worrying about the configuration of the target execution platform. 
Moreover the\nminimal overhead added by Docker allows us to spawn multiple container executions in a parallel\nmanner with a negligible performance loss when compared to a platform _native_ execution.\n\n### An example\n\nAs a proof of concept of the Docker integration with Nextflow you can try out the\npipeline example at this [link](https://github.com/nextflow-io/examples/blob/master/blast-parallel.nf).\n\nIt splits a protein sequences multi FASTA file into chunks of _n_ entries, executes a BLAST query\nfor each of them, then extracts the top 10 matching sequences and\nfinally aligns the results with the T-Coffee multiple sequence aligner.\n\nIn a common scenario you generally need to install and configure the tools required by this\nscript: BLAST and T-Coffee. Moreover you should provide a formatted protein database in order\nto execute the BLAST search.\n\nBy using Docker with Nextflow you only need to have the Docker engine installed in your\ncomputer and a Java VM. In order to try this example out, follow these steps:\n\nInstall the latest version of Nextflow by entering the following command in your shell terminal:\n\n curl -fsSL get.nextflow.io | bash\n\nThen download the required Docker image with this command:\n\n docker pull nextflow/examples\n\nYou can check the content of the image looking at the [Dockerfile](https://github.com/nextflow-io/examples/blob/master/Dockerfile)\nused to create it.\n\nNow you are ready to run the demo by launching the pipeline execution as shown below:\n\n nextflow run examples/blast-parallel.nf -with-docker\n\nThis will run the pipeline printing the final alignment out on the terminal screen.\nYou can also provide your own protein sequences multi FASTA file by adding, in the above command line,\nthe option `--query ` and change the splitting chunk size with `--chunk n` option.\n\nNote: the result doesn't have a real biological meaning since it uses a very small protein database.\n\n### Conclusion\n\nThe mix of Docker, GitHub and Nextflow technologies make it possible to deploy\nself-contained and truly replicable pipelines. 
It requires zero configuration and\nenables the reproducibility of data analysis pipelines in any system in which a Java VM and\nthe Docker engine are available.\n\n### Learn how to do it!\n\nFollow our documentation for a quick start using Docker with Nextflow at\nthe following link https://www.nextflow.io/docs/latest/docker.html\n", - "images": [] + "images": [], + "author": "Maria Chatzou", + "tags": "docker,github,reproducibility,data-analysis" }, { "slug": "2014/share-nextflow-pipelines-with-github", "title": "Share Nextflow pipelines with GitHub", "date": "2014-08-07T00:00:00.000Z", "content": "\nThe [GitHub](https://github.com) code repository and collaboration platform is widely\nused between researchers to publish their work and to collaborate on projects source code.\n\nEven more interestingly a few months ago [GitHub announced improved support for researchers](https://github.com/blog/1840-improving-github-for-science)\nmaking it possible to get a Digital Object Identifier (DOI) for any GitHub repository archive.\n\nWith a DOI for your GitHub repository archive your code becomes formally citable\nin scientific publications.\n\n### Why use GitHub with Nextflow?\n\nThe latest Nextflow release (0.9.0) seamlessly integrates with GitHub.\nThis feature allows you to manage your code in a more consistent manner, or use other\npeople's Nextflow pipelines, published through GitHub, in a quick and transparent manner.\n\n### How it works\n\nThe idea is very simple, when you launch a script execution with Nextflow, it will look for\na file with the pipeline name you've specified. If that file does not exist,\nit will look for a public repository with the same name on GitHub. If it is found, the\nrepository is automatically downloaded to your computer and the code executed. This repository\nis stored in the Nextflow home directory, by default `$HOME/.nextflow`, thus it will be reused\nfor any further execution.\n\nYou can try this feature out, having Nextflow (version 0.9.0 or higher) installed in your computer,\nby simply entering the following command in your shell terminal:\n\n nextflow run nextflow-io/hello\n\nThe first time you execute this command Nextflow will download the pipeline\nat the following GitHub repository `https://github.com/nextflow-io/hello`,\nas you don't already have it in your computer. It will then execute it producing the expected output.\n\nIn order for a GitHub repository to be used as a Nextflow project, it must\ncontain at least one file named `main.nf` that defines your Nextflow pipeline script.\n\n### Run a specific revision\n\nAny Git branch, tag or commit ID in the GitHub repository can be used to specify a revision,\nthat you want to execute, when running your pipeline by adding the `-r` option to the run command line.\nSo for example you could enter:\n\n nextflow run nextflow-io/hello -r mybranch\n\nor\n\n nextflow run nextflow-io/hello -r v1.1\n\nThis can be very useful when comparing different versions of your project.\nIt also guarantees consistent results in your pipeline as your source code evolves.\n\n### Commands to manage pipelines\n\nThe following commands allows you to perform some basic operations that can be used to manage your pipelines.\nAnyway Nextflow is not meant to replace functionalities provided by the [Git](http://git-scm.com/) tool,\nyou may still need it to create new repositories or commit changes, etc.\n\n#### List available pipelines\n\nThe `ls` command allows you to list all the pipelines you have downloaded in\nyour computer. 
For example:\n\n nextflow ls\n\nThis prints a list similar to the following one:\n\n cbcrg/piper-nf\n nextflow-io/hello\n\n#### Show pipeline information\n\nBy using the `info` command you can show information from a downloaded pipeline. For example:\n\n $ nextflow info hello\n\nThis command prints:\n\n repo name : nextflow-io/hello\n home page : http://github.com/nextflow-io/hello\n local path : $HOME/.nextflow/assets/nextflow-io/hello\n main script: main.nf\n revisions :\n * master (default)\n mybranch\n v1.1 [t]\n v1.2 [t]\n\nStarting from the top it shows: 1) the repository name; 2) the project home page; 3) the local folder where the pipeline has been downloaded; 4) the script that is executed\nwhen launched; 5) the list of available revisions i.e. branches + tags. Tags are marked with\na `[t]` on the right, the current checked-out revision is marked with a `*` on the left.\n\n#### Pull or update a pipeline\n\nThe `pull` command allows you to download a pipeline from a GitHub repository or to update\nit if that repository has already been downloaded. For example:\n\n nextflow pull nextflow-io/examples\n\nDownloaded pipelines are stored in the folder `$HOME/.nextflow/assets` in your computer.\n\n#### Clone a pipeline into a folder\n\nThe `clone` command allows you to copy a Nextflow pipeline project to a directory of your choice. For example:\n\n nextflow clone nextflow-io/hello target-dir\n\nIf the destination directory is omitted the specified pipeline is cloned to a directory\nwith the same name as the pipeline _base_ name (e.g. `hello`) in the current folder.\n\nThe clone command can be used to inspect or modify the source code of a pipeline. You can\neventually commit and push back your changes by using the usual Git/GitHub workflow.\n\n#### Drop an installed pipeline\n\nDownloaded pipelines can be deleted by using the `drop` command, as shown below:\n\n nextflow drop nextflow-io/hello\n\n### Limitations and known problems\n\n- GitHub private repositories currently are not supported Support for private GitHub repositories has been introduced with version 0.10.0.\n- Symlinks committed in a Git repository are not resolved correctly\n when downloaded/cloned by Nextflow Symlinks are resolved correctly when using Nextflow version 0.11.0 (or higher).\n", - "images": [] + "images": [], + "author": "Paolo Di Tommaso", + "tags": "git,github,reproducibility" }, { "slug": "2014/using-docker-in-hpc-cluster", "title": "Using Docker for scientific data analysis in an HPC cluster", "date": "2014-11-06T00:00:00.000Z", "content": "\nScientific data analysis pipelines are rarely composed by a single piece of software.\nIn a real world scenario, computational pipelines are made up of multiple stages, each of which\ncan execute many different scripts, system commands and external tools deployed in a hosting computing\nenvironment, usually an HPC cluster.\n\nAs I work as a research engineer in a bioinformatics lab I experience on a daily basis the\ndifficulties related on keeping such a piece of software consistent.\n\nComputing environments can change frequently in order to test new pieces of software or\nmaybe because system libraries need to be updated. For this reason replicating the results\nof a data analysis over time can be a challenging task.\n\n[Docker](http://www.docker.com) has emerged recently as a new type of virtualisation technology that allows one\nto create a self-contained runtime environment. 
There are plenty of examples\nshowing the benefits of using it to run application services, like web servers\nor databases.\n\nHowever it seems that few people have considered using Docker for the deployment of scientific\ndata analysis pipelines on distributed cluster of computer, in order to simplify the development,\nthe deployment and the replicability of this kind of applications.\n\nFor this reason I wanted to test the capabilities of Docker to solve these problems in the\ncluster available in our [institute](http://www.crg.eu).\n\n## Method\n\nThe Docker engine has been installed in each node of our cluster, that runs a [Univa grid engine](http://www.univa.com/products/grid-engine.php) resource manager.\nA Docker private registry instance has also been installed in our internal network, so that images\ncan be pulled from the local repository in a much faster way when compared to the public\n[Docker registry](http://registry.hub.docker.com).\n\nMoreover the Univa grid engine has been configured with a custom [complex](http://www.gridengine.eu/mangridengine/htmlman5/complex.html)\nresource type. This allows us to request a specific Docker image as a resource type while\nsubmitting a job execution to the cluster.\n\nThe Docker image is requested as a _soft_ resource, by doing that the UGE scheduler\ntries to run a job to a node where that image has already been pulled,\notherwise a lower priority is given to it and it is executed, eventually, by a node where\nthe specified Docker image is not available. This will force the node to pull the required\nimage from the local registry at the time of the job execution.\n\nThis environment has been tested with [Piper-NF](https://github.com/cbcrg/piper-nf), a genomic pipeline for the\ndetection and mapping of long non-coding RNAs.\n\nThe pipeline runs on top of Nextflow, which takes care of the tasks parallelisation and submits\nthe jobs for execution to the Univa grid engine.\n\nThe Piper-NF code wasn't modified in order to run it using Docker.\nNextflow is able to handle it automatically. The Docker containers are run in such a way that\nthe tasks result files are created in the hosting file system, in other\nwords it behaves in a completely transparent manner without requiring extra steps or affecting\nthe flow of the pipeline execution.\n\nIt was only necessary to specify the Docker image (or images) to be used in the Nextflow\nconfiguration file for the pipeline. You can read more about this at [this link](https://www.nextflow.io/docs/latest/docker.html).\n\n## Results\n\nTo benchmark the impact of Docker on the pipeline performance a comparison was made running\nit with and without Docker.\n\nFor this experiment 10 cluster nodes were used. The pipeline execution launches around 100 jobs,\nand it was run 5 times by using the same dataset with and without Docker.\n\nThe average execution time without Docker was 28.6 minutes, while the average\npipeline execution time, running each job in a Docker container, was 32.2 minutes.\nThus, by using Docker the overall execution time increased by something around 12.5%.\n\nIt is important to note that this time includes both the Docker bootstrap time,\nand the time overhead that is added to the task execution by the virtualisation layer.\n\nFor this reason the actual task run time was measured as well i.e. without including the\nDocker bootstrap time overhead. In this case, the aggregate average task execution time was 57.3 minutes\nand 59.5 minutes when running the same tasks using Docker. 
Thus, the time overhead\nadded by the Docker virtualisation layer to the effective task run time can be estimated\nto around 4% in our test.\n\nKeeping the complete toolset required by the pipeline execution within a Docker image dramatically\nreduced configuration and deployment problems. Also storing these images into the private and\n[public](https://registry.hub.docker.com/repos/cbcrg/) repositories with a unique tag allowed us\nto replicate the results without the usual burden required to set-up an identical computing environment.\n\n## Conclusion\n\nThe fast start-up time for Docker containers technology allows one to virtualise a single process or\nthe execution of a bunch of applications, instead of a complete operating system. This opens up new possibilities,\nfor example the possibility to \"virtualise\" distributed job executions in an HPC cluster of computers.\n\nThe minimal performance loss introduced by the Docker engine is offset by the advantages of running\nyour analysis in a self-contained and dead easy to reproduce runtime environment, which guarantees\nthe consistency of the results over time and across different computing platforms.\n\n#### Credits\n\nThanks to Arnau Bria and the all scientific systems admins team to manage the Docker installation\nin the CRG computing cluster.\n", - "images": [] + "images": [], + "author": "Paolo Di Tommaso", + "tags": "docker,reproducibility,data-analysis,hpc" }, { "slug": "2015/innovation-in-science-the-story-behind-nextflow", "title": "Innovation In Science - The story behind Nextflow", "date": "2015-06-09T00:00:00.000Z", "content": "\nInnovation can be viewed as the application of solutions that meet new requirements or\nexisting market needs. Academia has traditionally been the driving force of innovation.\nScientific ideas have shaped the world, but only a few of them were brought to market by\nthe inventing scientists themselves, resulting in both time and financial loses.\n\nLately there have been several attempts to boost scientific innovation and translation,\nwith most notable in Europe being the Horizon 2020 funding program. The problem with these\ntypes of funding is that they are not designed for PhDs and Postdocs, but rather aim to\npromote the collaboration of senior scientists in different institutions. This neglects two\nvery important facts, first and foremost that most of the Nobel prizes were given for\ndiscoveries made when scientists were in their 20's / 30's (not in their 50's / 60's).\nSecondly, innovation really happens when a few individuals (not institutions) face a\nproblem in their everyday life/work, and one day they just decide to do something about it\n(end-user innovation). Without realizing, these people address a need that many others have.\nThey don’t do it for the money or the glory; they do it because it bothers them!\nMany examples of companies that started exactly this way include Apple, Google, and\nVirgin Airlines.\n\n### The story of Nextflow\n\nSimilarly, Nextflow started as an attempt to solve the every-day computational problems we\nwere facing with “big biomedical data” analyses. We wished that our huge and almost cryptic\nBASH-based pipelines could handle parallelization automatically. In our effort to make that\nhappen we stumbled upon the [Dataflow](http://en.wikipedia.org/wiki/Dataflow_programming)\nprogramming model and Nextflow was created.\nWe were getting furious every time our two-week long pipelines were crashing and we had\nto re-execute them from the beginning. 
We, therefore, developed a caching system, which\nallows Nextflow to resume any pipeline from the last executed step. While we were really\nenjoying developing a new [DSL](http://en.wikipedia.org/wiki/Domain-specific_language) and\ncreating our own operators, at the same time we were not willing to give up our favorite\nPerl/Python scripts and one-liners, and thus Nextflow became a polyglot.\n\nAnother problem we were facing was that our pipelines were invoking a lot of\nthird-party software, making distribution and execution on different platforms a nightmare.\nOnce again while searching for a solution to this problem, we were able to identify a\nbreakthrough technology [Docker](https://www.docker.com/), which is now revolutionising\ncloud computation. Nextflow has been one of the first framework, that fully\nsupports Docker containers and allows pipeline execution in an isolated and easy to distribute manner.\nOf course, sharing our pipelines with our friends rapidly became a necessity and so we had\nto make Nextflow smart enough to support [Github](https://github.com) and [Bitbucket](https://bitbucket.org/) integration.\n\nI don’t know if Nextflow will make as much difference in the world as the Dataflow\nprogramming model and Docker container technology are making, but it has already made a\nbig difference in our lives and that is all we ever wanted…\n\n### Conclusion\n\nSummarising, it is a pity that PhDs and Postdocs are the neglected engine of Innovation.\nThey are not empowered to innovate, by identifying and addressing their needs, and to\npotentially set up commercial solutions to their problems. This fact becomes even sadder\nwhen you think that only 3% of Postdocs have a chance to become PIs in the UK. Instead more\nand more money is being invested into the senior scientists who only require their PhD students\nand Postdocs to put another step into a well-defined ladder. In todays world it seems that\nideas, such as Nextflow, will only get funded for their scientific value, not as innovative\nconcepts trying to address a need.\n", - "images": [] + "images": [], + "author": "Maria Chatzou", + "tags": "innovation,science,pipelines,nextflow" }, { "slug": "2015/introducing-nextflow-console", @@ -34,70 +42,90 @@ "content": "\nThe latest version of Nextflow introduces a new _console_ graphical interface.\n\nThe Nextflow console is a REPL ([read-eval-print loop](http://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop))\nenvironment that allows one to quickly test part of a script or pieces of Nextflow code\nin an interactive manner.\n\nIt is a handy tool that allows one to evaluate fragments of Nextflow/Groovy code\nor fast prototype a complete pipeline script.\n\n### Getting started\n\nThe console application is included in the latest version of Nextflow\n([0.13.1](https://github.com/nextflow-io/nextflow/releases) or higher).\n\nYou can try this feature out, having Nextflow installed on your computer, by entering the\nfollowing command in your shell terminal: `nextflow console `.\n\nWhen you execute it for the first time, Nextflow will spend a few seconds downloading\nthe required runtime dependencies. 
When complete the console window will appear as shown in\nthe picture below.\n\n\"Nextflow\n\nIt contains a text editor (the top white box) that allows you to enter and modify code snippets.\nThe results area (the bottom yellow box) will show the executed code's output.\n\nAt the top you will find the menu bar (not shown in this picture) and the actions\ntoolbar that allows you to open, save, execute (etc.) the code been tested.\n\nAs a practical execution example, simply copy and paste the following piece of code in the\nconsole editor box:\n\n echo true\n\n process sayHello {\n\n \"\"\"\n echo Hello world\n \"\"\"\n\n }\n\nThen, in order to evaluate it, open the `Script` menu in the top menu bar and select the `Run`\ncommand. Alternatively you can use the `CTRL+R` keyboard shortcut to run it (`⌘+R` on the Mac).\nIn the result box an output similar to the following will appear:\n\n [warm up] executor > local\n [00/d78a0f] Submitted process > sayHello (1)\n Hello world\n\nNow you can try to modify the entered process script, execute it again and check that\nthe printed result has changed.\n\nIf the output doesn't appear, open the `View` menu and make sure that the entry `Capture Standard\nOutput` is selected (it must have a tick on the left).\n\nIt is worth noting that the global script context is maintained across script executions.\nThis means that variables declared in the global script scope are not lost when the\nscript run is complete, and they can be accessed in further executions of the same or another\npiece of code.\n\nIn order to reset the global context you can use the command `Clear Script Context`\navailable in the `Script` menu.\n\n### Conclusion\n\nThe Nextflow console is a REPL environment which allows you to experiment and get used\nto the Nextflow programming environment. 
By using it you can prototype or test your code\nwithout the need to create/edit script files.\n\nNote: the Nextflow console is implemented by sub-classing the [Groovy console](http://groovy-lang.org/groovyconsole.html) tool.\nFor this reason you may find some labels that refer to the Groovy programming environment\nin this program.\n", "images": [ "/img/nextflow-console1.png" - ] + ], + "author": "Paolo Di Tommaso", + "tags": "data-analysis,pipelines,repl,groovy" }, { "slug": "2015/mpi-like-execution-with-nextflow", "title": "MPI-like distributed execution with Nextflow", "date": "2015-11-13T00:00:00.000Z", "content": "\nThe main goal of Nextflow is to make workflows portable across different\ncomputing platforms taking advantage of the parallelisation features provided\nby the underlying system without having to reimplement your application code.\n\nFrom the beginning Nextflow has included executors designed to target the most popular\nresource managers and batch schedulers commonly used in HPC data centers,\nsuch as [Univa Grid Engine](http://www.univa.com), [Platform LSF](http://www.ibm.com/systems/platformcomputing/products/lsf/),\n[SLURM](https://computing.llnl.gov/linux/slurm/), [PBS](http://www.pbsworks.com/Product.aspx?id=1) and [Torque](http://www.adaptivecomputing.com/products/open-source/torque/).\n\nWhen using one of these executors Nextflow submits the computational workflow tasks\nas independent job requests to the underlying platform scheduler, specifying\nfor each of them the computing resources needed to carry out its job.\n\nThis approach works well for workflows that are composed of long running tasks, which\nis the case of most common genomic pipelines.\n\nHowever this approach does not scale well for workloads made up of a large number of\nshort-lived tasks (e.g. a few seconds or sub-seconds). In this scenario the resource\nmanager scheduling time is much longer than the actual task execution time, thus resulting\nin an overall execution time that is much longer than the real execution time.\nIn some cases this represents an unacceptable waste of computing resources.\n\nMoreover supercomputers, such as [MareNostrum](https://www.bsc.es/marenostrum-support-services/mn3)\nin the [Barcelona Supercomputer Center (BSC)](https://www.bsc.es/), are optimized for\nmemory distributed applications. In this context it is needed to allocate a certain\namount of computing resources in advance to run the application in a distributed manner,\ncommonly using the [MPI](https://en.wikipedia.org/wiki/Message_Passing_Interface) standard.\n\nIn this scenario, the Nextflow execution model was far from optimal, if not unfeasible.\n\n### Distributed execution\n\nFor this reason, since the release 0.16.0, Nextflow has implemented a new distributed execution\nmodel that greatly improves the computation capability of the framework. It uses [Apache Ignite](https://ignite.apache.org/),\na lightweight clustering engine and in-memory data grid, which has been recently open sourced\nunder the Apache software foundation umbrella.\n\nWhen using this feature a Nextflow application is launched as if it were an MPI application.\nIt uses a job wrapper that submits a single request specifying all the needed computing\nresources. 
The Nextflow command line is executed by using the `mpirun` utility, as shown in the\nexample below:\n\n #!/bin/bash\n #$ -l virtual_free=120G\n #$ -q \n #$ -N \n #$ -pe ompi \n mpirun --pernode nextflow run -with-mpi [pipeline parameters]\n\nThis tool spawns a Nextflow instance in each of the computing nodes allocated by the\ncluster manager.\n\nEach Nextflow instance automatically connects with the other peers creating an _private_\ninternal cluster, thanks to the Apache Ignite clustering feature that\nis embedded within Nextflow itself.\n\nThe first node becomes the application driver that manages the execution of the\nworkflow application, submitting the tasks to the remaining nodes that act as workers.\n\nWhen the application is complete, the Nextflow driver automatically shuts down the\nNextflow/Ignite cluster and terminates the job execution.\n\n![Nextflow distributed execution](/img/nextflow-distributed-execution.png)\n\n### Conclusion\n\nIn this way it is possible to deploy a Nextflow workload in a supercomputer using an\nexecution strategy that resembles the MPI distributed execution model. This doesn't\nrequire to implement your application using the MPI api/library and it allows you to\nmaintain your code portable across different execution platforms.\n\nAlthough we do not currently have a performance comparison between a Nextflow distributed\nexecution and an equivalent MPI application, we assume that the latter provides better\nperformance due to its low-level optimisation.\n\nNextflow, however, focuses on the fast prototyping of scientific applications in a portable\nmanner while maintaining the ability to scale and distribute the application workload in an\nefficient manner in an HPC cluster.\n\nThis allows researchers to validate an experiment, quickly, reusing existing tools and\nsoftware components. This eventually makes it possible to implement an optimised version\nusing a low-level programming language in the second stage of a project.\n\nRead the documentation to learn more about the [Nextflow distributed execution model](https://www.nextflow.io/docs/latest/ignite.html#execution-with-mpi).\n", - "images": [] + "images": [], + "author": "Paolo Di Tommaso", + "tags": "mpi,hpc,pipelines,genomic" }, { "slug": "2015/the-impact-of-docker-on-genomic-pipelines", "title": "The impact of Docker containers on the performance of genomic pipelines", "date": "2015-06-15T00:00:00.000Z", "content": "\nIn a recent publication we assessed the impact of Docker containers technology\non the performance of bioinformatic tools and data analysis workflows.\n\nWe benchmarked three different data analyses: a RNA sequence pipeline for gene expression,\na consensus assembly and variant calling pipeline, and finally a pipeline for the detection\nand mapping of long non-coding RNAs.\n\nWe found that Docker containers have only a minor impact on the performance\nof common genomic data analysis, which is negligible when the executed tasks are demanding\nin terms of computational time.\n\n_[This publication is available as PeerJ preprint at this link](https://peerj.com/preprints/1171/)._\n", - "images": [] + "images": [], + "author": "Paolo Di Tommaso", + "tags": "docker,reproducibility,pipelines,nextflow,genomic" }, { "slug": "2016/best-practice-for-reproducibility", "title": "Workflows & publishing: best practice for reproducibility", "date": "2016-04-13T00:00:00.000Z", "content": "\nPublication time acts as a snapshot for scientific work. 
Whether a project is ongoing\nor not, work which was performed months ago must be described, new software documented,\ndata collated and figures generated.\n\nThe monumental increase in data and pipeline complexity has led to this task being\nperformed to many differing standards, or [lack of thereof](http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0080278).\nWe all agree it is not good enough to simply note down the software version number.\nBut what practical measures can be taken?\n\nThe recent publication describing _Kallisto_ [(Bray et al. 2016)](https://doi.org/10.1038/nbt.3519)\nprovides an excellent high profile example of the growing efforts to ensure reproducible\nscience in computational biology. The authors provide a GitHub [repository](https://github.com/pachterlab/kallisto_paper_analysis)\nthat _“contains all the analysis to reproduce the results in the kallisto paper”_.\n\nThey should be applauded and indeed - in the Twittersphere - they were. The corresponding\nauthor Lior Pachter stated that the publication could be reproduced starting from raw\nreads in the NCBI Sequence Read Archive through to the results, which marks a fantastic\naccomplishment.\n\n

Hoping people will notice https://t.co/qiu3LFozMX by @yarbsalocin @hjpimentel @pmelsted reproducing ALL the #kallisto paper from SRA→results

— Lior Pachter (@lpachter) April 5, 2016
\n\n\nThey achieve this utilising the workflow framework [Snakemake](https://bitbucket.org/snakemake/snakemake/wiki/Home).\nIncreasingly, we are seeing scientists applying workflow frameworks to their pipelines,\nwhich is great to see. There is a learning curve, but I have personally found the payoffs\nin productivity to be immense.\n\nAs both users and developers of Nextflow, we have long discussed best practice to ensure\nreproducibility of our work. As a community, we are at the beginning of that conversation\n\n- there are still many ideas to be aired and details ironed out - nevertheless we wished\n to provide a _state-of-play_ as we see it and to describe what is possible with Nextflow\n in this regard.\n\n### Guaranteed Reproducibility\n\nThis is our goal. It is one thing for a pipeline to be able to be reproduced in your own\nhands, on your machine, yet is another for this to be guaranteed so that anyone anywhere\ncan reproduce it. What I mean by guaranteed is that when a given pipeline is executed,\nthere is only one result which can be output.\nEnvisage what I term the _reproducibility triangle_: consisting of data, code and\ncompute environment.\n\n![Reproducibility Triangle](/img/reproducibility-triangle.png)\n\n**Figure 1:** The Reproducibility Triangle. _Data_: raw data such as sequencing reads,\ngenomes and annotations but also metadata such as experimental design. _Code_:\nscripts, binaries and libraries/dependencies. _Environment_: operating system.\n\nIf there is any change to one of these then the reproducibililty is no longer guaranteed.\nFor years there have been solutions to each of these individual components. But they have\nlived a somewhat discrete existence: data in databases such as the SRA and Ensembl, code\non GitHub and compute environments in the form of virtual machines. We think that in the\nfuture science must embrace solutions that integrate each of these components natively and\nholistically.\n\n### Implementation\n\nNextflow provides a solution to reproduciblility through version control and sandboxing.\n\n#### Code\n\nVersion control is provided via [native integration with GitHub](https://www.nextflow.io/docs/latest/sharing.html)\nand other popular code management platforms such as Bitbucket and GitLab.\nPipelines can be pulled, executed, developed, collaborated on and shared. For example,\nthe command below will pull a specific version of a [simple Kallisto + Sleuth pipeline](https://github.com/cbcrg/kallisto-nf)\nfrom GitHub and execute it. The `-r` parameter can be used to specify a specific tag, branch\nor revision that was previously defined in the Git repository.\n\n nextflow run cbcrg/kallisto-nf -r v0.9\n\n#### Environment\n\nSandboxing during both development and execution is another key concept; version control\nalone does not ensure that all dependencies nor the compute environment are the same.\n\nA simplified implementation of this places all binaries, dependencies and libraries within\nthe project repository. In Nextflow, any binaries within the the `bin` directory of a\nrepository are added to the path. 
Also, within the Nextflow [config file](https://github.com/cbcrg/kallisto-nf/blob/master/nextflow.config),\nenvironmental variables such as `PERL5LIB` can be defined so that they are automatically\nadded during the task executions.\n\nThis can be taken a step further with containerisation such as [Docker](https://www.nextflow.io/docs/latest/docker.html).\nWe have recently published [work](https://doi.org/10.7717/peerj.1273) about this:\nbriefly a [dockerfile](https://github.com/cbcrg/kallisto-nf/blob/master/Dockerfile)\ncontaining the instructions on how to build the docker image resides inside a repository.\nThis provides a specification for the operating system, software, libraries and\ndependencies to be run.\n\nThe images themself also have content-addressable identifiers in the form of\n[digests](https://docs.docker.com/engine/userguide/containers/dockerimages/#image-digests),\nwhich ensure not a single byte of information, from the operating system through to the\nlibraries pulled from public repos, has been changed. This container digest can be specified\nin the [pipeline config file](https://github.com/cbcrg/kallisto-nf/blob/master/nextflow.config).\n\n process {\n container = \"cbcrg/kallisto-nf@sha256:9f84012739...\"\n }\n\nWhen doing so Nextflow automatically pulls the specified image from the Docker Hub and\nmanages the execution of the pipeline tasks from within the container in a transparent manner,\ni.e. without having to adapt or modify your code.\n\n#### Data\n\nData is currently one of the more challenging aspect to address. _Small data_ can be\neasily version controlled within git-like repositories. For larger files\nthe [Git Large File Storage](https://git-lfs.github.com/), for which Nextflow provides\nbuilt-in support, may be one solution. Ultimately though, the real home of scientific data\nis in publicly available, programmatically accessible databases.\n\nProviding out-of-box solutions is difficult given the hugely varying nature of the data\nand meta-data within these databases. We are currently looking to incorporate the most\nhighly used ones, such as the [SRA](http://www.ncbi.nlm.nih.gov/sra) and [Ensembl](http://www.ensembl.org/index.html).\nIn the long term we have an eye on initiatives, such as [NCBI BioProject](https://www.ncbi.nlm.nih.gov/bioproject/),\nwith the idea there is a single identifier for both the data and metadata that can be referenced in a workflow.\n\nAdhering to the practices above, one could imagine one line of code which would appear within a publication.\n\n nextflow run [user/repo] -r [version] --data[DB_reference:data_reference] -with-docker\n\nThe result would be guaranteed to be reproduced by whoever wished.\n\n### Conclusion\n\nWith this approach the reproducilbility triangle is complete. But it must be noted that\nthis does not guard against conceptual or implementation errors. It does not replace proper\ndocumentation. What it does is to provide transparency to a result.\n\nThe assumption that the deterministic nature of computation makes results insusceptible\nto irreproducbility is clearly false. We consider Nextflow with its other features such\nits polyglot nature, out-of-the-box portability and native support across HPC and Cloud\nenvironments to be an ideal solution in our everyday work. 
We hope to see more scientists\nadopt this approach to their workflows.\n\nThe recent efforts by the _Kallisto_ authors highlight the appetite for increasing these\nstandards and we encourage the community at large to move towards ensuring this becomes\nthe normal state of affairs for publishing in science.\n\n### References\n\nBray, Nicolas L., Harold Pimentel, Páll Melsted, and Lior Pachter. 2016. “Near-Optimal Probabilistic RNA-Seq Quantification.” Nature Biotechnology, April. Nature Publishing Group. doi:10.1038/nbt.3519.\n\nDi Tommaso P, Palumbo E, Chatzou M, Prieto P, Heuer ML, Notredame C. (2015) \"The impact of Docker containers on the performance of genomic pipelines.\" PeerJ 3:e1273 doi.org:10.7717/peerj.1273.\n\nGarijo D, Kinnings S, Xie L, Xie L, Zhang Y, Bourne PE, et al. (2013) \"Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome.\" PLoS ONE 8(11): e80278. doi:10.1371/journal.pone.0080278\n", - "images": [] + "images": [], + "author": "Evan Floden", + "tags": "bioinformatics,reproducibility,pipelines,nextflow,genomic,docker" }, { "slug": "2016/deploy-in-the-cloud-at-snap-of-a-finger", "title": "Deploy your computational pipelines in the cloud at the snap-of-a-finger", "date": "2016-09-01T00:00:00.000Z", - "content": "\n

\nLearn how to deploy and run a computational pipeline in the Amazon AWS cloud with ease\nthanks to Nextflow and Docker containers\n

\n\nNextflow is a framework that simplifies the writing of parallel and distributed computational\npipelines in a portable and reproducible manner across different computing platforms, from\na laptop to a cluster of computers.\n\nIndeed, the original idea, when this project started three years ago, was to\nimplement a tool that would allow researchers in\n[our lab](http://www.crg.eu/es/programmes-groups/comparative-bioinformatics) to smoothly migrate\ntheir data analysis applications in the cloud when needed - without having\nto change or adapt their code.\n\nHowever to date Nextflow has been used mostly to deploy computational workflows within on-premise\ncomputing clusters or HPC data-centers, because these infrastructures are easier to use\nand provide, on average, cheaper cost and better performance when compared to a cloud environment.\n\nA major obstacle to efficient deployment of scientific workflows in the cloud is the lack\nof a performant POSIX compatible shared file system. These kinds of applications\nare usually made-up by putting together a collection of tools, scripts and\nsystem commands that need a reliable file system to share with each other the input and\noutput files as they are produced, above all in a distributed cluster of computers.\n\nThe recent availability of the [Amazon Elastic File System](https://aws.amazon.com/efs/)\n(EFS), a fully featured NFS based file system hosted on the AWS infrastructure represents\na major step in this context, unlocking the deployment of scientific computing\nin the cloud and taking it to the next level.\n\n### Nextflow support for the cloud\n\nNextflow could already be deployed in the cloud, either using tools such as\n[ElastiCluster](https://github.com/gc3-uzh-ch/elasticluster) or\n[CfnCluster](https://aws.amazon.com/hpc/cfncluster/), or by using custom deployment\nscripts. However the procedure was still cumbersome and, above all, it was not optimised\nto fully take advantage of cloud elasticity i.e. the ability to (re)shape the computing\ncluster dynamically as the computing needs change over time.\n\nFor these reasons, we decided it was time to provide Nextflow with a first-class support\nfor the cloud, integrating the Amazon EFS and implementing an optimised native cloud\nscheduler, based on [Apache Ignite](https://ignite.apache.org/), with a full support for cluster\nauto-scaling and spot/preemptible instances.\n\nIn practice this means that Nextflow can now spin-up and configure a fully featured computing\ncluster in the cloud with a single command, after that you need only to login to the master\nnode and launch the pipeline execution as you would do in your on-premise cluster.\n\n### Demo !\n\nSince a demo is worth a thousands words, I've record a short screencast showing how\nNextflow can setup a cluster in the cloud and mount the Amazon EFS shared file system.\n\n\n\n

\nNote: the EC2 instances startup delay has been cut from this screencast. It required around\n5 minutes to launch them and set up the cluster.\n

\n\nLet's recap the steps showed in the demo:\n\n- The user provides the cloud parameters (such as the VM image ID and the instance type)\n in the `nextflow.config` file.\n- To configure the EFS file system you need to provide your EFS storage ID and the mount path\n by using the `sharedStorageId` and `sharedStorageMount` properties.\n- To use [EC2 Spot](https://aws.amazon.com/ec2/spot/) instances, just specify the price\n you want to bid by using the `spotPrice` property.\n- The AWS access and secret keys are provided by using the usual environment variables.\n- The `nextflow cloud create` launches the requested number of instances, configures the user and\n access key, mounts the EFS storage and setups the Nextflow cluster automatically.\n Any Linux AMI can be used, it is only required that the [cloud-init](https://cloudinit.readthedocs.io/en/latest/)\n package, a Java 7+ runtime and the Docker engine are present.\n- When the cluster is ready, you can SSH in the master node and launch the pipeline execution\n as usual with the `nextflow run ` command.\n- For the sake of this demo we are using [paraMSA](https://github.com/pditommaso/paraMSA),\n a pipeline for generating multiple sequence alignments and bootstrap replicates developed\n in our lab.\n- Nextflow automatically pulls the pipeline code from its GitHub repository when the\n execution is launched. This repository includes also a dataset which is used by default.\n [The many bioinformatic tools used by the pipeline](https://github.com/pditommaso/paraMSA#dependencies-)\n are packaged using a Docker image, which is downloaded automatically on each computing node.\n- The pipeline results are uploaded automatically in the S3 bucket specified\n by the `--output s3://cbcrg-eu/para-msa-results` command line option.\n- When the computation is completed, the cluster can be safely shutdown and the\n EC2 instances terminated with the `nextflow cloud shutdown` command.\n\n### Try it yourself\n\nWe are releasing the Nextflow integrated cloud support in the upcoming version `0.22.0`.\n\nNextflow integrated cloud support is available from version `0.22.0`. To use it just make sure to\nhave this or an higher version of Nextflow.\n\nBare in mind that Nextflow requires a Unix-like operating system and a Java runtime version 7+\n(Windows 10 users which have installed the [Ubuntu subsystem](https://blogs.windows.com/buildingapps/2016/03/30/run-bash-on-ubuntu-on-windows/)\nshould be able to run it, at their risk..).\n\nOnce you have installed it, you can follow the steps in the above demo. 
For your convenience\nwe made publicly available the EC2 image `ami-43f49030` `ami-4b7daa32`\\* (EU Ireland region) used to record this\nscreencast.\n\nAlso make sure you have the following the following variables defined in your environment:\n\n AWS_ACCESS_KEY_ID=\"\"\n AWS_SECRET_ACCESS_KEY=\"\"\n AWS_DEFAULT_REGION=\"\"\n\nReferes to the documentation for configuration details.\n\n\\* Update: the AMI has been updated with Java 8 on Sept 2017.\n\n### Conclusion\n\nNextflow provides state of the art support for cloud and containers technologies making\nit possible to create computing clusters in the cloud and deploy computational workflows\nin a no-brainer way, with just two commands on your terminal.\n\nIn an upcoming post I will describe the autoscaling capabilities implemented by the\nNextflow scheduler that allows, along with the use of spot/preemptible instances,\na cost effective solution for the execution of your pipeline in the cloud.\n\n#### Credits\n\nThanks to [Evan Floden](https://github.com/skptic) for reviewing this post and for writing\nthe [paraMSA](https://github.com/skptic/paraMSA/) pipeline.\n", - "images": [] + "content": "\n

\nLearn how to deploy and run a computational pipeline in the Amazon AWS cloud with ease\nthanks to Nextflow and Docker containers\n

\n\nNextflow is a framework that simplifies the writing of parallel and distributed computational\npipelines in a portable and reproducible manner across different computing platforms, from\na laptop to a cluster of computers.\n\nIndeed, the original idea, when this project started three years ago, was to\nimplement a tool that would allow researchers in\n[our lab](http://www.crg.eu/es/programmes-groups/comparative-bioinformatics) to smoothly migrate\ntheir data analysis applications in the cloud when needed - without having\nto change or adapt their code.\n\nHowever to date Nextflow has been used mostly to deploy computational workflows within on-premise\ncomputing clusters or HPC data-centers, because these infrastructures are easier to use\nand provide, on average, cheaper cost and better performance when compared to a cloud environment.\n\nA major obstacle to efficient deployment of scientific workflows in the cloud is the lack\nof a performant POSIX compatible shared file system. These kinds of applications\nare usually made-up by putting together a collection of tools, scripts and\nsystem commands that need a reliable file system to share with each other the input and\noutput files as they are produced, above all in a distributed cluster of computers.\n\nThe recent availability of the [Amazon Elastic File System](https://aws.amazon.com/efs/)\n(EFS), a fully featured NFS based file system hosted on the AWS infrastructure represents\na major step in this context, unlocking the deployment of scientific computing\nin the cloud and taking it to the next level.\n\n### Nextflow support for the cloud\n\nNextflow could already be deployed in the cloud, either using tools such as\n[ElastiCluster](https://github.com/gc3-uzh-ch/elasticluster) or\n[CfnCluster](https://aws.amazon.com/hpc/cfncluster/), or by using custom deployment\nscripts. However the procedure was still cumbersome and, above all, it was not optimised\nto fully take advantage of cloud elasticity i.e. the ability to (re)shape the computing\ncluster dynamically as the computing needs change over time.\n\nFor these reasons, we decided it was time to provide Nextflow with a first-class support\nfor the cloud, integrating the Amazon EFS and implementing an optimised native cloud\nscheduler, based on [Apache Ignite](https://ignite.apache.org/), with a full support for cluster\nauto-scaling and spot/preemptible instances.\n\nIn practice this means that Nextflow can now spin-up and configure a fully featured computing\ncluster in the cloud with a single command, after that you need only to login to the master\nnode and launch the pipeline execution as you would do in your on-premise cluster.\n\n### Demo !\n\nSince a demo is worth a thousands words, I've record a short screencast showing how\nNextflow can setup a cluster in the cloud and mount the Amazon EFS shared file system.\n\n\n\n

\nNote: the EC2 instances startup delay has been cut from this screencast. It required around\n5 minutes to launch them and set up the cluster.\n
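As a rough sketch, the flow shown in the screencast boils down to a handful of terminal commands. This is illustrative only: the cluster name, node count and master node address below are placeholders, and the exact `nextflow cloud` arguments may vary between Nextflow versions.

    # illustrative sketch only -- cluster name, size and master address are placeholders
    nextflow cloud create my-cluster -c 10     # launch and configure the EC2 instances
    ssh <master-node-address>                  # log in to the master node
    nextflow run pditommaso/paraMSA -with-docker --output s3://cbcrg-eu/para-msa-results
    nextflow cloud shutdown my-cluster         # terminate the instances when done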

\n\nLet's recap the steps showed in the demo:\n\n- The user provides the cloud parameters (such as the VM image ID and the instance type)\n in the `nextflow.config` file.\n\n- To configure the EFS file system you need to provide your EFS storage ID and the mount path\n by using the `sharedStorageId` and `sharedStorageMount` properties.\n\n- To use [EC2 Spot](https://aws.amazon.com/ec2/spot/) instances, just specify the price\n you want to bid by using the `spotPrice` property.\n\n- The AWS access and secret keys are provided by using the usual environment variables.\n\n- The `nextflow cloud create` launches the requested number of instances, configures the user and\n access key, mounts the EFS storage and setups the Nextflow cluster automatically.\n Any Linux AMI can be used, it is only required that the [cloud-init](https://cloudinit.readthedocs.io/en/latest/)\n package, a Java 7+ runtime and the Docker engine are present.\n\n- When the cluster is ready, you can SSH in the master node and launch the pipeline execution\n as usual with the `nextflow run ` command.\n\n- For the sake of this demo we are using [paraMSA](https://github.com/pditommaso/paraMSA),\n a pipeline for generating multiple sequence alignments and bootstrap replicates developed\n in our lab.\n\n- Nextflow automatically pulls the pipeline code from its GitHub repository when the\n execution is launched. This repository includes also a dataset which is used by default.\n [The many bioinformatic tools used by the pipeline](https://github.com/pditommaso/paraMSA#dependencies-)\n are packaged using a Docker image, which is downloaded automatically on each computing node.\n\n- The pipeline results are uploaded automatically in the S3 bucket specified\n by the `--output s3://cbcrg-eu/para-msa-results` command line option.\n\n- When the computation is completed, the cluster can be safely shutdown and the\n EC2 instances terminated with the `nextflow cloud shutdown` command.\n\n### Try it yourself\n\nWe are releasing the Nextflow integrated cloud support in the upcoming version `0.22.0`.\n\nNextflow integrated cloud support is available from version `0.22.0`. To use it just make sure to\nhave this or an higher version of Nextflow.\n\nBare in mind that Nextflow requires a Unix-like operating system and a Java runtime version 7+\n(Windows 10 users which have installed the [Ubuntu subsystem](https://blogs.windows.com/buildingapps/2016/03/30/run-bash-on-ubuntu-on-windows/)\nshould be able to run it, at their risk..).\n\nOnce you have installed it, you can follow the steps in the above demo. 
For your convenience\nwe made publicly available the EC2 image `ami-43f49030` `ami-4b7daa32`\\* (EU Ireland region) used to record this\nscreencast.\n\nAlso make sure you have the following the following variables defined in your environment:\n\n AWS_ACCESS_KEY_ID=\"\"\n AWS_SECRET_ACCESS_KEY=\"\"\n AWS_DEFAULT_REGION=\"\"\n\nReferes to the documentation for configuration details.\n\n\\* Update: the AMI has been updated with Java 8 on Sept 2017.\n\n### Conclusion\n\nNextflow provides state of the art support for cloud and containers technologies making\nit possible to create computing clusters in the cloud and deploy computational workflows\nin a no-brainer way, with just two commands on your terminal.\n\nIn an upcoming post I will describe the autoscaling capabilities implemented by the\nNextflow scheduler that allows, along with the use of spot/preemptible instances,\na cost effective solution for the execution of your pipeline in the cloud.\n\n#### Credits\n\nThanks to [Evan Floden](https://github.com/skptic) for reviewing this post and for writing\nthe [paraMSA](https://github.com/skptic/paraMSA/) pipeline.\n", + "images": [], + "author": "Paolo Di Tommaso", + "tags": "aws,cloud,pipelines,nextflow,genomic,docker" }, { "slug": "2016/developing-bioinformatics-pipeline-across-multiple-environments", "title": "Developing a bioinformatics pipeline across multiple environments", "date": "2016-02-04T00:00:00.000Z", "content": "\nAs a new bioinformatics student with little formal computer science training, there are\nfew things that scare me more than PhD committee meetings and having to run my code in a\ncompletely different operating environment.\n\nRecently my work landed me in the middle of the phylogenetic tree jungle and the computational\nrequirements of my project far outgrew the resources that were available on our institute’s\n[Univa Grid Engine](https://en.wikipedia.org/wiki/Univa_Grid_Engine) based cluster. Luckily for me,\nan opportunity arose to participate in a joint program at the MareNostrum HPC at the\n[Barcelona Supercomputing Centre](http://www.bsc.es) (BSC).\n\nAs one of the top 100 supercomputers in the world, the [MareNostrum III](https://www.bsc.es/discover-bsc/the-centre/marenostrum)\ndwarfs our cluster and consists of nearly 50'000 processors. However it soon became apparent\nthat with great power comes great responsibility and in the case of the BSC, great restrictions.\nThese include no internet access, restrictive wall times for jobs, longer queues,\nfewer pre-installed binaries and an older version of bash. Faced with the possibility of\nhaving to rewrite my 16 bodged scripts for another queuing system I turned to Nextflow.\n\nStraight off the bat I was able to reduce all my previous scripts to a single Nextflow script.\nAdmittedly, the original code was not great, but the data processing model made me feel confident\nin what I was doing and I was able to reduce the volume of code to 25% of its initial amount\nwhilst making huge improvements in the readability. The real benefits however came from the portability.\n\nI was able to write the project on my laptop (Macbook Air), continuously test it on my local\ndesktop machine (Linux) and then perform more realistic heavy lifting runs on the cluster,\nall managed from a single GitHub repository. The BSC uses the [Load Sharing Facility](https://en.wikipedia.org/wiki/Platform_LSF)\n(LSF) platform with longer queue times, but a large number of CPUs. 
My project on the other\nhand had datasets that required over 100'000 tasks, but the task processes themselves run\nfor a matter of seconds or minutes. We were able to marry these two competing interests by\ndeploying Nextflow in a [distributed execution manner that resembles that of an MPI application](/blog/2015/mpi-like-execution-with-nextflow.html).\n\nIn this configuration, the queuing system allocates the resources requested by Nextflow and,\nusing the embedded [Apache Ignite](https://ignite.apache.org/) clustering engine, Nextflow handles\nthe submission of processes to the individual nodes.\n\nHere are some examples of how to run the same Nextflow project over multiple platforms.\n\n#### Local\n\nIf I wish to launch a job locally I can run it with the command:\n\n nextflow run myproject.nf\n\n#### Univa Grid Engine (UGE)\n\nFor the UGE I simply needed to specify the following in the `nextflow.config` file:\n\n process {\n executor='uge'\n queue='my_queue'\n }\n\nAnd then launch the pipeline execution as we did before:\n\n nextflow run myproject.nf\n\n#### Load Sharing Facility (LSF)\n\nFor running the same pipeline in the MareNostrum HPC environment, taking advantage of the MPI\nstandard to deploy my workload, I first created a wrapper script (for example `bsc-wrapper.sh`)\ndeclaring the resources that I want to reserve for the pipeline execution:\n\n #!/bin/bash\n #BSUB -oo logs/output_%J.out\n #BSUB -eo logs/output_%J.err\n #BSUB -J myProject\n #BSUB -q bsc_ls\n #BSUB -W 2:00\n #BSUB -x\n #BSUB -n 512\n #BSUB -R \"span[ptile=16]\"\n export NXF_CLUSTER_SEED=$(shuf -i 0-16777216 -n 1)\n mpirun --pernode bin/nextflow run concMSA.nf -with-mpi\n\nAnd then execute it using `bsub` as shown below:\n\n bsub < bsc-wrapper.sh\n\nBy running Nextflow in this way and given the wrapper above, a single `bsub` job will run\non 512 cores in 32 computing nodes (512/16 = 32) with a maximum wall time of 2 hours.\nThousands of Nextflow processes can be spawned during this and the execution can be monitored\nin the standard manner from a single set of Nextflow output and error files. If any errors occur\nthe execution can of course be resumed with the [`-resume` command line option](/docs/latest/getstarted.html?highlight=resume#modify-and-resume).\n\n### Conclusion\n\nNextflow provides a simplified way to develop across multiple platforms and removes\nmuch of the overhead associated with running niche, user developed pipelines in an HPC\nenvironment.\n", - "images": [] + "images": [], + "author": "Evan Floden", + "tags": "bioinformatics,reproducibility,pipelines,nextflow,genomic,hpc" }, { "slug": "2016/docker-for-dunces-nextflow-for-nunces", "title": "Docker for dunces & Nextflow for nunces", "date": "2016-06-10T00:00:00.000Z", "content": "\n_Below is a step-by-step guide for creating [Docker](http://www.docker.io) images for use with [Nextflow](http://www.nextflow.io) pipelines. This post was inspired by recent experiences and written with the hope that it may encourage others to join in the virtualization revolution._\n\nModern science is built on collaboration. Recently I became involved with one such venture between several groups across Europe. The aim was to annotate long non-coding RNA (lncRNA) in farm animals and I agreed to help with the annotation based on RNA-Seq data. 
The basic procedure relies on mapping short read data from many different tissues to a genome, generating transcripts and then determining if they are likely to be lncRNA or protein coding genes.\n\nDuring several successful 'hackathon' meetings the best approach was decided and implemented in a joint effort. I undertook the task of wrapping the procedure up into a Nextflow pipeline with a view to replicating the results across our different institutions and to allow the easy execution of the pipeline by researchers anywhere.\n\nCreating the Nextflow pipeline ([here](http://www.github.com/cbcrg/lncrna-annotation-nf)) in itself was not a difficult task. My collaborators had documented their work well and were on hand if anything was not clear. However installing and keeping aligned all the pipeline dependencies across the different data centers was still a challenging task.\n\nThe pipeline is typical of many in bioinformatics, consisting of binary executions, BASH scripting, R, Perl, BioPerl and some custom Perl modules. We found the BioPerl modules in particular were very sensitive to the various versions in the _long_ dependency tree. The solution was to turn to [Docker](https://www.docker.com/) containers.\n\nI have taken this opportunity to document the process of developing the Docker side of a Nextflow + Docker pipeline in a step-by-step manner.\n\n### Docker Installation\n\nBy far the most challenging issue is the installation of Docker. For local installations, the [process is relatively straightforward](https://docs.docker.com/engine/installation). However difficulties arise as computing moves to a cluster. Owing to security concerns, many HPC administrators have been reluctant to install Docker system-wide. This is changing and Docker developers have been responding to many of these concerns with [updates addressing these issues](https://blog.docker.com/2016/02/docker-engine-1-10-security/).\n\nThat being the case, local installations are usually perfectly fine for development. One of the golden rules in Nextflow development is to have a small test dataset that can run the full pipeline in minutes with few computational resources, i.e. it can run on a laptop.\n\nIf you have Docker and Nextflow installed and you wish to view the working pipeline, you can perform the following commands to obtain everything you need and run the full lncRNA annotation pipeline on a test dataset.\n\n docker pull cbcrg/lncrna_annotation\n nextflow run cbcrg/lncrna-annotation-nf -profile test\n\n[If the following does not work, there could be a problem with your Docker installation.]\n\nThe first command will download the required Docker image to your computer, while the second will launch Nextflow which automatically downloads the pipeline repository and\nruns it using the test data included with it.\n\n### The Dockerfile\n\nThe `Dockerfile` contains all the instructions required by Docker to build the Docker image. It provides a transparent and consistent way to specify the base operating system and installation of all software, libraries and modules.\n\nWe begin by creating a file `Dockerfile` in the Nextflow project directory. The Dockerfile begins with:\n\n # Set the base image to debian jessie\n FROM debian:jessie\n\n # File Author / Maintainer\n MAINTAINER Evan Floden \n\nThis sets the base distribution for our Docker image to be Debian v8.4, a lightweight Linux distribution that is ideally suited for the task. 
We must also specify the maintainer of the Docker image.\n\nNext we update the repository sources and install some essential tools such as `wget` and `perl`.\n\n RUN apt-get update && apt-get install --yes --no-install-recommends \\\n wget \\\n locales \\\n vim-tiny \\\n git \\\n cmake \\\n build-essential \\\n gcc-multilib \\\n perl \\\n python ...\n\nNotice that we use the command `RUN` before each line. The `RUN` instruction executes commands as if they are performed from the Linux shell.\n\nIt is also good practice to group as many commands as possible in the same `RUN` statement. This reduces the size of the final Docker image. See [here](https://blog.replicated.com/2016/02/05/refactoring-a-dockerfile-for-image-size/) for details and [here](https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/) for more best practices.\n\nNext we can specify the installation of the required Perl modules using [cpan minus](http://search.cpan.org/~miyagawa/Menlo-1.9003/script/cpanm-menlo):\n\n # Install perl modules\n RUN cpanm --force CPAN::Meta \\\n YAML \\\n Digest::SHA \\\n Module::Build \\\n Data::Stag \\\n Config::Simple \\\n Statistics::Lite ...\n\nWe can give instructions to download and install software from GitHub using:\n\n # Install Star Mapper\n RUN wget -qO- https://github.com/alexdobin/STAR/archive/2.5.2a.tar.gz | tar -xz \\\n && cd STAR-2.5.2a \\\n && make STAR\n\nWe can add custom Perl modules and specify environment variables such as `PERL5LIB` as below:\n\n # Install FEELnc\n RUN wget -q https://github.com/tderrien/FEELnc/archive/a6146996e06f8a206a0ae6fd59f8ca635c7d9467.zip \\\n && unzip a6146996e06f8a206a0ae6fd59f8ca635c7d9467.zip \\\n && mv FEELnc-a6146996e06f8a206a0ae6fd59f8ca635c7d9467 /FEELnc \\\n && rm a6146996e06f8a206a0ae6fd59f8ca635c7d9467.zip\n\n ENV FEELNCPATH /FEELnc\n ENV PERL5LIB $PERL5LIB:${FEELNCPATH}/lib/\n\nR and R libraries can be installed as follows:\n\n # Install R\n RUN echo \"deb http://cran.rstudio.com/bin/linux/debian jessie-cran3/\" >> /etc/apt/sources.list &&\\\n apt-key adv --keyserver keys.gnupg.net --recv-key 381BA480 &&\\\n apt-get update --fix-missing && \\\n apt-get -y install r-base\n\n # Install R libraries\n RUN R -e 'install.packages(\"ROCR\", repos=\"http://cloud.r-project.org/\"); install.packages(\"randomForest\",repos=\"http://cloud.r-project.org/\")'\n\nFor the complete working Dockerfile of this project see [here](https://github.com/cbcrg/lncRNA-Annotation-nf/blob/master/Dockerfile).\n\n### Building the Docker Image\n\nOnce we start working on the Dockerfile, we can build it anytime using:\n\n docker build -t skptic/lncRNA_annotation .\n\nThis builds the image from the Dockerfile and assigns a tag (i.e. a name) for the image. If there are no errors, the Docker image is now in your local Docker repository ready for use.\n\n### Testing the Docker Image\n\nWe find it very helpful to test our images as we develop the Dockerfile. Once built, it is possible to launch the Docker image and test if the desired software was correctly installed. 
For example, we can test if FEELnc and its dependencies were successfully installed by running the following:\n\n docker run -ti lncrna_annotation\n\n cd FEELnc/test\n\n FEELnc_filter.pl -i transcript_chr38.gtf -a annotation_chr38.gtf \\\n > -b transcript_biotype=protein_coding > candidate_lncRNA.gtf\n\n exit # remember to exit the Docker image\n\n### Tagging the Docker Image\n\nOnce you are confident your image is built correctly, you can tag it, allowing you to push it to [Dockerhub.io](https://hub.docker.com/). Dockerhub is an online repository for Docker images which allows anyone to pull public images and run them.\n\nYou can view the images in your local repository with the `docker images` command and tag using `docker tag` with the image ID and the name.\n\n docker images\n\n REPOSITORY TAG IMAGE ID CREATED SIZE\n lncrna_annotation latest d8ec49cbe3ed 2 minutes ago 821.5 MB\n\n docker tag d8ec49cbe3ed cbcrg/lncrna_annotation:latest\n\nNow when we check our local images we can see the updated tag.\n\n docker images\n\n REPOSITORY TAG IMAGE ID CREATED SIZE\n cbcrg/lncrna_annotation latest d8ec49cbe3ed 2 minutes ago 821.5 MB\n\n### Pushing the Docker Image to Dockerhub\n\nIf you have not done so previously, sign up for a Dockerhub account [here](https://hub.docker.com/). From the command line, log in to Dockerhub and push your image.\n\n docker login --username=cbcrg\n docker push cbcrg/lncrna_annotation\n\nYou can test if your image has been correctly pushed and is publicly available by removing your local version using the IMAGE ID of the image and pulling the remote:\n\n docker rmi -f d8ec49cbe3ed\n\n # Ensure the local version is not listed.\n docker images\n\n docker pull cbcrg/lncrna_annotation\n\nWe are now almost ready to run our pipeline. The last step is to set up the Nextflow config.\n\n### Nextflow Configuration\n\nWithin the `nextflow.config` file in the main project directory we can add the following line which links the Docker image to the Nextflow execution. The images can be:\n\n- General (same Docker image for all processes):\n\n process {\n container = 'cbcrg/lncrna_annotation'\n }\n\n- Specific to a profile (specified by `-profile crg` for example):\n\n profile {\n crg {\n container = 'cbcrg/lncrna_annotation'\n }\n }\n\n- Specific to a given process within a pipeline:\n\n $processName.container = 'cbcrg/lncrna_annotation'\n\nIn most cases it is easiest to use the same Docker image for all processes. One further thing to consider is the inclusion of the sha256 hash of the image in the container reference. I have [previously written about this](https://www.nextflow.io/blog/2016/best-practice-for-reproducibility.html), but briefly, including a hash ensures that not a single byte of the operating system or software is different.\n\n process {\n container = 'cbcrg/lncrna_annotation@sha256:9dfe233b...'\n }\n\nAll that is left now is to run the pipeline.\n\n nextflow run lncRNA-Annotation-nf -profile test\n\nWhilst I have explained this step-by-step process in a linear, sequential manner, in reality the development process is often more circular, with changes in the Docker images reflecting changes in the pipeline.\n\n### CircleCI and Nextflow\n\nNow that you have a pipeline that successfully runs on a test dataset with Docker, a very useful step is to add a continuous development component to the pipeline. 
With this, whenever you push a modification of the pipeline to the GitHub repo, the test data set is run on the [CircleCI](http://www.circleci.com) servers (using Docker).\n\nTo include CircleCI in the Nextflow pipeline, create a file named `circle.yml` in the project directory. We add the following instructions to the file:\n\n machine:\n java:\n version: oraclejdk8\n services:\n - docker\n\n dependencies:\n override:\n\n test:\n override:\n - docker pull cbcrg/lncrna_annotation\n - curl -fsSL get.nextflow.io | bash\n - ./nextflow run . -profile test\n\nNext you can sign up to CircleCI, linking your GitHub account.\n\nWithin the GitHub README.md you can add a badge with the following:\n\n ![CircleCI status](https://circleci.com/gh/cbcrg/lncRNA-Annotation-nf.png?style=shield)\n\n### Tips and Tricks\n\n**File permissions**: When a process is executed by a Docker container, the UNIX user running the process is not you. Therefore any files that are used as input should have the appropriate file permissions. For example, I had to change the permissions of all the input data in the test data set with:\n\n find -type f -exec chmod 644 {} \\;\n find -type d -exec chmod 755 {} \\;\n\n### Summary\n\nThis was my first time building a Docker image and after a bit of trial-and-error the process was surprisingly straightforward. There is a wealth of information available for Docker and the almost seamless integration with Nextflow is fantastic. Our collaboration team is now looking forward to applying the pipeline to different datasets and publishing the work, knowing our results will be completely reproducible across any platform.\n", - "images": [] + "images": [], + "author": "Evan Floden", + "tags": "bioinformatics,reproducibility,pipelines,nextflow,genomic,docker" }, { "slug": "2016/enabling-elastic-computing-nextflow", "title": "Enabling elastic computing with Nextflow", "date": "2016-10-19T00:00:00.000Z", "content": "\n

\nLearn how to deploy an elastic computing cluster in the AWS cloud with Nextflow \n

\n\nIn the [previous post](/blog/2016/deploy-in-the-cloud-at-snap-of-a-finger.html) I introduced\nthe new cloud native support for AWS provided by Nextflow.\n\nIt allows the creation of a computing cluster in the cloud in a no-brainer way, enabling\nthe deployment of complex computational pipelines in a few commands.\n\nThis solution is characterised by using a lean application stack which does not\nrequire any third party component installed in the EC2 instances other than a Java VM and the\nDocker engine (the latter is only required in order to deploy pipeline binary dependencies).\n\n![Nextflow cloud deployment](/img/cloud-deployment.png)\n\nEach EC2 instance runs a script, at bootstrap time, that mounts the [EFS](https://aws.amazon.com/efs/)\nstorage and downloads and launches the Nextflow cluster daemon. This daemon is self-configuring:\nit automatically discovers the other running instances and joins them, forming the computing cluster.\n\nThe simplicity of this stack makes it possible to set up the cluster in the cloud in just a few minutes,\na little more time than is required to spin up the EC2 VMs. This time does not depend on\nthe number of instances launched, as they configure themselves independently.\n\nThis also makes it possible to add or remove instances as needed, realising the [long promised\nelastic scalability](http://www.nextplatform.com/2016/09/21/three-great-lies-cloud-computing/)\nof cloud computing.\n\nThis ability is even more important for bioinformatic workflows, which frequently crunch\nnon-homogeneous datasets and are composed of tasks with very different computing requirements\n(e.g. a few very long-running tasks and many short-lived tasks in the same workload).\n\n### Going elastic\n\nThe Nextflow support for the cloud features an elastic cluster which is capable of resizing itself\nto adapt to the actual computing needs at runtime, thus spinning up new EC2 instances when jobs\nwait for too long in the execution queue, or terminating instances that are not used for\na certain amount of time.\n\nIn order to enable the cluster autoscaling, you will need to specify the autoscale\nproperties in the `nextflow.config` file. For example:\n\n```\ncloud {\n imageId = 'ami-4b7daa32'\n instanceType = 'm4.xlarge'\n\n autoscale {\n enabled = true\n minInstances = 5\n maxInstances = 10\n }\n}\n```\n\nThe above configuration enables the autoscaling features so that the cluster will include\nat least 5 nodes. If at any point one or more tasks spend more than 5 minutes without being\nprocessed, the number of instances needed to fulfil the pending tasks, up to the limit specified\nby the `maxInstances` attribute, is launched. On the other hand, if these instances are\nidle, they are terminated before reaching the 60-minute instance usage boundary.\n\nThe autoscaler launches instances by using the same AMI ID and type specified in the `cloud`\nconfiguration. However it is possible to define different attributes as shown below:\n\n```\ncloud {\n imageId = 'ami-4b7daa32'\n instanceType = 'm4.large'\n\n autoscale {\n enabled = true\n maxInstances = 10\n instanceType = 'm4.2xlarge'\n spotPrice = 0.05\n }\n}\n```\n\nThe cluster is first created by using instance(s) of type `m4.large`. 
Then, when new\ncomputing nodes are required, the autoscaler launches instances of type `m4.2xlarge`.\nAlso, since the `spotPrice` attribute is specified, [EC2 spot](https://aws.amazon.com/ec2/spot/)\ninstances are launched, instead of regular on-demand ones, bidding for the price specified.\n\n### Conclusion\n\nNextflow implements a simple yet effective cloud scheduler that is able to scale dynamically\nto meet the computing needs of the deployed workloads, taking advantage of the _elastic_ nature\nof the cloud platform.\n\nThis ability, along with the support for spot/preemptible instances, provides a cost-effective solution\nfor the execution of your pipeline in the cloud.\n", - "images": [] + "images": [], + "author": "Paolo Di Tommaso", + "tags": "aws,cloud,pipelines,nextflow,genomic,docker" }, { "slug": "2016/error-recovery-and-automatic-resources-management", "title": "Error recovery and automatic resource management with Nextflow", "date": "2016-02-11T00:00:00.000Z", "content": "\nRecently a new feature has been added to Nextflow that allows failing jobs to be rescheduled,\nautomatically increasing the amount of computational resources requested.\n\n## The problem\n\nNextflow provides a mechanism that allows tasks to be automatically re-executed when\na command terminates with an error exit status. This is useful to handle errors caused by\ntemporary or even permanent failures (i.e. network hiccups, broken disks, etc.) that\nmay happen in a cloud based environment.\n\nHowever in an HPC cluster these events are very rare. In this scenario\nerror conditions are more likely to be caused by a peak in the computing resources used\nby a job, exceeding the resources originally requested. This leads to the batch scheduler\nkilling the job, which in turn stops the overall pipeline execution.\n\nIn this context automatically re-executing the failed task is useless because it\nwould simply replicate the same error condition. A common solution consists of increasing\nthe resource request to match the needs of the most demanding job, even though this will result\nin a suboptimal allocation for most of the jobs that are less resource hungry.\n\nMoreover it is also difficult to predict such an upper limit. In most cases the only way to\ndetermine it is by using a painful fail-and-retry approach.\n\nTake into consideration, for example, the following Nextflow process:\n\n process align {\n executor 'sge'\n memory 1.GB\n errorStrategy 'retry'\n\n input:\n file 'seq.fa' from sequences\n\n script:\n '''\n t_coffee -in seq.fa\n '''\n }\n\nThe above definition will execute as many jobs as there are FASTA files emitted\nby the `sequences` channel. Since the `retry` _error strategy_ is specified, if the\ntask returns a non-zero error status, Nextflow will reschedule the job execution requesting\nthe same amount of memory and disk storage. If the error is generated because `t_coffee`\nneeds more than one GB of memory for a specific alignment, the task will continue to fail,\nstopping the pipeline execution as a consequence.\n\n## Increase job resources automatically\n\nA better solution can be implemented with Nextflow which allows resources to be defined in\na dynamic manner. By doing this it is possible to increase the memory request when\nrescheduling a failing task execution. 
For example:\n\n process align {\n executor 'sge'\n memory { 1.GB * task.attempt }\n errorStrategy 'retry'\n\n input:\n file 'seq.fa' from sequences\n\n script:\n '''\n t_coffee -in seq.fa\n '''\n }\n\nIn the above example the memory requirement is defined by using a dynamic rule.\nThe `task.attempt` attribute represents the current task attempt (`1` the first time the task\nis executed, `2` the second, and so on).\n\nThe task will then request one GB of memory. In case of an error it will be rescheduled\nrequesting 2 GB, and so on, until it is executed successfully or the limit of times a task\ncan be retried is reached, forcing the termination of the pipeline.\n\nIt is also possible to define the `errorStrategy` directive in a dynamic manner. This\nis useful to re-execute failed jobs only if a certain condition is met.\n\nFor example the Univa Grid Engine batch scheduler returns the exit status `140` when a job\nis terminated because it's using more resources than those requested.\n\nBy checking this exit status we can reschedule only the jobs that fail by exceeding the\nresource allocation. This can be done with the following directive declaration:\n\n errorStrategy { task.exitStatus == 140 ? 'retry' : 'terminate' }\n\nIn this way a failed task is rescheduled only when it returns the `140` exit status.\nIn all other cases the pipeline execution is terminated.\n\n## Conclusion\n\nNextflow provides a very flexible mechanism for defining the job resource request and\nhandling error events. It makes it possible to automatically reschedule failing tasks under\ncertain conditions and to define job resource requests in a dynamic manner so that they\ncan be adapted to the actual job's needs and to optimize the overall resource utilisation.\n", - "images": [] + "images": [], + "author": "Paolo Di Tommaso", + "tags": "bioinformatics,pipelines,nextflow,hpc" }, { "slug": "2016/more-fun-containers-hpc", "title": "More fun with containers in HPC", "date": "2016-12-20T00:00:00.000Z", "content": "\nNextflow was one of the [first workflow frameworks](https://www.nextflow.io/blog/2014/nextflow-meets-docker.html)\nto provide built-in support for Docker containers. A couple of years ago we also started\nto experiment with the deployment of containerised bioinformatic pipelines at CRG,\nusing Docker technology (see [here](https://www.nextflow.io/blog/2014/using-docker-in-hpc-cluster.html) and [here](https://www.nextplatform.com/2016/01/28/crg-goes-with-the-genomics-flow/)).\n\nWe found that isolating and packaging the complete computational workflow environment\nwith the use of Docker images radically simplifies the burden of maintaining complex\ndependency graphs of real workload data analysis pipelines.\n\nEven more importantly, the use of containers enables replicable results with minimal effort\nfor the system configuration. 
The entire computational environment can be archived in a\nself-contained executable format, allowing the replication of the associated analysis at\nany point in time.\n\nThis ability is the main reason that drove the rapid adoption of Docker in the bioinformatic\ncommunity and its support in many projects, such as [Galaxy](https://galaxyproject.org),\n[CWL](http://commonwl.org), [Bioboxes](http://bioboxes.org), [Dockstore](https://dockstore.org) and many others.\n\nHowever, while the popularity of Docker spread among developers, its adoption in\nresearch computing infrastructures remains very low and it's very unlikely\nthat this trend will change in the future.\n\nThe reason for this resides in the Docker architecture, which requires a daemon running\nwith root permissions on each node of a computing cluster. Such a requirement raises many\nsecurity concerns, thus good practices would prevent its use in shared HPC clusters or\nsupercomputer environments.\n\n### Introducing Singularity\n\nAlternative implementations, such as [Singularity](http://singularity.lbl.gov), have\nfortunately been promoted by those interested in container technology.\n\nSingularity is a container engine developed at the Berkeley Lab and designed for the\nneeds of scientific workloads. The main differences with Docker are: containers are file\nbased, no root escalation is allowed nor root permission is needed to run a container\n(although a privileged user is needed to create a container image), and there is no\nseparate running daemon.\n\nThese, along with other features, such as support for autofs mounts, make Singularity a\ncontainer engine better suited to the requirements of HPC clusters and supercomputers.\n\nMoreover, although Singularity uses a container image format different from that of Docker,\nthey provide a conversion tool that allows Docker images to be converted to the\nSingularity format.\n\n### Singularity in the wild\n\nWe integrated Singularity support in the Nextflow framework and tested it in the CRG\ncomputing cluster and the BSC [MareNostrum](https://www.bsc.es/discover-bsc/the-centre/marenostrum) supercomputer.\n\nThe absence of a separate running daemon or image gateway made the installation\nstraightforward when compared to Docker or other solutions.\n\nTo evaluate the performance of Singularity we carried out the [same benchmarks](https://peerj.com/articles/1273/)\nwe performed for Docker and compared the results of the two engines.\n\nThe benchmarks consisted of the execution of three Nextflow based genomic pipelines:\n\n1. [Rna-toy](https://github.com/nextflow-io/rnatoy/tree/peerj5515): a simple pipeline for RNA-Seq data analysis.\n2. [Nmdp-Flow](https://github.com/nextflow-io/nmdp-flow/tree/peerj5515/): an assembly-based variant calling pipeline.\n3. 
[Piper-NF](https://github.com/cbcrg/piper-nf/tree/peerj5515): a pipeline for the detection and mapping of long non-coding RNAs.\n\nIn order to repeat the analyses, we converted the container images we used to perform\nthe Docker benchmarks to Singularity image files by using the [docker2singularity](https://github.com/singularityware/docker2singularity) tool\n_(this is not required anymore, see the update below)_.\n\nThe only change needed to run these pipelines with Singularity was to replace the Docker\nspecific settings with the following ones in the configuration file:\n\n singularity.enabled = true\n process.container = ''\n\nEach pipeline was executed 10 times, alternately by using Docker and Singularity as\ncontainer engine. The results are shown in the following table (time in minutes):\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
| Pipeline | Tasks | Mean task time (Singularity) | Mean task time (Docker) | Mean execution time (Singularity) | Mean execution time (Docker) | Execution time std dev (Singularity) | Execution time std dev (Docker) | Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RNA-Seq | 97 | 3.77 | 3.66 | 63.66 | 62.3 | 2.0 | 3.1 | 0.998 |
| Variant call | 48 | 22.1 | 22.4 | 1061.2 | 1074.4 | 43.1 | 38.5 | 1.012 |
| Piper-NF | 98 | 1.2 | 1.3 | 120.0 | 124.5 | 6.9 | 2.8 | 1.038 |
\n\nThe benchmark results show that there isn't any significative difference in the\nexecution times of containerised workflows between Docker and Singularity. In two\ncases Singularity was slightly faster and a third one it was almost identical although\na little slower than Docker.\n\n### Conclusion\n\nIn our evaluation Singularity proved to be an easy to install,\nstable and performant container engine.\n\nThe only minor drawback, we found when compared to Docker, was the need to define the\nhost path mount points statically when the Singularity images were created. In fact,\neven if Singularity supports user mount points to be defined dynamically when the\ncontainer is launched, this feature requires the overlay file system which was not\nsupported by the kernel available in our system.\n\nDocker surely will remain the _de facto_ standard engine and image format for containers\ndue to its popularity and [impressive growth](http://www.coscale.com/blog/docker-usage-statistics-increased-adoption-by-enterprises-and-for-production-use).\n\nHowever, in our opinion, Singularity is the tool of choice for the execution of\ncontainerised workloads in the context of HPC, thanks to its focus on system security\nand its simpler architectural design.\n\nThe transparent support provided by Nextflow for both Docker and Singularity technology\nguarantees the ability to deploy your workflows in a range of different platforms (cloud,\ncluster, supercomputer, etc). Nextflow transparently manages the deployment of the\ncontainerised workload according to the runtime available in the target system.\n\n#### Credits\n\nThanks to Gabriel Gonzalez (CRG), Luis Exposito (CRG) and Carlos Tripiana Montes (BSC)\nfor the support installing Singularity.\n\n**Update** Singularity, since version 2.3.x, is able to pull and run Docker images from the Docker Hub.\nThis greatly simplifies the interoperability with existing Docker containers. You only need\nto prefix the image name with the `docker://` pseudo-protocol to download it as a Singularity image,\nfor example:\n\n singularity pull --size 1200 docker://nextflow/rnatoy\n", - "images": [] + "images": [], + "author": "Paolo Di Tommaso", + "tags": "aws,pipelines,nextflow,genomic,docker,singularity" }, { "slug": "2017/caw-and-singularity", @@ -106,42 +134,54 @@ "content": "\nThis is a guest post authored by Maxime Garcia from the Science for Life Laboratory in Sweden. Max\ndescribes how they deploy complex cancer data analysis pipelines using Nextflow\nand Singularity. 
We are very happy to share their experience across the Nextflow community.\n\n### The CAW pipeline\n\n![Cancer Analysis Workflow](/img/CAW_logo.png)\n\n[Cancer Analysis Workflow](http://opensource.scilifelab.se/projects/sarek/) (CAW for short) is a Nextflow-based analysis pipeline developed for the analysis of tumour/normal pairs.\nIt is developed in collaboration with two infrastructures within [Science for Life Laboratory](https://www.scilifelab.se/): [National Genomics Infrastructure](https://ngisweden.scilifelab.se/) (NGI), in the Stockholm [Genomics Applications Development Facility](https://www.scilifelab.se/facilities/ngi-stockholm/) to be precise, and [National Bioinformatics Infrastructure Sweden](https://www.nbis.se/) (NBIS).\n\nCAW is based on [GATK Best Practices](https://software.broadinstitute.org/gatk/best-practices/) for the preprocessing of FastQ files, then uses various variant calling tools to look for somatic SNVs and small indels ([MuTect1](https://github.com/broadinstitute/mutect/), [MuTect2](https://github.com/broadgsa/gatk-protected/), [Strelka](https://github.com/Illumina/strelka/), [Freebayes](https://github.com/ekg/freebayes/)), ([GATK HaplotypeCaller](https://github.com/broadgsa/gatk-protected/)), for structural variants ([Manta](https://github.com/Illumina/manta/)) and for CNVs ([ASCAT](https://github.com/Crick-CancerGenomics/ascat/)).\nAnnotation tools ([snpEff](http://snpeff.sourceforge.net/), [VEP](https://www.ensembl.org/info/docs/tools/vep/index.html)) are also used, and finally [MultiQC](http://multiqc.info/) for handling reports.\n\nWe are currently working on a manuscript, but you're welcome to look at (or even contribute to) our [github repository](https://github.com/SciLifeLab/CAW/) or talk with us on our [gitter channel](https://gitter.im/SciLifeLab/CAW/).\n\n### Singularity and UPPMAX\n\n[Singularity](http://singularity.lbl.gov/) is a tool to package software dependencies into a contained environment, much like Docker. It's designed to run in HPC environments where Docker is often a problem due to its requirement for administrative privileges.\n\nWe're based in Sweden, and [Uppsala Multidisciplinary Center for Advanced Computational Science](https://uppmax.uu.se/) (UPPMAX) provides computational infrastructure for all Swedish researchers.\nSince we're analyzing sensitive data, we are using secure clusters (with two-factor authentication), set up by UPPMAX: [SNIC-SENS](https://www.uppmax.uu.se/projects-and-collaborations/snic-sens/).\n\nIn my case, since we're still developing the pipeline, I am mainly using the research cluster [Bianca](https://www.uppmax.uu.se/resources/systems/the-bianca-cluster/).\nSo I can only transfer files and data to one specific repository using SFTP.\n\nUPPMAX provides computing resources for Swedish researchers for all scientific domains, so getting software updates can occasionally take some time.\nTypically, [Environment Modules](http://modules.sourceforge.net/) are used which allow several versions of different tools - this is good for reproducibility and is quite easy to use. 
However, the approach is not portable across different clusters outside of UPPMAX.\n\n### Why use containers?\n\nThe idea of using containers, for improved portability and reproducibility, and more up to date tools, came naturally to us, as it is easily managed within Nextflow.\nWe cannot use [Docker](https://www.docker.com/) on our secure cluster, so we wanted to run CAW with [Singularity](http://singularity.lbl.gov/) images instead.\n\n### How was the switch made?\n\nWe were already using Docker containers for our continuous integration testing with Travis, and since we use many tools, I took the approach of making (almost) a container for each process.\nBecause this process is quite slow, repetitive and I'm lazy like to automate everything, I made a simple NF [script](https://github.com/SciLifeLab/CAW/blob/master/buildContainers.nf) to build and push all docker containers.\nBasically it's just `build` and `pull` for all containers, with some configuration possibilities.\n\n```\ndocker build -t ${repository}/${container}:${tag} ${baseDir}/containers/${container}/.\n\ndocker push ${repository}/${container}:${tag}\n```\n\nSince Singularity can directly pull images from DockerHub, I made the build script to pull all containers from DockerHub to have local Singularity image files.\n\n```\nsingularity pull --name ${container}-${tag}.img docker://${repository}/${container}:${tag}\n```\n\nAfter this, it's just a matter of moving all containers to the secure cluster we're using, and using the right configuration file in the profile.\nI'll spare you the details of the SFTP transfer.\nThis is what the configuration file for such Singularity images looks like: [`singularity-path.config`](https://github.com/SciLifeLab/CAW/blob/master/configuration/singularity-path.config)\n\n```\n/*\nvim: syntax=groovy\n-*- mode: groovy;-*-\n * -------------------------------------------------\n * Nextflow config file for CAW project\n * -------------------------------------------------\n * Paths to Singularity images for every process\n * No image will be pulled automatically\n * Need to transfer and set up images before\n * -------------------------------------------------\n */\n\nsingularity {\n enabled = true\n runOptions = \"--bind /scratch\"\n}\n\nparams {\n containerPath='containers'\n tag='1.2.3'\n}\n\nprocess {\n $ConcatVCF.container = \"${params.containerPath}/caw-${params.tag}.img\"\n $RunMultiQC.container = \"${params.containerPath}/multiqc-${params.tag}.img\"\n $IndelRealigner.container = \"${params.containerPath}/gatk-${params.tag}.img\"\n // I'm not putting the whole file here\n // you probably already got the point\n}\n```\n\nThis approach ran (almost) perfectly on the first try, except a process failing due to a typo on a container name...\n\n### Conclusion\n\nThis switch was completed a couple of months ago and has been a great success.\nWe are now using Singularity containers in almost all of our Nextflow pipelines developed at NGI.\nEven if we do enjoy the improved control, we must not forgot that:\n\n> With great power comes great responsibility!\n\n### Credits\n\nThanks to [Rickard Hammarén](https://github.com/Hammarn) and [Phil Ewels](http://phil.ewels.co.uk/) for comments and suggestions for improving the post.\n", "images": [ "/img/CAW_logo.png" - ] + ], + "author": "Maxime Garcia", + "tags": "pipelines,nextflow,genomic,workflow,singularity,cancer" }, { "slug": "2017/nextflow-and-cwl", "title": "Nextflow and the Common Workflow Language", "date": "2017-07-20T00:00:00.000Z", "content": "\nThe 
Common Workflow Language ([CWL](http://www.commonwl.org/)) is a specification for defining\nworkflows in a declarative manner. It has been implemented to varying degrees\nby different software packages. Nextflow and CWL share a common goal of enabling portable\nreproducible workflows.\n\nWe are currently investigating the automatic conversion of CWL workflows into Nextflow scripts\nto increase the portability of workflows. This work is being developed as\nthe [cwl2nxf](https://github.com/nextflow-io/cwl2nxf) project, currently in early prototype stage.\n\nOur first phase of the project was to determine mappings of CWL to Nextflow and familiarize\nourselves with how the current implementation of the converter supports a number of CWL specific\nfeatures.\n\n### Mapping CWL to Nextflow\n\nInputs in the CWL workflow file are initially parsed as _channels_ or other Nextflow input types.\nEach step specified in the workflow is then parsed independently. At the time of writing\nsubworkflows are not supported; each step must be a CWL `CommandLineTool` file.\n\nThe image below shows an example of the major components in the CWL files and then post-conversion (click to zoom).\n\n[![Nextflow CWL conversion](/img/cwl2nxf-min.png)](/img/cwl2nxf-min.png)\n\nCWL and Nextflow share a similar structure of defining inputs and outputs as shown above.\n\nA notable difference between the two is how tasks are defined. CWL requires either a separate\nfile for each task or a sub-workflow. CWL also requires the explicit mapping of each command\nline option for an executed tool. This is done using YAML meta-annotation to indicate the position, prefix, etc.\nfor each command line option.\n\nIn Nextflow a task command is defined as a separate component in the `process` definition and\nit is ultimately a multiline string which is interpreted as a command script by the underlying\nsystem. Input parameters can be used in the command string with a simple variable interpolation\nmechanism. This is beneficial as it simplifies porting existing BASH scripts to Nextflow\nwith minimal refactoring.\n\nThese examples highlight some of the differences between the two approaches, and the difficulties\nconverting complex use cases such as scatter, CWL expressions, and conditional command line inclusion.\n\n### Current status\n\ncwl2nxf is a Groovy-based tool with limited conversion ability. It parses the\nYAML documents and maps the various CWL objects to Nextflow. Conversion examples are\nprovided as part of the repository along with documentation for each example specifying the mapping.\n\nThis project was initially focused on developing an understanding of how to translate CWL to Nextflow.\nA number of CWL specific features such as scatter, secondary files and simple JavaScript expressions\nwere analyzed and implemented.\n\nThe GitHub repository includes instructions on how to build cwl2nxf and an example usage.\nThe tool can be executed as either just a parser printing the converted CWL to stdout,\nor by specifying an output file which will generate the Nextflow script file and if necessary\na config file.\n\nThe tool takes in a CWL workflow file and the YAML inputs file. It does not currently work\nwith a standalone `CommandLineTool`. The following example shows how to run it:\n\n```\njava -jar build/libs/cwl2nxf-*.jar rnatoy.cwl samp.yaml\n```\n\n
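To give a rough feel for the Nextflow side of the mapping described above, a trivial CWL `CommandLineTool` wrapping a single command could translate into a process of roughly this shape (an illustrative sketch only, not actual cwl2nxf output; the tool and file names are made up):\n\n```\nprocess example_tool {\n    input:\n    file seq from sequences\n\n    output:\n    file 'out.txt' into results\n\n    \"\"\"\n    some_tool --input $seq > out.txt\n    \"\"\"\n}\n```\n\n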
\nSee the GitHub [repository](https://github.com/nextflow-io/cwl2nxf) for further details.\n\n### Conclusion\n\nWe are continuing to investigate ways to improve the interoperability of Nextflow with CWL.\nAlthough still an early prototype, the cwl2nxf tool provides some level of conversion of CWL to Nextflow.\n\nWe are also planning to explore [CWL Avro](https://github.com/common-workflow-language/cwlavro),\nwhich may provide a more efficient way to parse and handle CWL objects for conversion to Nextflow.\n\nAdditionally, a number of workflows in the GitHub repository have been implemented in both\nCWL and Nextflow which can be used as a comparison of the two languages.\n\nThe Nextflow team will be presenting a short talk and participating in the Codefest at [BOSC 2017](https://www.open-bio.org/wiki/BOSC_2017).\nWe are interested in hearing from the community regarding CWL to Nextflow conversion, and would like\nto encourage anyone interested to contribute to the cwl2nxf project.\n", - "images": [] + "images": [], + "author": "Kevin Sayers", + "tags": "nextflow,workflow,reproducibility,cwl" }, { "slug": "2017/nextflow-hack17", "title": "Nexflow Hackathon 2017", "date": "2017-09-30T00:00:00.000Z", "content": "\nLast week saw the inaugural Nextflow meeting organised at the Centre for Genomic Regulation\n(CRG) in Barcelona. The event combined talks, demos, a tutorial/workshop for beginners as\nwell as two hackathon sessions for more advanced users.\n\nNearly 50 participants attended over the two days which included an entertaining tapas course\nduring the first evening!\n\nOne of the main objectives of the event was to bring together Nextflow users to work\ntogether on common interest projects. There were several proposals for the hackathon\nsessions and in the end five diverse ideas were chosen for communal development ranging from\nnew pipelines through to the addition of new features in Nextflow.\n\nThe proposals and outcomes of each the projects, which can be found in the issues section\nof [this GitHub repository](https://github.com/nextflow-io/hack17), have been summarised below.\n\n### Nextflow HTML tracing reports\n\nThe HTML tracing project aims to generate a rendered version of the Nextflow trace file to\nenable fast sorting and visualisation of task/process execution statistics.\n\nCurrently the data in the trace includes information such as CPU duration, memory usage and\ncompletion status of each task, however wading through the file is often not convenient\nwhen a large number of tasks have been executed.\n\n[Phil Ewels](https://github.com/ewels) proposed the idea and led the coordination effort\nwith the outcome being a very impressive working prototype which can be found in the Nextflow\nbranch `html-trace`.\n\nAn image of the example report is shown below with the interactive HTML available\n[here](/misc/nf-trace-report.html). It is expected to be merged into the main branch of Nextflow\nwith documentation in a near-future release.\n\n![Nextflow HTML execution report](/img/nf-trace-report-min.png)\n\n### Nextflow pipeline for 16S microbial data\n\nThe H3Africa Bioinformatics Network have been developing several pipelines which are used\nacross the participating centers. 
The diverse computing resources available across the nodes have led to\nmembers wanting workflow solutions with a particular focus on portability.\n\nWith this in mind, Scott Hazelhurst proposed a project for a 16S Microbial data analysis\npipeline which had [previously been developed using CWL](https://github.com/h3abionet/h3abionet16S/tree/master).\n\nThe participants made a new [branch](https://github.com/h3abionet/h3abionet16S/tree/nextflow)\nof the original pipeline and ported it into Nextflow.\n\nThe pipeline will continue to be developed with the goal of acting as a comparison between\nCWL and Nextflow. It is thought this can then be extended to other pipelines by those\nwho are already familiar with Nextflow, as well as being used as a tool for training newer users.\n\n### Nextflow modules prototyping\n\n_Toolboxing_ allows users to incorporate software into their pipelines in an efficient and\nreproducible manner. Various software repositories are becoming increasingly popular,\nhighlighted by the over 5,000 tools available in the [Galaxy Toolshed](https://toolshed.g2.bx.psu.edu/).\n\nProjects such as [Biocontainers](http://biocontainers.pro/) aim to wrap up the execution\nenvironment using containers. [Johan Viklund](https://github.com/viklund) and [myself](https://github.com/skptic)\nwished to piggyback off existing repositories and settled on [Dockstore](https://dockstore.org)\nwhich is an open platform compliant with the [GA4GH](http://genomicsandhealth.org) initiative.\n\nThe majority of tools in Dockstore are written in CWL and therefore we required a parser\nbetween the CWL CommandLineTool class and Nextflow processes. Johan was able to develop\na parser which generates Nextflow processes for several Dockstore tools.\n\nAs resources such as Dockstore become mature and standardised, it will be\npossible to automatically generate a _Nextflow Store_ and enable efficient incorporation\nof tools into workflows.\n\n\n\n_Example showing a Nextflow process generated from the Dockstore CWL repository for the tool BAMStats._\n\n### Nextflow pipeline for de novo assembly of nanopore reads\n\n[Nanopore sequencing](https://en.wikipedia.org/wiki/Nanopore_sequencing) is an exciting\nand emerging technology which promises to change the landscape of nucleotide sequencing.\n\nWith keen interest in Nanopore-specific pipelines, [Hadrien Gourlé](https://github.com/HadrienG)\nled the hackathon project for _Nanoflow_.\n\n[Nanoflow](https://github.com/HadrienG/nanoflow) is a de novo assembler of bacterial genomes\nfrom nanopore reads using Nextflow.\n\nDuring the two days the participants developed the pipeline for adapter trimming as well\nas assembly and consensus sequence generation using either\n[Canu](https://github.com/marbl/canu) or [Miniasm](https://github.com/lh3/miniasm).\n\nThe future plans are to finalise the pipeline to include a polishing step and a genome\nannotation step.\n\n### Nextflow AWS Batch integration\n\nNextflow already has experimental support for [AWS Batch](https://aws.amazon.com/batch/)\nand the goal of this project proposed by [Francesco Strozzi](https://github.com/fstrozzi)\nwas to improve this support, add features and test the implementation on real-world pipelines.\n\nEarlier work from [Paolo Di Tommaso](https://github.com/pditommaso) in the Nextflow\nrepository highlighted several challenges to using AWS Batch with Nextflow.\n\nThe major obstacle described by [Tim Dudgeon](https://github.com/tdudgeon) was the requirement\nfor each Docker container to 
have a version of the Amazon Web Services Command Line tools\n(aws-cli) installed.\n\nA solution was to install the AWS CLI tools on a custom AWS image that is used by the\nDocker host machine, and then mount the directory that contains the necessary items into\neach of the Docker containers as a volume. Early testing suggests this approach works\nwith the hope of providing a more elegant solution in future iterations.\n\nThe code and documentation for AWS Batch has been prepared and will be tested further\nbefore being rolled into an official Nextflow release in the near future.\n\n### Conclusion\n\nThe event was seen as an overwhelming success and special thanks must be made to all the\nparticipants. As the Nextflow community continues to grow, it would be fantastic to make these types\nmeetings more regular occasions.\n\nIn the meantime we have put together a short video containing some of the highlights\nof the two days.\n\nWe hope to see you all again in Barcelona soon or at new events around the world!\n\n\n", - "images": [] + "images": [], + "author": "Evan Floden", + "tags": "nextflow,docker,hackathon" }, { "slug": "2017/nextflow-nature-biotech-paper", "title": "Nextflow published in Nature Biotechnology", "date": "2017-04-12T00:00:00.000Z", "content": "\nWe are excited to announce the publication of our work _[Nextflow enables reproducible computational workflows](http://rdcu.be/qZVo)_ in Nature Biotechnology.\n\nThe article provides a description of the fundamental components and principles of Nextflow.\nWe illustrate how the unique combination of containers, pipeline sharing and portable\ndeployment provides tangible advantages to researchers wishing to generate reproducible\ncomputational workflows.\n\nReproducibility is a [major challenge](http://www.nature.com/news/reproducibility-1.17552)\nin today's scientific environment. We show how three bioinformatics data analyses produce\ndifferent results when executed on different execution platforms and how Nextflow, along\nwith software containers, can be used to control numerical stability, enabling consistent\nand replicable results across different computing platforms. As complex omics analyses\nenter the clinical setting, ensuring that results remain stable brings on extra importance.\n\nSince its first release three years ago, the Nextflow user base has grown in an organic fashion.\nFrom the beginning it has been our own demands in a workflow tool and those of our users that\nhave driven the development of Nextflow forward. The publication forms an important milestone\nin the project and we would like to extend a warm thank you to all those who have been early\nusers and contributors.\n\nWe kindly ask if you use Nextflow in your own work to cite the following article:\n\n
\nDi Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., & Notredame, C. (2017).\nNextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316–319.\ndoi:10.1038/nbt.3820\n
\n", - "images": [] + "images": [], + "author": "Paolo Di Tommaso", + "tags": "pipelines,nextflow,genomic,workflow,paper" }, { "slug": "2017/nextflow-workshop", "title": "Nextflow workshop is coming!", "date": "2017-04-26T00:00:00.000Z", "content": "\nWe are excited to announce the first Nextflow workshop that will take place at the\nBarcelona Biomedical Research Park building ([PRBB](https://www.prbb.org/)) on 14-15th September 2017.\n\nThis event is open to everybody who is interested in the problem of computational workflow\nreproducibility. Leading experts and users will discuss the current state of the Nextflow\ntechnology and how it can be applied to manage -omics analyses in a reproducible manner.\nBest practices will be introduced on how to deploy real-world large-scale genomic\napplications for precision medicine.\n\nDuring the hackathon, organized for the second day, participants will have the\nopportunity to learn how to write self-contained, replicable data analysis\npipelines along with Nextflow expert developers.\n\nMore details at [this link](http://www.crg.eu/en/event/coursescrg-nextflow-reproducible-silico-genomics).\nThe registration form is [available here](http://apps.crg.es/content/internet/events/webforms/17502) (deadline 15th Jun).\n\n### Schedule (draft)\n\n#### Thursday, 14 September\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
| Time | Title | Speaker | Affiliation |
| ----- | ----- | ----- | ----- |
| 10.00 | Welcome & introduction | Cedric Notredame | Comparative Bioinformatics, CRG, Spain |
| 10.15 | Nextflow: a quick review | Paolo Di Tommaso | Comparative Bioinformatics, CRG, Spain |
| 10.30 | Standardising Swedish genomics analyses using Nextflow | Phil Ewels | National Genomics Infrastructure, SciLifeLab, Sweden |
| 11.00 | Building Pipelines to Support African Bioinformatics: the H3ABioNet Pipelines Project | Scott Hazelhurst | University of the Witwatersrand, Johannesburg, South Africa |
| 11.30 | coffee break | | |
| 12.00 | Using Nextflow for Large Scale Benchmarking of Phylogenetic methods and tools | Frédéric Lemoine | Evolutionary Bioinformatics, Institut Pasteur, France |
| 12.30 | Nextflow for chemistry - crossing the divide | Tim Dudgeon | Informatics Matters Ltd, UK |
| 12.50 | From zero to Nextflow @ CRG's Biocore | Luca Cozzuto | Bioinformatics Core Facility, CRG, Spain |
| 13.10 | (to be determined) | | |
| 13.30 | Lunch | | |
| 14.30-18.30 | Hackathon & course | | |
\n\n#### Friday, 15 September\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
| Time | Title | Speaker | Affiliation |
| ----- | ----- | ----- | ----- |
| 9.30 | Computational workflows for omics analyses at the IARC | Matthieu Foll | International Agency for Research on Cancer (IARC), France |
| 10.00 | Medical Genetics at Oslo University Hospital | Hugues Fontanelle | Oslo University Hospital, Norway |
| 10.30 | Inside-Out: reproducible analysis of external data, inside containers with Nextflow | Evan Floden | Comparative Bioinformatics, CRG, Spain |
| 11.00 | coffee break | | |
| 11.30 | (title to be defined) | Johnny Wu | Roche Sequencing, Pleasanton, USA |
| 12.00 | Standardizing life sciences datasets to improve studies reproducibility in the EOSC | Jordi Rambla | European Genome-Phenome Archive, CRG |
| 12.20 | Unbounded by Economics | Brendan Bouffler | AWS Research Cloud Program, UK |
| 12.40 | Challenges with large-scale portable computational workflows | Paolo Di Tommaso | Comparative Bioinformatics, CRG, Spain |
| 13.00 | Lunch | | |
| 14.00-18.00 | Hackathon | | |
\n\n
\nSee you in Barcelona!\n\n![Nextflow workshop](/img/nf-workshop.png)\n", - "images": [] + "images": [], + "author": "Paolo Di Tommaso", + "tags": "nextflow,genomic,workflow,reproducibility,workshop," }, { "slug": "2017/scaling-with-aws-batch", "title": "Scaling with AWS Batch", "date": "2017-11-08T00:00:00.000Z", "content": "\nThe latest Nextflow release (0.26.0) includes built-in support for [AWS Batch](https://aws.amazon.com/batch/),\na managed computing service that allows the execution of containerised workloads\nover the Amazon EC2 Container Service (ECS).\n\nThis feature allows the seamless deployment of Nextflow pipelines in the cloud by offloading\nthe process executions as managed Batch jobs. The service takes care to spin up the required\ncomputing instances on-demand, scaling up and down the number and composition of the instances\nto best accommodate the actual workload resource needs at any point in time.\n\nAWS Batch shares with Nextflow the same vision regarding workflow containerisation\ni.e. each compute task is executed in its own Docker container. This dramatically\nsimplifies the workflow deployment through the download of a few container images.\nThis common design background made the support for AWS Batch a natural extension for Nextflow.\n\n### Batch in a nutshell\n\nBatch is organised in _Compute Environments_, _Job queues_, _Job definitions_ and _Jobs_.\n\nThe _Compute Environment_ allows you to define the computing resources required for a specific workload (type).\nYou can specify the minimum and maximum number of CPUs that can be allocated,\nthe EC2 provisioning model (On-demand or Spot), the AMI to be used and the allowed instance types.\n\nThe _Job queue_ definition allows you to bind a specific task to one or more Compute Environments.\n\nThen, the _Job definition_ is a template for one or more jobs in your workload. This is required\nto specify the Docker image to be used in running a particular task along with other requirements\nsuch as the container mount points, the number of CPUs, the amount of memory and the number of\nretries in case of job failure.\n\nFinally the _Job_ binds a Job definition to a specific Job queue\nand allows you to specify the actual task command to be executed in the container.\n\nThe job input and output data management is delegated to the user. This means that if you\nonly use Batch API/tools you will need to take care to stage the input data from a S3 bucket\n(or a different source) and upload the results to a persistent storage location.\n\nThis could turn out to be cumbersome in complex workflows with a large number of\ntasks and above all it makes it difficult to deploy the same applications across different\ninfrastructure.\n\n### How to use Batch with Nextflow\n\nNextflow streamlines the use of AWS Batch by smoothly integrating it in its workflow processing\nmodel and enabling transparent interoperability with other systems.\n\nTo run Nextflow you will need to set-up in your AWS Batch account a [Compute Environment](http://docs.aws.amazon.com/batch/latest/userguide/compute_environments.html)\ndefining the required computing resources and associate it to a [Job Queue](http://docs.aws.amazon.com/batch/latest/userguide/job_queues.html).\n\nNextflow takes care to create the required _Job Definitions_ and _Job_ requests as needed.\nThis spares some Batch configurations steps.\n\nIn the `nextflow.config`, file specify the `awsbatch` executor, the Batch `queue` and\nthe container to be used in the usual manner. 
You may also need to specify the AWS region\nand access credentials if they are not provided by other means. For example:\n\n process.executor = 'awsbatch'\n process.queue = 'my-batch-queue'\n process.container = your-org/your-docker:image\n aws.region = 'eu-west-1'\n aws.accessKey = 'xxx'\n aws.secretKey = 'yyy'\n\nEach process can eventually use a different queue and Docker image (see Nextflow documentation for details).\nThe container image(s) must be published in a Docker registry that is accessible from the\ninstances run by AWS Batch eg. [Docker Hub](https://hub.docker.com/), [Quay](https://quay.io/)\nor [ECS Container Registry](https://aws.amazon.com/ecr/).\n\nThe Nextflow process can be launched either in a local computer or a EC2 instance.\nThe latter is suggested for heavy or long running workloads.\n\nNote that input data should be stored in the S3 storage. In the same manner\nthe pipeline execution must specify a S3 bucket as a working directory by using the `-w` command line option.\n\nA final caveat about custom containers and computing AMI. Nextflow automatically stages input\ndata and shares tasks intermediate results by using the S3 bucket specified as a work directory.\nFor this reason it needs to use the `aws` command line tool which must be installed either\nin your process container or be present in a custom AMI that can be mounted and accessed\nby the Docker containers.\n\nYou may also need to create a custom AMI because the default image used by AWS Batch only\nprovides 22 GB of storage which may not be enough for real world analysis pipelines.\n\nSee the documentation to learn [how to create a custom AMI](/docs/latest/awscloud.html#custom-ami)\nwith larger storage and how to setup the AWS CLI tools.\n\n### An example\n\nIn order to validate Nextflow integration with AWS Batch, we used a simple RNA-Seq pipeline.\n\nThis pipeline takes as input a metadata file from the Encode project corresponding to a [search\nreturning all human RNA-seq paired-end datasets](https://www.encodeproject.org/search/?type=Experiment&award.project=ENCODE&replicates.library.biosample.donor.organism.scientific_name=Homo+sapiens&files.file_type=fastq&files.run_type=paired-ended&replicates.library.nucleic_acid_term_name=RNA&replicates.library.depleted_in_term_name=rRNA)\n(the metadata file has been additionally filtered to retain only data having a SRA ID).\n\nThe pipeline automatically downloads the FASTQ files for each sample from the EBI ENA database,\nit assesses the overall quality of sequencing data using FastQC and then runs [Salmon](https://combine-lab.github.io/salmon/)\nto perform the quantification over the human transcript sequences. 
Finally all the QC and\nquantification outputs are summarised using the [MultiQC](http://multiqc.info/) tool.\n\nFor the sake of this benchmark we used the first 38 samples out of the full 375 samples dataset.\n\nThe pipeline was executed both on AWS Batch cloud and in the CRG internal Univa cluster,\nusing [Singularity](/blog/2016/more-fun-containers-hpc.html) as containers runtime.\n\nIt's worth noting that with the exception of the two configuration changes detailed below,\nwe used exactly the same pipeline implementation at [this GitHub repository](https://github.com/nextflow-io/rnaseq-encode-nf).\n\nThe AWS deploy used the following configuration profile:\n\n aws.region = 'eu-west-1'\n aws.client.storageEncryption = 'AES256'\n process.queue = 'large'\n executor.name = 'awsbatch'\n executor.awscli = '/home/ec2-user/miniconda/bin/aws'\n\nWhile for the cluster deployment the following configuration was used:\n\n executor = 'crg'\n singularity.enabled = true\n process.container = \"docker://nextflow/rnaseq-nf\"\n process.queue = 'cn-el7'\n process.time = '90 min'\n process.$quant.time = '4.5 h'\n\n### Results\n\nThe AWS Batch Compute environment was configured to use a maximum of 132 CPUs as the number of CPUs\nthat were available in the queue for local cluster deployment.\n\nThe two executions ran in roughly the same time: 2 hours and 24 minutes when running in the\nCRG cluster and 2 hours and 37 minutes when using AWS Batch.\n\nIt must be noted that 14 jobs failed in the Batch deployment, presumably because one or more spot\ninstances were retired. However Nextflow was able to re-schedule the failed jobs automatically\nand the overall pipeline execution completed successfully, also showing the benefits of a truly\nfault tolerant environment.\n\nThe overall cost for running the pipeline with AWS Batch was **$5.47** ($ 3.28 for EC2 instances,\n$1.88 for EBS volume and $0.31 for S3 storage). This means that with ~ $55 we could have\nperformed the same analysis on the full Encode dataset.\n\nIt is more difficult to estimate the cost when using the internal cluster, because we don't\nhave access to such detailed cost accounting. However, as a user, we can estimate it roughly\ncomes out at $0.01 per CPU-Hour. The pipeline needed around 147 CPU-Hour to carry out the analysis,\nhence with an estimated cost of **$1.47** just for the computation.\n\nThe execution report for the Batch execution is available at [this link](https://cdn.rawgit.com/nextflow-io/rnaseq-encode-nf/db303a81/benchmark/aws-batch/report.html)\nand the one for cluster is available [here](https://cdn.rawgit.com/nextflow-io/rnaseq-encode-nf/db303a81/benchmark/crg-cluster/report.html).\n\n### Conclusion\n\nThis post shows how Nextflow integrates smoothly with AWS Batch and how it can be used to\ndeploy and execute real world genomics pipeline in the cloud with ease.\n\nThe auto-scaling ability provided by AWS Batch along with the use of spot instances make\nthe use of the cloud even more cost effective. 
Running on a local cluster may still be cheaper,\neven if it is non trivial to account for all the real costs of a HPC infrastructure.\nHowever the cloud allows flexibility and scalability not possible with common on-premises clusters.\n\nWe also demonstrate how the same Nextflow pipeline can be _transparently_ deployed in two very\ndifferent computing infrastructure, using different containerisation technologies by simply\nproviding a separate configuration profile.\n\nThis approach enables the interoperability across different deployment sites, reduces\noperational and maintenance costs and guarantees consistent results over time.\n\n### Credits\n\nThis post is co-authored with [Francesco Strozzi](https://twitter.com/fstrozzi),\nwho also helped to write the pipeline used for the benchmark in this post and contributed\nto and tested the AWS Batch integration. Thanks to [Emilio Palumbo](https://github.com/emi80)\nthat helped to set-up and configure the AWS Batch environment and [Evan Floden](https://gitter.im/skptic)\nfor the comments.\n", - "images": [] + "images": [], + "author": "Paolo Di Tommaso", + "tags": "pipelines,nextflow,genomic,workflow,aws,batch" }, { "slug": "2018/bringing-nextflow-to-google-cloud-wuxinextcode", @@ -151,21 +191,27 @@ "images": [ "/img/google-cloud.svg", "/img/wuxi-nextcode.jpeg" - ] + ], + "author": "Paolo Di Tommaso", + "tags": "nextflow,wuxinextcode,google,cloud" }, { "slug": "2018/clarification-about-nextflow-license", "title": "Clarification about the Nextflow license", "date": "2018-07-20T00:00:00.000Z", "content": "\nOver past week there was some discussion on social media regarding the Nextflow license\nand its impact on users' workflow applications.\n\n
… don’t use Nextflow, yo. https://t.co/Paip5W1wgG
— Konrad Rudolph 👨‍🔬💻 (@klmr) July 10, 2018
\n\n\n
This is certainly disappointing. An argument in favor of writing workflows in @commonwl, which is independent of the execution engine. https://t.co/mIbdLQQxmf
— John Didion (@jdidion) July 10, 2018
\n\n\n
GPL is generally considered toxic to companies due to fear of the viral nature of the license.
— Jeff Gentry (@geoffjentry) July 10, 2018
\n\n\n### What's the problem with GPL?\n\nNextflow has been released under the GPLv3 license since its early days [over 5 years ago](https://github.com/nextflow-io/nextflow/blob/c080150321e5000a2c891e477bb582df07b7f75f/src/main/groovy/nextflow/Nextflow.groovy).\nGPL is a very popular open source licence used by many projects\n(like, for example, [Linux](https://www.kernel.org/doc/html/v4.17/process/license-rules.html) and [Git](https://git-scm.com/about/free-and-open-source))\nand it has been designed to promote the adoption and spread of open source software and culture.\n\nWith this idea in mind, GPL requires the author of a piece of software, _derived_ from a GPL licensed application or library, to distribute it using the same license i.e. GPL itself.\n\nThis is generally good, because this requirement incentives the growth of the open source ecosystem and the adoption of open source software more widely.\n\nHowever, this is also a reason for concern by some users and organizations because it's perceived as too strong requirement by copyright holders (who may not want to disclose their code) and because it can be difficult to interpret what a \\*derived\\* application is. See for example\n[this post by Titus Brown](http://ivory.idyll.org/blog/2015-on-licensing-in-bioinformatics.html) at this regard.\n\n#### What's the impact of the Nextflow license on my application?\n\nIf you are not distributing your application, based on Nextflow, it doesn't affect you in any way.\nIf you are distributing an application that requires Nextflow to be executed, technically speaking your application is dynamically linking to the Nextflow runtime and it uses routines provided by it. For this reason your application should be released as GPLv3. See [here](https://www.gnu.org/licenses/gpl-faq.en.html#GPLStaticVsDynamic) and [here](https://www.gnu.org/licenses/gpl-faq.en.html#IfInterpreterIsGPL).\n\nHowever, this was not our original intention. We don’t consider workflow applications to be subject to the GPL copyleft obligations of the GPL even though they may link dynamically to Nextflow functionality through normal calls and we are not interested to enforce the license requirement to third party workflow developers and organizations. Therefore you can distribute your workflow application using the license of your choice. For other kind of derived applications the GPL license should be used, though.\n\n\n### That's all?\n\nNo. We are aware that this is not enough and the GPL licence can impose some limitation in the usage of Nextflow to some users and organizations. For this reason we are working with the CRG legal department to move Nextflow to a more permissive open source license. 
This is primarily motivated by our wish to make it more adaptable and compatible with all the different open source ecosystems, but also to remove any remaining legal uncertainty that using Nextflow through linking with its functionality may cause.\n\nWe are expecting that this decision will be made over the summer so stay tuned and continue to enjoy Nextflow.\n", - "images": [] + "images": [], + "author": "Paolo Di Tommaso", + "tags": "nextflow,gpl,license" }, { "slug": "2018/conda-support-has-landed", "title": "Conda support has landed!", "date": "2018-06-05T00:00:00.000Z", "content": "\nNextflow aims to ease the development of large scale, reproducible workflows allowing\ndevelopers to focus on the main application logic and to rely on best community tools and\nbest practices.\n\nFor this reason we are very excited to announce that the latest Nextflow version (`0.30.0`) finally\nprovides built-in support for [Conda](https://conda.io/docs/).\n\nConda is a popular package manager that simplifies the installation of software packages\nand the configuration of complex software environments. Above all, it provides access to large\ntool and software package collections maintained by domain specific communities such as\n[Bioconda](https://bioconda.github.io) and [BioBuild](https://biobuilds.org/).\n\nThe native integration with Nextflow allows researchers to develop workflow applications\nin a rapid and easy repeatable manner, reusing community tools, whilst taking advantage of the\nconfiguration flexibility, portability and scalability provided by Nextflow.\n\n### How it works\n\nNextflow automatically creates and activates the Conda environment(s) given the dependencies\nspecified by each process.\n\nDependencies are specified by using the [conda](/docs/latest/process.html#conda) directive,\nproviding either the names of the required Conda packages, the path of a Conda environment yaml\nfile or the path of an existing Conda environment directory.\n\nConda environments are stored on the file system. By default Nextflow instructs Conda to save\nthe required environments in the pipeline work directory. You can specify the directory where the\nConda environments are stored using the `conda.cacheDir` configuration property.\n\n#### Use Conda package names\n\nThe simplest way to use one or more Conda packages consists in specifying their names using the `conda` directive.\nMultiple package names can be specified by separating them with a space. For example:\n\n```\nprocess foo {\n conda \"bwa samtools multiqc\"\n\n \"\"\"\n your_command --here\n \"\"\"\n}\n```\n\nUsing the above definition a Conda environment that includes BWA, Samtools and MultiQC tools\nis created and activated when the process is executed.\n\nThe usual Conda package syntax and naming conventions can be used. The version of a package can be\nspecified after the package name as shown here: `bwa=0.7.15`.\n\nThe name of the channel where a package is located can be specified prefixing the package with\nthe channel name as shown here: `bioconda::bwa=0.7.15`.\n\n#### Use Conda environment files\n\nWhen working in a project requiring a large number of dependencies it can be more convenient\nto consolidate all required tools using a Conda environment file. This is a file that\nlists the required packages and channels, structured using the YAML format. 
For example:\n\n```\nname: my-env\nchannels:\n - bioconda\n - conda-forge\n - defaults\ndependencies:\n - star=2.5.4a\n - bwa=0.7.15\n```\n\nThe path of the environment file can be specified using the `conda` directive:\n\n```\nprocess foo {\n conda '/some/path/my-env.yaml'\n\n '''\n your_command --here\n '''\n}\n```\n\nNote: the environment file name **must** end with a `.yml` or `.yaml` suffix otherwise\nit won't be properly recognized. Also relative paths are resolved against the workflow\nlaunching directory.\n\nThe suggested approach is to store the the Conda environment file in your project root directory\nand reference it in the `nextflow.config` directory using the `baseDir` variable as shown below:\n\n```\nprocess.conda = \"$baseDir/my-env.yaml\"\n```\n\nThis guarantees that the environment paths is correctly resolved independently of the execution path.\n\nSee the [documentation](/docs/latest/conda.html) for more details on how to configure and\nuse Conda environments in your Nextflow workflow.\n\n### Bonus!\n\nThis release includes also a better support for [Biocontainers](https://biocontainers.pro/). So far,\nNextflow users were able to use container images provided by the Biocontainers community. However,\nit was not possible to collect process metrics and runtime statistics within those images due to the usage\nof a legacy version of the `ps` system tool that is not compatible with the one expected by Nextflow.\n\nThe latest version of Nextflow does not require the `ps` tool any more to fetch execution metrics\nand runtime statistics, therefore this information is collected and correctly reported when using Biocontainers\nimages.\n\n### Conclusion\n\nWe are very excited by this new feature bringing the ability to use popular Conda tool collections,\nsuch as Bioconda, directly into Nextflow workflow applications.\n\nNextflow developers have now yet another option to transparently manage the dependencies in their\nworkflows along with [Environment Modules](/docs/latest/process.html#module) and [containers](/docs/latest/docker.html)\n[technology](/docs/latest/singularity.html), giving them great configuration flexibility.\n\nThe resulting workflow applications can easily be reconfigured and deployed across a range of different\nplatforms choosing the best technology according to the requirements of the target system.\n", - "images": [] + "images": [], + "author": "Paolo Di Tommaso", + "tags": "nextflow,conda,bioconda" }, { "slug": "2018/goodbye-zero-hello-apache", @@ -174,7 +220,9 @@ "content": "\nToday marks an important milestone in the Nextflow project. We are thrilled to announce three important changes to better meet users’ needs and ground the project on a solid foundation upon which to build a vibrant ecosystem of tools and data analysis applications for genomic research and beyond.\n\n### Apache license\n\nNextflow was originally licensed as GPLv3 open source software more than five years ago. GPL is designed to promote the adoption and spread of open source software and culture. On the other hand it has also some controversial side-effects, such as the one on derivative works and legal implications which make the use of GPL released software a headache in many organisations. We have previously discussed these concerns in this blog post and, after community feedback, have opted to change the project license to Apache 2.0.\n\nThis is a popular permissive free software license written by the Apache Software Foundation (ASF). 
Software distributed with this license requires the preservation of the copyright notice and disclaimer. It allows the freedom to use the software for any purpose, to distribute it, to modify it, and to distribute modified versions of the software without dictating the licence terms of the resulting applications and derivative works. We are sure this licensing model addresses the concerns raised by the Nextflow community and will boost further project developments.\n\n### New release schema\n\nIn the time since Nextflow was open sourced, we have released 150 versions which have been used by many organizations to deploy critical production workflows on a large range of computational platforms and under heavy loads and stress conditions.\n\nFor example, at the Centre for Genomic Regulation (CRG) alone, Nextflow has been used to deploy data intensive computation workflows since 2014, and it has orchestrated the execution of over 12 million jobs totalling 1.4 million CPU-hours.\n\n\"Nextflow\n\nThis extensive use across different execution environments has resulted in a reliable software package, and it's therefore finally time to declare Nextflow stable and drop the zero from the version number!\n\nFrom today onwards, Nextflow will use a 3 monthly time-based _stable_ release cycle. Today's release is numbered as **18.10**, the next one will be on January 2019, numbered as 19.01, and so on. This gives our users a more predictable release cadence and allows us to better focus on new feature development and scheduling.\n\nAlong with the 3-months stable release cycle, we will provide a monthly _edge_ release, which will include access to the latest experimental features and developments. As such, it should only be used for evaluation and testing purposes.\n\n### Commercial support\n\nFinally, for organisations requiring commercial support, we have recently incorporated Seqera Labs, a spin-off of the Centre for Genomic Regulation.\n\nSeqera Labs will foster Nextflow adoption as professional open source software by providing commercial support services and exploring new innovative products and solutions.\n\nIt's important to highlight that Seqera Labs will not close or make Nextflow a commercial project. Nextflow is and will continue to be owned by the CRG and the other contributing organisations and individuals.\n\n### Conclusion\n\nThe Nextflow project has reached an important milestone. In the last five years it has grown and managed to become a stable technology used by thousands of people daily to deploy large scale workloads for life science data analysis applications and beyond. The project is now exiting from the experimental stage.\n\nWith the above changes we want to fulfil the needs of researchers, for a reliable tool enabling scalable and reproducible data analysis, along with the demand of production oriented users, who require reliable support and services for critical deployments.\n\nAbove all, our aim is to strengthen the community effort around the Nextflow ecosystem and make it a sustainable and solid technology in the long run.\n\n### Credits\n\nWe want to say thank you to all the people who have supported and contributed to this project to this stage. First of all to Cedric Notredame for his long term commitment to the project within the Comparative Bioinformatics group at CRG. The Open Bioinformatics Foundation (OBF) in the name of Chris Fields and The Ontario Institute for Cancer Research (OICR), namely Dr Lincoln Stein, for supporting the Nextflow change of license. 
The CRG TBDO department, and in particular Salvatore Cappadona for his continued support and advice. Finally, the user community who with their feedback and constructive criticism contribute everyday to make this project more stable, useful and powerful.\n", "images": [ "/img/nextflow-release-schema-01.png" - ] + ], + "author": "Paolo Di Tommaso", + "tags": "nextflow,gpl,apache,license" }, { "slug": "2018/nextflow-meets-dockstore", @@ -183,140 +231,180 @@ "content": "\n
\nThis post is co-authored with Denis Yuen, lead of the Dockstore project at the Ontario Institute for Cancer Research\n
\n\nOne key feature of Nextflow is the ability to automatically pull and execute a workflow application directly from a sharing platform such as GitHub. We realised this was critical to allow users to properly track code changes and releases and, above all, to enable the [seamless sharing of workflow projects](/blog/2016/best-practice-for-reproducibility.html).\n\nNextflow never wanted to implement its own centralised workflow registry because we thought that in order for a registry to be viable and therefore useful, it should be technology agnostic and it should be driven by a consensus among the wider user community.\n\nThis is exactly what the [Dockstore](https://dockstore.org/) project is designed for and for this reason we are thrilled to announce that Dockstore has just released the support for Nextflow workflows in its latest release!\n\n### Dockstore in a nutshell\n\nDockstore is an open platform that collects and catalogs scientific data analysis tools and workflows, starting from the genomics community. It’s developed by the [OICR](https://oicr.on.ca/) in collaboration with [UCSC](https://ucscgenomics.soe.ucsc.edu/) and it is based on the [GA4GH](https://www.ga4gh.org/) open standards and the FAIR principles i.e. the idea to make research data and applications findable, accessible, interoperable and reusable ([FAIR](https://www.nature.com/articles/sdata201618)).\n\n\"Dockstore\n\nIn Dockstore’s initial release of support for Nextflow, users will be able to register and display Nextflow workflows. Many of Dockstore’s cross-language features will be available such as [searching](https://dockstore.org/search?descriptorType=nfl&searchMode=files), displaying metadata information on authorship from Nextflow’s config ([author and description](https://www.nextflow.io/docs/latest/config.html?highlight=author#scope-manifest)), displaying the [Docker images](https://dockstore.org/workflows/github.com/nf-core/hlatyping:1.1.1?tab=tools) used by a workflow, and limited support for displaying a visualization of the [workflow structure](https://dockstore.org/workflows/github.com/nf-core/hlatyping:1.1.1?tab=dag).\n\nThe Dockstore team will initially work to on-board the high-quality [nf-core](https://github.com/nf-core) workflows curated by the Nextflow community. However, all developers that develop Nextflow workflows will be able to login, contribute, and maintain workflows starting with our standard [workflow tutorials](https://docs.dockstore.org/docs/publisher-tutorials/workflows/).\n\nMoving forward, the Dockstore team hopes to engage more with the Nextflow community and integrate Nextflow code in order to streamline the process of publishing Nextflow workflows and draw better visualizations of Nextflow workflows. Dockstore also hopes to work with a cloud vendor to add browser based launch-with support for Nextflow workflows.\n\nFinally, support for Nextflow workflows in Dockstore will also enable the possibility of cloud platforms that implement [GA4GH WES](https://github.com/ga4gh/workflow-execution-service-schemas) to run Nextflow workflows.\n\n### Conclusion\n\nWe welcome the support for Nextflow workflows in the Dockstore platform. 
This is a valuable contribution and presents great opportunities for workflow developers and the wider scientific community.\n\nWe invite all Nextflow developers to register their data analysis applications in the Dockstore platform to make them accessible and reusable to a wider community of researchers.\n", "images": [ "/img/dockstore.png" - ] + ], + "author": "Paolo Di Tommaso", + "tags": "nextflow,ga4gh,nf-core,dockstore" }, { "slug": "2018/nextflow-turns-5", "title": "Nextflow turns five! Happy birthday!", "date": "2018-04-03T00:00:00.000Z", "content": "\nNextflow is growing up. The past week marked five years since the [first commit](https://github.com/nextflow-io/nextflow/commit/c080150321e5000a2c891e477bb582df07b7f75f) of the project on GitHub. Like a parent reflecting on their child attending school for the first time, we know reaching this point hasn’t been an entirely solo journey, despite Paolo's best efforts!\n\nA lot has happened recently and we thought it was time to highlight some of the recent evolutions. We also take the opportunity to extend the warmest of thanks to all those who have contributed to the development of Nextflow as well as the fantastic community of users who consistently provide ideas, feedback and the occasional late night banter on the [Gitter channel](https://gitter.im/nextflow-io/nextflow).\n\nHere are a few neat developments churning out of the birthday cake mix.\n\n### nf-core\n\n[nf-core](https://nf-core.github.io/) is a community effort to provide a home for high quality, production-ready, curated analysis pipelines built using Nextflow. The project has been initiated and is being led by [Phil Ewels](https://github.com/ewels) of [MultiQC](http://multiqc.info/) fame. The principle is that _nf-core_ pipelines can be used out-of-the-box or as inspiration for something different.\n\nAs well as being a place for best-practise pipelines, other features of _nf-core_ include the [cookie cutter template tool](https://github.com/nf-core/cookiecutter) which provides a fast way to create a dependable workflow using many of Nextflow’s sweet capabilities such as:\n\n- _Outline:_ Skeleton pipeline script.\n- _Data:_ Reference Genome implementation (AWS iGenomes).\n- _Configuration:_ Robust configuration setup.\n- _Containers:_ Skeleton files for Docker image generation.\n- _Reporting:_ HTML email functionality and and HTML results output.\n- _Documentation:_ Installation, Usage, Output, Troubleshooting, etc.\n- _Continuous Integration:_ Skeleton files for automated testing using Travis CI.\n\nThere is also a Python package with helper tools for Nextflow.\n\nYou can find more information about the community via the project [website](https://nf-core.github.io), [GitHub repository](https://github.com/nf-core), [Twitter account](https://twitter.com/nf_core) or join the dedicated [Gitter](https://gitter.im/nf-core/Lobby) chat.\n\n
\n\n[![nf-core logo](/img/nf-core-logo-min.png)](https://nf-co.re)\n\n
\n\n### Kubernetes has landed\n\nAs of version 0.28.0 Nextflow now has support for Kubernetes. If you don’t know much about Kubernetes, at its heart it is an open-source platform for the management and deployment of containers at scale. Google led the initial design and it is now maintained by the Cloud Native Computing Foundation. I found the [The Illustrated Children's Guide to Kubernetes](https://www.youtube.com/watch?v=4ht22ReBjno) particularly useful in explaining the basic vocabulary and concepts.\n\nKubernetes looks be one of the key technologies for the application of containers in the cloud as well as for building Infrastructure as a Service (IaaS) and Platform and a Service (PaaS) applications. We have been approached by many users who wish to use Nextflow with Kubernetes to be able to deploy workflows across both academic and commercial settings. With enterprise versions of Kubernetes such as Red Hat's [OpenShift](https://www.openshift.com/), it was becoming apparent there was a need for native execution with Nextflow.\n\nThe new command `nextflow kuberun` launches the Nextflow driver as a _pod_ which is then able to run workflow tasks as other pods within a Kubernetes cluster. You can read more in the documentation on Kubernetes support for Nextflow [here](https://www.nextflow.io/docs/latest/kubernetes.html).\n\n![Nextflow and Kubernetes](/img/nextflow-kubernetes-min.png)\n\n### Improved reporting and notifications\n\nFollowing the hackathon in September we wrote about the addition of HTML trace reports that allow for the generation HTML detailing resource usage (CPU time, memory, disk i/o etc).\n\nThanks to valuable feedback there has continued to be many improvements to the reports as tracked through the Nextflow GitHub issues page. Reports are now able to display [thousands of tasks](https://github.com/nextflow-io/nextflow/issues/547) and include extra information such as the [container engine used](https://github.com/nextflow-io/nextflow/issues/521). Tasks can be filtered and an [overall progress bar](https://github.com/nextflow-io/nextflow/issues/534) has been added.\n\nYou can explore a [real-world HTML report](/misc/nf-trace-report2.html) and more information on HTML reports can be found in the [documentation](https://www.nextflow.io/docs/latest/tracing.html).\n\nThere has also been additions to workflow notifications. Currently these can be configured to automatically send a notification email when a workflow execution terminates. You can read more about how to setup notifications in the [documentation](https://www.nextflow.io/docs/latest/mail.html?highlight=notification#workflow-notification).\n\n### Syntax-tic!\n\nWriting workflows no longer has to be done in monochrome. There is now syntax highlighting for Nextflow in the popular [Atom editor](https://atom.io) as well as in [Visual Studio Code](https://code.visualstudio.com).\n\n
\n\n[![Nextflow syntax highlighting with Atom](/img/atom-min.png)](/img/atom-min.png)\n\n
\n\n[![Nextflow syntax highlighting with VSCode](/img/vscode-min.png)](/img/vscode-min.png)\n\n
\n\nYou can find the Atom plugin by searching for Nextflow in Atoms package installer or clicking [here](https://atom.io/packages/language-nextflow). The Visual Studio plugin can be downloaded [here](https://marketplace.visualstudio.com/items?itemName=nextflow.nextflow).\n\nOn a related note, Nextflow is now an official language on GitHub!\n\n![GitHub nextflow syntax](/img/github-nf-syntax-min.png)\n\n### Conclusion\n\nNextflow developments are progressing faster than ever and with the help of the community, there are a ton of great new features on the way. If you have any suggestions of your killer NF idea then please drop us a line, open an issue or even better, join in the fun.\n\nOver the coming months Nextflow will be reaching out with several training and presentation sessions across the US and Europe. We hope to see as many of you as possible on the road.\n", - "images": [] + "images": [], + "author": "Evan Floden", + "tags": "nextflow,kubernetes,nf-core" }, { "slug": "2019/demystifying-nextflow-resume", "title": "Demystifying Nextflow resume", "date": "2019-06-24T00:00:00.000Z", "content": "\n_This two-part blog aims to help users understand Nextflow’s powerful caching mechanism. Part one describes how it works whilst part two will focus on execution provenance and troubleshooting. You can read part two [here](/blog/2019/troubleshooting-nextflow-resume.html)_\n\nTask execution caching and checkpointing is an essential feature of any modern workflow manager and Nextflow provides an automated caching mechanism with every workflow execution. When using the `-resume` flag, successfully completed tasks are skipped and the previously cached results are used in downstream tasks. But understanding the specifics of how it works and debugging situations when the behaviour is not as expected is a common source of frustration.\n\nThe mechanism works by assigning a unique ID to each task. This unique ID is used to create a separate execution directory, called the working directory, where the tasks are executed and the results stored. A task’s unique ID is generated as a 128-bit hash number obtained from a composition of the task’s:\n\n- Inputs values\n- Input files\n- Command line string\n- Container ID\n- Conda environment\n- Environment modules\n- Any executed scripts in the bin directory\n\n### How does resume work?\n\nThe `-resume` command line option allows for the continuation of a workflow execution. It can be used in its most basic form with:\n\n```\n$ nextflow run nextflow-io/hello -resume\n```\n\nIn practice, every execution starts from the beginning. However, when using resume, before launching a task, Nextflow uses the unique ID to check if:\n\n- the working directory exists\n- it contains a valid command exit status\n- it contains the expected output files.\n\nIf these conditions are satisfied, the task execution is skipped and the previously computed outputs are applied. When a task requires recomputation, ie. the conditions above are not fulfilled, the downstream tasks are automatically invalidated.\n\n### The working directory\n\nBy default, the task work directories are created in the directory from where the pipeline is launched. This is often a scratch storage area that can be cleaned up once the computation is completed. A different location for the execution work directory can be specified using the command line option `-w` e.g.\n\n```\n$ nextflow run \n\nThe ANSI log is implicitly disabled when the nextflow is launched in the background i.e. when using the `-bg` option. 
It can also be explicitly disabled using the `-ansi-log false` option or setting the `NXF_ANSI_LOG=false` variable in your launching environment.\n\n#### NCBI SRA data source\n\nThe support for NCBI SRA archive was introduced in the [previous edge release](/blog/2019/release-19.03.0-edge.html). Given the very positive reaction, we are graduating this feature into the stable release for general availability.\n\n#### Sharing\n\nThis version includes also a new Git repository provider for the [Gitea](https://gitea.io) self-hosted source code management system, which is added to the already existing support for GitHub, Bitbucket and GitLab sharing platforms.\n\n#### Reports and metrics\n\nFinally, this version includes important enhancements and bug fixes for the task executions metrics collected by Nextflow. If you are using this feature we strongly suggest updating Nextflow to this version.\n\nRemember that updating can be done with the `nextflow -self-update` command.\n\n### Changelog\n\nThe complete list of changes and bug fixes is available on GitHub at [this link](https://github.com/nextflow-io/nextflow/releases/tag/v19.04.0).\n\n### Contributions\n\nSpecial thanks to all people contributed to this release by reporting issues, improving the docs or submitting (patiently) a pull request (sorry if we have missed somebody):\n\n- [Alex Cerjanic](https://github.com/acerjanic)\n- [Anthony Underwood](https://github.com/aunderwo)\n- [Akira Sekiguchi](https://github.com/pachiras)\n- [Bill Flynn](https://github.com/wflynny)\n- [Jorrit Boekel](https://github.com/glormph)\n- [Olga Botvinnik](https://github.com/olgabot)\n- [Ólafur Haukur Flygenring](https://github.com/olifly)\n- [Sven Fillinger](https://github.com/sven1103)\n", - "images": [] + "images": [], + "author": "Paolo Di Tommaso", + "tags": "nextflow,release,stable" }, { "slug": "2019/troubleshooting-nextflow-resume", "title": "Troubleshooting Nextflow resume", "date": "2019-07-01T00:00:00.000Z", "content": "\n_This two-part blog aims to help users understand Nextflow’s powerful caching mechanism. Part one describes how it works whilst part two will focus on execution provenance and troubleshooting. You can read part one [here](/blog/2019/demystifying-nextflow-resume.html)_.\n\n### Troubleshooting resume\n\nIf your workflow execution is not resumed as expected, there exists several strategies to debug the problem.\n\n#### Modified input file(s)\n\nMake sure that there has been no change in your input files. Don’t forget the unique task hash is computed by taking into account the complete file path, the last modified timestamp and the file size. If any of these change, the workflow will be re-executed, even if the input content is the same.\n\n#### A process modifying one or more inputs\n\nA process should never alter input files. When this happens, the future execution of tasks will be invalidated for the same reason explained in the previous point.\n\n#### Inconsistent input file attributes\n\nSome shared file system, such as NFS, may report inconsistent file timestamp i.e. a different timestamp for the same file even if it has not been modified. There is an option to use the [lenient mode of caching](https://www.nextflow.io/docs/latest/process.html#cache) to avoid this problem.\n\n#### Race condition in a global variable\n\nNextflow does its best to simplify parallel programming and to prevent race conditions and the access of shared resources. 
One of the few cases in which a race condition may arise is when using a global variable with two (or more) operators. For example:\n\n```\nChannel\n .from(1,2,3)\n .map { it -> X=it; X+=2 }\n .println { \"ch1 = $it\" }\n\nChannel\n .from(1,2,3)\n .map { it -> X=it; X*=2 }\n .println { \"ch2 = $it\" }\n```\n\nThe problem with this snippet is that the `X` variable in the closure definition is defined in the global scope. Since operators are executed in parallel, the `X` value can, therefore, be overwritten by the other `map` invocation.\n\nThe correct implementation requires the use of the `def` keyword to declare the variable local.\n\n```\nChannel\n .from(1,2,3)\n .map { it -> def X=it; X+=2 }\n .view { \"ch1 = $it\" }\n\nChannel\n .from(1,2,3)\n .map { it -> def X=it; X*=2 }\n .view { \"ch2 = $it\" }\n```\n\n#### Non-deterministic input channels\n\nWhile dataflow channel ordering is guaranteed i.e. data is read in the same order in which it’s written in the channel, when a process declares as input two or more channels, each of which is the output of a different process, the overall input ordering is not consistent across different executions.\n\nConsider the following snippet:\n\n```\nprocess foo {\n input: set val(pair), file(reads) from reads_ch\n output: set val(pair), file('*.bam') into bam_ch\n \"\"\"\n your_command --here\n \"\"\"\n}\n\nprocess bar {\n input: set val(pair), file(reads) from reads_ch\n output: set val(pair), file('*.bai') into bai_ch\n \"\"\"\n other_command --here\n \"\"\"\n}\n\nprocess gather {\n input:\n set val(pair), file(bam) from bam_ch\n set val(pair), file(bai) from bai_ch\n \"\"\"\n merge_command $bam $bai\n \"\"\"\n}\n```\n\nThe inputs declared in the gather process can be delivered in any order as the execution order of the process `foo` and `bar` is not deterministic due to parallel executions.\n\nTherefore, the input of the third process needs to be synchronized using the `join` operator or a similar approach. The third process should be written as:\n\n```\nprocess gather {\n input:\n set val(pair), file(bam), file(bai) from bam_ch.join(bai_ch)\n \"\"\"\n merge_command $bam $bai\n \"\"\"\n}\n```\n\n#### Still in trouble?\n\nThese are most frequent causes of problems with the Nextflow resume mechanism. If you are still not able to resolve\nyour problem, identify the first process not resuming correctly, then run your script twice using `-dump-hashes`. You can then compare the resulting `.nextflow.log` files (the first will be named `.nextflow.log.1`).\n\nUnfortunately, the information reported by `-dump-hashes` can be quite cryptic, however, with the help of a good _diff_ tool it is possible to compare the two log files to identify the reason for the cache to be invalidated.\n\n#### The golden rule\n\nNever try to debug this kind of problem with production data! This issue can be annoying, but when it happens\nit should be able to be replicated in a consistent manner with any data.\n\nTherefore, we always suggest Nextflow developers include in their pipeline project\na small synthetic dataset to easily execute and test the complete pipeline execution in a few seconds.\nThis is the golden rule for debugging and troubleshooting execution problems avoids getting stuck with production data.\n\n#### Resume by default?\n\nGiven the majority of users always apply resume, we recently discussed having resume applied by the default.\n\nIs there any situation where you do not use resume? 
Would a flag specifying `-no-cache` be enough to satisfy these use cases?\n\nWe want to hear your thoughts on this. Help steer Nextflow development and vote in the twitter poll below.\n\n
Should -resume⏯️ be the default when launching a Nextflow pipeline?
— Nextflow (@nextflowio) July 1, 2019
\n\n\n
\n*In the following post of this series, we will show how to produce a provenance report using a built-in Nextflow command.*\n", - "images": [] + "images": [], + "author": "Evan Floden", + "tags": "nextflow,resume" }, { "slug": "2020/cli-docs-release", "title": "The Nextflow CLI - tricks and treats!", "date": "2020-10-22T00:00:00.000Z", "content": "\nFor most developers, the command line is synonymous with agility. While tools such as [Nextflow Tower](https://tower.nf) are opening up the ecosystem to a whole new set of users, the Nextflow CLI remains a bedrock for pipeline development. The CLI in Nextflow has been the core interface since the beginning; however, its full functionality was never extensively documented. Today we are excited to release the first iteration of the CLI documentation available on the [Nextflow website](https://www.nextflow.io/docs/edge/cli.html).\n\nAnd given Halloween is just around the corner, in this blog post we'll take a look at 5 CLI tricks and examples which will make your life easier in designing, executing and debugging data pipelines. We are also giving away 5 limited-edition Nextflow hoodies and sticker packs so you can code in style this Halloween season!\n\n### 1. Invoke a remote pipeline execution with the latest revision\n\nNextflow facilitates easy collaboration and re-use of existing pipelines in multiple ways. One of the simplest ways to do this is to use the URL of the Git repository.\n\n```\n$ nextflow run https://www.github.com/nextflow-io/hello\n```\n\nWhen executing a pipeline using the run command, it first checks to see if it has been previously downloaded in the ~/.nextflow/assets directory, and if so, Nextflow uses this to execute the pipeline. If the pipeline is not already cached, Nextflow will download it, store it in the `$HOME/.nextflow/` directory and then launch the execution.\n\nHow can we make sure that we always run the latest code from the remote pipeline? We simply need to add the `-latest` option to the run command, and Nextflow takes care of the rest.\n\n```\n$ nextflow run nextflow-io/hello -latest\n```\n\n### 2. Query work directories for a specific execution\n\nFor every invocation of Nextflow, all the metadata about an execution is stored including task directories, completion status and time etc. We can use the `nextflow log` command to generate a summary of this information for a specific run.\n\nTo see a list of work directories associated with a particular execution (for example, `tiny_leavitt`), use:\n\n```\n$ nextflow log tiny_leavitt\n```\n\nTo filter out specific process-level information from the logs of any execution, we simply need to use the fields (-f) option and specify the fields.\n\n```\n$ nextflow log tiny_leavitt –f 'process, hash, status, duration'\n```\n\nThe hash is the name of the work directory where the process was executed; therefore, the location of a process work directory would be something like `work/74/68ff183`.\n\nThe log command also has other child options including `-before` and `-after` to help with the chronological inspection of logs.\n\n### 3. Top-level configuration\n\nNextflow emphasizes customization of pipelines and exposes multiple options to facilitate this. The configuration is applied to multiple Nextflow commands and is therefore a top-level option. 
In practice, this means specifying configuration options _before_ the command.\n\nNextflow CLI provides two kinds of config overrides - the soft override and the hard override.\n\nThe top-level soft override \"-c\" option allows us to change the previous config in an additive manner, overriding only the fields included the configuration file.\n\n```\n$ nextflow -c my.config run nextflow-io/hello\n```\n\nOn the other hand, the hard override `-C` completely replaces and ignores any additional configurations.\n\n $ nextflow –C my.config nextflow-io/hello\n\nMoreover, we can also use the config command to inspect the final inferred configuration and view any profiles.\n\n```\n$ nextflow config -show-profiles\n```\n\n### 4. Passing in an input parameter file\n\nNextflow is designed to work across both research and production settings. In production especially, specifying multiple parameters for the pipeline on the command line becomes cumbersome. In these cases, environment variables or config files are commonly used which contain all input files, options and metadata. Love them or hate them, YAML and JSON are the standard formats for human and machines, respectively.\n\nThe Nextflow run option `-params-file` can be used to pass in a file containing parameters in either format.\n\n```\n$ nextflow run nextflow-io/rnaseq -params-file run_42.yaml\n```\n\nThe YAML file could contain the following.\n\n```\nreads : \"s3://gatk-data/run_42/reads/*_R{1,2}_*.fastq.gz\"\nbwa_index : \"$baseDir/index/*.bwa-index.tar.gz\"\npaired_end : true\npenalty : 12\n```\n\n### 5. Specific workflow entry points\n\nThe recently released [DSL2](https://www.nextflow.io/blog/2020/dsl2-is-here.html) adds powerful modularity to Nextflow and enables scripts to contain multiple workflows. By default, the unnamed workflow is assumed to be the main entry point for the script, however, with numerous named workflows, the entry point can be customized by using the `entry` child-option of the run command.\n\n $ nextflow run main.nf -entry workflow1\n\nThis allows users to run a specific sub-workflow or a section of their entire workflow script. For more information, refer to the [implicit workflow](https://www.nextflow.io/docs/latest/dsl2.html#implicit-workflow) section of the documentation.\n\nAdditionally, as of version 20.09.1-edge, you can specify the script in a project to run other than `main.nf` using the command line option\n`-main-script`.\n\n $ nextflow run http://github.com/my/pipeline -main-script my-analysis.nf\n\n### Bonus trick! Web dashboard launched from the CLI\n\nThe tricks above highlight the functionality of the Nextflow CLI. However, for long-running workflows, monitoring becomes all the more crucial. With Nextflow Tower, we can invoke any Nextflow pipeline execution from the CLI and use the integrated dashboard to follow the workflow execution wherever we are. Sign-in to [Tower](https://tower.nf) using your GitHub credentials, obtain your token from the Getting Started page and export them into your terminal, `~/.bashrc` or include them in your `nextflow.config`.\n\n```\n$ export TOWER_ACCESS_TOKEN=my-secret-tower-key\n$ export NXF_VER=20.07.1\n```\n\nNext simply add the \"-with-tower\" child-option to any Nextflow run command. 
A URL with the monitoring dashboard will appear.\n\n```\n$ nextflow run nextflow-io/hello -with-tower\n```\n\n### Nextflow Giveaway\n\nIf you want to look stylish while you put the above tips into practice, or simply like free stuff, we are giving away five of our latest Nextflow hoodie and sticker packs. Retweet or like the Nextflow tweet about this article and we will draw and notify the winners on October 31st!\n\n### About the Author\n\n[Abhinav Sharma](https://www.linkedin.com/in/abhi18av/) is a Bioinformatics Engineer at [Seqera Labs](https://www.seqera.io) interested in Data Science and Cloud Engineering. He enjoys working on all things Genomics, Bioinformatics and Nextflow.\n\n### Acknowledgements\n\nShout out to [Kevin Sayers](https://github.com/KevinSayers) and [Alexander Peltzer](https://github.com/apeltzer) for their earlier efforts in documenting the CLI and which inspired this work.\n\n_The latest CLI docs can be found in the edge release docs at [https://www.nextflow.io/docs/latest/cli.html](https://www.nextflow.io/docs/latest/cli.html)._\n", - "images": [] + "images": [], + "author": "Abhinav Sharma", + "tags": "nextflow,docs" }, { "slug": "2020/dsl2-is-here", "title": "Nextflow DSL 2 is here!", "date": "2020-07-24T00:00:00.000Z", "content": "\nWe are thrilled to announce the stable release of Nextflow DSL 2 as part of the latest 20.07.1 version!\n\nNextflow DSL 2 represents a major evolution of the Nextflow language and makes it possible to scale and modularise your data analysis pipeline while continuing to use the Dataflow programming paradigm that characterises the Nextflow processing model.\n\nWe spent more than one year collecting user feedback and making sure that DSL 2 would naturally fit the programming experience Nextflow developers are used to.\n\n#### DLS 2 in a nutshell\n\nBackward compatibility is a paramount value, for this reason the changes introduced in the syntax have been minimal and above all, guarantee the support of all existing applications. DSL 2 will be an opt-in feature for at least the next 12 to 18 months. After this transitory period, we plan to make it the default Nextflow execution mode.\n\nAs of today, to use DSL 2 in your Nextflow pipeline, you are required to use the following declaration at the top of your script:\n\n```\nnextflow.enable.dsl=2\n```\n\nNote that the previous `nextflow.preview` directive is still available, however, when using the above declaration the use of the final syntax is enforced.\n\n#### Nextflow modules\n\nA module file is nothing more than a Nextflow script containing one or more `process` definitions that can be imported from another Nextflow script.\n\nThe only difference when compared with legacy syntax is that the process is not bound with specific input and output channels, as was previously required using the `from` and `into` keywords respectively. Consider this example of the new syntax:\n\n```\nprocess INDEX {\n input:\n path transcriptome\n output:\n path 'index'\n script:\n \"\"\"\n salmon index --threads $task.cpus -t $transcriptome -i index\n \"\"\"\n}\n```\n\nThis allows the definition of workflow processes that can be included from any other script and invoked as a custom function within the new `workflow` scope. This effectively allows for the composition of the pipeline logic and enables reuse of workflow components. 
We anticipate this to improve both the speed that users can develop new pipelines, and the robustness of these pipelines through the use of validated modules.\n\nAny process input can be provided as a function argument using the usual channel semantics familiar to Nextflow developers. Moreover process outputs can either be assigned to a variable or accessed using the implicit `.out` attribute in the scope implicitly defined by the process name itself. See the example below:\n\n```\ninclude { INDEX; FASTQC; QUANT; MULTIQC } from './some/module/script.nf'\n\nread_pairs_ch = channel.fromFilePairs( params.reads)\n\nworkflow {\n INDEX( params.transcriptome )\n FASTQC( read_pairs_ch )\n QUANT( INDEX.out, read_pairs_ch )\n MULTIQC( QUANT.out.mix(FASTQC.out).collect(), multiqc_file )\n}\n```\n\nAlso enhanced is the ability to use channels as inputs multiple times without the need to duplicate them (previously done with the special into operator) which makes the resulting pipeline code more concise, fluent and therefore readable!\n\n#### Sub-workflows\n\nNotably, the DSL 2 syntax allows for the definition of reusable processes as well as sub-workflow libraries. The only requirement is to provide a `workflow` name that will be used to reference and declare the corresponding inputs and outputs using the new `take` and `emit` keywords. For example:\n\n```\nworkflow RNASEQ {\n take:\n transcriptome\n read_pairs_ch\n\n main:\n INDEX(transcriptome)\n FASTQC(read_pairs_ch)\n QUANT(INDEX.out, read_pairs_ch)\n\n emit:\n QUANT.out.mix(FASTQC.out).collect()\n}\n```\n\nNow named sub-workflows can be used in the same way as processes, allowing you to easily include and reuse multi-step workflows as part of larger workflows. Find more details [here](/docs/latest/dsl2.html).\n\n#### More syntax sugar\n\nAnother exciting feature of Nextflow DSL 2 is the ability to compose built-in operators, pipeline processes and sub-workflows with the pipe (|) operator! For example the last line in the above example could be written as:\n\n```\nemit:\n QUANT.out | mix(FASTQC.out) | collect\n```\n\nThis syntax finally realizes the Nextflow vision of empowering developers to write complex data analysis applications with a simple but powerful language that mimics the expressiveness of the Unix pipe model but at the same time makes it possible to handle complex data structures and patterns as is required for highly parallelised and distributed computational workflows.\n\nAnother change is the introduction of `channel` as an alternative name as a synonym of `Channel` type identifier and therefore allows the use of `channel.fromPath` instead of `Channel.fromPath` and so on. This is a small syntax sugar to keep the capitazionation consistent with the rest of the language.\n\nMoreover, several process inputs and outputs syntax shortcuts were removed when using the final version of DSL 2 to make it more predictable. For example, with DSL1, in a tuple input or output declaration the component type could be omitted, for example:\n\n```\ninput:\n tuple foo, 'bar'\n```\n\nThe `foo` identifier was implicitly considered an input value declaration instead the string `'bar'` was considered a shortcut for `file('bar')`. 
However, this was a bit confusing especially for new users and therefore using DSL 2, the fully qualified version must be used:\n\n```\ninput:\n tuple val(foo), path('bar')\n```\n\nYou can find more detailed migration notes at [this link](/docs/latest/dsl2.html#dsl2-migration-notes).\n\n#### What's next\n\nAs always, reaching an important project milestone can be viewed as a major success, but at the same time the starting point for challenges and developments. Having a modularization mechanism opens new needs and possibilities. The first one of which will be focused on the ability to test and validate process modules independently using a unit-testing style approach. This will definitely help to make the resulting pipelines more resilient.\n\nAnother important area for the development of the Nextflow language will be the ability to better formalise pipeline inputs and outputs and further decouple for the process declaration. Nextflow currently strongly relies on the `publishDir` constructor for the generation of the workflow outputs.\n\nHowever in the new _module_ world, this approach results in `publishDir` being tied to a single process definition. The plan is instead to extend this concept in a more general and abstract manner, so that it will be possible to capture and redirect the result of any process and sub-workflow based on semantic annotations instead of hardcoding it at the task level.\n\n### Conclusion\n\nWe are extremely excited about today's release. This was a long awaited advancement and therefore we are very happy to make it available for general availability to all Nextflow users. We greatly appreciate all of the community feedback and ideas over the past year which have shaped DSL 2.\n\nWe are confident this represents a big step forward for the project and will enable the writing of a more scalable and complex data analysis pipeline and above all, a more enjoyable experience.\n", - "images": [] + "images": [], + "author": "Paolo Di Tommaso", + "tags": "nextflow,release,modules,dsl2" }, { "slug": "2020/groovy3-syntax-sugar", "title": "More syntax sugar for Nextflow developers!", "date": "2020-11-03T00:00:00.000Z", "content": "\nThe latest Nextflow version 2020.10.0 is the first stable release running on Groovy 3.\n\nThe first benefit of this change is that now Nextflow can be compiled and run on any modern Java virtual machine,\nfrom Java 8, all the way up to the latest Java 15!\n\nAlong with this, the new Groovy runtime brings a whole lot of syntax enhancements that can be useful in\nthe everyday life of pipeline developers. Let's see them more in detail.\n\n### Improved not operator\n\nThe `!` (not) operator can now prefix the `in` and `instanceof` keywords.\nThis makes for more concise writing of some conditional expression, for example, the following snippet:\n\n```\nlist = [10,20,30]\n\nif( !(x in list) ) {\n // ..\n}\nelse if( !(x instanceof String) ) {\n // ..\n}\n```\n\ncould be replaced by the following:\n\n```\nlist = [10,20,30]\n\nif( x !in list ) {\n // ..\n}\nelse if( x !instanceof String ) {\n // ..\n}\n```\n\nAgain, this is a small syntax change which makes the code a little more\nreadable.\n\n### Elvis assignment operator\n\nThe elvis assignment operator `?=` allows the assignment of a value only if it was not\npreviously assigned (or if it evaluates to `null`). 
Consider the following example:\n\n```\ndef opts = [foo: 1]\n\nopts.foo ?= 10\nopts.bar ?= 20\n\nassert opts.foo == 1\nassert opts.bar == 20\n```\n\nIn this snippet, the assignment `opts.foo ?= 10` would be ignored because the dictionary `opts` already\ncontains a value for the `foo` attribute, while it is now assigned as expected.\n\nIn other words this is a shortcut for the following idiom:\n\n```\nif( some_variable != null ) {\n some_variable = 'Hello'\n}\n```\n\nIf you are wondering why it's called _Elvis_ assignment, well it's simple, because there's also the [Elvis operator](https://groovy-lang.org/operators.html#_elvis_operator) that you should know (and use!) already. 😆\n\n### Java style lambda expressions\n\nGroovy 3 supports the syntax for Java lambda expression. If you don't know what a Java lambda expression is\ndon't worry; it's a concept very similar to a Groovy closure, though with slight differences\nboth in the syntax and the semantic. In a few words, a Groovy closure can modify a variable in the outside scope,\nwhile a Java lambda cannot.\n\nIn terms of syntax, a Groovy closure is defined as:\n\n```\n{ it -> SOME_EXPRESSION_HERE }\n```\n\nWhile Java lambda expression looks like:\n\n```\nit -> { SOME_EXPRESSION_HERE }\n```\n\nwhich can be simplified to the following form when the expression is a single statement:\n\n```\nit -> SOME_EXPRESSION_HERE\n```\n\nThe good news is that the two syntaxes are interoperable in many cases and we can use the _lambda_\nsyntax to get rid-off of the curly bracket parentheses used by the Groovy notation to make our Nextflow\nscript more readable.\n\nFor example, the following Nextflow idiom:\n\n```\nChannel\n .of( 1,2,3 )\n .map { it * it +1 }\n .view { \"the value is $it\" }\n```\n\nCan be rewritten using the lambda syntax as:\n\n```\nChannel\n .of( 1,2,3 )\n .map( it -> it * it +1 )\n .view( it -> \"the value is $it\" )\n```\n\nIt is a bit more consistent. Note however that the `it ->` implicit argument is now mandatory (while when using the closure syntax it could be omitted). Also, when the operator argument is not _single_ value, the lambda requires the\nround parentheses to define the argument e.g.\n\n```\nChannel\n .of( 1,2,3 )\n .map( it -> tuple(it * it, it+1) )\n .view( (a,b) -> \"the values are $a and $b\" )\n```\n\n### Full support for Java streams API\n\nSince version 8, Java provides a [stream library](https://winterbe.com/posts/2014/07/31/java8-stream-tutorial-examples/) that is very powerful and implements some concepts and operators similar to Nextflow channels.\n\nThe main differences between the two are that Nextflow channels and the corresponding operators are _non-blocking_\ni.e. 
their evaluation is performed asynchronously without blocking your program execution, while Java streams are\nexecuted in a synchronous manner (at least by default).\n\nA Java stream looks like the following:\n\n```\nassert (1..10).stream()\n .filter(e -> e % 2 == 0)\n .map(e -> e * 2)\n .toList() == [4, 8, 12, 16, 20]\n\n```\n\nNote, in the above example\n[filter](https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#filter-java.util.function.Predicate-),\n[map](https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#map-java.util.function.Function-) and\n[toList](https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collectors.html#toList--)\nmethods are Java stream operator not the\n[Nextflow](https://www.nextflow.io/docs/latest/operator.html#filter)\n[homonymous](https://www.nextflow.io/docs/latest/operator.html#map)\n[ones](https://www.nextflow.io/docs/latest/operator.html#tolist).\n\n### Java style method reference\n\nThe new runtime also allows for the use of the `::` operator to reference an object method.\nThis can be useful to pass a method as an argument to a Nextflow operator in a similar\nmanner to how it was already possible using a closure. For example:\n\n```\nChannel\n .of( 'a', 'b', 'c')\n .view( String::toUpperCase )\n```\n\nThe above prints:\n\n```\n A\n B\n C\n```\n\nBecause to [view](https://www.nextflow.io/docs/latest/operator.html#filter) operator applied\nthe method [toUpperCase](https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#toUpperCase--)\nto each element emitted by the channel.\n\n### Conclusion\n\nThe new Groovy runtime brings a lot of syntax sugar for Nextflow pipelines and allows the use of modern Java\nruntime which delivers better performance and resource usage.\n\nThe ones listed above are only a small selection which may be useful to everyday Nextflow developers.\nIf you are curious to learn more about all the changes in the new Groovy parser you can find more details in\n[this link](https://groovy-lang.org/releasenotes/groovy-3.0.html).\n\nFinally, a big thanks to the Groovy community for their significant efforts in developing and maintaining this\ngreat programming environment.\n", - "images": [] + "images": [], + "author": "Paolo Di Tommaso", + "tags": "nextflow,dsl2" }, { "slug": "2020/learning-nextflow-in-2020", "title": "Learning Nextflow in 2020", "date": "2020-12-01T00:00:00.000Z", "content": "\nWith the year nearly over, we thought it was about time to pull together the best-of-the-best guide for learning Nextflow in 2020. These resources will support anyone in the journey from total noob to Nextflow expert so this holiday season, give yourself or someone you know the gift of learning Nextflow!\n\n### Prerequisites to get started\n\nWe recommend that learners are comfortable with using the command line and the basic concepts of a scripting language such as Python or Perl before they start writing pipelines. Nextflow is widely used for bioinformatics applications, and the examples in these guides often focus on applications in these topics. However, Nextflow is now adopted in a number of data-intensive domains such as radio astronomy, satellite imaging and machine learning. No domain expertise is expected.\n\n### Time commitment\n\nWe estimate that the speediest of learners can complete the material in around 12 hours. It all depends on your background and how deep you want to dive into the rabbit-hole! 
Most of the content is introductory with some more advanced dataflow and configuration material in the workshops and patterns sections.\n\n### Overview of the material\n\n- Why learn Nextflow?\n- Introduction to Nextflow - AWS HPC Conference 2020 (8m)\n- A simple RNA-Seq hands-on tutorial (2h)\n- Full-immersion workshop (8h)\n- Nextflow advanced implementation Patterns (2h)\n- Other resources\n- Community and Support\n\n### 1. Why learn Nextflow?\n\nNextflow is an open-source workflow framework for writing and scaling data-intensive computational pipelines. It is designed around the Linux philosophy of simple yet powerful command-line and scripting tools that, when chained together, facilitate complex data manipulations. Combined with support for containerization, support for major cloud providers and on-premise architectures, Nextflow simplifies the writing and deployment of complex data pipelines on any infrastructure.\n\nThe following are some high-level motivations on why people choose to adopt Nextflow:\n\n1. Integrating Nextflow in your analysis workflows helps you implement **reproducible** pipelines. Nextflow pipelines follow FDA repeatability and reproducibility guidelines with version-control and containers to manage all software dependencies.\n2. Avoid vendor lock-in by ensuring portability. Nextflow is **portable**; the same pipeline written on a laptop can quickly scale to run HPC cluster, Amazon and Google cloud services, and Kubernetes. The code stays constant across varying infrastructures allowing collaboration and avoiding lock-in.\n3. It is **scalable** allowing the parallelization of tasks using the dataflow paradigm without having to hard-code the pipeline to a specific platform architecture.\n4. It is **flexible** and supports scientific workflow requirements like caching processes to prevent re-computation, and workflow reports to better understand the workflows’ executions.\n5. It is **growing fast** and has **long-term support**. Developed since 2013 by the same team, the Nextflow ecosystem is expanding rapidly.\n6. It is **open source** and licensed under Apache 2.0. You are free to use it, modify it and distribute it.\n\n### 2. Introduction to Nextflow from the HPC on AWS Conference 2020\n\nThis short YouTube video provides a general overview of Nextflow, the motivations behind its development and a demonstration of some of the latest features.\n\n\n\n### 3. A simple RNA-Seq hands-on tutorial\n\nThis hands-on tutorial from Seqera Labs will guide you through implementing a proof-of-concept RNA-seq pipeline. The goal is to become familiar with basic concepts, including how to define parameters, use channels for data and write processes to perform tasks. It includes all scripts, data and resources and is perfect for getting a flavor for Nextflow.\n\n[Tutorial link on GitHub](https://github.com/seqeralabs/nextflow-tutorial)\n\n### 4. Full-immersion workshop\n\nHere you’ll dive deeper into Nextflow’s most prominent features and learn how to apply them. The full workshop includes an excellent section on containers, how to build them and how to use them with Nextflow. The written materials come with examples and hands-on exercises. 
Optionally, you can also follow with a series of videos from a live training workshop.\n\nThe workshop includes topics on:\n\n- Environment Setup\n- Basic NF Script and Concepts\n- Nextflow Processes\n- Nextflow Channels\n- Nextflow Operators\n- Basic RNA-Seq pipeline\n- Containers & Conda\n- Nextflow Configuration\n- On-premise & Cloud Deployment\n- DSL 2 & Modules\n- [GATK hands-on exercise](https://seqera.io/training/handson/)\n\n[Workshop](https://seqera.io/training) & [YouTube playlist](https://www.youtube.com/playlist?list=PLPZ8WHdZGxmUv4W8ZRlmstkZwhb_fencI).\n\n### 5. Nextflow implementation Patterns\n\nThis advanced section discusses recurring patterns and solutions to many common implementation requirements. Code examples are available with notes to follow along with as well as a GitHub repository.\n\n[Nextflow Patterns](http://nextflow-io.github.io/patterns/index.html) & [GitHub repository](https://github.com/nextflow-io/patterns).\n\n### Other resources\n\nThe following resources will help you dig deeper into Nextflow and other related projects like the nf-core community who maintain curated pipelines and a very active Slack channel. There are plenty of Nextflow tutorials and videos online, and the following list is no way exhaustive. Please let us know if we are missing something.\n\n#### Nextflow docs\n\nThe reference for the Nextflow language and runtime. The docs should be your first point of reference when something is not clear. Newest features are documented in edge documentation pages released every month with the latest stable releases every three months.\n\nLatest [stable](https://www.nextflow.io/docs/latest/index.html) & [edge](https://www.nextflow.io/docs/edge/index.html) documentation.\n\n#### nf-core\n\nnf-core is a growing community of Nextflow users and developers. You can find curated sets of biomedical analysis pipelines built by domain experts with Nextflow, that have passed tests and have been implemented according to best practice guidelines. Be sure to sign up to the Slack channel.\n\n[nf-core website](https://nf-co.re)\n\n#### Tower Docs\n\nNextflow Tower is a platform to easily monitor, launch and scale Nextflow pipelines on cloud providers and on-premise infrastructure. 
The documentation provides details on setting up compute environments, monitoring pipelines and launching using either the web graphic interface or API.\n\n[Nextflow Tower documentation](http://help.tower.nf)\n\n#### Nextflow Biotech Blueprint by AWS\n\nA quickstart for deploying a genomics analysis environment on Amazon Web Services (AWS) cloud, using Nextflow to create and orchestrate analysis workflows and AWS Batch to run the workflow processes.\n\n[Biotech Blueprint by AWS](https://aws.amazon.com/quickstart/biotech-blueprint/nextflow/)\n\n#### Running Nextflow by Google Cloud\n\nGoogle Cloud Nextflow step-by-step guide to launching Nextflow Pipelines in Google Cloud.\n\n[Nextflow on Google Cloud ](https://cloud.google.com/life-sciences/docs/tutorials/nextflow)\n\n#### Awesome Nextflow\n\nA collections of Nextflow based pipelines and other resources.\n\n[Awesome Nextflow](https://github.com/nextflow-io/awesome-nextflow)\n\n### Community and support\n\n- Nextflow [Gitter channel](https://gitter.im/nextflow-io/nextflow)\n- Nextflow [Forums](https://groups.google.com/forum/#!forum/nextflow)\n- [nf-core Slack](https://nfcore.slack.com/)\n- Twitter [@nextflowio](https://twitter.com/nextflowio?lang=en)\n- [Seqera Labs](https://www.seqera.io) technical support & consulting\n\nNextflow is a community-driven project. The list of links below has been collated from a diverse collection of resources and experts to guide you in learning Nextflow. If you have any suggestions, please make a pull request to this page on GitHub.\n\nAlso stay tuned for our upcoming post, where we will discuss the ultimate Nextflow development environment.\n", - "images": [] + "images": [], + "author": "Evan Floden & Alain Coletta", + "tags": "nextflow,learning,workshop" }, { "slug": "2021/5-more-tips-for-nextflow-user-on-hpc", "title": "Five more tips for Nextflow user on HPC", "date": "2021-06-15T00:00:00.000Z", "content": "\nIn May we blogged about [Five Nextflow Tips for HPC Users](/blog/2021/5_tips_for_hpc_users.html) and now we continue the series with five additional tips for deploying Nextflow with on HPC batch schedulers.\n\n### 1. Use the scratch directive\n\nTo allow the pipeline tasks to share data with each other, Nextflow requires a shared file system path as a working directory. When using this model, a common recommendation is to use the node's local scratch storage as the job working directory to avoid unnecessary use of the network shared file system and achieve better performance.\n\nNextflow implements this best-practice which can be enabled by adding the following setting in your `nextflow.config` file.\n\n```\nprocess.scratch = true\n```\n\nWhen using this option, Nextflow:\n\n- Creates a unique directory in the computing node's local `/tmp` or the path assigned by your cluster via the `TMPDIR` environment variable.\n- Creates a [symlink](https://en.wikipedia.org/wiki/Symbolic_link) for each input file required by the job execution.\n- Runs the job in the local scratch path.\n Copies the job output files into the job shared work directory assigned by Nextflow.\n\n### 2. Use -bg option to launch the execution in the background\n\nIn some circumstances, you may need to run your Nextflow pipeline in the background without losing the execution output. 
In this scenario use the `-bg` command line option as shown below.\n\n```\nnextflow run -bg > my-file.log\n```\n\nThis can be very useful when launching the execution from an SSH connected terminal and ensures that any connection issues don't stop the pipeline. You can use `ps` and `kill` to find and stop the execution.\n\n### 3. Disable interactive logging\n\nNextflow has rich terminal logging which uses ANSI escape codes to update the pipeline execution counters interactively. However, this is not very useful when submitting the pipeline execution as a cluster job or in the background. In this case, disable the rich ANSI logging using the command line option `-ansi-log false` or the environment variable `NXF_ANSI_LOG=false`.\n\n### 4. Cluster native options\n\nNextlow has portable directives for common resource requests such as [cpus](https://www.nextflow.io/docs/latest/process.html#cpus), [memory](https://www.nextflow.io/docs/latest/process.html#memory) and [disk](https://www.nextflow.io/docs/latest/process.html#disk) allocation.\n\nThese directives allow you to specify the request for a certain number of computing resources e.g CPUs, memory, or disk and Nextflow converts these values to the native setting of the target execution platform specified in the pipeline configuration.\n\nHowever, there can be settings that are only available on some specific cluster technology or vendors.\n\nThe [clusterOptions](https://www.nextflow.io/docs/latest/process.html#clusterOptions) directive allows you to specify any option of your resource manager for which there isn't direct support in Nextflow.\n\n### 5. Retry failing jobs increasing resource allocation\n\nA common scenario is that instances of the same process may require different computing resources. For example, requesting an amount of memory that is too low for some processes will result in those tasks failing. You could specify a higher limit which would accommodate the task with the highest memory utilization, but you then run the risk of decreasing your job’s execution priority.\n\nNextflow provides a mechanism that allows you to modify the amount of computing resources requested in the case of a process failure and attempt to re-execute it using a higher limit. For example:\n\n```\nprocess foo {\n\n memory { 2.GB * task.attempt }\n time { 1.hour * task.attempt }\n\n errorStrategy { task.exitStatus in 137..140 ? 'retry' : 'terminate' }\n maxRetries 3\n\n script:\n \"\"\"\n your_job_command --here\n \"\"\"\n}\n```\n\nIn the above example the memory and execution time limits are defined dynamically. The first time the process is executed the task.attempt is set to 1, thus it will request 2 GB of memory and one hour of maximum execution time.\n\nIf the task execution fails, reporting an exit status in the range between 137 and 140, the task is re-submitted (otherwise it terminates immediately). This time the value of task.attempt is 2, thus increasing the amount of the memory to four GB and the time to 2 hours, and so on.\n\nNOTE: These exit statuses are not standard and can change depending on the resource manager you are using. Consult your cluster administrator or scheduler administration guide for details on the exit statuses used by your cluster in similar error conditions.\n\n### Conclusion\n\nNextflow aims to give you control over every aspect of your workflow. 
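For example, the `clusterOptions` directive from tip 4 passes native scheduler flags through to your resource manager unchanged; a minimal sketch (the Slurm options shown are purely illustrative) might look like:\n\n```\nprocess foo {\n\n clusterOptions '--qos=short --constraint=avx2'\n\n script:\n \"\"\"\n your_job_command --here\n \"\"\"\n}\n```\n\n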
These Nextflow options allow you to shape how Nextflow submits your processes to your executor, that can make your workflow more robust by avoiding the overloading of the executor. Some systems have hard limits which if you do not take into account, no processes will be executed. Being aware of these configuration values and how to use them is incredibly helpful when working with larger workflows.\n", - "images": [] + "images": [], + "author": "Kevin Sayers", + "tags": "nextflow,hpc" }, { "slug": "2021/5_tips_for_hpc_users", "title": "5 Nextflow Tips for HPC Users", "date": "2021-05-13T00:00:00.000Z", "content": "\nNextflow is a powerful tool for developing scientific workflows for use on HPC systems. It provides a simple solution to deploy parallelized workloads at scale using an elegant reactive/functional programming model in a portable manner.\n\nIt supports the most popular workload managers such as Grid Engine, Slurm, LSF and PBS, among other out-of-the-box executors, and comes with sensible defaults for each. However, each HPC system is a complex machine with its own characteristics and constraints. For this reason you should always consult your system administrator before running a new piece of software or a compute intensive pipeline that spawns a large number of jobs.\n\nIn this series of posts, we will be sharing the top tips we have learned along the way that should help you get results faster while keeping in the good books of your sys admins.\n\n### 1. Don't forget the executor\n\nNextflow, by default, spawns parallel task executions in the computer on which it is running. This is generally useful for development purposes, however, when using an HPC system you should specify the executor matching your system. This instructs Nextflow to submit pipeline tasks as jobs into your HPC workload manager. This can be done adding the following setting to the `nextflow.config` file in the launching directory, for example:\n\n```\nprocess.executor = 'slurm'\n```\n\nWith the above setting Nextflow will submit the job executions to your Slurm cluster spawning a `sbatch` command for each job in your pipeline. Find the executor matching your system at [this link](https://www.nextflow.io/docs/latest/executor.html).\nEven better, to prevent the undesired use of the local executor in a specific environment, define the _default_ executor to be used by Nextflow using the following system variable:\n\n```\nexport NXF_EXECUTOR=slurm\n```\n\n### 2. Nextflow as a job\n\nQuite surely your sys admin has already warned you that the login/head node should only be used to submit job executions and not run compute intensive tasks.\nWhen running a Nextflow pipeline, the driver application submits and monitors the job executions on your cluster (provided you have correctly specified the executor as stated in point 1), and therefore it should not run compute intensive tasks.\n\nHowever, it's never a good practice to launch a long running job in the login node, and therefore a good practice consists of running Nextflow itself as a cluster job. This can be done by wrapping the `nextflow run` command in a shell script and submitting it as any other job. An average pipeline may require 2 CPUs and 2 GB of resources allocation.\n\nNote: the queue where the Nextflow driver job is submitted should allow the spawning of the pipeline jobs to carry out the pipeline execution.\n\n### 3. 
Use the queueSize directive\n\nThe `queueSize` directive is part of the executor configuration in the `nextflow.config` file, and defines how many processes are queued at a given time. By default, Nextflow will submit up to 100 jobs at a time for execution. Increase or decrease this setting depending your HPC system quota and throughput. For example:\n\n```\nexecutor {\n name = 'slurm'\n queueSize = 50\n}\n```\n\n### 4. Specify the max heap size\n\nThe Nextflow runtime runs on top of the Java virtual machine which, by design, tries to allocate as much memory as is available. This is not a good practice in HPC systems which are designed to share compute resources across many users and applications.\nTo avoid this, specify the maximum amount of memory that can be used by the Java VM using the -Xms and -Xmx Java flags. These can be specified using the `NXF_OPTS` environment variable.\n\nFor example:\n\n```\nexport NXF_OPTS=\"-Xms500M -Xmx2G\"\n```\n\nThe above setting instructs Nextflow to allocate a Java heap in the range of 500 MB and 2 GB of RAM.\n\n### 5. Limit the Nextflow submit rate\n\nNextflow attempts to submit the job executions as quickly as possible, which is generally not a problem. However, in some HPC systems the submission throughput is constrained or it should be limited to avoid degrading the overall system performance.\nTo prevent this problem you can use `submitRateLimit` to control the Nextflow job submission throughput. This directive is part of the `executor` configuration scope, and defines the number of tasks that can be submitted per a unit of time. The default for the `submitRateLimit` is unlimited.\nYou can specify the `submitRateLimit` like this:\n\n```\nexecutor {\n submitRateLimit = '10 sec'\n}\n```\n\nYou can also more explicitly specify it as a rate of # processes / time unit:\n\n```\nexecutor {\n submitRateLimit = '10/2min'\n}\n```\n\n### Conclusion\n\nNextflow aims to give you control over every aspect of your workflow. These options allow you to shape how Nextflow communicates with your HPC system. This can make workflows more robust while avoiding overloading the executor. 
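Putting several of these tips together, a starting `nextflow.config` for an HPC deployment might look like the following sketch (adjust the executor name and the values to your own cluster's limits):\n\n```\nprocess.executor = 'slurm'\n\nexecutor {\n queueSize = 50\n submitRateLimit = '10/2min'\n}\n```\n\n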
Some systems have hard limits, and if you do not take them into account, it will stop any jobs from being scheduled.\n\nStay tuned for part two where we will discuss background executions, retry strategies, maxForks and other tips.\n", - "images": [] + "images": [], + "author": "Kevin Sayers", + "tags": "nextflow,hpc" }, { "slug": "2021/configure-git-repositories-with-nextflow", "title": "Configure Git private repositories with Nextflow", "date": "2021-10-21T00:00:00.000Z", "content": "\nGit has become the de-facto standard for source-code version control system and has seen increasing adoption across the spectrum of software development.\n\nNextflow provides builtin support for Git and most popular Git hosting platforms such\nas GitHub, GitLab and Bitbucket between the others, which streamline managing versions\nand track changes in your pipeline projects and facilitate the collaboration across\ndifferent users.\n\nIn order to access public repositories Nextflow does not require any special configuration, just use the _http_ URL of the pipeline project you want to run\nin the run command, for example:\n\n```\nnextflow run https://github.com/nextflow-io/hello\n```\n\nHowever to allow Nextflow to access private repositories you will need to specify\nthe repository credentials, and the server hostname in the case of self-managed\nGit server installations.\n\n## Configure access to private repositories\n\nThis is done through a file name `scm` placed in the `$HOME/.nextflow/` directory, containing the credentials and other details for accessing a particular Git hosting solution. You can refer to the Nextflow documentation for all the [SCM configuration file](https://www.nextflow.io/docs/edge/sharing.html) options.\n\nAll of these platforms have their own authentication mechanisms for Git operations which are captured in the `$HOME/.nextflow/scm` file with the following syntax:\n\n```groovy\nproviders {\n\n '' {\n user = value\n password = value\n ...\n }\n\n '' {\n user = value\n password = value\n ...\n }\n\n}\n```\n\nNote: Make sure to enclose the provider name with `'` if it contains a `-` or a\nblank character.\n\nAs of the 21.09.0-edge release, Nextflow integrates with the following Git providers:\n\n## GitHub\n\n[GitHub](https://github.com) is one of the most well known Git providers and is home to some of the most popular open-source Nextflow pipelines from the [nf-core](https://github.com/nf-core/) community project.\n\nIf you wish to use Nextflow code from a **public** repository hosted on GitHub.com, then you don't need to provide credentials (`user` and `password`) to pull code from the repository. However, if you wish to interact with a private repository or are running into GitHub API rate limits for public repos, then you must provide elevated access to Nextflow by specifying your credentials in the `scm` file.\n\nIt is worth noting that [GitHub recently phased out Git password authentication](https://github.blog/2020-12-15-token-authentication-requirements-for-git-operations/#what-you-need-to-do-today) and now requires that users supply a more secure GitHub-generated _Personal Access Token_ for authentication. 
With Nextflow, you can specify your _personal access token_ in the `password` field.\n\n```groovy\nproviders {\n\n github {\n user = 'me'\n password = 'my-personal-access-token'\n }\n\n}\n```\n\nTo generate a `personal-access-token` for the GitHub platform, follow the instructions provided [here](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token). Ensure that the token has at a minimum all the permissions in the `repo` scope.\n\nOnce you have provided your username and _personal access token_, as shown above, you can test the integration by pulling the repository code.\n\n```\nnextflow pull https://github.com/user_name/private_repo\n```\n\n## Bitbucket Cloud\n\n[Bitbucket](https://bitbucket.org/) is a publicly accessible Git solution hosted by Atlassian. Please note that if you are using an on-premises Bitbucket installation, you should follow the instructions for _Bitbucket Server_ in the following section.\n\nIf your Nextflow code is in a public Bitbucket repository, then you don't need to specify your credentials to pull code from the repository. However, if you wish to interact with a private repository, you need to provide elevated access to Nextflow by specifying your credentials in the `scm` file.\n\nPlease note that Bitbucket Cloud requires your `app password` in the `password` field, which is different from your login password.\n\n```groovy\nproviders {\n\n bitbucket {\n user = 'me'\n password = 'my-app-password'\n }\n\n}\n```\n\nTo generate an `app password` for the Bitbucket platform, follow the instructions provided [here](https://support.atlassian.com/bitbucket-cloud/docs/app-passwords/). Ensure that the token has at least `Repositories: Read` permission.\n\nOnce these settings are saved in `$HOME/.nextflow/scm`, you can test the integration by pulling the repository code.\n\n```\nnextflow pull https://bitbucket.org/user_name/private_repo\n```\n\n## Bitbucket Server\n\n[Bitbucket Server](https://www.atlassian.com/software/bitbucket/enterprise) is a Git hosting solution from Atlassian which is meant for teams that require a self-managed solution. If Nextflow code resides in an open Bitbucket repository, then you don't need to provide credentials to pull code from this repository. 
However, if you wish to interact with a private repository, you need to give elevated access to Nextflow by specifying your credentials in the `scm` file.\n\nFor example, if you'd like to call your hosted Bitbucket server as `mybitbucketserver`, then you'll need to add the following snippet in your `~/$HOME/.nextflow/scm` file.\n\n```groovy\nproviders {\n\n mybitbucketserver {\n platform = 'bitbucketserver'\n server = 'https://your.bitbucket.host.com'\n user = 'me'\n password = 'my-password' // OR \"my-token\"\n }\n\n}\n```\n\nTo generate a _personal access token_ for Bitbucket Server, refer to the [Bitbucket Support documentation](https://confluence.atlassian.com/bitbucketserver/managing-personal-access-tokens-1005339986.html) from Atlassian.\n\nOnce the configuration is saved, you can test the integration by pulling code from a private repository and specifying the `mybitbucketserver` Git provider using the `-hub` option.\n\n```\nnextflow pull https://your.bitbucket.host.com/user_name/private_repo -hub mybitbucketserver\n```\n\nNOTE: It is worth noting that [Atlassian is phasing out the Server offering](https://www.atlassian.com/migration/assess/journey-to-cloud) in favor of cloud product [bitbucket.org](https://bitbucket.org).\n\n## GitLab\n\n[GitLab](https://gitlab.com) is a popular Git provider that offers features covering various aspects of the DevOps cycle.\n\nIf you wish to run a Nextflow pipeline from a public GitLab repository, there is no need to provide credentials to pull code. However, if you wish to interact with a private repository, then you must give elevated access to Nextflow by specifying your credentials in the `scm` file.\n\nPlease note that you need to specify your _personal access token_ in the `password` field.\n\n```groovy\nproviders {\n\n mygitlab {\n user = 'me'\n password = 'my-password' // or 'my-personal-access-token'\n token = 'my-personal-access-token'\n }\n\n}\n```\n\nIn addition, you can specify the `server` fields for your self-hosted instance of GitLab, by default [https://gitlab.com](https://gitlab.com) is assumed as the server.\n\nTo generate a `personal-access-token` for the GitLab platform follow the instructions provided [here](https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html). Please ensure that the token has at least `read_repository`, `read_api` permissions.\n\nOnce the configuration is saved, you can test the integration by pulling the repository code using the `-hub` option.\n\n```\nnextflow pull https://gitlab.com/user_name/private_repo -hub mygitlab\n```\n\n## Gitea\n\n[Gitea server](https://gitea.com/) is an open source Git-hosting solution that can be self-hosted. If you have your Nextflow code in an open Gitea repository, there is no need to specify credentials to pull code from this repository. 
However, if you wish to interact with a private repository, you can give elevated access to Nextflow by specifying your credentials in the `scm` file.\n\nFor example, if you'd like to call your hosted Gitea server `mygiteaserver`, then you'll need to add the following snippet in your `~/$HOME/.nextflow/scm` file.\n\n```groovy\nproviders {\n\n mygiteaserver {\n platform = 'gitea'\n server = 'https://gitea.host.com'\n user = 'me'\n password = 'my-password'\n }\n\n}\n```\n\nTo generate a _personal access token_ for your Gitea server, please refer to the [official guide](https://docs.gitea.io/en-us/api-usage/).\n\nOnce the configuration is set, you can test the integration by pulling the repository code and specifying `mygiteaserver` as the Git provider using the `-hub` option.\n\n```\nnextflow pull https://git.host.com/user_name/private_repo -hub mygiteaserver\n```\n\n## Azure Repos\n\n[Azure Repos](https://azure.microsoft.com/en-us/services/devops/repos/) is a part of Microsoft Azure Cloud Suite. Nextflow integrates natively Azure Repos via the usual `~/$HOME/.nextflow/scm` file.\n\nIf you'd like to use the `myazure` alias for the `azurerepos` provider, then you'll need to add the following snippet in your `~/$HOME/.nextflow/scm` file.\n\n```groovy\nproviders {\n\n myazure {\n server = 'https://dev.azure.com'\n platform = 'azurerepos'\n user = 'me'\n token = 'my-api-token'\n }\n\n}\n```\n\nTo generate a _personal access token_ for your Azure Repos integration, please refer to the [official guide](https://docs.microsoft.com/en-us/azure/devops/organizations/accounts/use-personal-access-tokens-to-authenticate?view=azure-devops&tabs=preview-page) on Azure.\n\nOnce the configuration is set, you can test the integration by pulling the repository code and specifying `myazure` as the Git provider using the `-hub` option.\n\n```\nnextflow pull https://dev.azure.com/org_name/DefaultCollection/_git/repo_name -hub myazure\n```\n\n## Conclusion\n\nGit is a popular, widely used software system for source code management. The native integration of Nextflow with various Git hosting solutions is an important feature to facilitate reproducible workflows that enable collaborative development and deployment of Nextflow pipelines.\n\nStay tuned for more integrations as we continue to improve our support for various source code management solutions!\n", - "images": [] + "images": [], + "author": "Abhinav Sharma", + "tags": "git,github" }, { "slug": "2021/introducing-nextflow-for-azure-batch", "title": "Introducing Nextflow for Azure Batch", "date": "2021-02-22T00:00:00.000Z", "content": "\nWhen the Nextflow project was created, one of the main drivers was to enable reproducible data pipelines that could be deployed across a wide range of execution platforms with minimal effort as well as to empower users to scale their data analysis while facilitating the migration to the cloud.\n\nThroughout the years, the computing services provided by cloud vendors have evolved in a spectacular manner. Eight years ago, the model was focused on launching virtual machines in the cloud, then came containers and then the idea of serverless computing which changed everything again. However, the power of the Nextflow abstraction consists of hiding the complexity of the underlying platform. 
Through the concept of executors, emerging technologies and new platforms can be easily adapted with no changes required to user pipelines.\n\nWith this in mind, we could not be more excited to announce that over the past months we have been working with Microsoft to implement built-in support for [Azure Batch](https://azure.microsoft.com/en-us/services/batch/) into Nextflow. Today we are delighted to make it available to all users as a beta release.\n\n### How does it work\n\nAzure Batch is a cloud-based computing service that allows the execution of highly scalable, container based, workloads in the Azure cloud.\n\nThe support for Nextflow comes in the form of a plugin which implements a new executor, not surprisingly named `azurebatch`, which offloads the execution of the pipeline jobs to corresponding Azure Batch jobs.\n\nEach job run consists in practical terms of a container execution which ships the job dependencies and carries out the job computation. As usual, each job is assigned a unique working directory allocated into a [Azure Blob](https://azure.microsoft.com/en-us/services/storage/blobs/) container.\n\n### Let's get started!\n\nThe support for Azure Batch requires the latest release of Nextflow from the _edge_ channel (version 21.02-edge or later). If you don't have this, you can install it using these commands:\n\n```\nexport NXF_EDGE=1\ncurl get.nextflow.io | bash\n./nextflow -self-update\n```\n\nNote for Windows users, as Nextflow is \\*nix based tool you will need to run it using the [Windows subsystem for Linux](https://docs.microsoft.com/en-us/windows/wsl/install-win10). Also make sure Java 8 or later is installed in the Linux environment.\n\nOnce Nextflow is installed, to run your data pipelines with Azure Batch, you will need to create an Azure Batch account in the region of your choice using the Azure Portal. In a similar manner, you will need an Azure Blob container.\n\nWith the Azure Batch and Blob storage container configured, your `nextflow.config` file should be set up similar to the example below:\n\n```\nplugins {\n id 'nf-azure'\n}\n\nprocess {\n executor = 'azurebatch'\n}\n\nazure {\n batch {\n location = 'westeurope'\n accountName = ''\n accountKey = ''\n autoPoolMode = true\n }\n storage {\n accountName = \"\"\n accountKey = \"\"\n }\n}\n```\n\nUsing this configuration snippet, Nextflow will automatically create the virtual machine pool(s) required to deploy the pipeline execution in the Azure Batch service.\n\nNow you will be able to launch the pipeline execution using the following command:\n\n```\nnextflow run -w az://my-container/work\n```\n\nReplace `` with a pipeline name e.g. nextflow-io/rnaseq-nf and `my-container` with a blob container in the storage account as defined in the above configuration.\n\nFor more details regarding the Nextflow configuration setting for Azure Batch\nrefers to the Nextflow documentation at [this link](/docs/edge/azure.html).\n\n### Conclusion\n\nThe support for Azure Batch further expands the wide range of computing platforms supported by Nextflow and empowers Nextflow users to deploy their data pipelines in the cloud provider of their choice. 
Above all, it allows researchers to scale, collaborate and share their work without being locked into a specific platform.\n\nWe thank Microsoft, and in particular [Jer-Ming Chia](https://www.linkedin.com/in/jermingchia/) who works in the HPC and AI team for having supported and sponsored this open source contribution to the Nextflow framework.\n", - "images": [] + "images": [], + "author": "Paolo Di Tommaso", + "tags": "nextflow,azure" }, { "slug": "2021/nextflow-developer-environment", "title": "6 Tips for Setting Up Your Nextflow Dev Environment", "date": "2021-03-04T00:00:00.000Z", "content": "\n_This blog follows up the Learning Nextflow in 2020 blog [post](https://www.nextflow.io/blog/2020/learning-nextflow-in-2020.html)._\n\nThis guide is designed to walk you through a basic development setup for writing Nextflow pipelines.\n\n### 1. Installation\n\nNextflow runs on any Linux compatible system and MacOS with Java installed. Windows users can rely on the [Windows Subsystem for Linux](https://docs.microsoft.com/en-us/windows/wsl/install-win10). Installing Nextflow is straightforward. You just need to download the `nextflow` executable. In your terminal type the following commands:\n\n```\n$ curl get.nextflow.io | bash\n$ sudo mv nextflow /usr/local/bin\n```\n\nThe first line uses the curl command to download the nextflow executable, and the second line moves the executable to your PATH. Note `/usr/local/bin` is the default for MacOS, you might want to choose `~/bin` or `/usr/bin` depending on your PATH definition and operating system.\n\n### 2. Text Editor or IDE?\n\nNextflow pipelines can be written in any plain text editor. I'm personally a bit of a Vim fan, however, the advent of the modern IDE provides a more immersive development experience.\n\nMy current choice is Visual Studio Code which provides a wealth of add-ons, the most obvious of these being syntax highlighting. With [VSCode installed](https://code.visualstudio.com/download), you can search for the Nextflow extension in the marketplace.\n\n![VSCode with Nextflow Syntax Highlighting](/img/vscode-nf-highlighting.png)\n\nOther syntax highlighting has been made available by the community including:\n\n- [Atom](https://atom.io/packages/language-nextflow)\n- [Vim](https://github.com/LukeGoodsell/nextflow-vim)\n- [Emacs](https://github.com/Emiller88/nextflow-mode)\n\n### 3. The Nextflow REPL console\n\nThe Nextflow console is a REPL (read-eval-print loop) environment that allows one to quickly test part of a script or segments of Nextflow code in an interactive manner. This can be particularly useful to quickly evaluate channels and operators behaviour and prototype small snippets that can be included in your pipeline scripts.\n\nStart the Nextflow console with the following command:\n\n```\n$ nextflow console\n```\n\n![Nextflow REPL console](/img/nf-repl-console.png)\n\nUse the `CTRL+R` keyboard shortcut to run (`⌘+R`on the Mac) and to evaluate your code. You can also evaluate by selecting code and use the **Run selection**.\n\n### 4. Containerize all the things\n\nContainers are a key component of developing scalable and reproducible pipelines. We can build Docker images that contain an OS, all libraries and the software we need for each process. 
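Each pipeline process can then point to the image it needs with the `container` directive; a minimal sketch (the image name below is only an illustration) could be:\n\n```\nprocess FASTQC {\n container 'quay.io/biocontainers/fastqc:0.11.9--0'\n\n script:\n \"\"\"\n fastqc --version\n \"\"\"\n}\n```\n\n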
Pipelines are typically developed using Docker containers and tooling as these can then be used on many different container engines such as Singularity and Podman.\n\nOnce you have [downloaded and installed Docker](https://docs.docker.com/engine/install/), try pull a public docker image:\n\n```\n$ docker pull quay.io/nextflow/rnaseq-nf\n```\n\nTo run a Nextflow pipeline using the latest tag of the image, we can use:\n\n```\nnextflow run nextflow-io/rnaseq-nf -with-docker quay.io/nextflow/rnaseq-nf:latest\n```\n\nTo learn more about building Docker containers, see the [Seqera Labs tutorial](https://seqera.io/training/#_manage_dependencies_containers) on managing dependencies with containers.\n\nAdditionally, you can install the VSCode marketplace addon for Docker to manage and interactively run and test the containers and images on your machine. You can even connect to remote registries such as Dockerhub, Quay.io, AWS ECR, Google Cloud and Azure Container registries.\n\n![VSCode with Docker Extension](/img/vs-code-with-docker-extension.png)\n\n### 5. Use Tower to monitor your pipelines\n\nWhen developing real-world pipelines, it can become inevitable that pipelines will require significant resources. For long-running workflows, monitoring becomes all the more crucial. With [Nextflow Tower](https://tower.nf), we can invoke any Nextflow pipeline execution from the CLI and use the integrated dashboard to follow the workflow run.\n\nSign-in to Tower using your GitHub credentials, obtain your token from the Getting Started page and export them into your terminal, `~/.bashrc`, or include them in your nextflow.config.\n\n```\n$ export TOWER_ACCESS_TOKEN=my-secret-tower-key\n```\n\nWe can then add the `-with-tower` child-option to any Nextflow run command. A URL with the monitoring dashboard will appear.\n\n```\n$ nextflow run nextflow-io/rnaseq-nf -with-tower\n```\n\n### 6. nf-core tools\n\n[nf-core](https://nf-co.re/) is a community effort to collect a curated set of analysis pipelines built using Nextflow. The pipelines continue to come on in leaps and bounds and nf-core tools is a python package for helping with developing nf-core pipelines. It includes options for listing, creating, and even downloading pipelines for offline usage.\n\nThese tools are particularly useful for developers contributing to the community pipelines on [GitHub](https://github.com/nf-core/) with linting and syncing options that keep pipelines up-to-date against nf-core guidelines.\n\n`nf-core tools` is a python package that can be installed in your development environment from Bioconda or PyPi.\n\n```\n$ conda install nf-core\n```\n\nor\n\n```\n$ pip install nf-core\n```\n\n![nf-core tools](/img/nf-core-tools.png)\n\n### Conclusion\n\nDeveloper workspaces are evolving rapidly. While your own development environment may be highly dependent on personal preferences, community contributions are keeping Nextflow users at the forefront of the modern developer experience.\n\nSolutions such as [GitHub Codespaces](https://github.com/features/codespaces) and [Gitpod](https://www.gitpod.io/) are now offering extendible, cloud-based options that may well be the future. 
I’m sure we can all look forward to a one-click, pre-configured, cloud-based, Nextflow developer environment sometime soon!\n", - "images": [] + "images": [], + "author": "Evan Floden", + "tags": "nextflow,development,learning" }, { "slug": "2021/nextflow-sql-support", "title": "Introducing Nextflow support for SQL databases", "date": "2021-09-16T00:00:00.000Z", "content": "\nThe recent tweet introducing the [Nextflow support for SQL databases](https://twitter.com/PaoloDiTommaso/status/1433120149888974854) raised a lot of positive reaction. In this post, I want to describe more in detail how this extension works.\n\nNextflow was designed with the idea to streamline the deployment of complex data pipelines in a scalable, portable and reproducible manner across different computing platforms. To make this all possible, it was decided the resulting pipeline and the runtime should be self-contained i.e. to not depend on separate services such as database servers.\n\nThis makes the resulting pipelines easier to configure, deploy, and allows for testing them using [CI services](https://en.wikipedia.org/wiki/Continuous_integration), which is a critical best practice for delivering high-quality and stable software.\n\nAnother important consequence is that Nextflow pipelines do not retain the pipeline state on separate storage. Said in a different way, the idea was - and still is - to promote stateless pipeline execution in which the computed results are only determined by the pipeline inputs and the code itself, which is consistent with the _functional_ dataflow paradigm on which Nextflow is based.\n\nHowever, the ability to access SQL data sources can be very useful in data pipelines, for example, to ingest input metadata or to store task executions logs.\n\n### How does it work?\n\nThe support for SQL databases in Nextflow is implemented as an optional plugin component. This plugin provides two new operations into your Nextflow script:\n\n1. `fromQuery` performs a SQL query against the specified database and returns a Nextflow channel emitting them. This channel can be used in your pipeline as any other Nextflow channel to trigger the process execution with the corresponding values.\n2. `sqlInsert` takes the values emitted by a Nextflow channel and inserts them into a database table.\n\nThe plugin supports out-of-the-box popular database servers such as MySQL, PostgreSQL and MariaDB. It should be noted that the technology is based on the Java JDBC database standard, therefore it could easily support any database technology implementing a driver for this standard interface.\n\nDisclaimer: This plugin is a preview technology. Some features, syntax and configuration settings can change in future releases.\n\n### Let's get started!\n\nThe use of the SQL plugin requires the use of Nextflow 21.08.0-edge or later. If are using an older version, check [this page](https://www.nextflow.io/docs/latest/getstarted.html#stable-edge-releases) on how to update to the latest edge release.\n\nTo enable the use of the database plugin, add the following snippet in your pipeline configuration file.\n\n```\nplugins {\n id 'nf-sqldb@0.1.0'\n}\n```\n\nIt is then required to specify the connection _coordinates_ of the database service you want to connect to in your pipeline. 
This is done by adding a snippet similar to the following in your configuration file:\n\n```\nsql {\n db {\n 'my-db' {\n url = 'jdbc:mysql://localhost:3306/demo'\n user = 'my-user'\n password = 'my-password'\n }\n }\n}\n```\n\nIn the above example, replace `my-db` with a name of your choice (this name will be used in the script to reference the corresponding database connection coordinates). Also, provide a `url`, `user` and `password` matching your database server.\n\nYour script should then look like the following:\n\n```\nnextflow.enable.dsl=2\n\nprocess myProcess {\n input:\n tuple val(sample_id), path(sample_in)\n output:\n tuple val(sample_id), path('sample.out')\n\n \"\"\"\n your_command --input $sample_id > sample.out\n \"\"\"\n}\n\nworkflow {\n\n query = 'select SAMPLE_ID, SAMPLE_FILE from SAMPLES'\n channel.sql.fromQuery(query, db: 'my-db') \\\n | myProcess \\\n | sqlInsert(table: 'RESULTS', db: 'my-db')\n\n}\n```\n\nThe above example shows how to perform a simple database query, pipe the results to a fictitious process named `myProcess` and finally store the process outputs into a database table named `RESULTS`.\n\nIt is worth noting that Nextflow allows the use of any number of database instances in your pipeline, simply defining them in the configuration file using the syntax shown above. This could be useful to fetch database data from one data source and store the results into a different one.\n\nAlso, this makes it straightforward to write [ETL](https://en.wikipedia.org/wiki/Extract,_transform,_load) scripts that span across multiple data sources.\n\nFind more details about the SQL plugin for Nextflow at [this link](https://github.com/nextflow-io/nf-sqldb).\n\n## What about the self-contained property?\n\nYou may wonder if adding this capability breaks the self-contained property of Nextflow pipelines which allows them to be run in a single command and to be tested with continuous integration services e.g. GitHub Action.\n\nThe good news is that it does not ... or at least it should not if used properly.\n\nIn fact, the SQL plugin includes the [H2](http://www.h2database.com/html/features.html) embedded in-memory SQL database that is used by default when no other database is provided in the Nextflow configuration file and can be used for developing and testing your pipeline without the need for a separate database service.\n\nTip: Other than this, H2 also provides the capability to access and query CSV/TSV files as SQL tables. Read more about this feature at [this link](http://www.h2database.com/html/tutorial.html?highlight=csv&search=csv#csv).\n\n### Conclusion\n\nThe use of this plugin adds to Nextflow the capability to query and store data into the SQL databases. Currently, the most popular SQL technologies are supported such as MySQL, PostgreSQL and MariaDB. In the future, support for other database technologies e.g. MongoDB, DynamoDB could be added.\n\nNotably, the support for SQL data-stores has been implemented preserving the core Nextflow capabilities to allow portable and self-contained pipeline scripts that can be developed locally, tested through CI services, and deployed at scale into production environments.\n\nIf you have any questions or suggestions, please feel free to comment in the project discussion group at [this link](https://github.com/nextflow-io/nf-sqldb/discussions).\n\nCredits to [Francesco Strozzi](https://twitter.com/fstrozzi) & [Raoul J.P. 
Bonnal](https://twitter.com/bonnalr) for having contributed to this work 🙏.\n", - "images": [] + "images": [], + "author": "Paolo Di Tommaso", + "tags": "nextflow,plugins,sql" }, { "slug": "2021/setup-nextflow-on-windows", "title": "Setting up a Nextflow environment on Windows 10", "date": "2021-10-13T00:00:00.000Z", "content": "\nFor Windows users, getting access to a Linux-based Nextflow development and runtime environment used to be hard. Users would need to run virtual machines, access separate physical servers or cloud instances, or install packages such as [Cygwin](http://www.cygwin.com/) or [Wubi](https://wiki.ubuntu.com/WubiGuide). Fortunately, there is now an easier way to deploy a complete Nextflow development environment on Windows.\n\nThe Windows Subsystem for Linux (WSL) allows users to build, manage and execute Nextflow pipelines on a Windows 10 laptop or desktop without needing a separate Linux machine or cloud VM. Users can build and test Nextflow pipelines and containerized workflows locally, on an HPC cluster, or their preferred cloud service, including AWS Batch and Azure Batch.\n\nThis document provides a step-by-step guide to setting up a Nextflow development environment on Windows 10.\n\n## High-level Steps\n\nThe steps described in this guide are as follows:\n\n- Install Windows PowerShell\n- Configure the Windows Subsystem for Linux (WSL2)\n- Obtain and Install a Linux distribution (on WSL2)\n- Install Windows Terminal\n- Install and configure Docker\n- Download and install an IDE (VS Code)\n- Install and test Nextflow\n- Configure X-Windows for use with the Nextflow Console\n- Install and Configure GIT\n\n## Install Windows PowerShell\n\nPowerShell is a cross-platform command-line shell and scripting language available for Windows, Linux, and macOS. If you are an experienced Windows user, you are probably already familiar with PowerShell. PowerShell is worth taking a few minutes to download and install.\n\nPowerShell is a big improvement over the Command Prompt in Windows 10. It brings features to Windows that Linux/UNIX users have come to expect, such as command-line history, tab completion, and pipeline functionality.\n\n- You can obtain PowerShell for Windows from GitHub at the URL https://github.com/PowerShell/PowerShell.\n- Download and install the latest stable version of PowerShell for Windows x64 - e.g., [powershell-7.1.3-win-x64.msi](https://github.com/PowerShell/PowerShell/releases/download/v7.1.3/PowerShell-7.1.3-win-x64.msi).\n- If you run into difficulties, Microsoft provides detailed instructions [here](https://docs.microsoft.com/en-us/powershell/scripting/install/installing-powershell-core-on-windows?view=powershell-7.1).\n\n## Configure the Windows Subsystem for Linux (WSL)\n\n### Enable the Windows Subsystem for Linux\n\nMake sure you are running Windows 10 Version 1903 with Build 18362 or higher. You can check your Windows version by select WIN-R (using the Windows key to run a command) and running the utility `winver`.\n\nFrom within PowerShell, run the Windows Deployment Image and Service Manager (DISM) tool as an administrator to enable the Windows Subsystem for Linux. 
To run PowerShell with administrator privileges, right-click on the PowerShell icon from the Start menu or desktop and select \"_Run as administrator_\".\n\n```powershell\nPS C:\\WINDOWS\\System32> dism.exe /online /enable-feature /featurename:Microsoft-Windows-Subsystem-Linux /all /norestart\nDeployment Image Servicing and Management tool\nVersion: 10.0.19041.844\nImage Version: 10.0.19041.1083\n\nEnabling feature(s)\n[==========================100.0%==========================]\nThe operation completed successfully.\n```\n\nYou can learn more about DISM [here](https://docs.microsoft.com/en-us/windows-hardware/manufacture/desktop/what-is-dism).\n\n### Step 2: Enable the Virtual Machine Feature\n\nWithin PowerShell, enable Virtual Machine Platform support using DISM. If you have trouble enabling this feature, make sure that virtual machine support is enabled in your machine's BIOS.\n\n```powershell\nPS C:\\WINDOWS\\System32> dism.exe /online /enable-feature /featurename:VirtualMachinePlatform /all /norestart\nDeployment Image Servicing and Management tool\nVersion: 10.0.19041.844\nImage Version: 10.0.19041.1083\nEnabling feature(s)\n[==========================100.0%==========================]\nThe operation completed successfully.\n```\n\nAfter enabling the Virtual Machine Platform support, **restart your machine**.\n\n### Step 3: Download the Linux Kernel Update Package\n\nNextflow users will want to take advantage of the latest features in WSL 2. You can learn about differences between WSL 1 and WSL 2 [here](https://docs.microsoft.com/en-us/windows/wsl/compare-versions). Before you can enable support for WSL 2, you'll need to download the kernel update package at the link below:\n\n[WSL2 Linux kernel update package for x64 machines](https://wslstorestorage.blob.core.windows.net/wslblob/wsl_update_x64.msi)\n\nOnce downloaded, double click on the kernel update package and select \"Yes\" to install it with elevated permissions.\n\n### STEP 4: Set WSL2 as your Default Version\n\nFrom within PowerShell:\n\n```powershell\nPS C:\\WINDOWS\\System32> wsl --set-default-version 2\nFor information on key differences with WSL 2 please visit https://aka.ms/wsl2\n```\n\nIf you run into difficulties with any of these steps, Microsoft provides detailed installation instructions [here](https://docs.microsoft.com/en-us/windows/wsl/install-win10#manual-installation-steps).\n\n## Obtain and Install a Linux Distribution on WSL\n\nIf you normally install Linux on VM environments such as VirtualBox or VMware, this probably sounds like a lot of work. Fortunately, Microsoft provides Linux OS distributions via the Microsoft Store that work with the Windows Subsystem for Linux.\n\n- Use this link to access and download a Linux Distribution for WSL through the Microsoft Store - https://aka.ms/wslstore.\n\n ![Linux Distributions at the Microsoft Store](/img/ms-store.png)\n\n- We selected the Ubuntu 20.04 LTS release. You can use a different distribution if you choose. Installation from the Microsoft Store is automated. Once the Linux distribution is installed, you can run a shell on Ubuntu (or your installed OS) from the Windows Start menu.\n- When you start Ubuntu Linux for the first time, you will be prompted to provide a UNIX username and password. The username that you select can be distinct from your Windows username. The UNIX user that you create will automatically have `sudo` privileges. 
Whenever a shell is started, it will default to this user.\n- After setting your username and password, update your packages on Ubuntu from the Linux shell using the following command:\n\n ```bash\n sudo apt update && sudo apt upgrade\n ```\n\n- This is also a good time to add any additional Linux packages that you will want to use.\n\n ```bash\n sudo apt install net-tools\n ```\n\n## Install Windows Terminal\n\nWhile not necessary, it is a good idea to install [Windows Terminal](https://github.com/microsoft/terminal) at this point. When working with Nextflow, it is handy to interact with multiple command lines at the same time. For example, users may want to execute flows, monitor logfiles, and run Docker commands in separate windows.\n\nWindows Terminal provides an X-Windows-like experience on Windows. It helps organize your various command-line environments - Linux shell, Windows Command Prompt, PowerShell, AWS or Azure CLIs.\n\n![Windows Terminal](/img/windows-terminal.png)\n\nInstructions for downloading and installing Windows Terminal are available at: https://docs.microsoft.com/en-us/windows/terminal/get-started.\n\nIt is worth spending a few minutes getting familiar with available commands and shortcuts in Windows Terminal. Documentation is available at https://docs.microsoft.com/en-us/windows/terminal/command-line-arguments.\n\nSome Windows Terminal commands you'll need right away are provided below:\n\n- Split the active window vertically: SHIFT ALT =\n- Split the active window horizontally: SHIFT ALT \n- Resize the active window: SHIFT ALT ``\n- Open a new window under the current tab: ALT v (_the new tab icon along the top of the Windows Terminal interface_)\n\n## Installing Docker on Windows\n\nThere are two ways to install Docker for use with the WSL on Windows. One method is to install Docker directly on a hosted WSL Linux instance (Ubuntu in our case) and have the docker daemon run on the Linux kernel as usual. An installation recipe for people that choose this \"native Linux\" approach is provided [here](https://dev.to/bowmanjd/install-docker-on-windows-wsl-without-docker-desktop-34m9).\n\nA second method is to run [Docker Desktop](https://www.docker.com/products/docker-desktop) on Windows. While Docker is more commonly used in Linux environments, it can be used with Windows also. The Docker Desktop supports containers running on Windows and Linux instances running under WSL. Docker Desktop provides some advantages for Windows users:\n\n- The installation process is automated\n- Docker Desktop provides a Windows GUI for managing Docker containers and images (including Linux containers running under WSL)\n- Microsoft provides Docker Desktop integration features from within Visual Studio Code via a VS Code extension\n- Docker Desktop provides support for auto-installing a single-node Kubernetes cluster\n- The Docker Desktop WSL 2 back-end provides an elegant Linux integration such that from a Linux user's perspective, Docker appears to be running natively on Linux.\n\nAn explanation of how the Docker Desktop WSL 2 Back-end works is provided [here](https://www.docker.com/blog/new-docker-desktop-wsl2-backend/).\n\n### Step 1: Install Docker Desktop on Windows\n\n- Download and install Docker Desktop for Windows from the following link: https://desktop.docker.com/win/stable/amd64/Docker%20Desktop%20Installer.exe\n- Follow the on-screen prompts provided by the Docker Desktop Installer. 
The installation process will install Docker on Windows and install the Docker back-end components so that Docker commands are accessible from within WSL.\n- After installation, Docker Desktop can be run from the Windows start menu. The Docker Desktop user interface is shown below. Note that Docker containers launched under WSL can be managed from the Windows Docker Desktop GUI or Linux command line.\n- The installation process is straightforward, but if you run into difficulties, detailed instructions are available [here](https://docs.docker.com/docker-for-windows/install/).\n\n ![Nextflow Visual Studio Code Extension](/img/docker-images.png)\n\n The Docker Engineering team provides an architecture diagram explaining how Docker on Windows interacts with WSL. Additional details are available [here](https://code.visualstudio.com/blogs/2020/03/02/docker-in-wsl2).\n\n ![Nextflow Visual Studio Code Extension](/img/docker-windows-arch.png)\n\n### Step 2: Verify the Docker installation\n\nNow that Docker is installed, run a Docker container to verify that Docker and the Docker Integration Package on WSL 2 are working properly.\n\n- Run a Docker command from the Linux shell as shown below below. This command downloads a **centos** image from Docker Hub and allows us to interact with the container via an assigned pseudo-tty. Your Docker container may exit with exit code 139 when you run this and other Docker containers. If so, don't worry – an easy fix to this issue is provided shortly.\n\n ```console\n $ docker run -ti centos:6\n [root@02ac0beb2d2c /]# hostname\n 02ac0beb2d2c\n ```\n\n- You can run Docker commands in other Linux shell windows via the Windows Terminal environment to monitor and manage Docker containers and images. For example, running `docker ps` in another window shows the running CentOS Docker container.\n\n ```console\n $ docker ps\n CONTAINER ID IMAGE COMMAND CREATED STATUS NAMES\n f5dad42617f1 centos:6 \"/bin/bash\" 2 minutes ago Up 2 minutes \thappy_hopper\n ```\n\n### Step 3: Dealing with exit code 139\n\nYou may encounter exit code `139` when running Docker containers. This is a known problem when running containers with specific base images within Docker Desktop. Good explanations of the problem and solution are provided [here](https://dev.to/damith/docker-desktop-container-crash-with-exit-code-139-on-windows-wsl-fix-438) and [here](https://unix.stackexchange.com/questions/478387/running-a-centos-docker-image-on-arch-linux-exits-with-code-139).\n\nThe solution is to add two lines to a `.wslconfig` file in your Windows home directory. The `.wslconfig` file specifies kernel options that apply to all Linux distributions running under WSL 2.\n\nSome of the Nextflow container images served from Docker Hub are affected by this bug since they have older base images, so it is a good idea to apply this fix.\n\n- Edit the `.wslconfig` file in your Windows home directory. You can do this using PowerShell as shown:\n\n ```powershell\n PS C:\\Users\\ notepad .wslconfig\n ```\n\n- Add these two lines to the `.wslconfig` file and save it:\n\n ```ini\n [wsl2]\n kernelCommandLine = vsyscall=emulate\n ```\n\n- After this, **restart your machine** to force a restart of the Docker and WSL 2 environment. After making this correction, you should be able to launch containers without seeing exit code `139`.\n\n## Install Visual Studio Code as your IDE (optional)\n\nDevelopers can choose from a variety of IDEs depending on their preferences. 
Some examples of IDEs and developer-friendly editors are below:\n\n- Visual Studio Code - https://code.visualstudio.com/Download (Nextflow VSCode Language plug-in [here](https://github.com/nextflow-io/vscode-language-nextflow/blob/master/vsc-extension-quickstart.md))\n- Eclipse - https://www.eclipse.org/\n- VIM - https://www.vim.org/ (VIM plug-in for Nextflow [here](https://github.com/LukeGoodsell/nextflow-vim))\n- Emacs - https://www.gnu.org/software/emacs/download.html (Nextflow syntax highlighter [here](https://github.com/Emiller88/nextflow-mode))\n- JetBrains PyCharm - https://www.jetbrains.com/pycharm/\n- IntelliJ IDEA - https://www.jetbrains.com/idea/\n- Atom – https://atom.io/ (Nextflow Atom support available [here](https://atom.io/packages/language-nextflow))\n- Notepad++ - https://notepad-plus-plus.org/\n\nWe decided to install Visual Studio Code because it has some nice features, including:\n\n- Support for source code control from within the IDE (Git)\n- Support for developing on Linux via its WSL 2 Video Studio Code Backend\n- A library of extensions including Docker and Kubernetes support and extensions for Nextflow, including Nextflow language support and an [extension pack for the nf-core community](https://github.com/nf-core/vscode-extensionpack).\n\nDownload Visual Studio Code from https://code.visualstudio.com/Download and follow the installation procedure. The installation process will detect that you are running WSL. You will be invited to download and install the Remote WSL extension.\n\n- Within VS Code and other Windows tools, you can access the Linux file system under WSL 2 by accessing the path `\\\\wsl$\\`. In our example, the path from Windows to access files from the root of our Ubuntu Linux instance is: [**\\\\wsl$\\Ubuntu-20.04**](file://wsl$/Ubuntu-20.04).\n\nNote that the reverse is possible also – from within Linux, `/mnt/c` maps to the Windows C: drive. You can inspect `/etc/mtab` to see the mounted file systems available under Linux.\n\n- It is a good idea to install Nextflow language support in VS Code. You can do this by selecting the Extensions icon from the left panel of the VS Code interface and searching the extensions library for Nextflow as shown. The Nextflow language support extension is on GitHub at https://github.com/nextflow-io/vscode-language-nextflow\n\n ![Nextflow Visual Studio Code Extension](/img/nf-vscode-ext.png)\n\n## Visual Studio Code Remote Development\n\nVisual Studio Code Remote Development supports development on remote environments such as containers or remote hosts. For Nextflow users, it is important to realize that VS Code sees the Ubuntu instance we installed on WSL as a remote environment. The Diagram below illustrates how remote development works. From a VS Code perspective, the Linux instance in WSL is considered a remote environment.\n\nWindows users work within VS Code in the Windows environment. However, source code, developer tools, and debuggers all run Linux on WSL, as illustrated below.\n\n![The Remote Development Environment in VS Code](/img/vscode-remote-dev.png)\n\nAn explanation of how VS Code Remote Development works is provided [here](https://code.visualstudio.com/docs/remote/remote-overview).\n\nVS Code users see the Windows filesystem, plug-ins specific to VS Code on Windows, and access Windows versions of tools such as Git. 
If you prefer to develop in Linux, you will want to select WSL as the remote environment.\n\nTo open a new VS Code Window running in the context of the WSL Ubuntu-20.04 environment, click the green icon at the lower left of the VS Code window and select _\"New WSL Window using Distro ..\"_ and select `Ubuntu 20.04`. You'll notice that the environment changes to show that you are working in the WSL: `Ubuntu-20.04` environment.\n\n![Selecting the Remote Dev Environment within VS Code](/img/remote-dev-side-by-side.png)\n\nSelecting the Extensions icon, you can see that different VS Code Marketplace extensions run in different contexts. The Nextflow Language extension installed in the previous step is globally available. It works when developing on Windows or developing on WSL: Ubuntu-20.04.\n\nThe Extensions tab in VS Code differentiates between locally installed plug-ins and those installed under WSL.\n\n![Local vs. Remote Extensions in VS Code](/img/vscode-extensions.png)\n\n## Installing Nextflow\n\nWith Linux, Docker, and an IDE installed, now we can install Nextflow in our WSL 2 hosted Linux environment. Detailed instructions for installing Nextflow are available at https://www.nextflow.io/docs/latest/getstarted.html#installation\n\n### Step 1: Make sure Java is installed (under WSL)\n\nJava is a prerequisite for running Nextflow. Instructions for installing Java on Ubuntu are available [here](https://linuxize.com/post/install-java-on-ubuntu-18-04/). To install the default OpenJDK, follow the instructions below in a Linux shell window:\n\n- Update the _apt_ package index:\n\n ```bash\n sudo apt update\n ```\n\n- Install the latest default OpenJDK package\n\n ```bash\n sudo apt install default-jdk\n ```\n\n- Verify the installation\n\n ```bash\n java -version\n ```\n\n### Step 2: Make sure curl is installed\n\n`curl` is a convenient way to obtain Nextflow. `curl` is included in the default Ubuntu repositories, so installation is straightforward.\n\n- From the shell:\n\n ```bash\n sudo apt update\n sudo apt install curl\n ```\n\n- Verify that `curl` works:\n\n ```console\n $ curl\n curl: try 'curl --help' or 'curl --manual' for more information\n ```\n\n### STEP 3: Download and install Nextflow\n\n- Use `curl` to retrieve Nextflow into a temporary directory and then install it in `/usr/bin` so that the Nextflow command is on your path:\n\n ```bash\n mkdir temp\n cd temp\n curl -s https://get.nextflow.io | bash\n sudo cp nextflow /usr/bin\n ```\n\n- Make sure that Nextflow is executable:\n\n ```bash\n sudo chmod 755 /usr/bin/nextflow\n ```\n\n or if you prefer:\n\n ```bash\n sudo chmod +x /usr/bin/nextflow\n ```\n\n### Step 4: Verify the Nextflow installation\n\n- Make sure Nextflow runs:\n\n ```console\n $ nextflow -version\n\n N E X T F L O W\n version 21.04.2 build 5558\n created 12-07-2021 07:54 UTC (03:54 EDT)\n cite doi:10.1038/nbt.3820\n http://nextflow.io\n ```\n\n- Run a simple Nextflow pipeline. The example below downloads and executes a sample hello world pipeline from GitHub - https://github.com/nextflow-io/hello.\n\n ```console\n $ nextflow run hello\n\n N E X T F L O W ~ version 21.04.2\n Launching `nextflow-io/hello` [distracted_pare] - revision: ec11eb0ec7 [master]\n executor > local (4)\n [06/c846d8] process > sayHello (3) [100%] 4 of 4 ✔\n Ciao world!\n\n Hola world!\n\n Bonjour world!\n\n Hello world!\n ```\n\n### Step 5: Run a Containerized Workflow\n\nTo validate that Nextflow works with containerized workflows, we can run a slightly more complicated example. 
A sample workflow involving NCBI Blast is available at https://github.com/nextflow-io/blast-example. Rather than installing Blast on our local Linux instance, it is much easier to pull a container preloaded with Blast and other software that the pipeline depends on.\n\nThe `nextflow.config` file for the Blast example (below) specifies that process logic is encapsulated in the container `nextflow/examples` available from Docker Hub (https://hub.docker.com/r/nextflow/examples).\n\n- On GitHub: [nextflow-io/blast-example/nextflow.config](https://github.com/nextflow-io/blast-example/blob/master/nextflow.config)\n\n ```groovy\n manifest {\n nextflowVersion = '>= 20.01.0'\n }\n\n process {\n container = 'nextflow/examples'\n }\n ```\n\n- Run the _blast-example_ pipeline that resides on GitHub directly from WSL and specify Docker as the container runtime using the command below:\n\n ```console\n $ nextflow run blast-example -with-docker\n N E X T F L O W ~ version 21.04.2\n Launching `nextflow-io/blast-example` [sharp_raman] - revision: 25922a0ae6 [master]\n executor > local (2)\n [aa/a9f056] process > blast (1) [100%] 1 of 1 ✔\n [b3/c41401] process > extract (1) [100%] 1 of 1 ✔\n matching sequences:\n >lcl|1ABO:B unnamed protein product\n MNDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNS\n >lcl|1ABO:A unnamed protein product\n MNDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNS\n >lcl|1YCS:B unnamed protein product\n PEITGQVSLPPGKRTNLRKTGSERIAHGMRVKFNPLPLALLLDSSLEGEFDLVQRIIYEVDDPSLPNDEGITALHNAVCA\n GHTEIVKFLVQFGVNVNAADSDGWTPLHCAASCNNVQVCKFLVESGAAVFAMTYSDMQTAADKCEEMEEGYTQCSQFLYG\n VQEKMGIMNKGVIYALWDYEPQNDDELPMKEGDCMTIIHREDEDEIEWWWARLNDKEGYVPRNLLGLYPRIKPRQRSLA\n >lcl|1IHD:C unnamed protein product\n LPNITILATGGTIAGGGDSATKSNYTVGKVGVENLVNAVPQLKDIANVKGEQVVNIGSQDMNDNVWLTLAKKINTDCDKT\n ```\n\n- Nextflow executes the pipeline directly from the GitHub repository and automatically pulls the nextflow/examples container from Docker Hub if the image is unavailable locally. The pipeline then executes the two containerized workflow steps (blast and extract). The pipeline then collects the sequences into a single file and prints the result file content when pipeline execution completes.\n\n## Configuring an XServer for the Nextflow Console\n\nPipeline developers will probably want to use the Nextflow Console at some point. The Nextflow Console's REPL (read-eval-print loop) environment allows developers to quickly test parts of scripts or Nextflow code segments interactively.\n\nThe Nextflow Console is launched from the Linux command line. However, the Groovy-based interface requires an X-Windows environment to run. You can set up X-Windows with WSL using the procedure below. A good article on this same topic is provided [here](https://medium.com/javarevisited/using-wsl-2-with-x-server-linux-on-windows-a372263533c3).\n\n- Download an X-Windows server for Windows. In this example, we use the _VcXsrv Windows X Server_ available from source forge at https://sourceforge.net/projects/vcxsrv/.\n\n- Accept all the defaults when running the automated installer. The X-server will end up installed in `c:\\Program Files\\VcXsrv`.\n\n- The automated installation of VcXsrv will create an _\"XLaunch\"_ shortcut on your desktop. 
It is a good idea to create your own shortcut with a customized command line so that you don't need to interact with the XLaunch interface every time you start the X-server.\n\n- Right-click on the Windows desktop to create a new shortcut, give it a meaningful name, and insert the following for the shortcut target:\n\n ```powershell\n \"C:\\Program Files\\VcXsrv\\vcxsrv.exe\" :0 -ac -terminate -lesspointer -multiwindow -clipboard -wgl -dpi auto\n ```\n\n- Inspecting the new shortcut properties, it should look something like this:\n\n ![X-Server (vcxsrc) Properties](/img/xserver.png)\n\n- Double-click on the new shortcut desktop icon to test it. Unfortunately, the X-server runs in the background. When running the X-server in multiwindow mode (which we recommend), it is not obvious whether the X-server is running.\n\n- One way to check that the X-server is running is to use the Microsoft Task Manager and look for the XcSrv process running in the background. You can also verify it is running by using the `netstat` command from with PowerShell on Windows to ensure that the X-server is up and listening on the appropriate ports. Using `netstat`, you should see output like the following:\n\n ```powershell\n PS C:\\WINDOWS\\system32> **netstat -abno | findstr 6000**\n TCP 0.0.0.0:6000 0.0.0.0:0 LISTENING 35176\n TCP 127.0.0.1:6000 127.0.0.1:56516 ESTABLISHED 35176\n TCP 127.0.0.1:6000 127.0.0.1:56517 ESTABLISHED 35176\n TCP 127.0.0.1:6000 127.0.0.1:56518 ESTABLISHED 35176\n TCP 127.0.0.1:56516 127.0.0.1:6000 ESTABLISHED 35176\n TCP 127.0.0.1:56517 127.0.0.1:6000 ESTABLISHED 35176\n TCP 127.0.0.1:56518 127.0.0.1:6000 ESTABLISHED 35176\n TCP 172.28.192.1:6000 172.28.197.205:46290 TIME_WAIT 0\n TCP [::]:6000 [::]:0 LISTENING 35176\n ```\n\n- At this point, the X-server is up and running and awaiting a connection from a client.\n\n- Within Ubuntu in WSL, we need to set up the environment to communicate with the X-Windows server. The shell variable DISPLAY needs to be set pointing to the IP address of the X-server and the instance of the X-windows server.\n\n- The shell script below will set the DISPLAY variable appropriately and export it to be available to X-Windows client applications launched from the shell. This scripting trick works because WSL sees the Windows host as the nameserver and this is the same IP address that is running the X-Server. You can echo the $DISPLAY variable after setting it to verify that it is set correctly.\n\n ```console\n $ export DISPLAY=$(cat /etc/resolv.conf | grep nameserver | awk '{print $2}'):0.0\n $ echo $DISPLAY\n 172.28.192.1:0.0\n ```\n\n- Add this command to the end of your `.bashrc` file in the Linux home directory to avoid needing to set the DISPLAY variable every time you open a new window. This way, if the IP address of the desktop or laptop changes, the DISPLAY variable will be updated accordingly.\n\n ```bash\n cd ~\n vi .bashrc\n ```\n\n ```bash\n # set the X-Windows display to connect to VcXsrv on Windows\n export DISPLAY=$(cat /etc/resolv.conf | grep nameserver | awk '{print $2}'):0.0\n \".bashrc\" 120L, 3912C written\n ```\n\n- Use an X-windows client to make sure that the X- server is working. 
Since X-windows clients are not installed by default, download an xterm client as follows via the Linux shell:\n\n ```bash\n sudo apt install xterm\n ```\n\n- Assuming that the X-server is up and running on Windows, and the Linux DISPLAY variable is set correctly, you're ready to test X-Windows.\n\n Before testing X-Windows, do yourself a favor and temporarily disable the Windows Firewall. The Windows Firewall will very likely block ports around 6000, preventing client requests on WSL from connecting to the X-server. You can find this under Firewall & network protection on Windows. Clicking the \"Private Network\" or \"Public Network\" options will show you the status of the Windows Firewall and indicate whether it is on or off.\n\n Depending on your installation, you may be running a specific Firewall. In this example, we temporarily disable the McAfee LiveSafe Firewall as shown:\n\n ![Ensure that the Firewall is not interfering](/img/firewall.png)\n\n- With the Firewall disabled, you can attempt to launch the xterm client from the Linux shell:\n\n ```bash\n xterm &\n ```\n\n- If everything is working correctly, you should see the new xterm client appear under Windows. The xterm is executing on Ubuntu under WSL but displays alongside other Windows on the Windows desktop. This is what is meant by \"multiwindow\" mode.\n\n ![Launch an xterm to verify functionality](/img/xterm.png)\n\n- Now that you know X-Windows is working correctly turn the Firewall back on, and adjust the settings to allow traffic to and from the required port. Ideally, you want to open only the minimal set of ports and services required. In the case of the McAfee Firewall, getting X-Windows to work required changing access to incoming and outgoing ports to _\"Open ports to Work and Home networks\"_ for the `vcxsrv.exe` program only as shown:\n\n ![Allowing access to XServer traffic](/img/xserver_setup.png)\n\n- With the X-server running, the `DISPLAY` variable set, and the Windows Firewall configured correctly, we can now launch the Nextflow Console from the shell as shown:\n\n ```bash\n nextflow console\n ```\n\n The command above opens the Nextflow REPL console under X-Windows.\n\n ![Nextflow REPL Console under X-Windows](/img/repl_console.png)\n\nInside the Nextflow console, you can enter Groovy code and run it interactively, a helpful feature when developing and debugging Nextflow pipelines.\n\n# Installing Git\n\nCollaborative source code management systems such as BitBucket, GitHub, and GitLab are used to develop and share Nextflow pipelines. To be productive with Nextflow, you will want to install Git.\n\nAs explained earlier, VS Code operates in different contexts. When running VS Code in the context of Windows, VS Code will look for a local copy of Git. When using VS Code to operate against the remote WSL environment, a separate installation of Git installed on Ubuntu will be used. (Note that Git is installed by default on Ubuntu 20.04)\n\nDevelopers will probably want to use Git both from within a Windows context and a Linux context, so we need to make sure that Git is present in both environments.\n\n### Step 1: Install Git on Windows (optional)\n\n- Download the install the 64-bit Windows version of Git from https://git-scm.com/downloads.\n\n- Click on the Git installer from the Downloads directory, and click through the default installation options. During the install process, you will be asked to select the default editor to be used with Git. (VIM, Notepad++, etc.). 
Select Visual Studio Code (assuming that this is the IDE that you plan to use for Nextflow).\n\n ![Installing Git on Windows](/img/git-install.png)\n\n- The Git installer will prompt you for additional settings. If you are not sure, accept the defaults. When asked, adjust the `PATH` variable to use the recommended option, making the Git command line available from Git Bash, the Command Prompt, and PowerShell.\n\n- After installation Git Bash, Git GUI, and GIT CMD will appear as new entries under the Start menu. If you are running Git from PowerShell, you will need to open a new Windows to force PowerShell to reset the path variable. By default, Git installs in C:\\Program Files\\Git.\n\n- If you plan to use Git from the command line, GitHub provides a useful cheatsheet [here](https://training.github.com/downloads/github-git-cheat-sheet.pdf).\n\n- After installing Git, from within VS Code (in the context of the local host), select the Source Control icon from the left pane of the VS Code interface as shown. You can open local folders that contain a git repository or clone repositories from GitHub or your preferred source code management system.\n\n ![Using Git within VS Code](/img/git-vscode.png)\n\n- Documentation on using Git with Visual Studio Code is provided at https://code.visualstudio.com/docs/editor/versioncontrol\n\n### Step 2: Install Git on Linux\n\n- Open a Remote VS Code Window on **\\*WSL: Ubuntu 20.04\\*** (By selecting the green icon on the lower-left corner of the VS code interface.)\n\n- Git should already be installed in `/usr/bin`, but you can validate this from the Ubuntu shell:\n\n ```console\n $ git --version\n git version 2.25.1\n ```\n\n- To get started using Git with VS Code Remote on WSL, select the _Source Control icon_ on the left panel of VS code. Assuming VS Code Remote detects that Git is installed on Linux, you should be able to _Clone a Repository_.\n\n- Select \"Clone Repository,\" and when prompted, clone the GitHub repo for the Blast example that we used earlier - https://github.com/nextflow-io/blast-example. Clone this repo into your home directory on Linux. You should see _blast-example_ appear as a source code repository within VS code as shown:\n\n ![Using Git within VS Code](/img/git-linux-1.png)\n\n- Select the _Explorer_ panel in VS Code to see the cloned _blast-example_ repo. Now we can explore and modify the pipeline code using the IDE.\n\n ![Using Git within VS Code](/img/git-linux-2.png)\n\n- After making modifications to the pipeline, we can execute the _local copy_ of the pipeline either from the Linux shell or directly via the Terminal window in VS Code as shown:\n\n ![Using Git within VS Code](/img/git-linux-3.png)\n\n- With the Docker VS Code extension, users can select the Docker icon from the left code to view containers and images associated with the Nextflow pipeline.\n\n- Git commands are available from within VS Code by selecting the _Source Control_ icon on the left panel and selecting the three dots (…) to the right of SOURCE CONTROL. Some operations such as pushing or committing code will require that VS Code be authenticated with your GitHub credentials.\n\n ![Using Git within VS Code](/img/git-linux-4.png)\n\n## Summary\n\nWith WSL2, Windows 10 is an excellent environment for developing and testing Nextflow pipelines. 
Users can take advantage of the power and convenience of a Linux command line environment while using Windows-based IDEs such as VS-Code with full support for containers.\n\nPipelines developed in the Windows environment can easily be extended to compute environments in the cloud.\n\nWhile installing Nextflow itself is straightforward, installing and testing necessary components such as WSL, Docker, an IDE, and Git can be a little tricky. Hopefully readers will find this guide helpful.\n", - "images": [] + "images": [], + "author": "Evan Floden", + "tags": "windows,learning" }, { "slug": "2022/caching-behavior-analysis", "title": "Analyzing caching behavior of pipelines", "date": "2022-11-10T00:00:00.000Z", "content": "\nThe ability to resume an analysis (i.e. caching) is one of the core strengths of Nextflow. When developing pipelines, this allows us to avoid re-running unchanged processes by simply appending `-resume` to the `nextflow run` command. Sometimes, tasks may be repeated for reasons that are unclear. In these cases it can help to look into the caching mechanism, to understand why a specific process was re-run.\n\nWe have previously written about Nextflow's [resume functionality](https://www.nextflow.io/blog/2019/demystifying-nextflow-resume.html) as well as some [troubleshooting strategies](https://www.nextflow.io/blog/2019/troubleshooting-nextflow-resume.html) to gain more insights on the caching behavior.\n\nIn this post, we will take a more hands-on approach and highlight some strategies which we can use to understand what is causing a particular process (or processes) to re-run, instead of using the cache from previous runs of the pipeline. To demonstrate the process, we will introduce a minor change into one of the process definitions in the the [nextflow-io/rnaseq-nf](https://github.com/nextflow-io/rnaseq-nf) pipeline and investigate how it affects the overall caching behavior when compared to the initial execution of the pipeline.\n\n### Local setup for the test\n\nFirst, we clone the [nextflow-io/rnaseq-nf](https://github.com/nextflow-io/rnaseq-nf) pipeline locally:\n\n```bash\n$ git clone https://github.com/nextflow-io/rnaseq-nf\n$ cd rnaseq-nf\n```\n\nIn the examples below, we have used Nextflow `v22.10.0`, Docker `v20.10.8` and `Java v17 LTS` on MacOS.\n\n### Pipeline flowchart\n\nThe flowchart below can help in understanding the design of the pipeline and the dependencies between the various tasks.\n\n![rnaseq-nf](/img/rnaseq-nf.base.png)\n\n### Logs from initial (fresh) run\n\nAs a reminder, Nextflow generates a unique task hash, e.g. 22/7548fa… for each task in a workflow. The hash takes into account the complete file path, the last modified timestamp, container ID, content of script directive among other factors. If any of these change, the task will be re-executed. Nextflow maintains a list of task hashes for caching and traceability purposes. You can learn more about task hashes in the article [Troubleshooting Nextflow resume](https://nextflow.io/blog/2019/troubleshooting-nextflow-resume.html).\n\nTo have something to compare to, we first need to generate the initial hashes for the unchanged processes in the pipeline. We save these in a file called `fresh_run.log` and use them later on as \"ground-truth\" for the analysis. 
In order to save the process hashes we use the `-dump-hashes` flag, which prints them to the log.\n\n**TIP:** We rely upon the [`-log` option](https://www.nextflow.io/docs/latest/cli.html#execution-logs) in the `nextflow` command line interface to be able to supply a custom log file name instead of the default `.nextflow.log`.\n\n```console\n$ nextflow -log fresh_run.log run ./main.nf -profile docker -dump-hashes\n\n[...truncated…]\nexecutor > local (4)\n[d5/57c2bb] process > RNASEQ:INDEX (ggal_1_48850000_49020000) [100%] 1 of 1 ✔\n[25/433b23] process > RNASEQ:FASTQC (FASTQC on ggal_gut) [100%] 1 of 1 ✔\n[03/23372f] process > RNASEQ:QUANT (ggal_gut) [100%] 1 of 1 ✔\n[38/712d21] process > MULTIQC [100%] 1 of 1 ✔\n[...truncated…]\n```\n\n### Edit the `FastQC` process\n\nAfter the initial run of the pipeline, we introduce a change in the `fastqc.nf` module, hard coding the number of threads which should be used to run the `FASTQC` process via Nextflow's [`cpus` directive](https://www.nextflow.io/docs/latest/process.html#cpus).\n\nHere's the output of `git diff` on the contents of `modules/fastqc/main.nf` file:\n\n```diff\n--- a/modules/fastqc/main.nf\n+++ b/modules/fastqc/main.nf\n@@ -4,6 +4,7 @@ process FASTQC {\n tag \"FASTQC on $sample_id\"\n conda 'bioconda::fastqc=0.11.9'\n publishDir params.outdir, mode:'copy'\n+ cpus 2\n\n input:\n tuple val(sample_id), path(reads)\n@@ -13,6 +14,6 @@ process FASTQC {\n\n script:\n \"\"\"\n- fastqc.sh \"$sample_id\" \"$reads\"\n+ fastqc.sh \"$sample_id\" \"$reads\" -t ${task.cpus}\n \"\"\"\n }\n```\n\n### Logs from the follow up run\n\nNext, we run the pipeline again with the `-resume` option, which instructs Nextflow to rely upon the cached results from the previous run and only run the parts of the pipeline which have changed. As before, we instruct Nextflow to dump the process hashes, this time in a file called `resumed_run.log`.\n\n```console\n$ nextflow -log resumed_run.log run ./main.nf -profile docker -dump-hashes -resume\n\n[...truncated…]\nexecutor > local\n[d5/57c2bb] process > RNASEQ:INDEX (ggal_1_48850000_49020000) [100%] 1 of 1, cached: 1 ✔\n[55/15b609] process > RNASEQ:FASTQC (FASTQC on ggal_gut) [100%] 1 of 1 ✔\n[03/23372f] process > RNASEQ:QUANT (ggal_gut) [100%] 1 of 1, cached: 1 ✔\n[f3/f1ccb4] process > MULTIQC [100%] 1 of 1 ✔\n[...truncated…]\n```\n\n## Analysis of cache hashes\n\nFrom the summary of the command line output above, we can see that the `RNASEQ:FASTQC (FASTQC on ggal_gut)` and `MULTIQC` processes were re-run while the others were cached. To understand why, we can examine the hashes generated by the processes from the logs of the `fresh_run` and `resumed_run`.\n\nFor the analysis, we need to keep in mind that:\n\n1. The time-stamps are expected to differ and can be safely ignored to narrow down the `grep` pattern to the Nextflow `TaskProcessor` class.\n\n2. The _order_ of the log entries isn't fixed, due to the nature of the underlying parallel computation dataflow model used by Nextflow. For example, in our example below, `FASTQC` ran first in `fresh_run.log` but wasn’t the first logged process in `resumed_run.log`.\n\n### Find the process level hashes\n\nWe can use standard Unix tools like `grep`, `cut` and `sort` to address these points and filter out the relevant information:\n\n1. Use `grep` to isolate log entries with `cache hash` string\n2. Remove the prefix time-stamps using `cut -d ‘-’ -f 3`\n3. Remove the caching mode related information using `cut -d ';' -f 1`\n4. 
Sort the lines based on process names using `sort` to have a standard order before comparison\n5. Use `tee` to print the resultant strings to the terminal and simultaneously save to a file\n\nNow, let’s apply these transformations to the `fresh_run.log` as well as `resumed_run.log` entries.\n\n- `fresh_run.log`\n\n```console\n$ cat ./fresh_run.log | grep 'INFO.*TaskProcessor.*cache hash' | cut -d '-' -f 3 | cut -d ';' -f 1 | sort | tee ./fresh_run.tasks.log\n\n [MULTIQC] cache hash: 167d7b39f7efdfc49b6ff773f081daef\n [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 47e8c58d92dbaafba3c2ccc4f89f53a4\n [RNASEQ:INDEX (ggal_1_48850000_49020000)] cache hash: ac8be293e1d57f3616cdd0adce34af6f\n [RNASEQ:QUANT (ggal_gut)] cache hash: d8b88e3979ff9fe4bf64b4e1bfaf4038\n```\n\n- `resumed_run.log`\n\n```console\n$ cat ./resumed_run.log | grep 'INFO.*TaskProcessor.*cache hash' | cut -d '-' -f 3 | cut -d ';' -f 1 | sort | tee ./resumed_run.tasks.log\n\n [MULTIQC] cache hash: d3f200c56cf00b223282f12f06ae8586\n [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 92478eeb3b0ff210ebe5a4f3d99aed2d\n [RNASEQ:INDEX (ggal_1_48850000_49020000)] cache hash: ac8be293e1d57f3616cdd0adce34af6f\n [RNASEQ:QUANT (ggal_gut)] cache hash: d8b88e3979ff9fe4bf64b4e1bfaf4038\n```\n\n### Inference from process top-level hashes\n\nComputing a hash is a multi-step process and various factors contribute to it such as the inputs of the process, platform, time-stamps of the input files and more ( as explained in [Demystifying Nextflow resume](https://www.nextflow.io/blog/2019/demystifying-nextflow-resume.html) blog post) . The change we made in the task level CPUs directive and script section of the `FASTQC` process triggered a re-computation of hashes:\n\n```diff\n--- ./fresh_run.tasks.log\n+++ ./resumed_run.tasks.log\n@@ -1,4 +1,4 @@\n- [MULTIQC] cache hash: dccabcd012ad86e1a2668e866c120534\n- [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 94be8c84f4bed57252985e6813bec401\n+ [MULTIQC] cache hash: c5a63560338596282682cc04ff97e436\n+ [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 54aa712db7c8248e7f31d5fb6535ff9d\n [RNASEQ:INDEX (ggal_1_48850000_49020000)] cache hash: 356aaa7524fb071f258480ba07c67b3c\n [RNASEQ:QUANT (ggal_gut)] cache hash: 169ced0fc4b047eaf91cd31620b22540\n\n\n```\n\nEven though we only introduced changes in `FASTQC`, the `MULTIQC` process was re-run since it relies upon the output of the `FASTQC` process. 
Any task that has its cache hash invalidated triggers a rerun of all downstream steps:\n\n![rnaseq-nf after modification](/img/rnaseq-nf.modified.png)\n\n### Understanding why `FASTQC` was re-run\n\nWe can see the full list of `FASTQC` process hashes within the `fresh_run.log` file\n\n```console\n\n[...truncated…]\nNov-03 20:19:13.827 [Actor Thread 6] INFO nextflow.processor.TaskProcessor - [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 54aa712db7c8248e7f31d5fb6535ff9d; mode: STANDARD; entries:\n 1a0e496fef579b22998f099981b494f9 [java.util.UUID] a11bf24f-638a-42d6-8b50-48d3be637d54\n 195c7faea83c75f2340eb710d8486d2a [java.lang.String] RNASEQ:FASTQC\n 2bea0eee5e384bd6082a173772e939eb [java.lang.String] \"\"\"\n fastqc.sh \"$sample_id\" \"$reads\" -t ${task.cpus}\n \"\"\"\n\n 8e58c0cec3bde124d5d932c7f1579395 [java.lang.String] quay.io/nextflow/rnaseq-nf:v1.1\n 7ec7cbd71ff757f5fcdbaa760c9ce6de [java.lang.String] sample_id\n 16b4905b1545252eb7cbfe7b2a20d03d [java.lang.String] ggal_gut\n 553096c532e666fb42214fdf0520fe4a [java.lang.String] reads\n 6a5d50e32fdb3261e3700a30ad257ff9 [nextflow.util.ArrayBag] [FileHolder(sourceObj:/home/abhinav/rnaseq-nf/data/ggal/ggal_gut_1.fq, storePath:/home/abhinav/rnaseq-nf/data/ggal/ggal_gut_1.fq, stageName:ggal_gut_1.fq), FileHolder(sourceObj:/home/abhinav/rnaseq-nf/data/ggal/ggal_gut_2.fq, storePath:/home/abhinav/rnaseq-nf/data/ggal/ggal_gut_2.fq, stageName:ggal_gut_2.fq)]\n 4f9d4b0d22865056c37fb6d9c2a04a67 [java.lang.String] $\n 16fe7483905cce7a85670e43e4678877 [java.lang.Boolean] true\n 80a8708c1f85f9e53796b84bd83471d3 [java.util.HashMap$EntrySet] [task.cpus=2]\n f46c56757169dad5c65708a8f892f414 [sun.nio.fs.UnixPath] /home/abhinav/rnaseq-nf/bin/fastqc.sh\n[...truncated…]\n\n```\n\nWhen we isolate and compare the log entries for `FASTQC` between `fresh_run.log` and `resumed_run.log`, we see the following diff:\n\n```diff\n--- ./fresh_run.fastqc.log\n+++ ./resumed_run.fastqc.log\n@@ -1,8 +1,8 @@\n-INFO nextflow.processor.TaskProcessor - [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 94be8c84f4bed57252985e6813bec401; mode: STANDARD; entries:\n+INFO nextflow.processor.TaskProcessor - [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 54aa712db7c8248e7f31d5fb6535ff9d; mode: STANDARD; entries:\n 1a0e496fef579b22998f099981b494f9 [java.util.UUID] a11bf24f-638a-42d6-8b50-48d3be637d54\n 195c7faea83c75f2340eb710d8486d2a [java.lang.String] RNASEQ:FASTQC\n- 43e5a23fc27129f92a6c010823d8909b [java.lang.String] \"\"\"\n- fastqc.sh \"$sample_id\" \"$reads\"\n+ 2bea0eee5e384bd6082a173772e939eb [java.lang.String] \"\"\"\n+ fastqc.sh \"$sample_id\" \"$reads\" -t ${task.cpus}\n\n```\n\nObservations from the diff:\n\n1. We can see that the content of the script has changed, highlighting the new `$task.cpus` part of the command.\n2. 
There is a new entry in the `resumed_run.log` showing that the content of the process level directive `cpus` has been added.\n\nIn other words, the diff from log files is confirming our edits.\n\n### Understanding why `MULTIQC` was re-run\n\nNow, we apply the same analysis technique for the `MULTIQC` process in both log files:\n\n```diff\n--- ./fresh_run.multiqc.log\n+++ ./resumed_run.multiqc.log\n@@ -1,4 +1,4 @@\n-INFO nextflow.processor.TaskProcessor - [MULTIQC] cache hash: dccabcd012ad86e1a2668e866c120534; mode: STANDARD; entries:\n+INFO nextflow.processor.TaskProcessor - [MULTIQC] cache hash: c5a63560338596282682cc04ff97e436; mode: STANDARD; entries:\n 1a0e496fef579b22998f099981b494f9 [java.util.UUID] a11bf24f-638a-42d6-8b50-48d3be637d54\n cd584abbdbee0d2cfc4361ee2a3fd44b [java.lang.String] MULTIQC\n 56bfc44d4ed5c943f30ec98b22904eec [java.lang.String] \"\"\"\n@@ -9,8 +9,9 @@\n\n 8e58c0cec3bde124d5d932c7f1579395 [java.lang.String] quay.io/nextflow/rnaseq-nf:v1.1\n 14ca61f10a641915b8c71066de5892e1 [java.lang.String] *\n- cd0e6f1a382f11f25d5cef85bd87c3f4 [nextflow.util.ArrayBag] [FileHolder(sourceObj:/home/abhinav/rnaseq-nf/work/03/23372f156e80deb4d7183c5f509274/ggal_gut, storePath:/home/abhinav/rnaseq-nf/work/03/23372f156e80deb4d7183c5f509274/ggal_gut, stageName:ggal_gut), FileHolder(sourceObj:/home/abhinav/rnaseq-nf/work/25/433b23af9e98294becade95db6bd76/fastqc_ggal_gut_logs, storePath:/home/abhinav/rnaseq-nf/work/25/433b23af9e98294becade95db6bd76/fastqc_ggal_gut_logs, stageName:fastqc_ggal_gut_logs)]\n+ 18966b473f7bdb07f4f7f4c8445be1f5 [nextflow.util.ArrayBag] [FileHolder(sourceObj:/home/abhinav/rnaseq-nf/work/03/23372f156e80deb4d7183c5f509274/ggal_gut, storePath:/home/abhinav/rnaseq-nf/work/03/23372f156e80deb4d7183c5f509274/ggal_gut, stageName:ggal_gut), FileHolder(sourceObj:/home/abhinav/rnaseq-nf/work/55/15b60995682daf79ecb64bcbb8e44e/fastqc_ggal_gut_logs, storePath:/home/abhinav/rnaseq-nf/work/55/15b60995682daf79ecb64bcbb8e44e/fastqc_ggal_gut_logs, stageName:fastqc_ggal_gut_logs)]\n d271b8ef022bbb0126423bf5796c9440 [java.lang.String] config\n 5a07367a32cd1696f0f0054ee1f60e8b [nextflow.util.ArrayBag] [FileHolder(sourceObj:/home/abhinav/rnaseq-nf/multiqc, storePath:/home/abhinav/rnaseq-nf/multiqc, stageName:multiqc)]\n 4f9d4b0d22865056c37fb6d9c2a04a67 [java.lang.String] $\n 16fe7483905cce7a85670e43e4678877 [java.lang.Boolean] true\n```\n\nHere, the highlighted diffs show the directory of the input files, changing as a result of `FASTQC` being re-run; as a result `MULTIQC` has a new hash and has to be re-run as well.\n\n## Conclusion\n\nDebugging the caching behavior of a pipeline can be tricky, however a systematic analysis can help to uncover what is causing a particular process to be re-run.\n\nWhen analyzing large datasets, it may be worth using the `-dump-hashes` option by default for all pipeline runs, avoiding needing to run the pipeline again to obtain the hashes in the log file in case of problems.\n\nWhile this process works, it is not trivial. We would love to see some community-driven tooling for a better cache-debugging experience for Nextflow, perhaps an `nf-cache` plugin? Stay tuned for an upcoming blog post describing how to extend and add new functionality to Nextflow using plugins.\n", - "images": [] + "images": [], + "author": "Abhinav Sharma", + "tags": "nextflow,cache" }, { "slug": "2022/czi-mentorship-round-1", @@ -325,63 +413,81 @@ "content": "\n## Introduction\n\n
![Word cloud of scientific interest keywords](/img/mentorships-round1-wordcloud.png)\n_Word cloud of scientific interest keywords, averaged across all applications._
\n\nOur recent [The State of the Workflow 2022: Community Survey Results](https://seqera.io/blog/state-of-the-workflow-2022-results/) showed that Nextflow and nf-core have a strong global community with a high level of engagement in several countries. As the community continues to grow, we aim to prioritize inclusivity for everyone through active outreach to groups with low representation.\n\nThanks to funding from our Chan Zuckerberg Initiative Diversity and Inclusion grant we established an international Nextflow and nf-core mentoring program with the aim of empowering those from underrepresented groups. With the first round of the mentorship now complete, we look back at the success of the program so far.\n\nFrom almost 200 applications, five pairs of mentors and mentees were selected for the first round of the program. Over the following four months they met weekly to work on Nextflow based projects. We attempted to pair mentors and mentees based on their time zones and scientific interests. Project tasks were left up to the individuals and so tailored to the mentee's scientific interests and schedules.\n\nPeople worked on things ranging from setting up Nextflow and nf-core on their institutional clusters to developing and implementing Nextflow and nf-core pipelines for next-generation sequencing data. Impressively, after starting the program knowing very little about Nextflow and nf-core, mentees finished the program being able to confidently develop and implement scalable and reproducible scientific workflows.\n\n![Map of mentor / mentee pairs](/img/mentorships-round1-map.png)
\n_The mentorship program was worldwide._\n\n## Ndeye Marième Top (mentee) & John Juma (mentor)\n\nFor the mentorship, Marième wanted to set up Nextflow and nf-core on the servers at the Institut Pasteur de Dakar in Senegal and learn how to develop / contribute to a pipeline. Her mentor was John Juma, from the ILRI/SANBI in Kenya.\n\nTogether, Marème overcame issues with containers and server privileges and developed her local config, learning about how to troubleshoot and where to find help along the way. By the end of the mentorship she was able to set up the [nf-core/viralrecon](https://nf-co.re/viralrecon) pipeline for the genomic surveillance analysis of SARS-Cov2 sequencing data from Senegal as well as 17 other countries in West Africa, ready for submission to [GISAID](https://gisaid.org/). She also got up to speed with the [nf-core/mag](https://nf-co.re/mag) pipeline for metagenomic analysis.\n\n
> \"Having someone experienced who can guide you in my learning process. My mentor really helped me understand and focus on the practical aspects since my main concern was having the pipelines correctly running in my institution.\" - Marième Top (mentee)\n\n> \"The program was awesome. I had a chance to impart nextflow principles to someone I have never met before. Fully virtual, the program instilled some sense of discipline in terms of setting and meeting objectives.\" - John Juma (mentor)
\n\n## Philip Ashton (mentee) & Robert Petit (mentor)\n\nPhilip wanted to move up the Nextflow learning curve and set up nf-core workflows at Kamuzu University of Health Sciences in Malawi. His mentor was Robert Petit from the Wyoming Public Health Laboratory in the USA. Robert has developed the [Bactopia](https://bactopia.github.io/) pipeline for the analysis of bacterial pipeline and it was Philip’s aim to get this running for his group in Malawi.\n\nRobert helped Philip learn Nextflow, enabling him to independently deploy DSL2 pipelines and process genomes using Nextflow Tower. Philip is already using his new found skills to answer important public health questions in Malawi and is now passing his knowledge to other staff and students at his institute. Even though the mentorship program has finished, Philip and Rob will continue a collaboration and have plans to deploy pipelines that will benefit public health in the future.\n\n
> \"I tried to learn nextflow independently some time ago, but abandoned it for the more familiar snakemake. Thanks to Robert’s mentorship I’m now over the learning curve and able to deploy nf-core pipelines and use cloud resources more efficiently via Nextflow Tower\" - Phil Ashton (mentee)\n\n> \"I found being a mentor to be a rewarding experience and a great opportunity to introduce mentees into the Nextflow/nf-core community. Phil and I were able to accomplish a lot in the span of a few months, and now have many plans to collaborate in the future.\" - Robert Petit (mentor)
\n\n## Kalayanee Chairat (mentee) & Alison Meynert (mentor)\n\nKalayanee’s goal for the mentorship program was to set up and run Nextflow and nf-core pipelines at the local infrastructure at the King Mongkut’s University of Technology Thonburi in Thailand. Kalayanee was mentored by Alison Meynert, from the University of Edinburgh in the United Kingdom.\n\nWorking with Alison, Kalayanee learned about Nextflow and nf-core and the requirements for working with Slurm and Singularity. Together, they created a configuration profile that Kalayanee and others at her institute can use - they have plans to submit this to [nf-core/configs](https://github.com/nf-core/configs) as an institutional profile. Now she is familiar with these tools, Kalayanee is using [nf-core/sarek](https://nf-co.re/sarek) and [nf-core/rnaseq](https://nf-co.re/rnaseq) to analyze 100s of samples of her own next-generation sequencing data on her local HPC environment.\n\n
> \"The mentorship program is a great start to learn to use and develop analysis pipelines built using Nextflow. I gained a lot of knowledge through this program. I am also very lucky to have Dr. Alison Meynert as my mentor. She is very knowledgeable, kind and willing to help in every step.\" - Kalayanee Chairat (mentee)\n\n> \"It was a great experience for me to work with my mentee towards her goal. The process solidified some of my own topical knowledge and I learned new things along the way as well.\" - Alison Meynert (mentor)
\n\n## Edward Lukyamuzi (mentee) & Emilio Garcia-Rios (mentor)\n\nFor the mentoring program Edward’s goal was to understand the fundamental components of a Nextflow script and write a Nextflow pipeline for analyzing mosquito genomes. Edward was mentored by Emilio Garcia-Rios, from the EMBL-EBI in the United Kingdom.\n\nEdward learned the fundamental concepts of Nextflow, including channels, processes and operators. Edward works with sequencing data from the mosquito genome - with help from Emilio he wrote a Nextflow pipeline with an accompanying Dockerfile for the alignment of reads and genotyping of SNPs. Edward will continue to develop his pipeline and wants to become more involved with the Nextflow and nf-core community by attending the nf-core hackathons. Edward is also very keen to help others learn Nextflow and expressed an interest in being part of this program again as a mentor.\n\n
> \"Learning Nextflow can be a steep curve. Having a partner to give you a little push might be what facilitates adoption of Nextflow into your daily routine.\" - Edward Lukyamuzi (mentee)\n\n> \"I would like more people to discover and learn the benefits using Nextflow has. Being a mentor in this program can help me collaborate with other colleagues and be a mentor in my institute as well.\" - Emilio Garcia-Rios (mentor)
\n\n## Suchitra Thapa (mentee) & Maxime Borry (mentor)\n\nSuchitra started the program to learn about running Nextflow pipelines but quickly moved on to pipeline development and deployment on the cloud. Suchitra and Maxime encountered some technical challenges during the mentorship, including difficulties with internet connectivity and access to computational platforms for analysis. Despite this, with help from Maxime, Suchitra applied her newly acquired skills and made substantial progress converting the [metaphlankrona](https://github.com/suchitrathapa/metaphlankrona) pipeline for metagenomic analysis of microbial communities from Nextflow DSL1 to DSL2 syntax.\n\nSuchitra will be sharing her work and progress on the pipeline as a poster at the [Nextflow Summit 2022](https://summit.nextflow.io/speakers/suchitra-thapa/).\n\n
> \"This mentorship was one of the best organized online learning opportunities that I have attended so far. With time flexibility and no deadline burden, you can easily fit this mentorship into your busy schedule. I would suggest everyone interested to definitely go for it.\" - Suchitra Thapa (mentee)\n\n> \"This mentorship program was a very fruitful and positive experience, and the satisfaction to see someone learning and growing their bioinformatics skills is very rewarding.\" - Maxime Borry (mentor)
\n\n## Conclusion\n\nFeedback from the first round of the mentorship program was overwhelmingly positive. Both mentors and mentees found the experience to be a rewarding opportunity and were grateful for taking part. Everyone who participated in the program said that they would encourage others to be a part of it in the future.\n\n
> \"This is an exciting program that can help us make use of curated pipelines to advance open science. I don't mind repeating the program!\" - John Juma (mentor)
\n\n![Screenshot of final zoom meetup](/img/mentorships-round1-zoom.png)\n\nAs the Nextflow and nf-core communities continue to grow, the mentorship program will have long-term benefits beyond those that are immediately measurable. Mentees from the program are already acting as positive role models and contributing new perspectives to the wider community. Additionally, some mentees are interested in being mentors in the future and will undoubtedly support others as our communities continue to grow.\n\nWe were delighted with the high quality of this year’s mentors and mentees. Stay tuned for information about the next round of the Nextflow and nf-core mentorship program. Applications for round 2 will open on October 1, 2022. See [https://nf-co.re/mentorships](https://nf-co.re/mentorships) for details.\n\n

[Mentorship Round 2 - Details](https://nf-co.re/mentorships)
\n", "images": [ "/img/mentorships-round1-wordcloud.png" - ] + ], + "author": "Chris Hakkaart", + "tags": "nextflow,nf-core,czi,mentorship,training" }, { "slug": "2022/deploy-nextflow-pipelines-with-google-cloud-batch", "title": "Deploy Nextflow Pipelines with Google Cloud Batch!", "date": "2022-07-13T00:00:00.000Z", "content": "\nA key feature of Nextflow is the ability to abstract the implementation of data analysis pipelines so they can be deployed in a portable manner across execution platforms.\n\nAs of today, Nextflow supports a rich variety of HPC schedulers and all major cloud providers. Our goal is to support new services as they emerge to enable Nextflow users to take advantage of the latest technology and deploy pipelines on the compute environments that best fit their requirements.\n\nFor this reason, we are delighted to announce that Nextflow now supports [Google Cloud Batch](https://cloud.google.com/batch), a new fully managed batch service just announced for beta availability by Google Cloud.\n\n### A New On-Ramp to the Google Cloud\n\nGoogle Cloud Batch is a comprehensive cloud service suitable for multiple use cases, including HPC, AI/ML, and data processing. While it is similar to the Google Cloud Life Sciences API, used by many Nextflow users today, Google Cloud Batch offers a broader set of capabilities. As with Google Cloud Life Sciences, Google Cloud Batch automatically provisions resources, manages capacity, and allows batch workloads to run at scale. It offers several advantages, including:\n\n- The ability to re-use VMs across jobs steps to reduce overhead and boost performance.\n- Granular control over task execution, compute, and storage resources.\n- Infrastructure, application, and task-level logging.\n- Improved task parallelization, including support for multi-node MPI jobs, with support for array jobs, and subtasks.\n- Improved support for spot instances, which provides a significant cost saving when compared to regular instance.\n- Streamlined data handling and provisioning.\n\nA nice feature of Google Cloud Batch API, that fits nicely with Nextflow, is its built-in support for data ingestion from Google Cloud Storage buckets. A batch job can _mount_ a storage bucket and make it directly accessible to a container running a Nextflow task. This feature makes data ingestion and sharing resulting data sets more efficient and reliable than other solutions.\n\n### Getting started with Google Cloud Batch\n\nSupport for the Google Cloud Batch requires the latest release of Nextflow from the edge channel (version `22.07.1-edge` or later). If you don't already have it, you can install this release using these commands:\n\n```\nexport NXF_EDGE=1\ncurl get.nextflow.io | bash\n./nextflow -self-update\n```\n\nMake sure your Google account is allowed to access the Google Cloud Batch service by checking the [API & Service](https://console.cloud.google.com/apis/dashboard) dashboard.\n\nCredentials for accessing the service are picked up by Nextflow from your environment using the usual [Google Application Default Credentials](https://github.com/googleapis/google-auth-library-java#google-auth-library-oauth2-http) mechanism. That is, either via the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, or by using the following command to set up the environment:\n\n```\ngcloud auth application-default login\n```\n\nAfter authenticating yourself to Google Cloud, create a `nextflow.config` file and specify `google-batch` as the Nextflow executor. 
You will also need to specify the Google Cloud project where execution will occur and the Google Cloud Storage working directory for pipeline execution.\n\n```\ncat < nextflow.config\nprocess.executor = 'google-batch'\nworkDir = 'gs://YOUR-GOOGLE-BUCKET/scratch'\ngoogle.project = 'YOUR GOOGLE PROJECT ID'\nEOT\n```\n\nIn the above snippet replace `` with a Google Storage bucket of your choice where to store the pipeline output data and `` with your Google project Id where the computation will be deployed.\n\nWith this information, you are ready to start. You can verify that the integration is working by running the Nextflow “hello” pipeline as shown below:\n\n```\nnextflow run https://github.com/nextflow-io/hello\n```\n\n### Migrating Google Cloud Life Sciences pipelines to Google Cloud Batch\n\nGoogle Cloud Life Sciences users can easily migrate their pipelines to Google Cloud Batch by making just a few edits to their pipeline configuration settings. Simply replace the `google-lifesciences` executor with `google-batch`.\n\nFor each setting having the prefix `google.lifeScience.`, there is a corresponding `google.batch.` setting. Simply update these configuration settings to reflect the new service.\n\nThe usual process directives such as: [cpus](https://www.nextflow.io/docs/latest/process.html#cpus), [memory](https://www.nextflow.io/docs/latest/process.html#memory), [time](https://www.nextflow.io/docs/latest/process.html#time), [machineType](https://www.nextflow.io/docs/latest/process.html#machinetype) are natively supported by Google Cloud Batch, and should not be modified.\n\nFind out more details in the [Nextflow documentation](https://www.nextflow.io/docs/edge/google.html#cloud-batch).\n\n### 100% Open, Built to Scale\n\nThe Google Cloud Batch executor for Nextflow is offered as an open source contribution to the Nextflow project. The integration was developed by Google in collaboration with [Seqera Labs](https://seqera.io/). This is a validation of Google Cloud’s ongoing commitment to open source software (OSS) and a testament to the health and vibrancy of the Nextflow project. We wish to thank the entire Google Cloud Batch team, and Shamel Jacobs in particular, for their support of this effort.\n\n### Conclusion\n\nSupport for Google Cloud Batch further expands the wide range of computing platforms supported by Nextflow. It empowers Nextflow users to easily access cost-effective resources, and take full advantage of the rich capabilities of the Google Cloud. Above all, it enables researchers to easily scale and collaborate, improving their productivity, and resulting in better research outcomes.\n", - "images": [] + "images": [], + "author": "Paolo Di Tommaso", + "tags": "nextflow,google,cloud" }, { "slug": "2022/evolution-of-nextflow-runtime", "title": "Evolution of the Nextflow runtime", "date": "2022-03-24T00:00:00.000Z", "content": "\nSoftware development is a constantly evolving process that requires continuous adaptation to keep pace with new technologies, user needs, and trends. Likewise, changes are needed in order to introduce new capabilities and guarantee a sustainable development process.\n\nNextflow is no exception. This post will summarise the major changes in the evolution of the framework over the next 12 to 18 months.\n\n### Java baseline version\n\nNextflow runs on top of Java (or, more precisely, the Java virtual machine). So far, Java 8 has been the minimal version required to run Nextflow. 
However, this version was released 8 years ago and is going to reach its end-of-life status at the end of [this month](https://endoflife.date/java). For this reason, as of version 22.01.x-edge and the upcoming stable release 22.04.0, Nextflow will require Java version 11 or later for its execution. This also allows the introduction of new capabilities provided by the modern Java runtime.\n\nTip: If you are confused about how to install or upgrade Java on your computer, consider using [Sdkman](https://sdkman.io/). It’s a one-liner install tool that allows easy management of Java versions.\n\n### DSL2 as default syntax\n\nNextflow DSL2 has been introduced nearly [2 years ago](https://www.nextflow.io/blog/2020/dsl2-is-here.html) (how time flies!) and definitely represented a major milestone for the project. Established pipeline collections such as those in [nf-core](https://nf-co.re/pipelines) have migrated their pipelines to DSL2 syntax.\n\nThis is a confirmation that the DSL2 syntax represents a natural evolution for the project and is not considered to be just an experimental or alternative syntax.\n\nFor this reason, as for Nextflow version 22.03.0-edge and the upcoming 22.04.0 stable release, DSL2 syntax is going to be the **default** syntax version used by Nextflow, if not otherwise specified.\n\nIn practical terms, this means it will no longer be necessary to add the declaration `nextflow.enable.dsl = 2` at the top of your script or use the command line option `-dsl2 ` to enable the use of this syntax.\n\nIf you still want to continue to use DSL1 for your pipeline scripts, you will need to add the declaration `nextflow.enable.dsl = 1` at the top of your pipeline script or use the command line option `-dsl1`.\n\nTo make this transition as smooth as possible, we have also added the possibility to declare the DSL version in the Nextflow configuration file, using the same syntax shown above.\n\nFinally, if you wish to keep the current DSL behaviour and not make any changes in your pipeline scripts, the following variable can be defined in your system environment:\n\n```\nexport NXF_DEFAULT_DSL=1\n```\n\n### DSL1 end-of-life phase\n\nMaintaining two separate DSL implementations in the same programming environment is not sustainable and, above all, does not make much sense. For this reason, along with making DSL2 the default Nextflow syntax, DSL1 will enter into a 12-month end-of-life phase, at the end of which it will be removed. Therefore version 22.04.x and 22.10.x will be the last stable versions providing the ability to run DSL1 scripts.\n\nThis is required to keep evolving the framework and to create a more solid implementation of Nextflow grammar. Maintaining compatibility with the legacy syntax implementation and data structures is a challenging task that prevents the evolution of the new syntax.\n\nBear in mind, this does **not** mean it will not be possible to use DSL1 starting from 2023. 
All existing Nextflow runtimes will continue to be available, and it will be possible for any legacy pipeline to run using the required version available from the GitHub [releases page](https://github.com/nextflow-io/nextflow/releases), or by specifying the version using the NXF_VER variable, e.g.\n\n```\nNXF_VER=21.10.6 nextflow run <pipeline>\n```\n\n### New configuration format\n\nThe configuration file is a key component of the Nextflow framework since it allows workflow developers to decouple the pipeline logic from the execution parameters and infrastructure deployment settings.\n\nThe current Nextflow configuration file mechanism is extremely powerful, but it also has some serious drawbacks due to its _dynamic_ nature that makes it very hard to keep stable and maintainable over time.\n\nFor this reason, we are planning to re-engineer the current configuration component and replace it with a new one that has two major goals: 1) continue to provide a rich and human-readable configuration system (so, no YAML or JSON), 2) have a well-defined syntax with a solid foundation that guarantees predictable configurations, simpler troubleshooting and more sustainable maintenance.\n\nCurrently, the most likely options are [Hashicorp HCL](https://github.com/hashicorp/hcl) (as used by Terraform and other Hashicorp tools) and [Lightbend HOCON](https://github.com/lightbend/config). You can read more about this feature at [this link](https://github.com/nextflow-io/nextflow/issues/2723).\n\n### Ignite executor deprecation\n\nThe executor for [Apache Ignite](https://www.nextflow.io/docs/latest/ignite.html) was an early attempt to provide Nextflow with a self-contained, distributed cluster for the deployment of pipelines into HPC environments. However, it had very little adoption over the years, which was not balanced by the increasing complexity of its maintenance.\n\nFor this reason, it was decided to deprecate it and remove it from the default Nextflow distribution. The module is still available in the form of a separate plugin project at [this link](https://github.com/nextflow-io/nf-ignite); however, it will not be actively maintained.\n\n### Conclusion\n\nThis post is focused on the most fundamental changes we are planning to make in the following months.\n\nWith the adoption of Java 11, the full migration of DSL1 to DSL2 and the re-engineering of the configuration system, our purpose is to consolidate the Nextflow technology and lay the foundation for all the new exciting developments and features on which we are working. Stay tuned for more details about each of them in upcoming posts.\n\nIf you want to learn more about the upcoming changes, reach out to us on [Slack at this link](https://app.slack.com/client/T03L6DM9G).\n", - "images": [] + "images": [], + "author": "Paolo Di Tommaso", + "tags": "nextflow,dsl2" }, { "slug": "2022/learn-nextflow-in-2022", "title": "Learning Nextflow in 2022", "date": "2022-01-21T00:00:00.000Z", "content": "\nA lot has happened since we last wrote about how best to learn Nextflow, over a year ago. Several new resources have been released, including a new Nextflow [Software Carpentries](https://carpentries-incubator.github.io/workflows-nextflow/index.html) course and an excellent write-up by [23andMe](https://medium.com/23andme-engineering/introduction-to-nextflow-4d0e3b6768d1).\n\nWe have collated some links below from a diverse collection of resources to help you on your journey to learn Nextflow. 
Nextflow is a community-driven project - if you have any suggestions, please make a pull request to [this page on GitHub](https://github.com/nextflow-io/website/tree/master/content/blog/2022/learn-nextflow-in-2022.md).\n\nWithout further ado, here is the definitive guide for learning Nextflow in 2022. These resources will support anyone in the journey from total beginner to Nextflow expert.\n\n### Prerequisites\n\nBefore you start writing Nextflow pipelines, we recommend that you are comfortable with using the command-line and understand the basic concepts of scripting languages such as Python or Perl. Nextflow is widely used for bioinformatics applications, and scientific data analysis. The examples and guides below often focus on applications in these areas. However, Nextflow is now adopted in a number of data-intensive domains such as image analysis, machine learning, astronomy and geoscience.\n\n### Time commitment\n\nWe estimate that it will take at least 20 hours to complete the material. How quickly you finish will depend on your background and how deep you want to dive into the content. Most of the content is introductory but there are some more advanced dataflow and configuration concepts outlined in the workshop and pattern sections.\n\n### Contents\n\n- Why learn Nextflow?\n- Introduction to Nextflow from 23andMe\n- An RNA-Seq hands-on tutorial\n- Nextflow workshop from Seqera Labs\n- Software Carpentries Course\n- Managing Pipelines in the Cloud\n- The nf-core tutorial\n- Advanced implementation patterns\n- Awesome Nextflow\n- Further resources\n\n### 1. Why learn Nextflow?\n\nNextflow is an open-source workflow framework for writing and scaling data-intensive computational pipelines. It is designed around the Linux philosophy of simple yet powerful command-line and scripting tools that, when chained together, facilitate complex data manipulations. Combined with support for containerization, support for major cloud providers and on-premise architectures, Nextflow simplifies the writing and deployment of complex data pipelines on any infrastructure.\n\nThe following are some high-level motivations on why people choose to adopt Nextflow:\n\n1. Integrating Nextflow in your analysis workflows helps you implement **reproducible** pipelines. Nextflow pipelines follow FAIR guidelines with version-control and containers to manage all software dependencies.\n2. Avoid vendor lock-in by ensuring portability. Nextflow is **portable**; the same pipeline written on a laptop can quickly scale to run on an HPC cluster, Amazon and Google cloud services, and Kubernetes. The code stays constant across varying infrastructures allowing collaboration and avoiding lock-in.\n3. It is **scalable** allowing the parallelization of tasks using the dataflow paradigm without having to hard-code the pipeline to a specific platform architecture.\n4. It is **flexible** and supports scientific workflow requirements like caching processes to prevent re-computation, and workflow reports to better understand the workflows’ executions.\n5. It is **growing fast** and has **long-term support** available from Seqera Labs. Developed since 2013 by the same team, the Nextflow ecosystem is expanding rapidly.\n6. It is **open source** and licensed under Apache 2.0. You are free to use it, modify it and distribute it.\n\n### 2. Introduction to Nextflow by 23andMe\n\nThis informative post begins with the basic concepts of Nextflow and builds towards how Nextflow is used at 23andMe. 
It includes a detailed use case for how 23andMe run their imputation pipeline in the cloud, processing over 1 million individuals per day with over 10,000 CPUs in a single compute environment.\n\n👉 [Nextflow at 23andMe](https://medium.com/23andme-engineering/introduction-to-nextflow-4d0e3b6768d1)\n\n### 3. A simple RNA-Seq hands-on tutorial\n\nThis hands-on tutorial from Seqera Labs will guide you through implementing a proof-of-concept RNA-seq pipeline. The goal is to become familiar with basic concepts, including how to define parameters, using channels to pass data around and writing processes to perform tasks. It includes all scripts, input data and resources and is perfect for getting a taste of Nextflow.\n\n👉 [Tutorial link on GitHub](https://github.com/seqeralabs/nextflow-tutorial)\n\n### 4. Nextflow workshop from Seqera Labs\n\nHere you’ll dive deeper into Nextflow’s most prominent features and learn how to apply them. The full workshop includes an excellent section on containers, how to build them and how to use them with Nextflow. The written materials come with examples and hands-on exercises. Optionally, you can also follow with a series of videos from a live training workshop.\n\nThe workshop includes topics on:\n\n- Environment Setup\n- Basic NF Script and Concepts\n- Nextflow Processes\n- Nextflow Channels\n- Nextflow Operators\n- Basic RNA-Seq pipeline\n- Containers & Conda\n- Nextflow Configuration\n- On-premise & Cloud Deployment\n- DSL 2 & Modules\n- [GATK hands-on exercise](https://seqera.io/training/handson/)\n\n👉 [Workshop](https://seqera.io/training) & [YouTube playlist](https://www.youtube.com/playlist?list=PLPZ8WHdZGxmUv4W8ZRlmstkZwhb_fencI).\n\n### 5. Software Carpentry workshop\n\nThe [Nextflow Software Carpentry](https://carpentries-incubator.github.io/workflows-nextflow/index.html) workshop (in active development) motivates the use of Nextflow and [nf-core](https://nf-co.re/) as development tools for building and sharing reproducible data science workflows. The intended audience are those with little programming experience, and the course provides a foundation to comfortably write and run Nextflow and nf-core workflows. Adapted from the Seqera training material above, the workshop has been updated by Software Carpentries instructors within the nf-core community to fit [The Carpentries](https://carpentries.org/) style of training. The Carpentries emphasize feedback to improve teaching materials so we would like to hear back from you about what you thought was both well-explained and what needs improvement. Pull requests to the course material are very welcome.\n\nThe workshop can be opened on [Gitpod](https://gitpod.io/#https://github.com/carpentries-incubator/workflows-nextflow) where you can try the exercises in an online computing environment at your own pace, with the course material in another window alongside.\n\n👉 You can find the course in [The Carpentries incubator](https://carpentries-incubator.github.io/workflows-nextflow/index.html).\n\n### 6. Managing Pipelines in the Cloud - GenomeWeb Webinar\n\nThis on-demand webinar features Phil Ewels from SciLifeLab and nf-core, Brendan Boufler from Amazon Web Services and Evan Floden from Seqera Labs. 
The wide-ranging discussion covers the significance of scientific workflows, examples of Nextflow in production settings and how Nextflow can be integrated with other processes.\n\n👉 [Watch the webinar](https://seqera.io/webinars-and-podcasts/managing-bioinformatics-pipelines-in-the-cloud-to-do-more-science/)\n\n### 7. Nextflow implementation patterns\n\nThis advanced section discusses recurring patterns and solutions to many common implementation requirements. Code examples are available with notes to follow along, as well as a GitHub repository.\n\n👉 [Nextflow Patterns](http://nextflow-io.github.io/patterns/index.html) & [GitHub repository](https://github.com/nextflow-io/patterns).\n\n### 8. nf-core tutorials\n\nA tutorial covering the basics of using and creating nf-core pipelines. It provides an overview of the nf-core framework including:\n\n- How to run nf-core pipelines\n- What are the most commonly used nf-core tools\n- How to make new pipelines using the nf-core template\n- What are nf-core shared modules\n- How to add nf-core shared modules to a pipeline\n- How to make new nf-core modules using the nf-core module template\n- How nf-core pipelines are reviewed and ultimately released\n\n👉 [nf-core usage tutorials](https://nf-co.re/usage/usage_tutorials)\nand [nf-core developer tutorials](https://nf-co.re/developers/developer_tutorials)\n\n### 9. Awesome Nextflow\n\nA collection of awesome Nextflow pipelines.\n\n👉 [Awesome Nextflow](https://github.com/nextflow-io/awesome-nextflow) on GitHub\n\n### 10. Further resources\n\nThe following resources will help you dig deeper into Nextflow and other related projects, such as the nf-core community, which maintains curated pipelines and a very active Slack channel. There are plenty of Nextflow tutorials and videos online, and the following list is in no way exhaustive. Please let us know if we are missing anything.\n\n#### Nextflow docs\n\nThe reference for the Nextflow language and runtime. These docs should be your first point of reference while developing Nextflow pipelines. The newest features are documented in the edge documentation, which accompanies the monthly edge releases, while stable releases arrive every three months.\n\n👉 Latest [stable](https://www.nextflow.io/docs/latest/index.html) & [edge](https://www.nextflow.io/docs/edge/index.html) documentation.\n\n#### Seqera Labs docs\n\nAn index of documentation, deployment guides, training materials and resources for all things Nextflow and Tower.\n\n👉 [Seqera Labs docs](https://seqera.io/docs)\n\n#### nf-core\n\nnf-core is a growing community of Nextflow users and developers. You can find curated sets of biomedical analysis pipelines written in Nextflow and built by domain experts. Each pipeline is stringently reviewed and has been implemented according to best practice guidelines. Be sure to sign up to the Slack channel.\n\n👉 [nf-core website](https://nf-co.re) and [nf-core Slack](https://nf-co.re/join)\n\n#### Nextflow Tower\n\nNextflow Tower is a platform to easily monitor, launch and scale Nextflow pipelines on cloud providers and on-premise infrastructure. 
The documentation provides details on setting up compute environments, monitoring pipelines and launching using either the web graphic interface, CLI or API.\n\n👉 [Nextflow Tower](https://tower.nf) and [user documentation](http://help.tower.nf).\n\n#### Nextflow Biotech Blueprint by AWS\n\nA quickstart for deploying a genomics analysis environment on Amazon Web Services (AWS) cloud, using Nextflow to create and orchestrate analysis workflows and AWS Batch to run the workflow processes.\n\n👉 [Biotech Blueprint by AWS](https://aws.amazon.com/quickstart/biotech-blueprint/nextflow/)\n\n#### Nextflow Data Pipelines on Azure Batch\n\nNextflow on Azure requires at minimum two Azure services, Azure Batch and Azure Storage. Follow the guides below to set up both services on Azure, and to get your storage and batch account names and keys.\n\n👉 [Azure Blog](https://techcommunity.microsoft.com/t5/azure-compute-blog/running-nextflow-data-pipelines-on-azure-batch/ba-p/2150383) and [GitHub repository](https://github.com/microsoft/Genomics-Quickstart/blob/main/03-Nextflow-Azure/README.md).\n\n#### Running Nextflow by Google Cloud\n\nA step-by-step guide to launching Nextflow Pipelines in Google Cloud.\n\n👉 [Nextflow on Google Cloud](https://cloud.google.com/life-sciences/docs/tutorials/nextflow)\n\n#### Bonus: Nextflow Tutorial - Variant Calling Edition\n\nThis [Nextflow Tutorial - Variant Calling Edition](https://sateeshperi.github.io/nextflow_varcal/nextflow/) has been adapted from the [Nextflow Software Carpentry training material](https://carpentries-incubator.github.io/workflows-nextflow/index.html) and [Data Carpentry: Wrangling Genomics Lesson](https://datacarpentry.org/wrangling-genomics/). Learners will have the chance to learn Nextflow and nf-core basics, to convert a variant-calling bash-script into a Nextflow workflow and to modularize the pipeline using DSL2 modules and sub-workflows.\n\nThe workshop can be opened on [Gitpod](https://gitpod.io/#https://github.com/sateeshperi/nextflow_tutorial.git) where you can try the exercises in an online computing environment at your own pace, with the course material in another window alongside.\n\n👉 You can find the course in [Nextflow Tutorial - Variant Calling Edition](https://sateeshperi.github.io/nextflow_varcal/nextflow/).\n\n### Community and support\n\n- Nextflow [Gitter channel](https://gitter.im/nextflow-io/nextflow)\n- Nextflow [Forums](https://groups.google.com/forum/#!forum/nextflow)\n- Nextflow Twitter [@nextflowio](https://twitter.com/nextflowio?lang=en)\n- [nf-core Slack](https://nfcore.slack.com/)\n- [Seqera Labs](https://www.seqera.io) and [Nextflow Tower](https://tower.nf)\n\n### Credits\n\nSpecial thanks to Mahesh Binzer-Panchal for reviewing the latest revision of this post and contributing the Software Carpentry workshop section.\n", - "images": [] + "images": [], + "author": "Evan Floden", + "tags": "learn,workshop,webinar" }, { "slug": "2022/nextflow-is-moving-to-slack", "title": "Nextflow’s community is moving to Slack!", "date": "2022-02-22T00:00:00.000Z", "content": "\n
\n\n“Software communities don’t just write code together. They brainstorm feature ideas, help new users get their bearings, and collaborate on best ways to use the software.…conversations need their own place\" - GitHub Satellite Blog 2020\n\n
\n\nThe Nextflow community channel on Gitter has grown substantially over the last few years and today has more than 1,300 members.\n\nI still remember when a former colleague proposed the idea of opening a Nextflow channel on Gitter. At the time, I didn't know anything about Gitter, and my initial response was : \"would that not be a waste of time?\".\n\nFortunately, I took him up on his suggestion and the Gitter channel quickly became an important resource for all Nextflow developers and a key factor to its success.\n\n### Where the future lies\n\nAs the Nextflow community continues to grow, we realize that we have reached the limit of the discussion experience on Gitter. The lack of internal channels and the poor support for threads make the discussion unpleasant and difficult to follow. Over the last few years, Slack has proven to deliver a much better user experience and it is also touted as one of the most used platforms for discussion.\n\nFor these reasons, we felt that it is time to say goodbye to the beloved Nextflow Gitter channel and would like to welcome the community into the brand-new, official Nextflow workspace on Slack!\n\nYou can join today using this link!\n\nOnce you have joined, you will be added to a selection of generic channels. However, we have also set up various additional channels for discussion around specific Nextflow topics, and for infrastructure-related topics. Please feel free to join whichever channels are appropriate to you.\n\nAlong the same lines, the Nextflow discussion forum is moving from Google Groups to the Discussion forum in the Nextflow GitHub repository. We hope this will provide a much better experience for Nextflow users by having a more direct connection with the codebase and issue repository.\n\nThe old Gitter channel and Google Groups will be kept active for reference and historical purposes, however we are actively promoting all members to move to the new channels.\n\nIf you have any questions or problems signing up then please feel free to let us know at info@nextflow.io.\n\nAs always, we thank you for being a part of the Nextflow community and for your ongoing support in driving its development and making workflows cool!\n\nSee you on Slack!\n\n### Credits\n\nThis was also made possible thanks to sponsorship from the Chan Zuckerberg Initiative, the Slack for Nonprofits program and support from Seqera Labs.\n", - "images": [] + "images": [], + "author": "Paolo Di Tommaso", + "tags": "community, slack, github" }, { "slug": "2022/nextflow-summit-2022-recap", "title": "Nextflow Summit 2022 Recap", "date": "2022-11-03T00:00:00.000Z", "content": "\n## Three days of Nextflow goodness in Barcelona\n\nAfter a three-year COVID-related hiatus from in-person events, Nextflow developers and users found their way to Barcelona this October for the 2022 Nextflow Summit. Held at Barcelona’s iconic Agbar tower, this was easily the most successful Nextflow community event yet!\n\nThe week-long event kicked off with 50 people participating in a hackathon organized by nf-core beginning on October 10th. The [hackathon](https://nf-co.re/events/2022/hackathon-october-2022) tackled several cutting-edge projects with developer teams focused on various aspects of nf-core including documentation, subworkflows, pipelines, DSL2 conversions, modules, and infrastructure. The Nextflow Summit began mid-week attracting nearly 600 people, including 165 attending in person and another 433 remotely. 
The [YouTube live streams](https://summit.nextflow.io/stream/) have now collected over two and a half thousand views. Just prior to the summit, three virtual Nextflow training events were also run with separate sessions for the Americas, EMEA, and APAC in which 835 people participated.\n\n## An action-packed agenda\n\nThe three-day Nextflow Summit featured 33 talks delivered by speakers from academia, research, healthcare providers, biotechs, and cloud providers. This year’s speakers came from the following organizations:\n\n- Amazon Web Services\n- Center for Genomic Regulation\n- Centre for Molecular Medicine and Therapeutics, University of British Columbia\n- Chan Zuckerberg Biohub\n- Curative\n- DNAnexus\n- Enterome\n- Google\n- Janelia Research Campus\n- Microsoft\n- Oxford Nanopore\n- Quadram Institute Bioscience\n- Seqera Labs\n- Quantitative Biology Center, University of Tübingen\n- Quilt Data\n- UNC Lineberger Comprehensive Cancer Center\n- Università degli Studi di Macerata\n- University of Maryland\n- Wellcome Sanger Institute\n- Wyoming Public Health Laboratory\n\n## Some recurring themes\n\nWhile there were too many excellent talks to cover individually, a few themes surfaced throughout the summit. Not surprisingly, SARS-CoV-2 was a thread that wound through several talks. Tony Zeljkovic from Curative led a discussion about [unlocking automated bioinformatics for large-scale healthcare](https://www.youtube.com/watch?v=JZMaRYzZxGU&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=8), and Thanh Le Viet of Quadram Institute Bioscience discussed [large-scale SARS-CoV-2 genomic surveillance at QIB](https://www.youtube.com/watch?v=6jQr9dDaais&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=30). Several speakers discussed best practices for building portable, modular pipelines. Other common themes were data provenance & traceability, data management, and techniques to use compute and storage more efficiently. There were also a few talks about the importance of dataflows in new application areas outside of genomics and bioinformatics.\n\n## Data provenance tracking\n\nIn the Thursday morning keynote, Rob Patro﹘Associate Professor at the University of Maryland Dept. of Computer Science and CTO and co-founder of Ocean Genomics﹘described in his talk “[What could be next(flow)](https://www.youtube.com/watch?v=vNrKFT5eT8U&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=6),” how far the Nextflow community had come in solving problems such as reproducibility, scalability, modularity, and ease of use. He then challenged the community with some complex issues still waiting in the wings. He focused on data provenance as a particularly vexing challenge, explaining how tremendous effort currently goes into manual metadata curation.\n\nRob offered suggestions about how Nextflow might evolve, and coined the term “augmented execution contexts” (AECs) drawing from his work on provenance tracking – answering questions such as “what are these files, and where did they come from.” This thinking is reflected in [tximeta](https://github.com/mikelove/tximeta), a project co-developed with Mike Love of UNC. 
Rob also proposed ideas around automating data format conversions analogous to type casting in programming languages explaining how such conversions might be built into Nextflow channels to make pipelines more interoperable.\n\nIn his talk with the clever title “[one link to rule them all](https://www.youtube.com/watch?v=dttkcuP3OBc&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=13),” Aneesh Karve of Quilt explained how every pipeline run is a function of the code, environment, and data, and went on to show how Quilt could help dramatically simplify data management with dataset versioning, accessibility, and verifiability. Data provenance and traceability were also front and center when Yih-Chii Hwang of DNAnexus described her team’s work around [bringing GxP compliance to Nextflow workflows](https://www.youtube.com/watch?v=RIwpJTDlLiE&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=21).\n\n## Data management and storage\n\nOther speakers also talked about challenges related to data management and performance. Angel Pizarro of AWS gave an interesting talk comparing the [price/performance of different AWS cloud storage options](https://www.youtube.com/watch?v=VXtYCAqGEQQ&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=12). [Hatem Nawar](https://www.youtube.com/watch?v=jB91uqUqsRM&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=9) (Google) and [Venkat Malladi](https://www.youtube.com/watch?v=GAIL8ZAMJPQ&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=20) (Microsoft) also talked about cloud economics and various approaches to data handling in their respective clouds. Data management was also a key part of Evan Floden’s discussion about Nextflow Tower where he discussed Tower Datasets, as well as the various cloud storage options accessible through Nextflow Tower. Finally, Nextflow creator Paolo Di Tommaso unveiled new work being done in Nextflow to simplify access to data residing in object stores in his talk “[Nextflow and the future of containers](https://www.youtube.com/watch?v=PTbiCVq0-sE&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=14)”.\n\n## Compute optimization\n\nAnother recurring theme was improving compute efficiency. Several talks discussed using containers more effectively, leveraging GPUs & FPGAs for added performance, improving virtual machine instance type selection, and automating resource requirements. Mike Smoot of Illumina talked about Nextflow, Kubernetes, and DRAGENs and how Illumina’s FPGA-based Bio-IT Platform can dramatically accelerate analysis. Venkat Malladi discussed efforts to suggest optimal VM types based on different standardized nf-core labels in the Azure cloud (process_low, process_medium, process_high, etc.) Finally, Evan Floden discussed [Nextflow Tower](https://www.youtube.com/watch?v=yJpN3fRSClA&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=22) and unveiled an exciting new [resource optimization feature](https://seqera.io/blog/optimizing-resource-usage-with-nextflow-tower/) that can intelligently tune pipeline resource requests to radically reduce cloud costs and improve run speed. Overall, the Nextflow community continues to make giant strides in improving efficiency and managing costs in the cloud.\n\n## Beyond genomics\n\nWhile most summit speakers focused on genomics, a few discussed data pipelines in other areas, including statistical modeling, analysis, and machine learning. 
Nicola Visonà from Università degli Studi di Macerata gave a fascinating talk about [using agent-based models to simulate the first industrial revolution](https://www.youtube.com/watch?v=PlKJ0IDV_ds&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=27). Similarly, Konrad Rokicki from the Janelia Research Campus explained how Janelia are using [Nextflow for petascale bioimaging data](https://www.youtube.com/watch?v=ZjSzx1I76z0&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=18) and why bioimage processing remains a large domain area with an unmet need for reproducible workflows.\n\n## Summit Announcements\n\nThis year’s summit also saw several exciting announcements from Nextflow developers. Paolo Di Tommaso, during his talk on [the future of containers](https://www.youtube.com/watch?v=PTbiCVq0-sE&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=14), announced the availability of [Nextflow 22.10.0](https://github.com/nextflow-io/nextflow/releases/tag/v22.10.0). In addition to various bug fixes, the latest Nextflow release introduces an exciting new technology called Wave that allows containers to be built on the fly from Dockerfiles or Conda recipes saved within a Nextflow pipeline. Wave also helps to simplify containerized pipeline deployment with features such as “container augmentation”; enabling developers to inject new container scripts and functionality on the fly without needing to rebuild the base containers such as a cloud-native [Fusion file system](https://www.nextflow.io/docs/latest/fusion.html). When used with Nextflow Tower, Wave also simplifies authentication to various public and private container registries. The latest Nextflow release also brings improved support for Kubernetes and enhancements to documentation, along with many other features.\n\nSeveral other announcements were made during [Evan Floden’s talk](https://www.youtube.com/watch?v=yJpN3fRSClA&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=22&t=127s), such as:\n\n- MultiQC is joining the Seqera Labs family of products\n- Fusion – a distributed virtual file system for cloud-native data pipelines\n- Nextflow Tower support for Google Cloud Batch\n- Nextflow Tower resource optimization\n- Improved Resource Labels support in Tower with integrations for cost accounting with all major cloud providers\n- A new Nextflow Tower dashboard coming soon, providing visibility across workspaces\n\n## Thank you to our sponsors\n\nThe summit organizers wish to extend a sincere thank you to the event sponsors: AWS, Google Cloud, Seqera Labs, Quilt Data, Oxford Nanopore Technologies, and Element BioSciences. In addition, the [Chan Zuckerberg Initiative](https://chanzuckerberg.com/eoss/) continues to play a key role with their EOSS grants funding important work related to Nextflow and the nf-core community. 
The success of this year’s summit reminds us of the tremendous value of community and the critical impact of open science software in improving the quality, accessibility, and efficiency of scientific research.\n\n## Learning more\n\nFor anyone who missed the summit, you can still watch the sessions or view the training sessions at your convenience:\n\n- Watch post-event recordings of the [Nextflow Summit on YouTube](https://www.youtube.com/playlist?list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32)\n- View replays of the recent online [Nextflow and nf-core training](https://nf-co.re/events/2022/training-october-2022)\n\nFor additional detail on the summit and the preceding nf-core events, also check out an excellent [summary of the event](https://mribeirodantas.xyz/blog/index.php/2022/10/27/nextflow-and-nf-core-hot-news/) written by Marcel Ribeiro-Dantas in his blog, the [Dataist Storyteller](https://mribeirodantas.xyz/blog/index.php/2022/10/27/nextflow-and-nf-core-hot-news/)!\n\n_In this event, we're showcasing some of the results of the project 'Optimization of computational resources for HPC workloads in the cloud using ML/AI' by Seqera Labs S.L. This project has been funded by the European Regional Development Fund (ERDF) of the European Union, coordinated and managed by RED.es, with the aim of carrying out the development of technological entrepreneurship and technological demand, within the framework of the Strategic Action on Digital Economy and Society of the State Program for R&D&I oriented towards societal challenges._\n\n![grant logos](/img/blog-2022-11-03--img1.png)\n", - "images": [] + "images": [], + "author": "Noel Ortiz", + "tags": "nextflow,tower,cloud" }, { "slug": "2022/nextflow-summit-call-for-abstracts", "title": "Nextflow Summit 2022", "date": "2022-06-17T00:00:00.000Z", "content": "\n[As recently announced](https://twitter.com/nextflowio/status/1534903352810676224), we are super excited to host a new Nextflow community event late this year! The Nextflow Summit will take place **October 12-14, 2022** at the iconic Torre Glòries in Barcelona, with an associated [nf-core hackathon](https://nf-co.re/events/2022/hackathon-october-2022) beforehand.\n\n### Call for abstracts\n\nToday we’re excited to open the call for abstracts! We’re looking for talks and posters about anything and everything happening in the Nextflow world. Specifically, we’re aiming to shape the program into four key areas:\n\n- Nextflow: central tool / language / plugins\n- Community: pipelines / applications / use cases\n- Ecosystem: infrastructure / environments\n- Software: containers / tool packaging\n\nSpeaking at the summit will primarily be in-person, but we welcome posters from remote attendees. Posters will be submitted digitally and available online during and after the event. Talks will be streamed live and be available after the event.\n\n

\n Apply for a talk or poster\n

\n\n### Key dates\n\nRegistration for the event will happen separately, with key dates as follows (subject to change):\n\n- Jun 17: Call for abstracts opens\n- July 1: Registration opens\n- July 22: Call for abstracts closes\n- July 29: Accepted speakers notified\n- Sept 9: Registration closes\n- Oct 10-12: Hackathon\n- Oct 12-14: Summit\n\nAbstracts will be read and speakers notified on a rolling basis, so apply soon!\n\nThe Nextflow Summit will start Weds, Oct 12, 5:00 PM CEST and close Fri, Oct 14, 1:00 PM CEST.\n\n### Travel bursaries\n\nThanks to funding from the Chan Zuckerberg Initiative [EOSS Diversity & Inclusion grant](https://chanzuckerberg.com/eoss/proposals/nextflow-and-nf-core/), we are offering 5 bursaries for travel and accommodation. These will only be available to those who have applied to present a talk or poster and will cover up to $1500 USD, plus registration costs.\n\nIf you’re interested, please select this option when filling the abstracts application form and we will be in touch with more details.\n\n### Stay in the loop\n\nMore information about the summit will be available soon, as we continue to plan the event. Please visit [https://summit.nextflow.io](https://summit.nextflow.io) for details and to sign up to the email list for event updates.\n\n

\n Subscribe for updates\n

\n\nWe will be tweeting about the event using the [#NextflowSummit](http://twitter.com/hashtag/NextflowSummit) hashtag on Twitter. See you in Barcelona!\n", - "images": [] + "images": [], + "author": "Phil Ewels", + "tags": "nextflow,summit,event,hackathon" }, { "slug": "2022/rethinking-containers-for-cloud-native-pipelines", "title": "Rethinking containers for cloud native pipelines", "date": "2022-10-13T00:00:00.000Z", "content": "\nContainers have become an essential part of well-structured data analysis pipelines. They encapsulate applications and dependencies in portable, self-contained packages that can be easily distributed. Containers are also key to enabling predictable and [reproducible results](https://www.nature.com/articles/nbt.3820).\n\nNextflow was one of the first workflow technologies to fully embrace [containers](https://www.nextflow.io/blog/2014/using-docker-in-hpc-cluster.html) for data analysis pipelines. Community curated container collections such as [BioContainers](https://biocontainers.pro/) also helped speed container adoption.\n\nHowever, the increasing complexity of data analysis pipelines and the need to deploy them across different clouds and platforms pose new challenges. Today, workflows may comprise dozens of distinct container images. Pipeline developers must manage and maintain these containers and ensure that their functionality precisely aligns with the requirements of every pipeline task.\n\nAlso, multi-cloud deployments and the increased use of private container registries further increase complexity for developers. Building and maintaining containers, pushing them to multiple registries, and dealing with platform-specific authentication schemes are tedious, time consuming, and a source of potential errors.\n\n## Wave – a game changer\n\nFor these reasons, we decided to fundamentally rethink how containers are deployed and managed in Nextflow. Today we are thrilled to announce Wave — a container provisioning and augmentation service that is fully integrated with the Nextflow and Nextflow Tower ecosystems.\n\nInstead of viewing containers as separate artifacts that need to be integrated into a pipeline, Wave allows developers to manage containers as part of the pipeline itself. This approach helps simplify development, improves reliability, and makes pipelines easier to maintain. It can even improve pipeline performance.\n\n## How container provisioning works with Wave\n\nInstead of creating container images, pushing them to registries, and referencing them using Nextflow's [container](https://www.nextflow.io/docs/latest/process.html#container) directive, Wave allows developers to simply include a Dockerfile in the directory where a process is defined.\n\nWhen a process runs, the new Wave plug-in for Nextflow takes the Dockerfile and submits it to the Wave service. Wave then builds a container on-the-fly, pushes it to a destination container registry, and returns the container used for the actual process execution. The Wave service also employs caching at multiple levels to ensure that containers are built only once or when there is a change in the corresponding Dockerfile.\n\nThe registry where images are stored can be specified in the Nextflow config file, along with the other pipeline settings. 
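As a minimal sketch of such a configuration (this assumes the `wave.build.repository` setting described in the Nextflow documentation; the repository name below is only a placeholder), it could look like:\n\n```\nwave {\n enabled = true\n // target registry for the images built by Wave (placeholder name)\n build.repository = 'quay.io/your-org/wave-builds'\n}\n```\n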
This means containers can be served from cloud registries closer to where pipelines execute, delivering better performance and reducing network traffic.\n\n![Wave diagram](/img/wave-diagram.png)\n\n## Nextflow, Wave, and Conda – a match made in heaven\n\n[Conda](https://conda.io/) is an excellent package manager, fully [supported in Nextflow](https://www.nextflow.io/blog/2018/conda-support-has-landed.html) as an alternative to using containers to manage software dependencies in pipelines. However, until now, Conda could not be easily used in cloud-native computing platforms such as AWS Batch or Kubernetes.\n\nWave provides developers with a powerful new way to leverage Conda in Nextflow by using a [conda](https://www.nextflow.io/docs/latest/process.html#conda) directive as an alternative way to provision containers in their pipelines. When Wave encounters the `conda` directive in a process definition, and no container or Dockerfile is present, Wave automatically builds a container based on the Conda recipe using the strategy described above. Wave makes this process exceptionally fast (at least compared to vanilla Conda) by leveraging with the [Micromamba](https://github.com/mamba-org/mamba) project under the hood.\n\n## Support for private registries\n\nA long-standing problem with containers in Nextflow was the lack of support for private container registries. Wave solves this problem by acting as an authentication proxy between the Docker client requesting the container and a target container repository. Wave relies on [Nextflow Tower](https://seqera.io/tower/) to authenticate user requests to container registries.\n\nTo access private container registries from a Nextflow pipeline, developers can simply specify their Tower access token in the pipeline configuration file and store their repository credentials in [Nextflow Tower](https://help.tower.nf/22.2/credentials/overview/) page in your account. Wave will automatically and securely use these credentials to authenticate to the private container registry.\n\n## But wait, there's more! Container augmentation!\n\nBy automatically building and provisioning containers, Wave dramatically simplifies how containers are handled in Nextflow. However, there are cases where organizations are required to use validated containers for security or policy reasons rather than build their own images, but still they need to provide additional functionality, like for example, adding site-specific scripts or logging agents while keeping the base container layers intact.\n\nNextflow allows for the definition of pipeline level (and more recently module level) scripts executed in the context of the task execution environment. These scripts can be made accessible to the container environment by mounting a host volume. However, this approach only works when using a local or shared file system.\n\nWave solves these problems by dynamically adding one or more layers to an existing container image during the container image download phase from the registry. Developers can use container augmentation to inject an arbitrary payload into any container without re-building it. Wave then recomputes the image's final manifest adding new layers and checksums on-the-fly, so that the final downloaded image reflects the added content.\n\nWith container augmentation, developers can include a directory called `resources` in pipeline [module directories](https://www.nextflow.io/docs/latest/dsl2.html#module-directory). 
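As an illustrative sketch (the module and script names here are hypothetical, not taken from the original post), such a module directory might be laid out as:\n\n```\nmodules/align/\n├── main.nf\n└── resources/\n    └── usr/local/bin/site-helper.sh\n```\n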
When the corresponding containerized task is executed, Wave automatically mirrors the content of the resources directory in the root path of the container where it can be accessed by scripts running within the container.\n\n## A sneak preview of Fusion file system\n\nOne of the main motivations for implementing Wave is that we wanted to have the ability to easily package a Fusion client in containers to make this important functionality readily available in Nextflow pipelines.\n\nFusion implements a virtual distributed file system and presents a thin client that allows data hosted in AWS S3 buckets to be accessed via the standard POSIX filesystem interface expected by the pipeline tools. This client runs in the task container and is added automatically via the Wave augmentation capability. This makes Fusion functionality available for pipeline execution at runtime.\n\nThis means the Nextflow pipeline can use an AWS S3 bucket as the work directory, and pipeline tasks can access the S3 bucket natively as a local file system path. This is an important innovation as it avoids the additional step of copying files in and out of object storage. Fusion takes advantage of Nextflow's task segregation and idempotent execution model to optimise and speed up file access operations.\n\n## Getting started\n\nWave requires Nextflow version 22.10.0 or later and can be enabled by using the `-with-wave` command line option or by adding the following snippet to your nextflow.config file:\n\n```\nwave {\n enabled = true\n strategy = 'conda,container'\n}\n\ntower {\n accessToken = \"YOUR ACCESS TOKEN\"\n}\n```\n\nThe use of the Tower access token is not mandatory; however, it is required to enable access to private repositories. The use of authentication also allows higher service rate limits compared to anonymous users. You can run a Nextflow pipeline such as rnaseq-nf with Wave, as follows:\n\n```\nnextflow run nextflow-io/rnaseq-nf -with-wave\n```\n\nThe configuration in the nextflow.config snippet above will enable the provisioning of Wave containers created from the `conda` requirements specified in the pipeline processes.\n\nYou can find additional information and examples in the Nextflow [documentation](https://www.nextflow.io/docs/latest/wave.html) and in the Wave [showcase project](https://github.com/seqeralabs/wave-showcase).\n\n## Availability\n\nThe Wave container provisioning service is available free of charge as a technology preview to all Nextflow and Tower users. Wave supports all major container registries including [Docker Hub](https://hub.docker.com/), [Quay.io](https://quay.io/), [AWS Elastic Container Registry](https://aws.amazon.com/ecr/), [Google Artifact Registry](https://cloud.google.com/artifact-registry) and [Azure Container Registry](https://azure.microsoft.com/en-us/products/container-registry/).\n\nDuring the preview period, anonymous users can build up to 10 container images per day and pull 100 containers per hour. Tower authenticated users can build 100 container images per hour and pull 1000 containers per minute. After the preview period, we plan to make the Wave service available free of charge to academic users and open-source software (OSS) projects.\n\n## Conclusion\n\nSoftware containers greatly simplify the deployment of complex data analysis pipelines. However, there have still been many challenges preventing organizations from fully unlocking the potential of this exciting technology. 
For too long, containers have been viewed as a replacement for package managers, but they serve a different purpose.\n\nIn our view, it's time to re-consider containers as monolithic artifacts that are assembled separately from pipeline code. Instead, containers should be viewed simply as an execution substrate facilitating the deployment of the pipeline software dependencies defined via a proper package manager such as Conda.\n\nWave, Nextflow, and Nextflow Tower combine to fully automate the container lifecycle including management, provisioning and dependencies of complex data pipelines on-demand while removing unnecessary error-prone manual steps.\n", - "images": [] + "images": [], + "author": "Paolo Di Tommaso", + "tags": "nextflow,tower,cloud" }, { "slug": "2022/turbocharging-nextflow-with-fig", "title": "Turbo-charging the Nextflow command line with Fig!", "date": "2022-09-22T00:00:00.000Z", "content": "\nNextflow is a powerful workflow manager that supports multiple container technologies, cloud providers and HPC job schedulers. It shouldn't be a surprise that wide ranging functionality leads to a complex interface, but comes with the drawback of many subcommands and options to remember. For a first-time user (and sometimes even for some long-time users) it can be difficult to remember everything. This is not a new problem for the command-line; even very common applications such as grep and tar are famous for having a bewildering array of options.\n\n![xkcd charge making fun of tar tricky command line arguments](/img/xkcd_tar_charge.png)\nhttps://xkcd.com/1168/\n\nMany tools have sprung up to make the command-line more user friendly, such as tldr pages and rich-click. [Fig](https://fig.io) is one such tool that adds powerful autocomplete functionality to your terminal. Fig gives you graphical popups with color-coded contexts more dynamic than shaded text for recent commands or long blocks of text after pressing tab.\n\nFig is compatible with most terminals, shells and IDEs (such as the VSCode terminal), is fully supported in MacOS, and has beta support for Linux and Windows. In MacOS, you can simply install it with `brew install --cask fig` and then running the `fig` command to set it up.\n\nWe have now added Nextflow for Fig. Thanks to Figs open source core we were able to contribute specifications in Typescript that will now be automatically added for anyone installing or updating Fig. Now, with Fig, when you start typing your Nextflow commands, you’ll see autocomplete suggestions based on what you are typing and what you have typed in the past, such as your favorite options.\n\n![GIF with a demo of nextflow log/list subcommands](/img/nxf-log-list-params.gif)\n\nThe Fig autocomplete functionality can also be adjusted to suit our preferences. Suggestions can be displayed in alphabetical order or as a list of your most recent commands. Similarly, suggestions can be displayed all the time or only when you press tab.\n\nThe Fig specification that we've written not only suggests commands and options, but dynamic inputs too. For example, finding previous run names when resuming or cleaning runs is tedious and error prone. Similarly, pipelines that you’ve already downloaded with `nextflow pull` will be autocompleted if they have been run in the past. You won't have to remember the full names anymore, as Fig generators in the autocomplete allow you to automatically complete the run name after typing a few letters where a run name is expected. 
Importantly, this also works for pipeline names!\n\n![GIF with a demo of nextflow pull/run/clean/view/config subcommands](/img/nxf-pull-run-clean-view-config.gif)\n\nFig for Nextflow will make you increase your productivity regardless of your user level. If you run multiple pipelines during your day you will immediately see the benefit of Fig. Your productivity will increase by taking advantage of this autocomplete function for run and project names. For Nextflow newcomers it will provide an intuitive way to explore the Nextflow CLI with built-in help text.\n\nWhile Fig won’t replace the need to view help menus and documentation it will undoubtedly save you time and energy searching for commands and copying and pasting run names. Take your coding to the next level using Fig!\n", - "images": [] + "images": [], + "author": "Marcel Ribeiro-Dantas", + "tags": "nextflow,development,learning" }, { "slug": "2023/a-nextflow-docker-murder-mystery-the-mysterious-case-of-the-oom-killer", @@ -390,7 +496,9 @@ "content": "\nMost support tickets crossing our desks don’t warrant a blog article. However, occasionally we encounter a genuine mystery—a bug so pervasive and vile that it threatens innocent containers and pipelines everywhere. Such was the case of the **_OOM killer_**.\n\nIn this article, we alert our colleagues in the Nextflow community to the threat. We also discuss how to recognize the killer’s signature in case you find yourself dealing with a similar murder mystery in your own cluster or cloud.\n\n\n\n## To catch a killer\n\nIn mid-2022, Nextflow jobs began to mysteriously die. Containerized tasks were being struck down in the prime of life, seemingly at random. By November, the body count was beginning to mount: Out-of-memory (OOM) errors were everywhere we looked!\n\nIt became clear that we had a serial killer on our hands. Unfortunately, identifying a suspect turned out to be easier said than done. Nextflow is rather good at restarting failed containers after all, giving the killer a convenient alibi and plenty of places to hide. Sometimes, the killings went unnoticed, requiring forensic analysis of log files.\n\nWhile we’ve made great strides, and the number of killings has dropped dramatically, the killer is still out there. In this article, we offer some tips that may prove helpful if the killer strikes in your environment.\n\n## Establishing an MO\n\nFortunately for our intrepid investigators, the killer exhibited a consistent _modus operandi_. Containerized jobs on [Amazon EC2](https://aws.amazon.com/ec2/) were being killed due to out-of-memory (OOM) errors, even when plenty of memory was available on the container host. While we initially thought the killer was native to the AWS cloud, we later realized it could also strike in other locales.\n\nWhat the killings had in common was that they tended to occur when Nextflow tasks copied large files from Amazon S3 to a container’s local file system via the AWS CLI. As some readers may know, Nextflow leverages the AWS CLI behind the scenes to facilitate data movement. 
The killer’s calling card was an `[Errno 12] Cannot allocate memory` message, causing the container to terminate with an exit status of 1.\n\n```\nNov-08 21:54:07.926 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'NFCORE_SAREK:SAREK:MARKDUPLICATES:BAM_TO_CRAM:SAMTOOLS_STATS_CRAM (004-005_L3.SSHT82)'\nCaused by:\n Essential container in task exited\n..\nCommand error:\n download failed: s3://myproject/NFTower-Ref/Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.fasta to ./Homo_sapiens_assembly38.fasta [Errno 12] Cannot allocate memory\n```\n\nThe problem is illustrated in the diagram below. In theory, Nextflow should have been able to dispatch multiple containerized tasks to a single host. However, tasks were being killed with out-of-memory errors even though plenty of memory was available. Rather than being able to run many containers per host, we could only run two or three and even that was dicey! Needless to say, this resulted in a dramatic loss of efficiency.\n\n\n\nAmong our crack team of investigators, alarm bells began to ring. We asked ourselves, _“Could the killer be inside the house?”_ Was it possible that Nextflow was nefariously killing its own containerized tasks?\n\nBefore long, reports of similar mysterious deaths began to trickle in from other jurisdictions. It turned out that the killer had struck [Cromwell](https://cromwell.readthedocs.io/en/stable/) also ([see the police report here](https://github.com/aws/aws-cli/issues/5876)). We breathed a sigh of relief that we could rule out Nextflow as the culprit, but we still had a killer on the loose and a series of container murders to solve!\n\n## Recreating the scene of the crime\n\nAs any good detective knows, recreating the scene of the crime is a good place to start. It turned out that our killer had a profile and had been targeting containers processing large datasets since 2020. We came across an excellent [codefresh.io article](https://codefresh.io/blog/docker-memory-usage/) by Saffi Hartal, discussing similar murders and suggesting techniques to lure the killer out of hiding and protect the victims. Unfortunately, the suggested workaround of periodically clearing kernel buffers was impractical in our Nextflow pipeline scenario.\n\nWe borrowed the Python script from [Saffi’s article](https://codefresh.io/blog/docker-memory-usage/) designed to write huge files and simulate the issues we saw with the Linux buffer and page cache. Using this script, we hoped to replicate the conditions at the time of the murders.\n\nUsing separate SSH sessions to the same docker host, we manually launched the Python script from the command line to run in a Docker container, allocating 512MB of memory to each container. This was meant to simulate the behavior of the Nextflow head job dispatching multiple tasks to the same Docker host. We monitored memory usage as each container was started.\n\n```bash\n$ docker run --rm -it -v $PWD/dockertest.py:/dockertest.py --entrypoint /bin/bash --memory=\"512M\" --memory-swap=0 python:3.10.5-slim-bullseye\n```\n\nSure enough, we found that containers began dying with out-of-memory errors. Sometimes we could run a single container, and sometimes we could run two. Containers died even though memory use was well under the cgroups-enforced maximum, as reported by docker stats. 
As containers ran, we also used the Linux `free` command to monitor memory usage and the combined memory used by kernel buffers and the page cache.\n\n## Developing a theory of the case\n\nFrom our testing, we were able to clear both Nextflow and the AWS S3 copy facility since we could replicate the out-of-memory error in our controlled environment independent of both.\n\nWe had multiple theories of the case: **_Was it Colonel Mustard with an improper cgroups configuration? Was it Professor Plum and the size of the SWAP partition? Was it Mrs. Peacock running a Linux 5.20 kernel?_**\n\n_For the millennials and Gen Zs in the crowd, you can find a primer on the CLUE/Cluedo references [here](https://en.wikipedia.org/wiki/Cluedo)_\n\nTo make a long story short, we identified several suspects and conducted tests to clear each suspect one by one. Tests included the following:\n\n- We conducted tests with EBS vs. NVMe disk volumes to see if the error was related to page caches when using EBS. The problems persisted with NVMe but appeared to be much less severe.\n- We attempted to configure a swap partition as recommended in this [AWS article](https://repost.aws/knowledge-center/ecs-resolve-outofmemory-errors), which discusses similar out-of-memory errors in Amazon ECS (used by AWS Batch). AWS provides good documentation on managing container [swap space](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/container-swap.html) using the `--memory-swap` switch. You can learn more about how Docker manages swap space in the [Docker documentation](https://docs.docker.com/config/containers/resource_constraints/).\n- Creating swap files on the Docker host and making swap available to containers using the switch `--memory-swap=\"1g\"` appeared to help, and we learned a lot in the process. Using this workaround we could reliably run 10 containers simultaneously, whereas previously, we could run only one or two. This was a good workaround for static clusters but wasn’t always helpful in cloud batch environments. Creating the swap partition requires root privileges, and in batch environments, where resources may be provisioned automatically, this could be difficult to implement. It also didn’t explain the root cause of why containers were being killed. You can use the commands below to create a swap partition:\n\n```bash\n$ sudo dd if=/dev/zero of=/mnt/2GiB.swap bs=2048 count=1048576\n$ mkswap /mnt/2GiB.swap\n$ swapon /mnt/2GiB.swap\n```\n\n## A break in the case!\n\nOn Nov 16th, we finally caught a break in the case. A hot tip from Seqera Lab’s own [Jordi Deu-Pons](https://github.com/jordeu), indicated the culprit may be lurking in the Linux kernel. He suggested hard coding limits for two Linux kernel parameters as follows:\n\n```bash\n$ echo \"838860800\" > /proc/sys/vm/dirty_bytes\n$ echo \"524288000\" > /proc/sys/vm/dirty_background_bytes\n```\n\nWhile it may seem like a rather unusual and specific leap of brilliance, our tipster’s hypothesis was inspired by this [kernel bug](https://bugzilla.kernel.org/show_bug.cgi?id=207273) description. With this simple change, the reported memory usage for each container, as reported by docker stats, dropped dramatically. 
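\n\nNote that values written under `/proc/sys` only last until the next reboot. A minimal sketch of one way to make the settings persistent (the drop-in file name here is arbitrary, and paths can vary between distributions) is to use `sysctl`:\n\n```bash\n# Sketch: persist the dirty page limits across reboots via a sysctl drop-in file\n$ echo \"vm.dirty_bytes = 838860800\" | sudo tee /etc/sysctl.d/99-vm-dirty.conf\n$ echo \"vm.dirty_background_bytes = 524288000\" | sudo tee -a /etc/sysctl.d/99-vm-dirty.conf\n$ sudo sysctl --system\n```\n\n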
**Suddenly, we could run as many containers simultaneously as physical memory would allow.** It turns out that this was a regression bug that only manifested in newer versions of the Linux kernel.\n\nBy hardcoding these [kernel parameters](https://docs.kernel.org/admin-guide/sysctl/vm.html), we were limiting the number of dirty pages the kernel could hold before writing pages to disk. When these variables were not set, they defaulted to 0, and the default parameters `dirty_ratio` and `dirty_background_ratio` took effect instead.\n\nIn high-load conditions (such as data-intensive Nextflow pipeline tasks), processes accumulated dirty pages faster than the kernel could flush them to disk, eventually leading to the out-of-memory condition. By hardcoding the dirty pages limit, we forced the kernel to flush the dirty pages to disk, thereby avoiding the bug. This also explained why the problem was less pronounced using NVMe storage, where flushing to disk occurred more quickly, thus mitigating the problem.\n\nFurther testing determined that the bug appeared reliably on the newer [Amazon Linux 2 AMI using the 5.10 kernel](https://aws.amazon.com/about-aws/whats-new/2021/11/amazon-linux-2-ami-kernel-5-10/). The bug did not seem to appear when using the older Amazon Linux 2 AMI running the 4.14 kernel version.\n\nWe now had two solid strategies to resolve the problem and thwart our killer:\n\n- Create a swap partition and run containers with the `--memory-swap` flag set.\n- Set `dirty_bytes` and `dirty_background_bytes` kernel variables on the Docker host before launching the jobs.\n\n## The killer is (mostly) brought to justice\n\nAvoiding the Linux 5.10 kernel was obviously not a viable option. The 5.10 kernel includes support for important processor architectures such as Intel® Ice Lake. This bug did not manifest earlier because, by default, AWS Batch was using ECS-optimized AMIs based on the 4.14 kernel. Further testing showed us that the killer could still appear in 4.14 environments, but the bug was harder to trigger.\n\nWe ended up working around the problem for Nextflow Tower users by tweaking the kernel parameters in the compute environment deployed by Tower Forge. This solution works reliably with AMIs based on both the 4.14 and 5.10 kernels. We considered adding a swap partition as this was another potential solution to the problem. However, we were concerned that this could have performance implications, particularly for customers running with EBS gp2 disk storage.\n\nInterestingly, we also tested the [Fusion v2 file system](https://seqera.io/fusion/) with NVMe disk. Using Fusion, we avoided the bug entirely on both kernel versions without needing to adjust kernel parameters or add a swap partition.\n\n## Some helpful investigative tools\n\nIf you find evidence of foul play in your cloud or cluster, here are some useful investigative tools you can use:\n\n- After manually starting a container, use [docker stats](https://docs.docker.com/engine/reference/commandline/stats/) to monitor the CPU and memory used by each container compared to available memory.\n\n ```bash\n $ watch docker stats\n ```\n\n- The Linux [free](https://linuxhandbook.com/free-command/) utility is an excellent way to monitor memory usage. 
You can track total, used, and free memory and monitor the combined memory used by kernel buffers and page cache reported in the _buff/cache_ column.\n\n ```bash\n $ free -h\n ```\n\n- After a container was killed, we executed the command below on the Docker host to confirm why the containerized Python script was killed.\n\n ```bash\n $ dmesg -T | grep -i 'killed process'\n ```\n\n- We used the Linux [htop](https://man7.org/linux/man-pages/man1/htop.1.html) command to double-check the CPU and memory usage reported by Docker.\n- You can use the command [systemd-cgtop](https://www.commandlinux.com/man-page/man1/systemd-cgtop.1.html) to validate cgroup settings and ensure you are not running into arbitrary limits imposed by _cgroups_.\n- Related to the _cgroups_ settings described above, you can inspect various memory-related limits directly from the file system. You can also use an alias to make the large numbers associated with _cgroups_ parameters easier to read. For example:\n\n ```bash\n $ alias n='numfmt --to=iec-i'\n $ cat /sys/fs/cgroup/memory/docker/DOCKER_CONTAINER/memory.limit_in_bytes | n\n 512Mi\n ```\n\n- You can clear the kernel buffer and page cache that appears in the buff/cache columns reported by the Linux _free_ command using either of these commands:\n\n ```bash\n $ echo 1 > /proc/sys/vm/drop_caches\n $ sysctl -w vm.drop_caches=1\n ```\n\n## The bottom line\n\nWhile we’ve come a long way in bringing the killer to justice, out-of-memory issues still crop up occasionally. It’s hard to say whether these are copycats, but you may still run up against this bug in a dark alley near you!\n\nIf you run into similar problems, hopefully, some of the suggestions offered above, such as tweaking kernel parameters or adding a swap partition on the Docker host, can help.\n\nFor some users, a good workaround is to use the [Fusion file system](https://seqera.io/blog/breakthrough-performance-and-cost-efficiency-with-the-new-fusion-file-system/) instead of Nextflow’s conventional approach based on the AWS CLI. As explained above, the combination of more efficient data handling in Fusion and fast NVMe storage means that dirty pages are flushed more quickly, and containers are less likely to reach hard limits and exit with an out-of-memory error.\n\nYou can learn more about the Fusion file system by downloading the whitepaper [Breakthrough performance and cost-efficiency with the new Fusion file system](https://seqera.io/whitepapers/breakthrough-performance-and-cost-efficiency-with-the-new-fusion-file-system/). 
If you encounter similar issues or have ideas to share, join the discussion on the [Nextflow Slack channel](https://join.slack.com/t/nextflow/shared_invite/zt-11iwlxtw5-R6SNBpVksOJAx5sPOXNrZg).\n", "images": [ "/img/a-nextflow-docker-murder-mystery-the-mysterious-case-of-the-oom-killer-1.jpg" - ] + ], + "author": "Graham Wright", + "tags": "nextflow" }, { "slug": "2023/best-practices-deploying-pipelines-with-hpc-workload-managers", @@ -400,7 +508,9 @@ "images": [ "/img/nextflow-on-big-iron-twelve-tips-for-improving-the-effectiveness-of-pipelines-on-hpc-clusters-1.jpg", "/img/nextflow-on-big-iron-twelve-tips-for-improving-the-effectiveness-of-pipelines-on-hpc-clusters-2.jpg" - ] + ], + "author": "Gordon Sissons", + "tags": "nextflow" }, { "slug": "2023/celebrating-our-largest-international-training-event-and-hackathon-to-date", @@ -412,7 +522,9 @@ "/img/celebrating-our-largest-international-training-event-and-hackathon-to-date-2.jpg", "/img/celebrating-our-largest-international-training-event-and-hackathon-to-date-3.jpg", "/img/celebrating-our-largest-international-training-event-and-hackathon-to-date-4.jpg" - ] + ], + "author": "Phil Ewels", + "tags": "nextflow" }, { "slug": "2023/community-forum", @@ -421,7 +533,9 @@ "content": "\nWe are very excited to introduce the [Seqera community forum](https://community.seqera.io/) - the new home of the Nextflow community!\n\n

[community.seqera.io](https://community.seqera.io)

\n\nThe growth of the Nextflow community over recent years has been phenomenal. The Nextflow Slack organization was launched in early 2022 and has already reached a membership of nearly 3,000 members. As we look ahead to growing to 5,000 and even 50,000, we are making a new tool available to the community: a community forum.\n\nWe expect the new forum to coexist with the Nextflow Slack. The forum will be great at medium-format discussion, whereas Slack is largely designed for short-term ephemeral conversations. We want to support this growth of the community and believe the new forum will allow us to scale.\n\nDiscourse is an open-source, web-based platform designed for online community discussions and forum-style interactions. Discourse offers a user-friendly interface, real-time notifications, and a wide range of customization options. It prioritizes healthy and productive conversations by promoting user-friendly features, such as trust levels, gamification, and robust moderation tools. Discourse is well known for its focus on fostering engaging and respectful discussions and already caters to many large developer communities. It’s able to serve immense groups, giving us confidence that it will meet the needs of our growing developer community just as well. We believe that Discourse is a natural fit for the evolution of the Nextflow community.\n\n

\n\nThe community forum offers many exciting new features. Here are some of the things you can expect:\n\n- **Open content:** Content on the new forum is public – accessible without login, indexed by Google, and can be linked to directly. This means that it will be much easier to find answers to your problems, as well as share solutions on other platforms.\n- **Don’t ask the same thing twice:** It’s not always easy to find answers when there’s a lot of content available. The community forum helps you by suggesting similar topics as you write a new post. An upcoming [Discourse AI Bot](https://www.discourse.org/plugins/ai.html) may even allow you to ask questions using natural language in the future!\n- **Stay afloat:** The community forum will ensure developers have a space where they can post without fear that what they write might be drowned out, and where anything that our community finds useful will rise to the top of the list. Discourse will give life to threads with high-quality content that may have otherwise gone unnoticed and lost in a sea of new posts.\n- **Better organized:** The forum model for categories, tags, threads, and quoting forces conversations to be structured. Many questions involve the broader Nextflow ecosystem, tagging with multiple topics will cut through the noise and allow people to participate in targeted and well-labeled discussions. Importantly, maintainers can move miscategorized posts without asking the original author to delete and write again.\n- **Multi-product:** The forum has categories for Nextflow but also [Seqera Platform](https://seqera.io/platform/), [MultiQC](https://seqera.io/multiqc/), [Wave](https://seqera.io/wave/), and [Fusion](https://seqera.io/fusion/). Questions that involve multiple Seqera products can now span these boundaries, and content can be shared between posts easily.\n- **Community recognition:** The community forum will encourage a healthy ecosystem of developers that provides value to everyone involved and rewards the most active users. The new forum encourages positive community behaviors through features such as badges, a trust system, and community moderation. There’s even a [community leaderboard](https://community.seqera.io/leaderboard/)! We plan to gradually introduce additional features over time as adoption grows.\n\nOnline discussion platforms have been the beating heart of the Nextflow community from its inception. The first was a Google groups email list, which was followed by the Gitter instant messaging platform, GitHub Discussions, and most recently, Slack. We’re thrilled to embark on this new chapter of the Nextflow community – let us know what you think and ask any questions you might have in the [“Site Feedback” forum category](https://community.seqera.io/c/community/site-feedback/2)! Join us today at [https://community.seqera.io](https://community.seqera.io/) for a new and improved developer experience.\n\n

[Visit the Seqera community forum](https://community.seqera.io)

\n", "images": [ "/img/seqera-community-all.png" - ] + ], + "author": "Phil Ewels", + "tags": "nextflow,community" }, { "slug": "2023/czi-mentorship-round-2", @@ -430,7 +544,9 @@ "content": "\n## Introduction\n\n
![Mentorship rocket](/img/mentorships-round2-rocket.png)\n_Nextflow and nf-core mentorship rocket._
\n\nThe global Nextflow and nf-core community is thriving with strong engagement in several countries. As we continue to expand and grow, we remain committed to prioritizing inclusivity and actively reaching groups with low representation.\n\nThanks to the support of our Chan Zuckerberg Initiative Diversity and Inclusion grant, we established an international Nextflow and nf-core mentoring program. With the second round of the mentorship program now complete, we celebrate the success of the most recent cohort of mentors and mentees.\n\nFrom hundreds of applications, thirteen pairs of mentors and mentees were chosen for the second round of the program. For the past four months, they met regularly to collaborate on Nextflow or nf-core projects. The project scope was left up to the mentees, enabling them to work on any project aligned with their scientific interests and schedules.\n\nMentor-mentee pairs worked on a range of projects that included learning Nextflow and nf-core fundamentals, setting up Nextflow on their institutional clusters, translating Nextflow training material into other languages, and developing and implementing Nextflow and nf-core pipelines. Impressively, despite many mentees starting the program with very limited knowledge of Nextflow and nf-core, they completed the program with confidence and improved their abilities to develop and implement scalable and reproducible scientific workflows.\n\n![Map of mentor and mentee pairs](/img/mentorships-round2-map.png)
\n_The second round of the mentorship program was global._\n\n## Jing Lu (Mentee) & Moritz Beber (Mentor)\n\nJing joined the program with the goal of learning how to develop advanced Nextflow pipelines for disease surveillance at the Guangdong Provincial Center for Diseases Control and Prevention in China. His mentor was Moritz Beber from Denmark.\n\nTogether, Jing and Moritz developed a pipeline for the analysis of SARS-CoV-2 genomes from sewage samples. They also used GitHub and docker containers to make the pipeline more sharable and reproducible. In the future, Jing hopes to use Nextflow Tower to share the pipeline with other institutions.\n\n## Luria Leslie Founou (Mentee) & Sebastian Malkusch (Mentor)\n\nLuria's goal was to accelerate her understanding of Nextflow and apply it to her exploration of the resistome, virulome, mobilome, and phylogeny of bacteria at the Research Centre of Expertise and Biological Diagnostic of Cameroon. Luria was mentored by Sebastian Malkusch, Kolja Becker, and Alex Peltzer from the Boehringer Ingelheim Pharma GmbH & Co. KG in Germany.\n\nFor their project, Luria and her mentors developed a [pipeline](https://github.com/SMLMS/nfml) for mapping multi-dimensional feature space onto a discrete or continuous response variable by using multivariate models from the field of classical machine learning. Their pipeline will be able to handle classification, regression, and time-to-event models and can be used for model training, validation, and feature selection.\n\n## Sebastian Musundi (Mentee) & Athanasios Baltzis (Mentor)\n\nSebastian, from Mount Kenya University in Kenya, joined the mentorship program with the goal of using Nextflow pipelines to identify vaccine targets in Apicomplexan parasites. He was mentored by Athanasios Balzis from the Centre for Genomic Regulation in Spain.\n\nWith Athanasios’s help, Sebastian learned the fundamentals for developing Nextflow pipelines. During the learning process, they developed a [pipeline](https://github.com/sebymusundi/simple_RNA-seq) for customized RNA sequencing and a [pipeline](https://github.com/sebymusundi/AMR_pipeline) for predicting antimicrobial resistance genes. With his new skills, Sebastian plans to keep using Nextflow on a daily basis and start contributing to nf-core.\n\n## Juan Ugalde (Mentee) & Robert Petit (Mentor)\n\nJuan joined the mentorship program with the goal of improving his understanding of Nextflow to support microbial and viral analysis at the Universidad Andres Bello in Chile. Juan was mentored by Robert Petit from the Wyoming Public Health Laboratory in the USA. Robert is an experienced Nextflow mentor who also mentored in Round 1 of the program.\n\nJuan and Robert shared an interest in viral genomics. After learning more about the Nextflow and nf-core ecosystem, Robert mentored Juan as he developed a Nextflow viral amplicon analysis [pipeline](https://github.com/gene2dis/hantaflow). Juan will continue his Nextflow and nf-core journey by sharing his new knowledge with his group and incorporating it into his classes in the coming semester.\n\n## Bhargava Reddy Morampalli (Mentee) & Venkat Malladi (Mentor)\n\nBhargava studies at Massey University in New Zealand and joined the program with the goal of improving his understanding of Nextflow and resolving issues he was facing while developing a pipeline to analyze Nanopore direct RNA sequencing data. 
Bhargava was mentored by Venkat Malladi from Microsoft in the USA.\n\nBhargava and Venkat worked on Bhargava’s [pipeline](https://github.com/bhargava-morampalli/rnamods-nf/) to identify RNA modifications from bacteria. Their successes included advancing the pipeline and making Singularity images for the tools Bhargava was using to make it more reproducible. For Bhargava, the mentorship program was a great kickstart for learning Nextflow and his pipeline development. He hopes to continue to develop his pipeline and optimize it for cloud platforms in the future.\n\n## Odion Ikhimiukor (Mentee) & Ben Sherman (Mentor)\n\nBefore the program, Odion, who is at the University at Albany in the USA, was new to Nextflow and nf-core. He joined the program with the goal of improving his understanding and to learn how to develop pipelines for bacterial genome analysis. His mentor Ben Sherman works for Seqera Labs in the USA.\n\nDuring the program Odion and Ben developed a [pipeline](https://github.com/odionikh/nf-practice) to analyze bacterial genomes for antimicrobial resistance surveillance. They also developed configuration settings to enable the deployment of their pipeline with high and low resources. Odion has plans to share his new knowledge with others in his community.\n\n## Batool Almarzouq (Mentee) & Murray Wham (Mentor)\n\nBatool works at the King Abdullah International Medical Research Center in Saudi Arabia. Her goal for the mentorship program was to contribute to, and develop, nf-core pipelines.\nAdditionally, she aimed to develop new educational resources for nf-core that can support researchers from lowly represented groups. Her mentor was Murray Wham from the ​​University of Edinburgh in the UK.\n\nDuring the mentorship program, Murray helped Batool develop her molecular dynamics pipeline and participate in the 1st Biohackathon in MENA (KAUST). Batool and Murray also found ways to make documentation more accessible and are actively promoting Nextlfow and nf-core in Saudi Arabia.\n\n## Mariama Telly Diallo (Mentee) & Emilio Garcia (Mentor)\n\nMariama Telly joined the mentorship program with the goal of developing and implementing Nextflow pipelines for malaria research at the Medical Research Unit at The London School of Hygiene and Tropical Medicine in Gambia. She was mentored by Emilio Garcia from Platomics in Austria. Emilio is another experienced mentor who joined the program for a second time.\n\nTogether, Mariama Telly and Emilio worked on learning the basics of Nextflow, Git, and Docker. Putting these skills into practice they started to develop a Nextflow pipeline with a docker file and custom configuration. Mariama Telly greatly improved her understanding of best practices and Nextflow and intends to use her newfound knowledge for future projects.\n\n## Anabella Trigila (Mentee) & Matthias De Smet (Mentor)\n\nAnabella’s goal was to set up Nextflow on her institutional cluster at Héritas S.A. in Argentina and translate some bash pipelines into Nextflow pipelines. Anabella was mentored by Matthias De Smet from Ghent University in Belgium.\n\nAnabella and Matthias worked on developing several new nf-core modules. Extending this, they started the development of a [pipeline](https://github.com/atrigila/nf-core-saliva) to process VCFs obtained from saliva samples and a [pipeline](https://github.com/atrigila/nf-core-ancestry) to infer ancestry from VCF samples. 
Anabella has now transitioned from a user to a developer and made multiple contributions to the most recent nf-core hackathon. She also contributed to the Spanish translation of the Nextflow [training material](https://training.nextflow.io/es/).\n\n## Juliano de Oliveira Silveira (Mentee) & Maxime Garcia (Mentor)\n\nJuliano works at the Laboratório Central de Saúde Pública RS in Brazil. He joined the program with the goal of setting up Nextflow at his institution, which led him to learn to write his own pipelines. Juliano was mentored by Maxime Garcia from Seqera Labs in Sweden.\n\nJuliano and Maxime worked on learning about Nextflow and nf-core. Juliano applied his new skills to an open-source bioinformatics program that used Nextflow with a customized R script. Juliano hopes to give back to the wider community and peers in Brazil.\n\n## Patricia Agudelo-Romero (Mentee) & Abhinav Sharma (Mentor)\n\nPatricia's goal was to create, customize, and deploy nf-core pipelines at the Telethon Kids Institute in Australia. Her mentor was Abhinav Sharma from Stellenbosch University in South Africa.\n\nAbhinav helped Patricia learn how to write reproducible pipelines with Nextflow and how to work with shared code repositories on GitHub. With Abhinav's support, Patricia worked on translating a Snakemake [pipeline](https://github.com/agudeloromero/everest_nf) designed for genome virus identification and classification into Nextflow. Patricia is already applying her new skills and supporting others at her institute as they adopt Nextflow.\n\n## Mariana Guilardi (Mentee) & Alyssa Briggs (Mentor)\n\nMariana’s goal was to learn the fundamentals of Nextflow, construct and run pipelines, and help with nf-core pipeline development. Her mentor was Alyssa Briggs from the University of Texas at Dallas in the USA\n\nAt the start of the program, Alyssa helped Mariana learn the fundamentals of Nextflow. With Alyssa’s help, Mariana’s skills progressed rapidly and by the end of the program, they were running pipelines and developing new nf-core modules and the [nf-core/viralintegration](https://github.com/nf-core/viralintegration) pipeline. Mariana also made community contributions to the Portuguese translation of the Nextflow [training material](https://training.nextflow.io/pt/).\n\n## Liliane Cavalcante (Mentee) & Marcel Ribeiro-Dantas (Mentor)\n\nLiliane’s goal was to develop and apply Nextflow pipelines for genomic and epidemiological analyses at the Laboratório Central de Saúde Pública Noel Nutels in Brazil. Her mentor was Marcel Ribeiro-Dantas from Seqera Labs in Brazil.\n\nLiliane and Marcel used Nextflow and nf-core to analyze SARS-CoV-2 genomes and demographic data for public health surveillance. They used the [nf-core/viralrecon](https://nf-co.re/viralrecon) pipeline and made a new Nextflow script for additional analysis and generating graphs.\n\n## Conclusion\n\nAs with the first round of the program, the feedback about the second round of the mentorship program was overwhelmingly positive. All mentees found the experience to be highly beneficial and were grateful for the opportunity to participate.\n\n
\n “Having a mentor guide through the entire program was super cool. We worked all the way from the basics of Nextflow and learned a lot about developing and debugging pipelines. Today, I feel more confident than before in using Nextflow on a daily basis.” - Sebastian Musundi (Mentee)\n
\n\nSimilarly, the mentors also found the experience to be highly rewarding.\n\n
\n “As a mentor, I really enjoyed participating in the program. Not only did I have the chance to support and work with colleagues from lowly represented regions, but also I learned a lot and improved myself through the mentoring and teaching process.” - Athanasios Baltzis (Mentor)\n
\n\nImportantly, all program participants expressed their willingness to encourage others to be part of it in the future.\n\n
\n “The mentorship allows mentees not only to learn nf-core/Nextflow but also a lot of aspects about open-source reproducible research. With your learning, at the end of the mentorship, you could even contribute back to the nf-core community, which is fantastic! I would tell everyone who is interested in the program to go for it.” - Anabella Trigila (Mentee)\n
\n\nAs the Nextflow and nf-core communities continue to grow, the mentorship program will have long-lasting benefits beyond those that can be immediately measured. Mentees from the program have already become positive role models, contributing new perspectives to the broader community.\n\n
\n “I highly recommend this program. Independent if you are new to Nextflow or already have some experience, the possibility of working with amazing people to learn about the Nextflow ecosystem is invaluable. It helped me to improve my work, learn new things, and become confident enough to teach Nextflow to students.” - Juan Ugalde (Mentee)\n
\n\nWe were delighted with the achievements of the mentors and mentees. Applications for the third round are now open! For more information, please visit https://nf-co.re/mentorships.\n", "images": [ "/img/mentorships-round2-rocket.png" - ] + ], + "author": "Chris Hakkaart", + "tags": "nextflow,nf-core,czi,mentorship,training" }, { "slug": "2023/czi-mentorship-round-3", @@ -439,7 +555,9 @@ "content": "\n
![Mentorship rocket](/img/mentorship_3_sticker.png)\n_Nextflow and nf-core mentorship rocket._
\n\nWith the third round of the [Nextflow and nf-core mentorship program](https://nf-co.re/mentorships) now behind us, it's time to pop the confetti and celebrate the outstanding achievements of our latest group of mentors and mentees!\n\nAs with the [first](https://www.nextflow.io/blog/2022/czi-mentorship-round-1.html) and [second](https://www.nextflow.io/blog/2023/czi-mentorship-round-2.html) rounds of the program, we received hundreds of applications from all over the world. Mentors and mentees were matched based on compatible interests and time zones and set off to work on a project of their choosing. Pairs met regularly to work on their projects and reported back to the group to discuss their progress every month.\n\nThe mentor-mentee duos chose to tackle many interesting projects during the program. From learning how to develop pipelines with Nextflow and nf-core, to setting up Nextflow on their institutional clusters, to translating Nextflow training materials into other languages, this cohort of mentors and mentees did it all. Despite the initial challenges, every pair emerged from the program brimming with confidence and a knack for building scalable and reproducible scientific workflows with Nextflow. Way to go, team!\n\n![Map of mentor and mentee pairs](/img/mentorship_3_map.png)
\n_Participants of the third round of the mentorship program._\n\n## Abhay Rastogi and Matthias De Smet\n\nAbhay Rastogi is a Clinical Research Fellow at the All India Institute Of Medical Sciences (AllMS Delhi). During the program, he wanted to contribute to the [nf-core/sarek](https://github.com/nf-core/sarek/) pipeline. He was mentored by Matthias De Smet, a Bioinformatician at the Center for Medical Genetics in the Ghent University Hospital. Together they worked on developing an nf-core module for Exomiser, a variant prioritization tool for short-read WGS data that they hope to incorporate into [nf-core/sarek](https://github.com/nf-core/sarek/). Keep an eye out for this brand new feature as they continue to work towards implementing this new feature into the [nf-core/sarek](https://github.com/nf-core/sarek/) pipeline!\n\n## Alan Möbbs and Simon Pearce\n\nAlan Möbbs, a Bioinformatics Analyst at MultiplAI, was mentored by Simon Pearce, Principal Bioinformatician at the Cancer Research UK Cancer Biomarker Centre. During the program, Alan wanted to create a custom pipeline that merges functionalities from the [nf-core/rnaseq](https://github.com/nf-core/rnaseq/) and [nf-core/rnavar](https://github.com/nf-core/rnavar/) pipelines. They started their project by forking the [nf-core/rnaseq](https://github.com/nf-core/rnaseq/) pipeline and adding a subworkflow with variant calling functionalities. As the project moved on, they were able to remove tools from the pipeline that were no longer required. Finally, they created some custom definitions for processing samples and work queues to optimize the workflow on AWS. Alan plans to keep working on this project in the future.\n\n## Cen Liau and Chris Hakkaart\n\nCen Liau is a scientist at the Bragato Research Institute in New Zealand, analyzing the epigenetics of grapevines in response to environmental stress. Her mentor was Chris Hakkaart, a Developer Advocate at Seqera. They started the program by deploying the [nf-core/methylseq](https://github.com/nf-core/methylseq/) pipeline on New Zealand’s national infrastructure to analyze data Cen had produced. Afterward, they started to develop a proof of concept methylation pipeline to analyze additional data Cen has produced. Along the way, they learned about nf-core best practices and how to use GitHub to build pipelines collaboratively.\n\n## Chenyu Jin and Ben Sherman\n\nChenyu Jin is a Ph.D. student at the Center for Palaeogenetics of the Swedish Museum of Natural History. She worked with Ben Sherman, a Software Engineer at Seqera. Together they worked towards establishing a workflow for recursive step-down classification using experimental Nextflow features. During the program, they made huge progress in developing a cutting-edge pipeline that can be used for analyzing ancient environmental DNA and reconstructing flora and fauna. Watch this space for future developments!\n\n## Georgie Samaha and Cristina Tuñí i Domínguez\n\nGeorgie Samaha, a bioinformatician from the University of Sydney, was mentored by Cristina Tuñi i Domínguez, a Bioinformatics Scientist at Flomics Biotech SL. During the program, they developed Nextflow configuration files. As a part of this, they built institutional configuration files for multiple national research HPC and cloud infrastructures in Australia. 
Towards the end of the mentorship, they [built a tool for building configuration files](https://github.com/georgiesamaha/configBuilder-nf) that they hope to share widely in the future.\n\n## Ícaro Maia Santos de Castro and Robert Petit\n\nÍcaro Maia Santos is a Ph.D. Candidate at the University of São Paulo. He was mentored by Robert, a Research Scientist from Wyoming Public Health Lab. After learning the basics of Nextflow and nf-core, they worked on a [metatranscriptomics pipeline](https://github.com/icaromsc/nf-core-phiflow) that simultaneously characterizes microbial composition and host gene expression RNA sequencing samples. As a part of this process, they used nf-core modules that were already available and developed and contributed new modules to the nf-core repository. Ícaro found having someone to help him learn and overcome issues as he was developing his pipeline was invaluable for his career.\n\n![phiflow metro map](/img/phiflow_metro_map.png)
\n_Metro map of the phiflow workflow._\n\n## Lila Maciel Rodríguez Pérez and Priyanka Surana\n\nLila Maciel Rodríguez Pérez, from the National Agrarian University in Peru, was mentored by Priyanka Surana, a researcher from the Wellcome Sanger Institute in the UK. Lila and Priyanka focused on building and deploying Nextflow scripts for metagenomic assemblies. In particular, they were interested in the identification of Antibiotic-Resistant Genes (ARG), Metal-Resistant Genes (MRG), and Mobile Genetic Elements (MGE) in different environments, and in figuring out how these genes are correlated. Both Lila and Priyanka spoke highly of each other and how much they enjoyed being a part of the program.\n\n## Luisa Sacristan and Gisela Gabernet\n\nLuisa is an MSc. student studying computational biology in the Computational Biology and Microbial Ecology group at Universidad de los Andes in Colombia. She was mentored by Gisela Gabernet, a researcher at Yale Medical School. At the start of the program, Luisa and Gisela focused on learning more about GitHub. They quickly moved on to developing an nf-core configuration file for Luisa’s local university cluster. Finally, they started developing a pipeline for the analysis of custom ONT metagenomic amplicons from coffee beans.\n\n## Natalia Coutouné and Marcel Ribeiro-Dantas\n\nNatalia Coutoné is a Ph.D. Candidate at the University of Campinas in Brazil. Her mentor was Marcel Ribeiro-Dantas from Seqera. Natalia and Marcel worked on developing a pipeline to identify relevant QTL among two or more pool-seq samples. Learning the little things, such as how and where to get help was a valuable part of the learning process for Natalia. She also found it especially useful to consolidate a “Frankenstein” pipeline she had been using into a cohesive Nextflow pipeline that she could share with others.\n\n## Raquel Manzano and Maxime Garcia\n\nRaquel Manzano is a bioinformatician and Ph.D. candidate at the University of Cambridge, Cancer Research UK Cambridge Institute. She was mentored by Maxime Garcia, a bioinformatics engineer at Seqera. During the program, they spent their time developing the [nf-core/rnadnavar](https://github.com/nf-core/rnadnavar/) pipeline. Initially designed for cancer research, this pipeline identifies a consensus call set from RNA and DNA somatic variant calling tools. Both Raquel and Maxime found the program to be highly rewarding. Raquel’s [presentation](https://www.youtube.com/watch?v=PzGOvqSI5n0) about the rnadnavar pipeline and her experience as a mentee from the 2023 Nextflow Summit in Barcelona is now online.\n\n## Conclusion\n\nWe are thrilled to report that the feedback from both mentors and mentees has been overwhelmingly positive. Every participant, whether mentor or mentee, found the experience extremely valuable and expressed gratitude for the chance to participate.\n\n
\n “I loved the experience and the opportunity to develop my autonomy in nextflow/nf-core. This community is totally amazing!” - Icaro Castro\n
\n\n
\n “I think this was a great opportunity to learn about a tool that can make our day-to-day easier and reproducible. Who knows, maybe it can give you a better chance when applying for jobs.” - Alan Möbbs\n
\n\nThanks to the fantastic support of the Chan Zuckerberg Initiative Diversity and Inclusion grant, Seqera, and our fantastic community, who made it possible to run all three rounds of the Nextflow and nf-core mentorship program.\n", "images": [ "/img/mentorship_3_sticker.png" - ] + ], + "author": "Marcel Ribeiro-Dantas", + "tags": "nextflow,nf-core,czi,mentorship" }, { "slug": "2023/geraldine-van-der-auwera-joins-seqera", @@ -448,7 +566,9 @@ "content": "\n\n\nI’m excited to announce that I’m joining Seqera as Lead Developer Advocate. My mission is to support the growth of the Nextflow user community, especially in the USA, which will involve running community events, conducting training sessions, managing communications and working globally with our partners across the field to ensure Nextflow users have what they need to be successful. I’ll be working remotely from Boston, in collaboration with Paolo, Phil and the rest of the Nextflow team.\n\nSome of you may already know me from my previous job at the Broad Institute, where I spent a solid decade doing outreach and providing support for the genomics research community, first for GATK, then for WDL and Cromwell, and eventually Terra. A smaller subset might have come across the O’Reilly book I co-authored, [Genomics on the Cloud](https://www.oreilly.com/library/view/genomics-in-the/9781491975183/).\n\nThis new mission is very much a continuation of my dedication to helping the research community use cutting-edge software tools effectively.\n\n## From bacterial cultures to large-scale genomics\n\nTo give you a brief sense of where I’m coming from, I originally trained as a wetlab microbiologist in my homeland of Belgium, so it’s fair to say I’ve come a long way, quite literally. I never took a computing class, but taught myself Python during my PhD to analyze bacterial plasmid sequencing data (72 kb of Sanger sequence!) and sort of fell in love with bioinformatics in the process. Later, I got the opportunity to deepen my bioinformatics skills during my postdoc at Harvard Medical School, although my overall research project was still very focused on wetlab work.\n\nToward the end of my postdoc, I realized I had become more interested in the software side of things, though I didn’t have any formal qualifications. Fortunately I was able to take a big leap sideways and found a new home at the Broad Institute, where I was hired as a Bioinformatics Scientist to build out the GATK community, at a time when it was still a bit niche. (It’s a long story that I don’t have time for today, but I’m always happy to tell it over drinks at a conference reception…)\n\nThe GATK job involved providing technical and scientific support to researchers, developing documentation, and teaching workshops about genomics and variant calling specifically. Which is hilarious because at the time I was hired, I had no clue what variant calling even meant! I think I was easily a month or two into the job before that part actually started making a little bit of sense. I still remember the stress and confusion of trying to figure all that out, and it’s something I always carry with me when I think about how to help newcomers to the ecosystem. I can safely say, whatever aspect of this highly multidisciplinary field is causing you trouble, I’ve struggled with it myself at some point.\n\nAnyway, I can’t fully summarize a decade in a couple of paragraphs, but suffice to say, I learned an enormous amount on the job. 
And in the process, I developed a passion for helping researchers take maximum advantage of the powerful bioinformatics at their disposal. Which inevitably involves workflows.\n\n## Going with the flow\n\nOver time my responsibilities at the Broad grew into supporting not just GATK, but also the workflow systems people use to run tools like GATK at scale, both on premises and increasingly, on public cloud platforms. My own pipelining experience has been focused on WDL and Cromwell, but I’ve dabbled with most of the mainstream tools in the space.\n\nIf I had a dollar for every time I’ve been asked the question “What’s the best workflow language?” I’d still need a full-time job, but I could maybe take a nice holiday somewhere warm. Oh, and my answer is: whatever gets the work done, plays nice with the systems you’re tied to, and connects you to a community.\n\nThat’s one of the reasons I’ve been watching the growth of Nextflow’s popularity with great interest for the last few years. The amount of community engagement that we’ve seen around Nextflow, and especially around the development of nf-core, has been really impressive.\n\nSo I’m especially thrilled to be joining the Seqera team the week of the [Nextflow Summit](https://summit.nextflow.io/) in Barcelona, because it means I’ll get to meet a lot of people from the community in person during my very first few days on the job. I’m also very much looking forward to participating in the hackathon, which should be a great way for me to get started doing real work with Nextflow.\n\nI’m hoping to see many of you there!\n", "images": [ "/img/geraldine-van-der-auwera.jpg" - ] + ], + "author": "Geraldine Van der Auwera", + "tags": "nextflow,community" }, { "slug": "2023/introducing-nextflow-ambassador-program", @@ -457,14 +577,18 @@ "content": "\n\n\nWe are excited to announce the launch of the Nextflow Ambassador Program, a worldwide initiative designed to foster collaboration, knowledge sharing, and community growth. It is intended to recognize and support the efforts of our community leaders and marks another step forward in our mission to advance scientific research and empower researchers.\n\nNextflow ambassadors will play a vital role in:\n\n- Sharing Knowledge: Ambassadors provide valuable insights and best practices to help users make the most of Nextflow by writing training material and blog posts, giving seminars and workshops, organizing hackathons and meet-ups, and helping with community support.\n- Fostering Collaboration: As knowledgeable members of our community, ambassadors facilitate connections among users and developers, enabling collaboration on community projects, such as nf-core pipelines, sub-workflows, and modules, among other things, in the Nextflow ecosystem.\n- Community Growth: Ambassadors help expand and enrich the Nextflow community, making it more vibrant and supportive. They are local contacts for new community members and engage with potential users in their region and fields of expertise.\n\nAs community members who already actively contribute to outreach, ambassadors will be supported to extend the work they're already doing. For example, many of our ambassadors run local Nextflow training events – to help with this, the program will include “train the trainer” sessions and give access to our content library with slide decks, templates, and more. Ambassadors can also request stickers and financial support for events they organize (e.g., for pizza). 
Seqera is opening an exclusive travel fund that ambassadors can apply to help cover travel costs for events where they will present relevant work. Social media content written by ambassadors will be amplified by the nextflow and nf-core accounts, increasing their reach. Ambassadors will get \"behind the scenes\" access, with insights into running an open-source community, early access to new features, and a great networking experience. The ambassador network will enable members to be kept up-to-date with events and opportunities happening all over the world. To recognize their efforts, ambassadors will receive exclusive swag and apparel, a certificate for their work, and a profile on the ambassador page of our website.\n\n## Meet Our Ambassadors\n\nYou can visit our [Nextflow ambassadors page](https://www.nextflow.io/our_ambassadors.html) to learn more about our first group of ambassadors. You will find their profiles there, highlighting their interests, expertise, and insights they bring to the Nextflow ecosystem.\n\nYou can see snippets about some of our ambassadors below:\n\n#### Priyanka Surana\n\nPriyanka Surana is a Principal Bioinformatician at the Wellcome Sanger Institute, where she oversees the Nextflow development for the Tree of Life program. Over the last almost two years, they have released nine pipelines with nf-core standards and have three more in development. You can learn more about them [here](https://pipelines.tol.sanger.ac.uk/pipelines).\n\nShe’s one of our ambassadors in the UK 🇬🇧 and has already done fantastic outreach work, organizing seminars and bringing many new users to our community! 🤩 In the March Hackathon, she organized a local site with over 70 individuals participating in person, plus over five other events in 2023. The Nextflow community on the Wellcome Genome Campus started in March 2023 with the nf-core hackathon, and now it has grown to over 150 members across 11 different organizations across Cambridge. Currently, they are planning a day-long Nextflow Symposium in December 🤯. They do seminars, workshops, coffee meetups, and trainings. In our previous round of the Nextflow and nf-core mentorship, Priyanka mentored Lila, a graduate student in Peru, to build her first Nextflow pipeline using nf-core tools to analyze bacterial metagenomics data. This is the power of a Nextflow ambassador! Not only growing a local community but helping people all over the world to get the best out of Nextflow and nf-core 🥰.\n\n#### Abhinav Sharma\n\nAbhinav is a PhD candidate at Stellenbosch University, South Africa. As a Nextflow Ambassador, Abhinav has been tremendously active in the Global South, supporting young scientists in Africa 🇿🇦🇿🇲, Brazil 🇧🇷, India 🇮🇳 and Australia 🇦🇺 leading to the growth of local communities. He has contributed to the [Nextflow training in Hindi](https://www.youtube.com/playlist?list=PL3xpfTVZLcNikun1FrSvtXW8ic32TciTJ) and played a key role in integrating African bioinformaticians in the Nextflow and nf-core community and initiatives, showcased by the high participation of individuals in African countries who benefited from mentorship during nf-core Hackathons, Training events and prominent workshops like [VEME, 2023](https://twitter.com/abhi18av/status/1695863348162675042). 
In Australia, Abhinav continues to collaborate with Patricia, a research scientist from Telethon Kids Institute, Perth (whom he mentored during the nf-core mentorship round 2), to organize monthly seminars on [BioWiki](https://github.com/TelethonKids/Nextflow-BioWiki) and bootcamp for local capacity building. In addition, he engages in regular capacity-building sessions in Brazilian institutes such as [Instituto Evandro Chagas](https://www.gov.br/iec/pt-br/assuntos/noticias/curso-contribui-para-criacao-da-rede-norte-nordeste-de-vigilancia-genomica-para-tuberculose-no-iec) (Belém, Brazil) and INI, FIOCRUZ (Rio de Janeiro, Brazil). Last but not least, Abhinav has contributed to the Nextflow community and project in several ways, even to the extent of contributing to the Nextflow code base and plugin ecosystem! 😎\n\n#### Robert Petit\n\nRobert Petit is the Senior Bioinformatics Scientist at the [Wyoming Public Health Laboratory](https://health.wyo.gov/publichealth/lab/) 🦬 and a long-time contributor to the Nextflow community! 🥳 Being a Nextflow Ambassador, Robert has made extensive efforts to grow the Nextflow and nf-core communities, both locally and internationally. Through his work on [Bactopia](https://bactopia.github.io/), a popular and extensive Nextflow pipeline for the analysis of bacterial genomes, Robert has been able to [contribute to nf-core regularly](https://bactopia.github.io/v3.0.0/impact-and-outreach/enhancements/#enhancements-and-fixes). As a Bioconda Core team member, he is always lending a hand when called upon by the Nextflow community, whether it is to add a new recipe or approve a pull request! ⚒️ He has also delivered multiple trainings to the local community in Wyoming, US 🇺🇸, and workshops at conferences, including ASM Microbe. Robert's dedication as a Nextflow Ambassador is best highlighted, and he'll agree, by his active role as a mentor. Robert has acted as a mentor multiple times during virtual nf-core hackathons, and he is the only person to be a mentor in all three rounds of the Nextflow and nf-core mentorship program 😍!\n\nThe Nextflow Ambassador Program is a testament to the power of community-driven innovation, and we invite you to join us in celebrating this exceptional group. In the coming weeks and months, you will hear more from our ambassadors as they continue to share their experiences, insights, and expertise with the community as freshly minted Nextflow ambassadors.\n", "images": [ "/img/ambassadors-hackathon.jpeg" - ] + ], + "author": "Marcel Ribeiro-Dantas", + "tags": "nextflow,community" }, { "slug": "2023/learn-nextflow-in-2023", "title": "Learn Nextflow in 2023", "date": "2023-02-24T00:00:00.000Z", "content": "\nIn 2023, the world of Nextflow is more exciting than ever! With new resources constantly being released, there is no better time to dive into this powerful tool. From a new [Software Carpentries’](https://carpentries-incubator.github.io/workflows-nextflow/index.html) course to [recordings of mutiple nf-core training events](https://nf-co.re/events/training/) to [new tutorials on Wave and Fusion](https://github.com/seqeralabs/wave-showcase), the options for learning Nextflow are endless.\n\nWe've compiled a list of the best resources in 2023 to make your journey to Nextflow mastery as seamless as possible. And remember, Nextflow is a community-driven project. 
If you have suggestions or want to contribute to this list, head to the [GitHub page](https://github.com/nextflow-io/) and make a pull request.\n\n## Before you start\n\nBefore learning Nextflow, you should be comfortable with the Linux command line and be familiar with some basic scripting languages, such as Perl or Python. The beauty of Nextflow is that task logic can be written in your language of choice. You will just need to learn Nextflow’s domain-specific language (DSL) to control overall flow.\n\nNextflow is widely used in bioinformatics, so many tutorials focus on life sciences. However, Nextflow can be used for almost any data-intensive workflow, including image analysis, ML model training, astronomy, and geoscience applications.\n\nSo, let's get started! These resources will guide you from beginner to expert and make you unstoppable in the field of scientific workflows.\n\n## Contents\n\n- [Why Learn Nextflow](#why-learn-nextflow)\n- [Meet the Tutorials!](#meet-the-tutorials)\n 1. [Basic Nextflow Community Training](#introduction-to-nextflow-by-community)\n 2. [Hands-on Nextflow Community Training](#nextflow-hands-on-by-community)\n 3. [Advanced Nextflow Community Training](#advanced-nextflow-by-community)\n 4. [Software Carpentry workshop](#software-carpentry-workshop)\n 5. [An introduction to Nextflow course by Uppsala University](#intro-nexflow-by-uppsala)\n 6. [Introduction to Nextflow workshop by VIB](#intro-nextflow-by-vib)\n 7. [Nextflow Training by Curtin Institute of Radio Astronomy (CIRA)](#nextflow-training-cira)\n 8. [Managing Pipelines in the Cloud - GenomeWeb Webinar](#managing-pipelines-in-the-cloud-genomeweb-webinar)\n 9. [Nextflow implementation patterns](#nextflow-implementation-patterns)\n 10. [nf-core tutorials](#nf-core-tutorials)\n 11. [Awesome Nextflow](#awesome-nextflow)\n 12. [Wave showcase: Wave and Fusion tutorials](#wave-showcase-wave-and-fusion-tutorials)\n 13. [Building Containers for Scientific Workflows](#building-containers-for-scientific-workflows)\n 14. [Best Practices for Deploying Pipelines with Nextflow Tower](#best-practices-for-deploying-pipelines-with-nextflow-tower)\n- [Cloud integration tutorials](#cloud-integration-tutorials)\n 1. [Nextflow and AWS Batch Inside the Integration](#nextflow-and-aws-batch-inside-the-integration)\n 2. [Nextflow and Azure Batch Inside the Integration](#nextflow-and-azure-batch-inside-the-integration)\n 3. [Get started with Nextflow on Google Cloud Batch](#get-started-with-nextflow-on-google-cloud-batch)\n 4. [Nextflow and K8s Rebooted: Running Nextflow on Amazon EKS](#nextflow-and-k8s-rebooted-running-nextflow-on-amazon-eks)\n- [Additional resources](#additional-resources)\n 1. [Nextflow docs](#nextflow-docs)\n 2. [Seqera Labs docs](#seqera-labs-docs)\n 3. [nf-core](#nf-core)\n 4. [Nextflow Tower](#nextflow-tower)\n 5. [Nextflow on AWS](#nextflow-on-aws)\n 6. [Nextflow Data pipelines on Azure Batch](#nextflow-data-pipelines-on-azure-batch)\n 7. [Running Nextflow with Google Life Sciences](#running-nextflow-with-google-life-sciences)\n 8. [Bonus: Nextflow Tutorial - Variant Calling Edition](#bonus-nextflow-tutorial-variant-calling-edition)\n- [Community and support](#community-and-support)\n\n

## Why Learn Nextflow

\n\nThere are hundreds of workflow managers to choose from. In fact, Meir Wahnon and several of his colleagues have gone to the trouble of compiling an awesome-workflow-engines list. The workflows community initiative is another excellent source of information about workflow engines.\n\n- Using Nextflow in your analysis workflows helps you implement reproducible pipelines. Nextflow pipelines follow [FAIR guidelines](https://www.go-fair.org/fair-principles/) (findability, accessibility, interoperability, and reuse). Nextflow also supports version control and containers to manage all software dependencies.\n- Nextflow is portable; the same pipeline written on a laptop can quickly scale to run on an HPC cluster, Amazon AWS, Microsoft Azure, Google Cloud Platform, or Kubernetes. With features like [configuration profiles](https://nextflow.io/docs/latest/config.html?#config-profiles), code can be written so that it is 100% portable across different on-prem and cloud infrastructures enabling collaboration and avoiding lock-in.\n- It is massively **scalable**, allowing the parallelization of tasks using the dataflow paradigm without hard-coding pipelines to specific platforms, workload managers, or batch services.\n- Nextflow is **flexible**, supporting scientific workflow requirements like caching processes to avoid redundant computation and workflow reporting to help understand and diagnose workflow execution patterns.\n- It is **growing fast**, and **support is available** from [Seqera Labs](https://seqera.io). The project has been active since 2013 with a vibrant developer community, and the Nextflow ecosystem continues to expand rapidly.\n- Finally, Nextflow is open source and licensed under Apache 2.0. You are free to use it, modify it, and distribute it.\n\n
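As a quick illustration of the portability point above, the same pipeline can be launched against different software stacks just by switching profiles on the command line. A minimal sketch, assuming Docker or Singularity is installed locally and using the small built-in `test` profile of the nf-core/rnaseq pipeline:\n\n```bash\n# Same pipeline, different software provisioning - only the profile changes\n$ nextflow run nf-core/rnaseq -profile test,docker --outdir results\n$ nextflow run nf-core/rnaseq -profile test,singularity --outdir results\n```\n\n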

Meet the Tutorials!

\n\nSome of the best publicly available tutorials are listed below:\n\n

### 1. Basic Nextflow Community Training

\n\nBasic training for all things Nextflow. Perfect for anyone looking to get to grips with using Nextflow to run analyses and build workflows. This is the primary Nextflow training material used in most Nextflow and nf-core training events. It covers a large number of topics, with both theoretical and hands-on chapters.\n\n[Basic Nextflow Community Training](https://training.nextflow.io/basic_training/)\n\nWe run a free online training event for this course approximately every six months. Videos are streamed to YouTube and questions are handled in the nf-core Slack community. You can watch the recording of the most recent training ([September, 2023](https://nf-co.re/events/2023/training-basic-2023)) in the [YouTube playlist](https://youtu.be/ERbTqLtAkps?si=6xDoDXsb6kGQ_Qa8) below:\n\n
\n \n
\n\n
<h3 id="nextflow-hands-on-by-community">2. Hands-on Nextflow Community Training</h3>
\n\nA \"learn by doing\" tutorial with less focus on theory, instead leading through exercises of slowly increasing complexity. This course is quite short and hands-on, great if you want to practice your Nextflow skills.\n\n[Hands-on Nextflow Community Training](https://training.nextflow.io/hands_on/)\n\nYou can watch the recording of the most recent training ([September, 2023](https://nf-co.re/events/2023/training-hands-on-2023/)) below:\n\n
\n \n
\n\n
<h3 id="advanced-nextflow-by-community">3. Advanced Nextflow Community Training</h3>
\n\nAdvanced training material exploring the more powerful features of the Nextflow language and runtime, and how to use them to write efficient and scalable data-intensive workflows. This is the Nextflow training material used in advanced training events.\n\n[Advanced Nextflow Community Training](https://training.nextflow.io/advanced/)\n\nYou can watch the recording of the most recent training ([September, 2023](https://nf-co.re/events/2023/training-sept-2023/)) below:\n\n
\n \n
\n\n
<h3 id="software-carpentry-workshop">4. Software Carpentry workshop</h3>
\n\nThe [Nextflow Software Carpentry](https://carpentries-incubator.github.io/workflows-nextflow/index.html) workshop (still being developed) explains the use of Nextflow and [nf-core](https://nf-co.re/) as development tools for building and sharing reproducible data science workflows. The intended audience is those with little programming experience. The course provides a foundation to write and run Nextflow and nf-core workflows comfortably. Adapted from the Seqera training material above, the workshop has been updated by Software Carpentries instructors within the nf-core community to fit The Carpentries training style. [The Carpentries](https://carpentries.org/) emphasize feedback to improve teaching materials, so we would like to hear back from you about what you thought was well-explained and what needs improvement. Pull requests to the course material are very welcome.\n\nThe workshop can be opened on [Gitpod](https://gitpod.io/#https://github.com/carpentries-incubator/workflows-nextflow) where you can try the exercises in an online computing environment at your own pace while referencing the course material in another window alongside the tutorials.\n\nYou can find the course in [The Carpentries incubator](https://carpentries-incubator.github.io/workflows-nextflow/index.html).\n\n
<h3 id="intro-nexflow-by-uppsala">5. An introduction to Nextflow course from Uppsala University</h3>
\n\nThis 5-module course by Uppsala University covers the basics of Nextflow, from running Nextflow pipelines and writing your own, to using containers and conda.\n\nThe course can be viewed [here](https://uppsala.instructure.com/courses/51980/pages/nextflow-1-introduction?module_item_id=328997).\n\n
<h3 id="intro-nextflow-by-vib">6. Introduction to Nextflow workshop by VIB</h3>
\n\nWorkshop materials by VIB, written (mainly) in DSL2, that aim to get you familiar with the Nextflow syntax by explaining basic concepts and building a simple RNAseq pipeline. The workshop also highlights reproducibility aspects by adding containers (Docker & Singularity).\n\nThe course can be viewed [here](https://vibbits-nextflow-workshop.readthedocs.io/en/latest/).\n\n
<h3 id="nextflow-training-cira">7. Nextflow Training by Curtin Institute of Radio Astronomy (CIRA)</h3>
\n\nThis training was prepared for physicists and has examples applied to astronomy, which may be interesting for Nextflow users coming from this background!\n\nThe course can be viewed [here](https://carpentries-incubator.github.io/Pipeline_Training_with_Nextflow/).\n\n
<h3 id="managing-pipelines-in-the-cloud-genomeweb-webinar">8. Managing Pipelines in the Cloud - GenomeWeb Webinar</h3>
\n\nThis on-demand webinar features Phil Ewels from SciLifeLab, nf-core (now also Seqera Labs), Brendan Boufler from Amazon Web Services, and Evan Floden from Seqera Labs. The wide-ranging discussion covers the significance of scientific workflows, examples of Nextflow in production settings, and how Nextflow can be integrated with other processes.\n\n[Watch the webinar](https://seqera.io/events/managing-bioinformatics-pipelines-in-the-cloud-to-do-more-science/)\n\n
<h3 id="nextflow-implementation-patterns">9. Nextflow implementation patterns</h3>
\n\nThis advanced documentation discusses recurring patterns in Nextflow and solutions to many common implementation requirements. Code examples are available with notes to follow along and a GitHub repository.\n\n[Nextflow Patterns](http://nextflow-io.github.io/patterns/index.html) & [GitHub repository](https://github.com/nextflow-io/patterns).\n\n
<h3 id="nf-core-tutorials">10. nf-core tutorials</h3>
\n\nA set of tutorials covering the basics of using and creating nf-core pipelines developed by the team at [nf-core](https://nf-co.re/). These tutorials provide an overview of the nf-core framework, including:\n\n- How to run nf-core pipelines\n- What are the most commonly used nf-core tools\n- How to make new pipelines using the nf-core template\n- What are nf-core shared modules\n- How to add nf-core shared modules to a pipeline\n- How to make new nf-core modules using the nf-core module template\n- How nf-core pipelines are reviewed and ultimately released\n\n[nf-core usage tutorials](https://nf-co.re/docs/usage/tutorials) and [nf-core developer tutorials](https://nf-co.re/docs/contributing/tutorials).\n\n
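As a quick illustration of the first item above, launching an nf-core pipeline is typically a one-liner. The command below is only an example (using nf-core/rnaseq with its bundled test profile and Docker; the output directory name is arbitrary):\n\n```bash\n# Illustrative: run the nf-core/rnaseq pipeline on its built-in test dataset using Docker\nnextflow run nf-core/rnaseq -profile test,docker --outdir results\n```\n\n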
<h3 id="awesome-nextflow">11. Awesome Nextflow</h3>
\n\nA collection of awesome Nextflow pipelines compiled by various contributors to the open-source Nextflow project.\n\n[Awesome Nextflow](https://github.com/nextflow-io/awesome-nextflow) on GitHub\n\n
<h3 id="wave-showcase-wave-and-fusion-tutorials">12. Wave showcase: Wave and Fusion tutorials</h3>
\n\nWave and the Fusion file system are new Nextflow capabilities introduced in November 2022. Wave is a container provisioning and augmentation service fully integrated with the Nextflow ecosystem. Instead of viewing containers as separate artifacts that need to be integrated into a pipeline, Wave allows developers to manage containers as part of the pipeline itself.\n\nTightly coupled with Wave is the new Fusion 2.0 file system. Fusion implements a virtual distributed file system and presents a thin client, allowing data hosted in AWS S3 buckets (and other object stores in the future) to be accessed via the standard POSIX filesystem interface expected by most applications.\n\nWave can help simplify development, improve reliability, and make pipelines easier to maintain. It can even improve pipeline performance. The optional Fusion 2.0 file system offers further advantages, delivering performance on par with FSx for Lustre while enabling organizations to reduce their cloud computing bill and improve pipeline efficiency and throughput. See the [blog article](https://seqera.io/blog/breakthrough-performance-and-cost-efficiency-with-the-new-fusion-file-system/) released in February 2023 explaining the Fusion file system and providing benchmarks comparing Fusion to other data handling approaches in the cloud.\n\n[Wave showcase](https://github.com/seqeralabs/wave-showcase) on GitHub\n\n
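As a rough sketch, enabling Wave and Fusion is largely a matter of configuration. The snippet below is illustrative only (the bucket name is a placeholder); the Wave showcase and documentation cover the settings that apply to a given setup:\n\n```groovy\n// Illustrative settings for enabling Wave and the Fusion file system\nwave.enabled   = true\nfusion.enabled = true\n// With Fusion, tasks can access object storage directly, so the work directory can live in S3\nworkDir        = 's3://my-bucket/work'\n```\n\n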
<h3 id="building-containers-for-scientific-workflows">13. Building Containers for Scientific Workflows</h3>
\n\nWhile not strictly a guide about Nextflow, this article provides an overview of scientific containers and a tutorial on creating your own container and integrating it into a Nextflow pipeline. It also offers some useful tips on troubleshooting containers and publishing them to registries.\n\n[Building Containers for Scientific Workflows](https://seqera.io/blog/building-containers-for-scientific-workflows/)\n\n
<h3 id="best-practices-for-deploying-pipelines-with-nextflow-tower">14. Best Practices for Deploying Pipelines with Nextflow Tower</h3>
\n\nWhen building Nextflow pipelines, a best practice is to supply a nextflow_schema.json file describing pipeline input parameters. The benefit of adding this file to your code repository is that if the pipeline is launched using Nextflow Tower, the schema enables an easy-to-use web interface that guides users through the process of parameter selection. While it is possible to craft this file by hand, the nf-core community provides a handy schema build tool. This step-by-step guide explains how to adapt your pipeline for use with Nextflow Tower by using the schema build tool to automatically generate the nextflow_schema.json file.\n\n[Best Practices for Deploying Pipelines with Nextflow Tower](https://seqera.io/blog/best-practices-for-deploying-pipelines-with-nextflow-tower/)\n\n
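For example, with the nf-core/tools package installed, the schema can typically be generated or updated from the pipeline root using the interactive schema builder (the exact command may vary between nf-core/tools versions):\n\n```bash\n# Launch the interactive schema builder from the root of the pipeline repository\nnf-core schema build\n```\n\n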
<h2 id="cloud-integration-tutorials">Cloud integration tutorials</h2>
\n\nIn addition to the learning resources above, several step-by-step integration guides explain how to run Nextflow pipelines on your cloud platform of choice. Some of these tutorials extend to the use of [Nextflow Tower](https://cloud.tower.nf/). Organizations can use the Tower Cloud Free edition to launch pipelines quickly in the cloud. Organizations can optionally use Tower Cloud Professional or run self-hosted or on-premises Tower Enterprise environments as requirements grow. This year, we added Google Cloud Batch to the cloud services supported by Nextflow.\n\n
<h3 id="nextflow-and-aws-batch-inside-the-integration">1. Nextflow and AWS Batch — Inside the Integration</h3>
\n\nThis three-part series of articles provides a step-by-step guide explaining how to use Nextflow with AWS Batch. The [first of three articles](https://seqera.io/blog/nextflow-and-aws-batch-inside-the-integration-part-1-of-3/) covers AWS Batch concepts, the Nextflow execution model, and explains how the integration works under the covers. The [second article](https://seqera.io/blog/nextflow-and-aws-batch-inside-the-integration-part-2-of-3/) in the series provides a step-by-step guide explaining how to set up the AWS batch environment and how to run and troubleshoot open-source Nextflow pipelines. The [third article](https://seqera.io/blog/nextflow-and-aws-batch-using-tower-part-3-of-3/) builds on what you've learned, explaining how to integrate workflows with Nextflow Tower and share the AWS Batch environment with other users by \"publishing\" your workflows to the cloud.\n\nNextflow and AWS Batch — Inside the Integration ([part 1 of 3](https://seqera.io/blog/nextflow-and-aws-batch-inside-the-integration-part-1-of-3/), [part 2 of 3](https://seqera.io/blog/nextflow-and-aws-batch-inside-the-integration-part-2-of-3/), [part 3 of 3](https://seqera.io/blog/nextflow-and-aws-batch-using-tower-part-3-of-3/))\n\n
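To give a feel for what those articles walk through, the end result is usually a small amount of configuration along these lines. This is only a sketch; the queue, bucket, region, and CLI path are placeholders that depend on your own AWS Batch environment:\n\n```groovy\n// Illustrative AWS Batch setup; names and paths are placeholders\nprocess.executor  = 'awsbatch'\nprocess.queue     = 'my-batch-queue'\nworkDir           = 's3://my-bucket/work'\naws.region        = 'us-east-1'\n// Path to the AWS CLI on the compute AMI, if your setup requires it\naws.batch.cliPath = '/home/ec2-user/miniconda/bin/aws'\n```\n\n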
<h3 id="nextflow-and-azure-batch-inside-the-integration">2. Nextflow and Azure Batch — Inside the Integration</h3>
\n\nSimilar to the tutorial above, this set of articles does a deep dive into the Nextflow Azure Batch integration. [Part 1](https://seqera.io/blog/nextflow-and-azure-batch-part-1-of-2/) covers Azure Batch and essential concepts, provides an overview of the integration, and explains how to set up Azure Batch and Storage accounts. It also covers deploying a machine instance in the Azure cloud and configuring it to run Nextflow pipelines against the Azure Batch service.\n\n[Part 2](https://seqera.io/blog/nextflow-and-azure-batch-working-with-tower-part-2-of-2/) builds on what you learned in part 1 and shows how to use Azure Batch from within Nextflow Tower Cloud. It provides a walkthrough of how to make the environment set up in part 1 accessible to users through Tower's intuitive web interface.\n\nNextflow and Azure Batch — Inside the Integration ([part 1 of 2](https://seqera.io/blog/nextflow-and-azure-batch-part-1-of-2/), [part 2 of 2](https://seqera.io/blog/nextflow-and-azure-batch-working-with-tower-part-2-of-2/))\n\n
<h3 id="get-started-with-nextflow-on-google-cloud-batch">3. Get started with Nextflow on Google Cloud Batch</h3>
\n\nThis excellent article by Marcel Ribeiro-Dantas provides a step-by-step tutorial on using Nextflow with Google’s new Google Cloud Batch service. Google Cloud Batch is expected to replace the Google Life Sciences integration over time. The article explains how to deploy the Google Cloud Batch and Storage environments in GCP using the gcloud CLI. It then goes on to explain how to configure Nextflow to launch pipelines into the newly created Google Cloud Batch environment.\n\n[Get started with Nextflow on Google Cloud Batch](https://nextflow.io/blog/2023/nextflow-with-gbatch.html)\n\n
<h3 id="nextflow-and-k8s-rebooted-running-nextflow-on-amazon-eks">4. Nextflow and K8s Rebooted: Running Nextflow on Amazon EKS</h3>
\n\nWhile not commonly used for HPC workloads, Kubernetes has clear momentum. In this educational article, Ben Sherman provides an overview of how the Nextflow / Kubernetes integration has been simplified by avoiding the requirement for Persistent Volumes (PVs) and Persistent Volume Claims (PVCs). This detailed guide provides step-by-step instructions for using Amazon EKS as a compute environment, complete with instructions for configuring IAM Roles for Service Accounts (IRSA), now an Amazon EKS best practice.\n\n[Nextflow and K8s Rebooted: Running Nextflow on Amazon EKS](https://seqera.io/blog/deploying-nextflow-on-amazon-eks/)\n\n
<h2 id="additional-resources">Additional resources</h2>
\n\nThe following resources will help you dig deeper into Nextflow and related projects like the nf-core community, which maintains curated pipelines and a very active Slack channel. There are plenty of Nextflow tutorials and videos online, and the following list is by no means exhaustive. Please let us know if we are missing anything.\n\n
<h3 id="nextflow-docs">1. Nextflow docs</h3>
\n\nThe reference for the Nextflow language and runtime. These docs should be your first point of reference while developing Nextflow pipelines. The newest features are documented in edge documentation pages released every month, with the latest stable releases every three months.\n\nLatest [stable](https://www.nextflow.io/docs/latest/index.html) & [edge](https://www.nextflow.io/docs/edge/index.html) documentation.\n\n
<h3 id="seqera-labs-docs">2. Seqera Labs docs</h3>
\n\nAn index of documentation, deployment guides, training materials, and resources for all things Nextflow and Tower.\n\n[Seqera Labs docs](https://seqera.io/docs/)\n\n
<h3 id="nf-core">3. nf-core</h3>
\n\nnf-core is a growing community of Nextflow users and developers. You can find curated sets of biomedical analysis pipelines written in Nextflow and built by domain experts. Each pipeline is stringently reviewed and has been implemented according to best practice guidelines. Be sure to sign up for the Slack channel.\n\n[nf-core website](https://nf-co.re/) and [nf-core Slack](https://nf-co.re/join)\n\n
<h3 id="nextflow-tower">4. Nextflow Tower</h3>
\n\nNextflow Tower is a platform to easily monitor, launch, and scale Nextflow pipelines on cloud providers and on-premises infrastructure. The documentation provides details on setting up compute environments, monitoring pipelines, and launching pipelines using either the web graphical interface, the CLI, or the API.\n\n[Nextflow Tower](https://tower.nf/) and [user documentation](http://help.tower.nf/).\n\n
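As a small illustration, once you have a Tower access token, an individual Nextflow run can be monitored in the web interface simply by enabling the integration at launch time (the token value below is a placeholder):\n\n```bash\n# Illustrative: enable Tower monitoring for a run using an access token\nexport TOWER_ACCESS_TOKEN=<your-tower-token>\nnextflow run main.nf -with-tower\n```\n\n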
<h3 id="nextflow-on-aws">5. Nextflow on AWS</h3>
\n\nPart of the Genomics Workflows on AWS, Amazon provides a quickstart for deploying a genomics analysis environment on Amazon Web Services (AWS) cloud, using Nextflow to create and orchestrate analysis workflows and AWS Batch to run the workflow processes. While this article is packed with good information, the procedure outlined in the more recent [Nextflow and AWS Batch – Inside the integration](https://seqera.io/blog/nextflow-and-aws-batch-inside-the-integration-part-1-of-3/) series, may be an easier place to start. Some of the steps that previously needed to be performed manually have been updated in the latest integration.\n\n[Nextflow on AWS Batch](https://docs.opendata.aws/genomics-workflows/orchestration/nextflow/nextflow-overview.html)\n\n
<h3 id="nextflow-data-pipelines-on-azure-batch">6. Nextflow Data Pipelines on Azure Batch</h3>
\n\nNextflow on Azure requires at minimum two Azure services, Azure Batch and Azure Storage. Follow the guide below developed by the team at Microsoft to set up both services on Azure, and to get your storage and batch account names and keys.\n\n[Azure Blog](https://techcommunity.microsoft.com/t5/azure-compute-blog/running-nextflow-data-pipelines-on-azure-batch/ba-p/2150383) and [GitHub repository](https://github.com/microsoft/Genomics-Quickstart/blob/main/03-Nextflow-Azure/README.md).\n\n
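As a rough sketch, the account names and keys gathered from that guide end up in the Nextflow configuration along these lines. All values below are placeholders, not a working setup:\n\n```groovy\n// Illustrative Azure Batch setup; account names, keys, and location are placeholders\nprocess.executor = 'azurebatch'\nworkDir          = 'az://my-container/work'\n\nazure {\n    storage {\n        accountName = 'mystorageaccount'\n        accountKey  = '<storage-account-key>'\n    }\n    batch {\n        location     = 'westeurope'\n        accountName  = 'mybatchaccount'\n        accountKey   = '<batch-account-key>'\n        autoPoolMode = true\n    }\n}\n```\n\n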
<h3 id="running-nextflow-with-google-life-sciences">7. Running Nextflow with Google Life Sciences</h3>
\n\nA step-by-step guide to launching Nextflow pipelines in Google Cloud. Note that this integration process is specific to Google Life Sciences – an offering that pre-dates Google Cloud Batch. If you want to use the newer integration approach, you can also check out the Nextflow blog article [Get started with Nextflow on Google Cloud Batch](https://nextflow.io/blog/2023/nextflow-with-gbatch.html).\n\n[Nextflow on Google Cloud](https://cloud.google.com/life-sciences/docs/tutorials/nextflow)\n\n
<h3 id="bonus-nextflow-tutorial-variant-calling-edition">8. Bonus: Nextflow Tutorial - Variant Calling Edition</h3>
\n\nThis [Nextflow Tutorial - Variant Calling Edition](https://sateeshperi.github.io/nextflow_varcal/nextflow/) has been adapted from the [Nextflow Software Carpentry training material](https://carpentries-incubator.github.io/workflows-nextflow/index.html) and [Data Carpentry: Wrangling Genomics Lesson](https://datacarpentry.org/wrangling-genomics/). Learners will have the chance to learn Nextflow and nf-core basics, to convert a variant-calling bash script into a Nextflow workflow, and modularize the pipeline using DSL2 modules and sub-workflows.\n\nThe workshop can be opened on [Gitpod](https://gitpod.io/#https://github.com/sateeshperi/nextflow_tutorial.git), where you can try the exercises in an online computing environment at your own pace, with the course material in another window alongside.\n\nYou can find the course in [Nextflow Tutorial - Variant Calling Edition](https://sateeshperi.github.io/nextflow_varcal/nextflow/).\n\n
<h2 id="community-and-support">Community and support</h2>
\n\n- [Seqera Community Forum](https://community.seqera.io)\n- Nextflow Twitter [@nextflowio](https://twitter.com/nextflowio?lang=en)\n- [Nextflow Slack](https://www.nextflow.io/slack-invite.html)\n- [nf-core Slack](https://nfcore.slack.com/)\n- [Seqera Labs](https://www.seqera.io/) and [Nextflow Tower](https://tower.nf/)\n- [Nextflow patterns](https://github.com/nextflow-io/patterns)\n- [Nextflow Snippets](https://github.com/mribeirodantas/NextflowSnippets)\n", - "images": [] + "images": [], + "author": "Evan Floden", + "tags": "nextflow, tower" }, { "slug": "2023/nextflow-goes-to-university", @@ -473,7 +597,9 @@ "content": "\nThe Nextflow project originated from within an academic research group, so perhaps it’s no surprise that education is an essential part of the Nextflow and nf-core communities. Over the years, we have established several regular training resources: we have a weekly online seminar series called nf-core/bytesize and run hugely popular bi-annual [Nextflow and nf-core community training online](https://www.youtube.com/@nf-core/playlists?view=50&sort=dd&shelf_id=2). In 2022, Seqera established a new community and growth team, funded in part by a grant from the Chan Zuckerberg Initiative “Essential Open Source Software for Science” grant. We are all former bioinformatics researchers from academia and part of our mission is to build resources and programs to support academic institutions. We want to help to provide leading edge, high-quality, [Nextflow](https://www.nextflow.io/) and [nf-core](https://nf-co.re/) training for Masters and Ph.D. students in Bioinformatics and other related fields.\n\nWe recently held one of our first such projects, a collaboration with the [Bioinformatics Multidisciplinary Environment, BioME](https://bioinfo.imd.ufrn.br/site/en-US) at the [Federal University of Rio Grande do Norte (UFRN)](https://www.ufrn.br/) in Brazil. The UFRN is one of the largest universities in Brazil with over 40,000 enrolled students, hosting one of the best-ranked bioinformatics programs in Brazil, attracting students from all over the country. The BioME department runs courses for Masters and Ph.D. students, including a flexible course dedicated to cutting-edge bioinformatics techniques. As part of this, we were invited to run an 8-day Nextflow and nf-core graduate course. Participants attended 5 days of training seminars and presented a Nextflow project at the end of the course. Upon successful completion of the course, participants received graduate program course credits as well as a Seqera Labs certified certificate recognizing their knowledge and hands-on experience 😎.\n\n\n\nThe course participants included one undergraduate student, Master's students, Ph.D. students, and postdocs with very diverse backgrounds. While some had prior Nextflow and nf-core experience and had already attended Nextflow training, others had never used it. Unsurprisingly, they all chose very different project topics to work on and present to the rest of the group. At the end of the course, eleven students chose to undergo the final project evaluation for the Seqera certification. They all passed with flying colors!\n\n Picture with some of the students that attended the course\n\n## Final projects\n\nFinal hands-on projects are very useful not only to practice new skills but also to have a tangible deliverable at the end of the course. It could be the first step of a long journey with Nextflow, especially if you work on a project that lives on after the course concludes. 
Participants were given complete freedom to design a project that was relevant to them and their interests. Many students were very satisfied with their projects and intend to continue working on them after the course conclusion.\n\n### Euryale 🐍\n\n[João Vitor Cavalcante](https://www.linkedin.com/in/joao-vitor-cavalcante), along with collaborators, had developed and [published](https://www.frontiersin.org/articles/10.3389/fgene.2022.814437/full) a Snakemake pipeline for Sensitive Taxonomic Classification and Flexible Functional Annotation of Metagenomic Shotgun Sequences called MEDUSA. During the course, after seeing the huge potential of Nextflow, he decided to fully translate this pipeline to Nextflow, but with a new name: Euryale. You can check the result [here](https://github.com/dalmolingroup/euryale/) 😍 Why Euryale? In Greek mythology, Euryale was one of the three gorgons, a sister to Medusa 🤓\n\n### Bringing Nanopore to Google Batch ☁️\n\nThe Customer Workflows Group at Oxford Nanopore Technologies (ONT) has adopted Nextflow to develop and distribute general-purpose pipelines for its customers. One of these pipelines, [wf-alignment](https://github.com/epi2me-labs/wf-alignment), takes a FASTQ directory and a reference directory and outputs a minimap2 alignment, along with samtools stats and an HTML report. Both samtools stats and the HTML report generated by this pipeline are well suited for Nextflow Tower’s Reports feature. However, [Danilo Imparato](https://www.linkedin.com/in/daniloimparato) noticed that the pipeline lacked support for using Google Cloud as compute environment and decided to work on this limitation on his [final project](https://github.com/daniloimparato/wf-alignment), which included fixing a few bugs specific to running it on Google Cloud and making the reports available on Nextflow Tower 🤯\n\n### Nextflow applied to Economics! 🤩\n\n[Galileu Nobre](https://www.linkedin.com/in/galileu-nobre-901551187/) is studying Economical Sciences and decided to convert his scripts into a Nextflow pipeline for his [final project](https://github.com/galileunobre/nextflow_projeto_1). The goal of the pipeline is to estimate the demand for health services in Brazil based on data from the 2019 PNS (National Health Survey), (a) treating this database to contain only the variables we will work with, (b) running a descriptive analysis to determine the data distribution in order to investigate which models would be best applicable. In the end, two regression models, Poisson, and the Negative Binomial, are used to estimate the demand. His work is an excellent example of applying Nextflow to fields outside of traditional bioinformatics 😉.\n\n### Whole Exome Sequencing 🧬\n\nFor her [final project](https://github.com/RafaellaFerraz/exome), [Rafaella Ferraz](https://www.linkedin.com/in/rafaella-sousa-ferraz) used nf-core/tools to write a whole-exome sequencing analysis pipeline from scratch. She applied her new skills using nf-core modules and sub-workflows to achieve this and was able to launch and monitor her pipeline using Nextflow Tower. Kudos to Rafaella! 👏🏻\n\n### RNASeq with contamination 🧫\n\nIn her [final project](https://github.com/iaradsouza1/tab-projeto-final), [Iara Souza](https://www.linkedin.com/in/iaradsouza) developed a bioinformatics pipeline that analyzed RNA-Seq data when it's required to have an extra pre-filtering step. 
She needed this for analyzing data from RNA-Seq experiments performed in cell culture, where there is a high probability of contamination of the target transcriptome with the host transcriptome. Iara was able to learn how to use nf-core/tools and benefit from all the \"batteries included\" that come with it 🔋😬\n\n### SARS-CoV-2 Genome assembly and lineage classification 🦠\n\n[Diego Teixeira](https://www.linkedin.com/in/diego-go-tex) has been working with SARS-CoV-2 genome assembly and lineage classification. As his final project, he wrote a [Nextflow pipeline](https://github.com/diegogotex/sarscov2_irma_nf) aggregating all tools and analyses he's been doing, allowing him to be much more efficient in his work and have a reproducible pipeline that can easily be shared with collaborators.\n\nIn the nf-core project, there are almost a [thousand modules](https://nf-co.re/modules) ready to plug in your pipeline, together with [dozens of full-featured pipelines](https://nf-co.re/pipelines). However, in many situations, you'll need a custom pipeline. With that in mind, it's very useful to master the skills of Nextflow scripting so that you can take advantage of everything that is available, both building new pipelines and modifying public ones.\n\n## Exciting experience!\n\nIt was an amazing experience to see what each participant had worked on for their final projects! 🤯 They were all able to master the skills required to write Nextflow pipelines in real-life scenarios, which can continue to be used well after the end of the course. For people just starting their adventure with Nextflow, it can feel overwhelming to use nf-core tools with all the associated best practices, but students surprised me by using nf-core tools from the very beginning and having their project almost perfectly fitting the best practices 🤩\n\nWe’d love to help out with more university bioinformatics courses like this. If you think your institution could benefit from such an experience, please don't hesitate to reach out to us at community@seqera.io. We would love to hear from you!\n", "images": [ "/img/nextflow-university-class-ufrn.jpg" - ] + ], + "author": "Marcel Ribeiro-Dantas", + "tags": "nextflow,nf-core" }, { "slug": "2023/nextflow-summit-2023-recap", @@ -484,14 +610,18 @@ "/img/blog-summit-2023-recap--img1b.jpg", "/img/blog-summit-2023-recap--img2b.jpg", "/img/blog-summit-2023-recap--img3b.jpg" - ] + ], + "author": "Noel Ortiz", + "tags": "nextflow,summit,event,hackathon" }, { "slug": "2023/nextflow-with-gbatch", "title": "Get started with Nextflow on Google Cloud Batch", "date": "2023-02-01T00:00:00.000Z", "content": "\n[We have talked about Google Cloud Batch before](https://www.nextflow.io/blog/2022/deploy-nextflow-pipelines-with-google-cloud-batch.html). Not only that, we were proud to announce Nextflow support to Google Cloud Batch right after it was publicly released, back in July 2022. How amazing is that? But we didn't stop there! The [Nextflow official documentation](https://www.nextflow.io/docs/latest/google.html) also provides a lot of useful information on how to use Google Cloud Batch as the compute environment for your Nextflow pipelines. Having said that, feedback from the community is valuable, and we agreed that in addition to the documentation, teaching by example, and in a more informal language, can help many of our users. 
So, here is a tutorial on how to use the Batch service of the Google Cloud Platform with Nextflow 🥳\n\n### Running an RNAseq pipeline with Google Cloud Batch\n\nWelcome to our RNAseq tutorial using Nextflow and Google Cloud Batch! RNAseq is a powerful technique for studying gene expression and is widely used in a variety of fields, including genomics, transcriptomics, and epigenomics. In this tutorial, we will show you how to use Nextflow, a popular workflow management tool, to run a proof-of-concept RNAseq pipeline to perform the analysis on Google Cloud Batch, a scalable cloud-based computing platform. For a real Nextflow RNAseq pipeline, check [nf-core/rnaseq](https://github.com/nf-core/rnaseq). For the proof-of-concept RNAseq pipeline that we will use here, check [nextflow-io/rnaseq-nf](https://github.com/nextflow-io/rnaseq-nf).\n\nNextflow allows you to easily develop, execute, and scale complex pipelines on any infrastructure, including the cloud. Google Cloud Batch enables you to run batch workloads on Google Cloud Platform (GCP), with the ability to scale up or down as needed. Together, Nextflow and Google Cloud Batch provide a powerful and flexible solution for RNAseq analysis.\n\nWe will walk you through the entire process, from setting up your Google Cloud account and installing Nextflow to running an RNAseq pipeline and interpreting the results. By the end of this tutorial, you will have a solid understanding of how to use Nextflow and Google Cloud Batch for RNAseq analysis. So let's get started!\n\n### Setting up Google Cloud CLI (gcloud)\n\nIn this tutorial, you will learn how to use the gcloud command-line interface to interact with the Google Cloud Platform and set up your Google Cloud account for use with Nextflow. If you do not already have gcloud installed, you can follow the instructions [here](https://cloud.google.com/sdk/docs/install) to install it. Once you have gcloud installed, run the command `gcloud init` to initialize the CLI. You will be prompted to choose an existing project to work on or create a new one. For the purpose of this tutorial, we will create a new project. Name your project \"my-rnaseq-pipeline\". There may be a lot of information displayed on the screen after running this command, but you can ignore it for now.\n\n### Setting up Batch and Storage in Google Cloud Platform\n\n#### Enable Google Batch\n\nAccording to the [official Google documentation](https://cloud.google.com/batch/docs/get-started) _Batch is a fully managed service that lets you schedule, queue, and execute [batch processing](https://en.wikipedia.org/wiki/Batch_processing) workloads on Compute Engine virtual machine (VM) instances. Batch provisions resources and manages capacity on your behalf, allowing your batch workloads to run at scale_.\n\nThe first step is to download the `beta` command group. You can do this by executing:\n\n```bash\n$ gcloud components install beta\n```\n\nThen, enable billing for this project. You will first need to get your account id with\n\n```bash\n$ gcloud beta billing accounts list\n```\n\nAfter that, you will see something like the following appear in your window:\n\n```console\nACCOUNT_ID NAME OPEN MASTER_ACCOUNT_ID\nXXXXX-YYYYYY-ZZZZZZ My Billing Account True\n```\n\nIf you get the error “Service Usage API has not been used in project 842841895214 before or it is disabled”, simply run the command again and it should work. Then copy the account id, and the project id and paste them into the command below. 
This will enable billing for your project id.\n\n```bash\n$ gcloud beta billing projects link PROJECT-ID --billing-account XXXXXX-YYYYYY-ZZZZZZ\n```\n\nNext, you must enable the Batch API, along with the Compute Engine and Cloud Logging APIs. You can do so with the following command:\n\n```bash\n$ gcloud services enable batch.googleapis.com compute.googleapis.com logging.googleapis.com\n```\n\nYou should see a message similar to the one below:\n\n```console\nOperation \"operations/acf.p2-AAAA-BBBBB-CCCC--DDDD\" finished successfully.\n```\n\n#### Create a Service Account\n\nIn order to access the APIs we enabled, you need to [create a Service Account](https://cloud.google.com/iam/docs/creating-managing-service-accounts#iam-service-accounts-create-gcloud) and set the necessary IAM roles for the project. You can create the Service Account by executing:\n\n```bash\n$ gcloud iam service-accounts create rnaseq-pipeline-sa\n```\n\nAfter this, set appropriate roles for the project using the commands below:\n\n```bash\n$ gcloud projects add-iam-policy-binding my-rnaseq-pipeline \\\n--member=\"serviceAccount:rnaseq-pipeline-sa@my-rnaseq-pipeline.iam.gserviceaccount.com\" \\\n--role=\"roles/iam.serviceAccountUser\"\n\n$ gcloud projects add-iam-policy-binding my-rnaseq-pipeline \\\n--member=\"serviceAccount:rnaseq-pipeline-sa@my-rnaseq-pipeline.iam.gserviceaccount.com\" \\\n--role=\"roles/batch.jobsEditor\"\n\n$ gcloud projects add-iam-policy-binding my-rnaseq-pipeline \\\n--member=\"serviceAccount:rnaseq-pipeline-sa@my-rnaseq-pipeline.iam.gserviceaccount.com\" \\\n--role=\"roles/logging.viewer\"\n\n$ gcloud projects add-iam-policy-binding my-rnaseq-pipeline \\\n--member=\"serviceAccount:rnaseq-pipeline-sa@my-rnaseq-pipeline.iam.gserviceaccount.com\" \\\n--role=\"roles/storage.admin\"\n```\n\n#### Create your Bucket\n\nNow it's time to create your Storage bucket, where your input, intermediate, and output files will be hosted and accessed by the Google Batch virtual machines. Your bucket name must be globally unique (across regions). For the example below, the bucket is named rnaseq-pipeline-bckt. However, as this name has now been taken, you will have to create a bucket with a different name:\n\n```bash\n$ gcloud storage buckets create gs://rnaseq-pipeline-bckt\n```\n\nNow it's time for Nextflow to join the party! 🥳\n\n### Setting up Nextflow to make use of Batch and Storage\n\n#### Write the configuration file\n\nHere you will set up a simple RNAseq pipeline with Nextflow to be run entirely on Google Cloud Platform (GCP) directly from your local machine.\n\nStart by creating a folder for your project on your local machine, such as “rnaseq-example”. It's important to mention that you can also go fully cloud and use a Virtual Machine for everything we will do here locally.\n\nInside the folder that you created for the project, create a file named `nextflow.config` with the following content (remember to replace PROJECT-ID with the project id you created above):\n\n```groovy\nworkDir = 'gs://rnaseq-pipeline-bckt/scratch'\n\nprocess {\n executor = 'google-batch'\n container = 'nextflow/rnaseq-nf'\n errorStrategy = { task.exitStatus==14 ? 'retry' : 'terminate' }\n maxRetries = 5\n}\n\ngoogle {\n project = 'PROJECT-ID'\n location = 'us-central1'\n batch.spot = true\n}\n```\n\nThe `workDir` option tells Nextflow to use the bucket you created as the work directory. Nextflow will use this directory to stage our input data and store intermediate and final data. 
Nextflow does not allow you to use the root directory of a bucket as the work directory -- it must be a subdirectory instead. Using a subdirectory is also just a good practice.\n\nThe `process` scope tells Nextflow to run all the processes (steps) of your pipeline on Google Batch and to use the `nextflow/rnaseq-nf` Docker image hosted on DockerHub (default) for all processes. Also, the error strategy will automatically retry any failed tasks with exit code 14, which is the exit code for spot instances that were reclaimed.\n\nThe `google` scope is specific to Google Cloud. You need to provide the project id (don't provide the project name, it won't work!), and a Google Cloud location (leave it as above if you're not sure what to put). In the example above, spot instances are also requested (more info about spot instances [here](https://www.nextflow.io/docs/latest/google.html#spot-instances)), which are cheaper instances that, as a drawback, can be reclaimed at any time if resources are needed by the cloud provider. Based on what we have seen so far, the `nextflow.config` file should contain \"my-rnaseq-pipeline\" as the project id.\n\nUse the command below to authenticate with Google Cloud Platform. Nextflow will use this account by default when you run a pipeline.\n\n```bash\n$ gcloud auth application-default login\n```\n\n#### Launch the pipeline!\n\nWith that done, you’re now ready to run the proof-of-concept RNAseq Nextflow pipeline. Instead of asking you to download it, or copy-paste something into a script file, you can simply provide the GitHub URL of the RNAseq pipeline mentioned at the beginning of [this tutorial](https://github.com/nextflow-io/rnaseq-nf), and Nextflow will do all the heavy lifting for you. This pipeline comes with test data bundled with it, and for more information about it and how it was developed, you can check the public training material developed by Seqera Labs.\n\nOne important thing to mention is that in this repository there is already a `nextflow.config` file with a different configuration, but don't worry about that. You can run the pipeline with the configuration file that we wrote above using the `-c` Nextflow parameter. Run the command line below:\n\n```bash\n$ nextflow run nextflow-io/rnaseq-nf -c nextflow.config\n```\n\nWhile the pipeline stores everything in the bucket, our example pipeline will also download the final outputs to a local directory called `results`, because of how the `publishDir` directive was specified in the `main.nf` script (example [here](https://github.com/nextflow-io/rnaseq-nf/blob/ed179ef74df8d5c14c188e200a37fff61fd55dfb/modules/multiqc/main.nf#L5)). If you want to avoid the egress cost associated with downloading data from a bucket, you can change the `publishDir` to another bucket directory, e.g. `gs://rnaseq-pipeline-bckt/results`.\n\nIn your terminal, you should see something like this:\n\n![Nextflow ongoing run on Google Cloud Batch](/img/ongoing-nxf-gbatch.png)\n\nYou can check the status of your jobs on Google Batch by opening another terminal and running the following command:\n\n```bash\n$ gcloud batch jobs list\n```\n\nBy the end of it, if everything worked well, you should see something like:\n\n![Nextflow run on Google Cloud Batch finished](/img/nxf-gbatch-finished.png)\n\nAnd that's all, folks! 
😆\n\nYou will find more information about Nextflow on Google Batch in [this blog post](https://www.nextflow.io/blog/2022/deploy-nextflow-pipelines-with-google-cloud-batch.html) and the [official Nextflow documentation](https://www.nextflow.io/docs/latest/google.html).\n\nSpecial thanks to Hatem Nawar, Chris Hakkaart, and Ben Sherman for providing valuable feedback to this document.\n", - "images": [] + "images": [], + "author": "Marcel Ribeiro-Dantas", + "tags": "nextflow,google,cloud" }, { "slug": "2023/reflecting-on-ten-years-of-nextflow-awesomeness", @@ -500,28 +630,36 @@ "content": "\nThere's been a lot of water under the bridge since the first release of Nextflow in July 2013. From its humble beginnings at the [Centre for Genomic Regulation](https://www.crg.eu/) (CRG) in Barcelona, Nextflow has evolved from an upstart workflow orchestrator to one of the most consequential projects in open science software (OSS). Today, Nextflow is downloaded **120,000+** times monthly, boasts vibrant user and developer communities, and is used by leading pharmaceutical, healthcare, and biotech research firms.\n\nOn the occasion of Nextflow's anniversary, I thought it would be fun to share some perspectives and point out how far we've come as a community. I also wanted to recognize the efforts of Paolo Di Tommaso and the many people who have contributed enormous time and effort to make Nextflow what it is today.\n\n## A decade of innovation\n\nBill Gates is credited with observing that \"people often overestimate what they can do in one year, but underestimate what they can do in ten.\" The lesson, of course, is that real, meaningful change takes time. Progress is measured in a series of steps. Considered in isolation, each new feature added to Nextflow seems small, but they combine to deliver powerful capabilities.\n\nLife sciences has seen a staggering amount of innovation. According to estimates from the National Human Genome Research Institute (NHGRI), the cost of sequencing a human genome in 2013 was roughly USD 10,000. Today, sequencing costs are in the range of USD 200—a **50-fold reduction**.1\n\nA fundamental principle of economics is that _\"if you make something cheaper, you get more of it.\"_ One didn't need a crystal ball to see that, driven by plummeting sequencing and computing costs, the need for downstream analysis was poised to explode. With advances in sequencing technology outpacing Moore's Law, It was clear that scaling analysis capacity would be a significant issue.2\n\n## Getting the fundamentals right\n\nWhen Paolo and his colleagues started the Nextflow project, it was clear that emerging technologies such as cloud computing, containers, and collaborative software development would be important. Even so, it is still amazing how rapidly these key technologies have advanced in ten short years.\n\nIn an [article for eLife magazine in 2021](https://elifesciences.org/labs/d193babe/the-story-of-nextflow-building-a-modern-pipeline-orchestrator), Paolo described how Solomon Hyke's talk \"[Why we built Docker](https://www.youtube.com/watch?v=3N3n9FzebAA)\" at DotScale in the summer of 2013 impacted his thinking about the design of Nextflow. It was evident that containers would be a game changer for scientific workflows. Encapsulating application logic in self-contained, portable containers solved a multitude of complexity and dependency management challenges — problems experienced daily at the CRG and by many bioinformaticians to this day. 
Nextflow was developed concurrent with the container revolution, and Nextflow’s authors had the foresight to make containers first-class citizens.\n\nWith containers, HPC environments have been transformed — from complex environments where application binaries were typically served to compute nodes via NFS to simpler architectures where task-specific containers are pulled from registries on demand. Today, most bioinformatic pipelines use containers. Nextflow supports [multiple container formats](https://www.nextflow.io/docs/latest/container.html?highlight=containers) and runtimes, including [Docker](https://www.docker.com/), [Singularity](https://sylabs.io/), [Podman](https://podman.io/), [Charliecloud](https://hpc.github.io/charliecloud/), [Sarus](https://sarus.readthedocs.io/en/stable/), and [Shifter](https://github.com/NERSC/shifter).\n\n## The shift to the cloud\n\nSome of the earliest efforts around Nextflow centered on building high-quality executors for HPC workload managers. A key idea behind schedulers such as LSF, PBS, Slurm, and Grid Engine was to share a fixed pool of on-premises resources among multiple users, maximizing throughput, efficiency, and resource utilization.\n\nSee the article [Nextflow on BIG IRON: Twelve tips for improving the effectiveness of pipelines on HPC clusters](https://nextflow.io/blog/2023/best-practices-deploying-pipelines-with-hpc-workload-managers.html)\n\nWhile cloud infrastructure was initially \"clunky\" and hard to deploy and use, the idea of instant access and pay-per-use models was too compelling to ignore. In the early days, many organizations attempted to replicate on-premises HPC clusters in the cloud, deploying the same software stacks and management tools used locally to cloud-based VMs.\n\nWith the launch of [AWS Batch](https://aws.amazon.com/batch/) in December 2016, Nextflow’s developers realized there was a better way. In cloud environments, resources are (in theory) infinite and just an API call away. The traditional scheduling paradigm of sharing a finite resource pool didn't make sense in the cloud, where users could dynamically provision a private, scalable resource pool for only the duration of their workload. All the complex scheduling and control policies that tended to make HPC workload managers hard to use and manage were no longer required.3\n\nAWS Batch also relied on containerization, so it only made sense that AWS Batch was the first cloud-native integration to the Nextflow platform early in 2017, along with native support for S3 storage buckets. Nextflow has since been enhanced to support other batch services, including [Azure Batch](https://azure.microsoft.com/en-us/products/batch) and [Google Cloud Batch](https://cloud.google.com/batch), along with a rich set of managed cloud storage solutions. Nextflow’s authors have also embraced [Kubernetes](https://kubernetes.io/docs/concepts/overview/), developed by Google, yet another way to marshal and manage containerized application environments across public and private clouds.\n\n## SCMs come of age\n\nA major trend shaping software development has been the use of collaborative source code managers (SCMs) based on Git. When Paolo was thinking about the design of Nextflow, GitHub had already been around for several years, and DevOps techniques were revolutionizing software. These advances turned out to be highly relevant to managing pipelines. Ten years ago, most bioinformaticians stored copies of pipeline scripts locally. 
Nextflow’s authors recognized what now seems obvious — it would be easier to make Nextflow SCM aware and launch pipelines directly from a code repository. Today, this simple idea has become standard practice. Most users run pipelines directly from GitHub, GitLab, Gitea, or other favorite SCMs.\n\n## Modularization on steroids\n\nA few basic concepts and patterns in computer science appear repeatedly in different contexts. These include iteration, indirection, abstraction, and component reuse/modularization. Enabled by containers, we have seen a significant shift towards modularization in bioinformatics pipelines enabled by catalogs of reusable containers. In addition to general-purpose registries such as [Docker Hub](https://hub.docker.com/) and [Quay.io](https://quay.io/), domain-specific efforts such as [biocontainers](https://biocontainers.pro/) have emerged, aimed at curating purpose-built containers to meet the specialized needs of bioinformaticians.\n\nWe have also seen the emergence of platform and language-independent package managers such as [Conda](https://docs.conda.io/en/latest/). Today, almost **10,000** Conda recipes for various bioinformatics tools are freely available from [Bioconda](https://anaconda.org/bioconda/repo). Gone are the days of manually installing software. In addition to pulling pre-built bioinformatics containers from registries, developers can leverage [packages of bioconda](http://bioconda.github.io/conda-package_index.html) recipes directly from the bioconda channel.\n\nThe Nextflow community has helped lead this trend toward modularization in several areas. For example, in 2022, Seqera Labs introduced [Wave](https://seqera.io/wave/). This new service can dynamically build and serve containers on the fly based on bioconda recipes, enabling the two technologies to work together seamlessly and avoiding building and maintaining containers by hand.\n\nWith [nf-core](https://nf-co.re/), the Nextflow community has extended the concept of modularization and reuse one step further. Much as bioconda and containers have made bioinformatics software modular and portable, [nf-core modules](https://nf-co.re/modules) extend these concepts to pipelines. Today, there are **900+** nf-core modules — essentially building blocks with pre-defined inputs and outputs based on Nextflow's elegant dataflow model. Rather than creating pipelines from scratch, developers can now wire together these pre-assembled modules to deliver new functionality rapidly or use any of **80** of the pre-built [nf-core analysis pipelines](https://nf-co.re/pipelines). The result is a dramatic reduction in development and maintenance costs.\n\n## Some key Nextflow milestones\n\nSince the [first Nextflow release](https://github.com/nextflow-io/nextflow/releases/tag/v0.3.0) in July 2013, there have been **237 releases** and **5,800 commits**. Also, the project has been forked over **530** times. There have been too many important enhancements and milestones to capture here. We capture some important developments in the timeline below:\n\n\"Nextflow\n\nAs we look to the future, the pace of innovation continues to increase. It’s been exciting to see Nextflow expand beyond the various _omics_ disciplines to new areas such as medical imaging, data science, and machine learning. We continue to evolve Nextflow, adding new features and capabilities to support these emerging use cases and support new compute and storage environments. 
I can hardly wait to see what the next ten years will bring.\n\nFor those new to Nextflow and wishing to learn more about the project, we have compiled an excellent collection of resources to help you [Learn Nextflow in 2023](https://nextflow.io/blog/2023/learn-nextflow-in-2023.html).\n\n---\n\n1 [https://www.genome.gov/about-genomics/fact-sheets/Sequencing-Human-Genome-cost](https://www.genome.gov/about-genomics/fact-sheets/Sequencing-Human-Genome-cost)\n2 Coined by Gordon Moore of Intel in 1965, Moore’s Law predicted that transistor density, roughly equating to compute performance, would roughly double every two years. This was later revised in some estimates to 18 months. Over ten years, Moore’s law predicts roughly a 2^5 = 32X increase in performance – less than the ~50-fold decrease in sequencing costs. See [chart here](https://www.genome.gov/sites/default/files/inline-images/2021_Sequencing_cost_per_Human_Genome.jpg).\n3 This included features like separate queues, pre-emption policies, application profiles, and weighted fairshare algorithms.\n", "images": [ "/img/nextflow_ten_years_graphic.jpg" - ] + ], + "author": "Noel Ortiz", + "tags": "nextflow" }, { "slug": "2023/selecting-the-right-storage-architecture-for-your-nextflow-pipelines", "title": "Selecting the right storage architecture for your Nextflow pipelines", "date": "2023-05-04T00:00:00.000Z", "content": "\n_In this article we present the various storage solutions supported by Nextflow including on-prem and cloud file systems, parallel file systems, and cloud object stores. We also discuss Fusion file system 2.0, a new high-performance file system that can help simplify configuration, improve throughput, and reduce costs in the cloud._\n\nAt one time, selecting a file system for distributed workloads was straightforward. Through the 1990s, the Network File System (NFS), developed by Sun Microsystems in 1984, was pretty much the only game in town. It was part of every UNIX distribution, and it presented a standard [POSIX interface](https://pubs.opengroup.org/onlinepubs/9699919799/nframe.html), meaning that applications could read and write data without modification. Dedicated NFS servers and NAS filers became the norm in most clustered computing environments.\n\nFor organizations that outgrew the capabilities of NFS, other POSIX file systems emerged. These included parallel file systems such as [Lustre](https://www.lustre.org/), [PVFS](https://www.anl.gov/mcs/pvfs-parallel-virtual-file-system), [OpenZFS](https://openzfs.org/wiki/Main_Page), [BeeGFS](https://www.beegfs.io/c/), and [IBM Spectrum Scale](https://www.ibm.com/products/storage-scale-system) (formerly GPFS). Parallel file systems can support thousands of compute clients and deliver more than a TB/sec combined throughput, however, they are expensive, and can be complex to deploy and manage. While some parallel file systems work with standard Ethernet, most rely on specialized low-latency fabrics such as Intel® Omni-Path Architecture (OPA) or InfiniBand. Because of this, these file systems are typically found in only the largest HPC data centers.\n\n## Cloud changes everything\n\nWith the launch of [Amazon S3](https://aws.amazon.com/s3/) in 2006, new choices began to emerge. Rather than being a traditional file system, S3 is an object store accessible through a web API. S3 abandoned traditional ideas around hierarchical file systems. 
Instead, it presented a simple programmatic interface and CLI for storing and retrieving binary objects.\n\nObject stores are a good fit for cloud services because they are simple and scalable to multiple petabytes of storage. Rather than relying on central metadata that presents a bottleneck, metadata is stored with each object. All operations are atomic, so there is no need for complex POSIX-style file-locking mechanisms that add complexity to the design. Developers interact with object stores using simple calls like [PutObject](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObject.html) (store an object in a bucket in return for a key) and [GetObject](https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObject.html) (retrieve a binary object, given a key).\n\nThis simple approach was ideal for internet-scale applications. It was also much less expensive than traditional file systems. As a result, S3 usage grew rapidly. Similar object stores quickly emerged, including Microsoft [Azure Blob Storage](https://azure.microsoft.com/en-ca/products/storage/blobs/), [Open Stack Swift](https://wiki.openstack.org/wiki/Swift), and [Google Cloud Storage](https://cloud.google.com/storage/), released in 2010.\n\n## Cloud object stores vs. shared file systems\n\nObject stores are attractive because they are reliable, scalable, and cost-effective. They are frequently used to store large amounts of data that are accessed infrequently. Examples include archives, images, raw video footage, or in the case of bioinformatics applications, libraries of biological samples or reference genomes. Object stores provide near-continuous availability by spreading data replicas across cloud availability zones (AZs). AWS claims theoretical data availability of up to 99.999999999% (11 9's) – a level of availability so high that it does not even register on most [downtime calculators](https://availability.sre.xyz/)!\n\nBecause they support both near-line and cold storage, object stores are sometimes referred to as \"cheap and deep.\" Based on current [S3 pricing](https://aws.amazon.com/s3/pricing), the going rate for data storage is USD 0.023 per GB for the first 50 TB of data. Users can \"pay as they go\" — spinning up S3 storage buckets and storing arbitrary amounts of data for as long as they choose. Some high-level differences between object stores and traditional file systems are summarized below.\n\n
| | Cloud object stores | Traditional file systems |\n| --- | --- | --- |\n| Interface / access protocol | HTTP-based API | POSIX interface |\n| Cost | $ | $$$ |\n| Scalability / capacity | Practically unlimited | Limited |\n| Reliability / availability | Extremely high | Varies |\n| Performance | Typically lower | Varies |\n| Support for existing applications | NO | YES |
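\n\nAs a concrete illustration of the object-storage access pattern described above (store an object under a key, then retrieve it by that key), a minimal example using the AWS CLI with placeholder bucket and key names might look like this:\n\n```bash\n# Store a local file as an object under a key, then retrieve it (names are placeholders)\naws s3api put-object --bucket my-bucket --key data/sample.txt --body sample.txt\naws s3api get-object --bucket my-bucket --key data/sample.txt sample-copy.txt\n```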
\n\nThe downside of object storage is that the vast majority of applications are written to work with POSIX file systems. As a result, applications seldom interact directly with object stores. A common practice is to copy data from an object store, perform calculations locally on a cluster node, and write results back to the object store for long-term storage.\n\n## Data handling in Nextflow\n\nUnlike older pipeline orchestrators, Nextflow was built with cloud object stores in mind. Depending on the cloud where pipelines run, Nextflow manages cloud credentials and allows users to provide a path to shared data. This can be a shared file system such as `/my-shared-filesystem/data` or a cloud object store e.g. `s3://my-bucket/data/`.\n\n**Nextflow is exceptionally versatile when it comes to data handling, and can support almost any file system or object store.** Internally, Nextflow uses [executors](https://nextflow.io/docs/latest/executor.html) implemented as plug-ins to insulate pipeline code from underlying compute and storage environments. This enables pipelines to run without modification across multiple clouds regardless of the underlying storage technology.\n\nSuppose an S3 bucket is specified as a location for shared data during pipeline execution. In that case, aided by the [nf-amazon](https://github.com/nextflow-io/nextflow/tree/master/plugins/nf-amazon) plug-in, Nextflow transparently copies data from the S3 bucket to a file system on a cloud instance. Containerized applications mount the local file system and read and write data directly. Once processing is complete, Nextflow copies data to the shared bucket to be available for the next task. All of this is completely transparent to the pipeline and applications. The same plug-in-based approach is used for other cloud object stores such as Azure BLOBs and Google Cloud Storage.\n\n## The Nextflow scratch directive\n\nThe idea of staging data from shared repositories to a local disk, as described above, is not new. A common practice with HPC clusters when using NFS file systems is to use local \"scratch\" storage.\n\nA common problem with shared NFS file systems is that they can be relatively slow — especially when there are multiple clients. File systems introduce latency, have limited IO capacity, and are prone to problems such as “hot spots” and bandwidth limitations when multiple clients read and write files in the same directory.\n\nTo avoid bottlenecks, data is often copied from an NFS filer to local scratch storage for processing. Depending on data volumes, users often use fast solid-state drives or [RAM disks](https://www.mvps.net/docs/how-to-mount-the-physical-memory-from-a-linux-system-as-a-partition/) for scratch storage to accelerate processing.\n\nNextflow automates this data handling pattern with built-in support for a [scratch](https://nextflow.io/docs/latest/process.html?highlight=scratch#scratch) directive that can be enabled or disabled per process. If scratch is enabled, data is automatically copied to a designated local scratch device prior to processing.\n\nWhen high-performance file systems such as Lustre or Spectrum Scale are available, the question of whether to use scratch storage becomes more complicated. Depending on the file system and interconnect, parallel file systems performance can sometimes exceed that of local disk. 
In these cases, customers may set scratch to false and perform I/O directly on the parallel file system.\n\nResults will vary depending on the performance of the shared file system, the speed of local scratch storage, and the amount of shared data to be shuttled back and forth. Users will want to experiment to determine whether enabling scratch benefits pipelines performance.\n\n## Multiple storage options for Nextflow users\n\nStorage solutions used with Nextflow can be grouped into five categories as described below:\n\n- Traditional file systems\n- Cloud object stores\n- Cloud file systems\n- High-performance cloud file systems\n- Fusion file system v2.0\n\nThe optimal choice will depend on your environment and the nature of your applications and compute environments.\n\n**Traditional file systems** — These are file systems typically deployed on-premises that present a POSIX interface. NFS is the most popular choice, but some users may use high-performance parallel file systems. Storage vendors often package their offerings as appliances, making them easier to deploy and manage. Solutions common in on-prem HPC environments include [Network Appliance](https://www.netapp.com/), [Data Direct Networks](https://www.ddn.com/) (DDN), [HPE Cray ClusterStor](https://www.hpe.com/psnow/doc/a00062172enw), and [IBM Storage Scale](https://www.ibm.com/products/storage-scale-system). While customers can deploy self-managed NFS or parallel file systems in the cloud, most don’t bother with this in practice. There are generally better solutions available in the cloud.\n\n**Cloud object stores** — In the cloud, object stores tend to be the most popular solution among Nextflow users. Although object stores don’t present a POSIX interface, they are inexpensive, easy to configure, and scale practically without limit. Depending on performance, access, and retention requirements, customers can purchase different object storage tiers at different price points. Popular cloud object stores include Amazon S3, Azure BLOBs, and Google Cloud Storage. As pipelines execute, the Nextflow executors described above manage data transfers to and from cloud object storage automatically. One drawback is that because of the need to copy data to and from the object store for every process, performance may be lower than a fast shared file system.\n\n**Cloud file systems** — Often, it is desirable to have a shared file NFS system. However, these environments can be tedious to deploy and manage in the cloud. Recognizing this, most cloud providers offer cloud file systems that combine some of the best properties of traditional file systems and object stores. These file systems present a POSIX interface and are accessible via SMB and NFS file-sharing protocols. Like object stores, they are easy to deploy and scalable on demand. Examples include [Amazon EFS](https://aws.amazon.com/efs/), [Azure Files](https://azure.microsoft.com/en-us/products/storage/files/), and [Google Cloud Filestore](https://cloud.google.com/filestore). These file systems are described as \"serverless\" and \"elastic\" because there are no servers to manage, and capacity scales automatically.\n\nComparing price and performance can be tricky because cloud file systems are highly configurable. 
For example, [Amazon EFS](https://aws.amazon.com/efs/pricing/) is available in [four storage classes](https://docs.aws.amazon.com/efs/latest/ug/storage-classes.html) – Amazon EFS Standard, Amazon EFS Standard-IA, and two One Zone storage classes – Amazon EFS One Zone and Amazon EFS One Zone-IA. Similarly, Azure Files is configurable with [four different redundancy options](https://azure.microsoft.com/en-us/pricing/details/storage/files/), and different billing models apply depending on the offer selected. To provide a comparison, Amazon EFS Standard costs $0.08 /GB-Mo in the US East region, which is ~4x more expensive than Amazon S3.\n\nFrom the perspective of Nextflow users, using Amazon EFS and similar cloud file systems is the same as using a local NFS system. Nextflow users must ensure that their cloud instances mount the NFS share, so there is slightly more management overhead than using an S3 bucket. Nextflow users and administrators can experiment with the scratch directive governing whether Nextflow stages data in a local scratch area or reads and writes directly to the shared file system.\n\nCloud file systems suffer from some of the same limitations as on-prem NFS file systems. They often don’t scale efficiently, and performance is limited by network bandwidth. Also, depending on the pipeline, users may need to stage data to the shared file system in advance, often by copying data from an object store used for long term storage.\n\nFor [Nextflow Tower](https://cloud.tower.nf/) users, there is a convenient integration with Amazon EFS. Tower Cloud users can have an Amazon EFS instance created for them automatically via Tower Forge, or they can leverage an existing EFS instance in their compute environment. In either case, Tower ensures that the EFS share is available to compute hosts in the AWS Batch environment, reducing configuration requirements.\n\n**Cloud high-performance file systems** — For customers that need high levels of performance in the cloud, Amazon offers Amazon FSx. Amazon FSx comes in different flavors, including NetApp ONTAP, OpenZFS, Windows File Server, and Lustre. In HPC circles, [FSx for Lustre](https://aws.amazon.com/fsx/lustre/) is most popular delivering sub-millisecond latency, up to 1 TB/sec maximum throughput per file system, and millions of IOPs. Some Nextflow users with data bottlenecks use FSx for Lustre, but it is more difficult to configure and manage than Amazon S3.\n\nLike Amazon EFS, FSx for Lustre is a fully-managed, serverless, elastic file system. Amazon FSx for Lustre is configurable, depending on customer requirements. For example, customers with latency-sensitive applications can deploy FSx cluster nodes with SSD drives. Customers concerned with cost and throughput can select standard hard drives (HDD). HDD-based FSx for Lustre clusters can be optionally configured with an SSD-based cache to accelerate performance. Customers also choose between different persistent file system options and a scratch file system option. Another factor to remember is that with parallel file systems, bandwidth scales with capacity. If you deploy a Lustre file system that is too small, you may be disappointed in the performance.\n\nFSx for Lustre persistent file systems ranges from 125 to 1,000 MB/s/TiB at [prices](https://aws.amazon.com/fsx/lustre/pricing/) ranging from **$0.145** to **$0.600** per GB month. Amazon also offers a lower-cost scratch FSx for Lustre file systems (not to be confused with the scratch directive in Nextflow). 
At this tier, FSx for Lustre does not replicate data across availability zones, so it is suited to short-term data storage. Scratch FSx for Lustre storage delivers **200 MB/s/TiB**, costing **$0.140** per GB month. This is **~75%** more expensive than Amazon EFS (Standard) and **~6x** the cost of standard S3 storage. Persistent FSx for Lustre file systems configured to deliver **1,000 MB/s/TiB** can be up to **~26x** the price of standard S3 object storage!\n\n**Hybrid Cloud file systems** — In addition to the solutions described above, there are other solutions that combine the best of object stores and high-performance parallel file systems. An example is [WekaFS™](https://www.weka.io/) from WEKA. WekaFS is used by several Nextflow users and is deployable on-premises or across your choice cloud platforms. WekaFS is attractive because it provides multi-protocol access to the same data (POSIX, S3, NFS, SMB) while presenting a common namespace between on-prem and cloud resident compute environments. Weka delivers the performance benefits of a high-performance parallel file system and optionally uses cloud object storage as a backing store for file system data to help reduce costs.\n\nFrom a Nextflow perspective, WekaFS behaves like any other shared file system. As such, Nextflow and Tower have no specific integration with WEKA. Nextflow users will need to deploy and manage WekaFS themselves making the environment more complex to setup and manage. However, the flexibility and performance provided by a hybrid cloud file system makes this worthwhile for many organizations.\n\n**Fusion file system 2.0** — Fusion file system is a solution developed by [Seqera Labs](https://seqera.io/fusion) that aims to bridge the gap between cloud-native storage and data analysis workflows. The solution implements a thin client that allows pipeline jobs to access object storage using a standard POSIX interface, thus simplifying and speeding up most operations.\n\nThe advantage of the Fusion file system is that there is no need to copy data between S3 and local storage. The Fusion file system driver accesses and manipulates files in Amazon S3 directly. You can learn more about the Fusion file system and how it works in the whitepaper [Breakthrough performance and cost-efficiency with the new Fusion file system](https://seqera.io/whitepapers/breakthrough-performance-and-cost-efficiency-with-the-new-fusion-file-system/).\n\nFor sites struggling with performance and scalability issues on shared file systems or object storage, the Fusion file system offers several advantages. [Benchmarks conducted](https://seqera.io/whitepapers/breakthrough-performance-and-cost-efficiency-with-the-new-fusion-file-system/) by Seqera Labs have shown that, in some cases, **Fusion can deliver performance on par with Lustre but at a much lower cost.** Fusion is also significantly easier to configure and manage and can result in lower costs for both compute and storage resources.\n\n## Comparing the alternatives\n\nA summary of storage options is presented in the table below:\n\n
\n Traditional file systems\n \n Cloud object storage\n \n Cloud file systems\n \n Fusion FS\n
\n NFS, Lustre, Spectrum Scale\n \n Amazon S3\n \n Azure BLOB storage\n \n Google Cloud Storage\n \n Amazon EFS\n \n Amazon FSx for Lustre\n \n Azure Files\n \n Fusion file system 2.0\n
\n Deployment model\n \n Manual\n \n Serverless\n \n Serverless\n \n Serverless\n \n Serverless\n \n Serverless\n \n Serverless\n \n Serverless\n
\n Access model\n \n POSIX\n \n Object\n \n Object\n \n Object\n \n POSIX\n \n POSIX\n \n POSIX\n \n POSIX\n
\n Clouds supported\n \n On-prem, any cloud\n \n AWS only\n \n Azure only\n \n GCP only\n \n AWS only\n \n AWS only\n \n Azure only\n \n AWS, GCP and Azure 1\n
\n Requires block storage\n \n Yes\n \n Optional\n \n Optional\n \n Optional\n \n Optional\n \n No\n \n Optional\n \n No\n
\n Relative cost\n \n $$\n \n $\n \n $\n \n $\n \n $$\n \n $$$\n \n $$\n \n $\n
\n Nextflow plugins\n \n -\n \n nf-amazon\n \n nf-azure\n \n nf-google\n \n -\n \n -\n \n -\n \n nf-amazon\n
\n Tower support\n \n Yes\n \n Yes, existing buckets\n \n Yes, existing BLOB container\n \n Yes, existing cloud storage bucket\n \n Yes, creates EFS instances\n \n Yes, creates FSx for Lustre instances\n \n File system created manually\n \n Yes, fully automated\n
\n Dependencies\n \n Externally configured\n \n Wave Amazon S3\n
\n Cost model\n \n Fixed price on-prem, instance+block storage costs\n \n GB per month\n \n GB per month\n \n GB per month\n \n Multiple factors\n \n Multiple factors\n \n Multiple factors\n \n GB per month (uses S3)\n
\n Level of configuration effort (when used with Tower)\n \n High\n \n Low\n \n Low\n \n Low\n \n Medium (low with Tower)\n \n High (easier with Tower)\n \n Medium\n \n Low\n
\n Works best with:\n \n Any on-prem cluster manager (LSF, Slurm, etc.)\n \n AWS Batch\n \n Azure Batch\n \n Google Cloud Batch\n \n AWS Batch\n \n AWS Batch\n \n Azure Batch\n \n AWS Batch, Amazon EKS, Azure Batch, Google Cloud Batch 1\n
\n
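\n\nTo make the table above more concrete, the snippet below is a minimal `nextflow.config` sketch for the most common cloud pattern discussed in this post: an Amazon S3 work directory handled by the nf-amazon plugin, with Fusion optionally switched on. It is not a complete configuration (it assumes a containerised AWS Batch compute environment), and the bucket, queue, and region names are placeholders:\n\n```groovy\n// Keep the shared work directory in Amazon S3 (staged by the nf-amazon plugin)\nworkDir = 's3://my-bucket/work'   // placeholder bucket\n\nprocess {\n    executor = 'awsbatch'\n    queue    = 'my-batch-queue'   // placeholder AWS Batch queue\n}\n\naws.region = 'eu-west-1'          // placeholder region\n\n// Optional: let tasks read and write S3 directly through a POSIX interface,\n// instead of copying files in and out of the object store for every task\nfusion.enabled = true\nwave.enabled   = true             // Fusion relies on the Wave service\n```\n\nWith the last two lines omitted, Nextflow uses the copy-in/copy-out behaviour described earlier; adding them switches the run to the Fusion file system.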
\n\n## So what’s the bottom line?\n\nThe choice or storage solution depends on several factors. Object stores like Amazon S3 are popular because they are convenient and inexpensive. However, depending on data access patterns, and the amount of data to be staged in advance, file systems such as EFS, Azure Files or FSx for Lustre can also be a good alternative.\n\nFor many Nextflow users, Fusion file system will be a better option since it offers performance comparable to a high-performance file system at the cost of cloud object storage. Fusion is also dramatically easier to deploy and manage. [Adding Fusion support](https://nextflow.io/docs/latest/fusion.html) is just a matter of adding a few lines to the `nextflow.config` file.\n\nWhere workloads run is also an important consideration. For example, on-premises clusters will typically use whatever shared file system is available locally. When operating in the cloud, you can choose whether to use cloud file systems, object stores, high-performance file systems, Fusion FS, or hybrid cloud solutions such as Weka.\n\nStill unsure what storage solution will best meet your needs? Consider joining our community at [nextflow.slack.com](https://nextflow.slack.com/). You can engage with others, post technical questions, and learn more about the pros and cons of the storage solutions described above.\n", - "images": [] + "images": [], + "author": "Paolo Di Tommaso", + "tags": "nextflow" }, { "slug": "2023/the-state-of-kubernetes-in-nextflow", "title": "The State of Kubernetes in Nextflow", "date": "2023-03-10T00:00:00.000Z", "content": "\nHi, my name is Ben, and I’m a software engineer at Seqera Labs. I joined Seqera in November 2021 after finishing my Ph.D. at Clemson University. I work on a number of things at Seqera, but my primary role is that of a Nextflow core contributor.\n\nI have run Nextflow just about everywhere, from my laptop to my university cluster to the cloud and Kubernetes. I have written Nextlfow pipelines for bioinformatics and machine learning, and I even wrote a pipeline to run other Nextflow pipelines for my [dissertation research](https://github.com/bentsherman/tesseract). While I tried to avoid contributing code to Nextflow as a student (I had enough work already), now I get to work on it full-time!\n\nWhich brings me to the topic of this post: Nextflow and Kubernetes.\n\nOne of my first contributions was a “[best practices guide](https://github.com/seqeralabs/nf-k8s-best-practices)” for running Nextflow on Kubernetes. The guide has helped many people, but for me it provided a map for how to improve K8s support in Nextflow. You see, Nextflow was originally built for HPC, while Kubernetes and cloud batch executors were added later. While Nextflow’s extensible design makes adding features like new executors relatively easy, support for Kubernetes is still a bit spotty.\n\nSo, I set out to make Nextflow + K8s great! Over the past year, in collaboration with talented members of the Nextflow community, we have added all sorts of enhancements to the K8s executor. In this blog post, I’d like to show off all of these improvements in one place. So here we go!\n\n## New features\n\n### Submit tasks as Kubernetes Jobs\n\n_New in version 22.05.0-edge._\n\nNextflow submits tasks as Pods by default, which is sort of a bad practice. In Kubernetes, every Pod should be created through a controller (e.g., Deployment, Job, StatefulSet) so that Pod failures can be handled automatically. For Nextflow, the appropriate controller is a K8s Job. 
Using Jobs instead of Pods directly has greatly improved the stability of large Nextflow runs on Kubernetes, and will likely become the default behavior in a future version.\n\nYou can enable this feature with the following configuration option:\n\n```groovy\nk8s.computeResourceType = 'Job'\n```\n\nCredit goes to @xhejtman from CERIT-SC for leading the charge on this one!\n\n### Object storage as the work directory\n\n_New in version 22.10.0._\n\nOne of the most difficult aspects of using Nextflow with Kubernetes is that Nextflow needs a `PersistentVolumeClaim` (PVC) to store the shared work directory, which also means that Nextflow itself must run inside the Kubernetes cluster in order to access this storage. While the `kuberun` command attempts to automate this process, it has never been reliable enough for production usage.\n\nAt the Nextflow Summit in October 2022, we introduced [Fusion](https://seqera.io/fusion/), a file system driver that can mount S3 buckets as POSIX-like directories. The combination of Fusion and [Wave](https://seqera.io/wave/) (a just-in-time container provisioning service) enables you to have your work directory in S3-compatible storage. See the [Wave blog post](https://nextflow.io/blog/2022/rethinking-containers-for-cloud-native-pipelines.html) for an explanation of how it works — it’s pretty cool.\n\nThis functionality is useful in general, but it is especially useful for Kubernetes, because (1) you don’t need to provision your own PVC and (2) you can run Nextflow on Kubernetes without using `kuberun` or creating your own submitter Pod.\n\nThis feature currently supports AWS S3 on Elastic Kubernetes Service (EKS) clusters and Google Cloud Storage on Google Kubernetes Engine (GKE) clusters.\n\nCheck out [this article](https://seqera.io/blog/deploying-nextflow-on-amazon-eks/) over at the Seqera blog for an in-depth guide to running Nextflow (with Fusion) on Amazon EKS.\n\n### No CPU limits by default\n\n_New in version 22.11.0-edge._\n\nWe have changed the default behavior of CPU requests for the K8s executor. Before, a single number in a Nextflow resource request (e.g., `cpus = 8`) was interpreted as both a “request” (lower bound) and a “limit” (upper bound) in the Pod definition. However, setting an explicit CPU limit in K8s is increasingly seen as an anti-pattern (see [this blog post](https://home.robusta.dev/blog/stop-using-cpu-limits) for an explanation). The bottom line is that it is better to specify a request without a limit, because that will ensure that each task has the CPU time it requested, while also allowing the task to use more CPU time if it is available. Unlike other resources like memory and disk, CPU time is compressible — it can be given and taken away without killing the application.\n\nWe have also updated the Docker integration in Nextflow to use [CPU shares](https://www.batey.info/cgroup-cpu-shares-for-docker.html), which is the mechanism used by [Kubernetes](https://www.batey.info/cgroup-cpu-shares-for-kubernetes.html) and [AWS Batch](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html#container_definitions) under the hood to define expandable CPU requests. These changes make the behavior of CPU requests in Nextflow much more consistent across executors.\n\n### CSI ephemeral volumes\n\n_New in version 22.11.0-edge._\n\nIn Kubernetes, volumes are used to provide storage and data (e.g., configuration and secrets) to Pods. 
Persistent volumes exist independently of Pods and can be mounted and unmounted over time, while ephemeral volumes are attached to a single Pod and are created and destroyed alongside it. While Nextflow can use any persistent volume through a `PersistentVolumeClaim`, ephemeral volume types are supported on a case-by-case basis. For example, `ConfigMaps` and `Secrets` are two ephemeral volume types that are already supported by Nextflow.\n\nNextflow now also supports [CSI ephemeral volumes](https://kubernetes.io/docs/concepts/storage/ephemeral-volumes/#csi-ephemeral-volumes). CSI stands for Container Storage Interface, and it is a standard used by Kubernetes to support third-party storage systems as volumes. The most common example of a CSI ephemeral volume is [Secrets Store](https://secrets-store-csi-driver.sigs.k8s.io/getting-started/usage.html), which is used to inject secrets from a remote vault such as [Hashicorp Vault](https://www.vaultproject.io/) or [Azure Key Vault](https://azure.microsoft.com/en-us/products/key-vault/).\n\n_Note: CSI persistent volumes can already be used in Nextflow through a `PersistentVolumeClaim`._\n\n### Local disk storage for tasks\n\n_New in version 22.11.0-edge._\n\nNextflow uses a shared work directory to coordinate tasks. Each task receives its own subdirectory with the required input files, and each task is expected to write its output files to this directory. As a workflow scales to thousands of concurrent tasks, this shared storage becomes a major performance bottleneck. We are investigating a few different ways to overcome this challenge.\n\nOne of the tools we have to reduce I/O pressure on the shared work directory is to make tasks use local storage. For example, if a task takes input file A, produces an intermediate file B, then produces an output file C, the file B can be written to local storage instead of shared storage because it isn’t a required output file. Or, if the task writes an output file line by line instead of all at once at the end, it can stream the output to local storage first and then copy the file to shared storage.\n\nWhile it is far from a comprehensive solution, local storage can reduce I/O congestion in some cases. Provisioning local storage for every task looks different on every platform, and in some cases it is not supported. Fortunately, Kubernetes provides a seamless interface for local storage, and now Nextflow supports it as well.\n\nTo provision local storage for tasks, you must (1) add an `emptyDir` volume to your Pod options, (2) request disk storage via the `disk` directive, and (3) direct tasks to use the local storage with the `scratch` directive. 
Here’s an example:\n\n```groovy\nprocess {\n disk = 10.GB\n pod = [ [emptyDir: [:], mountPath: '/scratch'] ]\n scratch = '/scratch'\n}\n```\n\nAs a bonus, you can also provision an `emptyDir` backed by memory:\n\n```groovy\nprocess {\n memory = 10.GB\n pod = [ [emptyDir: [medium: 'Memory'], mountPath: '/scratch'] ]\n scratch = '/scratch'\n}\n```\n\nNextflow maps the `disk` directive to the [`ephemeral-storage`](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#setting-requests-and-limits-for-local-ephemeral-storage) resource request, which is provided by the [`emptyDir`](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir) volume (another ephemeral volume type).\n\n### Miscellaneous\n\nCheck the [release notes](https://github.com/nextflow-io/nextflow/releases) or the list of [K8s pull requests](https://github.com/nextflow-io/nextflow/pulls?q=is%3Apr+label%3Aplatform%2Fk8s) on Github to see what else has been added. Here are some notable improvements from the past year:\n\n- Support Pod `affinity` ([640cbed4](https://github.com/nextflow-io/nextflow/commit/640cbed4813a34887d4dc10f87fa2e4aa524d055))\n- Support Pod `automountServiceAccountToken` ([1b5908e4](https://github.com/nextflow-io/nextflow/commit/1b5908e4cbbb79f93be2889eec3acfa6242068a1))\n- Support Pod `priorityClassName` ([51650f8c](https://github.com/nextflow-io/nextflow/commit/51650f8c411ba40f0966031035e7a47c036f542e))\n- Support Pod `tolerations` ([7f7cdadc](https://github.com/nextflow-io/nextflow/commit/7f7cdadc6a36d0fb99ef125f6c6f89bfca8ca52e))\n- Support `time` directive via `activeDeadlineSeconds` ([2b6f70a8](https://github.com/nextflow-io/nextflow/commit/2b6f70a8fa55b993fa48755f7a47ac9e1b584e48))\n- Improved control over error conditions ([064f9bc4](https://github.com/nextflow-io/nextflow/commit/064f9bc4), [58be2128](https://github.com/nextflow-io/nextflow/commit/58be2128), [d86ddc36](https://github.com/nextflow-io/nextflow/commit/d86ddc36))\n- Improved support for labels and queue annotation ([9951fcd9](https://github.com/nextflow-io/nextflow/commit/9951fcd9), [4df8c8d2](https://github.com/nextflow-io/nextflow/commit/4df8c8d2))\n- Add support for AWS IAM role for Service Accounts ([62df42c3](https://github.com/nextflow-io/nextflow/commit/62df42c3), [c3364d0f](https://github.com/nextflow-io/nextflow/commit/c3364d0f), [b3d33e3b](https://github.com/nextflow-io/nextflow/commit/b3d33e3b))\n\n## Beyond Kubernetes\n\nWe’ve added tons of value to Nextflow over the past year – not just in terms of Kubernetes support, but also in terms of performance, stability, and integrations with other technologies – and we aren’t stopping any time soon! We have greater ambitions still for Nextflow, and I for one am looking forward to what we will accomplish together. As always, keep an eye on this blog, as well as the [Nextflow GitHub](https://github.com/nextflow-io/nextflow) page, for the latest updates to Nextflow.\n", - "images": [] + "images": [], + "author": "Ben Sherman", + "tags": "nextflow, kubernetes" }, { "slug": "2024/addressing-bioinformatics-core-challenges", "title": "Addressing Bioinformatics Core Challenges with Nextflow and nf-core", "date": "2024-09-11T00:00:00.000Z", "content": "\nI was honored to be invited to the ISMB 2024 congress to speak at the session organised by the COSI (Community of Special Interest) of Bioinformatics Cores. 
This session brought together bioinformatics professionals from around the world who manage bioinformatics facilities in different institutions to share experiences, discuss challenges, and explore solutions for managing and analyzing large-scale biological data. In this session, I had the opportunity to introduce Nextflow, and discuss how its adoption can help bioinformatics cores to address some of the most common challenges they face. From managing complex pipelines to optimizing resource utilization, Nextflow offers a range of benefits that can streamline workflows and improve productivity. In this blog, I'll summarize my talk and share insights on how Nextflow can help overcome some of those challenges, including meeting the needs of a wide range of users or customers, automate reporting, customising pipelines and training.\n\n### Challenge 1: running multiple services\n\n_Challenge description: “I have a wide range of stakeholders, and my pipelines need to address different needs in multiple scientific domains”_\n\nOne of the biggest challenges faced by bioinformatics cores is catering to a diverse range of users with varying applications. On one hand, one might need to run analyses for researchers focused on cancer or human genetics. On the other hand, one may also need to support scientists working with mass spectrometry or metagenomics. Fortunately, the nf-core community has made it relatively easy to tackle these diverse needs with their curated pipelines. These pipelines are ready to use, covering a broad spectrum of applications, from genomics and metagenomics to immunology and mass spectrometry. In one of my slides I showed a non-exhaustive list, which spans genomics, metagenomics, immunology, mass spec, and more: one can find best-practice pipelines for almost any bioinformatics application imaginable, including emerging areas like imaging and spatial-omics. By leveraging this framework, one can not only tap into the expertise of the pipeline developers but also engage with them to discuss specific needs and requirements. This collaborative approach can significantly ease the deployment of a workflow, allowing the user to focus on high-priority tasks while ensuring that the analyses are always up to date and aligned with current best practices.\n\n### Challenge 2: customising applications\n\n_Challenge description: “We often need to customise our applications and pipeline, to meet specific in-house needs of our users”_\n\nWhile ready-to-use applications are a huge advantage, there are times when customisation is necessary. Perhaps the standard pipeline that works for most users doesn't quite meet the specific needs of a facilities user or customer. Fortunately, the nf-core community has got these cases covered. With over 1,300 modules at everyone’s disposal, one can easily compose their own pipeline using the nf-core components and tooling. Should that not be enough though, one can even create a pipeline from scratch using nf-core tools. For instance, one can run a simple command like “nf-core create” followed by the name of the pipeline, and voilà! The software package will create a complete skeleton for the pipeline, filled with pre-compiled code and placeholders to ease customisation. This process is incredibly quick, as I demonstrated in a video clip during the talk, where a pipeline skeleton was created in just a few moments.\n\nOf course, customisation isn't limited to pipelines. It also applies to containers, which are a crucial enabler of portability. 
When it comes to containers, Nextflow users have two options: an easy way and a more advanced approach. The easy way involves using Seqera Containers, a platform that allows anyone to compose a container using tools from bioconda, pypi, and conda-forge. No need for logging in, just select the tools, and the URL of your container will be made available in no time. One can build containers for either Docker or Singularity, and for different platforms (amd64 or arm64).\n\nIf one is looking for more control, they can use Wave as a command line. This is a powerful tool that can act as an intermediary between the user and a container registry. Wave builds containers on the fly, allowing anyone to pass a wave build command as an evaluation inside a docker run command. It's incredibly fast, and builds containers from conda packages in a matter of seconds. Wave, which is also the engine behind Seqera Containers, can be extremely handy to allow other operations like container augmentation. This feature enables a user to add new layers to existing containers without having to rebuild them, thanks to Docker's layer-based architecture. One can simply create a folder where configuration files or executable scripts are located, pass the folder to Wave which will add the folder with a new layer, and get the URL of the augmented container on the fly.\n\n### Challenge 3: Reporting\n\n_Challenge description: “I need to deliver a clear report of the analysis results, in a format that is accessible and can be used for publication purposes by my users”_\n\nReporting is a crucial aspect of any bioinformatics pipeline, and as for customisation Nextflow offers different ways to approach it. suitable for different levels of expertise. The most straightforward solution involves running MultiQC, a tool that collects the output and logs of a wide range of software in a pipeline and generates a nicely formatted HTML report. This is a great option if one wants a quick and easy way to get a summary of their pipeline's results. MultiQC is a widely used tool that supports a huge list (and growing) of bioinformatics tools and file formats, making it a great choice for many use cases.\n\nHowever, if the developer needs more control over the reporting process or wants to create a custom report that meets some specific needs, it is entirely possible to engineer the reports from scratch. This involves collecting the outputs from various processes in the pipeline and passing them as an input to a process that runs an R Markdown or Quarto script. 
R Markdown and Quarto are popular tools for creating dynamic documents that can be parameterised, allowing anyone to customize the content and the layout of a report dynamically.\nBy using this approach, one can create a report that is tailored to your specific needs, including the types of plots and visualizations they want to include, the formatting and layouting, branding, and anything specific one might want to highlight.\n\nTo follow this approach, the user can either create their own customised module, or re-use one of the available notebooks modules in the nf-core repository (quarto [here](https://github.com/nf-core/modules/tree/master/modules/nf-core/quartonotebook), or jupyter [here](https://github.com/nf-core/modules/tree/master/modules/nf-core/jupyternotebook)).\n\n### Challenge 4: Monitoring\n\n_Challenge description: “I need to be able to estimate and optimise runtimes as well as costs of my pipelines, fitting our cost model”_\n\nMonitoring is a critical aspect of pipeline management, and Nextflow provides a robust set of tools to help you track and optimise a pipeline's performance. At its core, monitoring involves tracking the execution of the pipeline to ensure that it's running efficiently and effectively. But it's not just about knowing how long a pipeline takes to run or how much it costs - it's also about making sure each process in the pipeline is using the requested resources efficiently.\nWith Nextflow, the user can track the resources used by each process in your pipeline, including CPU, memory, and disk usage and compare them visually with the resources requested in the pipeline configuration and reserved by each job. This information allows the user to identify bottlenecks and areas for optimisation, so one can fine-tune their pipeline for a better resource consumption. For example, if the user notices that one process is using a disproportionate amount of memory, they can adjust the configuration to better match the actual usage.\n\nBut monitoring isn't just about optimising a pipeline's performance - it's also about reducing the environmental impact where possible. A recently developed Nextflow plugin allows to track the carbon footprint of a pipeline, including the energy consumption and greenhouse gas emissions associated with running that pipeline. This information allows one to make informed decisions about their environmental impact, and gaining better awareness or even adopting greener strategies to computing.\n\nOne of the key benefits of Nextflow’s monitoring system is its flexibility. The user can either use the built-in html reports for trace and pipeline execution, or could monitor a run live by connecting to Seqera Platform and visualising its progress on a graphical interface in real time. More expert or creative users could use the trace file produced by a Nextflow execution, to create their own metrics and visualisations.\n\n### Challenge 5: User accessibility\n\n_Challenge description: “I could balance workloads better, by giving users a certain level of autonomy in running some of my pipelines”_\n\nUser accessibility is a crucial aspect of pipeline development, as it enables users with varying levels of bioinformatics experience to run complex pipelines with ease. One of the advantages of Nextflow, is that a developer can create pipelines that are not only robust and efficient but also user-friendly. 
Allowing your users to run them with a certain level of autonomy might be a good strategy in a bioinformatics core to decentralise straightforward analyses and invest human resources on more complex projects. Empowering a facility’s users to run specific pipelines independently could be a solution to reduce certain workloads.\n\nThe nf-core template includes a parameters schema, which is captured by the nf-core website to create a graphical interface for parameters configuration of the pipelines hosted under the nf-core organisation on GitHub. This interface allows users to fill in the necessary fields for parameters needed to run a pipeline, and allows even users with minimal experience with bioinformatics or command-line interfaces to quickly set up a run. The user can then simply copy and paste the command generated by the webpage into a terminal, and the pipeline will launch as configured. This approach is ideal for users who are familiar with basic computer tasks, and have a very minimal familiarity with a terminal.\n\nHowever, for users with even less bioinformatics experience, Nextflow and the nf-core template together offer an even more intuitive solution. The pipeline can be added to the launcher of the Seqera Platform, and one can provide users with a comprehensive and user-friendly interface that allows them to launch pipelines with ease. This platform offers a range of features, including access to datasets created from sample sheets, the ability to launch pipelines on a wide range of cloud environments as well as on HPC on-premise. A simple graphical interface simplifies the entire process.The Seqera Platform provides in this way a seamless and intuitive experience for users, allowing them to run pipelines without requiring extensive bioinformatics knowledge.\n\n### Challenge 6: Training\n\n_Challenge description: “Training my team and especially onboarding new team members is always challenging and requires documentation and good materials”_\n\nThe final challenge we often face in bioinformatics facilities is training. We all know that training is an ongoing issue, not just because of staff turnover and the need to onboard new recruits, but also because the field is constantly evolving. With new tools, techniques, and technologies emerging all the time, it can be difficult to keep up with the latest developments. However, training is crucial for ensuring that pipelines are robust, efficient, and accurate.\n\nFortunately, there are now many resources available to help with training. The Nextflow training website, for example, has been completely rebuilt recently and now offers a wealth of material suitable for everyone, from beginners to experts. Whether you're just starting out with Nextflow or are already an experienced user, you'll find plenty of resources to help you improve your skills. From introductory tutorials to advanced guides, the training website has everything you need to get the most out of this workflow manager.\n\nEveryone can access the material at their own pace, but regular training events have been scheduled during the year. Additionally, there is now a network of Nextflow Ambassadors who often organise local training events across the world. Without making comparisons with other solutions, I can easily say that the steep learning curve to get going with Nextflow is just a myth nowadays. 
The quality of the training material, the examples available, the frequency of events in person or online you can attend to, and more importantly a welcoming community of users, make learning Nextflow quite easy.\n\nIn my laboratory, usually in a couple of months bachelor students are reasonably confident with the code and with running pipelines and debugging common issues.\n\n### Conclusions\n\nIn conclusion, the presentation at ISMB has gathered quite some interest because I believe it has shown how Nextflow is a powerful and versatile tool that can help bioinformatics cores address those common challenges everyone has experienced. With its comprehensive tooling, extensive training materials, and active community of users, Nextflow offers a complete package that can help people streamline their workflows and improve their productivity.\nAlthough I might be biased on this, I also believe that by adopting Nextflow one also becomes part of a community of researchers and developers who are passionate about bioinformatics and committed to sharing their knowledge and expertise. Beginners not only will have access to a wealth of resources and tutorials, but more importantly to a supportive network of peers who can offer advice and guidance, and which is really fun to be part of.\n", - "images": [] + "images": [], + "author": "Francesco Lescai", + "tags": "nextflow,ambassador_post" }, { "slug": "2024/ambassador-second-call", @@ -530,14 +668,18 @@ "content": "\nNextflow Ambassadors are passionate individuals within the Nextflow community who play a more active role in fostering collaboration, knowledge sharing, and engagement. We launched this program at the Nextflow Summit in Barcelona last year, and it's been a great experience so far, so we've been recruiting more volunteers to expand the program. We’re going to close applications in June with the goal of having new ambassadors start in July, so if you’re interested in becoming an ambassador, now is your chance to apply!\n\n\n\nThe program has been off to a great start, bringing together a diverse group of 46 passionate individuals from around the globe. Our ambassadors have done a great job in their dedication to spreading the word about Nextflow, contributing significantly to the community in numerous ways, including writing insightful content, organizing impactful events, conducting training sessions, leading hackathons, and even contributing to the codebase. Their efforts have not only enhanced the Nextflow ecosystem but have also fostered a stronger, more interconnected global community.\n\nTo support their endeavors, we provide our ambassadors with exclusive swag, essential assets to facilitate their work and funding to attend events where they can promote Nextflow. With the end of the first semester fast approaching, we are excited to officially announce the second cohort of the Nextflow Ambassador program will start in July. If you are passionate about Nextflow and eager to make a meaningful impact, we invite you to [apply](http://seqera.typeform.com/ambassadors/) and join our vibrant community of ambassadors.\n\n**Application Details:**\n\n- **Call for Applications:** Open until June 14 (23h59 any timezone)\n- **Notification of Acceptance:** By June 30\n- **Program Start:** July 2024\n\n
\n ![Ambassadors hackathon](/img/ambassadors-hackathon.jpeg)\n
\n\nWe seek enthusiastic individuals ready to take their contribution to the next level through various initiatives such as content creation, event organization, training, hackathons, and more. As an ambassador, you will receive support and resources to help you succeed in your role, including swag, necessary assets, and funding for event participation.\n\nTo apply, please visit our [Nextflow Ambassador Program Application Page](http://seqera.typeform.com/ambassadors/) and submit your application no later than 23h59 June 14 (any timezone). The form shouldn’t take more than a few minutes to complete. We are eager to welcome a new group of ambassadors who will help support the growth and success of the Nextflow community.\n\nThanks to all our current ambassadors for their incredible work and dedication. We look forward to seeing the new ideas and initiatives that the next cohort of ambassadors will bring to the table. Together, let's continue to build a stronger, more dynamic Nextflow community.\n\n[Apply now and become a part of the Nextflow journey!](http://seqera.typeform.com/ambassadors/)\n\n---\n\nStay tuned for more updates and follow us on our [social](https://twitter.com/nextflowio) [media](https://x.com/seqeralabs) [channels](https://www.linkedin.com/company/seqera/posts/) to keep up with the latest news and events from the Nextflow community.\n", "images": [ "/img/ambassadors-hackathon.jpeg" - ] + ], + "author": "Marcel Ribeiro-Dantas", + "tags": "nextflow,ambassador_program,ambassador_post" }, { "slug": "2024/better-support-through-community-forum-2024", "title": "Moving toward better support through the Community forum", "date": "2024-08-28T00:00:00.000Z", "content": "\nAs the Nextflow community continues to grow, fostering a space where users can easily find help and share knowledge is more important than ever. In this post, we’ll explore our ongoing efforts to enhance the community forum, transitioning from Slack as the primary platform for peer-to-peer support. By improving the forum’s usability and accessibility, we’re aiming to create a more efficient and welcoming environment for everyone. Read on to learn about the changes we’re implementing and how you can contribute to making the forum an even better resource for the community.\n\n\n\n
\n\nOne of the things that impressed me the most when I joined Seqera last year as a developer advocate for the Nextflow community, was how engaged people are, and how much peer-to-peer interaction there is across a vast range of scientific domains, cultures, and geographies. That’s wonderful for a number of reasons, not least of which is that whenever you run into a problem —or you’re trying to do something a bit complicated or new— it’s very likely that there is someone out there who is able and willing to help you figure it out.\n\nFor the past few months, our small team of developer advocates have been thinking about how to nurture that dynamism, and how to further improve the experience of peer-to-peer support as the Nextflow community continues to grow. We’ve come to the conclusion that the best thing we can do is make the [community forum](https://community.seqera.io/) an awesome place to go for help, answers, and resources.\n\n## Why focus on the forum?\n\nIf you’re familiar with the Nextflow Slack workspace, you know there’s a lot of activity there, and the #help channel is always hopping. It’s true, and that’s great, buuuuut using Slack has some important downsides that the forum doesn’t suffer from.\n\nOne of the standout features of the forum is the ability to search past questions and answers really easily. Whether you're browsing directly within the forum, or using Google or some other search engine, you can quickly find relevant information in a way that’s much harder to do on Slack. This means that solutions to common issues are readily accessible, saving you (and the resident experts who have already answered the same question a bunch of times) a whole lot of time and effort.\n\nAdditionally, the forum has no barrier to access— you can view all the content without the need to join yet another app. This open access ensures that everyone can benefit from the wealth of knowledge shared by community members.\n\n## Immediate improvements to the forum’s ease of use\n\nWe’re excited to roll out a few immediate changes to the forum that should make it easier and more pleasant to use.\n\n- We’re introducing a new, sleeker visual design to make navigation and posting more intuitive and enjoyable.\n\n- We’ve reorganized the categories to streamline the process of finding and providing help. Instead of having separate categories for various things (like Nextflow, Wave, Seqera Platform etc), there is now a single \"Ask for help\" category for all topics, eliminating any confusion about where to post your question. Simply put, if you need help, just post in the \"Ask for help\" category. Done.\n\nWe’re also planning to mirror existing categories from the Nextflow Slack workspace, such as the jobs board and shameless promo channels, to make that content more visible and searchable. This will help you find opportunities and promote your work more effectively.\n\n## What you can do to help\n\nThese changes are meant to make the forum a great place for peer-to-peer support for the Nextflow community. 
You can help us improve it further by giving us your feedback about the forum functionality (don’t be shy), by posting your questions in the forum, and of course, if you’re already a Nextflow expert, by answering questions there.\n\nCheck out the [community forum](https://community.seqera.io/) now!\n", - "images": [] + "images": [], + "author": "Geraldine Van der Auwera", + "tags": "nextflow,community" }, { "slug": "2024/bioinformatics-growth-in-turkiye", @@ -547,7 +689,9 @@ "images": [ "/img/blog-2024-06-12-turkish_workshop1a.png", "/img/blog-2024-06-12-turkish_workshop2a.png" - ] + ], + "author": "Kübra Narcı", + "tags": "nextflow,ambassador_post" }, { "slug": "2024/empowering-bioinformatics-mentoring", @@ -556,7 +700,9 @@ "content": "\nIn my journey with the nf-core Mentorship Program, I've mentored individuals from Malawi, Chile, and Brazil, guiding them through Nextflow and nf-core. Despite the distances, my mentees successfully adapted their workflows, contributing to the open-source community. Witnessing the transformative impact of mentorship firsthand, I'm encouraged to continue participating in future mentorship efforts and urge others to join this rewarding experience. But how did it all start?\n\n\n\nI’m [Robert Petit](https://www.robertpetit.com/), a bioinformatician at the [Wyoming Public Health Laboratory](https://health.wyo.gov/publichealth/lab/), in [Wyoming, USA](https://en.wikipedia.org/wiki/Wyoming). If you don’t know where that is, haha that’s fine, I’m pretty sure half the people in the US don’t know either! Wyoming is the 10th largest US state (253,000 km2), but the least populated with only about 580,000 people. It’s home to some very beautiful mountains and national parks, large animals including bears, wolves and the fastest land animal in the northern hemisphere, the Pronghorn. But it’s rural, can get cold (-10 C) and the high wind speeds (somedays average 50 kmph, with gusts 100+ kmph) only make it feel colder during the winter (sometimes feeling like -60 C to -40 C). You might be wondering:\n\nHow did some random person from Wyoming get involved in the nf-core Mentorship Program, and end up being the only mentor to have participated in all three rounds?\n\nI’ve been in the Nextflow world for over 7 years now (as of 2024), when I first converted a pipeline, [Staphopia](https://staphopia.github.io/) from Ruffus to Nextflow. Eventually, I would develop [Bactopia](https://bactopia.github.io/latest/), one of the leading and longest maintained (5 years now!) Nextflow pipelines for the analysis of Bacterial genomes. Through Bactopia, I’ve had the opportunity to help people all around the world get started using Nextflow and analyzing their own bacterial sequencing. It has also allowed me to make numerous contributions to nf-core, mostly through the nf-core/modules. So, when I heard about the opportunity to be a mentor in the nf-core’s Mentorship Program, I immediately applied.\n\nRound 1! To be honest, I didn’t know what to expect from the program. Only that I would help a mentee with whatever they needed related to Nextflow and nf-core. Then at the first meeting, I learned I would be working with Phil Ashton the Lead Bioinformatcian at Malawi Liverpool Wellcome Trust, in Blantyre, Malawi, and immediately sent him a “Yo!”. Phil and I had run into each other in the past because when it comes to bacterial genomics, the field is very small! Phil’s goal was to get Nextflow pipelines running on their infrastructure in Malawi to help with their public health response. 
We would end up using Bactopia as the model. But this mentorship wasn’t just about “running Bactopia”, for Phil it was important we built a basic understanding of how things are working on the back-end with Nextflow. In the end, Phil was able to get Nextflow, and Bactopia running, using Singularity, but also gain a better understanding of Nextflow by writing his own Nextflow code.\n\nRound 2! When Round 2 was announced, I didn’t hesitate to apply again as a mentor. This time, I would be paired up with Juan Ugalde, an Assistant Professor at Universidad Andres Bello in Santiago, Chile. I think Juan and I were both excited by this, as similar to Phil, Juan and I had run into each other (virtually) through MetaSub, a project to sequence samples taken from public transport systems across the globe. Like many during the COVID-19 pandemic, Juan was pulled into the response, during which he began looking into Nextflow for other viruses. In particular, hantavirus, a public health concern due to it being endemic in parts of Chile. Juan had developed a pipeline for hantavirus sequence analysis, and his goal was to convert it into Nextflow. Throughout this Juan got to learn about the nf-core community and Nextflow development, which he was successful at! As he was able to convert his pipeline into Nextflow and make it publicly available as [hantaflow](https://github.com/microbialds/hantaflow).\n\nRound 3! Well Round 3 almost didn’t happen for me, but I’m glad it did happen! At the first meeting, I learned I would be paired with Ícaro Maia Santos de Castro, at the time a PhD candidate at the University of São Paulo, in São Paulo, Brazil. We quickly learned we were both fans of One Piece, as Ícaro’s GitHub picture was Luffy from One Piece, haha and my background included a poster from One Piece. With Ícaro, we were starting with the basics of Nextflow (e.g. the nf-core training materials) with the goal of writing a Nextflow pipeline for his meta-transcriptomics dissertation work. We set the goal to develop his Nextflow pipeline, before an overseas move he had a few months away. He brought so many questions, his motivation never waned, and once he was asking questions about Channel Operators, I knew he was ready to write his pipeline. While writing his pipeline he learned about the nf-core/tools and also got to submit a new recipe to Bioconda, and modules to nf-core. By the end of the mentorship, Ícaro had succeeded in writing his pipeline in Nextflow and making it publicly available at [phiflow](https://github.com/icaromsc/nf-core-phiflow).\n\n
\n ![phiflow metromap](/img/blog-2024-04-25-mentorship-img1a.png)\n Metromap of the phiflow workflow\n
\n\nThrough all three rounds, I had the opportunity to work with some incredible people! But the awesomeness didn’t end with my mentees. One thing that always stuck out to me was how motivated everyone was, both mentees and mentors. There was a sense of excitement and real progress was being made by every group. After the first round ended, I remember thinking to myself, “how could it get better?” Haha, well it did, and it continued to get better and better in Rounds 2 and 3. I think this is a great testament to the organizers at nf-core that put it all together, the mentors and mentees, and the community behind Nextflow and nf-core.\n\nFor the future mentees in mentorship opportunities! Please don’t let yourself stop you from applying. Whether it’s a time issue, or a fear of not having enough experience to be productive. In each round, we’ve had people from all over the world, starting from the ground with no experience, to some mentees in which I wondered if maybe they should have been a mentor (some mentees did end up being a mentor in the last round!). As a mentee, it is a great opportunity to work directly with a mentor dedicated to seeing you grow and build confidence when it comes to Nextflow and bioinformatics. In addition, you will be introduced to the incredible community that is behind Nextflow and nf-core. I think you will quickly learn there are so many people in this community that are willing to help!\n\nFor the future mentors! It’s always awesome to be able to help others learn, but sometimes the mentor needs to learn too! For me, I found the nf-core Mentorship Program to be a great opportunity to improve my skills as a mentor. But it wasn’t just from working with my mentees. During each round I was surrounded by many great role models in the form of mentors and mentees to learn from. No two groups ever had the same goals, so you really get the chance to see so many different styles of mentorship being implemented, all producing significant results for each mentee. Like I told the mentees, if the opportunity comes up again, take the chance and apply to be a mentor!\n\nThere have now been three rounds of the nf-core Mentorship Program, and I am very proud to have been a mentor in each round! During this I have learned so much and been able to help my mentees and the community grow. I look forward to seeing what the future holds for the mentorship opportunities in the Nextflow community, and I encourage potential mentors and mentees to consider joining the program!\n", "images": [ "/img/blog-2024-04-25-mentorship-img1a.png" - ] + ], + "author": "Robert Petit", + "tags": "nextflow,nf-core,mentorship,ambassador_post" }, { "slug": "2024/experimental-cleanup-with-nf-boost", @@ -565,7 +711,9 @@ "content": "\n### Backstory\n\nWhen I (Ben) was in grad school, I worked on a Nextflow pipeline called [GEMmaker](https://github.com/systemsgenetics/gemmaker), an RNA-seq analysis pipeline similar to [nf-core/rnaseq](https://github.com/nf-core/rnaseq). We quickly ran into a problem, which is that on large runs, we were running out of storage! As it turns out, it wasn’t the final outputs, but the intermediate outputs (the BAM files, etc) that were taking up so much space, and we figured that if we could just delete those intermediate files sooner, we might be able to make it through a pipeline run without running out of storage. We were far from alone.\n\n\n\nAutomatic cleanup is currently the [oldest open issue](https://github.com/nextflow-io/nextflow/issues/452) on the Nextflow repository. 
For many users, the ability to quickly delete intermediate files makes the difference between a run being possible or impossible. [Stephen Ficklin](https://github.com/spficklin), the creator of GEMmaker, came up with a clever way to delete intermediate files and even “trick” Nextflow into skipping deleted tasks on a resumed run, which you can read about in the GitHub issue. It involved wiring the intermediate output channels to a “cleanup” process, along with a “done” signal from the relevant downstream processes to ensure that the intermediates were deleted at the right time.\n\nThis hack worked, but it required a lot of manual effort to wire up the cleanup process correctly, and it left me wondering whether it could be done automatically. Nextflow should be able to analyze the DAG, figure out when an output file can be deleted, and then delete it! During my time on the Nextflow team, I have implemented this exact idea in a [pull request](https://github.com/nextflow-io/nextflow/pull/3849), but there are still a few challenges to resolve, such as resuming from deleted runs (which is not as impossible as it sounds).\n\n### Introducing nf-boost: experimental features for Nextflow\n\nMany users have told me that they would gladly take the cleanup without the resume, so I found a way to provide the cleanup functionality in a plugin, which I call [nf-boost](https://github.com/bentsherman/nf-boost). This plugin is not just about automatic cleanup – it contains a variety of experimental features, like new operators and functions, that anyone can try today with a few extra lines of config, which is much less tedious than building Nextflow from a pull request. Not every new feature can be implemented via plugin, but for those features that can, it’s nice for the community to be able to try it out before we make it official.\n\nThe nf-boost plugin requires Nextflow v23.10.0 or later. You can enable the experimental cleanup by adding the following lines to your config file:\n\n```groovy\nplugins {\n id 'nf-boost'\n}\n\nboost {\n cleanup = true\n}\n```\n\n### Automatic cleanup: how it works\n\nThe strategy of automatic cleanup is simple:\n\n1. As soon as an output file can be deleted, delete it\n2. An output file can be deleted when (1) all downstream tasks that use the output file as an input have completed AND (2) the output file has been published (if it needs to be published)\n\nIn practice, the conditions for 2(a) are tricky to get right because Nextflow doesn’t know the full task graph from the start (thanks to the flexibility of Nextflow’s dataflow operators). But you don’t have to worry about any of that because we already figured out how to make it work! All you have to do is flip a switch (`boost.cleanup = true`) and enjoy the ride.\n\n### Real-world example\n\nLet’s consider a variant calling pipeline following standard best practices. Sequencing reads are mapped onto the genome, producing a BAM file which will be marked for duplicates, filtered, recalibrated using GATK, etc. This means that, for a given sample, at least four copies of the BAM file will be stored in the work directory. In other words, for an initial paired-end whole-exome sequencing (WES) sample of 12 GB, the work directory will quickly grow to 50 GB just to store the BAM files for one sample, or 100 GB for a paired sample (e.g. germline and tumor).\n\nNow suppose that we want to analyze a cohort of 100 patients – that’s ~10 TB of intermediate data, which is a real problem. 
For some users, it means processing only a few samples at a time, even though they might have the compute capacity to do much more. For others, it means not being able to process even one sample, because the accumulated intermediate data is simply too large. With automatic cleanup, Nextflow should be able to delete the previous BAM as soon as the next BAM is produced, for each sample independently.\n\nWe tested this use-case with a paired WES sample (total input size of 26.8 GB), by tracking the work directory size for a run with and a run without automatic cleanup. The results are shown below.\n\n\"disk\n\n_Note: we also changed the `boost.cleanupInterval` config option to 180 seconds, which was more optimal for our system._\n\nAs expected, we see that without automatic cleanup, the size of the work directory reaches 110 GB when all BAM files are produced and never deleted. On the other hand, when the nf-boost cleanup is enabled, the work directory occasionally peaks at ~50 GB (i.e. no more than two BAM files are stored at the same time), but always returns to ~25 GB, since the previous BAM is deleted immediately after the next BAM is ready. There is no impact on the size of the results (since they are identical) or the total runtime (since cleanup happens in parallel with the workflow itself).\n\nIn this case, automatic cleanup reduced the total storage by 50-75% (depending on how you measure the storage). In general, the effectiveness of automatic cleanup will depend greatly on how you write your pipeline. Here are a few rules of thumb that we’ve come up with so far:\n\n- As your pipeline becomes “deeper” (i.e. more processing steps in sequence), automatic cleanup becomes more effective, because it only needs to keep two steps’ worth of data, regardless of the total number of steps\n- As your pipeline becomes “wider” (i.e. more inputs being processed in parallel), automatic cleanup should have roughly the same level of effectiveness. If some samples take longer to process than others, the peak storage should be lower with automatic cleanup, since the “peaks” for each sample will happen at different times.\n- As you add more dependencies between processes, automatic cleanup becomes less effective, because it has to wait longer before it can delete the upstream outputs. Note that each output is tracked independently, so for example, sending logs to a summary process won’t affect the cleanup of other outputs from that same process.\n\n### Closing thoughts\n\nAutomatic cleanup in nf-boost is an experimental feature, and notably does not support resumability, meaning that the deleted files will simply be re-executed on a resumed run. While we work through these last few challenges, the nf-boost plugin is a nice option for users who want to benefit from what we’ve built so far and don’t need the resumability.\n\nThe nice thing about nf-boost’s automatic cleanup is that it is just a preview of what will eventually be the “official” cleanup feature in Nextflow (when it is merged), so by using nf-boost, you are helping the future of Nextflow directly! 
We hope that this experimental version will help users run workloads that were previously difficult or even impossible, and we look forward to when we can bring this feature home to Nextflow.\n", "images": [ "/img/blog-2024-08-08-nfboost-img1a.png" - ] + ], + "author": "Ben Sherman", + "tags": "nextflow,ambassador_post" }, { "slug": "2024/how_i_became_a_nextflow_ambassador", @@ -574,14 +722,18 @@ "content": "\nAs a PhD student in bioinformatics, I aimed to build robust pipelines to analyze diverse datasets throughout my research. Initially, mastering Bash scripting was a time-consuming challenge, but this journey ultimately led me to become a Nextflow Ambassador, engaging actively with the expert Nextflow community.\n\n\n\nMy name is [Firas Zemzem](https://www.linkedin.com/in/firaszemzem/), a PhD student based in [Tunisia](https://www.google.com/search?q=things+to+do+in+tunisia&sca_esv=3b07b09e3325eaa7&sca_upv=1&udm=15&biw=1850&bih=932&ei=AS2eZuqnFpG-i-gPwciJyAk&ved=0ahUKEwiqrOiRsbqHAxUR3wIHHUFkApkQ4dUDCBA&uact=5&oq=things+to+do+in+tunisia&gs_lp=Egxnd3Mtd2l6LXNlcnAiF3RoaW5ncyB0byBkbyBpbiB0dW5pc2lhMgUQABiABDIGEAAYFhgeMgYQABgWGB4yBhAAGBYYHjIGEAAYFhgeMgYQABgWGB4yBhAAGBYYHjIGEAAYFhgeMgYQABgWGB4yCBAAGBYYHhgPSOIGULYDWNwEcAF4AZABAJgBfaAB9gGqAQMwLjK4AQPIAQD4AQGYAgOgAoYCwgIKEAAYsAMY1gQYR5gDAIgGAZAGCJIHAzEuMqAH_Aw&sclient=gws-wiz-serp) working with the Laboratory of Cytogenetics, Molecular Genetics, and Biology of Reproduction at CHU Farhat Hached Sousse. I was specialized in human genetics, focusing on studying genomics behind neurodevelopmental disorders. Hence Developing methods for detecting SNPs and variants related to my work was crucial step for advancing medical research and improving patient outcomes. On the other hand, pipelines integration and bioinformatics tools were essential in this process, enabling efficient data analysis, accurate variant detection, and streamlined workflows that enhance the reliability and reproducibility of our findings.\n\n## The initial nightmare of Bash\n\nDuring my master's degree, I was a steadfast user of Bash scripting. Bash had been my go-to tool for automating tasks and managing workflows in my bioinformatics projects, such as variant calling. Its simplicity and versatility made it an indispensable part of my toolkit. I was writing Bash scripts for various next-generation sequencing (NGS) high-throughput analyses, including data preprocessing, quality control, alignment, and variant calling. However, as my projects grew more complex, I began to encounter the limitations of Bash. Managing dependencies, handling parallel executions, and ensuring reproducibility became increasingly challenging. Handling the vast amount of data generated by NGS and other high-throughput technologies was cumbersome. Using Bash became a nightmare for debugging and maintaining. I spent countless hours trying to make it work, only to be met with more errors and inefficiencies. It was nearly impossible to scale for larger datasets and more complex analyses. Additionally, managing different environments and versions of tools was beyond Bash's capabilities. I needed a solution that could handle these challenges more gracefully.\n\n## Game-Changing Call\n\nOne evening, I received a call from my friend, Mr. HERO, a bioinformatician. As we discussed our latest projects, I vented my frustrations with Bash. Mr. HERO, as I called him, the problem-solver, mentioned a tool called Nextflow. He described how it had revolutionized his workflow, making complex pipeline management a breeze. 
Intrigued, I decided to look into it.\n\n## Diving Into the process\n\nReading the [documentation](https://www.nextflow.io/docs/latest/index.html) and watching [tutorials](https://training.nextflow.io/) were my first steps. Nextflow's approach to workflow management was a revelation. Unlike Bash, Nextflow was designed to address the complexities of modern computational questions. It provided a transparent, declarative syntax for defining tasks and their dependencies and supported parallel execution out of the box. The first thing I did when I decided to convert one of my existing Bash scripts into a Nextflow pipeline was to start experimenting with simple code. Doing this was no small feat. I had to rethink my approach to workflow design and embrace a new way of defining tasks and dependencies. My learning curve was not too steep, so understanding how to translate my Bash logic into Nextflow's domain-specific language (DSL) was not that hard.\n\n## Eureka Moment: First run\n\nThe first time I ran my Nextflow pipeline, I was amazed by how smoothly and efficiently it handled tasks that previously took hours to debug and execute in Bash. Nextflow managed task dependencies, parallel execution, and error handling with ease, resulting in a faster, more reliable, and maintainable pipeline. The ability to run pipelines on different computing environments, from local machines to high-performance clusters and cloud platforms, was a game-changer. Several Nextflow features were particularly valuable: Containerization Support using Docker and Singularity ensured consistency across environments; Error Handling with automatic retry mechanisms and detailed error reporting saved countless debugging hours; Portability and scalability allowed seamless execution on various platforms; Modularity facilitated the reuse and combination of processes across different pipelines, enhancing efficiency and organization; and Reproducibility features, including versioning and traceability, ensured that workflows could be reliably reproduced and shared across different research projects and teams.\n\n
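To make the switch concrete, here is a minimal, hypothetical sketch of the kind of process that replaced my old Bash loops (the tool, container tag, and file patterns are placeholders rather than the exact pipeline described here):\n\n```groovy\nprocess FASTQC {\n    // Containerization: the task runs inside its own image for consistency across environments\n    container 'quay.io/biocontainers/fastqc:0.12.1--hdfd78af_0' // placeholder image\n    // Error handling: transient failures are retried automatically\n    errorStrategy 'retry'\n    maxRetries 2\n\n    input:\n    path reads\n\n    output:\n    path '*_fastqc.html'\n\n    script:\n    \"\"\"\n    fastqc ${reads}\n    \"\"\"\n}\n```\n\nFeeding a channel of FastQ files into a process like this runs one containerised task per file in parallel, something that previously took a tangle of loops and job scripts in Bash.\n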
\n \"meme\n
\n\n## New Horizons: Becoming a Nextflow Ambassador\n\nSwitching from Bash scripting to Nextflow was more than just adopting a new tool. It was about embracing a new mindset. Nextflow’s emphasis on scalability, reproducibility, and ease of use transformed how I approached bioinformatics. The initial effort to learn Nextflow paid off in spades, leading to more robust, maintainable, and scalable workflows. My enthusiasm and advocacy for Nextflow didn't go unnoticed. Recently, I became a Nextflow Ambassador. This role allows me to further contribute to the community, promote best practices, and support new users as they embark on their own Nextflow journeys.\n\n## Future Projects and Community Engagement\n\nCurrently I am working on developing a Nextflow pipeline with my team that will help in analyzing variants, providing valuable insights for medical and clinical applications. This pipeline aims to improve the accuracy and efficiency of variant detection, ultimately supporting better diagnostic for patients with various genetic conditions. As part of my ongoing efforts within the Nextflow community, I am planning a series of projects aimed at developing and sharing advanced Nextflow pipelines tailored to specific genetic rare disorder analyses. These initiative will include detailed tutorials, case studies, and collaborative efforts with other researchers to enhance the accessibility and utility of Nextflow for various bioinformatics applications. Additionally, I plan to host workshops and seminars to spread knowledge and best practices among my colleagues and other researchers. This will help foster a collaborative environment where we can all benefit from the power and flexibility of Nextflow.\n\n## Invitation for researchers over the world\n\nAs a Nextflow Ambassador, I invite you to become part of a dynamic group of experts and enthusiasts dedicated to advancing workflow automation. Whether you're just starting or looking to deepen your knowledge, our community offers invaluable resources, support, and networking opportunities. You can chat with us on the [Nextflow Slack Workspace](https://www.nextflow.io/slack-invite.html) and ask your questions at the [Seqera Community Forum](https://community.seqera.io).\n", "images": [ "/img/ZemFiras-nextflowtestpipeline-Blog.png" - ] + ], + "author": "Firas Zemzem", + "tags": "nextflow,ambassador_post" }, { "slug": "2024/nextflow-24.04-highlights", "title": "Nextflow 24.04 - Release highlights", "date": "2024-05-27T00:00:00.000Z", "content": "\nWe release an \"edge\" version of Nextflow every month and a \"stable\" version every six months. The stable releases are recommended for production usage and represent a significant milestone. The [release changelogs](https://github.com/nextflow-io/nextflow/releases) contain a lot of detail, so we thought we'd highlight some of the goodies that have just been released in Nextflow 24.04 stable. 
Let's get into it!\n\n:::tip\nWe also did a podcast episode about some of these changes!\nCheck it out here: [Channels Episode 41](/podcast/2024/ep41_nextflow_2404.html).\n:::\n\n## Table of contents\n\n- [New features](#new-features)\n - [Seqera Containers](#seqera-containers)\n - [Workflow output definition](#workflow-output-definition)\n - [Topic channels](#topic-channels)\n - [Process eval outputs](#process-eval-outputs)\n - [Resource limits](#resource-limits)\n - [Job arrays](#job-arrays)\n- [Enhancements](#enhancements)\n - [Colored logs](#colored-logs)\n - [AWS Fargate support](#aws-fargate-support)\n - [OCI auto pull mode for Singularity and Apptainer](#oci-auto-pull-mode-for-singularity-and-apptainer)\n - [Support for GA4GH TES](#support-for-ga4gh-tes)\n- [Fusion](#fusion)\n - [Enhanced Garbage Collection](#enhanced-garbage-collection)\n - [Increased File Handling Capacity](#increased-file-handling-capacity)\n - [Correct Publishing of Symbolic Links](#correct-publishing-of-symbolic-links)\n- [Other notable changes](#other-notable-changes)\n\n## New features\n\n### Seqera Containers\n\nA new flagship community offering was revealed at the Nextflow Summit 2024 Boston - **Seqera Containers**. This is a free-to-use container cache powered by [Wave](https://seqera.io/wave/), allowing anyone to request an image with a combination of packages from Conda and PyPI. The image will be built on demand and cached (for at least 5 years after creation). There is a [dedicated blog post](https://seqera.io/blog/introducing-seqera-pipelines-containers/) about this, but it's worth noting that the service can be used directly from Nextflow and not only through [https://seqera.io/containers/](https://seqera.io/containers/)\n\nIn order to use Seqera Containers in Nextflow, simply set `wave.freeze` _without_ setting `wave.build.repository` - for example, by using the following config for your pipeline:\n\n```groovy\nwave.enabled = true\nwave.freeze = true\nwave.strategy = 'conda'\n```\n\nAny processes in your pipeline specifying Conda packages will have Docker or Singularity images created on the fly (depending on whether `singularity.enabled` is set or not) and cached for immediate access in subsequent runs. These images will be publicly available. You can view all container image names with the `nextflow inspect` command.\n\n### Workflow output definition\n\nThe workflow output definition is a new syntax for defining workflow outputs:\n\n```groovy\nnextflow.preview.output = true // [!code ++]\n\nworkflow {\n main:\n ch_foo = foo(data)\n bar(ch_foo)\n\n publish:\n ch_foo >> 'foo' // [!code ++]\n}\n\noutput { // [!code ++]\n directory 'results' // [!code ++]\n mode 'copy' // [!code ++]\n} // [!code ++]\n```\n\nIt essentially provides a DSL2-style approach for publishing, and will replace `publishDir` once it is finalized. It also provides extra flexibility as it allows you to publish _any_ channel, not just process outputs. See the [Nextflow docs](https://nextflow.io/docs/latest/workflow.html#publishing-outputs) for more information.\n\n:::info\nThis feature is still in preview and may change in a future release.\nWe hope to finalize it in version 24.10, so don't hesitate to share any feedback with us!\n:::\n\n### Topic channels\n\nTopic channels are a new channel type introduced in 23.11.0-edge. 
A topic channel is essentially a queue channel that can receive values from multiple sources, using a matching name or \"topic\":\n\n```groovy\nprocess foo {\n output:\n val('foo'), topic: 'my-topic' // [!code ++]\n}\n\nprocess bar {\n output:\n val('bar'), topic: 'my-topic' // [!code ++]\n}\n\nworkflow {\n foo()\n bar()\n\n Channel.topic('my-topic').view() // [!code ++]\n}\n```\n\nTopic channels are particularly useful for collecting metadata from various places in the pipeline, without needing to write all of the channel logic that is normally required (e.g. using the `mix` operator). See the [Nextflow docs](https://nextflow.io/docs/latest/channel.html#topic) for more information.\n\n### Process `eval` outputs\n\nProcess `eval` outputs are a new type of process output which allows you to capture the standard output of an arbitrary shell command:\n\n```groovy\nprocess sayHello {\n output:\n eval('bash --version') // [!code ++]\n\n \"\"\"\n echo Hello world!\n \"\"\"\n}\n\nworkflow {\n sayHello | view\n}\n```\n\nThe shell command is executed alongside the task script. Until now, you would typically execute these supplementary commands in the main process script, save the output to a file or environment variable, and then capture it using a `path` or `env` output. The new `eval` output is a much more convenient way to capture this kind of command output directly. See the [Nextflow docs](https://nextflow.io/docs/latest/process.html#output-type-eval) for more information.\n\n#### Collecting software versions\n\nTogether, topic channels and eval outputs can be used to simplify the collection of software tool versions. For example, for FastQC:\n\n```groovy\nprocess FASTQC {\n input:\n tuple val(meta), path(reads)\n\n output:\n tuple val(meta), path('*.html'), emit: html\n tuple val(\"${task.process}\"), val('fastqc'), eval('fastqc --version'), topic: versions // [!code ++]\n\n \"\"\"\n fastqc $reads\n \"\"\"\n}\n\nworkflow {\n Channel.topic('versions') // [!code ++]\n | unique()\n | map { process, name, version ->\n \"\"\"\\\n ${process.tokenize(':').last()}:\n ${name}: ${version}\n \"\"\".stripIndent()\n }\n | collectFile(name: 'collated_versions.yml')\n | CUSTOM_DUMPSOFTWAREVERSIONS\n}\n```\n\nThis approach will be implemented across all nf-core pipelines, and will cut down on a lot of boilerplate code. Check out the full prototypes for nf-core/rnaseq [here](https://github.com/nf-core/rnaseq/pull/1109) and [here](https://github.com/nf-core/rnaseq/pull/1115) to see them in action!\n\n### Resource limits\n\nThe **resourceLimits** directive is a new process directive which allows you to define global limits on the resources requested by individual tasks. For example, if you know that the largest node in your compute environment has 24 CPUs, 768 GB or memory, and a maximum walltime of 72 hours, you might specify the following:\n\n```groovy\nprocess.resourceLimits = [ cpus: 24, memory: 768.GB, time: 72.h ]\n```\n\nIf a task requests more than the specified limit (e.g. due to [retry with dynamic resources](https://nextflow.io/docs/latest/process.html#dynamic-computing-resources)), Nextflow will automatically reduce the task resources to satisfy the limit, whereas normally the task would be rejected by the scheduler or would simply wait in the queue forever! The nf-core community has maintained a custom workaround for this problem, the `check_max()` function, which can now be replaced with `resourceLimits`. 
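For example, a process that follows the dynamic retry pattern can simply scale its requests with each attempt and rely on the limit to cap them (the process name and numbers below are illustrative only):\n\n```groovy\nprocess ALIGN {\n    // Requests grow with each retry; with the resourceLimits shown above,\n    // a fourth attempt would ask for 32 CPUs and be reduced to the 24-CPU limit\n    // instead of being rejected by the scheduler.\n    cpus   { 8 * task.attempt }\n    memory { 64.GB * task.attempt }\n    time   { 12.h * task.attempt }\n    errorStrategy 'retry'\n    maxRetries 3\n\n    script:\n    \"\"\"\n    echo 'placeholder for the real alignment command'\n    \"\"\"\n}\n```\n\n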
See the [Nextflow docs](https://nextflow.io/docs/latest/process.html#resourcelimits) for more information.\n\n### Job arrays\n\n**Job arrays** are now supported in Nextflow using the `array` directive. Most HPC schedulers, and even some cloud batch services including AWS Batch and Google Batch, support a \"job array\" which allows you to submit many independent jobs with a single job script. While the individual jobs are still executed separately as normal, submitting jobs as arrays where possible puts considerably less stress on the scheduler.\n\nWith Nextflow, using job arrays is a one-liner:\n\n```groovy\nprocess.array = 100\n```\n\nYou can also enable job arrays for individual processes like any other directive. See the [Nextflow docs](https://nextflow.io/docs/latest/process.html#array) for more information.\n\n:::tip\nOn Google Batch, using job arrays also allows you to pack multiple tasks onto the same VM by using the `machineType` directive in conjunction with the `cpus` and `memory` directives.\n:::\n\n## Enhancements\n\n### Colored logs\n\n
\n\n**Colored logs** have come to Nextflow! Specifically, the process log which is continuously printed to the terminal while the pipeline is running. Not only is it more colorful, but it also makes better use of the available space to show you what's most important. But we already wrote an entire [blog post](https://nextflow.io/blog/2024/nextflow-colored-logs.html) about it, so go check that out for more details!\n\n
\n\n![New coloured output from Nextflow](/img/blog-nextflow-colored-logs/nextflow_coloured_logs.png)\n\n
\n\n### AWS Fargate support\n\nNextflow now supports **AWS Fargate** for AWS Batch jobs. See the [Nextflow docs](https://nextflow.io/docs/latest/aws.html#aws-fargate) for details.\n\n### OCI auto pull mode for Singularity and Apptainer\n\nNextflow now supports OCI auto pull mode both Singularity and Apptainer. Historically, Singularity could run a Docker container image converting to the Singularity image file format via the Singularity pull command and using the resulting image file in the exec command. This adds extra overhead to the head node running Nextflow for converting all container images to the Singularity format.\n\nNow Nextflow allows specifying the option `ociAutoPull` both for Singularity and Apptainer. When enabling this setting Nextflow delegates the pull and conversion of the Docker image directly to the `exec` command.\n\n```groovy\nsingularity.ociAutoPull = true\n```\n\nThis results in the running of the pull and caching of the Singularity images to the compute jobs instead of the head job and removing the need to maintain a separate image files cache.\n\nSee the [Nextflow docs](https://nextflow.io/docs/latest/config.html#scope-singularity) for more information.\n\n### Support for GA4GH TES\n\nThe [Task Execution Service (TES)](https://ga4gh.github.io/task-execution-schemas/docs/) is an API specification, developed by [GA4GH](https://www.ga4gh.org/), which attempts to provide a standard way for workflow managers like Nextflow to interface with execution backends. Two noteworthy TES implementations are [Funnel](https://github.com/ohsu-comp-bio/funnel) and [TES Azure](https://github.com/microsoft/ga4gh-tes).\n\nNextflow has long supported TES as an executor, but only in a limited sense, as TES did not support some important capabilities in Nextflow such as glob and directory outputs and the `bin` directory. However, with TES 1.1 and its adoption into Nextflow, these gaps have been closed. You can use the TES executor with the following configuration:\n\n```groovy\nplugins {\n id 'nf-ga4gh'\n}\n\nprocess.executor = 'tes'\ntes.endpoint = '...'\n```\n\nSee the [Nextflow docs](https://nextflow.io/docs/latest/executor.html#ga4gh-tes) for more information.\n\n:::note\nTo better facilitate community contributions, the nf-ga4gh plugin will soon be moved from the Nextflow repository into its own repository, `nextflow-io/nf-ga4gh`. To ensure a smooth transition with your pipelines, make sure to explicitly include the plugin in your configuration as shown above.\n:::\n\n## Fusion\n\n[Fusion](https://seqera.io/fusion/) is a distributed virtual file system for cloud-native data pipeline and optimized for Nextflow workloads. Nextflow 24.04 now works with a new release, Fusion 2.3. This brings a few notable quality-of-life improvements:\n\n### Enhanced Garbage Collection\n\nFusion 2.3 features an improved garbage collection system, enabling it to operate effectively with reduced scratch storage. This enhancement ensures that your pipelines run more efficiently, even with limited temporary storage.\n\n### Increased File Handling Capacity\n\nSupport for more concurrently open files is another significant improvement in Fusion 2.3. 
This means that larger directories, such as those used by Alphafold2, can now be utilized without issues, facilitating the handling of extensive datasets.\n\n### Correct Publishing of Symbolic Links\n\nIn previous versions, output files that were symbolic links were not published correctly — instead of the actual file, a text file containing the file path was published. Fusion 2.3 addresses this issue, ensuring that symbolic links are published correctly.\n\nThese enhancements in Fusion 2.3 contribute to a more robust and efficient filesystem for Nextflow users.\n\n## Other notable changes\n\n- Add native retry on spot termination for Google Batch ([`ea1c1b`](https://github.com/nextflow-io/nextflow/commit/ea1c1b70da7a9b8c90de445b8aee1ee7a7148c9b))\n- Add support for instance templates in Google Batch ([`df7ed2`](https://github.com/nextflow-io/nextflow/commit/df7ed294520ad2bfc9ad091114ae347c1e26ae96))\n- Allow secrets to be used with `includeConfig` ([`00c9f2`](https://github.com/nextflow-io/nextflow/commit/00c9f226b201c964f67d520d0404342bc33cf61d))\n- Allow secrets to be used in the pipeline script ([`df866a`](https://github.com/nextflow-io/nextflow/commit/df866a243256d5018e23b6c3237fb06d1c5a4b27))\n- Add retry strategy for publishing ([`c9c703`](https://github.com/nextflow-io/nextflow/commit/c9c7032c2e34132cf721ffabfea09d893adf3761))\n- Add `k8s.cpuLimits` config option ([`3c6e96`](https://github.com/nextflow-io/nextflow/commit/3c6e96d07c9a4fa947cf788a927699314d5e5ec7))\n- Removed `seqera` and `defaults` from the standard channels used by the nf-wave plugin. ([`ec5ebd`](https://github.com/nextflow-io/nextflow/commit/ec5ebd0bc96e986415e7bac195928b90062ed062))\n\nYou can view the full [Nextflow release notes on GitHub](https://github.com/nextflow-io/nextflow/releases/tag/v24.04.0).\n", - "images": [] + "images": [], + "author": "Paolo Di Tommaso", + "tags": "nextflow" }, { "slug": "2024/nextflow-colored-logs", @@ -594,7 +746,9 @@ "/img/blog-nextflow-colored-logs/testing_terminal_themes.png", "/img/blog-nextflow-colored-logs/nextflow_console_varying_widths.png", "/img/blog-nextflow-colored-logs/nextflow_logs_side_by_side.png" - ] + ], + "author": "Phil Ewels", + "tags": "nextflow" }, { "slug": "2024/nextflow-nf-core-ancient-env-dna", @@ -604,7 +758,9 @@ "images": [ "/img/blog-2024-04-17-img1a.jpg", "/img/blog-2024-04-17-img1b.jpg" - ] + ], + "author": "James Fellows Yates", + "tags": "nextflow,nf-core,workshop,ambassador_post" }, { "slug": "2024/nf-schema", @@ -616,14 +772,18 @@ "/img/blog-2024-05-01-nfschema-img1b.jpg", "/img/blog-2024-05-01-nfschema-img1c.png", "/img/blog-2024-05-01-nfschema-img1d.jpg" - ] + ], + "author": "Nicolas Vannieuwkerke", + "tags": "nextflow,nf-core,ambassador_post,nf-schema" }, { "slug": "2024/nf-test-in-nf-core", "title": "Leveraging nf-test for enhanced quality control in nf-core", "date": "2024-04-03T00:00:00.000Z", "content": "\n# The ever-changing landscape of bioinformatics\n\nReproducibility is an important attribute of all good science. This is especially true in the realm of bioinformatics, where software is **hopefully** being updated, and pipelines are **ideally** being maintained. Improvements and maintenance are great, but they also bring about an important question: Do bioinformatics tools and pipelines continue to run successfully and produce consistent results despite these changes? 
Fortunately for us, there is an existing approach to ensure software reproducibility: testing.\n\n\n\n# The Wonderful World of Testing\n\n> \"Software testing is the process of evaluating and verifying that a software product does what it is supposed to do,\"\n> Lukas Forer, co-creator of nf-test.\n\nSoftware testing has two primary purposes: determining whether an operation continues to run successfully after changes are made, and comparing outputs across runs to see if they are consistent. Testing can alert the developer that an output has changed so that an appropriate fix can be made. Admittedly, there are some instances when altered outputs are intentional (i.e., improving a tool might lead to better, and therefore different, results). However, even in these scenarios, it is important to know what has changed, so that no unintentional changes are introduced during an update.\n\n# Writing effective tests\n\nAlthough having any test is certainly better than having no tests at all, there are several considerations to keep in mind when adding tests to pipelines and/or tools to maximize their effectiveness. These considerations can be broadly categorized into two groups:\n\n1. Which inputs/functionalities should be tested?\n2. What contents should be tested?\n\n## Consideration 1: Testing inputs/functionality\n\nGenerally, software will have a default or most common use case. For instance, the nf-core [FastQC](https://nf-co.re/modules/fastqc) module is commonly used to assess the quality of paired-end reads in FastQ format. However, this is not the only way to use the FastQC module. Inputs can also be single-end/interleaved FastQ files, BAM files, or can contain reads from multiple samples. Each input type is analyzed differently by FastQC, and therefore, to increase your test coverage ([\"the degree to which a test or set of tests exercises a particular program or system\"](https://www.geeksforgeeks.org/test-design-coverage-in-software-testing/)), a test should be written for each possible input. Additionally, different settings can change how a process is executed. For example, in the [bowtie2/align](https://nf-co.re/modules/bowtie2_align) module, aside from input files, the `save_unaligned` and `sort_bam` parameters can alter how this module functions and the outputs it generates. Thus, tests should be written for each possible scenario. When writing tests, aim to consider as many variations as possible. If some are missed, don't worry! Additional tests can be added later. Discovering these different use cases and how to address/test them is part of the development process.\n\n## Consideration 2: Testing outputs\n\nOnce test cases are established, the next step is determining what specifically should be evaluated in each test. Generally, these evaluations are referred to as assertions. Assertions can range from verifying whether a job has been completed successfully to comparing the output channel/file contents between runs. Ideally, tests should incorporate all outputs, although there are scenarios where this is not feasible (for example, outputs containing timestamps or paths). In such cases, it's often best to include at least a portion of the contents from the problematic file or, at the minimum, the name of the file to ensure that it is consistently produced.\n\n# Testing in nf-core\n\nnf-core is a community-driven initiative that aims to provide high-quality, Nextflow-based bioinformatics pipelines. 
The community's emphasis on reproducibility makes testing an essential aspect of the nf-core ecosystem. Until recently, tests were implemented using pytest for modules/subworkflows and test profiles for pipelines. These tests ensured that nf-core components could run successfully following updates. However, at the pipeline level, they did not check file contents to evaluate output consistency. Additionally, using two different testing approaches lacked the standardization nf-core strives for. An ideal test framework would integrate tests at all Nextflow development levels (functions, modules, subworkflows, and pipelines) and comprehensively test outputs.\n\n# New and Improved Nextflow Testing with nf-test\n\nCreated by [Lukas Forer](https://github.com/lukfor) and [Sebastian Schönherr](https://github.com/seppinho), nf-test has emerged as the leading solution for testing Nextflow pipelines. Their goal was to enhance the evaluation of reproducibility in complex Nextflow pipelines. To this end, they have implemented several notable features, creating a robust testing platform:\n\n1. **Comprehensive Output Testing**: nf-test employs [snapshots](https://www.nf-test.com/docs/assertions/snapshots/) for handling complex data structures. This feature evaluates the contents of any specified output channel/file, enabling comprehensive and reliable tests that ensure data integrity following changes.\n2. **A Consistent Testing Framework for All Nextflow Components**: nf-test provides a unified framework for testing everything from individual functions to entire pipelines, ensuring consistency across all components.\n3. **A DSL for Tests**: Designed in the likeness of Nextflow, nf-test's intuitive domain-specific language (DSL) uses 'when' and 'then' blocks to describe expected behaviors in pipelines, facilitating easier test script writing.\n4. **Readable Assertions**: nf-test offers a wide range of functions for writing clear and understandable [assertions](https://www.nf-test.com/docs/assertions/assertions/), improving the clarity and maintainability of tests.\n5. **Boilerplate Code Generation**: To accelerate the testing process, nf-test and nf-core tools feature commands that generate boilerplate code, streamlining the development of new tests.\n\n# But wait… there's more!\n\nThe merits of having a consistent and comprehensive testing platform are significantly amplified with nf-test's integration into nf-core. This integration provides an abundance of resources for incorporating nf-test into your Nextflow development. Thanks to this collaboration, you can utilize common nf-test commands via nf-core tools and easily install nf-core modules/subworkflows that already have nf-test implemented. Moreover, an [expanding collection of examples](https://nf-co.re/docs/contributing/tutorials/nf-test_assertions) is available to guide you through adopting nf-test for your projects.\n\n# Adding nf-test to pipelines\n\nSeveral nf-core pipelines have begun to adopt nf-test as their testing framework. Among these, [nf-core/methylseq](https://nf-co.re/methylseq/) was the first to implement pipeline-level nf-tests as a proof-of-concept. However, since this initial implementation, nf-core maintainers have identified that the existing nf-core pipeline template needs modifications to better support nf-test. These adjustments aim to enhance compatibility with nf-test across components (modules, subworkflows, workflows) and ensure that tests are included and shipped with each component. 
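For readers who have not yet seen the DSL in action, a minimal module-level test using the 'when' and 'then' blocks and snapshot assertions described above might look roughly like this (the module, inputs, and file names are purely illustrative):\n\n```groovy\nnextflow_process {\n\n    name \"Test FASTQC\"\n    script \"../main.nf\"\n    process \"FASTQC\"\n\n    test(\"single-end reads\") {\n        when {\n            process {\n                \"\"\"\n                input[0] = [ [ id:'test', single_end:true ], [ file('test_1.fastq.gz') ] ]\n                \"\"\"\n            }\n        }\n        then {\n            // the run completed successfully and all captured outputs match the stored snapshot\n            assert process.success\n            assert snapshot(process.out).match()\n        }\n    }\n}\n```\n\nTests like this live right next to the component they cover, which is exactly what the template adjustments above are designed to support.\n\n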
A more detailed blog post about these changes will be published in the future.\nFollowing these insights, [nf-core/fetchngs](https://nf-co.re/fetchngs) has been at the forefront of incorporating nf-test for testing modules, subworkflows, and at the pipeline level. Currently, fetchngs serves as the best-practice example for nf-test implementation within the nf-core community. Other nf-core pipelines actively integrating nf-test include [mag](https://nf-co.re/mag), [sarek](https://nf-co.re/sarek), [readsimulator](https://nf-co.re/readsimulator), and [rnaseq](https://nf-co.re/rnaseq).\n\n# Pipeline development with nf-test\n\n**For newer nf-core pipelines, integrating nf-test as early as possible in the development process is highly recommended**. An example of a pipeline that has benefitted from the incorporation of nf-tests throughout its development is [phageannotator](https://github.com/nf-core/phageannotator). Although integrating nf-test during pipeline development has presented challenges, it has offered a unique opportunity to evaluate different testing methodologies and has been instrumental in identifying numerous development errors that might have been overlooked using the previous test profiles approach. Additionally, investing time early on has significantly simplified modifying different aspects of the pipeline, ensuring that functionality and output remain unaffected.\nFor those embarking on creating new Nextflow pipelines, here are a few key takeaways from our experience:\n\n1. **Leverage nf-core modules/subworkflows extensively**. Devoting time early to contribute modules/subworkflows to nf-core not only streamlines future development for you and your PR reviewers but also simplifies maintaining, linting, and updating pipeline components through nf-core tools. Furthermore, these modules will likely benefit others in the community with similar research interests.\n2. **Prioritize incremental changes over large overhauls**. Incremental changes are almost always preferable to large, unwieldy modifications. This approach is particularly beneficial when monitoring and updating nf-tests at the module, subworkflow, and pipeline levels. Introducing too many changes simultaneously can overwhelm both developers and reviewers, making it challenging to track what has been modified and what requires testing. Aim to keep changes straightforward and manageable.\n3. **Facilitate parallel execution of nf-test to generate and test snapshots**. By default, nf-test runs each test sequentially, which can make the process of running multiple tests to generate or updating snapshots time-consuming. Implementing scripts that allow tests to run in parallel—whether via a workload manager or in the cloud—can significantly save time and simplify the process of monitoring tests for pass or fail outcomes.\n\n# Community and contribution\n\nnf-core is a community that relies on consistent contributions, evaluation, and feedback from its members to improve and stay up-to-date. This holds true as we transition to a new testing framework as well. Currently, there are two primary ways that people have been contributing in this transition:\n\n1. **Adding nf-tests to new and existing nf-core modules/subworkflows**. There has been a recent emphasis on migrating modules/subworkflows from pytest to nf-test because of the advantages mentioned previously. 
Fortunately, the nf-core team has added very helpful [instructions](https://nf-co.re/docs/contributing/modules#migrating-from-pytest-to-nf-test) to the website, which has made this process much more streamlined.\n2. **Adding nf-tests to nf-core pipelines**. Another area of focus is the addition of nf-tests to nf-core pipelines. This process can be quite difficult for large, complex pipelines, but there are now several examples of pipelines with nf-tests that can be used as a blueprint for getting started ([fetchngs](https://github.com/nf-core/fetchngs/tree/master), [sarek](https://github.com/nf-core/sarek/tree/master), [rnaseq](https://github.com/nf-core/rnaseq/tree/master), [readsimulator](https://github.com/nf-core/readsimulator/tree/master), [phageannotator](https://github.com/nf-core/phageannotator)).\n\n> These are great areas to work on & contribute in nf-core hackathons\n\nThe nf-core community added a significant number of nf-tests during the recent [hackathon in March 2024](https://nf-co.re/events/2024/hackathon-march-2024). Yet the role of the community is not limited to adding test code. A robust testing infrastructure requires nf-core users to identify testing errors, additional test cases, and provide feedback so that the system can continually be improved. Each of us brings a different perspective, and the development-feedback loop that results from collaboration brings about a much more effective, transparent, and inclusive system than if we worked in isolation.\n\n# Future directions\n\nLooking ahead, nf-core and nf-test are poised for tighter integration and significant advancements. Anticipated developments include enhanced testing capabilities, more intuitive interfaces for writing and managing tests, and deeper integration with cloud-based resources. These improvements will further solidify the position of nf-core and nf-test at the forefront of bioinformatics workflow management.\n\n# Conclusion\n\nThe integration of nf-test within the nf-core ecosystem marks a significant leap forward in ensuring the reproducibility and reliability of bioinformatics pipelines. By adopting nf-test, developers and researchers alike can contribute to a culture of excellence and collaboration, driving forward the quality and accuracy of bioinformatics research.\n\nSpecial thanks to everyone in the #nf-test channel in the nf-core slack workspace for their invaluable contributions, feedback, and support throughout this adoption. We are immensely grateful for your commitment and look forward to continuing our productive collaboration.\n", - "images": [] + "images": [], + "author": "Carson Miller", + "tags": "nextflow,nf-core,nf-test,ambassador_post" }, { "slug": "2024/nxf-nf-core-workshop-kogo", @@ -633,7 +793,9 @@ "images": [ "/img/blog-2024-03-14-kogo-img1a.jpg", "/img/blog-2024-03-14-kogo-img1b.png" - ] + ], + "author": "Yuk Kei Wan", + "tags": "nextflow,nf-core,workshop" }, { "slug": "2024/optimizing-nextflow-for-hpc-and-cloud-at-scale", @@ -644,14 +806,18 @@ "/img/blog-2024-01-17--s3-upload-cpu.png", "/img/blog-2024-01-17--s3-upload-memory.png", "/img/blog-2024-01-17--s3-upload-walltime.png" - ] + ], + "author": "Ben Sherman", + "tags": "nextflow,hpc,cloud" }, { "slug": "2024/reflecting-ambassador-collaboration", "title": "Reflecting on a Six-Month Collaboration: Insights from a Nextflow Ambassador", "date": "2024-06-19T00:00:00.000Z", "content": "\nAs a Nextflow Ambassador and a PhD student working in bioinformatics, I’ve always believed in the power of collaboration. 
Over the past six months, I’ve had the privilege of working with another PhD student specializing in metagenomics environmental science. This collaboration began through a simple email after the other researcher discovered my contact information on the ambassadors’ list page. It has been a journey of learning, problem-solving, and mutual growth. I’d like to share some reflections on this experience, highlighting both the challenges and the rewards.\n\n\n\n## Connecting Across Disciplines\n\nOur partnership began with a simple question about running one of nf-core’s metagenomics analysis pipelines. Despite being in different parts of Europe and coming from different academic backgrounds, we quickly found common ground. The combination of our expertise – my focus on bioinformatics workflows and their deep knowledge of microbial ecosystems – created a synergy that enriched our work.\n\n## Navigating Challenges Together\n\nLike any collaboration, ours was not without its difficulties. We faced numerous technical challenges, from optimizing computational resources to troubleshooting pipeline errors. There were moments of frustration when things didn’t work as expected. However, each challenge was an opportunity to learn and grow. Working through these challenges together made them much more manageable and even enjoyable at times. We focused on mastering Nextflow in a high-performance computing (HPC) environment, managing large datasets, and conducting comprehensive data analysis. Additionally, we explored effective data visualization techniques to better interpret and present the findings.\nWe leaned heavily on the Nextflow and nf-core community for support. The extensive documentation and guides were invaluable, and the different Slack channels provided real-time problem-solving assistance. Having the possibility of contacting the main developers of the pipeline that was troubling was a great resource that we are fortunate to have. The community’s willingness to share and offer help was a constant source of encouragement, making us feel supported every step of the way.\n\n## Learning and Growing\n\nOver the past six months, we’ve both learned a tremendous amount. The other PhD student became more adept at using and understanding Nextflow, particularly when running the nf-core/ampliseq pipeline, managing files, and handling high-performance computing (HPC) environments. I, on the other hand, gained a deeper understanding of environmental microbiomes and the specific needs of metagenomics research.\nOur sessions were highly collaborative, allowing us to share knowledge and insights freely. It was reassuring to know that we weren’t alone in our journey and that there was a whole community of researchers ready to share their wisdom and experiences. These interactions made our learning process more rewarding.\n\n## Achieving Synergy\n\nOne of the most rewarding aspects of this collaboration has been the synergy between our different backgrounds. Our combined expertise enabled us to efficiently analyze a high volume of metagenomics samples. The journey does not stop here, of course. Now that they have their samples processed, it comes the time to interpret the data, one of my favorite parts. Our work together highlighted the potential for Nextflow and the nf-core community to facilitate research across diverse fields. 
The collaboration has been a testament to the idea that when individuals from different disciplines come together, they can achieve more than they could alone.\nThis collaboration is poised to result in significant academic contributions. The other PhD student is preparing to publish a paper with the findings enabled by the use of the nf-core/ampliseq pipeline, which will be a key component of their thesis. This paper is going to serve as an excellent example of using Nextflow and nf-core pipelines in the field of metagenomics environmental science.\n\n## Reflecting on the Journey\n\nAs I reflect on these six months, I’m struck by the power of this community in fostering such collaborations. The support network, comprehensive resources, and culture of knowledge sharing have been essential in our success. This experience has reinforced my belief in the importance of open-source bioinformatics and data science communities for professional development and scientific advancement. Through it all, having a collaborator who understood the struggles and celebrated the successes with me made the journey all the more rewarding.\nMoving forward, I’m excited about the potential for more such collaborations. The past six months have been a journey of discovery and growth, and I’m grateful for the opportunity to work with such a dedicated and talented researcher. Our work is far from over, and I look forward to continuing this journey, learning more, and contributing to the field of environmental science.\n\n## Join the Journey!\n\nFor those of you in the Nextflow community or considering joining, I encourage you to take advantage of the resources available. Engage with the community, attend webinars, and don’t hesitate to ask questions. Whether you’re a seasoned expert or a curious newcomer, the Nextflow family is here to support you. 
Together, we can achieve great things.\n", - "images": [] + "images": [], + "author": "Cristina Tuñi i Domínguez", + "tags": "nextflow,ambassador_post" }, { "slug": "2024/reflections-on-nextflow-mentorship", @@ -663,7 +829,9 @@ "/img/blog-2024-04-10-img1b.png", "/img/blog-2024-04-10-img1c.png", "/img/blog-2024-04-10-img1d.png" - ] + ], + "author": "Anabella Trigila", + "tags": "nextflow,nf-core,workshop,ambassador_post" }, { "slug": "2024/training-local-site", @@ -673,7 +841,9 @@ "images": [ "/img/blog-2024-05-06-training-img1a.jpg", "/img/blog-2024-05-06-training-img2a.jpg" - ] + ], + "author": "Florian Wuennemann", + "tags": "nextflow,nf-core,ambassador_post,training" }, { "slug": "2024/welcome_ambassadors_20242", @@ -683,6 +853,8 @@ "images": [ "/img/blog-2024-07-10-img1a.png", "/img/nextflow_ambassador_logo.svg" - ] + ], + "author": "Marcel Ribeiro-Dantas", + "tags": "nextflow,ambassador_post" } ] \ No newline at end of file diff --git a/internal/export.mjs b/internal/export.mjs index 29da2d46..32585c29 100644 --- a/internal/export.mjs +++ b/internal/export.mjs @@ -24,20 +24,22 @@ function getPostsRecursively(dir) { for (const item of items) { const fullPath = path.join(dir, item.name); - + if (item.isDirectory()) { posts = posts.concat(getPostsRecursively(fullPath)); } else if (item.isFile() && item.name.endsWith('.md')) { const fileContents = fs.readFileSync(fullPath, 'utf8'); const { data, content } = matter(fileContents); const images = extractImagePaths(content, fullPath); - + posts.push({ slug: path.relative(postsDirectory, fullPath).replace('.md', ''), title: data.title, date: data.date, content: content, images: images, + author: data.author, + tags: data.tags, }); } } diff --git a/internal/findPerson.mjs b/internal/findPerson.mjs new file mode 100644 index 00000000..72ceb728 --- /dev/null +++ b/internal/findPerson.mjs @@ -0,0 +1,21 @@ +import sanityClient from '@sanity/client'; + +export const client = sanityClient({ + projectId: 'o2y1bt2g', + dataset: 'seqera', + token: process.env.SANITY_TOKEN, + useCdn: false, +}); + +async function findPerson(name) { + const person = await client.fetch(`*[_type == "person" && name == $name][0]`, { name }); + if (!person) { + console.log(`⭕ No person found with the name "${name}".`); + return; + } else { + console.log(`Person found`, person.name); + return person; + } +} + +export default findPerson; \ No newline at end of file diff --git a/internal/import.mjs b/internal/import.mjs index ce27528d..fddec92f 100644 --- a/internal/import.mjs +++ b/internal/import.mjs @@ -3,6 +3,7 @@ import fs from 'fs'; import path from 'path'; import { customAlphabet } from 'nanoid'; import { marked } from 'marked'; +import findPerson from './findPerson.mjs'; const nanoid = customAlphabet('0123456789abcdef', 12); @@ -244,16 +245,25 @@ async function migratePosts() { } } + const person = await findPerson(post.author); + if (!person) return false; + const portableTextContent = markdownToPortableText(post.content, imageMap); const newSlug = post.slug.split('/').pop(); + let dateStr = post.date.split('T')[0]; + dateStr = `${dateStr} 8:00`; + console.log(dateStr); + + const sanityPost = { _type: 'blogPostDev', title: post.title, meta: { slug: { current: newSlug } }, - publishedAt: new Date(post.date).toISOString(), + publishedAt: new Date(dateStr).toISOString(), body: portableTextContent, + author: { _type: 'reference', _ref: person._id }, }; try { @@ -265,4 +275,7 @@ async function migratePosts() { } } -migratePosts().then(() => console.log('Migration 
complete')); \ No newline at end of file +migratePosts().then((isSuccess) => { + if (isSuccess) console.log('✅ Migration complete') + else console.log('❌ Migration failed') +}); \ No newline at end of file From 7a4700406d1c6a10c96f5d40db7615097da4b3c7 Mon Sep 17 00:00:00 2001 From: Jake Broughton Date: Wed, 25 Sep 2024 12:28:46 +0200 Subject: [PATCH 13/21] Import/export fixes --- internal/export.json | 172 ++++++++++++++++++++-------------------- internal/export.mjs | 84 +++++++++++++++++++- internal/findPerson.mjs | 8 +- internal/import.mjs | 18 +++-- 4 files changed, 182 insertions(+), 100 deletions(-) diff --git a/internal/export.json b/internal/export.json index b18c7446..75428c42 100644 --- a/internal/export.json +++ b/internal/export.json @@ -3,7 +3,7 @@ "slug": "2014/nextflow-meets-docker", "title": "Reproducibility in Science - Nextflow meets Docker", "date": "2014-09-09T00:00:00.000Z", - "content": "\nThe scientific world nowadays operates on the basis of published articles.\nThese are used to report novel discoveries to the rest of the scientific community.\n\nBut have you ever wondered what a scientific article is? It is a:\n\n1. defeasible argument for claims, supported by\n2. exhibited, reproducible data and methods, and\n3. explicit references to other work in that domain;\n4. described using domain-agreed technical terminology,\n5. which exists within a complex ecosystem of technologies, people and activities.\n\nHence the very essence of Science relies on the ability of scientists to reproduce and\nbuild upon each other’s published results.\n\nSo how much can we rely on published data? In a recent report in Nature, researchers at the\nAmgen corporation found that only 11% of the academic research in the literature was\nreproducible by their groups [[1](http://www.nature.com/nature/journal/v483/n7391/full/483531a.html)].\n\nWhile many factors are likely at play here, perhaps the most basic requirement for\nreproducibility holds that the materials reported in a study can be uniquely identified\nand obtained, such that experiments can be reproduced as faithfully as possible.\nThis information is meant to be documented in the \"materials and methods\" of journal articles,\nbut as many can attest, the information provided there is often not adequate for this task.\n\n### Promoting Computational Research Reproducibility\n\nEncouragingly scientific reproducibility has been at the forefront of many news stories\nand there exist numerous initiatives to help address this problem. Particularly, when it\ncomes to producing reproducible computational analyses, some publications are starting\nto publish the code and data used for analysing and generating figures.\n\nFor example, many articles in Nature and in the new Elife journal (and others) provide a\n\"source data\" download link next to figures. Sometimes Elife might even have an option\nto download the source code for figures.\n\nAs pointed out by Melissa Gymrek [in a recent post](http://melissagymrek.com/science/2014/08/29/docker-reproducible-research.html)\nthis is a great start, but there are still lots of problems. 
She wrote that, for example, if one wants\nto re-execute a data analyses from these papers, he/she will have to download the\nscripts and the data, to only realize that he/she has not all the required libraries,\nor that it only runs on, for example, an Ubuntu version he/she doesn't have, or some\npaths are hard-coded to match the authors' machine.\n\nIf it's not easy to run and doesn't run out of the box the chances that a researcher\nwill actually ever run most of these scripts is close to zero, especially if they lack\nthe time or expertise to manage the required installation of third-party libraries,\ntools or implement from scratch state-of-the-art data processing algorithms.\n\n### Here comes Docker\n\n[Docker](http://www.docker.com) containers technology is a solution to many of the computational\nresearch reproducibility problems. Basically, it is a kind of a lightweight virtual machine\nwhere you can set up a computing environment including all the libraries, code and data that you need,\nwithin a single _image_.\n\nThis image can be distributed publicly and can seamlessly run on any major Linux operating system.\nNo need for the user to mess with installation, paths, etc.\n\nThey just run the Docker image you provided, and everything is set up to work out of the box.\nResearchers have already started discussing this (e.g. [here](http://www.bioinformaticszen.com/post/reproducible-assembler-benchmarks/),\nand [here](https://bcbio.wordpress.com/2014/03/06/improving-reproducibility-and-installation-of-genomic-analysis-pipelines-with-docker/)).\n\n### Docker and Nextflow: a perfect match\n\nOne big advantage Docker has compared to _traditional_ machine virtualisation technology\nis that it doesn't need a complete copy of the operating system, thus it has a minimal\nstartup time. This makes it possible to virtualise single applications or launch the execution\nof multiple containers, that can run in parallel, in order to speedup a large computation.\n\nNextflow is a data-driven toolkit for computational pipelines, which aims to simplify the deployment of\ndistributed and highly parallelised pipelines for scientific applications.\n\nThe latest version integrates the support for Docker containers that enables the deployment\nof self-contained and truly reproducible pipelines.\n\n### How they work together\n\nA Nextflow pipeline is made up by putting together several processes. Each process\ncan be written in any scripting language that can be executed by the Linux platform\n(BASH, Perl, Ruby, Python, etc). Parallelisation is automatically managed\nby the framework and it is implicitly defined by the processes input and\noutput declarations.\n\nBy integrating Docker with Nextflow, every pipeline process can be executed independently\nin its own container, this guarantees that each of them run in a predictable\nmanner without worrying about the configuration of the target execution platform. 
Moreover the\nminimal overhead added by Docker allows us to spawn multiple container executions in a parallel\nmanner with a negligible performance loss when compared to a platform _native_ execution.\n\n### An example\n\nAs a proof of concept of the Docker integration with Nextflow you can try out the\npipeline example at this [link](https://github.com/nextflow-io/examples/blob/master/blast-parallel.nf).\n\nIt splits a protein sequences multi FASTA file into chunks of _n_ entries, executes a BLAST query\nfor each of them, then extracts the top 10 matching sequences and\nfinally aligns the results with the T-Coffee multiple sequence aligner.\n\nIn a common scenario you generally need to install and configure the tools required by this\nscript: BLAST and T-Coffee. Moreover you should provide a formatted protein database in order\nto execute the BLAST search.\n\nBy using Docker with Nextflow you only need to have the Docker engine installed in your\ncomputer and a Java VM. In order to try this example out, follow these steps:\n\nInstall the latest version of Nextflow by entering the following command in your shell terminal:\n\n curl -fsSL get.nextflow.io | bash\n\nThen download the required Docker image with this command:\n\n docker pull nextflow/examples\n\nYou can check the content of the image looking at the [Dockerfile](https://github.com/nextflow-io/examples/blob/master/Dockerfile)\nused to create it.\n\nNow you are ready to run the demo by launching the pipeline execution as shown below:\n\n nextflow run examples/blast-parallel.nf -with-docker\n\nThis will run the pipeline printing the final alignment out on the terminal screen.\nYou can also provide your own protein sequences multi FASTA file by adding, in the above command line,\nthe option `--query ` and change the splitting chunk size with `--chunk n` option.\n\nNote: the result doesn't have a real biological meaning since it uses a very small protein database.\n\n### Conclusion\n\nThe mix of Docker, GitHub and Nextflow technologies make it possible to deploy\nself-contained and truly replicable pipelines. It requires zero configuration and\nenables the reproducibility of data analysis pipelines in any system in which a Java VM and\nthe Docker engine are available.\n\n### Learn how to do it!\n\nFollow our documentation for a quick start using Docker with Nextflow at\nthe following link https://www.nextflow.io/docs/latest/docker.html\n", + "content": "The scientific world nowadays operates on the basis of published articles.\nThese are used to report novel discoveries to the rest of the scientific community.\n\nBut have you ever wondered what a scientific article is? It is a:\n\n1. defeasible argument for claims, supported by\n2. exhibited, reproducible data and methods, and\n3. explicit references to other work in that domain;\n4. described using domain-agreed technical terminology,\n5. which exists within a complex ecosystem of technologies, people and activities.\n\nHence the very essence of Science relies on the ability of scientists to reproduce and\nbuild upon each other’s published results.\n\nSo how much can we rely on published data? 
In a recent report in Nature, researchers at the\nAmgen corporation found that only 11% of the academic research in the literature was\nreproducible by their groups [[1](http://www.nature.com/nature/journal/v483/n7391/full/483531a.html)].\n\nWhile many factors are likely at play here, perhaps the most basic requirement for\nreproducibility holds that the materials reported in a study can be uniquely identified\nand obtained, such that experiments can be reproduced as faithfully as possible.\nThis information is meant to be documented in the \"materials and methods\" of journal articles,\nbut as many can attest, the information provided there is often not adequate for this task.\n\n### Promoting Computational Research Reproducibility\n\nEncouragingly scientific reproducibility has been at the forefront of many news stories\nand there exist numerous initiatives to help address this problem. Particularly, when it\ncomes to producing reproducible computational analyses, some publications are starting\nto publish the code and data used for analysing and generating figures.\n\nFor example, many articles in Nature and in the new Elife journal (and others) provide a\n\"source data\" download link next to figures. Sometimes Elife might even have an option\nto download the source code for figures.\n\nAs pointed out by Melissa Gymrek [in a recent post](http://melissagymrek.com/science/2014/08/29/docker-reproducible-research.html)\nthis is a great start, but there are still lots of problems. She wrote that, for example, if one wants\nto re-execute a data analyses from these papers, he/she will have to download the\nscripts and the data, to only realize that he/she has not all the required libraries,\nor that it only runs on, for example, an Ubuntu version he/she doesn't have, or some\npaths are hard-coded to match the authors' machine.\n\nIf it's not easy to run and doesn't run out of the box the chances that a researcher\nwill actually ever run most of these scripts is close to zero, especially if they lack\nthe time or expertise to manage the required installation of third-party libraries,\ntools or implement from scratch state-of-the-art data processing algorithms.\n\n### Here comes Docker\n\n[Docker](http://www.docker.com) containers technology is a solution to many of the computational\nresearch reproducibility problems. Basically, it is a kind of a lightweight virtual machine\nwhere you can set up a computing environment including all the libraries, code and data that you need,\nwithin a single _image_.\n\nThis image can be distributed publicly and can seamlessly run on any major Linux operating system.\nNo need for the user to mess with installation, paths, etc.\n\nThey just run the Docker image you provided, and everything is set up to work out of the box.\nResearchers have already started discussing this (e.g. [here](http://www.bioinformaticszen.com/post/reproducible-assembler-benchmarks/),\nand [here](https://bcbio.wordpress.com/2014/03/06/improving-reproducibility-and-installation-of-genomic-analysis-pipelines-with-docker/)).\n\n### Docker and Nextflow: a perfect match\n\nOne big advantage Docker has compared to _traditional_ machine virtualisation technology\nis that it doesn't need a complete copy of the operating system, thus it has a minimal\nstartup time. 
This makes it possible to virtualise single applications or launch the execution\nof multiple containers, that can run in parallel, in order to speedup a large computation.\n\nNextflow is a data-driven toolkit for computational pipelines, which aims to simplify the deployment of\ndistributed and highly parallelised pipelines for scientific applications.\n\nThe latest version integrates the support for Docker containers that enables the deployment\nof self-contained and truly reproducible pipelines.\n\n### How they work together\n\nA Nextflow pipeline is made up by putting together several processes. Each process\ncan be written in any scripting language that can be executed by the Linux platform\n(BASH, Perl, Ruby, Python, etc). Parallelisation is automatically managed\nby the framework and it is implicitly defined by the processes input and\noutput declarations.\n\nBy integrating Docker with Nextflow, every pipeline process can be executed independently\nin its own container, this guarantees that each of them run in a predictable\nmanner without worrying about the configuration of the target execution platform. Moreover the\nminimal overhead added by Docker allows us to spawn multiple container executions in a parallel\nmanner with a negligible performance loss when compared to a platform _native_ execution.\n\n### An example\n\nAs a proof of concept of the Docker integration with Nextflow you can try out the\npipeline example at this [link](https://github.com/nextflow-io/examples/blob/master/blast-parallel.nf).\n\nIt splits a protein sequences multi FASTA file into chunks of _n_ entries, executes a BLAST query\nfor each of them, then extracts the top 10 matching sequences and\nfinally aligns the results with the T-Coffee multiple sequence aligner.\n\nIn a common scenario you generally need to install and configure the tools required by this\nscript: BLAST and T-Coffee. Moreover you should provide a formatted protein database in order\nto execute the BLAST search.\n\nBy using Docker with Nextflow you only need to have the Docker engine installed in your\ncomputer and a Java VM. In order to try this example out, follow these steps:\n\nInstall the latest version of Nextflow by entering the following command in your shell terminal:\n\n curl -fsSL get.nextflow.io | bash\n\nThen download the required Docker image with this command:\n\n docker pull nextflow/examples\n\nYou can check the content of the image looking at the [Dockerfile](https://github.com/nextflow-io/examples/blob/master/Dockerfile)\nused to create it.\n\nNow you are ready to run the demo by launching the pipeline execution as shown below:\n\n nextflow run examples/blast-parallel.nf -with-docker\n\nThis will run the pipeline printing the final alignment out on the terminal screen.\nYou can also provide your own protein sequences multi FASTA file by adding, in the above command line,\nthe option `--query ` and change the splitting chunk size with `--chunk n` option.\n\nNote: the result doesn't have a real biological meaning since it uses a very small protein database.\n\n### Conclusion\n\nThe mix of Docker, GitHub and Nextflow technologies make it possible to deploy\nself-contained and truly replicable pipelines. 
It requires zero configuration and\nenables the reproducibility of data analysis pipelines in any system in which a Java VM and\nthe Docker engine are available.\n\n### Learn how to do it!\n\nFollow our documentation for a quick start using Docker with Nextflow at\nthe following link https://www.nextflow.io/docs/latest/docker.html\n", "images": [], "author": "Maria Chatzou", "tags": "docker,github,reproducibility,data-analysis" @@ -12,7 +12,7 @@ "slug": "2014/share-nextflow-pipelines-with-github", "title": "Share Nextflow pipelines with GitHub", "date": "2014-08-07T00:00:00.000Z", - "content": "\nThe [GitHub](https://github.com) code repository and collaboration platform is widely\nused between researchers to publish their work and to collaborate on projects source code.\n\nEven more interestingly a few months ago [GitHub announced improved support for researchers](https://github.com/blog/1840-improving-github-for-science)\nmaking it possible to get a Digital Object Identifier (DOI) for any GitHub repository archive.\n\nWith a DOI for your GitHub repository archive your code becomes formally citable\nin scientific publications.\n\n### Why use GitHub with Nextflow?\n\nThe latest Nextflow release (0.9.0) seamlessly integrates with GitHub.\nThis feature allows you to manage your code in a more consistent manner, or use other\npeople's Nextflow pipelines, published through GitHub, in a quick and transparent manner.\n\n### How it works\n\nThe idea is very simple, when you launch a script execution with Nextflow, it will look for\na file with the pipeline name you've specified. If that file does not exist,\nit will look for a public repository with the same name on GitHub. If it is found, the\nrepository is automatically downloaded to your computer and the code executed. This repository\nis stored in the Nextflow home directory, by default `$HOME/.nextflow`, thus it will be reused\nfor any further execution.\n\nYou can try this feature out, having Nextflow (version 0.9.0 or higher) installed in your computer,\nby simply entering the following command in your shell terminal:\n\n nextflow run nextflow-io/hello\n\nThe first time you execute this command Nextflow will download the pipeline\nat the following GitHub repository `https://github.com/nextflow-io/hello`,\nas you don't already have it in your computer. It will then execute it producing the expected output.\n\nIn order for a GitHub repository to be used as a Nextflow project, it must\ncontain at least one file named `main.nf` that defines your Nextflow pipeline script.\n\n### Run a specific revision\n\nAny Git branch, tag or commit ID in the GitHub repository can be used to specify a revision,\nthat you want to execute, when running your pipeline by adding the `-r` option to the run command line.\nSo for example you could enter:\n\n nextflow run nextflow-io/hello -r mybranch\n\nor\n\n nextflow run nextflow-io/hello -r v1.1\n\nThis can be very useful when comparing different versions of your project.\nIt also guarantees consistent results in your pipeline as your source code evolves.\n\n### Commands to manage pipelines\n\nThe following commands allows you to perform some basic operations that can be used to manage your pipelines.\nAnyway Nextflow is not meant to replace functionalities provided by the [Git](http://git-scm.com/) tool,\nyou may still need it to create new repositories or commit changes, etc.\n\n#### List available pipelines\n\nThe `ls` command allows you to list all the pipelines you have downloaded in\nyour computer. 
For example:\n\n nextflow ls\n\nThis prints a list similar to the following one:\n\n cbcrg/piper-nf\n nextflow-io/hello\n\n#### Show pipeline information\n\nBy using the `info` command you can show information from a downloaded pipeline. For example:\n\n $ nextflow info hello\n\nThis command prints:\n\n repo name : nextflow-io/hello\n home page : http://github.com/nextflow-io/hello\n local path : $HOME/.nextflow/assets/nextflow-io/hello\n main script: main.nf\n revisions :\n * master (default)\n mybranch\n v1.1 [t]\n v1.2 [t]\n\nStarting from the top it shows: 1) the repository name; 2) the project home page; 3) the local folder where the pipeline has been downloaded; 4) the script that is executed\nwhen launched; 5) the list of available revisions i.e. branches + tags. Tags are marked with\na `[t]` on the right, the current checked-out revision is marked with a `*` on the left.\n\n#### Pull or update a pipeline\n\nThe `pull` command allows you to download a pipeline from a GitHub repository or to update\nit if that repository has already been downloaded. For example:\n\n nextflow pull nextflow-io/examples\n\nDownloaded pipelines are stored in the folder `$HOME/.nextflow/assets` in your computer.\n\n#### Clone a pipeline into a folder\n\nThe `clone` command allows you to copy a Nextflow pipeline project to a directory of your choice. For example:\n\n nextflow clone nextflow-io/hello target-dir\n\nIf the destination directory is omitted the specified pipeline is cloned to a directory\nwith the same name as the pipeline _base_ name (e.g. `hello`) in the current folder.\n\nThe clone command can be used to inspect or modify the source code of a pipeline. You can\neventually commit and push back your changes by using the usual Git/GitHub workflow.\n\n#### Drop an installed pipeline\n\nDownloaded pipelines can be deleted by using the `drop` command, as shown below:\n\n nextflow drop nextflow-io/hello\n\n### Limitations and known problems\n\n- GitHub private repositories currently are not supported Support for private GitHub repositories has been introduced with version 0.10.0.\n- Symlinks committed in a Git repository are not resolved correctly\n when downloaded/cloned by Nextflow Symlinks are resolved correctly when using Nextflow version 0.11.0 (or higher).\n", + "content": "The [GitHub](https://github.com) code repository and collaboration platform is widely\nused between researchers to publish their work and to collaborate on projects source code.\n\nEven more interestingly a few months ago [GitHub announced improved support for researchers](https://github.com/blog/1840-improving-github-for-science)\nmaking it possible to get a Digital Object Identifier (DOI) for any GitHub repository archive.\n\nWith a DOI for your GitHub repository archive your code becomes formally citable\nin scientific publications.\n\n### Why use GitHub with Nextflow?\n\nThe latest Nextflow release (0.9.0) seamlessly integrates with GitHub.\nThis feature allows you to manage your code in a more consistent manner, or use other\npeople's Nextflow pipelines, published through GitHub, in a quick and transparent manner.\n\n### How it works\n\nThe idea is very simple, when you launch a script execution with Nextflow, it will look for\na file with the pipeline name you've specified. If that file does not exist,\nit will look for a public repository with the same name on GitHub. If it is found, the\nrepository is automatically downloaded to your computer and the code executed. 
This repository\nis stored in the Nextflow home directory, by default `$HOME/.nextflow`, thus it will be reused\nfor any further execution.\n\nYou can try this feature out, having Nextflow (version 0.9.0 or higher) installed in your computer,\nby simply entering the following command in your shell terminal:\n\n nextflow run nextflow-io/hello\n\nThe first time you execute this command Nextflow will download the pipeline\nat the following GitHub repository `https://github.com/nextflow-io/hello`,\nas you don't already have it in your computer. It will then execute it producing the expected output.\n\nIn order for a GitHub repository to be used as a Nextflow project, it must\ncontain at least one file named `main.nf` that defines your Nextflow pipeline script.\n\n### Run a specific revision\n\nAny Git branch, tag or commit ID in the GitHub repository can be used to specify a revision,\nthat you want to execute, when running your pipeline by adding the `-r` option to the run command line.\nSo for example you could enter:\n\n nextflow run nextflow-io/hello -r mybranch\n\nor\n\n nextflow run nextflow-io/hello -r v1.1\n\nThis can be very useful when comparing different versions of your project.\nIt also guarantees consistent results in your pipeline as your source code evolves.\n\n### Commands to manage pipelines\n\nThe following commands allows you to perform some basic operations that can be used to manage your pipelines.\nAnyway Nextflow is not meant to replace functionalities provided by the [Git](http://git-scm.com/) tool,\nyou may still need it to create new repositories or commit changes, etc.\n\n#### List available pipelines\n\nThe `ls` command allows you to list all the pipelines you have downloaded in\nyour computer. For example:\n\n nextflow ls\n\nThis prints a list similar to the following one:\n\n cbcrg/piper-nf\n nextflow-io/hello\n\n#### Show pipeline information\n\nBy using the `info` command you can show information from a downloaded pipeline. For example:\n\n $ nextflow info hello\n\nThis command prints:\n\n repo name : nextflow-io/hello\n home page : http://github.com/nextflow-io/hello\n local path : $HOME/.nextflow/assets/nextflow-io/hello\n main script: main.nf\n revisions :\n * master (default)\n mybranch\n v1.1 [t]\n v1.2 [t]\n\nStarting from the top it shows: 1) the repository name; 2) the project home page; 3) the local folder where the pipeline has been downloaded; 4) the script that is executed\nwhen launched; 5) the list of available revisions i.e. branches + tags. Tags are marked with\na `[t]` on the right, the current checked-out revision is marked with a `*` on the left.\n\n#### Pull or update a pipeline\n\nThe `pull` command allows you to download a pipeline from a GitHub repository or to update\nit if that repository has already been downloaded. For example:\n\n nextflow pull nextflow-io/examples\n\nDownloaded pipelines are stored in the folder `$HOME/.nextflow/assets` in your computer.\n\n#### Clone a pipeline into a folder\n\nThe `clone` command allows you to copy a Nextflow pipeline project to a directory of your choice. For example:\n\n nextflow clone nextflow-io/hello target-dir\n\nIf the destination directory is omitted the specified pipeline is cloned to a directory\nwith the same name as the pipeline _base_ name (e.g. `hello`) in the current folder.\n\nThe clone command can be used to inspect or modify the source code of a pipeline. 
You can\neventually commit and push back your changes by using the usual Git/GitHub workflow.\n\n#### Drop an installed pipeline\n\nDownloaded pipelines can be deleted by using the `drop` command, as shown below:\n\n nextflow drop nextflow-io/hello\n\n### Limitations and known problems\n\n- ~~GitHub private repositories currently are not supported~~ Support for private GitHub repositories has been introduced with version 0.10.0.\n- ~~Symlinks committed in a Git repository are not resolved correctly\n when downloaded/cloned by Nextflow~~ Symlinks are resolved correctly when using Nextflow version 0.11.0 (or higher).", "images": [], "author": "Paolo Di Tommaso", "tags": "git,github,reproducibility" @@ -21,7 +21,7 @@ "slug": "2014/using-docker-in-hpc-cluster", "title": "Using Docker for scientific data analysis in an HPC cluster", "date": "2014-11-06T00:00:00.000Z", - "content": "\nScientific data analysis pipelines are rarely composed by a single piece of software.\nIn a real world scenario, computational pipelines are made up of multiple stages, each of which\ncan execute many different scripts, system commands and external tools deployed in a hosting computing\nenvironment, usually an HPC cluster.\n\nAs I work as a research engineer in a bioinformatics lab I experience on a daily basis the\ndifficulties related on keeping such a piece of software consistent.\n\nComputing environments can change frequently in order to test new pieces of software or\nmaybe because system libraries need to be updated. For this reason replicating the results\nof a data analysis over time can be a challenging task.\n\n[Docker](http://www.docker.com) has emerged recently as a new type of virtualisation technology that allows one\nto create a self-contained runtime environment. There are plenty of examples\nshowing the benefits of using it to run application services, like web servers\nor databases.\n\nHowever it seems that few people have considered using Docker for the deployment of scientific\ndata analysis pipelines on distributed cluster of computer, in order to simplify the development,\nthe deployment and the replicability of this kind of applications.\n\nFor this reason I wanted to test the capabilities of Docker to solve these problems in the\ncluster available in our [institute](http://www.crg.eu).\n\n## Method\n\nThe Docker engine has been installed in each node of our cluster, that runs a [Univa grid engine](http://www.univa.com/products/grid-engine.php) resource manager.\nA Docker private registry instance has also been installed in our internal network, so that images\ncan be pulled from the local repository in a much faster way when compared to the public\n[Docker registry](http://registry.hub.docker.com).\n\nMoreover the Univa grid engine has been configured with a custom [complex](http://www.gridengine.eu/mangridengine/htmlman5/complex.html)\nresource type. This allows us to request a specific Docker image as a resource type while\nsubmitting a job execution to the cluster.\n\nThe Docker image is requested as a _soft_ resource, by doing that the UGE scheduler\ntries to run a job to a node where that image has already been pulled,\notherwise a lower priority is given to it and it is executed, eventually, by a node where\nthe specified Docker image is not available. 
This will force the node to pull the required\nimage from the local registry at the time of the job execution.\n\nThis environment has been tested with [Piper-NF](https://github.com/cbcrg/piper-nf), a genomic pipeline for the\ndetection and mapping of long non-coding RNAs.\n\nThe pipeline runs on top of Nextflow, which takes care of the tasks parallelisation and submits\nthe jobs for execution to the Univa grid engine.\n\nThe Piper-NF code wasn't modified in order to run it using Docker.\nNextflow is able to handle it automatically. The Docker containers are run in such a way that\nthe tasks result files are created in the hosting file system, in other\nwords it behaves in a completely transparent manner without requiring extra steps or affecting\nthe flow of the pipeline execution.\n\nIt was only necessary to specify the Docker image (or images) to be used in the Nextflow\nconfiguration file for the pipeline. You can read more about this at [this link](https://www.nextflow.io/docs/latest/docker.html).\n\n## Results\n\nTo benchmark the impact of Docker on the pipeline performance a comparison was made running\nit with and without Docker.\n\nFor this experiment 10 cluster nodes were used. The pipeline execution launches around 100 jobs,\nand it was run 5 times by using the same dataset with and without Docker.\n\nThe average execution time without Docker was 28.6 minutes, while the average\npipeline execution time, running each job in a Docker container, was 32.2 minutes.\nThus, by using Docker the overall execution time increased by something around 12.5%.\n\nIt is important to note that this time includes both the Docker bootstrap time,\nand the time overhead that is added to the task execution by the virtualisation layer.\n\nFor this reason the actual task run time was measured as well i.e. without including the\nDocker bootstrap time overhead. In this case, the aggregate average task execution time was 57.3 minutes\nand 59.5 minutes when running the same tasks using Docker. Thus, the time overhead\nadded by the Docker virtualisation layer to the effective task run time can be estimated\nto around 4% in our test.\n\nKeeping the complete toolset required by the pipeline execution within a Docker image dramatically\nreduced configuration and deployment problems. Also storing these images into the private and\n[public](https://registry.hub.docker.com/repos/cbcrg/) repositories with a unique tag allowed us\nto replicate the results without the usual burden required to set-up an identical computing environment.\n\n## Conclusion\n\nThe fast start-up time for Docker containers technology allows one to virtualise a single process or\nthe execution of a bunch of applications, instead of a complete operating system. 
This opens up new possibilities,\nfor example the possibility to \"virtualise\" distributed job executions in an HPC cluster of computers.\n\nThe minimal performance loss introduced by the Docker engine is offset by the advantages of running\nyour analysis in a self-contained and dead easy to reproduce runtime environment, which guarantees\nthe consistency of the results over time and across different computing platforms.\n\n#### Credits\n\nThanks to Arnau Bria and the all scientific systems admins team to manage the Docker installation\nin the CRG computing cluster.\n", + "content": "Scientific data analysis pipelines are rarely composed by a single piece of software.\nIn a real world scenario, computational pipelines are made up of multiple stages, each of which\ncan execute many different scripts, system commands and external tools deployed in a hosting computing\nenvironment, usually an HPC cluster.\n\nAs I work as a research engineer in a bioinformatics lab I experience on a daily basis the\ndifficulties related on keeping such a piece of software consistent.\n\nComputing environments can change frequently in order to test new pieces of software or\nmaybe because system libraries need to be updated. For this reason replicating the results\nof a data analysis over time can be a challenging task.\n\n[Docker](http://www.docker.com) has emerged recently as a new type of virtualisation technology that allows one\nto create a self-contained runtime environment. There are plenty of examples\nshowing the benefits of using it to run application services, like web servers\nor databases.\n\nHowever it seems that few people have considered using Docker for the deployment of scientific\ndata analysis pipelines on distributed cluster of computer, in order to simplify the development,\nthe deployment and the replicability of this kind of applications.\n\nFor this reason I wanted to test the capabilities of Docker to solve these problems in the\ncluster available in our [institute](http://www.crg.eu).\n\n## Method\n\nThe Docker engine has been installed in each node of our cluster, that runs a [Univa grid engine](http://www.univa.com/products/grid-engine.php) resource manager.\nA Docker private registry instance has also been installed in our internal network, so that images\ncan be pulled from the local repository in a much faster way when compared to the public\n[Docker registry](http://registry.hub.docker.com).\n\nMoreover the Univa grid engine has been configured with a custom [complex](http://www.gridengine.eu/mangridengine/htmlman5/complex.html)\nresource type. This allows us to request a specific Docker image as a resource type while\nsubmitting a job execution to the cluster.\n\nThe Docker image is requested as a _soft_ resource, by doing that the UGE scheduler\ntries to run a job to a node where that image has already been pulled,\notherwise a lower priority is given to it and it is executed, eventually, by a node where\nthe specified Docker image is not available. 
This will force the node to pull the required\nimage from the local registry at the time of the job execution.\n\nThis environment has been tested with [Piper-NF](https://github.com/cbcrg/piper-nf), a genomic pipeline for the\ndetection and mapping of long non-coding RNAs.\n\nThe pipeline runs on top of Nextflow, which takes care of the tasks parallelisation and submits\nthe jobs for execution to the Univa grid engine.\n\nThe Piper-NF code wasn't modified in order to run it using Docker.\nNextflow is able to handle it automatically. The Docker containers are run in such a way that\nthe tasks result files are created in the hosting file system, in other\nwords it behaves in a completely transparent manner without requiring extra steps or affecting\nthe flow of the pipeline execution.\n\nIt was only necessary to specify the Docker image (or images) to be used in the Nextflow\nconfiguration file for the pipeline. You can read more about this at [this link](https://www.nextflow.io/docs/latest/docker.html).\n\n## Results\n\nTo benchmark the impact of Docker on the pipeline performance a comparison was made running\nit with and without Docker.\n\nFor this experiment 10 cluster nodes were used. The pipeline execution launches around 100 jobs,\nand it was run 5 times by using the same dataset with and without Docker.\n\nThe average execution time without Docker was 28.6 minutes, while the average\npipeline execution time, running each job in a Docker container, was 32.2 minutes.\nThus, by using Docker the overall execution time increased by something around 12.5%.\n\nIt is important to note that this time includes both the Docker bootstrap time,\nand the time overhead that is added to the task execution by the virtualisation layer.\n\nFor this reason the actual task run time was measured as well i.e. without including the\nDocker bootstrap time overhead. In this case, the aggregate average task execution time was 57.3 minutes\nand 59.5 minutes when running the same tasks using Docker. Thus, the time overhead\nadded by the Docker virtualisation layer to the effective task run time can be estimated\nto around 4% in our test.\n\nKeeping the complete toolset required by the pipeline execution within a Docker image dramatically\nreduced configuration and deployment problems. Also storing these images into the private and\n[public](https://registry.hub.docker.com/repos/cbcrg/) repositories with a unique tag allowed us\nto replicate the results without the usual burden required to set-up an identical computing environment.\n\n## Conclusion\n\nThe fast start-up time for Docker containers technology allows one to virtualise a single process or\nthe execution of a bunch of applications, instead of a complete operating system. 
This opens up new possibilities,\nfor example the possibility to \"virtualise\" distributed job executions in an HPC cluster of computers.\n\nThe minimal performance loss introduced by the Docker engine is offset by the advantages of running\nyour analysis in a self-contained and dead easy to reproduce runtime environment, which guarantees\nthe consistency of the results over time and across different computing platforms.\n\n#### Credits\n\nThanks to Arnau Bria and the all scientific systems admins team to manage the Docker installation\nin the CRG computing cluster.", "images": [], "author": "Paolo Di Tommaso", "tags": "docker,reproducibility,data-analysis,hpc" @@ -30,7 +30,7 @@ "slug": "2015/innovation-in-science-the-story-behind-nextflow", "title": "Innovation In Science - The story behind Nextflow", "date": "2015-06-09T00:00:00.000Z", - "content": "\nInnovation can be viewed as the application of solutions that meet new requirements or\nexisting market needs. Academia has traditionally been the driving force of innovation.\nScientific ideas have shaped the world, but only a few of them were brought to market by\nthe inventing scientists themselves, resulting in both time and financial loses.\n\nLately there have been several attempts to boost scientific innovation and translation,\nwith most notable in Europe being the Horizon 2020 funding program. The problem with these\ntypes of funding is that they are not designed for PhDs and Postdocs, but rather aim to\npromote the collaboration of senior scientists in different institutions. This neglects two\nvery important facts, first and foremost that most of the Nobel prizes were given for\ndiscoveries made when scientists were in their 20's / 30's (not in their 50's / 60's).\nSecondly, innovation really happens when a few individuals (not institutions) face a\nproblem in their everyday life/work, and one day they just decide to do something about it\n(end-user innovation). Without realizing, these people address a need that many others have.\nThey don’t do it for the money or the glory; they do it because it bothers them!\nMany examples of companies that started exactly this way include Apple, Google, and\nVirgin Airlines.\n\n### The story of Nextflow\n\nSimilarly, Nextflow started as an attempt to solve the every-day computational problems we\nwere facing with “big biomedical data” analyses. We wished that our huge and almost cryptic\nBASH-based pipelines could handle parallelization automatically. In our effort to make that\nhappen we stumbled upon the [Dataflow](http://en.wikipedia.org/wiki/Dataflow_programming)\nprogramming model and Nextflow was created.\nWe were getting furious every time our two-week long pipelines were crashing and we had\nto re-execute them from the beginning. We, therefore, developed a caching system, which\nallows Nextflow to resume any pipeline from the last executed step. While we were really\nenjoying developing a new [DSL](http://en.wikipedia.org/wiki/Domain-specific_language) and\ncreating our own operators, at the same time we were not willing to give up our favorite\nPerl/Python scripts and one-liners, and thus Nextflow became a polyglot.\n\nAnother problem we were facing was that our pipelines were invoking a lot of\nthird-party software, making distribution and execution on different platforms a nightmare.\nOnce again while searching for a solution to this problem, we were able to identify a\nbreakthrough technology [Docker](https://www.docker.com/), which is now revolutionising\ncloud computation. 
Nextflow has been one of the first framework, that fully\nsupports Docker containers and allows pipeline execution in an isolated and easy to distribute manner.\nOf course, sharing our pipelines with our friends rapidly became a necessity and so we had\nto make Nextflow smart enough to support [Github](https://github.com) and [Bitbucket](https://bitbucket.org/) integration.\n\nI don’t know if Nextflow will make as much difference in the world as the Dataflow\nprogramming model and Docker container technology are making, but it has already made a\nbig difference in our lives and that is all we ever wanted…\n\n### Conclusion\n\nSummarising, it is a pity that PhDs and Postdocs are the neglected engine of Innovation.\nThey are not empowered to innovate, by identifying and addressing their needs, and to\npotentially set up commercial solutions to their problems. This fact becomes even sadder\nwhen you think that only 3% of Postdocs have a chance to become PIs in the UK. Instead more\nand more money is being invested into the senior scientists who only require their PhD students\nand Postdocs to put another step into a well-defined ladder. In todays world it seems that\nideas, such as Nextflow, will only get funded for their scientific value, not as innovative\nconcepts trying to address a need.\n", + "content": "Innovation can be viewed as the application of solutions that meet new requirements or\nexisting market needs. Academia has traditionally been the driving force of innovation.\nScientific ideas have shaped the world, but only a few of them were brought to market by\nthe inventing scientists themselves, resulting in both time and financial loses.\n\nLately there have been several attempts to boost scientific innovation and translation,\nwith most notable in Europe being the Horizon 2020 funding program. The problem with these\ntypes of funding is that they are not designed for PhDs and Postdocs, but rather aim to\npromote the collaboration of senior scientists in different institutions. This neglects two\nvery important facts, first and foremost that most of the Nobel prizes were given for\ndiscoveries made when scientists were in their 20's / 30's (not in their 50's / 60's).\nSecondly, innovation really happens when a few individuals (not institutions) face a\nproblem in their everyday life/work, and one day they just decide to do something about it\n(end-user innovation). Without realizing, these people address a need that many others have.\nThey don’t do it for the money or the glory; they do it because it bothers them!\nMany examples of companies that started exactly this way include Apple, Google, and\nVirgin Airlines.\n\n### The story of Nextflow\n\nSimilarly, Nextflow started as an attempt to solve the every-day computational problems we\nwere facing with “big biomedical data” analyses. We wished that our huge and almost cryptic\nBASH-based pipelines could handle parallelization automatically. In our effort to make that\nhappen we stumbled upon the [Dataflow](http://en.wikipedia.org/wiki/Dataflow_programming)\nprogramming model and Nextflow was created.\nWe were getting furious every time our two-week long pipelines were crashing and we had\nto re-execute them from the beginning. We, therefore, developed a caching system, which\nallows Nextflow to resume any pipeline from the last executed step. 
While we were really\nenjoying developing a new [DSL](http://en.wikipedia.org/wiki/Domain-specific_language) and\ncreating our own operators, at the same time we were not willing to give up our favorite\nPerl/Python scripts and one-liners, and thus Nextflow became a polyglot.\n\nAnother problem we were facing was that our pipelines were invoking a lot of\nthird-party software, making distribution and execution on different platforms a nightmare.\nOnce again while searching for a solution to this problem, we were able to identify a\nbreakthrough technology [Docker](https://www.docker.com/), which is now revolutionising\ncloud computation. Nextflow has been one of the first framework, that fully\nsupports Docker containers and allows pipeline execution in an isolated and easy to distribute manner.\nOf course, sharing our pipelines with our friends rapidly became a necessity and so we had\nto make Nextflow smart enough to support [Github](https://github.com) and [Bitbucket](https://bitbucket.org/) integration.\n\nI don’t know if Nextflow will make as much difference in the world as the Dataflow\nprogramming model and Docker container technology are making, but it has already made a\nbig difference in our lives and that is all we ever wanted…\n\n### Conclusion\n\nSummarising, it is a pity that PhDs and Postdocs are the neglected engine of Innovation.\nThey are not empowered to innovate, by identifying and addressing their needs, and to\npotentially set up commercial solutions to their problems. This fact becomes even sadder\nwhen you think that only 3% of Postdocs have a chance to become PIs in the UK. Instead more\nand more money is being invested into the senior scientists who only require their PhD students\nand Postdocs to put another step into a well-defined ladder. In todays world it seems that\nideas, such as Nextflow, will only get funded for their scientific value, not as innovative\nconcepts trying to address a need.", "images": [], "author": "Maria Chatzou", "tags": "innovation,science,pipelines,nextflow" @@ -39,7 +39,7 @@ "slug": "2015/introducing-nextflow-console", "title": "Introducing Nextflow REPL Console", "date": "2015-04-14T00:00:00.000Z", - "content": "\nThe latest version of Nextflow introduces a new _console_ graphical interface.\n\nThe Nextflow console is a REPL ([read-eval-print loop](http://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop))\nenvironment that allows one to quickly test part of a script or pieces of Nextflow code\nin an interactive manner.\n\nIt is a handy tool that allows one to evaluate fragments of Nextflow/Groovy code\nor fast prototype a complete pipeline script.\n\n### Getting started\n\nThe console application is included in the latest version of Nextflow\n([0.13.1](https://github.com/nextflow-io/nextflow/releases) or higher).\n\nYou can try this feature out, having Nextflow installed on your computer, by entering the\nfollowing command in your shell terminal: `nextflow console `.\n\nWhen you execute it for the first time, Nextflow will spend a few seconds downloading\nthe required runtime dependencies. When complete the console window will appear as shown in\nthe picture below.\n\n\"Nextflow\n\nIt contains a text editor (the top white box) that allows you to enter and modify code snippets.\nThe results area (the bottom yellow box) will show the executed code's output.\n\nAt the top you will find the menu bar (not shown in this picture) and the actions\ntoolbar that allows you to open, save, execute (etc.) 
the code been tested.\n\nAs a practical execution example, simply copy and paste the following piece of code in the\nconsole editor box:\n\n echo true\n\n process sayHello {\n\n \"\"\"\n echo Hello world\n \"\"\"\n\n }\n\nThen, in order to evaluate it, open the `Script` menu in the top menu bar and select the `Run`\ncommand. Alternatively you can use the `CTRL+R` keyboard shortcut to run it (`⌘+R` on the Mac).\nIn the result box an output similar to the following will appear:\n\n [warm up] executor > local\n [00/d78a0f] Submitted process > sayHello (1)\n Hello world\n\nNow you can try to modify the entered process script, execute it again and check that\nthe printed result has changed.\n\nIf the output doesn't appear, open the `View` menu and make sure that the entry `Capture Standard\nOutput` is selected (it must have a tick on the left).\n\nIt is worth noting that the global script context is maintained across script executions.\nThis means that variables declared in the global script scope are not lost when the\nscript run is complete, and they can be accessed in further executions of the same or another\npiece of code.\n\nIn order to reset the global context you can use the command `Clear Script Context`\navailable in the `Script` menu.\n\n### Conclusion\n\nThe Nextflow console is a REPL environment which allows you to experiment and get used\nto the Nextflow programming environment. By using it you can prototype or test your code\nwithout the need to create/edit script files.\n\nNote: the Nextflow console is implemented by sub-classing the [Groovy console](http://groovy-lang.org/groovyconsole.html) tool.\nFor this reason you may find some labels that refer to the Groovy programming environment\nin this program.\n", + "content": "The latest version of Nextflow introduces a new _console_ graphical interface.\n\nThe Nextflow console is a REPL ([read-eval-print loop](http://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop))\nenvironment that allows one to quickly test part of a script or pieces of Nextflow code\nin an interactive manner.\n\nIt is a handy tool that allows one to evaluate fragments of Nextflow/Groovy code\nor fast prototype a complete pipeline script.\n\n### Getting started\n\nThe console application is included in the latest version of Nextflow\n([0.13.1](https://github.com/nextflow-io/nextflow/releases) or higher).\n\nYou can try this feature out, having Nextflow installed on your computer, by entering the\nfollowing command in your shell terminal: `nextflow console `.\n\nWhen you execute it for the first time, Nextflow will spend a few seconds downloading\nthe required runtime dependencies. When complete the console window will appear as shown in\nthe picture below.\n\n\"Nextflow\n\nIt contains a text editor (the top white box) that allows you to enter and modify code snippets.\nThe results area (the bottom yellow box) will show the executed code's output.\n\nAt the top you will find the menu bar (not shown in this picture) and the actions\ntoolbar that allows you to open, save, execute (etc.) the code been tested.\n\nAs a practical execution example, simply copy and paste the following piece of code in the\nconsole editor box:\n\n echo true\n\n process sayHello {\n\n \"\"\"\n echo Hello world\n \"\"\"\n\n }\n\nThen, in order to evaluate it, open the `Script` menu in the top menu bar and select the `Run`\ncommand. 
Alternatively you can use the `CTRL+R` keyboard shortcut to run it (`⌘+R` on the Mac).\nIn the result box an output similar to the following will appear:\n\n [warm up] executor > local\n [00/d78a0f] Submitted process > sayHello (1)\n Hello world\n\nNow you can try to modify the entered process script, execute it again and check that\nthe printed result has changed.\n\nIf the output doesn't appear, open the `View` menu and make sure that the entry `Capture Standard\nOutput` is selected (it must have a tick on the left).\n\nIt is worth noting that the global script context is maintained across script executions.\nThis means that variables declared in the global script scope are not lost when the\nscript run is complete, and they can be accessed in further executions of the same or another\npiece of code.\n\nIn order to reset the global context you can use the command `Clear Script Context`\navailable in the `Script` menu.\n\n### Conclusion\n\nThe Nextflow console is a REPL environment which allows you to experiment and get used\nto the Nextflow programming environment. By using it you can prototype or test your code\nwithout the need to create/edit script files.\n\nNote: the Nextflow console is implemented by sub-classing the [Groovy console](http://groovy-lang.org/groovyconsole.html) tool.\nFor this reason you may find some labels that refer to the Groovy programming environment\nin this program.", "images": [ "/img/nextflow-console1.png" ], @@ -50,7 +50,7 @@ "slug": "2015/mpi-like-execution-with-nextflow", "title": "MPI-like distributed execution with Nextflow", "date": "2015-11-13T00:00:00.000Z", - "content": "\nThe main goal of Nextflow is to make workflows portable across different\ncomputing platforms taking advantage of the parallelisation features provided\nby the underlying system without having to reimplement your application code.\n\nFrom the beginning Nextflow has included executors designed to target the most popular\nresource managers and batch schedulers commonly used in HPC data centers,\nsuch as [Univa Grid Engine](http://www.univa.com), [Platform LSF](http://www.ibm.com/systems/platformcomputing/products/lsf/),\n[SLURM](https://computing.llnl.gov/linux/slurm/), [PBS](http://www.pbsworks.com/Product.aspx?id=1) and [Torque](http://www.adaptivecomputing.com/products/open-source/torque/).\n\nWhen using one of these executors Nextflow submits the computational workflow tasks\nas independent job requests to the underlying platform scheduler, specifying\nfor each of them the computing resources needed to carry out its job.\n\nThis approach works well for workflows that are composed of long running tasks, which\nis the case of most common genomic pipelines.\n\nHowever this approach does not scale well for workloads made up of a large number of\nshort-lived tasks (e.g. a few seconds or sub-seconds). In this scenario the resource\nmanager scheduling time is much longer than the actual task execution time, thus resulting\nin an overall execution time that is much longer than the real execution time.\nIn some cases this represents an unacceptable waste of computing resources.\n\nMoreover supercomputers, such as [MareNostrum](https://www.bsc.es/marenostrum-support-services/mn3)\nin the [Barcelona Supercomputer Center (BSC)](https://www.bsc.es/), are optimized for\nmemory distributed applications. 
In this context it is needed to allocate a certain\namount of computing resources in advance to run the application in a distributed manner,\ncommonly using the [MPI](https://en.wikipedia.org/wiki/Message_Passing_Interface) standard.\n\nIn this scenario, the Nextflow execution model was far from optimal, if not unfeasible.\n\n### Distributed execution\n\nFor this reason, since the release 0.16.0, Nextflow has implemented a new distributed execution\nmodel that greatly improves the computation capability of the framework. It uses [Apache Ignite](https://ignite.apache.org/),\na lightweight clustering engine and in-memory data grid, which has been recently open sourced\nunder the Apache software foundation umbrella.\n\nWhen using this feature a Nextflow application is launched as if it were an MPI application.\nIt uses a job wrapper that submits a single request specifying all the needed computing\nresources. The Nextflow command line is executed by using the `mpirun` utility, as shown in the\nexample below:\n\n #!/bin/bash\n #$ -l virtual_free=120G\n #$ -q \n #$ -N \n #$ -pe ompi \n mpirun --pernode nextflow run -with-mpi [pipeline parameters]\n\nThis tool spawns a Nextflow instance in each of the computing nodes allocated by the\ncluster manager.\n\nEach Nextflow instance automatically connects with the other peers creating an _private_\ninternal cluster, thanks to the Apache Ignite clustering feature that\nis embedded within Nextflow itself.\n\nThe first node becomes the application driver that manages the execution of the\nworkflow application, submitting the tasks to the remaining nodes that act as workers.\n\nWhen the application is complete, the Nextflow driver automatically shuts down the\nNextflow/Ignite cluster and terminates the job execution.\n\n![Nextflow distributed execution](/img/nextflow-distributed-execution.png)\n\n### Conclusion\n\nIn this way it is possible to deploy a Nextflow workload in a supercomputer using an\nexecution strategy that resembles the MPI distributed execution model. This doesn't\nrequire to implement your application using the MPI api/library and it allows you to\nmaintain your code portable across different execution platforms.\n\nAlthough we do not currently have a performance comparison between a Nextflow distributed\nexecution and an equivalent MPI application, we assume that the latter provides better\nperformance due to its low-level optimisation.\n\nNextflow, however, focuses on the fast prototyping of scientific applications in a portable\nmanner while maintaining the ability to scale and distribute the application workload in an\nefficient manner in an HPC cluster.\n\nThis allows researchers to validate an experiment, quickly, reusing existing tools and\nsoftware components. 
This eventually makes it possible to implement an optimised version\nusing a low-level programming language in the second stage of a project.\n\nRead the documentation to learn more about the [Nextflow distributed execution model](https://www.nextflow.io/docs/latest/ignite.html#execution-with-mpi).\n", + "content": "The main goal of Nextflow is to make workflows portable across different\ncomputing platforms taking advantage of the parallelisation features provided\nby the underlying system without having to reimplement your application code.\n\nFrom the beginning Nextflow has included executors designed to target the most popular\nresource managers and batch schedulers commonly used in HPC data centers,\nsuch as [Univa Grid Engine](http://www.univa.com), [Platform LSF](http://www.ibm.com/systems/platformcomputing/products/lsf/),\n[SLURM](https://computing.llnl.gov/linux/slurm/), [PBS](http://www.pbsworks.com/Product.aspx?id=1) and [Torque](http://www.adaptivecomputing.com/products/open-source/torque/).\n\nWhen using one of these executors Nextflow submits the computational workflow tasks\nas independent job requests to the underlying platform scheduler, specifying\nfor each of them the computing resources needed to carry out its job.\n\nThis approach works well for workflows that are composed of long running tasks, which\nis the case of most common genomic pipelines.\n\nHowever this approach does not scale well for workloads made up of a large number of\nshort-lived tasks (e.g. a few seconds or sub-seconds). In this scenario the resource\nmanager scheduling time is much longer than the actual task execution time, thus resulting\nin an overall execution time that is much longer than the real execution time.\nIn some cases this represents an unacceptable waste of computing resources.\n\nMoreover supercomputers, such as [MareNostrum](https://www.bsc.es/marenostrum-support-services/mn3)\nin the [Barcelona Supercomputer Center (BSC)](https://www.bsc.es/), are optimized for\nmemory distributed applications. In this context it is needed to allocate a certain\namount of computing resources in advance to run the application in a distributed manner,\ncommonly using the [MPI](https://en.wikipedia.org/wiki/Message_Passing_Interface) standard.\n\nIn this scenario, the Nextflow execution model was far from optimal, if not unfeasible.\n\n### Distributed execution\n\nFor this reason, since the release 0.16.0, Nextflow has implemented a new distributed execution\nmodel that greatly improves the computation capability of the framework. It uses [Apache Ignite](https://ignite.apache.org/),\na lightweight clustering engine and in-memory data grid, which has been recently open sourced\nunder the Apache software foundation umbrella.\n\nWhen using this feature a Nextflow application is launched as if it were an MPI application.\nIt uses a job wrapper that submits a single request specifying all the needed computing\nresources. 
The Nextflow command line is executed by using the `mpirun` utility, as shown in the\nexample below:\n\n #!/bin/bash\n #$ -l virtual_free=120G\n #$ -q \n #$ -N \n #$ -pe ompi \n mpirun --pernode nextflow run -with-mpi [pipeline parameters]\n\nThis tool spawns a Nextflow instance in each of the computing nodes allocated by the\ncluster manager.\n\nEach Nextflow instance automatically connects with the other peers creating an _private_\ninternal cluster, thanks to the Apache Ignite clustering feature that\nis embedded within Nextflow itself.\n\nThe first node becomes the application driver that manages the execution of the\nworkflow application, submitting the tasks to the remaining nodes that act as workers.\n\nWhen the application is complete, the Nextflow driver automatically shuts down the\nNextflow/Ignite cluster and terminates the job execution.\n\n![Nextflow distributed execution](/img/nextflow-distributed-execution.png)\n\n### Conclusion\n\nIn this way it is possible to deploy a Nextflow workload in a supercomputer using an\nexecution strategy that resembles the MPI distributed execution model. This doesn't\nrequire to implement your application using the MPI api/library and it allows you to\nmaintain your code portable across different execution platforms.\n\nAlthough we do not currently have a performance comparison between a Nextflow distributed\nexecution and an equivalent MPI application, we assume that the latter provides better\nperformance due to its low-level optimisation.\n\nNextflow, however, focuses on the fast prototyping of scientific applications in a portable\nmanner while maintaining the ability to scale and distribute the application workload in an\nefficient manner in an HPC cluster.\n\nThis allows researchers to validate an experiment, quickly, reusing existing tools and\nsoftware components. 
This eventually makes it possible to implement an optimised version\nusing a low-level programming language in the second stage of a project.\n\nRead the documentation to learn more about the [Nextflow distributed execution model](https://www.nextflow.io/docs/latest/ignite.html#execution-with-mpi).\n", "images": [], "author": "Paolo Di Tommaso", "tags": "mpi,hpc,pipelines,genomic" @@ -59,7 +59,7 @@ "slug": "2015/the-impact-of-docker-on-genomic-pipelines", "title": "The impact of Docker containers on the performance of genomic pipelines", "date": "2015-06-15T00:00:00.000Z", - "content": "\nIn a recent publication we assessed the impact of Docker containers technology\non the performance of bioinformatic tools and data analysis workflows.\n\nWe benchmarked three different data analyses: a RNA sequence pipeline for gene expression,\na consensus assembly and variant calling pipeline, and finally a pipeline for the detection\nand mapping of long non-coding RNAs.\n\nWe found that Docker containers have only a minor impact on the performance\nof common genomic data analysis, which is negligible when the executed tasks are demanding\nin terms of computational time.\n\n_[This publication is available as PeerJ preprint at this link](https://peerj.com/preprints/1171/)._\n", + "content": "In a recent publication we assessed the impact of Docker containers technology\non the performance of bioinformatic tools and data analysis workflows.\n\nWe benchmarked three different data analyses: a RNA sequence pipeline for gene expression,\na consensus assembly and variant calling pipeline, and finally a pipeline for the detection\nand mapping of long non-coding RNAs.\n\nWe found that Docker containers have only a minor impact on the performance\nof common genomic data analysis, which is negligible when the executed tasks are demanding\nin terms of computational time.\n\n_[This publication is available as PeerJ preprint at this link](https://peerj.com/preprints/1171/)._", "images": [], "author": "Paolo Di Tommaso", "tags": "docker,reproducibility,pipelines,nextflow,genomic" @@ -68,7 +68,7 @@ "slug": "2016/best-practice-for-reproducibility", "title": "Workflows & publishing: best practice for reproducibility", "date": "2016-04-13T00:00:00.000Z", - "content": "\nPublication time acts as a snapshot for scientific work. Whether a project is ongoing\nor not, work which was performed months ago must be described, new software documented,\ndata collated and figures generated.\n\nThe monumental increase in data and pipeline complexity has led to this task being\nperformed to many differing standards, or [lack of thereof](http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0080278).\nWe all agree it is not good enough to simply note down the software version number.\nBut what practical measures can be taken?\n\nThe recent publication describing _Kallisto_ [(Bray et al. 2016)](https://doi.org/10.1038/nbt.3519)\nprovides an excellent high profile example of the growing efforts to ensure reproducible\nscience in computational biology. The authors provide a GitHub [repository](https://github.com/pachterlab/kallisto_paper_analysis)\nthat _“contains all the analysis to reproduce the results in the kallisto paper”_.\n\nThey should be applauded and indeed - in the Twittersphere - they were. The corresponding\nauthor Lior Pachter stated that the publication could be reproduced starting from raw\nreads in the NCBI Sequence Read Archive through to the results, which marks a fantastic\naccomplishment.\n\n
Hoping people will notice https://t.co/qiu3LFozMX by @yarbsalocin @hjpimentel @pmelsted reproducing ALL the #kallisto paper from SRA→results
— Lior Pachter (@lpachter) April 5, 2016
\n\n\nThey achieve this utilising the workflow framework [Snakemake](https://bitbucket.org/snakemake/snakemake/wiki/Home).\nIncreasingly, we are seeing scientists applying workflow frameworks to their pipelines,\nwhich is great to see. There is a learning curve, but I have personally found the payoffs\nin productivity to be immense.\n\nAs both users and developers of Nextflow, we have long discussed best practice to ensure\nreproducibility of our work. As a community, we are at the beginning of that conversation\n\n- there are still many ideas to be aired and details ironed out - nevertheless we wished\n to provide a _state-of-play_ as we see it and to describe what is possible with Nextflow\n in this regard.\n\n### Guaranteed Reproducibility\n\nThis is our goal. It is one thing for a pipeline to be able to be reproduced in your own\nhands, on your machine, yet is another for this to be guaranteed so that anyone anywhere\ncan reproduce it. What I mean by guaranteed is that when a given pipeline is executed,\nthere is only one result which can be output.\nEnvisage what I term the _reproducibility triangle_: consisting of data, code and\ncompute environment.\n\n![Reproducibility Triangle](/img/reproducibility-triangle.png)\n\n**Figure 1:** The Reproducibility Triangle. _Data_: raw data such as sequencing reads,\ngenomes and annotations but also metadata such as experimental design. _Code_:\nscripts, binaries and libraries/dependencies. _Environment_: operating system.\n\nIf there is any change to one of these then the reproducibililty is no longer guaranteed.\nFor years there have been solutions to each of these individual components. But they have\nlived a somewhat discrete existence: data in databases such as the SRA and Ensembl, code\non GitHub and compute environments in the form of virtual machines. We think that in the\nfuture science must embrace solutions that integrate each of these components natively and\nholistically.\n\n### Implementation\n\nNextflow provides a solution to reproduciblility through version control and sandboxing.\n\n#### Code\n\nVersion control is provided via [native integration with GitHub](https://www.nextflow.io/docs/latest/sharing.html)\nand other popular code management platforms such as Bitbucket and GitLab.\nPipelines can be pulled, executed, developed, collaborated on and shared. For example,\nthe command below will pull a specific version of a [simple Kallisto + Sleuth pipeline](https://github.com/cbcrg/kallisto-nf)\nfrom GitHub and execute it. The `-r` parameter can be used to specify a specific tag, branch\nor revision that was previously defined in the Git repository.\n\n nextflow run cbcrg/kallisto-nf -r v0.9\n\n#### Environment\n\nSandboxing during both development and execution is another key concept; version control\nalone does not ensure that all dependencies nor the compute environment are the same.\n\nA simplified implementation of this places all binaries, dependencies and libraries within\nthe project repository. In Nextflow, any binaries within the the `bin` directory of a\nrepository are added to the path. 
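As a minimal sketch of how this is used in practice (the script, process and channel names below are purely illustrative and are not part of the kallisto-nf pipeline, and the syntax follows the process style current at the time of writing), a helper script committed to the repository's `bin` directory can be invoked from a process like any other command on the path:

    // my_helper.sh is assumed to live in the repository's bin/ directory,
    // so it is automatically available on the PATH of every task
    process runHelper {
        input:
        file reads from reads_ch

        output:
        file 'counts.txt' into counts_ch

        """
        my_helper.sh ${reads} > counts.txt
        """
    }
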
Also, within the Nextflow [config file](https://github.com/cbcrg/kallisto-nf/blob/master/nextflow.config),\nenvironmental variables such as `PERL5LIB` can be defined so that they are automatically\nadded during the task executions.\n\nThis can be taken a step further with containerisation such as [Docker](https://www.nextflow.io/docs/latest/docker.html).\nWe have recently published [work](https://doi.org/10.7717/peerj.1273) about this:\nbriefly a [dockerfile](https://github.com/cbcrg/kallisto-nf/blob/master/Dockerfile)\ncontaining the instructions on how to build the docker image resides inside a repository.\nThis provides a specification for the operating system, software, libraries and\ndependencies to be run.\n\nThe images themself also have content-addressable identifiers in the form of\n[digests](https://docs.docker.com/engine/userguide/containers/dockerimages/#image-digests),\nwhich ensure not a single byte of information, from the operating system through to the\nlibraries pulled from public repos, has been changed. This container digest can be specified\nin the [pipeline config file](https://github.com/cbcrg/kallisto-nf/blob/master/nextflow.config).\n\n process {\n container = \"cbcrg/kallisto-nf@sha256:9f84012739...\"\n }\n\nWhen doing so Nextflow automatically pulls the specified image from the Docker Hub and\nmanages the execution of the pipeline tasks from within the container in a transparent manner,\ni.e. without having to adapt or modify your code.\n\n#### Data\n\nData is currently one of the more challenging aspect to address. _Small data_ can be\neasily version controlled within git-like repositories. For larger files\nthe [Git Large File Storage](https://git-lfs.github.com/), for which Nextflow provides\nbuilt-in support, may be one solution. Ultimately though, the real home of scientific data\nis in publicly available, programmatically accessible databases.\n\nProviding out-of-box solutions is difficult given the hugely varying nature of the data\nand meta-data within these databases. We are currently looking to incorporate the most\nhighly used ones, such as the [SRA](http://www.ncbi.nlm.nih.gov/sra) and [Ensembl](http://www.ensembl.org/index.html).\nIn the long term we have an eye on initiatives, such as [NCBI BioProject](https://www.ncbi.nlm.nih.gov/bioproject/),\nwith the idea there is a single identifier for both the data and metadata that can be referenced in a workflow.\n\nAdhering to the practices above, one could imagine one line of code which would appear within a publication.\n\n nextflow run [user/repo] -r [version] --data[DB_reference:data_reference] -with-docker\n\nThe result would be guaranteed to be reproduced by whoever wished.\n\n### Conclusion\n\nWith this approach the reproducilbility triangle is complete. But it must be noted that\nthis does not guard against conceptual or implementation errors. It does not replace proper\ndocumentation. What it does is to provide transparency to a result.\n\nThe assumption that the deterministic nature of computation makes results insusceptible\nto irreproducbility is clearly false. We consider Nextflow with its other features such\nits polyglot nature, out-of-the-box portability and native support across HPC and Cloud\nenvironments to be an ideal solution in our everyday work. 
We hope to see more scientists\nadopt this approach to their workflows.\n\nThe recent efforts by the _Kallisto_ authors highlight the appetite for increasing these\nstandards and we encourage the community at large to move towards ensuring this becomes\nthe normal state of affairs for publishing in science.\n\n### References\n\nBray, Nicolas L., Harold Pimentel, Páll Melsted, and Lior Pachter. 2016. “Near-Optimal Probabilistic RNA-Seq Quantification.” Nature Biotechnology, April. Nature Publishing Group. doi:10.1038/nbt.3519.\n\nDi Tommaso P, Palumbo E, Chatzou M, Prieto P, Heuer ML, Notredame C. (2015) \"The impact of Docker containers on the performance of genomic pipelines.\" PeerJ 3:e1273 doi.org:10.7717/peerj.1273.\n\nGarijo D, Kinnings S, Xie L, Xie L, Zhang Y, Bourne PE, et al. (2013) \"Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome.\" PLoS ONE 8(11): e80278. doi:10.1371/journal.pone.0080278\n", + "content": "Publication time acts as a snapshot for scientific work. Whether a project is ongoing\nor not, work which was performed months ago must be described, new software documented,\ndata collated and figures generated.\n\nThe monumental increase in data and pipeline complexity has led to this task being\nperformed to many differing standards, or [lack of thereof](http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0080278).\nWe all agree it is not good enough to simply note down the software version number.\nBut what practical measures can be taken?\n\nThe recent publication describing _Kallisto_ [(Bray et al. 2016)](https://doi.org/10.1038/nbt.3519)\nprovides an excellent high profile example of the growing efforts to ensure reproducible\nscience in computational biology. The authors provide a GitHub [repository](https://github.com/pachterlab/kallisto_paper_analysis)\nthat _“contains all the analysis to reproduce the results in the kallisto paper”_.\n\nThey should be applauded and indeed - in the Twittersphere - they were. The corresponding\nauthor Lior Pachter stated that the publication could be reproduced starting from raw\nreads in the NCBI Sequence Read Archive through to the results, which marks a fantastic\naccomplishment.\n\n> Hoping people will notice [https://t.co/qiu3LFozMX](https://t.co/qiu3LFozMX) by [@yarbsalocin](https://twitter.com/yarbsalocin) [@hjpimentel](https://twitter.com/hjpimentel) [@pmelsted](https://twitter.com/pmelsted) reproducing ALL the [#kallisto](https://twitter.com/hashtag/kallisto?src=hash) paper from SRA→results\n> \n> — Lior Pachter (@lpachter) [April 5, 2016](https://twitter.com/lpachter/status/717279998424457216)\n\n\n\nThey achieve this utilising the workflow framework [Snakemake](https://bitbucket.org/snakemake/snakemake/wiki/Home).\nIncreasingly, we are seeing scientists applying workflow frameworks to their pipelines,\nwhich is great to see. There is a learning curve, but I have personally found the payoffs\nin productivity to be immense.\n\nAs both users and developers of Nextflow, we have long discussed best practice to ensure\nreproducibility of our work. As a community, we are at the beginning of that conversation\n\n- there are still many ideas to be aired and details ironed out - nevertheless we wished\n to provide a _state-of-play_ as we see it and to describe what is possible with Nextflow\n in this regard.\n\n### Guaranteed Reproducibility\n\nThis is our goal. 
It is one thing for a pipeline to be able to be reproduced in your own\nhands, on your machine, yet is another for this to be guaranteed so that anyone anywhere\ncan reproduce it. What I mean by guaranteed is that when a given pipeline is executed,\nthere is only one result which can be output.\nEnvisage what I term the _reproducibility triangle_: consisting of data, code and\ncompute environment.\n\n![Reproducibility Triangle](/img/reproducibility-triangle.png)\n\n**Figure 1:** The Reproducibility Triangle. _Data_: raw data such as sequencing reads,\ngenomes and annotations but also metadata such as experimental design. _Code_:\nscripts, binaries and libraries/dependencies. _Environment_: operating system.\n\nIf there is any change to one of these then the reproducibililty is no longer guaranteed.\nFor years there have been solutions to each of these individual components. But they have\nlived a somewhat discrete existence: data in databases such as the SRA and Ensembl, code\non GitHub and compute environments in the form of virtual machines. We think that in the\nfuture science must embrace solutions that integrate each of these components natively and\nholistically.\n\n### Implementation\n\nNextflow provides a solution to reproduciblility through version control and sandboxing.\n\n#### Code\n\nVersion control is provided via [native integration with GitHub](https://www.nextflow.io/docs/latest/sharing.html)\nand other popular code management platforms such as Bitbucket and GitLab.\nPipelines can be pulled, executed, developed, collaborated on and shared. For example,\nthe command below will pull a specific version of a [simple Kallisto + Sleuth pipeline](https://github.com/cbcrg/kallisto-nf)\nfrom GitHub and execute it. The `-r` parameter can be used to specify a specific tag, branch\nor revision that was previously defined in the Git repository.\n\n nextflow run cbcrg/kallisto-nf -r v0.9\n\n#### Environment\n\nSandboxing during both development and execution is another key concept; version control\nalone does not ensure that all dependencies nor the compute environment are the same.\n\nA simplified implementation of this places all binaries, dependencies and libraries within\nthe project repository. In Nextflow, any binaries within the the `bin` directory of a\nrepository are added to the path. Also, within the Nextflow [config file](https://github.com/cbcrg/kallisto-nf/blob/master/nextflow.config),\nenvironmental variables such as `PERL5LIB` can be defined so that they are automatically\nadded during the task executions.\n\nThis can be taken a step further with containerisation such as [Docker](https://www.nextflow.io/docs/latest/docker.html).\nWe have recently published [work](https://doi.org/10.7717/peerj.1273) about this:\nbriefly a [dockerfile](https://github.com/cbcrg/kallisto-nf/blob/master/Dockerfile)\ncontaining the instructions on how to build the docker image resides inside a repository.\nThis provides a specification for the operating system, software, libraries and\ndependencies to be run.\n\nThe images themself also have content-addressable identifiers in the form of\n[digests](https://docs.docker.com/engine/userguide/containers/dockerimages/#image-digests),\nwhich ensure not a single byte of information, from the operating system through to the\nlibraries pulled from public repos, has been changed. 
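For reference, the digest of an image that is already present locally can be listed with the standard Docker client (the image name below is simply the example used throughout this post):

    docker images --digests cbcrg/kallisto-nf
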
This container digest can be specified\nin the [pipeline config file](https://github.com/cbcrg/kallisto-nf/blob/master/nextflow.config).\n\n process {\n container = \"cbcrg/kallisto-nf@sha256:9f84012739...\"\n }\n\nWhen doing so Nextflow automatically pulls the specified image from the Docker Hub and\nmanages the execution of the pipeline tasks from within the container in a transparent manner,\ni.e. without having to adapt or modify your code.\n\n#### Data\n\nData is currently one of the more challenging aspect to address. _Small data_ can be\neasily version controlled within git-like repositories. For larger files\nthe [Git Large File Storage](https://git-lfs.github.com/), for which Nextflow provides\nbuilt-in support, may be one solution. Ultimately though, the real home of scientific data\nis in publicly available, programmatically accessible databases.\n\nProviding out-of-box solutions is difficult given the hugely varying nature of the data\nand meta-data within these databases. We are currently looking to incorporate the most\nhighly used ones, such as the [SRA](http://www.ncbi.nlm.nih.gov/sra) and [Ensembl](http://www.ensembl.org/index.html).\nIn the long term we have an eye on initiatives, such as [NCBI BioProject](https://www.ncbi.nlm.nih.gov/bioproject/),\nwith the idea there is a single identifier for both the data and metadata that can be referenced in a workflow.\n\nAdhering to the practices above, one could imagine one line of code which would appear within a publication.\n\n nextflow run [user/repo] -r [version] --data[DB_reference:data_reference] -with-docker\n\nThe result would be guaranteed to be reproduced by whoever wished.\n\n### Conclusion\n\nWith this approach the reproducilbility triangle is complete. But it must be noted that\nthis does not guard against conceptual or implementation errors. It does not replace proper\ndocumentation. What it does is to provide transparency to a result.\n\nThe assumption that the deterministic nature of computation makes results insusceptible\nto irreproducbility is clearly false. We consider Nextflow with its other features such\nits polyglot nature, out-of-the-box portability and native support across HPC and Cloud\nenvironments to be an ideal solution in our everyday work. We hope to see more scientists\nadopt this approach to their workflows.\n\nThe recent efforts by the _Kallisto_ authors highlight the appetite for increasing these\nstandards and we encourage the community at large to move towards ensuring this becomes\nthe normal state of affairs for publishing in science.\n\n### References\n\nBray, Nicolas L., Harold Pimentel, Páll Melsted, and Lior Pachter. 2016. “Near-Optimal Probabilistic RNA-Seq Quantification.” Nature Biotechnology, April. Nature Publishing Group. doi:10.1038/nbt.3519.\n\nDi Tommaso P, Palumbo E, Chatzou M, Prieto P, Heuer ML, Notredame C. (2015) \"The impact of Docker containers on the performance of genomic pipelines.\" PeerJ 3:e1273 doi.org:10.7717/peerj.1273.\n\nGarijo D, Kinnings S, Xie L, Xie L, Zhang Y, Bourne PE, et al. (2013) \"Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome.\" PLoS ONE 8(11): e80278. 
doi:10.1371/journal.pone.0080278", "images": [], "author": "Evan Floden", "tags": "bioinformatics,reproducibility,pipelines,nextflow,genomic,docker" @@ -77,7 +77,7 @@ "slug": "2016/deploy-in-the-cloud-at-snap-of-a-finger", "title": "Deploy your computational pipelines in the cloud at the snap-of-a-finger", "date": "2016-09-01T00:00:00.000Z", - "content": "\n
\nLearn how to deploy and run a computational pipeline in the Amazon AWS cloud with ease\nthanks to Nextflow and Docker containers\n
\n\nNextflow is a framework that simplifies the writing of parallel and distributed computational\npipelines in a portable and reproducible manner across different computing platforms, from\na laptop to a cluster of computers.\n\nIndeed, the original idea, when this project started three years ago, was to\nimplement a tool that would allow researchers in\n[our lab](http://www.crg.eu/es/programmes-groups/comparative-bioinformatics) to smoothly migrate\ntheir data analysis applications in the cloud when needed - without having\nto change or adapt their code.\n\nHowever to date Nextflow has been used mostly to deploy computational workflows within on-premise\ncomputing clusters or HPC data-centers, because these infrastructures are easier to use\nand provide, on average, cheaper cost and better performance when compared to a cloud environment.\n\nA major obstacle to efficient deployment of scientific workflows in the cloud is the lack\nof a performant POSIX compatible shared file system. These kinds of applications\nare usually made-up by putting together a collection of tools, scripts and\nsystem commands that need a reliable file system to share with each other the input and\noutput files as they are produced, above all in a distributed cluster of computers.\n\nThe recent availability of the [Amazon Elastic File System](https://aws.amazon.com/efs/)\n(EFS), a fully featured NFS based file system hosted on the AWS infrastructure represents\na major step in this context, unlocking the deployment of scientific computing\nin the cloud and taking it to the next level.\n\n### Nextflow support for the cloud\n\nNextflow could already be deployed in the cloud, either using tools such as\n[ElastiCluster](https://github.com/gc3-uzh-ch/elasticluster) or\n[CfnCluster](https://aws.amazon.com/hpc/cfncluster/), or by using custom deployment\nscripts. However the procedure was still cumbersome and, above all, it was not optimised\nto fully take advantage of cloud elasticity i.e. the ability to (re)shape the computing\ncluster dynamically as the computing needs change over time.\n\nFor these reasons, we decided it was time to provide Nextflow with a first-class support\nfor the cloud, integrating the Amazon EFS and implementing an optimised native cloud\nscheduler, based on [Apache Ignite](https://ignite.apache.org/), with a full support for cluster\nauto-scaling and spot/preemptible instances.\n\nIn practice this means that Nextflow can now spin-up and configure a fully featured computing\ncluster in the cloud with a single command, after that you need only to login to the master\nnode and launch the pipeline execution as you would do in your on-premise cluster.\n\n### Demo !\n\nSince a demo is worth a thousands words, I've record a short screencast showing how\nNextflow can setup a cluster in the cloud and mount the Amazon EFS shared file system.\n\n\n\n
\nNote: the EC2 instances' startup delay has been cut in this screencast. It took around\n5 minutes to launch them and set up the cluster.\n
\n\nLet's recap the steps showed in the demo:\n\n- The user provides the cloud parameters (such as the VM image ID and the instance type)\n in the `nextflow.config` file.\n\n- To configure the EFS file system you need to provide your EFS storage ID and the mount path\n by using the `sharedStorageId` and `sharedStorageMount` properties.\n\n- To use [EC2 Spot](https://aws.amazon.com/ec2/spot/) instances, just specify the price\n you want to bid by using the `spotPrice` property.\n\n- The AWS access and secret keys are provided by using the usual environment variables.\n\n- The `nextflow cloud create` launches the requested number of instances, configures the user and\n access key, mounts the EFS storage and setups the Nextflow cluster automatically.\n Any Linux AMI can be used, it is only required that the [cloud-init](https://cloudinit.readthedocs.io/en/latest/)\n package, a Java 7+ runtime and the Docker engine are present.\n\n- When the cluster is ready, you can SSH in the master node and launch the pipeline execution\n as usual with the `nextflow run ` command.\n\n- For the sake of this demo we are using [paraMSA](https://github.com/pditommaso/paraMSA),\n a pipeline for generating multiple sequence alignments and bootstrap replicates developed\n in our lab.\n\n- Nextflow automatically pulls the pipeline code from its GitHub repository when the\n execution is launched. This repository includes also a dataset which is used by default.\n [The many bioinformatic tools used by the pipeline](https://github.com/pditommaso/paraMSA#dependencies-)\n are packaged using a Docker image, which is downloaded automatically on each computing node.\n\n- The pipeline results are uploaded automatically in the S3 bucket specified\n by the `--output s3://cbcrg-eu/para-msa-results` command line option.\n\n- When the computation is completed, the cluster can be safely shutdown and the\n EC2 instances terminated with the `nextflow cloud shutdown` command.\n\n### Try it yourself\n\nWe are releasing the Nextflow integrated cloud support in the upcoming version `0.22.0`.\n\nNextflow integrated cloud support is available from version `0.22.0`. To use it just make sure to\nhave this or an higher version of Nextflow.\n\nBare in mind that Nextflow requires a Unix-like operating system and a Java runtime version 7+\n(Windows 10 users which have installed the [Ubuntu subsystem](https://blogs.windows.com/buildingapps/2016/03/30/run-bash-on-ubuntu-on-windows/)\nshould be able to run it, at their risk..).\n\nOnce you have installed it, you can follow the steps in the above demo. 
For your convenience\nwe made publicly available the EC2 image `ami-43f49030` `ami-4b7daa32`\\* (EU Ireland region) used to record this\nscreencast.\n\nAlso make sure you have the following the following variables defined in your environment:\n\n AWS_ACCESS_KEY_ID=\"\"\n AWS_SECRET_ACCESS_KEY=\"\"\n AWS_DEFAULT_REGION=\"\"\n\nReferes to the documentation for configuration details.\n\n\\* Update: the AMI has been updated with Java 8 on Sept 2017.\n\n### Conclusion\n\nNextflow provides state of the art support for cloud and containers technologies making\nit possible to create computing clusters in the cloud and deploy computational workflows\nin a no-brainer way, with just two commands on your terminal.\n\nIn an upcoming post I will describe the autoscaling capabilities implemented by the\nNextflow scheduler that allows, along with the use of spot/preemptible instances,\na cost effective solution for the execution of your pipeline in the cloud.\n\n#### Credits\n\nThanks to [Evan Floden](https://github.com/skptic) for reviewing this post and for writing\nthe [paraMSA](https://github.com/skptic/paraMSA/) pipeline.\n", + "content": "*Learn how to deploy and run a computational pipeline in the Amazon AWS cloud with ease\nthanks to Nextflow and Docker containers*\n\nNextflow is a framework that simplifies the writing of parallel and distributed computational\npipelines in a portable and reproducible manner across different computing platforms, from\na laptop to a cluster of computers.\n\nIndeed, the original idea, when this project started three years ago, was to\nimplement a tool that would allow researchers in\n[our lab](http://www.crg.eu/es/programmes-groups/comparative-bioinformatics) to smoothly migrate\ntheir data analysis applications in the cloud when needed - without having\nto change or adapt their code.\n\nHowever to date Nextflow has been used mostly to deploy computational workflows within on-premise\ncomputing clusters or HPC data-centers, because these infrastructures are easier to use\nand provide, on average, cheaper cost and better performance when compared to a cloud environment.\n\nA major obstacle to efficient deployment of scientific workflows in the cloud is the lack\nof a performant POSIX compatible shared file system. These kinds of applications\nare usually made-up by putting together a collection of tools, scripts and\nsystem commands that need a reliable file system to share with each other the input and\noutput files as they are produced, above all in a distributed cluster of computers.\n\nThe recent availability of the [Amazon Elastic File System](https://aws.amazon.com/efs/)\n(EFS), a fully featured NFS based file system hosted on the AWS infrastructure represents\na major step in this context, unlocking the deployment of scientific computing\nin the cloud and taking it to the next level.\n\n### Nextflow support for the cloud\n\nNextflow could already be deployed in the cloud, either using tools such as\n[ElastiCluster](https://github.com/gc3-uzh-ch/elasticluster) or\n[CfnCluster](https://aws.amazon.com/hpc/cfncluster/), or by using custom deployment\nscripts. However the procedure was still cumbersome and, above all, it was not optimised\nto fully take advantage of cloud elasticity i.e. 
the ability to (re)shape the computing\ncluster dynamically as the computing needs change over time.\n\nFor these reasons, we decided it was time to provide Nextflow with a first-class support\nfor the cloud, integrating the Amazon EFS and implementing an optimised native cloud\nscheduler, based on [Apache Ignite](https://ignite.apache.org/), with a full support for cluster\nauto-scaling and spot/preemptible instances.\n\nIn practice this means that Nextflow can now spin-up and configure a fully featured computing\ncluster in the cloud with a single command, after that you need only to login to the master\nnode and launch the pipeline execution as you would do in your on-premise cluster.\n\n### Demo !\n\nSince a demo is worth a thousands words, I've record a short screencast showing how\nNextflow can setup a cluster in the cloud and mount the Amazon EFS shared file system.\n\n\n\nNote: in this screencast it has been cut the Ec2 instances startup delay. It required around\n5 minutes to launch them and setup the cluster.\n\nLet's recap the steps showed in the demo:\n\n- The user provides the cloud parameters (such as the VM image ID and the instance type)\n in the `nextflow.config` file.\n\n- To configure the EFS file system you need to provide your EFS storage ID and the mount path\n by using the `sharedStorageId` and `sharedStorageMount` properties.\n\n- To use [EC2 Spot](https://aws.amazon.com/ec2/spot/) instances, just specify the price\n you want to bid by using the `spotPrice` property.\n\n- The AWS access and secret keys are provided by using the usual environment variables.\n\n- The `nextflow cloud create` launches the requested number of instances, configures the user and\n access key, mounts the EFS storage and setups the Nextflow cluster automatically.\n Any Linux AMI can be used, it is only required that the [cloud-init](https://cloudinit.readthedocs.io/en/latest/)\n package, a Java 7+ runtime and the Docker engine are present.\n\n- When the cluster is ready, you can SSH in the master node and launch the pipeline execution\n as usual with the `nextflow run ` command.\n\n- For the sake of this demo we are using [paraMSA](https://github.com/pditommaso/paraMSA),\n a pipeline for generating multiple sequence alignments and bootstrap replicates developed\n in our lab.\n\n- Nextflow automatically pulls the pipeline code from its GitHub repository when the\n execution is launched. This repository includes also a dataset which is used by default.\n [The many bioinformatic tools used by the pipeline](https://github.com/pditommaso/paraMSA#dependencies-)\n are packaged using a Docker image, which is downloaded automatically on each computing node.\n\n- The pipeline results are uploaded automatically in the S3 bucket specified\n by the `--output s3://cbcrg-eu/para-msa-results` command line option.\n\n- When the computation is completed, the cluster can be safely shutdown and the\n EC2 instances terminated with the `nextflow cloud shutdown` command.\n\n### Try it yourself\n\n~~We are releasing the Nextflow integrated cloud support in the upcoming version `0.22.0`~~.\n\nNextflow integrated cloud support is available from version `0.22.0`. 
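As a rough sketch of the kind of `nextflow.config` settings recapped above — `sharedStorageId`, `sharedStorageMount` and `spotPrice` are the property names given in this post, while the `cloud` scope, the `imageId`/`instanceType` attribute names and every value are illustrative placeholders to be replaced with your own:

    cloud {
        imageId            = 'ami-xxxxxxxx'
        instanceType       = 'm4.xlarge'
        sharedStorageId    = 'fs-xxxxxxxx'
        sharedStorageMount = '/mnt/efs'
        spotPrice          = 0.06
    }

A cluster could then be launched with the `nextflow cloud create` command shown in the demo, for example `nextflow cloud create my-cluster -c 5` (assuming `-c` as the instance-count option), and torn down afterwards with `nextflow cloud shutdown`.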
To use it just make sure to\nhave this or an higher version of Nextflow.\n\nBare in mind that Nextflow requires a Unix-like operating system and a Java runtime version 7+\n(Windows 10 users which have installed the [Ubuntu subsystem](https://blogs.windows.com/buildingapps/2016/03/30/run-bash-on-ubuntu-on-windows/)\nshould be able to run it, at their risk..).\n\nOnce you have installed it, you can follow the steps in the above demo. For your convenience\nwe made publicly available the EC2 image ~~`ami-43f49030`~~ `ami-4b7daa32`^\\* ^ (EU Ireland region) used to record this\nscreencast.\n\nAlso make sure you have the following the following variables defined in your environment:\n\n AWS_ACCESS_KEY_ID=\"\"\n AWS_SECRET_ACCESS_KEY=\"\"\n AWS_DEFAULT_REGION=\"\"\n\nReferes to the [documentation](/docs/latest/awscloud.html) for configuration details.\n\n\\* Update: the AMI has been updated with Java 8 on Sept 2017.\n\n### Conclusion\n\nNextflow provides state of the art support for cloud and containers technologies making\nit possible to create computing clusters in the cloud and deploy computational workflows\nin a no-brainer way, with just two commands on your terminal.\n\nIn an upcoming post I will describe the autoscaling capabilities implemented by the\nNextflow scheduler that allows, along with the use of spot/preemptible instances,\na cost effective solution for the execution of your pipeline in the cloud.\n\n#### Credits\n\nThanks to [Evan Floden](https://github.com/skptic) for reviewing this post and for writing\nthe [paraMSA](https://github.com/skptic/paraMSA/) pipeline.\n", "images": [], "author": "Paolo Di Tommaso", "tags": "aws,cloud,pipelines,nextflow,genomic,docker" @@ -86,7 +86,7 @@ "slug": "2016/developing-bioinformatics-pipeline-across-multiple-environments", "title": "Developing a bioinformatics pipeline across multiple environments", "date": "2016-02-04T00:00:00.000Z", - "content": "\nAs a new bioinformatics student with little formal computer science training, there are\nfew things that scare me more than PhD committee meetings and having to run my code in a\ncompletely different operating environment.\n\nRecently my work landed me in the middle of the phylogenetic tree jungle and the computational\nrequirements of my project far outgrew the resources that were available on our institute’s\n[Univa Grid Engine](https://en.wikipedia.org/wiki/Univa_Grid_Engine) based cluster. Luckily for me,\nan opportunity arose to participate in a joint program at the MareNostrum HPC at the\n[Barcelona Supercomputing Centre](http://www.bsc.es) (BSC).\n\nAs one of the top 100 supercomputers in the world, the [MareNostrum III](https://www.bsc.es/discover-bsc/the-centre/marenostrum)\ndwarfs our cluster and consists of nearly 50'000 processors. However it soon became apparent\nthat with great power comes great responsibility and in the case of the BSC, great restrictions.\nThese include no internet access, restrictive wall times for jobs, longer queues,\nfewer pre-installed binaries and an older version of bash. Faced with the possibility of\nhaving to rewrite my 16 bodged scripts for another queuing system I turned to Nextflow.\n\nStraight off the bat I was able to reduce all my previous scripts to a single Nextflow script.\nAdmittedly, the original code was not great, but the data processing model made me feel confident\nin what I was doing and I was able to reduce the volume of code to 25% of its initial amount\nwhilst making huge improvements in the readability. 
The real benefits however came from the portability.\n\nI was able to write the project on my laptop (Macbook Air), continuously test it on my local\ndesktop machine (Linux) and then perform more realistic heavy lifting runs on the cluster,\nall managed from a single GitHub repository. The BSC uses the [Load Sharing Facility](https://en.wikipedia.org/wiki/Platform_LSF)\n(LSF) platform with longer queue times, but a large number of CPUs. My project on the other\nhand had datasets that require over 100'000 tasks, but the tasks processes themselves run\nfor a matter of seconds or minutes. We were able to marry these two competing interests\ndeploying Nextflow in a [distributed execution manner that resemble the one of an MPI application](/blog/2015/mpi-like-execution-with-nextflow.html).\n\nIn this configuration, the queuing system allocates the Nextflow requested resources and\nusing the embedded [Apache Ignite](https://ignite.apache.org/) clustering engine, Nextflow handles\nthe submission of processes to the individual nodes.\n\nHere is some examples of how to run the same Nextflow project over multiple platforms.\n\n#### Local\n\nIf I wished to launch a job locally I can run it with the command:\n\n nextflow run myproject.nf\n\n#### Univa Grid Engine (UGE)\n\nFor the UGE I simply needed to specify the following in the `nextflow.config` file:\n\n process {\n executor='uge'\n queue='my_queue'\n }\n\nAnd then launch the pipeline execution as we did before:\n\n nextflow run myproject.nf\n\n#### Load Sharing Facility (LSF)\n\nFor running the same pipeline in the MareNostrum HPC environment, taking advantage of the MPI\nstandard to deploy my workload, I first created a wrapper script (for example `bsc-wrapper.sh`)\ndeclaring the resources that I want to reserve for the pipeline execution:\n\n #!/bin/bash\n #BSUB -oo logs/output_%J.out\n #BSUB -eo logs/output_%J.err\n #BSUB -J myProject\n #BSUB -q bsc_ls\n #BSUB -W 2:00\n #BSUB -x\n #BSUB -n 512\n #BSUB -R \"span[ptile=16]\"\n export NXF_CLUSTER_SEED=$(shuf -i 0-16777216 -n 1)\n mpirun --pernode bin/nextflow run concMSA.nf -with-mpi\n\nAnd then can execute it using `bsub` as shown below:\n\n bsub < bsc-wrapper.sh\n\nBy running Nextflow in this way and given the wrapper above, a single `bsub` job will run\non 512 cores in 32 computing nodes (512/16 = 32) with a maximum wall time of 2 hours.\nThousands of Nextflow processes can be spawned during this and the execution can be monitored\nin the standard manner from a single Nextflow output and error files. If any errors occur\nthe execution can of course to continued with [`-resume` command line option](/docs/latest/getstarted.html?highlight=resume#modify-and-resume).\n\n### Conclusion\n\nNextflow provides a simplified way to develop across multiple platforms and removes\nmuch of the overhead associated with running niche, user developed pipelines in an HPC\nenvironment.\n", + "content": "As a new bioinformatics student with little formal computer science training, there are\nfew things that scare me more than PhD committee meetings and having to run my code in a\ncompletely different operating environment.\n\nRecently my work landed me in the middle of the phylogenetic tree jungle and the computational\nrequirements of my project far outgrew the resources that were available on our institute’s\n[Univa Grid Engine](https://en.wikipedia.org/wiki/Univa_Grid_Engine) based cluster. 
Luckily for me,\nan opportunity arose to participate in a joint program at the MareNostrum HPC at the\n[Barcelona Supercomputing Centre](http://www.bsc.es) (BSC).\n\nAs one of the top 100 supercomputers in the world, the [MareNostrum III](https://www.bsc.es/discover-bsc/the-centre/marenostrum)\ndwarfs our cluster and consists of nearly 50'000 processors. However it soon became apparent\nthat with great power comes great responsibility and in the case of the BSC, great restrictions.\nThese include no internet access, restrictive wall times for jobs, longer queues,\nfewer pre-installed binaries and an older version of bash. Faced with the possibility of\nhaving to rewrite my 16 bodged scripts for another queuing system I turned to Nextflow.\n\nStraight off the bat I was able to reduce all my previous scripts to a single Nextflow script.\nAdmittedly, the original code was not great, but the data processing model made me feel confident\nin what I was doing and I was able to reduce the volume of code to 25% of its initial amount\nwhilst making huge improvements in the readability. The real benefits however came from the portability.\n\nI was able to write the project on my laptop (Macbook Air), continuously test it on my local\ndesktop machine (Linux) and then perform more realistic heavy lifting runs on the cluster,\nall managed from a single GitHub repository. The BSC uses the [Load Sharing Facility](https://en.wikipedia.org/wiki/Platform_LSF)\n(LSF) platform with longer queue times, but a large number of CPUs. My project on the other\nhand had datasets that require over 100'000 tasks, but the tasks processes themselves run\nfor a matter of seconds or minutes. We were able to marry these two competing interests\ndeploying Nextflow in a [distributed execution manner that resemble the one of an MPI application](/blog/2015/mpi-like-execution-with-nextflow.html).\n\nIn this configuration, the queuing system allocates the Nextflow requested resources and\nusing the embedded [Apache Ignite](https://ignite.apache.org/) clustering engine, Nextflow handles\nthe submission of processes to the individual nodes.\n\nHere is some examples of how to run the same Nextflow project over multiple platforms.\n\n#### Local\n\nIf I wished to launch a job locally I can run it with the command:\n\n nextflow run myproject.nf\n\n#### Univa Grid Engine (UGE)\n\nFor the UGE I simply needed to specify the following in the `nextflow.config` file:\n\n process {\n executor='uge'\n queue='my_queue'\n }\n\nAnd then launch the pipeline execution as we did before:\n\n nextflow run myproject.nf\n\n#### Load Sharing Facility (LSF)\n\nFor running the same pipeline in the MareNostrum HPC environment, taking advantage of the MPI\nstandard to deploy my workload, I first created a wrapper script (for example `bsc-wrapper.sh`)\ndeclaring the resources that I want to reserve for the pipeline execution:\n\n #!/bin/bash\n #BSUB -oo logs/output_%J.out\n #BSUB -eo logs/output_%J.err\n #BSUB -J myProject\n #BSUB -q bsc_ls\n #BSUB -W 2:00\n #BSUB -x\n #BSUB -n 512\n #BSUB -R \"span[ptile=16]\"\n export NXF_CLUSTER_SEED=$(shuf -i 0-16777216 -n 1)\n mpirun --pernode bin/nextflow run concMSA.nf -with-mpi\n\nAnd then can execute it using `bsub` as shown below:\n\n bsub < bsc-wrapper.sh\n\nBy running Nextflow in this way and given the wrapper above, a single `bsub` job will run\non 512 cores in 32 computing nodes (512/16 = 32) with a maximum wall time of 2 hours.\nThousands of Nextflow processes can be spawned during this and the execution can be 
monitored\nin the standard manner from a single Nextflow output and error files. If any errors occur\nthe execution can of course to continued with [`-resume` command line option](/docs/latest/getstarted.html?highlight=resume#modify-and-resume).\n\n### Conclusion\n\nNextflow provides a simplified way to develop across multiple platforms and removes\nmuch of the overhead associated with running niche, user developed pipelines in an HPC\nenvironment.", "images": [], "author": "Evan Floden", "tags": "bioinformatics,reproducibility,pipelines,nextflow,genomic,hpc" @@ -95,7 +95,7 @@ "slug": "2016/docker-for-dunces-nextflow-for-nunces", "title": "Docker for dunces & Nextflow for nunces", "date": "2016-06-10T00:00:00.000Z", - "content": "\n_Below is a step-by-step guide for creating [Docker](http://www.docker.io) images for use with [Nextflow](http://www.nextflow.io) pipelines. This post was inspired by recent experiences and written with the hope that it may encourage others to join in the virtualization revolution._\n\nModern science is built on collaboration. Recently I became involved with one such venture between several groups across Europe. The aim was to annotate long non-coding RNA (lncRNA) in farm animals and I agreed to help with the annotation based on RNA-Seq data. The basic procedure relies on mapping short read data from many different tissues to a genome, generating transcripts and then determining if they are likely to be lncRNA or protein coding genes.\n\nDuring several successful 'hackathon' meetings the best approach was decided and implemented in a joint effort. I undertook the task of wrapping the procedure up into a Nextflow pipeline with a view to replicating the results across our different institutions and to allow the easy execution of the pipeline by researchers anywhere.\n\nCreating the Nextflow pipeline ([here](http://www.github.com/cbcrg/lncrna-annotation-nf)) in itself was not a difficult task. My collaborators had documented their work well and were on hand if anything was not clear. However installing and keeping aligned all the pipeline dependencies across different the data centers was still a challenging task.\n\nThe pipeline is typical of many in bioinformatics, consisting of binary executions, BASH scripting, R, Perl, BioPerl and some custom Perl modules. We found the BioPerl modules in particular where very sensitive to the various versions in the _long_ dependency tree. The solution was to turn to [Docker](https://www.docker.com/) containers.\n\nI have taken this opportunity to document the process of developing the Docker side of a Nextflow + Docker pipeline in a step-by-step manner.\n\n###Docker Installation\n\nBy far the most challenging issue is the installation of Docker. For local installations, the [process is relatively straight forward](https://docs.docker.com/engine/installation). However difficulties arise as computing moves to a cluster. Owing to security concerns, many HPC administrators have been reluctant to install Docker system-wide. This is changing and Docker developers have been responding to many of these concerns with [updates addressing these issues](https://blog.docker.com/2016/02/docker-engine-1-10-security/).\n\nThat being the case, local installations are usually perfectly fine for development. 
One of the golden rules in Nextflow development is to have a small test dataset that can run the full pipeline in minutes with few computational resources, ie can run on a laptop.\n\nIf you have Docker and Nextflow installed and you wish to view the working pipeline, you can perform the following commands to obtain everything you need and run the full lncrna annotation pipeline on a test dataset.\n\n docker pull cbcrg/lncrna_annotation\n nextflow run cbcrg/lncrna-annotation-nf -profile test\n\n[If the following does not work, there could be a problem with your Docker installation.]\n\nThe first command will download the required Docker image in your computer, while the second will launch Nextflow which automatically download the pipeline repository and\nrun it using the test data included with it.\n\n###The Dockerfile\n\nThe `Dockerfile` contains all the instructions required by Docker to build the Docker image. It provides a transparent and consistent way to specify the base operating system and installation of all software, libraries and modules.\n\nWe begin by creating a file `Dockerfile` in the Nextflow project directory. The Dockerfile begins with:\n\n # Set the base image to debian jessie\n FROM debian:jessie\n\n # File Author / Maintainer\n MAINTAINER Evan Floden \n\nThis sets the base distribution for our Docker image to be Debian v8.4, a lightweight Linux distribution that is ideally suited for the task. We must also specify the maintainer of the Docker image.\n\nNext we update the repository sources and install some essential tools such as `wget` and `perl`.\n\n RUN apt-get update && apt-get install --yes --no-install-recommends \\\n wget \\\n locales \\\n vim-tiny \\\n git \\\n cmake \\\n build-essential \\\n gcc-multilib \\\n perl \\\n python ...\n\nNotice that we use the command `RUN` before each line. The `RUN` instruction executes commands as if they are performed from the Linux shell.\n\nAlso is good practice to group as many as possible commands in the same `RUN` statement. This reduces the size of the final Docker image. 
See [here](https://blog.replicated.com/2016/02/05/refactoring-a-dockerfile-for-image-size/) for these details and [here](https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/) for more best practices.\n\nNext we can specify the install of the required perl modules using [cpan minus](http://search.cpan.org/~miyagawa/Menlo-1.9003/script/cpanm-menlo):\n\n # Install perl modules\n RUN cpanm --force CPAN::Meta \\\n YAML \\\n Digest::SHA \\\n Module::Build \\\n Data::Stag \\\n Config::Simple \\\n Statistics::Lite ...\n\nWe can give the instructions to download and install software from GitHub using:\n\n # Install Star Mapper\n RUN wget -qO- https://github.com/alexdobin/STAR/archive/2.5.2a.tar.gz | tar -xz \\\n && cd STAR-2.5.2a \\\n && make STAR\n\nWe can add custom Perl modules and specify environmental variables such as `PERL5LIB` as below:\n\n # Install FEELnc\n RUN wget -q https://github.com/tderrien/FEELnc/archive/a6146996e06f8a206a0ae6fd59f8ca635c7d9467.zip \\\n && unzip a6146996e06f8a206a0ae6fd59f8ca635c7d9467.zip \\\n && mv FEELnc-a6146996e06f8a206a0ae6fd59f8ca635c7d9467 /FEELnc \\\n && rm a6146996e06f8a206a0ae6fd59f8ca635c7d9467.zip\n\n ENV FEELNCPATH /FEELnc\n ENV PERL5LIB $PERL5LIB:${FEELNCPATH}/lib/\n\nR and R libraries can be installed as follows:\n\n # Install R\n RUN echo \"deb http://cran.rstudio.com/bin/linux/debian jessie-cran3/\" >> /etc/apt/sources.list &&\\\n apt-key adv --keyserver keys.gnupg.net --recv-key 381BA480 &&\\\n apt-get update --fix-missing && \\\n apt-get -y install r-base\n\n # Install R libraries\n RUN R -e 'install.packages(\"ROCR\", repos=\"http://cloud.r-project.org/\"); install.packages(\"randomForest\",repos=\"http://cloud.r-project.org/\")'\n\nFor the complete working Dockerfile of this project see [here](https://github.com/cbcrg/lncRNA-Annotation-nf/blob/master/Dockerfile)\n\n###Building the Docker Image\n\nOnce we start working on the Dockerfile, we can build it anytime using:\n\n docker build -t skptic/lncRNA_annotation .\n\nThis builds the image from the Dockerfile and assigns a tag (i.e. a name) for the image. If there are no errors, the Docker image is now in you local Docker repository ready for use.\n\n###Testing the Docker Image\n\nWe find it very helpful to test our images as we develop the Docker file. Once built, it is possible to launch the Docker image and test if the desired software was correctly installed. For example, we can test if FEELnc and its dependencies were successfully installed by running the following:\n\n docker run -ti lncrna_annotation\n\n cd FEELnc/test\n\n FEELnc_filter.pl -i transcript_chr38.gtf -a annotation_chr38.gtf \\\n > -b transcript_biotype=protein_coding > candidate_lncRNA.gtf\n\n exit # remember to exit the Docker image\n\n###Tagging the Docker Image\n\nOnce you are confident your image is built correctly, you can tag it, allowing you to push it to [Dockerhub.io](https://hub.docker.com/). 
Dockerhub is an online repository for docker images which allows anyone to pull public images and run them.\n\nYou can view the images in your local repository with the `docker images` command and tag using `docker tag` with the image ID and the name.\n\n docker images\n\n REPOSITORY TAG IMAGE ID CREATED SIZE\n lncrna_annotation latest d8ec49cbe3ed 2 minutes ago 821.5 MB\n\n docker tag d8ec49cbe3ed cbcrg/lncrna_annotation:latest\n\nNow when we check our local images we can see the updated tag.\n\n docker images\n\n REPOSITORY TAG IMAGE ID CREATED SIZE\n cbcrg/lncrna_annotation latest d8ec49cbe3ed 2 minutes ago 821.5 MB\n\n###Pushing the Docker Image to Dockerhub\n\nIf you have not previously, sign up for a Dockerhub account [here](https://hub.docker.com/). From the command line, login to Dockerhub and push your image.\n\n docker login --username=cbcrg\n docker push cbcrg/lncrna_annotation\n\nYou can test if you image has been correctly pushed and is publicly available by removing your local version using the IMAGE ID of the image and pulling the remote:\n\n docker rmi -f d8ec49cbe3ed\n\n # Ensure the local version is not listed.\n docker images\n\n docker pull cbcrg/lncrna_annotation\n\nWe are now almost ready to run our pipeline. The last step is to set up the Nexflow config.\n\n###Nextflow Configuration\n\nWithin the `nextflow.config` file in the main project directory we can add the following line which links the Docker image to the Nexflow execution. The images can be:\n\n- General (same docker image for all processes):\n\n process {\n container = 'cbcrg/lncrna_annotation'\n }\n\n- Specific to a profile (specified by `-profile crg` for example):\n\n profile {\n crg {\n container = 'cbcrg/lncrna_annotation'\n }\n }\n\n- Specific to a given process within a pipeline:\n\n $processName.container = 'cbcrg/lncrna_annotation'\n\nIn most cases it is easiest to use the same Docker image for all processes. One further thing to consider is the inclusion of the sha256 hash of the image in the container reference. I have [previously written about this](https://www.nextflow.io/blog/2016/best-practice-for-reproducibility.html), but briefly, including a hash ensures that not a single byte of the operating system or software is different.\n\n process {\n container = 'cbcrg/lncrna_annotation@sha256:9dfe233b...'\n }\n\nAll that is left now to run the pipeline.\n\n nextflow run lncRNA-Annotation-nf -profile test\n\nWhilst I have explained this step-by-step process in a linear, consequential manner, in reality the development process is often more circular with changes in the Docker images reflecting changes in the pipeline.\n\n###CircleCI and Nextflow\n\nNow that you have a pipeline that successfully runs on a test dataset with Docker, a very useful step is to add a continuous development component to the pipeline. With this, whenever you push a modification of the pipeline to the GitHub repo, the test data set is run on the [CircleCI](http://www.circleci.com) servers (using Docker).\n\nTo include CircleCI in the Nexflow pipeline, create a file named `circle.yml` in the project directory. We add the following instructions to the file:\n\n machine:\n java:\n version: oraclejdk8\n services:\n - docker\n\n dependencies:\n override:\n\n test:\n override:\n - docker pull cbcrg/lncrna_annotation\n - curl -fsSL get.nextflow.io | bash\n - ./nextflow run . 
-profile test\n\nNext you can sign up to CircleCI, linking your GitHub account.\n\nWithin the GitHub README.md you can add a badge with the following:\n\n ![CircleCI status](https://circleci.com/gh/cbcrg/lncRNA-Annotation-nf.png?style=shield)\n\n###Tips and Tricks\n\n**File permissions**: When a process is executed by a Docker container, the UNIX user running the process is not you. Therefore any files that are used as an input should have the appropriate file permissions. For example, I had to change the permissions of all the input data in the test data set with:\n\nfind -type f -exec chmod 644 {} \\;\nfind -type d -exec chmod 755 {} \\;\n\n###Summary\nThis was my first time building a Docker image and after a bit of trial-and-error the process was surprising straight forward. There is a wealth of information available for Docker and the almost seamless integration with Nextflow is fantastic. Our collaboration team is now looking forward to applying the pipeline to different datasets and publishing the work, knowing our results will be completely reproducible across any platform.\n", + "content": "_Below is a step-by-step guide for creating [Docker](http://www.docker.io) images for use with [Nextflow](http://www.nextflow.io) pipelines. This post was inspired by recent experiences and written with the hope that it may encourage others to join in the virtualization revolution._\n\nModern science is built on collaboration. Recently I became involved with one such venture between several groups across Europe. The aim was to annotate long non-coding RNA (lncRNA) in farm animals and I agreed to help with the annotation based on RNA-Seq data. The basic procedure relies on mapping short read data from many different tissues to a genome, generating transcripts and then determining if they are likely to be lncRNA or protein coding genes.\n\nDuring several successful 'hackathon' meetings the best approach was decided and implemented in a joint effort. I undertook the task of wrapping the procedure up into a Nextflow pipeline with a view to replicating the results across our different institutions and to allow the easy execution of the pipeline by researchers anywhere.\n\nCreating the Nextflow pipeline ([here](http://www.github.com/cbcrg/lncrna-annotation-nf)) in itself was not a difficult task. My collaborators had documented their work well and were on hand if anything was not clear. However installing and keeping aligned all the pipeline dependencies across different the data centers was still a challenging task.\n\nThe pipeline is typical of many in bioinformatics, consisting of binary executions, BASH scripting, R, Perl, BioPerl and some custom Perl modules. We found the BioPerl modules in particular where very sensitive to the various versions in the _long_ dependency tree. The solution was to turn to [Docker](https://www.docker.com/) containers.\n\nI have taken this opportunity to document the process of developing the Docker side of a Nextflow + Docker pipeline in a step-by-step manner.\n\n###Docker Installation\n\nBy far the most challenging issue is the installation of Docker. For local installations, the [process is relatively straight forward](https://docs.docker.com/engine/installation). However difficulties arise as computing moves to a cluster. Owing to security concerns, many HPC administrators have been reluctant to install Docker system-wide. 
This is changing and Docker developers have been responding to many of these concerns with [updates addressing these issues](https://blog.docker.com/2016/02/docker-engine-1-10-security/).\n\nThat being the case, local installations are usually perfectly fine for development. One of the golden rules in Nextflow development is to have a small test dataset that can run the full pipeline in minutes with few computational resources, ie can run on a laptop.\n\nIf you have Docker and Nextflow installed and you wish to view the working pipeline, you can perform the following commands to obtain everything you need and run the full lncrna annotation pipeline on a test dataset.\n\n docker pull cbcrg/lncrna_annotation\n nextflow run cbcrg/lncrna-annotation-nf -profile test\n\n[If the following does not work, there could be a problem with your Docker installation.]\n\nThe first command will download the required Docker image in your computer, while the second will launch Nextflow which automatically download the pipeline repository and\nrun it using the test data included with it.\n\n###The Dockerfile\n\nThe `Dockerfile` contains all the instructions required by Docker to build the Docker image. It provides a transparent and consistent way to specify the base operating system and installation of all software, libraries and modules.\n\nWe begin by creating a file `Dockerfile` in the Nextflow project directory. The Dockerfile begins with:\n\n # Set the base image to debian jessie\n FROM debian:jessie\n\n # File Author / Maintainer\n MAINTAINER Evan Floden \n\nThis sets the base distribution for our Docker image to be Debian v8.4, a lightweight Linux distribution that is ideally suited for the task. We must also specify the maintainer of the Docker image.\n\nNext we update the repository sources and install some essential tools such as `wget` and `perl`.\n\n RUN apt-get update && apt-get install --yes --no-install-recommends \\\n wget \\\n locales \\\n vim-tiny \\\n git \\\n cmake \\\n build-essential \\\n gcc-multilib \\\n perl \\\n python ...\n\nNotice that we use the command `RUN` before each line. The `RUN` instruction executes commands as if they are performed from the Linux shell.\n\nAlso is good practice to group as many as possible commands in the same `RUN` statement. This reduces the size of the final Docker image. 
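To make the point concrete (the packages here are only examples), grouping related commands into one `RUN` produces a single image layer, whereas splitting them across separate `RUN` instructions creates one layer per instruction:

    # One layer: update, install and clean up in a single RUN
    RUN apt-get update \
        && apt-get install --yes --no-install-recommends wget perl \
        && rm -rf /var/lib/apt/lists/*
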
See [here](https://blog.replicated.com/2016/02/05/refactoring-a-dockerfile-for-image-size/) for these details and [here](https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/) for more best practices.\n\nNext we can specify the install of the required perl modules using [cpan minus](http://search.cpan.org/~miyagawa/Menlo-1.9003/script/cpanm-menlo):\n\n # Install perl modules\n RUN cpanm --force CPAN::Meta \\\n YAML \\\n Digest::SHA \\\n Module::Build \\\n Data::Stag \\\n Config::Simple \\\n Statistics::Lite ...\n\nWe can give the instructions to download and install software from GitHub using:\n\n # Install Star Mapper\n RUN wget -qO- https://github.com/alexdobin/STAR/archive/2.5.2a.tar.gz | tar -xz \\\n && cd STAR-2.5.2a \\\n && make STAR\n\nWe can add custom Perl modules and specify environmental variables such as `PERL5LIB` as below:\n\n # Install FEELnc\n RUN wget -q https://github.com/tderrien/FEELnc/archive/a6146996e06f8a206a0ae6fd59f8ca635c7d9467.zip \\\n && unzip a6146996e06f8a206a0ae6fd59f8ca635c7d9467.zip \\\n && mv FEELnc-a6146996e06f8a206a0ae6fd59f8ca635c7d9467 /FEELnc \\\n && rm a6146996e06f8a206a0ae6fd59f8ca635c7d9467.zip\n\n ENV FEELNCPATH /FEELnc\n ENV PERL5LIB $PERL5LIB:${FEELNCPATH}/lib/\n\nR and R libraries can be installed as follows:\n\n # Install R\n RUN echo \"deb http://cran.rstudio.com/bin/linux/debian jessie-cran3/\" >> /etc/apt/sources.list &&\\\n apt-key adv --keyserver keys.gnupg.net --recv-key 381BA480 &&\\\n apt-get update --fix-missing && \\\n apt-get -y install r-base\n\n # Install R libraries\n RUN R -e 'install.packages(\"ROCR\", repos=\"http://cloud.r-project.org/\"); install.packages(\"randomForest\",repos=\"http://cloud.r-project.org/\")'\n\nFor the complete working Dockerfile of this project see [here](https://github.com/cbcrg/lncRNA-Annotation-nf/blob/master/Dockerfile)\n\n###Building the Docker Image\n\nOnce we start working on the Dockerfile, we can build it anytime using:\n\n docker build -t skptic/lncRNA_annotation .\n\nThis builds the image from the Dockerfile and assigns a tag (i.e. a name) for the image. If there are no errors, the Docker image is now in you local Docker repository ready for use.\n\n###Testing the Docker Image\n\nWe find it very helpful to test our images as we develop the Docker file. Once built, it is possible to launch the Docker image and test if the desired software was correctly installed. For example, we can test if FEELnc and its dependencies were successfully installed by running the following:\n\n docker run -ti lncrna_annotation\n\n cd FEELnc/test\n\n FEELnc_filter.pl -i transcript_chr38.gtf -a annotation_chr38.gtf \\\n > -b transcript_biotype=protein_coding > candidate_lncRNA.gtf\n\n exit # remember to exit the Docker image\n\n###Tagging the Docker Image\n\nOnce you are confident your image is built correctly, you can tag it, allowing you to push it to [Dockerhub.io](https://hub.docker.com/). 
Dockerhub is an online repository for docker images which allows anyone to pull public images and run them.\n\nYou can view the images in your local repository with the `docker images` command and tag using `docker tag` with the image ID and the name.\n\n docker images\n\n REPOSITORY TAG IMAGE ID CREATED SIZE\n lncrna_annotation latest d8ec49cbe3ed 2 minutes ago 821.5 MB\n\n docker tag d8ec49cbe3ed cbcrg/lncrna_annotation:latest\n\nNow when we check our local images we can see the updated tag.\n\n docker images\n\n REPOSITORY TAG IMAGE ID CREATED SIZE\n cbcrg/lncrna_annotation latest d8ec49cbe3ed 2 minutes ago 821.5 MB\n\n###Pushing the Docker Image to Dockerhub\n\nIf you have not previously, sign up for a Dockerhub account [here](https://hub.docker.com/). From the command line, login to Dockerhub and push your image.\n\n docker login --username=cbcrg\n docker push cbcrg/lncrna_annotation\n\nYou can test if you image has been correctly pushed and is publicly available by removing your local version using the IMAGE ID of the image and pulling the remote:\n\n docker rmi -f d8ec49cbe3ed\n\n # Ensure the local version is not listed.\n docker images\n\n docker pull cbcrg/lncrna_annotation\n\nWe are now almost ready to run our pipeline. The last step is to set up the Nexflow config.\n\n###Nextflow Configuration\n\nWithin the `nextflow.config` file in the main project directory we can add the following line which links the Docker image to the Nexflow execution. The images can be:\n\n- General (same docker image for all processes):\n\n process {\n container = 'cbcrg/lncrna_annotation'\n }\n\n- Specific to a profile (specified by `-profile crg` for example):\n\n profile {\n crg {\n container = 'cbcrg/lncrna_annotation'\n }\n }\n\n- Specific to a given process within a pipeline:\n\n $processName.container = 'cbcrg/lncrna_annotation'\n\nIn most cases it is easiest to use the same Docker image for all processes. One further thing to consider is the inclusion of the sha256 hash of the image in the container reference. I have [previously written about this](https://www.nextflow.io/blog/2016/best-practice-for-reproducibility.html), but briefly, including a hash ensures that not a single byte of the operating system or software is different.\n\n process {\n container = 'cbcrg/lncrna_annotation@sha256:9dfe233b...'\n }\n\nAll that is left now to run the pipeline.\n\n nextflow run lncRNA-Annotation-nf -profile test\n\nWhilst I have explained this step-by-step process in a linear, consequential manner, in reality the development process is often more circular with changes in the Docker images reflecting changes in the pipeline.\n\n###CircleCI and Nextflow\n\nNow that you have a pipeline that successfully runs on a test dataset with Docker, a very useful step is to add a continuous development component to the pipeline. With this, whenever you push a modification of the pipeline to the GitHub repo, the test data set is run on the [CircleCI](http://www.circleci.com) servers (using Docker).\n\nTo include CircleCI in the Nexflow pipeline, create a file named `circle.yml` in the project directory. We add the following instructions to the file:\n\n machine:\n java:\n version: oraclejdk8\n services:\n - docker\n\n dependencies:\n override:\n\n test:\n override:\n - docker pull cbcrg/lncrna_annotation\n - curl -fsSL get.nextflow.io | bash\n - ./nextflow run . 
-profile test\n\nNext you can sign up to CircleCI, linking your GitHub account.\n\nWithin the GitHub README.md you can add a badge with the following:\n\n ![CircleCI status](https://circleci.com/gh/cbcrg/lncRNA-Annotation-nf.png?style=shield)\n\n###Tips and Tricks\n\n**File permissions**: When a process is executed by a Docker container, the UNIX user running the process is not you. Therefore any files that are used as an input should have the appropriate file permissions. For example, I had to change the permissions of all the input data in the test data set with:\n\nfind -type f -exec chmod 644 {} \\;\nfind -type d -exec chmod 755 {} \\;\n\n###Summary\nThis was my first time building a Docker image and after a bit of trial-and-error the process was surprising straight forward. There is a wealth of information available for Docker and the almost seamless integration with Nextflow is fantastic. Our collaboration team is now looking forward to applying the pipeline to different datasets and publishing the work, knowing our results will be completely reproducible across any platform.\n", "images": [], "author": "Evan Floden", "tags": "bioinformatics,reproducibility,pipelines,nextflow,genomic,docker" @@ -104,7 +104,7 @@ "slug": "2016/enabling-elastic-computing-nextflow", "title": "Enabling elastic computing with Nextflow", "date": "2016-10-19T00:00:00.000Z", - "content": "\n

\nLearn how to deploy an elastic computing cluster in the AWS cloud with Nextflow \n

\n\nIn the [previous post](/blog/2016/deploy-in-the-cloud-at-snap-of-a-finger.html) I introduced\nthe new cloud native support for AWS provided by Nextflow.\n\nIt allows the creation of a computing cluster in the cloud in a no-brainer way, enabling\nthe deployment of complex computational pipelines in a few commands.\n\nThis solution is characterised by using a lean application stack which does not\nrequire any third party component installed in the EC2 instances other than a Java VM and the\nDocker engine (the latter it's only required in order to deploy pipeline binary dependencies).\n\n![Nextflow cloud deployment](/img/cloud-deployment.png)\n\nEach EC2 instance runs a script, at bootstrap time, that mounts the [EFS](https://aws.amazon.com/efs/)\nstorage and downloads and launches the Nextflow cluster daemon. This daemon is self-configuring,\nit automatically discovers the other running instances and joins them forming the computing cluster.\n\nThe simplicity of this stack makes it possible to setup the cluster in the cloud in just a few minutes,\na little more time than is required to spin up the EC2 VMs. This time does not depend on\nthe number of instances launched, as they configure themself independently.\n\nThis also makes it possible to add or remove instances as needed, realising the [long promised\nelastic scalability](http://www.nextplatform.com/2016/09/21/three-great-lies-cloud-computing/)\nof cloud computing.\n\nThis ability is even more important for bioinformatic workflows, which frequently crunch\nnot homogeneous datasets and are composed of tasks with very different computing requirements\n(eg. a few very long running tasks and many short-lived tasks in the same workload).\n\n### Going elastic\n\nThe Nextflow support for the cloud features an elastic cluster which is capable of resizing itself\nto adapt to the actual computing needs at runtime, thus spinning up new EC2 instances when jobs\nwait for too long in the execution queue, or terminating instances that are not used for\na certain amount of time.\n\nIn order to enable the cluster autoscaling you will need to specify the autoscale\nproperties in the `nextflow.config` file. For example:\n\n```\ncloud {\n imageId = 'ami-4b7daa32'\n instanceType = 'm4.xlarge'\n\n autoscale {\n enabled = true\n minInstances = 5\n maxInstances = 10\n }\n}\n```\n\nThe above configuration enables the autoscaling features so that the cluster will include\nat least 5 nodes. If at any point one or more tasks spend more than 5 minutes without being\nprocessed, the number of instances needed to fullfil the pending tasks, up to limit specified\nby the `maxInstances` attribute, are launched. On the other hand, if these instances are\nidle, they are terminated before reaching the 60 minutes instance usage boundary.\n\nThe autoscaler launches instances by using the same AMI ID and type specified in the `cloud`\nconfiguration. However it is possible to define different attributes as shown below:\n\n```\ncloud {\n imageId = 'ami-4b7daa32'\n instanceType = 'm4.large'\n\n autoscale {\n enabled = true\n maxInstances = 10\n instanceType = 'm4.2xlarge'\n spotPrice = 0.05\n }\n}\n```\n\nThe cluster is first created by using instance(s) of type `m4.large`. 
Then, when new\ncomputing nodes are required the autoscaler launches instances of type `m4.2xlarge`.\nAlso, since the `spotPrice` attribute is specified, [EC2 spot](https://aws.amazon.com/ec2/spot/)\ninstances are launched, instead of regular on-demand ones, bidding for the price specified.\n\n### Conclusion\n\nNextflow implements an easy though effective cloud scheduler that is able to scale dynamically\nto meet the computing needs of deployed workloads taking advantage of the _elastic_ nature\nof the cloud platform.\n\nThis ability, along the support for spot/preemptible instances, allows a cost effective solution\nfor the execution of your pipeline in the cloud.\n", + "content": "*Learn how to deploy an elastic computing cluster in the AWS cloud with Nextflow *\n\nIn the [previous post](/blog/2016/deploy-in-the-cloud-at-snap-of-a-finger.html) I introduced\nthe new cloud native support for AWS provided by Nextflow.\n\nIt allows the creation of a computing cluster in the cloud in a no-brainer way, enabling\nthe deployment of complex computational pipelines in a few commands.\n\nThis solution is characterised by using a lean application stack which does not\nrequire any third party component installed in the EC2 instances other than a Java VM and the\nDocker engine (the latter it's only required in order to deploy pipeline binary dependencies).\n\n![Nextflow cloud deployment](/img/cloud-deployment.png)\n\nEach EC2 instance runs a script, at bootstrap time, that mounts the [EFS](https://aws.amazon.com/efs/)\nstorage and downloads and launches the Nextflow cluster daemon. This daemon is self-configuring,\nit automatically discovers the other running instances and joins them forming the computing cluster.\n\nThe simplicity of this stack makes it possible to setup the cluster in the cloud in just a few minutes,\na little more time than is required to spin up the EC2 VMs. This time does not depend on\nthe number of instances launched, as they configure themself independently.\n\nThis also makes it possible to add or remove instances as needed, realising the [long promised\nelastic scalability](http://www.nextplatform.com/2016/09/21/three-great-lies-cloud-computing/)\nof cloud computing.\n\nThis ability is even more important for bioinformatic workflows, which frequently crunch\nnot homogeneous datasets and are composed of tasks with very different computing requirements\n(eg. a few very long running tasks and many short-lived tasks in the same workload).\n\n### Going elastic\n\nThe Nextflow support for the cloud features an elastic cluster which is capable of resizing itself\nto adapt to the actual computing needs at runtime, thus spinning up new EC2 instances when jobs\nwait for too long in the execution queue, or terminating instances that are not used for\na certain amount of time.\n\nIn order to enable the cluster autoscaling you will need to specify the autoscale\nproperties in the `nextflow.config` file. For example:\n\n```\ncloud {\n imageId = 'ami-4b7daa32'\n instanceType = 'm4.xlarge'\n\n autoscale {\n enabled = true\n minInstances = 5\n maxInstances = 10\n }\n}\n```\n\nThe above configuration enables the autoscaling features so that the cluster will include\nat least 5 nodes. If at any point one or more tasks spend more than 5 minutes without being\nprocessed, the number of instances needed to fullfil the pending tasks, up to limit specified\nby the `maxInstances` attribute, are launched. 
On the other hand, if these instances are\nidle, they are terminated before reaching the 60 minutes instance usage boundary.\n\nThe autoscaler launches instances by using the same AMI ID and type specified in the `cloud`\nconfiguration. However it is possible to define different attributes as shown below:\n\n```\ncloud {\n imageId = 'ami-4b7daa32'\n instanceType = 'm4.large'\n\n autoscale {\n enabled = true\n maxInstances = 10\n instanceType = 'm4.2xlarge'\n spotPrice = 0.05\n }\n}\n```\n\nThe cluster is first created by using instance(s) of type `m4.large`. Then, when new\ncomputing nodes are required the autoscaler launches instances of type `m4.2xlarge`.\nAlso, since the `spotPrice` attribute is specified, [EC2 spot](https://aws.amazon.com/ec2/spot/)\ninstances are launched, instead of regular on-demand ones, bidding for the price specified.\n\n### Conclusion\n\nNextflow implements an easy though effective cloud scheduler that is able to scale dynamically\nto meet the computing needs of deployed workloads taking advantage of the _elastic_ nature\nof the cloud platform.\n\nThis ability, along the support for spot/preemptible instances, allows a cost effective solution\nfor the execution of your pipeline in the cloud.", "images": [], "author": "Paolo Di Tommaso", "tags": "aws,cloud,pipelines,nextflow,genomic,docker" @@ -113,7 +113,7 @@ "slug": "2016/error-recovery-and-automatic-resources-management", "title": "Error recovery and automatic resource management with Nextflow", "date": "2016-02-11T00:00:00.000Z", - "content": "\nRecently a new feature has been added to Nextflow that allows failing jobs to be rescheduled,\nautomatically increasing the amount of computational resources requested.\n\n## The problem\n\nNextflow provides a mechanism that allows tasks to be automatically re-executed when\na command terminates with an error exit status. This is useful to handle errors caused by\ntemporary or even permanent failures (i.e. network hiccups, broken disks, etc.) that\nmay happen in a cloud based environment.\n\nHowever in an HPC cluster these events are very rare. In this scenario\nerror conditions are more likely to be caused by a peak in computing resources, allocated\nby a job exceeding the original resource requested. This leads to the batch scheduler\nkilling the job which in turn stops the overall pipeline execution.\n\nIn this context automatically re-executing the failed task is useless because it\nwould simply replicate the same error condition. A common solution consists of increasing\nthe resource request for the needs of the most consuming job, even though this will result\nin a suboptimal allocation of most of the jobs that are less resource hungry.\n\nMoreover it is also difficult to predict such upper limit. In most cases the only way to\ndetermine it is by using a painful fail-and-retry approach.\n\nTake in consideration, for example, the following Nextflow process:\n\n process align {\n executor 'sge'\n memory 1.GB\n errorStrategy 'retry'\n\n input:\n file 'seq.fa' from sequences\n\n script:\n '''\n t_coffee -in seq.fa\n '''\n }\n\nThe above definition will execute as many jobs as there are fasta files emitted\nby the `sequences` channel. Since the `retry` _error strategy_ is specified, if the\ntask returns a non-zero error status, Nextflow will reschedule the job execution requesting\nthe same amount of memory and disk storage. 
In case the error is generated by `t_coffee` that\nit needs more than one GB of memory for a specific alignment, the task will continue to fail,\nstopping the pipeline execution as a consequence.\n\n## Increase job resources automatically\n\nA better solution can be implemented with Nextflow which allows resources to be defined in\na dynamic manner. By doing this it is possible to increase the memory request when\nrescheduling a failing task execution. For example:\n\n process align {\n executor 'sge'\n memory { 1.GB * task.attempt }\n errorStrategy 'retry'\n\n input:\n file 'seq.fa' from sequences\n\n script:\n '''\n t_coffee -in seq.fa\n '''\n }\n\nIn the above example the memory requirement is defined by using a dynamic rule.\nThe `task.attempt` attribute represents the current task attempt (`1` the first time the task\nis executed, `2` the second and so on).\n\nThe task will then request one GB of memory. In case of an error it will be rescheduled\nrequesting 2 GB and so on, until it is executed successfully or the limit of times a task\ncan be retried is reached, forcing the termination of the pipeline.\n\nIt is also possible to define the `errorStrategy` directive in a dynamic manner. This\nis useful to re-execute failed jobs only if a certain condition is verified.\n\nFor example the Univa Grid Engine batch scheduler returns the exit status `140` when a job\nis terminated because it's using more resources than the ones requested.\n\nBy checking this exit status we can reschedule only the jobs that fail by exceeding the\nresources allocation. This can be done with the following directive declaration:\n\n errorStrategy { task.exitStatus == 140 ? 'retry' : 'terminate' }\n\nIn this way a failed task is rescheduled only when it returns the `140` exit status.\nIn all other cases the pipeline execution is terminated.\n\n## Conclusion\n\nNextflow provides a very flexible mechanism for defining the job resource request and\nhandling error events. It makes it possible to automatically reschedule failing tasks under\ncertain conditions and to define job resource requests in a dynamic manner so that they\ncan be adapted to the actual job's needs and to optimize the overall resource utilisation.\n", + "content": "Recently a new feature has been added to Nextflow that allows failing jobs to be rescheduled,\nautomatically increasing the amount of computational resources requested.\n\n## The problem\n\nNextflow provides a mechanism that allows tasks to be automatically re-executed when\na command terminates with an error exit status. This is useful to handle errors caused by\ntemporary or even permanent failures (i.e. network hiccups, broken disks, etc.) that\nmay happen in a cloud based environment.\n\nHowever in an HPC cluster these events are very rare. In this scenario\nerror conditions are more likely to be caused by a peak in computing resources, allocated\nby a job exceeding the original resource requested. This leads to the batch scheduler\nkilling the job which in turn stops the overall pipeline execution.\n\nIn this context automatically re-executing the failed task is useless because it\nwould simply replicate the same error condition. A common solution consists of increasing\nthe resource request for the needs of the most consuming job, even though this will result\nin a suboptimal allocation of most of the jobs that are less resource hungry.\n\nMoreover it is also difficult to predict such upper limit. 
In most cases the only way to\ndetermine it is by using a painful fail-and-retry approach.\n\nTake in consideration, for example, the following Nextflow process:\n\n process align {\n executor 'sge'\n memory 1.GB\n errorStrategy 'retry'\n\n input:\n file 'seq.fa' from sequences\n\n script:\n '''\n t_coffee -in seq.fa\n '''\n }\n\nThe above definition will execute as many jobs as there are fasta files emitted\nby the `sequences` channel. Since the `retry` _error strategy_ is specified, if the\ntask returns a non-zero error status, Nextflow will reschedule the job execution requesting\nthe same amount of memory and disk storage. In case the error is generated by `t_coffee` that\nit needs more than one GB of memory for a specific alignment, the task will continue to fail,\nstopping the pipeline execution as a consequence.\n\n## Increase job resources automatically\n\nA better solution can be implemented with Nextflow which allows resources to be defined in\na dynamic manner. By doing this it is possible to increase the memory request when\nrescheduling a failing task execution. For example:\n\n process align {\n executor 'sge'\n memory { 1.GB * task.attempt }\n errorStrategy 'retry'\n\n input:\n file 'seq.fa' from sequences\n\n script:\n '''\n t_coffee -in seq.fa\n '''\n }\n\nIn the above example the memory requirement is defined by using a dynamic rule.\nThe `task.attempt` attribute represents the current task attempt (`1` the first time the task\nis executed, `2` the second and so on).\n\nThe task will then request one GB of memory. In case of an error it will be rescheduled\nrequesting 2 GB and so on, until it is executed successfully or the limit of times a task\ncan be retried is reached, forcing the termination of the pipeline.\n\nIt is also possible to define the `errorStrategy` directive in a dynamic manner. This\nis useful to re-execute failed jobs only if a certain condition is verified.\n\nFor example the Univa Grid Engine batch scheduler returns the exit status `140` when a job\nis terminated because it's using more resources than the ones requested.\n\nBy checking this exit status we can reschedule only the jobs that fail by exceeding the\nresources allocation. This can be done with the following directive declaration:\n\n errorStrategy { task.exitStatus == 140 ? 'retry' : 'terminate' }\n\nIn this way a failed task is rescheduled only when it returns the `140` exit status.\nIn all other cases the pipeline execution is terminated.\n\n## Conclusion\n\nNextflow provides a very flexible mechanism for defining the job resource request and\nhandling error events. It makes it possible to automatically reschedule failing tasks under\ncertain conditions and to define job resource requests in a dynamic manner so that they\ncan be adapted to the actual job's needs and to optimize the overall resource utilisation.", "images": [], "author": "Paolo Di Tommaso", "tags": "bioinformatics,pipelines,nextflow,hpc" @@ -122,7 +122,7 @@ "slug": "2016/more-fun-containers-hpc", "title": "More fun with containers in HPC", "date": "2016-12-20T00:00:00.000Z", - "content": "\nNextflow was one of the [first workflow framework](https://www.nextflow.io/blog/2014/nextflow-meets-docker.html)\nto provide built-in support for Docker containers. 
A couple of years ago we also started\nto experiment with the deployment of containerised bioinformatic pipelines at CRG,\nusing Docker technology (see [here](<(https://www.nextflow.io/blog/2014/using-docker-in-hpc-cluster.html)>) and [here](https://www.nextplatform.com/2016/01/28/crg-goes-with-the-genomics-flow/)).\n\nWe found that by isolating and packaging the complete computational workflow environment\nwith the use of Docker images, radically simplifies the burden of maintaining complex\ndependency graphs of real workload data analysis pipelines.\n\nEven more importantly, the use of containers enables replicable results with minimal effort\nfor the system configuration. The entire computational environment can be archived in a\nself-contained executable format, allowing the replication of the associated analysis at\nany point in time.\n\nThis ability is the main reason that drove the rapid adoption of Docker in the bioinformatic\ncommunity and its support in many projects, like for example [Galaxy](https://galaxyproject.org),\n[CWL](http://commonwl.org), [Bioboxes](http://bioboxes.org), [Dockstore](https://dockstore.org) and many others.\n\nHowever, while the popularity of Docker spread between the developers, its adaption in\nresearch computing infrastructures continues to remain very low and it's very unlikely\nthat this trend will change in the future.\n\nThe reason for this resides in the Docker architecture, which requires a daemon running\nwith root permissions on each node of a computing cluster. Such a requirement raises many\nsecurity concerns, thus good practices would prevent its use in shared HPC cluster or\nsupercomputer environments.\n\n### Introducing Singularity\n\nAlternative implementations, such as [Singularity](http://singularity.lbl.gov), have\nfortunately been promoted by the interested in containers technology.\n\nSingularity is a containers engine developed at the Berkeley Lab and designed for the\nneeds of scientific workloads. The main differences with Docker are: containers are file\nbased, no root escalation is allowed nor root permission is needed to run a container\n(although a privileged user is needed to create a container image), and there is no\nseparate running daemon.\n\nThese, along with other features, such as support for autofs mounts, makes Singularity a\ncontainer engine better suited to the requirements of HPC clusters and supercomputers.\n\nMoreover, although Singularity uses a container image format different to that of Docker,\nthey provide a conversion tool that allows Docker images to be converted to the\nSingularity format.\n\n### Singularity in the wild\n\nWe integrated Singularity support in Nextflow framework and tested it in the CRG\ncomputing cluster and the BSC [MareNostrum](https://www.bsc.es/discover-bsc/the-centre/marenostrum) supercomputer.\n\nThe absence of a separate running daemon or image gateway made the installation\nstraightforward when compared to Docker or other solutions.\n\nTo evaluate the performance of Singularity we carried out the [same benchmarks](https://peerj.com/articles/1273/)\nwe performed for Docker and compared the results of the two engines.\n\nThe benchmarks consisted in the execution of three Nextflow based genomic pipelines:\n\n1. [Rna-toy](https://github.com/nextflow-io/rnatoy/tree/peerj5515): a simple pipeline for RNA-Seq data analysis.\n2. [Nmdp-Flow](https://github.com/nextflow-io/nmdp-flow/tree/peerj5515/): an assembly-based variant calling pipeline.\n3. 
[Piper-NF](https://github.com/cbcrg/piper-nf/tree/peerj5515): a pipeline for the detection and mapping of long non-coding RNAs.\n\nIn order to repeat the analyses, we converted the container images we used to perform\nthe Docker benchmarks to Singularity image files by using the [docker2singularity](https://github.com/singularityware/docker2singularity) tool\n_(this is not required anymore, see the update below)_.\n\nThe only change needed to run these pipelines with Singularity was to replace the Docker\nspecific settings with the following ones in the configuration file:\n\n singularity.enabled = true\n process.container = ''\n\nEach pipeline was executed 10 times, alternately by using Docker and Singularity as\ncontainer engine. The results are shown in the following table (time in minutes):\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
PipelineTasksMean task timeMean execution timeExecution time std devRatio
  SingularityDockerSingularityDockerSingularityDocker 
RNA-Seq973.773.6663.6662.32.03.10.998
Variant call4822.122.41061.21074.443.138.51.012
Piper-NF981.21.3120.0124.56.9 2.81.038
\n\nThe benchmark results show that there isn't any significative difference in the\nexecution times of containerised workflows between Docker and Singularity. In two\ncases Singularity was slightly faster and a third one it was almost identical although\na little slower than Docker.\n\n### Conclusion\n\nIn our evaluation Singularity proved to be an easy to install,\nstable and performant container engine.\n\nThe only minor drawback, we found when compared to Docker, was the need to define the\nhost path mount points statically when the Singularity images were created. In fact,\neven if Singularity supports user mount points to be defined dynamically when the\ncontainer is launched, this feature requires the overlay file system which was not\nsupported by the kernel available in our system.\n\nDocker surely will remain the _de facto_ standard engine and image format for containers\ndue to its popularity and [impressive growth](http://www.coscale.com/blog/docker-usage-statistics-increased-adoption-by-enterprises-and-for-production-use).\n\nHowever, in our opinion, Singularity is the tool of choice for the execution of\ncontainerised workloads in the context of HPC, thanks to its focus on system security\nand its simpler architectural design.\n\nThe transparent support provided by Nextflow for both Docker and Singularity technology\nguarantees the ability to deploy your workflows in a range of different platforms (cloud,\ncluster, supercomputer, etc). Nextflow transparently manages the deployment of the\ncontainerised workload according to the runtime available in the target system.\n\n#### Credits\n\nThanks to Gabriel Gonzalez (CRG), Luis Exposito (CRG) and Carlos Tripiana Montes (BSC)\nfor the support installing Singularity.\n\n**Update** Singularity, since version 2.3.x, is able to pull and run Docker images from the Docker Hub.\nThis greatly simplifies the interoperability with existing Docker containers. You only need\nto prefix the image name with the `docker://` pseudo-protocol to download it as a Singularity image,\nfor example:\n\n singularity pull --size 1200 docker://nextflow/rnatoy\n", + "content": "Nextflow was one of the [first workflow framework](https://www.nextflow.io/blog/2014/nextflow-meets-docker.html)\nto provide built-in support for Docker containers. A couple of years ago we also started\nto experiment with the deployment of containerised bioinformatic pipelines at CRG,\nusing Docker technology (see [here](<(https://www.nextflow.io/blog/2014/using-docker-in-hpc-cluster.html)>) and [here](https://www.nextplatform.com/2016/01/28/crg-goes-with-the-genomics-flow/)).\n\nWe found that by isolating and packaging the complete computational workflow environment\nwith the use of Docker images, radically simplifies the burden of maintaining complex\ndependency graphs of real workload data analysis pipelines.\n\nEven more importantly, the use of containers enables replicable results with minimal effort\nfor the system configuration. 
The entire computational environment can be archived in a\nself-contained executable format, allowing the replication of the associated analysis at\nany point in time.\n\nThis ability is the main reason that drove the rapid adoption of Docker in the bioinformatic\ncommunity and its support in many projects, like for example [Galaxy](https://galaxyproject.org),\n[CWL](http://commonwl.org), [Bioboxes](http://bioboxes.org), [Dockstore](https://dockstore.org) and many others.\n\nHowever, while the popularity of Docker spread between the developers, its adaption in\nresearch computing infrastructures continues to remain very low and it's very unlikely\nthat this trend will change in the future.\n\nThe reason for this resides in the Docker architecture, which requires a daemon running\nwith root permissions on each node of a computing cluster. Such a requirement raises many\nsecurity concerns, thus good practices would prevent its use in shared HPC cluster or\nsupercomputer environments.\n\n### Introducing Singularity\n\nAlternative implementations, such as [Singularity](http://singularity.lbl.gov), have\nfortunately been promoted by the interested in containers technology.\n\nSingularity is a containers engine developed at the Berkeley Lab and designed for the\nneeds of scientific workloads. The main differences with Docker are: containers are file\nbased, no root escalation is allowed nor root permission is needed to run a container\n(although a privileged user is needed to create a container image), and there is no\nseparate running daemon.\n\nThese, along with other features, such as support for autofs mounts, makes Singularity a\ncontainer engine better suited to the requirements of HPC clusters and supercomputers.\n\nMoreover, although Singularity uses a container image format different to that of Docker,\nthey provide a conversion tool that allows Docker images to be converted to the\nSingularity format.\n\n### Singularity in the wild\n\nWe integrated Singularity support in Nextflow framework and tested it in the CRG\ncomputing cluster and the BSC [MareNostrum](https://www.bsc.es/discover-bsc/the-centre/marenostrum) supercomputer.\n\nThe absence of a separate running daemon or image gateway made the installation\nstraightforward when compared to Docker or other solutions.\n\nTo evaluate the performance of Singularity we carried out the [same benchmarks](https://peerj.com/articles/1273/)\nwe performed for Docker and compared the results of the two engines.\n\nThe benchmarks consisted in the execution of three Nextflow based genomic pipelines:\n\n1. [Rna-toy](https://github.com/nextflow-io/rnatoy/tree/peerj5515): a simple pipeline for RNA-Seq data analysis.\n2. [Nmdp-Flow](https://github.com/nextflow-io/nmdp-flow/tree/peerj5515/): an assembly-based variant calling pipeline.\n3. 
[Piper-NF](https://github.com/cbcrg/piper-nf/tree/peerj5515): a pipeline for the detection and mapping of long non-coding RNAs.\n\nIn order to repeat the analyses, we converted the container images we used to perform\nthe Docker benchmarks to Singularity image files by using the [docker2singularity](https://github.com/singularityware/docker2singularity) tool\n_(this is not required anymore, see the update below)_.\n\nThe only change needed to run these pipelines with Singularity was to replace the Docker\nspecific settings with the following ones in the configuration file:\n\n singularity.enabled = true\n process.container = ''\n\nEach pipeline was executed 10 times, alternately by using Docker and Singularity as\ncontainer engine. The results are shown in the following table (time in minutes):\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
PipelineTasksMean task timeMean execution timeExecution time std devRatio
  SingularityDockerSingularityDockerSingularityDocker 
RNA-Seq973.773.6663.6662.32.03.10.998
Variant call4822.122.41061.21074.443.138.51.012
Piper-NF981.21.3120.0124.56.9 2.81.038
\n\nThe benchmark results show that there isn't any significative difference in the\nexecution times of containerised workflows between Docker and Singularity. In two\ncases Singularity was slightly faster and a third one it was almost identical although\na little slower than Docker.\n\n### Conclusion\n\nIn our evaluation Singularity proved to be an easy to install,\nstable and performant container engine.\n\nThe only minor drawback, we found when compared to Docker, was the need to define the\nhost path mount points statically when the Singularity images were created. In fact,\neven if Singularity supports user mount points to be defined dynamically when the\ncontainer is launched, this feature requires the overlay file system which was not\nsupported by the kernel available in our system.\n\nDocker surely will remain the _de facto_ standard engine and image format for containers\ndue to its popularity and [impressive growth](http://www.coscale.com/blog/docker-usage-statistics-increased-adoption-by-enterprises-and-for-production-use).\n\nHowever, in our opinion, Singularity is the tool of choice for the execution of\ncontainerised workloads in the context of HPC, thanks to its focus on system security\nand its simpler architectural design.\n\nThe transparent support provided by Nextflow for both Docker and Singularity technology\nguarantees the ability to deploy your workflows in a range of different platforms (cloud,\ncluster, supercomputer, etc). Nextflow transparently manages the deployment of the\ncontainerised workload according to the runtime available in the target system.\n\n#### Credits\n\nThanks to Gabriel Gonzalez (CRG), Luis Exposito (CRG) and Carlos Tripiana Montes (BSC)\nfor the support installing Singularity.\n\n**Update** Singularity, since version 2.3.x, is able to pull and run Docker images from the Docker Hub.\nThis greatly simplifies the interoperability with existing Docker containers. You only need\nto prefix the image name with the `docker://` pseudo-protocol to download it as a Singularity image,\nfor example:\n\n singularity pull --size 1200 docker://nextflow/rnatoy\n
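\n\nTo illustrate how this can be combined with the Nextflow settings shown earlier, below is a minimal configuration sketch (it is not part of the benchmark setup described above). It assumes the image pulled with the command above has been saved locally as `rnatoy.img` in the pipeline launch directory, and it simply reuses the same `singularity.enabled` and `process.container` options used for the benchmarks:\n\n    // nextflow.config - minimal sketch; assumes the image file created by\n    // the 'singularity pull' command above is available as 'rnatoy.img'\n    singularity.enabled = true\n    process.container = 'rnatoy.img'\n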
", "images": [], "author": "Paolo Di Tommaso", "tags": "aws,pipelines,nextflow,genomic,docker,singularity" @@ -131,7 +131,7 @@ "slug": "2017/caw-and-singularity", "title": "Running CAW with Singularity and Nextflow", "date": "2017-11-16T00:00:00.000Z", - "content": "\nThis is a guest post authored by Maxime Garcia from the Science for Life Laboratory in Sweden. Max\ndescribes how they deploy complex cancer data analysis pipelines using Nextflow\nand Singularity. We are very happy to share their experience across the Nextflow community.\n\n### The CAW pipeline\n\n\"Cancer\n\n[Cancer Analysis Workflow](http://opensource.scilifelab.se/projects/sarek/) (CAW for short) is a Nextflow based analysis pipeline developed for the analysis of tumour: normal pairs.\nIt is developed in collaboration with two infrastructures within [Science for Life Laboratory](https://www.scilifelab.se/): [National Genomics Infrastructure](https://ngisweden.scilifelab.se/) (NGI), in The Stockholm [Genomics Applications Development Facility](https://www.scilifelab.se/facilities/ngi-stockholm/) to be precise and [National Bioinformatics Infrastructure Sweden](https://www.nbis.se/) (NBIS).\n\nCAW is based on [GATK Best Practices](https://software.broadinstitute.org/gatk/best-practices/) for the preprocessing of FastQ files, then uses various variant calling tools to look for somatic SNVs and small indels ([MuTect1](https://github.com/broadinstitute/mutect/), [MuTect2](https://github.com/broadgsa/gatk-protected/), [Strelka](https://github.com/Illumina/strelka/), [Freebayes](https://github.com/ekg/freebayes/)), ([GATK HaplotyeCaller](https://github.com/broadgsa/gatk-protected/)), for structural variants([Manta](https://github.com/Illumina/manta/)) and for CNVs ([ASCAT](https://github.com/Crick-CancerGenomics/ascat/)).\nAnnotation tools ([snpEff](http://snpeff.sourceforge.net/), [VEP](https://www.ensembl.org/info/docs/tools/vep/index.html)) are also used, and finally [MultiQC](http://multiqc.info/) for handling reports.\n\nWe are currently working on a manuscript, but you're welcome to look at (or even contribute to) our [github repository](https://github.com/SciLifeLab/CAW/) or talk with us on our [gitter channel](https://gitter.im/SciLifeLab/CAW/).\n\n### Singularity and UPPMAX\n\n[Singularity](http://singularity.lbl.gov/) is a tool package software dependencies into a contained environment, much like Docker. It's designed to run on HPC environments where Docker is often a problem due to its requirement for administrative privileges.\n\nWe're based in Sweden, and [Uppsala Multidisciplinary Center for Advanced Computational Science](https://uppmax.uu.se/) (UPPMAX) provides Computational infrastructures for all Swedish researchers.\nSince we're analyzing sensitive data, we are using secure clusters (with a two factor authentication), set up by UPPMAX: [SNIC-SENS](https://www.uppmax.uu.se/projects-and-collaborations/snic-sens/).\n\nIn my case, since we're still developing the pipeline, I am mainly using the research cluster [Bianca](https://www.uppmax.uu.se/resources/systems/the-bianca-cluster/).\nSo I can only transfer files and data in one specific repository using SFTP.\n\nUPPMAX provides computing resources for Swedish researchers for all scientific domains, so getting software updates can occasionally take some time.\nTypically, [Environment Modules](http://modules.sourceforge.net/) are used which allow several versions of different tools - this is good for reproducibility and is quite easy to use. 
However, the approach is not portable across different clusters outside of UPPMAX.\n\n### Why use containers?\n\nThe idea of using containers, for improved portability and reproducibility, and more up to date tools, came naturally to us, as it is easily managed within Nextflow.\nWe cannot use [Docker](https://www.docker.com/) on our secure cluster, so we wanted to run CAW with [Singularity](http://singularity.lbl.gov/) images instead.\n\n### How was the switch made?\n\nWe were already using Docker containers for our continuous integration testing with Travis, and since we use many tools, I took the approach of making (almost) a container for each process.\nBecause this process is quite slow, repetitive and I'm lazy like to automate everything, I made a simple NF [script](https://github.com/SciLifeLab/CAW/blob/master/buildContainers.nf) to build and push all docker containers.\nBasically it's just `build` and `pull` for all containers, with some configuration possibilities.\n\n```\ndocker build -t ${repository}/${container}:${tag} ${baseDir}/containers/${container}/.\n\ndocker push ${repository}/${container}:${tag}\n```\n\nSince Singularity can directly pull images from DockerHub, I made the build script to pull all containers from DockerHub to have local Singularity image files.\n\n```\nsingularity pull --name ${container}-${tag}.img docker://${repository}/${container}:${tag}\n```\n\nAfter this, it's just a matter of moving all containers to the secure cluster we're using, and using the right configuration file in the profile.\nI'll spare you the details of the SFTP transfer.\nThis is what the configuration file for such Singularity images looks like: [`singularity-path.config`](https://github.com/SciLifeLab/CAW/blob/master/configuration/singularity-path.config)\n\n```\n/*\nvim: syntax=groovy\n-*- mode: groovy;-*-\n * -------------------------------------------------\n * Nextflow config file for CAW project\n * -------------------------------------------------\n * Paths to Singularity images for every process\n * No image will be pulled automatically\n * Need to transfer and set up images before\n * -------------------------------------------------\n */\n\nsingularity {\n enabled = true\n runOptions = \"--bind /scratch\"\n}\n\nparams {\n containerPath='containers'\n tag='1.2.3'\n}\n\nprocess {\n $ConcatVCF.container = \"${params.containerPath}/caw-${params.tag}.img\"\n $RunMultiQC.container = \"${params.containerPath}/multiqc-${params.tag}.img\"\n $IndelRealigner.container = \"${params.containerPath}/gatk-${params.tag}.img\"\n // I'm not putting the whole file here\n // you probably already got the point\n}\n```\n\nThis approach ran (almost) perfectly on the first try, except a process failing due to a typo on a container name...\n\n### Conclusion\n\nThis switch was completed a couple of months ago and has been a great success.\nWe are now using Singularity containers in almost all of our Nextflow pipelines developed at NGI.\nEven if we do enjoy the improved control, we must not forgot that:\n\n> With great power comes great responsibility!\n\n### Credits\n\nThanks to [Rickard Hammarén](https://github.com/Hammarn) and [Phil Ewels](http://phil.ewels.co.uk/) for comments and suggestions for improving the post.\n", + "content": "*This is a guest post authored by Maxime Garcia from the Science for Life Laboratory in Sweden. Max\ndescribes how they deploy complex cancer data analysis pipelines using Nextflow\nand Singularity. 
We are very happy to share their experience across the Nextflow community.*\n\n### The CAW pipeline\n\n\"Cancer\n\n[Cancer Analysis Workflow](http://opensource.scilifelab.se/projects/sarek/) (CAW for short) is a Nextflow based analysis pipeline developed for the analysis of tumour: normal pairs.\nIt is developed in collaboration with two infrastructures within [Science for Life Laboratory](https://www.scilifelab.se/): [National Genomics Infrastructure](https://ngisweden.scilifelab.se/) (NGI), in The Stockholm [Genomics Applications Development Facility](https://www.scilifelab.se/facilities/ngi-stockholm/) to be precise and [National Bioinformatics Infrastructure Sweden](https://www.nbis.se/) (NBIS).\n\nCAW is based on [GATK Best Practices](https://software.broadinstitute.org/gatk/best-practices/) for the preprocessing of FastQ files, then uses various variant calling tools to look for somatic SNVs and small indels ([MuTect1](https://github.com/broadinstitute/mutect/), [MuTect2](https://github.com/broadgsa/gatk-protected/), [Strelka](https://github.com/Illumina/strelka/), [Freebayes](https://github.com/ekg/freebayes/)), ([GATK HaplotyeCaller](https://github.com/broadgsa/gatk-protected/)), for structural variants([Manta](https://github.com/Illumina/manta/)) and for CNVs ([ASCAT](https://github.com/Crick-CancerGenomics/ascat/)).\nAnnotation tools ([snpEff](http://snpeff.sourceforge.net/), [VEP](https://www.ensembl.org/info/docs/tools/vep/index.html)) are also used, and finally [MultiQC](http://multiqc.info/) for handling reports.\n\nWe are currently working on a manuscript, but you're welcome to look at (or even contribute to) our [github repository](https://github.com/SciLifeLab/CAW/) or talk with us on our [gitter channel](https://gitter.im/SciLifeLab/CAW/).\n\n### Singularity and UPPMAX\n\n[Singularity](http://singularity.lbl.gov/) is a tool package software dependencies into a contained environment, much like Docker. It's designed to run on HPC environments where Docker is often a problem due to its requirement for administrative privileges.\n\nWe're based in Sweden, and [Uppsala Multidisciplinary Center for Advanced Computational Science](https://uppmax.uu.se/) (UPPMAX) provides Computational infrastructures for all Swedish researchers.\nSince we're analyzing sensitive data, we are using secure clusters (with a two factor authentication), set up by UPPMAX: [SNIC-SENS](https://www.uppmax.uu.se/projects-and-collaborations/snic-sens/).\n\nIn my case, since we're still developing the pipeline, I am mainly using the research cluster [Bianca](https://www.uppmax.uu.se/resources/systems/the-bianca-cluster/).\nSo I can only transfer files and data in one specific repository using SFTP.\n\nUPPMAX provides computing resources for Swedish researchers for all scientific domains, so getting software updates can occasionally take some time.\nTypically, [Environment Modules](http://modules.sourceforge.net/) are used which allow several versions of different tools - this is good for reproducibility and is quite easy to use. 
However, the approach is not portable across different clusters outside of UPPMAX.\n\n### Why use containers?\n\nThe idea of using containers, for improved portability and reproducibility, and more up to date tools, came naturally to us, as it is easily managed within Nextflow.\nWe cannot use [Docker](https://www.docker.com/) on our secure cluster, so we wanted to run CAW with [Singularity](http://singularity.lbl.gov/) images instead.\n\n### How was the switch made?\n\nWe were already using Docker containers for our continuous integration testing with Travis, and since we use many tools, I took the approach of making (almost) a container for each process.\nBecause this process is quite slow, repetitive and I~~'m lazy~~ like to automate everything, I made a simple NF [script](https://github.com/SciLifeLab/CAW/blob/master/buildContainers.nf) to build and push all docker containers.\nBasically it's just `build` and `pull` for all containers, with some configuration possibilities.\n\n```\ndocker build -t ${repository}/${container}:${tag} ${baseDir}/containers/${container}/.\n\ndocker push ${repository}/${container}:${tag}\n```\n\nSince Singularity can directly pull images from DockerHub, I made the build script to pull all containers from DockerHub to have local Singularity image files.\n\n```\nsingularity pull --name ${container}-${tag}.img docker://${repository}/${container}:${tag}\n```\n\nAfter this, it's just a matter of moving all containers to the secure cluster we're using, and using the right configuration file in the profile.\nI'll spare you the details of the SFTP transfer.\nThis is what the configuration file for such Singularity images looks like: [`singularity-path.config`](https://github.com/SciLifeLab/CAW/blob/master/configuration/singularity-path.config)\n\n```\n/*\nvim: syntax=groovy\n-*- mode: groovy;-*-\n * -------------------------------------------------\n * Nextflow config file for CAW project\n * -------------------------------------------------\n * Paths to Singularity images for every process\n * No image will be pulled automatically\n * Need to transfer and set up images before\n * -------------------------------------------------\n */\n\nsingularity {\n enabled = true\n runOptions = \"--bind /scratch\"\n}\n\nparams {\n containerPath='containers'\n tag='1.2.3'\n}\n\nprocess {\n $ConcatVCF.container = \"${params.containerPath}/caw-${params.tag}.img\"\n $RunMultiQC.container = \"${params.containerPath}/multiqc-${params.tag}.img\"\n $IndelRealigner.container = \"${params.containerPath}/gatk-${params.tag}.img\"\n // I'm not putting the whole file here\n // you probably already got the point\n}\n```\n\nThis approach ran (almost) perfectly on the first try, except a process failing due to a typo on a container name...\n\n### Conclusion\n\nThis switch was completed a couple of months ago and has been a great success.\nWe are now using Singularity containers in almost all of our Nextflow pipelines developed at NGI.\nEven if we do enjoy the improved control, we must not forgot that:\n\n> With great power comes great responsibility!\n\n### Credits\n\nThanks to [Rickard Hammarén](https://github.com/Hammarn) and [Phil Ewels](http://phil.ewels.co.uk/) for comments and suggestions for improving the post.", "images": [ "/img/CAW_logo.png" ], @@ -142,7 +142,7 @@ "slug": "2017/nextflow-and-cwl", "title": "Nextflow and the Common Workflow Language", "date": "2017-07-20T00:00:00.000Z", - "content": "\nThe Common Workflow Language ([CWL](http://www.commonwl.org/)) is a specification for 
defining\nworkflows in a declarative manner. It has been implemented to varying degrees\nby different software packages. Nextflow and CWL share a common goal of enabling portable\nreproducible workflows.\n\nWe are currently investigating the automatic conversion of CWL workflows into Nextflow scripts\nto increase the portability of workflows. This work is being developed as\nthe [cwl2nxf](https://github.com/nextflow-io/cwl2nxf) project, currently in early prototype stage.\n\nOur first phase of the project was to determine mappings of CWL to Nextflow and familiarize\nourselves with how the current implementation of the converter supports a number of CWL specific\nfeatures.\n\n### Mapping CWL to Nextflow\n\nInputs in the CWL workflow file are initially parsed as _channels_ or other Nextflow input types.\nEach step specified in the workflow is then parsed independently. At the time of writing\nsubworkflows are not supported, each step must be a CWL `CommandLineTool` file.\n\nThe image below shows an example of the major components in the CWL files and then post-conversion (click to zoom).\n\n[![Nextflow CWL conversion](/img/cwl2nxf-min.png)](/img/cwl2nxf-min.png)\n\nCWL and Nextflow share a similar structure of defining inputs and outputs as shown above.\n\nA notable difference between the two is how tasks are defined. CWL requires either a separate\nfile for each task or a sub-workflow. CWL also requires the explicit mapping of each command\nline option for an executed tool. This is done using YAML meta-annotation to indicate the position, prefix, etc.\nfor each command line option.\n\nIn Nextflow a task command is defined as a separated component in the `process` definition and\nit is ultimately a multiline string which is interpreted by a command script by the underlying\nsystem. Input parameters can be used in the command string with a simple variable interpolation\nmechanism. This is beneficial as it simplifies porting existing BASH scripts to Nextflow\nwith minimal refactoring.\n\nThese examples highlight some of the differences between the two approaches, and the difficulties\nconverting complex use cases such as scatter, CWL expressions, and conditional command line inclusion.\n\n### Current status\n\nThe cwl2nxf is a Groovy based tool with a limited conversion ability. It parses the\nYAML documents and maps the various CWL objects to Nextflow. Conversion examples are\nprovided as part of the repository along with documentation for each example specifying the mapping.\n\nThis project was initially focused on developing an understanding of how to translate CWL to Nextflow.\nA number of CWL specific features such as scatter, secondary files and simple JavaScript expressions\nwere analyzed and implemented.\n\nThe GitHub repository includes instructions on how to build cwl2nxf and an example usage.\nThe tool can be executed as either just a parser printing the converted CWL to stdout,\nor by specifying an output file which will generate the Nextflow script file and if necessary\na config file.\n\nThe tool takes in a CWL workflow file and the YAML inputs file. It does not currently work\nwith a standalone `CommandLineTool`. The following example show how to run it:\n\n```\njava -jar build/libs/cwl2nxf-*.jar rnatoy.cwl samp.yaml\n```\n\n
\nSee the GitHub [repository](https://github.com/nextflow-io/cwl2nxf) for further details.\n\n### Conclusion\n\nWe are continuing to investigate ways to improve the interoperability of Nextflow with CWL.\nAlthough still an early prototype, the cwl2nxf tool provides some level of conversion of CWL to Nextflow.\n\nWe are also planning to explore [CWL Avro](https://github.com/common-workflow-language/cwlavro),\nwhich may provide a more efficient way to parse and handle CWL objects for conversion to Nextflow.\n\nAdditionally, a number of workflows in the GitHub repository have been implemented in both\nCWL and Nextflow which can be used as a comparison of the two languages.\n\nThe Nextflow team will be presenting a short talk and participating in the Codefest at [BOSC 2017](https://www.open-bio.org/wiki/BOSC_2017).\nWe are interested in hearing from the community regarding CWL to Nextflow conversion, and would like\nto encourage anyone interested to contribute to the cwl2nxf project.\n", + "content": "The Common Workflow Language ([CWL](http://www.commonwl.org/)) is a specification for defining\nworkflows in a declarative manner. It has been implemented to varying degrees\nby different software packages. Nextflow and CWL share a common goal of enabling portable\nreproducible workflows.\n\nWe are currently investigating the automatic conversion of CWL workflows into Nextflow scripts\nto increase the portability of workflows. This work is being developed as\nthe [cwl2nxf](https://github.com/nextflow-io/cwl2nxf) project, currently in early prototype stage.\n\nOur first phase of the project was to determine mappings of CWL to Nextflow and familiarize\nourselves with how the current implementation of the converter supports a number of CWL specific\nfeatures.\n\n### Mapping CWL to Nextflow\n\nInputs in the CWL workflow file are initially parsed as _channels_ or other Nextflow input types.\nEach step specified in the workflow is then parsed independently. At the time of writing\nsubworkflows are not supported, each step must be a CWL `CommandLineTool` file.\n\nThe image below shows an example of the major components in the CWL files and then post-conversion (click to zoom).\n\n[![Nextflow CWL conversion](/img/cwl2nxf-min.png)](/img/cwl2nxf-min.png)\n\nCWL and Nextflow share a similar structure of defining inputs and outputs as shown above.\n\nA notable difference between the two is how tasks are defined. CWL requires either a separate\nfile for each task or a sub-workflow. CWL also requires the explicit mapping of each command\nline option for an executed tool. This is done using YAML meta-annotation to indicate the position, prefix, etc.\nfor each command line option.\n\nIn Nextflow a task command is defined as a separated component in the `process` definition and\nit is ultimately a multiline string which is interpreted by a command script by the underlying\nsystem. Input parameters can be used in the command string with a simple variable interpolation\nmechanism. This is beneficial as it simplifies porting existing BASH scripts to Nextflow\nwith minimal refactoring.\n\nThese examples highlight some of the differences between the two approaches, and the difficulties\nconverting complex use cases such as scatter, CWL expressions, and conditional command line inclusion.\n\n### Current status\n\nThe cwl2nxf is a Groovy based tool with a limited conversion ability. It parses the\nYAML documents and maps the various CWL objects to Nextflow. 
Conversion examples are\nprovided as part of the repository, along with documentation for each example specifying the mapping.\n\nThis project was initially focused on developing an understanding of how to translate CWL to Nextflow.\nA number of CWL-specific features such as scatter, secondary files and simple JavaScript expressions\nwere analyzed and implemented.\n\nThe GitHub repository includes instructions on how to build cwl2nxf and example usage.\nThe tool can be executed either as a plain parser, printing the converted CWL to stdout,\nor with an output file specified, in which case it generates the Nextflow script file and, if necessary,\na config file.\n\nThe tool takes in a CWL workflow file and the YAML inputs file. It does not currently work\nwith a standalone `CommandLineTool`. The following example shows how to run it:\n\n```\njava -jar build/libs/cwl2nxf-*.jar rnatoy.cwl samp.yaml\n```\n\n
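To make the mapping more concrete, the snippet below sketches the general shape of a converted step as a plain Nextflow process. It is a hand-written illustration rather than actual cwl2nxf output, and the process name, input file and command are hypothetical:\n\n```\n// Illustrative sketch only: a single CWL CommandLineTool-style step\n// rewritten by hand as a Nextflow process (names and command are hypothetical).\nparams.input = 'sample.txt'\n\nprocess wordCount {\n    input:\n    file infile from Channel.fromPath(params.input)\n\n    output:\n    file 'counts.txt' into counts_ch\n\n    \"\"\"\n    wc -w ${infile} > counts.txt\n    \"\"\"\n}\n```\n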
\nSee the GitHub [repository](https://github.com/nextflow-io/cwl2nxf) for further details.\n\n### Conclusion\n\nWe are continuing to investigate ways to improve the interoperability of Nextflow with CWL.\nAlthough still an early prototype, the cwl2nxf tool provides some level of conversion of CWL to Nextflow.\n\nWe are also planning to explore [CWL Avro](https://github.com/common-workflow-language/cwlavro),\nwhich may provide a more efficient way to parse and handle CWL objects for conversion to Nextflow.\n\nAdditionally, a number of workflows in the GitHub repository have been implemented in both\nCWL and Nextflow which can be used as a comparison of the two languages.\n\nThe Nextflow team will be presenting a short talk and participating in the Codefest at [BOSC 2017](https://www.open-bio.org/wiki/BOSC_2017).\nWe are interested in hearing from the community regarding CWL to Nextflow conversion, and would like\nto encourage anyone interested to contribute to the cwl2nxf project.", "images": [], "author": "Kevin Sayers", "tags": "nextflow,workflow,reproducibility,cwl" @@ -151,7 +151,7 @@ "slug": "2017/nextflow-hack17", "title": "Nexflow Hackathon 2017", "date": "2017-09-30T00:00:00.000Z", - "content": "\nLast week saw the inaugural Nextflow meeting organised at the Centre for Genomic Regulation\n(CRG) in Barcelona. The event combined talks, demos, a tutorial/workshop for beginners as\nwell as two hackathon sessions for more advanced users.\n\nNearly 50 participants attended over the two days which included an entertaining tapas course\nduring the first evening!\n\nOne of the main objectives of the event was to bring together Nextflow users to work\ntogether on common interest projects. There were several proposals for the hackathon\nsessions and in the end five diverse ideas were chosen for communal development ranging from\nnew pipelines through to the addition of new features in Nextflow.\n\nThe proposals and outcomes of each the projects, which can be found in the issues section\nof [this GitHub repository](https://github.com/nextflow-io/hack17), have been summarised below.\n\n### Nextflow HTML tracing reports\n\nThe HTML tracing project aims to generate a rendered version of the Nextflow trace file to\nenable fast sorting and visualisation of task/process execution statistics.\n\nCurrently the data in the trace includes information such as CPU duration, memory usage and\ncompletion status of each task, however wading through the file is often not convenient\nwhen a large number of tasks have been executed.\n\n[Phil Ewels](https://github.com/ewels) proposed the idea and led the coordination effort\nwith the outcome being a very impressive working prototype which can be found in the Nextflow\nbranch `html-trace`.\n\nAn image of the example report is shown below with the interactive HTML available\n[here](/misc/nf-trace-report.html). It is expected to be merged into the main branch of Nextflow\nwith documentation in a near-future release.\n\n![Nextflow HTML execution report](/img/nf-trace-report-min.png)\n\n### Nextflow pipeline for 16S microbial data\n\nThe H3Africa Bioinformatics Network have been developing several pipelines which are used\nacross the participating centers. 
The diverse computing resources available across the nodes has led to\nmembers wanting workflow solutions with a particular focus on portability.\n\nWith this is mind, Scott Hazelhurst proposed a project for a 16S Microbial data analysis\npipeline which had [previously been developed using CWL](https://github.com/h3abionet/h3abionet16S/tree/master).\n\nThe participants made a new [branch](https://github.com/h3abionet/h3abionet16S/tree/nextflow)\nof the original pipeline and ported it into Nextflow.\n\nThe pipeline will continue to be developed with the goal of acting as a comparison between\nCWL and Nextflow. It is thought this can then be extended to other pipelines by both those\nwho are already familiar with Nextflow as well as used as a tool for training newer users.\n\n### Nextflow modules prototyping\n\n_Toolboxing_ allows users to incorporate software into their pipelines in an efficient and\nreproducible manner. Various software repositories are becoming increasing popular,\nhighlighted by the over 5,000 tools available in the [Galaxy Toolshed](https://toolshed.g2.bx.psu.edu/).\n\nProjects such as [Biocontainers](http://biocontainers.pro/) aim to wrap up the execution\nenvironment using containers. [Myself](https://github.com/skptic) and [Johan Viklund](https://github.com/viklund)\nwished to piggyback off existing repositories and settled on [Dockstore](https://dockstore.org)\nwhich is an open platform compliant with the [GA4GH](http://genomicsandhealth.org) initiative.\n\nThe majority of tools in Dockstore are written in the CWL and therefore we required a parser\nbetween the CWL CommandLineTool class and Nextflow processes. Johan was able to develop\na parser which generates Nextflow processes for several Dockstore tools.\n\nAs these resources such as Dockstore become mature and standardised, it will be\npossible to automatically generate a _Nextflow Store_ and enable efficient incorporation\nof tools into workflows.\n\n\n\n_Example showing a Nextflow process generated from the Dockstore CWL repository for the tool BAMStats._\n\n### Nextflow pipeline for de novo assembly of nanopore reads\n\n[Nanopore sequencing](https://en.wikipedia.org/wiki/Nanopore_sequencing) is an exciting\nand emerging technology which promises to change the landscape of nucleotide sequencing.\n\nWith keen interest in Nanopore specific pipelines, [Hadrien Gourlé](https://github.com/HadrienG)\nlead the hackathon project for _Nanoflow_.\n\n[Nanoflow](https://github.com/HadrienG/nanoflow) is a de novo assembler of bacterials genomes\nfrom nanopore reads using Nextflow.\n\nDuring the two days the participants developed the pipeline for adapter trimming as well\nas assembly and consensus sequence generation using either\n[Canu](https://github.com/marbl/canu) and [Miniasm](https://github.com/lh3/miniasm).\n\nThe future plans are to finalise the pipeline to include a polishing step and a genome\nannotation step.\n\n### Nextflow AWS Batch integration\n\nNextflow already has experimental support for [AWS Batch](https://aws.amazon.com/batch/)\nand the goal of this project proposed by [Francesco Strozzi](https://github.com/fstrozzi)\nwas to improve this support, add features and test the implementation on real world pipelines.\n\nEarlier work from [Paolo Di Tommaso](https://github.com/pditommaso) in the Nextflow\nrepository, highlighted several challenges to using AWS Batch with Nextflow.\n\nThe major obstacle described by [Tim Dudgeon](https://github.com/tdudgeon) was the requirement\nfor each Docker container to 
have a version of the Amazon Web Services Command Line tools\n(aws-cli) installed.\n\nA solution was to install the AWS CLI tools on a custom AWS image that is used by the\nDocker host machine, and then mount the directory that contains the necessary items into\neach of the Docker containers as a volume. Early testing suggests this approach works\nwith the hope of providing a more elegant solution in future iterations.\n\nThe code and documentation for AWS Batch has been prepared and will be tested further\nbefore being rolled into an official Nextflow release in the near future.\n\n### Conclusion\n\nThe event was seen as an overwhelming success and special thanks must be made to all the\nparticipants. As the Nextflow community continues to grow, it would be fantastic to make these types\nmeetings more regular occasions.\n\nIn the meantime we have put together a short video containing some of the highlights\nof the two days.\n\nWe hope to see you all again in Barcelona soon or at new events around the world!\n\n\n", + "content": "Last week saw the inaugural Nextflow meeting organised at the Centre for Genomic Regulation\n(CRG) in Barcelona. The event combined talks, demos, a tutorial/workshop for beginners as\nwell as two hackathon sessions for more advanced users.\n\nNearly 50 participants attended over the two days which included an entertaining tapas course\nduring the first evening!\n\nOne of the main objectives of the event was to bring together Nextflow users to work\ntogether on common interest projects. There were several proposals for the hackathon\nsessions and in the end five diverse ideas were chosen for communal development ranging from\nnew pipelines through to the addition of new features in Nextflow.\n\nThe proposals and outcomes of each the projects, which can be found in the issues section\nof [this GitHub repository](https://github.com/nextflow-io/hack17), have been summarised below.\n\n### Nextflow HTML tracing reports\n\nThe HTML tracing project aims to generate a rendered version of the Nextflow trace file to\nenable fast sorting and visualisation of task/process execution statistics.\n\nCurrently the data in the trace includes information such as CPU duration, memory usage and\ncompletion status of each task, however wading through the file is often not convenient\nwhen a large number of tasks have been executed.\n\n[Phil Ewels](https://github.com/ewels) proposed the idea and led the coordination effort\nwith the outcome being a very impressive working prototype which can be found in the Nextflow\nbranch `html-trace`.\n\nAn image of the example report is shown below with the interactive HTML available\n[here](/misc/nf-trace-report.html). It is expected to be merged into the main branch of Nextflow\nwith documentation in a near-future release.\n\n![Nextflow HTML execution report](/img/nf-trace-report-min.png)\n\n### Nextflow pipeline for 16S microbial data\n\nThe H3Africa Bioinformatics Network have been developing several pipelines which are used\nacross the participating centers. 
The diverse computing resources available across the nodes has led to\nmembers wanting workflow solutions with a particular focus on portability.\n\nWith this is mind, Scott Hazelhurst proposed a project for a 16S Microbial data analysis\npipeline which had [previously been developed using CWL](https://github.com/h3abionet/h3abionet16S/tree/master).\n\nThe participants made a new [branch](https://github.com/h3abionet/h3abionet16S/tree/nextflow)\nof the original pipeline and ported it into Nextflow.\n\nThe pipeline will continue to be developed with the goal of acting as a comparison between\nCWL and Nextflow. It is thought this can then be extended to other pipelines by both those\nwho are already familiar with Nextflow as well as used as a tool for training newer users.\n\n### Nextflow modules prototyping\n\n_Toolboxing_ allows users to incorporate software into their pipelines in an efficient and\nreproducible manner. Various software repositories are becoming increasing popular,\nhighlighted by the over 5,000 tools available in the [Galaxy Toolshed](https://toolshed.g2.bx.psu.edu/).\n\nProjects such as [Biocontainers](http://biocontainers.pro/) aim to wrap up the execution\nenvironment using containers. [Myself](https://github.com/skptic) and [Johan Viklund](https://github.com/viklund)\nwished to piggyback off existing repositories and settled on [Dockstore](https://dockstore.org)\nwhich is an open platform compliant with the [GA4GH](http://genomicsandhealth.org) initiative.\n\nThe majority of tools in Dockstore are written in the CWL and therefore we required a parser\nbetween the CWL CommandLineTool class and Nextflow processes. Johan was able to develop\na parser which generates Nextflow processes for several Dockstore tools.\n\nAs these resources such as Dockstore become mature and standardised, it will be\npossible to automatically generate a _Nextflow Store_ and enable efficient incorporation\nof tools into workflows.\n\n\n\n_Example showing a Nextflow process generated from the Dockstore CWL repository for the tool BAMStats._\n\n### Nextflow pipeline for de novo assembly of nanopore reads\n\n[Nanopore sequencing](https://en.wikipedia.org/wiki/Nanopore_sequencing) is an exciting\nand emerging technology which promises to change the landscape of nucleotide sequencing.\n\nWith keen interest in Nanopore specific pipelines, [Hadrien Gourlé](https://github.com/HadrienG)\nlead the hackathon project for _Nanoflow_.\n\n[Nanoflow](https://github.com/HadrienG/nanoflow) is a de novo assembler of bacterials genomes\nfrom nanopore reads using Nextflow.\n\nDuring the two days the participants developed the pipeline for adapter trimming as well\nas assembly and consensus sequence generation using either\n[Canu](https://github.com/marbl/canu) and [Miniasm](https://github.com/lh3/miniasm).\n\nThe future plans are to finalise the pipeline to include a polishing step and a genome\nannotation step.\n\n### Nextflow AWS Batch integration\n\nNextflow already has experimental support for [AWS Batch](https://aws.amazon.com/batch/)\nand the goal of this project proposed by [Francesco Strozzi](https://github.com/fstrozzi)\nwas to improve this support, add features and test the implementation on real world pipelines.\n\nEarlier work from [Paolo Di Tommaso](https://github.com/pditommaso) in the Nextflow\nrepository, highlighted several challenges to using AWS Batch with Nextflow.\n\nThe major obstacle described by [Tim Dudgeon](https://github.com/tdudgeon) was the requirement\nfor each Docker container to 
have a version of the Amazon Web Services Command Line tools\n(aws-cli) installed.\n\nA solution was to install the AWS CLI tools on a custom AWS image that is used by the\nDocker host machine, and then mount the directory that contains the necessary items into\neach of the Docker containers as a volume. Early testing suggests this approach works\nwith the hope of providing a more elegant solution in future iterations.\n\nThe code and documentation for AWS Batch has been prepared and will be tested further\nbefore being rolled into an official Nextflow release in the near future.\n\n### Conclusion\n\nThe event was seen as an overwhelming success and special thanks must be made to all the\nparticipants. As the Nextflow community continues to grow, it would be fantastic to make these types\nmeetings more regular occasions.\n\nIn the meantime we have put together a short video containing some of the highlights\nof the two days.\n\nWe hope to see you all again in Barcelona soon or at new events around the world!\n\n", "images": [], "author": "Evan Floden", "tags": "nextflow,docker,hackathon" @@ -160,7 +160,7 @@ "slug": "2017/nextflow-nature-biotech-paper", "title": "Nextflow published in Nature Biotechnology", "date": "2017-04-12T00:00:00.000Z", - "content": "\nWe are excited to announce the publication of our work _[Nextflow enables reproducible computational workflows](http://rdcu.be/qZVo)_ in Nature Biotechnology.\n\nThe article provides a description of the fundamental components and principles of Nextflow.\nWe illustrate how the unique combination of containers, pipeline sharing and portable\ndeployment provides tangible advantages to researchers wishing to generate reproducible\ncomputational workflows.\n\nReproducibility is a [major challenge](http://www.nature.com/news/reproducibility-1.17552)\nin today's scientific environment. We show how three bioinformatics data analyses produce\ndifferent results when executed on different execution platforms and how Nextflow, along\nwith software containers, can be used to control numerical stability, enabling consistent\nand replicable results across different computing platforms. As complex omics analyses\nenter the clinical setting, ensuring that results remain stable brings on extra importance.\n\nSince its first release three years ago, the Nextflow user base has grown in an organic fashion.\nFrom the beginning it has been our own demands in a workflow tool and those of our users that\nhave driven the development of Nextflow forward. The publication forms an important milestone\nin the project and we would like to extend a warm thank you to all those who have been early\nusers and contributors.\n\nWe kindly ask if you use Nextflow in your own work to cite the following article:\n\n
\nDi Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., & Notredame, C. (2017).\nNextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316–319.\ndoi:10.1038/nbt.3820\n
\n", + "content": "We are excited to announce the publication of our work _[Nextflow enables reproducible computational workflows](http://rdcu.be/qZVo)_ in Nature Biotechnology.\n\nThe article provides a description of the fundamental components and principles of Nextflow.\nWe illustrate how the unique combination of containers, pipeline sharing and portable\ndeployment provides tangible advantages to researchers wishing to generate reproducible\ncomputational workflows.\n\nReproducibility is a [major challenge](http://www.nature.com/news/reproducibility-1.17552)\nin today's scientific environment. We show how three bioinformatics data analyses produce\ndifferent results when executed on different execution platforms and how Nextflow, along\nwith software containers, can be used to control numerical stability, enabling consistent\nand replicable results across different computing platforms. As complex omics analyses\nenter the clinical setting, ensuring that results remain stable brings on extra importance.\n\nSince its first release three years ago, the Nextflow user base has grown in an organic fashion.\nFrom the beginning it has been our own demands in a workflow tool and those of our users that\nhave driven the development of Nextflow forward. The publication forms an important milestone\nin the project and we would like to extend a warm thank you to all those who have been early\nusers and contributors.\n\nWe kindly ask if you use Nextflow in your own work to cite the following article:\n\n
\nDi Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., & Notredame, C. (2017).\n*Nextflow enables reproducible computational workflows.* Nature Biotechnology, 35(4), 316–319.\n[doi:10.1038/nbt.3820](http://www.nature.com/nbt/journal/v35/n4/full/nbt.3820.html)\n
", "images": [], "author": "Paolo Di Tommaso", "tags": "pipelines,nextflow,genomic,workflow,paper" @@ -169,7 +169,7 @@ "slug": "2017/nextflow-workshop", "title": "Nextflow workshop is coming!", "date": "2017-04-26T00:00:00.000Z", - "content": "\nWe are excited to announce the first Nextflow workshop that will take place at the\nBarcelona Biomedical Research Park building ([PRBB](https://www.prbb.org/)) on 14-15th September 2017.\n\nThis event is open to everybody who is interested in the problem of computational workflow\nreproducibility. Leading experts and users will discuss the current state of the Nextflow\ntechnology and how it can be applied to manage -omics analyses in a reproducible manner.\nBest practices will be introduced on how to deploy real-world large-scale genomic\napplications for precision medicine.\n\nDuring the hackathon, organized for the second day, participants will have the\nopportunity to learn how to write self-contained, replicable data analysis\npipelines along with Nextflow expert developers.\n\nMore details at [this link](http://www.crg.eu/en/event/coursescrg-nextflow-reproducible-silico-genomics).\nThe registration form is [available here](http://apps.crg.es/content/internet/events/webforms/17502) (deadline 15th Jun).\n\n### Schedule (draft)\n\n#### Thursday, 14 September\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
10.00Welcome & introduction
\n Cedric Notredame
\n Comparative Bioinformatics, CRG, Spain
10.15Nextflow: a quick review
\n Paolo Di Tommaso
\n Comparative Bioinformatics, CRG, Spain
10.30Standardising Swedish genomics analyses using Nextflow
\n Phil Ewels
\n National Genomics Infrastructure, SciLifeLab, Sweden
\n
11.00Building Pipelines to Support African Bioinformatics: the H3ABioNet Pipelines Project
\n Scott Hazelhurst
\n University of the Witwatersrand, Johannesburg, South Africa
\n
11.30coffee break\n
12.00Using Nextflow for Large Scale Benchmarking of Phylogenetic methods and tools
\n Frédéric Lemoine
\n Evolutionary Bioinformatics, Institut Pasteur, France
\n
12.30Nextflow for chemistry - crossing the divide
\n Tim Dudgeon
\n Informatics Matters Ltd, UK
\n
12.50From zero to Nextflow @ CRG's Biocore
\n Luca Cozzuto
\n Bioinformatics Core Facility, CRG, Spain
\n
13.10(to be determined)
13.30Lunch
14.30
18.30
Hackathon & course
\n\n#### Friday, 15 September\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
9.30Computational workflows for omics analyses at the IARC
\n Matthieu Foll
\n International Agency for Research on Cancer (IARC), France
10.00Medical Genetics at Oslo University Hospital
\n Hugues Fontanelle
\n Oslo University Hospital, Norway
10.30Inside-Out: reproducible analysis of external data, inside containers with Nextflow
\n Evan Floden
\n Comparative Bioinformatics, CRG, Spain
11.00coffee break
11.30(title to be defined)
\n Johnny Wu
\n Roche Sequencing, Pleasanton, USA
12.00Standardizing life sciences datasets to improve studies reproducibility in the EOSC
\n Jordi Rambla
\n European Genome-Phenome Archive, CRG
12.20Unbounded by Economics
\n Brendan Bouffler
\n AWS Research Cloud Program, UK
12.40Challenges with large-scale portable computational workflows
\n Paolo Di Tommaso
\n Comparative Bioinformatics, CRG, Spain
13.00Lunch
14.00
18.00
Hackathon
\n\n
\nSee you in Barcelona!\n\n![Nextflow workshop](/img/nf-workshop.png)\n", + "content": "We are excited to announce the first Nextflow workshop that will take place at the\nBarcelona Biomedical Research Park building ([PRBB](https://www.prbb.org/)) on 14-15th September 2017.\n\nThis event is open to everybody who is interested in the problem of computational workflow\nreproducibility. Leading experts and users will discuss the current state of the Nextflow\ntechnology and how it can be applied to manage -omics analyses in a reproducible manner.\nBest practices will be introduced on how to deploy real-world large-scale genomic\napplications for precision medicine.\n\nDuring the hackathon, organized for the second day, participants will have the\nopportunity to learn how to write self-contained, replicable data analysis\npipelines along with Nextflow expert developers.\n\nMore details at [this link](http://www.crg.eu/en/event/coursescrg-nextflow-reproducible-silico-genomics).\nThe registration form is [available here](http://apps.crg.es/content/internet/events/webforms/17502) (deadline 15th Jun).\n\n### Schedule (draft)\n\n#### Thursday, 14 September\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
10.00Welcome & introduction
\n *Cedric Notredame
\n Comparative Bioinformatics, CRG, Spain*
10.15Nextflow: a quick review
\n *Paolo Di Tommaso
\n Comparative Bioinformatics, CRG, Spain*
10.30Standardising Swedish genomics analyses using Nextflow
\n *Phil Ewels
\n National Genomics Infrastructure, SciLifeLab, Sweden*\n
11.00Building Pipelines to Support African Bioinformatics: the H3ABioNet Pipelines Project
\n *Scott Hazelhurst
\n University of the Witwatersrand, Johannesburg, South Africa*\n
11.30coffee break\n
12.00Using Nextflow for Large Scale Benchmarking of Phylogenetic methods and tools
\n *Frédéric Lemoine
\n Evolutionary Bioinformatics, Institut Pasteur, France*\n
12.30Nextflow for chemistry - crossing the divide
\n *Tim Dudgeon
\n Informatics Matters Ltd, UK*\n
12.50From zero to Nextflow @ CRG's Biocore
\n *Luca Cozzuto
\n Bioinformatics Core Facility, CRG, Spain*\n
13.10(to be determined)
13.30Lunch
14.30
18.30
Hackathon & course
\n\n#### Friday, 15 September\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
9.30Computational workflows for omics analyses at the IARC
\n *Matthieu Foll
\n International Agency for Research on Cancer (IARC), France*
10.00Medical Genetics at Oslo University Hospital
\n *Hugues Fontanelle
\n Oslo University Hospital, Norway*
10.30Inside-Out: reproducible analysis of external data, inside containers with Nextflow
\n *Evan Floden
\n Comparative Bioinformatics, CRG, Spain*
11.00coffee break
11.30(title to be defined)
\n *Johnny Wu
\n Roche Sequencing, Pleasanton, USA*
12.00Standardizing life sciences datasets to improve studies reproducibility in the EOSC
\n *Jordi Rambla
\n European Genome-Phenome Archive, CRG*
12.20Unbounded by Economics
\n *Brendan Bouffler
\n AWS Research Cloud Program, UK*
12.40Challenges with large-scale portable computational workflows
\n *Paolo Di Tommaso
\n Comparative Bioinformatics, CRG, Spain*
13.00Lunch
14.00
18.00
Hackathon
\n\n
\nSee you in Barcelona!\n\n![Nextflow workshop](/img/nf-workshop.png)", "images": [], "author": "Paolo Di Tommaso", "tags": "nextflow,genomic,workflow,reproducibility,workshop," @@ -178,7 +178,7 @@ "slug": "2017/scaling-with-aws-batch", "title": "Scaling with AWS Batch", "date": "2017-11-08T00:00:00.000Z", - "content": "\nThe latest Nextflow release (0.26.0) includes built-in support for [AWS Batch](https://aws.amazon.com/batch/),\na managed computing service that allows the execution of containerised workloads\nover the Amazon EC2 Container Service (ECS).\n\nThis feature allows the seamless deployment of Nextflow pipelines in the cloud by offloading\nthe process executions as managed Batch jobs. The service takes care to spin up the required\ncomputing instances on-demand, scaling up and down the number and composition of the instances\nto best accommodate the actual workload resource needs at any point in time.\n\nAWS Batch shares with Nextflow the same vision regarding workflow containerisation\ni.e. each compute task is executed in its own Docker container. This dramatically\nsimplifies the workflow deployment through the download of a few container images.\nThis common design background made the support for AWS Batch a natural extension for Nextflow.\n\n### Batch in a nutshell\n\nBatch is organised in _Compute Environments_, _Job queues_, _Job definitions_ and _Jobs_.\n\nThe _Compute Environment_ allows you to define the computing resources required for a specific workload (type).\nYou can specify the minimum and maximum number of CPUs that can be allocated,\nthe EC2 provisioning model (On-demand or Spot), the AMI to be used and the allowed instance types.\n\nThe _Job queue_ definition allows you to bind a specific task to one or more Compute Environments.\n\nThen, the _Job definition_ is a template for one or more jobs in your workload. This is required\nto specify the Docker image to be used in running a particular task along with other requirements\nsuch as the container mount points, the number of CPUs, the amount of memory and the number of\nretries in case of job failure.\n\nFinally the _Job_ binds a Job definition to a specific Job queue\nand allows you to specify the actual task command to be executed in the container.\n\nThe job input and output data management is delegated to the user. This means that if you\nonly use Batch API/tools you will need to take care to stage the input data from a S3 bucket\n(or a different source) and upload the results to a persistent storage location.\n\nThis could turn out to be cumbersome in complex workflows with a large number of\ntasks and above all it makes it difficult to deploy the same applications across different\ninfrastructure.\n\n### How to use Batch with Nextflow\n\nNextflow streamlines the use of AWS Batch by smoothly integrating it in its workflow processing\nmodel and enabling transparent interoperability with other systems.\n\nTo run Nextflow you will need to set-up in your AWS Batch account a [Compute Environment](http://docs.aws.amazon.com/batch/latest/userguide/compute_environments.html)\ndefining the required computing resources and associate it to a [Job Queue](http://docs.aws.amazon.com/batch/latest/userguide/job_queues.html).\n\nNextflow takes care to create the required _Job Definitions_ and _Job_ requests as needed.\nThis spares some Batch configurations steps.\n\nIn the `nextflow.config`, file specify the `awsbatch` executor, the Batch `queue` and\nthe container to be used in the usual manner. 
You may also need to specify the AWS region\nand access credentials if they are not provided by other means. For example:\n\n process.executor = 'awsbatch'\n process.queue = 'my-batch-queue'\n process.container = your-org/your-docker:image\n aws.region = 'eu-west-1'\n aws.accessKey = 'xxx'\n aws.secretKey = 'yyy'\n\nEach process can eventually use a different queue and Docker image (see Nextflow documentation for details).\nThe container image(s) must be published in a Docker registry that is accessible from the\ninstances run by AWS Batch eg. [Docker Hub](https://hub.docker.com/), [Quay](https://quay.io/)\nor [ECS Container Registry](https://aws.amazon.com/ecr/).\n\nThe Nextflow process can be launched either in a local computer or a EC2 instance.\nThe latter is suggested for heavy or long running workloads.\n\nNote that input data should be stored in the S3 storage. In the same manner\nthe pipeline execution must specify a S3 bucket as a working directory by using the `-w` command line option.\n\nA final caveat about custom containers and computing AMI. Nextflow automatically stages input\ndata and shares tasks intermediate results by using the S3 bucket specified as a work directory.\nFor this reason it needs to use the `aws` command line tool which must be installed either\nin your process container or be present in a custom AMI that can be mounted and accessed\nby the Docker containers.\n\nYou may also need to create a custom AMI because the default image used by AWS Batch only\nprovides 22 GB of storage which may not be enough for real world analysis pipelines.\n\nSee the documentation to learn [how to create a custom AMI](/docs/latest/awscloud.html#custom-ami)\nwith larger storage and how to setup the AWS CLI tools.\n\n### An example\n\nIn order to validate Nextflow integration with AWS Batch, we used a simple RNA-Seq pipeline.\n\nThis pipeline takes as input a metadata file from the Encode project corresponding to a [search\nreturning all human RNA-seq paired-end datasets](https://www.encodeproject.org/search/?type=Experiment&award.project=ENCODE&replicates.library.biosample.donor.organism.scientific_name=Homo+sapiens&files.file_type=fastq&files.run_type=paired-ended&replicates.library.nucleic_acid_term_name=RNA&replicates.library.depleted_in_term_name=rRNA)\n(the metadata file has been additionally filtered to retain only data having a SRA ID).\n\nThe pipeline automatically downloads the FASTQ files for each sample from the EBI ENA database,\nit assesses the overall quality of sequencing data using FastQC and then runs [Salmon](https://combine-lab.github.io/salmon/)\nto perform the quantification over the human transcript sequences. 
Finally all the QC and\nquantification outputs are summarised using the [MultiQC](http://multiqc.info/) tool.\n\nFor the sake of this benchmark we used the first 38 samples out of the full 375 samples dataset.\n\nThe pipeline was executed both on AWS Batch cloud and in the CRG internal Univa cluster,\nusing [Singularity](/blog/2016/more-fun-containers-hpc.html) as containers runtime.\n\nIt's worth noting that with the exception of the two configuration changes detailed below,\nwe used exactly the same pipeline implementation at [this GitHub repository](https://github.com/nextflow-io/rnaseq-encode-nf).\n\nThe AWS deploy used the following configuration profile:\n\n aws.region = 'eu-west-1'\n aws.client.storageEncryption = 'AES256'\n process.queue = 'large'\n executor.name = 'awsbatch'\n executor.awscli = '/home/ec2-user/miniconda/bin/aws'\n\nWhile for the cluster deployment the following configuration was used:\n\n executor = 'crg'\n singularity.enabled = true\n process.container = \"docker://nextflow/rnaseq-nf\"\n process.queue = 'cn-el7'\n process.time = '90 min'\n process.$quant.time = '4.5 h'\n\n### Results\n\nThe AWS Batch Compute environment was configured to use a maximum of 132 CPUs as the number of CPUs\nthat were available in the queue for local cluster deployment.\n\nThe two executions ran in roughly the same time: 2 hours and 24 minutes when running in the\nCRG cluster and 2 hours and 37 minutes when using AWS Batch.\n\nIt must be noted that 14 jobs failed in the Batch deployment, presumably because one or more spot\ninstances were retired. However Nextflow was able to re-schedule the failed jobs automatically\nand the overall pipeline execution completed successfully, also showing the benefits of a truly\nfault tolerant environment.\n\nThe overall cost for running the pipeline with AWS Batch was **$5.47** ($ 3.28 for EC2 instances,\n$1.88 for EBS volume and $0.31 for S3 storage). This means that with ~ $55 we could have\nperformed the same analysis on the full Encode dataset.\n\nIt is more difficult to estimate the cost when using the internal cluster, because we don't\nhave access to such detailed cost accounting. However, as a user, we can estimate it roughly\ncomes out at $0.01 per CPU-Hour. The pipeline needed around 147 CPU-Hour to carry out the analysis,\nhence with an estimated cost of **$1.47** just for the computation.\n\nThe execution report for the Batch execution is available at [this link](https://cdn.rawgit.com/nextflow-io/rnaseq-encode-nf/db303a81/benchmark/aws-batch/report.html)\nand the one for cluster is available [here](https://cdn.rawgit.com/nextflow-io/rnaseq-encode-nf/db303a81/benchmark/crg-cluster/report.html).\n\n### Conclusion\n\nThis post shows how Nextflow integrates smoothly with AWS Batch and how it can be used to\ndeploy and execute real world genomics pipeline in the cloud with ease.\n\nThe auto-scaling ability provided by AWS Batch along with the use of spot instances make\nthe use of the cloud even more cost effective. 
Running on a local cluster may still be cheaper,\neven if it is non trivial to account for all the real costs of a HPC infrastructure.\nHowever the cloud allows flexibility and scalability not possible with common on-premises clusters.\n\nWe also demonstrate how the same Nextflow pipeline can be _transparently_ deployed in two very\ndifferent computing infrastructure, using different containerisation technologies by simply\nproviding a separate configuration profile.\n\nThis approach enables the interoperability across different deployment sites, reduces\noperational and maintenance costs and guarantees consistent results over time.\n\n### Credits\n\nThis post is co-authored with [Francesco Strozzi](https://twitter.com/fstrozzi),\nwho also helped to write the pipeline used for the benchmark in this post and contributed\nto and tested the AWS Batch integration. Thanks to [Emilio Palumbo](https://github.com/emi80)\nthat helped to set-up and configure the AWS Batch environment and [Evan Floden](https://gitter.im/skptic)\nfor the comments.\n", + "content": "The latest Nextflow release (0.26.0) includes built-in support for [AWS Batch](https://aws.amazon.com/batch/),\na managed computing service that allows the execution of containerised workloads\nover the Amazon EC2 Container Service (ECS).\n\nThis feature allows the seamless deployment of Nextflow pipelines in the cloud by offloading\nthe process executions as managed Batch jobs. The service takes care to spin up the required\ncomputing instances on-demand, scaling up and down the number and composition of the instances\nto best accommodate the actual workload resource needs at any point in time.\n\nAWS Batch shares with Nextflow the same vision regarding workflow containerisation\ni.e. each compute task is executed in its own Docker container. This dramatically\nsimplifies the workflow deployment through the download of a few container images.\nThis common design background made the support for AWS Batch a natural extension for Nextflow.\n\n### Batch in a nutshell\n\nBatch is organised in _Compute Environments_, _Job queues_, _Job definitions_ and _Jobs_.\n\nThe _Compute Environment_ allows you to define the computing resources required for a specific workload (type).\nYou can specify the minimum and maximum number of CPUs that can be allocated,\nthe EC2 provisioning model (On-demand or Spot), the AMI to be used and the allowed instance types.\n\nThe _Job queue_ definition allows you to bind a specific task to one or more Compute Environments.\n\nThen, the _Job definition_ is a template for one or more jobs in your workload. This is required\nto specify the Docker image to be used in running a particular task along with other requirements\nsuch as the container mount points, the number of CPUs, the amount of memory and the number of\nretries in case of job failure.\n\nFinally the _Job_ binds a Job definition to a specific Job queue\nand allows you to specify the actual task command to be executed in the container.\n\nThe job input and output data management is delegated to the user. 
This means that if you\nonly use the Batch API/tools you will need to take care of staging the input data from an S3 bucket\n(or a different source) and uploading the results to a persistent storage location.\n\nThis could turn out to be cumbersome in complex workflows with a large number of\ntasks and, above all, it makes it difficult to deploy the same applications across different\ninfrastructures.\n\n### How to use Batch with Nextflow\n\nNextflow streamlines the use of AWS Batch by smoothly integrating it into its workflow processing\nmodel and enabling transparent interoperability with other systems.\n\nTo run Nextflow you will need to set up in your AWS Batch account a [Compute Environment](http://docs.aws.amazon.com/batch/latest/userguide/compute_environments.html)\ndefining the required computing resources, and associate it with a [Job Queue](http://docs.aws.amazon.com/batch/latest/userguide/job_queues.html).\n\nNextflow takes care of creating the required _Job Definitions_ and _Job_ requests as needed.\nThis spares you some Batch configuration steps.\n\nIn the `nextflow.config` file, specify the `awsbatch` executor, the Batch `queue` and\nthe container to be used in the usual manner. You may also need to specify the AWS region\nand access credentials if they are not provided by other means. For example:\n\n    process.executor = 'awsbatch'\n    process.queue = 'my-batch-queue'\n    process.container = 'your-org/your-docker:image'\n    aws.region = 'eu-west-1'\n    aws.accessKey = 'xxx'\n    aws.secretKey = 'yyy'\n\nEach process can optionally use a different queue and Docker image (see the Nextflow documentation for details).\nThe container image(s) must be published in a Docker registry that is accessible from the\ninstances run by AWS Batch, e.g. [Docker Hub](https://hub.docker.com/), [Quay](https://quay.io/)\nor [ECS Container Registry](https://aws.amazon.com/ecr/).\n\nThe Nextflow process can be launched either on a local computer or on an EC2 instance.\nThe latter is suggested for heavy or long-running workloads.\n\nNote that input data should be stored in S3 storage. In the same manner,\nthe pipeline execution must specify an S3 bucket as the working directory by using the `-w` command line option.\n\nA final caveat about custom containers and the computing AMI. 
Nextflow automatically stages input\ndata and shares tasks intermediate results by using the S3 bucket specified as a work directory.\nFor this reason it needs to use the `aws` command line tool which must be installed either\nin your process container or be present in a custom AMI that can be mounted and accessed\nby the Docker containers.\n\nYou may also need to create a custom AMI because the default image used by AWS Batch only\nprovides 22 GB of storage which may not be enough for real world analysis pipelines.\n\nSee the documentation to learn [how to create a custom AMI](/docs/latest/awscloud.html#custom-ami)\nwith larger storage and how to setup the AWS CLI tools.\n\n### An example\n\nIn order to validate Nextflow integration with AWS Batch, we used a simple RNA-Seq pipeline.\n\nThis pipeline takes as input a metadata file from the Encode project corresponding to a [search\nreturning all human RNA-seq paired-end datasets](https://www.encodeproject.org/search/?type=Experiment&award.project=ENCODE&replicates.library.biosample.donor.organism.scientific_name=Homo+sapiens&files.file_type=fastq&files.run_type=paired-ended&replicates.library.nucleic_acid_term_name=RNA&replicates.library.depleted_in_term_name=rRNA)\n(the metadata file has been additionally filtered to retain only data having a SRA ID).\n\nThe pipeline automatically downloads the FASTQ files for each sample from the EBI ENA database,\nit assesses the overall quality of sequencing data using FastQC and then runs [Salmon](https://combine-lab.github.io/salmon/)\nto perform the quantification over the human transcript sequences. Finally all the QC and\nquantification outputs are summarised using the [MultiQC](http://multiqc.info/) tool.\n\nFor the sake of this benchmark we used the first 38 samples out of the full 375 samples dataset.\n\nThe pipeline was executed both on AWS Batch cloud and in the CRG internal Univa cluster,\nusing [Singularity](/blog/2016/more-fun-containers-hpc.html) as containers runtime.\n\nIt's worth noting that with the exception of the two configuration changes detailed below,\nwe used exactly the same pipeline implementation at [this GitHub repository](https://github.com/nextflow-io/rnaseq-encode-nf).\n\nThe AWS deploy used the following configuration profile:\n\n aws.region = 'eu-west-1'\n aws.client.storageEncryption = 'AES256'\n process.queue = 'large'\n executor.name = 'awsbatch'\n executor.awscli = '/home/ec2-user/miniconda/bin/aws'\n\nWhile for the cluster deployment the following configuration was used:\n\n executor = 'crg'\n singularity.enabled = true\n process.container = \"docker://nextflow/rnaseq-nf\"\n process.queue = 'cn-el7'\n process.time = '90 min'\n process.$quant.time = '4.5 h'\n\n### Results\n\nThe AWS Batch Compute environment was configured to use a maximum of 132 CPUs as the number of CPUs\nthat were available in the queue for local cluster deployment.\n\nThe two executions ran in roughly the same time: 2 hours and 24 minutes when running in the\nCRG cluster and 2 hours and 37 minutes when using AWS Batch.\n\nIt must be noted that 14 jobs failed in the Batch deployment, presumably because one or more spot\ninstances were retired. However Nextflow was able to re-schedule the failed jobs automatically\nand the overall pipeline execution completed successfully, also showing the benefits of a truly\nfault tolerant environment.\n\nThe overall cost for running the pipeline with AWS Batch was **$5.47** ($ 3.28 for EC2 instances,\n$1.88 for EBS volume and $0.31 for S3 storage). 
This means that with ~ $55 we could have\nperformed the same analysis on the full Encode dataset.\n\nIt is more difficult to estimate the cost when using the internal cluster, because we don't\nhave access to such detailed cost accounting. However, as a user, we can estimate it roughly\ncomes out at $0.01 per CPU-Hour. The pipeline needed around 147 CPU-Hour to carry out the analysis,\nhence with an estimated cost of **$1.47** just for the computation.\n\nThe execution report for the Batch execution is available at [this link](https://cdn.rawgit.com/nextflow-io/rnaseq-encode-nf/db303a81/benchmark/aws-batch/report.html)\nand the one for cluster is available [here](https://cdn.rawgit.com/nextflow-io/rnaseq-encode-nf/db303a81/benchmark/crg-cluster/report.html).\n\n### Conclusion\n\nThis post shows how Nextflow integrates smoothly with AWS Batch and how it can be used to\ndeploy and execute real world genomics pipeline in the cloud with ease.\n\nThe auto-scaling ability provided by AWS Batch along with the use of spot instances make\nthe use of the cloud even more cost effective. Running on a local cluster may still be cheaper,\neven if it is non trivial to account for all the real costs of a HPC infrastructure.\nHowever the cloud allows flexibility and scalability not possible with common on-premises clusters.\n\nWe also demonstrate how the same Nextflow pipeline can be _transparently_ deployed in two very\ndifferent computing infrastructure, using different containerisation technologies by simply\nproviding a separate configuration profile.\n\nThis approach enables the interoperability across different deployment sites, reduces\noperational and maintenance costs and guarantees consistent results over time.\n\n### Credits\n\nThis post is co-authored with [Francesco Strozzi](https://twitter.com/fstrozzi),\nwho also helped to write the pipeline used for the benchmark in this post and contributed\nto and tested the AWS Batch integration. Thanks to [Emilio Palumbo](https://github.com/emi80)\nthat helped to set-up and configure the AWS Batch environment and [Evan Floden](https://gitter.im/skptic)\nfor the comments.", "images": [], "author": "Paolo Di Tommaso", "tags": "pipelines,nextflow,genomic,workflow,aws,batch" @@ -187,7 +187,7 @@ "slug": "2018/bringing-nextflow-to-google-cloud-wuxinextcode", "title": "Bringing Nextflow to Google Cloud Platform with WuXi NextCODE", "date": "2018-12-18T00:00:00.000Z", - "content": "\n
\nThis is a guest post authored by Halli Bjornsson, Head of Product Development Operations at WuXi NextCODE and Jonathan Sheffi, Product Manager, Biomedical Data at Google Cloud.\n\n
\n\nGoogle Cloud and WuXi NextCODE are dedicated to advancing the state of the art in biomedical informatics, especially through open source, which allows developers to collaborate broadly and deeply.\n\nWuXi NextCODE is itself a user of Nextflow, and Google Cloud has many customers that use Nextflow. Together, we’ve collaborated to deliver Google Cloud Platform (GCP) support for Nextflow using the [Google Pipelines API](https://cloud.google.com/genomics/pipelines). Pipelines API is a managed computing service that allows the execution of containerized workloads on GCP.\n\n
\n
\n \n
\n
\n \n
\n
\n\n\nNextflow now provides built-in support for Google Pipelines API which allows the seamless deployment of a Nextflow pipeline in the cloud, offloading the process executions as pipelines running on Google's scalable infrastructure with a few commands. This makes it even easier for customers and partners like WuXi NextCODE to process biomedical data using Google Cloud.\n\n### Get started!\n\nThis feature is currently available in the Nextflow edge channel. Follow these steps to get started:\n\n- Install Nextflow from the edge channel exporting the variables shown below and then running the usual Nextflow installer Bash snippet:\n\n ```\n export NXF_VER=18.12.0-edge\n export NXF_MODE=google\n curl https://get.nextflow.io | bash\n ```\n\n- [Enable the Google Genomics API for your GCP projects](https://console.cloud.google.com/flows/enableapi?apiid=genomics.googleapis.com,compute.googleapis.com,storage-api.googleapis.com).\n\n- [Download and set credentials for your Genomics API-enabled project](https://cloud.google.com/docs/authentication/production#obtaining_and_providing_service_account_credentials_manually).\n\n- Change your `nextflow.config` file to use the Google Pipelines executor and specify the required config values for it as [described in the documentation](/docs/edge/google.html#google-pipelines).\n\n- Finally, run your script with Nextflow like usual, specifying a Google Storage bucket as the pipeline work directory with the `-work-dir` option. For example:\n\n ```\n nextflow run rnaseq-nf -work-dir gs://your-bucket/scratch\n ```\n\n
\nYou can find more detailed info about available configuration settings and deployment options at [this link](/docs/edge/google.html).\n\nWe’re thrilled to make this contribution available to the Nextflow community!\n", + "content": "
\n*This is a guest post authored by Halli Bjornsson, Head of Product Development Operations at WuXi NextCODE, and Jonathan Sheffi, Product Manager, Biomedical Data at Google Cloud.*\n
\n\nGoogle Cloud and WuXi NextCODE are dedicated to advancing the state of the art in biomedical informatics, especially through open source, which allows developers to collaborate broadly and deeply.\n\nWuXi NextCODE is itself a user of Nextflow, and Google Cloud has many customers that use Nextflow. Together, we’ve collaborated to deliver Google Cloud Platform (GCP) support for Nextflow using the [Google Pipelines API](https://cloud.google.com/genomics/pipelines). Pipelines API is a managed computing service that allows the execution of containerized workloads on GCP.\n\n
\n
\n \n
\n
\n \n
\n
\n\n\nNextflow now provides built-in support for Google Pipelines API which allows the seamless deployment of a Nextflow pipeline in the cloud, offloading the process executions as pipelines running on Google's scalable infrastructure with a few commands. This makes it even easier for customers and partners like WuXi NextCODE to process biomedical data using Google Cloud.\n\n### Get started!\n\nThis feature is currently available in the Nextflow edge channel. Follow these steps to get started:\n\n- Install Nextflow from the edge channel exporting the variables shown below and then running the usual Nextflow installer Bash snippet:\n\n ```\n export NXF_VER=18.12.0-edge\n export NXF_MODE=google\n curl https://get.nextflow.io | bash\n ```\n\n- [Enable the Google Genomics API for your GCP projects](https://console.cloud.google.com/flows/enableapi?apiid=genomics.googleapis.com,compute.googleapis.com,storage-api.googleapis.com).\n\n- [Download and set credentials for your Genomics API-enabled project](https://cloud.google.com/docs/authentication/production#obtaining_and_providing_service_account_credentials_manually).\n\n- Change your `nextflow.config` file to use the Google Pipelines executor and specify the required config values for it as [described in the documentation](/docs/edge/google.html#google-pipelines).\n\n- Finally, run your script with Nextflow like usual, specifying a Google Storage bucket as the pipeline work directory with the `-work-dir` option. For example:\n\n ```\n nextflow run rnaseq-nf -work-dir gs://your-bucket/scratch\n ```\n\n
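For orientation, the configuration step above usually boils down to a few lines in `nextflow.config`. The sketch below is only an illustration: the executor name and the `google` scope settings are assumptions based on the edge documentation of the time, and the project id and zone are hypothetical placeholders, so check the linked docs for the exact names supported by your Nextflow version:\n\n```\n// Illustrative sketch only: setting names are assumptions, verify against the Google Pipelines docs\nprocess.executor  = 'google-pipelines'      // offload task executions to the Pipelines API\nprocess.container = 'your-org/your-image'   // hypothetical container image used by the processes\ngoogle.project    = 'your-gcp-project-id'   // hypothetical GCP project id\ngoogle.zone       = 'europe-west1-b'        // zone where the compute instances are created\n```\n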
\nYou can find more detailed info about available configuration settings and deployment options at [this link](/docs/edge/google.html).\n\nWe’re thrilled to make this contribution available to the Nextflow community!", "images": [ "/img/google-cloud.svg", "/img/wuxi-nextcode.jpeg" @@ -199,7 +199,7 @@ "slug": "2018/clarification-about-nextflow-license", "title": "Clarification about the Nextflow license", "date": "2018-07-20T00:00:00.000Z", - "content": "\nOver past week there was some discussion on social media regarding the Nextflow license\nand its impact on users' workflow applications.\n\n

… don’t use Nextflow, yo. https://t.co/Paip5W1wgG

— Konrad Rudolph 👨‍🔬💻 (@klmr) July 10, 2018
\n\n\n

This is certainly disappointing. An argument in favor of writing workflows in @commonwl, which is independent of the execution engine. https://t.co/mIbdLQQxmf

— John Didion (@jdidion) July 10, 2018
\n\n\n

GPL is generally considered toxic to companies due to fear of the viral nature of the license.

— Jeff Gentry (@geoffjentry) July 10, 2018
\n\n\n### What's the problem with GPL?\n\nNextflow has been released under the GPLv3 license since its early days [over 5 years ago](https://github.com/nextflow-io/nextflow/blob/c080150321e5000a2c891e477bb582df07b7f75f/src/main/groovy/nextflow/Nextflow.groovy).\nGPL is a very popular open source licence used by many projects\n(like, for example, [Linux](https://www.kernel.org/doc/html/v4.17/process/license-rules.html) and [Git](https://git-scm.com/about/free-and-open-source))\nand it has been designed to promote the adoption and spread of open source software and culture.\n\nWith this idea in mind, GPL requires the author of a piece of software, _derived_ from a GPL licensed application or library, to distribute it using the same license i.e. GPL itself.\n\nThis is generally good, because this requirement incentives the growth of the open source ecosystem and the adoption of open source software more widely.\n\nHowever, this is also a reason for concern by some users and organizations because it's perceived as too strong requirement by copyright holders (who may not want to disclose their code) and because it can be difficult to interpret what a \\*derived\\* application is. See for example\n[this post by Titus Brown](http://ivory.idyll.org/blog/2015-on-licensing-in-bioinformatics.html) at this regard.\n\n#### What's the impact of the Nextflow license on my application?\n\nIf you are not distributing your application, based on Nextflow, it doesn't affect you in any way.\nIf you are distributing an application that requires Nextflow to be executed, technically speaking your application is dynamically linking to the Nextflow runtime and it uses routines provided by it. For this reason your application should be released as GPLv3. See [here](https://www.gnu.org/licenses/gpl-faq.en.html#GPLStaticVsDynamic) and [here](https://www.gnu.org/licenses/gpl-faq.en.html#IfInterpreterIsGPL).\n\nHowever, this was not our original intention. We don’t consider workflow applications to be subject to the GPL copyleft obligations of the GPL even though they may link dynamically to Nextflow functionality through normal calls and we are not interested to enforce the license requirement to third party workflow developers and organizations. Therefore you can distribute your workflow application using the license of your choice. For other kind of derived applications the GPL license should be used, though.\n\n\n### That's all?\n\nNo. We are aware that this is not enough and the GPL licence can impose some limitation in the usage of Nextflow to some users and organizations. For this reason we are working with the CRG legal department to move Nextflow to a more permissive open source license. This is primarily motivated by our wish to make it more adaptable and compatible with all the different open source ecosystems, but also to remove any remaining legal uncertainty that using Nextflow through linking with its functionality may cause.\n\nWe are expecting that this decision will be made over the summer so stay tuned and continue to enjoy Nextflow.\n", + "content": "Over past week there was some discussion on social media regarding the Nextflow license\nand its impact on users' workflow applications.\n\n> … don’t use Nextflow, yo. [https://t.co/Paip5W1wgG](https://t.co/Paip5W1wgG)\n> \n> — Konrad Rudolph 👨‍🔬💻 (@klmr) [July 10, 2018](https://twitter.com/klmr/status/1016606226103357440?ref_src=twsrc%5Etfw)\n\n\n\n> This is certainly disappointing. 
An argument in favor of writing workflows in [@commonwl](https://twitter.com/commonwl?ref_src=twsrc%5Etfw), which is independent of the execution engine. [https://t.co/mIbdLQQxmf](https://t.co/mIbdLQQxmf)\n> \n> — John Didion (@jdidion) [July 10, 2018](https://twitter.com/jdidion/status/1016612435938160640?ref_src=twsrc%5Etfw)\n\n\n\n> GPL is generally considered toxic to companies due to fear of the viral nature of the license.\n> \n> — Jeff Gentry (@geoffjentry) [July 10, 2018](https://twitter.com/geoffjentry/status/1016656901139025921?ref_src=twsrc%5Etfw)\n\n\n\n### What's the problem with GPL?\n\nNextflow has been released under the GPLv3 license since its early days [over 5 years ago](https://github.com/nextflow-io/nextflow/blob/c080150321e5000a2c891e477bb582df07b7f75f/src/main/groovy/nextflow/Nextflow.groovy).\nGPL is a very popular open source licence used by many projects\n(like, for example, [Linux](https://www.kernel.org/doc/html/v4.17/process/license-rules.html) and [Git](https://git-scm.com/about/free-and-open-source))\nand it has been designed to promote the adoption and spread of open source software and culture.\n\nWith this idea in mind, GPL requires the author of a piece of software _derived_ from a GPL-licensed application or library to distribute it using the same license, i.e. GPL itself.\n\nThis is generally good, because this requirement incentivizes the growth of the open source ecosystem and the adoption of open source software more widely.\n\nHowever, this is also a reason for concern for some users and organizations because it's perceived as too strong a requirement by copyright holders (who may not want to disclose their code) and because it can be difficult to interpret what a \\*derived\\* application is. See for example\n[this post by Titus Brown](http://ivory.idyll.org/blog/2015-on-licensing-in-bioinformatics.html) in this regard.\n\n#### What's the impact of the Nextflow license on my application?\n\nIf you are not distributing your Nextflow-based application, it doesn't affect you in any way.\nIf you are distributing an application that requires Nextflow to be executed, technically speaking your application is dynamically linking to the Nextflow runtime and it uses routines provided by it. For this reason your application should be released as GPLv3. See [here](https://www.gnu.org/licenses/gpl-faq.en.html#GPLStaticVsDynamic) and [here](https://www.gnu.org/licenses/gpl-faq.en.html#IfInterpreterIsGPL).\n\n**However, this was not our original intention. We don’t consider workflow applications to be subject to the copyleft obligations of the GPL even though they may link dynamically to Nextflow functionality through normal calls, and we are not interested in enforcing the license requirement against third-party workflow developers and organizations. Therefore you can distribute your workflow application using the license of your choice. For other kinds of derived applications the GPL license should be used, though.**\n\n### That's all?\n\nNo. We are aware that this is not enough and that the GPL licence can impose some limitations on the usage of Nextflow for some users and organizations. For this reason we are working with the CRG legal department to move Nextflow to a more permissive open source license. 
This is primarily motivated by our wish to make it more adaptable and compatible with all the different open source ecosystems, but also to remove any remaining legal uncertainty that using Nextflow through linking with its functionality may cause.\n\nWe are expecting that this decision will be made over the summer so stay tuned and continue to enjoy Nextflow.", "images": [], "author": "Paolo Di Tommaso", "tags": "nextflow,gpl,license" @@ -208,7 +208,7 @@ "slug": "2018/conda-support-has-landed", "title": "Conda support has landed!", "date": "2018-06-05T00:00:00.000Z", - "content": "\nNextflow aims to ease the development of large scale, reproducible workflows allowing\ndevelopers to focus on the main application logic and to rely on best community tools and\nbest practices.\n\nFor this reason we are very excited to announce that the latest Nextflow version (`0.30.0`) finally\nprovides built-in support for [Conda](https://conda.io/docs/).\n\nConda is a popular package manager that simplifies the installation of software packages\nand the configuration of complex software environments. Above all, it provides access to large\ntool and software package collections maintained by domain specific communities such as\n[Bioconda](https://bioconda.github.io) and [BioBuild](https://biobuilds.org/).\n\nThe native integration with Nextflow allows researchers to develop workflow applications\nin a rapid and easy repeatable manner, reusing community tools, whilst taking advantage of the\nconfiguration flexibility, portability and scalability provided by Nextflow.\n\n### How it works\n\nNextflow automatically creates and activates the Conda environment(s) given the dependencies\nspecified by each process.\n\nDependencies are specified by using the [conda](/docs/latest/process.html#conda) directive,\nproviding either the names of the required Conda packages, the path of a Conda environment yaml\nfile or the path of an existing Conda environment directory.\n\nConda environments are stored on the file system. By default Nextflow instructs Conda to save\nthe required environments in the pipeline work directory. You can specify the directory where the\nConda environments are stored using the `conda.cacheDir` configuration property.\n\n#### Use Conda package names\n\nThe simplest way to use one or more Conda packages consists in specifying their names using the `conda` directive.\nMultiple package names can be specified by separating them with a space. For example:\n\n```\nprocess foo {\n conda \"bwa samtools multiqc\"\n\n \"\"\"\n your_command --here\n \"\"\"\n}\n```\n\nUsing the above definition a Conda environment that includes BWA, Samtools and MultiQC tools\nis created and activated when the process is executed.\n\nThe usual Conda package syntax and naming conventions can be used. The version of a package can be\nspecified after the package name as shown here: `bwa=0.7.15`.\n\nThe name of the channel where a package is located can be specified prefixing the package with\nthe channel name as shown here: `bioconda::bwa=0.7.15`.\n\n#### Use Conda environment files\n\nWhen working in a project requiring a large number of dependencies it can be more convenient\nto consolidate all required tools using a Conda environment file. This is a file that\nlists the required packages and channels, structured using the YAML format. 
For example:\n\n```\nname: my-env\nchannels:\n - bioconda\n - conda-forge\n - defaults\ndependencies:\n - star=2.5.4a\n - bwa=0.7.15\n```\n\nThe path of the environment file can be specified using the `conda` directive:\n\n```\nprocess foo {\n conda '/some/path/my-env.yaml'\n\n '''\n your_command --here\n '''\n}\n```\n\nNote: the environment file name **must** end with a `.yml` or `.yaml` suffix otherwise\nit won't be properly recognized. Also relative paths are resolved against the workflow\nlaunching directory.\n\nThe suggested approach is to store the the Conda environment file in your project root directory\nand reference it in the `nextflow.config` directory using the `baseDir` variable as shown below:\n\n```\nprocess.conda = \"$baseDir/my-env.yaml\"\n```\n\nThis guarantees that the environment paths is correctly resolved independently of the execution path.\n\nSee the [documentation](/docs/latest/conda.html) for more details on how to configure and\nuse Conda environments in your Nextflow workflow.\n\n### Bonus!\n\nThis release includes also a better support for [Biocontainers](https://biocontainers.pro/). So far,\nNextflow users were able to use container images provided by the Biocontainers community. However,\nit was not possible to collect process metrics and runtime statistics within those images due to the usage\nof a legacy version of the `ps` system tool that is not compatible with the one expected by Nextflow.\n\nThe latest version of Nextflow does not require the `ps` tool any more to fetch execution metrics\nand runtime statistics, therefore this information is collected and correctly reported when using Biocontainers\nimages.\n\n### Conclusion\n\nWe are very excited by this new feature bringing the ability to use popular Conda tool collections,\nsuch as Bioconda, directly into Nextflow workflow applications.\n\nNextflow developers have now yet another option to transparently manage the dependencies in their\nworkflows along with [Environment Modules](/docs/latest/process.html#module) and [containers](/docs/latest/docker.html)\n[technology](/docs/latest/singularity.html), giving them great configuration flexibility.\n\nThe resulting workflow applications can easily be reconfigured and deployed across a range of different\nplatforms choosing the best technology according to the requirements of the target system.\n", + "content": "Nextflow aims to ease the development of large scale, reproducible workflows allowing\ndevelopers to focus on the main application logic and to rely on best community tools and\nbest practices.\n\nFor this reason we are very excited to announce that the latest Nextflow version (`0.30.0`) finally\nprovides built-in support for [Conda](https://conda.io/docs/).\n\nConda is a popular package manager that simplifies the installation of software packages\nand the configuration of complex software environments. 
Above all, it provides access to large\ntool and software package collections maintained by domain specific communities such as\n[Bioconda](https://bioconda.github.io) and [BioBuild](https://biobuilds.org/).\n\nThe native integration with Nextflow allows researchers to develop workflow applications\nin a rapid and easy repeatable manner, reusing community tools, whilst taking advantage of the\nconfiguration flexibility, portability and scalability provided by Nextflow.\n\n### How it works\n\nNextflow automatically creates and activates the Conda environment(s) given the dependencies\nspecified by each process.\n\nDependencies are specified by using the [conda](/docs/latest/process.html#conda) directive,\nproviding either the names of the required Conda packages, the path of a Conda environment yaml\nfile or the path of an existing Conda environment directory.\n\nConda environments are stored on the file system. By default Nextflow instructs Conda to save\nthe required environments in the pipeline work directory. You can specify the directory where the\nConda environments are stored using the `conda.cacheDir` configuration property.\n\n#### Use Conda package names\n\nThe simplest way to use one or more Conda packages consists in specifying their names using the `conda` directive.\nMultiple package names can be specified by separating them with a space. For example:\n\n```\nprocess foo {\n conda \"bwa samtools multiqc\"\n\n \"\"\"\n your_command --here\n \"\"\"\n}\n```\n\nUsing the above definition a Conda environment that includes BWA, Samtools and MultiQC tools\nis created and activated when the process is executed.\n\nThe usual Conda package syntax and naming conventions can be used. The version of a package can be\nspecified after the package name as shown here: `bwa=0.7.15`.\n\nThe name of the channel where a package is located can be specified prefixing the package with\nthe channel name as shown here: `bioconda::bwa=0.7.15`.\n\n#### Use Conda environment files\n\nWhen working in a project requiring a large number of dependencies it can be more convenient\nto consolidate all required tools using a Conda environment file. This is a file that\nlists the required packages and channels, structured using the YAML format. For example:\n\n```\nname: my-env\nchannels:\n - bioconda\n - conda-forge\n - defaults\ndependencies:\n - star=2.5.4a\n - bwa=0.7.15\n```\n\nThe path of the environment file can be specified using the `conda` directive:\n\n```\nprocess foo {\n conda '/some/path/my-env.yaml'\n\n '''\n your_command --here\n '''\n}\n```\n\nNote: the environment file name **must** end with a `.yml` or `.yaml` suffix otherwise\nit won't be properly recognized. Also relative paths are resolved against the workflow\nlaunching directory.\n\nThe suggested approach is to store the the Conda environment file in your project root directory\nand reference it in the `nextflow.config` directory using the `baseDir` variable as shown below:\n\n```\nprocess.conda = \"$baseDir/my-env.yaml\"\n```\n\nThis guarantees that the environment paths is correctly resolved independently of the execution path.\n\nSee the [documentation](/docs/latest/conda.html) for more details on how to configure and\nuse Conda environments in your Nextflow workflow.\n\n### Bonus!\n\nThis release includes also a better support for [Biocontainers](https://biocontainers.pro/). So far,\nNextflow users were able to use container images provided by the Biocontainers community. 
However,\nit was not possible to collect process metrics and runtime statistics within those images due to the usage\nof a legacy version of the `ps` system tool that is not compatible with the one expected by Nextflow.\n\nThe latest version of Nextflow does not require the `ps` tool any more to fetch execution metrics\nand runtime statistics, therefore this information is collected and correctly reported when using Biocontainers\nimages.\n\n### Conclusion\n\nWe are very excited by this new feature bringing the ability to use popular Conda tool collections,\nsuch as Bioconda, directly into Nextflow workflow applications.\n\nNextflow developers have now yet another option to transparently manage the dependencies in their\nworkflows along with [Environment Modules](/docs/latest/process.html#module) and [containers](/docs/latest/docker.html)\n[technology](/docs/latest/singularity.html), giving them great configuration flexibility.\n\nThe resulting workflow applications can easily be reconfigured and deployed across a range of different\nplatforms choosing the best technology according to the requirements of the target system.", "images": [], "author": "Paolo Di Tommaso", "tags": "nextflow,conda,bioconda" @@ -217,7 +217,7 @@ "slug": "2018/goodbye-zero-hello-apache", "title": "Goodbye zero, Hello Apache!", "date": "2018-10-24T00:00:00.000Z", - "content": "\nToday marks an important milestone in the Nextflow project. We are thrilled to announce three important changes to better meet users’ needs and ground the project on a solid foundation upon which to build a vibrant ecosystem of tools and data analysis applications for genomic research and beyond.\n\n### Apache license\n\nNextflow was originally licensed as GPLv3 open source software more than five years ago. GPL is designed to promote the adoption and spread of open source software and culture. On the other hand it has also some controversial side-effects, such as the one on derivative works and legal implications which make the use of GPL released software a headache in many organisations. We have previously discussed these concerns in this blog post and, after community feedback, have opted to change the project license to Apache 2.0.\n\nThis is a popular permissive free software license written by the Apache Software Foundation (ASF). Software distributed with this license requires the preservation of the copyright notice and disclaimer. It allows the freedom to use the software for any purpose, to distribute it, to modify it, and to distribute modified versions of the software without dictating the licence terms of the resulting applications and derivative works. 
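As a minimal configuration sketch (the `/shared/conda-envs` path below is only an illustrative value), the `conda.cacheDir` property mentioned above can be set once in `nextflow.config`, together with a project-wide environment file:

```
// nextflow.config -- example values only
conda.cacheDir = '/shared/conda-envs'     // reuse Conda environments across runs
process.conda  = "$baseDir/my-env.yaml"   // apply the same environment file to every process
```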
We are sure this licensing model addresses the concerns raised by the Nextflow community and will boost further project developments.\n\n### New release schema\n\nIn the time since Nextflow was open sourced, we have released 150 versions which have been used by many organizations to deploy critical production workflows on a large range of computational platforms and under heavy loads and stress conditions.\n\nFor example, at the Centre for Genomic Regulation (CRG) alone, Nextflow has been used to deploy data intensive computation workflows since 2014, and it has orchestrated the execution of over 12 million jobs totalling 1.4 million CPU-hours.\n\n\"Nextflow\n\nThis extensive use across different execution environments has resulted in a reliable software package, and it's therefore finally time to declare Nextflow stable and drop the zero from the version number!\n\nFrom today onwards, Nextflow will use a 3 monthly time-based _stable_ release cycle. Today's release is numbered as **18.10**, the next one will be on January 2019, numbered as 19.01, and so on. This gives our users a more predictable release cadence and allows us to better focus on new feature development and scheduling.\n\nAlong with the 3-months stable release cycle, we will provide a monthly _edge_ release, which will include access to the latest experimental features and developments. As such, it should only be used for evaluation and testing purposes.\n\n### Commercial support\n\nFinally, for organisations requiring commercial support, we have recently incorporated Seqera Labs, a spin-off of the Centre for Genomic Regulation.\n\nSeqera Labs will foster Nextflow adoption as professional open source software by providing commercial support services and exploring new innovative products and solutions.\n\nIt's important to highlight that Seqera Labs will not close or make Nextflow a commercial project. Nextflow is and will continue to be owned by the CRG and the other contributing organisations and individuals.\n\n### Conclusion\n\nThe Nextflow project has reached an important milestone. In the last five years it has grown and managed to become a stable technology used by thousands of people daily to deploy large scale workloads for life science data analysis applications and beyond. The project is now exiting from the experimental stage.\n\nWith the above changes we want to fulfil the needs of researchers, for a reliable tool enabling scalable and reproducible data analysis, along with the demand of production oriented users, who require reliable support and services for critical deployments.\n\nAbove all, our aim is to strengthen the community effort around the Nextflow ecosystem and make it a sustainable and solid technology in the long run.\n\n### Credits\n\nWe want to say thank you to all the people who have supported and contributed to this project to this stage. First of all to Cedric Notredame for his long term commitment to the project within the Comparative Bioinformatics group at CRG. The Open Bioinformatics Foundation (OBF) in the name of Chris Fields and The Ontario Institute for Cancer Research (OICR), namely Dr Lincoln Stein, for supporting the Nextflow change of license. The CRG TBDO department, and in particular Salvatore Cappadona for his continued support and advice. Finally, the user community who with their feedback and constructive criticism contribute everyday to make this project more stable, useful and powerful.\n", + "content": "Today marks an important milestone in the Nextflow project. 
We are thrilled to announce three important changes to better meet users’ needs and ground the project on a solid foundation upon which to build a vibrant ecosystem of tools and data analysis applications for genomic research and beyond.\n\n### Apache license\n\nNextflow was originally licensed as GPLv3 open source software more than five years ago. GPL is designed to promote the adoption and spread of open source software and culture. On the other hand it has also some controversial side-effects, such as the one on [derivative works](https://copyleft.org/guide/comprehensive-gpl-guidech5.html) and [legal implications](https://opensource.com/law/14/7/lawsuit-threatens-break-new-ground-gpl-and-software-licensing-issues) which make the use of GPL released software a headache in many organisations. We have previously discussed these concerns in [this blog post](/blog/2018/clarification-about-nextflow-license.html) and, after community feedback, have opted to change the project license to Apache 2.0.\n\nThis is a popular permissive free software license written by the [Apache Software Foundation](https://www.apache.org/) (ASF). Software distributed with this license requires the preservation of the copyright notice and disclaimer. It allows the freedom to use the software for any purpose, to distribute it, to modify it, and to distribute modified versions of the software without dictating the licence terms of the resulting applications and derivative works. We are sure this licensing model addresses the concerns raised by the Nextflow community and will boost further project developments.\n\n### New release schema\n\nIn the time since Nextflow was open sourced, we have released 150 versions which have been used by many organizations to deploy critical production workflows on a large range of computational platforms and under heavy loads and stress conditions.\n\nFor example, at the Centre for Genomic Regulation (CRG) alone, Nextflow has been used to deploy data intensive computation workflows since 2014, and it has orchestrated the execution of over 12 million jobs totalling 1.4 million CPU-hours.\n\n\"Nextflow\n\nThis extensive use across different execution environments has resulted in a reliable software package, and it's therefore finally time to declare Nextflow stable and drop the zero from the version number!\n\nFrom today onwards, Nextflow will use a 3 monthly time-based _stable_ release cycle. Today's release is numbered as **18.10**, the next one will be on January 2019, numbered as 19.01, and so on. This gives our users a more predictable release cadence and allows us to better focus on new feature development and scheduling.\n\nAlong with the 3-months stable release cycle, we will provide a monthly _edge_ release, which will include access to the latest experimental features and developments. As such, it should only be used for evaluation and testing purposes.\n\n### Commercial support\n\nFinally, for organisations requiring commercial support, we have recently incorporated [Seqera Labs](https://www.seqera.io/), a spin-off of the Centre for Genomic Regulation.\n\nSeqera Labs will foster Nextflow adoption as professional open source software by providing commercial support services and exploring new innovative products and solutions.\n\nIt's important to highlight that Seqera Labs will not close or make Nextflow a commercial project. 
Nextflow is and will continue to be owned by the CRG and the other contributing organisations and individuals.\n\n### Conclusion\n\nThe Nextflow project has reached an important milestone. In the last five years it has grown and managed to become a stable technology used by thousands of people daily to deploy large scale workloads for life science data analysis applications and beyond. The project is now exiting from the experimental stage.\n\nWith the above changes we want to fulfil the needs of researchers, for a reliable tool enabling scalable and reproducible data analysis, along with the demand of production oriented users, who require reliable support and services for critical deployments.\n\nAbove all, our aim is to strengthen the community effort around the Nextflow ecosystem and make it a sustainable and solid technology in the long run.\n\n### Credits\n\nWe want to say thank you to all the people who have supported and contributed to this project to this stage. First of all to Cedric Notredame for his long term commitment to the project within the Comparative Bioinformatics group at CRG. The Open Bioinformatics Foundation (OBF) in the name of Chris Fields and The Ontario Institute for Cancer Research (OICR), namely Dr Lincoln Stein, for supporting the Nextflow change of license. The CRG TBDO department, and in particular Salvatore Cappadona for his continued support and advice. Finally, the user community who with their feedback and constructive criticism contribute everyday to make this project more stable, useful and powerful.", "images": [ "/img/nextflow-release-schema-01.png" ], @@ -228,7 +228,7 @@ "slug": "2018/nextflow-meets-dockstore", "title": "Nextflow meets Dockstore", "date": "2018-09-18T00:00:00.000Z", - "content": "\n
\nThis post is co-authored with Denis Yuen, lead of the Dockstore project at the Ontario Institute for Cancer Research\n
\n\nOne key feature of Nextflow is the ability to automatically pull and execute a workflow application directly from a sharing platform such as GitHub. We realised this was critical to allow users to properly track code changes and releases and, above all, to enable the [seamless sharing of workflow projects](/blog/2016/best-practice-for-reproducibility.html).\n\nNextflow never wanted to implement its own centralised workflow registry because we thought that in order for a registry to be viable and therefore useful, it should be technology agnostic and it should be driven by a consensus among the wider user community.\n\nThis is exactly what the [Dockstore](https://dockstore.org/) project is designed for and for this reason we are thrilled to announce that Dockstore has just released the support for Nextflow workflows in its latest release!\n\n### Dockstore in a nutshell\n\nDockstore is an open platform that collects and catalogs scientific data analysis tools and workflows, starting from the genomics community. It’s developed by the [OICR](https://oicr.on.ca/) in collaboration with [UCSC](https://ucscgenomics.soe.ucsc.edu/) and it is based on the [GA4GH](https://www.ga4gh.org/) open standards and the FAIR principles i.e. the idea to make research data and applications findable, accessible, interoperable and reusable ([FAIR](https://www.nature.com/articles/sdata201618)).\n\n\"Dockstore\n\nIn Dockstore’s initial release of support for Nextflow, users will be able to register and display Nextflow workflows. Many of Dockstore’s cross-language features will be available such as [searching](https://dockstore.org/search?descriptorType=nfl&searchMode=files), displaying metadata information on authorship from Nextflow’s config ([author and description](https://www.nextflow.io/docs/latest/config.html?highlight=author#scope-manifest)), displaying the [Docker images](https://dockstore.org/workflows/github.com/nf-core/hlatyping:1.1.1?tab=tools) used by a workflow, and limited support for displaying a visualization of the [workflow structure](https://dockstore.org/workflows/github.com/nf-core/hlatyping:1.1.1?tab=dag).\n\nThe Dockstore team will initially work to on-board the high-quality [nf-core](https://github.com/nf-core) workflows curated by the Nextflow community. However, all developers that develop Nextflow workflows will be able to login, contribute, and maintain workflows starting with our standard [workflow tutorials](https://docs.dockstore.org/docs/publisher-tutorials/workflows/).\n\nMoving forward, the Dockstore team hopes to engage more with the Nextflow community and integrate Nextflow code in order to streamline the process of publishing Nextflow workflows and draw better visualizations of Nextflow workflows. Dockstore also hopes to work with a cloud vendor to add browser based launch-with support for Nextflow workflows.\n\nFinally, support for Nextflow workflows in Dockstore will also enable the possibility of cloud platforms that implement [GA4GH WES](https://github.com/ga4gh/workflow-execution-service-schemas) to run Nextflow workflows.\n\n### Conclusion\n\nWe welcome the support for Nextflow workflows in the Dockstore platform. This is a valuable contribution and presents great opportunities for workflow developers and the wider scientific community.\n\nWe invite all Nextflow developers to register their data analysis applications in the Dockstore platform to make them accessible and reusable to a wider community of researchers.\n", + "content": "
\n*This post is co-authored with Denis Yuen, lead of the Dockstore project at the Ontario Institute for Cancer Research*\n
\n\nOne key feature of Nextflow is the ability to automatically pull and execute a workflow application directly from a sharing platform such as GitHub. We realised this was critical to allow users to properly track code changes and releases and, above all, to enable the [seamless sharing of workflow projects](/blog/2016/best-practice-for-reproducibility.html).\n\nNextflow never wanted to implement its own centralised workflow registry because we thought that in order for a registry to be viable and therefore useful, it should be technology agnostic and it should be driven by a consensus among the wider user community.\n\nThis is exactly what the [Dockstore](https://dockstore.org/) project is designed for and for this reason we are thrilled to announce that Dockstore has just released the support for Nextflow workflows in its latest release!\n\n### Dockstore in a nutshell\n\nDockstore is an open platform that collects and catalogs scientific data analysis tools and workflows, starting from the genomics community. It’s developed by the [OICR](https://oicr.on.ca/) in collaboration with [UCSC](https://ucscgenomics.soe.ucsc.edu/) and it is based on the [GA4GH](https://www.ga4gh.org/) open standards and the FAIR principles i.e. the idea to make research data and applications findable, accessible, interoperable and reusable ([FAIR](https://www.nature.com/articles/sdata201618)).\n\n\"Dockstore\n\nIn Dockstore’s initial release of support for Nextflow, users will be able to register and display Nextflow workflows. Many of Dockstore’s cross-language features will be available such as [searching](https://dockstore.org/search?descriptorType=nfl&searchMode=files), displaying metadata information on authorship from Nextflow’s config ([author and description](https://www.nextflow.io/docs/latest/config.html?highlight=author#scope-manifest)), displaying the [Docker images](https://dockstore.org/workflows/github.com/nf-core/hlatyping:1.1.1?tab=tools) used by a workflow, and limited support for displaying a visualization of the [workflow structure](https://dockstore.org/workflows/github.com/nf-core/hlatyping:1.1.1?tab=dag).\n\nThe Dockstore team will initially work to on-board the high-quality [nf-core](https://github.com/nf-core) workflows curated by the Nextflow community. However, all developers that develop Nextflow workflows will be able to login, contribute, and maintain workflows starting with our standard [workflow tutorials](https://docs.dockstore.org/docs/publisher-tutorials/workflows/).\n\nMoving forward, the Dockstore team hopes to engage more with the Nextflow community and integrate Nextflow code in order to streamline the process of publishing Nextflow workflows and draw better visualizations of Nextflow workflows. Dockstore also hopes to work with a cloud vendor to add browser based launch-with support for Nextflow workflows.\n\nFinally, support for Nextflow workflows in Dockstore will also enable the possibility of cloud platforms that implement [GA4GH WES](https://github.com/ga4gh/workflow-execution-service-schemas) to run Nextflow workflows.\n\n### Conclusion\n\nWe welcome the support for Nextflow workflows in the Dockstore platform. 
This is a valuable contribution and presents great opportunities for workflow developers and the wider scientific community.\n\nWe invite all Nextflow developers to register their data analysis applications in the Dockstore platform to make them accessible and reusable to a wider community of researchers.", "images": [ "/img/dockstore.png" ], @@ -239,7 +239,7 @@ "slug": "2018/nextflow-turns-5", "title": "Nextflow turns five! Happy birthday!", "date": "2018-04-03T00:00:00.000Z", - "content": "\nNextflow is growing up. The past week marked five years since the [first commit](https://github.com/nextflow-io/nextflow/commit/c080150321e5000a2c891e477bb582df07b7f75f) of the project on GitHub. Like a parent reflecting on their child attending school for the first time, we know reaching this point hasn’t been an entirely solo journey, despite Paolo's best efforts!\n\nA lot has happened recently and we thought it was time to highlight some of the recent evolutions. We also take the opportunity to extend the warmest of thanks to all those who have contributed to the development of Nextflow as well as the fantastic community of users who consistently provide ideas, feedback and the occasional late night banter on the [Gitter channel](https://gitter.im/nextflow-io/nextflow).\n\nHere are a few neat developments churning out of the birthday cake mix.\n\n### nf-core\n\n[nf-core](https://nf-core.github.io/) is a community effort to provide a home for high quality, production-ready, curated analysis pipelines built using Nextflow. The project has been initiated and is being led by [Phil Ewels](https://github.com/ewels) of [MultiQC](http://multiqc.info/) fame. The principle is that _nf-core_ pipelines can be used out-of-the-box or as inspiration for something different.\n\nAs well as being a place for best-practise pipelines, other features of _nf-core_ include the [cookie cutter template tool](https://github.com/nf-core/cookiecutter) which provides a fast way to create a dependable workflow using many of Nextflow’s sweet capabilities such as:\n\n- _Outline:_ Skeleton pipeline script.\n- _Data:_ Reference Genome implementation (AWS iGenomes).\n- _Configuration:_ Robust configuration setup.\n- _Containers:_ Skeleton files for Docker image generation.\n- _Reporting:_ HTML email functionality and and HTML results output.\n- _Documentation:_ Installation, Usage, Output, Troubleshooting, etc.\n- _Continuous Integration:_ Skeleton files for automated testing using Travis CI.\n\nThere is also a Python package with helper tools for Nextflow.\n\nYou can find more information about the community via the project [website](https://nf-core.github.io), [GitHub repository](https://github.com/nf-core), [Twitter account](https://twitter.com/nf_core) or join the dedicated [Gitter](https://gitter.im/nf-core/Lobby) chat.\n\n
\n\n[![nf-core logo](/img/nf-core-logo-min.png)](https://nf-co.re)\n\n
\n\n### Kubernetes has landed\n\nAs of version 0.28.0 Nextflow now has support for Kubernetes. If you don’t know much about Kubernetes, at its heart it is an open-source platform for the management and deployment of containers at scale. Google led the initial design and it is now maintained by the Cloud Native Computing Foundation. I found the [The Illustrated Children's Guide to Kubernetes](https://www.youtube.com/watch?v=4ht22ReBjno) particularly useful in explaining the basic vocabulary and concepts.\n\nKubernetes looks be one of the key technologies for the application of containers in the cloud as well as for building Infrastructure as a Service (IaaS) and Platform and a Service (PaaS) applications. We have been approached by many users who wish to use Nextflow with Kubernetes to be able to deploy workflows across both academic and commercial settings. With enterprise versions of Kubernetes such as Red Hat's [OpenShift](https://www.openshift.com/), it was becoming apparent there was a need for native execution with Nextflow.\n\nThe new command `nextflow kuberun` launches the Nextflow driver as a _pod_ which is then able to run workflow tasks as other pods within a Kubernetes cluster. You can read more in the documentation on Kubernetes support for Nextflow [here](https://www.nextflow.io/docs/latest/kubernetes.html).\n\n![Nextflow and Kubernetes](/img/nextflow-kubernetes-min.png)\n\n### Improved reporting and notifications\n\nFollowing the hackathon in September we wrote about the addition of HTML trace reports that allow for the generation HTML detailing resource usage (CPU time, memory, disk i/o etc).\n\nThanks to valuable feedback there has continued to be many improvements to the reports as tracked through the Nextflow GitHub issues page. Reports are now able to display [thousands of tasks](https://github.com/nextflow-io/nextflow/issues/547) and include extra information such as the [container engine used](https://github.com/nextflow-io/nextflow/issues/521). Tasks can be filtered and an [overall progress bar](https://github.com/nextflow-io/nextflow/issues/534) has been added.\n\nYou can explore a [real-world HTML report](/misc/nf-trace-report2.html) and more information on HTML reports can be found in the [documentation](https://www.nextflow.io/docs/latest/tracing.html).\n\nThere has also been additions to workflow notifications. Currently these can be configured to automatically send a notification email when a workflow execution terminates. You can read more about how to setup notifications in the [documentation](https://www.nextflow.io/docs/latest/mail.html?highlight=notification#workflow-notification).\n\n### Syntax-tic!\n\nWriting workflows no longer has to be done in monochrome. There is now syntax highlighting for Nextflow in the popular [Atom editor](https://atom.io) as well as in [Visual Studio Code](https://code.visualstudio.com).\n\n
\n\n[![Nextflow syntax highlighting with Atom](/img/atom-min.png)](/img/atom-min.png)\n\n
\n\n[![Nextflow syntax highlighting with VSCode](/img/vscode-min.png)](/img/vscode-min.png)\n\n
\n\nYou can find the Atom plugin by searching for Nextflow in Atoms package installer or clicking [here](https://atom.io/packages/language-nextflow). The Visual Studio plugin can be downloaded [here](https://marketplace.visualstudio.com/items?itemName=nextflow.nextflow).\n\nOn a related note, Nextflow is now an official language on GitHub!\n\n![GitHub nextflow syntax](/img/github-nf-syntax-min.png)\n\n### Conclusion\n\nNextflow developments are progressing faster than ever and with the help of the community, there are a ton of great new features on the way. If you have any suggestions of your killer NF idea then please drop us a line, open an issue or even better, join in the fun.\n\nOver the coming months Nextflow will be reaching out with several training and presentation sessions across the US and Europe. We hope to see as many of you as possible on the road.\n", + "content": "Nextflow is growing up. The past week marked five years since the [first commit](https://github.com/nextflow-io/nextflow/commit/c080150321e5000a2c891e477bb582df07b7f75f) of the project on GitHub. Like a parent reflecting on their child attending school for the first time, we know reaching this point hasn’t been an entirely solo journey, despite Paolo's best efforts!\n\nA lot has happened recently and we thought it was time to highlight some of the recent evolutions. We also take the opportunity to extend the warmest of thanks to all those who have contributed to the development of Nextflow as well as the fantastic community of users who consistently provide ideas, feedback and the occasional late night banter on the [Gitter channel](https://gitter.im/nextflow-io/nextflow).\n\nHere are a few neat developments churning out of the birthday cake mix.\n\n### nf-core\n\n[nf-core](https://nf-core.github.io/) is a community effort to provide a home for high quality, production-ready, curated analysis pipelines built using Nextflow. The project has been initiated and is being led by [Phil Ewels](https://github.com/ewels) of [MultiQC](http://multiqc.info/) fame. The principle is that _nf-core_ pipelines can be used out-of-the-box or as inspiration for something different.\n\nAs well as being a place for best-practise pipelines, other features of _nf-core_ include the [cookie cutter template tool](https://github.com/nf-core/cookiecutter) which provides a fast way to create a dependable workflow using many of Nextflow’s sweet capabilities such as:\n\n- _Outline:_ Skeleton pipeline script.\n- _Data:_ Reference Genome implementation (AWS iGenomes).\n- _Configuration:_ Robust configuration setup.\n- _Containers:_ Skeleton files for Docker image generation.\n- _Reporting:_ HTML email functionality and and HTML results output.\n- _Documentation:_ Installation, Usage, Output, Troubleshooting, etc.\n- _Continuous Integration:_ Skeleton files for automated testing using Travis CI.\n\nThere is also a Python package with helper tools for Nextflow.\n\nYou can find more information about the community via the project [website](https://nf-core.github.io), [GitHub repository](https://github.com/nf-core), [Twitter account](https://twitter.com/nf_core) or join the dedicated [Gitter](https://gitter.im/nf-core/Lobby) chat.\n\n
\n\n[![nf-core logo](/img/nf-core-logo-min.png)](https://nf-co.re)\n\n
\n\n### Kubernetes has landed\n\nAs of version 0.28.0 Nextflow now has support for Kubernetes. If you don’t know much about Kubernetes, at its heart it is an open-source platform for the management and deployment of containers at scale. Google led the initial design and it is now maintained by the Cloud Native Computing Foundation. I found the [The Illustrated Children's Guide to Kubernetes](https://www.youtube.com/watch?v=4ht22ReBjno) particularly useful in explaining the basic vocabulary and concepts.\n\nKubernetes looks be one of the key technologies for the application of containers in the cloud as well as for building Infrastructure as a Service (IaaS) and Platform and a Service (PaaS) applications. We have been approached by many users who wish to use Nextflow with Kubernetes to be able to deploy workflows across both academic and commercial settings. With enterprise versions of Kubernetes such as Red Hat's [OpenShift](https://www.openshift.com/), it was becoming apparent there was a need for native execution with Nextflow.\n\nThe new command `nextflow kuberun` launches the Nextflow driver as a _pod_ which is then able to run workflow tasks as other pods within a Kubernetes cluster. You can read more in the documentation on Kubernetes support for Nextflow [here](https://www.nextflow.io/docs/latest/kubernetes.html).\n\n![Nextflow and Kubernetes](/img/nextflow-kubernetes-min.png)\n\n### Improved reporting and notifications\n\nFollowing the hackathon in September we wrote about the addition of HTML trace reports that allow for the generation HTML detailing resource usage (CPU time, memory, disk i/o etc).\n\nThanks to valuable feedback there has continued to be many improvements to the reports as tracked through the Nextflow GitHub issues page. Reports are now able to display [thousands of tasks](https://github.com/nextflow-io/nextflow/issues/547) and include extra information such as the [container engine used](https://github.com/nextflow-io/nextflow/issues/521). Tasks can be filtered and an [overall progress bar](https://github.com/nextflow-io/nextflow/issues/534) has been added.\n\nYou can explore a [real-world HTML report](/misc/nf-trace-report2.html) and more information on HTML reports can be found in the [documentation](https://www.nextflow.io/docs/latest/tracing.html).\n\nThere has also been additions to workflow notifications. Currently these can be configured to automatically send a notification email when a workflow execution terminates. You can read more about how to setup notifications in the [documentation](https://www.nextflow.io/docs/latest/mail.html?highlight=notification#workflow-notification).\n\n### Syntax-tic!\n\nWriting workflows no longer has to be done in monochrome. There is now syntax highlighting for Nextflow in the popular [Atom editor](https://atom.io) as well as in [Visual Studio Code](https://code.visualstudio.com).\n\n
\n\n[![Nextflow syntax highlighting with Atom](/img/atom-min.png)](/img/atom-min.png)\n\n
\n\n[![Nextflow syntax highlighting with VSCode](/img/vscode-min.png)](/img/vscode-min.png)\n\n
\n\nYou can find the Atom plugin by searching for Nextflow in Atoms package installer or clicking [here](https://atom.io/packages/language-nextflow). The Visual Studio plugin can be downloaded [here](https://marketplace.visualstudio.com/items?itemName=nextflow.nextflow).\n\nOn a related note, Nextflow is now an official language on GitHub!\n\n![GitHub nextflow syntax](/img/github-nf-syntax-min.png)\n\n### Conclusion\n\nNextflow developments are progressing faster than ever and with the help of the community, there are a ton of great new features on the way. If you have any suggestions of your killer NF idea then please drop us a line, open an issue or even better, join in the fun.\n\nOver the coming months Nextflow will be reaching out with several training and presentation sessions across the US and Europe. We hope to see as many of you as possible on the road.", "images": [], "author": "Evan Floden", "tags": "nextflow,kubernetes,nf-core" @@ -248,7 +248,7 @@ "slug": "2019/demystifying-nextflow-resume", "title": "Demystifying Nextflow resume", "date": "2019-06-24T00:00:00.000Z", - "content": "\n_This two-part blog aims to help users understand Nextflow’s powerful caching mechanism. Part one describes how it works whilst part two will focus on execution provenance and troubleshooting. You can read part two [here](/blog/2019/troubleshooting-nextflow-resume.html)_\n\nTask execution caching and checkpointing is an essential feature of any modern workflow manager and Nextflow provides an automated caching mechanism with every workflow execution. When using the `-resume` flag, successfully completed tasks are skipped and the previously cached results are used in downstream tasks. But understanding the specifics of how it works and debugging situations when the behaviour is not as expected is a common source of frustration.\n\nThe mechanism works by assigning a unique ID to each task. This unique ID is used to create a separate execution directory, called the working directory, where the tasks are executed and the results stored. A task’s unique ID is generated as a 128-bit hash number obtained from a composition of the task’s:\n\n- Inputs values\n- Input files\n- Command line string\n- Container ID\n- Conda environment\n- Environment modules\n- Any executed scripts in the bin directory\n\n### How does resume work?\n\nThe `-resume` command line option allows for the continuation of a workflow execution. It can be used in its most basic form with:\n\n```\n$ nextflow run nextflow-io/hello -resume\n```\n\nIn practice, every execution starts from the beginning. However, when using resume, before launching a task, Nextflow uses the unique ID to check if:\n\n- the working directory exists\n- it contains a valid command exit status\n- it contains the expected output files.\n\nIf these conditions are satisfied, the task execution is skipped and the previously computed outputs are applied. When a task requires recomputation, ie. the conditions above are not fulfilled, the downstream tasks are automatically invalidated.\n\n### The working directory\n\nBy default, the task work directories are created in the directory from where the pipeline is launched. This is often a scratch storage area that can be cleaned up once the computation is completed. 
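As a minimal usage sketch of the `nextflow kuberun` command described in the Kubernetes section above (using the standard `nextflow-io/hello` demo pipeline):

```
$ nextflow kuberun nextflow-io/hello
```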
A different location for the execution work directory can be specified using the command line option `-w` e.g.\n\n```\n$ nextflow run ", "images": [], "author": "Evan Floden", "tags": "nextflow,resume" @@ -257,7 +257,7 @@ "slug": "2019/easy-provenance-report", "title": "Easy provenance reporting", "date": "2019-08-29T00:00:00.000Z", - "content": "\n_Continuing our [series on understanding Nextflow resume](blog/2019/demystifying-nextflow-resume.html), we wanted to delve deeper to show how you can report which tasks contribute to a given workflow output._\n\n### Easy provenance reports\n\nWhen provided with a run name or session ID, the log command can return useful information about a pipeline execution. This can be composed to track the provenance of a workflow result.\n\nWhen supplying a run name or session ID, the log command lists all the work directories used to compute the final result. For example:\n\n```\n$ nextflow log tiny_fermat\n\n/data/.../work/7b/3753ff13b1fa5348d2d9b6f512153a\n/data/.../work/c1/56a36d8f498c99ac6cba31e85b3e0c\n/data/.../work/f7/659c65ef60582d9713252bcfbcc310\n/data/.../work/82/ba67e3175bd9e6479d4310e5a92f99\n/data/.../work/e5/2816b9d4e7b402bfdd6597c2c2403d\n/data/.../work/3b/3485d00b0115f89e4c202eacf82eba\n```\n\nUsing the option `-f` (fields) it’s possible to specify which metadata should be printed by the log command. For example:\n\n```\n$ nextflow log tiny_fermat -f 'process,exit,hash,duration'\n\nindex\t0\t7b/3753ff\t2s\nfastqc\t0\tc1/56a36d\t9.3s\nfastqc\t0\tf7/659c65\t9.1s\nquant\t0\t82/ba67e3\t2.7s\nquant\t0\te5/2816b9\t3.2s\nmultiqc\t0\t3b/3485d0\t6.3s\n```\n\nThe complete list of available fields can be retrieved with the command:\n\n```\n$ nextflow log -l\n```\n\nThe option `-F` allows the specification of filtering criteria to print only a subset of tasks. For example:\n\n```\n$ nextflow log tiny_fermat -F 'process =~ /fastqc/'\n\n/data/.../work/c1/56a36d8f498c99ac6cba31e85b3e0c\n/data/.../work/f7/659c65ef60582d9713252bcfbcc310\n```\n\nThis can be useful to locate specific tasks work directories.\n\nFinally, the `-t` option allows for the creation of a basic custom HTML provenance report that can be generated by providing a template file, in any format of your choice. For example:\n\n```\n
<div>\n<h2>${name}</h2>\n<div>\nScript:\n<pre>${script}</pre>\n</div>\n\n<ul>\n    <li>Exit: ${exit}</li>\n    <li>Status: ${status}</li>\n    <li>Work dir: ${workdir}</li>\n    <li>Container: ${container}</li>\n</ul>\n</div>
\n```\n\nBy saving the above snippet in a file named template.html, you can run the following command:\n\n```\n$ nextflow log tiny_fermat -t template.html > provenance.html\n```\n\nOpen it in your browser, et voilà!\n\n## Conclusion\n\nThis post introduces a little know Nextflow feature and it's intended to show how it can be used\nto produce a custom execution report reporting some - basic - provenance information.\n\nIn future releases we plan to support a more formal provenance specification and execution tracking features.\n", + "content": "_Continuing our [series on understanding Nextflow resume](blog/2019/demystifying-nextflow-resume.html), we wanted to delve deeper to show how you can report which tasks contribute to a given workflow output._\n\n### Easy provenance reports\n\nWhen provided with a run name or session ID, the log command can return useful information about a pipeline execution. This can be composed to track the provenance of a workflow result.\n\nWhen supplying a run name or session ID, the log command lists all the work directories used to compute the final result. For example:\n\n```\n$ nextflow log tiny_fermat\n\n/data/.../work/7b/3753ff13b1fa5348d2d9b6f512153a\n/data/.../work/c1/56a36d8f498c99ac6cba31e85b3e0c\n/data/.../work/f7/659c65ef60582d9713252bcfbcc310\n/data/.../work/82/ba67e3175bd9e6479d4310e5a92f99\n/data/.../work/e5/2816b9d4e7b402bfdd6597c2c2403d\n/data/.../work/3b/3485d00b0115f89e4c202eacf82eba\n```\n\nUsing the option `-f` (fields) it’s possible to specify which metadata should be printed by the log command. For example:\n\n```\n$ nextflow log tiny_fermat -f 'process,exit,hash,duration'\n\nindex\t0\t7b/3753ff\t2s\nfastqc\t0\tc1/56a36d\t9.3s\nfastqc\t0\tf7/659c65\t9.1s\nquant\t0\t82/ba67e3\t2.7s\nquant\t0\te5/2816b9\t3.2s\nmultiqc\t0\t3b/3485d0\t6.3s\n```\n\nThe complete list of available fields can be retrieved with the command:\n\n```\n$ nextflow log -l\n```\n\nThe option `-F` allows the specification of filtering criteria to print only a subset of tasks. For example:\n\n```\n$ nextflow log tiny_fermat -F 'process =~ /fastqc/'\n\n/data/.../work/c1/56a36d8f498c99ac6cba31e85b3e0c\n/data/.../work/f7/659c65ef60582d9713252bcfbcc310\n```\n\nThis can be useful to locate specific tasks work directories.\n\nFinally, the `-t` option allows for the creation of a basic custom HTML provenance report that can be generated by providing a template file, in any format of your choice. For example:\n\n```\n
<div>\n<h2>${name}</h2>\n<div>\nScript:\n<pre>${script}</pre>\n</div>\n\n- Exit: ${exit}\n- Status: ${status}\n- Work dir: ${workdir}\n- Container: ${container}\n\n</div>
\n```\n\nBy saving the above snippet in a file named template.html, you can run the following command:\n\n```\n$ nextflow log tiny_fermat -t template.html > provenance.html\n```\n\nOpen it in your browser, et voilà!\n\n## Conclusion\n\nThis post introduces a little know Nextflow feature and it's intended to show how it can be used\nto produce a custom execution report reporting some - basic - provenance information.\n\nIn future releases we plan to support a more formal provenance specification and execution tracking features.", "images": [], "author": "Evan Floden", "tags": "nextflow,resume" @@ -266,7 +266,7 @@ "slug": "2019/one-more-step-towards-modules", "title": "One more step towards Nextflow modules", "date": "2019-05-22T00:00:00.000Z", - "content": "\nThe ability to create components, libraries or module files has been\namong the most requested feature ever over the years.\n\nFor this reason, today we are very happy to announce that a preview implementation\nof the [modules feature](https://github.com/nextflow-io/nextflow/issues/984) has been merged\non master branch of the project and included in the\n[19.05.0-edge](https://github.com/nextflow-io/nextflow/releases/tag/v19.05.0-edge) release.\n\nThe implementation of this feature has opened the possibility for many fantastic improvements to Nextflow and its syntax. We are extremely excited as it results in a radical new way of writing Nextflow applications! So much so, that we are referring to these changes as DSL 2.\n\n#### Enabling DSL 2 syntax\n\nSince this is still a preview technology and, above all, to not break\nany existing applications, to enable the new syntax you will need to add\nthe following line at the beginning of your workflow script:\n\n```\nnextflow.preview.dsl=2\n```\n\n#### Module files\n\nA module file simply consists of one or more `process` definitions, written with the usual syntax. The _only_ difference is that the `from` and `into` clauses in the `input:` and `output:` definition blocks has to be omitted. 
For example:\n\n```\nprocess INDEX {\n input:\n file transcriptome\n output:\n file 'index'\n script:\n \"\"\"\n salmon index --threads $task.cpus -t $transcriptome -i index\n \"\"\"\n}\n```\n\nThe above snippet defines a process component that can be imported in the main\napplication script using the `include` statement shown below.\n\nAlso, module files can declare optional parameters using the usual `params` idiom,\nas it can be done in any standard script file.\n\nThis approach, which is consistent with the current Nextflow syntax, makes very easy to migrate existing code to the new modules system, reducing it to a mere copy & pasting exercise in most cases.\n\nYou can see a complete module file [here](https://github.com/nextflow-io/rnaseq-nf/blob/66ebeea/modules/rnaseq.nf).\n\n### Module inclusion\n\nA module file can be included into a Nextflow script using the `include` statement.\nWith this it becomes possible to reference any process defined in the module using the usual syntax for a function invocation, and specifying the expected input channels as they were function arguments.\n\n```\nnextflow.preview.dsl=2\ninclude 'modules/rnaseq'\n\nread_pairs_ch = Channel.fromFilePairs( params.reads, checkIfExists: true )\ntranscriptome_file = file( params.transcriptome )\n\nINDEX( transcriptome_file )\nFASTQC( read_pairs_ch )\nQUANT( INDEX.out, read_pairs_ch )\nMULTIQC( QUANT.out.mix(FASTQC.out).collect(), multiqc_file )\n```\n\nNotably, each process defines its own namespace in the script scope which allows the access of the process output channel(s) using the `.out` attribute. This can be used then as any other Nextflow channel variable in your pipeline script.\n\nThe `include` statement gives also the possibility to include only a [specific process](https://www.nextflow.io/docs/edge/dsl2.html#selective-inclusion)\nor to include a process with a different [name alias](https://www.nextflow.io/docs/edge/dsl2.html#module-aliases).\n\n### Smart channel forking\n\nOne of the most important changes of the new syntax is that any channel can be read as many\ntimes as you need removing the requirement to duplicate them using the `into` operator.\n\nFor example, in the above snippet, the `read_pairs_ch` channel has been used twice, as input both for the `FASTQC` and `QUANT` processes. Nextflow forks it behind the scene for you.\n\nThis makes the writing of workflow scripts much more fluent, readable and ... fun! No more channel names proliferation!\n\n### Nextflow pipes!\n\nFinally, maybe our favourite one. The new DSL introduces the `|` (pipe) operator which allows for the composition\nof Nextflow channels, processes and operators together seamlessly in a much more expressive way.\n\nConsider the following example:\n\n```\nprocess align {\n input:\n file seq\n output:\n file 'result'\n\n \"\"\"\n t_coffee -in=${seq} -out result\n \"\"\"\n}\n\nChannel.fromPath(params.in) | splitFasta | align | view { it.text }\n```\n\nIn the last line, the `fromPath` channel is piped to the [`splitFasta`](https://www.nextflow.io/docs/latest/operator.html#splitfasta) operator whose result is used as input by\nthe `align` process. 
Then the output is finally printed by the [`view`](https://www.nextflow.io/docs/latest/operator.html#view)\noperator.\n\nThis syntax finally realizes the Nextflow vision of empowering developers to write\ncomplex data analysis applications with a simple but powerful language that mimics\nthe expressiveness of the Unix pipe model but at the same time makes it possible to\nhandle complex data structures and patterns as is required for highly\nparallelised and distributed computational workflows.\n\n#### Conclusion\n\nThis wave of improvements brings a radically new experience when it comes to\nwriting Nextflow workflows. We are releasing it as a preview technology to allow\nusers to try, test, provide their feedback and give us the possibility\nstabilise it.\n\nWe are also working to other important enhancements that will be included soon,\nsuch as remote modules, sub-workflows composition, simplified file path\nwrangling and more. Stay tuned!\n", + "content": "The ability to create components, libraries or module files has been\namong the most requested feature ever over the years.\n\nFor this reason, today we are very happy to announce that a preview implementation\nof the [modules feature](https://github.com/nextflow-io/nextflow/issues/984) has been merged\non master branch of the project and included in the\n[19.05.0-edge](https://github.com/nextflow-io/nextflow/releases/tag/v19.05.0-edge) release.\n\nThe implementation of this feature has opened the possibility for many fantastic improvements to Nextflow and its syntax. We are extremely excited as it results in a radical new way of writing Nextflow applications! So much so, that we are referring to these changes as DSL 2.\n\n#### Enabling DSL 2 syntax\n\nSince this is still a preview technology and, above all, to not break\nany existing applications, to enable the new syntax you will need to add\nthe following line at the beginning of your workflow script:\n\n```\nnextflow.preview.dsl=2\n```\n\n#### Module files\n\nA module file simply consists of one or more `process` definitions, written with the usual syntax. The _only_ difference is that the `from` and `into` clauses in the `input:` and `output:` definition blocks has to be omitted. 
For example:\n\n```\nprocess INDEX {\n input:\n file transcriptome\n output:\n file 'index'\n script:\n \"\"\"\n salmon index --threads $task.cpus -t $transcriptome -i index\n \"\"\"\n}\n```\n\nThe above snippet defines a process component that can be imported in the main\napplication script using the `include` statement shown below.\n\nAlso, module files can declare optional parameters using the usual `params` idiom,\nas it can be done in any standard script file.\n\nThis approach, which is consistent with the current Nextflow syntax, makes very easy to migrate existing code to the new modules system, reducing it to a mere copy & pasting exercise in most cases.\n\nYou can see a complete module file [here](https://github.com/nextflow-io/rnaseq-nf/blob/66ebeea/modules/rnaseq.nf).\n\n### Module inclusion\n\nA module file can be included into a Nextflow script using the `include` statement.\nWith this it becomes possible to reference any process defined in the module using the usual syntax for a function invocation, and specifying the expected input channels as they were function arguments.\n\n```\nnextflow.preview.dsl=2\ninclude 'modules/rnaseq'\n\nread_pairs_ch = Channel.fromFilePairs( params.reads, checkIfExists: true )\ntranscriptome_file = file( params.transcriptome )\n\nINDEX( transcriptome_file )\nFASTQC( read_pairs_ch )\nQUANT( INDEX.out, read_pairs_ch )\nMULTIQC( QUANT.out.mix(FASTQC.out).collect(), multiqc_file )\n```\n\nNotably, each process defines its own namespace in the script scope which allows the access of the process output channel(s) using the `.out` attribute. This can be used then as any other Nextflow channel variable in your pipeline script.\n\nThe `include` statement gives also the possibility to include only a [specific process](https://www.nextflow.io/docs/edge/dsl2.html#selective-inclusion)\nor to include a process with a different [name alias](https://www.nextflow.io/docs/edge/dsl2.html#module-aliases).\n\n### Smart channel forking\n\nOne of the most important changes of the new syntax is that any channel can be read as many\ntimes as you need removing the requirement to duplicate them using the `into` operator.\n\nFor example, in the above snippet, the `read_pairs_ch` channel has been used twice, as input both for the `FASTQC` and `QUANT` processes. Nextflow forks it behind the scene for you.\n\nThis makes the writing of workflow scripts much more fluent, readable and ... fun! No more channel names proliferation!\n\n### Nextflow pipes!\n\nFinally, maybe our favourite one. The new DSL introduces the `|` (pipe) operator which allows for the composition\nof Nextflow channels, processes and operators together seamlessly in a much more expressive way.\n\nConsider the following example:\n\n```\nprocess align {\n input:\n file seq\n output:\n file 'result'\n\n \"\"\"\n t_coffee -in=${seq} -out result\n \"\"\"\n}\n\nChannel.fromPath(params.in) | splitFasta | align | view { it.text }\n```\n\nIn the last line, the `fromPath` channel is piped to the [`splitFasta`](https://www.nextflow.io/docs/latest/operator.html#splitfasta) operator whose result is used as input by\nthe `align` process. 
Then the output is finally printed by the [`view`](https://www.nextflow.io/docs/latest/operator.html#view)\noperator.\n\nThis syntax finally realizes the Nextflow vision of empowering developers to write\ncomplex data analysis applications with a simple but powerful language that mimics\nthe expressiveness of the Unix pipe model but at the same time makes it possible to\nhandle complex data structures and patterns as is required for highly\nparallelised and distributed computational workflows.\n\n#### Conclusion\n\nThis wave of improvements brings a radically new experience when it comes to\nwriting Nextflow workflows. We are releasing it as a preview technology to allow\nusers to try, test, provide their feedback and give us the possibility\nstabilise it.\n\nWe are also working to other important enhancements that will be included soon,\nsuch as remote modules, sub-workflows composition, simplified file path\nwrangling and more. Stay tuned!", "images": [], "author": "Paolo Di Tommaso", "tags": "nextflow,release,modules,dsl2" @@ -275,7 +275,7 @@ "slug": "2019/release-19.03.0-edge", "title": "Edge release 19.03: The Sequence Read Archive & more!", "date": "2019-03-19T00:00:00.000Z", - "content": "\nIt's time for the monthly Nextflow release for March, _edge_ version 19.03. This is another great release with some cool new features, bug fixes and improvements.\n\n### SRA channel factory\n\nThis sees the introduction of the long-awaited sequence read archive (SRA) channel factory.\nThe [SRA](https://www.ncbi.nlm.nih.gov/sra) is a key public repository for sequencing data and run in coordination between The National Center for\nBiotechnology Information (NCBI), The European Bioinformatics Institute (EBI) and the DNA Data Bank of Japan (DDBJ).\n\nThis feature originates all the way back in [2015](https://github.com/nextflow-io/nextflow/issues/89) and was worked on during a 2018 Nextflow hackathon. It was brought to fore again thanks to the release of Phil Ewels' excellent [SRA Explorer](https://ewels.github.io/sra-explorer/). The SRA channel factory allows users to pull read data in FASTQ format directly from SRA by referencing a study, accession ID or even a keyword. It works in a similar way to [`fromFilePairs`](https://www.nextflow.io/docs/latest/channel.html#fromfilepairs), returning a sample ID and files (single or pairs of files) for each sample.\n\nThe code snippet below creates a channel containing 24 samples from a chromatin dynamics study and runs FASTQC on the resulting files.\n\n```\nChannel\n .fromSRA('SRP043510')\n .set{reads}\n\nprocess fastqc {\n input:\n set sample_id, file(reads_file) from reads\n\n output:\n file(\"fastqc_${sample_id}_logs\") into fastqc_ch\n\n script:\n \"\"\"\n mkdir fastqc_${sample_id}_logs\n fastqc -o fastqc_${sample_id}_logs -f fastq -q ${reads_file}\n \"\"\"\n}\n```\n\nSee the [documentation](https://www.nextflow.io/docs/edge/channel.html#fromsra) for more details. When combined with downstream processes, you can quickly open a firehose of data on your workflow!\n\n### Edge release\n\nNote that this is a monthly edge release. To use it simply execute the following command prior to running Nextflow:\n\n```\nexport NXF_VER=19.03.0-edge\n```\n\n### If you need help\n\nPlease don’t hesitate to use our very active [Gitter](https://gitter.im/nextflow-io/nextflow) channel or create a thread in the [Google discussion group](https://groups.google.com/forum/#!forum/nextflow).\n\n### Reporting Issues\n\nExperiencing issues introduced by this release? 
Please report them in our [issue tracker](https://github.com/nextflow-io/nextflow/issues). Make sure to fill in the fields of the issue template.\n\n### Contributions\n\nSpecial thanks to the contributors of this release:\n\n- Akira Sekiguchi - [pachiras](https://github.com/pachiras)\n- Jon Haitz Legarreta Gorroño - [jhlegarreta](https://github.com/jhlegarreta)\n- Jonathan Leitschuh - [JLLeitschuh](https://github.com/JLLeitschuh)\n- Kevin Sayers - [KevinSayers](https://github.com/KevinSayers)\n- Lukas Jelonek - [lukasjelonek](https://github.com/lukasjelonek)\n- Paolo Di Tommaso - [pditommaso](https://github.com/pditommaso)\n- Toni Hermoso Pulido - [toniher](https://github.com/toniher)\n- Philippe Hupé [phupe](https://github.com/phupe)\n- [phue](https://github.com/phue)\n\n### Complete changes\n\n- Fix Nextflow hangs submitting jobs to AWS batch #1024\n- Fix process builder incomplete output [2fe1052c]\n- Fix Grid executor reports invalid queue status #1045\n- Fix Script execute permission is lost in container #1060\n- Fix K8s serviceAccount is not honoured #1049\n- Fix K8s kuberun login path #1072\n- Fix K8s imagePullSecret and imagePullPolicy #1062\n- Fix Google Storage docs #1023\n- Fix Env variable NXF_CONDA_CACHEDIR is ignored #1051\n- Fix failing task due to legacy sleep command [3e150b56]\n- Fix SplitText operator should accept a closure parameter #1021\n- Add Channel.fromSRA factory method #1070\n- Add voluntary/involuntary context switches to metrics #1047\n- Add noHttps option to singularity config #1041\n- Add docker-daemon Singularity support #1043 [dfef1391]\n- Use peak_vmem and peak_rss as default output in the trace file instead of rss and vmem #1020\n- Improve ansi log rendering #996 [33038a18]\n\n### Breaking changes:\n\nNone known.\n", + "content": "It's time for the monthly Nextflow release for March, _edge_ version 19.03. This is another great release with some cool new features, bug fixes and improvements.\n\n### SRA channel factory\n\nThis sees the introduction of the long-awaited sequence read archive (SRA) channel factory.\nThe [SRA](https://www.ncbi.nlm.nih.gov/sra) is a key public repository for sequencing data and run in coordination between The National Center for\nBiotechnology Information (NCBI), The European Bioinformatics Institute (EBI) and the DNA Data Bank of Japan (DDBJ).\n\nThis feature originates all the way back in [2015](https://github.com/nextflow-io/nextflow/issues/89) and was worked on during a 2018 Nextflow hackathon. It was brought to fore again thanks to the release of Phil Ewels' excellent [SRA Explorer](https://ewels.github.io/sra-explorer/). The SRA channel factory allows users to pull read data in FASTQ format directly from SRA by referencing a study, accession ID or even a keyword. It works in a similar way to [`fromFilePairs`](https://www.nextflow.io/docs/latest/channel.html#fromfilepairs), returning a sample ID and files (single or pairs of files) for each sample.\n\nThe code snippet below creates a channel containing 24 samples from a chromatin dynamics study and runs FASTQC on the resulting files.\n\n```\nChannel\n .fromSRA('SRP043510')\n .set{reads}\n\nprocess fastqc {\n input:\n set sample_id, file(reads_file) from reads\n\n output:\n file(\"fastqc_${sample_id}_logs\") into fastqc_ch\n\n script:\n \"\"\"\n mkdir fastqc_${sample_id}_logs\n fastqc -o fastqc_${sample_id}_logs -f fastq -q ${reads_file}\n \"\"\"\n}\n```\n\nSee the [documentation](https://www.nextflow.io/docs/edge/channel.html#fromsra) for more details. 
When combined with downstream processes, you can quickly open a firehose of data on your workflow!\n\n### Edge release\n\nNote that this is a monthly edge release. To use it simply execute the following command prior to running Nextflow:\n\n```\nexport NXF_VER=19.03.0-edge\n```\n\n### If you need help\n\nPlease don’t hesitate to use our very active [Gitter](https://gitter.im/nextflow-io/nextflow) channel or create a thread in the [Google discussion group](https://groups.google.com/forum/#!forum/nextflow).\n\n### Reporting Issues\n\nExperiencing issues introduced by this release? Please report them in our [issue tracker](https://github.com/nextflow-io/nextflow/issues). Make sure to fill in the fields of the issue template.\n\n### Contributions\n\nSpecial thanks to the contributors of this release:\n\n- Akira Sekiguchi - [pachiras](https://github.com/pachiras)\n- Jon Haitz Legarreta Gorroño - [jhlegarreta](https://github.com/jhlegarreta)\n- Jonathan Leitschuh - [JLLeitschuh](https://github.com/JLLeitschuh)\n- Kevin Sayers - [KevinSayers](https://github.com/KevinSayers)\n- Lukas Jelonek - [lukasjelonek](https://github.com/lukasjelonek)\n- Paolo Di Tommaso - [pditommaso](https://github.com/pditommaso)\n- Toni Hermoso Pulido - [toniher](https://github.com/toniher)\n- Philippe Hupé [phupe](https://github.com/phupe)\n- [phue](https://github.com/phue)\n\n### Complete changes\n\n- Fix Nextflow hangs submitting jobs to AWS batch #1024\n- Fix process builder incomplete output [2fe1052c]\n- Fix Grid executor reports invalid queue status #1045\n- Fix Script execute permission is lost in container #1060\n- Fix K8s serviceAccount is not honoured #1049\n- Fix K8s kuberun login path #1072\n- Fix K8s imagePullSecret and imagePullPolicy #1062\n- Fix Google Storage docs #1023\n- Fix Env variable NXF_CONDA_CACHEDIR is ignored #1051\n- Fix failing task due to legacy sleep command [3e150b56]\n- Fix SplitText operator should accept a closure parameter #1021\n- Add Channel.fromSRA factory method #1070\n- Add voluntary/involuntary context switches to metrics #1047\n- Add noHttps option to singularity config #1041\n- Add docker-daemon Singularity support #1043 [dfef1391]\n- Use peak_vmem and peak_rss as default output in the trace file instead of rss and vmem #1020\n- Improve ansi log rendering #996 [33038a18]\n\n### Breaking changes:\n\nNone known.", "images": [], "author": "Evan Floden", "tags": "nextflow,release" @@ -284,7 +284,7 @@ "slug": "2019/release-19.04.0-stable", "title": "Nextflow 19.04.0 stable release is out!", "date": "2019-04-18T00:00:00.000Z", - "content": "\nWe are excited to announce the new Nextflow 19.04.0 stable release!\n\nThis version includes numerous bug fixes, enhancement and new features.\n\n#### Rich logging\n\nIn this release, we are making the new interactive rich output using ANSI escape characters as the default logging option. This produces a much more readable and easy to follow log of the running workflow execution.\n\n\n\nThe ANSI log is implicitly disabled when the nextflow is launched in the background i.e. when using the `-bg` option. It can also be explicitly disabled using the `-ansi-log false` option or setting the `NXF_ANSI_LOG=false` variable in your launching environment.\n\n#### NCBI SRA data source\n\nThe support for NCBI SRA archive was introduced in the [previous edge release](/blog/2019/release-19.03.0-edge.html). 
Given the very positive reaction, we are graduating this feature into the stable release for general availability.\n\n#### Sharing\n\nThis version includes also a new Git repository provider for the [Gitea](https://gitea.io) self-hosted source code management system, which is added to the already existing support for GitHub, Bitbucket and GitLab sharing platforms.\n\n#### Reports and metrics\n\nFinally, this version includes important enhancements and bug fixes for the task executions metrics collected by Nextflow. If you are using this feature we strongly suggest updating Nextflow to this version.\n\nRemember that updating can be done with the `nextflow -self-update` command.\n\n### Changelog\n\nThe complete list of changes and bug fixes is available on GitHub at [this link](https://github.com/nextflow-io/nextflow/releases/tag/v19.04.0).\n\n### Contributions\n\nSpecial thanks to all people contributed to this release by reporting issues, improving the docs or submitting (patiently) a pull request (sorry if we have missed somebody):\n\n- [Alex Cerjanic](https://github.com/acerjanic)\n- [Anthony Underwood](https://github.com/aunderwo)\n- [Akira Sekiguchi](https://github.com/pachiras)\n- [Bill Flynn](https://github.com/wflynny)\n- [Jorrit Boekel](https://github.com/glormph)\n- [Olga Botvinnik](https://github.com/olgabot)\n- [Ólafur Haukur Flygenring](https://github.com/olifly)\n- [Sven Fillinger](https://github.com/sven1103)\n", + "content": "We are excited to announce the new Nextflow 19.04.0 stable release!\n\nThis version includes numerous bug fixes, enhancement and new features.\n\n#### Rich logging\n\nIn this release, we are making the new interactive rich output using ANSI escape characters as the default logging option. This produces a much more readable and easy to follow log of the running workflow execution.\n\n\n\nThe ANSI log is implicitly disabled when the nextflow is launched in the background i.e. when using the `-bg` option. It can also be explicitly disabled using the `-ansi-log false` option or setting the `NXF_ANSI_LOG=false` variable in your launching environment.\n\n#### NCBI SRA data source\n\nThe support for NCBI SRA archive was introduced in the [previous edge release](/blog/2019/release-19.03.0-edge.html). Given the very positive reaction, we are graduating this feature into the stable release for general availability.\n\n#### Sharing\n\nThis version includes also a new Git repository provider for the [Gitea](https://gitea.io) self-hosted source code management system, which is added to the already existing support for GitHub, Bitbucket and GitLab sharing platforms.\n\n#### Reports and metrics\n\nFinally, this version includes important enhancements and bug fixes for the task executions metrics collected by Nextflow. 
If you are using this feature we strongly suggest updating Nextflow to this version.\n\nRemember that updating can be done with the `nextflow -self-update` command.\n\n### Changelog\n\nThe complete list of changes and bug fixes is available on GitHub at [this link](https://github.com/nextflow-io/nextflow/releases/tag/v19.04.0).\n\n### Contributions\n\nSpecial thanks to all people contributed to this release by reporting issues, improving the docs or submitting (patiently) a pull request (sorry if we have missed somebody):\n\n- [Alex Cerjanic](https://github.com/acerjanic)\n- [Anthony Underwood](https://github.com/aunderwo)\n- [Akira Sekiguchi](https://github.com/pachiras)\n- [Bill Flynn](https://github.com/wflynny)\n- [Jorrit Boekel](https://github.com/glormph)\n- [Olga Botvinnik](https://github.com/olgabot)\n- [Ólafur Haukur Flygenring](https://github.com/olifly)\n- [Sven Fillinger](https://github.com/sven1103)", "images": [], "author": "Paolo Di Tommaso", "tags": "nextflow,release,stable" @@ -293,7 +293,7 @@ "slug": "2019/troubleshooting-nextflow-resume", "title": "Troubleshooting Nextflow resume", "date": "2019-07-01T00:00:00.000Z", - "content": "\n_This two-part blog aims to help users understand Nextflow’s powerful caching mechanism. Part one describes how it works whilst part two will focus on execution provenance and troubleshooting. You can read part one [here](/blog/2019/demystifying-nextflow-resume.html)_.\n\n### Troubleshooting resume\n\nIf your workflow execution is not resumed as expected, there exists several strategies to debug the problem.\n\n#### Modified input file(s)\n\nMake sure that there has been no change in your input files. Don’t forget the unique task hash is computed by taking into account the complete file path, the last modified timestamp and the file size. If any of these change, the workflow will be re-executed, even if the input content is the same.\n\n#### A process modifying one or more inputs\n\nA process should never alter input files. When this happens, the future execution of tasks will be invalidated for the same reason explained in the previous point.\n\n#### Inconsistent input file attributes\n\nSome shared file system, such as NFS, may report inconsistent file timestamp i.e. a different timestamp for the same file even if it has not been modified. There is an option to use the [lenient mode of caching](https://www.nextflow.io/docs/latest/process.html#cache) to avoid this problem.\n\n#### Race condition in a global variable\n\nNextflow does its best to simplify parallel programming and to prevent race conditions and the access of shared resources. One of the few cases in which a race condition may arise is when using a global variable with two (or more) operators. For example:\n\n```\nChannel\n .from(1,2,3)\n .map { it -> X=it; X+=2 }\n .println { \"ch1 = $it\" }\n\nChannel\n .from(1,2,3)\n .map { it -> X=it; X*=2 }\n .println { \"ch2 = $it\" }\n```\n\nThe problem with this snippet is that the `X` variable in the closure definition is defined in the global scope. Since operators are executed in parallel, the `X` value can, therefore, be overwritten by the other `map` invocation.\n\nThe correct implementation requires the use of the `def` keyword to declare the variable local.\n\n```\nChannel\n .from(1,2,3)\n .map { it -> def X=it; X+=2 }\n .view { \"ch1 = $it\" }\n\nChannel\n .from(1,2,3)\n .map { it -> def X=it; X*=2 }\n .view { \"ch2 = $it\" }\n```\n\n#### Non-deterministic input channels\n\nWhile dataflow channel ordering is guaranteed i.e. 
data is read in the same order in which it’s written in the channel, when a process declares as input two or more channels, each of which is the output of a different process, the overall input ordering is not consistent across different executions.\n\nConsider the following snippet:\n\n```\nprocess foo {\n input: set val(pair), file(reads) from reads_ch\n output: set val(pair), file('*.bam') into bam_ch\n \"\"\"\n your_command --here\n \"\"\"\n}\n\nprocess bar {\n input: set val(pair), file(reads) from reads_ch\n output: set val(pair), file('*.bai') into bai_ch\n \"\"\"\n other_command --here\n \"\"\"\n}\n\nprocess gather {\n input:\n set val(pair), file(bam) from bam_ch\n set val(pair), file(bai) from bai_ch\n \"\"\"\n merge_command $bam $bai\n \"\"\"\n}\n```\n\nThe inputs declared in the gather process can be delivered in any order as the execution order of the process `foo` and `bar` is not deterministic due to parallel executions.\n\nTherefore, the input of the third process needs to be synchronized using the `join` operator or a similar approach. The third process should be written as:\n\n```\nprocess gather {\n input:\n set val(pair), file(bam), file(bai) from bam_ch.join(bai_ch)\n \"\"\"\n merge_command $bam $bai\n \"\"\"\n}\n```\n\n#### Still in trouble?\n\nThese are most frequent causes of problems with the Nextflow resume mechanism. If you are still not able to resolve\nyour problem, identify the first process not resuming correctly, then run your script twice using `-dump-hashes`. You can then compare the resulting `.nextflow.log` files (the first will be named `.nextflow.log.1`).\n\nUnfortunately, the information reported by `-dump-hashes` can be quite cryptic, however, with the help of a good _diff_ tool it is possible to compare the two log files to identify the reason for the cache to be invalidated.\n\n#### The golden rule\n\nNever try to debug this kind of problem with production data! This issue can be annoying, but when it happens\nit should be able to be replicated in a consistent manner with any data.\n\nTherefore, we always suggest Nextflow developers include in their pipeline project\na small synthetic dataset to easily execute and test the complete pipeline execution in a few seconds.\nThis is the golden rule for debugging and troubleshooting execution problems avoids getting stuck with production data.\n\n#### Resume by default?\n\nGiven the majority of users always apply resume, we recently discussed having resume applied by the default.\n\nIs there any situation where you do not use resume? Would a flag specifying `-no-cache` be enough to satisfy these use cases?\n\nWe want to hear your thoughts on this. Help steer Nextflow development and vote in the twitter poll below.\n\n

> Should -resume⏯️ be the default when launching a Nextflow pipeline?\n> \n> — Nextflow (@nextflowio) [July 1, 2019](https://twitter.com/nextflowio/status/1145599932268785665?ref_src=twsrc%5Etfw)
\n\n\n
\n*In the following post of this series, we will show how to produce a provenance report using a built-in Nextflow command.*\n", + "content": "_This two-part blog aims to help users understand Nextflow’s powerful caching mechanism. Part one describes how it works whilst part two will focus on execution provenance and troubleshooting. You can read part one [here](/blog/2019/demystifying-nextflow-resume.html)_.\n\n### Troubleshooting resume\n\nIf your workflow execution is not resumed as expected, there exists several strategies to debug the problem.\n\n#### Modified input file(s)\n\nMake sure that there has been no change in your input files. Don’t forget the unique task hash is computed by taking into account the complete file path, the last modified timestamp and the file size. If any of these change, the workflow will be re-executed, even if the input content is the same.\n\n#### A process modifying one or more inputs\n\nA process should never alter input files. When this happens, the future execution of tasks will be invalidated for the same reason explained in the previous point.\n\n#### Inconsistent input file attributes\n\nSome shared file system, such as NFS, may report inconsistent file timestamp i.e. a different timestamp for the same file even if it has not been modified. There is an option to use the [lenient mode of caching](https://www.nextflow.io/docs/latest/process.html#cache) to avoid this problem.\n\n#### Race condition in a global variable\n\nNextflow does its best to simplify parallel programming and to prevent race conditions and the access of shared resources. One of the few cases in which a race condition may arise is when using a global variable with two (or more) operators. For example:\n\n```\nChannel\n .from(1,2,3)\n .map { it -> X=it; X+=2 }\n .println { \"ch1 = $it\" }\n\nChannel\n .from(1,2,3)\n .map { it -> X=it; X*=2 }\n .println { \"ch2 = $it\" }\n```\n\nThe problem with this snippet is that the `X` variable in the closure definition is defined in the global scope. Since operators are executed in parallel, the `X` value can, therefore, be overwritten by the other `map` invocation.\n\nThe correct implementation requires the use of the `def` keyword to declare the variable local.\n\n```\nChannel\n .from(1,2,3)\n .map { it -> def X=it; X+=2 }\n .view { \"ch1 = $it\" }\n\nChannel\n .from(1,2,3)\n .map { it -> def X=it; X*=2 }\n .view { \"ch2 = $it\" }\n```\n\n#### Non-deterministic input channels\n\nWhile dataflow channel ordering is guaranteed i.e. 
data is read in the same order in which it’s written in the channel, when a process declares as input two or more channels, each of which is the output of a different process, the overall input ordering is not consistent across different executions.\n\nConsider the following snippet:\n\n```\nprocess foo {\n input: set val(pair), file(reads) from reads_ch\n output: set val(pair), file('*.bam') into bam_ch\n \"\"\"\n your_command --here\n \"\"\"\n}\n\nprocess bar {\n input: set val(pair), file(reads) from reads_ch\n output: set val(pair), file('*.bai') into bai_ch\n \"\"\"\n other_command --here\n \"\"\"\n}\n\nprocess gather {\n input:\n set val(pair), file(bam) from bam_ch\n set val(pair), file(bai) from bai_ch\n \"\"\"\n merge_command $bam $bai\n \"\"\"\n}\n```\n\nThe inputs declared in the gather process can be delivered in any order as the execution order of the processes `foo` and `bar` is not deterministic due to parallel executions.\n\nTherefore, the input of the third process needs to be synchronized using the `join` operator or a similar approach. The third process should be written as:\n\n```\nprocess gather {\n input:\n set val(pair), file(bam), file(bai) from bam_ch.join(bai_ch)\n \"\"\"\n merge_command $bam $bai\n \"\"\"\n}\n```\n\n#### Still in trouble?\n\nThese are the most frequent causes of problems with the Nextflow resume mechanism. If you are still not able to resolve\nyour problem, identify the first process not resuming correctly, then run your script twice using `-dump-hashes`. You can then compare the resulting `.nextflow.log` files (the first will be named `.nextflow.log.1`).\n\nUnfortunately, the information reported by `-dump-hashes` can be quite cryptic, however, with the help of a good _diff_ tool it is possible to compare the two log files to identify the reason why the cache was invalidated.\n\n#### The golden rule\n\nNever try to debug this kind of problem with production data! This issue can be annoying, but when it happens\nit should be possible to replicate it in a consistent manner with any data.\n\nTherefore, we always suggest Nextflow developers include in their pipeline project\na small synthetic dataset to easily execute and test the complete pipeline execution in a few seconds.\nThis is the golden rule for debugging and troubleshooting execution problems: it avoids getting stuck with production data.\n\n#### Resume by default?\n\nGiven that the majority of users always apply resume, we recently discussed having resume applied by default.\n\nIs there any situation where you do not use resume? Would a flag specifying `-no-cache` be enough to satisfy these use cases?\n\nWe want to hear your thoughts on this. Help steer Nextflow development and vote in the twitter poll below.\n\n> Should -resume⏯️ be the default when launching a Nextflow pipeline?\n> \n> — Nextflow (@nextflowio) [July 1, 2019](https://twitter.com/nextflowio/status/1145599932268785665?ref_src=twsrc%5Etfw)\n\n\n\n
\n*In the following post of this series, we will show how to produce a provenance report using a built-in Nextflow command.*", "images": [], "author": "Evan Floden", "tags": "nextflow,resume" @@ -302,7 +302,7 @@ "slug": "2020/cli-docs-release", "title": "The Nextflow CLI - tricks and treats!", "date": "2020-10-22T00:00:00.000Z", - "content": "\nFor most developers, the command line is synonymous with agility. While tools such as [Nextflow Tower](https://tower.nf) are opening up the ecosystem to a whole new set of users, the Nextflow CLI remains a bedrock for pipeline development. The CLI in Nextflow has been the core interface since the beginning; however, its full functionality was never extensively documented. Today we are excited to release the first iteration of the CLI documentation available on the [Nextflow website](https://www.nextflow.io/docs/edge/cli.html).\n\nAnd given Halloween is just around the corner, in this blog post we'll take a look at 5 CLI tricks and examples which will make your life easier in designing, executing and debugging data pipelines. We are also giving away 5 limited-edition Nextflow hoodies and sticker packs so you can code in style this Halloween season!\n\n### 1. Invoke a remote pipeline execution with the latest revision\n\nNextflow facilitates easy collaboration and re-use of existing pipelines in multiple ways. One of the simplest ways to do this is to use the URL of the Git repository.\n\n```\n$ nextflow run https://www.github.com/nextflow-io/hello\n```\n\nWhen executing a pipeline using the run command, it first checks to see if it has been previously downloaded in the ~/.nextflow/assets directory, and if so, Nextflow uses this to execute the pipeline. If the pipeline is not already cached, Nextflow will download it, store it in the `$HOME/.nextflow/` directory and then launch the execution.\n\nHow can we make sure that we always run the latest code from the remote pipeline? We simply need to add the `-latest` option to the run command, and Nextflow takes care of the rest.\n\n```\n$ nextflow run nextflow-io/hello -latest\n```\n\n### 2. Query work directories for a specific execution\n\nFor every invocation of Nextflow, all the metadata about an execution is stored including task directories, completion status and time etc. We can use the `nextflow log` command to generate a summary of this information for a specific run.\n\nTo see a list of work directories associated with a particular execution (for example, `tiny_leavitt`), use:\n\n```\n$ nextflow log tiny_leavitt\n```\n\nTo filter out specific process-level information from the logs of any execution, we simply need to use the fields (-f) option and specify the fields.\n\n```\n$ nextflow log tiny_leavitt –f 'process, hash, status, duration'\n```\n\nThe hash is the name of the work directory where the process was executed; therefore, the location of a process work directory would be something like `work/74/68ff183`.\n\nThe log command also has other child options including `-before` and `-after` to help with the chronological inspection of logs.\n\n### 3. Top-level configuration\n\nNextflow emphasizes customization of pipelines and exposes multiple options to facilitate this. The configuration is applied to multiple Nextflow commands and is therefore a top-level option. 
In practice, this means specifying configuration options _before_ the command.\n\nNextflow CLI provides two kinds of config overrides - the soft override and the hard override.\n\nThe top-level soft override \"-c\" option allows us to change the previous config in an additive manner, overriding only the fields included the configuration file.\n\n```\n$ nextflow -c my.config run nextflow-io/hello\n```\n\nOn the other hand, the hard override `-C` completely replaces and ignores any additional configurations.\n\n $ nextflow –C my.config nextflow-io/hello\n\nMoreover, we can also use the config command to inspect the final inferred configuration and view any profiles.\n\n```\n$ nextflow config -show-profiles\n```\n\n### 4. Passing in an input parameter file\n\nNextflow is designed to work across both research and production settings. In production especially, specifying multiple parameters for the pipeline on the command line becomes cumbersome. In these cases, environment variables or config files are commonly used which contain all input files, options and metadata. Love them or hate them, YAML and JSON are the standard formats for human and machines, respectively.\n\nThe Nextflow run option `-params-file` can be used to pass in a file containing parameters in either format.\n\n```\n$ nextflow run nextflow-io/rnaseq -params-file run_42.yaml\n```\n\nThe YAML file could contain the following.\n\n```\nreads : \"s3://gatk-data/run_42/reads/*_R{1,2}_*.fastq.gz\"\nbwa_index : \"$baseDir/index/*.bwa-index.tar.gz\"\npaired_end : true\npenalty : 12\n```\n\n### 5. Specific workflow entry points\n\nThe recently released [DSL2](https://www.nextflow.io/blog/2020/dsl2-is-here.html) adds powerful modularity to Nextflow and enables scripts to contain multiple workflows. By default, the unnamed workflow is assumed to be the main entry point for the script, however, with numerous named workflows, the entry point can be customized by using the `entry` child-option of the run command.\n\n $ nextflow run main.nf -entry workflow1\n\nThis allows users to run a specific sub-workflow or a section of their entire workflow script. For more information, refer to the [implicit workflow](https://www.nextflow.io/docs/latest/dsl2.html#implicit-workflow) section of the documentation.\n\nAdditionally, as of version 20.09.1-edge, you can specify the script in a project to run other than `main.nf` using the command line option\n`-main-script`.\n\n $ nextflow run http://github.com/my/pipeline -main-script my-analysis.nf\n\n### Bonus trick! Web dashboard launched from the CLI\n\nThe tricks above highlight the functionality of the Nextflow CLI. However, for long-running workflows, monitoring becomes all the more crucial. With Nextflow Tower, we can invoke any Nextflow pipeline execution from the CLI and use the integrated dashboard to follow the workflow execution wherever we are. Sign-in to [Tower](https://tower.nf) using your GitHub credentials, obtain your token from the Getting Started page and export them into your terminal, `~/.bashrc` or include them in your `nextflow.config`.\n\n```\n$ export TOWER_ACCESS_TOKEN=my-secret-tower-key\n$ export NXF_VER=20.07.1\n```\n\nNext simply add the \"-with-tower\" child-option to any Nextflow run command. 
A URL with the monitoring dashboard will appear.\n\n```\n$ nextflow run nextflow-io/hello -with-tower\n```\n\n### Nextflow Giveaway\n\nIf you want to look stylish while you put the above tips into practice, or simply like free stuff, we are giving away five of our latest Nextflow hoodie and sticker packs. Retweet or like the Nextflow tweet about this article and we will draw and notify the winners on October 31st!\n\n### About the Author\n\n[Abhinav Sharma](https://www.linkedin.com/in/abhi18av/) is a Bioinformatics Engineer at [Seqera Labs](https://www.seqera.io) interested in Data Science and Cloud Engineering. He enjoys working on all things Genomics, Bioinformatics and Nextflow.\n\n### Acknowledgements\n\nShout out to [Kevin Sayers](https://github.com/KevinSayers) and [Alexander Peltzer](https://github.com/apeltzer) for their earlier efforts in documenting the CLI and which inspired this work.\n\n_The latest CLI docs can be found in the edge release docs at [https://www.nextflow.io/docs/latest/cli.html](https://www.nextflow.io/docs/latest/cli.html)._\n", + "content": "For most developers, the command line is synonymous with agility. While tools such as [Nextflow Tower](https://tower.nf) are opening up the ecosystem to a whole new set of users, the Nextflow CLI remains a bedrock for pipeline development. The CLI in Nextflow has been the core interface since the beginning; however, its full functionality was never extensively documented. Today we are excited to release the first iteration of the CLI documentation available on the [Nextflow website](https://www.nextflow.io/docs/edge/cli.html).\n\nAnd given Halloween is just around the corner, in this blog post we'll take a look at 5 CLI tricks and examples which will make your life easier in designing, executing and debugging data pipelines. We are also giving away 5 limited-edition Nextflow hoodies and sticker packs so you can code in style this Halloween season!\n\n### 1. Invoke a remote pipeline execution with the latest revision\n\nNextflow facilitates easy collaboration and re-use of existing pipelines in multiple ways. One of the simplest ways to do this is to use the URL of the Git repository.\n\n```\n$ nextflow run https://www.github.com/nextflow-io/hello\n```\n\nWhen executing a pipeline using the run command, it first checks to see if it has been previously downloaded in the ~/.nextflow/assets directory, and if so, Nextflow uses this to execute the pipeline. If the pipeline is not already cached, Nextflow will download it, store it in the `$HOME/.nextflow/` directory and then launch the execution.\n\nHow can we make sure that we always run the latest code from the remote pipeline? We simply need to add the `-latest` option to the run command, and Nextflow takes care of the rest.\n\n```\n$ nextflow run nextflow-io/hello -latest\n```\n\n### 2. Query work directories for a specific execution\n\nFor every invocation of Nextflow, all the metadata about an execution is stored including task directories, completion status and time etc. 
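Running `nextflow log` with no options prints the history of executions launched from the current folder, along with their run names, timestamps and completion status, which is handy for finding the run you want to inspect:\n\n```\n$ nextflow log\n```\n\n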
We can use the `nextflow log` command to generate a summary of this information for a specific run.\n\nTo see a list of work directories associated with a particular execution (for example, `tiny_leavitt`), use:\n\n```\n$ nextflow log tiny_leavitt\n```\n\nTo filter out specific process-level information from the logs of any execution, we simply need to use the fields (-f) option and specify the fields.\n\n```\n$ nextflow log tiny_leavitt –f 'process, hash, status, duration'\n```\n\nThe hash is the name of the work directory where the process was executed; therefore, the location of a process work directory would be something like `work/74/68ff183`.\n\nThe log command also has other child options including `-before` and `-after` to help with the chronological inspection of logs.\n\n### 3. Top-level configuration\n\nNextflow emphasizes customization of pipelines and exposes multiple options to facilitate this. The configuration is applied to multiple Nextflow commands and is therefore a top-level option. In practice, this means specifying configuration options _before_ the command.\n\nNextflow CLI provides two kinds of config overrides - the soft override and the hard override.\n\nThe top-level soft override \"-c\" option allows us to change the previous config in an additive manner, overriding only the fields included the configuration file.\n\n```\n$ nextflow -c my.config run nextflow-io/hello\n```\n\nOn the other hand, the hard override `-C` completely replaces and ignores any additional configurations.\n\n $ nextflow –C my.config nextflow-io/hello\n\nMoreover, we can also use the config command to inspect the final inferred configuration and view any profiles.\n\n```\n$ nextflow config -show-profiles\n```\n\n### 4. Passing in an input parameter file\n\nNextflow is designed to work across both research and production settings. In production especially, specifying multiple parameters for the pipeline on the command line becomes cumbersome. In these cases, environment variables or config files are commonly used which contain all input files, options and metadata. Love them or hate them, YAML and JSON are the standard formats for human and machines, respectively.\n\nThe Nextflow run option `-params-file` can be used to pass in a file containing parameters in either format.\n\n```\n$ nextflow run nextflow-io/rnaseq -params-file run_42.yaml\n```\n\nThe YAML file could contain the following.\n\n```\nreads : \"s3://gatk-data/run_42/reads/*_R{1,2}_*.fastq.gz\"\nbwa_index : \"$baseDir/index/*.bwa-index.tar.gz\"\npaired_end : true\npenalty : 12\n```\n\n### 5. Specific workflow entry points\n\nThe recently released [DSL2](https://www.nextflow.io/blog/2020/dsl2-is-here.html) adds powerful modularity to Nextflow and enables scripts to contain multiple workflows. By default, the unnamed workflow is assumed to be the main entry point for the script, however, with numerous named workflows, the entry point can be customized by using the `entry` child-option of the run command.\n\n $ nextflow run main.nf -entry workflow1\n\nThis allows users to run a specific sub-workflow or a section of their entire workflow script. 
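As a minimal sketch (the script below is hypothetical), a `main.nf` with two named workflows might look like the following; the command above would then execute only `workflow1`, leaving `workflow2` untouched:\n\n```\nnextflow.enable.dsl=2\n\nworkflow workflow1 {\n    Channel.of('alpha', 'beta') | view { \"workflow1: $it\" }\n}\n\nworkflow workflow2 {\n    Channel.of(1, 2, 3) | view { \"workflow2: $it\" }\n}\n```\n\n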
For more information, refer to the [implicit workflow](https://www.nextflow.io/docs/latest/dsl2.html#implicit-workflow) section of the documentation.\n\nAdditionally, as of version 20.09.1-edge, you can specify the script in a project to run other than `main.nf` using the command line option\n`-main-script`.\n\n $ nextflow run http://github.com/my/pipeline -main-script my-analysis.nf\n\n### Bonus trick! Web dashboard launched from the CLI\n\nThe tricks above highlight the functionality of the Nextflow CLI. However, for long-running workflows, monitoring becomes all the more crucial. With Nextflow Tower, we can invoke any Nextflow pipeline execution from the CLI and use the integrated dashboard to follow the workflow execution wherever we are. Sign-in to [Tower](https://tower.nf) using your GitHub credentials, obtain your token from the Getting Started page and export them into your terminal, `~/.bashrc` or include them in your `nextflow.config`.\n\n```\n$ export TOWER_ACCESS_TOKEN=my-secret-tower-key\n$ export NXF_VER=20.07.1\n```\n\nNext simply add the \"-with-tower\" child-option to any Nextflow run command. A URL with the monitoring dashboard will appear.\n\n```\n$ nextflow run nextflow-io/hello -with-tower\n```\n\n### Nextflow Giveaway\n\nIf you want to look stylish while you put the above tips into practice, or simply like free stuff, we are giving away five of our latest Nextflow hoodie and sticker packs. Retweet or like the Nextflow tweet about this article and we will draw and notify the winners on October 31st!\n\n### About the Author\n\n[Abhinav Sharma](https://www.linkedin.com/in/abhi18av/) is a Bioinformatics Engineer at [Seqera Labs](https://www.seqera.io) interested in Data Science and Cloud Engineering. He enjoys working on all things Genomics, Bioinformatics and Nextflow.\n\n### Acknowledgements\n\nShout out to [Kevin Sayers](https://github.com/KevinSayers) and [Alexander Peltzer](https://github.com/apeltzer) for their earlier efforts in documenting the CLI and which inspired this work.\n\n_The latest CLI docs can be found in the edge release docs at [https://www.nextflow.io/docs/latest/cli.html](https://www.nextflow.io/docs/latest/cli.html)._", "images": [], "author": "Abhinav Sharma", "tags": "nextflow,docs" @@ -311,7 +311,7 @@ "slug": "2020/dsl2-is-here", "title": "Nextflow DSL 2 is here!", "date": "2020-07-24T00:00:00.000Z", - "content": "\nWe are thrilled to announce the stable release of Nextflow DSL 2 as part of the latest 20.07.1 version!\n\nNextflow DSL 2 represents a major evolution of the Nextflow language and makes it possible to scale and modularise your data analysis pipeline while continuing to use the Dataflow programming paradigm that characterises the Nextflow processing model.\n\nWe spent more than one year collecting user feedback and making sure that DSL 2 would naturally fit the programming experience Nextflow developers are used to.\n\n#### DLS 2 in a nutshell\n\nBackward compatibility is a paramount value, for this reason the changes introduced in the syntax have been minimal and above all, guarantee the support of all existing applications. DSL 2 will be an opt-in feature for at least the next 12 to 18 months. 
After this transitory period, we plan to make it the default Nextflow execution mode.\n\nAs of today, to use DSL 2 in your Nextflow pipeline, you are required to use the following declaration at the top of your script:\n\n```\nnextflow.enable.dsl=2\n```\n\nNote that the previous `nextflow.preview` directive is still available, however, when using the above declaration the use of the final syntax is enforced.\n\n#### Nextflow modules\n\nA module file is nothing more than a Nextflow script containing one or more `process` definitions that can be imported from another Nextflow script.\n\nThe only difference when compared with legacy syntax is that the process is not bound with specific input and output channels, as was previously required using the `from` and `into` keywords respectively. Consider this example of the new syntax:\n\n```\nprocess INDEX {\n input:\n path transcriptome\n output:\n path 'index'\n script:\n \"\"\"\n salmon index --threads $task.cpus -t $transcriptome -i index\n \"\"\"\n}\n```\n\nThis allows the definition of workflow processes that can be included from any other script and invoked as a custom function within the new `workflow` scope. This effectively allows for the composition of the pipeline logic and enables reuse of workflow components. We anticipate this to improve both the speed that users can develop new pipelines, and the robustness of these pipelines through the use of validated modules.\n\nAny process input can be provided as a function argument using the usual channel semantics familiar to Nextflow developers. Moreover process outputs can either be assigned to a variable or accessed using the implicit `.out` attribute in the scope implicitly defined by the process name itself. See the example below:\n\n```\ninclude { INDEX; FASTQC; QUANT; MULTIQC } from './some/module/script.nf'\n\nread_pairs_ch = channel.fromFilePairs( params.reads)\n\nworkflow {\n INDEX( params.transcriptome )\n FASTQC( read_pairs_ch )\n QUANT( INDEX.out, read_pairs_ch )\n MULTIQC( QUANT.out.mix(FASTQC.out).collect(), multiqc_file )\n}\n```\n\nAlso enhanced is the ability to use channels as inputs multiple times without the need to duplicate them (previously done with the special into operator) which makes the resulting pipeline code more concise, fluent and therefore readable!\n\n#### Sub-workflows\n\nNotably, the DSL 2 syntax allows for the definition of reusable processes as well as sub-workflow libraries. The only requirement is to provide a `workflow` name that will be used to reference and declare the corresponding inputs and outputs using the new `take` and `emit` keywords. For example:\n\n```\nworkflow RNASEQ {\n take:\n transcriptome\n read_pairs_ch\n\n main:\n INDEX(transcriptome)\n FASTQC(read_pairs_ch)\n QUANT(INDEX.out, read_pairs_ch)\n\n emit:\n QUANT.out.mix(FASTQC.out).collect()\n}\n```\n\nNow named sub-workflows can be used in the same way as processes, allowing you to easily include and reuse multi-step workflows as part of larger workflows. Find more details [here](/docs/latest/dsl2.html).\n\n#### More syntax sugar\n\nAnother exciting feature of Nextflow DSL 2 is the ability to compose built-in operators, pipeline processes and sub-workflows with the pipe (|) operator! 
For example the last line in the above example could be written as:\n\n```\nemit:\n QUANT.out | mix(FASTQC.out) | collect\n```\n\nThis syntax finally realizes the Nextflow vision of empowering developers to write complex data analysis applications with a simple but powerful language that mimics the expressiveness of the Unix pipe model but at the same time makes it possible to handle complex data structures and patterns as is required for highly parallelised and distributed computational workflows.\n\nAnother change is the introduction of `channel` as an alternative name as a synonym of `Channel` type identifier and therefore allows the use of `channel.fromPath` instead of `Channel.fromPath` and so on. This is a small syntax sugar to keep the capitazionation consistent with the rest of the language.\n\nMoreover, several process inputs and outputs syntax shortcuts were removed when using the final version of DSL 2 to make it more predictable. For example, with DSL1, in a tuple input or output declaration the component type could be omitted, for example:\n\n```\ninput:\n tuple foo, 'bar'\n```\n\nThe `foo` identifier was implicitly considered an input value declaration instead the string `'bar'` was considered a shortcut for `file('bar')`. However, this was a bit confusing especially for new users and therefore using DSL 2, the fully qualified version must be used:\n\n```\ninput:\n tuple val(foo), path('bar')\n```\n\nYou can find more detailed migration notes at [this link](/docs/latest/dsl2.html#dsl2-migration-notes).\n\n#### What's next\n\nAs always, reaching an important project milestone can be viewed as a major success, but at the same time the starting point for challenges and developments. Having a modularization mechanism opens new needs and possibilities. The first one of which will be focused on the ability to test and validate process modules independently using a unit-testing style approach. This will definitely help to make the resulting pipelines more resilient.\n\nAnother important area for the development of the Nextflow language will be the ability to better formalise pipeline inputs and outputs and further decouple for the process declaration. Nextflow currently strongly relies on the `publishDir` constructor for the generation of the workflow outputs.\n\nHowever in the new _module_ world, this approach results in `publishDir` being tied to a single process definition. The plan is instead to extend this concept in a more general and abstract manner, so that it will be possible to capture and redirect the result of any process and sub-workflow based on semantic annotations instead of hardcoding it at the task level.\n\n### Conclusion\n\nWe are extremely excited about today's release. This was a long awaited advancement and therefore we are very happy to make it available for general availability to all Nextflow users. 
We greatly appreciate all of the community feedback and ideas over the past year which have shaped DSL 2.\n\nWe are confident this represents a big step forward for the project and will enable the writing of a more scalable and complex data analysis pipeline and above all, a more enjoyable experience.\n", + "content": "We are thrilled to announce the stable release of Nextflow DSL 2 as part of the latest 20.07.1 version!\n\nNextflow DSL 2 represents a major evolution of the Nextflow language and makes it possible to scale and modularise your data analysis pipeline while continuing to use the Dataflow programming paradigm that characterises the Nextflow processing model.\n\nWe spent more than one year collecting user feedback and making sure that DSL 2 would naturally fit the programming experience Nextflow developers are used to.\n\n#### DLS 2 in a nutshell\n\nBackward compatibility is a paramount value, for this reason the changes introduced in the syntax have been minimal and above all, guarantee the support of all existing applications. DSL 2 will be an opt-in feature for at least the next 12 to 18 months. After this transitory period, we plan to make it the default Nextflow execution mode.\n\nAs of today, to use DSL 2 in your Nextflow pipeline, you are required to use the following declaration at the top of your script:\n\n```\nnextflow.enable.dsl=2\n```\n\nNote that the previous `nextflow.preview` directive is still available, however, when using the above declaration the use of the final syntax is enforced.\n\n#### Nextflow modules\n\nA module file is nothing more than a Nextflow script containing one or more `process` definitions that can be imported from another Nextflow script.\n\nThe only difference when compared with legacy syntax is that the process is not bound with specific input and output channels, as was previously required using the `from` and `into` keywords respectively. Consider this example of the new syntax:\n\n```\nprocess INDEX {\n input:\n path transcriptome\n output:\n path 'index'\n script:\n \"\"\"\n salmon index --threads $task.cpus -t $transcriptome -i index\n \"\"\"\n}\n```\n\nThis allows the definition of workflow processes that can be included from any other script and invoked as a custom function within the new `workflow` scope. This effectively allows for the composition of the pipeline logic and enables reuse of workflow components. We anticipate this to improve both the speed that users can develop new pipelines, and the robustness of these pipelines through the use of validated modules.\n\nAny process input can be provided as a function argument using the usual channel semantics familiar to Nextflow developers. Moreover process outputs can either be assigned to a variable or accessed using the implicit `.out` attribute in the scope implicitly defined by the process name itself. 
See the example below:\n\n```\ninclude { INDEX; FASTQC; QUANT; MULTIQC } from './some/module/script.nf'\n\nread_pairs_ch = channel.fromFilePairs( params.reads)\n\nworkflow {\n INDEX( params.transcriptome )\n FASTQC( read_pairs_ch )\n QUANT( INDEX.out, read_pairs_ch )\n MULTIQC( QUANT.out.mix(FASTQC.out).collect(), multiqc_file )\n}\n```\n\nAlso enhanced is the ability to use channels as inputs multiple times without the need to duplicate them (previously done with the special `into` operator) which makes the resulting pipeline code more concise, fluent and therefore readable!\n\n#### Sub-workflows\n\nNotably, the DSL 2 syntax allows for the definition of reusable processes as well as sub-workflow libraries. The only requirement is to provide a `workflow` name that will be used to reference and declare the corresponding inputs and outputs using the new `take` and `emit` keywords. For example:\n\n```\nworkflow RNASEQ {\n take:\n transcriptome\n read_pairs_ch\n\n main:\n INDEX(transcriptome)\n FASTQC(read_pairs_ch)\n QUANT(INDEX.out, read_pairs_ch)\n\n emit:\n QUANT.out.mix(FASTQC.out).collect()\n}\n```\n\nNow named sub-workflows can be used in the same way as processes, allowing you to easily include and reuse multi-step workflows as part of larger workflows. Find more details [here](/docs/latest/dsl2.html).\n\n#### More syntax sugar\n\nAnother exciting feature of Nextflow DSL 2 is the ability to compose built-in operators, pipeline processes and sub-workflows with the pipe (|) operator! For example the last line in the above example could be written as:\n\n```\nemit:\n QUANT.out | mix(FASTQC.out) | collect\n```\n\nThis syntax finally realizes the Nextflow vision of empowering developers to write complex data analysis applications with a simple but powerful language that mimics the expressiveness of the Unix pipe model but at the same time makes it possible to handle complex data structures and patterns as is required for highly parallelised and distributed computational workflows.\n\nAnother change is the introduction of `channel` as a synonym of the `Channel` type identifier, which allows the use of `channel.fromPath` instead of `Channel.fromPath` and so on. This is a small piece of syntax sugar to keep the capitalisation consistent with the rest of the language.\n\nMoreover, several process inputs and outputs syntax shortcuts were removed when using the final version of DSL 2 to make it more predictable. For example, with DSL 1, in a tuple input or output declaration the component type could be omitted:\n\n```\ninput:\n tuple foo, 'bar'\n```\n\nThe `foo` identifier was implicitly considered an input value declaration, while the string `'bar'` was considered a shortcut for `file('bar')`. However, this was a bit confusing, especially for new users, therefore with DSL 2 the fully qualified version must be used:\n\n```\ninput:\n tuple val(foo), path('bar')\n```\n\nYou can find more detailed migration notes at [this link](/docs/latest/dsl2.html#dsl2-migration-notes).\n\n#### What's next\n\nAs always, reaching an important project milestone can be viewed as a major success, but it is at the same time the starting point for new challenges and developments. Having a modularization mechanism opens new needs and possibilities, the first of which will be focused on the ability to test and validate process modules independently using a unit-testing style approach. 
This will definitely help to make the resulting pipelines more resilient.\n\nAnother important area for the development of the Nextflow language will be the ability to better formalise pipeline inputs and outputs and further decouple for the process declaration. Nextflow currently strongly relies on the `publishDir` constructor for the generation of the workflow outputs.\n\nHowever in the new _module_ world, this approach results in `publishDir` being tied to a single process definition. The plan is instead to extend this concept in a more general and abstract manner, so that it will be possible to capture and redirect the result of any process and sub-workflow based on semantic annotations instead of hardcoding it at the task level.\n\n### Conclusion\n\nWe are extremely excited about today's release. This was a long awaited advancement and therefore we are very happy to make it available for general availability to all Nextflow users. We greatly appreciate all of the community feedback and ideas over the past year which have shaped DSL 2.\n\nWe are confident this represents a big step forward for the project and will enable the writing of a more scalable and complex data analysis pipeline and above all, a more enjoyable experience.", "images": [], "author": "Paolo Di Tommaso", "tags": "nextflow,release,modules,dsl2" @@ -320,7 +320,7 @@ "slug": "2020/groovy3-syntax-sugar", "title": "More syntax sugar for Nextflow developers!", "date": "2020-11-03T00:00:00.000Z", - "content": "\nThe latest Nextflow version 2020.10.0 is the first stable release running on Groovy 3.\n\nThe first benefit of this change is that now Nextflow can be compiled and run on any modern Java virtual machine,\nfrom Java 8, all the way up to the latest Java 15!\n\nAlong with this, the new Groovy runtime brings a whole lot of syntax enhancements that can be useful in\nthe everyday life of pipeline developers. Let's see them more in detail.\n\n### Improved not operator\n\nThe `!` (not) operator can now prefix the `in` and `instanceof` keywords.\nThis makes for more concise writing of some conditional expression, for example, the following snippet:\n\n```\nlist = [10,20,30]\n\nif( !(x in list) ) {\n // ..\n}\nelse if( !(x instanceof String) ) {\n // ..\n}\n```\n\ncould be replaced by the following:\n\n```\nlist = [10,20,30]\n\nif( x !in list ) {\n // ..\n}\nelse if( x !instanceof String ) {\n // ..\n}\n```\n\nAgain, this is a small syntax change which makes the code a little more\nreadable.\n\n### Elvis assignment operator\n\nThe elvis assignment operator `?=` allows the assignment of a value only if it was not\npreviously assigned (or if it evaluates to `null`). Consider the following example:\n\n```\ndef opts = [foo: 1]\n\nopts.foo ?= 10\nopts.bar ?= 20\n\nassert opts.foo == 1\nassert opts.bar == 20\n```\n\nIn this snippet, the assignment `opts.foo ?= 10` would be ignored because the dictionary `opts` already\ncontains a value for the `foo` attribute, while it is now assigned as expected.\n\nIn other words this is a shortcut for the following idiom:\n\n```\nif( some_variable != null ) {\n some_variable = 'Hello'\n}\n```\n\nIf you are wondering why it's called _Elvis_ assignment, well it's simple, because there's also the [Elvis operator](https://groovy-lang.org/operators.html#_elvis_operator) that you should know (and use!) already. 😆\n\n### Java style lambda expressions\n\nGroovy 3 supports the syntax for Java lambda expression. 
If you don't know what a Java lambda expression is\ndon't worry; it's a concept very similar to a Groovy closure, though with slight differences\nboth in the syntax and the semantic. In a few words, a Groovy closure can modify a variable in the outside scope,\nwhile a Java lambda cannot.\n\nIn terms of syntax, a Groovy closure is defined as:\n\n```\n{ it -> SOME_EXPRESSION_HERE }\n```\n\nWhile Java lambda expression looks like:\n\n```\nit -> { SOME_EXPRESSION_HERE }\n```\n\nwhich can be simplified to the following form when the expression is a single statement:\n\n```\nit -> SOME_EXPRESSION_HERE\n```\n\nThe good news is that the two syntaxes are interoperable in many cases and we can use the _lambda_\nsyntax to get rid-off of the curly bracket parentheses used by the Groovy notation to make our Nextflow\nscript more readable.\n\nFor example, the following Nextflow idiom:\n\n```\nChannel\n .of( 1,2,3 )\n .map { it * it +1 }\n .view { \"the value is $it\" }\n```\n\nCan be rewritten using the lambda syntax as:\n\n```\nChannel\n .of( 1,2,3 )\n .map( it -> it * it +1 )\n .view( it -> \"the value is $it\" )\n```\n\nIt is a bit more consistent. Note however that the `it ->` implicit argument is now mandatory (while when using the closure syntax it could be omitted). Also, when the operator argument is not _single_ value, the lambda requires the\nround parentheses to define the argument e.g.\n\n```\nChannel\n .of( 1,2,3 )\n .map( it -> tuple(it * it, it+1) )\n .view( (a,b) -> \"the values are $a and $b\" )\n```\n\n### Full support for Java streams API\n\nSince version 8, Java provides a [stream library](https://winterbe.com/posts/2014/07/31/java8-stream-tutorial-examples/) that is very powerful and implements some concepts and operators similar to Nextflow channels.\n\nThe main differences between the two are that Nextflow channels and the corresponding operators are _non-blocking_\ni.e. their evaluation is performed asynchronously without blocking your program execution, while Java streams are\nexecuted in a synchronous manner (at least by default).\n\nA Java stream looks like the following:\n\n```\nassert (1..10).stream()\n .filter(e -> e % 2 == 0)\n .map(e -> e * 2)\n .toList() == [4, 8, 12, 16, 20]\n\n```\n\nNote, in the above example\n[filter](https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#filter-java.util.function.Predicate-),\n[map](https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#map-java.util.function.Function-) and\n[toList](https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collectors.html#toList--)\nmethods are Java stream operator not the\n[Nextflow](https://www.nextflow.io/docs/latest/operator.html#filter)\n[homonymous](https://www.nextflow.io/docs/latest/operator.html#map)\n[ones](https://www.nextflow.io/docs/latest/operator.html#tolist).\n\n### Java style method reference\n\nThe new runtime also allows for the use of the `::` operator to reference an object method.\nThis can be useful to pass a method as an argument to a Nextflow operator in a similar\nmanner to how it was already possible using a closure. 
For example:\n\n```\nChannel\n .of( 'a', 'b', 'c')\n .view( String::toUpperCase )\n```\n\nThe above prints:\n\n```\n A\n B\n C\n```\n\nBecause to [view](https://www.nextflow.io/docs/latest/operator.html#filter) operator applied\nthe method [toUpperCase](https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#toUpperCase--)\nto each element emitted by the channel.\n\n### Conclusion\n\nThe new Groovy runtime brings a lot of syntax sugar for Nextflow pipelines and allows the use of modern Java\nruntime which delivers better performance and resource usage.\n\nThe ones listed above are only a small selection which may be useful to everyday Nextflow developers.\nIf you are curious to learn more about all the changes in the new Groovy parser you can find more details in\n[this link](https://groovy-lang.org/releasenotes/groovy-3.0.html).\n\nFinally, a big thanks to the Groovy community for their significant efforts in developing and maintaining this\ngreat programming environment.\n",
+    "content": "The latest Nextflow version 2020.10.0 is the first stable release running on Groovy 3.\n\nThe first benefit of this change is that now Nextflow can be compiled and run on any modern Java virtual machine,\nfrom Java 8, all the way up to the latest Java 15!\n\nAlong with this, the new Groovy runtime brings a whole lot of syntax enhancements that can be useful in\nthe everyday life of pipeline developers. Let's look at them in more detail.\n\n### Improved not operator\n\nThe `!` (not) operator can now prefix the `in` and `instanceof` keywords.\nThis makes for more concise writing of some conditional expressions. For example, the following snippet:\n\n```\nlist = [10,20,30]\n\nif( !(x in list) ) {\n // ..\n}\nelse if( !(x instanceof String) ) {\n // ..\n}\n```\n\ncould be replaced by the following:\n\n```\nlist = [10,20,30]\n\nif( x !in list ) {\n // ..\n}\nelse if( x !instanceof String ) {\n // ..\n}\n```\n\nAgain, this is a small syntax change which makes the code a little more\nreadable.\n\n### Elvis assignment operator\n\nThe elvis assignment operator `?=` allows the assignment of a value only if it was not\npreviously assigned (or if it evaluates to `null`). Consider the following example:\n\n```\ndef opts = [foo: 1]\n\nopts.foo ?= 10\nopts.bar ?= 20\n\nassert opts.foo == 1\nassert opts.bar == 20\n```\n\nIn this snippet, the assignment `opts.foo ?= 10` would be ignored because the dictionary `opts` already\ncontains a value for the `foo` attribute, while `opts.bar` is assigned as expected.\n\nIn other words, this is a shortcut for the following idiom:\n\n```\nif( some_variable == null ) {\n some_variable = 'Hello'\n}\n```\n\nIf you are wondering why it's called _Elvis_ assignment, well it's simple, because there's also the [Elvis operator](https://groovy-lang.org/operators.html#_elvis_operator) that you should know (and use!) already. 😆\n\n### Java style lambda expressions\n\nGroovy 3 supports the syntax for Java lambda expressions. If you don't know what a Java lambda expression is\ndon't worry; it's a concept very similar to a Groovy closure, though with slight differences\nboth in the syntax and the semantics. 
In a few words, a Groovy closure can modify a variable in the outside scope,\nwhile a Java lambda cannot.\n\nIn terms of syntax, a Groovy closure is defined as:\n\n```\n{ it -> SOME_EXPRESSION_HERE }\n```\n\nWhile a Java lambda expression looks like:\n\n```\nit -> { SOME_EXPRESSION_HERE }\n```\n\nwhich can be simplified to the following form when the expression is a single statement:\n\n```\nit -> SOME_EXPRESSION_HERE\n```\n\nThe good news is that the two syntaxes are interoperable in many cases and we can use the _lambda_\nsyntax to get rid of the curly brackets used by the Groovy notation and make our Nextflow\nscript more readable.\n\nFor example, the following Nextflow idiom:\n\n```\nChannel\n .of( 1,2,3 )\n .map { it * it +1 }\n .view { \"the value is $it\" }\n```\n\ncan be rewritten using the lambda syntax as:\n\n```\nChannel\n .of( 1,2,3 )\n .map( it -> it * it +1 )\n .view( it -> \"the value is $it\" )\n```\n\nThe result is a bit more consistent. Note however that the `it ->` implicit argument is now mandatory (whereas with the closure syntax it could be omitted). Also, when the operator argument is not a _single_ value, the lambda requires\nround parentheses to define the arguments, e.g.\n\n```\nChannel\n .of( 1,2,3 )\n .map( it -> tuple(it * it, it+1) )\n .view( (a,b) -> \"the values are $a and $b\" )\n```\n\n### Full support for Java streams API\n\nSince version 8, Java provides a [stream library](https://winterbe.com/posts/2014/07/31/java8-stream-tutorial-examples/) that is very powerful and implements some concepts and operators similar to Nextflow channels.\n\nThe main difference between the two is that Nextflow channels and the corresponding operators are _non-blocking_,\ni.e. their evaluation is performed asynchronously without blocking your program execution, while Java streams are\nexecuted in a synchronous manner (at least by default).\n\nA Java stream looks like the following:\n\n```\nassert (1..10).stream()\n .filter(e -> e % 2 == 0)\n .map(e -> e * 2)\n .toList() == [4, 8, 12, 16, 20]\n\n```\n\nNote, in the above example\n[filter](https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#filter-java.util.function.Predicate-),\n[map](https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#map-java.util.function.Function-) and\n[toList](https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collectors.html#toList--)\nmethods are Java stream operators, not the\n[Nextflow](https://www.nextflow.io/docs/latest/operator.html#filter)\n[homonymous](https://www.nextflow.io/docs/latest/operator.html#map)\n[ones](https://www.nextflow.io/docs/latest/operator.html#tolist).\n\n### Java style method reference\n\nThe new runtime also allows for the use of the `::` operator to reference an object method.\nThis can be useful to pass a method as an argument to a Nextflow operator in a similar\nmanner to how it was already possible using a closure. 
For example:\n\n```\nChannel\n .of( 'a', 'b', 'c')\n .view( String::toUpperCase )\n```\n\nThe above prints:\n\n```\n A\n B\n C\n```\n\nBecause to [view](https://www.nextflow.io/docs/latest/operator.html#filter) operator applied\nthe method [toUpperCase](https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#toUpperCase--)\nto each element emitted by the channel.\n\n### Conclusion\n\nThe new Groovy runtime brings a lot of syntax sugar for Nextflow pipelines and allows the use of modern Java\nruntime which delivers better performance and resource usage.\n\nThe ones listed above are only a small selection which may be useful to everyday Nextflow developers.\nIf you are curious to learn more about all the changes in the new Groovy parser you can find more details in\n[this link](https://groovy-lang.org/releasenotes/groovy-3.0.html).\n\nFinally, a big thanks to the Groovy community for their significant efforts in developing and maintaining this\ngreat programming environment.", "images": [], "author": "Paolo Di Tommaso", "tags": "nextflow,dsl2" @@ -329,7 +329,7 @@ "slug": "2020/learning-nextflow-in-2020", "title": "Learning Nextflow in 2020", "date": "2020-12-01T00:00:00.000Z", - "content": "\nWith the year nearly over, we thought it was about time to pull together the best-of-the-best guide for learning Nextflow in 2020. These resources will support anyone in the journey from total noob to Nextflow expert so this holiday season, give yourself or someone you know the gift of learning Nextflow!\n\n### Prerequisites to get started\n\nWe recommend that learners are comfortable with using the command line and the basic concepts of a scripting language such as Python or Perl before they start writing pipelines. Nextflow is widely used for bioinformatics applications, and the examples in these guides often focus on applications in these topics. However, Nextflow is now adopted in a number of data-intensive domains such as radio astronomy, satellite imaging and machine learning. No domain expertise is expected.\n\n### Time commitment\n\nWe estimate that the speediest of learners can complete the material in around 12 hours. It all depends on your background and how deep you want to dive into the rabbit-hole! Most of the content is introductory with some more advanced dataflow and configuration material in the workshops and patterns sections.\n\n### Overview of the material\n\n- Why learn Nextflow?\n- Introduction to Nextflow - AWS HPC Conference 2020 (8m)\n- A simple RNA-Seq hands-on tutorial (2h)\n- Full-immersion workshop (8h)\n- Nextflow advanced implementation Patterns (2h)\n- Other resources\n- Community and Support\n\n### 1. Why learn Nextflow?\n\nNextflow is an open-source workflow framework for writing and scaling data-intensive computational pipelines. It is designed around the Linux philosophy of simple yet powerful command-line and scripting tools that, when chained together, facilitate complex data manipulations. Combined with support for containerization, support for major cloud providers and on-premise architectures, Nextflow simplifies the writing and deployment of complex data pipelines on any infrastructure.\n\nThe following are some high-level motivations on why people choose to adopt Nextflow:\n\n1. Integrating Nextflow in your analysis workflows helps you implement **reproducible** pipelines. Nextflow pipelines follow FDA repeatability and reproducibility guidelines with version-control and containers to manage all software dependencies.\n2. 
Avoid vendor lock-in by ensuring portability. Nextflow is **portable**; the same pipeline written on a laptop can quickly scale to run HPC cluster, Amazon and Google cloud services, and Kubernetes. The code stays constant across varying infrastructures allowing collaboration and avoiding lock-in.\n3. It is **scalable** allowing the parallelization of tasks using the dataflow paradigm without having to hard-code the pipeline to a specific platform architecture.\n4. It is **flexible** and supports scientific workflow requirements like caching processes to prevent re-computation, and workflow reports to better understand the workflows’ executions.\n5. It is **growing fast** and has **long-term support**. Developed since 2013 by the same team, the Nextflow ecosystem is expanding rapidly.\n6. It is **open source** and licensed under Apache 2.0. You are free to use it, modify it and distribute it.\n\n### 2. Introduction to Nextflow from the HPC on AWS Conference 2020\n\nThis short YouTube video provides a general overview of Nextflow, the motivations behind its development and a demonstration of some of the latest features.\n\n\n\n### 3. A simple RNA-Seq hands-on tutorial\n\nThis hands-on tutorial from Seqera Labs will guide you through implementing a proof-of-concept RNA-seq pipeline. The goal is to become familiar with basic concepts, including how to define parameters, use channels for data and write processes to perform tasks. It includes all scripts, data and resources and is perfect for getting a flavor for Nextflow.\n\n[Tutorial link on GitHub](https://github.com/seqeralabs/nextflow-tutorial)\n\n### 4. Full-immersion workshop\n\nHere you’ll dive deeper into Nextflow’s most prominent features and learn how to apply them. The full workshop includes an excellent section on containers, how to build them and how to use them with Nextflow. The written materials come with examples and hands-on exercises. Optionally, you can also follow with a series of videos from a live training workshop.\n\nThe workshop includes topics on:\n\n- Environment Setup\n- Basic NF Script and Concepts\n- Nextflow Processes\n- Nextflow Channels\n- Nextflow Operators\n- Basic RNA-Seq pipeline\n- Containers & Conda\n- Nextflow Configuration\n- On-premise & Cloud Deployment\n- DSL 2 & Modules\n- [GATK hands-on exercise](https://seqera.io/training/handson/)\n\n[Workshop](https://seqera.io/training) & [YouTube playlist](https://www.youtube.com/playlist?list=PLPZ8WHdZGxmUv4W8ZRlmstkZwhb_fencI).\n\n### 5. Nextflow implementation Patterns\n\nThis advanced section discusses recurring patterns and solutions to many common implementation requirements. Code examples are available with notes to follow along with as well as a GitHub repository.\n\n[Nextflow Patterns](http://nextflow-io.github.io/patterns/index.html) & [GitHub repository](https://github.com/nextflow-io/patterns).\n\n### Other resources\n\nThe following resources will help you dig deeper into Nextflow and other related projects like the nf-core community who maintain curated pipelines and a very active Slack channel. There are plenty of Nextflow tutorials and videos online, and the following list is no way exhaustive. Please let us know if we are missing something.\n\n#### Nextflow docs\n\nThe reference for the Nextflow language and runtime. The docs should be your first point of reference when something is not clear. 
Newest features are documented in edge documentation pages released every month with the latest stable releases every three months.\n\nLatest [stable](https://www.nextflow.io/docs/latest/index.html) & [edge](https://www.nextflow.io/docs/edge/index.html) documentation.\n\n#### nf-core\n\nnf-core is a growing community of Nextflow users and developers. You can find curated sets of biomedical analysis pipelines built by domain experts with Nextflow, that have passed tests and have been implemented according to best practice guidelines. Be sure to sign up to the Slack channel.\n\n[nf-core website](https://nf-co.re)\n\n#### Tower Docs\n\nNextflow Tower is a platform to easily monitor, launch and scale Nextflow pipelines on cloud providers and on-premise infrastructure. The documentation provides details on setting up compute environments, monitoring pipelines and launching using either the web graphic interface or API.\n\n[Nextflow Tower documentation](http://help.tower.nf)\n\n#### Nextflow Biotech Blueprint by AWS\n\nA quickstart for deploying a genomics analysis environment on Amazon Web Services (AWS) cloud, using Nextflow to create and orchestrate analysis workflows and AWS Batch to run the workflow processes.\n\n[Biotech Blueprint by AWS](https://aws.amazon.com/quickstart/biotech-blueprint/nextflow/)\n\n#### Running Nextflow by Google Cloud\n\nGoogle Cloud Nextflow step-by-step guide to launching Nextflow Pipelines in Google Cloud.\n\n[Nextflow on Google Cloud ](https://cloud.google.com/life-sciences/docs/tutorials/nextflow)\n\n#### Awesome Nextflow\n\nA collections of Nextflow based pipelines and other resources.\n\n[Awesome Nextflow](https://github.com/nextflow-io/awesome-nextflow)\n\n### Community and support\n\n- Nextflow [Gitter channel](https://gitter.im/nextflow-io/nextflow)\n- Nextflow [Forums](https://groups.google.com/forum/#!forum/nextflow)\n- [nf-core Slack](https://nfcore.slack.com/)\n- Twitter [@nextflowio](https://twitter.com/nextflowio?lang=en)\n- [Seqera Labs](https://www.seqera.io) technical support & consulting\n\nNextflow is a community-driven project. The list of links below has been collated from a diverse collection of resources and experts to guide you in learning Nextflow. If you have any suggestions, please make a pull request to this page on GitHub.\n\nAlso stay tuned for our upcoming post, where we will discuss the ultimate Nextflow development environment.\n", + "content": "With the year nearly over, we thought it was about time to pull together the best-of-the-best guide for learning Nextflow in 2020. These resources will support anyone in the journey from total noob to Nextflow expert so this holiday season, give yourself or someone you know the gift of learning Nextflow!\n\n### Prerequisites to get started\n\nWe recommend that learners are comfortable with using the command line and the basic concepts of a scripting language such as Python or Perl before they start writing pipelines. Nextflow is widely used for bioinformatics applications, and the examples in these guides often focus on applications in these topics. However, Nextflow is now adopted in a number of data-intensive domains such as radio astronomy, satellite imaging and machine learning. No domain expertise is expected.\n\n### Time commitment\n\nWe estimate that the speediest of learners can complete the material in around 12 hours. It all depends on your background and how deep you want to dive into the rabbit-hole! 
Most of the content is introductory with some more advanced dataflow and configuration material in the workshops and patterns sections.\n\n### Overview of the material\n\n- Why learn Nextflow?\n- Introduction to Nextflow - AWS HPC Conference 2020 (8m)\n- A simple RNA-Seq hands-on tutorial (2h)\n- Full-immersion workshop (8h)\n- Nextflow advanced implementation Patterns (2h)\n- Other resources\n- Community and Support\n\n### 1. Why learn Nextflow?\n\nNextflow is an open-source workflow framework for writing and scaling data-intensive computational pipelines. It is designed around the Linux philosophy of simple yet powerful command-line and scripting tools that, when chained together, facilitate complex data manipulations. Combined with support for containerization, support for major cloud providers and on-premise architectures, Nextflow simplifies the writing and deployment of complex data pipelines on any infrastructure.\n\nThe following are some high-level motivations on why people choose to adopt Nextflow:\n\n1. Integrating Nextflow in your analysis workflows helps you implement **reproducible** pipelines. Nextflow pipelines follow FDA repeatability and reproducibility guidelines with version-control and containers to manage all software dependencies.\n2. Avoid vendor lock-in by ensuring portability. Nextflow is **portable**; the same pipeline written on a laptop can quickly scale to run HPC cluster, Amazon and Google cloud services, and Kubernetes. The code stays constant across varying infrastructures allowing collaboration and avoiding lock-in.\n3. It is **scalable** allowing the parallelization of tasks using the dataflow paradigm without having to hard-code the pipeline to a specific platform architecture.\n4. It is **flexible** and supports scientific workflow requirements like caching processes to prevent re-computation, and workflow reports to better understand the workflows’ executions.\n5. It is **growing fast** and has **long-term support**. Developed since 2013 by the same team, the Nextflow ecosystem is expanding rapidly.\n6. It is **open source** and licensed under Apache 2.0. You are free to use it, modify it and distribute it.\n\n### 2. Introduction to Nextflow from the HPC on AWS Conference 2020\n\nThis short YouTube video provides a general overview of Nextflow, the motivations behind its development and a demonstration of some of the latest features.\n\n\n\n### 3. A simple RNA-Seq hands-on tutorial\n\nThis hands-on tutorial from Seqera Labs will guide you through implementing a proof-of-concept RNA-seq pipeline. The goal is to become familiar with basic concepts, including how to define parameters, use channels for data and write processes to perform tasks. It includes all scripts, data and resources and is perfect for getting a flavor for Nextflow.\n\n[Tutorial link on GitHub](https://github.com/seqeralabs/nextflow-tutorial)\n\n### 4. Full-immersion workshop\n\nHere you’ll dive deeper into Nextflow’s most prominent features and learn how to apply them. The full workshop includes an excellent section on containers, how to build them and how to use them with Nextflow. The written materials come with examples and hands-on exercises. 
Optionally, you can also follow with a series of videos from a live training workshop.\n\nThe workshop includes topics on:\n\n- Environment Setup\n- Basic NF Script and Concepts\n- Nextflow Processes\n- Nextflow Channels\n- Nextflow Operators\n- Basic RNA-Seq pipeline\n- Containers & Conda\n- Nextflow Configuration\n- On-premise & Cloud Deployment\n- DSL 2 & Modules\n- [GATK hands-on exercise](https://seqera.io/training/handson/)\n\n[Workshop](https://seqera.io/training) & [YouTube playlist](https://www.youtube.com/playlist?list=PLPZ8WHdZGxmUv4W8ZRlmstkZwhb_fencI).\n\n### 5. Nextflow implementation Patterns\n\nThis advanced section discusses recurring patterns and solutions to many common implementation requirements. Code examples are available with notes to follow along with as well as a GitHub repository.\n\n[Nextflow Patterns](http://nextflow-io.github.io/patterns/index.html) & [GitHub repository](https://github.com/nextflow-io/patterns).\n\n### Other resources\n\nThe following resources will help you dig deeper into Nextflow and other related projects like the nf-core community who maintain curated pipelines and a very active Slack channel. There are plenty of Nextflow tutorials and videos online, and the following list is no way exhaustive. Please let us know if we are missing something.\n\n#### Nextflow docs\n\nThe reference for the Nextflow language and runtime. The docs should be your first point of reference when something is not clear. Newest features are documented in edge documentation pages released every month with the latest stable releases every three months.\n\nLatest [stable](https://www.nextflow.io/docs/latest/index.html) & [edge](https://www.nextflow.io/docs/edge/index.html) documentation.\n\n#### nf-core\n\nnf-core is a growing community of Nextflow users and developers. You can find curated sets of biomedical analysis pipelines built by domain experts with Nextflow, that have passed tests and have been implemented according to best practice guidelines. Be sure to sign up to the Slack channel.\n\n[nf-core website](https://nf-co.re)\n\n#### Tower Docs\n\nNextflow Tower is a platform to easily monitor, launch and scale Nextflow pipelines on cloud providers and on-premise infrastructure. 
The documentation provides details on setting up compute environments, monitoring pipelines and launching using either the web graphic interface or API.\n\n[Nextflow Tower documentation](http://help.tower.nf)\n\n#### Nextflow Biotech Blueprint by AWS\n\nA quickstart for deploying a genomics analysis environment on Amazon Web Services (AWS) cloud, using Nextflow to create and orchestrate analysis workflows and AWS Batch to run the workflow processes.\n\n[Biotech Blueprint by AWS](https://aws.amazon.com/quickstart/biotech-blueprint/nextflow/)\n\n#### Running Nextflow by Google Cloud\n\nGoogle Cloud Nextflow step-by-step guide to launching Nextflow Pipelines in Google Cloud.\n\n[Nextflow on Google Cloud ](https://cloud.google.com/life-sciences/docs/tutorials/nextflow)\n\n#### Awesome Nextflow\n\nA collections of Nextflow based pipelines and other resources.\n\n[Awesome Nextflow](https://github.com/nextflow-io/awesome-nextflow)\n\n### Community and support\n\n- Nextflow [Gitter channel](https://gitter.im/nextflow-io/nextflow)\n- Nextflow [Forums](https://groups.google.com/forum/#!forum/nextflow)\n- [nf-core Slack](https://nfcore.slack.com/)\n- Twitter [@nextflowio](https://twitter.com/nextflowio?lang=en)\n- [Seqera Labs](https://www.seqera.io) technical support & consulting\n\nNextflow is a community-driven project. The list of links below has been collated from a diverse collection of resources and experts to guide you in learning Nextflow. If you have any suggestions, please make a pull request to this page on GitHub.\n\nAlso stay tuned for our upcoming post, where we will discuss the ultimate Nextflow development environment.", "images": [], "author": "Evan Floden & Alain Coletta", "tags": "nextflow,learning,workshop" @@ -338,7 +338,7 @@ "slug": "2021/5-more-tips-for-nextflow-user-on-hpc", "title": "Five more tips for Nextflow user on HPC", "date": "2021-06-15T00:00:00.000Z", - "content": "\nIn May we blogged about [Five Nextflow Tips for HPC Users](/blog/2021/5_tips_for_hpc_users.html) and now we continue the series with five additional tips for deploying Nextflow with on HPC batch schedulers.\n\n### 1. Use the scratch directive\n\nTo allow the pipeline tasks to share data with each other, Nextflow requires a shared file system path as a working directory. When using this model, a common recommendation is to use the node's local scratch storage as the job working directory to avoid unnecessary use of the network shared file system and achieve better performance.\n\nNextflow implements this best-practice which can be enabled by adding the following setting in your `nextflow.config` file.\n\n```\nprocess.scratch = true\n```\n\nWhen using this option, Nextflow:\n\n- Creates a unique directory in the computing node's local `/tmp` or the path assigned by your cluster via the `TMPDIR` environment variable.\n- Creates a [symlink](https://en.wikipedia.org/wiki/Symbolic_link) for each input file required by the job execution.\n- Runs the job in the local scratch path.\n Copies the job output files into the job shared work directory assigned by Nextflow.\n\n### 2. Use -bg option to launch the execution in the background\n\nIn some circumstances, you may need to run your Nextflow pipeline in the background without losing the execution output. In this scenario use the `-bg` command line option as shown below.\n\n```\nnextflow run -bg > my-file.log\n```\n\nThis can be very useful when launching the execution from an SSH connected terminal and ensures that any connection issues don't stop the pipeline. 
You can use `ps` and `kill` to find and stop the execution.\n\n### 3. Disable interactive logging\n\nNextflow has rich terminal logging which uses ANSI escape codes to update the pipeline execution counters interactively. However, this is not very useful when submitting the pipeline execution as a cluster job or in the background. In this case, disable the rich ANSI logging using the command line option `-ansi-log false` or the environment variable `NXF_ANSI_LOG=false`.\n\n### 4. Cluster native options\n\nNextlow has portable directives for common resource requests such as [cpus](https://www.nextflow.io/docs/latest/process.html#cpus), [memory](https://www.nextflow.io/docs/latest/process.html#memory) and [disk](https://www.nextflow.io/docs/latest/process.html#disk) allocation.\n\nThese directives allow you to specify the request for a certain number of computing resources e.g CPUs, memory, or disk and Nextflow converts these values to the native setting of the target execution platform specified in the pipeline configuration.\n\nHowever, there can be settings that are only available on some specific cluster technology or vendors.\n\nThe [clusterOptions](https://www.nextflow.io/docs/latest/process.html#clusterOptions) directive allows you to specify any option of your resource manager for which there isn't direct support in Nextflow.\n\n### 5. Retry failing jobs increasing resource allocation\n\nA common scenario is that instances of the same process may require different computing resources. For example, requesting an amount of memory that is too low for some processes will result in those tasks failing. You could specify a higher limit which would accommodate the task with the highest memory utilization, but you then run the risk of decreasing your job’s execution priority.\n\nNextflow provides a mechanism that allows you to modify the amount of computing resources requested in the case of a process failure and attempt to re-execute it using a higher limit. For example:\n\n```\nprocess foo {\n\n memory { 2.GB * task.attempt }\n time { 1.hour * task.attempt }\n\n errorStrategy { task.exitStatus in 137..140 ? 'retry' : 'terminate' }\n maxRetries 3\n\n script:\n \"\"\"\n your_job_command --here\n \"\"\"\n}\n```\n\nIn the above example the memory and execution time limits are defined dynamically. The first time the process is executed the task.attempt is set to 1, thus it will request 2 GB of memory and one hour of maximum execution time.\n\nIf the task execution fails, reporting an exit status in the range between 137 and 140, the task is re-submitted (otherwise it terminates immediately). This time the value of task.attempt is 2, thus increasing the amount of the memory to four GB and the time to 2 hours, and so on.\n\nNOTE: These exit statuses are not standard and can change depending on the resource manager you are using. Consult your cluster administrator or scheduler administration guide for details on the exit statuses used by your cluster in similar error conditions.\n\n### Conclusion\n\nNextflow aims to give you control over every aspect of your workflow. These Nextflow options allow you to shape how Nextflow submits your processes to your executor, that can make your workflow more robust by avoiding the overloading of the executor. Some systems have hard limits which if you do not take into account, no processes will be executed. 
Being aware of these configuration values and how to use them is incredibly helpful when working with larger workflows.\n",
+    "content": "In May we blogged about [Five Nextflow Tips for HPC Users](/blog/2021/5_tips_for_hpc_users.html) and now we continue the series with five additional tips for deploying Nextflow on HPC batch schedulers.\n\n### 1. Use the scratch directive\n\nTo allow the pipeline tasks to share data with each other, Nextflow requires a shared file system path as a working directory. When using this model, a common recommendation is to use the node's local scratch storage as the job working directory to avoid unnecessary use of the network shared file system and achieve better performance.\n\nNextflow implements this best practice, which can be enabled by adding the following setting in your `nextflow.config` file.\n\n```\nprocess.scratch = true\n```\n\nWhen using this option, Nextflow:\n\n- Creates a unique directory in the computing node's local `/tmp` or the path assigned by your cluster via the `TMPDIR` environment variable.\n- Creates a [symlink](https://en.wikipedia.org/wiki/Symbolic_link) for each input file required by the job execution.\n- Runs the job in the local scratch path.\n- Copies the job output files into the job shared work directory assigned by Nextflow.\n\n### 2. Use -bg option to launch the execution in the background\n\nIn some circumstances, you may need to run your Nextflow pipeline in the background without losing the execution output. In this scenario use the `-bg` command line option as shown below.\n\n```\nnextflow run -bg > my-file.log\n```\n\nThis can be very useful when launching the execution from an SSH connected terminal and ensures that any connection issues don't stop the pipeline. You can use `ps` and `kill` to find and stop the execution.\n\n### 3. Disable interactive logging\n\nNextflow has rich terminal logging which uses ANSI escape codes to update the pipeline execution counters interactively. However, this is not very useful when submitting the pipeline execution as a cluster job or in the background. In this case, disable the rich ANSI logging using the command line option `-ansi-log false` or the environment variable `NXF_ANSI_LOG=false`.\n\n### 4. Cluster native options\n\nNextflow has portable directives for common resource requests such as [cpus](https://www.nextflow.io/docs/latest/process.html#cpus), [memory](https://www.nextflow.io/docs/latest/process.html#memory) and [disk](https://www.nextflow.io/docs/latest/process.html#disk) allocation.\n\nThese directives allow you to specify the request for a certain number of computing resources e.g. CPUs, memory, or disk and Nextflow converts these values to the native setting of the target execution platform specified in the pipeline configuration.\n\nHowever, there can be settings that are only available for specific cluster technologies or vendors.\n\nThe [clusterOptions](https://www.nextflow.io/docs/latest/process.html#clusterOptions) directive allows you to specify any option of your resource manager for which there isn't direct support in Nextflow.\n\n### 5. Retry failing jobs increasing resource allocation\n\nA common scenario is that instances of the same process may require different computing resources. For example, requesting an amount of memory that is too low for some processes will result in those tasks failing. 
You could specify a higher limit which would accommodate the task with the highest memory utilization, but you then run the risk of decreasing your job’s execution priority.\n\nNextflow provides a mechanism that allows you to modify the amount of computing resources requested in the case of a process failure and attempt to re-execute it using a higher limit. For example:\n\n```\nprocess foo {\n\n memory { 2.GB * task.attempt }\n time { 1.hour * task.attempt }\n\n errorStrategy { task.exitStatus in 137..140 ? 'retry' : 'terminate' }\n maxRetries 3\n\n script:\n \"\"\"\n your_job_command --here\n \"\"\"\n}\n```\n\nIn the above example the memory and execution time limits are defined dynamically. The first time the process is executed the task.attempt is set to 1, thus it will request 2 GB of memory and one hour of maximum execution time.\n\nIf the task execution fails, reporting an exit status in the range between 137 and 140, the task is re-submitted (otherwise it terminates immediately). This time the value of task.attempt is 2, thus increasing the amount of the memory to four GB and the time to 2 hours, and so on.\n\nNOTE: These exit statuses are not standard and can change depending on the resource manager you are using. Consult your cluster administrator or scheduler administration guide for details on the exit statuses used by your cluster in similar error conditions.\n\n### Conclusion\n\nNextflow aims to give you control over every aspect of your workflow. These Nextflow options allow you to shape how Nextflow submits your processes to your executor, that can make your workflow more robust by avoiding the overloading of the executor. Some systems have hard limits which if you do not take into account, no processes will be executed. Being aware of these configuration values and how to use them is incredibly helpful when working with larger workflows.\n", "images": [], "author": "Kevin Sayers", "tags": "nextflow,hpc" @@ -347,7 +347,7 @@ "slug": "2021/5_tips_for_hpc_users", "title": "5 Nextflow Tips for HPC Users", "date": "2021-05-13T00:00:00.000Z", - "content": "\nNextflow is a powerful tool for developing scientific workflows for use on HPC systems. It provides a simple solution to deploy parallelized workloads at scale using an elegant reactive/functional programming model in a portable manner.\n\nIt supports the most popular workload managers such as Grid Engine, Slurm, LSF and PBS, among other out-of-the-box executors, and comes with sensible defaults for each. However, each HPC system is a complex machine with its own characteristics and constraints. For this reason you should always consult your system administrator before running a new piece of software or a compute intensive pipeline that spawns a large number of jobs.\n\nIn this series of posts, we will be sharing the top tips we have learned along the way that should help you get results faster while keeping in the good books of your sys admins.\n\n### 1. Don't forget the executor\n\nNextflow, by default, spawns parallel task executions in the computer on which it is running. This is generally useful for development purposes, however, when using an HPC system you should specify the executor matching your system. This instructs Nextflow to submit pipeline tasks as jobs into your HPC workload manager. 
This can be done adding the following setting to the `nextflow.config` file in the launching directory, for example:\n\n```\nprocess.executor = 'slurm'\n```\n\nWith the above setting Nextflow will submit the job executions to your Slurm cluster spawning a `sbatch` command for each job in your pipeline. Find the executor matching your system at [this link](https://www.nextflow.io/docs/latest/executor.html).\nEven better, to prevent the undesired use of the local executor in a specific environment, define the _default_ executor to be used by Nextflow using the following system variable:\n\n```\nexport NXF_EXECUTOR=slurm\n```\n\n### 2. Nextflow as a job\n\nQuite surely your sys admin has already warned you that the login/head node should only be used to submit job executions and not run compute intensive tasks.\nWhen running a Nextflow pipeline, the driver application submits and monitors the job executions on your cluster (provided you have correctly specified the executor as stated in point 1), and therefore it should not run compute intensive tasks.\n\nHowever, it's never a good practice to launch a long running job in the login node, and therefore a good practice consists of running Nextflow itself as a cluster job. This can be done by wrapping the `nextflow run` command in a shell script and submitting it as any other job. An average pipeline may require 2 CPUs and 2 GB of resources allocation.\n\nNote: the queue where the Nextflow driver job is submitted should allow the spawning of the pipeline jobs to carry out the pipeline execution.\n\n### 3. Use the queueSize directive\n\nThe `queueSize` directive is part of the executor configuration in the `nextflow.config` file, and defines how many processes are queued at a given time. By default, Nextflow will submit up to 100 jobs at a time for execution. Increase or decrease this setting depending your HPC system quota and throughput. For example:\n\n```\nexecutor {\n name = 'slurm'\n queueSize = 50\n}\n```\n\n### 4. Specify the max heap size\n\nThe Nextflow runtime runs on top of the Java virtual machine which, by design, tries to allocate as much memory as is available. This is not a good practice in HPC systems which are designed to share compute resources across many users and applications.\nTo avoid this, specify the maximum amount of memory that can be used by the Java VM using the -Xms and -Xmx Java flags. These can be specified using the `NXF_OPTS` environment variable.\n\nFor example:\n\n```\nexport NXF_OPTS=\"-Xms500M -Xmx2G\"\n```\n\nThe above setting instructs Nextflow to allocate a Java heap in the range of 500 MB and 2 GB of RAM.\n\n### 5. Limit the Nextflow submit rate\n\nNextflow attempts to submit the job executions as quickly as possible, which is generally not a problem. However, in some HPC systems the submission throughput is constrained or it should be limited to avoid degrading the overall system performance.\nTo prevent this problem you can use `submitRateLimit` to control the Nextflow job submission throughput. This directive is part of the `executor` configuration scope, and defines the number of tasks that can be submitted per a unit of time. 
The default for the `submitRateLimit` is unlimited.\nYou can specify the `submitRateLimit` like this:\n\n```\nexecutor {\n submitRateLimit = '10 sec'\n}\n```\n\nYou can also more explicitly specify it as a rate of # processes / time unit:\n\n```\nexecutor {\n submitRateLimit = '10/2min'\n}\n```\n\n### Conclusion\n\nNextflow aims to give you control over every aspect of your workflow. These options allow you to shape how Nextflow communicates with your HPC system. This can make workflows more robust while avoiding overloading the executor. Some systems have hard limits, and if you do not take them into account, it will stop any jobs from being scheduled.\n\nStay tuned for part two where we will discuss background executions, retry strategies, maxForks and other tips.\n", + "content": "Nextflow is a powerful tool for developing scientific workflows for use on HPC systems. It provides a simple solution to deploy parallelized workloads at scale using an elegant reactive/functional programming model in a portable manner.\n\nIt supports the most popular workload managers such as Grid Engine, Slurm, LSF and PBS, among other out-of-the-box executors, and comes with sensible defaults for each. However, each HPC system is a complex machine with its own characteristics and constraints. For this reason you should always consult your system administrator before running a new piece of software or a compute intensive pipeline that spawns a large number of jobs.\n\nIn this series of posts, we will be sharing the top tips we have learned along the way that should help you get results faster while keeping in the good books of your sys admins.\n\n### 1. Don't forget the executor\n\nNextflow, by default, spawns parallel task executions in the computer on which it is running. This is generally useful for development purposes, however, when using an HPC system you should specify the executor matching your system. This instructs Nextflow to submit pipeline tasks as jobs into your HPC workload manager. This can be done adding the following setting to the `nextflow.config` file in the launching directory, for example:\n\n```\nprocess.executor = 'slurm'\n```\n\nWith the above setting Nextflow will submit the job executions to your Slurm cluster spawning a `sbatch` command for each job in your pipeline. Find the executor matching your system at [this link](https://www.nextflow.io/docs/latest/executor.html).\nEven better, to prevent the undesired use of the local executor in a specific environment, define the _default_ executor to be used by Nextflow using the following system variable:\n\n```\nexport NXF_EXECUTOR=slurm\n```\n\n### 2. Nextflow as a job\n\nQuite surely your sys admin has already warned you that the login/head node should only be used to submit job executions and not run compute intensive tasks.\nWhen running a Nextflow pipeline, the driver application submits and monitors the job executions on your cluster (provided you have correctly specified the executor as stated in point 1), and therefore it should not run compute intensive tasks.\n\nHowever, it's never a good practice to launch a long running job in the login node, and therefore a good practice consists of running Nextflow itself as a cluster job. This can be done by wrapping the `nextflow run` command in a shell script and submitting it as any other job. 
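\n\nAs an illustration, a minimal Slurm wrapper could look like the following; the script name, resource values and pipeline are placeholders to adapt to your site:\n\n```\n#!/bin/bash\n#SBATCH --job-name=nextflow-driver\n#SBATCH --cpus-per-task=2\n#SBATCH --mem=2G\n#SBATCH --time=24:00:00\n\n# the driver job stays alive for the whole duration of the pipeline run\nnextflow run my-pipeline.nf\n```\n\nSubmitting this script with `sbatch` keeps the login node free, while Nextflow orchestrates the pipeline jobs from within the cluster.\n\n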
An average pipeline may require 2 CPUs and 2 GB of resources allocation.\n\nNote: the queue where the Nextflow driver job is submitted should allow the spawning of the pipeline jobs to carry out the pipeline execution.\n\n### 3. Use the queueSize directive\n\nThe `queueSize` directive is part of the executor configuration in the `nextflow.config` file, and defines how many processes are queued at a given time. By default, Nextflow will submit up to 100 jobs at a time for execution. Increase or decrease this setting depending your HPC system quota and throughput. For example:\n\n```\nexecutor {\n name = 'slurm'\n queueSize = 50\n}\n```\n\n### 4. Specify the max heap size\n\nThe Nextflow runtime runs on top of the Java virtual machine which, by design, tries to allocate as much memory as is available. This is not a good practice in HPC systems which are designed to share compute resources across many users and applications.\nTo avoid this, specify the maximum amount of memory that can be used by the Java VM using the -Xms and -Xmx Java flags. These can be specified using the `NXF_OPTS` environment variable.\n\nFor example:\n\n```\nexport NXF_OPTS=\"-Xms500M -Xmx2G\"\n```\n\nThe above setting instructs Nextflow to allocate a Java heap in the range of 500 MB and 2 GB of RAM.\n\n### 5. Limit the Nextflow submit rate\n\nNextflow attempts to submit the job executions as quickly as possible, which is generally not a problem. However, in some HPC systems the submission throughput is constrained or it should be limited to avoid degrading the overall system performance.\nTo prevent this problem you can use `submitRateLimit` to control the Nextflow job submission throughput. This directive is part of the `executor` configuration scope, and defines the number of tasks that can be submitted per a unit of time. The default for the `submitRateLimit` is unlimited.\nYou can specify the `submitRateLimit` like this:\n\n```\nexecutor {\n submitRateLimit = '10 sec'\n}\n```\n\nYou can also more explicitly specify it as a rate of # processes / time unit:\n\n```\nexecutor {\n submitRateLimit = '10/2min'\n}\n```\n\n### Conclusion\n\nNextflow aims to give you control over every aspect of your workflow. These options allow you to shape how Nextflow communicates with your HPC system. This can make workflows more robust while avoiding overloading the executor. 
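\n\nAs a recap, the settings discussed in this post could be collected in a single `nextflow.config`; the values below are purely illustrative and should be tuned to your cluster's policies:\n\n```\nprocess.executor = 'slurm'\n\nexecutor {\n queueSize = 50 // number of jobs queued at any given time\n submitRateLimit = '10/2min' // at most 10 job submissions every 2 minutes\n}\n```\n\nCombined with the `NXF_OPTS` heap settings shown earlier, this keeps both the Nextflow driver and its job submission behaviour within predictable bounds.\n\n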
Some systems have hard limits, and if you do not take them into account, it will stop any jobs from being scheduled.\n\nStay tuned for part two where we will discuss background executions, retry strategies, maxForks and other tips.", "images": [], "author": "Kevin Sayers", "tags": "nextflow,hpc" @@ -356,7 +356,7 @@ "slug": "2021/configure-git-repositories-with-nextflow", "title": "Configure Git private repositories with Nextflow", "date": "2021-10-21T00:00:00.000Z", - "content": "\nGit has become the de-facto standard for source-code version control system and has seen increasing adoption across the spectrum of software development.\n\nNextflow provides builtin support for Git and most popular Git hosting platforms such\nas GitHub, GitLab and Bitbucket between the others, which streamline managing versions\nand track changes in your pipeline projects and facilitate the collaboration across\ndifferent users.\n\nIn order to access public repositories Nextflow does not require any special configuration, just use the _http_ URL of the pipeline project you want to run\nin the run command, for example:\n\n```\nnextflow run https://github.com/nextflow-io/hello\n```\n\nHowever to allow Nextflow to access private repositories you will need to specify\nthe repository credentials, and the server hostname in the case of self-managed\nGit server installations.\n\n## Configure access to private repositories\n\nThis is done through a file name `scm` placed in the `$HOME/.nextflow/` directory, containing the credentials and other details for accessing a particular Git hosting solution. You can refer to the Nextflow documentation for all the [SCM configuration file](https://www.nextflow.io/docs/edge/sharing.html) options.\n\nAll of these platforms have their own authentication mechanisms for Git operations which are captured in the `$HOME/.nextflow/scm` file with the following syntax:\n\n```groovy\nproviders {\n\n '' {\n user = value\n password = value\n ...\n }\n\n '' {\n user = value\n password = value\n ...\n }\n\n}\n```\n\nNote: Make sure to enclose the provider name with `'` if it contains a `-` or a\nblank character.\n\nAs of the 21.09.0-edge release, Nextflow integrates with the following Git providers:\n\n## GitHub\n\n[GitHub](https://github.com) is one of the most well known Git providers and is home to some of the most popular open-source Nextflow pipelines from the [nf-core](https://github.com/nf-core/) community project.\n\nIf you wish to use Nextflow code from a **public** repository hosted on GitHub.com, then you don't need to provide credentials (`user` and `password`) to pull code from the repository. However, if you wish to interact with a private repository or are running into GitHub API rate limits for public repos, then you must provide elevated access to Nextflow by specifying your credentials in the `scm` file.\n\nIt is worth noting that [GitHub recently phased out Git password authentication](https://github.blog/2020-12-15-token-authentication-requirements-for-git-operations/#what-you-need-to-do-today) and now requires that users supply a more secure GitHub-generated _Personal Access Token_ for authentication. 
With Nextflow, you can specify your _personal access token_ in the `password` field.\n\n```groovy\nproviders {\n\n github {\n user = 'me'\n password = 'my-personal-access-token'\n }\n\n}\n```\n\nTo generate a `personal-access-token` for the GitHub platform, follow the instructions provided [here](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token). Ensure that the token has at a minimum all the permissions in the `repo` scope.\n\nOnce you have provided your username and _personal access token_, as shown above, you can test the integration by pulling the repository code.\n\n```\nnextflow pull https://github.com/user_name/private_repo\n```\n\n## Bitbucket Cloud\n\n[Bitbucket](https://bitbucket.org/) is a publicly accessible Git solution hosted by Atlassian. Please note that if you are using an on-premises Bitbucket installation, you should follow the instructions for _Bitbucket Server_ in the following section.\n\nIf your Nextflow code is in a public Bitbucket repository, then you don't need to specify your credentials to pull code from the repository. However, if you wish to interact with a private repository, you need to provide elevated access to Nextflow by specifying your credentials in the `scm` file.\n\nPlease note that Bitbucket Cloud requires your `app password` in the `password` field, which is different from your login password.\n\n```groovy\nproviders {\n\n bitbucket {\n user = 'me'\n password = 'my-app-password'\n }\n\n}\n```\n\nTo generate an `app password` for the Bitbucket platform, follow the instructions provided [here](https://support.atlassian.com/bitbucket-cloud/docs/app-passwords/). Ensure that the token has at least `Repositories: Read` permission.\n\nOnce these settings are saved in `$HOME/.nextflow/scm`, you can test the integration by pulling the repository code.\n\n```\nnextflow pull https://bitbucket.org/user_name/private_repo\n```\n\n## Bitbucket Server\n\n[Bitbucket Server](https://www.atlassian.com/software/bitbucket/enterprise) is a Git hosting solution from Atlassian which is meant for teams that require a self-managed solution. If Nextflow code resides in an open Bitbucket repository, then you don't need to provide credentials to pull code from this repository. 
However, if you wish to interact with a private repository, you need to give elevated access to Nextflow by specifying your credentials in the `scm` file.\n\nFor example, if you'd like to call your hosted Bitbucket server as `mybitbucketserver`, then you'll need to add the following snippet in your `~/$HOME/.nextflow/scm` file.\n\n```groovy\nproviders {\n\n mybitbucketserver {\n platform = 'bitbucketserver'\n server = 'https://your.bitbucket.host.com'\n user = 'me'\n password = 'my-password' // OR \"my-token\"\n }\n\n}\n```\n\nTo generate a _personal access token_ for Bitbucket Server, refer to the [Bitbucket Support documentation](https://confluence.atlassian.com/bitbucketserver/managing-personal-access-tokens-1005339986.html) from Atlassian.\n\nOnce the configuration is saved, you can test the integration by pulling code from a private repository and specifying the `mybitbucketserver` Git provider using the `-hub` option.\n\n```\nnextflow pull https://your.bitbucket.host.com/user_name/private_repo -hub mybitbucketserver\n```\n\nNOTE: It is worth noting that [Atlassian is phasing out the Server offering](https://www.atlassian.com/migration/assess/journey-to-cloud) in favor of cloud product [bitbucket.org](https://bitbucket.org).\n\n## GitLab\n\n[GitLab](https://gitlab.com) is a popular Git provider that offers features covering various aspects of the DevOps cycle.\n\nIf you wish to run a Nextflow pipeline from a public GitLab repository, there is no need to provide credentials to pull code. However, if you wish to interact with a private repository, then you must give elevated access to Nextflow by specifying your credentials in the `scm` file.\n\nPlease note that you need to specify your _personal access token_ in the `password` field.\n\n```groovy\nproviders {\n\n mygitlab {\n user = 'me'\n password = 'my-password' // or 'my-personal-access-token'\n token = 'my-personal-access-token'\n }\n\n}\n```\n\nIn addition, you can specify the `server` fields for your self-hosted instance of GitLab, by default [https://gitlab.com](https://gitlab.com) is assumed as the server.\n\nTo generate a `personal-access-token` for the GitLab platform follow the instructions provided [here](https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html). Please ensure that the token has at least `read_repository`, `read_api` permissions.\n\nOnce the configuration is saved, you can test the integration by pulling the repository code using the `-hub` option.\n\n```\nnextflow pull https://gitlab.com/user_name/private_repo -hub mygitlab\n```\n\n## Gitea\n\n[Gitea server](https://gitea.com/) is an open source Git-hosting solution that can be self-hosted. If you have your Nextflow code in an open Gitea repository, there is no need to specify credentials to pull code from this repository. 
However, if you wish to interact with a private repository, you can give elevated access to Nextflow by specifying your credentials in the `scm` file.\n\nFor example, if you'd like to call your hosted Gitea server `mygiteaserver`, then you'll need to add the following snippet in your `~/$HOME/.nextflow/scm` file.\n\n```groovy\nproviders {\n\n mygiteaserver {\n platform = 'gitea'\n server = 'https://gitea.host.com'\n user = 'me'\n password = 'my-password'\n }\n\n}\n```\n\nTo generate a _personal access token_ for your Gitea server, please refer to the [official guide](https://docs.gitea.io/en-us/api-usage/).\n\nOnce the configuration is set, you can test the integration by pulling the repository code and specifying `mygiteaserver` as the Git provider using the `-hub` option.\n\n```\nnextflow pull https://git.host.com/user_name/private_repo -hub mygiteaserver\n```\n\n## Azure Repos\n\n[Azure Repos](https://azure.microsoft.com/en-us/services/devops/repos/) is a part of Microsoft Azure Cloud Suite. Nextflow integrates natively Azure Repos via the usual `~/$HOME/.nextflow/scm` file.\n\nIf you'd like to use the `myazure` alias for the `azurerepos` provider, then you'll need to add the following snippet in your `~/$HOME/.nextflow/scm` file.\n\n```groovy\nproviders {\n\n myazure {\n server = 'https://dev.azure.com'\n platform = 'azurerepos'\n user = 'me'\n token = 'my-api-token'\n }\n\n}\n```\n\nTo generate a _personal access token_ for your Azure Repos integration, please refer to the [official guide](https://docs.microsoft.com/en-us/azure/devops/organizations/accounts/use-personal-access-tokens-to-authenticate?view=azure-devops&tabs=preview-page) on Azure.\n\nOnce the configuration is set, you can test the integration by pulling the repository code and specifying `myazure` as the Git provider using the `-hub` option.\n\n```\nnextflow pull https://dev.azure.com/org_name/DefaultCollection/_git/repo_name -hub myazure\n```\n\n## Conclusion\n\nGit is a popular, widely used software system for source code management. The native integration of Nextflow with various Git hosting solutions is an important feature to facilitate reproducible workflows that enable collaborative development and deployment of Nextflow pipelines.\n\nStay tuned for more integrations as we continue to improve our support for various source code management solutions!\n",
+    "content": "Git has become the de-facto standard for source-code version control and has seen increasing adoption across the spectrum of software development.\n\nNextflow provides builtin support for Git and the most popular Git hosting platforms, such\nas GitHub, GitLab and Bitbucket, among others. This streamlines managing versions\nand tracking changes in your pipeline projects and facilitates collaboration across\ndifferent users.\n\nIn order to access public repositories Nextflow does not require any special configuration, just use the _http_ URL of the pipeline project you want to run\nin the run command, for example:\n\n```\nnextflow run https://github.com/nextflow-io/hello\n```\n\nHowever, to allow Nextflow to access private repositories you will need to specify\nthe repository credentials, and the server hostname in the case of self-managed\nGit server installations.\n\n## Configure access to private repositories\n\nThis is done through a file named `scm` placed in the `$HOME/.nextflow/` directory, containing the credentials and other details for accessing a particular Git hosting solution. 
You can refer to the Nextflow documentation for all the [SCM configuration file](https://www.nextflow.io/docs/edge/sharing.html) options.\n\nAll of these platforms have their own authentication mechanisms for Git operations which are captured in the `$HOME/.nextflow/scm` file with the following syntax:\n\n```groovy\nproviders {\n\n '' {\n user = value\n password = value\n ...\n }\n\n '' {\n user = value\n password = value\n ...\n }\n\n}\n```\n\nNote: Make sure to enclose the provider name with `'` if it contains a `-` or a\nblank character.\n\nAs of the 21.09.0-edge release, Nextflow integrates with the following Git providers:\n\n## GitHub\n\n[GitHub](https://github.com) is one of the most well known Git providers and is home to some of the most popular open-source Nextflow pipelines from the [nf-core](https://github.com/nf-core/) community project.\n\nIf you wish to use Nextflow code from a **public** repository hosted on GitHub.com, then you don't need to provide credentials (`user` and `password`) to pull code from the repository. However, if you wish to interact with a private repository or are running into GitHub API rate limits for public repos, then you must provide elevated access to Nextflow by specifying your credentials in the `scm` file.\n\nIt is worth noting that [GitHub recently phased out Git password authentication](https://github.blog/2020-12-15-token-authentication-requirements-for-git-operations/#what-you-need-to-do-today) and now requires that users supply a more secure GitHub-generated _Personal Access Token_ for authentication. With Nextflow, you can specify your _personal access token_ in the `password` field.\n\n```groovy\nproviders {\n\n github {\n user = 'me'\n password = 'my-personal-access-token'\n }\n\n}\n```\n\nTo generate a `personal-access-token` for the GitHub platform, follow the instructions provided [here](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token). Ensure that the token has at a minimum all the permissions in the `repo` scope.\n\nOnce you have provided your username and _personal access token_, as shown above, you can test the integration by pulling the repository code.\n\n```\nnextflow pull https://github.com/user_name/private_repo\n```\n\n## Bitbucket Cloud\n\n[Bitbucket](https://bitbucket.org/) is a publicly accessible Git solution hosted by Atlassian. Please note that if you are using an on-premises Bitbucket installation, you should follow the instructions for _Bitbucket Server_ in the following section.\n\nIf your Nextflow code is in a public Bitbucket repository, then you don't need to specify your credentials to pull code from the repository. However, if you wish to interact with a private repository, you need to provide elevated access to Nextflow by specifying your credentials in the `scm` file.\n\nPlease note that Bitbucket Cloud requires your `app password` in the `password` field, which is different from your login password.\n\n```groovy\nproviders {\n\n bitbucket {\n user = 'me'\n password = 'my-app-password'\n }\n\n}\n```\n\nTo generate an `app password` for the Bitbucket platform, follow the instructions provided [here](https://support.atlassian.com/bitbucket-cloud/docs/app-passwords/). 
Ensure that the token has at least `Repositories: Read` permission.\n\nOnce these settings are saved in `$HOME/.nextflow/scm`, you can test the integration by pulling the repository code.\n\n```\nnextflow pull https://bitbucket.org/user_name/private_repo\n```\n\n## Bitbucket Server\n\n[Bitbucket Server](https://www.atlassian.com/software/bitbucket/enterprise) is a Git hosting solution from Atlassian which is meant for teams that require a self-managed solution. If Nextflow code resides in an open Bitbucket repository, then you don't need to provide credentials to pull code from this repository. However, if you wish to interact with a private repository, you need to give elevated access to Nextflow by specifying your credentials in the `scm` file.\n\nFor example, if you'd like to call your hosted Bitbucket server as `mybitbucketserver`, then you'll need to add the following snippet in your `~/$HOME/.nextflow/scm` file.\n\n```groovy\nproviders {\n\n mybitbucketserver {\n platform = 'bitbucketserver'\n server = 'https://your.bitbucket.host.com'\n user = 'me'\n password = 'my-password' // OR \"my-token\"\n }\n\n}\n```\n\nTo generate a _personal access token_ for Bitbucket Server, refer to the [Bitbucket Support documentation](https://confluence.atlassian.com/bitbucketserver/managing-personal-access-tokens-1005339986.html) from Atlassian.\n\nOnce the configuration is saved, you can test the integration by pulling code from a private repository and specifying the `mybitbucketserver` Git provider using the `-hub` option.\n\n```\nnextflow pull https://your.bitbucket.host.com/user_name/private_repo -hub mybitbucketserver\n```\n\nNOTE: It is worth noting that [Atlassian is phasing out the Server offering](https://www.atlassian.com/migration/assess/journey-to-cloud) in favor of cloud product [bitbucket.org](https://bitbucket.org).\n\n## GitLab\n\n[GitLab](https://gitlab.com) is a popular Git provider that offers features covering various aspects of the DevOps cycle.\n\nIf you wish to run a Nextflow pipeline from a public GitLab repository, there is no need to provide credentials to pull code. However, if you wish to interact with a private repository, then you must give elevated access to Nextflow by specifying your credentials in the `scm` file.\n\nPlease note that you need to specify your _personal access token_ in the `password` field.\n\n```groovy\nproviders {\n\n mygitlab {\n user = 'me'\n password = 'my-password' // or 'my-personal-access-token'\n token = 'my-personal-access-token'\n }\n\n}\n```\n\nIn addition, you can specify the `server` fields for your self-hosted instance of GitLab, by default [https://gitlab.com](https://gitlab.com) is assumed as the server.\n\nTo generate a `personal-access-token` for the GitLab platform follow the instructions provided [here](https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html). Please ensure that the token has at least `read_repository`, `read_api` permissions.\n\nOnce the configuration is saved, you can test the integration by pulling the repository code using the `-hub` option.\n\n```\nnextflow pull https://gitlab.com/user_name/private_repo -hub mygitlab\n```\n\n## Gitea\n\n[Gitea server](https://gitea.com/) is an open source Git-hosting solution that can be self-hosted. If you have your Nextflow code in an open Gitea repository, there is no need to specify credentials to pull code from this repository. 
However, if you wish to interact with a private repository, you can give elevated access to Nextflow by specifying your credentials in the `scm` file.\n\nFor example, if you'd like to call your hosted Gitea server `mygiteaserver`, then you'll need to add the following snippet in your `~/$HOME/.nextflow/scm` file.\n\n```groovy\nproviders {\n\n mygiteaserver {\n platform = 'gitea'\n server = 'https://gitea.host.com'\n user = 'me'\n password = 'my-password'\n }\n\n}\n```\n\nTo generate a _personal access token_ for your Gitea server, please refer to the [official guide](https://docs.gitea.io/en-us/api-usage/).\n\nOnce the configuration is set, you can test the integration by pulling the repository code and specifying `mygiteaserver` as the Git provider using the `-hub` option.\n\n```\nnextflow pull https://git.host.com/user_name/private_repo -hub mygiteaserver\n```\n\n## Azure Repos\n\n[Azure Repos](https://azure.microsoft.com/en-us/services/devops/repos/) is a part of Microsoft Azure Cloud Suite. Nextflow integrates natively Azure Repos via the usual `~/$HOME/.nextflow/scm` file.\n\nIf you'd like to use the `myazure` alias for the `azurerepos` provider, then you'll need to add the following snippet in your `~/$HOME/.nextflow/scm` file.\n\n```groovy\nproviders {\n\n myazure {\n server = 'https://dev.azure.com'\n platform = 'azurerepos'\n user = 'me'\n token = 'my-api-token'\n }\n\n}\n```\n\nTo generate a _personal access token_ for your Azure Repos integration, please refer to the [official guide](https://docs.microsoft.com/en-us/azure/devops/organizations/accounts/use-personal-access-tokens-to-authenticate?view=azure-devops&tabs=preview-page) on Azure.\n\nOnce the configuration is set, you can test the integration by pulling the repository code and specifying `myazure` as the Git provider using the `-hub` option.\n\n```\nnextflow pull https://dev.azure.com/org_name/DefaultCollection/_git/repo_name -hub myazure\n```\n\n## Conclusion\n\nGit is a popular, widely used software system for source code management. The native integration of Nextflow with various Git hosting solutions is an important feature to facilitate reproducible workflows that enable collaborative development and deployment of Nextflow pipelines.\n\nStay tuned for more integrations as we continue to improve our support for various source code management solutions!\n", "images": [], "author": "Abhinav Sharma", "tags": "git,github" @@ -365,7 +365,7 @@ "slug": "2021/introducing-nextflow-for-azure-batch", "title": "Introducing Nextflow for Azure Batch", "date": "2021-02-22T00:00:00.000Z", - "content": "\nWhen the Nextflow project was created, one of the main drivers was to enable reproducible data pipelines that could be deployed across a wide range of execution platforms with minimal effort as well as to empower users to scale their data analysis while facilitating the migration to the cloud.\n\nThroughout the years, the computing services provided by cloud vendors have evolved in a spectacular manner. Eight years ago, the model was focused on launching virtual machines in the cloud, then came containers and then the idea of serverless computing which changed everything again. However, the power of the Nextflow abstraction consists of hiding the complexity of the underlying platform. 
Through the concept of executors, emerging technologies and new platforms can be easily adapted with no changes required to user pipelines.\n\nWith this in mind, we could not be more excited to announce that over the past months we have been working with Microsoft to implement built-in support for [Azure Batch](https://azure.microsoft.com/en-us/services/batch/) into Nextflow. Today we are delighted to make it available to all users as a beta release.\n\n### How does it work\n\nAzure Batch is a cloud-based computing service that allows the execution of highly scalable, container based, workloads in the Azure cloud.\n\nThe support for Nextflow comes in the form of a plugin which implements a new executor, not surprisingly named `azurebatch`, which offloads the execution of the pipeline jobs to corresponding Azure Batch jobs.\n\nEach job run consists in practical terms of a container execution which ships the job dependencies and carries out the job computation. As usual, each job is assigned a unique working directory allocated into a [Azure Blob](https://azure.microsoft.com/en-us/services/storage/blobs/) container.\n\n### Let's get started!\n\nThe support for Azure Batch requires the latest release of Nextflow from the _edge_ channel (version 21.02-edge or later). If you don't have this, you can install it using these commands:\n\n```\nexport NXF_EDGE=1\ncurl get.nextflow.io | bash\n./nextflow -self-update\n```\n\nNote for Windows users, as Nextflow is \\*nix based tool you will need to run it using the [Windows subsystem for Linux](https://docs.microsoft.com/en-us/windows/wsl/install-win10). Also make sure Java 8 or later is installed in the Linux environment.\n\nOnce Nextflow is installed, to run your data pipelines with Azure Batch, you will need to create an Azure Batch account in the region of your choice using the Azure Portal. In a similar manner, you will need an Azure Blob container.\n\nWith the Azure Batch and Blob storage container configured, your `nextflow.config` file should be set up similar to the example below:\n\n```\nplugins {\n id 'nf-azure'\n}\n\nprocess {\n executor = 'azurebatch'\n}\n\nazure {\n batch {\n location = 'westeurope'\n accountName = ''\n accountKey = ''\n autoPoolMode = true\n }\n storage {\n accountName = \"\"\n accountKey = \"\"\n }\n}\n```\n\nUsing this configuration snippet, Nextflow will automatically create the virtual machine pool(s) required to deploy the pipeline execution in the Azure Batch service.\n\nNow you will be able to launch the pipeline execution using the following command:\n\n```\nnextflow run -w az://my-container/work\n```\n\nReplace `` with a pipeline name e.g. nextflow-io/rnaseq-nf and `my-container` with a blob container in the storage account as defined in the above configuration.\n\nFor more details regarding the Nextflow configuration setting for Azure Batch\nrefers to the Nextflow documentation at [this link](/docs/edge/azure.html).\n\n### Conclusion\n\nThe support for Azure Batch further expands the wide range of computing platforms supported by Nextflow and empowers Nextflow users to deploy their data pipelines in the cloud provider of their choice. 
Above all, it allows researchers to scale, collaborate and share their work without being locked into a specific platform.\n\nWe thank Microsoft, and in particular [Jer-Ming Chia](https://www.linkedin.com/in/jermingchia/) who works in the HPC and AI team for having supported and sponsored this open source contribution to the Nextflow framework.\n", + "content": "When the Nextflow project was created, one of the main drivers was to enable reproducible data pipelines that could be deployed across a wide range of execution platforms with minimal effort as well as to empower users to scale their data analysis while facilitating the migration to the cloud.\n\nThroughout the years, the computing services provided by cloud vendors have evolved in a spectacular manner. Eight years ago, the model was focused on launching virtual machines in the cloud, then came containers and then the idea of serverless computing which changed everything again. However, the power of the Nextflow abstraction consists of hiding the complexity of the underlying platform. Through the concept of executors, emerging technologies and new platforms can be easily adapted with no changes required to user pipelines.\n\nWith this in mind, we could not be more excited to announce that over the past months we have been working with Microsoft to implement built-in support for [Azure Batch](https://azure.microsoft.com/en-us/services/batch/) into Nextflow. Today we are delighted to make it available to all users as a beta release.\n\n### How does it work\n\nAzure Batch is a cloud-based computing service that allows the execution of highly scalable, container-based workloads in the Azure cloud.\n\nThe support for Nextflow comes in the form of a plugin which implements a new executor, not surprisingly named `azurebatch`, which offloads the execution of the pipeline jobs to corresponding Azure Batch jobs.\n\nIn practical terms, each job run consists of a container execution which ships the job dependencies and carries out the job computation. As usual, each job is assigned a unique working directory allocated in an [Azure Blob](https://azure.microsoft.com/en-us/services/storage/blobs/) container.\n\n### Let's get started!\n\nThe support for Azure Batch requires the latest release of Nextflow from the _edge_ channel (version 21.02-edge or later). If you don't have this, you can install it using these commands:\n\n```\nexport NXF_EDGE=1\ncurl get.nextflow.io | bash\n./nextflow -self-update\n```\n\nNote for Windows users: as Nextflow is a \\*nix-based tool, you will need to run it using the [Windows subsystem for Linux](https://docs.microsoft.com/en-us/windows/wsl/install-win10). Also make sure Java 8 or later is installed in the Linux environment.\n\nOnce Nextflow is installed, to run your data pipelines with Azure Batch, you will need to create an Azure Batch account in the region of your choice using the Azure Portal.
In a similar manner, you will need an Azure Blob container.\n\nWith the Azure Batch and Blob storage container configured, your `nextflow.config` file should be set up similar to the example below:\n\n```\nplugins {\n id 'nf-azure'\n}\n\nprocess {\n executor = 'azurebatch'\n}\n\nazure {\n batch {\n location = 'westeurope'\n accountName = ''\n accountKey = ''\n autoPoolMode = true\n }\n storage {\n accountName = \"\"\n accountKey = \"\"\n }\n}\n```\n\nUsing this configuration snippet, Nextflow will automatically create the virtual machine pool(s) required to deploy the pipeline execution in the Azure Batch service.\n\nNow you will be able to launch the pipeline execution using the following command:\n\n```\nnextflow run -w az://my-container/work\n```\n\nReplace `` with a pipeline name e.g. nextflow-io/rnaseq-nf and `my-container` with a blob container in the storage account as defined in the above configuration.\n\nFor more details regarding the Nextflow configuration setting for Azure Batch\nrefers to the Nextflow documentation at [this link](/docs/edge/azure.html).\n\n### Conclusion\n\nThe support for Azure Batch further expands the wide range of computing platforms supported by Nextflow and empowers Nextflow users to deploy their data pipelines in the cloud provider of their choice. Above all, it allows researchers to scale, collaborate and share their work without being locked into a specific platform.\n\nWe thank Microsoft, and in particular [Jer-Ming Chia](https://www.linkedin.com/in/jermingchia/) who works in the HPC and AI team for having supported and sponsored this open source contribution to the Nextflow framework.\n", "images": [], "author": "Paolo Di Tommaso", "tags": "nextflow,azure" @@ -374,7 +374,7 @@ "slug": "2021/nextflow-developer-environment", "title": "6 Tips for Setting Up Your Nextflow Dev Environment", "date": "2021-03-04T00:00:00.000Z", - "content": "\n_This blog follows up the Learning Nextflow in 2020 blog [post](https://www.nextflow.io/blog/2020/learning-nextflow-in-2020.html)._\n\nThis guide is designed to walk you through a basic development setup for writing Nextflow pipelines.\n\n### 1. Installation\n\nNextflow runs on any Linux compatible system and MacOS with Java installed. Windows users can rely on the [Windows Subsystem for Linux](https://docs.microsoft.com/en-us/windows/wsl/install-win10). Installing Nextflow is straightforward. You just need to download the `nextflow` executable. In your terminal type the following commands:\n\n```\n$ curl get.nextflow.io | bash\n$ sudo mv nextflow /usr/local/bin\n```\n\nThe first line uses the curl command to download the nextflow executable, and the second line moves the executable to your PATH. Note `/usr/local/bin` is the default for MacOS, you might want to choose `~/bin` or `/usr/bin` depending on your PATH definition and operating system.\n\n### 2. Text Editor or IDE?\n\nNextflow pipelines can be written in any plain text editor. I'm personally a bit of a Vim fan, however, the advent of the modern IDE provides a more immersive development experience.\n\nMy current choice is Visual Studio Code which provides a wealth of add-ons, the most obvious of these being syntax highlighting. 
With [VSCode installed](https://code.visualstudio.com/download), you can search for the Nextflow extension in the marketplace.\n\n![VSCode with Nextflow Syntax Highlighting](/img/vscode-nf-highlighting.png)\n\nOther syntax highlighting has been made available by the community including:\n\n- [Atom](https://atom.io/packages/language-nextflow)\n- [Vim](https://github.com/LukeGoodsell/nextflow-vim)\n- [Emacs](https://github.com/Emiller88/nextflow-mode)\n\n### 3. The Nextflow REPL console\n\nThe Nextflow console is a REPL (read-eval-print loop) environment that allows one to quickly test part of a script or segments of Nextflow code in an interactive manner. This can be particularly useful to quickly evaluate channels and operators behaviour and prototype small snippets that can be included in your pipeline scripts.\n\nStart the Nextflow console with the following command:\n\n```\n$ nextflow console\n```\n\n![Nextflow REPL console](/img/nf-repl-console.png)\n\nUse the `CTRL+R` keyboard shortcut to run (`⌘+R`on the Mac) and to evaluate your code. You can also evaluate by selecting code and use the **Run selection**.\n\n### 4. Containerize all the things\n\nContainers are a key component of developing scalable and reproducible pipelines. We can build Docker images that contain an OS, all libraries and the software we need for each process. Pipelines are typically developed using Docker containers and tooling as these can then be used on many different container engines such as Singularity and Podman.\n\nOnce you have [downloaded and installed Docker](https://docs.docker.com/engine/install/), try pull a public docker image:\n\n```\n$ docker pull quay.io/nextflow/rnaseq-nf\n```\n\nTo run a Nextflow pipeline using the latest tag of the image, we can use:\n\n```\nnextflow run nextflow-io/rnaseq-nf -with-docker quay.io/nextflow/rnaseq-nf:latest\n```\n\nTo learn more about building Docker containers, see the [Seqera Labs tutorial](https://seqera.io/training/#_manage_dependencies_containers) on managing dependencies with containers.\n\nAdditionally, you can install the VSCode marketplace addon for Docker to manage and interactively run and test the containers and images on your machine. You can even connect to remote registries such as Dockerhub, Quay.io, AWS ECR, Google Cloud and Azure Container registries.\n\n![VSCode with Docker Extension](/img/vs-code-with-docker-extension.png)\n\n### 5. Use Tower to monitor your pipelines\n\nWhen developing real-world pipelines, it can become inevitable that pipelines will require significant resources. For long-running workflows, monitoring becomes all the more crucial. With [Nextflow Tower](https://tower.nf), we can invoke any Nextflow pipeline execution from the CLI and use the integrated dashboard to follow the workflow run.\n\nSign-in to Tower using your GitHub credentials, obtain your token from the Getting Started page and export them into your terminal, `~/.bashrc`, or include them in your nextflow.config.\n\n```\n$ export TOWER_ACCESS_TOKEN=my-secret-tower-key\n```\n\nWe can then add the `-with-tower` child-option to any Nextflow run command. A URL with the monitoring dashboard will appear.\n\n```\n$ nextflow run nextflow-io/rnaseq-nf -with-tower\n```\n\n### 6. nf-core tools\n\n[nf-core](https://nf-co.re/) is a community effort to collect a curated set of analysis pipelines built using Nextflow. The pipelines continue to come on in leaps and bounds and nf-core tools is a python package for helping with developing nf-core pipelines. 
It includes options for listing, creating, and even downloading pipelines for offline usage.\n\nThese tools are particularly useful for developers contributing to the community pipelines on [GitHub](https://github.com/nf-core/) with linting and syncing options that keep pipelines up-to-date against nf-core guidelines.\n\n`nf-core tools` is a python package that can be installed in your development environment from Bioconda or PyPi.\n\n```\n$ conda install nf-core\n```\n\nor\n\n```\n$ pip install nf-core\n```\n\n![nf-core tools](/img/nf-core-tools.png)\n\n### Conclusion\n\nDeveloper workspaces are evolving rapidly. While your own development environment may be highly dependent on personal preferences, community contributions are keeping Nextflow users at the forefront of the modern developer experience.\n\nSolutions such as [GitHub Codespaces](https://github.com/features/codespaces) and [Gitpod](https://www.gitpod.io/) are now offering extendible, cloud-based options that may well be the future. I’m sure we can all look forward to a one-click, pre-configured, cloud-based, Nextflow developer environment sometime soon!\n", + "content": "_This blog follows up the Learning Nextflow in 2020 blog [post](https://www.nextflow.io/blog/2020/learning-nextflow-in-2020.html)._\n\nThis guide is designed to walk you through a basic development setup for writing Nextflow pipelines.\n\n### 1. Installation\n\nNextflow runs on any Linux compatible system and MacOS with Java installed. Windows users can rely on the [Windows Subsystem for Linux](https://docs.microsoft.com/en-us/windows/wsl/install-win10). Installing Nextflow is straightforward. You just need to download the `nextflow` executable. In your terminal type the following commands:\n\n```\n$ curl get.nextflow.io | bash\n$ sudo mv nextflow /usr/local/bin\n```\n\nThe first line uses the curl command to download the nextflow executable, and the second line moves the executable to your PATH. Note `/usr/local/bin` is the default for MacOS, you might want to choose `~/bin` or `/usr/bin` depending on your PATH definition and operating system.\n\n### 2. Text Editor or IDE?\n\nNextflow pipelines can be written in any plain text editor. I'm personally a bit of a Vim fan, however, the advent of the modern IDE provides a more immersive development experience.\n\nMy current choice is Visual Studio Code which provides a wealth of add-ons, the most obvious of these being syntax highlighting. With [VSCode installed](https://code.visualstudio.com/download), you can search for the Nextflow extension in the marketplace.\n\n![VSCode with Nextflow Syntax Highlighting](/img/vscode-nf-highlighting.png)\n\nOther syntax highlighting has been made available by the community including:\n\n- [Atom](https://atom.io/packages/language-nextflow)\n- [Vim](https://github.com/LukeGoodsell/nextflow-vim)\n- [Emacs](https://github.com/Emiller88/nextflow-mode)\n\n### 3. The Nextflow REPL console\n\nThe Nextflow console is a REPL (read-eval-print loop) environment that allows one to quickly test part of a script or segments of Nextflow code in an interactive manner. This can be particularly useful to quickly evaluate channels and operators behaviour and prototype small snippets that can be included in your pipeline scripts.\n\nStart the Nextflow console with the following command:\n\n```\n$ nextflow console\n```\n\n![Nextflow REPL console](/img/nf-repl-console.png)\n\nUse the `CTRL+R` keyboard shortcut to run (`⌘+R`on the Mac) and to evaluate your code. 
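For example, a minimal snippet such as the following (an illustrative sketch, not an excerpt from an existing pipeline) can be pasted into the console to prototype channel and operator behaviour before moving it into a script:\n\n```\n// square each value emitted by the channel and print the results\nChannel.of(1, 2, 3)\n    .map { it * it }\n    .view()\n```\n\n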
You can also evaluate by selecting code and use the **Run selection**.\n\n### 4. Containerize all the things\n\nContainers are a key component of developing scalable and reproducible pipelines. We can build Docker images that contain an OS, all libraries and the software we need for each process. Pipelines are typically developed using Docker containers and tooling as these can then be used on many different container engines such as Singularity and Podman.\n\nOnce you have [downloaded and installed Docker](https://docs.docker.com/engine/install/), try pull a public docker image:\n\n```\n$ docker pull quay.io/nextflow/rnaseq-nf\n```\n\nTo run a Nextflow pipeline using the latest tag of the image, we can use:\n\n```\nnextflow run nextflow-io/rnaseq-nf -with-docker quay.io/nextflow/rnaseq-nf:latest\n```\n\nTo learn more about building Docker containers, see the [Seqera Labs tutorial](https://seqera.io/training/#_manage_dependencies_containers) on managing dependencies with containers.\n\nAdditionally, you can install the VSCode marketplace addon for Docker to manage and interactively run and test the containers and images on your machine. You can even connect to remote registries such as Dockerhub, Quay.io, AWS ECR, Google Cloud and Azure Container registries.\n\n![VSCode with Docker Extension](/img/vs-code-with-docker-extension.png)\n\n### 5. Use Tower to monitor your pipelines\n\nWhen developing real-world pipelines, it can become inevitable that pipelines will require significant resources. For long-running workflows, monitoring becomes all the more crucial. With [Nextflow Tower](https://tower.nf), we can invoke any Nextflow pipeline execution from the CLI and use the integrated dashboard to follow the workflow run.\n\nSign-in to Tower using your GitHub credentials, obtain your token from the Getting Started page and export them into your terminal, `~/.bashrc`, or include them in your nextflow.config.\n\n```\n$ export TOWER_ACCESS_TOKEN=my-secret-tower-key\n```\n\nWe can then add the `-with-tower` child-option to any Nextflow run command. A URL with the monitoring dashboard will appear.\n\n```\n$ nextflow run nextflow-io/rnaseq-nf -with-tower\n```\n\n### 6. nf-core tools\n\n[nf-core](https://nf-co.re/) is a community effort to collect a curated set of analysis pipelines built using Nextflow. The pipelines continue to come on in leaps and bounds and nf-core tools is a python package for helping with developing nf-core pipelines. It includes options for listing, creating, and even downloading pipelines for offline usage.\n\nThese tools are particularly useful for developers contributing to the community pipelines on [GitHub](https://github.com/nf-core/) with linting and syncing options that keep pipelines up-to-date against nf-core guidelines.\n\n`nf-core tools` is a python package that can be installed in your development environment from Bioconda or PyPi.\n\n```\n$ conda install nf-core\n```\n\nor\n\n```\n$ pip install nf-core\n```\n\n![nf-core tools](/img/nf-core-tools.png)\n\n### Conclusion\n\nDeveloper workspaces are evolving rapidly. While your own development environment may be highly dependent on personal preferences, community contributions are keeping Nextflow users at the forefront of the modern developer experience.\n\nSolutions such as [GitHub Codespaces](https://github.com/features/codespaces) and [Gitpod](https://www.gitpod.io/) are now offering extendible, cloud-based options that may well be the future. 
I’m sure we can all look forward to a one-click, pre-configured, cloud-based, Nextflow developer environment sometime soon!", "images": [], "author": "Evan Floden", "tags": "nextflow,development,learning" @@ -383,7 +383,7 @@ "slug": "2021/nextflow-sql-support", "title": "Introducing Nextflow support for SQL databases", "date": "2021-09-16T00:00:00.000Z", - "content": "\nThe recent tweet introducing the [Nextflow support for SQL databases](https://twitter.com/PaoloDiTommaso/status/1433120149888974854) raised a lot of positive reaction. In this post, I want to describe more in detail how this extension works.\n\nNextflow was designed with the idea to streamline the deployment of complex data pipelines in a scalable, portable and reproducible manner across different computing platforms. To make this all possible, it was decided the resulting pipeline and the runtime should be self-contained i.e. to not depend on separate services such as database servers.\n\nThis makes the resulting pipelines easier to configure, deploy, and allows for testing them using [CI services](https://en.wikipedia.org/wiki/Continuous_integration), which is a critical best practice for delivering high-quality and stable software.\n\nAnother important consequence is that Nextflow pipelines do not retain the pipeline state on separate storage. Said in a different way, the idea was - and still is - to promote stateless pipeline execution in which the computed results are only determined by the pipeline inputs and the code itself, which is consistent with the _functional_ dataflow paradigm on which Nextflow is based.\n\nHowever, the ability to access SQL data sources can be very useful in data pipelines, for example, to ingest input metadata or to store task executions logs.\n\n### How does it work?\n\nThe support for SQL databases in Nextflow is implemented as an optional plugin component. This plugin provides two new operations into your Nextflow script:\n\n1. `fromQuery` performs a SQL query against the specified database and returns a Nextflow channel emitting them. This channel can be used in your pipeline as any other Nextflow channel to trigger the process execution with the corresponding values.\n2. `sqlInsert` takes the values emitted by a Nextflow channel and inserts them into a database table.\n\nThe plugin supports out-of-the-box popular database servers such as MySQL, PostgreSQL and MariaDB. It should be noted that the technology is based on the Java JDBC database standard, therefore it could easily support any database technology implementing a driver for this standard interface.\n\nDisclaimer: This plugin is a preview technology. Some features, syntax and configuration settings can change in future releases.\n\n### Let's get started!\n\nThe use of the SQL plugin requires the use of Nextflow 21.08.0-edge or later. If are using an older version, check [this page](https://www.nextflow.io/docs/latest/getstarted.html#stable-edge-releases) on how to update to the latest edge release.\n\nTo enable the use of the database plugin, add the following snippet in your pipeline configuration file.\n\n```\nplugins {\n id 'nf-sqldb@0.1.0'\n}\n```\n\nIt is then required to specify the connection _coordinates_ of the database service you want to connect to in your pipeline. 
This is done by adding a snippet similar to the following in your configuration file:\n\n```\nsql {\n db {\n 'my-db' {\n url = 'jdbc:mysql://localhost:3306/demo'\n user = 'my-user'\n password = 'my-password'\n }\n }\n}\n```\n\nIn the above example, replace `my-db` with a name of your choice (this name will be used in the script to reference the corresponding database connection coordinates). Also, provide a `url`, `user` and `password` matching your database server.\n\nYour script should then look like the following:\n\n```\nnextflow.enable.dsl=2\n\nprocess myProcess {\n input:\n tuple val(sample_id), path(sample_in)\n output:\n tuple val(sample_id), path('sample.out')\n\n \"\"\"\n your_command --input $sample_id > sample.out\n \"\"\"\n}\n\nworkflow {\n\n query = 'select SAMPLE_ID, SAMPLE_FILE from SAMPLES'\n channel.sql.fromQuery(query, db: 'my-db') \\\n | myProcess \\\n | sqlInsert(table: 'RESULTS', db: 'my-db')\n\n}\n```\n\nThe above example shows how to perform a simple database query, pipe the results to a fictitious process named `myProcess` and finally store the process outputs into a database table named `RESULTS`.\n\nIt is worth noting that Nextflow allows the use of any number of database instances in your pipeline, simply defining them in the configuration file using the syntax shown above. This could be useful to fetch database data from one data source and store the results into a different one.\n\nAlso, this makes it straightforward to write [ETL](https://en.wikipedia.org/wiki/Extract,_transform,_load) scripts that span across multiple data sources.\n\nFind more details about the SQL plugin for Nextflow at [this link](https://github.com/nextflow-io/nf-sqldb).\n\n## What about the self-contained property?\n\nYou may wonder if adding this capability breaks the self-contained property of Nextflow pipelines which allows them to be run in a single command and to be tested with continuous integration services e.g. GitHub Action.\n\nThe good news is that it does not ... or at least it should not if used properly.\n\nIn fact, the SQL plugin includes the [H2](http://www.h2database.com/html/features.html) embedded in-memory SQL database that is used by default when no other database is provided in the Nextflow configuration file and can be used for developing and testing your pipeline without the need for a separate database service.\n\nTip: Other than this, H2 also provides the capability to access and query CSV/TSV files as SQL tables. Read more about this feature at [this link](http://www.h2database.com/html/tutorial.html?highlight=csv&search=csv#csv).\n\n### Conclusion\n\nThe use of this plugin adds to Nextflow the capability to query and store data into the SQL databases. Currently, the most popular SQL technologies are supported such as MySQL, PostgreSQL and MariaDB. In the future, support for other database technologies e.g. MongoDB, DynamoDB could be added.\n\nNotably, the support for SQL data-stores has been implemented preserving the core Nextflow capabilities to allow portable and self-contained pipeline scripts that can be developed locally, tested through CI services, and deployed at scale into production environments.\n\nIf you have any questions or suggestions, please feel free to comment in the project discussion group at [this link](https://github.com/nextflow-io/nf-sqldb/discussions).\n\nCredits to [Francesco Strozzi](https://twitter.com/fstrozzi) & [Raoul J.P. 
Bonnal](https://twitter.com/bonnalr) for having contributed to this work 🙏.\n", + "content": "The recent tweet introducing the [Nextflow support for SQL databases](https://twitter.com/PaoloDiTommaso/status/1433120149888974854) raised a lot of positive reactions. In this post, I want to describe in more detail how this extension works.\n\nNextflow was designed with the idea to streamline the deployment of complex data pipelines in a scalable, portable and reproducible manner across different computing platforms. To make this all possible, it was decided that the resulting pipeline and the runtime should be self-contained, i.e. not depend on separate services such as database servers.\n\nThis makes the resulting pipelines easier to configure and deploy, and allows for testing them using [CI services](https://en.wikipedia.org/wiki/Continuous_integration), which is a critical best practice for delivering high-quality and stable software.\n\nAnother important consequence is that Nextflow pipelines do not retain the pipeline state on separate storage. Said in a different way, the idea was - and still is - to promote stateless pipeline execution in which the computed results are only determined by the pipeline inputs and the code itself, which is consistent with the _functional_ dataflow paradigm on which Nextflow is based.\n\nHowever, the ability to access SQL data sources can be very useful in data pipelines, for example, to ingest input metadata or to store task execution logs.\n\n### How does it work?\n\nThe support for SQL databases in Nextflow is implemented as an optional plugin component. This plugin provides two new operations in your Nextflow script:\n\n1. `fromQuery` performs a SQL query against the specified database and returns a Nextflow channel emitting the query results. This channel can be used in your pipeline as any other Nextflow channel to trigger the process execution with the corresponding values.\n2. `sqlInsert` takes the values emitted by a Nextflow channel and inserts them into a database table.\n\nThe plugin supports out-of-the-box popular database servers such as MySQL, PostgreSQL and MariaDB. It should be noted that the technology is based on the Java JDBC database standard, therefore it could easily support any database technology implementing a driver for this standard interface.\n\nDisclaimer: This plugin is a preview technology. Some features, syntax and configuration settings can change in future releases.\n\n### Let's get started!\n\nThe use of the SQL plugin requires Nextflow 21.08.0-edge or later. If you are using an older version, check [this page](https://www.nextflow.io/docs/latest/getstarted.html#stable-edge-releases) on how to update to the latest edge release.\n\nTo enable the use of the database plugin, add the following snippet in your pipeline configuration file.\n\n```\nplugins {\n id 'nf-sqldb@0.1.0'\n}\n```\n\nIt is then required to specify the connection _coordinates_ of the database service you want to connect to in your pipeline. This is done by adding a snippet similar to the following in your configuration file:\n\n```\nsql {\n db {\n 'my-db' {\n url = 'jdbc:mysql://localhost:3306/demo'\n user = 'my-user'\n password = 'my-password'\n }\n }\n}\n```\n\nIn the above example, replace `my-db` with a name of your choice (this name will be used in the script to reference the corresponding database connection coordinates).
Also, provide a `url`, `user` and `password` matching your database server.\n\nYour script should then look like the following:\n\n```\nnextflow.enable.dsl=2\n\nprocess myProcess {\n input:\n tuple val(sample_id), path(sample_in)\n output:\n tuple val(sample_id), path('sample.out')\n\n \"\"\"\n your_command --input $sample_id > sample.out\n \"\"\"\n}\n\nworkflow {\n\n query = 'select SAMPLE_ID, SAMPLE_FILE from SAMPLES'\n channel.sql.fromQuery(query, db: 'my-db') \\\n | myProcess \\\n | sqlInsert(table: 'RESULTS', db: 'my-db')\n\n}\n```\n\nThe above example shows how to perform a simple database query, pipe the results to a fictitious process named `myProcess` and finally store the process outputs into a database table named `RESULTS`.\n\nIt is worth noting that Nextflow allows the use of any number of database instances in your pipeline, simply defining them in the configuration file using the syntax shown above. This could be useful to fetch database data from one data source and store the results into a different one.\n\nAlso, this makes it straightforward to write [ETL](https://en.wikipedia.org/wiki/Extract,_transform,_load) scripts that span across multiple data sources.\n\nFind more details about the SQL plugin for Nextflow at [this link](https://github.com/nextflow-io/nf-sqldb).\n\n## What about the self-contained property?\n\nYou may wonder if adding this capability breaks the self-contained property of Nextflow pipelines which allows them to be run in a single command and to be tested with continuous integration services e.g. GitHub Action.\n\nThe good news is that it does not ... or at least it should not if used properly.\n\nIn fact, the SQL plugin includes the [H2](http://www.h2database.com/html/features.html) embedded in-memory SQL database that is used by default when no other database is provided in the Nextflow configuration file and can be used for developing and testing your pipeline without the need for a separate database service.\n\nTip: Other than this, H2 also provides the capability to access and query CSV/TSV files as SQL tables. Read more about this feature at [this link](http://www.h2database.com/html/tutorial.html?highlight=csv&search=csv#csv).\n\n### Conclusion\n\nThe use of this plugin adds to Nextflow the capability to query and store data into the SQL databases. Currently, the most popular SQL technologies are supported such as MySQL, PostgreSQL and MariaDB. In the future, support for other database technologies e.g. MongoDB, DynamoDB could be added.\n\nNotably, the support for SQL data-stores has been implemented preserving the core Nextflow capabilities to allow portable and self-contained pipeline scripts that can be developed locally, tested through CI services, and deployed at scale into production environments.\n\nIf you have any questions or suggestions, please feel free to comment in the project discussion group at [this link](https://github.com/nextflow-io/nf-sqldb/discussions).\n\nCredits to [Francesco Strozzi](https://twitter.com/fstrozzi) & [Raoul J.P. Bonnal](https://twitter.com/bonnalr) for having contributed to this work 🙏.", "images": [], "author": "Paolo Di Tommaso", "tags": "nextflow,plugins,sql" @@ -392,7 +392,7 @@ "slug": "2021/setup-nextflow-on-windows", "title": "Setting up a Nextflow environment on Windows 10", "date": "2021-10-13T00:00:00.000Z", - "content": "\nFor Windows users, getting access to a Linux-based Nextflow development and runtime environment used to be hard. 
Users would need to run virtual machines, access separate physical servers or cloud instances, or install packages such as [Cygwin](http://www.cygwin.com/) or [Wubi](https://wiki.ubuntu.com/WubiGuide). Fortunately, there is now an easier way to deploy a complete Nextflow development environment on Windows.\n\nThe Windows Subsystem for Linux (WSL) allows users to build, manage and execute Nextflow pipelines on a Windows 10 laptop or desktop without needing a separate Linux machine or cloud VM. Users can build and test Nextflow pipelines and containerized workflows locally, on an HPC cluster, or their preferred cloud service, including AWS Batch and Azure Batch.\n\nThis document provides a step-by-step guide to setting up a Nextflow development environment on Windows 10.\n\n## High-level Steps\n\nThe steps described in this guide are as follows:\n\n- Install Windows PowerShell\n- Configure the Windows Subsystem for Linux (WSL2)\n- Obtain and Install a Linux distribution (on WSL2)\n- Install Windows Terminal\n- Install and configure Docker\n- Download and install an IDE (VS Code)\n- Install and test Nextflow\n- Configure X-Windows for use with the Nextflow Console\n- Install and Configure GIT\n\n## Install Windows PowerShell\n\nPowerShell is a cross-platform command-line shell and scripting language available for Windows, Linux, and macOS. If you are an experienced Windows user, you are probably already familiar with PowerShell. PowerShell is worth taking a few minutes to download and install.\n\nPowerShell is a big improvement over the Command Prompt in Windows 10. It brings features to Windows that Linux/UNIX users have come to expect, such as command-line history, tab completion, and pipeline functionality.\n\n- You can obtain PowerShell for Windows from GitHub at the URL https://github.com/PowerShell/PowerShell.\n- Download and install the latest stable version of PowerShell for Windows x64 - e.g., [powershell-7.1.3-win-x64.msi](https://github.com/PowerShell/PowerShell/releases/download/v7.1.3/PowerShell-7.1.3-win-x64.msi).\n- If you run into difficulties, Microsoft provides detailed instructions [here](https://docs.microsoft.com/en-us/powershell/scripting/install/installing-powershell-core-on-windows?view=powershell-7.1).\n\n## Configure the Windows Subsystem for Linux (WSL)\n\n### Enable the Windows Subsystem for Linux\n\nMake sure you are running Windows 10 Version 1903 with Build 18362 or higher. You can check your Windows version by select WIN-R (using the Windows key to run a command) and running the utility `winver`.\n\nFrom within PowerShell, run the Windows Deployment Image and Service Manager (DISM) tool as an administrator to enable the Windows Subsystem for Linux. To run PowerShell with administrator privileges, right-click on the PowerShell icon from the Start menu or desktop and select \"_Run as administrator_\".\n\n```powershell\nPS C:\\WINDOWS\\System32> dism.exe /online /enable-feature /featurename:Microsoft-Windows-Subsystem-Linux /all /norestart\nDeployment Image Servicing and Management tool\nVersion: 10.0.19041.844\nImage Version: 10.0.19041.1083\n\nEnabling feature(s)\n[==========================100.0%==========================]\nThe operation completed successfully.\n```\n\nYou can learn more about DISM [here](https://docs.microsoft.com/en-us/windows-hardware/manufacture/desktop/what-is-dism).\n\n### Step 2: Enable the Virtual Machine Feature\n\nWithin PowerShell, enable Virtual Machine Platform support using DISM. 
If you have trouble enabling this feature, make sure that virtual machine support is enabled in your machine's BIOS.\n\n```powershell\nPS C:\\WINDOWS\\System32> dism.exe /online /enable-feature /featurename:VirtualMachinePlatform /all /norestart\nDeployment Image Servicing and Management tool\nVersion: 10.0.19041.844\nImage Version: 10.0.19041.1083\nEnabling feature(s)\n[==========================100.0%==========================]\nThe operation completed successfully.\n```\n\nAfter enabling the Virtual Machine Platform support, **restart your machine**.\n\n### Step 3: Download the Linux Kernel Update Package\n\nNextflow users will want to take advantage of the latest features in WSL 2. You can learn about differences between WSL 1 and WSL 2 [here](https://docs.microsoft.com/en-us/windows/wsl/compare-versions). Before you can enable support for WSL 2, you'll need to download the kernel update package at the link below:\n\n[WSL2 Linux kernel update package for x64 machines](https://wslstorestorage.blob.core.windows.net/wslblob/wsl_update_x64.msi)\n\nOnce downloaded, double click on the kernel update package and select \"Yes\" to install it with elevated permissions.\n\n### STEP 4: Set WSL2 as your Default Version\n\nFrom within PowerShell:\n\n```powershell\nPS C:\\WINDOWS\\System32> wsl --set-default-version 2\nFor information on key differences with WSL 2 please visit https://aka.ms/wsl2\n```\n\nIf you run into difficulties with any of these steps, Microsoft provides detailed installation instructions [here](https://docs.microsoft.com/en-us/windows/wsl/install-win10#manual-installation-steps).\n\n## Obtain and Install a Linux Distribution on WSL\n\nIf you normally install Linux on VM environments such as VirtualBox or VMware, this probably sounds like a lot of work. Fortunately, Microsoft provides Linux OS distributions via the Microsoft Store that work with the Windows Subsystem for Linux.\n\n- Use this link to access and download a Linux Distribution for WSL through the Microsoft Store - https://aka.ms/wslstore.\n\n ![Linux Distributions at the Microsoft Store](/img/ms-store.png)\n\n- We selected the Ubuntu 20.04 LTS release. You can use a different distribution if you choose. Installation from the Microsoft Store is automated. Once the Linux distribution is installed, you can run a shell on Ubuntu (or your installed OS) from the Windows Start menu.\n- When you start Ubuntu Linux for the first time, you will be prompted to provide a UNIX username and password. The username that you select can be distinct from your Windows username. The UNIX user that you create will automatically have `sudo` privileges. Whenever a shell is started, it will default to this user.\n- After setting your username and password, update your packages on Ubuntu from the Linux shell using the following command:\n\n ```bash\n sudo apt update && sudo apt upgrade\n ```\n\n- This is also a good time to add any additional Linux packages that you will want to use.\n\n ```bash\n sudo apt install net-tools\n ```\n\n## Install Windows Terminal\n\nWhile not necessary, it is a good idea to install [Windows Terminal](https://github.com/microsoft/terminal) at this point. When working with Nextflow, it is handy to interact with multiple command lines at the same time. For example, users may want to execute flows, monitor logfiles, and run Docker commands in separate windows.\n\nWindows Terminal provides an X-Windows-like experience on Windows. 
It helps organize your various command-line environments - Linux shell, Windows Command Prompt, PowerShell, AWS or Azure CLIs.\n\n![Windows Terminal](/img/windows-terminal.png)\n\nInstructions for downloading and installing Windows Terminal are available at: https://docs.microsoft.com/en-us/windows/terminal/get-started.\n\nIt is worth spending a few minutes getting familiar with available commands and shortcuts in Windows Terminal. Documentation is available at https://docs.microsoft.com/en-us/windows/terminal/command-line-arguments.\n\nSome Windows Terminal commands you'll need right away are provided below:\n\n- Split the active window vertically: SHIFT ALT =\n- Split the active window horizontally: SHIFT ALT \n- Resize the active window: SHIFT ALT ``\n- Open a new window under the current tab: ALT v (_the new tab icon along the top of the Windows Terminal interface_)\n\n## Installing Docker on Windows\n\nThere are two ways to install Docker for use with the WSL on Windows. One method is to install Docker directly on a hosted WSL Linux instance (Ubuntu in our case) and have the docker daemon run on the Linux kernel as usual. An installation recipe for people that choose this \"native Linux\" approach is provided [here](https://dev.to/bowmanjd/install-docker-on-windows-wsl-without-docker-desktop-34m9).\n\nA second method is to run [Docker Desktop](https://www.docker.com/products/docker-desktop) on Windows. While Docker is more commonly used in Linux environments, it can be used with Windows also. The Docker Desktop supports containers running on Windows and Linux instances running under WSL. Docker Desktop provides some advantages for Windows users:\n\n- The installation process is automated\n- Docker Desktop provides a Windows GUI for managing Docker containers and images (including Linux containers running under WSL)\n- Microsoft provides Docker Desktop integration features from within Visual Studio Code via a VS Code extension\n- Docker Desktop provides support for auto-installing a single-node Kubernetes cluster\n- The Docker Desktop WSL 2 back-end provides an elegant Linux integration such that from a Linux user's perspective, Docker appears to be running natively on Linux.\n\nAn explanation of how the Docker Desktop WSL 2 Back-end works is provided [here](https://www.docker.com/blog/new-docker-desktop-wsl2-backend/).\n\n### Step 1: Install Docker Desktop on Windows\n\n- Download and install Docker Desktop for Windows from the following link: https://desktop.docker.com/win/stable/amd64/Docker%20Desktop%20Installer.exe\n- Follow the on-screen prompts provided by the Docker Desktop Installer. The installation process will install Docker on Windows and install the Docker back-end components so that Docker commands are accessible from within WSL.\n- After installation, Docker Desktop can be run from the Windows start menu. The Docker Desktop user interface is shown below. Note that Docker containers launched under WSL can be managed from the Windows Docker Desktop GUI or Linux command line.\n- The installation process is straightforward, but if you run into difficulties, detailed instructions are available [here](https://docs.docker.com/docker-for-windows/install/).\n\n ![Nextflow Visual Studio Code Extension](/img/docker-images.png)\n\n The Docker Engineering team provides an architecture diagram explaining how Docker on Windows interacts with WSL. 
Additional details are available [here](https://code.visualstudio.com/blogs/2020/03/02/docker-in-wsl2).\n\n ![Nextflow Visual Studio Code Extension](/img/docker-windows-arch.png)\n\n### Step 2: Verify the Docker installation\n\nNow that Docker is installed, run a Docker container to verify that Docker and the Docker Integration Package on WSL 2 are working properly.\n\n- Run a Docker command from the Linux shell as shown below below. This command downloads a **centos** image from Docker Hub and allows us to interact with the container via an assigned pseudo-tty. Your Docker container may exit with exit code 139 when you run this and other Docker containers. If so, don't worry – an easy fix to this issue is provided shortly.\n\n ```console\n $ docker run -ti centos:6\n [root@02ac0beb2d2c /]# hostname\n 02ac0beb2d2c\n ```\n\n- You can run Docker commands in other Linux shell windows via the Windows Terminal environment to monitor and manage Docker containers and images. For example, running `docker ps` in another window shows the running CentOS Docker container.\n\n ```console\n $ docker ps\n CONTAINER ID IMAGE COMMAND CREATED STATUS NAMES\n f5dad42617f1 centos:6 \"/bin/bash\" 2 minutes ago Up 2 minutes \thappy_hopper\n ```\n\n### Step 3: Dealing with exit code 139\n\nYou may encounter exit code `139` when running Docker containers. This is a known problem when running containers with specific base images within Docker Desktop. Good explanations of the problem and solution are provided [here](https://dev.to/damith/docker-desktop-container-crash-with-exit-code-139-on-windows-wsl-fix-438) and [here](https://unix.stackexchange.com/questions/478387/running-a-centos-docker-image-on-arch-linux-exits-with-code-139).\n\nThe solution is to add two lines to a `.wslconfig` file in your Windows home directory. The `.wslconfig` file specifies kernel options that apply to all Linux distributions running under WSL 2.\n\nSome of the Nextflow container images served from Docker Hub are affected by this bug since they have older base images, so it is a good idea to apply this fix.\n\n- Edit the `.wslconfig` file in your Windows home directory. You can do this using PowerShell as shown:\n\n ```powershell\n PS C:\\Users\\ notepad .wslconfig\n ```\n\n- Add these two lines to the `.wslconfig` file and save it:\n\n ```ini\n [wsl2]\n kernelCommandLine = vsyscall=emulate\n ```\n\n- After this, **restart your machine** to force a restart of the Docker and WSL 2 environment. After making this correction, you should be able to launch containers without seeing exit code `139`.\n\n## Install Visual Studio Code as your IDE (optional)\n\nDevelopers can choose from a variety of IDEs depending on their preferences. 
Some examples of IDEs and developer-friendly editors are below:\n\n- Visual Studio Code - https://code.visualstudio.com/Download (Nextflow VSCode Language plug-in [here](https://github.com/nextflow-io/vscode-language-nextflow/blob/master/vsc-extension-quickstart.md))\n- Eclipse - https://www.eclipse.org/\n- VIM - https://www.vim.org/ (VIM plug-in for Nextflow [here](https://github.com/LukeGoodsell/nextflow-vim))\n- Emacs - https://www.gnu.org/software/emacs/download.html (Nextflow syntax highlighter [here](https://github.com/Emiller88/nextflow-mode))\n- JetBrains PyCharm - https://www.jetbrains.com/pycharm/\n- IntelliJ IDEA - https://www.jetbrains.com/idea/\n- Atom – https://atom.io/ (Nextflow Atom support available [here](https://atom.io/packages/language-nextflow))\n- Notepad++ - https://notepad-plus-plus.org/\n\nWe decided to install Visual Studio Code because it has some nice features, including:\n\n- Support for source code control from within the IDE (Git)\n- Support for developing on Linux via its WSL 2 Video Studio Code Backend\n- A library of extensions including Docker and Kubernetes support and extensions for Nextflow, including Nextflow language support and an [extension pack for the nf-core community](https://github.com/nf-core/vscode-extensionpack).\n\nDownload Visual Studio Code from https://code.visualstudio.com/Download and follow the installation procedure. The installation process will detect that you are running WSL. You will be invited to download and install the Remote WSL extension.\n\n- Within VS Code and other Windows tools, you can access the Linux file system under WSL 2 by accessing the path `\\\\wsl$\\`. In our example, the path from Windows to access files from the root of our Ubuntu Linux instance is: [**\\\\wsl$\\Ubuntu-20.04**](file://wsl$/Ubuntu-20.04).\n\nNote that the reverse is possible also – from within Linux, `/mnt/c` maps to the Windows C: drive. You can inspect `/etc/mtab` to see the mounted file systems available under Linux.\n\n- It is a good idea to install Nextflow language support in VS Code. You can do this by selecting the Extensions icon from the left panel of the VS Code interface and searching the extensions library for Nextflow as shown. The Nextflow language support extension is on GitHub at https://github.com/nextflow-io/vscode-language-nextflow\n\n ![Nextflow Visual Studio Code Extension](/img/nf-vscode-ext.png)\n\n## Visual Studio Code Remote Development\n\nVisual Studio Code Remote Development supports development on remote environments such as containers or remote hosts. For Nextflow users, it is important to realize that VS Code sees the Ubuntu instance we installed on WSL as a remote environment. The Diagram below illustrates how remote development works. From a VS Code perspective, the Linux instance in WSL is considered a remote environment.\n\nWindows users work within VS Code in the Windows environment. However, source code, developer tools, and debuggers all run Linux on WSL, as illustrated below.\n\n![The Remote Development Environment in VS Code](/img/vscode-remote-dev.png)\n\nAn explanation of how VS Code Remote Development works is provided [here](https://code.visualstudio.com/docs/remote/remote-overview).\n\nVS Code users see the Windows filesystem, plug-ins specific to VS Code on Windows, and access Windows versions of tools such as Git. 
If you prefer to develop in Linux, you will want to select WSL as the remote environment.\n\nTo open a new VS Code Window running in the context of the WSL Ubuntu-20.04 environment, click the green icon at the lower left of the VS Code window and select _\"New WSL Window using Distro ..\"_ and select `Ubuntu 20.04`. You'll notice that the environment changes to show that you are working in the WSL: `Ubuntu-20.04` environment.\n\n![Selecting the Remote Dev Environment within VS Code](/img/remote-dev-side-by-side.png)\n\nSelecting the Extensions icon, you can see that different VS Code Marketplace extensions run in different contexts. The Nextflow Language extension installed in the previous step is globally available. It works when developing on Windows or developing on WSL: Ubuntu-20.04.\n\nThe Extensions tab in VS Code differentiates between locally installed plug-ins and those installed under WSL.\n\n![Local vs. Remote Extensions in VS Code](/img/vscode-extensions.png)\n\n## Installing Nextflow\n\nWith Linux, Docker, and an IDE installed, now we can install Nextflow in our WSL 2 hosted Linux environment. Detailed instructions for installing Nextflow are available at https://www.nextflow.io/docs/latest/getstarted.html#installation\n\n### Step 1: Make sure Java is installed (under WSL)\n\nJava is a prerequisite for running Nextflow. Instructions for installing Java on Ubuntu are available [here](https://linuxize.com/post/install-java-on-ubuntu-18-04/). To install the default OpenJDK, follow the instructions below in a Linux shell window:\n\n- Update the _apt_ package index:\n\n ```bash\n sudo apt update\n ```\n\n- Install the latest default OpenJDK package\n\n ```bash\n sudo apt install default-jdk\n ```\n\n- Verify the installation\n\n ```bash\n java -version\n ```\n\n### Step 2: Make sure curl is installed\n\n`curl` is a convenient way to obtain Nextflow. `curl` is included in the default Ubuntu repositories, so installation is straightforward.\n\n- From the shell:\n\n ```bash\n sudo apt update\n sudo apt install curl\n ```\n\n- Verify that `curl` works:\n\n ```console\n $ curl\n curl: try 'curl --help' or 'curl --manual' for more information\n ```\n\n### STEP 3: Download and install Nextflow\n\n- Use `curl` to retrieve Nextflow into a temporary directory and then install it in `/usr/bin` so that the Nextflow command is on your path:\n\n ```bash\n mkdir temp\n cd temp\n curl -s https://get.nextflow.io | bash\n sudo cp nextflow /usr/bin\n ```\n\n- Make sure that Nextflow is executable:\n\n ```bash\n sudo chmod 755 /usr/bin/nextflow\n ```\n\n or if you prefer:\n\n ```bash\n sudo chmod +x /usr/bin/nextflow\n ```\n\n### Step 4: Verify the Nextflow installation\n\n- Make sure Nextflow runs:\n\n ```console\n $ nextflow -version\n\n N E X T F L O W\n version 21.04.2 build 5558\n created 12-07-2021 07:54 UTC (03:54 EDT)\n cite doi:10.1038/nbt.3820\n http://nextflow.io\n ```\n\n- Run a simple Nextflow pipeline. The example below downloads and executes a sample hello world pipeline from GitHub - https://github.com/nextflow-io/hello.\n\n ```console\n $ nextflow run hello\n\n N E X T F L O W ~ version 21.04.2\n Launching `nextflow-io/hello` [distracted_pare] - revision: ec11eb0ec7 [master]\n executor > local (4)\n [06/c846d8] process > sayHello (3) [100%] 4 of 4 ✔\n Ciao world!\n\n Hola world!\n\n Bonjour world!\n\n Hello world!\n ```\n\n### Step 5: Run a Containerized Workflow\n\nTo validate that Nextflow works with containerized workflows, we can run a slightly more complicated example. 
A sample workflow involving NCBI Blast is available at https://github.com/nextflow-io/blast-example. Rather than installing Blast on our local Linux instance, it is much easier to pull a container preloaded with Blast and other software that the pipeline depends on.\n\nThe `nextflow.config` file for the Blast example (below) specifies that process logic is encapsulated in the container `nextflow/examples` available from Docker Hub (https://hub.docker.com/r/nextflow/examples).\n\n- On GitHub: [nextflow-io/blast-example/nextflow.config](https://github.com/nextflow-io/blast-example/blob/master/nextflow.config)\n\n ```groovy\n manifest {\n nextflowVersion = '>= 20.01.0'\n }\n\n process {\n container = 'nextflow/examples'\n }\n ```\n\n- Run the _blast-example_ pipeline that resides on GitHub directly from WSL and specify Docker as the container runtime using the command below:\n\n ```console\n $ nextflow run blast-example -with-docker\n N E X T F L O W ~ version 21.04.2\n Launching `nextflow-io/blast-example` [sharp_raman] - revision: 25922a0ae6 [master]\n executor > local (2)\n [aa/a9f056] process > blast (1) [100%] 1 of 1 ✔\n [b3/c41401] process > extract (1) [100%] 1 of 1 ✔\n matching sequences:\n >lcl|1ABO:B unnamed protein product\n MNDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNS\n >lcl|1ABO:A unnamed protein product\n MNDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNS\n >lcl|1YCS:B unnamed protein product\n PEITGQVSLPPGKRTNLRKTGSERIAHGMRVKFNPLPLALLLDSSLEGEFDLVQRIIYEVDDPSLPNDEGITALHNAVCA\n GHTEIVKFLVQFGVNVNAADSDGWTPLHCAASCNNVQVCKFLVESGAAVFAMTYSDMQTAADKCEEMEEGYTQCSQFLYG\n VQEKMGIMNKGVIYALWDYEPQNDDELPMKEGDCMTIIHREDEDEIEWWWARLNDKEGYVPRNLLGLYPRIKPRQRSLA\n >lcl|1IHD:C unnamed protein product\n LPNITILATGGTIAGGGDSATKSNYTVGKVGVENLVNAVPQLKDIANVKGEQVVNIGSQDMNDNVWLTLAKKINTDCDKT\n ```\n\n- Nextflow executes the pipeline directly from the GitHub repository and automatically pulls the nextflow/examples container from Docker Hub if the image is unavailable locally. The pipeline then executes the two containerized workflow steps (blast and extract). The pipeline then collects the sequences into a single file and prints the result file content when pipeline execution completes.\n\n## Configuring an XServer for the Nextflow Console\n\nPipeline developers will probably want to use the Nextflow Console at some point. The Nextflow Console's REPL (read-eval-print loop) environment allows developers to quickly test parts of scripts or Nextflow code segments interactively.\n\nThe Nextflow Console is launched from the Linux command line. However, the Groovy-based interface requires an X-Windows environment to run. You can set up X-Windows with WSL using the procedure below. A good article on this same topic is provided [here](https://medium.com/javarevisited/using-wsl-2-with-x-server-linux-on-windows-a372263533c3).\n\n- Download an X-Windows server for Windows. In this example, we use the _VcXsrv Windows X Server_ available from source forge at https://sourceforge.net/projects/vcxsrv/.\n\n- Accept all the defaults when running the automated installer. The X-server will end up installed in `c:\\Program Files\\VcXsrv`.\n\n- The automated installation of VcXsrv will create an _\"XLaunch\"_ shortcut on your desktop. 
It is a good idea to create your own shortcut with a customized command line so that you don't need to interact with the XLaunch interface every time you start the X-server.\n\n- Right-click on the Windows desktop to create a new shortcut, give it a meaningful name, and insert the following for the shortcut target:\n\n ```powershell\n \"C:\\Program Files\\VcXsrv\\vcxsrv.exe\" :0 -ac -terminate -lesspointer -multiwindow -clipboard -wgl -dpi auto\n ```\n\n- Inspecting the new shortcut properties, it should look something like this:\n\n ![X-Server (vcxsrc) Properties](/img/xserver.png)\n\n- Double-click on the new shortcut desktop icon to test it. Unfortunately, the X-server runs in the background. When running the X-server in multiwindow mode (which we recommend), it is not obvious whether the X-server is running.\n\n- One way to check that the X-server is running is to use the Microsoft Task Manager and look for the XcSrv process running in the background. You can also verify it is running by using the `netstat` command from with PowerShell on Windows to ensure that the X-server is up and listening on the appropriate ports. Using `netstat`, you should see output like the following:\n\n ```powershell\n PS C:\\WINDOWS\\system32> **netstat -abno | findstr 6000**\n TCP 0.0.0.0:6000 0.0.0.0:0 LISTENING 35176\n TCP 127.0.0.1:6000 127.0.0.1:56516 ESTABLISHED 35176\n TCP 127.0.0.1:6000 127.0.0.1:56517 ESTABLISHED 35176\n TCP 127.0.0.1:6000 127.0.0.1:56518 ESTABLISHED 35176\n TCP 127.0.0.1:56516 127.0.0.1:6000 ESTABLISHED 35176\n TCP 127.0.0.1:56517 127.0.0.1:6000 ESTABLISHED 35176\n TCP 127.0.0.1:56518 127.0.0.1:6000 ESTABLISHED 35176\n TCP 172.28.192.1:6000 172.28.197.205:46290 TIME_WAIT 0\n TCP [::]:6000 [::]:0 LISTENING 35176\n ```\n\n- At this point, the X-server is up and running and awaiting a connection from a client.\n\n- Within Ubuntu in WSL, we need to set up the environment to communicate with the X-Windows server. The shell variable DISPLAY needs to be set pointing to the IP address of the X-server and the instance of the X-windows server.\n\n- The shell script below will set the DISPLAY variable appropriately and export it to be available to X-Windows client applications launched from the shell. This scripting trick works because WSL sees the Windows host as the nameserver and this is the same IP address that is running the X-Server. You can echo the $DISPLAY variable after setting it to verify that it is set correctly.\n\n ```console\n $ export DISPLAY=$(cat /etc/resolv.conf | grep nameserver | awk '{print $2}'):0.0\n $ echo $DISPLAY\n 172.28.192.1:0.0\n ```\n\n- Add this command to the end of your `.bashrc` file in the Linux home directory to avoid needing to set the DISPLAY variable every time you open a new window. This way, if the IP address of the desktop or laptop changes, the DISPLAY variable will be updated accordingly.\n\n ```bash\n cd ~\n vi .bashrc\n ```\n\n ```bash\n # set the X-Windows display to connect to VcXsrv on Windows\n export DISPLAY=$(cat /etc/resolv.conf | grep nameserver | awk '{print $2}'):0.0\n \".bashrc\" 120L, 3912C written\n ```\n\n- Use an X-windows client to make sure that the X- server is working. 
Since X-windows clients are not installed by default, download an xterm client as follows via the Linux shell:\n\n ```bash\n sudo apt install xterm\n ```\n\n- Assuming that the X-server is up and running on Windows, and the Linux DISPLAY variable is set correctly, you're ready to test X-Windows.\n\n Before testing X-Windows, do yourself a favor and temporarily disable the Windows Firewall. The Windows Firewall will very likely block ports around 6000, preventing client requests on WSL from connecting to the X-server. You can find this under Firewall & network protection on Windows. Clicking the \"Private Network\" or \"Public Network\" options will show you the status of the Windows Firewall and indicate whether it is on or off.\n\n Depending on your installation, you may be running a specific Firewall. In this example, we temporarily disable the McAfee LiveSafe Firewall as shown:\n\n ![Ensure that the Firewall is not interfering](/img/firewall.png)\n\n- With the Firewall disabled, you can attempt to launch the xterm client from the Linux shell:\n\n ```bash\n xterm &\n ```\n\n- If everything is working correctly, you should see the new xterm client appear under Windows. The xterm is executing on Ubuntu under WSL but displays alongside other Windows on the Windows desktop. This is what is meant by \"multiwindow\" mode.\n\n ![Launch an xterm to verify functionality](/img/xterm.png)\n\n- Now that you know X-Windows is working correctly turn the Firewall back on, and adjust the settings to allow traffic to and from the required port. Ideally, you want to open only the minimal set of ports and services required. In the case of the McAfee Firewall, getting X-Windows to work required changing access to incoming and outgoing ports to _\"Open ports to Work and Home networks\"_ for the `vcxsrv.exe` program only as shown:\n\n ![Allowing access to XServer traffic](/img/xserver_setup.png)\n\n- With the X-server running, the `DISPLAY` variable set, and the Windows Firewall configured correctly, we can now launch the Nextflow Console from the shell as shown:\n\n ```bash\n nextflow console\n ```\n\n The command above opens the Nextflow REPL console under X-Windows.\n\n ![Nextflow REPL Console under X-Windows](/img/repl_console.png)\n\nInside the Nextflow console, you can enter Groovy code and run it interactively, a helpful feature when developing and debugging Nextflow pipelines.\n\n# Installing Git\n\nCollaborative source code management systems such as BitBucket, GitHub, and GitLab are used to develop and share Nextflow pipelines. To be productive with Nextflow, you will want to install Git.\n\nAs explained earlier, VS Code operates in different contexts. When running VS Code in the context of Windows, VS Code will look for a local copy of Git. When using VS Code to operate against the remote WSL environment, a separate installation of Git installed on Ubuntu will be used. (Note that Git is installed by default on Ubuntu 20.04)\n\nDevelopers will probably want to use Git both from within a Windows context and a Linux context, so we need to make sure that Git is present in both environments.\n\n### Step 1: Install Git on Windows (optional)\n\n- Download the install the 64-bit Windows version of Git from https://git-scm.com/downloads.\n\n- Click on the Git installer from the Downloads directory, and click through the default installation options. During the install process, you will be asked to select the default editor to be used with Git. (VIM, Notepad++, etc.). 
Select Visual Studio Code (assuming that this is the IDE that you plan to use for Nextflow).\n\n ![Installing Git on Windows](/img/git-install.png)\n\n- The Git installer will prompt you for additional settings. If you are not sure, accept the defaults. When asked, adjust the `PATH` variable to use the recommended option, making the Git command line available from Git Bash, the Command Prompt, and PowerShell.\n\n- After installation Git Bash, Git GUI, and GIT CMD will appear as new entries under the Start menu. If you are running Git from PowerShell, you will need to open a new Windows to force PowerShell to reset the path variable. By default, Git installs in C:\\Program Files\\Git.\n\n- If you plan to use Git from the command line, GitHub provides a useful cheatsheet [here](https://training.github.com/downloads/github-git-cheat-sheet.pdf).\n\n- After installing Git, from within VS Code (in the context of the local host), select the Source Control icon from the left pane of the VS Code interface as shown. You can open local folders that contain a git repository or clone repositories from GitHub or your preferred source code management system.\n\n ![Using Git within VS Code](/img/git-vscode.png)\n\n- Documentation on using Git with Visual Studio Code is provided at https://code.visualstudio.com/docs/editor/versioncontrol\n\n### Step 2: Install Git on Linux\n\n- Open a Remote VS Code Window on **\\*WSL: Ubuntu 20.04\\*** (By selecting the green icon on the lower-left corner of the VS code interface.)\n\n- Git should already be installed in `/usr/bin`, but you can validate this from the Ubuntu shell:\n\n ```console\n $ git --version\n git version 2.25.1\n ```\n\n- To get started using Git with VS Code Remote on WSL, select the _Source Control icon_ on the left panel of VS code. Assuming VS Code Remote detects that Git is installed on Linux, you should be able to _Clone a Repository_.\n\n- Select \"Clone Repository,\" and when prompted, clone the GitHub repo for the Blast example that we used earlier - https://github.com/nextflow-io/blast-example. Clone this repo into your home directory on Linux. You should see _blast-example_ appear as a source code repository within VS code as shown:\n\n ![Using Git within VS Code](/img/git-linux-1.png)\n\n- Select the _Explorer_ panel in VS Code to see the cloned _blast-example_ repo. Now we can explore and modify the pipeline code using the IDE.\n\n ![Using Git within VS Code](/img/git-linux-2.png)\n\n- After making modifications to the pipeline, we can execute the _local copy_ of the pipeline either from the Linux shell or directly via the Terminal window in VS Code as shown:\n\n ![Using Git within VS Code](/img/git-linux-3.png)\n\n- With the Docker VS Code extension, users can select the Docker icon from the left code to view containers and images associated with the Nextflow pipeline.\n\n- Git commands are available from within VS Code by selecting the _Source Control_ icon on the left panel and selecting the three dots (…) to the right of SOURCE CONTROL. Some operations such as pushing or committing code will require that VS Code be authenticated with your GitHub credentials.\n\n ![Using Git within VS Code](/img/git-linux-4.png)\n\n## Summary\n\nWith WSL2, Windows 10 is an excellent environment for developing and testing Nextflow pipelines. 
Users can take advantage of the power and convenience of a Linux command line environment while using Windows-based IDEs such as VS-Code with full support for containers.\n\nPipelines developed in the Windows environment can easily be extended to compute environments in the cloud.\n\nWhile installing Nextflow itself is straightforward, installing and testing necessary components such as WSL, Docker, an IDE, and Git can be a little tricky. Hopefully readers will find this guide helpful.\n", + "content": "For Windows users, getting access to a Linux-based Nextflow development and runtime environment used to be hard. Users would need to run virtual machines, access separate physical servers or cloud instances, or install packages such as [Cygwin](http://www.cygwin.com/) or [Wubi](https://wiki.ubuntu.com/WubiGuide). Fortunately, there is now an easier way to deploy a complete Nextflow development environment on Windows.\n\nThe Windows Subsystem for Linux (WSL) allows users to build, manage and execute Nextflow pipelines on a Windows 10 laptop or desktop without needing a separate Linux machine or cloud VM. Users can build and test Nextflow pipelines and containerized workflows locally, on an HPC cluster, or their preferred cloud service, including AWS Batch and Azure Batch.\n\nThis document provides a step-by-step guide to setting up a Nextflow development environment on Windows 10.\n\n## High-level Steps\n\nThe steps described in this guide are as follows:\n\n- Install Windows PowerShell\n- Configure the Windows Subsystem for Linux (WSL2)\n- Obtain and Install a Linux distribution (on WSL2)\n- Install Windows Terminal\n- Install and configure Docker\n- Download and install an IDE (VS Code)\n- Install and test Nextflow\n- Configure X-Windows for use with the Nextflow Console\n- Install and Configure GIT\n\n## Install Windows PowerShell\n\nPowerShell is a cross-platform command-line shell and scripting language available for Windows, Linux, and macOS. If you are an experienced Windows user, you are probably already familiar with PowerShell. PowerShell is worth taking a few minutes to download and install.\n\nPowerShell is a big improvement over the Command Prompt in Windows 10. It brings features to Windows that Linux/UNIX users have come to expect, such as command-line history, tab completion, and pipeline functionality.\n\n- You can obtain PowerShell for Windows from GitHub at the URL https://github.com/PowerShell/PowerShell.\n- Download and install the latest stable version of PowerShell for Windows x64 - e.g., [powershell-7.1.3-win-x64.msi](https://github.com/PowerShell/PowerShell/releases/download/v7.1.3/PowerShell-7.1.3-win-x64.msi).\n- If you run into difficulties, Microsoft provides detailed instructions [here](https://docs.microsoft.com/en-us/powershell/scripting/install/installing-powershell-core-on-windows?view=powershell-7.1).\n\n## Configure the Windows Subsystem for Linux (WSL)\n\n### Enable the Windows Subsystem for Linux\n\nMake sure you are running Windows 10 Version 1903 with Build 18362 or higher. You can check your Windows version by select WIN-R (using the Windows key to run a command) and running the utility `winver`.\n\nFrom within PowerShell, run the Windows Deployment Image and Service Manager (DISM) tool as an administrator to enable the Windows Subsystem for Linux. 
To run PowerShell with administrator privileges, right-click on the PowerShell icon from the Start menu or desktop and select \"_Run as administrator_\".\n\n```powershell\nPS C:\\WINDOWS\\System32> dism.exe /online /enable-feature /featurename:Microsoft-Windows-Subsystem-Linux /all /norestart\nDeployment Image Servicing and Management tool\nVersion: 10.0.19041.844\nImage Version: 10.0.19041.1083\n\nEnabling feature(s)\n[==========================100.0%==========================]\nThe operation completed successfully.\n```\n\nYou can learn more about DISM [here](https://docs.microsoft.com/en-us/windows-hardware/manufacture/desktop/what-is-dism).\n\n### Step 2: Enable the Virtual Machine Feature\n\nWithin PowerShell, enable Virtual Machine Platform support using DISM. If you have trouble enabling this feature, make sure that virtual machine support is enabled in your machine's BIOS.\n\n```powershell\nPS C:\\WINDOWS\\System32> dism.exe /online /enable-feature /featurename:VirtualMachinePlatform /all /norestart\nDeployment Image Servicing and Management tool\nVersion: 10.0.19041.844\nImage Version: 10.0.19041.1083\nEnabling feature(s)\n[==========================100.0%==========================]\nThe operation completed successfully.\n```\n\nAfter enabling the Virtual Machine Platform support, **restart your machine**.\n\n### Step 3: Download the Linux Kernel Update Package\n\nNextflow users will want to take advantage of the latest features in WSL 2. You can learn about differences between WSL 1 and WSL 2 [here](https://docs.microsoft.com/en-us/windows/wsl/compare-versions). Before you can enable support for WSL 2, you'll need to download the kernel update package at the link below:\n\n[WSL2 Linux kernel update package for x64 machines](https://wslstorestorage.blob.core.windows.net/wslblob/wsl_update_x64.msi)\n\nOnce downloaded, double click on the kernel update package and select \"Yes\" to install it with elevated permissions.\n\n### STEP 4: Set WSL2 as your Default Version\n\nFrom within PowerShell:\n\n```powershell\nPS C:\\WINDOWS\\System32> wsl --set-default-version 2\nFor information on key differences with WSL 2 please visit https://aka.ms/wsl2\n```\n\nIf you run into difficulties with any of these steps, Microsoft provides detailed installation instructions [here](https://docs.microsoft.com/en-us/windows/wsl/install-win10#manual-installation-steps).\n\n## Obtain and Install a Linux Distribution on WSL\n\nIf you normally install Linux on VM environments such as VirtualBox or VMware, this probably sounds like a lot of work. Fortunately, Microsoft provides Linux OS distributions via the Microsoft Store that work with the Windows Subsystem for Linux.\n\n- Use this link to access and download a Linux Distribution for WSL through the Microsoft Store - https://aka.ms/wslstore.\n\n ![Linux Distributions at the Microsoft Store](/img/ms-store.png)\n\n- We selected the Ubuntu 20.04 LTS release. You can use a different distribution if you choose. Installation from the Microsoft Store is automated. Once the Linux distribution is installed, you can run a shell on Ubuntu (or your installed OS) from the Windows Start menu.\n- When you start Ubuntu Linux for the first time, you will be prompted to provide a UNIX username and password. The username that you select can be distinct from your Windows username. The UNIX user that you create will automatically have `sudo` privileges. 
Whenever a shell is started, it will default to this user.\n- After setting your username and password, update your packages on Ubuntu from the Linux shell using the following command:\n\n ```bash\n sudo apt update && sudo apt upgrade\n ```\n\n- This is also a good time to add any additional Linux packages that you will want to use.\n\n ```bash\n sudo apt install net-tools\n ```\n\n## Install Windows Terminal\n\nWhile not necessary, it is a good idea to install [Windows Terminal](https://github.com/microsoft/terminal) at this point. When working with Nextflow, it is handy to interact with multiple command lines at the same time. For example, users may want to execute flows, monitor logfiles, and run Docker commands in separate windows.\n\nWindows Terminal provides an X-Windows-like experience on Windows. It helps organize your various command-line environments - Linux shell, Windows Command Prompt, PowerShell, AWS or Azure CLIs.\n\n![Windows Terminal](/img/windows-terminal.png)\n\nInstructions for downloading and installing Windows Terminal are available at: https://docs.microsoft.com/en-us/windows/terminal/get-started.\n\nIt is worth spending a few minutes getting familiar with available commands and shortcuts in Windows Terminal. Documentation is available at https://docs.microsoft.com/en-us/windows/terminal/command-line-arguments.\n\nSome Windows Terminal commands you'll need right away are provided below:\n\n- Split the active window vertically: SHIFT ALT =\n- Split the active window horizontally: SHIFT ALT \n- Resize the active window: SHIFT ALT ``\n- Open a new window under the current tab: ALT v (_the new tab icon along the top of the Windows Terminal interface_)\n\n## Installing Docker on Windows\n\nThere are two ways to install Docker for use with the WSL on Windows. One method is to install Docker directly on a hosted WSL Linux instance (Ubuntu in our case) and have the docker daemon run on the Linux kernel as usual. An installation recipe for people that choose this \"native Linux\" approach is provided [here](https://dev.to/bowmanjd/install-docker-on-windows-wsl-without-docker-desktop-34m9).\n\nA second method is to run [Docker Desktop](https://www.docker.com/products/docker-desktop) on Windows. While Docker is more commonly used in Linux environments, it can be used with Windows also. The Docker Desktop supports containers running on Windows and Linux instances running under WSL. Docker Desktop provides some advantages for Windows users:\n\n- The installation process is automated\n- Docker Desktop provides a Windows GUI for managing Docker containers and images (including Linux containers running under WSL)\n- Microsoft provides Docker Desktop integration features from within Visual Studio Code via a VS Code extension\n- Docker Desktop provides support for auto-installing a single-node Kubernetes cluster\n- The Docker Desktop WSL 2 back-end provides an elegant Linux integration such that from a Linux user's perspective, Docker appears to be running natively on Linux.\n\nAn explanation of how the Docker Desktop WSL 2 Back-end works is provided [here](https://www.docker.com/blog/new-docker-desktop-wsl2-backend/).\n\n### Step 1: Install Docker Desktop on Windows\n\n- Download and install Docker Desktop for Windows from the following link: https://desktop.docker.com/win/stable/amd64/Docker%20Desktop%20Installer.exe\n- Follow the on-screen prompts provided by the Docker Desktop Installer. 
The installation process will install Docker on Windows and install the Docker back-end components so that Docker commands are accessible from within WSL.\n- After installation, Docker Desktop can be run from the Windows start menu. The Docker Desktop user interface is shown below. Note that Docker containers launched under WSL can be managed from the Windows Docker Desktop GUI or Linux command line.\n- The installation process is straightforward, but if you run into difficulties, detailed instructions are available [here](https://docs.docker.com/docker-for-windows/install/).\n\n ![Nextflow Visual Studio Code Extension](/img/docker-images.png)\n\n The Docker Engineering team provides an architecture diagram explaining how Docker on Windows interacts with WSL. Additional details are available [here](https://code.visualstudio.com/blogs/2020/03/02/docker-in-wsl2).\n\n ![Nextflow Visual Studio Code Extension](/img/docker-windows-arch.png)\n\n### Step 2: Verify the Docker installation\n\nNow that Docker is installed, run a Docker container to verify that Docker and the Docker Integration Package on WSL 2 are working properly.\n\n- Run a Docker command from the Linux shell as shown below below. This command downloads a **centos** image from Docker Hub and allows us to interact with the container via an assigned pseudo-tty. Your Docker container may exit with exit code 139 when you run this and other Docker containers. If so, don't worry – an easy fix to this issue is provided shortly.\n\n ```console\n $ docker run -ti centos:6\n [root@02ac0beb2d2c /]# hostname\n 02ac0beb2d2c\n ```\n\n- You can run Docker commands in other Linux shell windows via the Windows Terminal environment to monitor and manage Docker containers and images. For example, running `docker ps` in another window shows the running CentOS Docker container.\n\n ```console\n $ docker ps\n CONTAINER ID IMAGE COMMAND CREATED STATUS NAMES\n f5dad42617f1 centos:6 \"/bin/bash\" 2 minutes ago Up 2 minutes \thappy_hopper\n ```\n\n### Step 3: Dealing with exit code 139\n\nYou may encounter exit code `139` when running Docker containers. This is a known problem when running containers with specific base images within Docker Desktop. Good explanations of the problem and solution are provided [here](https://dev.to/damith/docker-desktop-container-crash-with-exit-code-139-on-windows-wsl-fix-438) and [here](https://unix.stackexchange.com/questions/478387/running-a-centos-docker-image-on-arch-linux-exits-with-code-139).\n\nThe solution is to add two lines to a `.wslconfig` file in your Windows home directory. The `.wslconfig` file specifies kernel options that apply to all Linux distributions running under WSL 2.\n\nSome of the Nextflow container images served from Docker Hub are affected by this bug since they have older base images, so it is a good idea to apply this fix.\n\n- Edit the `.wslconfig` file in your Windows home directory. You can do this using PowerShell as shown:\n\n ```powershell\n PS C:\\Users\\ notepad .wslconfig\n ```\n\n- Add these two lines to the `.wslconfig` file and save it:\n\n ```ini\n [wsl2]\n kernelCommandLine = vsyscall=emulate\n ```\n\n- After this, **restart your machine** to force a restart of the Docker and WSL 2 environment. After making this correction, you should be able to launch containers without seeing exit code `139`.\n\n## Install Visual Studio Code as your IDE (optional)\n\nDevelopers can choose from a variety of IDEs depending on their preferences. 
Some examples of IDEs and developer-friendly editors are below:\n\n- Visual Studio Code - https://code.visualstudio.com/Download (Nextflow VSCode Language plug-in [here](https://github.com/nextflow-io/vscode-language-nextflow/blob/master/vsc-extension-quickstart.md))\n- Eclipse - https://www.eclipse.org/\n- VIM - https://www.vim.org/ (VIM plug-in for Nextflow [here](https://github.com/LukeGoodsell/nextflow-vim))\n- Emacs - https://www.gnu.org/software/emacs/download.html (Nextflow syntax highlighter [here](https://github.com/Emiller88/nextflow-mode))\n- JetBrains PyCharm - https://www.jetbrains.com/pycharm/\n- IntelliJ IDEA - https://www.jetbrains.com/idea/\n- Atom – https://atom.io/ (Nextflow Atom support available [here](https://atom.io/packages/language-nextflow))\n- Notepad++ - https://notepad-plus-plus.org/\n\nWe decided to install Visual Studio Code because it has some nice features, including:\n\n- Support for source code control from within the IDE (Git)\n- Support for developing on Linux via its WSL 2 Video Studio Code Backend\n- A library of extensions including Docker and Kubernetes support and extensions for Nextflow, including Nextflow language support and an [extension pack for the nf-core community](https://github.com/nf-core/vscode-extensionpack).\n\nDownload Visual Studio Code from https://code.visualstudio.com/Download and follow the installation procedure. The installation process will detect that you are running WSL. You will be invited to download and install the Remote WSL extension.\n\n- Within VS Code and other Windows tools, you can access the Linux file system under WSL 2 by accessing the path `\\\\wsl$\\`. In our example, the path from Windows to access files from the root of our Ubuntu Linux instance is: [**\\\\wsl$\\Ubuntu-20.04**](file://wsl$/Ubuntu-20.04).\n\nNote that the reverse is possible also – from within Linux, `/mnt/c` maps to the Windows C: drive. You can inspect `/etc/mtab` to see the mounted file systems available under Linux.\n\n- It is a good idea to install Nextflow language support in VS Code. You can do this by selecting the Extensions icon from the left panel of the VS Code interface and searching the extensions library for Nextflow as shown. The Nextflow language support extension is on GitHub at https://github.com/nextflow-io/vscode-language-nextflow\n\n ![Nextflow Visual Studio Code Extension](/img/nf-vscode-ext.png)\n\n## Visual Studio Code Remote Development\n\nVisual Studio Code Remote Development supports development on remote environments such as containers or remote hosts. For Nextflow users, it is important to realize that VS Code sees the Ubuntu instance we installed on WSL as a remote environment. The Diagram below illustrates how remote development works. From a VS Code perspective, the Linux instance in WSL is considered a remote environment.\n\nWindows users work within VS Code in the Windows environment. However, source code, developer tools, and debuggers all run Linux on WSL, as illustrated below.\n\n![The Remote Development Environment in VS Code](/img/vscode-remote-dev.png)\n\nAn explanation of how VS Code Remote Development works is provided [here](https://code.visualstudio.com/docs/remote/remote-overview).\n\nVS Code users see the Windows filesystem, plug-ins specific to VS Code on Windows, and access Windows versions of tools such as Git. 
If you prefer to develop in Linux, you will want to select WSL as the remote environment.\n\nTo open a new VS Code Window running in the context of the WSL Ubuntu-20.04 environment, click the green icon at the lower left of the VS Code window and select _\"New WSL Window using Distro ..\"_ and select `Ubuntu 20.04`. You'll notice that the environment changes to show that you are working in the WSL: `Ubuntu-20.04` environment.\n\n![Selecting the Remote Dev Environment within VS Code](/img/remote-dev-side-by-side.png)\n\nSelecting the Extensions icon, you can see that different VS Code Marketplace extensions run in different contexts. The Nextflow Language extension installed in the previous step is globally available. It works when developing on Windows or developing on WSL: Ubuntu-20.04.\n\nThe Extensions tab in VS Code differentiates between locally installed plug-ins and those installed under WSL.\n\n![Local vs. Remote Extensions in VS Code](/img/vscode-extensions.png)\n\n## Installing Nextflow\n\nWith Linux, Docker, and an IDE installed, now we can install Nextflow in our WSL 2 hosted Linux environment. Detailed instructions for installing Nextflow are available at https://www.nextflow.io/docs/latest/getstarted.html#installation\n\n### Step 1: Make sure Java is installed (under WSL)\n\nJava is a prerequisite for running Nextflow. Instructions for installing Java on Ubuntu are available [here](https://linuxize.com/post/install-java-on-ubuntu-18-04/). To install the default OpenJDK, follow the instructions below in a Linux shell window:\n\n- Update the _apt_ package index:\n\n ```bash\n sudo apt update\n ```\n\n- Install the latest default OpenJDK package\n\n ```bash\n sudo apt install default-jdk\n ```\n\n- Verify the installation\n\n ```bash\n java -version\n ```\n\n### Step 2: Make sure curl is installed\n\n`curl` is a convenient way to obtain Nextflow. `curl` is included in the default Ubuntu repositories, so installation is straightforward.\n\n- From the shell:\n\n ```bash\n sudo apt update\n sudo apt install curl\n ```\n\n- Verify that `curl` works:\n\n ```console\n $ curl\n curl: try 'curl --help' or 'curl --manual' for more information\n ```\n\n### STEP 3: Download and install Nextflow\n\n- Use `curl` to retrieve Nextflow into a temporary directory and then install it in `/usr/bin` so that the Nextflow command is on your path:\n\n ```bash\n mkdir temp\n cd temp\n curl -s https://get.nextflow.io | bash\n sudo cp nextflow /usr/bin\n ```\n\n- Make sure that Nextflow is executable:\n\n ```bash\n sudo chmod 755 /usr/bin/nextflow\n ```\n\n or if you prefer:\n\n ```bash\n sudo chmod +x /usr/bin/nextflow\n ```\n\n### Step 4: Verify the Nextflow installation\n\n- Make sure Nextflow runs:\n\n ```console\n $ nextflow -version\n\n N E X T F L O W\n version 21.04.2 build 5558\n created 12-07-2021 07:54 UTC (03:54 EDT)\n cite doi:10.1038/nbt.3820\n http://nextflow.io\n ```\n\n- Run a simple Nextflow pipeline. The example below downloads and executes a sample hello world pipeline from GitHub - https://github.com/nextflow-io/hello.\n\n ```console\n $ nextflow run hello\n\n N E X T F L O W ~ version 21.04.2\n Launching `nextflow-io/hello` [distracted_pare] - revision: ec11eb0ec7 [master]\n executor > local (4)\n [06/c846d8] process > sayHello (3) [100%] 4 of 4 ✔\n Ciao world!\n\n Hola world!\n\n Bonjour world!\n\n Hello world!\n ```\n\n### Step 5: Run a Containerized Workflow\n\nTo validate that Nextflow works with containerized workflows, we can run a slightly more complicated example. 
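The Blast example described next is launched with the `-with-docker` command-line flag. As an aside, if you prefer not to pass that flag on every run, Docker can also be switched on through configuration. The snippet below is a minimal sketch of an optional addition to a local `nextflow.config`; it is not required by, nor part of, the blast-example repository:

```groovy
// Optional: enable Docker for every run so that -with-docker can be omitted
docker {
    enabled = true
}
```

With such a setting in place, `nextflow run` uses the container declared by the pipeline without any extra command-line options.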
A sample workflow involving NCBI Blast is available at https://github.com/nextflow-io/blast-example. Rather than installing Blast on our local Linux instance, it is much easier to pull a container preloaded with Blast and other software that the pipeline depends on.\n\nThe `nextflow.config` file for the Blast example (below) specifies that process logic is encapsulated in the container `nextflow/examples` available from Docker Hub (https://hub.docker.com/r/nextflow/examples).\n\n- On GitHub: [nextflow-io/blast-example/nextflow.config](https://github.com/nextflow-io/blast-example/blob/master/nextflow.config)\n\n ```groovy\n manifest {\n nextflowVersion = '>= 20.01.0'\n }\n\n process {\n container = 'nextflow/examples'\n }\n ```\n\n- Run the _blast-example_ pipeline that resides on GitHub directly from WSL and specify Docker as the container runtime using the command below:\n\n ```console\n $ nextflow run blast-example -with-docker\n N E X T F L O W ~ version 21.04.2\n Launching `nextflow-io/blast-example` [sharp_raman] - revision: 25922a0ae6 [master]\n executor > local (2)\n [aa/a9f056] process > blast (1) [100%] 1 of 1 ✔\n [b3/c41401] process > extract (1) [100%] 1 of 1 ✔\n matching sequences:\n >lcl|1ABO:B unnamed protein product\n MNDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNS\n >lcl|1ABO:A unnamed protein product\n MNDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNS\n >lcl|1YCS:B unnamed protein product\n PEITGQVSLPPGKRTNLRKTGSERIAHGMRVKFNPLPLALLLDSSLEGEFDLVQRIIYEVDDPSLPNDEGITALHNAVCA\n GHTEIVKFLVQFGVNVNAADSDGWTPLHCAASCNNVQVCKFLVESGAAVFAMTYSDMQTAADKCEEMEEGYTQCSQFLYG\n VQEKMGIMNKGVIYALWDYEPQNDDELPMKEGDCMTIIHREDEDEIEWWWARLNDKEGYVPRNLLGLYPRIKPRQRSLA\n >lcl|1IHD:C unnamed protein product\n LPNITILATGGTIAGGGDSATKSNYTVGKVGVENLVNAVPQLKDIANVKGEQVVNIGSQDMNDNVWLTLAKKINTDCDKT\n ```\n\n- Nextflow executes the pipeline directly from the GitHub repository and automatically pulls the nextflow/examples container from Docker Hub if the image is unavailable locally. The pipeline then executes the two containerized workflow steps (blast and extract). The pipeline then collects the sequences into a single file and prints the result file content when pipeline execution completes.\n\n## Configuring an XServer for the Nextflow Console\n\nPipeline developers will probably want to use the Nextflow Console at some point. The Nextflow Console's REPL (read-eval-print loop) environment allows developers to quickly test parts of scripts or Nextflow code segments interactively.\n\nThe Nextflow Console is launched from the Linux command line. However, the Groovy-based interface requires an X-Windows environment to run. You can set up X-Windows with WSL using the procedure below. A good article on this same topic is provided [here](https://medium.com/javarevisited/using-wsl-2-with-x-server-linux-on-windows-a372263533c3).\n\n- Download an X-Windows server for Windows. In this example, we use the _VcXsrv Windows X Server_ available from source forge at https://sourceforge.net/projects/vcxsrv/.\n\n- Accept all the defaults when running the automated installer. The X-server will end up installed in `c:\\Program Files\\VcXsrv`.\n\n- The automated installation of VcXsrv will create an _\"XLaunch\"_ shortcut on your desktop. 
It is a good idea to create your own shortcut with a customized command line so that you don't need to interact with the XLaunch interface every time you start the X-server.\n\n- Right-click on the Windows desktop to create a new shortcut, give it a meaningful name, and insert the following for the shortcut target:\n\n ```powershell\n \"C:\\Program Files\\VcXsrv\\vcxsrv.exe\" :0 -ac -terminate -lesspointer -multiwindow -clipboard -wgl -dpi auto\n ```\n\n- Inspecting the new shortcut properties, it should look something like this:\n\n ![X-Server (vcxsrc) Properties](/img/xserver.png)\n\n- Double-click on the new shortcut desktop icon to test it. Unfortunately, the X-server runs in the background. When running the X-server in multiwindow mode (which we recommend), it is not obvious whether the X-server is running.\n\n- One way to check that the X-server is running is to use the Microsoft Task Manager and look for the XcSrv process running in the background. You can also verify it is running by using the `netstat` command from with PowerShell on Windows to ensure that the X-server is up and listening on the appropriate ports. Using `netstat`, you should see output like the following:\n\n ```powershell\n PS C:\\WINDOWS\\system32> **netstat -abno | findstr 6000**\n TCP 0.0.0.0:6000 0.0.0.0:0 LISTENING 35176\n TCP 127.0.0.1:6000 127.0.0.1:56516 ESTABLISHED 35176\n TCP 127.0.0.1:6000 127.0.0.1:56517 ESTABLISHED 35176\n TCP 127.0.0.1:6000 127.0.0.1:56518 ESTABLISHED 35176\n TCP 127.0.0.1:56516 127.0.0.1:6000 ESTABLISHED 35176\n TCP 127.0.0.1:56517 127.0.0.1:6000 ESTABLISHED 35176\n TCP 127.0.0.1:56518 127.0.0.1:6000 ESTABLISHED 35176\n TCP 172.28.192.1:6000 172.28.197.205:46290 TIME_WAIT 0\n TCP [::]:6000 [::]:0 LISTENING 35176\n ```\n\n- At this point, the X-server is up and running and awaiting a connection from a client.\n\n- Within Ubuntu in WSL, we need to set up the environment to communicate with the X-Windows server. The shell variable DISPLAY needs to be set pointing to the IP address of the X-server and the instance of the X-windows server.\n\n- The shell script below will set the DISPLAY variable appropriately and export it to be available to X-Windows client applications launched from the shell. This scripting trick works because WSL sees the Windows host as the nameserver and this is the same IP address that is running the X-Server. You can echo the $DISPLAY variable after setting it to verify that it is set correctly.\n\n ```console\n $ export DISPLAY=$(cat /etc/resolv.conf | grep nameserver | awk '{print $2}'):0.0\n $ echo $DISPLAY\n 172.28.192.1:0.0\n ```\n\n- Add this command to the end of your `.bashrc` file in the Linux home directory to avoid needing to set the DISPLAY variable every time you open a new window. This way, if the IP address of the desktop or laptop changes, the DISPLAY variable will be updated accordingly.\n\n ```bash\n cd ~\n vi .bashrc\n ```\n\n ```bash\n # set the X-Windows display to connect to VcXsrv on Windows\n export DISPLAY=$(cat /etc/resolv.conf | grep nameserver | awk '{print $2}'):0.0\n \".bashrc\" 120L, 3912C written\n ```\n\n- Use an X-windows client to make sure that the X- server is working. 
Since X-windows clients are not installed by default, download an xterm client as follows via the Linux shell:\n\n ```bash\n sudo apt install xterm\n ```\n\n- Assuming that the X-server is up and running on Windows, and the Linux DISPLAY variable is set correctly, you're ready to test X-Windows.\n\n Before testing X-Windows, do yourself a favor and temporarily disable the Windows Firewall. The Windows Firewall will very likely block ports around 6000, preventing client requests on WSL from connecting to the X-server. You can find this under Firewall & network protection on Windows. Clicking the \"Private Network\" or \"Public Network\" options will show you the status of the Windows Firewall and indicate whether it is on or off.\n\n Depending on your installation, you may be running a specific Firewall. In this example, we temporarily disable the McAfee LiveSafe Firewall as shown:\n\n ![Ensure that the Firewall is not interfering](/img/firewall.png)\n\n- With the Firewall disabled, you can attempt to launch the xterm client from the Linux shell:\n\n ```bash\n xterm &\n ```\n\n- If everything is working correctly, you should see the new xterm client appear under Windows. The xterm is executing on Ubuntu under WSL but displays alongside other Windows on the Windows desktop. This is what is meant by \"multiwindow\" mode.\n\n ![Launch an xterm to verify functionality](/img/xterm.png)\n\n- Now that you know X-Windows is working correctly turn the Firewall back on, and adjust the settings to allow traffic to and from the required port. Ideally, you want to open only the minimal set of ports and services required. In the case of the McAfee Firewall, getting X-Windows to work required changing access to incoming and outgoing ports to _\"Open ports to Work and Home networks\"_ for the `vcxsrv.exe` program only as shown:\n\n ![Allowing access to XServer traffic](/img/xserver_setup.png)\n\n- With the X-server running, the `DISPLAY` variable set, and the Windows Firewall configured correctly, we can now launch the Nextflow Console from the shell as shown:\n\n ```bash\n nextflow console\n ```\n\n The command above opens the Nextflow REPL console under X-Windows.\n\n ![Nextflow REPL Console under X-Windows](/img/repl_console.png)\n\nInside the Nextflow console, you can enter Groovy code and run it interactively, a helpful feature when developing and debugging Nextflow pipelines.\n\n# Installing Git\n\nCollaborative source code management systems such as BitBucket, GitHub, and GitLab are used to develop and share Nextflow pipelines. To be productive with Nextflow, you will want to install Git.\n\nAs explained earlier, VS Code operates in different contexts. When running VS Code in the context of Windows, VS Code will look for a local copy of Git. When using VS Code to operate against the remote WSL environment, a separate installation of Git installed on Ubuntu will be used. (Note that Git is installed by default on Ubuntu 20.04)\n\nDevelopers will probably want to use Git both from within a Windows context and a Linux context, so we need to make sure that Git is present in both environments.\n\n### Step 1: Install Git on Windows (optional)\n\n- Download the install the 64-bit Windows version of Git from https://git-scm.com/downloads.\n\n- Click on the Git installer from the Downloads directory, and click through the default installation options. During the install process, you will be asked to select the default editor to be used with Git. (VIM, Notepad++, etc.). 
Select Visual Studio Code (assuming that this is the IDE that you plan to use for Nextflow).\n\n ![Installing Git on Windows](/img/git-install.png)\n\n- The Git installer will prompt you for additional settings. If you are not sure, accept the defaults. When asked, adjust the `PATH` variable to use the recommended option, making the Git command line available from Git Bash, the Command Prompt, and PowerShell.\n\n- After installation Git Bash, Git GUI, and GIT CMD will appear as new entries under the Start menu. If you are running Git from PowerShell, you will need to open a new Windows to force PowerShell to reset the path variable. By default, Git installs in C:\\Program Files\\Git.\n\n- If you plan to use Git from the command line, GitHub provides a useful cheatsheet [here](https://training.github.com/downloads/github-git-cheat-sheet.pdf).\n\n- After installing Git, from within VS Code (in the context of the local host), select the Source Control icon from the left pane of the VS Code interface as shown. You can open local folders that contain a git repository or clone repositories from GitHub or your preferred source code management system.\n\n ![Using Git within VS Code](/img/git-vscode.png)\n\n- Documentation on using Git with Visual Studio Code is provided at https://code.visualstudio.com/docs/editor/versioncontrol\n\n### Step 2: Install Git on Linux\n\n- Open a Remote VS Code Window on **\\*WSL: Ubuntu 20.04\\*** (By selecting the green icon on the lower-left corner of the VS code interface.)\n\n- Git should already be installed in `/usr/bin`, but you can validate this from the Ubuntu shell:\n\n ```console\n $ git --version\n git version 2.25.1\n ```\n\n- To get started using Git with VS Code Remote on WSL, select the _Source Control icon_ on the left panel of VS code. Assuming VS Code Remote detects that Git is installed on Linux, you should be able to _Clone a Repository_.\n\n- Select \"Clone Repository,\" and when prompted, clone the GitHub repo for the Blast example that we used earlier - https://github.com/nextflow-io/blast-example. Clone this repo into your home directory on Linux. You should see _blast-example_ appear as a source code repository within VS code as shown:\n\n ![Using Git within VS Code](/img/git-linux-1.png)\n\n- Select the _Explorer_ panel in VS Code to see the cloned _blast-example_ repo. Now we can explore and modify the pipeline code using the IDE.\n\n ![Using Git within VS Code](/img/git-linux-2.png)\n\n- After making modifications to the pipeline, we can execute the _local copy_ of the pipeline either from the Linux shell or directly via the Terminal window in VS Code as shown:\n\n ![Using Git within VS Code](/img/git-linux-3.png)\n\n- With the Docker VS Code extension, users can select the Docker icon from the left code to view containers and images associated with the Nextflow pipeline.\n\n- Git commands are available from within VS Code by selecting the _Source Control_ icon on the left panel and selecting the three dots (…) to the right of SOURCE CONTROL. Some operations such as pushing or committing code will require that VS Code be authenticated with your GitHub credentials.\n\n ![Using Git within VS Code](/img/git-linux-4.png)\n\n## Summary\n\nWith WSL2, Windows 10 is an excellent environment for developing and testing Nextflow pipelines. 
Users can take advantage of the power and convenience of a Linux command line environment while using Windows-based IDEs such as VS-Code with full support for containers.\n\nPipelines developed in the Windows environment can easily be extended to compute environments in the cloud.\n\nWhile installing Nextflow itself is straightforward, installing and testing necessary components such as WSL, Docker, an IDE, and Git can be a little tricky. Hopefully readers will find this guide helpful.\n", "images": [], "author": "Evan Floden", "tags": "windows,learning" @@ -401,7 +401,7 @@ "slug": "2022/caching-behavior-analysis", "title": "Analyzing caching behavior of pipelines", "date": "2022-11-10T00:00:00.000Z", - "content": "\nThe ability to resume an analysis (i.e. caching) is one of the core strengths of Nextflow. When developing pipelines, this allows us to avoid re-running unchanged processes by simply appending `-resume` to the `nextflow run` command. Sometimes, tasks may be repeated for reasons that are unclear. In these cases it can help to look into the caching mechanism, to understand why a specific process was re-run.\n\nWe have previously written about Nextflow's [resume functionality](https://www.nextflow.io/blog/2019/demystifying-nextflow-resume.html) as well as some [troubleshooting strategies](https://www.nextflow.io/blog/2019/troubleshooting-nextflow-resume.html) to gain more insights on the caching behavior.\n\nIn this post, we will take a more hands-on approach and highlight some strategies which we can use to understand what is causing a particular process (or processes) to re-run, instead of using the cache from previous runs of the pipeline. To demonstrate the process, we will introduce a minor change into one of the process definitions in the the [nextflow-io/rnaseq-nf](https://github.com/nextflow-io/rnaseq-nf) pipeline and investigate how it affects the overall caching behavior when compared to the initial execution of the pipeline.\n\n### Local setup for the test\n\nFirst, we clone the [nextflow-io/rnaseq-nf](https://github.com/nextflow-io/rnaseq-nf) pipeline locally:\n\n```bash\n$ git clone https://github.com/nextflow-io/rnaseq-nf\n$ cd rnaseq-nf\n```\n\nIn the examples below, we have used Nextflow `v22.10.0`, Docker `v20.10.8` and `Java v17 LTS` on MacOS.\n\n### Pipeline flowchart\n\nThe flowchart below can help in understanding the design of the pipeline and the dependencies between the various tasks.\n\n![rnaseq-nf](/img/rnaseq-nf.base.png)\n\n### Logs from initial (fresh) run\n\nAs a reminder, Nextflow generates a unique task hash, e.g. 22/7548fa… for each task in a workflow. The hash takes into account the complete file path, the last modified timestamp, container ID, content of script directive among other factors. If any of these change, the task will be re-executed. Nextflow maintains a list of task hashes for caching and traceability purposes. You can learn more about task hashes in the article [Troubleshooting Nextflow resume](https://nextflow.io/blog/2019/troubleshooting-nextflow-resume.html).\n\nTo have something to compare to, we first need to generate the initial hashes for the unchanged processes in the pipeline. We save these in a file called `fresh_run.log` and use them later on as \"ground-truth\" for the analysis. 
In order to save the process hashes we use the `-dump-hashes` flag, which prints them to the log.\n\n**TIP:** We rely upon the [`-log` option](https://www.nextflow.io/docs/latest/cli.html#execution-logs) in the `nextflow` command line interface to be able to supply a custom log file name instead of the default `.nextflow.log`.\n\n```console\n$ nextflow -log fresh_run.log run ./main.nf -profile docker -dump-hashes\n\n[...truncated…]\nexecutor > local (4)\n[d5/57c2bb] process > RNASEQ:INDEX (ggal_1_48850000_49020000) [100%] 1 of 1 ✔\n[25/433b23] process > RNASEQ:FASTQC (FASTQC on ggal_gut) [100%] 1 of 1 ✔\n[03/23372f] process > RNASEQ:QUANT (ggal_gut) [100%] 1 of 1 ✔\n[38/712d21] process > MULTIQC [100%] 1 of 1 ✔\n[...truncated…]\n```\n\n### Edit the `FastQC` process\n\nAfter the initial run of the pipeline, we introduce a change in the `fastqc.nf` module, hard coding the number of threads which should be used to run the `FASTQC` process via Nextflow's [`cpus` directive](https://www.nextflow.io/docs/latest/process.html#cpus).\n\nHere's the output of `git diff` on the contents of `modules/fastqc/main.nf` file:\n\n```diff\n--- a/modules/fastqc/main.nf\n+++ b/modules/fastqc/main.nf\n@@ -4,6 +4,7 @@ process FASTQC {\n tag \"FASTQC on $sample_id\"\n conda 'bioconda::fastqc=0.11.9'\n publishDir params.outdir, mode:'copy'\n+ cpus 2\n\n input:\n tuple val(sample_id), path(reads)\n@@ -13,6 +14,6 @@ process FASTQC {\n\n script:\n \"\"\"\n- fastqc.sh \"$sample_id\" \"$reads\"\n+ fastqc.sh \"$sample_id\" \"$reads\" -t ${task.cpus}\n \"\"\"\n }\n```\n\n### Logs from the follow up run\n\nNext, we run the pipeline again with the `-resume` option, which instructs Nextflow to rely upon the cached results from the previous run and only run the parts of the pipeline which have changed. As before, we instruct Nextflow to dump the process hashes, this time in a file called `resumed_run.log`.\n\n```console\n$ nextflow -log resumed_run.log run ./main.nf -profile docker -dump-hashes -resume\n\n[...truncated…]\nexecutor > local\n[d5/57c2bb] process > RNASEQ:INDEX (ggal_1_48850000_49020000) [100%] 1 of 1, cached: 1 ✔\n[55/15b609] process > RNASEQ:FASTQC (FASTQC on ggal_gut) [100%] 1 of 1 ✔\n[03/23372f] process > RNASEQ:QUANT (ggal_gut) [100%] 1 of 1, cached: 1 ✔\n[f3/f1ccb4] process > MULTIQC [100%] 1 of 1 ✔\n[...truncated…]\n```\n\n## Analysis of cache hashes\n\nFrom the summary of the command line output above, we can see that the `RNASEQ:FASTQC (FASTQC on ggal_gut)` and `MULTIQC` processes were re-run while the others were cached. To understand why, we can examine the hashes generated by the processes from the logs of the `fresh_run` and `resumed_run`.\n\nFor the analysis, we need to keep in mind that:\n\n1. The time-stamps are expected to differ and can be safely ignored to narrow down the `grep` pattern to the Nextflow `TaskProcessor` class.\n\n2. The _order_ of the log entries isn't fixed, due to the nature of the underlying parallel computation dataflow model used by Nextflow. For example, in our example below, `FASTQC` ran first in `fresh_run.log` but wasn’t the first logged process in `resumed_run.log`.\n\n### Find the process level hashes\n\nWe can use standard Unix tools like `grep`, `cut` and `sort` to address these points and filter out the relevant information:\n\n1. Use `grep` to isolate log entries with `cache hash` string\n2. Remove the prefix time-stamps using `cut -d ‘-’ -f 3`\n3. Remove the caching mode related information using `cut -d ';' -f 1`\n4. 
Sort the lines based on process names using `sort` to have a standard order before comparison\n5. Use `tee` to print the resultant strings to the terminal and simultaneously save to a file\n\nNow, let’s apply these transformations to the `fresh_run.log` as well as `resumed_run.log` entries.\n\n- `fresh_run.log`\n\n```console\n$ cat ./fresh_run.log | grep 'INFO.*TaskProcessor.*cache hash' | cut -d '-' -f 3 | cut -d ';' -f 1 | sort | tee ./fresh_run.tasks.log\n\n [MULTIQC] cache hash: 167d7b39f7efdfc49b6ff773f081daef\n [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 47e8c58d92dbaafba3c2ccc4f89f53a4\n [RNASEQ:INDEX (ggal_1_48850000_49020000)] cache hash: ac8be293e1d57f3616cdd0adce34af6f\n [RNASEQ:QUANT (ggal_gut)] cache hash: d8b88e3979ff9fe4bf64b4e1bfaf4038\n```\n\n- `resumed_run.log`\n\n```console\n$ cat ./resumed_run.log | grep 'INFO.*TaskProcessor.*cache hash' | cut -d '-' -f 3 | cut -d ';' -f 1 | sort | tee ./resumed_run.tasks.log\n\n [MULTIQC] cache hash: d3f200c56cf00b223282f12f06ae8586\n [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 92478eeb3b0ff210ebe5a4f3d99aed2d\n [RNASEQ:INDEX (ggal_1_48850000_49020000)] cache hash: ac8be293e1d57f3616cdd0adce34af6f\n [RNASEQ:QUANT (ggal_gut)] cache hash: d8b88e3979ff9fe4bf64b4e1bfaf4038\n```\n\n### Inference from process top-level hashes\n\nComputing a hash is a multi-step process and various factors contribute to it such as the inputs of the process, platform, time-stamps of the input files and more ( as explained in [Demystifying Nextflow resume](https://www.nextflow.io/blog/2019/demystifying-nextflow-resume.html) blog post) . The change we made in the task level CPUs directive and script section of the `FASTQC` process triggered a re-computation of hashes:\n\n```diff\n--- ./fresh_run.tasks.log\n+++ ./resumed_run.tasks.log\n@@ -1,4 +1,4 @@\n- [MULTIQC] cache hash: dccabcd012ad86e1a2668e866c120534\n- [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 94be8c84f4bed57252985e6813bec401\n+ [MULTIQC] cache hash: c5a63560338596282682cc04ff97e436\n+ [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 54aa712db7c8248e7f31d5fb6535ff9d\n [RNASEQ:INDEX (ggal_1_48850000_49020000)] cache hash: 356aaa7524fb071f258480ba07c67b3c\n [RNASEQ:QUANT (ggal_gut)] cache hash: 169ced0fc4b047eaf91cd31620b22540\n\n\n```\n\nEven though we only introduced changes in `FASTQC`, the `MULTIQC` process was re-run since it relies upon the output of the `FASTQC` process. 
Any task that has its cache hash invalidated triggers a rerun of all downstream steps:\n\n![rnaseq-nf after modification](/img/rnaseq-nf.modified.png)\n\n### Understanding why `FASTQC` was re-run\n\nWe can see the full list of `FASTQC` process hashes within the `fresh_run.log` file\n\n```console\n\n[...truncated…]\nNov-03 20:19:13.827 [Actor Thread 6] INFO nextflow.processor.TaskProcessor - [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 54aa712db7c8248e7f31d5fb6535ff9d; mode: STANDARD; entries:\n 1a0e496fef579b22998f099981b494f9 [java.util.UUID] a11bf24f-638a-42d6-8b50-48d3be637d54\n 195c7faea83c75f2340eb710d8486d2a [java.lang.String] RNASEQ:FASTQC\n 2bea0eee5e384bd6082a173772e939eb [java.lang.String] \"\"\"\n fastqc.sh \"$sample_id\" \"$reads\" -t ${task.cpus}\n \"\"\"\n\n 8e58c0cec3bde124d5d932c7f1579395 [java.lang.String] quay.io/nextflow/rnaseq-nf:v1.1\n 7ec7cbd71ff757f5fcdbaa760c9ce6de [java.lang.String] sample_id\n 16b4905b1545252eb7cbfe7b2a20d03d [java.lang.String] ggal_gut\n 553096c532e666fb42214fdf0520fe4a [java.lang.String] reads\n 6a5d50e32fdb3261e3700a30ad257ff9 [nextflow.util.ArrayBag] [FileHolder(sourceObj:/home/abhinav/rnaseq-nf/data/ggal/ggal_gut_1.fq, storePath:/home/abhinav/rnaseq-nf/data/ggal/ggal_gut_1.fq, stageName:ggal_gut_1.fq), FileHolder(sourceObj:/home/abhinav/rnaseq-nf/data/ggal/ggal_gut_2.fq, storePath:/home/abhinav/rnaseq-nf/data/ggal/ggal_gut_2.fq, stageName:ggal_gut_2.fq)]\n 4f9d4b0d22865056c37fb6d9c2a04a67 [java.lang.String] $\n 16fe7483905cce7a85670e43e4678877 [java.lang.Boolean] true\n 80a8708c1f85f9e53796b84bd83471d3 [java.util.HashMap$EntrySet] [task.cpus=2]\n f46c56757169dad5c65708a8f892f414 [sun.nio.fs.UnixPath] /home/abhinav/rnaseq-nf/bin/fastqc.sh\n[...truncated…]\n\n```\n\nWhen we isolate and compare the log entries for `FASTQC` between `fresh_run.log` and `resumed_run.log`, we see the following diff:\n\n```diff\n--- ./fresh_run.fastqc.log\n+++ ./resumed_run.fastqc.log\n@@ -1,8 +1,8 @@\n-INFO nextflow.processor.TaskProcessor - [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 94be8c84f4bed57252985e6813bec401; mode: STANDARD; entries:\n+INFO nextflow.processor.TaskProcessor - [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 54aa712db7c8248e7f31d5fb6535ff9d; mode: STANDARD; entries:\n 1a0e496fef579b22998f099981b494f9 [java.util.UUID] a11bf24f-638a-42d6-8b50-48d3be637d54\n 195c7faea83c75f2340eb710d8486d2a [java.lang.String] RNASEQ:FASTQC\n- 43e5a23fc27129f92a6c010823d8909b [java.lang.String] \"\"\"\n- fastqc.sh \"$sample_id\" \"$reads\"\n+ 2bea0eee5e384bd6082a173772e939eb [java.lang.String] \"\"\"\n+ fastqc.sh \"$sample_id\" \"$reads\" -t ${task.cpus}\n\n```\n\nObservations from the diff:\n\n1. We can see that the content of the script has changed, highlighting the new `$task.cpus` part of the command.\n2. 
There is a new entry in the `resumed_run.log` showing that the content of the process level directive `cpus` has been added.\n\nIn other words, the diff from log files is confirming our edits.\n\n### Understanding why `MULTIQC` was re-run\n\nNow, we apply the same analysis technique for the `MULTIQC` process in both log files:\n\n```diff\n--- ./fresh_run.multiqc.log\n+++ ./resumed_run.multiqc.log\n@@ -1,4 +1,4 @@\n-INFO nextflow.processor.TaskProcessor - [MULTIQC] cache hash: dccabcd012ad86e1a2668e866c120534; mode: STANDARD; entries:\n+INFO nextflow.processor.TaskProcessor - [MULTIQC] cache hash: c5a63560338596282682cc04ff97e436; mode: STANDARD; entries:\n 1a0e496fef579b22998f099981b494f9 [java.util.UUID] a11bf24f-638a-42d6-8b50-48d3be637d54\n cd584abbdbee0d2cfc4361ee2a3fd44b [java.lang.String] MULTIQC\n 56bfc44d4ed5c943f30ec98b22904eec [java.lang.String] \"\"\"\n@@ -9,8 +9,9 @@\n\n 8e58c0cec3bde124d5d932c7f1579395 [java.lang.String] quay.io/nextflow/rnaseq-nf:v1.1\n 14ca61f10a641915b8c71066de5892e1 [java.lang.String] *\n- cd0e6f1a382f11f25d5cef85bd87c3f4 [nextflow.util.ArrayBag] [FileHolder(sourceObj:/home/abhinav/rnaseq-nf/work/03/23372f156e80deb4d7183c5f509274/ggal_gut, storePath:/home/abhinav/rnaseq-nf/work/03/23372f156e80deb4d7183c5f509274/ggal_gut, stageName:ggal_gut), FileHolder(sourceObj:/home/abhinav/rnaseq-nf/work/25/433b23af9e98294becade95db6bd76/fastqc_ggal_gut_logs, storePath:/home/abhinav/rnaseq-nf/work/25/433b23af9e98294becade95db6bd76/fastqc_ggal_gut_logs, stageName:fastqc_ggal_gut_logs)]\n+ 18966b473f7bdb07f4f7f4c8445be1f5 [nextflow.util.ArrayBag] [FileHolder(sourceObj:/home/abhinav/rnaseq-nf/work/03/23372f156e80deb4d7183c5f509274/ggal_gut, storePath:/home/abhinav/rnaseq-nf/work/03/23372f156e80deb4d7183c5f509274/ggal_gut, stageName:ggal_gut), FileHolder(sourceObj:/home/abhinav/rnaseq-nf/work/55/15b60995682daf79ecb64bcbb8e44e/fastqc_ggal_gut_logs, storePath:/home/abhinav/rnaseq-nf/work/55/15b60995682daf79ecb64bcbb8e44e/fastqc_ggal_gut_logs, stageName:fastqc_ggal_gut_logs)]\n d271b8ef022bbb0126423bf5796c9440 [java.lang.String] config\n 5a07367a32cd1696f0f0054ee1f60e8b [nextflow.util.ArrayBag] [FileHolder(sourceObj:/home/abhinav/rnaseq-nf/multiqc, storePath:/home/abhinav/rnaseq-nf/multiqc, stageName:multiqc)]\n 4f9d4b0d22865056c37fb6d9c2a04a67 [java.lang.String] $\n 16fe7483905cce7a85670e43e4678877 [java.lang.Boolean] true\n```\n\nHere, the highlighted diffs show the directory of the input files, changing as a result of `FASTQC` being re-run; as a result `MULTIQC` has a new hash and has to be re-run as well.\n\n## Conclusion\n\nDebugging the caching behavior of a pipeline can be tricky, however a systematic analysis can help to uncover what is causing a particular process to be re-run.\n\nWhen analyzing large datasets, it may be worth using the `-dump-hashes` option by default for all pipeline runs, avoiding needing to run the pipeline again to obtain the hashes in the log file in case of problems.\n\nWhile this process works, it is not trivial. We would love to see some community-driven tooling for a better cache-debugging experience for Nextflow, perhaps an `nf-cache` plugin? Stay tuned for an upcoming blog post describing how to extend and add new functionality to Nextflow using plugins.\n", + "content": "The ability to resume an analysis (i.e. caching) is one of the core strengths of Nextflow. When developing pipelines, this allows us to avoid re-running unchanged processes by simply appending `-resume` to the `nextflow run` command. 
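For example, a typical development loop looks like the minimal sketch below (assuming the pipeline entrypoint is `./main.nf`, as in the example runs later in this post):\n\n```console\n$ nextflow run ./main.nf -profile docker\n# ...edit a process definition...\n$ nextflow run ./main.nf -profile docker -resume\n```\n\n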
Sometimes, tasks may be repeated for reasons that are unclear. In these cases it can help to look into the caching mechanism, to understand why a specific process was re-run.\n\nWe have previously written about Nextflow's [resume functionality](https://www.nextflow.io/blog/2019/demystifying-nextflow-resume.html) as well as some [troubleshooting strategies](https://www.nextflow.io/blog/2019/troubleshooting-nextflow-resume.html) to gain more insights on the caching behavior.\n\nIn this post, we will take a more hands-on approach and highlight some strategies which we can use to understand what is causing a particular process (or processes) to re-run, instead of using the cache from previous runs of the pipeline. To demonstrate the process, we will introduce a minor change into one of the process definitions in the [nextflow-io/rnaseq-nf](https://github.com/nextflow-io/rnaseq-nf) pipeline and investigate how it affects the overall caching behavior when compared to the initial execution of the pipeline.\n\n### Local setup for the test\n\nFirst, we clone the [nextflow-io/rnaseq-nf](https://github.com/nextflow-io/rnaseq-nf) pipeline locally:\n\n```bash\n$ git clone https://github.com/nextflow-io/rnaseq-nf\n$ cd rnaseq-nf\n```\n\nIn the examples below, we have used Nextflow `v22.10.0`, Docker `v20.10.8` and `Java v17 LTS` on MacOS.\n\n### Pipeline flowchart\n\nThe flowchart below can help in understanding the design of the pipeline and the dependencies between the various tasks.\n\n![rnaseq-nf](/img/rnaseq-nf.base.png)\n\n### Logs from initial (fresh) run\n\nAs a reminder, Nextflow generates a unique task hash, e.g. 22/7548fa… for each task in a workflow. The hash takes into account the complete file path, the last modified timestamp, container ID, content of script directive among other factors. If any of these change, the task will be re-executed. Nextflow maintains a list of task hashes for caching and traceability purposes. You can learn more about task hashes in the article [Troubleshooting Nextflow resume](https://nextflow.io/blog/2019/troubleshooting-nextflow-resume.html).\n\nTo have something to compare to, we first need to generate the initial hashes for the unchanged processes in the pipeline. We save these in a file called `fresh_run.log` and use them later on as \"ground-truth\" for the analysis. 
In order to save the process hashes we use the `-dump-hashes` flag, which prints them to the log.\n\n**TIP:** We rely upon the [`-log` option](https://www.nextflow.io/docs/latest/cli.html#execution-logs) in the `nextflow` command line interface to be able to supply a custom log file name instead of the default `.nextflow.log`.\n\n```console\n$ nextflow -log fresh_run.log run ./main.nf -profile docker -dump-hashes\n\n[...truncated…]\nexecutor > local (4)\n[d5/57c2bb] process > RNASEQ:INDEX (ggal_1_48850000_49020000) [100%] 1 of 1 ✔\n[25/433b23] process > RNASEQ:FASTQC (FASTQC on ggal_gut) [100%] 1 of 1 ✔\n[03/23372f] process > RNASEQ:QUANT (ggal_gut) [100%] 1 of 1 ✔\n[38/712d21] process > MULTIQC [100%] 1 of 1 ✔\n[...truncated…]\n```\n\n### Edit the `FastQC` process\n\nAfter the initial run of the pipeline, we introduce a change in the `fastqc.nf` module, hard coding the number of threads which should be used to run the `FASTQC` process via Nextflow's [`cpus` directive](https://www.nextflow.io/docs/latest/process.html#cpus).\n\nHere's the output of `git diff` on the contents of `modules/fastqc/main.nf` file:\n\n```diff\n--- a/modules/fastqc/main.nf\n+++ b/modules/fastqc/main.nf\n@@ -4,6 +4,7 @@ process FASTQC {\n tag \"FASTQC on $sample_id\"\n conda 'bioconda::fastqc=0.11.9'\n publishDir params.outdir, mode:'copy'\n+ cpus 2\n\n input:\n tuple val(sample_id), path(reads)\n@@ -13,6 +14,6 @@ process FASTQC {\n\n script:\n \"\"\"\n- fastqc.sh \"$sample_id\" \"$reads\"\n+ fastqc.sh \"$sample_id\" \"$reads\" -t ${task.cpus}\n \"\"\"\n }\n```\n\n### Logs from the follow up run\n\nNext, we run the pipeline again with the `-resume` option, which instructs Nextflow to rely upon the cached results from the previous run and only run the parts of the pipeline which have changed. As before, we instruct Nextflow to dump the process hashes, this time in a file called `resumed_run.log`.\n\n```console\n$ nextflow -log resumed_run.log run ./main.nf -profile docker -dump-hashes -resume\n\n[...truncated…]\nexecutor > local\n[d5/57c2bb] process > RNASEQ:INDEX (ggal_1_48850000_49020000) [100%] 1 of 1, cached: 1 ✔\n[55/15b609] process > RNASEQ:FASTQC (FASTQC on ggal_gut) [100%] 1 of 1 ✔\n[03/23372f] process > RNASEQ:QUANT (ggal_gut) [100%] 1 of 1, cached: 1 ✔\n[f3/f1ccb4] process > MULTIQC [100%] 1 of 1 ✔\n[...truncated…]\n```\n\n## Analysis of cache hashes\n\nFrom the summary of the command line output above, we can see that the `RNASEQ:FASTQC (FASTQC on ggal_gut)` and `MULTIQC` processes were re-run while the others were cached. To understand why, we can examine the hashes generated by the processes from the logs of the `fresh_run` and `resumed_run`.\n\nFor the analysis, we need to keep in mind that:\n\n1. The time-stamps are expected to differ and can be safely ignored to narrow down the `grep` pattern to the Nextflow `TaskProcessor` class.\n\n2. The _order_ of the log entries isn't fixed, due to the nature of the underlying parallel computation dataflow model used by Nextflow. For example, in our example below, `FASTQC` ran first in `fresh_run.log` but wasn’t the first logged process in `resumed_run.log`.\n\n### Find the process level hashes\n\nWe can use standard Unix tools like `grep`, `cut` and `sort` to address these points and filter out the relevant information:\n\n1. Use `grep` to isolate log entries with `cache hash` string\n2. Remove the prefix time-stamps using `cut -d ‘-’ -f 3`\n3. Remove the caching mode related information using `cut -d ';' -f 1`\n4. 
Sort the lines based on process names using `sort` to have a standard order before comparison\n5. Use `tee` to print the resultant strings to the terminal and simultaneously save to a file\n\nNow, let’s apply these transformations to the `fresh_run.log` as well as `resumed_run.log` entries.\n\n- `fresh_run.log`\n\n```console\n$ cat ./fresh_run.log | grep 'INFO.*TaskProcessor.*cache hash' | cut -d '-' -f 3 | cut -d ';' -f 1 | sort | tee ./fresh_run.tasks.log\n\n [MULTIQC] cache hash: 167d7b39f7efdfc49b6ff773f081daef\n [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 47e8c58d92dbaafba3c2ccc4f89f53a4\n [RNASEQ:INDEX (ggal_1_48850000_49020000)] cache hash: ac8be293e1d57f3616cdd0adce34af6f\n [RNASEQ:QUANT (ggal_gut)] cache hash: d8b88e3979ff9fe4bf64b4e1bfaf4038\n```\n\n- `resumed_run.log`\n\n```console\n$ cat ./resumed_run.log | grep 'INFO.*TaskProcessor.*cache hash' | cut -d '-' -f 3 | cut -d ';' -f 1 | sort | tee ./resumed_run.tasks.log\n\n [MULTIQC] cache hash: d3f200c56cf00b223282f12f06ae8586\n [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 92478eeb3b0ff210ebe5a4f3d99aed2d\n [RNASEQ:INDEX (ggal_1_48850000_49020000)] cache hash: ac8be293e1d57f3616cdd0adce34af6f\n [RNASEQ:QUANT (ggal_gut)] cache hash: d8b88e3979ff9fe4bf64b4e1bfaf4038\n```\n\n### Inference from process top-level hashes\n\nComputing a hash is a multi-step process and various factors contribute to it such as the inputs of the process, platform, time-stamps of the input files and more ( as explained in [Demystifying Nextflow resume](https://www.nextflow.io/blog/2019/demystifying-nextflow-resume.html) blog post) . The change we made in the task level CPUs directive and script section of the `FASTQC` process triggered a re-computation of hashes:\n\n```diff\n--- ./fresh_run.tasks.log\n+++ ./resumed_run.tasks.log\n@@ -1,4 +1,4 @@\n- [MULTIQC] cache hash: dccabcd012ad86e1a2668e866c120534\n- [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 94be8c84f4bed57252985e6813bec401\n+ [MULTIQC] cache hash: c5a63560338596282682cc04ff97e436\n+ [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 54aa712db7c8248e7f31d5fb6535ff9d\n [RNASEQ:INDEX (ggal_1_48850000_49020000)] cache hash: 356aaa7524fb071f258480ba07c67b3c\n [RNASEQ:QUANT (ggal_gut)] cache hash: 169ced0fc4b047eaf91cd31620b22540\n\n```\n\nEven though we only introduced changes in `FASTQC`, the `MULTIQC` process was re-run since it relies upon the output of the `FASTQC` process. 
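As an aside, a unified diff like the one shown above can be generated directly from the two files written by `tee` in the previous step, using nothing more than the standard `diff` tool:\n\n```console\n$ diff -u ./fresh_run.tasks.log ./resumed_run.tasks.log\n```\n\n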
Any task that has its cache hash invalidated triggers a rerun of all downstream steps:\n\n![rnaseq-nf after modification](/img/rnaseq-nf.modified.png)\n\n### Understanding why `FASTQC` was re-run\n\nWe can see the full list of `FASTQC` process hashes within the `fresh_run.log` file\n\n```console\n\n[...truncated…]\nNov-03 20:19:13.827 [Actor Thread 6] INFO nextflow.processor.TaskProcessor - [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 54aa712db7c8248e7f31d5fb6535ff9d; mode: STANDARD; entries:\n 1a0e496fef579b22998f099981b494f9 [java.util.UUID] a11bf24f-638a-42d6-8b50-48d3be637d54\n 195c7faea83c75f2340eb710d8486d2a [java.lang.String] RNASEQ:FASTQC\n 2bea0eee5e384bd6082a173772e939eb [java.lang.String] \"\"\"\n fastqc.sh \"$sample_id\" \"$reads\" -t ${task.cpus}\n \"\"\"\n\n 8e58c0cec3bde124d5d932c7f1579395 [java.lang.String] quay.io/nextflow/rnaseq-nf:v1.1\n 7ec7cbd71ff757f5fcdbaa760c9ce6de [java.lang.String] sample_id\n 16b4905b1545252eb7cbfe7b2a20d03d [java.lang.String] ggal_gut\n 553096c532e666fb42214fdf0520fe4a [java.lang.String] reads\n 6a5d50e32fdb3261e3700a30ad257ff9 [nextflow.util.ArrayBag] [FileHolder(sourceObj:/home/abhinav/rnaseq-nf/data/ggal/ggal_gut_1.fq, storePath:/home/abhinav/rnaseq-nf/data/ggal/ggal_gut_1.fq, stageName:ggal_gut_1.fq), FileHolder(sourceObj:/home/abhinav/rnaseq-nf/data/ggal/ggal_gut_2.fq, storePath:/home/abhinav/rnaseq-nf/data/ggal/ggal_gut_2.fq, stageName:ggal_gut_2.fq)]\n 4f9d4b0d22865056c37fb6d9c2a04a67 [java.lang.String] $\n 16fe7483905cce7a85670e43e4678877 [java.lang.Boolean] true\n 80a8708c1f85f9e53796b84bd83471d3 [java.util.HashMap$EntrySet] [task.cpus=2]\n f46c56757169dad5c65708a8f892f414 [sun.nio.fs.UnixPath] /home/abhinav/rnaseq-nf/bin/fastqc.sh\n[...truncated…]\n\n```\n\nWhen we isolate and compare the log entries for `FASTQC` between `fresh_run.log` and `resumed_run.log`, we see the following diff:\n\n```diff\n--- ./fresh_run.fastqc.log\n+++ ./resumed_run.fastqc.log\n@@ -1,8 +1,8 @@\n-INFO nextflow.processor.TaskProcessor - [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 94be8c84f4bed57252985e6813bec401; mode: STANDARD; entries:\n+INFO nextflow.processor.TaskProcessor - [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 54aa712db7c8248e7f31d5fb6535ff9d; mode: STANDARD; entries:\n 1a0e496fef579b22998f099981b494f9 [java.util.UUID] a11bf24f-638a-42d6-8b50-48d3be637d54\n 195c7faea83c75f2340eb710d8486d2a [java.lang.String] RNASEQ:FASTQC\n- 43e5a23fc27129f92a6c010823d8909b [java.lang.String] \"\"\"\n- fastqc.sh \"$sample_id\" \"$reads\"\n+ 2bea0eee5e384bd6082a173772e939eb [java.lang.String] \"\"\"\n+ fastqc.sh \"$sample_id\" \"$reads\" -t ${task.cpus}\n\n```\n\nObservations from the diff:\n\n1. We can see that the content of the script has changed, highlighting the new `$task.cpus` part of the command.\n2. 
There is a new entry in the `resumed_run.log` showing that the content of the process level directive `cpus` has been added.\n\nIn other words, the diff from log files is confirming our edits.\n\n### Understanding why `MULTIQC` was re-run\n\nNow, we apply the same analysis technique for the `MULTIQC` process in both log files:\n\n```diff\n--- ./fresh_run.multiqc.log\n+++ ./resumed_run.multiqc.log\n@@ -1,4 +1,4 @@\n-INFO nextflow.processor.TaskProcessor - [MULTIQC] cache hash: dccabcd012ad86e1a2668e866c120534; mode: STANDARD; entries:\n+INFO nextflow.processor.TaskProcessor - [MULTIQC] cache hash: c5a63560338596282682cc04ff97e436; mode: STANDARD; entries:\n 1a0e496fef579b22998f099981b494f9 [java.util.UUID] a11bf24f-638a-42d6-8b50-48d3be637d54\n cd584abbdbee0d2cfc4361ee2a3fd44b [java.lang.String] MULTIQC\n 56bfc44d4ed5c943f30ec98b22904eec [java.lang.String] \"\"\"\n@@ -9,8 +9,9 @@\n\n 8e58c0cec3bde124d5d932c7f1579395 [java.lang.String] quay.io/nextflow/rnaseq-nf:v1.1\n 14ca61f10a641915b8c71066de5892e1 [java.lang.String] *\n- cd0e6f1a382f11f25d5cef85bd87c3f4 [nextflow.util.ArrayBag] [FileHolder(sourceObj:/home/abhinav/rnaseq-nf/work/03/23372f156e80deb4d7183c5f509274/ggal_gut, storePath:/home/abhinav/rnaseq-nf/work/03/23372f156e80deb4d7183c5f509274/ggal_gut, stageName:ggal_gut), FileHolder(sourceObj:/home/abhinav/rnaseq-nf/work/25/433b23af9e98294becade95db6bd76/fastqc_ggal_gut_logs, storePath:/home/abhinav/rnaseq-nf/work/25/433b23af9e98294becade95db6bd76/fastqc_ggal_gut_logs, stageName:fastqc_ggal_gut_logs)]\n+ 18966b473f7bdb07f4f7f4c8445be1f5 [nextflow.util.ArrayBag] [FileHolder(sourceObj:/home/abhinav/rnaseq-nf/work/03/23372f156e80deb4d7183c5f509274/ggal_gut, storePath:/home/abhinav/rnaseq-nf/work/03/23372f156e80deb4d7183c5f509274/ggal_gut, stageName:ggal_gut), FileHolder(sourceObj:/home/abhinav/rnaseq-nf/work/55/15b60995682daf79ecb64bcbb8e44e/fastqc_ggal_gut_logs, storePath:/home/abhinav/rnaseq-nf/work/55/15b60995682daf79ecb64bcbb8e44e/fastqc_ggal_gut_logs, stageName:fastqc_ggal_gut_logs)]\n d271b8ef022bbb0126423bf5796c9440 [java.lang.String] config\n 5a07367a32cd1696f0f0054ee1f60e8b [nextflow.util.ArrayBag] [FileHolder(sourceObj:/home/abhinav/rnaseq-nf/multiqc, storePath:/home/abhinav/rnaseq-nf/multiqc, stageName:multiqc)]\n 4f9d4b0d22865056c37fb6d9c2a04a67 [java.lang.String] $\n 16fe7483905cce7a85670e43e4678877 [java.lang.Boolean] true\n```\n\nHere, the highlighted diffs show the directory of the input files, changing as a result of `FASTQC` being re-run; as a result `MULTIQC` has a new hash and has to be re-run as well.\n\n## Conclusion\n\nDebugging the caching behavior of a pipeline can be tricky, however a systematic analysis can help to uncover what is causing a particular process to be re-run.\n\nWhen analyzing large datasets, it may be worth using the `-dump-hashes` option by default for all pipeline runs, avoiding needing to run the pipeline again to obtain the hashes in the log file in case of problems.\n\nWhile this process works, it is not trivial. We would love to see some community-driven tooling for a better cache-debugging experience for Nextflow, perhaps an `nf-cache` plugin? Stay tuned for an upcoming blog post describing how to extend and add new functionality to Nextflow using plugins.", "images": [], "author": "Abhinav Sharma", "tags": "nextflow,cache" @@ -410,7 +410,7 @@ "slug": "2022/czi-mentorship-round-1", "title": "Nextflow and nf-core mentorship, Round 1", "date": "2022-09-18T00:00:00.000Z", - "content": "\n## Introduction\n\n
\n \"Word\n

Word cloud of scientific interest keywords, averaged across all applications.

\n
\n\nOur recent [The State of the Workflow 2022: Community Survey Results](https://seqera.io/blog/state-of-the-workflow-2022-results/) showed that Nextflow and nf-core have a strong global community with a high level of engagement in several countries. As the community continues to grow, we aim to prioritize inclusivity for everyone through active outreach to groups with low representation.\n\nThanks to funding from our Chan Zuckerberg Initiative Diversity and Inclusion grant we established an international Nextflow and nf-core mentoring program with the aim of empowering those from underrepresented groups. With the first round of the mentorship now complete, we look back at the success of the program so far.\n\nFrom almost 200 applications, five pairs of mentors and mentees were selected for the first round of the program. Over the following four months they met weekly to work on Nextflow based projects. We attempted to pair mentors and mentees based on their time zones and scientific interests. Project tasks were left up to the individuals and so tailored to the mentee's scientific interests and schedules.\n\nPeople worked on things ranging from setting up Nextflow and nf-core on their institutional clusters to developing and implementing Nextflow and nf-core pipelines for next-generation sequencing data. Impressively, after starting the program knowing very little about Nextflow and nf-core, mentees finished the program being able to confidently develop and implement scalable and reproducible scientific workflows.\n\n![Map of mentor / mentee pairs](/img/mentorships-round1-map.png)
\n_The mentorship program was worldwide._\n\n## Ndeye Marième Top (mentee) & John Juma (mentor)\n\nFor the mentorship, Marième wanted to set up Nextflow and nf-core on the servers at the Institut Pasteur de Dakar in Senegal and learn how to develop / contribute to a pipeline. Her mentor was John Juma, from the ILRI/SANBI in Kenya.\n\nTogether, Marème overcame issues with containers and server privileges and developed her local config, learning about how to troubleshoot and where to find help along the way. By the end of the mentorship she was able to set up the [nf-core/viralrecon](https://nf-co.re/viralrecon) pipeline for the genomic surveillance analysis of SARS-Cov2 sequencing data from Senegal as well as 17 other countries in West Africa, ready for submission to [GISAID](https://gisaid.org/). She also got up to speed with the [nf-core/mag](https://nf-co.re/mag) pipeline for metagenomic analysis.\n\n
\n \"Having someone experienced who can guide you in my learning process. My mentor really helped me understand and focus on the practical aspects since my main concern was having the pipelines correctly running in my institution.\" - Marième Top (mentee)\n
\n
\n \"The program was awesome. I had a chance to impart nextflow principles to someone I have never met before. Fully virtual, the program instilled some sense of discipline in terms of setting and meeting objectives.\" - John Juma (mentor)\n
\n\n## Philip Ashton (mentee) & Robert Petit (mentor)\n\nPhilip wanted to move up the Nextflow learning curve and set up nf-core workflows at Kamuzu University of Health Sciences in Malawi. His mentor was Robert Petit from the Wyoming Public Health Laboratory in the USA. Robert has developed the [Bactopia](https://bactopia.github.io/) pipeline for the analysis of bacterial pipeline and it was Philip’s aim to get this running for his group in Malawi.\n\nRobert helped Philip learn Nextflow, enabling him to independently deploy DSL2 pipelines and process genomes using Nextflow Tower. Philip is already using his new found skills to answer important public health questions in Malawi and is now passing his knowledge to other staff and students at his institute. Even though the mentorship program has finished, Philip and Rob will continue a collaboration and have plans to deploy pipelines that will benefit public health in the future.\n\n
\n \"I tried to learn nextflow independently some time ago, but abandoned it for the more familiar snakemake. Thanks to Robert’s mentorship I’m now over the learning curve and able to deploy nf-core pipelines and use cloud resources more efficiently via Nextflow Tower\" - Phil Ashton (mentee)\n
\n
\n \"I found being a mentor to be a rewarding experience and a great opportunity to introduce mentees into the Nextflow/nf-core community. Phil and I were able to accomplish a lot in the span of a few months, and now have many plans to collaborate in the future.\" - Robert Petit (mentor)\n
\n\n## Kalayanee Chairat (mentee) & Alison Meynert (mentor)\n\nKalayanee’s goal for the mentorship program was to set up and run Nextflow and nf-core pipelines at the local infrastructure at the King Mongkut’s University of Technology Thonburi in Thailand. Kalayanee was mentored by Alison Meynert, from the University of Edinburgh in the United Kingdom.\n\nWorking with Alison, Kalayanee learned about Nextflow and nf-core and the requirements for working with Slurm and Singularity. Together, they created a configuration profile that Kalayanee and others at her institute can use - they have plans to submit this to [nf-core/configs](https://github.com/nf-core/configs) as an institutional profile. Now she is familiar with these tools, Kalayanee is using [nf-core/sarek](https://nf-co.re/sarek) and [nf-core/rnaseq](https://nf-co.re/rnaseq) to analyze 100s of samples of her own next-generation sequencing data on her local HPC environment.\n\n
\n \"The mentorship program is a great start to learn to use and develop analysis pipelines built using Nextflow. I gained a lot of knowledge through this program. I am also very lucky to have Dr. Alison Meynert as my mentor. She is very knowledgeable, kind and willing to help in every step.\" - Kalayanee Chairat (mentee)\n
\n
\n \"It was a great experience for me to work with my mentee towards her goal. The process solidified some of my own topical knowledge and I learned new things along the way as well.\" - Alison Meynert (mentor)\n
\n\n## Edward Lukyamuzi (mentee) & Emilio Garcia-Rios (mentor)\n\nFor the mentoring program Edward’s goal was to understand the fundamental components of a Nextflow script and write a Nextflow pipeline for analyzing mosquito genomes. Edward was mentored by Emilio Garcia-Rios, from the EMBL-EBI in the United Kingdom.\n\nEdward learned the fundamental concepts of Nextflow, including channels, processes and operators. Edward works with sequencing data from the mosquito genome - with help from Emilio he wrote a Nextflow pipeline with an accompanying Dockerfile for the alignment of reads and genotyping of SNPs. Edward will continue to develop his pipeline and wants to become more involved with the Nextflow and nf-core community by attending the nf-core hackathons. Edward is also very keen to help others learn Nextflow and expressed an interest in being part of this program again as a mentor.\n\n
\n \"Learning Nextflow can be a steep curve. Having a partner to give you a little push might be what facilitates adoption of Nextflow into your daily routine.\" - Edward Lukyamuzi (mentee)\n
\n
\n \"I would like more people to discover and learn the benefits using Nextflow has. Being a mentor in this program can help me collaborate with other colleagues and be a mentor in my institute as well.\" - Emilio Garcia-Rios (mentor)\n
\n\n## Suchitra Thapa (mentee) & Maxime Borry (mentor)\n\nSuchitra started the program to learn about running Nextflow pipelines but quickly moved on to pipeline development and deployment on the cloud. Suchitra and Maxime encountered some technical challenges during the mentorship, including difficulties with internet connectivity and access to computational platforms for analysis. Despite this, with help from Maxime, Suchitra applied her newly acquired skills and made substantial progress converting the [metaphlankrona](https://github.com/suchitrathapa/metaphlankrona) pipeline for metagenomic analysis of microbial communities from Nextflow DSL1 to DSL2 syntax.\n\nSuchitra will be sharing her work and progress on the pipeline as a poster at the [Nextflow Summit 2022](https://summit.nextflow.io/speakers/suchitra-thapa/).\n\n
\n \"This mentorship was one of the best organized online learning opportunities that I have attended so far. With time flexibility and no deadline burden, you can easily fit this mentorship into your busy schedule. I would suggest everyone interested to definitely go for it.\" - Suchitra Thapa (mentee)\n
\n
\n \"This mentorship program was a very fruitful and positive experience, and the satisfaction to see someone learning and growing their bioinformatics skills is very rewarding.\" - Maxime Borry (mentor)\n
\n\n## Conclusion\n\nFeedback from the first round of the mentorship program was overwhelmingly positive. Both mentors and mentees found the experience to be a rewarding opportunity and were grateful for taking part. Everyone who participated in the program said that they would encourage others to be a part of it in the future.\n\n
\n \"This is an exciting program that can help us make use of curated pipelines to advance open science. I don't mind repeating the program!\" - John Juma (mentor)\n
\n\n![Screenshot of final zoom meetup](/img/mentorships-round1-zoom.png)\n\nAs the Nextflow and nf-core communities continue to grow, the mentorship program will have long-term benefits beyond those that are immediately measurable. Mentees from the program are already acting as positive role models and contributing new perspectives to the wider community. Additionally, some mentees are interested in being mentors in the future and will undoubtedly support others as our communities continue to grow.\n\nWe were delighted with the high quality of this year’s mentors and mentees. Stay tuned for information about the next round of the Nextflow and nf-core mentorship program. Applications for round 2 will open on October 1, 2022. See [https://nf-co.re/mentorships](https://nf-co.re/mentorships) for details.\n\n

\n Mentorship Round 2 - Details\n

\n", + "content": "## Introduction\n\n
\n \"Word\n \n\n*Word cloud of scientific interest keywords, averaged across all applications.*\n\n
\n\nOur recent [The State of the Workflow 2022: Community Survey Results](https://seqera.io/blog/state-of-the-workflow-2022-results/) showed that Nextflow and nf-core have a strong global community with a high level of engagement in several countries. As the community continues to grow, we aim to prioritize inclusivity for everyone through active outreach to groups with low representation.\n\nThanks to funding from our Chan Zuckerberg Initiative Diversity and Inclusion grant we established an international Nextflow and nf-core mentoring program with the aim of empowering those from underrepresented groups. With the first round of the mentorship now complete, we look back at the success of the program so far.\n\nFrom almost 200 applications, five pairs of mentors and mentees were selected for the first round of the program. Over the following four months they met weekly to work on Nextflow based projects. We attempted to pair mentors and mentees based on their time zones and scientific interests. Project tasks were left up to the individuals and so tailored to the mentee's scientific interests and schedules.\n\nPeople worked on things ranging from setting up Nextflow and nf-core on their institutional clusters to developing and implementing Nextflow and nf-core pipelines for next-generation sequencing data. Impressively, after starting the program knowing very little about Nextflow and nf-core, mentees finished the program being able to confidently develop and implement scalable and reproducible scientific workflows.\n\n![Map of mentor / mentee pairs](/img/mentorships-round1-map.png)
\n_The mentorship program was worldwide._\n\n## Ndeye Marième Top (mentee) & John Juma (mentor)\n\nFor the mentorship, Marième wanted to set up Nextflow and nf-core on the servers at the Institut Pasteur de Dakar in Senegal and learn how to develop / contribute to a pipeline. Her mentor was John Juma, from the ILRI/SANBI in Kenya.\n\nTogether, Marième overcame issues with containers and server privileges and developed her local config, learning about how to troubleshoot and where to find help along the way. By the end of the mentorship she was able to set up the [nf-core/viralrecon](https://nf-co.re/viralrecon) pipeline for the genomic surveillance analysis of SARS-Cov2 sequencing data from Senegal as well as 17 other countries in West Africa, ready for submission to [GISAID](https://gisaid.org/). She also got up to speed with the [nf-core/mag](https://nf-co.re/mag) pipeline for metagenomic analysis.\n\n> *\"Having someone experienced who can guide you in my learning process. My mentor really helped me understand and focus on the practical aspects since my main concern was having the pipelines correctly running in my institution.\"* - Marième Top (mentee)\n\n> *\"The program was awesome. I had a chance to impart nextflow principles to someone I have never met before. Fully virtual, the program instilled some sense of discipline in terms of setting and meeting objectives.\"* - John Juma (mentor)\n\n## Philip Ashton (mentee) & Robert Petit (mentor)\n\nPhilip wanted to move up the Nextflow learning curve and set up nf-core workflows at Kamuzu University of Health Sciences in Malawi. His mentor was Robert Petit from the Wyoming Public Health Laboratory in the USA. Robert has developed the [Bactopia](https://bactopia.github.io/) pipeline for the analysis of bacterial genomes and it was Philip’s aim to get this running for his group in Malawi.\n\nRobert helped Philip learn Nextflow, enabling him to independently deploy DSL2 pipelines and process genomes using Nextflow Tower. Philip is already using his newfound skills to answer important public health questions in Malawi and is now passing his knowledge to other staff and students at his institute. Even though the mentorship program has finished, Philip and Rob will continue a collaboration and have plans to deploy pipelines that will benefit public health in the future.\n\n> *\"I tried to learn nextflow independently some time ago, but abandoned it for the more familiar snakemake. Thanks to Robert’s mentorship I’m now over the learning curve and able to deploy nf-core pipelines and use cloud resources more efficiently via Nextflow Tower\"* - Phil Ashton (mentee)\n\n> *\"I found being a mentor to be a rewarding experience and a great opportunity to introduce mentees into the Nextflow/nf-core community. Phil and I were able to accomplish a lot in the span of a few months, and now have many plans to collaborate in the future.\"* - Robert Petit (mentor)\n\n## Kalayanee Chairat (mentee) & Alison Meynert (mentor)\n\nKalayanee’s goal for the mentorship program was to set up and run Nextflow and nf-core pipelines at the local infrastructure at the King Mongkut’s University of Technology Thonburi in Thailand. Kalayanee was mentored by Alison Meynert, from the University of Edinburgh in the United Kingdom.\n\nWorking with Alison, Kalayanee learned about Nextflow and nf-core and the requirements for working with Slurm and Singularity. 
Together, they created a configuration profile that Kalayanee and others at her institute can use - they have plans to submit this to [nf-core/configs](https://github.com/nf-core/configs) as an institutional profile. Now she is familiar with these tools, Kalayanee is using [nf-core/sarek](https://nf-co.re/sarek) and [nf-core/rnaseq](https://nf-co.re/rnaseq) to analyze 100s of samples of her own next-generation sequencing data on her local HPC environment.\n\n> *\"The mentorship program is a great start to learn to use and develop analysis pipelines built using Nextflow. I gained a lot of knowledge through this program. I am also very lucky to have Dr. Alison Meynert as my mentor. She is very knowledgeable, kind and willing to help in every step.\"* - Kalayanee Chairat (mentee)\n\n> *\"It was a great experience for me to work with my mentee towards her goal. The process solidified some of my own topical knowledge and I learned new things along the way as well.\"* - Alison Meynert (mentor)\n\n## Edward Lukyamuzi (mentee) & Emilio Garcia-Rios (mentor)\n\nFor the mentoring program Edward’s goal was to understand the fundamental components of a Nextflow script and write a Nextflow pipeline for analyzing mosquito genomes. Edward was mentored by Emilio Garcia-Rios, from the EMBL-EBI in the United Kingdom.\n\nEdward learned the fundamental concepts of Nextflow, including channels, processes and operators. Edward works with sequencing data from the mosquito genome - with help from Emilio he wrote a Nextflow pipeline with an accompanying Dockerfile for the alignment of reads and genotyping of SNPs. Edward will continue to develop his pipeline and wants to become more involved with the Nextflow and nf-core community by attending the nf-core hackathons. Edward is also very keen to help others learn Nextflow and expressed an interest in being part of this program again as a mentor.\n\n> *\"Learning Nextflow can be a steep curve. Having a partner to give you a little push might be what facilitates adoption of Nextflow into your daily routine.\"* - Edward Lukyamuzi (mentee)\n\n> *\"I would like more people to discover and learn the benefits using Nextflow has. Being a mentor in this program can help me collaborate with other colleagues and be a mentor in my institute as well.\"* - Emilio Garcia-Rios (mentor)\n\n## Suchitra Thapa (mentee) & Maxime Borry (mentor)\n\nSuchitra started the program to learn about running Nextflow pipelines but quickly moved on to pipeline development and deployment on the cloud. Suchitra and Maxime encountered some technical challenges during the mentorship, including difficulties with internet connectivity and access to computational platforms for analysis. Despite this, with help from Maxime, Suchitra applied her newly acquired skills and made substantial progress converting the [metaphlankrona](https://github.com/suchitrathapa/metaphlankrona) pipeline for metagenomic analysis of microbial communities from Nextflow DSL1 to DSL2 syntax.\n\nSuchitra will be sharing her work and progress on the pipeline as a poster at the [Nextflow Summit 2022](https://summit.nextflow.io/speakers/suchitra-thapa/).\n\n> *\"This mentorship was one of the best organized online learning opportunities that I have attended so far. With time flexibility and no deadline burden, you can easily fit this mentorship into your busy schedule. 
I would suggest everyone interested to definitely go for it.\"* - Suchitra Thapa (mentee)\n\n> *\"This mentorship program was a very fruitful and positive experience, and the satisfaction to see someone learning and growing their bioinformatics skills is very rewarding.\"* - Maxime Borry (mentor)\n\n## Conclusion\n\nFeedback from the first round of the mentorship program was overwhelmingly positive. Both mentors and mentees found the experience to be a rewarding opportunity and were grateful for taking part. Everyone who participated in the program said that they would encourage others to be a part of it in the future.\n\n> \"This is an exciting program that can help us make use of curated pipelines to advance open science. I don't mind repeating the program!\" - John Juma (mentor)\n\n![Screenshot of final zoom meetup](/img/mentorships-round1-zoom.png)\n\nAs the Nextflow and nf-core communities continue to grow, the mentorship program will have long-term benefits beyond those that are immediately measurable. Mentees from the program are already acting as positive role models and contributing new perspectives to the wider community. Additionally, some mentees are interested in being mentors in the future and will undoubtedly support others as our communities continue to grow.\n\nWe were delighted with the high quality of this year’s mentors and mentees. Stay tuned for information about the next round of the Nextflow and nf-core mentorship program. Applications for round 2 will open on October 1, 2022. See [https://nf-co.re/mentorships](https://nf-co.re/mentorships) for details.\n\n[Mentorship Round 2 - Details](https://nf-co.re/mentorships)", "images": [ "/img/mentorships-round1-wordcloud.png" ], @@ -421,7 +421,7 @@ "slug": "2022/deploy-nextflow-pipelines-with-google-cloud-batch", "title": "Deploy Nextflow Pipelines with Google Cloud Batch!", "date": "2022-07-13T00:00:00.000Z", - "content": "\nA key feature of Nextflow is the ability to abstract the implementation of data analysis pipelines so they can be deployed in a portable manner across execution platforms.\n\nAs of today, Nextflow supports a rich variety of HPC schedulers and all major cloud providers. Our goal is to support new services as they emerge to enable Nextflow users to take advantage of the latest technology and deploy pipelines on the compute environments that best fit their requirements.\n\nFor this reason, we are delighted to announce that Nextflow now supports [Google Cloud Batch](https://cloud.google.com/batch), a new fully managed batch service just announced for beta availability by Google Cloud.\n\n### A New On-Ramp to the Google Cloud\n\nGoogle Cloud Batch is a comprehensive cloud service suitable for multiple use cases, including HPC, AI/ML, and data processing. While it is similar to the Google Cloud Life Sciences API, used by many Nextflow users today, Google Cloud Batch offers a broader set of capabilities. As with Google Cloud Life Sciences, Google Cloud Batch automatically provisions resources, manages capacity, and allows batch workloads to run at scale. 
It offers several advantages, including:\n\n- The ability to re-use VMs across jobs steps to reduce overhead and boost performance.\n- Granular control over task execution, compute, and storage resources.\n- Infrastructure, application, and task-level logging.\n- Improved task parallelization, including support for multi-node MPI jobs, with support for array jobs, and subtasks.\n- Improved support for spot instances, which provides a significant cost saving when compared to regular instance.\n- Streamlined data handling and provisioning.\n\nA nice feature of Google Cloud Batch API, that fits nicely with Nextflow, is its built-in support for data ingestion from Google Cloud Storage buckets. A batch job can _mount_ a storage bucket and make it directly accessible to a container running a Nextflow task. This feature makes data ingestion and sharing resulting data sets more efficient and reliable than other solutions.\n\n### Getting started with Google Cloud Batch\n\nSupport for the Google Cloud Batch requires the latest release of Nextflow from the edge channel (version `22.07.1-edge` or later). If you don't already have it, you can install this release using these commands:\n\n```\nexport NXF_EDGE=1\ncurl get.nextflow.io | bash\n./nextflow -self-update\n```\n\nMake sure your Google account is allowed to access the Google Cloud Batch service by checking the [API & Service](https://console.cloud.google.com/apis/dashboard) dashboard.\n\nCredentials for accessing the service are picked up by Nextflow from your environment using the usual [Google Application Default Credentials](https://github.com/googleapis/google-auth-library-java#google-auth-library-oauth2-http) mechanism. That is, either via the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, or by using the following command to set up the environment:\n\n```\ngcloud auth application-default login\n```\n\nAfter authenticating yourself to Google Cloud, create a `nextflow.config` file and specify `google-batch` as the Nextflow executor. You will also need to specify the Google Cloud project where execution will occur and the Google Cloud Storage working directory for pipeline execution.\n\n```\ncat < nextflow.config\nprocess.executor = 'google-batch'\nworkDir = 'gs://YOUR-GOOGLE-BUCKET/scratch'\ngoogle.project = 'YOUR GOOGLE PROJECT ID'\nEOT\n```\n\nIn the above snippet replace `` with a Google Storage bucket of your choice where to store the pipeline output data and `` with your Google project Id where the computation will be deployed.\n\nWith this information, you are ready to start. You can verify that the integration is working by running the Nextflow “hello” pipeline as shown below:\n\n```\nnextflow run https://github.com/nextflow-io/hello\n```\n\n### Migrating Google Cloud Life Sciences pipelines to Google Cloud Batch\n\nGoogle Cloud Life Sciences users can easily migrate their pipelines to Google Cloud Batch by making just a few edits to their pipeline configuration settings. Simply replace the `google-lifesciences` executor with `google-batch`.\n\nFor each setting having the prefix `google.lifeScience.`, there is a corresponding `google.batch.` setting. 
Simply update these configuration settings to reflect the new service.\n\nThe usual process directives such as: [cpus](https://www.nextflow.io/docs/latest/process.html#cpus), [memory](https://www.nextflow.io/docs/latest/process.html#memory), [time](https://www.nextflow.io/docs/latest/process.html#time), [machineType](https://www.nextflow.io/docs/latest/process.html#machinetype) are natively supported by Google Cloud Batch, and should not be modified.\n\nFind out more details in the [Nextflow documentation](https://www.nextflow.io/docs/edge/google.html#cloud-batch).\n\n### 100% Open, Built to Scale\n\nThe Google Cloud Batch executor for Nextflow is offered as an open source contribution to the Nextflow project. The integration was developed by Google in collaboration with [Seqera Labs](https://seqera.io/). This is a validation of Google Cloud’s ongoing commitment to open source software (OSS) and a testament to the health and vibrancy of the Nextflow project. We wish to thank the entire Google Cloud Batch team, and Shamel Jacobs in particular, for their support of this effort.\n\n### Conclusion\n\nSupport for Google Cloud Batch further expands the wide range of computing platforms supported by Nextflow. It empowers Nextflow users to easily access cost-effective resources, and take full advantage of the rich capabilities of the Google Cloud. Above all, it enables researchers to easily scale and collaborate, improving their productivity, and resulting in better research outcomes.\n", + "content": "A key feature of Nextflow is the ability to abstract the implementation of data analysis pipelines so they can be deployed in a portable manner across execution platforms.\n\nAs of today, Nextflow supports a rich variety of HPC schedulers and all major cloud providers. Our goal is to support new services as they emerge to enable Nextflow users to take advantage of the latest technology and deploy pipelines on the compute environments that best fit their requirements.\n\nFor this reason, we are delighted to announce that Nextflow now supports [Google Cloud Batch](https://cloud.google.com/batch), a new fully managed batch service just announced for beta availability by Google Cloud.\n\n### A New On-Ramp to the Google Cloud\n\nGoogle Cloud Batch is a comprehensive cloud service suitable for multiple use cases, including HPC, AI/ML, and data processing. While it is similar to the Google Cloud Life Sciences API, used by many Nextflow users today, Google Cloud Batch offers a broader set of capabilities. As with Google Cloud Life Sciences, Google Cloud Batch automatically provisions resources, manages capacity, and allows batch workloads to run at scale. It offers several advantages, including:\n\n- The ability to re-use VMs across jobs steps to reduce overhead and boost performance.\n- Granular control over task execution, compute, and storage resources.\n- Infrastructure, application, and task-level logging.\n- Improved task parallelization, including support for multi-node MPI jobs, with support for array jobs, and subtasks.\n- Improved support for spot instances, which provides a significant cost saving when compared to regular instance.\n- Streamlined data handling and provisioning.\n\nA nice feature of Google Cloud Batch API, that fits nicely with Nextflow, is its built-in support for data ingestion from Google Cloud Storage buckets. A batch job can _mount_ a storage bucket and make it directly accessible to a container running a Nextflow task. 
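From a pipeline configuration point of view, this simply means pointing the pipeline work directory at a bucket, as in the minimal sketch below (`my-bucket` is a placeholder; the full configuration is shown in the next section):\n\n```\nworkDir = 'gs://my-bucket/scratch'\n```\n\n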
This feature makes data ingestion and sharing resulting data sets more efficient and reliable than other solutions.\n\n### Getting started with Google Cloud Batch\n\nSupport for Google Cloud Batch requires the latest release of Nextflow from the edge channel (version `22.07.1-edge` or later). If you don't already have it, you can install this release using these commands:\n\n```\nexport NXF_EDGE=1\ncurl get.nextflow.io | bash\n./nextflow -self-update\n```\n\nMake sure your Google account is allowed to access the Google Cloud Batch service by checking the [API & Service](https://console.cloud.google.com/apis/dashboard) dashboard.\n\nCredentials for accessing the service are picked up by Nextflow from your environment using the usual [Google Application Default Credentials](https://github.com/googleapis/google-auth-library-java#google-auth-library-oauth2-http) mechanism. That is, either via the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, or by using the following command to set up the environment:\n\n```\ngcloud auth application-default login\n```\n\nAfter authenticating yourself to Google Cloud, create a `nextflow.config` file and specify `google-batch` as the Nextflow executor. You will also need to specify the Google Cloud project where execution will occur and the Google Cloud Storage working directory for pipeline execution.\n\n```\ncat <<EOT > nextflow.config\nprocess.executor = 'google-batch'\nworkDir = 'gs://YOUR-GOOGLE-BUCKET/scratch'\ngoogle.project = 'YOUR GOOGLE PROJECT ID'\nEOT\n```\n\nIn the above snippet replace `YOUR-GOOGLE-BUCKET` with a Google Storage bucket of your choice in which to store the pipeline output data and `YOUR GOOGLE PROJECT ID` with the ID of the Google project where the computation will be deployed.\n\nWith this information, you are ready to start. You can verify that the integration is working by running the Nextflow “hello” pipeline as shown below:\n\n```\nnextflow run https://github.com/nextflow-io/hello\n```\n\n### Migrating Google Cloud Life Sciences pipelines to Google Cloud Batch\n\nGoogle Cloud Life Sciences users can easily migrate their pipelines to Google Cloud Batch by making just a few edits to their pipeline configuration settings. Simply replace the `google-lifesciences` executor with `google-batch`.\n\nFor each setting having the prefix `google.lifeScience.`, there is a corresponding `google.batch.` setting. Simply update these configuration settings to reflect the new service.\n\nThe usual process directives such as: [cpus](https://www.nextflow.io/docs/latest/process.html#cpus), [memory](https://www.nextflow.io/docs/latest/process.html#memory), [time](https://www.nextflow.io/docs/latest/process.html#time), [machineType](https://www.nextflow.io/docs/latest/process.html#machinetype) are natively supported by Google Cloud Batch, and should not be modified.\n\nFind out more details in the [Nextflow documentation](https://www.nextflow.io/docs/edge/google.html#cloud-batch).\n\n### 100% Open, Built to Scale\n\nThe Google Cloud Batch executor for Nextflow is offered as an open source contribution to the Nextflow project. The integration was developed by Google in collaboration with [Seqera Labs](https://seqera.io/). This is a validation of Google Cloud’s ongoing commitment to open source software (OSS) and a testament to the health and vibrancy of the Nextflow project. 
We wish to thank the entire Google Cloud Batch team, and Shamel Jacobs in particular, for their support of this effort.\n\n### Conclusion\n\nSupport for Google Cloud Batch further expands the wide range of computing platforms supported by Nextflow. It empowers Nextflow users to easily access cost-effective resources, and take full advantage of the rich capabilities of the Google Cloud. Above all, it enables researchers to easily scale and collaborate, improving their productivity, and resulting in better research outcomes.\n", "images": [], "author": "Paolo Di Tommaso", "tags": "nextflow,google,cloud" @@ -430,7 +430,7 @@ "slug": "2022/evolution-of-nextflow-runtime", "title": "Evolution of the Nextflow runtime", "date": "2022-03-24T00:00:00.000Z", - "content": "\nSoftware development is a constantly evolving process that requires continuous adaptation to keep pace with new technologies, user needs, and trends. Likewise, changes are needed in order to introduce new capabilities and guarantee a sustainable development process.\n\nNextflow is no exception. This post will summarise the major changes in the evolution of the framework over the next 12 to 18 months.\n\n### Java baseline version\n\nNextflow runs on top of Java (or, more precisely, the Java virtual machine). So far, Java 8 has been the minimal version required to run Nextflow. However, this version was released 8 years ago and is going to reach its end-of-life status at the end of [this month](https://endoflife.date/java). For this reason, as of version 22.01.x-edge and the upcoming stable release 22.04.0, Nextflow will require Java version 11 or later for its execution. This also allows the introduction of new capabilities provided by the modern Java runtime.\n\nTip: If you are confused about how to install or upgrade Java on your computer, consider using [Sdkman](https://sdkman.io/). It’s a one-liner install tool that allows easy management of Java versions.\n\n### DSL2 as default syntax\n\nNextflow DSL2 has been introduced nearly [2 years ago](https://www.nextflow.io/blog/2020/dsl2-is-here.html) (how time flies!) and definitely represented a major milestone for the project. 
Established pipeline collections such as those in [nf-core](https://nf-co.re/pipelines) have migrated their pipelines to DSL2 syntax.\n\nThis is a confirmation that the DSL2 syntax represents a natural evolution for the project and is not considered to be just an experimental or alternative syntax.\n\nFor this reason, as for Nextflow version 22.03.0-edge and the upcoming 22.04.0 stable release, DSL2 syntax is going to be the **default** syntax version used by Nextflow, if not otherwise specified.\n\nIn practical terms, this means it will no longer be necessary to add the declaration `nextflow.enable.dsl = 2` at the top of your script or use the command line option `-dsl2 ` to enable the use of this syntax.\n\nIf you still want to continue to use DSL1 for your pipeline scripts, you will need to add the declaration `nextflow.enable.dsl = 1` at the top of your pipeline script or use the command line option `-dsl1`.\n\nTo make this transition as smooth as possible, we have also added the possibility to declare the DSL version in the Nextflow configuration file, using the same syntax shown above.\n\nFinally, if you wish to keep the current DSL behaviour and not make any changes in your pipeline scripts, the following variable can be defined in your system environment:\n\n```\nexport NXF_DEFAULT_DSL=1\n```\n\n### DSL1 end-of-life phase\n\nMaintaining two separate DSL implementations in the same programming environment is not sustainable and, above all, does not make much sense. For this reason, along with making DSL2 the default Nextflow syntax, DSL1 will enter into a 12-month end-of-life phase, at the end of which it will be removed. Therefore version 22.04.x and 22.10.x will be the last stable versions providing the ability to run DSL1 scripts.\n\nThis is required to keep evolving the framework and to create a more solid implementation of Nextflow grammar. Maintaining compatibility with the legacy syntax implementation and data structures is a challenging task that prevents the evolution of the new syntax.\n\nBear in mind, this does **not** mean it will not be possible to use DSL1 starting from 2023. All existing Nextflow runtimes will continue to be available, and it will be possible to for any legacy pipeline to run using the required version available from the GitHub [releases page](https://github.com/nextflow-io/nextflow/releases), or by specifying the version using the NXF_VER variable, e.g.\n\n```\nNXF_VER: 21.10.6 nextflow run \n```\n\n### New configuration format\n\nThe configuration file is a key component of the Nextflow framework since it allows workflow developers to decouple the pipeline logic from the execution parameters and infrastructure deployment settings.\n\nThe current Nextflow configuration file mechanism is extremely powerful, but it also has some serious drawbacks due to its _dynamic_ nature that makes it very hard to keep stable and maintainable over time.\n\nFor this reason, we are planning to re-engineer the current configuration component and replace it with a better configuration component with two major goals: 1) continue to provide a rich and human-readable configuration system (so, no YAML or JSON), 2) have a well-defined syntax with a solid foundation that guarantees predictable configurations, simpler troubleshooting and more sustainable maintenance.\n\nCurrently, the most likely options are [Hashicorp HCL](https://github.com/hashicorp/hcl) (as used by Terraform and other Hashicorp tools) and [Lightbend HOCON](https://github.com/lightbend/config). 
You can read more about this feature at [this link](https://github.com/nextflow-io/nextflow/issues/2723).\n\n### Ignite executor deprecation\n\nThe executor for [Apache Ignite](https://www.nextflow.io/docs/latest/ignite.html) was an early attempt to provide Nextflow with a self-contained, distributed cluster for the deployment of pipelines into HPC environments. However, it had very little adoption over the years, which was not balanced by the increasing complexity of its maintenance.\n\nFor this reason, it was decided to deprecate it and remove it from the default Nextflow distribution. The module is still available in the form of a separate project plugin and available at [this link](https://github.com/nextflow-io/nf-ignite), however, it will not be actively maintained.\n\n### Conclusion\n\nThis post is focused on the most fundamental changes we are planning to make in the following months.\n\nWith the adoption of Java 11, the full migration of DSL1 to DSL2 and the re-engineering of the configuration system, our purpose is to consolidate the Nextflow technology and lay the foundation for all the new exciting developments and features on which we are working on. Stay tuned for future blogs about each of them in upcoming posts.\n\nIf you want to learn more about the upcoming changes reach us out on [Slack at this link](https://app.slack.com/client/T03L6DM9G).\n", + "content": "Software development is a constantly evolving process that requires continuous adaptation to keep pace with new technologies, user needs, and trends. Likewise, changes are needed in order to introduce new capabilities and guarantee a sustainable development process.\n\nNextflow is no exception. This post will summarise the major changes in the evolution of the framework over the next 12 to 18 months.\n\n### Java baseline version\n\nNextflow runs on top of Java (or, more precisely, the Java virtual machine). So far, Java 8 has been the minimal version required to run Nextflow. However, this version was released 8 years ago and is going to reach its end-of-life status at the end of [this month](https://endoflife.date/java). For this reason, as of version 22.01.x-edge and the upcoming stable release 22.04.0, Nextflow will require Java version 11 or later for its execution. This also allows the introduction of new capabilities provided by the modern Java runtime.\n\nTip: If you are confused about how to install or upgrade Java on your computer, consider using [Sdkman](https://sdkman.io/). It’s a one-liner install tool that allows easy management of Java versions.\n\n### DSL2 as default syntax\n\nNextflow DSL2 has been introduced nearly [2 years ago](https://www.nextflow.io/blog/2020/dsl2-is-here.html) (how time flies!) and definitely represented a major milestone for the project. 
Established pipeline collections such as those in [nf-core](https://nf-co.re/pipelines) have migrated their pipelines to DSL2 syntax.\n\nThis is a confirmation that the DSL2 syntax represents a natural evolution for the project and is not considered to be just an experimental or alternative syntax.\n\nFor this reason, as of Nextflow version 22.03.0-edge and the upcoming 22.04.0 stable release, DSL2 syntax is going to be the **default** syntax version used by Nextflow, if not otherwise specified.\n\nIn practical terms, this means it will no longer be necessary to add the declaration `nextflow.enable.dsl = 2` at the top of your script or use the command line option `-dsl2` to enable the use of this syntax.\n\nIf you want to continue using DSL1 for your pipeline scripts, you will need to add the declaration `nextflow.enable.dsl = 1` at the top of your pipeline script or use the command line option `-dsl1`.\n\nTo make this transition as smooth as possible, we have also added the possibility to declare the DSL version in the Nextflow configuration file, using the same syntax shown above.\n\nFinally, if you wish to keep the current DSL behaviour and not make any changes in your pipeline scripts, the following variable can be defined in your system environment:\n\n```\nexport NXF_DEFAULT_DSL=1\n```\n\n### DSL1 end-of-life phase\n\nMaintaining two separate DSL implementations in the same programming environment is not sustainable and, above all, does not make much sense. For this reason, along with making DSL2 the default Nextflow syntax, DSL1 will enter into a 12-month end-of-life phase, at the end of which it will be removed. Therefore versions 22.04.x and 22.10.x will be the last stable versions providing the ability to run DSL1 scripts.\n\nThis is required to keep evolving the framework and to create a more solid implementation of Nextflow grammar. Maintaining compatibility with the legacy syntax implementation and data structures is a challenging task that prevents the evolution of the new syntax.\n\nBear in mind, this does **not** mean it will not be possible to use DSL1 starting from 2023. All existing Nextflow runtimes will continue to be available, and it will be possible for any legacy pipeline to run using the required version available from the GitHub [releases page](https://github.com/nextflow-io/nextflow/releases), or by specifying the version using the NXF_VER variable, e.g.\n\n```\nNXF_VER=21.10.6 nextflow run \n```\n\n### New configuration format\n\nThe configuration file is a key component of the Nextflow framework since it allows workflow developers to decouple the pipeline logic from the execution parameters and infrastructure deployment settings.\n\nThe current Nextflow configuration file mechanism is extremely powerful, but it also has some serious drawbacks due to its _dynamic_ nature, which makes it very hard to keep stable and maintainable over time.\n\nFor this reason, we are planning to re-engineer the current configuration component and replace it with a new implementation with two major goals: 1) continue to provide a rich and human-readable configuration system (so, no YAML or JSON), 2) have a well-defined syntax with a solid foundation that guarantees predictable configurations, simpler troubleshooting and more sustainable maintenance.\n\nCurrently, the most likely options are [Hashicorp HCL](https://github.com/hashicorp/hcl) (as used by Terraform and other Hashicorp tools) and [Lightbend HOCON](https://github.com/lightbend/config). 
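\n\nPurely as an illustration (no decision has been made, and none of this syntax is final), a HOCON-style rendering of a few familiar Nextflow settings might look like the sketch below; the option names are the existing Nextflow ones, and only the file format is hypothetical:\n\n```\n# hypothetical HOCON-style Nextflow configuration, for illustration only\ninclude \"base.conf\"\n\nprocess.executor = \"google-batch\"\nprocess.memory = \"4 GB\"\nworkDir = \"gs://my-bucket/scratch\"\ndocker.enabled = true\n```\n\n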
You can read more about this feature at [this link](https://github.com/nextflow-io/nextflow/issues/2723).\n\n### Ignite executor deprecation\n\nThe executor for [Apache Ignite](https://www.nextflow.io/docs/latest/ignite.html) was an early attempt to provide Nextflow with a self-contained, distributed cluster for the deployment of pipelines into HPC environments. However, it had very little adoption over the years, which was not balanced by the increasing complexity of its maintenance.\n\nFor this reason, it was decided to deprecate it and remove it from the default Nextflow distribution. The module is still available in the form of a separate project plugin and available at [this link](https://github.com/nextflow-io/nf-ignite), however, it will not be actively maintained.\n\n### Conclusion\n\nThis post is focused on the most fundamental changes we are planning to make in the following months.\n\nWith the adoption of Java 11, the full migration of DSL1 to DSL2 and the re-engineering of the configuration system, our purpose is to consolidate the Nextflow technology and lay the foundation for all the new exciting developments and features on which we are working on. Stay tuned for future blogs about each of them in upcoming posts.\n\nIf you want to learn more about the upcoming changes reach us out on [Slack at this link](https://app.slack.com/client/T03L6DM9G).\n", "images": [], "author": "Paolo Di Tommaso", "tags": "nextflow,dsl2" @@ -439,7 +439,7 @@ "slug": "2022/learn-nextflow-in-2022", "title": "Learning Nextflow in 2022", "date": "2022-01-21T00:00:00.000Z", - "content": "\nA lot has happened since we last wrote about how best to learn Nextflow, over a year ago. Several new resources have been released including a new Nextflow [Software Carpentries](https://carpentries-incubator.github.io/workflows-nextflow/index.html) course and an excellent write-up by [23andMe](https://medium.com/23andme-engineering/introduction-to-nextflow-4d0e3b6768d1).\n\nWe have collated some links below from a diverse collection of resources to help you on your journey to learn Nextflow. Nextflow is a community-driven project - if you have any suggestions, please make a pull request to [this page on GitHub](https://github.com/nextflow-io/website/tree/master/content/blog/2022/learn-nextflow-in-2022.md).\n\nWithout further ado, here is the definitive guide for learning Nextflow in 2022. These resources will support anyone in the journey from total beginner to Nextflow expert.\n\n### Prerequisites\n\nBefore you start writing Nextflow pipelines, we recommend that you are comfortable with using the command-line and understand the basic concepts of scripting languages such as Python or Perl. Nextflow is widely used for bioinformatics applications, and scientific data analysis. The examples and guides below often focus on applications in these areas. However, Nextflow is now adopted in a number of data-intensive domains such as image analysis, machine learning, astronomy and geoscience.\n\n### Time commitment\n\nWe estimate that it will take at least 20 hours to complete the material. How quickly you finish will depend on your background and how deep you want to dive into the content. 
Most of the content is introductory but there are some more advanced dataflow and configuration concepts outlined in the workshop and pattern sections.\n\n### Contents\n\n- Why learn Nextflow?\n- Introduction to Nextflow from 23andMe\n- An RNA-Seq hands-on tutorial\n- Nextflow workshop from Seqera Labs\n- Software Carpentries Course\n- Managing Pipelines in the Cloud\n- The nf-core tutorial\n- Advanced implementation patterns\n- Awesome Nextflow\n- Further resources\n\n### 1. Why learn Nextflow?\n\nNextflow is an open-source workflow framework for writing and scaling data-intensive computational pipelines. It is designed around the Linux philosophy of simple yet powerful command-line and scripting tools that, when chained together, facilitate complex data manipulations. Combined with support for containerization, support for major cloud providers and on-premise architectures, Nextflow simplifies the writing and deployment of complex data pipelines on any infrastructure.\n\nThe following are some high-level motivations on why people choose to adopt Nextflow:\n\n1. Integrating Nextflow in your analysis workflows helps you implement **reproducible** pipelines. Nextflow pipelines follow FAIR guidelines with version-control and containers to manage all software dependencies.\n2. Avoid vendor lock-in by ensuring portability. Nextflow is **portable**; the same pipeline written on a laptop can quickly scale to run on an HPC cluster, Amazon and Google cloud services, and Kubernetes. The code stays constant across varying infrastructures allowing collaboration and avoiding lock-in.\n3. It is **scalable** allowing the parallelization of tasks using the dataflow paradigm without having to hard-code the pipeline to a specific platform architecture.\n4. It is **flexible** and supports scientific workflow requirements like caching processes to prevent re-computation, and workflow reports to better understand the workflows’ executions.\n5. It is **growing fast** and has **long-term support** available from Seqera Labs. Developed since 2013 by the same team, the Nextflow ecosystem is expanding rapidly.\n6. It is **open source** and licensed under Apache 2.0. You are free to use it, modify it and distribute it.\n\n### 2. Introduction to Nextflow by 23andMe\n\nThis informative post begins with the basic concepts of Nextflow and builds towards how Nextflow is used at 23andMe. It includes a detailed use case for how 23andMe run their imputation pipeline in the cloud, processing over 1 million individuals per day with over 10,000 CPUs in a single compute environment.\n\n👉 [Nextflow at 23andMe](https://medium.com/23andme-engineering/introduction-to-nextflow-4d0e3b6768d1)\n\n### 3. A simple RNA-Seq hands-on tutorial\n\nThis hands-on tutorial from Seqera Labs will guide you through implementing a proof-of-concept RNA-seq pipeline. The goal is to become familiar with basic concepts, including how to define parameters, using channels to pass data around and writing processes to perform tasks. It includes all scripts, input data and resources and is perfect for getting a taste of Nextflow.\n\n👉 [Tutorial link on GitHub](https://github.com/seqeralabs/nextflow-tutorial)\n\n### 4. Nextflow workshop from Seqera Labs\n\nHere you’ll dive deeper into Nextflow’s most prominent features and learn how to apply them. The full workshop includes an excellent section on containers, how to build them and how to use them with Nextflow. The written materials come with examples and hands-on exercises. 
Optionally, you can also follow with a series of videos from a live training workshop.\n\nThe workshop includes topics on:\n\n- Environment Setup\n- Basic NF Script and Concepts\n- Nextflow Processes\n- Nextflow Channels\n- Nextflow Operators\n- Basic RNA-Seq pipeline\n- Containers & Conda\n- Nextflow Configuration\n- On-premise & Cloud Deployment\n- DSL 2 & Modules\n- [GATK hands-on exercise](https://seqera.io/training/handson/)\n\n👉 [Workshop](https://seqera.io/training) & [YouTube playlist](https://www.youtube.com/playlist?list=PLPZ8WHdZGxmUv4W8ZRlmstkZwhb_fencI).\n\n### 5. Software Carpentry workshop\n\nThe [Nextflow Software Carpentry](https://carpentries-incubator.github.io/workflows-nextflow/index.html) workshop (in active development) motivates the use of Nextflow and [nf-core](https://nf-co.re/) as development tools for building and sharing reproducible data science workflows. The intended audience are those with little programming experience, and the course provides a foundation to comfortably write and run Nextflow and nf-core workflows. Adapted from the Seqera training material above, the workshop has been updated by Software Carpentries instructors within the nf-core community to fit [The Carpentries](https://carpentries.org/) style of training. The Carpentries emphasize feedback to improve teaching materials so we would like to hear back from you about what you thought was both well-explained and what needs improvement. Pull requests to the course material are very welcome.\n\nThe workshop can be opened on [Gitpod](https://gitpod.io/#https://github.com/carpentries-incubator/workflows-nextflow) where you can try the exercises in an online computing environment at your own pace, with the course material in another window alongside.\n\n👉 You can find the course in [The Carpentries incubator](https://carpentries-incubator.github.io/workflows-nextflow/index.html).\n\n### 6. Managing Pipelines in the Cloud - GenomeWeb Webinar\n\nThis on-demand webinar features Phil Ewels from SciLifeLab and nf-core, Brendan Boufler from Amazon Web Services and Evan Floden from Seqera Labs. The wide ranging dicussion covers the significance of scientific workflow, examples of Nextflow in production settings and how Nextflow can be integrated with other processes.\n\n👉 [Watch the webinar](https://seqera.io/webinars-and-podcasts/managing-bioinformatics-pipelines-in-the-cloud-to-do-more-science/)\n\n### 7. Nextflow implementation patterns\n\nThis advanced section discusses recurring patterns and solutions to many common implementation requirements. Code examples are available with notes to follow along, as well as a GitHub repository.\n\n👉 [Nextflow Patterns](http://nextflow-io.github.io/patterns/index.html) & [GitHub repository](https://github.com/nextflow-io/patterns).\n\n### 8. nf-core tutorials\n\nA tutorial covering the basics of using and creating nf-core pipelines. It provides an overview of the nf-core framework including:\n\n- How to run nf-core pipelines\n- What are the most commonly used nf-core tools\n- How to make new pipelines using the nf-core template\n- What are nf-core shared modules\n- How to add nf-core shared modules to a pipeline\n- How to make new nf-core modules using the nf-core module template\n- How nf-core pipelines are reviewed and ultimately released\n\n👉 [nf-core usage tutorials](https://nf-co.re/usage/usage_tutorials)\nand [nf-core developer tutorials](https://nf-co.re/developers/developer_tutorials)\n\n### 9. 
Awesome Nextflow\n\nA collections of awesome Nextflow pipelines.\n\n👉 [Awesome Nextflow](https://github.com/nextflow-io/awesome-nextflow) on GitHub\n\n### 10. Further resources\n\nThe following resources will help you dig deeper into Nextflow and other related projects like the nf-core community who maintain curated pipelines and a very active Slack channel. There are plenty of Nextflow tutorials and videos online, and the following list is no way exhaustive. Please let us know if we are missing anything.\n\n#### Nextflow docs\n\nThe reference for the Nextflow language and runtime. These docs should be your first point of reference while developing Nextflow pipelines. The newest features are documented in edge documentation pages released every month with the latest stable releases every three months.\n\n👉 Latest [stable](https://www.nextflow.io/docs/latest/index.html) & [edge](https://www.nextflow.io/docs/edge/index.html) documentation.\n\n#### Seqera Labs docs\n\nAn index of documentation, deployment guides, training materials and resources for all things Nextflow and Tower.\n\n👉 [Seqera Labs docs](https://seqera.io/docs)\n\n#### nf-core\n\nnf-core is a growing community of Nextflow users and developers. You can find curated sets of biomedical analysis pipelines written in Nextflow and built by domain experts. Each pipeline is stringently reviewed and has been implemented according to best practice guidelines. Be sure to sign up to the Slack channel.\n\n👉 [nf-core website](https://nf-co.re) and [nf-core Slack](https://nf-co.re/join)\n\n#### Nextflow Tower\n\nNextflow Tower is a platform to easily monitor, launch and scale Nextflow pipelines on cloud providers and on-premise infrastructure. The documentation provides details on setting up compute environments, monitoring pipelines and launching using either the web graphic interface, CLI or API.\n\n👉 [Nextflow Tower](https://tower.nf) and [user documentation](http://help.tower.nf).\n\n#### Nextflow Biotech Blueprint by AWS\n\nA quickstart for deploying a genomics analysis environment on Amazon Web Services (AWS) cloud, using Nextflow to create and orchestrate analysis workflows and AWS Batch to run the workflow processes.\n\n👉 [Biotech Blueprint by AWS](https://aws.amazon.com/quickstart/biotech-blueprint/nextflow/)\n\n#### Nextflow Data Pipelines on Azure Batch\n\nNextflow on Azure requires at minimum two Azure services, Azure Batch and Azure Storage. Follow the guides below to set up both services on Azure, and to get your storage and batch account names and keys.\n\n👉 [Azure Blog](https://techcommunity.microsoft.com/t5/azure-compute-blog/running-nextflow-data-pipelines-on-azure-batch/ba-p/2150383) and [GitHub repository](https://github.com/microsoft/Genomics-Quickstart/blob/main/03-Nextflow-Azure/README.md).\n\n#### Running Nextflow by Google Cloud\n\nA step-by-step guide to launching Nextflow Pipelines in Google Cloud.\n\n👉 [Nextflow on Google Cloud](https://cloud.google.com/life-sciences/docs/tutorials/nextflow)\n\n#### Bonus: Nextflow Tutorial - Variant Calling Edition\n\nThis [Nextflow Tutorial - Variant Calling Edition](https://sateeshperi.github.io/nextflow_varcal/nextflow/) has been adapted from the [Nextflow Software Carpentry training material](https://carpentries-incubator.github.io/workflows-nextflow/index.html) and [Data Carpentry: Wrangling Genomics Lesson](https://datacarpentry.org/wrangling-genomics/). 
Learners will have the chance to learn Nextflow and nf-core basics, to convert a variant-calling bash-script into a Nextflow workflow and to modularize the pipeline using DSL2 modules and sub-workflows.\n\nThe workshop can be opened on [Gitpod](https://gitpod.io/#https://github.com/sateeshperi/nextflow_tutorial.git) where you can try the exercises in an online computing environment at your own pace, with the course material in another window alongside.\n\n👉 You can find the course in [Nextflow Tutorial - Variant Calling Edition](https://sateeshperi.github.io/nextflow_varcal/nextflow/).\n\n### Community and support\n\n- Nextflow [Gitter channel](https://gitter.im/nextflow-io/nextflow)\n- Nextflow [Forums](https://groups.google.com/forum/#!forum/nextflow)\n- Nextflow Twitter [@nextflowio](https://twitter.com/nextflowio?lang=en)\n- [nf-core Slack](https://nfcore.slack.com/)\n- [Seqera Labs](https://www.seqera.io) and [Nextflow Tower](https://tower.nf)\n\n### Credits\n\nSpecial thanks to Mahesh Binzer-Panchal for reviewing the latest revision of this post and contributing the Software Carpentry workshop section.\n", + "content": "A lot has happened since we last wrote about how best to learn Nextflow, over a year ago. Several new resources have been released including a new Nextflow [Software Carpentries](https://carpentries-incubator.github.io/workflows-nextflow/index.html) course and an excellent write-up by [23andMe](https://medium.com/23andme-engineering/introduction-to-nextflow-4d0e3b6768d1).\n\nWe have collated some links below from a diverse collection of resources to help you on your journey to learn Nextflow. Nextflow is a community-driven project - if you have any suggestions, please make a pull request to [this page on GitHub](https://github.com/nextflow-io/website/tree/master/content/blog/2022/learn-nextflow-in-2022.md).\n\nWithout further ado, here is the definitive guide for learning Nextflow in 2022. These resources will support anyone in the journey from total beginner to Nextflow expert.\n\n### Prerequisites\n\nBefore you start writing Nextflow pipelines, we recommend that you are comfortable with using the command-line and understand the basic concepts of scripting languages such as Python or Perl. Nextflow is widely used for bioinformatics applications, and scientific data analysis. The examples and guides below often focus on applications in these areas. However, Nextflow is now adopted in a number of data-intensive domains such as image analysis, machine learning, astronomy and geoscience.\n\n### Time commitment\n\nWe estimate that it will take at least 20 hours to complete the material. How quickly you finish will depend on your background and how deep you want to dive into the content. Most of the content is introductory but there are some more advanced dataflow and configuration concepts outlined in the workshop and pattern sections.\n\n### Contents\n\n- Why learn Nextflow?\n- Introduction to Nextflow from 23andMe\n- An RNA-Seq hands-on tutorial\n- Nextflow workshop from Seqera Labs\n- Software Carpentries Course\n- Managing Pipelines in the Cloud\n- The nf-core tutorial\n- Advanced implementation patterns\n- Awesome Nextflow\n- Further resources\n\n### 1. Why learn Nextflow?\n\nNextflow is an open-source workflow framework for writing and scaling data-intensive computational pipelines. It is designed around the Linux philosophy of simple yet powerful command-line and scripting tools that, when chained together, facilitate complex data manipulations. 
Combined with support for containerization, support for major cloud providers and on-premise architectures, Nextflow simplifies the writing and deployment of complex data pipelines on any infrastructure.\n\nThe following are some high-level motivations on why people choose to adopt Nextflow:\n\n1. Integrating Nextflow in your analysis workflows helps you implement **reproducible** pipelines. Nextflow pipelines follow FAIR guidelines with version-control and containers to manage all software dependencies.\n2. Avoid vendor lock-in by ensuring portability. Nextflow is **portable**; the same pipeline written on a laptop can quickly scale to run on an HPC cluster, Amazon and Google cloud services, and Kubernetes. The code stays constant across varying infrastructures allowing collaboration and avoiding lock-in.\n3. It is **scalable** allowing the parallelization of tasks using the dataflow paradigm without having to hard-code the pipeline to a specific platform architecture.\n4. It is **flexible** and supports scientific workflow requirements like caching processes to prevent re-computation, and workflow reports to better understand the workflows’ executions.\n5. It is **growing fast** and has **long-term support** available from Seqera Labs. Developed since 2013 by the same team, the Nextflow ecosystem is expanding rapidly.\n6. It is **open source** and licensed under Apache 2.0. You are free to use it, modify it and distribute it.\n\n### 2. Introduction to Nextflow by 23andMe\n\nThis informative post begins with the basic concepts of Nextflow and builds towards how Nextflow is used at 23andMe. It includes a detailed use case for how 23andMe run their imputation pipeline in the cloud, processing over 1 million individuals per day with over 10,000 CPUs in a single compute environment.\n\n👉 [Nextflow at 23andMe](https://medium.com/23andme-engineering/introduction-to-nextflow-4d0e3b6768d1)\n\n### 3. A simple RNA-Seq hands-on tutorial\n\nThis hands-on tutorial from Seqera Labs will guide you through implementing a proof-of-concept RNA-seq pipeline. The goal is to become familiar with basic concepts, including how to define parameters, using channels to pass data around and writing processes to perform tasks. It includes all scripts, input data and resources and is perfect for getting a taste of Nextflow.\n\n👉 [Tutorial link on GitHub](https://github.com/seqeralabs/nextflow-tutorial)\n\n### 4. Nextflow workshop from Seqera Labs\n\nHere you’ll dive deeper into Nextflow’s most prominent features and learn how to apply them. The full workshop includes an excellent section on containers, how to build them and how to use them with Nextflow. The written materials come with examples and hands-on exercises. Optionally, you can also follow with a series of videos from a live training workshop.\n\nThe workshop includes topics on:\n\n- Environment Setup\n- Basic NF Script and Concepts\n- Nextflow Processes\n- Nextflow Channels\n- Nextflow Operators\n- Basic RNA-Seq pipeline\n- Containers & Conda\n- Nextflow Configuration\n- On-premise & Cloud Deployment\n- DSL 2 & Modules\n- [GATK hands-on exercise](https://seqera.io/training/handson/)\n\n👉 [Workshop](https://seqera.io/training) & [YouTube playlist](https://www.youtube.com/playlist?list=PLPZ8WHdZGxmUv4W8ZRlmstkZwhb_fencI).\n\n### 5. 
Software Carpentry workshop\n\nThe [Nextflow Software Carpentry](https://carpentries-incubator.github.io/workflows-nextflow/index.html) workshop (in active development) motivates the use of Nextflow and [nf-core](https://nf-co.re/) as development tools for building and sharing reproducible data science workflows. The intended audience are those with little programming experience, and the course provides a foundation to comfortably write and run Nextflow and nf-core workflows. Adapted from the Seqera training material above, the workshop has been updated by Software Carpentries instructors within the nf-core community to fit [The Carpentries](https://carpentries.org/) style of training. The Carpentries emphasize feedback to improve teaching materials so we would like to hear back from you about what you thought was both well-explained and what needs improvement. Pull requests to the course material are very welcome.\n\nThe workshop can be opened on [Gitpod](https://gitpod.io/#https://github.com/carpentries-incubator/workflows-nextflow) where you can try the exercises in an online computing environment at your own pace, with the course material in another window alongside.\n\n👉 You can find the course in [The Carpentries incubator](https://carpentries-incubator.github.io/workflows-nextflow/index.html).\n\n### 6. Managing Pipelines in the Cloud - GenomeWeb Webinar\n\nThis on-demand webinar features Phil Ewels from SciLifeLab and nf-core, Brendan Boufler from Amazon Web Services and Evan Floden from Seqera Labs. The wide ranging dicussion covers the significance of scientific workflow, examples of Nextflow in production settings and how Nextflow can be integrated with other processes.\n\n👉 [Watch the webinar](https://seqera.io/webinars-and-podcasts/managing-bioinformatics-pipelines-in-the-cloud-to-do-more-science/)\n\n### 7. Nextflow implementation patterns\n\nThis advanced section discusses recurring patterns and solutions to many common implementation requirements. Code examples are available with notes to follow along, as well as a GitHub repository.\n\n👉 [Nextflow Patterns](http://nextflow-io.github.io/patterns/index.html) & [GitHub repository](https://github.com/nextflow-io/patterns).\n\n### 8. nf-core tutorials\n\nA tutorial covering the basics of using and creating nf-core pipelines. It provides an overview of the nf-core framework including:\n\n- How to run nf-core pipelines\n- What are the most commonly used nf-core tools\n- How to make new pipelines using the nf-core template\n- What are nf-core shared modules\n- How to add nf-core shared modules to a pipeline\n- How to make new nf-core modules using the nf-core module template\n- How nf-core pipelines are reviewed and ultimately released\n\n👉 [nf-core usage tutorials](https://nf-co.re/usage/usage_tutorials)\nand [nf-core developer tutorials](https://nf-co.re/developers/developer_tutorials)\n\n### 9. Awesome Nextflow\n\nA collections of awesome Nextflow pipelines.\n\n👉 [Awesome Nextflow](https://github.com/nextflow-io/awesome-nextflow) on GitHub\n\n### 10. Further resources\n\nThe following resources will help you dig deeper into Nextflow and other related projects like the nf-core community who maintain curated pipelines and a very active Slack channel. There are plenty of Nextflow tutorials and videos online, and the following list is no way exhaustive. Please let us know if we are missing anything.\n\n#### Nextflow docs\n\nThe reference for the Nextflow language and runtime. 
These docs should be your first point of reference while developing Nextflow pipelines. The newest features are documented in edge documentation pages released every month with the latest stable releases every three months.\n\n👉 Latest [stable](https://www.nextflow.io/docs/latest/index.html) & [edge](https://www.nextflow.io/docs/edge/index.html) documentation.\n\n#### Seqera Labs docs\n\nAn index of documentation, deployment guides, training materials and resources for all things Nextflow and Tower.\n\n👉 [Seqera Labs docs](https://seqera.io/docs)\n\n#### nf-core\n\nnf-core is a growing community of Nextflow users and developers. You can find curated sets of biomedical analysis pipelines written in Nextflow and built by domain experts. Each pipeline is stringently reviewed and has been implemented according to best practice guidelines. Be sure to sign up to the Slack channel.\n\n👉 [nf-core website](https://nf-co.re) and [nf-core Slack](https://nf-co.re/join)\n\n#### Nextflow Tower\n\nNextflow Tower is a platform to easily monitor, launch and scale Nextflow pipelines on cloud providers and on-premise infrastructure. The documentation provides details on setting up compute environments, monitoring pipelines and launching using either the web graphic interface, CLI or API.\n\n👉 [Nextflow Tower](https://tower.nf) and [user documentation](http://help.tower.nf).\n\n#### Nextflow Biotech Blueprint by AWS\n\nA quickstart for deploying a genomics analysis environment on Amazon Web Services (AWS) cloud, using Nextflow to create and orchestrate analysis workflows and AWS Batch to run the workflow processes.\n\n👉 [Biotech Blueprint by AWS](https://aws.amazon.com/quickstart/biotech-blueprint/nextflow/)\n\n#### Nextflow Data Pipelines on Azure Batch\n\nNextflow on Azure requires at minimum two Azure services, Azure Batch and Azure Storage. Follow the guides below to set up both services on Azure, and to get your storage and batch account names and keys.\n\n👉 [Azure Blog](https://techcommunity.microsoft.com/t5/azure-compute-blog/running-nextflow-data-pipelines-on-azure-batch/ba-p/2150383) and [GitHub repository](https://github.com/microsoft/Genomics-Quickstart/blob/main/03-Nextflow-Azure/README.md).\n\n#### Running Nextflow by Google Cloud\n\nA step-by-step guide to launching Nextflow Pipelines in Google Cloud.\n\n👉 [Nextflow on Google Cloud](https://cloud.google.com/life-sciences/docs/tutorials/nextflow)\n\n#### Bonus: Nextflow Tutorial - Variant Calling Edition\n\nThis [Nextflow Tutorial - Variant Calling Edition](https://sateeshperi.github.io/nextflow_varcal/nextflow/) has been adapted from the [Nextflow Software Carpentry training material](https://carpentries-incubator.github.io/workflows-nextflow/index.html) and [Data Carpentry: Wrangling Genomics Lesson](https://datacarpentry.org/wrangling-genomics/). 
Learners will have the chance to learn Nextflow and nf-core basics, to convert a variant-calling bash-script into a Nextflow workflow and to modularize the pipeline using DSL2 modules and sub-workflows.\n\nThe workshop can be opened on [Gitpod](https://gitpod.io/#https://github.com/sateeshperi/nextflow_tutorial.git) where you can try the exercises in an online computing environment at your own pace, with the course material in another window alongside.\n\n👉 You can find the course in [Nextflow Tutorial - Variant Calling Edition](https://sateeshperi.github.io/nextflow_varcal/nextflow/).\n\n### Community and support\n\n- Nextflow [Gitter channel](https://gitter.im/nextflow-io/nextflow)\n- Nextflow [Forums](https://groups.google.com/forum/#!forum/nextflow)\n- Nextflow Twitter [@nextflowio](https://twitter.com/nextflowio?lang=en)\n- [nf-core Slack](https://nfcore.slack.com/)\n- [Seqera Labs](https://www.seqera.io) and [Nextflow Tower](https://tower.nf)\n\n### Credits\n\nSpecial thanks to Mahesh Binzer-Panchal for reviewing the latest revision of this post and contributing the Software Carpentry workshop section.", "images": [], "author": "Evan Floden", "tags": "learn,workshop,webinar" @@ -448,7 +448,7 @@ "slug": "2022/nextflow-is-moving-to-slack", "title": "Nextflow’s community is moving to Slack!", "date": "2022-02-22T00:00:00.000Z", - "content": "\n
\n\n“Software communities don’t just write code together. They brainstorm feature ideas, help new users get their bearings, and collaborate on best ways to use the software.…conversations need their own place\" - GitHub Satellite Blog 2020\n\n
\n\nThe Nextflow community channel on Gitter has grown substantially over the last few years and today has more than 1,300 members.\n\nI still remember when a former colleague proposed the idea of opening a Nextflow channel on Gitter. At the time, I didn't know anything about Gitter, and my initial response was : \"would that not be a waste of time?\".\n\nFortunately, I took him up on his suggestion and the Gitter channel quickly became an important resource for all Nextflow developers and a key factor to its success.\n\n### Where the future lies\n\nAs the Nextflow community continues to grow, we realize that we have reached the limit of the discussion experience on Gitter. The lack of internal channels and the poor support for threads make the discussion unpleasant and difficult to follow. Over the last few years, Slack has proven to deliver a much better user experience and it is also touted as one of the most used platforms for discussion.\n\nFor these reasons, we felt that it is time to say goodbye to the beloved Nextflow Gitter channel and would like to welcome the community into the brand-new, official Nextflow workspace on Slack!\n\nYou can join today using this link!\n\nOnce you have joined, you will be added to a selection of generic channels. However, we have also set up various additional channels for discussion around specific Nextflow topics, and for infrastructure-related topics. Please feel free to join whichever channels are appropriate to you.\n\nAlong the same lines, the Nextflow discussion forum is moving from Google Groups to the Discussion forum in the Nextflow GitHub repository. We hope this will provide a much better experience for Nextflow users by having a more direct connection with the codebase and issue repository.\n\nThe old Gitter channel and Google Groups will be kept active for reference and historical purposes, however we are actively promoting all members to move to the new channels.\n\nIf you have any questions or problems signing up then please feel free to let us know at info@nextflow.io.\n\nAs always, we thank you for being a part of the Nextflow community and for your ongoing support in driving its development and making workflows cool!\n\nSee you on Slack!\n\n### Credits\n\nThis was also made possible thanks to sponsorship from the Chan Zuckerberg Initiative, the Slack for Nonprofits program and support from Seqera Labs.\n", + "content": "
\n*\n“Software communities don’t just write code together. They brainstorm feature ideas, help new users get their bearings, and collaborate on best ways to use the software.…conversations need their own place\" - [GitHub Satellite Blog 2020](https://github.blog/2020-05-06-new-from-satellite-2020-github-codespaces-github-discussions-securing-code-in-private-repositories-and-more)\n*\n
\n\nThe Nextflow community channel on Gitter has grown substantially over the last few years and today has more than 1,300 members.\n\nI still remember when a [former colleague](https://twitter.com/helicobacter1) proposed the idea of opening a Nextflow channel on Gitter. At the time, I didn't know anything about Gitter, and my initial response was : \"would that not be a waste of time?\".\n\nFortunately, I took him up on his suggestion and the Gitter channel quickly became an important resource for all Nextflow developers and a key factor to its success.\n\n### Where the future lies\n\nAs the Nextflow community continues to grow, we realize that we have reached the limit of the discussion experience on Gitter. The lack of internal channels and the poor support for threads make the discussion unpleasant and difficult to follow. Over the last few years, Slack has proven to deliver a much better user experience and it is also touted as one of the most used platforms for discussion.\n\nFor these reasons, we felt that it is time to say goodbye to the beloved Nextflow Gitter channel and would like to welcome the community into the brand-new, official Nextflow workspace on Slack!\n\nYou can join today using [this link](https://www.nextflow.io/slack-invite.html)!\n\nOnce you have joined, you will be added to a selection of generic channels. However, we have also set up various additional channels for discussion around specific Nextflow topics, and for infrastructure-related topics. Please feel free to join whichever channels are appropriate to you.\n\nAlong the same lines, the Nextflow discussion forum is moving from [Google Groups](https://groups.google.com/forum/#!forum/nextflow) to the [Discussion forum](https://github.com/nextflow-io/nextflow/discussions) in the Nextflow GitHub repository. We hope this will provide a much better experience for Nextflow users by having a more direct connection with the codebase and issue repository.\n\nThe old Gitter channel and Google Groups will be kept active for reference and historical purposes, however we are actively promoting all members to move to the new channels.\n\nIf you have any questions or problems signing up then please feel free to let us know at info@nextflow.io.\n\nAs always, we thank you for being a part of the Nextflow community and for your ongoing support in driving its development and making workflows cool!\n\nSee you on Slack!\n\n### Credits\n\nThis was also made possible thanks to sponsorship from the [Chan Zuckerberg Initiative](https://chanzuckerberg.com/eoss/proposals/nextflow-and-nf-core/), the [Slack for Nonprofits program](https://slack.com/intl/en-gb/about/slack-for-good) and support from [Seqera Labs](https://www.seqera.io).", "images": [], "author": "Paolo Di Tommaso", "tags": "community, slack, github" @@ -457,7 +457,7 @@ "slug": "2022/nextflow-summit-2022-recap", "title": "Nextflow Summit 2022 Recap", "date": "2022-11-03T00:00:00.000Z", - "content": "\n## Three days of Nextflow goodness in Barcelona\n\nAfter a three-year COVID-related hiatus from in-person events, Nextflow developers and users found their way to Barcelona this October for the 2022 Nextflow Summit. Held at Barcelona’s iconic Agbar tower, this was easily the most successful Nextflow community event yet!\n\nThe week-long event kicked off with 50 people participating in a hackathon organized by nf-core beginning on October 10th. 
The [hackathon](https://nf-co.re/events/2022/hackathon-october-2022) tackled several cutting-edge projects with developer teams focused on various aspects of nf-core including documentation, subworkflows, pipelines, DSL2 conversions, modules, and infrastructure. The Nextflow Summit began mid-week attracting nearly 600 people, including 165 attending in person and another 433 remotely. The [YouTube live streams](https://summit.nextflow.io/stream/) have now collected over two and half thousand views. Just prior to the summit, three virtual Nextflow training events were also run with separate sessions for the Americas, EMEA, and APAC in which 835 people participated.\n\n## An action-packed agenda\n\nThe three-day Nextflow Summit featured 33 talks delivered by speakers from academia, research, healthcare providers, biotechs, and cloud providers. This year’s speakers came from the following organizations:\n\n- Amazon Web Services\n- Center for Genomic Regulation\n- Centre for Molecular Medicine and Therapeutics, University of British Columbia\n- Chan Zukerberg Biohub\n- Curative\n- DNA Nexus\n- Enterome\n- Google\n- Janelia Research Campus\n- Microsoft\n- Oxford Nanopore\n- Quadram Institute BioScience\n- Seqera Labs\n- Quantitative Biology Center, University of Tübingen\n- Quilt Data\n- UNC Lineberger Comprehensive Cancer Center\n- Università degli Studi di Macerata\n- University of Maryland\n- Wellcome Sanger Institute\n- Wyoming Public Health Laboratory\n\n## Some recurring themes\n\nWhile there were too many excellent talks to cover individually, a few themes surfaced throughout the summit. Not surprisingly, SARS-Cov-2 was a thread that wound through several talks. Tony Zeljkovic from Curative led a discussion about [unlocking automated bioinformatics for large-scale healthcare](https://www.youtube.com/watch?v=JZMaRYzZxGU&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=8), and Thanh Le Viet of Quadram Institute Bioscience discussed [large-scale SARS-Cov-2 genomic surveillance at QIB](https://www.youtube.com/watch?v=6jQr9dDaais&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=30). Several speakers discussed best practices for building portable, modular pipelines. Other common themes were data provenance & traceability, data management, and techniques to use compute and storage more efficiently. There were also a few talks about the importance of dataflows in new application areas outside of genomics and bioinformatics.\n\n## Data provenance tracking\n\nIn the Thursday morning keynote, Rob Patro﹘Associate Professor at the University of Maryland Dept. of Computer Science and CTO and co-founder of Ocean Genomics﹘described in his talk “[What could be next(flow)](https://www.youtube.com/watch?v=vNrKFT5eT8U&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=6),” how far the Nextflow community had come in solving problems such as reproducibility, scalability, modularity, and ease of use. He then challenged the community with some complex issues still waiting in the wings. He focused on data provenance as a particularly vexing challenge explaining how tremendous effort currently goes into manual metadata curation.\n\nRob offered suggestions about how Nextflow might evolve, and coined the term “augmented execution contexts” (AECs) drawing from his work on provenance tracking – answering questions such as “what are these files, and where did they come from.” This thinking is reflected in [tximeta](https://github.com/mikelove/tximeta), a project co-developed with Mike Love of UNC. 
Rob also proposed ideas around automating data format conversions analogous to type casting in programming languages explaining how such conversions might be built into Nextflow channels to make pipelines more interoperable.\n\nIn his talk with the clever title “[one link to rule them all](https://www.youtube.com/watch?v=dttkcuP3OBc&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=13),” Aneesh Karve of Quilt explained how every pipeline run is a function of the code, environment, and data, and went on to show how Quilt could help dramatically simplify data management with dataset versioning, accessibility, and verifiability. Data provenance and traceability were also front and center when Yih-Chii Hwang of DNAnexus described her team’s work around [bringing GxP compliance to Nextflow workflows](https://www.youtube.com/watch?v=RIwpJTDlLiE&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=21).\n\n## Data management and storage\n\nOther speakers also talked about challenges related to data management and performance. Angel Pizarro of AWS gave an interesting talk comparing the [price/performance of different AWS cloud storage options](https://www.youtube.com/watch?v=VXtYCAqGEQQ&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=12). [Hatem Nawar](https://www.youtube.com/watch?v=jB91uqUqsRM&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=9) (Google) and [Venkat Malladi](https://www.youtube.com/watch?v=GAIL8ZAMJPQ&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=20) (Microsoft) also talked about cloud economics and various approaches to data handling in their respective clouds. Data management was also a key part of Evan Floden’s discussion about Nextflow Tower where he discussed Tower Datasets, as well as the various cloud storage options accessible through Nextflow Tower. Finally, Nextflow creator Paolo Di Tommaso unveiled new work being done in Nextflow to simplify access to data residing in object stores in his talk “[Nextflow and the future of containers](https://www.youtube.com/watch?v=PTbiCVq0-sE&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=14)”.\n\n## Compute optimization\n\nAnother recurring theme was improving compute efficiency. Several talks discussed using containers more effectively, leveraging GPUs & FPGAs for added performance, improving virtual machine instance type selection, and automating resource requirements. Mike Smoot of Illumina talked about Nextflow, Kubernetes, and DRAGENs and how Illumina’s FPGA-based Bio-IT Platform can dramatically accelerate analysis. Venkat Malladi discussed efforts to suggest optimal VM types based on different standardized nf-core labels in the Azure cloud (process_low, process_medium, process_high, etc.) Finally, Evan Floden discussed [Nextflow Tower](https://www.youtube.com/watch?v=yJpN3fRSClA&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=22) and unveiled an exciting new [resource optimization feature](https://seqera.io/blog/optimizing-resource-usage-with-nextflow-tower/) that can intelligently tune pipeline resource requests to radically reduce cloud costs and improve run speed. Overall, the Nextflow community continues to make giant strides in improving efficiency and managing costs in the cloud.\n\n## Beyond genomics\n\nWhile most summit speakers focused on genomics, a few discussed data pipelines in other areas, including statistical modeling, analysis, and machine learning. 
Nicola Visonà from Università degli Studi di Macerata gave a fascinating talk about [using agent-based models to simulate the first industrial revolution](https://www.youtube.com/watch?v=PlKJ0IDV_ds&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=27). Similarly, Konrad Rokicki from the Janelia Research Campus explained how Janelia are using [Nextflow for petascale bioimaging data](https://www.youtube.com/watch?v=ZjSzx1I76z0&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=18) and why bioimage processing remains a large domain area with an unmet need for reproducible workflows.\n\n## Summit Announcements\n\nThis year’s summit also saw several exciting announcements from Nextflow developers. Paolo Di Tommaso, during his talk on [the future of containers](https://www.youtube.com/watch?v=PTbiCVq0-sE&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=14), announced the availability of [Nextflow 22.10.0](https://github.com/nextflow-io/nextflow/releases/tag/v22.10.0). In addition to various bug fixes, the latest Nextflow release introduces an exciting new technology called Wave that allows containers to be built on the fly from Dockerfiles or Conda recipes saved within a Nextflow pipeline. Wave also helps to simplify containerized pipeline deployment with features such as “container augmentation”; enabling developers to inject new container scripts and functionality on the fly without needing to rebuild the base containers such as a cloud-native [Fusion file system](https://www.nextflow.io/docs/latest/fusion.html). When used with Nextflow Tower, Wave also simplifies authentication to various public and private container registries. The latest Nextflow release also brings improved support for Kubernetes and enhancements to documentation, along with many other features.\n\nSeveral other announcements were made during [Evan Floden’s talk](https://www.youtube.com/watch?v=yJpN3fRSClA&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=22&t=127s), such as:\n\n- MultiQC is joining the Seqera Labs family of products\n- Fusion – a distributed virtual file system for cloud-native data pipelines\n- Nextflow Tower support for Google Cloud Batch\n- Nextflow Tower resource optimization\n- Improved Resource Labels support in Tower with integrations for cost accounting with all major cloud providers\n- A new Nextflow Tower dashboard coming soon, providing visibility across workspaces\n\n## Thank you to our sponsors\n\nThe summit organizers wish to extend a sincere thank you to the event sponsors: AWS, Google Cloud, Seqera Labs, Quilt Data, Oxford Nanopore Technologies, and Element BioSciences. In addition, the [Chan Zuckerberg Initiative](https://chanzuckerberg.com/eoss/) continues to play a key role with their EOSS grants funding important work related to Nextflow and the nf-core community. 
The success of this year’s summit reminds us of the tremendous value of community and the critical impact of open science software in improving the quality, accessibility, and efficiency of scientific research.\n\n## Learning more\n\nFor anyone who missed the summit, you can still watch the sessions or view the training sessions at your convenience:\n\n- Watch post-event recordings of the [Nextflow Summit on YouTube](https://www.youtube.com/playlist?list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32)\n- View replays of the recent online [Nextflow and nf-core training](https://nf-co.re/events/2022/training-october-2022)\n\nFor additional detail on the summit and the preceding nf-core events, also check out an excellent [summary of the event](https://mribeirodantas.xyz/blog/index.php/2022/10/27/nextflow-and-nf-core-hot-news/) written by Marcel Ribeiro-Dantas in his blog, the [Dataist Storyteller](https://mribeirodantas.xyz/blog/index.php/2022/10/27/nextflow-and-nf-core-hot-news/)!\n\n_In this event, we're showcasing some of the results of the project 'Optimization of computational resources for HPC workloads in the cloud using ML/AI' by Seqera Labs S.L. This project has been funded by the European Regional Development Fund (ERDF) of the European Union, coordinated and managed by RED.es, with the aim of carrying out the development of technological entrepreneurship and technological demand, within the framework of the Strategic Action on Digital Economy and Society of the State Program for R&D&I oriented towards societal challenges._\n\n![grant logos](/img/blog-2022-11-03--img1.png)\n", + "content": "## Three days of Nextflow goodness in Barcelona\n\nAfter a three-year COVID-related hiatus from in-person events, Nextflow developers and users found their way to Barcelona this October for the 2022 Nextflow Summit. Held at Barcelona’s iconic Agbar tower, this was easily the most successful Nextflow community event yet!\n\nThe week-long event kicked off with 50 people participating in a hackathon organized by nf-core beginning on October 10th. The [hackathon](https://nf-co.re/events/2022/hackathon-october-2022) tackled several cutting-edge projects with developer teams focused on various aspects of nf-core including documentation, subworkflows, pipelines, DSL2 conversions, modules, and infrastructure. The Nextflow Summit began mid-week attracting nearly 600 people, including 165 attending in person and another 433 remotely. The [YouTube live streams](https://summit.nextflow.io/stream/) have now collected over two and half thousand views. Just prior to the summit, three virtual Nextflow training events were also run with separate sessions for the Americas, EMEA, and APAC in which 835 people participated.\n\n## An action-packed agenda\n\nThe three-day Nextflow Summit featured 33 talks delivered by speakers from academia, research, healthcare providers, biotechs, and cloud providers. 
This year’s speakers came from the following organizations:\n\n- Amazon Web Services\n- Center for Genomic Regulation\n- Centre for Molecular Medicine and Therapeutics, University of British Columbia\n- Chan Zukerberg Biohub\n- Curative\n- DNA Nexus\n- Enterome\n- Google\n- Janelia Research Campus\n- Microsoft\n- Oxford Nanopore\n- Quadram Institute BioScience\n- Seqera Labs\n- Quantitative Biology Center, University of Tübingen\n- Quilt Data\n- UNC Lineberger Comprehensive Cancer Center\n- Università degli Studi di Macerata\n- University of Maryland\n- Wellcome Sanger Institute\n- Wyoming Public Health Laboratory\n\n## Some recurring themes\n\nWhile there were too many excellent talks to cover individually, a few themes surfaced throughout the summit. Not surprisingly, SARS-Cov-2 was a thread that wound through several talks. Tony Zeljkovic from Curative led a discussion about [unlocking automated bioinformatics for large-scale healthcare](https://www.youtube.com/watch?v=JZMaRYzZxGU&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=8), and Thanh Le Viet of Quadram Institute Bioscience discussed [large-scale SARS-Cov-2 genomic surveillance at QIB](https://www.youtube.com/watch?v=6jQr9dDaais&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=30). Several speakers discussed best practices for building portable, modular pipelines. Other common themes were data provenance & traceability, data management, and techniques to use compute and storage more efficiently. There were also a few talks about the importance of dataflows in new application areas outside of genomics and bioinformatics.\n\n## Data provenance tracking\n\nIn the Thursday morning keynote, Rob Patro﹘Associate Professor at the University of Maryland Dept. of Computer Science and CTO and co-founder of Ocean Genomics﹘described in his talk “[What could be next(flow)](https://www.youtube.com/watch?v=vNrKFT5eT8U&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=6),” how far the Nextflow community had come in solving problems such as reproducibility, scalability, modularity, and ease of use. He then challenged the community with some complex issues still waiting in the wings. He focused on data provenance as a particularly vexing challenge explaining how tremendous effort currently goes into manual metadata curation.\n\nRob offered suggestions about how Nextflow might evolve, and coined the term “augmented execution contexts” (AECs) drawing from his work on provenance tracking – answering questions such as “what are these files, and where did they come from.” This thinking is reflected in [tximeta](https://github.com/mikelove/tximeta), a project co-developed with Mike Love of UNC. Rob also proposed ideas around automating data format conversions analogous to type casting in programming languages explaining how such conversions might be built into Nextflow channels to make pipelines more interoperable.\n\nIn his talk with the clever title “[one link to rule them all](https://www.youtube.com/watch?v=dttkcuP3OBc&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=13),” Aneesh Karve of Quilt explained how every pipeline run is a function of the code, environment, and data, and went on to show how Quilt could help dramatically simplify data management with dataset versioning, accessibility, and verifiability. 
Data provenance and traceability were also front and center when Yih-Chii Hwang of DNAnexus described her team’s work around [bringing GxP compliance to Nextflow workflows](https://www.youtube.com/watch?v=RIwpJTDlLiE&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=21).\n\n## Data management and storage\n\nOther speakers also talked about challenges related to data management and performance. Angel Pizarro of AWS gave an interesting talk comparing the [price/performance of different AWS cloud storage options](https://www.youtube.com/watch?v=VXtYCAqGEQQ&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=12). [Hatem Nawar](https://www.youtube.com/watch?v=jB91uqUqsRM&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=9) (Google) and [Venkat Malladi](https://www.youtube.com/watch?v=GAIL8ZAMJPQ&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=20) (Microsoft) also talked about cloud economics and various approaches to data handling in their respective clouds. Data management was also a key part of Evan Floden’s discussion about Nextflow Tower where he discussed Tower Datasets, as well as the various cloud storage options accessible through Nextflow Tower. Finally, Nextflow creator Paolo Di Tommaso unveiled new work being done in Nextflow to simplify access to data residing in object stores in his talk “[Nextflow and the future of containers](https://www.youtube.com/watch?v=PTbiCVq0-sE&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=14)”.\n\n## Compute optimization\n\nAnother recurring theme was improving compute efficiency. Several talks discussed using containers more effectively, leveraging GPUs & FPGAs for added performance, improving virtual machine instance type selection, and automating resource requirements. Mike Smoot of Illumina talked about Nextflow, Kubernetes, and DRAGENs and how Illumina’s FPGA-based Bio-IT Platform can dramatically accelerate analysis. Venkat Malladi discussed efforts to suggest optimal VM types based on different standardized nf-core labels in the Azure cloud (process_low, process_medium, process_high, etc.) Finally, Evan Floden discussed [Nextflow Tower](https://www.youtube.com/watch?v=yJpN3fRSClA&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=22) and unveiled an exciting new [resource optimization feature](https://seqera.io/blog/optimizing-resource-usage-with-nextflow-tower/) that can intelligently tune pipeline resource requests to radically reduce cloud costs and improve run speed. Overall, the Nextflow community continues to make giant strides in improving efficiency and managing costs in the cloud.\n\n## Beyond genomics\n\nWhile most summit speakers focused on genomics, a few discussed data pipelines in other areas, including statistical modeling, analysis, and machine learning. Nicola Visonà from Università degli Studi di Macerata gave a fascinating talk about [using agent-based models to simulate the first industrial revolution](https://www.youtube.com/watch?v=PlKJ0IDV_ds&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=27). Similarly, Konrad Rokicki from the Janelia Research Campus explained how Janelia are using [Nextflow for petascale bioimaging data](https://www.youtube.com/watch?v=ZjSzx1I76z0&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=18) and why bioimage processing remains a large domain area with an unmet need for reproducible workflows.\n\n## Summit Announcements\n\nThis year’s summit also saw several exciting announcements from Nextflow developers. 
Paolo Di Tommaso, during his talk on [the future of containers](https://www.youtube.com/watch?v=PTbiCVq0-sE&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=14), announced the availability of [Nextflow 22.10.0](https://github.com/nextflow-io/nextflow/releases/tag/v22.10.0). In addition to various bug fixes, the latest Nextflow release introduces an exciting new technology called Wave that allows containers to be built on the fly from Dockerfiles or Conda recipes saved within a Nextflow pipeline. Wave also helps to simplify containerized pipeline deployment with features such as “container augmentation”, which lets developers inject new scripts and functionality into containers on the fly without needing to rebuild the base images; for example, adding the cloud-native [Fusion file system](https://www.nextflow.io/docs/latest/fusion.html) to an existing container. When used with Nextflow Tower, Wave also simplifies authentication to various public and private container registries. The latest Nextflow release also brings improved support for Kubernetes and enhancements to documentation, along with many other features.\n\nSeveral other announcements were made during [Evan Floden’s talk](https://www.youtube.com/watch?v=yJpN3fRSClA&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=22&t=127s), such as:\n\n- MultiQC is joining the Seqera Labs family of products\n- Fusion – a distributed virtual file system for cloud-native data pipelines\n- Nextflow Tower support for Google Cloud Batch\n- Nextflow Tower resource optimization\n- Improved Resource Labels support in Tower with integrations for cost accounting with all major cloud providers\n- A new Nextflow Tower dashboard coming soon, providing visibility across workspaces\n\n## Thank you to our sponsors\n\nThe summit organizers wish to extend a sincere thank you to the event sponsors: AWS, Google Cloud, Seqera Labs, Quilt Data, Oxford Nanopore Technologies, and Element BioSciences. In addition, the [Chan Zuckerberg Initiative](https://chanzuckerberg.com/eoss/) continues to play a key role with their EOSS grants funding important work related to Nextflow and the nf-core community. The success of this year’s summit reminds us of the tremendous value of community and the critical impact of open science software in improving the quality, accessibility, and efficiency of scientific research.\n\n## Learning more\n\nFor anyone who missed the summit, you can still watch the sessions or view the training sessions at your convenience:\n\n- Watch post-event recordings of the [Nextflow Summit on YouTube](https://www.youtube.com/playlist?list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32)\n- View replays of the recent online [Nextflow and nf-core training](https://nf-co.re/events/2022/training-october-2022)\n\nFor additional detail on the summit and the preceding nf-core events, also check out an excellent [summary of the event](https://mribeirodantas.xyz/blog/index.php/2022/10/27/nextflow-and-nf-core-hot-news/) written by Marcel Ribeiro-Dantas in his blog, the [Dataist Storyteller](https://mribeirodantas.xyz/blog/index.php/2022/10/27/nextflow-and-nf-core-hot-news/)!\n\n_In this event, we're showcasing some of the results of the project 'Optimization of computational resources for HPC workloads in the cloud using ML/AI' by Seqera Labs S.L. 
This project has been funded by the European Regional Development Fund (ERDF) of the European Union, coordinated and managed by RED.es, with the aim of carrying out the development of technological entrepreneurship and technological demand, within the framework of the Strategic Action on Digital Economy and Society of the State Program for R&D&I oriented towards societal challenges._\n\n![grant logos](/img/blog-2022-11-03--img1.png)", "images": [], "author": "Noel Ortiz", "tags": "nextflow,tower,cloud" @@ -466,7 +466,7 @@ "slug": "2022/nextflow-summit-call-for-abstracts", "title": "Nextflow Summit 2022", "date": "2022-06-17T00:00:00.000Z", - "content": "\n[As recently announced](https://twitter.com/nextflowio/status/1534903352810676224), we are super excited to host a new Nextflow community event late this year! The Nextflow Summit will take place **October 12-14, 2022** at the iconic Torre Glòries in Barcelona, with an associated [nf-core hackathon](https://nf-co.re/events/2022/hackathon-october-2022) beforehand.\n\n### Call for abstracts\n\nToday we’re excited to open the call for abstracts! We’re looking for talks and posters about anything and everything happening in the Nextflow world. Specifically, we’re aiming to shape the program into four key areas:\n\n- Nextflow: central tool / language / plugins\n- Community: pipelines / applications / use cases\n- Ecosystem: infrastructure / environments\n- Software: containers / tool packaging\n\nSpeaking at the summit will primarily be in-person, but we welcome posters from remote attendees. Posters will be submitted digitally and available online during and after the event. Talks will be streamed live and be available after the event.\n\n
Apply for a talk or poster
\n\n### Key dates\n\nRegistration for the event will happen separately, with key dates as follows (subject to change):\n\n- Jun 17: Call for abstracts opens\n- July 1: Registration opens\n- July 22: Call for abstracts closes\n- July 29: Accepted speakers notified\n- Sept 9: Registration closes\n- Oct 10-12: Hackathon\n- Oct 12-14: Summit\n\nAbstracts will be read and speakers notified on a rolling basis, so apply soon!\n\nThe Nextflow Summit will start Weds, Oct 12, 5:00 PM CEST and close Fri, Oct 14, 1:00 PM CEST.\n\n### Travel bursaries\n\nThanks to funding from the Chan Zuckerberg Initiative [EOSS Diversity & Inclusion grant](https://chanzuckerberg.com/eoss/proposals/nextflow-and-nf-core/), we are offering 5 bursaries for travel and accommodation. These will only be available to those who have applied to present a talk or poster and will cover up to $1500 USD, plus registration costs.\n\nIf you’re interested, please select this option when filling the abstracts application form and we will be in touch with more details.\n\n### Stay in the loop\n\nMore information about the summit will be available soon, as we continue to plan the event. Please visit [https://summit.nextflow.io](https://summit.nextflow.io) for details and to sign up to the email list for event updates.\n\n
Subscribe for updates
\n\nWe will be tweeting about the event using the [#NextflowSummit](http://twitter.com/hashtag/NextflowSummit) hashtag on Twitter. See you in Barcelona!\n", + "content": "[As recently announced](https://twitter.com/nextflowio/status/1534903352810676224), we are super excited to host a new Nextflow community event late this year! The Nextflow Summit will take place **October 12-14, 2022** at the iconic Torre Glòries in Barcelona, with an associated [nf-core hackathon](https://nf-co.re/events/2022/hackathon-october-2022) beforehand.\n\n### Call for abstracts\n\nToday we’re excited to open the call for abstracts! We’re looking for talks and posters about anything and everything happening in the Nextflow world. Specifically, we’re aiming to shape the program into four key areas:\n\n- Nextflow: central tool / language / plugins\n- Community: pipelines / applications / use cases\n- Ecosystem: infrastructure / environments\n- Software: containers / tool packaging\n\nSpeaking at the summit will primarily be in-person, but we welcome posters from remote attendees. Posters will be submitted digitally and available online during and after the event. Talks will be streamed live and be available after the event.\n\n[Apply for a talk or poster](https://seqera.typeform.com/summit-22-talks)\n\n### Key dates\n\nRegistration for the event will happen separately, with key dates as follows (subject to change):\n\n- Jun 17: Call for abstracts opens\n- July 1: Registration opens\n- July 22: Call for abstracts closes\n- July 29: Accepted speakers notified\n- Sept 9: Registration closes\n- Oct 10-12: Hackathon\n- Oct 12-14: Summit\n\nAbstracts will be read and speakers notified on a rolling basis, so apply soon!\n\nThe Nextflow Summit will start Weds, Oct 12, 5:00 PM CEST and close Fri, Oct 14, 1:00 PM CEST.\n\n### Travel bursaries\n\nThanks to funding from the Chan Zuckerberg Initiative [EOSS Diversity & Inclusion grant](https://chanzuckerberg.com/eoss/proposals/nextflow-and-nf-core/), we are offering 5 bursaries for travel and accommodation. These will only be available to those who have applied to present a talk or poster and will cover up to $1500 USD, plus registration costs.\n\nIf you’re interested, please select this option when filling the abstracts application form and we will be in touch with more details.\n\n### Stay in the loop\n\nMore information about the summit will be available soon, as we continue to plan the event. Please visit [https://summit.nextflow.io](https://summit.nextflow.io) for details and to sign up to the email list for event updates.\n\n[Subscribe for updates](https://share.hsforms.com/1F2Q5F0hSSiyNfuKo6tt-lw3zq3j)\n\nWe will be tweeting about the event using the [#NextflowSummit](http://twitter.com/hashtag/NextflowSummit) hashtag on Twitter. See you in Barcelona!", "images": [], "author": "Phil Ewels", "tags": "nextflow,summit,event,hackathon" @@ -475,7 +475,7 @@ "slug": "2022/rethinking-containers-for-cloud-native-pipelines", "title": "Rethinking containers for cloud native pipelines", "date": "2022-10-13T00:00:00.000Z", - "content": "\nContainers have become an essential part of well-structured data analysis pipelines. They encapsulate applications and dependencies in portable, self-contained packages that can be easily distributed. 
Containers are also key to enabling predictable and [reproducible results](https://www.nature.com/articles/nbt.3820).\n\nNextflow was one of the first workflow technologies to fully embrace [containers](https://www.nextflow.io/blog/2014/using-docker-in-hpc-cluster.html) for data analysis pipelines. Community curated container collections such as [BioContainers](https://biocontainers.pro/) also helped speed container adoption.\n\nHowever, the increasing complexity of data analysis pipelines and the need to deploy them across different clouds and platforms pose new challenges. Today, workflows may comprise dozens of distinct container images. Pipeline developers must manage and maintain these containers and ensure that their functionality precisely aligns with the requirements of every pipeline task.\n\nAlso, multi-cloud deployments and the increased use of private container registries further increase complexity for developers. Building and maintaining containers, pushing them to multiple registries, and dealing with platform-specific authentication schemes are tedious, time consuming, and a source of potential errors.\n\n## Wave – a game changer\n\nFor these reasons, we decided to fundamentally rethink how containers are deployed and managed in Nextflow. Today we are thrilled to announce Wave — a container provisioning and augmentation service that is fully integrated with the Nextflow and Nextflow Tower ecosystems.\n\nInstead of viewing containers as separate artifacts that need to be integrated into a pipeline, Wave allows developers to manage containers as part of the pipeline itself. This approach helps simplify development, improves reliability, and makes pipelines easier to maintain. It can even improve pipeline performance.\n\n## How container provisioning works with Wave\n\nInstead of creating container images, pushing them to registries, and referencing them using Nextflow's [container](https://www.nextflow.io/docs/latest/process.html#container) directive, Wave allows developers to simply include a Dockerfile in the directory where a process is defined.\n\nWhen a process runs, the new Wave plug-in for Nextflow takes the Dockerfile and submits it to the Wave service. Wave then builds a container on-the-fly, pushes it to a destination container registry, and returns the container used for the actual process execution. The Wave service also employs caching at multiple levels to ensure that containers are built only once or when there is a change in the corresponding Dockerfile.\n\nThe registry where images are stored can be specified in the Nextflow config file, along with the other pipeline settings. This means containers can be served from cloud registries closer to where pipelines execute, delivering better performance and reducing network traffic.\n\n![Wave diagram](/img/wave-diagram.png)\n\n## Nextflow, Wave, and Conda – a match made in heaven\n\n[Conda](https://conda.io/) is an excellent package manager, fully [supported in Nextflow](https://www.nextflow.io/blog/2018/conda-support-has-landed.html) as an alternative to using containers to manage software dependencies in pipelines. However, until now, Conda could not be easily used in cloud-native computing platforms such as AWS Batch or Kubernetes.\n\nWave provides developers with a powerful new way to leverage Conda in Nextflow by using a [conda](https://www.nextflow.io/docs/latest/process.html#conda) directive as an alternative way to provision containers in their pipelines. 
When Wave encounters the `conda` directive in a process definition, and no container or Dockerfile is present, Wave automatically builds a container based on the Conda recipe using the strategy described above. Wave makes this process exceptionally fast (at least compared to vanilla Conda) by leveraging with the [Micromamba](https://github.com/mamba-org/mamba) project under the hood.\n\n## Support for private registries\n\nA long-standing problem with containers in Nextflow was the lack of support for private container registries. Wave solves this problem by acting as an authentication proxy between the Docker client requesting the container and a target container repository. Wave relies on [Nextflow Tower](https://seqera.io/tower/) to authenticate user requests to container registries.\n\nTo access private container registries from a Nextflow pipeline, developers can simply specify their Tower access token in the pipeline configuration file and store their repository credentials in [Nextflow Tower](https://help.tower.nf/22.2/credentials/overview/) page in your account. Wave will automatically and securely use these credentials to authenticate to the private container registry.\n\n## But wait, there's more! Container augmentation!\n\nBy automatically building and provisioning containers, Wave dramatically simplifies how containers are handled in Nextflow. However, there are cases where organizations are required to use validated containers for security or policy reasons rather than build their own images, but still they need to provide additional functionality, like for example, adding site-specific scripts or logging agents while keeping the base container layers intact.\n\nNextflow allows for the definition of pipeline level (and more recently module level) scripts executed in the context of the task execution environment. These scripts can be made accessible to the container environment by mounting a host volume. However, this approach only works when using a local or shared file system.\n\nWave solves these problems by dynamically adding one or more layers to an existing container image during the container image download phase from the registry. Developers can use container augmentation to inject an arbitrary payload into any container without re-building it. Wave then recomputes the image's final manifest adding new layers and checksums on-the-fly, so that the final downloaded image reflects the added content.\n\nWith container augmentation, developers can include a directory called `resources` in pipeline [module directories](https://www.nextflow.io/docs/latest/dsl2.html#module-directory). When the corresponding containerized task is executed, Wave automatically mirrors the content of the resources directory in the root path of the container where it can be accessed by scripts running within the container.\n\n## A sneak preview of Fusion file system\n\nOne of the main motivations for implementing Wave is that we wanted to have the ability to easily package a Fusion client in containers to make this important functionality readily available in Nextflow pipelines.\n\nFusion implements a virtual distributed file system and presents a thin-client allowing data hosted in AWS S3 buckets to be accessed via the standard POSIX filesystem interface expected by the pipeline tools. This client runs in the task container and is added automatically via the Wave augmentation capability. 
This makes Fusion functionality available for pipeline execution at runtime.\n\nThis means the Nextflow pipeline can use an AWS S3 bucket as the work directory, and pipeline tasks can access the S3 bucket natively as a local file system path. This is an important innovation as it avoids the additional step of copying files in and out of object storage. Fusion takes advantage for the Nextflow tasks segregation and idempotent execution model to optimise and speedup file access operations.\n\n## Getting started\n\nWave requires Nextflow version 22.10.0 or later and can be enabled by using the `-with-wave` command line option or by adding the following snippet in your nextflow.config file:\n\n```\nwave {\n enabled = true\n strategy = 'conda,container'\n}\n\ntower {\n accessToken = \"\"\n}\n```\n\nThe use of the Tower access token is not mandatory, however, it's required to enable the access to private repositories. The use of authentication also allows higher service rate limits compared to anonymous users. You can run a Nextflow pipeline such as rnaseq-nf with Wave, as follows:\n\n```\nnextflow run nextflow-io/rnaseq-nf -with-wave\n```\n\nThe configuration in the nextflow.config snippet above will enable the provisioning of Wave containers created starting from the `conda` requirements specified in the pipeline processes.\n\nYou can find additional information and examples in the Nextflow [documentation](https://www.nextflow.io/docs/latest/wave.html) and in the Wave [showcase project](https://github.com/seqeralabs/wave-showcase).\n\n## Availability\n\nThe Wave container provisioning service is available free of charge as technology preview to all Nextflow and Tower users. Wave supports all major container registries including [Docker Hub](https://hub.docker.com/), [Quay.io](https://quay.io/), [AWS Elastic Container Registry](https://aws.amazon.com/ecr/), [Google Artifact Registry](https://cloud.google.com/artifact-registry) and [Azure Container Registry](https://azure.microsoft.com/en-us/products/container-registry/).\n\nDuring the preview period, anonymous users can build up to 10 container images per day and pull 100 containers per hour. Tower authenticated users can build 100 container images per hour and pull 1000 containers per minute. After the preview period, we plan to make the Wave service available free of charge to academic users and open-source software (OSS) projects.\n\n## Conclusion\n\nSoftware containers greatly simplify the deployment of complex data analysis pipelines. However, there still have been many challenges preventing organizations from fully unlocking the potential of this exciting technology. For too long, containers have been viewed as a replacement for package managers, but they serve a different purpose.\n\nIn our view, it's time to re-consider containers as monolithic artifacts that are assembled separately from pipeline code. Instead, containers should be viewed simply as an execution substrate facilitating the deployment of the pipeline software dependencies defined via a proper package manager such as Conda.\n\nWave, Nextflow, and Nextflow Tower combine to fully automate the container lifecycle including management, provisioning and dependencies of complex data pipelines on-demand while removing unnecessary error-prone manual steps.\n", + "content": "Containers have become an essential part of well-structured data analysis pipelines. They encapsulate applications and dependencies in portable, self-contained packages that can be easily distributed. 
Containers are also key to enabling predictable and [reproducible results](https://www.nature.com/articles/nbt.3820).\n\nNextflow was one of the first workflow technologies to fully embrace [containers](https://www.nextflow.io/blog/2014/using-docker-in-hpc-cluster.html) for data analysis pipelines. Community curated container collections such as [BioContainers](https://biocontainers.pro/) also helped speed container adoption.\n\nHowever, the increasing complexity of data analysis pipelines and the need to deploy them across different clouds and platforms pose new challenges. Today, workflows may comprise dozens of distinct container images. Pipeline developers must manage and maintain these containers and ensure that their functionality precisely aligns with the requirements of every pipeline task.\n\nAlso, multi-cloud deployments and the increased use of private container registries further increase complexity for developers. Building and maintaining containers, pushing them to multiple registries, and dealing with platform-specific authentication schemes are tedious, time consuming, and a source of potential errors.\n\n## Wave – a game changer\n\nFor these reasons, we decided to fundamentally rethink how containers are deployed and managed in Nextflow. Today we are thrilled to announce Wave — a container provisioning and augmentation service that is fully integrated with the Nextflow and Nextflow Tower ecosystems.\n\nInstead of viewing containers as separate artifacts that need to be integrated into a pipeline, Wave allows developers to manage containers as part of the pipeline itself. This approach helps simplify development, improves reliability, and makes pipelines easier to maintain. It can even improve pipeline performance.\n\n## How container provisioning works with Wave\n\nInstead of creating container images, pushing them to registries, and referencing them using Nextflow's [container](https://www.nextflow.io/docs/latest/process.html#container) directive, Wave allows developers to simply include a Dockerfile in the directory where a process is defined.\n\nWhen a process runs, the new Wave plug-in for Nextflow takes the Dockerfile and submits it to the Wave service. Wave then builds a container on-the-fly, pushes it to a destination container registry, and returns the container used for the actual process execution. The Wave service also employs caching at multiple levels to ensure that containers are built only once or when there is a change in the corresponding Dockerfile.\n\nThe registry where images are stored can be specified in the Nextflow config file, along with the other pipeline settings. This means containers can be served from cloud registries closer to where pipelines execute, delivering better performance and reducing network traffic.\n\n![Wave diagram](/img/wave-diagram.png)\n\n## Nextflow, Wave, and Conda – a match made in heaven\n\n[Conda](https://conda.io/) is an excellent package manager, fully [supported in Nextflow](https://www.nextflow.io/blog/2018/conda-support-has-landed.html) as an alternative to using containers to manage software dependencies in pipelines. However, until now, Conda could not be easily used in cloud-native computing platforms such as AWS Batch or Kubernetes.\n\nWave provides developers with a powerful new way to leverage Conda in Nextflow by using a [conda](https://www.nextflow.io/docs/latest/process.html#conda) directive as an alternative way to provision containers in their pipelines. 
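\n\nAs a minimal sketch of what this looks like in practice (the process name, tool, and version below are purely illustrative), a process simply declares its Conda requirements with the directive:\n\n```\nprocess FASTQC {\n    conda 'bioconda::fastqc=0.11.9'   // Conda recipe; Wave can build a container from it on the fly\n\n    input:\n    path reads\n\n    script:\n    \"\"\"\n    fastqc $reads\n    \"\"\"\n}\n```\n\n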
When Wave encounters the `conda` directive in a process definition, and no container or Dockerfile is present, Wave automatically builds a container based on the Conda recipe using the strategy described above. Wave makes this process exceptionally fast (at least compared to vanilla Conda) by leveraging with the [Micromamba](https://github.com/mamba-org/mamba) project under the hood.\n\n## Support for private registries\n\nA long-standing problem with containers in Nextflow was the lack of support for private container registries. Wave solves this problem by acting as an authentication proxy between the Docker client requesting the container and a target container repository. Wave relies on [Nextflow Tower](https://seqera.io/tower/) to authenticate user requests to container registries.\n\nTo access private container registries from a Nextflow pipeline, developers can simply specify their Tower access token in the pipeline configuration file and store their repository credentials in [Nextflow Tower](https://help.tower.nf/22.2/credentials/overview/) page in your account. Wave will automatically and securely use these credentials to authenticate to the private container registry.\n\n## But wait, there's more! Container augmentation!\n\nBy automatically building and provisioning containers, Wave dramatically simplifies how containers are handled in Nextflow. However, there are cases where organizations are required to use validated containers for security or policy reasons rather than build their own images, but still they need to provide additional functionality, like for example, adding site-specific scripts or logging agents while keeping the base container layers intact.\n\nNextflow allows for the definition of pipeline level (and more recently module level) scripts executed in the context of the task execution environment. These scripts can be made accessible to the container environment by mounting a host volume. However, this approach only works when using a local or shared file system.\n\nWave solves these problems by dynamically adding one or more layers to an existing container image during the container image download phase from the registry. Developers can use container augmentation to inject an arbitrary payload into any container without re-building it. Wave then recomputes the image's final manifest adding new layers and checksums on-the-fly, so that the final downloaded image reflects the added content.\n\nWith container augmentation, developers can include a directory called `resources` in pipeline [module directories](https://www.nextflow.io/docs/latest/dsl2.html#module-directory). When the corresponding containerized task is executed, Wave automatically mirrors the content of the resources directory in the root path of the container where it can be accessed by scripts running within the container.\n\n## A sneak preview of Fusion file system\n\nOne of the main motivations for implementing Wave is that we wanted to have the ability to easily package a Fusion client in containers to make this important functionality readily available in Nextflow pipelines.\n\nFusion implements a virtual distributed file system and presents a thin-client allowing data hosted in AWS S3 buckets to be accessed via the standard POSIX filesystem interface expected by the pipeline tools. This client runs in the task container and is added automatically via the Wave augmentation capability. 
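\n\nA rough sketch of what switching this on can look like in nextflow.config (option names follow the Nextflow Fusion and Wave documentation; the bucket name is a placeholder):\n\n```\nwave {\n    enabled = true              // Fusion is delivered to task containers through Wave augmentation\n}\n\nfusion {\n    enabled = true              // mount the Fusion client inside each task container\n}\n\nworkDir = 's3://my-bucket/work'   // hypothetical S3 bucket used as the pipeline work directory\n```\n\n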
This makes Fusion functionality available for pipeline execution at runtime.\n\nThis means the Nextflow pipeline can use an AWS S3 bucket as the work directory, and pipeline tasks can access the S3 bucket natively as a local file system path. This is an important innovation as it avoids the additional step of copying files in and out of object storage. Fusion takes advantage for the Nextflow tasks segregation and idempotent execution model to optimise and speedup file access operations.\n\n## Getting started\n\nWave requires Nextflow version 22.10.0 or later and can be enabled by using the `-with-wave` command line option or by adding the following snippet in your nextflow.config file:\n\n```\nwave {\n enabled = true\n strategy = 'conda,container'\n}\n\ntower {\n accessToken = \"\"\n}\n```\n\nThe use of the Tower access token is not mandatory, however, it's required to enable the access to private repositories. The use of authentication also allows higher service rate limits compared to anonymous users. You can run a Nextflow pipeline such as rnaseq-nf with Wave, as follows:\n\n```\nnextflow run nextflow-io/rnaseq-nf -with-wave\n```\n\nThe configuration in the nextflow.config snippet above will enable the provisioning of Wave containers created starting from the `conda` requirements specified in the pipeline processes.\n\nYou can find additional information and examples in the Nextflow [documentation](https://www.nextflow.io/docs/latest/wave.html) and in the Wave [showcase project](https://github.com/seqeralabs/wave-showcase).\n\n## Availability\n\nThe Wave container provisioning service is available free of charge as technology preview to all Nextflow and Tower users. Wave supports all major container registries including [Docker Hub](https://hub.docker.com/), [Quay.io](https://quay.io/), [AWS Elastic Container Registry](https://aws.amazon.com/ecr/), [Google Artifact Registry](https://cloud.google.com/artifact-registry) and [Azure Container Registry](https://azure.microsoft.com/en-us/products/container-registry/).\n\nDuring the preview period, anonymous users can build up to 10 container images per day and pull 100 containers per hour. Tower authenticated users can build 100 container images per hour and pull 1000 containers per minute. After the preview period, we plan to make the Wave service available free of charge to academic users and open-source software (OSS) projects.\n\n## Conclusion\n\nSoftware containers greatly simplify the deployment of complex data analysis pipelines. However, there still have been many challenges preventing organizations from fully unlocking the potential of this exciting technology. For too long, containers have been viewed as a replacement for package managers, but they serve a different purpose.\n\nIn our view, it's time to re-consider containers as monolithic artifacts that are assembled separately from pipeline code. 
Instead, containers should be viewed simply as an execution substrate facilitating the deployment of the pipeline software dependencies defined via a proper package manager such as Conda.\n\nWave, Nextflow, and Nextflow Tower combine to fully automate the container lifecycle including management, provisioning and dependencies of complex data pipelines on-demand while removing unnecessary error-prone manual steps.\n", "images": [], "author": "Paolo Di Tommaso", "tags": "nextflow,tower,cloud" @@ -484,7 +484,7 @@ "slug": "2022/turbocharging-nextflow-with-fig", "title": "Turbo-charging the Nextflow command line with Fig!", "date": "2022-09-22T00:00:00.000Z", - "content": "\nNextflow is a powerful workflow manager that supports multiple container technologies, cloud providers and HPC job schedulers. It shouldn't be a surprise that wide ranging functionality leads to a complex interface, but comes with the drawback of many subcommands and options to remember. For a first-time user (and sometimes even for some long-time users) it can be difficult to remember everything. This is not a new problem for the command-line; even very common applications such as grep and tar are famous for having a bewildering array of options.\n\n![xkcd charge making fun of tar tricky command line arguments](/img/xkcd_tar_charge.png)\nhttps://xkcd.com/1168/\n\nMany tools have sprung up to make the command-line more user friendly, such as tldr pages and rich-click. [Fig](https://fig.io) is one such tool that adds powerful autocomplete functionality to your terminal. Fig gives you graphical popups with color-coded contexts more dynamic than shaded text for recent commands or long blocks of text after pressing tab.\n\nFig is compatible with most terminals, shells and IDEs (such as the VSCode terminal), is fully supported in MacOS, and has beta support for Linux and Windows. In MacOS, you can simply install it with `brew install --cask fig` and then running the `fig` command to set it up.\n\nWe have now added Nextflow for Fig. Thanks to Figs open source core we were able to contribute specifications in Typescript that will now be automatically added for anyone installing or updating Fig. Now, with Fig, when you start typing your Nextflow commands, you’ll see autocomplete suggestions based on what you are typing and what you have typed in the past, such as your favorite options.\n\n![GIF with a demo of nextflow log/list subcommands](/img/nxf-log-list-params.gif)\n\nThe Fig autocomplete functionality can also be adjusted to suit our preferences. Suggestions can be displayed in alphabetical order or as a list of your most recent commands. Similarly, suggestions can be displayed all the time or only when you press tab.\n\nThe Fig specification that we've written not only suggests commands and options, but dynamic inputs too. For example, finding previous run names when resuming or cleaning runs is tedious and error prone. Similarly, pipelines that you’ve already downloaded with `nextflow pull` will be autocompleted if they have been run in the past. You won't have to remember the full names anymore, as Fig generators in the autocomplete allow you to automatically complete the run name after typing a few letters where a run name is expected. Importantly, this also works for pipeline names!\n\n![GIF with a demo of nextflow pull/run/clean/view/config subcommands](/img/nxf-pull-run-clean-view-config.gif)\n\nFig for Nextflow will make you increase your productivity regardless of your user level. 
If you run multiple pipelines during your day you will immediately see the benefit of Fig. Your productivity will increase by taking advantage of this autocomplete function for run and project names. For Nextflow newcomers it will provide an intuitive way to explore the Nextflow CLI with built-in help text.\n\nWhile Fig won’t replace the need to view help menus and documentation it will undoubtedly save you time and energy searching for commands and copying and pasting run names. Take your coding to the next level using Fig!\n", + "content": "Nextflow is a powerful workflow manager that supports multiple container technologies, cloud providers and HPC job schedulers. It shouldn't be a surprise that wide ranging functionality leads to a complex interface, but comes with the drawback of many subcommands and options to remember. For a first-time user (and sometimes even for some long-time users) it can be difficult to remember everything. This is not a new problem for the command-line; even very common applications such as grep and tar are famous for having a bewildering array of options.\n\n![xkcd charge making fun of tar tricky command line arguments](/img/xkcd_tar_charge.png)\nhttps://xkcd.com/1168/\n\nMany tools have sprung up to make the command-line more user friendly, such as tldr pages and rich-click. [Fig](https://fig.io) is one such tool that adds powerful autocomplete functionality to your terminal. Fig gives you graphical popups with color-coded contexts more dynamic than shaded text for recent commands or long blocks of text after pressing tab.\n\nFig is compatible with most terminals, shells and IDEs (such as the VSCode terminal), is fully supported in MacOS, and has beta support for Linux and Windows. In MacOS, you can simply install it with `brew install --cask fig` and then running the `fig` command to set it up.\n\nWe have now added Nextflow for Fig. Thanks to Figs open source core we were able to contribute specifications in Typescript that will now be automatically added for anyone installing or updating Fig. Now, with Fig, when you start typing your Nextflow commands, you’ll see autocomplete suggestions based on what you are typing and what you have typed in the past, such as your favorite options.\n\n![GIF with a demo of nextflow log/list subcommands](/img/nxf-log-list-params.gif)\n\nThe Fig autocomplete functionality can also be adjusted to suit our preferences. Suggestions can be displayed in alphabetical order or as a list of your most recent commands. Similarly, suggestions can be displayed all the time or only when you press tab.\n\nThe Fig specification that we've written not only suggests commands and options, but dynamic inputs too. For example, finding previous run names when resuming or cleaning runs is tedious and error prone. Similarly, pipelines that you’ve already downloaded with `nextflow pull` will be autocompleted if they have been run in the past. You won't have to remember the full names anymore, as Fig generators in the autocomplete allow you to automatically complete the run name after typing a few letters where a run name is expected. Importantly, this also works for pipeline names!\n\n![GIF with a demo of nextflow pull/run/clean/view/config subcommands](/img/nxf-pull-run-clean-view-config.gif)\n\nFig for Nextflow will make you increase your productivity regardless of your user level. If you run multiple pipelines during your day you will immediately see the benefit of Fig. 
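\n\nTo make that concrete, these are the kinds of commands where remembering or pasting generated names is usually the slow part (the pipeline and run names below are placeholders):\n\n```bash\n# pull a pipeline once; its name can later be autocompleted\nnextflow pull nf-core/rnaseq\n\n# list previous executions together with their run names\nnextflow log\n\n# resume the latest run, or clean up an old one by its generated name\nnextflow run nf-core/rnaseq -resume\nnextflow clean -f agitated_avogadro\n```\n\n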
Your productivity will increase by taking advantage of this autocomplete function for run and project names. For Nextflow newcomers it will provide an intuitive way to explore the Nextflow CLI with built-in help text.\n\nWhile Fig won’t replace the need to view help menus and documentation it will undoubtedly save you time and energy searching for commands and copying and pasting run names. Take your coding to the next level using Fig!", "images": [], "author": "Marcel Ribeiro-Dantas", "tags": "nextflow,development,learning" @@ -493,7 +493,7 @@ "slug": "2023/a-nextflow-docker-murder-mystery-the-mysterious-case-of-the-oom-killer", "title": "A Nextflow-Docker Murder Mystery: The mysterious case of the “OOM killer”", "date": "2023-06-19T00:00:00.000Z", - "content": "\nMost support tickets crossing our desks don’t warrant a blog article. However, occasionally we encounter a genuine mystery—a bug so pervasive and vile that it threatens innocent containers and pipelines everywhere. Such was the case of the **_OOM killer_**.\n\nIn this article, we alert our colleagues in the Nextflow community to the threat. We also discuss how to recognize the killer’s signature in case you find yourself dealing with a similar murder mystery in your own cluster or cloud.\n\n\n\n## To catch a killer\n\nIn mid-2022, Nextflow jobs began to mysteriously die. Containerized tasks were being struck down in the prime of life, seemingly at random. By November, the body count was beginning to mount: Out-of-memory (OOM) errors were everywhere we looked!\n\nIt became clear that we had a serial killer on our hands. Unfortunately, identifying a suspect turned out to be easier said than done. Nextflow is rather good at restarting failed containers after all, giving the killer a convenient alibi and plenty of places to hide. Sometimes, the killings went unnoticed, requiring forensic analysis of log files.\n\nWhile we’ve made great strides, and the number of killings has dropped dramatically, the killer is still out there. In this article, we offer some tips that may prove helpful if the killer strikes in your environment.\n\n## Establishing an MO\n\nFortunately for our intrepid investigators, the killer exhibited a consistent _modus operandi_. Containerized jobs on [Amazon EC2](https://aws.amazon.com/ec2/) were being killed due to out-of-memory (OOM) errors, even when plenty of memory was available on the container host. While we initially thought the killer was native to the AWS cloud, we later realized it could also strike in other locales.\n\nWhat the killings had in common was that they tended to occur when Nextflow tasks copied large files from Amazon S3 to a container’s local file system via the AWS CLI. As some readers may know, Nextflow leverages the AWS CLI behind the scenes to facilitate data movement. The killer’s calling card was an `[Errno 12] Cannot allocate memory` message, causing the container to terminate with an exit status of 1.\n\n```\nNov-08 21:54:07.926 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'NFCORE_SAREK:SAREK:MARKDUPLICATES:BAM_TO_CRAM:SAMTOOLS_STATS_CRAM (004-005_L3.SSHT82)'\nCaused by:\n Essential container in task exited\n..\nCommand error:\n download failed: s3://myproject/NFTower-Ref/Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.fasta to ./Homo_sapiens_assembly38.fasta [Errno 12] Cannot allocate memory\n```\n\nThe problem is illustrated in the diagram below. 
In theory, Nextflow should have been able to dispatch multiple containerized tasks to a single host. However, tasks were being killed with out-of-memory errors even though plenty of memory was available. Rather than being able to run many containers per host, we could only run two or three and even that was dicey! Needless to say, this resulted in a dramatic loss of efficiency.\n\n\n\nAmong our crack team of investigators, alarm bells began to ring. We asked ourselves, _“Could the killer be inside the house?”_ Was it possible that Nextflow was nefariously killing its own containerized tasks?\n\nBefore long, reports of similar mysterious deaths began to trickle in from other jurisdictions. It turned out that the killer had struck [Cromwell](https://cromwell.readthedocs.io/en/stable/) also ([see the police report here](https://github.com/aws/aws-cli/issues/5876)). We breathed a sigh of relief that we could rule out Nextflow as the culprit, but we still had a killer on the loose and a series of container murders to solve!\n\n## Recreating the scene of the crime\n\nAs any good detective knows, recreating the scene of the crime is a good place to start. It turned out that our killer had a profile and had been targeting containers processing large datasets since 2020. We came across an excellent [codefresh.io article](https://codefresh.io/blog/docker-memory-usage/) by Saffi Hartal, discussing similar murders and suggesting techniques to lure the killer out of hiding and protect the victims. Unfortunately, the suggested workaround of periodically clearing kernel buffers was impractical in our Nextflow pipeline scenario.\n\nWe borrowed the Python script from [Saffi’s article](https://codefresh.io/blog/docker-memory-usage/) designed to write huge files and simulate the issues we saw with the Linux buffer and page cache. Using this script, we hoped to replicate the conditions at the time of the murders.\n\nUsing separate SSH sessions to the same docker host, we manually launched the Python script from the command line to run in a Docker container, allocating 512MB of memory to each container. This was meant to simulate the behavior of the Nextflow head job dispatching multiple tasks to the same Docker host. We monitored memory usage as each container was started.\n\n```bash\n$ docker run --rm -it -v $PWD/dockertest.py:/dockertest.py --entrypoint /bin/bash --memory=\"512M\" --memory-swap=0 python:3.10.5-slim-bullseye\n```\n\nSure enough, we found that containers began dying with out-of-memory errors. Sometimes we could run a single container, and sometimes we could run two. Containers died even though memory use was well under the cgroups-enforced maximum, as reported by docker stats. As containers ran, we also used the Linux `free` command to monitor memory usage and the combined memory used by kernel buffers and the page cache.\n\n## Developing a theory of the case\n\nFrom our testing, we were able to clear both Nextflow and the AWS S3 copy facility since we could replicate the out-of-memory error in our controlled environment independent of both.\n\nWe had multiple theories of the case: **_Was it Colonel Mustard with an improper cgroups configuration? Was it Professor Plum and the size of the SWAP partition? Was it Mrs. 
Peacock running a Linux 5.20 kernel?_**\n\n_For the millennials and Gen Zs in the crowd, you can find a primer on the CLUE/Cluedo references [here](https://en.wikipedia.org/wiki/Cluedo)_\n\nTo make a long story short, we identified several suspects and conducted tests to clear each suspect one by one. Tests included the following:\n\n- We conducted tests with EBS vs. NVMe disk volumes to see if the error was related to page caches when using EBS. The problems persisted with NVMe but appeared to be much less severe.\n- We attempted to configure a swap partition as recommended in this [AWS article](https://repost.aws/knowledge-center/ecs-resolve-outofmemory-errors), which discusses similar out-of-memory errors in Amazon ECS (used by AWS Batch). AWS provides good documentation on managing container [swap space](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/container-swap.html) using the `--memory-swap` switch. You can learn more about how Docker manages swap space in the [Docker documentation](https://docs.docker.com/config/containers/resource_constraints/).\n- Creating swap files on the Docker host and making swap available to containers using the switch `--memory-swap=\"1g\"` appeared to help, and we learned a lot in the process. Using this workaround we could reliably run 10 containers simultaneously, whereas previously, we could run only one or two. This was a good workaround for static clusters but wasn’t always helpful in cloud batch environments. Creating the swap partition requires root privileges, and in batch environments, where resources may be provisioned automatically, this could be difficult to implement. It also didn’t explain the root cause of why containers were being killed. You can use the commands below to create a swap partition:\n\n```bash\n$ sudo dd if=/dev/zero of=/mnt/2GiB.swap bs=2048 count=1048576\n$ sudo mkswap /mnt/2GiB.swap\n$ sudo swapon /mnt/2GiB.swap\n```\n\n## A break in the case!\n\nOn Nov 16th, we finally caught a break in the case. A hot tip from Seqera Labs’ own [Jordi Deu-Pons](https://github.com/jordeu) indicated that the culprit might be lurking in the Linux kernel. He suggested hard coding limits for two Linux kernel parameters as follows:\n\n```bash\n$ echo \"838860800\" > /proc/sys/vm/dirty_bytes\n$ echo \"524288000\" > /proc/sys/vm/dirty_background_bytes\n```\n\nWhile it may seem like a rather unusual and specific leap of brilliance, our tipster’s hypothesis was inspired by this [kernel bug](https://bugzilla.kernel.org/show_bug.cgi?id=207273) description. With this simple change, the memory usage for each container, as reported by docker stats, dropped dramatically. **Suddenly, we could run as many containers simultaneously as physical memory would allow.** It turns out that this was a regression bug that only manifested in newer versions of the Linux kernel.\n\nBy hardcoding these [kernel parameters](https://docs.kernel.org/admin-guide/sysctl/vm.html), we were limiting the number of dirty pages the kernel could hold before writing pages to disk. When these variables were not set, they defaulted to 0, and the default parameters `dirty_ratio` and `dirty_background_ratio` took effect instead.\n\nIn high-load conditions (such as data-intensive Nextflow pipeline tasks), processes accumulated dirty pages faster than the kernel could flush them to disk, eventually leading to the out-of-memory condition. By hard coding the dirty pages limit, we forced the kernel to flush the dirty pages to disk, thereby avoiding the bug. 
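\n\nIf you need those limits to survive a reboot of the Docker host, one way to persist them (a sketch; the file name under /etc/sysctl.d is arbitrary and root privileges are required) is:\n\n```bash\n$ sudo tee /etc/sysctl.d/90-dirty-bytes.conf <<'EOF'\nvm.dirty_bytes = 838860800\nvm.dirty_background_bytes = 524288000\nEOF\n$ sudo sysctl --system   # reload settings from all sysctl configuration files\n```\n\n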
This also explained why the problem was less pronounced using NVMe storage, where flushing to disk occurred more quickly, thus mitigating the problem.\n\nFurther testing determined that the bug appeared reliably on the newer [AMI Linux 2 AMI using the 5.10 kernel](https://aws.amazon.com/about-aws/whats-new/2021/11/amazon-linux-2-ami-kernel-5-10/). The bug did not seem to appear when using the older Amazon Linux 2 AMI running the 4.14 kernel version.\n\nWe now had two solid strategies to resolve the problem and thwart our killer:\n\n- Create a swap partition and run containers with the `--memory-swap` flag set.\n- Set `dirty_bytes` and `dirty_background_bytes` kernel variables on the Docker host before launching the jobs.\n\n## The killer is (mostly) brought to justice\n\nAvoiding the Linux 5.10 kernel was obviously not a viable option. The 5.10 kernel includes support for important processor architectures such as Intel® Ice Lake. This bug did not manifest earlier because, by default, AWS Batch was using ECS-optimized AMIs based on the 4.14 kernel. Further testing showed us that the killer could still appear in 4.14 environments, but the bug was harder to trigger.\n\nWe ended up working around the problem for Nextflow Tower users by tweaking the kernel parameters in the compute environment deployed by Tower Forge. This solution works reliably with AMIs based on both the 4.14 and 5.10 kernels. We considered adding a swap partition as this was another potential solution to the problem. However, we were concerned that this could have performance implications, particularly for customers running with EBS gp2 magnetic disk storage.\n\nInterestingly, we also tested the [Fusion v2 file system](https://seqera.io/fusion/) with NVMe disk. Using Fusion, we avoided the bug entirely on both kernel versions without needing to adjust kernel partitions or add a swap partition.\n\n## Some helpful investigative tools\n\nIf you find evidence of foul play in your cloud or cluster, here are some useful investigative tools you can use:\n\n- After manually starting a container, use [docker stats](https://docs.docker.com/engine/reference/commandline/stats/) to monitor the CPU and memory used by each container compared to available memory.\n\n ```bash\n $ watch docker stats\n ```\n\n- The Linux [free](https://linuxhandbook.com/free-command/) utility is an excellent way to monitor memory usage. You can track total, used, and free memory and monitor the combined memory used by kernel buffers and page cache reported in the _buff/cache_ column.\n\n ```bash\n $ free -h\n ```\n\n- After a container was killed, we executed the command below on the Docker host to confirm why the containerized Python script was killed.\n\n ```bash\n $ dmesg -T | grep -i ‘killed process’\n ```\n\n- We used the Linux [htop](https://man7.org/linux/man-pages/man1/htop.1.html) command to monitor CPU and memory usage to check the results reported by Docker and double-check CPU and memory use.\n- You can use the command [systemd-cgtop](https://www.commandlinux.com/man-page/man1/systemd-cgtop.1.html) to validate group settings and ensure you are not running into arbitrary limits imposed by _cgroups_.\n- Related to the _cgroups_ settings described above, you can inspect various memory-related limits directly from the file system. You can also use an alias to make the large numbers associated with _cgroups_ parameters easier to read. 
For example:\n\n ```bash\n $ alias n='numft --to=iec-i'\n $ cat /sys/fs/cgroup/memory/docker/DOCKER_CONTAINER/memory.limit_in_bytes | n\n 512Mi\n ```\n\n- You can clear the kernel buffer and page cache that appears in the buff/cache columns reported by the Linux _free_ command using either of these commands:\n\n ```bash\n $ echo 1 > /proc/sys/vm/drop_caches\n $ sysctl -w vm.drop_caches=1\n ```\n\n## The bottom line\n\nWhile we’ve come a long way in bringing the killer to justice, out-of-memory issues still crop up occasionally. It’s hard to say whether these are copycats, but you may still run up against this bug in a dark alley near you!\n\nIf you run into similar problems, hopefully, some of the suggestions offered above, such as tweaking kernel parameters or adding a swap partition on the Docker host, can help.\n\nFor some users, a good workaround is to use the [Fusion file system](https://seqera.io/blog/breakthrough-performance-and-cost-efficiency-with-the-new-fusion-file-system/) instead of Nextflow’s conventional approach based on the AWS CLI. As explained above, the combination of more efficient data handling in Fusion and fast NVMe storage means that dirty pages are flushed more quickly, and containers are less likely to reach hard limits and exit with an out-of-memory error.\n\nYou can learn more about the Fusion file system by downloading the whitepaper [Breakthrough performance and cost-efficiency with the new Fusion file system](https://seqera.io/whitepapers/breakthrough-performance-and-cost-efficiency-with-the-new-fusion-file-system/). If you encounter similar issues or have ideas to share, join the discussion on the [Nextflow Slack channel](https://join.slack.com/t/nextflow/shared_invite/zt-11iwlxtw5-R6SNBpVksOJAx5sPOXNrZg).\n", + "content": "Most support tickets crossing our desks don’t warrant a blog article. However, occasionally we encounter a genuine mystery—a bug so pervasive and vile that it threatens innocent containers and pipelines everywhere. Such was the case of the **_OOM killer_**.\n\nIn this article, we alert our colleagues in the Nextflow community to the threat. We also discuss how to recognize the killer’s signature in case you find yourself dealing with a similar murder mystery in your own cluster or cloud.\n\n\n\n## To catch a killer\n\nIn mid-2022, Nextflow jobs began to mysteriously die. Containerized tasks were being struck down in the prime of life, seemingly at random. By November, the body count was beginning to mount: Out-of-memory (OOM) errors were everywhere we looked!\n\nIt became clear that we had a serial killer on our hands. Unfortunately, identifying a suspect turned out to be easier said than done. Nextflow is rather good at restarting failed containers after all, giving the killer a convenient alibi and plenty of places to hide. Sometimes, the killings went unnoticed, requiring forensic analysis of log files.\n\nWhile we’ve made great strides, and the number of killings has dropped dramatically, the killer is still out there. In this article, we offer some tips that may prove helpful if the killer strikes in your environment.\n\n## Establishing an MO\n\nFortunately for our intrepid investigators, the killer exhibited a consistent _modus operandi_. Containerized jobs on [Amazon EC2](https://aws.amazon.com/ec2/) were being killed due to out-of-memory (OOM) errors, even when plenty of memory was available on the container host. 
While we initially thought the killer was native to the AWS cloud, we later realized it could also strike in other locales.\n\nWhat the killings had in common was that they tended to occur when Nextflow tasks copied large files from Amazon S3 to a container’s local file system via the AWS CLI. As some readers may know, Nextflow leverages the AWS CLI behind the scenes to facilitate data movement. The killer’s calling card was an `[Errno 12] Cannot allocate memory` message, causing the container to terminate with an exit status of 1.\n\n```\nNov-08 21:54:07.926 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'NFCORE_SAREK:SAREK:MARKDUPLICATES:BAM_TO_CRAM:SAMTOOLS_STATS_CRAM (004-005_L3.SSHT82)'\nCaused by:\n Essential container in task exited\n..\nCommand error:\n download failed: s3://myproject/NFTower-Ref/Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.fasta to ./Homo_sapiens_assembly38.fasta [Errno 12] Cannot allocate memory\n```\n\nThe problem is illustrated in the diagram below. In theory, Nextflow should have been able to dispatch multiple containerized tasks to a single host. However, tasks were being killed with out-of-memory errors even though plenty of memory was available. Rather than being able to run many containers per host, we could only run two or three and even that was dicey! Needless to say, this resulted in a dramatic loss of efficiency.\n\n\n\nAmong our crack team of investigators, alarm bells began to ring. We asked ourselves, _“Could the killer be inside the house?”_ Was it possible that Nextflow was nefariously killing its own containerized tasks?\n\nBefore long, reports of similar mysterious deaths began to trickle in from other jurisdictions. It turned out that the killer had struck [Cromwell](https://cromwell.readthedocs.io/en/stable/) also ([see the police report here](https://github.com/aws/aws-cli/issues/5876)). We breathed a sigh of relief that we could rule out Nextflow as the culprit, but we still had a killer on the loose and a series of container murders to solve!\n\n## Recreating the scene of the crime\n\nAs any good detective knows, recreating the scene of the crime is a good place to start. It turned out that our killer had a profile and had been targeting containers processing large datasets since 2020. We came across an excellent [codefresh.io article](https://codefresh.io/blog/docker-memory-usage/) by Saffi Hartal, discussing similar murders and suggesting techniques to lure the killer out of hiding and protect the victims. Unfortunately, the suggested workaround of periodically clearing kernel buffers was impractical in our Nextflow pipeline scenario.\n\nWe borrowed the Python script from [Saffi’s article](https://codefresh.io/blog/docker-memory-usage/) designed to write huge files and simulate the issues we saw with the Linux buffer and page cache. Using this script, we hoped to replicate the conditions at the time of the murders.\n\nUsing separate SSH sessions to the same docker host, we manually launched the Python script from the command line to run in a Docker container, allocating 512MB of memory to each container. This was meant to simulate the behavior of the Nextflow head job dispatching multiple tasks to the same Docker host. 
We monitored memory usage as each container was started.\n\n```bash\n$ docker run --rm -it -v $PWD/dockertest.py:/dockertest.py --entrypoint /bin/bash --memory=\"512M\" --memory-swap=0 python:3.10.5-slim-bullseye\n```\n\nSure enough, we found that containers began dying with out-of-memory errors. Sometimes we could run a single container, and sometimes we could run two. Containers died even though memory use was well under the cgroups-enforced maximum, as reported by docker stats. As containers ran, we also used the Linux `free` command to monitor memory usage and the combined memory used by kernel buffers and the page cache.\n\n## Developing a theory of the case\n\nFrom our testing, we were able to clear both Nextflow and the AWS S3 copy facility since we could replicate the out-of-memory error in our controlled environment independent of both.\n\nWe had multiple theories of the case: **_Was it Colonel Mustard with an improper cgroups configuration? Was it Professor Plum and the size of the SWAP partition? Was it Mrs. Peacock running a Linux 5.20 kernel?_**\n\n_For the millennials and Gen Zs in the crowd, you can find a primer on the CLUE/Cluedo references [here](https://en.wikipedia.org/wiki/Cluedo)_\n\nTo make a long story short, we identified several suspects and conducted tests to clear each suspect one by one. Tests included the following:\n\n- We conducted tests with EBS vs. NVMe disk volumes to see if the error was related to page caches when using EBS. The problems persisted with NVMe but appeared to be much less severe.\n- We attempted to configure a swap partition as recommended in this [AWS article](https://repost.aws/knowledge-center/ecs-resolve-outofmemory-errors), which discusses similar out-of-memory errors in Amazon ECS (used by AWS Batch). AWS provides good documentation on managing container [swap space](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/container-swap.html) using the `--memory-swap` switch. You can learn more about how Docker manages swap space in the [Docker documentation](https://docs.docker.com/config/containers/resource_constraints/).\n- Creating swap files on the Docker host and making swap available to containers using the switch `--memory-swap=\"1g\"` appeared to help, and we learned a lot in the process. Using this workaround we could reliably run 10 containers simultaneously, whereas previously, we could run only one or two. This was a good workaround for static clusters but wasn’t always helpful in cloud batch environments. Creating the swap partition requires root privileges, and in batch environments, where resources may be provisioned automatically, this could be difficult to implement. It also didn’t explain the root cause of why containers were being killed. You can use the commands below to create a swap partition:\n\n```bash\n$ sudo dd if=/dev/zero of=/mnt/2GiB.swap bs=2048 count=1048576\n$ mkswap /mnt/2GiB.swap\n$ swapon /mnt/2GiB.swap\n```\n\n## A break in the case!\n\nOn Nov 16th, we finally caught a break in the case. A hot tip from Seqera Lab’s own [Jordi Deu-Pons](https://github.com/jordeu), indicated the culprit may be lurking in the Linux kernel. 
He suggested hard coding limits for two Linux kernel parameters as follows:\n\n```bash\n$ echo \"838860800\" > /proc/sys/vm/dirty_bytes\n$ echo \"524288000\" > /proc/sys/vm/dirty_background_bytes\n```\n\nWhile it may seem like a rather unusual and specific leap of brilliance, our tipster’s hypothesis was inspired by this [kernel bug](https://bugzilla.kernel.org/show_bug.cgi?id=207273) description. With this simple change, the memory usage for each container, as reported by docker stats, dropped dramatically. **Suddenly, we could run as many containers simultaneously as physical memory would allow.** It turns out that this was a regression bug that only manifested in newer versions of the Linux kernel.\n\nBy hardcoding these [kernel parameters](https://docs.kernel.org/admin-guide/sysctl/vm.html), we were limiting the number of dirty pages the kernel could hold before writing pages to disk. When these variables were not set, they defaulted to 0, and the default parameters `dirty_ratio` and `dirty_background_ratio` took effect instead.\n\nIn high-load conditions (such as data-intensive Nextflow pipeline tasks), processes accumulated dirty pages faster than the kernel could flush them to disk, eventually leading to the out-of-memory condition. By hard coding the dirty pages limit, we forced the kernel to flush the dirty pages to disk, thereby avoiding the bug. This also explained why the problem was less pronounced using NVMe storage, where flushing to disk occurred more quickly, thus mitigating the problem.\n\nFurther testing determined that the bug appeared reliably on the newer [Amazon Linux 2 AMI using the 5.10 kernel](https://aws.amazon.com/about-aws/whats-new/2021/11/amazon-linux-2-ami-kernel-5-10/). The bug did not seem to appear when using the older Amazon Linux 2 AMI running the 4.14 kernel version.\n\nWe now had two solid strategies to resolve the problem and thwart our killer:\n\n- Create a swap partition and run containers with the `--memory-swap` flag set.\n- Set `dirty_bytes` and `dirty_background_bytes` kernel variables on the Docker host before launching the jobs.\n\n## The killer is (mostly) brought to justice\n\nAvoiding the Linux 5.10 kernel was obviously not a viable option. The 5.10 kernel includes support for important processor architectures such as Intel® Ice Lake. This bug did not manifest earlier because, by default, AWS Batch was using ECS-optimized AMIs based on the 4.14 kernel. Further testing showed us that the killer could still appear in 4.14 environments, but the bug was harder to trigger.\n\nWe ended up working around the problem for Nextflow Tower users by tweaking the kernel parameters in the compute environment deployed by Tower Forge. This solution works reliably with AMIs based on both the 4.14 and 5.10 kernels. We considered adding a swap partition as this was another potential solution to the problem. However, we were concerned that this could have performance implications, particularly for customers running with EBS gp2 disk storage.\n\nInterestingly, we also tested the [Fusion v2 file system](https://seqera.io/fusion/) with NVMe disk.
Using Fusion, we avoided the bug entirely on both kernel versions without needing to adjust kernel parameters or add a swap partition.\n\n## Some helpful investigative tools\n\nIf you find evidence of foul play in your cloud or cluster, here are some useful investigative tools you can use:\n\n- After manually starting a container, use [docker stats](https://docs.docker.com/engine/reference/commandline/stats/) to monitor the CPU and memory used by each container compared to available memory.\n\n ```bash\n $ watch docker stats\n ```\n\n- The Linux [free](https://linuxhandbook.com/free-command/) utility is an excellent way to monitor memory usage. You can track total, used, and free memory and monitor the combined memory used by kernel buffers and page cache reported in the _buff/cache_ column.\n\n ```bash\n $ free -h\n ```\n\n- After a container was killed, we executed the command below on the Docker host to confirm why the containerized Python script was killed.\n\n ```bash\n $ dmesg -T | grep -i 'killed process'\n ```\n\n- We used the Linux [htop](https://man7.org/linux/man-pages/man1/htop.1.html) command to monitor CPU and memory usage to check the results reported by Docker and double-check CPU and memory use.\n- You can use the command [systemd-cgtop](https://www.commandlinux.com/man-page/man1/systemd-cgtop.1.html) to validate cgroup settings and ensure you are not running into arbitrary limits imposed by _cgroups_.\n- Related to the _cgroups_ settings described above, you can inspect various memory-related limits directly from the file system. You can also use an alias to make the large numbers associated with _cgroups_ parameters easier to read. For example:\n\n ```bash\n $ alias n='numfmt --to=iec-i'\n $ cat /sys/fs/cgroup/memory/docker/DOCKER_CONTAINER/memory.limit_in_bytes | n\n 512Mi\n ```\n\n- You can clear the kernel buffer and page cache that appear in the _buff/cache_ column reported by the Linux _free_ command using either of these commands:\n\n ```bash\n $ echo 1 > /proc/sys/vm/drop_caches\n $ sysctl -w vm.drop_caches=1\n ```\n\n## The bottom line\n\nWhile we’ve come a long way in bringing the killer to justice, out-of-memory issues still crop up occasionally. It’s hard to say whether these are copycats, but you may still run up against this bug in a dark alley near you!\n\nIf you run into similar problems, hopefully, some of the suggestions offered above, such as tweaking kernel parameters or adding a swap partition on the Docker host, can help.\n\nFor some users, a good workaround is to use the [Fusion file system](https://seqera.io/blog/breakthrough-performance-and-cost-efficiency-with-the-new-fusion-file-system/) instead of Nextflow’s conventional approach based on the AWS CLI. As explained above, the combination of more efficient data handling in Fusion and fast NVMe storage means that dirty pages are flushed more quickly, and containers are less likely to reach hard limits and exit with an out-of-memory error.\n\nYou can learn more about the Fusion file system by downloading the whitepaper [Breakthrough performance and cost-efficiency with the new Fusion file system](https://seqera.io/whitepapers/breakthrough-performance-and-cost-efficiency-with-the-new-fusion-file-system/).
If you encounter similar issues or have ideas to share, join the discussion on the [Nextflow Slack channel](https://join.slack.com/t/nextflow/shared_invite/zt-11iwlxtw5-R6SNBpVksOJAx5sPOXNrZg).", "images": [ "/img/a-nextflow-docker-murder-mystery-the-mysterious-case-of-the-oom-killer-1.jpg" ], @@ -504,7 +504,7 @@ "slug": "2023/best-practices-deploying-pipelines-with-hpc-workload-managers", "title": "Nextflow on BIG IRON: Twelve tips for improving the effectiveness of pipelines on HPC clusters", "date": "2023-05-26T00:00:00.000Z", - "content": "\nWith all the focus on cloud computing, it's easy to forget that most Nextflow pipelines still run on traditional HPC clusters. In fact, according to our latest [State of the Workflow 2023](https://seqera.io/blog/the-state-of-the-workflow-the-2023-nextflow-and-nf-core-community-survey/) community survey, **62.8%** of survey respondents report running Nextflow on HPC clusters, and **75%** use an HPC workload manager.1 While the cloud is making gains, traditional clusters aren't going away anytime soon.\n\nTapping cloud infrastructure offers many advantages in terms of convenience and scalability. However, for organizations with the capacity to manage in-house clusters, there are still solid reasons to run workloads locally:\n\n- _Guaranteed access to resources_. Users don't need to worry about shortages of particular instance types, spot instance availability, or exceeding cloud spending caps.\n- _Predictable pricing_. Organizations are protected against price inflation and unexpected rate increases by capitalizing assets and depreciating them over time.\n- _Reduced costs_. Contrary to conventional wisdom, well-managed, highly-utilized, on-prem clusters are often less costly per core hour than cloud-based alternatives.\n- _Better performance and throughput_. While HPC infrastructure in the cloud is impressive, state-of-the-art on-prem clusters are still tough to beat.2\n\nThis article provides some helpful tips for organizations running Nextflow on HPC clusters.\n\n## The anatomy of an HPC cluster\n\nHPC Clusters come in many shapes and sizes. Some are small, consisting of a single head node and a few compute hosts, while others are huge, with tens or even hundreds of host computers.\n\nThe diagram below shows the topology of a typical mid-sized HPC cluster. Clusters typically have one or more \"head nodes\" that run workload and/or cluster management software. Cluster managers, such as [Warewulf](https://warewulf.lbl.gov/), [xCAT](https://xcat.org/), [NVIDIA Bright Cluster Manager](https://www.nvidia.com/en-us/data-center/bright-cluster-manager/), [HPE Performance Cluster Manager](https://www.hpe.com/psnow/doc/a00044858enw), or [IBM Spectrum Cluster Foundation](https://www.ibm.com/docs/en/scf/4.2.2?topic=guide-spectrum-cluster-foundation), are typically used to manage software images and provision cluster nodes. Large clusters may have multiple head nodes, with workload management software configured to failover if the master host fails.\n\n\n\nLarge clusters may have dedicated job submission hosts (also called login hosts) so that user activity does not interfere with scheduling and management activities on the head node. In smaller environments, users may simply log in to the head node to submit their jobs.\n\nClusters are often composed of different compute hosts suited to particular workloads.3 They may also have separate dedicated networks for management, internode communication, and connections to a shared storage subsystem. 
Users typically have network access only to the head node(s) and job submission hosts and are prevented from connecting to the compute hosts directly.\n\nDepending on the workloads a cluster is designed to support, compute hosts may be connected via a private high-speed 100 GbE or Infiniband-based network commonly used for MPI parallel workloads. Cluster hosts typically have access to a shared file system as well. In life sciences environments, NFS filers are commonly used. However, high-performance clusters may use parallel file systems such as [Lustre](https://www.lustre.org/), [IBM Spectrum Scale](https://www.ibm.com/docs/en/storage-scale?topic=STXKQY/gpfsclustersfaq.html) (formerly GPFS), [BeeGFS](https://www.beegfs.io/c/), or [WEKA](https://www.weka.io/data-platform/solutions/hpc-data-management/).\n\n[Learn about selecting the right storage architecture for your Nextflow pipelines](https://nextflow.io/blog/2023/selecting-the-right-storage-architecture-for-your-nextflow-pipelines.html).\n\n## HPC workload managers\n\nHPC workload managers have been around for decades. Initial efforts date back to the original [Portable Batch System](https://www.chpc.utah.edu/documentation/software/pbs-scheduler.php) (PBS) developed for NASA in the early 1990s. While modern workload managers have become enormously sophisticated, many of their core principles remain unchanged.\n\nWorkload managers are designed to share resources efficiently between users and groups. Modern workload managers support many different scheduling policies and workload types — from parallel jobs to array jobs to interactive jobs to affinity/NUMA-aware scheduling. As a result, schedulers have many \"knobs and dials\" to support various applications and use cases. While complicated, all of this configurability makes them extremely powerful and flexible in the hands of a skilled cluster administrator.\n\n### Some notes on terminology\n\nHPC terminology can be confusing because different terms sometimes refer to the same thing. Nextflow refers to individual steps in a workflow as a \"process\". Sometimes, process steps spawned by Nextflow are also described as \"tasks\". When Nextflow processes are dispatched to an HPC workload manager, however, each process is managed as a \"job\" in the context of the workload manager.\n\nHPC workload managers are sometimes referred to as schedulers. In this text, we use the terms HPC workload manager, workload manager, and scheduler interchangeably.\n\n## Nextflow and HPC workload managers\n\nNextflow supports at least **14 workload managers**, not including popular cloud-based compute services. This number is even higher if one counts variants of popular schedulers. For example, the Grid Engine executor works with Altair® Grid Engine™ as well as older Grid Engine dialects, including Oracle Grid Engine (previously Sun Grid Engine), Open Grid Engine (OGE), and SoGE (son of Grid Engine). Similarly, the PBS integration works successors to the original OpenPBS project, including Altair® PBS Professional®, TORQUE, and Altair's more recent open-source version, OpenPBS.4 Workload managers supported by Nextflow are listed below:\n\n\n\nBelow we present some helpful tips and best practices when working with HPC workload managers.\n\n## Some best practices\n\n### 1. 
Select an HPC executor\n\nTo ensure that pipelines are portable across clouds and HPC clusters, Nextflow uses the notion of [executor](https://nextflow.io/docs/latest/executor.html) to insulate pipelines from the underlying compute environment. A Nextflow executor determines the system where a pipeline is run and supervises its execution.\n\nYou can specify the executor to use in the [nextflow.config](https://nextflow.io/docs/latest/config.html?highlight=queuesize#configuration-file) file, inline in your pipeline code, or by setting the shell variable `NXF_EXECUTOR` before running a pipeline.\n\n```groovy\nprocess.executor = 'slurm'\n```\n\nExecutors are defined as part of the process scope in Nextflow, so in theory, each process can have a different executor. You can use the [local](https://www.nextflow.io/docs/latest/executor.html?highlight=local#local) executor to run a process on the same host as the Nextflow head job rather than dispatching it to an HPC cluster.\n\nA complete list of available executors is available in the [Nextflow documentation](https://nextflow.io/docs/latest/executor.html). Below is a handy list of executors for HPC workload managers.\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
\n Workload Manager\n \n Executor\n \n License\n \n Documentation\n
\n Slurm\n \n slurm\n \n Open source\n \n Slurm\n
\n IBM Spectrum LSF\n \n lsf\n \n Commercial\n \n IBM Spectrum LSF knowledge center\n
\n OpenPBS\n \n pbspro\n \n Open source\n \n OpenPBS (docs packaged with software)\n
\n Altair® Grid Engine™\n \n sge\n \n Commercial\n \n Altair Grid Engine introductory guide\n
\n Altair® PBS Professional®\n \n pbspro\n \n Commercial\n \n Altair PBS Professional user's guide\n
\n Adaptive Computing MOAB\n \n moab\n \n Commercial\n \n Adaptive Computing Maui Scheduler5\n
\n Adaptive Computing TORQUE\n \n pbs\n \n Open source\n \n Torque administrators guide\n
\n HTCondor\n \n condor\n \n Open source\n \n HTCondor documentation\n
\n Apache Ignite\n \n ignite\n \n Open source\n \n Apache Ignite Documentation\n
\n HyperQueue\n \n hyperqueue\n \n Open source\n \n Docs on GitHub\n
\n\n### 2. Select a queue\n\nMost HPC workload managers support the notion of queues. In a small cluster with a few users, queues may not be important. However, they are essential in large environments. Cluster administrators typically configure queues to reflect site-specific scheduling and resource-sharing policies. For example, a site may have a short queue that only supports short-running jobs and kills them after 60 seconds. A _night_ queue may only dispatch jobs between midnight and 6:00 AM. Depending on the sophistication of the workload manager, different queues may have different priorities and access to queues may be limited to particular users or groups.\n\nWorkload managers typically have default queues. For example, `normal` is the default queue in LSF, while `all.q` is the default queue in Grid Engine. Slurm supports the notion of partitions that are essentially the same as queues, so Slurm partitions are referred to as queues within Nextflow. You should ask your HPC cluster administrator what queue to use when submitting Nextflow jobs.\n\nLike the executor, queues are part of the process scope. The queue to dispatch jobs to is usually defined once in the `nextflow.config` file and applied to all processes in the workflow as shown below, or it can be set per-process.\n\n```\nprocess {\n queue = 'myqueue'\n executor = 'sge'\n}\n```\n\nSome organizations use queues as a mechanism to request particular types of resources. For example, suppose hosts with the latest NVIDIA A100 or K100 GPUs are in high demand. In that case, a cluster administrator may configure a particular queue called `gpu_queue` to dispatch jobs to those hosts and limit access to specific users. For process steps requiring access to GPUs, the administrator may require submitting jobs to this queue. This is why it is important to consult site-specific documentation or ask your cluster administrator which queues are available.\n\n### 3. Specify process-level resource requirements\n\nDepending on the executor, you can pass various resource requirements for each process/job to the workload manager. Like _executors_ and _queues_, these settings are configured at the process level. Not all executors support the same resource directives, but the settings below are common to most HPC workload managers.\n\n[cpus](https://nextflow.io/docs/latest/process.html#process-cpus) – specifies the number of logical CPUs requested for a particular process/job. A logical CPU maps to a physical processor core or thread depending on whether hyperthreading is enabled on the underlying cluster hosts.\n\n[memory](https://nextflow.io/docs/latest/process.html#process-memory) – different process steps/jobs will typically have different memory requirements. It is important to specify memory requirements accurately because the HPC schedulers use this information to decide how many jobs can execute concurrently on a host. If you overstate resource requirements, you are wasting resources on the cluster.\n\n[time](https://nextflow.io/docs/latest/process.html#process-time) – it is helpful to limit how much time a particular process or job is allowed to run. To avoid jobs hanging and consuming resources indefinitely, you can specify a time limit after which a job will be automatically terminated and re-queued. Time limits may also be enforced at the queue level behind the scenes based on workload management policies. 
If you have long-running jobs, your cluster administrator may ask you to use a particular queue for those Nextflow process steps to prevent jobs from being automatically killed.6\n\nWhen writing pipelines, it is a good practice to consolidate per-process resource requirements in the `nextflow.config` file, and use process selectors to indicate what resource requirements apply to what process steps. For example, in the example below, processes will be dispatched to the Slurm cluster by default. Each process will require two cores, 4 GB of memory, and can run for no more than 10 minutes. For the foo and long-running bar jobs, process-specific selectors can override these default settings as shown below:\n\n```groovy\nprocess {\n executor = 'slurm'\n queue = 'general'\n cpus = 2\n memory = '4 GB'\n time = '10m'\n\n\n withName: foo {\n cpus = 8\n memory = '8 GB'\n }\n\n\n withName: bar {\n queue = 'long'\n cpus = 32\n memory = '8 GB'\n time = '1h 30m'\n }\n}\n```\n\n### 4. Take advantage of workload manager-specific features\n\nSometimes, organizations may want to take advantage of syntax specific to a particular workload manager. To accommodate this, most Nextflow executors provide a `clusterOptions` setting to inject one or more switches to the job submission command line specific to the selected workload manager ([bsub](https://www.ibm.com/docs/en/spectrum-lsf/10.1.0?topic=bsub-options), [msub](http://docs.adaptivecomputing.com/maui/commands/msub.php), [qsub](https://gridscheduler.sourceforge.net/htmlman/htmlman1/qsub.html), etc).\n\nThese scheduler-specific commands can get very detailed and granular. They can apply to all processes in a workflow or only to specific processes. As an LSF-specific example, suppose a deep learning model training workload is a step in a Nextflow pipeline. The deep learning framework used may be GPU-aware and have specific topology requirements.\n\nIn this example, we specify a job consisting of two tasks where each task runs on a separate host and requires exclusive use of two GPUs. We also impose a resource requirement that we want to schedule the CPU portion of each CUDA job in physical proximity to the GPU to improve performance (on a processor core close to the same PCIe or NVLink connection, for example).\n\n```groovy\nprocess {\n withName: dl_workload {\n executor = 'lsf'\n queue = 'gpu_hosts'\n memory = '16B'\n clusterOptions = '-gpu \"num=2:mode=exclusive_process\" -n2 -R \"span[ptile=1] affinity[core(1)]\"'\n }\n}\n```\n\nIn addition to `clusterOptions`, several other settings in the [executor scope](https://nextflow.io/docs/latest/config.html?highlight=queuesize#scope-executor) can be helpful when controlling how jobs behave on an HPC workload manager.\n\n### 5. Decide where to launch your pipeline\n\nLaunching jobs from a head node is common in small HPC clusters. Launching jobs from dedicated job submission hosts (sometimes called login hosts) is more common in large environments. Depending on the workload manager, the head node or job submission host will usually have the workload manager’s client tools pre-installed. These include client binaries such as `sbatch` (Slurm), `qsub` (PBS or Grid Engine), or `bsub` (LSF). Nextflow expects to be able to find these job submission commands on the Linux `PATH`.\n\nRather than launching the Nextflow driver job for a long-running pipeline from the head node or a job submission host, a better practice is to wrap the Nextflow run command in a script and submit the entire workflow as a job. 
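\n\nFor instance, on a Slurm cluster a minimal wrapper might look like the sketch below; the partition name, resource values, and log file names are placeholders to adapt to your site:\n\n```bash\n$ cat submit_pipeline.sh\n#!/bin/bash\n#SBATCH --job-name=headjob\n#SBATCH --partition=nextflow        # placeholder partition/queue name\n#SBATCH --cpus-per-task=2\n#SBATCH --mem=16G\n#SBATCH --time=24:00:00\n#SBATCH --output=out.%j\n#SBATCH --error=err.%j\nnextflow run nextflow-io/hello -c my.config -ansi-log false\n\n$ sbatch submit_pipeline.sh\n```\n\n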
An example using LSF is provided below:\n\n```\n$ cat submit_pipeline.sh\n#!/bin/bash\n#BSUB -q Nextflow\n#BSUB -m \"hostgroupA\"\n#BSUB -o out.%J\n#BSUB -e err.%J\n#BSUB -J headjob\n#BSUB -R \"rusage[mem=16GB]\"\nnextflow run nextflow-io/hello -c my.config -ansi-log false\n\n\n$ bsub < submit_pipeline.sh\n```\n\nThe specifics will depend on the cluster environment and how the environment is configured. For this to work, the job submission commands must also be available on the execution hosts to which the head job is dispatched. This is not always the case, so you should check with your HPC cluster administrator.\n\nDepending on the workload manager, check your queue or cluster configuration to ensure that submitted jobs can spawn other jobs and that you do not bump up against hard limits. For example, Slurm by default allows a job step to spawn up to 512 tasks per node.7\n\n### 6. Limit your heap size\n\nSetting the JVM’s max heap size is another good practice when running on an HPC cluster. The Nextflow runtime runs on top of a Java virtual machine which, by design, tries to allocate as much memory as possible. To avoid this, specify the initial and maximum heap sizes available to the Java VM using the `-Xms` and `-Xmx` Java flags.\n\nThese can be specified using the `NXF_OPTS` environment variable.\n\n```bash\nexport NXF_OPTS=\"-Xms512m -Xmx8g\"\n```\n\nThe `-Xms` flag specifies the minimum heap size, and `-Xmx` specifies the maximum heap size. In the example above, the minimum heap size is set to 512 MB, which can grow to a maximum of 8 GB. You will need to experiment with appropriate values for each pipeline to determine how many concurrent head jobs you can run on the same host.\n\nFor more information about memory management with Java, consult this [Oracle documentation regarding tuning JVMs](https://docs.oracle.com/cd/E21764_01/web.1111/e13814/jvm_tuning.htm#PERFM150).\n\n### 7. Use the scratch directive\n\nNextflow requires a shared file system path as a working directory to allow the pipeline tasks to share data with each other. When using this model, a common practice is to use the node's local scratch storage as the working directory.
This avoids cluster nodes needing to simultaneously read and write files to a shared network file system, which can become a bottleneck.\n\nNextflow implements this best practice which can be enabled by adding the following setting in your `nextflow.config` file.\n\n```groovy\nprocess.scratch = true\n```\n\nBy default, if you enable `process.scratch`, Nextflow will use the directory pointed to by `$TMPDIR` as a scratch directory on the execution host.\n\nYou can optionally specify a specific path for the scratch directory as shown:\n\n```groovy\nprocess.scratch = '/ssd_drive/scratch_dir'\n```\n\nWhen the scratch directive is enabled, Nextflow:\n\n- Creates a unique directory for process execution in the supplied scratch directory;\n- Creates a symbolic link in the scratch directory for each input file in the shared work directory required for job execution;\n- Runs the job using the local scratch path as the working directory;\n- Copies output files to the job's shared work directory on the shared file system when the job is complete.\n\nScratch storage is particularly beneficial for process steps that perform a lot of file system I/O or create large numbers of intermediate files.\n\nTo learn more about Nextflow and how it works with various storage architectures, including shared file systems, check out our recent article [Selecting the right storage architecture for your Nextflow pipelines](https://nextflow.io/blog/2023/selecting-the-right-storage-architecture-for-your-nextflow-pipelines.html).\n\n### 8. Launch pipelines in the background\n\nIf you are launching your pipeline from a login node or cluster head node, it is useful to run pipelines in the background without losing the execution output reported by Nextflow. You can accomplish this by using the -bg switch in Nextflow and redirecting _stdout_ to a log file as shown:\n\n```bash\nnextflow run -bg > my-file.log\n```\n\nThis frees up the interactive command line to run commands such as [squeue](https://slurm.schedmd.com/squeue.html) (Slurm) or [qstat](https://gridscheduler.sourceforge.net/htmlman/htmlman1/qstat.html) (Grid Engine) to monitor job execution on the cluster. It is also beneficial because it prevents network connection issues from interfering with pipeline execution.\n\nNextflow has rich terminal logging and uses ANSI escape codes to update pipeline execution counters interactively as the pipeline runs. If you are logging output to a file as shown above, it is a good idea to disable ANSI logging using the command line option `-ansi-log false` or the environment variable `NXF_ANSI_LOG=false`. ANSI logging can also be disabled when wrapping the Nextflow head job in a script and launching it as a job managed by the workload manager as explained above.\n\n### 9. Retry failing jobs after increasing resource allocation\n\nGetting resource requirements such as cpu, memory, and time is often challenging since resource requirements can vary depending on the size of the dataset processed by each job step. If you request too much resource, you end up wasting resources on the cluster and reducing the effectiveness of the compute environment for everyone. On the other hand, if you request insufficient resources, process steps can fail.\n\nTo address this problem, Nextflow provides a mechanism that allows you to modify the amount of computing resources requested in the case of a process failure on the fly and attempt to re-execute it using a higher limit. 
For example:\n\n```groovy\nprocess {\n withName: foo {\n memory = { 2.GB * task.attempt }\n time = { 1.hour * task.attempt }\n\n errorStrategy = { task.exitStatus in 137..140 ? 'retry' : 'terminate' }\n maxRetries = 3\n }\n}\n```\n\nYou can manage how many times a job can be retried and specify different behaviours depending on the exit error code. You will see this automated mechanism used in many production pipelines. It is a common practice to double the resources requested after a failure until the job runs successfully.\n\nFor sites running Nextflow Tower, Tower has a powerful resource optimization facility built in that essentially learns per-process resource requirements from previously executed pipelines and auto-generates resource requirements that can be placed in a pipeline's `nextflow.config` file. By using resource optimization in Tower, pipelines will request only the resources that they actually need. This avoids unnecessary delays due to failed/retried jobs and also uses the shared cluster more efficiently.\n\nTower resource optimizations works with all HPC workload managers as well as popular cloud services. You can learn more about resource optimization in the article [Optimizing resource usage with Nextflow Tower](https://seqera.io/blog/optimizing-resource-usage-with-nextflow-tower/).\n\n### 10. Cloud Bursting\n\nCloud bursting is a configuration method in hybrid cloud environments where cloud computing resources are used automatically whenever on-premises infrastructure reaches peak capacity. The idea is that when sites run out of compute capacity on their local infrastructure, they can dynamically burst additional workloads to the cloud.\n\nWith its built-in support for cloud executors, Nextflow handles bursting to the cloud with ease, but it is important to remember that large HPC sites run other workloads beyond Nextflow pipelines. As such, they often have their own bursting solutions tightly coupled to the workload manager.\n\nCommercial HPC schedulers tend to have facilities for cloud bursting built in. While there are many ways to enable burstings, and implementations vary by workload manager, a few examples are provided here:\n\n- Open source Slurm provides a native mechanism to burst workloads to major cloud providers when local cluster resources are fully subscribed. To learn more, see the Slurm [Cloud Scheduling Guide](https://slurm.schedmd.com/elastic_computing.html).\n- IBM Spectrum LSF provides a cloud resource connector enabling policy-driven cloud bursting to various clouds. See the [IBM Spectrum LSF Resource Connector](https://www.ibm.com/docs/en/spectrum-lsf/10.1.0?topic=lsf-resource-connnector) documentation for details.\n- Altair PBS Professional also provides sophisticated support for cloud bursting to multiple clouds, with cloud cost integration features that avoid overspending in the cloud. [See PBS Professional 2022.1](https://altair.service-now.com/community?sys_id=0e9b07dadbf8d150cfd5f6a4e2961997&view=sp&id=community_blog&table=sn_communities_blog).\n- Adaptive Computing offers [Moab Cloud/NODUS Cloud Bursting](https://support.adaptivecomputing.com/wp-content/uploads/2018/08/Moab_Cloud-NODUS_Cloud_Bursting_datasheet_web.pdf), a commercial offering that works with an extensive set of resource providers including AliCloud, OCI, OpenStack, VMware vSphere, and others.\n\nData handling makes cloud bursting complex. 
Some HPC centers deploy solutions that provide a consistent namespace where on-premises and cloud-based nodes have a consistent view of a shared file system.\n\nIf you are in a larger facility, it's worth having a discussion with your HPC cluster administrator. Cloud bursting may be handled automatically for you. You may be able to use the executor associated with your on-premises workload manager, and simply point your workloads to a particular queue. The good news is that Nextflow provides you with tremendous flexibility.\n\n### 11. Fusion file system\n\nTraditionally, on-premises clusters have used a local shared file system such as NFS or Lustre. The new Fusion file system provides an alternative way to manage data.\n\nFusion is a lightweight, POSIX-compliant file system deployed inside containers that provides transparent access to cloud-based object stores such as Amazon S3. While users running pipelines on local clusters may not have considered using cloud storage, doing so has some advantages:\n\n- Cloud object storage is economical for long-term storage.\n- Object stores such as Amazon S3 provided virtually unlimited capacity.\n- Many reference datasets in life sciences already reside in cloud object stores.\n\nIn cloud computing environments, Fusion FS has demonstrated that it can improve pipeline throughput by up to **2.2x** and reduce long-term cloud storage costs by up to **76%**. To learn more about Fusion file systems and how it works, you can download the whitepaper [Breakthrough performance and cost-efficiency with the new Fusion file system](https://seqera.io/whitepapers/breakthrough-performance-and-cost-efficiency-with-the-new-fusion-file-system/).\n\nRecently, Fusion support has been added for selected HPC workload managers including Slurm, IBM Spectrum LSF, and Grid Engine. This is an exciting development as it enables on-premises cluster users to seamlessly run workload locally using cloud-based storage with minimal configuration effort.\n\n### 12. Additional configuration options\n\nThere are several additional Nextflow configuration options that are important to be aware of when working with HPC clusters. You can find a complete list in the Netflow documentation in the [Scope executor](https://nextflow.io/docs/latest/config.html#scope-executor) section.\n\n`queueSize` – The queueSize parameter is optionally defined in the `nextflow.config` file or within a process and defines how many Nextflow processes can be queued in the selected workload manager at a given time. By default, this value is set to 100 jobs. In large sites with multiple users, HPC cluster administrators may limit the number of pending or executing jobs per user on the cluster. For example, on an LSF cluster, this is done by setting the parameter `MAX_JOBS` in the `lsb.users` file to enforce per user or per group slot limits. If your administrators have placed limits on the number of jobs you can run, you should tune the `queueSize` parameter in Nextflow to match your site enforced maximums.\n\n`submitRateLimit` – Depending on the scheduler, having many users simultaneously submitting large numbers of jobs to a cluster can overwhelm the scheduler on the head node and cause it to become unresponsive to commands. To mitigate this, if your pipeline submits a large number of jobs, it is a good practice to throttle the rate at which jobs will be dispatched from Nextflow. By default the job submission rate is unlimited. 
If you wanted to allow no more than 50 jobs to be submitted every two minutes, set this parameter as shown:\n\n```groovy\nexecutor.submitRateLimit = '50/2min'\nexecutor.queueSize = 50\n```\n\n`jobName` – Many workload managers have interactive web interfaces or downstream reporting or analysis tools for monitoring or analyzing workloads. A few examples include [Slurm-web](http://rackslab.github.io/slurm-web/introduction.html), [MOAB HPC Suite](https://adaptivecomputing.com/moab-hpc-suite/) (MOAB and Torque), [Platform Management Console](https://www.ibm.com/docs/en/pasc/1.1.1?topic=asc-platform-management-console) (for LSF), [Spectrum LSF RTM](https://www.ibm.com/docs/en/spectrum-lsf-rtm/10.2.0?topic=about-spectrum-lsf-rtm), and [Altair® Access™](https://altair.com/access).\n\nWhen using these tools, it is helpful to associate a meaningful name with each job. Remember, a job in the context of the workload manager maps to a process or task in Nextflow. Use the `jobName` property associated with the executor to give your job a name. You can construct these names dynamically as illustrated below so the job reported by the workload manager reflects the name of our Nextflow process step and its unique ID.\n\n```groovy\nexecutor.jobName = { \"$task.name - $task.hash\" }\n```\n\nYou will need to make sure that generated name matches the validation constraints of the underlying workload manager. This also makes troubleshooting easier because it allows you to cross reference Nextflow log files with files generated by the workload manager.\n\n## The bottom line\n\nIn addition to supporting major cloud environments, Nextflow works seamlessly with a wide variety of on-premises workload managers. If you are fortunate enough to have access to large-scale compute infrastructure at your facility, taking advantage of these powerful HPC workload management integrations is likely the way to go.\n\n
\n\n1While this may sound like a contradiction, remember that HPC workload managers can also run in the cloud.\n\n2A cloud vCPU is equivalent to a thread on a multicore CPU, and HPC workloads often run with hyperthreading disabled for the best performance. As a result, you may need 64 vCPUs in the cloud to match the performance of a 32-core processor SKU on-premises. Similarly, interconnects such as Amazon Elastic Fabric Adapter (EFA) deliver impressive performance. However, even with high-end cloud instance types, its 100 Gbps throughput falls short compared to interconnects such as [NDR InfiniBand](https://www.hpcwire.com/2020/11/16/nvidia-mellanox-debuts-ndr-400-gigabit-infiniband-at-sc20/) and [HPE Cray Slingshot](https://www.nextplatform.com/2022/01/31/crays-slingshot-interconnect-is-at-the-heart-of-hpes-hpc-and-ai-ambitions/), delivering 400 Gbps or more.\n\n3While MPI parallel jobs are less common in Nextflow pipelines, sites may also run fluid dynamics, computational chemistry, or molecular dynamics workloads using tools such as [NWChem](https://www.nwchem-sw.org/) or [GROMACS](https://www.gromacs.org/) that rely on MPI and fast interconnects to facilitate efficient inter-node communication.\n\n4Altair’s open-source OpenPBS is distinct from the original OpenPBS project released in 1998 of the same name.\n\n5MOAB HPC is a commercial product offered by Adaptive Computing. Its scheduler is based on the open source Maui scheduler.\n\n6For some workload managers, knowing how much time a job is expected to run is considered in scheduling algorithms. For example, suppose it becomes necessary to preempt particular jobs when higher priority jobs come along or because of resource ownership issues. In that case, the scheduler may take into account actual vs. estimated runtime to avoid terminating long-running jobs that are closed to completion.\n\n7[MaxTasksPerNode](https://slurm.schedmd.com/slurm.conf.html#OPT_MaxTasksPerNode) setting is configurable in the slurm.conf file.\n\n8Jobs that request large amounts of resource often pend in queues and take longer to schedule impacting productivity as there may be fewer candidate hosts available that meet the job’s resource requirement.\n", + "content": "With all the focus on cloud computing, it's easy to forget that most Nextflow pipelines still run on traditional HPC clusters. In fact, according to our latest [State of the Workflow 2023](https://seqera.io/blog/the-state-of-the-workflow-the-2023-nextflow-and-nf-core-community-survey/) community survey, **62.8%** of survey respondents report running Nextflow on HPC clusters, and **75%** use an HPC workload manager.^1^ While the cloud is making gains, traditional clusters aren't going away anytime soon.\n\nTapping cloud infrastructure offers many advantages in terms of convenience and scalability. However, for organizations with the capacity to manage in-house clusters, there are still solid reasons to run workloads locally:\n\n- _Guaranteed access to resources_. Users don't need to worry about shortages of particular instance types, spot instance availability, or exceeding cloud spending caps.\n- _Predictable pricing_. Organizations are protected against price inflation and unexpected rate increases by capitalizing assets and depreciating them over time.\n- _Reduced costs_. Contrary to conventional wisdom, well-managed, highly-utilized, on-prem clusters are often less costly per core hour than cloud-based alternatives.\n- _Better performance and throughput_. 
While HPC infrastructure in the cloud is impressive, state-of-the-art on-prem clusters are still tough to beat.^2^\n\nThis article provides some helpful tips for organizations running Nextflow on HPC clusters.\n\n## The anatomy of an HPC cluster\n\nHPC Clusters come in many shapes and sizes. Some are small, consisting of a single head node and a few compute hosts, while others are huge, with tens or even hundreds of host computers.\n\nThe diagram below shows the topology of a typical mid-sized HPC cluster. Clusters typically have one or more \"head nodes\" that run workload and/or cluster management software. Cluster managers, such as [Warewulf](https://warewulf.lbl.gov/), [xCAT](https://xcat.org/), [NVIDIA Bright Cluster Manager](https://www.nvidia.com/en-us/data-center/bright-cluster-manager/), [HPE Performance Cluster Manager](https://www.hpe.com/psnow/doc/a00044858enw), or [IBM Spectrum Cluster Foundation](https://www.ibm.com/docs/en/scf/4.2.2?topic=guide-spectrum-cluster-foundation), are typically used to manage software images and provision cluster nodes. Large clusters may have multiple head nodes, with workload management software configured to failover if the master host fails.\n\n\n\nLarge clusters may have dedicated job submission hosts (also called login hosts) so that user activity does not interfere with scheduling and management activities on the head node. In smaller environments, users may simply log in to the head node to submit their jobs.\n\nClusters are often composed of different compute hosts suited to particular workloads.^3^ They may also have separate dedicated networks for management, internode communication, and connections to a shared storage subsystem. Users typically have network access only to the head node(s) and job submission hosts and are prevented from connecting to the compute hosts directly.\n\nDepending on the workloads a cluster is designed to support, compute hosts may be connected via a private high-speed 100 GbE or Infiniband-based network commonly used for MPI parallel workloads. Cluster hosts typically have access to a shared file system as well. In life sciences environments, NFS filers are commonly used. However, high-performance clusters may use parallel file systems such as [Lustre](https://www.lustre.org/), [IBM Spectrum Scale](https://www.ibm.com/docs/en/storage-scale?topic=STXKQY/gpfsclustersfaq.html) (formerly GPFS), [BeeGFS](https://www.beegfs.io/c/), or [WEKA](https://www.weka.io/data-platform/solutions/hpc-data-management/).\n\n[Learn about selecting the right storage architecture for your Nextflow pipelines](https://nextflow.io/blog/2023/selecting-the-right-storage-architecture-for-your-nextflow-pipelines.html).\n\n## HPC workload managers\n\nHPC workload managers have been around for decades. Initial efforts date back to the original [Portable Batch System](https://www.chpc.utah.edu/documentation/software/pbs-scheduler.php) (PBS) developed for NASA in the early 1990s. While modern workload managers have become enormously sophisticated, many of their core principles remain unchanged.\n\nWorkload managers are designed to share resources efficiently between users and groups. Modern workload managers support many different scheduling policies and workload types — from parallel jobs to array jobs to interactive jobs to affinity/NUMA-aware scheduling. As a result, schedulers have many \"knobs and dials\" to support various applications and use cases. 
While complicated, all of this configurability makes them extremely powerful and flexible in the hands of a skilled cluster administrator.\n\n### Some notes on terminology\n\nHPC terminology can be confusing because different terms sometimes refer to the same thing. Nextflow refers to individual steps in a workflow as a \"process\". Sometimes, process steps spawned by Nextflow are also described as \"tasks\". When Nextflow processes are dispatched to an HPC workload manager, however, each process is managed as a \"job\" in the context of the workload manager.\n\nHPC workload managers are sometimes referred to as schedulers. In this text, we use the terms HPC workload manager, workload manager, and scheduler interchangeably.\n\n## Nextflow and HPC workload managers\n\nNextflow supports at least **14 workload managers**, not including popular cloud-based compute services. This number is even higher if one counts variants of popular schedulers. For example, the Grid Engine executor works with Altair® Grid Engine™ as well as older Grid Engine dialects, including Oracle Grid Engine (previously Sun Grid Engine), Open Grid Engine (OGE), and SoGE (son of Grid Engine). Similarly, the PBS integration works successors to the original OpenPBS project, including Altair® PBS Professional®, TORQUE, and Altair's more recent open-source version, OpenPBS.^4^ Workload managers supported by Nextflow are listed below:\n\n\n\nBelow we present some helpful tips and best practices when working with HPC workload managers.\n\n## Some best practices\n\n### 1. Select an HPC executor\n\nTo ensure that pipelines are portable across clouds and HPC clusters, Nextflow uses the notion of [executor](https://nextflow.io/docs/latest/executor.html) to insulate pipelines from the underlying compute environment. A Nextflow executor determines the system where a pipeline is run and supervises its execution.\n\nYou can specify the executor to use in the [nextflow.config](https://nextflow.io/docs/latest/config.html?highlight=queuesize#configuration-file) file, inline in your pipeline code, or by setting the shell variable `NXF_EXECUTOR` before running a pipeline.\n\n```groovy\nprocess.executor = 'slurm'\n```\n\nExecutors are defined as part of the process scope in Nextflow, so in theory, each process can have a different executor. You can use the [local](https://www.nextflow.io/docs/latest/executor.html?highlight=local#local) executor to run a process on the same host as the Nextflow head job rather than dispatching it to an HPC cluster.\n\nA complete list of available executors is available in the [Nextflow documentation](https://nextflow.io/docs/latest/executor.html). Below is a handy list of executors for HPC workload managers.\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
| Workload Manager | Executor | License | Documentation |\n| --- | --- | --- | --- |\n| Slurm | [slurm](https://www.nextflow.io/docs/latest/executor.html?highlight=local#slurm) | Open source | [Slurm](https://slurm.schedmd.com/documentation.html) |\n| IBM Spectrum LSF | [lsf](https://www.nextflow.io/docs/latest/executor.html?highlight=local#lsf) | Commercial | [IBM Spectrum LSF knowledge center](https://www.ibm.com/docs/en/spectrum-lsf/10.1.0) |\n| OpenPBS | [pbspro](https://www.nextflow.io/docs/latest/executor.html?highlight=local#pbs-pro) | Open source | [OpenPBS](https://github.com/openpbs/openpbs/tree/master/doc) (docs packaged with software) |\n| Altair® Grid Engine™ | [sge](https://www.nextflow.io/docs/latest/executor.html?highlight=local#sge) | Commercial | [Altair Grid Engine introductory guide](https://2022.help.altair.com/2022.1.0/AltairGridEngine/IntroductionGE.pdf) |\n| Altair® PBS Professional® | [pbspro](https://www.nextflow.io/docs/latest/executor.html?highlight=local#pbs-pro) | Commercial | [Altair PBS Professional user's guide](https://slurm.schedmd.com/documentation.html) |\n| Adaptive Computing MOAB | [moab](https://www.nextflow.io/docs/latest/executor.html?highlight=local#moab) | Commercial | [Adaptive Computing Maui Scheduler](http://docs.adaptivecomputing.com/maui/)^5^ |\n| Adaptive Computing TORQUE | [pbs](https://www.nextflow.io/docs/latest/executor.html?highlight=local#pbs-torque) | Open source | [Torque administrators guide](http://docs.adaptivecomputing.com/10-0-0/Torque/torque.htm#topics/torque/1-intro/introduction.htm) |\n| HTCondor | [condor](https://www.nextflow.io/docs/latest/executor.html?highlight=local#htcondor) | Open source | [HTCondor documentation](https://research.cs.wisc.edu/htcondor/htcondor/documentation/) |\n| Apache Ignite | [ignite](https://www.nextflow.io/docs/latest/executor.html?highlight=local#ignite) | Open source | [Apache Ignite Documentation](https://ignite.apache.org/docs/latest/) |\n| HyperQueue | [hyperqueue](https://www.nextflow.io/docs/latest/executor.html?highlight=local#hyperqueue) | Open source | [Docs on GitHub](https://github.com/It4innovations/hyperqueue) |
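\n\nSince executors are defined in the process scope, different steps in the same pipeline can use different executors. As a minimal sketch (the process name `prepare_inputs` is hypothetical), a lightweight step could run with the `local` executor on the same host as the Nextflow head job while everything else is dispatched to Slurm:\n\n```groovy\nprocess {\n executor = 'slurm' // default: dispatch jobs to the Slurm cluster\n\n withName: prepare_inputs {\n executor = 'local' // run this lightweight step on the same host as the head job\n }\n}\n```\n\nEquivalently, the default executor can be set without editing any files by exporting the shell variable `NXF_EXECUTOR` (for example, `NXF_EXECUTOR=slurm`) before launching the pipeline.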
\n\n### 2. Select a queue\n\nMost HPC workload managers support the notion of queues. In a small cluster with a few users, queues may not be important. However, they are essential in large environments. Cluster administrators typically configure queues to reflect site-specific scheduling and resource-sharing policies. For example, a site may have a short queue that only supports short-running jobs and kills them after 60 seconds. A _night_ queue may only dispatch jobs between midnight and 6:00 AM. Depending on the sophistication of the workload manager, different queues may have different priorities, and access to queues may be limited to particular users or groups.\n\nWorkload managers typically have default queues. For example, `normal` is the default queue in LSF, while `all.q` is the default queue in Grid Engine. Slurm supports the notion of partitions, which are essentially the same as queues, so Slurm partitions are referred to as queues within Nextflow. You should ask your HPC cluster administrator what queue to use when submitting Nextflow jobs.\n\nLike the executor, queues are part of the process scope. The queue to dispatch jobs to is usually defined once in the `nextflow.config` file and applied to all processes in the workflow as shown below, or it can be set per-process.\n\n```groovy\nprocess {\n queue = 'myqueue'\n executor = 'sge'\n}\n```\n\nSome organizations use queues as a mechanism to request particular types of resources. For example, suppose hosts with the latest NVIDIA A100 or H100 GPUs are in high demand. In that case, a cluster administrator may configure a particular queue called `gpu_queue` to dispatch jobs to those hosts and limit access to specific users. For process steps requiring access to GPUs, the administrator may require submitting jobs to this queue. This is why it is important to consult site-specific documentation or ask your cluster administrator which queues are available.\n\n### 3. Specify process-level resource requirements\n\nDepending on the executor, you can pass various resource requirements for each process/job to the workload manager. Like _executors_ and _queues_, these settings are configured at the process level. Not all executors support the same resource directives, but the settings below are common to most HPC workload managers.\n\n[cpus](https://nextflow.io/docs/latest/process.html#process-cpus) – specifies the number of logical CPUs requested for a particular process/job. A logical CPU maps to a physical processor core or thread depending on whether hyperthreading is enabled on the underlying cluster hosts.\n\n[memory](https://nextflow.io/docs/latest/process.html#process-memory) – different process steps/jobs will typically have different memory requirements. It is important to specify memory requirements accurately because HPC schedulers use this information to decide how many jobs can execute concurrently on a host. If you overstate resource requirements, you are wasting resources on the cluster.\n\n[time](https://nextflow.io/docs/latest/process.html#process-time) – it is helpful to limit how much time a particular process or job is allowed to run. To avoid jobs hanging and consuming resources indefinitely, you can specify a time limit after which a job will be automatically terminated and re-queued. Time limits may also be enforced at the queue level behind the scenes based on workload management policies.
If you have long-running jobs, your cluster administrator may ask you to use a particular queue for those Nextflow process steps to prevent jobs from being automatically killed.^6^\n\nWhen writing pipelines, it is a good practice to consolidate per-process resource requirements in the `nextflow.config` file, and use process selectors to indicate which resource requirements apply to which process steps. For example, in the configuration below, processes will be dispatched to the Slurm cluster by default. Each process requires two cores and 4 GB of memory, and can run for no more than 10 minutes. For the `foo` and long-running `bar` jobs, process-specific selectors can override these default settings as shown below:\n\n```groovy\nprocess {\n executor = 'slurm'\n queue = 'general'\n cpus = 2\n memory = '4 GB'\n time = '10m'\n\n withName: foo {\n cpus = 8\n memory = '8 GB'\n }\n\n withName: bar {\n queue = 'long'\n cpus = 32\n memory = '8 GB'\n time = '1h 30m'\n }\n}\n```\n\n### 4. Take advantage of workload manager-specific features\n\nSometimes, organizations may want to take advantage of syntax specific to a particular workload manager. To accommodate this, most Nextflow executors provide a `clusterOptions` setting to inject one or more switches into the job submission command line specific to the selected workload manager ([bsub](https://www.ibm.com/docs/en/spectrum-lsf/10.1.0?topic=bsub-options), [msub](http://docs.adaptivecomputing.com/maui/commands/msub.php), [qsub](https://gridscheduler.sourceforge.net/htmlman/htmlman1/qsub.html), etc.).\n\nThese scheduler-specific commands can get very detailed and granular. They can apply to all processes in a workflow or only to specific processes. As an LSF-specific example, suppose a deep learning model training workload is a step in a Nextflow pipeline. The deep learning framework used may be GPU-aware and have specific topology requirements.\n\nIn this example, we specify a job consisting of two tasks where each task runs on a separate host and requires exclusive use of two GPUs. We also impose a resource requirement that we want to schedule the CPU portion of each CUDA job in physical proximity to the GPU to improve performance (on a processor core close to the same PCIe or NVLink connection, for example).\n\n```groovy\nprocess {\n withName: dl_workload {\n executor = 'lsf'\n queue = 'gpu_hosts'\n memory = '16 GB'\n clusterOptions = '-gpu \"num=2:mode=exclusive_process\" -n2 -R \"span[ptile=1] affinity[core(1)]\"'\n }\n}\n```\n\nIn addition to `clusterOptions`, several other settings in the [executor scope](https://nextflow.io/docs/latest/config.html?highlight=queuesize#scope-executor) can be helpful when controlling how jobs behave on an HPC workload manager.\n\n### 5. Decide where to launch your pipeline\n\nLaunching jobs from a head node is common in small HPC clusters. Launching jobs from dedicated job submission hosts (sometimes called login hosts) is more common in large environments. Depending on the workload manager, the head node or job submission host will usually have the workload manager’s client tools pre-installed. These include client binaries such as `sbatch` (Slurm), `qsub` (PBS or Grid Engine), or `bsub` (LSF). Nextflow expects to be able to find these job submission commands on the Linux `PATH`.\n\nRather than launching the Nextflow driver job for a long-running pipeline from the head node or a job submission host, a better practice is to wrap the Nextflow run command in a script and submit the entire workflow as a job.
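\n\nFor Slurm, a minimal sketch of such a wrapper might look like the following (the partition name, memory request, and job name are illustrative only):\n\n```\n$ cat submit_pipeline.sh\n#!/bin/bash\n#SBATCH --job-name=headjob\n#SBATCH --partition=normal\n#SBATCH --output=out.%j\n#SBATCH --error=err.%j\n#SBATCH --mem=16G\nnextflow run nextflow-io/hello -c my.config -ansi-log false\n\n$ sbatch submit_pipeline.sh\n```\n\n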
An example using LSF is provided below:\n\n```\n$ cat submit_pipeline.sh\n#!/bin/bash\n#BSUB -q Nextflow\n#BSUB -m \"hostgroupA\"\n#BSUB -o out.%J\n#BSUB -e err.%J\n#BSUB -J headjob\n#BSUB -R \"rusage[mem=16GB]\"\nnextflow run nextflow-io/hello -c my.config -ansi-log false\n\n$ bsub < submit_pipeline.sh\n```\n\nThe specifics will depend on the cluster environment and how the environment is configured. For this to work, the job submission commands must also be available on the execution hosts to which the head job is dispatched. This is not always the case, so you should check with your HPC cluster administrator.\n\nDepending on the workload manager, check your queue or cluster configuration to ensure that submitted jobs can spawn other jobs and that you do not bump up against hard limits. For example, Slurm by default allows a job step to spawn up to 512 tasks per node.^7^\n\n### 6. Limit your heap size\n\nSetting the JVM’s maximum heap size is another good practice when running on an HPC cluster. The Nextflow runtime runs on top of a Java virtual machine which, by design, tries to allocate as much memory as possible. To avoid this, specify the initial and maximum amounts of memory that can be used by the Java VM using the `-Xms` and `-Xmx` Java flags.\n\nThese can be specified using the `NXF_OPTS` environment variable.\n\n```bash\nexport NXF_OPTS=\"-Xms512m -Xmx8g\"\n```\n\nThe `-Xms` flag specifies the minimum heap size, and `-Xmx` specifies the maximum heap size. In the example above, the minimum heap size is set to 512 MB, which can grow to a maximum of 8 GB. You will need to experiment with appropriate values for each pipeline to determine how many concurrent head jobs you can run on the same host.\n\nFor more information about memory management with Java, consult this [Oracle documentation regarding tuning JVMs](https://docs.oracle.com/cd/E21764_01/web.1111/e13814/jvm_tuning.htm#PERFM150).\n\n### 7. Use the scratch directive\n\nNextflow requires a shared file system path as a working directory to allow the pipeline tasks to share data with each other. When using this model, a common practice is to use the node's local scratch storage as the working directory.
This avoids cluster nodes needing to simultaneously read and write files to a shared network file system, which can become a bottleneck.\n\nNextflow implements this best practice, which can be enabled by adding the following setting to your `nextflow.config` file.\n\n```groovy\nprocess.scratch = true\n```\n\nBy default, if you enable `process.scratch`, Nextflow will use the directory pointed to by `$TMPDIR` as a scratch directory on the execution host.\n\nYou can optionally specify a specific path for the scratch directory as shown:\n\n```groovy\nprocess.scratch = '/ssd_drive/scratch_dir'\n```\n\nWhen the scratch directive is enabled, Nextflow:\n\n- Creates a unique directory for process execution in the supplied scratch directory;\n- Creates a symbolic link in the scratch directory for each input file in the shared work directory required for job execution;\n- Runs the job using the local scratch path as the working directory;\n- Copies output files to the job's shared work directory on the shared file system when the job is complete.\n\nScratch storage is particularly beneficial for process steps that perform a lot of file system I/O or create large numbers of intermediate files.\n\nTo learn more about Nextflow and how it works with various storage architectures, including shared file systems, check out our recent article [Selecting the right storage architecture for your Nextflow pipelines](https://nextflow.io/blog/2023/selecting-the-right-storage-architecture-for-your-nextflow-pipelines.html).\n\n### 8. Launch pipelines in the background\n\nIf you are launching your pipeline from a login node or cluster head node, it is useful to run pipelines in the background without losing the execution output reported by Nextflow. You can accomplish this by using the `-bg` switch in Nextflow and redirecting _stdout_ to a log file as shown:\n\n```bash\nnextflow run <pipeline> -bg > my-file.log\n```\n\nThis frees up the interactive command line to run commands such as [squeue](https://slurm.schedmd.com/squeue.html) (Slurm) or [qstat](https://gridscheduler.sourceforge.net/htmlman/htmlman1/qstat.html) (Grid Engine) to monitor job execution on the cluster. It is also beneficial because it prevents network connection issues from interfering with pipeline execution.\n\nNextflow has rich terminal logging and uses ANSI escape codes to update pipeline execution counters interactively as the pipeline runs. If you are logging output to a file as shown above, it is a good idea to disable ANSI logging using the command line option `-ansi-log false` or the environment variable `NXF_ANSI_LOG=false`. ANSI logging can also be disabled when wrapping the Nextflow head job in a script and launching it as a job managed by the workload manager, as explained above.\n\n### 9. Retry failing jobs after increasing resource allocation\n\nGetting resource requirements such as cpus, memory, and time right is often challenging since resource requirements can vary depending on the size of the dataset processed by each job step. If you request too many resources, you end up wasting resources on the cluster and reducing the effectiveness of the compute environment for everyone. On the other hand, if you request insufficient resources, process steps can fail.\n\nTo address this problem, Nextflow provides a mechanism that allows you to modify, on the fly, the amount of computing resources requested in the case of a process failure, and to attempt to re-execute it using a higher limit.
For example:\n\n```groovy\nprocess {\n withName: foo {\n memory = { 2.GB * task.attempt }\n time = { 1.hour * task.attempt }\n\n errorStrategy = { task.exitStatus in 137..140 ? 'retry' : 'terminate' }\n maxRetries = 3\n }\n}\n```\n\nYou can manage how many times a job can be retried and specify different behaviours depending on the exit error code. You will see this automated mechanism used in many production pipelines. It is a common practice to double the resources requested after a failure until the job runs successfully.\n\nFor sites running Nextflow Tower, Tower has a powerful resource optimization facility built in that essentially learns per-process resource requirements from previously executed pipelines and auto-generates resource requirements that can be placed in a pipeline's `nextflow.config` file. By using resource optimization in Tower, pipelines will request only the resources that they actually need. This avoids unnecessary delays due to failed/retried jobs and also uses the shared cluster more efficiently.\n\nTower resource optimization works with all HPC workload managers as well as popular cloud services. You can learn more about resource optimization in the article [Optimizing resource usage with Nextflow Tower](https://seqera.io/blog/optimizing-resource-usage-with-nextflow-tower/).\n\n### 10. Cloud Bursting\n\nCloud bursting is a configuration method in hybrid cloud environments where cloud computing resources are used automatically whenever on-premises infrastructure reaches peak capacity. The idea is that when sites run out of compute capacity on their local infrastructure, they can dynamically burst additional workloads to the cloud.\n\nWith its built-in support for cloud executors, Nextflow handles bursting to the cloud with ease, but it is important to remember that large HPC sites run other workloads beyond Nextflow pipelines. As such, they often have their own bursting solutions tightly coupled to the workload manager.\n\nCommercial HPC schedulers tend to have facilities for cloud bursting built in. While there are many ways to enable bursting, and implementations vary by workload manager, a few examples are provided here:\n\n- Open source Slurm provides a native mechanism to burst workloads to major cloud providers when local cluster resources are fully subscribed. To learn more, see the Slurm [Cloud Scheduling Guide](https://slurm.schedmd.com/elastic_computing.html).\n- IBM Spectrum LSF provides a cloud resource connector enabling policy-driven cloud bursting to various clouds. See the [IBM Spectrum LSF Resource Connector](https://www.ibm.com/docs/en/spectrum-lsf/10.1.0?topic=lsf-resource-connnector) documentation for details.\n- Altair PBS Professional also provides sophisticated support for cloud bursting to multiple clouds, with cloud cost integration features that avoid overspending in the cloud. [See PBS Professional 2022.1](https://altair.service-now.com/community?sys_id=0e9b07dadbf8d150cfd5f6a4e2961997&view=sp&id=community_blog&table=sn_communities_blog).\n- Adaptive Computing offers [Moab Cloud/NODUS Cloud Bursting](https://support.adaptivecomputing.com/wp-content/uploads/2018/08/Moab_Cloud-NODUS_Cloud_Bursting_datasheet_web.pdf), a commercial offering that works with an extensive set of resource providers including AliCloud, OCI, OpenStack, VMware vSphere, and others.\n\nData handling makes cloud bursting complex.
Some HPC centers deploy solutions that provide a single namespace in which on-premises and cloud-based nodes have a consistent view of a shared file system.\n\nIf you are in a larger facility, it's worth having a discussion with your HPC cluster administrator. Cloud bursting may be handled automatically for you. You may be able to use the executor associated with your on-premises workload manager, and simply point your workloads to a particular queue. The good news is that Nextflow provides you with tremendous flexibility.\n\n### 11. Fusion file system\n\nTraditionally, on-premises clusters have used a local shared file system such as NFS or Lustre. The new Fusion file system provides an alternative way to manage data.\n\nFusion is a lightweight, POSIX-compliant file system deployed inside containers that provides transparent access to cloud-based object stores such as Amazon S3. While users running pipelines on local clusters may not have considered using cloud storage, doing so has some advantages:\n\n- Cloud object storage is economical for long-term storage.\n- Object stores such as Amazon S3 provide virtually unlimited capacity.\n- Many reference datasets in life sciences already reside in cloud object stores.\n\nIn cloud computing environments, Fusion FS has demonstrated that it can improve pipeline throughput by up to **2.2x** and reduce long-term cloud storage costs by up to **76%**. To learn more about the Fusion file system and how it works, you can download the whitepaper [Breakthrough performance and cost-efficiency with the new Fusion file system](https://seqera.io/whitepapers/breakthrough-performance-and-cost-efficiency-with-the-new-fusion-file-system/).\n\nRecently, Fusion support has been added for selected HPC workload managers, including Slurm, IBM Spectrum LSF, and Grid Engine. This is an exciting development as it enables on-premises cluster users to seamlessly run workloads locally using cloud-based storage with minimal configuration effort.\n\n### 12. Additional configuration options\n\nThere are several additional Nextflow configuration options that are important to be aware of when working with HPC clusters. You can find a complete list in the Nextflow documentation in the [Scope executor](https://nextflow.io/docs/latest/config.html#scope-executor) section.\n\n`queueSize` – The `queueSize` parameter is optionally defined in the `nextflow.config` file or within a process and defines how many Nextflow processes can be queued in the selected workload manager at a given time. By default, this value is set to 100 jobs. In large sites with multiple users, HPC cluster administrators may limit the number of pending or executing jobs per user on the cluster. For example, on an LSF cluster, this is done by setting the parameter `MAX_JOBS` in the `lsb.users` file to enforce per-user or per-group slot limits. If your administrators have placed limits on the number of jobs you can run, you should tune the `queueSize` parameter in Nextflow to match your site's enforced maximums.\n\n`submitRateLimit` – Depending on the scheduler, having many users simultaneously submitting large numbers of jobs to a cluster can overwhelm the scheduler on the head node and cause it to become unresponsive to commands. To mitigate this, if your pipeline submits a large number of jobs, it is a good practice to throttle the rate at which jobs will be dispatched from Nextflow. By default, the job submission rate is unlimited.
If you want to allow no more than 50 jobs to be submitted every two minutes, set this parameter as shown:\n\n```groovy\nexecutor.submitRateLimit = '50/2min'\nexecutor.queueSize = 50\n```\n\n`jobName` – Many workload managers have interactive web interfaces or downstream reporting or analysis tools for monitoring or analyzing workloads. A few examples include [Slurm-web](http://rackslab.github.io/slurm-web/introduction.html), [MOAB HPC Suite](https://adaptivecomputing.com/moab-hpc-suite/) (MOAB and Torque), [Platform Management Console](https://www.ibm.com/docs/en/pasc/1.1.1?topic=asc-platform-management-console) (for LSF), [Spectrum LSF RTM](https://www.ibm.com/docs/en/spectrum-lsf-rtm/10.2.0?topic=about-spectrum-lsf-rtm), and [Altair® Access™](https://altair.com/access).\n\nWhen using these tools, it is helpful to associate a meaningful name with each job. Remember, a job in the context of the workload manager maps to a process or task in Nextflow. Use the `jobName` property associated with the executor to give your job a name. You can construct these names dynamically, as illustrated below, so the job reported by the workload manager reflects the name of your Nextflow process step and its unique ID.\n\n```groovy\nexecutor.jobName = { \"$task.name - $task.hash\" }\n```\n\nYou will need to make sure that the generated name matches the validation constraints of the underlying workload manager. This also makes troubleshooting easier because it allows you to cross-reference Nextflow log files with files generated by the workload manager.\n\n## The bottom line\n\nIn addition to supporting major cloud environments, Nextflow works seamlessly with a wide variety of on-premises workload managers. If you are fortunate enough to have access to large-scale compute infrastructure at your facility, taking advantage of these powerful HPC workload management integrations is likely the way to go.\n\n---\n\n^1^While this may sound like a contradiction, remember that HPC workload managers can also run in the cloud.\n\n^2^A cloud vCPU is equivalent to a thread on a multicore CPU, and HPC workloads often run with hyperthreading disabled for the best performance. As a result, you may need 64 vCPUs in the cloud to match the performance of a 32-core processor SKU on-premises. Similarly, interconnects such as Amazon Elastic Fabric Adapter (EFA) deliver impressive performance. However, even with high-end cloud instance types, EFA's 100 Gbps throughput falls short compared to interconnects such as [NDR InfiniBand](https://www.hpcwire.com/2020/11/16/nvidia-mellanox-debuts-ndr-400-gigabit-infiniband-at-sc20/) and [HPE Cray Slingshot](https://www.nextplatform.com/2022/01/31/crays-slingshot-interconnect-is-at-the-heart-of-hpes-hpc-and-ai-ambitions/), delivering 400 Gbps or more.\n\n^3^While MPI parallel jobs are less common in Nextflow pipelines, sites may also run fluid dynamics, computational chemistry, or molecular dynamics workloads using tools such as [NWChem](https://www.nwchem-sw.org/) or [GROMACS](https://www.gromacs.org/) that rely on MPI and fast interconnects to facilitate efficient inter-node communication.\n\n^4^Altair’s open-source OpenPBS is distinct from the original OpenPBS project of the same name released in 1998.\n\n^5^MOAB HPC is a commercial product offered by Adaptive Computing. Its scheduler is based on the open source Maui scheduler.\n\n^6^For some workload managers, how much time a job is expected to run is taken into account by scheduling algorithms.
For example, suppose it becomes necessary to preempt particular jobs when higher priority jobs come along or because of resource ownership issues. In that case, the scheduler may take into account actual vs. estimated runtime to avoid terminating long-running jobs that are closed to completion.\n\n^7^[MaxTasksPerNode](https://slurm.schedmd.com/slurm.conf.html#OPT_MaxTasksPerNode) setting is configurable in the slurm.conf file.\n\n^8^Jobs that request large amounts of resource often pend in queues and take longer to schedule impacting productivity as there may be fewer candidate hosts available that meet the job’s resource requirement.\n", "images": [ "/img/nextflow-on-big-iron-twelve-tips-for-improving-the-effectiveness-of-pipelines-on-hpc-clusters-1.jpg", "/img/nextflow-on-big-iron-twelve-tips-for-improving-the-effectiveness-of-pipelines-on-hpc-clusters-2.jpg" @@ -516,7 +516,7 @@ "slug": "2023/celebrating-our-largest-international-training-event-and-hackathon-to-date", "title": "Celebrating our largest international training event and hackathon to date", "date": "2023-04-25T00:00:00.000Z", - "content": "\nIn mid-March, we conducted our bi-annual Nextflow and [nf-core](https://nf-co.re/) training and hackathon in what was unquestionably our best-attended community events to date. This year we had an impressive **1,345 participants** attend the training from **76 countries**. Attendees came from far and wide — from Algeria to Andorra to Zambia to Zimbabwe!\n\nAmong our event attendees, we observed the following statistics:\n\n- 40% were 30 years old or younger, pointing to a young cohort of Nextflow users;\n- 55.3% identified as male vs. 40% female, highlighting our growing diversity;\n- 68.2% came from research institutions;\n- 71.4% were attending their first Nextflow training event;\n- 96.7% had never attended a Nextflow hackathon.\n\nRead on to learn more about these exciting events. If you missed it, you can still [watch the Nextflow & nf-core training](https://www.youtube.com/playlist?list=PL3xpfTVZLcNhoWxHR0CS-7xzu5eRT8uHo) at your convenience.\n\n\n\n## Multilingual training\n\nThis year, we were pleased to offer [Nextflow / nf-core training](https://nf-co.re/events/2023/training-march-2023) in multiple languages: in addition to English, we delivered sessions in French, Hindi, Portuguese, and Spanish.\n\nIn our pre-event registration, **~88%** of respondents indicated they would watch the training in English. However, there turned out to be a surprising appetite for training in other languages. We hope that multilingual training will make Nextflow even more accessible to talented scientists and researchers around the world.\n\nThe training consisted of four separate sessions in **5 languages** for a total of **20 sessions**. As of April 19th, we’ve amassed over **6,600 YouTube views** with **2,300+ hours** of training watched so far. 
**27%** have watched the non-English sessions, making the effort at translation highly worthwhile.\n\nThank you to the following people who delivered the training: [Chris Hakkaart](https://twitter.com/Chris_Hakk) (English), [Marcel Ribeiro-Dantas](https://twitter.com/mribeirodantas) (Portuguese), [Maxime Garcia](https://twitter.com/gau) (French), [Julia Mir Pedrol](https://twitter.com/juliamirpedrol) and [Gisela Gabernet](https://twitter.com/GGabernet) (Spanish), and [Abhinav Sharma](https://twitter.com/abhi18av) (Hindi).\n\nYou can view the community training sessions on YouTube here:\n\n- [March 2023 Community Training – English](https://www.youtube.com/playlist?list=PL3xpfTVZLcNhoWxHR0CS-7xzu5eRT8uHo)\n- [March 2023 Community Training – Portugese](https://www.youtube.com/playlist?list=PL3xpfTVZLcNhi41yDYhyHitUhIcUHIbJg)\n- [March 2023 Community Training – French](https://www.youtube.com/playlist?list=PL3xpfTVZLcNhiv9SjhoA1EDOXj9nzIqdS)\n- [March 2023 Community Training – Spanish](https://www.youtube.com/playlist?list=PL3xpfTVZLcNhSlCWVoa3GURacuLWeFc8O)\n- [March 2023 Community Training – Hindi](https://www.youtube.com/playlist?list=PL3xpfTVZLcNikun1FrSvtXW8ic32TciTJ)\n\nThe videos accompany the written training material, which you can find at [https://training.nextflow.io/](https://training.nextflow.io/)\n\n## Improved community training resources\n\nAlong with the updated training and hackathon resources above, we’ve significantly enhanced our online training materials available at [https://training.nextflow.io/](https://training.nextflow.io/). Thanks to the efforts of our volunteers, technical training, [Gitpod resources](https://training.nextflow.io/basic_training/setup/#gitpod), and materials for hands-on, self-guided learning are now available in English and Portuguese. Some of the materials are also available in Spanish and French.\n\nThe training comprises a significant set of resources covering topics including managing dependencies, containers, channels, processes, operators, and an introduction to the Groovy language. It also includes topics related to nf-core for users and developers as well as Nextflow Tower. Marcel Ribeiro-Dantas describes his experience leading the translation effort for this documentation in his latest nf-core/bytesize [translation talk](https://nf-co.re/events/2023/bytesize_translations).\n\nAdditional educational resources are provided in the recent Seqera Labs blog article, [Learn Nextflow in 2023](https://nextflow.io/blog/2023/learn-nextflow-in-2023.html), posted in February before our latest training event.\n\n## The nf-core hackathon\n\nWe also ran a separate [hackathon](https://nf-co.re/events/2023/hackathon-march-2023) event from March 27th to 29th. This hackathon ran online via Gather, a virtual hosting platform, but for the first time we also asked community members to host local sites. We were blown away by the response, with volunteers coming forward to organize in-person attendance in 16 different locations across the world (and this was before we announced that Seqera would organize pizza for all the sites!). These gatherings had a big impact on the feel of the hackathon, whilst remaining accessible and eco-friendly, avoiding the need for air travel.\n\nThe hackathon was divided into five focus areas: modules, pipelines, documentation, infrastructure, and subworkflows. We had **411** people register, including **278 in-person attendees** at **16 locations**. 
This is an increase of **38%** compared to the **289** people that attended our October 2022 event. The hackathon was hosted in multiple countries including Brazil, France, Germany, Italy, Poland, Senegal, Serbia, South Africa, Spain, Sweden, the UK, and the United States.\n\nWe would like to thank the many organizations worldwide who provided a venue to host the hackathon and helped make it a resounding success. Besides being an excellent educational event, we resolved many longstanding Nextflow and nf-core issues.\n\n
\n \"Hackathon\n
\n\nYou can access the project reports from each hackathon team over the three-day event compiled in HackMD below:\n\n- [Modules team](https://hackmd.io/A5v4soteQjKywl3UgFa_6g)\n- [Pipelines Team](https://hackmd.io/Bj_MK3ubQWGBD4t0X2KpjA)\n- [Documentation Team](https://hackmd.io/o6AgPTZ7RBGCyZI72O1haA)\n- [Infrastructure Team](https://hackmd.io/uC-mZlEXQy6DaXZdjV6akA)\n- [Subworkflows Team](https://hackmd.io/Udtvj4jASsWLtMgrbTNwBA)\n\nYou can also view ten Hackathon videos outlining the event, introducing an overview of the teams, and daily hackathon activities in the [March 2023 nf-core hackathon YouTube playlist](https://www.youtube.com/playlist?list=PL3xpfTVZLcNhfyF_QJIfSslnxRCU817yc). Check out activity in the nf-core hackathon March 2023 Github [issues board](https://github.com/orgs/nf-core/projects/38/views/16?layout=board) for a summary of what each team worked on.\n\n## A diverse and growing community\n\nWe were particularly pleased to see the growing diversity of the Nextflow and nf-core community membership, enabled partly by support from the Chan Zuckerberg Initiative Diversity and Inclusion grant and our nf-core mentorship programs. You can learn more about our mentorship efforts and exciting efforts of our global team in Chris Hakkaart’s excellent post, [Nextflow and nf-core Mentorship](https://nextflow.io/blog/2023/czi-mentorship-round-2.html) on the Nextflow blog.\n\nThe growing diversity of our community was also reflected in the results of our latest Nextflow Community survey, which you can read more about on the [Seqera Labs blog](https://seqera.io/blog/the-state-of-the-workflow-2023-community-survey-results/).\n\n
\n \"Hackathon\n
\n\n## Looking forward\n\nRunning global events at this scale takes a tremendous team effort. The resources compiled will be valuable in introducing more people to Nextflow and nf-core. Thanks to everyone who participated in this year’s training and hackathon events. We look forward to making these even bigger and better in the future!\n\nThe next community training will be held online September 2023. This will be followed by two Nextflow Summit events with associated nf-core hackathons:\n\n- Barcelona: October 16-20, 2023\n- Boston: November 2023 (dates to be confirmed)\n\nIf you’d like to join, you can register to receive news and updates about the events at [https://summit.nextflow.io/summit-2023-preregistration/](https://summit.nextflow.io/summit-2023-preregistration/)\n\nYou can follow us on Twitter at [@nextflowio](https://twitter.com/nextflowio) or [@nf_core](https://twitter.com/nf_core) or join the discussion on the [Nextflow](https://www.nextflow.io/slack-invite.html) and [nf-core](https://nf-co.re/join) community Slack channels.\n\n
\n \"Hackathon\n
\n\n
\n \"Hackathon\n
\n", + "content": "In mid-March, we conducted our bi-annual Nextflow and [nf-core](https://nf-co.re/) training and hackathon in what was unquestionably our best-attended community events to date. This year we had an impressive **1,345 participants** attend the training from **76 countries**. Attendees came from far and wide — from Algeria to Andorra to Zambia to Zimbabwe!\n\nAmong our event attendees, we observed the following statistics:\n\n- 40% were 30 years old or younger, pointing to a young cohort of Nextflow users;\n- 55.3% identified as male vs. 40% female, highlighting our growing diversity;\n- 68.2% came from research institutions;\n- 71.4% were attending their first Nextflow training event;\n- 96.7% had never attended a Nextflow hackathon.\n\nRead on to learn more about these exciting events. If you missed it, you can still [watch the Nextflow & nf-core training](https://www.youtube.com/playlist?list=PL3xpfTVZLcNhoWxHR0CS-7xzu5eRT8uHo) at your convenience.\n\n\n\n## Multilingual training\n\nThis year, we were pleased to offer [Nextflow / nf-core training](https://nf-co.re/events/2023/training-march-2023) in multiple languages: in addition to English, we delivered sessions in French, Hindi, Portuguese, and Spanish.\n\nIn our pre-event registration, **~88%** of respondents indicated they would watch the training in English. However, there turned out to be a surprising appetite for training in other languages. We hope that multilingual training will make Nextflow even more accessible to talented scientists and researchers around the world.\n\nThe training consisted of four separate sessions in **5 languages** for a total of **20 sessions**. As of April 19th, we’ve amassed over **6,600 YouTube views** with **2,300+ hours** of training watched so far. **27%** have watched the non-English sessions, making the effort at translation highly worthwhile.\n\nThank you to the following people who delivered the training: [Chris Hakkaart](https://twitter.com/Chris_Hakk) (English), [Marcel Ribeiro-Dantas](https://twitter.com/mribeirodantas) (Portuguese), [Maxime Garcia](https://twitter.com/gau) (French), [Julia Mir Pedrol](https://twitter.com/juliamirpedrol) and [Gisela Gabernet](https://twitter.com/GGabernet) (Spanish), and [Abhinav Sharma](https://twitter.com/abhi18av) (Hindi).\n\nYou can view the community training sessions on YouTube here:\n\n- [March 2023 Community Training – English](https://www.youtube.com/playlist?list=PL3xpfTVZLcNhoWxHR0CS-7xzu5eRT8uHo)\n- [March 2023 Community Training – Portugese](https://www.youtube.com/playlist?list=PL3xpfTVZLcNhi41yDYhyHitUhIcUHIbJg)\n- [March 2023 Community Training – French](https://www.youtube.com/playlist?list=PL3xpfTVZLcNhiv9SjhoA1EDOXj9nzIqdS)\n- [March 2023 Community Training – Spanish](https://www.youtube.com/playlist?list=PL3xpfTVZLcNhSlCWVoa3GURacuLWeFc8O)\n- [March 2023 Community Training – Hindi](https://www.youtube.com/playlist?list=PL3xpfTVZLcNikun1FrSvtXW8ic32TciTJ)\n\nThe videos accompany the written training material, which you can find at [https://training.nextflow.io/](https://training.nextflow.io/)\n\n## Improved community training resources\n\nAlong with the updated training and hackathon resources above, we’ve significantly enhanced our online training materials available at [https://training.nextflow.io/](https://training.nextflow.io/). 
Thanks to the efforts of our volunteers, technical training, [Gitpod resources](https://training.nextflow.io/basic_training/setup/#gitpod), and materials for hands-on, self-guided learning are now available in English and Portuguese. Some of the materials are also available in Spanish and French.\n\nThe training comprises a significant set of resources covering topics including managing dependencies, containers, channels, processes, operators, and an introduction to the Groovy language. It also includes topics related to nf-core for users and developers as well as Nextflow Tower. Marcel Ribeiro-Dantas describes his experience leading the translation effort for this documentation in his latest nf-core/bytesize [translation talk](https://nf-co.re/events/2023/bytesize_translations).\n\nAdditional educational resources are provided in the recent Seqera Labs blog article, [Learn Nextflow in 2023](https://nextflow.io/blog/2023/learn-nextflow-in-2023.html), posted in February before our latest training event.\n\n## The nf-core hackathon\n\nWe also ran a separate [hackathon](https://nf-co.re/events/2023/hackathon-march-2023) event from March 27th to 29th. This hackathon ran online via Gather, a virtual hosting platform, but for the first time we also asked community members to host local sites. We were blown away by the response, with volunteers coming forward to organize in-person attendance in 16 different locations across the world (and this was before we announced that Seqera would organize pizza for all the sites!). These gatherings had a big impact on the feel of the hackathon, whilst remaining accessible and eco-friendly, avoiding the need for air travel.\n\nThe hackathon was divided into five focus areas: modules, pipelines, documentation, infrastructure, and subworkflows. We had **411** people register, including **278 in-person attendees** at **16 locations**. This is an increase of **38%** compared to the **289** people that attended our October 2022 event. The hackathon was hosted in multiple countries including Brazil, France, Germany, Italy, Poland, Senegal, Serbia, South Africa, Spain, Sweden, the UK, and the United States.\n\nWe would like to thank the many organizations worldwide who provided a venue to host the hackathon and helped make it a resounding success. Besides being an excellent educational event, we resolved many longstanding Nextflow and nf-core issues.\n\n
\n \"Hackathon\n
\n\nYou can access the project reports from each hackathon team over the three-day event compiled in HackMD below:\n\n- [Modules team](https://hackmd.io/A5v4soteQjKywl3UgFa_6g)\n- [Pipelines Team](https://hackmd.io/Bj_MK3ubQWGBD4t0X2KpjA)\n- [Documentation Team](https://hackmd.io/o6AgPTZ7RBGCyZI72O1haA)\n- [Infrastructure Team](https://hackmd.io/uC-mZlEXQy6DaXZdjV6akA)\n- [Subworkflows Team](https://hackmd.io/Udtvj4jASsWLtMgrbTNwBA)\n\nYou can also view ten Hackathon videos outlining the event, introducing an overview of the teams, and daily hackathon activities in the [March 2023 nf-core hackathon YouTube playlist](https://www.youtube.com/playlist?list=PL3xpfTVZLcNhfyF_QJIfSslnxRCU817yc). Check out activity in the nf-core hackathon March 2023 Github [issues board](https://github.com/orgs/nf-core/projects/38/views/16?layout=board) for a summary of what each team worked on.\n\n## A diverse and growing community\n\nWe were particularly pleased to see the growing diversity of the Nextflow and nf-core community membership, enabled partly by support from the Chan Zuckerberg Initiative Diversity and Inclusion grant and our nf-core mentorship programs. You can learn more about our mentorship efforts and exciting efforts of our global team in Chris Hakkaart’s excellent post, [Nextflow and nf-core Mentorship](https://nextflow.io/blog/2023/czi-mentorship-round-2.html) on the Nextflow blog.\n\nThe growing diversity of our community was also reflected in the results of our latest Nextflow Community survey, which you can read more about on the [Seqera Labs blog](https://seqera.io/blog/the-state-of-the-workflow-2023-community-survey-results/).\n\n
\n \"Hackathon\n
\n\n## Looking forward\n\nRunning global events at this scale takes a tremendous team effort. The resources compiled will be valuable in introducing more people to Nextflow and nf-core. Thanks to everyone who participated in this year’s training and hackathon events. We look forward to making these even bigger and better in the future!\n\nThe next community training will be held online September 2023. This will be followed by two Nextflow Summit events with associated nf-core hackathons:\n\n- Barcelona: October 16-20, 2023\n- Boston: November 2023 (dates to be confirmed)\n\nIf you’d like to join, you can register to receive news and updates about the events at [https://summit.nextflow.io/summit-2023-preregistration/](https://summit.nextflow.io/summit-2023-preregistration/)\n\nYou can follow us on Twitter at [@nextflowio](https://twitter.com/nextflowio) or [@nf_core](https://twitter.com/nf_core) or join the discussion on the [Nextflow](https://www.nextflow.io/slack-invite.html) and [nf-core](https://nf-co.re/join) community Slack channels.\n\n
\n \"Hackathon\n
\n\n
\n \"Hackathon\n
", "images": [ "/img/celebrating-our-largest-international-training-event-and-hackathon-to-date-1.jpg", "/img/celebrating-our-largest-international-training-event-and-hackathon-to-date-2.jpg", @@ -530,7 +530,7 @@ "slug": "2023/community-forum", "title": "Introducing community.seqera.io", "date": "2023-10-18T00:00:00.000Z", - "content": "\nWe are very excited to introduce the [Seqera community forum](https://community.seqera.io/) - the new home of the Nextflow community!\n\n

[community.seqera.io](https://community.seqera.io/)

\n\nThe growth of the Nextflow community over recent years has been phenomenal. The Nextflow Slack organization was launched in early 2022 and has already reached a membership of nearly 3,000 members. As we look ahead to growing to 5,000 and even 50,000, we are making a new tool available to the community: a community forum.\n\nWe expect the new forum to coexist with the Nextflow Slack. The forum will be great at medium-format discussion, whereas Slack is largely designed for short-term ephemeral conversations. We want to support this growth of the community and believe the new forum will allow us to scale.\n\nDiscourse is an open-source, web-based platform designed for online community discussions and forum-style interactions. Discourse offers a user-friendly interface, real-time notifications, and a wide range of customization options. It prioritizes healthy and productive conversations by promoting user-friendly features, such as trust levels, gamification, and robust moderation tools. Discourse is well known for its focus on fostering engaging and respectful discussions and already caters to many large developer communities. It’s able to serve immense groups, giving us confidence that it will meet the needs of our growing developer community just as well. We believe that Discourse is a natural fit for the evolution of the Nextflow community.\n\n

\n\nThe community forum offers many exciting new features. Here are some of the things you can expect:\n\n- **Open content:** Content on the new forum is public – accessible without login, indexed by Google, and can be linked to directly. This means that it will be much easier to find answers to your problems, as well as share solutions on other platforms.\n- **Don’t ask the same thing twice:** It’s not always easy to find answers when there’s a lot of content available. The community forum helps you by suggesting similar topics as you write a new post. An upcoming [Discourse AI Bot](https://www.discourse.org/plugins/ai.html) may even allow you to ask questions using natural language in the future!\n- **Stay afloat:** The community forum will ensure developers have a space where they can post without fear that what they write might be drowned out, and where anything that our community finds useful will rise to the top of the list. Discourse will give life to threads with high-quality content that may have otherwise gone unnoticed and lost in a sea of new posts.\n- **Better organized:** The forum model for categories, tags, threads, and quoting forces conversations to be structured. Many questions involve the broader Nextflow ecosystem, tagging with multiple topics will cut through the noise and allow people to participate in targeted and well-labeled discussions. Importantly, maintainers can move miscategorized posts without asking the original author to delete and write again.\n- **Multi-product:** The forum has categories for Nextflow but also [Seqera Platform](https://seqera.io/platform/), [MultiQC](https://seqera.io/multiqc/), [Wave](https://seqera.io/wave/), and [Fusion](https://seqera.io/fusion/). Questions that involve multiple Seqera products can now span these boundaries, and content can be shared between posts easily.\n- **Community recognition:** The community forum will encourage a healthy ecosystem of developers that provides value to everyone involved and rewards the most active users. The new forum encourages positive community behaviors through features such as badges, a trust system, and community moderation. There’s even a [community leaderboard](https://community.seqera.io/leaderboard/)! We plan to gradually introduce additional features over time as adoption grows.\n\nOnline discussion platforms have been the beating heart of the Nextflow community from its inception. The first was a Google groups email list, which was followed by the Gitter instant messaging platform, GitHub Discussions, and most recently, Slack. We’re thrilled to embark on this new chapter of the Nextflow community – let us know what you think and ask any questions you might have in the [“Site Feedback” forum category](https://community.seqera.io/c/community/site-feedback/2)! Join us today at [https://community.seqera.io](https://community.seqera.io/) for a new and improved developer experience.\n\n

[Visit the Seqera community forum](https://community.seqera.io/)

\n", + "content": "We are very excited to introduce the [Seqera community forum](https://community.seqera.io/) - the new home of the Nextflow community!\n\n[community.seqera.io](https://community.seqera.io/)\n\nThe growth of the Nextflow community over recent years has been phenomenal. The Nextflow Slack organization was launched in early 2022 and has already reached a membership of nearly 3,000 members. As we look ahead to growing to 5,000 and even 50,000, we are making a new tool available to the community: a community forum.\n\nWe expect the new forum to coexist with the Nextflow Slack. The forum will be great at medium-format discussion, whereas Slack is largely designed for short-term ephemeral conversations. We want to support this growth of the community and believe the new forum will allow us to scale.\n\nDiscourse is an open-source, web-based platform designed for online community discussions and forum-style interactions. Discourse offers a user-friendly interface, real-time notifications, and a wide range of customization options. It prioritizes healthy and productive conversations by promoting user-friendly features, such as trust levels, gamification, and robust moderation tools. Discourse is well known for its focus on fostering engaging and respectful discussions and already caters to many large developer communities. It’s able to serve immense groups, giving us confidence that it will meet the needs of our growing developer community just as well. We believe that Discourse is a natural fit for the evolution of the Nextflow community.\n\n\n\nThe community forum offers many exciting new features. Here are some of the things you can expect:\n\n- **Open content:** Content on the new forum is public – accessible without login, indexed by Google, and can be linked to directly. This means that it will be much easier to find answers to your problems, as well as share solutions on other platforms.\n- **Don’t ask the same thing twice:** It’s not always easy to find answers when there’s a lot of content available. The community forum helps you by suggesting similar topics as you write a new post. An upcoming [Discourse AI Bot](https://www.discourse.org/plugins/ai.html) may even allow you to ask questions using natural language in the future!\n- **Stay afloat:** The community forum will ensure developers have a space where they can post without fear that what they write might be drowned out, and where anything that our community finds useful will rise to the top of the list. Discourse will give life to threads with high-quality content that may have otherwise gone unnoticed and lost in a sea of new posts.\n- **Better organized:** The forum model for categories, tags, threads, and quoting forces conversations to be structured. Many questions involve the broader Nextflow ecosystem, tagging with multiple topics will cut through the noise and allow people to participate in targeted and well-labeled discussions. Importantly, maintainers can move miscategorized posts without asking the original author to delete and write again.\n- **Multi-product:** The forum has categories for Nextflow but also [Seqera Platform](https://seqera.io/platform/), [MultiQC](https://seqera.io/multiqc/), [Wave](https://seqera.io/wave/), and [Fusion](https://seqera.io/fusion/). 
Questions that involve multiple Seqera products can now span these boundaries, and content can be shared between posts easily.\n- **Community recognition:** The community forum will encourage a healthy ecosystem of developers that provides value to everyone involved and rewards the most active users. The new forum encourages positive community behaviors through features such as badges, a trust system, and community moderation. There’s even a [community leaderboard](https://community.seqera.io/leaderboard/)! We plan to gradually introduce additional features over time as adoption grows.\n\nOnline discussion platforms have been the beating heart of the Nextflow community from its inception. The first was a Google groups email list, which was followed by the Gitter instant messaging platform, GitHub Discussions, and most recently, Slack. We’re thrilled to embark on this new chapter of the Nextflow community – let us know what you think and ask any questions you might have in the [“Site Feedback” forum category](https://community.seqera.io/c/community/site-feedback/2)! Join us today at [https://community.seqera.io](https://community.seqera.io/) for a new and improved developer experience.\n\n[Visit the Seqera community forum](https://community.seqera.io/)", "images": [ "/img/seqera-community-all.png" ], @@ -541,7 +541,7 @@ "slug": "2023/czi-mentorship-round-2", "title": "Nextflow and nf-core Mentorship, Round 2", "date": "2023-04-17T00:00:00.000Z", - "content": "\n## Introduction\n\n
\n \"Mentorship\n

Nextflow and nf-core mentorship rocket.

\n
\n\nThe global Nextflow and nf-core community is thriving with strong engagement in several countries. As we continue to expand and grow, we remain committed to prioritizing inclusivity and actively reaching groups with low representation.\n\nThanks to the support of our Chan Zuckerberg Initiative Diversity and Inclusion grant, we established an international Nextflow and nf-core mentoring program. With the second round of the mentorship program now complete, we celebrate the success of the most recent cohort of mentors and mentees.\n\nFrom hundreds of applications, thirteen pairs of mentors and mentees were chosen for the second round of the program. For the past four months, they met regularly to collaborate on Nextflow or nf-core projects. The project scope was left up to the mentees, enabling them to work on any project aligned with their scientific interests and schedules.\n\nMentor-mentee pairs worked on a range of projects that included learning Nextflow and nf-core fundamentals, setting up Nextflow on their institutional clusters, translating Nextflow training material into other languages, and developing and implementing Nextflow and nf-core pipelines. Impressively, despite many mentees starting the program with very limited knowledge of Nextflow and nf-core, they completed the program with confidence and improved their abilities to develop and implement scalable and reproducible scientific workflows.\n\n![Map of mentor and mentee pairs](/img/mentorships-round2-map.png)
\n_The second round of the mentorship program was global._\n\n## Jing Lu (Mentee) & Moritz Beber (Mentor)\n\nJing joined the program with the goal of learning how to develop advanced Nextflow pipelines for disease surveillance at the Guangdong Provincial Center for Diseases Control and Prevention in China. His mentor was Moritz Beber from Denmark.\n\nTogether, Jing and Moritz developed a pipeline for the analysis of SARS-CoV-2 genomes from sewage samples. They also used GitHub and docker containers to make the pipeline more sharable and reproducible. In the future, Jing hopes to use Nextflow Tower to share the pipeline with other institutions.\n\n## Luria Leslie Founou (Mentee) & Sebastian Malkusch (Mentor)\n\nLuria's goal was to accelerate her understanding of Nextflow and apply it to her exploration of the resistome, virulome, mobilome, and phylogeny of bacteria at the Research Centre of Expertise and Biological Diagnostic of Cameroon. Luria was mentored by Sebastian Malkusch, Kolja Becker, and Alex Peltzer from the Boehringer Ingelheim Pharma GmbH & Co. KG in Germany.\n\nFor their project, Luria and her mentors developed a [pipeline](https://github.com/SMLMS/nfml) for mapping multi-dimensional feature space onto a discrete or continuous response variable by using multivariate models from the field of classical machine learning. Their pipeline will be able to handle classification, regression, and time-to-event models and can be used for model training, validation, and feature selection.\n\n## Sebastian Musundi (Mentee) & Athanasios Baltzis (Mentor)\n\nSebastian, from Mount Kenya University in Kenya, joined the mentorship program with the goal of using Nextflow pipelines to identify vaccine targets in Apicomplexan parasites. He was mentored by Athanasios Balzis from the Centre for Genomic Regulation in Spain.\n\nWith Athanasios’s help, Sebastian learned the fundamentals for developing Nextflow pipelines. During the learning process, they developed a [pipeline](https://github.com/sebymusundi/simple_RNA-seq) for customized RNA sequencing and a [pipeline](https://github.com/sebymusundi/AMR_pipeline) for predicting antimicrobial resistance genes. With his new skills, Sebastian plans to keep using Nextflow on a daily basis and start contributing to nf-core.\n\n## Juan Ugalde (Mentee) & Robert Petit (Mentor)\n\nJuan joined the mentorship program with the goal of improving his understanding of Nextflow to support microbial and viral analysis at the Universidad Andres Bello in Chile. Juan was mentored by Robert Petit from the Wyoming Public Health Laboratory in the USA. Robert is an experienced Nextflow mentor who also mentored in Round 1 of the program.\n\nJuan and Robert shared an interest in viral genomics. After learning more about the Nextflow and nf-core ecosystem, Robert mentored Juan as he developed a Nextflow viral amplicon analysis [pipeline](https://github.com/gene2dis/hantaflow). Juan will continue his Nextflow and nf-core journey by sharing his new knowledge with his group and incorporating it into his classes in the coming semester.\n\n## Bhargava Reddy Morampalli (Mentee) & Venkat Malladi (Mentor)\n\nBhargava studies at Massey University in New Zealand and joined the program with the goal of improving his understanding of Nextflow and resolving issues he was facing while developing a pipeline to analyze Nanopore direct RNA sequencing data. 
Bhargava was mentored by Venkat Malladi from Microsoft in the USA.\n\nBhargava and Venkat worked on Bhargava’s [pipeline](https://github.com/bhargava-morampalli/rnamods-nf/) to identify RNA modifications from bacteria. Their successes included advancing the pipeline and making Singularity images for the tools Bhargava was using to make it more reproducible. For Bhargava, the mentorship program was a great kickstart for learning Nextflow and his pipeline development. He hopes to continue to develop his pipeline and optimize it for cloud platforms in the future.\n\n## Odion Ikhimiukor (Mentee) & Ben Sherman (Mentor)\n\nBefore the program, Odion, who is at the University at Albany in the USA, was new to Nextflow and nf-core. He joined the program with the goal of improving his understanding and to learn how to develop pipelines for bacterial genome analysis. His mentor Ben Sherman works for Seqera Labs in the USA.\n\nDuring the program Odion and Ben developed a [pipeline](https://github.com/odionikh/nf-practice) to analyze bacterial genomes for antimicrobial resistance surveillance. They also developed configuration settings to enable the deployment of their pipeline with high and low resources. Odion has plans to share his new knowledge with others in his community.\n\n## Batool Almarzouq (Mentee) & Murray Wham (Mentor)\n\nBatool works at the King Abdullah International Medical Research Center in Saudi Arabia. Her goal for the mentorship program was to contribute to, and develop, nf-core pipelines.\nAdditionally, she aimed to develop new educational resources for nf-core that can support researchers from lowly represented groups. Her mentor was Murray Wham from the ​​University of Edinburgh in the UK.\n\nDuring the mentorship program, Murray helped Batool develop her molecular dynamics pipeline and participate in the 1st Biohackathon in MENA (KAUST). Batool and Murray also found ways to make documentation more accessible and are actively promoting Nextlfow and nf-core in Saudi Arabia.\n\n## Mariama Telly Diallo (Mentee) & Emilio Garcia (Mentor)\n\nMariama Telly joined the mentorship program with the goal of developing and implementing Nextflow pipelines for malaria research at the Medical Research Unit at The London School of Hygiene and Tropical Medicine in Gambia. She was mentored by Emilio Garcia from Platomics in Austria. Emilio is another experienced mentor who joined the program for a second time.\n\nTogether, Mariama Telly and Emilio worked on learning the basics of Nextflow, Git, and Docker. Putting these skills into practice they started to develop a Nextflow pipeline with a docker file and custom configuration. Mariama Telly greatly improved her understanding of best practices and Nextflow and intends to use her newfound knowledge for future projects.\n\n## Anabella Trigila (Mentee) & Matthias De Smet (Mentor)\n\nAnabella’s goal was to set up Nextflow on her institutional cluster at Héritas S.A. in Argentina and translate some bash pipelines into Nextflow pipelines. Anabella was mentored by Matthias De Smet from Ghent University in Belgium.\n\nAnabella and Matthias worked on developing several new nf-core modules. Extending this, they started the development of a [pipeline](https://github.com/atrigila/nf-core-saliva) to process VCFs obtained from saliva samples and a [pipeline](https://github.com/atrigila/nf-core-ancestry) to infer ancestry from VCF samples. 
Anabella has now transitioned from a user to a developer and made multiple contributions to the most recent nf-core hackathon. She also contributed to the Spanish translation of the Nextflow [training material](https://training.nextflow.io/es/).\n\n## Juliano de Oliveira Silveira (Mentee) & Maxime Garcia (Mentor)\n\nJuliano works at the Laboratório Central de Saúde Pública RS in Brazil. He joined the program with the goal of setting up Nextflow at his institution, which led him to learn to write his own pipelines. Juliano was mentored by Maxime Garcia from Seqera Labs in Sweden.\n\nJuliano and Maxime worked on learning about Nextflow and nf-core. Juliano applied his new skills to an open-source bioinformatics program that used Nextflow with a customized R script. Juliano hopes to give back to the wider community and peers in Brazil.\n\n## Patricia Agudelo-Romero (Mentee) & Abhinav Sharma (Mentor)\n\nPatricia's goal was to create, customize, and deploy nf-core pipelines at the Telethon Kids Institute in Australia. Her mentor was Abhinav Sharma from Stellenbosch University in South Africa.\n\nAbhinav helped Patricia learn how to write reproducible pipelines with Nextflow and how to work with shared code repositories on GitHub. With Abhinav's support, Patricia worked on translating a Snakemake [pipeline](https://github.com/agudeloromero/everest_nf) designed for genome virus identification and classification into Nextflow. Patricia is already applying her new skills and supporting others at her institute as they adopt Nextflow.\n\n## Mariana Guilardi (Mentee) & Alyssa Briggs (Mentor)\n\nMariana’s goal was to learn the fundamentals of Nextflow, construct and run pipelines, and help with nf-core pipeline development. Her mentor was Alyssa Briggs from the University of Texas at Dallas in the USA\n\nAt the start of the program, Alyssa helped Mariana learn the fundamentals of Nextflow. With Alyssa’s help, Mariana’s skills progressed rapidly and by the end of the program, they were running pipelines and developing new nf-core modules and the [nf-core/viralintegration](https://github.com/nf-core/viralintegration) pipeline. Mariana also made community contributions to the Portuguese translation of the Nextflow [training material](https://training.nextflow.io/pt/).\n\n## Liliane Cavalcante (Mentee) & Marcel Ribeiro-Dantas (Mentor)\n\nLiliane’s goal was to develop and apply Nextflow pipelines for genomic and epidemiological analyses at the Laboratório Central de Saúde Pública Noel Nutels in Brazil. Her mentor was Marcel Ribeiro-Dantas from Seqera Labs in Brazil.\n\nLiliane and Marcel used Nextflow and nf-core to analyze SARS-CoV-2 genomes and demographic data for public health surveillance. They used the [nf-core/viralrecon](https://nf-co.re/viralrecon) pipeline and made a new Nextflow script for additional analysis and generating graphs.\n\n## Conclusion\n\nAs with the first round of the program, the feedback about the second round of the mentorship program was overwhelmingly positive. All mentees found the experience to be highly beneficial and were grateful for the opportunity to participate.\n\n
\n “Having a mentor guide through the entire program was super cool. We worked all the way from the basics of Nextflow and learned a lot about developing and debugging pipelines. Today, I feel more confident than before in using Nextflow on a daily basis.” - Sebastian Musundi (Mentee)\n
\n\nSimilarly, the mentors also found the experience to be highly rewarding.\n\n
\n “As a mentor, I really enjoyed participating in the program. Not only did I have the chance to support and work with colleagues from lowly represented regions, but also I learned a lot and improved myself through the mentoring and teaching process.” - Athanasios Baltzis (Mentor)\n
\n\nImportantly, all program participants expressed their willingness to encourage others to be part of it in the future.\n\n
\n “The mentorship allows mentees not only to learn nf-core/Nextflow but also a lot of aspects about open-source reproducible research. With your learning, at the end of the mentorship, you could even contribute back to the nf-core community, which is fantastic! I would tell everyone who is interested in the program to go for it.” - Anabella Trigila (Mentee)\n
\n\nAs the Nextflow and nf-core communities continue to grow, the mentorship program will have long-lasting benefits beyond those that can be immediately measured. Mentees from the program have already become positive role models, contributing new perspectives to the broader community.\n\n
\n “I highly recommend this program. Independent if you are new to Nextflow or already have some experience, the possibility of working with amazing people to learn about the Nextflow ecosystem is invaluable. It helped me to improve my work, learn new things, and become confident enough to teach Nextflow to students.” - Juan Ugalde (Mentee)\n
\n\nWe were delighted with the achievements of the mentors and mentees. Applications for the third round are now open! For more information, please visit https://nf-co.re/mentorships.\n", + "content": "## Introduction\n\n
\n \"Mentorship\n \n\n*Nextflow and nf-core mentorship rocket.*\n\n
\n\nThe global Nextflow and nf-core community is thriving with strong engagement in several countries. As we continue to expand and grow, we remain committed to prioritizing inclusivity and actively reaching groups with low representation.\n\nThanks to the support of our Chan Zuckerberg Initiative Diversity and Inclusion grant, we established an international Nextflow and nf-core mentoring program. With the second round of the mentorship program now complete, we celebrate the success of the most recent cohort of mentors and mentees.\n\nFrom hundreds of applications, thirteen pairs of mentors and mentees were chosen for the second round of the program. For the past four months, they met regularly to collaborate on Nextflow or nf-core projects. The project scope was left up to the mentees, enabling them to work on any project aligned with their scientific interests and schedules.\n\nMentor-mentee pairs worked on a range of projects that included learning Nextflow and nf-core fundamentals, setting up Nextflow on their institutional clusters, translating Nextflow training material into other languages, and developing and implementing Nextflow and nf-core pipelines. Impressively, despite many mentees starting the program with very limited knowledge of Nextflow and nf-core, they completed the program with confidence and improved their abilities to develop and implement scalable and reproducible scientific workflows.\n\n![Map of mentor and mentee pairs](/img/mentorships-round2-map.png)
\n_The second round of the mentorship program was global._\n\n## Jing Lu (Mentee) & Moritz Beber (Mentor)\n\nJing joined the program with the goal of learning how to develop advanced Nextflow pipelines for disease surveillance at the Guangdong Provincial Center for Disease Control and Prevention in China. His mentor was Moritz Beber from Denmark.\n\nTogether, Jing and Moritz developed a pipeline for the analysis of SARS-CoV-2 genomes from sewage samples. They also used GitHub and Docker containers to make the pipeline more sharable and reproducible. In the future, Jing hopes to use Nextflow Tower to share the pipeline with other institutions.\n\n## Luria Leslie Founou (Mentee) & Sebastian Malkusch (Mentor)\n\nLuria's goal was to accelerate her understanding of Nextflow and apply it to her exploration of the resistome, virulome, mobilome, and phylogeny of bacteria at the Research Centre of Expertise and Biological Diagnostic of Cameroon. Luria was mentored by Sebastian Malkusch, Kolja Becker, and Alex Peltzer from the Boehringer Ingelheim Pharma GmbH & Co. KG in Germany.\n\nFor their project, Luria and her mentors developed a [pipeline](https://github.com/SMLMS/nfml) for mapping multi-dimensional feature space onto a discrete or continuous response variable by using multivariate models from the field of classical machine learning. Their pipeline will be able to handle classification, regression, and time-to-event models and can be used for model training, validation, and feature selection.\n\n## Sebastian Musundi (Mentee) & Athanasios Baltzis (Mentor)\n\nSebastian, from Mount Kenya University in Kenya, joined the mentorship program with the goal of using Nextflow pipelines to identify vaccine targets in Apicomplexan parasites. He was mentored by Athanasios Baltzis from the Centre for Genomic Regulation in Spain.\n\nWith Athanasios’s help, Sebastian learned the fundamentals for developing Nextflow pipelines. During the learning process, they developed a [pipeline](https://github.com/sebymusundi/simple_RNA-seq) for customized RNA sequencing and a [pipeline](https://github.com/sebymusundi/AMR_pipeline) for predicting antimicrobial resistance genes. With his new skills, Sebastian plans to keep using Nextflow on a daily basis and start contributing to nf-core.\n\n## Juan Ugalde (Mentee) & Robert Petit (Mentor)\n\nJuan joined the mentorship program with the goal of improving his understanding of Nextflow to support microbial and viral analysis at the Universidad Andres Bello in Chile. Juan was mentored by Robert Petit from the Wyoming Public Health Laboratory in the USA. Robert is an experienced Nextflow mentor who also mentored in Round 1 of the program.\n\nJuan and Robert shared an interest in viral genomics. After learning more about the Nextflow and nf-core ecosystem, Robert mentored Juan as he developed a Nextflow viral amplicon analysis [pipeline](https://github.com/gene2dis/hantaflow). Juan will continue his Nextflow and nf-core journey by sharing his new knowledge with his group and incorporating it into his classes in the coming semester.\n\n## Bhargava Reddy Morampalli (Mentee) & Venkat Malladi (Mentor)\n\nBhargava studies at Massey University in New Zealand and joined the program with the goal of improving his understanding of Nextflow and resolving issues he was facing while developing a pipeline to analyze Nanopore direct RNA sequencing data. 
Bhargava was mentored by Venkat Malladi from Microsoft in the USA.\n\nBhargava and Venkat worked on Bhargava’s [pipeline](https://github.com/bhargava-morampalli/rnamods-nf/) to identify RNA modifications from bacteria. Their successes included advancing the pipeline and making Singularity images for the tools Bhargava was using to make it more reproducible. For Bhargava, the mentorship program was a great kickstart for learning Nextflow and his pipeline development. He hopes to continue to develop his pipeline and optimize it for cloud platforms in the future.\n\n## Odion Ikhimiukor (Mentee) & Ben Sherman (Mentor)\n\nBefore the program, Odion, who is at the University at Albany in the USA, was new to Nextflow and nf-core. He joined the program with the goal of improving his understanding and learning how to develop pipelines for bacterial genome analysis. His mentor Ben Sherman works for Seqera Labs in the USA.\n\nDuring the program, Odion and Ben developed a [pipeline](https://github.com/odionikh/nf-practice) to analyze bacterial genomes for antimicrobial resistance surveillance. They also developed configuration settings to enable the deployment of their pipeline with high and low resources. Odion has plans to share his new knowledge with others in his community.\n\n## Batool Almarzouq (Mentee) & Murray Wham (Mentor)\n\nBatool works at the King Abdullah International Medical Research Center in Saudi Arabia. Her goal for the mentorship program was to contribute to, and develop, nf-core pipelines.\nAdditionally, she aimed to develop new educational resources for nf-core that can support researchers from underrepresented groups. Her mentor was Murray Wham from the University of Edinburgh in the UK.\n\nDuring the mentorship program, Murray helped Batool develop her molecular dynamics pipeline and participate in the 1st Biohackathon in MENA (KAUST). Batool and Murray also found ways to make documentation more accessible and are actively promoting Nextflow and nf-core in Saudi Arabia.\n\n## Mariama Telly Diallo (Mentee) & Emilio Garcia (Mentor)\n\nMariama Telly joined the mentorship program with the goal of developing and implementing Nextflow pipelines for malaria research at the Medical Research Unit at The London School of Hygiene and Tropical Medicine in Gambia. She was mentored by Emilio Garcia from Platomics in Austria. Emilio is another experienced mentor who joined the program for a second time.\n\nTogether, Mariama Telly and Emilio worked on learning the basics of Nextflow, Git, and Docker. Putting these skills into practice, they started to develop a Nextflow pipeline with a Dockerfile and custom configuration. Mariama Telly greatly improved her understanding of best practices and Nextflow and intends to use her newfound knowledge for future projects.\n\n## Anabella Trigila (Mentee) & Matthias De Smet (Mentor)\n\nAnabella’s goal was to set up Nextflow on her institutional cluster at Héritas S.A. in Argentina and translate some bash pipelines into Nextflow pipelines. Anabella was mentored by Matthias De Smet from Ghent University in Belgium.\n\nAnabella and Matthias worked on developing several new nf-core modules. Extending this, they started the development of a [pipeline](https://github.com/atrigila/nf-core-saliva) to process VCFs obtained from saliva samples and a [pipeline](https://github.com/atrigila/nf-core-ancestry) to infer ancestry from VCF samples. 
Anabella has now transitioned from a user to a developer and made multiple contributions to the most recent nf-core hackathon. She also contributed to the Spanish translation of the Nextflow [training material](https://training.nextflow.io/es/).\n\n## Juliano de Oliveira Silveira (Mentee) & Maxime Garcia (Mentor)\n\nJuliano works at the Laboratório Central de Saúde Pública RS in Brazil. He joined the program with the goal of setting up Nextflow at his institution, which led him to learn to write his own pipelines. Juliano was mentored by Maxime Garcia from Seqera Labs in Sweden.\n\nJuliano and Maxime worked on learning about Nextflow and nf-core. Juliano applied his new skills to an open-source bioinformatics program that used Nextflow with a customized R script. Juliano hopes to give back to the wider community and peers in Brazil.\n\n## Patricia Agudelo-Romero (Mentee) & Abhinav Sharma (Mentor)\n\nPatricia's goal was to create, customize, and deploy nf-core pipelines at the Telethon Kids Institute in Australia. Her mentor was Abhinav Sharma from Stellenbosch University in South Africa.\n\nAbhinav helped Patricia learn how to write reproducible pipelines with Nextflow and how to work with shared code repositories on GitHub. With Abhinav's support, Patricia worked on translating a Snakemake [pipeline](https://github.com/agudeloromero/everest_nf) designed for genome virus identification and classification into Nextflow. Patricia is already applying her new skills and supporting others at her institute as they adopt Nextflow.\n\n## Mariana Guilardi (Mentee) & Alyssa Briggs (Mentor)\n\nMariana’s goal was to learn the fundamentals of Nextflow, construct and run pipelines, and help with nf-core pipeline development. Her mentor was Alyssa Briggs from the University of Texas at Dallas in the USA.\n\nAt the start of the program, Alyssa helped Mariana learn the fundamentals of Nextflow. With Alyssa’s help, Mariana’s skills progressed rapidly and by the end of the program, they were running pipelines and developing new nf-core modules and the [nf-core/viralintegration](https://github.com/nf-core/viralintegration) pipeline. Mariana also made community contributions to the Portuguese translation of the Nextflow [training material](https://training.nextflow.io/pt/).\n\n## Liliane Cavalcante (Mentee) & Marcel Ribeiro-Dantas (Mentor)\n\nLiliane’s goal was to develop and apply Nextflow pipelines for genomic and epidemiological analyses at the Laboratório Central de Saúde Pública Noel Nutels in Brazil. Her mentor was Marcel Ribeiro-Dantas from Seqera Labs in Brazil.\n\nLiliane and Marcel used Nextflow and nf-core to analyze SARS-CoV-2 genomes and demographic data for public health surveillance. They used the [nf-core/viralrecon](https://nf-co.re/viralrecon) pipeline and made a new Nextflow script for additional analysis and graph generation.\n\n## Conclusion\n\nAs with the first round of the program, the feedback about the second round of the mentorship program was overwhelmingly positive. All mentees found the experience to be highly beneficial and were grateful for the opportunity to participate.\n\n> *“Having a mentor guide through the entire program was super cool. We worked all the way from the basics of Nextflow and learned a lot about developing and debugging pipelines. 
Today, I feel more confident than before in using Nextflow on a daily basis.”* - Sebastian Musundi (Mentee)\n\nSimilarly, the mentors also found the experience to be highly rewarding.\n\n> *“As a mentor, I really enjoyed participating in the program. Not only did I have the chance to support and work with colleagues from lowly represented regions, but also I learned a lot and improved myself through the mentoring and teaching process.”* - Athanasios Baltzis (Mentor)\n\nImportantly, all program participants expressed their willingness to encourage others to be part of it in the future.\n\n> *“The mentorship allows mentees not only to learn nf-core/Nextflow but also a lot of aspects about open-source reproducible research. With your learning, at the end of the mentorship, you could even contribute back to the nf-core community, which is fantastic! I would tell everyone who is interested in the program to go for it.”* - Anabella Trigila (Mentee)\n\nAs the Nextflow and nf-core communities continue to grow, the mentorship program will have long-lasting benefits beyond those that can be immediately measured. Mentees from the program have already become positive role models, contributing new perspectives to the broader community.\n\n> *“I highly recommend this program. Independent if you are new to Nextflow or already have some experience, the possibility of working with amazing people to learn about the Nextflow ecosystem is invaluable. It helped me to improve my work, learn new things, and become confident enough to teach Nextflow to students.”* - Juan Ugalde (Mentee)\n\nWe were delighted with the achievements of the mentors and mentees. Applications for the third round are now open! For more information, please visit https://nf-co.re/mentorships.", "images": [ "/img/mentorships-round2-rocket.png" ], @@ -552,7 +552,7 @@ "slug": "2023/czi-mentorship-round-3", "title": "Nextflow and nf-core Mentorship, Round 3", "date": "2023-11-13T00:00:00.000Z", - "content": "\n
\n \"Mentorship\n

Nextflow and nf-core mentorship rocket.

\n
\n\nWith the third round of the [Nextflow and nf-core mentorship program](https://nf-co.re/mentorships) now behind us, it's time to pop the confetti and celebrate the outstanding achievements of our latest group of mentors and mentees!\n\nAs with the [first](https://www.nextflow.io/blog/2022/czi-mentorship-round-1.html) and [second](https://www.nextflow.io/blog/2023/czi-mentorship-round-2.html) rounds of the program, we received hundreds of applications from all over the world. Mentors and mentees were matched based on compatible interests and time zones and set off to work on a project of their choosing. Pairs met regularly to work on their projects and reported back to the group to discuss their progress every month.\n\nThe mentor-mentee duos chose to tackle many interesting projects during the program. From learning how to develop pipelines with Nextflow and nf-core, setting up Nextflow on their institutional clusters, and translating Nextflow training materials into other languages, this cohort of mentors and mentees did it all. Regardless of all initial challenges, every pair emerged from the program brimming with confidence and a knack for building scalable and reproducible scientific workflows with Nextlfow. Way to go, team!\n\n![Map of mentor and mentee pairs](/img/mentorship_3_map.png)
\n_Participants of the third round of the mentorship program._\n\n## Abhay Rastogi and Matthias De Smet\n\nAbhay Rastogi is a Clinical Research Fellow at the All India Institute Of Medical Sciences (AllMS Delhi). During the program, he wanted to contribute to the [nf-core/sarek](https://github.com/nf-core/sarek/) pipeline. He was mentored by Matthias De Smet, a Bioinformatician at the Center for Medical Genetics in the Ghent University Hospital. Together they worked on developing an nf-core module for Exomiser, a variant prioritization tool for short-read WGS data that they hope to incorporate into [nf-core/sarek](https://github.com/nf-core/sarek/). Keep an eye out for this brand new feature as they continue to work towards implementing this new feature into the [nf-core/sarek](https://github.com/nf-core/sarek/) pipeline!\n\n## Alan Möbbs and Simon Pearce\n\nAlan Möbbs, a Bioinformatics Analyst at MultiplAI, was mentored by Simon Pearce, Principal Bioinformatician at the Cancer Research UK Cancer Biomarker Centre. During the program, Alan wanted to create a custom pipeline that merges functionalities from the [nf-core/rnaseq](https://github.com/nf-core/rnaseq/) and [nf-core/rnavar](https://github.com/nf-core/rnavar/) pipelines. They started their project by forking the [nf-core/rnaseq](https://github.com/nf-core/rnaseq/) pipeline and adding a subworkflow with variant calling functionalities. As the project moved on, they were able to remove tools from the pipeline that were no longer required. Finally, they created some custom definitions for processing samples and work queues to optimize the workflow on AWS. Alan plans to keep working on this project in the future.\n\n## Cen Liau and Chris Hakkaart\n\nCen Liau is a scientist at the Bragato Research Institute in New Zealand, analyzing the epigenetics of grapevines in response to environmental stress. Her mentor was Chris Hakkaart, a Developer Advocate at Seqera. They started the program by deploying the [nf-core/methylseq](https://github.com/nf-core/methylseq/) pipeline on New Zealand’s national infrastructure to analyze data Cen had produced. Afterward, they started to develop a proof of concept methylation pipeline to analyze additional data Cen has produced. Along the way, they learned about nf-core best practices and how to use GitHub to build pipelines collaboratively.\n\n## Chenyu Jin and Ben Sherman\n\nChenyu Jin is a Ph.D. student at the Center for Palaeogenetics of the Swedish Museum of Natural History. She worked with Ben Sherman, a Software Engineer at Seqera. Together they worked towards establishing a workflow for recursive step-down classification using experimental Nextflow features. During the program, they made huge progress in developing a cutting-edge pipeline that can be used for analyzing ancient environmental DNA and reconstructing flora and fauna. Watch this space for future developments!\n\n## Georgie Samaha and Cristina Tuñí i Domínguez\n\nGeorgie Samaha, a bioinformatician from the University of Sydney, was mentored by Cristina Tuñi i Domínguez, a Bioinformatics Scientist at Flomics Biotech SL. During the program, they developed Nextflow configuration files. As a part of this, they built institutional configuration files for multiple national research HPC and cloud infrastructures in Australia. 
Towards the end of the mentorship, they [built a tool for building configuration files](https://github.com/georgiesamaha/configBuilder-nf) that they hope to share widely in the future.\n\n## Ícaro Maia Santos de Castro and Robert Petit\n\nÍcaro Maia Santos is a Ph.D. Candidate at the University of São Paulo. He was mentored by Robert, a Research Scientist from Wyoming Public Health Lab. After learning the basics of Nextflow and nf-core, they worked on a [metatranscriptomics pipeline](https://github.com/icaromsc/nf-core-phiflow) that simultaneously characterizes microbial composition and host gene expression RNA sequencing samples. As a part of this process, they used nf-core modules that were already available and developed and contributed new modules to the nf-core repository. Ícaro found having someone to help him learn and overcome issues as he was developing his pipeline was invaluable for his career.\n\n![phiflow metro map](/img/phiflow_metro_map.png)
\n_Metro map of the phiflow workflow._\n\n## Lila Maciel Rodríguez Pérez and Priyanka Surana\n\nLila Maciel Rodríguez Pérez, from the National Agrarian University in Peru, was mentored by Priyanka Surana, a researcher from the Wellcome Sanger Institute in the UK. Lila and Priyanka focused on building and deploying Nextflow scripts for metagenomic assemblies. In particular, they were interested in the identification of Antibiotic-Resistant Genes (ARG), Metal-Resistant Genes (MRG), and Mobile Genetic Elements (MGE) in different environments, and in figuring out how these genes are correlated. Both Lila and Priyanka spoke highly of each other and how much they enjoyed being a part of the program.\n\n## Luisa Sacristan and Gisela Gabernet\n\nLuisa is an MSc. student studying computational biology in the Computational Biology and Microbial Ecology group at Universidad de los Andes in Colombia. She was mentored by Gisela Gabernet, a researcher at Yale Medical School. At the start of the program, Luisa and Gisela focused on learning more about GitHub. They quickly moved on to developing an nf-core configuration file for Luisa’s local university cluster. Finally, they started developing a pipeline for the analysis of custom ONT metagenomic amplicons from coffee beans.\n\n## Natalia Coutouné and Marcel Ribeiro-Dantas\n\nNatalia Coutoné is a Ph.D. Candidate at the University of Campinas in Brazil. Her mentor was Marcel Ribeiro-Dantas from Seqera. Natalia and Marcel worked on developing a pipeline to identify relevant QTL among two or more pool-seq samples. Learning the little things, such as how and where to get help was a valuable part of the learning process for Natalia. She also found it especially useful to consolidate a “Frankenstein” pipeline she had been using into a cohesive Nextflow pipeline that she could share with others.\n\n## Raquel Manzano and Maxime Garcia\n\nRaquel Manzano is a bioinformatician and Ph.D. candidate at the University of Cambridge, Cancer Research UK Cambridge Institute. She was mentored by Maxime Garcia, a bioinformatics engineer at Seqera. During the program, they spent their time developing the [nf-core/rnadnavar](https://github.com/nf-core/rnadnavar/) pipeline. Initially designed for cancer research, this pipeline identifies a consensus call set from RNA and DNA somatic variant calling tools. Both Raquel and Maxime found the program to be highly rewarding. Raquel’s [presentation](https://www.youtube.com/watch?v=PzGOvqSI5n0) about the rnadnavar pipeline and her experience as a mentee from the 2023 Nextflow Summit in Barcelona is now online.\n\n## Conclusion\n\nWe are thrilled to report that the feedback from both mentors and mentees has been overwhelmingly positive. Every participant, whether mentor or mentee, found the experience extremely valuable and expressed gratitude for the chance to participate.\n\n
\n “I loved the experience and the opportunity to develop my autonomy in nextflow/nf-core. This community is totally amazing!” - Icaro Castro\n
\n\n
\n “I think this was a great opportunity to learn about a tool that can make our day-to-day easier and reproducible. Who knows, maybe it can give you a better chance when applying for jobs.” - Alan Möbbs\n
\n\nThanks to the fantastic support of the Chan Zuckerberg Initiative Diversity and Inclusion grant, Seqera, and our fantastic community, who made it possible to run all three rounds of the Nextflow and nf-core mentorship program.\n", + "content": "
\n \"Mentorship\n \n\n*Nextflow and nf-core mentorship rocket.*\n\n
\n\nWith the third round of the [Nextflow and nf-core mentorship program](https://nf-co.re/mentorships) now behind us, it's time to pop the confetti and celebrate the outstanding achievements of our latest group of mentors and mentees!\n\nAs with the [first](https://www.nextflow.io/blog/2022/czi-mentorship-round-1.html) and [second](https://www.nextflow.io/blog/2023/czi-mentorship-round-2.html) rounds of the program, we received hundreds of applications from all over the world. Mentors and mentees were matched based on compatible interests and time zones and set off to work on a project of their choosing. Pairs met regularly to work on their projects and reported back to the group to discuss their progress every month.\n\nThe mentor-mentee duos chose to tackle many interesting projects during the program. From learning how to develop pipelines with Nextflow and nf-core, setting up Nextflow on their institutional clusters, and translating Nextflow training materials into other languages, this cohort of mentors and mentees did it all. Despite initial challenges, every pair emerged from the program brimming with confidence and a knack for building scalable and reproducible scientific workflows with Nextflow. Way to go, team!\n\n![Map of mentor and mentee pairs](/img/mentorship_3_map.png)
\n_Participants of the third round of the mentorship program._\n\n## Abhay Rastogi and Matthias De Smet\n\nAbhay Rastogi is a Clinical Research Fellow at the All India Institute of Medical Sciences (AIIMS Delhi). During the program, he wanted to contribute to the [nf-core/sarek](https://github.com/nf-core/sarek/) pipeline. He was mentored by Matthias De Smet, a Bioinformatician at the Center for Medical Genetics in the Ghent University Hospital. Together they worked on developing an nf-core module for Exomiser, a variant prioritization tool for short-read WGS data that they hope to incorporate into [nf-core/sarek](https://github.com/nf-core/sarek/). Keep an eye out for this brand-new feature as they continue working towards implementing it in the [nf-core/sarek](https://github.com/nf-core/sarek/) pipeline!\n\n## Alan Möbbs and Simon Pearce\n\nAlan Möbbs, a Bioinformatics Analyst at MultiplAI, was mentored by Simon Pearce, Principal Bioinformatician at the Cancer Research UK Cancer Biomarker Centre. During the program, Alan wanted to create a custom pipeline that merges functionalities from the [nf-core/rnaseq](https://github.com/nf-core/rnaseq/) and [nf-core/rnavar](https://github.com/nf-core/rnavar/) pipelines. They started their project by forking the [nf-core/rnaseq](https://github.com/nf-core/rnaseq/) pipeline and adding a subworkflow with variant calling functionalities. As the project moved on, they were able to remove tools from the pipeline that were no longer required. Finally, they created some custom definitions for processing samples and work queues to optimize the workflow on AWS. Alan plans to keep working on this project in the future.\n\n## Cen Liau and Chris Hakkaart\n\nCen Liau is a scientist at the Bragato Research Institute in New Zealand, analyzing the epigenetics of grapevines in response to environmental stress. Her mentor was Chris Hakkaart, a Developer Advocate at Seqera. They started the program by deploying the [nf-core/methylseq](https://github.com/nf-core/methylseq/) pipeline on New Zealand’s national infrastructure to analyze data Cen had produced. Afterward, they started to develop a proof-of-concept methylation pipeline to analyze additional data Cen has produced. Along the way, they learned about nf-core best practices and how to use GitHub to build pipelines collaboratively.\n\n## Chenyu Jin and Ben Sherman\n\nChenyu Jin is a Ph.D. student at the Center for Palaeogenetics of the Swedish Museum of Natural History. She worked with Ben Sherman, a Software Engineer at Seqera. Together they worked towards establishing a workflow for recursive step-down classification using experimental Nextflow features. During the program, they made huge progress in developing a cutting-edge pipeline that can be used for analyzing ancient environmental DNA and reconstructing flora and fauna. Watch this space for future developments!\n\n## Georgie Samaha and Cristina Tuñí i Domínguez\n\nGeorgie Samaha, a bioinformatician from the University of Sydney, was mentored by Cristina Tuñí i Domínguez, a Bioinformatics Scientist at Flomics Biotech SL. During the program, they developed Nextflow configuration files. As a part of this, they built institutional configuration files for multiple national research HPC and cloud infrastructures in Australia. 
Towards the end of the mentorship, they [built a tool for building configuration files](https://github.com/georgiesamaha/configBuilder-nf) that they hope to share widely in the future.\n\n## Ícaro Maia Santos de Castro and Robert Petit\n\nÍcaro Maia Santos is a Ph.D. Candidate at the University of São Paulo. He was mentored by Robert Petit, a Research Scientist from the Wyoming Public Health Laboratory. After learning the basics of Nextflow and nf-core, they worked on a [metatranscriptomics pipeline](https://github.com/icaromsc/nf-core-phiflow) that simultaneously characterizes microbial composition and host gene expression from RNA sequencing samples. As a part of this process, they used existing nf-core modules and developed and contributed new modules to the nf-core repository. Ícaro found that having someone to help him learn and overcome issues while developing his pipeline was invaluable for his career.\n\n![phiflow metro map](/img/phiflow_metro_map.png)
\n_Metro map of the phiflow workflow._\n\n## Lila Maciel Rodríguez Pérez and Priyanka Surana\n\nLila Maciel Rodríguez Pérez, from the National Agrarian University in Peru, was mentored by Priyanka Surana, a researcher from the Wellcome Sanger Institute in the UK. Lila and Priyanka focused on building and deploying Nextflow scripts for metagenomic assemblies. In particular, they were interested in the identification of Antibiotic-Resistant Genes (ARG), Metal-Resistant Genes (MRG), and Mobile Genetic Elements (MGE) in different environments, and in figuring out how these genes are correlated. Both Lila and Priyanka spoke highly of each other and how much they enjoyed being a part of the program.\n\n## Luisa Sacristan and Gisela Gabernet\n\nLuisa is an MSc. student studying computational biology in the Computational Biology and Microbial Ecology group at Universidad de los Andes in Colombia. She was mentored by Gisela Gabernet, a researcher at Yale Medical School. At the start of the program, Luisa and Gisela focused on learning more about GitHub. They quickly moved on to developing an nf-core configuration file for Luisa’s local university cluster. Finally, they started developing a pipeline for the analysis of custom ONT metagenomic amplicons from coffee beans.\n\n## Natalia Coutouné and Marcel Ribeiro-Dantas\n\nNatalia Coutoné is a Ph.D. Candidate at the University of Campinas in Brazil. Her mentor was Marcel Ribeiro-Dantas from Seqera. Natalia and Marcel worked on developing a pipeline to identify relevant QTL among two or more pool-seq samples. Learning the little things, such as how and where to get help was a valuable part of the learning process for Natalia. She also found it especially useful to consolidate a “Frankenstein” pipeline she had been using into a cohesive Nextflow pipeline that she could share with others.\n\n## Raquel Manzano and Maxime Garcia\n\nRaquel Manzano is a bioinformatician and Ph.D. candidate at the University of Cambridge, Cancer Research UK Cambridge Institute. She was mentored by Maxime Garcia, a bioinformatics engineer at Seqera. During the program, they spent their time developing the [nf-core/rnadnavar](https://github.com/nf-core/rnadnavar/) pipeline. Initially designed for cancer research, this pipeline identifies a consensus call set from RNA and DNA somatic variant calling tools. Both Raquel and Maxime found the program to be highly rewarding. Raquel’s [presentation](https://www.youtube.com/watch?v=PzGOvqSI5n0) about the rnadnavar pipeline and her experience as a mentee from the 2023 Nextflow Summit in Barcelona is now online.\n\n## Conclusion\n\nWe are thrilled to report that the feedback from both mentors and mentees has been overwhelmingly positive. Every participant, whether mentor or mentee, found the experience extremely valuable and expressed gratitude for the chance to participate.\n\n> *“I loved the experience and the opportunity to develop my autonomy in nextflow/nf-core. This community is totally amazing!”* - Icaro Castro\n\n> *“I think this was a great opportunity to learn about a tool that can make our day-to-day easier and reproducible. 
Who knows, maybe it can give you a better chance when applying for jobs.”* - Alan Möbbs\n\nThanks to the fantastic support of the Chan Zuckerberg Initiative Diversity and Inclusion grant, Seqera, and our fantastic community, who made it possible to run all three rounds of the Nextflow and nf-core mentorship program.", "images": [ "/img/mentorship_3_sticker.png" ], @@ -563,7 +563,7 @@ "slug": "2023/geraldine-van-der-auwera-joins-seqera", "title": "Geraldine Van der Auwera joins Seqera", "date": "2023-10-11T00:00:00.000Z", - "content": "\n\n\nI’m excited to announce that I’m joining Seqera as Lead Developer Advocate. My mission is to support the growth of the Nextflow user community, especially in the USA, which will involve running community events, conducting training sessions, managing communications and working globally with our partners across the field to ensure Nextflow users have what they need to be successful. I’ll be working remotely from Boston, in collaboration with Paolo, Phil and the rest of the Nextflow team.\n\nSome of you may already know me from my previous job at the Broad Institute, where I spent a solid decade doing outreach and providing support for the genomics research community, first for GATK, then for WDL and Cromwell, and eventually Terra. A smaller subset might have come across the O’Reilly book I co-authored, [Genomics on the Cloud](https://www.oreilly.com/library/view/genomics-in-the/9781491975183/).\n\nThis new mission is very much a continuation of my dedication to helping the research community use cutting-edge software tools effectively.\n\n## From bacterial cultures to large-scale genomics\n\nTo give you a brief sense of where I’m coming from, I originally trained as a wetlab microbiologist in my homeland of Belgium, so it’s fair to say I’ve come a long way, quite literally. I never took a computing class, but taught myself Python during my PhD to analyze bacterial plasmid sequencing data (72 kb of Sanger sequence!) and sort of fell in love with bioinformatics in the process. Later, I got the opportunity to deepen my bioinformatics skills during my postdoc at Harvard Medical School, although my overall research project was still very focused on wetlab work.\n\nToward the end of my postdoc, I realized I had become more interested in the software side of things, though I didn’t have any formal qualifications. Fortunately I was able to take a big leap sideways and found a new home at the Broad Institute, where I was hired as a Bioinformatics Scientist to build out the GATK community, at a time when it was still a bit niche. (It’s a long story that I don’t have time for today, but I’m always happy to tell it over drinks at a conference reception…)\n\nThe GATK job involved providing technical and scientific support to researchers, developing documentation, and teaching workshops about genomics and variant calling specifically. Which is hilarious because at the time I was hired, I had no clue what variant calling even meant! I think I was easily a month or two into the job before that part actually started making a little bit of sense. I still remember the stress and confusion of trying to figure all that out, and it’s something I always carry with me when I think about how to help newcomers to the ecosystem. 
I can safely say, whatever aspect of this highly multidisciplinary field is causing you trouble, I’ve struggled with it myself at some point.\n\nAnyway, I can’t fully summarize a decade in a couple of paragraphs, but suffice to say, I learned an enormous amount on the job. And in the process, I developed a passion for helping researchers take maximum advantage of the powerful bioinformatics at their disposal. Which inevitably involves workflows.\n\n## Going with the flow\n\nOver time my responsibilities at the Broad grew into supporting not just GATK, but also the workflow systems people use to run tools like GATK at scale, both on premises and increasingly, on public cloud platforms. My own pipelining experience has been focused on WDL and Cromwell, but I’ve dabbled with most of the mainstream tools in the space.\n\nIf I had a dollar for every time I’ve been asked the question “What’s the best workflow language?” I’d still need a full-time job, but I could maybe take a nice holiday somewhere warm. Oh, and my answer is: whatever gets the work done, plays nice with the systems you’re tied to, and connects you to a community.\n\nThat’s one of the reasons I’ve been watching the growth of Nextflow’s popularity with great interest for the last few years. The amount of community engagement that we’ve seen around Nextflow, and especially around the development of nf-core, has been really impressive.\n\nSo I’m especially thrilled to be joining the Seqera team the week of the [Nextflow Summit](https://summit.nextflow.io/) in Barcelona, because it means I’ll get to meet a lot of people from the community in person during my very first few days on the job. I’m also very much looking forward to participating in the hackathon, which should be a great way for me to get started doing real work with Nextflow.\n\nI’m hoping to see many of you there!\n", + "content": "\n\nI’m excited to announce that I’m joining Seqera as Lead Developer Advocate. My mission is to support the growth of the Nextflow user community, especially in the USA, which will involve running community events, conducting training sessions, managing communications and working globally with our partners across the field to ensure Nextflow users have what they need to be successful. I’ll be working remotely from Boston, in collaboration with Paolo, Phil and the rest of the Nextflow team.\n\nSome of you may already know me from my previous job at the Broad Institute, where I spent a solid decade doing outreach and providing support for the genomics research community, first for GATK, then for WDL and Cromwell, and eventually Terra. A smaller subset might have come across the O’Reilly book I co-authored, [Genomics on the Cloud](https://www.oreilly.com/library/view/genomics-in-the/9781491975183/).\n\nThis new mission is very much a continuation of my dedication to helping the research community use cutting-edge software tools effectively.\n\n## From bacterial cultures to large-scale genomics\n\nTo give you a brief sense of where I’m coming from, I originally trained as a wetlab microbiologist in my homeland of Belgium, so it’s fair to say I’ve come a long way, quite literally. I never took a computing class, but taught myself Python during my PhD to analyze bacterial plasmid sequencing data (72 kb of Sanger sequence!) and sort of fell in love with bioinformatics in the process. 
Later, I got the opportunity to deepen my bioinformatics skills during my postdoc at Harvard Medical School, although my overall research project was still very focused on wetlab work.\n\nToward the end of my postdoc, I realized I had become more interested in the software side of things, though I didn’t have any formal qualifications. Fortunately I was able to take a big leap sideways and found a new home at the Broad Institute, where I was hired as a Bioinformatics Scientist to build out the GATK community, at a time when it was still a bit niche. (It’s a long story that I don’t have time for today, but I’m always happy to tell it over drinks at a conference reception…)\n\nThe GATK job involved providing technical and scientific support to researchers, developing documentation, and teaching workshops about genomics and variant calling specifically. Which is hilarious because at the time I was hired, I had no clue what variant calling even meant! I think I was easily a month or two into the job before that part actually started making a little bit of sense. I still remember the stress and confusion of trying to figure all that out, and it’s something I always carry with me when I think about how to help newcomers to the ecosystem. I can safely say, whatever aspect of this highly multidisciplinary field is causing you trouble, I’ve struggled with it myself at some point.\n\nAnyway, I can’t fully summarize a decade in a couple of paragraphs, but suffice to say, I learned an enormous amount on the job. And in the process, I developed a passion for helping researchers take maximum advantage of the powerful bioinformatics at their disposal. Which inevitably involves workflows.\n\n## Going with the flow\n\nOver time my responsibilities at the Broad grew into supporting not just GATK, but also the workflow systems people use to run tools like GATK at scale, both on premises and increasingly, on public cloud platforms. My own pipelining experience has been focused on WDL and Cromwell, but I’ve dabbled with most of the mainstream tools in the space.\n\nIf I had a dollar for every time I’ve been asked the question “What’s the best workflow language?” I’d still need a full-time job, but I could maybe take a nice holiday somewhere warm. Oh, and my answer is: whatever gets the work done, plays nice with the systems you’re tied to, and connects you to a community.\n\nThat’s one of the reasons I’ve been watching the growth of Nextflow’s popularity with great interest for the last few years. The amount of community engagement that we’ve seen around Nextflow, and especially around the development of nf-core, has been really impressive.\n\nSo I’m especially thrilled to be joining the Seqera team the week of the [Nextflow Summit](https://summit.nextflow.io/) in Barcelona, because it means I’ll get to meet a lot of people from the community in person during my very first few days on the job. I’m also very much looking forward to participating in the hackathon, which should be a great way for me to get started doing real work with Nextflow.\n\nI’m hoping to see many of you there!", "images": [ "/img/geraldine-van-der-auwera.jpg" ], @@ -574,7 +574,7 @@ "slug": "2023/introducing-nextflow-ambassador-program", "title": "Introducing the Nextflow Ambassador Program", "date": "2023-10-18T00:00:00.000Z", - "content": "\n\n\nWe are excited to announce the launch of the Nextflow Ambassador Program, a worldwide initiative designed to foster collaboration, knowledge sharing, and community growth. 
It is intended to recognize and support the efforts of our community leaders and marks another step forward in our mission to advance scientific research and empower researchers.\n\nNextflow ambassadors will play a vital role in:\n\n- Sharing Knowledge: Ambassadors provide valuable insights and best practices to help users make the most of Nextflow by writing training material and blog posts, giving seminars and workshops, organizing hackathons and meet-ups, and helping with community support.\n- Fostering Collaboration: As knowledgeable members of our community, ambassadors facilitate connections among users and developers, enabling collaboration on community projects, such as nf-core pipelines, sub-workflows, and modules, among other things, in the Nextflow ecosystem.\n- Community Growth: Ambassadors help expand and enrich the Nextflow community, making it more vibrant and supportive. They are local contacts for new community members and engage with potential users in their region and fields of expertise.\n\nAs community members who already actively contribute to outreach, ambassadors will be supported to extend the work they're already doing. For example, many of our ambassadors run local Nextflow training events – to help with this, the program will include “train the trainer” sessions and give access to our content library with slide decks, templates, and more. Ambassadors can also request stickers and financial support for events they organize (e.g., for pizza). Seqera is opening an exclusive travel fund that ambassadors can apply to help cover travel costs for events where they will present relevant work. Social media content written by ambassadors will be amplified by the nextflow and nf-core accounts, increasing their reach. Ambassadors will get \"behind the scenes\" access, with insights into running an open-source community, early access to new features, and a great networking experience. The ambassador network will enable members to be kept up-to-date with events and opportunities happening all over the world. To recognize their efforts, ambassadors will receive exclusive swag and apparel, a certificate for their work, and a profile on the ambassador page of our website.\n\n## Meet Our Ambassadors\n\nYou can visit our [Nextflow ambassadors page](https://www.nextflow.io/our_ambassadors.html) to learn more about our first group of ambassadors. You will find their profiles there, highlighting their interests, expertise, and insights they bring to the Nextflow ecosystem.\n\nYou can see snippets about some of our ambassadors below:\n\n#### Priyanka Surana\n\nPriyanka Surana is a Principal Bioinformatician at the Wellcome Sanger Institute, where she oversees the Nextflow development for the Tree of Life program. Over the last almost two years, they have released nine pipelines with nf-core standards and have three more in development. You can learn more about them [here](https://pipelines.tol.sanger.ac.uk/pipelines).\n\nShe’s one of our ambassadors in the UK 🇬🇧 and has already done fantastic outreach work, organizing seminars and bringing many new users to our community! 🤩 In the March Hackathon, she organized a local site with over 70 individuals participating in person, plus over five other events in 2023. The Nextflow community on the Wellcome Genome Campus started in March 2023 with the nf-core hackathon, and now it has grown to over 150 members across 11 different organizations across Cambridge. Currently, they are planning a day-long Nextflow Symposium in December 🤯. 
They do seminars, workshops, coffee meetups, and trainings. In our previous round of the Nextflow and nf-core mentorship, Priyanka mentored Lila, a graduate student in Peru, to build her first Nextflow pipeline using nf-core tools to analyze bacterial metagenomics data. This is the power of a Nextflow ambassador! Not only growing a local community but helping people all over the world to get the best out of Nextflow and nf-core 🥰.\n\n#### Abhinav Sharma\n\nAbhinav is a PhD candidate at Stellenbosch University, South Africa. As a Nextflow Ambassador, Abhinav has been tremendously active in the Global South, supporting young scientists in Africa 🇿🇦🇿🇲, Brazil 🇧🇷, India 🇮🇳 and Australia 🇦🇺 leading to the growth of local communities. He has contributed to the [Nextflow training in Hindi](https://www.youtube.com/playlist?list=PL3xpfTVZLcNikun1FrSvtXW8ic32TciTJ) and played a key role in integrating African bioinformaticians in the Nextflow and nf-core community and initiatives, showcased by the high participation of individuals in African countries who benefited from mentorship during nf-core Hackathons, Training events and prominent workshops like [VEME, 2023](https://twitter.com/abhi18av/status/1695863348162675042). In Australia, Abhinav continues to collaborate with Patricia, a research scientist from Telethon Kids Institute, Perth (whom he mentored during the nf-core mentorship round 2), to organize monthly seminars on [BioWiki](https://github.com/TelethonKids/Nextflow-BioWiki) and bootcamp for local capacity building. In addition, he engages in regular capacity-building sessions in Brazilian institutes such as [Instituto Evandro Chagas](https://www.gov.br/iec/pt-br/assuntos/noticias/curso-contribui-para-criacao-da-rede-norte-nordeste-de-vigilancia-genomica-para-tuberculose-no-iec) (Belém, Brazil) and INI, FIOCRUZ (Rio de Janeiro, Brazil). Last but not least, Abhinav has contributed to the Nextflow community and project in several ways, even to the extent of contributing to the Nextflow code base and plugin ecosystem! 😎\n\n#### Robert Petit\n\nRobert Petit is the Senior Bioinformatics Scientist at the [Wyoming Public Health Laboratory](https://health.wyo.gov/publichealth/lab/) 🦬 and a long-time contributor to the Nextflow community! 🥳 Being a Nextflow Ambassador, Robert has made extensive efforts to grow the Nextflow and nf-core communities, both locally and internationally. Through his work on [Bactopia](https://bactopia.github.io/), a popular and extensive Nextflow pipeline for the analysis of bacterial genomes, Robert has been able to [contribute to nf-core regularly](https://bactopia.github.io/v3.0.0/impact-and-outreach/enhancements/#enhancements-and-fixes). As a Bioconda Core team member, he is always lending a hand when called upon by the Nextflow community, whether it is to add a new recipe or approve a pull request! ⚒️ He has also delivered multiple trainings to the local community in Wyoming, US 🇺🇸, and workshops at conferences, including ASM Microbe. Robert's dedication as a Nextflow Ambassador is best highlighted, and he'll agree, by his active role as a mentor. Robert has acted as a mentor multiple times during virtual nf-core hackathons, and he is the only person to be a mentor in all three rounds of the Nextflow and nf-core mentorship program 😍!\n\nThe Nextflow Ambassador Program is a testament to the power of community-driven innovation, and we invite you to join us in celebrating this exceptional group. 
In the coming weeks and months, you will hear more from our ambassadors as they continue to share their experiences, insights, and expertise with the community as freshly minted Nextflow ambassadors.\n", + "content": "\n\nWe are excited to announce the launch of the Nextflow Ambassador Program, a worldwide initiative designed to foster collaboration, knowledge sharing, and community growth. It is intended to recognize and support the efforts of our community leaders and marks another step forward in our mission to advance scientific research and empower researchers.\n\nNextflow ambassadors will play a vital role in:\n\n- Sharing Knowledge: Ambassadors provide valuable insights and best practices to help users make the most of Nextflow by writing training material and blog posts, giving seminars and workshops, organizing hackathons and meet-ups, and helping with community support.\n- Fostering Collaboration: As knowledgeable members of our community, ambassadors facilitate connections among users and developers, enabling collaboration on community projects, such as nf-core pipelines, sub-workflows, and modules, among other things, in the Nextflow ecosystem.\n- Community Growth: Ambassadors help expand and enrich the Nextflow community, making it more vibrant and supportive. They are local contacts for new community members and engage with potential users in their region and fields of expertise.\n\nAs community members who already actively contribute to outreach, ambassadors will be supported to extend the work they're already doing. For example, many of our ambassadors run local Nextflow training events – to help with this, the program will include “train the trainer” sessions and give access to our content library with slide decks, templates, and more. Ambassadors can also request stickers and financial support for events they organize (e.g., for pizza). Seqera is opening an exclusive travel fund that ambassadors can apply to help cover travel costs for events where they will present relevant work. Social media content written by ambassadors will be amplified by the nextflow and nf-core accounts, increasing their reach. Ambassadors will get \"behind the scenes\" access, with insights into running an open-source community, early access to new features, and a great networking experience. The ambassador network will enable members to be kept up-to-date with events and opportunities happening all over the world. To recognize their efforts, ambassadors will receive exclusive swag and apparel, a certificate for their work, and a profile on the ambassador page of our website.\n\n## Meet Our Ambassadors\n\nYou can visit our [Nextflow ambassadors page](https://www.nextflow.io/our_ambassadors.html) to learn more about our first group of ambassadors. You will find their profiles there, highlighting their interests, expertise, and insights they bring to the Nextflow ecosystem.\n\nYou can see snippets about some of our ambassadors below:\n\n#### Priyanka Surana\n\nPriyanka Surana is a Principal Bioinformatician at the Wellcome Sanger Institute, where she oversees the Nextflow development for the Tree of Life program. Over the last almost two years, they have released nine pipelines with nf-core standards and have three more in development. You can learn more about them [here](https://pipelines.tol.sanger.ac.uk/pipelines).\n\nShe’s one of our ambassadors in the UK 🇬🇧 and has already done fantastic outreach work, organizing seminars and bringing many new users to our community! 
🤩 In the March Hackathon, she organized a local site with over 70 individuals participating in person, plus over five other events in 2023. The Nextflow community on the Wellcome Genome Campus started in March 2023 with the nf-core hackathon, and now it has grown to over 150 members across 11 different organizations across Cambridge. Currently, they are planning a day-long Nextflow Symposium in December 🤯. They do seminars, workshops, coffee meetups, and trainings. In our previous round of the Nextflow and nf-core mentorship, Priyanka mentored Lila, a graduate student in Peru, to build her first Nextflow pipeline using nf-core tools to analyze bacterial metagenomics data. This is the power of a Nextflow ambassador! Not only growing a local community but helping people all over the world to get the best out of Nextflow and nf-core 🥰.\n\n#### Abhinav Sharma\n\nAbhinav is a PhD candidate at Stellenbosch University, South Africa. As a Nextflow Ambassador, Abhinav has been tremendously active in the Global South, supporting young scientists in Africa 🇿🇦🇿🇲, Brazil 🇧🇷, India 🇮🇳 and Australia 🇦🇺 leading to the growth of local communities. He has contributed to the [Nextflow training in Hindi](https://www.youtube.com/playlist?list=PL3xpfTVZLcNikun1FrSvtXW8ic32TciTJ) and played a key role in integrating African bioinformaticians in the Nextflow and nf-core community and initiatives, showcased by the high participation of individuals in African countries who benefited from mentorship during nf-core Hackathons, Training events and prominent workshops like [VEME, 2023](https://twitter.com/abhi18av/status/1695863348162675042). In Australia, Abhinav continues to collaborate with Patricia, a research scientist from Telethon Kids Institute, Perth (whom he mentored during the nf-core mentorship round 2), to organize monthly seminars on [BioWiki](https://github.com/TelethonKids/Nextflow-BioWiki) and bootcamp for local capacity building. In addition, he engages in regular capacity-building sessions in Brazilian institutes such as [Instituto Evandro Chagas](https://www.gov.br/iec/pt-br/assuntos/noticias/curso-contribui-para-criacao-da-rede-norte-nordeste-de-vigilancia-genomica-para-tuberculose-no-iec) (Belém, Brazil) and INI, FIOCRUZ (Rio de Janeiro, Brazil). Last but not least, Abhinav has contributed to the Nextflow community and project in several ways, even to the extent of contributing to the Nextflow code base and plugin ecosystem! 😎\n\n#### Robert Petit\n\nRobert Petit is the Senior Bioinformatics Scientist at the [Wyoming Public Health Laboratory](https://health.wyo.gov/publichealth/lab/) 🦬 and a long-time contributor to the Nextflow community! 🥳 Being a Nextflow Ambassador, Robert has made extensive efforts to grow the Nextflow and nf-core communities, both locally and internationally. Through his work on [Bactopia](https://bactopia.github.io/), a popular and extensive Nextflow pipeline for the analysis of bacterial genomes, Robert has been able to [contribute to nf-core regularly](https://bactopia.github.io/v3.0.0/impact-and-outreach/enhancements/#enhancements-and-fixes). As a Bioconda Core team member, he is always lending a hand when called upon by the Nextflow community, whether it is to add a new recipe or approve a pull request! ⚒️ He has also delivered multiple trainings to the local community in Wyoming, US 🇺🇸, and workshops at conferences, including ASM Microbe. Robert's dedication as a Nextflow Ambassador is best highlighted, and he'll agree, by his active role as a mentor. 
Robert has acted as a mentor multiple times during virtual nf-core hackathons, and he is the only person to be a mentor in all three rounds of the Nextflow and nf-core mentorship program 😍!\n\nThe Nextflow Ambassador Program is a testament to the power of community-driven innovation, and we invite you to join us in celebrating this exceptional group. In the coming weeks and months, you will hear more from our ambassadors as they continue to share their experiences, insights, and expertise with the community as freshly minted Nextflow ambassadors.", "images": [ "/img/ambassadors-hackathon.jpeg" ], @@ -585,7 +585,7 @@ "slug": "2023/learn-nextflow-in-2023", "title": "Learn Nextflow in 2023", "date": "2023-02-24T00:00:00.000Z", - "content": "\nIn 2023, the world of Nextflow is more exciting than ever! With new resources constantly being released, there is no better time to dive into this powerful tool. From a new [Software Carpentries’](https://carpentries-incubator.github.io/workflows-nextflow/index.html) course to [recordings of mutiple nf-core training events](https://nf-co.re/events/training/) to [new tutorials on Wave and Fusion](https://github.com/seqeralabs/wave-showcase), the options for learning Nextflow are endless.\n\nWe've compiled a list of the best resources in 2023 to make your journey to Nextflow mastery as seamless as possible. And remember, Nextflow is a community-driven project. If you have suggestions or want to contribute to this list, head to the [GitHub page](https://github.com/nextflow-io/) and make a pull request.\n\n## Before you start\n\nBefore learning Nextflow, you should be comfortable with the Linux command line and be familiar with some basic scripting languages, such as Perl or Python. The beauty of Nextflow is that task logic can be written in your language of choice. You will just need to learn Nextflow’s domain-specific language (DSL) to control overall flow.\n\nNextflow is widely used in bioinformatics, so many tutorials focus on life sciences. However, Nextflow can be used for almost any data-intensive workflow, including image analysis, ML model training, astronomy, and geoscience applications.\n\nSo, let's get started! These resources will guide you from beginner to expert and make you unstoppable in the field of scientific workflows.\n\n## Contents\n\n- [Why Learn Nextflow](#why-learn-nextflow)\n- [Meet the Tutorials!](#meet-the-tutorials)\n 1. [Basic Nextflow Community Training](#introduction-to-nextflow-by-community)\n 2. [Hands-on Nextflow Community Training](#nextflow-hands-on-by-community)\n 3. [Advanced Nextflow Community Training](#advanced-nextflow-by-community)\n 4. [Software Carpentry workshop](#software-carpentry-workshop)\n 5. [An introduction to Nextflow course by Uppsala University](#intro-nexflow-by-uppsala)\n 6. [Introduction to Nextflow workshop by VIB](#intro-nextflow-by-vib)\n 7. [Nextflow Training by Curtin Institute of Radio Astronomy (CIRA)](#nextflow-training-cira)\n 8. [Managing Pipelines in the Cloud - GenomeWeb Webinar](#managing-pipelines-in-the-cloud-genomeweb-webinar)\n 9. [Nextflow implementation patterns](#nextflow-implementation-patterns)\n 10. [nf-core tutorials](#nf-core-tutorials)\n 11. [Awesome Nextflow](#awesome-nextflow)\n 12. [Wave showcase: Wave and Fusion tutorials](#wave-showcase-wave-and-fusion-tutorials)\n 13. [Building Containers for Scientific Workflows](#building-containers-for-scientific-workflows)\n 14. 
[Best Practices for Deploying Pipelines with Nextflow Tower](#best-practices-for-deploying-pipelines-with-nextflow-tower)\n- [Cloud integration tutorials](#cloud-integration-tutorials)\n 1. [Nextflow and AWS Batch Inside the Integration](#nextflow-and-aws-batch-inside-the-integration)\n 2. [Nextflow and Azure Batch Inside the Integration](#nextflow-and-azure-batch-inside-the-integration)\n 3. [Get started with Nextflow on Google Cloud Batch](#get-started-with-nextflow-on-google-cloud-batch)\n 4. [Nextflow and K8s Rebooted: Running Nextflow on Amazon EKS](#nextflow-and-k8s-rebooted-running-nextflow-on-amazon-eks)\n- [Additional resources](#additional-resources)\n 1. [Nextflow docs](#nextflow-docs)\n 2. [Seqera Labs docs](#seqera-labs-docs)\n 3. [nf-core](#nf-core)\n 4. [Nextflow Tower](#nextflow-tower)\n 5. [Nextflow on AWS](#nextflow-on-aws)\n 6. [Nextflow Data pipelines on Azure Batch](#nextflow-data-pipelines-on-azure-batch)\n 7. [Running Nextflow with Google Life Sciences](#running-nextflow-with-google-life-sciences)\n 8. [Bonus: Nextflow Tutorial - Variant Calling Edition](#bonus-nextflow-tutorial-variant-calling-edition)\n- [Community and support](#community-and-support)\n\n

Why Learn Nextflow

\n\nThere are hundreds of workflow managers to choose from. In fact, Meir Wahnon and several of his colleagues have gone to the trouble of compiling an awesome-workflow-engines list. The workflows community initiative is another excellent source of information about workflow engines.\n\n- Using Nextflow in your analysis workflows helps you implement reproducible pipelines. Nextflow pipelines follow [FAIR guidelines](https://www.go-fair.org/fair-principles/) (findability, accessibility, interoperability, and reuse). Nextflow also supports version control and containers to manage all software dependencies.\n- Nextflow is portable; the same pipeline written on a laptop can quickly scale to run on an HPC cluster, Amazon AWS, Microsoft Azure, Google Cloud Platform, or Kubernetes. With features like [configuration profiles](https://nextflow.io/docs/latest/config.html?#config-profiles), code can be written so that it is 100% portable across different on-prem and cloud infrastructures enabling collaboration and avoiding lock-in.\n- It is massively **scalable**, allowing the parallelization of tasks using the dataflow paradigm without hard-coding pipelines to specific platforms, workload managers, or batch services.\n- Nextflow is **flexible**, supporting scientific workflow requirements like caching processes to avoid redundant computation and workflow reporting to help understand and diagnose workflow execution patterns.\n- It is **growing fast**, and **support is available** from [Seqera Labs](https://seqera.io). The project has been active since 2013 with a vibrant developer community, and the Nextflow ecosystem continues to expand rapidly.\n- Finally, Nextflow is open source and licensed under Apache 2.0. You are free to use it, modify it, and distribute it.\n\n

Meet the Tutorials!

\n\nSome of the best publicly available tutorials are listed below:\n\n

1. Basic Nextflow Community Training

\n\nBasic training for all things Nextflow. Perfect for anyone looking to get to grips with using Nextflow to run analyses and build workflows. This is the primary Nextflow training material used in most Nextflow and nf-core training events. It covers a large number of topics, with both theoretical and hands-on chapters.\n\n[Basic Nextflow Community Training](https://training.nextflow.io/basic_training/)\n\nWe run a free online training event for this course approximately every six months. Videos are streamed to YouTube and questions are handled in the nf-core Slack community. You can watch the recording of the most recent training ([September, 2023](https://nf-co.re/events/2023/training-basic-2023)) in the [YouTube playlist](https://youtu.be/ERbTqLtAkps?si=6xDoDXsb6kGQ_Qa8) below:\n\n
\n \n
\n\n

2. Hands-on Nextflow Community Training

\n\nA \"learn by doing\" tutorial with less focus on theory, instead leading through exercises of slowly increasing complexity. This course is quite short and hands-on, great if you want to practice your Nextflow skills.\n\n[Hands-on Nextflow Community Training](https://training.nextflow.io/hands_on/)\n\nYou can watch the recording of the most recent training ([September, 2023](https://nf-co.re/events/2023/training-hands-on-2023/)) below:\n\n
\n \n
\n\n

3. Advanced Nextflow Community Training

\n\nAn advanced material exploring the advanced features of the Nextflow language and runtime, and how to use them to write efficient and scalable data-intensive workflows. This is the Nextflow training material used in advanced training events.\n\n[Advanced Nextflow Community Training](https://training.nextflow.io/advanced/)\n\nYou can watch the recording of the most recent training ([September, 2023](https://nf-co.re/events/2023/training-sept-2023/)) below:\n\n
\n \n
\n\n

4. Software Carpentry workshop

\n\nThe [Nextflow Software Carpentry](https://carpentries-incubator.github.io/workflows-nextflow/index.html) workshop (still being developed) explains the use of Nextflow and [nf-core](https://nf-co.re/) as development tools for building and sharing reproducible data science workflows. The intended audience is those with little programming experience. The course provides a foundation to write and run Nextflow and nf-core workflows comfortably. Adapted from the Seqera training material above, the workshop has been updated by Software Carpentries instructors within the nf-core community to fit The Carpentries training style. [The Carpentries](https://carpentries.org/) emphasize feedback to improve teaching materials, so we would like to hear back from you about what you thought was well-explained and what needs improvement. Pull requests to the course material are very welcome.\nThe workshop can be opened on Gitpod where you can try the exercises in an online computing environment at your own pace while referencing the course material in another window alongside the tutorials.\n\nThe workshop can be opened on [Gitpod](https://gitpod.io/#https://github.com/carpentries-incubator/workflows-nextflow) where you can try the exercises in an online computing environment at your own pace while referencing the course material in another window alongside the tutorials.\n\nYou can find the course in [The Carpentries incubator](https://carpentries-incubator.github.io/workflows-nextflow/index.html).\n\n

5. An introduction to Nextflow course from Uppsala University

\n\nThis 5-module course by Uppsala University covers the basics of Nextflow, from running Nextflow pipelines, writing your own pipelines and even using containers and conda.\n\nThe course can be viewed [here](https://uppsala.instructure.com/courses/51980/pages/nextflow-1-introduction?module_item_id=328997).\n\n

6. Introduction to Nextflow workshop by VIB

\n\nWorkshop materials by VIB (mainly) in DSL2 aiming to get familiar with the Nextflow syntax by explaining basic concepts and building a simple RNAseq pipeline. Highlights also reproducibility aspects with adding containers (docker & singularity).\n\nThe course can be viewed [here](https://vibbits-nextflow-workshop.readthedocs.io/en/latest/).\n\n

7. Nextflow Training by Curtin Institute of Radio Astronomy (CIRA)

\n\nThis training was prepared for physicists and has examples applied to astronomy which may be interesting for Nextflow users coming from this background!\n\nThe course can be viewed [here](https://carpentries-incubator.github.io/Pipeline_Training_with_Nextflow/).\n\n

8. Managing Pipelines in the Cloud - GenomeWeb Webinar

\n\nThis on-demand webinar features Phil Ewels from SciLifeLab, nf-core (now also Seqera Labs), Brendan Boufler from Amazon Web Services, and Evan Floden from Seqera Labs. The wide-ranging discussion covers the significance of scientific workflows, examples of Nextflow in production settings, and how Nextflow can be integrated with other processes.\n\n[Watch the webinar](https://seqera.io/events/managing-bioinformatics-pipelines-in-the-cloud-to-do-more-science/)\n\n

9. Nextflow implementation patterns

\n\nThis advanced documentation discusses recurring patterns in Nextflow and solutions to many common implementation requirements. Code examples are available with notes to follow along and a GitHub repository.\n\n[Nextflow Patterns](http://nextflow-io.github.io/patterns/index.html) & [GitHub repository](https://github.com/nextflow-io/patterns).\n\n

10. nf-core tutorials

\n\nA set of tutorials covering the basics of using and creating nf-core pipelines developed by the team at [nf-core](https://nf-co.re/). These tutorials provide an overview of the nf-core framework, including:\n\n- How to run nf-core pipelines\n- What are the most commonly used nf-core tools\n- How to make new pipelines using the nf-core template\n- What are nf-core shared modules\n- How to add nf-core shared modules to a pipeline\n- How to make new nf-core modules using the nf-core module template\n- How nf-core pipelines are reviewed and ultimately released\n\n[nf-core usage tutorials](https://nf-co.re/docs/usage/tutorials) and [nf-core developer tutorials](https://nf-co.re/docs/contributing/tutorials).\n\n

11. Awesome Nextflow

\n\nA collection of awesome Nextflow pipelines compiled by various contributors to the open-source Nextflow project.\n\n[Awesome Nextflow](https://github.com/nextflow-io/awesome-nextflow) and GitHub\n\n

12. Wave showcase: Wave and Fusion tutorials

\n\nWave and the Fusion file system are new Nextflow capabilities introduced in November 2022. Wave is a container provisioning and augmentation service fully integrated with the Nextflow ecosystem. Instead of viewing containers as separate artifacts that need to be integrated into a pipeline, Wave allows developers to manage containers as part of the pipeline itself.\n\nTightly coupled with Wave is the new Fusion 2.0 file system. Fusion implements a virtual distributed file system and presents a thin client, allowing data hosted in AWS S3 buckets (and other object stores in the future) to be accessed via the standard POSIX filesystem interface expected by most applications.\n\nWave can help simplify development, improve reliability, and make pipelines easier to maintain. It can even improve pipeline performance. The optional Fusion 2.0 file system offers further advantages, delivering performance on par with FSx for Lustre while enabling organizations to reduce their cloud computing bill and improve pipeline efficiency throughput. See the [blog article](https://seqera.io/blog/breakthrough-performance-and-cost-efficiency-with-the-new-fusion-file-system/) released in February 2023 explaining the Fusion file system and providing benchmarks comparing Fusion to other data handling approaches in the cloud.\n\n[Wave showcase](https://github.com/seqeralabs/wave-showcase) on GitHub\n\n

13. Building Containers for Scientific Workflows

\n\nWhile not strictly a guide about Nextflow, this article provides an overview of scientific containers and provides a tutorial involved in creating your own container and integrating it into a Nextflow pipeline. It also provides some useful tips on troubleshooting containers and publishing them to registries.\n\n[Building Containers for Scientific Workflows](https://seqera.io/blog/building-containers-for-scientific-workflows/)\n\n

14. Best Practices for Deploying Pipelines with Nextflow Tower

\n\nWhen building Nextflow pipelines, a best practice is to supply a nextflow_schema.json file describing pipeline input parameters. The benefit of adding this file to your code repository, is that if the pipeline is launched using Nextflow, the schema enables an easy-to-use web interface that users through the process of parameter selection. While it is possible to craft this file by hand, the nf-core community provides a handy schema build tool. This step-by-step guide explains how to adapt your pipeline for use with Nextflow Tower by using the schema build tool to automatically generate the nextflow_schema.json file.\n\n[Best Practices for Deploying Pipelines with Nextflow Tower](https://seqera.io/blog/best-practices-for-deploying-pipelines-with-nextflow-tower/)\n\n

Cloud integration tutorials

\n\nIn addition to the learning resources above, several step-by-step integration guides explain how to run Nextflow pipelines on your cloud platform of choice. Some of these tutorials extend to the use of [Nextflow Tower](https://cloud.tower.nf/). Organizations can use the Tower Cloud Free edition to launch pipelines quickly in the cloud. Organizations can optionally use Tower Cloud Professional or run self-hosted or on-premises Tower Enterprise environments as requirements grow. This year, we added Google Cloud Batch to the cloud services supported by Nextflow.\n\n

1. Nextflow and AWS Batch — Inside the Integration

\n\nThis three-part series of articles provides a step-by-step guide explaining how to use Nextflow with AWS Batch. The [first of three articles](https://seqera.io/blog/nextflow-and-aws-batch-inside-the-integration-part-1-of-3/) covers AWS Batch concepts, the Nextflow execution model, and explains how the integration works under the covers. The [second article](https://seqera.io/blog/nextflow-and-aws-batch-inside-the-integration-part-2-of-3/) in the series provides a step-by-step guide explaining how to set up the AWS batch environment and how to run and troubleshoot open-source Nextflow pipelines. The [third article](https://seqera.io/blog/nextflow-and-aws-batch-using-tower-part-3-of-3/) builds on what you've learned, explaining how to integrate workflows with Nextflow Tower and share the AWS Batch environment with other users by \"publishing\" your workflows to the cloud.\n\nNextflow and AWS Batch — Inside the Integration ([part 1 of 3](https://seqera.io/blog/nextflow-and-aws-batch-inside-the-integration-part-1-of-3/), [part 2 of 3](https://seqera.io/blog/nextflow-and-aws-batch-inside-the-integration-part-2-of-3/), [part 3 of 3](https://seqera.io/blog/nextflow-and-aws-batch-using-tower-part-3-of-3/))\n\n

2. Nextflow and Azure Batch — Inside the Integration

\n\nSimilar to the tutorial above, this set of articles does a deep dive into the Nextflow Azure Batch integration. [Part 1](https://seqera.io/blog/nextflow-and-azure-batch-part-1-of-2/) covers Azure Batch and essential concepts, provides an overview of the integration, and explains how to set up Azure Batch and Storage accounts. It also covers deploying a machine instance in the Azure cloud and configuring it to run Nextflow pipelines against the Azure Batch service.\n\n[Part 2](https://seqera.io/blog/nextflow-and-azure-batch-working-with-tower-part-2-of-2/) builds on what you learned in part 1 and shows how to use Azure Batch from within Nextflow Tower Cloud. It provides a walkthrough of how to make the environment set up in part 1 accessible to users through Tower's intuitive web interface.\n\nNextflow and Azure Batch — Inside the Integration ([part 1 of 2](https://seqera.io/blog/nextflow-and-azure-batch-part-1-of-2/), [part 2 of 2](https://seqera.io/blog/nextflow-and-azure-batch-working-with-tower-part-2-of-2/))\n\n

3. Get started with Nextflow on Google Cloud Batch

\n\nThis excellent article by Marcel Ribeiro-Dantas provides a step-by-step tutorial on using Nextflow with Google’s new Google Cloud Batch service. Google Cloud Batch is expected to replace the Google Life Sciences integration over time. The article explains how to deploy the Google Cloud Batch and Storage environments in GCP using the gcloud CLI. It then goes on to explain how to configure Nextflow to launch pipelines into the newly created Google Cloud Batch environment.\n\n[Get started with Nextflow on Google Cloud Batch](https://nextflow.io/blog/2023/nextflow-with-gbatch.html)\n\n

4. Nextflow and K8s Rebooted: Running Nextflow on Amazon EKS

\n\nWhile not commonly used for HPC workloads, Kubernetes has clear momentum. In this educational article, Ben Sherman provides an overview of how the Nextflow / Kubernetes integration has been simplified by avoiding the requirement for Persistent Volumes (PVs) and Persistent Volume Claims (PVCs). This detailed guide provides step-by-step instructions for using Amazon EKS as a compute environment complete with how to configure IAM Roles for Kubernetes Services Accounts (IRSA), now an Amazon EKS best practice.\n\n[Nextflow and K8s Rebooted: Running Nextflow on Amazon EKS](https://seqera.io/blog/deploying-nextflow-on-amazon-eks/)\n\n

Additional resources

\n\nThe following resources will help you dig deeper into Nextflow and other related projects like the nf-core community which maintain curated pipelines and a very active Slack channel. There are plenty of Nextflow tutorials and videos online, and the following list is no way exhaustive. Please let us know if we are missing anything.\n\n

1. Nextflow docs

\n\nThe reference for the Nextflow language and runtime. These docs should be your first point of reference while developing Nextflow pipelines. The newest features are documented in edge documentation pages released every month, with the latest stable releases every three months.\n\nLatest [stable](https://www.nextflow.io/docs/latest/index.html) & [edge](https://www.nextflow.io/docs/edge/index.html) documentation.\n\n

2. Seqera Labs docs

\n\nAn index of documentation, deployment guides, training materials, and resources for all things Nextflow and Tower.\n\n[Seqera Labs docs](https://seqera.io/docs/)\n\n

3. nf-core

\n\nnf-core is a growing community of Nextflow users and developers. You can find curated sets of biomedical analysis pipelines written in Nextflow and built by domain experts. Each pipeline is stringently reviewed and has been implemented according to best practice guidelines. Be sure to sign up for the Slack channel.\n\n[nf-core website](https://nf-co.re/) and [nf-core Slack](https://nf-co.re/join)\n\n

4. Nextflow Tower

\n\nNextflow Tower is a platform to easily monitor, launch, and scale Nextflow pipelines on cloud providers and on-premise infrastructure. The documentation provides details on setting up compute environments, monitoring pipelines, and launching using either the web graphic interface, CLI, or API.\n\n[Nextflow Tower](https://tower.nf/) and [user documentation](http://help.tower.nf/).\n\n

5. Nextflow on AWS

\n\nPart of the Genomics Workflows on AWS, Amazon provides a quickstart for deploying a genomics analysis environment on Amazon Web Services (AWS) cloud, using Nextflow to create and orchestrate analysis workflows and AWS Batch to run the workflow processes. While this article is packed with good information, the procedure outlined in the more recent [Nextflow and AWS Batch – Inside the integration](https://seqera.io/blog/nextflow-and-aws-batch-inside-the-integration-part-1-of-3/) series, may be an easier place to start. Some of the steps that previously needed to be performed manually have been updated in the latest integration.\n\n[Nextflow on AWS Batch](https://docs.opendata.aws/genomics-workflows/orchestration/nextflow/nextflow-overview.html)\n\n

6. Nextflow Data Pipelines on Azure Batch

\n\nNextflow on Azure requires at minimum two Azure services, Azure Batch and Azure Storage. Follow the guide below developed by the team at Microsoft to set up both services on Azure, and to get your storage and batch account names and keys.\n\n[Azure Blog](https://techcommunity.microsoft.com/t5/azure-compute-blog/running-nextflow-data-pipelines-on-azure-batch/ba-p/2150383) and [GitHub repository](https://github.com/microsoft/Genomics-Quickstart/blob/main/03-Nextflow-Azure/README.md).\n\n

7. Running Nextflow with Google Life Sciences

\n\nA step-by-step guide to launching Nextflow Pipelines in Google Cloud. Note that this integration process is specific to Google Life Sciences – an offering that pre-dates Google Cloud Batch. If you want to use the newer integration approach, you can also check out the Nextflow blog article [Get started with Nextflow on Google Cloud Batch](https://nextflow.io/blog/2023/nextflow-with-gbatch.html).\n\n[Nextflow on Google Cloud](https://cloud.google.com/life-sciences/docs/tutorials/nextflow]\n\n

8. Bonus: Nextflow Tutorial - Variant Calling Edition

\n\nThis [Nextflow Tutorial - Variant Calling Edition](https://sateeshperi.github.io/nextflow_varcal/nextflow/) has been adapted from the [Nextflow Software Carpentry training material](https://carpentries-incubator.github.io/workflows-nextflow/index.html) and [Data Carpentry: Wrangling Genomics Lesson](https://datacarpentry.org/wrangling-genomics/). Learners will have the chance to learn Nextflow and nf-core basics, to convert a variant-calling bash script into a Nextflow workflow, and modularize the pipeline using DSL2 modules and sub-workflows.\n\nThe workshop can be opened on [Gitpod](https://gitpod.io/#https://github.com/sateeshperi/nextflow_tutorial.git), where you can try the exercises in an online computing environment at your own pace, with the course material in another window alongside.\n\nYou can find the course in [Nextflow Tutorial - Variant Calling Edition](https://sateeshperi.github.io/nextflow_varcal/nextflow/).\n\n

Community and support

\n\n- [Seqera Community Forum](https://community.seqera.io)\n- Nextflow Twitter [@nextflowio](https://twitter.com/nextflowio?lang=en)\n- [Nextflow Slack](https://www.nextflow.io/slack-invite.html)\n- [nf-core Slack](https://nfcore.slack.com/)\n- [Seqera Labs](https://www.seqera.io/) and [Nextflow Tower](https://tower.nf/)\n- [Nextflow patterns](https://github.com/nextflow-io/patterns)\n- [Nextflow Snippets](https://github.com/mribeirodantas/NextflowSnippets)\n", + "content": "In 2023, the world of Nextflow is more exciting than ever! With new resources constantly being released, there is no better time to dive into this powerful tool. From a new [Software Carpentries’](https://carpentries-incubator.github.io/workflows-nextflow/index.html) course to [recordings of mutiple nf-core training events](https://nf-co.re/events/training/) to [new tutorials on Wave and Fusion](https://github.com/seqeralabs/wave-showcase), the options for learning Nextflow are endless.\n\nWe've compiled a list of the best resources in 2023 to make your journey to Nextflow mastery as seamless as possible. And remember, Nextflow is a community-driven project. If you have suggestions or want to contribute to this list, head to the [GitHub page](https://github.com/nextflow-io/) and make a pull request.\n\n## Before you start\n\nBefore learning Nextflow, you should be comfortable with the Linux command line and be familiar with some basic scripting languages, such as Perl or Python. The beauty of Nextflow is that task logic can be written in your language of choice. You will just need to learn Nextflow’s domain-specific language (DSL) to control overall flow.\n\nNextflow is widely used in bioinformatics, so many tutorials focus on life sciences. However, Nextflow can be used for almost any data-intensive workflow, including image analysis, ML model training, astronomy, and geoscience applications.\n\nSo, let's get started! These resources will guide you from beginner to expert and make you unstoppable in the field of scientific workflows.\n\n## Contents\n\n- [Why Learn Nextflow](#why-learn-nextflow)\n- [Meet the Tutorials!](#meet-the-tutorials)\n 1. [Basic Nextflow Community Training](#introduction-to-nextflow-by-community)\n 2. [Hands-on Nextflow Community Training](#nextflow-hands-on-by-community)\n 3. [Advanced Nextflow Community Training](#advanced-nextflow-by-community)\n 4. [Software Carpentry workshop](#software-carpentry-workshop)\n 5. [An introduction to Nextflow course by Uppsala University](#intro-nexflow-by-uppsala)\n 6. [Introduction to Nextflow workshop by VIB](#intro-nextflow-by-vib)\n 7. [Nextflow Training by Curtin Institute of Radio Astronomy (CIRA)](#nextflow-training-cira)\n 8. [Managing Pipelines in the Cloud - GenomeWeb Webinar](#managing-pipelines-in-the-cloud-genomeweb-webinar)\n 9. [Nextflow implementation patterns](#nextflow-implementation-patterns)\n 10. [nf-core tutorials](#nf-core-tutorials)\n 11. [Awesome Nextflow](#awesome-nextflow)\n 12. [Wave showcase: Wave and Fusion tutorials](#wave-showcase-wave-and-fusion-tutorials)\n 13. [Building Containers for Scientific Workflows](#building-containers-for-scientific-workflows)\n 14. [Best Practices for Deploying Pipelines with Nextflow Tower](#best-practices-for-deploying-pipelines-with-nextflow-tower)\n- [Cloud integration tutorials](#cloud-integration-tutorials)\n 1. [Nextflow and AWS Batch Inside the Integration](#nextflow-and-aws-batch-inside-the-integration)\n 2. 
[Nextflow and Azure Batch Inside the Integration](#nextflow-and-azure-batch-inside-the-integration)\n 3. [Get started with Nextflow on Google Cloud Batch](#get-started-with-nextflow-on-google-cloud-batch)\n 4. [Nextflow and K8s Rebooted: Running Nextflow on Amazon EKS](#nextflow-and-k8s-rebooted-running-nextflow-on-amazon-eks)\n- [Additional resources](#additional-resources)\n 1. [Nextflow docs](#nextflow-docs)\n 2. [Seqera Labs docs](#seqera-labs-docs)\n 3. [nf-core](#nf-core)\n 4. [Nextflow Tower](#nextflow-tower)\n 5. [Nextflow on AWS](#nextflow-on-aws)\n 6. [Nextflow Data pipelines on Azure Batch](#nextflow-data-pipelines-on-azure-batch)\n 7. [Running Nextflow with Google Life Sciences](#running-nextflow-with-google-life-sciences)\n 8. [Bonus: Nextflow Tutorial - Variant Calling Edition](#bonus-nextflow-tutorial-variant-calling-edition)\n- [Community and support](#community-and-support)\n\n

Why Learn Nextflow

\n\nThere are hundreds of workflow managers to choose from. In fact, Meir Wahnon and several of his colleagues have gone to the trouble of compiling an awesome-workflow-engines list. The workflows community initiative is another excellent source of information about workflow engines.\n\n- Using Nextflow in your analysis workflows helps you implement reproducible pipelines. Nextflow pipelines follow [FAIR guidelines](https://www.go-fair.org/fair-principles/) (findability, accessibility, interoperability, and reuse). Nextflow also supports version control and containers to manage all software dependencies.\n- Nextflow is portable; the same pipeline written on a laptop can quickly scale to run on an HPC cluster, Amazon AWS, Microsoft Azure, Google Cloud Platform, or Kubernetes. With features like [configuration profiles](https://nextflow.io/docs/latest/config.html?#config-profiles), code can be written so that it is 100% portable across different on-prem and cloud infrastructures enabling collaboration and avoiding lock-in.\n- It is massively **scalable**, allowing the parallelization of tasks using the dataflow paradigm without hard-coding pipelines to specific platforms, workload managers, or batch services.\n- Nextflow is **flexible**, supporting scientific workflow requirements like caching processes to avoid redundant computation and workflow reporting to help understand and diagnose workflow execution patterns.\n- It is **growing fast**, and **support is available** from [Seqera Labs](https://seqera.io). The project has been active since 2013 with a vibrant developer community, and the Nextflow ecosystem continues to expand rapidly.\n- Finally, Nextflow is open source and licensed under Apache 2.0. You are free to use it, modify it, and distribute it.\n\n

Meet the Tutorials!

\n\nSome of the best publicly available tutorials are listed below:\n\n

1. Basic Nextflow Community Training

\n\nBasic training for all things Nextflow. Perfect for anyone looking to get to grips with using Nextflow to run analyses and build workflows. This is the primary Nextflow training material used in most Nextflow and nf-core training events. It covers a large number of topics, with both theoretical and hands-on chapters.\n\n[Basic Nextflow Community Training](https://training.nextflow.io/basic_training/)\n\nWe run a free online training event for this course approximately every six months. Videos are streamed to YouTube and questions are handled in the nf-core Slack community. You can watch the recording of the most recent training ([September, 2023](https://nf-co.re/events/2023/training-basic-2023)) in the [YouTube playlist](https://youtu.be/ERbTqLtAkps?si=6xDoDXsb6kGQ_Qa8) below:\n\n
\n \n
\n\n

2. Hands-on Nextflow Community Training

\n\nA \"learn by doing\" tutorial with less focus on theory, instead leading through exercises of slowly increasing complexity. This course is quite short and hands-on, great if you want to practice your Nextflow skills.\n\n[Hands-on Nextflow Community Training](https://training.nextflow.io/hands_on/)\n\nYou can watch the recording of the most recent training ([September, 2023](https://nf-co.re/events/2023/training-hands-on-2023/)) below:\n\n
\n \n
\n\n

3. Advanced Nextflow Community Training

\n\nTraining material exploring the advanced features of the Nextflow language and runtime, and how to use them to write efficient and scalable data-intensive workflows. This is the material used in advanced Nextflow training events.\n\n[Advanced Nextflow Community Training](https://training.nextflow.io/advanced/)\n\nYou can watch the recording of the most recent training ([September, 2023](https://nf-co.re/events/2023/training-sept-2023/)) below:\n\n
\n \n
\n\n

4. Software Carpentry workshop

\n\nThe [Nextflow Software Carpentry](https://carpentries-incubator.github.io/workflows-nextflow/index.html) workshop (still being developed) explains the use of Nextflow and [nf-core](https://nf-co.re/) as development tools for building and sharing reproducible data science workflows. The intended audience is those with little programming experience. The course provides a foundation to write and run Nextflow and nf-core workflows comfortably. Adapted from the Seqera training material above, the workshop has been updated by Software Carpentries instructors within the nf-core community to fit The Carpentries training style. [The Carpentries](https://carpentries.org/) emphasize feedback to improve teaching materials, so we would like to hear back from you about what you thought was well-explained and what needs improvement. Pull requests to the course material are very welcome.\n\nThe workshop can be opened on [Gitpod](https://gitpod.io/#https://github.com/carpentries-incubator/workflows-nextflow), where you can try the exercises in an online computing environment at your own pace while referencing the course material in another window alongside the tutorials.\n\nYou can find the course in [The Carpentries incubator](https://carpentries-incubator.github.io/workflows-nextflow/index.html).\n\n

5. An introduction to Nextflow course from Uppsala University

\n\nThis 5-module course by Uppsala University covers the basics of Nextflow, from running existing Nextflow pipelines and writing your own, to using containers and conda.\n\nThe course can be viewed [here](https://uppsala.instructure.com/courses/51980/pages/nextflow-1-introduction?module_item_id=328997).\n\n

6. Introduction to Nextflow workshop by VIB

\n\nWorkshop materials by VIB, written (mainly) in DSL2, that aim to get you familiar with the Nextflow syntax by explaining basic concepts and building a simple RNAseq pipeline. The workshop also highlights reproducibility aspects by adding containers (Docker & Singularity).\n\nThe course can be viewed [here](https://vibbits-nextflow-workshop.readthedocs.io/en/latest/).\n\n

7. Nextflow Training by Curtin Institute of Radio Astronomy (CIRA)

\n\nThis training was prepared for physicists and has examples applied to astronomy which may be interesting for Nextflow users coming from this background!\n\nThe course can be viewed [here](https://carpentries-incubator.github.io/Pipeline_Training_with_Nextflow/).\n\n

8. Managing Pipelines in the Cloud - GenomeWeb Webinar

\n\nThis on-demand webinar features Phil Ewels from SciLifeLab, nf-core (now also Seqera Labs), Brendan Boufler from Amazon Web Services, and Evan Floden from Seqera Labs. The wide-ranging discussion covers the significance of scientific workflows, examples of Nextflow in production settings, and how Nextflow can be integrated with other processes.\n\n[Watch the webinar](https://seqera.io/events/managing-bioinformatics-pipelines-in-the-cloud-to-do-more-science/)\n\n

9. Nextflow implementation patterns

\n\nThis advanced documentation discusses recurring patterns in Nextflow and solutions to many common implementation requirements. Code examples are available with notes to follow along and a GitHub repository.\n\n[Nextflow Patterns](http://nextflow-io.github.io/patterns/index.html) & [GitHub repository](https://github.com/nextflow-io/patterns).\n\n

10. nf-core tutorials

\n\nA set of tutorials developed by the team at [nf-core](https://nf-co.re/), covering the basics of using and creating nf-core pipelines. These tutorials provide an overview of the nf-core framework, including:\n\n- How to run nf-core pipelines\n- What are the most commonly used nf-core tools\n- How to make new pipelines using the nf-core template\n- What are nf-core shared modules\n- How to add nf-core shared modules to a pipeline\n- How to make new nf-core modules using the nf-core module template\n- How nf-core pipelines are reviewed and ultimately released\n\n[nf-core usage tutorials](https://nf-co.re/docs/usage/tutorials) and [nf-core developer tutorials](https://nf-co.re/docs/contributing/tutorials).\n\n

11. Awesome Nextflow

\n\nA collection of awesome Nextflow pipelines compiled by various contributors to the open-source Nextflow project.\n\n[Awesome Nextflow](https://github.com/nextflow-io/awesome-nextflow) on GitHub\n\n

12. Wave showcase: Wave and Fusion tutorials

\n\nWave and the Fusion file system are new Nextflow capabilities introduced in November 2022. Wave is a container provisioning and augmentation service fully integrated with the Nextflow ecosystem. Instead of viewing containers as separate artifacts that need to be integrated into a pipeline, Wave allows developers to manage containers as part of the pipeline itself.\n\nTightly coupled with Wave is the new Fusion 2.0 file system. Fusion implements a virtual distributed file system and presents a thin client, allowing data hosted in AWS S3 buckets (and other object stores in the future) to be accessed via the standard POSIX filesystem interface expected by most applications.\n\nWave can help simplify development, improve reliability, and make pipelines easier to maintain. It can even improve pipeline performance. The optional Fusion 2.0 file system offers further advantages, delivering performance on par with FSx for Lustre while enabling organizations to reduce their cloud computing bill and improve pipeline efficiency and throughput. See the [blog article](https://seqera.io/blog/breakthrough-performance-and-cost-efficiency-with-the-new-fusion-file-system/) released in February 2023 explaining the Fusion file system and providing benchmarks comparing Fusion to other data handling approaches in the cloud.\n\n[Wave showcase](https://github.com/seqeralabs/wave-showcase) on GitHub\n\n
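To give a feel for what this looks like in practice, the sketch below is ours rather than part of the showcase, and the bucket name is a placeholder; enabling Wave and Fusion mostly comes down to a few configuration options:\n\n```groovy\n// nextflow.config (illustrative sketch, placeholder values)\nwave.enabled   = true                    // provision task containers through the Wave service\nfusion.enabled = true                    // mount the work directory through the Fusion file system\nworkDir        = 's3://my-bucket/work'   // Fusion expects an object-storage work directory\n```\n\n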

13. Building Containers for Scientific Workflows

\n\nWhile not strictly a guide about Nextflow, this article provides an overview of scientific containers and a tutorial on creating your own container and integrating it into a Nextflow pipeline. It also provides some useful tips on troubleshooting containers and publishing them to registries.\n\n[Building Containers for Scientific Workflows](https://seqera.io/blog/building-containers-for-scientific-workflows/)\n\n

14. Best Practices for Deploying Pipelines with Nextflow Tower

\n\nWhen building Nextflow pipelines, a best practice is to supply a nextflow_schema.json file describing pipeline input parameters. The benefit of adding this file to your code repository is that, if the pipeline is launched using Nextflow Tower, the schema enables an easy-to-use web interface that guides users through the process of parameter selection. While it is possible to craft this file by hand, the nf-core community provides a handy schema build tool. This step-by-step guide explains how to adapt your pipeline for use with Nextflow Tower by using the schema build tool to automatically generate the nextflow_schema.json file.\n\n[Best Practices for Deploying Pipelines with Nextflow Tower](https://seqera.io/blog/best-practices-for-deploying-pipelines-with-nextflow-tower/)\n\n
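To give a flavour of what such a schema contains, here is a heavily simplified, hand-written fragment; real schemas produced by the schema build tool are more structured, and the parameter names below are only examples:\n\n```json\n{\n  \"$schema\": \"http://json-schema.org/draft-07/schema\",\n  \"title\": \"my-pipeline parameters\",\n  \"type\": \"object\",\n  \"properties\": {\n    \"input\": {\n      \"type\": \"string\",\n      \"description\": \"Path to the input samplesheet\"\n    },\n    \"outdir\": {\n      \"type\": \"string\",\n      \"default\": \"results\",\n      \"description\": \"Directory where results are published\"\n    }\n  }\n}\n```\n\n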

Cloud integration tutorials

\n\nIn addition to the learning resources above, several step-by-step integration guides explain how to run Nextflow pipelines on your cloud platform of choice. Some of these tutorials extend to the use of [Nextflow Tower](https://cloud.tower.nf/). Organizations can use the Tower Cloud Free edition to launch pipelines quickly in the cloud. Organizations can optionally use Tower Cloud Professional or run self-hosted or on-premises Tower Enterprise environments as requirements grow. This year, we added Google Cloud Batch to the cloud services supported by Nextflow.\n\n

1. Nextflow and AWS Batch — Inside the Integration

\n\nThis three-part series of articles provides a step-by-step guide explaining how to use Nextflow with AWS Batch. The [first of three articles](https://seqera.io/blog/nextflow-and-aws-batch-inside-the-integration-part-1-of-3/) covers AWS Batch concepts, the Nextflow execution model, and explains how the integration works under the covers. The [second article](https://seqera.io/blog/nextflow-and-aws-batch-inside-the-integration-part-2-of-3/) in the series provides a step-by-step guide explaining how to set up the AWS batch environment and how to run and troubleshoot open-source Nextflow pipelines. The [third article](https://seqera.io/blog/nextflow-and-aws-batch-using-tower-part-3-of-3/) builds on what you've learned, explaining how to integrate workflows with Nextflow Tower and share the AWS Batch environment with other users by \"publishing\" your workflows to the cloud.\n\nNextflow and AWS Batch — Inside the Integration ([part 1 of 3](https://seqera.io/blog/nextflow-and-aws-batch-inside-the-integration-part-1-of-3/), [part 2 of 3](https://seqera.io/blog/nextflow-and-aws-batch-inside-the-integration-part-2-of-3/), [part 3 of 3](https://seqera.io/blog/nextflow-and-aws-batch-using-tower-part-3-of-3/))\n\n
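For orientation, the end state of that setup is a small piece of Nextflow configuration. The sketch below is illustrative only (the queue, region, container, and bucket are placeholders), and the articles walk through creating the actual AWS resources:\n\n```groovy\n// nextflow.config (illustrative AWS Batch sketch, placeholder values)\nprocess.executor  = 'awsbatch'            // dispatch each task as an AWS Batch job\nprocess.queue     = 'my-batch-queue'      // AWS Batch job queue to submit to\nprocess.container = 'ubuntu:22.04'        // AWS Batch tasks must run in a container\naws.region        = 'eu-west-1'\nworkDir           = 's3://my-bucket/work' // shared work directory on S3\n```\n\n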

2. Nextflow and Azure Batch — Inside the Integration

\n\nSimilar to the tutorial above, this set of articles does a deep dive into the Nextflow Azure Batch integration. [Part 1](https://seqera.io/blog/nextflow-and-azure-batch-part-1-of-2/) covers Azure Batch and essential concepts, provides an overview of the integration, and explains how to set up Azure Batch and Storage accounts. It also covers deploying a machine instance in the Azure cloud and configuring it to run Nextflow pipelines against the Azure Batch service.\n\n[Part 2](https://seqera.io/blog/nextflow-and-azure-batch-working-with-tower-part-2-of-2/) builds on what you learned in part 1 and shows how to use Azure Batch from within Nextflow Tower Cloud. It provides a walkthrough of how to make the environment set up in part 1 accessible to users through Tower's intuitive web interface.\n\nNextflow and Azure Batch — Inside the Integration ([part 1 of 2](https://seqera.io/blog/nextflow-and-azure-batch-part-1-of-2/), [part 2 of 2](https://seqera.io/blog/nextflow-and-azure-batch-working-with-tower-part-2-of-2/))\n\n
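As with AWS Batch, the finished configuration is compact. The sketch below is illustrative only; the account names, keys, location, and blob container are placeholders for the resources created in part 1:\n\n```groovy\n// nextflow.config (illustrative Azure Batch sketch, placeholder values)\nprocess.executor = 'azurebatch'\nworkDir          = 'az://my-container/work'   // work directory in Azure Blob storage\n\nazure {\n    storage {\n        accountName = 'mystorageaccount'\n        accountKey  = '<storage-account-key>'\n    }\n    batch {\n        accountName  = 'mybatchaccount'\n        accountKey   = '<batch-account-key>'\n        location     = 'westeurope'\n        autoPoolMode = true                   // let Nextflow create the compute pool for you\n    }\n}\n```\n\n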

3. Get started with Nextflow on Google Cloud Batch

\n\nThis excellent article by Marcel Ribeiro-Dantas provides a step-by-step tutorial on using Nextflow with Google’s new Google Cloud Batch service. Google Cloud Batch is expected to replace the Google Life Sciences integration over time. The article explains how to deploy the Google Cloud Batch and Storage environments in GCP using the gcloud CLI. It then goes on to explain how to configure Nextflow to launch pipelines into the newly created Google Cloud Batch environment.\n\n[Get started with Nextflow on Google Cloud Batch](https://nextflow.io/blog/2023/nextflow-with-gbatch.html)\n\n
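For orientation, switching a pipeline to Google Cloud Batch comes down to a few configuration options. The sketch below is illustrative; the project, location, and bucket are placeholders, and the article covers setting up the corresponding resources with the gcloud CLI:\n\n```groovy\n// nextflow.config (illustrative Google Cloud Batch sketch, placeholder values)\nprocess.executor = 'google-batch'\ngoogle.project   = 'my-gcp-project'\ngoogle.location  = 'europe-west2'\nworkDir          = 'gs://my-bucket/work'   // shared work directory in Cloud Storage\n```\n\n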

4. Nextflow and K8s Rebooted: Running Nextflow on Amazon EKS

\n\nWhile not commonly used for HPC workloads, Kubernetes has clear momentum. In this educational article, Ben Sherman provides an overview of how the Nextflow / Kubernetes integration has been simplified by avoiding the requirement for Persistent Volumes (PVs) and Persistent Volume Claims (PVCs). This detailed guide provides step-by-step instructions for using Amazon EKS as a compute environment, including how to configure IAM Roles for Kubernetes Service Accounts (IRSA), now an Amazon EKS best practice.\n\n[Nextflow and K8s Rebooted: Running Nextflow on Amazon EKS](https://seqera.io/blog/deploying-nextflow-on-amazon-eks/)\n\n
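To make the point about avoiding persistent volumes concrete, here is an illustrative configuration of our own (not taken from the article) in which the k8s executor is paired with Wave and Fusion so that the work directory lives in S3 rather than in a PVC; the namespace, service account, and bucket are placeholders:\n\n```groovy\n// nextflow.config (illustrative EKS sketch without persistent volumes, placeholder values)\nprocess.executor   = 'k8s'\nk8s.namespace      = 'nextflow'\nk8s.serviceAccount = 'nextflow-sa'        // a service account bound to the required IAM role (IRSA)\nwave.enabled       = true\nfusion.enabled     = true                 // tasks access the S3 work directory through Fusion\nworkDir            = 's3://my-bucket/work'\n```\n\n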

Additional resources

\n\nThe following resources will help you dig deeper into Nextflow and other related projects like the nf-core community, which maintains curated pipelines and a very active Slack channel. There are plenty of Nextflow tutorials and videos online, and the following list is by no means exhaustive. Please let us know if we are missing anything.\n\n

1. Nextflow docs

\n\nThe reference for the Nextflow language and runtime. These docs should be your first point of reference while developing Nextflow pipelines. The newest features are documented in edge documentation pages released every month, with the latest stable releases every three months.\n\nLatest [stable](https://www.nextflow.io/docs/latest/index.html) & [edge](https://www.nextflow.io/docs/edge/index.html) documentation.\n\n

2. Seqera Labs docs

\n\nAn index of documentation, deployment guides, training materials, and resources for all things Nextflow and Tower.\n\n[Seqera Labs docs](https://seqera.io/docs/)\n\n

3. nf-core

\n\nnf-core is a growing community of Nextflow users and developers. You can find curated sets of biomedical analysis pipelines written in Nextflow and built by domain experts. Each pipeline is stringently reviewed and has been implemented according to best practice guidelines. Be sure to sign up for the Slack channel.\n\n[nf-core website](https://nf-co.re/) and [nf-core Slack](https://nf-co.re/join)\n\n

4. Nextflow Tower

\n\nNextflow Tower is a platform to easily monitor, launch, and scale Nextflow pipelines on cloud providers and on-premises infrastructure. The documentation provides details on setting up compute environments, monitoring pipelines, and launching using either the graphical web interface, CLI, or API.\n\n[Nextflow Tower](https://tower.nf/) and [user documentation](http://help.tower.nf/).\n\n
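If you simply want existing runs to show up in Tower before exploring pipeline launching, enabling run monitoring takes a couple of configuration options (or the -with-tower command-line option). The sketch below is illustrative, and the token is a placeholder for a personal access token created in Tower:\n\n```groovy\n// nextflow.config (illustrative sketch for sending run telemetry to Tower)\ntower {\n    enabled     = true\n    accessToken = '<your-personal-access-token>'\n}\n```\n\n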

5. Nextflow on AWS

\n\nAs part of the Genomics Workflows on AWS, Amazon provides a quickstart for deploying a genomics analysis environment on the Amazon Web Services (AWS) cloud, using Nextflow to create and orchestrate analysis workflows and AWS Batch to run the workflow processes. While this article is packed with good information, the procedure outlined in the more recent [Nextflow and AWS Batch – Inside the integration](https://seqera.io/blog/nextflow-and-aws-batch-inside-the-integration-part-1-of-3/) series may be an easier place to start. Some of the steps that previously needed to be performed manually have been updated in the latest integration.\n\n[Nextflow on AWS Batch](https://docs.opendata.aws/genomics-workflows/orchestration/nextflow/nextflow-overview.html)\n\n

6. Nextflow Data Pipelines on Azure Batch

\n\nNextflow on Azure requires at minimum two Azure services, Azure Batch and Azure Storage. Follow the guide below developed by the team at Microsoft to set up both services on Azure, and to get your storage and batch account names and keys.\n\n[Azure Blog](https://techcommunity.microsoft.com/t5/azure-compute-blog/running-nextflow-data-pipelines-on-azure-batch/ba-p/2150383) and [GitHub repository](https://github.com/microsoft/Genomics-Quickstart/blob/main/03-Nextflow-Azure/README.md).\n\n

7. Running Nextflow with Google Life Sciences

\n\nA step-by-step guide to launching Nextflow Pipelines in Google Cloud. Note that this integration process is specific to Google Life Sciences – an offering that pre-dates Google Cloud Batch. If you want to use the newer integration approach, you can also check out the Nextflow blog article [Get started with Nextflow on Google Cloud Batch](https://nextflow.io/blog/2023/nextflow-with-gbatch.html).\n\n[Nextflow on Google Cloud](https://cloud.google.com/life-sciences/docs/tutorials/nextflow)\n\n

8. Bonus: Nextflow Tutorial - Variant Calling Edition

\n\nThis [Nextflow Tutorial - Variant Calling Edition](https://sateeshperi.github.io/nextflow_varcal/nextflow/) has been adapted from the [Nextflow Software Carpentry training material](https://carpentries-incubator.github.io/workflows-nextflow/index.html) and [Data Carpentry: Wrangling Genomics Lesson](https://datacarpentry.org/wrangling-genomics/). Learners will have the chance to learn Nextflow and nf-core basics, to convert a variant-calling bash script into a Nextflow workflow, and modularize the pipeline using DSL2 modules and sub-workflows.\n\nThe workshop can be opened on [Gitpod](https://gitpod.io/#https://github.com/sateeshperi/nextflow_tutorial.git), where you can try the exercises in an online computing environment at your own pace, with the course material in another window alongside.\n\nYou can find the course in [Nextflow Tutorial - Variant Calling Edition](https://sateeshperi.github.io/nextflow_varcal/nextflow/).\n\n

Community and support

\n\n- [Seqera Community Forum](https://community.seqera.io)\n- Nextflow Twitter [@nextflowio](https://twitter.com/nextflowio?lang=en)\n- [Nextflow Slack](https://www.nextflow.io/slack-invite.html)\n- [nf-core Slack](https://nfcore.slack.com/)\n- [Seqera Labs](https://www.seqera.io/) and [Nextflow Tower](https://tower.nf/)\n- [Nextflow patterns](https://github.com/nextflow-io/patterns)\n- [Nextflow Snippets](https://github.com/mribeirodantas/NextflowSnippets)", "images": [], "author": "Evan Floden", "tags": "nextflow, tower" @@ -594,7 +594,7 @@ "slug": "2023/nextflow-goes-to-university", "title": "Nextflow goes to university!", "date": "2023-07-24T00:00:00.000Z", - "content": "\nThe Nextflow project originated from within an academic research group, so perhaps it’s no surprise that education is an essential part of the Nextflow and nf-core communities. Over the years, we have established several regular training resources: we have a weekly online seminar series called nf-core/bytesize and run hugely popular bi-annual [Nextflow and nf-core community training online](https://www.youtube.com/@nf-core/playlists?view=50&sort=dd&shelf_id=2). In 2022, Seqera established a new community and growth team, funded in part by a grant from the Chan Zuckerberg Initiative “Essential Open Source Software for Science” grant. We are all former bioinformatics researchers from academia and part of our mission is to build resources and programs to support academic institutions. We want to help to provide leading edge, high-quality, [Nextflow](https://www.nextflow.io/) and [nf-core](https://nf-co.re/) training for Masters and Ph.D. students in Bioinformatics and other related fields.\n\nWe recently held one of our first such projects, a collaboration with the [Bioinformatics Multidisciplinary Environment, BioME](https://bioinfo.imd.ufrn.br/site/en-US) at the [Federal University of Rio Grande do Norte (UFRN)](https://www.ufrn.br/) in Brazil. The UFRN is one of the largest universities in Brazil with over 40,000 enrolled students, hosting one of the best-ranked bioinformatics programs in Brazil, attracting students from all over the country. The BioME department runs courses for Masters and Ph.D. students, including a flexible course dedicated to cutting-edge bioinformatics techniques. As part of this, we were invited to run an 8-day Nextflow and nf-core graduate course. Participants attended 5 days of training seminars and presented a Nextflow project at the end of the course. Upon successful completion of the course, participants received graduate program course credits as well as a Seqera Labs certified certificate recognizing their knowledge and hands-on experience 😎.\n\n\n\nThe course participants included one undergraduate student, Master's students, Ph.D. students, and postdocs with very diverse backgrounds. While some had prior Nextflow and nf-core experience and had already attended Nextflow training, others had never used it. Unsurprisingly, they all chose very different project topics to work on and present to the rest of the group. At the end of the course, eleven students chose to undergo the final project evaluation for the Seqera certification. They all passed with flying colors!\n\n Picture with some of the students that attended the course\n\n## Final projects\n\nFinal hands-on projects are very useful not only to practice new skills but also to have a tangible deliverable at the end of the course. 
It could be the first step of a long journey with Nextflow, especially if you work on a project that lives on after the course concludes. Participants were given complete freedom to design a project that was relevant to them and their interests. Many students were very satisfied with their projects and intend to continue working on them after the course conclusion.\n\n### Euryale 🐍\n\n[João Vitor Cavalcante](https://www.linkedin.com/in/joao-vitor-cavalcante), along with collaborators, had developed and [published](https://www.frontiersin.org/articles/10.3389/fgene.2022.814437/full) a Snakemake pipeline for Sensitive Taxonomic Classification and Flexible Functional Annotation of Metagenomic Shotgun Sequences called MEDUSA. During the course, after seeing the huge potential of Nextflow, he decided to fully translate this pipeline to Nextflow, but with a new name: Euryale. You can check the result [here](https://github.com/dalmolingroup/euryale/) 😍 Why Euryale? In Greek mythology, Euryale was one of the three gorgons, a sister to Medusa 🤓\n\n### Bringing Nanopore to Google Batch ☁️\n\nThe Customer Workflows Group at Oxford Nanopore Technologies (ONT) has adopted Nextflow to develop and distribute general-purpose pipelines for its customers. One of these pipelines, [wf-alignment](https://github.com/epi2me-labs/wf-alignment), takes a FASTQ directory and a reference directory and outputs a minimap2 alignment, along with samtools stats and an HTML report. Both samtools stats and the HTML report generated by this pipeline are well suited for Nextflow Tower’s Reports feature. However, [Danilo Imparato](https://www.linkedin.com/in/daniloimparato) noticed that the pipeline lacked support for using Google Cloud as compute environment and decided to work on this limitation on his [final project](https://github.com/daniloimparato/wf-alignment), which included fixing a few bugs specific to running it on Google Cloud and making the reports available on Nextflow Tower 🤯\n\n### Nextflow applied to Economics! 🤩\n\n[Galileu Nobre](https://www.linkedin.com/in/galileu-nobre-901551187/) is studying Economical Sciences and decided to convert his scripts into a Nextflow pipeline for his [final project](https://github.com/galileunobre/nextflow_projeto_1). The goal of the pipeline is to estimate the demand for health services in Brazil based on data from the 2019 PNS (National Health Survey), (a) treating this database to contain only the variables we will work with, (b) running a descriptive analysis to determine the data distribution in order to investigate which models would be best applicable. In the end, two regression models, Poisson, and the Negative Binomial, are used to estimate the demand. His work is an excellent example of applying Nextflow to fields outside of traditional bioinformatics 😉.\n\n### Whole Exome Sequencing 🧬\n\nFor her [final project](https://github.com/RafaellaFerraz/exome), [Rafaella Ferraz](https://www.linkedin.com/in/rafaella-sousa-ferraz) used nf-core/tools to write a whole-exome sequencing analysis pipeline from scratch. She applied her new skills using nf-core modules and sub-workflows to achieve this and was able to launch and monitor her pipeline using Nextflow Tower. Kudos to Rafaella! 👏🏻\n\n### RNASeq with contamination 🧫\n\nIn her [final project](https://github.com/iaradsouza1/tab-projeto-final), [Iara Souza](https://www.linkedin.com/in/iaradsouza) developed a bioinformatics pipeline that analyzed RNA-Seq data when it's required to have an extra pre-filtering step. 
She needed this for analyzing data from RNA-Seq experiments performed in cell culture, where there is a high probability of contamination of the target transcriptome with the host transcriptome. Iara was able to learn how to use nf-core/tools and benefit from all the \"batteries included\" that come with it 🔋😬\n\n### SARS-CoV-2 Genome assembly and lineage classification 🦠\n\n[Diego Teixeira](https://www.linkedin.com/in/diego-go-tex) has been working with SARS-CoV-2 genome assembly and lineage classification. As his final project, he wrote a [Nextflow pipeline](https://github.com/diegogotex/sarscov2_irma_nf) aggregating all tools and analyses he's been doing, allowing him to be much more efficient in his work and have a reproducible pipeline that can easily be shared with collaborators.\n\nIn the nf-core project, there are almost a [thousand modules](https://nf-co.re/modules) ready to plug in your pipeline, together with [dozens of full-featured pipelines](https://nf-co.re/pipelines). However, in many situations, you'll need a custom pipeline. With that in mind, it's very useful to master the skills of Nextflow scripting so that you can take advantage of everything that is available, both building new pipelines and modifying public ones.\n\n## Exciting experience!\n\nIt was an amazing experience to see what each participant had worked on for their final projects! 🤯 They were all able to master the skills required to write Nextflow pipelines in real-life scenarios, which can continue to be used well after the end of the course. For people just starting their adventure with Nextflow, it can feel overwhelming to use nf-core tools with all the associated best practices, but students surprised me by using nf-core tools from the very beginning and having their project almost perfectly fitting the best practices 🤩\n\nWe’d love to help out with more university bioinformatics courses like this. If you think your institution could benefit from such an experience, please don't hesitate to reach out to us at community@seqera.io. We would love to hear from you!\n", + "content": "The Nextflow project originated from within an academic research group, so perhaps it’s no surprise that education is an essential part of the Nextflow and nf-core communities. Over the years, we have established several regular training resources: we have a weekly online seminar series called nf-core/bytesize and run hugely popular bi-annual [Nextflow and nf-core community training online](https://www.youtube.com/@nf-core/playlists?view=50&sort=dd&shelf_id=2). In 2022, Seqera established a new community and growth team, funded in part by a grant from the Chan Zuckerberg Initiative “Essential Open Source Software for Science” grant. We are all former bioinformatics researchers from academia and part of our mission is to build resources and programs to support academic institutions. We want to help to provide leading edge, high-quality, [Nextflow](https://www.nextflow.io/) and [nf-core](https://nf-co.re/) training for Masters and Ph.D. students in Bioinformatics and other related fields.\n\nWe recently held one of our first such projects, a collaboration with the [Bioinformatics Multidisciplinary Environment, BioME](https://bioinfo.imd.ufrn.br/site/en-US) at the [Federal University of Rio Grande do Norte (UFRN)](https://www.ufrn.br/) in Brazil. 
The UFRN is one of the largest universities in Brazil with over 40,000 enrolled students, hosting one of the best-ranked bioinformatics programs in Brazil, attracting students from all over the country. The BioME department runs courses for Masters and Ph.D. students, including a flexible course dedicated to cutting-edge bioinformatics techniques. As part of this, we were invited to run an 8-day Nextflow and nf-core graduate course. Participants attended 5 days of training seminars and presented a Nextflow project at the end of the course. Upon successful completion of the course, participants received graduate program course credits as well as a Seqera Labs certified certificate recognizing their knowledge and hands-on experience 😎.\n\n\n\nThe course participants included one undergraduate student, Master's students, Ph.D. students, and postdocs with very diverse backgrounds. While some had prior Nextflow and nf-core experience and had already attended Nextflow training, others had never used it. Unsurprisingly, they all chose very different project topics to work on and present to the rest of the group. At the end of the course, eleven students chose to undergo the final project evaluation for the Seqera certification. They all passed with flying colors!\n\n Picture with some of the students that attended the course\n\n## Final projects\n\nFinal hands-on projects are very useful not only to practice new skills but also to have a tangible deliverable at the end of the course. It could be the first step of a long journey with Nextflow, especially if you work on a project that lives on after the course concludes. Participants were given complete freedom to design a project that was relevant to them and their interests. Many students were very satisfied with their projects and intend to continue working on them after the course conclusion.\n\n### Euryale 🐍\n\n[João Vitor Cavalcante](https://www.linkedin.com/in/joao-vitor-cavalcante), along with collaborators, had developed and [published](https://www.frontiersin.org/articles/10.3389/fgene.2022.814437/full) a Snakemake pipeline for Sensitive Taxonomic Classification and Flexible Functional Annotation of Metagenomic Shotgun Sequences called MEDUSA. During the course, after seeing the huge potential of Nextflow, he decided to fully translate this pipeline to Nextflow, but with a new name: Euryale. You can check the result [here](https://github.com/dalmolingroup/euryale/) 😍 Why Euryale? In Greek mythology, Euryale was one of the three gorgons, a sister to Medusa 🤓\n\n### Bringing Nanopore to Google Batch ☁️\n\nThe Customer Workflows Group at Oxford Nanopore Technologies (ONT) has adopted Nextflow to develop and distribute general-purpose pipelines for its customers. One of these pipelines, [wf-alignment](https://github.com/epi2me-labs/wf-alignment), takes a FASTQ directory and a reference directory and outputs a minimap2 alignment, along with samtools stats and an HTML report. Both samtools stats and the HTML report generated by this pipeline are well suited for Nextflow Tower’s Reports feature. However, [Danilo Imparato](https://www.linkedin.com/in/daniloimparato) noticed that the pipeline lacked support for using Google Cloud as compute environment and decided to work on this limitation on his [final project](https://github.com/daniloimparato/wf-alignment), which included fixing a few bugs specific to running it on Google Cloud and making the reports available on Nextflow Tower 🤯\n\n### Nextflow applied to Economics! 
🤩\n\n[Galileu Nobre](https://www.linkedin.com/in/galileu-nobre-901551187/) is studying Economic Sciences and decided to convert his scripts into a Nextflow pipeline for his [final project](https://github.com/galileunobre/nextflow_projeto_1). The goal of the pipeline is to estimate the demand for health services in Brazil based on data from the 2019 PNS (National Health Survey), by (a) treating this database to contain only the variables we will work with, and (b) running a descriptive analysis to determine the data distribution in order to investigate which models would be best applicable. In the end, two regression models, Poisson and Negative Binomial, are used to estimate the demand. His work is an excellent example of applying Nextflow to fields outside of traditional bioinformatics 😉.\n\n### Whole Exome Sequencing 🧬\n\nFor her [final project](https://github.com/RafaellaFerraz/exome), [Rafaella Ferraz](https://www.linkedin.com/in/rafaella-sousa-ferraz) used nf-core/tools to write a whole-exome sequencing analysis pipeline from scratch. She applied her new skills using nf-core modules and sub-workflows to achieve this and was able to launch and monitor her pipeline using Nextflow Tower. Kudos to Rafaella! 👏🏻\n\n### RNASeq with contamination 🧫\n\nIn her [final project](https://github.com/iaradsouza1/tab-projeto-final), [Iara Souza](https://www.linkedin.com/in/iaradsouza) developed a bioinformatics pipeline that analyzes RNA-Seq data when an extra pre-filtering step is required. She needed this for analyzing data from RNA-Seq experiments performed in cell culture, where there is a high probability of contamination of the target transcriptome with the host transcriptome. Iara was able to learn how to use nf-core/tools and benefit from all the \"batteries included\" that come with it 🔋😬\n\n### SARS-CoV-2 Genome assembly and lineage classification 🦠\n\n[Diego Teixeira](https://www.linkedin.com/in/diego-go-tex) has been working with SARS-CoV-2 genome assembly and lineage classification. As his final project, he wrote a [Nextflow pipeline](https://github.com/diegogotex/sarscov2_irma_nf) aggregating all tools and analyses he's been doing, allowing him to be much more efficient in his work and have a reproducible pipeline that can easily be shared with collaborators.\n\nIn the nf-core project, there are almost a [thousand modules](https://nf-co.re/modules) ready to plug into your pipeline, together with [dozens of full-featured pipelines](https://nf-co.re/pipelines). However, in many situations, you'll need a custom pipeline. With that in mind, it's very useful to master the skills of Nextflow scripting so that you can take advantage of everything that is available, both building new pipelines and modifying public ones.\n\n## Exciting experience!\n\nIt was an amazing experience to see what each participant had worked on for their final projects! 🤯 They were all able to master the skills required to write Nextflow pipelines in real-life scenarios, which can continue to be used well after the end of the course. For people just starting their adventure with Nextflow, it can feel overwhelming to use nf-core tools with all the associated best practices, but students surprised me by using nf-core tools from the very beginning and having their projects fit the best practices almost perfectly 🤩\n\nWe’d love to help out with more university bioinformatics courses like this. 
If you think your institution could benefit from such an experience, please don't hesitate to reach out to us at community@seqera.io. We would love to hear from you!", "images": [ "/img/nextflow-university-class-ufrn.jpg" ], @@ -605,7 +605,7 @@ "slug": "2023/nextflow-summit-2023-recap", "title": "Nextflow Summit 2023 Recap", "date": "2023-10-25T00:00:00.000Z", - "content": "\n## Five days of Nextflow Awesomeness in Barcelona\n\nOn Friday, Oct 20, we wrapped up our [hackathon](https://nf-co.re/events/hackathon) and [Nextflow Summit](https://summit.nextflow.io/) in Barcelona, Spain. By any measure, this year’s Summit was our best community event ever, drawing roughly 900 attendees across multiple channels, including in-person attendees, participants in our [#summit-2023](https://nextflow.slack.com/archives/C0602TWRT5G) Slack channel, and [Summit Livestream](https://www.youtube.com/playlist?list=PLPZ8WHdZGxmUotnP-tWRVNtuNWpN7xbpL) viewers on YouTube.\n\nThe Summit drew attendees, speakers, and sponsors from around the world. Over the course of the three-day event, we heard from dozens of impressive speakers working at the cutting edge of life sciences from academia, research, healthcare providers, biotechs, and cloud providers, including:\n\n- Australian BioCommons\n- Genomics England\n- Pixelgen Technologies\n- University of Tennessee Health Science Center\n- Amazon Web Services\n- Quantitative Biology Center - University of Tübingen\n- Biomodal\n- Matterhorn Studio\n- Centre for Genomic Regulation (CRG)\n- Heidelberg University Hospital\n- MemVerge\n- University of Cambridge\n- Oxford Nanopore Technologies\n- Medical University of Innsbruck\n- Sano Genetics\n- Institute of Genetics and Development of Rennes, University of Rennes\n- Ardigen\n- ZS\n- Wellcome Sanger Institute\n- SciLifeLab\n- AstraZeneca UK Ltd\n- University of Texas at Dallas\n- Seqera\n\n## The Hackathon – advancing the Nextflow ecosystem\n\nThe week began with a three-day in-person and virtual nf-core hackathon event. With roughly 100 in-person developers, this was twice the size of our largest Hackathon to date. As with previous Hackathons, participants were divided into project groups, with activities coordinated via a single [GitHub project board](https://github.com/orgs/nf-core/projects/47/views/1) focusing on different aspects of [nf-core](https://nf-co.re/) and Nextflow, including:\n\n- Pipelines\n- Modules & subworkflows\n- Infrastructure\n- Nextflow & plugins development\n\nThis year, the focus of the hackathon was [nf-test](https://code.askimed.com/nf-test/), an open-source testing framework for Nextflow pipelines. The team made considerable progress applying nf-test consistently across various nf-core pipelines and modules — and of course, no Hackathon would be complete without a community cooking class, quiz, bingo, a sock hunt, and a scavenger hunt!\n\nFor an overview of the tremendous progress made advancing the state of Nextflow and nf-core in three short days, view Chris Hakkaart’s talk on [highlights from the nf-core hackathon](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-hackathon/).\n\n## The Summit kicks off\n\nThe Summit began on Wednesday Oct 18 with excellent talks from [Australian BioCommons](https://summit.nextflow.io/barcelona/agenda/summit/oct-18-the-national-nextflow-tower-service-for-australian-researchers/) and [Genomics England](https://summit.nextflow.io/barcelona/agenda/summit/oct-18-analysing-ont-long-read-data-for-cancer-with-nextflow/). 
This was followed by a presentation where [Pixelgen Technologies](https://summit.nextflow.io/barcelona/agenda/summit/oct-18-pixelgen-technologies-heart-nextflow/) described their unique Molecular Pixelation (MPX) technologies and unveiled their new [nf-core/pixelator](https://nf-co.re/pixelator/1.0.0) community pipeline for molecular pixelation assays.\n\nNext, Seqera’s Phil Ewels took the stage providing a series of community updates, including the announcement of a new [Nextflow Ambassador](https://nextflow.io/blog/2023/introducing-nextflow-ambassador-program.html) program, [a new community forum](https://nextflow.io/blog/2023/community-forum.html) at [community.seqera.io](https://community.seqera.io), and the exciting appointment of [Geraldine Van der Auwera](https://nextflow.io/blog/2023/geraldine-van-der-auwera-joins-seqera.html) as lead developer advocate for the Nextflow. Geraldine is well known for her work on GATK, WDL, and Terra.bio and is the co-author of the book [Genomics on the Cloud](https://www.oreilly.com/library/view/genomics-in-the/9781491975183/). As Geraldine assumes leadership of the developer advocacy team, Phil will spend more time focusing on open-source development, as product manager of open source at Seqera.\n\n
\n \"Hackathon\n
\n\nSeqera’s Evan Floden shared his vision of the modern biotech stack for open science, highlighting recent developments at Seqera, including a revamped [Seqera platform](https://seqera.io/platform/), new [Data Explorer](https://seqera.io/blog/introducing-data-explorer/) functionality, and providing an exciting glimpse of the new Data Studios feature now in private preview. You can view [Evan’s full talk here](https://summit.nextflow.io/barcelona/agenda/summit/oct-18-modern-biotech/).\n\nA highlight was the keynote delivered by Erik Garrison of the University of Tennessee Health Science Center provided. In his talk, [Biological revelations at the frontiers of a draft human pangenome reference](https://summit.nextflow.io/barcelona/agenda/summit/oct-18-erik-garrison/), Erik shared how his team's cutting-edge work applying new computational methods in the context of the Human Pangenome Project has yielded the most complete picture of human sequence evolution available to date.\n\nDay one wrapped up with a surprise [announcement](https://www.globenewswire.com/news-release/2023/10/20/2763899/0/en/Seqera-Sets-Sail-With-Alinghi-Red-Bull-Racing-as-Official-High-Performance-Computing-Supplier.html) that Seqera has been confirmed as the official High-Performance Computing Supplier for Alinghi Red Bull Racing at the [37th America’s Cup](https://www.americascup.com/) in Barcelona. This was followed by an evening reception hosted by [Alinghi Red Bull Racing](https://alinghiredbullracing.americascup.com/).\n\n## Day two starts off on the right foot\n\nDay two kicked off with a brisk sunrise run along the iconic Barcelona Waterfront attended by a team of hardy Summit participants. After that, things kicked into high gear for the morning session with talks on everything from using Nextflow to power [Machine Learning pipelines for materials science](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-machine-learning-for-material-science/) to [standardized frameworks for protein structure prediction](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-proteinfold/) to discussions on [how to estimate the CO2 footprint of pipeline runs](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-nf-co2footprint/).\n\n
\n \"Summit\n
\n\nNextflow creator and Seqera CTO and co-founder Paolo Di Tommaso provided an update on some of the technologies he and his team have been working on including a deep dive on the [Fusion file system](https://seqera.io/fusion/). Paolo also delved into [Wave containers](https://seqera.io/wave/), discussing the dynamic assembly of containers using the [Spack package manager](https://nextflow.io/docs/latest/process.html#spack), echoing a similar theme from AWS’s [Brendan Bouffler](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-brendan-bouffler/) earlier in the day. During the conference, Seqera announced Wave Containers as our latest [open-source](https://github.com/seqeralabs/wave) contribution to the bioinformatics community — a huge contribution to the open science movement.\n\nPaolo also provided an impressive command-line focused demo of Wave, echoing Harshil Patel’s equally impressive demo earlier in the day focused on [seqerakit and automation on the Seqera Platform](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-harshil-patel/). Both Harshil and Paolo showed themselves to be **\"kings of the live demo\"** for their command line mastery under pressure! You can view [Paolo’s talk and demos here](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-paolo-di-tommaso/) and [Harshil’s talk here](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-harshil-patel/).\n\nTalks during day two included [bringing spatial omics to nf-core](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-bringing-spatial-omics-to-nf-core/), a discussion of [nf-validation](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-nf-validation/), and a talk on the [development of an integrated DNA and RNA variant calling pipeline](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-development-of-an-integrated-dna-and-rna-variant-calling-pipeline/).\n\nUnfortunately, there were too many brilliant speakers and topics to mention them all here, so we’ve provided a handy summary of talks at the end of this post so you can look up topics of interest.\n\nThe Summit also featured an exhibition area, and attendees visited booths hosted by [event sponsors](https://summit.nextflow.io/barcelona/sponsors/) between talks and viewed the many excellent [scientific posters](https://summit.nextflow.io/barcelona/posters/) contributed for the event. Following a packed day of sessions that went into the evening, attendees relaxed and socialized with colleagues over dinner.\n\n
\n \"Morning\n
\n\n## Wrapping up\n\nAs things wound to a close on day three, there were additional talks on topics ranging from ZS’s [contributing to nf-core through client collaboration](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-zs/) to [decoding the Tree of Life at Wellcome Sanger Institute](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-tree-of-life/) to [performing large and reproducible GWAS analysis on biobank-scale data](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-gwas/) at Medical University of Innsbruck.\n\nPhil Ewels discussed [future plans for MultiQC](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-multiqc/), and Edmund Miller [shared his experience working on nf-test](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-nf-test-at-nf-core/) and how it is empowering scalable and streamlined testing for nf-core projects.\n\nTo close the event, Evan took the stage a final time, thanking the many Summit organizers and contributors, and announcing the next Nextflow Summit Barcelona, scheduled for **October 21-25, 2024**. He also reminded attendees of the upcoming North American Hackathon and [Nextflow Summit in Boston](https://summit.nextflow.io/boston/) beginning on November 28, 2023.\n\nOn behalf of the Seqera team, thank you to our fellow [sponsors](https://summit.nextflow.io/boston/sponsors/) who helped make the Nextflow Summit a resounding success. This year’s sponsors included:\n\n- AWS\n- ZS\n- Element Biosciences\n- Microsoft\n- MemVerge\n- Pixelgen Technologies\n- Oxford Nanopore\n- Quilt\n- TileDB\n\n## In case you missed it\n\nIf you were unable to attend in person, or missed a talk, you can watch all three days of the Summit on our [YouTube channel](https://www.youtube.com/playlist?list=PLPZ8WHdZGxmUotnP-tWRVNtuNWpN7xbpL).\n\nFor information about additional upcoming events including bytesize talks, hackathons, webinars, and training events, you can visit [https://nf-co.re/events](https://nf-co.re/events) or [https://seqera.io/events/seqera/](https://seqera.io/events/seqera/).\n\nFor your convenience, a handy list of talks from Nextflow Summit 2023 are summarized below.\n\n### Day one (Wednesday Oct 18):\n\n- [The National Nextflow Tower Service for Australian researchers](https://summit.nextflow.io/barcelona/agenda/summit/oct-18-the-national-nextflow-tower-service-for-australian-researchers/) – Steven Manos\n- [Analysing ONT long read data for cancer with Nextflow](https://summit.nextflow.io/barcelona/agenda/summit/oct-18-analysing-ont-long-read-data-for-cancer-with-nextflow/) – Arthur Gymer\n- [Community updates](https://summit.nextflow.io/barcelona/agenda/summit/oct-18-community-updates/) – Phil Ewels\n- [Pixelgen Technologies ❤︎ Nextflow](https://summit.nextflow.io/barcelona/agenda/summit/oct-18-pixelgen-technologies-heart-nextflow/) – John Dahlberg\n- [The modern biotech stack](https://summit.nextflow.io/barcelona/agenda/summit/oct-18-modern-biotech/) – Evan Floden\n- [Biological revelations at the frontiers of a draft human pangenome reference](https://summit.nextflow.io/barcelona/agenda/summit/oct-18-erik-garrison/) – Erik Garrison\n\n### Day two (Thursday Oct 19):\n\n- [It’s been quite a year for research technology in the cloud: we’ve been busy](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-brendan-bouffler/) – Brendan Bouffler\n- [nf-validation: a Nextflow plugin to validate pipeline parameters and input files](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-nf-validation/) - Júlia Mir Pedrol\n- 
[Computational methods for allele-specific methylation with biomodal Duet](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-biomodal-duet/) – Michael Wilson\n- [How to use data pipelines in Machine Learning for Material Science](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-machine-learning-for-material-science/) – Jakob Zeitler\n- [nf-core/proteinfold: a standardized workflow framework for protein structure prediction tools](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-proteinfold/) - Jose Espinosa-Carrasco\n- [Automation on the Seqera Platform](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-harshil-patel/) - Harshil Patel\n- [nf-co2footprint: a Nextflow plugin to estimate the CO2 footprint of pipeline runs](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-nf-co2footprint/) - Sabrina Krakau\n- [Bringing spatial omics to nf-core](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-bringing-spatial-omics-to-nf-core/) - Victor Perez\n- [Bioinformatics at the speed of cloud: revolutionizing genomics with Nextflow and MMCloud](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-bioinformatics-at-the-speed-of-cloud/) - Sateesh Peri\n- [Enabling converged computing with the Nextflow ecosystem](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-paolo-di-tommaso/) - Paolo Di Tommaso\n- [Cluster scalable pangenome graph construction with nf-core/pangenome](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-cluster-scalable-pangenome/) - Simon Heumos\n- [Development of an integrated DNA and RNA variant calling pipeline](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-development-of-an-integrated-dna-and-rna-variant-calling-pipeline/) - Raquel Manzano\n- [Annotation cache: using nf-core/modules and Seqera Platform to build an AWS open data resource](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-annotation-cache/) - Maxime Garcia\n- [Real-time sequencing analysis with Nextflow](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-real-time-sequencing-analysis-with-nextflow/) - Chris Wright\n- [nf-core/sarek: a comprehensive & efficient somatic & germline variant calling workflow](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-nf-core-sarek/) - Friederike Hanssen\n- [nf-test: a simple but powerful testing framework for Nextflow pipelines](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-nf-test-simple-but-powerful/) - Lukas Forer\n- [Empowering distributed precision medicine: scalable genomic analysis in clinical trial recruitment](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-empowering-distributed-precision-medicine/) - Heath Obrien\n- [nf-core pipeline for genomic imputation: from phasing to imputation to validation](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-nf-core-pipeline-for-genomic-imputation/) - Louis Le Nézet\n- [Porting workflow managers to Nextflow at a national diagnostic genomics medical service – strategy and learnings](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-genomics-england/) - Several Speakers\n\n### Day three (Thursday Oct 19):\n\n- [Driving discovery: contributing to the nf-core project through client collaboration](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-zs/) - Felipe Almeida & Juliet Frederiksen\n- [Automated production engine to decode the Tree of Life](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-tree-of-life/) - Guoying Qi\n- [Building a community: experiences from one year as 
a developer advocate](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-community-building/) - Marcel Ribeiro-Dantas\n- [nf-core/raredisease: a workflow to analyse data from patients with rare diseases](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-nf-core-raredisease/) - Ramprasad Neethiraj\n- [Enabling AZ bioinformatics with Nextflow/Nextflow Tower](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-az/) - Manasa Surakala\n- [Bringing MultiQC into a new era](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-multiqc/) - Phil Ewels\n- [nf-test at nf-core: empowering scalable and streamlined testing](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-nf-test-at-nf-core/) - Edmund Miller\n- [Performing large and reproducible GWAS analysis on biobank-scale data](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-gwas/) - Sebastian Schönherr\n- [Highlights from the nf-core hackathon](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-hackathon/) - Chris Hakkaart\n\n_In this event, we're showcasing some of the results of the project 'Optimization of computational resources for HPC workloads in the cloud using ML/AI' by Seqera Labs S.L. This project has been funded by the European Regional Development Fund (ERDF) of the European Union, coordinated and managed by RED.es, with the aim of carrying out the development of technological entrepreneurship and technological demand, within the framework of the Strategic Action on Digital Economy and Society of the State Program for R&D&I oriented towards societal challenges._\n\n![grant logos](/img/blog-2022-11-03--img1.png)\n", + "content": "## Five days of Nextflow Awesomeness in Barcelona\n\nOn Friday, Oct 20, we wrapped up our [hackathon](https://nf-co.re/events/hackathon) and [Nextflow Summit](https://summit.nextflow.io/) in Barcelona, Spain. By any measure, this year’s Summit was our best community event ever, drawing roughly 900 attendees across multiple channels, including in-person attendees, participants in our [#summit-2023](https://nextflow.slack.com/archives/C0602TWRT5G) Slack channel, and [Summit Livestream](https://www.youtube.com/playlist?list=PLPZ8WHdZGxmUotnP-tWRVNtuNWpN7xbpL) viewers on YouTube.\n\nThe Summit drew attendees, speakers, and sponsors from around the world. Over the course of the three-day event, we heard from dozens of impressive speakers working at the cutting edge of life sciences from academia, research, healthcare providers, biotechs, and cloud providers, including:\n\n- Australian BioCommons\n- Genomics England\n- Pixelgen Technologies\n- University of Tennessee Health Science Center\n- Amazon Web Services\n- Quantitative Biology Center - University of Tübingen\n- Biomodal\n- Matterhorn Studio\n- Centre for Genomic Regulation (CRG)\n- Heidelberg University Hospital\n- MemVerge\n- University of Cambridge\n- Oxford Nanopore Technologies\n- Medical University of Innsbruck\n- Sano Genetics\n- Institute of Genetics and Development of Rennes, University of Rennes\n- Ardigen\n- ZS\n- Wellcome Sanger Institute\n- SciLifeLab\n- AstraZeneca UK Ltd\n- University of Texas at Dallas\n- Seqera\n\n## The Hackathon – advancing the Nextflow ecosystem\n\nThe week began with a three-day in-person and virtual nf-core hackathon event. With roughly 100 in-person developers, this was twice the size of our largest Hackathon to date. 
As with previous Hackathons, participants were divided into project groups, with activities coordinated via a single [GitHub project board](https://github.com/orgs/nf-core/projects/47/views/1) focusing on different aspects of [nf-core](https://nf-co.re/) and Nextflow, including:\n\n- Pipelines\n- Modules & subworkflows\n- Infrastructure\n- Nextflow & plugins development\n\nThis year, the focus of the hackathon was [nf-test](https://code.askimed.com/nf-test/), an open-source testing framework for Nextflow pipelines. The team made considerable progress applying nf-test consistently across various nf-core pipelines and modules — and of course, no Hackathon would be complete without a community cooking class, quiz, bingo, a sock hunt, and a scavenger hunt!\n\nFor an overview of the tremendous progress made advancing the state of Nextflow and nf-core in three short days, view Chris Hakkaart’s talk on [highlights from the nf-core hackathon](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-hackathon/).\n\n## The Summit kicks off\n\nThe Summit began on Wednesday Oct 18 with excellent talks from [Australian BioCommons](https://summit.nextflow.io/barcelona/agenda/summit/oct-18-the-national-nextflow-tower-service-for-australian-researchers/) and [Genomics England](https://summit.nextflow.io/barcelona/agenda/summit/oct-18-analysing-ont-long-read-data-for-cancer-with-nextflow/). This was followed by a presentation where [Pixelgen Technologies](https://summit.nextflow.io/barcelona/agenda/summit/oct-18-pixelgen-technologies-heart-nextflow/) described their unique Molecular Pixelation (MPX) technologies and unveiled their new [nf-core/pixelator](https://nf-co.re/pixelator/1.0.0) community pipeline for molecular pixelation assays.\n\nNext, Seqera’s Phil Ewels took the stage to provide a series of community updates, including the announcement of a new [Nextflow Ambassador](https://nextflow.io/blog/2023/introducing-nextflow-ambassador-program.html) program, [a new community forum](https://nextflow.io/blog/2023/community-forum.html) at [community.seqera.io](https://community.seqera.io), and the exciting appointment of [Geraldine Van der Auwera](https://nextflow.io/blog/2023/geraldine-van-der-auwera-joins-seqera.html) as lead developer advocate for Nextflow. Geraldine is well known for her work on GATK, WDL, and Terra.bio and is the co-author of the book [Genomics in the Cloud](https://www.oreilly.com/library/view/genomics-in-the/9781491975183/). As Geraldine assumes leadership of the developer advocacy team, Phil will spend more time focusing on open-source development, as product manager of open source at Seqera.\n\n
\n \"Hackathon\n
\n\nSeqera’s Evan Floden shared his vision of the modern biotech stack for open science, highlighting recent developments at Seqera, including a revamped [Seqera platform](https://seqera.io/platform/), new [Data Explorer](https://seqera.io/blog/introducing-data-explorer/) functionality, and providing an exciting glimpse of the new Data Studios feature now in private preview. You can view [Evan’s full talk here](https://summit.nextflow.io/barcelona/agenda/summit/oct-18-modern-biotech/).\n\nA highlight was the keynote delivered by Erik Garrison of the University of Tennessee Health Science Center. In his talk, [Biological revelations at the frontiers of a draft human pangenome reference](https://summit.nextflow.io/barcelona/agenda/summit/oct-18-erik-garrison/), Erik shared how his team's cutting-edge work applying new computational methods in the context of the Human Pangenome Project has yielded the most complete picture of human sequence evolution available to date.\n\nDay one wrapped up with a surprise [announcement](https://www.globenewswire.com/news-release/2023/10/20/2763899/0/en/Seqera-Sets-Sail-With-Alinghi-Red-Bull-Racing-as-Official-High-Performance-Computing-Supplier.html) that Seqera has been confirmed as the official High-Performance Computing Supplier for Alinghi Red Bull Racing at the [37th America’s Cup](https://www.americascup.com/) in Barcelona. This was followed by an evening reception hosted by [Alinghi Red Bull Racing](https://alinghiredbullracing.americascup.com/).\n\n## Day two starts off on the right foot\n\nDay two kicked off with a brisk sunrise run along the iconic Barcelona Waterfront attended by a team of hardy Summit participants. After that, things kicked into high gear for the morning session with talks on everything from using Nextflow to power [Machine Learning pipelines for materials science](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-machine-learning-for-material-science/) to [standardized frameworks for protein structure prediction](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-proteinfold/) to discussions on [how to estimate the CO2 footprint of pipeline runs](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-nf-co2footprint/).\n\n
\n \"Summit\n
\n\nNextflow creator and Seqera CTO and co-founder Paolo Di Tommaso provided an update on some of the technologies he and his team have been working on including a deep dive on the [Fusion file system](https://seqera.io/fusion/). Paolo also delved into [Wave containers](https://seqera.io/wave/), discussing the dynamic assembly of containers using the [Spack package manager](https://nextflow.io/docs/latest/process.html#spack), echoing a similar theme from AWS’s [Brendan Bouffler](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-brendan-bouffler/) earlier in the day. During the conference, Seqera announced Wave Containers as our latest [open-source](https://github.com/seqeralabs/wave) contribution to the bioinformatics community — a huge contribution to the open science movement.\n\nPaolo also provided an impressive command-line focused demo of Wave, echoing Harshil Patel’s equally impressive demo earlier in the day focused on [seqerakit and automation on the Seqera Platform](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-harshil-patel/). Both Harshil and Paolo showed themselves to be **\"kings of the live demo\"** for their command line mastery under pressure! You can view [Paolo’s talk and demos here](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-paolo-di-tommaso/) and [Harshil’s talk here](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-harshil-patel/).\n\nTalks during day two included [bringing spatial omics to nf-core](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-bringing-spatial-omics-to-nf-core/), a discussion of [nf-validation](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-nf-validation/), and a talk on the [development of an integrated DNA and RNA variant calling pipeline](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-development-of-an-integrated-dna-and-rna-variant-calling-pipeline/).\n\nUnfortunately, there were too many brilliant speakers and topics to mention them all here, so we’ve provided a handy summary of talks at the end of this post so you can look up topics of interest.\n\nThe Summit also featured an exhibition area, and attendees visited booths hosted by [event sponsors](https://summit.nextflow.io/barcelona/sponsors/) between talks and viewed the many excellent [scientific posters](https://summit.nextflow.io/barcelona/posters/) contributed for the event. Following a packed day of sessions that went into the evening, attendees relaxed and socialized with colleagues over dinner.\n\n
\n \"Morning\n
\n\n## Wrapping up\n\nAs things wound to a close on day three, there were additional talks on topics ranging from ZS’s [contributing to nf-core through client collaboration](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-zs/) to [decoding the Tree of Life at Wellcome Sanger Institute](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-tree-of-life/) to [performing large and reproducible GWAS analysis on biobank-scale data](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-gwas/) at Medical University of Innsbruck.\n\nPhil Ewels discussed [future plans for MultiQC](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-multiqc/), and Edmund Miller [shared his experience working on nf-test](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-nf-test-at-nf-core/) and how it is empowering scalable and streamlined testing for nf-core projects.\n\nTo close the event, Evan took the stage a final time, thanking the many Summit organizers and contributors, and announcing the next Nextflow Summit Barcelona, scheduled for **October 21-25, 2024**. He also reminded attendees of the upcoming North American Hackathon and [Nextflow Summit in Boston](https://summit.nextflow.io/boston/) beginning on November 28, 2023.\n\nOn behalf of the Seqera team, thank you to our fellow [sponsors](https://summit.nextflow.io/boston/sponsors/) who helped make the Nextflow Summit a resounding success. This year’s sponsors included:\n\n- AWS\n- ZS\n- Element Biosciences\n- Microsoft\n- MemVerge\n- Pixelgen Technologies\n- Oxford Nanopore\n- Quilt\n- TileDB\n\n## In case you missed it\n\nIf you were unable to attend in person, or missed a talk, you can watch all three days of the Summit on our [YouTube channel](https://www.youtube.com/playlist?list=PLPZ8WHdZGxmUotnP-tWRVNtuNWpN7xbpL).\n\nFor information about additional upcoming events including bytesize talks, hackathons, webinars, and training events, you can visit [https://nf-co.re/events](https://nf-co.re/events) or [https://seqera.io/events/seqera/](https://seqera.io/events/seqera/).\n\nFor your convenience, a handy list of talks from Nextflow Summit 2023 are summarized below.\n\n### Day one (Wednesday Oct 18):\n\n- [The National Nextflow Tower Service for Australian researchers](https://summit.nextflow.io/barcelona/agenda/summit/oct-18-the-national-nextflow-tower-service-for-australian-researchers/) – Steven Manos\n- [Analysing ONT long read data for cancer with Nextflow](https://summit.nextflow.io/barcelona/agenda/summit/oct-18-analysing-ont-long-read-data-for-cancer-with-nextflow/) – Arthur Gymer\n- [Community updates](https://summit.nextflow.io/barcelona/agenda/summit/oct-18-community-updates/) – Phil Ewels\n- [Pixelgen Technologies ❤︎ Nextflow](https://summit.nextflow.io/barcelona/agenda/summit/oct-18-pixelgen-technologies-heart-nextflow/) – John Dahlberg\n- [The modern biotech stack](https://summit.nextflow.io/barcelona/agenda/summit/oct-18-modern-biotech/) – Evan Floden\n- [Biological revelations at the frontiers of a draft human pangenome reference](https://summit.nextflow.io/barcelona/agenda/summit/oct-18-erik-garrison/) – Erik Garrison\n\n### Day two (Thursday Oct 19):\n\n- [It’s been quite a year for research technology in the cloud: we’ve been busy](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-brendan-bouffler/) – Brendan Bouffler\n- [nf-validation: a Nextflow plugin to validate pipeline parameters and input files](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-nf-validation/) - Júlia Mir Pedrol\n- 
[Computational methods for allele-specific methylation with biomodal Duet](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-biomodal-duet/) – Michael Wilson\n- [How to use data pipelines in Machine Learning for Material Science](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-machine-learning-for-material-science/) – Jakob Zeitler\n- [nf-core/proteinfold: a standardized workflow framework for protein structure prediction tools](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-proteinfold/) - Jose Espinosa-Carrasco\n- [Automation on the Seqera Platform](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-harshil-patel/) - Harshil Patel\n- [nf-co2footprint: a Nextflow plugin to estimate the CO2 footprint of pipeline runs](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-nf-co2footprint/) - Sabrina Krakau\n- [Bringing spatial omics to nf-core](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-bringing-spatial-omics-to-nf-core/) - Victor Perez\n- [Bioinformatics at the speed of cloud: revolutionizing genomics with Nextflow and MMCloud](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-bioinformatics-at-the-speed-of-cloud/) - Sateesh Peri\n- [Enabling converged computing with the Nextflow ecosystem](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-paolo-di-tommaso/) - Paolo Di Tommaso\n- [Cluster scalable pangenome graph construction with nf-core/pangenome](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-cluster-scalable-pangenome/) - Simon Heumos\n- [Development of an integrated DNA and RNA variant calling pipeline](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-development-of-an-integrated-dna-and-rna-variant-calling-pipeline/) - Raquel Manzano\n- [Annotation cache: using nf-core/modules and Seqera Platform to build an AWS open data resource](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-annotation-cache/) - Maxime Garcia\n- [Real-time sequencing analysis with Nextflow](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-real-time-sequencing-analysis-with-nextflow/) - Chris Wright\n- [nf-core/sarek: a comprehensive & efficient somatic & germline variant calling workflow](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-nf-core-sarek/) - Friederike Hanssen\n- [nf-test: a simple but powerful testing framework for Nextflow pipelines](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-nf-test-simple-but-powerful/) - Lukas Forer\n- [Empowering distributed precision medicine: scalable genomic analysis in clinical trial recruitment](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-empowering-distributed-precision-medicine/) - Heath Obrien\n- [nf-core pipeline for genomic imputation: from phasing to imputation to validation](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-nf-core-pipeline-for-genomic-imputation/) - Louis Le Nézet\n- [Porting workflow managers to Nextflow at a national diagnostic genomics medical service – strategy and learnings](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-genomics-england/) - Several Speakers\n\n### Day three (Thursday Oct 19):\n\n- [Driving discovery: contributing to the nf-core project through client collaboration](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-zs/) - Felipe Almeida & Juliet Frederiksen\n- [Automated production engine to decode the Tree of Life](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-tree-of-life/) - Guoying Qi\n- [Building a community: experiences from one year as 
a developer advocate](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-community-building/) - Marcel Ribeiro-Dantas\n- [nf-core/raredisease: a workflow to analyse data from patients with rare diseases](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-nf-core-raredisease/) - Ramprasad Neethiraj\n- [Enabling AZ bioinformatics with Nextflow/Nextflow Tower](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-az/) - Manasa Surakala\n- [Bringing MultiQC into a new era](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-multiqc/) - Phil Ewels\n- [nf-test at nf-core: empowering scalable and streamlined testing](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-nf-test-at-nf-core/) - Edmund Miller\n- [Performing large and reproducible GWAS analysis on biobank-scale data](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-gwas/) - Sebastian Schönherr\n- [Highlights from the nf-core hackathon](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-hackathon/) - Chris Hakkaart\n\n_In this event, we're showcasing some of the results of the project 'Optimization of computational resources for HPC workloads in the cloud using ML/AI' by Seqera Labs S.L. This project has been funded by the European Regional Development Fund (ERDF) of the European Union, coordinated and managed by RED.es, with the aim of carrying out the development of technological entrepreneurship and technological demand, within the framework of the Strategic Action on Digital Economy and Society of the State Program for R&D&I oriented towards societal challenges._\n\n![grant logos](/img/blog-2022-11-03--img1.png)", "images": [ "/img/blog-summit-2023-recap--img1b.jpg", "/img/blog-summit-2023-recap--img2b.jpg", @@ -618,7 +618,7 @@ "slug": "2023/nextflow-with-gbatch", "title": "Get started with Nextflow on Google Cloud Batch", "date": "2023-02-01T00:00:00.000Z", - "content": "\n[We have talked about Google Cloud Batch before](https://www.nextflow.io/blog/2022/deploy-nextflow-pipelines-with-google-cloud-batch.html). Not only that, we were proud to announce Nextflow support to Google Cloud Batch right after it was publicly released, back in July 2022. How amazing is that? But we didn't stop there! The [Nextflow official documentation](https://www.nextflow.io/docs/latest/google.html) also provides a lot of useful information on how to use Google Cloud Batch as the compute environment for your Nextflow pipelines. Having said that, feedback from the community is valuable, and we agreed that in addition to the documentation, teaching by example, and in a more informal language, can help many of our users. So, here is a tutorial on how to use the Batch service of the Google Cloud Platform with Nextflow 🥳\n\n### Running an RNAseq pipeline with Google Cloud Batch\n\nWelcome to our RNAseq tutorial using Nextflow and Google Cloud Batch! RNAseq is a powerful technique for studying gene expression and is widely used in a variety of fields, including genomics, transcriptomics, and epigenomics. In this tutorial, we will show you how to use Nextflow, a popular workflow management tool, to run a proof-of-concept RNAseq pipeline to perform the analysis on Google Cloud Batch, a scalable cloud-based computing platform. For a real Nextflow RNAseq pipeline, check [nf-core/rnaseq](https://github.com/nf-core/rnaseq). 
For the proof-of-concept RNAseq pipeline that we will use here, check [nextflow-io/rnaseq-nf](https://github.com/nextflow-io/rnaseq-nf).\n\nNextflow allows you to easily develop, execute, and scale complex pipelines on any infrastructure, including the cloud. Google Cloud Batch enables you to run batch workloads on Google Cloud Platform (GCP), with the ability to scale up or down as needed. Together, Nextflow and Google Cloud Batch provide a powerful and flexible solution for RNAseq analysis.\n\nWe will walk you through the entire process, from setting up your Google Cloud account and installing Nextflow to running an RNAseq pipeline and interpreting the results. By the end of this tutorial, you will have a solid understanding of how to use Nextflow and Google Cloud Batch for RNAseq analysis. So let's get started!\n\n### Setting up Google Cloud CLI (gcloud)\n\nIn this tutorial, you will learn how to use the gcloud command-line interface to interact with the Google Cloud Platform and set up your Google Cloud account for use with Nextflow. If you do not already have gcloud installed, you can follow the instructions [here](https://cloud.google.com/sdk/docs/install) to install it. Once you have gcloud installed, run the command `gcloud init` to initialize the CLI. You will be prompted to choose an existing project to work on or create a new one. For the purpose of this tutorial, we will create a new project. Name your project \"my-rnaseq-pipeline\". There may be a lot of information displayed on the screen after running this command, but you can ignore it for now.\n\n### Setting up Batch and Storage in Google Cloud Platform\n\n#### Enable Google Batch\n\nAccording to the [official Google documentation](https://cloud.google.com/batch/docs/get-started) _Batch is a fully managed service that lets you schedule, queue, and execute [batch processing](https://en.wikipedia.org/wiki/Batch_processing) workloads on Compute Engine virtual machine (VM) instances. Batch provisions resources and manages capacity on your behalf, allowing your batch workloads to run at scale_.\n\nThe first step is to download the `beta` command group. You can do this by executing:\n\n```bash\n$ gcloud components install beta\n```\n\nThen, enable billing for this project. You will first need to get your account id with\n\n```bash\n$ gcloud beta billing accounts list\n```\n\nAfter that, you will see something like the following appear in your window:\n\n```console\nACCOUNT_ID NAME OPEN MASTER_ACCOUNT_ID\nXXXXX-YYYYYY-ZZZZZZ My Billing Account True\n```\n\nIf you get the error “Service Usage API has not been used in project 842841895214 before or it is disabled”, simply run the command again and it should work. Then copy the account id, and the project id and paste them into the command below. This will enable billing for your project id.\n\n```bash\n$ gcloud beta billing projects link PROJECT-ID --billing-account XXXXXX-YYYYYY-ZZZZZZ\n```\n\nNext, you must enable the Batch API, along with the Compute Engine and Cloud Logging APIs. 
You can do so with the following command:\n\n```bash\n$ gcloud services enable batch.googleapis.com compute.googleapis.com logging.googleapis.com\n```\n\nYou should see a message similar to the one below:\n\n```console\nOperation \"operations/acf.p2-AAAA-BBBBB-CCCC--DDDD\" finished successfully.\n```\n\n#### Create a Service Account\n\nIn order to access the APIs we enabled, you need to [create a Service Account](https://cloud.google.com/iam/docs/creating-managing-service-accounts#iam-service-accounts-create-gcloud) and set the necessary IAM roles for the project. You can create the Service Account by executing:\n\n```bash\n$ gcloud iam service-accounts create rnaseq-pipeline-sa\n```\n\nAfter this, set appropriate roles for the project using the commands below:\n\n```bash\n$ gcloud projects add-iam-policy-binding my-rnaseq-pipeline \\\n--member=\"serviceAccount:rnaseq-pipeline-sa@my-rnaseq-pipeline.iam.gserviceaccount.com\" \\\n--role=\"roles/iam.serviceAccountUser\"\n\n$ gcloud projects add-iam-policy-binding my-rnaseq-pipeline \\\n--member=\"serviceAccount:rnaseq-pipeline-sa@my-rnaseq-pipeline.iam.gserviceaccount.com\" \\\n--role=\"roles/batch.jobsEditor\"\n\n$ gcloud projects add-iam-policy-binding my-rnaseq-pipeline \\\n--member=\"serviceAccount:rnaseq-pipeline-sa@my-rnaseq-pipeline.iam.gserviceaccount.com\" \\\n--role=\"roles/logging.viewer\"\n\n$ gcloud projects add-iam-policy-binding my-rnaseq-pipeline \\\n--member=\"serviceAccount:rnaseq-pipeline-sa@my-rnaseq-pipeline.iam.gserviceaccount.com\" \\\n--role=\"roles/storage.admin\"\n```\n\n#### Create your Bucket\n\nNow it's time to create your Storage bucket, where both your input, intermediate and output files will be hosted and accessed by the Google Batch virtual machines. Your bucket name must be globally unique (across regions). For the example below, the bucket is named rnaseq-pipeline-nextflow-bucket. However, as this name has now been used you have to create a bucket with a different name\n\n```bash\n$ gcloud storage buckets create gs://rnaseq-pipeline-bckt\n```\n\nNow it's time for Nextflow to join the party! 🥳\n\n### Setting up Nextflow to make use of Batch and Storage\n\n#### Write the configuration file\n\nHere you will set up a simple RNAseq pipeline with Nextflow to be run entirely on Google Cloud Platform (GCP) directly from your local machine.\n\nStart by creating a folder for your project on your local machine, such as “rnaseq-example”. It's important to mention that you can also go fully cloud and use a Virtual Machine for everything we will do here locally.\n\nInside the folder that you created for the project, create a file named `nextflow.config` with the following content (remember to replace PROJECT-ID with the project id you created above):\n\n```groovy\nworkDir = 'gs://rnaseq-pipeline-bckt/scratch'\n\nprocess {\n executor = 'google-batch'\n container = 'nextflow/rnaseq-nf'\n errorStrategy = { task.exitStatus==14 ? 'retry' : 'terminate' }\n maxRetries = 5\n}\n\ngoogle {\n project = 'PROJECT-ID'\n location = 'us-central1'\n batch.spot = true\n}\n```\n\nThe `workDir` option tells Nextflow to use the bucket you created as the work directory. Nextflow will use this directory to stage our input data and store intermediate and final data. Nextflow does not allow you to use the root directory of a bucket as the work directory -- it must be a subdirectory instead. 
Using a subdirectory is also just a good practice.\n\nThe `process` scope tells Nextflow to run all the processes (steps) of your pipeline on Google Batch and to use the `nextflow/rnaseq-nf` Docker image hosted on DockerHub (default) for all processes. Also, the error strategy will automatically retry any failed tasks with exit code 14, which is the exit code for spot instances that were reclaimed.\n\nThe `google` scope is specific to Google Cloud. You need to provide the project id (don't provide the project name, it won't work!), and a Google Cloud location (leave it as above if you're not sure what to put). In the example above, spot instances are also requested (more info about spot instances [here](https://www.nextflow.io/docs/latest/google.html#spot-instances)), which are cheaper instances that, as a drawback, can be reclaimed at any time if resources are needed by the cloud provider. For the project created earlier in this tutorial, the `nextflow.config` file should therefore contain \"my-rnaseq-pipeline\" as the project id.\n\nUse the command below to authenticate with Google Cloud Platform. Nextflow will use this account by default when you run a pipeline.\n\n```bash\n$ gcloud auth application-default login\n```\n\n#### Launch the pipeline!\n\nWith that done, you’re now ready to run the proof-of-concept RNAseq Nextflow pipeline. Instead of asking you to download it or copy-paste something into a script file, you can simply provide the GitHub URL of the RNAseq pipeline mentioned at the beginning of [this tutorial](https://github.com/nextflow-io/rnaseq-nf), and Nextflow will do all the heavy lifting for you. This pipeline comes with test data bundled with it, and for more information about it and how it was developed, you can check the public training material developed by Seqera Labs.\n\nOne important thing to mention is that in this repository there is already a `nextflow.config` file with a different configuration, but don't worry about that. You can run the pipeline with the configuration file that we wrote above using the `-c` Nextflow parameter. Run the command below:\n\n```bash\n$ nextflow run nextflow-io/rnaseq-nf -c nextflow.config\n```\n\nWhile the pipeline stores everything in the bucket, our example pipeline will also download the final outputs to a local directory called `results`, because of how the `publishDir` directive was specified in the `main.nf` script (example [here](https://github.com/nextflow-io/rnaseq-nf/blob/ed179ef74df8d5c14c188e200a37fff61fd55dfb/modules/multiqc/main.nf#L5)). If you want to avoid the egress cost associated with downloading data from a bucket, you can change the `publishDir` to another bucket directory, e.g. `gs://rnaseq-pipeline-bckt/results`.\n\nIn your terminal, you should see something like this:\n\n![Nextflow ongoing run on Google Cloud Batch](/img/ongoing-nxf-gbatch.png)\n\nYou can check the status of your jobs on Google Batch by opening another terminal and running the following command:\n\n```bash\n$ gcloud batch jobs list\n```\n\nBy the end of it, if everything worked well, you should see something like:\n\n![Nextflow run on Google Cloud Batch finished](/img/nxf-gbatch-finished.png)\n\nAnd that's all, folks! 
😆\n\nYou will find more information about Nextflow on Google Batch in [this blog post](https://www.nextflow.io/blog/2022/deploy-nextflow-pipelines-with-google-cloud-batch.html) and the [official Nextflow documentation](https://www.nextflow.io/docs/latest/google.html).\n\nSpecial thanks to Hatem Nawar, Chris Hakkaart, and Ben Sherman for providing valuable feedback to this document.\n", + "content": "[We have talked about Google Cloud Batch before](https://www.nextflow.io/blog/2022/deploy-nextflow-pipelines-with-google-cloud-batch.html). Not only that, we were proud to announce Nextflow support to Google Cloud Batch right after it was publicly released, back in July 2022. How amazing is that? But we didn't stop there! The [Nextflow official documentation](https://www.nextflow.io/docs/latest/google.html) also provides a lot of useful information on how to use Google Cloud Batch as the compute environment for your Nextflow pipelines. Having said that, feedback from the community is valuable, and we agreed that in addition to the documentation, teaching by example, and in a more informal language, can help many of our users. So, here is a tutorial on how to use the Batch service of the Google Cloud Platform with Nextflow 🥳\n\n### Running an RNAseq pipeline with Google Cloud Batch\n\nWelcome to our RNAseq tutorial using Nextflow and Google Cloud Batch! RNAseq is a powerful technique for studying gene expression and is widely used in a variety of fields, including genomics, transcriptomics, and epigenomics. In this tutorial, we will show you how to use Nextflow, a popular workflow management tool, to run a proof-of-concept RNAseq pipeline to perform the analysis on Google Cloud Batch, a scalable cloud-based computing platform. For a real Nextflow RNAseq pipeline, check [nf-core/rnaseq](https://github.com/nf-core/rnaseq). For the proof-of-concept RNAseq pipeline that we will use here, check [nextflow-io/rnaseq-nf](https://github.com/nextflow-io/rnaseq-nf).\n\nNextflow allows you to easily develop, execute, and scale complex pipelines on any infrastructure, including the cloud. Google Cloud Batch enables you to run batch workloads on Google Cloud Platform (GCP), with the ability to scale up or down as needed. Together, Nextflow and Google Cloud Batch provide a powerful and flexible solution for RNAseq analysis.\n\nWe will walk you through the entire process, from setting up your Google Cloud account and installing Nextflow to running an RNAseq pipeline and interpreting the results. By the end of this tutorial, you will have a solid understanding of how to use Nextflow and Google Cloud Batch for RNAseq analysis. So let's get started!\n\n### Setting up Google Cloud CLI (gcloud)\n\nIn this tutorial, you will learn how to use the gcloud command-line interface to interact with the Google Cloud Platform and set up your Google Cloud account for use with Nextflow. If you do not already have gcloud installed, you can follow the instructions [here](https://cloud.google.com/sdk/docs/install) to install it. Once you have gcloud installed, run the command `gcloud init` to initialize the CLI. You will be prompted to choose an existing project to work on or create a new one. For the purpose of this tutorial, we will create a new project. Name your project \"my-rnaseq-pipeline\". 
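\n\nIf you prefer not to rely on the interactive `gcloud init` prompts, a minimal sketch of creating and selecting the project explicitly is shown below. The project ID `my-rnaseq-pipeline` is the one used throughout this tutorial; project IDs must be globally unique, so you may need to pick a variation and use it consistently in the commands that follow.\n\n```bash\n# Create the project and make it the default for subsequent gcloud commands\n$ gcloud projects create my-rnaseq-pipeline\n$ gcloud config set project my-rnaseq-pipeline\n```\n\n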
There may be a lot of information displayed on the screen after running this command, but you can ignore it for now.\n\n### Setting up Batch and Storage in Google Cloud Platform\n\n#### Enable Google Batch\n\nAccording to the [official Google documentation](https://cloud.google.com/batch/docs/get-started) _Batch is a fully managed service that lets you schedule, queue, and execute [batch processing](https://en.wikipedia.org/wiki/Batch_processing) workloads on Compute Engine virtual machine (VM) instances. Batch provisions resources and manages capacity on your behalf, allowing your batch workloads to run at scale_.\n\nThe first step is to download the `beta` command group. You can do this by executing:\n\n```bash\n$ gcloud components install beta\n```\n\nThen, enable billing for this project. You will first need to get your account id with\n\n```bash\n$ gcloud beta billing accounts list\n```\n\nAfter that, you will see something like the following appear in your window:\n\n```console\nACCOUNT_ID NAME OPEN MASTER_ACCOUNT_ID\nXXXXX-YYYYYY-ZZZZZZ My Billing Account True\n```\n\nIf you get the error “Service Usage API has not been used in project 842841895214 before or it is disabled”, simply run the command again and it should work. Then copy the account id, and the project id and paste them into the command below. This will enable billing for your project id.\n\n```bash\n$ gcloud beta billing projects link PROJECT-ID --billing-account XXXXXX-YYYYYY-ZZZZZZ\n```\n\nNext, you must enable the Batch API, along with the Compute Engine and Cloud Logging APIs. You can do so with the following command:\n\n```bash\n$ gcloud services enable batch.googleapis.com compute.googleapis.com logging.googleapis.com\n```\n\nYou should see a message similar to the one below:\n\n```console\nOperation \"operations/acf.p2-AAAA-BBBBB-CCCC--DDDD\" finished successfully.\n```\n\n#### Create a Service Account\n\nIn order to access the APIs we enabled, you need to [create a Service Account](https://cloud.google.com/iam/docs/creating-managing-service-accounts#iam-service-accounts-create-gcloud) and set the necessary IAM roles for the project. You can create the Service Account by executing:\n\n```bash\n$ gcloud iam service-accounts create rnaseq-pipeline-sa\n```\n\nAfter this, set appropriate roles for the project using the commands below:\n\n```bash\n$ gcloud projects add-iam-policy-binding my-rnaseq-pipeline \\\n--member=\"serviceAccount:rnaseq-pipeline-sa@my-rnaseq-pipeline.iam.gserviceaccount.com\" \\\n--role=\"roles/iam.serviceAccountUser\"\n\n$ gcloud projects add-iam-policy-binding my-rnaseq-pipeline \\\n--member=\"serviceAccount:rnaseq-pipeline-sa@my-rnaseq-pipeline.iam.gserviceaccount.com\" \\\n--role=\"roles/batch.jobsEditor\"\n\n$ gcloud projects add-iam-policy-binding my-rnaseq-pipeline \\\n--member=\"serviceAccount:rnaseq-pipeline-sa@my-rnaseq-pipeline.iam.gserviceaccount.com\" \\\n--role=\"roles/logging.viewer\"\n\n$ gcloud projects add-iam-policy-binding my-rnaseq-pipeline \\\n--member=\"serviceAccount:rnaseq-pipeline-sa@my-rnaseq-pipeline.iam.gserviceaccount.com\" \\\n--role=\"roles/storage.admin\"\n```\n\n#### Create your Bucket\n\nNow it's time to create your Storage bucket, where both your input, intermediate and output files will be hosted and accessed by the Google Batch virtual machines. Your bucket name must be globally unique (across regions). For the example below, the bucket is named rnaseq-pipeline-nextflow-bucket. 
However, as this name has already been taken, you will have to create a bucket with a different name:\n\n```bash\n$ gcloud storage buckets create gs://rnaseq-pipeline-bckt\n```\n\nNow it's time for Nextflow to join the party! 🥳\n\n### Setting up Nextflow to make use of Batch and Storage\n\n#### Write the configuration file\n\nHere you will set up a simple RNAseq pipeline with Nextflow to be run entirely on Google Cloud Platform (GCP) directly from your local machine.\n\nStart by creating a folder for your project on your local machine, such as “rnaseq-example”. It's worth mentioning that you could also go fully cloud-based and use a virtual machine for everything we will do locally here.\n\nInside the folder that you created for the project, create a file named `nextflow.config` with the following content (remember to replace PROJECT-ID with the project id you created above):\n\n```groovy\nworkDir = 'gs://rnaseq-pipeline-bckt/scratch'\n\nprocess {\n    executor = 'google-batch'\n    container = 'nextflow/rnaseq-nf'\n    errorStrategy = { task.exitStatus==14 ? 'retry' : 'terminate' }\n    maxRetries = 5\n}\n\ngoogle {\n    project = 'PROJECT-ID'\n    location = 'us-central1'\n    batch.spot = true\n}\n```\n\nThe `workDir` option tells Nextflow to use the bucket you created as the work directory. Nextflow will use this directory to stage your input data and store intermediate and final data. Nextflow does not allow you to use the root directory of a bucket as the work directory -- it must be a subdirectory instead. Using a subdirectory is also just a good practice.\n\nThe `process` scope tells Nextflow to run all the processes (steps) of your pipeline on Google Batch and to use the `nextflow/rnaseq-nf` Docker image hosted on DockerHub (default) for all processes. Also, the error strategy will automatically retry any failed tasks with exit code 14, which is the exit code for spot instances that were reclaimed.\n\nThe `google` scope is specific to Google Cloud. You need to provide the project id (don't provide the project name, it won't work!), and a Google Cloud location (leave it as above if you're not sure what to put). In the example above, spot instances are also requested (more info about spot instances [here](https://www.nextflow.io/docs/latest/google.html#spot-instances)), which are cheaper instances that, as a drawback, can be reclaimed at any time if resources are needed by the cloud provider. For the project created earlier in this tutorial, the `nextflow.config` file should therefore contain \"my-rnaseq-pipeline\" as the project id.\n\nUse the command below to authenticate with Google Cloud Platform. Nextflow will use this account by default when you run a pipeline.\n\n```bash\n$ gcloud auth application-default login\n```\n\n#### Launch the pipeline!\n\nWith that done, you’re now ready to run the proof-of-concept RNAseq Nextflow pipeline. Instead of asking you to download it or copy-paste something into a script file, you can simply provide the GitHub URL of the RNAseq pipeline mentioned at the beginning of [this tutorial](https://github.com/nextflow-io/rnaseq-nf), and Nextflow will do all the heavy lifting for you. This pipeline comes with test data bundled with it, and for more information about it and how it was developed, you can check the public training material developed by Seqera Labs.\n\nOne important thing to mention is that in this repository there is already a `nextflow.config` file with a different configuration, but don't worry about that. 
You can run the pipeline with the configuration file that we have wrote above using the `-c` Nextflow parameter. Run the command line below:\n\n```bash\n$ nextflow run nextflow-io/rnaseq-nf -c nextflow.config\n```\n\nWhile the pipeline stores everything in the bucket, our example pipeline will also download the final outputs to a local directory called `results`, because of how the `publishDir` directive was specified in the `main.nf` script (example [here](https://github.com/nextflow-io/rnaseq-nf/blob/ed179ef74df8d5c14c188e200a37fff61fd55dfb/modules/multiqc/main.nf#L5)). If you want to avoid the egress cost associated with downloading data from a bucket, you can change the `publishDir` to another bucket directory, e.g. `gs://rnaseq-pipeline-bckt/results`.\n\nIn your terminal, you should see something like this:\n\n![Nextflow ongoing run on Google Cloud Batch](/img/ongoing-nxf-gbatch.png)\n\nYou can check the status of your jobs on Google Batch by opening another terminal and running the following command:\n\n```bash\n$ gcloud batch jobs list\n```\n\nBy the end of it, if everything worked well, you should see something like:\n\n![Nextflow run on Google Cloud Batch finished](/img/nxf-gbatch-finished.png)\n\nAnd that's all, folks! 😆\n\nYou will find more information about Nextflow on Google Batch in [this blog post](https://www.nextflow.io/blog/2022/deploy-nextflow-pipelines-with-google-cloud-batch.html) and the [official Nextflow documentation](https://www.nextflow.io/docs/latest/google.html).\n\nSpecial thanks to Hatem Nawar, Chris Hakkaart, and Ben Sherman for providing valuable feedback to this document.\n", "images": [], "author": "Marcel Ribeiro-Dantas", "tags": "nextflow,google,cloud" @@ -627,7 +627,7 @@ "slug": "2023/reflecting-on-ten-years-of-nextflow-awesomeness", "title": "Reflecting on ten years of Nextflow awesomeness", "date": "2023-06-06T00:00:00.000Z", - "content": "\nThere's been a lot of water under the bridge since the first release of Nextflow in July 2013. From its humble beginnings at the [Centre for Genomic Regulation](https://www.crg.eu/) (CRG) in Barcelona, Nextflow has evolved from an upstart workflow orchestrator to one of the most consequential projects in open science software (OSS). Today, Nextflow is downloaded **120,000+** times monthly, boasts vibrant user and developer communities, and is used by leading pharmaceutical, healthcare, and biotech research firms.\n\nOn the occasion of Nextflow's anniversary, I thought it would be fun to share some perspectives and point out how far we've come as a community. I also wanted to recognize the efforts of Paolo Di Tommaso and the many people who have contributed enormous time and effort to make Nextflow what it is today.\n\n## A decade of innovation\n\nBill Gates is credited with observing that \"people often overestimate what they can do in one year, but underestimate what they can do in ten.\" The lesson, of course, is that real, meaningful change takes time. Progress is measured in a series of steps. Considered in isolation, each new feature added to Nextflow seems small, but they combine to deliver powerful capabilities.\n\nLife sciences has seen a staggering amount of innovation. According to estimates from the National Human Genome Research Institute (NHGRI), the cost of sequencing a human genome in 2013 was roughly USD 10,000. 
Today, sequencing costs are in the range of USD 200—a **50-fold reduction**.1\n\nA fundamental principle of economics is that _\"if you make something cheaper, you get more of it.\"_ One didn't need a crystal ball to see that, driven by plummeting sequencing and computing costs, the need for downstream analysis was poised to explode. With advances in sequencing technology outpacing Moore's Law, It was clear that scaling analysis capacity would be a significant issue.2\n\n## Getting the fundamentals right\n\nWhen Paolo and his colleagues started the Nextflow project, it was clear that emerging technologies such as cloud computing, containers, and collaborative software development would be important. Even so, it is still amazing how rapidly these key technologies have advanced in ten short years.\n\nIn an [article for eLife magazine in 2021](https://elifesciences.org/labs/d193babe/the-story-of-nextflow-building-a-modern-pipeline-orchestrator), Paolo described how Solomon Hyke's talk \"[Why we built Docker](https://www.youtube.com/watch?v=3N3n9FzebAA)\" at DotScale in the summer of 2013 impacted his thinking about the design of Nextflow. It was evident that containers would be a game changer for scientific workflows. Encapsulating application logic in self-contained, portable containers solved a multitude of complexity and dependency management challenges — problems experienced daily at the CRG and by many bioinformaticians to this day. Nextflow was developed concurrent with the container revolution, and Nextflow’s authors had the foresight to make containers first-class citizens.\n\nWith containers, HPC environments have been transformed — from complex environments where application binaries were typically served to compute nodes via NFS to simpler architectures where task-specific containers are pulled from registries on demand. Today, most bioinformatic pipelines use containers. Nextflow supports [multiple container formats](https://www.nextflow.io/docs/latest/container.html?highlight=containers) and runtimes, including [Docker](https://www.docker.com/), [Singularity](https://sylabs.io/), [Podman](https://podman.io/), [Charliecloud](https://hpc.github.io/charliecloud/), [Sarus](https://sarus.readthedocs.io/en/stable/), and [Shifter](https://github.com/NERSC/shifter).\n\n## The shift to the cloud\n\nSome of the earliest efforts around Nextflow centered on building high-quality executors for HPC workload managers. A key idea behind schedulers such as LSF, PBS, Slurm, and Grid Engine was to share a fixed pool of on-premises resources among multiple users, maximizing throughput, efficiency, and resource utilization.\n\nSee the article [Nextflow on BIG IRON: Twelve tips for improving the effectiveness of pipelines on HPC clusters](https://nextflow.io/blog/2023/best-practices-deploying-pipelines-with-hpc-workload-managers.html)\n\nWhile cloud infrastructure was initially \"clunky\" and hard to deploy and use, the idea of instant access and pay-per-use models was too compelling to ignore. In the early days, many organizations attempted to replicate on-premises HPC clusters in the cloud, deploying the same software stacks and management tools used locally to cloud-based VMs.\n\nWith the launch of [AWS Batch](https://aws.amazon.com/batch/) in December 2016, Nextflow’s developers realized there was a better way. In cloud environments, resources are (in theory) infinite and just an API call away. 
The traditional scheduling paradigm of sharing a finite resource pool didn't make sense in the cloud, where users could dynamically provision a private, scalable resource pool for only the duration of their workload. All the complex scheduling and control policies that tended to make HPC workload managers hard to use and manage were no longer required.3\n\nAWS Batch also relied on containerization, so it only made sense that AWS Batch was the first cloud-native integration to the Nextflow platform early in 2017, along with native support for S3 storage buckets. Nextflow has since been enhanced to support other batch services, including [Azure Batch](https://azure.microsoft.com/en-us/products/batch) and [Google Cloud Batch](https://cloud.google.com/batch), along with a rich set of managed cloud storage solutions. Nextflow’s authors have also embraced [Kubernetes](https://kubernetes.io/docs/concepts/overview/), developed by Google, yet another way to marshal and manage containerized application environments across public and private clouds.\n\n## SCMs come of age\n\nA major trend shaping software development has been the use of collaborative source code managers (SCMs) based on Git. When Paolo was thinking about the design of Nextflow, GitHub had already been around for several years, and DevOps techniques were revolutionizing software. These advances turned out to be highly relevant to managing pipelines. Ten years ago, most bioinformaticians stored copies of pipeline scripts locally. Nextflow’s authors recognized what now seems obvious — it would be easier to make Nextflow SCM aware and launch pipelines directly from a code repository. Today, this simple idea has become standard practice. Most users run pipelines directly from GitHub, GitLab, Gitea, or other favorite SCMs.\n\n## Modularization on steroids\n\nA few basic concepts and patterns in computer science appear repeatedly in different contexts. These include iteration, indirection, abstraction, and component reuse/modularization. Enabled by containers, we have seen a significant shift towards modularization in bioinformatics pipelines enabled by catalogs of reusable containers. In addition to general-purpose registries such as [Docker Hub](https://hub.docker.com/) and [Quay.io](https://quay.io/), domain-specific efforts such as [biocontainers](https://biocontainers.pro/) have emerged, aimed at curating purpose-built containers to meet the specialized needs of bioinformaticians.\n\nWe have also seen the emergence of platform and language-independent package managers such as [Conda](https://docs.conda.io/en/latest/). Today, almost **10,000** Conda recipes for various bioinformatics tools are freely available from [Bioconda](https://anaconda.org/bioconda/repo). Gone are the days of manually installing software. In addition to pulling pre-built bioinformatics containers from registries, developers can leverage [packages of bioconda](http://bioconda.github.io/conda-package_index.html) recipes directly from the bioconda channel.\n\nThe Nextflow community has helped lead this trend toward modularization in several areas. For example, in 2022, Seqera Labs introduced [Wave](https://seqera.io/wave/). This new service can dynamically build and serve containers on the fly based on bioconda recipes, enabling the two technologies to work together seamlessly and avoiding building and maintaining containers by hand.\n\nWith [nf-core](https://nf-co.re/), the Nextflow community has extended the concept of modularization and reuse one step further. 
Much as bioconda and containers have made bioinformatics software modular and portable, [nf-core modules](https://nf-co.re/modules) extend these concepts to pipelines. Today, there are **900+** nf-core modules — essentially building blocks with pre-defined inputs and outputs based on Nextflow's elegant dataflow model. Rather than creating pipelines from scratch, developers can now wire together these pre-assembled modules to deliver new functionality rapidly or use any of **80** of the pre-built [nf-core analysis pipelines](https://nf-co.re/pipelines). The result is a dramatic reduction in development and maintenance costs.\n\n## Some key Nextflow milestones\n\nSince the [first Nextflow release](https://github.com/nextflow-io/nextflow/releases/tag/v0.3.0) in July 2013, there have been **237 releases** and **5,800 commits**. Also, the project has been forked over **530** times. There have been too many important enhancements and milestones to capture here. We capture some important developments in the timeline below:\n\n\"Nextflow\n\nAs we look to the future, the pace of innovation continues to increase. It’s been exciting to see Nextflow expand beyond the various _omics_ disciplines to new areas such as medical imaging, data science, and machine learning. We continue to evolve Nextflow, adding new features and capabilities to support these emerging use cases and support new compute and storage environments. I can hardly wait to see what the next ten years will bring.\n\nFor those new to Nextflow and wishing to learn more about the project, we have compiled an excellent collection of resources to help you [Learn Nextflow in 2023](https://nextflow.io/blog/2023/learn-nextflow-in-2023.html).\n\n---\n\n1 [https://www.genome.gov/about-genomics/fact-sheets/Sequencing-Human-Genome-cost](https://www.genome.gov/about-genomics/fact-sheets/Sequencing-Human-Genome-cost)\n2 Coined by Gordon Moore of Intel in 1965, Moore’s Law predicted that transistor density, roughly equating to compute performance, would roughly double every two years. This was later revised in some estimates to 18 months. Over ten years, Moore’s law predicts roughly a 2^5 = 32X increase in performance – less than the ~50-fold decrease in sequencing costs. See [chart here](https://www.genome.gov/sites/default/files/inline-images/2021_Sequencing_cost_per_Human_Genome.jpg).\n3 This included features like separate queues, pre-emption policies, application profiles, and weighted fairshare algorithms.\n", + "content": "There's been a lot of water under the bridge since the first release of Nextflow in July 2013. From its humble beginnings at the [Centre for Genomic Regulation](https://www.crg.eu/) (CRG) in Barcelona, Nextflow has evolved from an upstart workflow orchestrator to one of the most consequential projects in open science software (OSS). Today, Nextflow is downloaded **120,000+** times monthly, boasts vibrant user and developer communities, and is used by leading pharmaceutical, healthcare, and biotech research firms.\n\nOn the occasion of Nextflow's anniversary, I thought it would be fun to share some perspectives and point out how far we've come as a community. 
I also wanted to recognize the efforts of Paolo Di Tommaso and the many people who have contributed enormous time and effort to make Nextflow what it is today.\n\n## A decade of innovation\n\nBill Gates is credited with observing that \"people often overestimate what they can do in one year, but underestimate what they can do in ten.\" The lesson, of course, is that real, meaningful change takes time. Progress is measured in a series of steps. Considered in isolation, each new feature added to Nextflow seems small, but they combine to deliver powerful capabilities.\n\nLife sciences has seen a staggering amount of innovation. According to estimates from the National Human Genome Research Institute (NHGRI), the cost of sequencing a human genome in 2013 was roughly USD 10,000. Today, sequencing costs are in the range of USD 200—a **50-fold reduction**.^1^\n\nA fundamental principle of economics is that _\"if you make something cheaper, you get more of it.\"_ One didn't need a crystal ball to see that, driven by plummeting sequencing and computing costs, the need for downstream analysis was poised to explode. With advances in sequencing technology outpacing Moore's Law, It was clear that scaling analysis capacity would be a significant issue.^2^\n\n## Getting the fundamentals right\n\nWhen Paolo and his colleagues started the Nextflow project, it was clear that emerging technologies such as cloud computing, containers, and collaborative software development would be important. Even so, it is still amazing how rapidly these key technologies have advanced in ten short years.\n\nIn an [article for eLife magazine in 2021](https://elifesciences.org/labs/d193babe/the-story-of-nextflow-building-a-modern-pipeline-orchestrator), Paolo described how Solomon Hyke's talk \"[Why we built Docker](https://www.youtube.com/watch?v=3N3n9FzebAA)\" at DotScale in the summer of 2013 impacted his thinking about the design of Nextflow. It was evident that containers would be a game changer for scientific workflows. Encapsulating application logic in self-contained, portable containers solved a multitude of complexity and dependency management challenges — problems experienced daily at the CRG and by many bioinformaticians to this day. Nextflow was developed concurrent with the container revolution, and Nextflow’s authors had the foresight to make containers first-class citizens.\n\nWith containers, HPC environments have been transformed — from complex environments where application binaries were typically served to compute nodes via NFS to simpler architectures where task-specific containers are pulled from registries on demand. Today, most bioinformatic pipelines use containers. Nextflow supports [multiple container formats](https://www.nextflow.io/docs/latest/container.html?highlight=containers) and runtimes, including [Docker](https://www.docker.com/), [Singularity](https://sylabs.io/), [Podman](https://podman.io/), [Charliecloud](https://hpc.github.io/charliecloud/), [Sarus](https://sarus.readthedocs.io/en/stable/), and [Shifter](https://github.com/NERSC/shifter).\n\n## The shift to the cloud\n\nSome of the earliest efforts around Nextflow centered on building high-quality executors for HPC workload managers. 
A key idea behind schedulers such as LSF, PBS, Slurm, and Grid Engine was to share a fixed pool of on-premises resources among multiple users, maximizing throughput, efficiency, and resource utilization.\n\nSee the article [Nextflow on BIG IRON: Twelve tips for improving the effectiveness of pipelines on HPC clusters](https://nextflow.io/blog/2023/best-practices-deploying-pipelines-with-hpc-workload-managers.html)\n\nWhile cloud infrastructure was initially \"clunky\" and hard to deploy and use, the idea of instant access and pay-per-use models was too compelling to ignore. In the early days, many organizations attempted to replicate on-premises HPC clusters in the cloud, deploying the same software stacks and management tools used locally to cloud-based VMs.\n\nWith the launch of [AWS Batch](https://aws.amazon.com/batch/) in December 2016, Nextflow’s developers realized there was a better way. In cloud environments, resources are (in theory) infinite and just an API call away. The traditional scheduling paradigm of sharing a finite resource pool didn't make sense in the cloud, where users could dynamically provision a private, scalable resource pool for only the duration of their workload. All the complex scheduling and control policies that tended to make HPC workload managers hard to use and manage were no longer required.^3^\n\nAWS Batch also relied on containerization, so it only made sense that AWS Batch was the first cloud-native integration to the Nextflow platform early in 2017, along with native support for S3 storage buckets. Nextflow has since been enhanced to support other batch services, including [Azure Batch](https://azure.microsoft.com/en-us/products/batch) and [Google Cloud Batch](https://cloud.google.com/batch), along with a rich set of managed cloud storage solutions. Nextflow’s authors have also embraced [Kubernetes](https://kubernetes.io/docs/concepts/overview/), developed by Google, yet another way to marshal and manage containerized application environments across public and private clouds.\n\n## SCMs come of age\n\nA major trend shaping software development has been the use of collaborative source code managers (SCMs) based on Git. When Paolo was thinking about the design of Nextflow, GitHub had already been around for several years, and DevOps techniques were revolutionizing software. These advances turned out to be highly relevant to managing pipelines. Ten years ago, most bioinformaticians stored copies of pipeline scripts locally. Nextflow’s authors recognized what now seems obvious — it would be easier to make Nextflow SCM aware and launch pipelines directly from a code repository. Today, this simple idea has become standard practice. Most users run pipelines directly from GitHub, GitLab, Gitea, or other favorite SCMs.\n\n## Modularization on steroids\n\nA few basic concepts and patterns in computer science appear repeatedly in different contexts. These include iteration, indirection, abstraction, and component reuse/modularization. Enabled by containers, we have seen a significant shift towards modularization in bioinformatics pipelines enabled by catalogs of reusable containers. 
In addition to general-purpose registries such as [Docker Hub](https://hub.docker.com/) and [Quay.io](https://quay.io/), domain-specific efforts such as [biocontainers](https://biocontainers.pro/) have emerged, aimed at curating purpose-built containers to meet the specialized needs of bioinformaticians.\n\nWe have also seen the emergence of platform and language-independent package managers such as [Conda](https://docs.conda.io/en/latest/). Today, almost **10,000** Conda recipes for various bioinformatics tools are freely available from [Bioconda](https://anaconda.org/bioconda/repo). Gone are the days of manually installing software. In addition to pulling pre-built bioinformatics containers from registries, developers can leverage [packages of bioconda](http://bioconda.github.io/conda-package_index.html) recipes directly from the bioconda channel.\n\nThe Nextflow community has helped lead this trend toward modularization in several areas. For example, in 2022, Seqera Labs introduced [Wave](https://seqera.io/wave/). This new service can dynamically build and serve containers on the fly based on bioconda recipes, enabling the two technologies to work together seamlessly and avoiding building and maintaining containers by hand.\n\nWith [nf-core](https://nf-co.re/), the Nextflow community has extended the concept of modularization and reuse one step further. Much as bioconda and containers have made bioinformatics software modular and portable, [nf-core modules](https://nf-co.re/modules) extend these concepts to pipelines. Today, there are **900+** nf-core modules — essentially building blocks with pre-defined inputs and outputs based on Nextflow's elegant dataflow model. Rather than creating pipelines from scratch, developers can now wire together these pre-assembled modules to deliver new functionality rapidly or use any of **80** of the pre-built [nf-core analysis pipelines](https://nf-co.re/pipelines). The result is a dramatic reduction in development and maintenance costs.\n\n## Some key Nextflow milestones\n\nSince the [first Nextflow release](https://github.com/nextflow-io/nextflow/releases/tag/v0.3.0) in July 2013, there have been **237 releases** and **5,800 commits**. Also, the project has been forked over **530** times. There have been too many important enhancements and milestones to capture here. We capture some important developments in the timeline below:\n\n\"Nextflow\n\nAs we look to the future, the pace of innovation continues to increase. It’s been exciting to see Nextflow expand beyond the various _omics_ disciplines to new areas such as medical imaging, data science, and machine learning. We continue to evolve Nextflow, adding new features and capabilities to support these emerging use cases and support new compute and storage environments. I can hardly wait to see what the next ten years will bring.\n\nFor those new to Nextflow and wishing to learn more about the project, we have compiled an excellent collection of resources to help you [Learn Nextflow in 2023](https://nextflow.io/blog/2023/learn-nextflow-in-2023.html).\n\n---\n\n^1^ [https://www.genome.gov/about-genomics/fact-sheets/Sequencing-Human-Genome-cost](https://www.genome.gov/about-genomics/fact-sheets/Sequencing-Human-Genome-cost)\n^2^ Coined by Gordon Moore of Intel in 1965, Moore’s Law predicted that transistor density, roughly equating to compute performance, would roughly double every two years. This was later revised in some estimates to 18 months. 
Over ten years, Moore’s law predicts roughly a 2^5 = 32X increase in performance – less than the ~50-fold decrease in sequencing costs. See [chart here](https://www.genome.gov/sites/default/files/inline-images/2021_Sequencing_cost_per_Human_Genome.jpg).\n^3^ This included features like separate queues, pre-emption policies, application profiles, and weighted fairshare algorithms.", "images": [ "/img/nextflow_ten_years_graphic.jpg" ], @@ -638,7 +638,7 @@ "slug": "2023/selecting-the-right-storage-architecture-for-your-nextflow-pipelines", "title": "Selecting the right storage architecture for your Nextflow pipelines", "date": "2023-05-04T00:00:00.000Z", - "content": "\n_In this article we present the various storage solutions supported by Nextflow including on-prem and cloud file systems, parallel file systems, and cloud object stores. We also discuss Fusion file system 2.0, a new high-performance file system that can help simplify configuration, improve throughput, and reduce costs in the cloud._\n\nAt one time, selecting a file system for distributed workloads was straightforward. Through the 1990s, the Network File System (NFS), developed by Sun Microsystems in 1984, was pretty much the only game in town. It was part of every UNIX distribution, and it presented a standard [POSIX interface](https://pubs.opengroup.org/onlinepubs/9699919799/nframe.html), meaning that applications could read and write data without modification. Dedicated NFS servers and NAS filers became the norm in most clustered computing environments.\n\nFor organizations that outgrew the capabilities of NFS, other POSIX file systems emerged. These included parallel file systems such as [Lustre](https://www.lustre.org/), [PVFS](https://www.anl.gov/mcs/pvfs-parallel-virtual-file-system), [OpenZFS](https://openzfs.org/wiki/Main_Page), [BeeGFS](https://www.beegfs.io/c/), and [IBM Spectrum Scale](https://www.ibm.com/products/storage-scale-system) (formerly GPFS). Parallel file systems can support thousands of compute clients and deliver more than a TB/sec combined throughput, however, they are expensive, and can be complex to deploy and manage. While some parallel file systems work with standard Ethernet, most rely on specialized low-latency fabrics such as Intel® Omni-Path Architecture (OPA) or InfiniBand. Because of this, these file systems are typically found in only the largest HPC data centers.\n\n## Cloud changes everything\n\nWith the launch of [Amazon S3](https://aws.amazon.com/s3/) in 2006, new choices began to emerge. Rather than being a traditional file system, S3 is an object store accessible through a web API. S3 abandoned traditional ideas around hierarchical file systems. Instead, it presented a simple programmatic interface and CLI for storing and retrieving binary objects.\n\nObject stores are a good fit for cloud services because they are simple and scalable to multiple petabytes of storage. Rather than relying on central metadata that presents a bottleneck, metadata is stored with each object. All operations are atomic, so there is no need for complex POSIX-style file-locking mechanisms that add complexity to the design. Developers interact with object stores using simple calls like [PutObject](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObject.html) (store an object in a bucket in return for a key) and [GetObject](https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObject.html) (retrieve a binary object, given a key).\n\nThis simple approach was ideal for internet-scale applications. 
It was also much less expensive than traditional file systems. As a result, S3 usage grew rapidly. Similar object stores quickly emerged, including Microsoft [Azure Blob Storage](https://azure.microsoft.com/en-ca/products/storage/blobs/), [Open Stack Swift](https://wiki.openstack.org/wiki/Swift), and [Google Cloud Storage](https://cloud.google.com/storage/), released in 2010.\n\n## Cloud object stores vs. shared file systems\n\nObject stores are attractive because they are reliable, scalable, and cost-effective. They are frequently used to store large amounts of data that are accessed infrequently. Examples include archives, images, raw video footage, or in the case of bioinformatics applications, libraries of biological samples or reference genomes. Object stores provide near-continuous availability by spreading data replicas across cloud availability zones (AZs). AWS claims theoretical data availability of up to 99.999999999% (11 9's) – a level of availability so high that it does not even register on most [downtime calculators](https://availability.sre.xyz/)!\n\nBecause they support both near-line and cold storage, object stores are sometimes referred to as \"cheap and deep.\" Based on current [S3 pricing](https://aws.amazon.com/s3/pricing), the going rate for data storage is USD 0.023 per GB for the first 50 TB of data. Users can \"pay as they go\" — spinning up S3 storage buckets and storing arbitrary amounts of data for as long as they choose. Some high-level differences between object stores and traditional file systems are summarized below.\n\n
| | Cloud object stores | Traditional file systems |\n| --- | --- | --- |\n| Interface / access protocol | HTTP-based API | POSIX interface |\n| Cost | $ | $$$ |\n| Scalability / capacity | Practically unlimited | Limited |\n| Reliability / availability | Extremely high | Varies |\n| Performance | Typically lower | Varies |\n| Support for existing applications | NO | YES |
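\n\nTo make the access-protocol row concrete, the sketch below contrasts the two models; the bucket name, object key, and mount point are purely illustrative.\n\n```bash\n# Object store: data is retrieved through an API call (a GetObject request under the hood)\n$ aws s3 cp s3://my-bucket/data/sample_1.fastq.gz .\n\n# Shared POSIX file system: the same data is read in place through the kernel file layer\n$ zcat /shared/data/sample_1.fastq.gz | head -n 4\n```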
\n\nThe downside of object storage is that the vast majority of applications are written to work with POSIX file systems. As a result, applications seldom interact directly with object stores. A common practice is to copy data from an object store, perform calculations locally on a cluster node, and write results back to the object store for long-term storage.\n\n## Data handling in Nextflow\n\nUnlike older pipeline orchestrators, Nextflow was built with cloud object stores in mind. Depending on the cloud where pipelines run, Nextflow manages cloud credentials and allows users to provide a path to shared data. This can be a shared file system such as `/my-shared-filesystem/data` or a cloud object store e.g. `s3://my-bucket/data/`.\n\n**Nextflow is exceptionally versatile when it comes to data handling, and can support almost any file system or object store.** Internally, Nextflow uses [executors](https://nextflow.io/docs/latest/executor.html) implemented as plug-ins to insulate pipeline code from underlying compute and storage environments. This enables pipelines to run without modification across multiple clouds regardless of the underlying storage technology.\n\nSuppose an S3 bucket is specified as a location for shared data during pipeline execution. In that case, aided by the [nf-amazon](https://github.com/nextflow-io/nextflow/tree/master/plugins/nf-amazon) plug-in, Nextflow transparently copies data from the S3 bucket to a file system on a cloud instance. Containerized applications mount the local file system and read and write data directly. Once processing is complete, Nextflow copies data to the shared bucket to be available for the next task. All of this is completely transparent to the pipeline and applications. The same plug-in-based approach is used for other cloud object stores such as Azure BLOBs and Google Cloud Storage.\n\n## The Nextflow scratch directive\n\nThe idea of staging data from shared repositories to a local disk, as described above, is not new. A common practice with HPC clusters when using NFS file systems is to use local \"scratch\" storage.\n\nA common problem with shared NFS file systems is that they can be relatively slow — especially when there are multiple clients. File systems introduce latency, have limited IO capacity, and are prone to problems such as “hot spots” and bandwidth limitations when multiple clients read and write files in the same directory.\n\nTo avoid bottlenecks, data is often copied from an NFS filer to local scratch storage for processing. Depending on data volumes, users often use fast solid-state drives or [RAM disks](https://www.mvps.net/docs/how-to-mount-the-physical-memory-from-a-linux-system-as-a-partition/) for scratch storage to accelerate processing.\n\nNextflow automates this data handling pattern with built-in support for a [scratch](https://nextflow.io/docs/latest/process.html?highlight=scratch#scratch) directive that can be enabled or disabled per process. If scratch is enabled, data is automatically copied to a designated local scratch device prior to processing.\n\nWhen high-performance file systems such as Lustre or Spectrum Scale are available, the question of whether to use scratch storage becomes more complicated. Depending on the file system and interconnect, parallel file systems performance can sometimes exceed that of local disk. 
In these cases, customers may set scratch to false and perform I/O directly on the parallel file system.\n\nResults will vary depending on the performance of the shared file system, the speed of local scratch storage, and the amount of shared data to be shuttled back and forth. Users will want to experiment to determine whether enabling scratch benefits pipelines performance.\n\n## Multiple storage options for Nextflow users\n\nStorage solutions used with Nextflow can be grouped into five categories as described below:\n\n- Traditional file systems\n- Cloud object stores\n- Cloud file systems\n- High-performance cloud file systems\n- Fusion file system v2.0\n\nThe optimal choice will depend on your environment and the nature of your applications and compute environments.\n\n**Traditional file systems** — These are file systems typically deployed on-premises that present a POSIX interface. NFS is the most popular choice, but some users may use high-performance parallel file systems. Storage vendors often package their offerings as appliances, making them easier to deploy and manage. Solutions common in on-prem HPC environments include [Network Appliance](https://www.netapp.com/), [Data Direct Networks](https://www.ddn.com/) (DDN), [HPE Cray ClusterStor](https://www.hpe.com/psnow/doc/a00062172enw), and [IBM Storage Scale](https://www.ibm.com/products/storage-scale-system). While customers can deploy self-managed NFS or parallel file systems in the cloud, most don’t bother with this in practice. There are generally better solutions available in the cloud.\n\n**Cloud object stores** — In the cloud, object stores tend to be the most popular solution among Nextflow users. Although object stores don’t present a POSIX interface, they are inexpensive, easy to configure, and scale practically without limit. Depending on performance, access, and retention requirements, customers can purchase different object storage tiers at different price points. Popular cloud object stores include Amazon S3, Azure BLOBs, and Google Cloud Storage. As pipelines execute, the Nextflow executors described above manage data transfers to and from cloud object storage automatically. One drawback is that because of the need to copy data to and from the object store for every process, performance may be lower than a fast shared file system.\n\n**Cloud file systems** — Often, it is desirable to have a shared file NFS system. However, these environments can be tedious to deploy and manage in the cloud. Recognizing this, most cloud providers offer cloud file systems that combine some of the best properties of traditional file systems and object stores. These file systems present a POSIX interface and are accessible via SMB and NFS file-sharing protocols. Like object stores, they are easy to deploy and scalable on demand. Examples include [Amazon EFS](https://aws.amazon.com/efs/), [Azure Files](https://azure.microsoft.com/en-us/products/storage/files/), and [Google Cloud Filestore](https://cloud.google.com/filestore). These file systems are described as \"serverless\" and \"elastic\" because there are no servers to manage, and capacity scales automatically.\n\nComparing price and performance can be tricky because cloud file systems are highly configurable. 
For example, [Amazon EFS](https://aws.amazon.com/efs/pricing/) is available in [four storage classes](https://docs.aws.amazon.com/efs/latest/ug/storage-classes.html) – Amazon EFS Standard, Amazon EFS Standard-IA, and two One Zone storage classes – Amazon EFS One Zone and Amazon EFS One Zone-IA. Similarly, Azure Files is configurable with [four different redundancy options](https://azure.microsoft.com/en-us/pricing/details/storage/files/), and different billing models apply depending on the offer selected. To provide a comparison, Amazon EFS Standard costs $0.08 /GB-Mo in the US East region, which is ~4x more expensive than Amazon S3.\n\nFrom the perspective of Nextflow users, using Amazon EFS and similar cloud file systems is the same as using a local NFS system. Nextflow users must ensure that their cloud instances mount the NFS share, so there is slightly more management overhead than using an S3 bucket. Nextflow users and administrators can experiment with the scratch directive governing whether Nextflow stages data in a local scratch area or reads and writes directly to the shared file system.\n\nCloud file systems suffer from some of the same limitations as on-prem NFS file systems. They often don’t scale efficiently, and performance is limited by network bandwidth. Also, depending on the pipeline, users may need to stage data to the shared file system in advance, often by copying data from an object store used for long term storage.\n\nFor [Nextflow Tower](https://cloud.tower.nf/) users, there is a convenient integration with Amazon EFS. Tower Cloud users can have an Amazon EFS instance created for them automatically via Tower Forge, or they can leverage an existing EFS instance in their compute environment. In either case, Tower ensures that the EFS share is available to compute hosts in the AWS Batch environment, reducing configuration requirements.\n\n**Cloud high-performance file systems** — For customers that need high levels of performance in the cloud, Amazon offers Amazon FSx. Amazon FSx comes in different flavors, including NetApp ONTAP, OpenZFS, Windows File Server, and Lustre. In HPC circles, [FSx for Lustre](https://aws.amazon.com/fsx/lustre/) is most popular delivering sub-millisecond latency, up to 1 TB/sec maximum throughput per file system, and millions of IOPs. Some Nextflow users with data bottlenecks use FSx for Lustre, but it is more difficult to configure and manage than Amazon S3.\n\nLike Amazon EFS, FSx for Lustre is a fully-managed, serverless, elastic file system. Amazon FSx for Lustre is configurable, depending on customer requirements. For example, customers with latency-sensitive applications can deploy FSx cluster nodes with SSD drives. Customers concerned with cost and throughput can select standard hard drives (HDD). HDD-based FSx for Lustre clusters can be optionally configured with an SSD-based cache to accelerate performance. Customers also choose between different persistent file system options and a scratch file system option. Another factor to remember is that with parallel file systems, bandwidth scales with capacity. If you deploy a Lustre file system that is too small, you may be disappointed in the performance.\n\nFSx for Lustre persistent file systems ranges from 125 to 1,000 MB/s/TiB at [prices](https://aws.amazon.com/fsx/lustre/pricing/) ranging from **$0.145** to **$0.600** per GB month. Amazon also offers a lower-cost scratch FSx for Lustre file systems (not to be confused with the scratch directive in Nextflow). 
At this tier, FSx for Lustre does not replicate data across availability zones, so it is suited to short-term data storage. Scratch FSx for Lustre storage delivers **200 MB/s/TiB**, costing **$0.140** per GB month. This is **~75%** more expensive than Amazon EFS (Standard) and **~6x** the cost of standard S3 storage. Persistent FSx for Lustre file systems configured to deliver **1,000 MB/s/TiB** can be up to **~26x** the price of standard S3 object storage!\n\n**Hybrid Cloud file systems** — In addition to the solutions described above, there are other solutions that combine the best of object stores and high-performance parallel file systems. An example is [WekaFS™](https://www.weka.io/) from WEKA. WekaFS is used by several Nextflow users and is deployable on-premises or across your choice cloud platforms. WekaFS is attractive because it provides multi-protocol access to the same data (POSIX, S3, NFS, SMB) while presenting a common namespace between on-prem and cloud resident compute environments. Weka delivers the performance benefits of a high-performance parallel file system and optionally uses cloud object storage as a backing store for file system data to help reduce costs.\n\nFrom a Nextflow perspective, WekaFS behaves like any other shared file system. As such, Nextflow and Tower have no specific integration with WEKA. Nextflow users will need to deploy and manage WekaFS themselves making the environment more complex to setup and manage. However, the flexibility and performance provided by a hybrid cloud file system makes this worthwhile for many organizations.\n\n**Fusion file system 2.0** — Fusion file system is a solution developed by [Seqera Labs](https://seqera.io/fusion) that aims to bridge the gap between cloud-native storage and data analysis workflows. The solution implements a thin client that allows pipeline jobs to access object storage using a standard POSIX interface, thus simplifying and speeding up most operations.\n\nThe advantage of the Fusion file system is that there is no need to copy data between S3 and local storage. The Fusion file system driver accesses and manipulates files in Amazon S3 directly. You can learn more about the Fusion file system and how it works in the whitepaper [Breakthrough performance and cost-efficiency with the new Fusion file system](https://seqera.io/whitepapers/breakthrough-performance-and-cost-efficiency-with-the-new-fusion-file-system/).\n\nFor sites struggling with performance and scalability issues on shared file systems or object storage, the Fusion file system offers several advantages. [Benchmarks conducted](https://seqera.io/whitepapers/breakthrough-performance-and-cost-efficiency-with-the-new-fusion-file-system/) by Seqera Labs have shown that, in some cases, **Fusion can deliver performance on par with Lustre but at a much lower cost.** Fusion is also significantly easier to configure and manage and can result in lower costs for both compute and storage resources.\n\n## Comparing the alternatives\n\nA summary of storage options is presented in the table below:\n\n
| | NFS, Lustre, Spectrum Scale (traditional file systems) | Amazon S3 (cloud object storage) | Azure BLOB storage (cloud object storage) | Google Cloud Storage (cloud object storage) | Amazon EFS (cloud file system) | Amazon FSx for Lustre (cloud file system) | Azure Files (cloud file system) | Fusion file system 2.0 |\n| --- | --- | --- | --- | --- | --- | --- | --- | --- |\n| Deployment model | Manual | Serverless | Serverless | Serverless | Serverless | Serverless | Serverless | Serverless |\n| Access model | POSIX | Object | Object | Object | POSIX | POSIX | POSIX | POSIX |\n| Clouds supported | On-prem, any cloud | AWS only | Azure only | GCP only | AWS only | AWS only | Azure only | AWS, GCP and Azure 1 |\n| Requires block storage | Yes | Optional | Optional | Optional | Optional | No | Optional | No |\n| Relative cost | $$ | $ | $ | $ | $$ | $$$ | $$ | $ |\n| Nextflow plugins | - | nf-amazon | nf-azure | nf-google | - | - | - | nf-amazon |\n| Tower support | Yes | Yes, existing buckets | Yes, existing BLOB container | Yes, existing cloud storage bucket | Yes, creates EFS instances | Yes, creates FSx for Lustre instances | File system created manually | Yes, fully automated |\n| Dependencies | Externally configured | | | | | | | Wave, Amazon S3 |\n| Cost model | Fixed price on-prem, instance+block storage costs | GB per month | GB per month | GB per month | Multiple factors | Multiple factors | Multiple factors | GB per month (uses S3) |\n| Level of configuration effort (when used with Tower) | High | Low | Low | Low | Medium (low with Tower) | High (easier with Tower) | Medium | Low |\n| Works best with | Any on-prem cluster manager (LSF, Slurm, etc.) | AWS Batch | Azure Batch | Google Cloud Batch | AWS Batch | AWS Batch | Azure Batch | AWS Batch, Amazon EKS, Azure Batch, Google Cloud Batch 1 |
\n\n## So what’s the bottom line?\n\nThe choice or storage solution depends on several factors. Object stores like Amazon S3 are popular because they are convenient and inexpensive. However, depending on data access patterns, and the amount of data to be staged in advance, file systems such as EFS, Azure Files or FSx for Lustre can also be a good alternative.\n\nFor many Nextflow users, Fusion file system will be a better option since it offers performance comparable to a high-performance file system at the cost of cloud object storage. Fusion is also dramatically easier to deploy and manage. [Adding Fusion support](https://nextflow.io/docs/latest/fusion.html) is just a matter of adding a few lines to the `nextflow.config` file.\n\nWhere workloads run is also an important consideration. For example, on-premises clusters will typically use whatever shared file system is available locally. When operating in the cloud, you can choose whether to use cloud file systems, object stores, high-performance file systems, Fusion FS, or hybrid cloud solutions such as Weka.\n\nStill unsure what storage solution will best meet your needs? Consider joining our community at [nextflow.slack.com](https://nextflow.slack.com/). You can engage with others, post technical questions, and learn more about the pros and cons of the storage solutions described above.\n", + "content": "_In this article we present the various storage solutions supported by Nextflow including on-prem and cloud file systems, parallel file systems, and cloud object stores. We also discuss Fusion file system 2.0, a new high-performance file system that can help simplify configuration, improve throughput, and reduce costs in the cloud._\n\nAt one time, selecting a file system for distributed workloads was straightforward. Through the 1990s, the Network File System (NFS), developed by Sun Microsystems in 1984, was pretty much the only game in town. It was part of every UNIX distribution, and it presented a standard [POSIX interface](https://pubs.opengroup.org/onlinepubs/9699919799/nframe.html), meaning that applications could read and write data without modification. Dedicated NFS servers and NAS filers became the norm in most clustered computing environments.\n\nFor organizations that outgrew the capabilities of NFS, other POSIX file systems emerged. These included parallel file systems such as [Lustre](https://www.lustre.org/), [PVFS](https://www.anl.gov/mcs/pvfs-parallel-virtual-file-system), [OpenZFS](https://openzfs.org/wiki/Main_Page), [BeeGFS](https://www.beegfs.io/c/), and [IBM Spectrum Scale](https://www.ibm.com/products/storage-scale-system) (formerly GPFS). Parallel file systems can support thousands of compute clients and deliver more than a TB/sec combined throughput, however, they are expensive, and can be complex to deploy and manage. While some parallel file systems work with standard Ethernet, most rely on specialized low-latency fabrics such as Intel® Omni-Path Architecture (OPA) or InfiniBand. Because of this, these file systems are typically found in only the largest HPC data centers.\n\n## Cloud changes everything\n\nWith the launch of [Amazon S3](https://aws.amazon.com/s3/) in 2006, new choices began to emerge. Rather than being a traditional file system, S3 is an object store accessible through a web API. S3 abandoned traditional ideas around hierarchical file systems. 
Instead, it presented a simple programmatic interface and CLI for storing and retrieving binary objects.\n\nObject stores are a good fit for cloud services because they are simple and scalable to multiple petabytes of storage. Rather than relying on central metadata that presents a bottleneck, metadata is stored with each object. All operations are atomic, so there is no need for complex POSIX-style file-locking mechanisms that add complexity to the design. Developers interact with object stores using simple calls like [PutObject](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObject.html) (store an object in a bucket in return for a key) and [GetObject](https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObject.html) (retrieve a binary object, given a key).\n\nThis simple approach was ideal for internet-scale applications. It was also much less expensive than traditional file systems. As a result, S3 usage grew rapidly. Similar object stores quickly emerged, including Microsoft [Azure Blob Storage](https://azure.microsoft.com/en-ca/products/storage/blobs/), [Open Stack Swift](https://wiki.openstack.org/wiki/Swift), and [Google Cloud Storage](https://cloud.google.com/storage/), released in 2010.\n\n## Cloud object stores vs. shared file systems\n\nObject stores are attractive because they are reliable, scalable, and cost-effective. They are frequently used to store large amounts of data that are accessed infrequently. Examples include archives, images, raw video footage, or in the case of bioinformatics applications, libraries of biological samples or reference genomes. Object stores provide near-continuous availability by spreading data replicas across cloud availability zones (AZs). AWS claims theoretical data availability of up to 99.999999999% (11 9's) – a level of availability so high that it does not even register on most [downtime calculators](https://availability.sre.xyz/)!\n\nBecause they support both near-line and cold storage, object stores are sometimes referred to as \"cheap and deep.\" Based on current [S3 pricing](https://aws.amazon.com/s3/pricing), the going rate for data storage is USD 0.023 per GB for the first 50 TB of data. Users can \"pay as they go\" — spinning up S3 storage buckets and storing arbitrary amounts of data for as long as they choose. Some high-level differences between object stores and traditional file systems are summarized below.\n\n
\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
\n **Cloud object stores**\n \n **Traditional file systems**\n
\n Interface / access protocol\n \n HTTP-based API\n \n POSIX interface\n
\n Cost\n \n $\n \n $$$\n
\n Scalability / capacity\n \n Practically unlimited\n \n Limited\n
\n Reliability / availability\n \n Extremely high\n \n Varies\n
\n Performance\n \n Typically lower\n \n Varies\n
\n Support for existing application\n \n NO\n \n YES\n
\n
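\n\nTo make the difference in access models concrete, the sketch below shows the `PutObject` and `GetObject` calls mentioned earlier, using the AWS SDK for Java v2 from Groovy; the bucket and key names are illustrative:\n\n```groovy\nimport java.nio.file.Paths\nimport software.amazon.awssdk.core.sync.RequestBody\nimport software.amazon.awssdk.services.s3.S3Client\nimport software.amazon.awssdk.services.s3.model.GetObjectRequest\nimport software.amazon.awssdk.services.s3.model.PutObjectRequest\n\ndef s3 = S3Client.create()\n\n// PutObject: store a local file under a key in a bucket\ns3.putObject(\n    PutObjectRequest.builder().bucket('my-bucket').key('data/sample.fastq.gz').build(),\n    RequestBody.fromFile(Paths.get('sample.fastq.gz')))\n\n// GetObject: retrieve the same object back, given its key\ns3.getObject(\n    GetObjectRequest.builder().bucket('my-bucket').key('data/sample.fastq.gz').build(),\n    Paths.get('downloaded.fastq.gz'))\n```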
\n\nThe downside of object storage is that the vast majority of applications are written to work with POSIX file systems. As a result, applications seldom interact directly with object stores. A common practice is to copy data from an object store, perform calculations locally on a cluster node, and write results back to the object store for long-term storage.\n\n## Data handling in Nextflow\n\nUnlike older pipeline orchestrators, Nextflow was built with cloud object stores in mind. Depending on the cloud where pipelines run, Nextflow manages cloud credentials and allows users to provide a path to shared data. This can be a shared file system such as `/my-shared-filesystem/data` or a cloud object store e.g. `s3://my-bucket/data/`.\n\n**Nextflow is exceptionally versatile when it comes to data handling, and can support almost any file system or object store.** Internally, Nextflow uses [executors](https://nextflow.io/docs/latest/executor.html) implemented as plug-ins to insulate pipeline code from underlying compute and storage environments. This enables pipelines to run without modification across multiple clouds regardless of the underlying storage technology.\n\nSuppose an S3 bucket is specified as a location for shared data during pipeline execution. In that case, aided by the [nf-amazon](https://github.com/nextflow-io/nextflow/tree/master/plugins/nf-amazon) plug-in, Nextflow transparently copies data from the S3 bucket to a file system on a cloud instance. Containerized applications mount the local file system and read and write data directly. Once processing is complete, Nextflow copies data to the shared bucket to be available for the next task. All of this is completely transparent to the pipeline and applications. The same plug-in-based approach is used for other cloud object stores such as Azure BLOBs and Google Cloud Storage.\n\n## The Nextflow scratch directive\n\nThe idea of staging data from shared repositories to a local disk, as described above, is not new. A common practice with HPC clusters when using NFS file systems is to use local \"scratch\" storage.\n\nA common problem with shared NFS file systems is that they can be relatively slow — especially when there are multiple clients. File systems introduce latency, have limited IO capacity, and are prone to problems such as “hot spots” and bandwidth limitations when multiple clients read and write files in the same directory.\n\nTo avoid bottlenecks, data is often copied from an NFS filer to local scratch storage for processing. Depending on data volumes, users often use fast solid-state drives or [RAM disks](https://www.mvps.net/docs/how-to-mount-the-physical-memory-from-a-linux-system-as-a-partition/) for scratch storage to accelerate processing.\n\nNextflow automates this data handling pattern with built-in support for a [scratch](https://nextflow.io/docs/latest/process.html?highlight=scratch#scratch) directive that can be enabled or disabled per process. If scratch is enabled, data is automatically copied to a designated local scratch device prior to processing.\n\nWhen high-performance file systems such as Lustre or Spectrum Scale are available, the question of whether to use scratch storage becomes more complicated. Depending on the file system and interconnect, parallel file systems performance can sometimes exceed that of local disk. 
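\n\nAs a rough sketch of how this choice is typically expressed, the `scratch` directive can be set globally and overridden per process in `nextflow.config`; the process name used in the selector below is hypothetical:\n\n```groovy\nprocess {\n    scratch = true                 // stage task work in node-local storage by default\n\n    withName: 'DEDUP_BAM' {\n        scratch = false            // this process reads and writes directly on the shared file system\n    }\n}\n```\n\n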
In these cases, customers may set scratch to false and perform I/O directly on the parallel file system.\n\nResults will vary depending on the performance of the shared file system, the speed of local scratch storage, and the amount of shared data to be shuttled back and forth. Users will want to experiment to determine whether enabling scratch benefits pipelines performance.\n\n## Multiple storage options for Nextflow users\n\nStorage solutions used with Nextflow can be grouped into five categories as described below:\n\n- Traditional file systems\n- Cloud object stores\n- Cloud file systems\n- High-performance cloud file systems\n- Fusion file system v2.0\n\nThe optimal choice will depend on your environment and the nature of your applications and compute environments.\n\n**Traditional file systems** — These are file systems typically deployed on-premises that present a POSIX interface. NFS is the most popular choice, but some users may use high-performance parallel file systems. Storage vendors often package their offerings as appliances, making them easier to deploy and manage. Solutions common in on-prem HPC environments include [Network Appliance](https://www.netapp.com/), [Data Direct Networks](https://www.ddn.com/) (DDN), [HPE Cray ClusterStor](https://www.hpe.com/psnow/doc/a00062172enw), and [IBM Storage Scale](https://www.ibm.com/products/storage-scale-system). While customers can deploy self-managed NFS or parallel file systems in the cloud, most don’t bother with this in practice. There are generally better solutions available in the cloud.\n\n**Cloud object stores** — In the cloud, object stores tend to be the most popular solution among Nextflow users. Although object stores don’t present a POSIX interface, they are inexpensive, easy to configure, and scale practically without limit. Depending on performance, access, and retention requirements, customers can purchase different object storage tiers at different price points. Popular cloud object stores include Amazon S3, Azure BLOBs, and Google Cloud Storage. As pipelines execute, the Nextflow executors described above manage data transfers to and from cloud object storage automatically. One drawback is that because of the need to copy data to and from the object store for every process, performance may be lower than a fast shared file system.\n\n**Cloud file systems** — Often, it is desirable to have a shared file NFS system. However, these environments can be tedious to deploy and manage in the cloud. Recognizing this, most cloud providers offer cloud file systems that combine some of the best properties of traditional file systems and object stores. These file systems present a POSIX interface and are accessible via SMB and NFS file-sharing protocols. Like object stores, they are easy to deploy and scalable on demand. Examples include [Amazon EFS](https://aws.amazon.com/efs/), [Azure Files](https://azure.microsoft.com/en-us/products/storage/files/), and [Google Cloud Filestore](https://cloud.google.com/filestore). These file systems are described as \"serverless\" and \"elastic\" because there are no servers to manage, and capacity scales automatically.\n\nComparing price and performance can be tricky because cloud file systems are highly configurable. 
For example, [Amazon EFS](https://aws.amazon.com/efs/pricing/) is available in [four storage classes](https://docs.aws.amazon.com/efs/latest/ug/storage-classes.html) – Amazon EFS Standard, Amazon EFS Standard-IA, and two One Zone storage classes – Amazon EFS One Zone and Amazon EFS One Zone-IA. Similarly, Azure Files is configurable with [four different redundancy options](https://azure.microsoft.com/en-us/pricing/details/storage/files/), and different billing models apply depending on the offer selected. To provide a comparison, Amazon EFS Standard costs $0.08 /GB-Mo in the US East region, which is ~4x more expensive than Amazon S3.\n\nFrom the perspective of Nextflow users, using Amazon EFS and similar cloud file systems is the same as using a local NFS system. Nextflow users must ensure that their cloud instances mount the NFS share, so there is slightly more management overhead than using an S3 bucket. Nextflow users and administrators can experiment with the scratch directive governing whether Nextflow stages data in a local scratch area or reads and writes directly to the shared file system.\n\nCloud file systems suffer from some of the same limitations as on-prem NFS file systems. They often don’t scale efficiently, and performance is limited by network bandwidth. Also, depending on the pipeline, users may need to stage data to the shared file system in advance, often by copying data from an object store used for long term storage.\n\nFor [Nextflow Tower](https://cloud.tower.nf/) users, there is a convenient integration with Amazon EFS. Tower Cloud users can have an Amazon EFS instance created for them automatically via Tower Forge, or they can leverage an existing EFS instance in their compute environment. In either case, Tower ensures that the EFS share is available to compute hosts in the AWS Batch environment, reducing configuration requirements.\n\n**Cloud high-performance file systems** — For customers that need high levels of performance in the cloud, Amazon offers Amazon FSx. Amazon FSx comes in different flavors, including NetApp ONTAP, OpenZFS, Windows File Server, and Lustre. In HPC circles, [FSx for Lustre](https://aws.amazon.com/fsx/lustre/) is most popular delivering sub-millisecond latency, up to 1 TB/sec maximum throughput per file system, and millions of IOPs. Some Nextflow users with data bottlenecks use FSx for Lustre, but it is more difficult to configure and manage than Amazon S3.\n\nLike Amazon EFS, FSx for Lustre is a fully-managed, serverless, elastic file system. Amazon FSx for Lustre is configurable, depending on customer requirements. For example, customers with latency-sensitive applications can deploy FSx cluster nodes with SSD drives. Customers concerned with cost and throughput can select standard hard drives (HDD). HDD-based FSx for Lustre clusters can be optionally configured with an SSD-based cache to accelerate performance. Customers also choose between different persistent file system options and a scratch file system option. Another factor to remember is that with parallel file systems, bandwidth scales with capacity. If you deploy a Lustre file system that is too small, you may be disappointed in the performance.\n\nFSx for Lustre persistent file systems ranges from 125 to 1,000 MB/s/TiB at [prices](https://aws.amazon.com/fsx/lustre/pricing/) ranging from **$0.145** to **$0.600** per GB month. Amazon also offers a lower-cost scratch FSx for Lustre file systems (not to be confused with the scratch directive in Nextflow). 
At this tier, FSx for Lustre does not replicate data across availability zones, so it is suited to short-term data storage. Scratch FSx for Lustre storage delivers **200 MB/s/TiB**, costing **$0.140** per GB month. This is **~75%** more expensive than Amazon EFS (Standard) and **~6x** the cost of standard S3 storage. Persistent FSx for Lustre file systems configured to deliver **1,000 MB/s/TiB** can be up to **~26x** the price of standard S3 object storage!\n\n**Hybrid Cloud file systems** — In addition to the solutions described above, there are other solutions that combine the best of object stores and high-performance parallel file systems. An example is [WekaFS™](https://www.weka.io/) from WEKA. WekaFS is used by several Nextflow users and is deployable on-premises or across the cloud platforms of your choice. WekaFS is attractive because it provides multi-protocol access to the same data (POSIX, S3, NFS, SMB) while presenting a common namespace between on-prem and cloud resident compute environments. Weka delivers the performance benefits of a high-performance parallel file system and optionally uses cloud object storage as a backing store for file system data to help reduce costs.\n\nFrom a Nextflow perspective, WekaFS behaves like any other shared file system. As such, Nextflow and Tower have no specific integration with WEKA. Nextflow users will need to deploy and manage WekaFS themselves, making the environment more complex to set up and manage. However, the flexibility and performance provided by a hybrid cloud file system make this worthwhile for many organizations.\n\n**Fusion file system 2.0** — Fusion file system is a solution developed by [Seqera Labs](https://seqera.io/fusion) that aims to bridge the gap between cloud-native storage and data analysis workflows. The solution implements a thin client that allows pipeline jobs to access object storage using a standard POSIX interface, thus simplifying and speeding up most operations.\n\nThe advantage of the Fusion file system is that there is no need to copy data between S3 and local storage. The Fusion file system driver accesses and manipulates files in Amazon S3 directly. You can learn more about the Fusion file system and how it works in the whitepaper [Breakthrough performance and cost-efficiency with the new Fusion file system](https://seqera.io/whitepapers/breakthrough-performance-and-cost-efficiency-with-the-new-fusion-file-system/).\n\nFor sites struggling with performance and scalability issues on shared file systems or object storage, the Fusion file system offers several advantages. [Benchmarks conducted](https://seqera.io/whitepapers/breakthrough-performance-and-cost-efficiency-with-the-new-fusion-file-system/) by Seqera Labs have shown that, in some cases, **Fusion can deliver performance on par with Lustre but at a much lower cost.** Fusion is also significantly easier to configure and manage and can result in lower costs for both compute and storage resources.\n\n## Comparing the alternatives\n\nA summary of storage options is presented in the table below:\n\n
\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
\n **Traditional file systems**\n \n **Cloud object storage**\n \n **Cloud file systems**\n \n **Fusion FS**\n
\n NFS, Lustre, Spectrum Scale\n \n Amazon S3\n \n Azure BLOB storage\n \n Google Cloud Storage\n \n Amazon EFS\n \n Amazon FSX for Lustre\n \n Azure File\n \n Fusion file system 2.0\n
\n **Deployment model**\n \n Manual\n \n Serverless\n \n Serverless\n \n Serverless\n \n Serverless\n \n Serverless\n \n Serverless\n \n Serverless\n
\n **Access model**\n \n POSIX\n \n Object\n \n Object\n \n Object\n \n POSIX\n \n POSIX\n \n POSIX\n \n POSIX\n
\n **Clouds supported**\n \n On-prem, any cloud\n \n AWS only\n \n Azure only\n \n GCP only\n \n AWS only\n \n AWS only\n \n Azure only\n \n AWS, GCP and Azure ^1^\n
\n **Requires block storage**\n \n Yes\n \n Optional\n \n Optional\n \n Optional\n \n Optional\n \n No\n \n Optional\n \n No\n
\n **Relative cost**\n \n $$\n \n $\n \n $\n \n $\n \n $$\n \n $$$\n \n $$\n \n $\n
\n **Nextflow plugins**\n \n -\n \n nf-amazon\n \n nf-azure\n \n nf-google\n \n -\n \n -\n \n -\n \n nf-amazon\n
\n **Tower support**\n \n Yes\n \n Yes, existing buckets\n \n Yes, existing BLOB container\n \n Yes, existing cloud storage bucket\n \n Yes, creates EFS instances\n \n Yes, creates FSx for Lustre instances\n \n File system created manually\n \n Yes, fully automated\n
\n **Dependencies**\n \n Externally configured\n \n Wave Amazon S3\n
\n **Cost model**\n \n Fixed price on-prem, instance+block storage costs\n \n GB per month\n \n GB per month\n \n GB per month\n \n Multiple factors\n \n Multiple factors\n \n Multiple factors\n \n GB per month (uses S3)\n
\n **Level of configuration effort (when used with Tower)**\n \n High\n \n Low\n \n Low\n \n Low\n \n Medium (low with Tower)\n \n High (easier with Tower)\n \n Medium\n \n Low\n
\n **Works best with:**\n \n Any on-prem cluster manager (LSF, Slurm, etc.)\n \n AWS Batch\n \n Azure Batch\n \n Google Cloud Batch\n \n AWS Batch\n \n AWS Batch\n \n Azure Batch\n \n AWS Batch, Amazon EKS, Azure Batch, Google Cloud Batch ^1^\n
\n
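\n\nTo make the last column more concrete, enabling Fusion typically comes down to a few lines of configuration. The sketch below assumes an AWS Batch compute environment; the bucket, queue, and region names are placeholders:\n\n```groovy\n// illustrative nextflow.config sketch for a Fusion-enabled AWS Batch setup\nfusion.enabled   = true\nwave.enabled     = true                     // Fusion containers are provisioned through Wave\nworkDir          = 's3://my-bucket/work'    // the work directory lives directly in object storage\nprocess.executor = 'awsbatch'\nprocess.queue    = 'my-batch-queue'\naws.region       = 'eu-west-1'\n```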
\n\n## So what’s the bottom line?\n\nThe choice or storage solution depends on several factors. Object stores like Amazon S3 are popular because they are convenient and inexpensive. However, depending on data access patterns, and the amount of data to be staged in advance, file systems such as EFS, Azure Files or FSx for Lustre can also be a good alternative.\n\nFor many Nextflow users, Fusion file system will be a better option since it offers performance comparable to a high-performance file system at the cost of cloud object storage. Fusion is also dramatically easier to deploy and manage. [Adding Fusion support](https://nextflow.io/docs/latest/fusion.html) is just a matter of adding a few lines to the `nextflow.config` file.\n\nWhere workloads run is also an important consideration. For example, on-premises clusters will typically use whatever shared file system is available locally. When operating in the cloud, you can choose whether to use cloud file systems, object stores, high-performance file systems, Fusion FS, or hybrid cloud solutions such as Weka.\n\nStill unsure what storage solution will best meet your needs? Consider joining our community at [nextflow.slack.com](https://nextflow.slack.com/). You can engage with others, post technical questions, and learn more about the pros and cons of the storage solutions described above.", "images": [], "author": "Paolo Di Tommaso", "tags": "nextflow" @@ -647,7 +647,7 @@ "slug": "2023/the-state-of-kubernetes-in-nextflow", "title": "The State of Kubernetes in Nextflow", "date": "2023-03-10T00:00:00.000Z", - "content": "\nHi, my name is Ben, and I’m a software engineer at Seqera Labs. I joined Seqera in November 2021 after finishing my Ph.D. at Clemson University. I work on a number of things at Seqera, but my primary role is that of a Nextflow core contributor.\n\nI have run Nextflow just about everywhere, from my laptop to my university cluster to the cloud and Kubernetes. I have written Nextlfow pipelines for bioinformatics and machine learning, and I even wrote a pipeline to run other Nextflow pipelines for my [dissertation research](https://github.com/bentsherman/tesseract). While I tried to avoid contributing code to Nextflow as a student (I had enough work already), now I get to work on it full-time!\n\nWhich brings me to the topic of this post: Nextflow and Kubernetes.\n\nOne of my first contributions was a “[best practices guide](https://github.com/seqeralabs/nf-k8s-best-practices)” for running Nextflow on Kubernetes. The guide has helped many people, but for me it provided a map for how to improve K8s support in Nextflow. You see, Nextflow was originally built for HPC, while Kubernetes and cloud batch executors were added later. While Nextflow’s extensible design makes adding features like new executors relatively easy, support for Kubernetes is still a bit spotty.\n\nSo, I set out to make Nextflow + K8s great! Over the past year, in collaboration with talented members of the Nextflow community, we have added all sorts of enhancements to the K8s executor. In this blog post, I’d like to show off all of these improvements in one place. So here we go!\n\n## New features\n\n### Submit tasks as Kubernetes Jobs\n\n_New in version 22.05.0-edge._\n\nNextflow submits tasks as Pods by default, which is sort of a bad practice. In Kubernetes, every Pod should be created through a controller (e.g., Deployment, Job, StatefulSet) so that Pod failures can be handled automatically. For Nextflow, the appropriate controller is a K8s Job. 
Using Jobs instead of Pods directly has greatly improved the stability of large Nextflow runs on Kubernetes, and will likely become the default behavior in a future version.\n\nYou can enable this feature with the following configuration option:\n\n```groovy\nk8s.computeResourceType = 'Job'\n```\n\nCredit goes to @xhejtman from CERIT-SC for leading the charge on this one!\n\n### Object storage as the work directory\n\n_New in version 22.10.0._\n\nOne of the most difficult aspects of using Nextflow with Kubernetes is that Nextflow needs a `PersistentVolumeClaim` (PVC) to store the shared work directory, which also means that Nextflow itself must run inside the Kubernetes cluster in order to access this storage. While the `kuberun` command attempts to automate this process, it has never been reliable enough for production usage.\n\nAt the Nextflow Summit in October 2022, we introduced [Fusion](https://seqera.io/fusion/), a file system driver that can mount S3 buckets as POSIX-like directories. The combination of Fusion and [Wave](https://seqera.io/wave/) (a just-in-time container provisioning service) enables you to have your work directory in S3-compatible storage. See the [Wave blog post](https://nextflow.io/blog/2022/rethinking-containers-for-cloud-native-pipelines.html) for an explanation of how it works — it’s pretty cool.\n\nThis functionality is useful in general, but it is especially useful for Kubernetes, because (1) you don’t need to provision your own PVC and (2) you can run Nextflow on Kubernetes without using `kuberun` or creating your own submitter Pod.\n\nThis feature currently supports AWS S3 on Elastic Kubernetes Service (EKS) clusters and Google Cloud Storage on Google Kubernetes Engine (GKE) clusters.\n\nCheck out [this article](https://seqera.io/blog/deploying-nextflow-on-amazon-eks/) over at the Seqera blog for an in-depth guide to running Nextflow (with Fusion) on Amazon EKS.\n\n### No CPU limits by default\n\n_New in version 22.11.0-edge._\n\nWe have changed the default behavior of CPU requests for the K8s executor. Before, a single number in a Nextflow resource request (e.g., `cpus = 8`) was interpreted as both a “request” (lower bound) and a “limit” (upper bound) in the Pod definition. However, setting an explicit CPU limit in K8s is increasingly seen as an anti-pattern (see [this blog post](https://home.robusta.dev/blog/stop-using-cpu-limits) for an explanation). The bottom line is that it is better to specify a request without a limit, because that will ensure that each task has the CPU time it requested, while also allowing the task to use more CPU time if it is available. Unlike other resources like memory and disk, CPU time is compressible — it can be given and taken away without killing the application.\n\nWe have also updated the Docker integration in Nextflow to use [CPU shares](https://www.batey.info/cgroup-cpu-shares-for-docker.html), which is the mechanism used by [Kubernetes](https://www.batey.info/cgroup-cpu-shares-for-kubernetes.html) and [AWS Batch](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html#container_definitions) under the hood to define expandable CPU requests. These changes make the behavior of CPU requests in Nextflow much more consistent across executors.\n\n### CSI ephemeral volumes\n\n_New in version 22.11.0-edge._\n\nIn Kubernetes, volumes are used to provide storage and data (e.g., configuration and secrets) to Pods. 
Persistent volumes exist independently of Pods and can be mounted and unmounted over time, while ephemeral volumes are attached to a single Pod and are created and destroyed alongside it. While Nextflow can use any persistent volume through a `PersistentVolumeClaim`, ephemeral volume types are supported on a case-by-case basis. For example, `ConfigMaps` and `Secrets` are two ephemeral volume types that are already supported by Nextflow.\n\nNextflow now also supports [CSI ephemeral volumes](https://kubernetes.io/docs/concepts/storage/ephemeral-volumes/#csi-ephemeral-volumes). CSI stands for Container Storage Interface, and it is a standard used by Kubernetes to support third-party storage systems as volumes. The most common example of a CSI ephemeral volume is [Secrets Store](https://secrets-store-csi-driver.sigs.k8s.io/getting-started/usage.html), which is used to inject secrets from a remote vault such as [Hashicorp Vault](https://www.vaultproject.io/) or [Azure Key Vault](https://azure.microsoft.com/en-us/products/key-vault/).\n\n_Note: CSI persistent volumes can already be used in Nextflow through a `PersistentVolumeClaim`._\n\n### Local disk storage for tasks\n\n_New in version 22.11.0-edge._\n\nNextflow uses a shared work directory to coordinate tasks. Each task receives its own subdirectory with the required input files, and each task is expected to write its output files to this directory. As a workflow scales to thousands of concurrent tasks, this shared storage becomes a major performance bottleneck. We are investigating a few different ways to overcome this challenge.\n\nOne of the tools we have to reduce I/O pressure on the shared work directory is to make tasks use local storage. For example, if a task takes input file A, produces an intermediate file B, then produces an output file C, the file B can be written to local storage instead of shared storage because it isn’t a required output file. Or, if the task writes an output file line by line instead of all at once at the end, it can stream the output to local storage first and then copy the file to shared storage.\n\nWhile it is far from a comprehensive solution, local storage can reduce I/O congestion in some cases. Provisioning local storage for every task looks different on every platform, and in some cases it is not supported. Fortunately, Kubernetes provides a seamless interface for local storage, and now Nextflow supports it as well.\n\nTo provision local storage for tasks, you must (1) add an `emptyDir` volume to your Pod options, (2) request disk storage via the `disk` directive, and (3) direct tasks to use the local storage with the `scratch` directive. 
Here’s an example:\n\n```groovy\nprocess {\n disk = 10.GB\n pod = [ [emptyDir: [:], mountPath: '/scratch'] ]\n scratch = '/scratch'\n}\n```\n\nAs a bonus, you can also provision an `emptyDir` backed by memory:\n\n```groovy\nprocess {\n memory = 10.GB\n pod = [ [emptyDir: [medium: 'Memory'], mountPath: '/scratch'] ]\n scratch = '/scratch'\n}\n```\n\nNextflow maps the `disk` directive to the [`ephemeral-storage`](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#setting-requests-and-limits-for-local-ephemeral-storage) resource request, which is provided by the [`emptyDir`](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir) volume (another ephemeral volume type).\n\n### Miscellaneous\n\nCheck the [release notes](https://github.com/nextflow-io/nextflow/releases) or the list of [K8s pull requests](https://github.com/nextflow-io/nextflow/pulls?q=is%3Apr+label%3Aplatform%2Fk8s) on Github to see what else has been added. Here are some notable improvements from the past year:\n\n- Support Pod `affinity` ([640cbed4](https://github.com/nextflow-io/nextflow/commit/640cbed4813a34887d4dc10f87fa2e4aa524d055))\n- Support Pod `automountServiceAccountToken` ([1b5908e4](https://github.com/nextflow-io/nextflow/commit/1b5908e4cbbb79f93be2889eec3acfa6242068a1))\n- Support Pod `priorityClassName` ([51650f8c](https://github.com/nextflow-io/nextflow/commit/51650f8c411ba40f0966031035e7a47c036f542e))\n- Support Pod `tolerations` ([7f7cdadc](https://github.com/nextflow-io/nextflow/commit/7f7cdadc6a36d0fb99ef125f6c6f89bfca8ca52e))\n- Support `time` directive via `activeDeadlineSeconds` ([2b6f70a8](https://github.com/nextflow-io/nextflow/commit/2b6f70a8fa55b993fa48755f7a47ac9e1b584e48))\n- Improved control over error conditions ([064f9bc4](https://github.com/nextflow-io/nextflow/commit/064f9bc4), [58be2128](https://github.com/nextflow-io/nextflow/commit/58be2128), [d86ddc36](https://github.com/nextflow-io/nextflow/commit/d86ddc36))\n- Improved support for labels and queue annotation ([9951fcd9](https://github.com/nextflow-io/nextflow/commit/9951fcd9), [4df8c8d2](https://github.com/nextflow-io/nextflow/commit/4df8c8d2))\n- Add support for AWS IAM role for Service Accounts ([62df42c3](https://github.com/nextflow-io/nextflow/commit/62df42c3), [c3364d0f](https://github.com/nextflow-io/nextflow/commit/c3364d0f), [b3d33e3b](https://github.com/nextflow-io/nextflow/commit/b3d33e3b))\n\n## Beyond Kubernetes\n\nWe’ve added tons of value to Nextflow over the past year – not just in terms of Kubernetes support, but also in terms of performance, stability, and integrations with other technologies – and we aren’t stopping any time soon! We have greater ambitions still for Nextflow, and I for one am looking forward to what we will accomplish together. As always, keep an eye on this blog, as well as the [Nextflow GitHub](https://github.com/nextflow-io/nextflow) page, for the latest updates to Nextflow.\n", + "content": "Hi, my name is Ben, and I’m a software engineer at Seqera Labs. I joined Seqera in November 2021 after finishing my Ph.D. at Clemson University. I work on a number of things at Seqera, but my primary role is that of a Nextflow core contributor.\n\nI have run Nextflow just about everywhere, from my laptop to my university cluster to the cloud and Kubernetes. I have written Nextlfow pipelines for bioinformatics and machine learning, and I even wrote a pipeline to run other Nextflow pipelines for my [dissertation research](https://github.com/bentsherman/tesseract). 
While I tried to avoid contributing code to Nextflow as a student (I had enough work already), now I get to work on it full-time!\n\nWhich brings me to the topic of this post: Nextflow and Kubernetes.\n\nOne of my first contributions was a “[best practices guide](https://github.com/seqeralabs/nf-k8s-best-practices)” for running Nextflow on Kubernetes. The guide has helped many people, but for me it provided a map for how to improve K8s support in Nextflow. You see, Nextflow was originally built for HPC, while Kubernetes and cloud batch executors were added later. While Nextflow’s extensible design makes adding features like new executors relatively easy, support for Kubernetes is still a bit spotty.\n\nSo, I set out to make Nextflow + K8s great! Over the past year, in collaboration with talented members of the Nextflow community, we have added all sorts of enhancements to the K8s executor. In this blog post, I’d like to show off all of these improvements in one place. So here we go!\n\n## New features\n\n### Submit tasks as Kubernetes Jobs\n\n_New in version 22.05.0-edge._\n\nNextflow submits tasks as Pods by default, which is sort of a bad practice. In Kubernetes, every Pod should be created through a controller (e.g., Deployment, Job, StatefulSet) so that Pod failures can be handled automatically. For Nextflow, the appropriate controller is a K8s Job. Using Jobs instead of Pods directly has greatly improved the stability of large Nextflow runs on Kubernetes, and will likely become the default behavior in a future version.\n\nYou can enable this feature with the following configuration option:\n\n```groovy\nk8s.computeResourceType = 'Job'\n```\n\nCredit goes to @xhejtman from CERIT-SC for leading the charge on this one!\n\n### Object storage as the work directory\n\n_New in version 22.10.0._\n\nOne of the most difficult aspects of using Nextflow with Kubernetes is that Nextflow needs a `PersistentVolumeClaim` (PVC) to store the shared work directory, which also means that Nextflow itself must run inside the Kubernetes cluster in order to access this storage. While the `kuberun` command attempts to automate this process, it has never been reliable enough for production usage.\n\nAt the Nextflow Summit in October 2022, we introduced [Fusion](https://seqera.io/fusion/), a file system driver that can mount S3 buckets as POSIX-like directories. The combination of Fusion and [Wave](https://seqera.io/wave/) (a just-in-time container provisioning service) enables you to have your work directory in S3-compatible storage. See the [Wave blog post](https://nextflow.io/blog/2022/rethinking-containers-for-cloud-native-pipelines.html) for an explanation of how it works — it’s pretty cool.\n\nThis functionality is useful in general, but it is especially useful for Kubernetes, because (1) you don’t need to provision your own PVC and (2) you can run Nextflow on Kubernetes without using `kuberun` or creating your own submitter Pod.\n\nThis feature currently supports AWS S3 on Elastic Kubernetes Service (EKS) clusters and Google Cloud Storage on Google Kubernetes Engine (GKE) clusters.\n\nCheck out [this article](https://seqera.io/blog/deploying-nextflow-on-amazon-eks/) over at the Seqera blog for an in-depth guide to running Nextflow (with Fusion) on Amazon EKS.\n\n### No CPU limits by default\n\n_New in version 22.11.0-edge._\n\nWe have changed the default behavior of CPU requests for the K8s executor. 
Before, a single number in a Nextflow resource request (e.g., `cpus = 8`) was interpreted as both a “request” (lower bound) and a “limit” (upper bound) in the Pod definition. However, setting an explicit CPU limit in K8s is increasingly seen as an anti-pattern (see [this blog post](https://home.robusta.dev/blog/stop-using-cpu-limits) for an explanation). The bottom line is that it is better to specify a request without a limit, because that will ensure that each task has the CPU time it requested, while also allowing the task to use more CPU time if it is available. Unlike other resources like memory and disk, CPU time is compressible — it can be given and taken away without killing the application.\n\nWe have also updated the Docker integration in Nextflow to use [CPU shares](https://www.batey.info/cgroup-cpu-shares-for-docker.html), which is the mechanism used by [Kubernetes](https://www.batey.info/cgroup-cpu-shares-for-kubernetes.html) and [AWS Batch](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html#container_definitions) under the hood to define expandable CPU requests. These changes make the behavior of CPU requests in Nextflow much more consistent across executors.\n\n### CSI ephemeral volumes\n\n_New in version 22.11.0-edge._\n\nIn Kubernetes, volumes are used to provide storage and data (e.g., configuration and secrets) to Pods. Persistent volumes exist independently of Pods and can be mounted and unmounted over time, while ephemeral volumes are attached to a single Pod and are created and destroyed alongside it. While Nextflow can use any persistent volume through a `PersistentVolumeClaim`, ephemeral volume types are supported on a case-by-case basis. For example, `ConfigMaps` and `Secrets` are two ephemeral volume types that are already supported by Nextflow.\n\nNextflow now also supports [CSI ephemeral volumes](https://kubernetes.io/docs/concepts/storage/ephemeral-volumes/#csi-ephemeral-volumes). CSI stands for Container Storage Interface, and it is a standard used by Kubernetes to support third-party storage systems as volumes. The most common example of a CSI ephemeral volume is [Secrets Store](https://secrets-store-csi-driver.sigs.k8s.io/getting-started/usage.html), which is used to inject secrets from a remote vault such as [Hashicorp Vault](https://www.vaultproject.io/) or [Azure Key Vault](https://azure.microsoft.com/en-us/products/key-vault/).\n\n_Note: CSI persistent volumes can already be used in Nextflow through a `PersistentVolumeClaim`._\n\n### Local disk storage for tasks\n\n_New in version 22.11.0-edge._\n\nNextflow uses a shared work directory to coordinate tasks. Each task receives its own subdirectory with the required input files, and each task is expected to write its output files to this directory. As a workflow scales to thousands of concurrent tasks, this shared storage becomes a major performance bottleneck. We are investigating a few different ways to overcome this challenge.\n\nOne of the tools we have to reduce I/O pressure on the shared work directory is to make tasks use local storage. For example, if a task takes input file A, produces an intermediate file B, then produces an output file C, the file B can be written to local storage instead of shared storage because it isn’t a required output file. 
Or, if the task writes an output file line by line instead of all at once at the end, it can stream the output to local storage first and then copy the file to shared storage.\n\nWhile it is far from a comprehensive solution, local storage can reduce I/O congestion in some cases. Provisioning local storage for every task looks different on every platform, and in some cases it is not supported. Fortunately, Kubernetes provides a seamless interface for local storage, and now Nextflow supports it as well.\n\nTo provision local storage for tasks, you must (1) add an `emptyDir` volume to your Pod options, (2) request disk storage via the `disk` directive, and (3) direct tasks to use the local storage with the `scratch` directive. Here’s an example:\n\n```groovy\nprocess {\n disk = 10.GB\n pod = [ [emptyDir: [:], mountPath: '/scratch'] ]\n scratch = '/scratch'\n}\n```\n\nAs a bonus, you can also provision an `emptyDir` backed by memory:\n\n```groovy\nprocess {\n memory = 10.GB\n pod = [ [emptyDir: [medium: 'Memory'], mountPath: '/scratch'] ]\n scratch = '/scratch'\n}\n```\n\nNextflow maps the `disk` directive to the [`ephemeral-storage`](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#setting-requests-and-limits-for-local-ephemeral-storage) resource request, which is provided by the [`emptyDir`](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir) volume (another ephemeral volume type).\n\n### Miscellaneous\n\nCheck the [release notes](https://github.com/nextflow-io/nextflow/releases) or the list of [K8s pull requests](https://github.com/nextflow-io/nextflow/pulls?q=is%3Apr+label%3Aplatform%2Fk8s) on Github to see what else has been added. Here are some notable improvements from the past year:\n\n- Support Pod `affinity` ([640cbed4](https://github.com/nextflow-io/nextflow/commit/640cbed4813a34887d4dc10f87fa2e4aa524d055))\n- Support Pod `automountServiceAccountToken` ([1b5908e4](https://github.com/nextflow-io/nextflow/commit/1b5908e4cbbb79f93be2889eec3acfa6242068a1))\n- Support Pod `priorityClassName` ([51650f8c](https://github.com/nextflow-io/nextflow/commit/51650f8c411ba40f0966031035e7a47c036f542e))\n- Support Pod `tolerations` ([7f7cdadc](https://github.com/nextflow-io/nextflow/commit/7f7cdadc6a36d0fb99ef125f6c6f89bfca8ca52e))\n- Support `time` directive via `activeDeadlineSeconds` ([2b6f70a8](https://github.com/nextflow-io/nextflow/commit/2b6f70a8fa55b993fa48755f7a47ac9e1b584e48))\n- Improved control over error conditions ([064f9bc4](https://github.com/nextflow-io/nextflow/commit/064f9bc4), [58be2128](https://github.com/nextflow-io/nextflow/commit/58be2128), [d86ddc36](https://github.com/nextflow-io/nextflow/commit/d86ddc36))\n- Improved support for labels and queue annotation ([9951fcd9](https://github.com/nextflow-io/nextflow/commit/9951fcd9), [4df8c8d2](https://github.com/nextflow-io/nextflow/commit/4df8c8d2))\n- Add support for AWS IAM role for Service Accounts ([62df42c3](https://github.com/nextflow-io/nextflow/commit/62df42c3), [c3364d0f](https://github.com/nextflow-io/nextflow/commit/c3364d0f), [b3d33e3b](https://github.com/nextflow-io/nextflow/commit/b3d33e3b))\n\n## Beyond Kubernetes\n\nWe’ve added tons of value to Nextflow over the past year – not just in terms of Kubernetes support, but also in terms of performance, stability, and integrations with other technologies – and we aren’t stopping any time soon! We have greater ambitions still for Nextflow, and I for one am looking forward to what we will accomplish together. 
As always, keep an eye on this blog, as well as the [Nextflow GitHub](https://github.com/nextflow-io/nextflow) page, for the latest updates to Nextflow.", "images": [], "author": "Ben Sherman", "tags": "nextflow, kubernetes" @@ -656,7 +656,7 @@ "slug": "2024/addressing-bioinformatics-core-challenges", "title": "Addressing Bioinformatics Core Challenges with Nextflow and nf-core", "date": "2024-09-11T00:00:00.000Z", - "content": "\nI was honored to be invited to the ISMB 2024 congress to speak at the session organised by the COSI (Community of Special Interest) of Bioinformatics Cores. This session brought together bioinformatics professionals from around the world who manage bioinformatics facilities in different institutions to share experiences, discuss challenges, and explore solutions for managing and analyzing large-scale biological data. In this session, I had the opportunity to introduce Nextflow, and discuss how its adoption can help bioinformatics cores to address some of the most common challenges they face. From managing complex pipelines to optimizing resource utilization, Nextflow offers a range of benefits that can streamline workflows and improve productivity. In this blog, I'll summarize my talk and share insights on how Nextflow can help overcome some of those challenges, including meeting the needs of a wide range of users or customers, automate reporting, customising pipelines and training.\n\n### Challenge 1: running multiple services\n\n_Challenge description: “I have a wide range of stakeholders, and my pipelines need to address different needs in multiple scientific domains”_\n\nOne of the biggest challenges faced by bioinformatics cores is catering to a diverse range of users with varying applications. On one hand, one might need to run analyses for researchers focused on cancer or human genetics. On the other hand, one may also need to support scientists working with mass spectrometry or metagenomics. Fortunately, the nf-core community has made it relatively easy to tackle these diverse needs with their curated pipelines. These pipelines are ready to use, covering a broad spectrum of applications, from genomics and metagenomics to immunology and mass spectrometry. In one of my slides I showed a non-exhaustive list, which spans genomics, metagenomics, immunology, mass spec, and more: one can find best-practice pipelines for almost any bioinformatics application imaginable, including emerging areas like imaging and spatial-omics. By leveraging this framework, one can not only tap into the expertise of the pipeline developers but also engage with them to discuss specific needs and requirements. This collaborative approach can significantly ease the deployment of a workflow, allowing the user to focus on high-priority tasks while ensuring that the analyses are always up to date and aligned with current best practices.\n\n### Challenge 2: customising applications\n\n_Challenge description: “We often need to customise our applications and pipeline, to meet specific in-house needs of our users”_\n\nWhile ready-to-use applications are a huge advantage, there are times when customisation is necessary. Perhaps the standard pipeline that works for most users doesn't quite meet the specific needs of a facilities user or customer. Fortunately, the nf-core community has got these cases covered. With over 1,300 modules at everyone’s disposal, one can easily compose their own pipeline using the nf-core components and tooling. 
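\n\nAs a minimal sketch of what that composition looks like, a module installed with the nf-core tooling can be included and called from a workflow; the module choice and `params.reads` below are illustrative:\n\n```groovy\n// include a module previously installed with the nf-core tooling\ninclude { FASTQC } from './modules/nf-core/fastqc/main'\n\nworkflow {\n    // nf-core modules generally expect a (meta, files) tuple\n    ch_reads = Channel.fromFilePairs(params.reads)\n                      .map { id, reads -> [ [id: id], reads ] }\n    FASTQC(ch_reads)\n}\n```\n\n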
Should that not be enough though, one can even create a pipeline from scratch using nf-core tools. For instance, one can run a simple command like “nf-core create” followed by the name of the pipeline, and voilà! The software package will create a complete skeleton for the pipeline, filled with pre-compiled code and placeholders to ease customisation. This process is incredibly quick, as I demonstrated in a video clip during the talk, where a pipeline skeleton was created in just a few moments.\n\nOf course, customisation isn't limited to pipelines. It also applies to containers, which are a crucial enabler of portability. When it comes to containers, Nextflow users have two options: an easy way and a more advanced approach. The easy way involves using Seqera Containers, a platform that allows anyone to compose a container using tools from bioconda, pypi, and conda-forge. No need for logging in, just select the tools, and the URL of your container will be made available in no time. One can build containers for either Docker or Singularity, and for different platforms (amd64 or arm64).\n\nIf one is looking for more control, they can use Wave as a command line. This is a powerful tool that can act as an intermediary between the user and a container registry. Wave builds containers on the fly, allowing anyone to pass a wave build command as an evaluation inside a docker run command. It's incredibly fast, and builds containers from conda packages in a matter of seconds. Wave, which is also the engine behind Seqera Containers, can be extremely handy to allow other operations like container augmentation. This feature enables a user to add new layers to existing containers without having to rebuild them, thanks to Docker's layer-based architecture. One can simply create a folder where configuration files or executable scripts are located, pass the folder to Wave which will add the folder with a new layer, and get the URL of the augmented container on the fly.\n\n### Challenge 3: Reporting\n\n_Challenge description: “I need to deliver a clear report of the analysis results, in a format that is accessible and can be used for publication purposes by my users”_\n\nReporting is a crucial aspect of any bioinformatics pipeline, and as for customisation Nextflow offers different ways to approach it. suitable for different levels of expertise. The most straightforward solution involves running MultiQC, a tool that collects the output and logs of a wide range of software in a pipeline and generates a nicely formatted HTML report. This is a great option if one wants a quick and easy way to get a summary of their pipeline's results. MultiQC is a widely used tool that supports a huge list (and growing) of bioinformatics tools and file formats, making it a great choice for many use cases.\n\nHowever, if the developer needs more control over the reporting process or wants to create a custom report that meets some specific needs, it is entirely possible to engineer the reports from scratch. This involves collecting the outputs from various processes in the pipeline and passing them as an input to a process that runs an R Markdown or Quarto script. 
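\n\nAs an illustrative sketch (the process name, inputs, and Quarto parameter are placeholders, not one of the ready-made nf-core modules), such a reporting step might look like this, with the upstream outputs gathered via `collect()` and passed in as the `results` input:\n\n```groovy\nprocess RENDER_REPORT {\n    input:\n    path notebook        // a report.qmd Quarto notebook\n    path results         // collected outputs from upstream processes\n\n    output:\n    path 'report.html'\n\n    script:\n    \"\"\"\n    quarto render ${notebook} --output report.html -P results:${results}\n    \"\"\"\n}\n```\n\n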
R Markdown and Quarto are popular tools for creating dynamic documents that can be parameterised, allowing anyone to customize the content and the layout of a report dynamically.\nBy using this approach, one can create a report that is tailored to your specific needs, including the types of plots and visualizations they want to include, the formatting and layouting, branding, and anything specific one might want to highlight.\n\nTo follow this approach, the user can either create their own customised module, or re-use one of the available notebooks modules in the nf-core repository (quarto [here](https://github.com/nf-core/modules/tree/master/modules/nf-core/quartonotebook), or jupyter [here](https://github.com/nf-core/modules/tree/master/modules/nf-core/jupyternotebook)).\n\n### Challenge 4: Monitoring\n\n_Challenge description: “I need to be able to estimate and optimise runtimes as well as costs of my pipelines, fitting our cost model”_\n\nMonitoring is a critical aspect of pipeline management, and Nextflow provides a robust set of tools to help you track and optimise a pipeline's performance. At its core, monitoring involves tracking the execution of the pipeline to ensure that it's running efficiently and effectively. But it's not just about knowing how long a pipeline takes to run or how much it costs - it's also about making sure each process in the pipeline is using the requested resources efficiently.\nWith Nextflow, the user can track the resources used by each process in your pipeline, including CPU, memory, and disk usage and compare them visually with the resources requested in the pipeline configuration and reserved by each job. This information allows the user to identify bottlenecks and areas for optimisation, so one can fine-tune their pipeline for a better resource consumption. For example, if the user notices that one process is using a disproportionate amount of memory, they can adjust the configuration to better match the actual usage.\n\nBut monitoring isn't just about optimising a pipeline's performance - it's also about reducing the environmental impact where possible. A recently developed Nextflow plugin allows to track the carbon footprint of a pipeline, including the energy consumption and greenhouse gas emissions associated with running that pipeline. This information allows one to make informed decisions about their environmental impact, and gaining better awareness or even adopting greener strategies to computing.\n\nOne of the key benefits of Nextflow’s monitoring system is its flexibility. The user can either use the built-in html reports for trace and pipeline execution, or could monitor a run live by connecting to Seqera Platform and visualising its progress on a graphical interface in real time. More expert or creative users could use the trace file produced by a Nextflow execution, to create their own metrics and visualisations.\n\n### Challenge 5: User accessibility\n\n_Challenge description: “I could balance workloads better, by giving users a certain level of autonomy in running some of my pipelines”_\n\nUser accessibility is a crucial aspect of pipeline development, as it enables users with varying levels of bioinformatics experience to run complex pipelines with ease. One of the advantages of Nextflow, is that a developer can create pipelines that are not only robust and efficient but also user-friendly. 
Allowing your users to run them with a certain level of autonomy might be a good strategy in a bioinformatics core to decentralise straightforward analyses and invest human resources on more complex projects. Empowering a facility’s users to run specific pipelines independently could be a solution to reduce certain workloads.\n\nThe nf-core template includes a parameters schema, which is captured by the nf-core website to create a graphical interface for parameters configuration of the pipelines hosted under the nf-core organisation on GitHub. This interface allows users to fill in the necessary fields for parameters needed to run a pipeline, and allows even users with minimal experience with bioinformatics or command-line interfaces to quickly set up a run. The user can then simply copy and paste the command generated by the webpage into a terminal, and the pipeline will launch as configured. This approach is ideal for users who are familiar with basic computer tasks, and have a very minimal familiarity with a terminal.\n\nHowever, for users with even less bioinformatics experience, Nextflow and the nf-core template together offer an even more intuitive solution. The pipeline can be added to the launcher of the Seqera Platform, and one can provide users with a comprehensive and user-friendly interface that allows them to launch pipelines with ease. This platform offers a range of features, including access to datasets created from sample sheets, the ability to launch pipelines on a wide range of cloud environments as well as on HPC on-premise. A simple graphical interface simplifies the entire process.The Seqera Platform provides in this way a seamless and intuitive experience for users, allowing them to run pipelines without requiring extensive bioinformatics knowledge.\n\n### Challenge 6: Training\n\n_Challenge description: “Training my team and especially onboarding new team members is always challenging and requires documentation and good materials”_\n\nThe final challenge we often face in bioinformatics facilities is training. We all know that training is an ongoing issue, not just because of staff turnover and the need to onboard new recruits, but also because the field is constantly evolving. With new tools, techniques, and technologies emerging all the time, it can be difficult to keep up with the latest developments. However, training is crucial for ensuring that pipelines are robust, efficient, and accurate.\n\nFortunately, there are now many resources available to help with training. The Nextflow training website, for example, has been completely rebuilt recently and now offers a wealth of material suitable for everyone, from beginners to experts. Whether you're just starting out with Nextflow or are already an experienced user, you'll find plenty of resources to help you improve your skills. From introductory tutorials to advanced guides, the training website has everything you need to get the most out of this workflow manager.\n\nEveryone can access the material at their own pace, but regular training events have been scheduled during the year. Additionally, there is now a network of Nextflow Ambassadors who often organise local training events across the world. Without making comparisons with other solutions, I can easily say that the steep learning curve to get going with Nextflow is just a myth nowadays. 
The quality of the training material, the examples available, the frequency of events in person or online you can attend to, and more importantly a welcoming community of users, make learning Nextflow quite easy.\n\nIn my laboratory, usually in a couple of months bachelor students are reasonably confident with the code and with running pipelines and debugging common issues.\n\n### Conclusions\n\nIn conclusion, the presentation at ISMB has gathered quite some interest because I believe it has shown how Nextflow is a powerful and versatile tool that can help bioinformatics cores address those common challenges everyone has experienced. With its comprehensive tooling, extensive training materials, and active community of users, Nextflow offers a complete package that can help people streamline their workflows and improve their productivity.\nAlthough I might be biased on this, I also believe that by adopting Nextflow one also becomes part of a community of researchers and developers who are passionate about bioinformatics and committed to sharing their knowledge and expertise. Beginners not only will have access to a wealth of resources and tutorials, but more importantly to a supportive network of peers who can offer advice and guidance, and which is really fun to be part of.\n", + "content": "I was honored to be invited to the ISMB 2024 congress to speak at the session organised by the COSI (Community of Special Interest) of Bioinformatics Cores. This session brought together bioinformatics professionals from around the world who manage bioinformatics facilities in different institutions to share experiences, discuss challenges, and explore solutions for managing and analyzing large-scale biological data. In this session, I had the opportunity to introduce Nextflow, and discuss how its adoption can help bioinformatics cores to address some of the most common challenges they face. From managing complex pipelines to optimizing resource utilization, Nextflow offers a range of benefits that can streamline workflows and improve productivity. In this blog, I'll summarize my talk and share insights on how Nextflow can help overcome some of those challenges, including meeting the needs of a wide range of users or customers, automate reporting, customising pipelines and training.\n\n### Challenge 1: running multiple services\n\n_Challenge description: “I have a wide range of stakeholders, and my pipelines need to address different needs in multiple scientific domains”_\n\nOne of the biggest challenges faced by bioinformatics cores is catering to a diverse range of users with varying applications. On one hand, one might need to run analyses for researchers focused on cancer or human genetics. On the other hand, one may also need to support scientists working with mass spectrometry or metagenomics. Fortunately, the nf-core community has made it relatively easy to tackle these diverse needs with their curated pipelines. These pipelines are ready to use, covering a broad spectrum of applications, from genomics and metagenomics to immunology and mass spectrometry. In one of my slides I showed a non-exhaustive list, which spans genomics, metagenomics, immunology, mass spec, and more: one can find best-practice pipelines for almost any bioinformatics application imaginable, including emerging areas like imaging and spatial-omics. By leveraging this framework, one can not only tap into the expertise of the pipeline developers but also engage with them to discuss specific needs and requirements. 
This collaborative approach can significantly ease the deployment of a workflow, allowing the user to focus on high-priority tasks while ensuring that the analyses are always up to date and aligned with current best practices.\n\n### Challenge 2: customising applications\n\n_Challenge description: “We often need to customise our applications and pipeline, to meet specific in-house needs of our users”_\n\nWhile ready-to-use applications are a huge advantage, there are times when customisation is necessary. Perhaps the standard pipeline that works for most users doesn't quite meet the specific needs of a facility’s user or customer. Fortunately, the nf-core community has got these cases covered. With over 1,300 modules at everyone’s disposal, one can easily compose their own pipeline using the nf-core components and tooling. Should that not be enough though, one can even create a pipeline from scratch using nf-core tools. For instance, one can run a simple command like “nf-core create” followed by the name of the pipeline, and voilà! The software package will create a complete skeleton for the pipeline, filled with pre-written code and placeholders to ease customisation. This process is incredibly quick, as I demonstrated in a video clip during the talk, where a pipeline skeleton was created in just a few moments.\n\nOf course, customisation isn't limited to pipelines. It also applies to containers, which are a crucial enabler of portability. When it comes to containers, Nextflow users have two options: an easy way and a more advanced approach. The easy way involves using Seqera Containers, a platform that allows anyone to compose a container using tools from bioconda, pypi, and conda-forge. No need to log in: just select the tools, and the URL of your container will be made available in no time. One can build containers for either Docker or Singularity, and for different platforms (amd64 or arm64).\n\nIf one is looking for more control, they can use Wave as a command-line tool. This is a powerful tool that can act as an intermediary between the user and a container registry. Wave builds containers on the fly, allowing anyone to pass a wave build command as a command substitution inside a docker run command. It's incredibly fast, and builds containers from conda packages in a matter of seconds. Wave, which is also the engine behind Seqera Containers, can be extremely handy for other operations, like container augmentation. This feature enables a user to add new layers to existing containers without having to rebuild them, thanks to Docker's layer-based architecture. One can simply create a folder containing configuration files or executable scripts, pass the folder to Wave, which will add it as a new layer, and get the URL of the augmented container on the fly.\n\n### Challenge 3: Reporting\n\n_Challenge description: “I need to deliver a clear report of the analysis results, in a format that is accessible and can be used for publication purposes by my users”_\n\nReporting is a crucial aspect of any bioinformatics pipeline, and as for customisation, Nextflow offers different ways to approach it, suitable for different levels of expertise. The most straightforward solution involves running MultiQC, a tool that collects the output and logs of a wide range of software in a pipeline and generates a nicely formatted HTML report. This is a great option if one wants a quick and easy way to get a summary of their pipeline's results. 
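As a rough sketch (the process name, published directory, output file, and container below are illustrative placeholders rather than code taken from any particular pipeline), such a MultiQC step can be expressed as an ordinary Nextflow process that collects the upstream log files and renders the report:\n\n```groovy\nprocess MULTIQC {\n    // Illustrative container: any image or conda environment providing MultiQC works here\n    container 'docker.io/multiqc/multiqc:latest'\n\n    // Copy the finished report out of the work directory\n    publishDir 'results/multiqc', mode: 'copy'\n\n    input:\n    path logs\n\n    output:\n    path 'multiqc_report.html'\n\n    script:\n    \"\"\"\n    multiqc .\n    \"\"\"\n}\n```\n\nIn the workflow block, the collected outputs of upstream processes would then be passed in with something like `MULTIQC(ch_logs.collect())`, where `ch_logs` is a channel gathering the per-tool logs.\n\n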
MultiQC is a widely used tool that supports a huge (and growing) list of bioinformatics tools and file formats, making it a great choice for many use cases.\n\nHowever, if the developer needs more control over the reporting process or wants to create a custom report that meets some specific needs, it is entirely possible to engineer the reports from scratch. This involves collecting the outputs from various processes in the pipeline and passing them as an input to a process that runs an R Markdown or Quarto script. R Markdown and Quarto are popular tools for creating dynamic documents that can be parameterised, allowing anyone to customize the content and the layout of a report dynamically.\nBy using this approach, one can create a report that is tailored to specific needs, including the types of plots and visualizations to include, the formatting and layout, branding, and anything specific one might want to highlight.\n\nTo follow this approach, the user can either create their own customised module, or re-use one of the available notebook modules in the nf-core repository (quarto [here](https://github.com/nf-core/modules/tree/master/modules/nf-core/quartonotebook), or jupyter [here](https://github.com/nf-core/modules/tree/master/modules/nf-core/jupyternotebook)).\n\n### Challenge 4: Monitoring\n\n_Challenge description: “I need to be able to estimate and optimise runtimes as well as costs of my pipelines, fitting our cost model”_\n\nMonitoring is a critical aspect of pipeline management, and Nextflow provides a robust set of tools to help you track and optimise a pipeline's performance. At its core, monitoring involves tracking the execution of the pipeline to ensure that it's running efficiently and effectively. But it's not just about knowing how long a pipeline takes to run or how much it costs - it's also about making sure each process in the pipeline is using the requested resources efficiently.\nWith Nextflow, the user can track the resources used by each process in their pipeline, including CPU, memory, and disk usage, and compare them visually with the resources requested in the pipeline configuration and reserved by each job. This information allows the user to identify bottlenecks and areas for optimisation, so one can fine-tune their pipeline for better resource consumption. For example, if the user notices that one process is using a disproportionate amount of memory, they can adjust the configuration to better match the actual usage.\n\nBut monitoring isn't just about optimising a pipeline's performance - it's also about reducing the environmental impact where possible. A recently developed Nextflow plugin makes it possible to track the carbon footprint of a pipeline, including the energy consumption and greenhouse gas emissions associated with running that pipeline. This information allows one to make informed decisions about their environmental impact, and to gain better awareness or even adopt greener approaches to computing.\n\nOne of the key benefits of Nextflow’s monitoring system is its flexibility. The user can either use the built-in HTML reports for trace and pipeline execution, or monitor a run live by connecting to Seqera Platform and visualising its progress on a graphical interface in real time. 
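As a minimal sketch (the file names are arbitrary choices, not Nextflow defaults), these built-in reports can be switched on from `nextflow.config`, or equivalently with the `-with-report`, `-with-trace` and `-with-timeline` command-line options:\n\n```groovy\n// nextflow.config: enable the built-in execution reports\ntrace {\n    enabled = true\n    file    = 'pipeline_trace.txt'\n}\n\nreport {\n    enabled = true\n    file    = 'execution_report.html'\n}\n\ntimeline {\n    enabled = true\n    file    = 'execution_timeline.html'\n}\n```\n\n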
More expert or creative users could use the trace file produced by a Nextflow execution to create their own metrics and visualisations.\n\n### Challenge 5: User accessibility\n\n_Challenge description: “I could balance workloads better, by giving users a certain level of autonomy in running some of my pipelines”_\n\nUser accessibility is a crucial aspect of pipeline development, as it enables users with varying levels of bioinformatics experience to run complex pipelines with ease. One of the advantages of Nextflow is that a developer can create pipelines that are not only robust and efficient but also user-friendly. Allowing your users to run them with a certain level of autonomy might be a good strategy in a bioinformatics core to decentralise straightforward analyses and invest human resources in more complex projects. Empowering a facility’s users to run specific pipelines independently could be a solution to reduce certain workloads.\n\nThe nf-core template includes a parameters schema, which is captured by the nf-core website to create a graphical interface for parameter configuration of the pipelines hosted under the nf-core organisation on GitHub. This interface allows users to fill in the necessary fields for the parameters needed to run a pipeline, and enables even users with minimal experience with bioinformatics or command-line interfaces to quickly set up a run. The user can then simply copy and paste the command generated by the webpage into a terminal, and the pipeline will launch as configured. This approach is ideal for users who are familiar with basic computer tasks and have at least a minimal familiarity with a terminal.\n\nHowever, for users with even less bioinformatics experience, Nextflow and the nf-core template together offer an even more intuitive solution. The pipeline can be added to the launcher of the Seqera Platform, and one can provide users with a comprehensive and user-friendly interface that allows them to launch pipelines with ease. This platform offers a range of features, including access to datasets created from sample sheets, and the ability to launch pipelines on a wide range of cloud environments as well as on on-premises HPC. A simple graphical interface simplifies the entire process. In this way, the Seqera Platform provides a seamless and intuitive experience for users, allowing them to run pipelines without requiring extensive bioinformatics knowledge.\n\n### Challenge 6: Training\n\n_Challenge description: “Training my team and especially onboarding new team members is always challenging and requires documentation and good materials”_\n\nThe final challenge we often face in bioinformatics facilities is training. We all know that training is an ongoing issue, not just because of staff turnover and the need to onboard new recruits, but also because the field is constantly evolving. With new tools, techniques, and technologies emerging all the time, it can be difficult to keep up with the latest developments. However, training is crucial for ensuring that pipelines are robust, efficient, and accurate.\n\nFortunately, there are now many resources available to help with training. The Nextflow training website, for example, has been completely rebuilt recently and now offers a wealth of material suitable for everyone, from beginners to experts. Whether you're just starting out with Nextflow or are already an experienced user, you'll find plenty of resources to help you improve your skills. 
From introductory tutorials to advanced guides, the training website has everything you need to get the most out of this workflow manager.\n\nEveryone can access the material at their own pace, but regular training events have been scheduled during the year. Additionally, there is now a network of Nextflow Ambassadors who often organise local training events across the world. Without making comparisons with other solutions, I can easily say that the steep learning curve to get going with Nextflow is just a myth nowadays. The quality of the training material, the examples available, the frequency of events in person or online you can attend to, and more importantly a welcoming community of users, make learning Nextflow quite easy.\n\nIn my laboratory, usually in a couple of months bachelor students are reasonably confident with the code and with running pipelines and debugging common issues.\n\n### Conclusions\n\nIn conclusion, the presentation at ISMB has gathered quite some interest because I believe it has shown how Nextflow is a powerful and versatile tool that can help bioinformatics cores address those common challenges everyone has experienced. With its comprehensive tooling, extensive training materials, and active community of users, Nextflow offers a complete package that can help people streamline their workflows and improve their productivity.\nAlthough I might be biased on this, I also believe that by adopting Nextflow one also becomes part of a community of researchers and developers who are passionate about bioinformatics and committed to sharing their knowledge and expertise. Beginners not only will have access to a wealth of resources and tutorials, but more importantly to a supportive network of peers who can offer advice and guidance, and which is really fun to be part of.", "images": [], "author": "Francesco Lescai", "tags": "nextflow,ambassador_post" @@ -665,7 +665,7 @@ "slug": "2024/ambassador-second-call", "title": "Open call for new Nextflow Ambassadors closes June 14", "date": "2024-05-17T00:00:00.000Z", - "content": "\nNextflow Ambassadors are passionate individuals within the Nextflow community who play a more active role in fostering collaboration, knowledge sharing, and engagement. We launched this program at the Nextflow Summit in Barcelona last year, and it's been a great experience so far, so we've been recruiting more volunteers to expand the program. We’re going to close applications in June with the goal of having new ambassadors start in July, so if you’re interested in becoming an ambassador, now is your chance to apply!\n\n\n\nThe program has been off to a great start, bringing together a diverse group of 46 passionate individuals from around the globe. Our ambassadors have done a great job in their dedication to spreading the word about Nextflow, contributing significantly to the community in numerous ways, including writing insightful content, organizing impactful events, conducting training sessions, leading hackathons, and even contributing to the codebase. Their efforts have not only enhanced the Nextflow ecosystem but have also fostered a stronger, more interconnected global community.\n\nTo support their endeavors, we provide our ambassadors with exclusive swag, essential assets to facilitate their work and funding to attend events where they can promote Nextflow. With the end of the first semester fast approaching, we are excited to officially announce the second cohort of the Nextflow Ambassador program will start in July. 
If you are passionate about Nextflow and eager to make a meaningful impact, we invite you to [apply](http://seqera.typeform.com/ambassadors/) and join our vibrant community of ambassadors.\n\n**Application Details:**\n\n- **Call for Applications:** Open until June 14 (23h59 any timezone)\n- **Notification of Acceptance:** By June 30\n- **Program Start:** July 2024\n\n
\n \"Ambassadors\n
\n\nWe seek enthusiastic individuals ready to take their contribution to the next level through various initiatives such as content creation, event organization, training, hackathons, and more. As an ambassador, you will receive support and resources to help you succeed in your role, including swag, necessary assets, and funding for event participation.\n\nTo apply, please visit our [Nextflow Ambassador Program Application Page](http://seqera.typeform.com/ambassadors/) and submit your application no later than 23h59 June 14 (any timezone). The form shouldn’t take more than a few minutes to complete. We are eager to welcome a new group of ambassadors who will help support the growth and success of the Nextflow community.\n\nThanks to all our current ambassadors for their incredible work and dedication. We look forward to seeing the new ideas and initiatives that the next cohort of ambassadors will bring to the table. Together, let's continue to build a stronger, more dynamic Nextflow community.\n\n[Apply now and become a part of the Nextflow journey!](http://seqera.typeform.com/ambassadors/)\n\n---\n\nStay tuned for more updates and follow us on our [social](https://twitter.com/nextflowio) [media](https://x.com/seqeralabs) [channels](https://www.linkedin.com/company/seqera/posts/) to keep up with the latest news and events from the Nextflow community.\n", + "content": "Nextflow Ambassadors are passionate individuals within the Nextflow community who play a more active role in fostering collaboration, knowledge sharing, and engagement. We launched this program at the Nextflow Summit in Barcelona last year, and it's been a great experience so far, so we've been recruiting more volunteers to expand the program. We’re going to close applications in June with the goal of having new ambassadors start in July, so if you’re interested in becoming an ambassador, now is your chance to apply!\n\n\n\nThe program has been off to a great start, bringing together a diverse group of 46 passionate individuals from around the globe. Our ambassadors have done a great job in their dedication to spreading the word about Nextflow, contributing significantly to the community in numerous ways, including writing insightful content, organizing impactful events, conducting training sessions, leading hackathons, and even contributing to the codebase. Their efforts have not only enhanced the Nextflow ecosystem but have also fostered a stronger, more interconnected global community.\n\nTo support their endeavors, we provide our ambassadors with exclusive swag, essential assets to facilitate their work and funding to attend events where they can promote Nextflow. With the end of the first semester fast approaching, we are excited to officially announce the second cohort of the Nextflow Ambassador program will start in July. If you are passionate about Nextflow and eager to make a meaningful impact, we invite you to [apply](http://seqera.typeform.com/ambassadors/) and join our vibrant community of ambassadors.\n\n**Application Details:**\n\n- **Call for Applications:** Open until June 14 (23h59 any timezone)\n- **Notification of Acceptance:** By June 30\n- **Program Start:** July 2024\n\n
\n \"Ambassadors\n
\n\nWe seek enthusiastic individuals ready to take their contribution to the next level through various initiatives such as content creation, event organization, training, hackathons, and more. As an ambassador, you will receive support and resources to help you succeed in your role, including swag, necessary assets, and funding for event participation.\n\nTo apply, please visit our [Nextflow Ambassador Program Application Page](http://seqera.typeform.com/ambassadors/) and submit your application no later than 23h59 June 14 (any timezone). The form shouldn’t take more than a few minutes to complete. We are eager to welcome a new group of ambassadors who will help support the growth and success of the Nextflow community.\n\nThanks to all our current ambassadors for their incredible work and dedication. We look forward to seeing the new ideas and initiatives that the next cohort of ambassadors will bring to the table. Together, let's continue to build a stronger, more dynamic Nextflow community.\n\n[Apply now and become a part of the Nextflow journey!](http://seqera.typeform.com/ambassadors/)\n\n---\n\nStay tuned for more updates and follow us on our [social](https://twitter.com/nextflowio) [media](https://x.com/seqeralabs) [channels](https://www.linkedin.com/company/seqera/posts/) to keep up with the latest news and events from the Nextflow community.", "images": [ "/img/ambassadors-hackathon.jpeg" ], @@ -676,7 +676,7 @@ "slug": "2024/better-support-through-community-forum-2024", "title": "Moving toward better support through the Community forum", "date": "2024-08-28T00:00:00.000Z", - "content": "\nAs the Nextflow community continues to grow, fostering a space where users can easily find help and share knowledge is more important than ever. In this post, we’ll explore our ongoing efforts to enhance the community forum, transitioning from Slack as the primary platform for peer-to-peer support. By improving the forum’s usability and accessibility, we’re aiming to create a more efficient and welcoming environment for everyone. Read on to learn about the changes we’re implementing and how you can contribute to making the forum an even better resource for the community.\n\n\n\n
\n\nOne of the things that impressed me the most when I joined Seqera last year as a developer advocate for the Nextflow community, was how engaged people are, and how much peer-to-peer interaction there is across a vast range of scientific domains, cultures, and geographies. That’s wonderful for a number of reasons, not least of which is that whenever you run into a problem —or you’re trying to do something a bit complicated or new— it’s very likely that there is someone out there who is able and willing to help you figure it out.\n\nFor the past few months, our small team of developer advocates have been thinking about how to nurture that dynamism, and how to further improve the experience of peer-to-peer support as the Nextflow community continues to grow. We’ve come to the conclusion that the best thing we can do is make the [community forum](https://community.seqera.io/) an awesome place to go for help, answers, and resources.\n\n## Why focus on the forum?\n\nIf you’re familiar with the Nextflow Slack workspace, you know there’s a lot of activity there, and the #help channel is always hopping. It’s true, and that’s great, buuuuut using Slack has some important downsides that the forum doesn’t suffer from.\n\nOne of the standout features of the forum is the ability to search past questions and answers really easily. Whether you're browsing directly within the forum, or using Google or some other search engine, you can quickly find relevant information in a way that’s much harder to do on Slack. This means that solutions to common issues are readily accessible, saving you (and the resident experts who have already answered the same question a bunch of times) a whole lot of time and effort.\n\nAdditionally, the forum has no barrier to access— you can view all the content without the need to join yet another app. This open access ensures that everyone can benefit from the wealth of knowledge shared by community members.\n\n## Immediate improvements to the forum’s ease of use\n\nWe’re excited to roll out a few immediate changes to the forum that should make it easier and more pleasant to use.\n\n- We’re introducing a new, sleeker visual design to make navigation and posting more intuitive and enjoyable.\n\n- We’ve reorganized the categories to streamline the process of finding and providing help. Instead of having separate categories for various things (like Nextflow, Wave, Seqera Platform etc), there is now a single \"Ask for help\" category for all topics, eliminating any confusion about where to post your question. Simply put, if you need help, just post in the \"Ask for help\" category. Done.\n\nWe’re also planning to mirror existing categories from the Nextflow Slack workspace, such as the jobs board and shameless promo channels, to make that content more visible and searchable. This will help you find opportunities and promote your work more effectively.\n\n## What you can do to help\n\nThese changes are meant to make the forum a great place for peer-to-peer support for the Nextflow community. You can help us improve it further by giving us your feedback about the forum functionality (don’t be shy), by posting your questions in the forum, and of course, if you’re already a Nextflow expert, by answering questions there.\n\nCheck out the [community forum](https://community.seqera.io/) now!\n", + "content": "As the Nextflow community continues to grow, fostering a space where users can easily find help and share knowledge is more important than ever. 
In this post, we’ll explore our ongoing efforts to enhance the community forum, transitioning from Slack as the primary platform for peer-to-peer support. By improving the forum’s usability and accessibility, we’re aiming to create a more efficient and welcoming environment for everyone. Read on to learn about the changes we’re implementing and how you can contribute to making the forum an even better resource for the community.\n\n\n\n---\n\nOne of the things that impressed me the most when I joined Seqera last year as a developer advocate for the Nextflow community, was how engaged people are, and how much peer-to-peer interaction there is across a vast range of scientific domains, cultures, and geographies. That’s wonderful for a number of reasons, not least of which is that whenever you run into a problem —or you’re trying to do something a bit complicated or new— it’s very likely that there is someone out there who is able and willing to help you figure it out.\n\nFor the past few months, our small team of developer advocates have been thinking about how to nurture that dynamism, and how to further improve the experience of peer-to-peer support as the Nextflow community continues to grow. We’ve come to the conclusion that the best thing we can do is make the [community forum](https://community.seqera.io/) an awesome place to go for help, answers, and resources.\n\n## Why focus on the forum?\n\nIf you’re familiar with the Nextflow Slack workspace, you know there’s a lot of activity there, and the #help channel is always hopping. It’s true, and that’s great, buuuuut using Slack has some important downsides that the forum doesn’t suffer from.\n\nOne of the standout features of the forum is the ability to search past questions and answers really easily. Whether you're browsing directly within the forum, or using Google or some other search engine, you can quickly find relevant information in a way that’s much harder to do on Slack. This means that solutions to common issues are readily accessible, saving you (and the resident experts who have already answered the same question a bunch of times) a whole lot of time and effort.\n\nAdditionally, the forum has no barrier to access— you can view all the content without the need to join yet another app. This open access ensures that everyone can benefit from the wealth of knowledge shared by community members.\n\n## Immediate improvements to the forum’s ease of use\n\nWe’re excited to roll out a few immediate changes to the forum that should make it easier and more pleasant to use.\n\n- We’re introducing a new, sleeker visual design to make navigation and posting more intuitive and enjoyable.\n\n- We’ve reorganized the categories to streamline the process of finding and providing help. Instead of having separate categories for various things (like Nextflow, Wave, Seqera Platform etc), there is now a single \"Ask for help\" category for all topics, eliminating any confusion about where to post your question. Simply put, if you need help, just post in the \"Ask for help\" category. Done.\n\nWe’re also planning to mirror existing categories from the Nextflow Slack workspace, such as the jobs board and shameless promo channels, to make that content more visible and searchable. This will help you find opportunities and promote your work more effectively.\n\n## What you can do to help\n\nThese changes are meant to make the forum a great place for peer-to-peer support for the Nextflow community. 
You can help us improve it further by giving us your feedback about the forum functionality (don’t be shy), by posting your questions in the forum, and of course, if you’re already a Nextflow expert, by answering questions there.\n\nCheck out the [community forum](https://community.seqera.io/) now!", "images": [], "author": "Geraldine Van der Auwera", "tags": "nextflow,community" @@ -685,7 +685,7 @@ "slug": "2024/bioinformatics-growth-in-turkiye", "title": "Fostering Bioinformatics Growth in Türkiye", "date": "2024-06-12T00:00:00.000Z", - "content": "\nAfter diving into the Nextflow community, I've seen how it benefits bioinformatics in places like South Africa, Brazil, and France. I'm confident it can do the same for Türkiye by fostering collaboration and speeding up research. Since I became a Nextflow Ambassador, I am happy and excited because I can contribute to this development! Even though our first attempt to organize an introductory Nextflow workshop was online, it was a fruitful collaboration with RSG-Türkiye that initiated our effort to promote more Nextflow in Türkiye. We are happy to announce that we will organize a hands-on workshop soon.\n\n\n\nI am [Kübra Narcı](https://www.ghga.de/about-us/team-members/narci-kuebra), currently employed as a bioinformatician within the [German Human Genome Phenome Archive (GHGA) Workflows workstream](https://www.ghga.de/about-us/how-we-work/workstreams). Upon commencing this position nearly two years ago, I was introduced to Nextflow due to the necessity of transporting certain variant calling workflows here, and given my prior experience with other workflow managers, I was well-suited for the task. Though the initial two months were marked by challenges and moments of frustration, my strong perseverance ultimately led to the successful development of my first pipeline.\n\nSubsequently, owing much to the supportive Nextflow community, my interest, as well as my proficiency in the platform, steadily grew, culminating in my acceptance to the role of Nextflow Ambassador for the past six months. I jumped into the role since it was a great opportunity for GHGA and Nextflow to be connected even more.\n\n
\n \"meme\n
\n\nTransitioning into this ambassadorial role prompted a solid realization: the absence of a dedicated Nextflow community in Türkiye. This revelation was a shock, particularly given my academic background in bioinformatics there, where the community’s live engagement in workflow development is undeniable. Witnessing Turkish contributors within Nextflow and nf-core Slack workspaces further underscored this sentiment. It became evident that what was lacking was a spark for organizing events to ignite the Turkish community, a task I gladly undertook.\n\nWhile I possessed foresight regarding the establishment of a Nextflow community, I initially faced uncertainty regarding the appropriate course of action. To address this, I sought counsel from [Marcel](https://www.twitter.com/mribeirodantas), given his pivotal role in the initiation of the Nextflow community in Brazil. Following our discussion and receipt of valuable insights, it became evident that establishing connections with the appropriate community from my base in Germany was a necessity.\n\nThis attempt led me to meet with [RSG-Türkiye](https://rsgturkey.com). RSG-Türkiye aims to create a platform for students and post-docs in computational biology and bioinformatics in Türkiye. It aims to share knowledge and experience, promote collaboration, and expand training opportunities. The organization also collaborates with universities and the Bioinformatics Council, a recently established national organization as the Turkish counterpart of the ISCB (International Society for Computational Biology) to introduce industrial and academic research. To popularize the field, they have offline and online talk series in university student centers to promote computational biology and bioinformatics.\n\nFollowing our introduction, RSG-Türkiye and I hosted a workshop focusing on workflow reproducibility, Nextflow, and nf-core. We chose Turkish as the language to make it more accessible for participants who are not fluent in English. The online session lasted a bit more than an hour and attracted nearly 50 attendees, mostly university students but also individuals from the research and industry sectors. The strong student turnout was especially gratifying as it aligned with my goal of building a vibrant Nextflow community in Türkiye. I took the opportunity to discuss Nextflow’s ambassadorship and mentorship programs, which can greatly benefit students, given Türkiye’s growing interest in bioinformatics. The whole workshop was recorded and can be viewed on [YouTube](https://www.youtube.com/watch?v=AqNmIkoQrNo&ab_channel=RSG-Turkey).\n\nI am delighted to report that the workshop was a success. It was not only attracting considerable interest but also marked the commencement of a promising journey. Our collaboration with RSG-Türkiye persists, with plans underway for a more comprehensive on-site training session in Türkiye scheduled for later this year. I look forward to more engagement from Turkish participants as we work together to strengthen our community. Hopefully, this effort will lead to more Turkish-language content, new mentor relations from the core Nextflow team, and the emergence of a local Nextflow ambassador.\n\n
\n \"meme\n
\n\n## How can I contact the Nextflow Türkiye community?\n\nIf you want to help grow the Nextflow community in Türkiye, join the Nextflow and nf-core Slack workspaces and connect with Turkish contributors in the #region-turkiye channel. Don't be shy—say hello, and let's build up the community together! Feel free to contact me if you're interested in helping organize local hands-on Nextflow workshops. We welcome both advanced users and beginners. By participating, you'll contribute to the growth of bioinformatics in Türkiye, collaborate with peers, and access resources to advance your research and career.\n", + "content": "After diving into the Nextflow community, I've seen how it benefits bioinformatics in places like South Africa, Brazil, and France. I'm confident it can do the same for Türkiye by fostering collaboration and speeding up research. Since I became a Nextflow Ambassador, I am happy and excited because I can contribute to this development! Even though our first attempt to organize an introductory Nextflow workshop was online, it was a fruitful collaboration with RSG-Türkiye that initiated our effort to promote more Nextflow in Türkiye. We are happy to announce that we will organize a hands-on workshop soon.\n\n\n\nI am [Kübra Narcı](https://www.ghga.de/about-us/team-members/narci-kuebra), currently employed as a bioinformatician within the [German Human Genome Phenome Archive (GHGA) Workflows workstream](https://www.ghga.de/about-us/how-we-work/workstreams). Upon commencing this position nearly two years ago, I was introduced to Nextflow due to the necessity of transporting certain variant calling workflows here, and given my prior experience with other workflow managers, I was well-suited for the task. Though the initial two months were marked by challenges and moments of frustration, my strong perseverance ultimately led to the successful development of my first pipeline.\n\nSubsequently, owing much to the supportive Nextflow community, my interest, as well as my proficiency in the platform, steadily grew, culminating in my acceptance to the role of Nextflow Ambassador for the past six months. I jumped into the role since it was a great opportunity for GHGA and Nextflow to be connected even more.\n\n
\n \"meme\n
\n\nTransitioning into this ambassadorial role prompted a solid realization: the absence of a dedicated Nextflow community in Türkiye. This revelation was a shock, particularly given my academic background in bioinformatics there, where the community’s live engagement in workflow development is undeniable. Witnessing Turkish contributors within Nextflow and nf-core Slack workspaces further underscored this sentiment. It became evident that what was lacking was a spark for organizing events to ignite the Turkish community, a task I gladly undertook.\n\nWhile I possessed foresight regarding the establishment of a Nextflow community, I initially faced uncertainty regarding the appropriate course of action. To address this, I sought counsel from [Marcel](https://www.twitter.com/mribeirodantas), given his pivotal role in the initiation of the Nextflow community in Brazil. Following our discussion and receipt of valuable insights, it became evident that establishing connections with the appropriate community from my base in Germany was a necessity.\n\nThis attempt led me to meet with [RSG-Türkiye](https://rsgturkey.com). RSG-Türkiye aims to create a platform for students and post-docs in computational biology and bioinformatics in Türkiye. The platform aims to share knowledge and experience, promote collaboration, and expand training opportunities. The organization also collaborates with universities and the Bioinformatics Council, a recently established national organization and the Turkish counterpart of the ISCB (International Society for Computational Biology), to introduce industrial and academic research. To popularize the field, they have offline and online talk series in university student centers to promote computational biology and bioinformatics.\n\nFollowing our introduction, RSG-Türkiye and I hosted a workshop focusing on workflow reproducibility, Nextflow, and nf-core. We chose Turkish as the language to make it more accessible for participants who are not fluent in English. The online session lasted a bit more than an hour and attracted nearly 50 attendees, mostly university students but also individuals from the research and industry sectors. The strong student turnout was especially gratifying as it aligned with my goal of building a vibrant Nextflow community in Türkiye. I took the opportunity to discuss Nextflow’s ambassadorship and mentorship programs, which can greatly benefit students, given Türkiye’s growing interest in bioinformatics. The whole workshop was recorded and can be viewed on [YouTube](https://www.youtube.com/watch?v=AqNmIkoQrNo&ab_channel=RSG-Turkey).\n\nI am delighted to report that the workshop was a success. It not only attracted considerable interest but also marked the commencement of a promising journey. Our collaboration with RSG-Türkiye persists, with plans underway for a more comprehensive on-site training session in Türkiye scheduled for later this year. I look forward to more engagement from Turkish participants as we work together to strengthen our community. Hopefully, this effort will lead to more Turkish-language content, new mentor relations from the core Nextflow team, and the emergence of a local Nextflow ambassador.\n\n
\n \"meme\n
\n\n## How can I contact the Nextflow Türkiye community?\n\nIf you want to help grow the Nextflow community in Türkiye, join the Nextflow and nf-core Slack workspaces and connect with Turkish contributors in the #region-turkiye channel. Don't be shy—say hello, and let's build up the community together! Feel free to contact me if you're interested in helping organize local hands-on Nextflow workshops. We welcome both advanced users and beginners. By participating, you'll contribute to the growth of bioinformatics in Türkiye, collaborate with peers, and access resources to advance your research and career.", "images": [ "/img/blog-2024-06-12-turkish_workshop1a.png", "/img/blog-2024-06-12-turkish_workshop2a.png" @@ -697,7 +697,7 @@ "slug": "2024/empowering-bioinformatics-mentoring", "title": "Empowering Bioinformatics: Mentoring Across Continents with Nextflow", "date": "2024-04-25T00:00:00.000Z", - "content": "\nIn my journey with the nf-core Mentorship Program, I've mentored individuals from Malawi, Chile, and Brazil, guiding them through Nextflow and nf-core. Despite the distances, my mentees successfully adapted their workflows, contributing to the open-source community. Witnessing the transformative impact of mentorship firsthand, I'm encouraged to continue participating in future mentorship efforts and urge others to join this rewarding experience. But how did it all start?\n\n\n\nI’m [Robert Petit](https://www.robertpetit.com/), a bioinformatician at the [Wyoming Public Health Laboratory](https://health.wyo.gov/publichealth/lab/), in [Wyoming, USA](https://en.wikipedia.org/wiki/Wyoming). If you don’t know where that is, haha that’s fine, I’m pretty sure half the people in the US don’t know either! Wyoming is the 10th largest US state (253,000 km2), but the least populated with only about 580,000 people. It’s home to some very beautiful mountains and national parks, large animals including bears, wolves and the fastest land animal in the northern hemisphere, the Pronghorn. But it’s rural, can get cold (-10 C) and the high wind speeds (somedays average 50 kmph, with gusts 100+ kmph) only make it feel colder during the winter (sometimes feeling like -60 C to -40 C). You might be wondering:\n\nHow did some random person from Wyoming get involved in the nf-core Mentorship Program, and end up being the only mentor to have participated in all three rounds?\n\nI’ve been in the Nextflow world for over 7 years now (as of 2024), when I first converted a pipeline, [Staphopia](https://staphopia.github.io/) from Ruffus to Nextflow. Eventually, I would develop [Bactopia](https://bactopia.github.io/latest/), one of the leading and longest maintained (5 years now!) Nextflow pipelines for the analysis of Bacterial genomes. Through Bactopia, I’ve had the opportunity to help people all around the world get started using Nextflow and analyzing their own bacterial sequencing. It has also allowed me to make numerous contributions to nf-core, mostly through the nf-core/modules. So, when I heard about the opportunity to be a mentor in the nf-core’s Mentorship Program, I immediately applied.\n\nRound 1! To be honest, I didn’t know what to expect from the program. Only that I would help a mentee with whatever they needed related to Nextflow and nf-core. Then at the first meeting, I learned I would be working with Phil Ashton the Lead Bioinformatcian at Malawi Liverpool Wellcome Trust, in Blantyre, Malawi, and immediately sent him a “Yo!”. 
Phil and I had run into each other in the past because when it comes to bacterial genomics, the field is very small! Phil’s goal was to get Nextflow pipelines running on their infrastructure in Malawi to help with their public health response. We would end up using Bactopia as the model. But this mentorship wasn’t just about “running Bactopia”, for Phil it was important we built a basic understanding of how things are working on the back-end with Nextflow. In the end, Phil was able to get Nextflow, and Bactopia running, using Singularity, but also gain a better understanding of Nextflow by writing his own Nextflow code.\n\nRound 2! When Round 2 was announced, I didn’t hesitate to apply again as a mentor. This time, I would be paired up with Juan Ugalde, an Assistant Professor at Universidad Andres Bello in Santiago, Chile. I think Juan and I were both excited by this, as similar to Phil, Juan and I had run into each other (virtually) through MetaSub, a project to sequence samples taken from public transport systems across the globe. Like many during the COVID-19 pandemic, Juan was pulled into the response, during which he began looking into Nextflow for other viruses. In particular, hantavirus, a public health concern due to it being endemic in parts of Chile. Juan had developed a pipeline for hantavirus sequence analysis, and his goal was to convert it into Nextflow. Throughout this Juan got to learn about the nf-core community and Nextflow development, which he was successful at! As he was able to convert his pipeline into Nextflow and make it publicly available as [hantaflow](https://github.com/microbialds/hantaflow).\n\nRound 3! Well Round 3 almost didn’t happen for me, but I’m glad it did happen! At the first meeting, I learned I would be paired with Ícaro Maia Santos de Castro, at the time a PhD candidate at the University of São Paulo, in São Paulo, Brazil. We quickly learned we were both fans of One Piece, as Ícaro’s GitHub picture was Luffy from One Piece, haha and my background included a poster from One Piece. With Ícaro, we were starting with the basics of Nextflow (e.g. the nf-core training materials) with the goal of writing a Nextflow pipeline for his meta-transcriptomics dissertation work. We set the goal to develop his Nextflow pipeline, before an overseas move he had a few months away. He brought so many questions, his motivation never waned, and once he was asking questions about Channel Operators, I knew he was ready to write his pipeline. While writing his pipeline he learned about the nf-core/tools and also got to submit a new recipe to Bioconda, and modules to nf-core. By the end of the mentorship, Ícaro had succeeded in writing his pipeline in Nextflow and making it publicly available at [phiflow](https://github.com/icaromsc/nf-core-phiflow).\n\n
\n \"phiflow\n

Metromap of the phiflow workflow

\n
\n\nThrough all three rounds, I had the opportunity to work with some incredible people! But the awesomeness didn’t end with my mentees. One thing that always stuck out to me was how motivated everyone was, both mentees and mentors. There was a sense of excitement and real progress was being made by every group. After the first round ended, I remember thinking to myself, “how could it get better?” Haha, well it did, and it continued to get better and better in Rounds 2 and 3. I think this is a great testament to the organizers at nf-core that put it all together, the mentors and mentees, and the community behind Nextflow and nf-core.\n\nFor the future mentees in mentorship opportunities! Please don’t let yourself stop you from applying. Whether it’s a time issue, or a fear of not having enough experience to be productive. In each round, we’ve had people from all over the world, starting from the ground with no experience, to some mentees in which I wondered if maybe they should have been a mentor (some mentees did end up being a mentor in the last round!). As a mentee, it is a great opportunity to work directly with a mentor dedicated to seeing you grow and build confidence when it comes to Nextflow and bioinformatics. In addition, you will be introduced to the incredible community that is behind Nextflow and nf-core. I think you will quickly learn there are so many people in this community that are willing to help!\n\nFor the future mentors! It’s always awesome to be able to help others learn, but sometimes the mentor needs to learn too! For me, I found the nf-core Mentorship Program to be a great opportunity to improve my skills as a mentor. But it wasn’t just from working with my mentees. During each round I was surrounded by many great role models in the form of mentors and mentees to learn from. No two groups ever had the same goals, so you really get the chance to see so many different styles of mentorship being implemented, all producing significant results for each mentee. Like I told the mentees, if the opportunity comes up again, take the chance and apply to be a mentor!\n\nThere have now been three rounds of the nf-core Mentorship Program, and I am very proud to have been a mentor in each round! During this I have learned so much and been able to help my mentees and the community grow. I look forward to seeing what the future holds for the mentorship opportunities in the Nextflow community, and I encourage potential mentors and mentees to consider joining the program!\n", + "content": "In my journey with the nf-core Mentorship Program, I've mentored individuals from Malawi, Chile, and Brazil, guiding them through Nextflow and nf-core. Despite the distances, my mentees successfully adapted their workflows, contributing to the open-source community. Witnessing the transformative impact of mentorship firsthand, I'm encouraged to continue participating in future mentorship efforts and urge others to join this rewarding experience. But how did it all start?\n\n\n\nI’m [Robert Petit](https://www.robertpetit.com/), a bioinformatician at the [Wyoming Public Health Laboratory](https://health.wyo.gov/publichealth/lab/), in [Wyoming, USA](https://en.wikipedia.org/wiki/Wyoming). If you don’t know where that is, haha that’s fine, I’m pretty sure half the people in the US don’t know either! Wyoming is the 10th largest US state (253,000 km2), but the least populated with only about 580,000 people. 
It’s home to some very beautiful mountains and national parks, large animals including bears, wolves and the fastest land animal in the northern hemisphere, the Pronghorn. But it’s rural, can get cold (-10 C) and the high wind speeds (some days averaging 50 kmph, with gusts 100+ kmph) only make it feel colder during the winter (sometimes feeling like -60 C to -40 C). You might be wondering:\n\nHow did some random person from Wyoming get involved in the nf-core Mentorship Program, and end up being the only mentor to have participated in all three rounds?\n\nI’ve been in the Nextflow world for over 7 years now (as of 2024), since I first converted a pipeline, [Staphopia](https://staphopia.github.io/), from Ruffus to Nextflow. Eventually, I would develop [Bactopia](https://bactopia.github.io/latest/), one of the leading and longest maintained (5 years now!) Nextflow pipelines for the analysis of bacterial genomes. Through Bactopia, I’ve had the opportunity to help people all around the world get started using Nextflow and analyzing their own bacterial sequencing. It has also allowed me to make numerous contributions to nf-core, mostly through nf-core/modules. So, when I heard about the opportunity to be a mentor in the nf-core Mentorship Program, I immediately applied.\n\nRound 1! To be honest, I didn’t know what to expect from the program. Only that I would help a mentee with whatever they needed related to Nextflow and nf-core. Then at the first meeting, I learned I would be working with Phil Ashton, the Lead Bioinformatician at Malawi Liverpool Wellcome Trust, in Blantyre, Malawi, and immediately sent him a “Yo!”. Phil and I had run into each other in the past because when it comes to bacterial genomics, the field is very small! Phil’s goal was to get Nextflow pipelines running on their infrastructure in Malawi to help with their public health response. We would end up using Bactopia as the model. But this mentorship wasn’t just about “running Bactopia”; for Phil it was important we built a basic understanding of how things work on the back-end with Nextflow. In the end, Phil was able to get Nextflow and Bactopia running using Singularity, but also gained a better understanding of Nextflow by writing his own Nextflow code.\n\nRound 2! When Round 2 was announced, I didn’t hesitate to apply again as a mentor. This time, I would be paired up with Juan Ugalde, an Assistant Professor at Universidad Andres Bello in Santiago, Chile. I think Juan and I were both excited by this, as, similar to Phil, Juan and I had run into each other (virtually) through MetaSub, a project to sequence samples taken from public transport systems across the globe. Like many during the COVID-19 pandemic, Juan was pulled into the response, during which he began looking into Nextflow for other viruses. In particular, hantavirus, a public health concern because it is endemic in parts of Chile. Juan had developed a pipeline for hantavirus sequence analysis, and his goal was to convert it into Nextflow. Throughout this, Juan got to learn about the nf-core community and Nextflow development, and he was successful: he was able to convert his pipeline into Nextflow and make it publicly available as [hantaflow](https://github.com/microbialds/hantaflow).\n\nRound 3! Well Round 3 almost didn’t happen for me, but I’m glad it did happen! At the first meeting, I learned I would be paired with Ícaro Maia Santos de Castro, at the time a PhD candidate at the University of São Paulo, in São Paulo, Brazil. 
We quickly learned we were both fans of One Piece, as Ícaro’s GitHub picture was Luffy from One Piece, haha and my background included a poster from One Piece. With Ícaro, we were starting with the basics of Nextflow (e.g. the nf-core training materials) with the goal of writing a Nextflow pipeline for his meta-transcriptomics dissertation work. We set the goal to develop his Nextflow pipeline, before an overseas move he had a few months away. He brought so many questions, his motivation never waned, and once he was asking questions about Channel Operators, I knew he was ready to write his pipeline. While writing his pipeline he learned about the nf-core/tools and also got to submit a new recipe to Bioconda, and modules to nf-core. By the end of the mentorship, Ícaro had succeeded in writing his pipeline in Nextflow and making it publicly available at [phiflow](https://github.com/icaromsc/nf-core-phiflow).\n\n
\n \"phiflow\n\nMetromap of the [phiflow](https://github.com/icaromsc/nf-core-phiflow) workflow\n\n
\n\nThrough all three rounds, I had the opportunity to work with some incredible people! But the awesomeness didn’t end with my mentees. One thing that always stuck out to me was how motivated everyone was, both mentees and mentors. There was a sense of excitement and real progress was being made by every group. After the first round ended, I remember thinking to myself, “how could it get better?” Haha, well it did, and it continued to get better and better in Rounds 2 and 3. I think this is a great testament to the organizers at nf-core that put it all together, the mentors and mentees, and the community behind Nextflow and nf-core.\n\nFor the future mentees in mentorship opportunities! Please don’t let yourself stop you from applying. Whether it’s a time issue, or a fear of not having enough experience to be productive. In each round, we’ve had people from all over the world, starting from the ground with no experience, to some mentees in which I wondered if maybe they should have been a mentor (some mentees did end up being a mentor in the last round!). As a mentee, it is a great opportunity to work directly with a mentor dedicated to seeing you grow and build confidence when it comes to Nextflow and bioinformatics. In addition, you will be introduced to the incredible community that is behind Nextflow and nf-core. I think you will quickly learn there are so many people in this community that are willing to help!\n\nFor the future mentors! It’s always awesome to be able to help others learn, but sometimes the mentor needs to learn too! For me, I found the nf-core Mentorship Program to be a great opportunity to improve my skills as a mentor. But it wasn’t just from working with my mentees. During each round I was surrounded by many great role models in the form of mentors and mentees to learn from. No two groups ever had the same goals, so you really get the chance to see so many different styles of mentorship being implemented, all producing significant results for each mentee. Like I told the mentees, if the opportunity comes up again, take the chance and apply to be a mentor!\n\nThere have now been three rounds of the nf-core Mentorship Program, and I am very proud to have been a mentor in each round! During this I have learned so much and been able to help my mentees and the community grow. I look forward to seeing what the future holds for the mentorship opportunities in the Nextflow community, and I encourage potential mentors and mentees to consider joining the program!", "images": [ "/img/blog-2024-04-25-mentorship-img1a.png" ], @@ -708,7 +708,7 @@ "slug": "2024/experimental-cleanup-with-nf-boost", "title": "Experimental cleanup with nf-boost", "date": "2024-08-08T00:00:00.000Z", - "content": "\n### Backstory\n\nWhen I (Ben) was in grad school, I worked on a Nextflow pipeline called [GEMmaker](https://github.com/systemsgenetics/gemmaker), an RNA-seq analysis pipeline similar to [nf-core/rnaseq](https://github.com/nf-core/rnaseq). We quickly ran into a problem, which is that on large runs, we were running out of storage! As it turns out, it wasn’t the final outputs, but the intermediate outputs (the BAM files, etc) that were taking up so much space, and we figured that if we could just delete those intermediate files sooner, we might be able to make it through a pipeline run without running out of storage. We were far from alone.\n\n\n\nAutomatic cleanup is currently the [oldest open issue](https://github.com/nextflow-io/nextflow/issues/452) on the Nextflow repository. 
For many users, the ability to quickly delete intermediate files makes the difference between a run being possible or impossible. [Stephen Ficklin](https://github.com/spficklin), the creator of GEMmaker, came up with a clever way to delete intermediate files and even “trick” Nextflow into skipping deleted tasks on a resumed run, which you can read about in the GitHub issue. It involved wiring the intermediate output channels to a “cleanup” process, along with a “done” signal from the relevant downstream processes to ensure that the intermediates were deleted at the right time.\n\nThis hack worked, but it required a lot of manual effort to wire up the cleanup process correctly, and it left me wondering whether it could be done automatically. Nextflow should be able to analyze the DAG, figure out when an output file can be deleted, and then delete it! During my time on the Nextflow team, I have implemented this exact idea in a [pull request](https://github.com/nextflow-io/nextflow/pull/3849), but there are still a few challenges to resolve, such as resuming from deleted runs (which is not as impossible as it sounds).\n\n### Introducing nf-boost: experimental features for Nextflow\n\nMany users have told me that they would gladly take the cleanup without the resume, so I found a way to provide the cleanup functionality in a plugin, which I call [nf-boost](https://github.com/bentsherman/nf-boost). This plugin is not just about automatic cleanup – it contains a variety of experimental features, like new operators and functions, that anyone can try today with a few extra lines of config, which is much less tedious than building Nextflow from a pull request. Not every new feature can be implemented via plugin, but for those features that can, it’s nice for the community to be able to try it out before we make it official.\n\nThe nf-boost plugin requires Nextflow v23.10.0 or later. You can enable the experimental cleanup by adding the following lines to your config file:\n\n```groovy\nplugins {\n id 'nf-boost'\n}\n\nboost {\n cleanup = true\n}\n```\n\n### Automatic cleanup: how it works\n\nThe strategy of automatic cleanup is simple:\n\n1. As soon as an output file can be deleted, delete it\n2. An output file can be deleted when (1) all downstream tasks that use the output file as an input have completed AND (2) the output file has been published (if it needs to be published)\n\nIn practice, the conditions for 2(a) are tricky to get right because Nextflow doesn’t know the full task graph from the start (thanks to the flexibility of Nextflow’s dataflow operators). But you don’t have to worry about any of that because we already figured out how to make it work! All you have to do is flip a switch (`boost.cleanup = true`) and enjoy the ride.\n\n### Real-world example\n\nLet’s consider a variant calling pipeline following standard best practices. Sequencing reads are mapped onto the genome, producing a BAM file which will be marked for duplicates, filtered, recalibrated using GATK, etc. This means that, for a given sample, at least four copies of the BAM file will be stored in the work directory. In other words, for an initial paired-end whole-exome sequencing (WES) sample of 12 GB, the work directory will quickly grow to 50 GB just to store the BAM files for one sample, or 100 GB for a paired sample (e.g. germline and tumor).\n\nNow suppose that we want to analyze a cohort of 100 patients – that’s ~10 TB of intermediate data, which is a real problem. 
For some users, it means processing only a few samples at a time, even though they might have the compute capacity to do much more. For others, it means not being able to process even one sample, because the accumulated intermediate data is simply too large. With automatic cleanup, Nextflow should be able to delete the previous BAM as soon as the next BAM is produced, for each sample independently.\n\nWe tested this use-case with a paired WES sample (total input size of 26.8 GB), by tracking the work directory size for a run with and a run without automatic cleanup. The results are shown below.\n\n\"disk\n\n_Note: we also changed the `boost.cleanupInterval` config option to 180 seconds, which was more optimal for our system._\n\nAs expected, we see that without automatic cleanup, the size of the work directory reaches 110 GB when all BAM files are produced and never deleted. On the other hand, when the nf-boost cleanup is enabled, the work directory occasionally peaks at ~50 GB (i.e. no more than two BAM files are stored at the same time), but always returns to ~25 GB, since the previous BAM is deleted immediately after the next BAM is ready. There is no impact on the size of the results (since they are identical) or the total runtime (since cleanup happens in parallel with the workflow itself).\n\nIn this case, automatic cleanup reduced the total storage by 50-75% (depending on how you measure the storage). In general, the effectiveness of automatic cleanup will depend greatly on how you write your pipeline. Here are a few rules of thumb that we’ve come up with so far:\n\n- As your pipeline becomes “deeper” (i.e. more processing steps in sequence), automatic cleanup becomes more effective, because it only needs to keep two steps’ worth of data, regardless of the total number of steps\n- As your pipeline becomes “wider” (i.e. more inputs being processed in parallel), automatic cleanup should have roughly the same level of effectiveness. If some samples take longer to process than others, the peak storage should be lower with automatic cleanup, since the “peaks” for each sample will happen at different times.\n- As you add more dependencies between processes, automatic cleanup becomes less effective, because it has to wait longer before it can delete the upstream outputs. Note that each output is tracked independently, so for example, sending logs to a summary process won’t affect the cleanup of other outputs from that same process.\n\n### Closing thoughts\n\nAutomatic cleanup in nf-boost is an experimental feature, and notably does not support resumability, meaning that the deleted files will simply be re-executed on a resumed run. While we work through these last few challenges, the nf-boost plugin is a nice option for users who want to benefit from what we’ve built so far and don’t need the resumability.\n\nThe nice thing about nf-boost’s automatic cleanup is that it is just a preview of what will eventually be the “official” cleanup feature in Nextflow (when it is merged), so by using nf-boost, you are helping the future of Nextflow directly! We hope that this experimental version will help users run workloads that were previously difficult or even impossible, and we look forward to when we can bring this feature home to Nextflow.\n", + "content": "### Backstory\n\nWhen I (Ben) was in grad school, I worked on a Nextflow pipeline called [GEMmaker](https://github.com/systemsgenetics/gemmaker), an RNA-seq analysis pipeline similar to [nf-core/rnaseq](https://github.com/nf-core/rnaseq). 
We quickly ran into a problem, which is that on large runs, we were running out of storage! As it turns out, it wasn’t the final outputs, but the intermediate outputs (the BAM files, etc) that were taking up so much space, and we figured that if we could just delete those intermediate files sooner, we might be able to make it through a pipeline run without running out of storage. We were far from alone.\n\n\n\nAutomatic cleanup is currently the [oldest open issue](https://github.com/nextflow-io/nextflow/issues/452) on the Nextflow repository. For many users, the ability to quickly delete intermediate files makes the difference between a run being possible or impossible. [Stephen Ficklin](https://github.com/spficklin), the creator of GEMmaker, came up with a clever way to delete intermediate files and even “trick” Nextflow into skipping deleted tasks on a resumed run, which you can read about in the GitHub issue. It involved wiring the intermediate output channels to a “cleanup” process, along with a “done” signal from the relevant downstream processes to ensure that the intermediates were deleted at the right time.\n\nThis hack worked, but it required a lot of manual effort to wire up the cleanup process correctly, and it left me wondering whether it could be done automatically. Nextflow should be able to analyze the DAG, figure out when an output file can be deleted, and then delete it! During my time on the Nextflow team, I have implemented this exact idea in a [pull request](https://github.com/nextflow-io/nextflow/pull/3849), but there are still a few challenges to resolve, such as resuming from deleted runs (which is not as impossible as it sounds).\n\n### Introducing nf-boost: experimental features for Nextflow\n\nMany users have told me that they would gladly take the cleanup without the resume, so I found a way to provide the cleanup functionality in a plugin, which I call [nf-boost](https://github.com/bentsherman/nf-boost). This plugin is not just about automatic cleanup – it contains a variety of experimental features, like new operators and functions, that anyone can try today with a few extra lines of config, which is much less tedious than building Nextflow from a pull request. Not every new feature can be implemented via plugin, but for those features that can, it’s nice for the community to be able to try it out before we make it official.\n\nThe nf-boost plugin requires Nextflow v23.10.0 or later. You can enable the experimental cleanup by adding the following lines to your config file:\n\n```groovy\nplugins {\n id 'nf-boost'\n}\n\nboost {\n cleanup = true\n}\n```\n\n### Automatic cleanup: how it works\n\nThe strategy of automatic cleanup is simple:\n\n1. As soon as an output file can be deleted, delete it\n2. An output file can be deleted when (1) all downstream tasks that use the output file as an input have completed AND (2) the output file has been published (if it needs to be published)\n\nIn practice, the conditions for 2(a) are tricky to get right because Nextflow doesn’t know the full task graph from the start (thanks to the flexibility of Nextflow’s dataflow operators). But you don’t have to worry about any of that because we already figured out how to make it work! All you have to do is flip a switch (`boost.cleanup = true`) and enjoy the ride.\n\n### Real-world example\n\nLet’s consider a variant calling pipeline following standard best practices. 
Sequencing reads are mapped onto the genome, producing a BAM file which will be marked for duplicates, filtered, recalibrated using GATK, etc. This means that, for a given sample, at least four copies of the BAM file will be stored in the work directory. In other words, for an initial paired-end whole-exome sequencing (WES) sample of 12 GB, the work directory will quickly grow to 50 GB just to store the BAM files for one sample, or 100 GB for a paired sample (e.g. germline and tumor).\n\nNow suppose that we want to analyze a cohort of 100 patients – that’s ~10 TB of intermediate data, which is a real problem. For some users, it means processing only a few samples at a time, even though they might have the compute capacity to do much more. For others, it means not being able to process even one sample, because the accumulated intermediate data is simply too large. With automatic cleanup, Nextflow should be able to delete the previous BAM as soon as the next BAM is produced, for each sample independently.\n\nWe tested this use-case with a paired WES sample (total input size of 26.8 GB), by tracking the work directory size for a run with and a run without automatic cleanup. The results are shown below.\n\n\"disk\n\n_Note: we also changed the `boost.cleanupInterval` config option to 180 seconds, which was more optimal for our system._\n\nAs expected, we see that without automatic cleanup, the size of the work directory reaches 110 GB when all BAM files are produced and never deleted. On the other hand, when the nf-boost cleanup is enabled, the work directory occasionally peaks at ~50 GB (i.e. no more than two BAM files are stored at the same time), but always returns to ~25 GB, since the previous BAM is deleted immediately after the next BAM is ready. There is no impact on the size of the results (since they are identical) or the total runtime (since cleanup happens in parallel with the workflow itself).\n\nIn this case, automatic cleanup reduced the total storage by 50-75% (depending on how you measure the storage). In general, the effectiveness of automatic cleanup will depend greatly on how you write your pipeline. Here are a few rules of thumb that we’ve come up with so far:\n\n- As your pipeline becomes “deeper” (i.e. more processing steps in sequence), automatic cleanup becomes more effective, because it only needs to keep two steps’ worth of data, regardless of the total number of steps\n- As your pipeline becomes “wider” (i.e. more inputs being processed in parallel), automatic cleanup should have roughly the same level of effectiveness. If some samples take longer to process than others, the peak storage should be lower with automatic cleanup, since the “peaks” for each sample will happen at different times.\n- As you add more dependencies between processes, automatic cleanup becomes less effective, because it has to wait longer before it can delete the upstream outputs. Note that each output is tracked independently, so for example, sending logs to a summary process won’t affect the cleanup of other outputs from that same process.\n\n### Closing thoughts\n\nAutomatic cleanup in nf-boost is an experimental feature, and notably does not support resumability, meaning that the deleted files will simply be re-executed on a resumed run. 
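For reference, the complete configuration used for the benchmark above amounts to just a few lines. This is a sketch based only on the options named in this post; the exact duration syntax for `boost.cleanupInterval` should be checked against the nf-boost documentation:

```groovy
plugins {
    id 'nf-boost'
}

boost {
    // delete intermediate outputs as soon as they are no longer needed
    cleanup = true
    // scan for deletable files every 180 seconds, as in the benchmark above
    cleanupInterval = '180s'
}
```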
While we work through these last few challenges, the nf-boost plugin is a nice option for users who want to benefit from what we’ve built so far and don’t need the resumability.\n\nThe nice thing about nf-boost’s automatic cleanup is that it is just a preview of what will eventually be the “official” cleanup feature in Nextflow (when it is merged), so by using nf-boost, you are helping the future of Nextflow directly! We hope that this experimental version will help users run workloads that were previously difficult or even impossible, and we look forward to when we can bring this feature home to Nextflow.", "images": [ "/img/blog-2024-08-08-nfboost-img1a.png" ], @@ -719,7 +719,7 @@ "slug": "2024/how_i_became_a_nextflow_ambassador", "title": "How I became a Nextflow Ambassador!", "date": "2024-07-24T00:00:00.000Z", - "content": "\nAs a PhD student in bioinformatics, I aimed to build robust pipelines to analyze diverse datasets throughout my research. Initially, mastering Bash scripting was a time-consuming challenge, but this journey ultimately led me to become a Nextflow Ambassador, engaging actively with the expert Nextflow community.\n\n\n\nMy name is [Firas Zemzem](https://www.linkedin.com/in/firaszemzem/), a PhD student based in [Tunisia](https://www.google.com/search?q=things+to+do+in+tunisia&sca_esv=3b07b09e3325eaa7&sca_upv=1&udm=15&biw=1850&bih=932&ei=AS2eZuqnFpG-i-gPwciJyAk&ved=0ahUKEwiqrOiRsbqHAxUR3wIHHUFkApkQ4dUDCBA&uact=5&oq=things+to+do+in+tunisia&gs_lp=Egxnd3Mtd2l6LXNlcnAiF3RoaW5ncyB0byBkbyBpbiB0dW5pc2lhMgUQABiABDIGEAAYFhgeMgYQABgWGB4yBhAAGBYYHjIGEAAYFhgeMgYQABgWGB4yBhAAGBYYHjIGEAAYFhgeMgYQABgWGB4yCBAAGBYYHhgPSOIGULYDWNwEcAF4AZABAJgBfaAB9gGqAQMwLjK4AQPIAQD4AQGYAgOgAoYCwgIKEAAYsAMY1gQYR5gDAIgGAZAGCJIHAzEuMqAH_Aw&sclient=gws-wiz-serp) working with the Laboratory of Cytogenetics, Molecular Genetics, and Biology of Reproduction at CHU Farhat Hached Sousse. I was specialized in human genetics, focusing on studying genomics behind neurodevelopmental disorders. Hence Developing methods for detecting SNPs and variants related to my work was crucial step for advancing medical research and improving patient outcomes. On the other hand, pipelines integration and bioinformatics tools were essential in this process, enabling efficient data analysis, accurate variant detection, and streamlined workflows that enhance the reliability and reproducibility of our findings.\n\n## The initial nightmare of Bash\n\nDuring my master's degree, I was a steadfast user of Bash scripting. Bash had been my go-to tool for automating tasks and managing workflows in my bioinformatics projects, such as variant calling. Its simplicity and versatility made it an indispensable part of my toolkit. I was writing Bash scripts for various next-generation sequencing (NGS) high-throughput analyses, including data preprocessing, quality control, alignment, and variant calling. However, as my projects grew more complex, I began to encounter the limitations of Bash. Managing dependencies, handling parallel executions, and ensuring reproducibility became increasingly challenging. Handling the vast amount of data generated by NGS and other high-throughput technologies was cumbersome. Using Bash became a nightmare for debugging and maintaining. I spent countless hours trying to make it work, only to be met with more errors and inefficiencies. It was nearly impossible to scale for larger datasets and more complex analyses. Additionally, managing different environments and versions of tools was beyond Bash's capabilities. 
I needed a solution that could handle these challenges more gracefully.\n\n## Game-Changing Call\n\nOne evening, I received a call from my friend, Mr. HERO, a bioinformatician. As we discussed our latest projects, I vented my frustrations with Bash. Mr. HERO, as I called him, the problem-solver, mentioned a tool called Nextflow. He described how it had revolutionized his workflow, making complex pipeline management a breeze. Intrigued, I decided to look into it.\n\n## Diving Into the process\n\nReading the [documentation](https://www.nextflow.io/docs/latest/index.html) and watching [tutorials](https://training.nextflow.io/) were my first steps. Nextflow's approach to workflow management was a revelation. Unlike Bash, Nextflow was designed to address the complexities of modern computational questions. It provided a transparent, declarative syntax for defining tasks and their dependencies and supported parallel execution out of the box. The first thing I did when I decided to convert one of my existing Bash scripts into a Nextflow pipeline was to start experimenting with simple code. Doing this was no small feat. I had to rethink my approach to workflow design and embrace a new way of defining tasks and dependencies. My learning curve was not too steep, so understanding how to translate my Bash logic into Nextflow's domain-specific language (DSL) was not that hard.\n\n## Eureka Moment: First run\n\nThe first time I ran my Nextflow pipeline, I was amazed by how smoothly and efficiently it handled tasks that previously took hours to debug and execute in Bash. Nextflow managed task dependencies, parallel execution, and error handling with ease, resulting in a faster, more reliable, and maintainable pipeline. The ability to run pipelines on different computing environments, from local machines to high-performance clusters and cloud platforms, was a game-changer. Several Nextflow features were particularly valuable: Containerization Support using Docker and Singularity ensured consistency across environments; Error Handling with automatic retry mechanisms and detailed error reporting saved countless debugging hours; Portability and scalability allowed seamless execution on various platforms; Modularity facilitated the reuse and combination of processes across different pipelines, enhancing efficiency and organization; and Reproducibility features, including versioning and traceability, ensured that workflows could be reliably reproduced and shared across different research projects and teams.\n\n
\n \"meme\n
\n\n## New Horizons: Becoming a Nextflow Ambassador\n\nSwitching from Bash scripting to Nextflow was more than just adopting a new tool. It was about embracing a new mindset. Nextflow’s emphasis on scalability, reproducibility, and ease of use transformed how I approached bioinformatics. The initial effort to learn Nextflow paid off in spades, leading to more robust, maintainable, and scalable workflows. My enthusiasm and advocacy for Nextflow didn't go unnoticed. Recently, I became a Nextflow Ambassador. This role allows me to further contribute to the community, promote best practices, and support new users as they embark on their own Nextflow journeys.\n\n## Future Projects and Community Engagement\n\nCurrently I am working on developing a Nextflow pipeline with my team that will help in analyzing variants, providing valuable insights for medical and clinical applications. This pipeline aims to improve the accuracy and efficiency of variant detection, ultimately supporting better diagnostic for patients with various genetic conditions. As part of my ongoing efforts within the Nextflow community, I am planning a series of projects aimed at developing and sharing advanced Nextflow pipelines tailored to specific genetic rare disorder analyses. These initiative will include detailed tutorials, case studies, and collaborative efforts with other researchers to enhance the accessibility and utility of Nextflow for various bioinformatics applications. Additionally, I plan to host workshops and seminars to spread knowledge and best practices among my colleagues and other researchers. This will help foster a collaborative environment where we can all benefit from the power and flexibility of Nextflow.\n\n## Invitation for researchers over the world\n\nAs a Nextflow Ambassador, I invite you to become part of a dynamic group of experts and enthusiasts dedicated to advancing workflow automation. Whether you're just starting or looking to deepen your knowledge, our community offers invaluable resources, support, and networking opportunities. You can chat with us on the [Nextflow Slack Workspace](https://www.nextflow.io/slack-invite.html) and ask your questions at the [Seqera Community Forum](https://community.seqera.io).\n", + "content": "As a PhD student in bioinformatics, I aimed to build robust pipelines to analyze diverse datasets throughout my research. Initially, mastering Bash scripting was a time-consuming challenge, but this journey ultimately led me to become a Nextflow Ambassador, engaging actively with the expert Nextflow community.\n\n\n\nMy name is [Firas Zemzem](https://www.linkedin.com/in/firaszemzem/), a PhD student based in [Tunisia](https://www.google.com/search?q=things+to+do+in+tunisia&sca_esv=3b07b09e3325eaa7&sca_upv=1&udm=15&biw=1850&bih=932&ei=AS2eZuqnFpG-i-gPwciJyAk&ved=0ahUKEwiqrOiRsbqHAxUR3wIHHUFkApkQ4dUDCBA&uact=5&oq=things+to+do+in+tunisia&gs_lp=Egxnd3Mtd2l6LXNlcnAiF3RoaW5ncyB0byBkbyBpbiB0dW5pc2lhMgUQABiABDIGEAAYFhgeMgYQABgWGB4yBhAAGBYYHjIGEAAYFhgeMgYQABgWGB4yBhAAGBYYHjIGEAAYFhgeMgYQABgWGB4yCBAAGBYYHhgPSOIGULYDWNwEcAF4AZABAJgBfaAB9gGqAQMwLjK4AQPIAQD4AQGYAgOgAoYCwgIKEAAYsAMY1gQYR5gDAIgGAZAGCJIHAzEuMqAH_Aw&sclient=gws-wiz-serp) working with the Laboratory of Cytogenetics, Molecular Genetics, and Biology of Reproduction at CHU Farhat Hached Sousse. I was specialized in human genetics, focusing on studying genomics behind neurodevelopmental disorders. 
Hence Developing methods for detecting SNPs and variants related to my work was crucial step for advancing medical research and improving patient outcomes. On the other hand, pipelines integration and bioinformatics tools were essential in this process, enabling efficient data analysis, accurate variant detection, and streamlined workflows that enhance the reliability and reproducibility of our findings.\n\n## The initial nightmare of Bash\n\nDuring my master's degree, I was a steadfast user of Bash scripting. Bash had been my go-to tool for automating tasks and managing workflows in my bioinformatics projects, such as variant calling. Its simplicity and versatility made it an indispensable part of my toolkit. I was writing Bash scripts for various next-generation sequencing (NGS) high-throughput analyses, including data preprocessing, quality control, alignment, and variant calling. However, as my projects grew more complex, I began to encounter the limitations of Bash. Managing dependencies, handling parallel executions, and ensuring reproducibility became increasingly challenging. Handling the vast amount of data generated by NGS and other high-throughput technologies was cumbersome. Using Bash became a nightmare for debugging and maintaining. I spent countless hours trying to make it work, only to be met with more errors and inefficiencies. It was nearly impossible to scale for larger datasets and more complex analyses. Additionally, managing different environments and versions of tools was beyond Bash's capabilities. I needed a solution that could handle these challenges more gracefully.\n\n## Game-Changing Call\n\nOne evening, I received a call from my friend, Mr. HERO, a bioinformatician. As we discussed our latest projects, I vented my frustrations with Bash. Mr. HERO, as I called him, the problem-solver, mentioned a tool called Nextflow. He described how it had revolutionized his workflow, making complex pipeline management a breeze. Intrigued, I decided to look into it.\n\n## Diving Into the process\n\nReading the [documentation](https://www.nextflow.io/docs/latest/index.html) and watching [tutorials](https://training.nextflow.io/) were my first steps. Nextflow's approach to workflow management was a revelation. Unlike Bash, Nextflow was designed to address the complexities of modern computational questions. It provided a transparent, declarative syntax for defining tasks and their dependencies and supported parallel execution out of the box. The first thing I did when I decided to convert one of my existing Bash scripts into a Nextflow pipeline was to start experimenting with simple code. Doing this was no small feat. I had to rethink my approach to workflow design and embrace a new way of defining tasks and dependencies. My learning curve was not too steep, so understanding how to translate my Bash logic into Nextflow's domain-specific language (DSL) was not that hard.\n\n## Eureka Moment: First run\n\nThe first time I ran my Nextflow pipeline, I was amazed by how smoothly and efficiently it handled tasks that previously took hours to debug and execute in Bash. Nextflow managed task dependencies, parallel execution, and error handling with ease, resulting in a faster, more reliable, and maintainable pipeline. The ability to run pipelines on different computing environments, from local machines to high-performance clusters and cloud platforms, was a game-changer. 
Several Nextflow features were particularly valuable: Containerization Support using Docker and Singularity ensured consistency across environments; Error Handling with automatic retry mechanisms and detailed error reporting saved countless debugging hours; Portability and scalability allowed seamless execution on various platforms; Modularity facilitated the reuse and combination of processes across different pipelines, enhancing efficiency and organization; and Reproducibility features, including versioning and traceability, ensured that workflows could be reliably reproduced and shared across different research projects and teams.\n\n
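To make the contrast with Bash concrete, here is a minimal, purely illustrative sketch of what one of those converted steps can look like as a Nextflow DSL2 process. The tool, container tag, and `params.reads` pattern are hypothetical placeholders, not code from my actual pipelines:

```groovy
// A QC step that used to be a Bash for-loop over FASTQ files,
// rewritten as a Nextflow process. Parallelisation, container
// handling and retries are managed by Nextflow, not by the script.
process FASTQC {
    container 'quay.io/biocontainers/fastqc:0.12.1--hdfd78af_0'

    input:
    path reads

    output:
    path '*_fastqc.*'

    script:
    """
    fastqc ${reads}
    """
}

workflow {
    // every matching FASTQ file becomes an independent, parallel task
    Channel.fromPath(params.reads) | FASTQC
}
```

Running it with `nextflow run main.nf --reads 'data/*.fastq.gz' -with-docker` replaces what used to be an error-prone Bash loop with a portable, resumable workflow.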
\n \"meme\n
\n\n## New Horizons: Becoming a Nextflow Ambassador\n\nSwitching from Bash scripting to Nextflow was more than just adopting a new tool. It was about embracing a new mindset. Nextflow’s emphasis on scalability, reproducibility, and ease of use transformed how I approached bioinformatics. The initial effort to learn Nextflow paid off in spades, leading to more robust, maintainable, and scalable workflows. My enthusiasm and advocacy for Nextflow didn't go unnoticed. Recently, I became a Nextflow Ambassador. This role allows me to further contribute to the community, promote best practices, and support new users as they embark on their own Nextflow journeys.\n\n## Future Projects and Community Engagement\n\nCurrently I am working on developing a Nextflow pipeline with my team that will help in analyzing variants, providing valuable insights for medical and clinical applications. This pipeline aims to improve the accuracy and efficiency of variant detection, ultimately supporting better diagnostic for patients with various genetic conditions. As part of my ongoing efforts within the Nextflow community, I am planning a series of projects aimed at developing and sharing advanced Nextflow pipelines tailored to specific genetic rare disorder analyses. These initiative will include detailed tutorials, case studies, and collaborative efforts with other researchers to enhance the accessibility and utility of Nextflow for various bioinformatics applications. Additionally, I plan to host workshops and seminars to spread knowledge and best practices among my colleagues and other researchers. This will help foster a collaborative environment where we can all benefit from the power and flexibility of Nextflow.\n\n## Invitation for researchers over the world\n\nAs a Nextflow Ambassador, I invite you to become part of a dynamic group of experts and enthusiasts dedicated to advancing workflow automation. Whether you're just starting or looking to deepen your knowledge, our community offers invaluable resources, support, and networking opportunities. You can chat with us on the [Nextflow Slack Workspace](https://www.nextflow.io/slack-invite.html) and ask your questions at the [Seqera Community Forum](https://community.seqera.io).", "images": [ "/img/ZemFiras-nextflowtestpipeline-Blog.png" ], @@ -730,7 +730,7 @@ "slug": "2024/nextflow-24.04-highlights", "title": "Nextflow 24.04 - Release highlights", "date": "2024-05-27T00:00:00.000Z", - "content": "\nWe release an \"edge\" version of Nextflow every month and a \"stable\" version every six months. The stable releases are recommended for production usage and represent a significant milestone. The [release changelogs](https://github.com/nextflow-io/nextflow/releases) contain a lot of detail, so we thought we'd highlight some of the goodies that have just been released in Nextflow 24.04 stable. 
Let's get into it!\n\n:::tip\nWe also did a podcast episode about some of these changes!\nCheck it out here: [Channels Episode 41](/podcast/2024/ep41_nextflow_2404.html).\n:::\n\n## Table of contents\n\n- [New features](#new-features)\n - [Seqera Containers](#seqera-containers)\n - [Workflow output definition](#workflow-output-definition)\n - [Topic channels](#topic-channels)\n - [Process eval outputs](#process-eval-outputs)\n - [Resource limits](#resource-limits)\n - [Job arrays](#job-arrays)\n- [Enhancements](#enhancements)\n - [Colored logs](#colored-logs)\n - [AWS Fargate support](#aws-fargate-support)\n - [OCI auto pull mode for Singularity and Apptainer](#oci-auto-pull-mode-for-singularity-and-apptainer)\n - [Support for GA4GH TES](#support-for-ga4gh-tes)\n- [Fusion](#fusion)\n - [Enhanced Garbage Collection](#enhanced-garbage-collection)\n - [Increased File Handling Capacity](#increased-file-handling-capacity)\n - [Correct Publishing of Symbolic Links](#correct-publishing-of-symbolic-links)\n- [Other notable changes](#other-notable-changes)\n\n## New features\n\n### Seqera Containers\n\nA new flagship community offering was revealed at the Nextflow Summit 2024 Boston - **Seqera Containers**. This is a free-to-use container cache powered by [Wave](https://seqera.io/wave/), allowing anyone to request an image with a combination of packages from Conda and PyPI. The image will be built on demand and cached (for at least 5 years after creation). There is a [dedicated blog post](https://seqera.io/blog/introducing-seqera-pipelines-containers/) about this, but it's worth noting that the service can be used directly from Nextflow and not only through [https://seqera.io/containers/](https://seqera.io/containers/)\n\nIn order to use Seqera Containers in Nextflow, simply set `wave.freeze` _without_ setting `wave.build.repository` - for example, by using the following config for your pipeline:\n\n```groovy\nwave.enabled = true\nwave.freeze = true\nwave.strategy = 'conda'\n```\n\nAny processes in your pipeline specifying Conda packages will have Docker or Singularity images created on the fly (depending on whether `singularity.enabled` is set or not) and cached for immediate access in subsequent runs. These images will be publicly available. You can view all container image names with the `nextflow inspect` command.\n\n### Workflow output definition\n\nThe workflow output definition is a new syntax for defining workflow outputs:\n\n```groovy\nnextflow.preview.output = true // [!code ++]\n\nworkflow {\n main:\n ch_foo = foo(data)\n bar(ch_foo)\n\n publish:\n ch_foo >> 'foo' // [!code ++]\n}\n\noutput { // [!code ++]\n directory 'results' // [!code ++]\n mode 'copy' // [!code ++]\n} // [!code ++]\n```\n\nIt essentially provides a DSL2-style approach for publishing, and will replace `publishDir` once it is finalized. It also provides extra flexibility as it allows you to publish _any_ channel, not just process outputs. See the [Nextflow docs](https://nextflow.io/docs/latest/workflow.html#publishing-outputs) for more information.\n\n:::info\nThis feature is still in preview and may change in a future release.\nWe hope to finalize it in version 24.10, so don't hesitate to share any feedback with us!\n:::\n\n### Topic channels\n\nTopic channels are a new channel type introduced in 23.11.0-edge. 
A topic channel is essentially a queue channel that can receive values from multiple sources, using a matching name or \"topic\":\n\n```groovy\nprocess foo {\n output:\n val('foo'), topic: 'my-topic' // [!code ++]\n}\n\nprocess bar {\n output:\n val('bar'), topic: 'my-topic' // [!code ++]\n}\n\nworkflow {\n foo()\n bar()\n\n Channel.topic('my-topic').view() // [!code ++]\n}\n```\n\nTopic channels are particularly useful for collecting metadata from various places in the pipeline, without needing to write all of the channel logic that is normally required (e.g. using the `mix` operator). See the [Nextflow docs](https://nextflow.io/docs/latest/channel.html#topic) for more information.\n\n### Process `eval` outputs\n\nProcess `eval` outputs are a new type of process output which allows you to capture the standard output of an arbitrary shell command:\n\n```groovy\nprocess sayHello {\n output:\n eval('bash --version') // [!code ++]\n\n \"\"\"\n echo Hello world!\n \"\"\"\n}\n\nworkflow {\n sayHello | view\n}\n```\n\nThe shell command is executed alongside the task script. Until now, you would typically execute these supplementary commands in the main process script, save the output to a file or environment variable, and then capture it using a `path` or `env` output. The new `eval` output is a much more convenient way to capture this kind of command output directly. See the [Nextflow docs](https://nextflow.io/docs/latest/process.html#output-type-eval) for more information.\n\n#### Collecting software versions\n\nTogether, topic channels and eval outputs can be used to simplify the collection of software tool versions. For example, for FastQC:\n\n```groovy\nprocess FASTQC {\n input:\n tuple val(meta), path(reads)\n\n output:\n tuple val(meta), path('*.html'), emit: html\n tuple val(\"${task.process}\"), val('fastqc'), eval('fastqc --version'), topic: versions // [!code ++]\n\n \"\"\"\n fastqc $reads\n \"\"\"\n}\n\nworkflow {\n Channel.topic('versions') // [!code ++]\n | unique()\n | map { process, name, version ->\n \"\"\"\\\n ${process.tokenize(':').last()}:\n ${name}: ${version}\n \"\"\".stripIndent()\n }\n | collectFile(name: 'collated_versions.yml')\n | CUSTOM_DUMPSOFTWAREVERSIONS\n}\n```\n\nThis approach will be implemented across all nf-core pipelines, and will cut down on a lot of boilerplate code. Check out the full prototypes for nf-core/rnaseq [here](https://github.com/nf-core/rnaseq/pull/1109) and [here](https://github.com/nf-core/rnaseq/pull/1115) to see them in action!\n\n### Resource limits\n\nThe **resourceLimits** directive is a new process directive which allows you to define global limits on the resources requested by individual tasks. For example, if you know that the largest node in your compute environment has 24 CPUs, 768 GB or memory, and a maximum walltime of 72 hours, you might specify the following:\n\n```groovy\nprocess.resourceLimits = [ cpus: 24, memory: 768.GB, time: 72.h ]\n```\n\nIf a task requests more than the specified limit (e.g. due to [retry with dynamic resources](https://nextflow.io/docs/latest/process.html#dynamic-computing-resources)), Nextflow will automatically reduce the task resources to satisfy the limit, whereas normally the task would be rejected by the scheduler or would simply wait in the queue forever! The nf-core community has maintained a custom workaround for this problem, the `check_max()` function, which can now be replaced with `resourceLimits`. 
See the [Nextflow docs](https://nextflow.io/docs/latest/process.html#resourcelimits) for more information.\n\n### Job arrays\n\n**Job arrays** are now supported in Nextflow using the `array` directive. Most HPC schedulers, and even some cloud batch services including AWS Batch and Google Batch, support a \"job array\" which allows you to submit many independent jobs with a single job script. While the individual jobs are still executed separately as normal, submitting jobs as arrays where possible puts considerably less stress on the scheduler.\n\nWith Nextflow, using job arrays is a one-liner:\n\n```groovy\nprocess.array = 100\n```\n\nYou can also enable job arrays for individual processes like any other directive. See the [Nextflow docs](https://nextflow.io/docs/latest/process.html#array) for more information.\n\n:::tip\nOn Google Batch, using job arrays also allows you to pack multiple tasks onto the same VM by using the `machineType` directive in conjunction with the `cpus` and `memory` directives.\n:::\n\n## Enhancements\n\n### Colored logs\n\n
\n\n**Colored logs** have come to Nextflow! Specifically, the process log which is continuously printed to the terminal while the pipeline is running. Not only is it more colorful, but it also makes better use of the available space to show you what's most important. But we already wrote an entire [blog post](https://nextflow.io/blog/2024/nextflow-colored-logs.html) about it, so go check that out for more details!\n\n
\n\n![New coloured output from Nextflow](/img/blog-nextflow-colored-logs/nextflow_coloured_logs.png)\n\n
\n\n### AWS Fargate support\n\nNextflow now supports **AWS Fargate** for AWS Batch jobs. See the [Nextflow docs](https://nextflow.io/docs/latest/aws.html#aws-fargate) for details.\n\n### OCI auto pull mode for Singularity and Apptainer\n\nNextflow now supports OCI auto pull mode both Singularity and Apptainer. Historically, Singularity could run a Docker container image converting to the Singularity image file format via the Singularity pull command and using the resulting image file in the exec command. This adds extra overhead to the head node running Nextflow for converting all container images to the Singularity format.\n\nNow Nextflow allows specifying the option `ociAutoPull` both for Singularity and Apptainer. When enabling this setting Nextflow delegates the pull and conversion of the Docker image directly to the `exec` command.\n\n```groovy\nsingularity.ociAutoPull = true\n```\n\nThis results in the running of the pull and caching of the Singularity images to the compute jobs instead of the head job and removing the need to maintain a separate image files cache.\n\nSee the [Nextflow docs](https://nextflow.io/docs/latest/config.html#scope-singularity) for more information.\n\n### Support for GA4GH TES\n\nThe [Task Execution Service (TES)](https://ga4gh.github.io/task-execution-schemas/docs/) is an API specification, developed by [GA4GH](https://www.ga4gh.org/), which attempts to provide a standard way for workflow managers like Nextflow to interface with execution backends. Two noteworthy TES implementations are [Funnel](https://github.com/ohsu-comp-bio/funnel) and [TES Azure](https://github.com/microsoft/ga4gh-tes).\n\nNextflow has long supported TES as an executor, but only in a limited sense, as TES did not support some important capabilities in Nextflow such as glob and directory outputs and the `bin` directory. However, with TES 1.1 and its adoption into Nextflow, these gaps have been closed. You can use the TES executor with the following configuration:\n\n```groovy\nplugins {\n id 'nf-ga4gh'\n}\n\nprocess.executor = 'tes'\ntes.endpoint = '...'\n```\n\nSee the [Nextflow docs](https://nextflow.io/docs/latest/executor.html#ga4gh-tes) for more information.\n\n:::note\nTo better facilitate community contributions, the nf-ga4gh plugin will soon be moved from the Nextflow repository into its own repository, `nextflow-io/nf-ga4gh`. To ensure a smooth transition with your pipelines, make sure to explicitly include the plugin in your configuration as shown above.\n:::\n\n## Fusion\n\n[Fusion](https://seqera.io/fusion/) is a distributed virtual file system for cloud-native data pipeline and optimized for Nextflow workloads. Nextflow 24.04 now works with a new release, Fusion 2.3. This brings a few notable quality-of-life improvements:\n\n### Enhanced Garbage Collection\n\nFusion 2.3 features an improved garbage collection system, enabling it to operate effectively with reduced scratch storage. This enhancement ensures that your pipelines run more efficiently, even with limited temporary storage.\n\n### Increased File Handling Capacity\n\nSupport for more concurrently open files is another significant improvement in Fusion 2.3. 
This means that larger directories, such as those used by Alphafold2, can now be utilized without issues, facilitating the handling of extensive datasets.\n\n### Correct Publishing of Symbolic Links\n\nIn previous versions, output files that were symbolic links were not published correctly — instead of the actual file, a text file containing the file path was published. Fusion 2.3 addresses this issue, ensuring that symbolic links are published correctly.\n\nThese enhancements in Fusion 2.3 contribute to a more robust and efficient filesystem for Nextflow users.\n\n## Other notable changes\n\n- Add native retry on spot termination for Google Batch ([`ea1c1b`](https://github.com/nextflow-io/nextflow/commit/ea1c1b70da7a9b8c90de445b8aee1ee7a7148c9b))\n- Add support for instance templates in Google Batch ([`df7ed2`](https://github.com/nextflow-io/nextflow/commit/df7ed294520ad2bfc9ad091114ae347c1e26ae96))\n- Allow secrets to be used with `includeConfig` ([`00c9f2`](https://github.com/nextflow-io/nextflow/commit/00c9f226b201c964f67d520d0404342bc33cf61d))\n- Allow secrets to be used in the pipeline script ([`df866a`](https://github.com/nextflow-io/nextflow/commit/df866a243256d5018e23b6c3237fb06d1c5a4b27))\n- Add retry strategy for publishing ([`c9c703`](https://github.com/nextflow-io/nextflow/commit/c9c7032c2e34132cf721ffabfea09d893adf3761))\n- Add `k8s.cpuLimits` config option ([`3c6e96`](https://github.com/nextflow-io/nextflow/commit/3c6e96d07c9a4fa947cf788a927699314d5e5ec7))\n- Removed `seqera` and `defaults` from the standard channels used by the nf-wave plugin. ([`ec5ebd`](https://github.com/nextflow-io/nextflow/commit/ec5ebd0bc96e986415e7bac195928b90062ed062))\n\nYou can view the full [Nextflow release notes on GitHub](https://github.com/nextflow-io/nextflow/releases/tag/v24.04.0).\n", + "content": "We release an \"edge\" version of Nextflow every month and a \"stable\" version every six months. The stable releases are recommended for production usage and represent a significant milestone. The [release changelogs](https://github.com/nextflow-io/nextflow/releases) contain a lot of detail, so we thought we'd highlight some of the goodies that have just been released in Nextflow 24.04 stable. Let's get into it!\n\n:::tip\nWe also did a podcast episode about some of these changes!\nCheck it out here: [Channels Episode 41](/podcast/2024/ep41_nextflow_2404.html).\n:::\n\n## Table of contents\n\n- [New features](#new-features)\n - [Seqera Containers](#seqera-containers)\n - [Workflow output definition](#workflow-output-definition)\n - [Topic channels](#topic-channels)\n - [Process eval outputs](#process-eval-outputs)\n - [Resource limits](#resource-limits)\n - [Job arrays](#job-arrays)\n- [Enhancements](#enhancements)\n - [Colored logs](#colored-logs)\n - [AWS Fargate support](#aws-fargate-support)\n - [OCI auto pull mode for Singularity and Apptainer](#oci-auto-pull-mode-for-singularity-and-apptainer)\n - [Support for GA4GH TES](#support-for-ga4gh-tes)\n- [Fusion](#fusion)\n - [Enhanced Garbage Collection](#enhanced-garbage-collection)\n - [Increased File Handling Capacity](#increased-file-handling-capacity)\n - [Correct Publishing of Symbolic Links](#correct-publishing-of-symbolic-links)\n- [Other notable changes](#other-notable-changes)\n\n## New features\n\n### Seqera Containers\n\nA new flagship community offering was revealed at the Nextflow Summit 2024 Boston - **Seqera Containers**. 
This is a free-to-use container cache powered by [Wave](https://seqera.io/wave/), allowing anyone to request an image with a combination of packages from Conda and PyPI. The image will be built on demand and cached (for at least 5 years after creation). There is a [dedicated blog post](https://seqera.io/blog/introducing-seqera-pipelines-containers/) about this, but it's worth noting that the service can be used directly from Nextflow and not only through [https://seqera.io/containers/](https://seqera.io/containers/)\n\nIn order to use Seqera Containers in Nextflow, simply set `wave.freeze` _without_ setting `wave.build.repository` - for example, by using the following config for your pipeline:\n\n```groovy\nwave.enabled = true\nwave.freeze = true\nwave.strategy = 'conda'\n```\n\nAny processes in your pipeline specifying Conda packages will have Docker or Singularity images created on the fly (depending on whether `singularity.enabled` is set or not) and cached for immediate access in subsequent runs. These images will be publicly available. You can view all container image names with the `nextflow inspect` command.\n\n### Workflow output definition\n\nThe workflow output definition is a new syntax for defining workflow outputs:\n\n```groovy\nnextflow.preview.output = true // [!code ++]\n\nworkflow {\n main:\n ch_foo = foo(data)\n bar(ch_foo)\n\n publish:\n ch_foo >> 'foo' // [!code ++]\n}\n\noutput { // [!code ++]\n directory 'results' // [!code ++]\n mode 'copy' // [!code ++]\n} // [!code ++]\n```\n\nIt essentially provides a DSL2-style approach for publishing, and will replace `publishDir` once it is finalized. It also provides extra flexibility as it allows you to publish _any_ channel, not just process outputs. See the [Nextflow docs](https://nextflow.io/docs/latest/workflow.html#publishing-outputs) for more information.\n\n:::info\nThis feature is still in preview and may change in a future release.\nWe hope to finalize it in version 24.10, so don't hesitate to share any feedback with us!\n:::\n\n### Topic channels\n\nTopic channels are a new channel type introduced in 23.11.0-edge. A topic channel is essentially a queue channel that can receive values from multiple sources, using a matching name or \"topic\":\n\n```groovy\nprocess foo {\n output:\n val('foo'), topic: 'my-topic' // [!code ++]\n}\n\nprocess bar {\n output:\n val('bar'), topic: 'my-topic' // [!code ++]\n}\n\nworkflow {\n foo()\n bar()\n\n Channel.topic('my-topic').view() // [!code ++]\n}\n```\n\nTopic channels are particularly useful for collecting metadata from various places in the pipeline, without needing to write all of the channel logic that is normally required (e.g. using the `mix` operator). See the [Nextflow docs](https://nextflow.io/docs/latest/channel.html#topic) for more information.\n\n### Process `eval` outputs\n\nProcess `eval` outputs are a new type of process output which allows you to capture the standard output of an arbitrary shell command:\n\n```groovy\nprocess sayHello {\n output:\n eval('bash --version') // [!code ++]\n\n \"\"\"\n echo Hello world!\n \"\"\"\n}\n\nworkflow {\n sayHello | view\n}\n```\n\nThe shell command is executed alongside the task script. Until now, you would typically execute these supplementary commands in the main process script, save the output to a file or environment variable, and then capture it using a `path` or `env` output. The new `eval` output is a much more convenient way to capture this kind of command output directly. 
See the [Nextflow docs](https://nextflow.io/docs/latest/process.html#output-type-eval) for more information.\n\n#### Collecting software versions\n\nTogether, topic channels and eval outputs can be used to simplify the collection of software tool versions. For example, for FastQC:\n\n```groovy\nprocess FASTQC {\n input:\n tuple val(meta), path(reads)\n\n output:\n tuple val(meta), path('*.html'), emit: html\n tuple val(\"${task.process}\"), val('fastqc'), eval('fastqc --version'), topic: versions // [!code ++]\n\n \"\"\"\n fastqc $reads\n \"\"\"\n}\n\nworkflow {\n Channel.topic('versions') // [!code ++]\n | unique()\n | map { process, name, version ->\n \"\"\"\\\n ${process.tokenize(':').last()}:\n ${name}: ${version}\n \"\"\".stripIndent()\n }\n | collectFile(name: 'collated_versions.yml')\n | CUSTOM_DUMPSOFTWAREVERSIONS\n}\n```\n\nThis approach will be implemented across all nf-core pipelines, and will cut down on a lot of boilerplate code. Check out the full prototypes for nf-core/rnaseq [here](https://github.com/nf-core/rnaseq/pull/1109) and [here](https://github.com/nf-core/rnaseq/pull/1115) to see them in action!\n\n### Resource limits\n\nThe **resourceLimits** directive is a new process directive which allows you to define global limits on the resources requested by individual tasks. For example, if you know that the largest node in your compute environment has 24 CPUs, 768 GB or memory, and a maximum walltime of 72 hours, you might specify the following:\n\n```groovy\nprocess.resourceLimits = [ cpus: 24, memory: 768.GB, time: 72.h ]\n```\n\nIf a task requests more than the specified limit (e.g. due to [retry with dynamic resources](https://nextflow.io/docs/latest/process.html#dynamic-computing-resources)), Nextflow will automatically reduce the task resources to satisfy the limit, whereas normally the task would be rejected by the scheduler or would simply wait in the queue forever! The nf-core community has maintained a custom workaround for this problem, the `check_max()` function, which can now be replaced with `resourceLimits`. See the [Nextflow docs](https://nextflow.io/docs/latest/process.html#resourcelimits) for more information.\n\n### Job arrays\n\n**Job arrays** are now supported in Nextflow using the `array` directive. Most HPC schedulers, and even some cloud batch services including AWS Batch and Google Batch, support a \"job array\" which allows you to submit many independent jobs with a single job script. While the individual jobs are still executed separately as normal, submitting jobs as arrays where possible puts considerably less stress on the scheduler.\n\nWith Nextflow, using job arrays is a one-liner:\n\n```groovy\nprocess.array = 100\n```\n\nYou can also enable job arrays for individual processes like any other directive. See the [Nextflow docs](https://nextflow.io/docs/latest/process.html#array) for more information.\n\n:::tip\nOn Google Batch, using job arrays also allows you to pack multiple tasks onto the same VM by using the `machineType` directive in conjunction with the `cpus` and `memory` directives.\n:::\n\n## Enhancements\n\n### Colored logs\n\n
\n\n**Colored logs** have come to Nextflow! Specifically, the process log which is continuously printed to the terminal while the pipeline is running. Not only is it more colorful, but it also makes better use of the available space to show you what's most important. But we already wrote an entire [blog post](https://nextflow.io/blog/2024/nextflow-colored-logs.html) about it, so go check that out for more details!\n\n
\n\n![New coloured output from Nextflow](/img/blog-nextflow-colored-logs/nextflow_coloured_logs.png)\n\n
\n\n### AWS Fargate support\n\nNextflow now supports **AWS Fargate** for AWS Batch jobs. See the [Nextflow docs](https://nextflow.io/docs/latest/aws.html#aws-fargate) for details.\n\n### OCI auto pull mode for Singularity and Apptainer\n\nNextflow now supports OCI auto pull mode for both Singularity and Apptainer. Historically, Singularity could run a Docker container image by converting it to the Singularity image file format via the Singularity pull command and then using the resulting image file in the exec command. This adds extra overhead to the head node running Nextflow, which has to convert all container images to the Singularity format.\n\nNextflow now allows the `ociAutoPull` option to be specified for both Singularity and Apptainer. When this setting is enabled, Nextflow delegates the pull and conversion of the Docker image directly to the `exec` command.\n\n```groovy\nsingularity.ociAutoPull = true\n```\n\nAs a result, the pulling and caching of Singularity images happens in the compute jobs rather than in the head job, removing the need to maintain a separate image file cache.\n\nSee the [Nextflow docs](https://nextflow.io/docs/latest/config.html#scope-singularity) for more information.\n\n### Support for GA4GH TES\n\nThe [Task Execution Service (TES)](https://ga4gh.github.io/task-execution-schemas/docs/) is an API specification, developed by [GA4GH](https://www.ga4gh.org/), which attempts to provide a standard way for workflow managers like Nextflow to interface with execution backends. Two noteworthy TES implementations are [Funnel](https://github.com/ohsu-comp-bio/funnel) and [TES Azure](https://github.com/microsoft/ga4gh-tes).\n\nNextflow has long supported TES as an executor, but only in a limited sense, as TES did not support some important capabilities in Nextflow such as glob and directory outputs and the `bin` directory. However, with TES 1.1 and its adoption into Nextflow, these gaps have been closed. You can use the TES executor with the following configuration:\n\n```groovy\nplugins {\n    id 'nf-ga4gh'\n}\n\nprocess.executor = 'tes'\ntes.endpoint = '...'\n```\n\nSee the [Nextflow docs](https://nextflow.io/docs/latest/executor.html#ga4gh-tes) for more information.\n\n:::note\nTo better facilitate community contributions, the nf-ga4gh plugin will soon be moved from the Nextflow repository into its own repository, `nextflow-io/nf-ga4gh`. To ensure a smooth transition with your pipelines, make sure to explicitly include the plugin in your configuration as shown above.\n:::\n\n## Fusion\n\n[Fusion](https://seqera.io/fusion/) is a distributed virtual file system for cloud-native data pipelines, optimized for Nextflow workloads. Nextflow 24.04 now works with a new release, Fusion 2.3. This brings a few notable quality-of-life improvements:\n\n### Enhanced Garbage Collection\n\nFusion 2.3 features an improved garbage collection system, enabling it to operate effectively with reduced scratch storage. This enhancement ensures that your pipelines run more efficiently, even with limited temporary storage.\n\n### Increased File Handling Capacity\n\nSupport for more concurrently open files is another significant improvement in Fusion 2.3. 
This means that larger directories, such as those used by Alphafold2, can now be utilized without issues, facilitating the handling of extensive datasets.\n\n### Correct Publishing of Symbolic Links\n\nIn previous versions, output files that were symbolic links were not published correctly — instead of the actual file, a text file containing the file path was published. Fusion 2.3 addresses this issue, ensuring that symbolic links are published correctly.\n\nThese enhancements in Fusion 2.3 contribute to a more robust and efficient filesystem for Nextflow users.\n\n## Other notable changes\n\n- Add native retry on spot termination for Google Batch ([`ea1c1b`](https://github.com/nextflow-io/nextflow/commit/ea1c1b70da7a9b8c90de445b8aee1ee7a7148c9b))\n- Add support for instance templates in Google Batch ([`df7ed2`](https://github.com/nextflow-io/nextflow/commit/df7ed294520ad2bfc9ad091114ae347c1e26ae96))\n- Allow secrets to be used with `includeConfig` ([`00c9f2`](https://github.com/nextflow-io/nextflow/commit/00c9f226b201c964f67d520d0404342bc33cf61d))\n- Allow secrets to be used in the pipeline script ([`df866a`](https://github.com/nextflow-io/nextflow/commit/df866a243256d5018e23b6c3237fb06d1c5a4b27))\n- Add retry strategy for publishing ([`c9c703`](https://github.com/nextflow-io/nextflow/commit/c9c7032c2e34132cf721ffabfea09d893adf3761))\n- Add `k8s.cpuLimits` config option ([`3c6e96`](https://github.com/nextflow-io/nextflow/commit/3c6e96d07c9a4fa947cf788a927699314d5e5ec7))\n- Removed `seqera` and `defaults` from the standard channels used by the nf-wave plugin. ([`ec5ebd`](https://github.com/nextflow-io/nextflow/commit/ec5ebd0bc96e986415e7bac195928b90062ed062))\n\nYou can view the full [Nextflow release notes on GitHub](https://github.com/nextflow-io/nextflow/releases/tag/v24.04.0).", "images": [], "author": "Paolo Di Tommaso", "tags": "nextflow" @@ -739,7 +739,7 @@ "slug": "2024/nextflow-colored-logs", "title": "Nextflow's colorful new console output", "date": "2024-03-28T00:00:00.000Z", - "content": "\nNextflow is a command-line interface (CLI) tool that runs in the terminal. Everyone who has launched Nextflow from the command line knows what it’s like to follow the console output as a pipeline runs: the excitement of watching jobs zipping off as they’re submitted, the satisfaction of the phrase _\"Pipeline completed successfully!\"_ and occasionally, the sinking feeling of seeing an error message.\n\nBecause the CLI is the primary way that people interact with Nextflow, a little bit of polish can have a big effect. In this article, I’m excited to describe an upgrade for the console output that should make monitoring workflow progress just a little easier.\n\nThe new functionality is available in `24.02-0-edge` and will be included in the next `24.04.0` stable release. You can try it out now by updating Nextflow as follows:\n\n```bash\nNXF_EDGE=1 nextflow self-update\n```\n\n## Background\n\nThe Nextflow console output hasn’t changed much over the 10 years that it’s been around. The biggest update happened in 2018 when \"ANSI logging\" was released in version `18.10.0`. This replaced the stream of log messages announcing each task submission with a view that updates dynamically, giving an overview of each process. This gives an overview of the pipeline’s progress rather than being swamped with thousands of individual task submissions.\n\n
\n \"Nextflow\n
\n\nANSI console output. Nextflow log output from running the nf-core/rnaseq pipeline before (Left) and after (Right) enabling ANSI logging.\n\n
\n
\n\nI can be a little obsessive about tool user interfaces. The nf-core template, as well as MultiQC and nf-core/tools all have coloured terminal output, mostly using the excellent [textualize/rich](https://github.com/Textualize/rich). I’ve also written a couple of general-use tools around this such as [ewels/rich-click](https://github.com/ewels/rich-click/) for Python CLI help texts, and [ewels/rich-codex](https://github.com/ewels/rich-codex) to auto-generate screenshots from code / commands in markdown. The problem with being surrounded by so much colored CLI output is that any tools _without_ colors start to stand out. Dropping hints to the Nextflow team didn’t work, so eventually I whipped up [a proposal](https://github.com/nextflow-io/nextflow/issues/3976) of what the console output could look like using the tools I knew: Python and Rich. Paolo knows me well and [offered up a bait](https://github.com/nextflow-io/nextflow/issues/3976#issuecomment-1568071479) that I couldn’t resist: _\"Phil. I think this a great opportunity to improve your Groovy skills 😆\"._\n\n## Showing what’s important\n\nThe console output shown by Nextflow describes a range of information. Much of it aligns in vertical columns, but not all. There’s also a variety of fields, some of which are more important than others to see at a glance.\n\n
\n \"New\n
\n\nIntroducing: colored console output. Output from running nf-core/rnaseq with the new colors applied (nf-core header removed for clarity).\n\n
\n
\n\nWith some judicious use of the `dim` style, we can make less important information fade into the background. For example, the \"stem\" of the fully qualified process identifiers now step back to allow the process name to stand out. Secondary information such as the number of tasks that were cached, or the executor that is being submitted to, are still there to see but take a back seat. Doing the reverse with some `bold` text helps to highlight the run name – key information for identifying and resuming pipeline runs. Using color allows different fields to be easily distinguished, such as process labels and task hashes. Greens, blues, and reds in the task statuses allow a reader to get an impression of the run progress without needing to read every number.\n\nProbably the most difficult aspect technically was the `NEXTFLOW` header line. I knew I wanted to use the _\"Nextflow Green\"_ here, or as close to it as possible. But colors in the terminal are tricky. What the ANSI standard defines as `green`, `black`, and `blue` can vary significantly across different systems and terminal themes. Some people use a light color scheme and others run in dark mode. This hadn’t mattered much for most of the colors up until this point - I could use the [Jansi](https://github.com/fusesource/jansi) library to use named colors and they should look ok. But for the specific RGB of the _\"Nextflow Green\"_ I had to [hardcode specific ANSI control characters](https://github.com/nextflow-io/nextflow/blob/c9c7032c2e34132cf721ffabfea09d893adf3761/modules/nextflow/src/main/groovy/nextflow/cli/CmdRun.groovy#L379-L389). But it got worse - it turns out that the default Terminal app that ships with macOS only supports 256 colors, so I had to find the closest match (_\"light sea green\"_ if you’re curious). Even once the green was ok, using `black` as the text color meant that it would actually render as white with some terminal color themes and be unreadable. In the end, the header text is a very dark gray.\n\n
\n \"Testing\n
\n\nTesting color rendering across a wide range of themes in the OS X Terminal app.\n\n
\n
\n\n## More than just colors\n\nWhilst the original intent was focused on using color, it didn’t take long to come up with a shortlist of other niggles that I wanted to fix. I took this project as an opportunity to address a few of these, specifically:\n\n- Make the most of the available width in the terminal:\n - Redundant text is now cut down when the screen is narrow. Specifically the repeated `process >` text, plus other small gains such as replacing the three `...` characters with a single `…` character. The percentage-complete is removed if the window is really narrow. These changes happen dynamically every time the screen refreshes, so should update if you resize the terminal window.\n- Be more selective about which part of process names are truncated:\n - There’s only so much width that can be saved, and fully qualified process names are long. The current Nextflow console output truncates the end of the identifier if there’s no space, but this is the part that varies most between pipeline steps. Instead, we can truncate the start and preserve the process name and label.\n- Don’t show all pending processes without tasks:\n - The existing ANSI logging shows _all_ processes in the pipeline, even those that haven’t had any tasks submitted. If a pipeline has a lot of processes this can push the running processes out of view.\n - Nextflow now tracks the number of available rows in the terminal and hides pending processes once we run out of space. Running processes are always printed.\n\nThe end result is console output that makes the most of the available space in your terminal window:\n\n
\n \"Nextflow\n
\n\nProgress of the nf-core/rnaseq pipeline shown across 3 different terminal-width breakpoints, with varying levels of text truncation.\n\n
\n
\n\n## Contributing to Nextflow\n\nDespite building tools that use Nextflow for many years, I’ve spent relatively little time venturing into the main codebase myself. Just as with any contributor, part of the challenge was figuring out how to build Nextflow, how to navigate its code structure and how to write tests. I found it quite a fun experience, so I described and demoed the process in a recent nf-core Bytesize talk titled \"[Contributing to Nextflow](https://nf-co.re/events/2024/bytesize_nextflow_dev)\". You can watch the talk on [YouTube](https://www.youtube.com/watch?v=R0fqk5OS-nw), where I explain the mechanics of forking Nextflow, enhancing, compiling, and testing changes locally, and contributing enhancements back to the main code base.\n\n
\n \n
\n\n## But wait, there’s more!\n\nI’m happy with how the new console output looks, and it seems to have been well received so far. But once the warm glow of the newly merged pull request started to subside, I realized there was more to do. The console output is great for monitoring a running pipeline, but I spend most of my time these days digging through much more verbose `.nextflow.log` files. Suddenly it seemed a little unfair that these didn’t also benefit from a similar treatment.\n\nThis project was a little different because the logs are just files on the disk, meaning that I could approach the problem with whatever code stack I liked. Coincidentally, [Will McGugan](https://github.com/willmcgugan) (author of [textualize/rich](https://github.com/Textualize/rich)) was recently [writing about](https://textual.textualize.io/blog/2024/02/11/file-magic-with-the-python-standard-library/) a side project of his own: [Toolong](https://github.com/textualize/toolong). This is a terminal app built using [Textual](https://www.textualize.io/) which is specifically aimed at viewing large log files. I took it for a spin and it did a great job with Nextflow log files right out of the box, but I figured that I could take it further. At its core, Toolong uses the [Rich](https://github.com/textualize/rich) library to format text and so with a little hacking, I was able to introduce a handful of custom formatters for the Nextflow logs. And voilà, we have colored console output for log files too!\n\n
\n \"Formatting\n
\n\nThe tail end of a `.nextflow.log` file, rendered with `less` (Left) and Toolong (Right). Try finding the warning log message in both!\n\n
\n
\n\nBy using Toolong as a viewer we get much more than just syntax highlighting too - it provides powerful file navigation and search functionality. It also supports tailing files in real time, so you can launch a pipeline in one window and tail the log in another to have the best of both worlds!\n\n
\n \n
\n\nRunning nf-core/rnaseq with the new Nextflow coloured console output (Left) whilst simultaneously tailing the `.nextflow.log` file using `nf-core log` (Right).\n\n
\n
\n\nThis work with Toolong is still in two [open](https://github.com/Textualize/toolong/pull/47) [pull requests](https://github.com/nf-core/tools/pull/2895) as I write this, but hopefully you’ll soon be able to use the `nf-core log` command in a directory where you’ve run Nextflow, and it’ll launch Toolong with any log files it finds.\n", + "content": "Nextflow is a command-line interface (CLI) tool that runs in the terminal. Everyone who has launched Nextflow from the command line knows what it’s like to follow the console output as a pipeline runs: the excitement of watching jobs zipping off as they’re submitted, the satisfaction of the phrase _\"Pipeline completed successfully!\"_ and occasionally, the sinking feeling of seeing an error message.\n\nBecause the CLI is the primary way that people interact with Nextflow, a little bit of polish can have a big effect. In this article, I’m excited to describe an upgrade for the console output that should make monitoring workflow progress just a little easier.\n\nThe new functionality is available in `24.02-0-edge` and will be included in the next `24.04.0` stable release. You can try it out now by updating Nextflow as follows:\n\n```bash\nNXF_EDGE=1 nextflow self-update\n```\n\n## Background\n\nThe Nextflow console output hasn’t changed much over the 10 years that it’s been around. The biggest update happened in 2018 when \"ANSI logging\" was released in version `18.10.0`. This replaced the stream of log messages announcing each task submission with a view that updates dynamically, giving an overview of each process. This gives an overview of the pipeline’s progress rather than being swamped with thousands of individual task submissions.\n\n
\n \"Nextflow\n
\n\nANSI console output. Nextflow log output from running the nf-core/rnaseq pipeline before (Left) and after (Right) enabling ANSI logging.\n\n
\n
\n\nI can be a little obsessive about tool user interfaces. The nf-core template, as well as MultiQC and nf-core/tools all have coloured terminal output, mostly using the excellent [textualize/rich](https://github.com/Textualize/rich). I’ve also written a couple of general-use tools around this such as [ewels/rich-click](https://github.com/ewels/rich-click/) for Python CLI help texts, and [ewels/rich-codex](https://github.com/ewels/rich-codex) to auto-generate screenshots from code / commands in markdown. The problem with being surrounded by so much colored CLI output is that any tools _without_ colors start to stand out. Dropping hints to the Nextflow team didn’t work, so eventually I whipped up [a proposal](https://github.com/nextflow-io/nextflow/issues/3976) of what the console output could look like using the tools I knew: Python and Rich. Paolo knows me well and [offered up a bait](https://github.com/nextflow-io/nextflow/issues/3976#issuecomment-1568071479) that I couldn’t resist: _\"Phil. I think this a great opportunity to improve your Groovy skills 😆\"._\n\n## Showing what’s important\n\nThe console output shown by Nextflow describes a range of information. Much of it aligns in vertical columns, but not all. There’s also a variety of fields, some of which are more important than others to see at a glance.\n\n
\n \"New\n
\n\nIntroducing: colored console output. Output from running nf-core/rnaseq with the new colors applied (nf-core header removed for clarity).\n\n
\n
\n\nWith some judicious use of the `dim` style, we can make less important information fade into the background. For example, the \"stem\" of the fully qualified process identifiers now step back to allow the process name to stand out. Secondary information such as the number of tasks that were cached, or the executor that is being submitted to, are still there to see but take a back seat. Doing the reverse with some `bold` text helps to highlight the run name – key information for identifying and resuming pipeline runs. Using color allows different fields to be easily distinguished, such as process labels and task hashes. Greens, blues, and reds in the task statuses allow a reader to get an impression of the run progress without needing to read every number.\n\nProbably the most difficult aspect technically was the `NEXTFLOW` header line. I knew I wanted to use the _\"Nextflow Green\"_ here, or as close to it as possible. But colors in the terminal are tricky. What the ANSI standard defines as `green`, `black`, and `blue` can vary significantly across different systems and terminal themes. Some people use a light color scheme and others run in dark mode. This hadn’t mattered much for most of the colors up until this point - I could use the [Jansi](https://github.com/fusesource/jansi) library to use named colors and they should look ok. But for the specific RGB of the _\"Nextflow Green\"_ I had to [hardcode specific ANSI control characters](https://github.com/nextflow-io/nextflow/blob/c9c7032c2e34132cf721ffabfea09d893adf3761/modules/nextflow/src/main/groovy/nextflow/cli/CmdRun.groovy#L379-L389). But it got worse - it turns out that the default Terminal app that ships with macOS only supports 256 colors, so I had to find the closest match (_\"light sea green\"_ if you’re curious). Even once the green was ok, using `black` as the text color meant that it would actually render as white with some terminal color themes and be unreadable. In the end, the header text is a very dark gray.\n\n
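To make that more concrete, here is a minimal, hypothetical Groovy sketch of what printing a header with hardcoded 256-color ANSI escape codes can look like (illustrative only; it is not the actual `CmdRun` implementation):\n\n```groovy\n// Illustrative sketch: print a header using 256-color ANSI escape codes.\n// ESC[48;5;<n>m sets the background, ESC[38;5;<n>m the foreground, ESC[0m resets.\ndef ESC = (char) 27\ndef header = \"${ESC}[48;5;37m${ESC}[38;5;232m N E X T F L O W ${ESC}[0m\" // 37 is the closest match to light sea green, 232 is a very dark gray\nprintln header\n```\n\n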
\n \"Testing\n
\n\nTesting color rendering across a wide range of themes in the OS X Terminal app.\n\n
\n
\n\n## More than just colors\n\nWhilst the original intent was focused on using color, it didn’t take long to come up with a shortlist of other niggles that I wanted to fix. I took this project as an opportunity to address a few of these, specifically:\n\n- Make the most of the available width in the terminal:\n - Redundant text is now cut down when the screen is narrow. Specifically the repeated `process >` text, plus other small gains such as replacing the three `...` characters with a single `…` character. The percentage-complete is removed if the window is really narrow. These changes happen dynamically every time the screen refreshes, so should update if you resize the terminal window.\n- Be more selective about which part of process names are truncated:\n - There’s only so much width that can be saved, and fully qualified process names are long. The current Nextflow console output truncates the end of the identifier if there’s no space, but this is the part that varies most between pipeline steps. Instead, we can truncate the start and preserve the process name and label.\n- Don’t show all pending processes without tasks:\n - The existing ANSI logging shows _all_ processes in the pipeline, even those that haven’t had any tasks submitted. If a pipeline has a lot of processes this can push the running processes out of view.\n - Nextflow now tracks the number of available rows in the terminal and hides pending processes once we run out of space. Running processes are always printed.\n\nThe end result is console output that makes the most of the available space in your terminal window:\n\n
\n \"Nextflow\n
\n\nProgress of the nf-core/rnaseq pipeline shown across 3 different terminal-width breakpoints, with varying levels of text truncation.\n\n
\n
\n\n## Contributing to Nextflow\n\nDespite building tools that use Nextflow for many years, I’ve spent relatively little time venturing into the main codebase myself. Just as with any contributor, part of the challenge was figuring out how to build Nextflow, how to navigate its code structure and how to write tests. I found it quite a fun experience, so I described and demoed the process in a recent nf-core Bytesize talk titled \"[Contributing to Nextflow](https://nf-co.re/events/2024/bytesize_nextflow_dev)\". You can watch the talk on [YouTube](https://www.youtube.com/watch?v=R0fqk5OS-nw), where I explain the mechanics of forking Nextflow, enhancing, compiling, and testing changes locally, and contributing enhancements back to the main code base.\n\n
\n \n
\n\n## But wait, there’s more!\n\nI’m happy with how the new console output looks, and it seems to have been well received so far. But once the warm glow of the newly merged pull request started to subside, I realized there was more to do. The console output is great for monitoring a running pipeline, but I spend most of my time these days digging through much more verbose `.nextflow.log` files. Suddenly it seemed a little unfair that these didn’t also benefit from a similar treatment.\n\nThis project was a little different because the logs are just files on the disk, meaning that I could approach the problem with whatever code stack I liked. Coincidentally, [Will McGugan](https://github.com/willmcgugan) (author of [textualize/rich](https://github.com/Textualize/rich)) was recently [writing about](https://textual.textualize.io/blog/2024/02/11/file-magic-with-the-python-standard-library/) a side project of his own: [Toolong](https://github.com/textualize/toolong). This is a terminal app built using [Textual](https://www.textualize.io/) which is specifically aimed at viewing large log files. I took it for a spin and it did a great job with Nextflow log files right out of the box, but I figured that I could take it further. At its core, Toolong uses the [Rich](https://github.com/textualize/rich) library to format text and so with a little hacking, I was able to introduce a handful of custom formatters for the Nextflow logs. And voilà, we have colored console output for log files too!\n\n
\n \"Formatting\n
\n\nThe tail end of a `.nextflow.log` file, rendered with `less` (Left) and Toolong (Right). Try finding the warning log message in both!\n\n
\n
\n\nBy using Toolong as a viewer we get much more than just syntax highlighting too - it provides powerful file navigation and search functionality. It also supports tailing files in real time, so you can launch a pipeline in one window and tail the log in another to have the best of both worlds!\n\n
\n \n
\n\nRunning nf-core/rnaseq with the new Nextflow coloured console output (Left) whilst simultaneously tailing the `.nextflow.log` file using `nf-core log` (Right).\n\n
\n
\n\nThis work with Toolong is still in two [open](https://github.com/Textualize/toolong/pull/47) [pull requests](https://github.com/nf-core/tools/pull/2895) as I write this, but hopefully you’ll soon be able to use the `nf-core log` command in a directory where you’ve run Nextflow, and it’ll launch Toolong with any log files it finds.", "images": [ "/img/blog-nextflow-colored-logs/nextflow_log_with_without_ansi.png", "/img/blog-nextflow-colored-logs/nextflow_coloured_logs.png", @@ -754,7 +754,7 @@ "slug": "2024/nextflow-nf-core-ancient-env-dna", "title": "Application of Nextflow and nf-core to ancient environmental eDNA", "date": "2024-04-17T00:00:00.000Z", - "content": "\nAncient environmental DNA (eDNA) is currently a hot topic in archaeological, ecological, and metagenomic research fields. Recent eDNA studies have shown that authentic ‘ancient’ DNA can be recovered from soil and sediments even as far back as 2 million years ago(1). However, as with most things metagenomics (the simultaneous analysis of the entire DNA content of a sample), there is a need to work at scale, processing the large datasets of many sequencing libraries to ‘fish’ out the tiny amounts of temporally degraded ancient DNA from amongst a huge swamp of contaminating modern biomolecules.\n\n\n\nThis need to work at scale, while also conducting reproducible analyses to demonstrate the authenticity of ancient DNA, lends itself to the processing of DNA with high-quality pipelines and open source workflow managers such as Nextflow. In this context, I was invited to the Australian Center for Ancient DNA (ACAD) at the University of Adelaide in February 2024 to co-teach a graduate-level course on ‘Hands-on bioinformatics for ancient environmental DNA’, alongside other members of the ancient eDNA community. Workshop participants included PhD students from across Australia, New Zealand, and even from as far away as Estonia.\n\n
\n \"Mentor\n © Photo: Peter Mundy and Australian Center for Ancient DNA\n
\n\nWe began the five-day workshop with an overview of the benefits of using workflow managers and pipelines in academic research, which include efficiency, portability, reproducibility, and fault-tolerance, and we then proceeded to introduce the PhD students to installing Nextflow and configuring pipelines for running on different types of computing infrastructure.\n\n
\n \"Review\n © Photo: Peter Mundy and Australian Center for Ancient DNA\n
\n\nOver the next two days, I then introduced two well-established nf-core pipelines: [nf-core/eager](https://nf-co.re/eager) (2) and [nf-core/mag](https://nf-co.re/mag) (3), and explained to students how these pipelines can be applied to various aspects of environmental metagenomic and ancient DNA analysis:\nnf-core/eager is a dedicated ‘swiss-army-knife’ style pipeline for ancient DNA analysis that performs genetic data preprocessing, genomic alignment, variant calling, and metagenomic screening with specific tools and parameters to account for the characteristics of degraded DNA.\nnf-core/mag is a best-practice pipeline for metagenomic de novo assembly of microbial genomes that performs preprocessing, assembly, binning, bin-refinement and validation. It also contains a specific subworkflow for the authentication of ancient contigs.\nIn both cases, the students of the workshops were given practical tasks to set up and run both pipelines on real data, and time was spent exploring the extensive nf-core documentation and evaluating the outputs from MultiQC, both important components that contribute to the quality of nf-core pipelines.\n\nThe workshop was well received by students, and many were eager (pun intended) to start running Nextflow and nf-core pipelines on their own data at their own institutions.\n\nI would like to thank Vilma Pérez at ACAD for the invitation to contribute to the workshop as well as Mikkel Winther Pedersen for being my co-instructor, and the nf-core community for continued support in the development of the pipelines. Thank you also to Tina Warinner for proof-reading this blog post, and I would like to acknowledge [ACAD](https://www.adelaide.edu.au/acad/), the [University of Adelaide Environment Institute](https://www.adelaide.edu.au/environment/), the [Werner Siemens-Stiftung](https://www.wernersiemens-stiftung.ch/), [Leibniz HKI](https://www.leibniz-hki.de/), and [MPI for Evolutionary Anthropology](https://www.eva.mpg.de) for financial support to attend the workshop and support in developing nf-core pipelines.\n\n---\n\n(1) Kjær, K.H., Winther Pedersen, M., De Sanctis, B. et al. A 2-million-year-old ecosystem in Greenland uncovered by environmental DNA. Nature **612**, 283–291 (2022). [https://doi.org/10.1038/s41586-022-05453-y](https://doi.org/10.1038/s41586-022-05453-y)\n\n(2) Fellows Yates, J.A., Lamnidis, T.C., Borry, M., Andrades Valtueña, A., Fagernäs, Z., Clayton, S., Garcia, M.U., Neukamm, J., Peltzer, A.. Reproducible, portable, and efficient ancient genome reconstruction with nf-core/eager. PeerJ 9:10947 (2021) [http://doi.org/10.7717/peerj.10947](http://doi.org/10.7717/peerj.10947)\n\n(3) Krakau, S., Straub, D., Gourlé, H., Gabernet, G., Nahnsen, S., nf-core/mag: a best-practice pipeline for metagenome hybrid assembly and binning, NAR Genomics and Bioinformatics, **4**:1 (2022) [https://doi.org/10.1093/nargab/lqac007](https://doi.org/10.1093/nargab/lqac007)\n", + "content": "Ancient environmental DNA (eDNA) is currently a hot topic in archaeological, ecological, and metagenomic research fields. Recent eDNA studies have shown that authentic ‘ancient’ DNA can be recovered from soil and sediments even as far back as 2 million years ago(1). 
However, as with most things metagenomics (the simultaneous analysis of the entire DNA content of a sample), there is a need to work at scale, processing the large datasets of many sequencing libraries to ‘fish’ out the tiny amounts of temporally degraded ancient DNA from amongst a huge swamp of contaminating modern biomolecules.\n\n\n\nThis need to work at scale, while also conducting reproducible analyses to demonstrate the authenticity of ancient DNA, lends itself to the processing of DNA with high-quality pipelines and open source workflow managers such as Nextflow. In this context, I was invited to the Australian Center for Ancient DNA (ACAD) at the University of Adelaide in February 2024 to co-teach a graduate-level course on ‘Hands-on bioinformatics for ancient environmental DNA’, alongside other members of the ancient eDNA community. Workshop participants included PhD students from across Australia, New Zealand, and even from as far away as Estonia.\n\n
\n \"Mentor\n © Photo: Peter Mundy and Australian Center for Ancient DNA\n
\n\nWe began the five-day workshop with an overview of the benefits of using workflow managers and pipelines in academic research, which include efficiency, portability, reproducibility, and fault-tolerance, and we then proceeded to introduce the PhD students to installing Nextflow and configuring pipelines for running on different types of computing infrastructure.\n\n
\n \"Review\n © Photo: Peter Mundy and Australian Center for Ancient DNA\n
\n\nOver the next two days, I then introduced two well-established nf-core pipelines: [nf-core/eager](https://nf-co.re/eager) (2) and [nf-core/mag](https://nf-co.re/mag) (3), and explained to students how these pipelines can be applied to various aspects of environmental metagenomic and ancient DNA analysis:\nnf-core/eager is a dedicated ‘swiss-army-knife’ style pipeline for ancient DNA analysis that performs genetic data preprocessing, genomic alignment, variant calling, and metagenomic screening with specific tools and parameters to account for the characteristics of degraded DNA.\nnf-core/mag is a best-practice pipeline for metagenomic de novo assembly of microbial genomes that performs preprocessing, assembly, binning, bin-refinement and validation. It also contains a specific subworkflow for the authentication of ancient contigs.\nIn both cases, the students of the workshops were given practical tasks to set up and run both pipelines on real data, and time was spent exploring the extensive nf-core documentation and evaluating the outputs from MultiQC, both important components that contribute to the quality of nf-core pipelines.\n\nThe workshop was well received by students, and many were eager (pun intended) to start running Nextflow and nf-core pipelines on their own data at their own institutions.\n\nI would like to thank Vilma Pérez at ACAD for the invitation to contribute to the workshop as well as Mikkel Winther Pedersen for being my co-instructor, and the nf-core community for continued support in the development of the pipelines. Thank you also to Tina Warinner for proof-reading this blog post, and I would like to acknowledge [ACAD](https://www.adelaide.edu.au/acad/), the [University of Adelaide Environment Institute](https://www.adelaide.edu.au/environment/), the [Werner Siemens-Stiftung](https://www.wernersiemens-stiftung.ch/), [Leibniz HKI](https://www.leibniz-hki.de/), and [MPI for Evolutionary Anthropology](https://www.eva.mpg.de) for financial support to attend the workshop and support in developing nf-core pipelines.\n\n---\n\n(1) Kjær, K.H., Winther Pedersen, M., De Sanctis, B. et al. A 2-million-year-old ecosystem in Greenland uncovered by environmental DNA. Nature **612**, 283–291 (2022). [https://doi.org/10.1038/s41586-022-05453-y](https://doi.org/10.1038/s41586-022-05453-y)\n\n(2) Fellows Yates, J.A., Lamnidis, T.C., Borry, M., Andrades Valtueña, A., Fagernäs, Z., Clayton, S., Garcia, M.U., Neukamm, J., Peltzer, A.. Reproducible, portable, and efficient ancient genome reconstruction with nf-core/eager. PeerJ 9:10947 (2021) [http://doi.org/10.7717/peerj.10947](http://doi.org/10.7717/peerj.10947)\n\n(3) Krakau, S., Straub, D., Gourlé, H., Gabernet, G., Nahnsen, S., nf-core/mag: a best-practice pipeline for metagenome hybrid assembly and binning, NAR Genomics and Bioinformatics, **4**:1 (2022) [https://doi.org/10.1093/nargab/lqac007](https://doi.org/10.1093/nargab/lqac007)", "images": [ "/img/blog-2024-04-17-img1a.jpg", "/img/blog-2024-04-17-img1b.jpg" @@ -766,7 +766,7 @@ "slug": "2024/nf-schema", "title": "nf-schema: the new and improved nf-validation", "date": "2024-05-01T00:00:00.000Z", - "content": "\nCheck out Nextflow's newest plugin, nf-schema! It's an enhanced version of nf-validation, utilizing JSON schemas to validate parameters and sample sheets. Unlike its predecessor, it supports the latest JSON schema draft and can convert pipeline-generated files. 
But what's the story behind its development?\n\n\n\n`nf-validation` is a well-known Nextflow plugin that uses JSON schemas to validate parameters and sample sheets. It can also convert sample sheets to channels using a built-in channel factory. On top of that, it can create a nice summary of pipeline parameters and can even be used to generate a help message for the pipeline.\n\nAll of this has made the plugin very popular in the Nextflow community, but it wasn’t without its issues. For example, the plugin uses an older version of the JSON schema draft, namely draft `07` while the latest draft is `2020-12`. It also can’t convert any files/sample sheets created by the pipeline itself since the channel factory is only able to access values from pipeline parameters.\n\nBut then `nf-schema` came to the rescue! In this plugin we rewrote large parts of the `nf-validation` code, making the plugin way faster and more flexible while adding a lot of requested features. Let’s see what’s been changed in this new and improved version of `nf-validation`.\n\n# What a shiny new JSON schema draft\n\nTo quote the official JSON schema website:\n\n> “JSON Schema is the vocabulary that enables JSON data consistency, validity, and interoperability at scale.”\n\nThis one sentence does an excellent job of explaining what JSON schema is and why it was such a great fit for `nf-validation` and `nf-schema`. By using these schemas, we can validate pipeline inputs in a way that would otherwise be impossible. The JSON schema drafts define a set of annotations that are used to set some conditions to which the data has to adhere. In our case, this can be used to determine what a parameter or sample sheet value should look like (this can range from what type of data it has to be to a specific pattern that the data has to follow).\n\nThe JSON schema draft `07` already has a lot of useful annotations, but it lacked some special annotations that could elevate our validations to the next level. That’s where the JSON schema draft `2020-12` came in. This draft contained a lot more specialized annotations, like dependent requirements of values (if one value is set, another value also has to be set). Although this example was already possible in `nf-validation`, it was poorly implemented and didn’t follow any consensus specified by the JSON schema team.\n\n
\n \"meme\n
\n\n# Bye-bye Channel Factory, hello Function\n\nOne major shortcoming in the `nf-validation` plugin was the lack of the `fromSamplesheet` channel factory to handle files created by the pipeline (or files imported from another pipeline as part of a meta pipeline). That’s why we decided to remove the `fromSamplesheet` channel factory and replace it with a function called `samplesheetToList` that can be deployed in an extremely flexible way. It takes two inputs: the sample sheet to be validated and converted, and the JSON schema used for the conversion. Both inputs can either be a `String` value containing the path to the files or a Nextflow `file` object. By converting the channel factory to a function, we also decoupled the parameter schema from the actual sample sheet conversion. This means all validation and conversion of the sample sheet is now fully done by the `samplesheetToList` function. In `nf-validation`, you could add a relative path to another JSON schema to the parameter schema so that the plugin would validate the file given with that parameter using the supplied JSON schema. It was necessary to also add this for sample sheet inputs as they would not be validated otherwise. Due to the change described earlier, the schema should no longer be given to the sample sheet inputs because they will be validated twice that way. Last, but certainly not least, this function also introduces the possibility of using nested sample sheets. This was probably one of the most requested features and it’s completely possible right now! Mind that this feature only works for YAML and JSON sample sheets since CSV and TSV do not support nesting.\n\n# Configuration sensation\n\nIn `nf-validation`, you could configure how the plugin worked by certain parameters (like `validationSchemaIgnoreParams`, which could be used to exempt certain parameters from the validation). These parameters have now been converted to proper configuration options under the `validation` scope. The `validationSchemaIgnoreParams` has even been expanded into two configuration options: `validation.ignoreParams` and `validation.defaultIgnoreParams`. The former is to be used by the pipeline user to exclude certain parameters from validation, while the latter is to be used by the pipeline developer to set which parameters should be ignored by default. The plugin combines both options so users no longer need to supply the defaults alongside their parameters that need to be ignored.\n\n# But, why not stick to nf-validation?\n\nIn February we released an earlier version of these changes as `nf-validation` version `2.0.0`. This immediately caused massive issues in quite some nf-core pipelines (I think I set a new record of how many pipelines could be broken by one release). This was due to the fact that a lot of pipelines didn’t pin the `nf-validation` version, so all these pipelines started pulling the newest version of `nf-validation`. The pipelines all started showing errors because this release contained breaking changes. For that reason we decided to remove the version `2.0.0` release until more pipelines pinned their plugin versions.\n\n
\n \"meme\n
\n\nSome discussion arose from this and we decided that version `2.0.0` would always cause issues since a lot of older versions of the nf-core pipelines didn’t pin their nf-validation version either, which would mean that all those older versions (that were probably running as production pipelines) would suddenly start breaking. That’s why there seemed to be only one sensible solution: make a new plugin with the breaking changes! And it would also need a new name. We started collecting feedback from the community and got some very nice suggestions. I made a poll with the 5 most popular suggestions and let everyone vote on their preferred options. The last place was tied between `nf-schemavalidator` and `nf-validationutils`, both with 3 votes. In third place was `nf-checker` with 4 votes. The second place belonged to `nf-validation2` with 7 votes. And with 13 votes we had a winner: `nf-schema`!\n\nSo, a fork was made of `nf-validation` that we called `nf-schema`. At this point, the only breaking change was the new JSON schema draft, but some other feature requests started pouring in. That’s the reason why the new `samplesheetToList` function and the configuration options were implemented before the first release of `nf-schema` on the 22nd of April 2024.\n\nAnd to try and mitigate the same issue from ever happening again, we added an automatic warning when the pipeline is being run with an unpinned version of nf-schema:\n\n
\n \"meme\n
\n\n# So, what’s next?\n\nOne of the most requested features is support for nested parameters. Version `2.0.0` was already getting pretty big, so I decided not to implement any extra features in it. This is, however, one of the first features that I will try to tackle in version `2.1.0`.\n\nFurthermore, I’d also like to improve the functionality of the `exists` keyword to also work for non-conventional paths (like S3 and Azure paths).\n\nIt’s also a certainty that some weird bugs will pop up over time; those will, of course, also be fixed.\n\n# Useful links\n\nHere are some useful links to get you started on using `nf-schema`:\n\nIf you want to easily migrate from nf-validation to `nf-schema`, you can use the migration guide: https://nextflow-io.github.io/nf-schema/latest/migration_guide/\n\nIf you are completely new to the plugin, I suggest reading through the documentation: https://nextflow-io.github.io/nf-schema/latest/\n\nIf you need some examples, look no further: https://github.com/nextflow-io/nf-schema/tree/master/examples\n\nAnd to conclude this blog post, here are some very wise words from Master Yoda himself:\n\n
\n \"meme\n
\n", + "content": "Check out Nextflow's newest plugin, nf-schema! It's an enhanced version of nf-validation, utilizing JSON schemas to validate parameters and sample sheets. Unlike its predecessor, it supports the latest JSON schema draft and can convert pipeline-generated files. But what's the story behind its development?\n\n\n\n`nf-validation` is a well-known Nextflow plugin that uses JSON schemas to validate parameters and sample sheets. It can also convert sample sheets to channels using a built-in channel factory. On top of that, it can create a nice summary of pipeline parameters and can even be used to generate a help message for the pipeline.\n\nAll of this has made the plugin very popular in the Nextflow community, but it wasn’t without its issues. For example, the plugin uses an older version of the JSON schema draft, namely draft `07` while the latest draft is `2020-12`. It also can’t convert any files/sample sheets created by the pipeline itself since the channel factory is only able to access values from pipeline parameters.\n\nBut then `nf-schema` came to the rescue! In this plugin we rewrote large parts of the `nf-validation` code, making the plugin way faster and more flexible while adding a lot of requested features. Let’s see what’s been changed in this new and improved version of `nf-validation`.\n\n# What a shiny new JSON schema draft\n\nTo quote the official JSON schema website:\n\n> “JSON Schema is the vocabulary that enables JSON data consistency, validity, and interoperability at scale.”\n\nThis one sentence does an excellent job of explaining what JSON schema is and why it was such a great fit for `nf-validation` and `nf-schema`. By using these schemas, we can validate pipeline inputs in a way that would otherwise be impossible. The JSON schema drafts define a set of annotations that are used to set some conditions to which the data has to adhere. In our case, this can be used to determine what a parameter or sample sheet value should look like (this can range from what type of data it has to be to a specific pattern that the data has to follow).\n\nThe JSON schema draft `07` already has a lot of useful annotations, but it lacked some special annotations that could elevate our validations to the next level. That’s where the JSON schema draft `2020-12` came in. This draft contained a lot more specialized annotations, like dependent requirements of values (if one value is set, another value also has to be set). Although this example was already possible in `nf-validation`, it was poorly implemented and didn’t follow any consensus specified by the JSON schema team.\n\n
\n \"meme\n
\n\n# Bye-bye Channel Factory, hello Function\n\nOne major shortcoming in the `nf-validation` plugin was the lack of the `fromSamplesheet` channel factory to handle files created by the pipeline (or files imported from another pipeline as part of a meta pipeline). That’s why we decided to remove the `fromSamplesheet` channel factory and replace it with a function called `samplesheetToList` that can be deployed in an extremely flexible way. It takes two inputs: the sample sheet to be validated and converted, and the JSON schema used for the conversion. Both inputs can either be a `String` value containing the path to the files or a Nextflow `file` object. By converting the channel factory to a function, we also decoupled the parameter schema from the actual sample sheet conversion. This means all validation and conversion of the sample sheet is now fully done by the `samplesheetToList` function. In `nf-validation`, you could add a relative path to another JSON schema to the parameter schema so that the plugin would validate the file given with that parameter using the supplied JSON schema. It was necessary to also add this for sample sheet inputs as they would not be validated otherwise. Due to the change described earlier, the schema should no longer be given to the sample sheet inputs because they will be validated twice that way. Last, but certainly not least, this function also introduces the possibility of using nested sample sheets. This was probably one of the most requested features and it’s completely possible right now! Mind that this feature only works for YAML and JSON sample sheets since CSV and TSV do not support nesting.\n\n# Configuration sensation\n\nIn `nf-validation`, you could configure how the plugin worked by certain parameters (like `validationSchemaIgnoreParams`, which could be used to exempt certain parameters from the validation). These parameters have now been converted to proper configuration options under the `validation` scope. The `validationSchemaIgnoreParams` has even been expanded into two configuration options: `validation.ignoreParams` and `validation.defaultIgnoreParams`. The former is to be used by the pipeline user to exclude certain parameters from validation, while the latter is to be used by the pipeline developer to set which parameters should be ignored by default. The plugin combines both options so users no longer need to supply the defaults alongside their parameters that need to be ignored.\n\n# But, why not stick to nf-validation?\n\nIn February we released an earlier version of these changes as `nf-validation` version `2.0.0`. This immediately caused massive issues in quite some nf-core pipelines (I think I set a new record of how many pipelines could be broken by one release). This was due to the fact that a lot of pipelines didn’t pin the `nf-validation` version, so all these pipelines started pulling the newest version of `nf-validation`. The pipelines all started showing errors because this release contained breaking changes. For that reason we decided to remove the version `2.0.0` release until more pipelines pinned their plugin versions.\n\n
\n \"meme\n
\n\nSome discussion arose from this and we decided that version `2.0.0` would always cause issues since a lot of older versions of the nf-core pipelines didn’t pin their nf-validation version either, which would mean that all those older versions (that were probably running as production pipelines) would suddenly start breaking. That’s why there seemed to be only one sensible solution: make a new plugin with the breaking changes! And it would also need a new name. We started collecting feedback from the community and got some very nice suggestions. I made a poll with the 5 most popular suggestions and let everyone vote on their preferred options. The last place was tied between `nf-schemavalidator` and `nf-validationutils`, both with 3 votes. In third place was `nf-checker` with 4 votes. The second place belonged to `nf-validation2` with 7 votes. And with 13 votes we had a winner: `nf-schema`!\n\nSo, a fork was made of `nf-validation` that we called `nf-schema`. At this point, the only breaking change was the new JSON schema draft, but some other feature requests started pouring in. That’s the reason why the new `samplesheetToList` function and the configuration options were implemented before the first release of `nf-schema` on the 22nd of April 2024.\n\nAnd to try and mitigate the same issue from ever happening again, we added an automatic warning when the pipeline is being run with an unpinned version of nf-schema:\n\n
\n \"meme\n
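\n\nIf you want to avoid that warning (and protect production pipelines from surprise upgrades), pin the plugin to an exact release in your configuration. A minimal sketch, assuming you want to stay on version `2.0.0`:\n\n```groovy\n// nextflow.config: request an exact nf-schema release instead of always pulling the latest\nplugins {\n    id 'nf-schema@2.0.0'\n}\n```\n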
\n\n# So, what’s next?\n\nOne of the most requested features is support for nested parameters. Version `2.0.0` was already getting pretty big, so I decided not to implement any extra features in it. This is, however, one of the first features that I will try to tackle in version `2.1.0`.\n\nFurthermore, I’d also like to improve the functionality of the `exists` keyword to also work for non-conventional paths (like S3 and Azure paths).\n\nIt’s also a certainty that some weird bugs will pop up over time; those will, of course, also be fixed.\n\n# Useful links\n\nHere are some useful links to get you started on using `nf-schema`:\n\nIf you want to easily migrate from nf-validation to `nf-schema`, you can use the migration guide: https://nextflow-io.github.io/nf-schema/latest/migration_guide/\n\nIf you are completely new to the plugin, I suggest reading through the documentation: https://nextflow-io.github.io/nf-schema/latest/\n\nIf you need some examples, look no further: https://github.com/nextflow-io/nf-schema/tree/master/examples\n\nAnd to conclude this blog post, here are some very wise words from Master Yoda himself:\n\n
\n \"meme\n
", "images": [ "/img/blog-2024-05-01-nfschema-img1a.jpg", "/img/blog-2024-05-01-nfschema-img1b.jpg", @@ -780,7 +780,7 @@ "slug": "2024/nf-test-in-nf-core", "title": "Leveraging nf-test for enhanced quality control in nf-core", "date": "2024-04-03T00:00:00.000Z", - "content": "\n# The ever-changing landscape of bioinformatics\n\nReproducibility is an important attribute of all good science. This is especially true in the realm of bioinformatics, where software is **hopefully** being updated, and pipelines are **ideally** being maintained. Improvements and maintenance are great, but they also bring about an important question: Do bioinformatics tools and pipelines continue to run successfully and produce consistent results despite these changes? Fortunately for us, there is an existing approach to ensure software reproducibility: testing.\n\n\n\n# The Wonderful World of Testing\n\n> \"Software testing is the process of evaluating and verifying that a software product does what it is supposed to do,\"\n> Lukas Forer, co-creator of nf-test.\n\nSoftware testing has two primary purposes: determining whether an operation continues to run successfully after changes are made, and comparing outputs across runs to see if they are consistent. Testing can alert the developer that an output has changed so that an appropriate fix can be made. Admittedly, there are some instances when altered outputs are intentional (i.e., improving a tool might lead to better, and therefore different, results). However, even in these scenarios, it is important to know what has changed, so that no unintentional changes are introduced during an update.\n\n# Writing effective tests\n\nAlthough having any test is certainly better than having no tests at all, there are several considerations to keep in mind when adding tests to pipelines and/or tools to maximize their effectiveness. These considerations can be broadly categorized into two groups:\n\n1. Which inputs/functionalities should be tested?\n2. What contents should be tested?\n\n## Consideration 1: Testing inputs/functionality\n\nGenerally, software will have a default or most common use case. For instance, the nf-core [FastQC](https://nf-co.re/modules/fastqc) module is commonly used to assess the quality of paired-end reads in FastQ format. However, this is not the only way to use the FastQC module. Inputs can also be single-end/interleaved FastQ files, BAM files, or can contain reads from multiple samples. Each input type is analyzed differently by FastQC, and therefore, to increase your test coverage ([\"the degree to which a test or set of tests exercises a particular program or system\"](https://www.geeksforgeeks.org/test-design-coverage-in-software-testing/)), a test should be written for each possible input. Additionally, different settings can change how a process is executed. For example, in the [bowtie2/align](https://nf-co.re/modules/bowtie2_align) module, aside from input files, the `save_unaligned` and `sort_bam` parameters can alter how this module functions and the outputs it generates. Thus, tests should be written for each possible scenario. When writing tests, aim to consider as many variations as possible. If some are missed, don't worry! Additional tests can be added later. Discovering these different use cases and how to address/test them is part of the development process.\n\n## Consideration 2: Testing outputs\n\nOnce test cases are established, the next step is determining what specifically should be evaluated in each test. 
Generally, these evaluations are referred to as assertions. Assertions can range from verifying whether a job has been completed successfully to comparing the output channel/file contents between runs. Ideally, tests should incorporate all outputs, although there are scenarios where this is not feasible (for example, outputs containing timestamps or paths). In such cases, it's often best to include at least a portion of the contents from the problematic file or, at the minimum, the name of the file to ensure that it is consistently produced.\n\n# Testing in nf-core\n\nnf-core is a community-driven initiative that aims to provide high-quality, Nextflow-based bioinformatics pipelines. The community's emphasis on reproducibility makes testing an essential aspect of the nf-core ecosystem. Until recently, tests were implemented using pytest for modules/subworkflows and test profiles for pipelines. These tests ensured that nf-core components could run successfully following updates. However, at the pipeline level, they did not check file contents to evaluate output consistency. Additionally, using two different testing approaches lacked the standardization nf-core strives for. An ideal test framework would integrate tests at all Nextflow development levels (functions, modules, subworkflows, and pipelines) and comprehensively test outputs.\n\n# New and Improved Nextflow Testing with nf-test\n\nCreated by [Lukas Forer](https://github.com/lukfor) and [Sebastian Schönherr](https://github.com/seppinho), nf-test has emerged as the leading solution for testing Nextflow pipelines. Their goal was to enhance the evaluation of reproducibility in complex Nextflow pipelines. To this end, they have implemented several notable features, creating a robust testing platform:\n\n1. **Comprehensive Output Testing**: nf-test employs [snapshots](https://www.nf-test.com/docs/assertions/snapshots/) for handling complex data structures. This feature evaluates the contents of any specified output channel/file, enabling comprehensive and reliable tests that ensure data integrity following changes.\n2. **A Consistent Testing Framework for All Nextflow Components**: nf-test provides a unified framework for testing everything from individual functions to entire pipelines, ensuring consistency across all components.\n3. **A DSL for Tests**: Designed in the likeness of Nextflow, nf-test's intuitive domain-specific language (DSL) uses 'when' and 'then' blocks to describe expected behaviors in pipelines, facilitating easier test script writing.\n4. **Readable Assertions**: nf-test offers a wide range of functions for writing clear and understandable [assertions](https://www.nf-test.com/docs/assertions/assertions/), improving the clarity and maintainability of tests.\n5. **Boilerplate Code Generation**: To accelerate the testing process, nf-test and nf-core tools feature commands that generate boilerplate code, streamlining the development of new tests.\n\n# But wait… there's more!\n\nThe merits of having a consistent and comprehensive testing platform are significantly amplified with nf-test's integration into nf-core. This integration provides an abundance of resources for incorporating nf-test into your Nextflow development. Thanks to this collaboration, you can utilize common nf-test commands via nf-core tools and easily install nf-core modules/subworkflows that already have nf-test implemented. 
Moreover, an [expanding collection of examples](https://nf-co.re/docs/contributing/tutorials/nf-test_assertions) is available to guide you through adopting nf-test for your projects.\n\n# Adding nf-test to pipelines\n\nSeveral nf-core pipelines have begun to adopt nf-test as their testing framework. Among these, [nf-core/methylseq](https://nf-co.re/methylseq/) was the first to implement pipeline-level nf-tests as a proof-of-concept. However, since this initial implementation, nf-core maintainers have identified that the existing nf-core pipeline template needs modifications to better support nf-test. These adjustments aim to enhance compatibility with nf-test across components (modules, subworkflows, workflows) and ensure that tests are included and shipped with each component. A more detailed blog post about these changes will be published in the future.\nFollowing these insights, [nf-core/fetchngs](https://nf-co.re/fetchngs) has been at the forefront of incorporating nf-test for testing modules, subworkflows, and at the pipeline level. Currently, fetchngs serves as the best-practice example for nf-test implementation within the nf-core community. Other nf-core pipelines actively integrating nf-test include [mag](https://nf-co.re/mag), [sarek](https://nf-co.re/sarek), [readsimulator](https://nf-co.re/readsimulator), and [rnaseq](https://nf-co.re/rnaseq).\n\n# Pipeline development with nf-test\n\n**For newer nf-core pipelines, integrating nf-test as early as possible in the development process is highly recommended**. An example of a pipeline that has benefitted from the incorporation of nf-tests throughout its development is [phageannotator](https://github.com/nf-core/phageannotator). Although integrating nf-test during pipeline development has presented challenges, it has offered a unique opportunity to evaluate different testing methodologies and has been instrumental in identifying numerous development errors that might have been overlooked using the previous test profiles approach. Additionally, investing time early on has significantly simplified modifying different aspects of the pipeline, ensuring that functionality and output remain unaffected.\nFor those embarking on creating new Nextflow pipelines, here are a few key takeaways from our experience:\n\n1. **Leverage nf-core modules/subworkflows extensively**. Devoting time early to contribute modules/subworkflows to nf-core not only streamlines future development for you and your PR reviewers but also simplifies maintaining, linting, and updating pipeline components through nf-core tools. Furthermore, these modules will likely benefit others in the community with similar research interests.\n2. **Prioritize incremental changes over large overhauls**. Incremental changes are almost always preferable to large, unwieldy modifications. This approach is particularly beneficial when monitoring and updating nf-tests at the module, subworkflow, and pipeline levels. Introducing too many changes simultaneously can overwhelm both developers and reviewers, making it challenging to track what has been modified and what requires testing. Aim to keep changes straightforward and manageable.\n3. **Facilitate parallel execution of nf-test to generate and test snapshots**. By default, nf-test runs each test sequentially, which can make the process of running multiple tests to generate or updating snapshots time-consuming. 
Implementing scripts that allow tests to run in parallel—whether via a workload manager or in the cloud—can significantly save time and simplify the process of monitoring tests for pass or fail outcomes.\n\n# Community and contribution\n\nnf-core is a community that relies on consistent contributions, evaluation, and feedback from its members to improve and stay up-to-date. This holds true as we transition to a new testing framework as well. Currently, there are two primary ways that people have been contributing in this transition:\n\n1. **Adding nf-tests to new and existing nf-core modules/subworkflows**. There has been a recent emphasis on migrating modules/subworkflows from pytest to nf-test because of the advantages mentioned previously. Fortunately, the nf-core team has added very helpful [instructions](https://nf-co.re/docs/contributing/modules#migrating-from-pytest-to-nf-test) to the website, which has made this process much more streamlined.\n2. **Adding nf-tests to nf-core pipelines**. Another area of focus is the addition of nf-tests to nf-core pipelines. This process can be quite difficult for large, complex pipelines, but there are now several examples of pipelines with nf-tests that can be used as a blueprint for getting started ([fetchngs](https://github.com/nf-core/fetchngs/tree/master), [sarek](https://github.com/nf-core/sarek/tree/master), [rnaseq](https://github.com/nf-core/rnaseq/tree/master), [readsimulator](https://github.com/nf-core/readsimulator/tree/master), [phageannotator](https://github.com/nf-core/phageannotator)).\n\n> These are great areas to work on & contribute in nf-core hackathons\n\nThe nf-core community added a significant number of nf-tests during the recent [hackathon in March 2024](https://nf-co.re/events/2024/hackathon-march-2024). Yet the role of the community is not limited to adding test code. A robust testing infrastructure requires nf-core users to identify testing errors, additional test cases, and provide feedback so that the system can continually be improved. Each of us brings a different perspective, and the development-feedback loop that results from collaboration brings about a much more effective, transparent, and inclusive system than if we worked in isolation.\n\n# Future directions\n\nLooking ahead, nf-core and nf-test are poised for tighter integration and significant advancements. Anticipated developments include enhanced testing capabilities, more intuitive interfaces for writing and managing tests, and deeper integration with cloud-based resources. These improvements will further solidify the position of nf-core and nf-test at the forefront of bioinformatics workflow management.\n\n# Conclusion\n\nThe integration of nf-test within the nf-core ecosystem marks a significant leap forward in ensuring the reproducibility and reliability of bioinformatics pipelines. By adopting nf-test, developers and researchers alike can contribute to a culture of excellence and collaboration, driving forward the quality and accuracy of bioinformatics research.\n\nSpecial thanks to everyone in the #nf-test channel in the nf-core slack workspace for their invaluable contributions, feedback, and support throughout this adoption. We are immensely grateful for your commitment and look forward to continuing our productive collaboration.\n", + "content": "# The ever-changing landscape of bioinformatics\n\nReproducibility is an important attribute of all good science. 
This is especially true in the realm of bioinformatics, where software is **hopefully** being updated, and pipelines are **ideally** being maintained. Improvements and maintenance are great, but they also bring about an important question: Do bioinformatics tools and pipelines continue to run successfully and produce consistent results despite these changes? Fortunately for us, there is an existing approach to ensure software reproducibility: testing.\n\n\n\n# The Wonderful World of Testing\n\n> \"Software testing is the process of evaluating and verifying that a software product does what it is supposed to do,\"\n> Lukas Forer, co-creator of nf-test.\n\nSoftware testing has two primary purposes: determining whether an operation continues to run successfully after changes are made, and comparing outputs across runs to see if they are consistent. Testing can alert the developer that an output has changed so that an appropriate fix can be made. Admittedly, there are some instances when altered outputs are intentional (i.e., improving a tool might lead to better, and therefore different, results). However, even in these scenarios, it is important to know what has changed, so that no unintentional changes are introduced during an update.\n\n# Writing effective tests\n\nAlthough having any test is certainly better than having no tests at all, there are several considerations to keep in mind when adding tests to pipelines and/or tools to maximize their effectiveness. These considerations can be broadly categorized into two groups:\n\n1. Which inputs/functionalities should be tested?\n2. What contents should be tested?\n\n## Consideration 1: Testing inputs/functionality\n\nGenerally, software will have a default or most common use case. For instance, the nf-core [FastQC](https://nf-co.re/modules/fastqc) module is commonly used to assess the quality of paired-end reads in FastQ format. However, this is not the only way to use the FastQC module. Inputs can also be single-end/interleaved FastQ files, BAM files, or can contain reads from multiple samples. Each input type is analyzed differently by FastQC, and therefore, to increase your test coverage ([\"the degree to which a test or set of tests exercises a particular program or system\"](https://www.geeksforgeeks.org/test-design-coverage-in-software-testing/)), a test should be written for each possible input. Additionally, different settings can change how a process is executed. For example, in the [bowtie2/align](https://nf-co.re/modules/bowtie2_align) module, aside from input files, the `save_unaligned` and `sort_bam` parameters can alter how this module functions and the outputs it generates. Thus, tests should be written for each possible scenario. When writing tests, aim to consider as many variations as possible. If some are missed, don't worry! Additional tests can be added later. Discovering these different use cases and how to address/test them is part of the development process.\n\n## Consideration 2: Testing outputs\n\nOnce test cases are established, the next step is determining what specifically should be evaluated in each test. Generally, these evaluations are referred to as assertions. Assertions can range from verifying whether a job has been completed successfully to comparing the output channel/file contents between runs. Ideally, tests should incorporate all outputs, although there are scenarios where this is not feasible (for example, outputs containing timestamps or paths). 
In such cases, it's often best to include at least a portion of the contents from the problematic file or, at the minimum, the name of the file to ensure that it is consistently produced.\n\n# Testing in nf-core\n\nnf-core is a community-driven initiative that aims to provide high-quality, Nextflow-based bioinformatics pipelines. The community's emphasis on reproducibility makes testing an essential aspect of the nf-core ecosystem. Until recently, tests were implemented using pytest for modules/subworkflows and test profiles for pipelines. These tests ensured that nf-core components could run successfully following updates. However, at the pipeline level, they did not check file contents to evaluate output consistency. Additionally, using two different testing approaches lacked the standardization nf-core strives for. An ideal test framework would integrate tests at all Nextflow development levels (functions, modules, subworkflows, and pipelines) and comprehensively test outputs.\n\n# New and Improved Nextflow Testing with nf-test\n\nCreated by [Lukas Forer](https://github.com/lukfor) and [Sebastian Schönherr](https://github.com/seppinho), nf-test has emerged as the leading solution for testing Nextflow pipelines. Their goal was to enhance the evaluation of reproducibility in complex Nextflow pipelines. To this end, they have implemented several notable features, creating a robust testing platform:\n\n1. **Comprehensive Output Testing**: nf-test employs [snapshots](https://www.nf-test.com/docs/assertions/snapshots/) for handling complex data structures. This feature evaluates the contents of any specified output channel/file, enabling comprehensive and reliable tests that ensure data integrity following changes.\n2. **A Consistent Testing Framework for All Nextflow Components**: nf-test provides a unified framework for testing everything from individual functions to entire pipelines, ensuring consistency across all components.\n3. **A DSL for Tests**: Designed in the likeness of Nextflow, nf-test's intuitive domain-specific language (DSL) uses 'when' and 'then' blocks to describe expected behaviors in pipelines, facilitating easier test script writing.\n4. **Readable Assertions**: nf-test offers a wide range of functions for writing clear and understandable [assertions](https://www.nf-test.com/docs/assertions/assertions/), improving the clarity and maintainability of tests.\n5. **Boilerplate Code Generation**: To accelerate the testing process, nf-test and nf-core tools feature commands that generate boilerplate code, streamlining the development of new tests.\n\n# But wait… there's more!\n\nThe merits of having a consistent and comprehensive testing platform are significantly amplified with nf-test's integration into nf-core. This integration provides an abundance of resources for incorporating nf-test into your Nextflow development. Thanks to this collaboration, you can utilize common nf-test commands via nf-core tools and easily install nf-core modules/subworkflows that already have nf-test implemented. Moreover, an [expanding collection of examples](https://nf-co.re/docs/contributing/tutorials/nf-test_assertions) is available to guide you through adopting nf-test for your projects.\n\n# Adding nf-test to pipelines\n\nSeveral nf-core pipelines have begun to adopt nf-test as their testing framework. Among these, [nf-core/methylseq](https://nf-co.re/methylseq/) was the first to implement pipeline-level nf-tests as a proof-of-concept. 
However, since this initial implementation, nf-core maintainers have identified that the existing nf-core pipeline template needs modifications to better support nf-test. These adjustments aim to enhance compatibility with nf-test across components (modules, subworkflows, workflows) and ensure that tests are included and shipped with each component. A more detailed blog post about these changes will be published in the future.\nFollowing these insights, [nf-core/fetchngs](https://nf-co.re/fetchngs) has been at the forefront of incorporating nf-test for testing modules, subworkflows, and at the pipeline level. Currently, fetchngs serves as the best-practice example for nf-test implementation within the nf-core community. Other nf-core pipelines actively integrating nf-test include [mag](https://nf-co.re/mag), [sarek](https://nf-co.re/sarek), [readsimulator](https://nf-co.re/readsimulator), and [rnaseq](https://nf-co.re/rnaseq).\n\n# Pipeline development with nf-test\n\n**For newer nf-core pipelines, integrating nf-test as early as possible in the development process is highly recommended**. An example of a pipeline that has benefitted from the incorporation of nf-tests throughout its development is [phageannotator](https://github.com/nf-core/phageannotator). Although integrating nf-test during pipeline development has presented challenges, it has offered a unique opportunity to evaluate different testing methodologies and has been instrumental in identifying numerous development errors that might have been overlooked using the previous test profiles approach. Additionally, investing time early on has significantly simplified modifying different aspects of the pipeline, ensuring that functionality and output remain unaffected.\nFor those embarking on creating new Nextflow pipelines, here are a few key takeaways from our experience:\n\n1. **Leverage nf-core modules/subworkflows extensively**. Devoting time early to contribute modules/subworkflows to nf-core not only streamlines future development for you and your PR reviewers but also simplifies maintaining, linting, and updating pipeline components through nf-core tools. Furthermore, these modules will likely benefit others in the community with similar research interests.\n2. **Prioritize incremental changes over large overhauls**. Incremental changes are almost always preferable to large, unwieldy modifications. This approach is particularly beneficial when monitoring and updating nf-tests at the module, subworkflow, and pipeline levels. Introducing too many changes simultaneously can overwhelm both developers and reviewers, making it challenging to track what has been modified and what requires testing. Aim to keep changes straightforward and manageable.\n3. **Facilitate parallel execution of nf-test to generate and test snapshots**. By default, nf-test runs each test sequentially, which can make the process of running multiple tests to generate or updating snapshots time-consuming. Implementing scripts that allow tests to run in parallel—whether via a workload manager or in the cloud—can significantly save time and simplify the process of monitoring tests for pass or fail outcomes.\n\n# Community and contribution\n\nnf-core is a community that relies on consistent contributions, evaluation, and feedback from its members to improve and stay up-to-date. This holds true as we transition to a new testing framework as well. Currently, there are two primary ways that people have been contributing in this transition:\n\n1. 
**Adding nf-tests to new and existing nf-core modules/subworkflows**. There has been a recent emphasis on migrating modules/subworkflows from pytest to nf-test because of the advantages mentioned previously. Fortunately, the nf-core team has added very helpful [instructions](https://nf-co.re/docs/contributing/modules#migrating-from-pytest-to-nf-test) to the website, which has made this process much more streamlined.\n2. **Adding nf-tests to nf-core pipelines**. Another area of focus is the addition of nf-tests to nf-core pipelines. This process can be quite difficult for large, complex pipelines, but there are now several examples of pipelines with nf-tests that can be used as a blueprint for getting started ([fetchngs](https://github.com/nf-core/fetchngs/tree/master), [sarek](https://github.com/nf-core/sarek/tree/master), [rnaseq](https://github.com/nf-core/rnaseq/tree/master), [readsimulator](https://github.com/nf-core/readsimulator/tree/master), [phageannotator](https://github.com/nf-core/phageannotator)).\n\n> These are great areas to work on & contribute in nf-core hackathons\n\nThe nf-core community added a significant number of nf-tests during the recent [hackathon in March 2024](https://nf-co.re/events/2024/hackathon-march-2024). Yet the role of the community is not limited to adding test code. A robust testing infrastructure requires nf-core users to identify testing errors, additional test cases, and provide feedback so that the system can continually be improved. Each of us brings a different perspective, and the development-feedback loop that results from collaboration brings about a much more effective, transparent, and inclusive system than if we worked in isolation.\n\n# Future directions\n\nLooking ahead, nf-core and nf-test are poised for tighter integration and significant advancements. Anticipated developments include enhanced testing capabilities, more intuitive interfaces for writing and managing tests, and deeper integration with cloud-based resources. These improvements will further solidify the position of nf-core and nf-test at the forefront of bioinformatics workflow management.\n\n# Conclusion\n\nThe integration of nf-test within the nf-core ecosystem marks a significant leap forward in ensuring the reproducibility and reliability of bioinformatics pipelines. By adopting nf-test, developers and researchers alike can contribute to a culture of excellence and collaboration, driving forward the quality and accuracy of bioinformatics research.\n\nSpecial thanks to everyone in the #nf-test channel in the nf-core slack workspace for their invaluable contributions, feedback, and support throughout this adoption. We are immensely grateful for your commitment and look forward to continuing our productive collaboration.", "images": [], "author": "Carson Miller", "tags": "nextflow,nf-core,nf-test,ambassador_post" @@ -789,7 +789,7 @@ "slug": "2024/nxf-nf-core-workshop-kogo", "title": "Nextflow workshop at the 20th KOGO Winter Symposium", "date": "2024-03-14T00:00:00.000Z", - "content": "\nThrough a partnership between AWS Asia Pacific and Japan, and Seqera, Nextflow touched ground in South Korea for the first time with a training session at the Korea Genome Organization (KOGO) Winter Symposium. The objective was to introduce participants to Nextflow, empowering them to craft their own pipelines. Recognizing the interest among bioinformaticians, MinSung Cho from AWS Korea’s Healthcare & Research Team decided to sponsor this 90-minute workshop session. 
This initiative covered my travel expenses and accommodations.\n\n\n\n
\n \"Nextflow\n
\n\nThe training commenced with an overview of Nextflow pipelines, exemplified by the [nf-core/nanoseq](https://nf-co.re/nanoseq/3.1.0) Nextflow pipeline, highlighting the subworkflows and modules. nfcore/nanoseq is a bioinformatics analysis pipeline for Nanopore DNA/RNA sequencing data that can be used to perform base-calling, demultiplexing, QC, alignment, and downstream analysis. Following this, participants engaged in a hands-on workshop using the AWS Cloud9 environment. In 70 minutes, they constructed a basic pipeline for analyzing nanopore sequencing data, incorporating workflow templates, modules, and subworkflows from [nf-core/tools](https://github.com/nf-core/tools). If you're interested in learning more about the nf-core/nanoseq Nextflow pipeline, I recorded a video talking about it in the nf-core bytesize meeting. You can watch it [here](https://www.youtube.com/watch?v=KM1A0_GD2vQ).\n\n
\n \"Slide\n
\n\nYou can find the workshop slides [here](https://docs.google.com/presentation/d/1OC4ccgbrNet4e499ShIT7S6Gm6S0xr38_OauKPa4G88/edit?usp=sharing) and the GitHub repository with source code [here](https://github.com/yuukiiwa/nf-core-koreaworkshop).\n\nThe workshop received positive feedback, with participants expressing interest in further sessions to deepen their Nextflow proficiency. Due to this feedback, AWS and the nf-core outreach team are considering organizing small-group local or Zoom training sessions in response to these requests.\n\nIt is imperative to acknowledge the invaluable contributions and support from AWS Korea’s Health Care & Research Team, including MinSung Cho, HyunMin Kim, YoungUng Kim, SeungChang Kang, and Jiyoon Hwang, without whom this workshop would not have been possible. Gratitude is also extended to Charlie Lee for fostering collaboration with the nf-core/outreach team.\n", + "content": "Through a partnership between AWS Asia Pacific and Japan, and Seqera, Nextflow touched ground in South Korea for the first time with a training session at the Korea Genome Organization (KOGO) Winter Symposium. The objective was to introduce participants to Nextflow, empowering them to craft their own pipelines. Recognizing the interest among bioinformaticians, MinSung Cho from AWS Korea’s Healthcare & Research Team decided to sponsor this 90-minute workshop session. This initiative covered my travel expenses and accommodations.\n\n\n\n
\n \"Nextflow\n
\n\nThe training commenced with an overview of Nextflow pipelines, exemplified by the [nf-core/nanoseq](https://nf-co.re/nanoseq/3.1.0) Nextflow pipeline, highlighting the subworkflows and modules. nf-core/nanoseq is a bioinformatics analysis pipeline for Nanopore DNA/RNA sequencing data that can be used to perform base-calling, demultiplexing, QC, alignment, and downstream analysis. Following this, participants engaged in a hands-on workshop using the AWS Cloud9 environment. In 70 minutes, they constructed a basic pipeline for analyzing nanopore sequencing data, incorporating workflow templates, modules, and subworkflows from [nf-core/tools](https://github.com/nf-core/tools). If you're interested in learning more about the nf-core/nanoseq Nextflow pipeline, I recorded a video talking about it in the nf-core bytesize meeting. You can watch it [here](https://www.youtube.com/watch?v=KM1A0_GD2vQ).\n\n
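To give a sense of the hands-on steps, the scaffolding relied on standard nf-core/tools commands. A rough sketch is shown below; the exact subcommands and the module/subworkflow names are illustrative assumptions and depend on the nf-core/tools version used:\n\n```bash\n# create a new pipeline skeleton from the nf-core template\nnf-core create\n\n# add an existing nf-core module to the pipeline\nnf-core modules install fastqc\n\n# add an existing nf-core subworkflow\nnf-core subworkflows install bam_sort_stats_samtools\n```\n\n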
\n \"Slide\n
\n\nYou can find the workshop slides [here](https://docs.google.com/presentation/d/1OC4ccgbrNet4e499ShIT7S6Gm6S0xr38_OauKPa4G88/edit?usp=sharing) and the GitHub repository with source code [here](https://github.com/yuukiiwa/nf-core-koreaworkshop).\n\nThe workshop received positive feedback, with participants expressing interest in further sessions to deepen their Nextflow proficiency. Due to this feedback, AWS and the nf-core outreach team are considering organizing small-group local or Zoom training sessions in response to these requests.\n\nIt is imperative to acknowledge the invaluable contributions and support from AWS Korea’s Health Care & Research Team, including MinSung Cho, HyunMin Kim, YoungUng Kim, SeungChang Kang, and Jiyoon Hwang, without whom this workshop would not have been possible. Gratitude is also extended to Charlie Lee for fostering collaboration with the nf-core/outreach team.", "images": [ "/img/blog-2024-03-14-kogo-img1a.jpg", "/img/blog-2024-03-14-kogo-img1b.png" @@ -801,7 +801,7 @@ "slug": "2024/optimizing-nextflow-for-hpc-and-cloud-at-scale", "title": "Optimizing Nextflow for HPC and Cloud at Scale", "date": "2024-01-17T00:00:00.000Z", - "content": "\n## Introduction\n\nA Nextflow workflow run consists of the head job (Nextflow itself) and compute tasks (defined in the pipeline script). It is common to request resources for the tasks via process directives such as `cpus` and `memory`, but the Nextflow head job also requires compute resources. Most of the time, users don’t need to explicitly define the head job resources, as Nextflow generally does a good job of allocating resources for itself. For very large workloads, however, head job resource sizing becomes much more important.\n\nIn this article, we will help you understand how the Nextflow head job works and show you how to tune head job resources such as CPUs and memory for your use case.\n\n\n\n## Head job resources\n\n### CPUs\n\nNextflow uses a thread pool to run native Groovy code (e.g. channel operators, `exec` processes), submit tasks to executors, and publish output files. The number of threads is based on the number of available CPUs, so if you want to provide more compute power to the head job, simply allocate more CPUs and Nextflow will use them. In the [Seqera Platform](https://seqera.io/platform/), you can use **Head Job CPUs** or **Head Job submit options** (depending on the compute environment) to allocate more CPUs.\n\n### Memory\n\nNextflow runs on the Java Virtual Machine (JVM), so it allocates memory based on the standard JVM options, specifically the initial and maximum heap size. 
You can view the default JVM options for your environment by running this command:\n\n```bash\njava -XX:+PrintFlagsFinal -version | grep 'HeapSize\\|RAM'\n```\n\nFor example, here are the JVM options for an environment with 8 GB of RAM and OpenJDK Temurin 17.0.6:\n\n```\n size_t ErgoHeapSizeLimit = 0\n size_t HeapSizePerGCThread = 43620760\n size_t InitialHeapSize = 127926272\n uintx InitialRAMFraction = 64\n double InitialRAMPercentage = 1.562500\n size_t LargePageHeapSizeThreshold = 134217728\n size_t MaxHeapSize = 2044723200\n uint64_t MaxRAM = 137438953472\n uintx MaxRAMFraction = 4\n double MaxRAMPercentage = 25.000000\n size_t MinHeapSize = 8388608\n uintx MinRAMFraction = 2\n double MinRAMPercentage = 50.000000\n uintx NonNMethodCodeHeapSize = 5839372\n uintx NonProfiledCodeHeapSize = 122909434\n uintx ProfiledCodeHeapSize = 122909434\n size_t SoftMaxHeapSize = 2044723200\n```\n\nThese settings (displayed in bytes) show an initial and maximum heap size of ~128MB and ~2GB, or 1/64 (1.5625%) and 1/4 (25%) of physical memory. These percentages are the typical default settings, although different environments may have different defaults. In the Seqera Platform, the default settings are 40% and 75%, respectively.\n\nYou can set these options for Nextflow at runtime, for example:\n\n```bash\n# absolute values\nexport NXF_JVM_ARGS=\"-Xms2g -Xmx6g\"\n\n# percentages\nexport NXF_JVM_ARGS=\"-XX:InitialRAMPercentage=25 -XX:MaxRAMPercentage=75\"\n```\n\nIf you need to provide more memory to Nextflow, you can (1) allocate more memory to the head job and/or (2) use `NXF_JVM_ARGS` to increase the percentage of available memory that Nextflow can use. In the Seqera Platform, you can use **Head Job memory** or **Head Job submit options** (depending on the compute environment) to allocate more memory.\n\n### Disk\n\nThe Nextflow head job is generally responsible for downloading software dependencies and transferring inputs and outputs, but the details vary depending on the environment:\n\n- In an HPC environment, the home directory is typically used to store pipeline code and container images, while the work directory is typically stored in high-performance shared storage. Within the work directory, task inputs are staged from previous tasks via symlinks. Remote inputs (e.g. from HTTP or S3) are first staged into the work directory and then symlinked into the task directory.\n- In a cloud environment like AWS Batch, each task is responsible for pulling its own container image, downloading input files from the work directory (e.g. in S3), and uploading outputs. The head job’s local storage is only used to download the pipeline code.\n\nOverall, the head job uses very little local storage, since most data is saved to shared storage (HPC) or object storage (cloud) rather than the head job itself. However, there are a few specific cases to keep in mind, which we will cover in the following section.\n\n## Common failure modes\n\n### Not enough CPUs for local tasks\n\nIf your workflow has any tasks that use the local executor, make sure the Nextflow head job has enough CPUs to execute these tasks. For example, if a local task requires 4 CPUs, the Nextflow head job should have at least 5 CPUs (the local executor reserves 1 CPU for Nextflow by default).\n\n### Not enough memory for native pipeline code\n\nNextflow pipelines are a combination of native Groovy code (channels, operators, `exec` processes) and embedded shell scripts (`script` processes). 
Native code is executed directly by the Nextflow head job, while tasks with shell scripts are delegated to executors. Typically, tasks are used to perform the “actual” computations, while channels and operators are used to pass data between tasks.\n\nHowever much Groovy code you write, keep in mind that the Nextflow head job needs to have enough memory to execute it at the desired scale. The simplest way to determine how much memory Nextflow needs is to iteratively allocate more memory to the head job until it succeeds (e.g. start with 1 GB, then 2 GB, then 4 GB, and so on). In general, 2-4 GB is more than enough memory for the Nextflow head job.\n\n### Not enough memory to stage and publish files\n\nIn Nextflow, input files can come from a variety of sources: local files, an HTTP or FTP server, an S3 bucket, etc. When an input file is not local, Nextflow automatically stages the file into the work directory. Similarly, when a `publishDir` directive points to a remote path, Nextflow automatically “publishes” the output files using the correct protocol. These transfers are usually performed in-memory.\n\nMany users have encountered head job errors when running large-scale workloads, where the head job runs out of memory while staging or publishing files. While you can try to give more and more memory to Nextflow as in the previous example, you might be able to fix your problem by simply updating your Nextflow version. There have been many improvements to Nextflow over the past few years around file staging, particularly with S3, and overall we have seen fewer out-of-memory errors of this kind.\n\n### Not enough disk storage to build Singularity images\n\nSingularity / Apptainer can download and convert Docker images on the fly, and it uses the head job’s local scratch storage to do so. This is a common pattern in HPC environments, since container images are usually published as Docker images but HPC environments usually require the use of a rootless container runtime like Singularity. In this case, make sure the head job has enough scratch storage to build each image, even if the image is eventually saved to shared storage.\n\nSince Nextflow version [23.10.0](https://github.com/nextflow-io/nextflow/releases/tag/v23.10.0), you can use [Wave](https://seqera.io/wave/) to build Singularity images for you. Refer to the [Nextflow documentation](https://nextflow.io/docs/latest/wave.html#build-singularity-native-images) for more details.\n\nAdditionally, Nextflow version [23.11.0-edge](https://github.com/nextflow-io/nextflow/releases/tag/v23.11.0-edge) introduced support for [Singularity OCI mode](https://docs.sylabs.io/guides/3.1/user-guide/oci_runtime.html), which allows Singularity / Apptainer to use the OCI container format (the same as Docker) instead of having to build and store a SIF container image locally.\n\n### Failures due to head job and tasks sharing local storage\n\nThere are some situations where the head job and tasks may run on the same node and thereby share the node’s local storage, for example, Kubernetes. If this storage becomes full, any one of the jobs might fail first, including the head job. You can avoid this problem by segregating the head job to its own node, or explicitly requesting disk storage for each task so that they each have sufficient storage.\n\n## Virtual threads\n\n[Virtual threads](https://www.infoq.com/articles/java-virtual-threads/) were introduced in Java 19 and finalized in Java 21. 
Whereas threads in Java are normally “platform” threads managed by the operating system, “virtual” threads are user-space threads that share a pool of platform threads. Virtual threads use less memory and can be context-switched faster than platform threads, so an application that uses a fixed-size pool of platform threads (e.g. one thread per CPU) could instead have thousands of virtual threads (one thread per “task”) with the same memory footprint and more flexibility – if a virtual thread is blocked (i.e. waiting on I/O), the underlying platform thread can be switched to another virtual thread that isn’t blocked.\n\nSince Nextflow [23.05.0-edge](https://github.com/nextflow-io/nextflow/releases/tag/v23.05.0-edge), you can enable virtual threads by using Java 19 or later and setting the `NXF_ENABLE_VIRTUAL_THREADS` environment variable to `true`. Since version [23.10.0](https://github.com/nextflow-io/nextflow/releases/tag/v23.10.0), when using Java 21, virtual threads are enabled by default.\n\n### Initial Benchmark: S3 Upload\n\nVirtual threads are particularly useful when there are many I/O-bound tasks, such as uploading many files to S3. So to demonstrate this benefit, we wrote a pipeline… that uploads many files to S3! Here is the core pipeline code:\n\n```groovy\nparams.upload_count = 1000\nparams.upload_size = '10M'\n\nprocess make_random_file {\n publishDir 's3://my-bucket/data/'\n\n input:\n val index\n val size\n\n output:\n path '*.data'\n\n script:\n \"\"\"\n dd \\\n if=/dev/random \\\n of=upload-${size}-${index}.data \\\n bs=1 count=0 seek=${size}\n \"\"\"\n}\n\nworkflow {\n index = Channel.of(1..params.upload_count)\n make_random_file(index, params.upload_size)\n}\n```\n\nThe full source code is available on [GitHub](https://github.com/bentsherman/nf-head-job-benchmark).\n\nWe ran this pipeline across a variety of file sizes and counts, and the results are shown below. Error bars denote +/- 1 standard deviation across three independent trials.\n\nAt larger scales, virtual threads significantly reduce the total runtime, at the cost of higher CPU and memory usage. Considering that the head job resources are typically underutilized anyway, we think the lower time-to-solution is a decent trade!\n\nThe reason why virtual threads are faster in this case is that Nextflow usually spends extra time waiting for files to be published after all tasks have completed. Normally, these publishing tasks are executed by a fixed-size thread pool based on the number of CPUs, but with virtual threads there is no such limit, so Nextflow can fully utilize the available network bandwidth. In the largest case (1000x 100 MB files), virtual threads reduce the runtime by over 30%.\n\n
\n \"CPU\n
Figure 1: CPU usage
\n
\n\n
\n \"Memory\n
Figure 2: Memory usage
\n
\n\n
\n \"Workflow\n
Figure 3: Workflow runtime
\n
\n\n### Realistic Benchmark: nf-core/rnaseq\n\nTo evaluate virtual threads on a real pipeline, we also ran [nf-core/rnaseq](https://github.com/nf-core/rnaseq) with the `test` profile. To simulate a run with many samples, we upsampled the test dataset to 1000 samples. The results are summarized below:\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
|  | Walltime | Memory |\n| --- | --- | --- |\n| Platform threads | 2h 51m | 1.5 GB |\n| Virtual threads | 2h 47m | 1.9 GB |
\n\nAs you can see, the benefit here is not so clear. Whereas the upload benchmark was almost entirely I/O, a typical Nextflow pipeline spends most of its time scheduling compute tasks and waiting for them to finish. These tasks are generally not I/O bound and do not block for very long, so there may be little opportunity for improvement from virtual threads.\n\nThat being said, this benchmark consisted of only two runs of nf-core/rnaseq. We didn’t perform more runs here because they were so large, so your results may vary. In particular, if your Nextflow runs spend a lot of time publishing outputs after all the compute tasks have completed, you will likely benefit the most from using virtual threads. In any case, virtual threads should perform at least as well as platform threads, albeit with higher memory usage in some cases.\n\n## Summary\n\nThe key to right-sizing the Nextflow head job is to understand which parts of a Nextflow pipeline are executed directly by Nextflow, and which parts are delegated to compute tasks. This knowledge will help prevent head job failures at scale.\n\nHere are the main takeaways:\n\n- Nextflow uses a thread pool based on the number of available CPUs.\n- Nextflow uses a maximum heap size based on the standard JVM options, which is typically 25% of physical memory (75% in the Seqera Platform).\n- You can use `NXF_JVM_ARGS` to make more system memory available to Nextflow.\n- The easiest way to figure out how much memory Nextflow needs is to iteratively double the memory allocation until the workflow succeeds (but usually 2-4 GB is enough).\n- You can enable virtual threads in Nextflow, which may reduce overall runtime for some pipelines.\n", + "content": "## Introduction\n\nA Nextflow workflow run consists of the head job (Nextflow itself) and compute tasks (defined in the pipeline script). It is common to request resources for the tasks via process directives such as `cpus` and `memory`, but the Nextflow head job also requires compute resources. Most of the time, users don’t need to explicitly define the head job resources, as Nextflow generally does a good job of allocating resources for itself. For very large workloads, however, head job resource sizing becomes much more important.\n\nIn this article, we will help you understand how the Nextflow head job works and show you how to tune head job resources such as CPUs and memory for your use case.\n\n\n\n## Head job resources\n\n### CPUs\n\nNextflow uses a thread pool to run native Groovy code (e.g. channel operators, `exec` processes), submit tasks to executors, and publish output files. The number of threads is based on the number of available CPUs, so if you want to provide more compute power to the head job, simply allocate more CPUs and Nextflow will use them. In the [Seqera Platform](https://seqera.io/platform/), you can use **Head Job CPUs** or **Head Job submit options** (depending on the compute environment) to allocate more CPUs.\n\n### Memory\n\nNextflow runs on the Java Virtual Machine (JVM), so it allocates memory based on the standard JVM options, specifically the initial and maximum heap size. 
You can view the default JVM options for your environment by running this command:\n\n```bash\njava -XX:+PrintFlagsFinal -version | grep 'HeapSize\\|RAM'\n```\n\nFor example, here are the JVM options for an environment with 8 GB of RAM and OpenJDK Temurin 17.0.6:\n\n```\n size_t ErgoHeapSizeLimit = 0\n size_t HeapSizePerGCThread = 43620760\n size_t InitialHeapSize = 127926272\n uintx InitialRAMFraction = 64\n double InitialRAMPercentage = 1.562500\n size_t LargePageHeapSizeThreshold = 134217728\n size_t MaxHeapSize = 2044723200\n uint64_t MaxRAM = 137438953472\n uintx MaxRAMFraction = 4\n double MaxRAMPercentage = 25.000000\n size_t MinHeapSize = 8388608\n uintx MinRAMFraction = 2\n double MinRAMPercentage = 50.000000\n uintx NonNMethodCodeHeapSize = 5839372\n uintx NonProfiledCodeHeapSize = 122909434\n uintx ProfiledCodeHeapSize = 122909434\n size_t SoftMaxHeapSize = 2044723200\n```\n\nThese settings (displayed in bytes) show an initial and maximum heap size of ~128MB and ~2GB, or 1/64 (1.5625%) and 1/4 (25%) of physical memory. These percentages are the typical default settings, although different environments may have different defaults. In the Seqera Platform, the default settings are 40% and 75%, respectively.\n\nYou can set these options for Nextflow at runtime, for example:\n\n```bash\n# absolute values\nexport NXF_JVM_ARGS=\"-Xms2g -Xmx6g\"\n\n# percentages\nexport NXF_JVM_ARGS=\"-XX:InitialRAMPercentage=25 -XX:MaxRAMPercentage=75\"\n```\n\nIf you need to provide more memory to Nextflow, you can (1) allocate more memory to the head job and/or (2) use `NXF_JVM_ARGS` to increase the percentage of available memory that Nextflow can use. In the Seqera Platform, you can use **Head Job memory** or **Head Job submit options** (depending on the compute environment) to allocate more memory.\n\n### Disk\n\nThe Nextflow head job is generally responsible for downloading software dependencies and transferring inputs and outputs, but the details vary depending on the environment:\n\n- In an HPC environment, the home directory is typically used to store pipeline code and container images, while the work directory is typically stored in high-performance shared storage. Within the work directory, task inputs are staged from previous tasks via symlinks. Remote inputs (e.g. from HTTP or S3) are first staged into the work directory and then symlinked into the task directory.\n- In a cloud environment like AWS Batch, each task is responsible for pulling its own container image, downloading input files from the work directory (e.g. in S3), and uploading outputs. The head job’s local storage is only used to download the pipeline code.\n\nOverall, the head job uses very little local storage, since most data is saved to shared storage (HPC) or object storage (cloud) rather than the head job itself. However, there are a few specific cases to keep in mind, which we will cover in the following section.\n\n## Common failure modes\n\n### Not enough CPUs for local tasks\n\nIf your workflow has any tasks that use the local executor, make sure the Nextflow head job has enough CPUs to execute these tasks. For example, if a local task requires 4 CPUs, the Nextflow head job should have at least 5 CPUs (the local executor reserves 1 CPU for Nextflow by default).\n\n### Not enough memory for native pipeline code\n\nNextflow pipelines are a combination of native Groovy code (channels, operators, `exec` processes) and embedded shell scripts (`script` processes). 
Native code is executed directly by the Nextflow head job, while tasks with shell scripts are delegated to executors. Typically, tasks are used to perform the “actual” computations, while channels and operators are used to pass data between tasks.\n\nHowever much Groovy code you write, keep in mind that the Nextflow head job needs to have enough memory to execute it at the desired scale. The simplest way to determine how much memory Nextflow needs is to iteratively allocate more memory to the head job until it succeeds (e.g. start with 1 GB, then 2 GB, then 4 GB, and so on). In general, 2-4 GB is more than enough memory for the Nextflow head job.\n\n### Not enough memory to stage and publish files\n\nIn Nextflow, input files can come from a variety of sources: local files, an HTTP or FTP server, an S3 bucket, etc. When an input file is not local, Nextflow automatically stages the file into the work directory. Similarly, when a `publishDir` directive points to a remote path, Nextflow automatically “publishes” the output files using the correct protocol. These transfers are usually performed in-memory.\n\nMany users have encountered head job errors when running large-scale workloads, where the head job runs out of memory while staging or publishing files. While you can try to give more and more memory to Nextflow as in the previous example, you might be able to fix your problem by simply updating your Nextflow version. There have been many improvements to Nextflow over the past few years around file staging, particularly with S3, and overall we have seen fewer out-of-memory errors of this kind.\n\n### Not enough disk storage to build Singularity images\n\nSingularity / Apptainer can download and convert Docker images on the fly, and it uses the head job’s local scratch storage to do so. This is a common pattern in HPC environments, since container images are usually published as Docker images but HPC environments usually require the use of a rootless container runtime like Singularity. In this case, make sure the head job has enough scratch storage to build each image, even if the image is eventually saved to shared storage.\n\nSince Nextflow version [23.10.0](https://github.com/nextflow-io/nextflow/releases/tag/v23.10.0), you can use [Wave](https://seqera.io/wave/) to build Singularity images for you. Refer to the [Nextflow documentation](https://nextflow.io/docs/latest/wave.html#build-singularity-native-images) for more details.\n\nAdditionally, Nextflow version [23.11.0-edge](https://github.com/nextflow-io/nextflow/releases/tag/v23.11.0-edge) introduced support for [Singularity OCI mode](https://docs.sylabs.io/guides/3.1/user-guide/oci_runtime.html), which allows Singularity / Apptainer to use the OCI container format (the same as Docker) instead of having to build and store a SIF container image locally.\n\n### Failures due to head job and tasks sharing local storage\n\nThere are some situations where the head job and tasks may run on the same node and thereby share the node’s local storage, for example, Kubernetes. If this storage becomes full, any one of the jobs might fail first, including the head job. You can avoid this problem by segregating the head job to its own node, or explicitly requesting disk storage for each task so that they each have sufficient storage.\n\n## Virtual threads\n\n[Virtual threads](https://www.infoq.com/articles/java-virtual-threads/) were introduced in Java 19 and finalized in Java 21. 
Whereas threads in Java are normally “platform” threads managed by the operating system, “virtual” threads are user-space threads that share a pool of platform threads. Virtual threads use less memory and can be context-switched faster than platform threads, so an application that uses a fixed-size pool of platform threads (e.g. one thread per CPU) could instead have thousands of virtual threads (one thread per “task”) with the same memory footprint and more flexibility – if a virtual thread is blocked (i.e. waiting on I/O), the underlying platform thread can be switched to another virtual thread that isn’t blocked.\n\nSince Nextflow [23.05.0-edge](https://github.com/nextflow-io/nextflow/releases/tag/v23.05.0-edge), you can enable virtual threads by using Java 19 or later and setting the `NXF_ENABLE_VIRTUAL_THREADS` environment variable to `true`. Since version [23.10.0](https://github.com/nextflow-io/nextflow/releases/tag/v23.10.0), when using Java 21, virtual threads are enabled by default.\n\n### Initial Benchmark: S3 Upload\n\nVirtual threads are particularly useful when there are many I/O-bound tasks, such as uploading many files to S3. So to demonstrate this benefit, we wrote a pipeline… that uploads many files to S3! Here is the core pipeline code:\n\n```groovy\nparams.upload_count = 1000\nparams.upload_size = '10M'\n\nprocess make_random_file {\n publishDir 's3://my-bucket/data/'\n\n input:\n val index\n val size\n\n output:\n path '*.data'\n\n script:\n \"\"\"\n dd \\\n if=/dev/random \\\n of=upload-${size}-${index}.data \\\n bs=1 count=0 seek=${size}\n \"\"\"\n}\n\nworkflow {\n index = Channel.of(1..params.upload_count)\n make_random_file(index, params.upload_size)\n}\n```\n\nThe full source code is available on [GitHub](https://github.com/bentsherman/nf-head-job-benchmark).\n\nWe ran this pipeline across a variety of file sizes and counts, and the results are shown below. Error bars denote +/- 1 standard deviation across three independent trials.\n\nAt larger scales, virtual threads significantly reduce the total runtime, at the cost of higher CPU and memory usage. Considering that the head job resources are typically underutilized anyway, we think the lower time-to-solution is a decent trade!\n\nThe reason why virtual threads are faster in this case is that Nextflow usually spends extra time waiting for files to be published after all tasks have completed. Normally, these publishing tasks are executed by a fixed-size thread pool based on the number of CPUs, but with virtual threads there is no such limit, so Nextflow can fully utilize the available network bandwidth. In the largest case (1000x 100 MB files), virtual threads reduce the runtime by over 30%.\n\n
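Opting in is a one-line change on the launch environment. A minimal sketch, assuming Java 19 or later is available to the head job:\n\n```bash\n# enable virtual threads for the Nextflow head job\n# (on by default with Java 21 and Nextflow 23.10.0 or later)\nexport NXF_ENABLE_VIRTUAL_THREADS=true\nnextflow run <pipeline>\n```\n\n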
\n \"CPU\n
Figure 1: CPU usage
\n
\n\n
\n \"Memory\n
Figure 2: Memory usage
\n
\n\n
\n \"Workflow\n
Figure 3: Workflow runtime
\n
\n\n### Realistic Benchmark: nf-core/rnaseq\n\nTo evaluate virtual threads on a real pipeline, we also ran [nf-core/rnaseq](https://github.com/nf-core/rnaseq) with the `test` profile. To simulate a run with many samples, we upsampled the test dataset to 1000 samples. The results are summarized below:\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
|  | Walltime | Memory |\n| --- | --- | --- |\n| Platform threads | 2h 51m | 1.5 GB |\n| Virtual threads | 2h 47m | 1.9 GB |
\n\nAs you can see, the benefit here is not so clear. Whereas the upload benchmark was almost entirely I/O, a typical Nextflow pipeline spends most of its time scheduling compute tasks and waiting for them to finish. These tasks are generally not I/O bound and do not block for very long, so there may be little opportunity for improvement from virtual threads.\n\nThat being said, this benchmark consisted of only two runs of nf-core/rnaseq. We didn’t perform more runs here because they were so large, so your results may vary. In particular, if your Nextflow runs spend a lot of time publishing outputs after all the compute tasks have completed, you will likely benefit the most from using virtual threads. In any case, virtual threads should perform at least as well as platform threads, albeit with higher memory usage in some cases.\n\n## Summary\n\nThe key to right-sizing the Nextflow head job is to understand which parts of a Nextflow pipeline are executed directly by Nextflow, and which parts are delegated to compute tasks. This knowledge will help prevent head job failures at scale.\n\nHere are the main takeaways:\n\n- Nextflow uses a thread pool based on the number of available CPUs.\n- Nextflow uses a maximum heap size based on the standard JVM options, which is typically 25% of physical memory (75% in the Seqera Platform).\n- You can use `NXF_JVM_ARGS` to make more system memory available to Nextflow.\n- The easiest way to figure out how much memory Nextflow needs is to iteratively double the memory allocation until the workflow succeeds (but usually 2-4 GB is enough).\n- You can enable virtual threads in Nextflow, which may reduce overall runtime for some pipelines.", "images": [ "/img/blog-2024-01-17--s3-upload-cpu.png", "/img/blog-2024-01-17--s3-upload-memory.png", @@ -814,7 +814,7 @@ "slug": "2024/reflecting-ambassador-collaboration", "title": "Reflecting on a Six-Month Collaboration: Insights from a Nextflow Ambassador", "date": "2024-06-19T00:00:00.000Z", - "content": "\nAs a Nextflow Ambassador and a PhD student working in bioinformatics, I’ve always believed in the power of collaboration. Over the past six months, I’ve had the privilege of working with another PhD student specializing in metagenomics environmental science. This collaboration began through a simple email after the other researcher discovered my contact information on the ambassadors’ list page. It has been a journey of learning, problem-solving, and mutual growth. I’d like to share some reflections on this experience, highlighting both the challenges and the rewards.\n\n\n\n## Connecting Across Disciplines\n\nOur partnership began with a simple question about running one of nf-core’s metagenomics analysis pipelines. Despite being in different parts of Europe and coming from different academic backgrounds, we quickly found common ground. The combination of our expertise – my focus on bioinformatics workflows and their deep knowledge of microbial ecosystems – created a synergy that enriched our work.\n\n## Navigating Challenges Together\n\nLike any collaboration, ours was not without its difficulties. We faced numerous technical challenges, from optimizing computational resources to troubleshooting pipeline errors. There were moments of frustration when things didn’t work as expected. However, each challenge was an opportunity to learn and grow. Working through these challenges together made them much more manageable and even enjoyable at times. 
We focused on mastering Nextflow in a high-performance computing (HPC) environment, managing large datasets, and conducting comprehensive data analysis. Additionally, we explored effective data visualization techniques to better interpret and present the findings.\nWe leaned heavily on the Nextflow and nf-core community for support. The extensive documentation and guides were invaluable, and the different Slack channels provided real-time problem-solving assistance. Having the possibility of contacting the main developers of the pipeline that was troubling was a great resource that we are fortunate to have. The community’s willingness to share and offer help was a constant source of encouragement, making us feel supported every step of the way.\n\n## Learning and Growing\n\nOver the past six months, we’ve both learned a tremendous amount. The other PhD student became more adept at using and understanding Nextflow, particularly when running the nf-core/ampliseq pipeline, managing files, and handling high-performance computing (HPC) environments. I, on the other hand, gained a deeper understanding of environmental microbiomes and the specific needs of metagenomics research.\nOur sessions were highly collaborative, allowing us to share knowledge and insights freely. It was reassuring to know that we weren’t alone in our journey and that there was a whole community of researchers ready to share their wisdom and experiences. These interactions made our learning process more rewarding.\n\n## Achieving Synergy\n\nOne of the most rewarding aspects of this collaboration has been the synergy between our different backgrounds. Our combined expertise enabled us to efficiently analyze a high volume of metagenomics samples. The journey does not stop here, of course. Now that they have their samples processed, it comes the time to interpret the data, one of my favorite parts. Our work together highlighted the potential for Nextflow and the nf-core community to facilitate research across diverse fields. The collaboration has been a testament to the idea that when individuals from different disciplines come together, they can achieve more than they could alone.\nThis collaboration is poised to result in significant academic contributions. The other PhD student is preparing to publish a paper with the findings enabled by the use of the nf-core/ampliseq pipeline, which will be a key component of their thesis. This paper is going to serve as an excellent example of using Nextflow and nf-core pipelines in the field of metagenomics environmental science.\n\n## Reflecting on the Journey\n\nAs I reflect on these six months, I’m struck by the power of this community in fostering such collaborations. The support network, comprehensive resources, and culture of knowledge sharing have been essential in our success. This experience has reinforced my belief in the importance of open-source bioinformatics and data science communities for professional development and scientific advancement. Through it all, having a collaborator who understood the struggles and celebrated the successes with me made the journey all the more rewarding.\nMoving forward, I’m excited about the potential for more such collaborations. The past six months have been a journey of discovery and growth, and I’m grateful for the opportunity to work with such a dedicated and talented researcher. 
Our work is far from over, and I look forward to continuing this journey, learning more, and contributing to the field of environmental science.\n\n## Join the Journey!\n\nFor those of you in the Nextflow community or considering joining, I encourage you to take advantage of the resources available. Engage with the community, attend webinars, and don’t hesitate to ask questions. Whether you’re a seasoned expert or a curious newcomer, the Nextflow family is here to support you. Together, we can achieve great things.\n", + "content": "As a Nextflow Ambassador and a PhD student working in bioinformatics, I’ve always believed in the power of collaboration. Over the past six months, I’ve had the privilege of working with another PhD student specializing in metagenomics environmental science. This collaboration began through a simple email after the other researcher discovered my contact information on the ambassadors’ list page. It has been a journey of learning, problem-solving, and mutual growth. I’d like to share some reflections on this experience, highlighting both the challenges and the rewards.\n\n\n\n## Connecting Across Disciplines\n\nOur partnership began with a simple question about running one of nf-core’s metagenomics analysis pipelines. Despite being in different parts of Europe and coming from different academic backgrounds, we quickly found common ground. The combination of our expertise – my focus on bioinformatics workflows and their deep knowledge of microbial ecosystems – created a synergy that enriched our work.\n\n## Navigating Challenges Together\n\nLike any collaboration, ours was not without its difficulties. We faced numerous technical challenges, from optimizing computational resources to troubleshooting pipeline errors. There were moments of frustration when things didn’t work as expected. However, each challenge was an opportunity to learn and grow. Working through these challenges together made them much more manageable and even enjoyable at times. We focused on mastering Nextflow in a high-performance computing (HPC) environment, managing large datasets, and conducting comprehensive data analysis. Additionally, we explored effective data visualization techniques to better interpret and present the findings.\nWe leaned heavily on the Nextflow and nf-core community for support. The extensive documentation and guides were invaluable, and the different Slack channels provided real-time problem-solving assistance. Having the possibility of contacting the main developers of the pipeline that was troubling was a great resource that we are fortunate to have. The community’s willingness to share and offer help was a constant source of encouragement, making us feel supported every step of the way.\n\n## Learning and Growing\n\nOver the past six months, we’ve both learned a tremendous amount. The other PhD student became more adept at using and understanding Nextflow, particularly when running the nf-core/ampliseq pipeline, managing files, and handling high-performance computing (HPC) environments. I, on the other hand, gained a deeper understanding of environmental microbiomes and the specific needs of metagenomics research.\nOur sessions were highly collaborative, allowing us to share knowledge and insights freely. It was reassuring to know that we weren’t alone in our journey and that there was a whole community of researchers ready to share their wisdom and experiences. 
These interactions made our learning process more rewarding.\n\n## Achieving Synergy\n\nOne of the most rewarding aspects of this collaboration has been the synergy between our different backgrounds. Our combined expertise enabled us to efficiently analyze a high volume of metagenomics samples. The journey does not stop here, of course. Now that they have their samples processed, it comes the time to interpret the data, one of my favorite parts. Our work together highlighted the potential for Nextflow and the nf-core community to facilitate research across diverse fields. The collaboration has been a testament to the idea that when individuals from different disciplines come together, they can achieve more than they could alone.\nThis collaboration is poised to result in significant academic contributions. The other PhD student is preparing to publish a paper with the findings enabled by the use of the nf-core/ampliseq pipeline, which will be a key component of their thesis. This paper is going to serve as an excellent example of using Nextflow and nf-core pipelines in the field of metagenomics environmental science.\n\n## Reflecting on the Journey\n\nAs I reflect on these six months, I’m struck by the power of this community in fostering such collaborations. The support network, comprehensive resources, and culture of knowledge sharing have been essential in our success. This experience has reinforced my belief in the importance of open-source bioinformatics and data science communities for professional development and scientific advancement. Through it all, having a collaborator who understood the struggles and celebrated the successes with me made the journey all the more rewarding.\nMoving forward, I’m excited about the potential for more such collaborations. The past six months have been a journey of discovery and growth, and I’m grateful for the opportunity to work with such a dedicated and talented researcher. Our work is far from over, and I look forward to continuing this journey, learning more, and contributing to the field of environmental science.\n\n## Join the Journey!\n\nFor those of you in the Nextflow community or considering joining, I encourage you to take advantage of the resources available. Engage with the community, attend webinars, and don’t hesitate to ask questions. Whether you’re a seasoned expert or a curious newcomer, the Nextflow family is here to support you. Together, we can achieve great things.", "images": [], "author": "Cristina Tuñi i Domínguez", "tags": "nextflow,ambassador_post" @@ -823,7 +823,7 @@ "slug": "2024/reflections-on-nextflow-mentorship", "title": "One-Year Reflections on Nextflow Mentorship", "date": "2024-04-10T00:00:00.000Z", - "content": "\nFrom December 2022 to March 2023, I was part of the second cohort of the Nextflow and nf-core mentorship program, which spanned four months and attracted participants globally. I could not have anticipated the extent to which my participation in this program and the associated learning experiences would positively change my professional growth.\nThe mentorship aims to foster collaboration, knowledge exchange, flexible learning, collaborative coding, and contributions to the nf-core community. 
It was funded by the Chan Zuckerberg Initiative and is guided by experienced mentors in the community.\nIn the upcoming paragraphs, I'll be sharing more details about the program—its structure, the valuable learning experiences it brought, and the exciting opportunities it opened up for me.\n\n\n\n# Meeting my mentor\n\nOne of the most interesting aspects of the mentorship is that the program emphasizes that mentor-mentee pairs share research interests. In addition, the mentor should have significant experience in the areas where the mentee wants to develop. I found this extremely valuable, as it makes the program very flexible while also considering individual goals and interests. My goal as a mentee was to transition from a **Nextflow user to a Nextflow developer**.\n\nI was lucky enough to have Matthias De Smet as a mentor. He is a member of the Center for Medical Genetics in Ghent and has extensive experience working with open-source projects such as nf-core and Bioconda. His experience working in clinical genomics was a common ground for us to communicate, share experiences and build effective collaboration.\n\nDuring my first days, he guided me to the most useful Nextflow resources available online, tailored to my goals. Then, I drafted a pipeline that I wanted to build and attempted to write my first lines of code in Nextflow. We communicated via Slack and Matthias reviewed and corrected my code via GitHub. He introduced me to the supportive nf-core community, to ask for help when needed, and to acknowledge every success along the way.\n\n
\n \"Mentor\n
\n\n# Highlights of the program\n\nWe decided to start small, setting step-by-step goals. Matthias suggested that a doable goal would be to create my first Nextflow module in the context of a broader pipeline I wanted to develop. A module is a building block that encapsulates a specific functionality or task within a workflow. We realized that the tool I wanted to modularize was not available as part of nf-core. The nf-core GitHub has a community-driven collection of Nextflow modules, subworkflows and pipelines for bioinformatics, providing standardized and well-documented modules. The goal, therefore, was to create a module for this missing tool and then submit it as a contribution to nf-core.\n\nFor those unfamiliar, contributing to nf-core requires another member of the community, usually a maintainer, to review your code. As a newcomer, I was obviously curious about how the process would be. In academia, where anonymity often prevails, feedback can occasionally be a bit stringent. Conversely, during my submission to the nf-core project, I was pleasantly surprised that reviewers look for collective improvement, providing quick, constructive and amicable reviews, leading to a positive environment.\n\n
\n \"Review\n
\n\nFor my final project in the mentorship program, I successfully ported a complete pipeline from Bash to Nextflow. This was a learning experience that allowed me to explore a diverse range of skills, such as modularizing content, understanding how crucial the meta map is, and creating Docker container images for software. This process not only enhanced my proficiency in Nextflow but also allowed me to interact with and contribute to related projects like Bioconda and BioContainers.\n\n# Life after the mentorship\n\nWith the skills I acquired during the mentorship as a mentee, I proposed and successfully implemented a custom solution in Nextflow for a precision medicine start-up I worked at the time that could sequentially do several diagnostics and consumer-genetics applications in the cloud, resulting in substantial cost savings and increasing flexibility for the company.\nBeyond my immediate projects, I joined a group actively developing an open-source Nextflow pipeline for genetic imputation. This project allowed me to be in close contact with members of the nf-core community working on similar projects, adding new tools to this pipeline, giving and receiving feedback, and continuing to improve my overall Nextflow skills while also contributing to the broader bioinformatics community. You can learn more about this project with the fantastic talk by Louis Le Nézet at Nextflow Summit 2023 [here](https://www.youtube.com/watch?v=GHb2Wt9VCOg).\n\nFinally, I was honored to become a Nextflow ambassador. The program’s goal is to extend the awareness of Nextflow around the world while also building a supportive community. In particular, the South American community is underrepresented, so I serve as a point of contact for any institution or newcomer who wants to implement pipelines with Nextflow.\nAs part of this program, I was invited to speak at the second Chilean Congress of Bioinformatics, where I gave a talk about how Nextflow and nf-core can support scaling bioinformatics projects in the cloud. It was incredibly rewarding to introduce Nextflow to a community for the first time and witness the genuine enthusiasm it sparks among students and attendees for the potential in their research projects.\n\n
\n \"Second\n
\n\n# What’s next?\n\nThe comprehensive skill set acquired in my journey proved to be incredibly valuable for my professional development and allowed me to join the ZS Discovery Team as a Senior Bioinformatician. This organization accelerates transformation in research and early development with direct contribution to impactful bioinformatics projects with a globally distributed, multidisciplinary talented team.\n\nIn addition, we organized a local site for the nf-core hackathon in March 2024, the first Nextflow Hackathon in Argentina, fostering a space to advance our skills in workflow management collectively. It was a pleasure to see how beginners got their first PRs approved and how they interacted with the nf-core community for the first time.\n\n
\n \"nf-core\n
\n\nMy current (and probably future!) day-to-day work involves working and developing pipelines with Nextflow, while also mentoring younger bioinformaticians into this language. The commitment to open-source projects remains a cornerstone of my journey and I am thankful that it has provided me the opportunity to collaborate with individuals from diverse backgrounds all over the world.\n\nWhether you're interested in the mentorship program, curious about the hackathon, or simply wish to connect, feel free to reach out at the nf-core Slack!\n", + "content": "From December 2022 to March 2023, I was part of the second cohort of the Nextflow and nf-core mentorship program, which spanned four months and attracted participants globally. I could not have anticipated the extent to which my participation in this program and the associated learning experiences would positively change my professional growth.\nThe mentorship aims to foster collaboration, knowledge exchange, flexible learning, collaborative coding, and contributions to the nf-core community. It was funded by the Chan Zuckerberg Initiative and is guided by experienced mentors in the community.\nIn the upcoming paragraphs, I'll be sharing more details about the program—its structure, the valuable learning experiences it brought, and the exciting opportunities it opened up for me.\n\n\n\n# Meeting my mentor\n\nOne of the most interesting aspects of the mentorship is that the program emphasizes that mentor-mentee pairs share research interests. In addition, the mentor should have significant experience in the areas where the mentee wants to develop. I found this extremely valuable, as it makes the program very flexible while also considering individual goals and interests. My goal as a mentee was to transition from a **Nextflow user to a Nextflow developer**.\n\nI was lucky enough to have Matthias De Smet as a mentor. He is a member of the Center for Medical Genetics in Ghent and has extensive experience working with open-source projects such as nf-core and Bioconda. His experience working in clinical genomics was a common ground for us to communicate, share experiences and build effective collaboration.\n\nDuring my first days, he guided me to the most useful Nextflow resources available online, tailored to my goals. Then, I drafted a pipeline that I wanted to build and attempted to write my first lines of code in Nextflow. We communicated via Slack and Matthias reviewed and corrected my code via GitHub. He introduced me to the supportive nf-core community, to ask for help when needed, and to acknowledge every success along the way.\n\n
\n \"Mentor\n
\n\n# Highlights of the program\n\nWe decided to start small, setting step-by-step goals. Matthias suggested that a doable goal would be to create my first Nextflow module in the context of a broader pipeline I wanted to develop. A module is a building block that encapsulates a specific functionality or task within a workflow. We realized that the tool I wanted to modularize was not available as part of nf-core. The nf-core GitHub has a community-driven collection of Nextflow modules, subworkflows and pipelines for bioinformatics, providing standardized and well-documented modules. The goal, therefore, was to create a module for this missing tool and then submit it as a contribution to nf-core.\n\nFor those unfamiliar, contributing to nf-core requires another member of the community, usually a maintainer, to review your code. As a newcomer, I was obviously curious about how the process would be. In academia, where anonymity often prevails, feedback can occasionally be a bit stringent. Conversely, during my submission to the nf-core project, I was pleasantly surprised that reviewers look for collective improvement, providing quick, constructive and amicable reviews, leading to a positive environment.\n\n
\n \"Review\n
\n\nFor my final project in the mentorship program, I successfully ported a complete pipeline from Bash to Nextflow. This was a learning experience that allowed me to explore a diverse range of skills, such as modularizing content, understanding how crucial the meta map is, and creating Docker container images for software. This process not only enhanced my proficiency in Nextflow but also allowed me to interact with and contribute to related projects like Bioconda and BioContainers.\n\n# Life after the mentorship\n\nWith the skills I acquired during the mentorship as a mentee, I proposed and successfully implemented a custom solution in Nextflow for a precision medicine start-up I worked at the time that could sequentially do several diagnostics and consumer-genetics applications in the cloud, resulting in substantial cost savings and increasing flexibility for the company.\nBeyond my immediate projects, I joined a group actively developing an open-source Nextflow pipeline for genetic imputation. This project allowed me to be in close contact with members of the nf-core community working on similar projects, adding new tools to this pipeline, giving and receiving feedback, and continuing to improve my overall Nextflow skills while also contributing to the broader bioinformatics community. You can learn more about this project with the fantastic talk by Louis Le Nézet at Nextflow Summit 2023 [here](https://www.youtube.com/watch?v=GHb2Wt9VCOg).\n\nFinally, I was honored to become a Nextflow ambassador. The program’s goal is to extend the awareness of Nextflow around the world while also building a supportive community. In particular, the South American community is underrepresented, so I serve as a point of contact for any institution or newcomer who wants to implement pipelines with Nextflow.\nAs part of this program, I was invited to speak at the second Chilean Congress of Bioinformatics, where I gave a talk about how Nextflow and nf-core can support scaling bioinformatics projects in the cloud. It was incredibly rewarding to introduce Nextflow to a community for the first time and witness the genuine enthusiasm it sparks among students and attendees for the potential in their research projects.\n\n
\n \"Second\n
\n\n# What’s next?\n\nThe comprehensive skill set acquired in my journey proved to be incredibly valuable for my professional development and allowed me to join the ZS Discovery Team as a Senior Bioinformatician. This organization accelerates transformation in research and early development with direct contribution to impactful bioinformatics projects with a globally distributed, multidisciplinary talented team.\n\nIn addition, we organized a local site for the nf-core hackathon in March 2024, the first Nextflow Hackathon in Argentina, fostering a space to advance our skills in workflow management collectively. It was a pleasure to see how beginners got their first PRs approved and how they interacted with the nf-core community for the first time.\n\n
\n \"nf-core\n
\n\nMy current (and probably future!) day-to-day work involves working and developing pipelines with Nextflow, while also mentoring younger bioinformaticians into this language. The commitment to open-source projects remains a cornerstone of my journey and I am thankful that it has provided me the opportunity to collaborate with individuals from diverse backgrounds all over the world.\n\nWhether you're interested in the mentorship program, curious about the hackathon, or simply wish to connect, feel free to reach out at the nf-core Slack!", "images": [ "/img/blog-2024-04-10-img1a.png", "/img/blog-2024-04-10-img1b.png", @@ -837,7 +837,7 @@ "slug": "2024/training-local-site", "title": "Nextflow Training: Bridging Online Learning with In-Person Connections", "date": "2024-05-08T00:00:00.000Z", - "content": "\nNextflow and nf-core provide frequent community training events to new users, which offer an opportunity to get started using and understanding Nextflow, Groovy and nf-core. These events are live-streamed and are available for on-demand viewing on YouTube, but what if you could join friends in person and watch it live?\n\n\n\nLearning something new by yourself can be a daunting task. Having colleagues and friends go through the learning and discovering process alongside you can really enrich the experience and be a lot of fun! With that in mind, we decided to host a get-together for the fundamentals training streams in person. Anybody from the scientific community in and around Heidelberg who wanted to learn Nextflow was welcome to join.\n\nThis year, [Marcel Ribeiro-Dantas](https://twitter.com/mribeirodantas) and [Chris Hakkaart](https://twitter.com/Chris_Hakk) from Seqera held the training over two days, offering the first steps into the Nextflow universe (you can watch it [here](https://www.youtube.com/playlist?list=PL3xpfTVZLcNgLBGLAiY6Rl9fizsz-DTCT)). [Kübra Narcı](https://twitter.com/kubranarci) and [Florian Wünneman](https://twitter.com/flowuenne) hosted a local training site for the recent community fundamentals training in Heidelberg. Kübra is a Nextflow ambassador, working as a bioinformatician and using Nextflow to develop pipelines for the German Human Genome Phenome Archive (GHGA) project in her daily life. At the time, Florian was a Postdoc at the Institute of Computational Biomedicine with Denis Schapiro in Heidelberg, though he has since then joined Seqera as a Bioinformatics Engineer.\n\nWe advertised the event about a month beforehand in our local communities (genomics, transcriptomics, spatial omics among others) to give people enough time to decide whether they want to join. We had quite a bit of interest and a total of 15 people participated. The event took place at the Marsilius Arkaden at the University Clinic campus in Heidelberg. Participants brought their laptops and followed along with the stream, which we projected for everyone, so people could use their laptops exclusively for coding and did not have to switch between stream and coding environment.\n\n
\n \"meme\n
\n\n
\n \"meme\n
\n\nThe goal of this local training site was for everyone to follow the fundamentals training sessions on their laptop and be able to ask follow-up questions in person to the room. We also had a few experienced Nextflow users be there for support. There is a dedicated nf-core Slack channel during the training events for people to ask questions, which is a great tool for help. We also found that in-person discussions around topics that remained confusing to participants were really helpful for many people, as they could provide some more context and allow quick follow-up questions. During the course of the fundamentals training, we found ourselves naturally pausing the video and taking the time to discuss with the group. It was particularly great to see new users explaining concepts they just learned to each other.\n\nThis local training site was also an excellent opportunity for new Nextflow users in Heidelberg to get to know each other and make new connections before the upcoming nf-core hackathon, for which there was also a [local site](https://nf-co.re/events/2024/hackathon-march-2024/germany-heidelberg) organized in Heidelberg. It was a great experience to organize a smaller local event to learn Nextflow with the local community. We learned some valuable lessons from this experience, that we will apply for the next local Nextflow gatherings. Advertising a bit earlier will give people more time to spread the word, we would likely aim for 2 months in advance next time. Offering coffee during breaks can go a long way to keep people awake and motivated, so we would try to serve up some hot coffee next time. Finally, having a bit more in-depth introductions (maybe via short posts on a forum) of everyone joining could be an even better ice breaker to foster contacts and collaborations for the future.\n\nThe ability to join training sessions, bytesize talks, and other events from nf-core and Nextflow online is absolutely fantastic and enables the free dissemination of knowledge. However, the opportunity to join a group in person and work through the content together can really enrich the experience and bring people closer together.\n\nIf you're looking for a training opportunity, there will be one in Basel, Switzerland, on June 25 and another one in Cambridge, UK, on September 12. These and other events will be displayed in the [Seqera Events](https://seqera.io/events/) page when it gets closer to the dates of the events.\n\nWho knows, maybe you will meet someone interested in the same topic, a new collaborator or even a new friend in your local Nextflow community!\n", + "content": "Nextflow and nf-core provide frequent community training events to new users, which offer an opportunity to get started using and understanding Nextflow, Groovy and nf-core. These events are live-streamed and are available for on-demand viewing on YouTube, but what if you could join friends in person and watch it live?\n\n\n\nLearning something new by yourself can be a daunting task. Having colleagues and friends go through the learning and discovering process alongside you can really enrich the experience and be a lot of fun! With that in mind, we decided to host a get-together for the fundamentals training streams in person. 
Anybody from the scientific community in and around Heidelberg who wanted to learn Nextflow was welcome to join.\n\nThis year, [Marcel Ribeiro-Dantas](https://twitter.com/mribeirodantas) and [Chris Hakkaart](https://twitter.com/Chris_Hakk) from Seqera held the training over two days, offering the first steps into the Nextflow universe (you can watch it [here](https://www.youtube.com/playlist?list=PL3xpfTVZLcNgLBGLAiY6Rl9fizsz-DTCT)). [Kübra Narcı](https://twitter.com/kubranarci) and [Florian Wünneman](https://twitter.com/flowuenne) hosted a local training site for the recent community fundamentals training in Heidelberg. Kübra is a Nextflow ambassador, working as a bioinformatician and using Nextflow to develop pipelines for the German Human Genome Phenome Archive (GHGA) project in her daily life. At the time, Florian was a Postdoc at the Institute of Computational Biomedicine with Denis Schapiro in Heidelberg, though he has since then joined Seqera as a Bioinformatics Engineer.\n\nWe advertised the event about a month beforehand in our local communities (genomics, transcriptomics, spatial omics among others) to give people enough time to decide whether they want to join. We had quite a bit of interest and a total of 15 people participated. The event took place at the Marsilius Arkaden at the University Clinic campus in Heidelberg. Participants brought their laptops and followed along with the stream, which we projected for everyone, so people could use their laptops exclusively for coding and did not have to switch between stream and coding environment.\n\n
\n \"meme\n
\n\n
\n \"meme\n
\n\nThe goal of this local training site was for everyone to follow the fundamentals training sessions on their laptop and be able to ask follow-up questions in person to the room. We also had a few experienced Nextflow users be there for support. There is a dedicated nf-core Slack channel during the training events for people to ask questions, which is a great tool for help. We also found that in-person discussions around topics that remained confusing to participants were really helpful for many people, as they could provide some more context and allow quick follow-up questions. During the course of the fundamentals training, we found ourselves naturally pausing the video and taking the time to discuss with the group. It was particularly great to see new users explaining concepts they just learned to each other.\n\nThis local training site was also an excellent opportunity for new Nextflow users in Heidelberg to get to know each other and make new connections before the upcoming nf-core hackathon, for which there was also a [local site](https://nf-co.re/events/2024/hackathon-march-2024/germany-heidelberg) organized in Heidelberg. It was a great experience to organize a smaller local event to learn Nextflow with the local community. We learned some valuable lessons from this experience, that we will apply for the next local Nextflow gatherings. Advertising a bit earlier will give people more time to spread the word, we would likely aim for 2 months in advance next time. Offering coffee during breaks can go a long way to keep people awake and motivated, so we would try to serve up some hot coffee next time. Finally, having a bit more in-depth introductions (maybe via short posts on a forum) of everyone joining could be an even better ice breaker to foster contacts and collaborations for the future.\n\nThe ability to join training sessions, bytesize talks, and other events from nf-core and Nextflow online is absolutely fantastic and enables the free dissemination of knowledge. However, the opportunity to join a group in person and work through the content together can really enrich the experience and bring people closer together.\n\nIf you're looking for a training opportunity, there will be one in Basel, Switzerland, on June 25 and another one in Cambridge, UK, on September 12. These and other events will be displayed in the [Seqera Events](https://seqera.io/events/) page when it gets closer to the dates of the events.\n\nWho knows, maybe you will meet someone interested in the same topic, a new collaborator or even a new friend in your local Nextflow community!", "images": [ "/img/blog-2024-05-06-training-img1a.jpg", "/img/blog-2024-05-06-training-img2a.jpg" @@ -849,7 +849,7 @@ "slug": "2024/welcome_ambassadors_20242", "title": "Join us in welcoming the new Nextflow Ambassadors", "date": "2024-07-10T00:00:00.000Z", - "content": "\nAs the second semester of 2024 kicks off, I am thrilled to welcome a new cohort of ambassadors to the Nextflow Ambassador Program. This vibrant group joins the dedicated ambassadors who are continuing their remarkable work from the previous semester. Together, they form a diverse and talented team, representing a variety of countries and backgrounds, encompassing both industry and academia.\n\n\n\n## A Diverse and Inclusive Cohort\n\nThis semester, I am proud to announce that our ambassadors hail from over 20 countries, reflecting the increasingly global reach and inclusive nature of the Nextflow community. 
There has historically been a strong presence of Nextflow in the US and Europe, so I would like to extend an especially warm welcome to all those in Asia and the global south who are joining us through the program, from countries such as Argentina, Chile, Brazil, Ghana, Tunisia, Nigeria, South Africa, India, Indonesia, Singapore, and Australia. From seasoned bioinformaticians to emerging data scientists, our ambassadors bring a wealth of expertise and unique perspectives to the program.\n\n## Industry and Academia Unite\n\nOne of the strengths of the Nextflow Ambassador Program is its ability to bridge the gap between industry and academia. This semester, we have an exciting mix of professionals from biotech companies, renowned research institutions, and leading universities. This synergy fosters a rich exchange of ideas, driving innovation and collaboration.\n\n## Spotlight on New Ambassadors\n\nI am particularly happy with this last call for ambassadors. Amazing people were selected, and I would like to highlight a few, though all of them are good additions to the team! For example, while Carson Miller, a PhD Candidate in the Department of Microbiology at the University of Washington, is new to the ambassador program, he has been making impactful contributions to the community for a long time. He hosted a local site for the nf-core Hackathon back in March, wrote a post to the Nextflow blog and has been very active in the nf-core community. The same can be said about Mahesh Binzer-Panchal, a Bioinformatician at NBIS, who has been very active in the community answering technical questions about Nextflow.\n\nThe previous round of ambassadors allowed us to achieve a broad global presence. However, some regions were more represented than others. I am especially thrilled to have new ambassadors in new regions of the globe, For example, Fadinda Shafira and Edwin Simjaya from Indonesia, AI Engineer and Head of AI at Kalbe, respectively. Prior to joining the program, they had already been strong advocates for Nextflow in Indonesia and had conducted Nextflow training sessions!\n\n## Continuing the Good Work\n\nI'm also delighted to see the continuing work of several dedicated ambassadors who have made significant contributions to the program. Abhinav Sharma, a Ph.D. Candidate at Stellenbosch University in South Africa, has been a key community contact in the African continent, and with the support we were able to provide him through the program, he was able to travel around Brazil and visit multiple research groups to advocate for Open Science, Nextflow, and nf-core. Similarly, Kübra Narcı, a bioinformatician at DKFZ in Germany, increased the awareness of [Nextflow in her home country, Türkiye](https://www.nextflow.io/blog/2024/bioinformatics-growth-in-turkiye.html), while also contributing to the [German research community](https://www.nextflow.io/blog/2024/training-local-site.html).\n\nThe program has been shown to welcome a variety of backgrounds and both new and long-time community members. Just last year, Anabella Trigila, a Senior Bioinformatician at ZS in Argentina, was a mentee in the Nextflow and nf-core mentorship program and has quickly become a [key member in Latin America](https://www.nextflow.io/blog/2024/reflections-on-nextflow-mentorship.html). 
Robert Petit, a Bioinformatician at the Wyoming Public Health Laboratory in the US, meanwhile, has been [a contributor for many years](https://www.nextflow.io/blog/2024/empowering-bioinformatics-mentoring.html) and keeps giving back to the community.\n\n## Where we are\n\n
\n \"Map\n
\n\n## Looking Ahead\n\nThe upcoming semester promises to be an exciting period of growth and innovation for the Nextflow Ambassador Program. Based on current plans, our ambassadors are set to make sure people worldwide know Nextflow and have all the support they need to use it to advance computational biology, among other fields. I look forward to seeing the incredible work that will emerge from this talented group.\n\nWelcome, new and continuing ambassadors, to another inspiring semester! Together, we will continue to help push the boundaries of what's possible with Nextflow.\n\nStay tuned for more updates and follow our ambassadors' journeys here on the Nextflow blog and on [Nextflow's Twitter/X account](https://x.com/nextflowio).\n\n
\n \n\n
\n

\n Ambassadors are passionate individuals who support\n the Nextflow community. Interested in becoming an ambassador? Read more about it\n here.\n

\n
\n
\n", + "content": "As the second semester of 2024 kicks off, I am thrilled to welcome a new cohort of ambassadors to the Nextflow Ambassador Program. This vibrant group joins the dedicated ambassadors who are continuing their remarkable work from the previous semester. Together, they form a diverse and talented team, representing a variety of countries and backgrounds, encompassing both industry and academia.\n\n\n\n## A Diverse and Inclusive Cohort\n\nThis semester, I am proud to announce that our ambassadors hail from over 20 countries, reflecting the increasingly global reach and inclusive nature of the Nextflow community. There has historically been a strong presence of Nextflow in the US and Europe, so I would like to extend an especially warm welcome to all those in Asia and the global south who are joining us through the program, from countries such as Argentina, Chile, Brazil, Ghana, Tunisia, Nigeria, South Africa, India, Indonesia, Singapore, and Australia. From seasoned bioinformaticians to emerging data scientists, our ambassadors bring a wealth of expertise and unique perspectives to the program.\n\n## Industry and Academia Unite\n\nOne of the strengths of the Nextflow Ambassador Program is its ability to bridge the gap between industry and academia. This semester, we have an exciting mix of professionals from biotech companies, renowned research institutions, and leading universities. This synergy fosters a rich exchange of ideas, driving innovation and collaboration.\n\n## Spotlight on New Ambassadors\n\nI am particularly happy with this last call for ambassadors. Amazing people were selected, and I would like to highlight a few, though all of them are good additions to the team! For example, while Carson Miller, a PhD Candidate in the Department of Microbiology at the University of Washington, is new to the ambassador program, he has been making impactful contributions to the community for a long time. He hosted a local site for the nf-core Hackathon back in March, wrote a post to the Nextflow blog and has been very active in the nf-core community. The same can be said about Mahesh Binzer-Panchal, a Bioinformatician at NBIS, who has been very active in the community answering technical questions about Nextflow.\n\nThe previous round of ambassadors allowed us to achieve a broad global presence. However, some regions were more represented than others. I am especially thrilled to have new ambassadors in new regions of the globe, For example, Fadinda Shafira and Edwin Simjaya from Indonesia, AI Engineer and Head of AI at Kalbe, respectively. Prior to joining the program, they had already been strong advocates for Nextflow in Indonesia and had conducted Nextflow training sessions!\n\n## Continuing the Good Work\n\nI'm also delighted to see the continuing work of several dedicated ambassadors who have made significant contributions to the program. Abhinav Sharma, a Ph.D. Candidate at Stellenbosch University in South Africa, has been a key community contact in the African continent, and with the support we were able to provide him through the program, he was able to travel around Brazil and visit multiple research groups to advocate for Open Science, Nextflow, and nf-core. 
Similarly, Kübra Narcı, a bioinformatician at DKFZ in Germany, increased the awareness of [Nextflow in her home country, Türkiye](https://www.nextflow.io/blog/2024/bioinformatics-growth-in-turkiye.html), while also contributing to the [German research community](https://www.nextflow.io/blog/2024/training-local-site.html).\n\nThe program has been shown to welcome a variety of backgrounds and both new and long-time community members. Just last year, Anabella Trigila, a Senior Bioinformatician at ZS in Argentina, was a mentee in the Nextflow and nf-core mentorship program and has quickly become a [key member in Latin America](https://www.nextflow.io/blog/2024/reflections-on-nextflow-mentorship.html). Robert Petit, a Bioinformatician at the Wyoming Public Health Laboratory in the US, meanwhile, has been [a contributor for many years](https://www.nextflow.io/blog/2024/empowering-bioinformatics-mentoring.html) and keeps giving back to the community.\n\n## Where we are\n\n
\n \"Map\n
\n\n## Looking Ahead\n\nThe upcoming semester promises to be an exciting period of growth and innovation for the Nextflow Ambassador Program. Based on current plans, our ambassadors are set to make sure people worldwide know Nextflow and have all the support they need to use it to advance computational biology, among other fields. I look forward to seeing the incredible work that will emerge from this talented group.\n\nWelcome, new and continuing ambassadors, to another inspiring semester! Together, we will continue to help push the boundaries of what's possible with Nextflow.\n\nStay tuned for more updates and follow our ambassadors' journeys here on the Nextflow blog and on [Nextflow's Twitter/X account](https://x.com/nextflowio).\n\n
\n \n\n \n\n> Ambassadors are passionate individuals who support\n> the Nextflow community. Interested in becoming an ambassador? Read more about it\n> [here](https://www.nextflow.io/ambassadors.html).\n\n
", "images": [ "/img/blog-2024-07-10-img1a.png", "/img/nextflow_ambassador_logo.svg" diff --git a/internal/export.mjs b/internal/export.mjs index 32585c29..11ecb3b7 100644 --- a/internal/export.mjs +++ b/internal/export.mjs @@ -18,6 +18,85 @@ function extractImagePaths(content, postPath) { return images; } +function sanitizeMarkdown(content) { + const $ = cheerio.load(`
<div id="root">${content}</div>
`); + + $('p').each((i, elem) => { + const $elem = $(elem); + $elem.replaceWith(`\n\n${$elem.html().trim()}\n\n`); + }); + + $('s, del, strike').each((i, elem) => { + const $elem = $(elem); + $elem.replaceWith(`~~${$elem.html()}~~`); + }); + + $('sup').each((i, elem) => { + const $elem = $(elem); + $elem.replaceWith(`^${$elem.html()}^`); + }); + + $('sub').each((i, elem) => { + const $elem = $(elem); + $elem.replaceWith(`~${$elem.html()}~`); + }); + + $('a').each((i, elem) => { + const $elem = $(elem); + const href = $elem.attr('href'); + const text = $elem.text(); + $elem.replaceWith(`[${text}](${href})`); + }); + + $('blockquote').each((i, elem) => { + const $elem = $(elem); + const text = $elem.html().trim().replace(/\n/g, '\n> '); + $elem.replaceWith(`\n\n> ${text}\n\n`); + }); + + $('em, i').each((i, elem) => { + const $elem = $(elem); + $elem.replaceWith(`*${$elem.html()}*`); + }); + + $('strong, b').each((i, elem) => { + const $elem = $(elem); + $elem.replaceWith(`**${$elem.html()}**`); + }); + + $('code').each((i, elem) => { + const $elem = $(elem); + if ($elem.parent().is('pre')) { + // This is a code block, leave it as is + return; + } + $elem.replaceWith(`\`${$elem.html()}\``); + }); + + $('hr').each((i, elem) => { + $(elem).replaceWith('\n\n---\n\n'); + }); + + $('ul, ol').each((i, elem) => { + const $elem = $(elem); + const listItems = $elem.children('li').map((i, li) => { + const prefix = $elem.is('ul') ? '- ' : `${i + 1}. `; + return prefix + $(li).html().trim(); + }).get().join('\n'); + $elem.replaceWith(`\n\n${listItems}\n\n`); + }); + + // Remove any remaining HTML tags + // $('*').each((i, elem) => { + // const $elem = $(elem); + // $elem.replaceWith($elem.html()); + // }); + + let markdown = $('#root').html().trim(); + markdown = markdown.replace(/\n{3,}/g, '\n\n'); + return markdown; +} + function getPostsRecursively(dir) { let posts = []; const items = fs.readdirSync(dir, { withFileTypes: true }); @@ -30,13 +109,14 @@ function getPostsRecursively(dir) { } else if (item.isFile() && item.name.endsWith('.md')) { const fileContents = fs.readFileSync(fullPath, 'utf8'); const { data, content } = matter(fileContents); - const images = extractImagePaths(content, fullPath); + const convertedContent = sanitizeMarkdown(content); + const images = extractImagePaths(convertedContent, fullPath); posts.push({ slug: path.relative(postsDirectory, fullPath).replace('.md', ''), title: data.title, date: data.date, - content: content, + content: convertedContent, images: images, author: data.author, tags: data.tags, diff --git a/internal/findPerson.mjs b/internal/findPerson.mjs index 72ceb728..767aa83b 100644 --- a/internal/findPerson.mjs +++ b/internal/findPerson.mjs @@ -9,13 +9,7 @@ export const client = sanityClient({ async function findPerson(name) { const person = await client.fetch(`*[_type == "person" && name == $name][0]`, { name }); - if (!person) { - console.log(`⭕ No person found with the name "${name}".`); - return; - } else { - console.log(`Person found`, person.name); - return person; - } + return person } export default findPerson; \ No newline at end of file diff --git a/internal/import.mjs b/internal/import.mjs index fddec92f..28af64c7 100644 --- a/internal/import.mjs +++ b/internal/import.mjs @@ -156,8 +156,12 @@ function tokenToPortableText(imageMap, token) { item.tokens.map(inlineTokenToPortableText.bind(null, imageMap)) ), }; + + case 'space': + return null; + default: - console.warn(`Unsupported token type: ${token.type}`, token); + console.warn(`Unsupported token 
type: ${token.type}`); return null; } } @@ -204,7 +208,7 @@ function inlineTokenToPortableText(imageMap, token) { marks: [], }; default: - console.warn(`Unsupported inline token type: ${token.type}`, token); + console.warn(`Unsupported inline token type: ${token.type}`); return { _type: 'span', text: token.raw, _key: nanoid() }; } } @@ -231,7 +235,7 @@ async function migratePosts() { console.log(''); - for (const post of selectedPosts) { + for (const post of firstTen) { const imageMap = {}; for (const imagePath of post.images) { @@ -246,7 +250,10 @@ async function migratePosts() { } const person = await findPerson(post.author); - if (!person) return false; + if (!person) { + console.log(`⭕ No person found with the name "${post.author}"; skipping import.`); + continue; + } const portableTextContent = markdownToPortableText(post.content, imageMap); @@ -254,7 +261,6 @@ async function migratePosts() { let dateStr = post.date.split('T')[0]; dateStr = `${dateStr} 8:00`; - console.log(dateStr); const sanityPost = { @@ -273,6 +279,8 @@ async function migratePosts() { console.error(`Failed to migrate post: ${post.title}`, error); } } + + return true; } migratePosts().then((isSuccess) => { From 51a7208e910e77337ca36e3207d56695a748bd8e Mon Sep 17 00:00:00 2001 From: Jake Broughton Date: Wed, 25 Sep 2024 15:18:57 +0200 Subject: [PATCH 14/21] More support...... --- internal/export.json | 119 +++++++++++++++++++++++++++++++++---------- internal/export.mjs | 21 ++++++-- internal/import.mjs | 84 +++++++++++++++++++++++++----- 3 files changed, 179 insertions(+), 45 deletions(-) diff --git a/internal/export.json b/internal/export.json index 75428c42..9b4c954f 100644 --- a/internal/export.json +++ b/internal/export.json @@ -51,7 +51,9 @@ "title": "MPI-like distributed execution with Nextflow", "date": "2015-11-13T00:00:00.000Z", "content": "The main goal of Nextflow is to make workflows portable across different\ncomputing platforms taking advantage of the parallelisation features provided\nby the underlying system without having to reimplement your application code.\n\nFrom the beginning Nextflow has included executors designed to target the most popular\nresource managers and batch schedulers commonly used in HPC data centers,\nsuch as [Univa Grid Engine](http://www.univa.com), [Platform LSF](http://www.ibm.com/systems/platformcomputing/products/lsf/),\n[SLURM](https://computing.llnl.gov/linux/slurm/), [PBS](http://www.pbsworks.com/Product.aspx?id=1) and [Torque](http://www.adaptivecomputing.com/products/open-source/torque/).\n\nWhen using one of these executors Nextflow submits the computational workflow tasks\nas independent job requests to the underlying platform scheduler, specifying\nfor each of them the computing resources needed to carry out its job.\n\nThis approach works well for workflows that are composed of long running tasks, which\nis the case of most common genomic pipelines.\n\nHowever this approach does not scale well for workloads made up of a large number of\nshort-lived tasks (e.g. a few seconds or sub-seconds). 
In this scenario the resource\nmanager scheduling time is much longer than the actual task execution time, thus resulting\nin an overall execution time that is much longer than the real execution time.\nIn some cases this represents an unacceptable waste of computing resources.\n\nMoreover supercomputers, such as [MareNostrum](https://www.bsc.es/marenostrum-support-services/mn3)\nin the [Barcelona Supercomputer Center (BSC)](https://www.bsc.es/), are optimized for\nmemory distributed applications. In this context it is needed to allocate a certain\namount of computing resources in advance to run the application in a distributed manner,\ncommonly using the [MPI](https://en.wikipedia.org/wiki/Message_Passing_Interface) standard.\n\nIn this scenario, the Nextflow execution model was far from optimal, if not unfeasible.\n\n### Distributed execution\n\nFor this reason, since the release 0.16.0, Nextflow has implemented a new distributed execution\nmodel that greatly improves the computation capability of the framework. It uses [Apache Ignite](https://ignite.apache.org/),\na lightweight clustering engine and in-memory data grid, which has been recently open sourced\nunder the Apache software foundation umbrella.\n\nWhen using this feature a Nextflow application is launched as if it were an MPI application.\nIt uses a job wrapper that submits a single request specifying all the needed computing\nresources. The Nextflow command line is executed by using the `mpirun` utility, as shown in the\nexample below:\n\n #!/bin/bash\n #$ -l virtual_free=120G\n #$ -q \n #$ -N \n #$ -pe ompi \n mpirun --pernode nextflow run -with-mpi [pipeline parameters]\n\nThis tool spawns a Nextflow instance in each of the computing nodes allocated by the\ncluster manager.\n\nEach Nextflow instance automatically connects with the other peers creating an _private_\ninternal cluster, thanks to the Apache Ignite clustering feature that\nis embedded within Nextflow itself.\n\nThe first node becomes the application driver that manages the execution of the\nworkflow application, submitting the tasks to the remaining nodes that act as workers.\n\nWhen the application is complete, the Nextflow driver automatically shuts down the\nNextflow/Ignite cluster and terminates the job execution.\n\n![Nextflow distributed execution](/img/nextflow-distributed-execution.png)\n\n### Conclusion\n\nIn this way it is possible to deploy a Nextflow workload in a supercomputer using an\nexecution strategy that resembles the MPI distributed execution model. This doesn't\nrequire to implement your application using the MPI api/library and it allows you to\nmaintain your code portable across different execution platforms.\n\nAlthough we do not currently have a performance comparison between a Nextflow distributed\nexecution and an equivalent MPI application, we assume that the latter provides better\nperformance due to its low-level optimisation.\n\nNextflow, however, focuses on the fast prototyping of scientific applications in a portable\nmanner while maintaining the ability to scale and distribute the application workload in an\nefficient manner in an HPC cluster.\n\nThis allows researchers to validate an experiment, quickly, reusing existing tools and\nsoftware components. 
This eventually makes it possible to implement an optimised version\nusing a low-level programming language in the second stage of a project.\n\nRead the documentation to learn more about the [Nextflow distributed execution model](https://www.nextflow.io/docs/latest/ignite.html#execution-with-mpi).\n", - "images": [], + "images": [ + "/img/nextflow-distributed-execution.png" + ], "author": "Paolo Di Tommaso", "tags": "mpi,hpc,pipelines,genomic" }, @@ -69,7 +71,9 @@ "title": "Workflows & publishing: best practice for reproducibility", "date": "2016-04-13T00:00:00.000Z", "content": "Publication time acts as a snapshot for scientific work. Whether a project is ongoing\nor not, work which was performed months ago must be described, new software documented,\ndata collated and figures generated.\n\nThe monumental increase in data and pipeline complexity has led to this task being\nperformed to many differing standards, or [lack of thereof](http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0080278).\nWe all agree it is not good enough to simply note down the software version number.\nBut what practical measures can be taken?\n\nThe recent publication describing _Kallisto_ [(Bray et al. 2016)](https://doi.org/10.1038/nbt.3519)\nprovides an excellent high profile example of the growing efforts to ensure reproducible\nscience in computational biology. The authors provide a GitHub [repository](https://github.com/pachterlab/kallisto_paper_analysis)\nthat _“contains all the analysis to reproduce the results in the kallisto paper”_.\n\nThey should be applauded and indeed - in the Twittersphere - they were. The corresponding\nauthor Lior Pachter stated that the publication could be reproduced starting from raw\nreads in the NCBI Sequence Read Archive through to the results, which marks a fantastic\naccomplishment.\n\n> Hoping people will notice [https://t.co/qiu3LFozMX](https://t.co/qiu3LFozMX) by [@yarbsalocin](https://twitter.com/yarbsalocin) [@hjpimentel](https://twitter.com/hjpimentel) [@pmelsted](https://twitter.com/pmelsted) reproducing ALL the [#kallisto](https://twitter.com/hashtag/kallisto?src=hash) paper from SRA→results\n> \n> — Lior Pachter (@lpachter) [April 5, 2016](https://twitter.com/lpachter/status/717279998424457216)\n\n\n\nThey achieve this utilising the workflow framework [Snakemake](https://bitbucket.org/snakemake/snakemake/wiki/Home).\nIncreasingly, we are seeing scientists applying workflow frameworks to their pipelines,\nwhich is great to see. There is a learning curve, but I have personally found the payoffs\nin productivity to be immense.\n\nAs both users and developers of Nextflow, we have long discussed best practice to ensure\nreproducibility of our work. As a community, we are at the beginning of that conversation\n\n- there are still many ideas to be aired and details ironed out - nevertheless we wished\n to provide a _state-of-play_ as we see it and to describe what is possible with Nextflow\n in this regard.\n\n### Guaranteed Reproducibility\n\nThis is our goal. It is one thing for a pipeline to be able to be reproduced in your own\nhands, on your machine, yet is another for this to be guaranteed so that anyone anywhere\ncan reproduce it. 
What I mean by guaranteed is that when a given pipeline is executed,\nthere is only one result which can be output.\nEnvisage what I term the _reproducibility triangle_: consisting of data, code and\ncompute environment.\n\n![Reproducibility Triangle](/img/reproducibility-triangle.png)\n\n**Figure 1:** The Reproducibility Triangle. _Data_: raw data such as sequencing reads,\ngenomes and annotations but also metadata such as experimental design. _Code_:\nscripts, binaries and libraries/dependencies. _Environment_: operating system.\n\nIf there is any change to one of these then the reproducibililty is no longer guaranteed.\nFor years there have been solutions to each of these individual components. But they have\nlived a somewhat discrete existence: data in databases such as the SRA and Ensembl, code\non GitHub and compute environments in the form of virtual machines. We think that in the\nfuture science must embrace solutions that integrate each of these components natively and\nholistically.\n\n### Implementation\n\nNextflow provides a solution to reproduciblility through version control and sandboxing.\n\n#### Code\n\nVersion control is provided via [native integration with GitHub](https://www.nextflow.io/docs/latest/sharing.html)\nand other popular code management platforms such as Bitbucket and GitLab.\nPipelines can be pulled, executed, developed, collaborated on and shared. For example,\nthe command below will pull a specific version of a [simple Kallisto + Sleuth pipeline](https://github.com/cbcrg/kallisto-nf)\nfrom GitHub and execute it. The `-r` parameter can be used to specify a specific tag, branch\nor revision that was previously defined in the Git repository.\n\n nextflow run cbcrg/kallisto-nf -r v0.9\n\n#### Environment\n\nSandboxing during both development and execution is another key concept; version control\nalone does not ensure that all dependencies nor the compute environment are the same.\n\nA simplified implementation of this places all binaries, dependencies and libraries within\nthe project repository. In Nextflow, any binaries within the the `bin` directory of a\nrepository are added to the path. Also, within the Nextflow [config file](https://github.com/cbcrg/kallisto-nf/blob/master/nextflow.config),\nenvironmental variables such as `PERL5LIB` can be defined so that they are automatically\nadded during the task executions.\n\nThis can be taken a step further with containerisation such as [Docker](https://www.nextflow.io/docs/latest/docker.html).\nWe have recently published [work](https://doi.org/10.7717/peerj.1273) about this:\nbriefly a [dockerfile](https://github.com/cbcrg/kallisto-nf/blob/master/Dockerfile)\ncontaining the instructions on how to build the docker image resides inside a repository.\nThis provides a specification for the operating system, software, libraries and\ndependencies to be run.\n\nThe images themself also have content-addressable identifiers in the form of\n[digests](https://docs.docker.com/engine/userguide/containers/dockerimages/#image-digests),\nwhich ensure not a single byte of information, from the operating system through to the\nlibraries pulled from public repos, has been changed. 
This container digest can be specified\nin the [pipeline config file](https://github.com/cbcrg/kallisto-nf/blob/master/nextflow.config).\n\n process {\n container = \"cbcrg/kallisto-nf@sha256:9f84012739...\"\n }\n\nWhen doing so Nextflow automatically pulls the specified image from the Docker Hub and\nmanages the execution of the pipeline tasks from within the container in a transparent manner,\ni.e. without having to adapt or modify your code.\n\n#### Data\n\nData is currently one of the more challenging aspect to address. _Small data_ can be\neasily version controlled within git-like repositories. For larger files\nthe [Git Large File Storage](https://git-lfs.github.com/), for which Nextflow provides\nbuilt-in support, may be one solution. Ultimately though, the real home of scientific data\nis in publicly available, programmatically accessible databases.\n\nProviding out-of-box solutions is difficult given the hugely varying nature of the data\nand meta-data within these databases. We are currently looking to incorporate the most\nhighly used ones, such as the [SRA](http://www.ncbi.nlm.nih.gov/sra) and [Ensembl](http://www.ensembl.org/index.html).\nIn the long term we have an eye on initiatives, such as [NCBI BioProject](https://www.ncbi.nlm.nih.gov/bioproject/),\nwith the idea there is a single identifier for both the data and metadata that can be referenced in a workflow.\n\nAdhering to the practices above, one could imagine one line of code which would appear within a publication.\n\n nextflow run [user/repo] -r [version] --data[DB_reference:data_reference] -with-docker\n\nThe result would be guaranteed to be reproduced by whoever wished.\n\n### Conclusion\n\nWith this approach the reproducilbility triangle is complete. But it must be noted that\nthis does not guard against conceptual or implementation errors. It does not replace proper\ndocumentation. What it does is to provide transparency to a result.\n\nThe assumption that the deterministic nature of computation makes results insusceptible\nto irreproducbility is clearly false. We consider Nextflow with its other features such\nits polyglot nature, out-of-the-box portability and native support across HPC and Cloud\nenvironments to be an ideal solution in our everyday work. We hope to see more scientists\nadopt this approach to their workflows.\n\nThe recent efforts by the _Kallisto_ authors highlight the appetite for increasing these\nstandards and we encourage the community at large to move towards ensuring this becomes\nthe normal state of affairs for publishing in science.\n\n### References\n\nBray, Nicolas L., Harold Pimentel, Páll Melsted, and Lior Pachter. 2016. “Near-Optimal Probabilistic RNA-Seq Quantification.” Nature Biotechnology, April. Nature Publishing Group. doi:10.1038/nbt.3519.\n\nDi Tommaso P, Palumbo E, Chatzou M, Prieto P, Heuer ML, Notredame C. (2015) \"The impact of Docker containers on the performance of genomic pipelines.\" PeerJ 3:e1273 doi.org:10.7717/peerj.1273.\n\nGarijo D, Kinnings S, Xie L, Xie L, Zhang Y, Bourne PE, et al. (2013) \"Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome.\" PLoS ONE 8(11): e80278. 
doi:10.1371/journal.pone.0080278", - "images": [], + "images": [ + "/img/reproducibility-triangle.png" + ], "author": "Evan Floden", "tags": "bioinformatics,reproducibility,pipelines,nextflow,genomic,docker" }, @@ -77,7 +81,7 @@ "slug": "2016/deploy-in-the-cloud-at-snap-of-a-finger", "title": "Deploy your computational pipelines in the cloud at the snap-of-a-finger", "date": "2016-09-01T00:00:00.000Z", - "content": "*Learn how to deploy and run a computational pipeline in the Amazon AWS cloud with ease\nthanks to Nextflow and Docker containers*\n\nNextflow is a framework that simplifies the writing of parallel and distributed computational\npipelines in a portable and reproducible manner across different computing platforms, from\na laptop to a cluster of computers.\n\nIndeed, the original idea, when this project started three years ago, was to\nimplement a tool that would allow researchers in\n[our lab](http://www.crg.eu/es/programmes-groups/comparative-bioinformatics) to smoothly migrate\ntheir data analysis applications in the cloud when needed - without having\nto change or adapt their code.\n\nHowever to date Nextflow has been used mostly to deploy computational workflows within on-premise\ncomputing clusters or HPC data-centers, because these infrastructures are easier to use\nand provide, on average, cheaper cost and better performance when compared to a cloud environment.\n\nA major obstacle to efficient deployment of scientific workflows in the cloud is the lack\nof a performant POSIX compatible shared file system. These kinds of applications\nare usually made-up by putting together a collection of tools, scripts and\nsystem commands that need a reliable file system to share with each other the input and\noutput files as they are produced, above all in a distributed cluster of computers.\n\nThe recent availability of the [Amazon Elastic File System](https://aws.amazon.com/efs/)\n(EFS), a fully featured NFS based file system hosted on the AWS infrastructure represents\na major step in this context, unlocking the deployment of scientific computing\nin the cloud and taking it to the next level.\n\n### Nextflow support for the cloud\n\nNextflow could already be deployed in the cloud, either using tools such as\n[ElastiCluster](https://github.com/gc3-uzh-ch/elasticluster) or\n[CfnCluster](https://aws.amazon.com/hpc/cfncluster/), or by using custom deployment\nscripts. However the procedure was still cumbersome and, above all, it was not optimised\nto fully take advantage of cloud elasticity i.e. the ability to (re)shape the computing\ncluster dynamically as the computing needs change over time.\n\nFor these reasons, we decided it was time to provide Nextflow with a first-class support\nfor the cloud, integrating the Amazon EFS and implementing an optimised native cloud\nscheduler, based on [Apache Ignite](https://ignite.apache.org/), with a full support for cluster\nauto-scaling and spot/preemptible instances.\n\nIn practice this means that Nextflow can now spin-up and configure a fully featured computing\ncluster in the cloud with a single command, after that you need only to login to the master\nnode and launch the pipeline execution as you would do in your on-premise cluster.\n\n### Demo !\n\nSince a demo is worth a thousands words, I've record a short screencast showing how\nNextflow can setup a cluster in the cloud and mount the Amazon EFS shared file system.\n\n\n\nNote: in this screencast it has been cut the Ec2 instances startup delay. 
It required around\n5 minutes to launch them and setup the cluster.\n\nLet's recap the steps showed in the demo:\n\n- The user provides the cloud parameters (such as the VM image ID and the instance type)\n in the `nextflow.config` file.\n\n- To configure the EFS file system you need to provide your EFS storage ID and the mount path\n by using the `sharedStorageId` and `sharedStorageMount` properties.\n\n- To use [EC2 Spot](https://aws.amazon.com/ec2/spot/) instances, just specify the price\n you want to bid by using the `spotPrice` property.\n\n- The AWS access and secret keys are provided by using the usual environment variables.\n\n- The `nextflow cloud create` launches the requested number of instances, configures the user and\n access key, mounts the EFS storage and setups the Nextflow cluster automatically.\n Any Linux AMI can be used, it is only required that the [cloud-init](https://cloudinit.readthedocs.io/en/latest/)\n package, a Java 7+ runtime and the Docker engine are present.\n\n- When the cluster is ready, you can SSH in the master node and launch the pipeline execution\n as usual with the `nextflow run ` command.\n\n- For the sake of this demo we are using [paraMSA](https://github.com/pditommaso/paraMSA),\n a pipeline for generating multiple sequence alignments and bootstrap replicates developed\n in our lab.\n\n- Nextflow automatically pulls the pipeline code from its GitHub repository when the\n execution is launched. This repository includes also a dataset which is used by default.\n [The many bioinformatic tools used by the pipeline](https://github.com/pditommaso/paraMSA#dependencies-)\n are packaged using a Docker image, which is downloaded automatically on each computing node.\n\n- The pipeline results are uploaded automatically in the S3 bucket specified\n by the `--output s3://cbcrg-eu/para-msa-results` command line option.\n\n- When the computation is completed, the cluster can be safely shutdown and the\n EC2 instances terminated with the `nextflow cloud shutdown` command.\n\n### Try it yourself\n\n~~We are releasing the Nextflow integrated cloud support in the upcoming version `0.22.0`~~.\n\nNextflow integrated cloud support is available from version `0.22.0`. To use it just make sure to\nhave this or an higher version of Nextflow.\n\nBare in mind that Nextflow requires a Unix-like operating system and a Java runtime version 7+\n(Windows 10 users which have installed the [Ubuntu subsystem](https://blogs.windows.com/buildingapps/2016/03/30/run-bash-on-ubuntu-on-windows/)\nshould be able to run it, at their risk..).\n\nOnce you have installed it, you can follow the steps in the above demo. 
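As a rough sketch, the cloud settings described above are declared in the `nextflow.config` file along these lines (the identifiers below are placeholders, not a working configuration):\n\n    cloud {\n        imageId = 'ami-xxxxxxxx'\n        instanceType = 'm4.xlarge'\n        sharedStorageId = 'fs-xxxxxxxx'\n        sharedStorageMount = '/mnt/efs'\n        spotPrice = 0.06\n    }\n\n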
For your convenience\nwe made publicly available the EC2 image ~~`ami-43f49030`~~ `ami-4b7daa32`^\\* ^ (EU Ireland region) used to record this\nscreencast.\n\nAlso make sure you have the following the following variables defined in your environment:\n\n AWS_ACCESS_KEY_ID=\"\"\n AWS_SECRET_ACCESS_KEY=\"\"\n AWS_DEFAULT_REGION=\"\"\n\nReferes to the [documentation](/docs/latest/awscloud.html) for configuration details.\n\n\\* Update: the AMI has been updated with Java 8 on Sept 2017.\n\n### Conclusion\n\nNextflow provides state of the art support for cloud and containers technologies making\nit possible to create computing clusters in the cloud and deploy computational workflows\nin a no-brainer way, with just two commands on your terminal.\n\nIn an upcoming post I will describe the autoscaling capabilities implemented by the\nNextflow scheduler that allows, along with the use of spot/preemptible instances,\na cost effective solution for the execution of your pipeline in the cloud.\n\n#### Credits\n\nThanks to [Evan Floden](https://github.com/skptic) for reviewing this post and for writing\nthe [paraMSA](https://github.com/skptic/paraMSA/) pipeline.\n", + "content": "*Learn how to deploy and run a computational pipeline in the Amazon AWS cloud with ease thanks to Nextflow and Docker containers*\n\nNextflow is a framework that simplifies the writing of parallel and distributed computational\npipelines in a portable and reproducible manner across different computing platforms, from\na laptop to a cluster of computers.\n\nIndeed, the original idea, when this project started three years ago, was to\nimplement a tool that would allow researchers in\n[our lab](http://www.crg.eu/es/programmes-groups/comparative-bioinformatics) to smoothly migrate\ntheir data analysis applications in the cloud when needed - without having\nto change or adapt their code.\n\nHowever to date Nextflow has been used mostly to deploy computational workflows within on-premise\ncomputing clusters or HPC data-centers, because these infrastructures are easier to use\nand provide, on average, cheaper cost and better performance when compared to a cloud environment.\n\nA major obstacle to efficient deployment of scientific workflows in the cloud is the lack\nof a performant POSIX compatible shared file system. These kinds of applications\nare usually made-up by putting together a collection of tools, scripts and\nsystem commands that need a reliable file system to share with each other the input and\noutput files as they are produced, above all in a distributed cluster of computers.\n\nThe recent availability of the [Amazon Elastic File System](https://aws.amazon.com/efs/)\n(EFS), a fully featured NFS based file system hosted on the AWS infrastructure represents\na major step in this context, unlocking the deployment of scientific computing\nin the cloud and taking it to the next level.\n\n### Nextflow support for the cloud\n\nNextflow could already be deployed in the cloud, either using tools such as\n[ElastiCluster](https://github.com/gc3-uzh-ch/elasticluster) or\n[CfnCluster](https://aws.amazon.com/hpc/cfncluster/), or by using custom deployment\nscripts. However the procedure was still cumbersome and, above all, it was not optimised\nto fully take advantage of cloud elasticity i.e. 
the ability to (re)shape the computing\ncluster dynamically as the computing needs change over time.\n\nFor these reasons, we decided it was time to provide Nextflow with a first-class support\nfor the cloud, integrating the Amazon EFS and implementing an optimised native cloud\nscheduler, based on [Apache Ignite](https://ignite.apache.org/), with a full support for cluster\nauto-scaling and spot/preemptible instances.\n\nIn practice this means that Nextflow can now spin-up and configure a fully featured computing\ncluster in the cloud with a single command, after that you need only to login to the master\nnode and launch the pipeline execution as you would do in your on-premise cluster.\n\n### Demo !\n\nSince a demo is worth a thousands words, I've record a short screencast showing how\nNextflow can setup a cluster in the cloud and mount the Amazon EFS shared file system.\n\n\n\nNote: in this screencast it has been cut the Ec2 instances startup delay. It required around\n5 minutes to launch them and setup the cluster.\n\nLet's recap the steps showed in the demo:\n\n- The user provides the cloud parameters (such as the VM image ID and the instance type)\n in the `nextflow.config` file.\n\n- To configure the EFS file system you need to provide your EFS storage ID and the mount path\n by using the `sharedStorageId` and `sharedStorageMount` properties.\n\n- To use [EC2 Spot](https://aws.amazon.com/ec2/spot/) instances, just specify the price\n you want to bid by using the `spotPrice` property.\n\n- The AWS access and secret keys are provided by using the usual environment variables.\n\n- The `nextflow cloud create` launches the requested number of instances, configures the user and\n access key, mounts the EFS storage and setups the Nextflow cluster automatically.\n Any Linux AMI can be used, it is only required that the [cloud-init](https://cloudinit.readthedocs.io/en/latest/)\n package, a Java 7+ runtime and the Docker engine are present.\n\n- When the cluster is ready, you can SSH in the master node and launch the pipeline execution\n as usual with the `nextflow run ` command.\n\n- For the sake of this demo we are using [paraMSA](https://github.com/pditommaso/paraMSA),\n a pipeline for generating multiple sequence alignments and bootstrap replicates developed\n in our lab.\n\n- Nextflow automatically pulls the pipeline code from its GitHub repository when the\n execution is launched. This repository includes also a dataset which is used by default.\n [The many bioinformatic tools used by the pipeline](https://github.com/pditommaso/paraMSA#dependencies-)\n are packaged using a Docker image, which is downloaded automatically on each computing node.\n\n- The pipeline results are uploaded automatically in the S3 bucket specified\n by the `--output s3://cbcrg-eu/para-msa-results` command line option.\n\n- When the computation is completed, the cluster can be safely shutdown and the\n EC2 instances terminated with the `nextflow cloud shutdown` command.\n\n### Try it yourself\n\n~~We are releasing the Nextflow integrated cloud support in the upcoming version `0.22.0`~~.\n\nNextflow integrated cloud support is available from version `0.22.0`. 
To use it just make sure to\nhave this or an higher version of Nextflow.\n\nBare in mind that Nextflow requires a Unix-like operating system and a Java runtime version 7+\n(Windows 10 users which have installed the [Ubuntu subsystem](https://blogs.windows.com/buildingapps/2016/03/30/run-bash-on-ubuntu-on-windows/)\nshould be able to run it, at their risk..).\n\nOnce you have installed it, you can follow the steps in the above demo. For your convenience\nwe made publicly available the EC2 image ~~`ami-43f49030`~~ `ami-4b7daa32`^\\* ^ (EU Ireland region) used to record this\nscreencast.\n\nAlso make sure you have the following the following variables defined in your environment:\n\n AWS_ACCESS_KEY_ID=\"\"\n AWS_SECRET_ACCESS_KEY=\"\"\n AWS_DEFAULT_REGION=\"\"\n\nReferes to the [documentation](/docs/latest/awscloud.html) for configuration details.\n\n\\* Update: the AMI has been updated with Java 8 on Sept 2017.\n\n### Conclusion\n\nNextflow provides state of the art support for cloud and containers technologies making\nit possible to create computing clusters in the cloud and deploy computational workflows\nin a no-brainer way, with just two commands on your terminal.\n\nIn an upcoming post I will describe the autoscaling capabilities implemented by the\nNextflow scheduler that allows, along with the use of spot/preemptible instances,\na cost effective solution for the execution of your pipeline in the cloud.\n\n#### Credits\n\nThanks to [Evan Floden](https://github.com/skptic) for reviewing this post and for writing\nthe [paraMSA](https://github.com/skptic/paraMSA/) pipeline.\n", "images": [], "author": "Paolo Di Tommaso", "tags": "aws,cloud,pipelines,nextflow,genomic,docker" @@ -96,7 +100,9 @@ "title": "Docker for dunces & Nextflow for nunces", "date": "2016-06-10T00:00:00.000Z", "content": "_Below is a step-by-step guide for creating [Docker](http://www.docker.io) images for use with [Nextflow](http://www.nextflow.io) pipelines. This post was inspired by recent experiences and written with the hope that it may encourage others to join in the virtualization revolution._\n\nModern science is built on collaboration. Recently I became involved with one such venture between several groups across Europe. The aim was to annotate long non-coding RNA (lncRNA) in farm animals and I agreed to help with the annotation based on RNA-Seq data. The basic procedure relies on mapping short read data from many different tissues to a genome, generating transcripts and then determining if they are likely to be lncRNA or protein coding genes.\n\nDuring several successful 'hackathon' meetings the best approach was decided and implemented in a joint effort. I undertook the task of wrapping the procedure up into a Nextflow pipeline with a view to replicating the results across our different institutions and to allow the easy execution of the pipeline by researchers anywhere.\n\nCreating the Nextflow pipeline ([here](http://www.github.com/cbcrg/lncrna-annotation-nf)) in itself was not a difficult task. My collaborators had documented their work well and were on hand if anything was not clear. However installing and keeping aligned all the pipeline dependencies across different the data centers was still a challenging task.\n\nThe pipeline is typical of many in bioinformatics, consisting of binary executions, BASH scripting, R, Perl, BioPerl and some custom Perl modules. We found the BioPerl modules in particular where very sensitive to the various versions in the _long_ dependency tree. 
The solution was to turn to [Docker](https://www.docker.com/) containers.\n\nI have taken this opportunity to document the process of developing the Docker side of a Nextflow + Docker pipeline in a step-by-step manner.\n\n###Docker Installation\n\nBy far the most challenging issue is the installation of Docker. For local installations, the [process is relatively straight forward](https://docs.docker.com/engine/installation). However difficulties arise as computing moves to a cluster. Owing to security concerns, many HPC administrators have been reluctant to install Docker system-wide. This is changing and Docker developers have been responding to many of these concerns with [updates addressing these issues](https://blog.docker.com/2016/02/docker-engine-1-10-security/).\n\nThat being the case, local installations are usually perfectly fine for development. One of the golden rules in Nextflow development is to have a small test dataset that can run the full pipeline in minutes with few computational resources, ie can run on a laptop.\n\nIf you have Docker and Nextflow installed and you wish to view the working pipeline, you can perform the following commands to obtain everything you need and run the full lncrna annotation pipeline on a test dataset.\n\n docker pull cbcrg/lncrna_annotation\n nextflow run cbcrg/lncrna-annotation-nf -profile test\n\n[If the following does not work, there could be a problem with your Docker installation.]\n\nThe first command will download the required Docker image in your computer, while the second will launch Nextflow which automatically download the pipeline repository and\nrun it using the test data included with it.\n\n###The Dockerfile\n\nThe `Dockerfile` contains all the instructions required by Docker to build the Docker image. It provides a transparent and consistent way to specify the base operating system and installation of all software, libraries and modules.\n\nWe begin by creating a file `Dockerfile` in the Nextflow project directory. The Dockerfile begins with:\n\n # Set the base image to debian jessie\n FROM debian:jessie\n\n # File Author / Maintainer\n MAINTAINER Evan Floden \n\nThis sets the base distribution for our Docker image to be Debian v8.4, a lightweight Linux distribution that is ideally suited for the task. We must also specify the maintainer of the Docker image.\n\nNext we update the repository sources and install some essential tools such as `wget` and `perl`.\n\n RUN apt-get update && apt-get install --yes --no-install-recommends \\\n wget \\\n locales \\\n vim-tiny \\\n git \\\n cmake \\\n build-essential \\\n gcc-multilib \\\n perl \\\n python ...\n\nNotice that we use the command `RUN` before each line. The `RUN` instruction executes commands as if they are performed from the Linux shell.\n\nAlso is good practice to group as many as possible commands in the same `RUN` statement. This reduces the size of the final Docker image. 
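For example, an install step and its clean-up can be chained in a single `RUN`, so that the temporary package lists never end up in the final layer (a generic sketch, not a line taken from this project's Dockerfile):\n\n    RUN apt-get update \\\n        && apt-get install --yes --no-install-recommends wget \\\n        && rm -rf /var/lib/apt/lists/*\n\n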
See [here](https://blog.replicated.com/2016/02/05/refactoring-a-dockerfile-for-image-size/) for these details and [here](https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/) for more best practices.\n\nNext we can specify the install of the required perl modules using [cpan minus](http://search.cpan.org/~miyagawa/Menlo-1.9003/script/cpanm-menlo):\n\n # Install perl modules\n RUN cpanm --force CPAN::Meta \\\n YAML \\\n Digest::SHA \\\n Module::Build \\\n Data::Stag \\\n Config::Simple \\\n Statistics::Lite ...\n\nWe can give the instructions to download and install software from GitHub using:\n\n # Install Star Mapper\n RUN wget -qO- https://github.com/alexdobin/STAR/archive/2.5.2a.tar.gz | tar -xz \\\n && cd STAR-2.5.2a \\\n && make STAR\n\nWe can add custom Perl modules and specify environmental variables such as `PERL5LIB` as below:\n\n # Install FEELnc\n RUN wget -q https://github.com/tderrien/FEELnc/archive/a6146996e06f8a206a0ae6fd59f8ca635c7d9467.zip \\\n && unzip a6146996e06f8a206a0ae6fd59f8ca635c7d9467.zip \\\n && mv FEELnc-a6146996e06f8a206a0ae6fd59f8ca635c7d9467 /FEELnc \\\n && rm a6146996e06f8a206a0ae6fd59f8ca635c7d9467.zip\n\n ENV FEELNCPATH /FEELnc\n ENV PERL5LIB $PERL5LIB:${FEELNCPATH}/lib/\n\nR and R libraries can be installed as follows:\n\n # Install R\n RUN echo \"deb http://cran.rstudio.com/bin/linux/debian jessie-cran3/\" >> /etc/apt/sources.list &&\\\n apt-key adv --keyserver keys.gnupg.net --recv-key 381BA480 &&\\\n apt-get update --fix-missing && \\\n apt-get -y install r-base\n\n # Install R libraries\n RUN R -e 'install.packages(\"ROCR\", repos=\"http://cloud.r-project.org/\"); install.packages(\"randomForest\",repos=\"http://cloud.r-project.org/\")'\n\nFor the complete working Dockerfile of this project see [here](https://github.com/cbcrg/lncRNA-Annotation-nf/blob/master/Dockerfile)\n\n###Building the Docker Image\n\nOnce we start working on the Dockerfile, we can build it anytime using:\n\n docker build -t skptic/lncRNA_annotation .\n\nThis builds the image from the Dockerfile and assigns a tag (i.e. a name) for the image. If there are no errors, the Docker image is now in you local Docker repository ready for use.\n\n###Testing the Docker Image\n\nWe find it very helpful to test our images as we develop the Docker file. Once built, it is possible to launch the Docker image and test if the desired software was correctly installed. For example, we can test if FEELnc and its dependencies were successfully installed by running the following:\n\n docker run -ti lncrna_annotation\n\n cd FEELnc/test\n\n FEELnc_filter.pl -i transcript_chr38.gtf -a annotation_chr38.gtf \\\n > -b transcript_biotype=protein_coding > candidate_lncRNA.gtf\n\n exit # remember to exit the Docker image\n\n###Tagging the Docker Image\n\nOnce you are confident your image is built correctly, you can tag it, allowing you to push it to [Dockerhub.io](https://hub.docker.com/). 
Dockerhub is an online repository for docker images which allows anyone to pull public images and run them.\n\nYou can view the images in your local repository with the `docker images` command and tag using `docker tag` with the image ID and the name.\n\n docker images\n\n REPOSITORY TAG IMAGE ID CREATED SIZE\n lncrna_annotation latest d8ec49cbe3ed 2 minutes ago 821.5 MB\n\n docker tag d8ec49cbe3ed cbcrg/lncrna_annotation:latest\n\nNow when we check our local images we can see the updated tag.\n\n docker images\n\n REPOSITORY TAG IMAGE ID CREATED SIZE\n cbcrg/lncrna_annotation latest d8ec49cbe3ed 2 minutes ago 821.5 MB\n\n###Pushing the Docker Image to Dockerhub\n\nIf you have not previously, sign up for a Dockerhub account [here](https://hub.docker.com/). From the command line, login to Dockerhub and push your image.\n\n docker login --username=cbcrg\n docker push cbcrg/lncrna_annotation\n\nYou can test if you image has been correctly pushed and is publicly available by removing your local version using the IMAGE ID of the image and pulling the remote:\n\n docker rmi -f d8ec49cbe3ed\n\n # Ensure the local version is not listed.\n docker images\n\n docker pull cbcrg/lncrna_annotation\n\nWe are now almost ready to run our pipeline. The last step is to set up the Nexflow config.\n\n###Nextflow Configuration\n\nWithin the `nextflow.config` file in the main project directory we can add the following line which links the Docker image to the Nexflow execution. The images can be:\n\n- General (same docker image for all processes):\n\n process {\n container = 'cbcrg/lncrna_annotation'\n }\n\n- Specific to a profile (specified by `-profile crg` for example):\n\n profile {\n crg {\n container = 'cbcrg/lncrna_annotation'\n }\n }\n\n- Specific to a given process within a pipeline:\n\n $processName.container = 'cbcrg/lncrna_annotation'\n\nIn most cases it is easiest to use the same Docker image for all processes. One further thing to consider is the inclusion of the sha256 hash of the image in the container reference. I have [previously written about this](https://www.nextflow.io/blog/2016/best-practice-for-reproducibility.html), but briefly, including a hash ensures that not a single byte of the operating system or software is different.\n\n process {\n container = 'cbcrg/lncrna_annotation@sha256:9dfe233b...'\n }\n\nAll that is left now to run the pipeline.\n\n nextflow run lncRNA-Annotation-nf -profile test\n\nWhilst I have explained this step-by-step process in a linear, consequential manner, in reality the development process is often more circular with changes in the Docker images reflecting changes in the pipeline.\n\n###CircleCI and Nextflow\n\nNow that you have a pipeline that successfully runs on a test dataset with Docker, a very useful step is to add a continuous development component to the pipeline. With this, whenever you push a modification of the pipeline to the GitHub repo, the test data set is run on the [CircleCI](http://www.circleci.com) servers (using Docker).\n\nTo include CircleCI in the Nexflow pipeline, create a file named `circle.yml` in the project directory. We add the following instructions to the file:\n\n machine:\n java:\n version: oraclejdk8\n services:\n - docker\n\n dependencies:\n override:\n\n test:\n override:\n - docker pull cbcrg/lncrna_annotation\n - curl -fsSL get.nextflow.io | bash\n - ./nextflow run . 
-profile test\n\nNext you can sign up to CircleCI, linking your GitHub account.\n\nWithin the GitHub README.md you can add a badge with the following:\n\n ![CircleCI status](https://circleci.com/gh/cbcrg/lncRNA-Annotation-nf.png?style=shield)\n\n###Tips and Tricks\n\n**File permissions**: When a process is executed by a Docker container, the UNIX user running the process is not you. Therefore any files that are used as an input should have the appropriate file permissions. For example, I had to change the permissions of all the input data in the test data set with:\n\nfind -type f -exec chmod 644 {} \\;\nfind -type d -exec chmod 755 {} \\;\n\n###Summary\nThis was my first time building a Docker image and after a bit of trial-and-error the process was surprising straight forward. There is a wealth of information available for Docker and the almost seamless integration with Nextflow is fantastic. Our collaboration team is now looking forward to applying the pipeline to different datasets and publishing the work, knowing our results will be completely reproducible across any platform.\n", - "images": [], + "images": [ + "https://circleci.com/gh/cbcrg/lncRNA-Annotation-nf.png?style=shield" + ], "author": "Evan Floden", "tags": "bioinformatics,reproducibility,pipelines,nextflow,genomic,docker" }, @@ -105,7 +111,9 @@ "title": "Enabling elastic computing with Nextflow", "date": "2016-10-19T00:00:00.000Z", "content": "*Learn how to deploy an elastic computing cluster in the AWS cloud with Nextflow *\n\nIn the [previous post](/blog/2016/deploy-in-the-cloud-at-snap-of-a-finger.html) I introduced\nthe new cloud native support for AWS provided by Nextflow.\n\nIt allows the creation of a computing cluster in the cloud in a no-brainer way, enabling\nthe deployment of complex computational pipelines in a few commands.\n\nThis solution is characterised by using a lean application stack which does not\nrequire any third party component installed in the EC2 instances other than a Java VM and the\nDocker engine (the latter it's only required in order to deploy pipeline binary dependencies).\n\n![Nextflow cloud deployment](/img/cloud-deployment.png)\n\nEach EC2 instance runs a script, at bootstrap time, that mounts the [EFS](https://aws.amazon.com/efs/)\nstorage and downloads and launches the Nextflow cluster daemon. This daemon is self-configuring,\nit automatically discovers the other running instances and joins them forming the computing cluster.\n\nThe simplicity of this stack makes it possible to setup the cluster in the cloud in just a few minutes,\na little more time than is required to spin up the EC2 VMs. This time does not depend on\nthe number of instances launched, as they configure themself independently.\n\nThis also makes it possible to add or remove instances as needed, realising the [long promised\nelastic scalability](http://www.nextplatform.com/2016/09/21/three-great-lies-cloud-computing/)\nof cloud computing.\n\nThis ability is even more important for bioinformatic workflows, which frequently crunch\nnot homogeneous datasets and are composed of tasks with very different computing requirements\n(eg. 
a few very long running tasks and many short-lived tasks in the same workload).\n\n### Going elastic\n\nThe Nextflow support for the cloud features an elastic cluster which is capable of resizing itself\nto adapt to the actual computing needs at runtime, thus spinning up new EC2 instances when jobs\nwait for too long in the execution queue, or terminating instances that are not used for\na certain amount of time.\n\nIn order to enable the cluster autoscaling you will need to specify the autoscale\nproperties in the `nextflow.config` file. For example:\n\n```\ncloud {\n imageId = 'ami-4b7daa32'\n instanceType = 'm4.xlarge'\n\n autoscale {\n enabled = true\n minInstances = 5\n maxInstances = 10\n }\n}\n```\n\nThe above configuration enables the autoscaling features so that the cluster will include\nat least 5 nodes. If at any point one or more tasks spend more than 5 minutes without being\nprocessed, the number of instances needed to fullfil the pending tasks, up to limit specified\nby the `maxInstances` attribute, are launched. On the other hand, if these instances are\nidle, they are terminated before reaching the 60 minutes instance usage boundary.\n\nThe autoscaler launches instances by using the same AMI ID and type specified in the `cloud`\nconfiguration. However it is possible to define different attributes as shown below:\n\n```\ncloud {\n imageId = 'ami-4b7daa32'\n instanceType = 'm4.large'\n\n autoscale {\n enabled = true\n maxInstances = 10\n instanceType = 'm4.2xlarge'\n spotPrice = 0.05\n }\n}\n```\n\nThe cluster is first created by using instance(s) of type `m4.large`. Then, when new\ncomputing nodes are required the autoscaler launches instances of type `m4.2xlarge`.\nAlso, since the `spotPrice` attribute is specified, [EC2 spot](https://aws.amazon.com/ec2/spot/)\ninstances are launched, instead of regular on-demand ones, bidding for the price specified.\n\n### Conclusion\n\nNextflow implements an easy though effective cloud scheduler that is able to scale dynamically\nto meet the computing needs of deployed workloads taking advantage of the _elastic_ nature\nof the cloud platform.\n\nThis ability, along the support for spot/preemptible instances, allows a cost effective solution\nfor the execution of your pipeline in the cloud.", - "images": [], + "images": [ + "/img/cloud-deployment.png" + ], "author": "Paolo Di Tommaso", "tags": "aws,cloud,pipelines,nextflow,genomic,docker" }, @@ -131,7 +139,7 @@ "slug": "2017/caw-and-singularity", "title": "Running CAW with Singularity and Nextflow", "date": "2017-11-16T00:00:00.000Z", - "content": "*This is a guest post authored by Maxime Garcia from the Science for Life Laboratory in Sweden. Max\ndescribes how they deploy complex cancer data analysis pipelines using Nextflow\nand Singularity. 
We are very happy to share their experience across the Nextflow community.*\n\n### The CAW pipeline\n\n\"Cancer\n\n[Cancer Analysis Workflow](http://opensource.scilifelab.se/projects/sarek/) (CAW for short) is a Nextflow based analysis pipeline developed for the analysis of tumour: normal pairs.\nIt is developed in collaboration with two infrastructures within [Science for Life Laboratory](https://www.scilifelab.se/): [National Genomics Infrastructure](https://ngisweden.scilifelab.se/) (NGI), in The Stockholm [Genomics Applications Development Facility](https://www.scilifelab.se/facilities/ngi-stockholm/) to be precise and [National Bioinformatics Infrastructure Sweden](https://www.nbis.se/) (NBIS).\n\nCAW is based on [GATK Best Practices](https://software.broadinstitute.org/gatk/best-practices/) for the preprocessing of FastQ files, then uses various variant calling tools to look for somatic SNVs and small indels ([MuTect1](https://github.com/broadinstitute/mutect/), [MuTect2](https://github.com/broadgsa/gatk-protected/), [Strelka](https://github.com/Illumina/strelka/), [Freebayes](https://github.com/ekg/freebayes/)), ([GATK HaplotyeCaller](https://github.com/broadgsa/gatk-protected/)), for structural variants([Manta](https://github.com/Illumina/manta/)) and for CNVs ([ASCAT](https://github.com/Crick-CancerGenomics/ascat/)).\nAnnotation tools ([snpEff](http://snpeff.sourceforge.net/), [VEP](https://www.ensembl.org/info/docs/tools/vep/index.html)) are also used, and finally [MultiQC](http://multiqc.info/) for handling reports.\n\nWe are currently working on a manuscript, but you're welcome to look at (or even contribute to) our [github repository](https://github.com/SciLifeLab/CAW/) or talk with us on our [gitter channel](https://gitter.im/SciLifeLab/CAW/).\n\n### Singularity and UPPMAX\n\n[Singularity](http://singularity.lbl.gov/) is a tool package software dependencies into a contained environment, much like Docker. It's designed to run on HPC environments where Docker is often a problem due to its requirement for administrative privileges.\n\nWe're based in Sweden, and [Uppsala Multidisciplinary Center for Advanced Computational Science](https://uppmax.uu.se/) (UPPMAX) provides Computational infrastructures for all Swedish researchers.\nSince we're analyzing sensitive data, we are using secure clusters (with a two factor authentication), set up by UPPMAX: [SNIC-SENS](https://www.uppmax.uu.se/projects-and-collaborations/snic-sens/).\n\nIn my case, since we're still developing the pipeline, I am mainly using the research cluster [Bianca](https://www.uppmax.uu.se/resources/systems/the-bianca-cluster/).\nSo I can only transfer files and data in one specific repository using SFTP.\n\nUPPMAX provides computing resources for Swedish researchers for all scientific domains, so getting software updates can occasionally take some time.\nTypically, [Environment Modules](http://modules.sourceforge.net/) are used which allow several versions of different tools - this is good for reproducibility and is quite easy to use. 
However, the approach is not portable across different clusters outside of UPPMAX.\n\n### Why use containers?\n\nThe idea of using containers, for improved portability and reproducibility, and more up to date tools, came naturally to us, as it is easily managed within Nextflow.\nWe cannot use [Docker](https://www.docker.com/) on our secure cluster, so we wanted to run CAW with [Singularity](http://singularity.lbl.gov/) images instead.\n\n### How was the switch made?\n\nWe were already using Docker containers for our continuous integration testing with Travis, and since we use many tools, I took the approach of making (almost) a container for each process.\nBecause this process is quite slow, repetitive and I~~'m lazy~~ like to automate everything, I made a simple NF [script](https://github.com/SciLifeLab/CAW/blob/master/buildContainers.nf) to build and push all docker containers.\nBasically it's just `build` and `pull` for all containers, with some configuration possibilities.\n\n```\ndocker build -t ${repository}/${container}:${tag} ${baseDir}/containers/${container}/.\n\ndocker push ${repository}/${container}:${tag}\n```\n\nSince Singularity can directly pull images from DockerHub, I made the build script to pull all containers from DockerHub to have local Singularity image files.\n\n```\nsingularity pull --name ${container}-${tag}.img docker://${repository}/${container}:${tag}\n```\n\nAfter this, it's just a matter of moving all containers to the secure cluster we're using, and using the right configuration file in the profile.\nI'll spare you the details of the SFTP transfer.\nThis is what the configuration file for such Singularity images looks like: [`singularity-path.config`](https://github.com/SciLifeLab/CAW/blob/master/configuration/singularity-path.config)\n\n```\n/*\nvim: syntax=groovy\n-*- mode: groovy;-*-\n * -------------------------------------------------\n * Nextflow config file for CAW project\n * -------------------------------------------------\n * Paths to Singularity images for every process\n * No image will be pulled automatically\n * Need to transfer and set up images before\n * -------------------------------------------------\n */\n\nsingularity {\n enabled = true\n runOptions = \"--bind /scratch\"\n}\n\nparams {\n containerPath='containers'\n tag='1.2.3'\n}\n\nprocess {\n $ConcatVCF.container = \"${params.containerPath}/caw-${params.tag}.img\"\n $RunMultiQC.container = \"${params.containerPath}/multiqc-${params.tag}.img\"\n $IndelRealigner.container = \"${params.containerPath}/gatk-${params.tag}.img\"\n // I'm not putting the whole file here\n // you probably already got the point\n}\n```\n\nThis approach ran (almost) perfectly on the first try, except a process failing due to a typo on a container name...\n\n### Conclusion\n\nThis switch was completed a couple of months ago and has been a great success.\nWe are now using Singularity containers in almost all of our Nextflow pipelines developed at NGI.\nEven if we do enjoy the improved control, we must not forgot that:\n\n> With great power comes great responsibility!\n\n### Credits\n\nThanks to [Rickard Hammarén](https://github.com/Hammarn) and [Phil Ewels](http://phil.ewels.co.uk/) for comments and suggestions for improving the post.", + "content": "*This is a guest post authored by Maxime Garcia from the Science for Life Laboratory in Sweden. Max describes how they deploy complex cancer data analysis pipelines using Nextflow and Singularity. 
We are very happy to share their experience across the Nextflow community.*\n\n### The CAW pipeline\n\n\"Cancer\n\n[Cancer Analysis Workflow](http://opensource.scilifelab.se/projects/sarek/) (CAW for short) is a Nextflow based analysis pipeline developed for the analysis of tumour: normal pairs.\nIt is developed in collaboration with two infrastructures within [Science for Life Laboratory](https://www.scilifelab.se/): [National Genomics Infrastructure](https://ngisweden.scilifelab.se/) (NGI), in The Stockholm [Genomics Applications Development Facility](https://www.scilifelab.se/facilities/ngi-stockholm/) to be precise and [National Bioinformatics Infrastructure Sweden](https://www.nbis.se/) (NBIS).\n\nCAW is based on [GATK Best Practices](https://software.broadinstitute.org/gatk/best-practices/) for the preprocessing of FastQ files, then uses various variant calling tools to look for somatic SNVs and small indels ([MuTect1](https://github.com/broadinstitute/mutect/), [MuTect2](https://github.com/broadgsa/gatk-protected/), [Strelka](https://github.com/Illumina/strelka/), [Freebayes](https://github.com/ekg/freebayes/)), ([GATK HaplotyeCaller](https://github.com/broadgsa/gatk-protected/)), for structural variants([Manta](https://github.com/Illumina/manta/)) and for CNVs ([ASCAT](https://github.com/Crick-CancerGenomics/ascat/)).\nAnnotation tools ([snpEff](http://snpeff.sourceforge.net/), [VEP](https://www.ensembl.org/info/docs/tools/vep/index.html)) are also used, and finally [MultiQC](http://multiqc.info/) for handling reports.\n\nWe are currently working on a manuscript, but you're welcome to look at (or even contribute to) our [github repository](https://github.com/SciLifeLab/CAW/) or talk with us on our [gitter channel](https://gitter.im/SciLifeLab/CAW/).\n\n### Singularity and UPPMAX\n\n[Singularity](http://singularity.lbl.gov/) is a tool package software dependencies into a contained environment, much like Docker. It's designed to run on HPC environments where Docker is often a problem due to its requirement for administrative privileges.\n\nWe're based in Sweden, and [Uppsala Multidisciplinary Center for Advanced Computational Science](https://uppmax.uu.se/) (UPPMAX) provides Computational infrastructures for all Swedish researchers.\nSince we're analyzing sensitive data, we are using secure clusters (with a two factor authentication), set up by UPPMAX: [SNIC-SENS](https://www.uppmax.uu.se/projects-and-collaborations/snic-sens/).\n\nIn my case, since we're still developing the pipeline, I am mainly using the research cluster [Bianca](https://www.uppmax.uu.se/resources/systems/the-bianca-cluster/).\nSo I can only transfer files and data in one specific repository using SFTP.\n\nUPPMAX provides computing resources for Swedish researchers for all scientific domains, so getting software updates can occasionally take some time.\nTypically, [Environment Modules](http://modules.sourceforge.net/) are used which allow several versions of different tools - this is good for reproducibility and is quite easy to use. 
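On UPPMAX a typical session looks something like the following (the module names and versions here are only an illustration):\n\n```\nmodule load bioinfo-tools\nmodule load bwa/0.7.17\n```\n\n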
However, the approach is not portable across different clusters outside of UPPMAX.\n\n### Why use containers?\n\nThe idea of using containers, for improved portability and reproducibility, and more up to date tools, came naturally to us, as it is easily managed within Nextflow.\nWe cannot use [Docker](https://www.docker.com/) on our secure cluster, so we wanted to run CAW with [Singularity](http://singularity.lbl.gov/) images instead.\n\n### How was the switch made?\n\nWe were already using Docker containers for our continuous integration testing with Travis, and since we use many tools, I took the approach of making (almost) a container for each process.\nBecause this process is quite slow, repetitive and I~~'m lazy~~ like to automate everything, I made a simple NF [script](https://github.com/SciLifeLab/CAW/blob/master/buildContainers.nf) to build and push all docker containers.\nBasically it's just `build` and `pull` for all containers, with some configuration possibilities.\n\n```\ndocker build -t ${repository}/${container}:${tag} ${baseDir}/containers/${container}/.\n\ndocker push ${repository}/${container}:${tag}\n```\n\nSince Singularity can directly pull images from DockerHub, I made the build script to pull all containers from DockerHub to have local Singularity image files.\n\n```\nsingularity pull --name ${container}-${tag}.img docker://${repository}/${container}:${tag}\n```\n\nAfter this, it's just a matter of moving all containers to the secure cluster we're using, and using the right configuration file in the profile.\nI'll spare you the details of the SFTP transfer.\nThis is what the configuration file for such Singularity images looks like: [`singularity-path.config`](https://github.com/SciLifeLab/CAW/blob/master/configuration/singularity-path.config)\n\n```\n/*\nvim: syntax=groovy\n-*- mode: groovy;-*-\n * -------------------------------------------------\n * Nextflow config file for CAW project\n * -------------------------------------------------\n * Paths to Singularity images for every process\n * No image will be pulled automatically\n * Need to transfer and set up images before\n * -------------------------------------------------\n */\n\nsingularity {\n enabled = true\n runOptions = \"--bind /scratch\"\n}\n\nparams {\n containerPath='containers'\n tag='1.2.3'\n}\n\nprocess {\n $ConcatVCF.container = \"${params.containerPath}/caw-${params.tag}.img\"\n $RunMultiQC.container = \"${params.containerPath}/multiqc-${params.tag}.img\"\n $IndelRealigner.container = \"${params.containerPath}/gatk-${params.tag}.img\"\n // I'm not putting the whole file here\n // you probably already got the point\n}\n```\n\nThis approach ran (almost) perfectly on the first try, except a process failing due to a typo on a container name...\n\n### Conclusion\n\nThis switch was completed a couple of months ago and has been a great success.\nWe are now using Singularity containers in almost all of our Nextflow pipelines developed at NGI.\nEven if we do enjoy the improved control, we must not forgot that:\n\n> With great power comes great responsibility!\n\n### Credits\n\nThanks to [Rickard Hammarén](https://github.com/Hammarn) and [Phil Ewels](http://phil.ewels.co.uk/) for comments and suggestions for improving the post.", "images": [ "/img/CAW_logo.png" ], @@ -143,7 +151,9 @@ "title": "Nextflow and the Common Workflow Language", "date": "2017-07-20T00:00:00.000Z", "content": "The Common Workflow Language ([CWL](http://www.commonwl.org/)) is a specification for defining\nworkflows in a declarative 
manner. It has been implemented to varying degrees\nby different software packages. Nextflow and CWL share a common goal of enabling portable\nreproducible workflows.\n\nWe are currently investigating the automatic conversion of CWL workflows into Nextflow scripts\nto increase the portability of workflows. This work is being developed as\nthe [cwl2nxf](https://github.com/nextflow-io/cwl2nxf) project, currently in early prototype stage.\n\nOur first phase of the project was to determine mappings of CWL to Nextflow and familiarize\nourselves with how the current implementation of the converter supports a number of CWL specific\nfeatures.\n\n### Mapping CWL to Nextflow\n\nInputs in the CWL workflow file are initially parsed as _channels_ or other Nextflow input types.\nEach step specified in the workflow is then parsed independently. At the time of writing\nsubworkflows are not supported, each step must be a CWL `CommandLineTool` file.\n\nThe image below shows an example of the major components in the CWL files and then post-conversion (click to zoom).\n\n[![Nextflow CWL conversion](/img/cwl2nxf-min.png)](/img/cwl2nxf-min.png)\n\nCWL and Nextflow share a similar structure of defining inputs and outputs as shown above.\n\nA notable difference between the two is how tasks are defined. CWL requires either a separate\nfile for each task or a sub-workflow. CWL also requires the explicit mapping of each command\nline option for an executed tool. This is done using YAML meta-annotation to indicate the position, prefix, etc.\nfor each command line option.\n\nIn Nextflow a task command is defined as a separated component in the `process` definition and\nit is ultimately a multiline string which is interpreted by a command script by the underlying\nsystem. Input parameters can be used in the command string with a simple variable interpolation\nmechanism. This is beneficial as it simplifies porting existing BASH scripts to Nextflow\nwith minimal refactoring.\n\nThese examples highlight some of the differences between the two approaches, and the difficulties\nconverting complex use cases such as scatter, CWL expressions, and conditional command line inclusion.\n\n### Current status\n\nThe cwl2nxf is a Groovy based tool with a limited conversion ability. It parses the\nYAML documents and maps the various CWL objects to Nextflow. Conversion examples are\nprovided as part of the repository along with documentation for each example specifying the mapping.\n\nThis project was initially focused on developing an understanding of how to translate CWL to Nextflow.\nA number of CWL specific features such as scatter, secondary files and simple JavaScript expressions\nwere analyzed and implemented.\n\nThe GitHub repository includes instructions on how to build cwl2nxf and an example usage.\nThe tool can be executed as either just a parser printing the converted CWL to stdout,\nor by specifying an output file which will generate the Nextflow script file and if necessary\na config file.\n\nThe tool takes in a CWL workflow file and the YAML inputs file. It does not currently work\nwith a standalone `CommandLineTool`. The following example show how to run it:\n\n```\njava -jar build/libs/cwl2nxf-*.jar rnatoy.cwl samp.yaml\n```\n\n
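The second argument is a regular CWL job file, i.e. a YAML document binding values to the workflow inputs. An illustrative example (not the actual `samp.yaml`) might look like:\n\n```\nreads1:\n  class: File\n  path: reads_1.fq\nreads2:\n  class: File\n  path: reads_2.fq\n```\n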
\nSee the GitHub [repository](https://github.com/nextflow-io/cwl2nxf) for further details.\n\n### Conclusion\n\nWe are continuing to investigate ways to improve the interoperability of Nextflow with CWL.\nAlthough still an early prototype, the cwl2nxf tool provides some level of conversion of CWL to Nextflow.\n\nWe are also planning to explore [CWL Avro](https://github.com/common-workflow-language/cwlavro),\nwhich may provide a more efficient way to parse and handle CWL objects for conversion to Nextflow.\n\nAdditionally, a number of workflows in the GitHub repository have been implemented in both\nCWL and Nextflow which can be used as a comparison of the two languages.\n\nThe Nextflow team will be presenting a short talk and participating in the Codefest at [BOSC 2017](https://www.open-bio.org/wiki/BOSC_2017).\nWe are interested in hearing from the community regarding CWL to Nextflow conversion, and would like\nto encourage anyone interested to contribute to the cwl2nxf project.", - "images": [], + "images": [ + "/img/cwl2nxf-min.png" + ], "author": "Kevin Sayers", "tags": "nextflow,workflow,reproducibility,cwl" }, @@ -152,7 +162,9 @@ "title": "Nexflow Hackathon 2017", "date": "2017-09-30T00:00:00.000Z", "content": "Last week saw the inaugural Nextflow meeting organised at the Centre for Genomic Regulation\n(CRG) in Barcelona. The event combined talks, demos, a tutorial/workshop for beginners as\nwell as two hackathon sessions for more advanced users.\n\nNearly 50 participants attended over the two days which included an entertaining tapas course\nduring the first evening!\n\nOne of the main objectives of the event was to bring together Nextflow users to work\ntogether on common interest projects. There were several proposals for the hackathon\nsessions and in the end five diverse ideas were chosen for communal development ranging from\nnew pipelines through to the addition of new features in Nextflow.\n\nThe proposals and outcomes of each the projects, which can be found in the issues section\nof [this GitHub repository](https://github.com/nextflow-io/hack17), have been summarised below.\n\n### Nextflow HTML tracing reports\n\nThe HTML tracing project aims to generate a rendered version of the Nextflow trace file to\nenable fast sorting and visualisation of task/process execution statistics.\n\nCurrently the data in the trace includes information such as CPU duration, memory usage and\ncompletion status of each task, however wading through the file is often not convenient\nwhen a large number of tasks have been executed.\n\n[Phil Ewels](https://github.com/ewels) proposed the idea and led the coordination effort\nwith the outcome being a very impressive working prototype which can be found in the Nextflow\nbranch `html-trace`.\n\nAn image of the example report is shown below with the interactive HTML available\n[here](/misc/nf-trace-report.html). It is expected to be merged into the main branch of Nextflow\nwith documentation in a near-future release.\n\n![Nextflow HTML execution report](/img/nf-trace-report-min.png)\n\n### Nextflow pipeline for 16S microbial data\n\nThe H3Africa Bioinformatics Network have been developing several pipelines which are used\nacross the participating centers. 
The diverse computing resources available across the nodes has led to\nmembers wanting workflow solutions with a particular focus on portability.\n\nWith this is mind, Scott Hazelhurst proposed a project for a 16S Microbial data analysis\npipeline which had [previously been developed using CWL](https://github.com/h3abionet/h3abionet16S/tree/master).\n\nThe participants made a new [branch](https://github.com/h3abionet/h3abionet16S/tree/nextflow)\nof the original pipeline and ported it into Nextflow.\n\nThe pipeline will continue to be developed with the goal of acting as a comparison between\nCWL and Nextflow. It is thought this can then be extended to other pipelines by both those\nwho are already familiar with Nextflow as well as used as a tool for training newer users.\n\n### Nextflow modules prototyping\n\n_Toolboxing_ allows users to incorporate software into their pipelines in an efficient and\nreproducible manner. Various software repositories are becoming increasing popular,\nhighlighted by the over 5,000 tools available in the [Galaxy Toolshed](https://toolshed.g2.bx.psu.edu/).\n\nProjects such as [Biocontainers](http://biocontainers.pro/) aim to wrap up the execution\nenvironment using containers. [Myself](https://github.com/skptic) and [Johan Viklund](https://github.com/viklund)\nwished to piggyback off existing repositories and settled on [Dockstore](https://dockstore.org)\nwhich is an open platform compliant with the [GA4GH](http://genomicsandhealth.org) initiative.\n\nThe majority of tools in Dockstore are written in the CWL and therefore we required a parser\nbetween the CWL CommandLineTool class and Nextflow processes. Johan was able to develop\na parser which generates Nextflow processes for several Dockstore tools.\n\nAs these resources such as Dockstore become mature and standardised, it will be\npossible to automatically generate a _Nextflow Store_ and enable efficient incorporation\nof tools into workflows.\n\n\n\n_Example showing a Nextflow process generated from the Dockstore CWL repository for the tool BAMStats._\n\n### Nextflow pipeline for de novo assembly of nanopore reads\n\n[Nanopore sequencing](https://en.wikipedia.org/wiki/Nanopore_sequencing) is an exciting\nand emerging technology which promises to change the landscape of nucleotide sequencing.\n\nWith keen interest in Nanopore specific pipelines, [Hadrien Gourlé](https://github.com/HadrienG)\nlead the hackathon project for _Nanoflow_.\n\n[Nanoflow](https://github.com/HadrienG/nanoflow) is a de novo assembler of bacterials genomes\nfrom nanopore reads using Nextflow.\n\nDuring the two days the participants developed the pipeline for adapter trimming as well\nas assembly and consensus sequence generation using either\n[Canu](https://github.com/marbl/canu) and [Miniasm](https://github.com/lh3/miniasm).\n\nThe future plans are to finalise the pipeline to include a polishing step and a genome\nannotation step.\n\n### Nextflow AWS Batch integration\n\nNextflow already has experimental support for [AWS Batch](https://aws.amazon.com/batch/)\nand the goal of this project proposed by [Francesco Strozzi](https://github.com/fstrozzi)\nwas to improve this support, add features and test the implementation on real world pipelines.\n\nEarlier work from [Paolo Di Tommaso](https://github.com/pditommaso) in the Nextflow\nrepository, highlighted several challenges to using AWS Batch with Nextflow.\n\nThe major obstacle described by [Tim Dudgeon](https://github.com/tdudgeon) was the requirement\nfor each Docker container to 
have a version of the Amazon Web Services Command Line tools\n(aws-cli) installed.\n\nA solution was to install the AWS CLI tools on a custom AWS image that is used by the\nDocker host machine, and then mount the directory that contains the necessary items into\neach of the Docker containers as a volume. Early testing suggests this approach works\nwith the hope of providing a more elegant solution in future iterations.\n\nThe code and documentation for AWS Batch has been prepared and will be tested further\nbefore being rolled into an official Nextflow release in the near future.\n\n### Conclusion\n\nThe event was seen as an overwhelming success and special thanks must be made to all the\nparticipants. As the Nextflow community continues to grow, it would be fantastic to make these types\nmeetings more regular occasions.\n\nIn the meantime we have put together a short video containing some of the highlights\nof the two days.\n\nWe hope to see you all again in Barcelona soon or at new events around the world!\n\n", - "images": [], + "images": [ + "/img/nf-trace-report-min.png" + ], "author": "Evan Floden", "tags": "nextflow,docker,hackathon" }, @@ -169,8 +181,10 @@ "slug": "2017/nextflow-workshop", "title": "Nextflow workshop is coming!", "date": "2017-04-26T00:00:00.000Z", - "content": "We are excited to announce the first Nextflow workshop that will take place at the\nBarcelona Biomedical Research Park building ([PRBB](https://www.prbb.org/)) on 14-15th September 2017.\n\nThis event is open to everybody who is interested in the problem of computational workflow\nreproducibility. Leading experts and users will discuss the current state of the Nextflow\ntechnology and how it can be applied to manage -omics analyses in a reproducible manner.\nBest practices will be introduced on how to deploy real-world large-scale genomic\napplications for precision medicine.\n\nDuring the hackathon, organized for the second day, participants will have the\nopportunity to learn how to write self-contained, replicable data analysis\npipelines along with Nextflow expert developers.\n\nMore details at [this link](http://www.crg.eu/en/event/coursescrg-nextflow-reproducible-silico-genomics).\nThe registration form is [available here](http://apps.crg.es/content/internet/events/webforms/17502) (deadline 15th Jun).\n\n### Schedule (draft)\n\n#### Thursday, 14 September\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
10.00Welcome & introduction
\n *Cedric Notredame
\n Comparative Bioinformatics, CRG, Spain*
10.15Nextflow: a quick review
\n *Paolo Di Tommaso
\n Comparative Bioinformatics, CRG, Spain*
10.30Standardising Swedish genomics analyses using Nextflow
\n *Phil Ewels
\n National Genomics Infrastructure, SciLifeLab, Sweden*\n
11.00Building Pipelines to Support African Bioinformatics: the H3ABioNet Pipelines Project
\n *Scott Hazelhurst
\n University of the Witwatersrand, Johannesburg, South Africa*\n
11.30coffee break\n
12.00Using Nextflow for Large Scale Benchmarking of Phylogenetic methods and tools
\n *Frédéric Lemoine
\n Evolutionary Bioinformatics, Institut Pasteur, France*\n
12.30Nextflow for chemistry - crossing the divide
\n *Tim Dudgeon
\n Informatics Matters Ltd, UK*\n
12.50From zero to Nextflow @ CRG's Biocore
\n *Luca Cozzuto
\n Bioinformatics Core Facility, CRG, Spain*\n
13.10(to be determined)
13.30Lunch
14.30
18.30
Hackathon & course
\n\n#### Friday, 15 September\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
9.30Computational workflows for omics analyses at the IARC
\n *Matthieu Foll
\n International Agency for Research on Cancer (IARC), France*
10.00Medical Genetics at Oslo University Hospital
\n *Hugues Fontanelle
\n Oslo University Hospital, Norway*
10.30Inside-Out: reproducible analysis of external data, inside containers with Nextflow
\n *Evan Floden
\n Comparative Bioinformatics, CRG, Spain*
11.00coffee break
11.30(title to be defined)
\n *Johnny Wu
\n Roche Sequencing, Pleasanton, USA*
12.00Standardizing life sciences datasets to improve studies reproducibility in the EOSC
\n *Jordi Rambla
\n European Genome-Phenome Archive, CRG*
12.20Unbounded by Economics
\n *Brendan Bouffler
\n AWS Research Cloud Program, UK*
12.40Challenges with large-scale portable computational workflows
\n *Paolo Di Tommaso
\n Comparative Bioinformatics, CRG, Spain*
13.00Lunch
14.00
18.00
Hackathon
\n\n
\nSee you in Barcelona!\n\n![Nextflow workshop](/img/nf-workshop.png)", - "images": [], + "content": "We are excited to announce the first Nextflow workshop that will take place at the\nBarcelona Biomedical Research Park building ([PRBB](https://www.prbb.org/)) on 14-15th September 2017.\n\nThis event is open to everybody who is interested in the problem of computational workflow\nreproducibility. Leading experts and users will discuss the current state of the Nextflow\ntechnology and how it can be applied to manage -omics analyses in a reproducible manner.\nBest practices will be introduced on how to deploy real-world large-scale genomic\napplications for precision medicine.\n\nDuring the hackathon, organized for the second day, participants will have the\nopportunity to learn how to write self-contained, replicable data analysis\npipelines along with Nextflow expert developers.\n\nMore details at [this link](http://www.crg.eu/en/event/coursescrg-nextflow-reproducible-silico-genomics).\nThe registration form is [available here](http://apps.crg.es/content/internet/events/webforms/17502) (deadline 15th Jun).\n\n### Schedule (draft)\n\n#### Thursday, 14 September\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
10.00Welcome & introduction
\n *Cedric Notredame
Comparative Bioinformatics, CRG, Spain*
10.15Nextflow: a quick review
\n *Paolo Di Tommaso
Comparative Bioinformatics, CRG, Spain*
10.30Standardising Swedish genomics analyses using Nextflow
\n *Phil Ewels
National Genomics Infrastructure, SciLifeLab, Sweden*\n
11.00Building Pipelines to Support African Bioinformatics: the H3ABioNet Pipelines Project
\n *Scott Hazelhurst
University of the Witwatersrand, Johannesburg, South Africa*\n
11.30coffee break\n
12.00Using Nextflow for Large Scale Benchmarking of Phylogenetic methods and tools
\n *Frédéric Lemoine
Evolutionary Bioinformatics, Institut Pasteur, France*\n
12.30Nextflow for chemistry - crossing the divide
\n *Tim Dudgeon
Informatics Matters Ltd, UK*\n
12.50From zero to Nextflow @ CRG's Biocore
\n *Luca Cozzuto
Bioinformatics Core Facility, CRG, Spain*\n
13.10(to be determined)
13.30Lunch
14.30
18.30
Hackathon & course
\n\n#### Friday, 15 September\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
9.30Computational workflows for omics analyses at the IARC
\n *Matthieu Foll
International Agency for Research on Cancer (IARC), France*
10.00Medical Genetics at Oslo University Hospital
\n *Hugues Fontanelle
Oslo University Hospital, Norway*
10.30Inside-Out: reproducible analysis of external data, inside containers with Nextflow
\n *Evan Floden
Comparative Bioinformatics, CRG, Spain*
11.00coffee break
11.30(title to be defined)
\n *Johnny Wu
Roche Sequencing, Pleasanton, USA*
12.00Standardizing life sciences datasets to improve studies reproducibility in the EOSC
\n *Jordi Rambla
European Genome-Phenome Archive, CRG*
12.20Unbounded by Economics
\n *Brendan Bouffler
AWS Research Cloud Program, UK*
12.40Challenges with large-scale portable computational workflows
\n *Paolo Di Tommaso
Comparative Bioinformatics, CRG, Spain*
13.00Lunch
14.00
18.00
Hackathon
\n\n
\nSee you in Barcelona!\n\n![Nextflow workshop](/img/nf-workshop.png)", + "images": [ + "/img/nf-workshop.png" + ], "author": "Paolo Di Tommaso", "tags": "nextflow,genomic,workflow,reproducibility,workshop," }, @@ -187,7 +201,7 @@ "slug": "2018/bringing-nextflow-to-google-cloud-wuxinextcode", "title": "Bringing Nextflow to Google Cloud Platform with WuXi NextCODE", "date": "2018-12-18T00:00:00.000Z", - "content": "
\n*This is a guest post authored by Halli Bjornsson, Head of Product Development Operations at WuXi NextCODE and Jonathan Sheffi, Product Manager, Biomedical Data at Google Cloud.\n*\n
\n\nGoogle Cloud and WuXi NextCODE are dedicated to advancing the state of the art in biomedical informatics, especially through open source, which allows developers to collaborate broadly and deeply.\n\nWuXi NextCODE is itself a user of Nextflow, and Google Cloud has many customers that use Nextflow. Together, we’ve collaborated to deliver Google Cloud Platform (GCP) support for Nextflow using the [Google Pipelines API](https://cloud.google.com/genomics/pipelines). Pipelines API is a managed computing service that allows the execution of containerized workloads on GCP.\n\n
\n
\n \n
\n
\n \n
\n
\n\n\nNextflow now provides built-in support for Google Pipelines API which allows the seamless deployment of a Nextflow pipeline in the cloud, offloading the process executions as pipelines running on Google's scalable infrastructure with a few commands. This makes it even easier for customers and partners like WuXi NextCODE to process biomedical data using Google Cloud.\n\n### Get started!\n\nThis feature is currently available in the Nextflow edge channel. Follow these steps to get started:\n\n- Install Nextflow from the edge channel exporting the variables shown below and then running the usual Nextflow installer Bash snippet:\n\n ```\n export NXF_VER=18.12.0-edge\n export NXF_MODE=google\n curl https://get.nextflow.io | bash\n ```\n\n- [Enable the Google Genomics API for your GCP projects](https://console.cloud.google.com/flows/enableapi?apiid=genomics.googleapis.com,compute.googleapis.com,storage-api.googleapis.com).\n\n- [Download and set credentials for your Genomics API-enabled project](https://cloud.google.com/docs/authentication/production#obtaining_and_providing_service_account_credentials_manually).\n\n- Change your `nextflow.config` file to use the Google Pipelines executor and specify the required config values for it as [described in the documentation](/docs/edge/google.html#google-pipelines).\n\n- Finally, run your script with Nextflow like usual, specifying a Google Storage bucket as the pipeline work directory with the `-work-dir` option. For example:\n\n ```\n nextflow run rnaseq-nf -work-dir gs://your-bucket/scratch\n ```\n\n
\nYou can find more detailed info about available configuration settings and deployment options at [this link](/docs/edge/google.html).\n\nWe’re thrilled to make this contribution available to the Nextflow community!", + "content": "
\n*This is a guest post authored by Halli Bjornsson, Head of Product Development Operations at WuXi NextCODE and Jonathan Sheffi, Product Manager, Biomedical Data at Google Cloud.*\n
\n\nGoogle Cloud and WuXi NextCODE are dedicated to advancing the state of the art in biomedical informatics, especially through open source, which allows developers to collaborate broadly and deeply.\n\nWuXi NextCODE is itself a user of Nextflow, and Google Cloud has many customers that use Nextflow. Together, we’ve collaborated to deliver Google Cloud Platform (GCP) support for Nextflow using the [Google Pipelines API](https://cloud.google.com/genomics/pipelines). Pipelines API is a managed computing service that allows the execution of containerized workloads on GCP.\n\n
\n
\n \n
\n
\n \n
\n
\n\n\nNextflow now provides built-in support for Google Pipelines API which allows the seamless deployment of a Nextflow pipeline in the cloud, offloading the process executions as pipelines running on Google's scalable infrastructure with a few commands. This makes it even easier for customers and partners like WuXi NextCODE to process biomedical data using Google Cloud.\n\n### Get started!\n\nThis feature is currently available in the Nextflow edge channel. Follow these steps to get started:\n\n- Install Nextflow from the edge channel exporting the variables shown below and then running the usual Nextflow installer Bash snippet:\n\n ```\n export NXF_VER=18.12.0-edge\n export NXF_MODE=google\n curl https://get.nextflow.io | bash\n ```\n\n- [Enable the Google Genomics API for your GCP projects](https://console.cloud.google.com/flows/enableapi?apiid=genomics.googleapis.com,compute.googleapis.com,storage-api.googleapis.com).\n\n- [Download and set credentials for your Genomics API-enabled project](https://cloud.google.com/docs/authentication/production#obtaining_and_providing_service_account_credentials_manually).\n\n- Change your `nextflow.config` file to use the Google Pipelines executor and specify the required config values for it as [described in the documentation](/docs/edge/google.html#google-pipelines).\n\n- Finally, run your script with Nextflow like usual, specifying a Google Storage bucket as the pipeline work directory with the `-work-dir` option. For example:\n\n ```\n nextflow run rnaseq-nf -work-dir gs://your-bucket/scratch\n ```\n\n
\nYou can find more detailed info about available configuration settings and deployment options at [this link](/docs/edge/google.html).\n\nWe’re thrilled to make this contribution available to the Nextflow community!", "images": [ "/img/google-cloud.svg", "/img/wuxi-nextcode.jpeg" @@ -199,7 +213,7 @@ "slug": "2018/clarification-about-nextflow-license", "title": "Clarification about the Nextflow license", "date": "2018-07-20T00:00:00.000Z", - "content": "Over past week there was some discussion on social media regarding the Nextflow license\nand its impact on users' workflow applications.\n\n> … don’t use Nextflow, yo. [https://t.co/Paip5W1wgG](https://t.co/Paip5W1wgG)\n> \n> — Konrad Rudolph 👨‍🔬💻 (@klmr) [July 10, 2018](https://twitter.com/klmr/status/1016606226103357440?ref_src=twsrc%5Etfw)\n\n\n\n> This is certainly disappointing. An argument in favor of writing workflows in [@commonwl](https://twitter.com/commonwl?ref_src=twsrc%5Etfw), which is independent of the execution engine. [https://t.co/mIbdLQQxmf](https://t.co/mIbdLQQxmf)\n> \n> — John Didion (@jdidion) [July 10, 2018](https://twitter.com/jdidion/status/1016612435938160640?ref_src=twsrc%5Etfw)\n\n\n\n> GPL is generally considered toxic to companies due to fear of the viral nature of the license.\n> \n> — Jeff Gentry (@geoffjentry) [July 10, 2018](https://twitter.com/geoffjentry/status/1016656901139025921?ref_src=twsrc%5Etfw)\n\n\n\n### What's the problem with GPL?\n\nNextflow has been released under the GPLv3 license since its early days [over 5 years ago](https://github.com/nextflow-io/nextflow/blob/c080150321e5000a2c891e477bb582df07b7f75f/src/main/groovy/nextflow/Nextflow.groovy).\nGPL is a very popular open source licence used by many projects\n(like, for example, [Linux](https://www.kernel.org/doc/html/v4.17/process/license-rules.html) and [Git](https://git-scm.com/about/free-and-open-source))\nand it has been designed to promote the adoption and spread of open source software and culture.\n\nWith this idea in mind, GPL requires the author of a piece of software, _derived_ from a GPL licensed application or library, to distribute it using the same license i.e. GPL itself.\n\nThis is generally good, because this requirement incentives the growth of the open source ecosystem and the adoption of open source software more widely.\n\nHowever, this is also a reason for concern by some users and organizations because it's perceived as too strong requirement by copyright holders (who may not want to disclose their code) and because it can be difficult to interpret what a \\*derived\\* application is. See for example\n[this post by Titus Brown](http://ivory.idyll.org/blog/2015-on-licensing-in-bioinformatics.html) at this regard.\n\n#### What's the impact of the Nextflow license on my application?\n\nIf you are not distributing your application, based on Nextflow, it doesn't affect you in any way.\nIf you are distributing an application that requires Nextflow to be executed, technically speaking your application is dynamically linking to the Nextflow runtime and it uses routines provided by it. For this reason your application should be released as GPLv3. See [here](https://www.gnu.org/licenses/gpl-faq.en.html#GPLStaticVsDynamic) and [here](https://www.gnu.org/licenses/gpl-faq.en.html#IfInterpreterIsGPL).\n\n**However, this was not our original intention. 
We don’t consider workflow applications to be subject to the GPL copyleft obligations of the GPL even though they may link dynamically to Nextflow functionality through normal calls and we are not interested to enforce the license requirement to third party workflow developers and organizations. Therefore you can distribute your workflow application using the license of your choice. For other kind of derived applications the GPL license should be used, though.\n**\n\n### That's all?\n\nNo. We are aware that this is not enough and the GPL licence can impose some limitation in the usage of Nextflow to some users and organizations. For this reason we are working with the CRG legal department to move Nextflow to a more permissive open source license. This is primarily motivated by our wish to make it more adaptable and compatible with all the different open source ecosystems, but also to remove any remaining legal uncertainty that using Nextflow through linking with its functionality may cause.\n\nWe are expecting that this decision will be made over the summer so stay tuned and continue to enjoy Nextflow.", + "content": "Over past week there was some discussion on social media regarding the Nextflow license\nand its impact on users' workflow applications.\n\n> … don’t use Nextflow, yo. [https://t.co/Paip5W1wgG](https://t.co/Paip5W1wgG)\n> \n> — Konrad Rudolph 👨‍🔬💻 (@klmr) [July 10, 2018](https://twitter.com/klmr/status/1016606226103357440?ref_src=twsrc%5Etfw)\n\n\n\n> This is certainly disappointing. An argument in favor of writing workflows in [@commonwl](https://twitter.com/commonwl?ref_src=twsrc%5Etfw), which is independent of the execution engine. [https://t.co/mIbdLQQxmf](https://t.co/mIbdLQQxmf)\n> \n> — John Didion (@jdidion) [July 10, 2018](https://twitter.com/jdidion/status/1016612435938160640?ref_src=twsrc%5Etfw)\n\n\n\n> GPL is generally considered toxic to companies due to fear of the viral nature of the license.\n> \n> — Jeff Gentry (@geoffjentry) [July 10, 2018](https://twitter.com/geoffjentry/status/1016656901139025921?ref_src=twsrc%5Etfw)\n\n\n\n### What's the problem with GPL?\n\nNextflow has been released under the GPLv3 license since its early days [over 5 years ago](https://github.com/nextflow-io/nextflow/blob/c080150321e5000a2c891e477bb582df07b7f75f/src/main/groovy/nextflow/Nextflow.groovy).\nGPL is a very popular open source licence used by many projects\n(like, for example, [Linux](https://www.kernel.org/doc/html/v4.17/process/license-rules.html) and [Git](https://git-scm.com/about/free-and-open-source))\nand it has been designed to promote the adoption and spread of open source software and culture.\n\nWith this idea in mind, GPL requires the author of a piece of software, _derived_ from a GPL licensed application or library, to distribute it using the same license i.e. GPL itself.\n\nThis is generally good, because this requirement incentives the growth of the open source ecosystem and the adoption of open source software more widely.\n\nHowever, this is also a reason for concern by some users and organizations because it's perceived as too strong requirement by copyright holders (who may not want to disclose their code) and because it can be difficult to interpret what a \\*derived\\* application is. 
See for example\n[this post by Titus Brown](http://ivory.idyll.org/blog/2015-on-licensing-in-bioinformatics.html) at this regard.\n\n#### What's the impact of the Nextflow license on my application?\n\nIf you are not distributing your application, based on Nextflow, it doesn't affect you in any way.\nIf you are distributing an application that requires Nextflow to be executed, technically speaking your application is dynamically linking to the Nextflow runtime and it uses routines provided by it. For this reason your application should be released as GPLv3. See [here](https://www.gnu.org/licenses/gpl-faq.en.html#GPLStaticVsDynamic) and [here](https://www.gnu.org/licenses/gpl-faq.en.html#IfInterpreterIsGPL).\n\n**However, this was not our original intention. We don’t consider workflow applications to be subject to the GPL copyleft obligations of the GPL even though they may link dynamically to Nextflow functionality through normal calls and we are not interested to enforce the license requirement to third party workflow developers and organizations. Therefore you can distribute your workflow application using the license of your choice. For other kind of derived applications the GPL license should be used, though. **\n\n### That's all?\n\nNo. We are aware that this is not enough and the GPL licence can impose some limitation in the usage of Nextflow to some users and organizations. For this reason we are working with the CRG legal department to move Nextflow to a more permissive open source license. This is primarily motivated by our wish to make it more adaptable and compatible with all the different open source ecosystems, but also to remove any remaining legal uncertainty that using Nextflow through linking with its functionality may cause.\n\nWe are expecting that this decision will be made over the summer so stay tuned and continue to enjoy Nextflow.", "images": [], "author": "Paolo Di Tommaso", "tags": "nextflow,gpl,license" @@ -240,7 +254,13 @@ "title": "Nextflow turns five! Happy birthday!", "date": "2018-04-03T00:00:00.000Z", "content": "Nextflow is growing up. The past week marked five years since the [first commit](https://github.com/nextflow-io/nextflow/commit/c080150321e5000a2c891e477bb582df07b7f75f) of the project on GitHub. Like a parent reflecting on their child attending school for the first time, we know reaching this point hasn’t been an entirely solo journey, despite Paolo's best efforts!\n\nA lot has happened recently and we thought it was time to highlight some of the recent evolutions. We also take the opportunity to extend the warmest of thanks to all those who have contributed to the development of Nextflow as well as the fantastic community of users who consistently provide ideas, feedback and the occasional late night banter on the [Gitter channel](https://gitter.im/nextflow-io/nextflow).\n\nHere are a few neat developments churning out of the birthday cake mix.\n\n### nf-core\n\n[nf-core](https://nf-core.github.io/) is a community effort to provide a home for high quality, production-ready, curated analysis pipelines built using Nextflow. The project has been initiated and is being led by [Phil Ewels](https://github.com/ewels) of [MultiQC](http://multiqc.info/) fame. 
The principle is that _nf-core_ pipelines can be used out-of-the-box or as inspiration for something different.\n\nAs well as being a place for best-practise pipelines, other features of _nf-core_ include the [cookie cutter template tool](https://github.com/nf-core/cookiecutter) which provides a fast way to create a dependable workflow using many of Nextflow’s sweet capabilities such as:\n\n- _Outline:_ Skeleton pipeline script.\n- _Data:_ Reference Genome implementation (AWS iGenomes).\n- _Configuration:_ Robust configuration setup.\n- _Containers:_ Skeleton files for Docker image generation.\n- _Reporting:_ HTML email functionality and and HTML results output.\n- _Documentation:_ Installation, Usage, Output, Troubleshooting, etc.\n- _Continuous Integration:_ Skeleton files for automated testing using Travis CI.\n\nThere is also a Python package with helper tools for Nextflow.\n\nYou can find more information about the community via the project [website](https://nf-core.github.io), [GitHub repository](https://github.com/nf-core), [Twitter account](https://twitter.com/nf_core) or join the dedicated [Gitter](https://gitter.im/nf-core/Lobby) chat.\n\n
\n\n[![nf-core logo](/img/nf-core-logo-min.png)](https://nf-co.re)\n\n
\n\n### Kubernetes has landed\n\nAs of version 0.28.0 Nextflow now has support for Kubernetes. If you don’t know much about Kubernetes, at its heart it is an open-source platform for the management and deployment of containers at scale. Google led the initial design and it is now maintained by the Cloud Native Computing Foundation. I found the [The Illustrated Children's Guide to Kubernetes](https://www.youtube.com/watch?v=4ht22ReBjno) particularly useful in explaining the basic vocabulary and concepts.\n\nKubernetes looks be one of the key technologies for the application of containers in the cloud as well as for building Infrastructure as a Service (IaaS) and Platform and a Service (PaaS) applications. We have been approached by many users who wish to use Nextflow with Kubernetes to be able to deploy workflows across both academic and commercial settings. With enterprise versions of Kubernetes such as Red Hat's [OpenShift](https://www.openshift.com/), it was becoming apparent there was a need for native execution with Nextflow.\n\nThe new command `nextflow kuberun` launches the Nextflow driver as a _pod_ which is then able to run workflow tasks as other pods within a Kubernetes cluster. You can read more in the documentation on Kubernetes support for Nextflow [here](https://www.nextflow.io/docs/latest/kubernetes.html).\n\n![Nextflow and Kubernetes](/img/nextflow-kubernetes-min.png)\n\n### Improved reporting and notifications\n\nFollowing the hackathon in September we wrote about the addition of HTML trace reports that allow for the generation HTML detailing resource usage (CPU time, memory, disk i/o etc).\n\nThanks to valuable feedback there has continued to be many improvements to the reports as tracked through the Nextflow GitHub issues page. Reports are now able to display [thousands of tasks](https://github.com/nextflow-io/nextflow/issues/547) and include extra information such as the [container engine used](https://github.com/nextflow-io/nextflow/issues/521). Tasks can be filtered and an [overall progress bar](https://github.com/nextflow-io/nextflow/issues/534) has been added.\n\nYou can explore a [real-world HTML report](/misc/nf-trace-report2.html) and more information on HTML reports can be found in the [documentation](https://www.nextflow.io/docs/latest/tracing.html).\n\nThere has also been additions to workflow notifications. Currently these can be configured to automatically send a notification email when a workflow execution terminates. You can read more about how to setup notifications in the [documentation](https://www.nextflow.io/docs/latest/mail.html?highlight=notification#workflow-notification).\n\n### Syntax-tic!\n\nWriting workflows no longer has to be done in monochrome. There is now syntax highlighting for Nextflow in the popular [Atom editor](https://atom.io) as well as in [Visual Studio Code](https://code.visualstudio.com).\n\n
\n\n[![Nextflow syntax highlighting with Atom](/img/atom-min.png)](/img/atom-min.png)\n\n
\n\n[![Nextflow syntax highlighting with VSCode](/img/vscode-min.png)](/img/vscode-min.png)\n\n
\n\nYou can find the Atom plugin by searching for Nextflow in Atoms package installer or clicking [here](https://atom.io/packages/language-nextflow). The Visual Studio plugin can be downloaded [here](https://marketplace.visualstudio.com/items?itemName=nextflow.nextflow).\n\nOn a related note, Nextflow is now an official language on GitHub!\n\n![GitHub nextflow syntax](/img/github-nf-syntax-min.png)\n\n### Conclusion\n\nNextflow developments are progressing faster than ever and with the help of the community, there are a ton of great new features on the way. If you have any suggestions of your killer NF idea then please drop us a line, open an issue or even better, join in the fun.\n\nOver the coming months Nextflow will be reaching out with several training and presentation sessions across the US and Europe. We hope to see as many of you as possible on the road.", - "images": [], + "images": [ + "/img/nf-core-logo-min.png", + "/img/nextflow-kubernetes-min.png", + "/img/atom-min.png", + "/img/vscode-min.png", + "/img/github-nf-syntax-min.png" + ], "author": "Evan Floden", "tags": "nextflow,kubernetes,nf-core" }, @@ -375,7 +395,12 @@ "title": "6 Tips for Setting Up Your Nextflow Dev Environment", "date": "2021-03-04T00:00:00.000Z", "content": "_This blog follows up the Learning Nextflow in 2020 blog [post](https://www.nextflow.io/blog/2020/learning-nextflow-in-2020.html)._\n\nThis guide is designed to walk you through a basic development setup for writing Nextflow pipelines.\n\n### 1. Installation\n\nNextflow runs on any Linux compatible system and MacOS with Java installed. Windows users can rely on the [Windows Subsystem for Linux](https://docs.microsoft.com/en-us/windows/wsl/install-win10). Installing Nextflow is straightforward. You just need to download the `nextflow` executable. In your terminal type the following commands:\n\n```\n$ curl get.nextflow.io | bash\n$ sudo mv nextflow /usr/local/bin\n```\n\nThe first line uses the curl command to download the nextflow executable, and the second line moves the executable to your PATH. Note `/usr/local/bin` is the default for MacOS, you might want to choose `~/bin` or `/usr/bin` depending on your PATH definition and operating system.\n\n### 2. Text Editor or IDE?\n\nNextflow pipelines can be written in any plain text editor. I'm personally a bit of a Vim fan, however, the advent of the modern IDE provides a more immersive development experience.\n\nMy current choice is Visual Studio Code which provides a wealth of add-ons, the most obvious of these being syntax highlighting. With [VSCode installed](https://code.visualstudio.com/download), you can search for the Nextflow extension in the marketplace.\n\n![VSCode with Nextflow Syntax Highlighting](/img/vscode-nf-highlighting.png)\n\nOther syntax highlighting has been made available by the community including:\n\n- [Atom](https://atom.io/packages/language-nextflow)\n- [Vim](https://github.com/LukeGoodsell/nextflow-vim)\n- [Emacs](https://github.com/Emiller88/nextflow-mode)\n\n### 3. The Nextflow REPL console\n\nThe Nextflow console is a REPL (read-eval-print loop) environment that allows one to quickly test part of a script or segments of Nextflow code in an interactive manner. 
This can be particularly useful to quickly evaluate channels and operators behaviour and prototype small snippets that can be included in your pipeline scripts.\n\nStart the Nextflow console with the following command:\n\n```\n$ nextflow console\n```\n\n![Nextflow REPL console](/img/nf-repl-console.png)\n\nUse the `CTRL+R` keyboard shortcut to run (`⌘+R`on the Mac) and to evaluate your code. You can also evaluate by selecting code and use the **Run selection**.\n\n### 4. Containerize all the things\n\nContainers are a key component of developing scalable and reproducible pipelines. We can build Docker images that contain an OS, all libraries and the software we need for each process. Pipelines are typically developed using Docker containers and tooling as these can then be used on many different container engines such as Singularity and Podman.\n\nOnce you have [downloaded and installed Docker](https://docs.docker.com/engine/install/), try pull a public docker image:\n\n```\n$ docker pull quay.io/nextflow/rnaseq-nf\n```\n\nTo run a Nextflow pipeline using the latest tag of the image, we can use:\n\n```\nnextflow run nextflow-io/rnaseq-nf -with-docker quay.io/nextflow/rnaseq-nf:latest\n```\n\nTo learn more about building Docker containers, see the [Seqera Labs tutorial](https://seqera.io/training/#_manage_dependencies_containers) on managing dependencies with containers.\n\nAdditionally, you can install the VSCode marketplace addon for Docker to manage and interactively run and test the containers and images on your machine. You can even connect to remote registries such as Dockerhub, Quay.io, AWS ECR, Google Cloud and Azure Container registries.\n\n![VSCode with Docker Extension](/img/vs-code-with-docker-extension.png)\n\n### 5. Use Tower to monitor your pipelines\n\nWhen developing real-world pipelines, it can become inevitable that pipelines will require significant resources. For long-running workflows, monitoring becomes all the more crucial. With [Nextflow Tower](https://tower.nf), we can invoke any Nextflow pipeline execution from the CLI and use the integrated dashboard to follow the workflow run.\n\nSign-in to Tower using your GitHub credentials, obtain your token from the Getting Started page and export them into your terminal, `~/.bashrc`, or include them in your nextflow.config.\n\n```\n$ export TOWER_ACCESS_TOKEN=my-secret-tower-key\n```\n\nWe can then add the `-with-tower` child-option to any Nextflow run command. A URL with the monitoring dashboard will appear.\n\n```\n$ nextflow run nextflow-io/rnaseq-nf -with-tower\n```\n\n### 6. nf-core tools\n\n[nf-core](https://nf-co.re/) is a community effort to collect a curated set of analysis pipelines built using Nextflow. The pipelines continue to come on in leaps and bounds and nf-core tools is a python package for helping with developing nf-core pipelines. It includes options for listing, creating, and even downloading pipelines for offline usage.\n\nThese tools are particularly useful for developers contributing to the community pipelines on [GitHub](https://github.com/nf-core/) with linting and syncing options that keep pipelines up-to-date against nf-core guidelines.\n\n`nf-core tools` is a python package that can be installed in your development environment from Bioconda or PyPi.\n\n```\n$ conda install nf-core\n```\n\nor\n\n```\n$ pip install nf-core\n```\n\n![nf-core tools](/img/nf-core-tools.png)\n\n### Conclusion\n\nDeveloper workspaces are evolving rapidly. 
While your own development environment may be highly dependent on personal preferences, community contributions are keeping Nextflow users at the forefront of the modern developer experience.\n\nSolutions such as [GitHub Codespaces](https://github.com/features/codespaces) and [Gitpod](https://www.gitpod.io/) are now offering extendible, cloud-based options that may well be the future. I’m sure we can all look forward to a one-click, pre-configured, cloud-based, Nextflow developer environment sometime soon!", - "images": [], + "images": [ + "/img/vscode-nf-highlighting.png", + "/img/nf-repl-console.png", + "/img/vs-code-with-docker-extension.png", + "/img/nf-core-tools.png" + ], "author": "Evan Floden", "tags": "nextflow,development,learning" }, @@ -393,7 +418,27 @@ "title": "Setting up a Nextflow environment on Windows 10", "date": "2021-10-13T00:00:00.000Z", "content": "For Windows users, getting access to a Linux-based Nextflow development and runtime environment used to be hard. Users would need to run virtual machines, access separate physical servers or cloud instances, or install packages such as [Cygwin](http://www.cygwin.com/) or [Wubi](https://wiki.ubuntu.com/WubiGuide). Fortunately, there is now an easier way to deploy a complete Nextflow development environment on Windows.\n\nThe Windows Subsystem for Linux (WSL) allows users to build, manage and execute Nextflow pipelines on a Windows 10 laptop or desktop without needing a separate Linux machine or cloud VM. Users can build and test Nextflow pipelines and containerized workflows locally, on an HPC cluster, or their preferred cloud service, including AWS Batch and Azure Batch.\n\nThis document provides a step-by-step guide to setting up a Nextflow development environment on Windows 10.\n\n## High-level Steps\n\nThe steps described in this guide are as follows:\n\n- Install Windows PowerShell\n- Configure the Windows Subsystem for Linux (WSL2)\n- Obtain and Install a Linux distribution (on WSL2)\n- Install Windows Terminal\n- Install and configure Docker\n- Download and install an IDE (VS Code)\n- Install and test Nextflow\n- Configure X-Windows for use with the Nextflow Console\n- Install and Configure GIT\n\n## Install Windows PowerShell\n\nPowerShell is a cross-platform command-line shell and scripting language available for Windows, Linux, and macOS. If you are an experienced Windows user, you are probably already familiar with PowerShell. PowerShell is worth taking a few minutes to download and install.\n\nPowerShell is a big improvement over the Command Prompt in Windows 10. It brings features to Windows that Linux/UNIX users have come to expect, such as command-line history, tab completion, and pipeline functionality.\n\n- You can obtain PowerShell for Windows from GitHub at the URL https://github.com/PowerShell/PowerShell.\n- Download and install the latest stable version of PowerShell for Windows x64 - e.g., [powershell-7.1.3-win-x64.msi](https://github.com/PowerShell/PowerShell/releases/download/v7.1.3/PowerShell-7.1.3-win-x64.msi).\n- If you run into difficulties, Microsoft provides detailed instructions [here](https://docs.microsoft.com/en-us/powershell/scripting/install/installing-powershell-core-on-windows?view=powershell-7.1).\n\n## Configure the Windows Subsystem for Linux (WSL)\n\n### Enable the Windows Subsystem for Linux\n\nMake sure you are running Windows 10 Version 1903 with Build 18362 or higher. 
You can check your Windows version by select WIN-R (using the Windows key to run a command) and running the utility `winver`.\n\nFrom within PowerShell, run the Windows Deployment Image and Service Manager (DISM) tool as an administrator to enable the Windows Subsystem for Linux. To run PowerShell with administrator privileges, right-click on the PowerShell icon from the Start menu or desktop and select \"_Run as administrator_\".\n\n```powershell\nPS C:\\WINDOWS\\System32> dism.exe /online /enable-feature /featurename:Microsoft-Windows-Subsystem-Linux /all /norestart\nDeployment Image Servicing and Management tool\nVersion: 10.0.19041.844\nImage Version: 10.0.19041.1083\n\nEnabling feature(s)\n[==========================100.0%==========================]\nThe operation completed successfully.\n```\n\nYou can learn more about DISM [here](https://docs.microsoft.com/en-us/windows-hardware/manufacture/desktop/what-is-dism).\n\n### Step 2: Enable the Virtual Machine Feature\n\nWithin PowerShell, enable Virtual Machine Platform support using DISM. If you have trouble enabling this feature, make sure that virtual machine support is enabled in your machine's BIOS.\n\n```powershell\nPS C:\\WINDOWS\\System32> dism.exe /online /enable-feature /featurename:VirtualMachinePlatform /all /norestart\nDeployment Image Servicing and Management tool\nVersion: 10.0.19041.844\nImage Version: 10.0.19041.1083\nEnabling feature(s)\n[==========================100.0%==========================]\nThe operation completed successfully.\n```\n\nAfter enabling the Virtual Machine Platform support, **restart your machine**.\n\n### Step 3: Download the Linux Kernel Update Package\n\nNextflow users will want to take advantage of the latest features in WSL 2. You can learn about differences between WSL 1 and WSL 2 [here](https://docs.microsoft.com/en-us/windows/wsl/compare-versions). Before you can enable support for WSL 2, you'll need to download the kernel update package at the link below:\n\n[WSL2 Linux kernel update package for x64 machines](https://wslstorestorage.blob.core.windows.net/wslblob/wsl_update_x64.msi)\n\nOnce downloaded, double click on the kernel update package and select \"Yes\" to install it with elevated permissions.\n\n### STEP 4: Set WSL2 as your Default Version\n\nFrom within PowerShell:\n\n```powershell\nPS C:\\WINDOWS\\System32> wsl --set-default-version 2\nFor information on key differences with WSL 2 please visit https://aka.ms/wsl2\n```\n\nIf you run into difficulties with any of these steps, Microsoft provides detailed installation instructions [here](https://docs.microsoft.com/en-us/windows/wsl/install-win10#manual-installation-steps).\n\n## Obtain and Install a Linux Distribution on WSL\n\nIf you normally install Linux on VM environments such as VirtualBox or VMware, this probably sounds like a lot of work. Fortunately, Microsoft provides Linux OS distributions via the Microsoft Store that work with the Windows Subsystem for Linux.\n\n- Use this link to access and download a Linux Distribution for WSL through the Microsoft Store - https://aka.ms/wslstore.\n\n ![Linux Distributions at the Microsoft Store](/img/ms-store.png)\n\n- We selected the Ubuntu 20.04 LTS release. You can use a different distribution if you choose. Installation from the Microsoft Store is automated. 
Once the Linux distribution is installed, you can run a shell on Ubuntu (or your installed OS) from the Windows Start menu.\n- When you start Ubuntu Linux for the first time, you will be prompted to provide a UNIX username and password. The username that you select can be distinct from your Windows username. The UNIX user that you create will automatically have `sudo` privileges. Whenever a shell is started, it will default to this user.\n- After setting your username and password, update your packages on Ubuntu from the Linux shell using the following command:\n\n ```bash\n sudo apt update && sudo apt upgrade\n ```\n\n- This is also a good time to add any additional Linux packages that you will want to use.\n\n ```bash\n sudo apt install net-tools\n ```\n\n## Install Windows Terminal\n\nWhile not necessary, it is a good idea to install [Windows Terminal](https://github.com/microsoft/terminal) at this point. When working with Nextflow, it is handy to interact with multiple command lines at the same time. For example, users may want to execute flows, monitor logfiles, and run Docker commands in separate windows.\n\nWindows Terminal provides an X-Windows-like experience on Windows. It helps organize your various command-line environments - Linux shell, Windows Command Prompt, PowerShell, AWS or Azure CLIs.\n\n![Windows Terminal](/img/windows-terminal.png)\n\nInstructions for downloading and installing Windows Terminal are available at: https://docs.microsoft.com/en-us/windows/terminal/get-started.\n\nIt is worth spending a few minutes getting familiar with available commands and shortcuts in Windows Terminal. Documentation is available at https://docs.microsoft.com/en-us/windows/terminal/command-line-arguments.\n\nSome Windows Terminal commands you'll need right away are provided below:\n\n- Split the active window vertically: SHIFT ALT =\n- Split the active window horizontally: SHIFT ALT \n- Resize the active window: SHIFT ALT ``\n- Open a new window under the current tab: ALT v (_the new tab icon along the top of the Windows Terminal interface_)\n\n## Installing Docker on Windows\n\nThere are two ways to install Docker for use with the WSL on Windows. One method is to install Docker directly on a hosted WSL Linux instance (Ubuntu in our case) and have the docker daemon run on the Linux kernel as usual. An installation recipe for people that choose this \"native Linux\" approach is provided [here](https://dev.to/bowmanjd/install-docker-on-windows-wsl-without-docker-desktop-34m9).\n\nA second method is to run [Docker Desktop](https://www.docker.com/products/docker-desktop) on Windows. While Docker is more commonly used in Linux environments, it can be used with Windows also. The Docker Desktop supports containers running on Windows and Linux instances running under WSL. 
Docker Desktop provides some advantages for Windows users:\n\n- The installation process is automated\n- Docker Desktop provides a Windows GUI for managing Docker containers and images (including Linux containers running under WSL)\n- Microsoft provides Docker Desktop integration features from within Visual Studio Code via a VS Code extension\n- Docker Desktop provides support for auto-installing a single-node Kubernetes cluster\n- The Docker Desktop WSL 2 back-end provides an elegant Linux integration such that from a Linux user's perspective, Docker appears to be running natively on Linux.\n\nAn explanation of how the Docker Desktop WSL 2 Back-end works is provided [here](https://www.docker.com/blog/new-docker-desktop-wsl2-backend/).\n\n### Step 1: Install Docker Desktop on Windows\n\n- Download and install Docker Desktop for Windows from the following link: https://desktop.docker.com/win/stable/amd64/Docker%20Desktop%20Installer.exe\n- Follow the on-screen prompts provided by the Docker Desktop Installer. The installation process will install Docker on Windows and install the Docker back-end components so that Docker commands are accessible from within WSL.\n- After installation, Docker Desktop can be run from the Windows start menu. The Docker Desktop user interface is shown below. Note that Docker containers launched under WSL can be managed from the Windows Docker Desktop GUI or Linux command line.\n- The installation process is straightforward, but if you run into difficulties, detailed instructions are available [here](https://docs.docker.com/docker-for-windows/install/).\n\n ![Nextflow Visual Studio Code Extension](/img/docker-images.png)\n\n The Docker Engineering team provides an architecture diagram explaining how Docker on Windows interacts with WSL. Additional details are available [here](https://code.visualstudio.com/blogs/2020/03/02/docker-in-wsl2).\n\n ![Nextflow Visual Studio Code Extension](/img/docker-windows-arch.png)\n\n### Step 2: Verify the Docker installation\n\nNow that Docker is installed, run a Docker container to verify that Docker and the Docker Integration Package on WSL 2 are working properly.\n\n- Run a Docker command from the Linux shell as shown below below. This command downloads a **centos** image from Docker Hub and allows us to interact with the container via an assigned pseudo-tty. Your Docker container may exit with exit code 139 when you run this and other Docker containers. If so, don't worry – an easy fix to this issue is provided shortly.\n\n ```console\n $ docker run -ti centos:6\n [root@02ac0beb2d2c /]# hostname\n 02ac0beb2d2c\n ```\n\n- You can run Docker commands in other Linux shell windows via the Windows Terminal environment to monitor and manage Docker containers and images. For example, running `docker ps` in another window shows the running CentOS Docker container.\n\n ```console\n $ docker ps\n CONTAINER ID IMAGE COMMAND CREATED STATUS NAMES\n f5dad42617f1 centos:6 \"/bin/bash\" 2 minutes ago Up 2 minutes \thappy_hopper\n ```\n\n### Step 3: Dealing with exit code 139\n\nYou may encounter exit code `139` when running Docker containers. This is a known problem when running containers with specific base images within Docker Desktop. 
Good explanations of the problem and solution are provided [here](https://dev.to/damith/docker-desktop-container-crash-with-exit-code-139-on-windows-wsl-fix-438) and [here](https://unix.stackexchange.com/questions/478387/running-a-centos-docker-image-on-arch-linux-exits-with-code-139).\n\nThe solution is to add two lines to a `.wslconfig` file in your Windows home directory. The `.wslconfig` file specifies kernel options that apply to all Linux distributions running under WSL 2.\n\nSome of the Nextflow container images served from Docker Hub are affected by this bug since they have older base images, so it is a good idea to apply this fix.\n\n- Edit the `.wslconfig` file in your Windows home directory. You can do this using PowerShell as shown:\n\n ```powershell\n PS C:\\Users\\ notepad .wslconfig\n ```\n\n- Add these two lines to the `.wslconfig` file and save it:\n\n ```ini\n [wsl2]\n kernelCommandLine = vsyscall=emulate\n ```\n\n- After this, **restart your machine** to force a restart of the Docker and WSL 2 environment. After making this correction, you should be able to launch containers without seeing exit code `139`.\n\n## Install Visual Studio Code as your IDE (optional)\n\nDevelopers can choose from a variety of IDEs depending on their preferences. Some examples of IDEs and developer-friendly editors are below:\n\n- Visual Studio Code - https://code.visualstudio.com/Download (Nextflow VSCode Language plug-in [here](https://github.com/nextflow-io/vscode-language-nextflow/blob/master/vsc-extension-quickstart.md))\n- Eclipse - https://www.eclipse.org/\n- VIM - https://www.vim.org/ (VIM plug-in for Nextflow [here](https://github.com/LukeGoodsell/nextflow-vim))\n- Emacs - https://www.gnu.org/software/emacs/download.html (Nextflow syntax highlighter [here](https://github.com/Emiller88/nextflow-mode))\n- JetBrains PyCharm - https://www.jetbrains.com/pycharm/\n- IntelliJ IDEA - https://www.jetbrains.com/idea/\n- Atom – https://atom.io/ (Nextflow Atom support available [here](https://atom.io/packages/language-nextflow))\n- Notepad++ - https://notepad-plus-plus.org/\n\nWe decided to install Visual Studio Code because it has some nice features, including:\n\n- Support for source code control from within the IDE (Git)\n- Support for developing on Linux via its WSL 2 Video Studio Code Backend\n- A library of extensions including Docker and Kubernetes support and extensions for Nextflow, including Nextflow language support and an [extension pack for the nf-core community](https://github.com/nf-core/vscode-extensionpack).\n\nDownload Visual Studio Code from https://code.visualstudio.com/Download and follow the installation procedure. The installation process will detect that you are running WSL. You will be invited to download and install the Remote WSL extension.\n\n- Within VS Code and other Windows tools, you can access the Linux file system under WSL 2 by accessing the path `\\\\wsl$\\`. In our example, the path from Windows to access files from the root of our Ubuntu Linux instance is: [**\\\\wsl$\\Ubuntu-20.04**](file://wsl$/Ubuntu-20.04).\n\nNote that the reverse is possible also – from within Linux, `/mnt/c` maps to the Windows C: drive. You can inspect `/etc/mtab` to see the mounted file systems available under Linux.\n\n- It is a good idea to install Nextflow language support in VS Code. You can do this by selecting the Extensions icon from the left panel of the VS Code interface and searching the extensions library for Nextflow as shown. 
The Nextflow language support extension is on GitHub at https://github.com/nextflow-io/vscode-language-nextflow\n\n ![Nextflow Visual Studio Code Extension](/img/nf-vscode-ext.png)\n\n## Visual Studio Code Remote Development\n\nVisual Studio Code Remote Development supports development on remote environments such as containers or remote hosts. For Nextflow users, it is important to realize that VS Code sees the Ubuntu instance we installed on WSL as a remote environment. The Diagram below illustrates how remote development works. From a VS Code perspective, the Linux instance in WSL is considered a remote environment.\n\nWindows users work within VS Code in the Windows environment. However, source code, developer tools, and debuggers all run Linux on WSL, as illustrated below.\n\n![The Remote Development Environment in VS Code](/img/vscode-remote-dev.png)\n\nAn explanation of how VS Code Remote Development works is provided [here](https://code.visualstudio.com/docs/remote/remote-overview).\n\nVS Code users see the Windows filesystem, plug-ins specific to VS Code on Windows, and access Windows versions of tools such as Git. If you prefer to develop in Linux, you will want to select WSL as the remote environment.\n\nTo open a new VS Code Window running in the context of the WSL Ubuntu-20.04 environment, click the green icon at the lower left of the VS Code window and select _\"New WSL Window using Distro ..\"_ and select `Ubuntu 20.04`. You'll notice that the environment changes to show that you are working in the WSL: `Ubuntu-20.04` environment.\n\n![Selecting the Remote Dev Environment within VS Code](/img/remote-dev-side-by-side.png)\n\nSelecting the Extensions icon, you can see that different VS Code Marketplace extensions run in different contexts. The Nextflow Language extension installed in the previous step is globally available. It works when developing on Windows or developing on WSL: Ubuntu-20.04.\n\nThe Extensions tab in VS Code differentiates between locally installed plug-ins and those installed under WSL.\n\n![Local vs. Remote Extensions in VS Code](/img/vscode-extensions.png)\n\n## Installing Nextflow\n\nWith Linux, Docker, and an IDE installed, now we can install Nextflow in our WSL 2 hosted Linux environment. Detailed instructions for installing Nextflow are available at https://www.nextflow.io/docs/latest/getstarted.html#installation\n\n### Step 1: Make sure Java is installed (under WSL)\n\nJava is a prerequisite for running Nextflow. Instructions for installing Java on Ubuntu are available [here](https://linuxize.com/post/install-java-on-ubuntu-18-04/). To install the default OpenJDK, follow the instructions below in a Linux shell window:\n\n- Update the _apt_ package index:\n\n ```bash\n sudo apt update\n ```\n\n- Install the latest default OpenJDK package\n\n ```bash\n sudo apt install default-jdk\n ```\n\n- Verify the installation\n\n ```bash\n java -version\n ```\n\n### Step 2: Make sure curl is installed\n\n`curl` is a convenient way to obtain Nextflow. 
`curl` is included in the default Ubuntu repositories, so installation is straightforward.\n\n- From the shell:\n\n ```bash\n sudo apt update\n sudo apt install curl\n ```\n\n- Verify that `curl` works:\n\n ```console\n $ curl\n curl: try 'curl --help' or 'curl --manual' for more information\n ```\n\n### STEP 3: Download and install Nextflow\n\n- Use `curl` to retrieve Nextflow into a temporary directory and then install it in `/usr/bin` so that the Nextflow command is on your path:\n\n ```bash\n mkdir temp\n cd temp\n curl -s https://get.nextflow.io | bash\n sudo cp nextflow /usr/bin\n ```\n\n- Make sure that Nextflow is executable:\n\n ```bash\n sudo chmod 755 /usr/bin/nextflow\n ```\n\n or if you prefer:\n\n ```bash\n sudo chmod +x /usr/bin/nextflow\n ```\n\n### Step 4: Verify the Nextflow installation\n\n- Make sure Nextflow runs:\n\n ```console\n $ nextflow -version\n\n N E X T F L O W\n version 21.04.2 build 5558\n created 12-07-2021 07:54 UTC (03:54 EDT)\n cite doi:10.1038/nbt.3820\n http://nextflow.io\n ```\n\n- Run a simple Nextflow pipeline. The example below downloads and executes a sample hello world pipeline from GitHub - https://github.com/nextflow-io/hello.\n\n ```console\n $ nextflow run hello\n\n N E X T F L O W ~ version 21.04.2\n Launching `nextflow-io/hello` [distracted_pare] - revision: ec11eb0ec7 [master]\n executor > local (4)\n [06/c846d8] process > sayHello (3) [100%] 4 of 4 ✔\n Ciao world!\n\n Hola world!\n\n Bonjour world!\n\n Hello world!\n ```\n\n### Step 5: Run a Containerized Workflow\n\nTo validate that Nextflow works with containerized workflows, we can run a slightly more complicated example. A sample workflow involving NCBI Blast is available at https://github.com/nextflow-io/blast-example. Rather than installing Blast on our local Linux instance, it is much easier to pull a container preloaded with Blast and other software that the pipeline depends on.\n\nThe `nextflow.config` file for the Blast example (below) specifies that process logic is encapsulated in the container `nextflow/examples` available from Docker Hub (https://hub.docker.com/r/nextflow/examples).\n\n- On GitHub: [nextflow-io/blast-example/nextflow.config](https://github.com/nextflow-io/blast-example/blob/master/nextflow.config)\n\n ```groovy\n manifest {\n nextflowVersion = '>= 20.01.0'\n }\n\n process {\n container = 'nextflow/examples'\n }\n ```\n\n- Run the _blast-example_ pipeline that resides on GitHub directly from WSL and specify Docker as the container runtime using the command below:\n\n ```console\n $ nextflow run blast-example -with-docker\n N E X T F L O W ~ version 21.04.2\n Launching `nextflow-io/blast-example` [sharp_raman] - revision: 25922a0ae6 [master]\n executor > local (2)\n [aa/a9f056] process > blast (1) [100%] 1 of 1 ✔\n [b3/c41401] process > extract (1) [100%] 1 of 1 ✔\n matching sequences:\n >lcl|1ABO:B unnamed protein product\n MNDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNS\n >lcl|1ABO:A unnamed protein product\n MNDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNS\n >lcl|1YCS:B unnamed protein product\n PEITGQVSLPPGKRTNLRKTGSERIAHGMRVKFNPLPLALLLDSSLEGEFDLVQRIIYEVDDPSLPNDEGITALHNAVCA\n GHTEIVKFLVQFGVNVNAADSDGWTPLHCAASCNNVQVCKFLVESGAAVFAMTYSDMQTAADKCEEMEEGYTQCSQFLYG\n VQEKMGIMNKGVIYALWDYEPQNDDELPMKEGDCMTIIHREDEDEIEWWWARLNDKEGYVPRNLLGLYPRIKPRQRSLA\n >lcl|1IHD:C unnamed protein product\n LPNITILATGGTIAGGGDSATKSNYTVGKVGVENLVNAVPQLKDIANVKGEQVVNIGSQDMNDNVWLTLAKKINTDCDKT\n ```\n\n- Nextflow executes the pipeline directly from the GitHub 
repository and automatically pulls the nextflow/examples container from Docker Hub if the image is unavailable locally. The pipeline then executes the two containerized workflow steps (blast and extract), collects the sequences into a single file, and prints the result file content when execution completes.\n\n## Configuring an XServer for the Nextflow Console\n\nPipeline developers will probably want to use the Nextflow Console at some point. The Nextflow Console's REPL (read-eval-print loop) environment allows developers to quickly test parts of scripts or Nextflow code segments interactively.\n\nThe Nextflow Console is launched from the Linux command line. However, the Groovy-based interface requires an X-Windows environment to run. You can set up X-Windows with WSL using the procedure below. A good article on this same topic is provided [here](https://medium.com/javarevisited/using-wsl-2-with-x-server-linux-on-windows-a372263533c3).\n\n- Download an X-Windows server for Windows. In this example, we use the _VcXsrv Windows X Server_ available from SourceForge at https://sourceforge.net/projects/vcxsrv/.\n\n- Accept all the defaults when running the automated installer. The X-server will end up installed in `c:\\Program Files\\VcXsrv`.\n\n- The automated installation of VcXsrv will create an _\"XLaunch\"_ shortcut on your desktop. It is a good idea to create your own shortcut with a customized command line so that you don't need to interact with the XLaunch interface every time you start the X-server.\n\n- Right-click on the Windows desktop to create a new shortcut, give it a meaningful name, and insert the following for the shortcut target:\n\n ```powershell\n \"C:\\Program Files\\VcXsrv\\vcxsrv.exe\" :0 -ac -terminate -lesspointer -multiwindow -clipboard -wgl -dpi auto\n ```\n\n- Inspecting the new shortcut properties, it should look something like this:\n\n ![X-Server (vcxsrv) Properties](/img/xserver.png)\n\n- Double-click on the new shortcut desktop icon to test it. Unfortunately, the X-server runs in the background. When running the X-server in multiwindow mode (which we recommend), it is not obvious whether the X-server is running.\n\n- One way to check that the X-server is running is to use the Microsoft Task Manager and look for the VcXsrv process running in the background. You can also verify it is running by using the `netstat` command from within PowerShell on Windows to ensure that the X-server is up and listening on the appropriate ports. Using `netstat`, you should see output like the following:\n\n ```powershell\n PS C:\\WINDOWS\\system32> netstat -abno | findstr 6000\n TCP 0.0.0.0:6000 0.0.0.0:0 LISTENING 35176\n TCP 127.0.0.1:6000 127.0.0.1:56516 ESTABLISHED 35176\n TCP 127.0.0.1:6000 127.0.0.1:56517 ESTABLISHED 35176\n TCP 127.0.0.1:6000 127.0.0.1:56518 ESTABLISHED 35176\n TCP 127.0.0.1:56516 127.0.0.1:6000 ESTABLISHED 35176\n TCP 127.0.0.1:56517 127.0.0.1:6000 ESTABLISHED 35176\n TCP 127.0.0.1:56518 127.0.0.1:6000 ESTABLISHED 35176\n TCP 172.28.192.1:6000 172.28.197.205:46290 TIME_WAIT 0\n TCP [::]:6000 [::]:0 LISTENING 35176\n ```\n\n- At this point, the X-server is up and running and awaiting a connection from a client.\n\n- Within Ubuntu in WSL, we need to set up the environment to communicate with the X-Windows server. 
The shell variable DISPLAY needs to be set pointing to the IP address of the X-server and the instance of the X-windows server.\n\n- The shell script below will set the DISPLAY variable appropriately and export it to be available to X-Windows client applications launched from the shell. This scripting trick works because WSL sees the Windows host as the nameserver and this is the same IP address that is running the X-Server. You can echo the $DISPLAY variable after setting it to verify that it is set correctly.\n\n ```console\n $ export DISPLAY=$(cat /etc/resolv.conf | grep nameserver | awk '{print $2}'):0.0\n $ echo $DISPLAY\n 172.28.192.1:0.0\n ```\n\n- Add this command to the end of your `.bashrc` file in the Linux home directory to avoid needing to set the DISPLAY variable every time you open a new window. This way, if the IP address of the desktop or laptop changes, the DISPLAY variable will be updated accordingly.\n\n ```bash\n cd ~\n vi .bashrc\n ```\n\n ```bash\n # set the X-Windows display to connect to VcXsrv on Windows\n export DISPLAY=$(cat /etc/resolv.conf | grep nameserver | awk '{print $2}'):0.0\n \".bashrc\" 120L, 3912C written\n ```\n\n- Use an X-windows client to make sure that the X- server is working. Since X-windows clients are not installed by default, download an xterm client as follows via the Linux shell:\n\n ```bash\n sudo apt install xterm\n ```\n\n- Assuming that the X-server is up and running on Windows, and the Linux DISPLAY variable is set correctly, you're ready to test X-Windows.\n\n Before testing X-Windows, do yourself a favor and temporarily disable the Windows Firewall. The Windows Firewall will very likely block ports around 6000, preventing client requests on WSL from connecting to the X-server. You can find this under Firewall & network protection on Windows. Clicking the \"Private Network\" or \"Public Network\" options will show you the status of the Windows Firewall and indicate whether it is on or off.\n\n Depending on your installation, you may be running a specific Firewall. In this example, we temporarily disable the McAfee LiveSafe Firewall as shown:\n\n ![Ensure that the Firewall is not interfering](/img/firewall.png)\n\n- With the Firewall disabled, you can attempt to launch the xterm client from the Linux shell:\n\n ```bash\n xterm &\n ```\n\n- If everything is working correctly, you should see the new xterm client appear under Windows. The xterm is executing on Ubuntu under WSL but displays alongside other Windows on the Windows desktop. This is what is meant by \"multiwindow\" mode.\n\n ![Launch an xterm to verify functionality](/img/xterm.png)\n\n- Now that you know X-Windows is working correctly turn the Firewall back on, and adjust the settings to allow traffic to and from the required port. Ideally, you want to open only the minimal set of ports and services required. 
In the case of the McAfee Firewall, getting X-Windows to work required changing access to incoming and outgoing ports to _\"Open ports to Work and Home networks\"_ for the `vcxsrv.exe` program only as shown:\n\n ![Allowing access to XServer traffic](/img/xserver_setup.png)\n\n- With the X-server running, the `DISPLAY` variable set, and the Windows Firewall configured correctly, we can now launch the Nextflow Console from the shell as shown:\n\n ```bash\n nextflow console\n ```\n\n The command above opens the Nextflow REPL console under X-Windows.\n\n ![Nextflow REPL Console under X-Windows](/img/repl_console.png)\n\nInside the Nextflow console, you can enter Groovy code and run it interactively, a helpful feature when developing and debugging Nextflow pipelines.\n\n## Installing Git\n\nCollaborative source code management systems such as BitBucket, GitHub, and GitLab are used to develop and share Nextflow pipelines. To be productive with Nextflow, you will want to install Git.\n\nAs explained earlier, VS Code operates in different contexts. When running VS Code in the context of Windows, VS Code will look for a local copy of Git. When using VS Code to operate against the remote WSL environment, a separate installation of Git on Ubuntu will be used. (Note that Git is installed by default on Ubuntu 20.04.)\n\nDevelopers will probably want to use Git both from within a Windows context and a Linux context, so we need to make sure that Git is present in both environments.\n\n### Step 1: Install Git on Windows (optional)\n\n- Download and install the 64-bit Windows version of Git from https://git-scm.com/downloads.\n\n- Click on the Git installer from the Downloads directory, and click through the default installation options. During the install process, you will be asked to select the default editor to be used with Git (Vim, Notepad++, etc.). Select Visual Studio Code (assuming that this is the IDE that you plan to use for Nextflow).\n\n ![Installing Git on Windows](/img/git-install.png)\n\n- The Git installer will prompt you for additional settings. If you are not sure, accept the defaults. When asked, adjust the `PATH` variable to use the recommended option, making the Git command line available from Git Bash, the Command Prompt, and PowerShell.\n\n- After installation, Git Bash, Git GUI, and Git CMD will appear as new entries under the Start menu. If you are running Git from PowerShell, you will need to open a new window to force PowerShell to reload the `PATH` variable. By default, Git installs in `C:\\Program Files\\Git`.\n\n- If you plan to use Git from the command line, GitHub provides a useful cheatsheet [here](https://training.github.com/downloads/github-git-cheat-sheet.pdf).\n\n- After installing Git, from within VS Code (in the context of the local host), select the Source Control icon from the left pane of the VS Code interface as shown. 
You can open local folders that contain a git repository or clone repositories from GitHub or your preferred source code management system.\n\n ![Using Git within VS Code](/img/git-vscode.png)\n\n- Documentation on using Git with Visual Studio Code is provided at https://code.visualstudio.com/docs/editor/versioncontrol\n\n### Step 2: Install Git on Linux\n\n- Open a Remote VS Code Window on **\\*WSL: Ubuntu 20.04\\*** (By selecting the green icon on the lower-left corner of the VS code interface.)\n\n- Git should already be installed in `/usr/bin`, but you can validate this from the Ubuntu shell:\n\n ```console\n $ git --version\n git version 2.25.1\n ```\n\n- To get started using Git with VS Code Remote on WSL, select the _Source Control icon_ on the left panel of VS code. Assuming VS Code Remote detects that Git is installed on Linux, you should be able to _Clone a Repository_.\n\n- Select \"Clone Repository,\" and when prompted, clone the GitHub repo for the Blast example that we used earlier - https://github.com/nextflow-io/blast-example. Clone this repo into your home directory on Linux. You should see _blast-example_ appear as a source code repository within VS code as shown:\n\n ![Using Git within VS Code](/img/git-linux-1.png)\n\n- Select the _Explorer_ panel in VS Code to see the cloned _blast-example_ repo. Now we can explore and modify the pipeline code using the IDE.\n\n ![Using Git within VS Code](/img/git-linux-2.png)\n\n- After making modifications to the pipeline, we can execute the _local copy_ of the pipeline either from the Linux shell or directly via the Terminal window in VS Code as shown:\n\n ![Using Git within VS Code](/img/git-linux-3.png)\n\n- With the Docker VS Code extension, users can select the Docker icon from the left code to view containers and images associated with the Nextflow pipeline.\n\n- Git commands are available from within VS Code by selecting the _Source Control_ icon on the left panel and selecting the three dots (…) to the right of SOURCE CONTROL. Some operations such as pushing or committing code will require that VS Code be authenticated with your GitHub credentials.\n\n ![Using Git within VS Code](/img/git-linux-4.png)\n\n## Summary\n\nWith WSL2, Windows 10 is an excellent environment for developing and testing Nextflow pipelines. Users can take advantage of the power and convenience of a Linux command line environment while using Windows-based IDEs such as VS-Code with full support for containers.\n\nPipelines developed in the Windows environment can easily be extended to compute environments in the cloud.\n\nWhile installing Nextflow itself is straightforward, installing and testing necessary components such as WSL, Docker, an IDE, and Git can be a little tricky. 
Hopefully readers will find this guide helpful.\n", - "images": [], + "images": [ + "/img/ms-store.png", + "/img/windows-terminal.png", + "/img/docker-images.png", + "/img/docker-windows-arch.png", + "/img/nf-vscode-ext.png", + "/img/vscode-remote-dev.png", + "/img/remote-dev-side-by-side.png", + "/img/vscode-extensions.png", + "/img/xserver.png", + "/img/firewall.png", + "/img/xterm.png", + "/img/xserver_setup.png", + "/img/repl_console.png", + "/img/git-install.png", + "/img/git-vscode.png", + "/img/git-linux-1.png", + "/img/git-linux-2.png", + "/img/git-linux-3.png", + "/img/git-linux-4.png" + ], "author": "Evan Floden", "tags": "windows,learning" }, @@ -402,7 +447,10 @@ "title": "Analyzing caching behavior of pipelines", "date": "2022-11-10T00:00:00.000Z", "content": "The ability to resume an analysis (i.e. caching) is one of the core strengths of Nextflow. When developing pipelines, this allows us to avoid re-running unchanged processes by simply appending `-resume` to the `nextflow run` command. Sometimes, tasks may be repeated for reasons that are unclear. In these cases it can help to look into the caching mechanism, to understand why a specific process was re-run.\n\nWe have previously written about Nextflow's [resume functionality](https://www.nextflow.io/blog/2019/demystifying-nextflow-resume.html) as well as some [troubleshooting strategies](https://www.nextflow.io/blog/2019/troubleshooting-nextflow-resume.html) to gain more insights on the caching behavior.\n\nIn this post, we will take a more hands-on approach and highlight some strategies which we can use to understand what is causing a particular process (or processes) to re-run, instead of using the cache from previous runs of the pipeline. To demonstrate the process, we will introduce a minor change into one of the process definitions in the the [nextflow-io/rnaseq-nf](https://github.com/nextflow-io/rnaseq-nf) pipeline and investigate how it affects the overall caching behavior when compared to the initial execution of the pipeline.\n\n### Local setup for the test\n\nFirst, we clone the [nextflow-io/rnaseq-nf](https://github.com/nextflow-io/rnaseq-nf) pipeline locally:\n\n```bash\n$ git clone https://github.com/nextflow-io/rnaseq-nf\n$ cd rnaseq-nf\n```\n\nIn the examples below, we have used Nextflow `v22.10.0`, Docker `v20.10.8` and `Java v17 LTS` on MacOS.\n\n### Pipeline flowchart\n\nThe flowchart below can help in understanding the design of the pipeline and the dependencies between the various tasks.\n\n![rnaseq-nf](/img/rnaseq-nf.base.png)\n\n### Logs from initial (fresh) run\n\nAs a reminder, Nextflow generates a unique task hash, e.g. 22/7548fa… for each task in a workflow. The hash takes into account the complete file path, the last modified timestamp, container ID, content of script directive among other factors. If any of these change, the task will be re-executed. Nextflow maintains a list of task hashes for caching and traceability purposes. You can learn more about task hashes in the article [Troubleshooting Nextflow resume](https://nextflow.io/blog/2019/troubleshooting-nextflow-resume.html).\n\nTo have something to compare to, we first need to generate the initial hashes for the unchanged processes in the pipeline. We save these in a file called `fresh_run.log` and use them later on as \"ground-truth\" for the analysis. 
In order to save the process hashes we use the `-dump-hashes` flag, which prints them to the log.\n\n**TIP:** We rely upon the [`-log` option](https://www.nextflow.io/docs/latest/cli.html#execution-logs) in the `nextflow` command line interface to be able to supply a custom log file name instead of the default `.nextflow.log`.\n\n```console\n$ nextflow -log fresh_run.log run ./main.nf -profile docker -dump-hashes\n\n[...truncated…]\nexecutor > local (4)\n[d5/57c2bb] process > RNASEQ:INDEX (ggal_1_48850000_49020000) [100%] 1 of 1 ✔\n[25/433b23] process > RNASEQ:FASTQC (FASTQC on ggal_gut) [100%] 1 of 1 ✔\n[03/23372f] process > RNASEQ:QUANT (ggal_gut) [100%] 1 of 1 ✔\n[38/712d21] process > MULTIQC [100%] 1 of 1 ✔\n[...truncated…]\n```\n\n### Edit the `FastQC` process\n\nAfter the initial run of the pipeline, we introduce a change in the `fastqc.nf` module, hard coding the number of threads which should be used to run the `FASTQC` process via Nextflow's [`cpus` directive](https://www.nextflow.io/docs/latest/process.html#cpus).\n\nHere's the output of `git diff` on the contents of `modules/fastqc/main.nf` file:\n\n```diff\n--- a/modules/fastqc/main.nf\n+++ b/modules/fastqc/main.nf\n@@ -4,6 +4,7 @@ process FASTQC {\n tag \"FASTQC on $sample_id\"\n conda 'bioconda::fastqc=0.11.9'\n publishDir params.outdir, mode:'copy'\n+ cpus 2\n\n input:\n tuple val(sample_id), path(reads)\n@@ -13,6 +14,6 @@ process FASTQC {\n\n script:\n \"\"\"\n- fastqc.sh \"$sample_id\" \"$reads\"\n+ fastqc.sh \"$sample_id\" \"$reads\" -t ${task.cpus}\n \"\"\"\n }\n```\n\n### Logs from the follow up run\n\nNext, we run the pipeline again with the `-resume` option, which instructs Nextflow to rely upon the cached results from the previous run and only run the parts of the pipeline which have changed. As before, we instruct Nextflow to dump the process hashes, this time in a file called `resumed_run.log`.\n\n```console\n$ nextflow -log resumed_run.log run ./main.nf -profile docker -dump-hashes -resume\n\n[...truncated…]\nexecutor > local\n[d5/57c2bb] process > RNASEQ:INDEX (ggal_1_48850000_49020000) [100%] 1 of 1, cached: 1 ✔\n[55/15b609] process > RNASEQ:FASTQC (FASTQC on ggal_gut) [100%] 1 of 1 ✔\n[03/23372f] process > RNASEQ:QUANT (ggal_gut) [100%] 1 of 1, cached: 1 ✔\n[f3/f1ccb4] process > MULTIQC [100%] 1 of 1 ✔\n[...truncated…]\n```\n\n## Analysis of cache hashes\n\nFrom the summary of the command line output above, we can see that the `RNASEQ:FASTQC (FASTQC on ggal_gut)` and `MULTIQC` processes were re-run while the others were cached. To understand why, we can examine the hashes generated by the processes from the logs of the `fresh_run` and `resumed_run`.\n\nFor the analysis, we need to keep in mind that:\n\n1. The time-stamps are expected to differ and can be safely ignored to narrow down the `grep` pattern to the Nextflow `TaskProcessor` class.\n\n2. The _order_ of the log entries isn't fixed, due to the nature of the underlying parallel computation dataflow model used by Nextflow. For example, in our example below, `FASTQC` ran first in `fresh_run.log` but wasn’t the first logged process in `resumed_run.log`.\n\n### Find the process level hashes\n\nWe can use standard Unix tools like `grep`, `cut` and `sort` to address these points and filter out the relevant information:\n\n1. Use `grep` to isolate log entries with `cache hash` string\n2. Remove the prefix time-stamps using `cut -d ‘-’ -f 3`\n3. Remove the caching mode related information using `cut -d ';' -f 1`\n4. 
Sort the lines based on process names using `sort` to have a standard order before comparison\n5. Use `tee` to print the resultant strings to the terminal and simultaneously save to a file\n\nNow, let’s apply these transformations to the `fresh_run.log` as well as `resumed_run.log` entries.\n\n- `fresh_run.log`\n\n```console\n$ cat ./fresh_run.log | grep 'INFO.*TaskProcessor.*cache hash' | cut -d '-' -f 3 | cut -d ';' -f 1 | sort | tee ./fresh_run.tasks.log\n\n [MULTIQC] cache hash: 167d7b39f7efdfc49b6ff773f081daef\n [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 47e8c58d92dbaafba3c2ccc4f89f53a4\n [RNASEQ:INDEX (ggal_1_48850000_49020000)] cache hash: ac8be293e1d57f3616cdd0adce34af6f\n [RNASEQ:QUANT (ggal_gut)] cache hash: d8b88e3979ff9fe4bf64b4e1bfaf4038\n```\n\n- `resumed_run.log`\n\n```console\n$ cat ./resumed_run.log | grep 'INFO.*TaskProcessor.*cache hash' | cut -d '-' -f 3 | cut -d ';' -f 1 | sort | tee ./resumed_run.tasks.log\n\n [MULTIQC] cache hash: d3f200c56cf00b223282f12f06ae8586\n [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 92478eeb3b0ff210ebe5a4f3d99aed2d\n [RNASEQ:INDEX (ggal_1_48850000_49020000)] cache hash: ac8be293e1d57f3616cdd0adce34af6f\n [RNASEQ:QUANT (ggal_gut)] cache hash: d8b88e3979ff9fe4bf64b4e1bfaf4038\n```\n\n### Inference from process top-level hashes\n\nComputing a hash is a multi-step process and various factors contribute to it such as the inputs of the process, platform, time-stamps of the input files and more ( as explained in [Demystifying Nextflow resume](https://www.nextflow.io/blog/2019/demystifying-nextflow-resume.html) blog post) . The change we made in the task level CPUs directive and script section of the `FASTQC` process triggered a re-computation of hashes:\n\n```diff\n--- ./fresh_run.tasks.log\n+++ ./resumed_run.tasks.log\n@@ -1,4 +1,4 @@\n- [MULTIQC] cache hash: dccabcd012ad86e1a2668e866c120534\n- [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 94be8c84f4bed57252985e6813bec401\n+ [MULTIQC] cache hash: c5a63560338596282682cc04ff97e436\n+ [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 54aa712db7c8248e7f31d5fb6535ff9d\n [RNASEQ:INDEX (ggal_1_48850000_49020000)] cache hash: 356aaa7524fb071f258480ba07c67b3c\n [RNASEQ:QUANT (ggal_gut)] cache hash: 169ced0fc4b047eaf91cd31620b22540\n\n```\n\nEven though we only introduced changes in `FASTQC`, the `MULTIQC` process was re-run since it relies upon the output of the `FASTQC` process. 
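\n\nFor reference, the unified diff shown above can be reproduced by comparing the two filtered hash files directly. This is a minimal sketch that uses the standard `diff` tool on the `fresh_run.tasks.log` and `resumed_run.tasks.log` files saved with `tee` in the previous step:\n\n```console\n$ diff -u ./fresh_run.tasks.log ./resumed_run.tasks.log\n```\n\n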
Any task that has its cache hash invalidated triggers a rerun of all downstream steps:\n\n![rnaseq-nf after modification](/img/rnaseq-nf.modified.png)\n\n### Understanding why `FASTQC` was re-run\n\nWe can see the full list of `FASTQC` process hashes within the `fresh_run.log` file\n\n```console\n\n[...truncated…]\nNov-03 20:19:13.827 [Actor Thread 6] INFO nextflow.processor.TaskProcessor - [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 54aa712db7c8248e7f31d5fb6535ff9d; mode: STANDARD; entries:\n 1a0e496fef579b22998f099981b494f9 [java.util.UUID] a11bf24f-638a-42d6-8b50-48d3be637d54\n 195c7faea83c75f2340eb710d8486d2a [java.lang.String] RNASEQ:FASTQC\n 2bea0eee5e384bd6082a173772e939eb [java.lang.String] \"\"\"\n fastqc.sh \"$sample_id\" \"$reads\" -t ${task.cpus}\n \"\"\"\n\n 8e58c0cec3bde124d5d932c7f1579395 [java.lang.String] quay.io/nextflow/rnaseq-nf:v1.1\n 7ec7cbd71ff757f5fcdbaa760c9ce6de [java.lang.String] sample_id\n 16b4905b1545252eb7cbfe7b2a20d03d [java.lang.String] ggal_gut\n 553096c532e666fb42214fdf0520fe4a [java.lang.String] reads\n 6a5d50e32fdb3261e3700a30ad257ff9 [nextflow.util.ArrayBag] [FileHolder(sourceObj:/home/abhinav/rnaseq-nf/data/ggal/ggal_gut_1.fq, storePath:/home/abhinav/rnaseq-nf/data/ggal/ggal_gut_1.fq, stageName:ggal_gut_1.fq), FileHolder(sourceObj:/home/abhinav/rnaseq-nf/data/ggal/ggal_gut_2.fq, storePath:/home/abhinav/rnaseq-nf/data/ggal/ggal_gut_2.fq, stageName:ggal_gut_2.fq)]\n 4f9d4b0d22865056c37fb6d9c2a04a67 [java.lang.String] $\n 16fe7483905cce7a85670e43e4678877 [java.lang.Boolean] true\n 80a8708c1f85f9e53796b84bd83471d3 [java.util.HashMap$EntrySet] [task.cpus=2]\n f46c56757169dad5c65708a8f892f414 [sun.nio.fs.UnixPath] /home/abhinav/rnaseq-nf/bin/fastqc.sh\n[...truncated…]\n\n```\n\nWhen we isolate and compare the log entries for `FASTQC` between `fresh_run.log` and `resumed_run.log`, we see the following diff:\n\n```diff\n--- ./fresh_run.fastqc.log\n+++ ./resumed_run.fastqc.log\n@@ -1,8 +1,8 @@\n-INFO nextflow.processor.TaskProcessor - [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 94be8c84f4bed57252985e6813bec401; mode: STANDARD; entries:\n+INFO nextflow.processor.TaskProcessor - [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 54aa712db7c8248e7f31d5fb6535ff9d; mode: STANDARD; entries:\n 1a0e496fef579b22998f099981b494f9 [java.util.UUID] a11bf24f-638a-42d6-8b50-48d3be637d54\n 195c7faea83c75f2340eb710d8486d2a [java.lang.String] RNASEQ:FASTQC\n- 43e5a23fc27129f92a6c010823d8909b [java.lang.String] \"\"\"\n- fastqc.sh \"$sample_id\" \"$reads\"\n+ 2bea0eee5e384bd6082a173772e939eb [java.lang.String] \"\"\"\n+ fastqc.sh \"$sample_id\" \"$reads\" -t ${task.cpus}\n\n```\n\nObservations from the diff:\n\n1. We can see that the content of the script has changed, highlighting the new `$task.cpus` part of the command.\n2. 
There is a new entry in the `resumed_run.log` showing that the content of the process level directive `cpus` has been added.\n\nIn other words, the diff from log files is confirming our edits.\n\n### Understanding why `MULTIQC` was re-run\n\nNow, we apply the same analysis technique for the `MULTIQC` process in both log files:\n\n```diff\n--- ./fresh_run.multiqc.log\n+++ ./resumed_run.multiqc.log\n@@ -1,4 +1,4 @@\n-INFO nextflow.processor.TaskProcessor - [MULTIQC] cache hash: dccabcd012ad86e1a2668e866c120534; mode: STANDARD; entries:\n+INFO nextflow.processor.TaskProcessor - [MULTIQC] cache hash: c5a63560338596282682cc04ff97e436; mode: STANDARD; entries:\n 1a0e496fef579b22998f099981b494f9 [java.util.UUID] a11bf24f-638a-42d6-8b50-48d3be637d54\n cd584abbdbee0d2cfc4361ee2a3fd44b [java.lang.String] MULTIQC\n 56bfc44d4ed5c943f30ec98b22904eec [java.lang.String] \"\"\"\n@@ -9,8 +9,9 @@\n\n 8e58c0cec3bde124d5d932c7f1579395 [java.lang.String] quay.io/nextflow/rnaseq-nf:v1.1\n 14ca61f10a641915b8c71066de5892e1 [java.lang.String] *\n- cd0e6f1a382f11f25d5cef85bd87c3f4 [nextflow.util.ArrayBag] [FileHolder(sourceObj:/home/abhinav/rnaseq-nf/work/03/23372f156e80deb4d7183c5f509274/ggal_gut, storePath:/home/abhinav/rnaseq-nf/work/03/23372f156e80deb4d7183c5f509274/ggal_gut, stageName:ggal_gut), FileHolder(sourceObj:/home/abhinav/rnaseq-nf/work/25/433b23af9e98294becade95db6bd76/fastqc_ggal_gut_logs, storePath:/home/abhinav/rnaseq-nf/work/25/433b23af9e98294becade95db6bd76/fastqc_ggal_gut_logs, stageName:fastqc_ggal_gut_logs)]\n+ 18966b473f7bdb07f4f7f4c8445be1f5 [nextflow.util.ArrayBag] [FileHolder(sourceObj:/home/abhinav/rnaseq-nf/work/03/23372f156e80deb4d7183c5f509274/ggal_gut, storePath:/home/abhinav/rnaseq-nf/work/03/23372f156e80deb4d7183c5f509274/ggal_gut, stageName:ggal_gut), FileHolder(sourceObj:/home/abhinav/rnaseq-nf/work/55/15b60995682daf79ecb64bcbb8e44e/fastqc_ggal_gut_logs, storePath:/home/abhinav/rnaseq-nf/work/55/15b60995682daf79ecb64bcbb8e44e/fastqc_ggal_gut_logs, stageName:fastqc_ggal_gut_logs)]\n d271b8ef022bbb0126423bf5796c9440 [java.lang.String] config\n 5a07367a32cd1696f0f0054ee1f60e8b [nextflow.util.ArrayBag] [FileHolder(sourceObj:/home/abhinav/rnaseq-nf/multiqc, storePath:/home/abhinav/rnaseq-nf/multiqc, stageName:multiqc)]\n 4f9d4b0d22865056c37fb6d9c2a04a67 [java.lang.String] $\n 16fe7483905cce7a85670e43e4678877 [java.lang.Boolean] true\n```\n\nHere, the highlighted diffs show the directory of the input files, changing as a result of `FASTQC` being re-run; as a result `MULTIQC` has a new hash and has to be re-run as well.\n\n## Conclusion\n\nDebugging the caching behavior of a pipeline can be tricky, however a systematic analysis can help to uncover what is causing a particular process to be re-run.\n\nWhen analyzing large datasets, it may be worth using the `-dump-hashes` option by default for all pipeline runs, avoiding needing to run the pipeline again to obtain the hashes in the log file in case of problems.\n\nWhile this process works, it is not trivial. We would love to see some community-driven tooling for a better cache-debugging experience for Nextflow, perhaps an `nf-cache` plugin? Stay tuned for an upcoming blog post describing how to extend and add new functionality to Nextflow using plugins.", - "images": [], + "images": [ + "/img/rnaseq-nf.base.png", + "/img/rnaseq-nf.modified.png" + ], "author": "Abhinav Sharma", "tags": "nextflow,cache" }, @@ -412,7 +460,9 @@ "date": "2022-09-18T00:00:00.000Z", "content": "## Introduction\n\n
\n <img src=\"/img/mentorships-round1-wordcloud.png\" alt=\"Word cloud of scientific interest keywords\" />\n\n*Word cloud of scientific interest keywords, averaged across all applications.*\n\n
\n\nOur recent [The State of the Workflow 2022: Community Survey Results](https://seqera.io/blog/state-of-the-workflow-2022-results/) showed that Nextflow and nf-core have a strong global community with a high level of engagement in several countries. As the community continues to grow, we aim to prioritize inclusivity for everyone through active outreach to groups with low representation.\n\nThanks to funding from our Chan Zuckerberg Initiative Diversity and Inclusion grant we established an international Nextflow and nf-core mentoring program with the aim of empowering those from underrepresented groups. With the first round of the mentorship now complete, we look back at the success of the program so far.\n\nFrom almost 200 applications, five pairs of mentors and mentees were selected for the first round of the program. Over the following four months they met weekly to work on Nextflow based projects. We attempted to pair mentors and mentees based on their time zones and scientific interests. Project tasks were left up to the individuals and so tailored to the mentee's scientific interests and schedules.\n\nPeople worked on things ranging from setting up Nextflow and nf-core on their institutional clusters to developing and implementing Nextflow and nf-core pipelines for next-generation sequencing data. Impressively, after starting the program knowing very little about Nextflow and nf-core, mentees finished the program being able to confidently develop and implement scalable and reproducible scientific workflows.\n\n![Map of mentor / mentee pairs](/img/mentorships-round1-map.png)
\n_The mentorship program was worldwide._\n\n## Ndeye Marième Top (mentee) & John Juma (mentor)\n\nFor the mentorship, Marième wanted to set up Nextflow and nf-core on the servers at the Institut Pasteur de Dakar in Senegal and learn how to develop / contribute to a pipeline. Her mentor was John Juma, from the ILRI/SANBI in Kenya.\n\nTogether, Marème overcame issues with containers and server privileges and developed her local config, learning about how to troubleshoot and where to find help along the way. By the end of the mentorship she was able to set up the [nf-core/viralrecon](https://nf-co.re/viralrecon) pipeline for the genomic surveillance analysis of SARS-Cov2 sequencing data from Senegal as well as 17 other countries in West Africa, ready for submission to [GISAID](https://gisaid.org/). She also got up to speed with the [nf-core/mag](https://nf-co.re/mag) pipeline for metagenomic analysis.\n\n> *\"Having someone experienced who can guide you in my learning process. My mentor really helped me understand and focus on the practical aspects since my main concern was having the pipelines correctly running in my institution.\"* - Marième Top (mentee)\n\n> *\"The program was awesome. I had a chance to impart nextflow principles to someone I have never met before. Fully virtual, the program instilled some sense of discipline in terms of setting and meeting objectives.\"* - John Juma (mentor)\n\n## Philip Ashton (mentee) & Robert Petit (mentor)\n\nPhilip wanted to move up the Nextflow learning curve and set up nf-core workflows at Kamuzu University of Health Sciences in Malawi. His mentor was Robert Petit from the Wyoming Public Health Laboratory in the USA. Robert has developed the [Bactopia](https://bactopia.github.io/) pipeline for the analysis of bacterial pipeline and it was Philip’s aim to get this running for his group in Malawi.\n\nRobert helped Philip learn Nextflow, enabling him to independently deploy DSL2 pipelines and process genomes using Nextflow Tower. Philip is already using his new found skills to answer important public health questions in Malawi and is now passing his knowledge to other staff and students at his institute. Even though the mentorship program has finished, Philip and Rob will continue a collaboration and have plans to deploy pipelines that will benefit public health in the future.\n\n> *\"I tried to learn nextflow independently some time ago, but abandoned it for the more familiar snakemake. Thanks to Robert’s mentorship I’m now over the learning curve and able to deploy nf-core pipelines and use cloud resources more efficiently via Nextflow Tower\"* - Phil Ashton (mentee)\n\n> *\"I found being a mentor to be a rewarding experience and a great opportunity to introduce mentees into the Nextflow/nf-core community. Phil and I were able to accomplish a lot in the span of a few months, and now have many plans to collaborate in the future.\"* - Robert Petit (mentor)\n\n## Kalayanee Chairat (mentee) & Alison Meynert (mentor)\n\nKalayanee’s goal for the mentorship program was to set up and run Nextflow and nf-core pipelines at the local infrastructure at the King Mongkut’s University of Technology Thonburi in Thailand. Kalayanee was mentored by Alison Meynert, from the University of Edinburgh in the United Kingdom.\n\nWorking with Alison, Kalayanee learned about Nextflow and nf-core and the requirements for working with Slurm and Singularity. 
Together, they created a configuration profile that Kalayanee and others at her institute can use - they have plans to submit this to [nf-core/configs](https://github.com/nf-core/configs) as an institutional profile. Now she is familiar with these tools, Kalayanee is using [nf-core/sarek](https://nf-co.re/sarek) and [nf-core/rnaseq](https://nf-co.re/rnaseq) to analyze 100s of samples of her own next-generation sequencing data on her local HPC environment.\n\n> *\"The mentorship program is a great start to learn to use and develop analysis pipelines built using Nextflow. I gained a lot of knowledge through this program. I am also very lucky to have Dr. Alison Meynert as my mentor. She is very knowledgeable, kind and willing to help in every step.\"* - Kalayanee Chairat (mentee)\n\n> *\"It was a great experience for me to work with my mentee towards her goal. The process solidified some of my own topical knowledge and I learned new things along the way as well.\"* - Alison Meynert (mentor)\n\n## Edward Lukyamuzi (mentee) & Emilio Garcia-Rios (mentor)\n\nFor the mentoring program Edward’s goal was to understand the fundamental components of a Nextflow script and write a Nextflow pipeline for analyzing mosquito genomes. Edward was mentored by Emilio Garcia-Rios, from the EMBL-EBI in the United Kingdom.\n\nEdward learned the fundamental concepts of Nextflow, including channels, processes and operators. Edward works with sequencing data from the mosquito genome - with help from Emilio he wrote a Nextflow pipeline with an accompanying Dockerfile for the alignment of reads and genotyping of SNPs. Edward will continue to develop his pipeline and wants to become more involved with the Nextflow and nf-core community by attending the nf-core hackathons. Edward is also very keen to help others learn Nextflow and expressed an interest in being part of this program again as a mentor.\n\n> *\"Learning Nextflow can be a steep curve. Having a partner to give you a little push might be what facilitates adoption of Nextflow into your daily routine.\"* - Edward Lukyamuzi (mentee)\n\n> *\"I would like more people to discover and learn the benefits using Nextflow has. Being a mentor in this program can help me collaborate with other colleagues and be a mentor in my institute as well.\"* - Emilio Garcia-Rios (mentor)\n\n## Suchitra Thapa (mentee) & Maxime Borry (mentor)\n\nSuchitra started the program to learn about running Nextflow pipelines but quickly moved on to pipeline development and deployment on the cloud. Suchitra and Maxime encountered some technical challenges during the mentorship, including difficulties with internet connectivity and access to computational platforms for analysis. Despite this, with help from Maxime, Suchitra applied her newly acquired skills and made substantial progress converting the [metaphlankrona](https://github.com/suchitrathapa/metaphlankrona) pipeline for metagenomic analysis of microbial communities from Nextflow DSL1 to DSL2 syntax.\n\nSuchitra will be sharing her work and progress on the pipeline as a poster at the [Nextflow Summit 2022](https://summit.nextflow.io/speakers/suchitra-thapa/).\n\n> *\"This mentorship was one of the best organized online learning opportunities that I have attended so far. With time flexibility and no deadline burden, you can easily fit this mentorship into your busy schedule. 
I would suggest everyone interested to definitely go for it.\"* - Suchitra Thapa (mentee)\n\n> *\"This mentorship program was a very fruitful and positive experience, and the satisfaction to see someone learning and growing their bioinformatics skills is very rewarding.\"* - Maxime Borry (mentor)\n\n## Conclusion\n\nFeedback from the first round of the mentorship program was overwhelmingly positive. Both mentors and mentees found the experience to be a rewarding opportunity and were grateful for taking part. Everyone who participated in the program said that they would encourage others to be a part of it in the future.\n\n> \"This is an exciting program that can help us make use of curated pipelines to advance open science. I don't mind repeating the program!\" - John Juma (mentor)\n\n![Screenshot of final zoom meetup](/img/mentorships-round1-zoom.png)\n\nAs the Nextflow and nf-core communities continue to grow, the mentorship program will have long-term benefits beyond those that are immediately measurable. Mentees from the program are already acting as positive role models and contributing new perspectives to the wider community. Additionally, some mentees are interested in being mentors in the future and will undoubtedly support others as our communities continue to grow.\n\nWe were delighted with the high quality of this year’s mentors and mentees. Stay tuned for information about the next round of the Nextflow and nf-core mentorship program. Applications for round 2 will open on October 1, 2022. See [https://nf-co.re/mentorships](https://nf-co.re/mentorships) for details.\n\n[Mentorship Round 2 - Details](https://nf-co.re/mentorships)", "images": [ - "/img/mentorships-round1-wordcloud.png" + "/img/mentorships-round1-wordcloud.png", + "/img/mentorships-round1-map.png", + "/img/mentorships-round1-zoom.png" ], "author": "Chris Hakkaart", "tags": "nextflow,nf-core,czi,mentorship,training" @@ -448,7 +498,7 @@ "slug": "2022/nextflow-is-moving-to-slack", "title": "Nextflow’s community is moving to Slack!", "date": "2022-02-22T00:00:00.000Z", - "content": "
\n*\n“Software communities don’t just write code together. They brainstorm feature ideas, help new users get their bearings, and collaborate on best ways to use the software.…conversations need their own place\" - [GitHub Satellite Blog 2020](https://github.blog/2020-05-06-new-from-satellite-2020-github-codespaces-github-discussions-securing-code-in-private-repositories-and-more)\n*\n
\n\nThe Nextflow community channel on Gitter has grown substantially over the last few years and today has more than 1,300 members.\n\nI still remember when a [former colleague](https://twitter.com/helicobacter1) proposed the idea of opening a Nextflow channel on Gitter. At the time, I didn't know anything about Gitter, and my initial response was : \"would that not be a waste of time?\".\n\nFortunately, I took him up on his suggestion and the Gitter channel quickly became an important resource for all Nextflow developers and a key factor to its success.\n\n### Where the future lies\n\nAs the Nextflow community continues to grow, we realize that we have reached the limit of the discussion experience on Gitter. The lack of internal channels and the poor support for threads make the discussion unpleasant and difficult to follow. Over the last few years, Slack has proven to deliver a much better user experience and it is also touted as one of the most used platforms for discussion.\n\nFor these reasons, we felt that it is time to say goodbye to the beloved Nextflow Gitter channel and would like to welcome the community into the brand-new, official Nextflow workspace on Slack!\n\nYou can join today using [this link](https://www.nextflow.io/slack-invite.html)!\n\nOnce you have joined, you will be added to a selection of generic channels. However, we have also set up various additional channels for discussion around specific Nextflow topics, and for infrastructure-related topics. Please feel free to join whichever channels are appropriate to you.\n\nAlong the same lines, the Nextflow discussion forum is moving from [Google Groups](https://groups.google.com/forum/#!forum/nextflow) to the [Discussion forum](https://github.com/nextflow-io/nextflow/discussions) in the Nextflow GitHub repository. We hope this will provide a much better experience for Nextflow users by having a more direct connection with the codebase and issue repository.\n\nThe old Gitter channel and Google Groups will be kept active for reference and historical purposes, however we are actively promoting all members to move to the new channels.\n\nIf you have any questions or problems signing up then please feel free to let us know at info@nextflow.io.\n\nAs always, we thank you for being a part of the Nextflow community and for your ongoing support in driving its development and making workflows cool!\n\nSee you on Slack!\n\n### Credits\n\nThis was also made possible thanks to sponsorship from the [Chan Zuckerberg Initiative](https://chanzuckerberg.com/eoss/proposals/nextflow-and-nf-core/), the [Slack for Nonprofits program](https://slack.com/intl/en-gb/about/slack-for-good) and support from [Seqera Labs](https://www.seqera.io).", + "content": "
\n* “Software communities don’t just write code together. They brainstorm feature ideas, help new users get their bearings, and collaborate on best ways to use the software.…conversations need their own place\" - [GitHub Satellite Blog 2020](https://github.blog/2020-05-06-new-from-satellite-2020-github-codespaces-github-discussions-securing-code-in-private-repositories-and-more) *\n
\n\nThe Nextflow community channel on Gitter has grown substantially over the last few years and today has more than 1,300 members.\n\nI still remember when a [former colleague](https://twitter.com/helicobacter1) proposed the idea of opening a Nextflow channel on Gitter. At the time, I didn't know anything about Gitter, and my initial response was : \"would that not be a waste of time?\".\n\nFortunately, I took him up on his suggestion and the Gitter channel quickly became an important resource for all Nextflow developers and a key factor to its success.\n\n### Where the future lies\n\nAs the Nextflow community continues to grow, we realize that we have reached the limit of the discussion experience on Gitter. The lack of internal channels and the poor support for threads make the discussion unpleasant and difficult to follow. Over the last few years, Slack has proven to deliver a much better user experience and it is also touted as one of the most used platforms for discussion.\n\nFor these reasons, we felt that it is time to say goodbye to the beloved Nextflow Gitter channel and would like to welcome the community into the brand-new, official Nextflow workspace on Slack!\n\nYou can join today using [this link](https://www.nextflow.io/slack-invite.html)!\n\nOnce you have joined, you will be added to a selection of generic channels. However, we have also set up various additional channels for discussion around specific Nextflow topics, and for infrastructure-related topics. Please feel free to join whichever channels are appropriate to you.\n\nAlong the same lines, the Nextflow discussion forum is moving from [Google Groups](https://groups.google.com/forum/#!forum/nextflow) to the [Discussion forum](https://github.com/nextflow-io/nextflow/discussions) in the Nextflow GitHub repository. We hope this will provide a much better experience for Nextflow users by having a more direct connection with the codebase and issue repository.\n\nThe old Gitter channel and Google Groups will be kept active for reference and historical purposes, however we are actively promoting all members to move to the new channels.\n\nIf you have any questions or problems signing up then please feel free to let us know at info@nextflow.io.\n\nAs always, we thank you for being a part of the Nextflow community and for your ongoing support in driving its development and making workflows cool!\n\nSee you on Slack!\n\n### Credits\n\nThis was also made possible thanks to sponsorship from the [Chan Zuckerberg Initiative](https://chanzuckerberg.com/eoss/proposals/nextflow-and-nf-core/), the [Slack for Nonprofits program](https://slack.com/intl/en-gb/about/slack-for-good) and support from [Seqera Labs](https://www.seqera.io).", "images": [], "author": "Paolo Di Tommaso", "tags": "community, slack, github" @@ -458,7 +508,9 @@ "title": "Nextflow Summit 2022 Recap", "date": "2022-11-03T00:00:00.000Z", "content": "## Three days of Nextflow goodness in Barcelona\n\nAfter a three-year COVID-related hiatus from in-person events, Nextflow developers and users found their way to Barcelona this October for the 2022 Nextflow Summit. Held at Barcelona’s iconic Agbar tower, this was easily the most successful Nextflow community event yet!\n\nThe week-long event kicked off with 50 people participating in a hackathon organized by nf-core beginning on October 10th. 
The [hackathon](https://nf-co.re/events/2022/hackathon-october-2022) tackled several cutting-edge projects with developer teams focused on various aspects of nf-core including documentation, subworkflows, pipelines, DSL2 conversions, modules, and infrastructure. The Nextflow Summit began mid-week attracting nearly 600 people, including 165 attending in person and another 433 remotely. The [YouTube live streams](https://summit.nextflow.io/stream/) have now collected over two and half thousand views. Just prior to the summit, three virtual Nextflow training events were also run with separate sessions for the Americas, EMEA, and APAC in which 835 people participated.\n\n## An action-packed agenda\n\nThe three-day Nextflow Summit featured 33 talks delivered by speakers from academia, research, healthcare providers, biotechs, and cloud providers. This year’s speakers came from the following organizations:\n\n- Amazon Web Services\n- Center for Genomic Regulation\n- Centre for Molecular Medicine and Therapeutics, University of British Columbia\n- Chan Zukerberg Biohub\n- Curative\n- DNA Nexus\n- Enterome\n- Google\n- Janelia Research Campus\n- Microsoft\n- Oxford Nanopore\n- Quadram Institute BioScience\n- Seqera Labs\n- Quantitative Biology Center, University of Tübingen\n- Quilt Data\n- UNC Lineberger Comprehensive Cancer Center\n- Università degli Studi di Macerata\n- University of Maryland\n- Wellcome Sanger Institute\n- Wyoming Public Health Laboratory\n\n## Some recurring themes\n\nWhile there were too many excellent talks to cover individually, a few themes surfaced throughout the summit. Not surprisingly, SARS-Cov-2 was a thread that wound through several talks. Tony Zeljkovic from Curative led a discussion about [unlocking automated bioinformatics for large-scale healthcare](https://www.youtube.com/watch?v=JZMaRYzZxGU&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=8), and Thanh Le Viet of Quadram Institute Bioscience discussed [large-scale SARS-Cov-2 genomic surveillance at QIB](https://www.youtube.com/watch?v=6jQr9dDaais&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=30). Several speakers discussed best practices for building portable, modular pipelines. Other common themes were data provenance & traceability, data management, and techniques to use compute and storage more efficiently. There were also a few talks about the importance of dataflows in new application areas outside of genomics and bioinformatics.\n\n## Data provenance tracking\n\nIn the Thursday morning keynote, Rob Patro﹘Associate Professor at the University of Maryland Dept. of Computer Science and CTO and co-founder of Ocean Genomics﹘described in his talk “[What could be next(flow)](https://www.youtube.com/watch?v=vNrKFT5eT8U&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=6),” how far the Nextflow community had come in solving problems such as reproducibility, scalability, modularity, and ease of use. He then challenged the community with some complex issues still waiting in the wings. He focused on data provenance as a particularly vexing challenge explaining how tremendous effort currently goes into manual metadata curation.\n\nRob offered suggestions about how Nextflow might evolve, and coined the term “augmented execution contexts” (AECs) drawing from his work on provenance tracking – answering questions such as “what are these files, and where did they come from.” This thinking is reflected in [tximeta](https://github.com/mikelove/tximeta), a project co-developed with Mike Love of UNC. 
Rob also proposed ideas around automating data format conversions analogous to type casting in programming languages explaining how such conversions might be built into Nextflow channels to make pipelines more interoperable.\n\nIn his talk with the clever title “[one link to rule them all](https://www.youtube.com/watch?v=dttkcuP3OBc&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=13),” Aneesh Karve of Quilt explained how every pipeline run is a function of the code, environment, and data, and went on to show how Quilt could help dramatically simplify data management with dataset versioning, accessibility, and verifiability. Data provenance and traceability were also front and center when Yih-Chii Hwang of DNAnexus described her team’s work around [bringing GxP compliance to Nextflow workflows](https://www.youtube.com/watch?v=RIwpJTDlLiE&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=21).\n\n## Data management and storage\n\nOther speakers also talked about challenges related to data management and performance. Angel Pizarro of AWS gave an interesting talk comparing the [price/performance of different AWS cloud storage options](https://www.youtube.com/watch?v=VXtYCAqGEQQ&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=12). [Hatem Nawar](https://www.youtube.com/watch?v=jB91uqUqsRM&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=9) (Google) and [Venkat Malladi](https://www.youtube.com/watch?v=GAIL8ZAMJPQ&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=20) (Microsoft) also talked about cloud economics and various approaches to data handling in their respective clouds. Data management was also a key part of Evan Floden’s discussion about Nextflow Tower where he discussed Tower Datasets, as well as the various cloud storage options accessible through Nextflow Tower. Finally, Nextflow creator Paolo Di Tommaso unveiled new work being done in Nextflow to simplify access to data residing in object stores in his talk “[Nextflow and the future of containers](https://www.youtube.com/watch?v=PTbiCVq0-sE&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=14)”.\n\n## Compute optimization\n\nAnother recurring theme was improving compute efficiency. Several talks discussed using containers more effectively, leveraging GPUs & FPGAs for added performance, improving virtual machine instance type selection, and automating resource requirements. Mike Smoot of Illumina talked about Nextflow, Kubernetes, and DRAGENs and how Illumina’s FPGA-based Bio-IT Platform can dramatically accelerate analysis. Venkat Malladi discussed efforts to suggest optimal VM types based on different standardized nf-core labels in the Azure cloud (process_low, process_medium, process_high, etc.) Finally, Evan Floden discussed [Nextflow Tower](https://www.youtube.com/watch?v=yJpN3fRSClA&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=22) and unveiled an exciting new [resource optimization feature](https://seqera.io/blog/optimizing-resource-usage-with-nextflow-tower/) that can intelligently tune pipeline resource requests to radically reduce cloud costs and improve run speed. Overall, the Nextflow community continues to make giant strides in improving efficiency and managing costs in the cloud.\n\n## Beyond genomics\n\nWhile most summit speakers focused on genomics, a few discussed data pipelines in other areas, including statistical modeling, analysis, and machine learning. 
Nicola Visonà from Università degli Studi di Macerata gave a fascinating talk about [using agent-based models to simulate the first industrial revolution](https://www.youtube.com/watch?v=PlKJ0IDV_ds&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=27). Similarly, Konrad Rokicki from the Janelia Research Campus explained how Janelia are using [Nextflow for petascale bioimaging data](https://www.youtube.com/watch?v=ZjSzx1I76z0&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=18) and why bioimage processing remains a large domain area with an unmet need for reproducible workflows.\n\n## Summit Announcements\n\nThis year’s summit also saw several exciting announcements from Nextflow developers. Paolo Di Tommaso, during his talk on [the future of containers](https://www.youtube.com/watch?v=PTbiCVq0-sE&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=14), announced the availability of [Nextflow 22.10.0](https://github.com/nextflow-io/nextflow/releases/tag/v22.10.0). In addition to various bug fixes, the latest Nextflow release introduces an exciting new technology called Wave that allows containers to be built on the fly from Dockerfiles or Conda recipes saved within a Nextflow pipeline. Wave also helps to simplify containerized pipeline deployment with features such as “container augmentation”; enabling developers to inject new container scripts and functionality on the fly without needing to rebuild the base containers such as a cloud-native [Fusion file system](https://www.nextflow.io/docs/latest/fusion.html). When used with Nextflow Tower, Wave also simplifies authentication to various public and private container registries. The latest Nextflow release also brings improved support for Kubernetes and enhancements to documentation, along with many other features.\n\nSeveral other announcements were made during [Evan Floden’s talk](https://www.youtube.com/watch?v=yJpN3fRSClA&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=22&t=127s), such as:\n\n- MultiQC is joining the Seqera Labs family of products\n- Fusion – a distributed virtual file system for cloud-native data pipelines\n- Nextflow Tower support for Google Cloud Batch\n- Nextflow Tower resource optimization\n- Improved Resource Labels support in Tower with integrations for cost accounting with all major cloud providers\n- A new Nextflow Tower dashboard coming soon, providing visibility across workspaces\n\n## Thank you to our sponsors\n\nThe summit organizers wish to extend a sincere thank you to the event sponsors: AWS, Google Cloud, Seqera Labs, Quilt Data, Oxford Nanopore Technologies, and Element BioSciences. In addition, the [Chan Zuckerberg Initiative](https://chanzuckerberg.com/eoss/) continues to play a key role with their EOSS grants funding important work related to Nextflow and the nf-core community. 
The success of this year’s summit reminds us of the tremendous value of community and the critical impact of open science software in improving the quality, accessibility, and efficiency of scientific research.\n\n## Learning more\n\nFor anyone who missed the summit, you can still watch the sessions or view the training sessions at your convenience:\n\n- Watch post-event recordings of the [Nextflow Summit on YouTube](https://www.youtube.com/playlist?list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32)\n- View replays of the recent online [Nextflow and nf-core training](https://nf-co.re/events/2022/training-october-2022)\n\nFor additional detail on the summit and the preceding nf-core events, also check out an excellent [summary of the event](https://mribeirodantas.xyz/blog/index.php/2022/10/27/nextflow-and-nf-core-hot-news/) written by Marcel Ribeiro-Dantas in his blog, the [Dataist Storyteller](https://mribeirodantas.xyz/blog/index.php/2022/10/27/nextflow-and-nf-core-hot-news/)!\n\n_In this event, we're showcasing some of the results of the project 'Optimization of computational resources for HPC workloads in the cloud using ML/AI' by Seqera Labs S.L. This project has been funded by the European Regional Development Fund (ERDF) of the European Union, coordinated and managed by RED.es, with the aim of carrying out the development of technological entrepreneurship and technological demand, within the framework of the Strategic Action on Digital Economy and Society of the State Program for R&D&I oriented towards societal challenges._\n\n![grant logos](/img/blog-2022-11-03--img1.png)", - "images": [], + "images": [ + "/img/blog-2022-11-03--img1.png" + ], "author": "Noel Ortiz", "tags": "nextflow,tower,cloud" }, @@ -476,7 +528,9 @@ "title": "Rethinking containers for cloud native pipelines", "date": "2022-10-13T00:00:00.000Z", "content": "Containers have become an essential part of well-structured data analysis pipelines. They encapsulate applications and dependencies in portable, self-contained packages that can be easily distributed. Containers are also key to enabling predictable and [reproducible results](https://www.nature.com/articles/nbt.3820).\n\nNextflow was one of the first workflow technologies to fully embrace [containers](https://www.nextflow.io/blog/2014/using-docker-in-hpc-cluster.html) for data analysis pipelines. Community curated container collections such as [BioContainers](https://biocontainers.pro/) also helped speed container adoption.\n\nHowever, the increasing complexity of data analysis pipelines and the need to deploy them across different clouds and platforms pose new challenges. Today, workflows may comprise dozens of distinct container images. Pipeline developers must manage and maintain these containers and ensure that their functionality precisely aligns with the requirements of every pipeline task.\n\nAlso, multi-cloud deployments and the increased use of private container registries further increase complexity for developers. Building and maintaining containers, pushing them to multiple registries, and dealing with platform-specific authentication schemes are tedious, time consuming, and a source of potential errors.\n\n## Wave – a game changer\n\nFor these reasons, we decided to fundamentally rethink how containers are deployed and managed in Nextflow. 
Today we are thrilled to announce Wave — a container provisioning and augmentation service that is fully integrated with the Nextflow and Nextflow Tower ecosystems.\n\nInstead of viewing containers as separate artifacts that need to be integrated into a pipeline, Wave allows developers to manage containers as part of the pipeline itself. This approach helps simplify development, improves reliability, and makes pipelines easier to maintain. It can even improve pipeline performance.\n\n## How container provisioning works with Wave\n\nInstead of creating container images, pushing them to registries, and referencing them using Nextflow's [container](https://www.nextflow.io/docs/latest/process.html#container) directive, Wave allows developers to simply include a Dockerfile in the directory where a process is defined.\n\nWhen a process runs, the new Wave plug-in for Nextflow takes the Dockerfile and submits it to the Wave service. Wave then builds a container on-the-fly, pushes it to a destination container registry, and returns the container used for the actual process execution. The Wave service also employs caching at multiple levels to ensure that containers are built only once or when there is a change in the corresponding Dockerfile.\n\nThe registry where images are stored can be specified in the Nextflow config file, along with the other pipeline settings. This means containers can be served from cloud registries closer to where pipelines execute, delivering better performance and reducing network traffic.\n\n![Wave diagram](/img/wave-diagram.png)\n\n## Nextflow, Wave, and Conda – a match made in heaven\n\n[Conda](https://conda.io/) is an excellent package manager, fully [supported in Nextflow](https://www.nextflow.io/blog/2018/conda-support-has-landed.html) as an alternative to using containers to manage software dependencies in pipelines. However, until now, Conda could not be easily used in cloud-native computing platforms such as AWS Batch or Kubernetes.\n\nWave provides developers with a powerful new way to leverage Conda in Nextflow by using a [conda](https://www.nextflow.io/docs/latest/process.html#conda) directive as an alternative way to provision containers in their pipelines. When Wave encounters the `conda` directive in a process definition, and no container or Dockerfile is present, Wave automatically builds a container based on the Conda recipe using the strategy described above. Wave makes this process exceptionally fast (at least compared to vanilla Conda) by leveraging with the [Micromamba](https://github.com/mamba-org/mamba) project under the hood.\n\n## Support for private registries\n\nA long-standing problem with containers in Nextflow was the lack of support for private container registries. Wave solves this problem by acting as an authentication proxy between the Docker client requesting the container and a target container repository. Wave relies on [Nextflow Tower](https://seqera.io/tower/) to authenticate user requests to container registries.\n\nTo access private container registries from a Nextflow pipeline, developers can simply specify their Tower access token in the pipeline configuration file and store their repository credentials in [Nextflow Tower](https://help.tower.nf/22.2/credentials/overview/) page in your account. Wave will automatically and securely use these credentials to authenticate to the private container registry.\n\n## But wait, there's more! 
Container augmentation!\n\nBy automatically building and provisioning containers, Wave dramatically simplifies how containers are handled in Nextflow. However, there are cases where organizations are required to use validated containers for security or policy reasons rather than build their own images, yet they still need to add functionality, for example site-specific scripts or logging agents, while keeping the base container layers intact.\n\nNextflow allows for the definition of pipeline level (and more recently module level) scripts executed in the context of the task execution environment. These scripts can be made accessible to the container environment by mounting a host volume. However, this approach only works when using a local or shared file system.\n\nWave solves these problems by dynamically adding one or more layers to an existing container image during the container image download phase from the registry. Developers can use container augmentation to inject an arbitrary payload into any container without re-building it. Wave then recomputes the image's final manifest, adding new layers and checksums on-the-fly, so that the final downloaded image reflects the added content.\n\nWith container augmentation, developers can include a directory called `resources` in pipeline [module directories](https://www.nextflow.io/docs/latest/dsl2.html#module-directory). When the corresponding containerized task is executed, Wave automatically mirrors the content of the resources directory in the root path of the container where it can be accessed by scripts running within the container.\n\n## A sneak preview of Fusion file system\n\nOne of the main motivations for implementing Wave is that we wanted to have the ability to easily package a Fusion client in containers to make this important functionality readily available in Nextflow pipelines.\n\nFusion implements a virtual distributed file system and presents a thin client that allows data hosted in AWS S3 buckets to be accessed via the standard POSIX filesystem interface expected by the pipeline tools. This client runs in the task container and is added automatically via the Wave augmentation capability. This makes Fusion functionality available for pipeline execution at runtime.\n\nThis means the Nextflow pipeline can use an AWS S3 bucket as the work directory, and pipeline tasks can access the S3 bucket natively as a local file system path. This is an important innovation as it avoids the additional step of copying files in and out of object storage. Fusion takes advantage of the Nextflow task segregation and idempotent execution model to optimise and speed up file access operations.\n\n## Getting started\n\nWave requires Nextflow version 22.10.0 or later and can be enabled by using the `-with-wave` command line option or by adding the following snippet in your nextflow.config file:\n\n```\nwave {\n    enabled = true\n    strategy = 'conda,container'\n}\n\ntower {\n    accessToken = \"\"\n}\n```\n\nThe use of the Tower access token is not mandatory; however, it is required to enable access to private repositories. The use of authentication also allows higher service rate limits compared to anonymous users. 
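\n\nAs a minimal sketch of what this looks like in practice (the process name and package pin below are purely illustrative, not taken from any particular pipeline), a process only needs to declare its Conda dependency and Wave provisions a container for it on the fly:\n\n```groovy\nprocess FASTQC {\n    // no container directive needed: Wave builds an image from the conda recipe\n    conda 'bioconda::fastqc=0.12.1'\n\n    input:\n    path reads\n\n    output:\n    path '*_fastqc.html'\n\n    \"\"\"\n    fastqc $reads\n    \"\"\"\n}\n```\n\n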
You can run a Nextflow pipeline such as rnaseq-nf with Wave, as follows:\n\n```\nnextflow run nextflow-io/rnaseq-nf -with-wave\n```\n\nThe configuration in the nextflow.config snippet above will enable the provisioning of Wave containers created starting from the `conda` requirements specified in the pipeline processes.\n\nYou can find additional information and examples in the Nextflow [documentation](https://www.nextflow.io/docs/latest/wave.html) and in the Wave [showcase project](https://github.com/seqeralabs/wave-showcase).\n\n## Availability\n\nThe Wave container provisioning service is available free of charge as technology preview to all Nextflow and Tower users. Wave supports all major container registries including [Docker Hub](https://hub.docker.com/), [Quay.io](https://quay.io/), [AWS Elastic Container Registry](https://aws.amazon.com/ecr/), [Google Artifact Registry](https://cloud.google.com/artifact-registry) and [Azure Container Registry](https://azure.microsoft.com/en-us/products/container-registry/).\n\nDuring the preview period, anonymous users can build up to 10 container images per day and pull 100 containers per hour. Tower authenticated users can build 100 container images per hour and pull 1000 containers per minute. After the preview period, we plan to make the Wave service available free of charge to academic users and open-source software (OSS) projects.\n\n## Conclusion\n\nSoftware containers greatly simplify the deployment of complex data analysis pipelines. However, there still have been many challenges preventing organizations from fully unlocking the potential of this exciting technology. For too long, containers have been viewed as a replacement for package managers, but they serve a different purpose.\n\nIn our view, it's time to re-consider containers as monolithic artifacts that are assembled separately from pipeline code. Instead, containers should be viewed simply as an execution substrate facilitating the deployment of the pipeline software dependencies defined via a proper package manager such as Conda.\n\nWave, Nextflow, and Nextflow Tower combine to fully automate the container lifecycle including management, provisioning and dependencies of complex data pipelines on-demand while removing unnecessary error-prone manual steps.\n", - "images": [], + "images": [ + "/img/wave-diagram.png" + ], "author": "Paolo Di Tommaso", "tags": "nextflow,tower,cloud" }, @@ -485,7 +539,11 @@ "title": "Turbo-charging the Nextflow command line with Fig!", "date": "2022-09-22T00:00:00.000Z", "content": "Nextflow is a powerful workflow manager that supports multiple container technologies, cloud providers and HPC job schedulers. It shouldn't be a surprise that wide ranging functionality leads to a complex interface, but comes with the drawback of many subcommands and options to remember. For a first-time user (and sometimes even for some long-time users) it can be difficult to remember everything. This is not a new problem for the command-line; even very common applications such as grep and tar are famous for having a bewildering array of options.\n\n![xkcd charge making fun of tar tricky command line arguments](/img/xkcd_tar_charge.png)\nhttps://xkcd.com/1168/\n\nMany tools have sprung up to make the command-line more user friendly, such as tldr pages and rich-click. [Fig](https://fig.io) is one such tool that adds powerful autocomplete functionality to your terminal. 
Fig gives you graphical popups with color-coded context, which are more dynamic than the plain shaded text you get for recent commands or the long blocks of text that appear after pressing tab.\n\nFig is compatible with most terminals, shells and IDEs (such as the VSCode terminal), is fully supported in MacOS, and has beta support for Linux and Windows. In MacOS, you can simply install it with `brew install --cask fig` and then run the `fig` command to set it up.\n\nWe have now added Nextflow for Fig. Thanks to Fig's open source core we were able to contribute specifications in TypeScript that will now be automatically added for anyone installing or updating Fig. Now, with Fig, when you start typing your Nextflow commands, you’ll see autocomplete suggestions based on what you are typing and what you have typed in the past, such as your favorite options.\n\n![GIF with a demo of nextflow log/list subcommands](/img/nxf-log-list-params.gif)\n\nThe Fig autocomplete functionality can also be adjusted to suit your preferences. Suggestions can be displayed in alphabetical order or as a list of your most recent commands. Similarly, suggestions can be displayed all the time or only when you press tab.\n\nThe Fig specification that we've written not only suggests commands and options, but dynamic inputs too. For example, finding previous run names when resuming or cleaning runs is tedious and error prone. Similarly, pipelines that you’ve already downloaded with `nextflow pull` will be autocompleted if they have been run in the past. You won't have to remember the full names anymore, as Fig generators in the autocomplete allow you to automatically complete the run name after typing a few letters where a run name is expected. Importantly, this also works for pipeline names!\n\n![GIF with a demo of nextflow pull/run/clean/view/config subcommands](/img/nxf-pull-run-clean-view-config.gif)\n\nFig for Nextflow will boost your productivity regardless of your experience level. If you run multiple pipelines during your day you will immediately see the benefit of Fig. Your productivity will increase by taking advantage of this autocomplete function for run and project names. For Nextflow newcomers it will provide an intuitive way to explore the Nextflow CLI with built-in help text.\n\nWhile Fig won’t replace the need to view help menus and documentation, it will undoubtedly save you time and energy searching for commands and copying and pasting run names. Take your coding to the next level using Fig!", - "images": [], + "images": [ + "/img/xkcd_tar_charge.png", + "/img/nxf-log-list-params.gif", + "/img/nxf-pull-run-clean-view-config.gif" + ], "author": "Marcel Ribeiro-Dantas", "tags": "nextflow,development,learning" }, @@ -543,7 +601,8 @@ "date": "2023-04-17T00:00:00.000Z", "content": "## Introduction\n\n
\n \"Mentorship\n \n\n*Nextflow and nf-core mentorship rocket.*\n\n
\n\nThe global Nextflow and nf-core community is thriving with strong engagement in several countries. As we continue to expand and grow, we remain committed to prioritizing inclusivity and actively reaching groups with low representation.\n\nThanks to the support of our Chan Zuckerberg Initiative Diversity and Inclusion grant, we established an international Nextflow and nf-core mentoring program. With the second round of the mentorship program now complete, we celebrate the success of the most recent cohort of mentors and mentees.\n\nFrom hundreds of applications, thirteen pairs of mentors and mentees were chosen for the second round of the program. For the past four months, they met regularly to collaborate on Nextflow or nf-core projects. The project scope was left up to the mentees, enabling them to work on any project aligned with their scientific interests and schedules.\n\nMentor-mentee pairs worked on a range of projects that included learning Nextflow and nf-core fundamentals, setting up Nextflow on their institutional clusters, translating Nextflow training material into other languages, and developing and implementing Nextflow and nf-core pipelines. Impressively, despite many mentees starting the program with very limited knowledge of Nextflow and nf-core, they completed the program with confidence and improved their abilities to develop and implement scalable and reproducible scientific workflows.\n\n![Map of mentor and mentee pairs](/img/mentorships-round2-map.png)
\n_The second round of the mentorship program was global._\n\n## Jing Lu (Mentee) & Moritz Beber (Mentor)\n\nJing joined the program with the goal of learning how to develop advanced Nextflow pipelines for disease surveillance at the Guangdong Provincial Center for Diseases Control and Prevention in China. His mentor was Moritz Beber from Denmark.\n\nTogether, Jing and Moritz developed a pipeline for the analysis of SARS-CoV-2 genomes from sewage samples. They also used GitHub and docker containers to make the pipeline more sharable and reproducible. In the future, Jing hopes to use Nextflow Tower to share the pipeline with other institutions.\n\n## Luria Leslie Founou (Mentee) & Sebastian Malkusch (Mentor)\n\nLuria's goal was to accelerate her understanding of Nextflow and apply it to her exploration of the resistome, virulome, mobilome, and phylogeny of bacteria at the Research Centre of Expertise and Biological Diagnostic of Cameroon. Luria was mentored by Sebastian Malkusch, Kolja Becker, and Alex Peltzer from the Boehringer Ingelheim Pharma GmbH & Co. KG in Germany.\n\nFor their project, Luria and her mentors developed a [pipeline](https://github.com/SMLMS/nfml) for mapping multi-dimensional feature space onto a discrete or continuous response variable by using multivariate models from the field of classical machine learning. Their pipeline will be able to handle classification, regression, and time-to-event models and can be used for model training, validation, and feature selection.\n\n## Sebastian Musundi (Mentee) & Athanasios Baltzis (Mentor)\n\nSebastian, from Mount Kenya University in Kenya, joined the mentorship program with the goal of using Nextflow pipelines to identify vaccine targets in Apicomplexan parasites. He was mentored by Athanasios Balzis from the Centre for Genomic Regulation in Spain.\n\nWith Athanasios’s help, Sebastian learned the fundamentals for developing Nextflow pipelines. During the learning process, they developed a [pipeline](https://github.com/sebymusundi/simple_RNA-seq) for customized RNA sequencing and a [pipeline](https://github.com/sebymusundi/AMR_pipeline) for predicting antimicrobial resistance genes. With his new skills, Sebastian plans to keep using Nextflow on a daily basis and start contributing to nf-core.\n\n## Juan Ugalde (Mentee) & Robert Petit (Mentor)\n\nJuan joined the mentorship program with the goal of improving his understanding of Nextflow to support microbial and viral analysis at the Universidad Andres Bello in Chile. Juan was mentored by Robert Petit from the Wyoming Public Health Laboratory in the USA. Robert is an experienced Nextflow mentor who also mentored in Round 1 of the program.\n\nJuan and Robert shared an interest in viral genomics. After learning more about the Nextflow and nf-core ecosystem, Robert mentored Juan as he developed a Nextflow viral amplicon analysis [pipeline](https://github.com/gene2dis/hantaflow). Juan will continue his Nextflow and nf-core journey by sharing his new knowledge with his group and incorporating it into his classes in the coming semester.\n\n## Bhargava Reddy Morampalli (Mentee) & Venkat Malladi (Mentor)\n\nBhargava studies at Massey University in New Zealand and joined the program with the goal of improving his understanding of Nextflow and resolving issues he was facing while developing a pipeline to analyze Nanopore direct RNA sequencing data. 
Bhargava was mentored by Venkat Malladi from Microsoft in the USA.\n\nBhargava and Venkat worked on Bhargava’s [pipeline](https://github.com/bhargava-morampalli/rnamods-nf/) to identify RNA modifications from bacteria. Their successes included advancing the pipeline and making Singularity images for the tools Bhargava was using to make it more reproducible. For Bhargava, the mentorship program was a great kickstart for learning Nextflow and his pipeline development. He hopes to continue to develop his pipeline and optimize it for cloud platforms in the future.\n\n## Odion Ikhimiukor (Mentee) & Ben Sherman (Mentor)\n\nBefore the program, Odion, who is at the University at Albany in the USA, was new to Nextflow and nf-core. He joined the program with the goal of improving his understanding and to learn how to develop pipelines for bacterial genome analysis. His mentor Ben Sherman works for Seqera Labs in the USA.\n\nDuring the program Odion and Ben developed a [pipeline](https://github.com/odionikh/nf-practice) to analyze bacterial genomes for antimicrobial resistance surveillance. They also developed configuration settings to enable the deployment of their pipeline with high and low resources. Odion has plans to share his new knowledge with others in his community.\n\n## Batool Almarzouq (Mentee) & Murray Wham (Mentor)\n\nBatool works at the King Abdullah International Medical Research Center in Saudi Arabia. Her goal for the mentorship program was to contribute to, and develop, nf-core pipelines.\nAdditionally, she aimed to develop new educational resources for nf-core that can support researchers from lowly represented groups. Her mentor was Murray Wham from the ​​University of Edinburgh in the UK.\n\nDuring the mentorship program, Murray helped Batool develop her molecular dynamics pipeline and participate in the 1st Biohackathon in MENA (KAUST). Batool and Murray also found ways to make documentation more accessible and are actively promoting Nextlfow and nf-core in Saudi Arabia.\n\n## Mariama Telly Diallo (Mentee) & Emilio Garcia (Mentor)\n\nMariama Telly joined the mentorship program with the goal of developing and implementing Nextflow pipelines for malaria research at the Medical Research Unit at The London School of Hygiene and Tropical Medicine in Gambia. She was mentored by Emilio Garcia from Platomics in Austria. Emilio is another experienced mentor who joined the program for a second time.\n\nTogether, Mariama Telly and Emilio worked on learning the basics of Nextflow, Git, and Docker. Putting these skills into practice they started to develop a Nextflow pipeline with a docker file and custom configuration. Mariama Telly greatly improved her understanding of best practices and Nextflow and intends to use her newfound knowledge for future projects.\n\n## Anabella Trigila (Mentee) & Matthias De Smet (Mentor)\n\nAnabella’s goal was to set up Nextflow on her institutional cluster at Héritas S.A. in Argentina and translate some bash pipelines into Nextflow pipelines. Anabella was mentored by Matthias De Smet from Ghent University in Belgium.\n\nAnabella and Matthias worked on developing several new nf-core modules. Extending this, they started the development of a [pipeline](https://github.com/atrigila/nf-core-saliva) to process VCFs obtained from saliva samples and a [pipeline](https://github.com/atrigila/nf-core-ancestry) to infer ancestry from VCF samples. 
Anabella has now transitioned from a user to a developer and made multiple contributions to the most recent nf-core hackathon. She also contributed to the Spanish translation of the Nextflow [training material](https://training.nextflow.io/es/).\n\n## Juliano de Oliveira Silveira (Mentee) & Maxime Garcia (Mentor)\n\nJuliano works at the Laboratório Central de Saúde Pública RS in Brazil. He joined the program with the goal of setting up Nextflow at his institution, which led him to learn to write his own pipelines. Juliano was mentored by Maxime Garcia from Seqera Labs in Sweden.\n\nJuliano and Maxime worked on learning about Nextflow and nf-core. Juliano applied his new skills to an open-source bioinformatics program that used Nextflow with a customized R script. Juliano hopes to give back to the wider community and peers in Brazil.\n\n## Patricia Agudelo-Romero (Mentee) & Abhinav Sharma (Mentor)\n\nPatricia's goal was to create, customize, and deploy nf-core pipelines at the Telethon Kids Institute in Australia. Her mentor was Abhinav Sharma from Stellenbosch University in South Africa.\n\nAbhinav helped Patricia learn how to write reproducible pipelines with Nextflow and how to work with shared code repositories on GitHub. With Abhinav's support, Patricia worked on translating a Snakemake [pipeline](https://github.com/agudeloromero/everest_nf) designed for genome virus identification and classification into Nextflow. Patricia is already applying her new skills and supporting others at her institute as they adopt Nextflow.\n\n## Mariana Guilardi (Mentee) & Alyssa Briggs (Mentor)\n\nMariana’s goal was to learn the fundamentals of Nextflow, construct and run pipelines, and help with nf-core pipeline development. Her mentor was Alyssa Briggs from the University of Texas at Dallas in the USA\n\nAt the start of the program, Alyssa helped Mariana learn the fundamentals of Nextflow. With Alyssa’s help, Mariana’s skills progressed rapidly and by the end of the program, they were running pipelines and developing new nf-core modules and the [nf-core/viralintegration](https://github.com/nf-core/viralintegration) pipeline. Mariana also made community contributions to the Portuguese translation of the Nextflow [training material](https://training.nextflow.io/pt/).\n\n## Liliane Cavalcante (Mentee) & Marcel Ribeiro-Dantas (Mentor)\n\nLiliane’s goal was to develop and apply Nextflow pipelines for genomic and epidemiological analyses at the Laboratório Central de Saúde Pública Noel Nutels in Brazil. Her mentor was Marcel Ribeiro-Dantas from Seqera Labs in Brazil.\n\nLiliane and Marcel used Nextflow and nf-core to analyze SARS-CoV-2 genomes and demographic data for public health surveillance. They used the [nf-core/viralrecon](https://nf-co.re/viralrecon) pipeline and made a new Nextflow script for additional analysis and generating graphs.\n\n## Conclusion\n\nAs with the first round of the program, the feedback about the second round of the mentorship program was overwhelmingly positive. All mentees found the experience to be highly beneficial and were grateful for the opportunity to participate.\n\n> *“Having a mentor guide through the entire program was super cool. We worked all the way from the basics of Nextflow and learned a lot about developing and debugging pipelines. 
Today, I feel more confident than before in using Nextflow on a daily basis.”* - Sebastian Musundi (Mentee)\n\nSimilarly, the mentors also found the experience to be highly rewarding.\n\n> *“As a mentor, I really enjoyed participating in the program. Not only did I have the chance to support and work with colleagues from lowly represented regions, but also I learned a lot and improved myself through the mentoring and teaching process.”* - Athanasios Baltzis (Mentor)\n\nImportantly, all program participants expressed their willingness to encourage others to be part of it in the future.\n\n> *“The mentorship allows mentees not only to learn nf-core/Nextflow but also a lot of aspects about open-source reproducible research. With your learning, at the end of the mentorship, you could even contribute back to the nf-core community, which is fantastic! I would tell everyone who is interested in the program to go for it.”* - Anabella Trigila (Mentee)\n\nAs the Nextflow and nf-core communities continue to grow, the mentorship program will have long-lasting benefits beyond those that can be immediately measured. Mentees from the program have already become positive role models, contributing new perspectives to the broader community.\n\n> *“I highly recommend this program. Independent if you are new to Nextflow or already have some experience, the possibility of working with amazing people to learn about the Nextflow ecosystem is invaluable. It helped me to improve my work, learn new things, and become confident enough to teach Nextflow to students.”* - Juan Ugalde (Mentee)\n\nWe were delighted with the achievements of the mentors and mentees. Applications for the third round are now open! For more information, please visit https://nf-co.re/mentorships.", "images": [ - "/img/mentorships-round2-rocket.png" + "/img/mentorships-round2-rocket.png", + "/img/mentorships-round2-map.png" ], "author": "Chris Hakkaart", "tags": "nextflow,nf-core,czi,mentorship,training" @@ -554,7 +613,9 @@ "date": "2023-11-13T00:00:00.000Z", "content": "
\n \"Mentorship\n \n\n*Nextflow and nf-core mentorship rocket.*\n\n
\n\nWith the third round of the [Nextflow and nf-core mentorship program](https://nf-co.re/mentorships) now behind us, it's time to pop the confetti and celebrate the outstanding achievements of our latest group of mentors and mentees!\n\nAs with the [first](https://www.nextflow.io/blog/2022/czi-mentorship-round-1.html) and [second](https://www.nextflow.io/blog/2023/czi-mentorship-round-2.html) rounds of the program, we received hundreds of applications from all over the world. Mentors and mentees were matched based on compatible interests and time zones and set off to work on a project of their choosing. Pairs met regularly to work on their projects and reported back to the group to discuss their progress every month.\n\nThe mentor-mentee duos chose to tackle many interesting projects during the program. From learning how to develop pipelines with Nextflow and nf-core to setting up Nextflow on their institutional clusters and translating Nextflow training materials into other languages, this cohort of mentors and mentees did it all. Despite initial challenges, every pair emerged from the program brimming with confidence and a knack for building scalable and reproducible scientific workflows with Nextflow. Way to go, team!\n\n![Map of mentor and mentee pairs](/img/mentorship_3_map.png)
\n_Participants of the third round of the mentorship program._\n\n## Abhay Rastogi and Matthias De Smet\n\nAbhay Rastogi is a Clinical Research Fellow at the All India Institute Of Medical Sciences (AllMS Delhi). During the program, he wanted to contribute to the [nf-core/sarek](https://github.com/nf-core/sarek/) pipeline. He was mentored by Matthias De Smet, a Bioinformatician at the Center for Medical Genetics in the Ghent University Hospital. Together they worked on developing an nf-core module for Exomiser, a variant prioritization tool for short-read WGS data that they hope to incorporate into [nf-core/sarek](https://github.com/nf-core/sarek/). Keep an eye out for this brand new feature as they continue to work towards implementing this new feature into the [nf-core/sarek](https://github.com/nf-core/sarek/) pipeline!\n\n## Alan Möbbs and Simon Pearce\n\nAlan Möbbs, a Bioinformatics Analyst at MultiplAI, was mentored by Simon Pearce, Principal Bioinformatician at the Cancer Research UK Cancer Biomarker Centre. During the program, Alan wanted to create a custom pipeline that merges functionalities from the [nf-core/rnaseq](https://github.com/nf-core/rnaseq/) and [nf-core/rnavar](https://github.com/nf-core/rnavar/) pipelines. They started their project by forking the [nf-core/rnaseq](https://github.com/nf-core/rnaseq/) pipeline and adding a subworkflow with variant calling functionalities. As the project moved on, they were able to remove tools from the pipeline that were no longer required. Finally, they created some custom definitions for processing samples and work queues to optimize the workflow on AWS. Alan plans to keep working on this project in the future.\n\n## Cen Liau and Chris Hakkaart\n\nCen Liau is a scientist at the Bragato Research Institute in New Zealand, analyzing the epigenetics of grapevines in response to environmental stress. Her mentor was Chris Hakkaart, a Developer Advocate at Seqera. They started the program by deploying the [nf-core/methylseq](https://github.com/nf-core/methylseq/) pipeline on New Zealand’s national infrastructure to analyze data Cen had produced. Afterward, they started to develop a proof of concept methylation pipeline to analyze additional data Cen has produced. Along the way, they learned about nf-core best practices and how to use GitHub to build pipelines collaboratively.\n\n## Chenyu Jin and Ben Sherman\n\nChenyu Jin is a Ph.D. student at the Center for Palaeogenetics of the Swedish Museum of Natural History. She worked with Ben Sherman, a Software Engineer at Seqera. Together they worked towards establishing a workflow for recursive step-down classification using experimental Nextflow features. During the program, they made huge progress in developing a cutting-edge pipeline that can be used for analyzing ancient environmental DNA and reconstructing flora and fauna. Watch this space for future developments!\n\n## Georgie Samaha and Cristina Tuñí i Domínguez\n\nGeorgie Samaha, a bioinformatician from the University of Sydney, was mentored by Cristina Tuñi i Domínguez, a Bioinformatics Scientist at Flomics Biotech SL. During the program, they developed Nextflow configuration files. As a part of this, they built institutional configuration files for multiple national research HPC and cloud infrastructures in Australia. 
Towards the end of the mentorship, they [built a tool for building configuration files](https://github.com/georgiesamaha/configBuilder-nf) that they hope to share widely in the future.\n\n## Ícaro Maia Santos de Castro and Robert Petit\n\nÍcaro Maia Santos is a Ph.D. Candidate at the University of São Paulo. He was mentored by Robert, a Research Scientist from Wyoming Public Health Lab. After learning the basics of Nextflow and nf-core, they worked on a [metatranscriptomics pipeline](https://github.com/icaromsc/nf-core-phiflow) that simultaneously characterizes microbial composition and host gene expression RNA sequencing samples. As a part of this process, they used nf-core modules that were already available and developed and contributed new modules to the nf-core repository. Ícaro found having someone to help him learn and overcome issues as he was developing his pipeline was invaluable for his career.\n\n![phiflow metro map](/img/phiflow_metro_map.png)
\n_Metro map of the phiflow workflow._\n\n## Lila Maciel Rodríguez Pérez and Priyanka Surana\n\nLila Maciel Rodríguez Pérez, from the National Agrarian University in Peru, was mentored by Priyanka Surana, a researcher from the Wellcome Sanger Institute in the UK. Lila and Priyanka focused on building and deploying Nextflow scripts for metagenomic assemblies. In particular, they were interested in the identification of Antibiotic-Resistant Genes (ARG), Metal-Resistant Genes (MRG), and Mobile Genetic Elements (MGE) in different environments, and in figuring out how these genes are correlated. Both Lila and Priyanka spoke highly of each other and how much they enjoyed being a part of the program.\n\n## Luisa Sacristan and Gisela Gabernet\n\nLuisa is an MSc. student studying computational biology in the Computational Biology and Microbial Ecology group at Universidad de los Andes in Colombia. She was mentored by Gisela Gabernet, a researcher at Yale Medical School. At the start of the program, Luisa and Gisela focused on learning more about GitHub. They quickly moved on to developing an nf-core configuration file for Luisa’s local university cluster. Finally, they started developing a pipeline for the analysis of custom ONT metagenomic amplicons from coffee beans.\n\n## Natalia Coutouné and Marcel Ribeiro-Dantas\n\nNatalia Coutoné is a Ph.D. Candidate at the University of Campinas in Brazil. Her mentor was Marcel Ribeiro-Dantas from Seqera. Natalia and Marcel worked on developing a pipeline to identify relevant QTL among two or more pool-seq samples. Learning the little things, such as how and where to get help was a valuable part of the learning process for Natalia. She also found it especially useful to consolidate a “Frankenstein” pipeline she had been using into a cohesive Nextflow pipeline that she could share with others.\n\n## Raquel Manzano and Maxime Garcia\n\nRaquel Manzano is a bioinformatician and Ph.D. candidate at the University of Cambridge, Cancer Research UK Cambridge Institute. She was mentored by Maxime Garcia, a bioinformatics engineer at Seqera. During the program, they spent their time developing the [nf-core/rnadnavar](https://github.com/nf-core/rnadnavar/) pipeline. Initially designed for cancer research, this pipeline identifies a consensus call set from RNA and DNA somatic variant calling tools. Both Raquel and Maxime found the program to be highly rewarding. Raquel’s [presentation](https://www.youtube.com/watch?v=PzGOvqSI5n0) about the rnadnavar pipeline and her experience as a mentee from the 2023 Nextflow Summit in Barcelona is now online.\n\n## Conclusion\n\nWe are thrilled to report that the feedback from both mentors and mentees has been overwhelmingly positive. Every participant, whether mentor or mentee, found the experience extremely valuable and expressed gratitude for the chance to participate.\n\n> *“I loved the experience and the opportunity to develop my autonomy in nextflow/nf-core. This community is totally amazing!”* - Icaro Castro\n\n> *“I think this was a great opportunity to learn about a tool that can make our day-to-day easier and reproducible. 
Who knows, maybe it can give you a better chance when applying for jobs.”* - Alan Möbbs\n\nThanks to the fantastic support of the Chan Zuckerberg Initiative Diversity and Inclusion grant, Seqera, and our fantastic community, who made it possible to run all three rounds of the Nextflow and nf-core mentorship program.", "images": [ - "/img/mentorship_3_sticker.png" + "/img/mentorship_3_sticker.png", + "/img/mentorship_3_map.png", + "/img/phiflow_metro_map.png" ], "author": "Marcel Ribeiro-Dantas", "tags": "nextflow,nf-core,czi,mentorship" @@ -609,7 +670,8 @@ "images": [ "/img/blog-summit-2023-recap--img1b.jpg", "/img/blog-summit-2023-recap--img2b.jpg", - "/img/blog-summit-2023-recap--img3b.jpg" + "/img/blog-summit-2023-recap--img3b.jpg", + "/img/blog-2022-11-03--img1.png" ], "author": "Noel Ortiz", "tags": "nextflow,summit,event,hackathon" @@ -619,7 +681,10 @@ "title": "Get started with Nextflow on Google Cloud Batch", "date": "2023-02-01T00:00:00.000Z", "content": "[We have talked about Google Cloud Batch before](https://www.nextflow.io/blog/2022/deploy-nextflow-pipelines-with-google-cloud-batch.html). Not only that, we were proud to announce Nextflow support to Google Cloud Batch right after it was publicly released, back in July 2022. How amazing is that? But we didn't stop there! The [Nextflow official documentation](https://www.nextflow.io/docs/latest/google.html) also provides a lot of useful information on how to use Google Cloud Batch as the compute environment for your Nextflow pipelines. Having said that, feedback from the community is valuable, and we agreed that in addition to the documentation, teaching by example, and in a more informal language, can help many of our users. So, here is a tutorial on how to use the Batch service of the Google Cloud Platform with Nextflow 🥳\n\n### Running an RNAseq pipeline with Google Cloud Batch\n\nWelcome to our RNAseq tutorial using Nextflow and Google Cloud Batch! RNAseq is a powerful technique for studying gene expression and is widely used in a variety of fields, including genomics, transcriptomics, and epigenomics. In this tutorial, we will show you how to use Nextflow, a popular workflow management tool, to run a proof-of-concept RNAseq pipeline to perform the analysis on Google Cloud Batch, a scalable cloud-based computing platform. For a real Nextflow RNAseq pipeline, check [nf-core/rnaseq](https://github.com/nf-core/rnaseq). For the proof-of-concept RNAseq pipeline that we will use here, check [nextflow-io/rnaseq-nf](https://github.com/nextflow-io/rnaseq-nf).\n\nNextflow allows you to easily develop, execute, and scale complex pipelines on any infrastructure, including the cloud. Google Cloud Batch enables you to run batch workloads on Google Cloud Platform (GCP), with the ability to scale up or down as needed. Together, Nextflow and Google Cloud Batch provide a powerful and flexible solution for RNAseq analysis.\n\nWe will walk you through the entire process, from setting up your Google Cloud account and installing Nextflow to running an RNAseq pipeline and interpreting the results. By the end of this tutorial, you will have a solid understanding of how to use Nextflow and Google Cloud Batch for RNAseq analysis. So let's get started!\n\n### Setting up Google Cloud CLI (gcloud)\n\nIn this tutorial, you will learn how to use the gcloud command-line interface to interact with the Google Cloud Platform and set up your Google Cloud account for use with Nextflow. 
If you do not already have gcloud installed, you can follow the instructions [here](https://cloud.google.com/sdk/docs/install) to install it. Once you have gcloud installed, run the command `gcloud init` to initialize the CLI. You will be prompted to choose an existing project to work on or create a new one. For the purpose of this tutorial, we will create a new project. Name your project \"my-rnaseq-pipeline\". There may be a lot of information displayed on the screen after running this command, but you can ignore it for now.\n\n### Setting up Batch and Storage in Google Cloud Platform\n\n#### Enable Google Batch\n\nAccording to the [official Google documentation](https://cloud.google.com/batch/docs/get-started) _Batch is a fully managed service that lets you schedule, queue, and execute [batch processing](https://en.wikipedia.org/wiki/Batch_processing) workloads on Compute Engine virtual machine (VM) instances. Batch provisions resources and manages capacity on your behalf, allowing your batch workloads to run at scale_.\n\nThe first step is to download the `beta` command group. You can do this by executing:\n\n```bash\n$ gcloud components install beta\n```\n\nThen, enable billing for this project. You will first need to get your account id with\n\n```bash\n$ gcloud beta billing accounts list\n```\n\nAfter that, you will see something like the following appear in your window:\n\n```console\nACCOUNT_ID NAME OPEN MASTER_ACCOUNT_ID\nXXXXX-YYYYYY-ZZZZZZ My Billing Account True\n```\n\nIf you get the error “Service Usage API has not been used in project 842841895214 before or it is disabled”, simply run the command again and it should work. Then copy the account id, and the project id and paste them into the command below. This will enable billing for your project id.\n\n```bash\n$ gcloud beta billing projects link PROJECT-ID --billing-account XXXXXX-YYYYYY-ZZZZZZ\n```\n\nNext, you must enable the Batch API, along with the Compute Engine and Cloud Logging APIs. You can do so with the following command:\n\n```bash\n$ gcloud services enable batch.googleapis.com compute.googleapis.com logging.googleapis.com\n```\n\nYou should see a message similar to the one below:\n\n```console\nOperation \"operations/acf.p2-AAAA-BBBBB-CCCC--DDDD\" finished successfully.\n```\n\n#### Create a Service Account\n\nIn order to access the APIs we enabled, you need to [create a Service Account](https://cloud.google.com/iam/docs/creating-managing-service-accounts#iam-service-accounts-create-gcloud) and set the necessary IAM roles for the project. 
You can create the Service Account by executing:\n\n```bash\n$ gcloud iam service-accounts create rnaseq-pipeline-sa\n```\n\nAfter this, set appropriate roles for the project using the commands below:\n\n```bash\n$ gcloud projects add-iam-policy-binding my-rnaseq-pipeline \\\n--member=\"serviceAccount:rnaseq-pipeline-sa@my-rnaseq-pipeline.iam.gserviceaccount.com\" \\\n--role=\"roles/iam.serviceAccountUser\"\n\n$ gcloud projects add-iam-policy-binding my-rnaseq-pipeline \\\n--member=\"serviceAccount:rnaseq-pipeline-sa@my-rnaseq-pipeline.iam.gserviceaccount.com\" \\\n--role=\"roles/batch.jobsEditor\"\n\n$ gcloud projects add-iam-policy-binding my-rnaseq-pipeline \\\n--member=\"serviceAccount:rnaseq-pipeline-sa@my-rnaseq-pipeline.iam.gserviceaccount.com\" \\\n--role=\"roles/logging.viewer\"\n\n$ gcloud projects add-iam-policy-binding my-rnaseq-pipeline \\\n--member=\"serviceAccount:rnaseq-pipeline-sa@my-rnaseq-pipeline.iam.gserviceaccount.com\" \\\n--role=\"roles/storage.admin\"\n```\n\n#### Create your Bucket\n\nNow it's time to create your Storage bucket, where your input, intermediate, and output files will be hosted and accessed by the Google Batch virtual machines. Your bucket name must be globally unique (across regions). For the example below, the bucket is named rnaseq-pipeline-bckt. However, as this name has already been taken, you will have to create a bucket with a different name and use it in place of rnaseq-pipeline-bckt throughout the rest of this tutorial:\n\n```bash\n$ gcloud storage buckets create gs://rnaseq-pipeline-bckt\n```\n\nNow it's time for Nextflow to join the party! 🥳\n\n### Setting up Nextflow to make use of Batch and Storage\n\n#### Write the configuration file\n\nHere you will set up a simple RNAseq pipeline with Nextflow to be run entirely on Google Cloud Platform (GCP) directly from your local machine.\n\nStart by creating a folder for your project on your local machine, such as “rnaseq-example”. It's important to mention that you can also go fully cloud and use a Virtual Machine for everything we will do here locally.\n\nInside the folder that you created for the project, create a file named `nextflow.config` with the following content (remember to replace PROJECT-ID with the project id you created above):\n\n```groovy\nworkDir = 'gs://rnaseq-pipeline-bckt/scratch'\n\nprocess {\n    executor = 'google-batch'\n    container = 'nextflow/rnaseq-nf'\n    errorStrategy = { task.exitStatus==14 ? 'retry' : 'terminate' }\n    maxRetries = 5\n}\n\ngoogle {\n    project = 'PROJECT-ID'\n    location = 'us-central1'\n    batch.spot = true\n}\n```\n\nThe `workDir` option tells Nextflow to use the bucket you created as the work directory. Nextflow will use this directory to stage your input data and store intermediate and final data. Nextflow does not allow you to use the root directory of a bucket as the work directory -- it must be a subdirectory instead. Using a subdirectory is also just a good practice.\n\nThe `process` scope tells Nextflow to run all the processes (steps) of your pipeline on Google Batch and to use the `nextflow/rnaseq-nf` Docker image hosted on DockerHub (default) for all processes. Also, the error strategy will automatically retry any failed tasks with exit code 14, which is the exit code for spot instances that were reclaimed.\n\nThe `google` scope is specific to Google Cloud. You need to provide the project id (don't provide the project name, it won't work!), and a Google Cloud location (leave it as above if you're not sure of what to put). 
In the example above, spot instances are also requested (more info about spot instances [here](https://www.nextflow.io/docs/latest/google.html#spot-instances)), which are cheaper instances that, as a drawback, can be reclaimed at any time if resources are needed by the cloud provider. Based on what we have seen so far, the `nextflow.config` file should contain \"my-rnaseq-pipeline\" as the project id.\n\nUse the command below to authenticate with Google Cloud Platform. Nextflow will use this account by default when you run a pipeline.\n\n```bash\n$ gcloud auth application-default login\n```\n\n#### Launch the pipeline!\n\nWith that done, you’re now ready to run the proof-of-concept RNAseq Nextflow pipeline. Instead of asking you to download it, or copy-paste something into a script file, you can simply provide the GitHub URL of the RNAseq pipeline mentioned at the beginning of [this tutorial](https://github.com/nextflow-io/rnaseq-nf), and Nextflow will do all the heavy lifting for you. This pipeline comes with test data bundled with it, and for more information about it and how it was developed, you can check the public training material developed by Seqera Labs.\n\nOne important thing to mention is that in this repository there is already a `nextflow.config` file with a different configuration, but don't worry about that. You can run the pipeline with the configuration file that we wrote above by using the `-c` Nextflow option. Run the command line below:\n\n```bash\n$ nextflow run nextflow-io/rnaseq-nf -c nextflow.config\n```\n\nWhile the pipeline stores everything in the bucket, our example pipeline will also download the final outputs to a local directory called `results`, because of how the `publishDir` directive was specified in the `main.nf` script (example [here](https://github.com/nextflow-io/rnaseq-nf/blob/ed179ef74df8d5c14c188e200a37fff61fd55dfb/modules/multiqc/main.nf#L5)). If you want to avoid the egress cost associated with downloading data from a bucket, you can change the `publishDir` to another bucket directory, e.g. `gs://rnaseq-pipeline-bckt/results`.\n\nIn your terminal, you should see something like this:\n\n![Nextflow ongoing run on Google Cloud Batch](/img/ongoing-nxf-gbatch.png)\n\nYou can check the status of your jobs on Google Batch by opening another terminal and running the following command:\n\n```bash\n$ gcloud batch jobs list\n```\n\nBy the end of it, if everything worked well, you should see something like:\n\n![Nextflow run on Google Cloud Batch finished](/img/nxf-gbatch-finished.png)\n\nAnd that's all, folks! 😆\n\nYou will find more information about Nextflow on Google Batch in [this blog post](https://www.nextflow.io/blog/2022/deploy-nextflow-pipelines-with-google-cloud-batch.html) and the [official Nextflow documentation](https://www.nextflow.io/docs/latest/google.html).\n\nSpecial thanks to Hatem Nawar, Chris Hakkaart, and Ben Sherman for providing valuable feedback to this document.\n", - "images": [], + "images": [ + "/img/ongoing-nxf-gbatch.png", + "/img/nxf-gbatch-finished.png" + ], "author": "Marcel Ribeiro-Dantas", "tags": "nextflow,google,cloud" }, @@ -731,7 +796,9 @@ "title": "Nextflow 24.04 - Release highlights", "date": "2024-05-27T00:00:00.000Z", "content": "We release an \"edge\" version of Nextflow every month and a \"stable\" version every six months. The stable releases are recommended for production usage and represent a significant milestone. 
The [release changelogs](https://github.com/nextflow-io/nextflow/releases) contain a lot of detail, so we thought we'd highlight some of the goodies that have just been released in Nextflow 24.04 stable. Let's get into it!\n\n:::tip\nWe also did a podcast episode about some of these changes!\nCheck it out here: [Channels Episode 41](/podcast/2024/ep41_nextflow_2404.html).\n:::\n\n## Table of contents\n\n- [New features](#new-features)\n - [Seqera Containers](#seqera-containers)\n - [Workflow output definition](#workflow-output-definition)\n - [Topic channels](#topic-channels)\n - [Process eval outputs](#process-eval-outputs)\n - [Resource limits](#resource-limits)\n - [Job arrays](#job-arrays)\n- [Enhancements](#enhancements)\n - [Colored logs](#colored-logs)\n - [AWS Fargate support](#aws-fargate-support)\n - [OCI auto pull mode for Singularity and Apptainer](#oci-auto-pull-mode-for-singularity-and-apptainer)\n - [Support for GA4GH TES](#support-for-ga4gh-tes)\n- [Fusion](#fusion)\n - [Enhanced Garbage Collection](#enhanced-garbage-collection)\n - [Increased File Handling Capacity](#increased-file-handling-capacity)\n - [Correct Publishing of Symbolic Links](#correct-publishing-of-symbolic-links)\n- [Other notable changes](#other-notable-changes)\n\n## New features\n\n### Seqera Containers\n\nA new flagship community offering was revealed at the Nextflow Summit 2024 Boston - **Seqera Containers**. This is a free-to-use container cache powered by [Wave](https://seqera.io/wave/), allowing anyone to request an image with a combination of packages from Conda and PyPI. The image will be built on demand and cached (for at least 5 years after creation). There is a [dedicated blog post](https://seqera.io/blog/introducing-seqera-pipelines-containers/) about this, but it's worth noting that the service can be used directly from Nextflow and not only through [https://seqera.io/containers/](https://seqera.io/containers/)\n\nIn order to use Seqera Containers in Nextflow, simply set `wave.freeze` _without_ setting `wave.build.repository` - for example, by using the following config for your pipeline:\n\n```groovy\nwave.enabled = true\nwave.freeze = true\nwave.strategy = 'conda'\n```\n\nAny processes in your pipeline specifying Conda packages will have Docker or Singularity images created on the fly (depending on whether `singularity.enabled` is set or not) and cached for immediate access in subsequent runs. These images will be publicly available. You can view all container image names with the `nextflow inspect` command.\n\n### Workflow output definition\n\nThe workflow output definition is a new syntax for defining workflow outputs:\n\n```groovy\nnextflow.preview.output = true // [!code ++]\n\nworkflow {\n main:\n ch_foo = foo(data)\n bar(ch_foo)\n\n publish:\n ch_foo >> 'foo' // [!code ++]\n}\n\noutput { // [!code ++]\n directory 'results' // [!code ++]\n mode 'copy' // [!code ++]\n} // [!code ++]\n```\n\nIt essentially provides a DSL2-style approach for publishing, and will replace `publishDir` once it is finalized. It also provides extra flexibility as it allows you to publish _any_ channel, not just process outputs. See the [Nextflow docs](https://nextflow.io/docs/latest/workflow.html#publishing-outputs) for more information.\n\n:::info\nThis feature is still in preview and may change in a future release.\nWe hope to finalize it in version 24.10, so don't hesitate to share any feedback with us!\n:::\n\n### Topic channels\n\nTopic channels are a new channel type introduced in 23.11.0-edge. 
A topic channel is essentially a queue channel that can receive values from multiple sources, using a matching name or \"topic\":\n\n```groovy\nprocess foo {\n output:\n val('foo'), topic: 'my-topic' // [!code ++]\n}\n\nprocess bar {\n output:\n val('bar'), topic: 'my-topic' // [!code ++]\n}\n\nworkflow {\n foo()\n bar()\n\n Channel.topic('my-topic').view() // [!code ++]\n}\n```\n\nTopic channels are particularly useful for collecting metadata from various places in the pipeline, without needing to write all of the channel logic that is normally required (e.g. using the `mix` operator). See the [Nextflow docs](https://nextflow.io/docs/latest/channel.html#topic) for more information.\n\n### Process `eval` outputs\n\nProcess `eval` outputs are a new type of process output which allows you to capture the standard output of an arbitrary shell command:\n\n```groovy\nprocess sayHello {\n output:\n eval('bash --version') // [!code ++]\n\n \"\"\"\n echo Hello world!\n \"\"\"\n}\n\nworkflow {\n sayHello | view\n}\n```\n\nThe shell command is executed alongside the task script. Until now, you would typically execute these supplementary commands in the main process script, save the output to a file or environment variable, and then capture it using a `path` or `env` output. The new `eval` output is a much more convenient way to capture this kind of command output directly. See the [Nextflow docs](https://nextflow.io/docs/latest/process.html#output-type-eval) for more information.\n\n#### Collecting software versions\n\nTogether, topic channels and eval outputs can be used to simplify the collection of software tool versions. For example, for FastQC:\n\n```groovy\nprocess FASTQC {\n input:\n tuple val(meta), path(reads)\n\n output:\n tuple val(meta), path('*.html'), emit: html\n tuple val(\"${task.process}\"), val('fastqc'), eval('fastqc --version'), topic: versions // [!code ++]\n\n \"\"\"\n fastqc $reads\n \"\"\"\n}\n\nworkflow {\n Channel.topic('versions') // [!code ++]\n | unique()\n | map { process, name, version ->\n \"\"\"\\\n ${process.tokenize(':').last()}:\n ${name}: ${version}\n \"\"\".stripIndent()\n }\n | collectFile(name: 'collated_versions.yml')\n | CUSTOM_DUMPSOFTWAREVERSIONS\n}\n```\n\nThis approach will be implemented across all nf-core pipelines, and will cut down on a lot of boilerplate code. Check out the full prototypes for nf-core/rnaseq [here](https://github.com/nf-core/rnaseq/pull/1109) and [here](https://github.com/nf-core/rnaseq/pull/1115) to see them in action!\n\n### Resource limits\n\nThe **resourceLimits** directive is a new process directive which allows you to define global limits on the resources requested by individual tasks. For example, if you know that the largest node in your compute environment has 24 CPUs, 768 GB or memory, and a maximum walltime of 72 hours, you might specify the following:\n\n```groovy\nprocess.resourceLimits = [ cpus: 24, memory: 768.GB, time: 72.h ]\n```\n\nIf a task requests more than the specified limit (e.g. due to [retry with dynamic resources](https://nextflow.io/docs/latest/process.html#dynamic-computing-resources)), Nextflow will automatically reduce the task resources to satisfy the limit, whereas normally the task would be rejected by the scheduler or would simply wait in the queue forever! The nf-core community has maintained a custom workaround for this problem, the `check_max()` function, which can now be replaced with `resourceLimits`. 
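\n\nAs a rough sketch of how this interacts with dynamic retries (the process selector and figures below are made up for illustration), a task that escalates its memory request on each attempt is simply capped at the configured ceiling instead of being rejected:\n\n```groovy\nprocess {\n    resourceLimits = [ cpus: 24, memory: 768.GB, time: 72.h ]\n\n    withName: 'ASSEMBLE' {\n        memory        = { 200.GB * task.attempt }   // 200 GB, 400 GB, 600 GB, ...\n        errorStrategy = 'retry'\n        maxRetries    = 3\n    }\n}\n```\n\nOn the fourth attempt the 800 GB request would exceed the limit, so Nextflow submits the task with 768 GB instead. 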
See the [Nextflow docs](https://nextflow.io/docs/latest/process.html#resourcelimits) for more information.\n\n### Job arrays\n\n**Job arrays** are now supported in Nextflow using the `array` directive. Most HPC schedulers, and even some cloud batch services including AWS Batch and Google Batch, support a \"job array\" which allows you to submit many independent jobs with a single job script. While the individual jobs are still executed separately as normal, submitting jobs as arrays where possible puts considerably less stress on the scheduler.\n\nWith Nextflow, using job arrays is a one-liner:\n\n```groovy\nprocess.array = 100\n```\n\nYou can also enable job arrays for individual processes like any other directive. See the [Nextflow docs](https://nextflow.io/docs/latest/process.html#array) for more information.\n\n:::tip\nOn Google Batch, using job arrays also allows you to pack multiple tasks onto the same VM by using the `machineType` directive in conjunction with the `cpus` and `memory` directives.\n:::\n\n## Enhancements\n\n### Colored logs\n\n
\n\n**Colored logs** have come to Nextflow! Specifically, the process log which is continuously printed to the terminal while the pipeline is running. Not only is it more colorful, but it also makes better use of the available space to show you what's most important. But we already wrote an entire [blog post](https://nextflow.io/blog/2024/nextflow-colored-logs.html) about it, so go check that out for more details!\n\n
\n\n![New coloured output from Nextflow](/img/blog-nextflow-colored-logs/nextflow_coloured_logs.png)\n\n
\n\n### AWS Fargate support\n\nNextflow now supports **AWS Fargate** for AWS Batch jobs. See the [Nextflow docs](https://nextflow.io/docs/latest/aws.html#aws-fargate) for details.\n\n### OCI auto pull mode for Singularity and Apptainer\n\nNextflow now supports OCI auto pull mode for both Singularity and Apptainer. Historically, Singularity could only run a Docker container image by first converting it to the Singularity image file format with the Singularity pull command and then using the resulting image file in the exec command. Converting every container image in this way adds extra overhead to the head node running Nextflow.\n\nNextflow now allows the `ociAutoPull` option to be specified for both Singularity and Apptainer. When this setting is enabled, Nextflow delegates the pull and conversion of the Docker image directly to the `exec` command.\n\n```groovy\nsingularity.ociAutoPull = true\n```\n\nAs a result, the pull and caching of Singularity images happens in the compute jobs rather than in the head job, removing the need to maintain a separate image file cache.\n\nSee the [Nextflow docs](https://nextflow.io/docs/latest/config.html#scope-singularity) for more information.\n\n### Support for GA4GH TES\n\nThe [Task Execution Service (TES)](https://ga4gh.github.io/task-execution-schemas/docs/) is an API specification, developed by [GA4GH](https://www.ga4gh.org/), which attempts to provide a standard way for workflow managers like Nextflow to interface with execution backends. Two noteworthy TES implementations are [Funnel](https://github.com/ohsu-comp-bio/funnel) and [TES Azure](https://github.com/microsoft/ga4gh-tes).\n\nNextflow has long supported TES as an executor, but only in a limited sense, as TES did not support some important capabilities in Nextflow such as glob and directory outputs and the `bin` directory. However, with TES 1.1 and its adoption into Nextflow, these gaps have been closed. You can use the TES executor with the following configuration:\n\n```groovy\nplugins {\n    id 'nf-ga4gh'\n}\n\nprocess.executor = 'tes'\ntes.endpoint = '...'\n```\n\nSee the [Nextflow docs](https://nextflow.io/docs/latest/executor.html#ga4gh-tes) for more information.\n\n:::note\nTo better facilitate community contributions, the nf-ga4gh plugin will soon be moved from the Nextflow repository into its own repository, `nextflow-io/nf-ga4gh`. To ensure a smooth transition with your pipelines, make sure to explicitly include the plugin in your configuration as shown above.\n:::\n\n## Fusion\n\n[Fusion](https://seqera.io/fusion/) is a distributed virtual file system for cloud-native data pipelines, optimized for Nextflow workloads. Nextflow 24.04 now works with a new release, Fusion 2.3. This brings a few notable quality-of-life improvements:\n\n### Enhanced Garbage Collection\n\nFusion 2.3 features an improved garbage collection system, enabling it to operate effectively with reduced scratch storage. This enhancement ensures that your pipelines run more efficiently, even with limited temporary storage.\n\n### Increased File Handling Capacity\n\nSupport for more concurrently open files is another significant improvement in Fusion 2.3. 
This means that larger directories, such as those used by Alphafold2, can now be utilized without issues, facilitating the handling of extensive datasets.\n\n### Correct Publishing of Symbolic Links\n\nIn previous versions, output files that were symbolic links were not published correctly — instead of the actual file, a text file containing the file path was published. Fusion 2.3 addresses this issue, ensuring that symbolic links are published correctly.\n\nThese enhancements in Fusion 2.3 contribute to a more robust and efficient filesystem for Nextflow users.\n\n## Other notable changes\n\n- Add native retry on spot termination for Google Batch ([`ea1c1b`](https://github.com/nextflow-io/nextflow/commit/ea1c1b70da7a9b8c90de445b8aee1ee7a7148c9b))\n- Add support for instance templates in Google Batch ([`df7ed2`](https://github.com/nextflow-io/nextflow/commit/df7ed294520ad2bfc9ad091114ae347c1e26ae96))\n- Allow secrets to be used with `includeConfig` ([`00c9f2`](https://github.com/nextflow-io/nextflow/commit/00c9f226b201c964f67d520d0404342bc33cf61d))\n- Allow secrets to be used in the pipeline script ([`df866a`](https://github.com/nextflow-io/nextflow/commit/df866a243256d5018e23b6c3237fb06d1c5a4b27))\n- Add retry strategy for publishing ([`c9c703`](https://github.com/nextflow-io/nextflow/commit/c9c7032c2e34132cf721ffabfea09d893adf3761))\n- Add `k8s.cpuLimits` config option ([`3c6e96`](https://github.com/nextflow-io/nextflow/commit/3c6e96d07c9a4fa947cf788a927699314d5e5ec7))\n- Removed `seqera` and `defaults` from the standard channels used by the nf-wave plugin. ([`ec5ebd`](https://github.com/nextflow-io/nextflow/commit/ec5ebd0bc96e986415e7bac195928b90062ed062))\n\nYou can view the full [Nextflow release notes on GitHub](https://github.com/nextflow-io/nextflow/releases/tag/v24.04.0).", - "images": [], + "images": [ + "/img/blog-nextflow-colored-logs/nextflow_coloured_logs.png" + ], "author": "Paolo Di Tommaso", "tags": "nextflow" }, diff --git a/internal/export.mjs b/internal/export.mjs index 11ecb3b7..1ac75f88 100644 --- a/internal/export.mjs +++ b/internal/export.mjs @@ -7,15 +7,26 @@ const postsDirectory = path.join(process.cwd(), '../src/content/blog'); const outputFile = path.join(process.cwd(), 'export.json'); function extractImagePaths(content, postPath) { - const $ = cheerio.load(content); const images = []; + + // Extract HTML images + const $ = cheerio.load(content); $('img').each((i, elem) => { const src = $(elem).attr('src'); if (src) { images.push(src); } }); - return images; + + // Extract Markdown images + const markdownImageRegex = /!\[.*?\]\((.*?)\)/g; + let match; + while ((match = markdownImageRegex.exec(content)) !== null) { + images.push(match[1]); + } + + // Remove duplicates + return [...new Set(images)]; } function sanitizeMarkdown(content) { @@ -44,7 +55,7 @@ function sanitizeMarkdown(content) { $('a').each((i, elem) => { const $elem = $(elem); const href = $elem.attr('href'); - const text = $elem.text(); + const text = $elem.text().replace(/\n/g, ' '); $elem.replaceWith(`[${text}](${href})`); }); @@ -56,12 +67,12 @@ function sanitizeMarkdown(content) { $('em, i').each((i, elem) => { const $elem = $(elem); - $elem.replaceWith(`*${$elem.html()}*`); + $elem.replaceWith(`*${$elem.html().replace(/\n/g, ' ')}*`); }); $('strong, b').each((i, elem) => { const $elem = $(elem); - $elem.replaceWith(`**${$elem.html()}**`); + $elem.replaceWith(`**${$elem.html().replace(/\n/g, ' ')}**`); }); $('code').each((i, elem) => { diff --git a/internal/import.mjs b/internal/import.mjs index 
28af64c7..db1cce63 100644 --- a/internal/import.mjs +++ b/internal/import.mjs @@ -46,7 +46,6 @@ function sanitizeText(text, removeLineBreaks = false) { } function tokenToPortableText(imageMap, token) { - switch (token.type) { case 'heading': return { @@ -58,8 +57,22 @@ function tokenToPortableText(imageMap, token) { case 'paragraph': const children = []; const markDefs = []; + let img; token.tokens.forEach(t => { + if (t.type === 'image') { + img = { + _type: 'image', + _key: nanoid(), + asset: { + _type: 'reference', + _ref: imageMap[t.href]._id, + }, + alt: t.text, + }; + return; + } + if (t.type === 'link') { const linkKey = nanoid(); children.push({ @@ -78,6 +91,8 @@ function tokenToPortableText(imageMap, token) { } }); + if (img) return img; + return { _type: 'block', _key: nanoid(), @@ -88,8 +103,11 @@ function tokenToPortableText(imageMap, token) { case 'image': const image = imageMap[src]; if (!image?._id) { - console.warn(`Failed to find image for token: ${token.href}`); - return null; + console.warn(`🚸 Failed to find image for token: ${token.href}`); + return { + _type: 'image', + _key: nanoid(), + }; } return { _type: 'image', @@ -116,8 +134,11 @@ function tokenToPortableText(imageMap, token) { const image = imageMap[src]; if (!image?._id) { - console.warn(`Failed to find image for token: ${token.text}`); - return null; + console.warn(`🚸 Failed to find image for token: ${token.text}`); + return { + _type: 'image', + _key: nanoid(), + }; } return { @@ -138,13 +159,13 @@ function tokenToPortableText(imageMap, token) { const src = srcMatch ? srcMatch[2] : ''; if (!src) { console.warn(`Failed to find src for script: ${token.text}`); - return null; + return { _type: 'block', _key: nanoid() }; } return { _type: 'script', _key: nanoid(), id, src }; } else { console.warn(`Unsupported HTML token: ${token.text}`); - return null; + return { _type: 'block', _key: nanoid() }; } case 'list': return { @@ -158,11 +179,21 @@ function tokenToPortableText(imageMap, token) { }; case 'space': - return null; + return { + _type: 'block', + _key: nanoid(), + style: 'normal', + children: [{ _type: 'span', text: '', _key: nanoid() }] + }; default: - console.warn(`Unsupported token type: ${token.type}`); - return null; + console.warn(`ℹ️ Unsupported token type: ${token.type}`); + return { + _type: 'block', + _key: nanoid(), + style: 'normal', + children: [{ _type: 'span', text: token.raw || '', _key: nanoid() }] + }; } } @@ -181,8 +212,11 @@ function inlineTokenToPortableText(imageMap, token) { case 'image': const image = imageMap[token.href]; if (!image?._id) { - console.warn(`Failed to find image for token: ${token.href}`); - return null; + console.warn(`🚸 Failed to find image for token: ${token.href}`); + return { + _type: 'image', + _key: nanoid(), + }; } return { _type: 'image', @@ -207,8 +241,30 @@ function inlineTokenToPortableText(imageMap, token) { text: sanitizeText(token.text, true), marks: [], }; + case 'em': + return { + _type: 'span', + text: sanitizeText(token.text, true), + marks: ['em'], + _key: nanoid() + }; + case 'strong': + return { + _type: 'span', + text: sanitizeText(token.text, true), + marks: ['strong'], + _key: nanoid() + }; + case 'html': + case 'del': + case 'escape': + return { + _type: 'span', + text: token.raw, + _key: nanoid() + }; default: - console.warn(`Unsupported inline token type: ${token.type}`); + console.warn(`ℹ️ Unsupported inline token type: ${token.type}`); return { _type: 'span', text: token.raw, _key: nanoid() }; } } @@ -219,6 +275,7 @@ async function 
migratePosts() { const selected = [ '2016/deploy-in-the-cloud-at-snap-of-a-finger', '2017/caw-and-singularity', + '2015/mpi-like-execution-with-nextflow' ] const selectedPosts = posts.filter(post => selected.includes(post.slug)); @@ -262,7 +319,6 @@ async function migratePosts() { let dateStr = post.date.split('T')[0]; dateStr = `${dateStr} 8:00`; - const sanityPost = { _type: 'blogPostDev', title: post.title, From 191c38ca32119a41c23b8902a0161dc043483683 Mon Sep 17 00:00:00 2001 From: Jake Broughton Date: Thu, 26 Sep 2024 11:00:09 +0200 Subject: [PATCH 15/21] Add tag import --- internal/findTag.mjs | 17 +++++++++++++++++ internal/import.mjs | 24 ++++++++++++++++++++++++ 2 files changed, 41 insertions(+) create mode 100644 internal/findTag.mjs diff --git a/internal/findTag.mjs b/internal/findTag.mjs new file mode 100644 index 00000000..3fd16e20 --- /dev/null +++ b/internal/findTag.mjs @@ -0,0 +1,17 @@ +import sanityClient from '@sanity/client'; + +export const client = sanityClient({ + projectId: 'o2y1bt2g', + dataset: 'seqera', + token: process.env.SANITY_TOKEN, + useCdn: false, +}); + +async function findTag(slug) { + let title = slug.replace(/-/g, ' '); + if (title === 'pipelines') title = 'seqera pipelines'; + const tag = await client.fetch(`*[_type == "tag" && lower(title) == lower($title)][0]`, { title }); + return tag +} + +export default findTag; \ No newline at end of file diff --git a/internal/import.mjs b/internal/import.mjs index db1cce63..d3c3e92b 100644 --- a/internal/import.mjs +++ b/internal/import.mjs @@ -4,6 +4,7 @@ import path from 'path'; import { customAlphabet } from 'nanoid'; import { marked } from 'marked'; import findPerson from './findPerson.mjs'; +import findTag from './findTag.mjs'; const nanoid = customAlphabet('0123456789abcdef', 12); @@ -306,6 +307,15 @@ async function migratePosts() { } } + const tags = [] + + for (const tag of post.tags.split(',')) { + const tagObj = await findTag(tag); + if (tagObj) tags.push(tagObj); + } + + const tagRefs = tags.map(tag => ({ _type: 'reference', _ref: tag._id, _key: nanoid() })); + const person = await findPerson(post.author); if (!person) { console.log(`⭕ No person found with the name "${post.author}"; skipping import.`); @@ -319,6 +329,19 @@ async function migratePosts() { let dateStr = post.date.split('T')[0]; dateStr = `${dateStr} 8:00`; + const existingPost = await client.fetch(`*[_type == "blogPostDev" && meta.slug.current == $slug][0]`, { slug: newSlug }); + + if (existingPost) { + console.log(`Updating post: ${existingPost.title}`, tagRefs); + try { + await client.patch(existingPost._id).set({ tags: tagRefs }).commit(); + } catch (error) { + console.error(`Failed to update post: ${existingPost.title}`); + console.error(error); + } + continue; + } + const sanityPost = { _type: 'blogPostDev', title: post.title, @@ -326,6 +349,7 @@ async function migratePosts() { publishedAt: new Date(dateStr).toISOString(), body: portableTextContent, author: { _type: 'reference', _ref: person._id }, + tags: tags.map(tag => ({ _type: 'reference', _ref: tag._id })), }; try { From e89e474dfab386569dc7f72304d422f3f212fc0d Mon Sep 17 00:00:00 2001 From: Jake Broughton Date: Fri, 4 Oct 2024 11:08:18 +0200 Subject: [PATCH 16/21] Link export --- internal/step2/exportLinks.mjs | 56 ++++++++++++++++++++++ internal/step2/posts.csv | 87 ++++++++++++++++++++++++++++++++++ 2 files changed, 143 insertions(+) create mode 100644 internal/step2/exportLinks.mjs create mode 100644 internal/step2/posts.csv diff --git a/internal/step2/exportLinks.mjs 
b/internal/step2/exportLinks.mjs new file mode 100644 index 00000000..03b89387 --- /dev/null +++ b/internal/step2/exportLinks.mjs @@ -0,0 +1,56 @@ +import fs from 'fs'; +import path from 'path'; +import sanityClient from '@sanity/client'; + +const titlesFile = path.join(process.cwd(), 'posts.csv'); +const postsFile = path.join(process.cwd(), '../export.json'); +const outputFile = path.join(process.cwd(), './links.csv'); + +async function readPosts() { + const data = await fs.promises.readFile(postsFile, 'utf8'); + return JSON.parse(data); +} + + +export const client = sanityClient({ + projectId: 'o2y1bt2g', + dataset: 'seqera', + token: process.env.SANITY_TOKEN, + useCdn: false, +}); + +async function fetchNewPosts() { + return await client.fetch(`*[_type == "blogPostDev"]`); +} + + + +async function getLinks() { + console.log('🟢🟢🟢 Export'); + const fileContents = fs.readFileSync(titlesFile, 'utf8'); + const titles = fileContents.split('\n'); + + const oldPosts = await readPosts(); + const newPosts = await fetchNewPosts(); + + for (const title of titles) { + const oldPost = oldPosts.find(p => p.title === title); + const newPost = newPosts.find(p => p.title === title); + if (!oldPost) console.log('⭕ old: ', title); + if (!newPost) console.log('⭕ new: ', title); + + let id = newPost?._id || ''; + if (id.split('.')[1]) id = id.split('.')[1]; + + let newSlug = newPost?.slug?.current || ''; + + const oldURL = oldPost ? `https://nextflow.io/blog/${oldPost.slug}.html` : ''; + const devURL = newPost ? `https://seqera.io/preview?type=blogPostDev&id=${id}` : ''; + const prodURL = newPost ? `https://seqera.io/blog/${newSlug}` : ''; + + } + + +} + +getLinks(); \ No newline at end of file diff --git a/internal/step2/posts.csv b/internal/step2/posts.csv new file mode 100644 index 00000000..8658c64c --- /dev/null +++ b/internal/step2/posts.csv @@ -0,0 +1,87 @@ +Application of Nextflow and nf-core to ancient environmental eDNA +Configure Git private repositories with Nextflow +Join us in welcoming the new Nextflow Ambassadors +Leveraging nf-test for enhanced quality control in nf-core +Nextflow Training: Bridging Online Learning with In-Person Connections +Nextflow workshop at the 20th KOGO Winter Symposium +nf-schema: the new and improved nf-validation +One-Year Reflections on Nextflow Mentorship +Optimizing Nextflow for HPC and Cloud at Scale +Reflecting on a Six-Month Collaboration: Insights from a Nextflow Ambassador +My Journey with Nextflow: From Exploration to Automation +Addressing Bioinformatics Core Challenges with Nextflow and nf-core +Moving toward better support through the Community forum +Experimental cleanup with nf-boost +How I became a Nextflow Ambassador! +Fostering Bioinformatics Growth in Türkiye +Nextflow 24.04 - Release highlights +Open call for new Nextflow Ambassadors closes June 14 +Empowering Bioinformatics: Mentoring Across Continents with Nextflow +Nextflow's colorful new console output +Nextflow and nf-core Mentorship, Round 3 +Nextflow Summit 2023 Recap +Introducing community.seqera.io +Introducing the Nextflow Ambassador Program +Geraldine Van der Auwera joins Seqera +Nextflow goes to university! 
+A Nextflow-Docker Murder Mystery: The mysterious case of the “OOM killer” +Reflecting on ten years of Nextflow awesomeness +Nextflow on BIG IRON: Twelve tips for improving the effectiveness of pipelines on HPC clusters +Selecting the right storage architecture for your Nextflow pipelines +Celebrating our largest international training event and hackathon to date +Nextflow and nf-core Mentorship, Round 2 +The State of Kubernetes in Nextflow +Learn Nextflow in 2023 +Get started with Nextflow on Google Cloud Batch +Analyzing caching behavior of pipelines +Nextflow Summit 2022 Recap +Rethinking containers for cloud native pipelines +Turbo-charging the Nextflow command line with Fig! +Nextflow and nf-core mentorship, Round 1 +Deploy Nextflow Pipelines with Google Cloud Batch! +Evolution of the Nextflow runtime +Nextflow’s community is moving to Slack! +Learning Nextflow in 2022 +Configure Git private repositories with Nextflow +Setting up a Nextflow environment on Windows 10 +Introducing Nextflow support for SQL databases +Five more tips for Nextflow user on HPC +5 Nextflow Tips for HPC Users +6 Tips for Setting Up Your Nextflow Dev Environment +Introducing Nextflow for Azure Batch +Learning Nextflow in 2020 +More syntax sugar for Nextflow developers! +The Nextflow CLI - tricks and treats! +Nextflow DSL 2 is here! +Easy provenance reporting +Troubleshooting Nextflow resume +Demystifying Nextflow resume +One more step towards Nextflow modules +Nextflow 19.04.0 stable release is out! +Edge release 19.03: The Sequence Read Archive & more! +Bringing Nextflow to Google Cloud Platform with WuXi NextCODE +Goodbye zero, Hello Apache! +Nextflow meets Dockstore +Clarification about the Nextflow license +Conda support has landed! +Nextflow turns five! Happy birthday! +Running CAW with Singularity and Nextflow +Scaling with AWS Batch +Nexflow Hackathon 2017 +Nextflow and the Common Workflow Language +Nextflow workshop is coming! 
+Nextflow published in Nature Biotechnology +More fun with containers in HPC +Enabling elastic computing with Nextflow +Deploy your computational pipelines in the cloud at the snap-of-a-finger +Docker for dunces & Nextflow for nunces +Workflows & publishing: best practice for reproducibility +Error recovery and automatic resource management with Nextflow +Developing a bioinformatics pipeline across multiple environments +MPI-like distributed execution with Nextflow +The impact of Docker containers on the performance of genomic pipelines +Innovation In Science - The story behind Nextflow +Introducing Nextflow REPL Console +Using Docker for scientific data analysis in an HPC cluster +Reproducibility in Science - Nextflow meets Docker +Share Nextflow pipelines with GitHub \ No newline at end of file From 964abdf47e6f86c7afb19378f9460915946ac13f Mon Sep 17 00:00:00 2001 From: Jake Broughton Date: Fri, 4 Oct 2024 11:26:11 +0200 Subject: [PATCH 17/21] Create CSV --- internal/step2/exportLinks.mjs | 12 +++-- internal/step2/links.csv | 88 ++++++++++++++++++++++++++++++++++ 2 files changed, 95 insertions(+), 5 deletions(-) create mode 100644 internal/step2/links.csv diff --git a/internal/step2/exportLinks.mjs b/internal/step2/exportLinks.mjs index 03b89387..527bc943 100644 --- a/internal/step2/exportLinks.mjs +++ b/internal/step2/exportLinks.mjs @@ -11,7 +11,6 @@ async function readPosts() { return JSON.parse(data); } - export const client = sanityClient({ projectId: 'o2y1bt2g', dataset: 'seqera', @@ -23,8 +22,6 @@ async function fetchNewPosts() { return await client.fetch(`*[_type == "blogPostDev"]`); } - - async function getLinks() { console.log('🟢🟢🟢 Export'); const fileContents = fs.readFileSync(titlesFile, 'utf8'); @@ -33,6 +30,8 @@ async function getLinks() { const oldPosts = await readPosts(); const newPosts = await fetchNewPosts(); + let csvContent = 'title,oldURL,devURL,prodURL\n'; + for (const title of titles) { const oldPost = oldPosts.find(p => p.title === title); const newPost = newPosts.find(p => p.title === title); @@ -42,15 +41,18 @@ async function getLinks() { let id = newPost?._id || ''; if (id.split('.')[1]) id = id.split('.')[1]; - let newSlug = newPost?.slug?.current || ''; + let newSlug = newPost?.meta?.slug?.current || ''; const oldURL = oldPost ? `https://nextflow.io/blog/${oldPost.slug}.html` : ''; const devURL = newPost ? `https://seqera.io/preview?type=blogPostDev&id=${id}` : ''; const prodURL = newPost ? `https://seqera.io/blog/${newSlug}` : ''; + const escapedTitle = title.includes(',') ? 
`"${title}"` : title; + csvContent += `${escapedTitle},${oldURL},${devURL},${prodURL}\n`; } - + fs.writeFileSync(outputFile, csvContent, 'utf8'); + console.log(`CSV file has been written to ${outputFile}`); } getLinks(); \ No newline at end of file diff --git a/internal/step2/links.csv b/internal/step2/links.csv new file mode 100644 index 00000000..fb8181c4 --- /dev/null +++ b/internal/step2/links.csv @@ -0,0 +1,88 @@ +title,oldURL,devURL,prodURL +Application of Nextflow and nf-core to ancient environmental eDNA,https://nextflow.io/blog/2024/nextflow-nf-core-ancient-env-dna.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpT6t5,https://seqera.io/blog/nextflow-nf-core-ancient-env-dna +Configure Git private repositories with Nextflow,https://nextflow.io/blog/2021/configure-git-repositories-with-nextflow.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltFqeR,https://seqera.io/blog/configure-git-repositories-with-nextflow +Join us in welcoming the new Nextflow Ambassadors,https://nextflow.io/blog/2024/welcome_ambassadors_20242.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpT9FD,https://seqera.io/blog/welcome_ambassadors_20242 +Leveraging nf-test for enhanced quality control in nf-core,https://nextflow.io/blog/2024/nf-test-in-nf-core.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrp6hv,https://seqera.io/blog/nf-test-in-nf-core +Nextflow Training: Bridging Online Learning with In-Person Connections,https://nextflow.io/blog/2024/training-local-site.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpT8tI,https://seqera.io/blog/training-local-site +Nextflow workshop at the 20th KOGO Winter Symposium,https://nextflow.io/blog/2024/nxf-nf-core-workshop-kogo.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpT7Tc,https://seqera.io/blog/nxf-nf-core-workshop-kogo +nf-schema: the new and improved nf-validation,https://nextflow.io/blog/2024/nf-schema.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpT7CZ,https://seqera.io/blog/nf-schema +One-Year Reflections on Nextflow Mentorship,https://nextflow.io/blog/2024/reflections-on-nextflow-mentorship.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltGAlL,https://seqera.io/blog/reflections-on-nextflow-mentorship +Optimizing Nextflow for HPC and Cloud at Scale,https://nextflow.io/blog/2024/optimizing-nextflow-for-hpc-and-cloud-at-scale.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpT7iE,https://seqera.io/blog/optimizing-nextflow-for-hpc-and-cloud-at-scale +Reflecting on a Six-Month Collaboration: Insights from a Nextflow Ambassador,https://nextflow.io/blog/2024/reflecting-ambassador-collaboration.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrp7UY,https://seqera.io/blog/reflecting-ambassador-collaboration +My Journey with Nextflow: From Exploration to Automation,,, +Addressing Bioinformatics Core Challenges with Nextflow and nf-core,https://nextflow.io/blog/2024/addressing-bioinformatics-core-challenges.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltG6WC,https://seqera.io/blog/addressing-bioinformatics-core-challenges +Moving toward better support through the Community forum,https://nextflow.io/blog/2024/better-support-through-community-forum-2024.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrp4cF,https://seqera.io/blog/better-support-through-community-forum-2024 +Experimental cleanup with 
nf-boost,https://nextflow.io/blog/2024/experimental-cleanup-with-nf-boost.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpT5pK,https://seqera.io/blog/experimental-cleanup-with-nf-boost +How I became a Nextflow Ambassador!,https://nextflow.io/blog/2024/how_i_became_a_nextflow_ambassador.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpT61V,https://seqera.io/blog/how_i_became_a_nextflow_ambassador +Fostering Bioinformatics Growth in Türkiye,https://nextflow.io/blog/2024/bioinformatics-growth-in-turkiye.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPWy4,https://seqera.io/blog/bioinformatics-growth-in-turkiye +Nextflow 24.04 - Release highlights,https://nextflow.io/blog/2024/nextflow-24.04-highlights.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltG7Im,https://seqera.io/blog/nextflow-24.04-highlights +Open call for new Nextflow Ambassadors closes June 14,https://nextflow.io/blog/2024/ambassador-second-call.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPWcr,https://seqera.io/blog/ambassador-second-call +Empowering Bioinformatics: Mentoring Across Continents with Nextflow,https://nextflow.io/blog/2024/empowering-bioinformatics-mentoring.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPXF2,https://seqera.io/blog/empowering-bioinformatics-mentoring +Nextflow's colorful new console output,https://nextflow.io/blog/2024/nextflow-colored-logs.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPYob,https://seqera.io/blog/nextflow-colored-logs +"Nextflow and nf-core Mentorship, Round 3",https://nextflow.io/blog/2023/czi-mentorship-round-3.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrp23O,https://seqera.io/blog/czi-mentorship-round-3 +Nextflow Summit 2023 Recap,https://nextflow.io/blog/2023/nextflow-summit-2023-recap.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrp36E,https://seqera.io/blog/nextflow-summit-2023-recap +Introducing community.seqera.io,https://nextflow.io/blog/2023/community-forum.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPTkh,https://seqera.io/blog/community-forum +Introducing the Nextflow Ambassador Program,https://nextflow.io/blog/2023/introducing-nextflow-ambassador-program.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltG5TQ,https://seqera.io/blog/introducing-nextflow-ambassador-program +Geraldine Van der Auwera joins Seqera,https://nextflow.io/blog/2023/geraldine-van-der-auwera-joins-seqera.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrp2Q5,https://seqera.io/blog/geraldine-van-der-auwera-joins-seqera +Nextflow goes to university!,https://nextflow.io/blog/2023/nextflow-goes-to-university.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltG5jc,https://seqera.io/blog/nextflow-goes-to-university +A Nextflow-Docker Murder Mystery: The mysterious case of the “OOM killer”,https://nextflow.io/blog/2023/a-nextflow-docker-murder-mystery-the-mysterious-case-of-the-oom-killer.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpT1ye,https://seqera.io/blog/a-nextflow-docker-murder-mystery-the-mysterious-case-of-the-oom-killer +Reflecting on ten years of Nextflow awesomeness,https://nextflow.io/blog/2023/reflecting-on-ten-years-of-nextflow-awesomeness.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpT5En,https://seqera.io/blog/reflecting-on-ten-years-of-nextflow-awesomeness +Nextflow on 
BIG IRON: Twelve tips for improving the effectiveness of pipelines on HPC clusters,https://nextflow.io/blog/2023/best-practices-deploying-pipelines-with-hpc-workload-managers.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpT2PR,https://seqera.io/blog/best-practices-deploying-pipelines-with-hpc-workload-managers +Selecting the right storage architecture for your Nextflow pipelines,https://nextflow.io/blog/2023/selecting-the-right-storage-architecture-for-your-nextflow-pipelines.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrp494,https://seqera.io/blog/selecting-the-right-storage-architecture-for-your-nextflow-pipelines +Celebrating our largest international training event and hackathon to date,https://nextflow.io/blog/2023/celebrating-our-largest-international-training-event-and-hackathon-to-date.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrp1AH,https://seqera.io/blog/celebrating-our-largest-international-training-event-and-hackathon-to-date +"Nextflow and nf-core Mentorship, Round 2",https://nextflow.io/blog/2023/czi-mentorship-round-2.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltG4cn,https://seqera.io/blog/czi-mentorship-round-2 +The State of Kubernetes in Nextflow,https://nextflow.io/blog/2023/the-state-of-kubernetes-in-nextflow.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpT5M6,https://seqera.io/blog/the-state-of-kubernetes-in-nextflow +Learn Nextflow in 2023,https://nextflow.io/blog/2023/learn-nextflow-in-2023.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPV3I,https://seqera.io/blog/learn-nextflow-in-2023 +Get started with Nextflow on Google Cloud Batch,https://nextflow.io/blog/2023/nextflow-with-gbatch.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrp3ZP,https://seqera.io/blog/nextflow-with-gbatch +Analyzing caching behavior of pipelines,https://nextflow.io/blog/2022/caching-behavior-analysis.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltG1gc,https://seqera.io/blog/caching-behavior-analysis +Nextflow Summit 2022 Recap,https://nextflow.io/blog/2022/nextflow-summit-2022-recap.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPR11,https://seqera.io/blog/nextflow-summit-2022-recap +Rethinking containers for cloud native pipelines,https://nextflow.io/blog/2022/rethinking-containers-for-cloud-native-pipelines.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrp00x,https://seqera.io/blog/rethinking-containers-for-cloud-native-pipelines +Turbo-charging the Nextflow command line with Fig!,https://nextflow.io/blog/2022/turbocharging-nextflow-with-fig.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrp0XN,https://seqera.io/blog/turbocharging-nextflow-with-fig +"Nextflow and nf-core mentorship, Round 1",https://nextflow.io/blog/2022/czi-mentorship-round-1.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltG2L6,https://seqera.io/blog/czi-mentorship-round-1 +Deploy Nextflow Pipelines with Google Cloud Batch!,https://nextflow.io/blog/2022/deploy-nextflow-pipelines-with-google-cloud-batch.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpT1Sz,https://seqera.io/blog/deploy-nextflow-pipelines-with-google-cloud-batch +Evolution of the Nextflow runtime,https://nextflow.io/blog/2022/evolution-of-nextflow-runtime.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPQOq,https://seqera.io/blog/evolution-of-nextflow-runtime 
+Nextflow’s community is moving to Slack!,https://nextflow.io/blog/2022/nextflow-is-moving-to-slack.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPQfo,https://seqera.io/blog/nextflow-is-moving-to-slack +Learning Nextflow in 2022,https://nextflow.io/blog/2022/learn-nextflow-in-2022.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrozXm,https://seqera.io/blog/learn-nextflow-in-2022 +Configure Git private repositories with Nextflow,https://nextflow.io/blog/2021/configure-git-repositories-with-nextflow.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltFqeR,https://seqera.io/blog/configure-git-repositories-with-nextflow +Setting up a Nextflow environment on Windows 10,https://nextflow.io/blog/2021/setup-nextflow-on-windows.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPPIy,https://seqera.io/blog/setup-nextflow-on-windows +Introducing Nextflow support for SQL databases,https://nextflow.io/blog/2021/nextflow-sql-support.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltFxxr,https://seqera.io/blog/nextflow-sql-support +Five more tips for Nextflow user on HPC,https://nextflow.io/blog/2021/5-more-tips-for-nextflow-user-on-hpc.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPMV3,https://seqera.io/blog/5-more-tips-for-nextflow-user-on-hpc +5 Nextflow Tips for HPC Users,https://nextflow.io/blog/2021/5_tips_for_hpc_users.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpSyP1,https://seqera.io/blog/5_tips_for_hpc_users +6 Tips for Setting Up Your Nextflow Dev Environment,https://nextflow.io/blog/2021/nextflow-developer-environment.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltFxli,https://seqera.io/blog/nextflow-developer-environment +Introducing Nextflow for Azure Batch,https://nextflow.io/blog/2021/introducing-nextflow-for-azure-batch.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpSyWK,https://seqera.io/blog/introducing-nextflow-for-azure-batch +Learning Nextflow in 2020,https://nextflow.io/blog/2020/learning-nextflow-in-2020.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpSyAP,https://seqera.io/blog/learning-nextflow-in-2020 +More syntax sugar for Nextflow developers!,https://nextflow.io/blog/2020/groovy3-syntax-sugar.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPM5b,https://seqera.io/blog/groovy3-syntax-sugar +The Nextflow CLI - tricks and treats!,https://nextflow.io/blog/2020/cli-docs-release.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrovFw,https://seqera.io/blog/cli-docs-release +Nextflow DSL 2 is here!,https://nextflow.io/blog/2020/dsl2-is-here.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPLss,https://seqera.io/blog/dsl2-is-here +Easy provenance reporting,https://nextflow.io/blog/2019/easy-provenance-report.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltFpzx,https://seqera.io/blog/easy-provenance-report +Troubleshooting Nextflow resume,https://nextflow.io/blog/2019/troubleshooting-nextflow-resume.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpSu2g,https://seqera.io/blog/troubleshooting-nextflow-resume +Demystifying Nextflow resume,https://nextflow.io/blog/2019/demystifying-nextflow-resume.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltFpjl,https://seqera.io/blog/demystifying-nextflow-resume +One more step towards Nextflow 
modules,https://nextflow.io/blog/2019/one-more-step-towards-modules.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpStqV,https://seqera.io/blog/one-more-step-towards-modules +Nextflow 19.04.0 stable release is out!,https://nextflow.io/blog/2019/release-19.04.0-stable.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPLPB,https://seqera.io/blog/release-19.04.0-stable +Edge release 19.03: The Sequence Read Archive & more!,https://nextflow.io/blog/2019/release-19.03.0-edge.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkroud2,https://seqera.io/blog/release-19.03.0-edge +Bringing Nextflow to Google Cloud Platform with WuXi NextCODE,https://nextflow.io/blog/2018/bringing-nextflow-to-google-cloud-wuxinextcode.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltFoEe,https://seqera.io/blog/bringing-nextflow-to-google-cloud-wuxinextcode +"Goodbye zero, Hello Apache!",https://nextflow.io/blog/2018/goodbye-zero-hello-apache.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrotAG,https://seqera.io/blog/goodbye-zero-hello-apache +Nextflow meets Dockstore,https://nextflow.io/blog/2018/nextflow-meets-dockstore.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPDoK,https://seqera.io/blog/nextflow-meets-dockstore +Clarification about the Nextflow license,https://nextflow.io/blog/2018/clarification-about-nextflow-license.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrosnZ,https://seqera.io/blog/clarification-about-nextflow-license +Conda support has landed!,https://nextflow.io/blog/2018/conda-support-has-landed.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrosxI,https://seqera.io/blog/conda-support-has-landed +Nextflow turns five! 
Happy birthday!,https://nextflow.io/blog/2018/nextflow-turns-5.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpStjC,https://seqera.io/blog/nextflow-turns-5 +Running CAW with Singularity and Nextflow,https://nextflow.io/blog/2017/caw-and-singularity.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltFgCh,https://seqera.io/blog/caw-and-singularity +Scaling with AWS Batch,https://nextflow.io/blog/2017/scaling-with-aws-batch.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltFniG,https://seqera.io/blog/scaling-with-aws-batch +Nexflow Hackathon 2017,https://nextflow.io/blog/2017/nextflow-hack17.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkros0w,https://seqera.io/blog/nextflow-hack17 +Nextflow and the Common Workflow Language,https://nextflow.io/blog/2017/nextflow-and-cwl.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpSrvA,https://seqera.io/blog/nextflow-and-cwl +Nextflow workshop is coming!,https://nextflow.io/blog/2017/nextflow-workshop.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpSsEe,https://seqera.io/blog/nextflow-workshop +Nextflow published in Nature Biotechnology,https://nextflow.io/blog/2017/nextflow-nature-biotech-paper.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpSs2T,https://seqera.io/blog/nextflow-nature-biotech-paper +More fun with containers in HPC,https://nextflow.io/blog/2016/more-fun-containers-hpc.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPBtY,https://seqera.io/blog/more-fun-containers-hpc +Enabling elastic computing with Nextflow,https://nextflow.io/blog/2016/enabling-elastic-computing-nextflow.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrorKn,https://seqera.io/blog/enabling-elastic-computing-nextflow +Deploy your computational pipelines in the cloud at the snap-of-a-finger,https://nextflow.io/blog/2016/deploy-in-the-cloud-at-snap-of-a-finger.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPA31,https://seqera.io/blog/deploy-in-the-cloud-at-snap-of-a-finger +Docker for dunces & Nextflow for nunces,https://nextflow.io/blog/2016/docker-for-dunces-nextflow-for-nunces.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpSnAT,https://seqera.io/blog/docker-for-dunces-nextflow-for-nunces +Workflows & publishing: best practice for reproducibility,https://nextflow.io/blog/2016/best-practice-for-reproducibility.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoP9m3,https://seqera.io/blog/best-practice-for-reproducibility +Error recovery and automatic resource management with Nextflow,https://nextflow.io/blog/2016/error-recovery-and-automatic-resources-management.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltFfoP,https://seqera.io/blog/error-recovery-and-automatic-resources-management +Developing a bioinformatics pipeline across multiple environments,https://nextflow.io/blog/2016/developing-bioinformatics-pipeline-across-multiple-environments.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkroqur,https://seqera.io/blog/developing-bioinformatics-pipeline-across-multiple-environments +MPI-like distributed execution with Nextflow,https://nextflow.io/blog/2015/mpi-like-execution-with-nextflow.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltFexm,https://seqera.io/blog/mpi-like-execution-with-nextflow +The impact of Docker containers on the performance of genomic 
pipelines,https://nextflow.io/blog/2015/the-impact-of-docker-on-genomic-pipelines.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltFf9v,https://seqera.io/blog/the-impact-of-docker-on-genomic-pipelines +Innovation In Science - The story behind Nextflow,https://nextflow.io/blog/2015/innovation-in-science-the-story-behind-nextflow.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoP99s,https://seqera.io/blog/innovation-in-science-the-story-behind-nextflow +Introducing Nextflow REPL Console,https://nextflow.io/blog/2015/introducing-nextflow-console.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrooih,https://seqera.io/blog/introducing-nextflow-console +Using Docker for scientific data analysis in an HPC cluster,https://nextflow.io/blog/2014/using-docker-in-hpc-cluster.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrooM0,https://seqera.io/blog/using-docker-in-hpc-cluster +Reproducibility in Science - Nextflow meets Docker,https://nextflow.io/blog/2014/nextflow-meets-docker.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoP8bw,https://seqera.io/blog/nextflow-meets-docker +Share Nextflow pipelines with GitHub,https://nextflow.io/blog/2014/share-nextflow-pipelines-with-github.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoP8su,https://seqera.io/blog/share-nextflow-pipelines-with-github From 40fbfc63994b6d797b859402bf10fe6827177d6a Mon Sep 17 00:00:00 2001 From: Jake Broughton Date: Fri, 4 Oct 2024 12:08:54 +0200 Subject: [PATCH 18/21] Export updates --- internal/step2/exportLinks.mjs | 5 +- internal/step2/exportLinks2.mjs | 45 ++++++++ internal/step2/links.csv | 176 ++++++++++++++++---------------- internal/step2/links2.csv | 75 ++++++++++++++ 4 files changed, 211 insertions(+), 90 deletions(-) create mode 100644 internal/step2/exportLinks2.mjs create mode 100644 internal/step2/links2.csv diff --git a/internal/step2/exportLinks.mjs b/internal/step2/exportLinks.mjs index 527bc943..944f773c 100644 --- a/internal/step2/exportLinks.mjs +++ b/internal/step2/exportLinks.mjs @@ -30,7 +30,7 @@ async function getLinks() { const oldPosts = await readPosts(); const newPosts = await fetchNewPosts(); - let csvContent = 'title,oldURL,devURL,prodURL\n'; + let csvContent = 'title,oldURL,devURL,prodURL,cmsURL\n'; for (const title of titles) { const oldPost = oldPosts.find(p => p.title === title); @@ -46,9 +46,10 @@ async function getLinks() { const oldURL = oldPost ? `https://nextflow.io/blog/${oldPost.slug}.html` : ''; const devURL = newPost ? `https://seqera.io/preview?type=blogPostDev&id=${id}` : ''; const prodURL = newPost ? `https://seqera.io/blog/${newSlug}` : ''; + const cmsURL = 'https://seqera-cms.netlify.app/seqera/structure/blogPostDev;' + id; const escapedTitle = title.includes(',') ? 
`"${title}"` : title; - csvContent += `${escapedTitle},${oldURL},${devURL},${prodURL}\n`; + csvContent += `${escapedTitle},${oldURL},${devURL},${prodURL},${cmsURL}\n`; } fs.writeFileSync(outputFile, csvContent, 'utf8'); diff --git a/internal/step2/exportLinks2.mjs b/internal/step2/exportLinks2.mjs new file mode 100644 index 00000000..1488ce0a --- /dev/null +++ b/internal/step2/exportLinks2.mjs @@ -0,0 +1,45 @@ +import fs from 'fs'; +import path from 'path'; +import sanityClient from '@sanity/client'; + +const outputFile = path.join(process.cwd(), './links2.csv'); + +export const client = sanityClient({ + projectId: 'o2y1bt2g', + dataset: 'seqera', + token: process.env.SANITY_TOKEN, + useCdn: false, +}); + +async function fetchNewPosts() { + return await client.fetch(`*[_type == "blogPostDev2"]`); +} + +async function getLinks() { + console.log('🟢🟢🟢 Export'); + const newPosts = await fetchNewPosts(); + + let csvContent = 'title,oldURL,devURL,prodURL,cmsURL\n'; + + for (const post of newPosts) { + + let id = post?._id || ''; + if (id.split('.')[1]) id = id.split('.')[1]; + + const slug = post?.meta?.slug?.current || ''; + const title = post?.title || ''; + + const oldURL = post ? `https://seqera.io/blog/${slug}` : ''; + const devURL = post ? `https://seqera.io/preview?type=blogPostDev&id=${id}` : ''; + const prodURL = '' + const cmsURL = 'https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;' + id; + + const escapedTitle = title.includes(',') ? `"${title}"` : title; + csvContent += `${escapedTitle},${oldURL},${devURL},${prodURL},${cmsURL}\n`; + } + + fs.writeFileSync(outputFile, csvContent, 'utf8'); + console.log(`CSV file has been written to ${outputFile}`); +} + +getLinks(); \ No newline at end of file diff --git a/internal/step2/links.csv b/internal/step2/links.csv index fb8181c4..c4e23dfc 100644 --- a/internal/step2/links.csv +++ b/internal/step2/links.csv @@ -1,88 +1,88 @@ -title,oldURL,devURL,prodURL -Application of Nextflow and nf-core to ancient environmental eDNA,https://nextflow.io/blog/2024/nextflow-nf-core-ancient-env-dna.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpT6t5,https://seqera.io/blog/nextflow-nf-core-ancient-env-dna -Configure Git private repositories with Nextflow,https://nextflow.io/blog/2021/configure-git-repositories-with-nextflow.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltFqeR,https://seqera.io/blog/configure-git-repositories-with-nextflow -Join us in welcoming the new Nextflow Ambassadors,https://nextflow.io/blog/2024/welcome_ambassadors_20242.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpT9FD,https://seqera.io/blog/welcome_ambassadors_20242 -Leveraging nf-test for enhanced quality control in nf-core,https://nextflow.io/blog/2024/nf-test-in-nf-core.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrp6hv,https://seqera.io/blog/nf-test-in-nf-core -Nextflow Training: Bridging Online Learning with In-Person Connections,https://nextflow.io/blog/2024/training-local-site.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpT8tI,https://seqera.io/blog/training-local-site -Nextflow workshop at the 20th KOGO Winter Symposium,https://nextflow.io/blog/2024/nxf-nf-core-workshop-kogo.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpT7Tc,https://seqera.io/blog/nxf-nf-core-workshop-kogo -nf-schema: the new and improved 
nf-validation,https://nextflow.io/blog/2024/nf-schema.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpT7CZ,https://seqera.io/blog/nf-schema -One-Year Reflections on Nextflow Mentorship,https://nextflow.io/blog/2024/reflections-on-nextflow-mentorship.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltGAlL,https://seqera.io/blog/reflections-on-nextflow-mentorship -Optimizing Nextflow for HPC and Cloud at Scale,https://nextflow.io/blog/2024/optimizing-nextflow-for-hpc-and-cloud-at-scale.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpT7iE,https://seqera.io/blog/optimizing-nextflow-for-hpc-and-cloud-at-scale -Reflecting on a Six-Month Collaboration: Insights from a Nextflow Ambassador,https://nextflow.io/blog/2024/reflecting-ambassador-collaboration.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrp7UY,https://seqera.io/blog/reflecting-ambassador-collaboration -My Journey with Nextflow: From Exploration to Automation,,, -Addressing Bioinformatics Core Challenges with Nextflow and nf-core,https://nextflow.io/blog/2024/addressing-bioinformatics-core-challenges.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltG6WC,https://seqera.io/blog/addressing-bioinformatics-core-challenges -Moving toward better support through the Community forum,https://nextflow.io/blog/2024/better-support-through-community-forum-2024.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrp4cF,https://seqera.io/blog/better-support-through-community-forum-2024 -Experimental cleanup with nf-boost,https://nextflow.io/blog/2024/experimental-cleanup-with-nf-boost.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpT5pK,https://seqera.io/blog/experimental-cleanup-with-nf-boost -How I became a Nextflow Ambassador!,https://nextflow.io/blog/2024/how_i_became_a_nextflow_ambassador.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpT61V,https://seqera.io/blog/how_i_became_a_nextflow_ambassador -Fostering Bioinformatics Growth in Türkiye,https://nextflow.io/blog/2024/bioinformatics-growth-in-turkiye.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPWy4,https://seqera.io/blog/bioinformatics-growth-in-turkiye -Nextflow 24.04 - Release highlights,https://nextflow.io/blog/2024/nextflow-24.04-highlights.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltG7Im,https://seqera.io/blog/nextflow-24.04-highlights -Open call for new Nextflow Ambassadors closes June 14,https://nextflow.io/blog/2024/ambassador-second-call.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPWcr,https://seqera.io/blog/ambassador-second-call -Empowering Bioinformatics: Mentoring Across Continents with Nextflow,https://nextflow.io/blog/2024/empowering-bioinformatics-mentoring.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPXF2,https://seqera.io/blog/empowering-bioinformatics-mentoring -Nextflow's colorful new console output,https://nextflow.io/blog/2024/nextflow-colored-logs.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPYob,https://seqera.io/blog/nextflow-colored-logs -"Nextflow and nf-core Mentorship, Round 3",https://nextflow.io/blog/2023/czi-mentorship-round-3.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrp23O,https://seqera.io/blog/czi-mentorship-round-3 -Nextflow Summit 2023 
Recap,https://nextflow.io/blog/2023/nextflow-summit-2023-recap.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrp36E,https://seqera.io/blog/nextflow-summit-2023-recap -Introducing community.seqera.io,https://nextflow.io/blog/2023/community-forum.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPTkh,https://seqera.io/blog/community-forum -Introducing the Nextflow Ambassador Program,https://nextflow.io/blog/2023/introducing-nextflow-ambassador-program.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltG5TQ,https://seqera.io/blog/introducing-nextflow-ambassador-program -Geraldine Van der Auwera joins Seqera,https://nextflow.io/blog/2023/geraldine-van-der-auwera-joins-seqera.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrp2Q5,https://seqera.io/blog/geraldine-van-der-auwera-joins-seqera -Nextflow goes to university!,https://nextflow.io/blog/2023/nextflow-goes-to-university.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltG5jc,https://seqera.io/blog/nextflow-goes-to-university -A Nextflow-Docker Murder Mystery: The mysterious case of the “OOM killer”,https://nextflow.io/blog/2023/a-nextflow-docker-murder-mystery-the-mysterious-case-of-the-oom-killer.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpT1ye,https://seqera.io/blog/a-nextflow-docker-murder-mystery-the-mysterious-case-of-the-oom-killer -Reflecting on ten years of Nextflow awesomeness,https://nextflow.io/blog/2023/reflecting-on-ten-years-of-nextflow-awesomeness.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpT5En,https://seqera.io/blog/reflecting-on-ten-years-of-nextflow-awesomeness -Nextflow on BIG IRON: Twelve tips for improving the effectiveness of pipelines on HPC clusters,https://nextflow.io/blog/2023/best-practices-deploying-pipelines-with-hpc-workload-managers.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpT2PR,https://seqera.io/blog/best-practices-deploying-pipelines-with-hpc-workload-managers -Selecting the right storage architecture for your Nextflow pipelines,https://nextflow.io/blog/2023/selecting-the-right-storage-architecture-for-your-nextflow-pipelines.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrp494,https://seqera.io/blog/selecting-the-right-storage-architecture-for-your-nextflow-pipelines -Celebrating our largest international training event and hackathon to date,https://nextflow.io/blog/2023/celebrating-our-largest-international-training-event-and-hackathon-to-date.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrp1AH,https://seqera.io/blog/celebrating-our-largest-international-training-event-and-hackathon-to-date -"Nextflow and nf-core Mentorship, Round 2",https://nextflow.io/blog/2023/czi-mentorship-round-2.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltG4cn,https://seqera.io/blog/czi-mentorship-round-2 -The State of Kubernetes in Nextflow,https://nextflow.io/blog/2023/the-state-of-kubernetes-in-nextflow.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpT5M6,https://seqera.io/blog/the-state-of-kubernetes-in-nextflow -Learn Nextflow in 2023,https://nextflow.io/blog/2023/learn-nextflow-in-2023.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPV3I,https://seqera.io/blog/learn-nextflow-in-2023 -Get started with Nextflow on Google Cloud 
Batch,https://nextflow.io/blog/2023/nextflow-with-gbatch.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrp3ZP,https://seqera.io/blog/nextflow-with-gbatch -Analyzing caching behavior of pipelines,https://nextflow.io/blog/2022/caching-behavior-analysis.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltG1gc,https://seqera.io/blog/caching-behavior-analysis -Nextflow Summit 2022 Recap,https://nextflow.io/blog/2022/nextflow-summit-2022-recap.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPR11,https://seqera.io/blog/nextflow-summit-2022-recap -Rethinking containers for cloud native pipelines,https://nextflow.io/blog/2022/rethinking-containers-for-cloud-native-pipelines.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrp00x,https://seqera.io/blog/rethinking-containers-for-cloud-native-pipelines -Turbo-charging the Nextflow command line with Fig!,https://nextflow.io/blog/2022/turbocharging-nextflow-with-fig.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrp0XN,https://seqera.io/blog/turbocharging-nextflow-with-fig -"Nextflow and nf-core mentorship, Round 1",https://nextflow.io/blog/2022/czi-mentorship-round-1.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltG2L6,https://seqera.io/blog/czi-mentorship-round-1 -Deploy Nextflow Pipelines with Google Cloud Batch!,https://nextflow.io/blog/2022/deploy-nextflow-pipelines-with-google-cloud-batch.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpT1Sz,https://seqera.io/blog/deploy-nextflow-pipelines-with-google-cloud-batch -Evolution of the Nextflow runtime,https://nextflow.io/blog/2022/evolution-of-nextflow-runtime.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPQOq,https://seqera.io/blog/evolution-of-nextflow-runtime -Nextflow’s community is moving to Slack!,https://nextflow.io/blog/2022/nextflow-is-moving-to-slack.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPQfo,https://seqera.io/blog/nextflow-is-moving-to-slack -Learning Nextflow in 2022,https://nextflow.io/blog/2022/learn-nextflow-in-2022.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrozXm,https://seqera.io/blog/learn-nextflow-in-2022 -Configure Git private repositories with Nextflow,https://nextflow.io/blog/2021/configure-git-repositories-with-nextflow.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltFqeR,https://seqera.io/blog/configure-git-repositories-with-nextflow -Setting up a Nextflow environment on Windows 10,https://nextflow.io/blog/2021/setup-nextflow-on-windows.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPPIy,https://seqera.io/blog/setup-nextflow-on-windows -Introducing Nextflow support for SQL databases,https://nextflow.io/blog/2021/nextflow-sql-support.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltFxxr,https://seqera.io/blog/nextflow-sql-support -Five more tips for Nextflow user on HPC,https://nextflow.io/blog/2021/5-more-tips-for-nextflow-user-on-hpc.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPMV3,https://seqera.io/blog/5-more-tips-for-nextflow-user-on-hpc -5 Nextflow Tips for HPC Users,https://nextflow.io/blog/2021/5_tips_for_hpc_users.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpSyP1,https://seqera.io/blog/5_tips_for_hpc_users -6 Tips for Setting Up Your Nextflow Dev 
Environment,https://nextflow.io/blog/2021/nextflow-developer-environment.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltFxli,https://seqera.io/blog/nextflow-developer-environment -Introducing Nextflow for Azure Batch,https://nextflow.io/blog/2021/introducing-nextflow-for-azure-batch.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpSyWK,https://seqera.io/blog/introducing-nextflow-for-azure-batch -Learning Nextflow in 2020,https://nextflow.io/blog/2020/learning-nextflow-in-2020.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpSyAP,https://seqera.io/blog/learning-nextflow-in-2020 -More syntax sugar for Nextflow developers!,https://nextflow.io/blog/2020/groovy3-syntax-sugar.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPM5b,https://seqera.io/blog/groovy3-syntax-sugar -The Nextflow CLI - tricks and treats!,https://nextflow.io/blog/2020/cli-docs-release.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrovFw,https://seqera.io/blog/cli-docs-release -Nextflow DSL 2 is here!,https://nextflow.io/blog/2020/dsl2-is-here.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPLss,https://seqera.io/blog/dsl2-is-here -Easy provenance reporting,https://nextflow.io/blog/2019/easy-provenance-report.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltFpzx,https://seqera.io/blog/easy-provenance-report -Troubleshooting Nextflow resume,https://nextflow.io/blog/2019/troubleshooting-nextflow-resume.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpSu2g,https://seqera.io/blog/troubleshooting-nextflow-resume -Demystifying Nextflow resume,https://nextflow.io/blog/2019/demystifying-nextflow-resume.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltFpjl,https://seqera.io/blog/demystifying-nextflow-resume -One more step towards Nextflow modules,https://nextflow.io/blog/2019/one-more-step-towards-modules.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpStqV,https://seqera.io/blog/one-more-step-towards-modules -Nextflow 19.04.0 stable release is out!,https://nextflow.io/blog/2019/release-19.04.0-stable.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPLPB,https://seqera.io/blog/release-19.04.0-stable -Edge release 19.03: The Sequence Read Archive & more!,https://nextflow.io/blog/2019/release-19.03.0-edge.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkroud2,https://seqera.io/blog/release-19.03.0-edge -Bringing Nextflow to Google Cloud Platform with WuXi NextCODE,https://nextflow.io/blog/2018/bringing-nextflow-to-google-cloud-wuxinextcode.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltFoEe,https://seqera.io/blog/bringing-nextflow-to-google-cloud-wuxinextcode -"Goodbye zero, Hello Apache!",https://nextflow.io/blog/2018/goodbye-zero-hello-apache.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrotAG,https://seqera.io/blog/goodbye-zero-hello-apache -Nextflow meets Dockstore,https://nextflow.io/blog/2018/nextflow-meets-dockstore.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPDoK,https://seqera.io/blog/nextflow-meets-dockstore -Clarification about the Nextflow license,https://nextflow.io/blog/2018/clarification-about-nextflow-license.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrosnZ,https://seqera.io/blog/clarification-about-nextflow-license -Conda support has 
landed!,https://nextflow.io/blog/2018/conda-support-has-landed.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrosxI,https://seqera.io/blog/conda-support-has-landed -Nextflow turns five! Happy birthday!,https://nextflow.io/blog/2018/nextflow-turns-5.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpStjC,https://seqera.io/blog/nextflow-turns-5 -Running CAW with Singularity and Nextflow,https://nextflow.io/blog/2017/caw-and-singularity.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltFgCh,https://seqera.io/blog/caw-and-singularity -Scaling with AWS Batch,https://nextflow.io/blog/2017/scaling-with-aws-batch.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltFniG,https://seqera.io/blog/scaling-with-aws-batch -Nexflow Hackathon 2017,https://nextflow.io/blog/2017/nextflow-hack17.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkros0w,https://seqera.io/blog/nextflow-hack17 -Nextflow and the Common Workflow Language,https://nextflow.io/blog/2017/nextflow-and-cwl.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpSrvA,https://seqera.io/blog/nextflow-and-cwl -Nextflow workshop is coming!,https://nextflow.io/blog/2017/nextflow-workshop.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpSsEe,https://seqera.io/blog/nextflow-workshop -Nextflow published in Nature Biotechnology,https://nextflow.io/blog/2017/nextflow-nature-biotech-paper.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpSs2T,https://seqera.io/blog/nextflow-nature-biotech-paper -More fun with containers in HPC,https://nextflow.io/blog/2016/more-fun-containers-hpc.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPBtY,https://seqera.io/blog/more-fun-containers-hpc -Enabling elastic computing with Nextflow,https://nextflow.io/blog/2016/enabling-elastic-computing-nextflow.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrorKn,https://seqera.io/blog/enabling-elastic-computing-nextflow -Deploy your computational pipelines in the cloud at the snap-of-a-finger,https://nextflow.io/blog/2016/deploy-in-the-cloud-at-snap-of-a-finger.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPA31,https://seqera.io/blog/deploy-in-the-cloud-at-snap-of-a-finger -Docker for dunces & Nextflow for nunces,https://nextflow.io/blog/2016/docker-for-dunces-nextflow-for-nunces.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpSnAT,https://seqera.io/blog/docker-for-dunces-nextflow-for-nunces -Workflows & publishing: best practice for reproducibility,https://nextflow.io/blog/2016/best-practice-for-reproducibility.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoP9m3,https://seqera.io/blog/best-practice-for-reproducibility -Error recovery and automatic resource management with Nextflow,https://nextflow.io/blog/2016/error-recovery-and-automatic-resources-management.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltFfoP,https://seqera.io/blog/error-recovery-and-automatic-resources-management -Developing a bioinformatics pipeline across multiple environments,https://nextflow.io/blog/2016/developing-bioinformatics-pipeline-across-multiple-environments.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkroqur,https://seqera.io/blog/developing-bioinformatics-pipeline-across-multiple-environments -MPI-like distributed execution with 
Nextflow,https://nextflow.io/blog/2015/mpi-like-execution-with-nextflow.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltFexm,https://seqera.io/blog/mpi-like-execution-with-nextflow -The impact of Docker containers on the performance of genomic pipelines,https://nextflow.io/blog/2015/the-impact-of-docker-on-genomic-pipelines.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltFf9v,https://seqera.io/blog/the-impact-of-docker-on-genomic-pipelines -Innovation In Science - The story behind Nextflow,https://nextflow.io/blog/2015/innovation-in-science-the-story-behind-nextflow.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoP99s,https://seqera.io/blog/innovation-in-science-the-story-behind-nextflow -Introducing Nextflow REPL Console,https://nextflow.io/blog/2015/introducing-nextflow-console.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrooih,https://seqera.io/blog/introducing-nextflow-console -Using Docker for scientific data analysis in an HPC cluster,https://nextflow.io/blog/2014/using-docker-in-hpc-cluster.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrooM0,https://seqera.io/blog/using-docker-in-hpc-cluster -Reproducibility in Science - Nextflow meets Docker,https://nextflow.io/blog/2014/nextflow-meets-docker.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoP8bw,https://seqera.io/blog/nextflow-meets-docker -Share Nextflow pipelines with GitHub,https://nextflow.io/blog/2014/share-nextflow-pipelines-with-github.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoP8su,https://seqera.io/blog/share-nextflow-pipelines-with-github +title,oldURL,devURL,prodURL,cmsURL +Application of Nextflow and nf-core to ancient environmental eDNA,https://nextflow.io/blog/2024/nextflow-nf-core-ancient-env-dna.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpT6t5,https://seqera.io/blog/nextflow-nf-core-ancient-env-dna,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;CXkBDLKNuhb4nnJzSpT6t5 +Configure Git private repositories with Nextflow,https://nextflow.io/blog/2021/configure-git-repositories-with-nextflow.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltFqeR,https://seqera.io/blog/configure-git-repositories-with-nextflow,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;ntV3A5cVsWRByk7zltFqeR +Join us in welcoming the new Nextflow Ambassadors,https://nextflow.io/blog/2024/welcome_ambassadors_20242.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpT9FD,https://seqera.io/blog/welcome_ambassadors_20242,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;CXkBDLKNuhb4nnJzSpT9FD +Leveraging nf-test for enhanced quality control in nf-core,https://nextflow.io/blog/2024/nf-test-in-nf-core.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrp6hv,https://seqera.io/blog/nf-test-in-nf-core,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;mNsm4Vx1W1Wy6aYYkrp6hv +Nextflow Training: Bridging Online Learning with In-Person Connections,https://nextflow.io/blog/2024/training-local-site.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpT8tI,https://seqera.io/blog/training-local-site,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;CXkBDLKNuhb4nnJzSpT8tI +Nextflow workshop at the 20th KOGO Winter 
Symposium,https://nextflow.io/blog/2024/nxf-nf-core-workshop-kogo.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpT7Tc,https://seqera.io/blog/nxf-nf-core-workshop-kogo,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;CXkBDLKNuhb4nnJzSpT7Tc +nf-schema: the new and improved nf-validation,https://nextflow.io/blog/2024/nf-schema.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpT7CZ,https://seqera.io/blog/nf-schema,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;CXkBDLKNuhb4nnJzSpT7CZ +One-Year Reflections on Nextflow Mentorship,https://nextflow.io/blog/2024/reflections-on-nextflow-mentorship.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltGAlL,https://seqera.io/blog/reflections-on-nextflow-mentorship,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;ntV3A5cVsWRByk7zltGAlL +Optimizing Nextflow for HPC and Cloud at Scale,https://nextflow.io/blog/2024/optimizing-nextflow-for-hpc-and-cloud-at-scale.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpT7iE,https://seqera.io/blog/optimizing-nextflow-for-hpc-and-cloud-at-scale,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;CXkBDLKNuhb4nnJzSpT7iE +Reflecting on a Six-Month Collaboration: Insights from a Nextflow Ambassador,https://nextflow.io/blog/2024/reflecting-ambassador-collaboration.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrp7UY,https://seqera.io/blog/reflecting-ambassador-collaboration,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;mNsm4Vx1W1Wy6aYYkrp7UY +My Journey with Nextflow: From Exploration to Automation,,,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev; +Addressing Bioinformatics Core Challenges with Nextflow and nf-core,https://nextflow.io/blog/2024/addressing-bioinformatics-core-challenges.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltG6WC,https://seqera.io/blog/addressing-bioinformatics-core-challenges,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;ntV3A5cVsWRByk7zltG6WC +Moving toward better support through the Community forum,https://nextflow.io/blog/2024/better-support-through-community-forum-2024.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrp4cF,https://seqera.io/blog/better-support-through-community-forum-2024,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;mNsm4Vx1W1Wy6aYYkrp4cF +Experimental cleanup with nf-boost,https://nextflow.io/blog/2024/experimental-cleanup-with-nf-boost.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpT5pK,https://seqera.io/blog/experimental-cleanup-with-nf-boost,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;CXkBDLKNuhb4nnJzSpT5pK +How I became a Nextflow Ambassador!,https://nextflow.io/blog/2024/how_i_became_a_nextflow_ambassador.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpT61V,https://seqera.io/blog/how_i_became_a_nextflow_ambassador,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;CXkBDLKNuhb4nnJzSpT61V +Fostering Bioinformatics Growth in Türkiye,https://nextflow.io/blog/2024/bioinformatics-growth-in-turkiye.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPWy4,https://seqera.io/blog/bioinformatics-growth-in-turkiye,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;L90MLvtZSPRQtUzPRoPWy4 +Nextflow 24.04 - Release 
highlights,https://nextflow.io/blog/2024/nextflow-24.04-highlights.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltG7Im,https://seqera.io/blog/nextflow-24.04-highlights,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;ntV3A5cVsWRByk7zltG7Im +Open call for new Nextflow Ambassadors closes June 14,https://nextflow.io/blog/2024/ambassador-second-call.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPWcr,https://seqera.io/blog/ambassador-second-call,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;L90MLvtZSPRQtUzPRoPWcr +Empowering Bioinformatics: Mentoring Across Continents with Nextflow,https://nextflow.io/blog/2024/empowering-bioinformatics-mentoring.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPXF2,https://seqera.io/blog/empowering-bioinformatics-mentoring,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;L90MLvtZSPRQtUzPRoPXF2 +Nextflow's colorful new console output,https://nextflow.io/blog/2024/nextflow-colored-logs.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPYob,https://seqera.io/blog/nextflow-colored-logs,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;L90MLvtZSPRQtUzPRoPYob +"Nextflow and nf-core Mentorship, Round 3",https://nextflow.io/blog/2023/czi-mentorship-round-3.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrp23O,https://seqera.io/blog/czi-mentorship-round-3,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;mNsm4Vx1W1Wy6aYYkrp23O +Nextflow Summit 2023 Recap,https://nextflow.io/blog/2023/nextflow-summit-2023-recap.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrp36E,https://seqera.io/blog/nextflow-summit-2023-recap,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;mNsm4Vx1W1Wy6aYYkrp36E +Introducing community.seqera.io,https://nextflow.io/blog/2023/community-forum.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPTkh,https://seqera.io/blog/community-forum,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;L90MLvtZSPRQtUzPRoPTkh +Introducing the Nextflow Ambassador Program,https://nextflow.io/blog/2023/introducing-nextflow-ambassador-program.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltG5TQ,https://seqera.io/blog/introducing-nextflow-ambassador-program,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;ntV3A5cVsWRByk7zltG5TQ +Geraldine Van der Auwera joins Seqera,https://nextflow.io/blog/2023/geraldine-van-der-auwera-joins-seqera.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrp2Q5,https://seqera.io/blog/geraldine-van-der-auwera-joins-seqera,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;mNsm4Vx1W1Wy6aYYkrp2Q5 +Nextflow goes to university!,https://nextflow.io/blog/2023/nextflow-goes-to-university.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltG5jc,https://seqera.io/blog/nextflow-goes-to-university,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;ntV3A5cVsWRByk7zltG5jc +A Nextflow-Docker Murder Mystery: The mysterious case of the “OOM killer”,https://nextflow.io/blog/2023/a-nextflow-docker-murder-mystery-the-mysterious-case-of-the-oom-killer.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpT1ye,https://seqera.io/blog/a-nextflow-docker-murder-mystery-the-mysterious-case-of-the-oom-killer,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;CXkBDLKNuhb4nnJzSpT1ye +Reflecting on ten years of Nextflow 
awesomeness,https://nextflow.io/blog/2023/reflecting-on-ten-years-of-nextflow-awesomeness.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpT5En,https://seqera.io/blog/reflecting-on-ten-years-of-nextflow-awesomeness,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;CXkBDLKNuhb4nnJzSpT5En +Nextflow on BIG IRON: Twelve tips for improving the effectiveness of pipelines on HPC clusters,https://nextflow.io/blog/2023/best-practices-deploying-pipelines-with-hpc-workload-managers.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpT2PR,https://seqera.io/blog/best-practices-deploying-pipelines-with-hpc-workload-managers,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;CXkBDLKNuhb4nnJzSpT2PR +Selecting the right storage architecture for your Nextflow pipelines,https://nextflow.io/blog/2023/selecting-the-right-storage-architecture-for-your-nextflow-pipelines.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrp494,https://seqera.io/blog/selecting-the-right-storage-architecture-for-your-nextflow-pipelines,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;mNsm4Vx1W1Wy6aYYkrp494 +Celebrating our largest international training event and hackathon to date,https://nextflow.io/blog/2023/celebrating-our-largest-international-training-event-and-hackathon-to-date.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrp1AH,https://seqera.io/blog/celebrating-our-largest-international-training-event-and-hackathon-to-date,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;mNsm4Vx1W1Wy6aYYkrp1AH +"Nextflow and nf-core Mentorship, Round 2",https://nextflow.io/blog/2023/czi-mentorship-round-2.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltG4cn,https://seqera.io/blog/czi-mentorship-round-2,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;ntV3A5cVsWRByk7zltG4cn +The State of Kubernetes in Nextflow,https://nextflow.io/blog/2023/the-state-of-kubernetes-in-nextflow.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpT5M6,https://seqera.io/blog/the-state-of-kubernetes-in-nextflow,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;CXkBDLKNuhb4nnJzSpT5M6 +Learn Nextflow in 2023,https://nextflow.io/blog/2023/learn-nextflow-in-2023.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPV3I,https://seqera.io/blog/learn-nextflow-in-2023,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;L90MLvtZSPRQtUzPRoPV3I +Get started with Nextflow on Google Cloud Batch,https://nextflow.io/blog/2023/nextflow-with-gbatch.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrp3ZP,https://seqera.io/blog/nextflow-with-gbatch,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;mNsm4Vx1W1Wy6aYYkrp3ZP +Analyzing caching behavior of pipelines,https://nextflow.io/blog/2022/caching-behavior-analysis.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltG1gc,https://seqera.io/blog/caching-behavior-analysis,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;ntV3A5cVsWRByk7zltG1gc +Nextflow Summit 2022 Recap,https://nextflow.io/blog/2022/nextflow-summit-2022-recap.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPR11,https://seqera.io/blog/nextflow-summit-2022-recap,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;L90MLvtZSPRQtUzPRoPR11 +Rethinking containers for cloud native 
pipelines,https://nextflow.io/blog/2022/rethinking-containers-for-cloud-native-pipelines.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrp00x,https://seqera.io/blog/rethinking-containers-for-cloud-native-pipelines,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;mNsm4Vx1W1Wy6aYYkrp00x +Turbo-charging the Nextflow command line with Fig!,https://nextflow.io/blog/2022/turbocharging-nextflow-with-fig.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrp0XN,https://seqera.io/blog/turbocharging-nextflow-with-fig,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;mNsm4Vx1W1Wy6aYYkrp0XN +"Nextflow and nf-core mentorship, Round 1",https://nextflow.io/blog/2022/czi-mentorship-round-1.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltG2L6,https://seqera.io/blog/czi-mentorship-round-1,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;ntV3A5cVsWRByk7zltG2L6 +Deploy Nextflow Pipelines with Google Cloud Batch!,https://nextflow.io/blog/2022/deploy-nextflow-pipelines-with-google-cloud-batch.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpT1Sz,https://seqera.io/blog/deploy-nextflow-pipelines-with-google-cloud-batch,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;CXkBDLKNuhb4nnJzSpT1Sz +Evolution of the Nextflow runtime,https://nextflow.io/blog/2022/evolution-of-nextflow-runtime.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPQOq,https://seqera.io/blog/evolution-of-nextflow-runtime,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;L90MLvtZSPRQtUzPRoPQOq +Nextflow’s community is moving to Slack!,https://nextflow.io/blog/2022/nextflow-is-moving-to-slack.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPQfo,https://seqera.io/blog/nextflow-is-moving-to-slack,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;L90MLvtZSPRQtUzPRoPQfo +Learning Nextflow in 2022,https://nextflow.io/blog/2022/learn-nextflow-in-2022.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrozXm,https://seqera.io/blog/learn-nextflow-in-2022,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;mNsm4Vx1W1Wy6aYYkrozXm +Configure Git private repositories with Nextflow,https://nextflow.io/blog/2021/configure-git-repositories-with-nextflow.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltFqeR,https://seqera.io/blog/configure-git-repositories-with-nextflow,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;ntV3A5cVsWRByk7zltFqeR +Setting up a Nextflow environment on Windows 10,https://nextflow.io/blog/2021/setup-nextflow-on-windows.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPPIy,https://seqera.io/blog/setup-nextflow-on-windows,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;L90MLvtZSPRQtUzPRoPPIy +Introducing Nextflow support for SQL databases,https://nextflow.io/blog/2021/nextflow-sql-support.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltFxxr,https://seqera.io/blog/nextflow-sql-support,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;ntV3A5cVsWRByk7zltFxxr +Five more tips for Nextflow user on HPC,https://nextflow.io/blog/2021/5-more-tips-for-nextflow-user-on-hpc.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPMV3,https://seqera.io/blog/5-more-tips-for-nextflow-user-on-hpc,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;L90MLvtZSPRQtUzPRoPMV3 +5 Nextflow Tips for HPC 
Users,https://nextflow.io/blog/2021/5_tips_for_hpc_users.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpSyP1,https://seqera.io/blog/5_tips_for_hpc_users,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;CXkBDLKNuhb4nnJzSpSyP1 +6 Tips for Setting Up Your Nextflow Dev Environment,https://nextflow.io/blog/2021/nextflow-developer-environment.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltFxli,https://seqera.io/blog/nextflow-developer-environment,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;ntV3A5cVsWRByk7zltFxli +Introducing Nextflow for Azure Batch,https://nextflow.io/blog/2021/introducing-nextflow-for-azure-batch.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpSyWK,https://seqera.io/blog/introducing-nextflow-for-azure-batch,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;CXkBDLKNuhb4nnJzSpSyWK +Learning Nextflow in 2020,https://nextflow.io/blog/2020/learning-nextflow-in-2020.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpSyAP,https://seqera.io/blog/learning-nextflow-in-2020,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;CXkBDLKNuhb4nnJzSpSyAP +More syntax sugar for Nextflow developers!,https://nextflow.io/blog/2020/groovy3-syntax-sugar.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPM5b,https://seqera.io/blog/groovy3-syntax-sugar,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;L90MLvtZSPRQtUzPRoPM5b +The Nextflow CLI - tricks and treats!,https://nextflow.io/blog/2020/cli-docs-release.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrovFw,https://seqera.io/blog/cli-docs-release,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;mNsm4Vx1W1Wy6aYYkrovFw +Nextflow DSL 2 is here!,https://nextflow.io/blog/2020/dsl2-is-here.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPLss,https://seqera.io/blog/dsl2-is-here,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;L90MLvtZSPRQtUzPRoPLss +Easy provenance reporting,https://nextflow.io/blog/2019/easy-provenance-report.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltFpzx,https://seqera.io/blog/easy-provenance-report,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;ntV3A5cVsWRByk7zltFpzx +Troubleshooting Nextflow resume,https://nextflow.io/blog/2019/troubleshooting-nextflow-resume.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpSu2g,https://seqera.io/blog/troubleshooting-nextflow-resume,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;CXkBDLKNuhb4nnJzSpSu2g +Demystifying Nextflow resume,https://nextflow.io/blog/2019/demystifying-nextflow-resume.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltFpjl,https://seqera.io/blog/demystifying-nextflow-resume,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;ntV3A5cVsWRByk7zltFpjl +One more step towards Nextflow modules,https://nextflow.io/blog/2019/one-more-step-towards-modules.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpStqV,https://seqera.io/blog/one-more-step-towards-modules,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;CXkBDLKNuhb4nnJzSpStqV +Nextflow 19.04.0 stable release is out!,https://nextflow.io/blog/2019/release-19.04.0-stable.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPLPB,https://seqera.io/blog/release-19.04.0-stable,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;L90MLvtZSPRQtUzPRoPLPB +Edge release 19.03: The 
Sequence Read Archive & more!,https://nextflow.io/blog/2019/release-19.03.0-edge.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkroud2,https://seqera.io/blog/release-19.03.0-edge,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;mNsm4Vx1W1Wy6aYYkroud2 +Bringing Nextflow to Google Cloud Platform with WuXi NextCODE,https://nextflow.io/blog/2018/bringing-nextflow-to-google-cloud-wuxinextcode.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltFoEe,https://seqera.io/blog/bringing-nextflow-to-google-cloud-wuxinextcode,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;ntV3A5cVsWRByk7zltFoEe +"Goodbye zero, Hello Apache!",https://nextflow.io/blog/2018/goodbye-zero-hello-apache.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrotAG,https://seqera.io/blog/goodbye-zero-hello-apache,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;mNsm4Vx1W1Wy6aYYkrotAG +Nextflow meets Dockstore,https://nextflow.io/blog/2018/nextflow-meets-dockstore.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPDoK,https://seqera.io/blog/nextflow-meets-dockstore,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;L90MLvtZSPRQtUzPRoPDoK +Clarification about the Nextflow license,https://nextflow.io/blog/2018/clarification-about-nextflow-license.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrosnZ,https://seqera.io/blog/clarification-about-nextflow-license,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;mNsm4Vx1W1Wy6aYYkrosnZ +Conda support has landed!,https://nextflow.io/blog/2018/conda-support-has-landed.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrosxI,https://seqera.io/blog/conda-support-has-landed,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;mNsm4Vx1W1Wy6aYYkrosxI +Nextflow turns five! 
Happy birthday!,https://nextflow.io/blog/2018/nextflow-turns-5.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpStjC,https://seqera.io/blog/nextflow-turns-5,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;CXkBDLKNuhb4nnJzSpStjC +Running CAW with Singularity and Nextflow,https://nextflow.io/blog/2017/caw-and-singularity.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltFgCh,https://seqera.io/blog/caw-and-singularity,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;ntV3A5cVsWRByk7zltFgCh +Scaling with AWS Batch,https://nextflow.io/blog/2017/scaling-with-aws-batch.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltFniG,https://seqera.io/blog/scaling-with-aws-batch,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;ntV3A5cVsWRByk7zltFniG +Nexflow Hackathon 2017,https://nextflow.io/blog/2017/nextflow-hack17.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkros0w,https://seqera.io/blog/nextflow-hack17,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;mNsm4Vx1W1Wy6aYYkros0w +Nextflow and the Common Workflow Language,https://nextflow.io/blog/2017/nextflow-and-cwl.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpSrvA,https://seqera.io/blog/nextflow-and-cwl,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;CXkBDLKNuhb4nnJzSpSrvA +Nextflow workshop is coming!,https://nextflow.io/blog/2017/nextflow-workshop.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpSsEe,https://seqera.io/blog/nextflow-workshop,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;CXkBDLKNuhb4nnJzSpSsEe +Nextflow published in Nature Biotechnology,https://nextflow.io/blog/2017/nextflow-nature-biotech-paper.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpSs2T,https://seqera.io/blog/nextflow-nature-biotech-paper,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;CXkBDLKNuhb4nnJzSpSs2T +More fun with containers in HPC,https://nextflow.io/blog/2016/more-fun-containers-hpc.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPBtY,https://seqera.io/blog/more-fun-containers-hpc,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;L90MLvtZSPRQtUzPRoPBtY +Enabling elastic computing with Nextflow,https://nextflow.io/blog/2016/enabling-elastic-computing-nextflow.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrorKn,https://seqera.io/blog/enabling-elastic-computing-nextflow,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;mNsm4Vx1W1Wy6aYYkrorKn +Deploy your computational pipelines in the cloud at the snap-of-a-finger,https://nextflow.io/blog/2016/deploy-in-the-cloud-at-snap-of-a-finger.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoPA31,https://seqera.io/blog/deploy-in-the-cloud-at-snap-of-a-finger,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;L90MLvtZSPRQtUzPRoPA31 +Docker for dunces & Nextflow for nunces,https://nextflow.io/blog/2016/docker-for-dunces-nextflow-for-nunces.html,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSpSnAT,https://seqera.io/blog/docker-for-dunces-nextflow-for-nunces,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;CXkBDLKNuhb4nnJzSpSnAT +Workflows & publishing: best practice for 
reproducibility,https://nextflow.io/blog/2016/best-practice-for-reproducibility.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoP9m3,https://seqera.io/blog/best-practice-for-reproducibility,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;L90MLvtZSPRQtUzPRoP9m3 +Error recovery and automatic resource management with Nextflow,https://nextflow.io/blog/2016/error-recovery-and-automatic-resources-management.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltFfoP,https://seqera.io/blog/error-recovery-and-automatic-resources-management,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;ntV3A5cVsWRByk7zltFfoP +Developing a bioinformatics pipeline across multiple environments,https://nextflow.io/blog/2016/developing-bioinformatics-pipeline-across-multiple-environments.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkroqur,https://seqera.io/blog/developing-bioinformatics-pipeline-across-multiple-environments,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;mNsm4Vx1W1Wy6aYYkroqur +MPI-like distributed execution with Nextflow,https://nextflow.io/blog/2015/mpi-like-execution-with-nextflow.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltFexm,https://seqera.io/blog/mpi-like-execution-with-nextflow,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;ntV3A5cVsWRByk7zltFexm +The impact of Docker containers on the performance of genomic pipelines,https://nextflow.io/blog/2015/the-impact-of-docker-on-genomic-pipelines.html,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zltFf9v,https://seqera.io/blog/the-impact-of-docker-on-genomic-pipelines,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;ntV3A5cVsWRByk7zltFf9v +Innovation In Science - The story behind Nextflow,https://nextflow.io/blog/2015/innovation-in-science-the-story-behind-nextflow.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoP99s,https://seqera.io/blog/innovation-in-science-the-story-behind-nextflow,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;L90MLvtZSPRQtUzPRoP99s +Introducing Nextflow REPL Console,https://nextflow.io/blog/2015/introducing-nextflow-console.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrooih,https://seqera.io/blog/introducing-nextflow-console,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;mNsm4Vx1W1Wy6aYYkrooih +Using Docker for scientific data analysis in an HPC cluster,https://nextflow.io/blog/2014/using-docker-in-hpc-cluster.html,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkrooM0,https://seqera.io/blog/using-docker-in-hpc-cluster,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;mNsm4Vx1W1Wy6aYYkrooM0 +Reproducibility in Science - Nextflow meets Docker,https://nextflow.io/blog/2014/nextflow-meets-docker.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoP8bw,https://seqera.io/blog/nextflow-meets-docker,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;L90MLvtZSPRQtUzPRoP8bw +Share Nextflow pipelines with GitHub,https://nextflow.io/blog/2014/share-nextflow-pipelines-with-github.html,https://seqera.io/preview?type=blogPostDev&id=L90MLvtZSPRQtUzPRoP8su,https://seqera.io/blog/share-nextflow-pipelines-with-github,https://seqera-cms.netlify.app/seqera/structure/blogPostDev;L90MLvtZSPRQtUzPRoP8su diff --git a/internal/step2/links2.csv b/internal/step2/links2.csv new file mode 100644 index 00000000..42e66958 --- /dev/null +++ b/internal/step2/links2.csv @@ -0,0 +1,75 @@ 
+title,oldURL,devURL,prodURL,cmsURL +"Container provisioning, cloud resource optimization, Google Cloud Batch support, and more in Tower Enterprise 22.3",https://seqera.io/blog/container-provisioning-cloud-resource-optimization-google-cloud-batch-support-and-more-in-tower-enterprise-22-3,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSuyjrI,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;CXkBDLKNuhb4nnJzSuyjrI +Introducing the Tower Cloud Community Workspace,https://seqera.io/blog/introducing-the-tower-cloud-community-workspace,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSuyz6G,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;CXkBDLKNuhb4nnJzSuyz6G +Introducing Tower Datasets,https://seqera.io/blog/introducing-tower-datasets,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSuyzUc,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;CXkBDLKNuhb4nnJzSuyzUc +Kubernetes is everywhere!,https://seqera.io/blog/kubernetes-is-everywhere,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSuz0R4,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;CXkBDLKNuhb4nnJzSuz0R4 +Fusion file system and Mountpoint for Amazon S3 – understanding the differences,https://seqera.io/blog/mountpoint-for-amazon-s3-vs-fusion-file-system,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSuz0dF,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;CXkBDLKNuhb4nnJzSuz0dF +MultiQC turns eight years old!,https://seqera.io/blog/multiqc-turns-8-years-old,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSuz2ft,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;CXkBDLKNuhb4nnJzSuz2ft +Nextflow and AWS Batch – Inside the Integration (2 of 3),https://seqera.io/blog/nextflow-and-aws-batch-inside-the-integration-part-2-of-3,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSuz2zN,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;CXkBDLKNuhb4nnJzSuz2zN +Nextflow and Azure Batch – Inside the Integration (1 of 2),https://seqera.io/blog/nextflow-and-azure-batch-part-1-of-2,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSuzPDr,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;CXkBDLKNuhb4nnJzSuzPDr +Nextflow and Azure Batch – working with Tower (2 of 2),https://seqera.io/blog/nextflow-and-azure-batch-working-with-tower-part-2-of-2,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSuzQzS,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;CXkBDLKNuhb4nnJzSuzQzS +nf-core June 2021 Updates - The latest news from the nf-core team,https://seqera.io/blog/nf-core-updates-june-2021,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSuzT9P,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;CXkBDLKNuhb4nnJzSuzT9P +Phil Ewels joins Seqera,https://seqera.io/blog/phil-ewels-joins-seqera-labs,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSuzXJZ,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;CXkBDLKNuhb4nnJzSuzXJZ +Pipeline Secrets: Secure Handling of Sensitive Information in Tower,https://seqera.io/blog/pipeline-secrets-secure-handling-of-sensitive-information-in-tower,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSuzYNK,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;CXkBDLKNuhb4nnJzSuzYNK +Preparing for a multi-cloud 
future,https://seqera.io/blog/preparing-for-a-multicloud-future,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSuzZHL,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;CXkBDLKNuhb4nnJzSuzZHL +RNA Society Hosts APAeval Challenge on Nextflow Tower,https://seqera.io/blog/rna-society,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSuzZOe,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;CXkBDLKNuhb4nnJzSuzZOe +State of the Workflow 2022: The Nextflow and nf-core community survey,https://seqera.io/blog/the-state-of-the-workflow-the-2022-nextflow-and-nf-core-community-survey,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSuzi9l,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;CXkBDLKNuhb4nnJzSuzi9l +The State of the Workflow 2023: The Nextflow and nf-core community survey,https://seqera.io/blog/the-state-of-the-workflow-the-2023-nextflow-and-nf-core-community-survey,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSuziQo,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;CXkBDLKNuhb4nnJzSuziQo +ZS and Seqera announce a new partnership agreement,https://seqera.io/blog/zs-and-seqera-labs-announce-a-new-partnership-agreement,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSuzn1l,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;CXkBDLKNuhb4nnJzSuzn1l +Seqera Announces Support for AWS for Health Initiative,https://seqera.io/blog/a4health,https://seqera.io/preview?type=blogPostDev&id=DkCj6afB3uASKneoI6pQ5d,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;DkCj6afB3uASKneoI6pQ5d +Accelerating Analytics with Easy Genomics - Wisconsin State Laboratory,https://seqera.io/blog/accelerating-analytics-with-easy-genomics-wisconsin-state-laboratory,https://seqera.io/preview?type=blogPostDev&id=c0i6tx6LieG6Y4wPmG3a9o,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;c0i6tx6LieG6Y4wPmG3a9o +Announcing Labels in Tower,https://seqera.io/blog/announcing-labels-in-tower,https://seqera.io/preview?type=blogPostDev&id=c0i6tx6LieG6Y4wPmG3blB,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;c0i6tx6LieG6Y4wPmG3blB +Announcing Nextflow Support for Google Cloud Batch,https://seqera.io/blog/announcing-nextflow-support-for-google-cloud-batch,https://seqera.io/preview?type=blogPostDev&id=c0i6tx6LieG6Y4wPmG3bq4,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;c0i6tx6LieG6Y4wPmG3bq4 +Nextflow and K8s Rebooted: Running Nextflow on Amazon EKS,https://seqera.io/blog/deploying-nextflow-on-amazon-eks,https://seqera.io/preview?type=blogPostDev&id=c0i6tx6LieG6Y4wPmG3icM,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;c0i6tx6LieG6Y4wPmG3icM +"Element Biosciences and Seqera – Flexible, powerful, end-to-end analysis at scale",https://seqera.io/blog/element-biosciences-and-seqera-flexible-powerful-end-to-end-analysis-at-scale,https://seqera.io/preview?type=blogPostDev&id=c0i6tx6LieG6Y4wPmG3jQr,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;c0i6tx6LieG6Y4wPmG3jQr +Introducing Data Explorer,https://seqera.io/blog/introducing-data-explorer,https://seqera.io/preview?type=blogPostDev&id=c0i6tx6LieG6Y4wPmG3lvc,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;c0i6tx6LieG6Y4wPmG3lvc +"Introducing Harshil Patel, Head of Scientific 
Development",https://seqera.io/blog/introducing-harshil-patel-head-of-scientific-development,https://seqera.io/preview?type=blogPostDev&id=c0i6tx6LieG6Y4wPmG3m5O,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;c0i6tx6LieG6Y4wPmG3m5O +Introducing the Seqera Platform - one platform for the data analysis lifecycle,https://seqera.io/blog/introducing-the-seqera-platform,https://seqera.io/preview?type=blogPostDev&id=c0i6tx6LieG6Y4wPmG3sEa,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;c0i6tx6LieG6Y4wPmG3sEa +Keep your genomic analysis on track with MultiQC,https://seqera.io/blog/keep-your-genomic-analysis-on-track-with-multiqc,https://seqera.io/preview?type=blogPostDev&id=c0i6tx6LieG6Y4wPmG3syC,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;c0i6tx6LieG6Y4wPmG3syC +"Container provisioning, cloud resource optimization, Google Cloud Batch support, and more in Tower Enterprise 22.3",https://seqera.io/blog/container-provisioning-cloud-resource-optimization-google-cloud-batch-support-and-more-in-tower-enterprise-22-3,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSuyjrI,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;CXkBDLKNuhb4nnJzSuyjrI +The State of the Workflow 2023: The Nextflow and nf-core community survey,https://seqera.io/blog/the-state-of-the-workflow-the-2023-nextflow-and-nf-core-community-survey,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSuziQo,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;CXkBDLKNuhb4nnJzSuziQo +ZS and Seqera announce a new partnership agreement,https://seqera.io/blog/zs-and-seqera-labs-announce-a-new-partnership-agreement,https://seqera.io/preview?type=blogPostDev&id=CXkBDLKNuhb4nnJzSuzn1l,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;CXkBDLKNuhb4nnJzSuzn1l +Announcing Labels in Tower,https://seqera.io/blog/announcing-labels-in-tower,https://seqera.io/preview?type=blogPostDev&id=c0i6tx6LieG6Y4wPmG3blB,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;c0i6tx6LieG6Y4wPmG3blB +"Element Biosciences and Seqera – Flexible, powerful, end-to-end analysis at scale",https://seqera.io/blog/element-biosciences-and-seqera-flexible-powerful-end-to-end-analysis-at-scale,https://seqera.io/preview?type=blogPostDev&id=c0i6tx6LieG6Y4wPmG3jQr,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;c0i6tx6LieG6Y4wPmG3jQr +Introducing the Seqera Platform - one platform for the data analysis lifecycle,https://seqera.io/blog/introducing-the-seqera-platform,https://seqera.io/preview?type=blogPostDev&id=c0i6tx6LieG6Y4wPmG3sEa,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;c0i6tx6LieG6Y4wPmG3sEa +Automating pipeline execution with Nextflow and Tower,https://seqera.io/blog/automating-workflows-with-nextflow-and-tower,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkz9hpi,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;mNsm4Vx1W1Wy6aYYkz9hpi +Announcing Illumina DRAGEN integration with Nextflow Tower,https://seqera.io/blog/announcing-illumina-dragen-integration-with-nextflow-tower,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zm2KoLK,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;ntV3A5cVsWRByk7zm2KoLK +Kickstart your Tower Cloud experience,https://seqera.io/blog/getting-started-with-nextflow-tower,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zm2LDk9,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;ntV3A5cVsWRByk7zm2LDk9 +Workflow Automation for Nextflow 
Pipelines,https://seqera.io/blog/workflow-automation,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zm2N3II,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;ntV3A5cVsWRByk7zm2N3II +Announcing the Nextflow Tower CLI,https://seqera.io/blog/announcing-the-nextflow-tower-cli,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkz9hPm,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;mNsm4Vx1W1Wy6aYYkz9hPm +Automating pipeline execution with Nextflow and Tower,https://seqera.io/blog/automating-workflows-with-nextflow-and-tower,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkz9hpi,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;mNsm4Vx1W1Wy6aYYkz9hpi +Breakthrough performance and cost-efficiency with the new Fusion file system,https://seqera.io/blog/breakthrough-performance-and-cost-efficiency-with-the-new-fusion-file-system,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkz9m4J,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;mNsm4Vx1W1Wy6aYYkz9m4J +Building Containers for Scientific Workflows,https://seqera.io/blog/building-containers-for-scientific-workflows,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkz9mUF,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;mNsm4Vx1W1Wy6aYYkz9mUF +Cloud Native Data Pipelines,https://seqera.io/blog/cloud-native-data-pipelines,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkz9nx1,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;mNsm4Vx1W1Wy6aYYkz9nx1 +Day one,https://seqera.io/blog/day-one,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkz9p6L,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;mNsm4Vx1W1Wy6aYYkz9p6L +Geraldine Van der Auwera joins Seqera,https://seqera.io/blog/geraldine-van-der-auwera-joins-seqera,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkz9qM9,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;mNsm4Vx1W1Wy6aYYkz9qM9 +How Nextflow is helping win the global battle against future outbreaks and pandemics,https://seqera.io/blog/how-nextflow-is-helping-win-the-global-battle-against-future-outbreaks-and-pandemics,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkz9tBD,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;mNsm4Vx1W1Wy6aYYkz9tBD +Personalized Immunotherapy – Machine Learning meets Next Generation Sequencing,https://seqera.io/blog/machine-learning-meets-ngs,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkzAAnk,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;mNsm4Vx1W1Wy6aYYkzAAnk +MultiQC: A fresh coat of paint,https://seqera.io/blog/multiqc-plotly,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkzAINp,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;mNsm4Vx1W1Wy6aYYkzAINp +Nextflow and Tower for Machine Learning,https://seqera.io/blog/nextflow-and-tower-for-machine-learning,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkzAeOg,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;mNsm4Vx1W1Wy6aYYkzAeOg +The next step for multi-platform collaboration at scale,https://seqera.io/blog/orgs-and-launchpad,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkzAi6r,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;mNsm4Vx1W1Wy6aYYkzAi6r +Seqera at ASHG 
2023,https://seqera.io/blog/seqera-at-ashg-2023,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkzAwkb,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;mNsm4Vx1W1Wy6aYYkzAwkb +State of Nextflow Survey 2021,https://seqera.io/blog/state-of-nextflow-2021,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkzB0iz,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;mNsm4Vx1W1Wy6aYYkzB0iz +Tower Forge for AWS Batch,https://seqera.io/blog/tower-batch-forge,https://seqera.io/preview?type=blogPostDev&id=mNsm4Vx1W1Wy6aYYkzB8W2,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;mNsm4Vx1W1Wy6aYYkzB8W2 +Announcing Illumina DRAGEN integration with Nextflow Tower,https://seqera.io/blog/announcing-illumina-dragen-integration-with-nextflow-tower,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zm2KoLK,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;ntV3A5cVsWRByk7zm2KoLK +Best Practices for Deploying Pipelines with the Seqera Platform (formerly Nextflow Tower),https://seqera.io/blog/best-practices-for-deploying-pipelines-with-seqera-platform,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zm2Kt6r,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;ntV3A5cVsWRByk7zm2Kt6r +Kickstart your Tower Cloud experience,https://seqera.io/blog/getting-started-with-nextflow-tower,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zm2LDk9,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;ntV3A5cVsWRByk7zm2LDk9 +Grid Engine Support for Tower Workloads,https://seqera.io/blog/grid-engine,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zm2LDwI,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;ntV3A5cVsWRByk7zm2LDwI +Break-through Science - Machine Learning and Imaging Pipelines using Nextflow,https://seqera.io/blog/imeka-breakthrough-science,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zm2LFNM,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;ntV3A5cVsWRByk7zm2LFNM +Introducing Nextflow Tower - Seamless monitoring of data analysis workflows from anywhere,https://seqera.io/blog/introducing-nextflow-tower,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zm2LGsT,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;ntV3A5cVsWRByk7zm2LGsT +Nextflow and AWS Batch – Inside the Integration (1 of 3),https://seqera.io/blog/nextflow-and-aws-batch-inside-the-integration-part-1-of-3,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zm2LgXU,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;ntV3A5cVsWRByk7zm2LgXU +Nextflow and AWS Batch - Using Tower (3 of 3),https://seqera.io/blog/nextflow-and-aws-batch-using-tower-part-3-of-3,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zm2Lxa7,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;ntV3A5cVsWRByk7zm2Lxa7 +Nextflowomics: Untangling Biology with Data Analysis Pipelines,https://seqera.io/blog/nextflowomics-untangling-biology-with-data-analysis-pipelines,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zm2MGaC,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;ntV3A5cVsWRByk7zm2MGaC +Optimizing resource usage with Nextflow Tower,https://seqera.io/blog/optimizing-resource-usage-with-nextflow-tower,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zm2MHQp,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;ntV3A5cVsWRByk7zm2MHQp +Optimizing Workflows with 
Nextflow,https://seqera.io/blog/optimizing-workflows-with-nextflow,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zm2MJ42,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;ntV3A5cVsWRByk7zm2MJ42 +Running AI workloads in the cloud with Nextflow Tower — a step-by-step guide,https://seqera.io/blog/running-ai-workloads-in-the-cloud-with-nextflow-tower-a-step-by-step-guide,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zm2MVD2,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;ntV3A5cVsWRByk7zm2MVD2 +Seqera and AWS Fargate,https://seqera.io/blog/seqera-and-aws-fargate,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zm2MXx4,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;ntV3A5cVsWRByk7zm2MXx4 +7 tips for reducing your Nextflow cloud spend,https://seqera.io/blog/seven-tips-article,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zm2MZuW,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;ntV3A5cVsWRByk7zm2MZuW +Sharing is Caring,https://seqera.io/blog/sharing-is-caring,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zm2McGG,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;ntV3A5cVsWRByk7zm2McGG +Singularity Reloaded,https://seqera.io/blog/singularity-reloaded-article,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zm2MdAw,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;ntV3A5cVsWRByk7zm2MdAw +The State of Nextflow 2021,https://seqera.io/blog/state-of-nextflow-2021-results,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zm2MhU8,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;ntV3A5cVsWRByk7zm2MhU8 +The State of the Workflow 2022: Community Survey Results,https://seqera.io/blog/state-of-the-workflow-2022-results,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zm2Mmq6,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;ntV3A5cVsWRByk7zm2Mmq6 +The State of the Workflow 2023: Community Survey Results,https://seqera.io/blog/the-state-of-the-workflow-2023-community-survey-results,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zm2Mqcu,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;ntV3A5cVsWRByk7zm2Mqcu +Tower API & Docs release,https://seqera.io/blog/tower-api-docs-release,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zm2MxTz,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;ntV3A5cVsWRByk7zm2MxTz +Tower Cloud Launch,https://seqera.io/blog/tower-cloud-launch,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zm2MyGZ,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;ntV3A5cVsWRByk7zm2MyGZ +Workflow Automation for Nextflow Pipelines,https://seqera.io/blog/workflow-automation,https://seqera.io/preview?type=blogPostDev&id=ntV3A5cVsWRByk7zm2N3II,,https://seqera-cms.netlify.app/seqera/structure/blogPostDev2;ntV3A5cVsWRByk7zm2N3II From 121fc12d085ae03b2a7f6bd6359505ddfeb6ebf2 Mon Sep 17 00:00:00 2001 From: Jake Broughton Date: Wed, 16 Oct 2024 16:42:53 +0200 Subject: [PATCH 19/21] Migrate from blogPostDev --- internal/step3/backup.json | 19478 +++++++++++++++++++++++++++ internal/step3/backup.mjs | 24 + internal/step3/migrateBlogType.mjs | 46 + 3 files changed, 19548 insertions(+) create mode 100644 internal/step3/backup.json create mode 100644 internal/step3/backup.mjs create mode 100644 internal/step3/migrateBlogType.mjs diff --git a/internal/step3/backup.json b/internal/step3/backup.json new file mode 100644 index 
00000000..4db744ba --- /dev/null +++ b/internal/step3/backup.json @@ -0,0 +1,19478 @@ +[ + { + "_id": "0b2d6b7b-2e03-41e1-8b10-74ad41686e89", + "_updatedAt": "2024-09-16T07:32:05Z", + "author": { + "_ref": "rob-syme", + "_type": "reference" + }, + "_rev": "Qhrcj1462eoyp9RZGGQNso", + "_type": "blogPost", + "publishedAt": "2024-09-02T07:17:00.000Z", + "meta": { + "_type": "meta", + "description": "Performing interactive analysis is considered one of the most difficult phases in the entire bioinformatics process. User-friendly interactive environments that are adjacent to your data and streamline the end-to-end analysis process are critical.\n", + "noIndex": false, + "slug": { + "current": "data-studios-image-segmentation", + "_type": "slug" + } + }, + "body": [ + { + "style": "normal", + "_key": "92fe05ff0537", + "markDefs": [], + "children": [ + { + "_key": "631a6e0767360", + "_type": "span", + "marks": [], + "text": "Scientific research is rarely direct, and workflows commonly require further downstream analyses beyond pipeline runs. While Nextflow excels at batch automation, human interpretation of the generated data is also an essential part of the scientific process. Interactive environments facilitate this process by enabling model refinement and report generation, increasing efficiency and facilitating informed decision-making." + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "fc43232ac000", + "markDefs": [ + { + "href": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8482564/", + "_key": "1bbb46da0dcc", + "_type": "link" + } + ], + "children": [ + { + "_key": "dbe6c56444880", + "_type": "span", + "marks": [], + "text": "Performing interactive analysis is considered one of the " + }, + { + "_key": "442e2de71eb8", + "_type": "span", + "marks": [ + "1bbb46da0dcc" + ], + "text": "most challenging steps in the entire bioinformatics process" + }, + { + "marks": [], + "text": ". Users face cumbersome, time-consuming, and error-prone manual tasks such as transferring data from the cloud to local storage and navigating various APIs, programming languages, libraries, and tools. ", + "_key": "201701c45af0", + "_type": "span" + }, + { + "marks": [ + "strong" + ], + "text": "User-friendly interactive environments", + "_key": "04f2d036269a", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " that exist adjacent to your data are critical to streamline end-to-end computational analyses.", + "_key": "9aeb813c2016" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "1a4b67c2eec10", + "_type": "span", + "marks": [], + "text": "Seqera’s " + }, + { + "_type": "span", + "marks": [ + "11674253ce1c" + ], + "text": "Data Studios", + "_key": "1a4b67c2eec11" + }, + { + "_key": "1a4b67c2eec12", + "_type": "span", + "marks": [], + "text": " bridges the gap between pipeline outputs and secure interactive analysis environments by bringing " + }, + { + "_key": "1a4b67c2eec13", + "_type": "span", + "marks": [ + "203ac6ca0086" + ], + "text": "reproducible, containerized and interactive analytical notebook environments" + }, + { + "marks": [], + "text": " to your data. In this way, the output of one workflow can be analyzed manually and be used as the input for a subsequent workflow. 
Here, we show how a scientist can use the Seqera Platform’s Runs and Data Studios features to ", + "_key": "1a4b67c2eec14", + "_type": "span" + }, + { + "text": "optimize image segmentation model iteration", + "_key": "1a4b67c2eec15", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_key": "1a4b67c2eec16", + "_type": "span", + "marks": [], + "text": " in the " + }, + { + "_type": "span", + "marks": [ + "2e6de2623899" + ], + "text": "nf-core/molkart", + "_key": "1a4b67c2eec17" + }, + { + "text": " pipeline.", + "_key": "1a4b67c2eec18", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "20397f5fc865", + "markDefs": [ + { + "_type": "link", + "href": "https://docs.seqera.io/platform/latest/data/data-studios", + "_key": "11674253ce1c" + }, + { + "href": "https://seqera.io/blog/data-studios-announcement/", + "_key": "203ac6ca0086", + "_type": "link" + }, + { + "_key": "2e6de2623899", + "_type": "link", + "href": "https://nf-co.re/molkart/1.0.0" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "a3328a6c71f1", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "6fcebc357158", + "_type": "span", + "marks": [] + } + ] + }, + { + "style": "blockquote", + "_key": "915063774d4c", + "markDefs": [], + "children": [ + { + "_key": "3344b7d0b586", + "_type": "span", + "marks": [], + "text": "Watch the full presentation from Nextflow Summit in Boston, May 2024 " + } + ], + "_type": "block" + }, + { + "_type": "youtube", + "id": "sIFL-Pk9Wl4", + "_key": "2c940faebb5f" + }, + { + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "How does image segmentation work?", + "_key": "700ac256fecf" + } + ], + "_type": "block", + "style": "h2", + "_key": "7df84a8fb865", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "A central task in molecular biology is quantifying the abundance of different molecules (often RNAs or proteins) per cell or structure. Traditionally, this was done by sampling entire tissues or, in later approaches, using single-cell methods to measure such molecules within each cell. However, both bulk and single-cell omics methods lose information about the spatial organization of cells within a tissue, a key factor during tissue development and a potential driver for diseases like cancer. 
Spatial omics, which combines imaging with ultra-sensitive assays to measure molecules, now allows the identification of hundreds to thousands of transcripts on tissue sections.", + "_key": "5a978796f4610" + } + ], + "_type": "block", + "style": "normal", + "_key": "6d0963afece2", + "markDefs": [] + }, + { + "_key": "806e4ab36139", + "markDefs": [ + { + "href": "http://nf-core/molkart", + "_key": "b3d4e5d943ce", + "_type": "link" + }, + { + "_key": "f2172ebc7417", + "_type": "link", + "href": "https://resolvebiosciences.com/" + }, + { + "_type": "link", + "href": "https://github.com/MouseLand/cellpose", + "_key": "aa9d7482adbf" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "b3d4e5d943ce" + ], + "text": "nf-core/molkart", + "_key": "707cd6cbb7851" + }, + { + "marks": [], + "text": " is a spatial transcriptomics pipeline for processing ", + "_key": "707cd6cbb7852", + "_type": "span" + }, + { + "_key": "707cd6cbb7853", + "_type": "span", + "marks": [ + "f2172ebc7417" + ], + "text": "Molecular Cartography data by Resolve Bioscience" + }, + { + "_key": "707cd6cbb7854", + "_type": "span", + "marks": [], + "text": ", which measures hundreds of RNA transcripts on a tissue section using single-molecule fluorescent in-situ hybridization (smFISH) (Figure 1). This pipeline includes a Nextflow module for the popular segmentation method " + }, + { + "_key": "707cd6cbb7855", + "_type": "span", + "marks": [ + "aa9d7482adbf" + ], + "text": "Cellpose" + }, + { + "_key": "707cd6cbb7856", + "_type": "span", + "marks": [], + "text": ", which allows a human-in-the-loop approach for improving cell segmentation. Conveniently, the nf-core/molkart pipeline includes a workflow branch for generating custom training data from a source data set. Training a performant, custom cellpose model typically requires multiple time-consuming human-in-the-loop model iterations within an interactive analysis environment.\n" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "image", + "_key": "ff68a5ff91f0", + "asset": { + "_ref": "image-2bf639c49db818e0ac460c03bf1358c842865511-1600x900-png", + "_type": "reference" + } + }, + { + "_type": "block", + "style": "normal", + "_key": "782b164da2b9", + "markDefs": [ + { + "href": "https://www.biorxiv.org/content/10.1101/2024.02.05.578898v3", + "_key": "71d69b8521c9", + "_type": "link" + } + ], + "children": [ + { + "marks": [ + "strong", + "em" + ], + "text": "Figure 1. ", + "_key": "e38653a66779", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "Adapted workflow diagram of the nf-core/molkart pipeline for processing molecular cartography data using Nextflow. Original image data shown was taken from the literature (", + "_key": "ac613f5dd761" + }, + { + "_type": "span", + "marks": [ + "em", + "71d69b8521c9" + ], + "text": "Perico et al", + "_key": "6de7c14d3d24" + }, + { + "text": ".).", + "_key": "76c52cc20136", + "_type": "span", + "marks": [ + "em" + ] + } + ] + }, + { + "_key": "4d9dddc44892", + "markDefs": [ + { + "_type": "link", + "href": "https://www.biorxiv.org/content/10.1101/2024.02.05.578898v3", + "_key": "e387c2f0dc3e" + } + ], + "children": [ + { + "marks": [], + "text": "We used Data Studios to bring the tertiary analysis adjacent to the data in cloud storage, using data from a 2024 preprint by ", + "_key": "d2dab384f2980", + "_type": "span" + }, + { + "text": "Perico et al", + "_key": "46434dbd743d", + "_type": "span", + "marks": [ + "e387c2f0dc3e" + ] + }, + { + "marks": [], + "text": ". 
This allows us to iteratively train and improve a custom cellpose model for our specific dataset (Figure 2).", + "_key": "57b73961eea5", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "image", + "_key": "52cd6f4b80c7", + "asset": { + "_ref": "image-2d79c51097a15dfee042f120bbda1bebe4b129a4-1600x900-png", + "_type": "reference" + } + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://www.biorxiv.org/content/10.1101/2024.02.05.578898v3", + "_key": "2c25749c710f" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "strong", + "em" + ], + "text": "Figure 2. ", + "_key": "dfba0db9702b0" + }, + { + "text": "Adapted workflow diagram of the nf-core/molkart pipeline using Data Studios (highlighted in gray) to iteratively train a custom cellpose model to use as input for cell segmentation. Original image data shown was taken from the literature (", + "_key": "fdbbdf7dd8a9", + "_type": "span", + "marks": [ + "em" + ] + }, + { + "text": "Perico et al", + "_key": "862bb7f81cd3", + "_type": "span", + "marks": [ + "em", + "2c25749c710f" + ] + }, + { + "_key": "a88e1550c228", + "_type": "span", + "marks": [ + "em" + ], + "text": ".).\n" + } + ], + "_type": "block", + "style": "normal", + "_key": "7a444b4d08d3" + }, + { + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Adding Data Studios to the workflow", + "_key": "ebdcac6fc010" + } + ], + "_type": "block", + "style": "h2", + "_key": "3b0ef9de8028", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "text": "Using Data Studios as part of an adapted workflow was extremely beneficial:", + "_key": "f286f6dec6a60", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "f7eb2c08e442" + }, + { + "children": [ + { + "_key": "797be56fead20", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "458e3394230e", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "ff0a1dded277", + "listItem": "number", + "markDefs": [ + { + "_key": "33bd89a47469", + "_type": "link", + "href": "https://napari.org/stable/" + }, + { + "href": "https://qupath.github.io/", + "_key": "7ddd78939bfc", + "_type": "link" + }, + { + "_type": "link", + "href": "https://imagej.net/software/fiji/", + "_key": "a8f46f9aa40a" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Rapid review of image training data", + "_key": "b9b6ceaf6cc30" + }, + { + "text": " –", + "_key": "b9b6ceaf6cc31", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": " ", + "_key": "b9b6ceaf6cc32" + }, + { + "_type": "span", + "marks": [], + "text": "Images can be quickly reviewed directly in the cloud-hosted Data Studio analysis environment using common tools such as ", + "_key": "b9b6ceaf6cc33" + }, + { + "_key": "b9b6ceaf6cc34", + "_type": "span", + "marks": [ + "33bd89a47469" + ], + "text": "napari" + }, + { + "marks": [], + "text": ", ", + "_key": "b9b6ceaf6cc35", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "7ddd78939bfc" + ], + "text": "QuPath", + "_key": "b9b6ceaf6cc36" + }, + { + "_type": "span", + "marks": [], + "text": ", or ", + "_key": "b9b6ceaf6cc37" + }, + { + "text": "Fiji", + "_key": "b9b6ceaf6cc38", + "_type": "span", + "marks": [ + "a8f46f9aa40a" + ] + }, + { + "text": ". 
Prior to Data Studios, bioinformaticians would typically download the images, review, and re-upload to blob storage.", + "_key": "b9b6ceaf6cc39", + "_type": "span", + "marks": [] + } + ], + "level": 1 + }, + { + "style": "normal", + "_key": "2ea3f4af5025", + "listItem": "number", + "markDefs": [], + "children": [ + { + "text": "Collaboratively train a custom model in-situ ", + "_key": "1774f85b64fa0", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "text": "– Using a GPU-enabled compute environment for the Data Studios session, we used cellpose to train a new custom model on-the-fly using the previously generated image crops. Using a shareable URL, Data Studios enables seamless collaboration between data scientists and bench scientists with domain expertise in a single location.", + "_key": "1774f85b64fa1", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block" + }, + { + "style": "normal", + "_key": "9aaefadfd9da", + "listItem": "number", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Apply the new model to the original data ", + "_key": "5596d4c9c4d70" + }, + { + "_type": "span", + "marks": [], + "text": "– The new, manually-trained model was then applied to the original, full size image dataset. The cell segmentation results of the custom model can be inspected in the same Data Studios instance using any standard tool.\n", + "_key": "3e9f21a0c9d6" + } + ], + "level": 1, + "_type": "block" + }, + { + "_type": "image", + "_key": "0b1b42edce43", + "asset": { + "_ref": "image-481b2d03aee5622987b731a4819945bdced48291-1920x1080-png", + "_type": "reference" + } + }, + { + "markDefs": [ + { + "_key": "b3bd36addbaf", + "_type": "link", + "href": "https://www.biorxiv.org/content/10.1101/2024.02.05.578898v3" + } + ], + "children": [ + { + "_key": "bc888466590a", + "_type": "span", + "marks": [ + "strong", + "em" + ], + "text": "Figure 3. " + }, + { + "text": "Schematic workflow of image segmentation using nf-core/molkart with (bottom) and without (top) Data Studios. Original image data shown was taken from the literature (", + "_key": "94887d38f64d", + "_type": "span", + "marks": [ + "em" + ] + }, + { + "marks": [ + "b3bd36addbaf", + "em" + ], + "text": "Perico et al", + "_key": "9bf7d5bb618e", + "_type": "span" + }, + { + "text": ".).", + "_key": "c8a5c75c8854", + "_type": "span", + "marks": [ + "em" + ] + } + ], + "_type": "block", + "style": "normal", + "_key": "164ca6145027" + }, + { + "style": "h2", + "_key": "bf6040867323", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "The benefits of Data Studios", + "_key": "ce0ed88dbf6a0" + } + ], + "_type": "block" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://seqera.io/fusion/", + "_key": "4c1738586069" + } + ], + "children": [ + { + "marks": [ + "strong" + ], + "text": "Data remains in-situ", + "_key": "c727cb86c6ac0", + "_type": "span" + }, + { + "marks": [], + "text": " – No shuttling large volumes of data back and forth between your cloud storage and local analysis environments, which can quickly become expensive with ingress and egress charges, is extremely inefficient, and can result in data loss. 
Using the ", + "_key": "c727cb86c6ac1", + "_type": "span" + }, + { + "_key": "c727cb86c6ac2", + "_type": "span", + "marks": [ + "4c1738586069" + ], + "text": "Fusion file system" + }, + { + "_key": "c727cb86c6ac3", + "_type": "span", + "marks": [], + "text": ", Data Studios enables direct file access to cloud blob storage and is incredibly performant." + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "0e2b6a59dd98", + "listItem": "bullet" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "28efa7b3f7af0", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "b8155228df9a", + "markDefs": [] + }, + { + "style": "normal", + "_key": "9ca108cbb51b", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Stable, containerized analysis environments", + "_key": "d389e047db4e0", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_type": "span", + "marks": [], + "text": " – Data Studio sessions are checkpointed, and can be rolled back to any previous state each time the session is stopped and restarted. Each checkpoint preserves the state of the running machine at a point in time, ensuring consistency and reproducibility of the environment, the software used, and data worked with.", + "_key": "d389e047db4e1" + } + ], + "level": 1, + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "6f1fd4d1251a0", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "ca582579a6fd" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "1415c744c5c3", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Provision only the resources you need", + "_key": "52f988cd7a9d0", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "text": " – Data Studio sessions are fully customizable. Based on the analysis task(s) at hand, they can be provisioned as lean or as fully-featured as required, for example, making them GPU-enabled or adding hundreds of cores.", + "_key": "52f988cd7a9d1", + "_type": "span", + "marks": [] + } + ] + }, + { + "_key": "7e36cc508b12", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "36a4f06d680e0" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "c086ca066584", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Permissions are centrally managed", + "_key": "745473d296350" + }, + { + "_type": "span", + "marks": [], + "text": " – Organization and workspace credentials are centrally managed by your organization administrators, ensuring only authenticated users with the appropriate permissions can connect to the data and analysis environment(s). 
Bioinformaticians and data scientists shouldn’t spend time managing infrastructure and permissions.", + "_key": "745473d296351" + } + ], + "level": 1, + "_type": "block" + }, + { + "children": [ + { + "text": "", + "_key": "6320ec1510da0", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "d6ed9176d77a", + "markDefs": [] + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Secure, real time collaboration –", + "_key": "b08f197315d30" + }, + { + "_type": "span", + "marks": [], + "text": " The shareable URL feature ensures safe collaboration within, or across, bioinformatician and data science teams.", + "_key": "b08f197315d31" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "5af126e9be9c" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "375ea8c072cb" + } + ], + "_type": "block", + "style": "normal", + "_key": "47ce6afd2c6e", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "text": "Streamline the entire data lifecycle", + "_key": "caf7f2d160530", + "_type": "span", + "marks": [ + "strong" + ] + } + ], + "_type": "block", + "style": "h2", + "_key": "ef8ae7eca4d6" + }, + { + "markDefs": [], + "children": [ + { + "text": "Data Studios can ", + "_key": "c08a98a9fb88", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "strong" + ], + "text": "streamline the entire end-to-end scientific data lifecycle", + "_key": "0ed1694aa9ad", + "_type": "span" + }, + { + "text": " by bringing reproducible, containerized and interactive analytical notebook environments to your data in real-time. This allows you to seamlessly transition from Nextflow pipeline outputs to secure interactive environments, consolidating data and analytics into one unified location.", + "_key": "a3942ac3f18e", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "84b24ceaf43a" + }, + { + "_key": "564b3312ce1a", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "befe51b498150" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "text": "“Data Studios enables the creation of the needed package environment for any project quickly, expediting the project start-up process. 
This allows us to promptly focus on data analysis and efficiently share the environment with the team”\n\n- ", + "_key": "b7d2c5cc7e600", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "strong" + ], + "text": "Lorena Pantano, PhD\nDirector of Bioinformatics Platform, Harvard Chan Bioinformatics Core", + "_key": "6fb6b1456ec5", + "_type": "span" + } + ], + "_type": "block", + "style": "blockquote", + "_key": "fbdc3a7ad46c" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "270b32efa102" + } + ], + "_type": "block", + "style": "normal", + "_key": "8c0662e7f397", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "View Data Studios in the Seqera Platform ", + "_key": "76dd999d4e8f0" + }, + { + "text": "Community Showcase workspace", + "_key": "76dd999d4e8f1", + "_type": "span", + "marks": [ + "98aa74b74492" + ] + }, + { + "_type": "span", + "marks": [], + "text": " or start a ", + "_key": "76dd999d4e8f2" + }, + { + "text": "free trial today", + "_key": "76dd999d4e8f3", + "_type": "span", + "marks": [ + "cf07f789df27" + ] + }, + { + "text": "!", + "_key": "76dd999d4e8f4", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "0fa70ffe4664", + "markDefs": [ + { + "_type": "link", + "href": "https://hubs.la/Q02Nhk4y0", + "_key": "98aa74b74492" + }, + { + "href": "https://hubs.la/Q02NhjDZ0", + "_key": "cf07f789df27", + "_type": "link" + } + ] + } + ], + "title": "Optimizing image segmentation modeling using Seqera Platform", + "tags": [ + { + "_ref": "82fd60f1-c6d0-4b8a-9c5d-f971c622f341", + "_type": "reference", + "_key": "07cea8dc5caa" + }, + { + "_ref": "f1d61674-9374-4d2c-97c2-55778db7c922", + "_type": "reference", + "_key": "5b7351bdac98" + }, + { + "_ref": "32377094-ace0-4f1e-bb48-b47f02d3849e", + "_type": "reference", + "_key": "cb6a3c4b282b" + }, + { + "_ref": "b70b4c8b-10e9-4630-b43f-e11b33f14daf", + "_type": "reference", + "_key": "48f31265ebdd" + }, + { + "_ref": "8c6a46a2-4653-49fb-a5c3-ddf572a75381", + "_type": "reference", + "_key": "231a068aa82a" + }, + { + "_ref": "2b5c9a56-b491-42aa-b291-86611d77ccec", + "_type": "reference", + "_key": "28d742bb8a84" + } + ], + "_createdAt": "2024-08-27T08:23:51Z" + }, + { + "author": { + "_ref": "109f0c7b-3d40-42a9-af77-3844f0e031c0", + "_type": "reference" + }, + "_createdAt": "2024-05-13T11:54:28Z", + "publishedAt": "2024-05-15T13:59:00.000Z", + "title": "nf-core/riboseq: A collaboration between Altos Labs and Seqera", + "_type": "blogPost", + "_updatedAt": "2024-05-15T10:12:46Z", + "tags": [ + { + "_key": "f30d3e591314", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference" + }, + { + "_type": "reference", + "_key": "4525d8907a1f", + "_ref": "1b55a117-18fe-40cf-8873-6efd157a6058" + }, + { + "_key": "7c9827906277", + "_ref": "ab59634e-a349-468d-8f99-cb9fe4c38228", + "_type": "reference" + } + ], + "meta": { + "description": "nf-core/riboseq: A collaboration between Altos Labs and Seqera", + "noIndex": false, + "slug": { + "current": "nf-core-riboseq", + "_type": "slug" + }, + "_type": "meta", + "shareImage": { + "_type": "image", + "asset": { + "_ref": "image-10399aee1fa48e4250f2e7ab3c7fb76ca3aa1ac4-1200x628-png", + "_type": "reference" + } + } + }, + "body": [ + { + "markDefs": [], + "children": [ + { + "text": "This is a joint article contributed to the Seqera blog by Jon Manning of Seqera and Felix Krueger of Altos Labs describing the new nf-core/riboseq pipeline.", + "_key": "8c2ee84cdf5e0", + 
"_type": "span", + "marks": [ + "em" + ] + } + ], + "_type": "block", + "style": "normal", + "_key": "fc11d5317163" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://nf-co.re/", + "_key": "39d86b09469d" + }, + { + "_key": "f22304d582ae", + "_type": "link", + "href": "https://nf-co.re/riboseq" + }, + { + "_key": "23797f8146f8", + "_type": "link", + "href": "https://en.wikipedia.org/wiki/Ribosome_profiling" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In April 2024, the bioinformatics community welcomed a significant addition to the ", + "_key": "5355407782e60" + }, + { + "text": "nf-core", + "_key": "5355407782e61", + "_type": "span", + "marks": [ + "39d86b09469d" + ] + }, + { + "_type": "span", + "marks": [], + "text": " suite: the ", + "_key": "5355407782e62" + }, + { + "_type": "span", + "marks": [ + "f22304d582ae" + ], + "text": "nf-core/riboseq", + "_key": "5355407782e63" + }, + { + "_type": "span", + "marks": [], + "text": " pipeline. This new tool, born from a collaboration between Altos Labs and Seqera, underscores the potential of strategic partnerships to advance scientific research. In this article, we provide some background on the project, offer details on the pipeline, and explain how readers can get started with ", + "_key": "5355407782e64" + }, + { + "_key": "5355407782e65", + "_type": "span", + "marks": [ + "23797f8146f8" + ], + "text": "Ribo-seq" + }, + { + "text": " analysis.", + "_key": "5355407782e66", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "a96b84f9b665" + }, + { + "_type": "block", + "style": "h2", + "_key": "ff2e29964409", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "A Fruitful Collaboration", + "_key": "06511e51fc0b" + } + ] + }, + { + "_key": "212704cdad6c", + "markDefs": [], + "children": [ + { + "text": "Altos Labs is known for its ambitious efforts in harnessing cellular rejuvenation to reverse disease, injury, and disabilities that can occur throughout life. Their scientific strategy heavily relies on understanding cellular mechanisms via advanced technologies. Ribo-seq provides insights into the real-time translation of proteins, a core process often dysregulated during aging and disease. Altos Labs needed a way to ensure reliable, reproducible Ribo-seq analysis that its research teams could use. While a Ribo-seq pipeline had been started in nf-core, limited progress had been made. Seqera seemed the ideal partner to help build one!", + "_key": "ef4460f305a4", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Seqera, known for creating and developing the ", + "_key": "402551d96a99" + }, + { + "_key": "a11895ee51be", + "_type": "span", + "marks": [ + "afd8d4976f75" + ], + "text": "Nextflow DSL" + }, + { + "text": " and being an active partner in establishing community standards on nf-core, brought the expertise needed to translate Altos Labs' vision into a viable community pipeline. As part of this collaboration, we formed a working group and also reached out to colleagues at ", + "_key": "206247a437cc", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "8fc76bfd5785" + ], + "text": "ZS", + "_key": "da520fc0d7f3" + }, + { + "text": " and other community members who had done prior work with Ribosome profiling in Nextflow. 
Our goal was not only to enhance Ribo-seq analysis capabilities but also to ensure the pipeline’s sustainability through a community-driven process.", + "_key": "c8e26b5b7392", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "3a4e325a6885", + "markDefs": [ + { + "_type": "link", + "href": "https://seqera.io/nextflow/", + "_key": "afd8d4976f75" + }, + { + "_type": "link", + "href": "https://www.zs.com/", + "_key": "8fc76bfd5785" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Development Insights", + "_key": "023772c169b7" + } + ], + "_type": "block", + "style": "h2", + "_key": "110443549dbc", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "1bb6d0dcf94a", + "markDefs": [], + "children": [ + { + "text": "The nf-core/riboseq project was structured into several phases:", + "_key": "fcef7ffc7722", + "_type": "span", + "marks": [] + } + ] + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "04af5c122b050", + "_type": "span", + "marks": [ + "em" + ], + "text": "Initial planning" + }, + { + "_type": "span", + "marks": [], + "text": ": This phase involved detailed discussions between the Scientific Development team at Seqera, Altos Labs, and expert partners to ensure alignment with best practices and effective tool selection.", + "_key": "04af5c122b051" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "9bd450582af3" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "ef189e78f7f5", + "listItem": "bullet", + "markDefs": [ + { + "_key": "3fa0f88295d5", + "_type": "link", + "href": "https://nf-co.re/rnaseq" + } + ], + "children": [ + { + "marks": [ + "em" + ], + "text": "Adapting existing components", + "_key": "4eb6302b38970", + "_type": "span" + }, + { + "text": ": Key pre-processing and alignment functions were adapted from the ", + "_key": "4eb6302b38971", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "3fa0f88295d5" + ], + "text": "nf-core/rnaseq", + "_key": "4eb6302b38972" + }, + { + "_key": "4eb6302b38973", + "_type": "span", + "marks": [], + "text": " pipeline, allowing for shareability, efficiency, and scalability." 
+ } + ] + }, + { + "style": "normal", + "_key": "dc6acae62561", + "listItem": "bullet", + "markDefs": [ + { + "_key": "6be1a3f37f71", + "_type": "link", + "href": "https://github.com/zhpn1024/ribotish" + }, + { + "_type": "link", + "href": "https://github.com/smithlabcode/ribotricer", + "_key": "67a956a543b0" + }, + { + "_type": "link", + "href": "https://www.bioconductor.org/packages/release/bioc/html/anota2seq.html", + "_key": "5f9cca0d1922" + }, + { + "_type": "link", + "href": "https://biocontainers.pro/", + "_key": "a24a587b6c75" + }, + { + "_key": "d813571ed2e7", + "_type": "link", + "href": "https://github.com/nf-core/modules" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "em" + ], + "text": "New tool integration", + "_key": "f59020155b400" + }, + { + "text": ": Specific tools for Ribo-seq analysis, such as ", + "_key": "f59020155b401", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "6be1a3f37f71" + ], + "text": "Ribo-TISH", + "_key": "f59020155b402", + "_type": "span" + }, + { + "text": ", ", + "_key": "f59020155b403", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "67a956a543b0" + ], + "text": "Ribotricer", + "_key": "f59020155b404" + }, + { + "_type": "span", + "marks": [], + "text": ", and ", + "_key": "f59020155b405" + }, + { + "marks": [ + "5f9cca0d1922" + ], + "text": "anota2seq", + "_key": "f59020155b406", + "_type": "span" + }, + { + "_key": "f59020155b407", + "_type": "span", + "marks": [], + "text": ", were wrapped into modules using " + }, + { + "_type": "span", + "marks": [ + "a24a587b6c75" + ], + "text": "Biocontainers", + "_key": "f59020155b408" + }, + { + "text": ", within comprehensive testing frameworks to prevent regression and ensure reliability. These components were contributed to the ", + "_key": "f59020155b409", + "_type": "span", + "marks": [] + }, + { + "_key": "f59020155b4010", + "_type": "span", + "marks": [ + "d813571ed2e7" + ], + "text": "nf-core/modules" + }, + { + "_type": "span", + "marks": [], + "text": " repository, which will now be available for the wider community to reuse, independent of this effort.", + "_key": "f59020155b4011" + } + ], + "level": 1, + "_type": "block" + }, + { + "_key": "06ef45923942", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Pipeline development", + "_key": "27cae4355e4d0", + "_type": "span", + "marks": [ + "em" + ] + }, + { + "text": ": Individual components were stitched together coherently to create the nf-core/riboseq pipeline, with its own testing framework and user documentation.", + "_key": "27cae4355e4d1", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "_key": "a11532d70cbc", + "markDefs": [], + "children": [ + { + "text": "Technical and Community Challenges", + "_key": "9af59990c0c00", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2" + }, + { + "_key": "449d8032618a", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Generalizing existing functionality", + "_key": "262d18dad67e0", + "_type": "span" + } + ], + "_type": "block", + "style": "h3" + }, + { + "_type": "block", + "style": "normal", + "_key": "af3382a99d21", + "markDefs": [ + { + "_key": "601d56009a00", + "_type": "link", + "href": "https://nf-co.re/modules" + }, + { + "href": "https://nf-co.re/subworkflows", + "_key": "9d9691a3a5b4", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "nf-core has become an encyclopedia of components, including ", + 
"_key": "48bd49dd01300", + "_type": "span" + }, + { + "text": "modules", + "_key": "48bd49dd01301", + "_type": "span", + "marks": [ + "601d56009a00" + ] + }, + { + "marks": [], + "text": " and ", + "_key": "48bd49dd01302", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "9d9691a3a5b4" + ], + "text": "subworkflows", + "_key": "48bd49dd01303" + }, + { + "_key": "48bd49dd01304", + "_type": "span", + "marks": [], + "text": " that developers can leverage to build Nextflow pipelines. RNA-seq data analysis, in particular, is well served by the nf-core/rnaseq pipeline, one of the longest-standing and most popular members of the nf-core community. Some of the components used in nf-core/rnaseq were not written with re-use in mind, so the first task in this project was to abstract the commodity components for processes such as preprocessing and quantification so that they could be effectively shared by the nf-core/riboseq pipeline." + } + ] + }, + { + "_key": "5669adb1dcd3", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Test dataset generation", + "_key": "68d04249013b0" + } + ], + "_type": "block", + "style": "h3" + }, + { + "_key": "2767c14b9d80", + "markDefs": [], + "children": [ + { + "_key": "a4da2ac411130", + "_type": "span", + "marks": [], + "text": "Another significant hurdle was generating robust test data capable of supporting the ongoing quality assurance of our software. In Ribo-seq analysis, the basic operation of some tools depends on the quality of input data, so random down-sampling of variable quality input reads, especially at shallow depths may not be useful to generate test data. To overcome this, we implemented a targeted down-sampling strategy, selectively using input reads that meet high-quality standards and are known to align well with a specific chromosome. This method enabled us to produce a concise yet effective test data set, ensuring that our Ribo-seq tools operate reliably under realistic conditions." + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Tool selection", + "_key": "27bceef25c9e0" + } + ], + "_type": "block", + "style": "h3", + "_key": "2aaebc117fde" + }, + { + "markDefs": [], + "children": [ + { + "text": "A primary challenge in developing the pipeline was the selection of high-quality, sustainable software. In bioinformatics, funding often limits software development, and many tools are poorly maintained. Furthermore, the understanding of what software 'works' can be ambiguous, embedded in the community's shared knowledge rather than documented formally. Our cooperative approach enabled us to make informed decisions and contribute improvements to the underlying software, enhancing utility for users beyond the nf-core community.", + "_key": "42c9c78112020", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "1cb88d14f05e" + }, + { + "children": [ + { + "_key": "b2c37914fe590", + "_type": "span", + "marks": [], + "text": "Parameter selection" + } + ], + "_type": "block", + "style": "h3", + "_key": "d1e0de03d5a1", + "markDefs": [] + }, + { + "style": "normal", + "_key": "2523548b9954", + "markDefs": [], + "children": [ + { + "text": "Selecting the correct parameter settings for optimal operation of bioinformatics tools is a perennial problem in the community. 
In particular, the settings for the STAR alignment algorithm have very different constraints in Ribo-seq analysis relative to generic RNA-seq analysis. We conducted a series of benchmarks to assess the impact on alignment statistics of various combinations of parameters. We settled on a starting set, but this is a subject of continuing discussion with community members to drive further optimizations.", + "_key": "decd6cfc25240", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Pipeline Features", + "_key": "9a31de208e060", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "a8c53464a53f" + }, + { + "_key": "45f1476190e5", + "markDefs": [], + "children": [ + { + "_key": "f51ea64a9e180", + "_type": "span", + "marks": [], + "text": "The nf-core/riboseq pipeline is now a robust framework written using the nf-core pipeline template, and specifically tailored to handle the complexities of Ribo-seq data analysis." + } + ], + "_type": "block", + "style": "normal" + }, + { + "asset": { + "_type": "reference", + "_ref": "image-83f90945d29b41fcdc562789b06f3abbdbfa4d9a-1010x412-png" + }, + "_type": "image", + "_key": "9024177c2c73" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Here is what it offers:", + "_key": "3460577cae3f", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "c4c2c021e47b" + }, + { + "style": "normal", + "_key": "cfb811774489", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "5e7ebc27391f0", + "_type": "span", + "marks": [], + "text": "Baseline read preprocessing using processes adapted from existing nf-core components." + } + ], + "level": 1, + "_type": "block" + }, + { + "_key": "f78073ef3267", + "listItem": "bullet", + "markDefs": [ + { + "_key": "159e3bc6217d", + "_type": "link", + "href": "https://github.com/alexdobin/STAR" + } + ], + "children": [ + { + "marks": [], + "text": "Alignment to references with ", + "_key": "4ce6dc424aed0", + "_type": "span" + }, + { + "marks": [ + "159e3bc6217d" + ], + "text": "STAR", + "_key": "4ce6dc424aed1", + "_type": "span" + }, + { + "_key": "4ce6dc424aed2", + "_type": "span", + "marks": [], + "text": ", producing both transcriptome and genome alignments." + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "3cdb46402566", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "1b345d3fa4f80", + "_type": "span", + "marks": [], + "text": "Analysis of read distribution around protein-coding regions to assess frame bias and P-site offsets. This produces a rich selection of diagnostic plots to assess Ribo-seq data quality." 
+ } + ], + "level": 1 + }, + { + "style": "normal", + "_key": "9e3414d59445", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Prediction and identification of translated open reading frames using tools like Ribo-TISH and Ribotricer.", + "_key": "3299c56efe000", + "_type": "span" + } + ], + "level": 1, + "_type": "block" + }, + { + "style": "normal", + "_key": "9e8c117a96a2", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Assessment of translational efficiency, which requires matched RNA-seq and Ribo-seq data, facilitated by the anota2seq Bioconductor package (see dot plot below).", + "_key": "c39d9d7b14f8" + } + ], + "_type": "block" + }, + { + "asset": { + "_ref": "image-ca5f9967df813470051fcf548e962bdbf4c50ee5-624x624-png", + "_type": "reference" + }, + "_type": "image", + "_key": "7122c68ade88" + }, + { + "markDefs": [], + "children": [ + { + "_key": "57c0e67a28250", + "_type": "span", + "marks": [ + "em" + ], + "text": "An example result from anota2seq, a tool used to study gene expression, shows how transcription and translation are connected. The x-axis shows changes in overall mRNA levels (transcription) between a treated and a control group, while the y-axis displays changes in the rate of protein synthesis (translation) between those groups, as measured by Ribo-seq. Grey points represent genes with no significant change in either metric and most points align near the center of the x-axis, indicating little change in mRNA levels. However, some genes exhibit increased (orange) or decreased (red) protein synthesis, suggesting direct regulation of translation rather than changes driven solely by mRNA abundance." + } + ], + "_type": "block", + "style": "normal", + "_key": "067ad9c9d6d7" + }, + { + "children": [ + { + "marks": [], + "text": "If you are a researcher interested in Ribo-seq data analysis, you can test the pipeline by following the instructions in the ", + "_key": "e5078088e49b0", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "34ab33c4a8e1" + ], + "text": "getting started", + "_key": "e5078088e49b1" + }, + { + "_type": "span", + "marks": [], + "text": " section of the pipeline. Please feel free to submit bugs and feature requests to drive ongoing improvements. You can also become part of the conversation by joining the ", + "_key": "e5078088e49b2" + }, + { + "text": "#riboseq", + "_key": "e5078088e49b3", + "_type": "span", + "marks": [ + "218183b5348d" + ] + }, + { + "_type": "span", + "marks": [], + "text": " channel in the nf-core community Slack workspace. We would love to see you there!", + "_key": "e5078088e49b4" + } + ], + "_type": "block", + "style": "normal", + "_key": "46beba019134", + "markDefs": [ + { + "_type": "link", + "href": "https://nf-co.re/riboseq/#usage", + "_key": "34ab33c4a8e1" + }, + { + "_type": "link", + "href": "https://nfcore.slack.com/channels/riboseq", + "_key": "218183b5348d" + } + ] + }, + { + "style": "h2", + "_key": "515022911e71", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Next Steps", + "_key": "bd13a8c55f6e" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Following this initial phase of work, Seqera and Altos Labs have handed over the nf-core/riboseq pipeline to the nf-core community for ongoing maintenance and development. As members of that community, we will continue to play a part in enhancing the pipeline going forward. 
We hope others will benefit from this effort and continue to improve and refine pipeline functionality.", + "_key": "14a152a9174f0" + } + ], + "_type": "block", + "style": "normal", + "_key": "2d75d51ff270" + }, + { + "style": "normal", + "_key": "09c10fe38376", + "markDefs": [ + { + "href": "https://github.com/iraiosub/riboseq-flow", + "_key": "46fa6099abc2", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Coincidentally the authors of ", + "_key": "98347010c2330" + }, + { + "text": "riboseq-flow", + "_key": "98347010c2331", + "_type": "span", + "marks": [ + "46fa6099abc2" + ] + }, + { + "_type": "span", + "marks": [], + "text": " published their related work on the same day that nf-core/riboseq was first released. This pipeline has a highly complementary set of steps, and there is already ongoing collaboration to work together to build an even better community resource.", + "_key": "98347010c2332" + } + ], + "_type": "block" + }, + { + "_key": "c566b4d435e3", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Empowering Research and Innovation", + "_key": "e5fdf870848b0" + } + ], + "_type": "block", + "style": "h2" + }, + { + "_key": "99da8271ab0f", + "markDefs": [], + "children": [ + { + "_key": "35352a1b306b0", + "_type": "span", + "marks": [], + "text": "The joint contribution of Seqera and Altos Labs to the nf-core/riboseq pipeline highlights how collaboration between industry and open-source communities can result in tools that push scientific boundaries and foster community engagement and development. By adhering to rigorous code quality and testing standards, nf-core/riboseq ensures researchers access to a dependable, cutting-edge tool." + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "56719298b452", + "markDefs": [ + { + "href": "mailto:services@seqera.io", + "_key": "ccafa728bca7", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "We believe this new pipeline is poised to be vital in studying protein synthesis and its implications for aging and health. This is not just a technical achievement - it's a step forward in collaborative, open scientific progress.", + "_key": "53386085eb760", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [ + { + "href": "mailto:services@seqera.io", + "_key": "ccafa728bca7", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "If you have a project in mind where Seqera may be able to help with our Professional Services offerings, please contact us at ", + "_key": "cafe02f0755d" + }, + { + "text": "services@seqera.io", + "_key": "53386085eb761", + "_type": "span", + "marks": [ + "ccafa728bca7" + ] + }, + { + "_type": "span", + "marks": [], + "text": ". 
We are the content experts for Nextflow, nf-core, and the Seqera Platform, and can offer tailored solutions and expert guidance to help you fulfill your objectives.", + "_key": "53386085eb762" + } + ], + "_type": "block", + "style": "normal", + "_key": "6e42514da79e" + }, + { + "children": [ + { + "marks": [], + "text": "To learn more about Altos Labs, visit ", + "_key": "3babdea8c79d0", + "_type": "span" + }, + { + "_key": "3babdea8c79d1", + "_type": "span", + "marks": [ + "026178e92bb6" + ], + "text": "https://www.altoslabs.com/" + }, + { + "text": ".", + "_key": "3babdea8c79d2", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "a5dc365dc556", + "markDefs": [ + { + "href": "https://www.altoslabs.com/", + "_key": "026178e92bb6", + "_type": "link" + } + ] + }, + { + "_key": "5b95f381569b", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Acknowledgments", + "_key": "48b61c9282e00" + } + ], + "_type": "block", + "style": "h2" + }, + { + "markDefs": [], + "children": [ + { + "text": "nf-core/riboseq was initially written by Jonathan Manning (Bioinformatics Engineer at Seqera) in collaboration with Felix Krueger and Christel Krueger (Altos Labs). The development work carried out on the pipeline was funded by Altos Labs. We thank the following people for their input (", + "_key": "d836d0eff50e0", + "_type": "span", + "marks": [] + }, + { + "_key": "d836d0eff50e1", + "_type": "span", + "marks": [ + "em" + ], + "text": "in alphabetical order" + }, + { + "text": "):", + "_key": "d836d0eff50e2", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "258428890647" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "be9ad649bb8d", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "376c006c20de0", + "_type": "span", + "marks": [], + "text": "Felipe Almeida (ZS)" + } + ] + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Anne Bresciani (ZS)", + "_key": "6046a5e41c110" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "abb0a8d9fba2" + }, + { + "style": "normal", + "_key": "31c2f31a40bc", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Caroline Eastwood (University of Edinburgh)", + "_key": "040c3d125ae60", + "_type": "span" + } + ], + "level": 1, + "_type": "block" + }, + { + "_key": "ce8f076685cf", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "f3c530a930470", + "_type": "span", + "marks": [], + "text": "Maxime U Garcia (Seqera)" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "7b34ffefab7d", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Mikhail Osipovitch (ZS)", + "_key": "e21649c58e7b0", + "_type": "span" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_key": "1f18a294d9a20", + "_type": "span", + "marks": [], + "text": "Jack Tierney (University College Cork)" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "02884c22d195", + "listItem": "bullet" + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Edward Wallace (University of Edinburgh)\n\n", + "_key": "86b2bce07178", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "8f03c90bd810" + }, + { 
+ "children": [ + { + "marks": [], + "text": "", + "_key": "1da880ad30a0", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "e59fb1d47363", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "736ce4dde440", + "markDefs": [], + "children": [ + { + "text": "\n\n", + "_key": "1c8d35ffcae9", + "_type": "span", + "marks": [] + } + ] + } + ], + "_id": "0d583937-1d7f-4c31-9e79-d8f1e5f2a2da", + "_rev": "UBGILU345IzqgWYhEN5Di2" + }, + { + "title": "Bioinformatics events you can’t miss in fall 2024 and early 2025", + "_rev": "odsN0KVxadbI50QPUHiVWo", + "publishedAt": "2024-09-24T09:27:00.000Z", + "meta": { + "description": "Get ready to mark your calendars because the fall of 2024 is going to be jam-packed with amazing opportunities to expand your knowledge, make new connections, and stay at the forefront of bioinformatics!", + "noIndex": false, + "slug": { + "current": "bioinformatics-events-2024-2025", + "_type": "slug" + }, + "_type": "meta" + }, + "_id": "15c75021-e091-4854-9aa0-fc04970ec963", + "tags": [ + { + "_type": "reference", + "_key": "851fad916bc4", + "_ref": "1b55a117-18fe-40cf-8873-6efd157a6058" + } + ], + "_createdAt": "2024-09-24T07:27:22Z", + "author": { + "_ref": "irina-silva", + "_type": "reference" + }, + "_type": "blogPost", + "body": [ + { + "markDefs": [], + "children": [ + { + "text": "Bioinformaticians worldwide, get ready to mark your calendars: Fall 2024 is looking jam-packed with amazing opportunities to learn, connect, and stay at the forefront of bioinformatics!", + "_key": "005350c7abb30", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "a4fd9b6dafb3" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "83660dc9adba0", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "616a0b80c1b7" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "With so many fantastic events happening worldwide, we've handpicked those that are bioinformatics-focused or feature bioinformatics tracks – so you can be sure not to miss the ones most relevant to you. 
\n\nHere is our curated compilation of some of the best industry events to attend this fall in Europe, North America, and Asia-Pacific, as well as a sneak peek of events coming up in 2025.", + "_key": "b2ce77baf4420" + } + ], + "_type": "block", + "style": "normal", + "_key": "1772499dc381", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "736725c102d0", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "bc91ea5b179f0" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "Top bioinformatics events in Europe", + "_key": "ba883d260fbf0", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2", + "_key": "80b88306ce1f" + }, + { + "_key": "6ef975430bd0", + "markDefs": [ + { + "href": "https://summit.nextflow.io/2024/barcelona/?utm_campaign=Summit%202024&utm_source=seqera&utm_medium=blog&utm_content=top_events_fall_2024", + "_key": "a58cd46a7162", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "a58cd46a7162", + "strong" + ], + "text": "Nextflow Summit Barcelona", + "_key": "504b028511d50" + } + ], + "_type": "block", + "style": "h3" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "e1cc809749e10", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "7408e5fde66f" + }, + { + "children": [ + { + "text": "Location:", + "_key": "ba521555b8600", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "marks": [], + "text": " Barcelona, Spain", + "_key": "e38d045f888b", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "eedc79c932b6", + "markDefs": [] + }, + { + "style": "normal", + "_key": "4a2541ddb2bf", + "markDefs": [], + "children": [ + { + "text": "Dates:", + "_key": "6f4fbc08e04d0", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_key": "3022f367802b", + "_type": "span", + "marks": [], + "text": " October 28 - November 1, 2024" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "92a0bd002639", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In-person | Online", + "_key": "b6f470c2fc4a0" + } + ], + "_type": "block" + }, + { + "_key": "81af197467b0", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "7b2910b77cf90" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "f2f63e0ffe44", + "markDefs": [ + { + "_key": "e53a5efb455a", + "_type": "link", + "href": "https://summit.nextflow.io/2024/boston/?utm_campaign=Summit%202024&utm_source=seqera&utm_medium=blog&utm_content=top_events_fall_2024" + }, + { + "_type": "link", + "href": "https://seqera.io?utm_source=seqera&utm_medium=blog&utm_content=top_events_fall_2024", + "_key": "d19e6bd20b41" + }, + { + "_type": "link", + "href": "https://seqera.io/nextflow/?utm_source=seqera&utm_medium=blog&utm_content=top_events_fall_2024", + "_key": "3f0fdb18390e" + }, + { + "_key": "3b66656713f3", + "_type": "link", + "href": "https://summit.nextflow.io/2024/barcelona/training/?utm_campaign=Summit%202024&utm_source=seqera&utm_medium=blog&utm_content=top_events_fall_2024" + }, + { + "_type": "link", + "href": "https://summit.nextflow.io/2024/barcelona/hackathon/?utm_campaign=Summit%202024&utm_source=seqera&utm_medium=blog&utm_content=top_events_fall_2024", + "_key": "3a54f51827fc" + } + ], + "children": [ + { + "text": "Did you miss out on the ", + "_key": "4b0e9968a18b0", + "_type": "span", + "marks": [] + },
+ { + "_key": "4b0e9968a18b1", + "_type": "span", + "marks": [ + "e53a5efb455a" + ], + "text": "Nextflow Summit in Boston" + }, + { + "_key": "4b0e9968a18b2", + "_type": "span", + "marks": [], + "text": " earlier this year? Don’t worry! The premier event in bioinformatics, from " + }, + { + "_type": "span", + "marks": [ + "d19e6bd20b41" + ], + "text": "Seqera ", + "_key": "4b0e9968a18b3" + }, + { + "_type": "span", + "marks": [], + "text": "- the creators of ", + "_key": "4b0e9968a18b4" + }, + { + "_type": "span", + "marks": [ + "3f0fdb18390e" + ], + "text": "Nextflow", + "_key": "4b0e9968a18b5" + }, + { + "_type": "span", + "marks": [], + "text": " - returns to the old continent and will bring together leading experts, innovators, and researchers to showcase the latest breakthroughs in ", + "_key": "4b0e9968a18b6" + }, + { + "text": "bioinformatics workflow management", + "_key": "9d456da5cb10", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "text": ". Whether you are new to Nextflow or a seasoned pro, the Nextflow Summit offers something for everyone. The ", + "_key": "373cbfba2986", + "_type": "span", + "marks": [] + }, + { + "_key": "4b0e9968a18b7", + "_type": "span", + "marks": [ + "3b66656713f3" + ], + "text": "foundational training" + }, + { + "marks": [], + "text": " is perfect for newcomers, while experienced users can dive into advanced topics during the ", + "_key": "4b0e9968a18b8", + "_type": "span" + }, + { + "_key": "4b0e9968a18b9", + "_type": "span", + "marks": [ + "3a54f51827fc" + ], + "text": "nf-core hackathon" + }, + { + "text": ". The event concludes with three days of talks where attendees can learn about the latest developments from the Nextflow world.", + "_key": "4b0e9968a18b10", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_key": "35a1087e0cc40", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "8116e6ce1ea3" + }, + { + "children": [ + { + "text": "Register by October 11 for the in-person event or by October 21 for the online event — don’t miss your chance to join! 
", + "_key": "de39ddb7f24d0", + "_type": "span", + "marks": [] + }, + { + "_key": "b1db89a2a9460", + "_type": "span", + "marks": [ + "5bfcc95e707c" + ], + "text": "Secure your spot now" + } + ], + "_type": "block", + "style": "blockquote", + "_key": "ac094e244fb8", + "markDefs": [ + { + "_key": "5bfcc95e707c", + "_type": "link", + "href": "https://summit.nextflow.io/2024/barcelona/?utm_campaign=Summit%202024&utm_source=seqera&utm_medium=blog&utm_content=top_events_fall_2024" + } + ] + }, + { + "children": [ + { + "_key": "20a3f7967a050", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "3e5fcc90ac56", + "markDefs": [] + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://www.terrapinn.com/conference/biotechx/index.stm", + "_key": "b7d0470bb0a0" + } + ], + "children": [ + { + "marks": [ + "b7d0470bb0a0", + "strong" + ], + "text": "BiotechX Europe", + "_key": "9a11618984d80", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "e416b76c45cc" + }, + { + "_type": "block", + "style": "normal", + "_key": "25078f3c087a", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "eb507cbfd26d0", + "_type": "span", + "marks": [] + } + ] + }, + { + "children": [ + { + "marks": [ + "strong" + ], + "text": "Location", + "_key": "e089ddce37050", + "_type": "span" + }, + { + "marks": [], + "text": ": Basel, Switzerland", + "_key": "6f54ca8682e0", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "56111676db9c", + "markDefs": [] + }, + { + "_key": "97e0f9f38fc5", + "markDefs": [], + "children": [ + { + "_key": "b16b44c054520", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Dates:" + }, + { + "_type": "span", + "marks": [], + "text": " October 9-10, 2024", + "_key": "a05dff9d821a" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "In-person", + "_key": "f8f8a4a1a7570", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "a5c447814d2e" + }, + { + "children": [ + { + "text": "", + "_key": "8a9758c9fea70", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "40fa33cf6eb4", + "markDefs": [] + }, + { + "children": [ + { + "_key": "2e13ccf9b0c60", + "_type": "span", + "marks": [], + "text": "If you work in pharmaceutical development and healthcare, this is the event for you to attend. As Europe’s largest conference in the industry, BiotechX Europe will welcome more than 400 speakers, 3,500 attendees, and 150 exhibitors. 
Aiming to foster collaboration between research and industry, the event features 16 tracks covering a wide range of topics, including " + }, + { + "_key": "9383f9cb63d8", + "_type": "span", + "marks": [ + "strong" + ], + "text": "bioinformatics, multi-omics data management, AI, and computational genomics" + }, + { + "marks": [], + "text": ".", + "_key": "88656d154026", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "d3885f9b73c9", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "a4d22394e97d0" + } + ], + "_type": "block", + "style": "normal", + "_key": "b0c44d155f57", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "c31bb3fd6482", + "markDefs": [ + { + "href": "https://seqera.io/seqera-at-biotechx-eu-2024/?utm_campaign=Summit%202024&utm_source=seqera&utm_medium=blog&utm_content=top_events_fall_2024", + "_key": "5bddbaea2256", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "5bddbaea2256" + ], + "text": "Seqera will be at the event", + "_key": "fa7beca9b4170" + }, + { + "_key": "fa7beca9b4171", + "_type": "span", + "marks": [], + "text": " for two full days of networking and discussion with the life sciences community from around the world. We'll also deliver a talk as part of the bioinformatics track, so be sure to stop by. Can’t make it? No worries–we'll send you the recording afterward." + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "d20d7581ffe7", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "fb0c1c420266" + } + ] + }, + { + "_type": "block", + "style": "blockquote", + "_key": "c34556d8796c", + "markDefs": [ + { + "href": "https://seqera.io/seqera-at-biotechx-eu-2024/?utm_campaign=BiotechX%20Europe%20October%202024&utm_source=seqera&utm_medium=blog&utm_content=top_events_fall_2024", + "_key": "abc21a447095", + "_type": "link" + } + ], + "children": [ + { + "text": "Send me the recording!", + "_key": "75425c19bb920", + "_type": "span", + "marks": [ + "abc21a447095" + ] + } + ] + }, + { + "style": "normal", + "_key": "9b2c991e6bc8", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "ccffff4a3a840", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "h2", + "_key": "ad59bb4d3c92", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Top bioinformatics events in North America", + "_key": "dee5fb2705be0" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "54f53931838a", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "b2f8fdbbacd3", + "markDefs": [] + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://www.ashg.org/meetings/2024meeting/", + "_key": "1de576a95b41" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "1de576a95b41", + "strong" + ], + "text": "American Society of Human Genetics (ASHG) 2024 Annual Meeting", + "_key": "f121c91a01960" + } + ], + "_type": "block", + "style": "h3", + "_key": "b82701a347b9" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "270a0b0c93520" + } + ], + "_type": "block", + "style": "normal", + "_key": "9791344b1601", + "markDefs": [] + }, + { + "children": [ + { + "text": "Location", + "_key": "5f83a2c703a00", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_key": "34f611874891", + "_type": "span", + "marks": [], + "text": 
": Denver, CO" + } + ], + "_type": "block", + "style": "normal", + "_key": "6cafccd72b27", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Dates:", + "_key": "200e5ac4ecbf0" + }, + { + "_key": "1c818d12c54a", + "_type": "span", + "marks": [], + "text": " November 5-9, 2024" + } + ], + "_type": "block", + "style": "normal", + "_key": "6dd5d782c459" + }, + { + "children": [ + { + "text": "In-person", + "_key": "48854f5dda260", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "f2486648036e", + "markDefs": [] + }, + { + "style": "normal", + "_key": "736a1293e25e", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "2d6ee098f3950" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "ASHG 2024 will welcome more than 8,000 scientists from around the world for five days of talks, exhibits, and networking events focused on ", + "_key": "bbcbfde31c4b0" + }, + { + "text": "genetics and genomics science", + "_key": "cacd04d5d23c", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_key": "712fc74e16b6", + "_type": "span", + "marks": [], + "text": ". The conference will feature many sessions and workshops dedicated to bioinformatics, big data analysis, and computational biology, making it one of the most anticipated events this year for bioinformaticians and computational biologists." + } + ], + "_type": "block", + "style": "normal", + "_key": "886e182edc4d" + }, + { + "style": "normal", + "_key": "18ad39abbcf6", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "0414fa03d6dd", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "e5ca7f24e4c2", + "markDefs": [], + "children": [ + { + "text": "Seqera will exhibit at the event and lead an industry session on November 6 at 12:00 pm. More information will be available soon.", + "_key": "3e19b04852a00", + "_type": "span", + "marks": [] + } + ] + }, + { + "style": "normal", + "_key": "ef22f2d30f12", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "f2faa13a66e9" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "c8edc7ead73f0", + "_type": "span", + "marks": [], + "text": "If you'd like to join ASHG, make sure to register by October 1 – time is running out!" 
+ } + ], + "_type": "block", + "style": "blockquote", + "_key": "a75336029a69" + }, + { + "style": "normal", + "_key": "406e5aa50374", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "97a5825ce0bd0" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "h3", + "_key": "c1609723d99c", + "markDefs": [ + { + "_type": "link", + "href": "https://sc24.supercomputing.org/", + "_key": "aad3372eb124" + } + ], + "children": [ + { + "_key": "7f49ea4d4a120", + "_type": "span", + "marks": [ + "aad3372eb124", + "strong" + ], + "text": "Supercomputing Conference (SC) 2024" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "4aa3272646850" + } + ], + "_type": "block", + "style": "normal", + "_key": "f80b88664b26" + }, + { + "children": [ + { + "_key": "2aa2e076c3e00", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Location" + }, + { + "marks": [], + "text": ": Atlanta, GA", + "_key": "26f9d1914085", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "8856171a422b", + "markDefs": [] + }, + { + "_key": "8e494c697046", + "markDefs": [], + "children": [ + { + "marks": [ + "strong" + ], + "text": "Dates", + "_key": "430211bfc8fd0", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ": November 17-22, 2024", + "_key": "4b57e300353f" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "In-person", + "_key": "d02b7076e2730" + } + ], + "_type": "block", + "style": "normal", + "_key": "d6687f77180c", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "0784e7e7b650", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "1786737407f60", + "_type": "span", + "marks": [] + } + ] + }, + { + "_key": "b05914a7f5f4", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "SC 2024 is an essential event for professionals and students in the high-performance computing (HPC) community. It is heavily oriented towards bioinformaticians involved in the computational aspects of bioinformatics and will tackle topics including ", + "_key": "a6f68a9bee6f0" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "AI, machine learning, and cloud computing", + "_key": "4c799f8dcbf5" + }, + { + "text": ". 
The six-day event will also allow attendees to attend tutorials and workshops, giving them the chance to learn from leading experts in the most popular areas of HPC.", + "_key": "07dfe2450254", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "0972cb55b5a20", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "709058259fc0", + "markDefs": [] + }, + { + "_key": "613b9498e172", + "markDefs": [], + "children": [ + { + "text": "Top bioinformatics event in Asia-Pacific", + "_key": "c59da1af8df20", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2" + }, + { + "_key": "d645227f6012", + "markDefs": [ + { + "_key": "48fc94ab104d", + "_type": "link", + "href": "https://www.abacbs.org/conference2024/home" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "48fc94ab104d", + "strong" + ], + "text": "Australian Bioinformatics and Computational Biology Society (ABACBS)", + "_key": "cdbc4313d6e40" + } + ], + "_type": "block", + "style": "h3" + }, + { + "_type": "block", + "style": "normal", + "_key": "b7bf73f58eec", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "08ac1af2bef90" + } + ] + }, + { + "_key": "8c082012ba0e", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Location", + "_key": "74cbc34e6fc00" + }, + { + "_type": "span", + "marks": [], + "text": ": Sydney, Australia", + "_key": "6c5ae628c782" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "text": "Dates", + "_key": "3ad60b7ef0f70", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_key": "591dd93801c2", + "_type": "span", + "marks": [], + "text": ": November 4-6, 2024" + } + ], + "_type": "block", + "style": "normal", + "_key": "aa5f5d785b87" + }, + { + "style": "normal", + "_key": "51210721b576", + "markDefs": [], + "children": [ + { + "text": "In-person", + "_key": "70aeae7c0a1c0", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "084892f223c3", + "markDefs": [], + "children": [ + { + "_key": "dc51128dbf810", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "9d1c6ab024b3", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Back for its 9th edition, the Australian Bioinformatics and Computational Biology Society conference (ABACBS) is an exciting event for bioinformatics professionals and students in APAC, serving as the central hub for bioinformatics and computational biology in the region. 
In addition to highlighting international developments in the field, the conference focuses on regional bioinformatics innovations across central themes such as ", + "_key": "0b7cb42aa9110" + }, + { + "_key": "0290cc74565c", + "_type": "span", + "marks": [ + "strong" + ], + "text": "AI, statistical bioinformatics, genomics, proteomics, and single-cell and spatial technologies" + }, + { + "marks": [], + "text": ".", + "_key": "aca568628005", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "99b177a43c3c", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "733d749612cf0", + "_type": "span" + } + ] + }, + { + "_key": "00d0e23d734f", + "markDefs": [ + { + "_key": "dae3da921ce9", + "_type": "link", + "href": "https://www.combine.org.au/symp/" + } + ], + "children": [ + { + "text": "If you’re a student in the field, you should consider attending the event, which will be held in conjunction with the ", + "_key": "9513b42b74790", + "_type": "span", + "marks": [] + }, + { + "_key": "9513b42b74791", + "_type": "span", + "marks": [ + "dae3da921ce9" + ], + "text": "COMBINE" + }, + { + "_type": "span", + "marks": [], + "text": " student symposium.", + "_key": "9513b42b74792" + } + ], + "_type": "block", + "style": "blockquote" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "fbe1769338e30", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "d7f7b49e0840" + }, + { + "_type": "block", + "style": "h2", + "_key": "4560ed2f3628", + "markDefs": [], + "children": [ + { + "_key": "138d633d7ef70", + "_type": "span", + "marks": [], + "text": "Upcoming bioinformatics events in 2025" + } + ] + }, + { + "_key": "f33391676310", + "markDefs": [], + "children": [ + { + "_key": "bb6eb50c89060", + "_type": "span", + "marks": [], + "text": "For those of you already planning for next year's conference season, we’ve highlighted events that are already confirmed for 2025. While their programs are yet to be released, you can count on these events taking place." 
+ } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "", + "_key": "cbc69b8eee9f", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "8575b2b8ad7c", + "markDefs": [] + }, + { + "style": "h3", + "_key": "9527133dd59c", + "markDefs": [ + { + "href": "https://summit.nextflow.io/preregister-2025/", + "_key": "68b913f586a0", + "_type": "link" + } + ], + "children": [ + { + "_key": "facd674d64600", + "_type": "span", + "marks": [ + "68b913f586a0", + "strong" + ], + "text": "Nextflow Summit 2025" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "f7a2358dce0a0" + } + ], + "_type": "block", + "style": "normal", + "_key": "1755b2902f51", + "markDefs": [] + }, + { + "style": "normal", + "_key": "efae1385cf87", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Location", + "_key": "f026d6e6cc2d0" + }, + { + "marks": [], + "text": ": Boston & Barcelona", + "_key": "35978ba96f8b", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "a08e77d4c18c", + "markDefs": [], + "children": [ + { + "_key": "f09ea28e89f20", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Dates:" + }, + { + "marks": [], + "text": " May 13-16, 2025, Boston | Fall 2025, Barcelona", + "_key": "16d8512f2f29", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In-person | Online", + "_key": "234a304394e60" + } + ], + "_type": "block", + "style": "normal", + "_key": "a62cf4f06de9" + }, + { + "_type": "block", + "style": "normal", + "_key": "38bdbbe0ea31", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "e2bba980e95a0" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "If you missed earlier editions of the Boston and Barcelona Nextflow Summits, this is your chance to take part. The Nextflow Summit will be back in Boston during the Spring of 2025 and to Barcelona in the Fall. 
While the full agenda is yet to be released, you can already pre-register to be the first to know when tickets go on sale.", + "_key": "949ff83df9750" + } + ], + "_type": "block", + "style": "normal", + "_key": "3b9cca4be7c4" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "808678869018", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "174f3cf7650b" + }, + { + "style": "blockquote", + "_key": "ee129ec0b1f1", + "markDefs": [ + { + "_type": "link", + "href": "https://summit.nextflow.io/preregister-2025/?utm_campaign=Summit%202024&utm_source=seqera&utm_medium=blog&utm_content=top_events_fall_2024", + "_key": "5b563ada5eeb" + } + ], + "children": [ + { + "marks": [ + "5b563ada5eeb" + ], + "text": "Pre-register", + "_key": "00e54ab70ebd0", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "h3", + "_key": "d2097b2976a1", + "markDefs": [ + { + "href": "https://festivalofgenomics.com/london/en/page/home", + "_key": "ae10362c6b73", + "_type": "link" + } + ], + "children": [ + { + "_key": "56dc071f15870", + "_type": "span", + "marks": [], + "text": "\n" + }, + { + "_type": "span", + "marks": [ + "ae10362c6b73", + "strong" + ], + "text": "The Festival of Genomics & Biodata", + "_key": "5ac46e126d80" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "857eb01550dd", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "51ff3f3d71470" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "3c23340b964e0", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Location" + }, + { + "marks": [], + "text": ": London, UK", + "_key": "dc560c599223", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "e7da00284d71" + }, + { + "markDefs": [], + "children": [ + { + "marks": [ + "strong" + ], + "text": "Dates", + "_key": "679432d9beff0", + "_type": "span" + }, + { + "text": ": January 29-30, 2025", + "_key": "dd4fd5fccef2", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "5554a518940e" + }, + { + "style": "normal", + "_key": "b00fc59f223a", + "markDefs": [], + "children": [ + { + "_key": "706f90cb61650", + "_type": "span", + "marks": [], + "text": "In-person" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "e6d4529eb25f0", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "82b182d551db" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Established as the UK’s largest annual life sciences event, the Festival of Genomics & Biodata is particularly relevant for ", + "_key": "4e96f23e3acc0" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "bioinformaticians in the genomics community", + "_key": "ba05da7be364" + }, + { + "_key": "72271b05856e", + "_type": "span", + "marks": [], + "text": ". The 2025 edition is expected to gather more than 7000 attendees and 300 speakers. The full agenda will be released on October 15, 2024, but you can already express interest in registering to be the first to know when tickets go on sale!" 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "5e3ffe169818" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "c4dd4d1cba7f" + } + ], + "_type": "block", + "style": "normal", + "_key": "9e05733ec461", + "markDefs": [] + }, + { + "style": "blockquote", + "_key": "64a0959f1585", + "markDefs": [ + { + "href": "https://seqera.io/events/?utm_source=seqera&utm_medium=blog&utm_content=top_events_fall_2024", + "_key": "511e6294d45b", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Seqera will be attending the Festival for the third year in a row! We’ll share more information about our participation soon–stay tuned! To make sure you don’t miss out on any announcements, follow us on social media or check out our ", + "_key": "a552c6b5f6c50" + }, + { + "_type": "span", + "marks": [ + "511e6294d45b" + ], + "text": "events page", + "_key": "a552c6b5f6c51" + }, + { + "text": ".", + "_key": "a552c6b5f6c52", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "938fc2309ea1", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "da2fe42732c0" + }, + { + "children": [ + { + "_type": "span", + "marks": [ + "3af3eac54fd9", + "strong" + ], + "text": "Bio-IT World Conference & Expo", + "_key": "f95dcb9530220" + } + ], + "_type": "block", + "style": "h3", + "_key": "e878b99f1723", + "markDefs": [ + { + "_type": "link", + "href": "https://www.bio-itworldexpo.com/", + "_key": "3af3eac54fd9" + } + ] + }, + { + "_key": "54ddb92cf29e", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "465133b3815c0", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "54792387f148", + "markDefs": [], + "children": [ + { + "marks": [ + "strong" + ], + "text": "Location", + "_key": "f593685fd4f20", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ": Boston, MA", + "_key": "1b69f8ea855f" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Dates", + "_key": "e6746c9c63d90" + }, + { + "_key": "369508d1c229", + "_type": "span", + "marks": [], + "text": ": April 2-4, 2025" + } + ], + "_type": "block", + "style": "normal", + "_key": "9e746b424dfe" + }, + { + "markDefs": [], + "children": [ + { + "text": "In-person | Online", + "_key": "d3d807db73a90", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "10af5b1aaee8" + }, + { + "style": "normal", + "_key": "d17fe67cf1d4", + "markDefs": [], + "children": [ + { + "_key": "69afc5921d850", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "d98d6ce09255", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The Annual Conference and Expo focuses on the intersection of ", + "_key": "c570a28e8bac0" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "life sciences, data sciences, and technology", + "_key": "ec0b70541836" + }, + { + "_key": "a28d11d5fc55", + "_type": "span", + "marks": [], + "text": " and is particularly suited to bioinformaticians and computational biologists with a strong interest in data and technology. 
The event includes plenary keynotes, over 200 educational and technical presentations across 11 tracks, interactive discussions, and exhibits on the latest technologies in the life sciences. Those of you who can’t attend in person can follow a live virtual stream. Registrations are already open, and you can benefit from a discounted rate until November 15, 2024!" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "6f9ff5363f1a", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "7b614ade8e38", + "_type": "span" + } + ] + }, + { + "_key": "b061a380bfe9", + "markDefs": [], + "children": [ + { + "text": "Seqera will be a Platinum sponsor of Bio-IT World. Visit our booth on the tradeshow floor and listen to our presentation on the Cloud Computing track on Thursday, April 3. More information will be available earlier in the year.", + "_key": "46425ee3e2460", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "blockquote" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "4ec1d8fa8969", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "94fff51924ce" + }, + { + "_key": "9b2834fdcebf", + "markDefs": [], + "children": [ + { + "_key": "b4d6c1726cd40", + "_type": "span", + "marks": [], + "text": "Why these events matter: learn, innovate, connect" + } + ], + "_type": "block", + "style": "h2" + }, + { + "_key": "32dd265b0c7b", + "markDefs": [], + "children": [ + { + "text": "The events we’ve highlighted are all well-established and represent a unique opportunity to keep up with the latest research, build strong industry connections, and learn new skills. Throughout the wide range of topics and specialties covered, data scientists and bioinformaticians can keep up with how the field is advancing, both at the regional and international levels.", + "_key": "3a6692daa6b20", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "91d71b7afa4b", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The hands-on workshops and tutorials will also help develop practical skills that you can apply to your research or work.", + "_key": "02f1d633eef70" + } + ] + }, + { + "children": [ + { + "text": "Whether you’re just starting or a seasoned expert, these events represent an excellent opportunity for professional growth and to remain at the forefront of bioinformatics.", + "_key": "5e231565ed770", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "d814e73f13d4", + "markDefs": [] + }, + { + "style": "normal", + "_key": "c77e333c8e1d", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "a388e852748e", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_type": "image", + "_key": "737760a444b6", + "asset": { + "_ref": "image-54912048f85a1aa655553391b6d0e62fa57e82de-1200x628-png", + "_type": "reference" + } + }, + { + "style": "normal", + "_key": "e4638b15e2df", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "\n", + "_key": "1ced84af1e7a0", + "_type": "span" + } + ], + "_type": "block" + } + ], + "_updatedAt": "2024-09-24T09:27:24Z" + }, + { + "title": "MultiQC: Grouped samples and custom scripts", + "_updatedAt": "2024-10-16T13:31:10Z", + "_createdAt": "2024-05-23T06:21:37Z", + "publishedAt": "2024-10-16T06:00:00.000Z", + "body": [ + { + "_key": "825c0af35887", + "markDefs": [ + { + 
"_type": "link", + "href": "https://summit.nextflow.io/2024/barcelona/agenda/10-31--multiqc-new-features-and-flexible/", + "_key": "85a288678e95" + } + ], + "children": [ + { + "_key": "f07cbc53bd370", + "_type": "span", + "marks": [], + "text": "It’s been an exciting year for the MultiQC team at Seqera, with developments aimed at modernizing the codebase and expanding functionality. In this blog post we’ll recap the big features, such as long-awaited " + }, + { + "text": "Sample Grouping", + "_key": "f07cbc53bd371", + "_type": "span", + "marks": [ + "em" + ] + }, + { + "marks": [], + "text": " to simplify report tables, as well as the ability to use MultiQC as a Python library, enabling custom scripts and dynamic report generation. And there’s even more to come – stay tuned for the upcoming ", + "_key": "f07cbc53bd372", + "_type": "span" + }, + { + "text": "MultiQC talk", + "_key": "f07cbc53bd373", + "_type": "span", + "marks": [ + "85a288678e95" + ] + }, + { + "_type": "span", + "marks": [], + "text": " at the Nextflow Summit in Barcelona, excitement guaranteed!", + "_key": "f07cbc53bd374" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_key": "05dc46b2aa760", + "_type": "span", + "marks": [], + "text": "Sample grouping 🫂" + } + ], + "_type": "block", + "style": "h2", + "_key": "a2253fab8405" + }, + { + "_type": "block", + "style": "normal", + "_key": "e597c2f9c4b4", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Many of you who are used to reading MultiQC reports will be familiar with seeing ", + "_key": "8808adba7b870" + }, + { + "marks": [ + "em" + ], + "text": "General Statistics", + "_key": "8808adba7b871", + "_type": "span" + }, + { + "text": " tables that have “gaps” in rows like this:", + "_key": "8808adba7b872", + "_type": "span", + "marks": [] + } + ] + }, + { + "asset": { + "_ref": "image-0bc6a5e44bd0449bf63fa8a3fe9380e10fcaed01-3482x2064-png", + "_type": "reference" + }, + "_type": "image", + "_key": "f111f647ef2a" + }, + { + "markDefs": [], + "children": [ + { + "text": "This happens because MultiQC finds sample names from input data filenames. In the case of FastQC, paired-end sequencing data will have two FASTQ files and generate two separate FastQC reports. This means each sample name has a ", + "_key": "fe85e8a0b3450", + "_type": "span", + "marks": [] + }, + { + "text": "_R1", + "_key": "5c683f75103f", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "text": " or ", + "_key": "387b4dbd0c21", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "code" + ], + "text": "_R2", + "_key": "be0566f07b2f", + "_type": "span" + }, + { + "text": " suffix and cannot be merged with outputs from downstream analysis, where these are collapsed into a single sample identifier. 
Until now, the best advice we’ve been able to give is to either throw half of the data away or put up with the ugly tables - neither are good options!", + "_key": "dc7b7eb36c44", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "0acde0d59fb6" + }, + { + "_key": "a5338993f69c", + "markDefs": [ + { + "_key": "09acce0b2b3f", + "_type": "link", + "href": "https://github.com/MultiQC/MultiQC/issues/542" + } + ], + "children": [ + { + "marks": [], + "text": "One of the oldest open issues in the MultiQC repo (", + "_key": "8eef787f80ea0", + "_type": "span" + }, + { + "_key": "8eef787f80ea1", + "_type": "span", + "marks": [ + "09acce0b2b3f" + ], + "text": "#542" + }, + { + "_key": "8eef787f80ea2", + "_type": "span", + "marks": [], + "text": ", from 2017) is about introducing a new technique to group samples. Phil started a branch to work on the problem but hit a wall, leaving the comment " + }, + { + "text": "“This got really complicated. Need to think about how to improve it.”", + "_key": "8eef787f80ea3", + "_type": "span", + "marks": [ + "em" + ] + }, + { + "text": " There it sat, racking up occasional comments and requests for updates.", + "_key": "8eef787f80ea4", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_key": "cd6951d8332a0", + "_type": "span", + "marks": [], + "text": "Finally in MultiQC v1.25, seven years after this issue was created, we’re delighted to introduce – " + }, + { + "text": "Sample grouping", + "_key": "cd6951d8332a1", + "_type": "span", + "marks": [ + "em" + ] + }, + { + "_type": "span", + "marks": [], + "text": ":", + "_key": "cd6951d8332a2" + } + ], + "_type": "block", + "style": "normal", + "_key": "3423abf00059" + }, + { + "_key": "48911ef9fb45", + "asset": { + "_type": "reference", + "_ref": "image-c12d430eb8dc05b871a48add5e8f8e22c2ff6028-1640x720-gif" + }, + "_type": "image" + }, + { + "style": "normal", + "_key": "5f7071610ad9", + "markDefs": [ + { + "href": "https://docs.seqera.io/multiqc/reports/customisation#sample-grouping", + "_key": "48b4eec2b409", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "This new ", + "_key": "f9dd723187bd0" + }, + { + "_key": "988338ff4a02", + "_type": "span", + "marks": [ + "code" + ], + "text": "table_sample_merge" + }, + { + "_key": "d921626516f4", + "_type": "span", + "marks": [], + "text": " config option allows you to specify sample name suffixes to group into a single row (see " + }, + { + "marks": [ + "48b4eec2b409" + ], + "text": "docs", + "_key": "f9dd723187bd1", + "_type": "span" + }, + { + "_key": "f9dd723187bd2", + "_type": "span", + "marks": [], + "text": "). When set, MultiQC will group samples in supported modules under a common prefix. Any component sample statistics can be shown by toggling the caret in the row header, with summary statistics on the main row. This allows a compressed yet accurate overview of all samples, whilst still allowing readers of the report to dig in and see the underlying data for each input sample." 
+ } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "8a074f0efc68", + "markDefs": [], + "children": [ + { + "_key": "edc4f643059d0", + "_type": "span", + "marks": [], + "text": "For now, the new config option is opt-in, but we hope to soon set some common suffixes such as " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "_R1", + "_key": "c67dcad3c755" + }, + { + "_key": "18b26913d3a7", + "_type": "span", + "marks": [], + "text": " and " + }, + { + "_key": "7a569be866d9", + "_type": "span", + "marks": [ + "code" + ], + "text": "_R2" + }, + { + "_key": "4811b985f605", + "_type": "span", + "marks": [], + "text": " as defaults for all users. Some modules have the concept of sub-samples within parsed data (e.g., flow cells → lanes) and use sample grouping without needing additional configuration. The sample grouping implementation is entirely bespoke to each MultiQC module: each column needs consideration as to whether it should be averaged, summed, or something else. We’ve added support to key modules such as FastQC, Cutadapt, and BCLConvert, and plan to add support to more modules over time." + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "MultiQC as a library 📜", + "_key": "e2bf849132580" + } + ], + "_type": "block", + "style": "h2", + "_key": "32587d324a17" + }, + { + "markDefs": [], + "children": [ + { + "_key": "91f6511e64730", + "_type": "span", + "marks": [], + "text": "Version 1.22 brought some major behind-the-scenes refactoring to MultiQC. These changes enable MultiQC to be used as a library within scripts. It adds another way to customize report content beyond “Custom Content” and MultiQC Plugins, as you can now dynamically inject data, filter, and customize report content within a script. Ideal for use within analysis pipelines!" + } + ], + "_type": "block", + "style": "normal", + "_key": "0f9ae21494d4" + }, + { + "_type": "block", + "style": "normal", + "_key": "6049bf0f935f", + "markDefs": [ + { + "_key": "f089f7503fa5", + "_type": "link", + "href": "https://github.com/OpenGene/fastp" + } + ], + "children": [ + { + "marks": [], + "text": "Let's look at a very basic example to give a feel for how this could be used. Here, we have a Python script that imports MultiQC, parses report data from ", + "_key": "3bb10cbfcadb0", + "_type": "span" + }, + { + "_key": "3bb10cbfcadb1", + "_type": "span", + "marks": [ + "f089f7503fa5" + ], + "text": "fastp" + }, + { + "_type": "span", + "marks": [], + "text": ", adds a custom report section and table, and then generates a report.", + "_key": "3bb10cbfcadb2" + } + ] + }, + { + "code": "import multiqc\nfrom multiqc.plots import table\n\n# Parse logs from fastp\nmultiqc.parse_logs('./data/fastp')\n\n# Add a custom table\nmodule = multiqc.BaseMultiqcModule()\nmodule.add_section(\n plot=table.plot(\n data={\n \"sample 1\": {\"aligned\": 23542, \"not_aligned\": 343},\n \"sample 2\": {\"aligned\": 1275, \"not_aligned\": 7328},\n },\n pconfig={\n \"id\": \"my_metrics_table\",\n \"title\": \"My metrics\"\n }\n )\n)\nmultiqc.report.modules.append(module)\n\n# Generate the report\nmultiqc.write_report()", + "_type": "code", + "language": "python", + "_key": "2fc6817b3c84" + }, + { + "markDefs": [ + { + "_key": "07b92ff1c1c2", + "_type": "link", + "href": "https://docs.seqera.io/multiqc/development/plugins" + } + ], + "children": [ + { + "text": "Scripts like this can be written to do any number of things. 
We hope it removes the need to run MultiQC multiple times to report on secondary statistics. It can also enable customization of things like table columns, custom data injection, and most other things you can think of! Best of all, unlike ", + "_key": "90677f2e0a320", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "07b92ff1c1c2" + ], + "text": "MultiQC plugins", + "_key": "90677f2e0a321", + "_type": "span" + }, + { + "text": ", no special installation is needed. This will be hugely powerful for custom analysis and reporting. It also means that MultiQC becomes a first-class citizen for explorative analysis within notebooks and analysis apps.", + "_key": "90677f2e0a322", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "0d4aca5310f8" + }, + { + "_key": "4bf457962649", + "markDefs": [ + { + "_type": "link", + "href": "https://multiqc.info/docs/usage/interactive/", + "_key": "8135552fbdf7" + }, + { + "href": "https://community.seqera.io/multiqc", + "_key": "ec397c4c15d7", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "See the new ", + "_key": "98837a87371a0", + "_type": "span" + }, + { + "text": "Using MultiQC in interactive environments", + "_key": "98837a87371a1", + "_type": "span", + "marks": [ + "8135552fbdf7" + ] + }, + { + "_type": "span", + "marks": [], + "text": " page to learn more about MultiQC Python functions. ", + "_key": "98837a87371a2" + }, + { + "text": "Let us know", + "_key": "98837a87371a3", + "_type": "span", + "marks": [ + "ec397c4c15d7" + ] + }, + { + "marks": [], + "text": " how you get on with this functionality - we’d love to see what you build!", + "_key": "98837a87371a4", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Major performance improvements 🚅", + "_key": "343f5cc92fe80" + } + ], + "_type": "block", + "style": "h2", + "_key": "31398f87eeaf" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/rhpvorderman", + "_key": "fd09f4c005cf" + } + ], + "children": [ + { + "text": "In MultiQC v1.22 we’ve had a number of high-impact pull requests from ", + "_key": "17f6d6a138610", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "fd09f4c005cf" + ], + "text": "@rhpvorderman", + "_key": "17f6d6a138611", + "_type": "span" + }, + { + "marks": [], + "text": ". 
He did a deep-dive on the compression that MultiQC uses for embedding data within the HTML reports, switching the old ", + "_key": "17f6d6a138612", + "_type": "span" + }, + { + "_key": "19801d576faf", + "_type": "span", + "marks": [ + "code" + ], + "text": "lzstring" + }, + { + "text": " compression for a more up-to-date ", + "_key": "45969e812380", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "code" + ], + "text": "gzip", + "_key": "c5f9f1fce5cc", + "_type": "span" + }, + { + "text": " implementation, which made writing reports ", + "_key": "fa6d2402801f", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "4x times faster", + "_key": "17f6d6a138613" + }, + { + "marks": [], + "text": ".", + "_key": "17f6d6a138614", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "5817aa826cf0" + }, + { + "children": [ + { + "marks": [], + "text": "He also significantly optimized the file search, making it ", + "_key": "d3ed30f558930", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "54% faster", + "_key": "d3ed30f558931" + }, + { + "text": " on our benchmarks, and key modules. For example, ", + "_key": "d3ed30f558932", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "FastQC got 6x faster and uses 10x less memory", + "_key": "d3ed30f558933" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "d3ed30f558934" + } + ], + "_type": "block", + "style": "normal", + "_key": "e355a0723a10", + "markDefs": [] + }, + { + "_type": "block", + "style": "blockquote", + "_key": "688e13eb74b5", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Taken together, comparing a typical v1.22 run against v1.21 shows that MultiQC is ", + "_key": "ca2a7d2cc9ea0" + }, + { + "text": "53% faster", + "_key": "ca2a7d2cc9ea1", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_key": "ca2a7d2cc9ea2", + "_type": "span", + "marks": [], + "text": " and has a " + }, + { + "marks": [ + "strong" + ], + "text": "6x smaller peak-memory footprint", + "_key": "ca2a7d2cc9ea3", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ". It’s well worth updating!", + "_key": "ca2a7d2cc9ea4" + } + ] + }, + { + "_key": "27a4647d4af0", + "markDefs": [ + { + "_type": "link", + "href": "https://www.science.org/doi/full/10.1126/sciadv.aba1190", + "_key": "98689cf6da09" + } + ], + "children": [ + { + "marks": [], + "text": "To get these numbers for real-world scenarios, we tested some huge input datasets (many thanks to Felix Krueger for helping with these). For example, from ", + "_key": "bd242e39c19c0", + "_type": "span" + }, + { + "text": "Xing et. al. 2020", + "_key": "bd242e39c19c1", + "_type": "span", + "marks": [ + "98689cf6da09" + ] + }, + { + "marks": [], + "text": ":", + "_key": "bd242e39c19c2", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "187c0ee80e97", + "asset": { + "_ref": "image-d8472852383a8de068c60e4b67cccb9401fda6e8-2202x1206-svg", + "_type": "reference" + }, + "_type": "image" + }, + { + "children": [ + { + "_key": "26881f3f445b0", + "_type": "span", + "marks": [], + "text": "These three runs were run with identical inputs and generated essentially identical reports." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "4c2e62b1a6ce", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "c5778edba7c9", + "markDefs": [], + "children": [ + { + "_key": "29c455e60d230", + "_type": "span", + "marks": [], + "text": "These improvements will be especially noticeable with large runs. Improvements are also especially significant in certain MultiQC modules, including FastQC (10x less peak memory), Mosdepth, and Kraken (~20x improvement in memory and CPU in MultiQC v1.24, larger improvements with more samples)." + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "We hope that this makes MultiQC more usable at scale and makes your analysis pipelines run a little smoother!", + "_key": "0299272ee77e0", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "16f4ee957baa" + }, + { + "_key": "03fd047a042d", + "markDefs": [], + "children": [ + { + "_key": "fe15ec5a699d0", + "_type": "span", + "marks": [], + "text": "Unit tests 🧪" + } + ], + "_type": "block", + "style": "h2" + }, + { + "_type": "block", + "style": "normal", + "_key": "9a1dcc7171ea", + "markDefs": [], + "children": [ + { + "_key": "8dac686af5890", + "_type": "span", + "marks": [], + "text": "Until now, MultiQC only had rudimentary end-to-end testing - each continuous integration test simply runs MultiQC on a range of test data and checks that it doesn’t crash (there are a few more bells and whistles, but that’s the essence of it). These CI tests have worked remarkably well, considering. However - they do not catch unintentional changes to data outputs and are limited in their scope." + } + ] + }, + { + "markDefs": [ + { + "_key": "8c10e89e187d", + "_type": "link", + "href": "https://docs.pytest.org/" + }, + { + "_type": "link", + "href": "https://docs.seqera.io/multiqc/development/modules#tests", + "_key": "49e1086b5601" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Version 1.23 of MultiQC introduced unit tests. These small, isolated tests are a cornerstone of modern software development. A suite of ", + "_key": "69fec00fe0b70" + }, + { + "marks": [ + "8c10e89e187d" + ], + "text": "pytest", + "_key": "69fec00fe0b71", + "_type": "span" + }, + { + "_key": "69fec00fe0b72", + "_type": "span", + "marks": [], + "text": " tests now cover most of the core library code. Pytest is also used to “just run” modules as before (with 90% code coverage!), but going forward we will require module authors to include a tests directory with custom detailed unit tests. See " + }, + { + "_key": "69fec00fe0b73", + "_type": "span", + "marks": [ + "49e1086b5601" + ], + "text": "Tests" + }, + { + "_type": "span", + "marks": [], + "text": " for more information.", + "_key": "69fec00fe0b74" + } + ], + "_type": "block", + "style": "normal", + "_key": "ea3631bb639a" + }, + { + "markDefs": [], + "children": [ + { + "text": "It’s a lot of work to add useful test coverage to such a large codebase, and anyone familiar with the topic will know that it’s a job that’s never done. 
However, now that we have a framework and pattern in place we’re hopeful that test coverage will steadily increase and code quality with it.", + "_key": "154f52b5a7560", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "e632a2c061ef" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Refactoring and static typing 📐", + "_key": "751fd31f3f260" + } + ], + "_type": "block", + "style": "h2", + "_key": "62766583b084", + "markDefs": [] + }, + { + "style": "normal", + "_key": "a6ef8f12773a", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "MultiQC v1.22 refactoring brings with it the first wave of Pydantic models in the back end. This unlocks run-time validation of plot config attributes - we found and fixed a lot of bugs with this already! The code looks very similar, but the Pydantic models use classes that allow most code IDEs to highlight errors as you write. Validation at run time also means that you catch typos right away, instead of wondering why your configuration is not being applied.", + "_key": "ddace25f43050" + } + ], + "_type": "block" + }, + { + "_key": "26d505ad4426", + "asset": { + "_ref": "image-fd6b80f3dcb5eaf7c243180d8f926e59b013795c-1175x545-svg", + "_type": "reference" + }, + "_type": "image" + }, + { + "_key": "135a4646454f", + "asset": { + "_ref": "image-2cec52b9fc023cf0214e91ae653af17ab68d8631-2237x634-png", + "_type": "reference" + }, + "_type": "image" + }, + { + "children": [ + { + "_key": "aaf630994d490", + "_type": "span", + "marks": [], + "text": "Along similar lines, the core MultiQC library and test suite has had type annotations added throughout, complete with CI testing using " + }, + { + "marks": [ + "de93e9dea13b" + ], + "text": "mypy", + "_key": "aaf630994d491", + "_type": "span" + }, + { + "text": ". We will progressively add typing to all MultiQC modules over time. Typing also helps the MultiQC developer experience, with rich IDE integrations and earlier bug-catching.", + "_key": "aaf630994d492", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "91e612186bc9", + "markDefs": [ + { + "href": "https://mypy-lang.org/", + "_key": "de93e9dea13b", + "_type": "link" + } + ] + }, + { + "_type": "block", + "style": "h2", + "_key": "5369acf21e63", + "markDefs": [], + "children": [ + { + "_key": "5e015b0762a50", + "_type": "span", + "marks": [], + "text": "HighCharts removed 🗑" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "In v1.20 we added support for using Plotly instead of HighCharts for graphs in MultiQC reports. We left the HighCharts code in place whilst we transitioned to the new library, in case people hit any major issues with Plotly. As of v1.22 the HighCharts support (via ", + "_key": "20b3965936730" + }, + { + "marks": [ + "code" + ], + "text": "--template highcharts", + "_key": "ffa5eceee420", + "_type": "span" + }, + { + "text": ") has been removed completely. 
See the ",
+ "_key": "5c85500d9de2",
+ "_type": "span",
+ "marks": []
+ },
+ {
+ "_key": "20b3965936731",
+ "_type": "span",
+ "marks": [
+ "85116740be7f"
+ ],
+ "text": "MultiQC: A fresh coat of paint"
+ },
+ {
+ "marks": [],
+ "text": " blog to find out more about this topic.",
+ "_key": "20b3965936732",
+ "_type": "span"
+ }
+ ],
+ "_type": "block",
+ "style": "normal",
+ "_key": "a8c352ae0daf",
+ "markDefs": [
+ {
+ "_type": "link",
+ "href": "https://seqera.io/blog/multiqc-plotly/",
+ "_key": "85116740be7f"
+ }
+ ]
+ },
+ {
+ "children": [
+ {
+ "text": "Moving to seqera.io",
+ "_key": "330cd974225b0",
+ "_type": "span",
+ "marks": []
+ }
+ ],
+ "_type": "block",
+ "style": "h2",
+ "_key": "f7684112839e",
+ "markDefs": []
+ },
+ {
+ "_type": "block",
+ "style": "normal",
+ "_key": "3fb7ceae5313",
+ "markDefs": [],
+ "children": [
+ {
+ "marks": [],
+ "text": "Since MultiQC joined the Seqera family in 2022, we’ve been steadily improving integration with other Seqera tools and websites. Last year, we launched the Seqera Community Forum with a dedicated MultiQC section, which has been a valuable resource for users. Recently, we’ve continued this effort by moving all MultiQC documentation to Seqera.io, providing a single, streamlined location for accessing information and searching across all Seqera tools. Old links will still redirect, ensuring a smooth transition.",
+ "_key": "e79f1010d2450",
+ "_type": "span"
+ }
+ ]
+ },
+ {
+ "_key": "d4f6fd75df99",
+ "markDefs": [
+ {
+ "href": "https://seqera.io/multiqc/",
+ "_key": "04cd3809c790",
+ "_type": "link"
+ },
+ {
+ "_type": "link",
+ "href": "https://multiqc.info",
+ "_key": "205fbd181003"
+ }
+ ],
+ "children": [
+ {
+ "_key": "44fc010597ec0",
+ "_type": "span",
+ "marks": [],
+ "text": "We’re also excited to announce that we’re launching a new MultiQC product page at "
+ },
+ {
+ "text": "https://seqera.io/multiqc/",
+ "_key": "44fc010597ec1",
+ "_type": "span",
+ "marks": [
+ "04cd3809c790"
+ ]
+ },
+ {
+ "_key": "44fc010597ec2",
+ "_type": "span",
+ "marks": [],
+ "text": " with an updated design, which will replace "
+ },
+ {
+ "_type": "span",
+ "marks": [
+ "205fbd181003"
+ ],
+ "text": "https://multiqc.info",
+ "_key": "44fc010597ec3"
+ },
+ {
+ "marks": [],
+ "text": ". This fresh look aligns with the rest of the Seqera ecosystem, making it easier to explore MultiQC’s features and stay up to date with future developments.",
+ "_key": "44fc010597ec4",
+ "_type": "span"
+ }
+ ],
+ "_type": "block",
+ "style": "normal"
+ }
+ ],
+ "tags": [
+ {
+ "_ref": "ea6c309b-154f-45c3-9fda-650d7764b260",
+ "_type": "reference",
+ "_key": "933aa64152ba"
+ },
+ {
+ "_type": "reference",
+ "_key": "0e00c52955ba",
+ "_ref": "be8b298c-af12-4b5f-89cd-d2e208580926"
+ }
+ ],
+ "author": {
+ "_ref": "phil-ewels",
+ "_type": "reference"
+ },
+ "_type": "blogPost",
+ "_rev": "mvya9zzDXWakVjnX4hBcNe",
+ "_id": "28fbd463-3640-4195-8c8f-82cf183846f9",
+ "meta": {
+ "_type": "meta",
+ "shareImage": {
+ "asset": {
+ "_ref": "image-d7dd7dfbf392ebb35e2f6a2be71934efc944ccc4-1200x1200-png",
+ "_type": "reference"
+ },
+ "_type": "image"
+ },
+ "description": "Introducing grouped table rows with collapsed sub-samples! 
Also big performance improvements and a new ability to work as a Python library within scripts, notebooks and Python apps.", + "noIndex": false, + "slug": { + "current": "multiqc-grouped-samples", + "_type": "slug" + } + } + }, + { + "author": { + "_ref": "evan-floden", + "_type": "reference" + }, + "meta": { + "description": "Today marks a major milestone in that journey as we release two new free and open resources for the community: Seqera Pipelines and Seqera Containers.", + "noIndex": false, + "slug": { + "current": "introducing-seqera-pipelines-containers", + "_type": "slug" + }, + "_type": "meta", + "shareImage": { + "asset": { + "_ref": "image-85ca91b4138fbab39962965a2ac2eec7e49514bf-4800x2700-png", + "_type": "reference" + }, + "_type": "image" + } + }, + "body": [ + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Seqera is built on the promise that modern tooling and open software can improve scientists’ daily lives. We believe in empowering scientists and developers to focus on what they do best: groundbreaking research. Today marks a major milestone in that journey as we release two new free and open resources for the community: Seqera Pipelines and Seqera Containers.", + "_key": "a8a33347272f0", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "a558c16e7d96" + }, + { + "style": "normal", + "_key": "a5a7c42890d3", + "markDefs": [], + "children": [ + { + "_key": "1f5f2a98e9c80", + "_type": "span", + "marks": [], + "text": "These projects bring together the components bioinformaticians need into a simple interface, making it easy to find open-source pipelines to run or build a software container combining virtually any tools. By streamlining access to resources and fostering collaboration, we improve the velocity, quality, and reproducibility of your research." + } + ], + "_type": "block" + }, + { + "children": [ + { + "marks": [], + "text": "Seqera Pipelines: Guiding Your Research Journey", + "_key": "5d6bde0200f20", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "ee4ea263d7a6", + "markDefs": [] + }, + { + "style": "normal", + "_key": "0ee406b47b21", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nextflow-io/awesome-nextflow", + "_key": "0bac7a322b6a" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In the early days of Nextflow, the ", + "_key": "6a2a8f59d99f0" + }, + { + "_key": "6a2a8f59d99f1", + "_type": "span", + "marks": [ + "0bac7a322b6a" + ], + "text": "“awesome-nextflow” GitHub repository" + }, + { + "_type": "span", + "marks": [], + "text": " was the go-to place to find pipelines. People would list their open-source workflows so that others could find one to match their data. Over time, the Nextflow community grew, and this particular resource became unmanageable. Projects such as nf-core have emerged with collections of workflows, but there are very many other high-quality Nextflow pipelines beyond nf-core that can be difficult to find.", + "_key": "6a2a8f59d99f2" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "4b279492ce7a0", + "_type": "span", + "marks": [], + "text": "Seqera Pipelines is the modern replacement for the " + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "“awesome-nextflow”", + "_key": "4b279492ce7a1" + }, + { + "_type": "span", + "marks": [], + "text": " repo. We’ve put together a list of the best open-source workflows for you to search. 
We know from experience that finding high-quality pipelines is critical, so we’re using a tightly curated list of the very best workflows to begin with. Every pipeline comes with curated test data, so you can import into Seqera Platform and launch a test run in just a few clicks:", + "_key": "4b279492ce7a2" + } + ], + "_type": "block", + "style": "normal", + "_key": "b91d8e210a02", + "markDefs": [] + }, + { + "_type": "youtube", + "id": "KWw0NP-CT_s", + "_key": "659e5fb9c13f" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "3aed490d3739", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "3f97cf65f113" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://nextflow.io/", + "_key": "58d9d8012ab0" + }, + { + "_key": "9a41867bc689", + "_type": "link", + "href": "https://github.com/seqeralabs/tower-cli" + }, + { + "_type": "link", + "href": "https://nf-co.re/docs/nf-core-tools/pipelines/launch", + "_key": "92b93540f2b1" + } + ], + "children": [ + { + "text": "Once you’ve found an interesting pipeline, you can easily dive into the details. We show key information on the pipeline details page and provide a one-click experience to add pipelines to your launchpad within Seqera Platform. If you’re more at home in the terminal, you can use the launch box to grab commands for ", + "_key": "72ba431d1b440", + "_type": "span", + "marks": [] + }, + { + "_key": "72ba431d1b441", + "_type": "span", + "marks": [ + "58d9d8012ab0" + ], + "text": "Nextflow" + }, + { + "_key": "72ba431d1b442", + "_type": "span", + "marks": [], + "text": ", " + }, + { + "_type": "span", + "marks": [ + "9a41867bc689" + ], + "text": "Seqera Platform CLI", + "_key": "72ba431d1b443" + }, + { + "_key": "72ba431d1b444", + "_type": "span", + "marks": [], + "text": ", and " + }, + { + "text": "nf-core/tools", + "_key": "72ba431d1b445", + "_type": "span", + "marks": [ + "92b93540f2b1" + ] + }, + { + "marks": [], + "text": ".", + "_key": "72ba431d1b446", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "82628c058ed8" + }, + { + "style": "normal", + "_key": "414a3516da55", + "markDefs": [], + "children": [ + { + "_key": "4e2e0870666b0", + "_type": "span", + "marks": [], + "text": "We have big plans for Seqera Pipelines. By prioritizing actively maintained pipelines that adhere to industry standards, we minimize the risk of researchers encountering obsolete or malfunctioning pipelines. As we improve our accuracy, we will open up the catalog to include greater numbers of workflows." + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "7064064535a60", + "_type": "span", + "marks": [], + "text": "Discovering a workflow is only the first step of a journey. In the future, we will extend Seqera Pipelines with additional features, such as the ability to create collections of your favorite pipelines and discuss their usage – both to get help and to help others in the community. Seqera Pipelines is already the best place to find your next workflow, and it’s only going to get better." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "1c4ddb148e2c" + }, + { + "markDefs": [], + "children": [ + { + "text": "Seqera Containers: The Magic of Reproducibility", + "_key": "8a75a73cb2f80", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2", + "_key": "dd0249c1bab7" + }, + { + "_key": "b5ae89a0a5a4", + "markDefs": [ + { + "_type": "link", + "href": "https://nextflow.io/podcast/2023/ep13_nextflow_10_years.html", + "_key": "122f56122a7d" + } + ], + "children": [ + { + "text": "Containers have transformed the research landscape, providing portable environments that encapsulate software, dependencies, and libraries – eliminating compatibility issues across various computing environments. Nextflow was a ", + "_key": "0052d9e9bd970", + "_type": "span", + "marks": [] + }, + { + "text": "very early adopter", + "_key": "0052d9e9bd971", + "_type": "span", + "marks": [ + "122f56122a7d" + ] + }, + { + "text": " of Docker and has provided first-class support for software containers for nearly a decade.", + "_key": "0052d9e9bd972", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "47493cbada63", + "markDefs": [ + { + "href": "https://biocontainers.pro/", + "_key": "618378cd6cb4", + "_type": "link" + }, + { + "href": "https://bioconda.github.io/", + "_key": "e82d6ce5c752", + "_type": "link" + } + ], + "children": [ + { + "text": "While using containers isn’t entirely without friction. Pipeline developers need to write Dockerfile scripts for each step in their workflow. Projects such as ", + "_key": "cddca7666cf60", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "618378cd6cb4" + ], + "text": "BioContainers", + "_key": "cddca7666cf61" + }, + { + "_type": "span", + "marks": [], + "text": " have greatly simplified this process with pre-built images for ", + "_key": "cddca7666cf62" + }, + { + "marks": [ + "e82d6ce5c752" + ], + "text": "Bioconda", + "_key": "cddca7666cf63", + "_type": "span" + }, + { + "marks": [], + "text": " tools but are somewhat limited, especially when multiple tools are needed in a single container. We set out to improve this experience with Wave: our open-source on-demand container provisioning service. Wave allows Nextflow developers to simply reference a set of conda packages or a bundled Dockerfile. When the pipeline runs, the container is built on the fly and can be targeted for the specific local environment that the workflow is running in.", + "_key": "cddca7666cf64", + "_type": "span" + } + ] + }, + { + "markDefs": [ + { + "_key": "a60f91c08427", + "_type": "link", + "href": "https://seqera.io/containers" + } + ], + "children": [ + { + "marks": [], + "text": "With ", + "_key": "19330cec0f8c0", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "a60f91c08427" + ], + "text": "Seqera Containers", + "_key": "19330cec0f8c1" + }, + { + "text": ", we’re taking the experience of Wave one step further. Instead of browsing available images as you would with a traditional container registry, just type in the names of the tools you want to use. Clicking “Get container” returns a container URI instantly, which you can use for anything - Nextflow pipeline or not. The key difference with Seqera Containers is that the image is also stored in an image cache, with infrastructure provided by our friends at AWS. Subsequent requests for the same package set will return the same image, ensuring reproducibility across runs. 
The cache has no expiry date, so those images will still be there if you need to rerun your analysis in the future.", + "_key": "19330cec0f8c2", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "86a0cd5ff0e7" + }, + { + "id": "mk67PjOIp8o", + "_key": "c6c73031246e", + "_type": "youtube" + }, + { + "style": "normal", + "_key": "ea5a4d906865", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Not only can you request any combination of packages, but you can also select architecture and image format. Builds with linux/arm64 architecture promise to open up analysis to new, more efficient compute platforms. Choosing Singularity leads to a native Singularity / Apptainer build with an OCI-compliant architecture and even a URL to download the ", + "_key": "867e4c63d88b0", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": ".sif", + "_key": "867e4c63d88b1" + }, + { + "marks": [], + "text": " file directly.", + "_key": "867e4c63d88b2", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Clicking “View build details” for the container shows the full information of the Dockerfile, conda environment file, and build settings, as well as the complete build logs. Every container includes results from a security scan using ", + "_key": "2c3f04b3ed850" + }, + { + "_type": "span", + "marks": [ + "5dc792b284e9" + ], + "text": "Trivy", + "_key": "2c3f04b3ed851" + }, + { + "marks": [], + "text": " attached.", + "_key": "2c3f04b3ed852", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "3518d0cbbefc", + "markDefs": [ + { + "_type": "link", + "href": "https://trivy.dev/", + "_key": "5dc792b284e9" + } + ] + }, + { + "style": "normal", + "_key": "8fadea99d6d1", + "markDefs": [], + "children": [ + { + "_key": "60ca627f3ac10", + "_type": "span", + "marks": [], + "text": "While the web interface is the easiest way to get started with Seqera Containers, it doesn’t end there. The same functionality extends to Nextflow and the Wave CLI. Just tell Wave to “freeze” with a set of conda packages, and the resulting image will be cached in the public Seqera Containers registry." + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "845b89f38040", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Seqera Containers is a free service provided by Seqera and AWS. It does not require authentication of any kind to use, and is configured with very high rate limits so that nothing stops your pipeline from pulling 50 images all at once! We can’t wait to see how the entire bioinformatics community uses it, both Nextflow users and beyond.", + "_key": "060de8d675b40", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "h2", + "_key": "1f7bbdd1a7fb", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "What lies ahead", + "_key": "a5f3661cee520", + "_type": "span" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_key": "5139413842540", + "_type": "span", + "marks": [], + "text": "Pipelines and Containers represent just the beginning of Seqera’s vision to be the home of open science. We think that these two resources can have a real impact on researchers around the globe, and we’re excited to continue working with them to extend their functionality. 
We’re committed to collaborating with the community to focus on the features that you need, so do let us know what you think and what you want next!" + } + ], + "_type": "block", + "style": "normal", + "_key": "9bb85bc9c54e" + }, + { + "_type": "block", + "style": "normal", + "_key": "cb6fe74411f4", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "\n", + "_key": "f64f44ea540a0", + "_type": "span" + } + ] + } + ], + "_rev": "mAO9W5hBo57qoxiglmBcPn", + "_createdAt": "2024-05-23T07:01:07Z", + "title": "Empowering scientists with seamless access to bioinformatics resources", + "publishedAt": "2024-05-23T12:00:00.000Z", + "tags": [ + { + "_ref": "ea6c309b-154f-45c3-9fda-650d7764b260", + "_type": "reference", + "_key": "ef12481e08d5" + }, + { + "_ref": "1b55a117-18fe-40cf-8873-6efd157a6058", + "_type": "reference", + "_key": "508790ebf0f9" + } + ], + "_type": "blogPost", + "_id": "35e0b13e-aa5a-4018-88c5-6a175d477f1d", + "_updatedAt": "2024-05-28T14:18:22Z" + }, + { + "meta": { + "_type": "meta", + "description": "Call for grants 2021 aimed at R&D Projects in AI and other digital technologies and their integration into value chains", + "noIndex": false, + "slug": { + "current": "optimization-computation-resources-ML-AI", + "_type": "slug" + } + }, + "body": [ + { + "asset": { + "_ref": "image-22a6d646f122e9df55c154735882a2cb56ae7d87-1600x225-jpg", + "_type": "reference" + }, + "_type": "image", + "_key": "2ca9d274f836" + }, + { + "style": "h3", + "_key": "f01b00d90f54", + "markDefs": [], + "children": [ + { + "text": "Call for grants 2021 aimed at R&D projects in AI and other digital technologies and their integration into value chains", + "_key": "958028fa63b50", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "6155ddde2c0a", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "9af02a5030750", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://www.red.es/es", + "_key": "64b2513c3ed1" + }, + { + "href": "https://commission.europa.eu/funding-tenders/find-funding/eu-funding-programmes/european-regional-development-fund-erdf_en#:~:text=The%20European%20Regional%20Development%20Fund,dedicated%20national%20or%20regional%20programmes.", + "_key": "19a9e11e0b53", + "_type": "link" + }, + { + "_type": "link", + "href": "https://next-generation-eu.europa.eu/index_en", + "_key": "47482c371588" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The project 'Optimization of computational resources for HPC workloads in the cloud through ML/AI' by Seqera Labs S.L. 
has been funded by the ", + "_key": "de8345dee2320" + }, + { + "_type": "span", + "marks": [ + "19a9e11e0b53" + ], + "text": "European Regional Development Fund (ERDF) ", + "_key": "fda11d32da9b" + }, + { + "_type": "span", + "marks": [], + "text": "of the ", + "_key": "7ee991ead9e1" + }, + { + "marks": [ + "47482c371588" + ], + "text": "European Union", + "_key": "34fa535e4601", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ", coordinated and managed by ", + "_key": "b620d2b78c0b" + }, + { + "text": "red.es", + "_key": "41eb06c9ae78", + "_type": "span", + "marks": [ + "64b2513c3ed1" + ] + }, + { + "_type": "span", + "marks": [], + "text": ", aiming to carry out the development of technological entrepreneurship and technological demand within the framework of the Strategic Action of Digital Economy and Society of the State R&D&I Program oriented towards societal challenges.", + "_key": "7eae1ca7a7b9" + } + ], + "_type": "block", + "style": "normal", + "_key": "09aebbc4262f" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "4c8c550d6dea" + } + ], + "_type": "block", + "style": "normal", + "_key": "e210c007ebd3" + }, + { + "style": "h3", + "_key": "b0e67983a64e", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Project Description", + "_key": "78e9e7fb0ba20" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "b608cc1805b8", + "markDefs": [], + "children": [ + { + "text": "The project aims to develop a machine learning model to optimize workflow execution in the cloud, ensuring efficient use of resources. This enables users to control execution costs and achieve significant savings. Through this project's implementation, it is expected that the application of this technology will not only reduce costs and execution time but also minimize the environmental impact of computing tasks. 
Seqera Labs plays a key role in advancing personalized medicine and the discovery of new drugs.", + "_key": "71c829c8b959", + "_type": "span", + "marks": [] + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "eaac793c2e93", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "File number: 2021/C005/00149902", + "_key": "96acc3bfbd660", + "_type": "span" + } + ], + "level": 1 + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "de5b39af36aa", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "8b0400c9463d", + "_type": "span", + "marks": [], + "text": "Total investment: €1,165,466.66" + } + ] + }, + { + "_key": "0c274b82d229", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Amount of aid: €669,279.99", + "_key": "034a4c2dc3db" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "\nConvocatoria de ayudas 2021 destinadas a proyectos de investigación y desarrollo en IA y otras technologías digitales y su integración en las cadenas de valor", + "_key": "a0489e395fa60" + } + ], + "_type": "block", + "style": "h3", + "_key": "39cdd31e17dd", + "markDefs": [] + }, + { + "_key": "a59731faab3f", + "markDefs": [ + { + "_type": "link", + "href": "https://www.red.es/es", + "_key": "44d9a8fb6c5c" + }, + { + "href": "https://commission.europa.eu/funding-tenders/find-funding/eu-funding-programmes/european-regional-development-fund-erdf_en#:~:text=The%20European%20Regional%20Development%20Fund,dedicated%20national%20or%20regional%20programmes.", + "_key": "5444a1f98c6a", + "_type": "link" + }, + { + "_key": "a8150942df91", + "_type": "link", + "href": "https://next-generation-eu.europa.eu/index_en" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "El proyecto de ‘Optimización de los recursos computacionales para las cargas de trabajo de HPC en la nube mediante ML/AI’ de Seqera Labs S.L. 
ha sido financiado por el ", + "_key": "21a068641a3c0" + }, + { + "text": "Fondo Europeo de Desarrollo Regional (FEDER)", + "_key": "40aa3b3d0ecc", + "_type": "span", + "marks": [ + "5444a1f98c6a" + ] + }, + { + "text": " de la ", + "_key": "78ce1222a23b", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "a8150942df91" + ], + "text": "Unión Europea", + "_key": "e9cd306a59f1" + }, + { + "text": ", coordinada y gestionada por ", + "_key": "1ea233b7e3bc", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "44d9a8fb6c5c" + ], + "text": "red.es", + "_key": "7045f57ce29d" + }, + { + "_type": "span", + "marks": [], + "text": ", con el objetivo llevar a cabo el desarrollo del emprendimiento tecnológico y la demanda tecnológica, en el marco de la Acción Estratégica de Economía y Sociedad Digital del Programa Estatal de I+D+i orientada a retos de la sociedad.", + "_key": "e3e2efd75bba" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "5f02caf6ec9e", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "a0b3a9584258" + }, + { + "style": "h3", + "_key": "7c937551c3d3", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Descripción del proyecto", + "_key": "8916706675c00", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "El proyecto busca desarrollar un modelo de machine learning para optimizar la ejecución de flujos de trabajo en la nube, garantizando el uso eficiente de recursos. Esto permite a los usuarios controlar los costes de ejecución y lograr ahorros significativos. Con la presente ejecución del proyecto se espera que la aplicación de esta tecnología no solo reduzca los costes y el tiempo de ejecución, sino que también minimice el impacto ambiental de los trabajos de computación. 
Seqera Labs desempeña un papel fundamental en el avance de la medicina personalizada y el descubrimiento de nuevos medicamentos.", + "_key": "5b5110cae217", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "51fede68e0f0" + }, + { + "children": [ + { + "text": "Expediente nº: 2021/C005/00149902", + "_key": "ad6851bdbc700", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "d7bc90b57c58", + "listItem": "bullet", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "f01aac1dae9b", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Inversión total: 1.165.466,66 €", + "_key": "aaf0c99b85540" + } + ], + "level": 1 + }, + { + "markDefs": [], + "children": [ + { + "_key": "5e32e917e71e0", + "_type": "span", + "marks": [], + "text": "Importe de la ayuda: 669.279,99 €" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "c724b1e64798", + "listItem": "bullet" + } + ], + "tags": [ + { + "_type": "reference", + "_key": "ce64efeb3685", + "_ref": "1b55a117-18fe-40cf-8873-6efd157a6058" + }, + { + "_key": "40689d831034", + "_ref": "d356a4d5-06c1-40c2-b655-4cb21cf74df1", + "_type": "reference" + } + ], + "_createdAt": "2024-07-26T11:11:53Z", + "_rev": "0HV4XeadlxB19r3p3EDEa1", + "title": "Seqera's project on the optimization of computational resources for HPC workloads in the cloud through ML/AI has been funded by the European Union", + "publishedAt": "2024-06-05T13:38:00.000Z", + "_type": "blogPost", + "author": { + "_ref": "a7e6fb2d-94cb-4bcd-bcbd-120e379b2298", + "_type": "reference" + }, + "_id": "38329391-8e62-4aba-b4fa-32c658e33b13", + "_updatedAt": "2024-08-23T14:06:31Z" + }, + { + "_updatedAt": "2024-10-11T07:26:17Z", + "_id": "4ec4b56d-7cc0-4395-bb84-83f0e70b3f65", + "body": [ + { + "style": "normal", + "_key": "f9560979d244", + "markDefs": [], + "children": [ + { + "text": "We are excited to launch our new ", + "_key": "2416087f7fc20", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Step-by-Step blog series ", + "_key": "b54b44c628b4" + }, + { + "_key": "c48b795647da", + "_type": "span", + "marks": [], + "text": "on running Nextflow pipelines in Seqera Platform. With accompanying technical guides, the series also demonstrates how to create and configure environments for flexible tertiary analysis and troubleshooting with Data Studios." + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "8df044584c5b", + "markDefs": [ + { + "href": "https://nf-co.re/rnaseq/3.14.0/", + "_key": "89cc513e49f2", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "First up: bulk RNA sequencing (RNA-Seq) analysis with the popular ", + "_key": "9e78ccaa040f0", + "_type": "span" + }, + { + "text": "nf-core/rnaseq pipeline", + "_key": "9e78ccaa040f1", + "_type": "span", + "marks": [ + "89cc513e49f2" + ] + }, + { + "_key": "9e78ccaa040f2", + "_type": "span", + "marks": [], + "text": "." 
+ } + ] + }, + { + "_key": "9db3354d0bc0", + "asset": { + "_ref": "image-86381b024e7fed16914933c27bbe38ccfd8e1218-2265x946-png", + "_type": "reference" + }, + "_type": "image" + }, + { + "children": [ + { + "text": "The challenge of bulk RNA-Seq analysis", + "_key": "0a14112491e40", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2", + "_key": "86b1f5b85f80", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "fd6999649d8e", + "markDefs": [ + { + "_key": "42ba36bdabd3", + "_type": "link", + "href": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4406561/" + }, + { + "_key": "46970291c5f9", + "_type": "link", + "href": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9718390/#pone.0278609.ref002" + } + ], + "children": [ + { + "marks": [], + "text": "A single RNA-Seq experiment can generate ", + "_key": "b7221a3c3d190", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "42ba36bdabd3" + ], + "text": "gigabytes, or even terabytes", + "_key": "b7221a3c3d191" + }, + { + "text": ", of raw data. Translating this data into meaningful scientific results demands ", + "_key": "b7221a3c3d192", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "46970291c5f9" + ], + "text": "substantial computational power, automation, and storage", + "_key": "b7221a3c3d193" + }, + { + "text": ".", + "_key": "b7221a3c3d194", + "_type": "span", + "marks": [] + } + ] + }, + { + "_key": "b5e33c75a3a6", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "197efa67c9e40", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "9d6e316ffb52", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "As data volumes continue to grow, analysis becomes increasingly complex, especially when leveraging public resources while maintaining full sovereignty over your data. The solution?", + "_key": "66e0cb26931d0" + }, + { + "marks": [ + "strong" + ], + "text": " Seqera — a centralized bio data stack", + "_key": "66e0cb26931d1", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " for bulk RNA-Seq analysis.", + "_key": "66e0cb26931d2" + } + ] + }, + { + "style": "normal", + "_key": "8191c9a75257", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "60fff7fa9e5d0", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "In this blog post, we provide a step-by-step guide to analyze RNA-Seq data with Seqera, from quality control to differential expression analysis. 
We also demonstrate how to perform downstream analysis and visualize your data in a unified location.", + "_key": "99674b059d630" + } + ], + "_type": "block", + "style": "normal", + "_key": "1f23b3e52023", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "536666ed7688", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "93b49ab6c33f0", + "_type": "span", + "marks": [] + } + ] + }, + { + "style": "blockquote", + "_key": "f50c329381fa", + "markDefs": [ + { + "href": "https://hubs.la/Q02T26c10", + "_key": "c560b9e28fb8", + "_type": "link" + } + ], + "children": [ + { + "text": "Check out the full", + "_key": "b3a027e204c0", + "_type": "span", + "marks": [] + }, + { + "text": " ", + "_key": "8f1f21426b6b", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_key": "81f13e2f0aa9", + "_type": "span", + "marks": [ + "strong", + "c560b9e28fb8" + ], + "text": "RNA-Seq guide" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": " ", + "_key": "6659f0e1d752" + }, + { + "_type": "span", + "marks": [], + "text": "now", + "_key": "4c954b7253b7" + } + ], + "_type": "block" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "bb9706570246", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "1f530ea7976a", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Perform bulk RNA-Seq analysis in Seqera ", + "_key": "e0c48b21920f0" + } + ], + "_type": "block", + "style": "h2", + "_key": "c9caa0f35d6d" + }, + { + "markDefs": [], + "children": [ + { + "_key": "af84e8b35d8a", + "_type": "span", + "marks": [], + "text": "\n" + } + ], + "_type": "block", + "style": "normal", + "_key": "063186af7ec2" + }, + { + "children": [ + { + "marks": [], + "text": "1. Add a compute environment", + "_key": "4d8c8e640876", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "9e533abca7ae", + "markDefs": [] + }, + { + "children": [ + { + "_key": "d55935c16025", + "_type": "span", + "marks": [], + "text": "In Seqera, you are not limited to hosted compute solutions. Add and configure your choice of cloud or HPC compute environments tailored to your analysis needs in your organization workspace.\n" + } + ], + "_type": "block", + "style": "normal", + "_key": "fdb30e0a53c3", + "markDefs": [] + }, + { + "markDefs": [ + { + "href": "https://deploy-preview-131--seqera-docs.netlify.app/platform/24.1.1/getting-started/rnaseq#rna-seq-data-and-requirements", + "_key": "231c3c1f5d6e", + "_type": "link" + } + ], + "children": [ + { + "_key": "6630bd2611cb0", + "_type": "span", + "marks": [], + "text": "💡 " + }, + { + "text": "Hint: ", + "_key": "d5d5e50eb8af", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_key": "e70a96b76c37", + "_type": "span", + "marks": [], + "text": "Depending on the number of samples and the sequencing depth of your input data, select the desired " + }, + { + "_key": "6630bd2611cb1", + "_type": "span", + "marks": [ + "231c3c1f5d6e" + ], + "text": "compute and storage recommendations" + }, + { + "_key": "6630bd2611cb2", + "_type": "span", + "marks": [], + "text": " for your RNA-Seq analysis." 
+ } + ], + "_type": "block", + "style": "blockquote", + "_key": "d343783891dd" + }, + { + "_type": "image", + "_key": "7c3d10af89b6", + "asset": { + "_type": "reference", + "_ref": "image-11f5e2e5a1fdf1554329af5843be890dcf7f60b0-2452x1080-gif" + } + }, + { + "children": [ + { + "marks": [], + "text": "See the ", + "_key": "3340851e8d700", + "_type": "span" + }, + { + "marks": [ + "92d7b15144de", + "strong" + ], + "text": "full RNASeq guide", + "_key": "c394ca815f2b", + "_type": "span" + }, + { + "_key": "e01af32724dd", + "_type": "span", + "marks": [], + "text": " for AWS Batch compute environment configuration steps." + } + ], + "_type": "block", + "style": "blockquote", + "_key": "4f47a0ac33d5", + "markDefs": [ + { + "href": "https://hubs.la/Q02T26c10", + "_key": "92d7b15144de", + "_type": "link" + } + ] + }, + { + "_key": "b82d885bb30e", + "markDefs": [], + "children": [ + { + "_key": "338b9773274b", + "_type": "span", + "marks": [], + "text": "\n" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "h3", + "_key": "3277fa6e7a1e", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "2. Add the nf-core/rnaseq pipeline to your workspace", + "_key": "f09af77e58640" + } + ] + }, + { + "_key": "8589cc2e751f", + "markDefs": [ + { + "_type": "link", + "href": "https://seqera.io/pipelines/", + "_key": "9dabd7634c9e" + } + ], + "children": [ + { + "_key": "fff87197a1c90", + "_type": "span", + "marks": [], + "text": "Quickly locate and import the nf-core/rnaseq pipeline from " + }, + { + "text": "Seqera Pipelines", + "_key": "ee81bf6bc3d1", + "_type": "span", + "marks": [ + "9dabd7634c9e" + ] + }, + { + "text": ", the largest curated open source repository of Nextflow pipelines.\n", + "_key": "5490456192c1", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "image", + "_key": "af17be5a781e", + "asset": { + "_type": "reference", + "_ref": "image-cbd868250d3235cc42d5d2b9afed55cf4a51afc4-2452x1080-gif" + } + }, + { + "style": "normal", + "_key": "9097e2b50848", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "\n", + "_key": "c923b38b50c7" + } + ], + "_type": "block" + }, + { + "style": "h3", + "_key": "b828d3ebe44c", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "3. 
Add your input data", + "_key": "fef4b73d0a460", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "d3e46de119d6" + } + ], + "_type": "block", + "style": "normal", + "_key": "29ca750385d3" + }, + { + "markDefs": [ + { + "_key": "0b4ee73ccb98", + "_type": "link", + "href": "https://docs.seqera.io/platform/23.3/data/data-explorer" + }, + { + "_key": "3d81dbad7494", + "_type": "link", + "href": "https://docs.seqera.io/platform/23.2/datasets/overview" + } + ], + "children": [ + { + "marks": [], + "text": "Easily access your RNA-Seq data directly from cloud storage with ", + "_key": "9219b4d669940", + "_type": "span" + }, + { + "marks": [ + "0b4ee73ccb98" + ], + "text": "Data Explorer", + "_key": "66f2139acb5c", + "_type": "span" + }, + { + "marks": [], + "text": ", or upload your samplesheets as CSV or TSV files with ", + "_key": "93a5a85286cc", + "_type": "span" + }, + { + "_key": "d7ebdb1b9163", + "_type": "span", + "marks": [ + "3d81dbad7494" + ], + "text": "Seqera Datasets" + }, + { + "marks": [], + "text": ".", + "_key": "174bdaa61b05", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "d3e6240f6a5b" + }, + { + "style": "normal", + "_key": "2e919cf9fcfe", + "markDefs": [], + "children": [ + { + "_key": "220e46a63df8", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "asset": { + "_ref": "image-c4001e1a1358d7824560347d93e5f73380c2ecbc-2842x1430-gif", + "_type": "reference" + }, + "_type": "image", + "_key": "014b6b9185b0" + }, + { + "style": "normal", + "_key": "b670539c67cd", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "76b62d63229a", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "blockquote", + "_key": "14b3055d38ba", + "markDefs": [ + { + "href": "https://docs.seqera.io/platform/24.1/getting-started/quickstart-demo/add-data", + "_key": "ee3e29b1836a", + "_type": "link" + } + ], + "children": [ + { + "text": "For more information on how to add samplesheets or other data to your workspace, see ", + "_key": "34d2e93139c10", + "_type": "span", + "marks": [] + }, + { + "text": "Add data", + "_key": "34d2e93139c11", + "_type": "span", + "marks": [ + "ee3e29b1836a", + "strong" + ] + }, + { + "text": ".", + "_key": "34d2e93139c12", + "_type": "span", + "marks": [] + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "\n4. 
Launch your RNA-Seq analysis", + "_key": "d8ffc77eb2eb0" + } + ], + "_type": "block", + "style": "h3", + "_key": "200f421da11f" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "140e7fde60d1" + } + ], + "_type": "block", + "style": "normal", + "_key": "7b5a1dd73392" + }, + { + "_key": "297cf7a52db2", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "So far, you have:\n", + "_key": "7817985b2a6f0" + }, + { + "_key": "7817985b2a6f1", + "_type": "span", + "marks": [], + "text": "✔ Created a compute environment\n✔ Added a pipeline to your workspace\n✔ Made your RNA-Seq data accessible" + } + ], + "_type": "block", + "style": "blockquote" + }, + { + "_type": "block", + "style": "normal", + "_key": "66b8edceceae", + "markDefs": [], + "children": [ + { + "_key": "b6ded3b5950b", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "7372b7e9b188", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "With your compute environment, pipeline, and data all accessible in your Seqera workspace, you are now ready to launch your analysis.", + "_key": "327b97a200040", + "_type": "span" + } + ] + }, + { + "_key": "9e9e369ea4ea", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "6640e6b2d8ad", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "image", + "_key": "4e5386177db6", + "asset": { + "_type": "reference", + "_ref": "image-ec22f1a3f3bf30daa89a6e2299af6d90e324f5f1-2452x1080-gif" + } + }, + { + "markDefs": [], + "children": [ + { + "_key": "464c6a3ffbbd", + "_type": "span", + "marks": [], + "text": "\n" + } + ], + "_type": "block", + "style": "normal", + "_key": "feeede0ec4b6" + }, + { + "_key": "182c2347f18d", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "5. Monitor your pipeline run", + "_key": "020fdd0a93170", + "_type": "span" + } + ], + "_type": "block", + "style": "h3" + }, + { + "markDefs": [], + "children": [ + { + "_key": "554f6ea184ee", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "9a7c4161c7b5" + }, + { + "_key": "d59004f41672", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Monitor your RNA-Seq analysis in real-time with aggregated statistics, workflow metrics, execution logs, and task details.", + "_key": "120b0356a9ef0", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "image", + "_key": "4912e62172b9", + "asset": { + "_type": "reference", + "_ref": "image-9fd15d225aeb54b8c2841bc74a54e42a5c8bf410-2844x1390-gif" + } + }, + { + "style": "normal", + "_key": "4aec37e32e6f", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "\n", + "_key": "394eff86aae5" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "h3", + "_key": "562e0b16234f", + "markDefs": [], + "children": [ + { + "_key": "00861713cf910", + "_type": "span", + "marks": [], + "text": "6. 
Visualize results in a single, shareable report" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_key": "46601358193a", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "ea18e816a43b" + }, + { + "style": "normal", + "_key": "59e41678413f", + "markDefs": [], + "children": [ + { + "text": "Generate a single HTML report with MultiQC for your RNA-Seq analysis to assess the integrity of your results, including statistics, alignment scores, and quality control metrics. Easily share your findings with collaborators via the report URL.", + "_key": "c4666f11149d0", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "a8cf58fd9ac7", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "b19eb6571ffe" + }, + { + "asset": { + "_ref": "image-1adf78a2589c3429a67b2d2935dc62ac0139e06c-2452x1080-gif", + "_type": "reference" + }, + "_type": "image", + "_key": "ce897e818b5d" + }, + { + "children": [ + { + "text": "", + "_key": "8bd8a33eaca0", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "bc8b22dc8bcb", + "markDefs": [] + }, + { + "_type": "block", + "style": "blockquote", + "_key": "86269ae87f15", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "💡", + "_key": "836844ce5743" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Hint:", + "_key": "f46f32f1d643" + }, + { + "text": " Easily share your findings with collaborators via the report URL.", + "_key": "57892c4eb147", + "_type": "span", + "marks": [] + } + ] + }, + { + "_type": "block", + "style": "h3", + "_key": "bddab7973f06", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "\n7. Perform interactive downstream analysis adjacent to your pipeline outputs", + "_key": "b25adfd756ce0", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "73f5877cf28c", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "bb7ed0bd126d", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "d20a88decf2d", + "_type": "span", + "marks": [], + "text": "RNA-Seq analysis often requires human interpretation or further downstream analysis of pipeline outputs. 
For example, using " + }, + { + "marks": [ + "strong" + ], + "text": "DESeq2", + "_key": "ee61aae39e891", + "_type": "span" + }, + { + "marks": [], + "text": " for differential gene expression analysis.", + "_key": "ee61aae39e892", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "0fd3c84dff27", + "markDefs": [] + }, + { + "style": "normal", + "_key": "f200cfbaf920", + "markDefs": [ + { + "_type": "link", + "href": "https://docs.seqera.io/platform/24.1/data/data-studios", + "_key": "43bb4dfea049" + } + ], + "children": [ + { + "marks": [], + "text": "Bring interactive analytical notebook environments (RStudio, Jupyter, VSCode) adjacent to your data with ", + "_key": "764a091e4b5c0", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "43bb4dfea049" + ], + "text": "Seqera’s Data Studios", + "_key": "764a091e4b5c1" + }, + { + "marks": [], + "text": " and perform downstream analysis as if you were running locally.", + "_key": "764a091e4b5c2", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "image", + "_key": "fc4e3a11cc58", + "asset": { + "_type": "reference", + "_ref": "image-9fed530cfba0aa3bd72f477449603e8bded83f09-2452x1080-gif" + } + }, + { + "_type": "block", + "style": "normal", + "_key": "af81eb9ab77f", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "d56aeedbf5a4", + "_type": "span", + "marks": [] + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Check out the ", + "_key": "153ee960e8ee0" + }, + { + "_type": "span", + "marks": [ + "c027a3adceef", + "strong" + ], + "text": "full RNASeq guide", + "_key": "f2f0efceb629" + }, + { + "text": " ", + "_key": "abc753492ace", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_type": "span", + "marks": [], + "text": "now", + "_key": "cba517eb8e9c" + } + ], + "_type": "block", + "style": "blockquote", + "_key": "73600369d6a7", + "markDefs": [ + { + "_type": "link", + "href": "https://hubs.la/Q02T26c10", + "_key": "c027a3adceef" + } + ] + }, + { + "_type": "block", + "style": "h2", + "_key": "ca0db6380e4d", + "markDefs": [], + "children": [ + { + "_key": "422d6784ced7", + "_type": "span", + "marks": [ + "strong" + ], + "text": "\nTry Seqera for free" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "e30a2b48e8c5" + } + ], + "_type": "block", + "style": "normal", + "_key": "814ff460cdd6" + }, + { + "_type": "block", + "style": "normal", + "_key": "f8ca9ce7dfd1", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "By leveraging cloud-native technology, Seqera bridges the gap between experimental data and computational analysis, allowing you to accelerate the time from data generation to meaningful scientific insights.", + "_key": "f115a7417898", + "_type": "span" + } + ] + }, + { + "markDefs": [ + { + "_key": "e87779c2247d", + "_type": "link", + "href": "https://hubs.la/Q02T26TB0" + } + ], + "children": [ + { + "text": "Sign-up", + "_key": "8e509cdc34581", + "_type": "span", + "marks": [ + "e87779c2247d", + "strong" + ] + }, + { + "text": " for free", + "_key": "22644e6e6b12", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "blockquote", + "_key": "d6ccd91918ea" + } + ], + "_createdAt": "2024-10-02T07:26:47Z", + "author": { + "_type": "reference", + "_ref": "7691d57c-16a2-4ca7-a29a-fa5d9b158a3b" + }, + "tags": [ + { + "_key": "e4630a226ba3", + "_ref": "1b55a117-18fe-40cf-8873-6efd157a6058", + "_type": "reference" + } + ], + "meta": { + 
"noIndex": false, + "slug": { + "_type": "slug", + "current": "step-by-step-rna-seq" + }, + "_type": "meta", + "description": "We are excited to launch our new Step-by-Step blog series on running Nextflow pipelines in Seqera Platform. With accompanying technical guides, the series also demonstrates how to create and configure environments for flexible tertiary analysis and troubleshooting with Data Studios." + }, + "_type": "blogPost", + "title": "Step-by-Step Series: RNA-Seq analysis in Seqera", + "_rev": "hf9hwMPb7ybAE3bqEITLMZ", + "publishedAt": "2024-10-11T07:54:00.000Z" + }, + { + "_id": "561ca06ac707", + "body": [ + { + "children": [ + { + "marks": [ + "em" + ], + "text": "Below is a step-by-step guide for creating [Docker](http://www.docker.io) images for use with [Nextflow](http://www.nextflow.io) pipelines. This post was inspired by recent experiences and written with the hope that it may encourage others to join in the virtualization revolution.", + "_key": "aa74c907fb89", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "5de644223001", + "markDefs": [] + }, + { + "style": "normal", + "_key": "fba2c75d251d", + "children": [ + { + "_type": "span", + "text": "", + "_key": "1e58c8a15fb2" + } + ], + "_type": "block" + }, + { + "_key": "50833a8d465d", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Modern science is built on collaboration. Recently I became involved with one such venture between several groups across Europe. The aim was to annotate long non-coding RNA (lncRNA) in farm animals and I agreed to help with the annotation based on RNA-Seq data. The basic procedure relies on mapping short read data from many different tissues to a genome, generating transcripts and then determining if they are likely to be lncRNA or protein coding genes.", + "_key": "5ad57d04cb9d", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "f171be4200cf" + } + ], + "_type": "block", + "style": "normal", + "_key": "df4dbb73e883" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "During several successful 'hackathon' meetings the best approach was decided and implemented in a joint effort. I undertook the task of wrapping the procedure up into a Nextflow pipeline with a view to replicating the results across our different institutions and to allow the easy execution of the pipeline by researchers anywhere.", + "_key": "85b35fa626c4" + } + ], + "_type": "block", + "style": "normal", + "_key": "84ce0feaea47", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "d043f09e00b4" + } + ], + "_type": "block", + "style": "normal", + "_key": "ca94bc941408" + }, + { + "style": "normal", + "_key": "974f1a1cdfa3", + "markDefs": [ + { + "_key": "99165958e6b5", + "_type": "link", + "href": "http://www.github.com/cbcrg/lncrna-annotation-nf" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Creating the Nextflow pipeline (", + "_key": "155a8a08d8cd" + }, + { + "_key": "357c4685588b", + "_type": "span", + "marks": [ + "99165958e6b5" + ], + "text": "here" + }, + { + "_key": "f3317867e3c0", + "_type": "span", + "marks": [], + "text": ") in itself was not a difficult task. My collaborators had documented their work well and were on hand if anything was not clear. However installing and keeping aligned all the pipeline dependencies across different the data centers was still a challenging task." 
+            }
+          ],
+          "_type": "block"
+        },
+        {
+          "children": [
+            {
+              "_key": "2a5c98bf3a96",
+              "_type": "span",
+              "text": ""
+            }
+          ],
+          "_type": "block",
+          "style": "normal",
+          "_key": "ab6f59d351cb"
+        },
+        {
+          "markDefs": [
+            {
+              "_key": "905a8bc500ad",
+              "_type": "link",
+              "href": "https://www.docker.com/"
+            }
+          ],
+          "children": [
+            {
+              "_type": "span",
+              "marks": [],
+              "text": "The pipeline is typical of many in bioinformatics, consisting of binary executions, BASH scripting, R, Perl, BioPerl and some custom Perl modules. We found the BioPerl modules in particular were very sensitive to the various versions in the ",
+              "_key": "8390ee0ee4e6"
+            },
+            {
+              "_type": "span",
+              "marks": [
+                "em"
+              ],
+              "text": "long",
+              "_key": "c58b7dc20cce"
+            },
+            {
+              "text": " dependency tree. The solution was to turn to ",
+              "_key": "e384258a5c3f",
+              "_type": "span",
+              "marks": []
+            },
+            {
+              "_type": "span",
+              "marks": [
+                "905a8bc500ad"
+              ],
+              "text": "Docker",
+              "_key": "48755f8b6d14"
+            },
+            {
+              "_key": "236e84a2092d",
+              "_type": "span",
+              "marks": [],
+              "text": " containers."
+            }
+          ],
+          "_type": "block",
+          "style": "normal",
+          "_key": "004440881a96"
+        },
+        {
+          "style": "normal",
+          "_key": "55e482405e7c",
+          "children": [
+            {
+              "_type": "span",
+              "text": "",
+              "_key": "f4792876a9aa"
+            }
+          ],
+          "_type": "block"
+        },
+        {
+          "style": "normal",
+          "_key": "df983b305d4f",
+          "markDefs": [],
+          "children": [
+            {
+              "_type": "span",
+              "marks": [],
+              "text": "I have taken this opportunity to document the process of developing the Docker side of a Nextflow + Docker pipeline in a step-by-step manner.",
+              "_key": "8fe4f707201e"
+            }
+          ],
+          "_type": "block"
+        },
+        {
+          "children": [
+            {
+              "_type": "span",
+              "text": "",
+              "_key": "649e13290a13"
+            }
+          ],
+          "_type": "block",
+          "style": "normal",
+          "_key": "64ccdad0c58d"
+        },
+        {
+          "children": [
+            {
+              "_key": "f8e4f2418ada",
+              "_type": "span",
+              "marks": [],
+              "text": "###Docker Installation"
+            }
+          ],
+          "_type": "block",
+          "style": "normal",
+          "_key": "9daaf61343a0",
+          "markDefs": []
+        },
+        {
+          "_type": "block",
+          "style": "normal",
+          "_key": "7dbae6fbfa16",
+          "children": [
+            {
+              "_type": "span",
+              "text": "",
+              "_key": "22f03df3d9b5"
+            }
+          ]
+        },
+        {
+          "markDefs": [
+            {
+              "_type": "link",
+              "href": "https://docs.docker.com/engine/installation",
+              "_key": "b39b383b61e5"
+            },
+            {
+              "_type": "link",
+              "href": "https://blog.docker.com/2016/02/docker-engine-1-10-security/",
+              "_key": "1664943865ae"
+            }
+          ],
+          "children": [
+            {
+              "_key": "3af57ef1c497",
+              "_type": "span",
+              "marks": [],
+              "text": "By far the most challenging issue is the installation of Docker. For local installations, the "
+            },
+            {
+              "text": "process is relatively straightforward",
+              "_key": "29497e07ff62",
+              "_type": "span",
+              "marks": [
+                "b39b383b61e5"
+              ]
+            },
+            {
+              "text": ". However, difficulties arise as computing moves to a cluster. Owing to security concerns, many HPC administrators have been reluctant to install Docker system-wide. 
This is changing and Docker developers have been responding to many of these concerns with ",
+              "_key": "bac8833f273e",
+              "_type": "span",
+              "marks": []
+            },
+            {
+              "text": "updates addressing these issues",
+              "_key": "f4e68c0049e2",
+              "_type": "span",
+              "marks": [
+                "1664943865ae"
+              ]
+            },
+            {
+              "text": ".",
+              "_key": "ebdcec8ebe01",
+              "_type": "span",
+              "marks": []
+            }
+          ],
+          "_type": "block",
+          "style": "normal",
+          "_key": "a438411f6220"
+        },
+        {
+          "children": [
+            {
+              "text": "",
+              "_key": "6ea9c938cc17",
+              "_type": "span"
+            }
+          ],
+          "_type": "block",
+          "style": "normal",
+          "_key": "369a356018e0"
+        },
+        {
+          "style": "normal",
+          "_key": "9c82fe0136e7",
+          "markDefs": [],
+          "children": [
+            {
+              "_type": "span",
+              "marks": [],
+              "text": "That being the case, local installations are usually perfectly fine for development. One of the golden rules in Nextflow development is to have a small test dataset that can run the full pipeline in minutes with few computational resources, i.e. one that can run on a laptop.",
+              "_key": "f06c6b5ed104"
+            }
+          ],
+          "_type": "block"
+        },
+        {
+          "children": [
+            {
+              "text": "",
+              "_key": "9f5f313834ae",
+              "_type": "span"
+            }
+          ],
+          "_type": "block",
+          "style": "normal",
+          "_key": "11b1347afcab"
+        },
+        {
+          "_type": "block",
+          "style": "normal",
+          "_key": "3640fc87e1c5",
+          "markDefs": [],
+          "children": [
+            {
+              "_type": "span",
+              "marks": [],
+              "text": "If you have Docker and Nextflow installed and you wish to view the working pipeline, you can perform the following commands to obtain everything you need and run the full lncRNA annotation pipeline on a test dataset.",
+              "_key": "0b77a23d5bf7"
+            }
+          ]
+        },
+        {
+          "_key": "9edd5abef435",
+          "children": [
+            {
+              "_type": "span",
+              "text": "",
+              "_key": "b0ad6ffae120"
+            }
+          ],
+          "_type": "block",
+          "style": "normal"
+        },
+        {
+          "_key": "e04747c2e377",
+          "code": "docker pull cbcrg/lncrna_annotation\nnextflow run cbcrg/lncrna-annotation-nf -profile test",
+          "_type": "code"
+        },
+        {
+          "children": [
+            {
+              "text": "[If these commands do not work, there could be a problem with your Docker installation.]",
+              "_key": "fb8752c7e000",
+              "_type": "span",
+              "marks": []
+            }
+          ],
+          "_type": "block",
+          "style": "normal",
+          "_key": "0fc16192bebe",
+          "markDefs": []
+        },
+        {
+          "children": [
+            {
+              "_type": "span",
+              "text": "",
+              "_key": "0af154258b87"
+            }
+          ],
+          "_type": "block",
+          "style": "normal",
+          "_key": "e0523eff522a"
+        },
+        {
+          "style": "normal",
+          "_key": "773b9de99fad",
+          "markDefs": [],
+          "children": [
+            {
+              "marks": [],
+              "text": "The first command will download the required Docker image to your computer, while the second will launch Nextflow, which automatically downloads the pipeline repository and runs it using the test data included with it.",
+              "_key": "36689d3a632c",
+              "_type": "span"
+            }
+          ],
+          "_type": "block"
+        },
+        {
+          "children": [
+            {
+              "_type": "span",
+              "text": "",
+              "_key": "0973e8d341fc"
+            }
+          ],
+          "_type": "block",
+          "style": "normal",
+          "_key": "6ba84ebe36e1"
+        },
+        {
+          "_key": "1f30f62bc089",
+          "markDefs": [],
+          "children": [
+            {
+              "text": "###The Dockerfile",
+              "_key": "3a33f2cb54af",
+              "_type": "span",
+              "marks": []
+            }
+          ],
+          "_type": "block",
+          "style": "normal"
+        },
+        {
+          "children": [
+            {
+              "_type": "span",
+              "text": "",
+              "_key": "50364dafcb96"
+            }
+          ],
+          "_type": "block",
+          "style": "normal",
+          "_key": "4cb45a2ade99"
+        },
+        {
+          "style": "normal",
+          "_key": "3f1d99c7b705",
+          "markDefs": [],
+          "children": [
+            {
+              "text": "The ",
+              "_key": "5f45a4596c7c",
+              "_type": "span",
+              "marks": []
+            },
+            {
+              "marks": [
+                "code"
+              
], + "text": "Dockerfile", + "_key": "6e92add363fc", + "_type": "span" + }, + { + "_key": "908e792d54df", + "_type": "span", + "marks": [], + "text": " contains all the instructions required by Docker to build the Docker image. It provides a transparent and consistent way to specify the base operating system and installation of all software, libraries and modules." + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "", + "_key": "eedc860980f3", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "eb6597312e37" + }, + { + "children": [ + { + "_key": "b0b033a77a83", + "_type": "span", + "marks": [], + "text": "We begin by creating a file " + }, + { + "text": "Dockerfile", + "_key": "69aa3263d8b0", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "marks": [], + "text": " in the Nextflow project directory. The Dockerfile begins with:", + "_key": "ea7e45e2295a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "d4223ee66e84", + "markDefs": [] + }, + { + "style": "normal", + "_key": "b6aef4e4bff6", + "children": [ + { + "text": "", + "_key": "dd2d05610c5b", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "code", + "_key": "c95932bd73bd", + "code": "# Set the base image to debian jessie\nFROM debian:jessie\n\n# File Author / Maintainer\nMAINTAINER Evan Floden " + }, + { + "markDefs": [], + "children": [ + { + "text": "This sets the base distribution for our Docker image to be Debian v8.4, a lightweight Linux distribution that is ideally suited for the task. We must also specify the maintainer of the Docker image.", + "_key": "dbd6ec0da776", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "dd72b4cb8f73" + }, + { + "children": [ + { + "_key": "8ce23ba404a7", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "e3e22b6493fa" + }, + { + "_key": "24d492f7dd06", + "markDefs": [], + "children": [ + { + "text": "Next we update the repository sources and install some essential tools such as ", + "_key": "883b5be27cf1", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "wget", + "_key": "de93d23dcc24" + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "a75f0d48042f" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "perl", + "_key": "b8f1f6977f76" + }, + { + "_key": "3d2e30dbd5be", + "_type": "span", + "marks": [], + "text": "." + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "0b388a47cc16", + "children": [ + { + "_type": "span", + "text": "", + "_key": "74087f39767c" + } + ], + "_type": "block" + }, + { + "_type": "code", + "_key": "0781f0913220", + "code": "RUN apt-get update && apt-get install --yes --no-install-recommends \\\n wget \\\n locales \\\n vim-tiny \\\n git \\\n cmake \\\n build-essential \\\n gcc-multilib \\\n perl \\\n python ..." + }, + { + "_type": "block", + "style": "normal", + "_key": "3ca70fafd6b8", + "markDefs": [], + "children": [ + { + "_key": "82c7900bb435", + "_type": "span", + "marks": [], + "text": "Notice that we use the command " + }, + { + "text": "RUN", + "_key": "7029c2127e5e", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " before each line. 
The ", + "_key": "cc075c2808b5" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "RUN", + "_key": "5372b2fbc07e" + }, + { + "marks": [], + "text": " instruction executes commands as if they are performed from the Linux shell.", + "_key": "54c24028d590", + "_type": "span" + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "2f4253d9b870" + } + ], + "_type": "block", + "style": "normal", + "_key": "24e6cf4eeaad" + }, + { + "_key": "ac0cc4e414e7", + "markDefs": [ + { + "_type": "link", + "href": "https://blog.replicated.com/2016/02/05/refactoring-a-dockerfile-for-image-size/", + "_key": "3b99f1c6e0d0" + }, + { + "_type": "link", + "href": "https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/", + "_key": "ee681c47a630" + } + ], + "children": [ + { + "marks": [], + "text": "Also is good practice to group as many as possible commands in the same ", + "_key": "a715e201a410", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "RUN", + "_key": "4c0542b30503" + }, + { + "_type": "span", + "marks": [], + "text": " statement. This reduces the size of the final Docker image. See ", + "_key": "cd0129fc2cb4" + }, + { + "_type": "span", + "marks": [ + "3b99f1c6e0d0" + ], + "text": "here", + "_key": "95753b3703a7" + }, + { + "marks": [], + "text": " for these details and ", + "_key": "b3d6166d7b40", + "_type": "span" + }, + { + "marks": [ + "ee681c47a630" + ], + "text": "here", + "_key": "ea9f63a37e2f", + "_type": "span" + }, + { + "_key": "fec090986d03", + "_type": "span", + "marks": [], + "text": " for more best practices." + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "24659e48c3e7", + "children": [ + { + "_type": "span", + "text": "", + "_key": "9f35046732fb" + } + ], + "_type": "block" + }, + { + "_key": "57e5a413a943", + "markDefs": [ + { + "href": "http://search.cpan.org/~miyagawa/Menlo-1.9003/script/cpanm-menlo", + "_key": "d68e3d739fed", + "_type": "link" + } + ], + "children": [ + { + "_key": "ab9ae2c48fd3", + "_type": "span", + "marks": [], + "text": "Next we can specify the install of the required perl modules using " + }, + { + "text": "cpan minus", + "_key": "376a38ae89cc", + "_type": "span", + "marks": [ + "d68e3d739fed" + ] + }, + { + "text": ":", + "_key": "b82c42d7d1f5", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "a23d9bbf5ef9", + "children": [ + { + "text": "", + "_key": "0b5c9131deb9", + "_type": "span" + } + ] + }, + { + "_type": "code", + "_key": "e7530c3f6dba", + "code": "# Install perl modules\nRUN cpanm --force CPAN::Meta \\\n YAML \\\n Digest::SHA \\\n Module::Build \\\n Data::Stag \\\n Config::Simple \\\n Statistics::Lite ..." 
+ }, + { + "children": [ + { + "_key": "c3ff2167e3c1", + "_type": "span", + "marks": [], + "text": "We can give the instructions to download and install software from GitHub using:" + } + ], + "_type": "block", + "style": "normal", + "_key": "83711b5bfb64", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "6891af5db4de" + } + ], + "_type": "block", + "style": "normal", + "_key": "00fd8f533a9a" + }, + { + "_key": "ac765553f6ad", + "code": "# Install Star Mapper\nRUN wget -qO- https://github.com/alexdobin/STAR/archive/2.5.2a.tar.gz | tar -xz \\\n && cd STAR-2.5.2a \\\n && make STAR", + "_type": "code" + }, + { + "_type": "block", + "style": "normal", + "_key": "5387c5d1aae0", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "We can add custom Perl modules and specify environmental variables such as ", + "_key": "21f01a7dee08" + }, + { + "text": "PERL5LIB", + "_key": "3c35ccd9597e", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_key": "7edd690d58bf", + "_type": "span", + "marks": [], + "text": " as below:" + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "88da8fa38161" + } + ], + "_type": "block", + "style": "normal", + "_key": "95b43e15b080" + }, + { + "code": "# Install FEELnc\nRUN wget -q https://github.com/tderrien/FEELnc/archive/a6146996e06f8a206a0ae6fd59f8ca635c7d9467.zip \\\n && unzip a6146996e06f8a206a0ae6fd59f8ca635c7d9467.zip \\\n && mv FEELnc-a6146996e06f8a206a0ae6fd59f8ca635c7d9467 /FEELnc \\\n && rm a6146996e06f8a206a0ae6fd59f8ca635c7d9467.zip\n\nENV FEELNCPATH /FEELnc\nENV PERL5LIB $PERL5LIB:${FEELNCPATH}/lib/", + "_type": "code", + "_key": "02cae409f036" + }, + { + "_type": "block", + "style": "normal", + "_key": "3db7c8965a0b", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "R and R libraries can be installed as follows:", + "_key": "fab1d01a8d76" + } + ] + }, + { + "style": "normal", + "_key": "369cb978dbc9", + "children": [ + { + "_type": "span", + "text": "", + "_key": "7e8b16febe0b" + } + ], + "_type": "block" + }, + { + "_key": "b635cd93fe02", + "code": "# Install R\nRUN echo \"deb http://cran.rstudio.com/bin/linux/debian jessie-cran3/\" >> /etc/apt/sources.list &&\\\napt-key adv --keyserver keys.gnupg.net --recv-key 381BA480 &&\\\napt-get update --fix-missing && \\\napt-get -y install r-base\n\n# Install R libraries\nRUN R -e 'install.packages(\"ROCR\", repos=\"http://cloud.r-project.org/\"); install.packages(\"randomForest\",repos=\"http://cloud.r-project.org/\")'", + "_type": "code" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/cbcrg/lncRNA-Annotation-nf/blob/master/Dockerfile", + "_key": "95d80901751f" + } + ], + "children": [ + { + "_key": "31f01f88d7d4", + "_type": "span", + "marks": [], + "text": "For the complete working Dockerfile of this project see " + }, + { + "text": "here", + "_key": "61cd37841c10", + "_type": "span", + "marks": [ + "95d80901751f" + ] + } + ], + "_type": "block", + "style": "normal", + "_key": "f897e630ac44" + }, + { + "_type": "block", + "style": "normal", + "_key": "34a2ed31ef9a", + "children": [ + { + "_key": "ec0c46f9c3c6", + "_type": "span", + "text": "" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "###Building the Docker Image", + "_key": "99404c3f6b68", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "1abf7c16ad8c" + }, + { + "children": [ + { + "_key": "1d5e4d812566", + "_type": "span", + 
"text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "b636fd70f4f5" + }, + { + "style": "normal", + "_key": "5650048a4760", + "markDefs": [], + "children": [ + { + "_key": "5264f09e8e11", + "_type": "span", + "marks": [], + "text": "Once we start working on the Dockerfile, we can build it anytime using:" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "d641e7f6bf5b", + "children": [ + { + "_key": "70fcc4126623", + "_type": "span", + "text": "" + } + ] + }, + { + "code": "docker build -t skptic/lncRNA_annotation .", + "_type": "code", + "_key": "e90f06c1b843" + }, + { + "markDefs": [], + "children": [ + { + "text": "This builds the image from the Dockerfile and assigns a tag (i.e. a name) for the image. If there are no errors, the Docker image is now in you local Docker repository ready for use.", + "_key": "fe3388bbb799", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "8ccc8a028371" + }, + { + "children": [ + { + "_key": "23129a90f294", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "7738ce1608b0" + }, + { + "markDefs": [], + "children": [ + { + "text": "###Testing the Docker Image", + "_key": "ac9aefee0790", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "53e684ed0883" + }, + { + "children": [ + { + "text": "", + "_key": "29310a754336", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "f459dd9c7e8f" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "We find it very helpful to test our images as we develop the Docker file. Once built, it is possible to launch the Docker image and test if the desired software was correctly installed. For example, we can test if FEELnc and its dependencies were successfully installed by running the following:", + "_key": "0f7532136e6a" + } + ], + "_type": "block", + "style": "normal", + "_key": "995c7f634de1" + }, + { + "_type": "block", + "style": "normal", + "_key": "0f3a99e8f0f9", + "children": [ + { + "text": "", + "_key": "76902143bcad", + "_type": "span" + } + ] + }, + { + "code": "docker run -ti lncrna_annotation\n\ncd FEELnc/test\n\nFEELnc_filter.pl -i transcript_chr38.gtf -a annotation_chr38.gtf \\\n> -b transcript_biotype=protein_coding > candidate_lncRNA.gtf\n\nexit # remember to exit the Docker image", + "_type": "code", + "_key": "8bc163f9f47c" + }, + { + "style": "normal", + "_key": "8a04e5fe54c3", + "markDefs": [], + "children": [ + { + "text": "###Tagging the Docker Image", + "_key": "3c27e7d47f5a", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "376d9185809e", + "children": [ + { + "_type": "span", + "text": "", + "_key": "b58a8fff4134" + } + ], + "_type": "block" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://hub.docker.com/", + "_key": "e8267b213edb" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Once you are confident your image is built correctly, you can tag it, allowing you to push it to ", + "_key": "0e81f997274e" + }, + { + "text": "Dockerhub.io", + "_key": "9f99511c671e", + "_type": "span", + "marks": [ + "e8267b213edb" + ] + }, + { + "marks": [], + "text": ". 
Dockerhub is an online repository for docker images which allows anyone to pull public images and run them.", + "_key": "62279d8d8677", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "1a1035fe3e9e" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "629916622f88" + } + ], + "_type": "block", + "style": "normal", + "_key": "7ad2329cd8e6" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "You can view the images in your local repository with the ", + "_key": "83a7985ea39e" + }, + { + "_key": "9b9c237f8f87", + "_type": "span", + "marks": [ + "code" + ], + "text": "docker images" + }, + { + "marks": [], + "text": " command and tag using ", + "_key": "6aaa7f1f9459", + "_type": "span" + }, + { + "marks": [ + "code" + ], + "text": "docker tag", + "_key": "56edcf9c0231", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " with the image ID and the name.", + "_key": "fccfd00ea0ef" + } + ], + "_type": "block", + "style": "normal", + "_key": "ab2403070ee6", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "4796d2e24cad" + } + ], + "_type": "block", + "style": "normal", + "_key": "2883293716da" + }, + { + "_type": "code", + "_key": "cb58c9b6a966", + "code": "docker images\n\nREPOSITORY TAG IMAGE ID CREATED SIZE\nlncrna_annotation latest d8ec49cbe3ed 2 minutes ago 821.5 MB\n\ndocker tag d8ec49cbe3ed cbcrg/lncrna_annotation:latest" + }, + { + "children": [ + { + "marks": [], + "text": "Now when we check our local images we can see the updated tag.", + "_key": "efecf9499efc", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "977cb77dafd8", + "markDefs": [] + }, + { + "children": [ + { + "_key": "de27f8c8d34d", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "9e069ba58981" + }, + { + "_type": "code", + "_key": "859c42e5cad8", + "code": "docker images\n\nREPOSITORY TAG IMAGE ID CREATED SIZE\ncbcrg/lncrna_annotation latest d8ec49cbe3ed 2 minutes ago 821.5 MB" + }, + { + "style": "normal", + "_key": "36110c0bc0bc", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "###Pushing the Docker Image to Dockerhub", + "_key": "adbb0489873f" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "72eb6aa2d1ff", + "children": [ + { + "text": "", + "_key": "1818ebcbf996", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "a7bd5e43df27", + "markDefs": [ + { + "_key": "1cf86a9aeb72", + "_type": "link", + "href": "https://hub.docker.com/" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "If you have not previously, sign up for a Dockerhub account ", + "_key": "d3c68be9bab9" + }, + { + "marks": [ + "1cf86a9aeb72" + ], + "text": "here", + "_key": "73adbe5a767b", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ". 
From the command line, login to Dockerhub and push your image.", + "_key": "fdb56fb68fc0" + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "74b517dfa3a2" + } + ], + "_type": "block", + "style": "normal", + "_key": "76d11410797f" + }, + { + "code": "docker login --username=cbcrg\ndocker push cbcrg/lncrna_annotation", + "_type": "code", + "_key": "72e018a1b3a7" + }, + { + "style": "normal", + "_key": "4e814562758e", + "markDefs": [], + "children": [ + { + "_key": "603c47308e12", + "_type": "span", + "marks": [], + "text": "You can test if you image has been correctly pushed and is publicly available by removing your local version using the IMAGE ID of the image and pulling the remote:" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "12f3da40fcc2", + "children": [ + { + "_type": "span", + "text": "", + "_key": "25b68c28836e" + } + ] + }, + { + "code": "docker rmi -f d8ec49cbe3ed\n\n# Ensure the local version is not listed.\ndocker images\n\ndocker pull cbcrg/lncrna_annotation", + "_type": "code", + "_key": "9c8cc03d66d4" + }, + { + "_key": "67b0d083f1e1", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "We are now almost ready to run our pipeline. The last step is to set up the Nexflow config.", + "_key": "fb18e8ebb6fb", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "851718e1c203", + "children": [ + { + "_key": "7e1f6285672c", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "###Nextflow Configuration", + "_key": "ee703ba1a7b8", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "89a8e9b57253" + }, + { + "style": "normal", + "_key": "301b53373abc", + "children": [ + { + "_type": "span", + "text": "", + "_key": "3c12e4f84be6" + } + ], + "_type": "block" + }, + { + "_key": "853618d141bc", + "markDefs": [], + "children": [ + { + "text": "Within the ", + "_key": "e450d4c03687", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "nextflow.config", + "_key": "eb562dcd976e" + }, + { + "_key": "0aada97916c3", + "_type": "span", + "marks": [], + "text": " file in the main project directory we can add the following line which links the Docker image to the Nexflow execution. 
The images can be:" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "", + "_key": "56eecb336e47", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "bcefea639daa" + }, + { + "style": "normal", + "_key": "b30a67dabb0c", + "listItem": "bullet", + "children": [ + { + "marks": [], + "text": "General (same docker image for all processes):", + "_key": "dc79564414fe", + "_type": "span" + }, + { + "_key": "88c45b92cac0", + "_type": "span", + "text": "\n\n" + }, + { + "_type": "span", + "text": " process {\n container = 'cbcrg/lncrna_annotation'\n }\n", + "_key": "b8e693fb94da" + }, + { + "_type": "span", + "marks": [], + "text": "Specific to a profile (specified by `-profile crg` for example):", + "_key": "dbfdedf028a7" + }, + { + "_type": "span", + "text": "\n\n", + "_key": "93fd2bf6af97" + }, + { + "text": " profile {\n crg {\n container = 'cbcrg/lncrna_annotation'\n }\n }\n", + "_key": "0e604dd4732b", + "_type": "span" + }, + { + "marks": [], + "text": "Specific to a given process within a pipeline:", + "_key": "711cd0470649", + "_type": "span" + }, + { + "_type": "span", + "text": "\n\n", + "_key": "345b60ce2451" + }, + { + "_type": "span", + "text": " $processName.container = 'cbcrg/lncrna_annotation'", + "_key": "96fd3fcfe331" + } + ], + "_type": "block" + }, + { + "_key": "a560b7a67c40", + "children": [ + { + "_type": "span", + "text": "", + "_key": "14aebd1b7468" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "4033d3bebdf9", + "markDefs": [ + { + "_type": "link", + "href": "https://www.nextflow.io/blog/2016/best-practice-for-reproducibility.html", + "_key": "f61aacdb2ef0" + } + ], + "children": [ + { + "marks": [], + "text": "In most cases it is easiest to use the same Docker image for all processes. One further thing to consider is the inclusion of the sha256 hash of the image in the container reference. 
I have ", + "_key": "1a64527f3033", + "_type": "span" + }, + { + "text": "previously written about this", + "_key": "38cf9657683c", + "_type": "span", + "marks": [ + "f61aacdb2ef0" + ] + }, + { + "_type": "span", + "marks": [], + "text": ", but briefly, including a hash ensures that not a single byte of the operating system or software is different.", + "_key": "bc4e97553513" + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "441548f75de3" + } + ], + "_type": "block", + "style": "normal", + "_key": "0eaf14f96c05" + }, + { + "code": " process {\n container = 'cbcrg/lncrna_annotation@sha256:9dfe233b...'\n }", + "_type": "code", + "_key": "e986f84b6af5" + }, + { + "markDefs": [], + "children": [ + { + "text": "All that is left now to run the pipeline.", + "_key": "132c729c8d25", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "39e6843958d4" + }, + { + "style": "normal", + "_key": "a4a85a9e7228", + "children": [ + { + "_type": "span", + "text": "", + "_key": "3eba503d4fca" + } + ], + "_type": "block" + }, + { + "_type": "code", + "_key": "a90f3eeed817", + "code": "nextflow run lncRNA-Annotation-nf -profile test" + }, + { + "children": [ + { + "marks": [], + "text": "Whilst I have explained this step-by-step process in a linear, consequential manner, in reality the development process is often more circular with changes in the Docker images reflecting changes in the pipeline.", + "_key": "6bc1b9275274", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "e51c1eda68c5", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "7ece5b1d69ec" + } + ], + "_type": "block", + "style": "normal", + "_key": "f4ab602e7e18" + }, + { + "children": [ + { + "marks": [], + "text": "###CircleCI and Nextflow", + "_key": "127548bd2ca6", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "3e9640736a34", + "markDefs": [] + }, + { + "_key": "8bd12b9a35d5", + "children": [ + { + "_key": "48776ea3a77d", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "http://www.circleci.com", + "_key": "52d7d21fec88" + } + ], + "children": [ + { + "_key": "e1a0115c8a63", + "_type": "span", + "marks": [], + "text": "Now that you have a pipeline that successfully runs on a test dataset with Docker, a very useful step is to add a continuous development component to the pipeline. With this, whenever you push a modification of the pipeline to the GitHub repo, the test data set is run on the " + }, + { + "_key": "bf9e5650e51a", + "_type": "span", + "marks": [ + "52d7d21fec88" + ], + "text": "CircleCI" + }, + { + "_type": "span", + "marks": [], + "text": " servers (using Docker).", + "_key": "a7690c9f35e1" + } + ], + "_type": "block", + "style": "normal", + "_key": "006188af7329" + }, + { + "style": "normal", + "_key": "bbb43942df4f", + "children": [ + { + "text": "", + "_key": "565cf1047e08", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "f8cea4ca2097", + "_type": "span", + "marks": [], + "text": "To include CircleCI in the Nexflow pipeline, create a file named " + }, + { + "marks": [ + "code" + ], + "text": "circle.yml", + "_key": "fa98c01db045", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " in the project directory. 
We add the following instructions to the file:", + "_key": "2f21b332f3b0" + } + ], + "_type": "block", + "style": "normal", + "_key": "41364a5f63d3", + "markDefs": [] + }, + { + "style": "normal", + "_key": "e2b0b14d0fd2", + "children": [ + { + "_type": "span", + "text": "", + "_key": "25942a6c677a" + } + ], + "_type": "block" + }, + { + "_type": "code", + "_key": "7433acb412d2", + "code": "machine:\n java:\n version: oraclejdk8\n services:\n - docker\n\ndependencies:\n override:\n\ntest:\n override:\n - docker pull cbcrg/lncrna_annotation\n - curl -fsSL get.nextflow.io | bash\n - ./nextflow run . -profile test" + }, + { + "style": "normal", + "_key": "70d6d1859e1d", + "markDefs": [], + "children": [ + { + "_key": "433129d9fd5e", + "_type": "span", + "marks": [], + "text": "Next you can sign up to CircleCI, linking your GitHub account." + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "2f2296b7bb34", + "children": [ + { + "text": "", + "_key": "3a0243c5639e", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "261b716a06a7", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Within the GitHub README.md you can add a badge with the following:", + "_key": "0be2d4a60379", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "c227bf5b9089", + "children": [ + { + "text": "", + "_key": "64db719ff8e1", + "_type": "span" + } + ] + }, + { + "_key": "a375b3bed0e9", + "code": "![CircleCI status](https://circleci.com/gh/cbcrg/lncRNA-Annotation-nf.png?style=shield)", + "_type": "code" + }, + { + "style": "normal", + "_key": "1642b961bc5a", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "###Tips and Tricks", + "_key": "46f101c27e69" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "993ba2832874", + "children": [ + { + "_key": "dd9168b63937", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "d33d746e9473", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "File permissions", + "_key": "a2f6b726c62d" + }, + { + "marks": [], + "text": ": When a process is executed by a Docker container, the UNIX user running the process is not you. Therefore any files that are used as an input should have the appropriate file permissions. 
For example, I had to change the permissions of all the input data in the test data set with:", + "_key": "287310bd8df1", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "0fd72cb1c652", + "children": [ + { + "_key": "f294eccb09f6", + "_type": "span", + "text": "" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "d03452f5b41c", + "markDefs": [], + "children": [ + { + "text": "find ", + "_key": "d7e384eae7c7", + "_type": "span", + "marks": [] + }, + { + "text": "", + "_key": "28f9a9ea28bf", + "_type": "span" + }, + { + "text": " -type f -exec chmod 644 {} ", + "_key": "d91084971fde", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "text": "\\;", + "_key": "542ab5615352" + }, + { + "_type": "span", + "marks": [], + "text": " find ", + "_key": "64fa506b82ef" + }, + { + "_type": "span", + "text": "", + "_key": "151f315f9c14" + }, + { + "_type": "span", + "marks": [], + "text": " -type d -exec chmod 755 {} ", + "_key": "d588ee4d8fb4" + }, + { + "_type": "span", + "text": "\\;", + "_key": "83813e5d73d2" + } + ] + }, + { + "_key": "5c097f5ad5b2", + "children": [ + { + "text": "", + "_key": "8dc4ce35290d", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "16153769e1e6", + "markDefs": [ + { + "_key": "a645ea709cb2", + "_type": "link", + "href": "mailto:/evanfloden@gmail.com" + } + ], + "children": [ + { + "marks": [], + "text": "###Summary This was my first time building a Docker image and after a bit of trial-and-error the process was surprising straight forward. There is a wealth of information available for Docker and the almost seamless integration with Nextflow is fantastic. Our collaboration team is now looking forward to applying the pipeline to different datasets and publishing the work, knowing our results will be completely reproducible across any platform. ", + "_key": "f21d4187558c", + "_type": "span" + }, + { + "text": "", + "_key": "4270a7ace6f5", + "_type": "span" + }, + { + "_type": "span", + "text": "", + "_key": "91edb73c5f75" + }, + { + "_type": "span", + "marks": [ + "a645ea709cb2" + ], + "text": "/evanfloden@gmail.com", + "_key": "38e9df5b660d" + } + ], + "_type": "block" + } + ], + "_rev": "Ot9x7kyGeH5005E3MIo38v", + "title": "Docker for dunces & Nextflow for nunces", + "_updatedAt": "2024-09-26T09:01:26Z", + "_type": "blogPost", + "_createdAt": "2024-09-25T14:15:05Z", + "meta": { + "slug": { + "current": "docker-for-dunces-nextflow-for-nunces" + } + }, + "author": { + "_ref": "evan-floden", + "_type": "reference" + }, + "tags": [ + { + "_ref": "ace8dd2c-eed3-4785-8911-d146a4e84bbb", + "_type": "reference", + "_key": "5edc3ed408ba" + }, + { + "_type": "reference", + "_key": "c2a74b2b2cad", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493" + } + ], + "publishedAt": "2016-06-10T06:00:00.000Z" + }, + { + "_type": "blogPost", + "title": "Introducing the new Pipeline Launch forms: A leap forward in usability and functionality", + "meta": { + "noIndex": false, + "slug": { + "_type": "slug", + "current": "new-pipeline-launch-forms" + }, + "_type": "meta", + "description": "Today, we are excited to introduce the newly redesigned Pipeline launch forms, marking the first phase in a broader initiative to revamp the entire form submission experience across our platform. 
" + }, + "_rev": "y83n3eQxj1PRqzuDdkeW1u", + "publishedAt": "2024-08-15T09:11:00.000Z", + "body": [ + { + "_key": "c3aaafb21d9e", + "markDefs": [ + { + "href": "https://feedback.seqera.io/feature-requests/p/update-forms-user-interface-including-pipeline-launch-relaunch-form-redesign", + "_key": "88ea45bac22e", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "At Seqera, we’re committed to listening to our users feedback and continuously improving the Platform to meet your evolving needs. One of the most ", + "_key": "02fd5be96db90" + }, + { + "_type": "span", + "marks": [ + "88ea45bac22e" + ], + "text": "common feature requests", + "_key": "02fd5be96db91" + }, + { + "text": " has been to enhance the form submission process, specifically the Pipeline Launch and Relaunch forms. Today, we are excited to introduce the newly redesigned Pipeline Launch forms, marking the first phase in a broader initiative to ", + "_key": "02fd5be96db92", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "strong" + ], + "text": "revamp the entire form submission experience", + "_key": "02fd5be96db93", + "_type": "span" + }, + { + "marks": [], + "text": " across our platform. This update drastically simplifies interactions in the Seqera Platform, enhancing the day-to-day user experience by addressing known usability issues in our most frequently used forms.", + "_key": "02fd5be96db94", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "youtube", + "id": "_fru3RxBDPY", + "_key": "8957e1a6644c" + }, + { + "_type": "block", + "style": "normal", + "_key": "d013fcd8ab46", + "markDefs": [ + { + "href": "https://nf-co.re/rnaseq", + "_key": "115a2d75cd73", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "em" + ], + "text": "Screen recording of the submission of the popular", + "_key": "ff0559080c5f" + }, + { + "_type": "span", + "marks": [], + "text": " ", + "_key": "4a7b74425853" + }, + { + "text": "nf-core rnaseq", + "_key": "825c7d2a20a2", + "_type": "span", + "marks": [ + "em", + "115a2d75cd73" + ] + }, + { + "_key": "1371156ece2d", + "_type": "span", + "marks": [], + "text": " " + }, + { + "marks": [ + "em" + ], + "text": "pipeline, highlighting several features of the new form.", + "_key": "61f7d7a58d6d", + "_type": "span" + } + ] + }, + { + "style": "blockquote", + "_key": "0c7e71524011", + "markDefs": [ + { + "_key": "d729831b3c3d", + "_type": "link", + "href": "/contact-us/" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The new Pipeline Launch and Relaunch form will be available to all Cloud users in an upcoming release, but if you are interested in being one of the early adopters, please contact your Seqera Account Executive or ", + "_key": "62c6e4fa4b92" + }, + { + "_type": "span", + "marks": [ + "d729831b3c3d" + ], + "text": "send us an email directly", + "_key": "a327f42f4d58" + }, + { + "_key": "0f90a0416b6a", + "_type": "span", + "marks": [], + "text": "." + } + ], + "_type": "block" + }, + { + "style": "h2", + "_key": "8d2fef8b52c1", + "markDefs": [], + "children": [ + { + "text": "Why the change?", + "_key": "b99fcb056cb80", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "2ab0657a6c5d", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "User experience is at the heart of everything we do. 
Over time, we've received valuable feedback from our users about the forms on our platform. In particular, we gathered feedback on some of the most frequently used forms: Pipeline Launch, Relaunch and Resume forms. In response, we have made significant enhancements to create a more intuitive, efficient, and user-friendly experience.", + "_key": "e1d6b1a2dfc3" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "h2", + "_key": "839b514875ce", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Key Objectives", + "_key": "eff267db0ffe0", + "_type": "span" + } + ] + }, + { + "_key": "663f9546066e", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The redesign of the Pipeline Launch and Relaunch forms was guided by four objectives:", + "_key": "7ff0898d0be60" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "615515ed9874", + "listItem": "number", + "markDefs": [], + "children": [ + { + "text": "Simpler navigation: ", + "_key": "1d26e055f5610", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_key": "5e9912500f67", + "_type": "span", + "marks": [], + "text": "The new multi-step approach ensures that users can easily navigate through pipeline launch form submissions without unnecessary steps. Key information is stored and grouped logically, allowing users to focus on the essential steps.\n" + } + ], + "level": 1, + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "text": "Enhanced validation:", + "_key": "29322453600b0", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_key": "38f8ea4cc069", + "_type": "span", + "marks": [], + "text": " We've added robust validation features to ensure the accuracy and completeness of submitted information, reducing errors and helping users avoid common pitfalls during pipeline configuration.\n" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "0a3c9275d682", + "listItem": "number" + }, + { + "listItem": "number", + "markDefs": [], + "children": [ + { + "marks": [ + "strong" + ], + "text": "Improved clarity:", + "_key": "9a36879cad1a0", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " Form content has been updated to be more concise and clear, ensuring users can quickly understand the requirements and options available to them, thus reducing confusion and improving overall efficiency.\n", + "_key": "b59310c7a08f" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "41c43b0e4337" + }, + { + "markDefs": [], + "children": [ + { + "text": "Enhanced key components:", + "_key": "cc7b374d1f3c0", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_key": "0b6abf765bcd", + "_type": "span", + "marks": [], + "text": " Key form components have been redesigned to offer a more intuitive user experience. This includes more dynamic control of the configured parameters, the ability to switch between a UI schema view, and interactive JSON and YAML rendering for full control every time a user launches, relaunches or resumes a pipeline." 
+ } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "62a1887cef00", + "listItem": "number" + }, + { + "_type": "block", + "style": "h2", + "_key": "fa6adc370047", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Enhancements", + "_key": "312d4a3b6aaf0" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "The redesigned Pipeline Launch and Relaunch forms come with a host of new features designed to improve usability and functionality:", + "_key": "61e7239ff41b0", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "73b1e280cd30" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "38620088a7ad", + "listItem": "number", + "markDefs": [], + "children": [ + { + "marks": [ + "strong" + ], + "text": "Multi-step approach:", + "_key": "cdb7790f12f30", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " Users can now navigate through forms with a streamlined, multi-step approach. If everything is set up correctly, there's no need to go through all steps –simply run what you know works.", + "_key": "1890588ea99a" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "b97a50ad5378", + "listItem": "number", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Enhanced assistance:", + "_key": "527b2be40ea70" + }, + { + "_key": "e27e6ee2d88f", + "_type": "span", + "marks": [], + "text": " We've improved feedback mechanisms to provide detailed information about errors or missing parameters helping users to quickly identify and rectify issues before launching pipelines." + } + ], + "level": 1 + }, + { + "_type": "block", + "style": "normal", + "_key": "5e86cf51f887", + "listItem": "number", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Developer-friendly:", + "_key": "b73d4356a53b0" + }, + { + "text": " Developers can switch between UI schema views and a more comprehensive parameter view using JSON and YAML interactive rendering. This flexibility allows for dynamic control of form validity and ensures that developers have the tools they need to configure their pipelines effectively.", + "_key": "b02329811097", + "_type": "span", + "marks": [] + } + ], + "level": 1 + }, + { + "_key": "407755155091", + "listItem": "number", + "markDefs": [], + "children": [ + { + "text": "Enhanced rendering:", + "_key": "ea635d07dac20", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "text": " The form now dynamically generates the UI interface for parameter input whenever a compatible schema is defined. This improvement addresses previous limitations where the UI interface was only rendered when launching a saved pipeline, as opposed to relaunching or using Quick Launch. 
With this update, the UI is rendered consistently across all launching scenarios, providing a more convenient and streamlined experience.", + "_key": "34bd6d508e99", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Improved flow and status information:", + "_key": "afcef5fe2e9a0" + }, + { + "_type": "span", + "marks": [], + "text": " The new design offers a smoother flow and more informative status updates, providing a clear view of the submission process at every stage.", + "_key": "45d7a470b0fb" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "fdc5babed53d", + "listItem": "number", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "909bfadb3e3e", + "listItem": "number", + "markDefs": [], + "children": [ + { + "marks": [ + "strong" + ], + "text": "Summary step:", + "_key": "3e527527f45a", + "_type": "span" + }, + { + "text": " A new summary view allows users to review all information at a glance before launching their pipeline.\n", + "_key": "88d8bdfc44f4", + "_type": "span", + "marks": [] + } + ], + "level": 1 + }, + { + "_key": "73fc1879d9a4", + "markDefs": [], + "children": [ + { + "_key": "1a460c4d3b490", + "_type": "span", + "marks": [], + "text": "Summary" + } + ], + "_type": "block", + "style": "h2" + }, + { + "_key": "2f8698228d89", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "By focusing on these key objectives and enhancements, we aimed to improve one of the most commonly performed actions in the Seqera Platform. The redesigned form makes it easier for new and experienced users to run pipelines in the Seqera Platform. This effort is just the beginning of our goal of enhancing the form submission experience across the platform. Moreover, this initial refactor enables us to continue improving and expanding the user experience in the future. 
You can expect more enhancements as we roll out additional features and improvements based on our community feedback.", + "_key": "f7514251e56d0", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "0fcc413706ad", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "77091802bb0d" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "dc820fe7dbe3", + "markDefs": [ + { + "_key": "0e8e643720c4", + "_type": "link", + "href": "https://docs.seqera.io/platform/24.1/launch/launchpad#launch-form" + } + ], + "children": [ + { + "marks": [], + "text": "Read the ", + "_key": "9654eadd2b3b0", + "_type": "span" + }, + { + "_key": "bdb86ee0c748", + "_type": "span", + "marks": [ + "0e8e643720c4" + ], + "text": "official documentation" + }, + { + "text": " to find out more.", + "_key": "881341a728c4", + "_type": "span", + "marks": [] + } + ] + } + ], + "author": { + "_ref": "mattia-bosio", + "_type": "reference" + }, + "_updatedAt": "2024-08-16T15:23:20Z", + "_createdAt": "2024-08-15T08:44:07Z", + "_id": "5cf61b02-f036-49f9-850c-72e0bf3d4f35", + "tags": [ + { + "_ref": "82fd60f1-c6d0-4b8a-9c5d-f971c622f341", + "_type": "reference", + "_key": "c2000532faf0" + } + ] + }, + { + "_updatedAt": "2024-09-03T07:58:25Z", + "author": { + "_type": "reference", + "_ref": "evan-floden" + }, + "_id": "5df71356-10dc-422f-bae8-e26491a560dc", + "_createdAt": "2024-08-06T13:37:21Z", + "body": [ + { + "_type": "image", + "_key": "39e47c87a1bd", + "asset": { + "_type": "reference", + "_ref": "image-3d25c202215864675258a5c2c5084d2f656aae73-1200x629-png" + } + }, + { + "_key": "f5d38bed2100", + "markDefs": [ + { + "href": "https://www.tinybio.cloud/", + "_key": "0eadc8ec1fed", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "We are thrilled to announce that Seqera is joining forces with ", + "_key": "e2be5dfe2b940" + }, + { + "text": "tinybio", + "_key": "e2be5dfe2b941", + "_type": "span", + "marks": [ + "0eadc8ec1fed" + ] + }, + { + "_key": "e2be5dfe2b942", + "_type": "span", + "marks": [], + "text": ", a NYC-based tech-bio start-up known for its AI-integrated scientific tools focused on executing pipelines and analyses via natural language. We are happy to welcome the tinybio team and community into the Seqera family." + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "h2", + "_key": "28fd4bc59876", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Empowering All Scientists with Advanced Data Tools", + "_key": "e583cc66c5fc0" + } + ] + }, + { + "_key": "9d1bdb6acb67", + "markDefs": [ + { + "_type": "link", + "href": "https://www.nature.com/articles/s41467-024-49777-x", + "_key": "b2a101e1faea" + }, + { + "_type": "link", + "href": "https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-021-01304-y", + "_key": "8e1ad6d30663" + }, + { + "_type": "link", + "href": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8719813/", + "_key": "307be9175f5a" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Scientists spend a ", + "_key": "6a6167e26f940" + }, + { + "marks": [ + "b2a101e1faea" + ], + "text": "significant proportion of their time", + "_key": "6a6167e26f941", + "_type": "span" + }, + { + "_key": "6a6167e26f942", + "_type": "span", + "marks": [], + "text": " transforming and structuring data for analysis. 
In fact, a " + }, + { + "_key": "6a6167e26f943", + "_type": "span", + "marks": [ + "8e1ad6d30663" + ], + "text": "lessons learned piece on the COVID-19 pandemic " + }, + { + "_type": "span", + "marks": [], + "text": "underscored how ", + "_key": "6a6167e26f944" + }, + { + "marks": [ + "strong" + ], + "text": "issues in data analysis ", + "_key": "6a6167e26f945", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": "and study design can ", + "_key": "6a6167e26f946" + }, + { + "_key": "6a6167e26f947", + "_type": "span", + "marks": [ + "strong" + ], + "text": "significantly impact scientific breakthroughs" + }, + { + "_key": "6a6167e26f948", + "_type": "span", + "marks": [], + "text": ". " + }, + { + "_key": "6a6167e26f949", + "_type": "span", + "marks": [ + "307be9175f5a" + ], + "text": "As biological data continues to grow exponentially" + }, + { + "text": ", there is an urgent need to manage large-scale data more rapidly for accelerated scientific breakthroughs. To achieve this, we are partnering with tinybio to ", + "_key": "6a6167e26f9410", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "strong" + ], + "text": "harness the power of GenAI,", + "_key": "6a6167e26f9411", + "_type": "span" + }, + { + "_key": "6a6167e26f9412", + "_type": "span", + "marks": [], + "text": " lowering the barrier for scientists to fully leverage advanced computational tools to achieve their research goals." + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "27ddeccac626", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "45b78a00c7e8", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "tinybio: Specialized ChatGPT for Researchers", + "_key": "a3b9dc23dc9a0" + } + ], + "_type": "block", + "style": "h2", + "_key": "3ab4e5cf5dfe" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://www.tinybio.cloud/", + "_key": "908a8535997e" + }, + { + "_key": "4cc3db069b96", + "_type": "link", + "href": "https://chatgpt.com/" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Motivated by challenges faced as researchers experimenting with different bioinformatics packages, Sasha and Vishal founded ", + "_key": "405ee96282980" + }, + { + "_type": "span", + "marks": [ + "908a8535997e" + ], + "text": "tinybio", + "_key": "405ee96282981" + }, + { + "marks": [], + "text": " in 2022, convinced there had to be a better, easier way to get started with bioinformatics. The initial goal of tinybio was to remove the barrier to entry for running bioinformatics packages, a mission that gained significant momentum with the announcement of ", + "_key": "405ee96282982", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "4cc3db069b96" + ], + "text": "ChatGPT", + "_key": "405ee96282983" + }, + { + "text": " in November 2022. The tinybio co-founders recognized the potential of ", + "_key": "405ee96282984", + "_type": "span", + "marks": [] + }, + { + "_key": "0ba59e2f9664", + "_type": "span", + "marks": [ + "strong" + ], + "text": "leveraging GenAI " + }, + { + "_key": "984fcc760079", + "_type": "span", + "marks": [], + "text": "for empowering all scientists to effectively utilize bioinformatics tools, regardless of their experience or research background. Ever since, tinybio have focused on applying GenAI to drive bioinformatics innovation." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "58b28d00c516" + }, + { + "style": "normal", + "_key": "e289bac08afd", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "9722baf8eb0a", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_key": "c3ff7a2beda0", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "\"After seeing the amazing traction around our chat-based pipeline execution and analysis tool, Vishal and I knew that we needed to partner with the leader in bioinformatics pipelines to enable our vision for ", + "_key": "bc450efddd08", + "_type": "span" + }, + { + "marks": [ + "strong" + ], + "text": "more open science", + "_key": "d051bfadfe53", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " and to ", + "_key": "f42d1845bb72" + }, + { + "_key": "6b8aae3beb95", + "_type": "span", + "marks": [ + "strong" + ], + "text": "onboard millions more to computational biology." + }, + { + "marks": [], + "text": " We are truly excited to be joining the Seqera team and contributing to advancing science for everyone through their Nextflow, Wave, MultiQC, and Fusion products.\" - ", + "_key": "2f742e542afe", + "_type": "span" + }, + { + "text": "Sasha Dagayev, Co-founder at tinybio", + "_key": "bba720e98780", + "_type": "span", + "marks": [ + "strong" + ] + } + ], + "_type": "block", + "style": "blockquote" + }, + { + "children": [ + { + "_key": "8a74a60c4d9c", + "_type": "span", + "marks": [], + "text": "\ntinybio’s authentic and pragmatic approach to " + }, + { + "_key": "f21eee87a50a", + "_type": "span", + "marks": [ + "strong" + ], + "text": "leveraging LLMs for bioinformatics" + }, + { + "text": " is essential in bridging the gap between scientists and advanced computational capabilities to accelerate scientific discovery. By incorporating this technology, we aim to significantly enhance our existing ", + "_key": "1b90bec51a6f", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "95fd0bd0acf6" + ], + "text": "pipelines", + "_key": "d079d983b04a" + }, + { + "text": ",", + "_key": "b1b5f1bfc9ae", + "_type": "span", + "marks": [] + }, + { + "_key": "c42631b1fa90", + "_type": "span", + "marks": [ + "d259679e3355" + ], + "text": " containers" + }, + { + "_type": "span", + "marks": [], + "text": " and web resources, making high-quality, reproducible bioinformatics tools more accessible to researchers worldwide. Our goal is to ", + "_key": "a1530ae5fa4d" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "empower the global scientific community", + "_key": "f5a66b9f8ce7" + }, + { + "_key": "4f48f33b58fc", + "_type": "span", + "marks": [], + "text": " with the resources they need to drive innovation and advance our understanding of complex biological systems." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "20b81fc022bb", + "markDefs": [ + { + "_type": "link", + "href": "https://seqera.io/pipelines/", + "_key": "95fd0bd0acf6" + }, + { + "_type": "link", + "href": "https://seqera.io/containers/", + "_key": "d259679e3355" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "781c75a3bb98", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "de50ded18161", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "text": "A New Era for AI-enabled Bioinformatics", + "_key": "e0633d49502f0", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2", + "_key": "a35537f824ee" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7998306/", + "_key": "81a2231f5c08" + } + ], + "children": [ + { + "marks": [], + "text": "The biotech and bioinformatics landscape is rapidly evolving, driven in-part by technological advances in AI. The ability to analyze massive datasets, identify patterns, and generate predictive models is revolutionizing scientific research. We also believe that AI is a powerful tool to democratize and amplify access to the most sophisticated bioinformatics tools out there. By leveraging ", + "_key": "03d0891e1aa70", + "_type": "span" + }, + { + "marks": [ + "81a2231f5c08" + ], + "text": "human-centric AI", + "_key": "03d0891e1aa71", + "_type": "span" + }, + { + "_key": "03d0891e1aa72", + "_type": "span", + "marks": [], + "text": ", we can " + }, + { + "text": "enable the 10x scientist", + "_key": "0a03364b0e08", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_type": "span", + "marks": [], + "text": " to translate complex biological data into actionable insights, thereby expediting scientific discovery and innovation.", + "_key": "49ef31effc8b" + } + ], + "_type": "block", + "style": "normal", + "_key": "cf91c1dabfd0" + }, + { + "markDefs": [], + "children": [ + { + "_key": "8560ad3cef8e0", + "_type": "span", + "marks": [], + "text": "\n" + }, + { + "marks": [], + "text": "Our partnership with tinybio represents a significant milestone in our journey to ", + "_key": "903ece1a151f", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "advance science for everyone through software", + "_key": "3cc381edad69" + }, + { + "text": ". This collaboration will lower the barrier of entry for a broader range of researchers to utilize bioinformatics tools effectively, facilitating groundbreaking innovations and transforming the future of genomics.", + "_key": "9d2e53834331", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "dd017c0c5d56" + }, + { + "children": [ + { + "text": "", + "_key": "32ab8ed1feb7", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "3a554d01a336", + "markDefs": [] + }, + { + "_key": "fd72009a708b", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "\"New interaction models with powerful computational platforms are transforming not just how scientists work but also what they discover. 
By empowering scientists with modern software engineering practices, we are ", + "_key": "ce5a8009c2f90" + }, + { + "_key": "797c10fef3eb", + "_type": "span", + "marks": [ + "strong" + ], + "text": "enabling the next generation of innovations" + }, + { + "text": " in personalized therapeutics, sustainable materials, better drug delivery methodologies, and green chemical and agricultural production. This acquisition marks a significant step towards accelerating scientific discoveries and enabling researchers with better software.\" -", + "_key": "f3cfe9eb53f7", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": " Evan Floden, CEO at Seqera", + "_key": "7c3b3468dfe3" + } + ], + "_type": "block", + "style": "blockquote" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "1bcc2f6d724a" + } + ], + "_type": "block", + "style": "normal", + "_key": "849a938fb79b" + }, + { + "_key": "57dd7a75086e", + "markDefs": [], + "children": [ + { + "_key": "2b06457c0ed50", + "_type": "span", + "marks": [], + "text": "Enhancing our Open Science Core" + } + ], + "_type": "block", + "style": "h2" + }, + { + "children": [ + { + "marks": [], + "text": "Our mission at Seqera is to ", + "_key": "28b149c00eee0", + "_type": "span" + }, + { + "_key": "57dc51e60f83", + "_type": "span", + "marks": [ + "strong" + ], + "text": "make science accessible to everyone through software" + }, + { + "text": ". As research becomes increasingly digitized, there is a critical need to access all available scientific research to make informed R&D decisions and ultimately accelerate the impact on patients. Central to achieving this is Open Science, which ensures reproducibility, validation and transparency across the scientific community. With AI, we want to further enhance our Open Science core, by lowering the barrier of adoption of bioinformatics tools for ", + "_key": "f068fe8ef9f4", + "_type": "span", + "marks": [] + }, + { + "text": "millions more researchers worldwide,", + "_key": "28b149c00eee1", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_type": "span", + "marks": [], + "text": " driving more rapid advancements in science and medicine.", + "_key": "28b149c00eee2" + } + ], + "_type": "block", + "style": "normal", + "_key": "6ace1e068755", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_key": "a54ccaf62d6c0", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "83a6b2dc9d92" + }, + { + "style": "h2", + "_key": "5b4fbb778158", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "What’s Next for Seqera and tinybio?", + "_key": "6db8f88cd0a20" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "be45046290e5", + "markDefs": [], + "children": [ + { + "_key": "d1f478e8ca4e0", + "_type": "span", + "marks": [], + "text": "Seqera is excited to collaborate closely with tinybio’s founders Sasha Dagayev and Vishal Patel to further its mission of advancing science for everyone through software. Their expertise will be instrumental in driving the development of community-centric tools on Seqera.io," + }, + { + "text": " empowering scientists worldwide", + "_key": "7df2ef0b8ff2", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_key": "30172707b906", + "_type": "span", + "marks": [], + "text": " to leverage modern software capabilities on demand." 
+ } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "249e21a060a8", + "markDefs": [], + "children": [ + { + "_key": "a38bd6a01e2e0", + "_type": "span", + "marks": [], + "text": "We will first focus on leveraging AI to solve the cold start problem for the next generation of scientists and " + }, + { + "text": "removing barriers to entry to bioinformatics", + "_key": "6970533750e5", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_type": "span", + "marks": [], + "text": ". Existing powerful frameworks and resources, such as Nextflow, nf-core, Seqera Pipelines and Containers, have been significantly enhancing the research productivity of bioinformaticians, but come with a steep learning curve that prevents newcomers from getting started fast.", + "_key": "7f4fec95cdd5" + } + ] + }, + { + "style": "normal", + "_key": "d42b3b57dfac", + "markDefs": [], + "children": [ + { + "_key": "8ee4c6456e1a0", + "_type": "span", + "marks": [], + "text": "We want to free the next generation of scientists from wasting time in the nitty gritty of setting up various bioinformatics packages and infrastructure. We believe future scientists should be able to focus on understanding the “what” and “why” of their analysis, while the “how” is generated for them in an understandable and verifiable way. Our tools and resources provide already powerful building blocks to enable this, and we cannot wait to bring these new updates to users in the coming months. Stay tuned!" + } + ], + "_type": "block" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "7f7a558dfa34", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "6d3fbe9d99a0", + "markDefs": [] + }, + { + "markDefs": [ + { + "href": "https://hubs.la/Q02NlSWM0", + "_key": "eb86d506d8aa", + "_type": "link" + }, + { + "_key": "5f0b731c8d4f", + "_type": "link", + "href": "https://hubs.la/Q02NlVXy0" + } + ], + "children": [ + { + "_key": "fd5a307d6e25", + "_type": "span", + "marks": [], + "text": "Interested in finding out more? Watch the Nextflow Channels podcast on " + }, + { + "marks": [ + "eb86d506d8aa" + ], + "text": "GenAI for bioinformatics", + "_key": "f1a61d9d9482", + "_type": "span" + }, + { + "marks": [], + "text": " or ", + "_key": "8a4c1b94d2cb", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "5f0b731c8d4f" + ], + "text": "subscribe to our newsletter", + "_key": "d505924c7484" + }, + { + "_type": "span", + "marks": [], + "text": " to stay tuned!", + "_key": "09ba811e5ef1" + } + ], + "_type": "block", + "style": "blockquote", + "_key": "902d0053f19e" + }, + { + "_type": "block", + "style": "normal", + "_key": "83248382cacc", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "d1a81dc47aac" + } + ] + }, + { + "style": "h2", + "_key": "cfcca3842875", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "About tinybio", + "_key": "113745debebc0" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "tinybio is a New York City based startup focused on the application of generally available generative AI technologies to help bioinformaticians and researchers. It was started by Sasha Dagayev and Vishal Patel in 2022. 
To date, the company has helped thousands of researchers to resolve hundreds of thousands of bioinformatics issues.", + "_key": "eedbe6e83ab60" + } + ], + "_type": "block", + "style": "normal", + "_key": "0119756f1ecc", + "markDefs": [] + } + ], + "tags": [ + { + "_ref": "d356a4d5-06c1-40c2-b655-4cb21cf74df1", + "_type": "reference", + "_key": "3395edbcdd9d" + }, + { + "_ref": "1b55a117-18fe-40cf-8873-6efd157a6058", + "_type": "reference", + "_key": "c9112db92839" + } + ], + "meta": { + "slug": { + "_type": "slug", + "current": "tinybio-joins-seqera-to-advance-science-for-everyone-now-through-genai" + }, + "_type": "meta", + "shareImage": { + "asset": { + "_type": "reference", + "_ref": "image-3d25c202215864675258a5c2c5084d2f656aae73-1200x629-png" + }, + "_type": "image" + }, + "description": "We are thrilled to announce that Seqera is joining forces with tinybio, a NYC-based tech-bio start-up known for its AI-integrated scientific tools focused on executing pipelines and analyses via natural language. \n", + "noIndex": false + }, + "_rev": "Z979U64FXLC2cFZCkgkV9v", + "title": "Seqera acquires tinybio to Advance Science for Everyone - Now Through GenAI!", + "_type": "blogPost", + "publishedAt": "2024-08-06T13:43:00.000Z" + }, + { + "_type": "blogPost", + "_id": "8e1a9fb2-814c-455b-890c-5f3f07e83da4", + "tags": [ + { + "_key": "e3b6edcaea48", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference" + } + ], + "_updatedAt": "2024-07-24T10:52:39Z", + "publishedAt": "2024-07-17T12:39:00.000Z", + "_createdAt": "2024-07-15T13:31:19Z", + "body": [ + { + "_type": "block", + "style": "h2", + "_key": "e4775c3fdbeb", + "markDefs": [], + "children": [ + { + "text": "Understanding the Nextflow User Community", + "_key": "8e229381fa710", + "_type": "span", + "marks": [] + } + ] + }, + { + "_key": "1a4d6ab2b758", + "markDefs": [ + { + "_key": "3025a534b404", + "_type": "link", + "href": "https://nextflow.io/" + }, + { + "href": "https://hubs.la/Q02HMCZ70", + "_key": "c521b2714e5e", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In April, we conducted our annual ", + "_key": "19c26daec2820" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "State of the Workflow Community survey", + "_key": "eb8144492759" + }, + { + "_type": "span", + "marks": [], + "text": " to gather insights and feedback from the ", + "_key": "cc6ecf47c330" + }, + { + "_key": "19c26daec2823", + "_type": "span", + "marks": [ + "3025a534b404" + ], + "text": "Nextflow" + }, + { + "marks": [], + "text": " user community, and we are excited to share that this year, ", + "_key": "19c26daec2824", + "_type": "span" + }, + { + "_key": "19c26daec2825", + "_type": "span", + "marks": [ + "strong" + ], + "text": "600+ Nextflow users" + }, + { + "marks": [], + "text": " participated - a 21% increase from 2023! By sharing these insights, we aim to empower researchers, developers, and organizations to leverage Nextflow effectively, fostering innovation and collaboration amongst the community. 
Here we share some key findings from the Nextflow user community.\n\n", + "_key": "19c26daec2826", + "_type": "span" + }, + { + "text": "DOWNLOAD THE FULL SURVEY", + "_key": "a71cfed51202", + "_type": "span", + "marks": [ + "c521b2714e5e", + "strong", + "underline" + ] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "eefa4afc5452", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "bc60c34cfbea" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Key Findings At a Glance", + "_key": "1789872287f70" + } + ], + "_type": "block", + "style": "h2", + "_key": "a163a8498996" + }, + { + "children": [ + { + "text": "", + "_key": "1b07495615020", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "3773f2653df9", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "✔ Shift from HPC to public cloud ", + "_key": "159cafaf5df10" + }, + { + "marks": [], + "text": "- Majority of biotech and industrial sectors now favor public clouds for running Nextflow, with 78% indicating plans to migrate in the next two years.\n", + "_key": "5612ee1622a6", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "ab334e5b3812", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "✔ Multi-cloud deployments are on the rise", + "_key": "020a0bd3ea7a0" + }, + { + "_type": "span", + "marks": [], + "text": " - To meet growing computational and data availability needs, 14% of Nextflow users manage workloads across two clouds.", + "_key": "020a0bd3ea7a1" + } + ], + "_type": "block", + "style": "normal", + "_key": "65e306b63bb8", + "markDefs": [] + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "2fc2e91abdef0", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "9660db208e5b", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "✔ Open Science is key for streamlining research", + "_key": "0c05be0b85150" + }, + { + "_type": "span", + "marks": [], + "text": " - 82% of Nextflow users view Open Science as fundamental to research, advancing science for everyone.", + "_key": "0c05be0b85151" + } + ], + "_type": "block", + "style": "normal", + "_key": "529f6f989016", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "03d694b833bd", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "6a2a3e4ecd55", + "_type": "span", + "marks": [] + } + ] + }, + { + "_type": "image", + "_key": "5fd07cc3c181", + "asset": { + "_ref": "image-3fff8e82dc7ab66c289f1c32186e563997af4e7f-1200x836-png", + "_type": "reference" + } + }, + { + "_type": "block", + "style": "normal", + "_key": "3012982ecad0", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "4a6b7ff26854" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "Bioinformatics Analysis is Moving to Public Clouds", + "_key": "94dd65e5e77d0", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "c100760180b5", + "markDefs": [] + }, + { + "style": "normal", + "_key": "47d223e5d87e", + "markDefs": [], + "children": [ + { + "text": "In recent years, we have witnessed a notable shift in bioinformatics analysis towards public cloud platforms, driven largely by for-profit organizations 
seeking enhanced reliability, scalability and flexibility in their computational workflows. Our survey found that while on-premises clusters remain the most common for users in general, the prevalence of traditional HPC environments is on a steady decline. Specifically, in the biotech industry, nearly ", + "_key": "f848cb5dd3cc0", + "_type": "span", + "marks": [] + }, + { + "text": "three-quarters of firms now favor public clouds, ", + "_key": "f848cb5dd3cc1", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_key": "f848cb5dd3cc2", + "_type": "span", + "marks": [], + "text": "reflecting a broader industry trend toward adaptable and robust computing solutions. " + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "474b62cda575", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "e531fa4ad4dc" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "", + "_key": "6a32f4c62759", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "d571f398163a", + "markDefs": [] + }, + { + "_key": "77bfd81b8e7f", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Multi-Cloud Deployments are Rising", + "_key": "cbc09602d1f20" + } + ], + "_type": "block", + "style": "h3" + }, + { + "markDefs": [], + "children": [ + { + "_key": "b8fc76e313c00", + "_type": "span", + "marks": [], + "text": "As the industry continues to scale its workflows, organizations are increasingly adopting multi-cloud strategies to meet the demands of diverse computational workloads. In 2021, just " + }, + { + "text": "10% of cloud batch service users", + "_key": "b8fc76e313c01", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_type": "span", + "marks": [], + "text": " were running workloads in ", + "_key": "b8fc76e313c02" + }, + { + "marks": [ + "strong" + ], + "text": "two separate clouds", + "_key": "b8fc76e313c03", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ". By 2024, this figure had ", + "_key": "b8fc76e313c04" + }, + { + "text": "risen to 14%", + "_key": "b8fc76e313c05", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_type": "span", + "marks": [], + "text": ". Additionally, 3% of users utilized three different cloud batch services in 2021, which increased to 4% by 2024. This trend highlights the ", + "_key": "b8fc76e313c06" + }, + { + "marks": [ + "strong" + ], + "text": "move towards deploying across multiple cloud providers", + "_key": "b8fc76e313c07", + "_type": "span" + }, + { + "_key": "b8fc76e313c08", + "_type": "span", + "marks": [], + "text": " to address bioinformatics' growing computational and data availability needs across various regions and technical complexities."
+ } + ], + "_type": "block", + "style": "normal", + "_key": "d8db77c8db24" + }, + { + "style": "normal", + "_key": "7428ac4b77d7", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "fa0bd7669202", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "e835fa9dd922" + } + ], + "_type": "block", + "style": "normal", + "_key": "303f6c725740" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Open Science: Advancing Science for Everyone", + "_key": "cf20f469a9910" + } + ], + "_type": "block", + "style": "h3", + "_key": "dff1615fb612", + "markDefs": [] + }, + { + "children": [ + { + "marks": [], + "text": "Open Science has emerged as a transformative approach within the bioinformatics community, significantly enhancing collaboration, efficiency, and cost-effectiveness. Around ", + "_key": "957a2c3bdc7a0", + "_type": "span" + }, + { + "marks": [ + "strong" + ], + "text": "82% of survey respondents", + "_key": "957a2c3bdc7a1", + "_type": "span" + }, + { + "marks": [], + "text": " emphasized the", + "_key": "957a2c3bdc7a2", + "_type": "span" + }, + { + "marks": [ + "strong" + ], + "text": " fundamental role of Open Science", + "_key": "957a2c3bdc7a3", + "_type": "span" + }, + { + "_key": "957a2c3bdc7a4", + "_type": "span", + "marks": [], + "text": " in their research practices, reflecting strong community endorsement. Additionally, two-thirds reported " + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "significant time-savings", + "_key": "957a2c3bdc7a5" + }, + { + "_key": "957a2c3bdc7a6", + "_type": "span", + "marks": [], + "text": " through Open Science and 42% acknowledged the " + }, + { + "marks": [ + "strong" + ], + "text": "financial benefits, ", + "_key": "957a2c3bdc7a7", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": "highlighting the value of transparency in research. This shift fosters effective knowledge sharing and collaborative advancement, accelerating research outcomes while reinforcing accountability and scientific integrity.", + "_key": "957a2c3bdc7a8" + } + ], + "_type": "block", + "style": "normal", + "_key": "0e370eea37e9", + "markDefs": [] + }, + { + "_key": "a09d247e84e8", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "e803e4047b81" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Read the Full Report Now", + "_key": "7f3ac0c64cdf" + } + ], + "_type": "block", + "style": "h2", + "_key": "2850ca907c4c", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_key": "b357bf5090ac0", + "_type": "span", + "marks": [], + "text": "Our 2024 State of the Workflow Community Survey provides insights into the evolving landscape of bioinformatics and scientific computing. The shift towards public and multi-cloud platforms, combined with the transformative impact of Open Science, is reshaping the Nextflow ecosystem and revolutionizing computational workflows. Embracing these trends not only drives innovation but also ensures that scientific inquiry remains robust, accountable, and accessible to all, paving the way for continued progress in bioinformatics and beyond." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "1b90b2084f12" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://nf-co.re/", + "_key": "517a41101c8d" + }, + { + "_type": "link", + "href": "https://hubs.la/Q02HMCZ70", + "_key": "688962534403" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Dive into the ", + "_key": "01a3a713b8430" + }, + { + "marks": [ + "688962534403" + ], + "text": "full report", + "_key": "01a3a713b8431", + "_type": "span" + }, + { + "marks": [], + "text": " to uncover further insights on how bioinformaticians are running pipelines, the pivotal role of the ", + "_key": "01a3a713b8432", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "517a41101c8d" + ], + "text": "nf-core community", + "_key": "01a3a713b8433" + }, + { + "text": ", and other key trends —your glimpse into the future of computational workflows awaits!\n", + "_key": "01a3a713b8434", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "c690c04d6473" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "722484bb3cf0" + } + ], + "_type": "block", + "style": "normal", + "_key": "d7acd64713cf" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://hubs.la/Q02HMCZ70", + "_key": "042ff6244a59" + } + ], + "children": [ + { + "text": "DOWNLOAD THE FULL SURVEY NOW", + "_key": "e3fcb08b34ad0", + "_type": "span", + "marks": [ + "042ff6244a59", + "strong", + "underline" + ] + } + ], + "_type": "block", + "style": "normal", + "_key": "729b6f099a85" + }, + { + "style": "normal", + "_key": "7d81e062e8a9", + "markDefs": [], + "children": [ + { + "text": "\n", + "_key": "d1cf1e278328", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "strong", + "underline" + ], + "text": "\n\n", + "_key": "cbd3cd3cbad9" + } + ], + "_type": "block" + } + ], + "title": "The State of the Workflow 2024: Community Survey Results", + "meta": { + "_type": "meta", + "description": "State of the Workflow 2024 Community Survey Results: Insights from 600+ Nextflow users about the state of workflow management for scientific data analysis", + "noIndex": false, + "slug": { + "current": "the-state-of-the-workflow-2024-community-survey-results", + "_type": "slug" + } + }, + "author": { + "_ref": "evan-floden", + "_type": "reference" + }, + "_rev": "c8Y6ejr6xtast8r4qB9SlG" + }, + { + "author": { + "_ref": "paolo-di-tommaso", + "_type": "reference" + }, + "_rev": "n1tMSWxwIdUSjJ5EuKAZgf", + "body": [ + { + "_type": "block", + "style": "h2", + "_key": "5c3efce1d43a", + "markDefs": [], + "children": [ + { + "text": "Streamlining containers lifecycle", + "_key": "9f757a56788c0", + "_type": "span", + "marks": [ + "strong" + ] + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "In the bioinformatics landscape, containerized workflows have become crucial for ensuring reproducibility in data analysis. By encapsulating applications and their dependencies into", + "_key": "bbd08134fa220", + "_type": "span", + "marks": [] + }, + { + "text": " portable, self-contained packages", + "_key": "bbd08134fa221", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "text": ", containers enable seamless distribution across diverse computing environments. 
However, this innovation comes with its own set of challenges such as maintaining and validating collections of images, operating private registries and limited tool access.", + "_key": "bbd08134fa222", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "133925eb69a3" + }, + { + "_type": "block", + "style": "normal", + "_key": "5406af305818", + "markDefs": [ + { + "_key": "3af825e880ca", + "_type": "link", + "href": "https://seqera.io/wave/" + }, + { + "href": "https://seqera.io/containers/", + "_key": "df13db0993d9", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "Seqera’s ", + "_key": "ea38e78f202c0", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "3af825e880ca" + ], + "text": "Wave", + "_key": "ea38e78f202c1" + }, + { + "_key": "ea38e78f202c2", + "_type": "span", + "marks": [], + "text": " tackles these challenges by offering a suite of features designed to simplify the configuration, provisioning and management of software containers for data pipelines at scale. In this blog, we will explore common pitfalls of managing containerized workflows, examine how Wave overcomes these obstacles, and discover how " + }, + { + "_type": "span", + "marks": [ + "df13db0993d9" + ], + "text": "Seqera Containers", + "_key": "ea38e78f202c3" + }, + { + "text": " further enhances the Wave user experience.", + "_key": "ea38e78f202c4", + "_type": "span", + "marks": [] + } + ] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "d494224541cb", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "ce9241113361" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://hubs.la/Q02P4r9W0", + "_key": "34a3f2d2cbdc" + } + ], + "children": [ + { + "_key": "3d0c7b96ea780", + "_type": "span", + "marks": [ + "34a3f2d2cbdc" + ], + "text": "Read the Whitepaper Now!" + } + ], + "_type": "block", + "style": "blockquote", + "_key": "e2920de8bcf3" + }, + { + "children": [ + { + "_key": "9b968ce154aa", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "95228c8b8d8b", + "markDefs": [] + }, + { + "style": "h2", + "_key": "24eb80288494", + "markDefs": [], + "children": [ + { + "text": "Handling containerized workflows at scale is not easy", + "_key": "1c7b66a371820", + "_type": "span", + "marks": [ + "strong" + ] + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "2057a7ea39aa", + "markDefs": [ + { + "_type": "link", + "href": "https://biocontainers.pro/", + "_key": "d3824d3cdd3a" + } + ], + "children": [ + { + "text": "Software containers have been heavily adopted as a solution to streamline both the configuration and deployment of dependencies in complex data pipelines. However, maintaining containers at scale is not without its difficulties. Building, storing and distributing container images is an error-prone and tedious task that increases the cognitive load on software engineers, ultimately diminishing their productivity. Community-maintained container collections, such as ", + "_key": "d660fe7595e30", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "d3824d3cdd3a" + ], + "text": "BioContainers", + "_key": "d660fe7595e31", + "_type": "span" + }, + { + "_key": "d660fe7595e32", + "_type": "span", + "marks": [], + "text": ", have emerged to mitigate some of these challenges. 
However, several problems still remain:" + } + ] + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [ + "strong" + ], + "text": "Publicly Accessible Container Images", + "_key": "a01a55bd260e0", + "_type": "span" + }, + { + "marks": [], + "text": ": Issues with stability can compromise reliability. Typically unsuitable for non-academic organizations due to security and compliance concerns.\n\n", + "_key": "a01a55bd260e1", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "3099a4010964" + }, + { + "_key": "592901bbf056", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Limited Tool Access: ", + "_key": "2b4dea072f470" + }, + { + "text": "Access is restricted only to specific tools or collections (e.g. BioConda). Organizations often need the flexibility to assemble and deploy custom containers.\n\n", + "_key": "2b4dea072f471", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "1969250dec82", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "API Rate Limits:", + "_key": "f3612e248c410" + }, + { + "_type": "span", + "marks": [], + "text": " Public registries often impose low API rate limits and offer limited or low-quality SLAs, making them unsuitable for production workloads.\n\n", + "_key": "f3612e248c411" + } + ], + "level": 1, + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Egress Costs", + "_key": "5a3fdfc32935" + }, + { + "text": ": Use of private registries can incur outbound data transfer costs, particularly when deploying pipelines at scale across multiple regions or cloud providers.\n\n", + "_key": "dd0bf6ca10b3", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "9ea73e59c666", + "listItem": "bullet" + }, + { + "children": [ + { + "marks": [], + "text": "Seqera’s ", + "_key": "2d0b37d34c330", + "_type": "span" + }, + { + "_key": "2d0b37d34c331", + "_type": "span", + "marks": [ + "f54c460b69c3" + ], + "text": "Wave" + }, + { + "_key": "2d0b37d34c332", + "_type": "span", + "marks": [], + "text": " solves these problems by simplifying the management of containerized bioinformatics workflows by " + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "provisioning containers on-demand during pipeline execution.", + "_key": "2d0b37d34c333" + }, + { + "_type": "span", + "marks": [], + "text": " This approach ensures the delivery of container images that are defined precisely depending on the requirements of each pipeline task in terms of dependencies and platform architecture. The process is ", + "_key": "2d0b37d34c334" + }, + { + "text": "completely transparent and fully automated,", + "_key": "2d0b37d34c335", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_key": "2d0b37d34c336", + "_type": "span", + "marks": [], + "text": " eliminating the need to manually create, upload and maintain the numerous container images required for pipeline execution."
+ } + ], + "_type": "block", + "style": "normal", + "_key": "1bd6e257c774", + "markDefs": [ + { + "_type": "link", + "href": "https://seqera.io/wave/", + "_key": "f54c460b69c3" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "By integrating containers as ", + "_key": "a75abd67740a0", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "dynamic pipeline components ", + "_key": "a75abd67740a1" + }, + { + "marks": [], + "text": "rather than standalone artifacts, Wave streamlines development, enhances reliability, and reduces maintenance overhead. This makes it easier for developers and operations teams to build, deploy, and manage containers efficiently and securely.", + "_key": "a75abd67740a2", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "b7b00332c37f" + }, + { + "_key": "2bcc79444f9d", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "ff8ecd1929ad" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "marks": [ + "strong" + ], + "text": "How does Wave work?", + "_key": "ada86664aefe0", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "bc1396a38011" + }, + { + "markDefs": [ + { + "_key": "0d6219f32dfe", + "_type": "link", + "href": "https://training.nextflow.io/basic_training/containers/#container-directives" + } + ], + "children": [ + { + "_key": "d2a901bcfde60", + "_type": "span", + "marks": [], + "text": "Wave transforms containers and pipeline management by allowing bioinformaticians to specify container requirements directly within their pipeline definitions. Instead of referencing manually created container images in " + }, + { + "marks": [ + "0d6219f32dfe" + ], + "text": "Nextflow’s ", + "_key": "d2a901bcfde61", + "_type": "span" + }, + { + "_key": "d2a901bcfde62", + "_type": "span", + "marks": [ + "0d6219f32dfe", + "em" + ], + "text": "container" + }, + { + "_type": "span", + "marks": [ + "0d6219f32dfe" + ], + "text": " directive", + "_key": "d2a901bcfde63" + }, + { + "_type": "span", + "marks": [], + "text": ", developers can either include a Dockerfile in the directory where the process' module is defined or just instruct Wave to use the Conda package associated with the process definition. By using this information, Wave provisions a container on-demand, either using an existing container image in the target registry matching the specified requirement or building a new one on-the-fly to fulfill a new request, and returns the container URI pointing to the Wave container for process execution.
The built container is then pushed to a destination registry and returned to the pipeline for execution, ensuring seamless integration and optimization across ", + "_key": "d2a901bcfde64" + }, + { + "marks": [ + "strong" + ], + "text": "diverse computational architectures.", + "_key": "d2a901bcfde65", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "86f8349d8dd9" + }, + { + "style": "normal", + "_key": "4bd12760fcd8", + "markDefs": [ + { + "_key": "b0443a516524", + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/config.html" + } + ], + "children": [ + { + "marks": [], + "text": "Wave can also direct containers into a registry specified in the ", + "_key": "201624e167ae0", + "_type": "span" + }, + { + "_key": "201624e167ae1", + "_type": "span", + "marks": [ + "b0443a516524" + ], + "text": "nextflow.config file" + }, + { + "_type": "span", + "marks": [], + "text": ", along with other pipeline settings. This means containers can be served from cloud registries closer to where pipelines are executed, delivering ", + "_key": "201624e167ae2" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "better performance and reducing network traffic", + "_key": "201624e167ae3" + }, + { + "marks": [], + "text": ". Moreover, Wave operates independently, serving as a versatile tool for bioinformaticians across various platforms and workflows. By employing ", + "_key": "201624e167ae4", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "multi-level caching,", + "_key": "201624e167ae5" + }, + { + "marks": [], + "text": " Wave ensures that containers are built only once or when the Dockerfile changes, enhancing efficiency and streamlining the management of bioinformatics workflows.", + "_key": "201624e167ae6", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "image", + "_key": "6f161a64aa34", + "asset": { + "_ref": "image-63c1caffc660a4c615ef2551318bc7b8fb8eca7b-2165x680-png", + "_type": "reference" + } + }, + { + "children": [ + { + "_key": "2bb9a056f41c0", + "_type": "span", + "marks": [ + "strong", + "em" + ], + "text": "Figure 1." + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": " Wave —a smart container provisioning and augmentation service for Nextflow.", + "_key": "2bb9a056f41c1" + } + ], + "_type": "block", + "style": "normal", + "_key": "e33f18dee5bf", + "markDefs": [] + }, + { + "children": [ + { + "_key": "4e5278eb85f4", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "f6e484ee66d6", + "markDefs": [] + }, + { + "_type": "block", + "style": "h2", + "_key": "1f7634971dca", + "markDefs": [], + "children": [ + { + "text": "Key features of Wave", + "_key": "a220fc9a291d0", + "_type": "span", + "marks": [ + "strong" + ] + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "14d7473d0d0f", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "✔ ", + "_key": "5178d679f1fd0" + }, + { + "marks": [ + "strong" + ], + "text": "Access private container repositories", + "_key": "5178d679f1fd1", + "_type": "span" + }, + { + "_key": "5178d679f1fd2", + "_type": "span", + "marks": [], + "text": ": Seamlessly integrate Nextflow pipelines with Seqera Platform to grant access to private container repositories." 
+ } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "ee1c448e55ff", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "✔ ", + "_key": "770c482bc4f10" + }, + { + "_key": "770c482bc4f11", + "_type": "span", + "marks": [ + "strong" + ], + "text": "On-demand container provisioning:" + }, + { + "_type": "span", + "marks": [], + "text": " Automatically provision containers (via Dockerfile or Conda packages) based on dependencies in your Nextflow pipeline, enhancing efficiency, reducing errors, and eliminating the need for separate container builds and maintenance.", + "_key": "770c482bc4f12" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "✔ ", + "_key": "753a230208440" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Enhanced security", + "_key": "753a230208441" + }, + { + "marks": [], + "text": ": Each new container provisioned by Wave undergoes a security scan to identify potential vulnerabilities.", + "_key": "753a230208442", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "dfc5be5494ed", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "✔", + "_key": "7ec429cf22a90", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": " Create multi-tool and multi-package containers", + "_key": "7ec429cf22a91" + }, + { + "text": ": Easily build and manage containers with diverse tools and packages, streamlining complex workflows with multiple dependencies.", + "_key": "7ec429cf22a92", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "e50b6a0ff82a" + }, + { + "style": "normal", + "_key": "a8c485516400", + "markDefs": [], + "children": [ + { + "_key": "51c15bf17a880", + "_type": "span", + "marks": [], + "text": "✔" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": " Provision multi-format and multi-platform containers: ", + "_key": "51c15bf17a881" + }, + { + "marks": [], + "text": "Automatically provision containers for Docker or Singularity based on your Nextflow pipeline configuration and platform, including ARM64 containers for AWS Graviton if a compatible Dockerfile or Conda package is provided.", + "_key": "51c15bf17a882", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "741df5fa35600", + "_type": "span", + "marks": [], + "text": "✔ " + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Mirror Public and Private Repositories", + "_key": "741df5fa35601" + }, + { + "text": ": Mirror the containers needed by your pipelines in a registry co-located with where pipeline execution is carried out, allowing optimized data transfer costs and accelerated execution of pipeline tasks.", + "_key": "741df5fa35602", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "8d580f86471d" + }, + { + "_key": "e1cf4e3eabf4", + "markDefs": [ + { + "_type": "link", + "href": "https://hubs.la/Q02P4r9W0", + "_key": "45a6b1f720b3" + } + ], + "children": [ + { + "text": "Download the Whitepaper", + "_key": "5ebb5fcdeb6b", + "_type": "span", + "marks": [ + "45a6b1f720b3" + ] + }, + { + "_key": "835caf3a91f0", + "_type": "span", + "marks": [], + "text": " to explore features in more detail" + } + ], + "_type": "block", + "style": "blockquote" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Seqera Containers for 
publicly accessible container images", + "_key": "f4fa5324fa620" + } + ], + "_type": "block", + "style": "h2", + "_key": "418255c8286d" + }, + { + "_key": "6b076c6e5be9", + "markDefs": [ + { + "_type": "link", + "href": "https://hubs.la/Q02P4rwk0", + "_key": "49dd8bebf517" + } + ], + "children": [ + { + "text": "With the newly launched ", + "_key": "f2dc6721adb90", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "49dd8bebf517" + ], + "text": "Seqera Containers", + "_key": "f2dc6721adb91", + "_type": "span" + }, + { + "marks": [], + "text": ", the Wave experience is elevated even further. Now, instead of browsing existing container images as with a traditional container registry, users can just specify which tools they require through an ", + "_key": "f2dc6721adb92", + "_type": "span" + }, + { + "_key": "f2dc6721adb93", + "_type": "span", + "marks": [ + "strong" + ], + "text": "intuitive and user-friendly web interface. " + }, + { + "text": "This will find an existing container image for the required tool(s) or build a container on-the-fly using the Wave service. Currently, it supports any software package provided by the Bioconda, conda-forge, and PyPI channels. Containers can be built in both the Docker and Singularity image formats and for the linux/amd64 and linux/arm64 CPU architectures.", + "_key": "f2dc6721adb94", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "d950641601c00", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "4c985acdcf31" + }, + { + "_key": "e81e3d89583c", + "markDefs": [ + { + "_type": "link", + "href": "https://seqera.io/containers/", + "_key": "171c64cfcfbe" + }, + { + "_key": "339b65aacd58", + "_type": "link", + "href": "https://community.wave.seqera.io/" + } + ], + "children": [ + { + "_key": "3a00b180c5180", + "_type": "span", + "marks": [], + "text": "Additionally, " + }, + { + "text": "Seqera Containers", + "_key": "3a00b180c5181", + "_type": "span", + "marks": [ + "171c64cfcfbe" + ] + }, + { + "marks": [], + "text": " are stored permanently and are publicly accessible via the registry host ", + "_key": "3a00b180c5182", + "_type": "span" + }, + { + "_key": "8c65146fbd29", + "_type": "span", + "marks": [ + "339b65aacd58" + ], + "text": "community.wave.seqera.io" + }, + { + "_key": "11bc24deeace", + "_type": "span", + "marks": [], + "text": ". This ensures that any future requests for the same package will return the exact container image, guaranteeing reproducibility across runs. The Seqera Containers project was developed in collaboration with Amazon Web Services, which is sponsoring the container hosting infrastructure.\n" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "image", + "_key": "7035fe8ea204", + "asset": { + "_ref": "image-d505d0b687501b2f43a47a688dc2e096886fbfff-883x451-jpg", + "_type": "reference" + } + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strong", + "em" + ], + "text": "Figure 2", + "_key": "5497721bd8e40" + }, + { + "_key": "5497721bd8e41", + "_type": "span", + "marks": [ + "em" + ], + "text": ". Snapshot of Seqera Containers, demonstrating how you can create containers with the tools you want, on the fly."
+ }, + { + "text": "\n", + "_key": "5fccd7d33cef", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "c5f9e1119426" + }, + { + "markDefs": [], + "children": [ + { + "marks": [ + "strong" + ], + "text": "Discover the benefits of Wave", + "_key": "f811f3a434440", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "aa383c5b254c" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "9da5596afc7b0", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "a9639af3c287" + }, + { + "markDefs": [], + "children": [ + { + "_key": "a44c589047e10", + "_type": "span", + "marks": [], + "text": "Wave offers a transformative solution to the complexities of managing containerized bioinformatics workflows. By integrating containers directly into pipelines and prioritizing flexibility and efficiency, Wave streamlines development, enhances security, and optimizes performance across diverse computing environments. Deep dive into how Wave can revolutionize your workflow management by downloading our whitepaper today." + } + ], + "_type": "block", + "style": "normal", + "_key": "5f5fe5035844" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://hubs.la/Q02P4r9W0", + "_key": "29092c152215" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "29092c152215" + ], + "text": "Download the Wave Whitepaper", + "_key": "08116ce2b1b70" + } + ], + "_type": "block", + "style": "blockquote", + "_key": "c4ebd70557cd" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "2d5affde12080" + } + ], + "_type": "block", + "style": "normal", + "_key": "2deff6b4aab7" + } + ], + "_updatedAt": "2024-09-10T08:00:16Z", + "tags": [ + { + "_ref": "6f35c54a-0d93-4aef-9d80-bd4ccb6527b4", + "_type": "reference", + "_key": "e6e4331ef27a" + } + ], + "meta": { + "description": "In the bioinformatics landscape, containerized workflows have become crucial for ensuring reproducibility in data analysis. By encapsulating applications and their dependencies into portable, self-contained packages, containers enable seamless distribution across diverse computing environments.", + "noIndex": false, + "slug": { + "current": "wave-rethinking-software-containers-for-data-pipelines", + "_type": "slug" + }, + "_type": "meta" + }, + "_type": "blogPost", + "_id": "b032b7fb-8dc8-464e-b4c8-18cc9b8c2dd1", + "publishedAt": "2024-09-10T07:44:00.000Z", + "_createdAt": "2024-09-09T07:56:25Z", + "title": "Wave: rethinking software containers for data pipelines" + }, + { + "_id": "b4ad09fa-b8ee-484f-9843-57c3073027a8", + "_rev": "hLqYCNYcORjetYCGdcbaMx", + "_updatedAt": "2024-08-27T23:50:43Z", + "_createdAt": "2024-04-29T15:30:18Z", + "_type": "blogPost", + "title": "Data Studios – Interactive analysis in Seqera Platform", + "publishedAt": "2024-05-23T12:29:00.000Z", + "author": { + "_type": "reference", + "_ref": "f25df58c-156e-4294-98ba-f9dcd6860c39" + }, + "body": [ + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Nextflow is the de facto standard for reproducible workflows in the cloud, but the scientific data lifecycle is much broader than just pipelines — including iterative development, tertiary analysis, and data modeling. 
With the Seqera Platform, we aim to enable rapid iteration and collaboration across the entire scientific lifecycle, saving you time whether you’re experimenting, conducting research, preparing for your next clinical trial, or producing a new therapeutic.", + "_key": "9b67215adc020", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "3747df96bf39" + }, + { + "children": [ + { + "_key": "68aad3fffb30", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "d0aeec9f5ae8", + "markDefs": [] + }, + { + "style": "normal", + "_key": "eede8d97a919", + "markDefs": [ + { + "_type": "link", + "href": "https://youtu.be/yfMFFHTR-dk?feature=shared", + "_key": "241de9355755" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In October 2023, Seqera CEO and co-founder Evan Floden ", + "_key": "82d742a05e220" + }, + { + "text": "unveiled the private-preview of Data Studios", + "_key": "82d742a05e221", + "_type": "span", + "marks": [ + "241de9355755" + ] + }, + { + "_key": "82d742a05e222", + "_type": "span", + "marks": [], + "text": ", enabling streamlined creation of collaborative notebook environments using cloud-native components coupled with your data and hosted in your own secure environment. Today we’re excited to announce that Data Studios is publicly available to all Seqera Cloud users in Public Preview!" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Combining Workflows and Data Analysis", + "_key": "b01378acab3e0" + } + ], + "_type": "block", + "style": "h2", + "_key": "5ee9848c5668", + "markDefs": [] + }, + { + "children": [ + { + "marks": [], + "text": "Nextflow and Seqera Platform are enormously effective at launching, managing, and collaborating on scientific data analysis pipelines. However, a pipeline run is often not where the analysis ends, and for every user who needs to run and manage pipelines, many others, including analysts and data scientists, need interactive environments such as ", + "_key": "83cca5c492c90", + "_type": "span" + }, + { + "_key": "83cca5c492c91", + "_type": "span", + "marks": [ + "deaaa72caddd" + ], + "text": "Jupyter Notebooks" + }, + { + "_type": "span", + "marks": [], + "text": " or ", + "_key": "83cca5c492c92" + }, + { + "_type": "span", + "marks": [ + "ba240fce2f0a" + ], + "text": "RStudio", + "_key": "83cca5c492c93" + }, + { + "marks": [], + "text": ". These are used for exploratory data analysis, modeling, and building visualizations and dashboards for analyzing and sharing scientific results.", + "_key": "83cca5c492c94", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "6f8a7dbbff78", + "markDefs": [ + { + "_key": "deaaa72caddd", + "_type": "link", + "href": "https://jupyter.org/" + }, + { + "href": "https://posit.co/products/open-source/rstudio-server/", + "_key": "ba240fce2f0a", + "_type": "link" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "For scientific users, deploying and configuring secure, performant interactive notebook environments to work with data in context has traditionally been surprisingly hard. As a concrete example, consider a scenario where a pipeline is running on AWS, and a data scientist wants to analyze results stored in Amazon S3 using a familiar Jupyter Notebook. 
Configuration doesn’t happen by itself: the notebook must be hosted, made network accessible, authorization limited to specific groups of users, and pre-configured with packages commonly used in bioinformatics, such as ", + "_key": "cad5c149afc70" + }, + { + "text": "Biopython", + "_key": "cad5c149afc71", + "_type": "span", + "marks": [ + "176bf45acd7f" + ] + }, + { + "text": ", ", + "_key": "cad5c149afc72", + "_type": "span", + "marks": [] + }, + { + "text": "NumPy", + "_key": "cad5c149afc73", + "_type": "span", + "marks": [ + "23c777ded073" + ] + }, + { + "_type": "span", + "marks": [], + "text": ", ", + "_key": "cad5c149afc74" + }, + { + "_type": "span", + "marks": [ + "76defb7bdf72" + ], + "text": "Scikit-learn", + "_key": "cad5c149afc75" + }, + { + "marks": [], + "text": ", and ", + "_key": "cad5c149afc76", + "_type": "span" + }, + { + "_key": "cad5c149afc77", + "_type": "span", + "marks": [ + "05f2be477354" + ], + "text": "Matplotlib" + }, + { + "text": ".", + "_key": "cad5c149afc78", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "6c284cb1c78b", + "markDefs": [ + { + "_type": "link", + "href": "https://biopython.org/", + "_key": "176bf45acd7f" + }, + { + "_type": "link", + "href": "https://numpy.org/", + "_key": "23c777ded073" + }, + { + "_key": "76defb7bdf72", + "_type": "link", + "href": "https://scikit-learn.org/stable/" + }, + { + "_type": "link", + "href": "https://matplotlib.org/", + "_key": "05f2be477354" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "33ed6097af29", + "markDefs": [ + { + "_type": "link", + "href": "https://pandas.pydata.org/", + "_key": "b03950b8cd86" + }, + { + "_type": "link", + "href": "https://pypi.org/project/s3fs/", + "_key": "d9bd8791dd9f" + } + ], + "children": [ + { + "marks": [], + "text": "Before data can even be read using ", + "_key": "79bac17c20c90", + "_type": "span" + }, + { + "_key": "79bac17c20c91", + "_type": "span", + "marks": [ + "b03950b8cd86" + ], + "text": "pandas" + }, + { + "_type": "span", + "marks": [], + "text": ", ", + "_key": "79bac17c20c92" + }, + { + "text": "s3fs", + "_key": "79bac17c20c93", + "_type": "span", + "marks": [ + "d9bd8791dd9f" + ] + }, + { + "text": " must be installed, which in turn depends on other prerequisite packages. 
Additionally, Notebook users must know the paths to the S3 buckets where the data files reside, including the Nextflow pipeline work directory, and have appropriate access.", + "_key": "79bac17c20c94", + "_type": "span", + "marks": [] + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Multiply this complexity across multiple tools, cloud providers, file stores, languages, and libraries, and you get the picture: configuring these environments is tedious, time-consuming, error-prone, and often beyond the privilege-level or expertise of analysts.", + "_key": "656ebd1feba10" + } + ], + "_type": "block", + "style": "normal", + "_key": "4d7987426cdb", + "markDefs": [] + }, + { + "children": [ + { + "marks": [], + "text": "Data Studios – Simplifying Analysis Environment Management", + "_key": "9f2d80bce8e40", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "5f76dcbc9e10", + "markDefs": [] + }, + { + "style": "normal", + "_key": "9ee6aebb5ed6", + "markDefs": [], + "children": [ + { + "_key": "7a29798a76410", + "_type": "span", + "marks": [], + "text": "Data Studios enable you to easily create, manage, and share notebook environments in Seqera Platform using point-and-click actions — connecting your data to on-demand batch computing resources — similar to how you currently manage Nextflow pipelines." + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "text": "Like pipelines, Data Studios enable simple deployment and scaling using customizable, ephemeral compute environments and containers. You add new interactive environments based on predefined templates, as shown below, defining your own metadata, vCPUs and memory, and deploying them with any (public or private) data mounted on a variety of compute environments already configured in Seqera Platform.", + "_key": "e89cae2f83e70", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "b1c7a00ccc8a" + }, + { + "_type": "youtube", + "id": "hXqaxkfx5Fo", + "_key": "828447aabb80" + }, + { + "children": [ + { + "text": "The initial release of Data Studios ships with pre-built container templates for Jupyter and RStudio, and environments can be shared with individuals and teams in Seqera Platform using Role Based Access Control (RBAC).", + "_key": "9858767267210", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "19737e623cef", + "markDefs": [] + }, + { + "style": "normal", + "_key": "8c7b3e38ba0a", + "markDefs": [], + "children": [ + { + "text": "The productivity impacts for data scientists and analysts are profound: you can launch your preferred interactive environment with a single click, pre-configured with the necessary libraries and notebook markdown files, and have immediate access to pipeline data output for real-time analysis in-context. Furthermore, you can collaborate with colleagues by securely sharing Data Studios, along with the code and visualizations within. 
Some use cases already developed include:", + "_key": "f5a74dcbc4f90", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "listItem": "bullet", + "markDefs": [ + { + "_key": "6962bf0684f4", + "_type": "link", + "href": "https://nf-co.re/scrnaseq/2.6.0" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Processing single-cell RNAseq data using ", + "_key": "8597e771fe3c0" + }, + { + "marks": [ + "6962bf0684f4" + ], + "text": "nf-core/scrnaseq", + "_key": "8597e771fe3c1", + "_type": "span" + }, + { + "_key": "8597e771fe3c2", + "_type": "span", + "marks": [], + "text": " and performing downstream, interactive analysis using the popular Scanpy (Python) or Seurat (R) packages." + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "1dbbb766369d" + }, + { + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://nf-co.re/differentialabundance/1.5.0", + "_key": "ca2f018962e4" + } + ], + "children": [ + { + "marks": [], + "text": "Running a differential gene expression analysis using ", + "_key": "f3f51ce005e00", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "ca2f018962e4" + ], + "text": "nf-core/differentialabundance", + "_key": "f3f51ce005e01" + }, + { + "_type": "span", + "marks": [], + "text": " and launching an R Shiny app to explore the results in an RStudio notebook.", + "_key": "f3f51ce005e02" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "5fb4b2606c71" + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Extending pipeline functionality by experimenting with Nextflow and Bash in VSCode directly using output data from your pipeline run in Seqera Platform.", + "_key": "38ecc47eb508", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "d69c14515886" + }, + { + "asset": { + "_ref": "image-cca2af1246925935f1649dab3584aeb6d5d63d58-1314x858-png", + "_type": "reference" + }, + "_type": "image", + "_key": "50f8e11bbc35" + }, + { + "children": [ + { + "text": "Snapshots and Session Persistence", + "_key": "c7c22f10dc4a0", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2", + "_key": "b53f59b5c016", + "markDefs": [] + }, + { + "style": "normal", + "_key": "919ee4be64cb", + "markDefs": [], + "children": [ + { + "_key": "91775fdb33e00", + "_type": "span", + "marks": [], + "text": "Data Studios can be started and stopped at-will, preserving state at every step. This includes all code, output and metadata, ensuring minimum costs are incurred compared to managing independent, dedicated analysis VMs. And all while providing fault tolerance, improved reproducibility, and portability of analyses." + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "aae0e9bff38b0", + "_type": "span", + "marks": [], + "text": "State is preserved via timestamped snapshots of the Data Studio environment. Individual snapshots can optionally be renamed for improved discoverability, and used as the base template for a new Data Studio, preserving the complete analysis history and allowing experimentation without impacting the original analysis environment." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "788d1a1a9db2" + }, + { + "asset": { + "_ref": "image-b752c9e4a00bcd028391a8265b31992b7bdf04d0-1313x859-png", + "_type": "reference" + }, + "_type": "image", + "_key": "babd96dc2978" + }, + { + "_type": "block", + "style": "h2", + "_key": "436881fdd30c", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Not just for analysis", + "_key": "1adec5edc4090" + } + ] + }, + { + "style": "normal", + "_key": "9d9a7cf2c0e5", + "markDefs": [ + { + "href": "https://code.visualstudio.com/docs/remote/vscode-server", + "_key": "a227a4250f32", + "_type": "link" + } + ], + "children": [ + { + "_key": "9ab1335a4b120", + "_type": "span", + "marks": [], + "text": "Beyond analysts and data scientists, Data Studios are a powerful tool for bioinformaticians developing workflows. In this initial release, we offer a Data Studios template for Microsoft’s " + }, + { + "_key": "9ab1335a4b121", + "_type": "span", + "marks": [ + "a227a4250f32" + ], + "text": "VS Code Server" + }, + { + "marks": [], + "text": " — a web-based version of the popular VS Code IDE commonly used by Nextflow pipeline developers.", + "_key": "9ab1335a4b122", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "724074849cc4", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Unlike the current process where developers typically build and test Nextflow modules and pipelines locally, Data Studios facilitates building, testing, and troubleshooting pipelines in production environments using cloud executors and real data.", + "_key": "0b117cdb56d50" + } + ] + }, + { + "_key": "9323e2b0b843", + "markDefs": [], + "children": [ + { + "_key": "e9a49bc43c070", + "_type": "span", + "marks": [], + "text": "Software issues commonly appear when running in specific environments or with particular datasets. Faced with a problem, developers can simply enter their familiar IDE in Data Studios and begin troubleshooting the issue live and in context using real pipeline data." + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "h2", + "_key": "786c5d5206f3", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Looking forward", + "_key": "0526e2f1baaa0", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "cebbe88ca952", + "markDefs": [ + { + "_type": "link", + "href": "https://seqera.io/blog/introducing-data-explorer/", + "_key": "90e54ff7b10a" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Just as ", + "_key": "c34f69a868af0" + }, + { + "marks": [ + "90e54ff7b10a" + ], + "text": "Data Explorer", + "_key": "c34f69a868af1", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " boosts productivity for researchers and analysts, Data Studios does the same for data scientists. Data Explorer enables researchers to easily access and manage data residing in cloud storage buckets from within Seqera Platform, without switching to external environments like the Amazon S3 console. Similarly, Data Studios enables users to easily launch interactive open science tools to analyze data in-context — no matter where the pipelines run or the output data resides — and use those analyses to inform colleagues in real-time with critical updates to pivot experimental approaches or methodologies. 
By combining Data Explorer, Pipelines, and Data Studios, Seqera Platform helps guide teams through the scientific data lifecycle enabling:", + "_key": "c34f69a868af2" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "048875dc524f", + "listItem": "number", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Simple linking and exploration of data as it’s generated via Data Explorer.", + "_key": "56c8273279270", + "_type": "span" + } + ], + "level": 1 + }, + { + "style": "normal", + "_key": "160fcb340239", + "listItem": "number", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Ability to easily develop, deploy, and scale Pipelines.", + "_key": "f7ce514137250", + "_type": "span" + } + ], + "level": 1, + "_type": "block" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "b52a78e1182c", + "listItem": "number", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Seamless transition from Pipeline output to interactive analysis with Data Studios.", + "_key": "77a3907646b50", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "46b053085fb1", + "markDefs": [], + "children": [ + { + "_key": "cfc21d68932c0", + "_type": "span", + "marks": [], + "text": "While work continues, Data Studios represents a significant step forward. In the coming months, we'll continue developing additional features including support for custom templates, a cost estimator, resource labels, and improved integration across the Seqera Platform." + } + ], + "_type": "block" + }, + { + "markDefs": [ + { + "_key": "a375f3e9e299", + "_type": "link", + "href": "https://www.illumina.com/products/by-type/informatics-products/basespace-sequence-hub/apps/integrative-genomics-viewer.html" + }, + { + "_type": "link", + "href": "https://github.com/Xpra-org/xpra/tree/master", + "_key": "0f7c000bf363" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Much as the nf-core community builds and curates production-quality pipelines and modules, we envision a similar catalog of Data Studio templates in the future comprising additional interactive analysis tools, such as ", + "_key": "50e028c069d90" + }, + { + "_type": "span", + "marks": [ + "a375f3e9e299" + ], + "text": "Integrative Genomics Viewer", + "_key": "50e028c069d91" + }, + { + "marks": [], + "text": " (IGV) and web-based IDEs such as ", + "_key": "50e028c069d92", + "_type": "span" + }, + { + "text": "xpra", + "_key": "50e028c069d93", + "_type": "span", + "marks": [ + "0f7c000bf363" + ] + }, + { + "marks": [], + "text": ".", + "_key": "50e028c069d94", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "fc3c736fd7db" + }, + { + "style": "h2", + "_key": "73553d428748", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Learning more", + "_key": "6339031fea9f0", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "You can view running Data Studios today in the Seqera Platform ", + "_key": "a765d985bb520" + }, + { + "text": "Community Showcase workspace", + "_key": "a765d985bb521", + "_type": "span", + "marks": [ + "d64d0b1e03f4" + ] + }, + { + "_type": "span", + "marks": [], + "text": ". 
To enable Data Studios for your own organization, reach out to your Seqera Account Manager or start a ", + "_key": "a765d985bb522" + }, + { + "text": "free-trial", + "_key": "a765d985bb523", + "_type": "span", + "marks": [ + "e8e9efad9a15" + ] + }, + { + "marks": [], + "text": " today.", + "_key": "a765d985bb524", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "f85467450616", + "markDefs": [ + { + "_key": "d64d0b1e03f4", + "_type": "link", + "href": "https://cloud.seqera.io/orgs/community/workspaces/showcase/" + }, + { + "href": "https://cloud.seqera.io/login", + "_key": "e8e9efad9a15", + "_type": "link" + } + ] + } + ], + "tags": [ + { + "_type": "reference", + "_key": "8e3ca4dcbba0", + "_ref": "82fd60f1-c6d0-4b8a-9c5d-f971c622f341" + }, + { + "_type": "reference", + "_key": "25faaa6c567d", + "_ref": "f1d61674-9374-4d2c-97c2-55778db7c922" + }, + { + "_type": "reference", + "_key": "c757b6d26455", + "_ref": "2b5c9a56-b491-42aa-b291-86611d77ccec" + } + ], + "meta": { + "noIndex": false, + "slug": { + "_type": "slug", + "current": "data-studios-announcement" + }, + "_type": "meta", + "shareImage": { + "asset": { + "_ref": "image-a9c10263524a5906891633a2c94da730848d3ba8-1200x628-png", + "_type": "reference" + }, + "_type": "image" + }, + "description": "An overview of Data Studios - a new feature in Seqera Platform that enables analysts and data scientists to add interactive envrionments." + } + }, + { + "tags": [ + { + "_type": "reference", + "_key": "f30d3e591314", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493" + }, + { + "_ref": "1b55a117-18fe-40cf-8873-6efd157a6058", + "_type": "reference", + "_key": "4525d8907a1f" + }, + { + "_key": "7c9827906277", + "_ref": "ab59634e-a349-468d-8f99-cb9fe4c38228", + "_type": "reference" + } + ], + "_rev": "347cad33-9d92-4365-ba09-18e6c2a688a3", + "body": [ + { + "_key": "fc11d5317163", + "markDefs": [], + "children": [ + { + "marks": [ + "em" + ], + "text": "This is a joint article contributed to the Seqera blog by Jon Manning of Seqera and Felix Krueger of Altos Labs describing the new nf-core/riboseq pipeline.", + "_key": "8c2ee84cdf5e0", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "a96b84f9b665", + "markDefs": [ + { + "_key": "39d86b09469d", + "_type": "link", + "href": "https://nf-co.re/" + }, + { + "href": "https://nf-co.re/riboseq", + "_key": "f22304d582ae", + "_type": "link" + }, + { + "href": "https://en.wikipedia.org/wiki/Ribosome_profiling", + "_key": "23797f8146f8", + "_type": "link" + } + ], + "children": [ + { + "text": "In April 2024, the bioinformatics community welcomed a significant addition to the ", + "_key": "5355407782e60", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "39d86b09469d" + ], + "text": "nf-core", + "_key": "5355407782e61", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " suite: the ", + "_key": "5355407782e62" + }, + { + "_key": "5355407782e63", + "_type": "span", + "marks": [ + "f22304d582ae" + ], + "text": "nf-core/riboseq" + }, + { + "marks": [], + "text": " pipeline. This new tool, born from a collaboration between Altos Labs and Seqera, underscores the potential of strategic partnerships to advance scientific research. 
In this article, we provide some background on the project, offer details on the pipeline, and explain how readers can get started with ", + "_key": "5355407782e64", + "_type": "span" + }, + { + "_key": "5355407782e65", + "_type": "span", + "marks": [ + "23797f8146f8" + ], + "text": "Ribo-seq" + }, + { + "_key": "5355407782e66", + "_type": "span", + "marks": [], + "text": " analysis." + } + ] + }, + { + "_key": "ff2e29964409", + "markDefs": [], + "children": [ + { + "_key": "06511e51fc0b", + "_type": "span", + "marks": [], + "text": "A Fruitful Collaboration" + } + ], + "_type": "block", + "style": "h2" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Altos Labs is known for its ambitious efforts in harnessing cellular rejuvenation to reverse disease, injury, and disabilities that can occur throughout life. Their scientific strategy heavily relies on understanding cellular mechanisms via advanced technologies. Ribo-seq provides insights into the real-time translation of proteins, a core process often dysregulated during aging and disease. Altos Labs needed a way to ensure reliable, reproducible Ribo-seq analysis that its research teams could use. While a Ribo-seq pipeline had been started in nf-core, limited progress had been made. Seqera seemed the ideal partner to help build one!", + "_key": "ef4460f305a4" + } + ], + "_type": "block", + "style": "normal", + "_key": "212704cdad6c" + }, + { + "_type": "block", + "style": "normal", + "_key": "3a4e325a6885", + "markDefs": [ + { + "href": "https://seqera.io/nextflow/", + "_key": "afd8d4976f75", + "_type": "link" + }, + { + "_type": "link", + "href": "https://www.zs.com/", + "_key": "8fc76bfd5785" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Seqera, known for creating and developing the ", + "_key": "402551d96a99" + }, + { + "marks": [ + "afd8d4976f75" + ], + "text": "Nextflow DSL", + "_key": "a11895ee51be", + "_type": "span" + }, + { + "text": " and being an active partner in establishing community standards on nf-core, brought the expertise needed to translate Altos Labs' vision into a viable community pipeline. As part of this collaboration, we formed a working group and also reached out to colleagues at ", + "_key": "206247a437cc", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "8fc76bfd5785" + ], + "text": "ZS", + "_key": "da520fc0d7f3" + }, + { + "_key": "c8e26b5b7392", + "_type": "span", + "marks": [], + "text": " and other community members who had done prior work with Ribosome profiling in Nextflow. Our goal was not only to enhance Ribo-seq analysis capabilities but also to ensure the pipeline’s sustainability through a community-driven process." 
+ } + ] + }, + { + "_key": "110443549dbc", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Development Insights", + "_key": "023772c169b7" + } + ], + "_type": "block", + "style": "h2" + }, + { + "_type": "block", + "style": "normal", + "_key": "1bb6d0dcf94a", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The nf-core/riboseq project was structured into several phases:", + "_key": "fcef7ffc7722", + "_type": "span" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "em" + ], + "text": "Initial planning", + "_key": "04af5c122b050" + }, + { + "_type": "span", + "marks": [], + "text": ": This phase involved detailed discussions between the Scientific Development team at Seqera, Altos Labs, and expert partners to ensure alignment with best practices and effective tool selection.", + "_key": "04af5c122b051" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "9bd450582af3", + "listItem": "bullet" + }, + { + "listItem": "bullet", + "markDefs": [ + { + "href": "https://nf-co.re/rnaseq", + "_key": "3fa0f88295d5", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "em" + ], + "text": "Adapting existing components", + "_key": "4eb6302b38970" + }, + { + "_type": "span", + "marks": [], + "text": ": Key pre-processing and alignment functions were adapted from the ", + "_key": "4eb6302b38971" + }, + { + "text": "nf-core/rnaseq", + "_key": "4eb6302b38972", + "_type": "span", + "marks": [ + "3fa0f88295d5" + ] + }, + { + "marks": [], + "text": " pipeline, allowing for shareability, efficiency, and scalability.", + "_key": "4eb6302b38973", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "ef189e78f7f5" + }, + { + "_key": "dc6acae62561", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/zhpn1024/ribotish", + "_key": "6be1a3f37f71" + }, + { + "_type": "link", + "href": "https://github.com/smithlabcode/ribotricer", + "_key": "67a956a543b0" + }, + { + "_type": "link", + "href": "https://www.bioconductor.org/packages/release/bioc/html/anota2seq.html", + "_key": "5f9cca0d1922" + }, + { + "href": "https://biocontainers.pro/", + "_key": "a24a587b6c75", + "_type": "link" + }, + { + "_type": "link", + "href": "https://github.com/nf-core/modules", + "_key": "d813571ed2e7" + } + ], + "children": [ + { + "marks": [ + "em" + ], + "text": "New tool integration", + "_key": "f59020155b400", + "_type": "span" + }, + { + "marks": [], + "text": ": Specific tools for Ribo-seq analysis, such as ", + "_key": "f59020155b401", + "_type": "span" + }, + { + "marks": [ + "6be1a3f37f71" + ], + "text": "Ribo-TISH", + "_key": "f59020155b402", + "_type": "span" + }, + { + "text": ", ", + "_key": "f59020155b403", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "67a956a543b0" + ], + "text": "Ribotricer", + "_key": "f59020155b404" + }, + { + "_type": "span", + "marks": [], + "text": ", and ", + "_key": "f59020155b405" + }, + { + "marks": [ + "5f9cca0d1922" + ], + "text": "anota2seq", + "_key": "f59020155b406", + "_type": "span" + }, + { + "_key": "f59020155b407", + "_type": "span", + "marks": [], + "text": ", were wrapped into modules using " + }, + { + "_type": "span", + "marks": [ + "a24a587b6c75" + ], + "text": "Biocontainers", + "_key": "f59020155b408" + }, + { + "_type": "span", + "marks": [], + "text": ", within comprehensive testing frameworks to prevent regression and ensure 
reliability. These components were contributed to the ", + "_key": "f59020155b409" + }, + { + "_type": "span", + "marks": [ + "d813571ed2e7" + ], + "text": "nf-core/modules", + "_key": "f59020155b4010" + }, + { + "text": " repository, which will now be available for the wider community to reuse, independent of this effort.", + "_key": "f59020155b4011", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "27cae4355e4d0", + "_type": "span", + "marks": [ + "em" + ], + "text": "Pipeline development" + }, + { + "marks": [], + "text": ": Individual components were stitched together coherently to create the nf-core/riboseq pipeline, with its own testing framework and user documentation.", + "_key": "27cae4355e4d1", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "06ef45923942" + }, + { + "_key": "a11532d70cbc", + "markDefs": [], + "children": [ + { + "_key": "9af59990c0c00", + "_type": "span", + "marks": [], + "text": "Technical and Community Challenges" + } + ], + "_type": "block", + "style": "h2" + }, + { + "_key": "449d8032618a", + "markDefs": [], + "children": [ + { + "text": "Generalizing existing functionality", + "_key": "262d18dad67e0", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h3" + }, + { + "markDefs": [ + { + "_key": "601d56009a00", + "_type": "link", + "href": "https://nf-co.re/modules" + }, + { + "_key": "9d9691a3a5b4", + "_type": "link", + "href": "https://nf-co.re/subworkflows" + } + ], + "children": [ + { + "marks": [], + "text": "nf-core has become an encyclopedia of components, including ", + "_key": "48bd49dd01300", + "_type": "span" + }, + { + "_key": "48bd49dd01301", + "_type": "span", + "marks": [ + "601d56009a00" + ], + "text": "modules" + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "48bd49dd01302" + }, + { + "_type": "span", + "marks": [ + "9d9691a3a5b4" + ], + "text": "subworkflows", + "_key": "48bd49dd01303" + }, + { + "marks": [], + "text": " that developers can leverage to build Nextflow pipelines. RNA-seq data analysis, in particular, is well served by the nf-core/rnaseq pipeline, one of the longest-standing and most popular members of the nf-core community. Some of the components used in nf-core/rnaseq were not written with re-use in mind, so the first task in this project was to abstract the commodity components for processes such as preprocessing and quantification so that they could be effectively shared by the nf-core/riboseq pipeline.", + "_key": "48bd49dd01304", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "af3382a99d21" + }, + { + "markDefs": [], + "children": [ + { + "_key": "68d04249013b0", + "_type": "span", + "marks": [], + "text": "Test dataset generation" + } + ], + "_type": "block", + "style": "h3", + "_key": "5669adb1dcd3" + }, + { + "markDefs": [], + "children": [ + { + "text": "Another significant hurdle was generating robust test data capable of supporting the ongoing quality assurance of our software. In Ribo-seq analysis, the basic operation of some tools depends on the quality of input data, so random down-sampling of variable quality input reads, especially at shallow depths may not be useful to generate test data. 
To overcome this, we implemented a targeted down-sampling strategy, selectively using input reads that meet high-quality standards and are known to align well with a specific chromosome. This method enabled us to produce a concise yet effective test data set, ensuring that our Ribo-seq tools operate reliably under realistic conditions.", + "_key": "a4da2ac411130", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "2767c14b9d80" + }, + { + "style": "h3", + "_key": "2aaebc117fde", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Tool selection", + "_key": "27bceef25c9e0" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "1cb88d14f05e", + "markDefs": [], + "children": [ + { + "_key": "42c9c78112020", + "_type": "span", + "marks": [], + "text": "A primary challenge in developing the pipeline was the selection of high-quality, sustainable software. In bioinformatics, funding often limits software development, and many tools are poorly maintained. Furthermore, the understanding of what software 'works' can be ambiguous, embedded in the community's shared knowledge rather than documented formally. Our cooperative approach enabled us to make informed decisions and contribute improvements to the underlying software, enhancing utility for users beyond the nf-core community." + } + ], + "_type": "block" + }, + { + "style": "h3", + "_key": "d1e0de03d5a1", + "markDefs": [], + "children": [ + { + "text": "Parameter selection", + "_key": "b2c37914fe590", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "2523548b9954", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Selecting the correct parameter settings for optimal operation of bioinformatics tools is a perennial problem in the community. In particular, the settings for the STAR alignment algorithm have very different constraints in Ribo-seq analysis relative to generic RNA-seq analysis. We conducted a series of benchmarks to assess the impact on alignment statistics of various combinations of parameters. We settled on a starting set, but this is a subject of continuing discussion with community members to drive further optimizations.", + "_key": "decd6cfc25240" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "9a31de208e060", + "_type": "span", + "marks": [], + "text": "Pipeline Features" + } + ], + "_type": "block", + "style": "h2", + "_key": "a8c53464a53f" + }, + { + "_type": "block", + "style": "normal", + "_key": "45f1476190e5", + "markDefs": [], + "children": [ + { + "_key": "f51ea64a9e180", + "_type": "span", + "marks": [], + "text": "The nf-core/riboseq pipeline is now a robust framework written using the nf-core pipeline template, and specifically tailored to handle the complexities of Ribo-seq data analysis." 
+ } + ] + }, + { + "asset": { + "_ref": "image-83f90945d29b41fcdc562789b06f3abbdbfa4d9a-1010x412-png", + "_type": "reference" + }, + "_type": "image", + "_key": "9024177c2c73" + }, + { + "_key": "c4c2c021e47b", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Here is what it offers:", + "_key": "3460577cae3f", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Baseline read preprocessing using processes adapted from existing nf-core components.", + "_key": "5e7ebc27391f0", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "cfb811774489", + "listItem": "bullet" + }, + { + "_key": "f78073ef3267", + "listItem": "bullet", + "markDefs": [ + { + "_key": "159e3bc6217d", + "_type": "link", + "href": "https://github.com/alexdobin/STAR" + } + ], + "children": [ + { + "text": "Alignment to references with ", + "_key": "4ce6dc424aed0", + "_type": "span", + "marks": [] + }, + { + "text": "STAR", + "_key": "4ce6dc424aed1", + "_type": "span", + "marks": [ + "159e3bc6217d" + ] + }, + { + "_key": "4ce6dc424aed2", + "_type": "span", + "marks": [], + "text": ", producing both transcriptome and genome alignments." + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "3cdb46402566", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "1b345d3fa4f80", + "_type": "span", + "marks": [], + "text": "Analysis of read distribution around protein-coding regions to assess frame bias and P-site offsets. This produces a rich selection of diagnostic plots to assess Ribo-seq data quality." + } + ], + "level": 1 + }, + { + "_key": "9e3414d59445", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Prediction and identification of translated open reading frames using tools like Ribo-TISH and Ribotricer.", + "_key": "3299c56efe000", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "9e8c117a96a2", + "markDefs": [], + "children": [ + { + "text": "Assessment of translational efficiency, which requires matched RNA-seq and Ribo-seq data, facilitated by the anota2seq Bioconductor package (see dot plot below).", + "_key": "c39d9d7b14f8", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_type": "image", + "_key": "7122c68ade88", + "asset": { + "_type": "reference", + "_ref": "image-ca5f9967df813470051fcf548e962bdbf4c50ee5-624x624-png" + } + }, + { + "_key": "067ad9c9d6d7", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "em" + ], + "text": "An example result from anota2seq, a tool used to study gene expression, shows how transcription and translation are connected. The x-axis shows changes in overall mRNA levels (transcription) between a treated and a control group, while the y-axis displays changes in the rate of protein synthesis (translation) between those groups, as measured by Ribo-seq. Grey points represent genes with no significant change in either metric and most points align near the center of the x-axis, indicating little change in mRNA levels. 
However, some genes exhibit increased (orange) or decreased (red) protein synthesis, suggesting direct regulation of translation rather than changes driven solely by mRNA abundance.", + "_key": "57c0e67a28250" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://nf-co.re/riboseq/#usage", + "_key": "34ab33c4a8e1" + }, + { + "_type": "link", + "href": "https://nfcore.slack.com/channels/riboseq", + "_key": "218183b5348d" + } + ], + "children": [ + { + "_key": "e5078088e49b0", + "_type": "span", + "marks": [], + "text": "If you are a researcher interested in Ribo-seq data analysis, you can test the pipeline by following the instructions in the " + }, + { + "_type": "span", + "marks": [ + "34ab33c4a8e1" + ], + "text": "getting started", + "_key": "e5078088e49b1" + }, + { + "marks": [], + "text": " section of the pipeline. Please feel free to submit bugs and feature requests to drive ongoing improvements. You can also become part of the conversation by joining the ", + "_key": "e5078088e49b2", + "_type": "span" + }, + { + "text": "#riboseq", + "_key": "e5078088e49b3", + "_type": "span", + "marks": [ + "218183b5348d" + ] + }, + { + "text": " channel in the nf-core community Slack workspace. We would love to see you there!", + "_key": "e5078088e49b4", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "46beba019134" + }, + { + "markDefs": [], + "children": [ + { + "_key": "bd13a8c55f6e", + "_type": "span", + "marks": [], + "text": "Next Steps" + } + ], + "_type": "block", + "style": "h2", + "_key": "515022911e71" + }, + { + "_type": "block", + "style": "normal", + "_key": "2d75d51ff270", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Following this initial phase of work, Seqera and Altos Labs have handed over the nf-core/riboseq pipeline to the nf-core community for ongoing maintenance and development. As members of that community, we will continue to play a part in enhancing the pipeline going forward. We hope others will benefit from this effort and continue to improve and refine pipeline functionality.", + "_key": "14a152a9174f0", + "_type": "span" + } + ] + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/iraiosub/riboseq-flow", + "_key": "46fa6099abc2" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Coincidentally the authors of ", + "_key": "98347010c2330" + }, + { + "_key": "98347010c2331", + "_type": "span", + "marks": [ + "46fa6099abc2" + ], + "text": "riboseq-flow" + }, + { + "_key": "98347010c2332", + "_type": "span", + "marks": [], + "text": " published their related work on the same day that nf-core/riboseq was first released. This pipeline has a highly complementary set of steps, and there is already ongoing collaboration to work together to build an even better community resource." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "09c10fe38376" + }, + { + "children": [ + { + "marks": [], + "text": "Empowering Research and Innovation", + "_key": "e5fdf870848b0", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "c566b4d435e3", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The joint contribution of Seqera and Altos Labs to the nf-core/riboseq pipeline highlights how collaboration between industry and open-source communities can result in tools that push scientific boundaries and foster community engagement and development. By adhering to rigorous code quality and testing standards, nf-core/riboseq ensures researchers access to a dependable, cutting-edge tool.", + "_key": "35352a1b306b0" + } + ], + "_type": "block", + "style": "normal", + "_key": "99da8271ab0f" + }, + { + "markDefs": [], + "children": [ + { + "_key": "53386085eb760", + "_type": "span", + "marks": [], + "text": "We believe this new pipeline is poised to be vital in studying protein synthesis and its implications for aging and health. This is not just a technical achievement - it's a step forward in collaborative, open scientific progress." + } + ], + "_type": "block", + "style": "normal", + "_key": "56719298b452" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "If you have a project in mind where Seqera may be able to help with our Professional Services offerings, please contact us at ", + "_key": "cafe02f0755d" + }, + { + "_key": "53386085eb761", + "_type": "span", + "marks": [ + "ccafa728bca7" + ], + "text": "services@seqera.io" + }, + { + "text": ". We are the content experts for Nextflow, nf-core, and the Seqera Platform, and can offer tailored solutions and expert guidance to help you fulfill your objectives.", + "_key": "53386085eb762", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "6e42514da79e", + "markDefs": [ + { + "_key": "ccafa728bca7", + "_type": "link", + "href": "mailto:services@seqera.io" + } + ] + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://www.altoslabs.com/", + "_key": "026178e92bb6" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "To learn more about Altos Labs, visit ", + "_key": "3babdea8c79d0" + }, + { + "_key": "3babdea8c79d1", + "_type": "span", + "marks": [ + "026178e92bb6" + ], + "text": "https://www.altoslabs.com/" + }, + { + "_key": "3babdea8c79d2", + "_type": "span", + "marks": [], + "text": "." + } + ], + "_type": "block", + "style": "normal", + "_key": "a5dc365dc556" + }, + { + "_key": "5b95f381569b", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Acknowledgments", + "_key": "48b61c9282e00" + } + ], + "_type": "block", + "style": "h2" + }, + { + "_key": "258428890647", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "nf-core/riboseq was initially written by Jonathan Manning (Bioinformatics Engineer at Seqera) in collaboration with Felix Krueger and Christel Krueger (Altos Labs). The development work carried out on the pipeline was funded by Altos Labs. 
We thank the following people for their input (", + "_key": "d836d0eff50e0" + }, + { + "_key": "d836d0eff50e1", + "_type": "span", + "marks": [ + "em" + ], + "text": "in alphabetical order" + }, + { + "text": "):", + "_key": "d836d0eff50e2", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Felipe Almeida (ZS)", + "_key": "376c006c20de0" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "be9ad649bb8d", + "listItem": "bullet", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "abb0a8d9fba2", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Anne Bresciani (ZS)", + "_key": "6046a5e41c110" + } + ], + "level": 1 + }, + { + "_key": "31c2f31a40bc", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Caroline Eastwood (University of Edinburgh)", + "_key": "040c3d125ae60" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "ce8f076685cf", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Maxime U Garcia (Seqera)", + "_key": "f3c530a930470", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "7b34ffefab7d", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Mikhail Osipovitch (ZS)", + "_key": "e21649c58e7b0" + } + ] + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "02884c22d195", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Jack Tierney (University College Cork)", + "_key": "1f18a294d9a20", + "_type": "span" + } + ] + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "86b2bce07178", + "_type": "span", + "marks": [], + "text": "Edward Wallace (University of Edinburgh)\n\n" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "8f03c90bd810" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "1da880ad30a0", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "e59fb1d47363" + }, + { + "_type": "block", + "style": "normal", + "_key": "736ce4dde440", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "\n\n", + "_key": "1c8d35ffcae9" + } + ] + } + ], + "_updatedAt": "2024-07-15T14:51:14Z", + "title": "nf-core/riboseq: A collaboration between Altos Labs and Seqera", + "author": { + "_ref": "109f0c7b-3d40-42a9-af77-3844f0e031c0", + "_type": "reference" + }, + "meta": { + "shareImage": { + "_type": "image", + "asset": { + "_type": "reference", + "_ref": "image-10399aee1fa48e4250f2e7ab3c7fb76ca3aa1ac4-1200x628-png" + } + }, + "description": "nf-core/riboseq: A collaboration between Altos Labs and Seqera", + "noIndex": false, + "slug": { + "current": "nf-core-riboseq", + "_type": "slug" + }, + "_type": "meta" + }, + "_id": "drafts.0d583937-1d7f-4c31-9e79-d8f1e5f2a2da", + "publishedAt": "2024-05-15T13:59:00.000Z", + "_createdAt": "2024-05-13T11:54:28Z", + "_type": "blogPost" + }, + { + "body": [ + { + "style": "normal", + "_key": "b2c90a845577", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "em" + ], + "text": "This is a joint blog post by Chris Wright of Oxford Nanopore Technologies and 
Paolo Di Tommaso of Seqera. ", + "_key": "dce4c210802c0" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "b6d54206379d", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "aa2ec0b99c58" + }, + { + "_key": "e3f059b3c10d", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Introduction", + "_key": "790a05fdbdd2" + } + ], + "_type": "block", + "style": "h2" + }, + { + "_key": "76489b73dd57", + "markDefs": [ + { + "href": "https://nf-co.re/", + "_key": "2ed12be893e0", + "_type": "link" + } + ], + "children": [ + { + "text": "Besides the well-known ", + "_key": "d26917496f3f0", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "2ed12be893e0" + ], + "text": "nf-core", + "_key": "34b66355584b" + }, + { + "_key": "5ad48037f99f", + "_type": "span", + "marks": [], + "text": ", there are several collections of high-quality Nextflow pipelines and modules, including:" + } + ], + "_type": "block", + "style": "normal" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "8f0e7727b92d", + "listItem": "bullet", + "markDefs": [ + { + "href": "https://www.iarc.who.int/", + "_key": "86ec76a9bc7e", + "_type": "link" + } + ], + "children": [ + { + "text": "Nextflow pipelines from the International Agency for Research on Cancer (", + "_key": "ec376d926cdb0", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "86ec76a9bc7e" + ], + "text": "IARC", + "_key": "ec376d926cdb1" + }, + { + "marks": [], + "text": ")", + "_key": "ec376d926cdb2", + "_type": "span" + } + ] + }, + { + "listItem": "bullet", + "markDefs": [ + { + "_key": "56f0498ec45b", + "_type": "link", + "href": "https://github.com/UMCUGenetics/NextflowModules" + }, + { + "_type": "link", + "href": "https://github.com/UMCUGenetics/", + "_key": "631125797903" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Various pipelines and ", + "_key": "51e2182400bb0" + }, + { + "_type": "span", + "marks": [ + "56f0498ec45b" + ], + "text": "Nextflow modules", + "_key": "51e2182400bb1" + }, + { + "_key": "51e2182400bb2", + "_type": "span", + "marks": [], + "text": " maintained by " + }, + { + "marks": [ + "631125797903" + ], + "text": "UMCU Genetics", + "_key": "51e2182400bb3", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "630bd0a71a60" + }, + { + "style": "normal", + "_key": "d178804486d4", + "listItem": "bullet", + "markDefs": [ + { + "_key": "cd9ee1a357db", + "_type": "link", + "href": "https://github.com/qbic-pipelines" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "cd9ee1a357db" + ], + "text": "QBiC pipelines", + "_key": "92c0024f2cc10" + }, + { + "_type": "span", + "marks": [], + "text": " maintained at the University of Tübingen", + "_key": "92c0024f2cc11" + } + ], + "level": 1, + "_type": "block" + }, + { + "markDefs": [ + { + "_key": "d9624fd0014f", + "_type": "link", + "href": "https://labs.epi2me.io/wfindex/" + }, + { + "_type": "link", + "href": "https://labs.epi2me.io/wfindex/", + "_key": "6808d696dec7" + } + ], + "children": [ + { + "_key": "22a465de00660", + "_type": "span", + "marks": [], + "text": "We thought it was high time that we gave some attention to " + }, + { + "_type": "span", + "marks": [ + "6808d696dec7" + ], + "text": "EPI2ME", + "_key": "22a465de00661" + }, + { + "text": "™", + "_key": "9a7e6b39ba16", + "_type": "span", + "marks": [ + "d9624fd0014f" + ] + }, 
+ { + "marks": [ + "6808d696dec7" + ], + "text": " Workflows", + "_key": "dfdcc3d45166", + "_type": "span" + }, + { + "marks": [], + "text": " - another set of professionally maintained pipelines developed by the EPI2ME team at Oxford Nanopore Technologies (ONT).", + "_key": "22a465de00662", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "86cecd91f447" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In this article, we discuss the workflows, compare them to similar pipelines from nf-core, and explain how users can easily get started using software from EPI2ME™ or the Seqera Platform.", + "_key": "b490da57e51b0" + } + ], + "_type": "block", + "style": "normal", + "_key": "27751fed9302" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "About Oxford Nanopore", + "_key": "3aeea92ca74e0" + } + ], + "_type": "block", + "style": "h2", + "_key": "017862dd41da", + "markDefs": [] + }, + { + "style": "normal", + "_key": "b0c3d54fcfbb", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Oxford Nanopore Technologies was founded in 2005 as a spin-off from the University of Oxford in the UK. The company has developed, commercialized, and continues to innovate on a new generation of sensing technology that uses nanopores - nano-scale holes - embedded in high-tech electronics to perform comprehensive analyses of single molecules.", + "_key": "57b1e1ef5de70" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Oxford Nanopore’s first products sequence DNA and RNA. The technology offers scalability from portable to ultra-high throughput formats that are appropriate for broad use. This combines with real-time data delivery for rapid insights and dynamic workflows, and PCR-free sequencing of any length of fragment for the ability to accurately characterize biological variation.", + "_key": "1a04c455f8d80", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "eb44b80b01f0" + }, + { + "_type": "block", + "style": "normal", + "_key": "22be926d3f99", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Today, Oxford Nanopore offers a range of products, from laboratory preparation and automation solutions to sequencers to software tools for analysis, in addition to industry-leading sequencing products, which range from the portable MinION™ sequencer to benchtop GridION™ devices with integrated compute to their high-throughput PromethION™ series with up to 48 independently addressable flow cells.", + "_key": "d17e2d990fb90" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "EPI2ME", + "_key": "636a4e1482120" + } + ], + "_type": "block", + "style": "h2", + "_key": "a930887c907e" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "The EPI2ME team within Oxford Nanopore is composed of a dozen people with expertise in diverse fields, from genetics to computational biology and focus on a variety of workflows from microbiology to clinical research applications. 
EPI2ME provides Oxford Nanopore’s open-source bioinformatics platform and develops and maintains pipelines tailored to nanopore sequencing data.", + "_key": "95ad2fd668020" + } + ], + "_type": "block", + "style": "normal", + "_key": "e93d0e1a78e4", + "markDefs": [] + }, + { + "style": "normal", + "_key": "fad46be12fdd", + "markDefs": [], + "children": [ + { + "text": "In addition to maintaining Nextflow pipelines, the EPI2ME team also supports:", + "_key": "eab61eeb5cd00", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "8afaafe65af7", + "listItem": "bullet", + "markDefs": [ + { + "href": "https://registry.opendata.aws/ont-open-data/", + "_key": "c6146d464d70", + "_type": "link" + } + ], + "children": [ + { + "text": "Oxford Nanopore’s Open Data provided through the ", + "_key": "40fe29c2695f0", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "c6146d464d70" + ], + "text": "registry of open data on AWS", + "_key": "40fe29c2695f1" + }, + { + "marks": [], + "text": ".", + "_key": "40fe29c2695f2", + "_type": "span" + } + ], + "level": 1 + }, + { + "_type": "block", + "style": "normal", + "_key": "1ae98d791b54", + "listItem": "bullet", + "markDefs": [ + { + "_key": "6983f59d3163", + "_type": "link", + "href": "https://labs.epi2me.io/downloads/" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The ", + "_key": "9fa05d8343900" + }, + { + "_key": "c573fa95e973", + "_type": "span", + "marks": [ + "6983f59d3163" + ], + "text": "EPI2ME desktop application" + }, + { + "_type": "span", + "marks": [], + "text": " for Windows, Mac and Linux for running workflows locally or in the cloud.", + "_key": "40fd725ad2bd" + } + ], + "level": 1 + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Various tutorials and training materials for bioinformaticians.", + "_key": "e4c4325f39120", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "5c603b8592eb" + }, + { + "_type": "block", + "style": "normal", + "_key": "5b4604221dde", + "markDefs": [], + "children": [ + { + "_key": "8876e46e8e30", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "_key": "5e478d3d26a8", + "markDefs": [], + "children": [ + { + "_key": "a3a61ee99a7a0", + "_type": "span", + "marks": [], + "text": "Meet the pipelines" + } + ], + "_type": "block", + "style": "h2" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "The EPI2ME team selected ", + "_key": "476377a62d880" + }, + { + "_key": "476377a62d881", + "_type": "span", + "marks": [ + "181888f825c8" + ], + "text": "Nextflow " + }, + { + "_key": "476377a62d882", + "_type": "span", + "marks": [], + "text": "as their preferred framework for workflows in March of 2021. The pipelines are written using both community provided and bespoke (open-source) tools and maintained by a dedicated team of professional bioinformaticians and software developers. They are extensively documented, and generate high-quality interactive reports." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "687f06efdeb6", + "markDefs": [ + { + "_type": "link", + "href": "https://seqera.io/nextflow/", + "_key": "181888f825c8" + } + ] + }, + { + "children": [ + { + "text": "The pipelines are freely available from ", + "_key": "5be3d3739c2b0", + "_type": "span", + "marks": [] + }, + { + "text": "GitHub", + "_key": "5be3d3739c2b1", + "_type": "span", + "marks": [ + "6fe1b59ffff9" + ] + }, + { + "text": " and are grouped based on their functionality as follows:", + "_key": "5be3d3739c2b2", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "dff4e1ef2418", + "markDefs": [ + { + "href": "https://github.com/epi2me-labs", + "_key": "6fe1b59ffff9", + "_type": "link" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Pipelines for handling basic tasks such as base calling and alignment: ", + "_key": "ef0f7768b3f60" + }, + { + "_type": "span", + "marks": [ + "fb98094d40d4" + ], + "text": "wf-basecalling", + "_key": "ef0f7768b3f61" + }, + { + "marks": [], + "text": " and ", + "_key": "ef0f7768b3f62", + "_type": "span" + }, + { + "_key": "ef0f7768b3f63", + "_type": "span", + "marks": [ + "d8361b59de86" + ], + "text": "wf-alignment" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "725964a021aa", + "listItem": "bullet", + "markDefs": [ + { + "href": "https://github.com/epi2me-labs/wf-basecalling", + "_key": "fb98094d40d4", + "_type": "link" + }, + { + "_type": "link", + "href": "https://github.com/epi2me-labs/wf-alignment", + "_key": "d8361b59de86" + } + ] + }, + { + "style": "normal", + "_key": "bd0914ddc3e9", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/epi2me-labs/wf-human-variation", + "_key": "b4df4a9149c2" + }, + { + "_key": "afcc3659be32", + "_type": "link", + "href": "https://github.com/epi2me-labs/wf-somatic-variation" + } + ], + "children": [ + { + "marks": [], + "text": "Pipelines for studying human genetics and cancer: ", + "_key": "eadc5a77d5bf0", + "_type": "span" + }, + { + "_key": "eadc5a77d5bf1", + "_type": "span", + "marks": [ + "b4df4a9149c2" + ], + "text": "wf-human-variation" + }, + { + "marks": [], + "text": " and ", + "_key": "eadc5a77d5bf2", + "_type": "span" + }, + { + "marks": [ + "afcc3659be32" + ], + "text": "wf-somatic-variation", + "_key": "eadc5a77d5bf3", + "_type": "span" + } + ], + "level": 1, + "_type": "block" + }, + { + "children": [ + { + "marks": [], + "text": "Pipelines for genome assembly: ", + "_key": "bebf1e3dae070", + "_type": "span" + }, + { + "text": "wf-clone-validation", + "_key": "bebf1e3dae071", + "_type": "span", + "marks": [ + "a5974b48d87e" + ] + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "bebf1e3dae072" + }, + { + "_type": "span", + "marks": [ + "f5cddbb76c9b" + ], + "text": "wf-bacterial-genomes", + "_key": "bebf1e3dae073" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "429c0538478c", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/epi2me-labs/wf-clone-validation", + "_key": "a5974b48d87e" + }, + { + "href": "https://github.com/epi2me-labs/wf-bacterial-genomes", + "_key": "f5cddbb76c9b", + "_type": "link" + } + ] + }, + { + "listItem": "bullet", + "markDefs": [ + { + "href": "https://github.com/epi2me-labs/wf-metagenomics", + "_key": "52b154d3d1e6", + "_type": "link" + }, + { + "_type": "link", + "href": "https://github.com/epi2me-labs/wf-16s", + "_key": 
"d26e8ebf9b62" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Metagenomic analysis pipelines: ", + "_key": "c2ba1980c0c40" + }, + { + "_type": "span", + "marks": [ + "52b154d3d1e6" + ], + "text": "wf-metagenomics", + "_key": "c2ba1980c0c41" + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "c2ba1980c0c42" + }, + { + "text": "wf-16s", + "_key": "c2ba1980c0c43", + "_type": "span", + "marks": [ + "d26e8ebf9b62" + ] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "a84c4604286a" + }, + { + "style": "normal", + "_key": "753781e4e05a", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/epi2me-labs/wf-transcriptomes", + "_key": "eaa9ccd118e1" + }, + { + "_type": "link", + "href": "https://github.com/epi2me-labs/wf-single-cell", + "_key": "c994c09a8a63" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Direct RNA sequencing and cDNA: ", + "_key": "93376e8595190" + }, + { + "_type": "span", + "marks": [ + "eaa9ccd118e1" + ], + "text": "wf-transcriptomes", + "_key": "93376e8595191" + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "93376e8595192" + }, + { + "_type": "span", + "marks": [ + "c994c09a8a63" + ], + "text": "wf-single-cell", + "_key": "93376e8595193" + } + ], + "level": 1, + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "e9489bd531f9", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/epi2me-labs/wf-artic", + "_key": "4742c63b6758" + }, + { + "_type": "link", + "href": "https://github.com/epi2me-labs/wf-mpx", + "_key": "977ee6511324" + }, + { + "_type": "link", + "href": "https://github.com/epi2me-labs/wf-flu", + "_key": "df2667b37c87" + }, + { + "href": "https://github.com/epi2me-labs/wf-tb-amr", + "_key": "34c72b732a23", + "_type": "link" + } + ], + "children": [ + { + "text": "Pipelines for infectious disease including SARS-CoV-2, Monkeypox, Influenza, and tuberculosis: ", + "_key": "47c89491d4dd0", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "4742c63b6758" + ], + "text": "wf-artic", + "_key": "47c89491d4dd1" + }, + { + "_type": "span", + "marks": [], + "text": ", ", + "_key": "47c89491d4dd2" + }, + { + "_type": "span", + "marks": [ + "977ee6511324" + ], + "text": "wf-mpx", + "_key": "47c89491d4dd3" + }, + { + "marks": [], + "text": ", ", + "_key": "47c89491d4dd4", + "_type": "span" + }, + { + "_key": "47c89491d4dd5", + "_type": "span", + "marks": [ + "df2667b37c87" + ], + "text": "wf-flu" + }, + { + "marks": [], + "text": ", and ", + "_key": "47c89491d4dd6", + "_type": "span" + }, + { + "marks": [ + "34c72b732a23" + ], + "text": "wf-tb-amr", + "_key": "47c89491d4dd7", + "_type": "span" + } + ], + "level": 1 + }, + { + "style": "normal", + "_key": "f17d24b126a7", + "listItem": "bullet", + "markDefs": [ + { + "href": "https://github.com/epi2me-labs/wf-amplicon", + "_key": "06030687d4a8", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Targeted sequencing pipelines: ", + "_key": "807c964325570" + }, + { + "_type": "span", + "marks": [ + "06030687d4a8" + ], + "text": "wf-amplicon", + "_key": "807c964325571" + } + ], + "level": 1, + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Other pipelines: ", + "_key": "094f1f7f00750" + }, + { + "marks": [ + "19ef1b8b2717" + ], + "text": "wf-pore-c", + "_key": "094f1f7f00751", + "_type": "span" + }, + 
{ + "marks": [], + "text": ", ", + "_key": "094f1f7f00752", + "_type": "span" + }, + { + "_key": "094f1f7f00753", + "_type": "span", + "marks": [ + "77a1cf496c86" + ], + "text": "wf-aav-qc" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "3d15976a2de7", + "listItem": "bullet", + "markDefs": [ + { + "href": "https://github.com/epi2me-labs/wf-pore-c/", + "_key": "19ef1b8b2717", + "_type": "link" + }, + { + "_type": "link", + "href": "https://github.com/epi2me-labs/wf-aav-qc/", + "_key": "77a1cf496c86" + } + ] + }, + { + "markDefs": [ + { + "href": "https://github.com/epi2me-labs/wf-template", + "_key": "ceb7a9c6f6d3", + "_type": "link" + } + ], + "children": [ + { + "_key": "4169ac3f19a30", + "_type": "span", + "marks": [], + "text": "While the pipelines pre-date some recent nf-core practices, EPI2ME pipelines are DSL2 compliant, modular, and employ their own consistent coding standards. Like nf-core, EPI2ME provides a " + }, + { + "_type": "span", + "marks": [ + "ceb7a9c6f6d3" + ], + "text": "standard template", + "_key": "4169ac3f19a31" + }, + { + "_type": "span", + "marks": [], + "text": " that can be used as the basis for developing new workflows. Community-developed workflows following those standards can be easily integrated in the EPI2ME desktop application, increasing accessibility for other Oxford Nanopore users, through an intuitive graphical interface. The EPI2ME team also publishes the containers for each workflow on Docker Hub and makes them freely available.", + "_key": "4169ac3f19a32" + } + ], + "_type": "block", + "style": "normal", + "_key": "bfaa6f746f6a" + }, + { + "style": "h2", + "_key": "2365c15e1db4", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Running the EPI2ME pipelines", + "_key": "81489acf3c9a0" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "df57dd6eb117", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "For each pipeline, instructions are provided for running:", + "_key": "4a647fd3a5e50" + } + ] + }, + { + "style": "normal", + "_key": "f44b9b736f01", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "From the command line", + "_key": "95054c9cdad40" + } + ], + "level": 1, + "_type": "block" + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "On the Seqera Platform", + "_key": "b3f735b03631", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "03d74a0f0336" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "6b4477b09f0b", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Using the EPI2ME desktop application\n\n", + "_key": "71974dc131350", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "h3", + "_key": "8f3cd2df96b4", + "markDefs": [], + "children": [ + { + "_key": "bf20a4228ebe", + "_type": "span", + "marks": [], + "text": "EPI2ME pipelines on Seqera Platform" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "7ed4ff8abf9c", + "markDefs": [ + { + "_type": "link", + "href": "https://seqera.io/platform/", + "_key": "0e4d8583f03d" + } + ], + "children": [ + { + "_key": "0bb16c677f290", + "_type": "span", + "marks": [], + "text": "Since EPI2ME pipelines include a nextflow_schema.json file, pipelines can be adapted for use with the " + }, + { + "_key": "0bb16c677f291", + "_type": "span", + 
"marks": [ + "0e4d8583f03d" + ], + "text": "Seqera Platform" + }, + { + "_type": "span", + "marks": [], + "text": ", leveraging Seqera’s interactive interface for launching and monitoring pipelines in their preferred HPC or cloud computing environment.", + "_key": "0bb16c677f292" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "5e1b549edadb", + "markDefs": [ + { + "_key": "f454698e5c14", + "_type": "link", + "href": "https://seqera.io/pipelines/" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Seqera users can simply add EPI2ME pipelines to the Seqera Launchpad via ", + "_key": "1bd4c1b0b5de0" + }, + { + "text": "Seqera Pipelines ", + "_key": "1bd4c1b0b5de1", + "_type": "span", + "marks": [ + "f454698e5c14" + ] + }, + { + "marks": [], + "text": "or by pointing to the EPI2ME GitHub repo and selecting a pipeline version.", + "_key": "1bd4c1b0b5de2", + "_type": "span" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "After adding a tile for the EPI2ME pipeline in Seqera, users can launch pipelines to their preferred compute environments, monitor execution, and share resulting datasets and pipeline results.", + "_key": "fe351b8e99b70" + } + ], + "_type": "block", + "style": "normal", + "_key": "db03c5635b3c" + }, + { + "_key": "d640c4c27d05", + "_type": "youtube", + "id": "KWw0NP-CT_s" + }, + { + "_type": "block", + "style": "h3", + "_key": "6567e1952e72", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "\nEPI2ME pipelines using EPI2ME Desktop Application", + "_key": "00ee018999920", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "13b683fb7ce0", + "markDefs": [ + { + "href": "https://labs.epi2me.io/installation/", + "_key": "2d734f49d113", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "Users can learn more about installing the EPI2ME desktop application ", + "_key": "d558435f20ad0", + "_type": "span" + }, + { + "_key": "d558435f20ad1", + "_type": "span", + "marks": [ + "2d734f49d113" + ], + "text": "here" + }, + { + "marks": [], + "text": ". This desktop tool uses Nextflow and Docker to run bioinformatics workflows and provides an intuitive, easy-to-use interface. With the EPI2ME desktop application, users can launch EPI2ME workflows and other Nextflow pipelines from their choice of desktop environment including Windows (via Windows Subsystem for Linux, WSL), MacOS, or Linux.", + "_key": "d558435f20ad2", + "_type": "span" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "e0ac6e1638e20" + } + ], + "_type": "block", + "style": "normal", + "_key": "eb0368719104", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Conclusion", + "_key": "ce58c255aba50" + } + ], + "_type": "block", + "style": "h2", + "_key": "9f2035517ace" + }, + { + "_key": "9f6b0fa9669b", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "EPI2ME provides a comprehensive and valuable collection of Nextflow pipelines developed and made available by Oxford Nanopore, catering to a wide range of bioinformatics use cases. 
These pipelines can be deployed on both Seqera Platform and the EPI2ME Desktop application.", + "_key": "8f1e4c5afc0a0" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "To learn more about EPI2ME, visit ", + "_key": "523fa3aa51180" + }, + { + "text": "https://nanoporetech.com/products/analyse/epi2me/", + "_key": "ed445f69116f", + "_type": "span", + "marks": [ + "a8eb33f3f46a" + ] + }, + { + "text": " or ", + "_key": "5c5bdcbe8cad", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "54cfdf09c8ff" + ], + "text": "sign-up", + "_key": "2505b16fc3c8", + "_type": "span" + }, + { + "marks": [], + "text": " for a free Seqera Cloud account now.", + "_key": "bf417933d86e", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "2fd1ef4c904d", + "markDefs": [ + { + "_key": "a8eb33f3f46a", + "_type": "link", + "href": "https://nanoporetech.com/products/analyse/epi2me/" + }, + { + "_key": "54cfdf09c8ff", + "_type": "link", + "href": "https://cloud.seqera.io/login?utm_source=hs_email&utm_campaign=Webinar%20What%27s%20New%20November%202024&utm_medium=email&utm_content=2&utm_term=fusionprod&utk=4fc237408a000b621d88cfef06fe09e0" + } + ] + } + ], + "meta": { + "description": "We thought it was high time that we gave some attention to EPI2ME Workflows - a set of professionally maintained Nextflow pipelines developed by the EPI2ME team at Oxford Nanopore Technologies (ONT).\n", + "noIndex": false, + "slug": { + "current": "epi2me-nextflow-pipelines", + "_type": "slug" + }, + "_type": "meta" + }, + "_updatedAt": "2024-10-15T13:01:13Z", + "_rev": "572bf426-c44a-4f55-99f7-35d9576b00b4", + "title": "Nextflow pipelines from EPI2ME ", + "publishedAt": "2024-10-18T14:33:00.000Z", + "_createdAt": "2024-10-14T09:15:05Z", + "_id": "drafts.208c3a9d-0253-486c-bc4e-6f233ef7080f", + "tags": [ + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "aebece5c41a0" + } + ], + "author": { + "_ref": "paolo-di-tommaso", + "_type": "reference" + }, + "_type": "blogPost" + }, + { + "publishedAt": "2016-06-10T06:00:00.000Z", + "author": { + "_type": "reference", + "_ref": "evan-floden" + }, + "_updatedAt": "2024-10-16T14:28:48Z", + "meta": { + "slug": { + "current": "docker-for-dunces-nextflow-for-nunces" + } + }, + "_id": "drafts.561ca06ac707", + "body": [ + { + "children": [ + { + "_key": "aa74c907fb89", + "_type": "span", + "marks": [ + "em" + ], + "text": "Below is a step-by-step guide for creating [Docker](http://www.docker.io) images for use with [Nextflow](http://www.nextflow.io) pipelines. This post was inspired by recent experiences and written with the hope that it may encourage others to join in the virtualization revolution." + } + ], + "_type": "block", + "style": "normal", + "_key": "5de644223001", + "markDefs": [] + }, + { + "style": "normal", + "_key": "fba2c75d251d", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "1e58c8a15fb2", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Modern science is built on collaboration. Recently I became involved with one such venture between several groups across Europe. The aim was to annotate long non-coding RNA (lncRNA) in farm animals and I agreed to help with the annotation based on RNA-Seq data. 
The basic procedure relies on mapping short read data from many different tissues to a genome, generating transcripts and then determining if they are likely to be lncRNA or protein coding genes.", + "_key": "5ad57d04cb9d", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "50833a8d465d" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "f171be4200cf" + } + ], + "_type": "block", + "style": "normal", + "_key": "df4dbb73e883" + }, + { + "markDefs": [], + "children": [ + { + "_key": "85b35fa626c4", + "_type": "span", + "marks": [], + "text": "During several successful 'hackathon' meetings the best approach was decided and implemented in a joint effort. I undertook the task of wrapping the procedure up into a Nextflow pipeline with a view to replicating the results across our different institutions and to allow the easy execution of the pipeline by researchers anywhere." + } + ], + "_type": "block", + "style": "normal", + "_key": "84ce0feaea47" + }, + { + "_type": "block", + "style": "normal", + "_key": "ca94bc941408", + "markDefs": [], + "children": [ + { + "_key": "d043f09e00b4", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "974f1a1cdfa3", + "markDefs": [ + { + "_type": "link", + "href": "http://www.github.com/cbcrg/lncrna-annotation-nf", + "_key": "99165958e6b5" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Creating the Nextflow pipeline (", + "_key": "155a8a08d8cd" + }, + { + "_type": "span", + "marks": [ + "99165958e6b5" + ], + "text": "here", + "_key": "357c4685588b" + }, + { + "text": ") in itself was not a difficult task. My collaborators had documented their work well and were on hand if anything was not clear. However installing and keeping aligned all the pipeline dependencies across different the data centers was still a challenging task.", + "_key": "f3317867e3c0", + "_type": "span", + "marks": [] + } + ] + }, + { + "children": [ + { + "_key": "2a5c98bf3a96", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "ab6f59d351cb", + "markDefs": [] + }, + { + "markDefs": [ + { + "_key": "905a8bc500ad", + "_type": "link", + "href": "https://www.docker.com/" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The pipeline is typical of many in bioinformatics, consisting of binary executions, BASH scripting, R, Perl, BioPerl and some custom Perl modules. We found the BioPerl modules in particular where very sensitive to the various versions in the ", + "_key": "8390ee0ee4e6" + }, + { + "_key": "c58b7dc20cce", + "_type": "span", + "marks": [ + "em" + ], + "text": "long" + }, + { + "marks": [], + "text": " dependency tree. 
The solution was to turn to ", + "_key": "e384258a5c3f", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "905a8bc500ad" + ], + "text": "Docker", + "_key": "48755f8b6d14" + }, + { + "text": " containers.", + "_key": "236e84a2092d", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "004440881a96" + }, + { + "children": [ + { + "text": "", + "_key": "f4792876a9aa", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "55e482405e7c", + "markDefs": [] + }, + { + "_key": "df983b305d4f", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "I have taken this opportunity to document the process of developing the Docker side of a Nextflow + Docker pipeline in a step-by-step manner.", + "_key": "8fe4f707201e" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "649e13290a13", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "64ccdad0c58d" + }, + { + "style": "normal", + "_key": "9daaf61343a0", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "###Docker Installation", + "_key": "f8e4f2418ada", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "22f03df3d9b5", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "7dbae6fbfa16" + }, + { + "children": [ + { + "_key": "3af57ef1c497", + "_type": "span", + "marks": [], + "text": "By far the most challenging issue is the installation of Docker. For local installations, the " + }, + { + "text": "process is relatively straight forward", + "_key": "29497e07ff62", + "_type": "span", + "marks": [ + "b39b383b61e5" + ] + }, + { + "_key": "bac8833f273e", + "_type": "span", + "marks": [], + "text": ". However difficulties arise as computing moves to a cluster. Owing to security concerns, many HPC administrators have been reluctant to install Docker system-wide. This is changing and Docker developers have been responding to many of these concerns with " + }, + { + "text": "updates addressing these issues", + "_key": "f4e68c0049e2", + "_type": "span", + "marks": [ + "1664943865ae" + ] + }, + { + "text": ".", + "_key": "ebdcec8ebe01", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "a438411f6220", + "markDefs": [ + { + "_type": "link", + "href": "https://docs.docker.com/engine/installation", + "_key": "b39b383b61e5" + }, + { + "_type": "link", + "href": "https://blog.docker.com/2016/02/docker-engine-1-10-security/", + "_key": "1664943865ae" + } + ] + }, + { + "style": "normal", + "_key": "369a356018e0", + "markDefs": [], + "children": [ + { + "_key": "6ea9c938cc17", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "9c82fe0136e7", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "That being the case, local installations are usually perfectly fine for development. 
One of the golden rules in Nextflow development is to have a small test dataset that can run the full pipeline in minutes with few computational resources, ie can run on a laptop.", + "_key": "f06c6b5ed104", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "11b1347afcab", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "9f5f313834ae", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "3640fc87e1c5", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "If you have Docker and Nextflow installed and you wish to view the working pipeline, you can perform the following commands to obtain everything you need and run the full lncrna annotation pipeline on a test dataset.", + "_key": "0b77a23d5bf7" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "b0ad6ffae120" + } + ], + "_type": "block", + "style": "normal", + "_key": "9edd5abef435" + }, + { + "code": "docker pull cbcrg/lncrna_annotation\nnextflow run cbcrg/lncrna-annotation-nf -profile test", + "_type": "code", + "_key": "e04747c2e377" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "[If the following does not work, there could be a problem with your Docker installation.]", + "_key": "fb8752c7e000" + } + ], + "_type": "block", + "style": "normal", + "_key": "0fc16192bebe" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "0af154258b87" + } + ], + "_type": "block", + "style": "normal", + "_key": "e0523eff522a", + "markDefs": [] + }, + { + "style": "normal", + "_key": "773b9de99fad", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The first command will download the required Docker image in your computer, while the second will launch Nextflow which automatically download the pipeline repository and run it using the test data included with it.", + "_key": "36689d3a632c", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "6ba84ebe36e1", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "0973e8d341fc" + } + ] + }, + { + "style": "normal", + "_key": "1f30f62bc089", + "markDefs": [], + "children": [ + { + "text": "###The Dockerfile", + "_key": "3a33f2cb54af", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "50364dafcb96" + } + ], + "_type": "block", + "style": "normal", + "_key": "4cb45a2ade99" + }, + { + "markDefs": [], + "children": [ + { + "text": "The ", + "_key": "5f45a4596c7c", + "_type": "span", + "marks": [] + }, + { + "text": "Dockerfile", + "_key": "6e92add363fc", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " contains all the instructions required by Docker to build the Docker image. 
It provides a transparent and consistent way to specify the base operating system and installation of all software, libraries and modules.", + "_key": "908e792d54df" + } + ], + "_type": "block", + "style": "normal", + "_key": "3f1d99c7b705" + }, + { + "_type": "block", + "style": "normal", + "_key": "eb6597312e37", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "eedc860980f3", + "_type": "span", + "marks": [] + } + ] + }, + { + "style": "normal", + "_key": "d4223ee66e84", + "markDefs": [], + "children": [ + { + "text": "We begin by creating a file ", + "_key": "b0b033a77a83", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "code" + ], + "text": "Dockerfile", + "_key": "69aa3263d8b0", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " in the Nextflow project directory. The Dockerfile begins with:", + "_key": "ea7e45e2295a" + } + ], + "_type": "block" + }, + { + "_key": "b6aef4e4bff6", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "dd2d05610c5b" + } + ], + "_type": "block", + "style": "normal" + }, + { + "code": "# Set the base image to debian jessie\nFROM debian:jessie\n\n# File Author / Maintainer\nMAINTAINER Evan Floden ", + "_type": "code", + "_key": "c95932bd73bd" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "This sets the base distribution for our Docker image to be Debian v8.4, a lightweight Linux distribution that is ideally suited for the task. We must also specify the maintainer of the Docker image.", + "_key": "dbd6ec0da776" + } + ], + "_type": "block", + "style": "normal", + "_key": "dd72b4cb8f73" + }, + { + "_key": "e3e22b6493fa", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "8ce23ba404a7", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "24d492f7dd06", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Next we update the repository sources and install some essential tools such as ", + "_key": "883b5be27cf1" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "wget", + "_key": "de93d23dcc24" + }, + { + "marks": [], + "text": " and ", + "_key": "a75f0d48042f", + "_type": "span" + }, + { + "_key": "b8f1f6977f76", + "_type": "span", + "marks": [ + "code" + ], + "text": "perl" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "3d2e30dbd5be" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "74087f39767c", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "0b388a47cc16", + "markDefs": [] + }, + { + "_key": "0781f0913220", + "code": "RUN apt-get update && apt-get install --yes --no-install-recommends \\\n wget \\\n locales \\\n vim-tiny \\\n git \\\n cmake \\\n build-essential \\\n gcc-multilib \\\n perl \\\n python ...", + "_type": "code" + }, + { + "children": [ + { + "_key": "82c7900bb435", + "_type": "span", + "marks": [], + "text": "Notice that we use the command " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "RUN", + "_key": "7029c2127e5e" + }, + { + "_type": "span", + "marks": [], + "text": " before each line. 
The ",
            "_key": "cc075c2808b5"
          },
          {
            "text": "RUN",
            "_key": "5372b2fbc07e",
            "_type": "span",
            "marks": [
              "code"
            ]
          },
          {
            "text": " instruction executes commands as if they are performed from the Linux shell.",
            "_key": "54c24028d590",
            "_type": "span",
            "marks": []
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "3ca70fafd6b8",
        "markDefs": []
      },
      {
        "markDefs": [],
        "children": [
          {
            "_type": "span",
            "marks": [],
            "text": "",
            "_key": "2f4253d9b870"
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "24e6cf4eeaad"
      },
      {
        "_type": "block",
        "style": "normal",
        "_key": "ac0cc4e414e7",
        "markDefs": [
          {
            "_type": "link",
            "href": "https://blog.replicated.com/2016/02/05/refactoring-a-dockerfile-for-image-size/",
            "_key": "3b99f1c6e0d0"
          },
          {
            "href": "https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/",
            "_key": "ee681c47a630",
            "_type": "link"
          }
        ],
        "children": [
          {
            "_type": "span",
            "marks": [],
            "text": "It is also good practice to group as many commands as possible in the same ",
            "_key": "a715e201a410"
          },
          {
            "text": "RUN",
            "_key": "4c0542b30503",
            "_type": "span",
            "marks": [
              "code"
            ]
          },
          {
            "_type": "span",
            "marks": [],
            "text": " statement. This reduces the size of the final Docker image. See ",
            "_key": "cd0129fc2cb4"
          },
          {
            "_key": "95753b3703a7",
            "_type": "span",
            "marks": [
              "3b99f1c6e0d0"
            ],
            "text": "here"
          },
          {
            "_type": "span",
            "marks": [],
            "text": " for these details and ",
            "_key": "b3d6166d7b40"
          },
          {
            "marks": [
              "ee681c47a630"
            ],
            "text": "here",
            "_key": "ea9f63a37e2f",
            "_type": "span"
          },
          {
            "marks": [],
            "text": " for more best practices.",
            "_key": "fec090986d03",
            "_type": "span"
          }
        ]
      },
      {
        "children": [
          {
            "_type": "span",
            "marks": [],
            "text": "",
            "_key": "9f35046732fb"
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "24659e48c3e7",
        "markDefs": []
      },
      {
        "_type": "block",
        "style": "normal",
        "_key": "57e5a413a943",
        "markDefs": [
          {
            "_key": "d68e3d739fed",
            "_type": "link",
            "href": "http://search.cpan.org/~miyagawa/Menlo-1.9003/script/cpanm-menlo"
          }
        ],
        "children": [
          {
            "_key": "ab9ae2c48fd3",
            "_type": "span",
            "marks": [],
            "text": "Next we can specify the installation of the required Perl modules using "
          },
          {
            "marks": [
              "d68e3d739fed"
            ],
            "text": "cpan minus",
            "_key": "376a38ae89cc",
            "_type": "span"
          },
          {
            "_key": "b82c42d7d1f5",
            "_type": "span",
            "marks": [],
            "text": ":"
          }
        ]
      },
      {
        "_type": "block",
        "style": "normal",
        "_key": "a23d9bbf5ef9",
        "markDefs": [],
        "children": [
          {
            "_key": "0b5c9131deb9",
            "_type": "span",
            "marks": [],
            "text": ""
          }
        ]
      },
      {
        "code": "# Install perl modules\nRUN cpanm --force CPAN::Meta \\\n YAML \\\n Digest::SHA \\\n Module::Build \\\n Data::Stag \\\n Config::Simple \\\n Statistics::Lite ...",
        "_type": "code",
        "_key": "e7530c3f6dba"
      },
      {
        "markDefs": [],
        "children": [
          {
            "marks": [],
            "text": "We can give the instructions to download and install software from GitHub using:",
            "_key": "c3ff2167e3c1",
            "_type": "span"
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "83711b5bfb64"
      },
      {
        "markDefs": [],
        "children": [
          {
            "_type": "span",
            "marks": [],
            "text": "",
            "_key": "6891af5db4de"
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "00fd8f533a9a"
      },
      {
        "_type": "code",
        "_key": "ac765553f6ad",
        "code": "# Install 
Star Mapper\nRUN wget -qO- https://github.com/alexdobin/STAR/archive/2.5.2a.tar.gz | tar -xz \\\n && cd STAR-2.5.2a \\\n && make STAR" + }, + { + "_key": "5387c5d1aae0", + "markDefs": [], + "children": [ + { + "_key": "21f01a7dee08", + "_type": "span", + "marks": [], + "text": "We can add custom Perl modules and specify environmental variables such as " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "PERL5LIB", + "_key": "3c35ccd9597e" + }, + { + "_key": "7edd690d58bf", + "_type": "span", + "marks": [], + "text": " as below:" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_key": "88da8fa38161", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "95b43e15b080" + }, + { + "_type": "code", + "_key": "02cae409f036", + "code": "# Install FEELnc\nRUN wget -q https://github.com/tderrien/FEELnc/archive/a6146996e06f8a206a0ae6fd59f8ca635c7d9467.zip \\\n && unzip a6146996e06f8a206a0ae6fd59f8ca635c7d9467.zip \\\n && mv FEELnc-a6146996e06f8a206a0ae6fd59f8ca635c7d9467 /FEELnc \\\n && rm a6146996e06f8a206a0ae6fd59f8ca635c7d9467.zip\n\nENV FEELNCPATH /FEELnc\nENV PERL5LIB $PERL5LIB:${FEELNCPATH}/lib/" + }, + { + "_key": "3db7c8965a0b", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "R and R libraries can be installed as follows:", + "_key": "fab1d01a8d76" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "7e8b16febe0b", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "369cb978dbc9" + }, + { + "_type": "code", + "_key": "b635cd93fe02", + "code": "# Install R\nRUN echo \"deb http://cran.rstudio.com/bin/linux/debian jessie-cran3/\" >> /etc/apt/sources.list &&\\\napt-key adv --keyserver keys.gnupg.net --recv-key 381BA480 &&\\\napt-get update --fix-missing && \\\napt-get -y install r-base\n\n# Install R libraries\nRUN R -e 'install.packages(\"ROCR\", repos=\"http://cloud.r-project.org/\"); install.packages(\"randomForest\",repos=\"http://cloud.r-project.org/\")'" + }, + { + "children": [ + { + "text": "For the complete working Dockerfile of this project see ", + "_key": "31f01f88d7d4", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "95d80901751f" + ], + "text": "here", + "_key": "61cd37841c10" + } + ], + "_type": "block", + "style": "normal", + "_key": "f897e630ac44", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/cbcrg/lncRNA-Annotation-nf/blob/master/Dockerfile", + "_key": "95d80901751f" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "34a2ed31ef9a", + "markDefs": [], + "children": [ + { + "_key": "ec0c46f9c3c6", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "###Building the Docker Image", + "_key": "99404c3f6b68" + } + ], + "_type": "block", + "style": "normal", + "_key": "1abf7c16ad8c", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "1d5e4d812566" + } + ], + "_type": "block", + "style": "normal", + "_key": "b636fd70f4f5", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "5650048a4760", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Once we start working on the Dockerfile, we can build it anytime using:", + "_key": "5264f09e8e11", + "_type": "span" + } + ] + }, + { + "style": "normal", 
+ "_key": "d641e7f6bf5b",
        "markDefs": [],
        "children": [
          {
            "_key": "70fcc4126623",
            "_type": "span",
            "marks": [],
            "text": ""
          }
        ],
        "_type": "block"
      },
      {
        "_type": "code",
        "_key": "e90f06c1b843",
        "code": "docker build -t skptic/lncRNA_annotation ."
      },
      {
        "markDefs": [],
        "children": [
          {
            "_type": "span",
            "marks": [],
            "text": "This builds the image from the Dockerfile and assigns a tag (i.e. a name) for the image. If there are no errors, the Docker image is now in your local Docker repository ready for use.",
            "_key": "fe3388bbb799"
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "8ccc8a028371"
      },
      {
        "_type": "block",
        "style": "normal",
        "_key": "7738ce1608b0",
        "markDefs": [],
        "children": [
          {
            "marks": [],
            "text": "",
            "_key": "23129a90f294",
            "_type": "span"
          }
        ]
      },
      {
        "_key": "53e684ed0883",
        "markDefs": [],
        "children": [
          {
            "marks": [],
            "text": "###Testing the Docker Image",
            "_key": "ac9aefee0790",
            "_type": "span"
          }
        ],
        "_type": "block",
        "style": "normal"
      },
      {
        "style": "normal",
        "_key": "f459dd9c7e8f",
        "markDefs": [],
        "children": [
          {
            "text": "",
            "_key": "29310a754336",
            "_type": "span",
            "marks": []
          }
        ],
        "_type": "block"
      },
      {
        "_key": "995c7f634de1",
        "markDefs": [],
        "children": [
          {
            "text": "We find it very helpful to test our images as we develop the Dockerfile. Once built, it is possible to launch the Docker image and test if the desired software was correctly installed. For example, we can test if FEELnc and its dependencies were successfully installed by running the following:",
            "_key": "0f7532136e6a",
            "_type": "span",
            "marks": []
          }
        ],
        "_type": "block",
        "style": "normal"
      },
      {
        "style": "normal",
        "_key": "0f3a99e8f0f9",
        "markDefs": [],
        "children": [
          {
            "marks": [],
            "text": "",
            "_key": "76902143bcad",
            "_type": "span"
          }
        ],
        "_type": "block"
      },
      {
        "code": "docker run -ti lncrna_annotation\n\ncd FEELnc/test\n\nFEELnc_filter.pl -i transcript_chr38.gtf -a annotation_chr38.gtf \\\n> -b transcript_biotype=protein_coding > candidate_lncRNA.gtf\n\nexit # remember to exit the Docker image",
        "_type": "code",
        "_key": "8bc163f9f47c"
      },
      {
        "style": "normal",
        "_key": "8a04e5fe54c3",
        "markDefs": [],
        "children": [
          {
            "marks": [],
            "text": "###Tagging the Docker Image",
            "_key": "3c27e7d47f5a",
            "_type": "span"
          }
        ],
        "_type": "block"
      },
      {
        "children": [
          {
            "_key": "b58a8fff4134",
            "_type": "span",
            "marks": [],
            "text": ""
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "376d9185809e",
        "markDefs": []
      },
      {
        "style": "normal",
        "_key": "1a1035fe3e9e",
        "markDefs": [
          {
            "_type": "link",
            "href": "https://hub.docker.com/",
            "_key": "e8267b213edb"
          }
        ],
        "children": [
          {
            "_type": "span",
            "marks": [],
            "text": "Once you are confident your image is built correctly, you can tag it, allowing you to push it to ",
            "_key": "0e81f997274e"
          },
          {
            "_type": "span",
            "marks": [
              "e8267b213edb"
            ],
            "text": "Dockerhub.io",
            "_key": "9f99511c671e"
          },
          {
            "_type": "span",
            "marks": [],
            "text": ". 
Dockerhub is an online repository for docker images which allows anyone to pull public images and run them.", + "_key": "62279d8d8677" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "629916622f88" + } + ], + "_type": "block", + "style": "normal", + "_key": "7ad2329cd8e6" + }, + { + "style": "normal", + "_key": "ab2403070ee6", + "markDefs": [], + "children": [ + { + "_key": "83a7985ea39e", + "_type": "span", + "marks": [], + "text": "You can view the images in your local repository with the " + }, + { + "text": "docker images", + "_key": "9b9c237f8f87", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " command and tag using ", + "_key": "6aaa7f1f9459" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "docker tag", + "_key": "56edcf9c0231" + }, + { + "_type": "span", + "marks": [], + "text": " with the image ID and the name.", + "_key": "fccfd00ea0ef" + } + ], + "_type": "block" + }, + { + "_key": "2883293716da", + "markDefs": [], + "children": [ + { + "_key": "4796d2e24cad", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "code", + "_key": "cb58c9b6a966", + "code": "docker images\n\nREPOSITORY TAG IMAGE ID CREATED SIZE\nlncrna_annotation latest d8ec49cbe3ed 2 minutes ago 821.5 MB\n\ndocker tag d8ec49cbe3ed cbcrg/lncrna_annotation:latest" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Now when we check our local images we can see the updated tag.", + "_key": "efecf9499efc" + } + ], + "_type": "block", + "style": "normal", + "_key": "977cb77dafd8" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "de27f8c8d34d", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "9e069ba58981" + }, + { + "code": "docker images\n\nREPOSITORY TAG IMAGE ID CREATED SIZE\ncbcrg/lncrna_annotation latest d8ec49cbe3ed 2 minutes ago 821.5 MB", + "_type": "code", + "_key": "859c42e5cad8" + }, + { + "markDefs": [], + "children": [ + { + "_key": "adbb0489873f", + "_type": "span", + "marks": [], + "text": "###Pushing the Docker Image to Dockerhub" + } + ], + "_type": "block", + "style": "normal", + "_key": "36110c0bc0bc" + }, + { + "style": "normal", + "_key": "72eb6aa2d1ff", + "markDefs": [], + "children": [ + { + "_key": "1818ebcbf996", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "If you have not previously, sign up for a Dockerhub account ", + "_key": "d3c68be9bab9", + "_type": "span", + "marks": [] + }, + { + "text": "here", + "_key": "73adbe5a767b", + "_type": "span", + "marks": [ + "1cf86a9aeb72" + ] + }, + { + "_key": "fdb56fb68fc0", + "_type": "span", + "marks": [], + "text": ". From the command line, login to Dockerhub and push your image." 
+ }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "a7bd5e43df27",
        "markDefs": [
          {
            "_key": "1cf86a9aeb72",
            "_type": "link",
            "href": "https://hub.docker.com/"
          }
        ]
      },
      {
        "markDefs": [],
        "children": [
          {
            "_type": "span",
            "marks": [],
            "text": "",
            "_key": "74b517dfa3a2"
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "76d11410797f"
      },
      {
        "code": "docker login --username=cbcrg\ndocker push cbcrg/lncrna_annotation",
        "_type": "code",
        "_key": "72e018a1b3a7"
      },
      {
        "_key": "4e814562758e",
        "markDefs": [],
        "children": [
          {
            "text": "You can test if your image has been correctly pushed and is publicly available by removing your local version using the IMAGE ID of the image and pulling the remote:",
            "_key": "603c47308e12",
            "_type": "span",
            "marks": []
          }
        ],
        "_type": "block",
        "style": "normal"
      },
      {
        "children": [
          {
            "_type": "span",
            "marks": [],
            "text": "",
            "_key": "25b68c28836e"
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "12f3da40fcc2",
        "markDefs": []
      },
      {
        "_type": "code",
        "_key": "9c8cc03d66d4",
        "code": "docker rmi -f d8ec49cbe3ed\n\n# Ensure the local version is not listed.\ndocker images\n\ndocker pull cbcrg/lncrna_annotation"
      },
      {
        "children": [
          {
            "marks": [],
            "text": "We are now almost ready to run our pipeline. The last step is to set up the Nextflow config.",
            "_key": "fb18e8ebb6fb",
            "_type": "span"
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "67b0d083f1e1",
        "markDefs": []
      },
      {
        "_key": "851718e1c203",
        "markDefs": [],
        "children": [
          {
            "text": "",
            "_key": "7e1f6285672c",
            "_type": "span",
            "marks": []
          }
        ],
        "_type": "block",
        "style": "normal"
      },
      {
        "children": [
          {
            "text": "###Nextflow Configuration",
            "_key": "ee703ba1a7b8",
            "_type": "span",
            "marks": []
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "89a8e9b57253",
        "markDefs": []
      },
      {
        "markDefs": [],
        "children": [
          {
            "text": "",
            "_key": "3c12e4f84be6",
            "_type": "span",
            "marks": []
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "301b53373abc"
      },
      {
        "children": [
          {
            "text": "Within the ",
            "_key": "e450d4c03687",
            "_type": "span",
            "marks": []
          },
          {
            "_type": "span",
            "marks": [
              "code"
            ],
            "text": "nextflow.config",
            "_key": "eb562dcd976e"
          },
          {
            "_type": "span",
            "marks": [],
            "text": " file in the main project directory we can add the following line which links the Docker image to the Nextflow execution. 
The images can be:",
            "_key": "0aada97916c3"
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "853618d141bc",
        "markDefs": []
      },
      {
        "_type": "block",
        "style": "normal",
        "_key": "bcefea639daa",
        "markDefs": [],
        "children": [
          {
            "_key": "56eecb336e47",
            "_type": "span",
            "marks": [],
            "text": ""
          }
        ]
      },
      {
        "listItem": "bullet",
        "markDefs": [],
        "children": [
          {
            "_type": "span",
            "marks": [],
            "text": "General (same docker image for all processes):\n\n process {\n container = 'cbcrg/lncrna_annotation'\n }\nSpecific to a profile (specified by `-profile crg` for example):\n\n profile {\n crg {\n container = 'cbcrg/lncrna_annotation'\n }\n }\nSpecific to a given process within a pipeline:\n\n $processName.container = 'cbcrg/lncrna_annotation'",
            "_key": "dc79564414fe"
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "b30a67dabb0c"
      },
      {
        "_type": "block",
        "style": "normal",
        "_key": "a560b7a67c40",
        "markDefs": [],
        "children": [
          {
            "text": "",
            "_key": "14aebd1b7468",
            "_type": "span",
            "marks": []
          }
        ]
      },
      {
        "_key": "4033d3bebdf9",
        "markDefs": [
          {
            "href": "https://www.nextflow.io/blog/2016/best-practice-for-reproducibility.html",
            "_key": "f61aacdb2ef0",
            "_type": "link"
          }
        ],
        "children": [
          {
            "text": "In most cases it is easiest to use the same Docker image for all processes. One further thing to consider is the inclusion of the sha256 hash of the image in the container reference. I have ",
            "_key": "1a64527f3033",
            "_type": "span",
            "marks": []
          },
          {
            "text": "previously written about this",
            "_key": "38cf9657683c",
            "_type": "span",
            "marks": [
              "f61aacdb2ef0"
            ]
          },
          {
            "_type": "span",
            "marks": [],
            "text": ", but briefly, including a hash ensures that not a single byte of the operating system or software is different.",
            "_key": "bc4e97553513"
          }
        ],
        "_type": "block",
        "style": "normal"
      },
      {
        "markDefs": [],
        "children": [
          {
            "_type": "span",
            "marks": [],
            "text": "",
            "_key": "441548f75de3"
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "0eaf14f96c05"
      },
      {
        "code": " process {\n container = 'cbcrg/lncrna_annotation@sha256:9dfe233b...'\n }",
        "_type": "code",
        "_key": "e986f84b6af5"
      },
      {
        "children": [
          {
            "marks": [],
            "text": "All that is left now is to run the pipeline.",
            "_key": "132c729c8d25",
            "_type": "span"
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "39e6843958d4",
        "markDefs": []
      },
      {
        "_type": "block",
        "style": "normal",
        "_key": "a4a85a9e7228",
        "markDefs": [],
        "children": [
          {
            "_type": "span",
            "marks": [],
            "text": "",
            "_key": "3eba503d4fca"
          }
        ]
      },
      {
        "_key": "a90f3eeed817",
        "code": "nextflow run lncRNA-Annotation-nf -profile test",
        "_type": "code"
      },
      {
        "_key": "e51c1eda68c5",
        "markDefs": [],
        "children": [
          {
            "_key": "6bc1b9275274",
            "_type": "span",
            "marks": [],
            "text": "Whilst I have explained this step-by-step process in a linear, sequential manner, in reality the development process is often more circular, with changes in the Docker images reflecting changes in the pipeline." 
+ } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "f4ab602e7e18", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "7ece5b1d69ec" + } + ] + }, + { + "_key": "3e9640736a34", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "###CircleCI and Nextflow", + "_key": "127548bd2ca6" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "48776ea3a77d", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "8bd12b9a35d5" + }, + { + "style": "normal", + "_key": "006188af7329", + "markDefs": [ + { + "href": "http://www.circleci.com", + "_key": "52d7d21fec88", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Now that you have a pipeline that successfully runs on a test dataset with Docker, a very useful step is to add a continuous development component to the pipeline. With this, whenever you push a modification of the pipeline to the GitHub repo, the test data set is run on the ", + "_key": "e1a0115c8a63" + }, + { + "_type": "span", + "marks": [ + "52d7d21fec88" + ], + "text": "CircleCI", + "_key": "bf9e5650e51a" + }, + { + "text": " servers (using Docker).", + "_key": "a7690c9f35e1", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_key": "bbb43942df4f", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "565cf1047e08" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "41364a5f63d3", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "To include CircleCI in the Nexflow pipeline, create a file named ", + "_key": "f8cea4ca2097", + "_type": "span" + }, + { + "marks": [ + "code" + ], + "text": "circle.yml", + "_key": "fa98c01db045", + "_type": "span" + }, + { + "_key": "2f21b332f3b0", + "_type": "span", + "marks": [], + "text": " in the project directory. We add the following instructions to the file:" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "25942a6c677a" + } + ], + "_type": "block", + "style": "normal", + "_key": "e2b0b14d0fd2" + }, + { + "code": "machine:\n java:\n version: oraclejdk8\n services:\n - docker\n\ndependencies:\n override:\n\ntest:\n override:\n - docker pull cbcrg/lncrna_annotation\n - curl -fsSL get.nextflow.io | bash\n - ./nextflow run . 
-profile test",
        "_type": "code",
        "_key": "7433acb412d2"
      },
      {
        "children": [
          {
            "marks": [],
            "text": "Next you can sign up to CircleCI, linking your GitHub account.",
            "_key": "433129d9fd5e",
            "_type": "span"
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "70d6d1859e1d",
        "markDefs": []
      },
      {
        "markDefs": [],
        "children": [
          {
            "_key": "3a0243c5639e",
            "_type": "span",
            "marks": [],
            "text": ""
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "2f2296b7bb34"
      },
      {
        "_key": "261b716a06a7",
        "markDefs": [],
        "children": [
          {
            "marks": [],
            "text": "Within the GitHub README.md you can add a badge with the following:",
            "_key": "0be2d4a60379",
            "_type": "span"
          }
        ],
        "_type": "block",
        "style": "normal"
      },
      {
        "style": "normal",
        "_key": "c227bf5b9089",
        "markDefs": [],
        "children": [
          {
            "text": "",
            "_key": "64db719ff8e1",
            "_type": "span",
            "marks": []
          }
        ],
        "_type": "block"
      },
      {
        "code": "![CircleCI status](https://circleci.com/gh/cbcrg/lncRNA-Annotation-nf.png?style=shield)",
        "_type": "code",
        "_key": "a375b3bed0e9"
      },
      {
        "children": [
          {
            "_type": "span",
            "marks": [],
            "text": "###Tips and Tricks",
            "_key": "46f101c27e69"
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "1642b961bc5a",
        "markDefs": []
      },
      {
        "_type": "block",
        "style": "normal",
        "_key": "993ba2832874",
        "markDefs": [],
        "children": [
          {
            "marks": [],
            "text": "",
            "_key": "dd9168b63937",
            "_type": "span"
          }
        ]
      },
      {
        "markDefs": [],
        "children": [
          {
            "text": "File permissions",
            "_key": "a2f6b726c62d",
            "_type": "span",
            "marks": [
              "strong"
            ]
          },
          {
            "_type": "span",
            "marks": [],
            "text": ": When a process is executed by a Docker container, the UNIX user running the process is not you. Therefore any files that are used as an input should have the appropriate file permissions. For example, I had to change the permissions of all the input data in the test data set with:",
            "_key": "287310bd8df1"
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "d33d746e9473"
      },
      {
        "_type": "block",
        "style": "normal",
        "_key": "0fd72cb1c652",
        "markDefs": [],
        "children": [
          {
            "text": "",
            "_key": "f294eccb09f6",
            "_type": "span",
            "marks": []
          }
        ]
      },
      {
        "style": "normal",
        "_key": "d03452f5b41c",
        "markDefs": [],
        "children": [
          {
            "text": "find -type f -exec chmod 644 {} \\;\nfind -type d -exec chmod 755 {} \\;",
            "_key": "d7e384eae7c7",
            "_type": "span",
            "marks": []
          }
        ],
        "_type": "block"
      },
      {
        "_type": "block",
        "style": "normal",
        "_key": "5c097f5ad5b2",
        "markDefs": [],
        "children": [
          {
            "marks": [],
            "text": "",
            "_key": "8dc4ce35290d",
            "_type": "span"
          }
        ]
      },
      {
        "markDefs": [
          {
            "_key": "a645ea709cb2",
            "_type": "link",
            "href": "mailto:/evanfloden@gmail.com"
          }
        ],
        "children": [
          {
            "_type": "span",
            "marks": [],
            "text": "###Summary\n\nThis was my first time building a Docker image and after a bit of trial-and-error the process was surprisingly straightforward. There is a wealth of information available for Docker and the almost seamless integration with Nextflow is fantastic. Our collaboration team is now looking forward to applying the pipeline to different datasets and publishing the work, knowing our results will be completely reproducible across any platform. 
", + "_key": "f21d4187558c" + }, + { + "_key": "38e9df5b660d", + "_type": "span", + "marks": [ + "a645ea709cb2" + ], + "text": "/evanfloden@gmail.com" + } + ], + "_type": "block", + "style": "normal", + "_key": "16153769e1e6" + } + ], + "title": "Docker for dunces & Nextflow for nunces", + "tags": [ + { + "_ref": "ace8dd2c-eed3-4785-8911-d146a4e84bbb", + "_type": "reference", + "_key": "5edc3ed408ba" + }, + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "c2a74b2b2cad" + } + ], + "_type": "blogPost", + "_createdAt": "2024-09-25T14:15:05Z", + "_rev": "1caaa5f4-55b0-4c8f-a7fa-f6e12936c553" + }, + { + "_rev": "46bcb683-925f-4a71-8f3e-b9210031d1b8", + "meta": { + "description": "asdasd", + "noIndex": true, + "slug": { + "_type": "slug", + "current": "singularity-reloaded-2" + }, + "_type": "meta" + }, + "_id": "drafts.9fb5989a-4718-4430-b79d-414c0046c359", + "title": "Test article", + "publishedAt": "2024-04-18T15:35:00.000Z", + "tags": [ + { + "_key": "1bbbc4317abb", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference" + }, + { + "_ref": "1b55a117-18fe-40cf-8873-6efd157a6058", + "_type": "reference", + "_key": "342b6f7ba8ea" + }, + { + "_key": "535a050b7e1c", + "_ref": "d356a4d5-06c1-40c2-b655-4cb21cf74df1", + "_type": "reference" + } + ], + "author": { + "_type": "reference", + "_ref": "bfa556d4-8ea3-419d-99f9-3716804c5f2a" + }, + "body": [ + { + "_key": "f7338c80f8da", + "markDefs": [], + "children": [ + { + "_key": "b13a6f8930ec0", + "_type": "span", + "marks": [], + "text": "Containers are essential components in reproducible scientific workflows. They enable applications to be easily packaged and distributed along with dependencies, making them portable across operating systems, runtimes, and clouds." + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://sylabs.io/singularity/", + "_key": "15042503705b" + }, + { + "_key": "1bbf0802e6b5", + "_type": "link", + "href": "https://apptainer.org/news/community-announcement-20211130/" + } + ], + "children": [ + { + "_key": "96d378e2b2470", + "_type": "span", + "marks": [], + "text": "While Docker is the most popular container runtime and file format, " + }, + { + "_type": "span", + "marks": [ + "15042503705b" + ], + "text": "Singularity", + "_key": "96d378e2b2471" + }, + { + "_type": "span", + "marks": [], + "text": " (and now ", + "_key": "96d378e2b2472" + }, + { + "text": "Apptainer", + "_key": "96d378e2b2473", + "_type": "span", + "marks": [ + "1bbf0802e6b5" + ] + }, + { + "_type": "span", + "marks": [], + "text": ") have emerged as preferred solutions in HPC settings. For HPC users, Singularity provides several advantages:", + "_key": "96d378e2b2474" + } + ], + "_type": "block", + "style": "normal", + "_key": "bedaa0d3d2b6" + }, + { + "_type": "block", + "style": "normal", + "_key": "d62d8d3e6e43", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "3ed7116a3a920", + "_type": "span", + "marks": [], + "text": "Singularity runs under a Linux user's UID, avoiding security concerns and simplifying file system access in multi-user environments." 
+ } + ], + "level": 1 + }, + { + "_type": "block", + "style": "normal", + "_key": "736be0294d42", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Singularity Image Format (SIF) containers are stored as individual files, making them portable across cluster nodes, easy to manage, and fast to load.", + "_key": "2ffc515f96d3" + } + ], + "level": 1 + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Containers work seamlessly with workload managers such as Slurm or Spectrum LSF, running under the workload manager’s control rather than as a child of the Docker daemon.", + "_key": "d08b4222f9ea" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "4f986efa8cb6", + "listItem": "bullet", + "markDefs": [] + }, + { + "children": [ + { + "marks": [], + "text": "This article explains how ", + "_key": "30b2d63f14200", + "_type": "span" + }, + { + "_key": "30b2d63f14201", + "_type": "span", + "marks": [ + "42bd86b81ff3" + ], + "text": "Nextflow" + }, + { + "marks": [], + "text": " and ", + "_key": "30b2d63f14202", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "cf8ddb481912" + ], + "text": "Wave", + "_key": "30b2d63f14203" + }, + { + "_key": "30b2d63f14204", + "_type": "span", + "marks": [], + "text": " are evolving to meet the needs of HPC users, supporting new capabilities in both Singularity and Apptainer. Read on to learn more!" + } + ], + "_type": "block", + "style": "normal", + "_key": "738092543938", + "markDefs": [ + { + "_key": "42bd86b81ff3", + "_type": "link", + "href": "https://seqera.io/nextflow/" + }, + { + "_type": "link", + "href": "https://seqera.io/wave/", + "_key": "cf8ddb481912" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_key": "97dd14bd260c0", + "_type": "span", + "marks": [], + "text": "Singularity vs. Apptainer" + } + ], + "_type": "block", + "style": "h2", + "_key": "6cad29f83c4b" + }, + { + "_type": "block", + "style": "normal", + "_key": "029194e5aa78", + "markDefs": [ + { + "href": "https://sylabs.io/", + "_key": "a7ac85241dde", + "_type": "link" + }, + { + "href": "https://hpcng.org/", + "_key": "3c204b3bdb74", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "There is often confusion between Singularity and Apptainer, so it is worth providing a brief explanation. When ", + "_key": "d622afbe8b960" + }, + { + "marks": [ + "a7ac85241dde" + ], + "text": "Sylabs", + "_key": "d622afbe8b961", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " forked the Singularity project from the ", + "_key": "d622afbe8b962" + }, + { + "_type": "span", + "marks": [ + "3c204b3bdb74" + ], + "text": "HPCng", + "_key": "d622afbe8b963" + }, + { + "marks": [], + "text": " repository in May of 2021, they chose not to rename their fork. 
As a result, the name “Singularity” described both the original open-source project and Sylabs’ new version underpinning their commercial offerings.", + "_key": "d622afbe8b964", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "f4ae7180a502", + "markDefs": [ + { + "href": "https://apptainer.org/", + "_key": "7be27073a836", + "_type": "link" + }, + { + "_key": "416a88e85588", + "_type": "link", + "href": "https://sylabs.io/singularity/" + }, + { + "_key": "943aa1a4c6f9", + "_type": "link", + "href": "https://sylabs.io/singularity-pro/" + }, + { + "_type": "link", + "href": "https://apptainer.org/", + "_key": "b624ce929ab4" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "To avoid confusion, members of the original Singularity project moved their project to the Linux Foundation in November 2021, and renamed it “", + "_key": "b0e0b949c8790" + }, + { + "marks": [ + "7be27073a836" + ], + "text": "Apptainer", + "_key": "b0e0b949c8791", + "_type": "span" + }, + { + "text": ".” As a result of these moves, Singularity has diverged. ", + "_key": "b0e0b949c8792", + "_type": "span", + "marks": [] + }, + { + "_key": "b0e0b949c8793", + "_type": "span", + "marks": [ + "416a88e85588" + ], + "text": "SingularityCE" + }, + { + "_key": "b0e0b949c8794", + "_type": "span", + "marks": [], + "text": " and " + }, + { + "_type": "span", + "marks": [ + "943aa1a4c6f9" + ], + "text": "SingularityPro", + "_key": "b0e0b949c8795" + }, + { + "marks": [], + "text": " are maintained by Sylabs, and open-source Apptainer is available from ", + "_key": "b0e0b949c8796", + "_type": "span" + }, + { + "marks": [ + "b624ce929ab4" + ], + "text": "apptainer.org", + "_key": "b0e0b949c8797", + "_type": "span" + }, + { + "text": " with available commercial support.", + "_key": "b0e0b949c8798", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://nextflow.io/docs/latest/container.html#singularity", + "_key": "d6e909d02df2" + }, + { + "_key": "c5c4cb0b927e", + "_type": "link", + "href": "https://nextflow.io/docs/latest/container.html#apptainer" + } + ], + "children": [ + { + "_key": "ee43b516076e0", + "_type": "span", + "marks": [], + "text": "Nextflow and Seqera fully support both Singularity dialects, treating " + }, + { + "marks": [ + "d6e909d02df2" + ], + "text": "Singularity", + "_key": "ee43b516076e1", + "_type": "span" + }, + { + "_key": "ee43b516076e2", + "_type": "span", + "marks": [], + "text": " and " + }, + { + "_type": "span", + "marks": [ + "c5c4cb0b927e" + ], + "text": "Apptainer", + "_key": "ee43b516076e3" + }, + { + "text": " as distinct offerings reflecting their unique and evolving features.", + "_key": "ee43b516076e4", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "39bbf08809ac" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow support for Singularity and Apptainer", + "_key": "b2ed20d4e95a0" + } + ], + "_type": "block", + "style": "h2", + "_key": "42f286c076f2" + }, + { + "_type": "block", + "style": "normal", + "_key": "87e3b726fcd5", + "markDefs": [ + { + "_key": "5d72a0a1037f", + "_type": "link", + "href": "https://hub.docker.com/" + }, + { + "_type": "link", + "href": "https://quay.io/", + "_key": "63433f065473" + }, + { + "_type": "link", + "href": "https://aws.amazon.com/ecr/", + "_key": "28c6cc646431" + } + ], + "children": [ + { + "marks": [], + "text": "Nextflow can pull containers in different 
formats from multiple sources, including Singularity Hub, Singularity Library, or Docker/OCI-compatible registries such as ",
            "_key": "5f094f8e09d00",
            "_type": "span"
          },
          {
            "text": "Docker Hub",
            "_key": "5f094f8e09d01",
            "_type": "span",
            "marks": [
              "5d72a0a1037f"
            ]
          },
          {
            "_type": "span",
            "marks": [],
            "text": ", ",
            "_key": "5f094f8e09d02"
          },
          {
            "text": "Quay.io",
            "_key": "5f094f8e09d03",
            "_type": "span",
            "marks": [
              "63433f065473"
            ]
          },
          {
            "text": ", or ",
            "_key": "5f094f8e09d04",
            "_type": "span",
            "marks": []
          },
          {
            "_type": "span",
            "marks": [
              "28c6cc646431"
            ],
            "text": "Amazon ECR",
            "_key": "5f094f8e09d05"
          },
          {
            "_key": "5f094f8e09d06",
            "_type": "span",
            "marks": [],
            "text": ". In HPC environments, Nextflow users can also point to existing SIF format images that reside on a shared file system.\n\n"
          }
        ]
      },
      {
        "asset": {
          "_ref": "image-eff63aca1ce03113f328233754388476d21122c3-736x414-jpg",
          "_type": "reference"
        },
        "_type": "image",
        "_key": "38a09d9b4da5"
      },
      {
        "_key": "4f043506d28f",
        "markDefs": [],
        "children": [
          {
            "_type": "span",
            "marks": [],
            "text": "\nFor Nextflow users in HPC environments, a common usage pattern has been to have Nextflow download and convert OCI/Docker images to SIF format on the fly. For this to work, scratch storage needs to be available on the cluster node running the Nextflow head job to facilitate downloading the container’s OCI blob layers and assembling the SIF file. The resulting SIF file is then stored on a shared file system accessible to other cluster nodes. While this works, there are problems with this approach:",
            "_key": "318361117b6a0"
          }
        ],
        "_type": "block",
        "style": "normal"
      },
      {
        "style": "normal",
        "_key": "2b01846439c1",
        "listItem": "bullet",
        "markDefs": [],
        "children": [
          {
            "_key": "3d2579c339120",
            "_type": "span",
            "marks": [],
            "text": "Having the Nextflow head node responsible for downloading and converting multiple images presents a bottleneck that affects performance." 
+ } + ], + "level": 1, + "_type": "block" + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "In production environments, pointing ", + "_key": "5147a083367f", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "SINGULARITY_TMPDIR", + "_key": "07ba1ab2625c" + }, + { + "_key": "0c8d0debd1ac", + "_type": "span", + "marks": [], + "text": " to fast local storage is a standard practice for speeding the generation of SIF format images, but this adds configuration complexity in clustered environments.\n" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "4c518f71df22" + }, + { + "style": "h2", + "_key": "650e0c02a21f", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "A better approach using Nextflow ociAutoPull", + "_key": "753e95324b29", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "0e650e126dc5", + "_type": "span", + "marks": [], + "text": "As of version 23.12.0-edge, Nextflow provides a new `ociAutoPull` option for both Singularity and Apptainer that delegates the conversion of OCI-compliant images to Singularity format to the container runtime itself\n\nThis approach has several advantages over the previous approach:\n\n" + } + ], + "_type": "block", + "style": "normal", + "_key": "decd5c6bbfd9" + }, + { + "_key": "e13521fd6454", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "a4d90da06d1a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "e379ee76dbfc", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "The pull and conversion phase of generating SIF files from OCI images is managed by the container runtime instead of by Nextflow.", + "_key": "c7c4fe5e6cfa", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "0dade09a96bd", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The pull and conversion happen on compute nodes instead of the node running the head job, thus freeing up the head node and enabling conversions to execute in parallel.", + "_key": "7209e7e6c588", + "_type": "span" + } + ] + }, + { + "_key": "a48400c30d94", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Images are cached on the compute nodes with the OCI layers intact. Assuming images are cached on a shared file system, when two containers share the same base images, only one copy needs to be retained. 
This avoids the need for unnecessary downloads and processing.", + "_key": "b63279f2b87a" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "The example below illustrates how this works in practice:\n", + "_key": "7a338e68755c" + } + ], + "_type": "block", + "style": "normal", + "_key": "736f3fda8e11", + "markDefs": [] + }, + { + "children": [ + { + "marks": [ + "code" + ], + "text": "\nsingularity.enabled = true\nsingularity.ociAutoPull = true\nprocess.container = 'ubuntu:latest'", + "_key": "31d648e84ab9", + "_type": "span" + }, + { + "text": "\n", + "_key": "ac0e10d2f780", + "_type": "span", + "marks": [] + }, + { + "text": "\n$ nextflow run hello -c \n", + "_key": "54962958b47e", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": "\n\nIf you are using Apptainer, replace the scope singularity with `apptainer` in the Nextflow config example above.", + "_key": "9aa862af40f4" + } + ], + "_type": "block", + "style": "normal", + "_key": "58d2e52b7129", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Running OCI format containers", + "_key": "4b1c5a4da958" + } + ], + "_type": "block", + "style": "h2", + "_key": "acc549d83bfe" + }, + { + "_key": "e72f6254629d", + "markDefs": [], + "children": [ + { + "text": "Apptainer now supports multiple image formats including Singularity SIF files, SquashFS files, and Docker/OCI containers hosted on an OCI registry. As of SingularityCE 4.0, Sylabs introduced a new SIF image format that directly encapsulates OCI containers. They also introduced a new OCI mode enabled by the `--oci` command line switch or by adding the `oci mode` directive to the `singularity.conf` file.\n\nWhen OCI mode is enabled, Singularity uses a new low-level runtime to achieve OCI compatibility. This is a major step forward, allowing Singularity to execute OCI-compliant container images directly, solving previous compatibility issues. 
For Singularity users, this new runtime and direct support for OCI container images make it much more efficient to run OCI containers.\n\nIn Nextflow, this functionality can be enabled as follows:", + "_key": "940cd721e4ca", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "550188acdced", + "markDefs": [], + "children": [ + { + "marks": [ + "code" + ], + "text": "```\nsingularity.enabled = true\nsingularity.ociMode = true\nprocess.container = 'ubuntu:latest'\n```", + "_key": "fe85379057ef", + "_type": "span" + }, + { + "text": "\n", + "_key": "7aaf83c6cfeb", + "_type": "span", + "marks": [] + } + ] + }, + { + "style": "normal", + "_key": "56647f3fa67b", + "markDefs": [], + "children": [ + { + "text": "```\nnextflow run hello -c \n```", + "_key": "ebf86ed2be24", + "_type": "span", + "marks": [ + "code" + ] + } + ], + "_type": "block" + }, + { + "children": [ + { + "marks": [], + "text": "Wave support for Singularity", + "_key": "ae510f1077cf", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "2438b250b0d3", + "markDefs": [] + }, + { + "children": [ + { + "text": "In addition to the feature above, Nextflow provides better support for Singularity and Wave containers.\n\nWave is a container provisioning service that, among other things, allows for the on-demand assembly of containers based on the dependencies of the jobs in your data analysis workflows.\n\nNextflow, along with Wave, allows you to build Singularity native images by using the Conda packages declared in your Nextflow configuration file. Singularity container images are stored in an OCI-compliant registry and pulled on demand by your pipeline.\n\nTo enable this capability, you will need to add the following settings to your nextflow.config. In our example, these settings were stored in `wave-singularity.config`.", + "_key": "59bc036c04fe", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "e49356276920", + "markDefs": [] + }, + { + "style": "normal", + "_key": "7e0b3498eba6", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "code" + ], + "text": "```", + "_key": "3e01b8196434" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "4adeda237ffd", + "markDefs": [], + "children": [ + { + "text": "singularity.enabled = true\nsingularity.autoMounts = true\nsingularity.ociAutoPull = true\n\nwave.enabled = true\nwave.freeze = true\nwave.build.repository = 'docker.io//wavebuild'\nwave.build.cacheRepository = 'docker.io//wave-cache'\n\ntower.accessToken = ''\ntower.workspaceId = ''\n\nwave.strategy = ['conda']\nconda.channels = 'seqera,conda-forge,bioconda,defaults'\n```", + "_key": "a45394330d4f", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": "\n\nYou can test this configuration using the command below. In this example. Nextflow invokes Wave to build Singularity containers on the fly and freezes them to a repository using credentials stored in the Seqera Platform.", + "_key": "f0f6f97e052e" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "263d37f8fa4d0", + "_type": "span", + "marks": [], + "text": "Nextflow requires that the `accessToken` and `workspaceId` for the Seqera workspace containing the registry credentials be supplied in the `nextflow.config` file (above) so that the containers can be persisted in the user’s preferred registry." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "bf3fed1845eb", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "The personal authorization token (`tower.accessToken`) required to access the Seqera API can be generated in the user menu under `Your Tokens` from within the Seqera web interface. See the ", + "_key": "c06562019e950" + }, + { + "_type": "span", + "marks": [ + "cf58d80fe1f7" + ], + "text": "Seqera documentation", + "_key": "c06562019e951" + }, + { + "_type": "span", + "marks": [], + "text": " for instructions on how to create a Docker Hub personal access token (PAT) and store it as a credential in your organization workspace.", + "_key": "c06562019e952" + } + ], + "_type": "block", + "style": "normal", + "_key": "5de92052a601", + "markDefs": [ + { + "href": "https://docs.seqera.io/platform/23.4.0/credentials/docker_hub_registry_credentials", + "_key": "cf58d80fe1f7", + "_type": "link" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "9309640448c3", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "8f9be69efae9" + }, + { + "children": [ + { + "marks": [ + "code" + ], + "text": "```", + "_key": "d615f1088d78", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "f168616d8b0a", + "markDefs": [] + }, + { + "children": [ + { + "text": "$ nextflow run rnaseq-nf -c ./wave-singularity.config\n\nN E X T F L O W ~ version 24.02.0-edge\n\n┃ Launching `https://github.com/nextflow-io/rnaseq-nf` [serene_montalcini] DSL2 - revision: 8253a586cc [master]\n\nR N A S E Q - N F P I P E L I N E\n===================================\ntranscriptome: /home/ubuntu/.nextflow/assets/nextflow-io/rnaseq-nf/data/ggal/ggal_1_48850000_49020000.Ggal71.500bpflank.fa\nreads : /home/ubuntu/.nextflow/assets/nextflow-io/rnaseq-nf/data/ggal/ggal_gut_{1,2}.fq\noutdir : results\n\nexecutor > local (4)\n[1f/af2ca7] RNA…ggal_1_48850000_49020000) | 1 of 1 ✔\n[d0/afbc55] RNA…STQC (FASTQC on ggal_gut) | 1 of 1 ✔\n[b0/f9587a] RNASEQ:QUANT (ggal_gut) | 1 of 1 ✔\n[f0/093b45] MULTIQC | 1 of 1 ✔\n\nDone! 
Open the following report in your browser --> results/multiqc_report.htm\n```", + "_key": "b33be1859e41", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": "\n\nYou can use the `nextflow inspect` command to view the path to the containers built and pushed to the repo by wave as follows:", + "_key": "158aca23997a" + } + ], + "_type": "block", + "style": "normal", + "_key": "882a46d001e4", + "markDefs": [] + }, + { + "children": [ + { + "marks": [ + "code" + ], + "text": "```", + "_key": "ecc97197456b", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "1733fbc04666", + "markDefs": [] + }, + { + "style": "normal", + "_key": "93b2248f6ec6", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "code" + ], + "text": "$ ", + "_key": "537f7c3062a30" + }, + { + "_type": "span", + "marks": [ + "strong", + "code" + ], + "text": "nextflow inspect rnaseq-nf -c ./wave-singularity.config", + "_key": "537f7c3062a31" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "3a58128b29f6", + "markDefs": [], + "children": [ + { + "_key": "bb2c993fd85b0", + "_type": "span", + "marks": [ + "code" + ], + "text": "{" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "text": "\"processes\": [", + "_key": "ecda074c68b50", + "_type": "span", + "marks": [ + "code" + ] + } + ], + "_type": "block", + "style": "normal", + "_key": "eefb7103ac4b" + }, + { + "children": [ + { + "text": "{", + "_key": "3992d97323560", + "_type": "span", + "marks": [ + "code" + ] + } + ], + "_type": "block", + "style": "normal", + "_key": "63aa00b73ce3", + "markDefs": [] + }, + { + "style": "normal", + "_key": "716d0852c154", + "markDefs": [], + "children": [ + { + "_key": "43ccc11369ee0", + "_type": "span", + "marks": [ + "code" + ], + "text": "\"name\": \"RNASEQ:INDEX\"," + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "text": "\"container\": \"docker://docker.io//wavebuild:salmon-1.10.2--fdce05f6d77af751\"", + "_key": "72ba851acb1a0", + "_type": "span", + "marks": [ + "code" + ] + } + ], + "_type": "block", + "style": "normal", + "_key": "f0c73c40eefd" + }, + { + "_type": "block", + "style": "normal", + "_key": "f5f9077d398b", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "code" + ], + "text": "},", + "_key": "153c06ad50660" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "marks": [ + "code" + ], + "text": "{", + "_key": "a09787bd89690", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "5feee5c7500d" + }, + { + "children": [ + { + "_type": "span", + "marks": [ + "code" + ], + "text": "\"name\": \"RNASEQ:QUANT\",", + "_key": "846867e51f060" + } + ], + "_type": "block", + "style": "normal", + "_key": "ebd95ef2f9c7", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "492796aeea20", + "markDefs": [], + "children": [ + { + "_key": "f7101f985ff60", + "_type": "span", + "marks": [ + "code" + ], + "text": "\"container\": \"docker://docker.io//wavebuild:salmon-1.10.2--fdce05f6d77af751\"" + } + ] + }, + { + "_key": "28bcff4b8875", + "markDefs": [], + "children": [ + { + "text": "},", + "_key": "8262013529ff0", + "_type": "span", + "marks": [ + "code" + ] + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [ + "code" + ], + "text": "{", + "_key": "f16a1eeab71f0" + } + ], + "_type": "block", + "style": "normal", + "_key": "18c3cb0bdba5", + 
"markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "code" + ], + "text": "\"name\": \"MULTIQC\",", + "_key": "098094fb8c7f0" + } + ], + "_type": "block", + "style": "normal", + "_key": "71076f51cbb2" + }, + { + "_key": "3df10d4ae98f", + "markDefs": [], + "children": [ + { + "_key": "dafd7cf086500", + "_type": "span", + "marks": [ + "code" + ], + "text": "\"container\": \"docker://docker.io//wavebuild:multiqc-1.17--d85209f21556c472\"" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "aa7e2d2a0ca4", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "code" + ], + "text": "},", + "_key": "9d65718ed7d60" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "marks": [ + "code" + ], + "text": "{", + "_key": "a3fdd44e78bb0", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "57f8a7dc1545", + "markDefs": [] + }, + { + "_key": "320f5b79d359", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "code" + ], + "text": "\"name\": \"RNASEQ:FASTQC\",", + "_key": "ea5db7353b070" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "code" + ], + "text": "\"container\": \"docker://docker.io//wavebuild:fastqc-0.12.1--f44601bdd08701ed\"", + "_key": "bbd68658fe1a0" + } + ], + "_type": "block", + "style": "normal", + "_key": "92f7f61d9dcb" + }, + { + "children": [ + { + "marks": [ + "code" + ], + "text": "}", + "_key": "e7ac774e844c0", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "ec810778d2d7", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "7f5e28a4b94d", + "markDefs": [], + "children": [ + { + "_key": "40b4ab61fd7a0", + "_type": "span", + "marks": [ + "code" + ], + "text": "]" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "code" + ], + "text": "}\n```", + "_key": "e367b0fb78e30" + } + ], + "_type": "block", + "style": "normal", + "_key": "70aa1a5fd670" + }, + { + "_key": "e66b050d19a4", + "markDefs": [], + "children": [ + { + "text": "Singularity containers built by Wave can be stored locally on your HPC cluster or be served from your preferred registry at runtime providing tremendous flexibility.\n", + "_key": "2a0e54e352340", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "38af6aad3526", + "markDefs": [], + "children": [ + { + "text": "Conclusion", + "_key": "093014d7b5800", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2" + }, + { + "markDefs": [], + "children": [ + { + "_key": "7047964024b60", + "_type": "span", + "marks": [], + "text": "Nextflow continues to improve pipeline portability and reproducibility across clusters and cloud computing environments by providing the widest support for container runtimes and cutting-edge functionality for Singularity users.\n\nToday, Nextflow supports Apptainer, Singularity, Charliecloud, Docker, Podman, Sarus, and Shifter with rich support for native Singularity and OCI container formats. 
Nextflow can run both container formats served from multiple sources, including Singularity Hub, Singularity Library, or any Docker/OCI-compliant registry.\n" + } + ], + "_type": "block", + "style": "normal", + "_key": "51ef6c751d30" + }, + { + "children": [ + { + "text": "adssd", + "_key": "d3b16e468697", + "_type": "span", + "marks": [ + "code" + ] + } + ], + "_type": "block", + "style": "normal", + "_key": "20152028f035", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "code" + ], + "text": "", + "_key": "f50b33bf11e8" + } + ], + "_type": "block", + "style": "normal", + "_key": "141255fba432" + }, + { + "_key": "7787d7ae1b7b", + "markDefs": [], + "children": [ + { + "marks": [ + "code" + ], + "text": "", + "_key": "92529617dfbb", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "script", + "_key": "b39fdec04a39" + }, + { + "_key": "e59aa97a0474", + "code": "
Test
", + "_type": "code", + "language": "html" + }, + { + "_key": "664ae0fdfa7f", + "_type": "script" + } + ], + "_createdAt": "2024-04-15T12:03:42Z", + "_type": "blogPost", + "_updatedAt": "2024-10-07T16:14:54Z" + } +] \ No newline at end of file diff --git a/internal/step3/backup.mjs b/internal/step3/backup.mjs new file mode 100644 index 00000000..c40471bd --- /dev/null +++ b/internal/step3/backup.mjs @@ -0,0 +1,24 @@ +import fs from 'fs'; +import path from 'path'; +import sanityClient from '@sanity/client'; + +const outputFile = path.join(process.cwd(), 'backup.json'); + +export const client = sanityClient({ + projectId: 'o2y1bt2g', + dataset: 'seqera', + token: process.env.SANITY_TOKEN, + useCdn: false, +}); + + +async function fetchBlogPosts() { + return await client.fetch(`*[_type == "blogPost"]`); +} + +async function doBackup() { + const posts = await fetchBlogPosts(); + fs.writeFileSync(outputFile, JSON.stringify(posts, null, 2)); +} + +doBackup(); \ No newline at end of file diff --git a/internal/step3/migrateBlogType.mjs b/internal/step3/migrateBlogType.mjs new file mode 100644 index 00000000..310e08b6 --- /dev/null +++ b/internal/step3/migrateBlogType.mjs @@ -0,0 +1,46 @@ +import sanityClient from '@sanity/client'; +import { customAlphabet } from 'nanoid'; + +const nanoid = customAlphabet('0123456789abcdef', 12); + +export const client = sanityClient({ + projectId: 'o2y1bt2g', + dataset: 'seqera', + token: process.env.SANITY_TOKEN, + useCdn: false, +}); + +async function fetchBlogPostsDev() { + return await client.fetch(`*[_type == "blogPostDev"]`); +} + +async function fetchBlogPosts() { + return await client.fetch(`*[_type == "blogPost"]`); +} + +async function migrateBlogType() { + console.log('🟢🟢🟢 Migrating'); + const devPosts = await fetchBlogPostsDev(); + const posts = await fetchBlogPosts(); + + for (const post of devPosts) { + console.log('🔵 >> ', post.meta.slug.current); + const existing = posts.find(p => p.meta.slug.current === post.meta.slug.current); + if (!!existing) { + console.log('🟡 exists >> ', existing.meta.slug.current); + console.log('🟡 skipping >> ', existing.title); + continue; + } + const newPost = { + ...post, + _type: 'blogPost', + _id: nanoid(), + _rev: undefined, + } + const p = await client.create(newPost); + console.log('🟢 created >> ', p.title); + } + console.log('🟢🟢🟢 Done'); +} + +migrateBlogType(); \ No newline at end of file From ce07c92ff0640013d1905a236270d8b425a7e603 Mon Sep 17 00:00:00 2001 From: Jake Broughton Date: Wed, 16 Oct 2024 18:43:48 +0200 Subject: [PATCH 20/21] Backup --- internal/step3/backup.mjs | 2 +- internal/step3/backup2.json | 192074 +++++++++++++++++++++++++ internal/step3/backup3.json | 194073 ++++++++++++++++++++++++++ internal/step3/migrateBlogType.mjs | 4 +- 4 files changed, 386150 insertions(+), 3 deletions(-) create mode 100644 internal/step3/backup2.json create mode 100644 internal/step3/backup3.json diff --git a/internal/step3/backup.mjs b/internal/step3/backup.mjs index c40471bd..0dd65acd 100644 --- a/internal/step3/backup.mjs +++ b/internal/step3/backup.mjs @@ -2,7 +2,7 @@ import fs from 'fs'; import path from 'path'; import sanityClient from '@sanity/client'; -const outputFile = path.join(process.cwd(), 'backup.json'); +const outputFile = path.join(process.cwd(), 'backup3.json'); export const client = sanityClient({ projectId: 'o2y1bt2g', diff --git a/internal/step3/backup2.json b/internal/step3/backup2.json new file mode 100644 index 00000000..1c4ed7fe --- /dev/null +++ b/internal/step3/backup2.json @@ -0,0 
+1,192074 @@ +[ + { + "author": { + "_ref": "paolo-di-tommaso", + "_type": "reference" + }, + "_type": "blogPost", + "title": "Bringing Nextflow to Google Cloud Platform with WuXi NextCODE", + "publishedAt": "2018-12-18T07:00:00.000Z", + "_updatedAt": "2024-10-14T15:01:30Z", + "body": [ + { + "style": "normal", + "_key": "8ee0452c6463", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "e36636ba3a31", + "_type": "span" + } + ], + "_type": "block" + }, + { + "alt": "Google cloud", + "_key": "d789b60a8ddc", + "alignment": "right", + "asset": { + "_type": "image", + "asset": { + "_type": "reference", + "_ref": "image-7fae682c6f3664d3952e47ef54bd0b49343c9a1d-181x28-svg" + } + }, + "size": "medium", + "_type": "picture" + }, + { + "size": "medium", + "_type": "picture", + "alt": "WuXiNetCode", + "_key": "829c5b8f5185", + "alignment": "right", + "asset": { + "_type": "image", + "asset": { + "_ref": "image-446465196e0697bbb77a2d0cf8c4b0b4880df352-900x203-jpg", + "_type": "reference" + } + } + }, + { + "_type": "block", + "style": "normal", + "_key": "1fa958f8b436", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Google Cloud and WuXi NextCODE are dedicated to advancing the state of the art in biomedical informatics, especially through open source, which allows developers to collaborate broadly and deeply.", + "_key": "765cfc1bc853", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "b9ef5ebb3bd0", + "markDefs": [ + { + "_key": "e7ce80cdc9cf", + "_type": "link", + "href": "https://cloud.google.com/genomics/pipelines" + } + ], + "children": [ + { + "_key": "f08cffdfa0e5", + "_type": "span", + "marks": [], + "text": "WuXi NextCODE is itself a user of Nextflow, and Google Cloud has many customers that use Nextflow. Together, we’ve collaborated to deliver Google Cloud Platform (GCP) support for Nextflow using the " + }, + { + "_key": "1864c67ab899", + "_type": "span", + "marks": [ + "e7ce80cdc9cf" + ], + "text": "Google Pipelines API" + }, + { + "marks": [], + "text": ". Pipelines API is a managed computing service that allows the execution of containerized workloads on GCP.", + "_key": "737e796ea0b2", + "_type": "span" + } + ] + }, + { + "children": [ + { + "_key": "d33e09b96126", + "_type": "span", + "marks": [], + "text": "Nextflow now provides built-in support for Google Pipelines API which allows the seamless deployment of a Nextflow pipeline in the cloud, offloading the process executions as pipelines running on Google's scalable infrastructure with a few commands. This makes it even easier for customers and partners like WuXi NextCODE to process biomedical data using Google Cloud." + } + ], + "_type": "block", + "style": "normal", + "_key": "c954fbe7f126", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "2a9365b134e2", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "17404c3264c5" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Get started!", + "_key": "3be229258825" + } + ], + "_type": "block", + "style": "h2", + "_key": "54b6bf5701b6", + "markDefs": [] + }, + { + "_key": "0304286c8b13", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "This feature is currently available in the Nextflow edge channel. 
Follow these steps to get started:", + "_key": "ae023b45b354" + } + ], + "_type": "block", + "style": "normal" + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "4a31e6afb31f0", + "_type": "span", + "marks": [], + "text": "Install Nextflow from the edge channel exporting the variables shown below and then running the usual Nextflow installer Bash snippet:" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "93d2e70d2c81" + }, + { + "style": "normal", + "_key": "c26bd9413310", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "441262cf359f", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_key": "8aa600f9da11", + "code": "export NXF_VER=18.12.0-edge\nexport NXF_MODE=google\ncurl https://get.nextflow.io | bash", + "_type": "code" + }, + { + "_type": "block", + "style": "normal", + "_key": "fe577f8284f9", + "listItem": "bullet", + "markDefs": [ + { + "_key": "5ed0cccd25ef", + "_type": "link", + "href": "https://console.cloud.google.com/flows/enableapi?apiid=genomics.googleapis.com,compute.googleapis.com,storage-api.googleapis.com" + } + ], + "children": [ + { + "text": "Enable the Google Genomics API for your GCP projects", + "_key": "7de5436157670", + "_type": "span", + "marks": [ + "5ed0cccd25ef" + ] + }, + { + "text": ".", + "_key": "7de5436157671", + "_type": "span", + "marks": [] + } + ], + "level": 1 + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "e255607d3caa", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://cloud.google.com/docs/authentication/production#obtaining_and_providing_service_account_credentials_manually", + "_key": "7d2b6a477396" + } + ], + "children": [ + { + "_key": "19613c82c86a0", + "_type": "span", + "marks": [ + "7d2b6a477396" + ], + "text": "Download and set credentials for your Genomics API-enabled project" + }, + { + "marks": [], + "text": ".", + "_key": "19613c82c86a1", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "70edfc6384c4", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://www.nextflow.io/docs/edge/google.html#google-pipelines", + "_key": "d82157ef87c8" + } + ], + "children": [ + { + "_key": "2b0f9161a6ac0", + "_type": "span", + "marks": [], + "text": "Change your " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "nextflow.config", + "_key": "2b0f9161a6ac1" + }, + { + "_type": "span", + "marks": [], + "text": " file to use the Google Pipelines executor and specify the required config values for it as ", + "_key": "2b0f9161a6ac2" + }, + { + "marks": [ + "d82157ef87c8" + ], + "text": "described in the documentation", + "_key": "2b0f9161a6ac3", + "_type": "span" + }, + { + "text": ".", + "_key": "2b0f9161a6ac4", + "_type": "span", + "marks": [] + } + ], + "level": 1 + }, + { + "children": [ + { + "marks": [], + "text": "Finally, run your script with Nextflow like usual, specifying a Google Storage bucket as the pipeline work directory with the ", + "_key": "9aeeac86e2a4", + "_type": "span" + }, + { + "marks": [ + "code" + ], + "text": "-work-dir", + "_key": "edbafa1271c6", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " option. 
For example:", + "_key": "36a0489c07b2" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "073ca244eb05", + "listItem": "bullet", + "markDefs": [] + }, + { + "code": "nextflow run rnaseq-nf -work-dir gs://your-bucket/scratch", + "_type": "code", + "_key": "974838e9637b" + }, + { + "_type": "block", + "style": "normal", + "_key": "dbb2428de0c7", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "6029e550d7b0" + } + ] + }, + { + "_key": "7f24ec4952e3", + "markDefs": [ + { + "_key": "071ad9bd7039", + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/google.html" + } + ], + "children": [ + { + "marks": [], + "text": "You can find more detailed info about available configuration settings and deployment options ", + "_key": "5a75a115afee0", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "071ad9bd7039" + ], + "text": "here", + "_key": "d8063142f797" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "be4aa802f9d6" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "b46978f58895", + "markDefs": [], + "children": [ + { + "text": "We’re thrilled to make this contribution available to the Nextflow community!", + "_key": "631a091be65d", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + } + ], + "_createdAt": "2024-09-25T14:15:24Z", + "_rev": "hf9hwMPb7ybAE3bqEU5qVs", + "tags": [ + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "3ab093d5a797" + }, + { + "_key": "1678d0cbf36c", + "_ref": "7d9ffad4-385c-433d-a409-d0bfacc3c2fe", + "_type": "reference" + } + ], + "meta": { + "description": "Google Cloud and WuXi NextCODE are dedicated to advancing the state of the art in biomedical informatics, especially through open source, which allows developers to collaborate broadly and deeply.", + "slug": { + "current": "bringing-nextflow-to-google-cloud-wuxinextcode" + } + }, + "_id": "00e11677f7df" + }, + { + "_createdAt": "2024-09-25T14:16:36Z", + "_rev": "rsIQ9Jd8Z4nKBVUruy4PPx", + "author": { + "_ref": "paolo-di-tommaso", + "_type": "reference" + }, + "_type": "blogPost", + "_id": "021ed313c8f2", + "meta": { + "slug": { + "current": "evolution-of-nextflow-runtime" + } + }, + "tags": [ + { + "_type": "reference", + "_key": "332cc2616f09", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493" + } + ], + "publishedAt": "2022-03-24T07:00:00.000Z", + "title": "Evolution of the Nextflow runtime", + "_updatedAt": "2024-09-26T09:03:10Z", + "body": [ + { + "_key": "0d8f09611cc7", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Software development is a constantly evolving process that requires continuous adaptation to keep pace with new technologies, user needs, and trends. Likewise, changes are needed in order to introduce new capabilities and guarantee a sustainable development process.", + "_key": "124b26b39eee", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "c47c5295eadf", + "children": [ + { + "_type": "span", + "text": "", + "_key": "06d1d13382fe" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "Nextflow is no exception. 
This post will summarise the major changes in the evolution of the framework over the next 12 to 18 months.", + "_key": "54a0a3187c97", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "d6451561de62", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "683698696f14" + } + ], + "_type": "block", + "style": "normal", + "_key": "4d856fa1a74e" + }, + { + "_key": "37ceefaf32d4", + "children": [ + { + "_key": "7f601fc1b899", + "_type": "span", + "text": "Java baseline version" + } + ], + "_type": "block", + "style": "h3" + }, + { + "_type": "block", + "style": "normal", + "_key": "d4f2aae38a99", + "markDefs": [ + { + "_type": "link", + "href": "https://endoflife.date/java", + "_key": "2a34e0754fd6" + } + ], + "children": [ + { + "marks": [], + "text": "Nextflow runs on top of Java (or, more precisely, the Java virtual machine). So far, Java 8 has been the minimal version required to run Nextflow. However, this version was released 8 years ago and is going to reach its end-of-life status at the end of ", + "_key": "f7e5940ff4dd", + "_type": "span" + }, + { + "text": "this month", + "_key": "ef6ec8a8d896", + "_type": "span", + "marks": [ + "2a34e0754fd6" + ] + }, + { + "_type": "span", + "marks": [], + "text": ". For this reason, as of version 22.01.x-edge and the upcoming stable release 22.04.0, Nextflow will require Java version 11 or later for its execution. This also allows the introduction of new capabilities provided by the modern Java runtime.", + "_key": "16c16a8e18b2" + } + ] + }, + { + "_key": "1460a0b9747c", + "children": [ + { + "_key": "a5f4a9cad7af", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://sdkman.io/", + "_key": "cae313613ad1" + } + ], + "children": [ + { + "text": "Tip: If you are confused about how to install or upgrade Java on your computer, consider using ", + "_key": "39e505b489d6", + "_type": "span", + "marks": [] + }, + { + "_key": "75d5f0d2d9b9", + "_type": "span", + "marks": [ + "cae313613ad1" + ], + "text": "Sdkman" + }, + { + "marks": [], + "text": ". It’s a one-liner install tool that allows easy management of Java versions.", + "_key": "326eca772e92", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "5a4f5f4709bd" + }, + { + "style": "normal", + "_key": "a411ad5f506a", + "children": [ + { + "_type": "span", + "text": "", + "_key": "623b9ff0aaf9" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "DSL2 as default syntax", + "_key": "bf01a1e543ae" + } + ], + "_type": "block", + "style": "h3", + "_key": "5933868e9d09" + }, + { + "style": "normal", + "_key": "eacccaa47a75", + "markDefs": [ + { + "_type": "link", + "href": "https://www.nextflow.io/blog/2020/dsl2-is-here.html", + "_key": "09a00de0d00a" + }, + { + "href": "https://nf-co.re/pipelines", + "_key": "9e4640e14412", + "_type": "link" + } + ], + "children": [ + { + "text": "Nextflow DSL2 has been introduced nearly ", + "_key": "b16d4a7959ea", + "_type": "span", + "marks": [] + }, + { + "text": "2 years ago", + "_key": "7b28aa10b18c", + "_type": "span", + "marks": [ + "09a00de0d00a" + ] + }, + { + "text": " (how time flies!) and definitely represented a major milestone for the project. 
Established pipeline collections such as those in ", + "_key": "991fc6d79dd8", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "9e4640e14412" + ], + "text": "nf-core", + "_key": "62486c4ff169", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " have migrated their pipelines to DSL2 syntax.", + "_key": "af1bad29e8c7" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "19122611d713" + } + ], + "_type": "block", + "style": "normal", + "_key": "c0b41800cb68" + }, + { + "children": [ + { + "marks": [], + "text": "This is a confirmation that the DSL2 syntax represents a natural evolution for the project and is not considered to be just an experimental or alternative syntax.", + "_key": "e4f1710c865d", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "4f186bf2fa5b", + "markDefs": [] + }, + { + "children": [ + { + "_key": "c16ec1f5b844", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "a27ae1dfa5c2" + }, + { + "_type": "block", + "style": "normal", + "_key": "2727246ab16a", + "markDefs": [], + "children": [ + { + "text": "For this reason, as for Nextflow version 22.03.0-edge and the upcoming 22.04.0 stable release, DSL2 syntax is going to be the ", + "_key": "18326dbd3fa4", + "_type": "span", + "marks": [] + }, + { + "_key": "5d2d0aef3920", + "_type": "span", + "marks": [ + "strong" + ], + "text": "default" + }, + { + "marks": [], + "text": " syntax version used by Nextflow, if not otherwise specified.", + "_key": "27330afcf1e6", + "_type": "span" + } + ] + }, + { + "_key": "813ddfd66f82", + "children": [ + { + "_type": "span", + "text": "", + "_key": "a48be79b819c" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "b212ff99872d", + "markDefs": [], + "children": [ + { + "text": "In practical terms, this means it will no longer be necessary to add the declaration ", + "_key": "e8818ed3195e", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "nextflow.enable.dsl = 2", + "_key": "c3bbcbbfea41" + }, + { + "_key": "2c189f6284d4", + "_type": "span", + "marks": [], + "text": " at the top of your script or use the command line option " + }, + { + "marks": [ + "code" + ], + "text": "-dsl2 ", + "_key": "96934518dc20", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " to enable the use of this syntax.", + "_key": "a37ce9db25f1" + } + ] + }, + { + "style": "normal", + "_key": "245bdcbe8b75", + "children": [ + { + "_type": "span", + "text": "", + "_key": "ab6fd943b602" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "If you still want to continue to use DSL1 for your pipeline scripts, you will need to add the declaration ", + "_key": "3a9c095a9daf" + }, + { + "marks": [ + "code" + ], + "text": "nextflow.enable.dsl = 1", + "_key": "0cbc89df0844", + "_type": "span" + }, + { + "text": " at the top of your pipeline script or use the command line option ", + "_key": "0248aee8b501", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "-dsl1", + "_key": "07befd452a8b" + }, + { + "marks": [], + "text": ".", + "_key": "783b04cb3341", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "500b75c5ac0b" + }, + { + "_key": "7009a7822a6b", + "children": [ + { + "_type": "span", + "text": "", + "_key": 
"4ccf0d02ff4d" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "c22d408fc28f", + "markDefs": [], + "children": [ + { + "_key": "60caa0d27216", + "_type": "span", + "marks": [], + "text": "To make this transition as smooth as possible, we have also added the possibility to declare the DSL version in the Nextflow configuration file, using the same syntax shown above." + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "365beb04f28c", + "children": [ + { + "text": "", + "_key": "5413b54a4720", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "20f4983d70d9", + "markDefs": [], + "children": [ + { + "text": "Finally, if you wish to keep the current DSL behaviour and not make any changes in your pipeline scripts, the following variable can be defined in your system environment:", + "_key": "faf1496f480c", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "4322063f7e24", + "children": [ + { + "_key": "73585930399f", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "_type": "code", + "_key": "5f343f89977e", + "code": "export NXF_DEFAULT_DSL=1" + }, + { + "_type": "block", + "style": "normal", + "_key": "9f6d4e7bc11c", + "children": [ + { + "_type": "span", + "text": "", + "_key": "50827b49fc84" + } + ] + }, + { + "_type": "block", + "style": "h3", + "_key": "608179f25360", + "children": [ + { + "text": "DSL1 end-of-life phase", + "_key": "46afa98b31af", + "_type": "span" + } + ] + }, + { + "children": [ + { + "_key": "6ab122bf0f3e", + "_type": "span", + "marks": [], + "text": "Maintaining two separate DSL implementations in the same programming environment is not sustainable and, above all, does not make much sense. For this reason, along with making DSL2 the default Nextflow syntax, DSL1 will enter into a 12-month end-of-life phase, at the end of which it will be removed. Therefore version 22.04.x and 22.10.x will be the last stable versions providing the ability to run DSL1 scripts." + } + ], + "_type": "block", + "style": "normal", + "_key": "ef0ca37d3101", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "79566760c8ab" + } + ], + "_type": "block", + "style": "normal", + "_key": "f91427f76068" + }, + { + "children": [ + { + "marks": [], + "text": "This is required to keep evolving the framework and to create a more solid implementation of Nextflow grammar. Maintaining compatibility with the legacy syntax implementation and data structures is a challenging task that prevents the evolution of the new syntax.", + "_key": "be5061112ef2", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "253745843863", + "markDefs": [] + }, + { + "_key": "90cbf87d4503", + "children": [ + { + "text": "", + "_key": "0a849cbf24f2", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "779015e1a628", + "markDefs": [ + { + "href": "https://github.com/nextflow-io/nextflow/releases", + "_key": "d1e2981cd904", + "_type": "link" + } + ], + "children": [ + { + "_key": "78cfff12e672", + "_type": "span", + "marks": [], + "text": "Bear in mind, this does " + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "not", + "_key": "a3947d5fa744" + }, + { + "marks": [], + "text": " mean it will not be possible to use DSL1 starting from 2023. 
All existing Nextflow runtimes will continue to be available, and it will be possible to for any legacy pipeline to run using the required version available from the GitHub ", + "_key": "8df1419fcf96", + "_type": "span" + }, + { + "text": "releases page", + "_key": "bdb28f19a25d", + "_type": "span", + "marks": [ + "d1e2981cd904" + ] + }, + { + "_type": "span", + "marks": [], + "text": ", or by specifying the version using the NXF_VER variable, e.g.", + "_key": "5184698579b0" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "9285387ddfb5", + "children": [ + { + "_type": "span", + "text": "", + "_key": "17976845589c" + } + ], + "_type": "block" + }, + { + "_type": "code", + "_key": "64d1e1607622", + "code": "NXF_VER: 21.10.6 nextflow run " + }, + { + "_key": "47be3491a7b8", + "children": [ + { + "text": "", + "_key": "10e6f099c2c1", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "New configuration format", + "_key": "edf6a634fc9f" + } + ], + "_type": "block", + "style": "h3", + "_key": "b33987079c9d" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The configuration file is a key component of the Nextflow framework since it allows workflow developers to decouple the pipeline logic from the execution parameters and infrastructure deployment settings.", + "_key": "8f67710faa58", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "d6509f2b57c4" + }, + { + "_type": "block", + "style": "normal", + "_key": "880486b74a61", + "children": [ + { + "_key": "e9d9022ba1c5", + "_type": "span", + "text": "" + } + ] + }, + { + "style": "normal", + "_key": "afb500b83a74", + "markDefs": [], + "children": [ + { + "_key": "fc133ed67f9d", + "_type": "span", + "marks": [], + "text": "The current Nextflow configuration file mechanism is extremely powerful, but it also has some serious drawbacks due to its " + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "dynamic", + "_key": "07e424d8a40f" + }, + { + "_key": "50c77083262c", + "_type": "span", + "marks": [], + "text": " nature that makes it very hard to keep stable and maintainable over time." 
+ } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "93e5a4514dc6" + } + ], + "_type": "block", + "style": "normal", + "_key": "8f76ebb26ae9" + }, + { + "_key": "f81607750743", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "For this reason, we are planning to re-engineer the current configuration component and replace it with a better configuration component with two major goals: 1) continue to provide a rich and human-readable configuration system (so, no YAML or JSON), 2) have a well-defined syntax with a solid foundation that guarantees predictable configurations, simpler troubleshooting and more sustainable maintenance.", + "_key": "888a87dc54d3" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "414d40874f57", + "children": [ + { + "_type": "span", + "text": "", + "_key": "d51dc5548954" + } + ] + }, + { + "markDefs": [ + { + "_key": "f88563f6e6cf", + "_type": "link", + "href": "https://github.com/hashicorp/hcl" + }, + { + "_type": "link", + "href": "https://github.com/lightbend/config", + "_key": "dd31218adb40" + }, + { + "_key": "5f58504262ad", + "_type": "link", + "href": "https://github.com/nextflow-io/nextflow/issues/2723" + } + ], + "children": [ + { + "marks": [], + "text": "Currently, the most likely options are ", + "_key": "db966f3ebf12", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "f88563f6e6cf" + ], + "text": "Hashicorp HCL", + "_key": "2600b769af09" + }, + { + "_type": "span", + "marks": [], + "text": " (as used by Terraform and other Hashicorp tools) and ", + "_key": "f09939b614a3" + }, + { + "text": "Lightbend HOCON", + "_key": "b6bf5edfb3df", + "_type": "span", + "marks": [ + "dd31218adb40" + ] + }, + { + "_type": "span", + "marks": [], + "text": ". You can read more about this feature at ", + "_key": "20c8d2004b5d" + }, + { + "marks": [ + "5f58504262ad" + ], + "text": "this link", + "_key": "a20826ee22f8", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "9e066f85e9e9" + } + ], + "_type": "block", + "style": "normal", + "_key": "eb2c28e9ee23" + }, + { + "style": "normal", + "_key": "31617ababdfc", + "children": [ + { + "_key": "d4516291347a", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "h3", + "_key": "e743e38f44bb", + "children": [ + { + "text": "Ignite executor deprecation", + "_key": "0aff91392e44", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "26b895ea2ac0", + "markDefs": [ + { + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/ignite.html", + "_key": "7860ca109cbd" + } + ], + "children": [ + { + "marks": [], + "text": "The executor for ", + "_key": "6a499af3a8bb", + "_type": "span" + }, + { + "text": "Apache Ignite", + "_key": "9dc4b9754145", + "_type": "span", + "marks": [ + "7860ca109cbd" + ] + }, + { + "text": " was an early attempt to provide Nextflow with a self-contained, distributed cluster for the deployment of pipelines into HPC environments. 
However, it had very little adoption over the years, which was not balanced by the increasing complexity of its maintenance.", + "_key": "22d1e1eda6cd", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "94ac0085d8b3", + "children": [ + { + "_type": "span", + "text": "", + "_key": "bd593e1d634a" + } + ] + }, + { + "style": "normal", + "_key": "78e007205aa7", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nextflow-io/nf-ignite", + "_key": "534211f225ac" + } + ], + "children": [ + { + "text": "For this reason, it was decided to deprecate it and remove it from the default Nextflow distribution. The module is still available in the form of a separate project plugin and available at ", + "_key": "788499d39268", + "_type": "span", + "marks": [] + }, + { + "_key": "3c17485fef37", + "_type": "span", + "marks": [ + "534211f225ac" + ], + "text": "this link" + }, + { + "marks": [], + "text": ", however, it will not be actively maintained.", + "_key": "89e0994a19f0", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "71ca7cb19b99", + "children": [ + { + "_type": "span", + "text": "", + "_key": "869a5f0a3647" + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "Conclusion", + "_key": "e7fc7b8a31ac" + } + ], + "_type": "block", + "style": "h3", + "_key": "39a3918382ab" + }, + { + "markDefs": [], + "children": [ + { + "text": "This post is focused on the most fundamental changes we are planning to make in the following months.", + "_key": "fc633bc84c6f", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "e5492a05b394" + }, + { + "_key": "4d27793d27a1", + "children": [ + { + "_key": "8a0dad7ac6f1", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "With the adoption of Java 11, the full migration of DSL1 to DSL2 and the re-engineering of the configuration system, our purpose is to consolidate the Nextflow technology and lay the foundation for all the new exciting developments and features on which we are working on. Stay tuned for future blogs about each of them in upcoming posts.", + "_key": "ac1d7ce9e376" + } + ], + "_type": "block", + "style": "normal", + "_key": "d0d726854852" + }, + { + "style": "normal", + "_key": "71f5034fef7f", + "children": [ + { + "_type": "span", + "text": "", + "_key": "b26cce36e65d" + } + ], + "_type": "block" + }, + { + "_key": "c8b7484c9e7b", + "markDefs": [ + { + "_type": "link", + "href": "https://app.slack.com/client/T03L6DM9G", + "_key": "797eed5aec45" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "If you want to learn more about the upcoming changes reach us out on ", + "_key": "095acc7b9d14" + }, + { + "marks": [ + "797eed5aec45" + ], + "text": "Slack at this link", + "_key": "0ee03ac01f9c", + "_type": "span" + }, + { + "_key": "01a525eb74ff", + "_type": "span", + "marks": [], + "text": ". 
" + }, + { + "_type": "span", + "text": "", + "_key": "de97bc9cf84c" + } + ], + "_type": "block", + "style": "normal" + } + ] + }, + { + "meta": { + "slug": { + "current": "experimental-cleanup-with-nf-boost" + } + }, + "body": [ + { + "style": "h3", + "_key": "ce03c795868d", + "children": [ + { + "_key": "f7a1b9cc4eac", + "_type": "span", + "text": "Backstory" + } + ], + "_type": "block" + }, + { + "_key": "f81483baf47d", + "markDefs": [ + { + "href": "https://github.com/systemsgenetics/gemmaker", + "_key": "dad6395914d8", + "_type": "link" + }, + { + "href": "https://github.com/nf-core/rnaseq", + "_key": "d7e301d52915", + "_type": "link" + } + ], + "children": [ + { + "_key": "4583802a9cb9", + "_type": "span", + "marks": [], + "text": "When I (Ben) was in grad school, I worked on a Nextflow pipeline called " + }, + { + "_key": "1ffbb8694fa0", + "_type": "span", + "marks": [ + "dad6395914d8" + ], + "text": "GEMmaker" + }, + { + "marks": [], + "text": ", an RNA-seq analysis pipeline similar to ", + "_key": "ef514b533144", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "d7e301d52915" + ], + "text": "nf-core/rnaseq", + "_key": "5762ef617b9c" + }, + { + "_type": "span", + "marks": [], + "text": ". We quickly ran into a problem, which is that on large runs, we were running out of storage! As it turns out, it wasn’t the final outputs, but the intermediate outputs (the BAM files, etc) that were taking up so much space, and we figured that if we could just delete those intermediate files sooner, we might be able to make it through a pipeline run without running out of storage. We were far from alone.", + "_key": "f75d29df264d" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "777b8c0b7e6b", + "children": [ + { + "_type": "span", + "text": "", + "_key": "9c8ccfb57a9e" + } + ], + "_type": "block" + }, + { + "_key": "fd62897b86b4", + "_type": "block" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nextflow-io/nextflow/issues/452", + "_key": "883d5b4db6eb" + }, + { + "_type": "link", + "href": "https://github.com/spficklin", + "_key": "438d456612cd" + } + ], + "children": [ + { + "text": "Automatic cleanup is currently the ", + "_key": "99496d18d1ec", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "883d5b4db6eb" + ], + "text": "oldest open issue", + "_key": "f89de9f19ea1" + }, + { + "text": " on the Nextflow repository. For many users, the ability to quickly delete intermediate files makes the difference between a run being possible or impossible. ", + "_key": "75a7ac8a8a0f", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "438d456612cd" + ], + "text": "Stephen Ficklin", + "_key": "ee2c799bf0ee" + }, + { + "text": ", the creator of GEMmaker, came up with a clever way to delete intermediate files and even “trick” Nextflow into skipping deleted tasks on a resumed run, which you can read about in the GitHub issue. 
It involved wiring the intermediate output channels to a “cleanup” process, along with a “done” signal from the relevant downstream processes to ensure that the intermediates were deleted at the right time.", + "_key": "72accb3f19a9", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "34ddce8a481e" + }, + { + "_key": "04635ad3b6dc", + "children": [ + { + "text": "", + "_key": "b356ee6299e6", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "a86e60ad155e", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nextflow-io/nextflow/pull/3849", + "_key": "91bf33e18075" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "This hack worked, but it required a lot of manual effort to wire up the cleanup process correctly, and it left me wondering whether it could be done automatically. Nextflow should be able to analyze the DAG, figure out when an output file can be deleted, and then delete it! During my time on the Nextflow team, I have implemented this exact idea in a ", + "_key": "781ee223e4b9" + }, + { + "_key": "64ca92d59553", + "_type": "span", + "marks": [ + "91bf33e18075" + ], + "text": "pull request" + }, + { + "_key": "0c906514ecc0", + "_type": "span", + "marks": [], + "text": ", but there are still a few challenges to resolve, such as resuming from deleted runs (which is not as impossible as it sounds)." + } + ] + }, + { + "_key": "da94401cf934", + "children": [ + { + "_type": "span", + "text": "", + "_key": "df8231728f26" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "Introducing nf-boost: experimental features for Nextflow", + "_key": "c3ce2dca861b" + } + ], + "_type": "block", + "style": "h3", + "_key": "670c36761502" + }, + { + "markDefs": [ + { + "href": "https://github.com/bentsherman/nf-boost", + "_key": "861b1f06fda0", + "_type": "link" + } + ], + "children": [ + { + "_key": "02b90c71d511", + "_type": "span", + "marks": [], + "text": "Many users have told me that they would gladly take the cleanup without the resume, so I found a way to provide the cleanup functionality in a plugin, which I call " + }, + { + "_key": "7fe5b06f02af", + "_type": "span", + "marks": [ + "861b1f06fda0" + ], + "text": "nf-boost" + }, + { + "marks": [], + "text": ". This plugin is not just about automatic cleanup – it contains a variety of experimental features, like new operators and functions, that anyone can try today with a few extra lines of config, which is much less tedious than building Nextflow from a pull request. Not every new feature can be implemented via plugin, but for those features that can, it’s nice for the community to be able to try it out before we make it official.", + "_key": "bfa849e20ae4", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "f89ce206c786" + }, + { + "style": "normal", + "_key": "898d23413f19", + "children": [ + { + "_type": "span", + "text": "", + "_key": "1ccbdb425cd9" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "78656a8ff789", + "_type": "span", + "marks": [], + "text": "The nf-boost plugin requires Nextflow v23.10.0 or later. 
You can enable the experimental cleanup by adding the following lines to your config file:" + } + ], + "_type": "block", + "style": "normal", + "_key": "13a6056909f0", + "markDefs": [] + }, + { + "children": [ + { + "text": "", + "_key": "6950c497c262", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "a720d5247387" + }, + { + "code": "plugins {\n id 'nf-boost'\n}\n\nboost {\n cleanup = true\n}", + "_type": "code", + "_key": "a552782c335b" + }, + { + "children": [ + { + "_key": "79616a5de771", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "979eac8bc5d8" + }, + { + "children": [ + { + "text": "Automatic cleanup: how it works", + "_key": "c734e9a5db27", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "99b77905b36c" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The strategy of automatic cleanup is simple:", + "_key": "26da7f9e8e49", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "018697b682ee" + }, + { + "_key": "3aef3f1d0086", + "children": [ + { + "_key": "22c331cee154", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "As soon as an output file can be deleted, delete it", + "_key": "0a593eb005cd" + }, + { + "marks": [], + "text": "An output file can be deleted when (1) all downstream tasks that use the output file as an input have completed AND (2) the output file has been published (if it needs to be published)", + "_key": "fb17b32018f8", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "c77dec863ad5", + "listItem": "bullet" + }, + { + "_type": "block", + "style": "normal", + "_key": "72473dcfa84a", + "children": [ + { + "_type": "span", + "text": "", + "_key": "d93ae3f0d098" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "In practice, the conditions for 2(a) are tricky to get right because Nextflow doesn’t know the full task graph from the start (thanks to the flexibility of Nextflow’s dataflow operators). But you don’t have to worry about any of that because we already figured out how to make it work! All you have to do is flip a switch (", + "_key": "961b908f59b3" + }, + { + "marks": [ + "code" + ], + "text": "boost.cleanup = true", + "_key": "ad37175ebbda", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ") and enjoy the ride.", + "_key": "c5419bbc4116" + } + ], + "_type": "block", + "style": "normal", + "_key": "b640690f9190", + "markDefs": [] + }, + { + "_key": "2c0345156453", + "children": [ + { + "_type": "span", + "text": "", + "_key": "4c4856825bd5" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "cfe9cd9600f6", + "children": [ + { + "_type": "span", + "text": "Real-world example", + "_key": "79845b7c7708" + } + ], + "_type": "block", + "style": "h3" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Let’s consider a variant calling pipeline following standard best practices. Sequencing reads are mapped onto the genome, producing a BAM file which will be marked for duplicates, filtered, recalibrated using GATK, etc. This means that, for a given sample, at least four copies of the BAM file will be stored in the work directory. 
In other words, for an initial paired-end whole-exome sequencing (WES) sample of 12 GB, the work directory will quickly grow to 50 GB just to store the BAM files for one sample, or 100 GB for a paired sample (e.g. germline and tumor).", + "_key": "5d9e8acb281a" + } + ], + "_type": "block", + "style": "normal", + "_key": "1738e0b23fed", + "markDefs": [] + }, + { + "_key": "468d1b14075d", + "children": [ + { + "_key": "ffe677e58bc4", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "marks": [], + "text": "Now suppose that we want to analyze a cohort of 100 patients – that’s ~10 TB of intermediate data, which is a real problem. For some users, it means processing only a few samples at a time, even though they might have the compute capacity to do much more. For others, it means not being able to process even one sample, because the accumulated intermediate data is simply too large. With automatic cleanup, Nextflow should be able to delete the previous BAM as soon as the next BAM is produced, for each sample independently.", + "_key": "8b9e921181d8", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "6626018c9987", + "markDefs": [] + }, + { + "style": "normal", + "_key": "3af8097d4744", + "children": [ + { + "_key": "c020d08e55f9", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "f1c8d0d1876c", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "We tested this use-case with a paired WES sample (total input size of 26.8 GB), by tracking the work directory size for a run with and a run without automatic cleanup. The results are shown below.", + "_key": "f36c26c4974e", + "_type": "span" + } + ] + }, + { + "_key": "44172100e61b", + "children": [ + { + "_type": "span", + "text": "", + "_key": "79ac17d59e2a" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "6a0251fb5164", + "asset": { + "_ref": "image-a3241cf3c7a0ad6e17d60b4848241996020ceed9-1600x795-png", + "_type": "reference" + }, + "_type": "image", + "alt": "disk usage with and without nf-boost" + }, + { + "markDefs": [], + "children": [ + { + "_key": "a0167cb4b65c", + "_type": "span", + "marks": [ + "em" + ], + "text": "Note: we also changed the `boost.cleanupInterval` config option to 180 seconds, which was more optimal for our system." + } + ], + "_type": "block", + "style": "normal", + "_key": "724e0baf1122" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "78def267cc44" + } + ], + "_type": "block", + "style": "normal", + "_key": "e18c9ca3c9d8" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "As expected, we see that without automatic cleanup, the size of the work directory reaches 110 GB when all BAM files are produced and never deleted. On the other hand, when the nf-boost cleanup is enabled, the work directory occasionally peaks at ~50 GB (i.e. no more than two BAM files are stored at the same time), but always returns to ~25 GB, since the previous BAM is deleted immediately after the next BAM is ready. 
There is no impact on the size of the results (since they are identical) or the total runtime (since cleanup happens in parallel with the workflow itself).", + "_key": "49d176d6101a" + } + ], + "_type": "block", + "style": "normal", + "_key": "73349701ade7" + }, + { + "_key": "63466a212e39", + "children": [ + { + "text": "", + "_key": "b44ef2c065cd", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "ed361498a9a4", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In this case, automatic cleanup reduced the total storage by 50-75% (depending on how you measure the storage). In general, the effectiveness of automatic cleanup will depend greatly on how you write your pipeline. Here are a few rules of thumb that we’ve come up with so far:", + "_key": "30d177cab1d5" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "556799e7121a", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "34bbf4079e43" + }, + { + "listItem": "bullet", + "children": [ + { + "_type": "span", + "marks": [], + "text": "As your pipeline becomes “deeper” (i.e. more processing steps in sequence), automatic cleanup becomes more effective, because it only needs to keep two steps’ worth of data, regardless of the total number of steps", + "_key": "2dd2f3aa367b" + }, + { + "marks": [], + "text": "As your pipeline becomes “wider” (i.e. more inputs being processed in parallel), automatic cleanup should have roughly the same level of effectiveness. If some samples take longer to process than others, the peak storage should be lower with automatic cleanup, since the “peaks” for each sample will happen at different times.", + "_key": "d3a9eaaacc3d", + "_type": "span" + }, + { + "text": "As you add more dependencies between processes, automatic cleanup becomes less effective, because it has to wait longer before it can delete the upstream outputs. Note that each output is tracked independently, so for example, sending logs to a summary process won’t affect the cleanup of other outputs from that same process.", + "_key": "a598fc4c019b", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "d48777e619c1" + }, + { + "children": [ + { + "_key": "42c9b1c842d6", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "b33b42e1b67b" + }, + { + "children": [ + { + "text": "Closing thoughts", + "_key": "a58f54685c94", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "6132dd3fbc42" + }, + { + "_key": "4f36f48596ab", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Automatic cleanup in nf-boost is an experimental feature, and notably does not support resumability, meaning that the deleted files will simply be re-executed on a resumed run. 
While we work through these last few challenges, the nf-boost plugin is a nice option for users who want to benefit from what we’ve built so far and don’t need the resumability.", + "_key": "30fdf5e9d4ef" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "7e562a7a9ea5", + "children": [ + { + "_type": "span", + "text": "", + "_key": "aed54532f0bd" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The nice thing about nf-boost’s automatic cleanup is that it is just a preview of what will eventually be the “official” cleanup feature in Nextflow (when it is merged), so by using nf-boost, you are helping the future of Nextflow directly! We hope that this experimental version will help users run workloads that were previously difficult or even impossible, and we look forward to when we can bring this feature home to Nextflow.", + "_key": "e77645ccc52f", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "4ed8da5e1229" + } + ], + "title": "Experimental cleanup with nf-boost", + "_updatedAt": "2024-09-26T09:04:28Z", + "publishedAt": "2024-08-08T06:00:00.000Z", + "_type": "blogPost", + "tags": [ + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "99e7d255368c" + } + ], + "_id": "04e353d7b44d", + "author": { + "_type": "reference", + "_ref": "8bd9c7c9-b7e7-473a-ace4-2cf6802bc884" + }, + "_createdAt": "2024-09-25T14:17:58Z", + "_rev": "5lTkDsqMC29L3wnnkjjRrb" + }, + { + "_rev": "Ot9x7kyGeH5005E3MJ9Rfy", + "author": { + "_type": "reference", + "_ref": "5bLgfCKN00diCN0ijmWOx7" + }, + "_id": "054ddb6c99b8", + "body": [ + { + "_key": "08eb8e28ce6c", + "markDefs": [], + "children": [ + { + "_key": "c37e44a56e51", + "_type": "span", + "marks": [], + "text": "From December 2022 to March 2023, I was part of the second cohort of the Nextflow and nf-core mentorship program, which spanned four months and attracted participants globally. I could not have anticipated the extent to which my participation in this program and the associated learning experiences would positively change my professional growth. The mentorship aims to foster collaboration, knowledge exchange, flexible learning, collaborative coding, and contributions to the nf-core community. It was funded by the Chan Zuckerberg Initiative and is guided by experienced mentors in the community. In the upcoming paragraphs, I'll be sharing more details about the program—its structure, the valuable learning experiences it brought, and the exciting opportunities it opened up for me." + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "084f7bb139c4", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "3da880238de0", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "45f515286788" + } + ], + "_type": "block", + "style": "normal", + "_key": "602f29bcb707" + }, + { + "_type": "block", + "style": "h2", + "_key": "31ac55a88c7d", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Meeting my mentor", + "_key": "c89aba3992cb" + } + ] + }, + { + "children": [ + { + "_key": "96b5206cc67c", + "_type": "span", + "marks": [], + "text": "One of the most interesting aspects of the mentorship is that the program emphasizes that mentor-mentee pairs share research interests. 
In addition, the mentor should have significant experience in the areas where the mentee wants to develop. I found this extremely valuable, as it makes the program very flexible while also considering individual goals and interests. My goal as a mentee was to transition from a " + }, + { + "_key": "192c49f808d6", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Nextflow user to a Nextflow developer" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "d5e0c45af35a" + } + ], + "_type": "block", + "style": "normal", + "_key": "5e3fa09ebcf1", + "markDefs": [] + }, + { + "style": "normal", + "_key": "d04bc5fdca31", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "cffea9af48cf" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "I was lucky enough to have Matthias De Smet as a mentor. He is a member of the Center for Medical Genetics in Ghent and has extensive experience working with open-source projects such as nf-core and Bioconda. His experience working in clinical genomics was a common ground for us to communicate, share experiences and build effective collaboration.", + "_key": "ab508f5d1c60" + } + ], + "_type": "block", + "style": "normal", + "_key": "127040f3d6cf" + }, + { + "_type": "block", + "style": "normal", + "_key": "be1a816ea55d", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "16dcc0bc56c2" + } + ] + }, + { + "_key": "9e8bd42c388b", + "markDefs": [], + "children": [ + { + "text": "During my first days, he guided me to the most useful Nextflow resources available online, tailored to my goals. Then, I drafted a pipeline that I wanted to build and attempted to write my first lines of code in Nextflow. We communicated via Slack and Matthias reviewed and corrected my code via GitHub. He introduced me to the supportive nf-core community, to ask for help when needed, and to acknowledge every success along the way.", + "_key": "7db32e6934ee", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "6c190c4de1ae" + } + ], + "_type": "block", + "style": "normal", + "_key": "161d577708cd" + }, + { + "style": "normal", + "_key": "f64c1d9cc1e3", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "b84873ab9936" + } + ], + "_type": "block" + }, + { + "_type": "image", + "alt": "Mentor compliment about new module added", + "_key": "1b8a603cb633", + "asset": { + "_ref": "image-7d35ff2925da1534129b9c7dd8bfbade190da61c-1132x204-png", + "_type": "reference" + } + }, + { + "_type": "block", + "style": "h2", + "_key": "d8cab8f801b9", + "markDefs": [], + "children": [ + { + "text": "Highlights of the program", + "_key": "ac600b3104d1", + "_type": "span", + "marks": [] + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "We decided to start small, setting step-by-step goals. Matthias suggested that a doable goal would be to create my first Nextflow module in the context of a broader pipeline I wanted to develop. A module is a building block that encapsulates a specific functionality or task within a workflow. We realized that the tool I wanted to modularize was not available as part of nf-core. 
The nf-core GitHub has a community-driven collection of Nextflow modules, subworkflows and pipelines for bioinformatics, providing standardized and well-documented modules. The goal, therefore, was to create a module for this missing tool and then submit it as a contribution to nf-core.", + "_key": "27cfe03ffc9b", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "f0a1d33a074d", + "markDefs": [] + }, + { + "style": "normal", + "_key": "5203da423a25", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "40eb6e2fae90" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "22984c4f5707", + "_type": "span", + "marks": [], + "text": "For those unfamiliar, contributing to nf-core requires another member of the community, usually a maintainer, to review your code. As a newcomer, I was obviously curious about how the process would be. In academia, where anonymity often prevails, feedback can occasionally be a bit stringent. Conversely, during my submission to the nf-core project, I was pleasantly surprised that reviewers look for collective improvement, providing quick, constructive and amicable reviews, leading to a positive environment." + } + ], + "_type": "block", + "style": "normal", + "_key": "9e9f8a0fad70" + }, + { + "style": "normal", + "_key": "f7048aab1f35", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "21bb257c00e1" + } + ], + "_type": "block" + }, + { + "_type": "image", + "alt": "Review comment in GitHub", + "_key": "b5af7aaa5e40", + "asset": { + "_ref": "image-f3994b9f4e06fba6be7552431a56c828079f9c77-1106x226-png", + "_type": "reference" + } + }, + { + "markDefs": [], + "children": [ + { + "_key": "0516df23bfc1", + "_type": "span", + "marks": [], + "text": "For my final project in the mentorship program, I successfully ported a complete pipeline from Bash to Nextflow. This was a learning experience that allowed me to explore a diverse range of skills, such as modularizing content, understanding how crucial the meta map is, and creating Docker container images for software. This process not only enhanced my proficiency in Nextflow but also allowed me to interact with and contribute to related projects like Bioconda and BioContainers." + } + ], + "_type": "block", + "style": "normal", + "_key": "3d757dc9cc22" + }, + { + "_type": "block", + "style": "normal", + "_key": "fe1c3b09599d", + "markDefs": [], + "children": [ + { + "_key": "7d4ff084080c", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "_key": "bdfb527e143d", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Life after the mentorship", + "_key": "056c66229cbe", + "_type": "span" + } + ], + "_type": "block", + "style": "h2" + }, + { + "style": "normal", + "_key": "8be45b9b4b93", + "markDefs": [ + { + "_type": "link", + "href": "https://www.youtube.com/watch?v=GHb2Wt9VCOg", + "_key": "8daa85555ae9" + } + ], + "children": [ + { + "text": "With the skills I acquired during the mentorship as a mentee, I proposed and successfully implemented a custom solution in Nextflow for a precision medicine start-up I worked at the time that could sequentially do several diagnostics and consumer-genetics applications in the cloud, resulting in substantial cost savings and increasing flexibility for the company. Beyond my immediate projects, I joined a group actively developing an open-source Nextflow pipeline for genetic imputation. 
This project allowed me to be in close contact with members of the nf-core community working on similar projects, adding new tools to this pipeline, giving and receiving feedback, and continuing to improve my overall Nextflow skills while also contributing to the broader bioinformatics community. You can learn more about this project with the fantastic talk by Louis Le Nézet at Nextflow Summit 2023 ", + "_key": "8dd5c7965d4b", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "8daa85555ae9" + ], + "text": "here", + "_key": "4d0d91d051e1" + }, + { + "marks": [], + "text": ".", + "_key": "bafba6a7cd2c", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "23f64f77d16b" + } + ], + "_type": "block", + "style": "normal", + "_key": "b97f09e8d2f6" + }, + { + "style": "normal", + "_key": "cab82421bbcc", + "markDefs": [], + "children": [ + { + "text": "Finally, I was honored to become a Nextflow ambassador. The program’s goal is to extend the awareness of Nextflow around the world while also building a supportive community. In particular, the South American community is underrepresented, so I serve as a point of contact for any institution or newcomer who wants to implement pipelines with Nextflow. As part of this program, I was invited to speak at the second Chilean Congress of Bioinformatics, where I gave a talk about how Nextflow and nf-core can support scaling bioinformatics projects in the cloud. It was incredibly rewarding to introduce Nextflow to a community for the first time and witness the genuine enthusiasm it sparks among students and attendees for the potential in their research projects.", + "_key": "b8286a2004c9", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "93fcde1f454c" + } + ], + "_type": "block", + "style": "normal", + "_key": "bef5727b0094", + "markDefs": [] + }, + { + "_key": "70e9f82405c4", + "asset": { + "_ref": "image-2591b4bfffbda1b9fb2b7e8ba72f82efd8b61148-1202x796-png", + "_type": "reference" + }, + "_type": "image", + "alt": "Second Chilean Congress of Bioinformatics" + }, + { + "style": "h2", + "_key": "8d0f1e22b905", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "What’s next?", + "_key": "b1f267ae95ea", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "The comprehensive skill set acquired in my journey proved to be incredibly valuable for my professional development and allowed me to join the ZS Discovery Team as a Senior Bioinformatician. This organization accelerates transformation in research and early development with direct contribution to impactful bioinformatics projects with a globally distributed, multidisciplinary talented team.", + "_key": "b8ad1a5313ed" + } + ], + "_type": "block", + "style": "normal", + "_key": "71a8c0ef7949", + "markDefs": [] + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "cedac273c584", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "ca296a4bb901", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "text": "In addition, we organized a local site for the nf-core hackathon in March 2024, the first Nextflow Hackathon in Argentina, fostering a space to advance our skills in workflow management collectively. 
It was a pleasure to see how beginners got their first PRs approved and how they interacted with the nf-core community for the first time.", + "_key": "d412cf7a30e9", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "59765e76faf5" + }, + { + "_type": "block", + "style": "normal", + "_key": "8ff2c8514806", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "bf3cd841b664", + "_type": "span", + "marks": [] + } + ] + }, + { + "_key": "81a165770b06", + "asset": { + "_type": "reference", + "_ref": "image-cae7d65050e9549e61530a352ec7f6a80d1168db-1198x898-png" + }, + "_type": "image", + "alt": "nf-core March 2024 Hackathon site in Argentina" + }, + { + "children": [ + { + "text": "My current (and probably future!) day-to-day work involves working and developing pipelines with Nextflow, while also mentoring younger bioinformaticians into this language. The commitment to open-source projects remains a cornerstone of my journey and I am thankful that it has provided me the opportunity to collaborate with individuals from diverse backgrounds all over the world.", + "_key": "8f0a4e63b316", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "6c0736dae390", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "5aefe083fe3c", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "2a3d13fd6c64" + }, + { + "_key": "50290758700f", + "markDefs": [], + "children": [ + { + "text": "Whether you're interested in the mentorship program, curious about the hackathon, or simply wish to connect, feel free to reach out at the nf-core Slack!", + "_key": "b811a3387b13", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "9bc814404591", + "markDefs": [], + "children": [ + { + "_key": "fd43a9e712b2", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "marks": [], + "text": "This post was contributed by a Nextflow Ambassador. Ambassadors are passionate individuals who support the Nextflow community. Interested in becoming an ambassador? Read more about it ", + "_key": "f5b28ee4f9c60", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "214f6b272101" + ], + "text": "here", + "_key": "f5b28ee4f9c61" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "f5b28ee4f9c62" + } + ], + "_type": "block", + "style": "blockquote", + "_key": "f718da0c3019", + "markDefs": [ + { + "_key": "214f6b272101", + "_type": "link", + "href": "https://www.nextflow.io/ambassadors.html" + } + ] + } + ], + "_updatedAt": "2024-09-27T08:58:40Z", + "tags": [ + { + "_type": "reference", + "_key": "f8e871b015d2", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493" + } + ], + "meta": { + "description": "From December 2022 to March 2023, I was part of the second cohort of the Nextflow and nf-core mentorship program, which spanned four months and attracted participants globally. I could not have anticipated the extent to which my participation in this program and the associated learning experiences would positively change my professional growth. 
", + "slug": { + "current": "reflections-on-nextflow-mentorship" + } + }, + "publishedAt": "2024-04-10T06:00:00.000Z", + "title": "One-year reflections on Nextflow Mentorship", + "_createdAt": "2024-09-25T14:18:42Z", + "_type": "blogPost" + }, + { + "body": [ + { + "style": "normal", + "_key": "a6dd923a2416", + "markDefs": [], + "children": [ + { + "_key": "8c9f16a1cce9", + "_type": "span", + "marks": [], + "text": "As a Nextflow Ambassador and a PhD student working in bioinformatics, I’ve always believed in the power of collaboration. Over the past six months, I’ve had the privilege of working with another PhD student specializing in metagenomics environmental science. This collaboration began through a simple email after the other researcher discovered my contact information on the ambassadors’ list page. It has been a journey of learning, problem-solving, and mutual growth. I’d like to share some reflections on this experience, highlighting both the challenges and the rewards." + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "60a81d9fe61b", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "98db31939957", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "3eb1c6c909b8" + } + ], + "_type": "block", + "style": "normal", + "_key": "2513b0af78d6", + "markDefs": [] + }, + { + "style": "h2", + "_key": "ecdaa3836d39", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Connecting across disciplines", + "_key": "efe568a4e2e9", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "Our partnership began with a simple question about running one of nf-core’s metagenomics analysis pipelines. Despite being in different parts of Europe and coming from different academic backgrounds, we quickly found common ground. The combination of our expertise – my focus on bioinformatics workflows and their deep knowledge of microbial ecosystems – created a synergy that enriched our work.", + "_key": "1f2dd6a1d93a", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "ff67ec19e031", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "061704155a0f" + } + ], + "_type": "block", + "style": "normal", + "_key": "fc022300f7f2" + }, + { + "markDefs": [], + "children": [ + { + "_key": "bb0a59547b10", + "_type": "span", + "marks": [], + "text": "Navigating challenges together" + } + ], + "_type": "block", + "style": "h2", + "_key": "01757c5e0af3" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Like any collaboration, ours was not without its difficulties. We faced numerous technical challenges, from optimizing computational resources to troubleshooting pipeline errors. There were moments of frustration when things didn’t work as expected. However, each challenge was an opportunity to learn and grow. Working through these challenges together made them much more manageable and even enjoyable at times. We focused on mastering Nextflow in a high-performance computing (HPC) environment, managing large datasets, and conducting comprehensive data analysis. Additionally, we explored effective data visualization techniques to better interpret and present the findings. We leaned heavily on the Nextflow and nf-core community for support. 
The extensive documentation and guides were invaluable, and the different Slack channels provided real-time problem-solving assistance. Having the possibility of contacting the main developers of the pipeline that was troubling was a great resource that we are fortunate to have. The community’s willingness to share and offer help was a constant source of encouragement, making us feel supported every step of the way.", + "_key": "e52495a01e91", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "46935770191c" + }, + { + "_key": "7cd6dfc6ba43", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "2fae48d0bbd9" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Learning and growing", + "_key": "e74bf30ab856", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "37f0f0ac3b0a" + }, + { + "children": [ + { + "marks": [], + "text": "Over the past six months, we’ve both learned a tremendous amount. The other PhD student became more adept at using and understanding Nextflow, particularly when running the nf-core/ampliseq pipeline, managing files, and handling high-performance computing (HPC) environments. I, on the other hand, gained a deeper understanding of environmental microbiomes and the specific needs of metagenomics research. Our sessions were highly collaborative, allowing us to share knowledge and insights freely. It was reassuring to know that we weren’t alone in our journey and that there was a whole community of researchers ready to share their wisdom and experiences. These interactions made our learning process more rewarding.", + "_key": "9e4dc5bbdcd7", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "6fba4dea6926", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "41cecfe95aad", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "b005cee7c272" + }, + { + "style": "h2", + "_key": "c53684a3a8c2", + "markDefs": [], + "children": [ + { + "_key": "e3a9ff47bc5e", + "_type": "span", + "marks": [], + "text": "Achieving synergy" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "b6742b4b3694", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "One of the most rewarding aspects of this collaboration has been the synergy between our different backgrounds. Our combined expertise enabled us to efficiently analyze a high volume of metagenomics samples. The journey does not stop here, of course. Now that they have their samples processed, it comes the time to interpret the data, one of my favorite parts. Our work together highlighted the potential for Nextflow and the nf-core community to facilitate research across diverse fields. The collaboration has been a testament to the idea that when individuals from different disciplines come together, they can achieve more than they could alone. This collaboration is poised to result in significant academic contributions. The other PhD student is preparing to publish a paper with the findings enabled by the use of the nf-core/ampliseq pipeline, which will be a key component of their thesis. 
This paper is going to serve as an excellent example of using Nextflow and nf-core pipelines in the field of metagenomics environmental science.", + "_key": "8241902d3002", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "22e1e4d5a98d", + "markDefs": [], + "children": [ + { + "_key": "153ec73273ad", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "Reflecting on the journey", + "_key": "1718d5c1dbb1", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2", + "_key": "21d7f0ce679b" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "As I reflect on these six months, I’m struck by the power of this community in fostering such collaborations. The support network, comprehensive resources, and culture of knowledge sharing have been essential in our success. This experience has reinforced my belief in the importance of open-source bioinformatics and data science communities for professional development and scientific advancement. Through it all, having a collaborator who understood the struggles and celebrated the successes with me made the journey all the more rewarding. Moving forward, I’m excited about the potential for more such collaborations. The past six months have been a journey of discovery and growth, and I’m grateful for the opportunity to work with such a dedicated and talented researcher. Our work is far from over, and I look forward to continuing this journey, learning more, and contributing to the field of environmental science.", + "_key": "0cb17451e324" + } + ], + "_type": "block", + "style": "normal", + "_key": "12406ca53c5f", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "7167b3913379", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "9e578829bc08" + }, + { + "_key": "0370560ec85e", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Join the journey!", + "_key": "f31a05d8e191" + } + ], + "_type": "block", + "style": "h2" + }, + { + "_key": "dcc510770d8d", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "For those of you in the Nextflow community or considering joining, I encourage you to take advantage of the resources available. Engage with the community, attend webinars, and don’t hesitate to ask questions. Whether you’re a seasoned expert or a curious newcomer, the Nextflow family is here to support you. Together, we can achieve great things.", + "_key": "32f4ca65b971" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "a2ead4cdde31", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "63e95ef08a9d" + }, + { + "_type": "block", + "style": "blockquote", + "_key": "541a683790c6", + "markDefs": [ + { + "_type": "link", + "href": "https://www.nextflow.io/ambassadors.html", + "_key": "a60c8c7071b9" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "This post was contributed by a Nextflow Ambassador. Ambassadors are passionate individuals who support the Nextflow community. Interested in becoming an ambassador? 
Read more about it ", + "_key": "c97e14c395db0" + }, + { + "_type": "span", + "marks": [ + "a60c8c7071b9" + ], + "text": "here", + "_key": "c97e14c395db1" + }, + { + "text": ".", + "_key": "c97e14c395db2", + "_type": "span", + "marks": [] + } + ] + } + ], + "_updatedAt": "2024-09-27T09:02:28Z", + "publishedAt": "2024-06-19T06:00:00.000Z", + "title": "Reflecting on a six-month collaboration: insights from a Nextflow Ambassador", + "_createdAt": "2024-09-25T14:18:36Z", + "meta": { + "description": "As a Nextflow Ambassador and a PhD student working in bioinformatics, I’ve always believed in the power of collaboration. Over the past six months, I’ve had the privilege of working with another PhD student specializing in metagenomics environmental science. This collaboration began through a simple email after the other researcher discovered my contact information on the ambassadors’ list page.", + "slug": { + "current": "reflecting-ambassador-collaboration" + } + }, + "_id": "06590f6d24cf", + "author": { + "_ref": "5bLgfCKN00diCN0ijmWOlw", + "_type": "reference" + }, + "_rev": "hf9hwMPb7ybAE3bqEU5puv", + "_type": "blogPost", + "tags": [ + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "9d12976af1b1" + } + ] + }, + { + "_rev": "Ot9x7kyGeH5005E3MIxeEu", + "title": "Deploy Nextflow Pipelines with Google Cloud Batch!", + "_createdAt": "2024-09-25T14:16:35Z", + "_id": "071efbf6af2a", + "_type": "blogPost", + "meta": { + "slug": { + "current": "deploy-nextflow-pipelines-with-google-cloud-batch" + } + }, + "tags": [ + { + "_key": "b390e48f13c2", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference" + }, + { + "_ref": "7d9ffad4-385c-433d-a409-d0bfacc3c2fe", + "_type": "reference", + "_key": "4e4827badcd6" + } + ], + "author": { + "_type": "reference", + "_ref": "paolo-di-tommaso" + }, + "_updatedAt": "2024-09-26T09:03:09Z", + "body": [ + { + "markDefs": [], + "children": [ + { + "text": "A key feature of Nextflow is the ability to abstract the implementation of data analysis pipelines so they can be deployed in a portable manner across execution platforms.", + "_key": "0b115e4b37be", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "62aeb8ad922c" + }, + { + "_key": "bc8d1b3dadc3", + "children": [ + { + "_type": "span", + "text": "", + "_key": "c07708123153" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "6ef24ed4aa6a", + "markDefs": [], + "children": [ + { + "text": "As of today, Nextflow supports a rich variety of HPC schedulers and all major cloud providers. 
Our goal is to support new services as they emerge to enable Nextflow users to take advantage of the latest technology and deploy pipelines on the compute environments that best fit their requirements.", + "_key": "d6d15c9dff4f", + "_type": "span", + "marks": [] + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "c4c79648b37c", + "children": [ + { + "_type": "span", + "text": "", + "_key": "f0def4ca37f3" + } + ] + }, + { + "style": "normal", + "_key": "ceef88a0ce2d", + "markDefs": [ + { + "_type": "link", + "href": "https://cloud.google.com/batch", + "_key": "16e4acc77800" + } + ], + "children": [ + { + "_key": "3f602737e73a", + "_type": "span", + "marks": [], + "text": "For this reason, we are delighted to announce that Nextflow now supports " + }, + { + "marks": [ + "16e4acc77800" + ], + "text": "Google Cloud Batch", + "_key": "51218dc90beb", + "_type": "span" + }, + { + "_key": "c8c4a1f97b02", + "_type": "span", + "marks": [], + "text": ", a new fully managed batch service just announced for beta availability by Google Cloud." + } + ], + "_type": "block" + }, + { + "_key": "34d1e0921dc5", + "children": [ + { + "text": "", + "_key": "15bea6d42567", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "h3", + "_key": "4d5cc0c72762", + "children": [ + { + "_type": "span", + "text": "A New On-Ramp to the Google Cloud", + "_key": "408e029c7edb" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Google Cloud Batch is a comprehensive cloud service suitable for multiple use cases, including HPC, AI/ML, and data processing. While it is similar to the Google Cloud Life Sciences API, used by many Nextflow users today, Google Cloud Batch offers a broader set of capabilities. As with Google Cloud Life Sciences, Google Cloud Batch automatically provisions resources, manages capacity, and allows batch workloads to run at scale. It offers several advantages, including:", + "_key": "9d1fb3d6681d" + } + ], + "_type": "block", + "style": "normal", + "_key": "46b7423d4725" + }, + { + "style": "normal", + "_key": "84bec944f99c", + "children": [ + { + "_type": "span", + "text": "", + "_key": "1eb601fd6d0c" + } + ], + "_type": "block" + }, + { + "listItem": "bullet", + "children": [ + { + "text": "The ability to re-use VMs across jobs steps to reduce overhead and boost performance.", + "_key": "ad7e4e7e266a", + "_type": "span", + "marks": [] + }, + { + "marks": [], + "text": "Granular control over task execution, compute, and storage resources.", + "_key": "904872d9e316", + "_type": "span" + }, + { + "text": "Infrastructure, application, and task-level logging.", + "_key": "89bb330ef825", + "_type": "span", + "marks": [] + }, + { + "text": "Improved task parallelization, including support for multi-node MPI jobs, with support for array jobs, and subtasks.", + "_key": "f93c3c9db2af", + "_type": "span", + "marks": [] + }, + { + "_key": "481194962a0c", + "_type": "span", + "marks": [], + "text": "Improved support for spot instances, which provides a significant cost saving when compared to regular instance." + }, + { + "_key": "8598251de746", + "_type": "span", + "marks": [], + "text": "Streamlined data handling and provisioning." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "99d4c194ee9b" + }, + { + "_key": "0ab0659aaa9d", + "children": [ + { + "_type": "span", + "text": "", + "_key": "5f944614e115" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "ac621e3eb3f0", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "A nice feature of Google Cloud Batch API, that fits nicely with Nextflow, is its built-in support for data ingestion from Google Cloud Storage buckets. A batch job can ", + "_key": "864a53d068fc" + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "mount", + "_key": "7fba42747ba5" + }, + { + "_type": "span", + "marks": [], + "text": " a storage bucket and make it directly accessible to a container running a Nextflow task. This feature makes data ingestion and sharing resulting data sets more efficient and reliable than other solutions.", + "_key": "d714c2eb22b9" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "", + "_key": "bec6193aaf3d", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "9d720e73b45f" + }, + { + "children": [ + { + "text": "Getting started with Google Cloud Batch", + "_key": "eed86b7db1ec", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "24472ef7c945" + }, + { + "style": "normal", + "_key": "9a272f9d4103", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Support for the Google Cloud Batch requires the latest release of Nextflow from the edge channel (version ", + "_key": "5cfe4e6ee7f9" + }, + { + "text": "22.07.1-edge", + "_key": "93b75672018d", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " or later). If you don't already have it, you can install this release using these commands:", + "_key": "8b413ed29f35" + } + ], + "_type": "block" + }, + { + "_key": "60041151b8d0", + "children": [ + { + "_key": "901f5e4a4b29", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "code": "export NXF_EDGE=1\ncurl get.nextflow.io | bash\n./nextflow -self-update", + "_type": "code", + "_key": "b1b7aecb92d5" + }, + { + "_key": "37ffba134598", + "children": [ + { + "_type": "span", + "text": "", + "_key": "6e47c7b13340" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://console.cloud.google.com/apis/dashboard", + "_key": "b9756c7ef17d" + } + ], + "children": [ + { + "_key": "70ada897b088", + "_type": "span", + "marks": [], + "text": "Make sure your Google account is allowed to access the Google Cloud Batch service by checking the " + }, + { + "text": "API & Service", + "_key": "2356a4d136c4", + "_type": "span", + "marks": [ + "b9756c7ef17d" + ] + }, + { + "_key": "09173826e758", + "_type": "span", + "marks": [], + "text": " dashboard." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "9d787343fc0c" + }, + { + "style": "normal", + "_key": "19e4b37cdfa9", + "children": [ + { + "text": "", + "_key": "a05ae2ca3be6", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/googleapis/google-auth-library-java#google-auth-library-oauth2-http", + "_key": "3314346cfcb5" + } + ], + "children": [ + { + "_key": "6ecddfb6f83c", + "_type": "span", + "marks": [], + "text": "Credentials for accessing the service are picked up by Nextflow from your environment using the usual " + }, + { + "_key": "b99c99b7a012", + "_type": "span", + "marks": [ + "3314346cfcb5" + ], + "text": "Google Application Default Credentials" + }, + { + "text": " mechanism. That is, either via the ", + "_key": "a0d6532624f6", + "_type": "span", + "marks": [] + }, + { + "text": "GOOGLE_APPLICATION_CREDENTIALS", + "_key": "52cd9d0ce17e", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " environment variable, or by using the following command to set up the environment:", + "_key": "02bbf60d94b5" + } + ], + "_type": "block", + "style": "normal", + "_key": "7a685645e82e" + }, + { + "style": "normal", + "_key": "125fad5872b6", + "children": [ + { + "_type": "span", + "text": "", + "_key": "bfa9f7aceaac" + } + ], + "_type": "block" + }, + { + "code": "gcloud auth application-default login", + "_type": "code", + "_key": "e375b4c1c016" + }, + { + "_key": "a29c1a248077", + "children": [ + { + "_type": "span", + "text": "", + "_key": "718b0910c9b2" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "d1704cfe42db", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "After authenticating yourself to Google Cloud, create a ", + "_key": "61f7f6d5c666", + "_type": "span" + }, + { + "text": "nextflow.config", + "_key": "8cd57d288fcc", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_key": "93529ebd1007", + "_type": "span", + "marks": [], + "text": " file and specify " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "google-batch", + "_key": "62fe137b5910" + }, + { + "text": " as the Nextflow executor. 
You will also need to specify the Google Cloud project where execution will occur and the Google Cloud Storage working directory for pipeline execution.", + "_key": "9b515b0b2fdf", + "_type": "span", + "marks": [] + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "a03f1188af43" + } + ], + "_type": "block", + "style": "normal", + "_key": "ff6571f25a5f" + }, + { + "_type": "code", + "_key": "dd8f7afacc2c", + "code": "cat < nextflow.config\nprocess.executor = 'google-batch'\nworkDir = 'gs://YOUR-GOOGLE-BUCKET/scratch'\ngoogle.project = 'YOUR GOOGLE PROJECT ID'\nEOT" + }, + { + "children": [ + { + "_key": "71b822039690", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "e2bc0ab7401d" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "In the above snippet replace ", + "_key": "2b604cb4eff2" + }, + { + "marks": [ + "code" + ], + "text": "<your_google_bucket>", + "_key": "1e34833483db", + "_type": "span" + }, + { + "marks": [], + "text": " with a Google Storage bucket of your choice where to store the pipeline output data and ", + "_key": "3b2372a676a2", + "_type": "span" + }, + { + "marks": [ + "code" + ], + "text": "<your_google_project_id>", + "_key": "123730ea6ac7", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " with your Google project Id where the computation will be deployed.", + "_key": "a3a87a4a498a" + } + ], + "_type": "block", + "style": "normal", + "_key": "2c0e813c80b0", + "markDefs": [] + }, + { + "_key": "ad6809ef0183", + "children": [ + { + "_type": "span", + "text": "", + "_key": "70b6f7ec49d4" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "3a33ec965d01", + "markDefs": [], + "children": [ + { + "text": "With this information, you are ready to start. You can verify that the integration is working by running the Nextflow “hello” pipeline as shown below:", + "_key": "05464d011147", + "_type": "span", + "marks": [] + } + ] + }, + { + "style": "normal", + "_key": "1d8624c613a6", + "children": [ + { + "text": "", + "_key": "05ed3fa39b9c", + "_type": "span" + } + ], + "_type": "block" + }, + { + "code": "nextflow run https://github.com/nextflow-io/hello", + "_type": "code", + "_key": "65f1788e395c" + }, + { + "_type": "block", + "style": "normal", + "_key": "e95865421c8d", + "children": [ + { + "_key": "3f527818834d", + "_type": "span", + "text": "" + } + ] + }, + { + "children": [ + { + "_key": "36a6871e401f", + "_type": "span", + "text": "Migrating Google Cloud Life Sciences pipelines to Google Cloud Batch" + } + ], + "_type": "block", + "style": "h3", + "_key": "992297e5baa2" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Google Cloud Life Sciences users can easily migrate their pipelines to Google Cloud Batch by making just a few edits to their pipeline configuration settings. 
Simply replace the ", + "_key": "2a4a869b2b3e" + }, + { + "_key": "989f53c14520", + "_type": "span", + "marks": [ + "code" + ], + "text": "google-lifesciences" + }, + { + "_type": "span", + "marks": [], + "text": " executor with ", + "_key": "1537864c70db" + }, + { + "_key": "3ab7bb3c7b99", + "_type": "span", + "marks": [ + "code" + ], + "text": "google-batch" + }, + { + "text": ".", + "_key": "b9f6ef005405", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "9d2f6caddd47" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "f0c4aac735da" + } + ], + "_type": "block", + "style": "normal", + "_key": "21753e096d5d" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "For each setting having the prefix ", + "_key": "3c3806bfab13" + }, + { + "marks": [ + "code" + ], + "text": "google.lifeScience.", + "_key": "ac7bd603aacc", + "_type": "span" + }, + { + "_key": "8a5576af5854", + "_type": "span", + "marks": [], + "text": ", there is a corresponding " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "google.batch.", + "_key": "4b07f26a38ad" + }, + { + "_type": "span", + "marks": [], + "text": " setting. Simply update these configuration settings to reflect the new service.", + "_key": "50947c03547d" + } + ], + "_type": "block", + "style": "normal", + "_key": "f2b4ec920803" + }, + { + "_type": "block", + "style": "normal", + "_key": "d967eaba392d", + "children": [ + { + "_key": "dd39ee26daf1", + "_type": "span", + "text": "" + } + ] + }, + { + "style": "normal", + "_key": "cb298e4fdbf6", + "markDefs": [ + { + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/process.html#cpus", + "_key": "3f9a8b8bcd94" + }, + { + "_key": "cf805b38221e", + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/process.html#memory" + }, + { + "href": "https://www.nextflow.io/docs/latest/process.html#time", + "_key": "9a8bf28a7f1a", + "_type": "link" + }, + { + "_key": "bddb769e0b96", + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/process.html#machinetype" + } + ], + "children": [ + { + "marks": [], + "text": "The usual process directives such as: ", + "_key": "7dab659ed391", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "3f9a8b8bcd94" + ], + "text": "cpus", + "_key": "586f1550bc9e" + }, + { + "_type": "span", + "marks": [], + "text": ", ", + "_key": "1cc68fd9bb8e" + }, + { + "_key": "4abbba6eaeb5", + "_type": "span", + "marks": [ + "cf805b38221e" + ], + "text": "memory" + }, + { + "_key": "f3de137648b3", + "_type": "span", + "marks": [], + "text": ", " + }, + { + "_type": "span", + "marks": [ + "9a8bf28a7f1a" + ], + "text": "time", + "_key": "e1286c758513" + }, + { + "_type": "span", + "marks": [], + "text": ", ", + "_key": "37c6ec102f5b" + }, + { + "text": "machineType", + "_key": "54d2efb3bb88", + "_type": "span", + "marks": [ + "bddb769e0b96" + ] + }, + { + "_key": "caa6666a2cbf", + "_type": "span", + "marks": [], + "text": " are natively supported by Google Cloud Batch, and should not be modified." 
+ } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "0813e67cdcf4", + "children": [ + { + "_key": "eafa3e5f9e8f", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "Find out more details in the ", + "_key": "015760c11fae", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "f36e720116c6" + ], + "text": "Nextflow documentation", + "_key": "9053059c9ae0" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "6cf9a73c4e19" + } + ], + "_type": "block", + "style": "normal", + "_key": "596cde8a9319", + "markDefs": [ + { + "_type": "link", + "href": "https://www.nextflow.io/docs/edge/google.html#cloud-batch", + "_key": "f36e720116c6" + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "7f54f03cdfd3" + } + ], + "_type": "block", + "style": "normal", + "_key": "176e98edecbc" + }, + { + "_type": "block", + "style": "h3", + "_key": "5a0a59da8cd5", + "children": [ + { + "text": "100% Open, Built to Scale", + "_key": "617b3ac26d29", + "_type": "span" + } + ] + }, + { + "markDefs": [ + { + "href": "https://seqera.io/", + "_key": "d8ab0bae4aef", + "_type": "link" + } + ], + "children": [ + { + "_key": "ea4d55c179e4", + "_type": "span", + "marks": [], + "text": "The Google Cloud Batch executor for Nextflow is offered as an open source contribution to the Nextflow project. The integration was developed by Google in collaboration with " + }, + { + "text": "Seqera Labs", + "_key": "9c758343ac84", + "_type": "span", + "marks": [ + "d8ab0bae4aef" + ] + }, + { + "marks": [], + "text": ". This is a validation of Google Cloud’s ongoing commitment to open source software (OSS) and a testament to the health and vibrancy of the Nextflow project. We wish to thank the entire Google Cloud Batch team, and Shamel Jacobs in particular, for their support of this effort.", + "_key": "69f92fbceece", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "fae10c9431ee" + }, + { + "children": [ + { + "_key": "e1df763b4462", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "db6356afba90" + }, + { + "children": [ + { + "_type": "span", + "text": "Conclusion", + "_key": "cfc13a6d5a5c" + } + ], + "_type": "block", + "style": "h3", + "_key": "4154937128b0" + }, + { + "style": "normal", + "_key": "d518769ab192", + "markDefs": [], + "children": [ + { + "text": "Support for Google Cloud Batch further expands the wide range of computing platforms supported by Nextflow. It empowers Nextflow users to easily access cost-effective resources, and take full advantage of the rich capabilities of the Google Cloud. Above all, it enables researchers to easily scale and collaborate, improving their productivity, and resulting in better research outcomes. ", + "_key": "b94445897015", + "_type": "span", + "marks": [] + }, + { + "_key": "f4b2c5d2c4dc", + "_type": "span", + "text": "
" + }, + { + "text": "
", + "_key": "0ebfe99b1307", + "_type": "span" + }, + { + "_key": "14fdd2e93aab", + "_type": "span", + "text": "
" + } + ], + "_type": "block" + } + ], + "publishedAt": "2022-07-13T06:00:00.000Z" + }, + { + "_type": "blogPost", + "tags": [ + { + "_type": "reference", + "_key": "ca765fcb7f9d", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493" + } + ], + "meta": { + "slug": { + "current": "nextflow-hack17" + }, + "description": "Last week saw the inaugural Nextflow meeting organised at the Centre for Genomic Regulation (CRG) in Barcelona. The event combined talks, demos, a tutorial/workshop for beginners as well as two hackathon sessions for more advanced users." + }, + "_createdAt": "2024-09-25T14:15:16Z", + "_updatedAt": "2024-10-09T08:37:10Z", + "publishedAt": "2017-09-30T06:00:00.000Z", + "_rev": "Ot9x7kyGeH5005E3MJ9TPv", + "title": "Nexflow Hackathon 2017", + "_id": "0896c3ec87d3", + "author": { + "_type": "reference", + "_ref": "evan-floden" + }, + "body": [ + { + "_type": "block", + "style": "normal", + "_key": "2cbe5322454d", + "markDefs": [], + "children": [ + { + "_key": "5623bac5c2b2", + "_type": "span", + "marks": [], + "text": "Last week saw the inaugural Nextflow meeting organised at the Centre for Genomic Regulation (CRG) in Barcelona. The event combined talks, demos, a tutorial/workshop for beginners as well as two hackathon sessions for more advanced users." + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "5cb62c98112f", + "markDefs": [], + "children": [ + { + "_key": "a5bbccc30e0d", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "_key": "ad1babe37482", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nearly 50 participants attended over the two days which included an entertaining tapas course during the first evening!", + "_key": "319bf393701b" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "de4ccbc5810d", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "5a059f6f868f", + "_type": "span", + "marks": [] + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "792ee6de9a90", + "markDefs": [], + "children": [ + { + "_key": "855702f7eca4", + "_type": "span", + "marks": [], + "text": "One of the main objectives of the event was to bring together Nextflow users to work together on common interest projects. There were several proposals for the hackathon sessions and in the end five diverse ideas were chosen for communal development ranging from new pipelines through to the addition of new features in Nextflow." 
+ } + ] + }, + { + "_key": "32f1f81213f6", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "d11af5c36154" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "5dce02951807", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nextflow-io/hack17", + "_key": "5f91cb5ad23e" + } + ], + "children": [ + { + "marks": [], + "text": "The proposals and outcomes of each the projects, which can be found in the issues section of ", + "_key": "fbac832bb6ce", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "5f91cb5ad23e" + ], + "text": "this GitHub repository", + "_key": "680dfa76dbfd" + }, + { + "text": ", have been summarised below.", + "_key": "4158c66926b6", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "style": "h2", + "_key": "84a6f578d655", + "markDefs": [], + "children": [ + { + "_key": "7de6623a47e9", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "style": "h2", + "_key": "07a5782572fe", + "markDefs": [], + "children": [ + { + "text": "Nextflow HTML tracing reports", + "_key": "e8cecc9b32b5", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The HTML tracing project aims to generate a rendered version of the Nextflow trace file to enable fast sorting and visualisation of task/process execution statistics.", + "_key": "76036d2f16c3" + } + ], + "_type": "block", + "style": "normal", + "_key": "258a40dc9efe" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "67075073adf2", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "39f3b0e1e26a", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "9549d05e22b7", + "markDefs": [], + "children": [ + { + "text": "Currently the data in the trace includes information such as CPU duration, memory usage and completion status of each task, however wading through the file is often not convenient when a large number of tasks have been executed.", + "_key": "b7666ff7501a", + "_type": "span", + "marks": [] + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "b089776480b2", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "785dae0898e7", + "markDefs": [] + }, + { + "markDefs": [ + { + "href": "https://github.com/ewels", + "_key": "158643eed0fc", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "158643eed0fc" + ], + "text": "Phil Ewels", + "_key": "4e4d5e33994c" + }, + { + "text": " proposed the idea and led the coordination effort with the outcome being a very impressive working prototype which can be found in the Nextflow branch ", + "_key": "5ba718b04428", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "code" + ], + "text": "html-trace", + "_key": "ea2e96827234", + "_type": "span" + }, + { + "_key": "38c841d6cb19", + "_type": "span", + "marks": [], + "text": "." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "4559ddcac9fa" + }, + { + "style": "normal", + "_key": "6f98c5393b5b", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "06684911ca3f" + } + ], + "_type": "block" + }, + { + "markDefs": [ + { + "_key": "91cf4fb05950", + "_type": "link", + "href": "/misc/nf-trace-report.html" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "An image of the example report is shown below with the interactive HTML available ", + "_key": "bb436f75056d" + }, + { + "text": "here", + "_key": "60da7400f352", + "_type": "span", + "marks": [ + "91cf4fb05950" + ] + }, + { + "text": ". It is expected to be merged into the main branch of Nextflow with documentation in a near-future release.", + "_key": "973c77706bf2", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "af431bc28a67" + }, + { + "_key": "534f01b2135a", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "b798026961aa", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "alt": "Nextflow HTML execution report", + "_key": "536a48473890", + "asset": { + "_ref": "image-adbf6caf7f5bc89f1f94083d37412437599c4ed4-2840x1877-png", + "_type": "reference" + }, + "_type": "image" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "08d760ec3120" + } + ], + "_type": "block", + "style": "normal", + "_key": "d7c759aac68b" + }, + { + "children": [ + { + "_key": "3b0e869a36fa", + "_type": "span", + "marks": [], + "text": "Nextflow pipeline for 16S microbial data" + } + ], + "_type": "block", + "style": "h2", + "_key": "53d9f184f744", + "markDefs": [] + }, + { + "children": [ + { + "text": "The H3Africa Bioinformatics Network have been developing several pipelines which are used across the participating centers. 
The diverse computing resources available across the nodes has led to members wanting workflow solutions with a particular focus on portability.", + "_key": "0c9c3275fbd7", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "ac791b1f515d", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_key": "2a923da3afaf", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "bc3361598fa2" + }, + { + "_type": "block", + "style": "normal", + "_key": "d8e4875d7337", + "markDefs": [ + { + "_key": "4b42a39484ae", + "_type": "link", + "href": "https://github.com/h3abionet/h3abionet16S/tree/master" + } + ], + "children": [ + { + "text": "With this is mind, Scott Hazelhurst proposed a project for a 16S Microbial data analysis pipeline which had ", + "_key": "fd3bd7460c22", + "_type": "span", + "marks": [] + }, + { + "_key": "d6c28c73f1f1", + "_type": "span", + "marks": [ + "4b42a39484ae" + ], + "text": "previously been developed using CWL" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "cbd1e9c66a43" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "7e71f2f831f7" + } + ], + "_type": "block", + "style": "normal", + "_key": "4d3ba12818f3", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "0fa63b7d2825", + "markDefs": [ + { + "href": "https://github.com/h3abionet/h3abionet16S/tree/nextflow", + "_key": "0fef7cca6c55", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The participants made a new ", + "_key": "f6dd7857e235" + }, + { + "marks": [ + "0fef7cca6c55" + ], + "text": "branch", + "_key": "96d5a20e1ace", + "_type": "span" + }, + { + "_key": "6f6bd40ba2a7", + "_type": "span", + "marks": [], + "text": " of the original pipeline and ported it into Nextflow." + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "9bb90e7fe5d5", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "6a530eca058d", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "f95a692b4c1c", + "markDefs": [], + "children": [ + { + "_key": "a635b728c9f8", + "_type": "span", + "marks": [], + "text": "The pipeline will continue to be developed with the goal of acting as a comparison between CWL and Nextflow. It is thought this can then be extended to other pipelines by both those who are already familiar with Nextflow as well as used as a tool for training newer users." + } + ] + }, + { + "style": "normal", + "_key": "ff8e50d8d4d0", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "ff4a5092c54c", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow modules prototyping", + "_key": "5ecd4291b0a3" + } + ], + "_type": "block", + "style": "h2", + "_key": "aa2a8adc1823", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "0f89e43af82f", + "markDefs": [ + { + "_type": "link", + "href": "https://toolshed.g2.bx.psu.edu/", + "_key": "727e6c3b82d0" + } + ], + "children": [ + { + "_key": "77d5d7a81c8a", + "_type": "span", + "marks": [ + "em" + ], + "text": "Toolboxing" + }, + { + "_type": "span", + "marks": [], + "text": " allows users to incorporate software into their pipelines in an efficient and reproducible manner. 
Various software repositories are becoming increasing popular, highlighted by the over 5,000 tools available in the ", + "_key": "b75c5864c3fa" + }, + { + "marks": [ + "727e6c3b82d0" + ], + "text": "Galaxy Toolshed", + "_key": "1d3a928152cb", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "a32686f8f2a6" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "661468c80496", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "02414c73cd54" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "http://biocontainers.pro/", + "_key": "26576d80d5ce" + }, + { + "_key": "f17d96ea5c98", + "_type": "link", + "href": "https://github.com/skptic" + }, + { + "_key": "fc0f2364a9aa", + "_type": "link", + "href": "https://github.com/viklund" + }, + { + "_type": "link", + "href": "https://dockstore.org", + "_key": "04f5f6d03520" + }, + { + "_type": "link", + "href": "http://genomicsandhealth.org", + "_key": "5cdcada3bae6" + } + ], + "children": [ + { + "marks": [], + "text": "Projects such as ", + "_key": "4d60dc29ae3a", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "26576d80d5ce" + ], + "text": "Biocontainers", + "_key": "3208e686ee85" + }, + { + "_key": "34dd398447c2", + "_type": "span", + "marks": [], + "text": " aim to wrap up the execution environment using containers. " + }, + { + "_type": "span", + "marks": [ + "f17d96ea5c98" + ], + "text": "Myself", + "_key": "b05a3495fb78" + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "cb493d5cd925" + }, + { + "marks": [ + "fc0f2364a9aa" + ], + "text": "Johan Viklund", + "_key": "e09b40efd21e", + "_type": "span" + }, + { + "_key": "cd741fffeb79", + "_type": "span", + "marks": [], + "text": " wished to piggyback off existing repositories and settled on " + }, + { + "text": "Dockstore", + "_key": "43a2827e96c0", + "_type": "span", + "marks": [ + "04f5f6d03520" + ] + }, + { + "text": " which is an open platform compliant with the ", + "_key": "60eb98637eee", + "_type": "span", + "marks": [] + }, + { + "_key": "6e4e8a0dde0a", + "_type": "span", + "marks": [ + "5cdcada3bae6" + ], + "text": "GA4GH" + }, + { + "text": " initiative.", + "_key": "e1a18a9a498d", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "d24ec61c74b9" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "753cedb452e5" + } + ], + "_type": "block", + "style": "normal", + "_key": "439bc1c592cf" + }, + { + "style": "normal", + "_key": "12ff1a3c7252", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The majority of tools in Dockstore are written in the CWL and therefore we required a parser between the CWL CommandLineTool class and Nextflow processes. 
Johan was able to develop a parser which generates Nextflow processes for several Dockstore tools.", + "_key": "57319b7dcaec" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "c9b8783e1b86", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "38fbfc97292d" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "67d8b0ce0ebd", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "As these resources such as Dockstore become mature and standardised, it will be possible to automatically generate a ", + "_key": "08f47baaf220" + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "Nextflow Store", + "_key": "5aee6dc7ced9" + }, + { + "marks": [], + "text": " and enable efficient incorporation of tools into workflows.", + "_key": "43a94ad4470f", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "741be4c7e4d3", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "8d8caeb3fd86" + } + ], + "_type": "block" + }, + { + "_key": "ffa6612e5068", + "src": "https://gist.github.com/pditommaso/7ccdb6e8af80133a25f259ae801371bf.js", + "_type": "script", + "id": "" + }, + { + "children": [ + { + "_type": "span", + "marks": [ + "em" + ], + "text": "Example showing a Nextflow process generated from the Dockstore CWL repository for the tool BAMStats.", + "_key": "5d165ace0856" + } + ], + "_type": "block", + "style": "normal", + "_key": "69b3e789b4a9", + "markDefs": [] + }, + { + "children": [ + { + "text": "", + "_key": "8d56d04a4b34", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "2e7c61aa053d", + "markDefs": [] + }, + { + "_type": "block", + "style": "h2", + "_key": "b16ef37b980f", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow pipeline for de novo assembly of nanopore reads", + "_key": "6f9a85c69879" + } + ] + }, + { + "style": "normal", + "_key": "b58ad4882322", + "markDefs": [ + { + "_type": "link", + "href": "https://en.wikipedia.org/wiki/Nanopore_sequencing", + "_key": "b1a8406e2562" + } + ], + "children": [ + { + "_key": "3b912f9da180", + "_type": "span", + "marks": [ + "b1a8406e2562" + ], + "text": "Nanopore sequencing" + }, + { + "_key": "c01600bcb19f", + "_type": "span", + "marks": [], + "text": " is an exciting and emerging technology which promises to change the landscape of nucleotide sequencing." 
+ } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "aabf52fe35a4", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "16aa1bfa244b" + }, + { + "_type": "block", + "style": "normal", + "_key": "09066fbdb578", + "markDefs": [ + { + "href": "https://github.com/HadrienG", + "_key": "cc06ba336389", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "With keen interest in Nanopore specific pipelines, ", + "_key": "2103fc0b00d8", + "_type": "span" + }, + { + "marks": [ + "cc06ba336389" + ], + "text": "Hadrien Gourlé", + "_key": "60dfff9c0067", + "_type": "span" + }, + { + "_key": "d276af16691a", + "_type": "span", + "marks": [], + "text": " lead the hackathon project for " + }, + { + "_key": "a6d26200efeb", + "_type": "span", + "marks": [ + "em" + ], + "text": "Nanoflow" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "ba02d6fb4470" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "2512cf2f498d", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "2efe2595ea63" + }, + { + "children": [ + { + "_key": "a971540a251d", + "_type": "span", + "marks": [ + "008845674365" + ], + "text": "Nanoflow" + }, + { + "_type": "span", + "marks": [], + "text": " is a de novo assembler of bacterials genomes from nanopore reads using Nextflow.", + "_key": "cebbb08e5345" + } + ], + "_type": "block", + "style": "normal", + "_key": "911bde2cc0ae", + "markDefs": [ + { + "_key": "008845674365", + "_type": "link", + "href": "https://github.com/HadrienG/nanoflow" + } + ] + }, + { + "style": "normal", + "_key": "0d0c781cb245", + "markDefs": [], + "children": [ + { + "_key": "1fdccf28375b", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "63fff43c4f7b", + "markDefs": [ + { + "_key": "eda4beb430e8", + "_type": "link", + "href": "https://github.com/marbl/canu" + }, + { + "href": "https://github.com/lh3/miniasm", + "_key": "52fb30b7eabc", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "During the two days the participants developed the pipeline for adapter trimming as well as assembly and consensus sequence generation using either ", + "_key": "7cba85f8e8f4" + }, + { + "_type": "span", + "marks": [ + "eda4beb430e8" + ], + "text": "Canu", + "_key": "253a74e1dc78" + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "ef78e6de75a5" + }, + { + "text": "Miniasm", + "_key": "7d6ebd8db453", + "_type": "span", + "marks": [ + "52fb30b7eabc" + ] + }, + { + "_key": "eaa09d03d0ea", + "_type": "span", + "marks": [], + "text": "." 
+ } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "87dbb8eeb35c", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "ddf25b71bbd4" + } + ] + }, + { + "children": [ + { + "text": "The future plans are to finalise the pipeline to include a polishing step and a genome annotation step.", + "_key": "adf8e37e134d", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "2365bb0d4103", + "markDefs": [] + }, + { + "style": "h2", + "_key": "871456a1f6ac", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "36c49d30f0dc", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Nextflow AWS Batch integration", + "_key": "aed4ff14f109", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "202192968458" + }, + { + "markDefs": [ + { + "href": "https://aws.amazon.com/batch/", + "_key": "9150dddf1425", + "_type": "link" + }, + { + "_type": "link", + "href": "https://github.com/fstrozzi", + "_key": "724b80736270" + } + ], + "children": [ + { + "_key": "8237cce83cbb", + "_type": "span", + "marks": [], + "text": "Nextflow already has experimental support for " + }, + { + "marks": [ + "9150dddf1425" + ], + "text": "AWS Batch", + "_key": "715486fd8b71", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " and the goal of this project proposed by ", + "_key": "032d258021a0" + }, + { + "_type": "span", + "marks": [ + "724b80736270" + ], + "text": "Francesco Strozzi", + "_key": "15b3997f81db" + }, + { + "marks": [], + "text": " was to improve this support, add features and test the implementation on real world pipelines.", + "_key": "f7c5965d2389", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "89193a08f23a" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "1dfa17147948", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "5d6761851298" + }, + { + "style": "normal", + "_key": "c843f3a81375", + "markDefs": [ + { + "_key": "b688ed74de14", + "_type": "link", + "href": "https://github.com/pditommaso" + } + ], + "children": [ + { + "marks": [], + "text": "Earlier work from ", + "_key": "7fbd20436da5", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "b688ed74de14" + ], + "text": "Paolo Di Tommaso", + "_key": "13f305cc9f7f" + }, + { + "marks": [], + "text": " in the Nextflow repository, highlighted several challenges to using AWS Batch with Nextflow.", + "_key": "74f9ce8442f7", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "d37a042922df" + } + ], + "_type": "block", + "style": "normal", + "_key": "eac64202fe11" + }, + { + "_type": "block", + "style": "normal", + "_key": "3445894ec8b4", + "markDefs": [ + { + "href": "https://github.com/tdudgeon", + "_key": "02d385e8b03b", + "_type": "link" + } + ], + "children": [ + { + "text": "The major obstacle described by ", + "_key": "5302f92fe57f", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "02d385e8b03b" + ], + "text": "Tim Dudgeon", + "_key": "88c4243aef47", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " was the requirement for each Docker container to have a version of the Amazon Web Services Command Line tools (aws-cli) installed.", + "_key": "94f2a875fe05" + } + ] + }, + { + 
"style": "normal", + "_key": "d93497f8cde0", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "d6e9d04e08f9", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "A solution was to install the AWS CLI tools on a custom AWS image that is used by the Docker host machine, and then mount the directory that contains the necessary items into each of the Docker containers as a volume. Early testing suggests this approach works with the hope of providing a more elegant solution in future iterations.", + "_key": "041098f4286f" + } + ], + "_type": "block", + "style": "normal", + "_key": "367d516bae7c" + }, + { + "style": "normal", + "_key": "fde2bc0962b5", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "1bdf95f173e8" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "cd39d7f27619", + "markDefs": [], + "children": [ + { + "_key": "b01ce1ef1682", + "_type": "span", + "marks": [], + "text": "The code and documentation for AWS Batch has been prepared and will be tested further before being rolled into an official Nextflow release in the near future." + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "3091842ae0df", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "89ffb4f4bb89", + "_type": "span", + "marks": [] + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Conclusion", + "_key": "d867d1b93003" + } + ], + "_type": "block", + "style": "h2", + "_key": "9989b3feb70a", + "markDefs": [] + }, + { + "_key": "826b931a71eb", + "markDefs": [], + "children": [ + { + "text": "The event was seen as an overwhelming success and special thanks must be made to all the participants. 
As the Nextflow community continues to grow, it would be fantastic to make these types of meetings more regular occasions.", + "_key": "11a6b5509a9c", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "c41c80cd070a", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "94d3a6224eaf" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "text": "In the meantime, we have put together a short video containing some of the highlights of the two days.", + "_key": "8394b2d77a0a", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "84f7f4981a4c" + }, + { + "_type": "block", + "style": "normal", + "_key": "ec5f02692747", + "markDefs": [], + "children": [ + { + "_key": "b1440212574e", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "We hope to see you all again in Barcelona soon or at new events around the world!", + "_key": "af0e17f72dc0", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "473c4729c8fe" + }, + { + "_key": "bbae86e9e0fb", + "_type": "youtube", + "id": "s7SqYMRiY8w" + }, + { + "markDefs": [], + "children": [ + { + "_key": "2850604cfc73", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "3bbdfcd8d64a" + } + ] + }, + { + "tags": [ + { + "_type": "reference", + "_key": "aa92e6cacd9e", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493" + }, + { + "_key": "2c4557cbb4ce", + "_ref": "c64dbc74-f995-4eb6-a9a8-6c79e75884e9", + "_type": "reference" + } + ], + "author": { + "_type": "reference", + "_ref": "paolo-di-tommaso" + }, + "_type": "blogPost", + "_id": "0a084ffb6efb", + "meta": { + "slug": { + "current": "introducing-nextflow-for-azure-batch" + } + }, + "publishedAt": "2021-02-22T07:00:00.000Z", + "_rev": "hf9hwMPb7ybAE3bqEU1xB1", + "_createdAt": "2024-09-25T14:15:56Z", + "title": "Introducing Nextflow for Azure Batch", + "body": [ + { + "style": "normal", + "_key": "591b799f20bb", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "When the Nextflow project was created, one of the main drivers was to enable reproducible data pipelines that could be deployed across a wide range of execution platforms with minimal effort, as well as to empower users to scale their data analysis while facilitating the migration to the cloud.", + "_key": "9e0aed7d0f12", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "ebf91b616224", + "children": [ + { + "_type": "span", + "text": "", + "_key": "a237f4053759" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "21ee8dddaea2", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Throughout the years, the computing services provided by cloud vendors have evolved in a spectacular manner. Eight years ago, the model was focused on launching virtual machines in the cloud; then came containers, and then the idea of serverless computing, which changed everything again. However, the power of the Nextflow abstraction consists of hiding the complexity of the underlying platform. 
Through the concept of executors, emerging technologies and new platforms can be easily adapted with no changes required to user pipelines.", + "_key": "88d30432ea48" + } + ] + }, + { + "_key": "0287a5a0a09f", + "children": [ + { + "_type": "span", + "text": "", + "_key": "8bd2d758fcd7" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "79d6200370d2", + "markDefs": [ + { + "href": "https://azure.microsoft.com/en-us/services/batch/", + "_key": "654e38c2393e", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "With this in mind, we could not be more excited to announce that over the past months we have been working with Microsoft to implement built-in support for ", + "_key": "195d6dcada93", + "_type": "span" + }, + { + "text": "Azure Batch", + "_key": "35fc65093b98", + "_type": "span", + "marks": [ + "654e38c2393e" + ] + }, + { + "text": " into Nextflow. Today we are delighted to make it available to all users as a beta release.", + "_key": "e668f013d384", + "_type": "span", + "marks": [] + } + ] + }, + { + "children": [ + { + "text": "", + "_key": "2b99bdeebe72", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "de735c38e4ec" + }, + { + "_type": "block", + "style": "h3", + "_key": "0c4fae962d77", + "children": [ + { + "text": "How does it work", + "_key": "1960f5dfff66", + "_type": "span" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Azure Batch is a cloud-based computing service that allows the execution of highly scalable, container based, workloads in the Azure cloud.", + "_key": "bcd734489efc", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "4fee32dd51c4" + }, + { + "style": "normal", + "_key": "9e566c970598", + "children": [ + { + "_key": "1ef729198799", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The support for Nextflow comes in the form of a plugin which implements a new executor, not surprisingly named ", + "_key": "bb7a93944c20" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "azurebatch", + "_key": "b846fd0b7bcb" + }, + { + "marks": [], + "text": ", which offloads the execution of the pipeline jobs to corresponding Azure Batch jobs.", + "_key": "a0a0b4bc0865", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "d30f49a377e8" + }, + { + "style": "normal", + "_key": "dbf4dbf53f91", + "children": [ + { + "text": "", + "_key": "0d72521d9f86", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "b8ba219ffa1e", + "markDefs": [ + { + "_key": "ffd6707366ff", + "_type": "link", + "href": "https://azure.microsoft.com/en-us/services/storage/blobs/" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Each job run consists in practical terms of a container execution which ships the job dependencies and carries out the job computation. 
As usual, each job is assigned a unique working directory allocated into a ", + "_key": "5ae00fca305e" + }, + { + "text": "Azure Blob", + "_key": "14ba1f170a25", + "_type": "span", + "marks": [ + "ffd6707366ff" + ] + }, + { + "_type": "span", + "marks": [], + "text": " container.", + "_key": "e2a12db15d63" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "1910d5d08c1b", + "children": [ + { + "text": "", + "_key": "6d46c20672ea", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "h3", + "_key": "c80db31ffb21", + "children": [ + { + "_key": "a802cd9b491a", + "_type": "span", + "text": "Let's get started!" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "009a845e4cb0", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The support for Azure Batch requires the latest release of Nextflow from the ", + "_key": "e35eb4258363" + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "edge", + "_key": "76ed043b5252" + }, + { + "text": " channel (version 21.02-edge or later). If you don't have this, you can install it using these commands:", + "_key": "b81a428b8a52", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "7bcf4140af28" + } + ], + "_type": "block", + "style": "normal", + "_key": "66708489693d" + }, + { + "code": "export NXF_EDGE=1\ncurl get.nextflow.io | bash\n./nextflow -self-update", + "_type": "code", + "_key": "a8dc956bbb68" + }, + { + "style": "normal", + "_key": "8688c5f27141", + "children": [ + { + "_type": "span", + "text": "", + "_key": "32c66af65c70" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "f0a2001fd548", + "_type": "span", + "marks": [], + "text": "Note for Windows users: as Nextflow is a " + }, + { + "_type": "span", + "text": "\\*", + "_key": "2848ec97018c" + }, + { + "_key": "09b166d52a64", + "_type": "span", + "marks": [], + "text": "nix-based tool, you will need to run it using the " + }, + { + "text": "Windows Subsystem for Linux", + "_key": "2bc45c1672d9", + "_type": "span", + "marks": [ + "245705cea5e5" + ] + }, + { + "marks": [], + "text": ". Also make sure Java 8 or later is installed in the Linux environment.", + "_key": "e798f961b3de", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "126b2a778ed2", + "markDefs": [ + { + "_type": "link", + "href": "https://docs.microsoft.com/en-us/windows/wsl/install-win10", + "_key": "245705cea5e5" + } + ] + }, + { + "style": "normal", + "_key": "60e4027cd4d1", + "children": [ + { + "text": "", + "_key": "0ced195d0a01", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "1e1c20fa1125", + "markDefs": [], + "children": [ + { + "_key": "3465a8898f1d", + "_type": "span", + "marks": [], + "text": "Once Nextflow is installed, to run your data pipelines with Azure Batch, you will need to create an Azure Batch account in the region of your choice using the Azure Portal. In a similar manner, you will need an Azure Blob container."
+ } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "3ee0f335db52", + "children": [ + { + "_type": "span", + "text": "", + "_key": "e404d70aaa9f" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "With the Azure Batch and Blob storage container configured, your ", + "_key": "c732ccfab964" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "nextflow.config", + "_key": "ceac146a0a83" + }, + { + "_type": "span", + "marks": [], + "text": " file should be set up similar to the example below:", + "_key": "688a9d75b3ce" + } + ], + "_type": "block", + "style": "normal", + "_key": "40105ce4f958", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "9a2762859822", + "children": [ + { + "_type": "span", + "text": "", + "_key": "551d7d26ae3c" + } + ] + }, + { + "code": "plugins {\n id 'nf-azure'\n}\n\nprocess {\n executor = 'azurebatch'\n}\n\nazure {\n batch {\n location = 'westeurope'\n accountName = ''\n accountKey = ''\n autoPoolMode = true\n }\n storage {\n accountName = \"\"\n accountKey = \"\"\n }\n}", + "_type": "code", + "_key": "1d2111be77ce" + }, + { + "style": "normal", + "_key": "837d675481cb", + "children": [ + { + "_type": "span", + "text": "", + "_key": "1e53e062e81c" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "ce8a96f4bbf7", + "markDefs": [], + "children": [ + { + "_key": "1185772ce306", + "_type": "span", + "marks": [], + "text": "Using this configuration snippet, Nextflow will automatically create the virtual machine pool(s) required to deploy the pipeline execution in the Azure Batch service." + } + ] + }, + { + "_key": "9267f273c83a", + "children": [ + { + "_key": "8c2ec7a572d2", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "41d25c7e55c8", + "_type": "span", + "marks": [], + "text": "Now you will be able to launch the pipeline execution using the following command:" + } + ], + "_type": "block", + "style": "normal", + "_key": "09e51c8df762", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "6bf857d740b2" + } + ], + "_type": "block", + "style": "normal", + "_key": "824b73f9a4cd" + }, + { + "code": "nextflow run <pipeline name> -w az://my-container/work", + "_type": "code", + "_key": "7e93f24f8d00" + }, + { + "_type": "block", + "style": "normal", + "_key": "a3eea520ae20", + "children": [ + { + "_key": "5ad2b1d1b561", + "_type": "span", + "text": "" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "b657523a2148", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Replace ", + "_key": "8cefc8847578", + "_type": "span" + }, + { + "marks": [ + "code" + ], + "text": "<pipeline name>", + "_key": "8959e6cedd7a", + "_type": "span" + }, + { + "_key": "fd52e16e9c1f", + "_type": "span", + "marks": [], + "text": " with a pipeline name e.g. 
nextflow-io/rnaseq-nf and " + }, + { + "marks": [ + "code" + ], + "text": "my-container", + "_key": "0238a7e36908", + "_type": "span" + }, + { + "text": " with a blob container in the storage account as defined in the above configuration.", + "_key": "5711eeb2dfe2", + "_type": "span", + "marks": [] + } + ] + }, + { + "_key": "c202e39dfc52", + "children": [ + { + "text": "", + "_key": "c555a32d1e88", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "8a537cd7aaa1", + "markDefs": [ + { + "_key": "9c01fbbc6940", + "_type": "link", + "href": "/docs/edge/azure.html" + } + ], + "children": [ + { + "_key": "a8cf67eda54a", + "_type": "span", + "marks": [], + "text": "For more details regarding the Nextflow configuration settings for Azure Batch, refer to the Nextflow documentation at " + }, + { + "marks": [ + "9c01fbbc6940" + ], + "text": "this link", + "_key": "6759dbb437b9", + "_type": "span" + }, + { + "text": ".", + "_key": "1384952b42ef", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_key": "125376bfc002", + "children": [ + { + "_key": "255d192d50d3", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "h3", + "_key": "b5dcfaa87d16", + "children": [ + { + "_type": "span", + "text": "Conclusion", + "_key": "0e1e72604a01" + } + ], + "_type": "block" + }, + { + "children": [ + { + "marks": [], + "text": "The support for Azure Batch further expands the wide range of computing platforms supported by Nextflow and empowers Nextflow users to deploy their data pipelines in the cloud provider of their choice. Above all, it allows researchers to scale, collaborate and share their work without being locked into a specific platform.", + "_key": "5e3e8a5658b2", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "92f4a4585e0c", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "1da01d6a01b1", + "children": [ + { + "_key": "148abec6e309", + "_type": "span", + "text": "" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "We thank Microsoft, and in particular ", + "_key": "77c99860013f" + }, + { + "marks": [ + "0f7c716d6d0b" + ], + "text": "Jer-Ming Chia", + "_key": "f0b1f42547ae", + "_type": "span" + }, + { + "text": ", who works in the HPC and AI team, for having supported and sponsored this open source contribution to the Nextflow framework. ", + "_key": "c2f1096aeb99", + "_type": "span", + "marks": [] + }, + { + "text": "", + "_key": "2a732a2a35bd", + "_type": "span" + }, + { + "_type": "span", + "text": "
", + "_key": "ae1f9cee3382" + }, + { + "text": "
", + "_key": "74773e700090", + "_type": "span" + }, + { + "_type": "span", + "text": "
", + "_key": "1637d28066f2" + }, + { + "_key": "7b9c7d7240fc", + "_type": "span", + "text": "
" + }, + { + "_type": "span", + "text": "
", + "_key": "ccc4761e9b03" + } + ], + "_type": "block", + "style": "normal", + "_key": "48d444d93b73", + "markDefs": [ + { + "href": "https://www.linkedin.com/in/jermingchia/", + "_key": "0f7c716d6d0b", + "_type": "link" + } + ] + } + ], + "_updatedAt": "2024-09-26T09:02:28Z" + }, + { + "_rev": "Qhrcj1462eoyp9RZGGQNso", + "author": { + "_type": "reference", + "_ref": "rob-syme" + }, + "_createdAt": "2024-08-27T08:23:51Z", + "_updatedAt": "2024-09-16T07:32:05Z", + "publishedAt": "2024-09-02T07:17:00.000Z", + "_id": "0b2d6b7b-2e03-41e1-8b10-74ad41686e89", + "title": "Optimizing image segmentation modeling using Seqera Platform", + "meta": { + "slug": { + "current": "data-studios-image-segmentation", + "_type": "slug" + }, + "_type": "meta", + "description": "Performing interactive analysis is considered one of the most difficult phases in the entire bioinformatics process. User-friendly interactive environments that are adjacent to your data and streamline the end-to end analysis process are critical.\n", + "noIndex": false + }, + "body": [ + { + "_key": "92fe05ff0537", + "markDefs": [], + "children": [ + { + "_key": "631a6e0767360", + "_type": "span", + "marks": [], + "text": "Scientific research is rarely direct, and workflows commonly require further downstream analyses beyond pipeline runs. While Nextflow excels at batch automation, human interpretation of the generated data is also an essential part of the scientific process. Interactive environments facilitate this process by enabling model refinement and report generation, increasing efficiency and facilitating informed decision-making." + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "href": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8482564/", + "_key": "1bbb46da0dcc", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "Performing interactive analysis is considered one of the ", + "_key": "dbe6c56444880", + "_type": "span" + }, + { + "marks": [ + "1bbb46da0dcc" + ], + "text": "most challenging steps in the entire bioinformatics process", + "_key": "442e2de71eb8", + "_type": "span" + }, + { + "text": ". Users face cumbersome, time-consuming, and error-prone manual tasks such as transferring data from the cloud to local storage and navigating various APIs, programming languages, libraries, and tools. 
", + "_key": "201701c45af0", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "User-friendly interactive environments", + "_key": "04f2d036269a" + }, + { + "text": " that exist adjacent to your data are critical to streamline end-to-end computational analyses.", + "_key": "9aeb813c2016", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "fc43232ac000" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://docs.seqera.io/platform/latest/data/data-studios", + "_key": "11674253ce1c" + }, + { + "href": "https://seqera.io/blog/data-studios-announcement/", + "_key": "203ac6ca0086", + "_type": "link" + }, + { + "href": "https://nf-co.re/molkart/1.0.0", + "_key": "2e6de2623899", + "_type": "link" + } + ], + "children": [ + { + "text": "Seqera’s ", + "_key": "1a4b67c2eec10", + "_type": "span", + "marks": [] + }, + { + "text": "Data Studios", + "_key": "1a4b67c2eec11", + "_type": "span", + "marks": [ + "11674253ce1c" + ] + }, + { + "marks": [], + "text": " bridges the gap between pipeline outputs and secure interactive analysis environments by bringing ", + "_key": "1a4b67c2eec12", + "_type": "span" + }, + { + "text": "reproducible, containerized and interactive analytical notebook environments", + "_key": "1a4b67c2eec13", + "_type": "span", + "marks": [ + "203ac6ca0086" + ] + }, + { + "_type": "span", + "marks": [], + "text": " to your data. In this way, the output of one workflow can be analyzed manually and be used as the input for a subsequent workflow. Here, we show how a scientist can use the Seqera Platform’s Runs and Data Studios features to ", + "_key": "1a4b67c2eec14" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "optimize image segmentation model iteration", + "_key": "1a4b67c2eec15" + }, + { + "_key": "1a4b67c2eec16", + "_type": "span", + "marks": [], + "text": " in the " + }, + { + "_type": "span", + "marks": [ + "2e6de2623899" + ], + "text": "nf-core/molkart", + "_key": "1a4b67c2eec17" + }, + { + "text": " pipeline.", + "_key": "1a4b67c2eec18", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "20397f5fc865" + }, + { + "style": "normal", + "_key": "a3328a6c71f1", + "markDefs": [], + "children": [ + { + "_key": "6fcebc357158", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "Watch the full presentation from Nextflow Summit in Boston, May 2024 ", + "_key": "3344b7d0b586", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "blockquote", + "_key": "915063774d4c", + "markDefs": [] + }, + { + "_key": "2c940faebb5f", + "_type": "youtube", + "id": "sIFL-Pk9Wl4" + }, + { + "markDefs": [], + "children": [ + { + "marks": [ + "strong" + ], + "text": "How does image segmentation work?", + "_key": "700ac256fecf", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "7df84a8fb865" + }, + { + "_key": "6d0963afece2", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "A central task in molecular biology is quantifying the abundance of different molecules (often RNAs or proteins) per cell or structure. Traditionally, this was done by sampling entire tissues or, in later approaches, using single-cell methods to measure such molecules within each cell. 
However, both bulk and single-cell omics methods lose information about the spatial organization of cells within a tissue, a key factor during tissue development and a potential driver for diseases like cancer. Spatial omics, which combines imaging with ultra-sensitive assays to measure molecules, now allows the identification of hundreds to thousands of transcripts on tissue sections.", + "_key": "5a978796f4610" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "806e4ab36139", + "markDefs": [ + { + "_type": "link", + "href": "http://nf-core/molkart", + "_key": "b3d4e5d943ce" + }, + { + "_type": "link", + "href": "https://resolvebiosciences.com/", + "_key": "f2172ebc7417" + }, + { + "_type": "link", + "href": "https://github.com/MouseLand/cellpose", + "_key": "aa9d7482adbf" + } + ], + "children": [ + { + "_key": "707cd6cbb7851", + "_type": "span", + "marks": [ + "b3d4e5d943ce" + ], + "text": "nf-core/molkart" + }, + { + "text": " is a spatial transcriptomics pipeline for processing ", + "_key": "707cd6cbb7852", + "_type": "span", + "marks": [] + }, + { + "text": "Molecular Cartography data by Resolve Bioscience", + "_key": "707cd6cbb7853", + "_type": "span", + "marks": [ + "f2172ebc7417" + ] + }, + { + "text": ", which measures hundreds of RNA transcripts on a tissue section using single-molecule fluorescent in-situ hybridization (smFISH) (Figure 1). This pipeline includes a Nextflow module for the popular segmentation method ", + "_key": "707cd6cbb7854", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "aa9d7482adbf" + ], + "text": "Cellpose", + "_key": "707cd6cbb7855", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ", which allows a human-in-the-loop approach for improving cell segmentation. Conveniently, the nf-core/molkart pipeline includes a workflow branch for generating custom training data from a source data set. Training a performant, custom cellpose model typically requires multiple time consuming human-in-the-loop model iterations within an interactive analysis environment.\n", + "_key": "707cd6cbb7856" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "image", + "_key": "ff68a5ff91f0", + "asset": { + "_type": "reference", + "_ref": "image-2bf639c49db818e0ac460c03bf1358c842865511-1600x900-png" + } + }, + { + "children": [ + { + "_type": "span", + "marks": [ + "strong", + "em" + ], + "text": "Figure 1. ", + "_key": "e38653a66779" + }, + { + "marks": [ + "em" + ], + "text": "Adapted workflow diagram of the nf-core/molkart pipeline for processing molecular cartography data using Nextflow. 
Original image data shown was taken from the literature (", + "_key": "ac613f5dd761", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "em", + "71d69b8521c9" + ], + "text": "Perico et al", + "_key": "6de7c14d3d24" + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": ".).", + "_key": "76c52cc20136" + } + ], + "_type": "block", + "style": "normal", + "_key": "782b164da2b9", + "markDefs": [ + { + "_type": "link", + "href": "https://www.biorxiv.org/content/10.1101/2024.02.05.578898v3", + "_key": "71d69b8521c9" + } + ] + }, + { + "_key": "4d9dddc44892", + "markDefs": [ + { + "_type": "link", + "href": "https://www.biorxiv.org/content/10.1101/2024.02.05.578898v3", + "_key": "e387c2f0dc3e" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "We used Data Studios to bring the tertiary analysis adjacent to the data in cloud storage, using data from a 2024 preprint by ", + "_key": "d2dab384f2980" + }, + { + "_type": "span", + "marks": [ + "e387c2f0dc3e" + ], + "text": "Perico et al", + "_key": "46434dbd743d" + }, + { + "_type": "span", + "marks": [], + "text": ". This allows us to iteratively train and improve a custom cellpose model for our specific dataset (Figure 2).", + "_key": "57b73961eea5" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "52cd6f4b80c7", + "asset": { + "_ref": "image-2d79c51097a15dfee042f120bbda1bebe4b129a4-1600x900-png", + "_type": "reference" + }, + "_type": "image" + }, + { + "style": "normal", + "_key": "7a444b4d08d3", + "markDefs": [ + { + "_type": "link", + "href": "https://www.biorxiv.org/content/10.1101/2024.02.05.578898v3", + "_key": "2c25749c710f" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "strong", + "em" + ], + "text": "Figure 2. ", + "_key": "dfba0db9702b0" + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "Adapted workflow diagram of the nf-core/molkart pipeline using Data Studios (highlighted in gray) to iteratively train a custom cellpose model to use as input for cell segmentation. 
Original image data shown was taken from the literature (", + "_key": "fdbbdf7dd8a9" + }, + { + "marks": [ + "em", + "2c25749c710f" + ], + "text": "Perico et al", + "_key": "862bb7f81cd3", + "_type": "span" + }, + { + "_key": "a88e1550c228", + "_type": "span", + "marks": [ + "em" + ], + "text": ".).\n" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "h2", + "_key": "3b0ef9de8028", + "markDefs": [], + "children": [ + { + "marks": [ + "strong" + ], + "text": "Adding Data Studios to the workflow", + "_key": "ebdcac6fc010", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "f7eb2c08e442", + "markDefs": [], + "children": [ + { + "_key": "f286f6dec6a60", + "_type": "span", + "marks": [], + "text": "Using Data Studios as part of an adapted workflow was extremely beneficial:" + } + ] + }, + { + "_key": "458e3394230e", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "797be56fead20" + } + ], + "_type": "block", + "style": "normal" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "ff0a1dded277", + "listItem": "number", + "markDefs": [ + { + "href": "https://napari.org/stable/", + "_key": "33bd89a47469", + "_type": "link" + }, + { + "_type": "link", + "href": "https://qupath.github.io/", + "_key": "7ddd78939bfc" + }, + { + "_key": "a8f46f9aa40a", + "_type": "link", + "href": "https://imagej.net/software/fiji/" + } + ], + "children": [ + { + "text": "Rapid review of image training data", + "_key": "b9b6ceaf6cc30", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_type": "span", + "marks": [], + "text": " –", + "_key": "b9b6ceaf6cc31" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": " ", + "_key": "b9b6ceaf6cc32" + }, + { + "marks": [], + "text": "Images can be quickly reviewed directly in the cloud-hosted Data Studio analysis environment using common tools such as ", + "_key": "b9b6ceaf6cc33", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "33bd89a47469" + ], + "text": "napari", + "_key": "b9b6ceaf6cc34" + }, + { + "marks": [], + "text": ", ", + "_key": "b9b6ceaf6cc35", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "7ddd78939bfc" + ], + "text": "QuPath", + "_key": "b9b6ceaf6cc36" + }, + { + "text": ", or ", + "_key": "b9b6ceaf6cc37", + "_type": "span", + "marks": [] + }, + { + "text": "Fiji", + "_key": "b9b6ceaf6cc38", + "_type": "span", + "marks": [ + "a8f46f9aa40a" + ] + }, + { + "_key": "b9b6ceaf6cc39", + "_type": "span", + "marks": [], + "text": ". Prior to Data Studios, bioinformaticians would typically download the images, review, and re-upload to blob storage." + } + ] + }, + { + "listItem": "number", + "markDefs": [], + "children": [ + { + "marks": [ + "strong" + ], + "text": "Collaboratively train a custom model in-situ ", + "_key": "1774f85b64fa0", + "_type": "span" + }, + { + "text": "– Using a GPU-enabled compute environment for the Data Studios session, we used cellpose to train a new custom model on-the-fly using the previously generated image crops. 
Using a shareable URL, Data Studios enables seamless collaboration between data scientists and bench scientists with domain expertise in a single location.", + "_key": "1774f85b64fa1", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "2ea3f4af5025" + }, + { + "_type": "block", + "style": "normal", + "_key": "9aaefadfd9da", + "listItem": "number", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Apply the new model to the original data ", + "_key": "5596d4c9c4d70" + }, + { + "text": "– The new, manually-trained model was then applied to the original, full size image dataset. The cell segmentation results of the custom model can be inspected in the same Data Studios instance using any standard tool.\n", + "_key": "3e9f21a0c9d6", + "_type": "span", + "marks": [] + } + ], + "level": 1 + }, + { + "_type": "image", + "_key": "0b1b42edce43", + "asset": { + "_ref": "image-481b2d03aee5622987b731a4819945bdced48291-1920x1080-png", + "_type": "reference" + } + }, + { + "_key": "164ca6145027", + "markDefs": [ + { + "_key": "b3bd36addbaf", + "_type": "link", + "href": "https://www.biorxiv.org/content/10.1101/2024.02.05.578898v3" + } + ], + "children": [ + { + "marks": [ + "strong", + "em" + ], + "text": "Figure 3. ", + "_key": "bc888466590a", + "_type": "span" + }, + { + "_key": "94887d38f64d", + "_type": "span", + "marks": [ + "em" + ], + "text": "Schematic workflow of image segmentation using nf-core/molkart with (bottom) and without (top) Data Studios. Original image data shown was taken from the literature (" + }, + { + "_type": "span", + "marks": [ + "b3bd36addbaf", + "em" + ], + "text": "Perico et al", + "_key": "9bf7d5bb618e" + }, + { + "marks": [ + "em" + ], + "text": ".).", + "_key": "c8a5c75c8854", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "marks": [ + "strong" + ], + "text": "The benefits of Data Studios", + "_key": "ce0ed88dbf6a0", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "bf6040867323", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "0e2b6a59dd98", + "listItem": "bullet", + "markDefs": [ + { + "_key": "4c1738586069", + "_type": "link", + "href": "https://seqera.io/fusion/" + } + ], + "children": [ + { + "_key": "c727cb86c6ac0", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Data remains in-situ" + }, + { + "marks": [], + "text": " – No shuttling large volumes of data back and forth between your cloud storage and local analysis environments, which can quickly become expensive with ingress and egress charges, is extremely inefficient, and can result in data loss. 
Using the ", + "_key": "c727cb86c6ac1", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "4c1738586069" + ], + "text": "Fusion file system", + "_key": "c727cb86c6ac2" + }, + { + "marks": [], + "text": ", Data Studios enables direct file access to cloud blob storage and is incredibly performant.", + "_key": "c727cb86c6ac3", + "_type": "span" + } + ], + "level": 1 + }, + { + "style": "normal", + "_key": "b8155228df9a", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "28efa7b3f7af0" + } + ], + "_type": "block" + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Stable, containerized analysis environments", + "_key": "d389e047db4e0" + }, + { + "marks": [], + "text": " – Data Studio sessions are checkpointed, and can be rolled back to any previous state each time the session is stopped and restarted. Each checkpoint preserves the state of the running machine at a point in time, ensuring consistency and reproducibility of the environment, the software used, and data worked with.", + "_key": "d389e047db4e1", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "9ca108cbb51b" + }, + { + "_type": "block", + "style": "normal", + "_key": "ca582579a6fd", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "6f1fd4d1251a0" + } + ] + }, + { + "style": "normal", + "_key": "1415c744c5c3", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Provision only the resources you need", + "_key": "52f988cd7a9d0", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "marks": [], + "text": " – Data Studio sessions are fully customizable. Based on the analysis task(s) at hand, they can be provisioned as lean or as fully-featured as required, for example, making them GPU-enabled or adding hundreds of cores.", + "_key": "52f988cd7a9d1", + "_type": "span" + } + ], + "level": 1, + "_type": "block" + }, + { + "_key": "7e36cc508b12", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "36a4f06d680e0", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Permissions are centrally managed", + "_key": "745473d296350" + }, + { + "_key": "745473d296351", + "_type": "span", + "marks": [], + "text": " – Organization and workspace credentials are centrally managed by your organization administrators, ensuring only authenticated users with the appropriate permissions can connect to the data and analysis environment(s). Bioinformaticians and data scientists shouldn’t spend time managing infrastructure and permissions." 
+ } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "c086ca066584" + }, + { + "_key": "d6ed9176d77a", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "6320ec1510da0" + } + ], + "_type": "block", + "style": "normal" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "5af126e9be9c", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Secure, real time collaboration –", + "_key": "b08f197315d30" + }, + { + "_type": "span", + "marks": [], + "text": " The shareable URL feature ensures safe collaboration within, or across, bioinformatician and data science teams.", + "_key": "b08f197315d31" + } + ] + }, + { + "style": "normal", + "_key": "47ce6afd2c6e", + "markDefs": [], + "children": [ + { + "_key": "375ea8c072cb", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "h2", + "_key": "ef8ae7eca4d6", + "markDefs": [], + "children": [ + { + "text": "Streamline the entire data lifecycle", + "_key": "caf7f2d160530", + "_type": "span", + "marks": [ + "strong" + ] + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "Data Studios can ", + "_key": "c08a98a9fb88", + "_type": "span" + }, + { + "text": "streamline the entire end-to-end scientific data lifecycle", + "_key": "0ed1694aa9ad", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_key": "a3942ac3f18e", + "_type": "span", + "marks": [], + "text": " by bringing reproducible, containerized and interactive analytical notebook environments to your data in real-time. This allows you to seamlessly transition from Nextflow pipeline outputs to secure interactive environments, consolidating data and analytics into one unified location." + } + ], + "_type": "block", + "style": "normal", + "_key": "84b24ceaf43a", + "markDefs": [] + }, + { + "style": "normal", + "_key": "564b3312ce1a", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "befe51b498150" + } + ], + "_type": "block" + }, + { + "style": "blockquote", + "_key": "fbdc3a7ad46c", + "markDefs": [], + "children": [ + { + "text": "“Data Studios enables the creation of the needed package environment for any project quickly, expediting the project start-up process. 
This allows us to promptly focus on data analysis and efficiently share the environment with the team”\n\n- ", + "_key": "b7d2c5cc7e600", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "strong" + ], + "text": "Lorena Pantano, PhD\nDirector of Bioinformatics Platform, Harvard Chan Bioinformatics Core", + "_key": "6fb6b1456ec5", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "8c0662e7f397", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "270b32efa102" + } + ], + "_type": "block" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://hubs.la/Q02Nhk4y0", + "_key": "98aa74b74492" + }, + { + "_key": "cf07f789df27", + "_type": "link", + "href": "https://hubs.la/Q02NhjDZ0" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "View Data Studios in the Seqera Platform ", + "_key": "76dd999d4e8f0" + }, + { + "text": "Community Showcase workspace", + "_key": "76dd999d4e8f1", + "_type": "span", + "marks": [ + "98aa74b74492" + ] + }, + { + "marks": [], + "text": " or start a ", + "_key": "76dd999d4e8f2", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "cf07f789df27" + ], + "text": "free trial today", + "_key": "76dd999d4e8f3" + }, + { + "_type": "span", + "marks": [], + "text": "!", + "_key": "76dd999d4e8f4" + } + ], + "_type": "block", + "style": "normal", + "_key": "0fa70ffe4664" + } + ], + "_type": "blogPost", + "tags": [ + { + "_ref": "82fd60f1-c6d0-4b8a-9c5d-f971c622f341", + "_type": "reference", + "_key": "07cea8dc5caa" + }, + { + "_key": "5b7351bdac98", + "_ref": "f1d61674-9374-4d2c-97c2-55778db7c922", + "_type": "reference" + }, + { + "_type": "reference", + "_key": "cb6a3c4b282b", + "_ref": "32377094-ace0-4f1e-bb48-b47f02d3849e" + }, + { + "_type": "reference", + "_key": "48f31265ebdd", + "_ref": "b70b4c8b-10e9-4630-b43f-e11b33f14daf" + }, + { + "_key": "231a068aa82a", + "_ref": "8c6a46a2-4653-49fb-a5c3-ddf572a75381", + "_type": "reference" + }, + { + "_type": "reference", + "_key": "28d742bb8a84", + "_ref": "2b5c9a56-b491-42aa-b291-86611d77ccec" + } + ] + }, + { + "_createdAt": "2024-05-13T11:54:28Z", + "_rev": "UBGILU345IzqgWYhEN5Di2", + "_type": "blogPost", + "author": { + "_ref": "109f0c7b-3d40-42a9-af77-3844f0e031c0", + "_type": "reference" + }, + "tags": [ + { + "_key": "f30d3e591314", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference" + }, + { + "_type": "reference", + "_key": "4525d8907a1f", + "_ref": "1b55a117-18fe-40cf-8873-6efd157a6058" + }, + { + "_type": "reference", + "_key": "7c9827906277", + "_ref": "ab59634e-a349-468d-8f99-cb9fe4c38228" + } + ], + "_id": "0d583937-1d7f-4c31-9e79-d8f1e5f2a2da", + "publishedAt": "2024-05-15T13:59:00.000Z", + "_updatedAt": "2024-05-15T10:12:46Z", + "body": [ + { + "markDefs": [], + "children": [ + { + "text": "This is a joint article contributed to the Seqera blog by Jon Manning of Seqera and Felix Krueger of Altos Labs describing the new nf-core/riboseq pipeline.", + "_key": "8c2ee84cdf5e0", + "_type": "span", + "marks": [ + "em" + ] + } + ], + "_type": "block", + "style": "normal", + "_key": "fc11d5317163" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://nf-co.re/", + "_key": "39d86b09469d" + }, + { + "_type": "link", + "href": "https://nf-co.re/riboseq", + "_key": "f22304d582ae" + }, + { + "_type": "link", + "href": "https://en.wikipedia.org/wiki/Ribosome_profiling", + "_key": "23797f8146f8" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": 
"In April 2024, the bioinformatics community welcomed a significant addition to the ", + "_key": "5355407782e60" + }, + { + "text": "nf-core", + "_key": "5355407782e61", + "_type": "span", + "marks": [ + "39d86b09469d" + ] + }, + { + "marks": [], + "text": " suite: the ", + "_key": "5355407782e62", + "_type": "span" + }, + { + "text": "nf-core/riboseq", + "_key": "5355407782e63", + "_type": "span", + "marks": [ + "f22304d582ae" + ] + }, + { + "text": " pipeline. This new tool, born from a collaboration between Altos Labs and Seqera, underscores the potential of strategic partnerships to advance scientific research. In this article, we provide some background on the project, offer details on the pipeline, and explain how readers can get started with ", + "_key": "5355407782e64", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "23797f8146f8" + ], + "text": "Ribo-seq", + "_key": "5355407782e65" + }, + { + "_type": "span", + "marks": [], + "text": " analysis.", + "_key": "5355407782e66" + } + ], + "_type": "block", + "style": "normal", + "_key": "a96b84f9b665" + }, + { + "markDefs": [], + "children": [ + { + "_key": "06511e51fc0b", + "_type": "span", + "marks": [], + "text": "A Fruitful Collaboration" + } + ], + "_type": "block", + "style": "h2", + "_key": "ff2e29964409" + }, + { + "_key": "212704cdad6c", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Altos Labs is known for its ambitious efforts in harnessing cellular rejuvenation to reverse disease, injury, and disabilities that can occur throughout life. Their scientific strategy heavily relies on understanding cellular mechanisms via advanced technologies. Ribo-seq provides insights into the real-time translation of proteins, a core process often dysregulated during aging and disease. Altos Labs needed a way to ensure reliable, reproducible Ribo-seq analysis that its research teams could use. While a Ribo-seq pipeline had been started in nf-core, limited progress had been made. Seqera seemed the ideal partner to help build one!", + "_key": "ef4460f305a4", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "3a4e325a6885", + "markDefs": [ + { + "_type": "link", + "href": "https://seqera.io/nextflow/", + "_key": "afd8d4976f75" + }, + { + "_type": "link", + "href": "https://www.zs.com/", + "_key": "8fc76bfd5785" + } + ], + "children": [ + { + "_key": "402551d96a99", + "_type": "span", + "marks": [], + "text": "Seqera, known for creating and developing the " + }, + { + "_key": "a11895ee51be", + "_type": "span", + "marks": [ + "afd8d4976f75" + ], + "text": "Nextflow DSL" + }, + { + "_type": "span", + "marks": [], + "text": " and being an active partner in establishing community standards on nf-core, brought the expertise needed to translate Altos Labs' vision into a viable community pipeline. As part of this collaboration, we formed a working group and also reached out to colleagues at ", + "_key": "206247a437cc" + }, + { + "_type": "span", + "marks": [ + "8fc76bfd5785" + ], + "text": "ZS", + "_key": "da520fc0d7f3" + }, + { + "_type": "span", + "marks": [], + "text": " and other community members who had done prior work with Ribosome profiling in Nextflow. 
Our goal was not only to enhance Ribo-seq analysis capabilities but also to ensure the pipeline’s sustainability through a community-driven process.", + "_key": "c8e26b5b7392" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "h2", + "_key": "110443549dbc", + "markDefs": [], + "children": [ + { + "text": "Development Insights", + "_key": "023772c169b7", + "_type": "span", + "marks": [] + } + ] + }, + { + "style": "normal", + "_key": "1bb6d0dcf94a", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The nf-core/riboseq project was structured into several phases:", + "_key": "fcef7ffc7722" + } + ], + "_type": "block" + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [ + "em" + ], + "text": "Initial planning", + "_key": "04af5c122b050", + "_type": "span" + }, + { + "_key": "04af5c122b051", + "_type": "span", + "marks": [], + "text": ": This phase involved detailed discussions between the Scientific Development team at Seqera, Altos Labs, and expert partners to ensure alignment with best practices and effective tool selection." + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "9bd450582af3" + }, + { + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://nf-co.re/rnaseq", + "_key": "3fa0f88295d5" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "em" + ], + "text": "Adapting existing components", + "_key": "4eb6302b38970" + }, + { + "marks": [], + "text": ": Key pre-processing and alignment functions were adapted from the ", + "_key": "4eb6302b38971", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "3fa0f88295d5" + ], + "text": "nf-core/rnaseq", + "_key": "4eb6302b38972" + }, + { + "_key": "4eb6302b38973", + "_type": "span", + "marks": [], + "text": " pipeline, allowing for shareability, efficiency, and scalability." 
+ } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "ef189e78f7f5" + }, + { + "_type": "block", + "style": "normal", + "_key": "dc6acae62561", + "listItem": "bullet", + "markDefs": [ + { + "href": "https://github.com/zhpn1024/ribotish", + "_key": "6be1a3f37f71", + "_type": "link" + }, + { + "_type": "link", + "href": "https://github.com/smithlabcode/ribotricer", + "_key": "67a956a543b0" + }, + { + "_type": "link", + "href": "https://www.bioconductor.org/packages/release/bioc/html/anota2seq.html", + "_key": "5f9cca0d1922" + }, + { + "href": "https://biocontainers.pro/", + "_key": "a24a587b6c75", + "_type": "link" + }, + { + "_type": "link", + "href": "https://github.com/nf-core/modules", + "_key": "d813571ed2e7" + } + ], + "children": [ + { + "text": "New tool integration", + "_key": "f59020155b400", + "_type": "span", + "marks": [ + "em" + ] + }, + { + "_type": "span", + "marks": [], + "text": ": Specific tools for Ribo-seq analysis, such as ", + "_key": "f59020155b401" + }, + { + "_type": "span", + "marks": [ + "6be1a3f37f71" + ], + "text": "Ribo-TISH", + "_key": "f59020155b402" + }, + { + "text": ", ", + "_key": "f59020155b403", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "67a956a543b0" + ], + "text": "Ribotricer", + "_key": "f59020155b404" + }, + { + "text": ", and ", + "_key": "f59020155b405", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "5f9cca0d1922" + ], + "text": "anota2seq", + "_key": "f59020155b406", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ", were wrapped into modules using ", + "_key": "f59020155b407" + }, + { + "_key": "f59020155b408", + "_type": "span", + "marks": [ + "a24a587b6c75" + ], + "text": "Biocontainers" + }, + { + "_type": "span", + "marks": [], + "text": ", within comprehensive testing frameworks to prevent regression and ensure reliability. These components were contributed to the ", + "_key": "f59020155b409" + }, + { + "text": "nf-core/modules", + "_key": "f59020155b4010", + "_type": "span", + "marks": [ + "d813571ed2e7" + ] + }, + { + "text": " repository, which will now be available for the wider community to reuse, independent of this effort.", + "_key": "f59020155b4011", + "_type": "span", + "marks": [] + } + ], + "level": 1 + }, + { + "style": "normal", + "_key": "06ef45923942", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "27cae4355e4d0", + "_type": "span", + "marks": [ + "em" + ], + "text": "Pipeline development" + }, + { + "_key": "27cae4355e4d1", + "_type": "span", + "marks": [], + "text": ": Individual components were stitched together coherently to create the nf-core/riboseq pipeline, with its own testing framework and user documentation." 
+ } + ], + "level": 1, + "_type": "block" + }, + { + "_key": "a11532d70cbc", + "markDefs": [], + "children": [ + { + "text": "Technical and Community Challenges", + "_key": "9af59990c0c00", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2" + }, + { + "_key": "449d8032618a", + "markDefs": [], + "children": [ + { + "text": "Generalizing existing functionality", + "_key": "262d18dad67e0", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h3" + }, + { + "children": [ + { + "text": "nf-core has become an encyclopedia of components, including ", + "_key": "48bd49dd01300", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "601d56009a00" + ], + "text": "modules", + "_key": "48bd49dd01301", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "48bd49dd01302" + }, + { + "marks": [ + "9d9691a3a5b4" + ], + "text": "subworkflows", + "_key": "48bd49dd01303", + "_type": "span" + }, + { + "marks": [], + "text": " that developers can leverage to build Nextflow pipelines. RNA-seq data analysis, in particular, is well served by the nf-core/rnaseq pipeline, one of the longest-standing and most popular members of the nf-core community. Some of the components used in nf-core/rnaseq were not written with re-use in mind, so the first task in this project was to abstract the commodity components for processes such as preprocessing and quantification so that they could be effectively shared by the nf-core/riboseq pipeline.", + "_key": "48bd49dd01304", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "af3382a99d21", + "markDefs": [ + { + "_type": "link", + "href": "https://nf-co.re/modules", + "_key": "601d56009a00" + }, + { + "_key": "9d9691a3a5b4", + "_type": "link", + "href": "https://nf-co.re/subworkflows" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Test dataset generation", + "_key": "68d04249013b0" + } + ], + "_type": "block", + "style": "h3", + "_key": "5669adb1dcd3" + }, + { + "_type": "block", + "style": "normal", + "_key": "2767c14b9d80", + "markDefs": [], + "children": [ + { + "text": "Another significant hurdle was generating robust test data capable of supporting the ongoing quality assurance of our software. In Ribo-seq analysis, the basic operation of some tools depends on the quality of input data, so random down-sampling of variable quality input reads, especially at shallow depths may not be useful to generate test data. To overcome this, we implemented a targeted down-sampling strategy, selectively using input reads that meet high-quality standards and are known to align well with a specific chromosome. This method enabled us to produce a concise yet effective test data set, ensuring that our Ribo-seq tools operate reliably under realistic conditions.", + "_key": "a4da2ac411130", + "_type": "span", + "marks": [] + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Tool selection", + "_key": "27bceef25c9e0" + } + ], + "_type": "block", + "style": "h3", + "_key": "2aaebc117fde" + }, + { + "_key": "1cb88d14f05e", + "markDefs": [], + "children": [ + { + "_key": "42c9c78112020", + "_type": "span", + "marks": [], + "text": "A primary challenge in developing the pipeline was the selection of high-quality, sustainable software. In bioinformatics, funding often limits software development, and many tools are poorly maintained. 
Furthermore, the understanding of what software 'works' can be ambiguous, embedded in the community's shared knowledge rather than documented formally. Our cooperative approach enabled us to make informed decisions and contribute improvements to the underlying software, enhancing utility for users beyond the nf-core community." + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "Parameter selection", + "_key": "b2c37914fe590", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h3", + "_key": "d1e0de03d5a1", + "markDefs": [] + }, + { + "style": "normal", + "_key": "2523548b9954", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Selecting the correct parameter settings for optimal operation of bioinformatics tools is a perennial problem in the community. In particular, the settings for the STAR alignment algorithm have very different constraints in Ribo-seq analysis relative to generic RNA-seq analysis. We conducted a series of benchmarks to assess the impact on alignment statistics of various combinations of parameters. We settled on a starting set, but this is a subject of continuing discussion with community members to drive further optimizations.", + "_key": "decd6cfc25240" + } + ], + "_type": "block" + }, + { + "_key": "a8c53464a53f", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Pipeline Features", + "_key": "9a31de208e060" + } + ], + "_type": "block", + "style": "h2" + }, + { + "style": "normal", + "_key": "45f1476190e5", + "markDefs": [], + "children": [ + { + "_key": "f51ea64a9e180", + "_type": "span", + "marks": [], + "text": "The nf-core/riboseq pipeline is now a robust framework written using the nf-core pipeline template, and specifically tailored to handle the complexities of Ribo-seq data analysis." + } + ], + "_type": "block" + }, + { + "_key": "9024177c2c73", + "asset": { + "_type": "reference", + "_ref": "image-83f90945d29b41fcdc562789b06f3abbdbfa4d9a-1010x412-png" + }, + "_type": "image" + }, + { + "style": "normal", + "_key": "c4c2c021e47b", + "markDefs": [], + "children": [ + { + "text": "Here is what it offers:", + "_key": "3460577cae3f", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "cfb811774489", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Baseline read preprocessing using processes adapted from existing nf-core components.", + "_key": "5e7ebc27391f0" + } + ], + "level": 1 + }, + { + "_type": "block", + "style": "normal", + "_key": "f78073ef3267", + "listItem": "bullet", + "markDefs": [ + { + "_key": "159e3bc6217d", + "_type": "link", + "href": "https://github.com/alexdobin/STAR" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Alignment to references with ", + "_key": "4ce6dc424aed0" + }, + { + "text": "STAR", + "_key": "4ce6dc424aed1", + "_type": "span", + "marks": [ + "159e3bc6217d" + ] + }, + { + "_key": "4ce6dc424aed2", + "_type": "span", + "marks": [], + "text": ", producing both transcriptome and genome alignments." + } + ], + "level": 1 + }, + { + "style": "normal", + "_key": "3cdb46402566", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Analysis of read distribution around protein-coding regions to assess frame bias and P-site offsets. 
This produces a rich selection of diagnostic plots to assess Ribo-seq data quality.", + "_key": "1b345d3fa4f80", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block" + }, + { + "style": "normal", + "_key": "9e3414d59445", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "3299c56efe000", + "_type": "span", + "marks": [], + "text": "Prediction and identification of translated open reading frames using tools like Ribo-TISH and Ribotricer." + } + ], + "level": 1, + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "c39d9d7b14f8", + "_type": "span", + "marks": [], + "text": "Assessment of translational efficiency, which requires matched RNA-seq and Ribo-seq data, facilitated by the anota2seq Bioconductor package (see dot plot below)." + } + ], + "_type": "block", + "style": "normal", + "_key": "9e8c117a96a2" + }, + { + "asset": { + "_ref": "image-ca5f9967df813470051fcf548e962bdbf4c50ee5-624x624-png", + "_type": "reference" + }, + "_type": "image", + "_key": "7122c68ade88" + }, + { + "children": [ + { + "text": "An example result from anota2seq, a tool used to study gene expression, shows how transcription and translation are connected. The x-axis shows changes in overall mRNA levels (transcription) between a treated and a control group, while the y-axis displays changes in the rate of protein synthesis (translation) between those groups, as measured by Ribo-seq. Grey points represent genes with no significant change in either metric and most points align near the center of the x-axis, indicating little change in mRNA levels. However, some genes exhibit increased (orange) or decreased (red) protein synthesis, suggesting direct regulation of translation rather than changes driven solely by mRNA abundance.", + "_key": "57c0e67a28250", + "_type": "span", + "marks": [ + "em" + ] + } + ], + "_type": "block", + "style": "normal", + "_key": "067ad9c9d6d7", + "markDefs": [] + }, + { + "style": "normal", + "_key": "46beba019134", + "markDefs": [ + { + "_key": "34ab33c4a8e1", + "_type": "link", + "href": "https://nf-co.re/riboseq/#usage" + }, + { + "href": "https://nfcore.slack.com/channels/riboseq", + "_key": "218183b5348d", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "If you are a researcher interested in Ribo-seq data analysis, you can test the pipeline by following the instructions in the ", + "_key": "e5078088e49b0", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "34ab33c4a8e1" + ], + "text": "getting started", + "_key": "e5078088e49b1" + }, + { + "_key": "e5078088e49b2", + "_type": "span", + "marks": [], + "text": " section of the pipeline. Please feel free to submit bugs and feature requests to drive ongoing improvements. You can also become part of the conversation by joining the " + }, + { + "_key": "e5078088e49b3", + "_type": "span", + "marks": [ + "218183b5348d" + ], + "text": "#riboseq" + }, + { + "text": " channel in the nf-core community Slack workspace. 
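If you want to give the pipeline a spin before diving into the docs, a minimal sketch of a test run (using the standard nf-core test profile with Docker; the output directory name here is just an example) would look something like:

```
$ nextflow run nf-core/riboseq -profile test,docker --outdir riboseq-test-results
```
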
We would love to see you there!", + "_key": "e5078088e49b4", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "bd13a8c55f6e", + "_type": "span", + "marks": [], + "text": "Next Steps" + } + ], + "_type": "block", + "style": "h2", + "_key": "515022911e71" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Following this initial phase of work, Seqera and Altos Labs have handed over the nf-core/riboseq pipeline to the nf-core community for ongoing maintenance and development. As members of that community, we will continue to play a part in enhancing the pipeline going forward. We hope others will benefit from this effort and continue to improve and refine pipeline functionality.", + "_key": "14a152a9174f0" + } + ], + "_type": "block", + "style": "normal", + "_key": "2d75d51ff270" + }, + { + "children": [ + { + "marks": [], + "text": "Coincidentally the authors of ", + "_key": "98347010c2330", + "_type": "span" + }, + { + "text": "riboseq-flow", + "_key": "98347010c2331", + "_type": "span", + "marks": [ + "46fa6099abc2" + ] + }, + { + "marks": [], + "text": " published their related work on the same day that nf-core/riboseq was first released. This pipeline has a highly complementary set of steps, and there is already ongoing collaboration to work together to build an even better community resource.", + "_key": "98347010c2332", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "09c10fe38376", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/iraiosub/riboseq-flow", + "_key": "46fa6099abc2" + } + ] + }, + { + "children": [ + { + "_key": "e5fdf870848b0", + "_type": "span", + "marks": [], + "text": "Empowering Research and Innovation" + } + ], + "_type": "block", + "style": "h2", + "_key": "c566b4d435e3", + "markDefs": [] + }, + { + "_key": "99da8271ab0f", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The joint contribution of Seqera and Altos Labs to the nf-core/riboseq pipeline highlights how collaboration between industry and open-source communities can result in tools that push scientific boundaries and foster community engagement and development. By adhering to rigorous code quality and testing standards, nf-core/riboseq ensures researchers access to a dependable, cutting-edge tool.", + "_key": "35352a1b306b0" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "56719298b452", + "markDefs": [ + { + "_type": "link", + "href": "mailto:services@seqera.io", + "_key": "ccafa728bca7" + } + ], + "children": [ + { + "text": "We believe this new pipeline is poised to be vital in studying protein synthesis and its implications for aging and health. This is not just a technical achievement - it's a step forward in collaborative, open scientific progress.", + "_key": "53386085eb760", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "marks": [], + "text": "If you have a project in mind where Seqera may be able to help with our Professional Services offerings, please contact us at ", + "_key": "cafe02f0755d", + "_type": "span" + }, + { + "text": "services@seqera.io", + "_key": "53386085eb761", + "_type": "span", + "marks": [ + "ccafa728bca7" + ] + }, + { + "_type": "span", + "marks": [], + "text": ". 
We are the content experts for Nextflow, nf-core, and the Seqera Platform, and can offer tailored solutions and expert guidance to help you fulfill your objectives.", + "_key": "53386085eb762" + } + ], + "_type": "block", + "style": "normal", + "_key": "6e42514da79e", + "markDefs": [ + { + "href": "mailto:services@seqera.io", + "_key": "ccafa728bca7", + "_type": "link" + } + ] + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://www.altoslabs.com/", + "_key": "026178e92bb6" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "To learn more about Altos Labs, visit ", + "_key": "3babdea8c79d0" + }, + { + "marks": [ + "026178e92bb6" + ], + "text": "https://www.altoslabs.com/", + "_key": "3babdea8c79d1", + "_type": "span" + }, + { + "marks": [], + "text": ".", + "_key": "3babdea8c79d2", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "a5dc365dc556" + }, + { + "_type": "block", + "style": "h2", + "_key": "5b95f381569b", + "markDefs": [], + "children": [ + { + "_key": "48b61c9282e00", + "_type": "span", + "marks": [], + "text": "Acknowledgments" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "258428890647", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "nf-core/riboseq was initially written by Jonathan Manning (Bioinformatics Engineer at Seqera) in collaboration with Felix Krueger and Christel Krueger (Altos Labs). The development work carried out on the pipeline was funded by Altos Labs. We thank the following people for their input (", + "_key": "d836d0eff50e0" + }, + { + "text": "in alphabetical order", + "_key": "d836d0eff50e1", + "_type": "span", + "marks": [ + "em" + ] + }, + { + "marks": [], + "text": "):", + "_key": "d836d0eff50e2", + "_type": "span" + } + ] + }, + { + "_key": "be9ad649bb8d", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Felipe Almeida (ZS)", + "_key": "376c006c20de0", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Anne Bresciani (ZS)", + "_key": "6046a5e41c110", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "abb0a8d9fba2", + "listItem": "bullet" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "31c2f31a40bc", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Caroline Eastwood (University of Edinburgh)", + "_key": "040c3d125ae60" + } + ] + }, + { + "_key": "ce8f076685cf", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Maxime U Garcia (Seqera)", + "_key": "f3c530a930470", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "7b34ffefab7d", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Mikhail Osipovitch (ZS)", + "_key": "e21649c58e7b0" + } + ], + "level": 1, + "_type": "block" + }, + { + "children": [ + { + "marks": [], + "text": "Jack Tierney (University College Cork)", + "_key": "1f18a294d9a20", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "02884c22d195", + "listItem": "bullet", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "8f03c90bd810", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Edward 
Wallace (University of Edinburgh)\n\n", + "_key": "86b2bce07178", + "_type": "span" + } + ], + "level": 1 + }, + { + "children": [ + { + "text": "", + "_key": "1da880ad30a0", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "e59fb1d47363", + "markDefs": [] + }, + { + "_key": "736ce4dde440", + "markDefs": [], + "children": [ + { + "text": "\n\n", + "_key": "1c8d35ffcae9", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + } + ], + "title": "nf-core/riboseq: A collaboration between Altos Labs and Seqera", + "meta": { + "noIndex": false, + "slug": { + "_type": "slug", + "current": "nf-core-riboseq" + }, + "_type": "meta", + "shareImage": { + "_type": "image", + "asset": { + "_ref": "image-10399aee1fa48e4250f2e7ab3c7fb76ca3aa1ac4-1200x628-png", + "_type": "reference" + } + }, + "description": "nf-core/riboseq: A collaboration between Altos Labs and Seqera" + } + }, + { + "body": [ + { + "style": "normal", + "_key": "3ddbc0f6780a", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Through a partnership between AWS Asia Pacific and Japan, and Seqera, Nextflow touched ground in South Korea for the first time with a training session at the Korea Genome Organization (KOGO) Winter Symposium. The objective was to introduce participants to Nextflow, empowering them to craft their own pipelines. Recognizing the interest among bioinformaticians, MinSung Cho from AWS Korea’s Healthcare & Research Team decided to sponsor this 90-minute workshop session. This initiative covered my travel expenses and accommodations.", + "_key": "15827bb9adb4" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "", + "_key": "fea4de287794", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "1160821ab2e5" + }, + { + "_key": "693cbcdc5eb9", + "_type": "block" + }, + { + "_type": "image", + "alt": "Nextflow workshop at KOGO Winter Symposium 2024", + "_key": "dfa401f56a2f", + "asset": { + "_ref": "image-1113aa37834d3dd5de51eebdde898a49b7b4fad5-1600x1200-jpg", + "_type": "reference" + } + }, + { + "_type": "block", + "style": "normal", + "_key": "3affeea2c2b6", + "markDefs": [ + { + "_type": "link", + "href": "https://nf-co.re/nanoseq/3.1.0", + "_key": "94079640c52c" + }, + { + "_key": "8e2b90528221", + "_type": "link", + "href": "https://github.com/nf-core/tools" + }, + { + "_type": "link", + "href": "https://www.youtube.com/watch?v=KM1A0_GD2vQ", + "_key": "436a47aba2b8" + } + ], + "children": [ + { + "marks": [], + "text": "The training commenced with an overview of Nextflow pipelines, exemplified by the ", + "_key": "11dba3b52d89", + "_type": "span" + }, + { + "text": "nf-core/nanoseq", + "_key": "407c839220d9", + "_type": "span", + "marks": [ + "94079640c52c" + ] + }, + { + "_key": "9d8e8038f63d", + "_type": "span", + "marks": [], + "text": " Nextflow pipeline, highlighting the subworkflows and modules. nfcore/nanoseq is a bioinformatics analysis pipeline for Nanopore DNA/RNA sequencing data that can be used to perform base-calling, demultiplexing, QC, alignment, and downstream analysis. Following this, participants engaged in a hands-on workshop using the AWS Cloud9 environment. 
In 70 minutes, they constructed a basic pipeline for analyzing nanopore sequencing data, incorporating workflow templates, modules, and subworkflows from " + }, + { + "marks": [ + "8e2b90528221" + ], + "text": "nf-core/tools", + "_key": "182529b5389c", + "_type": "span" + }, + { + "marks": [], + "text": ". If you're interested in learning more about the nf-core/nanoseq Nextflow pipeline, I recorded a video talking about it in the nf-core bytesize meeting. You can watch it ", + "_key": "9d08844ac924", + "_type": "span" + }, + { + "text": "here", + "_key": "ad4616778011", + "_type": "span", + "marks": [ + "436a47aba2b8" + ] + }, + { + "_key": "7705a45ea03e", + "_type": "span", + "marks": [], + "text": "." + } + ] + }, + { + "children": [ + { + "text": "", + "_key": "ca2283886b2d", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "398c3c5471f1" + }, + { + "asset": { + "_ref": "image-0716bd3af2a7fa5b5fec2494ae09dcf0d52fba18-2446x1378-png", + "_type": "reference" + }, + "_type": "image", + "alt": "Slide from Nextflow workshop at KOGO Winter Symposium 2024", + "_key": "6c9e74f6149d" + }, + { + "style": "normal", + "_key": "2b4b36f291b5", + "markDefs": [ + { + "_key": "67b6cfd65e72", + "_type": "link", + "href": "https://docs.google.com/presentation/d/1OC4ccgbrNet4e499ShIT7S6Gm6S0xr38_OauKPa4G88/edit?usp=sharing" + }, + { + "_type": "link", + "href": "https://github.com/yuukiiwa/nf-core-koreaworkshop", + "_key": "4e92934532b6" + } + ], + "children": [ + { + "_key": "ba3493be440f", + "_type": "span", + "marks": [], + "text": "You can find the workshop slides " + }, + { + "text": "here", + "_key": "35a089390cea", + "_type": "span", + "marks": [ + "67b6cfd65e72" + ] + }, + { + "_type": "span", + "marks": [], + "text": " and the GitHub repository with source code ", + "_key": "2eb706b73483" + }, + { + "_type": "span", + "marks": [ + "4e92934532b6" + ], + "text": "here", + "_key": "b2b180a3f699" + }, + { + "_key": "2112eed2e327", + "_type": "span", + "marks": [], + "text": "." + } + ], + "_type": "block" + }, + { + "_key": "1c2c71553e77", + "children": [ + { + "text": "", + "_key": "c5cc1a18691c", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "c7925a5bc2b2", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The workshop received positive feedback, with participants expressing interest in further sessions to deepen their Nextflow proficiency. Due to this feedback, AWS and the nf-core outreach team are considering organizing small-group local or Zoom training sessions in response to these requests.", + "_key": "fe6bdf88576e" + } + ] + }, + { + "style": "normal", + "_key": "9f573907ae2a", + "children": [ + { + "_type": "span", + "text": "", + "_key": "63868748e237" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "7cdc8fe30886", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "It is imperative to acknowledge the invaluable contributions and support from AWS Korea’s Health Care & Research Team, including MinSung Cho, HyunMin Kim, YoungUng Kim, SeungChang Kang, and Jiyoon Hwang, without whom this workshop would not have been possible. 
Gratitude is also extended to Charlie Lee for fostering collaboration with the nf-core/outreach team.", + "_key": "91da4b118fd8" + } + ] + } + ], + "tags": [ + { + "_key": "63bb211343c3", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference" + } + ], + "_updatedAt": "2024-09-26T09:05:00Z", + "meta": { + "slug": { + "current": "nxf-nf-core-workshop-kogo" + } + }, + "_rev": "2PruMrLMGpvZP5qAknm78W", + "_type": "blogPost", + "_createdAt": "2024-09-25T14:18:31Z", + "author": { + "_ref": "ntV3A5cVsWRByk7zltFcwH", + "_type": "reference" + }, + "_id": "0e55bd6fedc9", + "publishedAt": "2024-03-14T07:00:00.000Z", + "title": "Nextflow workshop at the 20th KOGO Winter Symposium" + }, + { + "meta": { + "slug": { + "current": "nextflow-with-gbatch" + } + }, + "body": [ + { + "children": [ + { + "marks": [ + "46532c5914ac" + ], + "text": "We have talked about Google Cloud Batch before", + "_key": "b4a1cc918280", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ". Not only that, we were proud to announce Nextflow support to Google Cloud Batch right after it was publicly released, back in July 2022. How amazing is that? But we didn't stop there! The ", + "_key": "20b9df914b63" + }, + { + "text": "Nextflow official documentation", + "_key": "f99f39ade5ae", + "_type": "span", + "marks": [ + "0d07d5df5d3f" + ] + }, + { + "_type": "span", + "marks": [], + "text": " also provides a lot of useful information on how to use Google Cloud Batch as the compute environment for your Nextflow pipelines. Having said that, feedback from the community is valuable, and we agreed that in addition to the documentation, teaching by example, and in a more informal language, can help many of our users. So, here is a tutorial on how to use the Batch service of the Google Cloud Platform with Nextflow 🥳", + "_key": "c591fd0e447a" + } + ], + "_type": "block", + "style": "normal", + "_key": "4649b669d1dd", + "markDefs": [ + { + "href": "https://www.nextflow.io/blog/2022/deploy-nextflow-pipelines-with-google-cloud-batch.html", + "_key": "46532c5914ac", + "_type": "link" + }, + { + "_key": "0d07d5df5d3f", + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/google.html" + } + ] + }, + { + "style": "normal", + "_key": "8eb442784072", + "children": [ + { + "_key": "31dc0ef53452", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "14fae6b49728", + "children": [ + { + "_key": "090be1a79409", + "_type": "span", + "text": "Running an RNAseq pipeline with Google Cloud Batch" + } + ], + "_type": "block", + "style": "h3" + }, + { + "children": [ + { + "marks": [], + "text": "Welcome to our RNAseq tutorial using Nextflow and Google Cloud Batch! RNAseq is a powerful technique for studying gene expression and is widely used in a variety of fields, including genomics, transcriptomics, and epigenomics. In this tutorial, we will show you how to use Nextflow, a popular workflow management tool, to run a proof-of-concept RNAseq pipeline to perform the analysis on Google Cloud Batch, a scalable cloud-based computing platform. For a real Nextflow RNAseq pipeline, check ", + "_key": "44dd00decee7", + "_type": "span" + }, + { + "marks": [ + "427e6a38fc12" + ], + "text": "nf-core/rnaseq", + "_key": "65f2b5b882ae", + "_type": "span" + }, + { + "marks": [], + "text": ". 
For the proof-of-concept RNAseq pipeline that we will use here, check ", + "_key": "a7b9f4ebe96b", + "_type": "span" + }, + { + "marks": [ + "5ff301bdcbdc" + ], + "text": "nextflow-io/rnaseq-nf", + "_key": "52d6ae27ef64", + "_type": "span" + }, + { + "marks": [], + "text": ".", + "_key": "17bc9da0f2f7", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "bea5369d85c8", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nf-core/rnaseq", + "_key": "427e6a38fc12" + }, + { + "href": "https://github.com/nextflow-io/rnaseq-nf", + "_key": "5ff301bdcbdc", + "_type": "link" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "41fc79f02b25", + "children": [ + { + "_key": "3f41aeb8d77b", + "_type": "span", + "text": "" + } + ] + }, + { + "style": "normal", + "_key": "fb5d83ab259f", + "markDefs": [], + "children": [ + { + "_key": "ae719a82b164", + "_type": "span", + "marks": [], + "text": "Nextflow allows you to easily develop, execute, and scale complex pipelines on any infrastructure, including the cloud. Google Cloud Batch enables you to run batch workloads on Google Cloud Platform (GCP), with the ability to scale up or down as needed. Together, Nextflow and Google Cloud Batch provide a powerful and flexible solution for RNAseq analysis." + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "fa66649596e9", + "children": [ + { + "_type": "span", + "text": "", + "_key": "af2c64c1704c" + } + ], + "_type": "block" + }, + { + "children": [ + { + "marks": [], + "text": "We will walk you through the entire process, from setting up your Google Cloud account and installing Nextflow to running an RNAseq pipeline and interpreting the results. By the end of this tutorial, you will have a solid understanding of how to use Nextflow and Google Cloud Batch for RNAseq analysis. So let's get started!", + "_key": "e550a19a3312", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "2a80ab269d30", + "markDefs": [] + }, + { + "children": [ + { + "_key": "e96fcb32b940", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "bb5a1cd49cf3" + }, + { + "_type": "block", + "style": "h3", + "_key": "69bc270b7a93", + "children": [ + { + "text": "Setting up Google Cloud CLI (gcloud)", + "_key": "9dba215d7da8", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "205f28088e9c", + "markDefs": [ + { + "href": "https://cloud.google.com/sdk/docs/install", + "_key": "3c67e3a142ba", + "_type": "link" + } + ], + "children": [ + { + "_key": "17ec0a58b37a", + "_type": "span", + "marks": [], + "text": "In this tutorial, you will learn how to use the gcloud command-line interface to interact with the Google Cloud Platform and set up your Google Cloud account for use with Nextflow. If you do not already have gcloud installed, you can follow the instructions " + }, + { + "_type": "span", + "marks": [ + "3c67e3a142ba" + ], + "text": "here", + "_key": "186e87235cb9" + }, + { + "marks": [], + "text": " to install it. Once you have gcloud installed, run the command ", + "_key": "3038220f1b5f", + "_type": "span" + }, + { + "text": "gcloud init", + "_key": "15fa3021f802", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " to initialize the CLI. You will be prompted to choose an existing project to work on or create a new one. For the purpose of this tutorial, we will create a new project. Name your project "my-rnaseq-pipeline". 
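(If you would rather skip the interactive prompts, a minimal sketch using standard gcloud commands to create and select the project directly would be:

```
$ gcloud projects create my-rnaseq-pipeline
$ gcloud config set project my-rnaseq-pipeline
```

Either way, continue with "my-rnaseq-pipeline" as your project id.)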
There may be a lot of information displayed on the screen after running this command, but you can ignore it for now.", + "_key": "8429998e9683" + } + ] + }, + { + "_key": "8801184f55e9", + "children": [ + { + "_type": "span", + "text": "", + "_key": "777bd5ea0394" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "Setting up Batch and Storage in Google Cloud Platform", + "_key": "eedb31169aaa" + } + ], + "_type": "block", + "style": "h3", + "_key": "c37b701f727c" + }, + { + "children": [ + { + "_key": "f14713d6d4af", + "_type": "span", + "text": "Enable Google Batch" + } + ], + "_type": "block", + "style": "h4", + "_key": "ab72c428fa0e" + }, + { + "children": [ + { + "text": "According to the ", + "_key": "f7a60d7ecd4e", + "_type": "span", + "marks": [] + }, + { + "_key": "2d6145bb7907", + "_type": "span", + "marks": [ + "28252753c849" + ], + "text": "official Google documentation" + }, + { + "_key": "1b0903328014", + "_type": "span", + "marks": [], + "text": " " + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "Batch is a fully managed service that lets you schedule, queue, and execute [batch processing](https://en.wikipedia.org/wiki/Batch_processing) workloads on Compute Engine virtual machine (VM) instances. Batch provisions resources and manages capacity on your behalf, allowing your batch workloads to run at scale", + "_key": "3f97db81f636" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "2d1bd7a00989" + } + ], + "_type": "block", + "style": "normal", + "_key": "75af9d0f088e", + "markDefs": [ + { + "href": "https://cloud.google.com/batch/docs/get-started", + "_key": "28252753c849", + "_type": "link" + } + ] + }, + { + "children": [ + { + "_key": "69ea0742f94d", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "5040271b7464" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The first step is to download the ", + "_key": "1c1aa05a4f6b", + "_type": "span" + }, + { + "text": "beta", + "_key": "25e679175bd4", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "marks": [], + "text": " command group. You can do this by executing:", + "_key": "c75f8c836136", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "661825f15f6c" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "90cc0b1ab3f9" + } + ], + "_type": "block", + "style": "normal", + "_key": "7e214178bebe" + }, + { + "code": "$ gcloud components install beta", + "_type": "code", + "_key": "835a4ca38cf9" + }, + { + "style": "normal", + "_key": "c2ad92d563c1", + "children": [ + { + "_type": "span", + "text": "", + "_key": "7a158845ce56" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "70e8032679f5", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Then, enable billing for this project. 
You will first need to get your account id with", + "_key": "a1ee6ca9d18e", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "7d0ce4c2119c", + "children": [ + { + "_key": "7e72b15ee683", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "_type": "code", + "_key": "50ff076983b0", + "code": "$ gcloud beta billing accounts list" + }, + { + "_key": "4195dfe1ea37", + "children": [ + { + "_type": "span", + "text": "", + "_key": "b737505423a4" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "524fa7a1eb2d", + "markDefs": [], + "children": [ + { + "text": "After that, you will see something like the following appear in your window:", + "_key": "5ab19436a930", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_key": "a8758fd7f668", + "children": [ + { + "text": "", + "_key": "0c25657b093a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "b07972ec3773", + "code": "ACCOUNT_ID NAME OPEN MASTER_ACCOUNT_ID\nXXXXX-YYYYYY-ZZZZZZ My Billing Account True", + "_type": "code" + }, + { + "style": "normal", + "_key": "83185677499c", + "children": [ + { + "_key": "b9e4cccf8227", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "7eb027cfdc57", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "If you get the error “Service Usage API has not been used in project 842841895214 before or it is disabled”, simply run the command again and it should work. Then copy the account id, and the project id and paste them into the command below. This will enable billing for your project id.", + "_key": "330f7d74a793", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "c145c7f7579f", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "07fb6666dd89" + }, + { + "_type": "code", + "_key": "ec8ac3156e35", + "code": "$ gcloud beta billing projects link PROJECT-ID --billing-account XXXXXX-YYYYYY-ZZZZZZ" + }, + { + "style": "normal", + "_key": "9832bacce6ac", + "children": [ + { + "_type": "span", + "text": "", + "_key": "02546a432cab" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "text": "Next, you must enable the Batch API, along with the Compute Engine and Cloud Logging APIs. 
You can do so with the following command:", + "_key": "e450e0eb53d9", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "b23c4c13ab0c" + }, + { + "children": [ + { + "text": "", + "_key": "55db5d6abe0d", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "5860dca2e6e1" + }, + { + "_key": "a76f1708fd2c", + "code": "$ gcloud services enable batch.googleapis.com compute.googleapis.com logging.googleapis.com", + "_type": "code" + }, + { + "style": "normal", + "_key": "bbc58295ae43", + "children": [ + { + "_type": "span", + "text": "", + "_key": "85ce60ce4c08" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "aa40d9186315", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "You should see a message similar to the one below:", + "_key": "ecfe9ee6ed90", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "", + "_key": "96623ecbd2fb", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "478838fb4aa1" + }, + { + "code": "Operation \"operations/acf.p2-AAAA-BBBBB-CCCC--DDDD\" finished successfully.", + "_type": "code", + "_key": "13b2c8def5d7" + }, + { + "_type": "block", + "style": "normal", + "_key": "447269e67a29", + "children": [ + { + "text": "", + "_key": "4cf8182570f7", + "_type": "span" + } + ] + }, + { + "children": [ + { + "_key": "2e57cbc2616f", + "_type": "span", + "text": "Create a Service Account" + } + ], + "_type": "block", + "style": "h4", + "_key": "5fc786597ec0" + }, + { + "_key": "1ba584cd052a", + "markDefs": [ + { + "href": "https://cloud.google.com/iam/docs/creating-managing-service-accounts#iam-service-accounts-create-gcloud", + "_key": "847a3cb2f2b1", + "_type": "link" + } + ], + "children": [ + { + "_key": "bdbc9cd54151", + "_type": "span", + "marks": [], + "text": "In order to access the APIs we enabled, you need to " + }, + { + "marks": [ + "847a3cb2f2b1" + ], + "text": "create a Service Account", + "_key": "f8754daffbe4", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " and set the necessary IAM roles for the project. 
You can create the Service Account by executing:", + "_key": "292a6706da91" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "d0de6d313002" + } + ], + "_type": "block", + "style": "normal", + "_key": "b44b9fbd646d" + }, + { + "code": "$ gcloud iam service-accounts create rnaseq-pipeline-sa", + "_type": "code", + "_key": "ad4bdca3e7fc" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "ee17255d7084" + } + ], + "_type": "block", + "style": "normal", + "_key": "d0aa14575fff" + }, + { + "_type": "block", + "style": "normal", + "_key": "f1b2318907a6", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "After this, set appropriate roles for the project using the commands below:", + "_key": "e04b70cf3db5" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "12c91813881c", + "children": [ + { + "_type": "span", + "text": "", + "_key": "a6810ad3fc2e" + } + ] + }, + { + "_key": "7d13f848f2f1", + "code": "$ gcloud projects add-iam-policy-binding my-rnaseq-pipeline \\\n--member=\"serviceAccount:rnaseq-pipeline-sa@my-rnaseq-pipeline.iam.gserviceaccount.com\" \\\n--role=\"roles/iam.serviceAccountUser\"\n\n$ gcloud projects add-iam-policy-binding my-rnaseq-pipeline \\\n--member=\"serviceAccount:rnaseq-pipeline-sa@my-rnaseq-pipeline.iam.gserviceaccount.com\" \\\n--role=\"roles/batch.jobsEditor\"\n\n$ gcloud projects add-iam-policy-binding my-rnaseq-pipeline \\\n--member=\"serviceAccount:rnaseq-pipeline-sa@my-rnaseq-pipeline.iam.gserviceaccount.com\" \\\n--role=\"roles/logging.viewer\"\n\n$ gcloud projects add-iam-policy-binding my-rnaseq-pipeline \\\n--member=\"serviceAccount:rnaseq-pipeline-sa@my-rnaseq-pipeline.iam.gserviceaccount.com\" \\\n--role=\"roles/storage.admin\"", + "_type": "code" + }, + { + "children": [ + { + "_key": "e875150619c1", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "ec64464c2904" + }, + { + "children": [ + { + "text": "Create your Bucket", + "_key": "e3d3b9d98e3b", + "_type": "span" + } + ], + "_type": "block", + "style": "h4", + "_key": "f4aa41527974" + }, + { + "children": [ + { + "marks": [], + "text": "Now it's time to create your Storage bucket, where both your input, intermediate and output files will be hosted and accessed by the Google Batch virtual machines. Your bucket name must be globally unique (across regions). For the example below, the bucket is named rnaseq-pipeline-nextflow-bucket. However, as this name has now been used you have to create a bucket with a different name", + "_key": "aea32544ddbb", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "c5f2664254e4", + "markDefs": [] + }, + { + "style": "normal", + "_key": "cd8cfca1df31", + "children": [ + { + "_type": "span", + "text": "", + "_key": "764d4980c45a" + } + ], + "_type": "block" + }, + { + "_type": "code", + "_key": "3047bd50bec2", + "code": "$ gcloud storage buckets create gs://rnaseq-pipeline-bckt" + }, + { + "_key": "23f682b84f50", + "children": [ + { + "_type": "span", + "text": "", + "_key": "7b9a1734f864" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "72168f669946", + "markDefs": [], + "children": [ + { + "_key": "60aadeda5d84", + "_type": "span", + "marks": [], + "text": "Now it's time for Nextflow to join the party! 
🥳" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "81ae14ff217c", + "children": [ + { + "_type": "span", + "text": "", + "_key": "e48066d029e4" + } + ] + }, + { + "children": [ + { + "_key": "52387da857ae", + "_type": "span", + "text": "Setting up Nextflow to make use of Batch and Storage" + } + ], + "_type": "block", + "style": "h3", + "_key": "ef10a5c1bd12" + }, + { + "_type": "block", + "style": "h4", + "_key": "b8c5b54c4fcb", + "children": [ + { + "_key": "a6b6f2c1fcb4", + "_type": "span", + "text": "Write the configuration file" + } + ] + }, + { + "style": "normal", + "_key": "a64ea1bb143f", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Here you will set up a simple RNAseq pipeline with Nextflow to be run entirely on Google Cloud Platform (GCP) directly from your local machine.", + "_key": "77b38786aaf1" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "8a54e6cb5f2c" + } + ], + "_type": "block", + "style": "normal", + "_key": "3fcff2378bc4" + }, + { + "children": [ + { + "marks": [], + "text": "Start by creating a folder for your project on your local machine, such as “rnaseq-example”. It's important to mention that you can also go fully cloud and use a Virtual Machine for everything we will do here locally.", + "_key": "0864d5a89f16", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "3b770c6f35c3", + "markDefs": [] + }, + { + "style": "normal", + "_key": "d1c559e688d7", + "children": [ + { + "text": "", + "_key": "c5925ece0d84", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "49396905ad09", + "markDefs": [], + "children": [ + { + "text": "Inside the folder that you created for the project, create a file named ", + "_key": "907c4f5ad88a", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "nextflow.config", + "_key": "def2c020fae8" + }, + { + "marks": [], + "text": " with the following content (remember to replace PROJECT-ID with the project id you created above):", + "_key": "cc3b7bce788b", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "df579ad7106f", + "children": [ + { + "text": "", + "_key": "a02ccabc2d79", + "_type": "span" + } + ] + }, + { + "code": "workDir = 'gs://rnaseq-pipeline-bckt/scratch'\n\nprocess {\n executor = 'google-batch'\n container = 'nextflow/rnaseq-nf'\n errorStrategy = { task.exitStatus==14 ? 'retry' : 'terminate' }\n maxRetries = 5\n}\n\ngoogle {\n project = 'PROJECT-ID'\n location = 'us-central1'\n batch.spot = true\n}", + "_type": "code", + "_key": "bfa5cca99c69" + }, + { + "_type": "block", + "style": "normal", + "_key": "98d65726a516", + "children": [ + { + "_type": "span", + "text": "", + "_key": "2c8ece844ecc" + } + ] + }, + { + "children": [ + { + "text": "The ", + "_key": "0654936d159f", + "_type": "span", + "marks": [] + }, + { + "_key": "2aa97fb5721a", + "_type": "span", + "marks": [ + "code" + ], + "text": "workDir" + }, + { + "_key": "e4e06aff9f3b", + "_type": "span", + "marks": [], + "text": " option tells Nextflow to use the bucket you created as the work directory. Nextflow will use this directory to stage our input data and store intermediate and final data. Nextflow does not allow you to use the root directory of a bucket as the work directory -- it must be a subdirectory instead. Using a subdirectory is also just a good practice." 
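As a side note, the work directory does not have to live in the configuration file: Nextflow's standard -w (work directory) command-line option can supply the same value per run. A sketch of the equivalent invocation, using the bucket created above, would be:

```
$ nextflow run nextflow-io/rnaseq-nf -c nextflow.config -w gs://rnaseq-pipeline-bckt/scratch
```
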
+ } + ], + "_type": "block", + "style": "normal", + "_key": "77ffe96a523b", + "markDefs": [] + }, + { + "_key": "602c084fe2df", + "children": [ + { + "text": "", + "_key": "e2748099361c", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "700ef61290f0", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The ", + "_key": "43f0459bb9cd", + "_type": "span" + }, + { + "_key": "37ccb6e69e32", + "_type": "span", + "marks": [ + "code" + ], + "text": "process" + }, + { + "_type": "span", + "marks": [], + "text": " scope tells Nextflow to run all the processes (steps) of your pipeline on Google Batch and to use the ", + "_key": "612c5f17b82e" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "nextflow/rnaseq-nf", + "_key": "c461db2400ad" + }, + { + "_type": "span", + "marks": [], + "text": " Docker image hosted on DockerHub (default) for all processes. Also, the error strategy will automatically retry any failed tasks with exit code 14, which is the exit code for spot instances that were reclaimed.", + "_key": "f85d9cfa0759" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "7df72c094a49", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "0d276d928470" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "The ", + "_key": "08fce59dc3a2" + }, + { + "marks": [ + "code" + ], + "text": "google", + "_key": "0b85966cada5", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " scope is specific to Google Cloud. You need to provide the project id (don't provide the project name, it won't work!), and a Google Cloud location (leave it as above if you're not sure of what to put). In the example above, spot instances are also requested (more info about spot instances ", + "_key": "f80063b91e65" + }, + { + "text": "here", + "_key": "961d20fa85d9", + "_type": "span", + "marks": [ + "bb2936cf5ab8" + ] + }, + { + "_key": "2c91a5815680", + "_type": "span", + "marks": [], + "text": "), which are cheaper instances that, as a drawback, can be reclaimed at any time if resources are needed by the cloud provider. Based on what we have seen so far, the " + }, + { + "marks": [ + "code" + ], + "text": "nextflow.config", + "_key": "5e1e957ae643", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " file should contain "rnaseq-nxf" as the project id.", + "_key": "c6366d563a98" + } + ], + "_type": "block", + "style": "normal", + "_key": "272b9ef92bc9", + "markDefs": [ + { + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/google.html#spot-instances", + "_key": "bb2936cf5ab8" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "890a8ebfebc8", + "children": [ + { + "_key": "27729dc1e5a0", + "_type": "span", + "text": "" + } + ] + }, + { + "style": "normal", + "_key": "4a448756320a", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Use the command below to authenticate with Google Cloud Platform. 
Nextflow will use this account by default when you run a pipeline.", + "_key": "4bdada2e36c3" + } + ], + "_type": "block" + }, + { + "_key": "8727de833ac3", + "children": [ + { + "_type": "span", + "text": "", + "_key": "0fbcd665df67" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "code", + "_key": "f488d364d980", + "code": "$ gcloud auth application-default login" + }, + { + "_type": "block", + "style": "normal", + "_key": "0aeb881f24b7", + "children": [ + { + "text": "", + "_key": "f0e00801c9f4", + "_type": "span" + } + ] + }, + { + "style": "h4", + "_key": "8ccbb8d91e90", + "children": [ + { + "_key": "cf451d36d8ca", + "_type": "span", + "text": "Launch the pipeline!" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "c8f94520bac9", + "markDefs": [ + { + "_key": "bf3ec4dfea75", + "_type": "link", + "href": "https://github.com/nextflow-io/rnaseq-nf" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "With that done, you’re now ready to run the proof-of-concept RNAseq Nextflow pipeline. Instead of asking you to download it, or copy-paste something into a script file, you can simply provide the GitHub URL of the RNAseq pipeline mentioned at the beginning of ", + "_key": "0aceea84d950" + }, + { + "marks": [ + "bf3ec4dfea75" + ], + "text": "this tutorial", + "_key": "820002441a2e", + "_type": "span" + }, + { + "marks": [], + "text": ", and Nextflow will do all the heavy lifting for you. This pipeline comes with test data bundled with it, and for more information about it and how it was developed, you can check the public training material developed by Seqera Labs at <https: training.nextflow.io="">.", + "_key": "52f2161a08d9", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "53d76002b446", + "children": [ + { + "_type": "span", + "text": "", + "_key": "b448f79a4dcd" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "One important thing to mention is that in this repository there is already a ", + "_key": "8a51df595c74", + "_type": "span" + }, + { + "marks": [ + "code" + ], + "text": "nextflow.config", + "_key": "6ad035af60a9", + "_type": "span" + }, + { + "text": " file with different configuration, but don't worry about that. You can run the pipeline with the configuration file that we have wrote above using the ", + "_key": "a610e7f77a8e", + "_type": "span", + "marks": [] + }, + { + "text": "-c", + "_key": "7290d7c3e7ef", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_key": "35f5b94c5c5a", + "_type": "span", + "marks": [], + "text": " Nextflow parameter. 
Run the command line below:" + } + ], + "_type": "block", + "style": "normal", + "_key": "be4636722fc8", + "markDefs": [] + }, + { + "style": "normal", + "_key": "e90a099c9c63", + "children": [ + { + "_key": "f841ab6b5771", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "7d482473d32b", + "code": "$ nextflow run nextflow-io/rnaseq-nf -c nextflow.config", + "_type": "code" + }, + { + "_type": "block", + "style": "normal", + "_key": "7a6a5b37fd9d", + "children": [ + { + "text": "", + "_key": "05417cd2d683", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "fdf87214a7f7", + "markDefs": [ + { + "href": "https://github.com/nextflow-io/rnaseq-nf/blob/ed179ef74df8d5c14c188e200a37fff61fd55dfb/modules/multiqc/main.nf#L5", + "_key": "ecffbe50d053", + "_type": "link" + } + ], + "children": [ + { + "text": "While the pipeline stores everything in the bucket, our example pipeline will also download the final outputs to a local directory called ", + "_key": "b5e70e86e7d5", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "results", + "_key": "57cdfeb51cd3" + }, + { + "_key": "1bb98dfa2883", + "_type": "span", + "marks": [], + "text": ", because of how the " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "publishDir", + "_key": "7f24faa22f2f" + }, + { + "text": " directive was specified in the ", + "_key": "ec9fc60f4f41", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "code" + ], + "text": "main.nf", + "_key": "d94baec681fc", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " script (example ", + "_key": "2cf5b0503440" + }, + { + "text": "here", + "_key": "3c0ca9c35f09", + "_type": "span", + "marks": [ + "ecffbe50d053" + ] + }, + { + "text": "). If you want to avoid the egress cost associated with downloading data from a bucket, you can change the ", + "_key": "e3eb83dafbc7", + "_type": "span", + "marks": [] + }, + { + "_key": "e2663d779e5f", + "_type": "span", + "marks": [ + "code" + ], + "text": "publishDir" + }, + { + "marks": [], + "text": " to another bucket directory, e.g. 
", + "_key": "cda80dbda0dd", + "_type": "span" + }, + { + "text": "gs://rnaseq-pipeline-bckt/results", + "_key": "8515bf18a5d9", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "f1a5c5569291" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "7bae53def502" + } + ], + "_type": "block", + "style": "normal", + "_key": "fe166b1cfebe" + }, + { + "children": [ + { + "_key": "a5d345b3c49b", + "_type": "span", + "marks": [], + "text": "In your terminal, you should see something like this:" + } + ], + "_type": "block", + "style": "normal", + "_key": "31dd792d1971", + "markDefs": [] + }, + { + "children": [ + { + "text": "", + "_key": "e561e553fdfa", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "2988b5de445d" + }, + { + "alt": "Nextflow ongoing run on Google Cloud Batch", + "_key": "57b6920fd4d5", + "asset": { + "_ref": "image-f828eff746c5383b57ed1a8943f8bfd64f224475-1714x656-png", + "_type": "reference" + }, + "_type": "image" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "6f0a9dfe3b2b" + } + ], + "_type": "block", + "style": "normal", + "_key": "b754a6329358" + }, + { + "style": "normal", + "_key": "4e0d6e461a2e", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "You can check the status of your jobs on Google Batch by opening another terminal and running the following command:", + "_key": "d2bad5bbba43", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "e71d191cf0a4", + "children": [ + { + "_key": "06179d9ea5e4", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "_type": "code", + "_key": "042a2d28ce93", + "code": "$ gcloud batch jobs list" + }, + { + "_type": "block", + "style": "normal", + "_key": "e2fd1c323765", + "children": [ + { + "_key": "630d9653ec5d", + "_type": "span", + "text": "" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "b4f5c23a00dc", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "By the end of it, if everything worked well, you should see something like:", + "_key": "256c3085089f", + "_type": "span" + } + ] + }, + { + "_key": "39a762d7636b", + "children": [ + { + "_type": "span", + "text": "", + "_key": "b34980829a20" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "image", + "alt": "Nextflow run on Google Cloud Batch finished", + "_key": "47d737b659d2", + "asset": { + "_ref": "image-79042f36d93b14d4f6efb0eafa75d2043f6b797e-1728x866-png", + "_type": "reference" + } + }, + { + "children": [ + { + "_key": "bbf66adba559", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "f445bee37ae3" + }, + { + "children": [ + { + "_key": "7a4c0728b276", + "_type": "span", + "marks": [], + "text": "And that's all, folks! 
😆" + } + ], + "_type": "block", + "style": "normal", + "_key": "6a911548ebb7", + "markDefs": [] + }, + { + "children": [ + { + "text": "", + "_key": "d5cded0acc2a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "fa6a8b7a8ab9" + }, + { + "_type": "block", + "style": "normal", + "_key": "0d663c3aa42a", + "markDefs": [ + { + "_type": "link", + "href": "https://www.nextflow.io/blog/2022/deploy-nextflow-pipelines-with-google-cloud-batch.html", + "_key": "e04f54fcfedf" + }, + { + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/google.html", + "_key": "79f110285473" + } + ], + "children": [ + { + "_key": "54d4d3a3b8dc", + "_type": "span", + "marks": [], + "text": "You will find more information about Nextflow on Google Batch in " + }, + { + "marks": [ + "e04f54fcfedf" + ], + "text": "this blog post", + "_key": "982ebc5705a1", + "_type": "span" + }, + { + "text": " and the ", + "_key": "3faefd1b3a4d", + "_type": "span", + "marks": [] + }, + { + "text": "official Nextflow documentation", + "_key": "2fb54fa8c31e", + "_type": "span", + "marks": [ + "79f110285473" + ] + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "587770709603" + } + ] + }, + { + "_key": "0cfd24fe661b", + "children": [ + { + "_type": "span", + "text": "", + "_key": "340bdcc16d70" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "0c7956bb62ab", + "markDefs": [], + "children": [ + { + "text": "Special thanks to Hatem Nawar, Chris Hakkaart, and Ben Sherman for providing valuable feedback to this document. ", + "_key": "7861ad82e5c5", + "_type": "span", + "marks": [] + }, + { + "_key": "e877a2c39df0", + "_type": "span", + "text": "
" + } + ] + } + ], + "_rev": "g7tG3ShgLiOybM4TXYts9d", + "_type": "blogPost", + "_createdAt": "2024-09-25T14:17:38Z", + "_id": "0f6fb8a71436", + "publishedAt": "2023-02-01T07:00:00.000Z", + "title": "Get started with Nextflow on Google Cloud Batch", + "author": { + "_ref": "mNsm4Vx1W1Wy6aYYkroetD", + "_type": "reference" + }, + "_updatedAt": "2024-09-25T14:17:38Z" + }, + { + "title": "Learn Nextflow in 2023", + "author": { + "_ref": "evan-floden", + "_type": "reference" + }, + "_rev": "Ot9x7kyGeH5005E3MJ8hC8", + "publishedAt": "2023-02-24T07:00:00.000Z", + "_type": "blogPost", + "_updatedAt": "2024-10-07T09:32:40Z", + "_createdAt": "2024-09-25T14:17:25Z", + "_id": "0fdf9ea70365", + "body": [ + { + "_type": "block", + "style": "normal", + "_key": "5bc75c55bad2", + "markDefs": [ + { + "href": "https://carpentries-incubator.github.io/workflows-nextflow/index.html", + "_key": "dca21c4b2cf3", + "_type": "link" + }, + { + "_key": "ddf6d5188a96", + "_type": "link", + "href": "https://nf-co.re/events/training/" + }, + { + "_type": "link", + "href": "https://github.com/seqeralabs/wave-showcase", + "_key": "fb2bb6d334df" + } + ], + "children": [ + { + "marks": [], + "text": "In 2023, the world of Nextflow is more exciting than ever! With new resources constantly being released, there is no better time to dive into this powerful tool. From a new ", + "_key": "a11e41bb71bb", + "_type": "span" + }, + { + "_key": "d305cac4733b", + "_type": "span", + "marks": [ + "dca21c4b2cf3" + ], + "text": "Software Carpentries’" + }, + { + "_type": "span", + "marks": [], + "text": " course to ", + "_key": "a25adf9b8040" + }, + { + "marks": [ + "ddf6d5188a96" + ], + "text": "recordings of mutiple nf-core training events", + "_key": "7beb581f0d16", + "_type": "span" + }, + { + "_key": "967772d3efed", + "_type": "span", + "marks": [], + "text": " to " + }, + { + "_key": "759b6c5382c8", + "_type": "span", + "marks": [ + "fb2bb6d334df" + ], + "text": "new tutorials on Wave and Fusion" + }, + { + "text": ", the options for learning Nextflow are endless.", + "_key": "a31188af34f0", + "_type": "span", + "marks": [] + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "ef8cd1145e60" + } + ], + "_type": "block", + "style": "normal", + "_key": "0bce358b9a98" + }, + { + "markDefs": [ + { + "_key": "a4c63f7dbbab", + "_type": "link", + "href": "https://github.com/nextflow-io/" + } + ], + "children": [ + { + "text": "We've compiled a list of the best resources in 2023 to make your journey to Nextflow mastery as seamless as possible. And remember, Nextflow is a community-driven project. 
If you have suggestions or want to contribute to this list, head to the ", + "_key": "0c281e8a9d10", + "_type": "span", + "marks": [] + }, + { + "_key": "621a6366c1f9", + "_type": "span", + "marks": [ + "a4c63f7dbbab" + ], + "text": "GitHub page" + }, + { + "_type": "span", + "marks": [], + "text": " and make a pull request.", + "_key": "6ec13768132e" + } + ], + "_type": "block", + "style": "normal", + "_key": "e89fbd3c3b7b" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "85ee8ab68a65", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "17c429ef2098" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Before you start", + "_key": "b1c8ce4c8ad9" + } + ], + "_type": "block", + "style": "h2", + "_key": "68808c4cdf96", + "markDefs": [] + }, + { + "_key": "94f3843def7c", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Before learning Nextflow, you should be comfortable with the Linux command line and be familiar with some basic scripting languages, such as Perl or Python. The beauty of Nextflow is that task logic can be written in your language of choice. You will just need to learn Nextflow’s domain-specific language (DSL) to control overall flow.", + "_key": "88663651e1d1" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "819d38804ac0", + "markDefs": [], + "children": [ + { + "_key": "0dec2b379daf", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "style": "normal", + "_key": "388a202e45da", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow is widely used in bioinformatics, so many tutorials focus on life sciences. However, Nextflow can be used for almost any data-intensive workflow, including image analysis, ML model training, astronomy, and geoscience applications.", + "_key": "10557b975aee" + } + ], + "_type": "block" + }, + { + "_key": "064da3cd5896", + "markDefs": [], + "children": [ + { + "_key": "0f2993a5f54c", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "299694f8186e", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "So, let's get started! These resources will guide you from beginner to expert and make you unstoppable in the field of scientific workflows.", + "_key": "af45f4dc2b19" + } + ] + }, + { + "_type": "block", + "style": "h2", + "_key": "2c6e3d4448d8", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Why learn Nextflow", + "_key": "af1e64688665" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "378a77ee0a16", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "There are hundreds of workflow managers to choose from. In fact, Meir Wahnon and several of his colleagues have gone to the trouble of compiling an awesome-workflow-engines list. 
The workflows community initiative is another excellent source of information about workflow engines.", + "_key": "9607ccc72ad6" + } + ] + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "a611120065c1", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://www.go-fair.org/fair-principles/", + "_key": "908929814ea5" + } + ], + "children": [ + { + "marks": [], + "text": "Using Nextflow in your analysis workflows helps you implement reproducible pipelines. Nextflow pipelines follow ", + "_key": "7343940ce3760", + "_type": "span" + }, + { + "text": "FAIR guidelines", + "_key": "7343940ce3761", + "_type": "span", + "marks": [ + "908929814ea5" + ] + }, + { + "marks": [], + "text": " (findability, accessibility, interoperability, and reuse). Nextflow also supports version control and containers to manage all software dependencies.", + "_key": "7343940ce3762", + "_type": "span" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow is portable; the same pipeline written on a laptop can quickly scale to run on an HPC cluster, Amazon AWS, Microsoft Azure, Google Cloud Platform, or Kubernetes. With features like ", + "_key": "ff3131edffdf0" + }, + { + "_key": "ff3131edffdf1", + "_type": "span", + "marks": [ + "ffaad2941792" + ], + "text": "configuration profiles" + }, + { + "text": ", code can be written so that it is 100% portable across different on-prem and cloud infrastructures enabling collaboration and avoiding lock-in.", + "_key": "ff3131edffdf2", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "9ef47040de69", + "listItem": "bullet", + "markDefs": [ + { + "_key": "ffaad2941792", + "_type": "link", + "href": "https://nextflow.io/docs/latest/config.html?#config-profiles" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "It is massively ", + "_key": "f579f0dc175a0" + }, + { + "_key": "f579f0dc175a1", + "_type": "span", + "marks": [ + "strong" + ], + "text": "scalable" + }, + { + "text": ", allowing the parallelization of tasks using the dataflow paradigm without hard-coding pipelines to specific platforms, workload managers, or batch services.", + "_key": "f579f0dc175a2", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "a23bae9cb866", + "listItem": "bullet", + "markDefs": [] + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Nextflow is ", + "_key": "9de750c226da0", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "strong" + ], + "text": "flexible", + "_key": "9de750c226da1", + "_type": "span" + }, + { + "_key": "9de750c226da2", + "_type": "span", + "marks": [], + "text": ", supporting scientific workflow requirements like caching processes to avoid redundant computation and workflow reporting to help understand and diagnose workflow execution patterns." 
+ } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "6b154f3779b4" + }, + { + "listItem": "bullet", + "markDefs": [ + { + "_key": "36d95e2b912b", + "_type": "link", + "href": "https://seqera.io/?__hstc=247481240.9785ace5f350c27ee07ceee486f18692.1717578742769.1727429443081.1727441286556.79&__hssc=247481240.5.1727441286556&__hsfp=3485190257" + } + ], + "children": [ + { + "marks": [], + "text": "It is ", + "_key": "47ff0b1272840", + "_type": "span" + }, + { + "marks": [ + "strong" + ], + "text": "growing fast", + "_key": "47ff0b1272841", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ", and ", + "_key": "47ff0b1272842" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "support is available", + "_key": "47ff0b1272843" + }, + { + "_key": "47ff0b1272844", + "_type": "span", + "marks": [], + "text": " from " + }, + { + "_key": "47ff0b1272845", + "_type": "span", + "marks": [ + "36d95e2b912b" + ], + "text": "Seqera Labs" + }, + { + "text": ". The project has been active since 2013 with a vibrant developer community, and the Nextflow ecosystem continues to expand rapidly.", + "_key": "47ff0b1272846", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "87457c2d2387" + }, + { + "children": [ + { + "marks": [], + "text": "Finally, Nextflow is open source and licensed under Apache 2.0. You are free to use it, modify it, and distribute it.", + "_key": "c1d7c96cdc990", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "6a6d4e75342a", + "listItem": "bullet", + "markDefs": [] + }, + { + "_type": "block", + "style": "h2", + "_key": "acaaa9abc8bb", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Meet the tutorials!", + "_key": "520e77a4d51b0", + "_type": "span" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Some of the best publicly available tutorials are listed below:\n", + "_key": "1814aa85ac1a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "0306a8d546a0" + }, + { + "_key": "fc2b553c372a", + "markDefs": [], + "children": [ + { + "_key": "469cbfa0d32b0", + "_type": "span", + "marks": [], + "text": "\n1. Basic Nextflow Community Training" + } + ], + "_type": "block", + "style": "h3" + }, + { + "_key": "139ceb32ed9a", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Basic training for all things Nextflow. Perfect for anyone looking to get to grips with using Nextflow to run analyses and build workflows. This is the primary Nextflow training material used in most Nextflow and nf-core training events. 
It covers a large number of topics, with both theoretical and hands-on chapters.", + "_key": "5e34ff7e1f5b0" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://training.nextflow.io/basic_training/", + "_key": "4826f34c7fa2" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "4826f34c7fa2" + ], + "text": "Basic Nextflow Community Training", + "_key": "559362c52e500" + } + ], + "_type": "block", + "style": "normal", + "_key": "51e093abc5d1" + }, + { + "style": "normal", + "_key": "7f6120e25fc3", + "markDefs": [ + { + "_type": "link", + "href": "https://nf-co.re/events/2023/training-basic-2023", + "_key": "a714986f5b10" + }, + { + "_key": "c7886dbb5375", + "_type": "link", + "href": "https://youtu.be/ERbTqLtAkps?si=6xDoDXsb6kGQ_Qa8" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "We run a free online training event for this course approximately every six months. Videos are streamed to YouTube and questions are handled in the nf-core Slack community. You can watch the recording of the most recent training (", + "_key": "71ce07d47afe0" + }, + { + "marks": [ + "a714986f5b10" + ], + "text": "September, 2023", + "_key": "71ce07d47afe1", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ") in the ", + "_key": "71ce07d47afe2" + }, + { + "marks": [ + "c7886dbb5375" + ], + "text": "YouTube playlist", + "_key": "71ce07d47afe3", + "_type": "span" + }, + { + "_key": "71ce07d47afe4", + "_type": "span", + "marks": [], + "text": " below:" + } + ], + "_type": "block" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "afc4934b74dc", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "16a31f72758e", + "markDefs": [] + }, + { + "_type": "youtube", + "id": "ERbTqLtAkps", + "_key": "8c775e987678" + }, + { + "_key": "5e2f1d1af3b1", + "markDefs": [], + "children": [ + { + "text": "\n", + "_key": "b15ebcb19b5f", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "h3", + "_key": "840b354698e9", + "markDefs": [], + "children": [ + { + "text": "2. Hands-on Nextflow Community Training", + "_key": "acafec39a7f50", + "_type": "span", + "marks": [] + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "0da82531a32d", + "markDefs": [], + "children": [ + { + "text": "A \"learn by doing\" tutorial with less focus on theory, instead leading through exercises of slowly increasing complexity. 
This course is quite short and hands-on, great if you want to practice your Nextflow skills.", + "_key": "e41ac9000a35", + "_type": "span", + "marks": [] + } + ] + }, + { + "style": "normal", + "_key": "bd855987346e", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "b5ff0c647ea4", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "Hands-on Nextflow Community Training", + "_key": "fe5ec95255bc", + "_type": "span", + "marks": [ + "4f1437070eda" + ] + } + ], + "_type": "block", + "style": "normal", + "_key": "deb48a263735", + "markDefs": [ + { + "_key": "4f1437070eda", + "_type": "link", + "href": "https://training.nextflow.io/hands_on/" + } + ] + }, + { + "_key": "b520561477f5", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "a721cf3aeebd" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "href": "https://nf-co.re/events/2023/training-hands-on-2023/", + "_key": "b411ad157951", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "You can watch the recording of the most recent training (", + "_key": "aa0f3bdbba4d", + "_type": "span" + }, + { + "marks": [ + "b411ad157951" + ], + "text": "September, 2023", + "_key": "ee4dc766bc82", + "_type": "span" + }, + { + "text": ") below:", + "_key": "888ea33a7616", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "dc55bd204369" + }, + { + "id": "x5klpxczAXA", + "_key": "786c23c08e30", + "_type": "youtube" + }, + { + "_key": "d4a2b81dd764", + "markDefs": [], + "children": [ + { + "_key": "d282de5542440", + "_type": "span", + "marks": [], + "text": "\n3. Advanced Nextflow Community Training" + } + ], + "_type": "block", + "style": "h3" + }, + { + "_type": "block", + "style": "normal", + "_key": "8017cc325628", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "644459aaeef1" + } + ] + }, + { + "style": "normal", + "_key": "145df6ca04f2", + "markDefs": [], + "children": [ + { + "text": "An advanced material exploring the advanced features of the Nextflow language and runtime, and how to use them to write efficient and scalable data-intensive workflows. 
This is the Nextflow training material used in advanced training events.", + "_key": "bfc62ea17c9b", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "9de862f9709b" + } + ], + "_type": "block", + "style": "normal", + "_key": "09a870b3178e", + "markDefs": [] + }, + { + "children": [ + { + "_key": "de5d8b67a4d5", + "_type": "span", + "marks": [ + "62fbcb9fadb1" + ], + "text": "Advanced Nextflow Community Training" + } + ], + "_type": "block", + "style": "normal", + "_key": "d733adf27c15", + "markDefs": [ + { + "_key": "62fbcb9fadb1", + "_type": "link", + "href": "https://training.nextflow.io/advanced/" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "0178a2256c13", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "5c36587f3e94" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "You can watch the recording of the most recent training (", + "_key": "003027e033f7" + }, + { + "_type": "span", + "marks": [ + "61d96be41411" + ], + "text": "September, 2023", + "_key": "ca51aa5466f9" + }, + { + "_type": "span", + "marks": [], + "text": ") below:", + "_key": "01c30e142a75" + } + ], + "_type": "block", + "style": "normal", + "_key": "67692dd68831", + "markDefs": [ + { + "href": "https://nf-co.re/events/2023/training-sept-2023/", + "_key": "61d96be41411", + "_type": "link" + } + ] + }, + { + "_type": "youtube", + "id": "nPAH9owvKvI", + "_key": "eceef80e13ca" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "\n4. Software Carpentry workshop", + "_key": "5c4f8525cdf70" + } + ], + "_type": "block", + "style": "h3", + "_key": "5fa09ed8084d", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "08172885a6e5", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "6bdba13535d9", + "_type": "span" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "The ", + "_key": "700ad48e2eb0" + }, + { + "text": "Nextflow Software Carpentry", + "_key": "bebd56829f17", + "_type": "span", + "marks": [ + "68152e533171" + ] + }, + { + "_key": "3370b97a6ce9", + "_type": "span", + "marks": [], + "text": " workshop (still being developed) explains the use of Nextflow and " + }, + { + "_key": "a5dfe2a9be09", + "_type": "span", + "marks": [ + "46d00b5a4e4c" + ], + "text": "nf-core" + }, + { + "marks": [], + "text": " as development tools for building and sharing reproducible data science workflows. The intended audience is those with little programming experience. The course provides a foundation to write and run Nextflow and nf-core workflows comfortably. Adapted from the Seqera training material above, the workshop has been updated by Software Carpentries instructors within the nf-core community to fit The Carpentries training style. ", + "_key": "0605b3d36b45", + "_type": "span" + }, + { + "_key": "354b4c651b6b", + "_type": "span", + "marks": [ + "97fadeabeea1" + ], + "text": "The Carpentries" + }, + { + "text": " emphasize feedback to improve teaching materials, so we would like to hear back from you about what you thought was well-explained and what needs improvement. Pull requests to the course material are very welcome. 
The workshop can be opened on Gitpod where you can try the exercises in an online computing environment at your own pace while referencing the course material in another window alongside the tutorials.", + "_key": "df445453f644", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "4991d679b33f", + "markDefs": [ + { + "_type": "link", + "href": "https://carpentries-incubator.github.io/workflows-nextflow/index.html", + "_key": "68152e533171" + }, + { + "_type": "link", + "href": "https://nf-co.re/", + "_key": "46d00b5a4e4c" + }, + { + "_key": "97fadeabeea1", + "_type": "link", + "href": "https://carpentries.org/" + } + ] + }, + { + "_key": "374d664dc520", + "markDefs": [], + "children": [ + { + "_key": "33e2ea05db55", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "1d72bfdeab27", + "markDefs": [ + { + "_type": "link", + "href": "https://gitpod.io/#https://github.com/carpentries-incubator/workflows-nextflow", + "_key": "5830efc62a54" + } + ], + "children": [ + { + "text": "The workshop can be opened on ", + "_key": "125aaa9b1a2b", + "_type": "span", + "marks": [] + }, + { + "text": "Gitpod", + "_key": "7830a3dc4dca", + "_type": "span", + "marks": [ + "5830efc62a54" + ] + }, + { + "_type": "span", + "marks": [], + "text": " where you can try the exercises in an online computing environment at your own pace while referencing the course material in another window alongside the tutorials.", + "_key": "4846465745ed" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "da5a49b3bf26", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "667ad80fb604" + }, + { + "children": [ + { + "marks": [], + "text": "You can find the course in ", + "_key": "360795d3399a", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "9efeacbd9b2e" + ], + "text": "The Carpentries incubator", + "_key": "23d75751e3fe" + }, + { + "marks": [], + "text": ".", + "_key": "e79b1abf1ef8", + "_type": "span" + } + ], + "_type": "block", + "style": "blockquote", + "_key": "773850db7f6c", + "markDefs": [ + { + "_type": "link", + "href": "https://carpentries-incubator.github.io/workflows-nextflow/index.html", + "_key": "9efeacbd9b2e" + } + ] + }, + { + "_key": "468ab6b2b45f", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "6441975c5b2d" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "\n5. 
An introduction to Nextflow course from Uppsala University",
+            "_key": "5f7626af2ed00"
+          }
+        ],
+        "_type": "block",
+        "style": "h3",
+        "_key": "4cee87c197a9"
+      },
+      {
+        "markDefs": [],
+        "children": [
+          {
+            "text": "This 5-module course by Uppsala University covers the basics of Nextflow, from running Nextflow pipelines to writing your own pipelines, and even using containers and Conda.",
+            "_key": "eee49f7eb87c",
+            "_type": "span",
+            "marks": []
+          }
+        ],
+        "_type": "block",
+        "style": "normal",
+        "_key": "41a124bbf542"
+      },
+      {
+        "markDefs": [],
+        "children": [
+          {
+            "text": "",
+            "_key": "38d778aea799",
+            "_type": "span",
+            "marks": []
+          }
+        ],
+        "_type": "block",
+        "style": "normal",
+        "_key": "4b4c3044f190"
+      },
+      {
+        "_type": "block",
+        "style": "blockquote",
+        "_key": "af44442c1da9",
+        "markDefs": [
+          {
+            "_key": "24889bbd5d04",
+            "_type": "link",
+            "href": "https://uppsala.instructure.com/courses/51980/pages/nextflow-1-introduction?module_item_id=328997"
+          }
+        ],
+        "children": [
+          {
+            "_type": "span",
+            "marks": [],
+            "text": "The course can be viewed ",
+            "_key": "dc4976c72375"
+          },
+          {
+            "marks": [
+              "24889bbd5d04"
+            ],
+            "text": "here",
+            "_key": "6490c0bbf7cd",
+            "_type": "span"
+          },
+          {
+            "_type": "span",
+            "marks": [],
+            "text": ".",
+            "_key": "0e6e45aff751"
+          }
+        ]
+      },
+      {
+        "markDefs": [],
+        "children": [
+          {
+            "text": "",
+            "_key": "e0e7b77dd2a7",
+            "_type": "span",
+            "marks": []
+          }
+        ],
+        "_type": "block",
+        "style": "normal",
+        "_key": "cbf66d9fe0c3"
+      },
+      {
+        "_type": "block",
+        "style": "h3",
+        "_key": "aefad24ed168",
+        "markDefs": [],
+        "children": [
+          {
+            "_type": "span",
+            "marks": [],
+            "text": "\n6. Introduction to Nextflow workshop by VIB",
+            "_key": "27acfe9e79d50"
+          }
+        ]
+      },
+      {
+        "children": [
+          {
+            "_key": "435bc619f9df",
+            "_type": "span",
+            "marks": [],
+            "text": "Workshop materials by VIB, written (mainly) in DSL2, that aim to get you familiar with the Nextflow syntax by explaining basic concepts and building a simple RNAseq pipeline. They also highlight reproducibility aspects, such as adding containers (Docker & Singularity)."
+          }
+        ],
+        "_type": "block",
+        "style": "normal",
+        "_key": "278db12e0125",
+        "markDefs": []
+      },
+      {
+        "markDefs": [],
+        "children": [
+          {
+            "_type": "span",
+            "marks": [],
+            "text": "",
+            "_key": "f9cd12601e21"
+          }
+        ],
+        "_type": "block",
+        "style": "normal",
+        "_key": "5f8e54b2ca66"
+      },
+      {
+        "_key": "6ee0eb601127",
+        "markDefs": [
+          {
+            "_key": "b276e49295a5",
+            "_type": "link",
+            "href": "https://vibbits-nextflow-workshop.readthedocs.io/en/latest/"
+          }
+        ],
+        "children": [
+          {
+            "_type": "span",
+            "marks": [],
+            "text": "The course can be viewed ",
+            "_key": "c49b92be46ca"
+          },
+          {
+            "_key": "cd3722fbb3f9",
+            "_type": "span",
+            "marks": [
+              "b276e49295a5"
+            ],
+            "text": "here"
+          },
+          {
+            "marks": [],
+            "text": ".",
+            "_key": "5a51ca711ec3",
+            "_type": "span"
+          }
+        ],
+        "_type": "block",
+        "style": "blockquote"
+      },
+      {
+        "_type": "block",
+        "style": "h3",
+        "_key": "74b5fd46c2a2",
+        "markDefs": [],
+        "children": [
+          {
+            "_type": "span",
+            "marks": [],
+            "text": "\n7. 
Nextflow Training by Curtin Institute of Radio Astronomy (CIRA)", + "_key": "e331a618fd400" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "This training was prepared for physicists and has examples applied to astronomy which may be interesting for Nextflow users coming from this background!", + "_key": "8d13f2db1aa8", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "2fe4e6d15194" + }, + { + "style": "normal", + "_key": "357166a3aee5", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "9c8546e9ad40" + } + ], + "_type": "block" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://carpentries-incubator.github.io/Pipeline_Training_with_Nextflow/", + "_key": "2d0ff7a9aaf8" + } + ], + "children": [ + { + "_key": "56d6b95e1e72", + "_type": "span", + "marks": [], + "text": "The course can be viewed " + }, + { + "marks": [ + "2d0ff7a9aaf8" + ], + "text": "here", + "_key": "eb12f7388297", + "_type": "span" + }, + { + "text": ".", + "_key": "04fc832b9dbf", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "blockquote", + "_key": "ff0ba5106608" + }, + { + "_type": "block", + "style": "normal", + "_key": "9104a2a76347", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "591315ebed0b", + "_type": "span", + "marks": [] + } + ] + }, + { + "children": [ + { + "text": "\n8. Managing Pipelines in the Cloud - GenomeWeb Webinar", + "_key": "e6741e1d38930", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h3", + "_key": "33277ba862ab", + "markDefs": [] + }, + { + "children": [ + { + "marks": [], + "text": "This on-demand webinar features Phil Ewels from SciLifeLab, nf-core (now also Seqera Labs), Brendan Boufler from Amazon Web Services, and Evan Floden from Seqera Labs. The wide-ranging discussion covers the significance of scientific workflows, examples of Nextflow in production settings, and how Nextflow can be integrated with other processes.", + "_key": "dc6fc40ff5e4", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "65603e15901f", + "markDefs": [] + }, + { + "children": [ + { + "text": "", + "_key": "fc8269195c23", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "6f4991357ed2", + "markDefs": [] + }, + { + "style": "blockquote", + "_key": "5ba07236ea9d", + "markDefs": [ + { + "_type": "link", + "href": "https://seqera.io/events/managing-bioinformatics-pipelines-in-the-cloud-to-do-more-science/", + "_key": "b6a02f33642f" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "b6a02f33642f" + ], + "text": "Watch the webinar", + "_key": "5cb1b49c3828" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "c8eeeb58ed47", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "6bfe91b90e75" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "\n9. Nextflow implementation patterns", + "_key": "2c216c0653b70" + } + ], + "_type": "block", + "style": "h3", + "_key": "74ec51472a20", + "markDefs": [] + }, + { + "children": [ + { + "marks": [], + "text": "This advanced documentation discusses recurring patterns in Nextflow and solutions to many common implementation requirements. 
Code examples are available with notes to follow along and a GitHub repository.", + "_key": "c4469aba33b0", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "0761f5f7242b", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "bf535a0e7e9e", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "5d1c59f4e026", + "_type": "span" + } + ] + }, + { + "_key": "7089ce9b4a0f", + "markDefs": [ + { + "href": "http://nextflow-io.github.io/patterns/index.html", + "_key": "2ea8fd78e039", + "_type": "link" + }, + { + "_key": "f79220d04cd8", + "_type": "link", + "href": "https://github.com/nextflow-io/patterns" + } + ], + "children": [ + { + "marks": [ + "2ea8fd78e039" + ], + "text": "Nextflow Patterns", + "_key": "32202ffeef59", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " & ", + "_key": "4d449f2e9862" + }, + { + "marks": [ + "f79220d04cd8" + ], + "text": "GitHub repository", + "_key": "f4e9eaabe069", + "_type": "span" + }, + { + "marks": [], + "text": ".", + "_key": "1520722134a5", + "_type": "span" + } + ], + "_type": "block", + "style": "blockquote" + }, + { + "_key": "098365c814a0", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "5dba3d0cd02a" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "h3", + "_key": "282391dca552", + "markDefs": [], + "children": [ + { + "_key": "6143c264b12c0", + "_type": "span", + "marks": [], + "text": "\n10. nf-core tutorials" + } + ] + }, + { + "style": "normal", + "_key": "d93018464ace", + "markDefs": [ + { + "_type": "link", + "href": "https://nf-co.re/", + "_key": "ca0000123431" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "A set of tutorials covering the basics of using and creating nf-core pipelines developed by the team at ", + "_key": "f1f4ce37ab7d" + }, + { + "_type": "span", + "marks": [ + "ca0000123431" + ], + "text": "nf-core", + "_key": "5db320472915" + }, + { + "_key": "89b525b5a9c9", + "_type": "span", + "marks": [], + "text": ". 
These tutorials provide an overview of the nf-core framework, including:" + } + ], + "_type": "block" + }, + { + "_key": "2a3057409d46", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "cea6cb3cf2790", + "_type": "span", + "marks": [], + "text": "How to run nf-core pipelines" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "709a49933157", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "What are the most commonly used nf-core tools", + "_key": "b943f34613a70" + } + ], + "level": 1, + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "444207e21d0a", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "a99035d0075a0", + "_type": "span", + "marks": [], + "text": "How to make new pipelines using the nf-core template" + } + ], + "level": 1 + }, + { + "markDefs": [], + "children": [ + { + "text": "What are nf-core shared modules", + "_key": "c2c54d82f81a0", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "dc53e159463e", + "listItem": "bullet" + }, + { + "children": [ + { + "text": "How to add nf-core shared modules to a pipeline", + "_key": "bf53e7c3d19f0", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "52cf9627a17e", + "listItem": "bullet", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "How to make new nf-core modules using the nf-core module template", + "_key": "09c7a6f7575a0" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "af58cfe786ba", + "listItem": "bullet" + }, + { + "_key": "82e44e55ff18", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "68750984540e0", + "_type": "span", + "marks": [], + "text": "How nf-core pipelines are reviewed and ultimately released" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "_key": "cbafd44a3aea", + "markDefs": [ + { + "_type": "link", + "href": "https://nf-co.re/docs/usage/tutorials", + "_key": "8286d381a2cd" + }, + { + "_type": "link", + "href": "https://nf-co.re/docs/contributing/tutorials", + "_key": "f27eb0805d09" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "8286d381a2cd" + ], + "text": "nf-core usage tutorials", + "_key": "2eef60a9532a" + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "40e32b15d487" + }, + { + "_type": "span", + "marks": [ + "f27eb0805d09" + ], + "text": "nf-core developer tutorials", + "_key": "8224bcdfc363" + } + ], + "_type": "block", + "style": "blockquote" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "65d5e021bc01", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "ea05ab83b4cf", + "markDefs": [] + }, + { + "_type": "block", + "style": "h3", + "_key": "6963e5761954", + "markDefs": [], + "children": [ + { + "text": "\n11. 
Awesome Nextflow", + "_key": "c059b84bad970", + "_type": "span", + "marks": [] + } + ] + }, + { + "style": "normal", + "_key": "22307766ba1e", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "A collection of awesome Nextflow pipelines compiled by various contributors to the open-source Nextflow project.", + "_key": "51e92ba98fa6" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "2b3cddeef3af", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "c5f718aa7da6" + }, + { + "_key": "a10ad310bb10", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nextflow-io/awesome-nextflow", + "_key": "fee9b63924bf" + } + ], + "children": [ + { + "text": "Awesome Nextflow", + "_key": "c10a3145c34f", + "_type": "span", + "marks": [ + "fee9b63924bf" + ] + }, + { + "marks": [], + "text": " and GitHub", + "_key": "f0efee53d0a3", + "_type": "span" + } + ], + "_type": "block", + "style": "blockquote" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "9af8f1c3bf16" + } + ], + "_type": "block", + "style": "normal", + "_key": "ccf7a6675483", + "markDefs": [] + }, + { + "children": [ + { + "marks": [], + "text": "\n12. Wave showcase: Wave and Fusion tutorials", + "_key": "2e3025b58ccc0", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "c325658e2119", + "markDefs": [] + }, + { + "_key": "a2bb957aa3cc", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Wave and the Fusion file system are new Nextflow capabilities introduced in November 2022. Wave is a container provisioning and augmentation service fully integrated with the Nextflow ecosystem. Instead of viewing containers as separate artifacts that need to be integrated into a pipeline, Wave allows developers to manage containers as part of the pipeline itself.", + "_key": "b1440674b47f" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "f448705ab284" + } + ], + "_type": "block", + "style": "normal", + "_key": "034aeae4f571", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "b71fad712283", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Tightly coupled with Wave is the new Fusion 2.0 file system. Fusion implements a virtual distributed file system and presents a thin client, allowing data hosted in AWS S3 buckets (and other object stores in the future) to be accessed via the standard POSIX filesystem interface expected by most applications.", + "_key": "95b22d1f6441", + "_type": "span" + } + ] + }, + { + "children": [ + { + "_key": "2c7d391a5b55", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "53f92e2f3741", + "markDefs": [] + }, + { + "style": "normal", + "_key": "c6dca30e97fe", + "markDefs": [ + { + "_type": "link", + "href": "https://seqera.io/blog/breakthrough-performance-and-cost-efficiency-with-the-new-fusion-file-system/", + "_key": "3dd5f0d2af3d" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Wave can help simplify development, improve reliability, and make pipelines easier to maintain. It can even improve pipeline performance. 
The optional Fusion 2.0 file system offers further advantages, delivering performance on par with FSx for Lustre while enabling organizations to reduce their cloud computing bill and improve pipeline efficiency throughput. See the ", + "_key": "ef60df81f0ef" + }, + { + "marks": [ + "3dd5f0d2af3d" + ], + "text": "blog article", + "_key": "e14cd6f8705a", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " released in February 2023 explaining the Fusion file system and providing benchmarks comparing Fusion to other data handling approaches in the cloud.", + "_key": "68636e10e34a" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "9cbe914ddb49", + "markDefs": [], + "children": [ + { + "_key": "db24bb32d6cd", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "markDefs": [ + { + "href": "https://github.com/seqeralabs/wave-showcase", + "_key": "f4a5da4c44de", + "_type": "link" + } + ], + "children": [ + { + "marks": [ + "f4a5da4c44de" + ], + "text": "Wave showcase", + "_key": "833a6dc7c8a5", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " on GitHub", + "_key": "09590f5f8c16" + } + ], + "_type": "block", + "style": "blockquote", + "_key": "d11a6745f1ae" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "942773861a83" + } + ], + "_type": "block", + "style": "normal", + "_key": "230aa6990cc9" + }, + { + "_type": "block", + "style": "h3", + "_key": "674e7158e1a7", + "markDefs": [], + "children": [ + { + "_key": "f4a9475dcd4e0", + "_type": "span", + "marks": [], + "text": "\n13. Building Containers for Scientific Workflows" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "110648c6549a", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "While not strictly a guide about Nextflow, this article provides an overview of scientific containers and provides a tutorial involved in creating your own container and integrating it into a Nextflow pipeline. It also provides some useful tips on troubleshooting containers and publishing them to registries.", + "_key": "4f828041937a", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "f95a22baf64f", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "06544871b4d1", + "_type": "span" + } + ] + }, + { + "_key": "f6d0af8acfdd", + "markDefs": [ + { + "_type": "link", + "href": "https://seqera.io/blog/building-containers-for-scientific-workflows/", + "_key": "6a489e764f59" + } + ], + "children": [ + { + "_key": "4fc8099aaec0", + "_type": "span", + "marks": [ + "6a489e764f59" + ], + "text": "Building Containers for Scientific Workflows" + } + ], + "_type": "block", + "style": "blockquote" + }, + { + "style": "normal", + "_key": "cafea05b0adc", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "a1986c660c53" + } + ], + "_type": "block" + }, + { + "style": "h3", + "_key": "58cfa777614d", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "\n14. Best Practices for Deploying Pipelines with Nextflow Tower", + "_key": "79418d86bbbd0", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "09f7ec8ebe2a", + "markDefs": [], + "children": [ + { + "text": "When building Nextflow pipelines, a best practice is to supply a nextflow_schema.json file describing pipeline input parameters. 
The benefit of adding this file to your code repository is that if the pipeline is launched using Nextflow, the schema enables an easy-to-use web interface that guides users through the process of parameter selection. While it is possible to craft this file by hand, the nf-core community provides a handy schema build tool. This step-by-step guide explains how to adapt your pipeline for use with Nextflow Tower by using the schema build tool to automatically generate the nextflow_schema.json file.",
+                "_key": "a083960c2ef4",
+                "_type": "span",
+                "marks": []
+              }
+            ],
+            "_type": "block"
+          },
+      {
+        "_type": "block",
+        "style": "normal",
+        "_key": "6fe42050e25d",
+        "markDefs": [],
+        "children": [
+          {
+            "_type": "span",
+            "marks": [],
+            "text": "",
+            "_key": "5c5d8d0be7ef"
+          }
+        ]
+      },
+      {
+        "children": [
+          {
+            "_type": "span",
+            "marks": [
+              "46c32368848a"
+            ],
+            "text": "Best Practices for Deploying Pipelines with Nextflow Tower",
+            "_key": "a89d055c898c"
+          }
+        ],
+        "_type": "block",
+        "style": "blockquote",
+        "_key": "0bc696df02ae",
+        "markDefs": [
+          {
+            "_type": "link",
+            "href": "https://seqera.io/blog/best-practices-for-deploying-pipelines-with-nextflow-tower/",
+            "_key": "46c32368848a"
+          }
+        ]
+      },
+      {
+        "_type": "block",
+        "style": "normal",
+        "_key": "197a77ac8cd2",
+        "markDefs": [],
+        "children": [
+          {
+            "_key": "ba8a469f49f3",
+            "_type": "span",
+            "marks": [],
+            "text": ""
+          }
+        ]
+      },
+      {
+        "_type": "block",
+        "style": "h2",
+        "_key": "86a772be9093",
+        "markDefs": [],
+        "children": [
+          {
+            "marks": [],
+            "text": "Cloud integration tutorials",
+            "_key": "80c4f34cffd70",
+            "_type": "span"
+          }
+        ]
+      },
+      {
+        "_key": "ff236d6ac84d",
+        "markDefs": [
+          {
+            "_type": "link",
+            "href": "https://cloud.tower.nf/",
+            "_key": "a041b242f11e"
+          }
+        ],
+        "children": [
+          {
+            "_type": "span",
+            "marks": [],
+            "text": "In addition to the learning resources above, several step-by-step integration guides explain how to run Nextflow pipelines on your cloud platform of choice. Some of these tutorials extend to the use of ",
+            "_key": "5971c4a2bffc"
+          },
+          {
+            "_type": "span",
+            "marks": [
+              "a041b242f11e"
+            ],
+            "text": "Nextflow Tower",
+            "_key": "ad881edc3a46"
+          },
+          {
+            "_type": "span",
+            "marks": [],
+            "text": ". Organizations can use the Tower Cloud Free edition to launch pipelines quickly in the cloud, and can optionally use Tower Cloud Professional or run self-hosted or on-premises Tower Enterprise environments as requirements grow. This year, we added Google Cloud Batch to the cloud services supported by Nextflow.",
+            "_key": "77b020587c4f"
+          }
+        ],
+        "_type": "block",
+        "style": "normal"
+      },
+      {
+        "_key": "7d6c44a97f2c",
+        "markDefs": [],
+        "children": [
+          {
+            "_key": "60922751c66b",
+            "_type": "span",
+            "marks": [],
+            "text": ""
+          }
+        ],
+        "_type": "block",
+        "style": "normal"
+      },
+      {
+        "style": "h3",
+        "_key": "fd46f4e03d70",
+        "markDefs": [],
+        "children": [
+          {
+            "_type": "span",
+            "marks": [],
+            "text": "\n1. Nextflow and AWS Batch — Inside the Integration",
+            "_key": "a1e87a47503f0"
+          }
+        ],
+        "_type": "block"
+      },
+      {
+        "children": [
+          {
+            "marks": [],
+            "text": "This three-part series of articles provides a step-by-step guide explaining how to use Nextflow with AWS Batch. The ",
+            "_key": "9e4d29ff1184",
+            "_type": "span"
+          },
+          {
+            "marks": [
+              "0cb0aaaaaac2"
+            ],
+            "text": "first of three articles",
+            "_key": "81390b230c5d",
+            "_type": "span"
+          },
+          {
+            "text": " covers AWS Batch concepts, the Nextflow execution model, and explains how the integration works under the covers. 
The ", + "_key": "01254938674d", + "_type": "span", + "marks": [] + }, + { + "text": "second article", + "_key": "c25cf18033b3", + "_type": "span", + "marks": [ + "995aef5b8624" + ] + }, + { + "text": " in the series provides a step-by-step guide explaining how to set up the AWS batch environment and how to run and troubleshoot open-source Nextflow pipelines. The ", + "_key": "060f7d544d22", + "_type": "span", + "marks": [] + }, + { + "_key": "aa8a0ca471f9", + "_type": "span", + "marks": [ + "43ccc0f712ad" + ], + "text": "third article" + }, + { + "text": " builds on what you've learned, explaining how to integrate workflows with Nextflow Tower and share the AWS Batch environment with other users by "publishing" your workflows to the cloud.", + "_key": "957e1d95bf6f", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "bc8ee578db25", + "markDefs": [ + { + "_type": "link", + "href": "https://seqera.io/blog/nextflow-and-aws-batch-inside-the-integration-part-1-of-3/", + "_key": "0cb0aaaaaac2" + }, + { + "href": "https://seqera.io/blog/nextflow-and-aws-batch-inside-the-integration-part-2-of-3/", + "_key": "995aef5b8624", + "_type": "link" + }, + { + "_type": "link", + "href": "https://seqera.io/blog/nextflow-and-aws-batch-using-tower-part-3-of-3/", + "_key": "43ccc0f712ad" + } + ] + }, + { + "children": [ + { + "_key": "2fdc7ebd9b30", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "ed6e1e2f7830", + "markDefs": [] + }, + { + "_type": "block", + "style": "blockquote", + "_key": "4cc3a868808b", + "markDefs": [ + { + "href": "https://seqera.io/blog/nextflow-and-aws-batch-inside-the-integration-part-1-of-3/", + "_key": "21cfbf450168", + "_type": "link" + }, + { + "_type": "link", + "href": "https://seqera.io/blog/nextflow-and-aws-batch-inside-the-integration-part-2-of-3/", + "_key": "c8941e233067" + }, + { + "_key": "334a52c91755", + "_type": "link", + "href": "https://seqera.io/blog/nextflow-and-aws-batch-using-tower-part-3-of-3/" + } + ], + "children": [ + { + "_key": "e30bf5bd0b25", + "_type": "span", + "marks": [], + "text": "Nextflow and AWS Batch — Inside the Integration (" + }, + { + "_key": "6c5132ee3845", + "_type": "span", + "marks": [ + "21cfbf450168" + ], + "text": "part 1 of 3" + }, + { + "text": ", ", + "_key": "2ed41984b7f6", + "_type": "span", + "marks": [] + }, + { + "text": "part 2 of 3", + "_key": "afbdf285f4a4", + "_type": "span", + "marks": [ + "c8941e233067" + ] + }, + { + "_key": "fbeac335d932", + "_type": "span", + "marks": [], + "text": ", " + }, + { + "text": "part 3 of 3", + "_key": "6449ac7ed6df", + "_type": "span", + "marks": [ + "334a52c91755" + ] + }, + { + "text": ")", + "_key": "dde1e9911358", + "_type": "span", + "marks": [] + } + ] + }, + { + "children": [ + { + "_key": "26c0807fff0e", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "44b38a9e7160", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "\n2. 
Nextflow and Azure Batch — Inside the Integration", + "_key": "04758a922e270", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "6f0ff87ca5d5" + }, + { + "_key": "74d8268e38aa", + "markDefs": [ + { + "href": "https://seqera.io/blog/nextflow-and-azure-batch-part-1-of-2/", + "_key": "86c7542d05fb", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "Similar to the tutorial above, this set of articles does a deep dive into the Nextflow Azure Batch integration. ", + "_key": "7c164b1c6687", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "86c7542d05fb" + ], + "text": "Part 1", + "_key": "b62db6285a0d" + }, + { + "_type": "span", + "marks": [], + "text": " covers Azure Batch and essential concepts, provides an overview of the integration, and explains how to set up Azure Batch and Storage accounts. It also covers deploying a machine instance in the Azure cloud and configuring it to run Nextflow pipelines against the Azure Batch service.", + "_key": "b95f96505321" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "baabc693a344", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "260f5727c0de", + "markDefs": [] + }, + { + "_key": "90c65afa5d35", + "markDefs": [ + { + "_type": "link", + "href": "https://seqera.io/blog/nextflow-and-azure-batch-working-with-tower-part-2-of-2/", + "_key": "4d6a8138c580" + } + ], + "children": [ + { + "text": "Part 2", + "_key": "d3d23ad1c22b", + "_type": "span", + "marks": [ + "4d6a8138c580" + ] + }, + { + "_key": "82b90c120b3b", + "_type": "span", + "marks": [], + "text": " builds on what you learned in part 1 and shows how to use Azure Batch from within Nextflow Tower Cloud. It provides a walkthrough of how to make the environment set up in part 1 accessible to users through Tower's intuitive web interface." + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "b9c575955149", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "d30b1ef06d5f", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "blockquote", + "_key": "66add2094652", + "markDefs": [ + { + "href": "https://seqera.io/blog/nextflow-and-azure-batch-part-1-of-2/", + "_key": "4039df72f8a2", + "_type": "link" + }, + { + "_type": "link", + "href": "https://seqera.io/blog/nextflow-and-azure-batch-working-with-tower-part-2-of-2/", + "_key": "4c67dc16271f" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow and Azure Batch — Inside the Integration (", + "_key": "527d9bcc92e6" + }, + { + "_key": "14f1d5d53978", + "_type": "span", + "marks": [ + "4039df72f8a2" + ], + "text": "part 1 of 2" + }, + { + "_type": "span", + "marks": [], + "text": ", ", + "_key": "4eff9d43b3ca" + }, + { + "_type": "span", + "marks": [ + "4c67dc16271f" + ], + "text": "part 2 of 2", + "_key": "813e11b0a172" + }, + { + "_key": "2800cacd9643", + "_type": "span", + "marks": [], + "text": ")" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "6622beedbc78" + } + ], + "_type": "block", + "style": "normal", + "_key": "f85484c0c0e9" + }, + { + "_key": "7a7c520a9ab9", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "\n3. 
Get started with Nextflow on Google Cloud Batch", + "_key": "d3b70f3aff200", + "_type": "span" + } + ], + "_type": "block", + "style": "h3" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "This excellent article by Marcel Ribeiro-Dantas provides a step-by-step tutorial on using Nextflow with Google’s new Google Cloud Batch service. Google Cloud Batch is expected to replace the Google Life Sciences integration over time. The article explains how to deploy the Google Cloud Batch and Storage environments in GCP using the gcloud CLI. It then goes on to explain how to configure Nextflow to launch pipelines into the newly created Google Cloud Batch environment.", + "_key": "56635bb70425", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "4044f860107f" + }, + { + "style": "blockquote", + "_key": "e68440a699c1", + "markDefs": [ + { + "_key": "a87215d5c059", + "_type": "link", + "href": "https://nextflow.io/blog/2023/nextflow-with-gbatch.html" + } + ], + "children": [ + { + "text": "Get started with Nextflow on Google Cloud Batch", + "_key": "24e0d7616afe", + "_type": "span", + "marks": [ + "a87215d5c059" + ] + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "d8964b0eff6d" + } + ], + "_type": "block", + "style": "normal", + "_key": "36c8c2b5ba7b", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "\n4. Nextflow and K8s Rebooted: Running Nextflow on Amazon EKS", + "_key": "09adb54088410" + } + ], + "_type": "block", + "style": "h3", + "_key": "047a1f90c0cf" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "While not commonly used for HPC workloads, Kubernetes has clear momentum. In this educational article, Ben Sherman provides an overview of how the Nextflow / Kubernetes integration has been simplified by avoiding the requirement for Persistent Volumes (PVs) and Persistent Volume Claims (PVCs). 
This detailed guide provides step-by-step instructions for using Amazon EKS as a compute environment, including how to configure IAM Roles for Service Accounts (IRSA), now an Amazon EKS best practice.",
+            "_key": "555eb2fc6419",
+            "_type": "span"
+          }
+        ],
+        "_type": "block",
+        "style": "normal",
+        "_key": "15f65f75f10e"
+      },
+      {
+        "_type": "block",
+        "style": "normal",
+        "_key": "d6a5d1a2e1d5",
+        "markDefs": [],
+        "children": [
+          {
+            "_type": "span",
+            "marks": [],
+            "text": "",
+            "_key": "2965bbbd3aac"
+          }
+        ]
+      },
+      {
+        "style": "blockquote",
+        "_key": "e098dcdadfb5",
+        "markDefs": [
+          {
+            "_type": "link",
+            "href": "https://seqera.io/blog/deploying-nextflow-on-amazon-eks/",
+            "_key": "ddd3bedfa107"
+          }
+        ],
+        "children": [
+          {
+            "_type": "span",
+            "marks": [
+              "ddd3bedfa107"
+            ],
+            "text": "Nextflow and K8s Rebooted: Running Nextflow on Amazon EKS",
+            "_key": "5ad47efdd2a9"
+          }
+        ],
+        "_type": "block"
+      },
+      {
+        "_key": "3db739e375ba",
+        "markDefs": [],
+        "children": [
+          {
+            "_type": "span",
+            "marks": [],
+            "text": "",
+            "_key": "de0c5ba79fef"
+          }
+        ],
+        "_type": "block",
+        "style": "normal"
+      },
+      {
+        "children": [
+          {
+            "_key": "b952b49a30980",
+            "_type": "span",
+            "marks": [],
+            "text": "Additional resources"
+          }
+        ],
+        "_type": "block",
+        "style": "h2",
+        "_key": "2feb44abd39b",
+        "markDefs": []
+      },
+      {
+        "markDefs": [],
+        "children": [
+          {
+            "_key": "54a12f8b8d39",
+            "_type": "span",
+            "marks": [],
+            "text": "The following resources will help you dig deeper into Nextflow and other related projects like the nf-core community, which maintains curated pipelines and a very active Slack channel. There are plenty of Nextflow tutorials and videos online, and the following list is by no means exhaustive. Please let us know if we are missing anything."
+          }
+        ],
+        "_type": "block",
+        "style": "normal",
+        "_key": "c8218d7d02a7"
+      },
+      {
+        "_type": "block",
+        "style": "normal",
+        "_key": "02b2bee1dd8d",
+        "markDefs": [],
+        "children": [
+          {
+            "text": "",
+            "_key": "75a74cb4c8bf",
+            "_type": "span",
+            "marks": []
+          }
+        ]
+      },
+      {
+        "_key": "8b433e2cbfac",
+        "markDefs": [],
+        "children": [
+          {
+            "marks": [],
+            "text": "\n1. Nextflow docs",
+            "_key": "d02d58e53f830",
+            "_type": "span"
+          }
+        ],
+        "_type": "block",
+        "style": "h3"
+      },
+      {
+        "children": [
+          {
+            "text": "The reference for the Nextflow language and runtime. These docs should be your first point of reference while developing Nextflow pipelines. 
The newest features are documented in edge documentation pages released every month, with the latest stable releases every three months.", + "_key": "e734bcb1bbcf0", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "5c2c4fcbc179", + "markDefs": [] + }, + { + "style": "blockquote", + "_key": "7e4a4f4173b1", + "markDefs": [ + { + "_key": "b77ed70b5a25", + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/index.html" + }, + { + "href": "https://www.nextflow.io/docs/edge/index.html", + "_key": "c327805a7fa2", + "_type": "link" + } + ], + "children": [ + { + "text": "Latest ", + "_key": "78090f026fb10", + "_type": "span", + "marks": [] + }, + { + "text": "stable", + "_key": "78090f026fb11", + "_type": "span", + "marks": [ + "b77ed70b5a25" + ] + }, + { + "_key": "78090f026fb12", + "_type": "span", + "marks": [], + "text": " & " + }, + { + "_type": "span", + "marks": [ + "c327805a7fa2" + ], + "text": "edge", + "_key": "78090f026fb13" + }, + { + "_type": "span", + "marks": [], + "text": " documentation.", + "_key": "78090f026fb14" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "9e47f411d1c8", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "\n2. Seqera Labs docs", + "_key": "e7e891babd590" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "17b5a9dba87d", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "An index of documentation, deployment guides, training materials, and resources for all things Nextflow and Tower.", + "_key": "ed39478f65430" + } + ], + "_type": "block" + }, + { + "markDefs": [ + { + "href": "https://seqera.io/docs/?__hstc=247481240.9785ace5f350c27ee07ceee486f18692.1717578742769.1727441286556.1727454973713.80&__hssc=247481240.1.1727454973713&__hsfp=3485190257", + "_key": "8aab3bc11c27", + "_type": "link" + } + ], + "children": [ + { + "_key": "30f286886c060", + "_type": "span", + "marks": [ + "8aab3bc11c27" + ], + "text": "Seqera Labs docs" + } + ], + "_type": "block", + "style": "blockquote", + "_key": "e9e933264102" + }, + { + "style": "h3", + "_key": "75fe53a26166", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "\n3. nf-core", + "_key": "5b124c63548e0" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "66ebc2a68d86", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "nf-core is a growing community of Nextflow users and developers. You can find curated sets of biomedical analysis pipelines written in Nextflow and built by domain experts. Each pipeline is stringently reviewed and has been implemented according to best practice guidelines. 
Be sure to sign up for the Slack channel.", + "_key": "b9a78237b9600", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [ + { + "href": "https://nf-co.re/", + "_key": "01d92c4a5618", + "_type": "link" + }, + { + "_key": "bfc917868479", + "_type": "link", + "href": "https://nf-co.re/join" + } + ], + "children": [ + { + "text": "nf-core website", + "_key": "ac496acb13ab0", + "_type": "span", + "marks": [ + "01d92c4a5618" + ] + }, + { + "_key": "ac496acb13ab1", + "_type": "span", + "marks": [], + "text": " and " + }, + { + "text": "nf-core Slack", + "_key": "ac496acb13ab2", + "_type": "span", + "marks": [ + "bfc917868479" + ] + } + ], + "_type": "block", + "style": "blockquote", + "_key": "b6f21f9957e9" + }, + { + "_key": "a41044763ab6", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "\n4. Nextflow Tower", + "_key": "0d6bfb1e4cd00" + } + ], + "_type": "block", + "style": "h3" + }, + { + "_type": "block", + "style": "normal", + "_key": "1f1102858c8c", + "markDefs": [], + "children": [ + { + "_key": "a3fbc63704f50", + "_type": "span", + "marks": [], + "text": "Nextflow Tower is a platform to easily monitor, launch, and scale Nextflow pipelines on cloud providers and on-premise infrastructure. The documentation provides details on setting up compute environments, monitoring pipelines, and launching using either the web graphic interface, CLI, or API." + } + ] + }, + { + "style": "blockquote", + "_key": "17215264f8d4", + "markDefs": [ + { + "_type": "link", + "href": "https://tower.nf/?__hstc=247481240.9785ace5f350c27ee07ceee486f18692.1717578742769.1727441286556.1727454973713.80&__hssc=247481240.1.1727454973713&__hsfp=3485190257", + "_key": "fa4c6d3cee19" + }, + { + "href": "http://help.tower.nf/?__hstc=247481240.9785ace5f350c27ee07ceee486f18692.1717578742769.1727441286556.1727454973713.80&__hssc=247481240.1.1727454973713&__hsfp=3485190257", + "_key": "b91fbf2a0af2", + "_type": "link" + } + ], + "children": [ + { + "text": "Nextflow Tower", + "_key": "818eeece754d0", + "_type": "span", + "marks": [ + "fa4c6d3cee19" + ] + }, + { + "text": " and ", + "_key": "818eeece754d1", + "_type": "span", + "marks": [] + }, + { + "text": "user documentation", + "_key": "818eeece754d2", + "_type": "span", + "marks": [ + "b91fbf2a0af2" + ] + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "h3", + "_key": "827f5c948480", + "markDefs": [], + "children": [ + { + "text": "5. Nextflow on AWS", + "_key": "29c044b3ae980", + "_type": "span", + "marks": [] + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "1e15c21144a5", + "markDefs": [ + { + "_key": "2932bd7ee9c1", + "_type": "link", + "href": "https://seqera.io/blog/nextflow-and-aws-batch-inside-the-integration-part-1-of-3/?__hstc=247481240.9785ace5f350c27ee07ceee486f18692.1717578742769.1727441286556.1727454973713.80&__hssc=247481240.1.1727454973713&__hsfp=3485190257" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Part of the Genomics Workflows on AWS, Amazon provides a quickstart for deploying a genomics analysis environment on Amazon Web Services (AWS) cloud, using Nextflow to create and orchestrate analysis workflows and AWS Batch to run the workflow processes. 
While this article is packed with good information, the procedure outlined in the more recent ", + "_key": "6bc65d2cd0880" + }, + { + "_type": "span", + "marks": [ + "2932bd7ee9c1" + ], + "text": "Nextflow and AWS Batch – Inside the integration", + "_key": "6bc65d2cd0881" + }, + { + "_key": "6bc65d2cd0882", + "_type": "span", + "marks": [], + "text": " series, may be an easier place to start. Some of the steps that previously needed to be performed manually have been updated in the latest integration." + } + ] + }, + { + "_key": "2dbf4f2298bc", + "markDefs": [ + { + "_key": "a9730caa0494", + "_type": "link", + "href": "https://docs.opendata.aws/genomics-workflows/orchestration/nextflow/nextflow-overview.html" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "a9730caa0494" + ], + "text": "Nextflow on AWS Batch", + "_key": "0276efcf119e0" + } + ], + "_type": "block", + "style": "blockquote" + }, + { + "_key": "469c0f64c29e", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "dd7b0745afb8", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "h3", + "_key": "f82cc80d64dc", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "6. Nextflow Data Pipelines on Azure Batch", + "_key": "e5084570aadf0" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "Nextflow on Azure requires at minimum two Azure services, Azure Batch and Azure Storage. Follow the guide below developed by the team at Microsoft to set up both services on Azure, and to get your storage and batch account names and keys.", + "_key": "3fa54d0de0330", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "1e6d81417f35" + }, + { + "markDefs": [ + { + "href": "https://techcommunity.microsoft.com/t5/azure-compute-blog/running-nextflow-data-pipelines-on-azure-batch/ba-p/2150383", + "_key": "0c6158a202cb", + "_type": "link" + }, + { + "href": "https://github.com/microsoft/Genomics-Quickstart/blob/main/03-Nextflow-Azure/README.md", + "_key": "536547d0ffe6", + "_type": "link" + } + ], + "children": [ + { + "_key": "d416705fce860", + "_type": "span", + "marks": [ + "0c6158a202cb" + ], + "text": "Azure Blog" + }, + { + "_key": "d416705fce861", + "_type": "span", + "marks": [], + "text": " and " + }, + { + "_type": "span", + "marks": [ + "536547d0ffe6" + ], + "text": "GitHub repository", + "_key": "d416705fce862" + }, + { + "text": ".\n", + "_key": "d416705fce863", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "blockquote", + "_key": "3b828fb1f668" + }, + { + "_type": "block", + "style": "h3", + "_key": "971a8d169a48", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "7. Running Nextflow with Google Life Sciences", + "_key": "906d746e2b130", + "_type": "span" + } + ] + }, + { + "markDefs": [ + { + "_key": "95716c0eb018", + "_type": "link", + "href": "https://nextflow.io/blog/2023/nextflow-with-gbatch.html" + } + ], + "children": [ + { + "text": "A step-by-step guide to launching Nextflow Pipelines in Google Cloud. Note that this integration process is specific to Google Life Sciences – an offering that pre-dates Google Cloud Batch. 
If you want to use the newer integration approach, you can also check out the Nextflow blog article ", + "_key": "2f33327ff4960", + "_type": "span", + "marks": [] + }, + { + "text": "Get started with Nextflow on Google Cloud Batch", + "_key": "2f33327ff4961", + "_type": "span", + "marks": [ + "95716c0eb018" + ] + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "2f33327ff4962" + } + ], + "_type": "block", + "style": "normal", + "_key": "de3127834864" + }, + { + "children": [ + { + "_type": "span", + "marks": [ + "f9b9d6409be5" + ], + "text": "Nextflow on Google Cloud", + "_key": "94ac289653c90" + } + ], + "_type": "block", + "style": "blockquote", + "_key": "78732c5b0fa9", + "markDefs": [ + { + "_key": "f9b9d6409be5", + "_type": "link", + "href": "https://cloud.google.com/life-sciences/docs/tutorials/nextflow" + } + ] + }, + { + "_key": "971e20cadc16", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "8. Bonus: Nextflow Tutorial - Variant Calling Edition", + "_key": "7bc1fc6e87000" + } + ], + "_type": "block", + "style": "h3" + }, + { + "_key": "528210ae1411", + "markDefs": [ + { + "href": "https://sateeshperi.github.io/nextflow_varcal/nextflow/", + "_key": "6e6f732973fc", + "_type": "link" + }, + { + "_type": "link", + "href": "https://carpentries-incubator.github.io/workflows-nextflow/index.html", + "_key": "2ada78867481" + }, + { + "href": "https://datacarpentry.org/wrangling-genomics/", + "_key": "3be5f61f4f9f", + "_type": "link" + } + ], + "children": [ + { + "text": "This ", + "_key": "f984a42901c20", + "_type": "span", + "marks": [] + }, + { + "text": "Nextflow Tutorial - Variant Calling Edition", + "_key": "f984a42901c21", + "_type": "span", + "marks": [ + "6e6f732973fc" + ] + }, + { + "_key": "f984a42901c22", + "_type": "span", + "marks": [], + "text": " has been adapted from the " + }, + { + "_key": "f984a42901c23", + "_type": "span", + "marks": [ + "2ada78867481" + ], + "text": "Nextflow Software Carpentry training material" + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "f984a42901c24" + }, + { + "marks": [ + "3be5f61f4f9f" + ], + "text": "Data Carpentry: Wrangling Genomics Lesson", + "_key": "f984a42901c25", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ". 
Learners will have the chance to learn Nextflow and nf-core basics, to convert a variant-calling bash script into a Nextflow workflow, and modularize the pipeline using DSL2 modules and sub-workflows.", + "_key": "f984a42901c26" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "The workshop can be opened on ", + "_key": "067f82dd9ef00", + "_type": "span", + "marks": [] + }, + { + "text": "Gitpod", + "_key": "067f82dd9ef01", + "_type": "span", + "marks": [ + "21b36fe6edbb" + ] + }, + { + "marks": [], + "text": ", where you can try the exercises in an online computing environment at your own pace, with the course material in another window alongside.", + "_key": "067f82dd9ef02", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "07433e9f2062", + "markDefs": [ + { + "_key": "21b36fe6edbb", + "_type": "link", + "href": "https://gitpod.io/#https://github.com/sateeshperi/nextflow_tutorial.git" + } + ] + }, + { + "_type": "block", + "style": "blockquote", + "_key": "5bb03f5bfa8c", + "markDefs": [ + { + "_type": "link", + "href": "https://sateeshperi.github.io/nextflow_varcal/nextflow/", + "_key": "8e98f7db947c" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "You can find the course in ", + "_key": "8421e36a55d80" + }, + { + "_type": "span", + "marks": [ + "8e98f7db947c" + ], + "text": "Nextflow Tutorial - Variant Calling Edition", + "_key": "8421e36a55d81" + }, + { + "_key": "8421e36a55d82", + "_type": "span", + "marks": [], + "text": "." + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "Community and support", + "_key": "51d91be03cdb0", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "c854ed4f06d2", + "markDefs": [] + }, + { + "_key": "1ba21bb8629f", + "listItem": "bullet", + "markDefs": [ + { + "href": "https://community.seqera.io/?__hstc=247481240.9785ace5f350c27ee07ceee486f18692.1717578742769.1727441286556.1727454973713.80&__hssc=247481240.1.1727454973713&__hsfp=3485190257", + "_key": "1350a96cc924", + "_type": "link" + } + ], + "children": [ + { + "marks": [ + "1350a96cc924" + ], + "text": "Seqera Community Forum", + "_key": "f3ab56a7e28f0", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "_key": "8269d0c883d6", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://twitter.com/nextflowio?lang=en", + "_key": "0249c975c08f" + } + ], + "children": [ + { + "marks": [], + "text": "Nextflow Twitter ", + "_key": "4bc8c029b6e90", + "_type": "span" + }, + { + "_key": "4bc8c029b6e91", + "_type": "span", + "marks": [ + "0249c975c08f" + ], + "text": "@nextflowio" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "_key": "6abc41cc7ad1", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://www.nextflow.io/slack-invite.html", + "_key": "0a4286cf081b" + } + ], + "children": [ + { + "_key": "52540c98b6670", + "_type": "span", + "marks": [ + "0a4286cf081b" + ], + "text": "Nextflow Slack" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "dcf817b01ab2", + "listItem": "bullet", + "markDefs": [ + { + "_key": "ab138802e351", + "_type": "link", + "href": "https://nfcore.slack.com/" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "ab138802e351" + ], + "text": "nf-core Slack", + "_key": "26170e84ee1f0" + } + ] + }, + { + "_key": "d0a8642fc3e7", + "listItem": 
"bullet", + "markDefs": [ + { + "_key": "c6a4fe1d1ab8", + "_type": "link", + "href": "https://www.seqera.io/?__hstc=247481240.9785ace5f350c27ee07ceee486f18692.1717578742769.1727441286556.1727454973713.80&__hssc=247481240.1.1727454973713&__hsfp=3485190257" + }, + { + "_type": "link", + "href": "https://tower.nf/?__hstc=247481240.9785ace5f350c27ee07ceee486f18692.1717578742769.1727441286556.1727454973713.80&__hssc=247481240.1.1727454973713&__hsfp=3485190257", + "_key": "bff789814bf2" + } + ], + "children": [ + { + "_key": "7677b715e3520", + "_type": "span", + "marks": [ + "c6a4fe1d1ab8" + ], + "text": "Seqera Labs" + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "7677b715e3521" + }, + { + "_type": "span", + "marks": [ + "bff789814bf2" + ], + "text": "Nextflow Tower", + "_key": "7677b715e3522" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "d987716239c7", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nextflow-io/patterns", + "_key": "6377d01a9c30" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "6377d01a9c30" + ], + "text": "Nextflow patterns", + "_key": "69a974c3f2b10" + } + ], + "level": 1 + }, + { + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/mribeirodantas/NextflowSnippets", + "_key": "27f2338d55a5" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "27f2338d55a5" + ], + "text": "Nextflow Snippets", + "_key": "c3e36b4b326b0" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "e2fcededf7df" + } + ], + "meta": { + "slug": { + "current": "learn-nextflow-in-2023" + }, + "description": "In 2023, the world of Nextflow is more exciting than ever! With new resources constantly being released, there is no better time to dive into this powerful tool. From a new Software Carpentries’ course to recordings of mutiple nf-core training events to new tutorials on Wave and Fusion, the options for learning Nextflow are endless." + } + }, + { + "title": "More fun with containers in HPC", + "_rev": "hf9hwMPb7ybAE3bqEU5jSp", + "publishedAt": "2016-12-20T07:00:00.000Z", + "_createdAt": "2024-09-25T14:15:10Z", + "_id": "119eac534c0e", + "meta": { + "slug": { + "current": "more-fun-containers-hpc" + }, + "description": "Nextflow was one of the first workflow framework to provide built-in support for Docker containers. A couple of years ago we also started to experiment with the deployment of containerised bioinformatic pipelines at CRG, using Docker technology." + }, + "author": { + "_ref": "paolo-di-tommaso", + "_type": "reference" + }, + "_type": "blogPost", + "_updatedAt": "2024-10-10T08:59:14Z", + "body": [ + { + "style": "normal", + "_key": "e7d369471664", + "markDefs": [ + { + "_type": "link", + "href": "<(https://www.nextflow.io/blog/2014/using-docker-in-hpc-cluster.html)>", + "_key": "877e5fb6c345" + }, + { + "_type": "link", + "href": "https://www.nextplatform.com/2016/01/28/crg-goes-with-the-genomics-flow/", + "_key": "c63a3252a968" + } + ], + "children": [ + { + "_key": "6180014ea97e", + "_type": "span", + "marks": [], + "text": "Nextflow was one of the first workflow framework to provide built-in support for Docker containers. 
A couple of years ago we also started to experiment with the deployment of containerised bioinformatic pipelines at CRG, using Docker technology (see " + }, + { + "_key": "3dfcf52d87a9", + "_type": "span", + "marks": [ + "877e5fb6c345" + ], + "text": "here" + }, + { + "text": " and ", + "_key": "0e2e99610a68", + "_type": "span", + "marks": [] + }, + { + "_key": "5a9956d92bdb", + "_type": "span", + "marks": [ + "c63a3252a968" + ], + "text": "here" + }, + { + "marks": [], + "text": ").", + "_key": "a4a9cfa06eae", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "e0754f0e26f7", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "411cdb54e846", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "We found that by isolating and packaging the complete computational workflow environment with the use of Docker images, radically simplifies the burden of maintaining complex dependency graphs of real workload data analysis pipelines.", + "_key": "0e25fa08f998", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "b3bc551514e8" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "75cc4f99f518", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "1b80bc2edbc9", + "markDefs": [] + }, + { + "style": "normal", + "_key": "592138cb0a4d", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Even more importantly, the use of containers enables replicable results with minimal effort for the system configuration. The entire computational environment can be archived in a self-contained executable format, allowing the replication of the associated analysis at any point in time.", + "_key": "aaf0111f6176", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "37d46777aede", + "markDefs": [], + "children": [ + { + "_key": "60f3a9b35a85", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "80a73c935206", + "markDefs": [ + { + "_type": "link", + "href": "https://galaxyproject.org", + "_key": "2a6525a9fe68" + }, + { + "_key": "43a33e86eb14", + "_type": "link", + "href": "http://commonwl.org" + }, + { + "_key": "a2bb78cceb23", + "_type": "link", + "href": "http://bioboxes.org" + }, + { + "href": "https://dockstore.org", + "_key": "4537df0adb31", + "_type": "link" + } + ], + "children": [ + { + "text": "This ability is the main reason that drove the rapid adoption of Docker in the bioinformatic community and its support in many projects, like for example ", + "_key": "c11ee380e111", + "_type": "span", + "marks": [] + }, + { + "text": "Galaxy", + "_key": "a15f39735d8f", + "_type": "span", + "marks": [ + "2a6525a9fe68" + ] + }, + { + "_type": "span", + "marks": [], + "text": ", ", + "_key": "fd4011f3607e" + }, + { + "_type": "span", + "marks": [ + "43a33e86eb14" + ], + "text": "CWL", + "_key": "ecf308d5bb2f" + }, + { + "_key": "808c6b5753df", + "_type": "span", + "marks": [], + "text": ", " + }, + { + "marks": [ + "a2bb78cceb23" + ], + "text": "Bioboxes", + "_key": "b42fbb644fd0", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ", ", + "_key": "d4246e8478c9" + }, + { + "text": "Dockstore", + "_key": "7f0a75c4df2e", + "_type": "span", + "marks": [ + "4537df0adb31" + ] + }, + { + "_type": "span", + "marks": [], + "text": " and many others.", + "_key": "cd4af7a47aa3" + } + ], + "_type": "block", + "style": "normal" + }, + 
{ + "_key": "4f869de6cd46", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "a4b5fb8bf5a9" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_key": "6cf34ce03d5a", + "_type": "span", + "marks": [], + "text": "However, while the popularity of Docker spread between the developers, its adaption in research computing infrastructures continues to remain very low and it's very unlikely that this trend will change in the future." + } + ], + "_type": "block", + "style": "normal", + "_key": "169a0c49c4f6" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "829e80f97a23" + } + ], + "_type": "block", + "style": "normal", + "_key": "ca2b6cbc9bfd", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_key": "d008ac555f2e", + "_type": "span", + "marks": [], + "text": "The reason for this resides in the Docker architecture, which requires a daemon running with root permissions on each node of a computing cluster. Such a requirement raises many security concerns, thus good practices would prevent its use in shared HPC cluster or supercomputer environments." + } + ], + "_type": "block", + "style": "normal", + "_key": "7fe36d366c09" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "22280fbe14f7" + } + ], + "_type": "block", + "style": "normal", + "_key": "01b383fb9003" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Introducing Singularity", + "_key": "7c476c983201" + } + ], + "_type": "block", + "style": "h3", + "_key": "7cf41c72a0f2" + }, + { + "style": "normal", + "_key": "38f6801229cc", + "markDefs": [ + { + "href": "http://singularity.lbl.gov", + "_key": "37db4c8e7f19", + "_type": "link" + } + ], + "children": [ + { + "_key": "b6f4e0e8fcd0", + "_type": "span", + "marks": [], + "text": "Alternative implementations, such as " + }, + { + "text": "Singularity", + "_key": "224497de0a09", + "_type": "span", + "marks": [ + "37db4c8e7f19" + ] + }, + { + "_type": "span", + "marks": [], + "text": ", have fortunately been promoted by the interested in containers technology.", + "_key": "7091935b3535" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "fde457c9d1bc", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "674419083941" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "7b5cb2414a3d", + "markDefs": [], + "children": [ + { + "_key": "15b0baa35d41", + "_type": "span", + "marks": [], + "text": "Singularity is a containers engine developed at the Berkeley Lab and designed for the needs of scientific workloads. The main differences with Docker are: containers are file based, no root escalation is allowed nor root permission is needed to run a container (although a privileged user is needed to create a container image), and there is no separate running daemon." 
+ } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "54efab0cb6ff", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "5de96a2990da" + }, + { + "_key": "13375573984d", + "markDefs": [], + "children": [ + { + "text": "These, along with other features, such as support for autofs mounts, makes Singularity a container engine better suited to the requirements of HPC clusters and supercomputers.", + "_key": "e50f84917242", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "c34729d4154e", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "97de16416e02" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "text": "Moreover, although Singularity uses a container image format different to that of Docker, they provide a conversion tool that allows Docker images to be converted to the Singularity format.", + "_key": "a7ab871b9f6c", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "664a3450e79d" + }, + { + "style": "normal", + "_key": "52acc4097d74", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "50cbf3f7f66a", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "h3", + "_key": "b3253a3b769a", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Singularity in the wild", + "_key": "2c2bbff0d6b0", + "_type": "span" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "We integrated Singularity support in Nextflow framework and tested it in the CRG computing cluster and the BSC ", + "_key": "ca609e3cb1c3" + }, + { + "marks": [ + "3ef18cfe6bc8" + ], + "text": "MareNostrum", + "_key": "7188967dc6f4", + "_type": "span" + }, + { + "marks": [], + "text": " supercomputer.", + "_key": "1dd4f899e827", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "7213895483cc", + "markDefs": [ + { + "_key": "3ef18cfe6bc8", + "_type": "link", + "href": "https://www.bsc.es/discover-bsc/the-centre/marenostrum" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "626ac034d80c", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "e6801f0dd060" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "The absence of a separate running daemon or image gateway made the installation straightforward when compared to Docker or other solutions.", + "_key": "827c151fa85e" + } + ], + "_type": "block", + "style": "normal", + "_key": "27887711860c", + "markDefs": [] + }, + { + "style": "normal", + "_key": "4cba719774ed", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "0fc9d9e436d5" + } + ], + "_type": "block" + }, + { + "_key": "4941bf73d52e", + "markDefs": [ + { + "_type": "link", + "href": "https://peerj.com/articles/1273/", + "_key": "f3615100da40" + } + ], + "children": [ + { + "marks": [], + "text": "To evaluate the performance of Singularity we carried out the ", + "_key": "c055360057fe", + "_type": "span" + }, + { + "_key": "371dd764672e", + "_type": "span", + "marks": [ + "f3615100da40" + ], + "text": "same benchmarks" + }, + { + "_type": "span", + "marks": [], + "text": " we performed for Docker and compared the results of the two engines.", + "_key": "c8f0d4cdce85" + } + ], + "_type": "block", + 
"style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_key": "eb46867d6afd", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "ade42cc1ae5a" + }, + { + "style": "normal", + "_key": "187aaff1dd38", + "markDefs": [], + "children": [ + { + "text": "The benchmarks consisted in the execution of three Nextflow based genomic pipelines:", + "_key": "5d4c789ca5df", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "1c7afa2bb7f1", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "12eb2eb24a79" + }, + { + "_key": "347976535137", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The benchmarks consisted in the execution of three Nextflow based genomic pipelines:", + "_key": "34c1f27bcf6a0", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [ + "13d04f5d21a3" + ], + "text": "Rna-toy", + "_key": "105e8958868e0" + }, + { + "_type": "span", + "marks": [], + "text": ": a simple pipeline for RNA-Seq data analysis.", + "_key": "105e8958868e1" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "0e834b18a540", + "listItem": "number", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nextflow-io/rnatoy/tree/peerj5515", + "_key": "13d04f5d21a3" + } + ] + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "1d8897f610b0", + "listItem": "number", + "markDefs": [ + { + "href": "https://github.com/nextflow-io/nmdp-flow/tree/peerj5515/", + "_key": "95112c1e663f", + "_type": "link" + } + ], + "children": [ + { + "marks": [ + "95112c1e663f" + ], + "text": "Nmdp-Flow", + "_key": "58819653d5a70", + "_type": "span" + }, + { + "text": ": an assembly-based variant calling pipeline.", + "_key": "58819653d5a71", + "_type": "span", + "marks": [] + } + ] + }, + { + "_key": "3298a94afd35", + "listItem": "number", + "markDefs": [ + { + "_key": "7bae0627b66d", + "_type": "link", + "href": "https://github.com/cbcrg/piper-nf/tree/peerj5515" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "7bae0627b66d" + ], + "text": "Piper-NF", + "_key": "208b3b7385030" + }, + { + "marks": [], + "text": ": a pipeline for the detection and mapping of long non-coding RNAs.", + "_key": "208b3b7385031", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "e505af1613da", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "51ec717211fd", + "markDefs": [] + }, + { + "_key": "5e21713dd3f6", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/singularityware/docker2singularity", + "_key": "7e3bd531a1a8" + } + ], + "children": [ + { + "marks": [], + "text": "In order to repeat the analyses, we converted the container images we used to perform the Docker benchmarks to Singularity image files by using the ", + "_key": "ace824ac0b73", + "_type": "span" + }, + { + "marks": [ + "7e3bd531a1a8" + ], + "text": "docker2singularity", + "_key": "ca5bc7368118", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " tool ", + "_key": "39585870f517" + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "(this is not required anymore, see the update below)", + "_key": "766747ce8f01" + }, + { + "text": ".", + "_key": "f8705f45a964", + "_type": "span", 
+ "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "", + "_key": "676855f85adf", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "95dbf4299f9b", + "markDefs": [] + }, + { + "_key": "714573787995", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The only change needed to run these pipelines with Singularity was to replace the Docker specific settings with the following ones in the configuration file:", + "_key": "864241ccbb5e" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "ccafd633aa5e" + } + ], + "_type": "block", + "style": "normal", + "_key": "0628ed52e6b5", + "markDefs": [] + }, + { + "_type": "code", + "_key": "13cf64ba5e69", + "code": "singularity.enabled = true\nprocess.container = ''" + }, + { + "_key": "95521d93d95d", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Each pipeline was executed 10 times, alternately by using Docker and Singularity as container engine. The results are shown in the following table (time in minutes):", + "_key": "9c63345ceab8" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "89060153ba33", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "335c92a03859" + }, + { + "body": "\n\n
<table>\n  <thead>\n    <tr>\n      <th>Pipeline</th>\n      <th>Tasks</th>\n      <th colspan='2'>Mean task time</th>\n      <th colspan='2'>Mean execution time</th>\n      <th colspan='2'>Execution time std dev</th>\n      <th>Ratio</th>\n    </tr>\n    <tr>\n      <th></th>\n      <th></th>\n      <th>Singularity</th>\n      <th>Docker</th>\n      <th>Singularity</th>\n      <th>Docker</th>\n      <th>Singularity</th>\n      <th>Docker</th>\n      <th></th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <td>RNA-Seq</td>\n      <td>97</td>\n      <td>3.77</td>\n      <td>3.66</td>\n      <td>63.66</td>\n      <td>62.3</td>\n      <td>2.0</td>\n      <td>3.1</td>\n      <td>0.998</td>\n    </tr>\n    <tr>\n      <td>Variant call</td>\n      <td>48</td>\n      <td>22.1</td>\n      <td>22.4</td>\n      <td>1061.2</td>\n      <td>1074.4</td>\n      <td>43.1</td>\n      <td>38.5</td>\n      <td>1.012</td>\n    </tr>\n    <tr>\n      <td>Piper-NF</td>\n      <td>98</td>\n      <td>1.2</td>\n      <td>1.3</td>\n      <td>120.0</td>\n      <td>124.5</td>\n      <td>6.9</td>\n      <td>2.8</td>\n      <td>1.038</td>\n    </tr>\n  </tbody>\n</table>\n
", + "_type": "markdownTable", + "_key": "fd3cb16feca5" + }, + { + "style": "normal", + "_key": "2db136091f6d", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "c70e1de31a6e", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "c52ac9468e6e", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The benchmark results show that there isn't any significative difference in the execution times of containerised workflows between Docker and Singularity. In two cases Singularity was slightly faster and a third one it was almost identical although a little slower than Docker.", + "_key": "1684c4e973b4", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "163f8c12633b", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "aaab082b401b" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "5c98e9dfd59f", + "_type": "span", + "marks": [], + "text": "Conclusion" + } + ], + "_type": "block", + "style": "h3", + "_key": "beadd6fa1a39" + }, + { + "children": [ + { + "_key": "a719a85a5c7f", + "_type": "span", + "marks": [], + "text": "In our evaluation Singularity proved to be an easy to install, stable and performant container engine." + } + ], + "_type": "block", + "style": "normal", + "_key": "21e630514449", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_key": "3bea03d6dd07", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "96071742934c" + }, + { + "style": "normal", + "_key": "3fe2641d2c67", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The only minor drawback, we found when compared to Docker, was the need to define the host path mount points statically when the Singularity images were created. 
In fact, even if Singularity supports user mount points to be defined dynamically when the container is launched, this feature requires the overlay file system which was not supported by the kernel available in our system.", + "_key": "08b0d33befac", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "500f62854ec7", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "4f60874210bc" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "http://www.coscale.com/blog/docker-usage-statistics-increased-adoption-by-enterprises-and-for-production-use", + "_key": "49b64e7fa081" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Docker surely will remain the ", + "_key": "4ace888fb369" + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "de facto", + "_key": "b6e8da261cef" + }, + { + "_type": "span", + "marks": [], + "text": " standard engine and image format for containers due to its popularity and ", + "_key": "177e0d132654" + }, + { + "text": "impressive growth", + "_key": "3c4af8c021a5", + "_type": "span", + "marks": [ + "49b64e7fa081" + ] + }, + { + "marks": [], + "text": ".", + "_key": "f8bfa27eaac5", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "a56ab44638ca" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "8a9c7bd5157a", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "756261f6c8b9" + }, + { + "_type": "block", + "style": "normal", + "_key": "8f1c3aa85aa0", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "However, in our opinion, Singularity is the tool of choice for the execution of containerised workloads in the context of HPC, thanks to its focus on system security and its simpler architectural design.", + "_key": "df0dc1831cb9" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "c603b936607b", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "78fcdf057cc9", + "_type": "span", + "marks": [] + } + ] + }, + { + "style": "normal", + "_key": "49dd6f209c36", + "markDefs": [], + "children": [ + { + "text": "The transparent support provided by Nextflow for both Docker and Singularity technology guarantees the ability to deploy your workflows in a range of different platforms (cloud, cluster, supercomputer, etc). 
Nextflow transparently manages the deployment of the containerised workload according to the runtime available in the target system.", + "_key": "14f9ba4c1128", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "", + "_key": "c72cf9cf0a29", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "e355071442eb", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Credits", + "_key": "fb4eba318a7b" + } + ], + "_type": "block", + "style": "h4", + "_key": "92fa7d2377d9" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Thanks to Gabriel Gonzalez (CRG), Luis Exposito (CRG) and Carlos Tripiana Montes (BSC) for the support installing Singularity.", + "_key": "dac030df49d5" + } + ], + "_type": "block", + "style": "normal", + "_key": "d81edd1f60a2" + }, + { + "style": "normal", + "_key": "29d88aa1aeac", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "2d8e43fc9568" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Update", + "_key": "949db0d8b0de" + }, + { + "_key": "3624909277c2", + "_type": "span", + "marks": [], + "text": " Singularity, since version 2.3.x, is able to pull and run Docker images from the Docker Hub. This greatly simplifies the interoperability with existing Docker containers. You only need to prefix the image name with the " + }, + { + "marks": [ + "code" + ], + "text": "docker://", + "_key": "2be57a9ed72a", + "_type": "span" + }, + { + "_key": "2a55ab26be45", + "_type": "span", + "marks": [], + "text": " pseudo-protocol to download it as a Singularity image, for example:" + } + ], + "_type": "block", + "style": "normal", + "_key": "f4fc6509a238", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "1cd962ce7117", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "6bb50bf545b5" + } + ] + }, + { + "_type": "code", + "_key": "bc86abd37eec", + "code": "singularity pull --size 1200 docker://nextflow/rnatoy" + }, + { + "style": "normal", + "_key": "cae9255d00ca", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "a1d4d6c5caa1" + } + ], + "_type": "block" + } + ] + }, + { + "_type": "blogPost", + "_id": "1250d14d3b12", + "meta": { + "description": "Our recent The State of the Workflow 2022: Community Survey Results showed that Nextflow and nf-core have a strong global community with a high level of engagement in several countries. 
As the community continues to grow, we aim to prioritize inclusivity for everyone through active outreach to groups with low representation.", + "slug": { + "current": "czi-mentorship-round-1" + } + }, + "author": { + "_type": "reference", + "_ref": "chris-hakkaart" + }, + "_createdAt": "2024-09-25T14:16:33Z", + "tags": [ + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "b1d181cad0c5" + } + ], + "publishedAt": "2022-09-18T06:00:00.000Z", + "title": "Nextflow and nf-core mentorship, Round 1", + "_updatedAt": "2024-10-14T09:38:57Z", + "body": [ + { + "_key": "d3de926d7d4e", + "markDefs": [], + "children": [ + { + "_key": "738e30f0d3fd", + "_type": "span", + "marks": [], + "text": "Introduction" + } + ], + "_type": "block", + "style": "h2" + }, + { + "_type": "block", + "style": "normal", + "_key": "4527211bc460", + "markDefs": [ + { + "_key": "af5842058746", + "_type": "link", + "href": "https://seqera.io/blog/state-of-the-workflow-2022-results/" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Our recent ", + "_key": "194e0a2dea30" + }, + { + "marks": [ + "af5842058746" + ], + "text": "The State of the Workflow 2022: Community Survey Results", + "_key": "1846c969f96f", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " showed that Nextflow and nf-core have a strong global community with a high level of engagement in several countries. As the community continues to grow, we aim to prioritize inclusivity for everyone through active outreach to groups with low representation.", + "_key": "8d6ca510af5b" + } + ] + }, + { + "size": "medium", + "_type": "picture", + "alt": "Word cloud of scientific interest keywords, averaged across all applications.", + "_key": "677691faff37", + "alignment": "right", + "asset": { + "asset": { + "_ref": "image-5fc27f5d64de9bde9da8bb59698ef2967918c4e4-1024x1024-webp", + "_type": "reference" + }, + "_type": "image" + } + }, + { + "_key": "98f04ee33da7", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Thanks to funding from our Chan Zuckerberg Initiative Diversity and Inclusion grant we established an international Nextflow and nf-core mentoring program with the aim of empowering those from underrepresented groups. With the first round of the mentorship now complete, we look back at the success of the program so far.", + "_key": "591167eb5f93" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "16f858a219f9", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "ef57b732a44e" + } + ] + }, + { + "_key": "074b572cd129", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "From almost 200 applications, five pairs of mentors and mentees were selected for the first round of the program. Over the following four months they met weekly to work on Nextflow based projects. We attempted to pair mentors and mentees based on their time zones and scientific interests. 
Project tasks were left up to the individuals and so tailored to the mentee's scientific interests and schedules.", + "_key": "60ff5a9c05fd", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "4cfcd5ae0a91", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "219432f6cada", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_key": "02df0b7d42b1", + "_type": "span", + "marks": [], + "text": "People worked on things ranging from setting up Nextflow and nf-core on their institutional clusters to developing and implementing Nextflow and nf-core pipelines for next-generation sequencing data. Impressively, after starting the program knowing very little about Nextflow and nf-core, mentees finished the program being able to confidently develop and implement scalable and reproducible scientific workflows." + } + ], + "_type": "block", + "style": "normal", + "_key": "55cc552c50d9" + }, + { + "_type": "block", + "style": "normal", + "_key": "3ac045899f3b", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "529136b7574e", + "_type": "span", + "marks": [] + } + ] + }, + { + "asset": { + "_ref": "image-31539d4ccaf43ac747479baa294127db8f940419-1833x867-png", + "_type": "reference" + }, + "_type": "image", + "alt": "Map of mentor / mentee pairs", + "_key": "d910397d21ed" + }, + { + "children": [ + { + "_key": "d712594299e8", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "48ecc573fb19", + "markDefs": [] + }, + { + "_key": "ee42852d0068", + "markDefs": [], + "children": [ + { + "text": "Ndeye Marième Top (mentee) & John Juma (mentor)", + "_key": "249da459c935", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "For the mentorship, Marième wanted to set up Nextflow and nf-core on the servers at the Institut Pasteur de Dakar in Senegal and learn how to develop / contribute to a pipeline. Her mentor was John Juma, from the ILRI/SANBI in Kenya.", + "_key": "a755e3dc6d21" + } + ], + "_type": "block", + "style": "normal", + "_key": "f8ae55ba3dbe" + }, + { + "_type": "block", + "style": "normal", + "_key": "90679b4f0e1e", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "a96500950495", + "_type": "span" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Together, Marème overcame issues with containers and server privileges and developed her local config, learning about how to troubleshoot and where to find help along the way. By the end of the mentorship she was able to set up the ", + "_key": "c7b802c84777" + }, + { + "_type": "span", + "marks": [ + "c6e86c42169c" + ], + "text": "nf-core/viralrecon", + "_key": "d150e207b445" + }, + { + "_type": "span", + "marks": [], + "text": " pipeline for the genomic surveillance analysis of SARS-Cov2 sequencing data from Senegal as well as 17 other countries in West Africa, ready for submission to ", + "_key": "64b6246d9e10" + }, + { + "marks": [ + "fa5fe8e2d66b" + ], + "text": "GISAID", + "_key": "66bc008dff22", + "_type": "span" + }, + { + "_key": "6e1523bba58e", + "_type": "span", + "marks": [], + "text": ". 
She also got up to speed with the " + }, + { + "_type": "span", + "marks": [ + "557ae690d572" + ], + "text": "nf-core/mag", + "_key": "4ebfddd89624" + }, + { + "text": " pipeline for metagenomic analysis.", + "_key": "269184163b9f", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "67d2039365d6", + "markDefs": [ + { + "_key": "c6e86c42169c", + "_type": "link", + "href": "https://nf-co.re/viralrecon" + }, + { + "_type": "link", + "href": "https://gisaid.org/", + "_key": "fa5fe8e2d66b" + }, + { + "_type": "link", + "href": "https://nf-co.re/mag", + "_key": "557ae690d572" + } + ] + }, + { + "children": [ + { + "_key": "a312a980c5b2", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "875f81900a0e", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "marks": [ + "em" + ], + "text": "\"Having someone experienced who can guide you in my learning process. My mentor really helped me understand and focus on the practical aspects since my main concern was having the pipelines correctly running in my institution.\"", + "_key": "2131a66e8beb", + "_type": "span" + }, + { + "text": " - ", + "_key": "51df78dd7f77", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Marième Top (mentee)", + "_key": "99fcd8db5cf3" + } + ], + "_type": "block", + "style": "blockquote", + "_key": "7573542163c4" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "bc759ecb2258" + } + ], + "_type": "block", + "style": "normal", + "_key": "2a3b2363e0ba", + "markDefs": [] + }, + { + "children": [ + { + "text": "\"The program was awesome. I had a chance to impart nextflow principles to someone I have never met before. Fully virtual, the program instilled some sense of discipline in terms of setting and meeting objectives.\"", + "_key": "8a8f45e2df0a", + "_type": "span", + "marks": [ + "em" + ] + }, + { + "_key": "b75e6b2536ba", + "_type": "span", + "marks": [], + "text": " - " + }, + { + "marks": [ + "strong" + ], + "text": "John Juma (mentor)", + "_key": "b89c33c64ae6", + "_type": "span" + } + ], + "_type": "block", + "style": "blockquote", + "_key": "5a9aed754e56", + "markDefs": [] + }, + { + "children": [ + { + "_key": "9ac26d2425c9", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "0beb6b65ed62", + "markDefs": [] + }, + { + "_type": "block", + "style": "h2", + "_key": "d61766d73f88", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Philip Ashton (mentee) & Robert Petit (mentor)", + "_key": "61b0e31178db" + } + ] + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://bactopia.github.io/", + "_key": "62aa54ec1a18" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Philip wanted to move up the Nextflow learning curve and set up nf-core workflows at Kamuzu University of Health Sciences in Malawi. His mentor was Robert Petit from the Wyoming Public Health Laboratory in the USA. 
Robert has developed the ", + "_key": "dfe3ff994f93" + }, + { + "_type": "span", + "marks": [ + "62aa54ec1a18" + ], + "text": "Bactopia", + "_key": "9b1c81a0eb37" + }, + { + "_type": "span", + "marks": [], + "text": " pipeline for the analysis of bacterial pipeline and it was Philip’s aim to get this running for his group in Malawi.", + "_key": "5c66190409c9" + } + ], + "_type": "block", + "style": "normal", + "_key": "e1b1a9aaec4f" + }, + { + "style": "normal", + "_key": "559b0139b9e2", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "8f91adccd3e1" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Robert helped Philip learn Nextflow, enabling him to independently deploy DSL2 pipelines and process genomes using Nextflow Tower. Philip is already using his new found skills to answer important public health questions in Malawi and is now passing his knowledge to other staff and students at his institute. Even though the mentorship program has finished, Philip and Rob will continue a collaboration and have plans to deploy pipelines that will benefit public health in the future.", + "_key": "745e843dc99f" + } + ], + "_type": "block", + "style": "normal", + "_key": "fb513bd10d33" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "51c2e6c630cd" + } + ], + "_type": "block", + "style": "normal", + "_key": "9b9d000b4a7d", + "markDefs": [] + }, + { + "_key": "cf3aac1e1593", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "em" + ], + "text": "\"I tried to learn nextflow independently some time ago, but abandoned it for the more familiar snakemake. Thanks to Robert’s mentorship I’m now over the learning curve and able to deploy nf-core pipelines and use cloud resources more efficiently via Nextflow Tower\"", + "_key": "8df88cb8a907" + }, + { + "text": " - ", + "_key": "b259ce528ab7", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Phil Ashton (mentee)", + "_key": "0e8c7cefc8a8" + } + ], + "_type": "block", + "style": "blockquote" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "c9acfafe2874", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "faf540bb721e" + }, + { + "_type": "block", + "style": "blockquote", + "_key": "3276db06cc48", + "markDefs": [], + "children": [ + { + "text": "\"I found being a mentor to be a rewarding experience and a great opportunity to introduce mentees into the Nextflow/nf-core community. 
Phil and I were able to accomplish a lot in the span of a few months, and now have many plans to collaborate in the future.\"", + "_key": "fe592fdac35f", + "_type": "span", + "marks": [ + "em" + ] + }, + { + "_key": "42e5e52be9dd", + "_type": "span", + "marks": [], + "text": " - " + }, + { + "_key": "172bf1afacfe", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Robert Petit (mentor)" + } + ] + }, + { + "style": "normal", + "_key": "299be282ae9a", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "3dcdb661801c" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Kalayanee Chairat (mentee) & Alison Meynert (mentor)", + "_key": "1b73fb3f98d5", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "283e4de78711" + }, + { + "_type": "block", + "style": "normal", + "_key": "4ad6496f2d49", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Kalayanee’s goal for the mentorship program was to set up and run Nextflow and nf-core pipelines at the local infrastructure at the King Mongkut’s University of Technology Thonburi in Thailand. Kalayanee was mentored by Alison Meynert, from the University of Edinburgh in the United Kingdom.", + "_key": "4fbc7f8abc75", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "0b4491576528", + "markDefs": [], + "children": [ + { + "_key": "72bf1014087b", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "markDefs": [ + { + "_key": "693db29767c2", + "_type": "link", + "href": "https://github.com/nf-core/configs" + }, + { + "_type": "link", + "href": "https://nf-co.re/sarek", + "_key": "24a8c82f740e" + }, + { + "_type": "link", + "href": "https://nf-co.re/rnaseq", + "_key": "0b0903b66397" + } + ], + "children": [ + { + "text": "Working with Alison, Kalayanee learned about Nextflow and nf-core and the requirements for working with Slurm and Singularity. Together, they created a configuration profile that Kalayanee and others at her institute can use - they have plans to submit this to ", + "_key": "b273b101c58e", + "_type": "span", + "marks": [] + }, + { + "_key": "085a972023a8", + "_type": "span", + "marks": [ + "693db29767c2" + ], + "text": "nf-core/configs" + }, + { + "_type": "span", + "marks": [], + "text": " as an institutional profile. Now she is familiar with these tools, Kalayanee is using ", + "_key": "b2e402259036" + }, + { + "marks": [ + "24a8c82f740e" + ], + "text": "nf-core/sarek", + "_key": "b26f0dc77189", + "_type": "span" + }, + { + "_key": "04c0248e6ac8", + "_type": "span", + "marks": [], + "text": " and " + }, + { + "_key": "7b44a1a5ba5b", + "_type": "span", + "marks": [ + "0b0903b66397" + ], + "text": "nf-core/rnaseq" + }, + { + "_type": "span", + "marks": [], + "text": " to analyze 100s of samples of her own next-generation sequencing data on her local HPC environment.", + "_key": "9b3631ca088a" + } + ], + "_type": "block", + "style": "normal", + "_key": "019db67b03ac" + }, + { + "style": "normal", + "_key": "f5c4cb724bff", + "markDefs": [], + "children": [ + { + "_key": "26324a75d8d6", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "b34b4c3c8f1c", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "\"", + "_key": "4dba083e351d", + "_type": "span" + }, + { + "marks": [ + "em" + ], + "text": "The mentorship program is a great start to learn to use and develop analysis pipelines built using Nextflow. 
I gained a lot of knowledge through this program. I am also very lucky to have Dr. Alison Meynert as my mentor. She is very knowledgeable, kind and willing to help in every step.\"", + "_key": "1af2f58d53fb", + "_type": "span" + }, + { + "_key": "e77d89d866ab", + "_type": "span", + "marks": [], + "text": " - " + }, + { + "marks": [ + "strong" + ], + "text": "Kalayanee Chairat (mentee)", + "_key": "66e61e8dc75f", + "_type": "span" + } + ], + "_type": "block", + "style": "blockquote" + }, + { + "_type": "block", + "style": "normal", + "_key": "dedb8d93ba88", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "028e43e089f0", + "_type": "span", + "marks": [] + } + ] + }, + { + "style": "blockquote", + "_key": "9cf4e3b95ae8", + "markDefs": [], + "children": [ + { + "_key": "6b58b44128b8", + "_type": "span", + "marks": [ + "em" + ], + "text": "\"It was a great experience for me to work with my mentee towards her goal. The process solidified some of my own topical knowledge and I learned new things along the way as well.\"" + }, + { + "_type": "span", + "marks": [], + "text": " - ", + "_key": "c08c1807714e" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Alison Meynert (mentor)", + "_key": "35de294d1091" + } + ], + "_type": "block" + }, + { + "_key": "29e6bc861527", + "markDefs": [], + "children": [ + { + "_key": "4f65186449bb", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "h2", + "_key": "7a73aec61d83", + "markDefs": [], + "children": [ + { + "_key": "270629a92a81", + "_type": "span", + "marks": [], + "text": "Edward Lukyamuzi (mentee) & Emilio Garcia-Rios (mentor)" + } + ] + }, + { + "_key": "074400679a39", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "For the mentoring program Edward’s goal was to understand the fundamental components of a Nextflow script and write a Nextflow pipeline for analyzing mosquito genomes. Edward was mentored by Emilio Garcia-Rios, from the EMBL-EBI in the United Kingdom.", + "_key": "ba460c9aa91e" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "9ce3e32e8d46", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "6cd8e8b720d8", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "f7708a13e9ae", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Edward learned the fundamental concepts of Nextflow, including channels, processes and operators. Edward works with sequencing data from the mosquito genome - with help from Emilio he wrote a Nextflow pipeline with an accompanying Dockerfile for the alignment of reads and genotyping of SNPs. Edward will continue to develop his pipeline and wants to become more involved with the Nextflow and nf-core community by attending the nf-core hackathons. Edward is also very keen to help others learn Nextflow and expressed an interest in being part of this program again as a mentor.", + "_key": "3c75d3a0accd" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "84986d379ba1", + "markDefs": [], + "children": [ + { + "_key": "6cb0d2b884c8", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "blockquote", + "_key": "a29b3e354b68", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "em" + ], + "text": "\"Learning Nextflow can be a steep curve. 
Having a partner to give you a little push might be what facilitates adoption of Nextflow into your daily routine.\"", + "_key": "c50222763f79" + }, + { + "marks": [], + "text": " - ", + "_key": "74e365660454", + "_type": "span" + }, + { + "text": "Edward Lukyamuzi (mentee)", + "_key": "9e0d1669c8f8", + "_type": "span", + "marks": [ + "strong" + ] + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "983403438678", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "a7fe9c30bb1b" + } + ] + }, + { + "_type": "block", + "style": "blockquote", + "_key": "e42966884ae0", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "\"", + "_key": "7d1ddd7f4ef9" + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "I would like more people to discover and learn the benefits using Nextflow has. Being a mentor in this program can help me collaborate with other colleagues and be a mentor in my institute as well.\"", + "_key": "50761b319c65" + }, + { + "_key": "135710a3916e", + "_type": "span", + "marks": [], + "text": " -" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": " Emilio Garcia-Rios (mentor)", + "_key": "42a9f3a61c27" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "b1710f24819e" + } + ], + "_type": "block", + "style": "normal", + "_key": "8c9a2c369a57" + }, + { + "style": "h2", + "_key": "b2fafb3d2661", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Suchitra Thapa (mentee) & Maxime Borry (mentor)", + "_key": "cc3de2e6f7c7" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "Suchitra started the program to learn about running Nextflow pipelines but quickly moved on to pipeline development and deployment on the cloud. Suchitra and Maxime encountered some technical challenges during the mentorship, including difficulties with internet connectivity and access to computational platforms for analysis. 
Despite this, with help from Maxime, Suchitra applied her newly acquired skills and made substantial progress converting the ", + "_key": "31a64959aeab", + "_type": "span", + "marks": [] + }, + { + "text": "metaphlankrona", + "_key": "4c9fb915325d", + "_type": "span", + "marks": [ + "05c3ce85f98a" + ] + }, + { + "marks": [], + "text": " pipeline for metagenomic analysis of microbial communities from Nextflow DSL1 to DSL2 syntax.", + "_key": "0d040aabf018", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "0a66f450f5d7", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/suchitrathapa/metaphlankrona", + "_key": "05c3ce85f98a" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "8d4c1b00df02" + } + ], + "_type": "block", + "style": "normal", + "_key": "ca1e4a140ddd", + "markDefs": [] + }, + { + "style": "normal", + "_key": "202ad20eb189", + "markDefs": [ + { + "href": "https://summit.nextflow.io/speakers/suchitra-thapa/", + "_key": "88944a9c0c59", + "_type": "link" + } + ], + "children": [ + { + "text": "Suchitra will be sharing her work and progress on the pipeline as a poster at the ", + "_key": "775eba9624a8", + "_type": "span", + "marks": [] + }, + { + "_key": "24e91bc97a4e", + "_type": "span", + "marks": [ + "88944a9c0c59" + ], + "text": "Nextflow Summit 2022" + }, + { + "marks": [], + "text": ".", + "_key": "d00b0ec50bf9", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "41c727116ad2" + } + ], + "_type": "block", + "style": "normal", + "_key": "b2db3eec3f2e" + }, + { + "_type": "block", + "style": "blockquote", + "_key": "8ff27e09cc2b", + "markDefs": [], + "children": [ + { + "_key": "9ef9c83605ce", + "_type": "span", + "marks": [], + "text": "\"" + }, + { + "marks": [ + "em" + ], + "text": "This mentorship was one of the best organized online learning opportunities that I have attended so far. With time flexibility and no deadline burden, you can easily fit this mentorship into your busy schedule. 
I would suggest everyone interested to definitely go for it.\"", + "_key": "a9952bbf287c", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " - ", + "_key": "665afe3bc168" + }, + { + "_key": "8e7e684aab46", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Suchitra Thapa (mentee)" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "81259c1f4ae3" + } + ], + "_type": "block", + "style": "normal", + "_key": "a079cd58fb3c", + "markDefs": [] + }, + { + "_key": "8f1ed3147881", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "em" + ], + "text": "\"This mentorship program was a very fruitful and positive experience, and the satisfaction to see someone learning and growing their bioinformatics skills is very rewarding.\"", + "_key": "e91d4988fd83" + }, + { + "text": " - ", + "_key": "fb07dd0f06bd", + "_type": "span", + "marks": [] + }, + { + "text": "Maxime Borry (mentor)", + "_key": "62e42bbbd267", + "_type": "span", + "marks": [ + "strong" + ] + } + ], + "_type": "block", + "style": "blockquote" + }, + { + "markDefs": [], + "children": [ + { + "_key": "490b63262630", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "90cbe5d73720" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Conclusion", + "_key": "05573d4c8c19" + } + ], + "_type": "block", + "style": "h2", + "_key": "cb71844b2004" + }, + { + "style": "normal", + "_key": "46a962656f63", + "markDefs": [], + "children": [ + { + "text": "Feedback from the first round of the mentorship program was overwhelmingly positive. Both mentors and mentees found the experience to be a rewarding opportunity and were grateful for taking part. Everyone who participated in the program said that they would encourage others to be a part of it in the future.", + "_key": "1f4d6f173c17", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "70611a4816e9", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "e6444141d4cf", + "markDefs": [] + }, + { + "style": "blockquote", + "_key": "3e3993943317", + "markDefs": [], + "children": [ + { + "_key": "803ff12c1b77", + "_type": "span", + "marks": [], + "text": "\"This is an exciting program that can help us make use of curated pipelines to advance open science. I don't mind repeating the program!\" - " + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "John Juma (mentor)", + "_key": "b783f9e2a2a7" + } + ], + "_type": "block" + }, + { + "alt": "Screenshot of final zoom meetup", + "_key": "c842eac6b09f", + "asset": { + "_type": "reference", + "_ref": "image-4d5f2f1331d74bcba295e81c065d0c95cb4a0848-3246x1820-png" + }, + "_type": "image" + }, + { + "markDefs": [], + "children": [ + { + "_key": "a0158ad4433f", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "82a583b16afd" + }, + { + "children": [ + { + "_key": "b7f58b2e02a0", + "_type": "span", + "marks": [], + "text": "As the Nextflow and nf-core communities continue to grow, the mentorship program will have long-term benefits beyond those that are immediately measurable. Mentees from the program are already acting as positive role models and contributing new perspectives to the wider community. 
Additionally, some mentees are interested in being mentors in the future and will undoubtedly support others as our communities continue to grow." + } + ], + "_type": "block", + "style": "normal", + "_key": "812c162381d7", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_key": "1a81b9d201c2", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "b3a0554a611a" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://nf-co.re/mentorships", + "_key": "6754093172a9" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "We were delighted with the high quality of this year’s mentors and mentees. Stay tuned for information about the next round of the Nextflow and nf-core mentorship program. Applications for round 2 will open on October 1, 2022. See ", + "_key": "8bef7e9fa93b" + }, + { + "_type": "span", + "marks": [ + "6754093172a9" + ], + "text": "https://nf-co.re/mentorships", + "_key": "6218f4af465d" + }, + { + "text": " for details.", + "_key": "497f552b3da6", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "417226b675a8" + }, + { + "children": [ + { + "text": "", + "_key": "e7ff87ca2f33", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "dcaa2822c17d", + "markDefs": [] + }, + { + "markDefs": [ + { + "href": "https://nf-co.re/mentorships", + "_key": "cfda61ac796d", + "_type": "link" + } + ], + "children": [ + { + "text": "Mentorship Round 2 - Details", + "_key": "acc9dcde16c1", + "_type": "span", + "marks": [ + "cfda61ac796d" + ] + } + ], + "_type": "block", + "style": "normal", + "_key": "f7c90e2b1f56" + } + ], + "_rev": "hf9hwMPb7ybAE3bqEU5qg4" + }, + { + "_createdAt": "2024-09-24T07:27:22Z", + "body": [ + { + "_type": "block", + "style": "normal", + "_key": "a4fd9b6dafb3", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Bioinformaticians worldwide, get ready to mark your calendars: Fall 2024 is looking jam-packed with amazing opportunities to learn, connect, and stay at the forefront of bioinformatics!", + "_key": "005350c7abb30" + } + ] + }, + { + "style": "normal", + "_key": "616a0b80c1b7", + "markDefs": [], + "children": [ + { + "_key": "83660dc9adba0", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "1772499dc381", + "markDefs": [], + "children": [ + { + "text": "With so many fantastic events happening worldwide, we've handpicked those that are bioinformatics-focused or feature bioinformatics tracks – so you can be sure not to miss the ones most relevant to you. 
\n\nHere is our curated compilation of some of the best industry events to attend this fall in Europe, North America, and Asia-Pacific, as well as a sneak peek of events coming up in 2025.",
                "_key": "b2ce77baf4420",
                "_type": "span",
                "marks": []
              }
            ],
            "_type": "block",
            "style": "normal"
          },
          {
            "_key": "736725c102d0",
            "markDefs": [],
            "children": [
              {
                "_key": "bc91ea5b179f0",
                "_type": "span",
                "marks": [],
                "text": ""
              }
            ],
            "_type": "block",
            "style": "normal"
          },
          {
            "markDefs": [],
            "children": [
              {
                "_type": "span",
                "marks": [],
                "text": "Top bioinformatics events in Europe",
                "_key": "ba883d260fbf0"
              }
            ],
            "_type": "block",
            "style": "h2",
            "_key": "80b88306ce1f"
          },
          {
            "_key": "6ef975430bd0",
            "markDefs": [
              {
                "_type": "link",
                "href": "https://summit.nextflow.io/2024/barcelona/?utm_campaign=Summit%202024&utm_source=seqera&utm_medium=blog&utm_content=top_events_fall_2024",
                "_key": "a58cd46a7162"
              }
            ],
            "children": [
              {
                "marks": [
                  "a58cd46a7162",
                  "strong"
                ],
                "text": "Nextflow Summit Barcelona",
                "_key": "504b028511d50",
                "_type": "span"
              }
            ],
            "_type": "block",
            "style": "h3"
          },
          {
            "style": "normal",
            "_key": "7408e5fde66f",
            "markDefs": [],
            "children": [
              {
                "marks": [],
                "text": "",
                "_key": "e1cc809749e10",
                "_type": "span"
              }
            ],
            "_type": "block"
          },
          {
            "_key": "eedc79c932b6",
            "markDefs": [],
            "children": [
              {
                "marks": [
                  "strong"
                ],
                "text": "Location:",
                "_key": "ba521555b8600",
                "_type": "span"
              },
              {
                "_type": "span",
                "marks": [],
                "text": " Barcelona, Spain",
                "_key": "e38d045f888b"
              }
            ],
            "_type": "block",
            "style": "normal"
          },
          {
            "style": "normal",
            "_key": "4a2541ddb2bf",
            "markDefs": [],
            "children": [
              {
                "marks": [
                  "strong"
                ],
                "text": "Dates:",
                "_key": "6f4fbc08e04d0",
                "_type": "span"
              },
              {
                "text": " October 28 - November 1, 2024",
                "_key": "3022f367802b",
                "_type": "span",
                "marks": []
              }
            ],
            "_type": "block"
          },
          {
            "_key": "92a0bd002639",
            "markDefs": [],
            "children": [
              {
                "marks": [],
                "text": "In-person | Online",
                "_key": "b6f470c2fc4a0",
                "_type": "span"
              }
            ],
            "_type": "block",
            "style": "normal"
          },
          {
            "_type": "block",
            "style": "normal",
            "_key": "81af197467b0",
            "markDefs": [],
            "children": [
              {
                "_type": "span",
                "marks": [],
                "text": "",
                "_key": "7b2910b77cf90"
              }
            ]
          },
          {
            "markDefs": [
              {
                "_type": "link",
                "href": "https://summit.nextflow.io/2024/boston/?utm_campaign=Summit%202024&utm_source=seqera&utm_medium=blog&utm_content=top_events_fall_2024",
                "_key": "e53a5efb455a"
              },
              {
                "_key": "d19e6bd20b41",
                "_type": "link",
                "href": "https://seqera.io?utm_source=seqera&utm_medium=blog&utm_content=top_events_fall_2024"
              },
              {
                "_key": "3f0fdb18390e",
                "_type": "link",
                "href": "https://seqera.io/nextflow/?utm_source=seqera&utm_medium=blog&utm_content=top_events_fall_2024"
              },
              {
                "_type": "link",
                "href": "https://summit.nextflow.io/2024/barcelona/training/?utm_campaign=Summit%202024&utm_source=seqera&utm_medium=blog&utm_content=top_events_fall_2024",
                "_key": "3b66656713f3"
              },
              {
                "_key": "3a54f51827fc",
                "_type": "link",
                "href": "https://summit.nextflow.io/2024/barcelona/hackathon/?utm_campaign=Summit%202024&utm_source=seqera&utm_medium=blog&utm_content=top_events_fall_2024"
              }
            ],
            "children": [
              {
                "marks": [],
                "text": "Did you miss out on the ",
                "_key": "4b0e9968a18b0",
                "_type": "span"
              },
              {
                "_type": "span",
                "marks": [
+ "e53a5efb455a" + ], + "text": "Nextflow Summit in Boston", + "_key": "4b0e9968a18b1" + }, + { + "_type": "span", + "marks": [], + "text": " earlier this year? Don’t worry! The premier event in bioinformatics, from ", + "_key": "4b0e9968a18b2" + }, + { + "_type": "span", + "marks": [ + "d19e6bd20b41" + ], + "text": "Seqera ", + "_key": "4b0e9968a18b3" + }, + { + "text": "- the creators of ", + "_key": "4b0e9968a18b4", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "3f0fdb18390e" + ], + "text": "Nextflow", + "_key": "4b0e9968a18b5" + }, + { + "marks": [], + "text": " - returns to the old continent and will bring together leading experts, innovators, and researchers to showcase the latest breakthroughs in ", + "_key": "4b0e9968a18b6", + "_type": "span" + }, + { + "text": "bioinformatics workflow management", + "_key": "9d456da5cb10", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "text": ". Whether you are new to Nextflow or a seasoned pro, the Nextflow Summit offers something for everyone. The ", + "_key": "373cbfba2986", + "_type": "span", + "marks": [] + }, + { + "text": "foundational training", + "_key": "4b0e9968a18b7", + "_type": "span", + "marks": [ + "3b66656713f3" + ] + }, + { + "marks": [], + "text": " is perfect for newcomers, while experienced users can dive into advanced topics during the ", + "_key": "4b0e9968a18b8", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "3a54f51827fc" + ], + "text": "nf-core hackathon", + "_key": "4b0e9968a18b9" + }, + { + "marks": [], + "text": ". The event concludes with three days of talks where attendees can learn about the latest developments from the Nextflow world.", + "_key": "4b0e9968a18b10", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "f2f63e0ffe44" + }, + { + "children": [ + { + "_key": "35a1087e0cc40", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "8116e6ce1ea3", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Register by October 11 for the in-person event or by October 21 for the online event — don’t miss your chance to join! 
", + "_key": "de39ddb7f24d0" + }, + { + "marks": [ + "5bfcc95e707c" + ], + "text": "Secure your spot now", + "_key": "b1db89a2a9460", + "_type": "span" + } + ], + "_type": "block", + "style": "blockquote", + "_key": "ac094e244fb8", + "markDefs": [ + { + "_type": "link", + "href": "https://summit.nextflow.io/2024/barcelona/?utm_campaign=Summit%202024&utm_source=seqera&utm_medium=blog&utm_content=top_events_fall_2024", + "_key": "5bfcc95e707c" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "20a3f7967a050", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "3e5fcc90ac56", + "markDefs": [] + }, + { + "_type": "block", + "style": "h3", + "_key": "e416b76c45cc", + "markDefs": [ + { + "href": "https://www.terrapinn.com/conference/biotechx/index.stm", + "_key": "b7d0470bb0a0", + "_type": "link" + } + ], + "children": [ + { + "text": "BiotechX Europe", + "_key": "9a11618984d80", + "_type": "span", + "marks": [ + "b7d0470bb0a0", + "strong" + ] + } + ] + }, + { + "style": "normal", + "_key": "25078f3c087a", + "markDefs": [], + "children": [ + { + "_key": "eb507cbfd26d0", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "56111676db9c", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Location", + "_key": "e089ddce37050" + }, + { + "_type": "span", + "marks": [], + "text": ": Basel, Switzerland", + "_key": "6f54ca8682e0" + } + ] + }, + { + "_key": "97e0f9f38fc5", + "markDefs": [], + "children": [ + { + "_key": "b16b44c054520", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Dates:" + }, + { + "marks": [], + "text": " October 9-10, 2024", + "_key": "a05dff9d821a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "a5c447814d2e", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In-person", + "_key": "f8f8a4a1a7570" + } + ] + }, + { + "_key": "40fa33cf6eb4", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "8a9758c9fea70" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "d3885f9b73c9", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "If you work in pharmaceutical development and healthcare, this is the event for you to attend. As Europe’s largest conference in the industry, BiotechX Europe will welcome more than 400 speakers, 3,500 attendees, and 150 exhibitors. 
Aiming to foster collaboration between research and industry, the event features 16 tracks covering a wide range of topics, including ", + "_key": "2e13ccf9b0c60" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "bioinformatics, multi-omics data management, AI, and computational genomics", + "_key": "9383f9cb63d8" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "88656d154026" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "a4d22394e97d0" + } + ], + "_type": "block", + "style": "normal", + "_key": "b0c44d155f57", + "markDefs": [] + }, + { + "_key": "c31bb3fd6482", + "markDefs": [ + { + "_type": "link", + "href": "https://seqera.io/seqera-at-biotechx-eu-2024/?utm_campaign=Summit%202024&utm_source=seqera&utm_medium=blog&utm_content=top_events_fall_2024", + "_key": "5bddbaea2256" + } + ], + "children": [ + { + "_key": "fa7beca9b4170", + "_type": "span", + "marks": [ + "5bddbaea2256" + ], + "text": "Seqera will be at the event" + }, + { + "text": " for two full days of networking and discussion with the life sciences community from around the world. We'll also deliver a talk as part of the bioinformatics track, so be sure to stop by. Can’t make it? No worries–we'll send you the recording afterward.", + "_key": "fa7beca9b4171", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "fb0c1c420266", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "d20d7581ffe7" + }, + { + "_type": "block", + "style": "blockquote", + "_key": "c34556d8796c", + "markDefs": [ + { + "_type": "link", + "href": "https://seqera.io/seqera-at-biotechx-eu-2024/?utm_campaign=BiotechX%20Europe%20October%202024&utm_source=seqera&utm_medium=blog&utm_content=top_events_fall_2024", + "_key": "abc21a447095" + } + ], + "children": [ + { + "_key": "75425c19bb920", + "_type": "span", + "marks": [ + "abc21a447095" + ], + "text": "Send me the recording!" 
+ } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "9b2c991e6bc8", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "ccffff4a3a840" + } + ] + }, + { + "_key": "ad59bb4d3c92", + "markDefs": [], + "children": [ + { + "_key": "dee5fb2705be0", + "_type": "span", + "marks": [], + "text": "Top bioinformatics events in North America" + } + ], + "_type": "block", + "style": "h2" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "54f53931838a" + } + ], + "_type": "block", + "style": "normal", + "_key": "b2f8fdbbacd3" + }, + { + "style": "h3", + "_key": "b82701a347b9", + "markDefs": [ + { + "href": "https://www.ashg.org/meetings/2024meeting/", + "_key": "1de576a95b41", + "_type": "link" + } + ], + "children": [ + { + "_key": "f121c91a01960", + "_type": "span", + "marks": [ + "1de576a95b41", + "strong" + ], + "text": "American Society of Human Genetics (ASHG) 2024 Annual Meeting" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "9791344b1601", + "markDefs": [], + "children": [ + { + "_key": "270a0b0c93520", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_key": "5f83a2c703a00", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Location" + }, + { + "_type": "span", + "marks": [], + "text": ": Denver, CO", + "_key": "34f611874891" + } + ], + "_type": "block", + "style": "normal", + "_key": "6cafccd72b27" + }, + { + "markDefs": [], + "children": [ + { + "text": "Dates:", + "_key": "200e5ac4ecbf0", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_key": "1c818d12c54a", + "_type": "span", + "marks": [], + "text": " November 5-9, 2024" + } + ], + "_type": "block", + "style": "normal", + "_key": "6dd5d782c459" + }, + { + "_type": "block", + "style": "normal", + "_key": "f2486648036e", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "In-person", + "_key": "48854f5dda260", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "736a1293e25e", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "2d6ee098f3950" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "886e182edc4d", + "markDefs": [], + "children": [ + { + "text": "ASHG 2024 will welcome more than 8,000 scientists from around the world for five days of talks, exhibits, and networking events focused on ", + "_key": "bbcbfde31c4b0", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "genetics and genomics science", + "_key": "cacd04d5d23c" + }, + { + "_type": "span", + "marks": [], + "text": ". The conference will feature many sessions and workshops dedicated to bioinformatics, big data analysis, and computational biology, making it one of the most anticipated events this year for bioinformaticians and computational biologists.", + "_key": "712fc74e16b6" + } + ], + "_type": "block" + }, + { + "_key": "18ad39abbcf6", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "0414fa03d6dd", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Seqera will exhibit at the event and lead an industry session on November 6 at 12:00 pm. 
More information will be available soon.", + "_key": "3e19b04852a00" + } + ], + "_type": "block", + "style": "normal", + "_key": "e5ca7f24e4c2", + "markDefs": [] + }, + { + "style": "normal", + "_key": "ef22f2d30f12", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "f2faa13a66e9", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "a75336029a69", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "If you'd like to join ASHG, make sure to register by October 1 – time is running out!", + "_key": "c8edc7ead73f0" + } + ], + "_type": "block", + "style": "blockquote" + }, + { + "_key": "406e5aa50374", + "markDefs": [], + "children": [ + { + "_key": "97a5825ce0bd0", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "h3", + "_key": "c1609723d99c", + "markDefs": [ + { + "href": "https://sc24.supercomputing.org/", + "_key": "aad3372eb124", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "aad3372eb124", + "strong" + ], + "text": "Supercomputing Conference (SC) 2024", + "_key": "7f49ea4d4a120" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "4aa3272646850", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "f80b88664b26", + "markDefs": [] + }, + { + "children": [ + { + "text": "Location", + "_key": "2aa2e076c3e00", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_key": "26f9d1914085", + "_type": "span", + "marks": [], + "text": ": Atlanta, GA" + } + ], + "_type": "block", + "style": "normal", + "_key": "8856171a422b", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "text": "Dates", + "_key": "430211bfc8fd0", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "marks": [], + "text": ": November 17-22, 2024", + "_key": "4b57e300353f", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "8e494c697046" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "In-person", + "_key": "d02b7076e2730", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "d6687f77180c" + }, + { + "_key": "0784e7e7b650", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "1786737407f60", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "SC 2024 is an essential event for professionals and students in the high-performance computing (HPC) community. It is heavily oriented towards bioinformaticians involved in the computational aspects of bioinformatics and will tackle topics including ", + "_key": "a6f68a9bee6f0" + }, + { + "text": "AI, machine learning, and cloud computing", + "_key": "4c799f8dcbf5", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "marks": [], + "text": ". 
The six-day event will also offer tutorials and workshops, giving attendees the chance to learn from leading experts in the most popular areas of HPC.",
                "_key": "07dfe2450254",
                "_type": "span"
              }
            ],
            "_type": "block",
            "style": "normal",
            "_key": "b05914a7f5f4"
          },
          {
            "children": [
              {
                "_key": "0972cb55b5a20",
                "_type": "span",
                "marks": [],
                "text": ""
              }
            ],
            "_type": "block",
            "style": "normal",
            "_key": "709058259fc0",
            "markDefs": []
          },
          {
            "markDefs": [],
            "children": [
              {
                "_key": "c59da1af8df20",
                "_type": "span",
                "marks": [],
                "text": "Top bioinformatics event in Asia-Pacific"
              }
            ],
            "_type": "block",
            "style": "h2",
            "_key": "613b9498e172"
          },
          {
            "_key": "d645227f6012",
            "markDefs": [
              {
                "_type": "link",
                "href": "https://www.abacbs.org/conference2024/home",
                "_key": "48fc94ab104d"
              }
            ],
            "children": [
              {
                "_type": "span",
                "marks": [
                  "48fc94ab104d",
                  "strong"
                ],
                "text": "Australian Bioinformatics and Computational Biology Society (ABACBS)",
                "_key": "cdbc4313d6e40"
              }
            ],
            "_type": "block",
            "style": "h3"
          },
          {
            "children": [
              {
                "marks": [],
                "text": "",
                "_key": "08ac1af2bef90",
                "_type": "span"
              }
            ],
            "_type": "block",
            "style": "normal",
            "_key": "b7bf73f58eec",
            "markDefs": []
          },
          {
            "style": "normal",
            "_key": "8c082012ba0e",
            "markDefs": [],
            "children": [
              {
                "_key": "74cbc34e6fc00",
                "_type": "span",
                "marks": [
                  "strong"
                ],
                "text": "Location"
              },
              {
                "_key": "6c5ae628c782",
                "_type": "span",
                "marks": [],
                "text": ": Sydney, Australia"
              }
            ],
            "_type": "block"
          },
          {
            "markDefs": [],
            "children": [
              {
                "_type": "span",
                "marks": [
                  "strong"
                ],
                "text": "Dates",
                "_key": "3ad60b7ef0f70"
              },
              {
                "marks": [],
                "text": ": November 4-6, 2024",
                "_key": "591dd93801c2",
                "_type": "span"
              }
            ],
            "_type": "block",
            "style": "normal",
            "_key": "aa5f5d785b87"
          },
          {
            "markDefs": [],
            "children": [
              {
                "_key": "70aeae7c0a1c0",
                "_type": "span",
                "marks": [],
                "text": "In-person"
              }
            ],
            "_type": "block",
            "style": "normal",
            "_key": "51210721b576"
          },
          {
            "children": [
              {
                "marks": [],
                "text": "",
                "_key": "dc51128dbf810",
                "_type": "span"
              }
            ],
            "_type": "block",
            "style": "normal",
            "_key": "084892f223c3",
            "markDefs": []
          },
          {
            "_type": "block",
            "style": "normal",
            "_key": "9d1c6ab024b3",
            "markDefs": [],
            "children": [
              {
                "_type": "span",
                "marks": [],
                "text": "Back for its 9th edition, the Australian Bioinformatics and Computational Biology Society conference (ABACBS) is an exciting event for bioinformatics professionals and students in APAC, serving as the central hub for bioinformatics and computational biology in the region. In addition to highlighting international developments in the field, the conference focuses on regional bioinformatics innovations across central themes such as ",
                "_key": "0b7cb42aa9110"
              },
              {
                "_key": "0290cc74565c",
                "_type": "span",
                "marks": [
                  "strong"
                ],
                "text": "AI, statistical bioinformatics, genomics, proteomics, and single-cell and spatial technologies"
              },
              {
                "_key": "aca568628005",
                "_type": "span",
                "marks": [],
                "text": "."
+ } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "99b177a43c3c", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "733d749612cf0" + } + ] + }, + { + "_type": "block", + "style": "blockquote", + "_key": "00d0e23d734f", + "markDefs": [ + { + "href": "https://www.combine.org.au/symp/", + "_key": "dae3da921ce9", + "_type": "link" + } + ], + "children": [ + { + "text": "If you’re a student in the field, you should consider attending the event, which will be held in conjunction with the ", + "_key": "9513b42b74790", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "dae3da921ce9" + ], + "text": "COMBINE", + "_key": "9513b42b74791" + }, + { + "_type": "span", + "marks": [], + "text": " student symposium.", + "_key": "9513b42b74792" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "d7f7b49e0840", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "fbe1769338e30", + "_type": "span", + "marks": [] + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "Upcoming bioinformatics events in 2025", + "_key": "138d633d7ef70", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "4560ed2f3628", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "f33391676310", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "For those of you already planning for next year's conference season, we’ve highlighted events that are already confirmed for 2025. While their programs are yet to be released, you can count on these events taking place.", + "_key": "bb6eb50c89060" + } + ] + }, + { + "style": "normal", + "_key": "8575b2b8ad7c", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "cbc69b8eee9f", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "marks": [ + "68b913f586a0", + "strong" + ], + "text": "Nextflow Summit 2025", + "_key": "facd674d64600", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "9527133dd59c", + "markDefs": [ + { + "_key": "68b913f586a0", + "_type": "link", + "href": "https://summit.nextflow.io/preregister-2025/" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "f7a2358dce0a0" + } + ], + "_type": "block", + "style": "normal", + "_key": "1755b2902f51", + "markDefs": [] + }, + { + "style": "normal", + "_key": "efae1385cf87", + "markDefs": [], + "children": [ + { + "marks": [ + "strong" + ], + "text": "Location", + "_key": "f026d6e6cc2d0", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ": Boston & Barcelona", + "_key": "35978ba96f8b" + } + ], + "_type": "block" + }, + { + "_key": "a08e77d4c18c", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Dates:", + "_key": "f09ea28e89f20" + }, + { + "marks": [], + "text": " May 13-16, 2025, Boston | Fall 2025, Barcelona", + "_key": "16d8512f2f29", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "234a304394e60", + "_type": "span", + "marks": [], + "text": "In-person | Online" + } + ], + "_type": "block", + "style": "normal", + "_key": "a62cf4f06de9", + "markDefs": [] + }, + { + "children": [ + { + "text": "", + "_key": "e2bba980e95a0", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "38bdbbe0ea31", + "markDefs": [] + }, + { + "style": "normal", + "_key": "3b9cca4be7c4", 
+ "markDefs": [], + "children": [ + { + "text": "If you missed earlier editions of the Boston and Barcelona Nextflow Summits, this is your chance to take part. The Nextflow Summit will be back in Boston during the Spring of 2025 and to Barcelona in the Fall. While the full agenda is yet to be released, you can already pre-register to be the first to know when tickets go on sale.", + "_key": "949ff83df9750", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "174f3cf7650b", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "808678869018" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "00e54ab70ebd0", + "_type": "span", + "marks": [ + "5b563ada5eeb" + ], + "text": "Pre-register" + } + ], + "_type": "block", + "style": "blockquote", + "_key": "ee129ec0b1f1", + "markDefs": [ + { + "_type": "link", + "href": "https://summit.nextflow.io/preregister-2025/?utm_campaign=Summit%202024&utm_source=seqera&utm_medium=blog&utm_content=top_events_fall_2024", + "_key": "5b563ada5eeb" + } + ] + }, + { + "_key": "d2097b2976a1", + "markDefs": [ + { + "href": "https://festivalofgenomics.com/london/en/page/home", + "_key": "ae10362c6b73", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "\n", + "_key": "56dc071f15870", + "_type": "span" + }, + { + "_key": "5ac46e126d80", + "_type": "span", + "marks": [ + "ae10362c6b73", + "strong" + ], + "text": "The Festival of Genomics & Biodata" + } + ], + "_type": "block", + "style": "h3" + }, + { + "style": "normal", + "_key": "857eb01550dd", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "51ff3f3d71470", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "e7da00284d71", + "markDefs": [], + "children": [ + { + "_key": "3c23340b964e0", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Location" + }, + { + "marks": [], + "text": ": London, UK", + "_key": "dc560c599223", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "679432d9beff0", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Dates" + }, + { + "_type": "span", + "marks": [], + "text": ": January 29-30, 2025", + "_key": "dd4fd5fccef2" + } + ], + "_type": "block", + "style": "normal", + "_key": "5554a518940e", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "In-person", + "_key": "706f90cb61650" + } + ], + "_type": "block", + "style": "normal", + "_key": "b00fc59f223a", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "e6d4529eb25f0" + } + ], + "_type": "block", + "style": "normal", + "_key": "82b182d551db" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Established as the UK’s largest annual life sciences event, the Festival of Genomics & Biodata is particularly relevant for ", + "_key": "4e96f23e3acc0", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "bioinformaticians in the genomics community", + "_key": "ba05da7be364" + }, + { + "_type": "span", + "marks": [], + "text": ". The 2025 edition is expected to gather more than 7000 attendees and 300 speakers. 
The full agenda will be released on October 15, 2024, but you can already express interest in registering to be the first to know when tickets go on sale!", + "_key": "72271b05856e" + } + ], + "_type": "block", + "style": "normal", + "_key": "5e3ffe169818" + }, + { + "children": [ + { + "text": "", + "_key": "c4dd4d1cba7f", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "9e05733ec461", + "markDefs": [] + }, + { + "_key": "64a0959f1585", + "markDefs": [ + { + "_type": "link", + "href": "https://seqera.io/events/?utm_source=seqera&utm_medium=blog&utm_content=top_events_fall_2024", + "_key": "511e6294d45b" + } + ], + "children": [ + { + "marks": [], + "text": "Seqera will be attending the Festival for the third year in a row! We’ll share more information about our participation soon–stay tuned! To make sure you don’t miss out on any announcements, follow us on social media or check out our ", + "_key": "a552c6b5f6c50", + "_type": "span" + }, + { + "text": "events page", + "_key": "a552c6b5f6c51", + "_type": "span", + "marks": [ + "511e6294d45b" + ] + }, + { + "text": ".", + "_key": "a552c6b5f6c52", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "blockquote" + }, + { + "_type": "block", + "style": "normal", + "_key": "da2fe42732c0", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "938fc2309ea1" + } + ] + }, + { + "children": [ + { + "_key": "f95dcb9530220", + "_type": "span", + "marks": [ + "3af3eac54fd9", + "strong" + ], + "text": "Bio-IT World Conference & Expo" + } + ], + "_type": "block", + "style": "h3", + "_key": "e878b99f1723", + "markDefs": [ + { + "_type": "link", + "href": "https://www.bio-itworldexpo.com/", + "_key": "3af3eac54fd9" + } + ] + }, + { + "_key": "54ddb92cf29e", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "465133b3815c0", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "54792387f148", + "markDefs": [], + "children": [ + { + "marks": [ + "strong" + ], + "text": "Location", + "_key": "f593685fd4f20", + "_type": "span" + }, + { + "_key": "1b69f8ea855f", + "_type": "span", + "marks": [], + "text": ": Boston, MA" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Dates", + "_key": "e6746c9c63d90" + }, + { + "_type": "span", + "marks": [], + "text": ": April 2-4, 2025", + "_key": "369508d1c229" + } + ], + "_type": "block", + "style": "normal", + "_key": "9e746b424dfe", + "markDefs": [] + }, + { + "children": [ + { + "_key": "d3d807db73a90", + "_type": "span", + "marks": [], + "text": "In-person | Online" + } + ], + "_type": "block", + "style": "normal", + "_key": "10af5b1aaee8", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "d17fe67cf1d4", + "markDefs": [], + "children": [ + { + "_key": "69afc5921d850", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "style": "normal", + "_key": "d98d6ce09255", + "markDefs": [], + "children": [ + { + "_key": "c570a28e8bac0", + "_type": "span", + "marks": [], + "text": "The Annual Conference and Expo focuses on the intersection of " + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "life sciences, data sciences, and technology", + "_key": "ec0b70541836" + }, + { + "marks": [], + "text": " and is particularly suited to bioinformaticians and computational biologists with a strong interest in data and 
technology. The event includes plenary keynotes, over 200 educational and technical presentations across 11 tracks, interactive discussions, and exhibits on the latest technologies in the life sciences. Those of you who can’t attend in person can follow a live virtual stream. Registrations are already open, and you can benefit from a discounted rate until November 15, 2024!", + "_key": "a28d11d5fc55", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "6f9ff5363f1a", + "markDefs": [], + "children": [ + { + "_key": "7b614ade8e38", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Seqera will be a Platinum sponsor of Bio-IT World. Visit our booth on the tradeshow floor and listen to our presentation on the Cloud Computing track on Thursday, April 3. More information will be available earlier in the year.", + "_key": "46425ee3e2460" + } + ], + "_type": "block", + "style": "blockquote", + "_key": "b061a380bfe9", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "4ec1d8fa8969" + } + ], + "_type": "block", + "style": "normal", + "_key": "94fff51924ce", + "markDefs": [] + }, + { + "style": "h2", + "_key": "9b2834fdcebf", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Why these events matter: learn, innovate, connect", + "_key": "b4d6c1726cd40" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "32dd265b0c7b", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The events we’ve highlighted are all well-established and represent a unique opportunity to keep up with the latest research, build strong industry connections, and learn new skills. Throughout the wide range of topics and specialties covered, data scientists and bioinformaticians can keep up with how the field is advancing, both at the regional and international levels.", + "_key": "3a6692daa6b20" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "91d71b7afa4b", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The hands-on workshops and tutorials will also help develop practical skills that you can apply to your research or work.", + "_key": "02f1d633eef70" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_key": "5e231565ed770", + "_type": "span", + "marks": [], + "text": "Whether you’re just starting or a seasoned expert, these events represent an excellent opportunity for professional growth and to remain at the forefront of bioinformatics." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "d814e73f13d4" + }, + { + "_key": "c77e333c8e1d", + "markDefs": [], + "children": [ + { + "_key": "a388e852748e", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "737760a444b6", + "asset": { + "_ref": "image-54912048f85a1aa655553391b6d0e62fa57e82de-1200x628-png", + "_type": "reference" + }, + "_type": "image" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "\n", + "_key": "1ced84af1e7a0" + } + ], + "_type": "block", + "style": "normal", + "_key": "e4638b15e2df" + } + ], + "author": { + "_ref": "irina-silva", + "_type": "reference" + }, + "_type": "blogPost", + "_id": "15c75021-e091-4854-9aa0-fc04970ec963", + "tags": [ + { + "_ref": "1b55a117-18fe-40cf-8873-6efd157a6058", + "_type": "reference", + "_key": "851fad916bc4" + } + ], + "_updatedAt": "2024-09-24T09:27:24Z", + "meta": { + "noIndex": false, + "slug": { + "_type": "slug", + "current": "bioinformatics-events-2024-2025" + }, + "_type": "meta", + "description": "Get ready to mark your calendars because the fall of 2024 is going to be jam-packed with amazing opportunities to expand your knowledge, make new connections, and stay at the forefront of bioinformatics!" + }, + "title": "Bioinformatics events you can’t miss in fall 2024 and early 2025", + "_rev": "odsN0KVxadbI50QPUHiVWo", + "publishedAt": "2024-09-24T09:27:00.000Z" + }, + { + "_updatedAt": "2024-10-14T08:14:37Z", + "author": { + "_type": "reference", + "_ref": "mNsm4Vx1W1Wy6aYYkroetD" + }, + "meta": { + "description": "We are excited to announce the launch of the Nextflow Ambassador Program, a worldwide initiative designed to foster collaboration, knowledge sharing, and community growth. It is intended to recognize and support the efforts of our community leaders and marks another step forward in our mission to advance scientific research and empower researchers", + "slug": { + "current": "introducing-nextflow-ambassador-program" + } + }, + "_rev": "2PruMrLMGpvZP5qAknmCcn", + "tags": [ + { + "_type": "reference", + "_key": "513706fc2fdb", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493" + }, + { + "_type": "reference", + "_key": "ce1df4cabdb1", + "_ref": "3d25991c-f357-442b-a5fa-6c02c3419f88" + } + ], + "_type": "blogPost", + "_id": "17055226a59e", + "title": "Introducing the Nextflow Ambassador Program", + "_createdAt": "2024-09-25T14:17:23Z", + "publishedAt": "2023-10-18T06:00:00.000Z", + "body": [ + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "We are excited to announce the launch of the Nextflow Ambassador Program, a worldwide initiative designed to foster collaboration, knowledge sharing, and community growth. 
It is intended to recognize and support the efforts of our community leaders and marks another step forward in our mission to advance scientific research and empower researchers.", + "_key": "a117e2b4acff" + } + ], + "_type": "block", + "style": "normal", + "_key": "4410f5606b1d", + "markDefs": [] + }, + { + "_key": "bc7b6c0a4c88", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "bf50e4abff86" + } + ], + "_type": "block", + "style": "normal" + }, + { + "size": "small", + "_type": "picture", + "alt": "nf-core Hackathon in Barcelona 2023", + "_key": "fc868e5686da", + "alignment": "right", + "asset": { + "_type": "image", + "asset": { + "_ref": "image-2a9648dad75de6c930ca67d9dc43b90c9a5ce90b-900x981-jpg", + "_type": "reference" + } + } + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "1bf450716ce6", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "227b0fd37e00" + }, + { + "children": [ + { + "text": "Nextflow ambassadors will play a vital role in:", + "_key": "85f6064419cc", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "6130b7d70f6c", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "32556c7707d7", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Sharing knowledge", + "_key": "fe66504a70b00" + }, + { + "marks": [], + "text": ": Ambassadors provide valuable insights and best practices to help users make the most of Nextflow by writing training material and blog posts, giving seminars and workshops, organizing hackathons and meet-ups, and helping with community support.", + "_key": "8762e91494e0", + "_type": "span" + } + ], + "level": 1 + }, + { + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Fostering collaboration", + "_key": "75e1c00f694b0" + }, + { + "_type": "span", + "marks": [], + "text": ": As knowledgeable members of our community, ambassadors facilitate connections among users and developers, enabling collaboration on community projects, such as nf-core pipelines, sub-workflows, and modules, among other things, in the Nextflow ecosystem.", + "_key": "0856b210910e" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "e7a7b3f014ad", + "listItem": "bullet", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Community growth", + "_key": "b7ee6750eb9c0" + }, + { + "_type": "span", + "marks": [], + "text": ": Ambassadors help expand and enrich the Nextflow community, making it more vibrant and supportive. They are local contacts for new community members and engage with potential users in their region and fields of expertise.", + "_key": "1df25a8ab577" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "3772fa8a407f", + "listItem": "bullet" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "7c4108f47a57", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "59449ed5fb6d" + }, + { + "style": "normal", + "_key": "616292489f7d", + "markDefs": [], + "children": [ + { + "_key": "ff67b34b21490", + "_type": "span", + "marks": [], + "text": "As community members who already actively contribute to outreach, ambassadors will be supported to extend the work they’re already doing. 
For example, many of our ambassadors run local Nextflow training events – to help with this, the program will include “train the trainer” sessions and give access to our content library with slide decks, templates, and more. Ambassadors can also request stickers and financial support for events they organize (e.g., for pizza). Seqera is opening an exclusive travel fund that ambassadors can apply to help cover travel costs for events where they will present relevant work. Social media content written by ambassadors will be amplified by the nextflow and nf-core accounts, increasing their reach. Ambassadors will get “behind the scenes” access, with insights into running an open-source community, early access to new features, and a great networking experience. The ambassador network will enable members to be kept up-to-date with events and opportunities happening all over the world. To recognize their efforts, ambassadors will receive exclusive swag and apparel, a certificate for their work, and a profile on the ambassador page of our website." + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "5830c122b5290", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "18a57a26bbdb", + "markDefs": [] + }, + { + "style": "h2", + "_key": "fbe6916524f0", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Meet our ambassadors", + "_key": "1a57c6363147", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://www.nextflow.io/our_ambassadors.html", + "_key": "0bdbee58fdb5" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "You can visit our ", + "_key": "3627b6c3fe0b" + }, + { + "text": "Nextflow ambassadors page", + "_key": "152ab0c2219a", + "_type": "span", + "marks": [ + "0bdbee58fdb5" + ] + }, + { + "marks": [], + "text": " to learn more about our first group of ambassadors. You will find their profiles there, highlighting their interests, expertise, and insights they bring to the Nextflow ecosystem.", + "_key": "6255a610d688", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "1e7b869dbea3" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "b704ba7382b7", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "99bb3d3ea0d4", + "markDefs": [] + }, + { + "style": "normal", + "_key": "69067864ca03", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "You can see snippets about some of our ambassadors below:", + "_key": "4391e48f7672" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "c7208bdf3749", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "99d21bef958f" + }, + { + "markDefs": [], + "children": [ + { + "text": "Priyanka Surana", + "_key": "d3ad09ee244c", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h4", + "_key": "ad9dd568c04d" + }, + { + "_key": "10afdcfe3624", + "markDefs": [ + { + "_key": "bd7e7fb6fcef", + "_type": "link", + "href": "https://pipelines.tol.sanger.ac.uk/pipelines" + } + ], + "children": [ + { + "_key": "5c6bbedd2c1a", + "_type": "span", + "marks": [], + "text": "Priyanka Surana is a Principal Bioinformatician at the Wellcome Sanger Institute, where she oversees the Nextflow development for the Tree of Life program. 
Over the last almost two years, they have released nine pipelines with nf-core standards and have three more in development. You can learn more about them " + }, + { + "_type": "span", + "marks": [ + "bd7e7fb6fcef" + ], + "text": "here", + "_key": "259b698e277c" + }, + { + "_key": "f93972378a73", + "_type": "span", + "marks": [], + "text": "." + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "6a4da5a1fcdb", + "markDefs": [], + "children": [ + { + "_key": "3745179dfb82", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "39d491380594", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "She’s one of our ambassadors in the UK 🇬🇧 and has already done fantastic outreach work, organizing seminars and bringing many new users to our community! 🤩 In the March Hackathon, she organized a local site with over 70 individuals participating in person, plus over five other events in 2023. The Nextflow community on the Wellcome Genome Campus started in March 2023 with the nf-core hackathon, and now it has grown to over 150 members across 11 different organizations across Cambridge. Currently, they are planning a day-long Nextflow Symposium in December 🤯. They do seminars, workshops, coffee meetups, and trainings. In our previous round of the Nextflow and nf-core mentorship, Priyanka mentored Lila, a graduate student in Peru, to build her first Nextflow pipeline using nf-core tools to analyze bacterial metagenomics data. This is the power of a Nextflow ambassador! Not only growing a local community but helping people all over the world to get the best out of Nextflow and nf-core 🥰.", + "_key": "d9f323bb7883", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "652ea87303b4", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "2ce936283694", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "h4", + "_key": "8f7bd3003d5d", + "markDefs": [], + "children": [ + { + "_key": "cba4909039ca", + "_type": "span", + "marks": [], + "text": "Abhinav Sharma" + } + ] + }, + { + "_key": "de7a97f18459", + "markDefs": [ + { + "_key": "06ab99d97dfd", + "_type": "link", + "href": "https://www.youtube.com/playlist?list=PL3xpfTVZLcNikun1FrSvtXW8ic32TciTJ" + }, + { + "_key": "0de7e52eeee3", + "_type": "link", + "href": "https://twitter.com/abhi18av/status/1695863348162675042" + }, + { + "_key": "dc28161b87d7", + "_type": "link", + "href": "https://github.com/TelethonKids/Nextflow-BioWiki" + }, + { + "href": "https://www.gov.br/iec/pt-br/assuntos/noticias/curso-contribui-para-criacao-da-rede-norte-nordeste-de-vigilancia-genomica-para-tuberculose-no-iec", + "_key": "beca79ea6900", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Abhinav is a PhD candidate at Stellenbosch University, South Africa. As a Nextflow Ambassador, Abhinav has been tremendously active in the Global South, supporting young scientists in Africa 🇿🇦🇿🇲, Brazil 🇧🇷, India 🇮🇳 and Australia 🇦🇺 leading to the growth of local communities. 
He has contributed to the ", + "_key": "b6e8d36f86c3" + }, + { + "text": "Nextflow training in Hindi", + "_key": "518bdf7a4d40", + "_type": "span", + "marks": [ + "06ab99d97dfd" + ] + }, + { + "text": " and played a key role in integrating African bioinformaticians in the Nextflow and nf-core community and initiatives, showcased by the high participation of individuals in African countries who benefited from mentorship during nf-core Hackathons, Training events and prominent workshops like ", + "_key": "7d1842c315da", + "_type": "span", + "marks": [] + }, + { + "_key": "7b14a2d7260f", + "_type": "span", + "marks": [ + "0de7e52eeee3" + ], + "text": "VEME, 2023" + }, + { + "text": ". In Australia, Abhinav continues to collaborate with Patricia, a research scientist from Telethon Kids Institute, Perth (whom he mentored during the nf-core mentorship round 2), to organize monthly seminars on ", + "_key": "21fcf63931ec", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "dc28161b87d7" + ], + "text": "BioWiki", + "_key": "7dfad4122f67", + "_type": "span" + }, + { + "text": " and bootcamp for local capacity building. In addition, he engages in regular capacity-building sessions in Brazilian institutes such as ", + "_key": "314a2d2d508f", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "beca79ea6900" + ], + "text": "Instituto Evandro Chagas", + "_key": "3def7320d8e3", + "_type": "span" + }, + { + "marks": [], + "text": " (Belém, Brazil) and INI, FIOCRUZ (Rio de Janeiro, Brazil). Last but not least, Abhinav has contributed to the Nextflow community and project in several ways, even to the extent of contributing to the Nextflow code base and plugin ecosystem! 😎", + "_key": "0ef144ecac23", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "0410219f035e", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "ab15123b64bf", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Robert Petit", + "_key": "06626c083a6f" + } + ], + "_type": "block", + "style": "h4", + "_key": "bf61576534ee" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Robert Petit is the Senior Bioinformatics Scientist at the ", + "_key": "901531dd83bb" + }, + { + "text": "Wyoming Public Health Laboratory", + "_key": "90ab0c0b886c", + "_type": "span", + "marks": [ + "cdf56da59ce7" + ] + }, + { + "text": " 🦬 and a long-time contributor to the Nextflow community! 🥳 Being a Nextflow Ambassador, Robert has made extensive efforts to grow the Nextflow and nf-core communities, both locally and internationally. Through his work on ", + "_key": "a86514e125db", + "_type": "span", + "marks": [] + }, + { + "_key": "f3f8d1d69af5", + "_type": "span", + "marks": [ + "f40ca52fd7d3" + ], + "text": "Bactopia" + }, + { + "_type": "span", + "marks": [], + "text": ", a popular and extensive Nextflow pipeline for the analysis of bacterial genomes, Robert has been able to ", + "_key": "df550964cc75" + }, + { + "text": "contribute to nf-core regularly", + "_key": "27708e432c84", + "_type": "span", + "marks": [ + "3a86ba5c4884" + ] + }, + { + "_type": "span", + "marks": [], + "text": ". As a Bioconda Core team member, he is always lending a hand when called upon by the Nextflow community, whether it is to add a new recipe or approve a pull request! 
⚒️ He has also delivered multiple trainings to the local community in Wyoming, US 🇺🇸, and workshops at conferences, including ASM Microbe. Robert's dedication as a Nextflow Ambassador is best highlighted, and he'll agree, by his active role as a mentor. Robert has acted as a mentor multiple times during virtual nf-core hackathons, and he is the only person to be a mentor in all three rounds of the Nextflow and nf-core mentorship program 😍!", + "_key": "f14233ba7f6d" + } + ], + "_type": "block", + "style": "normal", + "_key": "0e6e149f9adc", + "markDefs": [ + { + "_type": "link", + "href": "https://health.wyo.gov/publichealth/lab/", + "_key": "cdf56da59ce7" + }, + { + "href": "https://bactopia.github.io/", + "_key": "f40ca52fd7d3", + "_type": "link" + }, + { + "href": "https://bactopia.github.io/v3.0.0/impact-and-outreach/enhancements/#enhancements-and-fixes", + "_key": "3a86ba5c4884", + "_type": "link" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "ca2bc13967ff" + } + ], + "_type": "block", + "style": "normal", + "_key": "a048a9278d4b" + }, + { + "_type": "block", + "style": "normal", + "_key": "4836756ed37f", + "markDefs": [], + "children": [ + { + "text": "The Nextflow Ambassador Program is a testament to the power of community-driven innovation, and we invite you to join us in celebrating this exceptional group. In the coming weeks and months, you will hear more from our ambassadors as they continue to share their experiences, insights, and expertise with the community as freshly minted Nextflow ambassadors.", + "_key": "72727706ce42", + "_type": "span", + "marks": [] + } + ] + } + ] + }, + { + "_createdAt": "2024-09-25T14:16:37Z", + "tags": [], + "_type": "blogPost", + "title": "Learning Nextflow in 2022", + "_updatedAt": "2024-09-30T09:19:44Z", + "meta": { + "slug": { + "current": "learn-nextflow-in-2022" + }, + "description": "A lot has happened since we last wrote about how best to learn Nextflow, over a year ago. Several new resources have been released including a new Nextflow Software Carpentries course and an excellent write-up by 23andMe." + }, + "_rev": "mvya9zzDXWakVjnX4hhQm6", + "publishedAt": "2022-01-21T07:00:00.000Z", + "_id": "171e6dc0c8fe", + "author": { + "_ref": "evan-floden", + "_type": "reference" + }, + "body": [ + { + "_type": "block", + "style": "normal", + "_key": "17c4bd39bd3b", + "markDefs": [ + { + "href": "https://carpentries-incubator.github.io/workflows-nextflow/index.html", + "_key": "9160a63dc996", + "_type": "link" + }, + { + "_type": "link", + "href": "https://medium.com/23andme-engineering/introduction-to-nextflow-4d0e3b6768d1", + "_key": "bbce41f2ff57" + } + ], + "children": [ + { + "text": "A lot has happened since we last wrote about how best to learn Nextflow, over a year ago. 
Several new resources have been released including a new Nextflow ", + "_key": "e315546dc35f", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "9160a63dc996" + ], + "text": "Software Carpentries", + "_key": "cec59b9e532e" + }, + { + "_type": "span", + "marks": [], + "text": " course and an excellent write-up by ", + "_key": "4edcd19bfde1" + }, + { + "_type": "span", + "marks": [ + "bbce41f2ff57" + ], + "text": "23andMe", + "_key": "466b4cb4df72" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "6ab90f9a32cf" + } + ] + }, + { + "style": "normal", + "_key": "06dfb5ccce44", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "961007bbc7e9" + } + ], + "_type": "block" + }, + { + "_key": "6c999505eeba", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nextflow-io/website/tree/master/content/blog/2022/learn-nextflow-in-2022.md", + "_key": "c853f8f18ef4" + } + ], + "children": [ + { + "text": "We have collated some links below from a diverse collection of resources to help you on your journey to learn Nextflow. Nextflow is a community-driven project - if you have any suggestions, please make a pull request to ", + "_key": "024dcb5bce4d", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "c853f8f18ef4" + ], + "text": "this page on GitHub", + "_key": "bddedd655f62" + }, + { + "_key": "ca671a153d4d", + "_type": "span", + "marks": [], + "text": "." + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "18abef342ef9", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "43a2732c5651" + }, + { + "style": "normal", + "_key": "3c7fedca4dc7", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Without further ado, here is the definitive guide for learning Nextflow in 2022. These resources will support anyone in the journey from total beginner to Nextflow expert.", + "_key": "0651659ed488" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "79f62fa52638", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "cf418b580159", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "h2", + "_key": "695517434926", + "markDefs": [], + "children": [ + { + "text": "Prerequisites", + "_key": "ff0696e1f523", + "_type": "span", + "marks": [] + } + ] + }, + { + "style": "normal", + "_key": "f69464be375a", + "markDefs": [], + "children": [ + { + "_key": "82c2f064a958", + "_type": "span", + "marks": [], + "text": "Before you start writing Nextflow pipelines, we recommend that you are comfortable with using the command-line and understand the basic concepts of scripting languages such as Python or Perl. Nextflow is widely used for bioinformatics applications, and scientific data analysis. The examples and guides below often focus on applications in these areas. However, Nextflow is now adopted in a number of data-intensive domains such as image analysis, machine learning, astronomy and geoscience." 
+ } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "", + "_key": "00c0f09e3786", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "f6bba24394ae", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Time commitment", + "_key": "2f50c5ae500f", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "c89645f90042" + }, + { + "children": [ + { + "text": "We estimate that it will take at least 20 hours to complete the material. How quickly you finish will depend on your background and how deep you want to dive into the content. Most of the content is introductory but there are some more advanced dataflow and configuration concepts outlined in the workshop and pattern sections.", + "_key": "2cb2f867299e", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "6d738b68a18c", + "markDefs": [] + }, + { + "style": "normal", + "_key": "4df0f5429c28", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "d20f0bca5dad", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "47a411b9b23e", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Contents", + "_key": "4fbbc31ea781", + "_type": "span" + } + ], + "_type": "block", + "style": "h2" + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Why learn Nextflow?", + "_key": "f8eba5a1e9570" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "f1fe0d6c7868" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Introduction to Nextflow from 23andMe", + "_key": "69f164d84c120" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "20d7191d406d", + "listItem": "bullet", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "768645bff31f", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "An RNA-Seq hands-on tutorial", + "_key": "441d0e0a7ad80" + } + ], + "level": 1 + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "aa4b09a42e75", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow workshop from Seqera Labs", + "_key": "111f0b5e3ce20" + } + ] + }, + { + "style": "normal", + "_key": "3a952d69c206", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Software Carpentries Course", + "_key": "8b0e87b422f20", + "_type": "span" + } + ], + "level": 1, + "_type": "block" + }, + { + "style": "normal", + "_key": "7a5cf9a32e75", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Managing Pipelines in the Cloud", + "_key": "d290c444cf780" + } + ], + "level": 1, + "_type": "block" + }, + { + "_key": "2d082890048b", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The nf-core tutorial", + "_key": "f981a15e5e440" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "470efa7dd1ac", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Advanced implementation patterns", + "_key": "fd1efacc007e0", + "_type": "span" + } + ] + }, + { + "_key": "9d673d0d1113", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": 
"span", + "marks": [], + "text": "Awesome Nextflow", + "_key": "109bcfd133840" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "e4682b867eb0", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "00900b7165d40", + "_type": "span", + "marks": [], + "text": "Further resources" + } + ], + "level": 1, + "_type": "block" + }, + { + "style": "normal", + "_key": "6a33ae89cfbf", + "markDefs": [], + "children": [ + { + "_key": "a30ba9100bfd", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "1. Why learn Nextflow?", + "_key": "74dabb074dd2" + } + ], + "_type": "block", + "style": "h2", + "_key": "5e29875a4a51", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Nextflow is an open-source workflow framework for writing and scaling data-intensive computational pipelines. It is designed around the Linux philosophy of simple yet powerful command-line and scripting tools that, when chained together, facilitate complex data manipulations. Combined with support for containerization, support for major cloud providers and on-premise architectures, Nextflow simplifies the writing and deployment of complex data pipelines on any infrastructure.", + "_key": "5b7b0e281460", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "a7c0080bec46" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "1efe5a3d86eb", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "c0ed6811fa77" + }, + { + "_key": "7cf8bda98d2b", + "markDefs": [], + "children": [ + { + "text": "The following are some high-level motivations on why people choose to adopt Nextflow:", + "_key": "679963ec6e65", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "9ef3453161c2", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "c0686f52baa9", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "d9ae82376b15", + "listItem": "number", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Integrating Nextflow in your analysis workflows helps you implement ", + "_key": "899c91a005ea0", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "reproducible", + "_key": "899c91a005ea1" + }, + { + "_type": "span", + "marks": [], + "text": " pipelines. Nextflow pipelines follow FAIR guidelines with version-control and containers to manage all software dependencies.", + "_key": "899c91a005ea2" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "_key": "a7d348d25cdf", + "listItem": "number", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Avoid vendor lock-in by ensuring portability. Nextflow is ", + "_key": "4de5eda554a80", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "portable", + "_key": "4de5eda554a81" + }, + { + "marks": [], + "text": "; the same pipeline written on a laptop can quickly scale to run on an HPC cluster, Amazon and Google cloud services, and Kubernetes. 
The code stays constant across varying infrastructures allowing collaboration and avoiding lock-in.", + "_key": "4de5eda554a82", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "It is ", + "_key": "da71b5f747e00" + }, + { + "_key": "da71b5f747e01", + "_type": "span", + "marks": [ + "strong" + ], + "text": "scalable" + }, + { + "_type": "span", + "marks": [], + "text": " allowing the parallelization of tasks using the dataflow paradigm without having to hard-code the pipeline to a specific platform architecture.", + "_key": "da71b5f747e02" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "bdee360f9489", + "listItem": "number" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "6847acd3eeb3", + "listItem": "number", + "markDefs": [], + "children": [ + { + "text": "It is ", + "_key": "ebb7df1d27910", + "_type": "span", + "marks": [] + }, + { + "_key": "ebb7df1d27911", + "_type": "span", + "marks": [ + "strong" + ], + "text": "flexible" + }, + { + "_type": "span", + "marks": [], + "text": " and supports scientific workflow requirements like caching processes to prevent re-computation, and workflow reports to better understand the workflows’ executions.", + "_key": "ebb7df1d27912" + } + ] + }, + { + "style": "normal", + "_key": "d63e90107212", + "listItem": "number", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "It is ", + "_key": "6ab9733600030" + }, + { + "marks": [ + "strong" + ], + "text": "growing fast", + "_key": "6ab9733600031", + "_type": "span" + }, + { + "text": " and has ", + "_key": "6ab9733600032", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "long-term support", + "_key": "6ab9733600033" + }, + { + "text": " available from Seqera Labs. Developed since 2013 by the same team, the Nextflow ecosystem is expanding rapidly.", + "_key": "6ab9733600034", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "05c032fad9dc", + "listItem": "number", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "It is ", + "_key": "3972b950fd750", + "_type": "span" + }, + { + "marks": [ + "strong" + ], + "text": "open source", + "_key": "3972b950fd751", + "_type": "span" + }, + { + "marks": [], + "text": " and licensed under Apache 2.0. You are free to use it, modify it and distribute it.", + "_key": "3972b950fd752", + "_type": "span" + } + ], + "level": 1 + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "9025af254599", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "f9862b4d5883", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "2. Introduction to Nextflow by 23andMe", + "_key": "7552d6d831ea", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "3cf0ff3526a1" + }, + { + "_key": "139ee2c52953", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "This informative post begins with the basic concepts of Nextflow and builds towards how Nextflow is used at 23andMe. 
It includes a detailed use case for how 23andMe run their imputation pipeline in the cloud, processing over 1 million individuals per day with over 10,000 CPUs in a single compute environment.", + "_key": "26259355226c" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "eebedda8320d" + } + ], + "_type": "block", + "style": "normal", + "_key": "e4fd20eb861e" + }, + { + "style": "normal", + "_key": "df3f4ba9eb3b", + "markDefs": [ + { + "href": "https://medium.com/23andme-engineering/introduction-to-nextflow-4d0e3b6768d1", + "_key": "ee98c9c7dd80", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "👉 ", + "_key": "edb911f9f012", + "_type": "span" + }, + { + "text": "Nextflow at 23andMe", + "_key": "3d639c2eb006", + "_type": "span", + "marks": [ + "ee98c9c7dd80" + ] + } + ], + "_type": "block" + }, + { + "_key": "ee1765865115", + "markDefs": [], + "children": [ + { + "_key": "0b2190962655", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "h2", + "_key": "6850303c248e", + "markDefs": [], + "children": [ + { + "_key": "40b29c7c6598", + "_type": "span", + "marks": [], + "text": "3. A simple RNA-Seq hands-on tutorial" + } + ] + }, + { + "_key": "9768cb7f549f", + "markDefs": [], + "children": [ + { + "_key": "4b115b943814", + "_type": "span", + "marks": [], + "text": "This hands-on tutorial from Seqera Labs will guide you through implementing a proof-of-concept RNA-seq pipeline. The goal is to become familiar with basic concepts, including how to define parameters, using channels to pass data around and writing processes to perform tasks. It includes all scripts, input data and resources and is perfect for getting a taste of Nextflow." + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "3bf721bae5aa", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "018ba0967d07" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "eee4b4d5634b", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/seqeralabs/nextflow-tutorial", + "_key": "1684a9d1da62" + } + ], + "children": [ + { + "marks": [], + "text": "👉 ", + "_key": "ad253e1b5e08", + "_type": "span" + }, + { + "_key": "c9965008eca4", + "_type": "span", + "marks": [ + "1684a9d1da62" + ], + "text": "Tutorial link on GitHub" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "86364c843fab", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "0de9f33aeb39", + "_type": "span" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "4. Nextflow workshop from Seqera Labs", + "_key": "27e2ce9e1377" + } + ], + "_type": "block", + "style": "h2", + "_key": "5ae94453a466", + "markDefs": [] + }, + { + "_key": "b931fb11db69", + "markDefs": [], + "children": [ + { + "_key": "35a6c29ef2a0", + "_type": "span", + "marks": [], + "text": "Here you’ll dive deeper into Nextflow’s most prominent features and learn how to apply them. The full workshop includes an excellent section on containers, how to build them and how to use them with Nextflow. The written materials come with examples and hands-on exercises. Optionally, you can also follow with a series of videos from a live training workshop." 
+ } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "be1116412b05", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "b6302c4064d7" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "a29dc2d7c622", + "markDefs": [], + "children": [ + { + "_key": "15fc70cd3777", + "_type": "span", + "marks": [], + "text": "The workshop includes topics on:" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "dbcfd4f30383", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Environment Setup", + "_key": "bb69e88f5e080", + "_type": "span", + "marks": [] + } + ], + "level": 1 + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Basic NF Script and Concepts", + "_key": "89eed1c80b690" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "994697413b8a" + }, + { + "children": [ + { + "text": "Nextflow Processes", + "_key": "3fa919edaf380", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "6877b4fc6ae2", + "listItem": "bullet", + "markDefs": [] + }, + { + "_key": "d6d3678bc256", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Nextflow Channels", + "_key": "5638081ae2d80", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "_key": "2a816049a0d0", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "b0eab01208d90", + "_type": "span", + "marks": [], + "text": "Nextflow Operators" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "23a6df5e78cf", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Basic RNA-Seq pipeline", + "_key": "6d6081749c4e0" + } + ], + "level": 1, + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Containers & Conda", + "_key": "a87f5692c9a80" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "b88001c2eb89", + "listItem": "bullet", + "markDefs": [] + }, + { + "style": "normal", + "_key": "3fa72c856705", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Nextflow Configuration", + "_key": "cf23eb5d7bc30", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "8582bfbd1efc0", + "_type": "span", + "marks": [], + "text": "On-premise & Cloud Deployment" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "92162c7f5c99", + "listItem": "bullet" + }, + { + "_key": "5ca0c704b72e", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "DSL 2 & Modules", + "_key": "451c295b6f560", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "aae6987ab5bf", + "listItem": "bullet", + "markDefs": [ + { + "_key": "01078b116cd0", + "_type": "link", + "href": "https://seqera.io/training/handson/?__hstc=247481240.9785ace5f350c27ee07ceee486f18692.1717578742769.1727454973713.1727685668449.81&__hssc=247481240.20.1727685668449&__hsfp=3485190257" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "01078b116cd0" + ], + "text": "GATK hands-on exercise", + "_key": "670716b1d0410" + } + ], + "level": 1 + }, + { + 
"_type": "block", + "style": "normal", + "_key": "16981d844a89", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "98091311e491" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "ba3249e7c838", + "markDefs": [ + { + "href": "https://seqera.io/training", + "_key": "c9757c8f64ae", + "_type": "link" + }, + { + "_type": "link", + "href": "https://www.youtube.com/playlist?list=PLPZ8WHdZGxmUv4W8ZRlmstkZwhb_fencI", + "_key": "2ab20bcfd321" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "👉 ", + "_key": "c67554e7feab" + }, + { + "_type": "span", + "marks": [ + "c9757c8f64ae" + ], + "text": "Workshop", + "_key": "255ebabf9479" + }, + { + "_type": "span", + "marks": [], + "text": " & ", + "_key": "84b96224ef2d" + }, + { + "_key": "f03c33b2a55a", + "_type": "span", + "marks": [ + "2ab20bcfd321" + ], + "text": "YouTube playlist" + }, + { + "marks": [], + "text": ".", + "_key": "8642a4e2efab", + "_type": "span" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "e14b8f54464c" + } + ], + "_type": "block", + "style": "normal", + "_key": "02777ec2ca1f", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "5. Software Carpentry workshop", + "_key": "8a1134ffa2f9", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "32e36925715d" + }, + { + "markDefs": [ + { + "_key": "dfa5e2b63ced", + "_type": "link", + "href": "https://carpentries-incubator.github.io/workflows-nextflow/index.html" + }, + { + "href": "https://nf-co.re/", + "_key": "1760364e04e8", + "_type": "link" + }, + { + "_type": "link", + "href": "https://carpentries.org/", + "_key": "3a6a228764f3" + } + ], + "children": [ + { + "marks": [], + "text": "The ", + "_key": "fada5d9ccc7c", + "_type": "span" + }, + { + "marks": [ + "dfa5e2b63ced" + ], + "text": "Nextflow Software Carpentry", + "_key": "7d5f308fb3ff", + "_type": "span" + }, + { + "_key": "6625dac02e37", + "_type": "span", + "marks": [], + "text": " workshop (in active development) motivates the use of Nextflow and " + }, + { + "_type": "span", + "marks": [ + "1760364e04e8" + ], + "text": "nf-core", + "_key": "7e272c7f4410" + }, + { + "marks": [], + "text": " as development tools for building and sharing reproducible data science workflows. The intended audience are those with little programming experience, and the course provides a foundation to comfortably write and run Nextflow and nf-core workflows. Adapted from the Seqera training material above, the workshop has been updated by Software Carpentries instructors within the nf-core community to fit ", + "_key": "471eefc1cee2", + "_type": "span" + }, + { + "_key": "3c759b38b43c", + "_type": "span", + "marks": [ + "3a6a228764f3" + ], + "text": "The Carpentries" + }, + { + "_type": "span", + "marks": [], + "text": " style of training. The Carpentries emphasize feedback to improve teaching materials so we would like to hear back from you about what you thought was both well-explained and what needs improvement. 
Pull requests to the course material are very welcome.", + "_key": "b3aa11ccb06d" + } + ], + "_type": "block", + "style": "normal", + "_key": "48f909ba8fa5" + }, + { + "style": "normal", + "_key": "047d7830acde", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "941749b7d4c8", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "The workshop can be opened on ", + "_key": "99c56c9cd477", + "_type": "span", + "marks": [] + }, + { + "text": "Gitpod", + "_key": "3e8dd007dd49", + "_type": "span", + "marks": [ + "cafc2ae90125" + ] + }, + { + "_key": "3c57c93d8c59", + "_type": "span", + "marks": [], + "text": " where you can try the exercises in an online computing environment at your own pace, with the course material in another window alongside." + } + ], + "_type": "block", + "style": "normal", + "_key": "d0637a644a49", + "markDefs": [ + { + "_type": "link", + "href": "https://gitpod.io/#https://github.com/carpentries-incubator/workflows-nextflow", + "_key": "cafc2ae90125" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "d2fb089056a6" + } + ], + "_type": "block", + "style": "normal", + "_key": "9bc86bc0dd20" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://carpentries-incubator.github.io/workflows-nextflow/index.html", + "_key": "2c7c185f7268" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "👉 You can find the course in ", + "_key": "fca854600a20" + }, + { + "marks": [ + "2c7c185f7268" + ], + "text": "The Carpentries incubator", + "_key": "1e3355d4210e", + "_type": "span" + }, + { + "_key": "34bbcae5c781", + "_type": "span", + "marks": [], + "text": "." + } + ], + "_type": "block", + "style": "normal", + "_key": "8a4050ef3894" + }, + { + "_key": "cd267c22eda8", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "fbfaef0e0533" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "h2", + "_key": "f4a5c8d14268", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "6. Managing Pipelines in the Cloud - GenomeWeb Webinar", + "_key": "d9d0d8458340", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "ee7642b0ab97", + "markDefs": [], + "children": [ + { + "_key": "3483c0cae28f", + "_type": "span", + "marks": [], + "text": "This on-demand webinar features Phil Ewels from SciLifeLab and nf-core, Brendan Boufler from Amazon Web Services and Evan Floden from Seqera Labs. The wide ranging dicussion covers the significance of scientific workflow, examples of Nextflow in production settings and how Nextflow can be integrated with other processes." 
+ } + ] + }, + { + "style": "normal", + "_key": "7faf95cedd28", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "770a1bb85d45" + } + ], + "_type": "block" + }, + { + "_key": "f6857d75f0f6", + "markDefs": [ + { + "_type": "link", + "href": "https://seqera.io/webinars-and-podcasts/managing-bioinformatics-pipelines-in-the-cloud-to-do-more-science/", + "_key": "ad144f3ff954" + } + ], + "children": [ + { + "text": "👉 ", + "_key": "7cd47f1fb986", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "ad144f3ff954" + ], + "text": "Watch the webinar", + "_key": "8f63652f7c6c", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "h2", + "_key": "e514468f4ecf", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "9b37a6cfc899" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "7. Nextflow implementation patterns", + "_key": "181dc00ec033" + } + ], + "_type": "block", + "style": "h2", + "_key": "e086d1d59ff3" + }, + { + "_key": "766587c3a173", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "This advanced section discusses recurring patterns and solutions to many common implementation requirements. Code examples are available with notes to follow along, as well as a GitHub repository.", + "_key": "c8cb222a1037", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "5cad4343667f", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "ede732572d32" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "d1c45c59a220", + "markDefs": [ + { + "_type": "link", + "href": "http://nextflow-io.github.io/patterns/index.html", + "_key": "97faf136cd7e" + }, + { + "_type": "link", + "href": "https://github.com/nextflow-io/patterns", + "_key": "e8c0a363f94f" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "👉 ", + "_key": "12120a953152" + }, + { + "_key": "a141c05ec948", + "_type": "span", + "marks": [ + "97faf136cd7e" + ], + "text": "Nextflow Patterns" + }, + { + "_type": "span", + "marks": [], + "text": " & ", + "_key": "2eae736c99ec" + }, + { + "text": "GitHub repository", + "_key": "4f81ef45724d", + "_type": "span", + "marks": [ + "e8c0a363f94f" + ] + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "1b9836a6b73b" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "bea3cd84fd84", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "b30bd87a0093" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "8. nf-core tutorials", + "_key": "edf3271a5b9d", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "2838cce5e2a9" + }, + { + "_type": "block", + "style": "normal", + "_key": "4517e3e93d57", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "A tutorial covering the basics of using and creating nf-core pipelines. 
It provides an overview of the nf-core framework including:", + "_key": "448cdb4660cc", + "_type": "span" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "How to run nf-core pipelines", + "_key": "9ce3c25acd7c0", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "8a8b627f0ef2", + "listItem": "bullet" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "What are the most commonly used nf-core tools", + "_key": "15a4325e2b150" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "38898cdce7fe", + "listItem": "bullet", + "markDefs": [] + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "How to make new pipelines using the nf-core template", + "_key": "230ee6be8a4e0" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "c89aeee07cf0" + }, + { + "_key": "590d7b2bc8cb", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "What are nf-core shared modules", + "_key": "44d3f22682f00", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_key": "ed7d85e6fffb0", + "_type": "span", + "marks": [], + "text": "How to add nf-core shared modules to a pipeline" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "0eb8b342de8e", + "listItem": "bullet" + }, + { + "_key": "904f16cc8187", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "How to make new nf-core modules using the nf-core module template", + "_key": "65885d3779cb0" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "536278bc8e8b", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "How nf-core pipelines are reviewed and ultimately released", + "_key": "5378a560d3bd0", + "_type": "span" + } + ], + "level": 1 + }, + { + "_type": "block", + "style": "normal", + "_key": "f72883dd5825", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "f1c75264bb48", + "_type": "span", + "marks": [] + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "👉 ", + "_key": "5642be3e2582" + }, + { + "_key": "a3a383639ad7", + "_type": "span", + "marks": [ + "a672056b7c5c" + ], + "text": "nf-core usage tutorials" + }, + { + "_key": "41291636a32e", + "_type": "span", + "marks": [], + "text": " and " + }, + { + "_key": "3a043f42f65c", + "_type": "span", + "marks": [ + "9a92929770b3" + ], + "text": "nf-core developer tutorials" + } + ], + "_type": "block", + "style": "normal", + "_key": "ef17292de4f0", + "markDefs": [ + { + "_type": "link", + "href": "https://nf-co.re/usage/usage_tutorials", + "_key": "a672056b7c5c" + }, + { + "_type": "link", + "href": "https://nf-co.re/developers/developer_tutorials", + "_key": "9a92929770b3" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "163422611805", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "a56d2d3a8b55", + "_type": "span", + "marks": [] + } + ] + }, + { + "_type": "block", + "style": "h2", + "_key": "610c06fa5910", + "markDefs": [], + "children": [ + { + "text": "9. 
Awesome Nextflow", + "_key": "47cf291fb997", + "_type": "span", + "marks": [] + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "A collections of awesome Nextflow pipelines.", + "_key": "3ac77fc3de55" + } + ], + "_type": "block", + "style": "normal", + "_key": "77634d7c61a8", + "markDefs": [] + }, + { + "_key": "819a387fa98a", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "c190715ddfec" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nextflow-io/awesome-nextflow", + "_key": "5be9ab56b853" + } + ], + "children": [ + { + "text": "👉 ", + "_key": "860b74efea97", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "5be9ab56b853" + ], + "text": "Awesome Nextflow", + "_key": "c89c79604d62" + }, + { + "_key": "9b3276815ab2", + "_type": "span", + "marks": [], + "text": " on GitHub" + } + ], + "_type": "block", + "style": "normal", + "_key": "f6a41b2ab326" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "d667dc22b83b" + } + ], + "_type": "block", + "style": "normal", + "_key": "fcc2fa9bd423" + }, + { + "markDefs": [], + "children": [ + { + "_key": "2b21cbbb5d79", + "_type": "span", + "marks": [], + "text": "10. Further resources" + } + ], + "_type": "block", + "style": "h2", + "_key": "52d5616f599b" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The following resources will help you dig deeper into Nextflow and other related projects like the nf-core community who maintain curated pipelines and a very active Slack channel. There are plenty of Nextflow tutorials and videos online, and the following list is no way exhaustive. Please let us know if we are missing anything.", + "_key": "987ce58bf0f0" + } + ], + "_type": "block", + "style": "normal", + "_key": "39bf2e9c1105" + }, + { + "_type": "block", + "style": "normal", + "_key": "6445a22b5f05", + "markDefs": [], + "children": [ + { + "_key": "d0398bc3cf37", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "\nNextflow docs", + "_key": "0d0c0f3987b1" + } + ], + "_type": "block", + "style": "h3", + "_key": "c0fdfc189100" + }, + { + "children": [ + { + "_key": "c75f94d4d292", + "_type": "span", + "marks": [], + "text": "The reference for the Nextflow language and runtime. These docs should be your first point of reference while developing Nextflow pipelines. The newest features are documented in edge documentation pages released every month with the latest stable releases every three months." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "c845d42c9daa", + "markDefs": [] + }, + { + "style": "normal", + "_key": "e1dd75c1a090", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "584914ef5475", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "aaf30c65766d", + "markDefs": [ + { + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/index.html", + "_key": "ba37dc0a6368" + }, + { + "_type": "link", + "href": "https://www.nextflow.io/docs/edge/index.html", + "_key": "665ccffb3da6" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "👉 Latest ", + "_key": "e74c4e2712d5" + }, + { + "_type": "span", + "marks": [ + "ba37dc0a6368" + ], + "text": "stable", + "_key": "7364a67bf551" + }, + { + "marks": [], + "text": " & ", + "_key": "11c87bc10b4a", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "665ccffb3da6" + ], + "text": "edge", + "_key": "5acb40efd0fb" + }, + { + "_type": "span", + "marks": [], + "text": " documentation.", + "_key": "5b61710faf86" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "80fac716a43e", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "40ea79c50a29" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "\nSeqera Labs docs", + "_key": "2f067c0fee01" + } + ], + "_type": "block", + "style": "h3", + "_key": "c5fee38a4d87", + "markDefs": [] + }, + { + "_key": "15e3c3d6a4ce", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "An index of documentation, deployment guides, training materials and resources for all things Nextflow and Tower.", + "_key": "e7205784453b", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "49af5234bf65", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "0ad3c800e450", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "👉 ", + "_key": "9f53899e4ffd" + }, + { + "_type": "span", + "marks": [ + "95926cbb83df" + ], + "text": "Seqera Labs docs", + "_key": "08cd09721f2a" + } + ], + "_type": "block", + "style": "normal", + "_key": "5b2133e5f5c3", + "markDefs": [ + { + "_type": "link", + "href": "https://seqera.io/docs", + "_key": "95926cbb83df" + } + ] + }, + { + "_key": "653881895670", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "b0394ac6c12f" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "h3", + "_key": "3756f0d47ff8", + "markDefs": [], + "children": [ + { + "text": "\nnf-core", + "_key": "e942fa3c5a18", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "fad042f49d78", + "markDefs": [], + "children": [ + { + "text": "nf-core is a growing community of Nextflow users and developers. You can find curated sets of biomedical analysis pipelines written in Nextflow and built by domain experts. Each pipeline is stringently reviewed and has been implemented according to best practice guidelines. 
Be sure to sign up to the Slack channel.", + "_key": "ff1616c0b03d", + "_type": "span", + "marks": [] + } + ] + }, + { + "_key": "6e724f1b05f1", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "c8b92638cd83", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://nf-co.re", + "_key": "66277775b867" + }, + { + "_key": "9a9421ed709c", + "_type": "link", + "href": "https://nf-co.re/join" + } + ], + "children": [ + { + "_key": "e80548bccdd7", + "_type": "span", + "marks": [], + "text": "👉 " + }, + { + "text": "nf-core website", + "_key": "f5cb3f49dd12", + "_type": "span", + "marks": [ + "66277775b867" + ] + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "0d059543944b" + }, + { + "text": "nf-core Slack", + "_key": "c0a0996ecc63", + "_type": "span", + "marks": [ + "9a9421ed709c" + ] + } + ], + "_type": "block", + "style": "normal", + "_key": "ebb34a9d1881" + }, + { + "style": "normal", + "_key": "62785245a4d0", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "275da84c6f62", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "\nNextflow Tower", + "_key": "9482b120e1b1" + } + ], + "_type": "block", + "style": "h3", + "_key": "afed5ecd4874", + "markDefs": [] + }, + { + "_key": "a16c68cf2559", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow Tower is a platform to easily monitor, launch and scale Nextflow pipelines on cloud providers and on-premise infrastructure. The documentation provides details on setting up compute environments, monitoring pipelines and launching using either the web graphic interface, CLI or API.", + "_key": "a39cd4243f40" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "425185bd8be7", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "684355b941bd", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "_key": "dcd51303b4f2", + "_type": "link", + "href": "https://tower.nf" + }, + { + "_type": "link", + "href": "http://help.tower.nf", + "_key": "dc758ce7a535" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "👉 ", + "_key": "4906a8dbe2fb" + }, + { + "_key": "c13c27680f1c", + "_type": "span", + "marks": [ + "dcd51303b4f2" + ], + "text": "Nextflow Tower" + }, + { + "_key": "da4cc6b020c8", + "_type": "span", + "marks": [], + "text": " and " + }, + { + "text": "user documentation", + "_key": "649502d051c5", + "_type": "span", + "marks": [ + "dc758ce7a535" + ] + }, + { + "_key": "8d543e3ae1c5", + "_type": "span", + "marks": [], + "text": "." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "85d48432f4ad" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "52e711736200", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "fa51176d1868" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "\nNextflow Biotech Blueprint by AWS", + "_key": "d8554d7a88fd" + } + ], + "_type": "block", + "style": "h3", + "_key": "c1032f785860" + }, + { + "_key": "8fff7182417e", + "markDefs": [], + "children": [ + { + "_key": "8f4d9b0e83ae", + "_type": "span", + "marks": [], + "text": "A quickstart for deploying a genomics analysis environment on Amazon Web Services (AWS) cloud, using Nextflow to create and orchestrate analysis workflows and AWS Batch to run the workflow processes." + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "97562df2c0d2", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "8fdb6289dc7d", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "marks": [], + "text": "👉 ", + "_key": "82a030c7fda2", + "_type": "span" + }, + { + "text": "Biotech Blueprint by AWS", + "_key": "b5e7768ccf0e", + "_type": "span", + "marks": [ + "3b91e618daab" + ] + } + ], + "_type": "block", + "style": "normal", + "_key": "e42b7fe143ba", + "markDefs": [ + { + "_key": "3b91e618daab", + "_type": "link", + "href": "https://aws.amazon.com/quickstart/biotech-blueprint/nextflow/" + } + ] + }, + { + "_key": "eda85e1f7ef8", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "2254fe8e8647" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "text": "\nNextflow Data Pipelines on Azure Batch", + "_key": "8b1c0246f1b5", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h3", + "_key": "c70b87116002" + }, + { + "children": [ + { + "_key": "719192686a36", + "_type": "span", + "marks": [], + "text": "Nextflow on Azure requires at minimum two Azure services, Azure Batch and Azure Storage. Follow the guides below to set up both services on Azure, and to get your storage and batch account names and keys." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "2497b685cabd", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_key": "f7bf8339db77", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "927addfab689" + }, + { + "style": "normal", + "_key": "0e11a4498c0a", + "markDefs": [ + { + "href": "https://techcommunity.microsoft.com/t5/azure-compute-blog/running-nextflow-data-pipelines-on-azure-batch/ba-p/2150383", + "_key": "997c1e7faa62", + "_type": "link" + }, + { + "href": "https://github.com/microsoft/Genomics-Quickstart/blob/main/03-Nextflow-Azure/README.md", + "_key": "34698d870937", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "👉 ", + "_key": "1dc60af2db62", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "997c1e7faa62" + ], + "text": "Azure Blog", + "_key": "e7055ae38185" + }, + { + "text": " and ", + "_key": "456737a25ed5", + "_type": "span", + "marks": [] + }, + { + "text": "GitHub repository", + "_key": "8b46277edb76", + "_type": "span", + "marks": [ + "34698d870937" + ] + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "bc2847725bb2" + } + ], + "_type": "block" + }, + { + "_key": "e4c560f35a6e", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "8fd4ccdb484c", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "h3", + "_key": "ed9ac5fd1023", + "markDefs": [], + "children": [ + { + "text": "\nRunning Nextflow by Google Cloud", + "_key": "c8057e5ccdfc", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "a7c1203fbe34", + "_type": "span", + "marks": [], + "text": "A step-by-step guide to launching Nextflow Pipelines in Google Cloud." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "ddb141d2ea06" + }, + { + "style": "normal", + "_key": "076988864de6", + "markDefs": [], + "children": [ + { + "_key": "3d63c32304df", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "0a05f1c2dc9a", + "markDefs": [ + { + "_key": "64ffaca73f7a", + "_type": "link", + "href": "https://cloud.google.com/life-sciences/docs/tutorials/nextflow" + } + ], + "children": [ + { + "text": "👉 ", + "_key": "f77bac9d8e8e", + "_type": "span", + "marks": [] + }, + { + "_key": "de0ed8f6b94e", + "_type": "span", + "marks": [ + "64ffaca73f7a" + ], + "text": "Nextflow on Google Cloud" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "49dddfa8b020", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "135361554f03", + "_type": "span", + "marks": [] + } + ] + }, + { + "style": "h3", + "_key": "da12d828f008", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "\nBonus: Nextflow Tutorial - Variant Calling Edition", + "_key": "fe1926d59d48", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "81e30c1820da", + "_type": "span", + "marks": [], + "text": "This " + }, + { + "_type": "span", + "marks": [ + "4cbfc4c188fd" + ], + "text": "Nextflow Tutorial - Variant Calling Edition", + "_key": "0b92dfdecb13" + }, + { + "_type": "span", + "marks": [], + "text": " has been adapted from the ", + "_key": "70606a6936fa" + }, + { + "_type": "span", + "marks": [ + "0bb8547545ef" + ], + "text": "Nextflow Software Carpentry training material", + "_key": "9a260a299f8d" + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "5a2252d75ac2" + }, + { + "_key": "001ca02345ef", + "_type": "span", + "marks": [ + "3bce3f4f8dfe" + ], + "text": "Data Carpentry: Wrangling Genomics Lesson" + }, + { + "marks": [], + "text": ". 
Learners will have the chance to learn Nextflow and nf-core basics, to convert a variant-calling bash-script into a Nextflow workflow and to modularize the pipeline using DSL2 modules and sub-workflows.", + "_key": "38dac3d70043", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "899dbf7da439", + "markDefs": [ + { + "_type": "link", + "href": "https://sateeshperi.github.io/nextflow_varcal/nextflow/", + "_key": "4cbfc4c188fd" + }, + { + "href": "https://carpentries-incubator.github.io/workflows-nextflow/index.html", + "_key": "0bb8547545ef", + "_type": "link" + }, + { + "_key": "3bce3f4f8dfe", + "_type": "link", + "href": "https://datacarpentry.org/wrangling-genomics/" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "c6a90b307e55" + } + ], + "_type": "block", + "style": "normal", + "_key": "94c982dc8d84", + "markDefs": [] + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://gitpod.io/#https://github.com/sateeshperi/nextflow_tutorial.git", + "_key": "0e52da477db1" + } + ], + "children": [ + { + "text": "The workshop can be opened on ", + "_key": "e65cfa1e2fe5", + "_type": "span", + "marks": [] + }, + { + "text": "Gitpod", + "_key": "dfb97d9fc7c8", + "_type": "span", + "marks": [ + "0e52da477db1" + ] + }, + { + "_type": "span", + "marks": [], + "text": " where you can try the exercises in an online computing environment at your own pace, with the course material in another window alongside.", + "_key": "4ab8053c3848" + } + ], + "_type": "block", + "style": "normal", + "_key": "2c74f4bf2f9c" + }, + { + "style": "normal", + "_key": "21ad4696ee87", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "4c82574b1116", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "a0260f36ca71", + "markDefs": [ + { + "_key": "2ec2725e6c2b", + "_type": "link", + "href": "https://sateeshperi.github.io/nextflow_varcal/nextflow/" + } + ], + "children": [ + { + "marks": [], + "text": "👉 You can find the course in ", + "_key": "412e724ec707", + "_type": "span" + }, + { + "_key": "c191095a7e22", + "_type": "span", + "marks": [ + "2ec2725e6c2b" + ], + "text": "Nextflow Tutorial - Variant Calling Edition" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "6e7325cbbc7e" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "bfd6e4bbf7b7", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "ba93cdf5cb08", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "h2", + "_key": "5d4e17180b1e", + "markDefs": [], + "children": [ + { + "_key": "c528c6e1b4c4", + "_type": "span", + "marks": [], + "text": "Community and support" + } + ] + }, + { + "_key": "d07e99db5398", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://gitter.im/nextflow-io/nextflow", + "_key": "de0eee6a39b2" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow ", + "_key": "cd8adc19bf350" + }, + { + "marks": [ + "de0eee6a39b2" + ], + "text": "Gitter channel", + "_key": "cd8adc19bf351", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "_key": "adcd2c24e5d4", + "listItem": "bullet", + "markDefs": [ + { + "href": "https://groups.google.com/forum/#!forum/nextflow", + "_key": "c81f376ad63d", + "_type": "link" + } + ], + "children": [ + { + "text": "Nextflow ", + "_key": "4c6af55933d70", + "_type": "span", + "marks": [] + }, + { 
+ "marks": [ + "c81f376ad63d" + ], + "text": "Forums", + "_key": "4c6af55933d71", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "_key": "5967a0b92cf3", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://twitter.com/nextflowio?lang=en", + "_key": "a39fa3a86eae" + } + ], + "children": [ + { + "text": "Nextflow Twitter ", + "_key": "33fb1bb48a0d0", + "_type": "span", + "marks": [] + }, + { + "_key": "33fb1bb48a0d1", + "_type": "span", + "marks": [ + "a39fa3a86eae" + ], + "text": "@nextflowio" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "744007dc150c", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://nfcore.slack.com/", + "_key": "888176b40e43" + } + ], + "children": [ + { + "marks": [ + "888176b40e43" + ], + "text": "nf-core Slack", + "_key": "0fbde57c12450", + "_type": "span" + } + ], + "level": 1 + }, + { + "listItem": "bullet", + "markDefs": [ + { + "_key": "63ac41fd22fd", + "_type": "link", + "href": "https://www.seqera.io/?__hstc=247481240.9785ace5f350c27ee07ceee486f18692.1717578742769.1727454973713.1727685668449.81&__hssc=247481240.20.1727685668449&__hsfp=3485190257" + }, + { + "_type": "link", + "href": "https://tower.nf/?__hstc=247481240.9785ace5f350c27ee07ceee486f18692.1717578742769.1727454973713.1727685668449.81&__hssc=247481240.20.1727685668449&__hsfp=3485190257", + "_key": "c4b5605e4ecf" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "63ac41fd22fd" + ], + "text": "Seqera Labs", + "_key": "49be1eb302f90" + }, + { + "text": " and ", + "_key": "49be1eb302f91", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "c4b5605e4ecf" + ], + "text": "Nextflow Tower", + "_key": "49be1eb302f92", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "7dd4caade288" + }, + { + "_type": "block", + "style": "normal", + "_key": "8fed5491a450", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "1d701a299e40", + "_type": "span", + "marks": [] + } + ] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Credits", + "_key": "fa2ade8307b8", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "321eaf51d9f1" + }, + { + "_type": "block", + "style": "normal", + "_key": "579d77064df2", + "markDefs": [], + "children": [ + { + "_key": "5f8ed04593f0", + "_type": "span", + "marks": [], + "text": "Special thanks to Mahesh Binzer-Panchal for reviewing the latest revision of this post and contributing the Software Carpentry workshop section." + } + ] + } + ] + }, + { + "_id": "18740170e04d", + "title": "Bringing Nextflow to Google Cloud Platform with WuXi NextCODE", + "_updatedAt": "2024-09-26T09:01:52Z", + "_rev": "mvya9zzDXWakVjnX4hhZBC", + "publishedAt": "2018-12-18T07:00:00.000Z", + "_createdAt": "2024-09-25T14:15:24Z", + "body": [ + { + "_type": "block", + "_key": "66ed39877363" + }, + { + "_type": "block", + "style": "normal", + "_key": "8ee0452c6463", + "markDefs": [], + "children": [ + { + "_key": "e36636ba3a31", + "_type": "span", + "marks": [], + "text": "Google Cloud and WuXi NextCODE are dedicated to advancing the state of the art in biomedical informatics, especially through open source, which allows developers to collaborate broadly and deeply." 
+ } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "47c7e48c3ccf", + "children": [ + { + "_key": "149ed2ad6309", + "_type": "span", + "text": "" + } + ] + }, + { + "_key": "b9ef5ebb3bd0", + "markDefs": [ + { + "href": "https://cloud.google.com/genomics/pipelines", + "_key": "e7ce80cdc9cf", + "_type": "link" + } + ], + "children": [ + { + "text": "WuXi NextCODE is itself a user of Nextflow, and Google Cloud has many customers that use Nextflow. Together, we’ve collaborated to deliver Google Cloud Platform (GCP) support for Nextflow using the ", + "_key": "f08cffdfa0e5", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "e7ce80cdc9cf" + ], + "text": "Google Pipelines API", + "_key": "1864c67ab899" + }, + { + "text": ". Pipelines API is a managed computing service that allows the execution of containerized workloads on GCP.", + "_key": "737e796ea0b2", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "aab078c4ac2a", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "e164e3272470" + }, + { + "asset": { + "_ref": "image-82d7c6eac71557e87271761a03e409b036ab4b4b-181x28-svg", + "_type": "reference" + }, + "_type": "image", + "alt": "", + "_key": "4ca231946908" + }, + { + "_key": "d5261db26df4", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": ".row:after { content: ""; display: table; clear: both; } ", + "_key": "e79138cff33d" + }, + { + "_type": "span", + "text": "", + "_key": "8c99bf933fdb" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "f48b498b01a9", + "children": [ + { + "_type": "span", + "text": "", + "_key": "bf2b049af76a" + } + ] + }, + { + "_key": "c954fbe7f126", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow now provides built-in support for Google Pipelines API which allows the seamless deployment of a Nextflow pipeline in the cloud, offloading the process executions as pipelines running on Google's scalable infrastructure with a few commands. This makes it even easier for customers and partners like WuXi NextCODE to process biomedical data using Google Cloud.", + "_key": "d33e09b96126" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "17404c3264c5", + "children": [ + { + "text": "", + "_key": "2a9365b134e2", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "h3", + "_key": "54b6bf5701b6", + "children": [ + { + "text": "Get started!", + "_key": "3be229258825", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "0304286c8b13", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "This feature is currently available in the Nextflow edge channel. 
Follow these steps to get started:", + "_key": "ae023b45b354" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "42ca07156479" + } + ], + "_type": "block", + "style": "normal", + "_key": "8c3185c3bbef" + }, + { + "listItem": "bullet", + "children": [ + { + "_type": "span", + "marks": [], + "text": "Install Nextflow from the edge channel exporting the variables shown below and then running the usual Nextflow installer Bash snippet:", + "_key": "441262cf359f" + }, + { + "_type": "span", + "text": "\n\n", + "_key": "340506dcd7e3" + }, + { + "_type": "span", + "text": "```\nexport NXF_VER=18.12.0-edge\nexport NXF_MODE=google\ncurl https://get.nextflow.io | bash\n```\n", + "_key": "f81d87459566" + }, + { + "_type": "span", + "marks": [], + "text": "[Enable the Google Genomics API for your GCP projects](https://console.cloud.google.com/flows/enableapi?apiid=genomics.googleapis.com,compute.googleapis.com,storage-api.googleapis.com).", + "_key": "953450eb4cfb" + }, + { + "text": "[Download and set credentials for your Genomics API-enabled project](https://cloud.google.com/docs/authentication/production#obtaining_and_providing_service_account_credentials_manually).", + "_key": "3fbb27135516", + "_type": "span", + "marks": [] + }, + { + "marks": [], + "text": "Change your `nextflow.config` file to use the Google Pipelines executor and specify the required config values for it as [described in the documentation](/docs/edge/google.html#google-pipelines).", + "_key": "b0b93e84a56d", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": "Finally, run your script with Nextflow like usual, specifying a Google Storage bucket as the pipeline work directory with the `-work-dir` option. For example:", + "_key": "83490ef8230a" + }, + { + "_type": "span", + "text": "\n\n", + "_key": "adf82e577839" + }, + { + "_key": "c21b4448dd6b", + "_type": "span", + "text": "```\nnextflow run rnaseq-nf -work-dir gs://your-bucket/scratch\n```" + } + ], + "_type": "block", + "style": "normal", + "_key": "c26bd9413310" + }, + { + "style": "normal", + "_key": "19bbfa6ce38a", + "children": [ + { + "text": "", + "_key": "b3abf79ca239", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "_key": "da6244905916" + }, + { + "_key": "b46978f58895", + "markDefs": [], + "children": [ + { + "text": "We’re thrilled to make this contribution available to the Nextflow community!", + "_key": "631a091be65d", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + } + ], + "author": { + "_ref": "paolo-di-tommaso", + "_type": "reference" + }, + "meta": { + "slug": { + "current": "bringing-nextflow-to-google-cloud-wuxinextcode" + } + }, + "_type": "blogPost", + "tags": [ + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "3ab093d5a797" + }, + { + "_ref": "7d9ffad4-385c-433d-a409-d0bfacc3c2fe", + "_type": "reference", + "_key": "1678d0cbf36c" + } + ] + }, + { + "_createdAt": "2024-09-25T14:18:51Z", + "_rev": "hf9hwMPb7ybAE3bqEU5jMS", + "_id": "1c09b3c67c32", + "title": "Join us in welcoming the new Nextflow Ambassadors", + "publishedAt": "2024-07-10T06:00:00.000Z", + "_updatedAt": "2024-09-27T08:53:50Z", + "_type": "blogPost", + "author": { + "_ref": "mNsm4Vx1W1Wy6aYYkroetD", + "_type": "reference" + }, + "body": [ + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "As the second semester of 2024 kicks off, I am thrilled to welcome a new cohort of ambassadors to the Nextflow 
Ambassador Program. This vibrant group joins the dedicated ambassadors who are continuing their remarkable work from the previous semester. Together, they form a diverse and talented team, representing a variety of countries and backgrounds, encompassing both industry and academia.", + "_key": "7b7ead3b161a" + } + ], + "_type": "block", + "style": "normal", + "_key": "0ac8c6674b95", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "755301651240", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "eeaf046ecc6f" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "91da4218fab9", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "158f6d803cf9" + }, + { + "_key": "108331865d76", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "A diverse and inclusive cohort", + "_key": "93fe59350dc3" + } + ], + "_type": "block", + "style": "h2" + }, + { + "markDefs": [], + "children": [ + { + "_key": "ce1bab06fae6", + "_type": "span", + "marks": [], + "text": "This semester, I am proud to announce that our ambassadors hail from over 20 countries, reflecting the increasingly global reach and inclusive nature of the Nextflow community. There has historically been a strong presence of Nextflow in the US and Europe, so I would like to extend an especially warm welcome to all those in Asia and the global south who are joining us through the program, from countries such as Argentina, Chile, Brazil, Ghana, Tunisia, Nigeria, South Africa, India, Indonesia, Singapore, and Australia. From seasoned bioinformaticians to emerging data scientists, our ambassadors bring a wealth of expertise and unique perspectives to the program." + } + ], + "_type": "block", + "style": "normal", + "_key": "c49bd09bc153" + }, + { + "markDefs": [], + "children": [ + { + "_key": "54a4ca5874b4", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "c23a0fef506f" + }, + { + "style": "h2", + "_key": "40719aff37ab", + "markDefs": [], + "children": [ + { + "text": "Industry and academia unite", + "_key": "ef0f41d63330", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "78862f9ff86c", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "One of the strengths of the Nextflow Ambassador Program is its ability to bridge the gap between industry and academia. This semester, we have an exciting mix of professionals from biotech companies, renowned research institutions, and leading universities. This synergy fosters a rich exchange of ideas, driving innovation and collaboration.", + "_key": "6ab2e9763e9a" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "e415a161e26b" + } + ], + "_type": "block", + "style": "normal", + "_key": "16cb2b6a3dab", + "markDefs": [] + }, + { + "_type": "block", + "style": "h2", + "_key": "f864c0eacbb3", + "markDefs": [], + "children": [ + { + "_key": "8fe428087f44", + "_type": "span", + "marks": [], + "text": "Spotlight on new Ambassadors" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "0261748d906f", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "I am particularly happy with this last call for ambassadors. 
Amazing people were selected, and I would like to highlight a few, though all of them are excellent additions to the team! For example, while Carson Miller, a PhD Candidate in the Department of Microbiology at the University of Washington, is new to the ambassador program, he has been making impactful contributions to the community for a long time. He hosted a local site for the nf-core Hackathon back in March, wrote a post for the Nextflow blog, and has been very active in the nf-core community. The same can be said about Mahesh Binzer-Panchal, a Bioinformatician at NBIS, who has been very active in the community answering technical questions about Nextflow.",
            "_key": "2cdd76f3e2a9",
            "_type": "span"
          }
        ]
      },
      {
        "markDefs": [],
        "children": [
          {
            "_type": "span",
            "marks": [],
            "text": "",
            "_key": "d86865454f05"
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "f71378862d93"
      },
      {
        "_type": "block",
        "style": "normal",
        "_key": "5ff0d9ece480",
        "markDefs": [],
        "children": [
          {
            "marks": [],
            "text": "The previous round of ambassadors allowed us to achieve a broad global presence. However, some regions were more represented than others. I am especially thrilled to have new ambassadors in new regions of the globe, such as Fadinda Shafira and Edwin Simjaya from Indonesia, AI Engineer and Head of AI at Kalbe, respectively. Prior to joining the program, they had already been strong advocates for Nextflow in Indonesia and had conducted Nextflow training sessions!",
            "_key": "02b4f7eeb329",
            "_type": "span"
          }
        ]
      },
      {
        "_key": "da1dc7fc75c5",
        "markDefs": [],
        "children": [
          {
            "_key": "501afc222b5e",
            "_type": "span",
            "marks": [],
            "text": ""
          }
        ],
        "_type": "block",
        "style": "normal"
      },
      {
        "markDefs": [],
        "children": [
          {
            "_key": "fd3341b61b53",
            "_type": "span",
            "marks": [],
            "text": "Continuing the good work"
          }
        ],
        "_type": "block",
        "style": "h2",
        "_key": "c668a570e5e0"
      },
      {
        "_key": "a47b257bc8aa",
        "markDefs": [
          {
            "_key": "1e383eb70892",
            "_type": "link",
            "href": "https://www.nextflow.io/blog/2024/bioinformatics-growth-in-turkiye.html"
          },
          {
            "_type": "link",
            "href": "https://www.nextflow.io/blog/2024/training-local-site.html",
            "_key": "f79d172cc706"
          }
        ],
        "children": [
          {
            "text": "I'm also delighted to see the continuing work of several dedicated ambassadors who have made significant contributions to the program. Abhinav Sharma, a Ph.D. Candidate at Stellenbosch University in South Africa, has been a key community contact on the African continent, and with the support we were able to provide him through the program, he was able to travel around Brazil and visit multiple research groups to advocate for Open Science, Nextflow, and nf-core. 
Similarly, Kübra Narcı, a bioinformatician at DKFZ in Germany, increased the awareness of ", + "_key": "c585df6b778a", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "1e383eb70892" + ], + "text": "Nextflow in her home country, Türkiye", + "_key": "645a4950fdb4" + }, + { + "_type": "span", + "marks": [], + "text": ", while also contributing to the ", + "_key": "bffaf86c5b79" + }, + { + "_type": "span", + "marks": [ + "f79d172cc706" + ], + "text": "German research community", + "_key": "179f8a41d874" + }, + { + "marks": [], + "text": ".", + "_key": "ffcf7e59ccc3", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "eec6971dfa47", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "7b02658110ce" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "1bee5e78b2bf", + "_type": "span", + "marks": [], + "text": "The program has been shown to welcome a variety of backgrounds and both new and long-time community members. Just last year, Anabella Trigila, a Senior Bioinformatician at ZS in Argentina, was a mentee in the Nextflow and nf-core mentorship program and has quickly become a " + }, + { + "_type": "span", + "marks": [ + "0a39387f294c" + ], + "text": "key member in Latin America", + "_key": "c0fc82dc78f0" + }, + { + "_type": "span", + "marks": [], + "text": ". Robert Petit, a Bioinformatician at the Wyoming Public Health Laboratory in the US, meanwhile, has been ", + "_key": "3ee1f9a09536" + }, + { + "text": "a contributor for many years", + "_key": "8f45d35598a3", + "_type": "span", + "marks": [ + "790c1d9abbd5" + ] + }, + { + "_key": "7ea5b7bf28b9", + "_type": "span", + "marks": [], + "text": " and keeps giving back to the community." + } + ], + "_type": "block", + "style": "normal", + "_key": "48fe3b6602b7", + "markDefs": [ + { + "_type": "link", + "href": "https://www.nextflow.io/blog/2024/reflections-on-nextflow-mentorship.html", + "_key": "0a39387f294c" + }, + { + "_type": "link", + "href": "https://www.nextflow.io/blog/2024/empowering-bioinformatics-mentoring.html", + "_key": "790c1d9abbd5" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "447b0a7e5d73", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "9fca4b3057b0" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "Where we are", + "_key": "93ac879f5da4", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "10029989de78", + "markDefs": [] + }, + { + "_key": "6b4258709d92", + "asset": { + "_type": "reference", + "_ref": "image-b670e5d93bb37893edf42bb6f7841ed8e950196d-2432x1402-png" + }, + "_type": "image", + "alt": "Map with colored countries based on ambassadors residency" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Looking ahead", + "_key": "d9e3fd4241a2" + } + ], + "_type": "block", + "style": "h2", + "_key": "e06899825599" + }, + { + "_key": "29668a22b415", + "markDefs": [], + "children": [ + { + "text": "The upcoming semester promises to be an exciting period of growth and innovation for the Nextflow Ambassador Program. Based on current plans, our ambassadors are set to make sure people worldwide know Nextflow and have all the support they need to use it to advance the field of computational biology, among others. 
I look forward to seeing the incredible work that will emerge from this talented group.", + "_key": "1d4f095f9f50", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "b7eeb4df95d6", + "markDefs": [], + "children": [ + { + "_key": "c7ab1997ec45", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "Welcome, new and continuing ambassadors, to another inspiring semester! Together, we will continue to help push the boundaries of what's possible with Nextflow.", + "_key": "9139a8824f9c", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "3722fe41a06c" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "88d3116b9ca3", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "cd633f2f30b9" + }, + { + "children": [ + { + "text": "Stay tuned for more updates and follow our ambassadors' journeys on the Nextflow blog here and the ", + "_key": "2ae7d01ce754", + "_type": "span", + "marks": [] + }, + { + "text": "Nextflow's Twitter/X account", + "_key": "1d16ba7e09f9", + "_type": "span", + "marks": [ + "8054e89f5631" + ] + }, + { + "marks": [], + "text": ".", + "_key": "131e1e316813", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "a48904338e44", + "markDefs": [ + { + "_type": "link", + "href": "https://x.com/nextflowio", + "_key": "8054e89f5631" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "af5be818ed26" + } + ], + "_type": "block", + "style": "normal", + "_key": "3df45ff071aa" + }, + { + "_key": "8544b11f71a4", + "markDefs": [ + { + "_key": "69ef89c5d068", + "_type": "link", + "href": "https://www.nextflow.io/ambassadors.html" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Ambassadors are passionate individuals who support the Nextflow community. Interested in becoming an ambassador? Read more about it ", + "_key": "656f9033f0e3" + }, + { + "_type": "span", + "marks": [ + "69ef89c5d068" + ], + "text": "here", + "_key": "e9a011dbf179" + }, + { + "text": ".", + "_key": "d90aea266e81", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "blockquote" + }, + { + "_type": "block", + "style": "normal", + "_key": "54d8b2ca12e2", + "markDefs": [], + "children": [ + { + "_key": "2201116185d5", + "_type": "span", + "marks": [], + "text": "" + } + ] + } + ], + "meta": { + "description": "As the second semester of 2024 kicks off, I am thrilled to welcome a new cohort of ambassadors to the Nextflow Ambassador Program. This vibrant group joins the dedicated ambassadors who are continuing their remarkable work from the previous semester. Together, they form a diverse and talented team, representing a variety of countries and backgrounds, encompassing both industry and academia.", + "slug": { + "current": "welcome_ambassadors_20242" + } + } + }, + { + "meta": { + "description": "Nextflow and nf-core provide frequent community training events to new users, which offer an opportunity to get started using and understanding Nextflow, Groovy and nf-core. 
These events are live-streamed and are available for on-demand viewing on YouTube, but what if you could join friends in person and watch it live?", + "slug": { + "current": "training-local-site" + } + }, + "_id": "1c7ce044795b", + "_updatedAt": "2024-09-27T08:55:36Z", + "publishedAt": "2024-05-08T06:00:00.000Z", + "_type": "blogPost", + "author": { + "_type": "reference", + "_ref": "79d4b4cf-e2e3-4408-b79f-57c0c912e345" + }, + "_rev": "hf9hwMPb7ybAE3bqEU5jJu", + "_createdAt": "2024-09-25T14:18:46Z", + "body": [ + { + "_key": "2948ddd8146f", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow and nf-core provide frequent community training events to new users, which offer an opportunity to get started using and understanding Nextflow, Groovy and nf-core. These events are live-streamed and are available for on-demand viewing on YouTube, but what if you could join friends in person and watch it live?", + "_key": "7d95cf0e041b" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "af395f42f648", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "c43ec3a56739", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "6c5adf4f3645", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "683f5990ca53" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "Learning something new by yourself can be a daunting task. Having colleagues and friends go through the learning and discovering process alongside you can really enrich the experience and be a lot of fun! With that in mind, we decided to host a get-together for the fundamentals training streams in person. Anybody from the scientific community in and around Heidelberg who wanted to learn Nextflow was welcome to join.", + "_key": "44710b5845bf", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "30bcbeb4ebe0", + "markDefs": [] + }, + { + "_key": "07981aefbfd0", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "5d883adcec6a", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "398234014af6", + "markDefs": [ + { + "_type": "link", + "href": "https://twitter.com/mribeirodantas", + "_key": "f17f159ab8d2" + }, + { + "href": "https://twitter.com/Chris_Hakk", + "_key": "e97774159a36", + "_type": "link" + }, + { + "_type": "link", + "href": "https://www.youtube.com/playlist?list=PL3xpfTVZLcNgLBGLAiY6Rl9fizsz-DTCT", + "_key": "5cd0c70ec7f3" + }, + { + "_type": "link", + "href": "https://twitter.com/kubranarci", + "_key": "2c74ce5e9a49" + }, + { + "_type": "link", + "href": "https://twitter.com/flowuenne", + "_key": "962798acd8c0" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "This year, ", + "_key": "bdae0d3231ab" + }, + { + "text": "Marcel Ribeiro-Dantas", + "_key": "72dc8426d6d4", + "_type": "span", + "marks": [ + "f17f159ab8d2" + ] + }, + { + "_key": "ed44049a64d4", + "_type": "span", + "marks": [], + "text": " and " + }, + { + "marks": [ + "e97774159a36" + ], + "text": "Chris Hakkaart", + "_key": "bbb75ca9c3a0", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " from Seqera held the training over two days, offering the first steps into the Nextflow universe (you can watch it ", + "_key": "097eb1db027d" + }, + { + "_key": "fa20bfee4645", + "_type": "span", + "marks": [ + "5cd0c70ec7f3" + ], + 
"text": "here" + }, + { + "_key": "80bbd258e461", + "_type": "span", + "marks": [], + "text": "). " + }, + { + "text": "Kübra Narcı", + "_key": "1459781242b5", + "_type": "span", + "marks": [ + "2c74ce5e9a49" + ] + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "ac650903825d" + }, + { + "_type": "span", + "marks": [ + "962798acd8c0" + ], + "text": "Florian Wünneman", + "_key": "4a1d49a205a8" + }, + { + "_key": "ff5e615f3aa0", + "_type": "span", + "marks": [], + "text": " hosted a local training site for the recent community fundamentals training in Heidelberg. Kübra is a Nextflow ambassador, working as a bioinformatician and using Nextflow to develop pipelines for the German Human Genome Phenome Archive (GHGA) project in her daily life. At the time, Florian was a Postdoc at the Institute of Computational Biomedicine with Denis Schapiro in Heidelberg, though he has since then joined Seqera as a Bioinformatics Engineer." + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "0d97eacd266f" + } + ], + "_type": "block", + "style": "normal", + "_key": "63c1e967805b" + }, + { + "markDefs": [], + "children": [ + { + "text": "We advertised the event about a month beforehand in our local communities (genomics, transcriptomics, spatial omics among others) to give people enough time to decide whether they want to join. We had quite a bit of interest and a total of 15 people participated. The event took place at the Marsilius Arkaden at the University Clinic campus in Heidelberg. Participants brought their laptops and followed along with the stream, which we projected for everyone, so people could use their laptops exclusively for coding and did not have to switch between stream and coding environment.", + "_key": "05400e689632", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "d4a3efb5a7bf" + }, + { + "markDefs": [], + "children": [ + { + "_key": "6e1a8697be28", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "6f7d026124d4" + }, + { + "_type": "image", + "alt": "meme on bright landscape", + "_key": "7c38d84adbea", + "asset": { + "_type": "reference", + "_ref": "image-435d02dfa8574287535e57b4bbc98391bf543b96-1999x1500-jpg" + } + }, + { + "_type": "image", + "alt": "meme on bright landscape", + "_key": "104dc5c35388", + "asset": { + "_ref": "image-89d3e976a117255a3e6d509faec0b6c64f747070-1999x1500-jpg", + "_type": "reference" + } + }, + { + "style": "normal", + "_key": "fb50f2e0eb52", + "markDefs": [], + "children": [ + { + "text": "The goal of this local training site was for everyone to follow the fundamentals training sessions on their laptop and be able to ask follow-up questions in person to the room. We also had a few experienced Nextflow users be there for support. There is a dedicated nf-core Slack channel during the training events for people to ask questions, which is a great tool for help. We also found that in-person discussions around topics that remained confusing to participants were really helpful for many people, as they could provide some more context and allow quick follow-up questions. During the course of the fundamentals training, we found ourselves naturally pausing the video and taking the time to discuss with the group. 
It was particularly great to see new users explaining concepts they just learned to each other.",
            "_key": "038bef124afc",
            "_type": "span",
            "marks": []
          }
        ],
        "_type": "block"
      },
      {
        "markDefs": [],
        "children": [
          {
            "marks": [],
            "text": "",
            "_key": "3f40999fa162",
            "_type": "span"
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "63ec9e2dc604"
      },
      {
        "style": "normal",
        "_key": "a22116c61fe1",
        "markDefs": [
          {
            "href": "https://nf-co.re/events/2024/hackathon-march-2024/germany-heidelberg",
            "_key": "e2f46514825c",
            "_type": "link"
          }
        ],
        "children": [
          {
            "_type": "span",
            "marks": [],
            "text": "This local training site was also an excellent opportunity for new Nextflow users in Heidelberg to get to know each other and make new connections before the upcoming nf-core hackathon, for which there was also a ",
            "_key": "e288381c176c"
          },
          {
            "text": "local site",
            "_key": "ce78f07ba353",
            "_type": "span",
            "marks": [
              "e2f46514825c"
            ]
          },
          {
            "_type": "span",
            "marks": [],
            "text": " organized in Heidelberg. It was a great experience to organize a smaller local event to learn Nextflow with the local community. We learned some valuable lessons from this experience that we will apply to the next local Nextflow gatherings. Advertising a bit earlier will give people more time to spread the word; we would likely aim for 2 months in advance next time. Offering coffee during breaks can go a long way to keep people awake and motivated, so we would try to serve up some hot coffee next time. Finally, having a bit more in-depth introductions (maybe via short posts on a forum) of everyone joining could be an even better ice breaker to foster contacts and collaborations for the future.",
            "_key": "1ab233a9e47b"
          }
        ],
        "_type": "block"
      },
      {
        "_key": "e56adb59fa34",
        "markDefs": [],
        "children": [
          {
            "_type": "span",
            "marks": [],
            "text": "",
            "_key": "c6cd95a7c58b"
          }
        ],
        "_type": "block",
        "style": "normal"
      },
      {
        "_type": "block",
        "style": "normal",
        "_key": "647d942ba741",
        "markDefs": [],
        "children": [
          {
            "text": "The ability to join training sessions, bytesize talks, and other events from nf-core and Nextflow online is absolutely fantastic and enables the free dissemination of knowledge. However, the opportunity to join a group in person and work through the content together can really enrich the experience and bring people closer together.",
            "_key": "f511bff75416",
            "_type": "span",
            "marks": []
          }
        ]
      },
      {
        "children": [
          {
            "_key": "dafd0a067288",
            "_type": "span",
            "marks": [],
            "text": ""
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "20813650b1c4",
        "markDefs": []
      },
      {
        "children": [
          {
            "_key": "99dee9971283",
            "_type": "span",
            "marks": [],
            "text": "If you're looking for a training opportunity, there will be one in Basel, Switzerland, on June 25 and another one in Cambridge, UK, on September 12. 
These and other events will be displayed in the " + }, + { + "_type": "span", + "marks": [ + "f37634a55508" + ], + "text": "Seqera Events", + "_key": "0717c9146aef" + }, + { + "marks": [], + "text": " page when it gets closer to the dates of the events.", + "_key": "363e19ba8fce", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "bd5857c46b76", + "markDefs": [ + { + "href": "https://seqera.io/events/", + "_key": "f37634a55508", + "_type": "link" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "114dde563fcf", + "markDefs": [], + "children": [ + { + "_key": "d36f709b36b5", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "Who knows, maybe you will meet someone interested in the same topic, a new collaborator or even a new friend in your local Nextflow community!", + "_key": "3e65a0073232", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "f7d3e524ace7" + }, + { + "children": [ + { + "_key": "9eec858a21df", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "91b445973635", + "markDefs": [] + }, + { + "_key": "aca8fae7ae03", + "markDefs": [ + { + "href": "https://www.nextflow.io/ambassadors.html", + "_key": "ab08fce92f9f", + "_type": "link" + } + ], + "children": [ + { + "_key": "42644671889c", + "_type": "span", + "marks": [], + "text": "This post was contributed by a Nextflow Ambassador. Ambassadors are passionate individuals who support the Nextflow community. Interested in becoming an ambassador? Read more about it " + }, + { + "_key": "3cc1acb8cbfc", + "_type": "span", + "marks": [ + "ab08fce92f9f" + ], + "text": "here" + }, + { + "_key": "47fafa0eb0ea", + "_type": "span", + "marks": [], + "text": "." + } + ], + "_type": "block", + "style": "blockquote" + } + ], + "title": "Nextflow Training: Bridging Online Learning with In-Person Connections" + }, + { + "_updatedAt": "2024-09-26T09:05:18Z", + "title": "Nextflow Training: Bridging Online Learning with In-Person Connections", + "publishedAt": "2024-05-08T06:00:00.000Z", + "_createdAt": "2024-09-25T14:18:46Z", + "author": { + "_ref": "79d4b4cf-e2e3-4408-b79f-57c0c912e345", + "_type": "reference" + }, + "body": [ + { + "_key": "2948ddd8146f", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow and nf-core provide frequent community training events to new users, which offer an opportunity to get started using and understanding Nextflow, Groovy and nf-core. These events are live-streamed and are available for on-demand viewing on YouTube, but what if you could join friends in person and watch it live?", + "_key": "7d95cf0e041b" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "c43ec3a56739" + } + ], + "_type": "block", + "style": "normal", + "_key": "af395f42f648" + }, + { + "_type": "block", + "_key": "6c5adf4f3645" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Learning something new by yourself can be a daunting task. Having colleagues and friends go through the learning and discovering process alongside you can really enrich the experience and be a lot of fun! With that in mind, we decided to host a get-together for the fundamentals training streams in person. 
Anybody from the scientific community in and around Heidelberg who wanted to learn Nextflow was welcome to join.", + "_key": "44710b5845bf", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "30bcbeb4ebe0" + }, + { + "children": [ + { + "text": "", + "_key": "5d883adcec6a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "07981aefbfd0" + }, + { + "style": "normal", + "_key": "398234014af6", + "markDefs": [ + { + "_key": "f17f159ab8d2", + "_type": "link", + "href": "https://twitter.com/mribeirodantas" + }, + { + "_key": "e97774159a36", + "_type": "link", + "href": "https://twitter.com/Chris_Hakk" + }, + { + "_type": "link", + "href": "https://www.youtube.com/playlist?list=PL3xpfTVZLcNgLBGLAiY6Rl9fizsz-DTCT", + "_key": "5cd0c70ec7f3" + }, + { + "_key": "2c74ce5e9a49", + "_type": "link", + "href": "https://twitter.com/kubranarci" + }, + { + "_key": "962798acd8c0", + "_type": "link", + "href": "https://twitter.com/flowuenne" + } + ], + "children": [ + { + "marks": [], + "text": "This year, ", + "_key": "bdae0d3231ab", + "_type": "span" + }, + { + "text": "Marcel Ribeiro-Dantas", + "_key": "72dc8426d6d4", + "_type": "span", + "marks": [ + "f17f159ab8d2" + ] + }, + { + "marks": [], + "text": " and ", + "_key": "ed44049a64d4", + "_type": "span" + }, + { + "marks": [ + "e97774159a36" + ], + "text": "Chris Hakkaart", + "_key": "bbb75ca9c3a0", + "_type": "span" + }, + { + "marks": [], + "text": " from Seqera held the training over two days, offering the first steps into the Nextflow universe (you can watch it ", + "_key": "097eb1db027d", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "5cd0c70ec7f3" + ], + "text": "here", + "_key": "fa20bfee4645" + }, + { + "text": "). ", + "_key": "80bbd258e461", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "2c74ce5e9a49" + ], + "text": "Kübra Narcı", + "_key": "1459781242b5" + }, + { + "marks": [], + "text": " and ", + "_key": "ac650903825d", + "_type": "span" + }, + { + "_key": "4a1d49a205a8", + "_type": "span", + "marks": [ + "962798acd8c0" + ], + "text": "Florian Wünneman" + }, + { + "_type": "span", + "marks": [], + "text": " hosted a local training site for the recent community fundamentals training in Heidelberg. Kübra is a Nextflow ambassador, working as a bioinformatician and using Nextflow to develop pipelines for the German Human Genome Phenome Archive (GHGA) project in her daily life. At the time, Florian was a Postdoc at the Institute of Computational Biomedicine with Denis Schapiro in Heidelberg, though he has since then joined Seqera as a Bioinformatics Engineer.", + "_key": "ff5e615f3aa0" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "0d97eacd266f" + } + ], + "_type": "block", + "style": "normal", + "_key": "63c1e967805b" + }, + { + "_key": "d4a3efb5a7bf", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "We advertised the event about a month beforehand in our local communities (genomics, transcriptomics, spatial omics among others) to give people enough time to decide whether they want to join. We had quite a bit of interest and a total of 15 people participated. The event took place at the Marsilius Arkaden at the University Clinic campus in Heidelberg. 
Participants brought their laptops and followed along with the stream, which we projected for everyone, so people could use their laptops exclusively for coding and did not have to switch between stream and coding environment.", + "_key": "05400e689632" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "6f7d026124d4", + "children": [ + { + "_key": "6e1a8697be28", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "alt": "meme on bright landscape", + "_key": "7c38d84adbea", + "asset": { + "_ref": "image-435d02dfa8574287535e57b4bbc98391bf543b96-1999x1500-jpg", + "_type": "reference" + }, + "_type": "image" + }, + { + "_key": "104dc5c35388", + "asset": { + "_ref": "image-89d3e976a117255a3e6d509faec0b6c64f747070-1999x1500-jpg", + "_type": "reference" + }, + "_type": "image", + "alt": "meme on bright landscape" + }, + { + "children": [ + { + "_key": "038bef124afc", + "_type": "span", + "marks": [], + "text": "The goal of this local training site was for everyone to follow the fundamentals training sessions on their laptop and be able to ask follow-up questions in person to the room. We also had a few experienced Nextflow users be there for support. There is a dedicated nf-core Slack channel during the training events for people to ask questions, which is a great tool for help. We also found that in-person discussions around topics that remained confusing to participants were really helpful for many people, as they could provide some more context and allow quick follow-up questions. During the course of the fundamentals training, we found ourselves naturally pausing the video and taking the time to discuss with the group. It was particularly great to see new users explaining concepts they just learned to each other." + } + ], + "_type": "block", + "style": "normal", + "_key": "fb50f2e0eb52", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "63ec9e2dc604", + "children": [ + { + "_type": "span", + "text": "", + "_key": "3f40999fa162" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "This local training site was also an excellent opportunity for new Nextflow users in Heidelberg to get to know each other and make new connections before the upcoming nf-core hackathon, for which there was also a ", + "_key": "e288381c176c", + "_type": "span" + }, + { + "text": "local site", + "_key": "ce78f07ba353", + "_type": "span", + "marks": [ + "e2f46514825c" + ] + }, + { + "marks": [], + "text": " organized in Heidelberg. It was a great experience to organize a smaller local event to learn Nextflow with the local community. We learned some valuable lessons from this experience, that we will apply for the next local Nextflow gatherings. Advertising a bit earlier will give people more time to spread the word, we would likely aim for 2 months in advance next time. Offering coffee during breaks can go a long way to keep people awake and motivated, so we would try to serve up some hot coffee next time. 
Finally, having a bit more in-depth introductions (maybe via short posts on a forum) of everyone joining could be an even better ice breaker to foster contacts and collaborations for the future.", + "_key": "1ab233a9e47b", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "a22116c61fe1", + "markDefs": [ + { + "href": "https://nf-co.re/events/2024/hackathon-march-2024/germany-heidelberg", + "_key": "e2f46514825c", + "_type": "link" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "e56adb59fa34", + "children": [ + { + "text": "", + "_key": "c6cd95a7c58b", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "647d942ba741", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The ability to join training sessions, bytesize talks, and other events from nf-core and Nextflow online is absolutely fantastic and enables the free dissemination of knowledge. However, the opportunity to join a group in person and work through the content together can really enrich the experience and bring people closer together.", + "_key": "f511bff75416" + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "dafd0a067288" + } + ], + "_type": "block", + "style": "normal", + "_key": "20813650b1c4" + }, + { + "_key": "bd5857c46b76", + "markDefs": [ + { + "_type": "link", + "href": "https://seqera.io/events/", + "_key": "f37634a55508" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "If you're looking for a training opportunity, there will be one in Basel, Switzerland, on June 25 and another one in Cambridge, UK, on September 12. These and other events will be displayed in the ", + "_key": "99dee9971283" + }, + { + "_key": "0717c9146aef", + "_type": "span", + "marks": [ + "f37634a55508" + ], + "text": "Seqera Events" + }, + { + "marks": [], + "text": " page when it gets closer to the dates of the events.", + "_key": "363e19ba8fce", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "114dde563fcf", + "children": [ + { + "_type": "span", + "text": "", + "_key": "d36f709b36b5" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "f7d3e524ace7", + "markDefs": [], + "children": [ + { + "_key": "3e65a0073232", + "_type": "span", + "marks": [], + "text": "Who knows, maybe you will meet someone interested in the same topic, a new collaborator or even a new friend in your local Nextflow community!" + } + ] + } + ], + "_rev": "Ot9x7kyGeH5005E3MJ8alB", + "_type": "blogPost", + "_id": "20a1b1ae6681", + "meta": { + "slug": { + "current": "training-local-site" + } + }, + "tags": [ + { + "_type": "reference", + "_key": "cf024e178661", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493" + } + ] + }, + { + "_rev": "hf9hwMPb7ybAE3bqEU5pIh", + "_updatedAt": "2024-10-02T07:32:23Z", + "_type": "blogPost", + "meta": { + "description": "For most developers, the command line is synonymous with agility. While tools such as Nextflow Tower are opening up the ecosystem to a whole new set of users, the Nextflow CLI remains a bedrock for pipeline development. The CLI in Nextflow has been the core interface since the beginning; however, its full functionality was never extensively documented. 
Today we are excited to release the first iteration of the CLI documentation available on the Nextflow website.", + "slug": { + "current": "cli-docs-release" + } + }, + "_createdAt": "2024-09-25T14:15:47Z", + "title": "The Nextflow CLI - tricks and treats!", + "publishedAt": "2020-10-22T06:00:00.000Z", + "_id": "224f4de4b73b", + "body": [ + { + "markDefs": [ + { + "_key": "5f74049a4a0b", + "_type": "link", + "href": "https://tower.nf" + }, + { + "_key": "23b4bf1a0a17", + "_type": "link", + "href": "https://www.nextflow.io/docs/edge/cli.html" + } + ], + "children": [ + { + "text": "For most developers, the command line is synonymous with agility. While tools such as ", + "_key": "02a92956a55e", + "_type": "span", + "marks": [] + }, + { + "_key": "df77273e95ab", + "_type": "span", + "marks": [ + "5f74049a4a0b" + ], + "text": "Nextflow Tower" + }, + { + "_type": "span", + "marks": [], + "text": " are opening up the ecosystem to a whole new set of users, the Nextflow CLI remains a bedrock for pipeline development. The CLI in Nextflow has been the core interface since the beginning; however, its full functionality was never extensively documented. Today we are excited to release the first iteration of the CLI documentation available on the ", + "_key": "4bf7305364f0" + }, + { + "_type": "span", + "marks": [ + "23b4bf1a0a17" + ], + "text": "Nextflow website", + "_key": "0d9489b78b84" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "3cc4a6f05f5c" + } + ], + "_type": "block", + "style": "normal", + "_key": "f53a71042080" + }, + { + "_key": "0964d6176602", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "488984dad191", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "3ed5e08f86e5", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "And given Halloween is just around the corner, in this blog post we'll take a look at 5 CLI tricks and examples which will make your life easier in designing, executing and debugging data pipelines. We are also giving away 5 limited-edition Nextflow hoodies and sticker packs so you can code in style this Halloween season!", + "_key": "7a716109cd37" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "090ecbd4f732", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "c2049fd5e87d" + } + ] + }, + { + "style": "h3", + "_key": "66f038e8388a", + "markDefs": [], + "children": [ + { + "_key": "52c2d551540f", + "_type": "span", + "marks": [], + "text": "1. Invoke a remote pipeline execution with the latest revision" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "Nextflow facilitates easy collaboration and re-use of existing pipelines in multiple ways. 
One of the simplest ways to do this is to use the URL of the Git repository:", + "_key": "2058e24358dc", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "b7f36aeb45f8", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "5c749d6a1e44" + } + ], + "_type": "block", + "style": "normal", + "_key": "af4e488a2cd1" + }, + { + "_type": "code", + "_key": "34a45055c363", + "code": "$ nextflow run https://www.github.com/nextflow-io/hello" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "b6bd611d7266", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "8712e775b594", + "markDefs": [] + }, + { + "style": "normal", + "_key": "bd0a8586ad0f", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "When executing a pipeline using the run command, it first checks to see if it has been previously downloaded in the ", + "_key": "1a6d5f168c7b", + "_type": "span" + }, + { + "_key": "5da0a443f807", + "_type": "span", + "marks": [ + "code" + ], + "text": "~/.nextflow/assets" + }, + { + "_key": "bcbc6270595b", + "_type": "span", + "marks": [], + "text": " directory, and if so, Nextflow uses this to execute the pipeline. If the pipeline is not already cached, Nextflow will download it, store it in the " + }, + { + "_key": "37896a0f63f6", + "_type": "span", + "marks": [ + "code" + ], + "text": "$HOME/.nextflow/" + }, + { + "_key": "8ffc714ce324", + "_type": "span", + "marks": [], + "text": " directory and then launch the execution." + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "7269baed76c8", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "e3aa2801e71f" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "826d92ed952a", + "_type": "span", + "marks": [], + "text": "How can we make sure that we always run the latest code from the remote pipeline? We simply need to add the " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "-latest", + "_key": "05891da81525" + }, + { + "text": " option to the run command, and Nextflow takes care of the rest:", + "_key": "502fedc589ab", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "d9b8e4db1dc9" + }, + { + "children": [ + { + "text": "", + "_key": "7c122dcd519b", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "7b1a2ed4a7c9", + "markDefs": [] + }, + { + "_key": "8b41fbaf989d", + "code": "$ nextflow run nextflow-io/hello -latest", + "_type": "code" + }, + { + "_type": "block", + "style": "normal", + "_key": "725641bd62f8", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "7dd400f9bf57", + "_type": "span" + } + ] + }, + { + "style": "h3", + "_key": "69f318f1cd6a", + "markDefs": [], + "children": [ + { + "text": "2. Query work directories for a specific execution", + "_key": "65d9e0858203", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "text": "For every invocation of Nextflow, all the metadata about an execution is stored including task directories, completion status and time etc. 
We can use the ",
            "_key": "6eac33152bb4",
            "_type": "span",
            "marks": []
          },
          {
            "_type": "span",
            "marks": [
              "code"
            ],
            "text": "nextflow log",
            "_key": "00a46431af2c"
          },
          {
            "_key": "fa28d524a7e5",
            "_type": "span",
            "marks": [],
            "text": " command to generate a summary of this information for a specific run."
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "0b65592870fc"
      },
      {
        "style": "normal",
        "_key": "d140bf1fb8aa",
        "markDefs": [],
        "children": [
          {
            "_key": "335c277d6ab5",
            "_type": "span",
            "marks": [],
            "text": ""
          }
        ],
        "_type": "block"
      },
      {
        "markDefs": [],
        "children": [
          {
            "_type": "span",
            "marks": [],
            "text": "To see a list of work directories associated with a particular execution (for example, ",
            "_key": "87365e4fa96a"
          },
          {
            "_key": "3426c2a6d34e",
            "_type": "span",
            "marks": [
              "code"
            ],
            "text": "tiny_leavitt"
          },
          {
            "text": "), use:",
            "_key": "7bb2013a5b34",
            "_type": "span",
            "marks": []
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "b0492756900f"
      },
      {
        "style": "normal",
        "_key": "7fae984a7a79",
        "markDefs": [],
        "children": [
          {
            "text": "",
            "_key": "ad9f9a075c13",
            "_type": "span",
            "marks": []
          }
        ],
        "_type": "block"
      },
      {
        "code": "$ nextflow log tiny_leavitt",
        "_type": "code",
        "_key": "77fc8bf4b8ac"
      },
      {
        "children": [
          {
            "_type": "span",
            "marks": [],
            "text": "",
            "_key": "e2ebd70bf39a"
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "716df9e751b8",
        "markDefs": []
      },
      {
        "children": [
          {
            "_type": "span",
            "marks": [],
            "text": "To filter out specific process-level information from the logs of any execution, we simply need to use the fields (",
            "_key": "9fad5087922c"
          },
          {
            "_type": "span",
            "marks": [
              "code"
            ],
            "text": "-f",
            "_key": "17cb08ee32dd"
          },
          {
            "_type": "span",
            "marks": [],
            "text": ") option and specify the fields:",
            "_key": "4e3d354f56ad"
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "d2fd6746543f",
        "markDefs": []
      },
      {
        "markDefs": [],
        "children": [
          {
            "_key": "f9376c2deca9",
            "_type": "span",
            "marks": [],
            "text": ""
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "f876db1e52f6"
      },
      {
        "_key": "d9297d1e7a95",
        "code": "$ nextflow log tiny_leavitt -f 'process, hash, status, duration'",
        "_type": "code"
      },
      {
        "children": [
          {
            "marks": [],
            "text": "",
            "_key": "0462f5a09916",
            "_type": "span"
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "4c90ff7149cd",
        "markDefs": []
      },
      {
        "markDefs": [],
        "children": [
          {
            "_key": "1201a29850d6",
            "_type": "span",
            "marks": [],
            "text": "The hash is the name of the work directory where the process was executed; therefore, the location of a process work directory would be something like "
          },
          {
            "marks": [
              "code"
            ],
            "text": "work/74/68ff183",
            "_key": "9286c85ecb80",
            "_type": "span"
          },
          {
            "marks": [],
            "text": ".",
            "_key": "9e9f85259301",
            "_type": "span"
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "8a1c1c457041"
      },
      {
        "_type": "block",
        "style": "normal",
        "_key": "54974b549279",
        "markDefs": [],
        "children": [
          {
            "text": "",
            "_key": "2cd6d7f16c1b",
            "_type": "span",
            "marks": []
          }
        ]
      },
      {
        "_key": "6d39b22a4319",
        "markDefs": [],
        "children": [
          {
            "_key": "617ecbfb2cdf",
            "_type": "span",
            "marks": [],
            "text": "The log command also has other child options including "
          },
          {
"marks": [ + "code" + ], + "text": "-before", + "_key": "dda550682e34", + "_type": "span" + }, + { + "text": " and ", + "_key": "0daea83d286c", + "_type": "span", + "marks": [] + }, + { + "text": "-after", + "_key": "58056e8b5c79", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " to help with the chronological inspection of logs.", + "_key": "e25877645b2d" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "d1ad4693f4f0", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "7230b4385ab6", + "_type": "span" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_key": "1e3076196e5e", + "_type": "span", + "marks": [], + "text": "3. Top-level configuration" + } + ], + "_type": "block", + "style": "h3", + "_key": "de3b269917d6" + }, + { + "_type": "block", + "style": "normal", + "_key": "00712450e6df", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow emphasizes customization of pipelines and exposes multiple options to facilitate this. The configuration is applied to multiple Nextflow commands and is therefore a top-level option. In practice, this means specifying configuration options ", + "_key": "2daf8e559865" + }, + { + "text": "before", + "_key": "d3449bed824c", + "_type": "span", + "marks": [ + "em" + ] + }, + { + "_type": "span", + "marks": [], + "text": " the command.", + "_key": "dea9cffe4c5b" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "8248c23c1bdc", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "f682139db714", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "9cf43882c7ff", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow CLI provides two kinds of config overrides - the soft override and the hard override.", + "_key": "45b569c9d5eb" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "737ca2f8ee45" + } + ], + "_type": "block", + "style": "normal", + "_key": "c82bc6e6de82" + }, + { + "_type": "block", + "style": "normal", + "_key": "c22b3aaba489", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The top-level soft override (", + "_key": "972704119e6a", + "_type": "span" + }, + { + "text": "-c", + "_key": "626a5f0a0e17", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": ") option allows us to change the previous config in an additive manner, overriding only the fields included the configuration file:", + "_key": "12c1a8b9e19e" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "19732bb38c4f" + } + ], + "_type": "block", + "style": "normal", + "_key": "c1c3e6057958" + }, + { + "code": "$ nextflow -c my.config run nextflow-io/hello", + "_type": "code", + "_key": "cc13a89abfa8" + }, + { + "style": "normal", + "_key": "bf01438a7a26", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "dcb0f33c95be" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "675978c5735b", + "markDefs": [], + "children": [ + { + "text": "On the other hand, the hard override (", + "_key": "13301f70e6d7", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "code" + ], + "text": "-C)", + "_key": "11a385f62eb4", + "_type": "span" + }, + { + "_key": 
"bdf5e0b90c3f", + "_type": "span", + "marks": [], + "text": " completely replaces and ignores any additional configurations:" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "630fd7da4ac9", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "b2b42f53c395" + }, + { + "code": "$ nextflow –C my.config nextflow-io/hello", + "_type": "code", + "_key": "c17e17e6962d" + }, + { + "_type": "block", + "style": "normal", + "_key": "ffbabdf579f3", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Moreover, we can also use the config command to inspect the final inferred configuration and view any profiles:", + "_key": "aa336a662327", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "b0f2177085c3", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "d40e3d9830ad" + } + ], + "_type": "block" + }, + { + "code": "$ nextflow config -show-profiles", + "_type": "code", + "_key": "03a9ec8e025e" + }, + { + "_key": "7f3820fe3257", + "markDefs": [], + "children": [ + { + "_key": "cdb1e14ea76c", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "h3", + "_key": "d52cbeb9e3c6", + "markDefs": [], + "children": [ + { + "text": "4. Passing in an input parameter file", + "_key": "7e2e026df196", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "22693c2a833e", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Nextflow is designed to work across both research and production settings. In production especially, specifying multiple parameters for the pipeline on the command line becomes cumbersome. In these cases, environment variables or config files are commonly used which contain all input files, options and metadata. 
Love them or hate them, YAML and JSON are the standard formats for human and machines, respectively.", + "_key": "d3329aa2b0f1", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "d3e951f2e82c", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "86d45930afc0" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "The Nextflow run option ", + "_key": "7e992ebb5214" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "-params-file", + "_key": "5d04d70a5dfe" + }, + { + "text": " can be used to pass in a file containing parameters in either format:", + "_key": "a469c8a253ac", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "08ae7429963d", + "markDefs": [] + }, + { + "style": "normal", + "_key": "0e0a78f00ba9", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "a93227a6d3ff", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_key": "47bbbee89628", + "code": "$ nextflow run nextflow-io/rnaseq -params-file run_42.yaml", + "_type": "code" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "5770dc6e7675", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "90471ddfa812" + }, + { + "_type": "block", + "style": "normal", + "_key": "e1ba5d9fbb05", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The YAML file could contain the following:", + "_key": "744d3dd67928" + } + ] + }, + { + "_key": "6bde078896bf", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "c5bfbb492b0f" + } + ], + "_type": "block", + "style": "normal" + }, + { + "code": "reads : \"s3://gatk-data/run_42/reads/*_R{1,2}_*.fastq.gz\"\nbwa_index : \"$baseDir/index/*.bwa-index.tar.gz\"\npaired_end : true\npenalty : 12", + "_type": "code", + "_key": "2462af85dd93" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "fadbe454e0a2" + } + ], + "_type": "block", + "style": "normal", + "_key": "8b40b488ffae" + }, + { + "markDefs": [], + "children": [ + { + "text": "5. Specific workflow entry points", + "_key": "df825d8da2d5", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h3", + "_key": "e2b58b2a59d5" + }, + { + "style": "normal", + "_key": "773f9a9ace8c", + "markDefs": [ + { + "_key": "4b5b0c568454", + "_type": "link", + "href": "https://www.nextflow.io/blog/2020/dsl2-is-here.html" + } + ], + "children": [ + { + "marks": [], + "text": "The recently released ", + "_key": "b39078a98091", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "4b5b0c568454" + ], + "text": "DSL2", + "_key": "60034d94a3b7" + }, + { + "_key": "15f0d38017d5", + "_type": "span", + "marks": [], + "text": " adds powerful modularity to Nextflow and enables scripts to contain multiple workflows. 
By default, the unnamed workflow is assumed to be the main entry point for the script, however, with numerous named workflows, the entry point can be customized by using the " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "entry", + "_key": "8e8054c25f46" + }, + { + "marks": [], + "text": " child-option of the run command:", + "_key": "f8d1080431e8", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "cee6c0c5bac3" + } + ], + "_type": "block", + "style": "normal", + "_key": "431bd58fe897" + }, + { + "code": "$ nextflow run main.nf -entry workflow1", + "_type": "code", + "_key": "042ce39434f7" + }, + { + "_key": "da619209a28e", + "markDefs": [ + { + "href": "https://www.nextflow.io/docs/latest/dsl2.html#implicit-workflow", + "_key": "9ae8d4b33714", + "_type": "link" + } + ], + "children": [ + { + "_key": "ce97d7044e2b", + "_type": "span", + "marks": [], + "text": "This allows users to run a specific sub-workflow or a section of their entire workflow script. For more information, refer to the " + }, + { + "text": "implicit workflow", + "_key": "7ac639c27172", + "_type": "span", + "marks": [ + "9ae8d4b33714" + ] + }, + { + "_key": "19c9cc471b2b", + "_type": "span", + "marks": [], + "text": " section of the documentation." + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "9af8e981ae11", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "7369caa20cdb", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Additionally, as of version 20.09.1-edge, you can specify the script in a project to run other than ", + "_key": "d1ae77edbd7c" + }, + { + "_key": "ff50ad9d5e84", + "_type": "span", + "marks": [ + "code" + ], + "text": "main.nf" + }, + { + "_type": "span", + "marks": [], + "text": " using the command line option ", + "_key": "90a223fdf06e" + }, + { + "_key": "ad70a5e19ee4", + "_type": "span", + "marks": [ + "code" + ], + "text": "-main-script" + }, + { + "_type": "span", + "marks": [], + "text": ":", + "_key": "df6250fb7a83" + } + ], + "_type": "block", + "style": "normal", + "_key": "db137b5fb480" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "a214947c4f46", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "6e3e8fd35197" + }, + { + "_type": "code", + "_key": "30dd7f4e01e9", + "code": "$ nextflow run http://github.com/my/pipeline -main-script my-analysis.nf" + }, + { + "style": "h3", + "_key": "ba4b90a4f69b", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Bonus trick! Web dashboard launched from the CLI", + "_key": "e4911c6cd4f6", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "733e633f475b", + "_type": "span", + "marks": [], + "text": "The tricks above highlight the functionality of the Nextflow CLI. However, for long-running workflows, monitoring becomes all the more crucial. With Nextflow Tower, we can invoke any Nextflow pipeline execution from the CLI and use the integrated dashboard to follow the workflow execution wherever we are. 
Sign-in to " + }, + { + "_key": "835ee4efc8dc", + "_type": "span", + "marks": [ + "78b96fc3b98d" + ], + "text": "Tower" + }, + { + "marks": [], + "text": " using your GitHub credentials, obtain your token from the Getting Started page and export them into your terminal, ", + "_key": "ab5bb1caccf7", + "_type": "span" + }, + { + "marks": [ + "code" + ], + "text": "~/.bashrc", + "_key": "b833263d1f5e", + "_type": "span" + }, + { + "_key": "b03539fd1f38", + "_type": "span", + "marks": [], + "text": " or include them in your " + }, + { + "marks": [ + "code" + ], + "text": "nextflow.config:", + "_key": "3ec0efc65715", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "5791813a4f49", + "markDefs": [ + { + "_type": "link", + "href": "https://tower.nf", + "_key": "78b96fc3b98d" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "d46e7176b3e3", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "1974648cab41", + "_type": "span" + } + ] + }, + { + "code": "$ export TOWER_ACCESS_TOKEN=my-secret-tower-key\n$ export NXF_VER=20.07.1", + "_type": "code", + "_key": "a59e198bb227" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "85b813e0227e", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "d59d4e6b7fd2", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "4f8c1e69b99a", + "markDefs": [], + "children": [ + { + "text": "Next simply add the ", + "_key": "61209901f441", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "-with-tower", + "_key": "3dd9bd4a2975" + }, + { + "text": "; child-option to any Nextflow run command. A URL with the monitoring dashboard will appear:", + "_key": "8caa9d7d0be7", + "_type": "span", + "marks": [] + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "e18c0b4a40da", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "9964b61dc7ba" + }, + { + "_key": "7ffa9a76c114", + "code": "$ nextflow run nextflow-io/hello -with-tower", + "_type": "code" + }, + { + "style": "normal", + "_key": "e1864562eb67", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "2e926af9e987" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow Giveaway", + "_key": "81d52ccfc71c" + } + ], + "_type": "block", + "style": "h3", + "_key": "2ea447a60afa" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "If you want to look stylish while you put the above tips into practice, or simply like free stuff, we are giving away five of our latest Nextflow hoodie and sticker packs. 
Retweet or like the Nextflow tweet about this article and we will draw and notify the winners on October 31st!", + "_key": "c83d8bbf452b" + } + ], + "_type": "block", + "style": "normal", + "_key": "597e62b74ed2" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "5315e940ca76" + } + ], + "_type": "block", + "style": "normal", + "_key": "607b67145889", + "markDefs": [] + }, + { + "_type": "block", + "style": "h3", + "_key": "b05b89361e1c", + "markDefs": [], + "children": [ + { + "_key": "9e182c922cfa", + "_type": "span", + "marks": [], + "text": "About the Author" + } + ] + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://www.linkedin.com/in/abhi18av/", + "_key": "28202d2f1e4e" + }, + { + "href": "https://www.seqera.io", + "_key": "e69245a51845", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "28202d2f1e4e" + ], + "text": "Abhinav Sharma", + "_key": "98112b0da88b" + }, + { + "_key": "ce7c8685a1a4", + "_type": "span", + "marks": [], + "text": " is a Bioinformatics Engineer at " + }, + { + "_key": "4fee5ddab6db", + "_type": "span", + "marks": [ + "e69245a51845" + ], + "text": "Seqera Labs" + }, + { + "marks": [], + "text": " interested in Data Science and Cloud Engineering. He enjoys working on all things Genomics, Bioinformatics and Nextflow.", + "_key": "6f51547015be", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "97c3b6ac58c0" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "7deac9b37098", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "6c9f11bde344" + }, + { + "_key": "5ebdad595af1", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Acknowledgements", + "_key": "9379d00d54c3", + "_type": "span" + } + ], + "_type": "block", + "style": "h3" + }, + { + "style": "normal", + "_key": "68c04976b571", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/KevinSayers", + "_key": "3734d6d2d01c" + }, + { + "_type": "link", + "href": "https://github.com/apeltzer", + "_key": "6fd08f204816" + } + ], + "children": [ + { + "_key": "f15069f01280", + "_type": "span", + "marks": [], + "text": "Shout out to " + }, + { + "_type": "span", + "marks": [ + "3734d6d2d01c" + ], + "text": "Kevin Sayers", + "_key": "048e53641c24" + }, + { + "text": " and ", + "_key": "7c24897fd2a0", + "_type": "span", + "marks": [] + }, + { + "text": "Alexander Peltzer", + "_key": "e061aeb35772", + "_type": "span", + "marks": [ + "6fd08f204816" + ] + }, + { + "_type": "span", + "marks": [], + "text": " for their earlier efforts in documenting the CLI and which inspired this work.", + "_key": "51f3e0dd06f4" + } + ], + "_type": "block" + }, + { + "_key": "23c15a4f6f1e", + "markDefs": [], + "children": [ + { + "_key": "417a40a3ad91", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "dcf58fa87de7", + "markDefs": [ + { + "_key": "ef72d4664ebb", + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/cli.html" + } + ], + "children": [ + { + "marks": [ + "em" + ], + "text": "The latest CLI docs can be found in the edge release docs at ", + "_key": "513ce70bb7b5", + "_type": "span" + }, + { + "text": "https://www.nextflow.io/docs/latest/cli.html", + "_key": "ea39ae1fb0aa", + "_type": "span", + "marks": [ + "em", + "ef72d4664ebb" + ] + } + ], + "_type": "block" + } + ], + "tags": [ + { + "_key": "59e18ed3d8a2", + "_ref": 
"b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference" + } + ], + "author": { + "_type": "reference", + "_ref": "5bLgfCKN00diCN0ijmWNOF" + } + }, + { + "_type": "blogPost", + "_id": "2708fd809c45", + "author": { + "_ref": "paolo-di-tommaso", + "_type": "reference" + }, + "_createdAt": "2024-09-25T14:14:52Z", + "publishedAt": "2014-08-07T06:00:00.000Z", + "title": "Share Nextflow pipelines with GitHub", + "_rev": "rsIQ9Jd8Z4nKBVUruy4PqN", + "meta": { + "slug": { + "current": "share-nextflow-pipelines-with-github" + }, + "description": "The GitHub code repository and collaboration platform is widely used between researchers to publish their work and to collaborate on projects source code." + }, + "body": [ + { + "_type": "block", + "style": "normal", + "_key": "a07645557081", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com", + "_key": "8d3cdaafbbc5" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The ", + "_key": "f07352e346fa" + }, + { + "_type": "span", + "marks": [ + "8d3cdaafbbc5" + ], + "text": "GitHub", + "_key": "bffd3c605f8c" + }, + { + "_type": "span", + "marks": [], + "text": " code repository and collaboration platform is widely used between researchers to publish their work and to collaborate on projects source code.", + "_key": "0d38680be145" + } + ] + }, + { + "_key": "19d5b404ad3f", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "3fdf4550d0d8" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "8966ba7a30ff", + "_type": "span", + "marks": [], + "text": "Even more interestingly a few months ago " + }, + { + "_key": "af922a3b69df", + "_type": "span", + "marks": [ + "be447868125a" + ], + "text": "GitHub announced improved support for researchers" + }, + { + "_key": "32ea1c1fd03a", + "_type": "span", + "marks": [], + "text": " making it possible to get a Digital Object Identifier (DOI) for any GitHub repository archive." + } + ], + "_type": "block", + "style": "normal", + "_key": "7de6766c24e4", + "markDefs": [ + { + "_key": "be447868125a", + "_type": "link", + "href": "https://github.com/blog/1840-improving-github-for-science" + } + ] + }, + { + "_key": "2344ecb67856", + "markDefs": [], + "children": [ + { + "_key": "f8f42ff1afe2", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "With a DOI for your GitHub repository archive your code becomes formally citable in scientific publications.", + "_key": "f0bcb1fa9d82" + } + ], + "_type": "block", + "style": "normal", + "_key": "98aa8576ca4f" + }, + { + "style": "normal", + "_key": "2bb36eb467ce", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "b415762506df", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Why use GitHub with Nextflow?", + "_key": "36db87aa1e5e" + } + ], + "_type": "block", + "style": "h2", + "_key": "0f0f4ad5ccae" + }, + { + "children": [ + { + "text": "The latest Nextflow release (0.9.0) seamlessly integrates with GitHub. 
This feature allows you to manage your code in a more consistent manner, or use other people's Nextflow pipelines, published through GitHub, in a quick and transparent manner.", + "_key": "9c0863a2ac18", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "4376cd944273", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "062c51f1ab7a" + } + ], + "_type": "block", + "style": "normal", + "_key": "445d3e235cf9" + }, + { + "_type": "block", + "style": "h2", + "_key": "6973037bfa65", + "markDefs": [], + "children": [ + { + "_key": "1fe8ff800fa4", + "_type": "span", + "marks": [], + "text": "How it works" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The idea is very simple, when you launch a script execution with Nextflow, it will look for a file with the pipeline name you've specified. If that file does not exist, it will look for a public repository with the same name on GitHub. If it is found, the repository is automatically downloaded to your computer and the code executed. This repository is stored in the Nextflow home directory, by default ", + "_key": "cf09f0d37c8f", + "_type": "span" + }, + { + "_key": "37bd8cd027e6", + "_type": "span", + "marks": [ + "code" + ], + "text": "$HOME/.nextflow" + }, + { + "text": ", thus it will be reused for any further execution.", + "_key": "2f5af43c7ee5", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "c15c6c1039b7" + }, + { + "_key": "e492b588e82f", + "markDefs": [], + "children": [ + { + "_key": "0e2af18596ea", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_key": "2528d31bb700", + "_type": "span", + "marks": [], + "text": "You can try this feature out, having Nextflow (version 0.9.0 or higher) installed in your computer, by simply entering the following command in your shell terminal:" + } + ], + "_type": "block", + "style": "normal", + "_key": "19bd0690e0fa" + }, + { + "_type": "block", + "style": "normal", + "_key": "b8da00867f32", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "97ab941d45b6", + "_type": "span" + } + ] + }, + { + "_type": "code", + "_key": "f7f22048de16", + "code": "nextflow run nextflow-io/hello" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The first time you execute this command Nextflow will download the pipeline at the following GitHub repository ", + "_key": "459bc1a12ab2" + }, + { + "marks": [ + "code" + ], + "text": "https://github.com/nextflow-io/hello", + "_key": "c6b9193e85d4", + "_type": "span" + }, + { + "text": ", as you don't already have it in your computer. 
It will then execute it producing the expected output.", + "_key": "0728e481dc7f", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "781494c54ca6" + }, + { + "style": "normal", + "_key": "937bb8364240", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "4cdfed786d5b", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "71c2309548c3", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In order for a GitHub repository to be used as a Nextflow project, it must contain at least one file named ", + "_key": "52fcf1489e3d" + }, + { + "marks": [ + "code" + ], + "text": "main.nf", + "_key": "c2a20de182da", + "_type": "span" + }, + { + "marks": [], + "text": " that defines your Nextflow pipeline script.", + "_key": "a54f71789970", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "7604c242816b", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "3ea2f52ce73f" + }, + { + "markDefs": [], + "children": [ + { + "_key": "410b1f1c76ff", + "_type": "span", + "marks": [], + "text": "Run a specific revision" + } + ], + "_type": "block", + "style": "h2", + "_key": "9bd7535e34c1" + }, + { + "_type": "block", + "style": "normal", + "_key": "305b9189140c", + "markDefs": [], + "children": [ + { + "text": "Any Git branch, tag or commit ID in the GitHub repository can be used to specify a revision, that you want to execute, when running your pipeline by adding the ", + "_key": "76f03c29af47", + "_type": "span", + "marks": [] + }, + { + "_key": "6b4ddbf51df5", + "_type": "span", + "marks": [ + "code" + ], + "text": "-r" + }, + { + "_type": "span", + "marks": [], + "text": " option to the run command line. So for example you could enter:", + "_key": "c34c3a9c47af" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "dcd46900d752", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "5f9e0f849ff5", + "_type": "span", + "marks": [] + } + ] + }, + { + "code": "nextflow run nextflow-io/hello -r mybranch", + "_type": "code", + "_key": "eb6675c43cdb" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "or", + "_key": "39a57b89ca7f" + } + ], + "_type": "block", + "style": "normal", + "_key": "6d73707b1a83" + }, + { + "style": "normal", + "_key": "638de0863669", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "dc9b14611729", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_key": "f7490b06466b", + "code": "nextflow run nextflow-io/hello -r v1.1", + "_type": "code" + }, + { + "children": [ + { + "_key": "39a4499b523a", + "_type": "span", + "marks": [], + "text": "This can be very useful when comparing different versions of your project. It also guarantees consistent results in your pipeline as your source code evolves." 
+ }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "fb41e7586dad",
        "markDefs": []
      },
      {
        "_type": "block",
        "style": "normal",
        "_key": "08c032c4c714",
        "markDefs": [],
        "children": [
          {
            "text": "",
            "_key": "54b6b2ed3353",
            "_type": "span",
            "marks": []
          }
        ]
      },
      {
        "_key": "0a0b1df54c89",
        "markDefs": [],
        "children": [
          {
            "marks": [],
            "text": "Commands to manage pipelines",
            "_key": "235e7ed4db7a",
            "_type": "span"
          }
        ],
        "_type": "block",
        "style": "h2"
      },
      {
        "style": "normal",
        "_key": "d39383233d8c",
        "markDefs": [
          {
            "href": "http://git-scm.com/",
            "_key": "48c9ce7e70cb",
            "_type": "link"
          }
        ],
        "children": [
          {
            "marks": [],
            "text": "The following commands allow you to perform some basic operations to manage your pipelines. Note that Nextflow is not meant to replace the functionality provided by the ",
            "_key": "f720c1488228",
            "_type": "span"
          },
          {
            "text": "Git",
            "_key": "b00b6960345a",
            "_type": "span",
            "marks": [
              "48c9ce7e70cb"
            ]
          },
          {
            "_key": "54a1ede67df9",
            "_type": "span",
            "marks": [],
            "text": " tool; you may still need it to create new repositories or commit changes, etc."
          }
        ],
        "_type": "block"
      },
      {
        "style": "normal",
        "_key": "353b56cd31f4",
        "markDefs": [],
        "children": [
          {
            "_key": "a23026b241ab",
            "_type": "span",
            "marks": [],
            "text": ""
          }
        ],
        "_type": "block"
      },
      {
        "_type": "block",
        "style": "h3",
        "_key": "8afe6ec229c3",
        "markDefs": [],
        "children": [
          {
            "_type": "span",
            "marks": [],
            "text": "\nList available pipelines",
            "_key": "52a277daf7ab"
          }
        ]
      },
      {
        "_key": "ce2a059b90e5",
        "markDefs": [],
        "children": [
          {
            "_key": "6b626464167d",
            "_type": "span",
            "marks": [],
            "text": "The "
          },
          {
            "_type": "span",
            "marks": [
              "code"
            ],
            "text": "ls",
            "_key": "5c104452a548"
          },
          {
            "_type": "span",
            "marks": [],
            "text": " command allows you to list all the pipelines you have downloaded on your computer. 
For example:", + "_key": "df1b21df73d9" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "8535d67df668", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "3d57d8c66b15", + "_type": "span" + } + ] + }, + { + "_type": "code", + "_key": "a68e18c9c168", + "code": "nextflow ls" + }, + { + "markDefs": [], + "children": [ + { + "_key": "3eb2d1d4d5fe", + "_type": "span", + "marks": [], + "text": "This prints a list similar to the following one:" + } + ], + "_type": "block", + "style": "normal", + "_key": "bb980c9dda3b" + }, + { + "markDefs": [], + "children": [ + { + "_key": "f2b1f4a97860", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "002708302e68" + }, + { + "code": "cbcrg/piper-nf\nnextflow-io/hello", + "_type": "code", + "_key": "52baf65ff713" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "\nShow pipeline information", + "_key": "2d98264aeece" + } + ], + "_type": "block", + "style": "h3", + "_key": "9c2bdd88cbe5" + }, + { + "_type": "block", + "style": "normal", + "_key": "dc9af4b62ac2", + "markDefs": [], + "children": [ + { + "text": "By using the ", + "_key": "0103ad40e6d6", + "_type": "span", + "marks": [] + }, + { + "text": "info", + "_key": "8526418c4be5", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " command you can show information from a downloaded pipeline. For example:", + "_key": "a9e0511c6504" + } + ] + }, + { + "style": "normal", + "_key": "111720510d39", + "markDefs": [], + "children": [ + { + "_key": "8aea29465d1d", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "_type": "code", + "_key": "62299f232532", + "code": "$ nextflow info hello" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "This command prints:", + "_key": "cbabc944c610" + } + ], + "_type": "block", + "style": "normal", + "_key": "b828101c8f09" + }, + { + "style": "normal", + "_key": "7c846bbe7e68", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "6333b5fa8956", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "code", + "_key": "d91598db0ee6", + "code": " repo name : nextflow-io/hello\n home page : http://github.com/nextflow-io/hello\n local path : $HOME/.nextflow/assets/nextflow-io/hello\n main script: main.nf\n revisions :\n * master (default)\n mybranch\n v1.1 [t]\n v1.2 [t]" + }, + { + "style": "normal", + "_key": "08c65d5ad991", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Starting from the top it shows: 1) the repository name; 2) the project home page; 3) the local folder where the pipeline has been downloaded; 4) the script that is executed when launched; 5) the list of available revisions i.e. branches + tags. 
Tags are marked with a ", + "_key": "019315b8078c", + "_type": "span" + }, + { + "text": "[t]", + "_key": "beebcf6883ca", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "marks": [], + "text": " on the right, the current checked-out revision is marked with a ", + "_key": "9dd5f2c2277f", + "_type": "span" + }, + { + "_key": "c5ba85d7aa35", + "_type": "span", + "marks": [ + "code" + ], + "text": "*" + }, + { + "marks": [], + "text": " on the left.", + "_key": "39860df19739", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "3cfe35b303b9", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "\n", + "_key": "8afd3726f204" + } + ] + }, + { + "style": "h3", + "_key": "b56130be1fac", + "markDefs": [], + "children": [ + { + "text": "Pull or update a pipeline", + "_key": "184b46660633", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "6a28029a50db", + "_type": "span", + "marks": [], + "text": "The " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "pull", + "_key": "22ff025b48e0" + }, + { + "text": " command allows you to download a pipeline from a GitHub repository or to update it if that repository has already been downloaded. For example:", + "_key": "21e9b3e45938", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "2c91d32c2327" + }, + { + "style": "normal", + "_key": "dddb456e706c", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "15959450a6b3", + "_type": "span" + } + ], + "_type": "block" + }, + { + "code": "nextflow pull nextflow-io/examples", + "_type": "code", + "_key": "ceb2f6ebb71f" + }, + { + "style": "normal", + "_key": "eab1050eec07", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Downloaded pipelines are stored in the folder ", + "_key": "2a351c0e3dfc" + }, + { + "marks": [ + "code" + ], + "text": "$HOME/.nextflow/assets", + "_key": "d14fc4139a88", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " in your computer.", + "_key": "4e1ad29f05a3" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "\n", + "_key": "cbdb0dab8f50", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "9d59f305e496", + "markDefs": [] + }, + { + "children": [ + { + "_key": "a67a3da07d96", + "_type": "span", + "marks": [], + "text": "Clone a pipeline into a folder" + } + ], + "_type": "block", + "style": "h3", + "_key": "dbcf52d2809c", + "markDefs": [] + }, + { + "style": "normal", + "_key": "7c1878ba526e", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The ", + "_key": "11e0c504de73" + }, + { + "_key": "7e397ca4c35d", + "_type": "span", + "marks": [ + "code" + ], + "text": "clone" + }, + { + "marks": [], + "text": " command allows you to copy a Nextflow pipeline project to a directory of your choice. 
For example:", + "_key": "b4597ea0095c", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "0405e83300a1", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "d549ce7b68cc", + "markDefs": [] + }, + { + "_key": "c8b679c3c986", + "code": "nextflow clone nextflow-io/hello target-dir", + "_type": "code" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "If the destination directory is omitted the specified pipeline is cloned to a directory with the same name as the pipeline ", + "_key": "dde6be18172c" + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "base", + "_key": "30fd799ef449" + }, + { + "text": " name (e.g. ", + "_key": "40c090accee8", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "hello", + "_key": "5ce119e03e6d" + }, + { + "text": ") in the current folder.", + "_key": "75f77c371e2e", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "f565cada7f81", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "4e2fb8327335", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "d3939ac3ef8f", + "_type": "span", + "marks": [] + } + ] + }, + { + "children": [ + { + "_key": "1ea8b3776003", + "_type": "span", + "marks": [], + "text": "The clone command can be used to inspect or modify the source code of a pipeline. You can eventually commit and push back your changes by using the usual Git/GitHub workflow." + } + ], + "_type": "block", + "style": "normal", + "_key": "6a354346806f", + "markDefs": [] + }, + { + "style": "normal", + "_key": "13329edd3cfb", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "\n", + "_key": "ec81f6b87b26" + } + ], + "_type": "block" + }, + { + "_key": "ea65e6cabc92", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Drop an installed pipeline", + "_key": "2b3613954c13", + "_type": "span" + } + ], + "_type": "block", + "style": "h3" + }, + { + "_key": "701bcc7edbb0", + "markDefs": [], + "children": [ + { + "_key": "45db8e1eeb90", + "_type": "span", + "marks": [], + "text": "Downloaded pipelines can be deleted by using the " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "drop", + "_key": "fd86cfb67261" + }, + { + "_key": "ec5e9af9521e", + "_type": "span", + "marks": [], + "text": " command, as shown below:" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "347a394d9b6f" + } + ], + "_type": "block", + "style": "normal", + "_key": "e0768ec7b1e9", + "markDefs": [] + }, + { + "code": "nextflow drop nextflow-io/hello", + "_type": "code", + "_key": "992ef3fb72d8" + }, + { + "children": [ + { + "_key": "ab5e858d072e", + "_type": "span", + "marks": [], + "text": "Limitations and known problems" + } + ], + "_type": "block", + "style": "h2", + "_key": "b8430d7c759a", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "956fa652fad6", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strike-through" + ], + "text": "GitHub private repositories currently are not supported", + "_key": "8a997a8767a30" + }, + { + "text": " Support for private GitHub repositories has been introduced with version 0.10.0.", + "_key": "8a997a8767a31", + "_type": "span", + "marks": [] + } + ], + "level": 1 + }, + { + "children": [ + { + 
"_key": "70e20ff448230", + "_type": "span", + "marks": [ + "strike-through" + ], + "text": "Symlinks committed in a Git repository are not resolved correctly when downloaded/cloned by Nextflow" + }, + { + "_type": "span", + "marks": [], + "text": " Symlinks are resolved correctly when using Nextflow version 0.11.0 (or higher).", + "_key": "70e20ff448231" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "455e0c9956c6", + "listItem": "bullet", + "markDefs": [] + } + ], + "_updatedAt": "2024-10-02T13:33:20Z" + }, + { + "meta": { + "noIndex": false, + "slug": { + "_type": "slug", + "current": "multiqc-grouped-samples" + }, + "_type": "meta", + "shareImage": { + "_type": "image", + "asset": { + "_ref": "image-d7dd7dfbf392ebb35e2f6a2be71934efc944ccc4-1200x1200-png", + "_type": "reference" + } + }, + "description": "Introducing grouped table rows with collapsed sub-samples! Also big performance improvements and a new ability to work as a Python library within scripts, notebooks and Python apps." + }, + "_updatedAt": "2024-10-16T13:31:10Z", + "_createdAt": "2024-05-23T06:21:37Z", + "_id": "28fbd463-3640-4195-8c8f-82cf183846f9", + "_rev": "mvya9zzDXWakVjnX4hBcNe", + "title": "MultiQC: Grouped samples and custom scripts", + "author": { + "_ref": "phil-ewels", + "_type": "reference" + }, + "_type": "blogPost", + "body": [ + { + "children": [ + { + "text": "It’s been an exciting year for the MultiQC team at Seqera, with developments aimed at modernizing the codebase and expanding functionality. In this blog post we’ll recap the big features, such as long-awaited ", + "_key": "f07cbc53bd370", + "_type": "span", + "marks": [] + }, + { + "text": "Sample Grouping", + "_key": "f07cbc53bd371", + "_type": "span", + "marks": [ + "em" + ] + }, + { + "_type": "span", + "marks": [], + "text": " to simplify report tables, as well as the ability to use MultiQC as a Python library, enabling custom scripts and dynamic report generation. And there’s even more to come – stay tuned for the upcoming ", + "_key": "f07cbc53bd372" + }, + { + "marks": [ + "85a288678e95" + ], + "text": "MultiQC talk", + "_key": "f07cbc53bd373", + "_type": "span" + }, + { + "_key": "f07cbc53bd374", + "_type": "span", + "marks": [], + "text": " at the Nextflow Summit in Barcelona, excitement guaranteed!" 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "825c0af35887", + "markDefs": [ + { + "_type": "link", + "href": "https://summit.nextflow.io/2024/barcelona/agenda/10-31--multiqc-new-features-and-flexible/", + "_key": "85a288678e95" + } + ] + }, + { + "_type": "block", + "style": "h2", + "_key": "a2253fab8405", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Sample grouping 🫂", + "_key": "05dc46b2aa760" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Many of you who are used to reading MultiQC reports will be familiar with seeing ", + "_key": "8808adba7b870" + }, + { + "text": "General Statistics", + "_key": "8808adba7b871", + "_type": "span", + "marks": [ + "em" + ] + }, + { + "_type": "span", + "marks": [], + "text": " tables that have “gaps” in rows like this:", + "_key": "8808adba7b872" + } + ], + "_type": "block", + "style": "normal", + "_key": "e597c2f9c4b4" + }, + { + "_type": "image", + "_key": "f111f647ef2a", + "asset": { + "_ref": "image-0bc6a5e44bd0449bf63fa8a3fe9380e10fcaed01-3482x2064-png", + "_type": "reference" + } + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "This happens because MultiQC finds sample names from input data filenames. In the case of FastQC, paired-end sequencing data will have two FASTQ files and generate two separate FastQC reports. This means each sample name has a ", + "_key": "fe85e8a0b3450" + }, + { + "marks": [ + "code" + ], + "text": "_R1", + "_key": "5c683f75103f", + "_type": "span" + }, + { + "_key": "387b4dbd0c21", + "_type": "span", + "marks": [], + "text": " or " + }, + { + "_key": "be0566f07b2f", + "_type": "span", + "marks": [ + "code" + ], + "text": "_R2" + }, + { + "_key": "dc7b7eb36c44", + "_type": "span", + "marks": [], + "text": " suffix and cannot be merged with outputs from downstream analysis, where these are collapsed into a single sample identifier. Until now, the best advice we’ve been able to give is to either throw half of the data away or put up with the ugly tables - neither are good options!" + } + ], + "_type": "block", + "style": "normal", + "_key": "0acde0d59fb6" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/MultiQC/MultiQC/issues/542", + "_key": "09acce0b2b3f" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "One of the oldest open issues in the MultiQC repo (", + "_key": "8eef787f80ea0" + }, + { + "_type": "span", + "marks": [ + "09acce0b2b3f" + ], + "text": "#542", + "_key": "8eef787f80ea1" + }, + { + "_type": "span", + "marks": [], + "text": ", from 2017) is about introducing a new technique to group samples. Phil started a branch to work on the problem but hit a wall, leaving the comment ", + "_key": "8eef787f80ea2" + }, + { + "marks": [ + "em" + ], + "text": "“This got really complicated. 
Need to think about how to improve it.”", + "_key": "8eef787f80ea3", + "_type": "span" + }, + { + "marks": [], + "text": " There it sat, racking up occasional comments and requests for updates.", + "_key": "8eef787f80ea4", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "a5338993f69c" + }, + { + "style": "normal", + "_key": "3423abf00059", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Finally in MultiQC v1.25, seven years after this issue was created, we’re delighted to introduce – ", + "_key": "cd6951d8332a0" + }, + { + "_key": "cd6951d8332a1", + "_type": "span", + "marks": [ + "em" + ], + "text": "Sample grouping" + }, + { + "_type": "span", + "marks": [], + "text": ":", + "_key": "cd6951d8332a2" + } + ], + "_type": "block" + }, + { + "_type": "image", + "_key": "48911ef9fb45", + "asset": { + "_ref": "image-c12d430eb8dc05b871a48add5e8f8e22c2ff6028-1640x720-gif", + "_type": "reference" + } + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "This new ", + "_key": "f9dd723187bd0" + }, + { + "text": "table_sample_merge", + "_key": "988338ff4a02", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "marks": [], + "text": " config option allows you to specify sample name suffixes to group into a single row (see ", + "_key": "d921626516f4", + "_type": "span" + }, + { + "_key": "f9dd723187bd1", + "_type": "span", + "marks": [ + "48b4eec2b409" + ], + "text": "docs" + }, + { + "_type": "span", + "marks": [], + "text": "). When set, MultiQC will group samples in supported modules under a common prefix. Any component sample statistics can be shown by toggling the caret in the row header, with summary statistics on the main row. This allows a compressed yet accurate overview of all samples, whilst still allowing readers of the report to dig in and see the underlying data for each input sample.", + "_key": "f9dd723187bd2" + } + ], + "_type": "block", + "style": "normal", + "_key": "5f7071610ad9", + "markDefs": [ + { + "href": "https://docs.seqera.io/multiqc/reports/customisation#sample-grouping", + "_key": "48b4eec2b409", + "_type": "link" + } + ] + }, + { + "_key": "8a074f0efc68", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "For now, the new config option is opt-in, but we hope to soon set some common suffixes such as ", + "_key": "edc4f643059d0" + }, + { + "_key": "c67dcad3c755", + "_type": "span", + "marks": [ + "code" + ], + "text": "_R1" + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "18b26913d3a7" + }, + { + "marks": [ + "code" + ], + "text": "_R2", + "_key": "7a569be866d9", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " as defaults for all users. Some modules have the concept of sub-samples within parsed data (e.g., flow cells → lanes) and use sample grouping without needing additional configuration. The sample grouping implementation is entirely bespoke to each MultiQC module: each column needs consideration as to whether it should be averaged, summed, or something else. 
We’ve added support to key modules such as FastQC, Cutadapt, and BCLConvert, and plan to add support to more modules over time.", + "_key": "4811b985f605" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "MultiQC as a library 📜", + "_key": "e2bf849132580", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "32587d324a17" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Version 1.22 brought some major behind-the-scenes refactoring to MultiQC. These changes enable MultiQC to be used as a library within scripts. It adds another way to customize report content beyond “Custom Content” and MultiQC Plugins, as you can now dynamically inject data, filter, and customize report content within a script. Ideal for use within analysis pipelines!", + "_key": "91f6511e64730" + } + ], + "_type": "block", + "style": "normal", + "_key": "0f9ae21494d4" + }, + { + "style": "normal", + "_key": "6049bf0f935f", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/OpenGene/fastp", + "_key": "f089f7503fa5" + } + ], + "children": [ + { + "text": "Let's look at a very basic example to give a feel for how this could be used. Here, we have a Python script that imports MultiQC, parses report data from ", + "_key": "3bb10cbfcadb0", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "f089f7503fa5" + ], + "text": "fastp", + "_key": "3bb10cbfcadb1", + "_type": "span" + }, + { + "_key": "3bb10cbfcadb2", + "_type": "span", + "marks": [], + "text": ", adds a custom report section and table, and then generates a report." + } + ], + "_type": "block" + }, + { + "language": "python", + "_key": "2fc6817b3c84", + "code": "import multiqc\nfrom multiqc.plots import table\n\n# Parse logs from fastp\nmultiqc.parse_logs('./data/fastp')\n\n# Add a custom table\nmodule = multiqc.BaseMultiqcModule()\nmodule.add_section(\n plot=table.plot(\n data={\n \"sample 1\": {\"aligned\": 23542, \"not_aligned\": 343},\n \"sample 2\": {\"aligned\": 1275, \"not_aligned\": 7328},\n },\n pconfig={\n \"id\": \"my_metrics_table\",\n \"title\": \"My metrics\"\n }\n )\n)\nmultiqc.report.modules.append(module)\n\n# Generate the report\nmultiqc.write_report()", + "_type": "code" + }, + { + "_key": "0d4aca5310f8", + "markDefs": [ + { + "_key": "07b92ff1c1c2", + "_type": "link", + "href": "https://docs.seqera.io/multiqc/development/plugins" + } + ], + "children": [ + { + "_key": "90677f2e0a320", + "_type": "span", + "marks": [], + "text": "Scripts like this can be written to do any number of things. We hope it removes the need to run MultiQC multiple times to report on secondary statistics. It can also enable customization of things like table columns, custom data injection, and most other things you can think of! Best of all, unlike " + }, + { + "marks": [ + "07b92ff1c1c2" + ], + "text": "MultiQC plugins", + "_key": "90677f2e0a321", + "_type": "span" + }, + { + "text": ", no special installation is needed. This will be hugely powerful for custom analysis and reporting. 
It also means that MultiQC becomes a first-class citizen for explorative analysis within notebooks and analysis apps.", + "_key": "90677f2e0a322", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "4bf457962649", + "markDefs": [ + { + "href": "https://multiqc.info/docs/usage/interactive/", + "_key": "8135552fbdf7", + "_type": "link" + }, + { + "href": "https://community.seqera.io/multiqc", + "_key": "ec397c4c15d7", + "_type": "link" + } + ], + "children": [ + { + "_key": "98837a87371a0", + "_type": "span", + "marks": [], + "text": "See the new " + }, + { + "_type": "span", + "marks": [ + "8135552fbdf7" + ], + "text": "Using MultiQC in interactive environments", + "_key": "98837a87371a1" + }, + { + "_type": "span", + "marks": [], + "text": " page to learn more about MultiQC Python functions. ", + "_key": "98837a87371a2" + }, + { + "text": "Let us know", + "_key": "98837a87371a3", + "_type": "span", + "marks": [ + "ec397c4c15d7" + ] + }, + { + "_type": "span", + "marks": [], + "text": " how you get on with this functionality - we’d love to see what you build!", + "_key": "98837a87371a4" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "h2", + "_key": "31398f87eeaf", + "markDefs": [], + "children": [ + { + "_key": "343f5cc92fe80", + "_type": "span", + "marks": [], + "text": "Major performance improvements 🚅" + } + ] + }, + { + "_key": "5817aa826cf0", + "markDefs": [ + { + "href": "https://github.com/rhpvorderman", + "_key": "fd09f4c005cf", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In MultiQC v1.22 we’ve had a number of high-impact pull requests from ", + "_key": "17f6d6a138610" + }, + { + "_key": "17f6d6a138611", + "_type": "span", + "marks": [ + "fd09f4c005cf" + ], + "text": "@rhpvorderman" + }, + { + "_type": "span", + "marks": [], + "text": ". He did a deep-dive on the compression that MultiQC uses for embedding data within the HTML reports, switching the old ", + "_key": "17f6d6a138612" + }, + { + "text": "lzstring", + "_key": "19801d576faf", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " compression for a more up-to-date ", + "_key": "45969e812380" + }, + { + "text": "gzip", + "_key": "c5f9f1fce5cc", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "text": " implementation, which made writing reports ", + "_key": "fa6d2402801f", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "4x times faster", + "_key": "17f6d6a138613" + }, + { + "marks": [], + "text": ".", + "_key": "17f6d6a138614", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "e355a0723a10", + "markDefs": [], + "children": [ + { + "_key": "d3ed30f558930", + "_type": "span", + "marks": [], + "text": "He also significantly optimized the file search, making it " + }, + { + "_key": "d3ed30f558931", + "_type": "span", + "marks": [ + "strong" + ], + "text": "54% faster" + }, + { + "_type": "span", + "marks": [], + "text": " on our benchmarks, and key modules. For example, ", + "_key": "d3ed30f558932" + }, + { + "text": "FastQC got 6x faster and uses 10x less memory", + "_key": "d3ed30f558933", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_key": "d3ed30f558934", + "_type": "span", + "marks": [], + "text": "." 
+ } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "blockquote", + "_key": "688e13eb74b5", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Taken together, comparing a typical v1.22 run against v1.21 shows that MultiQC is ", + "_key": "ca2a7d2cc9ea0" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "53% faster", + "_key": "ca2a7d2cc9ea1" + }, + { + "text": " and has a ", + "_key": "ca2a7d2cc9ea2", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "strong" + ], + "text": "6x smaller peak-memory footprint", + "_key": "ca2a7d2cc9ea3", + "_type": "span" + }, + { + "marks": [], + "text": ". It’s well worth updating!", + "_key": "ca2a7d2cc9ea4", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "27a4647d4af0", + "markDefs": [ + { + "_type": "link", + "href": "https://www.science.org/doi/full/10.1126/sciadv.aba1190", + "_key": "98689cf6da09" + } + ], + "children": [ + { + "marks": [], + "text": "To get these numbers for real-world scenarios, we tested some huge input datasets (many thanks to Felix Krueger for helping with these). For example, from ", + "_key": "bd242e39c19c0", + "_type": "span" + }, + { + "_key": "bd242e39c19c1", + "_type": "span", + "marks": [ + "98689cf6da09" + ], + "text": "Xing et. al. 2020" + }, + { + "_type": "span", + "marks": [], + "text": ":", + "_key": "bd242e39c19c2" + } + ], + "_type": "block" + }, + { + "_key": "187c0ee80e97", + "asset": { + "_type": "reference", + "_ref": "image-d8472852383a8de068c60e4b67cccb9401fda6e8-2202x1206-svg" + }, + "_type": "image" + }, + { + "_type": "block", + "style": "normal", + "_key": "4c2e62b1a6ce", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "These three runs were run with identical inputs and generated essentially identical reports.", + "_key": "26881f3f445b0" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "c5778edba7c9", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "These improvements will be especially noticeable with large runs. Improvements are also especially significant in certain MultiQC modules, including FastQC (10x less peak memory), Mosdepth, and Kraken (~20x improvement in memory and CPU in MultiQC v1.24, larger improvements with more samples).", + "_key": "29c455e60d230", + "_type": "span" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "We hope that this makes MultiQC more usable at scale and makes your analysis pipelines run a little smoother!", + "_key": "0299272ee77e0" + } + ], + "_type": "block", + "style": "normal", + "_key": "16f4ee957baa" + }, + { + "_key": "03fd047a042d", + "markDefs": [], + "children": [ + { + "_key": "fe15ec5a699d0", + "_type": "span", + "marks": [], + "text": "Unit tests 🧪" + } + ], + "_type": "block", + "style": "h2" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Until now, MultiQC only had rudimentary end-to-end testing - each continuous integration test simply runs MultiQC on a range of test data and checks that it doesn’t crash (there are a few more bells and whistles, but that’s the essence of it). These CI tests have worked remarkably well, considering. 
However - they do not catch unintentional changes to data outputs and are limited in their scope.", + "_key": "8dac686af5890" + } + ], + "_type": "block", + "style": "normal", + "_key": "9a1dcc7171ea", + "markDefs": [] + }, + { + "children": [ + { + "text": "Version 1.23 of MultiQC introduced unit tests. These small, isolated tests are a cornerstone of modern software development. A suite of ", + "_key": "69fec00fe0b70", + "_type": "span", + "marks": [] + }, + { + "_key": "69fec00fe0b71", + "_type": "span", + "marks": [ + "8c10e89e187d" + ], + "text": "pytest" + }, + { + "_key": "69fec00fe0b72", + "_type": "span", + "marks": [], + "text": " tests now cover most of the core library code. Pytest is also used to “just run” modules as before (with 90% code coverage!), but going forward we will require module authors to include a tests directory with custom detailed unit tests. See " + }, + { + "marks": [ + "49e1086b5601" + ], + "text": "Tests", + "_key": "69fec00fe0b73", + "_type": "span" + }, + { + "_key": "69fec00fe0b74", + "_type": "span", + "marks": [], + "text": " for more information." + } + ], + "_type": "block", + "style": "normal", + "_key": "ea3631bb639a", + "markDefs": [ + { + "href": "https://docs.pytest.org/", + "_key": "8c10e89e187d", + "_type": "link" + }, + { + "_type": "link", + "href": "https://docs.seqera.io/multiqc/development/modules#tests", + "_key": "49e1086b5601" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "e632a2c061ef", + "markDefs": [], + "children": [ + { + "text": "It’s a lot of work to add useful test coverage to such a large codebase, and anyone familiar with the topic will know that it’s a job that’s never done. However, now that we have a framework and pattern in place we’re hopeful that test coverage will steadily increase and code quality with it.", + "_key": "154f52b5a7560", + "_type": "span", + "marks": [] + } + ] + }, + { + "_type": "block", + "style": "h2", + "_key": "62766583b084", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Refactoring and static typing 📐", + "_key": "751fd31f3f260", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "a6ef8f12773a", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "MultiQC v1.22 refactoring brings with it the first wave of Pydantic models in the back end. This unlocks run-time validation of plot config attributes - we found and fixed a lot of bugs with this already! The code looks very similar, but the Pydantic models use classes that allow most code IDEs to highlight errors as you write. 
Validation at run time also means that you catch typos right away, instead of wondering why your configuration is not being applied.", + "_key": "ddace25f43050", + "_type": "span" + } + ] + }, + { + "_type": "image", + "_key": "26d505ad4426", + "asset": { + "_ref": "image-fd6b80f3dcb5eaf7c243180d8f926e59b013795c-1175x545-svg", + "_type": "reference" + } + }, + { + "asset": { + "_ref": "image-2cec52b9fc023cf0214e91ae653af17ab68d8631-2237x634-png", + "_type": "reference" + }, + "_type": "image", + "_key": "135a4646454f" + }, + { + "_type": "block", + "style": "normal", + "_key": "91e612186bc9", + "markDefs": [ + { + "_type": "link", + "href": "https://mypy-lang.org/", + "_key": "de93e9dea13b" + } + ], + "children": [ + { + "_key": "aaf630994d490", + "_type": "span", + "marks": [], + "text": "Along similar lines, the core MultiQC library and test suite has had type annotations added throughout, complete with CI testing using " + }, + { + "text": "mypy", + "_key": "aaf630994d491", + "_type": "span", + "marks": [ + "de93e9dea13b" + ] + }, + { + "_type": "span", + "marks": [], + "text": ". We will progressively add typing to all MultiQC modules over time. Typing also helps the MultiQC developer experience, with rich IDE integrations and earlier bug-catching.", + "_key": "aaf630994d492" + } + ] + }, + { + "_type": "block", + "style": "h2", + "_key": "5369acf21e63", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "HighCharts removed 🗑", + "_key": "5e015b0762a50" + } + ] + }, + { + "children": [ + { + "text": "In v1.20 we added support for using Plotly instead of HighCharts for graphs in MultiQC reports. We left the HighCharts code in place whilst we transitioned to the new library, in case people hit any major issues with Plotly. As of v1.22 the HighCharts support (via ", + "_key": "20b3965936730", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "--template highcharts", + "_key": "ffa5eceee420" + }, + { + "text": ") has been removed completely. See the ", + "_key": "5c85500d9de2", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "85116740be7f" + ], + "text": "MultiQC: A fresh coat of paint", + "_key": "20b3965936731", + "_type": "span" + }, + { + "marks": [], + "text": " blog to find out more about this topic.", + "_key": "20b3965936732", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "a8c352ae0daf", + "markDefs": [ + { + "_type": "link", + "href": "https://seqera.io/blog/multiqc-plotly/", + "_key": "85116740be7f" + } + ] + }, + { + "_type": "block", + "style": "h2", + "_key": "f7684112839e", + "markDefs": [], + "children": [ + { + "_key": "330cd974225b0", + "_type": "span", + "marks": [], + "text": "Moving to seqera.io" + } + ] + }, + { + "_key": "3fb7ceae5313", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Since MultiQC joined the Seqera family in 2022, we’ve been steadily improving integration with other Seqera tools and websites. Last year, we launched the Seqera Community Forum with a dedicated MultiQC section, which has been a valuable resource for users. Recently, we’ve continued this effort by moving all MultiQC documentation to Seqera.io, providing a single, streamlined location for accessing information and searching across all Seqera tools. 
Old links will still redirect, ensuring a smooth transition.", + "_key": "e79f1010d2450", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "marks": [], + "text": "We’re also excited to announce that we’re launching a new MultiQC product page at ", + "_key": "44fc010597ec0", + "_type": "span" + }, + { + "marks": [ + "04cd3809c790" + ], + "text": "https://seqera.io/multiqc/", + "_key": "44fc010597ec1", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " with an updated design, which will replace ", + "_key": "44fc010597ec2" + }, + { + "text": "https://multqc.info", + "_key": "44fc010597ec3", + "_type": "span", + "marks": [ + "205fbd181003" + ] + }, + { + "_type": "span", + "marks": [], + "text": ". This fresh look aligns with the rest of the Seqera ecosystem, making it easier to explore MultiQC’s features and stay up to date with future developments.", + "_key": "44fc010597ec4" + } + ], + "_type": "block", + "style": "normal", + "_key": "d4f6fd75df99", + "markDefs": [ + { + "_type": "link", + "href": "https://seqera.io/multiqc/", + "_key": "04cd3809c790" + }, + { + "_key": "205fbd181003", + "_type": "link", + "href": "https://multiqc.info" + } + ] + } + ], + "tags": [ + { + "_key": "933aa64152ba", + "_ref": "ea6c309b-154f-45c3-9fda-650d7764b260", + "_type": "reference" + }, + { + "_type": "reference", + "_key": "0e00c52955ba", + "_ref": "be8b298c-af12-4b5f-89cd-d2e208580926" + } + ], + "publishedAt": "2024-10-16T06:00:00.000Z" + }, + { + "author": { + "_ref": "paolo-di-tommaso", + "_type": "reference" + }, + "_updatedAt": "2024-09-25T14:17:42Z", + "_rev": "hf9hwMPb7ybAE3bqEU5sB5", + "publishedAt": "2023-05-04T06:00:00.000Z", + "_id": "2ae82a00fd10", + "_createdAt": "2024-09-25T14:17:42Z", + "title": "Selecting the right storage architecture for your Nextflow pipelines", + "_type": "blogPost", + "body": [ + { + "_key": "5f7c964bd985", + "markDefs": [], + "children": [ + { + "marks": [ + "em" + ], + "text": "In this article we present the various storage solutions supported by Nextflow including on-prem and cloud file systems, parallel file systems, and cloud object stores. We also discuss Fusion file system 2.0, a new high-performance file system that can help simplify configuration, improve throughput, and reduce costs in the cloud.", + "_key": "05798fc17faf", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "bb8b5ddebdb6", + "children": [ + { + "_key": "594b7a617944", + "_type": "span", + "text": "" + } + ] + }, + { + "children": [ + { + "_key": "d924ae31546e", + "_type": "span", + "marks": [], + "text": "At one time, selecting a file system for distributed workloads was straightforward. Through the 1990s, the Network File System (NFS), developed by Sun Microsystems in 1984, was pretty much the only game in town. It was part of every UNIX distribution, and it presented a standard " + }, + { + "_type": "span", + "marks": [ + "36e716fcc3df" + ], + "text": "POSIX interface", + "_key": "7b9a7733b269" + }, + { + "_type": "span", + "marks": [], + "text": ", meaning that applications could read and write data without modification. 
Dedicated NFS servers and NAS filers became the norm in most clustered computing environments.", + "_key": "3ff83fd939e2" + } + ], + "_type": "block", + "style": "normal", + "_key": "f760214b145d", + "markDefs": [ + { + "_type": "link", + "href": "https://pubs.opengroup.org/onlinepubs/9699919799/nframe.html", + "_key": "36e716fcc3df" + } + ] + }, + { + "children": [ + { + "text": "", + "_key": "58459c889529", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "af61c2a1ff6b" + }, + { + "markDefs": [ + { + "href": "https://www.lustre.org/", + "_key": "0df43e0198dd", + "_type": "link" + }, + { + "_key": "d6dd8a943ce3", + "_type": "link", + "href": "https://www.anl.gov/mcs/pvfs-parallel-virtual-file-system" + }, + { + "_type": "link", + "href": "https://openzfs.org/wiki/Main_Page", + "_key": "acb440cf4cb2" + }, + { + "href": "https://www.beegfs.io/c/", + "_key": "0c713dc3e4be", + "_type": "link" + }, + { + "_type": "link", + "href": "https://www.ibm.com/products/storage-scale-system", + "_key": "19d720fb5e9e" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "For organizations that outgrew the capabilities of NFS, other POSIX file systems emerged. These included parallel file systems such as ", + "_key": "bedbd03c48e7" + }, + { + "_type": "span", + "marks": [ + "0df43e0198dd" + ], + "text": "Lustre", + "_key": "4a2c63a9cea5" + }, + { + "text": ", ", + "_key": "601d921ccead", + "_type": "span", + "marks": [] + }, + { + "text": "PVFS", + "_key": "0e5c554ec8a1", + "_type": "span", + "marks": [ + "d6dd8a943ce3" + ] + }, + { + "_type": "span", + "marks": [], + "text": ", ", + "_key": "015ae3722947" + }, + { + "_type": "span", + "marks": [ + "acb440cf4cb2" + ], + "text": "OpenZFS", + "_key": "184156090139" + }, + { + "_key": "9ec885d644c5", + "_type": "span", + "marks": [], + "text": ", " + }, + { + "text": "BeeGFS", + "_key": "ec3d47609d10", + "_type": "span", + "marks": [ + "0c713dc3e4be" + ] + }, + { + "_type": "span", + "marks": [], + "text": ", and ", + "_key": "bca5f6c5c00c" + }, + { + "marks": [ + "19d720fb5e9e" + ], + "text": "IBM Spectrum Scale", + "_key": "15a066811fc7", + "_type": "span" + }, + { + "text": " (formerly GPFS). Parallel file systems can support thousands of compute clients and deliver more than a TB/sec combined throughput, however, they are expensive, and can be complex to deploy and manage. While some parallel file systems work with standard Ethernet, most rely on specialized low-latency fabrics such as Intel® Omni-Path Architecture (OPA) or InfiniBand. 
Because of this, these file systems are typically found in only the largest HPC data centers.", + "_key": "e8e84009af5d", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "be2304b67bef" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "fa91c438506f" + } + ], + "_type": "block", + "style": "normal", + "_key": "a0a5d7d426fa" + }, + { + "_key": "677b3412fd7d", + "children": [ + { + "_type": "span", + "text": "Cloud changes everything", + "_key": "b11f226c8f5b" + } + ], + "_type": "block", + "style": "h2" + }, + { + "_type": "block", + "style": "normal", + "_key": "196717039f05", + "markDefs": [ + { + "_type": "link", + "href": "https://aws.amazon.com/s3/", + "_key": "2a233c7de0a7" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "With the launch of ", + "_key": "f2fd34d71585" + }, + { + "_type": "span", + "marks": [ + "2a233c7de0a7" + ], + "text": "Amazon S3", + "_key": "9bffc25d96c0" + }, + { + "_type": "span", + "marks": [], + "text": " in 2006, new choices began to emerge. Rather than being a traditional file system, S3 is an object store accessible through a web API. S3 abandoned traditional ideas around hierarchical file systems. Instead, it presented a simple programmatic interface and CLI for storing and retrieving binary objects.", + "_key": "4f6de03f9d46" + } + ] + }, + { + "style": "normal", + "_key": "f421dde1a0bd", + "children": [ + { + "_type": "span", + "text": "", + "_key": "23120689e080" + } + ], + "_type": "block" + }, + { + "markDefs": [ + { + "_key": "d9a4f88dce28", + "_type": "link", + "href": "https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObject.html" + }, + { + "_type": "link", + "href": "https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObject.html", + "_key": "0d1d3dde87df" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Object stores are a good fit for cloud services because they are simple and scalable to multiple petabytes of storage. Rather than relying on central metadata that presents a bottleneck, metadata is stored with each object. All operations are atomic, so there is no need for complex POSIX-style file-locking mechanisms that add complexity to the design. Developers interact with object stores using simple calls like ", + "_key": "81641ea0b49f" + }, + { + "marks": [ + "d9a4f88dce28" + ], + "text": "PutObject", + "_key": "c7efc133c3c1", + "_type": "span" + }, + { + "_key": "57768a3c4c75", + "_type": "span", + "marks": [], + "text": " (store an object in a bucket in return for a key) and " + }, + { + "marks": [ + "0d1d3dde87df" + ], + "text": "GetObject", + "_key": "0b4d408861d1", + "_type": "span" + }, + { + "_key": "5d8862733c6c", + "_type": "span", + "marks": [], + "text": " (retrieve a binary object, given a key)." + } + ], + "_type": "block", + "style": "normal", + "_key": "ad309f918f94" + }, + { + "_key": "911c23f4a25e", + "children": [ + { + "_type": "span", + "text": "", + "_key": "c3a5977d2020" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "This simple approach was ideal for internet-scale applications. It was also much less expensive than traditional file systems. As a result, S3 usage grew rapidly. 
Similar object stores quickly emerged, including Microsoft ", + "_key": "809cde29aaf8", + "_type": "span", + "marks": [] + }, + { + "text": "Azure Blob Storage", + "_key": "d05a49423fcd", + "_type": "span", + "marks": [ + "53286c954ebc" + ] + }, + { + "_type": "span", + "marks": [], + "text": ", ", + "_key": "4cd9f5426c0c" + }, + { + "text": "Open Stack Swift", + "_key": "49910fc3df6e", + "_type": "span", + "marks": [ + "ecc7f5d8943b" + ] + }, + { + "marks": [], + "text": ", and ", + "_key": "7e6746a198e4", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "3e9d66b449f2" + ], + "text": "Google Cloud Storage", + "_key": "7c867c5c426b" + }, + { + "_type": "span", + "marks": [], + "text": ", released in 2010.", + "_key": "6f98d52453fb" + } + ], + "_type": "block", + "style": "normal", + "_key": "a1054cd4bad9", + "markDefs": [ + { + "href": "https://azure.microsoft.com/en-ca/products/storage/blobs/", + "_key": "53286c954ebc", + "_type": "link" + }, + { + "_type": "link", + "href": "https://wiki.openstack.org/wiki/Swift", + "_key": "ecc7f5d8943b" + }, + { + "_type": "link", + "href": "https://cloud.google.com/storage/", + "_key": "3e9d66b449f2" + } + ] + }, + { + "style": "normal", + "_key": "6f95fb711fbe", + "children": [ + { + "text": "", + "_key": "d7ca6e26b49d", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "h2", + "_key": "c7c62bb485a8", + "children": [ + { + "text": "Cloud object stores vs. shared file systems", + "_key": "1890d0adf95b", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "25131f7d5a54", + "markDefs": [ + { + "_type": "link", + "href": "https://availability.sre.xyz/", + "_key": "6e6040a287be" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Object stores are attractive because they are reliable, scalable, and cost-effective. They are frequently used to store large amounts of data that are accessed infrequently. Examples include archives, images, raw video footage, or in the case of bioinformatics applications, libraries of biological samples or reference genomes. Object stores provide near-continuous availability by spreading data replicas across cloud availability zones (AZs). AWS claims theoretical data availability of up to 99.999999999% (11 9's) – a level of availability so high that it does not even register on most ", + "_key": "cf0a27b25493" + }, + { + "_type": "span", + "marks": [ + "6e6040a287be" + ], + "text": "downtime calculators", + "_key": "53443b83c04f" + }, + { + "_type": "span", + "marks": [], + "text": "!", + "_key": "ef737530052e" + } + ] + }, + { + "_key": "049c31e88b49", + "children": [ + { + "text": "", + "_key": "6dd6498183db", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "5f370ecc94d3", + "markDefs": [ + { + "_key": "a4fd33879e2f", + "_type": "link", + "href": "https://aws.amazon.com/s3/pricing" + } + ], + "children": [ + { + "text": "Because they support both near-line and cold storage, object stores are sometimes referred to as "cheap and deep." Based on current ", + "_key": "206cd15f4daf", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "a4fd33879e2f" + ], + "text": "S3 pricing", + "_key": "cacf9ff109fd", + "_type": "span" + }, + { + "text": ", the going rate for data storage is USD 0.023 per GB for the first 50 TB of data. Users can "pay as they go" — spinning up S3 storage buckets and storing arbitrary amounts of data for as long as they choose. 
Some high-level differences between object stores and traditional file systems are summarized below.", + "_key": "845635fa3f52", + "_type": "span", + "marks": [] + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "a7dcb31cc94e" + } + ], + "_type": "block", + "style": "normal", + "_key": "9d39d355b54a" + }, + { + "_key": "887b188df4bf", + "_type": "block" + }, + { + "_key": "8008370e24aa", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The downside of object storage is that the vast majority of applications are written to work with POSIX file systems. As a result, applications seldom interact directly with object stores. A common practice is to copy data from an object store, perform calculations locally on a cluster node, and write results back to the object store for long-term storage.", + "_key": "9b5ee1c7185b" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "8ece30838538", + "children": [ + { + "text": "", + "_key": "1dc80d37b75d", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "h2", + "_key": "783fbcf93e71", + "children": [ + { + "_key": "4df4d4a7b6c6", + "_type": "span", + "text": "Data handling in Nextflow" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "Unlike older pipeline orchestrators, Nextflow was built with cloud object stores in mind. Depending on the cloud where pipelines run, Nextflow manages cloud credentials and allows users to provide a path to shared data. This can be a shared file system such as ", + "_key": "7f3184c1d0a7", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "/my-shared-filesystem/data", + "_key": "f51835eca0e6" + }, + { + "_type": "span", + "marks": [], + "text": " or a cloud object store e.g. ", + "_key": "735793315d95" + }, + { + "text": "s3://my-bucket/data/", + "_key": "3e23a473fdb1", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "2a515b1e7977" + } + ], + "_type": "block", + "style": "normal", + "_key": "1b5b898d5ac2", + "markDefs": [] + }, + { + "_key": "1b5f2c266f83", + "children": [ + { + "_type": "span", + "text": "", + "_key": "92b5f44c23bf" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "233f6f7832e7", + "markDefs": [ + { + "href": "https://nextflow.io/docs/latest/executor.html", + "_key": "e9ee0789870f", + "_type": "link" + } + ], + "children": [ + { + "text": "Nextflow is exceptionally versatile when it comes to data handling, and can support almost any file system or object store.", + "_key": "9d087ff6ee5a", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_type": "span", + "marks": [], + "text": " Internally, Nextflow uses ", + "_key": "e8422cc528b6" + }, + { + "marks": [ + "e9ee0789870f" + ], + "text": "executors", + "_key": "62a4a865c268", + "_type": "span" + }, + { + "_key": "7eded1448d14", + "_type": "span", + "marks": [], + "text": " implemented as plug-ins to insulate pipeline code from underlying compute and storage environments. This enables pipelines to run without modification across multiple clouds regardless of the underlying storage technology." 
+ } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "b9f2cee19d81", + "children": [ + { + "text": "", + "_key": "abbac4984319", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "ecb08f84abd5", + "markDefs": [ + { + "href": "https://github.com/nextflow-io/nextflow/tree/master/plugins/nf-amazon", + "_key": "3c5128af26e2", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Suppose an S3 bucket is specified as a location for shared data during pipeline execution. In that case, aided by the ", + "_key": "fbb8e0e85817" + }, + { + "_key": "ea7bf4632478", + "_type": "span", + "marks": [ + "3c5128af26e2" + ], + "text": "nf-amazon" + }, + { + "text": " plug-in, Nextflow transparently copies data from the S3 bucket to a file system on a cloud instance. Containerized applications mount the local file system and read and write data directly. Once processing is complete, Nextflow copies data to the shared bucket to be available for the next task. All of this is completely transparent to the pipeline and applications. The same plug-in-based approach is used for other cloud object stores such as Azure BLOBs and Google Cloud Storage.", + "_key": "856a01e6a8e7", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_key": "064ce84daf06", + "children": [ + { + "text": "", + "_key": "f6ba5ae136af", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "h2", + "_key": "b930bf2973f8", + "children": [ + { + "_key": "c973e2b2e8ce", + "_type": "span", + "text": "The Nextflow scratch directive" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "The idea of staging data from shared repositories to a local disk, as described above, is not new. A common practice with HPC clusters when using NFS file systems is to use local "scratch" storage.", + "_key": "1c0a42bbb165", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "e6072722f5a1", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "1137b8caa5a5", + "children": [ + { + "_type": "span", + "text": "", + "_key": "06c465d5ed27" + } + ] + }, + { + "style": "normal", + "_key": "29e33de39775", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "A common problem with shared NFS file systems is that they can be relatively slow — especially when there are multiple clients. File systems introduce latency, have limited IO capacity, and are prone to problems such as “hot spots” and bandwidth limitations when multiple clients read and write files in the same directory.", + "_key": "54a6ee89289a" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "452c0c0dcf2d" + } + ], + "_type": "block", + "style": "normal", + "_key": "8c2c1df40666" + }, + { + "_type": "block", + "style": "normal", + "_key": "688a27a734f1", + "markDefs": [ + { + "_type": "link", + "href": "https://www.mvps.net/docs/how-to-mount-the-physical-memory-from-a-linux-system-as-a-partition/", + "_key": "bb1d97ec5e96" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "To avoid bottlenecks, data is often copied from an NFS filer to local scratch storage for processing. 
Depending on data volumes, users often use fast solid-state drives or ", + "_key": "e903e533e29f" + }, + { + "_key": "bded42b8de47", + "_type": "span", + "marks": [ + "bb1d97ec5e96" + ], + "text": "RAM disks" + }, + { + "text": " for scratch storage to accelerate processing.", + "_key": "7ed48b572c90", + "_type": "span", + "marks": [] + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "171815412434" + } + ], + "_type": "block", + "style": "normal", + "_key": "b32ad172a4a4" + }, + { + "markDefs": [ + { + "href": "https://nextflow.io/docs/latest/process.html?highlight=scratch#scratch", + "_key": "421399f8dcca", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "Nextflow automates this data handling pattern with built-in support for a ", + "_key": "45b5acb25aab", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "421399f8dcca" + ], + "text": "scratch", + "_key": "380e4d5f70d3" + }, + { + "marks": [], + "text": " directive that can be enabled or disabled per process. If scratch is enabled, data is automatically copied to a designated local scratch device prior to processing.", + "_key": "ab39f384cb37", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "762d92553ba5" + }, + { + "_type": "block", + "style": "normal", + "_key": "6678585b57e6", + "children": [ + { + "_key": "57c1b09c041d", + "_type": "span", + "text": "" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "65852ca186ac", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "When high-performance file systems such as Lustre or Spectrum Scale are available, the question of whether to use scratch storage becomes more complicated. Depending on the file system and interconnect, parallel file systems performance can sometimes exceed that of local disk. In these cases, customers may set scratch to false and perform I/O directly on the parallel file system.", + "_key": "d3f880949402", + "_type": "span" + } + ] + }, + { + "children": [ + { + "_key": "12004a8a06a4", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "6ca2759ff9c5" + }, + { + "_key": "574000677ea6", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Results will vary depending on the performance of the shared file system, the speed of local scratch storage, and the amount of shared data to be shuttled back and forth. 
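As a minimal sketch (the process name is hypothetical), scratch can be enabled for a single process by adding the line scratch true inside its definition, or for every process at once by setting process.scratch = true in nextflow.config.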
Users will want to experiment to determine whether enabling scratch benefits pipelines performance.", + "_key": "acc21666d121" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "e041cca31d85", + "children": [ + { + "_key": "9cc9d55e591e", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "4f714fe120e5", + "children": [ + { + "_type": "span", + "text": "Multiple storage options for Nextflow users", + "_key": "6c65564d8d89" + } + ], + "_type": "block", + "style": "h2" + }, + { + "style": "normal", + "_key": "b9f987b8b17b", + "markDefs": [], + "children": [ + { + "_key": "99feb7998a12", + "_type": "span", + "marks": [], + "text": "Storage solutions used with Nextflow can be grouped into five categories as described below:" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "", + "_key": "2b5486f73911", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "2de5191a715b" + }, + { + "_key": "5ad30cf2fcc2", + "listItem": "bullet", + "children": [ + { + "text": "Traditional file systems", + "_key": "c91569d67ff6", + "_type": "span", + "marks": [] + }, + { + "marks": [], + "text": "Cloud object stores", + "_key": "5ca16a7455a2", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": "Cloud file systems", + "_key": "17d1c7631b7d" + }, + { + "text": "High-performance cloud file systems", + "_key": "20ae918e3975", + "_type": "span", + "marks": [] + }, + { + "marks": [], + "text": "Fusion file system v2.0", + "_key": "b280e351a674", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "ba01eded3b5d" + } + ], + "_type": "block", + "style": "normal", + "_key": "ca72ad63a115" + }, + { + "children": [ + { + "marks": [], + "text": "The optimal choice will depend on your environment and the nature of your applications and compute environments.", + "_key": "937a00c428f2", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "5fa0b4908a67", + "markDefs": [] + }, + { + "_key": "ce8c985aabb3", + "children": [ + { + "_type": "span", + "text": "", + "_key": "c44395a5ade2" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Traditional file systems", + "_key": "c3f37b9df1bc" + }, + { + "marks": [], + "text": " — These are file systems typically deployed on-premises that present a POSIX interface. NFS is the most popular choice, but some users may use high-performance parallel file systems. Storage vendors often package their offerings as appliances, making them easier to deploy and manage. 
Solutions common in on-prem HPC environments include ", + "_key": "53638d31561f", + "_type": "span" + }, + { + "_key": "e340406a5ac5", + "_type": "span", + "marks": [ + "d562f186aab2" + ], + "text": "Network Appliance" + }, + { + "_type": "span", + "marks": [], + "text": ", ", + "_key": "6606a1c2a9a5" + }, + { + "_type": "span", + "marks": [ + "ed24907e75c6" + ], + "text": "Data Direct Networks", + "_key": "fef4da4975c5" + }, + { + "marks": [], + "text": " (DDN), ", + "_key": "f4436d49dc7c", + "_type": "span" + }, + { + "marks": [ + "fe0d4e818850" + ], + "text": "HPE Cray ClusterStor", + "_key": "130d64c74fc0", + "_type": "span" + }, + { + "_key": "bb6169c627c8", + "_type": "span", + "marks": [], + "text": ", and " + }, + { + "text": "IBM Storage Scale", + "_key": "29ce17b17dce", + "_type": "span", + "marks": [ + "cbee001225b3" + ] + }, + { + "marks": [], + "text": ". While customers can deploy self-managed NFS or parallel file systems in the cloud, most don’t bother with this in practice. There are generally better solutions available in the cloud.", + "_key": "4ab7256ed015", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "666ad3b69798", + "markDefs": [ + { + "_type": "link", + "href": "https://www.netapp.com/", + "_key": "d562f186aab2" + }, + { + "href": "https://www.ddn.com/", + "_key": "ed24907e75c6", + "_type": "link" + }, + { + "_type": "link", + "href": "https://www.hpe.com/psnow/doc/a00062172enw", + "_key": "fe0d4e818850" + }, + { + "_type": "link", + "href": "https://www.ibm.com/products/storage-scale-system", + "_key": "cbee001225b3" + } + ] + }, + { + "_key": "330bd66284c1", + "children": [ + { + "_type": "span", + "text": "", + "_key": "0aef91bc9128" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "088db8cdb668", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Cloud object stores", + "_key": "239f3215d516" + }, + { + "text": " — In the cloud, object stores tend to be the most popular solution among Nextflow users. Although object stores don’t present a POSIX interface, they are inexpensive, easy to configure, and scale practically without limit. Depending on performance, access, and retention requirements, customers can purchase different object storage tiers at different price points. Popular cloud object stores include Amazon S3, Azure BLOBs, and Google Cloud Storage. As pipelines execute, the Nextflow executors described above manage data transfers to and from cloud object storage automatically. 
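As a hedged example (bucket and parameter names are hypothetical), a run can read from and write to object storage simply by passing object store paths, for instance nextflow run main.nf -work-dir s3://my-bucket/work --reads 's3://my-bucket/data/*.fastq.gz' when targeting a cloud executor such as AWS Batch, without any changes to the pipeline code.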
One drawback is that because of the need to copy data to and from the object store for every process, performance may be lower than a fast shared file system.", + "_key": "fb09d77bfd5d", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "ee76949dc3ff" + } + ], + "_type": "block", + "style": "normal", + "_key": "cda4f0a2dd65" + }, + { + "_type": "block", + "style": "normal", + "_key": "30dc0a610070", + "markDefs": [ + { + "_type": "link", + "href": "https://aws.amazon.com/efs/", + "_key": "d4a51401993a" + }, + { + "_type": "link", + "href": "https://azure.microsoft.com/en-us/products/storage/files/", + "_key": "eb9826013a6a" + }, + { + "_type": "link", + "href": "https://cloud.google.com/filestore", + "_key": "2b870dc66680" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Cloud file systems", + "_key": "43658f4e3ddc" + }, + { + "_type": "span", + "marks": [], + "text": " — Often, it is desirable to have a shared file NFS system. However, these environments can be tedious to deploy and manage in the cloud. Recognizing this, most cloud providers offer cloud file systems that combine some of the best properties of traditional file systems and object stores. These file systems present a POSIX interface and are accessible via SMB and NFS file-sharing protocols. Like object stores, they are easy to deploy and scalable on demand. Examples include ", + "_key": "ff3eb6e0aeb4" + }, + { + "marks": [ + "d4a51401993a" + ], + "text": "Amazon EFS", + "_key": "f7bccbd671e9", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ", ", + "_key": "15e8d3fe5583" + }, + { + "_type": "span", + "marks": [ + "eb9826013a6a" + ], + "text": "Azure Files", + "_key": "f63889afcfe4" + }, + { + "_type": "span", + "marks": [], + "text": ", and ", + "_key": "0bc46253f59d" + }, + { + "_key": "87883e7ef639", + "_type": "span", + "marks": [ + "2b870dc66680" + ], + "text": "Google Cloud Filestore" + }, + { + "_type": "span", + "marks": [], + "text": ". These file systems are described as "serverless" and "elastic" because there are no servers to manage, and capacity scales automatically.", + "_key": "1d54a575745f" + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "283b5356a7f5" + } + ], + "_type": "block", + "style": "normal", + "_key": "9ea83c9d27b0" + }, + { + "markDefs": [ + { + "_key": "2ebc244cd9bf", + "_type": "link", + "href": "https://aws.amazon.com/efs/pricing/" + }, + { + "_key": "1559102899c3", + "_type": "link", + "href": "https://docs.aws.amazon.com/efs/latest/ug/storage-classes.html" + }, + { + "_key": "cf45b481281f", + "_type": "link", + "href": "https://azure.microsoft.com/en-us/pricing/details/storage/files/" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Comparing price and performance can be tricky because cloud file systems are highly configurable. For example, ", + "_key": "47ea26292614" + }, + { + "text": "Amazon EFS", + "_key": "cf97b99892b4", + "_type": "span", + "marks": [ + "2ebc244cd9bf" + ] + }, + { + "_type": "span", + "marks": [], + "text": " is available in ", + "_key": "09a3673a17a5" + }, + { + "_type": "span", + "marks": [ + "1559102899c3" + ], + "text": "four storage classes", + "_key": "b8668800d816" + }, + { + "marks": [], + "text": " – Amazon EFS Standard, Amazon EFS Standard-IA, and two One Zone storage classes – Amazon EFS One Zone and Amazon EFS One Zone-IA. 
Similarly, Azure Files is configurable with ", + "_key": "1d613721d2aa", + "_type": "span" + }, + { + "text": "four different redundancy options", + "_key": "77f887e227ae", + "_type": "span", + "marks": [ + "cf45b481281f" + ] + }, + { + "_key": "c7bcbb66ddb1", + "_type": "span", + "marks": [], + "text": ", and different billing models apply depending on the offer selected. To provide a comparison, Amazon EFS Standard costs $0.08 /GB-Mo in the US East region, which is ~4x more expensive than Amazon S3." + } + ], + "_type": "block", + "style": "normal", + "_key": "2b802f90d60f" + }, + { + "children": [ + { + "_key": "b9fc1e362530", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "6ce913740e3b" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "From the perspective of Nextflow users, using Amazon EFS and similar cloud file systems is the same as using a local NFS system. Nextflow users must ensure that their cloud instances mount the NFS share, so there is slightly more management overhead than using an S3 bucket. Nextflow users and administrators can experiment with the scratch directive governing whether Nextflow stages data in a local scratch area or reads and writes directly to the shared file system.", + "_key": "0eaa9456bf63" + } + ], + "_type": "block", + "style": "normal", + "_key": "575fb8ce32a9" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "53cffa6357c0" + } + ], + "_type": "block", + "style": "normal", + "_key": "46becda37fe9" + }, + { + "style": "normal", + "_key": "949b78c43880", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Cloud file systems suffer from some of the same limitations as on-prem NFS file systems. They often don’t scale efficiently, and performance is limited by network bandwidth. Also, depending on the pipeline, users may need to stage data to the shared file system in advance, often by copying data from an object store used for long term storage.", + "_key": "47a479d1cf68" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "", + "_key": "bc22d2ef17a9", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "c23ff1128259" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://cloud.tower.nf/", + "_key": "79d1887d8eb7" + } + ], + "children": [ + { + "text": "For ", + "_key": "cd37f1f4a63d", + "_type": "span", + "marks": [] + }, + { + "_key": "42cf614dac9d", + "_type": "span", + "marks": [ + "79d1887d8eb7" + ], + "text": "Nextflow Tower" + }, + { + "marks": [], + "text": " users, there is a convenient integration with Amazon EFS. Tower Cloud users can have an Amazon EFS instance created for them automatically via Tower Forge, or they can leverage an existing EFS instance in their compute environment. 
In either case, Tower ensures that the EFS share is available to compute hosts in the AWS Batch environment, reducing configuration requirements.", + "_key": "fd728979d8bd", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "d11cde2c8233" + }, + { + "_key": "2323bccfb8be", + "children": [ + { + "_key": "8d67187c5883", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "96db3ae2e6b6", + "markDefs": [ + { + "_type": "link", + "href": "https://aws.amazon.com/fsx/lustre/", + "_key": "c3bc1358bf6b" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Cloud high-performance file systems", + "_key": "d35a3d12eda7" + }, + { + "_type": "span", + "marks": [], + "text": " — For customers that need high levels of performance in the cloud, Amazon offers Amazon FSx. Amazon FSx comes in different flavors, including NetApp ONTAP, OpenZFS, Windows File Server, and Lustre. In HPC circles, ", + "_key": "a40ab9bada9e" + }, + { + "text": "FSx for Lustre", + "_key": "6e18d9b96aed", + "_type": "span", + "marks": [ + "c3bc1358bf6b" + ] + }, + { + "_key": "2ae2646254a4", + "_type": "span", + "marks": [], + "text": " is most popular delivering sub-millisecond latency, up to 1 TB/sec maximum throughput per file system, and millions of IOPs. Some Nextflow users with data bottlenecks use FSx for Lustre, but it is more difficult to configure and manage than Amazon S3." + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "153cfdcef191" + } + ], + "_type": "block", + "style": "normal", + "_key": "4676216e120b" + }, + { + "markDefs": [], + "children": [ + { + "_key": "3f040585112b", + "_type": "span", + "marks": [], + "text": "Like Amazon EFS, FSx for Lustre is a fully-managed, serverless, elastic file system. Amazon FSx for Lustre is configurable, depending on customer requirements. For example, customers with latency-sensitive applications can deploy FSx cluster nodes with SSD drives. Customers concerned with cost and throughput can select standard hard drives (HDD). HDD-based FSx for Lustre clusters can be optionally configured with an SSD-based cache to accelerate performance. Customers also choose between different persistent file system options and a scratch file system option. Another factor to remember is that with parallel file systems, bandwidth scales with capacity. If you deploy a Lustre file system that is too small, you may be disappointed in the performance." + } + ], + "_type": "block", + "style": "normal", + "_key": "9e2a2fd1fb6b" + }, + { + "_type": "block", + "style": "normal", + "_key": "75f95651fa0f", + "children": [ + { + "_type": "span", + "text": "", + "_key": "6ed52734e401" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "FSx for Lustre persistent file systems ranges from 125 to 1,000 MB/s/TiB at ", + "_key": "c49f174fb406" + }, + { + "text": "prices", + "_key": "753043fb343e", + "_type": "span", + "marks": [ + "5ad0dbe8f16f" + ] + }, + { + "marks": [], + "text": " ranging from ", + "_key": "20bca0622d6a", + "_type": "span" + }, + { + "text": "$0.145", + "_key": "04e410459318", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "text": " to ", + "_key": "d749fff2d6a3", + "_type": "span", + "marks": [] + }, + { + "_key": "11c093fa93fa", + "_type": "span", + "marks": [ + "strong" + ], + "text": "$0.600" + }, + { + "text": " per GB month. 
Amazon also offers a lower-cost scratch FSx for Lustre file systems (not to be confused with the scratch directive in Nextflow). At this tier, FSx for Lustre does not replicate data across availability zones, so it is suited to short-term data storage. Scratch FSx for Lustre storage delivers ", + "_key": "7afc5d707c1c", + "_type": "span", + "marks": [] + }, + { + "_key": "ac0fcb8862cf", + "_type": "span", + "marks": [ + "strong" + ], + "text": "200 MB/s/TiB" + }, + { + "_type": "span", + "marks": [], + "text": ", costing ", + "_key": "6fb731d777b5" + }, + { + "marks": [ + "strong" + ], + "text": "$0.140", + "_key": "87dd40d4a516", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " per GB month. This is ", + "_key": "ac255f175e21" + }, + { + "text": "~75%", + "_key": "51ac87bddbac", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_key": "54bd0144aff0", + "_type": "span", + "marks": [], + "text": " more expensive than Amazon EFS (Standard) and " + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "~6x", + "_key": "d1662a42f60b" + }, + { + "_key": "324a75db56f6", + "_type": "span", + "marks": [], + "text": " the cost of standard S3 storage. Persistent FSx for Lustre file systems configured to deliver " + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "1,000 MB/s/TiB", + "_key": "1fe8f6ee2165" + }, + { + "_type": "span", + "marks": [], + "text": " can be up to ", + "_key": "9b118c1e9eec" + }, + { + "_key": "4876d340b9f8", + "_type": "span", + "marks": [ + "strong" + ], + "text": "~26x" + }, + { + "text": " the price of standard S3 object storage!", + "_key": "6fdf1752b8ae", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "65a68042b927", + "markDefs": [ + { + "_type": "link", + "href": "https://aws.amazon.com/fsx/lustre/pricing/", + "_key": "5ad0dbe8f16f" + } + ] + }, + { + "children": [ + { + "text": "", + "_key": "ca4589c54a73", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "1ff23b1ccb59" + }, + { + "_type": "block", + "style": "normal", + "_key": "21ea1d8584b5", + "markDefs": [ + { + "_key": "d7793f8843b5", + "_type": "link", + "href": "https://www.weka.io/" + } + ], + "children": [ + { + "marks": [ + "strong" + ], + "text": "Hybrid Cloud file systems", + "_key": "7e928b595b8a", + "_type": "span" + }, + { + "marks": [], + "text": " — In addition to the solutions described above, there are other solutions that combine the best of object stores and high-performance parallel file systems. An example is ", + "_key": "359ebb298c07", + "_type": "span" + }, + { + "_key": "61ccdf002e1a", + "_type": "span", + "marks": [ + "d7793f8843b5" + ], + "text": "WekaFS™" + }, + { + "text": " from WEKA. WekaFS is used by several Nextflow users and is deployable on-premises or across your choice cloud platforms. WekaFS is attractive because it provides multi-protocol access to the same data (POSIX, S3, NFS, SMB) while presenting a common namespace between on-prem and cloud resident compute environments. 
Weka delivers the performance benefits of a high-performance parallel file system and optionally uses cloud object storage as a backing store for file system data to help reduce costs.", + "_key": "213a8af5e9ba", + "_type": "span", + "marks": [] + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "e2f48adcfc73" + } + ], + "_type": "block", + "style": "normal", + "_key": "296d2d96fb28" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "From a Nextflow perspective, WekaFS behaves like any other shared file system. As such, Nextflow and Tower have no specific integration with WEKA. Nextflow users will need to deploy and manage WekaFS themselves making the environment more complex to setup and manage. However, the flexibility and performance provided by a hybrid cloud file system makes this worthwhile for many organizations.", + "_key": "2e901d62ffdb", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "5d52e274c929" + }, + { + "children": [ + { + "text": "", + "_key": "c1d3d15f9ad2", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "c2b9b7af28b8" + }, + { + "_key": "953967068b82", + "markDefs": [ + { + "href": "https://seqera.io/fusion", + "_key": "f0cae0d6e931", + "_type": "link" + } + ], + "children": [ + { + "_key": "ee2484e42124", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Fusion file system 2.0" + }, + { + "text": " — Fusion file system is a solution developed by ", + "_key": "c355beff8bb2", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "f0cae0d6e931" + ], + "text": "Seqera Labs", + "_key": "a81f4fbd6b4d" + }, + { + "text": " that aims to bridge the gap between cloud-native storage and data analysis workflows. The solution implements a thin client that allows pipeline jobs to access object storage using a standard POSIX interface, thus simplifying and speeding up most operations.", + "_key": "44e5ee571941", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "bfd28f4bae4d", + "children": [ + { + "_type": "span", + "text": "", + "_key": "939c5637547a" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "href": "https://seqera.io/whitepapers/breakthrough-performance-and-cost-efficiency-with-the-new-fusion-file-system/", + "_key": "d0c9301ea6a0", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "The advantage of the Fusion file system is that there is no need to copy data between S3 and local storage. The Fusion file system driver accesses and manipulates files in Amazon S3 directly. 
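As a rough sketch of the configuration involved (exact settings depend on your environment), enabling it typically amounts to a few lines in nextflow.config such as fusion.enabled = true, wave.enabled = true, and an object store path for workDir, since Fusion relies on the Wave service.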
You can learn more about the Fusion file system and how it works in the whitepaper ", + "_key": "5f782c6a9d0e", + "_type": "span" + }, + { + "marks": [ + "d0c9301ea6a0" + ], + "text": "Breakthrough performance and cost-efficiency with the new Fusion file system", + "_key": "4724d29dbbf0", + "_type": "span" + }, + { + "text": ".", + "_key": "cd6203c659e6", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "5fb20cf6df96" + }, + { + "_type": "block", + "style": "normal", + "_key": "2105f3a80074", + "children": [ + { + "_key": "e905660ae591", + "_type": "span", + "text": "" + } + ] + }, + { + "style": "normal", + "_key": "77ec46186f3e", + "markDefs": [ + { + "href": "https://seqera.io/whitepapers/breakthrough-performance-and-cost-efficiency-with-the-new-fusion-file-system/", + "_key": "5a9fe30f9b87", + "_type": "link" + } + ], + "children": [ + { + "_key": "0030f51a5d6a", + "_type": "span", + "marks": [], + "text": "For sites struggling with performance and scalability issues on shared file systems or object storage, the Fusion file system offers several advantages. " + }, + { + "text": "Benchmarks conducted", + "_key": "5c7220ddaaab", + "_type": "span", + "marks": [ + "5a9fe30f9b87" + ] + }, + { + "_key": "6c672165837b", + "_type": "span", + "marks": [], + "text": " by Seqera Labs have shown that, in some cases, " + }, + { + "_key": "b4d787073182", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Fusion can deliver performance on par with Lustre but at a much lower cost." + }, + { + "_type": "span", + "marks": [], + "text": " Fusion is also significantly easier to configure and manage and can result in lower costs for both compute and storage resources.", + "_key": "1fa28d30450b" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "e99e4311a062", + "children": [ + { + "_type": "span", + "text": "", + "_key": "dea09915adb7" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "Comparing the alternatives", + "_key": "d7a07935147b", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "5106472c51f2" + }, + { + "_key": "93e6ddcba003", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "A summary of storage options is presented in the table below:", + "_key": "7ce0bd3fcfe3", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "a00ee1573f6f", + "children": [ + { + "_key": "b6dd27c0b0e0", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "b353b842b965", + "_type": "block" + }, + { + "children": [ + { + "text": "So what’s the bottom line?", + "_key": "b659e110fd84", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "535de99262ec" + }, + { + "_type": "block", + "style": "normal", + "_key": "7fde6033fa3d", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The choice or storage solution depends on several factors. Object stores like Amazon S3 are popular because they are convenient and inexpensive. 
However, depending on data access patterns, and the amount of data to be staged in advance, file systems such as EFS, Azure Files or FSx for Lustre can also be a good alternative.", + "_key": "9fef3d3d4fbb" + } + ] + }, + { + "children": [ + { + "_key": "a707747fb405", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "9023f8b6c875" + }, + { + "children": [ + { + "marks": [], + "text": "For many Nextflow users, Fusion file system will be a better option since it offers performance comparable to a high-performance file system at the cost of cloud object storage. Fusion is also dramatically easier to deploy and manage. ", + "_key": "5dd52baf2a9d", + "_type": "span" + }, + { + "text": "Adding Fusion support", + "_key": "e2e03c6ba338", + "_type": "span", + "marks": [ + "200a2ff7afd9" + ] + }, + { + "marks": [], + "text": " is just a matter of adding a few lines to the ", + "_key": "63ca5d31b6b6", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "nextflow.config", + "_key": "546c7fe61657" + }, + { + "_key": "6c98e446b56a", + "_type": "span", + "marks": [], + "text": " file." + } + ], + "_type": "block", + "style": "normal", + "_key": "b4c816fe7e19", + "markDefs": [ + { + "_type": "link", + "href": "https://nextflow.io/docs/latest/fusion.html", + "_key": "200a2ff7afd9" + } + ] + }, + { + "_key": "689bc5089183", + "children": [ + { + "text": "", + "_key": "e9ff0e7621f5", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Where workloads run is also an important consideration. For example, on-premises clusters will typically use whatever shared file system is available locally. When operating in the cloud, you can choose whether to use cloud file systems, object stores, high-performance file systems, Fusion FS, or hybrid cloud solutions such as Weka.", + "_key": "f54d8bfe630b", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "7a308e0f0570" + }, + { + "style": "normal", + "_key": "a90f0fe4f383", + "children": [ + { + "_key": "37e17f06cd13", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "f382789ac04f", + "markDefs": [ + { + "_key": "9ea9d9d78876", + "_type": "link", + "href": "https://nextflow.slack.com/" + } + ], + "children": [ + { + "_key": "01ddd205d2d0", + "_type": "span", + "marks": [], + "text": "Still unsure what storage solution will best meet your needs? Consider joining our community at " + }, + { + "marks": [ + "9ea9d9d78876" + ], + "text": "nextflow.slack.com", + "_key": "792c90c8322c", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ". 
You can engage with others, post technical questions, and learn more about the pros and cons of the storage solutions described above.", + "_key": "bfcc3c7829f1" + } + ], + "_type": "block", + "style": "normal" + } + ], + "meta": { + "slug": { + "current": "selecting-the-right-storage-architecture-for-your-nextflow-pipelines" + } + } + }, + { + "publishedAt": "2019-07-01T06:00:00.000Z", + "_updatedAt": "2024-09-26T09:02:17Z", + "_createdAt": "2024-09-25T14:15:46Z", + "author": { + "_type": "reference", + "_ref": "evan-floden" + }, + "_type": "blogPost", + "tags": [ + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "e63c779fa76c" + } + ], + "body": [ + { + "_key": "93ab01662de4", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "em" + ], + "text": "This two-part blog aims to help users understand Nextflow’s powerful caching mechanism. Part one describes how it works whilst part two will focus on execution provenance and troubleshooting. You can read part one [here](/blog/2019/demystifying-nextflow-resume.html)", + "_key": "4b594e5cec4a" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "632ea3464c2b" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "cfbea7ac452b", + "children": [ + { + "text": "", + "_key": "a984638cee7a", + "_type": "span" + } + ] + }, + { + "children": [ + { + "text": "Troubleshooting resume", + "_key": "ee4a3cbe5013", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "8092ec95ec33" + }, + { + "children": [ + { + "text": "If your workflow execution is not resumed as expected, there exists several strategies to debug the problem.", + "_key": "e9dc9837eaf7", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "5f2c3a7a32d0", + "markDefs": [] + }, + { + "style": "normal", + "_key": "271581b91f27", + "children": [ + { + "text": "", + "_key": "2a841fdb6483", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "Modified input file(s)", + "_key": "989251e9fbb4" + } + ], + "_type": "block", + "style": "h4", + "_key": "cd829cbe2975" + }, + { + "markDefs": [], + "children": [ + { + "_key": "8f1f55ed392b", + "_type": "span", + "marks": [], + "text": "Make sure that there has been no change in your input files. Don’t forget the unique task hash is computed by taking into account the complete file path, the last modified timestamp and the file size. If any of these change, the workflow will be re-executed, even if the input content is the same." + } + ], + "_type": "block", + "style": "normal", + "_key": "0122af7d4ab2" + }, + { + "children": [ + { + "_key": "ddc6c3161be5", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "fddaa5d71381" + }, + { + "style": "h4", + "_key": "a99a4d38dcbb", + "children": [ + { + "_key": "56b44c1b257f", + "_type": "span", + "text": "A process modifying one or more inputs" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "10c3ecb67377", + "markDefs": [], + "children": [ + { + "_key": "e1a6543ab5a2", + "_type": "span", + "marks": [], + "text": "A process should never alter input files. When this happens, the future execution of tasks will be invalidated for the same reason explained in the previous point." 
+ } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "85b0ec4c7ed6", + "children": [ + { + "_type": "span", + "text": "", + "_key": "8b2470b70a1c" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "h4", + "_key": "59e23f2fa8e8", + "children": [ + { + "_type": "span", + "text": "Inconsistent input file attributes", + "_key": "27f9e053fcf3" + } + ] + }, + { + "_key": "ff7749c84b75", + "markDefs": [ + { + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/process.html#cache", + "_key": "c533da84387a" + } + ], + "children": [ + { + "_key": "24ac4189bb62", + "_type": "span", + "marks": [], + "text": "Some shared file system, such as NFS, may report inconsistent file timestamp i.e. a different timestamp for the same file even if it has not been modified. There is an option to use the " + }, + { + "marks": [ + "c533da84387a" + ], + "text": "lenient mode of caching", + "_key": "ad03fa681369", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " to avoid this problem.", + "_key": "af374dd7e209" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "663560aeee65" + } + ], + "_type": "block", + "style": "normal", + "_key": "14f8d8644f3d" + }, + { + "_type": "block", + "style": "h4", + "_key": "0432456066e5", + "children": [ + { + "_key": "921de65e0036", + "_type": "span", + "text": "Race condition in a global variable" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "f7c77a880ad6", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Nextflow does its best to simplify parallel programming and to prevent race conditions and the access of shared resources. One of the few cases in which a race condition may arise is when using a global variable with two (or more) operators. For example:", + "_key": "bf4515d87054", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "fa671714d4c6", + "children": [ + { + "text": "", + "_key": "67643d07fb03", + "_type": "span" + } + ], + "_type": "block" + }, + { + "code": "Channel\n .from(1,2,3)\n .map { it -> X=it; X+=2 }\n .println { \"ch1 = $it\" }\n\nChannel\n .from(1,2,3)\n .map { it -> X=it; X*=2 }\n .println { \"ch2 = $it\" }", + "_type": "code", + "_key": "2c1290be8c8f" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "397140ecc64e" + } + ], + "_type": "block", + "style": "normal", + "_key": "29fed35e8d6d" + }, + { + "_key": "b870cc3c0b59", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The problem with this snippet is that the ", + "_key": "f40d89920ebc", + "_type": "span" + }, + { + "marks": [ + "code" + ], + "text": "X", + "_key": "17d3115ae19c", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " variable in the closure definition is defined in the global scope. 
Since operators are executed in parallel, the ", + "_key": "026592c4fc12" + }, + { + "text": "X", + "_key": "581fce320b1b", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " value can, therefore, be overwritten by the other ", + "_key": "bbdb2ec8bc0c" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "map", + "_key": "06cea9d5048b" + }, + { + "text": " invocation.", + "_key": "3c7bc38c26d5", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "61b0e73179c1", + "children": [ + { + "_type": "span", + "text": "", + "_key": "8644812eef91" + } + ] + }, + { + "_key": "db81b2c1c6a2", + "markDefs": [], + "children": [ + { + "_key": "19f092d5c2fe", + "_type": "span", + "marks": [], + "text": "The correct implementation requires the use of the " + }, + { + "marks": [ + "code" + ], + "text": "def", + "_key": "740fe8b6461e", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " keyword to declare the variable local.", + "_key": "88e7c8be3a52" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "4fa9e6899043" + } + ], + "_type": "block", + "style": "normal", + "_key": "d53d557d4b40" + }, + { + "_key": "e2873d09bf0e", + "code": "Channel\n .from(1,2,3)\n .map { it -> def X=it; X+=2 }\n .view { \"ch1 = $it\" }\n\nChannel\n .from(1,2,3)\n .map { it -> def X=it; X*=2 }\n .view { \"ch2 = $it\" }", + "_type": "code" + }, + { + "style": "normal", + "_key": "ed937b3d5fd6", + "children": [ + { + "_type": "span", + "text": "", + "_key": "be3200ae219f" + } + ], + "_type": "block" + }, + { + "_key": "ce3ead16fad8", + "children": [ + { + "_type": "span", + "text": "Non-deterministic input channels", + "_key": "257f3c5805ca" + } + ], + "_type": "block", + "style": "h4" + }, + { + "children": [ + { + "_key": "a8699e165eb4", + "_type": "span", + "marks": [], + "text": "While dataflow channel ordering is guaranteed i.e. data is read in the same order in which it’s written in the channel, when a process declares as input two or more channels, each of which is the output of a different process, the overall input ordering is not consistent across different executions." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "e4d98470341a", + "markDefs": [] + }, + { + "_key": "fc479da15206", + "children": [ + { + "_type": "span", + "text": "", + "_key": "18846141ce96" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "2ba566425557", + "markDefs": [], + "children": [ + { + "_key": "9657a3324e44", + "_type": "span", + "marks": [], + "text": "Consider the following snippet:" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "9bf16d6bfcf7", + "children": [ + { + "text": "", + "_key": "2458b7adcaa1", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "code", + "_key": "c1c4704a211a", + "code": "process foo {\n input: set val(pair), file(reads) from reads_ch\n output: set val(pair), file('*.bam') into bam_ch\n \"\"\"\n your_command --here\n \"\"\"\n}\n\nprocess bar {\n input: set val(pair), file(reads) from reads_ch\n output: set val(pair), file('*.bai') into bai_ch\n \"\"\"\n other_command --here\n \"\"\"\n}\n\nprocess gather {\n input:\n set val(pair), file(bam) from bam_ch\n set val(pair), file(bai) from bai_ch\n \"\"\"\n merge_command $bam $bai\n \"\"\"\n}" + }, + { + "_key": "668c6cbe53a3", + "children": [ + { + "_type": "span", + "text": "", + "_key": "705e5fbe21d3" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "02910660f82e", + "markDefs": [], + "children": [ + { + "_key": "91776f3c10fd", + "_type": "span", + "marks": [], + "text": "The inputs declared in the gather process can be delivered in any order as the execution order of the process " + }, + { + "text": "foo", + "_key": "105067939604", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "fcc0002a7d9e" + }, + { + "text": "bar", + "_key": "a3d5b494a0ea", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " is not deterministic due to parallel executions.", + "_key": "11b0a7fbec1e" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "e7ee6bb71f92", + "children": [ + { + "_key": "b438aeab60d5", + "_type": "span", + "text": "" + } + ] + }, + { + "children": [ + { + "_key": "faa226c1e79b", + "_type": "span", + "marks": [], + "text": "Therefore, the input of the third process needs to be synchronized using the " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "join", + "_key": "8d6901259fb5" + }, + { + "_type": "span", + "marks": [], + "text": " operator or a similar approach. 
The third process should be written as:", + "_key": "3f1c720c36f8" + } + ], + "_type": "block", + "style": "normal", + "_key": "70f066ffe56b", + "markDefs": [] + }, + { + "children": [ + { + "_key": "cf99a817e2e5", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "e92e8694ddbc" + }, + { + "code": "process gather {\n input:\n set val(pair), file(bam), file(bai) from bam_ch.join(bai_ch)\n \"\"\"\n merge_command $bam $bai\n \"\"\"\n}", + "_type": "code", + "_key": "6d66a6bebb3c" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "8da7f45cf14a" + } + ], + "_type": "block", + "style": "normal", + "_key": "0a780c208e99" + }, + { + "_key": "60bb7887d8a9", + "children": [ + { + "text": "Still in trouble?", + "_key": "49df38d17e1e", + "_type": "span" + } + ], + "_type": "block", + "style": "h4" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "These are most frequent causes of problems with the Nextflow resume mechanism. If you are still not able to resolve your problem, identify the first process not resuming correctly, then run your script twice using ", + "_key": "98bd862104f3" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "-dump-hashes", + "_key": "7fae0179ed9a" + }, + { + "_key": "b55e27a934de", + "_type": "span", + "marks": [], + "text": ". You can then compare the resulting " + }, + { + "marks": [ + "code" + ], + "text": ".nextflow.log", + "_key": "5ed4757372e8", + "_type": "span" + }, + { + "text": " files (the first will be named ", + "_key": "15ae5d359e41", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": ".nextflow.log.1", + "_key": "77eb378c5800" + }, + { + "_type": "span", + "marks": [], + "text": ").", + "_key": "9bf794eace3c" + } + ], + "_type": "block", + "style": "normal", + "_key": "c88c6c18fca3" + }, + { + "style": "normal", + "_key": "9d83045d78f5", + "children": [ + { + "text": "", + "_key": "8deb4b6d2790", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Unfortunately, the information reported by ", + "_key": "f1a9b89a7b6a" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "-dump-hashes", + "_key": "e0b81bc22ba1" + }, + { + "_type": "span", + "marks": [], + "text": " can be quite cryptic, however, with the help of a good ", + "_key": "f6d06fe5b7f9" + }, + { + "marks": [ + "em" + ], + "text": "diff", + "_key": "282d61459e2c", + "_type": "span" + }, + { + "marks": [], + "text": " tool it is possible to compare the two log files to identify the reason for the cache to be invalidated.", + "_key": "438c968e6d8b", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "cb1c188235e6" + }, + { + "style": "normal", + "_key": "f920ccd53a13", + "children": [ + { + "text": "", + "_key": "df6696f9a801", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "The golden rule", + "_key": "4b97f8058607", + "_type": "span" + } + ], + "_type": "block", + "style": "h4", + "_key": "7ec5a36569df" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Never try to debug this kind of problem with production data! 
This issue can be annoying, but when it happens it should be able to be replicated in a consistent manner with any data.", + "_key": "7895695d4c48" + } + ], + "_type": "block", + "style": "normal", + "_key": "fe612b73bb56", + "markDefs": [] + }, + { + "_key": "a464230b70cd", + "children": [ + { + "_type": "span", + "text": "", + "_key": "8c79b84af40e" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "4956592d794e", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Therefore, we always suggest Nextflow developers include in their pipeline project a small synthetic dataset to easily execute and test the complete pipeline execution in a few seconds. This is the golden rule for debugging and troubleshooting execution problems avoids getting stuck with production data.", + "_key": "375973649f32" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "5ea1faf82037", + "children": [ + { + "_key": "09868bd33e28", + "_type": "span", + "text": "" + } + ] + }, + { + "style": "h4", + "_key": "14c5fa6306ae", + "children": [ + { + "_type": "span", + "text": "Resume by default?", + "_key": "b560e6b475c8" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "84aa3f14e3f0", + "_type": "span", + "marks": [], + "text": "Given the majority of users always apply resume, we recently discussed having resume applied by the default." + } + ], + "_type": "block", + "style": "normal", + "_key": "3dcaba20c7cf", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "ab618ec89a3f" + } + ], + "_type": "block", + "style": "normal", + "_key": "7ee0e9156853" + }, + { + "style": "normal", + "_key": "8d53df890b9f", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Is there any situation where you do not use resume? Would a flag specifying ", + "_key": "f44b2c3f7c4b", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "-no-cache", + "_key": "def3d387c75a" + }, + { + "marks": [], + "text": " be enough to satisfy these use cases?", + "_key": "63aa85dca53c", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "486e722700f2", + "children": [ + { + "_type": "span", + "text": "", + "_key": "c25e62877b2d" + } + ] + }, + { + "_key": "7798ac9f5a8a", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "We want to hear your thoughts on this. Help steer Nextflow development and vote in the twitter poll below.", + "_key": "d1a831632f3b" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "fa8cbb3425e5", + "children": [ + { + "text": "", + "_key": "a9f935c549fd", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "6a9bb3978632", + "markDefs": [ + { + "_type": "link", + "href": "https://twitter.com/nextflowio/status/1145599932268785665?ref_src=twsrc%5Etfw", + "_key": "cafd4744d3f5" + } + ], + "children": [ + { + "marks": [], + "text": "> Should -resume⏯️ be the default when launching a Nextflow pipeline? 
> > — Nextflow (@nextflowio) ", + "_key": "d0fa4eab4ec8", + "_type": "span" + }, + { + "text": "July 1, 2019", + "_key": "0bca7d45cdf0", + "_type": "span", + "marks": [ + "cafd4744d3f5" + ] + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "6aa4ff95e0d8", + "children": [ + { + "_type": "span", + "text": "", + "_key": "9ba69409bf4a" + } + ] + }, + { + "_key": "cc8b3ae05f2f", + "src": "https://platform.twitter.com/widgets.js", + "_type": "script", + "id": "" + }, + { + "_type": "block", + "_key": "18e32644cad6" + } + ], + "meta": { + "slug": { + "current": "troubleshooting-nextflow-resume" + } + }, + "_rev": "Ot9x7kyGeH5005E3MIwj0a", + "title": "Troubleshooting Nextflow resume", + "_id": "2c282a52dee5" + }, + { + "_id": "3210507ef7fa", + "title": "Nextflow's colorful new console output", + "author": { + "_type": "reference", + "_ref": "drafts.phil-ewels" + }, + "meta": { + "description": "Nextflow is a command-line interface (CLI) tool that runs in the terminal. Everyone who has launched Nextflow from the command line knows what it’s like to follow the console output as a pipeline runs: the excitement of watching jobs zipping off as they’re submitted, the satisfaction of the phrase “Pipeline completed successfully!” and occasionally, the sinking feeling of seeing an error message.", + "slug": { + "current": "nextflow-colored-logs" + } + }, + "_type": "blogPost", + "_updatedAt": "2024-10-04T15:11:12Z", + "body": [ + { + "_type": "block", + "style": "normal", + "_key": "2cceea31d746", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow is a command-line interface (CLI) tool that runs in the terminal. Everyone who has launched Nextflow from the command line knows what it’s like to follow the console output as a pipeline runs: the excitement of watching jobs zipping off as they’re submitted, the satisfaction of the phrase ", + "_key": "93dd136fee87" + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "\"Pipeline completed successfully!\"", + "_key": "23a1705fc8a8" + }, + { + "_type": "span", + "marks": [], + "text": " and occasionally, the sinking feeling of seeing an error message.", + "_key": "3e4c01e7bb8c" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "91928470bf42" + } + ], + "_type": "block", + "style": "normal", + "_key": "14e80afbe07e" + }, + { + "_key": "a3df7e313994", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Because the CLI is the primary way that people interact with Nextflow, a little bit of polish can have a big effect. 
In this article, I’m excited to describe an upgrade for the console output that should make monitoring workflow progress just a little easier.", + "_key": "749ec7d8d20d", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "69e1a35f3d66", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "ecf1e96c62cd", + "_type": "span" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "The new functionality is available in ", + "_key": "b59c634ee5d1", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "24.02-0-edge", + "_key": "0ce758759f0a" + }, + { + "_key": "29f533117330", + "_type": "span", + "marks": [], + "text": " and will be included in the next " + }, + { + "text": "24.04.0", + "_key": "2ab18c1191b7", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " stable release. You can try it out now by updating Nextflow as follows:", + "_key": "882c292ac7a1" + } + ], + "_type": "block", + "style": "normal", + "_key": "5fa497d423e5" + }, + { + "_type": "block", + "style": "normal", + "_key": "5e5daac79d6c", + "markDefs": [], + "children": [ + { + "_key": "0cfbb9a81c69", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "code": "NXF_EDGE=1 nextflow self-update", + "_type": "code", + "_key": "a4767acdf80d" + }, + { + "_type": "block", + "style": "normal", + "_key": "3722325771df", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "4699ef92051a" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Background", + "_key": "f99ff1cc007a" + } + ], + "_type": "block", + "style": "h2", + "_key": "b1385e63c637" + }, + { + "_type": "block", + "style": "normal", + "_key": "7d03b6df37d4", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The Nextflow console output hasn’t changed much over the 10 years that it’s been around. The biggest update happened in 2018 when "ANSI logging" was released in version ", + "_key": "60629d7631b5", + "_type": "span" + }, + { + "_key": "ee10d920672b", + "_type": "span", + "marks": [ + "code" + ], + "text": "18.10.0" + }, + { + "marks": [], + "text": ". This replaced the stream of log messages announcing each task submission with a view that updates dynamically, giving an overview of each process. This gives an overview of the pipeline’s progress rather than being swamped with thousands of individual task submissions.", + "_key": "e6704d24e93f", + "_type": "span" + } + ] + }, + { + "_key": "0f9d24597ed1", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "5e712aaad2d3", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "image", + "alt": "Nextflow console output with and without ANSI logging", + "_key": "f86d58e5d502", + "asset": { + "_ref": "image-4015d6e188bb46dea4d70532249976f412e5fbae-4532x2026-png", + "_type": "reference" + } + }, + { + "style": "normal", + "_key": "6fdd22ca844d", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "em" + ], + "text": "ANSI console output. 
Nextflow log output from running the nf-core/rnaseq pipeline before (Left) and after (Right) enabling ANSI logging.", + "_key": "f856ec5b997d" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "d0da24c1564e" + } + ], + "_type": "block", + "style": "normal", + "_key": "a5d6cfcd8cdc" + }, + { + "_type": "block", + "style": "normal", + "_key": "4778375bbd81", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "64e32e99c6e6" + } + ] + }, + { + "_key": "db4b773c0ce0", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/Textualize/rich", + "_key": "49770632ec38" + }, + { + "href": "https://github.com/ewels/rich-click/", + "_key": "bdaacd6af1a9", + "_type": "link" + }, + { + "href": "https://github.com/ewels/rich-codex", + "_key": "6559fe6b7114", + "_type": "link" + }, + { + "_type": "link", + "href": "https://github.com/nextflow-io/nextflow/issues/3976", + "_key": "0197c1a1df4a" + }, + { + "_type": "link", + "href": "https://github.com/nextflow-io/nextflow/issues/3976#issuecomment-1568071479", + "_key": "f6725976b0d7" + } + ], + "children": [ + { + "text": "I can be a little obsessive about tool user interfaces. The nf-core template, as well as MultiQC and nf-core/tools all have coloured terminal output, mostly using the excellent ", + "_key": "f6f7044f202f", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "49770632ec38" + ], + "text": "textualize/rich", + "_key": "62498605b6a5", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ". I’ve also written a couple of general-use tools around this such as ", + "_key": "c6bf797d17c4" + }, + { + "_key": "71f41f9a7a9b", + "_type": "span", + "marks": [ + "bdaacd6af1a9" + ], + "text": "ewels/rich-click" + }, + { + "marks": [], + "text": " for Python CLI help texts, and ", + "_key": "90bc461c9e17", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "6559fe6b7114" + ], + "text": "ewels/rich-codex", + "_key": "6e03857ccadd" + }, + { + "text": " to auto-generate screenshots from code / commands in markdown. The problem with being surrounded by so much colored CLI output is that any tools ", + "_key": "c14b9e401175", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "em" + ], + "text": "without", + "_key": "c5b46b190bca", + "_type": "span" + }, + { + "text": " colors start to stand out. Dropping hints to the Nextflow team didn’t work, so eventually I whipped up ", + "_key": "83ae03c62a8d", + "_type": "span", + "marks": [] + }, + { + "_key": "d2340bfb5687", + "_type": "span", + "marks": [ + "0197c1a1df4a" + ], + "text": "a proposal" + }, + { + "_key": "b26e659cf56a", + "_type": "span", + "marks": [], + "text": " of what the console output could look like using the tools I knew: Python and Rich. Paolo knows me well and " + }, + { + "_key": "ffac5c00f92f", + "_type": "span", + "marks": [ + "f6725976b0d7" + ], + "text": "offered up a bait" + }, + { + "_type": "span", + "marks": [], + "text": " that I couldn’t resist: ", + "_key": "7bd61d55c11e" + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "\"Phil. 
I think this a great opportunity to improve your Groovy skills 😆\".", + "_key": "3c61b16df3b9" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "07299ee745e8", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "15647d65e6cc", + "_type": "span", + "marks": [] + } + ] + }, + { + "style": "h2", + "_key": "f545fe0c3436", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Showing what’s important", + "_key": "a0e7d5e753a1", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The console output shown by Nextflow describes a range of information. Much of it aligns in vertical columns, but not all. There’s also a variety of fields, some of which are more important than others to see at a glance.", + "_key": "cbf060d88f44" + } + ], + "_type": "block", + "style": "normal", + "_key": "0906e069e392" + }, + { + "_type": "block", + "style": "normal", + "_key": "63f5577d02b5", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "9a12bff875a1", + "_type": "span" + } + ] + }, + { + "_type": "image", + "alt": "New coloured output from Nextflow", + "_key": "c15e29625064", + "asset": { + "_ref": "image-aca8082c7fcd2be86b3cbd8d29611c81c8127620-2532x1577-png", + "_type": "reference" + } + }, + { + "_key": "b6dc4faef5f6", + "markDefs": [], + "children": [ + { + "_key": "bdf050d4eca9", + "_type": "span", + "marks": [ + "em" + ], + "text": "Introducing: colored console output. Output from running nf-core/rnaseq with the new colors applied (nf-core header removed for clarity)." + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "68894e070af7", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "bdb8cf95c127" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "02f4f28130c6", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "8e71c9afefa2", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "605554477f37", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "With some judicious use of the ", + "_key": "e93b34ffe144" + }, + { + "marks": [ + "code" + ], + "text": "dim", + "_key": "283d07e9df8f", + "_type": "span" + }, + { + "_key": "afa816e8339d", + "_type": "span", + "marks": [], + "text": " style, we can make less important information fade into the background. For example, the "stem" of the fully qualified process identifiers now step back to allow the process name to stand out. Secondary information such as the number of tasks that were cached, or the executor that is being submitted to, are still there to see but take a back seat. Doing the reverse with some " + }, + { + "_key": "dc635918a628", + "_type": "span", + "marks": [ + "code" + ], + "text": "bold" + }, + { + "marks": [], + "text": " text helps to highlight the run name – key information for identifying and resuming pipeline runs. Using color allows different fields to be easily distinguished, such as process labels and task hashes. 
Greens, blues, and reds in the task statuses allow a reader to get an impression of the run progress without needing to read every number.", + "_key": "73a193752f24", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "083ab90d5fc3", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "9a15250a9228", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "marks": [], + "text": "Probably the most difficult aspect technically was the ", + "_key": "ef23e6974a8c", + "_type": "span" + }, + { + "_key": "4dea47e27329", + "_type": "span", + "marks": [ + "code" + ], + "text": "NEXTFLOW" + }, + { + "marks": [], + "text": " header line. I knew I wanted to use the ", + "_key": "b6e035f994cf", + "_type": "span" + }, + { + "_key": "ad118d3bf699", + "_type": "span", + "marks": [ + "em" + ], + "text": "\"Nextflow Green\"" + }, + { + "_type": "span", + "marks": [], + "text": " here, or as close to it as possible. But colors in the terminal are tricky. What the ANSI standard defines as ", + "_key": "1e34802dadc3" + }, + { + "marks": [ + "code" + ], + "text": "green", + "_key": "5b4d4d88a41a", + "_type": "span" + }, + { + "_key": "0dbb6942e5d0", + "_type": "span", + "marks": [], + "text": ", " + }, + { + "text": "black", + "_key": "5293695485c8", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "text": ", and ", + "_key": "15a1ce43a090", + "_type": "span", + "marks": [] + }, + { + "_key": "84859c6470d1", + "_type": "span", + "marks": [ + "code" + ], + "text": "blue" + }, + { + "_key": "2e99c6824387", + "_type": "span", + "marks": [], + "text": " can vary significantly across different systems and terminal themes. Some people use a light color scheme and others run in dark mode. This hadn’t mattered much for most of the colors up until this point - I could use the " + }, + { + "_type": "span", + "marks": [ + "5890a980513e" + ], + "text": "Jansi", + "_key": "732986f78a29" + }, + { + "marks": [], + "text": " library to use named colors and they should look ok. But for the specific RGB of the ", + "_key": "04274df0030d", + "_type": "span" + }, + { + "_key": "da6094083dcc", + "_type": "span", + "marks": [ + "em" + ], + "text": "\"Nextflow Green\"" + }, + { + "marks": [], + "text": " I had to ", + "_key": "a271b04512f3", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "6db61913f7a2" + ], + "text": "hardcode specific ANSI control characters", + "_key": "40e68f248356" + }, + { + "text": ". But it got worse - it turns out that the default Terminal app that ships with macOS only supports 256 colors, so I had to find the closest match (", + "_key": "920cddbced40", + "_type": "span", + "marks": [] + }, + { + "text": "\"light sea green\"", + "_key": "8bb8978a245d", + "_type": "span", + "marks": [ + "em" + ] + }, + { + "_key": "c1b29d02c35e", + "_type": "span", + "marks": [], + "text": " if you’re curious). Even once the green was ok, using " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "black", + "_key": "1e2fcd727ff0" + }, + { + "_type": "span", + "marks": [], + "text": " as the text color meant that it would actually render as white with some terminal color themes and be unreadable. 
In the end, the header text is a very dark gray.", + "_key": "c15810f5d682" + } + ], + "_type": "block", + "style": "normal", + "_key": "386ae756f6fa", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/fusesource/jansi", + "_key": "5890a980513e" + }, + { + "href": "https://github.com/nextflow-io/nextflow/blob/c9c7032c2e34132cf721ffabfea09d893adf3761/modules/nextflow/src/main/groovy/nextflow/cli/CmdRun.groovy#L379-L389", + "_key": "6db61913f7a2", + "_type": "link" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "d469a86d64a4", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "6d1cbb5b6bba" + } + ] + }, + { + "_type": "image", + "alt": "Testing many horrible terminal themes", + "_key": "5009c3867293", + "asset": { + "_type": "reference", + "_ref": "image-925712beff7bb566147747f56c5da9d15991c91e-2382x1378-png" + } + }, + { + "_key": "95d4cd9b2c45", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "em" + ], + "text": "Testing color rendering across a wide range of themes in the OS X Terminal app.", + "_key": "1c8174de0ec5" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "ea81db3baf81", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "31b8c6019fbe" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "0d4eb04267b4" + } + ], + "_type": "block", + "style": "normal", + "_key": "95a1c08976aa", + "markDefs": [] + }, + { + "_type": "block", + "style": "h2", + "_key": "56088a39723f", + "markDefs": [], + "children": [ + { + "text": "More than just colors", + "_key": "4bc13a1743a9", + "_type": "span", + "marks": [] + } + ] + }, + { + "_key": "f7ffa651f4b5", + "markDefs": [], + "children": [ + { + "_key": "a8b2ee89a0d9", + "_type": "span", + "marks": [], + "text": "Whilst the original intent was focused on using color, it didn’t take long to come up with a shortlist of other niggles that I wanted to fix. I took this project as an opportunity to address a few of these, specifically:" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "3c7d6e512cac", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "de1fc1cb92fe", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "70904cbccd64", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "286b0cebf88f0", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Make the most of the available width in the terminal:" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_key": "fb059d3f78e40", + "_type": "span", + "marks": [], + "text": "Redundant text is now cut down when the screen is narrow. Specifically the repeated " + }, + { + "_key": "fb059d3f78e41", + "_type": "span", + "marks": [ + "code" + ], + "text": "process >" + }, + { + "_key": "fb059d3f78e42", + "_type": "span", + "marks": [], + "text": " text, plus other small gains such as replacing the three " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "...", + "_key": "fb059d3f78e43" + }, + { + "marks": [], + "text": " characters with a single ", + "_key": "fb059d3f78e44", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "…", + "_key": "fb059d3f78e45" + }, + { + "text": " character. 
The percentage-complete is removed if the window is really narrow. These changes happen dynamically every time the screen refreshes, so should update if you resize the terminal window.\n\n", + "_key": "fb059d3f78e46", + "_type": "span", + "marks": [] + } + ], + "level": 2, + "_type": "block", + "style": "normal", + "_key": "a12a76069ffb", + "listItem": "bullet" + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "fb9fa7f012ce0", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Be more selective about which part of process names are truncated:" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "53f9a674b4d3" + }, + { + "children": [ + { + "text": "There’s only so much width that can be saved, and fully qualified process names are long. The current Nextflow console output truncates the end of the identifier if there’s no space, but this is the part that varies most between pipeline steps. Instead, we can truncate the start and preserve the process name and label.\n\n", + "_key": "e2d6449fb0a50", + "_type": "span", + "marks": [] + } + ], + "level": 2, + "_type": "block", + "style": "normal", + "_key": "8f3a5f3dbe5e", + "listItem": "bullet", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_key": "24210723aa910", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Don’t show all pending processes without tasks:" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "e9b1b1ebcd0d", + "listItem": "bullet" + }, + { + "_type": "block", + "style": "normal", + "_key": "97c67a113c38", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The existing ANSI logging shows ", + "_key": "6688e2628f800" + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "all", + "_key": "6688e2628f801" + }, + { + "_type": "span", + "marks": [], + "text": " processes in the pipeline, even those that haven’t had any tasks submitted. If a pipeline has a lot of processes this can push the running processes out of view.", + "_key": "6688e2628f802" + } + ], + "level": 2 + }, + { + "markDefs": [], + "children": [ + { + "text": "Nextflow now tracks the number of available rows in the terminal and hides pending processes once we run out of space. 
Running processes are always printed.", + "_key": "59def1f024380", + "_type": "span", + "marks": [] + } + ], + "level": 2, + "_type": "block", + "style": "normal", + "_key": "997985efd10e", + "listItem": "bullet" + }, + { + "style": "normal", + "_key": "1db92fad201d", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "eac9a108248c" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "5c5de7e98d7d", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The end result is console output that makes the most of the available space in your terminal window:", + "_key": "a418f3086855", + "_type": "span" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "94b78ed301fa" + } + ], + "_type": "block", + "style": "normal", + "_key": "3798fa25cad2" + }, + { + "_type": "image", + "alt": "Nextflow console output at different terminal window widths", + "_key": "ea236e6b58d4", + "asset": { + "_ref": "image-a769af04b6ec5811297b9124eab2bdf5e941dfe3-2650x2130-png", + "_type": "reference" + } + }, + { + "_type": "block", + "style": "normal", + "_key": "a627f22d8f4a", + "markDefs": [], + "children": [ + { + "_key": "fa01d037006d", + "_type": "span", + "marks": [ + "em" + ], + "text": "Progress of the nf-core/rnaseq shown across 3 different terminal-width breakpoints, with varying levels of text truncation." + } + ] + }, + { + "_key": "583700f8cd36", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "6441d88927a6" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "c6c374866825", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "cbd388d826b5", + "markDefs": [] + }, + { + "style": "h2", + "_key": "d725b7ac6481", + "markDefs": [], + "children": [ + { + "_key": "e488319f824c", + "_type": "span", + "marks": [], + "text": "Contributing to Nextflow" + } + ], + "_type": "block" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://nf-co.re/events/2024/bytesize_nextflow_dev", + "_key": "0b9d0845e1c1" + }, + { + "_type": "link", + "href": "https://www.youtube.com/watch?v=R0fqk5OS-nw", + "_key": "d2be1f99ae8e" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Despite building tools that use Nextflow for many years, I’ve spent relatively little time venturing into the main codebase myself. Just as with any contributor, part of the challenge was figuring out how to build Nextflow, how to navigate its code structure and how to write tests. I found it quite a fun experience, so I described and demoed the process in a recent nf-core Bytesize talk titled \"", + "_key": "ad9f816bae61" + }, + { + "_type": "span", + "marks": [ + "0b9d0845e1c1" + ], + "text": "Contributing to Nextflow", + "_key": "5daee381c807" + }, + { + "marks": [], + "text": "\". 
You can watch the talk on ", + "_key": "087af4d7449e", + "_type": "span" + }, + { + "marks": [ + "d2be1f99ae8e" + ], + "text": "YouTube", + "_key": "195290dcf8b4", + "_type": "span" + }, + { + "marks": [], + "text": ", where I explain the mechanics of forking Nextflow, enhancing, compiling, and testing changes locally, and contributing enhancements back to the main code base.", + "_key": "479a00b173cb", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "8a85b4a299cd" + }, + { + "_type": "youtube", + "id": "R0fqk5OS-nw", + "_key": "8e46168bfd6f" + }, + { + "_type": "block", + "style": "normal", + "_key": "2cbe2ad1ea81", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "9c1c340b8ca5", + "_type": "span" + } + ] + }, + { + "style": "h2", + "_key": "a3f7fc40c99b", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "But wait, there’s more!", + "_key": "0c12e4175b2e" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "text": "I’m happy with how the new console output looks, and it seems to have been well received so far. But once the warm glow of the newly merged pull request started to subside, I realized there was more to do. The console output is great for monitoring a running pipeline, but I spend most of my time these days digging through much more verbose ", + "_key": "d343188ee99f", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "code" + ], + "text": ".nextflow.log", + "_key": "725d37d96f24", + "_type": "span" + }, + { + "marks": [], + "text": " files. Suddenly it seemed a little unfair that these didn’t also benefit from a similar treatment.", + "_key": "505d0a6e0abc", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "2755becf075b" + }, + { + "style": "normal", + "_key": "c7cb8a38729a", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "cf8f3b2b7dd8", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "74f76f8fd4cb", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/willmcgugan", + "_key": "dfce70339b53" + }, + { + "_key": "690369af101d", + "_type": "link", + "href": "https://github.com/Textualize/rich" + }, + { + "href": "https://textual.textualize.io/blog/2024/02/11/file-magic-with-the-python-standard-library/", + "_key": "d24d6b9364f1", + "_type": "link" + }, + { + "_key": "9179317e5533", + "_type": "link", + "href": "https://github.com/textualize/toolong" + }, + { + "_type": "link", + "href": "https://www.textualize.io/", + "_key": "0c901ae88c65" + }, + { + "_type": "link", + "href": "https://github.com/textualize/rich", + "_key": "59908219e99d" + } + ], + "children": [ + { + "_key": "9b6a6dc8c9dd", + "_type": "span", + "marks": [], + "text": "This project was a little different because the logs are just files on the disk, meaning that I could approach the problem with whatever code stack I liked. 
Coincidentally, " + }, + { + "_type": "span", + "marks": [ + "dfce70339b53" + ], + "text": "Will McGugan", + "_key": "40bbea19d01b" + }, + { + "_type": "span", + "marks": [], + "text": " (author of ", + "_key": "49e07286c273" + }, + { + "text": "textualize/rich", + "_key": "0029aa9e7fd6", + "_type": "span", + "marks": [ + "690369af101d" + ] + }, + { + "text": ") was recently ", + "_key": "a3b90ffca585", + "_type": "span", + "marks": [] + }, + { + "_key": "24eaa729369b", + "_type": "span", + "marks": [ + "d24d6b9364f1" + ], + "text": "writing about" + }, + { + "_type": "span", + "marks": [], + "text": " a side project of his own: ", + "_key": "26008b26fa1c" + }, + { + "text": "Toolong", + "_key": "f08df20e1393", + "_type": "span", + "marks": [ + "9179317e5533" + ] + }, + { + "_type": "span", + "marks": [], + "text": ". This is a terminal app built using ", + "_key": "818bb16bd038" + }, + { + "text": "Textual", + "_key": "80b37ac687d4", + "_type": "span", + "marks": [ + "0c901ae88c65" + ] + }, + { + "marks": [], + "text": " which is specifically aimed at viewing large log files. I took it for a spin and it did a great job with Nextflow log files right out of the box, but I figured that I could take it further. At its core, Toolong uses the ", + "_key": "9761e6ed8015", + "_type": "span" + }, + { + "marks": [ + "59908219e99d" + ], + "text": "Rich", + "_key": "e2b779b579c4", + "_type": "span" + }, + { + "text": " library to format text and so with a little hacking, I was able to introduce a handful of custom formatters for the Nextflow logs. And voilà, we have colored console output for log files too!", + "_key": "d0cf9d4e71b2", + "_type": "span", + "marks": [] + } + ] + }, + { + "style": "normal", + "_key": "3ca78e845d80", + "markDefs": [], + "children": [ + { + "_key": "4ba29466853c", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "d6e99dafac06", + "asset": { + "_ref": "image-2255c53730e2472f27b817ff8b085d3674de07ff-3938x2290-png", + "_type": "reference" + }, + "_type": "image", + "alt": "Formatting .nextflow.log files with Toolong" + }, + { + "children": [ + { + "_key": "6693d1695117", + "_type": "span", + "marks": [ + "em" + ], + "text": "The tail end of a " + }, + { + "_type": "span", + "marks": [ + "code", + "em" + ], + "text": ".nextflow.log", + "_key": "ce7ac27e92de" + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": " file, rendered with ", + "_key": "38a80fdc66a9" + }, + { + "_type": "span", + "marks": [ + "code", + "em" + ], + "text": "less", + "_key": "21a44ae8bd3d" + }, + { + "_key": "eb7c0d9144d8", + "_type": "span", + "marks": [ + "em" + ], + "text": " (Left) and Toolong (Right). Try finding the warning log message in both!" + } + ], + "_type": "block", + "style": "normal", + "_key": "4b588a21b74c", + "markDefs": [] + }, + { + "style": "normal", + "_key": "2f4528ddfad1", + "markDefs": [], + "children": [ + { + "_key": "d83efff4f1ea", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "6748e137f113", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "\n", + "_key": "a1acd474ca3b" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "text": "By using Toolong as a viewer we get much more than just syntax highlighting too - it provides powerful file navigation and search functionality. 
It also supports tailing files in real time, so you can launch a pipeline in one window and tail the log in another to have the best of both worlds!", + "_key": "7f3bea44d311", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "8d0440bdbd8a" + }, + { + "_key": "f9a3604c6eb5", + "markDefs": [], + "children": [ + { + "_key": "cfabceade633", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "f6fe30226cbc", + "markDefs": [], + "children": [ + { + "marks": [ + "em" + ], + "text": "Running nf-core/rnaseq with the new Nextflow coloured console output (Left) whilst simultaneously tailing the ", + "_key": "79037961268f", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code", + "em" + ], + "text": ".nextflow.log", + "_key": "067c04ae3a04" + }, + { + "_key": "6f0c623f1fd2", + "_type": "span", + "marks": [ + "em" + ], + "text": " file using " + }, + { + "text": "nf-core log", + "_key": "5cd96b7dc1af", + "_type": "span", + "marks": [ + "code", + "em" + ] + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": " (Right).", + "_key": "a8a3086dfa7a" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "02bf2b489fa7", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "f0b051aec318" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "407733f736c7", + "markDefs": [], + "children": [ + { + "_key": "7688b0c18e8c", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "f6e5d85bd132", + "markDefs": [ + { + "_key": "4d1d841f6e09", + "_type": "link", + "href": "https://github.com/Textualize/toolong/pull/47" + }, + { + "_type": "link", + "href": "https://github.com/nf-core/tools/pull/2895", + "_key": "bd904a2596ac" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "This work with Toolong is still in two ", + "_key": "27828ba68990" + }, + { + "_key": "5d7c1a8ffef4", + "_type": "span", + "marks": [ + "4d1d841f6e09" + ], + "text": "open" + }, + { + "text": " ", + "_key": "df2c56a27606", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "bd904a2596ac" + ], + "text": "pull requests", + "_key": "b6a89ad65b58" + }, + { + "_type": "span", + "marks": [], + "text": " as I write this, but hopefully you’ll soon be able to use the ", + "_key": "a890ea20a47c" + }, + { + "text": "nf-core log", + "_key": "338d370e5461", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "text": " command in a directory where you’ve run Nextflow, and it’ll launch Toolong with any log files it finds.", + "_key": "5efd69709769", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + } + ], + "publishedAt": "2024-03-28T07:00:00.000Z", + "_createdAt": "2024-09-25T14:18:16Z", + "_rev": "mvya9zzDXWakVjnX4hhCrm" + }, + { + "_rev": "rsIQ9Jd8Z4nKBVUruy4PGW", + "title": "Nextflow DSL 2 is here!", + "meta": { + "slug": { + "current": "dsl2-is-here" + } + }, + "author": { + "_type": "reference", + "_ref": "paolo-di-tommaso" + }, + "_updatedAt": "2024-09-26T09:02:20Z", + "tags": [ + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "06f9aa95cde1" + } + ], + "body": [ + { + "_key": "629e9379215e", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "We are thrilled to announce the stable release of Nextflow DSL 2 as part of the latest 20.07.1 version!", + "_key": 
"b1b0c05794d2", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "c3a7a280da6a" + } + ], + "_type": "block", + "style": "normal", + "_key": "cf3a0e5a9ffa" + }, + { + "markDefs": [], + "children": [ + { + "text": "Nextflow DSL 2 represents a major evolution of the Nextflow language and makes it possible to scale and modularise your data analysis pipeline while continuing to use the Dataflow programming paradigm that characterises the Nextflow processing model.", + "_key": "e07149cfcdc1", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "4184c68f29c7" + }, + { + "style": "normal", + "_key": "e6ce21546d62", + "children": [ + { + "_key": "8a67bb60e0b7", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "9a525d82ff3a", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "We spent more than one year collecting user feedback and making sure that DSL 2 would naturally fit the programming experience Nextflow developers are used to.", + "_key": "6b65eccfee7f" + } + ] + }, + { + "style": "normal", + "_key": "23b819adbc20", + "children": [ + { + "_key": "c2fb41da4984", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "h4", + "_key": "014fd4b2497d", + "children": [ + { + "_key": "8286965d737c", + "_type": "span", + "text": "DLS 2 in a nutshell" + } + ] + }, + { + "_key": "be9262d4dd8b", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Backward compatibility is a paramount value, for this reason the changes introduced in the syntax have been minimal and above all, guarantee the support of all existing applications. DSL 2 will be an opt-in feature for at least the next 12 to 18 months. 
After this transitory period, we plan to make it the default Nextflow execution mode.", + "_key": "417a251356ee", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "4b7fd2378549", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "c63188359a84" + }, + { + "_key": "db525cead075", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "As of today, to use DSL 2 in your Nextflow pipeline, you are required to use the following declaration at the top of your script:", + "_key": "fe436c2e0e42", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "", + "_key": "3eff4105e9a3", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "9f14f9cf6234" + }, + { + "code": "nextflow.enable.dsl=2", + "_type": "code", + "_key": "70313683d07a" + }, + { + "_key": "cb3008df6ea1", + "children": [ + { + "_key": "fd8a297e5800", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Note that the previous ", + "_key": "5d18efc095f5" + }, + { + "marks": [ + "code" + ], + "text": "nextflow.preview", + "_key": "1ca56e3bcbc8", + "_type": "span" + }, + { + "marks": [], + "text": " directive is still available, however, when using the above declaration the use of the final syntax is enforced.", + "_key": "496f7e826a91", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "27e6e452976f" + }, + { + "style": "normal", + "_key": "1287ddcc78e3", + "children": [ + { + "_key": "6624ad41312c", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "8cb6c9cbc82d", + "children": [ + { + "_type": "span", + "text": "Nextflow modules", + "_key": "3607e53e868e" + } + ], + "_type": "block", + "style": "h4" + }, + { + "_type": "block", + "style": "normal", + "_key": "a95d08afa026", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "A module file is nothing more than a Nextflow script containing one or more ", + "_key": "ea3209b42b12" + }, + { + "_key": "0fec38859073", + "_type": "span", + "marks": [ + "code" + ], + "text": "process" + }, + { + "text": " definitions that can be imported from another Nextflow script.", + "_key": "ee39407a05c2", + "_type": "span", + "marks": [] + } + ] + }, + { + "_key": "ef4dba2fc93e", + "children": [ + { + "_type": "span", + "text": "", + "_key": "f6b4fab165f7" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "marks": [], + "text": "The only difference when compared with legacy syntax is that the process is not bound with specific input and output channels, as was previously required using the ", + "_key": "01dc43ae4019", + "_type": "span" + }, + { + "text": "from", + "_key": "b67ffc0e123f", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "c296259675a7" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "into", + "_key": "ae85a7df26b3" + }, + { + "text": " keywords respectively. 
Consider this example of the new syntax:", + "_key": "8b7a8043e33d", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "c07ad9a60cce", + "markDefs": [] + }, + { + "style": "normal", + "_key": "377a4600fe0a", + "children": [ + { + "_key": "8d777b8fe33b", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "code": "process INDEX {\n input:\n path transcriptome\n output:\n path 'index'\n script:\n \"\"\"\n salmon index --threads $task.cpus -t $transcriptome -i index\n \"\"\"\n}", + "_type": "code", + "_key": "86e9a60603f3" + }, + { + "children": [ + { + "_key": "2b0a14830ef7", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "7a0a6344030f" + }, + { + "markDefs": [], + "children": [ + { + "_key": "3c9044acc05c", + "_type": "span", + "marks": [], + "text": "This allows the definition of workflow processes that can be included from any other script and invoked as a custom function within the new " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "workflow", + "_key": "b630714b0041" + }, + { + "marks": [], + "text": " scope. This effectively allows for the composition of the pipeline logic and enables reuse of workflow components. We anticipate this to improve both the speed that users can develop new pipelines, and the robustness of these pipelines through the use of validated modules.", + "_key": "88ad8ae4c93d", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "cb11b80961bd" + }, + { + "style": "normal", + "_key": "ccd9f49f28f4", + "children": [ + { + "text": "", + "_key": "dd369d48d4d8", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "0a758965d124", + "_type": "span", + "marks": [], + "text": "Any process input can be provided as a function argument using the usual channel semantics familiar to Nextflow developers. Moreover process outputs can either be assigned to a variable or accessed using the implicit " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": ".out", + "_key": "80c3833c07a2" + }, + { + "_key": "9289722f74aa", + "_type": "span", + "marks": [], + "text": " attribute in the scope implicitly defined by the process name itself. See the example below:" + } + ], + "_type": "block", + "style": "normal", + "_key": "3521a13e0ac9", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "e3074ad6f345", + "children": [ + { + "_key": "abaacf905af8", + "_type": "span", + "text": "" + } + ] + }, + { + "code": "include { INDEX; FASTQC; QUANT; MULTIQC } from './some/module/script.nf'\n\nread_pairs_ch = channel.fromFilePairs( params.reads)\n\nworkflow {\n INDEX( params.transcriptome )\n FASTQC( read_pairs_ch )\n QUANT( INDEX.out, read_pairs_ch )\n MULTIQC( QUANT.out.mix(FASTQC.out).collect(), multiqc_file )\n}", + "_type": "code", + "_key": "1ee5980f40c6" + }, + { + "style": "normal", + "_key": "b24c9fd23e7b", + "children": [ + { + "text": "", + "_key": "67d9c224920f", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "f3e81c124800", + "markDefs": [], + "children": [ + { + "_key": "548f47d084d3", + "_type": "span", + "marks": [], + "text": "Also enhanced is the ability to use channels as inputs multiple times without the need to duplicate them (previously done with the special into operator) which makes the resulting pipeline code more concise, fluent and therefore readable!" 
+ } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "40cbb33314a4", + "children": [ + { + "_type": "span", + "text": "", + "_key": "9de5517d2710" + } + ], + "_type": "block" + }, + { + "_key": "839201ea848c", + "children": [ + { + "_type": "span", + "text": "Sub-workflows", + "_key": "d6ef657492e4" + } + ], + "_type": "block", + "style": "h4" + }, + { + "_key": "3abab27b8de6", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Notably, the DSL 2 syntax allows for the definition of reusable processes as well as sub-workflow libraries. The only requirement is to provide a ", + "_key": "150c6c119ccc", + "_type": "span" + }, + { + "text": "workflow", + "_key": "04a5accd35f5", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "text": " name that will be used to reference and declare the corresponding inputs and outputs using the new ", + "_key": "a791b24a7872", + "_type": "span", + "marks": [] + }, + { + "text": "take", + "_key": "805c11044073", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "text": " and ", + "_key": "43af5357eb60", + "_type": "span", + "marks": [] + }, + { + "_key": "92a99fc1c45b", + "_type": "span", + "marks": [ + "code" + ], + "text": "emit" + }, + { + "_type": "span", + "marks": [], + "text": " keywords. For example:", + "_key": "11a6429ecf89" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "b0c885abb3c8" + } + ], + "_type": "block", + "style": "normal", + "_key": "a9cb2b85e1ca" + }, + { + "_type": "code", + "_key": "8a1045f4c8f5", + "code": "workflow RNASEQ {\n take:\n transcriptome\n read_pairs_ch\n\n main:\n INDEX(transcriptome)\n FASTQC(read_pairs_ch)\n QUANT(INDEX.out, read_pairs_ch)\n\n emit:\n QUANT.out.mix(FASTQC.out).collect()\n}" + }, + { + "style": "normal", + "_key": "133e4c1cca67", + "children": [ + { + "_type": "span", + "text": "", + "_key": "aaf0039da737" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Now named sub-workflows can be used in the same way as processes, allowing you to easily include and reuse multi-step workflows as part of larger workflows. Find more details ", + "_key": "9bf473c34dbf" + }, + { + "marks": [ + "ec7fb9ab7375" + ], + "text": "here", + "_key": "0699bfce1add", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "26c56bf6d23c" + } + ], + "_type": "block", + "style": "normal", + "_key": "f467490844b1", + "markDefs": [ + { + "_key": "ec7fb9ab7375", + "_type": "link", + "href": "/docs/latest/dsl2.html" + } + ] + }, + { + "style": "normal", + "_key": "6193857cb1c6", + "children": [ + { + "_key": "3bfa9dcbd58d", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "h4", + "_key": "f7e382a0d43b", + "children": [ + { + "_key": "25e9795cdfd5", + "_type": "span", + "text": "More syntax sugar" + } + ] + }, + { + "style": "normal", + "_key": "d8d874e12ebe", + "markDefs": [], + "children": [ + { + "text": "Another exciting feature of Nextflow DSL 2 is the ability to compose built-in operators, pipeline processes and sub-workflows with the pipe (|) operator! 
For example, the last line in the above example could be written as:", + "_key": "3714313bbb5e", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "18276c8e26cf", + "children": [ + { + "_type": "span", + "text": "", + "_key": "0d9aedfb33ba" + } + ], + "_type": "block" + }, + { + "_type": "code", + "_key": "009b1fe7bd50", + "code": "emit:\n QUANT.out | mix(FASTQC.out) | collect" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "41e3dd3a4a96" + } + ], + "_type": "block", + "style": "normal", + "_key": "436b6ca0a9e0" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "This syntax finally realizes the Nextflow vision of empowering developers to write complex data analysis applications with a simple but powerful language that mimics the expressiveness of the Unix pipe model but at the same time makes it possible to handle complex data structures and patterns as is required for highly parallelised and distributed computational workflows.", + "_key": "e26a5fa8101d" + } + ], + "_type": "block", + "style": "normal", + "_key": "8ec81d0be3e7" + }, + { + "style": "normal", + "_key": "c46191f0fd31", + "children": [ + { + "_type": "span", + "text": "", + "_key": "6e2636d6a795" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Another change is the introduction of ", + "_key": "7a44b23981b7", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "channel", + "_key": "e8974cd3d396" + }, + { + "text": " as a synonym of the ", + "_key": "dcbab0a599a2", + "_type": "span", + "marks": [] + }, + { + "text": "Channel", + "_key": "0790e5160f03", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "text": " type identifier, which therefore allows the use of ", + "_key": "28ab8b739f9b", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "channel.fromPath", + "_key": "21a5b79be0e1" + }, + { + "marks": [], + "text": " instead of ", + "_key": "71df8021f826", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "Channel.fromPath", + "_key": "649cde0af4d9" + }, + { + "_type": "span", + "marks": [], + "text": " and so on. This is a small piece of syntax sugar to keep the capitalization consistent with the rest of the language.", + "_key": "e08224b7a8f6" + } + ], + "_type": "block", + "style": "normal", + "_key": "b5ffe4e2b470" + }, + { + "children": [ + { + "_key": "560ff28c18db", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "3a6628bc6b6e" + }, + { + "style": "normal", + "_key": "916c3fd5fe3e", + "markDefs": [], + "children": [ + { + "text": "Moreover, several process input and output syntax shortcuts were removed when using the final version of DSL 2 to make it more predictable. 
For example, with DSL1, the component type could be omitted in a tuple input or output declaration:", + "_key": "6fb438bb03f5", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "1ca4e100a5b1", + "children": [ + { + "text": "", + "_key": "b729e5907e9e", + "_type": "span" + } + ] + }, + { + "code": "input:\n tuple foo, 'bar'", + "_type": "code", + "_key": "46040087a56b" + }, + { + "_type": "block", + "style": "normal", + "_key": "dfffe2d6c7da", + "children": [ + { + "_type": "span", + "text": "", + "_key": "49aa3dc21cc8" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "893071437cad", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The ", + "_key": "dfee1fe297ea", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "foo", + "_key": "ae9305b22592" + }, + { + "_key": "7763eadf6651", + "_type": "span", + "marks": [], + "text": " identifier was implicitly considered an input value declaration, while the string " + }, + { + "_key": "a3409d80280d", + "_type": "span", + "marks": [ + "code" + ], + "text": "'bar'" + }, + { + "text": " was considered a shortcut for ", + "_key": "c0a27db5e290", + "_type": "span", + "marks": [] + }, + { + "_key": "e3a943e0f49e", + "_type": "span", + "marks": [ + "code" + ], + "text": "file('bar')" + }, + { + "_key": "56063cc42720", + "_type": "span", + "marks": [], + "text": ". However, this was a bit confusing, especially for new users, so with DSL 2 the fully qualified version must be used:" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "0477bfc51cc0", + "children": [ + { + "_key": "0d7e799ab406", + "_type": "span", + "text": "" + } + ] + }, + { + "_type": "code", + "_key": "9b00ce9a773f", + "code": "input:\n tuple val(foo), path('bar')" + }, + { + "style": "normal", + "_key": "579bb4a2c41b", + "children": [ + { + "_key": "5c685344126f", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "eeb28462efa4", + "markDefs": [ + { + "_type": "link", + "href": "/docs/latest/dsl2.html#dsl2-migration-notes", + "_key": "e551bcf32c88" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "You can find more detailed migration notes at ", + "_key": "c7be1d8c14e3" + }, + { + "text": "this link", + "_key": "4f874e91fac6", + "_type": "span", + "marks": [ + "e551bcf32c88" + ] + }, + { + "marks": [], + "text": ".", + "_key": "00a1381f6444", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "6db2247f5288", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "5bd7b9bc4f7a" + }, + { + "style": "h4", + "_key": "5ec58643376f", + "children": [ + { + "_type": "span", + "text": "What's next", + "_key": "0f60673e7cda" + } + ], + "_type": "block" + }, + { + "_key": "a592647db1c1", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "As always, reaching an important project milestone can be viewed as a major success, but at the same time it is the starting point for new challenges and developments. Having a modularization mechanism opens new needs and possibilities. The first of these will be focused on the ability to test and validate process modules independently using a unit-testing style approach. 
This will definitely help to make the resulting pipelines more resilient.", + "_key": "e0afa317c738", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "df55fda40136", + "children": [ + { + "text": "", + "_key": "a8d442f465d2", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Another important area for the development of the Nextflow language will be the ability to better formalise pipeline inputs and outputs and further decouple them from the process declaration. Nextflow currently strongly relies on the ", + "_key": "398655ccaa8d" + }, + { + "text": "publishDir", + "_key": "6bc9d635aabc", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " constructor for the generation of the workflow outputs.", + "_key": "096b9add57a8" + } + ], + "_type": "block", + "style": "normal", + "_key": "d244046ea2a1", + "markDefs": [] + }, + { + "children": [ + { + "_key": "c950cf771dbd", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "8a7311ce0868" + }, + { + "_key": "5a22f36a5baf", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "However, in the new ", + "_key": "e237df55246a" + }, + { + "marks": [ + "em" + ], + "text": "module", + "_key": "8f9df02a58cf", + "_type": "span" + }, + { + "marks": [], + "text": " world, this approach results in ", + "_key": "e0800a5dc3fa", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "publishDir", + "_key": "a77c864a7257" + }, + { + "_type": "span", + "marks": [], + "text": " being tied to a single process definition. The plan is instead to extend this concept in a more general and abstract manner, so that it will be possible to capture and redirect the result of any process and sub-workflow based on semantic annotations instead of hardcoding it at the task level.", + "_key": "f48a4472d7c4" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "6d969975a051", + "children": [ + { + "text": "", + "_key": "a1bee7a0f65b", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "Conclusion", + "_key": "c61eff6f2a5d", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "a5cb08f8c40b" + }, + { + "_type": "block", + "style": "normal", + "_key": "b3a0f371f3e6", + "markDefs": [], + "children": [ + { + "_key": "80482ffbf0d2", + "_type": "span", + "marks": [], + "text": "We are extremely excited about today's release. This was a long-awaited advancement, and we are very happy to make it generally available to all Nextflow users. We greatly appreciate all of the community feedback and ideas over the past year, which have shaped DSL 2." 
+ } + ] + }, + { + "style": "normal", + "_key": "be28ed9da573", + "children": [ + { + "_key": "e4f1bfd68c75", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "ae83523cacca", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "We are confident this represents a big step forward for the project and will enable the writing of more scalable and complex data analysis pipelines and, above all, a more enjoyable experience.", + "_key": "8be6efd53e7a" + } + ], + "_type": "block", + "style": "normal" + } + ], + "_createdAt": "2024-09-25T14:15:48Z", + "_type": "blogPost", + "_id": "327c8c923834", + "publishedAt": "2020-07-24T06:00:00.000Z" + }, + { + "title": "Nextflow Summit 2022 Recap", + "body": [ + { + "markDefs": [], + "children": [ + { + "_key": "57b6d8012c37", + "_type": "span", + "marks": [], + "text": "Three days of Nextflow goodness in Barcelona" + } + ], + "_type": "block", + "style": "h2", + "_key": "952ec445a0cb" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "After a three-year COVID-related hiatus from in-person events, Nextflow developers and users found their way to Barcelona this October for the 2022 Nextflow Summit. Held at Barcelona’s iconic Agbar tower, this was easily the most successful Nextflow community event yet!", + "_key": "00eefd598243" + } + ], + "_type": "block", + "style": "normal", + "_key": "930ccd7487e6", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "d84d2db1c6dc", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "7c3e21780bea", + "_type": "span", + "marks": [] + } + ] + }, + { + "_key": "5c3611190f25", + "markDefs": [ + { + "_type": "link", + "href": "https://nf-co.re/events/2022/hackathon-october-2022", + "_key": "a35ad8801bfc" + }, + { + "_type": "link", + "href": "https://summit.nextflow.io/stream/", + "_key": "c2b0c2e0e1b7" + } + ], + "children": [ + { + "text": "The week-long event kicked off with 50 people participating in a hackathon organized by nf-core beginning on October 10th. The ", + "_key": "d1f411659a4a", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "a35ad8801bfc" + ], + "text": "hackathon", + "_key": "1eb3bf84d85d", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " tackled several cutting-edge projects with developer teams focused on various aspects of nf-core including documentation, subworkflows, pipelines, DSL2 conversions, modules, and infrastructure. The Nextflow Summit began mid-week, attracting nearly 600 people, including 165 attending in person and another 433 remotely. The ", + "_key": "d776b9cf21f3" + }, + { + "marks": [ + "c2b0c2e0e1b7" + ], + "text": "YouTube live streams", + "_key": "27201c361663", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " have now collected over two and a half thousand views. 
Just prior to the summit, three virtual Nextflow training events were also run with separate sessions for the Americas, EMEA, and APAC, in which 835 people participated.", + "_key": "bb81f4b4d969" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "d5a700bca970" + } + ], + "_type": "block", + "style": "normal", + "_key": "67e6894ac322" + }, + { + "style": "h2", + "_key": "b769550bd450", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "An action-packed agenda", + "_key": "197be1cfa0ca" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "fe3b8dcbffb0", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The three-day Nextflow Summit featured 33 talks delivered by speakers from academia, research, healthcare providers, biotechs, and cloud providers. This year’s speakers came from the following organizations:", + "_key": "1c06aeb25e22" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "891f24ac9759", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "96406925d9f1", + "_type": "span" + } + ] + }, + { + "_key": "bfa744343fed", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "2dfaad932e110", + "_type": "span", + "marks": [], + "text": "Amazon Web Services" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Center for Genomic Regulation", + "_key": "c9c4d148b0360" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "d58d183f44d7", + "listItem": "bullet", + "markDefs": [] + }, + { + "_key": "752498accb4f", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "9b8089cffee80", + "_type": "span", + "marks": [], + "text": "Centre for Molecular Medicine and Therapeutics, University of British Columbia" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "ed495aa254d60", + "_type": "span", + "marks": [], + "text": "Chan Zuckerberg Biohub" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "2324add317a7" + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Curative", + "_key": "f27abf86cb950", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "1dd1bc48f709" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "66e08c87c537", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "DNAnexus", + "_key": "dbe36e3bb3490", + "_type": "span", + "marks": [] + } + ] + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Enterome", + "_key": "eb8ff1c3ab5e0" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "9a240a2ffe60" + }, + { + "markDefs": [], + "children": [ + { + "text": "Google", + "_key": "25173e7060df0", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "816f0b93da0b", + "listItem": "bullet" + }, + { + "style": "normal", + "_key": "f5261832f664", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Janelia Research Campus", + "_key": "e468c1d42dce0" + } + ], + "level": 1, + "_type": 
"block" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "bc7e1d78b82a", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Microsoft", + "_key": "aaf66a0c668d0" + } + ] + }, + { + "style": "normal", + "_key": "ca2e839bd437", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Oxford Nanopore", + "_key": "2b9ca8d6468e0" + } + ], + "level": 1, + "_type": "block" + }, + { + "_key": "b4393e64502e", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "5236eda420950", + "_type": "span", + "marks": [], + "text": "Quadram Institute BioScience" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "_key": "b8b1b26e300c", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Seqera Labs", + "_key": "a054fa8632710" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "_key": "2ba5acefbe25", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Quantitative Biology Center, University of Tübingen", + "_key": "cb66775426880", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "5fe07ca3d721", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "b56bf56265ce0", + "_type": "span", + "marks": [], + "text": "Quilt Data" + } + ], + "level": 1, + "_type": "block" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "cc3bf16a5f63", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "UNC Lineberger Comprehensive Cancer Center", + "_key": "9ba7f583a8700" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "Università degli Studi di Macerata", + "_key": "636ca3fcc4e90", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "769e616ad36f", + "listItem": "bullet", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "University of Maryland", + "_key": "8951a49c80a00" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "08c869c9769b", + "listItem": "bullet", + "markDefs": [] + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "66ee53233468", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Wellcome Sanger Institute", + "_key": "bd6284a1797b0" + } + ] + }, + { + "style": "normal", + "_key": "a1caeb7d11e9", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Wyoming Public Health Laboratory", + "_key": "65b20a7523b40" + } + ], + "level": 1, + "_type": "block" + }, + { + "style": "normal", + "_key": "d43c3b892a0a", + "markDefs": [], + "children": [ + { + "_key": "dc014ab8789c", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Some recurring themes", + "_key": "4e7cfbae198a", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "ddd92b066915" + }, + { + "style": "normal", + "_key": "b5f00e526e52", + "markDefs": [ + { + "href": "https://www.youtube.com/watch?v=JZMaRYzZxGU&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=8", + "_key": "d59005094735", + "_type": "link" + }, + { + "_key": "ba9d23fbef74", + "_type": "link", + 
"href": "https://www.youtube.com/watch?v=6jQr9dDaais&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=30" + } + ], + "children": [ + { + "text": "While there were too many excellent talks to cover individually, a few themes surfaced throughout the summit. Not surprisingly, SARS-Cov-2 was a thread that wound through several talks. Tony Zeljkovic from Curative led a discussion about ", + "_key": "ff5e8f1b703d", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "d59005094735" + ], + "text": "unlocking automated bioinformatics for large-scale healthcare", + "_key": "6b82b6e7a0be" + }, + { + "_key": "2c835ddc3a0e", + "_type": "span", + "marks": [], + "text": ", and Thanh Le Viet of Quadram Institute Bioscience discussed " + }, + { + "_type": "span", + "marks": [ + "ba9d23fbef74" + ], + "text": "large-scale SARS-Cov-2 genomic surveillance at QIB", + "_key": "9d6f9b6a7b25" + }, + { + "_type": "span", + "marks": [], + "text": ". Several speakers discussed best practices for building portable, modular pipelines. Other common themes were data provenance & traceability, data management, and techniques to use compute and storage more efficiently. There were also a few talks about the importance of dataflows in new application areas outside of genomics and bioinformatics.", + "_key": "cbe4cf23e5dd" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "bfe8d0fc1c2f", + "markDefs": [], + "children": [ + { + "_key": "d676a8a94271", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "style": "h2", + "_key": "effd4e0bce5e", + "markDefs": [], + "children": [ + { + "_key": "2bce3ca16202", + "_type": "span", + "marks": [], + "text": "Data provenance tracking" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "In the Thursday morning keynote, Rob Patro﹘Associate Professor at the University of Maryland Dept. of Computer Science and CTO and co-founder of Ocean Genomics﹘described in his talk “", + "_key": "0036463307f1" + }, + { + "text": "What could be next(flow)", + "_key": "9fb42dd454d4", + "_type": "span", + "marks": [ + "7235bb8e8a10" + ] + }, + { + "text": ",” how far the Nextflow community had come in solving problems such as reproducibility, scalability, modularity, and ease of use. He then challenged the community with some complex issues still waiting in the wings. 
He focused on data provenance as a particularly vexing challenge explaining how tremendous effort currently goes into manual metadata curation.", + "_key": "61b64a0b077e", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "29a500bce005", + "markDefs": [ + { + "_type": "link", + "href": "https://www.youtube.com/watch?v=vNrKFT5eT8U&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=6", + "_key": "7235bb8e8a10" + } + ] + }, + { + "style": "normal", + "_key": "b3602d99db7b", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "105c23d72889", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "0234b3f8e11e", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/mikelove/tximeta", + "_key": "54287a6db681" + } + ], + "children": [ + { + "marks": [], + "text": "Rob offered suggestions about how Nextflow might evolve, and coined the term “augmented execution contexts” (AECs) drawing from his work on provenance tracking – answering questions such as “what are these files, and where did they come from.” This thinking is reflected in ", + "_key": "9084ab38be82", + "_type": "span" + }, + { + "_key": "c0b44e6d1afa", + "_type": "span", + "marks": [ + "54287a6db681" + ], + "text": "tximeta" + }, + { + "text": ", a project co-developed with Mike Love of UNC. Rob also proposed ideas around automating data format conversions analogous to type casting in programming languages explaining how such conversions might be built into Nextflow channels to make pipelines more interoperable.", + "_key": "db8a883fe0b0", + "_type": "span", + "marks": [] + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "3e251a113435", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "85272edf2ce0", + "_type": "span" + } + ] + }, + { + "_key": "961f1840c7c7", + "markDefs": [ + { + "href": "https://www.youtube.com/watch?v=dttkcuP3OBc&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=13", + "_key": "ce30c81006aa", + "_type": "link" + }, + { + "_key": "5149dc32e277", + "_type": "link", + "href": "https://www.youtube.com/watch?v=RIwpJTDlLiE&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=21" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In his talk with the clever title “", + "_key": "62a9a45df553" + }, + { + "_type": "span", + "marks": [ + "ce30c81006aa" + ], + "text": "one link to rule them all", + "_key": "a02e83090f94" + }, + { + "marks": [], + "text": ",” Aneesh Karve of Quilt explained how every pipeline run is a function of the code, environment, and data, and went on to show how Quilt could help dramatically simplify data management with dataset versioning, accessibility, and verifiability. Data provenance and traceability were also front and center when Yih-Chii Hwang of DNAnexus described her team’s work around ", + "_key": "a84e40b40b3c", + "_type": "span" + }, + { + "marks": [ + "5149dc32e277" + ], + "text": "bringing GxP compliance to Nextflow workflows", + "_key": "9afbbcaa9787", + "_type": "span" + }, + { + "_key": "6e0b19c0be34", + "_type": "span", + "marks": [], + "text": "." 
+ } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "42dbd9c84ea5", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "cef5e9663e1c", + "markDefs": [] + }, + { + "style": "h2", + "_key": "dea55ecc40b4", + "markDefs": [], + "children": [ + { + "text": "Data management and storage", + "_key": "41e333c21612", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_key": "decf4fc8b1e0", + "markDefs": [ + { + "_type": "link", + "href": "https://www.youtube.com/watch?v=VXtYCAqGEQQ&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=12", + "_key": "18f76521cd36" + }, + { + "_type": "link", + "href": "https://www.youtube.com/watch?v=jB91uqUqsRM&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=9", + "_key": "57b564263316" + }, + { + "_type": "link", + "href": "https://www.youtube.com/watch?v=GAIL8ZAMJPQ&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=20", + "_key": "6cca4185ac4c" + }, + { + "href": "https://www.youtube.com/watch?v=PTbiCVq0-sE&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=14", + "_key": "e8644f6b706c", + "_type": "link" + } + ], + "children": [ + { + "text": "Other speakers also talked about challenges related to data management and performance. Angel Pizarro of AWS gave an interesting talk comparing the ", + "_key": "833ee14fd864", + "_type": "span", + "marks": [] + }, + { + "text": "price/performance of different AWS cloud storage options", + "_key": "0ed96747561b", + "_type": "span", + "marks": [ + "18f76521cd36" + ] + }, + { + "_type": "span", + "marks": [], + "text": ". ", + "_key": "371c9f0a8c90" + }, + { + "marks": [ + "57b564263316" + ], + "text": "Hatem Nawar", + "_key": "b9ce34241727", + "_type": "span" + }, + { + "text": " (Google) and ", + "_key": "337bfa6be3de", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "6cca4185ac4c" + ], + "text": "Venkat Malladi", + "_key": "ccac0113ac93", + "_type": "span" + }, + { + "_key": "331a022e1eca", + "_type": "span", + "marks": [], + "text": " (Microsoft) also talked about cloud economics and various approaches to data handling in their respective clouds. Data management was also a key part of Evan Floden’s discussion about Nextflow Tower where he discussed Tower Datasets, as well as the various cloud storage options accessible through Nextflow Tower. Finally, Nextflow creator Paolo Di Tommaso unveiled new work being done in Nextflow to simplify access to data residing in object stores in his talk “" + }, + { + "_key": "74eac5941145", + "_type": "span", + "marks": [ + "e8644f6b706c" + ], + "text": "Nextflow and the future of containers" + }, + { + "_type": "span", + "marks": [], + "text": "”.", + "_key": "3b316bec6bbe" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "8e851747e5f0", + "markDefs": [], + "children": [ + { + "_key": "8056d89d0907", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Compute optimization", + "_key": "9204efcb54f5", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "095367b1f73a" + }, + { + "children": [ + { + "_key": "8ad370b4d116", + "_type": "span", + "marks": [], + "text": "Another recurring theme was improving compute efficiency. Several talks discussed using containers more effectively, leveraging GPUs & FPGAs for added performance, improving virtual machine instance type selection, and automating resource requirements. 
Mike Smoot of Illumina talked about Nextflow, Kubernetes, and DRAGENs and how Illumina’s FPGA-based Bio-IT Platform can dramatically accelerate analysis. Venkat Malladi discussed efforts to suggest optimal VM types based on different standardized nf-core labels in the Azure cloud (process_low, process_medium, process_high, etc.) Finally, Evan Floden discussed " + }, + { + "marks": [ + "c1508869d6cf" + ], + "text": "Nextflow Tower", + "_key": "9d4f93c7d98e", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " and unveiled an exciting new ", + "_key": "576e19846810" + }, + { + "_key": "4a65fd2a5b64", + "_type": "span", + "marks": [ + "49fe14ab4113" + ], + "text": "resource optimization feature" + }, + { + "text": " that can intelligently tune pipeline resource requests to radically reduce cloud costs and improve run speed. Overall, the Nextflow community continues to make giant strides in improving efficiency and managing costs in the cloud.", + "_key": "c8bac41fe1d8", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "30f332cd9616", + "markDefs": [ + { + "_type": "link", + "href": "https://www.youtube.com/watch?v=yJpN3fRSClA&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=22", + "_key": "c1508869d6cf" + }, + { + "href": "https://seqera.io/blog/optimizing-resource-usage-with-nextflow-tower/", + "_key": "49fe14ab4113", + "_type": "link" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "07a0c854f649" + } + ], + "_type": "block", + "style": "normal", + "_key": "cd5fb1ac19b5" + }, + { + "style": "h2", + "_key": "b88850e8f308", + "markDefs": [], + "children": [ + { + "_key": "ecf8e0944b90", + "_type": "span", + "marks": [], + "text": "Beyond genomics" + } + ], + "_type": "block" + }, + { + "_key": "acf5c73047ce", + "markDefs": [ + { + "_type": "link", + "href": "https://www.youtube.com/watch?v=PlKJ0IDV_ds&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=27", + "_key": "8bad333b58ab" + }, + { + "href": "https://www.youtube.com/watch?v=ZjSzx1I76z0&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=18", + "_key": "728e8b287d1a", + "_type": "link" + } + ], + "children": [ + { + "_key": "a9ad15894fcb", + "_type": "span", + "marks": [], + "text": "While most summit speakers focused on genomics, a few discussed data pipelines in other areas, including statistical modeling, analysis, and machine learning. Nicola Visonà from Università degli Studi di Macerata gave a fascinating talk about " + }, + { + "marks": [ + "8bad333b58ab" + ], + "text": "using agent-based models to simulate the first industrial revolution", + "_key": "f89985811118", + "_type": "span" + }, + { + "text": ". Similarly, Konrad Rokicki from the Janelia Research Campus explained how Janelia are using ", + "_key": "563ecae3dba2", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "728e8b287d1a" + ], + "text": "Nextflow for petascale bioimaging data", + "_key": "012e7608919d", + "_type": "span" + }, + { + "_key": "b877a53c323e", + "_type": "span", + "marks": [], + "text": " and why bioimage processing remains a large domain area with an unmet need for reproducible workflows." 
+ } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "1ce3ebe99fd4", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "72eb25131d00" + } + ], + "_type": "block" + }, + { + "style": "h2", + "_key": "96cc242fb858", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Summit announcements", + "_key": "d23d9236cb58", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://www.youtube.com/watch?v=PTbiCVq0-sE&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=14", + "_key": "9e481f1f638b" + }, + { + "href": "https://github.com/nextflow-io/nextflow/releases/tag/v22.10.0", + "_key": "762b988a5f1d", + "_type": "link" + }, + { + "href": "https://www.nextflow.io/docs/latest/fusion.html", + "_key": "1fa4eb09bf58", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "This year’s summit also saw several exciting announcements from Nextflow developers. Paolo Di Tommaso, during his talk on ", + "_key": "35879ea66e6d", + "_type": "span" + }, + { + "marks": [ + "9e481f1f638b" + ], + "text": "the future of containers", + "_key": "f3036598476f", + "_type": "span" + }, + { + "marks": [], + "text": ", announced the availability of ", + "_key": "53855238a56f", + "_type": "span" + }, + { + "marks": [ + "762b988a5f1d" + ], + "text": "Nextflow 22.10.0", + "_key": "73136653c7bd", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ". In addition to various bug fixes, the latest Nextflow release introduces an exciting new technology called Wave that allows containers to be built on the fly from Dockerfiles or Conda recipes saved within a Nextflow pipeline. Wave also helps to simplify containerized pipeline deployment with features such as “container augmentation”, enabling developers to inject new container scripts and functionality on the fly without needing to rebuild the base containers, for example to add a cloud-native ", + "_key": "4a7ddad2bb94" + }, + { + "_type": "span", + "marks": [ + "1fa4eb09bf58" + ], + "text": "Fusion file system", + "_key": "de8e7cf5ec6e" + }, + { + "text": ". When used with Nextflow Tower, Wave also simplifies authentication to various public and private container registries. 
The latest Nextflow release also brings improved support for Kubernetes and enhancements to documentation, along with many other features.", + "_key": "7425f47ae8d5", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "7e954fc21d27" + }, + { + "_type": "block", + "style": "normal", + "_key": "f76fad37c954", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "5b2362f9c501" + } + ] + }, + { + "_key": "db0a98e231fe", + "markDefs": [ + { + "_type": "link", + "href": "https://www.youtube.com/watch?v=yJpN3fRSClA&list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32&index=22&t=127s", + "_key": "d7613591b2c5" + } + ], + "children": [ + { + "marks": [], + "text": "Several other announcements were made during ", + "_key": "c72ea5e586e8", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "d7613591b2c5" + ], + "text": "Evan Floden’s talk", + "_key": "46c031e8d66d" + }, + { + "marks": [], + "text": ", such as:", + "_key": "34f63b71a922", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "", + "_key": "eadcedc9bb67", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "500cdd4fe277", + "markDefs": [] + }, + { + "style": "normal", + "_key": "994e99a767ce", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "MultiQC is joining the Seqera Labs family of products", + "_key": "4f3732395dc30", + "_type": "span" + } + ], + "level": 1, + "_type": "block" + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Fusion – a distributed virtual file system for cloud-native data pipelines", + "_key": "c66b6170df6a0" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "2e3c783dd8ca" + }, + { + "markDefs": [], + "children": [ + { + "text": "Nextflow Tower support for Google Cloud Batch", + "_key": "bc48ea1b71690", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "31fc6503004a", + "listItem": "bullet" + }, + { + "_type": "block", + "style": "normal", + "_key": "43a1d72a9b4a", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Nextflow Tower resource optimization", + "_key": "8ae79a8409580", + "_type": "span", + "marks": [] + } + ], + "level": 1 + }, + { + "style": "normal", + "_key": "91384f041e83", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Improved Resource Labels support in Tower with integrations for cost accounting with all major cloud providers", + "_key": "75cde74caaee0" + } + ], + "level": 1, + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "A new Nextflow Tower dashboard coming soon, providing visibility across workspaces", + "_key": "e5738db90dd10" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "cca4277f9b8c", + "listItem": "bullet" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "5b75b9963fc0", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "35a209a3e420", + "markDefs": [] + }, + { + "_type": "block", + "style": "h2", + "_key": "29ae1e1c3d52", + "markDefs": [], + "children": [ + { + "text": "Thank you to our sponsors", + "_key": "3d68d73c7163", + "_type": "span", + "marks": [] + } + ] + }, + { + "_key": "e19ab76e7005", + "markDefs": [ + { 
+ "_type": "link", + "href": "https://chanzuckerberg.com/eoss/", + "_key": "240ae9ff3367" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The summit organizers wish to extend a sincere thank you to the event sponsors: AWS, Google Cloud, Seqera Labs, Quilt Data, Oxford Nanopore Technologies, and Element BioSciences. In addition, the ", + "_key": "5dcb52f1a634" + }, + { + "marks": [ + "240ae9ff3367" + ], + "text": "Chan Zuckerberg Initiative", + "_key": "975e925bdb1d", + "_type": "span" + }, + { + "marks": [], + "text": " continues to play a key role with their EOSS grants funding important work related to Nextflow and the nf-core community. The success of this year’s summit reminds us of the tremendous value of community and the critical impact of open science software in improving the quality, accessibility, and efficiency of scientific research.", + "_key": "40821f695a7a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "77e95406eae2", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "0b83eebc3c5c" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "Learning more", + "_key": "b6a7364bbcaa", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2", + "_key": "4c21a0dd5061" + }, + { + "style": "normal", + "_key": "80233c4ed388", + "markDefs": [], + "children": [ + { + "_key": "6fe74738246d", + "_type": "span", + "marks": [], + "text": "For anyone who missed the summit, you can still watch the sessions or view the training sessions at your convenience:" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "08789df2e782", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "b39a94a2ec3a" + }, + { + "_key": "455e9c24ff4e", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Watch post-event recordings of the [Nextflow Summit on YouTube](https://www.youtube.com/playlist?list=PLPZ8WHdZGxmUdAJlHowo7zL2pN3x97d32)View replays of the recent online [Nextflow and nf-core training](https://nf-co.re/events/2022/training-october-2022)", + "_key": "56da66efa8ce" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "74c06441f5db" + } + ], + "_type": "block", + "style": "normal", + "_key": "e186986f019b" + }, + { + "markDefs": [ + { + "_key": "9efb67eaaa9c", + "_type": "link", + "href": "https://mribeirodantas.xyz/blog/index.php/2022/10/27/nextflow-and-nf-core-hot-news/" + }, + { + "href": "https://mribeirodantas.xyz/blog/index.php/2022/10/27/nextflow-and-nf-core-hot-news/", + "_key": "d81331143b3f", + "_type": "link" + } + ], + "children": [ + { + "text": "For additional detail on the summit and the preceding nf-core events, also check out an excellent ", + "_key": "97e3e98c4819", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "9efb67eaaa9c" + ], + "text": "summary of the event", + "_key": "9c1266d959e2" + }, + { + "_key": "8030c4852159", + "_type": "span", + "marks": [], + "text": " written by Marcel Ribeiro-Dantas in his blog, the " + }, + { + "_type": "span", + "marks": [ + "d81331143b3f" + ], + "text": "Dataist Storyteller", + "_key": "15eb86c2fb43" + }, + { + "text": "!", + "_key": "91d272557a18", + "_type": "span", + "marks": [] + } + ], + "_type": 
"block", + "style": "normal", + "_key": "8970bc1ad207" + }, + { + "_type": "block", + "style": "normal", + "_key": "1dbb38bb4a49", + "markDefs": [], + "children": [ + { + "_key": "0bbf4ed07388", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "em" + ], + "text": "In this event, we're showcasing some of the results of the project 'Optimization of computational resources for HPC workloads in the cloud using ML/AI' by Seqera Labs S.L. This project has been funded by the European Regional Development Fund (ERDF) of the European Union, coordinated and managed by RED.es, with the aim of carrying out the development of technological entrepreneurship and technological demand, within the framework of the Strategic Action on Digital Economy and Society of the State Program for R&D&I oriented towards societal challenges.", + "_key": "b40c29a6fbf6" + } + ], + "_type": "block", + "style": "normal", + "_key": "abee4ab67ab0" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "6ce2677ea1cc", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "2a3da94a1407" + }, + { + "alt": "grant logos", + "_key": "5cc3296ab3bc", + "asset": { + "_type": "reference", + "_ref": "image-df17e8a21b15056284176b5b0a510e2e1d265850-1146x128-png" + }, + "_type": "image" + } + ], + "meta": { + "slug": { + "current": "nextflow-summit-2022-recap" + }, + "description": "After a three-year COVID-related hiatus from in-person events, Nextflow developers and users found their way to Barcelona this October for the 2022 Nextflow Summit. Held at Barcelona’s iconic Agbar tower, this was easily the most successful Nextflow community event yet!" + }, + "_createdAt": "2024-09-25T14:16:41Z", + "_id": "32c821d8c7d0", + "author": { + "_ref": "noel-ortiz", + "_type": "reference" + }, + "_type": "blogPost", + "_updatedAt": "2024-09-30T08:56:57Z", + "tags": [ + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "4a02b5685b49" + }, + { + "_ref": "7d9ffad4-385c-433d-a409-d0bfacc3c2fe", + "_type": "reference", + "_key": "d74d9ad34186" + } + ], + "publishedAt": "2022-11-03T07:00:00.000Z", + "_rev": "mvya9zzDXWakVjnX4hhGHO" + }, + { + "publishedAt": "2023-04-17T06:00:00.000Z", + "_type": "blogPost", + "tags": [ + { + "_type": "reference", + "_key": "415f09645498", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493" + } + ], + "body": [ + { + "children": [ + { + "marks": [], + "text": "Introduction", + "_key": "3e52c261031e", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "525bb3f03bf7", + "markDefs": [] + }, + { + "alignment": "right", + "asset": { + "_type": "image", + "asset": { + "_type": "reference", + "_ref": "image-296ee97517ad882c11b8a7e1b2eb7e03b530f129-2000x2307-webp" + } + }, + "size": "small", + "_type": "picture", + "alt": "Mentorship rocket", + "_key": "6a6b9272367a" + }, + { + "_type": "block", + "style": "normal", + "_key": "3b4aff33b6ff", + "markDefs": [], + "children": [ + { + "text": "The global Nextflow and nf-core community is thriving with strong engagement in several countries. 
As we continue to expand and grow, we remain committed to prioritizing inclusivity and actively reaching groups with low representation.", + "_key": "35a380c1c58e", + "_type": "span", + "marks": [] + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "216db291d6ea", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Thanks to the support of our Chan Zuckerberg Initiative Diversity and Inclusion grant, we established an international Nextflow and nf-core mentoring program. With the second round of the mentorship program now complete, we celebrate the success of the most recent cohort of mentors and mentees.", + "_key": "68d5ed9530ea", + "_type": "span" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_key": "1f050baf9052", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "344cb78b22a0" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "From hundreds of applications, thirteen pairs of mentors and mentees were chosen for the second round of the program. For the past four months, they met regularly to collaborate on Nextflow or nf-core projects. The project scope was left up to the mentees, enabling them to work on any project aligned with their scientific interests and schedules.", + "_key": "7e87203711fe" + } + ], + "_type": "block", + "style": "normal", + "_key": "48ffd04f48ea", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "1f24ae310b30", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "b675187d6125" + } + ] + }, + { + "_key": "64200c4077bb", + "markDefs": [], + "children": [ + { + "text": "Mentor-mentee pairs worked on a range of projects that included learning Nextflow and nf-core fundamentals, setting up Nextflow on their institutional clusters, translating Nextflow training material into other languages, and developing and implementing Nextflow and nf-core pipelines. Impressively, despite many mentees starting the program with very limited knowledge of Nextflow and nf-core, they completed the program with confidence and improved their abilities to develop and implement scalable and reproducible scientific workflows.", + "_key": "85dcf7c955f6", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "0878c8ec6100", + "markDefs": [], + "children": [ + { + "_key": "e1d4a35799b2", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "2ec1a88e080d", + "asset": { + "_type": "reference", + "_ref": "image-4f422c7bc58251fb76e6c1142f2faa64175bebd3-3308x1500-png" + }, + "_type": "image", + "alt": "Map of mentor and mentee pairs" + }, + { + "_type": "block", + "style": "normal", + "_key": "98caa9223294", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "4b95c7706d2b" + } + ] + }, + { + "style": "h2", + "_key": "54eee158ee59", + "markDefs": [], + "children": [ + { + "text": "Jing Lu (Mentee) & Moritz Beber (Mentor)", + "_key": "ce57dfd53a7f", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "children": [ + { + "marks": [], + "text": "Jing joined the program with the goal of learning how to develop advanced Nextflow pipelines for disease surveillance at the Guangdong Provincial Center for Diseases Control and Prevention in China. 
His mentor was Moritz Beber from Denmark.", + "_key": "7b46fe6b3e82", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "a6e48d8e3010", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "1b869774f778" + } + ], + "_type": "block", + "style": "normal", + "_key": "d2386f7069d0", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Together, Jing and Moritz developed a pipeline for the analysis of SARS-CoV-2 genomes from sewage samples. They also used GitHub and docker containers to make the pipeline more sharable and reproducible. In the future, Jing hopes to use Nextflow Tower to share the pipeline with other institutions.", + "_key": "b641a4be0ba2" + } + ], + "_type": "block", + "style": "normal", + "_key": "97b70c49ad3c", + "markDefs": [] + }, + { + "style": "normal", + "_key": "48e9a89e7de9", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "747dd15bdba8", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Luria Leslie Founou (Mentee) & Sebastian Malkusch (Mentor)", + "_key": "d03e0749b84e" + } + ], + "_type": "block", + "style": "h2", + "_key": "d2057ac26668", + "markDefs": [] + }, + { + "children": [ + { + "_key": "b040a1966338", + "_type": "span", + "marks": [], + "text": "Luria's goal was to accelerate her understanding of Nextflow and apply it to her exploration of the resistome, virulome, mobilome, and phylogeny of bacteria at the Research Centre of Expertise and Biological Diagnostic of Cameroon. Luria was mentored by Sebastian Malkusch, Kolja Becker, and Alex Peltzer from the Boehringer Ingelheim Pharma GmbH & Co. KG in Germany." + } + ], + "_type": "block", + "style": "normal", + "_key": "5f7459fdfb81", + "markDefs": [] + }, + { + "_key": "5ce760f8fac9", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "5e7b63d864b1", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/SMLMS/nfml", + "_key": "44b1d8337e8e" + } + ], + "children": [ + { + "marks": [], + "text": "For their project, Luria and her mentors developed a ", + "_key": "2d4c45448d98", + "_type": "span" + }, + { + "marks": [ + "44b1d8337e8e" + ], + "text": "pipeline", + "_key": "57d76ea62cd4", + "_type": "span" + }, + { + "marks": [], + "text": " for mapping multi-dimensional feature space onto a discrete or continuous response variable by using multivariate models from the field of classical machine learning. 
Their pipeline will be able to handle classification, regression, and time-to-event models and can be used for model training, validation, and feature selection.", + "_key": "18e6eac5a7f1", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "497a5ac365e7" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "12dc3883a7b7" + } + ], + "_type": "block", + "style": "normal", + "_key": "76eb69a98c80" + }, + { + "_key": "5dd88399395c", + "markDefs": [], + "children": [ + { + "_key": "50d46d8c26ca", + "_type": "span", + "marks": [], + "text": "Sebastian Musundi (Mentee) & Athanasios Baltzis (Mentor)" + } + ], + "_type": "block", + "style": "h2" + }, + { + "markDefs": [], + "children": [ + { + "_key": "fe3ae02a69fe", + "_type": "span", + "marks": [], + "text": "Sebastian, from Mount Kenya University in Kenya, joined the program with the goal of using Nextflow pipelines to identify vaccine targets in Apicomplexan parasites. He was mentored by Athanasios Baltzis from the Centre for Genomic Regulation in Spain." + } + ], + "_type": "block", + "style": "normal", + "_key": "ac49f1547b11" + }, + { + "_key": "4c2fffb4a79a", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "e323452dd9c2" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "3748614cc072", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/sebymusundi/simple_RNA-seq", + "_key": "3ebc603c0d0c" + }, + { + "href": "https://github.com/sebymusundi/AMR_pipeline", + "_key": "788f59bfb7d9", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "With Athanasios’s help, Sebastian learned the fundamentals for developing Nextflow pipelines. During the learning process, they developed a ", + "_key": "3eac53f884e6" + }, + { + "_key": "9326762ec156", + "_type": "span", + "marks": [ + "3ebc603c0d0c" + ], + "text": "pipeline" + }, + { + "marks": [], + "text": " for customized RNA sequencing and a ", + "_key": "109a3ca7c7d9", + "_type": "span" + }, + { + "text": "pipeline", + "_key": "363071683a14", + "_type": "span", + "marks": [ + "788f59bfb7d9" + ] + }, + { + "text": " for predicting antimicrobial resistance genes. With his new skills, Sebastian plans to keep using Nextflow on a daily basis and start contributing to nf-core.", + "_key": "d19a827ee592", + "_type": "span", + "marks": [] + } + ] + }, + { + "style": "normal", + "_key": "d5f44ed008a0", + "markDefs": [], + "children": [ + { + "_key": "086f43a60427", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "069d80655355", + "_type": "span", + "marks": [], + "text": "Juan Ugalde (Mentee) & Robert Petit (Mentor)" + } + ], + "_type": "block", + "style": "h2", + "_key": "636e38f0205a", + "markDefs": [] + }, + { + "_key": "637e6c9e9a35", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Juan joined the mentorship program with the goal of improving his understanding of Nextflow to support microbial and viral analysis at the Universidad Andres Bello in Chile. Juan was mentored by Robert Petit from the Wyoming Public Health Laboratory in the USA. 
Robert is an experienced Nextflow mentor who also mentored in Round 1 of the program.", + "_key": "4486c7293ca1" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_key": "ad9a0d8a7e1f", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "a5beb6c00b10" + }, + { + "children": [ + { + "_key": "573f63cca5c4", + "_type": "span", + "marks": [], + "text": "Juan and Robert shared an interest in viral genomics. After learning more about the Nextflow and nf-core ecosystem, Robert mentored Juan as he developed a Nextflow viral amplicon analysis " + }, + { + "text": "pipeline", + "_key": "f3526da1c15f", + "_type": "span", + "marks": [ + "8207c0b28e6a" + ] + }, + { + "_key": "d275e40ff977", + "_type": "span", + "marks": [], + "text": ". Juan will continue his Nextflow and nf-core journey by sharing his new knowledge with his group and incorporating it into his classes in the coming semester." + } + ], + "_type": "block", + "style": "normal", + "_key": "6367817fbd52", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/gene2dis/hantaflow", + "_key": "8207c0b28e6a" + } + ] + }, + { + "style": "normal", + "_key": "4fd443d9ccee", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "ec1acdc2f95f" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Bhargava Reddy Morampalli (Mentee) & Venkat Malladi (Mentor)", + "_key": "f985659e0e7f" + } + ], + "_type": "block", + "style": "h2", + "_key": "d4aee9cbea63", + "markDefs": [] + }, + { + "children": [ + { + "marks": [], + "text": "Bhargava studies at Massey University in New Zealand and joined the program with the goal of improving his understanding of Nextflow and resolving issues he was facing while developing a pipeline to analyze Nanopore direct RNA sequencing data. Bhargava was mentored by Venkat Malladi from Microsoft in the USA.", + "_key": "8610aaef74ee", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "a633f3dfae2f", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "bfb01482681b", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "0b25e6994be3" + }, + { + "_type": "block", + "style": "normal", + "_key": "8d486e62a86e", + "markDefs": [ + { + "_key": "650af6107ea0", + "_type": "link", + "href": "https://github.com/bhargava-morampalli/rnamods-nf/" + } + ], + "children": [ + { + "_key": "20c95e7d8d0f", + "_type": "span", + "marks": [], + "text": "Bhargava and Venkat worked on Bhargava’s " + }, + { + "text": "pipeline", + "_key": "c66e252923f9", + "_type": "span", + "marks": [ + "650af6107ea0" + ] + }, + { + "_key": "ee0b224f2330", + "_type": "span", + "marks": [], + "text": " to identify RNA modifications from bacteria. Their successes included advancing the pipeline and making Singularity images for the tools Bhargava was using to make it more reproducible. For Bhargava, the mentorship program was a great kickstart for learning Nextflow and his pipeline development. He hopes to continue to develop his pipeline and optimize it for cloud platforms in the future." 
+ } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "4228ee80c75c", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "343a89b1c156" + } + ] + }, + { + "_type": "block", + "style": "h2", + "_key": "56213ac2dd95", + "markDefs": [], + "children": [ + { + "_key": "59e2c600143a", + "_type": "span", + "marks": [], + "text": "Odion Ikhimiukor (Mentee) & Ben Sherman (Mentor)" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Before the program, Odion, who is at the University at Albany in the USA, was new to Nextflow and nf-core. He joined the program with the goal of improving his understanding and to learn how to develop pipelines for bacterial genome analysis. His mentor Ben Sherman works for Seqera Labs in the USA.", + "_key": "6e039be53409", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "770ee0b93069" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "fa27234dfbce" + } + ], + "_type": "block", + "style": "normal", + "_key": "2166349efa1a" + }, + { + "style": "normal", + "_key": "668189e17022", + "markDefs": [ + { + "href": "https://github.com/odionikh/nf-practice", + "_key": "e225da208116", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "During the program Odion and Ben developed a ", + "_key": "91653e2d2cd2" + }, + { + "_type": "span", + "marks": [ + "e225da208116" + ], + "text": "pipeline", + "_key": "6a1767a75240" + }, + { + "text": " to analyze bacterial genomes for antimicrobial resistance surveillance. They also developed configuration settings to enable the deployment of their pipeline with high and low resources. Odion has plans to share his new knowledge with others in his community.", + "_key": "2afb1fddf1e6", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "471d6a74dfbb", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "00778bf1fa88" + }, + { + "children": [ + { + "_key": "aa11a588c358", + "_type": "span", + "marks": [], + "text": "Batool Almarzouq (Mentee) & Murray Wham (Mentor)" + } + ], + "_type": "block", + "style": "h2", + "_key": "748a135720d6", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Batool works at the King Abdullah International Medical Research Center in Saudi Arabia. Her goal for the mentorship program was to contribute to, and develop, nf-core pipelines. Additionally, she aimed to develop new educational resources for nf-core that can support researchers from lowly represented groups. Her mentor was Murray Wham from the ​​University of Edinburgh in the UK.", + "_key": "df4f8c8527fd" + } + ], + "_type": "block", + "style": "normal", + "_key": "434c57438402" + }, + { + "_type": "block", + "style": "normal", + "_key": "48d30e09742e", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "4d6e7f6accac" + } + ] + }, + { + "style": "normal", + "_key": "db95ac931c23", + "markDefs": [], + "children": [ + { + "text": "During the mentorship program, Murray helped Batool develop her molecular dynamics pipeline and participate in the 1st Biohackathon in MENA (KAUST). 
Batool and Murray also found ways to make documentation more accessible and are actively promoting Nextflow and nf-core in Saudi Arabia.", + "_key": "310188dcc9c8", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "14458002b981", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "91b6add5e26f", + "_type": "span" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "Mariama Telly Diallo (Mentee) & Emilio Garcia (Mentor)", + "_key": "f921fb338fcb", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2", + "_key": "ca49b2be4552" + }, + { + "_key": "a57da0a5052e", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Mariama Telly joined the mentorship program with the goal of developing and implementing Nextflow pipelines for malaria research at the Medical Research Unit at The London School of Hygiene and Tropical Medicine in Gambia. She was mentored by Emilio Garcia from Platomics in Austria. Emilio is another experienced mentor who joined the program for a second time.", + "_key": "b4a2d00dcde1" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "1d5ae90ba598" + } + ], + "_type": "block", + "style": "normal", + "_key": "470440f42c11" + }, + { + "style": "normal", + "_key": "9b311f907f45", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Together, Mariama Telly and Emilio worked on learning the basics of Nextflow, Git, and Docker. Putting these skills into practice, they started to develop a Nextflow pipeline with a Dockerfile and custom configuration. Mariama Telly greatly improved her understanding of best practices and Nextflow and intends to use her newfound knowledge for future projects.", + "_key": "b9e78ee7b1a4" + } + ], + "_type": "block" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "c414696fe436", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "0128e794552d", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "text": "Anabella Trigila (Mentee) & Matthias De Smet (Mentor)", + "_key": "88b225aa892b", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2", + "_key": "39ea4f40a7bf" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Anabella’s goal was to set up Nextflow on her institutional cluster at Héritas S.A. in Argentina and translate some bash pipelines into Nextflow pipelines. Anabella was mentored by Matthias De Smet from Ghent University in Belgium.", + "_key": "1fd1937bac6a" + } + ], + "_type": "block", + "style": "normal", + "_key": "59a70fdae005", + "markDefs": [] + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "5d7b3779b31d", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "fb07e48878ca", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Anabella and Matthias worked on developing several new nf-core modules. 
Extending this, they started the development of a ", + "_key": "f5d3d35ef216" + }, + { + "marks": [ + "e8fa315ad585" + ], + "text": "pipeline", + "_key": "af1614ab86f5", + "_type": "span" + }, + { + "text": " to process VCFs obtained from saliva samples and a ", + "_key": "bf31ed761697", + "_type": "span", + "marks": [] + }, + { + "_key": "f887351486a9", + "_type": "span", + "marks": [ + "735bd3e7c79e" + ], + "text": "pipeline" + }, + { + "_type": "span", + "marks": [], + "text": " to infer ancestry from VCF samples. Anabella has now transitioned from a user to a developer and made multiple contributions to the most recent nf-core hackathon. She also contributed to the Spanish translation of the Nextflow ", + "_key": "d77d73bd125a" + }, + { + "_type": "span", + "marks": [ + "299997f08407" + ], + "text": "training material", + "_key": "1bd5cfd1f20b" + }, + { + "marks": [], + "text": ".", + "_key": "064c8e43c9ff", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "2b17f296e8d4", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/atrigila/nf-core-saliva", + "_key": "e8fa315ad585" + }, + { + "_type": "link", + "href": "https://github.com/atrigila/nf-core-ancestry", + "_key": "735bd3e7c79e" + }, + { + "href": "https://training.nextflow.io/es/", + "_key": "299997f08407", + "_type": "link" + } + ] + }, + { + "style": "normal", + "_key": "ac19729b9b83", + "markDefs": [], + "children": [ + { + "_key": "3e9a6f12161a", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "a3b8430d7b40", + "_type": "span", + "marks": [], + "text": "Juliano de Oliveira Silveira (Mentee) & Maxime Garcia (Mentor)" + } + ], + "_type": "block", + "style": "h2", + "_key": "f8cc67208122" + }, + { + "_key": "4aab67d86676", + "markDefs": [], + "children": [ + { + "_key": "3139d82fd160", + "_type": "span", + "marks": [], + "text": "Juliano works at the Laboratório Central de Saúde Pública RS in Brazil. He joined the program with the goal of setting up Nextflow at his institution, which led him to learn to write his own pipelines. Juliano was mentored by Maxime Garcia from Seqera Labs in Sweden." + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "65c413e0302d", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "cac1ea80b1fa", + "_type": "span", + "marks": [] + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_key": "aac0ea80ffe6", + "_type": "span", + "marks": [], + "text": "Juliano and Maxime worked on learning about Nextflow and nf-core. Juliano applied his new skills to an open-source bioinformatics program that used Nextflow with a customized R script. Juliano hopes to give back to the wider community and peers in Brazil." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "dbff314cfc83" + }, + { + "_key": "4e960ca3ecd4", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "6211c11e7953" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Patricia Agudelo-Romero (Mentee) & Abhinav Sharma (Mentor)", + "_key": "49549cdb8bd6" + } + ], + "_type": "block", + "style": "h2", + "_key": "91b0a258937b", + "markDefs": [] + }, + { + "children": [ + { + "_key": "d89e8eab154f", + "_type": "span", + "marks": [], + "text": "Patricia's goal was to create, customize, and deploy nf-core pipelines at the Telethon Kids Institute in Australia. Her mentor was Abhinav Sharma from Stellenbosch University in South Africa." + } + ], + "_type": "block", + "style": "normal", + "_key": "97d2d804cee7", + "markDefs": [] + }, + { + "_key": "177216cd84dc", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "f07021f77d50" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/agudeloromero/everest_nf", + "_key": "6ea283d18044" + } + ], + "children": [ + { + "text": "Abhinav helped Patricia learn how to write reproducible pipelines with Nextflow and how to work with shared code repositories on GitHub. With Abhinav's support, Patricia worked on translating a Snakemake ", + "_key": "51c0cbfd0f0e", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "6ea283d18044" + ], + "text": "pipeline", + "_key": "0a743f4c01e0" + }, + { + "marks": [], + "text": " designed for genome virus identification and classification into Nextflow. Patricia is already applying her new skills and supporting others at her institute as they adopt Nextflow.", + "_key": "83ec3af7880e", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "dd5062b3d80a" + }, + { + "_key": "3416df01b2e1", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "bd1b4ecb89d0" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "9316d9ccae05", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Mariana Guilardi (Mentee) & Alyssa Briggs (Mentor)", + "_key": "8e0eb1ce4cae", + "_type": "span" + } + ], + "_type": "block", + "style": "h2" + }, + { + "style": "normal", + "_key": "c02510e8c436", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Mariana’s goal was to learn the fundamentals of Nextflow, construct and run pipelines, and help with nf-core pipeline development. Her mentor was Alyssa Briggs from the University of Texas at Dallas in the USA", + "_key": "4e96508b14d9" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "d48795156be5" + } + ], + "_type": "block", + "style": "normal", + "_key": "dd61070c1b15" + }, + { + "_type": "block", + "style": "normal", + "_key": "e628f4a0baf3", + "markDefs": [ + { + "_key": "6bd4fc70e346", + "_type": "link", + "href": "https://github.com/nf-core/viralintegration" + }, + { + "_type": "link", + "href": "https://training.nextflow.io/pt/", + "_key": "fb7b816da387" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "At the start of the program, Alyssa helped Mariana learn the fundamentals of Nextflow. 
With Alyssa’s help, Mariana’s skills progressed rapidly and by the end of the program, they were running pipelines and developing new nf-core modules and the ", + "_key": "92faca3df278" + }, + { + "marks": [ + "6bd4fc70e346" + ], + "text": "nf-core/viralintegration", + "_key": "44ba1014ada3", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " pipeline. Mariana also made community contributions to the Portuguese translation of the Nextflow ", + "_key": "007213b86568" + }, + { + "_key": "b70b14346aae", + "_type": "span", + "marks": [ + "fb7b816da387" + ], + "text": "training material" + }, + { + "marks": [], + "text": ".", + "_key": "2e486bbf0194", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "7ca87a872535", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "8295dab93b70" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_key": "213a4681d259", + "_type": "span", + "marks": [], + "text": "Liliane Cavalcante (Mentee) & Marcel Ribeiro-Dantas (Mentor)" + } + ], + "_type": "block", + "style": "h2", + "_key": "e9379cc0d9a0" + }, + { + "style": "normal", + "_key": "98035bef980b", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Liliane’s goal was to develop and apply Nextflow pipelines for genomic and epidemiological analyses at the Laboratório Central de Saúde Pública Noel Nutels in Brazil. Her mentor was Marcel Ribeiro-Dantas from Seqera Labs in Brazil.", + "_key": "395aa4c3cfa8" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "73974a46900d", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "b8ecebfbb35c" + }, + { + "_key": "36a1cc4a4a72", + "markDefs": [ + { + "href": "https://nf-co.re/viralrecon", + "_key": "55a09b251f0b", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "Liliane and Marcel used Nextflow and nf-core to analyze SARS-CoV-2 genomes and demographic data for public health surveillance. They used the ", + "_key": "55def84c843f", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "55a09b251f0b" + ], + "text": "nf-core/viralrecon", + "_key": "911929c07070" + }, + { + "_type": "span", + "marks": [], + "text": " pipeline and made a new Nextflow script for additional analysis and generating graphs.", + "_key": "42780bb1907b" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "11cfae6714ba" + } + ], + "_type": "block", + "style": "normal", + "_key": "de7df9a35912", + "markDefs": [] + }, + { + "style": "h2", + "_key": "ac1c90bbefff", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Conclusion", + "_key": "b933ccbeb02f", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "65b391a5bb9e", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "As with the first round of the program, the feedback about the second round of the mentorship program was overwhelmingly positive. 
All mentees found the experience to be highly beneficial and were grateful for the opportunity to participate.", + "_key": "b5a3812d4e54", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "a30feefaf203", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "01de87bef0ac", + "_type": "span", + "marks": [] + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [ + "em" + ], + "text": "“Having a mentor guide through the entire program was super cool. We worked all the way from the basics of Nextflow and learned a lot about developing and debugging pipelines. Today, I feel more confident than before in using Nextflow on a daily basis.”", + "_key": "9edeb6d4693c" + }, + { + "_key": "d242b1752439", + "_type": "span", + "marks": [], + "text": " - " + }, + { + "text": "Sebastian Musundi (Mentee)", + "_key": "1f8d311e8dd8", + "_type": "span", + "marks": [ + "strong" + ] + } + ], + "_type": "block", + "style": "blockquote", + "_key": "794fab9456c9", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "30e2a487c882", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "4791eecec4b4" + }, + { + "children": [ + { + "text": "Similarly, the mentors also found the experience to be highly rewarding.", + "_key": "8f5dc24c47ef", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "086108767cde", + "markDefs": [] + }, + { + "style": "normal", + "_key": "645839f795c1", + "markDefs": [], + "children": [ + { + "_key": "426ba1c238c6", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "2438fa8874ed", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "em" + ], + "text": "“As a mentor, I really enjoyed participating in the program. Not only did I have the chance to support and work with colleagues from lowly represented regions, but also I learned a lot and improved myself through the mentoring and teaching process.”", + "_key": "88ab924e5e1e" + }, + { + "text": " - ", + "_key": "311a5fbe03f2", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Athanasios Baltzis (Mentor)", + "_key": "4f34c8a51700" + } + ], + "_type": "block", + "style": "blockquote" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "f15db0ab7909", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "270fa55ead93", + "markDefs": [] + }, + { + "children": [ + { + "marks": [], + "text": "Importantly, all program participants expressed their willingness to encourage others to be part of it in the future.", + "_key": "df88028aaf56", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "1c46929a4c0a", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "5a4aafd2b225", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "fe2a12c53e8e" + }, + { + "_key": "9af78e3e136e", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "em" + ], + "text": "“The mentorship allows mentees not only to learn nf-core/Nextflow but also a lot of aspects about open-source reproducible research. With your learning, at the end of the mentorship, you could even contribute back to the nf-core community, which is fantastic! 
I would tell everyone who is interested in the program to go for it.”", + "_key": "299fb30fd7fb" + }, + { + "text": " - ", + "_key": "8c943a4dc19c", + "_type": "span", + "marks": [] + }, + { + "text": "Anabella Trigila (Mentee)", + "_key": "ed6fb023a2e4", + "_type": "span", + "marks": [ + "strong" + ] + } + ], + "_type": "block", + "style": "blockquote" + }, + { + "style": "normal", + "_key": "d1207f7d033c", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "353fa6ea904c" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "4ecd17d6fd43", + "markDefs": [], + "children": [ + { + "_key": "652f7863d8dc", + "_type": "span", + "marks": [], + "text": "As the Nextflow and nf-core communities continue to grow, the mentorship program will have long-lasting benefits beyond those that can be immediately measured. Mentees from the program have already become positive role models, contributing new perspectives to the broader community." + } + ], + "_type": "block" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "6fa2280cc5e3", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "d0edd8522db0", + "markDefs": [] + }, + { + "_type": "block", + "style": "blockquote", + "_key": "87331f20c77e", + "markDefs": [], + "children": [ + { + "_key": "ce3fbe0207a6", + "_type": "span", + "marks": [ + "em" + ], + "text": "“I highly recommend this program. Independent if you are new to Nextflow or already have some experience, the possibility of working with amazing people to learn about the Nextflow ecosystem is invaluable. It helped me to improve my work, learn new things, and become confident enough to teach Nextflow to students.”" + }, + { + "_type": "span", + "marks": [], + "text": " - ", + "_key": "e493bfa57681" + }, + { + "_key": "3eb08e1c47d7", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Juan Ugalde (Mentee)" + } + ] + }, + { + "style": "normal", + "_key": "a870f6e1ab73", + "markDefs": [], + "children": [ + { + "_key": "d25d61e6ad30", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "389695383bdc", + "markDefs": [ + { + "href": "https://nf-co.re/mentorships", + "_key": "b207da55422c", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "We were delighted with the achievements of the mentors and mentees. Applications for the third round are now open! For more information, please visit ", + "_key": "bc23798d9ac4", + "_type": "span" + }, + { + "text": "https://nf-co.re/mentorships", + "_key": "0861ff1514cd", + "_type": "span", + "marks": [ + "b207da55422c" + ] + }, + { + "_key": "932b2531fca1", + "_type": "span", + "marks": [], + "text": "." + } + ], + "_type": "block", + "style": "normal" + } + ], + "_createdAt": "2024-09-25T14:17:14Z", + "meta": { + "slug": { + "current": "czi-mentorship-round-2" + }, + "description": "The global Nextflow and nf-core community is thriving with strong engagement in several countries. As we continue to expand and grow, we remain committed to prioritizing inclusivity and actively reaching groups with low representation." 
+ }, + "title": "Nextflow and nf-core Mentorship, Round 2", + "_id": "34143f856ae6", + "_updatedAt": "2024-10-14T10:31:38Z", + "_rev": "Ot9x7kyGeH5005E3MJ9OQ9", + "author": { + "_ref": "chris-hakkaart", + "_type": "reference" + } + }, + { + "_id": "3568f05cdd76", + "author": { + "_ref": "5bLgfCKN00diCN0ijmWOx7", + "_type": "reference" + }, + "body": [ + { + "style": "normal", + "_key": "08eb8e28ce6c", + "markDefs": [], + "children": [ + { + "text": "From December 2022 to March 2023, I was part of the second cohort of the Nextflow and nf-core mentorship program, which spanned four months and attracted participants globally. I could not have anticipated the extent to which my participation in this program and the associated learning experiences would positively change my professional growth. The mentorship aims to foster collaboration, knowledge exchange, flexible learning, collaborative coding, and contributions to the nf-core community. It was funded by the Chan Zuckerberg Initiative and is guided by experienced mentors in the community. In the upcoming paragraphs, I'll be sharing more details about the program—its structure, the valuable learning experiences it brought, and the exciting opportunities it opened up for me.", + "_key": "c37e44a56e51", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_key": "084f7bb139c4", + "children": [ + { + "_type": "span", + "text": "", + "_key": "3da880238de0" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "_key": "602f29bcb707" + }, + { + "children": [ + { + "_type": "span", + "text": "Meeting my mentor", + "_key": "c89aba3992cb" + } + ], + "_type": "block", + "style": "h1", + "_key": "31ac55a88c7d" + }, + { + "_key": "5e3fa09ebcf1", + "markDefs": [], + "children": [ + { + "text": "One of the most interesting aspects of the mentorship is that the program emphasizes that mentor-mentee pairs share research interests. In addition, the mentor should have significant experience in the areas where the mentee wants to develop. I found this extremely valuable, as it makes the program very flexible while also considering individual goals and interests. My goal as a mentee was to transition from a ", + "_key": "96b5206cc67c", + "_type": "span", + "marks": [] + }, + { + "_key": "192c49f808d6", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Nextflow user to a Nextflow developer" + }, + { + "marks": [], + "text": ".", + "_key": "d5e0c45af35a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "d04bc5fdca31", + "children": [ + { + "text": "", + "_key": "cffea9af48cf", + "_type": "span" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "I was lucky enough to have Matthias De Smet as a mentor. He is a member of the Center for Medical Genetics in Ghent and has extensive experience working with open-source projects such as nf-core and Bioconda. 
His experience working in clinical genomics was a common ground for us to communicate, share experiences and build effective collaboration.", + "_key": "ab508f5d1c60", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "127040f3d6cf", + "markDefs": [] + }, + { + "children": [ + { + "_key": "16dcc0bc56c2", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "be1a816ea55d" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "During my first days, he guided me to the most useful Nextflow resources available online, tailored to my goals. Then, I drafted a pipeline that I wanted to build and attempted to write my first lines of code in Nextflow. We communicated via Slack and Matthias reviewed and corrected my code via GitHub. He introduced me to the supportive nf-core community, to ask for help when needed, and to acknowledge every success along the way.", + "_key": "7db32e6934ee" + } + ], + "_type": "block", + "style": "normal", + "_key": "9e8bd42c388b", + "markDefs": [] + }, + { + "_key": "161d577708cd", + "children": [ + { + "_type": "span", + "text": "", + "_key": "6c190c4de1ae" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "image", + "alt": "Mentor compliment about new module added", + "_key": "1b8a603cb633", + "asset": { + "_ref": "image-7d35ff2925da1534129b9c7dd8bfbade190da61c-1132x204-png", + "_type": "reference" + } + }, + { + "children": [ + { + "_key": "ac600b3104d1", + "_type": "span", + "text": "Highlights of the program" + } + ], + "_type": "block", + "style": "h1", + "_key": "d8cab8f801b9" + }, + { + "_key": "f0a1d33a074d", + "markDefs": [], + "children": [ + { + "text": "We decided to start small, setting step-by-step goals. Matthias suggested that a doable goal would be to create my first Nextflow module in the context of a broader pipeline I wanted to develop. A module is a building block that encapsulates a specific functionality or task within a workflow. We realized that the tool I wanted to modularize was not available as part of nf-core. The nf-core GitHub has a community-driven collection of Nextflow modules, subworkflows and pipelines for bioinformatics, providing standardized and well-documented modules. The goal, therefore, was to create a module for this missing tool and then submit it as a contribution to nf-core.", + "_key": "27cfe03ffc9b", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "5203da423a25", + "children": [ + { + "_type": "span", + "text": "", + "_key": "40eb6e2fae90" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "For those unfamiliar, contributing to nf-core requires another member of the community, usually a maintainer, to review your code. As a newcomer, I was obviously curious about how the process would be. In academia, where anonymity often prevails, feedback can occasionally be a bit stringent. 
Conversely, during my submission to the nf-core project, I was pleasantly surprised that reviewers look for collective improvement, providing quick, constructive and amicable reviews, leading to a positive environment.", + "_key": "22984c4f5707", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "9e9f8a0fad70", + "markDefs": [] + }, + { + "_key": "f7048aab1f35", + "children": [ + { + "text": "", + "_key": "21bb257c00e1", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "alt": "Review comment in GitHub", + "_key": "b5af7aaa5e40", + "asset": { + "_type": "reference", + "_ref": "image-f3994b9f4e06fba6be7552431a56c828079f9c77-1106x226-png" + }, + "_type": "image" + }, + { + "_type": "block", + "style": "normal", + "_key": "3d757dc9cc22", + "markDefs": [], + "children": [ + { + "text": "For my final project in the mentorship program, I successfully ported a complete pipeline from Bash to Nextflow. This was a learning experience that allowed me to explore a diverse range of skills, such as modularizing content, understanding how crucial the meta map is, and creating Docker container images for software. This process not only enhanced my proficiency in Nextflow but also allowed me to interact with and contribute to related projects like Bioconda and BioContainers.", + "_key": "0516df23bfc1", + "_type": "span", + "marks": [] + } + ] + }, + { + "style": "normal", + "_key": "fe1c3b09599d", + "children": [ + { + "_type": "span", + "text": "", + "_key": "7d4ff084080c" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "h1", + "_key": "bdfb527e143d", + "children": [ + { + "_key": "056c66229cbe", + "_type": "span", + "text": "Life after the mentorship" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "8be45b9b4b93", + "markDefs": [ + { + "_type": "link", + "href": "https://www.youtube.com/watch?v=GHb2Wt9VCOg", + "_key": "8daa85555ae9" + } + ], + "children": [ + { + "marks": [], + "text": "With the skills I acquired during the mentorship as a mentee, I proposed and successfully implemented a custom solution in Nextflow for a precision medicine start-up I worked at the time that could sequentially do several diagnostics and consumer-genetics applications in the cloud, resulting in substantial cost savings and increasing flexibility for the company. Beyond my immediate projects, I joined a group actively developing an open-source Nextflow pipeline for genetic imputation. This project allowed me to be in close contact with members of the nf-core community working on similar projects, adding new tools to this pipeline, giving and receiving feedback, and continuing to improve my overall Nextflow skills while also contributing to the broader bioinformatics community. You can learn more about this project with the fantastic talk by Louis Le Nézet at Nextflow Summit 2023 ", + "_key": "8dd5c7965d4b", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "8daa85555ae9" + ], + "text": "here", + "_key": "4d0d91d051e1" + }, + { + "_key": "bafba6a7cd2c", + "_type": "span", + "marks": [], + "text": "." + } + ] + }, + { + "style": "normal", + "_key": "b97f09e8d2f6", + "children": [ + { + "_type": "span", + "text": "", + "_key": "23f64f77d16b" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "cab82421bbcc", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Finally, I was honored to become a Nextflow ambassador. 
The program’s goal is to extend the awareness of Nextflow around the world while also building a supportive community. In particular, the South American community is underrepresented, so I serve as a point of contact for any institution or newcomer who wants to implement pipelines with Nextflow. As part of this program, I was invited to speak at the second Chilean Congress of Bioinformatics, where I gave a talk about how Nextflow and nf-core can support scaling bioinformatics projects in the cloud. It was incredibly rewarding to introduce Nextflow to a community for the first time and witness the genuine enthusiasm it sparks among students and attendees for the potential in their research projects.", + "_key": "b8286a2004c9" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "bef5727b0094", + "children": [ + { + "_type": "span", + "text": "", + "_key": "93fcde1f454c" + } + ], + "_type": "block" + }, + { + "_type": "image", + "alt": "Second Chilean Congress of Bioinformatics", + "_key": "70e9f82405c4", + "asset": { + "_ref": "image-2591b4bfffbda1b9fb2b7e8ba72f82efd8b61148-1202x796-png", + "_type": "reference" + } + }, + { + "_key": "8d0f1e22b905", + "children": [ + { + "_type": "span", + "text": "What’s next?", + "_key": "b1f267ae95ea" + } + ], + "_type": "block", + "style": "h1" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "The comprehensive skill set acquired in my journey proved to be incredibly valuable for my professional development and allowed me to join the ZS Discovery Team as a Senior Bioinformatician. This organization accelerates transformation in research and early development with direct contribution to impactful bioinformatics projects with a globally distributed, multidisciplinary talented team.", + "_key": "b8ad1a5313ed" + } + ], + "_type": "block", + "style": "normal", + "_key": "71a8c0ef7949", + "markDefs": [] + }, + { + "_key": "ca296a4bb901", + "children": [ + { + "text": "", + "_key": "cedac273c584", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In addition, we organized a local site for the nf-core hackathon in March 2024, the first Nextflow Hackathon in Argentina, fostering a space to advance our skills in workflow management collectively. It was a pleasure to see how beginners got their first PRs approved and how they interacted with the nf-core community for the first time.", + "_key": "d412cf7a30e9" + } + ], + "_type": "block", + "style": "normal", + "_key": "59765e76faf5" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "bf3cd841b664" + } + ], + "_type": "block", + "style": "normal", + "_key": "8ff2c8514806" + }, + { + "_key": "81a165770b06", + "asset": { + "_ref": "image-cae7d65050e9549e61530a352ec7f6a80d1168db-1198x898-png", + "_type": "reference" + }, + "_type": "image", + "alt": "nf-core March 2024 Hackathon site in Argentina" + }, + { + "children": [ + { + "text": "My current (and probably future!) day-to-day work involves working and developing pipelines with Nextflow, while also mentoring younger bioinformaticians into this language. 
The commitment to open-source projects remains a cornerstone of my journey and I am thankful that it has provided me the opportunity to collaborate with individuals from diverse backgrounds all over the world.", + "_key": "8f0a4e63b316", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "6c0736dae390", + "markDefs": [] + }, + { + "style": "normal", + "_key": "2a3d13fd6c64", + "children": [ + { + "_key": "5aefe083fe3c", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Whether you're interested in the mentorship program, curious about the hackathon, or simply wish to connect, feel free to reach out at the nf-core Slack!", + "_key": "b811a3387b13" + } + ], + "_type": "block", + "style": "normal", + "_key": "50290758700f" + } + ], + "meta": { + "slug": { + "current": "reflections-on-nextflow-mentorship" + } + }, + "title": "One-Year Reflections on Nextflow Mentorship", + "_updatedAt": "2024-09-25T14:18:42Z", + "publishedAt": "2024-04-10T06:00:00.000Z", + "_rev": "mvya9zzDXWakVjnX4hhave", + "_createdAt": "2024-09-25T14:18:42Z", + "_type": "blogPost" + }, + { + "publishedAt": "2022-03-24T07:00:00.000Z", + "author": { + "_ref": "paolo-di-tommaso", + "_type": "reference" + }, + "_rev": "hf9hwMPb7ybAE3bqEU5jyg", + "_id": "35d0f7aecd0d", + "_updatedAt": "2024-09-30T09:13:15Z", + "meta": { + "description": "Software development is a constantly evolving process that requires continuous adaptation to keep pace with new technologies, user needs, and trends. Likewise, changes are needed in order to introduce new capabilities and guarantee a sustainable development process.", + "slug": { + "current": "evolution-of-nextflow-runtime" + } + }, + "body": [ + { + "_type": "block", + "style": "normal", + "_key": "0d8f09611cc7", + "markDefs": [], + "children": [ + { + "text": "Software development is a constantly evolving process that requires continuous adaptation to keep pace with new technologies, user needs, and trends. Likewise, changes are needed in order to introduce new capabilities and guarantee a sustainable development process.", + "_key": "124b26b39eee", + "_type": "span", + "marks": [] + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "06d1d13382fe", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "c47c5295eadf", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Nextflow is no exception. This post will summarize the major changes in the evolution of the framework over the next 12 to 18 months.\n", + "_key": "54a0a3187c97", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "d6451561de62" + }, + { + "_type": "block", + "style": "h2", + "_key": "37ceefaf32d4", + "markDefs": [], + "children": [ + { + "_key": "7f601fc1b899", + "_type": "span", + "marks": [], + "text": "Java baseline version" + } + ] + }, + { + "markDefs": [ + { + "href": "https://endoflife.date/java", + "_key": "2a34e0754fd6", + "_type": "link" + } + ], + "children": [ + { + "text": "Nextflow runs on top of Java (or, more precisely, the Java virtual machine). So far, Java 8 has been the minimal version required to run Nextflow. 
However, this version was released 8 years ago and is going to reach its end-of-life status at the end of ", + "_key": "f7e5940ff4dd", + "_type": "span", + "marks": [] + }, + { + "text": "this month", + "_key": "ef6ec8a8d896", + "_type": "span", + "marks": [ + "2a34e0754fd6" + ] + }, + { + "text": ". For this reason, as of version 22.01.x-edge and the upcoming stable release 22.04.0, Nextflow will require Java version 11 or later for its execution. This also allows the introduction of new capabilities provided by the modern Java runtime.", + "_key": "16c16a8e18b2", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "d4f2aae38a99" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "a5f4a9cad7af", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "1460a0b9747c", + "markDefs": [] + }, + { + "_key": "5a4f5f4709bd", + "markDefs": [ + { + "_type": "link", + "href": "https://sdkman.io/", + "_key": "cae313613ad1" + } + ], + "children": [ + { + "text": "Tip: If you are confused about how to install or upgrade Java on your computer, consider using ", + "_key": "39e505b489d6", + "_type": "span", + "marks": [] + }, + { + "_key": "75d5f0d2d9b9", + "_type": "span", + "marks": [ + "cae313613ad1" + ], + "text": "Sdkman" + }, + { + "_key": "326eca772e92", + "_type": "span", + "marks": [], + "text": ". It’s a one-liner install tool that allows easy management of Java versions." + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "623b9ff0aaf9", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "a411ad5f506a", + "markDefs": [] + }, + { + "_type": "block", + "style": "h2", + "_key": "5933868e9d09", + "markDefs": [], + "children": [ + { + "_key": "bf01a1e543ae", + "_type": "span", + "marks": [], + "text": "DSL2 as default syntax" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "eacccaa47a75", + "markDefs": [ + { + "_type": "link", + "href": "https://www.nextflow.io/blog/2020/dsl2-is-here.html", + "_key": "09a00de0d00a" + }, + { + "_type": "link", + "href": "https://nf-co.re/pipelines", + "_key": "9e4640e14412" + } + ], + "children": [ + { + "_key": "b16d4a7959ea", + "_type": "span", + "marks": [], + "text": "Nextflow DSL2 has been introduced nearly " + }, + { + "_type": "span", + "marks": [ + "09a00de0d00a" + ], + "text": "2 years ago", + "_key": "7b28aa10b18c" + }, + { + "text": " (how time flies!) and definitely represented a major milestone for the project. Established pipeline collections such as those in ", + "_key": "991fc6d79dd8", + "_type": "span", + "marks": [] + }, + { + "_key": "62486c4ff169", + "_type": "span", + "marks": [ + "9e4640e14412" + ], + "text": "nf-core" + }, + { + "_key": "af1bad29e8c7", + "_type": "span", + "marks": [], + "text": " have migrated their pipelines to DSL2 syntax." 
+ } + ] + }, + { + "markDefs": [], + "children": [ + { + "_key": "19122611d713", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "c0b41800cb68" + }, + { + "_key": "4f186bf2fa5b", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "This is a confirmation that the DSL2 syntax represents a natural evolution for the project and is not considered to be just an experimental or alternative syntax.", + "_key": "e4f1710c865d" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "a27ae1dfa5c2", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "c16ec1f5b844" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "18326dbd3fa4", + "_type": "span", + "marks": [], + "text": "For this reason, as for Nextflow version 22.03.0-edge and the upcoming 22.04.0 stable release, DSL2 syntax is going to be the " + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "default", + "_key": "5d2d0aef3920" + }, + { + "_type": "span", + "marks": [], + "text": " syntax version used by Nextflow, if not otherwise specified.", + "_key": "27330afcf1e6" + } + ], + "_type": "block", + "style": "normal", + "_key": "2727246ab16a", + "markDefs": [] + }, + { + "style": "normal", + "_key": "813ddfd66f82", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "a48be79b819c" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "b212ff99872d", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In practical terms, this means it will no longer be necessary to add the declaration ", + "_key": "e8818ed3195e" + }, + { + "_key": "c3bbcbbfea41", + "_type": "span", + "marks": [ + "code" + ], + "text": "nextflow.enable.dsl = 2" + }, + { + "_type": "span", + "marks": [], + "text": " at the top of your script or use the command line option ", + "_key": "2c189f6284d4" + }, + { + "_key": "96934518dc20", + "_type": "span", + "marks": [ + "code" + ], + "text": "-dsl2 " + }, + { + "text": " to enable the use of this syntax.", + "_key": "a37ce9db25f1", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "245bdcbe8b75", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "ab6fd943b602", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_key": "500b75c5ac0b", + "markDefs": [], + "children": [ + { + "text": "If you still want to continue to use DSL1 for your pipeline scripts, you will need to add the declaration ", + "_key": "3a9c095a9daf", + "_type": "span", + "marks": [] + }, + { + "text": "nextflow.enable.dsl = 1", + "_key": "0cbc89df0844", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "marks": [], + "text": " at the top of your pipeline script or use the command line option ", + "_key": "0248aee8b501", + "_type": "span" + }, + { + "marks": [ + "code" + ], + "text": "-dsl1", + "_key": "07befd452a8b", + "_type": "span" + }, + { + "_key": "783b04cb3341", + "_type": "span", + "marks": [], + "text": "." 
+ } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "7009a7822a6b", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "4ccf0d02ff4d" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "c22d408fc28f", + "markDefs": [], + "children": [ + { + "text": "To make this transition as smooth as possible, we have also added the possibility to declare the DSL version in the Nextflow configuration file, using the same syntax shown above.", + "_key": "60caa0d27216", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "5413b54a4720", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "365beb04f28c" + }, + { + "_type": "block", + "style": "normal", + "_key": "20f4983d70d9", + "markDefs": [], + "children": [ + { + "_key": "faf1496f480c", + "_type": "span", + "marks": [], + "text": "Finally, if you wish to keep the current DSL behaviour and not make any changes in your pipeline scripts, the following variable can be defined in your system environment:" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "73585930399f" + } + ], + "_type": "block", + "style": "normal", + "_key": "4322063f7e24" + }, + { + "code": "export NXF_DEFAULT_DSL=1", + "_type": "code", + "_key": "5f343f89977e" + }, + { + "style": "normal", + "_key": "9f6d4e7bc11c", + "markDefs": [], + "children": [ + { + "_key": "50827b49fc84", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "46afa98b31af", + "_type": "span", + "marks": [], + "text": "DSL1 end-of-life phase" + } + ], + "_type": "block", + "style": "h2", + "_key": "608179f25360" + }, + { + "style": "normal", + "_key": "ef0ca37d3101", + "markDefs": [], + "children": [ + { + "_key": "6ab122bf0f3e", + "_type": "span", + "marks": [], + "text": "Maintaining two separate DSL implementations in the same programming environment is not sustainable and, above all, does not make much sense. For this reason, along with making DSL2 the default Nextflow syntax, DSL1 will enter into a 12-month end-of-life phase, at the end of which it will be removed. Therefore version 22.04.x and 22.10.x will be the last stable versions providing the ability to run DSL1 scripts." + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "f91427f76068", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "79566760c8ab" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "This is required to keep evolving the framework and to create a more solid implementation of Nextflow grammar. 
Maintaining compatibility with the legacy syntax implementation and data structures is a challenging task that prevents the evolution of the new syntax.", + "_key": "be5061112ef2" + } + ], + "_type": "block", + "style": "normal", + "_key": "253745843863" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "0a849cbf24f2", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "90cbf87d4503", + "markDefs": [] + }, + { + "markDefs": [ + { + "href": "https://github.com/nextflow-io/nextflow/releases", + "_key": "d1e2981cd904", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "Bear in mind, this does ", + "_key": "78cfff12e672", + "_type": "span" + }, + { + "_key": "a3947d5fa744", + "_type": "span", + "marks": [ + "strong" + ], + "text": "not" + }, + { + "_type": "span", + "marks": [], + "text": " mean it will not be possible to use DSL1 starting from 2023. All existing Nextflow runtimes will continue to be available, and it will be possible to for any legacy pipeline to run using the required version available from the GitHub ", + "_key": "8df1419fcf96" + }, + { + "_type": "span", + "marks": [ + "d1e2981cd904" + ], + "text": "releases page", + "_key": "bdb28f19a25d" + }, + { + "text": ", or by specifying the version using the NXF_VER variable, e.g.", + "_key": "5184698579b0", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "779015e1a628" + }, + { + "_key": "9285387ddfb5", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "17976845589c" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "code", + "_key": "64d1e1607622", + "code": "NXF_VER: 21.10.6 nextflow run " + }, + { + "_type": "block", + "style": "normal", + "_key": "47be3491a7b8", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "10e6f099c2c1" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "New configuration format", + "_key": "edf6a634fc9f", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "b33987079c9d", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_key": "8f67710faa58", + "_type": "span", + "marks": [], + "text": "The configuration file is a key component of the Nextflow framework since it allows workflow developers to decouple the pipeline logic from the execution parameters and infrastructure deployment settings." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "d6509f2b57c4" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "e9d9022ba1c5", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "880486b74a61" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The current Nextflow configuration file mechanism is extremely powerful, but it also has some serious drawbacks due to its ", + "_key": "fc133ed67f9d" + }, + { + "_key": "07e424d8a40f", + "_type": "span", + "marks": [ + "em" + ], + "text": "dynamic" + }, + { + "_type": "span", + "marks": [], + "text": " nature that makes it very hard to keep stable and maintainable over time.", + "_key": "50c77083262c" + } + ], + "_type": "block", + "style": "normal", + "_key": "afb500b83a74" + }, + { + "style": "normal", + "_key": "8f76ebb26ae9", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "93e5a4514dc6", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "For this reason, we are planning to re-engineer the current configuration component and replace it with a better configuration component with two major goals: 1) continue to provide a rich and human-readable configuration system (so, no YAML or JSON), 2) have a well-defined syntax with a solid foundation that guarantees predictable configurations, simpler troubleshooting and more sustainable maintenance.", + "_key": "888a87dc54d3", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "f81607750743" + }, + { + "children": [ + { + "_key": "d51dc5548954", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "414d40874f57", + "markDefs": [] + }, + { + "children": [ + { + "marks": [], + "text": "Currently, the most likely options are ", + "_key": "db966f3ebf12", + "_type": "span" + }, + { + "text": "Hashicorp HCL", + "_key": "2600b769af09", + "_type": "span", + "marks": [ + "f88563f6e6cf" + ] + }, + { + "_key": "f09939b614a3", + "_type": "span", + "marks": [], + "text": " (as used by Terraform and other Hashicorp tools) and " + }, + { + "marks": [ + "dd31218adb40" + ], + "text": "Lightbend HOCON", + "_key": "b6bf5edfb3df", + "_type": "span" + }, + { + "_key": "20c8d2004b5d", + "_type": "span", + "marks": [], + "text": ". 
You can read more about this feature at " + }, + { + "_type": "span", + "marks": [ + "5f58504262ad" + ], + "text": "this link", + "_key": "a20826ee22f8" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "9e066f85e9e9" + } + ], + "_type": "block", + "style": "normal", + "_key": "eb2c28e9ee23", + "markDefs": [ + { + "_key": "f88563f6e6cf", + "_type": "link", + "href": "https://github.com/hashicorp/hcl" + }, + { + "href": "https://github.com/lightbend/config", + "_key": "dd31218adb40", + "_type": "link" + }, + { + "_key": "5f58504262ad", + "_type": "link", + "href": "https://github.com/nextflow-io/nextflow/issues/2723" + } + ] + }, + { + "_key": "31617ababdfc", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "d4516291347a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "h2", + "_key": "e743e38f44bb", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Ignite executor deprecation", + "_key": "0aff91392e44" + } + ] + }, + { + "markDefs": [ + { + "href": "https://www.nextflow.io/docs/latest/ignite.html", + "_key": "7860ca109cbd", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "The executor for ", + "_key": "6a499af3a8bb", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "7860ca109cbd" + ], + "text": "Apache Ignite", + "_key": "9dc4b9754145" + }, + { + "_type": "span", + "marks": [], + "text": " was an early attempt to provide Nextflow with a self-contained, distributed cluster for the deployment of pipelines into HPC environments. However, it had very little adoption over the years, which was not balanced by the increasing complexity of its maintenance.", + "_key": "22d1e1eda6cd" + } + ], + "_type": "block", + "style": "normal", + "_key": "26b895ea2ac0" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "bd593e1d634a", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "94ac0085d8b3" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "For this reason, it was decided to deprecate it and remove it from the default Nextflow distribution. The module is still available in the form of a separate project plugin and available at ", + "_key": "788499d39268" + }, + { + "text": "this link", + "_key": "3c17485fef37", + "_type": "span", + "marks": [ + "534211f225ac" + ] + }, + { + "_key": "89e0994a19f0", + "_type": "span", + "marks": [], + "text": ", however, it will not be actively maintained." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "78e007205aa7", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nextflow-io/nf-ignite", + "_key": "534211f225ac" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "71ca7cb19b99", + "markDefs": [], + "children": [ + { + "_key": "869a5f0a3647", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "children": [ + { + "text": "Conclusion", + "_key": "e7fc7b8a31ac", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2", + "_key": "39a3918382ab", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "This post is focused on the most fundamental changes we are planning to make in the following months.", + "_key": "fc633bc84c6f", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "e5492a05b394" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "8a0dad7ac6f1" + } + ], + "_type": "block", + "style": "normal", + "_key": "4d27793d27a1", + "markDefs": [] + }, + { + "_key": "d0d726854852", + "markDefs": [], + "children": [ + { + "_key": "ac1d7ce9e376", + "_type": "span", + "marks": [], + "text": "With the adoption of Java 11, the full migration of DSL1 to DSL2 and the re-engineering of the configuration system, our purpose is to consolidate the Nextflow technology and lay the foundation for all the new exciting developments and features on which we are working on. Stay tuned for future blogs about each of them in upcoming posts." + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "71f5034fef7f", + "markDefs": [], + "children": [ + { + "_key": "b26cce36e65d", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "If you want to learn more about the upcoming changes reach us out on ", + "_key": "095acc7b9d14" + }, + { + "marks": [ + "797eed5aec45" + ], + "text": "Slack at this link", + "_key": "0ee03ac01f9c", + "_type": "span" + }, + { + "marks": [], + "text": ". 
", + "_key": "01a525eb74ff", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "c8b7484c9e7b", + "markDefs": [ + { + "_type": "link", + "href": "https://app.slack.com/client/T03L6DM9G", + "_key": "797eed5aec45" + } + ] + } + ], + "title": "Evolution of the Nextflow runtime", + "_type": "blogPost", + "_createdAt": "2024-09-25T14:16:36Z", + "tags": [ + { + "_key": "332cc2616f09", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference" + } + ] + }, + { + "_type": "blogPost", + "_updatedAt": "2024-05-28T14:18:22Z", + "_createdAt": "2024-05-23T07:01:07Z", + "_id": "35e0b13e-aa5a-4018-88c5-6a175d477f1d", + "publishedAt": "2024-05-23T12:00:00.000Z", + "author": { + "_ref": "evan-floden", + "_type": "reference" + }, + "tags": [ + { + "_ref": "ea6c309b-154f-45c3-9fda-650d7764b260", + "_type": "reference", + "_key": "ef12481e08d5" + }, + { + "_type": "reference", + "_key": "508790ebf0f9", + "_ref": "1b55a117-18fe-40cf-8873-6efd157a6058" + } + ], + "meta": { + "_type": "meta", + "shareImage": { + "_type": "image", + "asset": { + "_type": "reference", + "_ref": "image-85ca91b4138fbab39962965a2ac2eec7e49514bf-4800x2700-png" + } + }, + "description": "Today marks a major milestone in that journey as we release two new free and open resources for the community: Seqera Pipelines and Seqera Containers.", + "noIndex": false, + "slug": { + "current": "introducing-seqera-pipelines-containers", + "_type": "slug" + } + }, + "title": "Empowering scientists with seamless access to bioinformatics resources", + "body": [ + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Seqera is built on the promise that modern tooling and open software can improve scientists’ daily lives. We believe in empowering scientists and developers to focus on what they do best: groundbreaking research. Today marks a major milestone in that journey as we release two new free and open resources for the community: Seqera Pipelines and Seqera Containers.", + "_key": "a8a33347272f0" + } + ], + "_type": "block", + "style": "normal", + "_key": "a558c16e7d96", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "text": "These projects bring together the components bioinformaticians need into a simple interface, making it easy to find open-source pipelines to run or build a software container combining virtually any tools. By streamlining access to resources and fostering collaboration, we improve the velocity, quality, and reproducibility of your research.", + "_key": "1f5f2a98e9c80", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "a5a7c42890d3" + }, + { + "_type": "block", + "style": "h2", + "_key": "ee4ea263d7a6", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Seqera Pipelines: Guiding Your Research Journey", + "_key": "5d6bde0200f20" + } + ] + }, + { + "_key": "0ee406b47b21", + "markDefs": [ + { + "href": "https://github.com/nextflow-io/awesome-nextflow", + "_key": "0bac7a322b6a", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In the early days of Nextflow, the ", + "_key": "6a2a8f59d99f0" + }, + { + "_key": "6a2a8f59d99f1", + "_type": "span", + "marks": [ + "0bac7a322b6a" + ], + "text": "“awesome-nextflow” GitHub repository" + }, + { + "_key": "6a2a8f59d99f2", + "_type": "span", + "marks": [], + "text": " was the go-to place to find pipelines. People would list their open-source workflows so that others could find one to match their data. 
Over time, the Nextflow community grew, and this particular resource became unmanageable. Projects such as nf-core have emerged with collections of workflows, but there are very many other high-quality Nextflow pipelines beyond nf-core that can be difficult to find." + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "b91d8e210a02", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Seqera Pipelines is the modern replacement for the ", + "_key": "4b279492ce7a0", + "_type": "span" + }, + { + "marks": [ + "em" + ], + "text": "“awesome-nextflow”", + "_key": "4b279492ce7a1", + "_type": "span" + }, + { + "text": " repo. We’ve put together a list of the best open-source workflows for you to search. We know from experience that finding high-quality pipelines is critical, so we’re using a tightly curated list of the very best workflows to begin with. Every pipeline comes with curated test data, so you can import into Seqera Platform and launch a test run in just a few clicks:", + "_key": "4b279492ce7a2", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "youtube", + "id": "KWw0NP-CT_s", + "_key": "659e5fb9c13f" + }, + { + "children": [ + { + "_key": "3aed490d3739", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "3f97cf65f113", + "markDefs": [] + }, + { + "_key": "82628c058ed8", + "markDefs": [ + { + "_key": "58d9d8012ab0", + "_type": "link", + "href": "https://nextflow.io/" + }, + { + "href": "https://github.com/seqeralabs/tower-cli", + "_key": "9a41867bc689", + "_type": "link" + }, + { + "_type": "link", + "href": "https://nf-co.re/docs/nf-core-tools/pipelines/launch", + "_key": "92b93540f2b1" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Once you’ve found an interesting pipeline, you can easily dive into the details. We show key information on the pipeline details page and provide a one-click experience to add pipelines to your launchpad within Seqera Platform. If you’re more at home in the terminal, you can use the launch box to grab commands for ", + "_key": "72ba431d1b440" + }, + { + "marks": [ + "58d9d8012ab0" + ], + "text": "Nextflow", + "_key": "72ba431d1b441", + "_type": "span" + }, + { + "text": ", ", + "_key": "72ba431d1b442", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "9a41867bc689" + ], + "text": "Seqera Platform CLI", + "_key": "72ba431d1b443" + }, + { + "marks": [], + "text": ", and ", + "_key": "72ba431d1b444", + "_type": "span" + }, + { + "marks": [ + "92b93540f2b1" + ], + "text": "nf-core/tools", + "_key": "72ba431d1b445", + "_type": "span" + }, + { + "marks": [], + "text": ".", + "_key": "72ba431d1b446", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "414a3516da55", + "markDefs": [], + "children": [ + { + "_key": "4e2e0870666b0", + "_type": "span", + "marks": [], + "text": "We have big plans for Seqera Pipelines. By prioritizing actively maintained pipelines that adhere to industry standards, we minimize the risk of researchers encountering obsolete or malfunctioning pipelines. As we improve our accuracy, we will open up the catalog to include greater numbers of workflows." + } + ] + }, + { + "style": "normal", + "_key": "1c4ddb148e2c", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Discovering a workflow is only the first step of a journey. 
In the future, we will extend Seqera Pipelines with additional features, such as the ability to create collections of your favorite pipelines and discuss their usage – both to get help and to help others in the community. Seqera Pipelines is already the best place to find your next workflow, and it’s only going to get better.", + "_key": "7064064535a60", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Seqera Containers: The Magic of Reproducibility", + "_key": "8a75a73cb2f80" + } + ], + "_type": "block", + "style": "h2", + "_key": "dd0249c1bab7", + "markDefs": [] + }, + { + "children": [ + { + "_key": "0052d9e9bd970", + "_type": "span", + "marks": [], + "text": "Containers have transformed the research landscape, providing portable environments that encapsulate software, dependencies, and libraries – eliminating compatibility issues across various computing environments. Nextflow was a " + }, + { + "_type": "span", + "marks": [ + "122f56122a7d" + ], + "text": "very early adopter", + "_key": "0052d9e9bd971" + }, + { + "text": " of Docker and has provided first-class support for software containers for nearly a decade.", + "_key": "0052d9e9bd972", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "b5ae89a0a5a4", + "markDefs": [ + { + "_type": "link", + "href": "https://nextflow.io/podcast/2023/ep13_nextflow_10_years.html", + "_key": "122f56122a7d" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "While using containers isn’t entirely without friction. Pipeline developers need to write Dockerfile scripts for each step in their workflow. Projects such as ", + "_key": "cddca7666cf60" + }, + { + "_key": "cddca7666cf61", + "_type": "span", + "marks": [ + "618378cd6cb4" + ], + "text": "BioContainers" + }, + { + "marks": [], + "text": " have greatly simplified this process with pre-built images for ", + "_key": "cddca7666cf62", + "_type": "span" + }, + { + "_key": "cddca7666cf63", + "_type": "span", + "marks": [ + "e82d6ce5c752" + ], + "text": "Bioconda" + }, + { + "_type": "span", + "marks": [], + "text": " tools but are somewhat limited, especially when multiple tools are needed in a single container. We set out to improve this experience with Wave: our open-source on-demand container provisioning service. Wave allows Nextflow developers to simply reference a set of conda packages or a bundled Dockerfile. When the pipeline runs, the container is built on the fly and can be targeted for the specific local environment that the workflow is running in.", + "_key": "cddca7666cf64" + } + ], + "_type": "block", + "style": "normal", + "_key": "47493cbada63", + "markDefs": [ + { + "href": "https://biocontainers.pro/", + "_key": "618378cd6cb4", + "_type": "link" + }, + { + "_type": "link", + "href": "https://bioconda.github.io/", + "_key": "e82d6ce5c752" + } + ] + }, + { + "markDefs": [ + { + "href": "https://seqera.io/containers", + "_key": "a60f91c08427", + "_type": "link" + } + ], + "children": [ + { + "text": "With ", + "_key": "19330cec0f8c0", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "a60f91c08427" + ], + "text": "Seqera Containers", + "_key": "19330cec0f8c1", + "_type": "span" + }, + { + "_key": "19330cec0f8c2", + "_type": "span", + "marks": [], + "text": ", we’re taking the experience of Wave one step further. 
Instead of browsing available images as you would with a traditional container registry, just type in the names of the tools you want to use. Clicking “Get container” returns a container URI instantly, which you can use for anything - Nextflow pipeline or not. The key difference with Seqera Containers is that the image is also stored in an image cache, with infrastructure provided by our friends at AWS. Subsequent requests for the same package set will return the same image, ensuring reproducibility across runs. The cache has no expiry date, so those images will still be there if you need to rerun your analysis in the future." + } + ], + "_type": "block", + "style": "normal", + "_key": "86a0cd5ff0e7" + }, + { + "_key": "c6c73031246e", + "_type": "youtube", + "id": "mk67PjOIp8o" + }, + { + "_type": "block", + "style": "normal", + "_key": "ea5a4d906865", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Not only can you request any combination of packages, but you can also select architecture and image format. Builds with linux/arm64 architecture promise to open up analysis to new, more efficient compute platforms. Choosing Singularity leads to a native Singularity / Apptainer build with an OCI-compliant architecture and even a URL to download the ", + "_key": "867e4c63d88b0", + "_type": "span" + }, + { + "_key": "867e4c63d88b1", + "_type": "span", + "marks": [ + "code" + ], + "text": ".sif" + }, + { + "text": " file directly.", + "_key": "867e4c63d88b2", + "_type": "span", + "marks": [] + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Clicking “View build details” for the container shows the full information of the Dockerfile, conda environment file, and build settings, as well as the complete build logs. Every container includes results from a security scan using ", + "_key": "2c3f04b3ed850" + }, + { + "_type": "span", + "marks": [ + "5dc792b284e9" + ], + "text": "Trivy", + "_key": "2c3f04b3ed851" + }, + { + "marks": [], + "text": " attached.", + "_key": "2c3f04b3ed852", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "3518d0cbbefc", + "markDefs": [ + { + "_key": "5dc792b284e9", + "_type": "link", + "href": "https://trivy.dev/" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "8fadea99d6d1", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "While the web interface is the easiest way to get started with Seqera Containers, it doesn’t end there. The same functionality extends to Nextflow and the Wave CLI. Just tell Wave to “freeze” with a set of conda packages, and the resulting image will be cached in the public Seqera Containers registry.", + "_key": "60ca627f3ac10", + "_type": "span" + } + ] + }, + { + "children": [ + { + "_key": "060de8d675b40", + "_type": "span", + "marks": [], + "text": "Seqera Containers is a free service provided by Seqera and AWS. It does not require authentication of any kind to use, and is configured with very high rate limits so that nothing stops your pipeline from pulling 50 images all at once! We can’t wait to see how the entire bioinformatics community uses it, both Nextflow users and beyond." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "845b89f38040", + "markDefs": [] + }, + { + "_key": "1f7bbdd1a7fb", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "What lies ahead", + "_key": "a5f3661cee520" + } + ], + "_type": "block", + "style": "h2" + }, + { + "style": "normal", + "_key": "9bb85bc9c54e", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Pipelines and Containers represent just the beginning of Seqera’s vision to be the home of open science. We think that these two resources can have a real impact on researchers around the globe, and we’re excited to continue working with them to extend their functionality. We’re committed to collaborating with the community to focus on the features that you need, so do let us know what you think and what you want next!", + "_key": "5139413842540" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "cb6fe74411f4", + "markDefs": [], + "children": [ + { + "text": "\n", + "_key": "f64f44ea540a0", + "_type": "span", + "marks": [] + } + ] + } + ], + "_rev": "mAO9W5hBo57qoxiglmBcPn" + }, + { + "tags": [ + { + "_ref": "ace8dd2c-eed3-4785-8911-d146a4e84bbb", + "_type": "reference", + "_key": "2a885d306880" + }, + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "ed6c7969338d" + } + ], + "_type": "blogPost", + "_createdAt": "2024-09-25T14:14:59Z", + "_updatedAt": "2024-09-26T09:01:18Z", + "title": "The impact of Docker containers on the performance of genomic pipelines", + "publishedAt": "2015-06-15T06:00:00.000Z", + "author": { + "_ref": "paolo-di-tommaso", + "_type": "reference" + }, + "meta": { + "slug": { + "current": "the-impact-of-docker-on-genomic-pipelines" + } + }, + "_id": "37d183e76ffa", + "_rev": "g7tG3ShgLiOybM4TXYtt0H", + "body": [ + { + "style": "normal", + "_key": "072f7147ff03", + "markDefs": [], + "children": [ + { + "text": "In a recent publication we assessed the impact of Docker containers technology on the performance of bioinformatic tools and data analysis workflows.", + "_key": "a71af7521775", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_key": "f61882bb89a7", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "f944cf569361", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "8ac9840d7563", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "We benchmarked three different data analyses: a RNA sequence pipeline for gene expression, a consensus assembly and variant calling pipeline, and finally a pipeline for the detection and mapping of long non-coding RNAs.", + "_key": "e79004022a6c", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "1646426f4da0", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "05460993440f", + "_type": "span", + "marks": [] + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_key": "524399dd85e5", + "_type": "span", + "marks": [], + "text": "We found that Docker containers have only a minor impact on the performance of common genomic data analysis, which is negligible when the executed tasks are demanding in terms of computational time." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "752983bdf7d1" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "adbe9bb46148", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "1c1ce4adf686", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "marks": [ + "em" + ], + "text": "[This publication is available as PeerJ preprint at this link](https://peerj.com/preprints/1171/).", + "_key": "10536ca60ebf" + } + ], + "_type": "block", + "style": "normal", + "_key": "c1a5da8ad836", + "markDefs": [] + } + ] + }, + { + "author": { + "_ref": "paolo-di-tommaso", + "_type": "reference" + }, + "_type": "blogPost", + "title": "The impact of Docker containers on the performance of genomic pipelines", + "_id": "37d711cb52e8", + "meta": { + "slug": { + "current": "the-impact-of-docker-on-genomic-pipelines" + }, + "description": "In a recent publication we assessed the impact of Docker containers technology on the performance of bioinformatic tools and data analysis workflows." + }, + "_updatedAt": "2024-10-02T13:43:15Z", + "tags": [ + { + "_ref": "ace8dd2c-eed3-4785-8911-d146a4e84bbb", + "_type": "reference", + "_key": "2a885d306880" + }, + { + "_key": "ed6c7969338d", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference" + } + ], + "publishedAt": "2015-06-15T06:00:00.000Z", + "_rev": "2PruMrLMGpvZP5qAknmB7m", + "body": [ + { + "children": [ + { + "_key": "a71af7521775", + "_type": "span", + "marks": [], + "text": "In a recent publication we assessed the impact of Docker containers technology on the performance of bioinformatic tools and data analysis workflows." + } + ], + "_type": "block", + "style": "normal", + "_key": "072f7147ff03", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "f944cf569361", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "f61882bb89a7" + }, + { + "_type": "block", + "style": "normal", + "_key": "8ac9840d7563", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "We benchmarked three different data analyses: a RNA sequence pipeline for gene expression, a consensus assembly and variant calling pipeline, and finally a pipeline for the detection and mapping of long non-coding RNAs.", + "_key": "e79004022a6c" + } + ] + }, + { + "style": "normal", + "_key": "1646426f4da0", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "05460993440f" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "752983bdf7d1", + "markDefs": [], + "children": [ + { + "text": "We found that Docker containers have only a minor impact on the performance of common genomic data analysis, which is negligible when the executed tasks are demanding in terms of computational time.", + "_key": "524399dd85e5", + "_type": "span", + "marks": [] + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "adbe9bb46148", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "1c1ce4adf686", + "markDefs": [] + }, + { + "children": [ + { + "marks": [ + "em" + ], + "text": "This publication is available as ", + "_key": "10536ca60ebf", + "_type": "span" + }, + { + "_key": "2b21a547baf3", + "_type": "span", + "marks": [ + "em", + "16246cde37ab" + ], + "text": "PeerJ preprint here." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "c1a5da8ad836", + "markDefs": [ + { + "_type": "link", + "href": "https://peerj.com/preprints/1171/", + "_key": "16246cde37ab" + } + ] + } + ], + "_createdAt": "2024-09-25T14:14:59Z" + }, + { + "_id": "38329391-8e62-4aba-b4fa-32c658e33b13", + "_type": "blogPost", + "_updatedAt": "2024-08-23T14:06:31Z", + "tags": [ + { + "_ref": "1b55a117-18fe-40cf-8873-6efd157a6058", + "_type": "reference", + "_key": "ce64efeb3685" + }, + { + "_key": "40689d831034", + "_ref": "d356a4d5-06c1-40c2-b655-4cb21cf74df1", + "_type": "reference" + } + ], + "publishedAt": "2024-06-05T13:38:00.000Z", + "_createdAt": "2024-07-26T11:11:53Z", + "title": "Seqera's project on the optimization of computational resources for HPC workloads in the cloud through ML/AI has been funded by the European Union", + "author": { + "_ref": "a7e6fb2d-94cb-4bcd-bcbd-120e379b2298", + "_type": "reference" + }, + "meta": { + "_type": "meta", + "description": "Call for grants 2021 aimed at R&D Projects in AI and other digital technologies and their integration into value chains", + "noIndex": false, + "slug": { + "current": "optimization-computation-resources-ML-AI", + "_type": "slug" + } + }, + "body": [ + { + "_type": "image", + "_key": "2ca9d274f836", + "asset": { + "_ref": "image-22a6d646f122e9df55c154735882a2cb56ae7d87-1600x225-jpg", + "_type": "reference" + } + }, + { + "markDefs": [], + "children": [ + { + "_key": "958028fa63b50", + "_type": "span", + "marks": [], + "text": "Call for grants 2021 aimed at R&D projects in AI and other digital technologies and their integration into value chains" + } + ], + "_type": "block", + "style": "h3", + "_key": "f01b00d90f54" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "9af02a5030750" + } + ], + "_type": "block", + "style": "normal", + "_key": "6155ddde2c0a" + }, + { + "style": "normal", + "_key": "09aebbc4262f", + "markDefs": [ + { + "_type": "link", + "href": "https://www.red.es/es", + "_key": "64b2513c3ed1" + }, + { + "_type": "link", + "href": "https://commission.europa.eu/funding-tenders/find-funding/eu-funding-programmes/european-regional-development-fund-erdf_en#:~:text=The%20European%20Regional%20Development%20Fund,dedicated%20national%20or%20regional%20programmes.", + "_key": "19a9e11e0b53" + }, + { + "_type": "link", + "href": "https://next-generation-eu.europa.eu/index_en", + "_key": "47482c371588" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The project 'Optimization of computational resources for HPC workloads in the cloud through ML/AI' by Seqera Labs S.L. 
has been funded by the ", + "_key": "de8345dee2320" + }, + { + "_key": "fda11d32da9b", + "_type": "span", + "marks": [ + "19a9e11e0b53" + ], + "text": "European Regional Development Fund (ERDF) " + }, + { + "_type": "span", + "marks": [], + "text": "of the ", + "_key": "7ee991ead9e1" + }, + { + "text": "European Union", + "_key": "34fa535e4601", + "_type": "span", + "marks": [ + "47482c371588" + ] + }, + { + "text": ", coordinated and managed by ", + "_key": "b620d2b78c0b", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "64b2513c3ed1" + ], + "text": "red.es", + "_key": "41eb06c9ae78", + "_type": "span" + }, + { + "_key": "7eae1ca7a7b9", + "_type": "span", + "marks": [], + "text": ", aiming to carry out the development of technological entrepreneurship and technological demand within the framework of the Strategic Action of Digital Economy and Society of the State R&D&I Program oriented towards societal challenges." + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "e210c007ebd3", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "4c8c550d6dea" + } + ], + "_type": "block" + }, + { + "_key": "b0e67983a64e", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Project Description", + "_key": "78e9e7fb0ba20", + "_type": "span" + } + ], + "_type": "block", + "style": "h3" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "The project aims to develop a machine learning model to optimize workflow execution in the cloud, ensuring efficient use of resources. This enables users to control execution costs and achieve significant savings. Through this project's implementation, it is expected that the application of this technology will not only reduce costs and execution time but also minimize the environmental impact of computing tasks. 
Seqera Labs plays a key role in advancing personalized medicine and the discovery of new drugs.", + "_key": "71c829c8b959" + } + ], + "_type": "block", + "style": "normal", + "_key": "b608cc1805b8", + "markDefs": [] + }, + { + "style": "normal", + "_key": "eaac793c2e93", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "File number: 2021/C005/00149902", + "_key": "96acc3bfbd660", + "_type": "span" + } + ], + "level": 1, + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "de5b39af36aa", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Total investment: €1,165,466.66", + "_key": "8b0400c9463d", + "_type": "span" + } + ], + "level": 1 + }, + { + "style": "normal", + "_key": "0c274b82d229", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Amount of aid: €669,279.99", + "_key": "034a4c2dc3db", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block" + }, + { + "style": "h3", + "_key": "39cdd31e17dd", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "\nConvocatoria de ayudas 2021 destinadas a proyectos de investigación y desarrollo en IA y otras tecnologías digitales y su integración en las cadenas de valor", + "_key": "a0489e395fa60", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [ + { + "href": "https://www.red.es/es", + "_key": "44d9a8fb6c5c", + "_type": "link" + }, + { + "_key": "5444a1f98c6a", + "_type": "link", + "href": "https://commission.europa.eu/funding-tenders/find-funding/eu-funding-programmes/european-regional-development-fund-erdf_en#:~:text=The%20European%20Regional%20Development%20Fund,dedicated%20national%20or%20regional%20programmes." + }, + { + "href": "https://next-generation-eu.europa.eu/index_en", + "_key": "a8150942df91", + "_type": "link" + } + ], + "children": [ + { + "_key": "21a068641a3c0", + "_type": "span", + "marks": [], + "text": "El proyecto de ‘Optimización de los recursos computacionales para las cargas de trabajo de HPC en la nube mediante ML/AI’ de Seqera Labs S.L.
ha sido financiado por el " + }, + { + "_type": "span", + "marks": [ + "5444a1f98c6a" + ], + "text": "Fondo Europeo de Desarrollo Regional (FEDER)", + "_key": "40aa3b3d0ecc" + }, + { + "_key": "78ce1222a23b", + "_type": "span", + "marks": [], + "text": " de la " + }, + { + "_key": "e9cd306a59f1", + "_type": "span", + "marks": [ + "a8150942df91" + ], + "text": "Unión Europea" + }, + { + "_type": "span", + "marks": [], + "text": ", coordinada y gestionada por ", + "_key": "1ea233b7e3bc" + }, + { + "_key": "7045f57ce29d", + "_type": "span", + "marks": [ + "44d9a8fb6c5c" + ], + "text": "red.es" + }, + { + "_type": "span", + "marks": [], + "text": ", con el objetivo de llevar a cabo el desarrollo del emprendimiento tecnológico y la demanda tecnológica, en el marco de la Acción Estratégica de Economía y Sociedad Digital del Programa Estatal de I+D+i orientada a retos de la sociedad.", + "_key": "e3e2efd75bba" + } + ], + "_type": "block", + "style": "normal", + "_key": "a59731faab3f" + }, + { + "_type": "block", + "style": "normal", + "_key": "a0b3a9584258", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "5f02caf6ec9e" + } + ] + }, + { + "_key": "7c937551c3d3", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Descripción del proyecto", + "_key": "8916706675c00" + } + ], + "_type": "block", + "style": "h3" + }, + { + "_type": "block", + "style": "normal", + "_key": "51fede68e0f0", + "markDefs": [], + "children": [ + { + "_key": "5b5110cae217", + "_type": "span", + "marks": [], + "text": "El proyecto busca desarrollar un modelo de machine learning para optimizar la ejecución de flujos de trabajo en la nube, garantizando el uso eficiente de recursos. Esto permite a los usuarios controlar los costes de ejecución y lograr ahorros significativos. Con la presente ejecución del proyecto se espera que la aplicación de esta tecnología no solo reduzca los costes y el tiempo de ejecución, sino que también minimice el impacto ambiental de los trabajos de computación. Seqera Labs desempeña un papel fundamental en el avance de la medicina personalizada y el descubrimiento de nuevos medicamentos."
+ } + ] + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "d7bc90b57c58", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Expediente nº: 2021/C005/00149902", + "_key": "ad6851bdbc700", + "_type": "span" + } + ] + }, + { + "_key": "f01aac1dae9b", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Inversión total: 1.165.466,66 €", + "_key": "aaf0c99b85540", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "5e32e917e71e0", + "_type": "span", + "marks": [], + "text": "Importe de la ayuda: 669.279,99 €" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "c724b1e64798" + } + ], + "_rev": "0HV4XeadlxB19r3p3EDEa1" + }, + { + "meta": { + "slug": { + "current": "celebrating-our-largest-international-training-event-and-hackathon-to-date" + } + }, + "author": { + "_ref": "drafts.phil-ewels", + "_type": "reference" + }, + "body": [ + { + "markDefs": [ + { + "href": "https://nf-co.re/", + "_key": "90d55f5bc90f", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In mid-March, we conducted our bi-annual Nextflow and ", + "_key": "cd8d22bd916f" + }, + { + "_type": "span", + "marks": [ + "90d55f5bc90f" + ], + "text": "nf-core", + "_key": "4946b78ec00f" + }, + { + "_type": "span", + "marks": [], + "text": " training and hackathon in what was unquestionably our best-attended community events to date. This year we had an impressive ", + "_key": "a58afd886090" + }, + { + "marks": [ + "strong" + ], + "text": "1,345 participants", + "_key": "d1f20dc963a6", + "_type": "span" + }, + { + "_key": "75f0d6e8725c", + "_type": "span", + "marks": [], + "text": " attend the training from " + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "76 countries", + "_key": "6cce4de9885f" + }, + { + "text": ". Attendees came from far and wide — from Algeria to Andorra to Zambia to Zimbabwe!", + "_key": "3b07c81f40b2", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "f74f380b6aa0" + }, + { + "_key": "93757cd7cf59", + "children": [ + { + "_type": "span", + "text": "", + "_key": "adf4c95fb838" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "afc1d4acd5d9", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Among our event attendees, we observed the following statistics:", + "_key": "158cd72d93d6", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "f0d3f2272821" + } + ], + "_type": "block", + "style": "normal", + "_key": "384a242cf0f4" + }, + { + "_type": "block", + "style": "normal", + "_key": "18548b30ae36", + "listItem": "bullet", + "children": [ + { + "_type": "span", + "marks": [], + "text": "40% were 30 years old or younger, pointing to a young cohort of Nextflow users;", + "_key": "cc1d0fa89891" + }, + { + "_key": "995febae7f4c", + "_type": "span", + "marks": [], + "text": "55.3% identified as male vs. 
40% female, highlighting our growing diversity;" + }, + { + "text": "68.2% came from research institutions;", + "_key": "039013357ae4", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [], + "text": "71.4% were attending their first Nextflow training event;", + "_key": "faf89eaea8b5" + }, + { + "_type": "span", + "marks": [], + "text": "96.7% had never attended a Nextflow hackathon.", + "_key": "77dfacd913df" + } + ] + }, + { + "style": "normal", + "_key": "1f5fc152f9bf", + "children": [ + { + "_key": "c641bb176c01", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "185b35fdf1b1", + "markDefs": [ + { + "_type": "link", + "href": "https://www.youtube.com/playlist?list=PL3xpfTVZLcNhoWxHR0CS-7xzu5eRT8uHo", + "_key": "2ae097a6b418" + } + ], + "children": [ + { + "text": "Read on to learn more about these exciting events. If you missed it, you can still ", + "_key": "788d7d542c7a", + "_type": "span", + "marks": [] + }, + { + "text": "watch the Nextflow & nf-core training", + "_key": "93dcf12dab61", + "_type": "span", + "marks": [ + "2ae097a6b418" + ] + }, + { + "_type": "span", + "marks": [], + "text": " at your convenience.", + "_key": "bfaffa2f93ad" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "c2e690b0c196", + "children": [ + { + "_type": "span", + "text": "", + "_key": "1791bad97174" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "d7a15d59c234", + "_type": "block" + }, + { + "_type": "block", + "style": "h2", + "_key": "b2585ae9ee80", + "children": [ + { + "_type": "span", + "text": "Multilingual training", + "_key": "ec5bd7c78570" + } + ] + }, + { + "style": "normal", + "_key": "26762c9953ac", + "markDefs": [ + { + "_type": "link", + "href": "https://nf-co.re/events/2023/training-march-2023", + "_key": "1d7988202280" + } + ], + "children": [ + { + "text": "This year, we were pleased to offer ", + "_key": "d893346efa7b", + "_type": "span", + "marks": [] + }, + { + "_key": "e4e5fcda544e", + "_type": "span", + "marks": [ + "1d7988202280" + ], + "text": "Nextflow / nf-core training" + }, + { + "text": " in multiple languages: in addition to English, we delivered sessions in French, Hindi, Portuguese, and Spanish.", + "_key": "edf4fe62d96e", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_key": "69876880295b", + "children": [ + { + "text": "", + "_key": "6467e25f45fb", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "marks": [], + "text": "In our pre-event registration, ", + "_key": "c1b3d9c9d446", + "_type": "span" + }, + { + "marks": [ + "strong" + ], + "text": "~88%", + "_key": "5b076dcc0de5", + "_type": "span" + }, + { + "marks": [], + "text": " of respondents indicated they would watch the training in English. However, there turned out to be a surprising appetite for training in other languages. 
We hope that multilingual training will make Nextflow even more accessible to talented scientists and researchers around the world.", + "_key": "da7c21bd3cbd", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "b80bc18bc70b", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "945063a9dff1" + } + ], + "_type": "block", + "style": "normal", + "_key": "30f565f09d4b" + }, + { + "_type": "block", + "style": "normal", + "_key": "46325ddd653f", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The training consisted of four separate sessions in ", + "_key": "b03485d755ee" + }, + { + "_key": "170f019afbac", + "_type": "span", + "marks": [ + "strong" + ], + "text": "5 languages" + }, + { + "_type": "span", + "marks": [], + "text": " for a total of ", + "_key": "12ea469e3ac5" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "20 sessions", + "_key": "557ebf57c19d" + }, + { + "_type": "span", + "marks": [], + "text": ". As of April 19th, we’ve amassed over ", + "_key": "e9c875d7f223" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "6,600 YouTube views", + "_key": "5208b4277622" + }, + { + "_type": "span", + "marks": [], + "text": " with ", + "_key": "d05f97ba7439" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "2,300+ hours", + "_key": "f7b514d956da" + }, + { + "marks": [], + "text": " of training watched so far. ", + "_key": "5327dec00966", + "_type": "span" + }, + { + "_key": "4d1deba44373", + "_type": "span", + "marks": [ + "strong" + ], + "text": "27%" + }, + { + "text": " have watched the non-English sessions, making the effort at translation highly worthwhile.", + "_key": "2b5fddc4a3e8", + "_type": "span", + "marks": [] + } + ] + }, + { + "style": "normal", + "_key": "f9bf93cc763f", + "children": [ + { + "_type": "span", + "text": "", + "_key": "6b12ca8a2e06" + } + ], + "_type": "block" + }, + { + "_key": "1e334dd7e8d9", + "markDefs": [ + { + "_key": "9d86525d1b7f", + "_type": "link", + "href": "https://twitter.com/Chris_Hakk" + }, + { + "_key": "5efb1e4fbe8b", + "_type": "link", + "href": "https://twitter.com/mribeirodantas" + }, + { + "href": "https://twitter.com/gau", + "_key": "60b27b892f14", + "_type": "link" + }, + { + "href": "https://twitter.com/juliamirpedrol", + "_key": "5697c9294bc1", + "_type": "link" + }, + { + "href": "https://twitter.com/GGabernet", + "_key": "95b6079975bd", + "_type": "link" + }, + { + "href": "https://twitter.com/abhi18av", + "_key": "c792980845f4", + "_type": "link" + } + ], + "children": [ + { + "_key": "0cdb3be44780", + "_type": "span", + "marks": [], + "text": "Thank you to the following people who delivered the training: " + }, + { + "_key": "a2df6d66bef5", + "_type": "span", + "marks": [ + "9d86525d1b7f" + ], + "text": "Chris Hakkaart" + }, + { + "marks": [], + "text": " (English), ", + "_key": "d83de59abca4", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "5efb1e4fbe8b" + ], + "text": "Marcel Ribeiro-Dantas", + "_key": "52d54c1d65c2" + }, + { + "text": " (Portuguese), ", + "_key": "5c183ef9b74c", + "_type": "span", + "marks": [] + }, + { + "text": "Maxime Garcia", + "_key": "24f335e0b0e8", + "_type": "span", + "marks": [ + "60b27b892f14" + ] + }, + { + "marks": [], + "text": " (French), ", + "_key": "9f9358c56ba1", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "5697c9294bc1" + ], + "text": "Julia Mir Pedrol", + "_key": "5a32b0e6ebea" + }, + { + "_type": "span", + "marks": 
[], + "text": " and ", + "_key": "183a48c0758f" + }, + { + "marks": [ + "95b6079975bd" + ], + "text": "Gisela Gabernet", + "_key": "56bc58947f1d", + "_type": "span" + }, + { + "marks": [], + "text": " (Spanish), and ", + "_key": "8877a236b9fe", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "c792980845f4" + ], + "text": "Abhinav Sharma", + "_key": "a06209002c43" + }, + { + "_type": "span", + "marks": [], + "text": " (Hindi).", + "_key": "56d7e70eac3a" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "6a4562431f79", + "children": [ + { + "_type": "span", + "text": "", + "_key": "c813f9969590" + } + ], + "_type": "block" + }, + { + "_key": "698378044374", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "You can view the community training sessions on YouTube here:", + "_key": "c6df12a2dff9", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "", + "_key": "251acab2b63a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "d3aaad496fb3" + }, + { + "listItem": "bullet", + "children": [ + { + "text": "[March 2023 Community Training – English](https://www.youtube.com/playlist?list=PL3xpfTVZLcNhoWxHR0CS-7xzu5eRT8uHo)", + "_key": "1506eb2d00f2", + "_type": "span", + "marks": [] + }, + { + "_key": "81c0c1e3605e", + "_type": "span", + "marks": [], + "text": "[March 2023 Community Training – Portugese](https://www.youtube.com/playlist?list=PL3xpfTVZLcNhi41yDYhyHitUhIcUHIbJg)" + }, + { + "_type": "span", + "marks": [], + "text": "[March 2023 Community Training – French](https://www.youtube.com/playlist?list=PL3xpfTVZLcNhiv9SjhoA1EDOXj9nzIqdS)", + "_key": "dde85b2c54ac" + }, + { + "_type": "span", + "marks": [], + "text": "[March 2023 Community Training – Spanish](https://www.youtube.com/playlist?list=PL3xpfTVZLcNhSlCWVoa3GURacuLWeFc8O)", + "_key": "277cc60cf0de" + }, + { + "marks": [], + "text": "[March 2023 Community Training – Hindi](https://www.youtube.com/playlist?list=PL3xpfTVZLcNikun1FrSvtXW8ic32TciTJ)", + "_key": "1ae98331b499", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "c9a747ba68b2" + }, + { + "_type": "block", + "style": "normal", + "_key": "bdf68e20d322", + "children": [ + { + "_type": "span", + "text": "", + "_key": "79d738ea3d18" + } + ] + }, + { + "markDefs": [ + { + "_key": "461ea99100ec", + "_type": "link", + "href": "https://training.nextflow.io/" + } + ], + "children": [ + { + "text": "The videos accompany the written training material, which you can find at ", + "_key": "fca30f18a57e", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "461ea99100ec" + ], + "text": "https://training.nextflow.io/", + "_key": "ee88993f9f40" + } + ], + "_type": "block", + "style": "normal", + "_key": "236d93772fd9" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "f125d6b91159" + } + ], + "_type": "block", + "style": "normal", + "_key": "99ac21ceb969" + }, + { + "style": "h2", + "_key": "a457f8a2f2fd", + "children": [ + { + "_type": "span", + "text": "Improved community training resources", + "_key": "c755f0e5f474" + } + ], + "_type": "block" + }, + { + "markDefs": [ + { + "href": "https://training.nextflow.io/", + "_key": "0e6cde2aef90", + "_type": "link" + }, + { + "href": "https://training.nextflow.io/basic_training/setup/#gitpod", + "_key": "4974d79fdf45", + "_type": "link" + } + ], + "children": [ + { + "text": "Along with the updated training and hackathon resources 
above, we’ve significantly enhanced our online training materials available at ", + "_key": "09e9a6025011", + "_type": "span", + "marks": [] + }, + { + "_key": "066dfcda3cd6", + "_type": "span", + "marks": [ + "0e6cde2aef90" + ], + "text": "https://training.nextflow.io/" + }, + { + "text": ". Thanks to the efforts of our volunteers, technical training, ", + "_key": "16497fb9b039", + "_type": "span", + "marks": [] + }, + { + "text": "Gitpod resources", + "_key": "b5a2cd924ec9", + "_type": "span", + "marks": [ + "4974d79fdf45" + ] + }, + { + "text": ", and materials for hands-on, self-guided learning are now available in English and Portuguese. Some of the materials are also available in Spanish and French.", + "_key": "b78f73d6dcb1", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "db7eba5beb1c" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "6ce348fac324" + } + ], + "_type": "block", + "style": "normal", + "_key": "915ce13a10e4" + }, + { + "children": [ + { + "marks": [], + "text": "The training comprises a significant set of resources covering topics including managing dependencies, containers, channels, processes, operators, and an introduction to the Groovy language. It also includes topics related to nf-core for users and developers as well as Nextflow Tower. Marcel Ribeiro-Dantas describes his experience leading the translation effort for this documentation in his latest nf-core/bytesize ", + "_key": "c74ef2ff4706", + "_type": "span" + }, + { + "_key": "efab97fd52c1", + "_type": "span", + "marks": [ + "1baef680afd7" + ], + "text": "translation talk" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "d772c24b9bd5" + } + ], + "_type": "block", + "style": "normal", + "_key": "557d3f4a9223", + "markDefs": [ + { + "_key": "1baef680afd7", + "_type": "link", + "href": "https://nf-co.re/events/2023/bytesize_translations" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "117c8519c79b", + "children": [ + { + "_type": "span", + "text": "", + "_key": "9e2cd26b70cc" + } + ] + }, + { + "style": "normal", + "_key": "9dee413e2be0", + "markDefs": [ + { + "_key": "70a9ee3d59d4", + "_type": "link", + "href": "https://nextflow.io/blog/2023/learn-nextflow-in-2023.html" + } + ], + "children": [ + { + "marks": [], + "text": "Additional educational resources are provided in the recent Seqera Labs blog article, ", + "_key": "16d846210d02", + "_type": "span" + }, + { + "marks": [ + "70a9ee3d59d4" + ], + "text": "Learn Nextflow in 2023", + "_key": "3f1559dc4dc9", + "_type": "span" + }, + { + "marks": [], + "text": ", posted in February before our latest training event.", + "_key": "4b43711fbd26", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "86ed35d7c14b", + "children": [ + { + "text": "", + "_key": "eee284792500", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "30c2673d7356", + "children": [ + { + "_key": "63b49118f328", + "_type": "span", + "text": "The nf-core hackathon" + } + ], + "_type": "block", + "style": "h2" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://nf-co.re/events/2023/hackathon-march-2023", + "_key": "b2d86ddbea0f" + } + ], + "children": [ + { + "marks": [], + "text": "We also ran a separate ", + "_key": "ce74a270a4e3", + "_type": "span" + }, + { + "text": "hackathon", + "_key": "c299b9f9a215", + "_type": "span", + "marks": [ + "b2d86ddbea0f" + ] + }, + { + "_type": "span", + "marks": [], + "text": " event from 
March 27th to 29th. This hackathon ran online via Gather, a virtual hosting platform, but for the first time we also asked community members to host local sites. We were blown away by the response, with volunteers coming forward to organize in-person attendance in 16 different locations across the world (and this was before we announced that Seqera would organize pizza for all the sites!). These gatherings had a big impact on the feel of the hackathon, whilst remaining accessible and eco-friendly, avoiding the need for air travel.", + "_key": "53c8a1569c4d" + } + ], + "_type": "block", + "style": "normal", + "_key": "78010d10238f" + }, + { + "_type": "block", + "style": "normal", + "_key": "c9860c9dfa43", + "children": [ + { + "_type": "span", + "text": "", + "_key": "69bf4a7dcab9" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "7decb8a4fd97", + "markDefs": [], + "children": [ + { + "_key": "e7aad38c8614", + "_type": "span", + "marks": [], + "text": "The hackathon was divided into five focus areas: modules, pipelines, documentation, infrastructure, and subworkflows. We had " + }, + { + "marks": [ + "strong" + ], + "text": "411", + "_key": "7386d6b58b41", + "_type": "span" + }, + { + "text": " people register, including ", + "_key": "f2d905b1921f", + "_type": "span", + "marks": [] + }, + { + "text": "278 in-person attendees", + "_key": "8f69527808e5", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_type": "span", + "marks": [], + "text": " at ", + "_key": "a30bbdbb6793" + }, + { + "_key": "d6e855a1369d", + "_type": "span", + "marks": [ + "strong" + ], + "text": "16 locations" + }, + { + "marks": [], + "text": ". This is an increase of ", + "_key": "b2f455be419e", + "_type": "span" + }, + { + "_key": "ac6c863635ad", + "_type": "span", + "marks": [ + "strong" + ], + "text": "38%" + }, + { + "_type": "span", + "marks": [], + "text": " compared to the ", + "_key": "dd69ff74c969" + }, + { + "_key": "855f28c729e1", + "_type": "span", + "marks": [ + "strong" + ], + "text": "289" + }, + { + "_key": "c9fe094870db", + "_type": "span", + "marks": [], + "text": " people that attended our October 2022 event. The hackathon was hosted in multiple countries including Brazil, France, Germany, Italy, Poland, Senegal, Serbia, South Africa, Spain, Sweden, the UK, and the United States." + } + ] + }, + { + "style": "normal", + "_key": "bcabc1f5ea3a", + "children": [ + { + "text": "", + "_key": "518a4bbc462a", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "12de9b91d952", + "markDefs": [], + "children": [ + { + "text": "We would like to thank the many organizations worldwide who provided a venue to host the hackathon and helped make it a resounding success. 
Besides being an excellent educational event, we resolved many longstanding Nextflow and nf-core issues.", + "_key": "af55e8bb3c3c", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "26a5d16d2727", + "children": [ + { + "_key": "362b275555d7", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "asset": { + "_ref": "image-d57b62acafc31e78a79b462f923c5c908a8679e0-4000x2250-jpg", + "_type": "reference" + }, + "_type": "image", + "alt": "Hackathon photo", + "_key": "b4726901a091" + }, + { + "children": [ + { + "_key": "5ef338fe8641", + "_type": "span", + "marks": [], + "text": "You can access the project reports from each hackathon team over the three-day event compiled in HackMD below:" + } + ], + "_type": "block", + "style": "normal", + "_key": "2b7703404230", + "markDefs": [] + }, + { + "children": [ + { + "_key": "edb7d23f0cce", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "b185e6bf5ef5" + }, + { + "listItem": "bullet", + "children": [ + { + "marks": [], + "text": "[Modules team](https://hackmd.io/A5v4soteQjKywl3UgFa_6g)", + "_key": "2f920ceb0619", + "_type": "span" + }, + { + "text": "[Pipelines Team](https://hackmd.io/Bj_MK3ubQWGBD4t0X2KpjA)", + "_key": "b758e24acbc3", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [], + "text": "[Documentation Team](https://hackmd.io/o6AgPTZ7RBGCyZI72O1haA)", + "_key": "08b46f791f9a" + }, + { + "_key": "45165537081c", + "_type": "span", + "marks": [], + "text": "[Infrastructure Team](https://hackmd.io/uC-mZlEXQy6DaXZdjV6akA)" + }, + { + "_key": "61cb503d894a", + "_type": "span", + "marks": [], + "text": "[Subworkflows Team](https://hackmd.io/Udtvj4jASsWLtMgrbTNwBA)" + } + ], + "_type": "block", + "style": "normal", + "_key": "6a49e6b521fb" + }, + { + "_key": "066f712c984f", + "children": [ + { + "text": "", + "_key": "8fe0488158e6", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "e5feb5694d4d", + "markDefs": [ + { + "_key": "44aabe4cba4c", + "_type": "link", + "href": "https://www.youtube.com/playlist?list=PL3xpfTVZLcNhfyF_QJIfSslnxRCU817yc" + }, + { + "_type": "link", + "href": "https://github.com/orgs/nf-core/projects/38/views/16?layout=board", + "_key": "29a43310ed23" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "You can also view ten Hackathon videos outlining the event, introducing an overview of the teams, and daily hackathon activities in the ", + "_key": "08d977d94770" + }, + { + "text": "March 2023 nf-core hackathon YouTube playlist", + "_key": "bf191fbff946", + "_type": "span", + "marks": [ + "44aabe4cba4c" + ] + }, + { + "text": ". 
Check out activity in the nf-core hackathon March 2023 Github ", + "_key": "4c4ddb729bcd", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "29a43310ed23" + ], + "text": "issues board", + "_key": "7b55bd357d5a", + "_type": "span" + }, + { + "marks": [], + "text": " for a summary of what each team worked on.", + "_key": "55fb72f9c519", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "221cdcf3f06c", + "children": [ + { + "_type": "span", + "text": "", + "_key": "bcb1b599505b" + } + ] + }, + { + "_type": "block", + "style": "h2", + "_key": "ea82a40ccd3a", + "children": [ + { + "_type": "span", + "text": "A diverse and growing community", + "_key": "fc456beb0f49" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "We were particularly pleased to see the growing diversity of the Nextflow and nf-core community membership, enabled partly by support from the Chan Zuckerberg Initiative Diversity and Inclusion grant and our nf-core mentorship programs. You can learn more about our mentorship efforts and exciting efforts of our global team in Chris Hakkaart’s excellent post, ", + "_key": "dd902cbe572a", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "9d5953d995a1" + ], + "text": "Nextflow and nf-core Mentorship", + "_key": "a42cf43d2160" + }, + { + "text": " on the Nextflow blog.", + "_key": "9b4e01ecce4a", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "91ae1567fb42", + "markDefs": [ + { + "_key": "9d5953d995a1", + "_type": "link", + "href": "https://nextflow.io/blog/2023/czi-mentorship-round-2.html" + } + ] + }, + { + "children": [ + { + "text": "", + "_key": "c3e0127475cf", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "8a0c89a417a3" + }, + { + "_type": "block", + "style": "normal", + "_key": "90ef9228534c", + "markDefs": [ + { + "_key": "5cd70f4e6d95", + "_type": "link", + "href": "https://seqera.io/blog/the-state-of-the-workflow-2023-community-survey-results/" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The growing diversity of our community was also reflected in the results of our latest Nextflow Community survey, which you can read more about on the ", + "_key": "0f5048f06cf1" + }, + { + "marks": [ + "5cd70f4e6d95" + ], + "text": "Seqera Labs blog", + "_key": "82f0954eb9c2", + "_type": "span" + }, + { + "_key": "5e7c401d084b", + "_type": "span", + "marks": [], + "text": "." + } + ] + }, + { + "children": [ + { + "_key": "b84b019d7d88", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "bb7e341fc6f6" + }, + { + "asset": { + "_ref": "image-a734dc5ff7fb25c55689cdaac7b8a0991c92135f-1600x900-jpg", + "_type": "reference" + }, + "_type": "image", + "alt": "Hackathon photo", + "_key": "090a4c9df6f6" + }, + { + "_type": "block", + "style": "h2", + "_key": "3b910d2315d9", + "children": [ + { + "text": "Looking forward", + "_key": "9354bd614305", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "822a0ea65c1d", + "markDefs": [], + "children": [ + { + "text": "Running global events at this scale takes a tremendous team effort. The resources compiled will be valuable in introducing more people to Nextflow and nf-core. Thanks to everyone who participated in this year’s training and hackathon events. 
We look forward to making these even bigger and better in the future!", + "_key": "45098b969a4b", + "_type": "span", + "marks": [] + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "cbf7a0c97aba", + "children": [ + { + "_key": "fa1f370abee1", + "_type": "span", + "text": "" + } + ] + }, + { + "style": "normal", + "_key": "4e596d8f0008", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The next community training will be held online September 2023. This will be followed by two Nextflow Summit events with associated nf-core hackathons:", + "_key": "763e9bc6332f", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "b9ee43eaf25d", + "children": [ + { + "_key": "0238376c3ee7", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "d82889d891de", + "listItem": "bullet", + "children": [ + { + "_type": "span", + "marks": [], + "text": "Barcelona: October 16-20, 2023", + "_key": "95c07a2102db" + }, + { + "_type": "span", + "marks": [], + "text": "Boston: November 2023 (dates to be confirmed)", + "_key": "0a826d53fc83" + } + ], + "_type": "block" + }, + { + "_key": "7b1a58b28ade", + "children": [ + { + "text": "", + "_key": "2fdd80963170", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://summit.nextflow.io/summit-2023-preregistration/", + "_key": "478488031598" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "If you’d like to join, you can register to receive news and updates about the events at ", + "_key": "b69c400ebec9" + }, + { + "_type": "span", + "marks": [ + "478488031598" + ], + "text": "https://summit.nextflow.io/summit-2023-preregistration/", + "_key": "49189fcae9b5" + } + ], + "_type": "block", + "style": "normal", + "_key": "e16d79ae7611" + }, + { + "_key": "5b70ee9ab4cf", + "children": [ + { + "_key": "01d37ed30bbd", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "f90e9c28852b", + "markDefs": [ + { + "_type": "link", + "href": "https://twitter.com/nextflowio", + "_key": "269fbd4e4e00" + }, + { + "_type": "link", + "href": "https://twitter.com/nf_core", + "_key": "f436d0ef5811" + }, + { + "href": "https://www.nextflow.io/slack-invite.html", + "_key": "215c1775b8fe", + "_type": "link" + }, + { + "_type": "link", + "href": "https://nf-co.re/join", + "_key": "60e1586785da" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "You can follow us on Twitter at ", + "_key": "87a04d104a2a" + }, + { + "_type": "span", + "marks": [ + "269fbd4e4e00" + ], + "text": "@nextflowio", + "_key": "ec82ac6a6686" + }, + { + "_type": "span", + "marks": [], + "text": " or ", + "_key": "97ee9e4f1a70" + }, + { + "_type": "span", + "marks": [ + "f436d0ef5811" + ], + "text": "@nf_core", + "_key": "c956f4bddb43" + }, + { + "_type": "span", + "marks": [], + "text": " or join the discussion on the ", + "_key": "18df91171ee4" + }, + { + "text": "Nextflow", + "_key": "e8b67ad4b9c1", + "_type": "span", + "marks": [ + "215c1775b8fe" + ] + }, + { + "text": " and ", + "_key": "8711575299e4", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "60e1586785da" + ], + "text": "nf-core", + "_key": "6aaf674e62f7", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " community Slack channels.", + "_key": "d236bdfbcdaf" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": 
"normal", + "_key": "750720d0e7b3", + "children": [ + { + "text": "", + "_key": "876d72ad0097", + "_type": "span" + } + ] + }, + { + "alt": "Hackathon photo", + "_key": "9bd5bc4b92ec", + "asset": { + "_ref": "image-b49849f4e96bd2a35128fdfdb7f68faa5e62dccf-1600x900-jpg", + "_type": "reference" + }, + "_type": "image" + }, + { + "asset": { + "_type": "reference", + "_ref": "image-e6edb9510ce549ea88d230d9961a0f1dac92f1ed-1600x900-jpg" + }, + "_type": "image", + "alt": "Hackathon photo", + "_key": "62b900e5c2ae" + } + ], + "publishedAt": "2023-04-25T06:00:00.000Z", + "_createdAt": "2024-09-25T14:17:05Z", + "_type": "blogPost", + "_updatedAt": "2024-09-25T14:17:05Z", + "_rev": "mvya9zzDXWakVjnX4hhYxy", + "_id": "3e6eb521cfdb", + "title": "Celebrating our largest international training event and hackathon to date" + }, + { + "title": "Nextflow’s community is moving to Slack!", + "_id": "3ee301ac8104", + "_updatedAt": "2024-09-30T09:14:14Z", + "_type": "blogPost", + "_createdAt": "2024-09-25T14:16:38Z", + "meta": { + "slug": { + "current": "nextflow-is-moving-to-slack" + }, + "description": "The Nextflow community channel on Gitter has grown substantially over the last few years and today has more than 1,300 members." + }, + "author": { + "_type": "reference", + "_ref": "paolo-di-tommaso" + }, + "_rev": "hf9hwMPb7ybAE3bqEU5kKL", + "body": [ + { + "_type": "block", + "style": "normal", + "_key": "eb1f9a49532c", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "f983d966a1d8" + } + ] + }, + { + "style": "normal", + "_key": "eece47f145ca", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The Nextflow community channel on Gitter has grown substantially over the last few years and today has more than 1,300 members.", + "_key": "acf71a1466ce", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "7d4a23f11a7d", + "markDefs": [], + "children": [ + { + "_key": "f57f9607de4c", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "7f71b43c3bec", + "markDefs": [ + { + "href": "https://twitter.com/helicobacter1", + "_key": "7947962b2c73", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "I still remember when a ", + "_key": "fec89bbc75d3" + }, + { + "marks": [ + "7947962b2c73" + ], + "text": "former colleague", + "_key": "e4d6f885d940", + "_type": "span" + }, + { + "_key": "67b578f1a768", + "_type": "span", + "marks": [], + "text": " proposed the idea of opening a Nextflow channel on Gitter. At the time, I didn't know anything about Gitter, and my initial response was : "would that not be a waste of time?"." 
+ } + ] + }, + { + "_key": "76d2aa7c6d27", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "65c103510509", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "970388c072bd", + "markDefs": [], + "children": [ + { + "text": "Fortunately, I took him up on his suggestion and the Gitter channel quickly became an important resource for all Nextflow developers and a key factor to its success.", + "_key": "d3802f3d61bc", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "ff50960d0544", + "markDefs": [], + "children": [ + { + "_key": "68255089825b", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "35f80dc3e94e", + "markDefs": [], + "children": [ + { + "_key": "d982f77f1240", + "_type": "span", + "marks": [], + "text": "Where the future lies" + } + ], + "_type": "block", + "style": "h2" + }, + { + "children": [ + { + "text": "As the Nextflow community continues to grow, we realize that we have reached the limit of the discussion experience on Gitter. The lack of internal channels and the poor support for threads make the discussion unpleasant and difficult to follow. Over the last few years, Slack has proven to deliver a much better user experience and it is also touted as one of the most used platforms for discussion.", + "_key": "bd69b241b71e", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "b3fdc48b2b08", + "markDefs": [] + }, + { + "style": "normal", + "_key": "fb1571a41c54", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "0caa103ad4c3", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "f9abe2c0f471", + "_type": "span", + "marks": [], + "text": "For these reasons, we felt that it is time to say goodbye to the beloved Nextflow Gitter channel and would like to welcome the community into the brand-new, official Nextflow workspace on Slack!" + } + ], + "_type": "block", + "style": "normal", + "_key": "6516a02ccf49", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "658d58bb6a62", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "35d362ebee35" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "You can join today using ", + "_key": "233d172b5cbe" + }, + { + "text": "this link", + "_key": "a59b3adc6190", + "_type": "span", + "marks": [ + "3ba8555ed3cd" + ] + }, + { + "text": "!", + "_key": "1b23e1eb15bd", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "0ca12e043cc4", + "markDefs": [ + { + "_type": "link", + "href": "https://www.nextflow.io/slack-invite.html", + "_key": "3ba8555ed3cd" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "b43719ec8aff", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "7162e7f181e2", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "292a47c383a3", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Once you have joined, you will be added to a selection of generic channels. However, we have also set up various additional channels for discussion around specific Nextflow topics, and for infrastructure-related topics. 
Please feel free to join whichever channels are appropriate to you.", + "_key": "097d7d189a86", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "7dea68cd3328", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "cd4c9b5617ba" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "ded100c45df3", + "markDefs": [ + { + "href": "https://groups.google.com/forum/#!forum/nextflow", + "_key": "7bfa47d830ea", + "_type": "link" + }, + { + "_key": "4c2eb8cc0256", + "_type": "link", + "href": "https://github.com/nextflow-io/nextflow/discussions" + } + ], + "children": [ + { + "marks": [], + "text": "Along the same lines, the Nextflow discussion forum is moving from ", + "_key": "57bf8018a991", + "_type": "span" + }, + { + "text": "Google Groups", + "_key": "ebca5418e49e", + "_type": "span", + "marks": [ + "7bfa47d830ea" + ] + }, + { + "_key": "c228faea430e", + "_type": "span", + "marks": [], + "text": " to the " + }, + { + "_type": "span", + "marks": [ + "4c2eb8cc0256" + ], + "text": "Discussion forum", + "_key": "57e249d36364" + }, + { + "_type": "span", + "marks": [], + "text": " in the Nextflow GitHub repository. We hope this will provide a much better experience for Nextflow users by having a more direct connection with the codebase and issue repository.", + "_key": "eeb00973422a" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "ef86b791937d" + } + ], + "_type": "block", + "style": "normal", + "_key": "35afff91ef04" + }, + { + "style": "normal", + "_key": "7428b1f9bd35", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The old Gitter channel and Google Groups will be kept active for reference and historical purposes, however we are actively promoting all members to move to the new channels.", + "_key": "efccb6ce21a7", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "", + "_key": "883885455491", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "397ed7d302a2", + "markDefs": [] + }, + { + "markDefs": [ + { + "href": "mailto:info@nextflow.io", + "_key": "deb0c96db9bb", + "_type": "link" + } + ], + "children": [ + { + "text": "If you have any questions or problems signing up then please feel free to let us know at ", + "_key": "e769f4e0d39a", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "deb0c96db9bb" + ], + "text": "info@nextflow.io", + "_key": "a1c6783b64e1", + "_type": "span" + }, + { + "text": ".", + "_key": "adc6641b6c03", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "1dbe6fc35a5f" + }, + { + "_key": "168ba993139d", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "bede098d3ccb", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "As always, we thank you for being a part of the Nextflow community and for your ongoing support in driving its development and making workflows cool!", + "_key": "da76dbcbbad8" + } + ], + "_type": "block", + "style": "normal", + "_key": "1055fa1a28a9" + }, + { + "_key": "9964685b19ac", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "bd898ec7f1ad", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + 
"text": "See you on Slack!", + "_key": "ac34d8f21734" + } + ], + "_type": "block", + "style": "normal", + "_key": "cd09692e0dd8" + }, + { + "style": "normal", + "_key": "43286816b515", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "3092d3531ce7", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "h2", + "_key": "b66cef68f98b", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Credits", + "_key": "2af9402fe6b7", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "7fde3f48366d", + "markDefs": [ + { + "_type": "link", + "href": "https://chanzuckerberg.com/eoss/proposals/nextflow-and-nf-core/", + "_key": "e301fd89678c" + }, + { + "_type": "link", + "href": "https://slack.com/intl/en-gb/about/slack-for-good", + "_key": "bd2f412907e4" + }, + { + "_type": "link", + "href": "https://www.seqera.io", + "_key": "80bf2bf9b52c" + } + ], + "children": [ + { + "text": "This was also made possible thanks to sponsorship from the ", + "_key": "4199ed5e9969", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "e301fd89678c" + ], + "text": "Chan Zuckerberg Initiative", + "_key": "50b04d1c362f" + }, + { + "_type": "span", + "marks": [], + "text": ", the ", + "_key": "20298eb6543c" + }, + { + "_type": "span", + "marks": [ + "bd2f412907e4" + ], + "text": "Slack for Nonprofits program", + "_key": "ba77b8e5d660" + }, + { + "marks": [], + "text": " and support from ", + "_key": "f1698ff8ab35", + "_type": "span" + }, + { + "text": "Seqera Labs", + "_key": "9c01eee4313d", + "_type": "span", + "marks": [ + "80bf2bf9b52c" + ] + }, + { + "_key": "1a9abbfc4144", + "_type": "span", + "marks": [], + "text": "." + } + ], + "_type": "block" + } + ], + "tags": [ + { + "_ref": "3d25991c-f357-442b-a5fa-6c02c3419f88", + "_type": "reference", + "_key": "04b9d88ed694" + } + ], + "publishedAt": "2022-02-22T07:00:00.000Z" + }, + { + "_createdAt": "2024-09-25T14:17:34Z", + "title": "Nextflow Summit 2023 Recap", + "author": { + "_ref": "noel-ortiz", + "_type": "reference" + }, + "_updatedAt": "2024-09-25T14:17:34Z", + "publishedAt": "2023-10-25T06:00:00.000Z", + "_rev": "Ot9x7kyGeH5005E3MJ9Wp8", + "_type": "blogPost", + "meta": { + "slug": { + "current": "nextflow-summit-2023-recap" + } + }, + "_id": "3f235e907825", + "body": [ + { + "children": [ + { + "_key": "dfa6f114da10", + "_type": "span", + "text": "Five days of Nextflow Awesomeness in Barcelona" + } + ], + "_type": "block", + "style": "h2", + "_key": "13ec8fba09fd" + }, + { + "_type": "block", + "style": "normal", + "_key": "7115faea0544", + "markDefs": [ + { + "_type": "link", + "href": "https://nf-co.re/events/hackathon", + "_key": "55999ee7f4e6" + }, + { + "href": "https://summit.nextflow.io/", + "_key": "3bd42d641edb", + "_type": "link" + }, + { + "_type": "link", + "href": "https://nextflow.slack.com/archives/C0602TWRT5G", + "_key": "198ca41c4066" + }, + { + "_type": "link", + "href": "https://www.youtube.com/playlist?list=PLPZ8WHdZGxmUotnP-tWRVNtuNWpN7xbpL", + "_key": "b574108d0a23" + } + ], + "children": [ + { + "marks": [], + "text": "On Friday, Oct 20, we wrapped up our ", + "_key": "559145446d99", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "55999ee7f4e6" + ], + "text": "hackathon", + "_key": "bdaf5551163e" + }, + { + "marks": [], + "text": " and ", + "_key": "cd5b2f21de78", + "_type": "span" + }, + { + "_key": "c81796a049df", + "_type": "span", + "marks": [ + "3bd42d641edb" + ], + "text": "Nextflow Summit" + }, + { + "text": " in 
Barcelona, Spain. By any measure, this year’s Summit was our best community event ever, drawing roughly 900 attendees across multiple channels, including in-person attendees, participants in our ", + "_key": "a0b699df91d3", + "_type": "span", + "marks": [] + }, + { + "_key": "0f94a55ba2c3", + "_type": "span", + "marks": [ + "198ca41c4066" + ], + "text": "#summit-2023" + }, + { + "text": " Slack channel, and ", + "_key": "325459a5f8d3", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "b574108d0a23" + ], + "text": "Summit Livestream", + "_key": "3bfd251ca155", + "_type": "span" + }, + { + "_key": "ef1fbd009b5a", + "_type": "span", + "marks": [], + "text": " viewers on YouTube." + } + ] + }, + { + "_key": "edab1ba1020a", + "children": [ + { + "_key": "09884e46bd47", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "5f2e408b7949", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The Summit drew attendees, speakers, and sponsors from around the world. Over the course of the three-day event, we heard from dozens of impressive speakers working at the cutting edge of life sciences from academia, research, healthcare providers, biotechs, and cloud providers, including:", + "_key": "31711ea0a93b" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "dc157db099d3" + } + ], + "_type": "block", + "style": "normal", + "_key": "2f72d2d8269c" + }, + { + "listItem": "bullet", + "children": [ + { + "text": "Australian BioCommons", + "_key": "6d027453abac", + "_type": "span", + "marks": [] + }, + { + "_key": "fa60fb6749be", + "_type": "span", + "marks": [], + "text": "Genomics England" + }, + { + "_type": "span", + "marks": [], + "text": "Pixelgen Technologies", + "_key": "3a82b961f9d9" + }, + { + "marks": [], + "text": "University of Tennessee Health Science Center", + "_key": "bc9fd757b430", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": "Amazon Web Services", + "_key": "a331f9c79b8d" + }, + { + "marks": [], + "text": "Quantitative Biology Center - University of Tübingen", + "_key": "b8431e88bc66", + "_type": "span" + }, + { + "_key": "3369ea31f5c2", + "_type": "span", + "marks": [], + "text": "Biomodal" + }, + { + "_type": "span", + "marks": [], + "text": "Matterhorn Studio", + "_key": "ca1340d5bd96" + }, + { + "text": "Centre for Genomic Regulation (CRG)", + "_key": "335f81fbfcf5", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [], + "text": "Heidelberg University Hospital", + "_key": "4632262e3753" + }, + { + "marks": [], + "text": "MemVerge", + "_key": "13ea71aad878", + "_type": "span" + }, + { + "marks": [], + "text": "University of Cambridge", + "_key": "95396ee3da8b", + "_type": "span" + }, + { + "marks": [], + "text": "Oxford Nanopore Technologies", + "_key": "53f9611f4c8f", + "_type": "span" + }, + { + "text": "Medical University of Innsbruck", + "_key": "8a8cb436c813", + "_type": "span", + "marks": [] + }, + { + "_key": "ac924a49384f", + "_type": "span", + "marks": [], + "text": "Sano Genetics" + }, + { + "_key": "b796bec1f629", + "_type": "span", + "marks": [], + "text": "Institute of Genetics and Development of Rennes, University of Rennes" + }, + { + "text": "Ardigen", + "_key": "935fe6caf5a2", + "_type": "span", + "marks": [] + }, + { + "_key": "b25c35049e99", + "_type": "span", + "marks": [], + "text": "ZS" + }, + { + "_type": "span", + "marks": [], + "text": "Wellcome Sanger 
Institute", + "_key": "85b42d1e46ab" + }, + { + "_type": "span", + "marks": [], + "text": "SciLifeLab", + "_key": "1fcf80e8bd2c" + }, + { + "marks": [], + "text": "AstraZeneca UK Ltd", + "_key": "87f82ea75981", + "_type": "span" + }, + { + "marks": [], + "text": "University of Texas at Dallas", + "_key": "04e6eeeede27", + "_type": "span" + }, + { + "_key": "5373aa66e739", + "_type": "span", + "marks": [], + "text": "Seqera" + } + ], + "_type": "block", + "style": "normal", + "_key": "277b4cd9d9f0" + }, + { + "_type": "block", + "style": "normal", + "_key": "b68026fb478b", + "children": [ + { + "_type": "span", + "text": "", + "_key": "3f5f32007d1f" + } + ] + }, + { + "style": "h2", + "_key": "d875f24e2ff2", + "children": [ + { + "_type": "span", + "text": "The Hackathon – advancing the Nextflow ecosystem", + "_key": "6b9388404dae" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "7f638c916e12", + "markDefs": [ + { + "href": "https://github.com/orgs/nf-core/projects/47/views/1", + "_key": "ab5c0bf6bc4d", + "_type": "link" + }, + { + "_type": "link", + "href": "https://nf-co.re/", + "_key": "7c4f1146311a" + } + ], + "children": [ + { + "marks": [], + "text": "The week began with a three-day in-person and virtual nf-core hackathon event. With roughly 100 in-person developers, this was twice the size of our largest Hackathon to date. As with previous Hackathons, participants were divided into project groups, with activities coordinated via a single ", + "_key": "bdd59dd86bad", + "_type": "span" + }, + { + "_key": "9f6d79c354fb", + "_type": "span", + "marks": [ + "ab5c0bf6bc4d" + ], + "text": "GitHub project board" + }, + { + "text": " focusing on different aspects of ", + "_key": "b05499326529", + "_type": "span", + "marks": [] + }, + { + "_key": "08bdcf7834e3", + "_type": "span", + "marks": [ + "7c4f1146311a" + ], + "text": "nf-core" + }, + { + "_key": "44edd8b707b5", + "_type": "span", + "marks": [], + "text": " and Nextflow, including:" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "f65a0937c414" + } + ], + "_type": "block", + "style": "normal", + "_key": "6d5d41f8da35" + }, + { + "_key": "9d8af9de3d97", + "listItem": "bullet", + "children": [ + { + "text": "Pipelines", + "_key": "475bfe04d882", + "_type": "span", + "marks": [] + }, + { + "marks": [], + "text": "Modules & subworkflows", + "_key": "6d780d71f656", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": "Infrastructure", + "_key": "22e6ed85d13c" + }, + { + "_type": "span", + "marks": [], + "text": "Nextflow & plugins development", + "_key": "fd90938084cd" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "7cb96ec653e1", + "children": [ + { + "_type": "span", + "text": "", + "_key": "b7f913bfdd20" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "1b066c2eca20", + "markDefs": [ + { + "_key": "e96da4917250", + "_type": "link", + "href": "https://code.askimed.com/nf-test/" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "This year, the focus of the hackathon was ", + "_key": "acecd627bc2b" + }, + { + "_key": "205d24356056", + "_type": "span", + "marks": [ + "e96da4917250" + ], + "text": "nf-test" + }, + { + "text": ", an open-source testing framework for Nextflow pipelines. 
The team made considerable progress applying nf-test consistently across various nf-core pipelines and modules — and of course, no Hackathon would be complete without a community cooking class, quiz, bingo, a sock hunt, and a scavenger hunt!", + "_key": "3f5cc61b1351", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "0ba622c25bf5", + "children": [ + { + "_key": "191c96cd98ce", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "14e66d4a5a06", + "markDefs": [ + { + "_key": "878ef99f81a4", + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-20-hackathon/" + } + ], + "children": [ + { + "text": "For an overview of the tremendous progress made advancing the state of Nextflow and nf-core in three short days, view Chris Hakkaart’s talk on ", + "_key": "37a27d488495", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "878ef99f81a4" + ], + "text": "highlights from the nf-core hackathon", + "_key": "307843a6950c" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "29e2a477e865" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "9c322c7a503f", + "children": [ + { + "_type": "span", + "text": "", + "_key": "0412744e0392" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "h2", + "_key": "d0ca596800e1", + "children": [ + { + "_type": "span", + "text": "The Summit kicks off", + "_key": "83ca0fa4c590" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "The Summit began on Wednesday Oct 18 with excellent talks from ", + "_key": "1081730797d8", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "df9b1c2a8898" + ], + "text": "Australian BioCommons", + "_key": "f4c1e5a1a091" + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "f2d3808b6dc1" + }, + { + "_key": "cbe8342f675e", + "_type": "span", + "marks": [ + "74f32edad4b7" + ], + "text": "Genomics England" + }, + { + "_key": "1588ab3ed2d3", + "_type": "span", + "marks": [], + "text": ". 
This was followed by a presentation where "
          },
          {
            "_type": "span",
            "marks": [
              "9a28b4078ee2"
            ],
            "text": "Pixelgen Technologies",
            "_key": "40ad4d1efdac"
          },
          {
            "marks": [],
            "text": " described their unique Molecular Pixelation (MPX) technologies and unveiled their new ",
            "_key": "7ebf647520ac",
            "_type": "span"
          },
          {
            "text": "nf-core/pixelator",
            "_key": "ee5b107cb013",
            "_type": "span",
            "marks": [
              "857aeee91cac"
            ]
          },
          {
            "marks": [],
            "text": " community pipeline for molecular pixelation assays.",
            "_key": "4bf69cc54537",
            "_type": "span"
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "c1a293dd1ab9",
        "markDefs": [
          {
            "_type": "link",
            "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-18-the-national-nextflow-tower-service-for-australian-researchers/",
            "_key": "df9b1c2a8898"
          },
          {
            "_type": "link",
            "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-18-analysing-ont-long-read-data-for-cancer-with-nextflow/",
            "_key": "74f32edad4b7"
          },
          {
            "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-18-pixelgen-technologies-heart-nextflow/",
            "_key": "9a28b4078ee2",
            "_type": "link"
          },
          {
            "_type": "link",
            "href": "https://nf-co.re/pixelator/1.0.0",
            "_key": "857aeee91cac"
          }
        ]
      },
      {
        "children": [
          {
            "text": "",
            "_key": "ed625479b624",
            "_type": "span"
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "ba70f06c5dd7"
      },
      {
        "style": "normal",
        "_key": "bfa61036554a",
        "markDefs": [
          {
            "_type": "link",
            "href": "https://nextflow.io/blog/2023/introducing-nextflow-ambassador-program.html",
            "_key": "eb1bde101012"
          },
          {
            "href": "https://nextflow.io/blog/2023/community-forum.html",
            "_key": "ad2cecb264cc",
            "_type": "link"
          },
          {
            "href": "https://community.seqera.io",
            "_key": "89ec88f218b9",
            "_type": "link"
          },
          {
            "href": "https://nextflow.io/blog/2023/geraldine-van-der-auwera-joins-seqera.html",
            "_key": "5aec553cf479",
            "_type": "link"
          },
          {
            "_type": "link",
            "href": "https://www.oreilly.com/library/view/genomics-in-the/9781491975183/",
            "_key": "5d704377dd00"
          }
        ],
        "children": [
          {
            "text": "Next, Seqera’s Phil Ewels took the stage providing a series of community updates, including the announcement of a new ",
            "_key": "e5b68d5732d0",
            "_type": "span",
            "marks": []
          },
          {
            "_type": "span",
            "marks": [
              "eb1bde101012"
            ],
            "text": "Nextflow Ambassador",
            "_key": "43e5a69e4acc"
          },
          {
            "marks": [],
            "text": " program, ",
            "_key": "adda4940eb01",
            "_type": "span"
          },
          {
            "_key": "3178039d4029",
            "_type": "span",
            "marks": [
              "ad2cecb264cc"
            ],
            "text": "a new community forum"
          },
          {
            "text": " at ",
            "_key": "915ac277c5b7",
            "_type": "span",
            "marks": []
          },
          {
            "marks": [
              "89ec88f218b9"
            ],
            "text": "community.seqera.io",
            "_key": "2eb943a6e7a3",
            "_type": "span"
          },
          {
            "text": ", and the exciting appointment of ",
            "_key": "403fbe290a5f",
            "_type": "span",
            "marks": []
          },
          {
            "marks": [
              "5aec553cf479"
            ],
            "text": "Geraldine Van der Auwera",
            "_key": "5292a77ebd78",
            "_type": "span"
          },
          {
            "_type": "span",
            "marks": [],
            "text": " as lead developer advocate for Nextflow. 
Geraldine is well known for her work on GATK, WDL, and Terra.bio and is the co-author of the book ",
            "_key": "8069e0901ddf"
          },
          {
            "_type": "span",
            "marks": [
              "5d704377dd00"
            ],
            "text": "Genomics on the Cloud",
            "_key": "1ea8773a5167"
          },
          {
            "_key": "1ec3523ea70e",
            "_type": "span",
            "marks": [],
            "text": ". As Geraldine assumes leadership of the developer advocacy team, Phil will spend more time focusing on open-source development, as product manager of open source at Seqera."
          }
        ],
        "_type": "block"
      },
      {
        "_type": "block",
        "style": "normal",
        "_key": "061df5d01f09",
        "children": [
          {
            "_type": "span",
            "text": "",
            "_key": "d89723197808"
          }
        ]
      },
      {
        "asset": {
          "_ref": "image-1f9c53d8f6d591fa2bb366f2d4f855f964f394b8-1200x661-jpg",
          "_type": "reference"
        },
        "_type": "image",
        "alt": "Hackathon 2023 photo",
        "_key": "ee76d656f349"
      },
      {
        "markDefs": [
          {
            "_type": "link",
            "href": "https://seqera.io/platform/",
            "_key": "0538a636d073"
          },
          {
            "_key": "d5b2801dfc0d",
            "_type": "link",
            "href": "https://seqera.io/blog/introducing-data-explorer/"
          },
          {
            "_type": "link",
            "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-18-modern-biotech/",
            "_key": "3759c14fb96c"
          }
        ],
        "children": [
          {
            "_type": "span",
            "marks": [],
            "text": "Seqera’s Evan Floden shared his vision of the modern biotech stack for open science, highlighting recent developments at Seqera, including a revamped ",
            "_key": "bf490f12635f"
          },
          {
            "_type": "span",
            "marks": [
              "0538a636d073"
            ],
            "text": "Seqera platform",
            "_key": "b69e9d36c04c"
          },
          {
            "marks": [],
            "text": ", new ",
            "_key": "2f0f56a5a636",
            "_type": "span"
          },
          {
            "marks": [
              "d5b2801dfc0d"
            ],
            "text": "Data Explorer",
            "_key": "52f6fbba4423",
            "_type": "span"
          },
          {
            "marks": [],
            "text": " functionality, and providing an exciting glimpse of the new Data Studios feature now in private preview. You can view ",
            "_key": "de9c08d010f1",
            "_type": "span"
          },
          {
            "_type": "span",
            "marks": [
              "3759c14fb96c"
            ],
            "text": "Evan’s full talk here",
            "_key": "67815eeb59ca"
          },
          {
            "_key": "1575362dfbbb",
            "_type": "span",
            "marks": [],
            "text": "."
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "1428e1d6e3cf"
      },
      {
        "children": [
          {
            "_type": "span",
            "text": "",
            "_key": "8415f7687a23"
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "4d424bbb023f"
      },
      {
        "_type": "block",
        "style": "normal",
        "_key": "6894e65d9026",
        "markDefs": [
          {
            "_type": "link",
            "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-18-erik-garrison/",
            "_key": "8b363a582379"
          }
        ],
        "children": [
          {
            "text": "A highlight was the keynote delivered by Erik Garrison of the University of Tennessee Health Science Center. 
In his talk, ", + "_key": "5b3c49cdcd98", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "8b363a582379" + ], + "text": "Biological revelations at the frontiers of a draft human pangenome reference", + "_key": "a2e5a7062f5e" + }, + { + "_type": "span", + "marks": [], + "text": ", Erik shared how his team's cutting-edge work applying new computational methods in the context of the Human Pangenome Project has yielded the most complete picture of human sequence evolution available to date.", + "_key": "3cb951fcc8f5" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "f26b2a607fdf", + "children": [ + { + "_type": "span", + "text": "", + "_key": "ba2db8650abb" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "97a6e355a954", + "markDefs": [ + { + "_key": "27a0d5a00c46", + "_type": "link", + "href": "https://www.globenewswire.com/news-release/2023/10/20/2763899/0/en/Seqera-Sets-Sail-With-Alinghi-Red-Bull-Racing-as-Official-High-Performance-Computing-Supplier.html" + }, + { + "_type": "link", + "href": "https://www.americascup.com/", + "_key": "d6fb925b8f21" + }, + { + "href": "https://alinghiredbullracing.americascup.com/", + "_key": "1fc0c5ec60e0", + "_type": "link" + } + ], + "children": [ + { + "_key": "fd62ca11e3e1", + "_type": "span", + "marks": [], + "text": "Day one wrapped up with a surprise " + }, + { + "text": "announcement", + "_key": "bae5880dae50", + "_type": "span", + "marks": [ + "27a0d5a00c46" + ] + }, + { + "text": " that Seqera has been confirmed as the official High-Performance Computing Supplier for Alinghi Red Bull Racing at the ", + "_key": "6814471a2c12", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "d6fb925b8f21" + ], + "text": "37th America’s Cup", + "_key": "66c5f1bba5b2" + }, + { + "_key": "5bc1a244f320", + "_type": "span", + "marks": [], + "text": " in Barcelona. This was followed by an evening reception hosted by " + }, + { + "marks": [ + "1fc0c5ec60e0" + ], + "text": "Alinghi Red Bull Racing", + "_key": "1051b345a8e9", + "_type": "span" + }, + { + "text": ".", + "_key": "65ae176cda57", + "_type": "span", + "marks": [] + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "54617fae136c", + "children": [ + { + "_type": "span", + "text": "", + "_key": "1691744cab4f" + } + ] + }, + { + "style": "h2", + "_key": "23905164f376", + "children": [ + { + "_key": "6d20f29c1422", + "_type": "span", + "text": "Day two starts off on the right foot" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Day two kicked off with a brisk sunrise run along the iconic Barcelona Waterfront attended by a team of hardy Summit participants. 
After that, things kicked into high gear for the morning session with talks on everything from using Nextflow to power ", + "_key": "4e9fb421479a" + }, + { + "text": "Machine Learning pipelines for materials science", + "_key": "ad268d7ad4c3", + "_type": "span", + "marks": [ + "0f5284738d0b" + ] + }, + { + "text": " to ", + "_key": "4e229c6c7f16", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "0de813eee44d" + ], + "text": "standardized frameworks for protein structure prediction", + "_key": "020de77475c3" + }, + { + "_type": "span", + "marks": [], + "text": " to discussions on ", + "_key": "13f13f9f33f9" + }, + { + "_type": "span", + "marks": [ + "89852f2877d3" + ], + "text": "how to estimate the CO2 footprint of pipeline runs", + "_key": "bf103d7111f3" + }, + { + "text": ".", + "_key": "b8f6542a0668", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "a0f553dacc6e", + "markDefs": [ + { + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-19-machine-learning-for-material-science/", + "_key": "0f5284738d0b" + }, + { + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-19-proteinfold/", + "_key": "0de813eee44d" + }, + { + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-19-nf-co2footprint/", + "_key": "89852f2877d3", + "_type": "link" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "a8f1e668029b", + "children": [ + { + "text": "", + "_key": "32527b356919", + "_type": "span" + } + ] + }, + { + "_type": "image", + "alt": "Summit 2023 photo", + "_key": "7f47ff745f15", + "asset": { + "_ref": "image-25defaeaabff1a7d9d3435f37cbf7c014263d2d0-1200x724-jpg", + "_type": "reference" + } + }, + { + "_key": "406a0f844ab5", + "markDefs": [ + { + "_type": "link", + "href": "https://seqera.io/fusion/", + "_key": "b72d4de0c539" + }, + { + "_type": "link", + "href": "https://seqera.io/wave/", + "_key": "c4f519475069" + }, + { + "href": "https://nextflow.io/docs/latest/process.html#spack", + "_key": "0bd73c94cc37", + "_type": "link" + }, + { + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-19-brendan-bouffler/", + "_key": "aaced4648329", + "_type": "link" + }, + { + "_key": "b472902bbba1", + "_type": "link", + "href": "https://github.com/seqeralabs/wave" + } + ], + "children": [ + { + "text": "Nextflow creator and Seqera CTO and co-founder Paolo Di Tommaso provided an update on some of the technologies he and his team have been working on including a deep dive on the ", + "_key": "b288a6126d4a", + "_type": "span", + "marks": [] + }, + { + "text": "Fusion file system", + "_key": "3ede39244967", + "_type": "span", + "marks": [ + "b72d4de0c539" + ] + }, + { + "_type": "span", + "marks": [], + "text": ". 
Paolo also delved into ", + "_key": "5a5f3d8fdf8b" + }, + { + "text": "Wave containers", + "_key": "a0ffa5cbe273", + "_type": "span", + "marks": [ + "c4f519475069" + ] + }, + { + "marks": [], + "text": ", discussing the dynamic assembly of containers using the ", + "_key": "e79e04aa2077", + "_type": "span" + }, + { + "text": "Spack package manager", + "_key": "f4bdd9a13f96", + "_type": "span", + "marks": [ + "0bd73c94cc37" + ] + }, + { + "marks": [], + "text": ", echoing a similar theme from AWS’s ", + "_key": "bd8319c06ec8", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "aaced4648329" + ], + "text": "Brendan Bouffler", + "_key": "04a8bd9f9d88" + }, + { + "_type": "span", + "marks": [], + "text": " earlier in the day. During the conference, Seqera announced Wave Containers as our latest ", + "_key": "f780c5c2c9aa" + }, + { + "_key": "d8e00b3f762c", + "_type": "span", + "marks": [ + "b472902bbba1" + ], + "text": "open-source" + }, + { + "_type": "span", + "marks": [], + "text": " contribution to the bioinformatics community — a huge contribution to the open science movement.", + "_key": "c59f7c0ac6af" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "53e9cdb6c719", + "children": [ + { + "_type": "span", + "text": "", + "_key": "ac961a6e1160" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "46d8323268ee", + "markDefs": [ + { + "_key": "f6c9fffbc89b", + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-19-harshil-patel/" + }, + { + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-19-paolo-di-tommaso/", + "_key": "0e99edb6b3f2", + "_type": "link" + }, + { + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-19-harshil-patel/", + "_key": "3b3595e9888b" + } + ], + "children": [ + { + "marks": [], + "text": "Paolo also provided an impressive command-line focused demo of Wave, echoing Harshil Patel’s equally impressive demo earlier in the day focused on ", + "_key": "220f0ac51a49", + "_type": "span" + }, + { + "_key": "627f983618ab", + "_type": "span", + "marks": [ + "f6c9fffbc89b" + ], + "text": "seqerakit and automation on the Seqera Platform" + }, + { + "_key": "13f2a0fc6fa2", + "_type": "span", + "marks": [], + "text": ". Both Harshil and Paolo showed themselves to be " + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "\"kings of the live demo\"", + "_key": "4e2c58617a09" + }, + { + "marks": [], + "text": " for their command line mastery under pressure! 
You can view ", + "_key": "8d838d64321d", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "0e99edb6b3f2" + ], + "text": "Paolo’s talk and demos here", + "_key": "806158bd97a6" + }, + { + "_key": "6ae8e2cc1e62", + "_type": "span", + "marks": [], + "text": " and " + }, + { + "_type": "span", + "marks": [ + "3b3595e9888b" + ], + "text": "Harshil’s talk here", + "_key": "25ac24a79805" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "8a8acb8e5901" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "f7f3e312a053" + } + ], + "_type": "block", + "style": "normal", + "_key": "e518ff546fd8" + }, + { + "style": "normal", + "_key": "d2139bda7e00", + "markDefs": [ + { + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-19-bringing-spatial-omics-to-nf-core/", + "_key": "cae7d8897f8e" + }, + { + "_key": "6331cb2a4c42", + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-19-nf-validation/" + }, + { + "_key": "170f78306998", + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-19-development-of-an-integrated-dna-and-rna-variant-calling-pipeline/" + } + ], + "children": [ + { + "_key": "7f34e5d4fc8d", + "_type": "span", + "marks": [], + "text": "Talks during day two included " + }, + { + "text": "bringing spatial omics to nf-core", + "_key": "f48874cf9bf4", + "_type": "span", + "marks": [ + "cae7d8897f8e" + ] + }, + { + "_key": "d24b5a84b2e6", + "_type": "span", + "marks": [], + "text": ", a discussion of " + }, + { + "marks": [ + "6331cb2a4c42" + ], + "text": "nf-validation", + "_key": "bc0f9da507e2", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ", and a talk on the ", + "_key": "444371d2e600" + }, + { + "marks": [ + "170f78306998" + ], + "text": "development of an integrated DNA and RNA variant calling pipeline", + "_key": "5b9e4d6fd1f7", + "_type": "span" + }, + { + "text": ".", + "_key": "e78f9bc1905b", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "20493e4724c0", + "children": [ + { + "_type": "span", + "text": "", + "_key": "514a25eb86cc" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Unfortunately, there were too many brilliant speakers and topics to mention them all here, so we’ve provided a handy summary of talks at the end of this post so you can look up topics of interest.", + "_key": "6b9ee0acfc9d" + } + ], + "_type": "block", + "style": "normal", + "_key": "346095e51ea9" + }, + { + "_type": "block", + "style": "normal", + "_key": "f9be4873a0f8", + "children": [ + { + "_key": "086aa99c9b81", + "_type": "span", + "text": "" + } + ] + }, + { + "_key": "c9d051429045", + "markDefs": [ + { + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/sponsors/", + "_key": "e3299d96a7fa" + }, + { + "_key": "2fa668892f57", + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/posters/" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The Summit also featured an exhibition area, and attendees visited booths hosted by ", + "_key": "8e577ce4a71b" + }, + { + "marks": [ + "e3299d96a7fa" + ], + "text": "event sponsors", + "_key": "8c8069288586", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " between talks and viewed the many excellent ", + "_key": "b7ebeef68b83" + }, + { + "text": "scientific posters", + "_key": 
"1d65bfd34780", + "_type": "span", + "marks": [ + "2fa668892f57" + ] + }, + { + "_key": "0d9a9d966605", + "_type": "span", + "marks": [], + "text": " contributed for the event. Following a packed day of sessions that went into the evening, attendees relaxed and socialized with colleagues over dinner." + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "0d54636b27ba", + "children": [ + { + "text": "", + "_key": "34368684c049", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "image", + "alt": "Morning run photo", + "_key": "dfe9ee6ddb9a", + "asset": { + "_ref": "image-199b150da416e3587b8c53cdbcd4937c4e7792ad-1200x620-jpg", + "_type": "reference" + } + }, + { + "children": [ + { + "_type": "span", + "text": "Wrapping up", + "_key": "0c9634a13b96" + } + ], + "_type": "block", + "style": "h2", + "_key": "a4a4adcb8ae8" + }, + { + "markDefs": [ + { + "_key": "7bf6199160a8", + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-20-zs/" + }, + { + "_key": "4c7077cba673", + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-20-tree-of-life/" + }, + { + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-20-gwas/", + "_key": "04fc360d8753", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "As things wound to a close on day three, there were additional talks on topics ranging from ZS’s ", + "_key": "c09c3422865c" + }, + { + "_type": "span", + "marks": [ + "7bf6199160a8" + ], + "text": "contributing to nf-core through client collaboration", + "_key": "d1a9c44863e0" + }, + { + "marks": [], + "text": " to ", + "_key": "dbeb5671539f", + "_type": "span" + }, + { + "text": "decoding the Tree of Life at Wellcome Sanger Institute", + "_key": "1b73ff468e4a", + "_type": "span", + "marks": [ + "4c7077cba673" + ] + }, + { + "marks": [], + "text": " to ", + "_key": "41371e9d4f42", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "04fc360d8753" + ], + "text": "performing large and reproducible GWAS analysis on biobank-scale data", + "_key": "4de009b8a207" + }, + { + "_type": "span", + "marks": [], + "text": " at Medical University of Innsbruck.", + "_key": "b5414637929a" + } + ], + "_type": "block", + "style": "normal", + "_key": "31c2de53fea8" + }, + { + "_key": "08b40ee33552", + "children": [ + { + "_type": "span", + "text": "", + "_key": "7c1300d0ae72" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-20-multiqc/", + "_key": "6b4587651057" + }, + { + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-20-nf-test-at-nf-core/", + "_key": "fceb6bd3060e", + "_type": "link" + } + ], + "children": [ + { + "_key": "96ad9c890d10", + "_type": "span", + "marks": [], + "text": "Phil Ewels discussed " + }, + { + "_type": "span", + "marks": [ + "6b4587651057" + ], + "text": "future plans for MultiQC", + "_key": "dfda154adb89" + }, + { + "_type": "span", + "marks": [], + "text": ", and Edmund Miller ", + "_key": "0ea20c70c33c" + }, + { + "_key": "1307b7a34c2a", + "_type": "span", + "marks": [ + "fceb6bd3060e" + ], + "text": "shared his experience working on nf-test" + }, + { + "text": " and how it is empowering scalable and streamlined testing for nf-core projects.", + "_key": "04ac237265a9", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "aa11b3108917" + }, + { + "_key": 
"311b9157a8e5", + "children": [ + { + "text": "", + "_key": "3174512a6e9a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "e38569fe2143", + "markDefs": [ + { + "_type": "link", + "href": "https://summit.nextflow.io/boston/", + "_key": "ee74b6d67a02" + } + ], + "children": [ + { + "text": "To close the event, Evan took the stage a final time, thanking the many Summit organizers and contributors, and announcing the next Nextflow Summit Barcelona, scheduled for ", + "_key": "ae4cc0f5eda6", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "strong" + ], + "text": "October 21-25, 2024", + "_key": "0b46bbaee2fa", + "_type": "span" + }, + { + "text": ". He also reminded attendees of the upcoming North American Hackathon and ", + "_key": "935b3340346b", + "_type": "span", + "marks": [] + }, + { + "text": "Nextflow Summit in Boston", + "_key": "a7a68f3b9173", + "_type": "span", + "marks": [ + "ee74b6d67a02" + ] + }, + { + "_key": "c747351ebf0b", + "_type": "span", + "marks": [], + "text": " beginning on November 28, 2023." + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "ad24a103371c" + } + ], + "_type": "block", + "style": "normal", + "_key": "758832921c04" + }, + { + "_key": "b47fbf2d850d", + "markDefs": [ + { + "href": "https://summit.nextflow.io/boston/sponsors/", + "_key": "86c64548e007", + "_type": "link" + } + ], + "children": [ + { + "text": "On behalf of the Seqera team, thank you to our fellow ", + "_key": "b9edae03776d", + "_type": "span", + "marks": [] + }, + { + "text": "sponsors", + "_key": "de7f55ce8ec4", + "_type": "span", + "marks": [ + "86c64548e007" + ] + }, + { + "_type": "span", + "marks": [], + "text": " who helped make the Nextflow Summit a resounding success. 
This year’s sponsors included:", + "_key": "450697fd7977" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "", + "_key": "b810517195c9", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "588ee148c259" + }, + { + "listItem": "bullet", + "children": [ + { + "_type": "span", + "marks": [], + "text": "AWS", + "_key": "e7573acfa38b" + }, + { + "_type": "span", + "marks": [], + "text": "ZS", + "_key": "d6031b2a2006" + }, + { + "text": "Element Biosciences", + "_key": "f0f5d51932ae", + "_type": "span", + "marks": [] + }, + { + "marks": [], + "text": "Microsoft", + "_key": "275a1df9ce19", + "_type": "span" + }, + { + "text": "MemVerge", + "_key": "4dbbbcf4d78f", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [], + "text": "Pixelgen Technologies", + "_key": "12709e65b4dc" + }, + { + "_type": "span", + "marks": [], + "text": "Oxford Nanopore", + "_key": "527f751bd684" + }, + { + "marks": [], + "text": "Quilt", + "_key": "8fda92181e10", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": "TileDB", + "_key": "303f51823f48" + } + ], + "_type": "block", + "style": "normal", + "_key": "5d89d70485b0" + }, + { + "_type": "block", + "style": "normal", + "_key": "da618e55ce88", + "children": [ + { + "_key": "960770e3a050", + "_type": "span", + "text": "" + } + ] + }, + { + "style": "h2", + "_key": "77cfaefad7f1", + "children": [ + { + "text": "In case you missed it", + "_key": "1b09c3e734cb", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "a64aa253c792", + "markDefs": [ + { + "_type": "link", + "href": "https://www.youtube.com/playlist?list=PLPZ8WHdZGxmUotnP-tWRVNtuNWpN7xbpL", + "_key": "98a590e2fc6c" + } + ], + "children": [ + { + "_key": "7c2456d70960", + "_type": "span", + "marks": [], + "text": "If you were unable to attend in person, or missed a talk, you can watch all three days of the Summit on our " + }, + { + "_type": "span", + "marks": [ + "98a590e2fc6c" + ], + "text": "YouTube channel", + "_key": "d7c1ef05b54d" + }, + { + "text": ".", + "_key": "47634c56c2d2", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "d4567e3b1187", + "children": [ + { + "text": "", + "_key": "37f792931f4e", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "d7a2d3143541", + "markDefs": [ + { + "_type": "link", + "href": "https://nf-co.re/events", + "_key": "a750d3220ced" + }, + { + "href": "https://seqera.io/events/seqera/", + "_key": "702c32f38a0c", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "For information about additional upcoming events including bytesize talks, hackathons, webinars, and training events, you can visit ", + "_key": "8e6d986394e2" + }, + { + "_type": "span", + "marks": [ + "a750d3220ced" + ], + "text": "https://nf-co.re/events", + "_key": "531155f2a23b" + }, + { + "marks": [], + "text": " or ", + "_key": "d8993cc0c7f4", + "_type": "span" + }, + { + "marks": [ + "702c32f38a0c" + ], + "text": "https://seqera.io/events/seqera/", + "_key": "ef154c47e0d7", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "010ae94ae238" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "6da26cdb5358", + "children": [ + { + "_type": "span", + "text": "", + "_key": "a84f95ae41f6" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + 
"text": "For your convenience, a handy list of talks from Nextflow Summit 2023 are summarized below.", + "_key": "a9d5a7ac800c" + } + ], + "_type": "block", + "style": "normal", + "_key": "399e5c070bb2" + }, + { + "style": "normal", + "_key": "8a0e9e0c0c33", + "children": [ + { + "_type": "span", + "text": "", + "_key": "74c351eedc32" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "h3", + "_key": "c15ed83c5e93", + "children": [ + { + "text": "Day one (Wednesday Oct 18):", + "_key": "fbea00ef53b9", + "_type": "span" + } + ] + }, + { + "listItem": "bullet", + "children": [ + { + "_key": "5585e3c3e732", + "_type": "span", + "marks": [], + "text": "[The National Nextflow Tower Service for Australian researchers](https://summit.nextflow.io/barcelona/agenda/summit/oct-18-the-national-nextflow-tower-service-for-australian-researchers/) – Steven Manos" + }, + { + "text": "[Analysing ONT long read data for cancer with Nextflow](https://summit.nextflow.io/barcelona/agenda/summit/oct-18-analysing-ont-long-read-data-for-cancer-with-nextflow/) – Arthur Gymer", + "_key": "52883707c92f", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [], + "text": "[Community updates](https://summit.nextflow.io/barcelona/agenda/summit/oct-18-community-updates/) – Phil Ewels", + "_key": "e5d3822002a8" + }, + { + "_key": "72db64334223", + "_type": "span", + "marks": [], + "text": "[Pixelgen Technologies ❤︎ Nextflow](https://summit.nextflow.io/barcelona/agenda/summit/oct-18-pixelgen-technologies-heart-nextflow/) – John Dahlberg" + }, + { + "_key": "6d6da43c3d7a", + "_type": "span", + "marks": [], + "text": "[The modern biotech stack](https://summit.nextflow.io/barcelona/agenda/summit/oct-18-modern-biotech/) – Evan Floden" + }, + { + "_key": "6b2cb0432980", + "_type": "span", + "marks": [], + "text": "[Biological revelations at the frontiers of a draft human pangenome reference](https://summit.nextflow.io/barcelona/agenda/summit/oct-18-erik-garrison/) – Erik Garrison" + } + ], + "_type": "block", + "style": "normal", + "_key": "1e32d90674ad" + }, + { + "children": [ + { + "_key": "d7b36ee84ffa", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "2ab68602bbc7" + }, + { + "children": [ + { + "text": "Day two (Thursday Oct 19):", + "_key": "a91d8981eb2e", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "4140315217b7" + }, + { + "_type": "block", + "style": "normal", + "_key": "58d6b2352cfd", + "listItem": "bullet", + "children": [ + { + "_type": "span", + "marks": [], + "text": "[It’s been quite a year for research technology in the cloud: we’ve been busy](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-brendan-bouffler/) – Brendan Bouffler", + "_key": "a97285786e42" + }, + { + "_key": "cf79e509dad8", + "_type": "span", + "marks": [], + "text": "[nf-validation: a Nextflow plugin to validate pipeline parameters and input files](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-nf-validation/) - Júlia Mir Pedrol" + }, + { + "_type": "span", + "marks": [], + "text": "[Computational methods for allele-specific methylation with biomodal Duet](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-biomodal-duet/) – Michael Wilson", + "_key": "f7d2a527e7d2" + }, + { + "marks": [], + "text": "[How to use data pipelines in Machine Learning for Material Science](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-machine-learning-for-material-science/) – Jakob Zeitler", + "_key": "6333780634b1", + 
"_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": "[nf-core/proteinfold: a standardized workflow framework for protein structure prediction tools](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-proteinfold/) - Jose Espinosa-Carrasco", + "_key": "e83a228bef7e" + }, + { + "text": "[Automation on the Seqera Platform](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-harshil-patel/) - Harshil Patel", + "_key": "12116faf66c1", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [], + "text": "[nf-co2footprint: a Nextflow plugin to estimate the CO2 footprint of pipeline runs](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-nf-co2footprint/) - Sabrina Krakau", + "_key": "d60e826d9168" + }, + { + "text": "[Bringing spatial omics to nf-core](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-bringing-spatial-omics-to-nf-core/) - Victor Perez", + "_key": "6cb0b9773020", + "_type": "span", + "marks": [] + }, + { + "text": "[Bioinformatics at the speed of cloud: revolutionizing genomics with Nextflow and MMCloud](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-bioinformatics-at-the-speed-of-cloud/) - Sateesh Peri", + "_key": "3fbeb5c4badb", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [], + "text": "[Enabling converged computing with the Nextflow ecosystem](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-paolo-di-tommaso/) - Paolo Di Tommaso", + "_key": "173aff9b7454" + }, + { + "marks": [], + "text": "[Cluster scalable pangenome graph construction with nf-core/pangenome](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-cluster-scalable-pangenome/) - Simon Heumos", + "_key": "0d4b3f79286a", + "_type": "span" + }, + { + "marks": [], + "text": "[Development of an integrated DNA and RNA variant calling pipeline](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-development-of-an-integrated-dna-and-rna-variant-calling-pipeline/) - Raquel Manzano", + "_key": "bb11ec1c8719", + "_type": "span" + }, + { + "marks": [], + "text": "[Annotation cache: using nf-core/modules and Seqera Platform to build an AWS open data resource](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-annotation-cache/) - Maxime Garcia", + "_key": "ec61e10057b6", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": "[Real-time sequencing analysis with Nextflow](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-real-time-sequencing-analysis-with-nextflow/) - Chris Wright", + "_key": "84f5a35f6b3d" + }, + { + "_type": "span", + "marks": [], + "text": "[nf-core/sarek: a comprehensive & efficient somatic & germline variant calling workflow](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-nf-core-sarek/) - Friederike Hanssen", + "_key": "e78a4b943bf5" + }, + { + "marks": [], + "text": "[nf-test: a simple but powerful testing framework for Nextflow pipelines](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-nf-test-simple-but-powerful/) - Lukas Forer", + "_key": "dc4a2fdb0468", + "_type": "span" + }, + { + "marks": [], + "text": "[Empowering distributed precision medicine: scalable genomic analysis in clinical trial recruitment](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-empowering-distributed-precision-medicine/) - Heath Obrien", + "_key": "7e872426234b", + "_type": "span" + }, + { + "_key": "8ae819b8838f", + "_type": "span", + "marks": [], + "text": "[nf-core pipeline for genomic imputation: from phasing to imputation to 
validation](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-nf-core-pipeline-for-genomic-imputation/) - Louis Le Nézet"
          },
          {
            "marks": [],
            "text": "[Porting workflow managers to Nextflow at a national diagnostic genomics medical service – strategy and learnings](https://summit.nextflow.io/barcelona/agenda/summit/oct-19-genomics-england/) - Several Speakers",
            "_key": "9e30eb295c87",
            "_type": "span"
          }
        ]
      },
      {
        "style": "normal",
        "_key": "a6cb27389ab4",
        "children": [
          {
            "text": "",
            "_key": "e0598182fb2b",
            "_type": "span"
          }
        ],
        "_type": "block"
      },
      {
        "_key": "a69946a0b68a",
        "children": [
          {
            "_key": "e06d9730c609",
            "_type": "span",
            "text": "Day three (Friday Oct 20):"
          }
        ],
        "_type": "block",
        "style": "h3"
      },
      {
        "_key": "32bfd3582181",
        "listItem": "bullet",
        "children": [
          {
            "marks": [],
            "text": "[Driving discovery: contributing to the nf-core project through client collaboration](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-zs/) - Felipe Almeida & Juliet Frederiksen",
            "_key": "2a730bbef02f",
            "_type": "span"
          },
          {
            "text": "[Automated production engine to decode the Tree of Life](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-tree-of-life/) - Guoying Qi",
            "_key": "85157eebd4e9",
            "_type": "span",
            "marks": []
          },
          {
            "_type": "span",
            "marks": [],
            "text": "[Building a community: experiences from one year as a developer advocate](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-community-building/) - Marcel Ribeiro-Dantas",
            "_key": "761616e58e96"
          },
          {
            "marks": [],
            "text": "[nf-core/raredisease: a workflow to analyse data from patients with rare diseases](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-nf-core-raredisease/) - Ramprasad Neethiraj",
            "_key": "d9e061cf871c",
            "_type": "span"
          },
          {
            "_key": "9f8ebfdb06da",
            "_type": "span",
            "marks": [],
            "text": "[Enabling AZ bioinformatics with Nextflow/Nextflow Tower](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-az/) - Manasa Surakala"
          },
          {
            "text": "[Bringing MultiQC into a new era](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-multiqc/) - Phil Ewels",
            "_key": "96c1da880780",
            "_type": "span",
            "marks": []
          },
          {
            "_key": "72f17d68cf80",
            "_type": "span",
            "marks": [],
            "text": "[nf-test at nf-core: empowering scalable and streamlined testing](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-nf-test-at-nf-core/) - Edmund Miller"
          },
          {
            "marks": [],
            "text": "[Performing large and reproducible GWAS analysis on biobank-scale data](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-gwas/) - Sebastian Schönherr",
            "_key": "ae37ae1848ec",
            "_type": "span"
          },
          {
            "_type": "span",
            "marks": [],
            "text": "[Highlights from the nf-core hackathon](https://summit.nextflow.io/barcelona/agenda/summit/oct-20-hackathon/) - Chris Hakkaart",
            "_key": "57ad466b7478"
          }
        ],
        "_type": "block",
        "style": "normal"
      },
      {
        "_key": "969301457a9a",
        "children": [
          {
            "_type": "span",
            "text": "",
            "_key": "0f0ef353fac1"
          }
        ],
        "_type": "block",
        "style": "normal"
      },
      {
        "_key": "a5e4c059a002",
        "markDefs": [],
        "children": [
          {
            "marks": [
              "em"
            ],
            "text": "In this event, we're showcasing some of the results of the project 'Optimization of computational resources for HPC workloads in the cloud using ML/AI' by Seqera Labs S.L. 
This project has been funded by the European Regional Development Fund (ERDF) of the European Union, coordinated and managed by RED.es, with the aim of carrying out the development of technological entrepreneurship and technological demand, within the framework of the Strategic Action on Digital Economy and Society of the State Program for R&D&I oriented towards societal challenges.", + "_key": "40d8943b35bc", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "286c888621d0", + "children": [ + { + "_type": "span", + "text": "", + "_key": "0c458cad3365" + } + ] + }, + { + "_key": "ac980d908410", + "asset": { + "_type": "reference", + "_ref": "image-df17e8a21b15056284176b5b0a510e2e1d265850-1146x128-png" + }, + "_type": "image", + "alt": "grant logos" + } + ] + }, + { + "publishedAt": "2017-04-26T06:00:00.000Z", + "title": "Nextflow workshop is coming!", + "tags": [ + { + "_type": "reference", + "_key": "4cd73bd1eaf9", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493" + } + ], + "author": { + "_type": "reference", + "_ref": "paolo-di-tommaso" + }, + "_type": "blogPost", + "_updatedAt": "2024-09-26T09:01:47Z", + "body": [ + { + "children": [ + { + "text": "We are excited to announce the first Nextflow workshop that will take place at the Barcelona Biomedical Research Park building (", + "_key": "d5dab0b48063", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "fef78602d68e" + ], + "text": "PRBB", + "_key": "96fd8563cfc1" + }, + { + "marks": [], + "text": ") on 14-15th September 2017.", + "_key": "723976d2b5ff", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "fd116d64b8ed", + "markDefs": [ + { + "href": "https://www.prbb.org/", + "_key": "fef78602d68e", + "_type": "link" + } + ] + }, + { + "_key": "5c418ed529b0", + "children": [ + { + "text": "", + "_key": "f6435726ab66", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "b2fd019f2c13", + "markDefs": [], + "children": [ + { + "_key": "a14a64d34b41", + "_type": "span", + "marks": [], + "text": "This event is open to everybody who is interested in the problem of computational workflow reproducibility. Leading experts and users will discuss the current state of the Nextflow technology and how it can be applied to manage -omics analyses in a reproducible manner. Best practices will be introduced on how to deploy real-world large-scale genomic applications for precision medicine." 
+ } + ] + }, + { + "style": "normal", + "_key": "d34469ada5c5", + "children": [ + { + "_key": "aa27c36353fb", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "During the hackathon, organized for the second day, participants will have the opportunity to learn how to write self-contained, replicable data analysis pipelines along with Nextflow expert developers.", + "_key": "7b0f7d9d34a5", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "86a997fff377" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "f8ff45dcf258" + } + ], + "_type": "block", + "style": "normal", + "_key": "9a12ccd0e347" + }, + { + "_type": "block", + "style": "normal", + "_key": "d999a01efeac", + "markDefs": [ + { + "_type": "link", + "href": "http://www.crg.eu/en/event/coursescrg-nextflow-reproducible-silico-genomics", + "_key": "32f701c7c536" + }, + { + "href": "http://apps.crg.es/content/internet/events/webforms/17502", + "_key": "c3e7c680f753", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "More details at ", + "_key": "5ff2308bf79d" + }, + { + "_key": "7d7d2780d42a", + "_type": "span", + "marks": [ + "32f701c7c536" + ], + "text": "this link" + }, + { + "text": ". The registration form is ", + "_key": "7fe8d045e353", + "_type": "span", + "marks": [] + }, + { + "text": "available here", + "_key": "4adf26adf9d3", + "_type": "span", + "marks": [ + "c3e7c680f753" + ] + }, + { + "_type": "span", + "marks": [], + "text": " (deadline 15th Jun).", + "_key": "86415489de0a" + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "4eda3f91036c" + } + ], + "_type": "block", + "style": "normal", + "_key": "017ff0549e79" + }, + { + "_key": "d8fc025e3066", + "children": [ + { + "_type": "span", + "text": "Schedule (draft)", + "_key": "a1fdf5493afa" + } + ], + "_type": "block", + "style": "h3" + }, + { + "children": [ + { + "_key": "28130ea16633", + "_type": "span", + "text": "Thursday, 14 September" + } + ], + "_type": "block", + "style": "h4", + "_key": "3bcd8389ec77" + }, + { + "_key": "d8fefc7b2b23", + "_type": "block" + }, + { + "_type": "block", + "_key": "5489d7d2c1da" + }, + { + "_key": "535da0fa976f", + "_type": "block" + }, + { + "_type": "block", + "_key": "2723409ed326" + }, + { + "_type": "block", + "_key": "adfbe0520c82" + }, + { + "_type": "block", + "_key": "7600bc30abcc" + }, + { + "_type": "block", + "_key": "cf4e23377719" + }, + { + "_type": "block", + "_key": "c4be7d1a952b" + }, + { + "_key": "ad27cff46503", + "_type": "block" + }, + { + "_key": "58d3589e9158", + "_type": "block" + }, + { + "_type": "block", + "_key": "219548a9b25b" + }, + { + "_type": "block", + "_key": "ea2927df8e08" + }, + { + "style": "h4", + "_key": "25ce06fde3c9", + "children": [ + { + "text": "Friday, 15 September", + "_key": "485172c9c983", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "_key": "b6e0388e6e6c" + }, + { + "_key": "31fd0f0b6cb8", + "_type": "block" + }, + { + "_type": "block", + "_key": "795688dd9840" + }, + { + "_type": "block", + "_key": "5ca5ad777963" + }, + { + "_key": "4575d6bda68f", + "_type": "block" + }, + { + "_type": "block", + "_key": "e808b75349f0" + }, + { + "_type": "block", + "_key": "d06ef1bbccb6" + }, + { + "_type": "block", + "_key": "652e506b5d49" + }, + { + "_type": "block", + "_key": "ce80a78d8829" + }, + { + "_type": "block", + "_key": "a97ec6395c70" + }, + { + "_type": 
"block", + "_key": "28f930e1525d" + }, + { + "_type": "block", + "_key": "3bcea9ab962a" + }, + { + "_type": "image", + "alt": "Nextflow workshop", + "_key": "eb5188b82cd8", + "asset": { + "_type": "reference", + "_ref": "image-8facadad98110a5d8bf8cb2eb92fd9132c69cfc2-800x1132-png" + } + } + ], + "_createdAt": "2024-09-25T14:15:20Z", + "meta": { + "slug": { + "current": "nextflow-workshop" + } + }, + "_rev": "hf9hwMPb7ybAE3bqEU1wvj", + "_id": "3fb6ccf8957c" + }, + { + "publishedAt": "2023-06-06T06:00:00.000Z", + "author": { + "_ref": "noel-ortiz", + "_type": "reference" + }, + "_type": "blogPost", + "_id": "42aec811c9c0", + "body": [ + { + "_key": "9750bbfcaf51", + "markDefs": [ + { + "_type": "link", + "href": "https://www.crg.eu/", + "_key": "b6cbcd97c8a3" + } + ], + "children": [ + { + "marks": [], + "text": "There's been a lot of water under the bridge since the first release of Nextflow in July 2013. From its humble beginnings at the ", + "_key": "d254ff0f67f4", + "_type": "span" + }, + { + "text": "Centre for Genomic Regulation", + "_key": "f086a95e45e8", + "_type": "span", + "marks": [ + "b6cbcd97c8a3" + ] + }, + { + "marks": [], + "text": " (CRG) in Barcelona, Nextflow has evolved from an upstart workflow orchestrator to one of the most consequential projects in open science software (OSS). Today, Nextflow is downloaded ", + "_key": "0274d81afdf2", + "_type": "span" + }, + { + "text": "120,000+", + "_key": "aac78c3d9497", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_key": "bf2f85ac7cc3", + "_type": "span", + "marks": [], + "text": " times monthly, boasts vibrant user and developer communities, and is used by leading pharmaceutical, healthcare, and biotech research firms." + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "f504a8085c43", + "children": [ + { + "text": "", + "_key": "e62aa425d132", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "9ff151aa6bc0", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "On the occasion of Nextflow's anniversary, I thought it would be fun to share some perspectives and point out how far we've come as a community. I also wanted to recognize the efforts of Paolo Di Tommaso and the many people who have contributed enormous time and effort to make Nextflow what it is today.", + "_key": "8f70b1423262", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "cc12e7eb654b" + } + ], + "_type": "block", + "style": "normal", + "_key": "dde5ff26c23e" + }, + { + "_type": "block", + "style": "h2", + "_key": "3177235f3c72", + "children": [ + { + "_type": "span", + "text": "A decade of innovation", + "_key": "060e7c6b31fd" + } + ] + }, + { + "_key": "fd0a3366e312", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Bill Gates is credited with observing that "people often overestimate what they can do in one year, but underestimate what they can do in ten." The lesson, of course, is that real, meaningful change takes time. Progress is measured in a series of steps. 
Considered in isolation, each new feature added to Nextflow seems small, but they combine to deliver powerful capabilities.", + "_key": "d3fd936f11cd", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "9285755c55de", + "children": [ + { + "_type": "span", + "text": "", + "_key": "8ad698e88882" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "3487dc6d5dd4", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Life sciences has seen a staggering amount of innovation. According to estimates from the National Human Genome Research Institute (NHGRI), the cost of sequencing a human genome in 2013 was roughly USD 10,000. Today, sequencing costs are in the range of USD 200—a ", + "_key": "02a67390c3b6" + }, + { + "text": "50-fold reduction", + "_key": "7da4fe5812d8", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_type": "span", + "marks": [], + "text": ".^1^", + "_key": "dd60c3e79e18" + } + ] + }, + { + "_key": "9ff8133a25d3", + "children": [ + { + "_key": "0a9a4f309944", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "A fundamental principle of economics is that ", + "_key": "c05d7271e8a1", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "\"if you make something cheaper, you get more of it.\"", + "_key": "4954c91fad03" + }, + { + "text": " One didn't need a crystal ball to see that, driven by plummeting sequencing and computing costs, the need for downstream analysis was poised to explode. With advances in sequencing technology outpacing Moore's Law, It was clear that scaling analysis capacity would be a significant issue.^2^", + "_key": "91e56580e2f7", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "d3de4fd91761", + "markDefs": [] + }, + { + "style": "normal", + "_key": "8195f5ab3ce2", + "children": [ + { + "_type": "span", + "text": "", + "_key": "e78058e587f9" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "Getting the fundamentals right", + "_key": "84b164c5404f" + } + ], + "_type": "block", + "style": "h2", + "_key": "49ce5def2fbb" + }, + { + "_key": "f3f27d2038a6", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "When Paolo and his colleagues started the Nextflow project, it was clear that emerging technologies such as cloud computing, containers, and collaborative software development would be important. 
Even so, it is still amazing how rapidly these key technologies have advanced in ten short years.", + "_key": "7ccd423bff6e" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "643f4d5f2fed", + "children": [ + { + "_type": "span", + "text": "", + "_key": "9d19e6a41c48" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "In an ", + "_key": "49e1e98de245", + "_type": "span", + "marks": [] + }, + { + "_key": "6fc15247e5dd", + "_type": "span", + "marks": [ + "a37229ff49fc" + ], + "text": "article for eLife magazine in 2021" + }, + { + "_type": "span", + "marks": [], + "text": ", Paolo described how Solomon Hyke's talk "", + "_key": "76ccc359aee4" + }, + { + "text": "Why we built Docker", + "_key": "9ff9005e47f4", + "_type": "span", + "marks": [ + "46e886ad8cb5" + ] + }, + { + "marks": [], + "text": "" at DotScale in the summer of 2013 impacted his thinking about the design of Nextflow. It was evident that containers would be a game changer for scientific workflows. Encapsulating application logic in self-contained, portable containers solved a multitude of complexity and dependency management challenges — problems experienced daily at the CRG and by many bioinformaticians to this day. Nextflow was developed concurrent with the container revolution, and Nextflow’s authors had the foresight to make containers first-class citizens.", + "_key": "628d48db1aaa", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "a68615b822d4", + "markDefs": [ + { + "_type": "link", + "href": "https://elifesciences.org/labs/d193babe/the-story-of-nextflow-building-a-modern-pipeline-orchestrator", + "_key": "a37229ff49fc" + }, + { + "_type": "link", + "href": "https://www.youtube.com/watch?v=3N3n9FzebAA", + "_key": "46e886ad8cb5" + } + ] + }, + { + "style": "normal", + "_key": "21c93fcd8391", + "children": [ + { + "_type": "span", + "text": "", + "_key": "7fa7b10236d1" + } + ], + "_type": "block" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/container.html?highlight=containers", + "_key": "4a550dfe73d0" + }, + { + "_type": "link", + "href": "https://www.docker.com/", + "_key": "86ebdddba822" + }, + { + "_type": "link", + "href": "https://sylabs.io/", + "_key": "66c983d1a240" + }, + { + "_type": "link", + "href": "https://podman.io/", + "_key": "9446c9649189" + }, + { + "_type": "link", + "href": "https://hpc.github.io/charliecloud/", + "_key": "d78682cd47f6" + }, + { + "_type": "link", + "href": "https://sarus.readthedocs.io/en/stable/", + "_key": "abcb1e2d6135" + }, + { + "_key": "1df59ac2ed53", + "_type": "link", + "href": "https://github.com/NERSC/shifter" + } + ], + "children": [ + { + "_key": "3e64667aece2", + "_type": "span", + "marks": [], + "text": "With containers, HPC environments have been transformed — from complex environments where application binaries were typically served to compute nodes via NFS to simpler architectures where task-specific containers are pulled from registries on demand. Today, most bioinformatic pipelines use containers. 
Nextflow supports " + }, + { + "_type": "span", + "marks": [ + "4a550dfe73d0" + ], + "text": "multiple container formats", + "_key": "a6da96e99868" + }, + { + "_key": "3a8e798cf353", + "_type": "span", + "marks": [], + "text": " and runtimes, including " + }, + { + "_type": "span", + "marks": [ + "86ebdddba822" + ], + "text": "Docker", + "_key": "b7e8099a2df8" + }, + { + "text": ", ", + "_key": "75262511b215", + "_type": "span", + "marks": [] + }, + { + "text": "Singularity", + "_key": "e8fc9ab0d5cc", + "_type": "span", + "marks": [ + "66c983d1a240" + ] + }, + { + "marks": [], + "text": ", ", + "_key": "3d3dd40c4ffc", + "_type": "span" + }, + { + "_key": "b4d9dfbe8497", + "_type": "span", + "marks": [ + "9446c9649189" + ], + "text": "Podman" + }, + { + "text": ", ", + "_key": "4f4fe3191df2", + "_type": "span", + "marks": [] + }, + { + "_key": "e093f469deb6", + "_type": "span", + "marks": [ + "d78682cd47f6" + ], + "text": "Charliecloud" + }, + { + "text": ", ", + "_key": "90d68a3a58cd", + "_type": "span", + "marks": [] + }, + { + "text": "Sarus", + "_key": "35c8fd40934f", + "_type": "span", + "marks": [ + "abcb1e2d6135" + ] + }, + { + "marks": [], + "text": ", and ", + "_key": "e7fd337a58db", + "_type": "span" + }, + { + "marks": [ + "1df59ac2ed53" + ], + "text": "Shifter", + "_key": "8abd4555bd6a", + "_type": "span" + }, + { + "_key": "d5dc1d2bd047", + "_type": "span", + "marks": [], + "text": "." + } + ], + "_type": "block", + "style": "normal", + "_key": "c3565b8d2875" + }, + { + "_type": "block", + "style": "normal", + "_key": "eab766ce5a2c", + "children": [ + { + "_type": "span", + "text": "", + "_key": "d9ab05ca3618" + } + ] + }, + { + "children": [ + { + "text": "The shift to the cloud", + "_key": "ab7bada51e70", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "71f1ce51cc67" + }, + { + "markDefs": [], + "children": [ + { + "text": "Some of the earliest efforts around Nextflow centered on building high-quality executors for HPC workload managers. A key idea behind schedulers such as LSF, PBS, Slurm, and Grid Engine was to share a fixed pool of on-premises resources among multiple users, maximizing throughput, efficiency, and resource utilization.", + "_key": "fa46e7cd08bd", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "5aabf0e05b8e" + }, + { + "style": "normal", + "_key": "a4c74c716ff5", + "children": [ + { + "text": "", + "_key": "fd335409fc71", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "See the article ", + "_key": "44f3cea35c35", + "_type": "span", + "marks": [] + }, + { + "_key": "27bc099bd8e3", + "_type": "span", + "marks": [ + "45edf192c2af" + ], + "text": "Nextflow on BIG IRON: Twelve tips for improving the effectiveness of pipelines on HPC clusters" + } + ], + "_type": "block", + "style": "normal", + "_key": "5ea2b6e9bf70", + "markDefs": [ + { + "_type": "link", + "href": "https://nextflow.io/blog/2023/best-practices-deploying-pipelines-with-hpc-workload-managers.html", + "_key": "45edf192c2af" + } + ] + }, + { + "style": "normal", + "_key": "b5573da27d2e", + "children": [ + { + "_type": "span", + "text": "", + "_key": "32a5daa6995a" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "While cloud infrastructure was initially "clunky" and hard to deploy and use, the idea of instant access and pay-per-use models was too compelling to ignore. 
In the early days, many organizations attempted to replicate on-premises HPC clusters in the cloud, deploying the same software stacks and management tools used locally to cloud-based VMs.", + "_key": "efb61291eef7" + } + ], + "_type": "block", + "style": "normal", + "_key": "7521d73a40ce", + "markDefs": [] + }, + { + "children": [ + { + "_key": "3683b848bb88", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "7377d24b07de" + }, + { + "_key": "0f71267e8d7a", + "markDefs": [ + { + "href": "https://aws.amazon.com/batch/", + "_key": "80a5ce5d4eca", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "With the launch of ", + "_key": "c1f55420e3c0" + }, + { + "_type": "span", + "marks": [ + "80a5ce5d4eca" + ], + "text": "AWS Batch", + "_key": "434b6cf79768" + }, + { + "text": " in December 2016, Nextflow’s developers realized there was a better way. In cloud environments, resources are (in theory) infinite and just an API call away. The traditional scheduling paradigm of sharing a finite resource pool didn't make sense in the cloud, where users could dynamically provision a private, scalable resource pool for only the duration of their workload. All the complex scheduling and control policies that tended to make HPC workload managers hard to use and manage were no longer required.^3^", + "_key": "b9e5fa3608b7", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "233b79eea461", + "children": [ + { + "_type": "span", + "text": "", + "_key": "7db02fff7603" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "b0510ab62a41", + "markDefs": [ + { + "_key": "2563920e6a10", + "_type": "link", + "href": "https://azure.microsoft.com/en-us/products/batch" + }, + { + "_type": "link", + "href": "https://cloud.google.com/batch", + "_key": "c20d492213d7" + }, + { + "_key": "2f84994bd24f", + "_type": "link", + "href": "https://kubernetes.io/docs/concepts/overview/" + } + ], + "children": [ + { + "_key": "1c71e129101c", + "_type": "span", + "marks": [], + "text": "AWS Batch also relied on containerization, so it only made sense that AWS Batch was the first cloud-native integration to the Nextflow platform early in 2017, along with native support for S3 storage buckets. Nextflow has since been enhanced to support other batch services, including " + }, + { + "_key": "9433599be88e", + "_type": "span", + "marks": [ + "2563920e6a10" + ], + "text": "Azure Batch" + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "ba4d5c84cb84" + }, + { + "_type": "span", + "marks": [ + "c20d492213d7" + ], + "text": "Google Cloud Batch", + "_key": "b3cb034b553f" + }, + { + "_type": "span", + "marks": [], + "text": ", along with a rich set of managed cloud storage solutions. 
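To give a flavour of what this looks like in practice, pointing a pipeline at AWS Batch is largely a configuration exercise. The sketch below is illustrative only — the queue name, bucket, and region are hypothetical placeholders:

```groovy
// nextflow.config — illustrative sketch; queue, bucket and region are placeholders
process.executor = 'awsbatch'
process.queue    = 'my-batch-queue'        // an existing AWS Batch job queue
workDir          = 's3://my-bucket/work'   // pipeline work directory on S3
aws.region       = 'eu-west-1'
```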
Nextflow’s authors have also embraced ", + "_key": "557cdf12b6b3" + }, + { + "_key": "724c9f67583e", + "_type": "span", + "marks": [ + "2f84994bd24f" + ], + "text": "Kubernetes" + }, + { + "text": ", developed by Google, yet another way to marshal and manage containerized application environments across public and private clouds.", + "_key": "75383360fd89", + "_type": "span", + "marks": [] + } + ] + }, + { + "_key": "61b398e8a98f", + "children": [ + { + "_key": "963ba01d3c26", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "7c66aaba4462", + "children": [ + { + "text": "SCMs come of age", + "_key": "3b632f1a3604", + "_type": "span" + } + ], + "_type": "block", + "style": "h2" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "A major trend shaping software development has been the use of collaborative source code managers (SCMs) based on Git. When Paolo was thinking about the design of Nextflow, GitHub had already been around for several years, and DevOps techniques were revolutionizing software. These advances turned out to be highly relevant to managing pipelines. Ten years ago, most bioinformaticians stored copies of pipeline scripts locally. Nextflow’s authors recognized what now seems obvious — it would be easier to make Nextflow SCM aware and launch pipelines directly from a code repository. Today, this simple idea has become standard practice. Most users run pipelines directly from GitHub, GitLab, Gitea, or other favorite SCMs.", + "_key": "fe0b600a233b" + } + ], + "_type": "block", + "style": "normal", + "_key": "e8b6b9a5bb50", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "cfdb40938fd9", + "children": [ + { + "_key": "2991f839637b", + "_type": "span", + "text": "" + } + ] + }, + { + "children": [ + { + "text": "Modularization on steroids", + "_key": "35cda0d90db5", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "4eb1569eecad" + }, + { + "_key": "7385f7753f27", + "markDefs": [ + { + "_type": "link", + "href": "https://hub.docker.com/", + "_key": "3785e7b1feb8" + }, + { + "href": "https://quay.io/", + "_key": "684f922b6e56", + "_type": "link" + }, + { + "_type": "link", + "href": "https://biocontainers.pro/", + "_key": "873c37a01ccc" + } + ], + "children": [ + { + "marks": [], + "text": "A few basic concepts and patterns in computer science appear repeatedly in different contexts. These include iteration, indirection, abstraction, and component reuse/modularization. Enabled by containers, we have seen a significant shift towards modularization in bioinformatics pipelines enabled by catalogs of reusable containers. 
In addition to general-purpose registries such as ", + "_key": "312d6708afe6", + "_type": "span" + }, + { + "text": "Docker Hub", + "_key": "55cee875586a", + "_type": "span", + "marks": [ + "3785e7b1feb8" + ] + }, + { + "_key": "9082fffa5eb4", + "_type": "span", + "marks": [], + "text": " and " + }, + { + "_key": "377de42d9d71", + "_type": "span", + "marks": [ + "684f922b6e56" + ], + "text": "Quay.io" + }, + { + "_type": "span", + "marks": [], + "text": ", domain-specific efforts such as ", + "_key": "0bc0a7eec8c0" + }, + { + "_key": "ed22bdf9f785", + "_type": "span", + "marks": [ + "873c37a01ccc" + ], + "text": "biocontainers" + }, + { + "marks": [], + "text": " have emerged, aimed at curating purpose-built containers to meet the specialized needs of bioinformaticians.", + "_key": "15709fdf04c8", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "16d22d7b68ab", + "children": [ + { + "_type": "span", + "text": "", + "_key": "8de72a2d8070" + } + ] + }, + { + "style": "normal", + "_key": "7c7633dc397f", + "markDefs": [ + { + "_type": "link", + "href": "https://docs.conda.io/en/latest/", + "_key": "51cb0fd4beeb" + }, + { + "href": "https://anaconda.org/bioconda/repo", + "_key": "2d739878d8ec", + "_type": "link" + }, + { + "href": "http://bioconda.github.io/conda-package_index.html", + "_key": "66662e77e5f2", + "_type": "link" + } + ], + "children": [ + { + "_key": "f6939bb8ceed", + "_type": "span", + "marks": [], + "text": "We have also seen the emergence of platform and language-independent package managers such as " + }, + { + "_type": "span", + "marks": [ + "51cb0fd4beeb" + ], + "text": "Conda", + "_key": "26901ce88e8b" + }, + { + "text": ". Today, almost ", + "_key": "88fc1582a339", + "_type": "span", + "marks": [] + }, + { + "_key": "5b32db707e5f", + "_type": "span", + "marks": [ + "strong" + ], + "text": "10,000" + }, + { + "_key": "0900053bd174", + "_type": "span", + "marks": [], + "text": " Conda recipes for various bioinformatics tools are freely available from " + }, + { + "_type": "span", + "marks": [ + "2d739878d8ec" + ], + "text": "Bioconda", + "_key": "8e3e802bdb8f" + }, + { + "text": ". Gone are the days of manually installing software. In addition to pulling pre-built bioinformatics containers from registries, developers can leverage ", + "_key": "04c05ef00680", + "_type": "span", + "marks": [] + }, + { + "text": "packages of bioconda", + "_key": "ce5a4beb7af3", + "_type": "span", + "marks": [ + "66662e77e5f2" + ] + }, + { + "marks": [], + "text": " recipes directly from the bioconda channel.", + "_key": "1085b15f82d6", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "27b1fd9bff68", + "children": [ + { + "_type": "span", + "text": "", + "_key": "2ee0c27b2e47" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "The Nextflow community has helped lead this trend toward modularization in several areas. For example, in 2022, Seqera Labs introduced ", + "_key": "235c9196b0af", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "6dfb2a300b2a" + ], + "text": "Wave", + "_key": "57f533e5a7bf" + }, + { + "text": ". 
This new service can dynamically build and serve containers on the fly based on bioconda recipes, enabling the two technologies to work together seamlessly and avoiding building and maintaining containers by hand.", + "_key": "656e755ea78a", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "4cf6279fdde9", + "markDefs": [ + { + "_key": "6dfb2a300b2a", + "_type": "link", + "href": "https://seqera.io/wave/" + } + ] + }, + { + "style": "normal", + "_key": "75116fb77f7b", + "children": [ + { + "_type": "span", + "text": "", + "_key": "9571c56176e8" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "523e20c720fa", + "markDefs": [ + { + "_key": "aeb13e231372", + "_type": "link", + "href": "https://nf-co.re/" + }, + { + "_type": "link", + "href": "https://nf-co.re/modules", + "_key": "f7645d9fc3a8" + }, + { + "_type": "link", + "href": "https://nf-co.re/pipelines", + "_key": "a3454b5e32c6" + } + ], + "children": [ + { + "text": "With ", + "_key": "84ce1d66b358", + "_type": "span", + "marks": [] + }, + { + "text": "nf-core", + "_key": "b344252c7ebd", + "_type": "span", + "marks": [ + "aeb13e231372" + ] + }, + { + "_key": "d5fe3b2a041a", + "_type": "span", + "marks": [], + "text": ", the Nextflow community has extended the concept of modularization and reuse one step further. Much as bioconda and containers have made bioinformatics software modular and portable, " + }, + { + "marks": [ + "f7645d9fc3a8" + ], + "text": "nf-core modules", + "_key": "b306ae0531bc", + "_type": "span" + }, + { + "marks": [], + "text": " extend these concepts to pipelines. Today, there are ", + "_key": "85597ce3e527", + "_type": "span" + }, + { + "text": "900+", + "_key": "a9628b5a5908", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "text": " nf-core modules — essentially building blocks with pre-defined inputs and outputs based on Nextflow's elegant dataflow model. Rather than creating pipelines from scratch, developers can now wire together these pre-assembled modules to deliver new functionality rapidly or use any of ", + "_key": "acb56eb96ae6", + "_type": "span", + "marks": [] + }, + { + "text": "80", + "_key": "85a2ffc57177", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_type": "span", + "marks": [], + "text": " of the pre-built ", + "_key": "16cb8be9eb1c" + }, + { + "marks": [ + "a3454b5e32c6" + ], + "text": "nf-core analysis pipelines", + "_key": "ad4f2bf149b8", + "_type": "span" + }, + { + "marks": [], + "text": ". 
The result is a dramatic reduction in development and maintenance costs.", + "_key": "2fc36bff6e73", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "59890f0bc425", + "children": [ + { + "_key": "cba91082e3ce", + "_type": "span", + "text": "" + } + ] + }, + { + "_key": "b9003477ddb0", + "children": [ + { + "_key": "70f0eee573d7", + "_type": "span", + "text": "Some key Nextflow milestones" + } + ], + "_type": "block", + "style": "h2" + }, + { + "_type": "block", + "style": "normal", + "_key": "d328f7e77e3a", + "markDefs": [ + { + "href": "https://github.com/nextflow-io/nextflow/releases/tag/v0.3.0", + "_key": "8bb4db15f8e8", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Since the ", + "_key": "48abc5821fa8" + }, + { + "marks": [ + "8bb4db15f8e8" + ], + "text": "first Nextflow release", + "_key": "d2c7325b76a5", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " in July 2013, there have been ", + "_key": "791ec4190743" + }, + { + "text": "237 releases", + "_key": "ee5617ce5aeb", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "f63acb1e0768" + }, + { + "text": "5,800 commits", + "_key": "ec5db5f402cd", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "marks": [], + "text": ". Also, the project has been forked over ", + "_key": "4f2311716cf3", + "_type": "span" + }, + { + "_key": "2eff78afc663", + "_type": "span", + "marks": [ + "strong" + ], + "text": "530" + }, + { + "_key": "f4f44b806389", + "_type": "span", + "marks": [], + "text": " times. There have been too many important enhancements and milestones to capture here. We capture some important developments in the timeline below:" + } + ] + }, + { + "style": "normal", + "_key": "abd1cbc86e64", + "children": [ + { + "_key": "3fee81c97ad5", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "asset": { + "_ref": "image-bd96028dd6a15de9cebaa83d075489ed6a40c3f1-3217x1800-jpg", + "_type": "reference" + }, + "_type": "image", + "alt": "Nextflow ten year graphic", + "_key": "f8cb16b89ef7" + }, + { + "children": [ + { + "marks": [], + "text": "As we look to the future, the pace of innovation continues to increase. It’s been exciting to see Nextflow expand beyond the various ", + "_key": "1bc6b2e84353", + "_type": "span" + }, + { + "_key": "b4a795c38740", + "_type": "span", + "marks": [ + "em" + ], + "text": "omics" + }, + { + "_type": "span", + "marks": [], + "text": " disciplines to new areas such as medical imaging, data science, and machine learning. We continue to evolve Nextflow, adding new features and capabilities to support these emerging use cases and support new compute and storage environments. 
I can hardly wait to see what the next ten years will bring.", + "_key": "9a38ba1f02eb" + } + ], + "_type": "block", + "style": "normal", + "_key": "52490663a4d9", + "markDefs": [] + }, + { + "_key": "62016297236b", + "children": [ + { + "_key": "8c88d11fd599", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "d2502de4b2a0", + "markDefs": [ + { + "_type": "link", + "href": "https://nextflow.io/blog/2023/learn-nextflow-in-2023.html", + "_key": "09fba0c23188" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "For those new to Nextflow and wishing to learn more about the project, we have compiled an excellent collection of resources to help you ", + "_key": "a79ef4d46582" + }, + { + "_key": "03a854792d04", + "_type": "span", + "marks": [ + "09fba0c23188" + ], + "text": "Learn Nextflow in 2023" + }, + { + "_key": "75891e4d1e20", + "_type": "span", + "marks": [], + "text": "." + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "3f591e9bb5da", + "children": [ + { + "text": "", + "_key": "4c57825cff75", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "1b8117d660b1", + "children": [ + { + "text": "---", + "_key": "7820f1887f01", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "bb0f2c2df70b", + "children": [ + { + "_key": "4e52f131b5f3", + "_type": "span", + "text": "" + } + ] + }, + { + "style": "normal", + "_key": "b2fc9f4ff2e9", + "markDefs": [ + { + "_key": "8dea84be5dde", + "_type": "link", + "href": "https://www.genome.gov/about-genomics/fact-sheets/Sequencing-Human-Genome-cost" + }, + { + "_type": "link", + "href": "https://www.genome.gov/sites/default/files/inline-images/2021_Sequencing_cost_per_Human_Genome.jpg", + "_key": "630407000769" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "^1^ ", + "_key": "a15113d71a1e" + }, + { + "_key": "31dcc8e4bee2", + "_type": "span", + "marks": [ + "8dea84be5dde" + ], + "text": "https://www.genome.gov/about-genomics/fact-sheets/Sequencing-Human-Genome-cost" + }, + { + "marks": [], + "text": " ^2^ Coined by Gordon Moore of Intel in 1965, Moore’s Law predicted that transistor density, roughly equating to compute performance, would roughly double every two years. This was later revised in some estimates to 18 months. Over ten years, Moore’s law predicts roughly a 2^5 = 32X increase in performance – less than the ~50-fold decrease in sequencing costs. See ", + "_key": "56972bfb6bf2", + "_type": "span" + }, + { + "marks": [ + "630407000769" + ], + "text": "chart here", + "_key": "4d96bbddcbe8", + "_type": "span" + }, + { + "_key": "1a5ee979d780", + "_type": "span", + "marks": [], + "text": ". ^3^ This included features like separate queues, pre-emption policies, application profiles, and weighted fairshare algorithms." 
+ } + ], + "_type": "block" + } + ], + "tags": [ + { + "_type": "reference", + "_key": "293f8551dd71", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493" + } + ], + "_createdAt": "2024-09-25T14:17:41Z", + "meta": { + "slug": { + "current": "reflecting-on-ten-years-of-nextflow-awesomeness" + } + }, + "_rev": "87gw29IlgU4Z8o00zkoBBh", + "title": "Reflecting on ten years of Nextflow awesomeness", + "_updatedAt": "2024-09-26T09:04:11Z" + }, + { + "tags": [ + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "8dc4bd5f302a" + } + ], + "title": "Five more tips for Nextflow user on HPC", + "author": { + "_ref": "5bLgfCKN00diCN0ijmWND4", + "_type": "reference" + }, + "_rev": "hf9hwMPb7ybAE3bqEU5jXv", + "_type": "blogPost", + "publishedAt": "2021-06-15T06:00:00.000Z", + "meta": { + "description": "In May we blogged about Five Nextflow Tips for HPC Users and now we continue the series with five additional tips for deploying Nextflow with on HPC batch schedulers.", + "slug": { + "current": "5-more-tips-for-nextflow-user-on-hpc" + } + }, + "_id": "42b4b1b9eff0", + "_createdAt": "2024-09-25T14:15:52Z", + "_updatedAt": "2024-09-30T09:22:55Z", + "body": [ + { + "_key": "1e079bccd85f", + "markDefs": [ + { + "_key": "bfb7c4f40434", + "_type": "link", + "href": "/blog/2021/5_tips_for_hpc_users.html" + } + ], + "children": [ + { + "marks": [], + "text": "In May we blogged about ", + "_key": "c046a7075e94", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "bfb7c4f40434" + ], + "text": "Five Nextflow Tips for HPC Users", + "_key": "8dffc4f9faa5" + }, + { + "text": " and now we continue the series with five additional tips for deploying Nextflow with on HPC batch schedulers.", + "_key": "f2ddefd9788f", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "19821646d6ac", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "9ddf2ce8d302", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "text": "1. Use the scratch directive", + "_key": "01ef60790a82", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2", + "_key": "dc2bfb4d8f87" + }, + { + "children": [ + { + "_key": "a5e06a741038", + "_type": "span", + "marks": [], + "text": "To allow the pipeline tasks to share data with each other, Nextflow requires a shared file system path as a working directory. When using this model, a common recommendation is to use the node's local scratch storage as the job working directory to avoid unnecessary use of the network shared file system and achieve better performance." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "ff3d627e7f99", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_key": "b0358dd5e166", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "c2f1ac172f15" + }, + { + "style": "normal", + "_key": "766afde1bbbd", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Nextflow implements this best-practice which can be enabled by adding the following setting in your ", + "_key": "6b6ea5ab02b9", + "_type": "span" + }, + { + "text": "nextflow.config", + "_key": "45490e6d5d3b", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " file.", + "_key": "6735391e2415" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "5da7b5339ecc" + } + ], + "_type": "block", + "style": "normal", + "_key": "d7ace5b3ca18", + "markDefs": [] + }, + { + "_type": "code", + "_key": "02fe0e00b2a7", + "code": "process.scratch = true" + }, + { + "style": "normal", + "_key": "d2d7bb191194", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "b2b279450e48", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_key": "d80fb760f6fa", + "markDefs": [], + "children": [ + { + "_key": "0c47b9a53384", + "_type": "span", + "marks": [], + "text": "When using this option, Nextflow:" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "8244e6f55d52", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "ce810f8e9d10", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "294ccd8b20b9", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "bfb8d023bea00", + "_type": "span", + "marks": [], + "text": "Creates a unique directory in the computing node’s local " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "/tmp", + "_key": "bfb8d023bea01" + }, + { + "_type": "span", + "marks": [], + "text": " or the path assigned by your cluster via the ", + "_key": "bfb8d023bea02" + }, + { + "_key": "bfb8d023bea03", + "_type": "span", + "marks": [ + "code" + ], + "text": "TMPDIR" + }, + { + "text": " environment variable.", + "_key": "bfb8d023bea04", + "_type": "span", + "marks": [] + } + ], + "level": 1 + }, + { + "style": "normal", + "_key": "d3fda6538823", + "listItem": "bullet", + "markDefs": [ + { + "href": "https://en.wikipedia.org/wiki/Symbolic_link", + "_key": "f0b589d9bb4b", + "_type": "link" + } + ], + "children": [ + { + "text": "Creates a ", + "_key": "e4e7ec0cd1bc0", + "_type": "span", + "marks": [] + }, + { + "text": "symlink", + "_key": "e4e7ec0cd1bc1", + "_type": "span", + "marks": [ + "f0b589d9bb4b" + ] + }, + { + "_type": "span", + "marks": [], + "text": " for each input file required by the job execution.", + "_key": "e4e7ec0cd1bc2" + } + ], + "level": 1, + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "fae2c9db88b7", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Runs the job in the local scratch path. 
Copies the job output files into the job shared work directory assigned by Nextflow.", + "_key": "59c191f4a7660" + } + ], + "level": 1 + }, + { + "style": "normal", + "_key": "d4afe9f642bf", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "c69eb890a3eb" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "h2", + "_key": "b7d2badc9baf", + "markDefs": [], + "children": [ + { + "text": "2. Use -bg option to launch the execution in the background", + "_key": "d8e69ea668d2", + "_type": "span", + "marks": [] + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "In some circumstances, you may need to run your Nextflow pipeline in the background without losing the execution output. In this scenario use the ", + "_key": "c6f481f101e0", + "_type": "span", + "marks": [] + }, + { + "text": "-bg", + "_key": "ea12253295cd", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "text": " command line option as shown below.", + "_key": "1b372e263059", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "54073a68f72f" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "0cce4c63d040", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "b85a8de0d7b2" + }, + { + "_type": "code", + "_key": "8b5e2de93bed", + "code": "nextflow run -bg > my-file.log" + }, + { + "_key": "91e9d34ab66f", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "5264e2054015", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "e66b198a1e92", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "This can be very useful when launching the execution from an SSH connected terminal and ensures that any connection issues don't stop the pipeline. You can use ", + "_key": "1949ca52abcd", + "_type": "span" + }, + { + "marks": [ + "code" + ], + "text": "ps", + "_key": "7c866d93dd99", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "ecfcabadac2a" + }, + { + "text": "kill", + "_key": "929e119c2b64", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "text": " to find and stop the execution.", + "_key": "7b314c62d6f0", + "_type": "span", + "marks": [] + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_key": "487075f9f234", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "94878577c759" + }, + { + "_type": "block", + "style": "h2", + "_key": "70e9706e7e4c", + "markDefs": [], + "children": [ + { + "_key": "ddd311cc0801", + "_type": "span", + "marks": [], + "text": "3. Disable interactive logging" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow has rich terminal logging which uses ANSI escape codes to update the pipeline execution counters interactively. However, this is not very useful when submitting the pipeline execution as a cluster job or in the background. 
In this case, disable the rich ANSI logging using the command line option ", + "_key": "e873cb7f406f" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "-ansi-log false", + "_key": "1beece2a9231" + }, + { + "marks": [], + "text": " or the environment variable ", + "_key": "0cb747b1d521", + "_type": "span" + }, + { + "_key": "a739f0a18bb2", + "_type": "span", + "marks": [ + "code" + ], + "text": "NXF_ANSI_LOG=false" + }, + { + "_key": "f2cc5643031a", + "_type": "span", + "marks": [], + "text": "." + } + ], + "_type": "block", + "style": "normal", + "_key": "c1f6f23aff82" + }, + { + "children": [ + { + "text": "", + "_key": "533bb7cfb10b", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "9bae6268dad3", + "markDefs": [] + }, + { + "children": [ + { + "marks": [], + "text": "4. Cluster native options", + "_key": "32773edf1aba", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "af36a9faaa29", + "markDefs": [] + }, + { + "style": "normal", + "_key": "2bbc3d94ca92", + "markDefs": [ + { + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/process.html#cpus", + "_key": "45b91467b042" + }, + { + "href": "https://www.nextflow.io/docs/latest/process.html#memory", + "_key": "6f1095c8d436", + "_type": "link" + }, + { + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/process.html#disk", + "_key": "4e0d3d51b11b" + } + ], + "children": [ + { + "_key": "db0e7952fe3a", + "_type": "span", + "marks": [], + "text": "Nextlow has portable directives for common resource requests such as " + }, + { + "_type": "span", + "marks": [ + "45b91467b042" + ], + "text": "cpus", + "_key": "73895a9b020d" + }, + { + "text": ", ", + "_key": "8d7820f7f3bb", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "6f1095c8d436" + ], + "text": "memory", + "_key": "fbf16c8d59c1" + }, + { + "_key": "c94428074a20", + "_type": "span", + "marks": [], + "text": " and " + }, + { + "marks": [ + "4e0d3d51b11b" + ], + "text": "disk", + "_key": "34f31b12a268", + "_type": "span" + }, + { + "_key": "674c55b3ca7b", + "_type": "span", + "marks": [], + "text": " allocation." 
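For instance, a process might declare its requirements as follows (the process name and command are placeholders used purely for illustration):

```groovy
process align {
    cpus 4
    memory '8 GB'
    disk '20 GB'

    script:
    """
    your_alignment_command --threads ${task.cpus}
    """
}
```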
+ } + ], + "_type": "block" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "4700d4255ba0", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "b4e2367f0e17", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "516302240cf2", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "These directives allow you to specify the request for a certain number of computing resources e.g CPUs, memory, or disk and Nextflow converts these values to the native setting of the target execution platform specified in the pipeline configuration.", + "_key": "da2b346173c8" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "11a5b181732b", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "bf4e7b5073e8" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "However, there can be settings that are only available on some specific cluster technology or vendors.", + "_key": "38e4e68000f6", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "3e4b55d3d37c" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "2e53882af5d6", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "a6d4d4b1773a" + }, + { + "markDefs": [ + { + "href": "https://www.nextflow.io/docs/latest/process.html#clusterOptions", + "_key": "894e1a7d3a32", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The ", + "_key": "05b6aa762afd" + }, + { + "text": "clusterOptions", + "_key": "bcf1d9c3b577", + "_type": "span", + "marks": [ + "894e1a7d3a32" + ] + }, + { + "_type": "span", + "marks": [], + "text": " directive allows you to specify any option of your resource manager for which there isn't direct support in Nextflow.", + "_key": "65f30d18a9e7" + } + ], + "_type": "block", + "style": "normal", + "_key": "b115bff69e24" + }, + { + "_key": "aef3b822f954", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "61d8396a97ed" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "marks": [], + "text": "5. Retry failing jobs increasing resource allocation", + "_key": "188ff7cae9e9", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "d5097179e6f5", + "markDefs": [] + }, + { + "style": "normal", + "_key": "738a13554797", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "A common scenario is that instances of the same process may require different computing resources. For example, requesting an amount of memory that is too low for some processes will result in those tasks failing. You could specify a higher limit which would accommodate the task with the highest memory utilization, but you then run the risk of decreasing your job’s execution priority.", + "_key": "eaa1f97cc36b" + } + ], + "_type": "block" + }, + { + "_key": "f0ef6221b3af", + "markDefs": [], + "children": [ + { + "_key": "5f1bd5086ec8", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow provides a mechanism that allows you to modify the amount of computing resources requested in the case of a process failure and attempt to re-execute it using a higher limit. 
For example:", + "_key": "eec942bfa508" + } + ], + "_type": "block", + "style": "normal", + "_key": "1c44ba489d73" + }, + { + "style": "normal", + "_key": "0bb9719ac1e0", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "824cd1f57ed7", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_key": "19507665b71c", + "code": "process foo {\n\n memory { 2.GB * task.attempt }\n time { 1.hour * task.attempt }\n\n errorStrategy { task.exitStatus in 137..140 ? 'retry' : 'terminate' }\n maxRetries 3\n\n script:\n \"\"\"\n your_job_command --here\n \"\"\"\n}", + "_type": "code" + }, + { + "_type": "block", + "style": "normal", + "_key": "c0f503f0cf67", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "4e617dceddfe" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In the above example the memory and execution time limits are defined dynamically. The first time the process is executed the task.attempt is set to 1, thus it will request 2 GB of memory and one hour of maximum execution time.", + "_key": "d2b5dd2fdc3b" + } + ], + "_type": "block", + "style": "normal", + "_key": "c7e9fb85fe6d" + }, + { + "_type": "block", + "style": "normal", + "_key": "35e7bff90950", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "2a1bd20061e1", + "_type": "span" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "If the task execution fails, reporting an exit status in the range between 137 and 140, the task is re-submitted (otherwise it terminates immediately). This time the value of task.attempt is 2, thus increasing the amount of the memory to four GB and the time to 2 hours, and so on.", + "_key": "487fc35e66d5", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "1e12eb2be3be" + }, + { + "children": [ + { + "_key": "a0a7ffb7f6cf", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "ef37c1e2c593", + "markDefs": [] + }, + { + "_type": "block", + "style": "blockquote", + "_key": "03cc831da6b2", + "markDefs": [], + "children": [ + { + "_key": "8ea3741f842b", + "_type": "span", + "marks": [ + "strong" + ], + "text": "NOTE: " + }, + { + "text": "These exit statuses are not standard and can change depending on the resource manager you are using. Consult your cluster administrator or scheduler administration guide for details on the exit statuses used by your cluster in similar error conditions.", + "_key": "141c77d970f3", + "_type": "span", + "marks": [] + } + ] + }, + { + "style": "normal", + "_key": "54000e295e84", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "a54f4bd5ecd6", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "5d31a1142abd", + "_type": "span", + "marks": [], + "text": "Conclusion" + } + ], + "_type": "block", + "style": "h2", + "_key": "1af0fb865cb9", + "markDefs": [] + }, + { + "children": [ + { + "marks": [], + "text": "Nextflow aims to give you control over every aspect of your workflow. These Nextflow options allow you to shape how Nextflow submits your processes to your executor, that can make your workflow more robust by avoiding the overloading of the executor. Some systems have hard limits which if you do not take into account, no processes will be executed. 
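One way to stay inside such limits is to cap how many jobs Nextflow submits to the scheduler at any one time. This is only an illustrative sketch — the values are arbitrary and should be tuned to your cluster's policies:

```groovy
// nextflow.config — illustrative values only
executor {
    queueSize       = 50         // at most 50 tasks queued or running at once
    submitRateLimit = '50/2min'  // at most 50 job submissions every 2 minutes
}
```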
Being aware of these configuration values and how to use them is incredibly helpful when working with larger workflows. ", + "_key": "53bbce01c2c8", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "9952ef367fe5", + "markDefs": [] + } + ] + }, + { + "_type": "blogPost", + "_id": "43433fcdc74e", + "publishedAt": "2020-11-03T07:00:00.000Z", + "body": [ + { + "_key": "8e84437311cc", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The latest Nextflow version 2020.10.0 is the first stable release running on Groovy 3.", + "_key": "055f633cc22a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "52b71f60c1b3", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "e1c53f6038bf" + }, + { + "markDefs": [], + "children": [ + { + "_key": "a3ab3caceb3b", + "_type": "span", + "marks": [], + "text": "The first benefit of this change is that now Nextflow can be compiled and run on any modern Java virtual machine, from Java 8, all the way up to the latest Java 15!" + } + ], + "_type": "block", + "style": "normal", + "_key": "0b1fe2108465" + }, + { + "_type": "block", + "style": "normal", + "_key": "462d3ec40af8", + "children": [ + { + "_type": "span", + "text": "", + "_key": "f8dac23d051c" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "Along with this, the new Groovy runtime brings a whole lot of syntax enhancements that can be useful in the everyday life of pipeline developers. Let's see them more in detail.", + "_key": "7b104591b2ca", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "64f0b4c5553f", + "markDefs": [] + }, + { + "_key": "40803c5a6afb", + "children": [ + { + "_key": "4b553d7f729b", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "h3", + "_key": "84b7960932ed", + "children": [ + { + "_type": "span", + "text": "Improved not operator", + "_key": "b8b702b72ec8" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The ", + "_key": "4df2d2fda967", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "!", + "_key": "417db461faeb" + }, + { + "_type": "span", + "marks": [], + "text": " (not) operator can now prefix the ", + "_key": "5083f4ff3611" + }, + { + "marks": [ + "code" + ], + "text": "in", + "_key": "89c529fbd799", + "_type": "span" + }, + { + "marks": [], + "text": " and ", + "_key": "446aec19e957", + "_type": "span" + }, + { + "_key": "1d07f8fca40d", + "_type": "span", + "marks": [ + "code" + ], + "text": "instanceof" + }, + { + "text": " keywords. 
This makes for more concise writing of some conditional expression, for example, the following snippet:", + "_key": "a43db1a84e26", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "4b92921af711" + }, + { + "children": [ + { + "_key": "c888d720727c", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "d970610dbd14" + }, + { + "_key": "9e02d4ff08ae", + "code": "list = [10,20,30]\n\nif( !(x in list) ) {\n // ..\n}\nelse if( !(x instanceof String) ) {\n // ..\n}", + "_type": "code" + }, + { + "_type": "block", + "style": "normal", + "_key": "02bf430998c0", + "children": [ + { + "_type": "span", + "text": "", + "_key": "48d8a12b00d9" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "could be replaced by the following:", + "_key": "3e54379e66a5", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "d748372d34e4", + "markDefs": [] + }, + { + "_key": "f68367b5bcbd", + "children": [ + { + "text": "", + "_key": "f36f295a95ac", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "f302696825bf", + "code": "list = [10,20,30]\n\nif( x !in list ) {\n // ..\n}\nelse if( x !instanceof String ) {\n // ..\n}", + "_type": "code" + }, + { + "_key": "432e2cccb9ab", + "children": [ + { + "_type": "span", + "text": "", + "_key": "a87a9ef8d146" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "523df5b60ab6", + "markDefs": [], + "children": [ + { + "text": "Again, this is a small syntax change which makes the code a little more readable.", + "_key": "834f2957bfff", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "cf9983b7dd50", + "children": [ + { + "text": "", + "_key": "f06a9fdee1ad", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "Elvis assignment operator", + "_key": "e52c01d9293b", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "3f2ad55ecdda" + }, + { + "markDefs": [], + "children": [ + { + "text": "The elvis assignment operator ", + "_key": "8769ef91d9e0", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "code" + ], + "text": "?=", + "_key": "c78822ddd2e9", + "_type": "span" + }, + { + "_key": "f9fdfeb75297", + "_type": "span", + "marks": [], + "text": " allows the assignment of a value only if it was not previously assigned (or if it evaluates to " + }, + { + "_key": "82e519905626", + "_type": "span", + "marks": [ + "code" + ], + "text": "null" + }, + { + "_key": "4f2a8154227a", + "_type": "span", + "marks": [], + "text": "). 
Consider the following example:" + } + ], + "_type": "block", + "style": "normal", + "_key": "a640c54e51ec" + }, + { + "style": "normal", + "_key": "cb3b63d1cda6", + "children": [ + { + "_type": "span", + "text": "", + "_key": "4e2c1717ccaf" + } + ], + "_type": "block" + }, + { + "code": "def opts = [foo: 1]\n\nopts.foo ?= 10\nopts.bar ?= 20\n\nassert opts.foo == 1\nassert opts.bar == 20", + "_type": "code", + "_key": "89cd3dd8c2fa" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "ebde7c798788" + } + ], + "_type": "block", + "style": "normal", + "_key": "2b31f3d6c78c" + }, + { + "_type": "block", + "style": "normal", + "_key": "523ec376fd4e", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "In this snippet, the assignment ", + "_key": "4ff08e6a4fd6", + "_type": "span" + }, + { + "marks": [ + "code" + ], + "text": "opts.foo ?= 10", + "_key": "b66a8a6c4eb3", + "_type": "span" + }, + { + "marks": [], + "text": " would be ignored because the dictionary ", + "_key": "322101bd9f80", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "opts", + "_key": "dec81060c9d7" + }, + { + "text": " already contains a value for the ", + "_key": "bd89e67e140d", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "code" + ], + "text": "foo", + "_key": "3ac803e7cc68", + "_type": "span" + }, + { + "marks": [], + "text": " attribute, while it is now assigned as expected.", + "_key": "8865ce529f08", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "10ff086761f4", + "children": [ + { + "text": "", + "_key": "02f15a26c3c6", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "e5dd8ed20821", + "markDefs": [], + "children": [ + { + "_key": "2e33af70d071", + "_type": "span", + "marks": [], + "text": "In other words this is a shortcut for the following idiom:" + } + ] + }, + { + "style": "normal", + "_key": "96928cbeba7e", + "children": [ + { + "_type": "span", + "text": "", + "_key": "16b61792f9bf" + } + ], + "_type": "block" + }, + { + "code": "if( some_variable != null ) {\n some_variable = 'Hello'\n}", + "_type": "code", + "_key": "8696985000dc" + }, + { + "style": "normal", + "_key": "23ac0f77f362", + "children": [ + { + "_type": "span", + "text": "", + "_key": "52cf021ed75e" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "cc410ba5be4f", + "markDefs": [ + { + "href": "https://groovy-lang.org/operators.html#_elvis_operator", + "_key": "1ac5be05f7c1", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "If you are wondering why it's called ", + "_key": "ab55181bad9f" + }, + { + "_key": "73f555215b7c", + "_type": "span", + "marks": [ + "em" + ], + "text": "Elvis" + }, + { + "marks": [], + "text": " assignment, well it's simple, because there's also the ", + "_key": "4cdc71ad8ab1", + "_type": "span" + }, + { + "_key": "049d89484768", + "_type": "span", + "marks": [ + "1ac5be05f7c1" + ], + "text": "Elvis operator" + }, + { + "_type": "span", + "marks": [], + "text": " that you should know (and use!) already. 
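For readers who have not met it before, here is a quick, purely illustrative refresher of what the Elvis operator itself does (it is not part of the example above):

```groovy
def userInput = null
def label = userInput ?: 'default'   // shorthand for: userInput ? userInput : 'default'
assert label == 'default'
```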
😆", + "_key": "40b8254c84fc" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "8b152786fe15", + "children": [ + { + "text": "", + "_key": "d1ed04b63951", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "h3", + "_key": "6762625bf879", + "children": [ + { + "text": "Java style lambda expressions", + "_key": "3953b8a1cbc9", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "303906a68bd0", + "markDefs": [], + "children": [ + { + "_key": "5abdb42b36f1", + "_type": "span", + "marks": [], + "text": "Groovy 3 supports the syntax for Java lambda expression. If you don't know what a Java lambda expression is don't worry; it's a concept very similar to a Groovy closure, though with slight differences both in the syntax and the semantic. In a few words, a Groovy closure can modify a variable in the outside scope, while a Java lambda cannot." + } + ] + }, + { + "style": "normal", + "_key": "a9e5c5e34694", + "children": [ + { + "_type": "span", + "text": "", + "_key": "9233ce2182fc" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "4ae4d3ba7f3c", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "In terms of syntax, a Groovy closure is defined as:", + "_key": "4c14a4668b12", + "_type": "span" + } + ] + }, + { + "_key": "678ba169fa9b", + "children": [ + { + "_key": "c229a82510d0", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "code": "{ it -> SOME_EXPRESSION_HERE }", + "_type": "code", + "_key": "f80c69d4887e" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "003c98f9cb64" + } + ], + "_type": "block", + "style": "normal", + "_key": "54aa7aa7e10a" + }, + { + "children": [ + { + "_key": "ee8fbe1f2148", + "_type": "span", + "marks": [], + "text": "While Java lambda expression looks like:" + } + ], + "_type": "block", + "style": "normal", + "_key": "54efda0e9d3d", + "markDefs": [] + }, + { + "_key": "2f82278c5d88", + "children": [ + { + "text": "", + "_key": "3574c0d5f2c0", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "code": "it -> { SOME_EXPRESSION_HERE }", + "_type": "code", + "_key": "68b6088a8251" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "02a0803b5f3d" + } + ], + "_type": "block", + "style": "normal", + "_key": "93a50c69875e" + }, + { + "style": "normal", + "_key": "ef25fe8c225e", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "which can be simplified to the following form when the expression is a single statement:", + "_key": "8b7386ba16b2" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "8ae9732ea0e1", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "42d493b5e3b2" + }, + { + "_type": "code", + "_key": "4a84aa8b8afa", + "code": "it -> SOME_EXPRESSION_HERE" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "37c285c6b2ce" + } + ], + "_type": "block", + "style": "normal", + "_key": "e6aeb6428ede" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "The good news is that the two syntaxes are interoperable in many cases and we can use the ", + "_key": "466897ef6dd9" + }, + { + "_key": "4be630a4ebe0", + "_type": "span", + "marks": [ + "em" + ], + "text": "lambda" + }, + { + "_type": "span", + "marks": [], + "text": " syntax to get rid-off of the curly bracket parentheses used by the Groovy 
notation to make our Nextflow script more readable.", + "_key": "fd69374c1104" + } + ], + "_type": "block", + "style": "normal", + "_key": "b4ca4482048a", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "0bcd65c369cd", + "children": [ + { + "text": "", + "_key": "526216b2f654", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "3de7da0e907b", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "For example, the following Nextflow idiom:", + "_key": "6b5d8035c976", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "ae10c0f3432a" + } + ], + "_type": "block", + "style": "normal", + "_key": "a8dcfe8c29f6" + }, + { + "code": "Channel\n .of( 1,2,3 )\n .map { it * it +1 }\n .view { \"the value is $it\" }", + "_type": "code", + "_key": "2bc6a3cd6474" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "701dc97269d1" + } + ], + "_type": "block", + "style": "normal", + "_key": "dc589929cc74" + }, + { + "markDefs": [], + "children": [ + { + "text": "Can be rewritten using the lambda syntax as:", + "_key": "dd1bf0ef103a", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "e309736ab8db" + }, + { + "_key": "e81e4debd145", + "children": [ + { + "_type": "span", + "text": "", + "_key": "2a9c725297f2" + } + ], + "_type": "block", + "style": "normal" + }, + { + "code": "Channel\n .of( 1,2,3 )\n .map( it -> it * it +1 )\n .view( it -> \"the value is $it\" )", + "_type": "code", + "_key": "5a5917ed6178" + }, + { + "_type": "block", + "style": "normal", + "_key": "386d69c791f7", + "children": [ + { + "text": "", + "_key": "16e2986568ed", + "_type": "span" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "It is a bit more consistent. Note however that the ", + "_key": "d9eb1e619d01", + "_type": "span" + }, + { + "text": "it -&gt;", + "_key": "f68edfb38ea4", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_key": "e3ab6812a6e2", + "_type": "span", + "marks": [], + "text": " implicit argument is now mandatory (while when using the closure syntax it could be omitted). 
Also, when the operator argument is not " + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "single", + "_key": "b1fe91cda903" + }, + { + "_type": "span", + "marks": [], + "text": " value, the lambda requires the round parentheses to define the argument e.g.", + "_key": "bd979da2eb3b" + } + ], + "_type": "block", + "style": "normal", + "_key": "123835fb7007", + "markDefs": [] + }, + { + "_key": "24f6ae5107b5", + "children": [ + { + "_type": "span", + "text": "", + "_key": "ff13dfbaa03b" + } + ], + "_type": "block", + "style": "normal" + }, + { + "code": "Channel\n .of( 1,2,3 )\n .map( it -> tuple(it * it, it+1) )\n .view( (a,b) -> \"the values are $a and $b\" )", + "_type": "code", + "_key": "54a9dee1d889" + }, + { + "_type": "block", + "style": "normal", + "_key": "8515930b1101", + "children": [ + { + "_type": "span", + "text": "", + "_key": "38aea1ede6d1" + } + ] + }, + { + "children": [ + { + "text": "Full support for Java streams API", + "_key": "d309255cf63f", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "4d2df815696f" + }, + { + "_type": "block", + "style": "normal", + "_key": "606f75bd6b57", + "markDefs": [ + { + "_type": "link", + "href": "https://winterbe.com/posts/2014/07/31/java8-stream-tutorial-examples/", + "_key": "a4687f75dcf0" + } + ], + "children": [ + { + "text": "Since version 8, Java provides a ", + "_key": "389fa08af411", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "a4687f75dcf0" + ], + "text": "stream library", + "_key": "732ef9f713ec" + }, + { + "text": " that is very powerful and implements some concepts and operators similar to Nextflow channels.", + "_key": "4def398ef96d", + "_type": "span", + "marks": [] + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "ccf7e9542ab1" + } + ], + "_type": "block", + "style": "normal", + "_key": "4f57bec9f5ae" + }, + { + "_key": "2586e014a1d1", + "markDefs": [], + "children": [ + { + "text": "The main differences between the two are that Nextflow channels and the corresponding operators are ", + "_key": "4f1cde5376d9", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "non-blocking", + "_key": "a9dbd2d58386" + }, + { + "text": " i.e. 
their evaluation is performed asynchronously without blocking your program execution, while Java streams are executed in a synchronous manner (at least by default).", + "_key": "9766bb7a8758", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "6187b91a63b3", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "72068e43bdf7" + }, + { + "_key": "89f9d1f5653f", + "markDefs": [], + "children": [ + { + "text": "A Java stream looks like the following:", + "_key": "bb5ca01b3a0b", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "1c4a518117fe", + "children": [ + { + "_key": "c3508e4826b3", + "_type": "span", + "text": "" + } + ] + }, + { + "_type": "code", + "_key": "3ffa8889065e", + "code": "assert (1..10).stream()\n .filter(e -> e % 2 == 0)\n .map(e -> e * 2)\n .toList() == [4, 8, 12, 16, 20]\n" + }, + { + "style": "normal", + "_key": "3e7771ca25aa", + "children": [ + { + "_type": "span", + "text": "", + "_key": "6106315e224e" + } + ], + "_type": "block" + }, + { + "markDefs": [ + { + "href": "https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#filter-java.util.function.Predicate-", + "_key": "fd027d190177", + "_type": "link" + }, + { + "_type": "link", + "href": "https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#map-java.util.function.Function-", + "_key": "35f7ea986e64" + }, + { + "_key": "0f42b9cfffdd", + "_type": "link", + "href": "https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collectors.html#toList--" + }, + { + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/operator.html#filter", + "_key": "e7e1b60ec288" + }, + { + "href": "https://www.nextflow.io/docs/latest/operator.html#map", + "_key": "b83290f46da4", + "_type": "link" + }, + { + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/operator.html#tolist", + "_key": "fa9a008dd68d" + } + ], + "children": [ + { + "_key": "accd3f74a4fa", + "_type": "span", + "marks": [], + "text": "Note, in the above example " + }, + { + "text": "filter", + "_key": "90775c4e0d15", + "_type": "span", + "marks": [ + "fd027d190177" + ] + }, + { + "marks": [], + "text": ", ", + "_key": "995475676989", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "35f7ea986e64" + ], + "text": "map", + "_key": "e501d30501df" + }, + { + "text": " and ", + "_key": "d95418c509a3", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "0f42b9cfffdd" + ], + "text": "toList", + "_key": "c18abfeb86e1" + }, + { + "_type": "span", + "marks": [], + "text": " methods are Java stream operator not the ", + "_key": "b80e58067965" + }, + { + "text": "Nextflow", + "_key": "e6bea6688a83", + "_type": "span", + "marks": [ + "e7e1b60ec288" + ] + }, + { + "_key": "3b09f7998bae", + "_type": "span", + "marks": [], + "text": " " + }, + { + "text": "homonymous", + "_key": "f4d1ac8d89cf", + "_type": "span", + "marks": [ + "b83290f46da4" + ] + }, + { + "marks": [], + "text": " ", + "_key": "57d53bec5f91", + "_type": "span" + }, + { + "_key": "4ec3c54062d0", + "_type": "span", + "marks": [ + "fa9a008dd68d" + ], + "text": "ones" + }, + { + "text": ".", + "_key": "4b591c8af53c", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "570780753227" + }, + { + "children": [ + { + "text": "", + "_key": "8606c914090e", + "_type": "span" + } + ], + "_type": 
"block", + "style": "normal", + "_key": "6ec731a9fb11" + }, + { + "_type": "block", + "style": "h3", + "_key": "1bf0faa8e423", + "children": [ + { + "_key": "c2ef08e119cd", + "_type": "span", + "text": "Java style method reference" + } + ] + }, + { + "style": "normal", + "_key": "4d6a43991888", + "markDefs": [], + "children": [ + { + "_key": "cc1093c205b3", + "_type": "span", + "marks": [], + "text": "The new runtime also allows for the use of the " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "::", + "_key": "b8643dccc002" + }, + { + "_key": "e8c6a68fede1", + "_type": "span", + "marks": [], + "text": " operator to reference an object method. This can be useful to pass a method as an argument to a Nextflow operator in a similar manner to how it was already possible using a closure. For example:" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "22d9fcb7b710", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "1ea7b3779d87" + }, + { + "_key": "1f76264f689a", + "code": "Channel\n .of( 'a', 'b', 'c')\n .view( String::toUpperCase )", + "_type": "code" + }, + { + "_type": "block", + "style": "normal", + "_key": "95ec77374dfa", + "children": [ + { + "_type": "span", + "text": "", + "_key": "9958005250ad" + } + ] + }, + { + "_key": "d850ff805482", + "markDefs": [], + "children": [ + { + "_key": "b618d34b147d", + "_type": "span", + "marks": [], + "text": "The above prints:" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "8d511a47e3f4", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "ca361089268d" + }, + { + "code": " A\n B\n C", + "_type": "code", + "_key": "769265e33a6f" + }, + { + "_type": "block", + "style": "normal", + "_key": "7ada1ae474d0", + "children": [ + { + "text": "", + "_key": "cd257b688408", + "_type": "span" + } + ] + }, + { + "markDefs": [ + { + "_key": "d124bb79c12a", + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/operator.html#filter" + }, + { + "_key": "d06383a35f27", + "_type": "link", + "href": "https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#toUpperCase--" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Because to ", + "_key": "a6f9b693b29f" + }, + { + "_type": "span", + "marks": [ + "d124bb79c12a" + ], + "text": "view", + "_key": "96afa975a3e8" + }, + { + "text": " operator applied the method ", + "_key": "3bd3409505f5", + "_type": "span", + "marks": [] + }, + { + "text": "toUpperCase", + "_key": "039c3cfd38ea", + "_type": "span", + "marks": [ + "d06383a35f27" + ] + }, + { + "_key": "17e369dd5bb2", + "_type": "span", + "marks": [], + "text": " to each element emitted by the channel." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "04fd63e9e428" + }, + { + "children": [ + { + "_key": "3287e258e238", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "8f3977554866" + }, + { + "_type": "block", + "style": "h3", + "_key": "3f3c62936f90", + "children": [ + { + "_type": "span", + "text": "Conclusion", + "_key": "617c50c8dbad" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "The new Groovy runtime brings a lot of syntax sugar for Nextflow pipelines and allows the use of modern Java runtime which delivers better performance and resource usage.", + "_key": "40e4525db7b7" + } + ], + "_type": "block", + "style": "normal", + "_key": "8fa836b39a8d", + "markDefs": [] + }, + { + "_key": "7e06bac5e6ba", + "children": [ + { + "text": "", + "_key": "24aa783003d4", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "The ones listed above are only a small selection which may be useful to everyday Nextflow developers. If you are curious to learn more about all the changes in the new Groovy parser you can find more details in ", + "_key": "2feac875dab9" + }, + { + "_key": "1a3af472812e", + "_type": "span", + "marks": [ + "0b82aa48d586" + ], + "text": "this link" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "3f58ee0e1a20" + } + ], + "_type": "block", + "style": "normal", + "_key": "fdcd421e5938", + "markDefs": [ + { + "_type": "link", + "href": "https://groovy-lang.org/releasenotes/groovy-3.0.html", + "_key": "0b82aa48d586" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "6baeb1d858b8", + "children": [ + { + "_key": "760e03af03fb", + "_type": "span", + "text": "" + } + ] + }, + { + "style": "normal", + "_key": "267537a577e5", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Finally, a big thanks to the Groovy community for their significant efforts in developing and maintaining this great programming environment.", + "_key": "85c29ae625fa" + } + ], + "_type": "block" + } + ], + "tags": [ + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "f2d224b1f548" + } + ], + "_updatedAt": "2024-09-26T09:02:22Z", + "author": { + "_ref": "paolo-di-tommaso", + "_type": "reference" + }, + "meta": { + "slug": { + "current": "groovy3-syntax-sugar" + } + }, + "_rev": "rsIQ9Jd8Z4nKBVUruy4PKI", + "_createdAt": "2024-09-25T14:15:50Z", + "title": "More syntax sugar for Nextflow developers!" + }, + { + "body": [ + { + "style": "normal", + "_key": "732639609f9c", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The ability to resume an analysis (i.e. caching) is one of the core strengths of Nextflow. When developing pipelines, this allows us to avoid re-running unchanged processes by simply appending ", + "_key": "97b74589a3f5", + "_type": "span" + }, + { + "_key": "49a140b29340", + "_type": "span", + "marks": [ + "code" + ], + "text": "-resume" + }, + { + "_key": "890e4fbf4ac2", + "_type": "span", + "marks": [], + "text": " to the " + }, + { + "text": "nextflow run", + "_key": "7c511d616f5f", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "marks": [], + "text": " command. Sometimes, tasks may be repeated for reasons that are unclear. 
In these cases it can help to look into the caching mechanism, to understand why a specific process was re-run.", + "_key": "1eeb488f72d7", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "c6061a5c1563", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "f1b7cb440560", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "8d3f0ecc874a", + "markDefs": [ + { + "_type": "link", + "href": "https://www.nextflow.io/blog/2019/demystifying-nextflow-resume.html", + "_key": "d69163952239" + }, + { + "_type": "link", + "href": "https://www.nextflow.io/blog/2019/troubleshooting-nextflow-resume.html", + "_key": "6b2950938430" + } + ], + "children": [ + { + "marks": [], + "text": "We have previously written about Nextflow's ", + "_key": "c785c7825c57", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "d69163952239" + ], + "text": "resume functionality", + "_key": "a469ead1685a" + }, + { + "text": " as well as some ", + "_key": "195dcfda6745", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "6b2950938430" + ], + "text": "troubleshooting strategies", + "_key": "a3664d9281bb" + }, + { + "_type": "span", + "marks": [], + "text": " to gain more insights on the caching behavior.", + "_key": "1325a4dcb566" + } + ], + "_type": "block" + }, + { + "_key": "535d6923cce6", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "c30613d02626", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "44f66ec5ef29", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nextflow-io/rnaseq-nf", + "_key": "e4d5b3d5a97f" + } + ], + "children": [ + { + "marks": [], + "text": "In this post, we will take a more hands-on approach and highlight some strategies which we can use to understand what is causing a particular process (or processes) to re-run, instead of using the cache from previous runs of the pipeline. 
To demonstrate the process, we will introduce a minor change into one of the process definitions in the ",
+ "_key": "6531bdacd749",
+ "_type": "span"
+ },
+ {
+ "_type": "span",
+ "marks": [
+ "e4d5b3d5a97f"
+ ],
+ "text": "nextflow-io/rnaseq-nf",
+ "_key": "c4e41e3114a9"
+ },
+ {
+ "_type": "span",
+ "marks": [],
+ "text": " pipeline and investigate how it affects the overall caching behavior when compared to the initial execution of the pipeline.",
+ "_key": "8e2cd89e0e54"
+ }
+ ],
+ "_type": "block",
+ "style": "normal"
+ },
+ {
+ "markDefs": [],
+ "children": [
+ {
+ "_key": "51c0a1cbaa6f",
+ "_type": "span",
+ "marks": [],
+ "text": ""
+ }
+ ],
+ "_type": "block",
+ "style": "normal",
+ "_key": "157c14d02349"
+ },
+ {
+ "_key": "cf04aeeda787",
+ "markDefs": [],
+ "children": [
+ {
+ "marks": [],
+ "text": "Local setup for the test",
+ "_key": "084d30d608a6",
+ "_type": "span"
+ }
+ ],
+ "_type": "block",
+ "style": "h3"
+ },
+ {
+ "_type": "block",
+ "style": "normal",
+ "_key": "d067adcb4ecc",
+ "markDefs": [
+ {
+ "_type": "link",
+ "href": "https://github.com/nextflow-io/rnaseq-nf",
+ "_key": "55186110ab8d"
+ }
+ ],
+ "children": [
+ {
+ "_key": "d7c645f90ff3",
+ "_type": "span",
+ "marks": [],
+ "text": "First, we clone the "
+ },
+ {
+ "_type": "span",
+ "marks": [
+ "55186110ab8d"
+ ],
+ "text": "nextflow-io/rnaseq-nf",
+ "_key": "712d80280221"
+ },
+ {
+ "marks": [],
+ "text": " pipeline locally:",
+ "_key": "f7eefa643d3e",
+ "_type": "span"
+ }
+ ]
+ },
+ {
+ "markDefs": [],
+ "children": [
+ {
+ "_type": "span",
+ "marks": [],
+ "text": "",
+ "_key": "ab81a7544e6d"
+ }
+ ],
+ "_type": "block",
+ "style": "normal",
+ "_key": "891263db5db4"
+ },
+ {
+ "_key": "d00c3ed26798",
+ "code": "$ git clone https://github.com/nextflow-io/rnaseq-nf\n$ cd rnaseq-nf",
+ "_type": "code"
+ },
+ {
+ "children": [
+ {
+ "marks": [],
+ "text": "",
+ "_key": "4aa63bc48fec",
+ "_type": "span"
+ }
+ ],
+ "_type": "block",
+ "style": "normal",
+ "_key": "ad94ce3688ad",
+ "markDefs": []
+ },
+ {
+ "children": [
+ {
+ "_type": "span",
+ "marks": [],
+ "text": "In the examples below, we have used Nextflow ",
+ "_key": "92d74afdf7f9"
+ },
+ {
+ "marks": [
+ "code"
+ ],
+ "text": "v22.10.0",
+ "_key": "306a66e0323d",
+ "_type": "span"
+ },
+ {
+ "_type": "span",
+ "marks": [],
+ "text": ", Docker ",
+ "_key": "854bd93de532"
+ },
+ {
+ "_type": "span",
+ "marks": [
+ "code"
+ ],
+ "text": "v20.10.8",
+ "_key": "2bcc81b17709"
+ },
+ {
+ "_key": "9839887bddcb",
+ "_type": "span",
+ "marks": [],
+ "text": " and "
+ },
+ {
+ "marks": [
+ "code"
+ ],
+ "text": "Java v17 LTS",
+ "_key": "8b3a8c18db3b",
+ "_type": "span"
+ },
+ {
+ "text": " on macOS.",
+ "_key": "7d0974e67eb6",
+ "_type": "span",
+ "marks": []
+ }
+ ],
+ "_type": "block",
+ "style": "normal",
+ "_key": "b57147c229ba",
+ "markDefs": []
+ },
+ {
+ "_type": "block",
+ "style": "normal",
+ "_key": "d1d2cc9f1076",
+ "markDefs": [],
+ "children": [
+ {
+ "_key": "d71b2b994302",
+ "_type": "span",
+ "marks": [],
+ "text": ""
+ }
+ ]
+ },
+ {
+ "style": "h3",
+ "_key": "46437579cbe2",
+ "markDefs": [],
+ "children": [
+ {
+ "_key": "ff3c422a2d0a",
+ "_type": "span",
+ "marks": [],
+ "text": "Pipeline flowchart"
+ }
+ ],
+ "_type": "block"
+ },
+ {
+ "markDefs": [],
+ "children": [
+ {
+ "marks": [],
+ "text": "The flowchart below can help in understanding the design of the pipeline and the dependencies between the various tasks.",
+ "_key": "9a2bec7983d4",
+ "_type": "span"
+ }
+ ],
+ "_type": "block",
+ "style": "normal",
+ "_key": "d5c0a58d3437"
+ }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "53b58afd458f", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "c5cc6fa8e8b0", + "markDefs": [] + }, + { + "_key": "4092c8d3dfc4", + "asset": { + "_ref": "image-ededfe17a105d5ee8cca74f55576d2298cd702e1-732x560-png", + "_type": "reference" + }, + "_type": "image", + "alt": "rnaseq-nf" + }, + { + "_key": "1082b61f3da5", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "722542ebc94a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "Logs from initial (fresh) run", + "_key": "567ae2f4bf4e", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h3", + "_key": "a8ccb71b1a11", + "markDefs": [] + }, + { + "_key": "c717e7a68381", + "markDefs": [ + { + "href": "https://nextflow.io/blog/2019/troubleshooting-nextflow-resume.html", + "_key": "4136933a3161", + "_type": "link" + } + ], + "children": [ + { + "_key": "65307b0af3a3", + "_type": "span", + "marks": [], + "text": "As a reminder, Nextflow generates a unique task hash, e.g. 22/7548fa… for each task in a workflow. The hash takes into account the complete file path, the last modified timestamp, container ID, content of script directive among other factors. If any of these change, the task will be re-executed. Nextflow maintains a list of task hashes for caching and traceability purposes. You can learn more about task hashes in the article " + }, + { + "_type": "span", + "marks": [ + "4136933a3161" + ], + "text": "Troubleshooting Nextflow resume", + "_key": "2856c955181c" + }, + { + "text": ".", + "_key": "d8e22dfd4f6a", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "1f4d6a843acf", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "da3214fcf929" + } + ] + }, + { + "_key": "e8bd88c1ba77", + "markDefs": [], + "children": [ + { + "_key": "01f56f2f8ac7", + "_type": "span", + "marks": [], + "text": "To have something to compare to, we first need to generate the initial hashes for the unchanged processes in the pipeline. We save these in a file called " + }, + { + "marks": [ + "code" + ], + "text": "fresh_run.log", + "_key": "1c414c119239", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " and use them later on as "ground-truth" for the analysis. 
In order to save the process hashes we use the ", + "_key": "68b784c9dcce" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "-dump-hashes", + "_key": "1220e600acb8" + }, + { + "_type": "span", + "marks": [], + "text": " flag, which prints them to the log.", + "_key": "7ee9cfdbe528" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "5d10b3ae0575", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "dc4710a9f9dd" + }, + { + "style": "normal", + "_key": "abe1ff0cb35a", + "markDefs": [ + { + "href": "https://www.nextflow.io/docs/latest/cli.html#execution-logs", + "_key": "9521f1246ec3", + "_type": "link" + } + ], + "children": [ + { + "marks": [ + "strong" + ], + "text": "TIP:", + "_key": "32833f786885", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " We rely upon the ", + "_key": "ecb720b277e0" + }, + { + "_type": "span", + "marks": [ + "9521f1246ec3" + ], + "text": "`-log` option", + "_key": "b73e52b90fb3" + }, + { + "_key": "993f5b924dd3", + "_type": "span", + "marks": [], + "text": " in the " + }, + { + "_key": "b8e71f2b6802", + "_type": "span", + "marks": [ + "code" + ], + "text": "nextflow" + }, + { + "_type": "span", + "marks": [], + "text": " command line interface to be able to supply a custom log file name instead of the default ", + "_key": "a4ca16d2d4ee" + }, + { + "text": ".nextflow.log", + "_key": "e39a0163a989", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_key": "fa3825cfa6cc", + "_type": "span", + "marks": [], + "text": "." + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "4084d9440bf9" + } + ], + "_type": "block", + "style": "normal", + "_key": "23ef9415249c" + }, + { + "code": "$ nextflow -log fresh_run.log run ./main.nf -profile docker -dump-hashes\n\n[...truncated…]\nexecutor > local (4)\n[d5/57c2bb] process > RNASEQ:INDEX (ggal_1_48850000_49020000) [100%] 1 of 1 ✔\n[25/433b23] process > RNASEQ:FASTQC (FASTQC on ggal_gut) [100%] 1 of 1 ✔\n[03/23372f] process > RNASEQ:QUANT (ggal_gut) [100%] 1 of 1 ✔\n[38/712d21] process > MULTIQC [100%] 1 of 1 ✔\n[...truncated…]", + "_type": "code", + "_key": "ad81cb2dbb7b" + }, + { + "_type": "block", + "style": "normal", + "_key": "7c78f0d0a134", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "\n", + "_key": "d218e161be55" + } + ] + }, + { + "_key": "869451118ccc", + "markDefs": [], + "children": [ + { + "text": "Edit the `FastQC` process", + "_key": "9c4db898f26a", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h3" + }, + { + "markDefs": [ + { + "_key": "5d690c5adb72", + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/process.html#cpus" + } + ], + "children": [ + { + "marks": [], + "text": "After the initial run of the pipeline, we introduce a change in the ", + "_key": "f5d6bd8e240f", + "_type": "span" + }, + { + "text": "fastqc.nf", + "_key": "4415821817fc", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " module, hard coding the number of threads which should be used to run the ", + "_key": "bff539ed3cf2" + }, + { + "_key": "018ab416d98a", + "_type": "span", + "marks": [ + "code" + ], + "text": "FASTQC" + }, + { + "_type": "span", + "marks": [], + "text": " process via Nextflow's ", + "_key": "a72770c0cd9f" + }, + { + "marks": [ + "5d690c5adb72" + ], + "text": 
"`cpus` directive", + "_key": "794c638e7ff8", + "_type": "span" + }, + { + "_key": "47a4c6d2a001", + "_type": "span", + "marks": [], + "text": "." + } + ], + "_type": "block", + "style": "normal", + "_key": "ebe56f3c4f0d" + }, + { + "_type": "block", + "style": "normal", + "_key": "fa151421a1ee", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "ac0064faaf09", + "_type": "span", + "marks": [] + } + ] + }, + { + "style": "normal", + "_key": "77f80ea998ce", + "markDefs": [], + "children": [ + { + "text": "Here's the output of ", + "_key": "4d7d9ce13afe", + "_type": "span", + "marks": [] + }, + { + "_key": "1d3f59f95a3f", + "_type": "span", + "marks": [ + "code" + ], + "text": "git diff" + }, + { + "text": " on the contents of ", + "_key": "5c57909bc73a", + "_type": "span", + "marks": [] + }, + { + "text": "modules/fastqc/main.nf", + "_key": "3c4a242b8580", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " file:", + "_key": "518c00887f0a" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "e51edf190e03", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "759e7c11085e" + }, + { + "_type": "code", + "_key": "8271cf37eb5f", + "code": "--- a/modules/fastqc/main.nf\n+++ b/modules/fastqc/main.nf\n@@ -4,6 +4,7 @@ process FASTQC {\n tag \"FASTQC on $sample_id\"\n conda 'bioconda::fastqc=0.11.9'\n publishDir params.outdir, mode:'copy'\n+ cpus 2\n\n input:\n tuple val(sample_id), path(reads)\n@@ -13,6 +14,6 @@ process FASTQC {\n\n script:\n \"\"\"\n- fastqc.sh \"$sample_id\" \"$reads\"\n+ fastqc.sh \"$sample_id\" \"$reads\" -t ${task.cpus}\n \"\"\"\n }" + }, + { + "style": "normal", + "_key": "8bdc995861d1", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "\n", + "_key": "4f12aa52c341" + } + ], + "_type": "block" + }, + { + "style": "h3", + "_key": "220da63ef88b", + "markDefs": [], + "children": [ + { + "_key": "d5f28464074f", + "_type": "span", + "marks": [], + "text": "Logs from the follow up run" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "5e76228c5234", + "markDefs": [], + "children": [ + { + "text": "Next, we run the pipeline again with the ", + "_key": "1c8c8c09f191", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "-resume", + "_key": "82b9d7bfa3dc" + }, + { + "marks": [], + "text": " option, which instructs Nextflow to rely upon the cached results from the previous run and only run the parts of the pipeline which have changed. 
As before, we instruct Nextflow to dump the process hashes, this time in a file called ", + "_key": "350422ad948e", + "_type": "span" + }, + { + "text": "resumed_run.log", + "_key": "11298572f5a7", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "text": ".", + "_key": "37584ff05daa", + "_type": "span", + "marks": [] + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "4d6b4caeecc1", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "f3c3318912ef" + } + ] + }, + { + "_key": "c4abca2faba2", + "code": "$ nextflow -log resumed_run.log run ./main.nf -profile docker -dump-hashes -resume\n\n[...truncated…]\nexecutor > local\n[d5/57c2bb] process > RNASEQ:INDEX (ggal_1_48850000_49020000) [100%] 1 of 1, cached: 1 ✔\n[55/15b609] process > RNASEQ:FASTQC (FASTQC on ggal_gut) [100%] 1 of 1 ✔\n[03/23372f] process > RNASEQ:QUANT (ggal_gut) [100%] 1 of 1, cached: 1 ✔\n[f3/f1ccb4] process > MULTIQC [100%] 1 of 1 ✔\n[...truncated…]", + "_type": "code" + }, + { + "_key": "a5762de807ad", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "11452f6945a5", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "6f123ae82f8a", + "_type": "span", + "marks": [], + "text": "Analysis of cache hashes" + } + ], + "_type": "block", + "style": "h2", + "_key": "8b8863054dd6", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "From the summary of the command line output above, we can see that the ", + "_key": "fc789c77278a" + }, + { + "marks": [ + "code" + ], + "text": "RNASEQ:FASTQC (FASTQC on ggal_gut)", + "_key": "085496247c88", + "_type": "span" + }, + { + "_key": "57c6976fe5f8", + "_type": "span", + "marks": [], + "text": " and " + }, + { + "text": "MULTIQC", + "_key": "a2691e574c3b", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "marks": [], + "text": " processes were re-run while the others were cached. 
To understand why, we can examine the hashes generated by the processes from the logs of the ", + "_key": "6da2789ce429", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "fresh_run", + "_key": "aae77b8e6eff" + }, + { + "text": " and ", + "_key": "adb3bdb8c2a6", + "_type": "span", + "marks": [] + }, + { + "text": "resumed_run", + "_key": "1ba48e531d92", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "marks": [], + "text": ".", + "_key": "dfc833a7556e", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "c5a7902c1f83", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "71cca0a76695" + } + ], + "_type": "block", + "style": "normal", + "_key": "ecca541ff52c" + }, + { + "_key": "f70864556a59", + "markDefs": [], + "children": [ + { + "text": "For the analysis, we need to keep in mind that:", + "_key": "ce95e8c4179d", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "d771018d673f" + } + ], + "_type": "block", + "style": "normal", + "_key": "b7bf95090b16" + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "The time-stamps are expected to differ and can be safely ignored to narrow down the `grep` pattern to the Nextflow `TaskProcessor` class.The _order_ of the log entries isn't fixed, due to the nature of the underlying parallel computation dataflow model used by Nextflow. For example, in our example below, `FASTQC` ran first in `fresh_run.log` but wasn’t the first logged process in `resumed_run.log`.", + "_key": "6ab08f0711c0", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "1cc284086289" + }, + { + "style": "normal", + "_key": "ecc58bfac7a7", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "ae0c37bd8b18", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "h3", + "_key": "2215c9877647", + "markDefs": [], + "children": [ + { + "text": "\nFind the process level hashes", + "_key": "2ee4b36a0dc4", + "_type": "span", + "marks": [] + } + ] + }, + { + "_key": "67222908ee7e", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "We can use standard Unix tools like ", + "_key": "6586d8985b96", + "_type": "span" + }, + { + "marks": [ + "code" + ], + "text": "grep", + "_key": "fd6f7bf8324e", + "_type": "span" + }, + { + "text": ", ", + "_key": "f7046f038a32", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "code" + ], + "text": "cut", + "_key": "37210f788748", + "_type": "span" + }, + { + "text": " and ", + "_key": "1c7827db52fe", + "_type": "span", + "marks": [] + }, + { + "_key": "94325fcf0db5", + "_type": "span", + "marks": [ + "code" + ], + "text": "sort" + }, + { + "text": " to address these points and filter out the relevant information:", + "_key": "5366cf63f27b", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "797876812757" + } + ], + "_type": "block", + "style": "normal", + "_key": "f262e13aee1a" + }, + { + "_type": "block", + "style": "normal", + "_key": "8c56562e9848", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Use `grep` to isolate log entries with `cache hash` 
stringRemove the prefix time-stamps using `cut -d ‘-’ -f 3`Remove the caching mode related information using `cut -d ';' -f 1`Sort the lines based on process names using `sort` to have a standard order before comparisonUse `tee` to print the resultant strings to the terminal and simultaneously save to a file", + "_key": "58517bb96137" + } + ] + }, + { + "_key": "0b8bb472af8b", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "5f40609fab0a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "marks": [], + "text": "Now, let’s apply these transformations to the ", + "_key": "f69cc3652d1f", + "_type": "span" + }, + { + "_key": "58e51715c191", + "_type": "span", + "marks": [ + "code" + ], + "text": "fresh_run.log" + }, + { + "marks": [], + "text": " as well as ", + "_key": "d0cbac16c5b5", + "_type": "span" + }, + { + "_key": "c354bb5d9332", + "_type": "span", + "marks": [ + "code" + ], + "text": "resumed_run.log" + }, + { + "marks": [], + "text": " entries.", + "_key": "03f26c34617a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "d9ce843c2f3f", + "markDefs": [] + }, + { + "_key": "19cfae7cb4bf", + "markDefs": [], + "children": [ + { + "_key": "beb2d0c54850", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "`fresh_run.log`", + "_key": "aa7208b0766f" + } + ], + "_type": "block", + "style": "normal", + "_key": "c1caec0c9faa", + "listItem": "bullet" + }, + { + "style": "normal", + "_key": "694e5de3a124", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "af649ed23e7a", + "_type": "span" + } + ], + "_type": "block" + }, + { + "code": "$ cat ./fresh_run.log | grep 'INFO.*TaskProcessor.*cache hash' | cut -d '-' -f 3 | cut -d ';' -f 1 | sort | tee ./fresh_run.tasks.log\n\n [MULTIQC] cache hash: 167d7b39f7efdfc49b6ff773f081daef\n [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 47e8c58d92dbaafba3c2ccc4f89f53a4\n [RNASEQ:INDEX (ggal_1_48850000_49020000)] cache hash: ac8be293e1d57f3616cdd0adce34af6f\n [RNASEQ:QUANT (ggal_gut)] cache hash: d8b88e3979ff9fe4bf64b4e1bfaf4038", + "_type": "code", + "_key": "62f28de5f445" + }, + { + "markDefs": [], + "children": [ + { + "_key": "1bc2fd42ff15", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "3d36ce9fdf4c" + }, + { + "style": "normal", + "_key": "bb74a1f257a0", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "`resumed_run.log`", + "_key": "0475b4690945", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "4d348beb89a7", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "bd96d2fd6eb3", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_type": "code", + "_key": "1db225be98b5", + "code": "$ cat ./resumed_run.log | grep 'INFO.*TaskProcessor.*cache hash' | cut -d '-' -f 3 | cut -d ';' -f 1 | sort | tee ./resumed_run.tasks.log\n\n [MULTIQC] cache hash: d3f200c56cf00b223282f12f06ae8586\n [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 92478eeb3b0ff210ebe5a4f3d99aed2d\n [RNASEQ:INDEX (ggal_1_48850000_49020000)] cache hash: ac8be293e1d57f3616cdd0adce34af6f\n [RNASEQ:QUANT (ggal_gut)] cache hash: d8b88e3979ff9fe4bf64b4e1bfaf4038" + }, + { + "style": "normal", + "_key": "9212e03fe002", + "markDefs": [], + "children": [ + { + "_type": 
"span", + "marks": [], + "text": "", + "_key": "6c719477fedc" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "\nInference from process top-level hashes", + "_key": "968be1f51aed" + } + ], + "_type": "block", + "style": "h3", + "_key": "18d7101f4b0e" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://www.nextflow.io/blog/2019/demystifying-nextflow-resume.html", + "_key": "8134dfd14c04" + } + ], + "children": [ + { + "_key": "4a85567dbdc9", + "_type": "span", + "marks": [], + "text": "Computing a hash is a multi-step process and various factors contribute to it such as the inputs of the process, platform, time-stamps of the input files and more ( as explained in " + }, + { + "_type": "span", + "marks": [ + "8134dfd14c04" + ], + "text": "Demystifying Nextflow resume", + "_key": "377b1bbe58af" + }, + { + "_type": "span", + "marks": [], + "text": " blog post) . The change we made in the task level CPUs directive and script section of the ", + "_key": "808c8a6314b6" + }, + { + "text": "FASTQC", + "_key": "f43ceeb13383", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_key": "68061a3b18bc", + "_type": "span", + "marks": [], + "text": " process triggered a re-computation of hashes:" + } + ], + "_type": "block", + "style": "normal", + "_key": "30c26048ff53" + }, + { + "_key": "99c31fd8f86b", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "907388760050" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "code", + "_key": "eff5fa5afc7e", + "code": "--- ./fresh_run.tasks.log\n+++ ./resumed_run.tasks.log\n@@ -1,4 +1,4 @@\n- [MULTIQC] cache hash: dccabcd012ad86e1a2668e866c120534\n- [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 94be8c84f4bed57252985e6813bec401\n+ [MULTIQC] cache hash: c5a63560338596282682cc04ff97e436\n+ [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 54aa712db7c8248e7f31d5fb6535ff9d\n [RNASEQ:INDEX (ggal_1_48850000_49020000)] cache hash: 356aaa7524fb071f258480ba07c67b3c\n [RNASEQ:QUANT (ggal_gut)] cache hash: 169ced0fc4b047eaf91cd31620b22540\n" + }, + { + "_type": "block", + "style": "normal", + "_key": "4c909f9db6ce", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "6000246ae41a" + } + ] + }, + { + "style": "normal", + "_key": "62aed64f89ff", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Even though we only introduced changes in ", + "_key": "cadb7f0ce394", + "_type": "span" + }, + { + "marks": [ + "code" + ], + "text": "FASTQC", + "_key": "f07dc612f194", + "_type": "span" + }, + { + "text": ", the ", + "_key": "75cde0c19505", + "_type": "span", + "marks": [] + }, + { + "_key": "e060d8f62816", + "_type": "span", + "marks": [ + "code" + ], + "text": "MULTIQC" + }, + { + "_type": "span", + "marks": [], + "text": " process was re-run since it relies upon the output of the ", + "_key": "dc3ba08223c0" + }, + { + "text": "FASTQC", + "_key": "d3c543a71894", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "text": " process. 
Any task that has its cache hash invalidated triggers a rerun of all downstream steps:", + "_key": "0592006acb23", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "e1117e993450", + "markDefs": [], + "children": [ + { + "_key": "8960868b9f60", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "alt": "rnaseq-nf after modification", + "_key": "80b27b4ce745", + "asset": { + "_ref": "image-88ad0b925166304c5d29e44a7fdfbaa0994d6ff9-732x503-png", + "_type": "reference" + }, + "_type": "image" + }, + { + "style": "normal", + "_key": "12ce673df236", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "7b7a35207bf4" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "d6340d4dae8e", + "_type": "span", + "marks": [], + "text": "Understanding why `FASTQC` was re-run" + } + ], + "_type": "block", + "style": "h3", + "_key": "dfe46bbe6047" + }, + { + "style": "normal", + "_key": "60bf5abdd72f", + "markDefs": [], + "children": [ + { + "_key": "5e4c23c4219f", + "_type": "span", + "marks": [], + "text": "We can see the full list of " + }, + { + "marks": [ + "code" + ], + "text": "FASTQC", + "_key": "5e762db70882", + "_type": "span" + }, + { + "_key": "bfeb7d672cbb", + "_type": "span", + "marks": [], + "text": " process hashes within the " + }, + { + "marks": [ + "code" + ], + "text": "fresh_run.log", + "_key": "28e1f4a3c15c", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " file", + "_key": "0c93966f7bf2" + } + ], + "_type": "block" + }, + { + "_key": "4343feff844a", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "bf51333867a9" + } + ], + "_type": "block", + "style": "normal" + }, + { + "code": "\n[...truncated…]\nNov-03 20:19:13.827 [Actor Thread 6] INFO nextflow.processor.TaskProcessor - [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 54aa712db7c8248e7f31d5fb6535ff9d; mode: STANDARD; entries:\n 1a0e496fef579b22998f099981b494f9 [java.util.UUID] a11bf24f-638a-42d6-8b50-48d3be637d54\n 195c7faea83c75f2340eb710d8486d2a [java.lang.String] RNASEQ:FASTQC\n 2bea0eee5e384bd6082a173772e939eb [java.lang.String] \"\"\"\n fastqc.sh \"$sample_id\" \"$reads\" -t ${task.cpus}\n \"\"\"\n\n 8e58c0cec3bde124d5d932c7f1579395 [java.lang.String] quay.io/nextflow/rnaseq-nf:v1.1\n 7ec7cbd71ff757f5fcdbaa760c9ce6de [java.lang.String] sample_id\n 16b4905b1545252eb7cbfe7b2a20d03d [java.lang.String] ggal_gut\n 553096c532e666fb42214fdf0520fe4a [java.lang.String] reads\n 6a5d50e32fdb3261e3700a30ad257ff9 [nextflow.util.ArrayBag] [FileHolder(sourceObj:/home/abhinav/rnaseq-nf/data/ggal/ggal_gut_1.fq, storePath:/home/abhinav/rnaseq-nf/data/ggal/ggal_gut_1.fq, stageName:ggal_gut_1.fq), FileHolder(sourceObj:/home/abhinav/rnaseq-nf/data/ggal/ggal_gut_2.fq, storePath:/home/abhinav/rnaseq-nf/data/ggal/ggal_gut_2.fq, stageName:ggal_gut_2.fq)]\n 4f9d4b0d22865056c37fb6d9c2a04a67 [java.lang.String] $\n 16fe7483905cce7a85670e43e4678877 [java.lang.Boolean] true\n 80a8708c1f85f9e53796b84bd83471d3 [java.util.HashMap$EntrySet] [task.cpus=2]\n f46c56757169dad5c65708a8f892f414 [sun.nio.fs.UnixPath] /home/abhinav/rnaseq-nf/bin/fastqc.sh\n[...truncated…]\n", + "_type": "code", + "_key": "59b0a5296248" + }, + { + "children": [ + { + "_key": "7d25680a03a7", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "b60d849573cd", + "markDefs": [] + }, + { + "_type": "block", + 
"style": "normal", + "_key": "0a4c735fc332", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "When we isolate and compare the log entries for ", + "_key": "f7bd595d9c33" + }, + { + "marks": [ + "code" + ], + "text": "FASTQC", + "_key": "1dc02cb2a819", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " between ", + "_key": "f4e9bb1da943" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "fresh_run.log", + "_key": "9a094d9c7082" + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "441d8df01e76" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "resumed_run.log", + "_key": "29b8db7190c8" + }, + { + "_key": "0935945838f8", + "_type": "span", + "marks": [], + "text": ", we see the following diff:" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "72be07ca666f", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "20c61c59625b" + }, + { + "code": "--- ./fresh_run.fastqc.log\n+++ ./resumed_run.fastqc.log\n@@ -1,8 +1,8 @@\n-INFO nextflow.processor.TaskProcessor - [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 94be8c84f4bed57252985e6813bec401; mode: STANDARD; entries:\n+INFO nextflow.processor.TaskProcessor - [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 54aa712db7c8248e7f31d5fb6535ff9d; mode: STANDARD; entries:\n 1a0e496fef579b22998f099981b494f9 [java.util.UUID] a11bf24f-638a-42d6-8b50-48d3be637d54\n 195c7faea83c75f2340eb710d8486d2a [java.lang.String] RNASEQ:FASTQC\n- 43e5a23fc27129f92a6c010823d8909b [java.lang.String] \"\"\"\n- fastqc.sh \"$sample_id\" \"$reads\"\n+ 2bea0eee5e384bd6082a173772e939eb [java.lang.String] \"\"\"\n+ fastqc.sh \"$sample_id\" \"$reads\" -t ${task.cpus}\n", + "_type": "code", + "_key": "c22a71af5f4b" + }, + { + "_key": "dcf5557d908c", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "45e6d4fe9377" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "351a3a99ba4a", + "markDefs": [], + "children": [ + { + "text": "Observations from the diff:", + "_key": "1cde28d3c278", + "_type": "span", + "marks": [] + } + ] + }, + { + "_key": "aa22c07074fb", + "markDefs": [], + "children": [ + { + "_key": "3248c000c177", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "5cd1bda1a4ca", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "We can see that the content of the script has changed, highlighting the new `$task.cpus` part of the command.There is a new entry in the `resumed_run.log` showing that the content of the process level directive `cpus` has been added.", + "_key": "efec6b5e8c12", + "_type": "span", + "marks": [] + } + ] + }, + { + "_key": "3a8af4c40e17", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "9b6667d1dd51", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "18e1e0f2a49c", + "markDefs": [], + "children": [ + { + "_key": "5c4cfcb84d53", + "_type": "span", + "marks": [], + "text": "In other words, the diff from log files is confirming our edits." 
+ } + ] + }, + { + "style": "normal", + "_key": "47d414d946ac", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "f8f7b4eb1f36", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "46f1aec08d07", + "markDefs": [], + "children": [ + { + "_key": "aca9b35068f5", + "_type": "span", + "marks": [], + "text": "\nUnderstanding why `MULTIQC` was re-run" + } + ], + "_type": "block", + "style": "h3" + }, + { + "markDefs": [], + "children": [ + { + "_key": "503012eaa399", + "_type": "span", + "marks": [], + "text": "Now, we apply the same analysis technique for the " + }, + { + "text": "MULTIQC", + "_key": "1d49ab1d5509", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " process in both log files:", + "_key": "6f94f230701c" + } + ], + "_type": "block", + "style": "normal", + "_key": "da66f4fed079" + }, + { + "_key": "d9b21665048d", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "76d55f7f8db6", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "code": "--- ./fresh_run.multiqc.log\n+++ ./resumed_run.multiqc.log\n@@ -1,4 +1,4 @@\n-INFO nextflow.processor.TaskProcessor - [MULTIQC] cache hash: dccabcd012ad86e1a2668e866c120534; mode: STANDARD; entries:\n+INFO nextflow.processor.TaskProcessor - [MULTIQC] cache hash: c5a63560338596282682cc04ff97e436; mode: STANDARD; entries:\n 1a0e496fef579b22998f099981b494f9 [java.util.UUID] a11bf24f-638a-42d6-8b50-48d3be637d54\n cd584abbdbee0d2cfc4361ee2a3fd44b [java.lang.String] MULTIQC\n 56bfc44d4ed5c943f30ec98b22904eec [java.lang.String] \"\"\"\n@@ -9,8 +9,9 @@\n\n 8e58c0cec3bde124d5d932c7f1579395 [java.lang.String] quay.io/nextflow/rnaseq-nf:v1.1\n 14ca61f10a641915b8c71066de5892e1 [java.lang.String] *\n- cd0e6f1a382f11f25d5cef85bd87c3f4 [nextflow.util.ArrayBag] [FileHolder(sourceObj:/home/abhinav/rnaseq-nf/work/03/23372f156e80deb4d7183c5f509274/ggal_gut, storePath:/home/abhinav/rnaseq-nf/work/03/23372f156e80deb4d7183c5f509274/ggal_gut, stageName:ggal_gut), FileHolder(sourceObj:/home/abhinav/rnaseq-nf/work/25/433b23af9e98294becade95db6bd76/fastqc_ggal_gut_logs, storePath:/home/abhinav/rnaseq-nf/work/25/433b23af9e98294becade95db6bd76/fastqc_ggal_gut_logs, stageName:fastqc_ggal_gut_logs)]\n+ 18966b473f7bdb07f4f7f4c8445be1f5 [nextflow.util.ArrayBag] [FileHolder(sourceObj:/home/abhinav/rnaseq-nf/work/03/23372f156e80deb4d7183c5f509274/ggal_gut, storePath:/home/abhinav/rnaseq-nf/work/03/23372f156e80deb4d7183c5f509274/ggal_gut, stageName:ggal_gut), FileHolder(sourceObj:/home/abhinav/rnaseq-nf/work/55/15b60995682daf79ecb64bcbb8e44e/fastqc_ggal_gut_logs, storePath:/home/abhinav/rnaseq-nf/work/55/15b60995682daf79ecb64bcbb8e44e/fastqc_ggal_gut_logs, stageName:fastqc_ggal_gut_logs)]\n d271b8ef022bbb0126423bf5796c9440 [java.lang.String] config\n 5a07367a32cd1696f0f0054ee1f60e8b [nextflow.util.ArrayBag] [FileHolder(sourceObj:/home/abhinav/rnaseq-nf/multiqc, storePath:/home/abhinav/rnaseq-nf/multiqc, stageName:multiqc)]\n 4f9d4b0d22865056c37fb6d9c2a04a67 [java.lang.String] $\n 16fe7483905cce7a85670e43e4678877 [java.lang.Boolean] true", + "_type": "code", + "_key": "8a5be14e4e72" + }, + { + "style": "normal", + "_key": "c09212836c96", + "markDefs": [], + "children": [ + { + "_key": "8ee886a40043", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "09ba8f476164", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Here, the highlighted diffs 
show the directory of the input files, changing as a result of ", + "_key": "358b7f8c6954", + "_type": "span" + }, + { + "text": "FASTQC", + "_key": "cdc51f83a135", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " being re-run; as a result ", + "_key": "522f607948f8" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "MULTIQC", + "_key": "e2734525cee4" + }, + { + "marks": [], + "text": " has a new hash and has to be re-run as well.", + "_key": "ae996cbfa762", + "_type": "span" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "17edf9feb08e", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "f1a7396f61c1" + }, + { + "style": "h2", + "_key": "09686e2723a6", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Conclusion", + "_key": "3201b6569b94" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "b6f03be3ca6a", + "_type": "span", + "marks": [], + "text": "Debugging the caching behavior of a pipeline can be tricky, however a systematic analysis can help to uncover what is causing a particular process to be re-run." + } + ], + "_type": "block", + "style": "normal", + "_key": "3bb74e45edbd", + "markDefs": [] + }, + { + "_key": "76a74ca42b28", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "75cae1ccedc3", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "When analyzing large datasets, it may be worth using the ", + "_key": "a5362fa12b2e" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "-dump-hashes", + "_key": "336ee9e0c0e9" + }, + { + "_type": "span", + "marks": [], + "text": " option by default for all pipeline runs, avoiding needing to run the pipeline again to obtain the hashes in the log file in case of problems.", + "_key": "d0684f4eeda7" + } + ], + "_type": "block", + "style": "normal", + "_key": "4827c0c18b8c", + "markDefs": [] + }, + { + "children": [ + { + "_key": "4e0ae25fdcbc", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "e6f16df28be6", + "markDefs": [] + }, + { + "_key": "7d26c81f4111", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "While this process works, it is not trivial. We would love to see some community-driven tooling for a better cache-debugging experience for Nextflow, perhaps an ", + "_key": "5ba5dd581415" + }, + { + "_key": "f81d6f3cfcae", + "_type": "span", + "marks": [ + "code" + ], + "text": "nf-cache" + }, + { + "_key": "bc95329f11fd", + "_type": "span", + "marks": [], + "text": " plugin? Stay tuned for an upcoming blog post describing how to extend and add new functionality to Nextflow using plugins." + } + ], + "_type": "block", + "style": "normal" + } + ], + "_type": "blogPost", + "author": { + "_ref": "5bLgfCKN00diCN0ijmWNOF", + "_type": "reference" + }, + "title": "Analyzing caching behavior of pipelines", + "publishedAt": "2022-11-10T07:00:00.000Z", + "meta": { + "description": "The ability to resume an analysis (i.e. caching) is one of the core strengths of Nextflow. When developing pipelines, this allows us to avoid re-running unchanged processes by simply appending -resume to the nextflow run command. Sometimes, tasks may be repeated for reasons that are unclear. 
In these cases it can help to look into the caching mechanism, to understand why a specific process was re-run.", + "slug": { + "current": "caching-behavior-analysis" + } + }, + "_rev": "rsIQ9Jd8Z4nKBVUruy4Z0O", + "tags": [ + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "13481c55fbfe" + } + ], + "_createdAt": "2024-09-25T14:16:26Z", + "_updatedAt": "2024-09-30T08:54:10Z", + "_id": "43906c1c11d4" + }, + { + "author": { + "_ref": "mNsm4Vx1W1Wy6aYYkroetD", + "_type": "reference" + }, + "_type": "blogPost", + "body": [ + { + "markDefs": [ + { + "_type": "link", + "href": "https://www.youtube.com/@nf-core/playlists?view=50&sort=dd&shelf_id=2", + "_key": "51eb0944ee8d" + }, + { + "_key": "a1b64fa15713", + "_type": "link", + "href": "https://www.nextflow.io/" + }, + { + "_type": "link", + "href": "https://nf-co.re/", + "_key": "0d141144762b" + } + ], + "children": [ + { + "marks": [], + "text": "The Nextflow project originated from within an academic research group, so perhaps it’s no surprise that education is an essential part of the Nextflow and nf-core communities. Over the years, we have established several regular training resources: we have a weekly online seminar series called nf-core/bytesize and run hugely popular bi-annual ", + "_key": "a1e6a8ac31a4", + "_type": "span" + }, + { + "marks": [ + "51eb0944ee8d" + ], + "text": "Nextflow and nf-core community training online", + "_key": "a1c83572d41f", + "_type": "span" + }, + { + "_key": "a8a2f07be3ed", + "_type": "span", + "marks": [], + "text": ". In 2022, Seqera established a new community and growth team, funded in part by a grant from the Chan Zuckerberg Initiative “Essential Open Source Software for Science” grant. We are all former bioinformatics researchers from academia and part of our mission is to build resources and programs to support academic institutions. We want to help to provide leading edge, high-quality, " + }, + { + "text": "Nextflow", + "_key": "319609bf04e5", + "_type": "span", + "marks": [ + "a1b64fa15713" + ] + }, + { + "text": " and ", + "_key": "15668be18dee", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "0d141144762b" + ], + "text": "nf-core", + "_key": "2fd4b6857cf3", + "_type": "span" + }, + { + "text": " training for Masters and Ph.D. students in Bioinformatics and other related fields.", + "_key": "206bd0aeb42f", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "d51c89973ec4" + }, + { + "_key": "671606aec285", + "children": [ + { + "_key": "ed270fa7eeed", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "9d8f1842dc22", + "markDefs": [ + { + "_type": "link", + "href": "https://bioinfo.imd.ufrn.br/site/en-US", + "_key": "98fa1d653af9" + }, + { + "_type": "link", + "href": "https://www.ufrn.br/", + "_key": "95ae3e5d4d94" + } + ], + "children": [ + { + "marks": [], + "text": "We recently held one of our first such projects, a collaboration with the ", + "_key": "42b7195e5bc8", + "_type": "span" + }, + { + "_key": "46b09c776de5", + "_type": "span", + "marks": [ + "98fa1d653af9" + ], + "text": "Bioinformatics Multidisciplinary Environment, BioME" + }, + { + "_key": "ee3c9a7e3150", + "_type": "span", + "marks": [], + "text": " at the " + }, + { + "text": "Federal University of Rio Grande do Norte (UFRN)", + "_key": "492cc5dc78c8", + "_type": "span", + "marks": [ + "95ae3e5d4d94" + ] + }, + { + "marks": [], + "text": " in Brazil. 
The UFRN is one of the largest universities in Brazil with over 40,000 enrolled students, hosting one of the best-ranked bioinformatics programs in Brazil, attracting students from all over the country. The BioME department runs courses for Masters and Ph.D. students, including a flexible course dedicated to cutting-edge bioinformatics techniques. As part of this, we were invited to run an 8-day Nextflow and nf-core graduate course. Participants attended 5 days of training seminars and presented a Nextflow project at the end of the course. Upon successful completion of the course, participants received graduate program course credits as well as a Seqera Labs certified certificate recognizing their knowledge and hands-on experience 😎.", + "_key": "65a381137efb", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "ba9c8f417396", + "children": [ + { + "_type": "span", + "text": "", + "_key": "e4ae91ce0df2" + } + ], + "_type": "block" + }, + { + "_type": "block", + "_key": "8b987db41ddb" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "The course participants included one undergraduate student, Master's students, Ph.D. students, and postdocs with very diverse backgrounds. While some had prior Nextflow and nf-core experience and had already attended Nextflow training, others had never used it. Unsurprisingly, they all chose very different project topics to work on and present to the rest of the group. At the end of the course, eleven students chose to undergo the final project evaluation for the Seqera certification. They all passed with flying colors!", + "_key": "9e9233ba5efe" + } + ], + "_type": "block", + "style": "normal", + "_key": "95ae409ffc6d", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "3cb6f3fdc8e9" + } + ], + "_type": "block", + "style": "normal", + "_key": "36ce52267c99" + }, + { + "_type": "block", + "style": "normal", + "_key": "a39e3858a715", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "0e2e92569fc1", + "_type": "span" + }, + { + "marks": [], + "text": " Picture with some of the students that attended the course", + "_key": "cd0a0a276f5d", + "_type": "span" + } + ] + }, + { + "_key": "92f1a36cb18e", + "children": [ + { + "_type": "span", + "text": "", + "_key": "cd651eaa8a3f" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "h2", + "_key": "fb83e1607c77", + "children": [ + { + "_type": "span", + "text": "Final projects", + "_key": "4f6206eb7dfa" + } + ] + }, + { + "_key": "4dc518ee10e1", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Final hands-on projects are very useful not only to practice new skills but also to have a tangible deliverable at the end of the course. It could be the first step of a long journey with Nextflow, especially if you work on a project that lives on after the course concludes. Participants were given complete freedom to design a project that was relevant to them and their interests. 
Many students were very satisfied with their projects and intend to continue working on them after the course conclusion.", + "_key": "598db10b2880", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "d27c3dc12d23", + "children": [ + { + "_type": "span", + "text": "", + "_key": "41be59fe5f7b" + } + ] + }, + { + "children": [ + { + "text": "Euryale 🐍", + "_key": "8c15efc545af", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "a723dc0c7a21" + }, + { + "_key": "2aa8a02975b8", + "markDefs": [ + { + "href": "https://www.linkedin.com/in/joao-vitor-cavalcante", + "_key": "b63fe28f6eb5", + "_type": "link" + }, + { + "_key": "701842123b44", + "_type": "link", + "href": "https://www.frontiersin.org/articles/10.3389/fgene.2022.814437/full" + }, + { + "href": "https://github.com/dalmolingroup/euryale/", + "_key": "1ce0127637e9", + "_type": "link" + } + ], + "children": [ + { + "text": "João Vitor Cavalcante", + "_key": "6b187dc441b7", + "_type": "span", + "marks": [ + "b63fe28f6eb5" + ] + }, + { + "_key": "485a180a8a2a", + "_type": "span", + "marks": [], + "text": ", along with collaborators, had developed and " + }, + { + "text": "published", + "_key": "aa9ddba2c84d", + "_type": "span", + "marks": [ + "701842123b44" + ] + }, + { + "_key": "672936f430f4", + "_type": "span", + "marks": [], + "text": " a Snakemake pipeline for Sensitive Taxonomic Classification and Flexible Functional Annotation of Metagenomic Shotgun Sequences called MEDUSA. During the course, after seeing the huge potential of Nextflow, he decided to fully translate this pipeline to Nextflow, but with a new name: Euryale. You can check the result " + }, + { + "_type": "span", + "marks": [ + "1ce0127637e9" + ], + "text": "here", + "_key": "b15e9d5bb14b" + }, + { + "marks": [], + "text": " 😍 Why Euryale? In Greek mythology, Euryale was one of the three gorgons, a sister to Medusa 🤓", + "_key": "a4b5dca97d87", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "50fbd5600038", + "children": [ + { + "_type": "span", + "text": "", + "_key": "58a9436ce884" + } + ] + }, + { + "_key": "ed16e8920c58", + "children": [ + { + "text": "Bringing Nanopore to Google Batch ☁️", + "_key": "4ea5b9601fee", + "_type": "span" + } + ], + "_type": "block", + "style": "h3" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/epi2me-labs/wf-alignment", + "_key": "30afb2396762" + }, + { + "_type": "link", + "href": "https://www.linkedin.com/in/daniloimparato", + "_key": "0729f1dd0df5" + }, + { + "href": "https://github.com/daniloimparato/wf-alignment", + "_key": "211f0107b536", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The Customer Workflows Group at Oxford Nanopore Technologies (ONT) has adopted Nextflow to develop and distribute general-purpose pipelines for its customers. One of these pipelines, ", + "_key": "2a0259338942" + }, + { + "_type": "span", + "marks": [ + "30afb2396762" + ], + "text": "wf-alignment", + "_key": "546c43557f8b" + }, + { + "text": ", takes a FASTQ directory and a reference directory and outputs a minimap2 alignment, along with samtools stats and an HTML report. Both samtools stats and the HTML report generated by this pipeline are well suited for Nextflow Tower’s Reports feature. 
However, ", + "_key": "cd2bf181005d", + "_type": "span", + "marks": [] + }, + { + "text": "Danilo Imparato", + "_key": "8dd718b54721", + "_type": "span", + "marks": [ + "0729f1dd0df5" + ] + }, + { + "text": " noticed that the pipeline lacked support for using Google Cloud as compute environment and decided to work on this limitation on his ", + "_key": "f2f8992b2a4e", + "_type": "span", + "marks": [] + }, + { + "text": "final project", + "_key": "1a3645726330", + "_type": "span", + "marks": [ + "211f0107b536" + ] + }, + { + "_type": "span", + "marks": [], + "text": ", which included fixing a few bugs specific to running it on Google Cloud and making the reports available on Nextflow Tower 🤯", + "_key": "aa9a085e3334" + } + ], + "_type": "block", + "style": "normal", + "_key": "de08655bb970" + }, + { + "children": [ + { + "text": "", + "_key": "04235bc45cf1", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "463e99401e21" + }, + { + "style": "h3", + "_key": "7260eeddc37b", + "children": [ + { + "text": "Nextflow applied to Economics! 🤩", + "_key": "947f0bd8688a", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://www.linkedin.com/in/galileu-nobre-901551187/", + "_key": "4b8a87de6eee" + }, + { + "_type": "link", + "href": "https://github.com/galileunobre/nextflow_projeto_1", + "_key": "10465c90626d" + } + ], + "children": [ + { + "_key": "703bc67f2b54", + "_type": "span", + "marks": [ + "4b8a87de6eee" + ], + "text": "Galileu Nobre" + }, + { + "marks": [], + "text": " is studying Economical Sciences and decided to convert his scripts into a Nextflow pipeline for his ", + "_key": "46c5ecb28dd1", + "_type": "span" + }, + { + "_key": "64310262ad38", + "_type": "span", + "marks": [ + "10465c90626d" + ], + "text": "final project" + }, + { + "marks": [], + "text": ". The goal of the pipeline is to estimate the demand for health services in Brazil based on data from the 2019 PNS (National Health Survey), (a) treating this database to contain only the variables we will work with, (b) running a descriptive analysis to determine the data distribution in order to investigate which models would be best applicable. In the end, two regression models, Poisson, and the Negative Binomial, are used to estimate the demand. 
His work is an excellent example of applying Nextflow to fields outside of traditional bioinformatics 😉.", + "_key": "1d76f5973b62", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "eed89a5c82e3" + }, + { + "_type": "block", + "style": "normal", + "_key": "afde11d90f50", + "children": [ + { + "text": "", + "_key": "4a6105363d71", + "_type": "span" + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "Whole Exome Sequencing 🧬", + "_key": "49d330fc42fd" + } + ], + "_type": "block", + "style": "h3", + "_key": "0ab6c2cc8f1c" + }, + { + "style": "normal", + "_key": "24f4a098cfb8", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/RafaellaFerraz/exome", + "_key": "b42bc6498821" + }, + { + "href": "https://www.linkedin.com/in/rafaella-sousa-ferraz", + "_key": "cef34af031d8", + "_type": "link" + } + ], + "children": [ + { + "_key": "ba0b55ed6e6a", + "_type": "span", + "marks": [], + "text": "For her " + }, + { + "_type": "span", + "marks": [ + "b42bc6498821" + ], + "text": "final project", + "_key": "acdf2ab2a152" + }, + { + "_key": "a3bdce18001c", + "_type": "span", + "marks": [], + "text": ", " + }, + { + "_type": "span", + "marks": [ + "cef34af031d8" + ], + "text": "Rafaella Ferraz", + "_key": "40e6caafdd31" + }, + { + "marks": [], + "text": " used nf-core/tools to write a whole-exome sequencing analysis pipeline from scratch. She applied her new skills using nf-core modules and sub-workflows to achieve this and was able to launch and monitor her pipeline using Nextflow Tower. Kudos to Rafaella! 👏🏻", + "_key": "0b02b4bc1c94", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "8357f7b6dc70" + } + ], + "_type": "block", + "style": "normal", + "_key": "c1f089d93833" + }, + { + "children": [ + { + "_type": "span", + "text": "RNASeq with contamination 🧫", + "_key": "6df8e7a6d76f" + } + ], + "_type": "block", + "style": "h3", + "_key": "bef5f952a22d" + }, + { + "style": "normal", + "_key": "4564b1ec2cd1", + "markDefs": [ + { + "_key": "2ce26d041b3c", + "_type": "link", + "href": "https://github.com/iaradsouza1/tab-projeto-final" + }, + { + "_type": "link", + "href": "https://www.linkedin.com/in/iaradsouza", + "_key": "cfd0cab87952" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In her ", + "_key": "0eb4bc972908" + }, + { + "text": "final project", + "_key": "aaf4218b9a96", + "_type": "span", + "marks": [ + "2ce26d041b3c" + ] + }, + { + "text": ", ", + "_key": "a24d31f9fd3a", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "cfd0cab87952" + ], + "text": "Iara Souza", + "_key": "289a12ef43de" + }, + { + "_key": "c934b4b9bb09", + "_type": "span", + "marks": [], + "text": " developed a bioinformatics pipeline that analyzed RNA-Seq data when it's required to have an extra pre-filtering step. She needed this for analyzing data from RNA-Seq experiments performed in cell culture, where there is a high probability of contamination of the target transcriptome with the host transcriptome. 
Iara was able to learn how to use nf-core/tools and benefit from all the "batteries included" that come with it 🔋😬" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "", + "_key": "8fdc9c47a48d", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "d17e0050429d" + }, + { + "style": "h3", + "_key": "9af825ed2dd5", + "children": [ + { + "_key": "5e614271b6f9", + "_type": "span", + "text": "SARS-CoV-2 Genome assembly and lineage classification 🦠" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "209d5b4a82af", + "markDefs": [ + { + "_type": "link", + "href": "https://www.linkedin.com/in/diego-go-tex", + "_key": "115f0423c569" + }, + { + "_type": "link", + "href": "https://github.com/diegogotex/sarscov2_irma_nf", + "_key": "bb7f4cab9138" + } + ], + "children": [ + { + "marks": [ + "115f0423c569" + ], + "text": "Diego Teixeira", + "_key": "632422e1b571", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " has been working with SARS-CoV-2 genome assembly and lineage classification. As his final project, he wrote a ", + "_key": "9f0381c1c9ec" + }, + { + "_type": "span", + "marks": [ + "bb7f4cab9138" + ], + "text": "Nextflow pipeline", + "_key": "26663d54bd35" + }, + { + "text": " aggregating all tools and analyses he's been doing, allowing him to be much more efficient in his work and have a reproducible pipeline that can easily be shared with collaborators.", + "_key": "882b9f30cfa9", + "_type": "span", + "marks": [] + } + ] + }, + { + "_key": "eb2e9d6a5605", + "children": [ + { + "text": "", + "_key": "21be266b3a74", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "f7353f089f7d", + "markDefs": [ + { + "_type": "link", + "href": "https://nf-co.re/modules", + "_key": "77bf0a155166" + }, + { + "href": "https://nf-co.re/pipelines", + "_key": "70f0ef63311f", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In the nf-core project, there are almost a ", + "_key": "0bfb01e85e81" + }, + { + "_key": "5f4d6cc147af", + "_type": "span", + "marks": [ + "77bf0a155166" + ], + "text": "thousand modules" + }, + { + "_type": "span", + "marks": [], + "text": " ready to plug in your pipeline, together with ", + "_key": "27d12fb3edc7" + }, + { + "_key": "61260fa7e7e9", + "_type": "span", + "marks": [ + "70f0ef63311f" + ], + "text": "dozens of full-featured pipelines" + }, + { + "marks": [], + "text": ". However, in many situations, you'll need a custom pipeline. With that in mind, it's very useful to master the skills of Nextflow scripting so that you can take advantage of everything that is available, both building new pipelines and modifying public ones.", + "_key": "17fda30e3afd", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "2830d86ace99", + "children": [ + { + "_key": "ae5ca96f544b", + "_type": "span", + "text": "" + } + ] + }, + { + "style": "h2", + "_key": "2f1b0f0ec8f0", + "children": [ + { + "_key": "185df3b0ca51", + "_type": "span", + "text": "Exciting experience!" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "It was an amazing experience to see what each participant had worked on for their final projects! 🤯 They were all able to master the skills required to write Nextflow pipelines in real-life scenarios, which can continue to be used well after the end of the course. 
For people just starting their adventure with Nextflow, it can feel overwhelming to use nf-core tools with all the associated best practices, but students surprised me by using nf-core tools from the very beginning and having their project almost perfectly fitting the best practices 🤩", + "_key": "5aeb873e4d94" + } + ], + "_type": "block", + "style": "normal", + "_key": "ada74de71e55" + }, + { + "style": "normal", + "_key": "79b864798589", + "children": [ + { + "_type": "span", + "text": "", + "_key": "f21cfe6563d3" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "0704dcad0c59", + "markDefs": [ + { + "href": "mailto:community@seqera.io", + "_key": "51fd15d277e6", + "_type": "link" + } + ], + "children": [ + { + "_key": "a4e4fc9f1900", + "_type": "span", + "marks": [], + "text": "We’d love to help out with more university bioinformatics courses like this. If you think your institution could benefit from such an experience, please don't hesitate to reach out to us at " + }, + { + "marks": [ + "51fd15d277e6" + ], + "text": "community@seqera.io", + "_key": "e407f6a7ccc9", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ". We would love to hear from you!", + "_key": "282f228555fb" + } + ] + } + ], + "_createdAt": "2024-09-25T14:17:28Z", + "meta": { + "slug": { + "current": "nextflow-goes-to-university" + } + }, + "_id": "441553c7a45b", + "_updatedAt": "2024-09-25T14:17:28Z", + "_rev": "iDu5BZYWt2aPtfbIxmiuDJ", + "title": "Nextflow goes to university!", + "publishedAt": "2023-07-24T06:00:00.000Z" + }, + { + "_type": "blogPost", + "author": { + "_ref": "5bLgfCKN00diCN0ijmWNOF", + "_type": "reference" + }, + "_rev": "rsIQ9Jd8Z4nKBVUruy4Ydk", + "_updatedAt": "2024-09-27T08:50:18Z", + "meta": { + "description": "Git has become the de-facto standard for source-code version control system and has seen increasing adoption across the spectrum of software development.", + "slug": { + "current": "configure-git-repositories-with-nextflow" + } + }, + "publishedAt": "2021-10-21T06:00:00.000Z", + "_id": "4625921766fb", + "body": [ + { + "_key": "db83426eb157", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Git has become the de-facto standard for source-code version control system and has seen increasing adoption across the spectrum of software development.", + "_key": "25e9e166af3e" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "4a6de1edd4c3", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "29f610d8a848" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow provides builtin support for Git and most popular Git hosting platforms such as GitHub, GitLab and Bitbucket between the others, which streamline managing versions and track changes in your pipeline projects and facilitate the collaboration across different users.", + "_key": "c5ff8338cd22" + } + ], + "_type": "block", + "style": "normal", + "_key": "29f19ed5e94a" + }, + { + "_key": "57a0732420d3", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "64aebe8532db" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "In order to access public repositories Nextflow does not require any special configuration, just use the ", + "_key": "27f00eedb788" + }, + { + "_type": "span", + 
"marks": [ + "em" + ], + "text": "http", + "_key": "cc37780843f8" + }, + { + "text": " URL of the pipeline project you want to run in the run command, for example:", + "_key": "01543c3deb83", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "84a961710db0", + "markDefs": [] + }, + { + "_key": "701aeb101c59", + "code": "nextflow run https://github.com/nextflow-io/hello", + "_type": "code", + "language": "text" + }, + { + "_type": "block", + "style": "normal", + "_key": "b65a378a75db", + "markDefs": [], + "children": [ + { + "_key": "eb553f73fabb", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "_key": "dc5d295ad894", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "However to allow Nextflow to access private repositories you will need to specify the repository credentials, and the server hostname in the case of self-managed Git server installations.", + "_key": "e4cbd05c0ff5", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "add6deba8dc8", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "d54c60ae3560" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Configure access to private repositories", + "_key": "c0664fa07f41" + } + ], + "_type": "block", + "style": "h2", + "_key": "81e3472dff70", + "markDefs": [] + }, + { + "children": [ + { + "text": "This is done through a file name ", + "_key": "1dd67091500d", + "_type": "span", + "marks": [] + }, + { + "_key": "14476fc10cc8", + "_type": "span", + "marks": [ + "code" + ], + "text": "scm" + }, + { + "marks": [], + "text": " placed in the ", + "_key": "e62a48c00ab4", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "$HOME/.nextflow/", + "_key": "5c31a972ccf6" + }, + { + "marks": [], + "text": " directory, containing the credentials and other details for accessing a particular Git hosting solution. 
You can refer to the Nextflow documentation for all the ", + "_key": "b1923adff285", + "_type": "span" + }, + { + "_key": "e7b03d8523a4", + "_type": "span", + "marks": [ + "539fde056be1" + ], + "text": "SCM configuration file" + }, + { + "_type": "span", + "marks": [], + "text": " options.", + "_key": "d0173bb389a5" + } + ], + "_type": "block", + "style": "normal", + "_key": "ce8af238c95f", + "markDefs": [ + { + "href": "https://www.nextflow.io/docs/edge/sharing.html", + "_key": "539fde056be1", + "_type": "link" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "f931db99a858" + } + ], + "_type": "block", + "style": "normal", + "_key": "6b80e41f9eaa" + }, + { + "markDefs": [], + "children": [ + { + "text": "All of these platforms have their own authentication mechanisms for Git operations which are captured in the ", + "_key": "8cda311afc36", + "_type": "span", + "marks": [] + }, + { + "_key": "07f9a474d9b9", + "_type": "span", + "marks": [ + "code" + ], + "text": "$HOME/.nextflow/scm" + }, + { + "marks": [], + "text": " file with the following syntax:", + "_key": "160de1b5bebc", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "87e9b1d949eb" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "165d3fc21c0a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "e7c4e7658470" + }, + { + "_key": "b0cba06a0f77", + "code": "providers {\n\n '' {\n user = value\n password = value\n ...\n }\n\n '' {\n user = value\n password = value\n ...\n }\n\n}", + "_type": "code" + }, + { + "_key": "ecdc9cfc00a9", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "8e1bdb0f2423", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "Note: Make sure to enclose the provider name with ", + "_key": "f604b6a11bb4", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "code" + ], + "text": "'", + "_key": "3c1012787ee7", + "_type": "span" + }, + { + "text": " if it contains a ", + "_key": "957a09f6b35e", + "_type": "span", + "marks": [] + }, + { + "_key": "1e082162579f", + "_type": "span", + "marks": [ + "code" + ], + "text": "-" + }, + { + "marks": [], + "text": " or a blank character.", + "_key": "98955c7ee38d", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "4ac7ecccab00", + "markDefs": [] + }, + { + "_key": "0f8bdb917351", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "1ff057deab1c" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "As of the 21.09.0-edge release, Nextflow integrates with the following Git providers:", + "_key": "105ad94ebac2" + } + ], + "_type": "block", + "style": "normal", + "_key": "aa77ab205f1c" + }, + { + "children": [ + { + "_key": "3e33240e0286", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "971dca3b837d", + "markDefs": [] + }, + { + "style": "h2", + "_key": "0a71528f2249", + "markDefs": [], + "children": [ + { + "text": "GitHub", + "_key": "32ba83c0388d", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "57c762fa5d33", + "markDefs": [ + { + "_key": "f16cd57b9321", + "_type": "link", + "href": "https://github.com" + }, + { + "href": "https://github.com/nf-core/", + "_key": 
"c3f35671c23c", + "_type": "link" + } + ], + "children": [ + { + "_key": "f38fe88ae894", + "_type": "span", + "marks": [ + "f16cd57b9321" + ], + "text": "GitHub" + }, + { + "_type": "span", + "marks": [], + "text": " is one of the most well known Git providers and is home to some of the most popular open-source Nextflow pipelines from the ", + "_key": "a5520c2c8497" + }, + { + "_type": "span", + "marks": [ + "c3f35671c23c" + ], + "text": "nf-core", + "_key": "7416702faf2b" + }, + { + "marks": [], + "text": " community project.", + "_key": "74ca1a4ba214", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "e7f45456da3c", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "7b8e1bf4a5d7" + } + ] + }, + { + "style": "normal", + "_key": "7145f788c16c", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "If you wish to use Nextflow code from a ", + "_key": "268f28dfc39d", + "_type": "span" + }, + { + "marks": [ + "strong" + ], + "text": "public", + "_key": "7083c0d403e8", + "_type": "span" + }, + { + "_key": "fea4807986e2", + "_type": "span", + "marks": [], + "text": " repository hosted on GitHub.com, then you don't need to provide credentials (" + }, + { + "marks": [ + "code" + ], + "text": "user", + "_key": "d763e55c075d", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "ab9488ca8dcf" + }, + { + "marks": [ + "code" + ], + "text": "password", + "_key": "7680a9eb1acf", + "_type": "span" + }, + { + "text": ") to pull code from the repository. However, if you wish to interact with a private repository or are running into GitHub API rate limits for public repos, then you must provide elevated access to Nextflow by specifying your credentials in the ", + "_key": "0d1c4e9c305e", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "scm", + "_key": "24b11b423013" + }, + { + "_type": "span", + "marks": [], + "text": " file.", + "_key": "b23692c605cd" + } + ], + "_type": "block" + }, + { + "_key": "5d2ee70dcef2", + "markDefs": [], + "children": [ + { + "_key": "310d483b51c6", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://github.blog/2020-12-15-token-authentication-requirements-for-git-operations/#what-you-need-to-do-today", + "_key": "583dff4d7666" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "It is worth noting that ", + "_key": "558521fd3e0e" + }, + { + "marks": [ + "583dff4d7666" + ], + "text": "GitHub recently phased out Git password authentication", + "_key": "dde2ae8f1887", + "_type": "span" + }, + { + "text": " and now requires that users supply a more secure GitHub-generated ", + "_key": "162d0521b559", + "_type": "span", + "marks": [] + }, + { + "_key": "5b35024248be", + "_type": "span", + "marks": [ + "em" + ], + "text": "Personal Access Token" + }, + { + "_type": "span", + "marks": [], + "text": " for authentication. 
With Nextflow, you can specify your ", + "_key": "0d3ac75d0a37" + }, + { + "text": "personal access token", + "_key": "2b52fae60022", + "_type": "span", + "marks": [ + "em" + ] + }, + { + "_key": "61ef76d54e9f", + "_type": "span", + "marks": [], + "text": " in the " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "password", + "_key": "a315e0d53c0e" + }, + { + "_key": "fe7ad2d79f53", + "_type": "span", + "marks": [], + "text": " field." + } + ], + "_type": "block", + "style": "normal", + "_key": "b6397fe35b0c" + }, + { + "children": [ + { + "text": "", + "_key": "fe993a5ec666", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "097d7a5f785c", + "markDefs": [] + }, + { + "code": "providers {\n\n github {\n user = 'me'\n password = 'my-personal-access-token'\n }\n\n}", + "_type": "code", + "_key": "3dda18e6c7a3" + }, + { + "_key": "6aeccf74c673", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "7cee71510d27" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "952110a1f518", + "markDefs": [ + { + "_type": "link", + "href": "https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token", + "_key": "1df04550aa1b" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "To generate a ", + "_key": "86baf07da30f" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "personal-access-token", + "_key": "0dc18ee2d07e" + }, + { + "text": " for the GitHub platform, follow the instructions provided ", + "_key": "02b7101844b7", + "_type": "span", + "marks": [] + }, + { + "_key": "75acef2e4664", + "_type": "span", + "marks": [ + "1df04550aa1b" + ], + "text": "here" + }, + { + "_type": "span", + "marks": [], + "text": ". 
Ensure that the token has at a minimum all the permissions in the ", + "_key": "01a36f71288b" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "repo", + "_key": "16f29b294892" + }, + { + "_type": "span", + "marks": [], + "text": " scope.", + "_key": "a3196bfa2d3f" + } + ] + }, + { + "style": "normal", + "_key": "331fe594e726", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "01bc884633a8" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "11b4d051166b", + "markDefs": [], + "children": [ + { + "text": "Once you have provided your username and ", + "_key": "684c90c68f13", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "personal access token", + "_key": "4a951809a0f1" + }, + { + "_type": "span", + "marks": [], + "text": ", as shown above, you can test the integration by pulling the repository code.", + "_key": "18805dd1669d" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "7b7a89cac57d", + "markDefs": [], + "children": [ + { + "_key": "173686cce031", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "_type": "code", + "_key": "6ecadfbf3568", + "code": "nextflow pull https://github.com/user_name/private_repo" + }, + { + "_type": "block", + "style": "normal", + "_key": "aed0cdb9d840", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "cc67fd760b98" + } + ] + }, + { + "_type": "block", + "style": "h2", + "_key": "ab6e7b269dda", + "markDefs": [], + "children": [ + { + "text": "Bitbucket Cloud", + "_key": "a5a0477abecb", + "_type": "span", + "marks": [] + } + ] + }, + { + "markDefs": [ + { + "href": "https://bitbucket.org/", + "_key": "3136bacc4f44", + "_type": "link" + } + ], + "children": [ + { + "marks": [ + "3136bacc4f44" + ], + "text": "Bitbucket", + "_key": "816877ce28f9", + "_type": "span" + }, + { + "marks": [], + "text": " is a publicly accessible Git solution hosted by Atlassian. Please note that if you are using an on-premises Bitbucket installation, you should follow the instructions for ", + "_key": "1ccdc2b55184", + "_type": "span" + }, + { + "text": "Bitbucket Server", + "_key": "c4f831c2d355", + "_type": "span", + "marks": [ + "em" + ] + }, + { + "_key": "ccca23bab798", + "_type": "span", + "marks": [], + "text": " in the following section." + } + ], + "_type": "block", + "style": "normal", + "_key": "039a387ad88f" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "626826a5c7ee" + } + ], + "_type": "block", + "style": "normal", + "_key": "f36e589fd61e", + "markDefs": [] + }, + { + "_key": "31c0836aabfd", + "markDefs": [], + "children": [ + { + "_key": "28aa45aebcef", + "_type": "span", + "marks": [], + "text": "If your Nextflow code is in a public Bitbucket repository, then you don't need to specify your credentials to pull code from the repository. 
However, if you wish to interact with a private repository, you need to provide elevated access to Nextflow by specifying your credentials in the " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "scm", + "_key": "90c4cfb52b00" + }, + { + "_type": "span", + "marks": [], + "text": " file.", + "_key": "aeb53fae60d5" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "48c1901af265", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "bbc3c2470ac1" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "Please note that Bitbucket Cloud requires your ", + "_key": "d65fecd648fd", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "app password", + "_key": "75ea61922a88" + }, + { + "marks": [], + "text": " in the ", + "_key": "d4986cdb224a", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "password", + "_key": "71f6ded09325" + }, + { + "_type": "span", + "marks": [], + "text": " field, which is different from your login password.", + "_key": "a5d5623fb085" + } + ], + "_type": "block", + "style": "normal", + "_key": "85e7ee14d36f", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "a75e9e9cef65" + } + ], + "_type": "block", + "style": "normal", + "_key": "799598abfae9" + }, + { + "code": "providers {\n\n bitbucket {\n user = 'me'\n password = 'my-app-password'\n }\n\n}", + "_type": "code", + "_key": "f9b7a238b925" + }, + { + "markDefs": [], + "children": [ + { + "_key": "9646cce38d3b", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "09606f491acd" + }, + { + "_type": "block", + "style": "normal", + "_key": "f038d4d73450", + "markDefs": [ + { + "_type": "link", + "href": "https://support.atlassian.com/bitbucket-cloud/docs/app-passwords/", + "_key": "3aea2cdcb17d" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "To generate an ", + "_key": "63640d5786f2" + }, + { + "text": "app password", + "_key": "063e4e96e5fd", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " for the Bitbucket platform, follow the instructions provided ", + "_key": "7324690b07ac" + }, + { + "marks": [ + "3aea2cdcb17d" + ], + "text": "here", + "_key": "10fae48e6c3e", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ". 
Ensure that the token has at least ", + "_key": "9b15db5cb258" + }, + { + "text": "Repositories: Read", + "_key": "433e24cdf803", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " permission.", + "_key": "307d6dbc8066" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "0ce51a692912", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "8ed900eeeaba" + }, + { + "_type": "block", + "style": "normal", + "_key": "86161c3906b6", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Once these settings are saved in ", + "_key": "7571bb64310a" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "$HOME/.nextflow/scm", + "_key": "77aa28c2d752" + }, + { + "marks": [], + "text": ", you can test the integration by pulling the repository code.", + "_key": "4874a251a8c4", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "f0b5e40a9568", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "90484fd813a0", + "_type": "span", + "marks": [] + } + ] + }, + { + "code": "nextflow pull https://bitbucket.org/user_name/private_repo", + "_type": "code", + "_key": "6a8d2fbe8bac" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "04e7113d0c4a", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "fcb52c9c61ec" + }, + { + "_type": "block", + "style": "h2", + "_key": "716f4ada6f30", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Bitbucket Server", + "_key": "bb871768db0a", + "_type": "span" + } + ] + }, + { + "children": [ + { + "marks": [ + "1d753d5a4057" + ], + "text": "Bitbucket Server", + "_key": "909b061e07bf", + "_type": "span" + }, + { + "text": " is a Git hosting solution from Atlassian which is meant for teams that require a self-managed solution. If Nextflow code resides in an open Bitbucket repository, then you don't need to provide credentials to pull code from this repository. However, if you wish to interact with a private repository, you need to give elevated access to Nextflow by specifying your credentials in the ", + "_key": "a2f2757daaf5", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "scm", + "_key": "5cbb999b9906" + }, + { + "marks": [], + "text": " file.", + "_key": "1ac4250a9eb7", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "a6856f439aff", + "markDefs": [ + { + "_key": "1d753d5a4057", + "_type": "link", + "href": "https://www.atlassian.com/software/bitbucket/enterprise" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "e3ac27b19546", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "ebedd31342e8", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "For example, if you'd like to call your hosted Bitbucket server as ", + "_key": "e3b5be6c0b05" + }, + { + "text": "mybitbucketserver", + "_key": "bc3d5f1ad4c7", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": ", then you'll need to add the following snippet in your ", + "_key": "56e8677766b7" + }, + { + "_key": "13f105e21bdc", + "_type": "span", + "marks": [ + "code" + ], + "text": "~/$HOME/.nextflow/scm" + }, + { + "_key": "d06dee549174", + "_type": "span", + "marks": [], + "text": " file." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "65653a8c49de" + }, + { + "_key": "864eb62586ef", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "faea57da2d23", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "code": "providers {\n\n mybitbucketserver {\n platform = 'bitbucketserver'\n server = 'https://your.bitbucket.host.com'\n user = 'me'\n password = 'my-password' // OR \"my-token\"\n }\n\n}", + "_type": "code", + "_key": "446030bbf92c" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "e4533169e667" + } + ], + "_type": "block", + "style": "normal", + "_key": "9d9aa4bdaa20", + "markDefs": [] + }, + { + "style": "normal", + "_key": "71f44e6c3538", + "markDefs": [ + { + "_type": "link", + "href": "https://confluence.atlassian.com/bitbucketserver/managing-personal-access-tokens-1005339986.html", + "_key": "5d34ac38c04f" + } + ], + "children": [ + { + "_key": "5cbd32ae7db1", + "_type": "span", + "marks": [], + "text": "To generate a " + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "personal access token", + "_key": "cdcb4ddf96ad" + }, + { + "text": " for Bitbucket Server, refer to the ", + "_key": "22cd80357709", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "5d34ac38c04f" + ], + "text": "Bitbucket Support documentation", + "_key": "b809da251376", + "_type": "span" + }, + { + "_key": "f9a1fcc11db8", + "_type": "span", + "marks": [], + "text": " from Atlassian." + } + ], + "_type": "block" + }, + { + "_key": "0a76ed183ff3", + "markDefs": [], + "children": [ + { + "_key": "e82470410c47", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "text": "Once the configuration is saved, you can test the integration by pulling code from a private repository and specifying the ", + "_key": "8ebcdb031c6c", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "mybitbucketserver", + "_key": "85a738fa3172" + }, + { + "text": " Git provider using the ", + "_key": "4b73ecdf6c36", + "_type": "span", + "marks": [] + }, + { + "_key": "8962783452b5", + "_type": "span", + "marks": [ + "code" + ], + "text": "-hub" + }, + { + "_type": "span", + "marks": [], + "text": " option.", + "_key": "d3ac60442424" + } + ], + "_type": "block", + "style": "normal", + "_key": "1aceafe3a6da" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "9a70dedd8fd0", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "cf073281bdf2" + }, + { + "_key": "be79024fd871", + "code": "nextflow pull https://your.bitbucket.host.com/user_name/private_repo -hub mybitbucketserver", + "_type": "code" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "f8265188ee09" + } + ], + "_type": "block", + "style": "normal", + "_key": "4b36d96af813" + }, + { + "_key": "3ee72524fbea", + "markDefs": [ + { + "_type": "link", + "href": "https://www.atlassian.com/migration/assess/journey-to-cloud", + "_key": "a48ed8372354" + }, + { + "_type": "link", + "href": "https://bitbucket.org", + "_key": "0d23da771a8b" + } + ], + "children": [ + { + "marks": [], + "text": "NOTE: It is worth noting that ", + "_key": "6e1ccc176128", + "_type": "span" + }, + { + "_key": "47c8e6a84f05", + "_type": "span", + "marks": [ + "a48ed8372354" + ], + "text": "Atlassian is phasing out the Server offering" + 
}, + { + "marks": [], + "text": " in favor of cloud product ", + "_key": "60a502b9f779", + "_type": "span" + }, + { + "text": "bitbucket.org", + "_key": "413bdc62c318", + "_type": "span", + "marks": [ + "0d23da771a8b" + ] + }, + { + "marks": [], + "text": ".", + "_key": "98a7bde8f731", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "0e9c07416673", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "987bab6e65dd" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "h2", + "_key": "327a8f593397", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "GitLab", + "_key": "82b184b0425b" + } + ] + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://gitlab.com", + "_key": "1dc07434701a" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "1dc07434701a" + ], + "text": "GitLab", + "_key": "6ac2e33d1b06" + }, + { + "_type": "span", + "marks": [], + "text": " is a popular Git provider that offers features covering various aspects of the DevOps cycle.", + "_key": "3bc887cc9704" + } + ], + "_type": "block", + "style": "normal", + "_key": "206e346e8e06" + }, + { + "markDefs": [], + "children": [ + { + "_key": "0c21d1ec2bbd", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "897ad0f10a2b" + }, + { + "style": "normal", + "_key": "d6539fc8c533", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "If you wish to run a Nextflow pipeline from a public GitLab repository, there is no need to provide credentials to pull code. However, if you wish to interact with a private repository, then you must give elevated access to Nextflow by specifying your credentials in the ", + "_key": "bb9c58537d98", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "scm", + "_key": "87515f32bd58" + }, + { + "_type": "span", + "marks": [], + "text": " file.", + "_key": "36ace447715f" + } + ], + "_type": "block" + }, + { + "_key": "0db30d4cae4f", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "cb07bd2ae58c", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "text": "Please note that you need to specify your ", + "_key": "0662258f4c56", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "em" + ], + "text": "personal access token", + "_key": "fa22cbc111a8", + "_type": "span" + }, + { + "_key": "d9ae4e3e0063", + "_type": "span", + "marks": [], + "text": " in the " + }, + { + "text": "password", + "_key": "0725a64c71ee", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " field.", + "_key": "d3f0ccd67785" + } + ], + "_type": "block", + "style": "normal", + "_key": "aa2108121075" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "806973d01fc0", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "260a779bb088" + }, + { + "_key": "bda02b722173", + "code": "providers {\n\n mygitlab {\n user = 'me'\n password = 'my-password' // or 'my-personal-access-token'\n token = 'my-personal-access-token'\n }\n\n}", + "_type": "code" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "1080e7dd2bbf" + } + ], + "_type": "block", + "style": "normal", + "_key": "6021faa14b33" + }, + { + "markDefs": [ + { + "_type": "link", + 
"href": "https://gitlab.com", + "_key": "d827eaef861e" + } + ], + "children": [ + { + "marks": [], + "text": "In addition, you can specify the ", + "_key": "af8649a84fee", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "server", + "_key": "3ef012f71fad" + }, + { + "_key": "10cdc9fbd301", + "_type": "span", + "marks": [], + "text": " fields for your self-hosted instance of GitLab, by default " + }, + { + "marks": [ + "d827eaef861e" + ], + "text": "https://gitlab.com", + "_key": "4f6038105a29", + "_type": "span" + }, + { + "_key": "71af938525a6", + "_type": "span", + "marks": [], + "text": " is assumed as the server." + } + ], + "_type": "block", + "style": "normal", + "_key": "d7fb5c4fc836" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "202acb3801c2", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "a5b18581c76f", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "c355898abbbb", + "markDefs": [ + { + "href": "https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html", + "_key": "dc14c6012d22", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "To generate a ", + "_key": "b55c9375d33c" + }, + { + "_key": "28febb25b2b7", + "_type": "span", + "marks": [ + "code" + ], + "text": "personal-access-token" + }, + { + "text": " for the GitLab platform follow the instructions provided ", + "_key": "05fd10f03aa3", + "_type": "span", + "marks": [] + }, + { + "_key": "e96252c53977", + "_type": "span", + "marks": [ + "dc14c6012d22" + ], + "text": "here" + }, + { + "marks": [], + "text": ". Please ensure that the token has at least ", + "_key": "dadac30f1ddb", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "read_repository", + "_key": "8b9db3d15420" + }, + { + "_type": "span", + "marks": [], + "text": ", ", + "_key": "520384d27f7e" + }, + { + "marks": [ + "code" + ], + "text": "read_api", + "_key": "7ac82b2441b3", + "_type": "span" + }, + { + "marks": [], + "text": " permissions.", + "_key": "b8e8135dafd5", + "_type": "span" + } + ] + }, + { + "_key": "b1a08fae9469", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "a8fb5822193d", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "text": "Once the configuration is saved, you can test the integration by pulling the repository code using the ", + "_key": "4bed0e8af1ac", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "code" + ], + "text": "-hub", + "_key": "3c8199eaedc7", + "_type": "span" + }, + { + "marks": [], + "text": " option.", + "_key": "1c4b43f0462c", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "16ae59b3713d" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "fee13eebacc0" + } + ], + "_type": "block", + "style": "normal", + "_key": "79f72f13d8ee" + }, + { + "code": "nextflow pull https://gitlab.com/user_name/private_repo -hub mygitlab", + "_type": "code", + "_key": "1d2b5d657d1c" + }, + { + "style": "normal", + "_key": "8b8f0fc08910", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "f51c989b29a5", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "daf8ca6d282d", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Gitea", + "_key": "c17b37565aa4", + "_type": "span" + } + ], + "_type": "block", + "style": "h2" 
+ }, + { + "children": [ + { + "text": "Gitea server", + "_key": "5ecb4adc69dd", + "_type": "span", + "marks": [ + "3f1005d9f1d1" + ] + }, + { + "_type": "span", + "marks": [], + "text": " is an open source Git-hosting solution that can be self-hosted. If you have your Nextflow code in an open Gitea repository, there is no need to specify credentials to pull code from this repository. However, if you wish to interact with a private repository, you can give elevated access to Nextflow by specifying your credentials in the ", + "_key": "dc335153b179" + }, + { + "_key": "51da9d53b755", + "_type": "span", + "marks": [ + "code" + ], + "text": "scm" + }, + { + "text": " file.", + "_key": "395fcff2fefc", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "70b9e891612b", + "markDefs": [ + { + "href": "https://gitea.com/", + "_key": "3f1005d9f1d1", + "_type": "link" + } + ] + }, + { + "style": "normal", + "_key": "c929f74e6015", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "243bd3a67962", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "79c1e6e3c6dd", + "markDefs": [], + "children": [ + { + "text": "For example, if you'd like to call your hosted Gitea server ", + "_key": "e755bc42b798", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "code" + ], + "text": "mygiteaserver", + "_key": "f01d9854b6fb", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ", then you'll need to add the following snippet in your ", + "_key": "f370669df5e8" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "~/$HOME/.nextflow/scm", + "_key": "defa4aefa9ad" + }, + { + "text": " file.", + "_key": "dbf194cd62c6", + "_type": "span", + "marks": [] + } + ] + }, + { + "_key": "3ea614397f15", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "38ef7bf47a1d" + } + ], + "_type": "block", + "style": "normal" + }, + { + "code": "providers {\n\n mygiteaserver {\n platform = 'gitea'\n server = 'https://gitea.host.com'\n user = 'me'\n password = 'my-password'\n }\n\n}", + "_type": "code", + "_key": "0b75791cc2f0" + }, + { + "_type": "block", + "style": "normal", + "_key": "c508af1799de", + "markDefs": [], + "children": [ + { + "_key": "b221cac08a2d", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "style": "normal", + "_key": "3458d0208d0f", + "markDefs": [ + { + "href": "https://docs.gitea.io/en-us/api-usage/", + "_key": "fa597a359003", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "To generate a ", + "_key": "3e127666b123" + }, + { + "text": "personal access token", + "_key": "3c0a60b36df9", + "_type": "span", + "marks": [ + "em" + ] + }, + { + "marks": [], + "text": " for your Gitea server, please refer to the ", + "_key": "9509934e123b", + "_type": "span" + }, + { + "marks": [ + "fa597a359003" + ], + "text": "official guide", + "_key": "0569600bac61", + "_type": "span" + }, + { + "_key": "e6908cf372eb", + "_type": "span", + "marks": [], + "text": "." 
+ } + ], + "_type": "block" + }, + { + "_key": "fd85a28a2e65", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "5b257c2bee94", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Once the configuration is set, you can test the integration by pulling the repository code and specifying ", + "_key": "42b9d3b98cf4", + "_type": "span" + }, + { + "text": "mygiteaserver", + "_key": "f7124397b2ca", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " as the Git provider using the ", + "_key": "c7f8576b8c9c" + }, + { + "_key": "6bfa2f347122", + "_type": "span", + "marks": [ + "code" + ], + "text": "-hub" + }, + { + "_type": "span", + "marks": [], + "text": " option.", + "_key": "cb68b79bd9e8" + } + ], + "_type": "block", + "style": "normal", + "_key": "456518c081fb" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "5be33f254f4a" + } + ], + "_type": "block", + "style": "normal", + "_key": "45872c570a11" + }, + { + "code": "nextflow pull https://git.host.com/user_name/private_repo -hub mygiteaserver", + "_type": "code", + "_key": "f9690d6ebad1" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "c4cf796a4156" + } + ], + "_type": "block", + "style": "normal", + "_key": "45ab883f1536", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Azure Repos", + "_key": "0ff65e0c7011" + } + ], + "_type": "block", + "style": "h2", + "_key": "842faa05f75a", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "marks": [ + "71a9db316375" + ], + "text": "Azure Repos", + "_key": "3f28783b3bc6" + }, + { + "marks": [], + "text": " is a part of Microsoft Azure Cloud Suite. 
Nextflow integrates natively Azure Repos via the usual ", + "_key": "5975cb5c35ca", + "_type": "span" + }, + { + "text": "~/$HOME/.nextflow/scm", + "_key": "4ce0a21da5b5", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "marks": [], + "text": " file.", + "_key": "f5ac8f4f2e8e", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "c69cf477b4e4", + "markDefs": [ + { + "href": "https://azure.microsoft.com/en-us/services/devops/repos/", + "_key": "71a9db316375", + "_type": "link" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "11d7d1b61c1c", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "b1cf7539ee18", + "_type": "span", + "marks": [] + } + ] + }, + { + "style": "normal", + "_key": "993155389ba5", + "markDefs": [], + "children": [ + { + "text": "If you'd like to use the ", + "_key": "5baedc8057d2", + "_type": "span", + "marks": [] + }, + { + "text": "myazure", + "_key": "4cc6690a6232", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " alias for the ", + "_key": "e1ad83dd3b02" + }, + { + "marks": [ + "code" + ], + "text": "azurerepos", + "_key": "c90b90d1cd4a", + "_type": "span" + }, + { + "text": " provider, then you'll need to add the following snippet in your ", + "_key": "b4dd88033882", + "_type": "span", + "marks": [] + }, + { + "text": "~/$HOME/.nextflow/scm", + "_key": "a7938033c173", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " file.", + "_key": "a295d473fce4" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "d03f25c33418", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "ec3df0777828" + } + ] + }, + { + "_type": "code", + "_key": "236210736f10", + "code": "providers {\n\n myazure {\n server = 'https://dev.azure.com'\n platform = 'azurerepos'\n user = 'me'\n token = 'my-api-token'\n }\n\n}" + }, + { + "_key": "31c59860e4c4", + "markDefs": [], + "children": [ + { + "_key": "f79099283b02", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "48366bae578b", + "markDefs": [ + { + "_type": "link", + "href": "https://docs.microsoft.com/en-us/azure/devops/organizations/accounts/use-personal-access-tokens-to-authenticate?view=azure-devops&tabs=preview-page", + "_key": "127ad6dea324" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "To generate a ", + "_key": "79dc56d5ee56" + }, + { + "text": "personal access token", + "_key": "dca3e2327b81", + "_type": "span", + "marks": [ + "em" + ] + }, + { + "marks": [], + "text": " for your Azure Repos integration, please refer to the ", + "_key": "c62fa3acd703", + "_type": "span" + }, + { + "text": "official guide", + "_key": "316c2153bd98", + "_type": "span", + "marks": [ + "127ad6dea324" + ] + }, + { + "text": " on Azure.", + "_key": "7e64d39f1cec", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "76137fc1af3b", + "markDefs": [], + "children": [ + { + "_key": "6150348308d0", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "916ba48ce0cd", + "markDefs": [], + "children": [ + { + "_key": "650626725ba9", + "_type": "span", + "marks": [], + "text": "Once the configuration is set, you can test the integration by pulling the repository code and specifying " + 
}, + { + "marks": [ + "code" + ], + "text": "myazure", + "_key": "5dec137c2f5c", + "_type": "span" + }, + { + "_key": "75dd36910b69", + "_type": "span", + "marks": [], + "text": " as the Git provider using the " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "-hub", + "_key": "dc13cac6c64a" + }, + { + "text": " option.", + "_key": "2e222cc77d79", + "_type": "span", + "marks": [] + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "f77bcc300a08" + } + ], + "_type": "block", + "style": "normal", + "_key": "b9fa19d98dfd" + }, + { + "code": "nextflow pull https://dev.azure.com/org_name/DefaultCollection/_git/repo_name -hub myazure", + "_type": "code", + "_key": "7f947958ba86" + }, + { + "style": "normal", + "_key": "896a90e04c53", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "f1a1bf7b13e6", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "c0c41d831c66", + "markDefs": [], + "children": [ + { + "_key": "4d80672ba830", + "_type": "span", + "marks": [], + "text": "Conclusion" + } + ], + "_type": "block", + "style": "h2" + }, + { + "_key": "a907b5ed9a76", + "markDefs": [], + "children": [ + { + "_key": "1355f72c8231", + "_type": "span", + "marks": [], + "text": "Git is a popular, widely used software system for source code management. The native integration of Nextflow with various Git hosting solutions is an important feature to facilitate reproducible workflows that enable collaborative development and deployment of Nextflow pipelines." + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "427d8ead829f", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "381758ec6146", + "_type": "span", + "marks": [] + } + ] + }, + { + "_key": "72bb7fea75ff", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Stay tuned for more integrations as we continue to improve our support for various source code management solutions! 
", + "_key": "582658966ac5" + } + ], + "_type": "block", + "style": "normal" + } + ], + "tags": [], + "title": "Configure Git private repositories with Nextflow", + "_createdAt": "2024-09-25T14:15:55Z" + }, + { + "publishedAt": "2016-10-19T06:00:00.000Z", + "meta": { + "slug": { + "current": "enabling-elastic-computing-nextflow" + } + }, + "_type": "blogPost", + "_rev": "mvya9zzDXWakVjnX4hhYSE", + "_id": "48e5bb8f8161", + "title": "Enabling elastic computing with Nextflow", + "tags": [ + { + "_type": "reference", + "_key": "15c793769ec6", + "_ref": "9161ec05-53f8-455a-a931-7b41f6ec5172" + }, + { + "_type": "reference", + "_key": "42d51282c7c2", + "_ref": "7d9ffad4-385c-433d-a409-d0bfacc3c2fe" + }, + { + "_type": "reference", + "_key": "ce08b1b8b691", + "_ref": "ace8dd2c-eed3-4785-8911-d146a4e84bbb" + }, + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "dbc666a0ded7" + } + ], + "_createdAt": "2024-09-25T14:15:07Z", + "author": { + "_ref": "paolo-di-tommaso", + "_type": "reference" + }, + "_updatedAt": "2024-09-26T09:01:29Z", + "body": [ + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "*Learn how to deploy an elastic computing cluster in the AWS cloud with Nextflow *", + "_key": "13725119fc76", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "9d2df3a52832" + }, + { + "_type": "block", + "style": "normal", + "_key": "4c1011209c3e", + "children": [ + { + "_type": "span", + "text": "", + "_key": "50d834a420c6" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "50dc3ca09200", + "markDefs": [ + { + "href": "/blog/2016/deploy-in-the-cloud-at-snap-of-a-finger.html", + "_key": "f54054e5ba06", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In the ", + "_key": "0617211eea9b" + }, + { + "_type": "span", + "marks": [ + "f54054e5ba06" + ], + "text": "previous post", + "_key": "fe8928c432bd" + }, + { + "_key": "8df5424e5651", + "_type": "span", + "marks": [], + "text": " I introduced the new cloud native support for AWS provided by Nextflow." 
+ } + ] + }, + { + "children": [ + { + "text": "", + "_key": "c3e40390046d", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "e352b24454db" + }, + { + "_key": "be69eed3b99b", + "markDefs": [], + "children": [ + { + "text": "It allows the creation of a computing cluster in the cloud in a no-brainer way, enabling the deployment of complex computational pipelines in a few commands.", + "_key": "2d5a9dfb3dea", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "e8a98c1eaced" + } + ], + "_type": "block", + "style": "normal", + "_key": "b36a488a43f3" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "This solution is characterised by using a lean application stack which does not require any third party component installed in the EC2 instances other than a Java VM and the Docker engine (the latter it's only required in order to deploy pipeline binary dependencies).", + "_key": "cb6e648644ed", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "5184b0e5621e" + }, + { + "children": [ + { + "text": "", + "_key": "cf593a92b416", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "e7811af09fe5" + }, + { + "alt": "Nextflow cloud deployment", + "_key": "270551d1e525", + "asset": { + "_ref": "image-72b2eb937f163aedb655e036ec826e63ff06922e-1514x1059-png", + "_type": "reference" + }, + "_type": "image" + }, + { + "children": [ + { + "text": "", + "_key": "82280c57db4f", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "afc942337dd3" + }, + { + "style": "normal", + "_key": "fdf4e4eb41da", + "markDefs": [ + { + "_type": "link", + "href": "https://aws.amazon.com/efs/", + "_key": "68d992903a45" + } + ], + "children": [ + { + "marks": [], + "text": "Each EC2 instance runs a script, at bootstrap time, that mounts the ", + "_key": "da5095658f64", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "68d992903a45" + ], + "text": "EFS", + "_key": "9ff457d30ca0" + }, + { + "_key": "d69dfdd04955", + "_type": "span", + "marks": [], + "text": " storage and downloads and launches the Nextflow cluster daemon. This daemon is self-configuring, it automatically discovers the other running instances and joins them forming the computing cluster." + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "6480f294933e", + "children": [ + { + "_type": "span", + "text": "", + "_key": "42a3d2a79088" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "3a6e95a34dc0", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The simplicity of this stack makes it possible to setup the cluster in the cloud in just a few minutes, a little more time than is required to spin up the EC2 VMs. 
This time does not depend on the number of instances launched, as they configure themselves independently.", + "_key": "9a94cdbcc63e", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "53c670ef7a02", + "children": [ + { + "_type": "span", + "text": "", + "_key": "0e369d9cd0bc" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "27f12b60b74e", + "markDefs": [ + { + "_type": "link", + "href": "http://www.nextplatform.com/2016/09/21/three-great-lies-cloud-computing/", + "_key": "ed19781aed0e" + } + ], + "children": [ + { + "text": "This also makes it possible to add or remove instances as needed, realising the ", + "_key": "40e4a54cf75f", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "ed19781aed0e" + ], + "text": "long promised\nelastic scalability", + "_key": "32db2cdddbfe" + }, + { + "text": " of cloud computing.", + "_key": "24227e43f02a", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "fb81d2946742", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "d6eb5730499e" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "This ability is even more important for bioinformatic workflows, which frequently crunch non-homogeneous datasets and are composed of tasks with very different computing requirements (e.g. a few very long-running tasks and many short-lived tasks in the same workload).", + "_key": "f9239e26551d" + } + ], + "_type": "block", + "style": "normal", + "_key": "0452deca85e5" + }, + { + "_key": "c04ac5dbb2e9", + "children": [ + { + "_key": "df5de16566a8", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "Going elastic", + "_key": "67a6682aa4da", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "a8bae2b011ce" + }, + { + "_type": "block", + "style": "normal", + "_key": "ae24c0172d75", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The Nextflow support for the cloud features an elastic cluster which is capable of resizing itself to adapt to the actual computing needs at runtime, thus spinning up new EC2 instances when jobs wait for too long in the execution queue, or terminating instances that are not used for a certain amount of time.", + "_key": "42ca284374d6" + } + ] + }, + { + "_key": "8f2501ea482d", + "children": [ + { + "_key": "dd32125e2cd8", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "01422631a9cc", + "markDefs": [], + "children": [ + { + "_key": "76894180645d", + "_type": "span", + "marks": [], + "text": "In order to enable the cluster autoscaling, you will need to specify the autoscale properties in the " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "nextflow.config", + "_key": "0dc91541137f" + }, + { + "marks": [], + "text": " file. 
For example:", + "_key": "14a159e57f36", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "45a52415fa16", + "children": [ + { + "_type": "span", + "text": "", + "_key": "94418ea991c6" + } + ] + }, + { + "code": "cloud {\n imageId = 'ami-4b7daa32'\n instanceType = 'm4.xlarge'\n\n autoscale {\n enabled = true\n minInstances = 5\n maxInstances = 10\n }\n}", + "_type": "code", + "_key": "a4149ffc65bf" + }, + { + "_type": "block", + "style": "normal", + "_key": "bc391d542e7a", + "children": [ + { + "text": "", + "_key": "f96fa96d09f0", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "399f02becd90", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The above configuration enables the autoscaling features so that the cluster will include at least 5 nodes. If at any point one or more tasks spend more than 5 minutes without being processed, the number of instances needed to fullfil the pending tasks, up to limit specified by the ", + "_key": "283a677cada6" + }, + { + "_key": "7f3bc9e23e22", + "_type": "span", + "marks": [ + "code" + ], + "text": "maxInstances" + }, + { + "text": " attribute, are launched. On the other hand, if these instances are idle, they are terminated before reaching the 60 minutes instance usage boundary.", + "_key": "fdce2aca55f5", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_key": "b8e980088ecd", + "children": [ + { + "_type": "span", + "text": "", + "_key": "541509a39c0e" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "b2ea423801c1", + "markDefs": [], + "children": [ + { + "text": "The autoscaler launches instances by using the same AMI ID and type specified in the ", + "_key": "d525e6f392ef", + "_type": "span", + "marks": [] + }, + { + "_key": "07a9426248b0", + "_type": "span", + "marks": [ + "code" + ], + "text": "cloud" + }, + { + "_key": "b00ee8b150e4", + "_type": "span", + "marks": [], + "text": " configuration. However it is possible to define different attributes as shown below:" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "262519f8da16", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "9dfc647d5761" + }, + { + "code": "cloud {\n imageId = 'ami-4b7daa32'\n instanceType = 'm4.large'\n\n autoscale {\n enabled = true\n maxInstances = 10\n instanceType = 'm4.2xlarge'\n spotPrice = 0.05\n }\n}", + "_type": "code", + "_key": "9cbbb15b4f7a" + }, + { + "style": "normal", + "_key": "e2f8b73b896f", + "children": [ + { + "_type": "span", + "text": "", + "_key": "937d2eab8c07" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "The cluster is first created by using instance(s) of type ", + "_key": "4f37883b5558" + }, + { + "_key": "9f29c33d922d", + "_type": "span", + "marks": [ + "code" + ], + "text": "m4.large" + }, + { + "_key": "8bb660b1b9bc", + "_type": "span", + "marks": [], + "text": ". Then, when new computing nodes are required the autoscaler launches instances of type " + }, + { + "_key": "2fdba04757d0", + "_type": "span", + "marks": [ + "code" + ], + "text": "m4.2xlarge" + }, + { + "_type": "span", + "marks": [], + "text": ". 
Also, since the ", + "_key": "0b930923e143" + }, + { + "_key": "7d4f50853ec2", + "_type": "span", + "marks": [ + "code" + ], + "text": "spotPrice" + }, + { + "marks": [], + "text": " attribute is specified, ", + "_key": "a2dc393435b5", + "_type": "span" + }, + { + "_key": "86a6b84a1dcc", + "_type": "span", + "marks": [ + "0ac87538f5a5" + ], + "text": "EC2 spot" + }, + { + "_type": "span", + "marks": [], + "text": " instances are launched, instead of regular on-demand ones, bidding for the price specified.", + "_key": "846b15849d2e" + } + ], + "_type": "block", + "style": "normal", + "_key": "4c53854ea3e2", + "markDefs": [ + { + "_type": "link", + "href": "https://aws.amazon.com/ec2/spot/", + "_key": "0ac87538f5a5" + } + ] + }, + { + "children": [ + { + "_key": "59cea82fb016", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "0c982422be44" + }, + { + "children": [ + { + "text": "Conclusion", + "_key": "afde72287b02", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "30eb28b96d2c" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow implements an easy though effective cloud scheduler that is able to scale dynamically to meet the computing needs of deployed workloads taking advantage of the ", + "_key": "c0ec475a298b" + }, + { + "marks": [ + "em" + ], + "text": "elastic", + "_key": "76e588784baf", + "_type": "span" + }, + { + "_key": "86c6686587c4", + "_type": "span", + "marks": [], + "text": " nature of the cloud platform." + } + ], + "_type": "block", + "style": "normal", + "_key": "2c06e03dd5a4", + "markDefs": [] + }, + { + "style": "normal", + "_key": "cfa39e2146e0", + "children": [ + { + "text": "", + "_key": "6c15dd65e034", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "003ff2d6086c", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "This ability, along the support for spot/preemptible instances, allows a cost effective solution for the execution of your pipeline in the cloud.", + "_key": "6b5679209bde", + "_type": "span" + } + ] + } + ] + }, + { + "body": [ + { + "_key": "5fc3324cbc64", + "markDefs": [ + { + "_key": "6becb2c3b516", + "_type": "link", + "href": "http://www.cygwin.com/" + }, + { + "_type": "link", + "href": "https://wiki.ubuntu.com/WubiGuide", + "_key": "bfc704fda1d5" + } + ], + "children": [ + { + "marks": [], + "text": "For Windows users, getting access to a Linux-based Nextflow development and runtime environment used to be hard. Users would need to run virtual machines, access separate physical servers or cloud instances, or install packages such as ", + "_key": "6cb202f46482", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "6becb2c3b516" + ], + "text": "Cygwin", + "_key": "f0d573589aef" + }, + { + "text": " or ", + "_key": "ef4cfa1b0908", + "_type": "span", + "marks": [] + }, + { + "text": "Wubi", + "_key": "c118989fc9d8", + "_type": "span", + "marks": [ + "bfc704fda1d5" + ] + }, + { + "marks": [], + "text": ". 
Fortunately, there is now an easier way to deploy a complete Nextflow development environment on Windows.", + "_key": "d32f8bea0015", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "b03c749cbcfa", + "markDefs": [], + "children": [ + { + "_key": "10aee690ecff", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "style": "normal", + "_key": "12930d82455d", + "markDefs": [], + "children": [ + { + "_key": "da9bece122cf", + "_type": "span", + "marks": [], + "text": "The Windows Subsystem for Linux (WSL) allows users to build, manage and execute Nextflow pipelines on a Windows 10 laptop or desktop without needing a separate Linux machine or cloud VM. Users can build and test Nextflow pipelines and containerized workflows locally, on an HPC cluster, or their preferred cloud service, including AWS Batch and Azure Batch." + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "0b56bb88fcf2", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "382bb307a059" + }, + { + "markDefs": [], + "children": [ + { + "_key": "5717951568be", + "_type": "span", + "marks": [], + "text": "This document provides a step-by-step guide to setting up a Nextflow development environment on Windows 10." + } + ], + "_type": "block", + "style": "normal", + "_key": "bd367e17439d" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "59924261c312", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "0cea49820fac" + }, + { + "markDefs": [], + "children": [ + { + "_key": "1c8db0a1cadb0", + "_type": "span", + "marks": [], + "text": "High-level Steps" + } + ], + "_type": "block", + "style": "h2", + "_key": "23eada725707" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "The steps described in this guide are as follows:", + "_key": "8258e15b4c650" + } + ], + "_type": "block", + "style": "normal", + "_key": "329409232b59", + "markDefs": [] + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "b6063eaca1ed", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Install Windows PowerShell", + "_key": "ea5ff14c8db10", + "_type": "span", + "marks": [] + } + ] + }, + { + "style": "normal", + "_key": "0a4cac1194f1", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Configure the Windows Subsystem for Linux (WSL2)", + "_key": "c3987f32b2360", + "_type": "span" + } + ], + "level": 1, + "_type": "block" + }, + { + "style": "normal", + "_key": "9bcd0541e3ff", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Obtain and Install a Linux distribution (on WSL2)", + "_key": "d420e301f1990" + } + ], + "level": 1, + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "a0576a66796a", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Install Windows Terminal", + "_key": "008b4838e96c0", + "_type": "span", + "marks": [] + } + ], + "level": 1 + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Install and configure Docker", + "_key": "60926ae0c2aa0", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "3ccc55792260" + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": 
"Download and install an IDE (VS Code)", + "_key": "30f1f73adc6a0" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "b01f8daa0a26" + }, + { + "_key": "9d8f7d2a49bb", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Install and test Nextflow", + "_key": "bc7752c8b52d0", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Configure X-Windows for use with the Nextflow console", + "_key": "d86ff3b7f2930" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "df3e27ebf14a", + "listItem": "bullet", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "005aa228f956", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "a070ba3e0edd0", + "_type": "span", + "marks": [], + "text": "Install and configure GIT" + } + ], + "level": 1 + }, + { + "style": "h2", + "_key": "f76f28c39e5a", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Install Windows PowerShell", + "_key": "485fd4116dc4", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "a39fe896f5f0", + "markDefs": [], + "children": [ + { + "_key": "cb103108b784", + "_type": "span", + "marks": [], + "text": "PowerShell is a cross-platform command-line shell and scripting language available for Windows, Linux, and macOS. If you are an experienced Windows user, you are probably already familiar with PowerShell. PowerShell is worth taking a few minutes to download and install." + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "5264507e8c58", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "8a5d70d215de", + "_type": "span", + "marks": [] + } + ] + }, + { + "_key": "9c87f925feda", + "markDefs": [], + "children": [ + { + "_key": "634eeadc57ce0", + "_type": "span", + "marks": [], + "text": "PowerShell is a big improvement over the Command Prompt in Windows 10. It brings features to Windows that Linux/UNIX users have come to expect, such as command-line history, tab completion, and pipeline functionality." + } + ], + "_type": "block", + "style": "normal" + }, + { + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/PowerShell/PowerShell", + "_key": "b3bd7e72c17c" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "You can obtain PowerShell for Windows from GitHub at the URL ", + "_key": "8b90055f21120" + }, + { + "marks": [ + "b3bd7e72c17c" + ], + "text": "https://github.com/PowerShell/PowerShell", + "_key": "8b90055f21121", + "_type": "span" + }, + { + "text": ".", + "_key": "8b90055f21122", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "02e7bcabe8c4" + }, + { + "children": [ + { + "text": "Download and install the latest stable version of PowerShell for Windows x64. 
For example, ", + "_key": "94a72f264d060", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "0d98d3aa635b" + ], + "text": "powershell-7.1.3-win-x64.msi", + "_key": "94a72f264d061", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "94a72f264d062" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "ba9bf52ee9e7", + "listItem": "bullet", + "markDefs": [ + { + "href": "https://github.com/PowerShell/PowerShell/releases/download/v7.1.3/PowerShell-7.1.3-win-x64.msi", + "_key": "0d98d3aa635b", + "_type": "link" + } + ] + }, + { + "_key": "afabf2127302", + "listItem": "bullet", + "markDefs": [ + { + "_key": "bb0b9d1a7d3a", + "_type": "link", + "href": "https://docs.microsoft.com/en-us/powershell/scripting/install/installing-powershell-core-on-windows?view=powershell-7.1" + } + ], + "children": [ + { + "text": "If you run into difficulties, Microsoft provides detailed instructions ", + "_key": "9707c601034e0", + "_type": "span", + "marks": [] + }, + { + "_key": "9707c601034e1", + "_type": "span", + "marks": [ + "bb0b9d1a7d3a" + ], + "text": "here" + }, + { + "_key": "9707c601034e2", + "_type": "span", + "marks": [], + "text": "." + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "984904fd86f0", + "markDefs": [], + "children": [ + { + "_key": "579bf907f6c30", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "Configure the Windows Subsystem for Linux (WSL)", + "_key": "18ee1a18a7fc", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2", + "_key": "03fdea3f0902" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Enable the Windows Subsystem for Linux", + "_key": "ae1277bdeafa", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "4b81656587f7" + }, + { + "markDefs": [], + "children": [ + { + "_key": "4a71d41d86a1", + "_type": "span", + "marks": [], + "text": "Make sure you are running Windows 10 Version 1903 with Build 18362 or higher. You can check your Windows version by selecting WIN-R (using the Windows key to run a command) and running the utility " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "winver", + "_key": "2e54f7b07151" + }, + { + "marks": [], + "text": ".", + "_key": "3840f308f8bc", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "f6e80cc6159a" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "68a21de6f0d5", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "19e8ee9da3ba" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "From within PowerShell, run the Windows Deployment Image and Service Manager (DISM) tool as an administrator to enable the Windows Subsystem for Linux. To run PowerShell with administrator privileges, right-click on the PowerShell icon from the Start menu or desktop and select \"", + "_key": "43b4139d7f70", + "_type": "span" + }, + { + "_key": "61b473bfcfbf", + "_type": "span", + "marks": [ + "em" + ], + "text": "Run as administrator\"." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "0e932cecd2f0" + }, + { + "style": "normal", + "_key": "f7031369f153", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "80b18208e94a", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "code": "PS C:\\WINDOWS\\System32> dism.exe /online /enable-feature /featurename:Microsoft-Windows-Subsystem-Linux /all /norestart\nDeployment Image Servicing and Management tool\nVersion: 10.0.19041.844\nImage Version: 10.0.19041.1083\n\nEnabling feature(s)\n[==========================100.0%==========================]\nThe operation completed successfully.\nYou can learn more about DISM here.\n", + "_type": "code", + "_key": "a3572b8176cb" + }, + { + "_type": "block", + "style": "normal", + "_key": "62b8296b72f0", + "markDefs": [], + "children": [ + { + "_key": "df72f56ba302", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "_key": "e5292856a4bb", + "markDefs": [ + { + "_type": "link", + "href": "https://docs.microsoft.com/en-us/windows-hardware/manufacture/desktop/what-is-dism", + "_key": "f0f427230150" + } + ], + "children": [ + { + "_key": "2b97a42e89e0", + "_type": "span", + "marks": [], + "text": "You can learn more about DISM " + }, + { + "marks": [ + "f0f427230150" + ], + "text": "here", + "_key": "177fc6259b91", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "0e94cbf87b82" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "1aeb6c1e5b9e", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "5657e8ed3b39", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Step 2: Enable the Virtual Machine Feature", + "_key": "136e79f4d2c4" + } + ], + "_type": "block", + "style": "h3", + "_key": "754691a86770", + "markDefs": [] + }, + { + "_key": "d51d10aa103e", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Within PowerShell, enable Virtual Machine Platform support using DISM. 
If you have trouble enabling this feature, make sure that virtual machine support is enabled in your machine's BIOS.", + "_key": "6e9bd6c47908" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "7f931ff4a923", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "f3e970691daa", + "markDefs": [] + }, + { + "code": "PS C:\\WINDOWS\\System32> dism.exe /online /enable-feature /featurename:VirtualMachinePlatform /all /norestart\nDeployment Image Servicing and Management tool\nVersion: 10.0.19041.844\nImage Version: 10.0.19041.1083\nEnabling feature(s)\n[==========================100.0%==========================]\nThe operation completed successfully.\nAfter enabling the Virtual Machine Platform support, restart your machine.", + "_type": "code", + "_key": "5520cb110999" + }, + { + "_type": "block", + "style": "normal", + "_key": "eba584b559ab", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "29ff1beb31ee", + "_type": "span", + "marks": [] + } + ] + }, + { + "_key": "f36a31712562", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "After enabling the Virtual Machine Platform support, ", + "_key": "9d790a95f55c", + "_type": "span" + }, + { + "text": "restart your machine", + "_key": "4fdd4e017eb5", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "1a41edbfa5d8" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "97bf3acd4db5", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "be9949c88e48" + } + ] + }, + { + "_type": "block", + "style": "h3", + "_key": "c75b04d7f3c8", + "markDefs": [], + "children": [ + { + "_key": "b798f4650238", + "_type": "span", + "marks": [], + "text": "Step 3: Download the Linux Kernel Update Package" + } + ] + }, + { + "children": [ + { + "text": "", + "_key": "b0902dd4666d", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "ccf3eb03f17a", + "markDefs": [] + }, + { + "style": "normal", + "_key": "3db61b10b4e6", + "markDefs": [ + { + "_type": "link", + "href": "https://docs.microsoft.com/en-us/windows/wsl/compare-versions", + "_key": "cc630f9ac0f7" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow users will want to take advantage of the latest features in WSL 2. You can learn about differences between WSL 1 and WSL 2 ", + "_key": "d40395d0c95d" + }, + { + "marks": [ + "cc630f9ac0f7" + ], + "text": "here", + "_key": "cabb90f743d5", + "_type": "span" + }, + { + "marks": [], + "text": ". 
Before you can enable support for WSL 2, you'll need to download the kernel update package at the link below:", + "_key": "5e9787db0d3e", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "6cfb572f3e6b", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "654b932aa775", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "c8cd0ad46716", + "markDefs": [ + { + "_type": "link", + "href": "https://wslstorestorage.blob.core.windows.net/wslblob/wsl_update_x64.msi", + "_key": "ef6839e5bee4" + } + ], + "children": [ + { + "_key": "1ff7abf51131", + "_type": "span", + "marks": [ + "ef6839e5bee4" + ], + "text": "WSL2 Linux kernel update package for x64 machines" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "5179d301a2af", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "2afa2b6e6a2c" + }, + { + "children": [ + { + "text": "Once downloaded, double click on the kernel update package and select \"Yes\" to install it with elevated permissions.", + "_key": "6e51fe37791e", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "943f86771a75", + "markDefs": [] + }, + { + "style": "normal", + "_key": "3ac54b5f90d2", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "42702d5a30e6" + } + ], + "_type": "block" + }, + { + "style": "h3", + "_key": "3001c33962bf", + "markDefs": [], + "children": [ + { + "text": "STEP 4: Set WSL2 as your Default Version", + "_key": "3efced757707", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "fa8235c5b29b", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "From within PowerShell:", + "_key": "0ff377632cfc" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "822a79e1c2b2", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "afcedbb7f568", + "markDefs": [] + }, + { + "code": "PS C:\\WINDOWS\\System32> wsl --set-default-version 2\nFor information on key differences with WSL 2 please visit https://aka.ms/wsl2", + "_type": "code", + "_key": "2aaa6e836228" + }, + { + "_key": "8b04fddd5f49", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "ee6baf9e64d1" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://docs.microsoft.com/en-us/windows/wsl/install-win10#manual-installation-steps", + "_key": "7f550bf28027" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "If you run into difficulties with any of these steps, Microsoft provides detailed installation instructions ", + "_key": "c0ff56067c1a" + }, + { + "marks": [ + "7f550bf28027" + ], + "text": "here", + "_key": "fd7e8bb25a66", + "_type": "span" + }, + { + "text": ".", + "_key": "f6b074e4ceb5", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "17156ee9a66e" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "aae45645d044", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "4f0b07d0ad03" + }, + { + "_type": "block", + "style": "h2", + "_key": "04f9eac6b121", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Obtain and Install a Linux 
Distribution on WSL", + "_key": "3c026ae4f649" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "9f904f27cf5f", + "markDefs": [], + "children": [ + { + "_key": "218a9cafdb88", + "_type": "span", + "marks": [], + "text": "If you normally install Linux on VM environments such as VirtualBox or VMware, this probably sounds like a lot of work. Fortunately, Microsoft provides Linux OS distributions via the Microsoft Store that work with the Windows Subsystem for Linux." + } + ] + }, + { + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://aka.ms/wslstore", + "_key": "952403cf27c7" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Use this link to access and download a Linux Distribution for WSL through the Microsoft Store - ", + "_key": "868ae80777a00" + }, + { + "_type": "span", + "marks": [ + "952403cf27c7" + ], + "text": "https://aka.ms/wslstore", + "_key": "868ae80777a01" + }, + { + "marks": [], + "text": ".", + "_key": "868ae80777a02", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "41b2795787bd" + }, + { + "style": "normal", + "_key": "a539076d5661", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "We selected the Ubuntu 20.04 LTS release. You can use a different distribution if you choose. Installation from the Microsoft Store is automated. Once the Linux distribution is installed, you can run a shell on Ubuntu (or your installed OS) from the Windows Start menu.", + "_key": "0f7f42fb56110", + "_type": "span" + } + ], + "level": 1, + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "56cd257ad09b0", + "_type": "span", + "marks": [], + "text": "When you start Ubuntu Linux for the first time, you will be prompted to provide a UNIX username and password. The username that you select can be distinct from your Windows username. The UNIX user that you create will automatically have " + }, + { + "_key": "56cd257ad09b1", + "_type": "span", + "marks": [ + "code" + ], + "text": "sudo" + }, + { + "_type": "span", + "marks": [], + "text": " privileges. Whenever a shell is started, it will default to this user.", + "_key": "56cd257ad09b2" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "670295762c2c", + "listItem": "bullet" + }, + { + "_type": "block", + "style": "normal", + "_key": "af0ccab84aea", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "After setting your username and password, update your packages on Ubuntu from the Linux shell using the following command:", + "_key": "d796eb3a74700" + } + ], + "level": 1 + }, + { + "_key": "1cb29d30cc99", + "code": "sudo apt update && sudo apt upgrade", + "_type": "code" + }, + { + "children": [ + { + "_key": "f3409a76802c0", + "_type": "span", + "marks": [], + "text": "This is also a good time to add any additional Linux packages that you will want to use." 
+ } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "28ba32ccbcb8", + "listItem": "bullet", + "markDefs": [] + }, + { + "_key": "191793600515", + "code": "sudo apt install net-tools", + "_type": "code" + }, + { + "style": "normal", + "_key": "4c90237231b3", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "97ade3712c76" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Install Windows Terminal", + "_key": "0749e537492f" + } + ], + "_type": "block", + "style": "h2", + "_key": "8c15f3ddfcd3", + "markDefs": [] + }, + { + "markDefs": [ + { + "_key": "3166a8b799e9", + "_type": "link", + "href": "https://github.com/microsoft/terminal" + } + ], + "children": [ + { + "text": "While not necessary, it is a good idea to install ", + "_key": "9d03e0884fbc", + "_type": "span", + "marks": [] + }, + { + "_key": "9c331d34b97f", + "_type": "span", + "marks": [ + "3166a8b799e9" + ], + "text": "Windows Terminal" + }, + { + "text": " at this point. When working with Nextflow, it is handy to interact with multiple command lines at the same time. For example, users may want to execute flows, monitor logfiles, and run Docker commands in separate windows.", + "_key": "e9a753f8078c", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "832c3474ea13" + }, + { + "markDefs": [], + "children": [ + { + "_key": "5cd2f1fd107d", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "9825d4220934" + }, + { + "_type": "block", + "style": "normal", + "_key": "f913144549ed", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Windows Terminal provides an X-Windows-like experience on Windows. 
It helps organize your various command-line environments - Linux shell, Windows Command Prompt, PowerShell, AWS or Azure CLIs.", + "_key": "294e87174ecc" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "92a237142137", + "markDefs": [], + "children": [ + { + "_key": "bee5a82a7393", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "_type": "image", + "alt": "Windows Terminal", + "_key": "d7709d5f7894", + "asset": { + "_ref": "image-f02c195fad1be5c9053179106e86221dfca766d8-1381x903-png", + "_type": "reference" + } + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "7b73916c7b7d", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "c87d0b905484" + }, + { + "style": "normal", + "_key": "5035734cdac4", + "markDefs": [ + { + "href": "https://docs.microsoft.com/en-us/windows/terminal/get-started", + "_key": "531dea182e73", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Instructions for downloading and installing Windows Terminal are available at: ", + "_key": "4af5a2c00b900" + }, + { + "text": "https://docs.microsoft.com/en-us/windows/terminal/get-started", + "_key": "4af5a2c00b901", + "_type": "span", + "marks": [ + "531dea182e73" + ] + }, + { + "text": ".", + "_key": "4af5a2c00b902", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "ad9c9785c21c", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "8222d1271262", + "_type": "span" + } + ] + }, + { + "_key": "354eec85ad2a", + "markDefs": [ + { + "_type": "link", + "href": "https://docs.microsoft.com/en-us/windows/terminal/command-line-arguments", + "_key": "4b38955a1604" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "It is worth spending a few minutes getting familiar with available commands and shortcuts in Windows Terminal. 
Documentation is available at ", + "_key": "3da4c8fbcf750" + }, + { + "marks": [ + "4b38955a1604" + ], + "text": "https://docs.microsoft.com/en-us/windows/terminal/command-line-arguments", + "_key": "3da4c8fbcf751", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "3da4c8fbcf752" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "b7f255fcd559", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "4e9621599f75" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "marks": [], + "text": "Some Windows Terminal commands you’ll need right away are provided below:", + "_key": "9f4e54c6f050", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "1b16982af4e6", + "markDefs": [] + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "8db03ef84d9a", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Split the active window vertically: ", + "_key": "b2ef918f73ef" + }, + { + "text": "SHIFT", + "_key": "d659a9e3974f", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " , ", + "_key": "9d27fed391b8" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "ALT", + "_key": "97c59fe85219" + }, + { + "text": ", and ", + "_key": "6c665f756495", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "code" + ], + "text": "=", + "_key": "0150ab22cb90", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "04909721b889", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Split the active window horizontally: ", + "_key": "b14400a7fc2b", + "_type": "span", + "marks": [] + }, + { + "text": "SHIFT", + "_key": "eee98d5b0717", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "marks": [], + "text": ", ", + "_key": "f40f63688f87", + "_type": "span" + }, + { + "_key": "09f3c32cef34", + "_type": "span", + "marks": [ + "code" + ], + "text": "ALT" + }, + { + "_key": "c4c1c0758ed6", + "_type": "span", + "marks": [], + "text": ", and " + }, + { + "marks": [ + "code" + ], + "text": "–", + "_key": "8d14e763c077", + "_type": "span" + } + ], + "level": 1, + "_type": "block" + }, + { + "style": "normal", + "_key": "e8311575f896", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Resize the active window: ", + "_key": "7c81f07060b1" + }, + { + "text": "SHIFT", + "_key": "62dadffbe469", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_key": "f2360f3d4cb4", + "_type": "span", + "marks": [], + "text": " , " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "ALT", + "_key": "7f230233bbb3" + }, + { + "_type": "span", + "marks": [], + "text": ", and ", + "_key": "ebdf9da42bcd" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "", + "_key": "4a453d4664da" + } + ], + "level": 1, + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "8d3365e07e92", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "e1659a2e7959", + "_type": "span", + "marks": [], + "text": "Open a new window under the current tab: " + }, + { + "_key": "294bd472a415", + "_type": "span", + "marks": [ + "code" + ], + "text": "ALT" + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "f8af2f84bc85" + }, + { + "text": "v", + "_key": "2eee9aaf5155", + "_type": "span", + "marks": [ + "code" + ] + }, 
+ { + "_type": "span", + "marks": [], + "text": " (", + "_key": "653e7aed1318" + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "the new tab icon along the top of the Windows Terminal interface", + "_key": "b68efa25de09" + }, + { + "_type": "span", + "marks": [], + "text": ")", + "_key": "809428205d03" + } + ], + "level": 1 + }, + { + "_type": "block", + "style": "normal", + "_key": "c62578cd7567", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "adaa98f7ebcd", + "_type": "span", + "marks": [] + } + ] + }, + { + "style": "h2", + "_key": "70a2f926bec5", + "markDefs": [], + "children": [ + { + "_key": "128896cf62f8", + "_type": "span", + "marks": [], + "text": "Installing Docker on Windows" + } + ], + "_type": "block" + }, + { + "_key": "2fafc2b4a9e8", + "markDefs": [ + { + "_type": "link", + "href": "https://dev.to/bowmanjd/install-docker-on-windows-wsl-without-docker-desktop-34m9", + "_key": "f7517fb7b0a8" + } + ], + "children": [ + { + "text": "There are two ways to install Docker for use with the WSL on Windows. One method is to install Docker directly on a hosted WSL Linux instance (Ubuntu in our case) and have the docker daemon run on the Linux kernel as usual. An installation recipe for people that choose this \"native Linux\" approach is provided ", + "_key": "f7ceff158e6b", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "f7517fb7b0a8" + ], + "text": "here", + "_key": "21a65b83e337" + }, + { + "_key": "c468f0b7ca47", + "_type": "span", + "marks": [], + "text": "." + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "2f1fa80b3668", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "29edc1888975", + "_type": "span" + } + ] + }, + { + "children": [ + { + "_key": "b7a338b888e4", + "_type": "span", + "marks": [], + "text": "A second method is to run " + }, + { + "_key": "0a82b53903ea", + "_type": "span", + "marks": [ + "2b9f6b2e5017" + ], + "text": "Docker Desktop" + }, + { + "marks": [], + "text": " on Windows. While Docker is more commonly used in Linux environments, it can be used with Windows also. The Docker Desktop supports containers running on Windows and Linux instances running under WSL. 
Docker Desktop provides some advantages for Windows users:", + "_key": "c8074f9eee53", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "626e152582e2", + "markDefs": [ + { + "_type": "link", + "href": "https://www.docker.com/products/docker-desktop", + "_key": "2b9f6b2e5017" + } + ] + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "5e829b57339f", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "ec4bc791702e0", + "_type": "span", + "marks": [], + "text": "The installation process is automated" + } + ] + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "d8bce72b856d", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Docker Desktop provides a Windows GUI for managing Docker containers and images (including Linux containers running under WSL)", + "_key": "82161a33e3930" + } + ] + }, + { + "style": "normal", + "_key": "1811bc3111ee", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "44b25510017b0", + "_type": "span", + "marks": [], + "text": "Microsoft provides Docker Desktop integration features from within Visual Studio Code via a VS Code extension" + } + ], + "level": 1, + "_type": "block" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "9f30aff3417b", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Docker Desktop provides support for auto-installing a single-node Kubernetes cluster", + "_key": "809b5cc27a7a0", + "_type": "span", + "marks": [] + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "61b38f4df109", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The Docker Desktop WSL 2 back-end provides an elegant Linux integration such that from a Linux user’s perspective, Docker appears to be running natively on Linux.", + "_key": "e3b1346622b60" + } + ], + "level": 1 + }, + { + "_key": "5a5fea0b9583", + "markDefs": [ + { + "_type": "link", + "href": "https://www.docker.com/blog/new-docker-desktop-wsl2-backend/", + "_key": "67e426d75bee" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "An explanation of how the Docker Desktop WSL 2 Back-end works is provided ", + "_key": "49d21dceffe20" + }, + { + "_type": "span", + "marks": [ + "67e426d75bee" + ], + "text": "here", + "_key": "49d21dceffe21" + }, + { + "_key": "49d21dceffe22", + "_type": "span", + "marks": [], + "text": "." + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "f6badd856d59", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "b248860da727" + } + ], + "_type": "block" + }, + { + "children": [ + { + "marks": [], + "text": "Step 1: Install Docker Desktop on Windows", + "_key": "f711a17c9ed9", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "8ae5a39f20d2", + "markDefs": [] + }, + { + "children": [ + { + "marks": [], + "text": "Download and install Docker Desktop for Windows from ", + "_key": "7b09a29b337b0", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "998761a58dd6" + ], + "text": "here", + "_key": "5c6e3b966ed4" + }, + { + "_key": "24b9b688cfc1", + "_type": "span", + "marks": [], + "text": "." 
+ } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "bd4f42724d93", + "listItem": "bullet", + "markDefs": [ + { + "href": "https://desktop.docker.com/win/stable/amd64/Docker%20Desktop%20Installer.exe", + "_key": "998761a58dd6", + "_type": "link" + } + ] + }, + { + "style": "normal", + "_key": "5844aa2c8d4a", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Follow the on-screen prompts provided by the Docker Desktop Installer. The installation process will install Docker on Windows and install the Docker back-end components so that Docker commands are accessible from within WSL.", + "_key": "45f23088da220", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block" + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "After installation, Docker Desktop can be run from the Windows start menu. The Docker Desktop user interface is shown below. Note that Docker containers launched under WSL can be managed from the Windows Docker Desktop GUI or Linux command line.", + "_key": "dd8f9085dfd70", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "beac38117e92" + }, + { + "_type": "block", + "style": "normal", + "_key": "0db910c1930c", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The installation process is straightforward, but if you run into difficulties, detailed instructions are available here.", + "_key": "062421101bb6", + "_type": "span" + } + ], + "level": 1 + }, + { + "_type": "image", + "_key": "88ad52fea6f8", + "asset": { + "_ref": "image-3251ed96ddad1790ee87b947f1b24a5d02b078ee-1260x715-png", + "_type": "reference" + } + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "The Docker Engineering team provides an architecture diagram explaining how Docker on Windows interacts with WSL. Additional details are available ", + "_key": "a5a2ba77beb70" + }, + { + "_type": "span", + "marks": [ + "8512dfe648c2" + ], + "text": "here", + "_key": "a5a2ba77beb71" + }, + { + "_key": "a5a2ba77beb72", + "_type": "span", + "marks": [], + "text": "." 
+ } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "a8e1f75d4d2b", + "listItem": "bullet", + "markDefs": [ + { + "href": "https://code.visualstudio.com/blogs/2020/03/02/docker-in-wsl2", + "_key": "8512dfe648c2", + "_type": "link" + } + ] + }, + { + "asset": { + "_ref": "image-cf0fd75c31fde1af5462f3679180e6edde4994d4-1039x583-png", + "_type": "reference" + }, + "_type": "image", + "_key": "5bd04f15a6b5" + }, + { + "_key": "32ca7f6c2d3f", + "markDefs": [], + "children": [ + { + "_key": "942090e26fe6", + "_type": "span", + "marks": [], + "text": "Step 2: Verify the Docker installation" + } + ], + "_type": "block", + "style": "h3" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Now that Docker is installed, run a Docker container to verify that Docker and the Docker Integration Package on WSL 2 are working properly.", + "_key": "0ff61c27cfb3" + } + ], + "_type": "block", + "style": "normal", + "_key": "4a10fde1dff8", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "b062e53f3562", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "987858ec513d" + }, + { + "_type": "block", + "style": "normal", + "_key": "805111643132", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Run a Docker command from the Linux shell as shown below below. This command downloads a ", + "_key": "880074fca088", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "centos", + "_key": "35ef243f4c36" + }, + { + "marks": [], + "text": " image from Docker Hub and allows us to interact with the container via an assigned pseudo-tty. Your Docker container may exit with exit code 139 when you run this and other Docker containers. If so, don't worry – an easy fix to this issue is provided shortly.\n", + "_key": "fcec58e029f8", + "_type": "span" + } + ] + }, + { + "code": "$ docker run -ti centos:6\n[root@02ac0beb2d2c /]# hostname\n02ac0beb2d2c", + "_type": "code", + "_key": "dd996077236b" + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "\nYou can run Docker commands in other Linux shell windows via the Windows Terminal environment to monitor and manage Docker containers and images. 
For example, running ", + "_key": "9ec4fd1460ca", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "code" + ], + "text": "docker ps", + "_key": "bb1bed23bbd7", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " in another window shows the running CentOS Docker container:\n", + "_key": "aa4d2b1c3d97" + } + ], + "_type": "block", + "style": "normal", + "_key": "74fd108fd1c8" + }, + { + "code": "$ docker ps\nCONTAINER ID IMAGE COMMAND CREATED STATUS NAMES\nf5dad42617f1 centos:6 \"/bin/bash\" 2 minutes ago Up 2 minutes \thappy_hopper\n", + "_type": "code", + "_key": "edaf4fe26b46" + }, + { + "children": [ + { + "marks": [], + "text": "Step 3: Dealing with exit code 139", + "_key": "26183cfa9327", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "bff3301d6dc8", + "markDefs": [] + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://dev.to/damith/docker-desktop-container-crash-with-exit-code-139-on-windows-wsl-fix-438", + "_key": "e89fc0187c0e" + }, + { + "_type": "link", + "href": "https://unix.stackexchange.com/questions/478387/running-a-centos-docker-image-on-arch-linux-exits-with-code-139", + "_key": "fd5b05ee3dd5" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "You may encounter exit code ", + "_key": "95dfa02661f9" + }, + { + "text": "139", + "_key": "f1bd067a4afc", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "marks": [], + "text": " when running Docker containers. This is a known problem when running containers with specific base images within Docker Desktop. Good explanations of the problem and solution are provided ", + "_key": "fb17703958c3", + "_type": "span" + }, + { + "text": "here", + "_key": "de1a4f8624b3", + "_type": "span", + "marks": [ + "e89fc0187c0e" + ] + }, + { + "text": " and ", + "_key": "adc6c2208999", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "fd5b05ee3dd5" + ], + "text": "here", + "_key": "aec7155b7451" + }, + { + "_key": "d1d7c35d65eb", + "_type": "span", + "marks": [], + "text": "." + } + ], + "_type": "block", + "style": "normal", + "_key": "e7ecbf23f2e1" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "363576a213dc" + } + ], + "_type": "block", + "style": "normal", + "_key": "d4bcf251cf7f", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_key": "a10a94310c91", + "_type": "span", + "marks": [], + "text": "The solution is to add two lines to a " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": ".wslconfig", + "_key": "346f9ae4fb86" + }, + { + "_type": "span", + "marks": [], + "text": " file in your Windows home directory. 
The ", + "_key": "4ec97c28a2c9" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": ".wslconfig", + "_key": "a6eb54b8e772" + }, + { + "marks": [], + "text": " file specifies kernel options that apply to all Linux distributions running under WSL 2.", + "_key": "e371dae48574", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "ec25c415d7a7" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "ce923e7c81fe" + } + ], + "_type": "block", + "style": "normal", + "_key": "a8ae2f42cbaa" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Some of the Nextflow container images served from Docker Hub are affected by this bug since they have older base images, so it is a good idea to apply this fix.", + "_key": "c3df7054aac3" + } + ], + "_type": "block", + "style": "normal", + "_key": "9e6ba742843a", + "markDefs": [] + }, + { + "children": [ + { + "text": "", + "_key": "25cfd07a16bc", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "ac912ef0fc86", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "e3efaadbdca3", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Edit the ", + "_key": "7308913ea661" + }, + { + "_key": "2fd6ba139cb1", + "_type": "span", + "marks": [ + "code" + ], + "text": ".wslconfig" + }, + { + "_type": "span", + "marks": [], + "text": " file in your Windows home directory. You can do this using PowerShell as shown:", + "_key": "8e355a30eff9" + } + ], + "level": 1 + }, + { + "_type": "code", + "_key": "e05975f172d2", + "code": "PS C:\\Users\\ notepad .wslconfig" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "2b439aa542f1", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Add these two lines to the ", + "_key": "14e4b90e92d8" + }, + { + "text": ".wslconfig", + "_key": "1fd5c55a7ea4", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_key": "1f81c50c8d8a", + "_type": "span", + "marks": [], + "text": " file and save it:" + } + ] + }, + { + "_type": "code", + "_key": "10af24cd79eb", + "code": "[wsl2]\nkernelCommandLine = vsyscall=emulate" + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "After this, ", + "_key": "37a79af4da37" + }, + { + "_key": "520ae1471aec", + "_type": "span", + "marks": [ + "strong" + ], + "text": "restart your machine" + }, + { + "_key": "db6cbe2210ae", + "_type": "span", + "marks": [], + "text": " to force a restart of the Docker and WSL 2 environment. After making this correction, you should be able to launch containers without seeing exit code " + }, + { + "text": "139", + "_key": "8d6c29492a18", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "marks": [], + "text": ".", + "_key": "c2ef57688402", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "1b3fb20ee9b1" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Install Visual Studio Code as your IDE (optional)", + "_key": "e2eb3f4fffe9" + } + ], + "_type": "block", + "style": "h2", + "_key": "6931352cd4ca", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Developers can choose from a variety of IDEs depending on their preferences. 
Some examples of IDEs and developer-friendly editors are below:", + "_key": "72d67eaa0141", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "b2d8b070f1ad" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "0d6dd41ccddc" + } + ], + "_type": "block", + "style": "normal", + "_key": "cb1483c80d3b" + }, + { + "_type": "block", + "style": "normal", + "_key": "1bb194e7c996", + "markDefs": [], + "children": [ + { + "_key": "3cbc5f2552520", + "_type": "span", + "marks": [], + "text": "Developers can choose from a variety of IDEs depending on their preferences. Some examples of IDEs and developer-friendly editors are below:" + } + ] + }, + { + "_key": "9026ec1f6a60", + "listItem": "bullet", + "markDefs": [ + { + "href": "https://code.visualstudio.com/Download", + "_key": "de44061c7fcf", + "_type": "link" + }, + { + "_type": "link", + "href": "https://github.com/nextflow-io/vscode-language-nextflow/blob/master/vsc-extension-quickstart.md", + "_key": "a0a7d010e4d3" + } + ], + "children": [ + { + "marks": [], + "text": "Visual Studio Code - ", + "_key": "5a4fa1235c620", + "_type": "span" + }, + { + "marks": [ + "de44061c7fcf" + ], + "text": "https://code.visualstudio.com/Download", + "_key": "5a4fa1235c621", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " (Nextflow VSCode Language plug-in ", + "_key": "5a4fa1235c622" + }, + { + "_type": "span", + "marks": [ + "a0a7d010e4d3" + ], + "text": "here", + "_key": "5a4fa1235c623" + }, + { + "text": ")", + "_key": "5a4fa1235c624", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "f91833de34e4", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://www.eclipse.org/", + "_key": "9ad362af7a43" + } + ], + "children": [ + { + "text": "Eclipse - ", + "_key": "85382f0d4b4f0", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "9ad362af7a43" + ], + "text": "https://www.eclipse.org/", + "_key": "85382f0d4b4f1" + } + ] + }, + { + "markDefs": [ + { + "_key": "54515edd7298", + "_type": "link", + "href": "https://www.vim.org/" + }, + { + "_key": "45445ea97382", + "_type": "link", + "href": "https://github.com/LukeGoodsell/nextflow-vim" + } + ], + "children": [ + { + "_key": "7fc339d213f90", + "_type": "span", + "marks": [], + "text": "VIM - " + }, + { + "text": "https://www.vim.org/", + "_key": "7fc339d213f91", + "_type": "span", + "marks": [ + "54515edd7298" + ] + }, + { + "_type": "span", + "marks": [], + "text": " (VIM plug-in for Nextflow ", + "_key": "7fc339d213f92" + }, + { + "_key": "7fc339d213f93", + "_type": "span", + "marks": [ + "45445ea97382" + ], + "text": "here" + }, + { + "text": ")", + "_key": "7fc339d213f94", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "0c815e27fdad", + "listItem": "bullet" + }, + { + "_type": "block", + "style": "normal", + "_key": "f520eff859c5", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://www.gnu.org/software/emacs/download.html", + "_key": "0d26d3862f1d" + }, + { + "_key": "39adb984a2be", + "_type": "link", + "href": "https://github.com/Emiller88/nextflow-mode" + } + ], + "children": [ + { + "marks": [], + "text": "Emacs - ", + "_key": "181938b656f50", + "_type": "span" + }, + { + "_key": "181938b656f51", + "_type": "span", + "marks": [ + "0d26d3862f1d" + ], 
+ "text": "https://www.gnu.org/software/emacs/download.html" + }, + { + "marks": [], + "text": " (Nextflow syntax highlighter ", + "_key": "181938b656f52", + "_type": "span" + }, + { + "text": "here", + "_key": "181938b656f53", + "_type": "span", + "marks": [ + "39adb984a2be" + ] + }, + { + "text": ")", + "_key": "181938b656f54", + "_type": "span", + "marks": [] + } + ], + "level": 1 + }, + { + "children": [ + { + "marks": [], + "text": "JetBrains PyCharm - ", + "_key": "babdec2a2d0a0", + "_type": "span" + }, + { + "marks": [ + "0df1f95eb359" + ], + "text": "https://www.jetbrains.com/pycharm/", + "_key": "babdec2a2d0a1", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "f14c20100295", + "listItem": "bullet", + "markDefs": [ + { + "href": "https://www.jetbrains.com/pycharm/", + "_key": "0df1f95eb359", + "_type": "link" + } + ] + }, + { + "markDefs": [ + { + "href": "https://www.jetbrains.com/idea/", + "_key": "561db92c1f8b", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "IntelliJ IDEA - ", + "_key": "d31fb37b22f60", + "_type": "span" + }, + { + "marks": [ + "561db92c1f8b" + ], + "text": "https://www.jetbrains.com/idea/", + "_key": "d31fb37b22f61", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "cc325c487c61", + "listItem": "bullet" + }, + { + "markDefs": [ + { + "_key": "b3db190c66de", + "_type": "link", + "href": "https://atom.io/" + }, + { + "_key": "78fe68048f61", + "_type": "link", + "href": "https://atom.io/packages/language-nextflow" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Atom – ", + "_key": "eaa22ca2beef0" + }, + { + "text": "https://atom.io/", + "_key": "eaa22ca2beef1", + "_type": "span", + "marks": [ + "b3db190c66de" + ] + }, + { + "_type": "span", + "marks": [], + "text": " (Nextflow Atom support available ", + "_key": "eaa22ca2beef2" + }, + { + "text": "here", + "_key": "eaa22ca2beef3", + "_type": "span", + "marks": [ + "78fe68048f61" + ] + }, + { + "_type": "span", + "marks": [], + "text": ")", + "_key": "eaa22ca2beef4" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "583aa150acf4", + "listItem": "bullet" + }, + { + "style": "normal", + "_key": "e3146dd13ada", + "listItem": "bullet", + "markDefs": [ + { + "href": "https://notepad-plus-plus.org/", + "_key": "70644f3cd4ed", + "_type": "link" + } + ], + "children": [ + { + "_key": "87766525aca80", + "_type": "span", + "marks": [], + "text": "Notepad++ - " + }, + { + "_type": "span", + "marks": [ + "70644f3cd4ed" + ], + "text": "https://notepad-plus-plus.org/", + "_key": "87766525aca81" + } + ], + "level": 1, + "_type": "block" + }, + { + "_key": "75e14225d3f3", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "6e8de4ab055d", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "We decided to install Visual Studio Code because it has some nice features, including:", + "_key": "f5f8f0b5d1f8", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "ea3a84d72b88", + "markDefs": [] + }, + { + "children": [ + { + "text": "Support for source code control from within the IDE (Git)", + "_key": "5efd4ccac8010", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "99b9628c5383", + "listItem": "bullet", + "markDefs": [] + }, + { + "style": "normal", + "_key": "019aa22170fe", + "listItem": "bullet", + 
"markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Support for developing on Linux via its WSL 2 Video Studio Code Backend", + "_key": "c75a6644c6640" + } + ], + "level": 1, + "_type": "block" + }, + { + "style": "normal", + "_key": "c678ebd99666", + "listItem": "bullet", + "markDefs": [ + { + "_key": "afa9012df7ac", + "_type": "link", + "href": "https://github.com/nf-core/vscode-extensionpack" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "A library of extensions including Docker and Kubernetes support and extensions for Nextflow, including Nextflow language support and an ", + "_key": "a1a4ec49591d0" + }, + { + "_type": "span", + "marks": [ + "afa9012df7ac" + ], + "text": "extension pack for the nf-core community", + "_key": "a1a4ec49591d1" + }, + { + "text": ".", + "_key": "a1a4ec49591d2", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block" + }, + { + "_key": "fe449fbd0450", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "c7632182d543" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://code.visualstudio.com/Download", + "_key": "970d1c54de40" + } + ], + "children": [ + { + "_key": "a7f15e4097d90", + "_type": "span", + "marks": [], + "text": "Download Visual Studio Code from " + }, + { + "_type": "span", + "marks": [ + "970d1c54de40" + ], + "text": "https://code.visualstudio.com/Download", + "_key": "a7f15e4097d91" + }, + { + "marks": [], + "text": " and follow the installation procedure. The installation process will detect that you are running WSL. You will be invited to download and install the Remote WSL extension.", + "_key": "a7f15e4097d92", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "50d7b01154d6" + }, + { + "_type": "block", + "style": "normal", + "_key": "eb3b02a79029", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "file://wsl$/Ubuntu-20.04", + "_key": "64c549099fe8" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Within VS Code and other Windows tools, you can access the Linux file system under WSL 2 by accessing the path ", + "_key": "e8ceb30f4dae0" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "\\\\wsl$\\", + "_key": "e8ceb30f4dae1" + }, + { + "_key": "e8ceb30f4dae2", + "_type": "span", + "marks": [], + "text": ". In our example, the path from Windows to access files from the root of our Ubuntu Linux instance is: " + }, + { + "_type": "span", + "marks": [ + "64c549099fe8", + "code" + ], + "text": "\\wsl$\\Ubuntu-20.04", + "_key": "e8ceb30f4dae3" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "e8ceb30f4dae4" + } + ], + "level": 1 + }, + { + "markDefs": [], + "children": [ + { + "_key": "5f8431efbb91", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "55592e34d60c" + }, + { + "_key": "9e1303eac715", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Note that the reverse is possible also – from within Linux, ", + "_key": "31b7bb35b1300", + "_type": "span" + }, + { + "marks": [ + "code" + ], + "text": "/mnt/c", + "_key": "31b7bb35b1301", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " maps to the Windows C: drive. 
You can inspect ", + "_key": "31b7bb35b1302" + }, + { + "text": "/etc/mtab", + "_key": "31b7bb35b1303", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " to see the mounted file systems available under Linux.", + "_key": "31b7bb35b1304" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "36639de3bab1", + "listItem": "bullet", + "markDefs": [ + { + "_key": "0168949fc2f5", + "_type": "link" + } + ], + "children": [ + { + "_key": "5fdfbf43f57d", + "_type": "span", + "marks": [], + "text": "It is a good idea to install Nextflow language support in VS Code. You can do this by selecting the Extensions icon from the left panel of the VS Code interface and searching the extensions library for Nextflow as shown. The Nextflow language support extension is on GitHub at " + }, + { + "_key": "bce28ae8ceda", + "_type": "span", + "marks": [ + "0168949fc2f5" + ], + "text": "https://github.com/nextflow-io/vscode-language-nextflow" + }, + { + "_type": "span", + "marks": [], + "text": "\n", + "_key": "5e6d5a06c76c" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "image", + "_key": "20cd9a314cfc", + "asset": { + "_type": "reference", + "_ref": "image-f2fccfacee6669f2a003bb9a1f75a00b65f9a708-1205x452-png" + } + }, + { + "style": "normal", + "_key": "4d93359955ca", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "1c7e8da68c1b" + } + ], + "_type": "block" + }, + { + "_key": "4041772604c4", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Visual Studio Code Remote Development", + "_key": "27e7829bbd7c", + "_type": "span" + } + ], + "_type": "block", + "style": "h2" + }, + { + "markDefs": [], + "children": [ + { + "text": "Visual Studio Code Remote Development supports development on remote environments such as containers or remote hosts. For Nextflow users, it is important to realize that VS Code sees the Ubuntu instance we installed on WSL as a remote environment. The Diagram below illustrates how remote development works. From a VS Code perspective, the Linux instance in WSL is considered a remote environment.", + "_key": "f88a9ec5b0c9", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "7258aae5f401" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "408ec60281a8" + } + ], + "_type": "block", + "style": "normal", + "_key": "a74ffe0d03c5" + }, + { + "_key": "8befd6bd030a", + "markDefs": [], + "children": [ + { + "_key": "fd17b19d8f7b", + "_type": "span", + "marks": [], + "text": "Windows users work within VS Code in the Windows environment. However, source code, developer tools, and debuggers all run Linux on WSL, as illustrated below." 
+ } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "076004a3ade5", + "markDefs": [], + "children": [ + { + "_key": "f8bd1447acd7", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "alt": "The Remote Development Environment in VS Code", + "_key": "2eb73bc6dfbe", + "asset": { + "_type": "reference", + "_ref": "image-c865d3c82010e9ca67f28de8b6d0812668758dca-958x308-png" + }, + "_type": "image" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "db9d957eab20", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "ed1a6ca8baca" + }, + { + "_type": "block", + "style": "normal", + "_key": "326f5da14a98", + "markDefs": [ + { + "href": "https://code.visualstudio.com/docs/remote/remote-overview", + "_key": "3db9f8df4ef0", + "_type": "link" + } + ], + "children": [ + { + "text": "An explanation of how VS Code Remote Development works is provided ", + "_key": "b66cf224f537", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "3db9f8df4ef0" + ], + "text": "here", + "_key": "c4ec2d3db2c2" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "df671cb13891" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "575707f2ba3c" + } + ], + "_type": "block", + "style": "normal", + "_key": "ffa3e6c4ab17", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "d1f07399b193", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "VS Code users see the Windows filesystem, plug-ins specific to VS Code on Windows, and access Windows versions of tools such as Git. If you prefer to develop in Linux, you will want to select WSL as the remote environment.", + "_key": "68813f5534f9" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "1a15220f5ae7", + "markDefs": [], + "children": [ + { + "_key": "255fb71288a9", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "To open a new VS Code Window running in the context of the WSL Ubuntu-20.04 environment, click the green icon at the lower left of the VS Code window and select ", + "_key": "5e404734e971", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "em" + ], + "text": "\"New WSL Window using Distro ..\"", + "_key": "9608dca75900", + "_type": "span" + }, + { + "_key": "e74cc82b6ab0", + "_type": "span", + "marks": [], + "text": " and select " + }, + { + "_key": "83c0bca6fe35", + "_type": "span", + "marks": [ + "code" + ], + "text": "Ubuntu 20.04" + }, + { + "_key": "dc979953a92c", + "_type": "span", + "marks": [], + "text": ". You'll notice that the environment changes to show that you are working in the WSL: " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "Ubuntu-20.04", + "_key": "3b136f2e25ed" + }, + { + "_key": "2b05ecc5dc9a", + "_type": "span", + "marks": [], + "text": " environment." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "8477aa50b556" + }, + { + "_key": "ccce0d447c82", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "688c05a2e65a" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "212505adace0", + "asset": { + "_type": "reference", + "_ref": "image-21d824cfc6f7e089c8540b039bf70ba6d894cffa-1045x322-png" + }, + "_type": "image", + "alt": "Selecting the Remote Dev Environment within VS Code" + }, + { + "markDefs": [], + "children": [ + { + "_key": "1c0f9e8084ac", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "87de85dd1fed" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Selecting the Extensions icon, you can see that different VS Code Marketplace extensions run in different contexts. The Nextflow Language extension installed in the previous step is globally available. It works when developing on Windows or developing on WSL: Ubuntu-20.04.", + "_key": "033680399f86" + } + ], + "_type": "block", + "style": "normal", + "_key": "97b4bf2c2e39" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "854147eeda3a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "f38eb41209d4", + "markDefs": [] + }, + { + "_key": "c760b9bc3f44", + "markDefs": [], + "children": [ + { + "text": "The Extensions tab in VS Code differentiates between locally installed plug-ins and those installed under WSL.", + "_key": "a13e2a66abf8", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "cdf08a96ee81", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "635e73c6d746", + "_type": "span" + } + ], + "_type": "block" + }, + { + "alt": "Local vs. Remote Extensions in VS Code", + "_key": "de928cb3b887", + "asset": { + "_ref": "image-b57b2e771cb9d391e3b288644597b973dabb7847-1103x460-png", + "_type": "reference" + }, + "_type": "image" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "e19b3d008590", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "3f8e67c10872" + }, + { + "_key": "2ffc1b459c1b", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Installing Nextflow", + "_key": "3bbe7dc8af6d", + "_type": "span" + } + ], + "_type": "block", + "style": "h2" + }, + { + "style": "normal", + "_key": "31fa19c1467b", + "markDefs": [ + { + "_key": "d4fe69bf68bc", + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/getstarted.html#installation" + } + ], + "children": [ + { + "_key": "b79d05c8e70c", + "_type": "span", + "marks": [], + "text": "With Linux, Docker, and an IDE installed, now we can install Nextflow in our WSL 2 hosted Linux environment. 
Detailed instructions for installing Nextflow are available " + }, + { + "text": "here", + "_key": "776190f9b369", + "_type": "span", + "marks": [ + "d4fe69bf68bc" + ] + }, + { + "marks": [], + "text": ".", + "_key": "3054a76bff09", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "f1ba391b9d9d" + } + ], + "_type": "block", + "style": "normal", + "_key": "5f889674e0b7", + "markDefs": [] + }, + { + "_key": "8a6810742a6c", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Step 1: Make sure Java is installed (under WSL)", + "_key": "2f13a264ae06", + "_type": "span" + } + ], + "_type": "block", + "style": "h3" + }, + { + "_type": "block", + "style": "normal", + "_key": "4d6baf7bd227", + "markDefs": [ + { + "href": "https://linuxize.com/post/install-java-on-ubuntu-18-04/", + "_key": "1f09e58f94ac", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Java is a prerequisite for running Nextflow. Instructions for installing Java on Ubuntu are available ", + "_key": "8d72534f3e99" + }, + { + "text": "here", + "_key": "1848e7405401", + "_type": "span", + "marks": [ + "1f09e58f94ac" + ] + }, + { + "marks": [], + "text": ". To install the default OpenJDK, follow the instructions below in a Linux shell window:", + "_key": "f14ddc4774cb", + "_type": "span" + } + ] + }, + { + "children": [ + { + "_key": "85c7a65db5d4", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "fc856575d20b", + "markDefs": [] + }, + { + "_key": "1f74277d542d", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "b16587feea85", + "_type": "span", + "marks": [], + "text": "Update the " + }, + { + "text": "apt", + "_key": "73c11ad140a1", + "_type": "span", + "marks": [ + "em" + ] + }, + { + "text": " package index:", + "_key": "e0429cfd9054", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "code": "$ sudo apt update", + "_type": "code", + "_key": "7a147b05cfa0" + }, + { + "_key": "096c5faf28ed", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Install the latest default OpenJDK package", + "_key": "72bd0d3e28da0", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "_type": "code", + "_key": "90f0a9ceb877", + "code": "$ sudo apt install default-jdk" + }, + { + "_type": "block", + "style": "normal", + "_key": "04b35ab91539", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Verify the installation", + "_key": "c3ea1950f36e0" + } + ], + "level": 1 + }, + { + "code": "$ java -version", + "_type": "code", + "_key": "7c8de17513d3" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Step 2: Make sure curl is installed", + "_key": "dacfc23890b7" + } + ], + "_type": "block", + "style": "h3", + "_key": "827381d0653c" + }, + { + "_key": "2d33fb95e34a", + "markDefs": [], + "children": [ + { + "text": "curl", + "_key": "17e848035518", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "text": " is a convenient way to obtain Nextflow. 
", + "_key": "4d3ed4a3325c", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "curl", + "_key": "69c2189e8f8b" + }, + { + "_type": "span", + "marks": [], + "text": " is included in the default Ubuntu repositories, so installation is straightforward.", + "_key": "d1a971e13a3d" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "f5caa8a2e947", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "01fc5feac2bc", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_key": "d51889a68faa", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "ff4e6cb3db020", + "_type": "span", + "marks": [], + "text": "From the shell:" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "code": "$ sudo apt update\n$ sudo apt install curl", + "_type": "code", + "_key": "5932024996ae" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "1648e8235381", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Verify that ", + "_key": "a0f9b2b4c1bf0", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "curl", + "_key": "a0f9b2b4c1bf1" + }, + { + "_type": "span", + "marks": [], + "text": " works:", + "_key": "a0f9b2b4c1bf2" + } + ] + }, + { + "code": "$ curl\ncurl: try 'curl --help' or 'curl --manual' for more information", + "_type": "code", + "_key": "4dc7df98ecdd" + }, + { + "_type": "block", + "style": "normal", + "_key": "2ee808a19f90", + "markDefs": [], + "children": [ + { + "_key": "74ae7dc053530", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "_type": "block", + "style": "h3", + "_key": "96cf5665f7ec", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "STEP 3: Download and install Nextflow", + "_key": "43f8a055417e", + "_type": "span" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "Use ", + "_key": "e26ac8292f82", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "curl", + "_key": "d986df950ca6" + }, + { + "_key": "d0c68299e76c", + "_type": "span", + "marks": [], + "text": " to retrieve Nextflow into a temporary directory and then install it in " + }, + { + "marks": [ + "code" + ], + "text": "/usr/bin", + "_key": "073b97d0d8fd", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " so that the Nextflow command is on your path:\n", + "_key": "dd00beb7df17" + } + ], + "_type": "block", + "style": "normal", + "_key": "bceff68ed977", + "listItem": "bullet", + "markDefs": [] + }, + { + "_type": "code", + "_key": "e0bc81fff862", + "code": "$ mkdir temp\n$ cd temp\n$ curl -s https://get.nextflow.io | bash\n$ sudo cp nextflow /usr/bin" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Make sure that Nextflow is executable:", + "_key": "e026657f4e18" + } + ], + "_type": "block", + "style": "normal", + "_key": "24c9d6d8401a", + "listItem": "bullet" + }, + { + "code": "$ sudo chmod 755 /usr/bin/nextflow", + "_type": "code", + "_key": "02789afebd78" + }, + { + "style": "normal", + "_key": "1cfc02be296f", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "or if you prefer:", + "_key": "5e77429b85b8" + } + ], + "_type": "block" + }, + { + "_key": "afdf942345e4", + "code": "$ sudo chmod +x /usr/bin/nextflow", + "_type": "code" + }, + { + "_key": "62c13a393584", + "markDefs": [], + "children": 
[ + { + "text": "Step 4: Verify the Nextflow installation", + "_key": "65c3814a76b8", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h3" + }, + { + "style": "normal", + "_key": "fc05db644cb0", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Make sure Nextflow runs:", + "_key": "06c1d10e12d2" + } + ], + "_type": "block" + }, + { + "_type": "code", + "_key": "1ee17e1ad810", + "code": "$ nextflow -version\n\n N E X T F L O W\n version 21.04.2 build 5558\n created 12-07-2021 07:54 UTC (03:54 EDT)\n cite doi:10.1038/nbt.3820\n http://nextflow.io" + }, + { + "style": "normal", + "_key": "4770a61d6c1c", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Run a simple Nextflow pipeline. The example below downloads and executes a sample hello world pipeline from GitHub - https://github.com/nextflow-io/hello.", + "_key": "adf3e17153dd", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "code": "$ nextflow run hello\n\nN E X T F L O W ~ version 21.04.2\nLaunching `nextflow-io/hello` [distracted_pare] - revision: ec11eb0ec7 [master]\nexecutor > local (4)\n[06/c846d8] process > sayHello (3) [100%] 4 of 4 ✔\nCiao world!\n\nHola world!\n\nBonjour world!\n\nHello world!", + "_type": "code", + "_key": "cf7f618d0ee9" + }, + { + "_type": "block", + "style": "normal", + "_key": "56ab48f63343", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "18a0f30811fc", + "_type": "span", + "marks": [] + } + ] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Step 5: Run a Containerized Workflow", + "_key": "2592088d054f", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "5fa4f2dd07f4" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "To validate that Nextflow works with containerized workflows, we can run a slightly more complicated example. A sample workflow involving NCBI Blast is available at ", + "_key": "e70b89aaccea" + }, + { + "marks": [ + "7801cfd7c9ae" + ], + "text": "https://github.com/nextflow-io/blast-example", + "_key": "128b42af6c2e", + "_type": "span" + }, + { + "text": ". 
Rather than installing Blast on our local Linux instance, it is much easier to pull a container preloaded with Blast and other software that the pipeline depends on.", + "_key": "c90986f784f2", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "b40ecea19814", + "markDefs": [ + { + "href": "https://github.com/nextflow-io/blast-example", + "_key": "7801cfd7c9ae", + "_type": "link" + } + ] + }, + { + "children": [ + { + "text": "", + "_key": "03fa9a25a03f", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "c1313b454b5c", + "markDefs": [] + }, + { + "_key": "37e7d897956c", + "markDefs": [ + { + "_type": "link", + "href": "https://hub.docker.com/r/nextflow/examples", + "_key": "1db071436907" + } + ], + "children": [ + { + "_key": "fd2dd435cc3a", + "_type": "span", + "marks": [], + "text": "The " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "nextflow.config", + "_key": "5c5a7e01af20" + }, + { + "_type": "span", + "marks": [], + "text": " file for the Blast example (below) specifies that process logic is encapsulated in the container ", + "_key": "763cdccc977c" + }, + { + "text": "nextflow/examples", + "_key": "e410cc3d7af6", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " available from ", + "_key": "04448505be39" + }, + { + "_type": "span", + "marks": [ + "1db071436907" + ], + "text": "Docker Hub", + "_key": "79f3e2a2d78f" + }, + { + "text": ".", + "_key": "720694b4e92c", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "9f81b9472fcc", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nextflow-io/blast-example/blob/master/nextflow.config", + "_key": "d9b5040bfbea" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "On GitHub: ", + "_key": "a2d76f94ef060" + }, + { + "_type": "span", + "marks": [ + "d9b5040bfbea" + ], + "text": "nextflow-io/blast-example/nextflow.config", + "_key": "a2d76f94ef061" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "_type": "code", + "_key": "b20a7cdd6ed5", + "code": "manifest {\n nextflowVersion = '>= 20.01.0'\n}\n\nprocess {\n container = 'nextflow/examples'\n}" + }, + { + "style": "normal", + "_key": "137c09358d9b", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Run the ", + "_key": "3d8d0f534cf30", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "blast-example", + "_key": "3d8d0f534cf31" + }, + { + "marks": [], + "text": " pipeline that resides on GitHub directly from WSL and specify Docker as the container runtime using the command below:", + "_key": "3d8d0f534cf32", + "_type": "span" + } + ], + "level": 1, + "_type": "block" + }, + { + "code": "$ nextflow run blast-example -with-docker\nN E X T F L O W ~ version 21.04.2\nLaunching `nextflow-io/blast-example` [sharp_raman] - revision: 25922a0ae6 [master]\nexecutor > local (2)\n[aa/a9f056] process > blast (1) [100%] 1 of 1 ✔\n[b3/c41401] process > extract (1) [100%] 1 of 1 ✔\nmatching sequences:\n>lcl|1ABO:B unnamed protein product\nMNDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNS\n>lcl|1ABO:A unnamed protein product\nMNDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNS\n>lcl|1YCS:B unnamed protein 
product\nPEITGQVSLPPGKRTNLRKTGSERIAHGMRVKFNPLPLALLLDSSLEGEFDLVQRIIYEVDDPSLPNDEGITALHNAVCA\nGHTEIVKFLVQFGVNVNAADSDGWTPLHCAASCNNVQVCKFLVESGAAVFAMTYSDMQTAADKCEEMEEGYTQCSQFLYG\nVQEKMGIMNKGVIYALWDYEPQNDDELPMKEGDCMTIIHREDEDEIEWWWARLNDKEGYVPRNLLGLYPRIKPRQRSLA\n>lcl|1IHD:C unnamed protein product\nLPNITILATGGTIAGGGDSATKSNYTVGKVGVENLVNAVPQLKDIANVKGEQVVNIGSQDMNDNVWLTLAKKINTDCDKT", + "_type": "code", + "_key": "b61fe9f1ed01" + }, + { + "style": "normal", + "_key": "e953cc56d62b", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "62d104ab4bcc0", + "_type": "span", + "marks": [], + "text": "Nextflow executes the pipeline directly from the GitHub repository and automatically pulls the nextflow/examples container from Docker Hub if the image is unavailable locally. The pipeline then executes the two containerized workflow steps (blast and extract). The pipeline then collects the sequences into a single file and prints the result file content when pipeline execution completes." + } + ], + "level": 1, + "_type": "block" + }, + { + "_key": "6ce6a6c025f4", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "\n", + "_key": "c2aa32475e3b0", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_key": "1916480ec6bb", + "_type": "span", + "marks": [], + "text": "Configuring an XServer for the Nextflow Console" + } + ], + "_type": "block", + "style": "h2", + "_key": "e7076c68da06" + }, + { + "_key": "9e379b379a25", + "markDefs": [], + "children": [ + { + "text": "Pipeline developers will probably want to use the Nextflow Console at some point. The Nextflow Console’s REPL (read-eval-print loop) environment allows developers to quickly test parts of scripts or Nextflow code segments interactively.", + "_key": "95970e1527fe0", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://medium.com/javarevisited/using-wsl-2-with-x-server-linux-on-windows-a372263533c3", + "_key": "58528341e741" + } + ], + "children": [ + { + "_key": "03d52c3137f20", + "_type": "span", + "marks": [], + "text": "The Nextflow Console is launched from the Linux command line. However, the Groovy-based interface requires an X-Windows environment to run. You can set up X-Windows with WSL using the procedure below. A good article on this same topic is provided " + }, + { + "_key": "03d52c3137f21", + "_type": "span", + "marks": [ + "58528341e741" + ], + "text": "here" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "03d52c3137f22" + } + ], + "_type": "block", + "style": "normal", + "_key": "65a09f351437" + }, + { + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://sourceforge.net/projects/vcxsrv/", + "_key": "68a2572003b8" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Download an X-Windows server for Windows. 
In this example, we use the ", + "_key": "cab0ac4856dd0" + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "VcXsrv Windows X Server", + "_key": "cab0ac4856dd1" + }, + { + "_key": "cab0ac4856dd2", + "_type": "span", + "marks": [], + "text": " available from source forge at " + }, + { + "text": "https://sourceforge.net/projects/vcxsrv/", + "_key": "cab0ac4856dd3", + "_type": "span", + "marks": [ + "68a2572003b8" + ] + }, + { + "text": ".", + "_key": "cab0ac4856dd4", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "dd8bea30133a" + }, + { + "style": "normal", + "_key": "f7001d50792b", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Accept all the defaults when running the automated installer. The X-server will end up installed in ", + "_key": "bfa2adf0ae610" + }, + { + "_key": "bfa2adf0ae611", + "_type": "span", + "marks": [ + "code" + ], + "text": "c:\\Program Files\\VcXsrv" + }, + { + "marks": [], + "text": ".", + "_key": "bfa2adf0ae612", + "_type": "span" + } + ], + "level": 1, + "_type": "block" + }, + { + "style": "normal", + "_key": "61a9d9cba12b", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The automated installation of VcXsrv will create an ", + "_key": "93f153a1fd160" + }, + { + "marks": [ + "em" + ], + "text": "“XLaunch”", + "_key": "93f153a1fd161", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " shortcut on your desktop. It is a good idea to create your own shortcut with a customized command line so that you don’t need to interact with the XLaunch interface every time you start the X-server.", + "_key": "93f153a1fd162" + } + ], + "level": 1, + "_type": "block" + }, + { + "_key": "ff1c1e6051f6", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Right-click on the Windows desktop to create a new shortcut, give it a meaningful name, and insert the following for the shortcut target:", + "_key": "7f1b02509e130", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "_type": "code", + "_key": "66554b9e649d", + "code": "\"C:\\Program Files\\VcXsrv\\vcxsrv.exe\" :0 -ac -terminate -lesspointer -multiwindow -clipboard -wgl -dpi auto" + }, + { + "markDefs": [], + "children": [ + { + "text": "Inspecting the new shortcut properties, it should look something like this:", + "_key": "2284ed95a0f2", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "d228004a387d" + }, + { + "asset": { + "asset": { + "_ref": "image-3b0169eb5dc5e9bbdbdee3f6d7f751163420ce2e-374x560-png", + "_type": "reference" + }, + "_type": "image" + }, + "size": "medium", + "_type": "picture", + "alt": "X-Server", + "_key": "df6c1a46d7f7", + "alignment": "center" + }, + { + "style": "normal", + "_key": "618c172d11f2", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Double-click on the new shortcut desktop icon to test it. Unfortunately, the X-server runs in the background. 
When running the X-server in multiwindow mode (which we recommend), it is not obvious whether the X-server is running.", + "_key": "a3afa03581f40", + "_type": "span" + } + ], + "level": 1, + "_type": "block" + }, + { + "style": "normal", + "_key": "e05bbb9594d6", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "One way to check that the X-server is running is to use the Microsoft Task Manager and look for the XcSrv process running in the background. You can also verify it is running by using the ", + "_key": "46e91ac418ee" + }, + { + "_key": "7f0e502be5bd", + "_type": "span", + "marks": [ + "code" + ], + "text": "netstat" + }, + { + "marks": [], + "text": " command from with PowerShell on Windows to ensure that the X-server is up and listening on the appropriate ports. Using ", + "_key": "d1c3c1cfea16", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "netstat", + "_key": "5bf9f4569e9e" + }, + { + "text": ", you should see output like the following:", + "_key": "93be1f65b235", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "e0528185a33a", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "cf3144147e6d0" + } + ], + "level": 1 + }, + { + "children": [ + { + "marks": [], + "text": "At this point, the X-server is up and running and awaiting a connection from a client.", + "_key": "5f6eee1cc1d00", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "b3bd4dbe76e9", + "listItem": "bullet", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Within Ubuntu in WSL, we need to set up the environment to communicate with the X-Windows server. The shell variable DISPLAY needs to be set pointing to the IP address of the X-server and the instance of the X-windows server.", + "_key": "105d68a88b120" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "dff4dada8e19", + "listItem": "bullet", + "markDefs": [] + }, + { + "_key": "fbf559de6536", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "a88fef0901780", + "_type": "span", + "marks": [], + "text": "The shell script below will set the DISPLAY variable appropriately and export it to be available to X-Windows client applications launched from the shell. This scripting trick works because WSL sees the Windows host as the nameserver and this is the same IP address that is running the X-Server. You can echo the $DISPLAY variable after setting it to verify that it is set correctly." + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "code": "$ export DISPLAY=$(cat /etc/resolv.conf | grep nameserver | awk '{print $2}'):0.0\n$ echo $DISPLAY\n172.28.192.1:0.0", + "_type": "code", + "_key": "c49d2101101c" + }, + { + "style": "normal", + "_key": "091b759d8723", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Add this command to the end of your ", + "_key": "a83ec02d8569" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": ".bashrc", + "_key": "a17b21e90c9d" + }, + { + "marks": [], + "text": " file in the Linux home directory to avoid needing to set the DISPLAY variable every time you open a new window. 
This way, if the IP address of the desktop or laptop changes, the DISPLAY variable will be updated accordingly.\n", + "_key": "35d288c98c5b", + "_type": "span" + } + ], + "level": 1, + "_type": "block" + }, + { + "code": "$ cd ~\n$ vi .bashrc", + "_type": "code", + "_key": "e56e4be46a53" + }, + { + "_key": "ec4180858a1f", + "code": "# set the X-Windows display to connect to VcXsrv on Windows\nexport DISPLAY=$(cat /etc/resolv.conf | grep nameserver | awk '{print $2}'):0.0\n\".bashrc\" 120L, 3912C written", + "_type": "code" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "75a7c7a61050", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Use an X-windows client to make sure that the X- server is working. Since X-windows clients are not installed by default, download an xterm client as follows via the Linux shell:", + "_key": "a5f8968c7404", + "_type": "span", + "marks": [] + } + ] + }, + { + "_key": "c7f61a2b1afd", + "code": "$ sudo apt install xterm", + "_type": "code" + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Assuming that the X-server is up and running on Windows, and the Linux DISPLAY variable is set correctly, you're ready to test X-Windows.\n\nBefore testing X-Windows, do yourself a favor and temporarily disable the Windows Firewall. The Windows Firewall will very likely block ports around 6000, preventing client requests on WSL from connecting to the X-server. You can find this under Firewall & network protection on Windows. Clicking the \"Private Network\" or \"Public Network\" options will show you the status of the Windows Firewall and indicate whether it is on or off.\n\nDepending on your installation, you may be running a specific Firewall. In this example, we temporarily disable the McAfee LiveSafe Firewall as shown:\n", + "_key": "05f1403294f9", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "740854cfd3e3" + }, + { + "_type": "image", + "_key": "df05e45f2377", + "asset": { + "_type": "reference", + "_ref": "image-5151db8525ae4017f59ea00f747ac4d562fa144a-1066x335-png" + } + }, + { + "style": "normal", + "_key": "707a4fd58fbb", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "With the Firewall disabled, you can attempt to launch the xterm client from the Linux shell:", + "_key": "e2b2c4cd4933", + "_type": "span" + } + ], + "level": 1, + "_type": "block" + }, + { + "_type": "code", + "_key": "add8018a41c5", + "code": "xterm &" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "If everything is working correctly, you should see the new xterm client appear under Windows. The xterm is executing on Ubuntu under WSL but displays alongside other Windows on the Windows desktop. This is what is meant by \"multiwindow\" mode.\n", + "_key": "3feb7203ee17" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "d19990bea364", + "listItem": "bullet" + }, + { + "_key": "75c7e8005ace", + "asset": { + "_ref": "image-06fdba41ff4b784fcf72247b556662c898fac5f9-1115x229-png", + "_type": "reference" + }, + "_type": "image" + }, + { + "markDefs": [], + "children": [ + { + "text": "Now that you know X-Windows is working correctly turn the Firewall back on, and adjust the settings to allow traffic to and from the required port. Ideally, you want to open only the minimal set of ports and services required. 
In the case of the McAfee Firewall, getting X-Windows to work required changing access to incoming and outgoing ports to \"", + "_key": "66533b0fb4e6", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "Open ports to Work and Home networks", + "_key": "8cc311623e0e" + }, + { + "_type": "span", + "marks": [], + "text": "\" for the ", + "_key": "fc846e00c4ab" + }, + { + "_key": "17bbf5832f52", + "_type": "span", + "marks": [ + "code" + ], + "text": "vcxsrv.exe" + }, + { + "text": " program only as shown:", + "_key": "3c5c53bb064a", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "710ec063371f", + "listItem": "bullet" + }, + { + "alt": "Xserver setup", + "_key": "b9bae58e0955", + "alignment": "center", + "asset": { + "_type": "image", + "asset": { + "_ref": "image-cfcfccd460c17edc6522adad918d5c3432f05647-1281x574-png", + "_type": "reference" + } + }, + "size": "full", + "_type": "picture" + }, + { + "children": [ + { + "text": "With the X-server running, the ", + "_key": "ebb6f2a62e24", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "code" + ], + "text": "DISPLAY", + "_key": "506f4eaf54e6", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " variable set, and the Windows Firewall configured correctly, we can now launch the Nextflow Console from the shell as shown:", + "_key": "861cde948375" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "9ffb7f564ae9", + "listItem": "bullet", + "markDefs": [] + }, + { + "_type": "code", + "_key": "00ea45079a0d", + "code": "$ nextflow console" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The command above opens the Nextflow REPL console under X-Windows.", + "_key": "c121aa2d895c" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "93d4ea00f148", + "listItem": "bullet" + }, + { + "alt": "REPL", + "_key": "82d803239d3a", + "alignment": "center", + "asset": { + "_type": "image", + "asset": { + "_type": "reference", + "_ref": "image-22ef4c15e39ead3455782aa7e2b652b2b4d1819c-1097x494-png" + } + }, + "size": "full", + "_type": "picture" + }, + { + "_key": "f4fef554096a", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Inside the Nextflow console, you can enter Groovy code and run it interactively, a helpful feature when developing and debugging Nextflow pipelines.", + "_key": "2b6ad911f5bc", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "h1", + "_key": "af52952d5c1a", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Installing Git", + "_key": "35870481806a", + "_type": "span" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "Collaborative source code management systems such as BitBucket, GitHub, and GitLab are used to develop and share Nextflow pipelines. 
To be productive with Nextflow, you will want to install Git.",
+ "_key": "eff9192462fc",
+ "_type": "span",
+ "marks": []
+ }
+ ],
+ "_type": "block",
+ "style": "normal",
+ "_key": "723c5dbea44e"
+ },
+ {
+ "_type": "block",
+ "style": "normal",
+ "_key": "3c149e20f09d",
+ "markDefs": [],
+ "children": [
+ {
+ "text": "",
+ "_key": "f08378476176",
+ "_type": "span",
+ "marks": []
+ }
+ ]
+ },
+ {
+ "style": "normal",
+ "_key": "38ee4d1cd649",
+ "markDefs": [],
+ "children": [
+ {
+ "_type": "span",
+ "marks": [],
+ "text": "As explained earlier, VS Code operates in different contexts. When running VS Code in the context of Windows, VS Code will look for a local copy of Git. When using VS Code to operate against the remote WSL environment, a separate installation of Git installed on Ubuntu will be used. (Note that Git is installed by default on Ubuntu 20.04.)",
+ "_key": "ee9043276923"
+ }
+ ],
+ "_type": "block"
+ },
+ {
+ "markDefs": [],
+ "children": [
+ {
+ "marks": [],
+ "text": "",
+ "_key": "862393f175cc",
+ "_type": "span"
+ }
+ ],
+ "_type": "block",
+ "style": "normal",
+ "_key": "109f9c1c3d44"
+ },
+ {
+ "_key": "b8f8dfcb927b",
+ "markDefs": [],
+ "children": [
+ {
+ "text": "Developers will probably want to use Git both from within a Windows context and a Linux context, so we need to make sure that Git is present in both environments.",
+ "_key": "93c560ce7104",
+ "_type": "span",
+ "marks": []
+ }
+ ],
+ "_type": "block",
+ "style": "normal"
+ },
+ {
+ "markDefs": [],
+ "children": [
+ {
+ "_type": "span",
+ "marks": [],
+ "text": "",
+ "_key": "0b1a6996a0b9"
+ }
+ ],
+ "_type": "block",
+ "style": "normal",
+ "_key": "24a9036ce325"
+ },
+ {
+ "markDefs": [],
+ "children": [
+ {
+ "_type": "span",
+ "marks": [],
+ "text": "Step 1: Install Git on Windows (optional)",
+ "_key": "67a14d3cf30a"
+ }
+ ],
+ "_type": "block",
+ "style": "h3",
+ "_key": "51a9ae0990cd"
+ },
+ {
+ "markDefs": [],
+ "children": [
+ {
+ "_type": "span",
+ "marks": [],
+ "text": "Download and install the 64-bit Windows version of Git from https://git-scm.com/downloads. Click on the Git installer from the Downloads directory, and click through the default installation options. During the install process, you will be asked to select the default editor to be used with Git (VIM, Notepad++, etc.). Select Visual Studio Code (assuming that this is the IDE that you plan to use for Nextflow).",
+ "_key": "d78b87e46d50"
+ }
+ ],
+ "_type": "block",
+ "style": "normal",
+ "_key": "0255a4977720",
+ "listItem": "bullet"
+ },
+ {
+ "alt": "Git installer",
+ "_key": "e781d1c9c03a",
+ "alignment": "center",
+ "asset": {
+ "asset": {
+ "_type": "reference",
+ "_ref": "image-1f69d73b540a4a7ddadebf069b8f5439ffb663b0-584x459-png"
+ },
+ "_type": "image"
+ },
+ "size": "medium",
+ "_type": "picture"
+ },
+ {
+ "_type": "block",
+ "style": "normal",
+ "_key": "56aed394273f",
+ "listItem": "bullet",
+ "markDefs": [],
+ "children": [
+ {
+ "_type": "span",
+ "marks": [],
+ "text": "The Git installer will prompt you for additional settings. If you are not sure, accept the defaults. 
When asked, adjust the ", + "_key": "c3f9745c7a830" + }, + { + "text": "PATH", + "_key": "c3f9745c7a831", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " variable to use the recommended option, making the Git command line available from Git Bash, the Command Prompt, and PowerShell.", + "_key": "c3f9745c7a832" + } + ], + "level": 1 + }, + { + "_key": "739ff25a9db5", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "a41c43e6956d0", + "_type": "span", + "marks": [], + "text": "After installation Git Bash, Git GUI, and GIT CMD will appear as new entries under the Start menu. If you are running Git from PowerShell, you will need to open a new Windows to force PowerShell to reset the path variable. By default, Git installs in C:\\Program Files\\Git." + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "_key": "c046a1fafcf9", + "_type": "link", + "href": "https://training.github.com/downloads/github-git-cheat-sheet.pdf" + } + ], + "children": [ + { + "text": "If you plan to use Git from the command line, GitHub provides a useful cheatsheet ", + "_key": "a64cb51ac1d50", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "c046a1fafcf9" + ], + "text": "here", + "_key": "a64cb51ac1d51", + "_type": "span" + }, + { + "text": ".", + "_key": "a64cb51ac1d52", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "21f0e1daac42", + "listItem": "bullet" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "fc3740772c41", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "After installing Git, from within VS Code (in the context of the local host), select the Source Control icon from the left pane of the VS Code interface as shown. 
You can open local folders that contain a git repository or clone repositories from GitHub or your preferred source code management system.", + "_key": "ace6c6503caa", + "_type": "span", + "marks": [] + } + ] + }, + { + "asset": { + "_ref": "image-23f768df1d9747a2ddfa1e063e20b2d1a71fb3f8-1106x410-png", + "_type": "reference" + }, + "_type": "image", + "_key": "c5fa71506312" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "54469fddb4a4", + "listItem": "bullet", + "markDefs": [ + { + "_key": "e4c73c309d0a", + "_type": "link", + "href": "https://code.visualstudio.com/docs/editor/versioncontrol" + } + ], + "children": [ + { + "text": "Documentation on using Git with Visual Studio Code is provided at ", + "_key": "8bd98083606b", + "_type": "span", + "marks": [] + }, + { + "_key": "5f44572362d5", + "_type": "span", + "marks": [ + "e4c73c309d0a" + ], + "text": "https://code.visualstudio.com/docs/editor/versioncontrol" + } + ] + }, + { + "children": [ + { + "_key": "144893a25ca4", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "0f32cde0fab3", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Step 2: Install Git on Linux", + "_key": "653e5d2c0889" + } + ], + "_type": "block", + "style": "h3", + "_key": "3d4ac2b1cdbd" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "4c10a381b45a", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "6b5acb1f85c7", + "_type": "span", + "marks": [], + "text": "Open a Remote VS Code Window on " + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "WSL: Ubuntu 20.04\\", + "_key": "ff309f0609df" + }, + { + "_key": "98e5daa89e6a", + "_type": "span", + "marks": [], + "text": " (by selecting the green icon on the lower-left corner of the VS code interface.)" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Git should already be installed in ", + "_key": "1f9b606b7b95" + }, + { + "marks": [ + "code" + ], + "text": "/usr/bin", + "_key": "1059b4fce06c", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ", but you can validate this from the Ubuntu shell:", + "_key": "20d3fd40d7eb" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "0d991245e399", + "listItem": "bullet" + }, + { + "code": "$ git --version\ngit version 2.25.1", + "_type": "code", + "_key": "c5db4a2f84d8" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "104a27f8a984", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "To get started using Git with VS Code Remote on WSL, select the ", + "_key": "f1e75e7d370d", + "_type": "span" + }, + { + "marks": [ + "em" + ], + "text": "Source Control icon", + "_key": "bc422bc03917", + "_type": "span" + }, + { + "marks": [], + "text": " on the left panel of VS code. 
Assuming VS Code Remote detects that Git is installed on Linux, you should be able to ", + "_key": "12d967901439", + "_type": "span" + }, + { + "_key": "c82beb9135fb", + "_type": "span", + "marks": [ + "em" + ], + "text": "Clone a Repository" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "f4acaeefb3be" + } + ] + }, + { + "children": [ + { + "_key": "2abe50db5d1d", + "_type": "span", + "marks": [], + "text": "Select “Clone Repository,” and when prompted, clone the GitHub repo for the Blast example that we used earlier - https://github.com/nextflow-io/blast-example. Clone this repo into your home directory on Linux. You should see " + }, + { + "text": "blast-example", + "_key": "1711a31c9b69", + "_type": "span", + "marks": [ + "em" + ] + }, + { + "_type": "span", + "marks": [], + "text": " appear as a source code repository within VS code as shown:", + "_key": "66c35532d140" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "712860f90d2e", + "listItem": "bullet", + "markDefs": [] + }, + { + "_key": "8488b5cdf9e3", + "asset": { + "_ref": "image-12ef4e77353231b5e79356f1cf28dc00728df601-1044x370-png", + "_type": "reference" + }, + "_type": "image" + }, + { + "_key": "9a527b0aa79d", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Select the ", + "_key": "b9ad1da7653e" + }, + { + "text": "Explorer", + "_key": "e471d7de7f67", + "_type": "span", + "marks": [ + "em" + ] + }, + { + "marks": [], + "text": " panel in VS Code to see the cloned ", + "_key": "5de220b54852", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "blast-example", + "_key": "bc52e2a946f5" + }, + { + "text": " repo. Now we can explore and modify the pipeline code using the IDE.", + "_key": "909c6c8467df", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "_type": "image", + "_key": "d30fe03a5b93", + "asset": { + "_ref": "image-0b483f0435be8519d2d23fae295c83293487944a-1036x368-png", + "_type": "reference" + } + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "abeed99d86ef", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "After making modifications to the pipeline, we can execute the ", + "_key": "2854fb13da88", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "local copy", + "_key": "f8a641c2f9e5" + }, + { + "marks": [], + "text": " of the pipeline either from the Linux shell or directly via the Terminal window in VS Code as shown:", + "_key": "649641fb1bba", + "_type": "span" + } + ] + }, + { + "asset": { + "_type": "reference", + "_ref": "image-393a8e39295064a9cfa8ce16c50e06ace0567af2-1022x587-png" + }, + "_type": "image", + "_key": "b1d65785a0c0" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "b8f2a2d34b09", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "With the Docker VS Code extension, users can select the Docker icon from the left code to view containers and images associated with the Nextflow pipeline.", + "_key": "38e7e44105390", + "_type": "span", + "marks": [] + } + ] + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "6b4e159ec41f", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Git commands are available from within VS Code by selecting the ", + "_key": "aec58a1059d7", + "_type": "span" + }, + { + "_type": "span", + 
"marks": [ + "em" + ], + "text": "Source Control", + "_key": "82983680a647" + }, + { + "text": " icon on the left panel and selecting the three dots (…) to the right of SOURCE CONTROL. Some operations such as pushing or committing code will require that VS Code be authenticated with your GitHub credentials.", + "_key": "f6830f9f93a9", + "_type": "span", + "marks": [] + } + ] + }, + { + "_key": "acd58d584595", + "asset": { + "_ref": "image-76dd0dbcf48b4eb20f69d5a650de49b5e8950465-1065x501-png", + "_type": "reference" + }, + "_type": "image" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Summary", + "_key": "2cb8ba9ed7ef" + } + ], + "_type": "block", + "style": "h2", + "_key": "138a8537f9af" + }, + { + "_key": "ee21f6530a7a", + "markDefs": [], + "children": [ + { + "_key": "b21885fd5027", + "_type": "span", + "marks": [], + "text": "With WSL2, Windows 10 is an excellent environment for developing and testing Nextflow pipelines. Users can take advantage of the power and convenience of a Linux command line environment while using Windows-based IDEs such as VS-Code with full support for containers." + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "46225a359393", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "1b86b9b55660", + "_type": "span", + "marks": [] + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "44f7288da3fd", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Pipelines developed in the Windows environment can easily be extended to compute environments in the cloud.", + "_key": "50b769503196" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "e36337855b8b", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "6795e83080f0", + "_type": "span", + "marks": [] + } + ] + }, + { + "style": "normal", + "_key": "429b6c49b3b4", + "markDefs": [], + "children": [ + { + "_key": "79b5d195edde", + "_type": "span", + "marks": [], + "text": "While installing Nextflow itself is straightforward, installing and testing necessary components such as WSL, Docker, an IDE, and Git can be a little tricky. Hopefully readers will find this guide helpful." + } + ], + "_type": "block" + } + ], + "_createdAt": "2024-09-25T14:16:23Z", + "_type": "blogPost", + "title": "Setting up a Nextflow environment on Windows 10", + "_id": "4b604e483009", + "publishedAt": "2021-10-13T06:00:00.000Z", + "meta": { + "description": "For Windows users, getting access to a Linux-based Nextflow development and runtime environment used to be hard. Users would need to run virtual machines, access separate physical servers or cloud instances, or install packages such as Cygwin or Wubi. 
Fortunately, there is now an easier way to deploy a complete Nextflow development environment on Windows.", + "slug": { + "current": "setup-nextflow-on-windows" + } + }, + "_rev": "hf9hwMPb7ybAE3bqEU5jaT", + "author": { + "_ref": "evan-floden", + "_type": "reference" + }, + "_updatedAt": "2024-10-14T09:50:58Z" + }, + { + "_type": "blogPost", + "_updatedAt": "2024-09-26T09:03:17Z", + "_id": "4c604992c2f4", + "meta": { + "slug": { + "current": "nextflow-summit-call-for-abstracts" + } + }, + "_createdAt": "2024-09-25T14:16:43Z", + "title": "Nextflow Summit 2022", + "_rev": "Ot9x7kyGeH5005E3MJ4UOr", + "tags": [ + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "7bdf864c12bc" + } + ], + "publishedAt": "2022-06-17T06:00:00.000Z", + "author": { + "_ref": "drafts.phil-ewels", + "_type": "reference" + }, + "body": [ + { + "_type": "block", + "style": "normal", + "_key": "57a1e971a942", + "markDefs": [ + { + "_type": "link", + "href": "https://twitter.com/nextflowio/status/1534903352810676224", + "_key": "86523a05dcae" + }, + { + "href": "https://nf-co.re/events/2022/hackathon-october-2022", + "_key": "b9ed3488cd80", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "86523a05dcae" + ], + "text": "As recently announced", + "_key": "e5d96fa57dad" + }, + { + "_type": "span", + "marks": [], + "text": ", we are super excited to host a new Nextflow community event late this year! The Nextflow Summit will take place ", + "_key": "8eb0c1bbc49b" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "October 12-14, 2022", + "_key": "d4401201ef11" + }, + { + "_type": "span", + "marks": [], + "text": " at the iconic Torre Glòries in Barcelona, with an associated ", + "_key": "2873c5561581" + }, + { + "text": "nf-core hackathon", + "_key": "59a69f825177", + "_type": "span", + "marks": [ + "b9ed3488cd80" + ] + }, + { + "_key": "64a87bab814a", + "_type": "span", + "marks": [], + "text": " beforehand." + } + ] + }, + { + "style": "normal", + "_key": "38569f0290c9", + "children": [ + { + "_key": "91c6bf95038c", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "Call for abstracts", + "_key": "08d081e6bc47" + } + ], + "_type": "block", + "style": "h3", + "_key": "b568501de35e" + }, + { + "_key": "4caeb4804c2c", + "markDefs": [], + "children": [ + { + "_key": "be41440d4ce0", + "_type": "span", + "marks": [], + "text": "Today we’re excited to open the call for abstracts! We’re looking for talks and posters about anything and everything happening in the Nextflow world. 
Specifically, we’re aiming to shape the program into four key areas:" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "", + "_key": "23586f5c02e2", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "6abcf4012779" + }, + { + "style": "normal", + "_key": "36ee018ccc66", + "listItem": "bullet", + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow: central tool / language / plugins", + "_key": "595791ed29fb" + }, + { + "_key": "e7fb8923ed22", + "_type": "span", + "marks": [], + "text": "Community: pipelines / applications / use cases" + }, + { + "_key": "675eea12cdb2", + "_type": "span", + "marks": [], + "text": "Ecosystem: infrastructure / environments" + }, + { + "_type": "span", + "marks": [], + "text": "Software: containers / tool packaging", + "_key": "d3a0221a3112" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "", + "_key": "7346b0cc79bf", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "007bf1255aae" + }, + { + "style": "normal", + "_key": "729b63d9df77", + "markDefs": [], + "children": [ + { + "_key": "a78ab63c46c5", + "_type": "span", + "marks": [], + "text": "Speaking at the summit will primarily be in-person, but we welcome posters from remote attendees. Posters will be submitted digitally and available online during and after the event. Talks will be streamed live and be available after the event." + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "3180e240eb3a", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "85813db438a9" + }, + { + "markDefs": [ + { + "href": "https://seqera.typeform.com/summit-22-talks", + "_key": "ac0f61d6550d", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "ac0f61d6550d" + ], + "text": "Apply for a talk or poster", + "_key": "0214e164e189" + } + ], + "_type": "block", + "style": "normal", + "_key": "0415eda34290" + }, + { + "_type": "block", + "style": "normal", + "_key": "c811914d965a", + "children": [ + { + "_type": "span", + "text": "", + "_key": "aa69d1bb3c71" + } + ] + }, + { + "_key": "b48df9110b51", + "children": [ + { + "_type": "span", + "text": "Key dates", + "_key": "da2107902af6" + } + ], + "_type": "block", + "style": "h3" + }, + { + "style": "normal", + "_key": "3eb389049034", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Registration for the event will happen separately, with key dates as follows (subject to change):", + "_key": "ae8d2519ae65" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "6ab63afa8753", + "children": [ + { + "text": "", + "_key": "ffd0afcd320f", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "a3c891ed0b5c", + "listItem": "bullet", + "children": [ + { + "_key": "2b4afe25339c", + "_type": "span", + "marks": [], + "text": "Jun 17: Call for abstracts opens" + }, + { + "_type": "span", + "marks": [], + "text": "July 1: Registration opens", + "_key": "51f87fca56b2" + }, + { + "_type": "span", + "marks": [], + "text": "July 22: Call for abstracts closes", + "_key": "4ee5bee2518b" + }, + { + "_type": "span", + "marks": [], + "text": "July 29: Accepted speakers notified", + "_key": "384703a08163" + }, + { + "marks": [], + "text": "Sept 9: Registration closes", + "_key": "b703afc3bc21", + "_type": "span" + }, + { + "_key": "8a18a0f0ea7d", + "_type": "span", + "marks": [], + "text": "Oct 10-12: Hackathon" + }, + { + "marks": [], 
+ "text": "Oct 12-14: Summit", + "_key": "0af9e75cda6e", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "0118bf0012da" + } + ], + "_type": "block", + "style": "normal", + "_key": "92edde9851fe" + }, + { + "_key": "ef07c1f688b1", + "markDefs": [], + "children": [ + { + "text": "Abstracts will be read and speakers notified on a rolling basis, so apply soon!", + "_key": "bbec05222bb7", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "2f29ef185a41", + "children": [ + { + "_type": "span", + "text": "", + "_key": "2b4c8276e96a" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "e54986479e0e", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The Nextflow Summit will start Weds, Oct 12, 5:00 PM CEST and close Fri, Oct 14, 1:00 PM CEST.", + "_key": "e8046985a6dc", + "_type": "span" + } + ] + }, + { + "children": [ + { + "_key": "1bd6991ba9ea", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "4409d65a5a04" + }, + { + "_key": "08fac6816b7d", + "children": [ + { + "_key": "6722f5d9624c", + "_type": "span", + "text": "Travel bursaries" + } + ], + "_type": "block", + "style": "h3" + }, + { + "_type": "block", + "style": "normal", + "_key": "48e0a4e3d719", + "markDefs": [ + { + "_key": "faaa75339152", + "_type": "link", + "href": "https://chanzuckerberg.com/eoss/proposals/nextflow-and-nf-core/" + } + ], + "children": [ + { + "text": "Thanks to funding from the Chan Zuckerberg Initiative ", + "_key": "f14243e0d7a5", + "_type": "span", + "marks": [] + }, + { + "text": "EOSS Diversity & Inclusion grant", + "_key": "9971414159a0", + "_type": "span", + "marks": [ + "faaa75339152" + ] + }, + { + "marks": [], + "text": ", we are offering 5 bursaries for travel and accommodation. These will only be available to those who have applied to present a talk or poster and will cover up to $1500 USD, plus registration costs.", + "_key": "6db327d41b5f", + "_type": "span" + } + ] + }, + { + "_key": "ae4f1abd64fe", + "children": [ + { + "_type": "span", + "text": "", + "_key": "da4db3fd2b9d" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "If you’re interested, please select this option when filling the abstracts application form and we will be in touch with more details.", + "_key": "7256d62993ea", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "984a5d922d7e" + }, + { + "_type": "block", + "style": "normal", + "_key": "30d3f4457202", + "children": [ + { + "_type": "span", + "text": "", + "_key": "e7d4a3352b01" + } + ] + }, + { + "style": "h3", + "_key": "986f052776a8", + "children": [ + { + "_key": "08b926abb36a", + "_type": "span", + "text": "Stay in the loop" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "7df908c288ce", + "markDefs": [ + { + "_key": "9e0f8d76a010", + "_type": "link", + "href": "https://summit.nextflow.io" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "More information about the summit will be available soon, as we continue to plan the event. 
Please visit ", + "_key": "612a0bab4b41" + }, + { + "marks": [ + "9e0f8d76a010" + ], + "text": "https://summit.nextflow.io", + "_key": "35b0e916ecb7", + "_type": "span" + }, + { + "text": " for details and to sign up to the email list for event updates.", + "_key": "c0a9588c7a40", + "_type": "span", + "marks": [] + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "47b459e4397d" + } + ], + "_type": "block", + "style": "normal", + "_key": "01ea682ca980" + }, + { + "_type": "block", + "style": "normal", + "_key": "e7d0bbd6ba60", + "markDefs": [ + { + "href": "https://share.hsforms.com/1F2Q5F0hSSiyNfuKo6tt-lw3zq3j", + "_key": "56f314435d72", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "56f314435d72" + ], + "text": "Subscribe for updates", + "_key": "265c65c13200" + } + ] + }, + { + "children": [ + { + "text": "", + "_key": "0284ecb08f83", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "5fe7112e45b9" + }, + { + "_key": "d9e2989d8733", + "markDefs": [ + { + "_type": "link", + "href": "http://twitter.com/hashtag/NextflowSummit", + "_key": "01479bd6be09" + } + ], + "children": [ + { + "text": "We will be tweeting about the event using the ", + "_key": "d83a142e1c1c", + "_type": "span", + "marks": [] + }, + { + "_key": "eae889e37eef", + "_type": "span", + "marks": [ + "01479bd6be09" + ], + "text": "#NextflowSummit" + }, + { + "text": " hashtag on Twitter. See you in Barcelona!", + "_key": "bb0d15fe03c6", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + } + ] + }, + { + "_rev": "hf9hwMPb7ybAE3bqEITLMZ", + "meta": { + "slug": { + "current": "step-by-step-rna-seq", + "_type": "slug" + }, + "_type": "meta", + "description": "We are excited to launch our new Step-by-Step blog series on running Nextflow pipelines in Seqera Platform. With accompanying technical guides, the series also demonstrates how to create and configure environments for flexible tertiary analysis and troubleshooting with Data Studios.", + "noIndex": false + }, + "_id": "4ec4b56d-7cc0-4395-bb84-83f0e70b3f65", + "_updatedAt": "2024-10-11T07:26:17Z", + "body": [ + { + "markDefs": [], + "children": [ + { + "text": "We are excited to launch our new ", + "_key": "2416087f7fc20", + "_type": "span", + "marks": [] + }, + { + "_key": "b54b44c628b4", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Step-by-Step blog series " + }, + { + "_key": "c48b795647da", + "_type": "span", + "marks": [], + "text": "on running Nextflow pipelines in Seqera Platform. With accompanying technical guides, the series also demonstrates how to create and configure environments for flexible tertiary analysis and troubleshooting with Data Studios." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "f9560979d244" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://nf-co.re/rnaseq/3.14.0/", + "_key": "89cc513e49f2" + } + ], + "children": [ + { + "text": "First up: bulk RNA sequencing (RNA-Seq) analysis with the popular ", + "_key": "9e78ccaa040f0", + "_type": "span", + "marks": [] + }, + { + "text": "nf-core/rnaseq pipeline", + "_key": "9e78ccaa040f1", + "_type": "span", + "marks": [ + "89cc513e49f2" + ] + }, + { + "marks": [], + "text": ".", + "_key": "9e78ccaa040f2", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "8df044584c5b" + }, + { + "_type": "image", + "_key": "9db3354d0bc0", + "asset": { + "_ref": "image-86381b024e7fed16914933c27bbe38ccfd8e1218-2265x946-png", + "_type": "reference" + } + }, + { + "_type": "block", + "style": "h2", + "_key": "86b1f5b85f80", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The challenge of bulk RNA-Seq analysis", + "_key": "0a14112491e40", + "_type": "span" + } + ] + }, + { + "_key": "fd6999649d8e", + "markDefs": [ + { + "_type": "link", + "href": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4406561/", + "_key": "42ba36bdabd3" + }, + { + "_type": "link", + "href": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9718390/#pone.0278609.ref002", + "_key": "46970291c5f9" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "A single RNA-Seq experiment can generate ", + "_key": "b7221a3c3d190" + }, + { + "_key": "b7221a3c3d191", + "_type": "span", + "marks": [ + "42ba36bdabd3" + ], + "text": "gigabytes, or even terabytes" + }, + { + "marks": [], + "text": ", of raw data. Translating this data into meaningful scientific results demands ", + "_key": "b7221a3c3d192", + "_type": "span" + }, + { + "marks": [ + "46970291c5f9" + ], + "text": "substantial computational power, automation, and storage", + "_key": "b7221a3c3d193", + "_type": "span" + }, + { + "text": ".", + "_key": "b7221a3c3d194", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "197efa67c9e40" + } + ], + "_type": "block", + "style": "normal", + "_key": "b5e33c75a3a6" + }, + { + "_type": "block", + "style": "normal", + "_key": "9d6e316ffb52", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "As data volumes continue to grow, analysis becomes increasingly complex, especially when leveraging public resources while maintaining full sovereignty over your data. The solution?", + "_key": "66e0cb26931d0" + }, + { + "_key": "66e0cb26931d1", + "_type": "span", + "marks": [ + "strong" + ], + "text": " Seqera — a centralized bio data stack" + }, + { + "_type": "span", + "marks": [], + "text": " for bulk RNA-Seq analysis.", + "_key": "66e0cb26931d2" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "60fff7fa9e5d0", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "8191c9a75257" + }, + { + "markDefs": [], + "children": [ + { + "_key": "99674b059d630", + "_type": "span", + "marks": [], + "text": "In this blog post, we provide a step-by-step guide to analyze RNA-Seq data with Seqera, from quality control to differential expression analysis. We also demonstrate how to perform downstream analysis and visualize your data in a unified location." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "1f23b3e52023" + }, + { + "_type": "block", + "style": "normal", + "_key": "536666ed7688", + "markDefs": [], + "children": [ + { + "_key": "93b49ab6c33f0", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "_key": "f50c329381fa", + "markDefs": [ + { + "_type": "link", + "href": "https://hubs.la/Q02T26c10", + "_key": "c560b9e28fb8" + } + ], + "children": [ + { + "text": "Check out the full", + "_key": "b3a027e204c0", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": " ", + "_key": "8f1f21426b6b" + }, + { + "_type": "span", + "marks": [ + "strong", + "c560b9e28fb8" + ], + "text": "RNA-Seq guide", + "_key": "81f13e2f0aa9" + }, + { + "text": " ", + "_key": "6659f0e1d752", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "marks": [], + "text": "now", + "_key": "4c954b7253b7", + "_type": "span" + } + ], + "_type": "block", + "style": "blockquote" + }, + { + "style": "normal", + "_key": "1f530ea7976a", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "bb9706570246", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "c9caa0f35d6d", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Perform bulk RNA-Seq analysis in Seqera ", + "_key": "e0c48b21920f0", + "_type": "span" + } + ], + "_type": "block", + "style": "h2" + }, + { + "markDefs": [], + "children": [ + { + "_key": "af84e8b35d8a", + "_type": "span", + "marks": [], + "text": "\n" + } + ], + "_type": "block", + "style": "normal", + "_key": "063186af7ec2" + }, + { + "_type": "block", + "style": "h3", + "_key": "9e533abca7ae", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "1. Add a compute environment", + "_key": "4d8c8e640876" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "fdb30e0a53c3", + "markDefs": [], + "children": [ + { + "text": "In Seqera, you are not limited to hosted compute solutions. 
Add and configure your choice of cloud or HPC compute environments tailored to your analysis needs in your organization workspace.\n", + "_key": "d55935c16025", + "_type": "span", + "marks": [] + } + ] + }, + { + "_type": "block", + "style": "blockquote", + "_key": "d343783891dd", + "markDefs": [ + { + "_type": "link", + "href": "https://deploy-preview-131--seqera-docs.netlify.app/platform/24.1.1/getting-started/rnaseq#rna-seq-data-and-requirements", + "_key": "231c3c1f5d6e" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "💡 ", + "_key": "6630bd2611cb0" + }, + { + "marks": [ + "strong" + ], + "text": "Hint: ", + "_key": "d5d5e50eb8af", + "_type": "span" + }, + { + "marks": [], + "text": "Depending on the number of samples and the sequencing depth of your input data, select the desired ", + "_key": "e70a96b76c37", + "_type": "span" + }, + { + "marks": [ + "231c3c1f5d6e" + ], + "text": "compute and storage recommendations", + "_key": "6630bd2611cb1", + "_type": "span" + }, + { + "marks": [], + "text": " for your RNA-Seq analysis.", + "_key": "6630bd2611cb2", + "_type": "span" + } + ] + }, + { + "asset": { + "_ref": "image-11f5e2e5a1fdf1554329af5843be890dcf7f60b0-2452x1080-gif", + "_type": "reference" + }, + "_type": "image", + "_key": "7c3d10af89b6" + }, + { + "_key": "4f47a0ac33d5", + "markDefs": [ + { + "_type": "link", + "href": "https://hubs.la/Q02T26c10", + "_key": "92d7b15144de" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "See the ", + "_key": "3340851e8d700" + }, + { + "marks": [ + "92d7b15144de", + "strong" + ], + "text": "full RNASeq guide", + "_key": "c394ca815f2b", + "_type": "span" + }, + { + "marks": [], + "text": " for AWS Batch compute environment configuration steps.", + "_key": "e01af32724dd", + "_type": "span" + } + ], + "_type": "block", + "style": "blockquote" + }, + { + "_key": "b82d885bb30e", + "markDefs": [], + "children": [ + { + "text": "\n", + "_key": "338b9773274b", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "h3", + "_key": "3277fa6e7a1e", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "2. Add the nf-core/rnaseq pipeline to your workspace", + "_key": "f09af77e58640" + } + ] + }, + { + "_key": "8589cc2e751f", + "markDefs": [ + { + "_type": "link", + "href": "https://seqera.io/pipelines/", + "_key": "9dabd7634c9e" + } + ], + "children": [ + { + "text": "Quickly locate and import the nf-core/rnaseq pipeline from ", + "_key": "fff87197a1c90", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "9dabd7634c9e" + ], + "text": "Seqera Pipelines", + "_key": "ee81bf6bc3d1" + }, + { + "_type": "span", + "marks": [], + "text": ", the largest curated open source repository of Nextflow pipelines.\n", + "_key": "5490456192c1" + } + ], + "_type": "block", + "style": "normal" + }, + { + "asset": { + "_type": "reference", + "_ref": "image-cbd868250d3235cc42d5d2b9afed55cf4a51afc4-2452x1080-gif" + }, + "_type": "image", + "_key": "af17be5a781e" + }, + { + "markDefs": [], + "children": [ + { + "_key": "c923b38b50c7", + "_type": "span", + "marks": [], + "text": "\n" + } + ], + "_type": "block", + "style": "normal", + "_key": "9097e2b50848" + }, + { + "children": [ + { + "_key": "fef4b73d0a460", + "_type": "span", + "marks": [], + "text": "3. 
Add your input data" + } + ], + "_type": "block", + "style": "h3", + "_key": "b828d3ebe44c", + "markDefs": [] + }, + { + "_key": "29ca750385d3", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "d3e46de119d6", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://docs.seqera.io/platform/23.3/data/data-explorer", + "_key": "0b4ee73ccb98" + }, + { + "_key": "3d81dbad7494", + "_type": "link", + "href": "https://docs.seqera.io/platform/23.2/datasets/overview" + } + ], + "children": [ + { + "text": "Easily access your RNA-Seq data directly from cloud storage with ", + "_key": "9219b4d669940", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "0b4ee73ccb98" + ], + "text": "Data Explorer", + "_key": "66f2139acb5c" + }, + { + "text": ", or upload your samplesheets as CSV or TSV files with ", + "_key": "93a5a85286cc", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "3d81dbad7494" + ], + "text": "Seqera Datasets", + "_key": "d7ebdb1b9163", + "_type": "span" + }, + { + "marks": [], + "text": ".", + "_key": "174bdaa61b05", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "d3e6240f6a5b" + }, + { + "_type": "block", + "style": "normal", + "_key": "2e919cf9fcfe", + "markDefs": [], + "children": [ + { + "_key": "220e46a63df8", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "_type": "image", + "_key": "014b6b9185b0", + "asset": { + "_ref": "image-c4001e1a1358d7824560347d93e5f73380c2ecbc-2842x1430-gif", + "_type": "reference" + } + }, + { + "_type": "block", + "style": "normal", + "_key": "b670539c67cd", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "76b62d63229a" + } + ] + }, + { + "style": "blockquote", + "_key": "14b3055d38ba", + "markDefs": [ + { + "_type": "link", + "href": "https://docs.seqera.io/platform/24.1/getting-started/quickstart-demo/add-data", + "_key": "ee3e29b1836a" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "For more information on how to add samplesheets or other data to your workspace, see ", + "_key": "34d2e93139c10" + }, + { + "text": "Add data", + "_key": "34d2e93139c11", + "_type": "span", + "marks": [ + "ee3e29b1836a", + "strong" + ] + }, + { + "marks": [], + "text": ".", + "_key": "34d2e93139c12", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "\n4. 
Launch your RNA-Seq analysis", + "_key": "d8ffc77eb2eb0" + } + ], + "_type": "block", + "style": "h3", + "_key": "200f421da11f" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "140e7fde60d1" + } + ], + "_type": "block", + "style": "normal", + "_key": "7b5a1dd73392", + "markDefs": [] + }, + { + "_key": "297cf7a52db2", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "So far, you have:\n", + "_key": "7817985b2a6f0" + }, + { + "_key": "7817985b2a6f1", + "_type": "span", + "marks": [], + "text": "✔ Created a compute environment\n✔ Added a pipeline to your workspace\n✔ Made your RNA-Seq data accessible" + } + ], + "_type": "block", + "style": "blockquote" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "b6ded3b5950b" + } + ], + "_type": "block", + "style": "normal", + "_key": "66b8edceceae" + }, + { + "style": "normal", + "_key": "7372b7e9b188", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "With your compute environment, pipeline, and data all accessible in your Seqera workspace, you are now ready to launch your analysis.", + "_key": "327b97a200040" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "6640e6b2d8ad" + } + ], + "_type": "block", + "style": "normal", + "_key": "9e9e369ea4ea" + }, + { + "_key": "4e5386177db6", + "asset": { + "_type": "reference", + "_ref": "image-ec22f1a3f3bf30daa89a6e2299af6d90e324f5f1-2452x1080-gif" + }, + "_type": "image" + }, + { + "_type": "block", + "style": "normal", + "_key": "feeede0ec4b6", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "\n", + "_key": "464c6a3ffbbd" + } + ] + }, + { + "style": "h3", + "_key": "182c2347f18d", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "5. Monitor your pipeline run", + "_key": "020fdd0a93170", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "554f6ea184ee" + } + ], + "_type": "block", + "style": "normal", + "_key": "9a7c4161c7b5" + }, + { + "children": [ + { + "_key": "120b0356a9ef0", + "_type": "span", + "marks": [], + "text": "Monitor your RNA-Seq analysis in real-time with aggregated statistics, workflow metrics, execution logs, and task details." + } + ], + "_type": "block", + "style": "normal", + "_key": "d59004f41672", + "markDefs": [] + }, + { + "_key": "4912e62172b9", + "asset": { + "_ref": "image-9fd15d225aeb54b8c2841bc74a54e42a5c8bf410-2844x1390-gif", + "_type": "reference" + }, + "_type": "image" + }, + { + "_type": "block", + "style": "normal", + "_key": "4aec37e32e6f", + "markDefs": [], + "children": [ + { + "_key": "394eff86aae5", + "_type": "span", + "marks": [], + "text": "\n" + } + ] + }, + { + "style": "h3", + "_key": "562e0b16234f", + "markDefs": [], + "children": [ + { + "_key": "00861713cf910", + "_type": "span", + "marks": [], + "text": "6. 
Visualize results in a single, shareable report" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "", + "_key": "46601358193a", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "ea18e816a43b", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "59e41678413f", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Generate a single HTML report with MultiQC for your RNA-Seq analysis to assess the integrity of your results, including statistics, alignment scores, and quality control metrics. Easily share your findings with collaborators via the report URL.", + "_key": "c4666f11149d0" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "b19eb6571ffe", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "a8cf58fd9ac7" + } + ] + }, + { + "_type": "image", + "_key": "ce897e818b5d", + "asset": { + "_ref": "image-1adf78a2589c3429a67b2d2935dc62ac0139e06c-2452x1080-gif", + "_type": "reference" + } + }, + { + "markDefs": [], + "children": [ + { + "_key": "8bd8a33eaca0", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "bc8b22dc8bcb" + }, + { + "children": [ + { + "_key": "836844ce5743", + "_type": "span", + "marks": [], + "text": "💡" + }, + { + "_key": "f46f32f1d643", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Hint:" + }, + { + "_type": "span", + "marks": [], + "text": " Easily share your findings with collaborators via the report URL.", + "_key": "57892c4eb147" + } + ], + "_type": "block", + "style": "blockquote", + "_key": "86269ae87f15", + "markDefs": [] + }, + { + "_key": "bddab7973f06", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "\n7. Perform interactive downstream analysis adjacent to your pipeline outputs", + "_key": "b25adfd756ce0" + } + ], + "_type": "block", + "style": "h3" + }, + { + "children": [ + { + "text": "", + "_key": "bb7ed0bd126d", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "73f5877cf28c", + "markDefs": [] + }, + { + "children": [ + { + "text": "RNA-Seq analysis often requires human interpretation or further downstream analysis of pipeline outputs. For example, using ", + "_key": "d20a88decf2d", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "DESeq2", + "_key": "ee61aae39e891" + }, + { + "_type": "span", + "marks": [], + "text": " for differential gene expression analysis.", + "_key": "ee61aae39e892" + } + ], + "_type": "block", + "style": "normal", + "_key": "0fd3c84dff27", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "f200cfbaf920", + "markDefs": [ + { + "_type": "link", + "href": "https://docs.seqera.io/platform/24.1/data/data-studios", + "_key": "43bb4dfea049" + } + ], + "children": [ + { + "marks": [], + "text": "Bring interactive analytical notebook environments (RStudio, Jupyter, VSCode) adjacent to your data with ", + "_key": "764a091e4b5c0", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "43bb4dfea049" + ], + "text": "Seqera’s Data Studios", + "_key": "764a091e4b5c1" + }, + { + "_key": "764a091e4b5c2", + "_type": "span", + "marks": [], + "text": " and perform downstream analysis as if you were running locally." 
+ } + ] + }, + { + "_key": "fc4e3a11cc58", + "asset": { + "_ref": "image-9fed530cfba0aa3bd72f477449603e8bded83f09-2452x1080-gif", + "_type": "reference" + }, + "_type": "image" + }, + { + "children": [ + { + "_key": "d56aeedbf5a4", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "af81eb9ab77f", + "markDefs": [] + }, + { + "_type": "block", + "style": "blockquote", + "_key": "73600369d6a7", + "markDefs": [ + { + "href": "https://hubs.la/Q02T26c10", + "_key": "c027a3adceef", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Check out the ", + "_key": "153ee960e8ee0" + }, + { + "_type": "span", + "marks": [ + "c027a3adceef", + "strong" + ], + "text": "full RNASeq guide", + "_key": "f2f0efceb629" + }, + { + "text": " ", + "_key": "abc753492ace", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "text": "now", + "_key": "cba517eb8e9c", + "_type": "span", + "marks": [] + } + ] + }, + { + "_type": "block", + "style": "h2", + "_key": "ca0db6380e4d", + "markDefs": [], + "children": [ + { + "text": "\nTry Seqera for free", + "_key": "422d6784ced7", + "_type": "span", + "marks": [ + "strong" + ] + } + ] + }, + { + "children": [ + { + "_key": "e30a2b48e8c5", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "814ff460cdd6", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "By leveraging cloud-native technology, Seqera bridges the gap between experimental data and computational analysis, allowing you to accelerate the time from data generation to meaningful scientific insights.", + "_key": "f115a7417898" + } + ], + "_type": "block", + "style": "normal", + "_key": "f8ca9ce7dfd1" + }, + { + "_key": "d6ccd91918ea", + "markDefs": [ + { + "_key": "e87779c2247d", + "_type": "link", + "href": "https://hubs.la/Q02T26TB0" + } + ], + "children": [ + { + "_key": "8e509cdc34581", + "_type": "span", + "marks": [ + "e87779c2247d", + "strong" + ], + "text": "Sign-up" + }, + { + "_key": "22644e6e6b12", + "_type": "span", + "marks": [], + "text": " for free" + } + ], + "_type": "block", + "style": "blockquote" + } + ], + "_createdAt": "2024-10-02T07:26:47Z", + "_type": "blogPost", + "tags": [ + { + "_ref": "1b55a117-18fe-40cf-8873-6efd157a6058", + "_type": "reference", + "_key": "e4630a226ba3" + } + ], + "title": "Step-by-Step Series: RNA-Seq analysis in Seqera", + "publishedAt": "2024-10-11T07:54:00.000Z", + "author": { + "_type": "reference", + "_ref": "7691d57c-16a2-4ca7-a29a-fa5d9b158a3b" + } + }, + { + "_type": "blogPost", + "body": [ + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "em" + ], + "text": "This is a guest post authored by Maxime Garcia from the Science for Life Laboratory in Sweden. Max describes how they deploy complex cancer data analysis pipelines using Nextflow and Singularity. 
We are very happy to share their experience across the Nextflow community.", + "_key": "a5016cfc0d3f" + } + ], + "_type": "block", + "style": "normal", + "_key": "d3ae7d6b8b48" + }, + { + "_key": "795be87b0121", + "children": [ + { + "text": "", + "_key": "58cbc65b5de5", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "5a64db12bd62", + "children": [ + { + "text": "The CAW pipeline", + "_key": "143e44dfb014", + "_type": "span" + } + ], + "_type": "block", + "style": "h3" + }, + { + "_type": "image", + "alt": "Cancer Analysis Workflow logo", + "_key": "d7f04441979d", + "asset": { + "_ref": "image-dbe4ce75bccf9c7ff6b86a4daf9a3325f811dd92-197x100-png", + "_type": "reference" + } + }, + { + "_key": "662c2673c6bf", + "markDefs": [ + { + "href": "http://opensource.scilifelab.se/projects/sarek/", + "_key": "b5370da3669b", + "_type": "link" + }, + { + "_key": "376de3c4a017", + "_type": "link", + "href": "https://www.scilifelab.se/" + }, + { + "href": "https://ngisweden.scilifelab.se/", + "_key": "5eac11f777ec", + "_type": "link" + }, + { + "href": "https://www.scilifelab.se/facilities/ngi-stockholm/", + "_key": "c10c77ac5016", + "_type": "link" + }, + { + "href": "https://www.nbis.se/", + "_key": "edb4c35acfd4", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "b5370da3669b" + ], + "text": "Cancer Analysis Workflow", + "_key": "9cbbdd284485" + }, + { + "_type": "span", + "marks": [], + "text": " (CAW for short) is a Nextflow based analysis pipeline developed for the analysis of tumour: normal pairs. It is developed in collaboration with two infrastructures within ", + "_key": "ad26c75ea4f4" + }, + { + "_type": "span", + "marks": [ + "376de3c4a017" + ], + "text": "Science for Life Laboratory", + "_key": "9b4acb7fa111" + }, + { + "marks": [], + "text": ": ", + "_key": "b2878166cde1", + "_type": "span" + }, + { + "_key": "668aa26bdca4", + "_type": "span", + "marks": [ + "5eac11f777ec" + ], + "text": "National Genomics Infrastructure" + }, + { + "_key": "ee4e4dd86a75", + "_type": "span", + "marks": [], + "text": " (NGI), in The Stockholm " + }, + { + "_key": "12be2148a727", + "_type": "span", + "marks": [ + "c10c77ac5016" + ], + "text": "Genomics Applications Development Facility" + }, + { + "_type": "span", + "marks": [], + "text": " to be precise and ", + "_key": "a6862a44a7f8" + }, + { + "_key": "1b61c1ecda59", + "_type": "span", + "marks": [ + "edb4c35acfd4" + ], + "text": "National Bioinformatics Infrastructure Sweden" + }, + { + "_type": "span", + "marks": [], + "text": " (NBIS).", + "_key": "7a41341e9854" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "237118c93032", + "children": [ + { + "text": "", + "_key": "fb06baa4d5d6", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "28b5d5e00f3f", + "markDefs": [ + { + "href": "https://software.broadinstitute.org/gatk/best-practices/", + "_key": "400b1a893741", + "_type": "link" + }, + { + "href": "https://github.com/broadinstitute/mutect/", + "_key": "29d1f2d25a2d", + "_type": "link" + }, + { + "_type": "link", + "href": "https://github.com/broadgsa/gatk-protected/", + "_key": "ec5a7e979f1f" + }, + { + "_type": "link", + "href": "https://github.com/Illumina/strelka/", + "_key": "ffc06c8a5432" + }, + { + "_type": "link", + "href": "https://github.com/ekg/freebayes/", + "_key": "2a2f890970e8" + }, + { + "_type": "link", + "href": "https://github.com/broadgsa/gatk-protected/", + "_key": "affda0050980" + }, + 
{ + "_type": "link", + "href": "https://github.com/Illumina/manta/", + "_key": "3d1af233cd71" + }, + { + "_type": "link", + "href": "https://github.com/Crick-CancerGenomics/ascat/", + "_key": "75d0de940ae1" + }, + { + "href": "http://snpeff.sourceforge.net/", + "_key": "3c1912c340f6", + "_type": "link" + }, + { + "_key": "bc474e49db24", + "_type": "link", + "href": "https://www.ensembl.org/info/docs/tools/vep/index.html" + }, + { + "href": "http://multiqc.info/", + "_key": "7062b31b0eb9", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "CAW is based on ", + "_key": "54c97875242f" + }, + { + "_key": "f4d2fb053447", + "_type": "span", + "marks": [ + "400b1a893741" + ], + "text": "GATK Best Practices" + }, + { + "marks": [], + "text": " for the preprocessing of FastQ files, then uses various variant calling tools to look for somatic SNVs and small indels (", + "_key": "b8fa3833e0d5", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "29d1f2d25a2d" + ], + "text": "MuTect1", + "_key": "dfc943c67a90" + }, + { + "_type": "span", + "marks": [], + "text": ", ", + "_key": "0f3600d339b1" + }, + { + "text": "MuTect2", + "_key": "a99a6bca29bc", + "_type": "span", + "marks": [ + "ec5a7e979f1f" + ] + }, + { + "text": ", ", + "_key": "a6b4d26158f8", + "_type": "span", + "marks": [] + }, + { + "text": "Strelka", + "_key": "80ff453e43b3", + "_type": "span", + "marks": [ + "ffc06c8a5432" + ] + }, + { + "_type": "span", + "marks": [], + "text": ", ", + "_key": "0afcd1050f50" + }, + { + "_type": "span", + "marks": [ + "2a2f890970e8" + ], + "text": "Freebayes", + "_key": "dd6f5967f128" + }, + { + "_type": "span", + "marks": [], + "text": "), (", + "_key": "f759ba55a4ab" + }, + { + "_type": "span", + "marks": [ + "affda0050980" + ], + "text": "GATK HaplotyeCaller", + "_key": "ec8897e6cfe0" + }, + { + "marks": [], + "text": "), for structural variants(", + "_key": "2a408fe12fc4", + "_type": "span" + }, + { + "text": "Manta", + "_key": "26af2e6cca8b", + "_type": "span", + "marks": [ + "3d1af233cd71" + ] + }, + { + "text": ") and for CNVs (", + "_key": "277875b6e802", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "75d0de940ae1" + ], + "text": "ASCAT", + "_key": "3f1ad521d274", + "_type": "span" + }, + { + "text": "). Annotation tools (", + "_key": "af749cb8be5f", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "3c1912c340f6" + ], + "text": "snpEff", + "_key": "4bd03567c4f1", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ", ", + "_key": "f3804b9f0ef7" + }, + { + "_type": "span", + "marks": [ + "bc474e49db24" + ], + "text": "VEP", + "_key": "d64ba9fe7363" + }, + { + "_key": "f0c23ac43bfe", + "_type": "span", + "marks": [], + "text": ") are also used, and finally " + }, + { + "_key": "4ff43b64b73f", + "_type": "span", + "marks": [ + "7062b31b0eb9" + ], + "text": "MultiQC" + }, + { + "_key": "c759b797cfc5", + "_type": "span", + "marks": [], + "text": " for handling reports." 
+ }
      ],
      "_type": "block"
    },
    {
      "style": "normal",
      "_key": "236fa86215f7",
      "children": [
        {
          "_key": "fab4f4751001",
          "_type": "span",
          "text": ""
        }
      ],
      "_type": "block"
    },
    {
      "markDefs": [
        {
          "_key": "1115444b3f9c",
          "_type": "link",
          "href": "https://github.com/SciLifeLab/CAW/"
        },
        {
          "_key": "4f341343a76a",
          "_type": "link",
          "href": "https://gitter.im/SciLifeLab/CAW/"
        }
      ],
      "children": [
        {
          "_key": "0927870d18a0",
          "_type": "span",
          "marks": [],
          "text": "We are currently working on a manuscript, but you're welcome to look at (or even contribute to) our "
        },
        {
          "marks": [
            "1115444b3f9c"
          ],
          "text": "github repository",
          "_key": "4ae440c4e9b0",
          "_type": "span"
        },
        {
          "_type": "span",
          "marks": [],
          "text": " or talk with us on our ",
          "_key": "1c7e6f22debd"
        },
        {
          "marks": [
            "4f341343a76a"
          ],
          "text": "gitter channel",
          "_key": "f39052b7cf1e",
          "_type": "span"
        },
        {
          "text": ".",
          "_key": "65b9842de381",
          "_type": "span",
          "marks": []
        }
      ],
      "_type": "block",
      "style": "normal",
      "_key": "f44d0fa4804b"
    },
    {
      "children": [
        {
          "_type": "span",
          "text": "",
          "_key": "6ee67b01bef2"
        }
      ],
      "_type": "block",
      "style": "normal",
      "_key": "cbae57b40bee"
    },
    {
      "children": [
        {
          "_key": "a1e675dd46b5",
          "_type": "span",
          "text": "Singularity and UPPMAX"
        }
      ],
      "_type": "block",
      "style": "h3",
      "_key": "82375d0c0c2c"
    },
    {
      "style": "normal",
      "_key": "ad28f23966ae",
      "markDefs": [
        {
          "_key": "6b1dda8f8f1a",
          "_type": "link",
          "href": "http://singularity.lbl.gov/"
        }
      ],
      "children": [
        {
          "_key": "dbbc5e85c35e",
          "_type": "span",
          "marks": [
            "6b1dda8f8f1a"
          ],
          "text": "Singularity"
        },
        {
          "text": " is a tool to package software dependencies into a contained environment, much like Docker. It's designed to run on HPC environments where Docker is often a problem due to its requirement for administrative privileges.",
          "_key": "69dff9bf4e2c",
          "_type": "span",
          "marks": []
        }
      ],
      "_type": "block"
    },
    {
      "_type": "block",
      "style": "normal",
      "_key": "da1510958d2b",
      "children": [
        {
          "_type": "span",
          "text": "",
          "_key": "302b409d9911"
        }
      ]
    },
    {
      "style": "normal",
      "_key": "8b40855a955f",
      "markDefs": [
        {
          "_type": "link",
          "href": "https://uppmax.uu.se/",
          "_key": "f82be797dbce"
        },
        {
          "_type": "link",
          "href": "https://www.uppmax.uu.se/projects-and-collaborations/snic-sens/",
          "_key": "83733d3d4572"
        }
      ],
      "children": [
        {
          "marks": [],
          "text": "We're based in Sweden, and ",
          "_key": "3f2e0e95376c",
          "_type": "span"
        },
        {
          "_type": "span",
          "marks": [
            "f82be797dbce"
          ],
          "text": "Uppsala Multidisciplinary Center for Advanced Computational Science",
          "_key": "24111e45e4c0"
        },
        {
          "text": " (UPPMAX) provides computational infrastructure for all Swedish researchers. 
Since we're analyzing sensitive data, we are using secure clusters (with a two factor authentication), set up by UPPMAX: ", + "_key": "45b292061e84", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "83733d3d4572" + ], + "text": "SNIC-SENS", + "_key": "a9fe33761df1", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "6b3a7646742b" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "41726010f3fa", + "children": [ + { + "text": "", + "_key": "223b538c3960", + "_type": "span" + } + ] + }, + { + "markDefs": [ + { + "_key": "d9ccc539cbaf", + "_type": "link", + "href": "https://www.uppmax.uu.se/resources/systems/the-bianca-cluster/" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In my case, since we're still developing the pipeline, I am mainly using the research cluster ", + "_key": "b418a11871ae" + }, + { + "_key": "f4766386d16f", + "_type": "span", + "marks": [ + "d9ccc539cbaf" + ], + "text": "Bianca" + }, + { + "text": ". So I can only transfer files and data in one specific repository using SFTP.", + "_key": "cb5de068d31a", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "1e48dd24f1df" + }, + { + "_type": "block", + "style": "normal", + "_key": "c81d9aca8939", + "children": [ + { + "_key": "dc1cbdcbc771", + "_type": "span", + "text": "" + } + ] + }, + { + "style": "normal", + "_key": "711a5e817264", + "markDefs": [ + { + "href": "http://modules.sourceforge.net/", + "_key": "f8a2d45700e5", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "UPPMAX provides computing resources for Swedish researchers for all scientific domains, so getting software updates can occasionally take some time. Typically, ", + "_key": "fd1d47c71b0c" + }, + { + "_type": "span", + "marks": [ + "f8a2d45700e5" + ], + "text": "Environment Modules", + "_key": "45ebdc608a21" + }, + { + "_type": "span", + "marks": [], + "text": " are used which allow several versions of different tools - this is good for reproducibility and is quite easy to use. However, the approach is not portable across different clusters outside of UPPMAX.", + "_key": "1c0b2168107c" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "dfdc4d7a5265", + "children": [ + { + "_type": "span", + "text": "", + "_key": "87a0557545fb" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "Why use containers?", + "_key": "1af05e5d272f", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "b4ce1f703f10" + }, + { + "_key": "a451f3b0f292", + "markDefs": [ + { + "_type": "link", + "href": "https://www.docker.com/", + "_key": "819ab4284c07" + }, + { + "href": "http://singularity.lbl.gov/", + "_key": "e722a5846d88", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "The idea of using containers, for improved portability and reproducibility, and more up to date tools, came naturally to us, as it is easily managed within Nextflow. 
We cannot use ", + "_key": "a627e3673f4c", + "_type": "span" + }, + { + "_key": "7f88b603e6ff", + "_type": "span", + "marks": [ + "819ab4284c07" + ], + "text": "Docker" + }, + { + "text": " on our secure cluster, so we wanted to run CAW with ", + "_key": "d10fd389a827", + "_type": "span", + "marks": [] + }, + { + "_key": "9ff91e90189c", + "_type": "span", + "marks": [ + "e722a5846d88" + ], + "text": "Singularity" + }, + { + "_key": "680217768a1b", + "_type": "span", + "marks": [], + "text": " images instead." + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "", + "_key": "594afc570d37", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "583735e027d4" + }, + { + "_type": "block", + "style": "h3", + "_key": "5cc82bec4c7c", + "children": [ + { + "_type": "span", + "text": "How was the switch made?", + "_key": "b9eba064ce25" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "2fd5b5b4bc8d", + "markDefs": [ + { + "_key": "0cc4622f76ee", + "_type": "link", + "href": "https://github.com/SciLifeLab/CAW/blob/master/buildContainers.nf" + } + ], + "children": [ + { + "_key": "81ba9fe29a97", + "_type": "span", + "marks": [], + "text": "We were already using Docker containers for our continuous integration testing with Travis, and since we use many tools, I took the approach of making (almost) a container for each process. Because this process is quite slow, repetitive and I" + }, + { + "text": "~~'m lazy~~", + "_key": "86f9343566a4", + "_type": "span" + }, + { + "marks": [], + "text": " like to automate everything, I made a simple NF ", + "_key": "78f05308c2b2", + "_type": "span" + }, + { + "_key": "c80dcde038ad", + "_type": "span", + "marks": [ + "0cc4622f76ee" + ], + "text": "script" + }, + { + "_type": "span", + "marks": [], + "text": " to build and push all docker containers. 
Basically it's just ", + "_key": "40d58c805faa" + }, + { + "text": "build", + "_key": "e1f1d906d978", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "marks": [], + "text": " and ", + "_key": "10e5431bf0af", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "pull", + "_key": "468c093d09fa" + }, + { + "marks": [], + "text": " for all containers, with some configuration possibilities.", + "_key": "57938a3ddb31", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "65165a0413c5", + "children": [ + { + "text": "", + "_key": "1e1281cc4660", + "_type": "span" + } + ] + }, + { + "code": "docker build -t ${repository}/${container}:${tag} ${baseDir}/containers/${container}/.\n\ndocker push ${repository}/${container}:${tag}", + "_type": "code", + "_key": "7981801cbeed" + }, + { + "_key": "dc0efc43a64d", + "children": [ + { + "_type": "span", + "text": "", + "_key": "300b67a22d42" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "marks": [], + "text": "Since Singularity can directly pull images from DockerHub, I made the build script to pull all containers from DockerHub to have local Singularity image files.", + "_key": "7cbb1511785f", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "71b4b6d6cc7f", + "markDefs": [] + }, + { + "children": [ + { + "_key": "8ec40d876508", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "eeddff34f6f9" + }, + { + "code": "singularity pull --name ${container}-${tag}.img docker://${repository}/${container}:${tag}", + "_type": "code", + "_key": "7d5e902fc937" + }, + { + "_key": "51c0d6ad7268", + "children": [ + { + "_key": "3240935023e8", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "href": "https://github.com/SciLifeLab/CAW/blob/master/configuration/singularity-path.config", + "_key": "cebbf0178117", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "After this, it's just a matter of moving all containers to the secure cluster we're using, and using the right configuration file in the profile. I'll spare you the details of the SFTP transfer. 
This is what the configuration file for such Singularity images looks like: ", + "_key": "dfa0c39bab14" + }, + { + "_key": "0480b3a9196d", + "_type": "span", + "marks": [ + "cebbf0178117" + ], + "text": "`singularity-path.config`" + } + ], + "_type": "block", + "style": "normal", + "_key": "6a828caff9a6" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "eb76bcac33bc" + } + ], + "_type": "block", + "style": "normal", + "_key": "0537320059ac" + }, + { + "_type": "code", + "_key": "6d498f3b1666", + "code": "/*\nvim: syntax=groovy\n-*- mode: groovy;-*-\n * -------------------------------------------------\n * Nextflow config file for CAW project\n * -------------------------------------------------\n * Paths to Singularity images for every process\n * No image will be pulled automatically\n * Need to transfer and set up images before\n * -------------------------------------------------\n */\n\nsingularity {\n enabled = true\n runOptions = \"--bind /scratch\"\n}\n\nparams {\n containerPath='containers'\n tag='1.2.3'\n}\n\nprocess {\n $ConcatVCF.container = \"${params.containerPath}/caw-${params.tag}.img\"\n $RunMultiQC.container = \"${params.containerPath}/multiqc-${params.tag}.img\"\n $IndelRealigner.container = \"${params.containerPath}/gatk-${params.tag}.img\"\n // I'm not putting the whole file here\n // you probably already got the point\n}" + }, + { + "children": [ + { + "text": "", + "_key": "094a711d3b0b", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "2255d8ad22ba" + }, + { + "_key": "b792900f8d76", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "This approach ran (almost) perfectly on the first try, except a process failing due to a typo on a container name...", + "_key": "3e773860cba6" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "74150686d861", + "children": [ + { + "_type": "span", + "text": "", + "_key": "8cb827f459d9" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "h3", + "_key": "cf88e53506ee", + "children": [ + { + "_type": "span", + "text": "Conclusion", + "_key": "617aec92026d" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "ca50cf71162e", + "_type": "span", + "marks": [], + "text": "This switch was completed a couple of months ago and has been a great success. We are now using Singularity containers in almost all of our Nextflow pipelines developed at NGI. 
Even if we do enjoy the improved control, we must not forgot that:" + } + ], + "_type": "block", + "style": "normal", + "_key": "449863778fd8" + }, + { + "_type": "block", + "style": "normal", + "_key": "239f444c4481", + "children": [ + { + "text": "", + "_key": "7c9d480ae59c", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "3b97c6dd8db9", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "> With great power comes great responsibility!", + "_key": "6b83a6d639a8" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "e22b13c61dc4" + } + ], + "_type": "block", + "style": "normal", + "_key": "039aa16680d4" + }, + { + "_type": "block", + "style": "h3", + "_key": "7d6d9a7c878e", + "children": [ + { + "_type": "span", + "text": "Credits", + "_key": "5966c10264f9" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "92c3bb0be030", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/Hammarn", + "_key": "b929c7c6c76f" + }, + { + "href": "http://phil.ewels.co.uk/", + "_key": "db38f32bd3af", + "_type": "link" + } + ], + "children": [ + { + "text": "Thanks to ", + "_key": "280624830fa5", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "b929c7c6c76f" + ], + "text": "Rickard Hammarén", + "_key": "e783ab24478a" + }, + { + "_key": "29dd480cdda4", + "_type": "span", + "marks": [], + "text": " and " + }, + { + "text": "Phil Ewels", + "_key": "439fe18a2258", + "_type": "span", + "marks": [ + "db38f32bd3af" + ] + }, + { + "marks": [], + "text": " for comments and suggestions for improving the post.", + "_key": "3ed8309b3f11", + "_type": "span" + } + ] + } + ], + "author": { + "_ref": "c121be61-087a-4ca7-a3c2-1729e5d706f3", + "_type": "reference" + }, + "_id": "4f0eddd5c83c", + "_createdAt": "2024-09-25T14:15:12Z", + "meta": { + "slug": { + "current": "caw-and-singularity" + } + }, + "title": "Running CAW with Singularity and Nextflow", + "_updatedAt": "2024-09-26T09:01:36Z", + "tags": [ + { + "_key": "52686bd7856a", + "_ref": "ace8dd2c-eed3-4785-8911-d146a4e84bbb", + "_type": "reference" + }, + { + "_type": "reference", + "_key": "2346151467fb", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493" + } + ], + "publishedAt": "2017-11-16T07:00:00.000Z", + "_rev": "g7tG3ShgLiOybM4TXYtt6r" + }, + { + "_updatedAt": "2024-10-02T13:55:57Z", + "_type": "blogPost", + "publishedAt": "2016-02-04T07:00:00.000Z", + "meta": { + "description": "As a new bioinformatics student with little formal computer science training, there are few things that scare me more than PhD committee meetings and having to run my code in a completely different operating environment.", + "slug": { + "current": "developing-bioinformatics-pipeline-across-multiple-environments" + } + }, + "body": [ + { + "style": "normal", + "_key": "0b36fa3f630d", + "markDefs": [], + "children": [ + { + "text": "As a new bioinformatics student with little formal computer science training, there are few things that scare me more than PhD committee meetings and having to run my code in a completely different operating environment.", + "_key": "a4c1d97a50cd", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "3c60a17143d2", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "7f71794298d8", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "6f0a1be725d3", + "markDefs": [ + { + "_type": "link", + "href": 
"https://en.wikipedia.org/wiki/Univa_Grid_Engine", + "_key": "db498cbe4bfb" + }, + { + "href": "http://www.bsc.es", + "_key": "d43b1cf0fab1", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Recently my work landed me in the middle of the phylogenetic tree jungle and the computational requirements of my project far outgrew the resources that were available on our institute’s ", + "_key": "0739d43394bf" + }, + { + "marks": [ + "db498cbe4bfb" + ], + "text": "Univa Grid Engine", + "_key": "26bbebd55519", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " based cluster. Luckily for me, an opportunity arose to participate in a joint program at the MareNostrum HPC at the ", + "_key": "afd6a3f83f86" + }, + { + "_type": "span", + "marks": [ + "d43b1cf0fab1" + ], + "text": "Barcelona Supercomputing Centre", + "_key": "bc0092f22b6e" + }, + { + "text": " (BSC).", + "_key": "e8fecd0a44d2", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "7e7e58379eb8" + } + ], + "_type": "block", + "style": "normal", + "_key": "43f45d9085ea", + "markDefs": [] + }, + { + "markDefs": [ + { + "href": "https://www.bsc.es/discover-bsc/the-centre/marenostrum", + "_key": "71b1efd06bb6", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "As one of the top 100 supercomputers in the world, the ", + "_key": "470fd7b5ac2c" + }, + { + "_key": "a3d121da5a9d", + "_type": "span", + "marks": [ + "71b1efd06bb6" + ], + "text": "MareNostrum III" + }, + { + "text": " dwarfs our cluster and consists of nearly 50'000 processors. However it soon became apparent that with great power comes great responsibility and in the case of the BSC, great restrictions. These include no internet access, restrictive wall times for jobs, longer queues, fewer pre-installed binaries and an older version of bash. Faced with the possibility of having to rewrite my 16 bodged scripts for another queuing system I turned to Nextflow.", + "_key": "353e24b139b3", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "8e2f3402708c" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "1f200d0e40c6" + } + ], + "_type": "block", + "style": "normal", + "_key": "22e9cdc235c1" + }, + { + "markDefs": [], + "children": [ + { + "text": "Straight off the bat I was able to reduce all my previous scripts to a single Nextflow script. Admittedly, the original code was not great, but the data processing model made me feel confident in what I was doing and I was able to reduce the volume of code to 25% of its initial amount whilst making huge improvements in the readability. 
The real benefits, however, came from the portability.",
          "_key": "0028ffdfa590",
          "_type": "span",
          "marks": []
        }
      ],
      "_type": "block",
      "style": "normal",
      "_key": "00412316556d"
    },
    {
      "_key": "63fcec60e600",
      "markDefs": [],
      "children": [
        {
          "text": "",
          "_key": "b609f84d8d9c",
          "_type": "span",
          "marks": []
        }
      ],
      "_type": "block",
      "style": "normal"
    },
    {
      "markDefs": [
        {
          "_type": "link",
          "href": "https://en.wikipedia.org/wiki/Platform_LSF",
          "_key": "a28ccdd5264c"
        },
        {
          "_type": "link",
          "href": "/blog/2015/mpi-like-execution-with-nextflow.html",
          "_key": "b56c39de2824"
        }
      ],
      "children": [
        {
          "_type": "span",
          "marks": [],
          "text": "I was able to write the project on my laptop (MacBook Air), continuously test it on my local desktop machine (Linux) and then perform more realistic heavy lifting runs on the cluster, all managed from a single GitHub repository. The BSC uses the ",
          "_key": "feac2fcfafb4"
        },
        {
          "_key": "9137207ae85f",
          "_type": "span",
          "marks": [
            "a28ccdd5264c"
          ],
          "text": "Load Sharing Facility"
        },
        {
          "_key": "e1f5505bcc40",
          "_type": "span",
          "marks": [],
          "text": " (LSF) platform with longer queue times, but a large number of CPUs. My project on the other hand had datasets that require over 100'000 tasks, but the task processes themselves run for a matter of seconds or minutes. We were able to marry these two competing interests by deploying Nextflow in a "
        },
        {
          "text": "distributed execution manner that resembles that of an MPI application",
          "_key": "f0e0969bb592",
          "_type": "span",
          "marks": [
            "b56c39de2824"
          ]
        },
        {
          "_type": "span",
          "marks": [],
          "text": ".",
          "_key": "840b93914905"
        }
      ],
      "_type": "block",
      "style": "normal",
      "_key": "5b6e8f1ff75d"
    },
    {
      "children": [
        {
          "marks": [],
          "text": "",
          "_key": "e7b51bef3be8",
          "_type": "span"
        }
      ],
      "_type": "block",
      "style": "normal",
      "_key": "e709542e24f2",
      "markDefs": []
    },
    {
      "children": [
        {
          "marks": [],
          "text": "In this configuration, the queuing system allocates the Nextflow requested resources and, using the embedded ",
          "_key": "0f48af11ef84",
          "_type": "span"
        },
        {
          "_key": "380f3e8db74c",
          "_type": "span",
          "marks": [
            "2ecafe164e43"
          ],
          "text": "Apache Ignite"
        },
        {
          "text": " clustering engine, Nextflow handles the submission of processes to the individual nodes.",
          "_key": "e9060c6026b3",
          "_type": "span",
          "marks": []
        }
      ],
      "_type": "block",
      "style": "normal",
      "_key": "a3cc5d75ed20",
      "markDefs": [
        {
          "_type": "link",
          "href": "https://ignite.apache.org/",
          "_key": "2ecafe164e43"
        }
      ]
    },
    {
      "style": "normal",
      "_key": "714aa6049f8d",
      "markDefs": [],
      "children": [
        {
          "marks": [],
          "text": "",
          "_key": "b4adcbf5b019",
          "_type": "span"
        }
      ],
      "_type": "block"
    },
    {
      "markDefs": [],
      "children": [
        {
          "_type": "span",
          "marks": [],
          "text": "Here are some examples of how to run the same Nextflow project over multiple platforms.",
          "_key": "7f1d06ceed71"
        }
      ],
      "_type": "block",
      "style": "normal",
      "_key": "af18db018f1d"
    },
    {
      "markDefs": [],
      "children": [
        {
          "text": "",
          "_key": "3c0c2c1afd59",
          "_type": "span",
          "marks": []
        }
      ],
      "_type": "block",
      "style": "normal",
      "_key": "ed1edccffba6"
    },
    {
      "markDefs": [],
      "children": [
        {
          "_type": "span",
          "marks": [],
          "text": "Local",
          "_key": "b636a86a840f"
        }
      ],
      "_type": "block",
      "style": "h4",
      "_key": "617daea92f56"
+ }, + { + "_key": "acd7c0c7eb14", + "markDefs": [], + "children": [ + { + "_key": "67b9ac226123", + "_type": "span", + "marks": [], + "text": "If I wished to launch a job locally I can run it with the command:" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "c6f30a0001e1", + "markDefs": [], + "children": [ + { + "_key": "1cea880b896e", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "code": "nextflow run myproject.nf", + "_type": "code", + "_key": "a47c9d825b9e" + }, + { + "_type": "block", + "style": "h4", + "_key": "e97a016d2fc0", + "markDefs": [], + "children": [ + { + "text": "Univa Grid Engine (UGE)", + "_key": "9a90f0d708f0", + "_type": "span", + "marks": [] + } + ] + }, + { + "style": "normal", + "_key": "32578d372f9d", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "For the UGE I simply needed to specify the following in the ", + "_key": "e6fda01d6317" + }, + { + "_key": "0ff3f079d142", + "_type": "span", + "marks": [ + "code" + ], + "text": "nextflow.config" + }, + { + "_type": "span", + "marks": [], + "text": " file:", + "_key": "c69a73aca4f8" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "ffb9a5a90cdf" + } + ], + "_type": "block", + "style": "normal", + "_key": "686743efef97" + }, + { + "_key": "5a41750313c4", + "code": "process {\n executor='uge'\n queue='my_queue'\n}", + "_type": "code" + }, + { + "_type": "block", + "style": "normal", + "_key": "58541385a7ce", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "And then launch the pipeline execution as we did before:", + "_key": "cb8252b251ef" + } + ] + }, + { + "children": [ + { + "_key": "917d1be366bc", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "0b7cf33a1c1c", + "markDefs": [] + }, + { + "_type": "code", + "_key": "6bc483d39d1f", + "code": "nextflow run myproject.nf" + }, + { + "_type": "block", + "style": "h4", + "_key": "5f6ca8e3ccb4", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Load Sharing Facility (LSF)", + "_key": "9d345ec573f6" + } + ] + }, + { + "_key": "5cf745e4feb5", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "For running the same pipeline in the MareNostrum HPC environment, taking advantage of the MPI standard to deploy my workload, I first created a wrapper script (for example ", + "_key": "574ad2f77840", + "_type": "span" + }, + { + "text": "bsc-wrapper.sh", + "_key": "a7e434c9a1cd", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_key": "a31766699b87", + "_type": "span", + "marks": [], + "text": ") declaring the resources that I want to reserve for the pipeline execution:" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "", + "_key": "d9c2aff93d23", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "85bda35c7c6c", + "markDefs": [] + }, + { + "_key": "ee8656e5d47a", + "code": "#!/bin/bash\n#BSUB -oo logs/output_%J.out\n#BSUB -eo logs/output_%J.err\n#BSUB -J myProject\n#BSUB -q bsc_ls\n#BSUB -W 2:00\n#BSUB -x\n#BSUB -n 512\n#BSUB -R \"span[ptile=16]\"\nexport NXF_CLUSTER_SEED=$(shuf -i 0-16777216 -n 1)\nmpirun --pernode bin/nextflow run concMSA.nf -with-mpi", + "_type": "code" + }, + { + "style": "normal", + "_key": "0f8d5e627b66", + "markDefs": [], + "children": [ + { + 
"marks": [], + "text": "And then can execute it using ", + "_key": "59f700ea9f3e", + "_type": "span" + }, + { + "text": "bsub", + "_key": "78740e128847", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "text": " as shown below:", + "_key": "cc2767bdec19", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "9ba90be09ded", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "3a0517bacd79" + } + ] + }, + { + "code": "bsub < bsc-wrapper.sh", + "_type": "code", + "_key": "0dec03443e56" + }, + { + "_type": "block", + "style": "normal", + "_key": "742300ca9ce1", + "markDefs": [ + { + "_type": "link", + "href": "/docs/latest/getstarted.html?highlight=resume#modify-and-resume", + "_key": "9d682263bd9a" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "By running Nextflow in this way and given the wrapper above, a single ", + "_key": "8811f7ad3a7c" + }, + { + "_key": "a3a508ece92c", + "_type": "span", + "marks": [ + "code" + ], + "text": "bsub" + }, + { + "_key": "4c12180e404f", + "_type": "span", + "marks": [], + "text": " job will run on 512 cores in 32 computing nodes (512/16 = 32) with a maximum wall time of 2 hours. Thousands of Nextflow processes can be spawned during this and the execution can be monitored in the standard manner from a single Nextflow output and error files. If any errors occur the execution can of course to continued with " + }, + { + "_key": "5d3cf754246e", + "_type": "span", + "marks": [ + "9d682263bd9a" + ], + "text": "`-resume` command line option" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "d050dd9a5d70" + } + ] + }, + { + "style": "normal", + "_key": "ef8fa6091372", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "08b706c77b83", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_key": "b14a0aca49d4", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Conclusion", + "_key": "a1ca81102c1f" + } + ], + "_type": "block", + "style": "h3" + }, + { + "_key": "fb95f3ed5208", + "markDefs": [], + "children": [ + { + "text": "Nextflow provides a simplified way to develop across multiple platforms and removes much of the overhead associated with running niche, user developed pipelines in an HPC environment.", + "_key": "cac33061df57", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + } + ], + "tags": [ + { + "_ref": "ace8dd2c-eed3-4785-8911-d146a4e84bbb", + "_type": "reference", + "_key": "efd26efcb394" + }, + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "998f1b93b325" + } + ], + "_id": "5210e5ecc723", + "title": "Developing a bioinformatics pipeline across multiple environments", + "_createdAt": "2024-09-25T14:15:04Z", + "_rev": "Ot9x7kyGeH5005E3MJ8z3o", + "author": { + "_ref": "evan-floden", + "_type": "reference" + } + }, + { + "_type": "blogPost", + "publishedAt": "2023-04-25T06:00:00.000Z", + "title": "Celebrating our largest international training event and hackathon to date", + "_updatedAt": "2024-09-30T09:50:31Z", + "meta": { + "description": "In mid-March, we conducted our bi-annual Nextflow and nf-core training and hackathon in what was unquestionably our best-attended community events to date. This year we had an impressive 1,345 participants attend the training from 76 countries. 
Attendees came from far and wide — from Algeria to Andorra to Zambia to Zimbabwe!", + "slug": { + "current": "celebrating-our-largest-international-training-event-and-hackathon-to-date" + } + }, + "body": [ + { + "_key": "f74f380b6aa0", + "markDefs": [ + { + "_type": "link", + "href": "https://nf-co.re/", + "_key": "90d55f5bc90f" + } + ], + "children": [ + { + "marks": [], + "text": "In mid-March, we conducted our bi-annual Nextflow and ", + "_key": "cd8d22bd916f", + "_type": "span" + }, + { + "marks": [ + "90d55f5bc90f" + ], + "text": "nf-core", + "_key": "4946b78ec00f", + "_type": "span" + }, + { + "marks": [], + "text": " training and hackathon in what was unquestionably our best-attended community events to date. This year we had an impressive ", + "_key": "a58afd886090", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "1,345 participants", + "_key": "d1f20dc963a6" + }, + { + "_key": "75f0d6e8725c", + "_type": "span", + "marks": [], + "text": " attend the training from " + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "76 countries", + "_key": "6cce4de9885f" + }, + { + "text": ". Attendees came from far and wide — from Algeria to Andorra to Zambia to Zimbabwe!", + "_key": "3b07c81f40b2", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "93757cd7cf59", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "adf4c95fb838", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "509c9bdfd8d0", + "markDefs": [], + "children": [ + { + "text": "Among our event attendees, we observed the following statistics:", + "_key": "a693b9eaebf90", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "40% were 30 years old or younger, pointing to a young cohort of Nextflow users", + "_key": "6ea1c9e3bd230" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "401111c4b8f7", + "listItem": "bullet" + }, + { + "_type": "block", + "style": "normal", + "_key": "9dd7368ecfa7", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "55.3% identified as male vs. 
40% female, highlighting our growing diversity", + "_key": "553b886f27160", + "_type": "span" + } + ], + "level": 1 + }, + { + "_key": "920831cc1322", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "68.2% came from research institutions", + "_key": "bc77c06b630f0", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "71.4% were attending their first Nextflow training event", + "_key": "df6dd574f3370", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "839707345b77" + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "96.7% had never attended a Nextflow hackathon", + "_key": "2f4e07d3f5fa0", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "c8cb59d82af7" + }, + { + "children": [ + { + "_key": "c641bb176c01", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "1f5fc152f9bf", + "markDefs": [] + }, + { + "_key": "185b35fdf1b1", + "markDefs": [ + { + "href": "https://www.youtube.com/playlist?list=PL3xpfTVZLcNhoWxHR0CS-7xzu5eRT8uHo", + "_key": "2ae097a6b418", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "Read on to learn more about these exciting events. If you missed it, you can still ", + "_key": "788d7d542c7a", + "_type": "span" + }, + { + "marks": [ + "2ae097a6b418" + ], + "text": "watch the Nextflow & nf-core training", + "_key": "93dcf12dab61", + "_type": "span" + }, + { + "_key": "bfaffa2f93ad", + "_type": "span", + "marks": [], + "text": " at your convenience." + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "d7a15d59c234", + "markDefs": [], + "children": [ + { + "_key": "9c446949cd3d", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "_key": "b2585ae9ee80", + "markDefs": [], + "children": [ + { + "text": "Multilingual training", + "_key": "ec5bd7c78570", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2" + }, + { + "_key": "26762c9953ac", + "markDefs": [ + { + "_type": "link", + "href": "https://nf-co.re/events/2023/training-march-2023", + "_key": "1d7988202280" + } + ], + "children": [ + { + "text": "This year, we were pleased to offer ", + "_key": "d893346efa7b", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "1d7988202280" + ], + "text": "Nextflow / nf-core training", + "_key": "e4e5fcda544e", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " in multiple languages: in addition to English, we delivered sessions in French, Hindi, Portuguese, and Spanish.", + "_key": "edf4fe62d96e" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "69876880295b", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "6467e25f45fb" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "In our pre-event registration, ", + "_key": "c1b3d9c9d446", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "~88%", + "_key": "5b076dcc0de5" + }, + { + "_type": "span", + "marks": [], + "text": " of respondents indicated they would watch the training in English. However, there turned out to be a surprising appetite for training in other languages. 
We hope that multilingual training will make Nextflow even more accessible to talented scientists and researchers around the world.", + "_key": "da7c21bd3cbd" + } + ], + "_type": "block", + "style": "normal", + "_key": "b80bc18bc70b", + "markDefs": [] + }, + { + "style": "normal", + "_key": "30f565f09d4b", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "945063a9dff1" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The training consisted of four separate sessions in ", + "_key": "b03485d755ee", + "_type": "span" + }, + { + "_key": "170f019afbac", + "_type": "span", + "marks": [ + "strong" + ], + "text": "5 languages" + }, + { + "marks": [], + "text": " for a total of ", + "_key": "12ea469e3ac5", + "_type": "span" + }, + { + "marks": [ + "strong" + ], + "text": "20 sessions", + "_key": "557ebf57c19d", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ". As of April 19, we’ve amassed over ", + "_key": "e9c875d7f223" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "6,600 YouTube views", + "_key": "5208b4277622" + }, + { + "_type": "span", + "marks": [], + "text": " with ", + "_key": "d05f97ba7439" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "2,300+ hours", + "_key": "f7b514d956da" + }, + { + "_type": "span", + "marks": [], + "text": " of training watched so far. ", + "_key": "5327dec00966" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "27%", + "_key": "4d1deba44373" + }, + { + "_key": "2b5fddc4a3e8", + "_type": "span", + "marks": [], + "text": " have watched the non-English sessions, making the effort at translation highly worthwhile." + } + ], + "_type": "block", + "style": "normal", + "_key": "46325ddd653f" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "6b12ca8a2e06", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "f9bf93cc763f" + }, + { + "markDefs": [ + { + "href": "https://twitter.com/Chris_Hakk", + "_key": "9d86525d1b7f", + "_type": "link" + }, + { + "href": "https://twitter.com/mribeirodantas", + "_key": "5efb1e4fbe8b", + "_type": "link" + }, + { + "_key": "60b27b892f14", + "_type": "link", + "href": "https://twitter.com/gau" + }, + { + "_type": "link", + "href": "https://twitter.com/juliamirpedrol", + "_key": "5697c9294bc1" + }, + { + "href": "https://twitter.com/GGabernet", + "_key": "95b6079975bd", + "_type": "link" + }, + { + "_type": "link", + "href": "https://twitter.com/abhi18av", + "_key": "c792980845f4" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Thank you to the following people who delivered the training: ", + "_key": "0cdb3be44780" + }, + { + "_type": "span", + "marks": [ + "9d86525d1b7f" + ], + "text": "Chris Hakkaart", + "_key": "a2df6d66bef5" + }, + { + "_type": "span", + "marks": [], + "text": " (English), ", + "_key": "d83de59abca4" + }, + { + "_key": "52d54c1d65c2", + "_type": "span", + "marks": [ + "5efb1e4fbe8b" + ], + "text": "Marcel Ribeiro-Dantas" + }, + { + "_key": "5c183ef9b74c", + "_type": "span", + "marks": [], + "text": " (Portuguese), " + }, + { + "_type": "span", + "marks": [ + "60b27b892f14" + ], + "text": "Maxime Garcia", + "_key": "24f335e0b0e8" + }, + { + "_type": "span", + "marks": [], + "text": " (French), ", + "_key": "9f9358c56ba1" + }, + { + "_type": "span", + "marks": [ + "5697c9294bc1" + ], + "text": "Julia Mir Pedrol", + "_key": "5a32b0e6ebea" + }, + { + "_type": 
"span", + "marks": [], + "text": " and ", + "_key": "183a48c0758f" + }, + { + "text": "Gisela Gabernet", + "_key": "56bc58947f1d", + "_type": "span", + "marks": [ + "95b6079975bd" + ] + }, + { + "_type": "span", + "marks": [], + "text": " (Spanish), and ", + "_key": "8877a236b9fe" + }, + { + "_key": "a06209002c43", + "_type": "span", + "marks": [ + "c792980845f4" + ], + "text": "Abhinav Sharma" + }, + { + "_type": "span", + "marks": [], + "text": " (Hindi).", + "_key": "56d7e70eac3a" + } + ], + "_type": "block", + "style": "normal", + "_key": "1e334dd7e8d9" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "c813f9969590" + } + ], + "_type": "block", + "style": "normal", + "_key": "6a4562431f79", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "You can view the community training sessions on YouTube here:", + "_key": "c1dddb43bb970", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "b41dfe3ac529" + }, + { + "_key": "76617390c223", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://www.youtube.com/playlist?list=PL3xpfTVZLcNhoWxHR0CS-7xzu5eRT8uHo", + "_key": "25c597947b1e" + } + ], + "children": [ + { + "text": "March 2023 Community Training – English", + "_key": "6ef8f924b7270", + "_type": "span", + "marks": [ + "25c597947b1e" + ] + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [ + "6072cdf9b28e" + ], + "text": "March 2023 Community Training – Portugese", + "_key": "10faa4a727a10" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "493fab60977e", + "listItem": "bullet", + "markDefs": [ + { + "_key": "6072cdf9b28e", + "_type": "link", + "href": "https://www.youtube.com/playlist?list=PL3xpfTVZLcNhi41yDYhyHitUhIcUHIbJg" + } + ] + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://www.youtube.com/playlist?list=PL3xpfTVZLcNhiv9SjhoA1EDOXj9nzIqdS", + "_key": "e738e55271df" + } + ], + "children": [ + { + "text": "March 2023 Community Training – French", + "_key": "ef20f64001c80", + "_type": "span", + "marks": [ + "e738e55271df" + ] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "48367b809e84", + "listItem": "bullet" + }, + { + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://www.youtube.com/playlist?list=PL3xpfTVZLcNhSlCWVoa3GURacuLWeFc8O", + "_key": "401c59b8e71e" + } + ], + "children": [ + { + "_key": "577a35f55c5a0", + "_type": "span", + "marks": [ + "401c59b8e71e" + ], + "text": "March 2023 Community Training – Spanish" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "d6c46fa56c8e" + }, + { + "style": "normal", + "_key": "c955a429e8a2", + "listItem": "bullet", + "markDefs": [ + { + "_key": "12b36b1c7e53", + "_type": "link", + "href": "https://www.youtube.com/playlist?list=PL3xpfTVZLcNikun1FrSvtXW8ic32TciTJ" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "12b36b1c7e53" + ], + "text": "March 2023 Community Training – Hindi", + "_key": "e81667aa6a5d0" + } + ], + "level": 1, + "_type": "block" + }, + { + "_key": "0fc7a3cff09e", + "markDefs": [ + { + "_type": "link", + "href": "https://training.nextflow.io/", + "_key": "f97d8acb1531" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The videos accompany the written training material, which you can find at ", + "_key": "9972703785950" + }, + { + "_key": "9972703785951", + "_type": 
"span", + "marks": [ + "f97d8acb1531" + ], + "text": "https://training.nextflow.io/" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "857c1b338e210" + } + ], + "_type": "block", + "style": "normal", + "_key": "09992c336794" + }, + { + "markDefs": [], + "children": [ + { + "text": "Improved community training resources", + "_key": "c755f0e5f474", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2", + "_key": "a457f8a2f2fd" + }, + { + "style": "normal", + "_key": "db7eba5beb1c", + "markDefs": [ + { + "_type": "link", + "href": "https://training.nextflow.io/", + "_key": "0e6cde2aef90" + }, + { + "_type": "link", + "href": "https://training.nextflow.io/basic_training/setup/#gitpod", + "_key": "4974d79fdf45" + } + ], + "children": [ + { + "text": "Along with the updated training and hackathon resources above, we’ve significantly enhanced our online training materials available at ", + "_key": "09e9a6025011", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "0e6cde2aef90" + ], + "text": "https://training.nextflow.io/", + "_key": "066dfcda3cd6" + }, + { + "marks": [], + "text": ". Thanks to the efforts of our volunteers, technical training, ", + "_key": "16497fb9b039", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "4974d79fdf45" + ], + "text": "Gitpod resources", + "_key": "b5a2cd924ec9" + }, + { + "marks": [], + "text": ", and materials for hands-on, self-guided learning are now available in English and Portuguese. Some of the materials are also available in Spanish and French.", + "_key": "b78f73d6dcb1", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "6ce348fac324" + } + ], + "_type": "block", + "style": "normal", + "_key": "915ce13a10e4", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "557d3f4a9223", + "markDefs": [ + { + "href": "https://nf-co.re/events/2023/bytesize_translations", + "_key": "1baef680afd7", + "_type": "link" + } + ], + "children": [ + { + "_key": "c74ef2ff4706", + "_type": "span", + "marks": [], + "text": "The training comprises a significant set of resources covering topics including managing dependencies, containers, channels, processes, operators, and an introduction to the Groovy language. It also includes topics related to nf-core for users and developers as well as Nextflow Tower. 
Marcel Ribeiro-Dantas describes his experience leading the translation effort for this documentation in his latest nf-core/bytesize " + }, + { + "_type": "span", + "marks": [ + "1baef680afd7" + ], + "text": "translation talk", + "_key": "efab97fd52c1" + }, + { + "marks": [], + "text": ".", + "_key": "d772c24b9bd5", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "117c8519c79b", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "9e2cd26b70cc", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "16d846210d02", + "_type": "span", + "marks": [], + "text": "Additional educational resources are provided in the recent Seqera Labs blog article, " + }, + { + "_type": "span", + "marks": [ + "70a9ee3d59d4" + ], + "text": "Learn Nextflow in 2023", + "_key": "3f1559dc4dc9" + }, + { + "_type": "span", + "marks": [], + "text": ", posted in February before our latest training event.", + "_key": "4b43711fbd26" + } + ], + "_type": "block", + "style": "normal", + "_key": "9dee413e2be0", + "markDefs": [ + { + "_key": "70a9ee3d59d4", + "_type": "link", + "href": "https://nextflow.io/blog/2023/learn-nextflow-in-2023.html" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "eee284792500", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "86ed35d7c14b" + }, + { + "_key": "30c2673d7356", + "markDefs": [], + "children": [ + { + "_key": "63b49118f328", + "_type": "span", + "marks": [], + "text": "The nf-core hackathon" + } + ], + "_type": "block", + "style": "h2" + }, + { + "markDefs": [ + { + "href": "https://nf-co.re/events/2023/hackathon-march-2023", + "_key": "b2d86ddbea0f", + "_type": "link" + } + ], + "children": [ + { + "_key": "ce74a270a4e3", + "_type": "span", + "marks": [], + "text": "We also ran a separate " + }, + { + "marks": [ + "b2d86ddbea0f" + ], + "text": "hackathon", + "_key": "c299b9f9a215", + "_type": "span" + }, + { + "text": " event from March 27th to 29th. This hackathon ran online via Gather, a virtual hosting platform, but for the first time we also asked community members to host local sites. We were blown away by the response, with volunteers coming forward to organize in-person attendance in 16 different locations across the world (and this was before we announced that Seqera would organize pizza for all the sites!). These gatherings had a big impact on the feel of the hackathon, whilst remaining accessible and eco-friendly, avoiding the need for air travel.", + "_key": "53c8a1569c4d", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "78010d10238f" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "69bf4a7dcab9" + } + ], + "_type": "block", + "style": "normal", + "_key": "c9860c9dfa43" + }, + { + "_key": "7decb8a4fd97", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The hackathon was divided into five focus areas: modules, pipelines, documentation, infrastructure, and subworkflows. 
We had ", + "_key": "e7aad38c8614" + }, + { + "marks": [ + "strong" + ], + "text": "411", + "_key": "7386d6b58b41", + "_type": "span" + }, + { + "marks": [], + "text": " people register, including ", + "_key": "f2d905b1921f", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "278 in-person attendees", + "_key": "8f69527808e5" + }, + { + "_type": "span", + "marks": [], + "text": " at ", + "_key": "a30bbdbb6793" + }, + { + "marks": [ + "strong" + ], + "text": "16 locations", + "_key": "d6e855a1369d", + "_type": "span" + }, + { + "_key": "b2f455be419e", + "_type": "span", + "marks": [], + "text": ". This is an increase of " + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "38%", + "_key": "ac6c863635ad" + }, + { + "marks": [], + "text": " compared to the ", + "_key": "dd69ff74c969", + "_type": "span" + }, + { + "_key": "855f28c729e1", + "_type": "span", + "marks": [ + "strong" + ], + "text": "289" + }, + { + "text": " people that attended our October 2022 event. The hackathon was hosted in multiple countries including Brazil, France, Germany, Italy, Poland, Senegal, Serbia, South Africa, Spain, Sweden, the UK, and the United States.", + "_key": "c9fe094870db", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "bcabc1f5ea3a", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "518a4bbc462a", + "_type": "span", + "marks": [] + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "We would like to thank the many organizations worldwide who provided a venue to host the hackathon and helped make it a resounding success. Besides being an excellent educational event, we resolved many longstanding Nextflow and nf-core issues.", + "_key": "af55e8bb3c3c" + } + ], + "_type": "block", + "style": "normal", + "_key": "12de9b91d952" + }, + { + "_type": "block", + "style": "normal", + "_key": "26a5d16d2727", + "markDefs": [], + "children": [ + { + "_key": "362b275555d7", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "_type": "image", + "alt": "Hackathon photo", + "_key": "b4726901a091", + "asset": { + "_type": "reference", + "_ref": "image-d57b62acafc31e78a79b462f923c5c908a8679e0-4000x2250-jpg" + } + }, + { + "style": "normal", + "_key": "ecf21343616b", + "markDefs": [], + "children": [ + { + "text": "You can access the project reports from each hackathon team over the three-day event compiled in HackMD below:", + "_key": "a0f302e330d10", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "6c7a1ad85ab7", + "listItem": "bullet", + "markDefs": [ + { + "_key": "f98d406118de", + "_type": "link", + "href": "https://hackmd.io/A5v4soteQjKywl3UgFa_6g" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "f98d406118de" + ], + "text": "Modules team", + "_key": "27f09f421b650" + } + ], + "level": 1, + "_type": "block" + }, + { + "children": [ + { + "_key": "f9a59f1f12600", + "_type": "span", + "marks": [ + "91faaae9dcd9" + ], + "text": "Pipelines Team" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "7610918bb88d", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://hackmd.io/Bj_MK3ubQWGBD4t0X2KpjA", + "_key": "91faaae9dcd9" + } + ] + }, + { + "style": "normal", + "_key": "b14ed27e9bf0", + "listItem": "bullet", + "markDefs": [ + { + "href": "https://hackmd.io/o6AgPTZ7RBGCyZI72O1haA", + 
"_key": "c3a88b0554b3", + "_type": "link" + } + ], + "children": [ + { + "marks": [ + "c3a88b0554b3" + ], + "text": "Documentation Team", + "_key": "2d719c08437a0", + "_type": "span" + } + ], + "level": 1, + "_type": "block" + }, + { + "style": "normal", + "_key": "7f367d1eac31", + "listItem": "bullet", + "markDefs": [ + { + "href": "https://hackmd.io/uC-mZlEXQy6DaXZdjV6akA", + "_key": "c472f005f636", + "_type": "link" + } + ], + "children": [ + { + "_key": "b65a163a4ac80", + "_type": "span", + "marks": [ + "c472f005f636" + ], + "text": "Infrastructure Team" + } + ], + "level": 1, + "_type": "block" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "46b83faf0e1e", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://hackmd.io/Udtvj4jASsWLtMgrbTNwBA", + "_key": "eb53dcff6d07" + } + ], + "children": [ + { + "text": "Subworkflows Team", + "_key": "6f484cf93eb60", + "_type": "span", + "marks": [ + "eb53dcff6d07" + ] + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_key": "8fe0488158e6", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "066f712c984f" + }, + { + "_type": "block", + "style": "normal", + "_key": "e5feb5694d4d", + "markDefs": [ + { + "href": "https://www.youtube.com/playlist?list=PL3xpfTVZLcNhfyF_QJIfSslnxRCU817yc", + "_key": "44aabe4cba4c", + "_type": "link" + }, + { + "_key": "29a43310ed23", + "_type": "link", + "href": "https://github.com/orgs/nf-core/projects/38/views/16?layout=board" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "You can also view ten Hackathon videos outlining the event, introducing an overview of the teams, and daily hackathon activities in the ", + "_key": "08d977d94770" + }, + { + "_key": "bf191fbff946", + "_type": "span", + "marks": [ + "44aabe4cba4c" + ], + "text": "March 2023 nf-core hackathon YouTube playlist" + }, + { + "text": ". Check out activity in the nf-core hackathon March 2023 Github ", + "_key": "4c4ddb729bcd", + "_type": "span", + "marks": [] + }, + { + "text": "issues board", + "_key": "7b55bd357d5a", + "_type": "span", + "marks": [ + "29a43310ed23" + ] + }, + { + "_type": "span", + "marks": [], + "text": " for a summary of what each team worked on.", + "_key": "55fb72f9c519" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "bcb1b599505b", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "221cdcf3f06c" + }, + { + "_key": "ea82a40ccd3a", + "markDefs": [], + "children": [ + { + "_key": "fc456beb0f49", + "_type": "span", + "marks": [], + "text": "A diverse and growing community" + } + ], + "_type": "block", + "style": "h2" + }, + { + "_key": "91ae1567fb42", + "markDefs": [ + { + "href": "https://nextflow.io/blog/2023/czi-mentorship-round-2.html", + "_key": "9d5953d995a1", + "_type": "link" + } + ], + "children": [ + { + "text": "We were particularly pleased to see the growing diversity of the Nextflow and nf-core community membership, enabled partly by support from the Chan Zuckerberg Initiative Diversity and Inclusion grant and our nf-core mentorship programs. 
You can learn more about our mentorship efforts and exciting efforts of our global team in Chris Hakkaart’s excellent post, ", + "_key": "dd902cbe572a", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "9d5953d995a1" + ], + "text": "Nextflow and nf-core Mentorship", + "_key": "a42cf43d2160" + }, + { + "_type": "span", + "marks": [], + "text": " on the Nextflow blog.", + "_key": "9b4e01ecce4a" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "", + "_key": "c3e0127475cf", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "8a0c89a417a3", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "90ef9228534c", + "markDefs": [ + { + "_type": "link", + "href": "https://seqera.io/blog/the-state-of-the-workflow-2023-community-survey-results/", + "_key": "5cd70f4e6d95" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The growing diversity of our community was also reflected in the results of our latest Nextflow Community survey, which you can read more about on the ", + "_key": "0f5048f06cf1" + }, + { + "_type": "span", + "marks": [ + "5cd70f4e6d95" + ], + "text": "Seqera Labs blog", + "_key": "82f0954eb9c2" + }, + { + "marks": [], + "text": ".", + "_key": "5e7c401d084b", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "bb7e341fc6f6", + "markDefs": [], + "children": [ + { + "_key": "b84b019d7d88", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "_type": "image", + "alt": "Hackathon photo", + "_key": "090a4c9df6f6", + "asset": { + "_ref": "image-a734dc5ff7fb25c55689cdaac7b8a0991c92135f-1600x900-jpg", + "_type": "reference" + } + }, + { + "_key": "3b910d2315d9", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Looking forward", + "_key": "9354bd614305" + } + ], + "_type": "block", + "style": "h2" + }, + { + "style": "normal", + "_key": "822a0ea65c1d", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Running global events at this scale takes a tremendous team effort. The resources compiled will be valuable in introducing more people to Nextflow and nf-core. Thanks to everyone who participated in this year’s training and hackathon events. We look forward to making these even bigger and better in the future!", + "_key": "45098b969a4b" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "fa1f370abee1", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "cbf7a0c97aba" + }, + { + "style": "normal", + "_key": "4e596d8f0008", + "markDefs": [], + "children": [ + { + "_key": "763e9bc6332f", + "_type": "span", + "marks": [], + "text": "The next community training will be held online September 2023. 
This will be followed by two Nextflow Summit events with associated nf-core hackathons:" + } + ], + "_type": "block" + }, + { + "_key": "b9ee43eaf25d", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "0238376c3ee7", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "d82889d891de", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Barcelona: October 16-20, 2023Boston: November 2023 (dates to be confirmed)", + "_key": "95c07a2102db", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "7b1a58b28ade", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "2fdd80963170", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "If you’d like to join, you can register to receive news and updates about the events at ", + "_key": "b69c400ebec9" + }, + { + "marks": [ + "478488031598" + ], + "text": "https://summit.nextflow.io/summit-2023-preregistration/", + "_key": "49189fcae9b5", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "e16d79ae7611", + "markDefs": [ + { + "href": "https://summit.nextflow.io/summit-2023-preregistration/", + "_key": "478488031598", + "_type": "link" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_key": "01d37ed30bbd", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "5b70ee9ab4cf" + }, + { + "children": [ + { + "marks": [], + "text": "You can follow us on Twitter at ", + "_key": "87a04d104a2a", + "_type": "span" + }, + { + "_key": "ec82ac6a6686", + "_type": "span", + "marks": [ + "269fbd4e4e00" + ], + "text": "@nextflowio" + }, + { + "marks": [], + "text": " or ", + "_key": "97ee9e4f1a70", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "f436d0ef5811" + ], + "text": "@nf_core", + "_key": "c956f4bddb43" + }, + { + "_key": "18df91171ee4", + "_type": "span", + "marks": [], + "text": " or join the discussion on the " + }, + { + "_key": "e8b67ad4b9c1", + "_type": "span", + "marks": [ + "215c1775b8fe" + ], + "text": "Nextflow" + }, + { + "marks": [], + "text": " and ", + "_key": "8711575299e4", + "_type": "span" + }, + { + "_key": "6aaf674e62f7", + "_type": "span", + "marks": [ + "60e1586785da" + ], + "text": "nf-core" + }, + { + "_type": "span", + "marks": [], + "text": " community Slack channels.", + "_key": "d236bdfbcdaf" + } + ], + "_type": "block", + "style": "normal", + "_key": "f90e9c28852b", + "markDefs": [ + { + "_key": "269fbd4e4e00", + "_type": "link", + "href": "https://twitter.com/nextflowio" + }, + { + "href": "https://twitter.com/nf_core", + "_key": "f436d0ef5811", + "_type": "link" + }, + { + "_type": "link", + "href": "https://www.nextflow.io/slack-invite.html", + "_key": "215c1775b8fe" + }, + { + "_type": "link", + "href": "https://nf-co.re/join", + "_key": "60e1586785da" + } + ] + }, + { + "style": "normal", + "_key": "750720d0e7b3", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "876d72ad0097" + } + ], + "_type": "block" + }, + { + "_type": "image", + "alt": "Hackathon photo", + "_key": "9bd5bc4b92ec", + "asset": { + "_ref": "image-b49849f4e96bd2a35128fdfdb7f68faa5e62dccf-1600x900-jpg", + "_type": "reference" + } + }, + { + "_type": "image", + "alt": "Hackathon photo", + "_key": "62b900e5c2ae", + "asset": { + "_type": "reference", + "_ref": 
"image-e6edb9510ce549ea88d230d9961a0f1dac92f1ed-1600x900-jpg" + } + } + ], + "_createdAt": "2024-09-25T14:17:05Z", + "_rev": "Ot9x7kyGeH5005E3MJ9HNe", + "_id": "522a12fbeb50", + "author": { + "_type": "reference", + "_ref": "drafts.phil-ewels" + }, + "tags": [ + { + "_type": "reference", + "_key": "72d32f9fe3de", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493" + } + ] + }, + { + "_updatedAt": "2024-10-02T13:13:56Z", + "title": "Scaling with AWS Batch", + "publishedAt": "2017-11-08T07:00:00.000Z", + "body": [ + { + "style": "normal", + "_key": "a27bea3529f6", + "markDefs": [ + { + "_type": "link", + "href": "https://aws.amazon.com/batch/", + "_key": "c5daa195af6b" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The latest Nextflow release (0.26.0) includes built-in support for ", + "_key": "a6ff24002d19" + }, + { + "marks": [ + "c5daa195af6b" + ], + "text": "AWS Batch", + "_key": "2f5d1fdba72e", + "_type": "span" + }, + { + "text": ", a managed computing service that allows the execution of containerised workloads over the Amazon EC2 Container Service (ECS).", + "_key": "cf0deed10a24", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "0033e90abc20", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "64551967199c" + }, + { + "_type": "block", + "style": "normal", + "_key": "6bd066c7d2ef", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "This feature allows the seamless deployment of Nextflow pipelines in the cloud by offloading the process executions as managed Batch jobs. The service takes care to spin up the required computing instances on-demand, scaling up and down the number and composition of the instances to best accommodate the actual workload resource needs at any point in time.", + "_key": "901e2f9d6dee" + } + ] + }, + { + "children": [ + { + "_key": "21e679683da1", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "e40ff2cebbc5", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "AWS Batch shares with Nextflow the same vision regarding workflow containerisation i.e. each compute task is executed in its own Docker container. This dramatically simplifies the workflow deployment through the download of a few container images. 
This common design background made the support for AWS Batch a natural extension for Nextflow.", + "_key": "76f3a7e14a40" + } + ], + "_type": "block", + "style": "normal", + "_key": "ae0bee544e0b", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "ab7f794ce634", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "23a77ae26b42" + } + ] + }, + { + "_key": "04c08c613b64", + "markDefs": [], + "children": [ + { + "_key": "59b81a947826", + "_type": "span", + "marks": [], + "text": "Batch in a nutshell" + } + ], + "_type": "block", + "style": "h2" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Batch is organised in ", + "_key": "b84859ea78b7" + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "Compute Environments", + "_key": "1ac148f94824" + }, + { + "marks": [], + "text": ", ", + "_key": "26d5c72c810c", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "Job queues", + "_key": "d50161486bf3" + }, + { + "_key": "713b89df6679", + "_type": "span", + "marks": [], + "text": ", " + }, + { + "_key": "0b8171533b19", + "_type": "span", + "marks": [ + "em" + ], + "text": "Job definitions" + }, + { + "text": " and ", + "_key": "34623d9eb1f7", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "Jobs", + "_key": "3156d6add377" + }, + { + "_key": "8033d0dff989", + "_type": "span", + "marks": [], + "text": "." + } + ], + "_type": "block", + "style": "normal", + "_key": "51e484bcc0fc" + }, + { + "_type": "block", + "style": "normal", + "_key": "4c5e35cf7c66", + "markDefs": [], + "children": [ + { + "_key": "057f450d6c7d", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "_key": "66e4b9376be8", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The ", + "_key": "3dbee83bb41d" + }, + { + "text": "Compute Environment", + "_key": "73b0764aafb2", + "_type": "span", + "marks": [ + "em" + ] + }, + { + "marks": [], + "text": " allows you to define the computing resources required for a specific workload (type). 
You can specify the minimum and maximum number of CPUs that can be allocated, the EC2 provisioning model (On-demand or Spot), the AMI to be used and the allowed instance types.", + "_key": "310c1529c95e", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "2041abf88edd", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "b85e1da8840f", + "markDefs": [] + }, + { + "children": [ + { + "marks": [], + "text": "The ", + "_key": "be04b79e4cd4", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "Job queue", + "_key": "de5c0eb32a39" + }, + { + "marks": [], + "text": " definition allows you to bind a specific task to one or more Compute Environments.", + "_key": "16914999e841", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "09c83ab3ecc4", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_key": "bbad292f996a", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "d957f9d07381" + }, + { + "children": [ + { + "_key": "bb6b890c12e4", + "_type": "span", + "marks": [], + "text": "Then, the " + }, + { + "_key": "3b1da5f7a535", + "_type": "span", + "marks": [ + "em" + ], + "text": "Job definition" + }, + { + "_type": "span", + "marks": [], + "text": " is a template for one or more jobs in your workload. This is required to specify the Docker image to be used in running a particular task along with other requirements such as the container mount points, the number of CPUs, the amount of memory and the number of retries in case of job failure.", + "_key": "c0ec3f61ab10" + } + ], + "_type": "block", + "style": "normal", + "_key": "22fc8eba9ad0", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "f38fd874f8c8", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "a707904a826d" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Finally the ", + "_key": "902b5367a1b1" + }, + { + "_key": "70a7aae9ebf3", + "_type": "span", + "marks": [ + "em" + ], + "text": "Job" + }, + { + "marks": [], + "text": " binds a Job definition to a specific Job queue and allows you to specify the actual task command to be executed in the container.", + "_key": "c4f9593730ba", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "e4f9237d8881", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "b20a925bcca4", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "9819563c5f4b" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "The job input and output data management is delegated to the user. 
This means that if you only use Batch API/tools you will need to take care to stage the input data from a S3 bucket (or a different source) and upload the results to a persistent storage location.", + "_key": "b63889686655" + } + ], + "_type": "block", + "style": "normal", + "_key": "5981c02a9677", + "markDefs": [] + }, + { + "_key": "c9d2e692bbbe", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "9d4fac58df9e", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "text": "This could turn out to be cumbersome in complex workflows with a large number of tasks and above all it makes it difficult to deploy the same applications across different infrastructure.", + "_key": "e813f4c4d5f4", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "23a2c18c51cf" + }, + { + "style": "normal", + "_key": "bb51576688e4", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "38ef40e7dbd2", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "h2", + "_key": "88d72649e790", + "markDefs": [], + "children": [ + { + "_key": "109945bda3d0", + "_type": "span", + "marks": [], + "text": "How to use Batch with Nextflow" + } + ] + }, + { + "style": "normal", + "_key": "f35ce16deb48", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow streamlines the use of AWS Batch by smoothly integrating it in its workflow processing model and enabling transparent interoperability with other systems.", + "_key": "b90254c4328a" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "984feea8e2e5", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "c12c292131d7", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "10f96c579225", + "markDefs": [ + { + "_type": "link", + "href": "http://docs.aws.amazon.com/batch/latest/userguide/compute_environments.html", + "_key": "4c0375e4bea1" + }, + { + "_key": "f027756f3bc5", + "_type": "link", + "href": "http://docs.aws.amazon.com/batch/latest/userguide/job_queues.html" + } + ], + "children": [ + { + "marks": [], + "text": "To run Nextflow you will need to set-up in your AWS Batch account a ", + "_key": "54b2e618a132", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "4c0375e4bea1" + ], + "text": "Compute Environment", + "_key": "591d8f21aa6f" + }, + { + "_key": "3127335cd55b", + "_type": "span", + "marks": [], + "text": " defining the required computing resources and associate it to a " + }, + { + "_key": "2c396bd8334c", + "_type": "span", + "marks": [ + "f027756f3bc5" + ], + "text": "Job Queue" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "82a7fdb6bca3" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "8e5c8137a2be", + "markDefs": [], + "children": [ + { + "_key": "b0069029a864", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow takes care to create the required ", + "_key": "1370afbe1fad" + }, + { + "text": "Job Definitions", + "_key": "37d082404afc", + "_type": "span", + "marks": [ + "em" + ] + }, + { + "_key": "bc85f33088dc", + "_type": "span", + "marks": [], + "text": " and " + }, + { + "marks": [ + "em" + ], + "text": "Job", + "_key": "7542b524d9b2", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " 
requests as needed. This spares some Batch configurations steps.", + "_key": "16bb547eeabb" + } + ], + "_type": "block", + "style": "normal", + "_key": "5b773f037ad7", + "markDefs": [] + }, + { + "_key": "2cc15770048c", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "311649c5ce71", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "88c87fbeb5f2", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "In the ", + "_key": "83e8ec5d32e7", + "_type": "span" + }, + { + "marks": [ + "code" + ], + "text": "nextflow.config", + "_key": "12191a14a258", + "_type": "span" + }, + { + "_key": "b4edeec1d5af", + "_type": "span", + "marks": [], + "text": ", file specify the " + }, + { + "_key": "06bd4c7b0394", + "_type": "span", + "marks": [ + "code" + ], + "text": "awsbatch" + }, + { + "_key": "7e7fd58dbb47", + "_type": "span", + "marks": [], + "text": " executor, the Batch " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "queue", + "_key": "c310437fe1f5" + }, + { + "_type": "span", + "marks": [], + "text": " and the container to be used in the usual manner. You may also need to specify the AWS region and access credentials if they are not provided by other means. For example:", + "_key": "d6cdb6d0ddf1" + } + ] + }, + { + "style": "normal", + "_key": "e9be99a49e43", + "markDefs": [], + "children": [ + { + "_key": "9be875ce8fc2", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "097fc170b106", + "code": "process.executor = 'awsbatch'\nprocess.queue = 'my-batch-queue'\nprocess.container = your-org/your-docker:image\naws.region = 'eu-west-1'\naws.accessKey = 'xxx'\naws.secretKey = 'yyy'", + "_type": "code" + }, + { + "children": [ + { + "_key": "0a799aea5cb3", + "_type": "span", + "marks": [], + "text": "Each process can eventually use a different queue and Docker image (see Nextflow documentation for details). The container image(s) must be published in a Docker registry that is accessible from the instances run by AWS Batch eg. " + }, + { + "marks": [ + "8edf91599a37" + ], + "text": "Docker Hub", + "_key": "821fbefcb0c0", + "_type": "span" + }, + { + "marks": [], + "text": ", ", + "_key": "96c2fa32efda", + "_type": "span" + }, + { + "_key": "343e1bca5758", + "_type": "span", + "marks": [ + "71bd27ed336e" + ], + "text": "Quay" + }, + { + "_type": "span", + "marks": [], + "text": " or ", + "_key": "aac20010ead2" + }, + { + "text": "ECS Container Registry", + "_key": "3a7752c02bf0", + "_type": "span", + "marks": [ + "32dedd97b75c" + ] + }, + { + "text": ".", + "_key": "71186ef67e51", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "34437eb32b3c", + "markDefs": [ + { + "_key": "8edf91599a37", + "_type": "link", + "href": "https://hub.docker.com/" + }, + { + "href": "https://quay.io/", + "_key": "71bd27ed336e", + "_type": "link" + }, + { + "_key": "32dedd97b75c", + "_type": "link", + "href": "https://aws.amazon.com/ecr/" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_key": "df58912f583e", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "0856d00cc55a" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The Nextflow process can be launched either in a local computer or a EC2 instance. 
The latter is suggested for heavy or long running workloads.", + "_key": "bd5cb1adbfc1" + } + ], + "_type": "block", + "style": "normal", + "_key": "6ca75ce52f68" + }, + { + "_key": "cd9ed3bc6102", + "markDefs": [], + "children": [ + { + "_key": "966082ad7d3c", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "text": "Note that input data should be stored in the S3 storage. In the same manner the pipeline execution must specify a S3 bucket as a working directory by using the ", + "_key": "446efd365987", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "-w", + "_key": "d1fb5981742a" + }, + { + "_type": "span", + "marks": [], + "text": " command line option.", + "_key": "21a700327770" + } + ], + "_type": "block", + "style": "normal", + "_key": "091b61bcb1b6" + }, + { + "style": "normal", + "_key": "ba6d057adaa0", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "b0a4389d5afd", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "050bcae08875", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "A final caveat about custom containers and computing AMI. Nextflow automatically stages input data and shares tasks intermediate results by using the S3 bucket specified as a work directory. For this reason it needs to use the ", + "_key": "d9fe6548083b" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "aws", + "_key": "fe2fdcf86c8d" + }, + { + "_type": "span", + "marks": [], + "text": " command line tool which must be installed either in your process container or be present in a custom AMI that can be mounted and accessed by the Docker containers.", + "_key": "7ff23b944ade" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "8ad255b2b520", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "18d6712ba839", + "_type": "span", + "marks": [] + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "You may also need to create a custom AMI because the default image used by AWS Batch only provides 22 GB of storage which may not be enough for real world analysis pipelines.", + "_key": "d51164488694" + } + ], + "_type": "block", + "style": "normal", + "_key": "3ffb1e78152c" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "b7477277cd0a" + } + ], + "_type": "block", + "style": "normal", + "_key": "2144e5cdb0d1", + "markDefs": [] + }, + { + "style": "normal", + "_key": "6db8d938a47e", + "markDefs": [ + { + "_type": "link", + "href": "/docs/latest/awscloud.html#custom-ami", + "_key": "2ace7239aa52" + } + ], + "children": [ + { + "marks": [], + "text": "See the documentation to learn ", + "_key": "3f3208d1b209", + "_type": "span" + }, + { + "marks": [ + "2ace7239aa52" + ], + "text": "how to create a custom AMI", + "_key": "269577d1d9a6", + "_type": "span" + }, + { + "_key": "36ebab5aab17", + "_type": "span", + "marks": [], + "text": " with larger storage and how to setup the AWS CLI tools." 
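
The pieces described above can be pulled together in a single launch configuration. The snippet below is a minimal sketch only: the bucket, queue and container names are placeholders rather than values taken from this post, and the `workDir` setting shown is simply the configuration-file counterpart of the `-w` command line option mentioned above.

```groovy
// Minimal sketch of a nextflow.config for an AWS Batch deployment.
// Bucket, queue and container names below are placeholders.
process.executor  = 'awsbatch'
process.queue     = 'my-batch-queue'
process.container = 'your-org/your-image:latest'
workDir           = 's3://my-bucket/work'               // same effect as the -w command line option
aws.region        = 'eu-west-1'
executor.awscli   = '/home/ec2-user/miniconda/bin/aws'  // aws CLI provided by the custom AMI
```
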
+ } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "d4eea62db1ce", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "278e1eecfff9", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_key": "e973dac36071", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "An example", + "_key": "37061abeb372", + "_type": "span" + } + ], + "_type": "block", + "style": "h2" + }, + { + "_type": "block", + "style": "normal", + "_key": "ae03d5f7b076", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In order to validate Nextflow integration with AWS Batch, we used a simple RNA-Seq pipeline.", + "_key": "c1a63c9f67e7" + } + ] + }, + { + "_key": "85207a7f515a", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "3cad6c2caf78" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "href": "https://www.encodeproject.org/search/?type=Experiment&award.project=ENCODE&replicates.library.biosample.donor.organism.scientific_name=Homo+sapiens&files.file_type=fastq&files.run_type=paired-ended&replicates.library.nucleic_acid_term_name=RNA&replicates.library.depleted_in_term_name=rRNA", + "_key": "a26cfeec8608", + "_type": "link" + } + ], + "children": [ + { + "text": "This pipeline takes as input a metadata file from the Encode project corresponding to a ", + "_key": "efb3a33706b3", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "a26cfeec8608" + ], + "text": "search\nreturning all human RNA-seq paired-end datasets", + "_key": "b5e5153b6ab8" + }, + { + "text": " (the metadata file has been additionally filtered to retain only data having a SRA ID).", + "_key": "a267024be8e7", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "3f43f30cc919" + }, + { + "_type": "block", + "style": "normal", + "_key": "d70bf06e0a47", + "markDefs": [], + "children": [ + { + "_key": "3eea0fa2f108", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "a45a549e2767", + "markDefs": [ + { + "_key": "776f9d9e4de0", + "_type": "link", + "href": "https://combine-lab.github.io/salmon/" + }, + { + "href": "http://multiqc.info/", + "_key": "d14d204dd6fd", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The pipeline automatically downloads the FASTQ files for each sample from the EBI ENA database, it assesses the overall quality of sequencing data using FastQC and then runs ", + "_key": "5ded8110aca7" + }, + { + "text": "Salmon", + "_key": "9935fb547858", + "_type": "span", + "marks": [ + "776f9d9e4de0" + ] + }, + { + "_type": "span", + "marks": [], + "text": " to perform the quantification over the human transcript sequences. 
Finally all the QC and quantification outputs are summarised using the ", + "_key": "b0788b1cfece" + }, + { + "marks": [ + "d14d204dd6fd" + ], + "text": "MultiQC", + "_key": "9d391e1fb686", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " tool.", + "_key": "9de81524e5b5" + } + ] + }, + { + "children": [ + { + "_key": "73100dab1547", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "b285e3ea5c86", + "markDefs": [] + }, + { + "style": "normal", + "_key": "67fa3af86519", + "markDefs": [], + "children": [ + { + "_key": "eef1f873a8c5", + "_type": "span", + "marks": [], + "text": "For the sake of this benchmark we used the first 38 samples out of the full 375 samples dataset." + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "9817cefe782c", + "markDefs": [], + "children": [ + { + "_key": "eb692a3892f2", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "The pipeline was executed both on AWS Batch cloud and in the CRG internal Univa cluster, using ", + "_key": "6a096c586c3a" + }, + { + "_key": "2819f1d8f61e", + "_type": "span", + "marks": [ + "266a0d48aeef" + ], + "text": "Singularity" + }, + { + "_type": "span", + "marks": [], + "text": " as containers runtime.", + "_key": "d00aa9928879" + } + ], + "_type": "block", + "style": "normal", + "_key": "9ec96316e352", + "markDefs": [ + { + "_key": "266a0d48aeef", + "_type": "link", + "href": "/blog/2016/more-fun-containers-hpc.html" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "c5d0d0992d9a", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "54a6eabbf1b5" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "4cfea0912233", + "markDefs": [ + { + "href": "https://github.com/nextflow-io/rnaseq-encode-nf", + "_key": "7fefbd96e81d", + "_type": "link" + } + ], + "children": [ + { + "_key": "5b3290f61980", + "_type": "span", + "marks": [], + "text": "It's worth noting that with the exception of the two configuration changes detailed below, we used exactly the same pipeline implementation at " + }, + { + "_key": "1d352b86f86b", + "_type": "span", + "marks": [ + "7fefbd96e81d" + ], + "text": "this GitHub repository" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "85f362058326" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "91192c4feb86", + "markDefs": [], + "children": [ + { + "_key": "9d9671d34072", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The AWS deploy used the following configuration profile:", + "_key": "9820cf05e8e6" + } + ], + "_type": "block", + "style": "normal", + "_key": "2ff1a09bd415" + }, + { + "style": "normal", + "_key": "92975dbf0739", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "5fbbe3cd31a2", + "_type": "span" + } + ], + "_type": "block" + }, + { + "code": "aws.region = 'eu-west-1'\naws.client.storageEncryption = 'AES256'\nprocess.queue = 'large'\nexecutor.name = 'awsbatch'\nexecutor.awscli = '/home/ec2-user/miniconda/bin/aws'", + "_type": "code", + "_key": "4592e11fc83e" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "While for the cluster deployment the following configuration was used:", + "_key": "aa7505b75841" + } + ], + "_type": "block", + "style": 
"normal", + "_key": "dc2a1896df81", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "8ffaaf4850c7", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "00ef349299e9", + "_type": "span" + } + ] + }, + { + "code": "executor = 'crg'\nsingularity.enabled = true\nprocess.container = \"docker://nextflow/rnaseq-nf\"\nprocess.queue = 'cn-el7'\nprocess.time = '90 min'\nprocess.$quant.time = '4.5 h'", + "_type": "code", + "_key": "94ecfc21795d" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Results", + "_key": "6c9eb3d23dd1", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "fcb72214e5b1" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The AWS Batch Compute environment was configured to use a maximum of 132 CPUs as the number of CPUs that were available in the queue for local cluster deployment.", + "_key": "58fe7ed4d4af", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "cb46ac2b6e2a" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "a0dbe35ceaea" + } + ], + "_type": "block", + "style": "normal", + "_key": "b76c5bf05b8b" + }, + { + "_type": "block", + "style": "normal", + "_key": "5472ac8f16e8", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The two executions ran in roughly the same time: 2 hours and 24 minutes when running in the CRG cluster and 2 hours and 37 minutes when using AWS Batch.", + "_key": "ed40ffb699c2" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "2d548a45e657", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "e8266ca879ab" + }, + { + "_type": "block", + "style": "normal", + "_key": "8d63bb01f293", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "It must be noted that 14 jobs failed in the Batch deployment, presumably because one or more spot instances were retired. However Nextflow was able to re-schedule the failed jobs automatically and the overall pipeline execution completed successfully, also showing the benefits of a truly fault tolerant environment.", + "_key": "46d418a9f554" + } + ] + }, + { + "children": [ + { + "text": "", + "_key": "e3adea8acb35", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "ea044ef42311", + "markDefs": [] + }, + { + "_key": "e7a649de56a5", + "markDefs": [], + "children": [ + { + "_key": "8399fa01f3a4", + "_type": "span", + "marks": [], + "text": "The overall cost for running the pipeline with AWS Batch was " + }, + { + "text": "$5.47", + "_key": "48cd265fa7b3", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_key": "e6d37261d9c4", + "_type": "span", + "marks": [], + "text": " ($ 3.28 for EC2 instances, $1.88 for EBS volume and $0.31 for S3 storage). This means that with ~ $55 we could have performed the same analysis on the full Encode dataset." 
+ }
+ ],
+ "_type": "block",
+ "style": "normal"
+ },
+ {
+ "children": [
+ {
+ "_type": "span",
+ "marks": [],
+ "text": "",
+ "_key": "8c57b6b08683"
+ }
+ ],
+ "_type": "block",
+ "style": "normal",
+ "_key": "1bf88a585e42",
+ "markDefs": []
+ },
+ {
+ "_type": "block",
+ "style": "normal",
+ "_key": "56e539a917f3",
+ "markDefs": [],
+ "children": [
+ {
+ "_type": "span",
+ "marks": [],
+ "text": "It is more difficult to estimate the cost when using the internal cluster, because we don't have access to such detailed cost accounting. However, as users, we can estimate that it roughly comes out at $0.01 per CPU-hour. The pipeline needed around 147 CPU-hours to carry out the analysis, hence an estimated cost of ",
+ "_key": "cb8dcbe040b9"
+ },
+ {
+ "_type": "span",
+ "marks": [
+ "strong"
+ ],
+ "text": "$1.47",
+ "_key": "bb501100898e"
+ },
+ {
+ "marks": [],
+ "text": " just for the computation.",
+ "_key": "201f0afff356",
+ "_type": "span"
+ }
+ ]
+ },
+ {
+ "style": "normal",
+ "_key": "3ddd57449eee",
+ "markDefs": [],
+ "children": [
+ {
+ "_key": "4289cbc0d09b",
+ "_type": "span",
+ "marks": [],
+ "text": ""
+ }
+ ],
+ "_type": "block"
+ },
+ {
+ "markDefs": [
+ {
+ "_type": "link",
+ "href": "https://cdn.rawgit.com/nextflow-io/rnaseq-encode-nf/db303a81/benchmark/aws-batch/report.html",
+ "_key": "984670afa53a"
+ },
+ {
+ "_key": "2336fe7d376f",
+ "_type": "link",
+ "href": "https://cdn.rawgit.com/nextflow-io/rnaseq-encode-nf/db303a81/benchmark/crg-cluster/report.html"
+ }
+ ],
+ "children": [
+ {
+ "_key": "661beb8ebf81",
+ "_type": "span",
+ "marks": [],
+ "text": "The execution report for the Batch execution is available at "
+ },
+ {
+ "marks": [
+ "984670afa53a"
+ ],
+ "text": "this link",
+ "_key": "15ab5c295c43",
+ "_type": "span"
+ },
+ {
+ "_type": "span",
+ "marks": [],
+ "text": " and the one for the cluster is available ",
+ "_key": "84435d5f2267"
+ },
+ {
+ "text": "here",
+ "_key": "b6b4f315e877",
+ "_type": "span",
+ "marks": [
+ "2336fe7d376f"
+ ]
+ },
+ {
+ "_type": "span",
+ "marks": [],
+ "text": ".",
+ "_key": "50279fe6483b"
+ }
+ ],
+ "_type": "block",
+ "style": "normal",
+ "_key": "6e4756eef22c"
+ },
+ {
+ "style": "normal",
+ "_key": "7ba1d92d1a03",
+ "markDefs": [],
+ "children": [
+ {
+ "marks": [],
+ "text": "",
+ "_key": "b7ef4d33aec2",
+ "_type": "span"
+ }
+ ],
+ "_type": "block"
+ },
+ {
+ "children": [
+ {
+ "_type": "span",
+ "marks": [],
+ "text": "Conclusion",
+ "_key": "a408b3fd6b05"
+ }
+ ],
+ "_type": "block",
+ "style": "h2",
+ "_key": "f04a8cde4634",
+ "markDefs": []
+ },
+ {
+ "children": [
+ {
+ "_key": "cdda08689332",
+ "_type": "span",
+ "marks": [],
+ "text": "This post shows how Nextflow integrates smoothly with AWS Batch and how it can be used to deploy and execute real-world genomics pipelines in the cloud with ease."
+ }
+ ],
+ "_type": "block",
+ "style": "normal",
+ "_key": "41cbe2ece04f",
+ "markDefs": []
+ },
+ {
+ "markDefs": [],
+ "children": [
+ {
+ "_key": "2342d2dce2b7",
+ "_type": "span",
+ "marks": [],
+ "text": ""
+ }
+ ],
+ "_type": "block",
+ "style": "normal",
+ "_key": "ef2d1a8c9ddf"
+ },
+ {
+ "markDefs": [],
+ "children": [
+ {
+ "_type": "span",
+ "marks": [],
+ "text": "The auto-scaling ability provided by AWS Batch, along with the use of spot instances, makes the use of the cloud even more cost-effective. Running on a local cluster may still be cheaper, even if it is non-trivial to account for all the real costs of an HPC infrastructure. 
However the cloud allows flexibility and scalability not possible with common on-premises clusters.", + "_key": "77b2da54c3ff" + } + ], + "_type": "block", + "style": "normal", + "_key": "83d202a4d949" + }, + { + "children": [ + { + "_key": "6a08276c2978", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "2756a293a535", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_key": "f21ed0c5c93d", + "_type": "span", + "marks": [], + "text": "We also demonstrate how the same Nextflow pipeline can be " + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "transparently", + "_key": "6dcb4a5ef09a" + }, + { + "_type": "span", + "marks": [], + "text": " deployed in two very different computing infrastructure, using different containerisation technologies by simply providing a separate configuration profile.", + "_key": "b07d840ccef9" + } + ], + "_type": "block", + "style": "normal", + "_key": "ed4011357f0b" + }, + { + "children": [ + { + "_key": "d1da09179183", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "ba41ec9396ad", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "This approach enables the interoperability across different deployment sites, reduces operational and maintenance costs and guarantees consistent results over time.", + "_key": "e13a3d5af8e9" + } + ], + "_type": "block", + "style": "normal", + "_key": "9b46363220a7" + }, + { + "style": "normal", + "_key": "4d01301006d9", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "ad30f5421ec7", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "Credits", + "_key": "6d75d603ec58", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2", + "_key": "35a97ee2841b", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "56863abc994b", + "markDefs": [ + { + "href": "https://twitter.com/fstrozzi", + "_key": "dcc1c8a19617", + "_type": "link" + }, + { + "_type": "link", + "href": "https://github.com/emi80", + "_key": "5be6936f03af" + }, + { + "_type": "link", + "href": "https://gitter.im/skptic", + "_key": "2cf585b55b24" + } + ], + "children": [ + { + "marks": [], + "text": "This post is co-authored with ", + "_key": "ea1bba6d5b45", + "_type": "span" + }, + { + "_key": "75fd6ae0caa2", + "_type": "span", + "marks": [ + "dcc1c8a19617" + ], + "text": "Francesco Strozzi" + }, + { + "text": ", who also helped to write the pipeline used for the benchmark in this post and contributed to and tested the AWS Batch integration. Thanks to ", + "_key": "1253c482491b", + "_type": "span", + "marks": [] + }, + { + "_key": "d2617c9e8488", + "_type": "span", + "marks": [ + "5be6936f03af" + ], + "text": "Emilio Palumbo" + }, + { + "_type": "span", + "marks": [], + "text": " that helped to set-up and configure the AWS Batch environment and ", + "_key": "b40d2b1cb79c" + }, + { + "_type": "span", + "marks": [ + "2cf585b55b24" + ], + "text": "Evan Floden", + "_key": "94cfba4288fe" + }, + { + "_key": "9abe7f2c2ec0", + "_type": "span", + "marks": [], + "text": " for the comments." 
+ } + ] + } + ], + "_type": "blogPost", + "_createdAt": "2024-09-25T14:15:21Z", + "_rev": "rsIQ9Jd8Z4nKBVUruy4XsZ", + "tags": [ + { + "_ref": "ace8dd2c-eed3-4785-8911-d146a4e84bbb", + "_type": "reference", + "_key": "eaca60cde6d5" + }, + { + "_type": "reference", + "_key": "bb0317333425", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493" + }, + { + "_ref": "9161ec05-53f8-455a-a931-7b41f6ec5172", + "_type": "reference", + "_key": "e01020702cf2" + } + ], + "meta": { + "slug": { + "current": "scaling-with-aws-batch" + }, + "description": "The latest Nextflow release (0.26.0) includes built-in support for AWS Batch, a managed computing service that allows the execution of containerised workloads over the Amazon EC2 Container Service (ECS)." + }, + "_id": "5581b0fa9726", + "author": { + "_type": "reference", + "_ref": "paolo-di-tommaso" + } + }, + { + "_type": "blogPost", + "body": [ + { + "style": "normal", + "_key": "0d6283d373e3", + "markDefs": [ + { + "_type": "link", + "href": "https://www.nature.com/articles/nbt.3820", + "_key": "6c43c8854e13" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Containers have become an essential part of well-structured data analysis pipelines. They encapsulate applications and dependencies in portable, self-contained packages that can be easily distributed. Containers are also key to enabling predictable and ", + "_key": "77a22c375269" + }, + { + "_type": "span", + "marks": [ + "6c43c8854e13" + ], + "text": "reproducible results", + "_key": "b958ad8d3461" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "7102d4e70b5d" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "9e475b1f48e2", + "children": [ + { + "_type": "span", + "text": "", + "_key": "74faab139434" + } + ] + }, + { + "_key": "6940214e367b", + "markDefs": [ + { + "_key": "f430d3b51fbc", + "_type": "link", + "href": "https://www.nextflow.io/blog/2014/using-docker-in-hpc-cluster.html" + }, + { + "href": "https://biocontainers.pro/", + "_key": "7a7588c03f0a", + "_type": "link" + } + ], + "children": [ + { + "text": "Nextflow was one of the first workflow technologies to fully embrace ", + "_key": "bfb1c5c8d582", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "f430d3b51fbc" + ], + "text": "containers", + "_key": "77a2a0e01f5b", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " for data analysis pipelines. Community curated container collections such as ", + "_key": "70022261aad8" + }, + { + "_key": "3f967fef3a73", + "_type": "span", + "marks": [ + "7a7588c03f0a" + ], + "text": "BioContainers" + }, + { + "_type": "span", + "marks": [], + "text": " also helped speed container adoption.", + "_key": "c655460a994a" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "8454e0268741" + } + ], + "_type": "block", + "style": "normal", + "_key": "2c04e62d7044" + }, + { + "markDefs": [], + "children": [ + { + "_key": "f0fa55860466", + "_type": "span", + "marks": [], + "text": "However, the increasing complexity of data analysis pipelines and the need to deploy them across different clouds and platforms pose new challenges. Today, workflows may comprise dozens of distinct container images. Pipeline developers must manage and maintain these containers and ensure that their functionality precisely aligns with the requirements of every pipeline task." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "a03b2b683a29" + }, + { + "_key": "1ae8e80d4a27", + "children": [ + { + "text": "", + "_key": "1f9d427c5bf3", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "a5da86b558b0", + "markDefs": [], + "children": [ + { + "text": "Also, multi-cloud deployments and the increased use of private container registries further increase complexity for developers. Building and maintaining containers, pushing them to multiple registries, and dealing with platform-specific authentication schemes are tedious, time consuming, and a source of potential errors.", + "_key": "496c6afd3270", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "a9c904b79871", + "children": [ + { + "_type": "span", + "text": "", + "_key": "aacedca4985d" + } + ], + "_type": "block" + }, + { + "_key": "d2b2d9ddbe8c", + "children": [ + { + "_type": "span", + "text": "Wave – a game changer", + "_key": "1bbf66d4324d" + } + ], + "_type": "block", + "style": "h2" + }, + { + "_key": "e57b57181f58", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "For these reasons, we decided to fundamentally rethink how containers are deployed and managed in Nextflow. Today we are thrilled to announce Wave — a container provisioning and augmentation service that is fully integrated with the Nextflow and Nextflow Tower ecosystems.", + "_key": "1d21175627b0" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "e71f3466d4ae" + } + ], + "_type": "block", + "style": "normal", + "_key": "9e1728f3462e" + }, + { + "_type": "block", + "style": "normal", + "_key": "4541aa3196e4", + "markDefs": [], + "children": [ + { + "text": "Instead of viewing containers as separate artifacts that need to be integrated into a pipeline, Wave allows developers to manage containers as part of the pipeline itself. This approach helps simplify development, improves reliability, and makes pipelines easier to maintain. It can even improve pipeline performance.", + "_key": "fd85f3d34c2c", + "_type": "span", + "marks": [] + } + ] + }, + { + "style": "normal", + "_key": "310e3069ed33", + "children": [ + { + "_key": "f0d565a798ee", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "aa2c82158f66", + "children": [ + { + "_type": "span", + "text": "How container provisioning works with Wave", + "_key": "db916d04b60a" + } + ], + "_type": "block", + "style": "h2" + }, + { + "style": "normal", + "_key": "83f9ecca9ab2", + "markDefs": [ + { + "href": "https://www.nextflow.io/docs/latest/process.html#container", + "_key": "54c01308ebc3", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Instead of creating container images, pushing them to registries, and referencing them using Nextflow's ", + "_key": "e732d57a8c90" + }, + { + "_type": "span", + "marks": [ + "54c01308ebc3" + ], + "text": "container", + "_key": "0551982240cc" + }, + { + "_key": "1d1c9bbcf052", + "_type": "span", + "marks": [], + "text": " directive, Wave allows developers to simply include a Dockerfile in the directory where a process is defined." 
+ } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "", + "_key": "da7bbb926e65", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "b1e0e2d4adc3" + }, + { + "_type": "block", + "style": "normal", + "_key": "2a1e7023cb62", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "When a process runs, the new Wave plug-in for Nextflow takes the Dockerfile and submits it to the Wave service. Wave then builds a container on-the-fly, pushes it to a destination container registry, and returns the container used for the actual process execution. The Wave service also employs caching at multiple levels to ensure that containers are built only once or when there is a change in the corresponding Dockerfile.", + "_key": "9abcb93987fd" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "c3b79a1c780c", + "children": [ + { + "_type": "span", + "text": "", + "_key": "a139805bb6b9" + } + ] + }, + { + "_key": "7a600aad9d3f", + "markDefs": [], + "children": [ + { + "_key": "5f4e98c95e7d", + "_type": "span", + "marks": [], + "text": "The registry where images are stored can be specified in the Nextflow config file, along with the other pipeline settings. This means containers can be served from cloud registries closer to where pipelines execute, delivering better performance and reducing network traffic." + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "", + "_key": "d10f093090c1", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "f2626d087667" + }, + { + "asset": { + "_type": "reference", + "_ref": "image-908c0c63afc27651d62f59e57db90daf296e83a1-2400x1176-png" + }, + "_type": "image", + "alt": "Wave diagram", + "_key": "3047691f89d4" + }, + { + "children": [ + { + "_key": "cc645358b5f2", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "81faa4b100c3" + }, + { + "children": [ + { + "_type": "span", + "text": "Nextflow, Wave, and Conda – a match made in heaven", + "_key": "1152d6babb69" + } + ], + "_type": "block", + "style": "h2", + "_key": "8f484de9a5b9" + }, + { + "markDefs": [ + { + "href": "https://conda.io/", + "_key": "dbb171e9072d", + "_type": "link" + }, + { + "href": "https://www.nextflow.io/blog/2018/conda-support-has-landed.html", + "_key": "b42177fa4414", + "_type": "link" + } + ], + "children": [ + { + "_key": "940dfc24a1c1", + "_type": "span", + "marks": [ + "dbb171e9072d" + ], + "text": "Conda" + }, + { + "text": " is an excellent package manager, fully ", + "_key": "0a786c92e28d", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "b42177fa4414" + ], + "text": "supported in Nextflow", + "_key": "5f0c580b0ee5" + }, + { + "_type": "span", + "marks": [], + "text": " as an alternative to using containers to manage software dependencies in pipelines. 
However, until now, Conda could not be easily used in cloud-native computing platforms such as AWS Batch or Kubernetes.", + "_key": "00a6982bbfea" + } + ], + "_type": "block", + "style": "normal", + "_key": "cd2637cf4a43" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "5abeae7ccec2" + } + ], + "_type": "block", + "style": "normal", + "_key": "f388082c52e3" + }, + { + "_type": "block", + "style": "normal", + "_key": "ad7dfb1ec782", + "markDefs": [ + { + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/process.html#conda", + "_key": "d7909c912d24" + }, + { + "_type": "link", + "href": "https://github.com/mamba-org/mamba", + "_key": "74978ca12e8e" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Wave provides developers with a powerful new way to leverage Conda in Nextflow by using a ", + "_key": "30d0812e3e03" + }, + { + "_type": "span", + "marks": [ + "d7909c912d24" + ], + "text": "conda", + "_key": "47bbdc7c3f68" + }, + { + "_key": "57eb9cbcb343", + "_type": "span", + "marks": [], + "text": " directive as an alternative way to provision containers in their pipelines. When Wave encounters the " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "conda", + "_key": "34b36230fd45" + }, + { + "_key": "6f71e470e853", + "_type": "span", + "marks": [], + "text": " directive in a process definition, and no container or Dockerfile is present, Wave automatically builds a container based on the Conda recipe using the strategy described above. Wave makes this process exceptionally fast (at least compared to vanilla Conda) by leveraging with the " + }, + { + "text": "Micromamba", + "_key": "1b4296b62b23", + "_type": "span", + "marks": [ + "74978ca12e8e" + ] + }, + { + "_type": "span", + "marks": [], + "text": " project under the hood.", + "_key": "d3193009d518" + } + ] + }, + { + "_key": "8564692795fc", + "children": [ + { + "_type": "span", + "text": "", + "_key": "70ca8188b26b" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "Support for private registries", + "_key": "c40b4e467fc9" + } + ], + "_type": "block", + "style": "h2", + "_key": "392ad9b0d245" + }, + { + "children": [ + { + "text": "A long-standing problem with containers in Nextflow was the lack of support for private container registries. Wave solves this problem by acting as an authentication proxy between the Docker client requesting the container and a target container repository. 
Wave relies on ", + "_key": "b2d74d652123", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "81caeb07fbc1" + ], + "text": "Nextflow Tower", + "_key": "82ca173642a8", + "_type": "span" + }, + { + "marks": [], + "text": " to authenticate user requests to container registries.", + "_key": "08e7c42ddbe4", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "26e633dd0f34", + "markDefs": [ + { + "_key": "81caeb07fbc1", + "_type": "link", + "href": "https://seqera.io/tower/" + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "f6a018a4a509" + } + ], + "_type": "block", + "style": "normal", + "_key": "2dfb87dcad6b" + }, + { + "style": "normal", + "_key": "431b8e52d581", + "markDefs": [ + { + "_type": "link", + "href": "https://help.tower.nf/22.2/credentials/overview/", + "_key": "b6ee694b5b04" + } + ], + "children": [ + { + "_key": "f9258dfdf1af", + "_type": "span", + "marks": [], + "text": "To access private container registries from a Nextflow pipeline, developers can simply specify their Tower access token in the pipeline configuration file and store their repository credentials in " + }, + { + "_type": "span", + "marks": [ + "b6ee694b5b04" + ], + "text": "Nextflow Tower", + "_key": "9623f2dd31f0" + }, + { + "_type": "span", + "marks": [], + "text": " page in your account. Wave will automatically and securely use these credentials to authenticate to the private container registry.", + "_key": "eb00035afb77" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "94c14b17a3e1", + "children": [ + { + "_key": "2b5ae2a9d7dc", + "_type": "span", + "text": "" + } + ] + }, + { + "_key": "1cfa1fc2beec", + "children": [ + { + "_key": "3dd5411b9869", + "_type": "span", + "text": "But wait, there's more! Container augmentation!" + } + ], + "_type": "block", + "style": "h2" + }, + { + "style": "normal", + "_key": "5efca03b9ebf", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "By automatically building and provisioning containers, Wave dramatically simplifies how containers are handled in Nextflow. However, there are cases where organizations are required to use validated containers for security or policy reasons rather than build their own images, but still they need to provide additional functionality, like for example, adding site-specific scripts or logging agents while keeping the base container layers intact.", + "_key": "9856919d0e43", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "31cd3afc0b3f", + "children": [ + { + "text": "", + "_key": "85f4401361a1", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "863d7826acb7", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Nextflow allows for the definition of pipeline level (and more recently module level) scripts executed in the context of the task execution environment. These scripts can be made accessible to the container environment by mounting a host volume. 
However, this approach only works when using a local or shared file system.", + "_key": "2c57e001cf7c", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "ff3802ca3094" + } + ], + "_type": "block", + "style": "normal", + "_key": "7c093a0956c5" + }, + { + "children": [ + { + "marks": [], + "text": "Wave solves these problems by dynamically adding one or more layers to an existing container image during the container image download phase from the registry. Developers can use container augmentation to inject an arbitrary payload into any container without re-building it. Wave then recomputes the image's final manifest adding new layers and checksums on-the-fly, so that the final downloaded image reflects the added content.", + "_key": "eabb8b211dcc", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "22b402f6b4d9", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "1867c32aad93", + "children": [ + { + "_key": "ce8b1eb513ed", + "_type": "span", + "text": "" + } + ] + }, + { + "children": [ + { + "text": "With container augmentation, developers can include a directory called ", + "_key": "7e064093785d", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "code" + ], + "text": "resources", + "_key": "c14722a3ae4a", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " in pipeline ", + "_key": "44dbaf714d4a" + }, + { + "_type": "span", + "marks": [ + "fd756a0cdc96" + ], + "text": "module directories", + "_key": "d69dfedb6b41" + }, + { + "marks": [], + "text": ". When the corresponding containerized task is executed, Wave automatically mirrors the content of the resources directory in the root path of the container where it can be accessed by scripts running within the container.", + "_key": "6403afc85474", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "96083ba491c0", + "markDefs": [ + { + "_key": "fd756a0cdc96", + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/dsl2.html#module-directory" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "ab3bd1005f05", + "children": [ + { + "text": "", + "_key": "5c8f1f2b82d2", + "_type": "span" + } + ] + }, + { + "style": "h2", + "_key": "29d349b86b3b", + "children": [ + { + "text": "A sneak preview of Fusion file system", + "_key": "72dc57d92424", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "One of the main motivations for implementing Wave is that we wanted to have the ability to easily package a Fusion client in containers to make this important functionality readily available in Nextflow pipelines.", + "_key": "a33b277ebf54" + } + ], + "_type": "block", + "style": "normal", + "_key": "af55130d1154" + }, + { + "style": "normal", + "_key": "893f4625688b", + "children": [ + { + "text": "", + "_key": "4f85097296cd", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "a770f6d8e30e", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Fusion implements a virtual distributed file system and presents a thin-client allowing data hosted in AWS S3 buckets to be accessed via the standard POSIX filesystem interface expected by the pipeline tools. This client runs in the task container and is added automatically via the Wave augmentation capability. 
This makes Fusion functionality available for pipeline execution at runtime.",
+ "_key": "cc8b350bdc26"
+ }
+ ],
+ "_type": "block",
+ "style": "normal"
+ },
+ {
+ "_key": "a7c0b39fb63b",
+ "children": [
+ {
+ "_type": "span",
+ "text": "",
+ "_key": "0a89d1629ee7"
+ }
+ ],
+ "_type": "block",
+ "style": "normal"
+ },
+ {
+ "_key": "71e79a9f0a92",
+ "markDefs": [],
+ "children": [
+ {
+ "marks": [],
+ "text": "This means the Nextflow pipeline can use an AWS S3 bucket as the work directory, and pipeline tasks can access the S3 bucket natively as a local file system path. This is an important innovation as it avoids the additional step of copying files in and out of object storage. Fusion takes advantage of Nextflow's task segregation and idempotent execution model to optimise and speed up file access operations.",
+ "_key": "7315ae292460",
+ "_type": "span"
+ }
+ ],
+ "_type": "block",
+ "style": "normal"
+ },
+ {
+ "children": [
+ {
+ "_type": "span",
+ "text": "",
+ "_key": "1d8c35bc90db"
+ }
+ ],
+ "_type": "block",
+ "style": "normal",
+ "_key": "3fad60d473f0"
+ },
+ {
+ "style": "h2",
+ "_key": "322b4cf600ad",
+ "children": [
+ {
+ "_type": "span",
+ "text": "Getting started",
+ "_key": "12643c2e576d"
+ }
+ ],
+ "_type": "block"
+ },
+ {
+ "_type": "block",
+ "style": "normal",
+ "_key": "3ca34f863f63",
+ "markDefs": [],
+ "children": [
+ {
+ "_type": "span",
+ "marks": [],
+ "text": "Wave requires Nextflow version 22.10.0 or later and can be enabled by using the ",
+ "_key": "95342d8d8838"
+ },
+ {
+ "marks": [
+ "code"
+ ],
+ "text": "-with-wave",
+ "_key": "571af463eaf0",
+ "_type": "span"
+ },
+ {
+ "marks": [],
+ "text": " command-line option or by adding the following snippet in your nextflow.config file:",
+ "_key": "b97d1c1a9675",
+ "_type": "span"
+ }
+ ]
+ },
+ {
+ "_key": "bad4654d24c5",
+ "children": [
+ {
+ "_type": "span",
+ "text": "",
+ "_key": "a0966e5eb349"
+ }
+ ],
+ "_type": "block",
+ "style": "normal"
+ },
+ {
+ "code": "wave {\n enabled = true\n strategy = 'conda,container'\n}\n\ntower {\n accessToken = \"\"\n}",
+ "_type": "code",
+ "_key": "0f789a6dcb25"
+ },
+ {
+ "children": [
+ {
+ "_key": "c01001397afd",
+ "_type": "span",
+ "text": ""
+ }
+ ],
+ "_type": "block",
+ "style": "normal",
+ "_key": "404134e5f5b0"
+ },
+ {
+ "style": "normal",
+ "_key": "468bbdabae55",
+ "markDefs": [],
+ "children": [
+ {
+ "_type": "span",
+ "marks": [],
+ "text": "The use of the Tower access token is not mandatory; however, it is required to enable access to private repositories. The use of authentication also allows higher service rate limits compared to anonymous users. 
You can run a Nextflow pipeline such as rnaseq-nf with Wave, as follows:", + "_key": "7b9bc21701a4" + } + ], + "_type": "block" + }, + { + "_key": "ac1cd74091b1", + "children": [ + { + "_type": "span", + "text": "", + "_key": "6b2b2f1753f8" + } + ], + "_type": "block", + "style": "normal" + }, + { + "code": "nextflow run nextflow-io/rnaseq-nf -with-wave", + "_type": "code", + "_key": "37fdce289192" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "a9da4fc2d36b" + } + ], + "_type": "block", + "style": "normal", + "_key": "ee0a1140f6bd" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The configuration in the nextflow.config snippet above will enable the provisioning of Wave containers created starting from the ", + "_key": "d1115eb1e9f9", + "_type": "span" + }, + { + "_key": "c11da9cd3b9b", + "_type": "span", + "marks": [ + "code" + ], + "text": "conda" + }, + { + "_key": "408015af4ffc", + "_type": "span", + "marks": [], + "text": " requirements specified in the pipeline processes." + } + ], + "_type": "block", + "style": "normal", + "_key": "31167566e013" + }, + { + "style": "normal", + "_key": "88046880c4ea", + "children": [ + { + "text": "", + "_key": "dca726395ea3", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [ + { + "href": "https://www.nextflow.io/docs/latest/wave.html", + "_key": "733069b687f9", + "_type": "link" + }, + { + "_type": "link", + "href": "https://github.com/seqeralabs/wave-showcase", + "_key": "1e286f61f0f8" + } + ], + "children": [ + { + "text": "You can find additional information and examples in the Nextflow ", + "_key": "80360901ad39", + "_type": "span", + "marks": [] + }, + { + "text": "documentation", + "_key": "95c584045968", + "_type": "span", + "marks": [ + "733069b687f9" + ] + }, + { + "marks": [], + "text": " and in the Wave ", + "_key": "6711fc617537", + "_type": "span" + }, + { + "marks": [ + "1e286f61f0f8" + ], + "text": "showcase project", + "_key": "fdb6801999d4", + "_type": "span" + }, + { + "text": ".", + "_key": "1ef6828c826b", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "94f65d52d4bb" + }, + { + "style": "normal", + "_key": "6bc39e1959c7", + "children": [ + { + "_type": "span", + "text": "", + "_key": "818fc5992e0b" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "h2", + "_key": "590fe9465f14", + "children": [ + { + "_type": "span", + "text": "Availability", + "_key": "79e44bb92582" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "2d969ccc2baf", + "markDefs": [ + { + "_key": "f0142a0afa31", + "_type": "link", + "href": "https://hub.docker.com/" + }, + { + "href": "https://quay.io/", + "_key": "6c2ff6b99291", + "_type": "link" + }, + { + "_key": "41c35dd8a975", + "_type": "link", + "href": "https://aws.amazon.com/ecr/" + }, + { + "_type": "link", + "href": "https://cloud.google.com/artifact-registry", + "_key": "438888af9e43" + }, + { + "_type": "link", + "href": "https://azure.microsoft.com/en-us/products/container-registry/", + "_key": "4adc3c27593d" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The Wave container provisioning service is available free of charge as technology preview to all Nextflow and Tower users. 
Wave supports all major container registries including ", + "_key": "8491ad86056b" + }, + { + "text": "Docker Hub", + "_key": "b0ca3741acf3", + "_type": "span", + "marks": [ + "f0142a0afa31" + ] + }, + { + "text": ", ", + "_key": "2c5cfcfa2d5a", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "6c2ff6b99291" + ], + "text": "Quay.io", + "_key": "5b0deb9606ca", + "_type": "span" + }, + { + "marks": [], + "text": ", ", + "_key": "dce638e1917e", + "_type": "span" + }, + { + "marks": [ + "41c35dd8a975" + ], + "text": "AWS Elastic Container Registry", + "_key": "27f4d81fb913", + "_type": "span" + }, + { + "_key": "fe7d9c5693d6", + "_type": "span", + "marks": [], + "text": ", " + }, + { + "_type": "span", + "marks": [ + "438888af9e43" + ], + "text": "Google Artifact Registry", + "_key": "783ef030f69b" + }, + { + "text": " and ", + "_key": "47cb009dc0ff", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "4adc3c27593d" + ], + "text": "Azure Container Registry", + "_key": "2e90e1bdb77a", + "_type": "span" + }, + { + "marks": [], + "text": ".", + "_key": "31e0fef7cea7", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "9ee8965d4eec", + "children": [ + { + "text": "", + "_key": "d637b93c85e7", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "c187ad19ea87", + "markDefs": [], + "children": [ + { + "text": "During the preview period, anonymous users can build up to 10 container images per day and pull 100 containers per hour. Tower authenticated users can build 100 container images per hour and pull 1000 containers per minute. After the preview period, we plan to make the Wave service available free of charge to academic users and open-source software (OSS) projects.", + "_key": "70bbf61bdb97", + "_type": "span", + "marks": [] + } + ] + }, + { + "_key": "aca1c12fe0bd", + "children": [ + { + "_type": "span", + "text": "", + "_key": "618486f02de0" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "h2", + "_key": "cc087b119b7d", + "children": [ + { + "_key": "7ecaa1181ac3", + "_type": "span", + "text": "Conclusion" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Software containers greatly simplify the deployment of complex data analysis pipelines. However, there still have been many challenges preventing organizations from fully unlocking the potential of this exciting technology. For too long, containers have been viewed as a replacement for package managers, but they serve a different purpose.", + "_key": "7a9d7c72c38f" + } + ], + "_type": "block", + "style": "normal", + "_key": "923561cd0f43", + "markDefs": [] + }, + { + "_key": "8b6becb6dc90", + "children": [ + { + "_key": "37eb61e2e31b", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "981c3011d932", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In our view, it's time to re-consider containers as monolithic artifacts that are assembled separately from pipeline code. 
Instead, containers should be viewed simply as an execution substrate facilitating the deployment of the pipeline software dependencies defined via a proper package manager such as Conda.", + "_key": "f6329e3dcfbf" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "01a4e7f24987", + "children": [ + { + "_type": "span", + "text": "", + "_key": "a0190214690f" + } + ] + }, + { + "style": "normal", + "_key": "5061cd487909", + "markDefs": [], + "children": [ + { + "_key": "840d48d9a5b3", + "_type": "span", + "marks": [], + "text": "Wave, Nextflow, and Nextflow Tower combine to fully automate the container lifecycle including management, provisioning and dependencies of complex data pipelines on-demand while removing unnecessary error-prone manual steps. " + }, + { + "_key": "4190368e7ee0", + "_type": "span", + "text": "" + } + ], + "_type": "block" + } + ], + "author": { + "_ref": "paolo-di-tommaso", + "_type": "reference" + }, + "tags": [ + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "54a5dce7c0d3" + }, + { + "_ref": "7d9ffad4-385c-433d-a409-d0bfacc3c2fe", + "_type": "reference", + "_key": "6467db64513b" + } + ], + "publishedAt": "2022-10-13T06:00:00.000Z", + "title": "Rethinking containers for cloud native pipelines", + "_rev": "hf9hwMPb7ybAE3bqEU5rbP", + "_createdAt": "2024-09-25T14:16:46Z", + "_id": "55e8b21794fe", + "meta": { + "slug": { + "current": "rethinking-containers-for-cloud-native-pipelines" + } + }, + "_updatedAt": "2024-09-26T09:03:20Z" + }, + { + "meta": { + "slug": { + "current": "docker-for-dunces-nextflow-for-nunces" + } + }, + "_type": "blogPost", + "_id": "561ca06ac707", + "_createdAt": "2024-09-25T14:15:05Z", + "_updatedAt": "2024-09-26T09:01:26Z", + "title": "Docker for dunces & Nextflow for nunces", + "tags": [ + { + "_type": "reference", + "_key": "5edc3ed408ba", + "_ref": "ace8dd2c-eed3-4785-8911-d146a4e84bbb" + }, + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "c2a74b2b2cad" + } + ], + "body": [ + { + "_type": "block", + "style": "normal", + "_key": "5de644223001", + "markDefs": [], + "children": [ + { + "text": "Below is a step-by-step guide for creating [Docker](http://www.docker.io) images for use with [Nextflow](http://www.nextflow.io) pipelines. This post was inspired by recent experiences and written with the hope that it may encourage others to join in the virtualization revolution.", + "_key": "aa74c907fb89", + "_type": "span", + "marks": [ + "em" + ] + } + ] + }, + { + "_key": "fba2c75d251d", + "children": [ + { + "_type": "span", + "text": "", + "_key": "1e58c8a15fb2" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "50833a8d465d", + "markDefs": [], + "children": [ + { + "text": "Modern science is built on collaboration. Recently I became involved with one such venture between several groups across Europe. The aim was to annotate long non-coding RNA (lncRNA) in farm animals and I agreed to help with the annotation based on RNA-Seq data. 
The basic procedure relies on mapping short read data from many different tissues to a genome, generating transcripts and then determining if they are likely to be lncRNA or protein coding genes.",
+ "_key": "5ad57d04cb9d",
+ "_type": "span",
+ "marks": []
+ }
+ ],
+ "_type": "block"
+ },
+ {
+ "_type": "block",
+ "style": "normal",
+ "_key": "df4dbb73e883",
+ "children": [
+ {
+ "_type": "span",
+ "text": "",
+ "_key": "f171be4200cf"
+ }
+ ]
+ },
+ {
+ "children": [
+ {
+ "_type": "span",
+ "marks": [],
+ "text": "During several successful 'hackathon' meetings the best approach was decided and implemented in a joint effort. I undertook the task of wrapping the procedure up into a Nextflow pipeline with a view to replicating the results across our different institutions and to allow the easy execution of the pipeline by researchers anywhere.",
+ "_key": "85b35fa626c4"
+ }
+ ],
+ "_type": "block",
+ "style": "normal",
+ "_key": "84ce0feaea47",
+ "markDefs": []
+ },
+ {
+ "children": [
+ {
+ "text": "",
+ "_key": "d043f09e00b4",
+ "_type": "span"
+ }
+ ],
+ "_type": "block",
+ "style": "normal",
+ "_key": "ca94bc941408"
+ },
+ {
+ "children": [
+ {
+ "text": "Creating the Nextflow pipeline (",
+ "_key": "155a8a08d8cd",
+ "_type": "span",
+ "marks": []
+ },
+ {
+ "_type": "span",
+ "marks": [
+ "99165958e6b5"
+ ],
+ "text": "here",
+ "_key": "357c4685588b"
+ },
+ {
+ "marks": [],
+ "text": ") in itself was not a difficult task. My collaborators had documented their work well and were on hand if anything was not clear. However, installing and keeping aligned all the pipeline dependencies across the different data centers was still a challenging task.",
+ "_key": "f3317867e3c0",
+ "_type": "span"
+ }
+ ],
+ "_type": "block",
+ "style": "normal",
+ "_key": "974f1a1cdfa3",
+ "markDefs": [
+ {
+ "href": "http://www.github.com/cbcrg/lncrna-annotation-nf",
+ "_key": "99165958e6b5",
+ "_type": "link"
+ }
+ ]
+ },
+ {
+ "_type": "block",
+ "style": "normal",
+ "_key": "ab6f59d351cb",
+ "children": [
+ {
+ "text": "",
+ "_key": "2a5c98bf3a96",
+ "_type": "span"
+ }
+ ]
+ },
+ {
+ "_type": "block",
+ "style": "normal",
+ "_key": "004440881a96",
+ "markDefs": [
+ {
+ "href": "https://www.docker.com/",
+ "_key": "905a8bc500ad",
+ "_type": "link"
+ }
+ ],
+ "children": [
+ {
+ "_type": "span",
+ "marks": [],
+ "text": "The pipeline is typical of many in bioinformatics, consisting of binary executions, BASH scripting, R, Perl, BioPerl and some custom Perl modules. We found the BioPerl modules in particular were very sensitive to the various versions in the ",
+ "_key": "8390ee0ee4e6"
+ },
+ {
+ "text": "long",
+ "_key": "c58b7dc20cce",
+ "_type": "span",
+ "marks": [
+ "em"
+ ]
+ },
+ {
+ "_key": "e384258a5c3f",
+ "_type": "span",
+ "marks": [],
+ "text": " dependency tree. 
The solution was to turn to " + }, + { + "marks": [ + "905a8bc500ad" + ], + "text": "Docker", + "_key": "48755f8b6d14", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " containers.", + "_key": "236e84a2092d" + } + ] + }, + { + "_key": "55e482405e7c", + "children": [ + { + "_type": "span", + "text": "", + "_key": "f4792876a9aa" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "df983b305d4f", + "markDefs": [], + "children": [ + { + "text": "I have taken this opportunity to document the process of developing the Docker side of a Nextflow + Docker pipeline in a step-by-step manner.", + "_key": "8fe4f707201e", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "649e13290a13" + } + ], + "_type": "block", + "style": "normal", + "_key": "64ccdad0c58d" + }, + { + "children": [ + { + "_key": "f8e4f2418ada", + "_type": "span", + "marks": [], + "text": "###Docker Installation" + } + ], + "_type": "block", + "style": "normal", + "_key": "9daaf61343a0", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "7dbae6fbfa16", + "children": [ + { + "text": "", + "_key": "22f03df3d9b5", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "a438411f6220", + "markDefs": [ + { + "href": "https://docs.docker.com/engine/installation", + "_key": "b39b383b61e5", + "_type": "link" + }, + { + "_type": "link", + "href": "https://blog.docker.com/2016/02/docker-engine-1-10-security/", + "_key": "1664943865ae" + } + ], + "children": [ + { + "text": "By far the most challenging issue is the installation of Docker. For local installations, the ", + "_key": "3af57ef1c497", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "b39b383b61e5" + ], + "text": "process is relatively straight forward", + "_key": "29497e07ff62" + }, + { + "_key": "bac8833f273e", + "_type": "span", + "marks": [], + "text": ". However difficulties arise as computing moves to a cluster. Owing to security concerns, many HPC administrators have been reluctant to install Docker system-wide. This is changing and Docker developers have been responding to many of these concerns with " + }, + { + "text": "updates addressing these issues", + "_key": "f4e68c0049e2", + "_type": "span", + "marks": [ + "1664943865ae" + ] + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "ebdcec8ebe01" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "369a356018e0", + "children": [ + { + "_key": "6ea9c938cc17", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "9c82fe0136e7", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "That being the case, local installations are usually perfectly fine for development. 
One of the golden rules in Nextflow development is to have a small test dataset that can run the full pipeline in minutes with few computational resources, i.e. it can run on a laptop.",
+ "_key": "f06c6b5ed104",
+ "_type": "span"
+ }
+ ],
+ "_type": "block",
+ "style": "normal"
+ },
+ {
+ "children": [
+ {
+ "text": "",
+ "_key": "9f5f313834ae",
+ "_type": "span"
+ }
+ ],
+ "_type": "block",
+ "style": "normal",
+ "_key": "11b1347afcab"
+ },
+ {
+ "_type": "block",
+ "style": "normal",
+ "_key": "3640fc87e1c5",
+ "markDefs": [],
+ "children": [
+ {
+ "marks": [],
+ "text": "If you have Docker and Nextflow installed and you wish to view the working pipeline, you can perform the following commands to obtain everything you need and run the full lncRNA annotation pipeline on a test dataset.",
+ "_key": "0b77a23d5bf7",
+ "_type": "span"
+ }
+ ]
+ },
+ {
+ "children": [
+ {
+ "_type": "span",
+ "text": "",
+ "_key": "b0ad6ffae120"
+ }
+ ],
+ "_type": "block",
+ "style": "normal",
+ "_key": "9edd5abef435"
+ },
+ {
+ "_key": "e04747c2e377",
+ "code": "docker pull cbcrg/lncrna_annotation\nnextflow run cbcrg/lncrna-annotation-nf -profile test",
+ "_type": "code"
+ },
+ {
+ "_key": "0fc16192bebe",
+ "markDefs": [],
+ "children": [
+ {
+ "marks": [],
+ "text": "[If the following does not work, there could be a problem with your Docker installation.]",
+ "_key": "fb8752c7e000",
+ "_type": "span"
+ }
+ ],
+ "_type": "block",
+ "style": "normal"
+ },
+ {
+ "children": [
+ {
+ "text": "",
+ "_key": "0af154258b87",
+ "_type": "span"
+ }
+ ],
+ "_type": "block",
+ "style": "normal",
+ "_key": "e0523eff522a"
+ },
+ {
+ "_type": "block",
+ "style": "normal",
+ "_key": "773b9de99fad",
+ "markDefs": [],
+ "children": [
+ {
+ "_type": "span",
+ "marks": [],
+ "text": "The first command will download the required Docker image to your computer, while the second will launch Nextflow, which automatically downloads the pipeline repository and runs it using the test data included with it.",
+ "_key": "36689d3a632c"
+ }
+ ]
+ },
+ {
+ "children": [
+ {
+ "_type": "span",
+ "text": "",
+ "_key": "0973e8d341fc"
+ }
+ ],
+ "_type": "block",
+ "style": "normal",
+ "_key": "6ba84ebe36e1"
+ },
+ {
+ "markDefs": [],
+ "children": [
+ {
+ "_key": "3a33f2cb54af",
+ "_type": "span",
+ "marks": [],
+ "text": "###The Dockerfile"
+ }
+ ],
+ "_type": "block",
+ "style": "normal",
+ "_key": "1f30f62bc089"
+ },
+ {
+ "children": [
+ {
+ "_type": "span",
+ "text": "",
+ "_key": "50364dafcb96"
+ }
+ ],
+ "_type": "block",
+ "style": "normal",
+ "_key": "4cb45a2ade99"
+ },
+ {
+ "style": "normal",
+ "_key": "3f1d99c7b705",
+ "markDefs": [],
+ "children": [
+ {
+ "_type": "span",
+ "marks": [],
+ "text": "The ",
+ "_key": "5f45a4596c7c"
+ },
+ {
+ "text": "Dockerfile",
+ "_key": "6e92add363fc",
+ "_type": "span",
+ "marks": [
+ "code"
+ ]
+ },
+ {
+ "_key": "908e792d54df",
+ "_type": "span",
+ "marks": [],
+ "text": " contains all the instructions required by Docker to build the Docker image. It provides a transparent and consistent way to specify the base operating system and installation of all software, libraries and modules."
+ } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "eedc860980f3" + } + ], + "_type": "block", + "style": "normal", + "_key": "eb6597312e37" + }, + { + "_key": "d4223ee66e84", + "markDefs": [], + "children": [ + { + "text": "We begin by creating a file ", + "_key": "b0b033a77a83", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "Dockerfile", + "_key": "69aa3263d8b0" + }, + { + "text": " in the Nextflow project directory. The Dockerfile begins with:", + "_key": "ea7e45e2295a", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "b6aef4e4bff6", + "children": [ + { + "_type": "span", + "text": "", + "_key": "dd2d05610c5b" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "code", + "_key": "c95932bd73bd", + "code": "# Set the base image to debian jessie\nFROM debian:jessie\n\n# File Author / Maintainer\nMAINTAINER Evan Floden " + }, + { + "children": [ + { + "_key": "dbd6ec0da776", + "_type": "span", + "marks": [], + "text": "This sets the base distribution for our Docker image to be Debian v8.4, a lightweight Linux distribution that is ideally suited for the task. We must also specify the maintainer of the Docker image." + } + ], + "_type": "block", + "style": "normal", + "_key": "dd72b4cb8f73", + "markDefs": [] + }, + { + "_key": "e3e22b6493fa", + "children": [ + { + "_type": "span", + "text": "", + "_key": "8ce23ba404a7" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_key": "883b5be27cf1", + "_type": "span", + "marks": [], + "text": "Next we update the repository sources and install some essential tools such as " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "wget", + "_key": "de93d23dcc24" + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "a75f0d48042f" + }, + { + "_key": "b8f1f6977f76", + "_type": "span", + "marks": [ + "code" + ], + "text": "perl" + }, + { + "text": ".", + "_key": "3d2e30dbd5be", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "24d492f7dd06" + }, + { + "children": [ + { + "_key": "74087f39767c", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "0b388a47cc16" + }, + { + "code": "RUN apt-get update && apt-get install --yes --no-install-recommends \\\n wget \\\n locales \\\n vim-tiny \\\n git \\\n cmake \\\n build-essential \\\n gcc-multilib \\\n perl \\\n python ...", + "_type": "code", + "_key": "0781f0913220" + }, + { + "children": [ + { + "_key": "82c7900bb435", + "_type": "span", + "marks": [], + "text": "Notice that we use the command " + }, + { + "text": "RUN", + "_key": "7029c2127e5e", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "marks": [], + "text": " before each line. 
The ", + "_key": "cc075c2808b5", + "_type": "span" + }, + { + "text": "RUN", + "_key": "5372b2fbc07e", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "marks": [], + "text": " instruction executes commands as if they are performed from the Linux shell.", + "_key": "54c24028d590", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "3ca70fafd6b8", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "2f4253d9b870" + } + ], + "_type": "block", + "style": "normal", + "_key": "24e6cf4eeaad" + }, + { + "style": "normal", + "_key": "ac0cc4e414e7", + "markDefs": [ + { + "href": "https://blog.replicated.com/2016/02/05/refactoring-a-dockerfile-for-image-size/", + "_key": "3b99f1c6e0d0", + "_type": "link" + }, + { + "_type": "link", + "href": "https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/", + "_key": "ee681c47a630" + } + ], + "children": [ + { + "_key": "a715e201a410", + "_type": "span", + "marks": [], + "text": "Also is good practice to group as many as possible commands in the same " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "RUN", + "_key": "4c0542b30503" + }, + { + "marks": [], + "text": " statement. This reduces the size of the final Docker image. See ", + "_key": "cd0129fc2cb4", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "3b99f1c6e0d0" + ], + "text": "here", + "_key": "95753b3703a7" + }, + { + "marks": [], + "text": " for these details and ", + "_key": "b3d6166d7b40", + "_type": "span" + }, + { + "text": "here", + "_key": "ea9f63a37e2f", + "_type": "span", + "marks": [ + "ee681c47a630" + ] + }, + { + "_key": "fec090986d03", + "_type": "span", + "marks": [], + "text": " for more best practices." + } + ], + "_type": "block" + }, + { + "_key": "24659e48c3e7", + "children": [ + { + "_type": "span", + "text": "", + "_key": "9f35046732fb" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "http://search.cpan.org/~miyagawa/Menlo-1.9003/script/cpanm-menlo", + "_key": "d68e3d739fed" + } + ], + "children": [ + { + "text": "Next we can specify the install of the required perl modules using ", + "_key": "ab9ae2c48fd3", + "_type": "span", + "marks": [] + }, + { + "text": "cpan minus", + "_key": "376a38ae89cc", + "_type": "span", + "marks": [ + "d68e3d739fed" + ] + }, + { + "_key": "b82c42d7d1f5", + "_type": "span", + "marks": [], + "text": ":" + } + ], + "_type": "block", + "style": "normal", + "_key": "57e5a413a943" + }, + { + "children": [ + { + "_key": "0b5c9131deb9", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "a23d9bbf5ef9" + }, + { + "code": "# Install perl modules\nRUN cpanm --force CPAN::Meta \\\n YAML \\\n Digest::SHA \\\n Module::Build \\\n Data::Stag \\\n Config::Simple \\\n Statistics::Lite ...", + "_type": "code", + "_key": "e7530c3f6dba" + }, + { + "_type": "block", + "style": "normal", + "_key": "83711b5bfb64", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "We can give the instructions to download and install software from GitHub using:", + "_key": "c3ff2167e3c1" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "00fd8f533a9a", + "children": [ + { + "text": "", + "_key": "6891af5db4de", + "_type": "span" + } + ] + }, + { + "_type": "code", + "_key": "ac765553f6ad", + "code": "# Install Star Mapper\nRUN wget -qO- https://github.com/alexdobin/STAR/archive/2.5.2a.tar.gz | tar -xz \\\n && cd 
STAR-2.5.2a \\\n && make STAR" + }, + { + "_type": "block", + "style": "normal", + "_key": "5387c5d1aae0", + "markDefs": [], + "children": [ + { + "_key": "21f01a7dee08", + "_type": "span", + "marks": [], + "text": "We can add custom Perl modules and specify environmental variables such as " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "PERL5LIB", + "_key": "3c35ccd9597e" + }, + { + "_key": "7edd690d58bf", + "_type": "span", + "marks": [], + "text": " as below:" + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "88da8fa38161" + } + ], + "_type": "block", + "style": "normal", + "_key": "95b43e15b080" + }, + { + "_key": "02cae409f036", + "code": "# Install FEELnc\nRUN wget -q https://github.com/tderrien/FEELnc/archive/a6146996e06f8a206a0ae6fd59f8ca635c7d9467.zip \\\n && unzip a6146996e06f8a206a0ae6fd59f8ca635c7d9467.zip \\\n && mv FEELnc-a6146996e06f8a206a0ae6fd59f8ca635c7d9467 /FEELnc \\\n && rm a6146996e06f8a206a0ae6fd59f8ca635c7d9467.zip\n\nENV FEELNCPATH /FEELnc\nENV PERL5LIB $PERL5LIB:${FEELNCPATH}/lib/", + "_type": "code" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "R and R libraries can be installed as follows:", + "_key": "fab1d01a8d76" + } + ], + "_type": "block", + "style": "normal", + "_key": "3db7c8965a0b", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "369cb978dbc9", + "children": [ + { + "_key": "7e8b16febe0b", + "_type": "span", + "text": "" + } + ] + }, + { + "_key": "b635cd93fe02", + "code": "# Install R\nRUN echo \"deb http://cran.rstudio.com/bin/linux/debian jessie-cran3/\" >> /etc/apt/sources.list &&\\\napt-key adv --keyserver keys.gnupg.net --recv-key 381BA480 &&\\\napt-get update --fix-missing && \\\napt-get -y install r-base\n\n# Install R libraries\nRUN R -e 'install.packages(\"ROCR\", repos=\"http://cloud.r-project.org/\"); install.packages(\"randomForest\",repos=\"http://cloud.r-project.org/\")'", + "_type": "code" + }, + { + "style": "normal", + "_key": "f897e630ac44", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/cbcrg/lncRNA-Annotation-nf/blob/master/Dockerfile", + "_key": "95d80901751f" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "For the complete working Dockerfile of this project see ", + "_key": "31f01f88d7d4" + }, + { + "marks": [ + "95d80901751f" + ], + "text": "here", + "_key": "61cd37841c10", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "34a2ed31ef9a", + "children": [ + { + "text": "", + "_key": "ec0c46f9c3c6", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "1abf7c16ad8c", + "markDefs": [], + "children": [ + { + "_key": "99404c3f6b68", + "_type": "span", + "marks": [], + "text": "###Building the Docker Image" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "1d5e4d812566" + } + ], + "_type": "block", + "style": "normal", + "_key": "b636fd70f4f5" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Once we start working on the Dockerfile, we can build it anytime using:", + "_key": "5264f09e8e11" + } + ], + "_type": "block", + "style": "normal", + "_key": "5650048a4760" + }, + { + "style": "normal", + "_key": "d641e7f6bf5b", + "children": [ + { + "text": "", + "_key": "70fcc4126623", + "_type": "span" + } + ], + "_type": "block" + }, + { + "code": "docker build -t skptic/lncRNA_annotation .", + "_type": "code", + "_key": 
"e90f06c1b843" + }, + { + "_key": "8ccc8a028371", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "This builds the image from the Dockerfile and assigns a tag (i.e. a name) for the image. If there are no errors, the Docker image is now in you local Docker repository ready for use.", + "_key": "fe3388bbb799", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "23129a90f294" + } + ], + "_type": "block", + "style": "normal", + "_key": "7738ce1608b0" + }, + { + "_type": "block", + "style": "normal", + "_key": "53e684ed0883", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "###Testing the Docker Image", + "_key": "ac9aefee0790", + "_type": "span" + } + ] + }, + { + "_key": "f459dd9c7e8f", + "children": [ + { + "_type": "span", + "text": "", + "_key": "29310a754336" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "995c7f634de1", + "markDefs": [], + "children": [ + { + "text": "We find it very helpful to test our images as we develop the Docker file. Once built, it is possible to launch the Docker image and test if the desired software was correctly installed. For example, we can test if FEELnc and its dependencies were successfully installed by running the following:", + "_key": "0f7532136e6a", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "76902143bcad" + } + ], + "_type": "block", + "style": "normal", + "_key": "0f3a99e8f0f9" + }, + { + "_type": "code", + "_key": "8bc163f9f47c", + "code": "docker run -ti lncrna_annotation\n\ncd FEELnc/test\n\nFEELnc_filter.pl -i transcript_chr38.gtf -a annotation_chr38.gtf \\\n> -b transcript_biotype=protein_coding > candidate_lncRNA.gtf\n\nexit # remember to exit the Docker image" + }, + { + "style": "normal", + "_key": "8a04e5fe54c3", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "###Tagging the Docker Image", + "_key": "3c27e7d47f5a", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "376d9185809e", + "children": [ + { + "_type": "span", + "text": "", + "_key": "b58a8fff4134" + } + ], + "_type": "block" + }, + { + "_key": "1a1035fe3e9e", + "markDefs": [ + { + "_type": "link", + "href": "https://hub.docker.com/", + "_key": "e8267b213edb" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Once you are confident your image is built correctly, you can tag it, allowing you to push it to ", + "_key": "0e81f997274e" + }, + { + "marks": [ + "e8267b213edb" + ], + "text": "Dockerhub.io", + "_key": "9f99511c671e", + "_type": "span" + }, + { + "marks": [], + "text": ". 
Dockerhub is an online repository for docker images which allows anyone to pull public images and run them.", + "_key": "62279d8d8677", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "7ad2329cd8e6", + "children": [ + { + "_type": "span", + "text": "", + "_key": "629916622f88" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "ab2403070ee6", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "You can view the images in your local repository with the ", + "_key": "83a7985ea39e", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "docker images", + "_key": "9b9c237f8f87" + }, + { + "_key": "6aaa7f1f9459", + "_type": "span", + "marks": [], + "text": " command and tag using " + }, + { + "text": "docker tag", + "_key": "56edcf9c0231", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "text": " with the image ID and the name.", + "_key": "fccfd00ea0ef", + "_type": "span", + "marks": [] + } + ] + }, + { + "_key": "2883293716da", + "children": [ + { + "text": "", + "_key": "4796d2e24cad", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "cb58c9b6a966", + "code": "docker images\n\nREPOSITORY TAG IMAGE ID CREATED SIZE\nlncrna_annotation latest d8ec49cbe3ed 2 minutes ago 821.5 MB\n\ndocker tag d8ec49cbe3ed cbcrg/lncrna_annotation:latest", + "_type": "code" + }, + { + "markDefs": [], + "children": [ + { + "_key": "efecf9499efc", + "_type": "span", + "marks": [], + "text": "Now when we check our local images we can see the updated tag." + } + ], + "_type": "block", + "style": "normal", + "_key": "977cb77dafd8" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "de27f8c8d34d" + } + ], + "_type": "block", + "style": "normal", + "_key": "9e069ba58981" + }, + { + "_key": "859c42e5cad8", + "code": "docker images\n\nREPOSITORY TAG IMAGE ID CREATED SIZE\ncbcrg/lncrna_annotation latest d8ec49cbe3ed 2 minutes ago 821.5 MB", + "_type": "code" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "###Pushing the Docker Image to Dockerhub", + "_key": "adbb0489873f" + } + ], + "_type": "block", + "style": "normal", + "_key": "36110c0bc0bc" + }, + { + "_key": "72eb6aa2d1ff", + "children": [ + { + "text": "", + "_key": "1818ebcbf996", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://hub.docker.com/", + "_key": "1cf86a9aeb72" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "If you have not previously, sign up for a Dockerhub account ", + "_key": "d3c68be9bab9" + }, + { + "text": "here", + "_key": "73adbe5a767b", + "_type": "span", + "marks": [ + "1cf86a9aeb72" + ] + }, + { + "_key": "fdb56fb68fc0", + "_type": "span", + "marks": [], + "text": ". From the command line, login to Dockerhub and push your image." 
+ }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "a7bd5e43df27"
      },
      {
        "children": [
          {
            "_key": "74b517dfa3a2",
            "_type": "span",
            "text": ""
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "76d11410797f"
      },
      {
        "code": "docker login --username=cbcrg\ndocker push cbcrg/lncrna_annotation",
        "_type": "code",
        "_key": "72e018a1b3a7"
      },
      {
        "markDefs": [],
        "children": [
          {
            "text": "You can test if your image has been correctly pushed and is publicly available by removing your local version using the IMAGE ID of the image and pulling the remote:",
            "_key": "603c47308e12",
            "_type": "span",
            "marks": []
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "4e814562758e"
      },
      {
        "_type": "block",
        "style": "normal",
        "_key": "12f3da40fcc2",
        "children": [
          {
            "_type": "span",
            "text": "",
            "_key": "25b68c28836e"
          }
        ]
      },
      {
        "_type": "code",
        "_key": "9c8cc03d66d4",
        "code": "docker rmi -f d8ec49cbe3ed\n\n# Ensure the local version is not listed.\ndocker images\n\ndocker pull cbcrg/lncrna_annotation"
      },
      {
        "markDefs": [],
        "children": [
          {
            "_key": "fb18e8ebb6fb",
            "_type": "span",
            "marks": [],
            "text": "We are now almost ready to run our pipeline. The last step is to set up the Nextflow config."
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "67b0d083f1e1"
      },
      {
        "_key": "851718e1c203",
        "children": [
          {
            "_type": "span",
            "text": "",
            "_key": "7e1f6285672c"
          }
        ],
        "_type": "block",
        "style": "normal"
      },
      {
        "markDefs": [],
        "children": [
          {
            "text": "###Nextflow Configuration",
            "_key": "ee703ba1a7b8",
            "_type": "span",
            "marks": []
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "89a8e9b57253"
      },
      {
        "style": "normal",
        "_key": "301b53373abc",
        "children": [
          {
            "_type": "span",
            "text": "",
            "_key": "3c12e4f84be6"
          }
        ],
        "_type": "block"
      },
      {
        "style": "normal",
        "_key": "853618d141bc",
        "markDefs": [],
        "children": [
          {
            "_key": "e450d4c03687",
            "_type": "span",
            "marks": [],
            "text": "Within the "
          },
          {
            "marks": [
              "code"
            ],
            "text": "nextflow.config",
            "_key": "eb562dcd976e",
            "_type": "span"
          },
          {
            "text": " file in the main project directory we can add the following line which links the Docker image to the Nextflow execution. 
The images can be:", + "_key": "0aada97916c3", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "bcefea639daa", + "children": [ + { + "text": "", + "_key": "56eecb336e47", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "b30a67dabb0c", + "listItem": "bullet", + "children": [ + { + "marks": [], + "text": "General (same docker image for all processes):", + "_key": "dc79564414fe", + "_type": "span" + }, + { + "_type": "span", + "text": "\n\n", + "_key": "88c45b92cac0" + }, + { + "_type": "span", + "text": " process {\n container = 'cbcrg/lncrna_annotation'\n }\n", + "_key": "b8e693fb94da" + }, + { + "marks": [], + "text": "Specific to a profile (specified by `-profile crg` for example):", + "_key": "dbfdedf028a7", + "_type": "span" + }, + { + "_type": "span", + "text": "\n\n", + "_key": "93fd2bf6af97" + }, + { + "text": " profile {\n crg {\n container = 'cbcrg/lncrna_annotation'\n }\n }\n", + "_key": "0e604dd4732b", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": "Specific to a given process within a pipeline:", + "_key": "711cd0470649" + }, + { + "text": "\n\n", + "_key": "345b60ce2451", + "_type": "span" + }, + { + "_type": "span", + "text": " $processName.container = 'cbcrg/lncrna_annotation'", + "_key": "96fd3fcfe331" + } + ] + }, + { + "children": [ + { + "_key": "14aebd1b7468", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "a560b7a67c40" + }, + { + "markDefs": [ + { + "_key": "f61aacdb2ef0", + "_type": "link", + "href": "https://www.nextflow.io/blog/2016/best-practice-for-reproducibility.html" + } + ], + "children": [ + { + "_key": "1a64527f3033", + "_type": "span", + "marks": [], + "text": "In most cases it is easiest to use the same Docker image for all processes. One further thing to consider is the inclusion of the sha256 hash of the image in the container reference. 
I have "
          },
          {
            "_type": "span",
            "marks": [
              "f61aacdb2ef0"
            ],
            "text": "previously written about this",
            "_key": "38cf9657683c"
          },
          {
            "_type": "span",
            "marks": [],
            "text": ", but briefly, including a hash ensures that not a single byte of the operating system or software is different.",
            "_key": "bc4e97553513"
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "4033d3bebdf9"
      },
      {
        "_key": "0eaf14f96c05",
        "children": [
          {
            "text": "",
            "_key": "441548f75de3",
            "_type": "span"
          }
        ],
        "_type": "block",
        "style": "normal"
      },
      {
        "code": " process {\n container = 'cbcrg/lncrna_annotation@sha256:9dfe233b...'\n }",
        "_type": "code",
        "_key": "e986f84b6af5"
      },
      {
        "markDefs": [],
        "children": [
          {
            "text": "All that is left now is to run the pipeline.",
            "_key": "132c729c8d25",
            "_type": "span",
            "marks": []
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "39e6843958d4"
      },
      {
        "_type": "block",
        "style": "normal",
        "_key": "a4a85a9e7228",
        "children": [
          {
            "text": "",
            "_key": "3eba503d4fca",
            "_type": "span"
          }
        ]
      },
      {
        "_key": "a90f3eeed817",
        "code": "nextflow run lncRNA-Annotation-nf -profile test",
        "_type": "code"
      },
      {
        "markDefs": [],
        "children": [
          {
            "text": "Whilst I have explained this step-by-step process in a linear, sequential manner, in reality the development process is often more circular, with changes in the Docker images reflecting changes in the pipeline.",
            "_key": "6bc1b9275274",
            "_type": "span",
            "marks": []
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "e51c1eda68c5"
      },
      {
        "_key": "f4ab602e7e18",
        "children": [
          {
            "_key": "7ece5b1d69ec",
            "_type": "span",
            "text": ""
          }
        ],
        "_type": "block",
        "style": "normal"
      },
      {
        "children": [
          {
            "_type": "span",
            "marks": [],
            "text": "###CircleCI and Nextflow",
            "_key": "127548bd2ca6"
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "3e9640736a34",
        "markDefs": []
      },
      {
        "style": "normal",
        "_key": "8bd12b9a35d5",
        "children": [
          {
            "_key": "48776ea3a77d",
            "_type": "span",
            "text": ""
          }
        ],
        "_type": "block"
      },
      {
        "markDefs": [
          {
            "_key": "52d7d21fec88",
            "_type": "link",
            "href": "http://www.circleci.com"
          }
        ],
        "children": [
          {
            "text": "Now that you have a pipeline that successfully runs on a test dataset with Docker, a very useful step is to add a continuous development component to the pipeline. With this, whenever you push a modification of the pipeline to the GitHub repo, the test data set is run on the ",
            "_key": "e1a0115c8a63",
            "_type": "span",
            "marks": []
          },
          {
            "marks": [
              "52d7d21fec88"
            ],
            "text": "CircleCI",
            "_key": "bf9e5650e51a",
            "_type": "span"
          },
          {
            "_key": "a7690c9f35e1",
            "_type": "span",
            "marks": [],
            "text": " servers (using Docker)."
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "006188af7329"
      },
      {
        "children": [
          {
            "_key": "565cf1047e08",
            "_type": "span",
            "text": ""
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "bbb43942df4f"
      },
      {
        "_key": "41364a5f63d3",
        "markDefs": [],
        "children": [
          {
            "marks": [],
            "text": "To include CircleCI in the Nextflow pipeline, create a file named ",
            "_key": "f8cea4ca2097",
            "_type": "span"
          },
          {
            "_type": "span",
            "marks": [
              "code"
            ],
            "text": "circle.yml",
            "_key": "fa98c01db045"
          },
          {
            "_type": "span",
            "marks": [],
            "text": " in the project directory. 
We add the following instructions to the file:", + "_key": "2f21b332f3b0" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "e2b0b14d0fd2", + "children": [ + { + "_type": "span", + "text": "", + "_key": "25942a6c677a" + } + ], + "_type": "block" + }, + { + "_type": "code", + "_key": "7433acb412d2", + "code": "machine:\n java:\n version: oraclejdk8\n services:\n - docker\n\ndependencies:\n override:\n\ntest:\n override:\n - docker pull cbcrg/lncrna_annotation\n - curl -fsSL get.nextflow.io | bash\n - ./nextflow run . -profile test" + }, + { + "markDefs": [], + "children": [ + { + "text": "Next you can sign up to CircleCI, linking your GitHub account.", + "_key": "433129d9fd5e", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "70d6d1859e1d" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "3a0243c5639e" + } + ], + "_type": "block", + "style": "normal", + "_key": "2f2296b7bb34" + }, + { + "style": "normal", + "_key": "261b716a06a7", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Within the GitHub README.md you can add a badge with the following:", + "_key": "0be2d4a60379" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "", + "_key": "64db719ff8e1", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "c227bf5b9089" + }, + { + "code": "![CircleCI status](https://circleci.com/gh/cbcrg/lncRNA-Annotation-nf.png?style=shield)", + "_type": "code", + "_key": "a375b3bed0e9" + }, + { + "_type": "block", + "style": "normal", + "_key": "1642b961bc5a", + "markDefs": [], + "children": [ + { + "text": "###Tips and Tricks", + "_key": "46f101c27e69", + "_type": "span", + "marks": [] + } + ] + }, + { + "style": "normal", + "_key": "993ba2832874", + "children": [ + { + "_type": "span", + "text": "", + "_key": "dd9168b63937" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "File permissions", + "_key": "a2f6b726c62d" + }, + { + "text": ": When a process is executed by a Docker container, the UNIX user running the process is not you. Therefore any files that are used as an input should have the appropriate file permissions. 
For example, I had to change the permissions of all the input data in the test data set with:", + "_key": "287310bd8df1", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "d33d746e9473" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "f294eccb09f6" + } + ], + "_type": "block", + "style": "normal", + "_key": "0fd72cb1c652" + }, + { + "_key": "d03452f5b41c", + "markDefs": [], + "children": [ + { + "text": "find ", + "_key": "d7e384eae7c7", + "_type": "span", + "marks": [] + }, + { + "_key": "28f9a9ea28bf", + "_type": "span", + "text": "" + }, + { + "_type": "span", + "marks": [], + "text": " -type f -exec chmod 644 {} ", + "_key": "d91084971fde" + }, + { + "_type": "span", + "text": "\\;", + "_key": "542ab5615352" + }, + { + "_type": "span", + "marks": [], + "text": " find ", + "_key": "64fa506b82ef" + }, + { + "_type": "span", + "text": "", + "_key": "151f315f9c14" + }, + { + "_type": "span", + "marks": [], + "text": " -type d -exec chmod 755 {} ", + "_key": "d588ee4d8fb4" + }, + { + "text": "\\;", + "_key": "83813e5d73d2", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "", + "_key": "8dc4ce35290d", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "5c097f5ad5b2" + }, + { + "_type": "block", + "style": "normal", + "_key": "16153769e1e6", + "markDefs": [ + { + "_type": "link", + "href": "mailto:/evanfloden@gmail.com", + "_key": "a645ea709cb2" + } + ], + "children": [ + { + "text": "###Summary This was my first time building a Docker image and after a bit of trial-and-error the process was surprising straight forward. There is a wealth of information available for Docker and the almost seamless integration with Nextflow is fantastic. Our collaboration team is now looking forward to applying the pipeline to different datasets and publishing the work, knowing our results will be completely reproducible across any platform. ", + "_key": "f21d4187558c", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "text": "", + "_key": "4270a7ace6f5" + }, + { + "_type": "span", + "text": "", + "_key": "91edb73c5f75" + }, + { + "text": "/evanfloden@gmail.com", + "_key": "38e9df5b660d", + "_type": "span", + "marks": [ + "a645ea709cb2" + ] + } + ] + } + ], + "publishedAt": "2016-06-10T06:00:00.000Z", + "author": { + "_ref": "evan-floden", + "_type": "reference" + }, + "_rev": "Ot9x7kyGeH5005E3MIo38v" + }, + { + "_updatedAt": "2024-09-27T09:09:49Z", + "_id": "5635ab13468c", + "title": "Nextflow workshop at the 20th KOGO Winter Symposium", + "meta": { + "slug": { + "current": "nxf-nf-core-workshop-kogo" + }, + "description": "Through a partnership between AWS Asia Pacific and Japan, and Seqera, Nextflow touched ground in South Korea for the first time with a training session at the Korea Genome Organization (KOGO) Winter Symposium. The objective was to introduce participants to Nextflow, empowering them to craft their own pipelines. Recognizing the interest among bioinformaticians, MinSung Cho from AWS Korea’s Healthcare & Research Team decided to sponsor this 90-minute workshop session. This initiative covered my travel expenses and accommodations." 
+ }, + "_createdAt": "2024-09-25T14:18:31Z", + "body": [ + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Through a partnership between AWS Asia Pacific and Japan, and Seqera, Nextflow touched ground in South Korea for the first time with a training session at the Korea Genome Organization (KOGO) Winter Symposium. The objective was to introduce participants to Nextflow, empowering them to craft their own pipelines. Recognizing the interest among bioinformaticians, MinSung Cho from AWS Korea’s Healthcare & Research Team decided to sponsor this 90-minute workshop session. This initiative covered my travel expenses and accommodations.", + "_key": "15827bb9adb4", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "3ddbc0f6780a" + }, + { + "_type": "block", + "style": "normal", + "_key": "1160821ab2e5", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "fea4de287794" + } + ] + }, + { + "style": "normal", + "_key": "693cbcdc5eb9", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "fcfd9e3cc2ac", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "asset": { + "_ref": "image-1113aa37834d3dd5de51eebdde898a49b7b4fad5-1600x1200-jpg", + "_type": "reference" + }, + "_type": "image", + "alt": "Nextflow workshop at KOGO Winter Symposium 2024", + "_key": "dfa401f56a2f" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://nf-co.re/nanoseq/3.1.0", + "_key": "94079640c52c" + }, + { + "_key": "8e2b90528221", + "_type": "link", + "href": "https://github.com/nf-core/tools" + }, + { + "_type": "link", + "href": "https://www.youtube.com/watch?v=KM1A0_GD2vQ", + "_key": "436a47aba2b8" + } + ], + "children": [ + { + "_key": "11dba3b52d89", + "_type": "span", + "marks": [], + "text": "The training commenced with an overview of Nextflow pipelines, exemplified by the " + }, + { + "_type": "span", + "marks": [ + "94079640c52c" + ], + "text": "nf-core/nanoseq", + "_key": "407c839220d9" + }, + { + "marks": [], + "text": " Nextflow pipeline, highlighting the subworkflows and modules. nfcore/nanoseq is a bioinformatics analysis pipeline for Nanopore DNA/RNA sequencing data that can be used to perform base-calling, demultiplexing, QC, alignment, and downstream analysis. Following this, participants engaged in a hands-on workshop using the AWS Cloud9 environment. In 70 minutes, they constructed a basic pipeline for analyzing nanopore sequencing data, incorporating workflow templates, modules, and subworkflows from ", + "_key": "9d8e8038f63d", + "_type": "span" + }, + { + "marks": [ + "8e2b90528221" + ], + "text": "nf-core/tools", + "_key": "182529b5389c", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ". If you're interested in learning more about the nf-core/nanoseq Nextflow pipeline, I recorded a video talking about it in the nf-core bytesize meeting. 
You can watch it ", + "_key": "9d08844ac924" + }, + { + "marks": [ + "436a47aba2b8" + ], + "text": "here", + "_key": "ad4616778011", + "_type": "span" + }, + { + "text": ".", + "_key": "7705a45ea03e", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "3affeea2c2b6" + }, + { + "style": "normal", + "_key": "398c3c5471f1", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "ca2283886b2d" + } + ], + "_type": "block" + }, + { + "asset": { + "_ref": "image-0716bd3af2a7fa5b5fec2494ae09dcf0d52fba18-2446x1378-png", + "_type": "reference" + }, + "_type": "image", + "alt": "Slide from Nextflow workshop at KOGO Winter Symposium 2024", + "_key": "6c9e74f6149d" + }, + { + "_key": "2b4b36f291b5", + "markDefs": [ + { + "_type": "link", + "href": "https://docs.google.com/presentation/d/1OC4ccgbrNet4e499ShIT7S6Gm6S0xr38_OauKPa4G88/edit?usp=sharing", + "_key": "67b6cfd65e72" + }, + { + "_key": "4e92934532b6", + "_type": "link", + "href": "https://github.com/yuukiiwa/nf-core-koreaworkshop" + } + ], + "children": [ + { + "_key": "ba3493be440f", + "_type": "span", + "marks": [], + "text": "You can find the workshop slides " + }, + { + "text": "here", + "_key": "35a089390cea", + "_type": "span", + "marks": [ + "67b6cfd65e72" + ] + }, + { + "text": " and the GitHub repository with source code ", + "_key": "2eb706b73483", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "4e92934532b6" + ], + "text": "here", + "_key": "b2b180a3f699", + "_type": "span" + }, + { + "_key": "2112eed2e327", + "_type": "span", + "marks": [], + "text": "." + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "1c2c71553e77", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "c5cc1a18691c" + } + ] + }, + { + "style": "normal", + "_key": "c7925a5bc2b2", + "markDefs": [], + "children": [ + { + "text": "The workshop received positive feedback, with participants expressing interest in further sessions to deepen their Nextflow proficiency. Due to this feedback, AWS and the nf-core outreach team are considering organizing small-group local or Zoom training sessions in response to these requests.", + "_key": "fe6bdf88576e", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_key": "9f573907ae2a", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "63868748e237" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "7cdc8fe30886", + "markDefs": [], + "children": [ + { + "text": "It is imperative to acknowledge the invaluable contributions and support from AWS Korea’s Health Care & Research Team, including MinSung Cho, HyunMin Kim, YoungUng Kim, SeungChang Kang, and Jiyoon Hwang, without whom this workshop would not have been possible. 
Gratitude is also extended to Charlie Lee for fostering collaboration with the nf-core/outreach team.", + "_key": "91da4b118fd8", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + } + ], + "_type": "blogPost", + "publishedAt": "2024-03-14T07:00:00.000Z", + "author": { + "_ref": "ntV3A5cVsWRByk7zltFcwH", + "_type": "reference" + }, + "_rev": "hf9hwMPb7ybAE3bqEU5iqb" + }, + { + "publishedAt": "2016-02-11T07:00:00.000Z", + "author": { + "_type": "reference", + "_ref": "paolo-di-tommaso" + }, + "_id": "56ea9559b182", + "tags": [ + { + "_type": "reference", + "_key": "da5d8937b595", + "_ref": "ace8dd2c-eed3-4785-8911-d146a4e84bbb" + }, + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "f518c5cdae5c" + } + ], + "_rev": "2PruMrLMGpvZP5qAknmBA8", + "_createdAt": "2024-09-25T14:15:09Z", + "_updatedAt": "2024-10-02T13:54:17Z", + "body": [ + { + "children": [ + { + "marks": [ + "em" + ], + "text": "Recently a new feature has been added to Nextflow that allows failing jobs to be rescheduled, automatically increasing the amount of computational resources requested.", + "_key": "3666d1db7d1a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "4361d396238b", + "markDefs": [] + }, + { + "style": "normal", + "_key": "dcf8f1de3f97", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "ad70a4b38a67", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "children": [ + { + "marks": [], + "text": "The problem", + "_key": "0901fefbf70e", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "a07e21221078", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "138c9b1f7293", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow provides a mechanism that allows tasks to be automatically re-executed when a command terminates with an error exit status. This is useful to handle errors caused by temporary or even permanent failures (i.e. network hiccups, broken disks, etc.) that may happen in a cloud based environment.", + "_key": "de26488bbf42" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "844f8f64931b", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "6a1f7eb5f99a", + "_type": "span" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "However in an HPC cluster these events are very rare. In this scenario error conditions are more likely to be caused by a peak in computing resources, allocated by a job exceeding the original resource requested. This leads to the batch scheduler killing the job which in turn stops the overall pipeline execution.", + "_key": "c1cce72ce72c", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "6bdc3c9f92e4" + }, + { + "_key": "e1224dc5670f", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "8647ddaf4d3f" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "7e6db9b67074", + "markDefs": [], + "children": [ + { + "_key": "e45457328df8", + "_type": "span", + "marks": [], + "text": "In this context automatically re-executing the failed task is useless because it would simply replicate the same error condition. 
A common solution consists of increasing the resource request for the needs of the most consuming job, even though this will result in a suboptimal allocation of most of the jobs that are less resource hungry." + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "1ecaa954e713" + } + ], + "_type": "block", + "style": "normal", + "_key": "b453d56eda27" + }, + { + "style": "normal", + "_key": "30283a44d50b", + "markDefs": [], + "children": [ + { + "_key": "1bcdb6a35c4a", + "_type": "span", + "marks": [], + "text": "Moreover it is also difficult to predict such upper limit. In most cases the only way to determine it is by using a painful fail-and-retry approach." + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "", + "_key": "5c9bf951dee0", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "c1d6c7628134", + "markDefs": [] + }, + { + "_key": "c94e918c2f9f", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Take in consideration, for example, the following Nextflow process:", + "_key": "586eff75fa6b" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "a323dd9093f0", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "ad15da66cdc9" + } + ] + }, + { + "code": "process align {\n executor 'sge'\n memory 1.GB\n errorStrategy 'retry'\n\n input:\n file 'seq.fa' from sequences\n\n script:\n '''\n t_coffee -in seq.fa\n '''\n}", + "_type": "code", + "_key": "80997fbaa80b" + }, + { + "_type": "block", + "style": "normal", + "_key": "9c0952456bab", + "markDefs": [], + "children": [ + { + "text": "The above definition will execute as many jobs as there are fasta files emitted by the ", + "_key": "2e729163a772", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "code" + ], + "text": "sequences", + "_key": "441367d58db9", + "_type": "span" + }, + { + "text": " channel. Since the ", + "_key": "cfdb70cee701", + "_type": "span", + "marks": [] + }, + { + "_key": "01e17ab34c7b", + "_type": "span", + "marks": [ + "code" + ], + "text": "retry" + }, + { + "_type": "span", + "marks": [], + "text": " ", + "_key": "ea272e0d0933" + }, + { + "text": "error strategy", + "_key": "0972acdbc79f", + "_type": "span", + "marks": [ + "em" + ] + }, + { + "_type": "span", + "marks": [], + "text": " is specified, if the task returns a non-zero error status, Nextflow will reschedule the job execution requesting the same amount of memory and disk storage. 
In case the error is generated by ", + "_key": "87d038d9d0c9" + }, + { + "_key": "6353f8ffdcfb", + "_type": "span", + "marks": [ + "code" + ], + "text": "t_coffee" + }, + { + "marks": [], + "text": " that it needs more than one GB of memory for a specific alignment, the task will continue to fail, stopping the pipeline execution as a consequence.", + "_key": "d881268f8fd2", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "8af24f71d197", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "1f721717afc2", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Increase job resources automatically", + "_key": "441944268af1", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "04d871e1e1d1" + }, + { + "markDefs": [], + "children": [ + { + "text": "A better solution can be implemented with Nextflow which allows resources to be defined in a dynamic manner. By doing this it is possible to increase the memory request when rescheduling a failing task execution. For example:", + "_key": "5e64b65cb092", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "26d9644c4d23" + }, + { + "children": [ + { + "_key": "b473f2aa04e3", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "27919914349f", + "markDefs": [] + }, + { + "_type": "code", + "_key": "9ccec0a274c7", + "code": "process align {\n executor 'sge'\n memory { 1.GB * task.attempt }\n errorStrategy 'retry'\n\n input:\n file 'seq.fa' from sequences\n\n script:\n '''\n t_coffee -in seq.fa\n '''\n}" + }, + { + "style": "normal", + "_key": "57662ca454d2", + "markDefs": [], + "children": [ + { + "_key": "97d1bd070f0e", + "_type": "span", + "marks": [], + "text": "In the above example the memory requirement is defined by using a dynamic rule. The " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "task.attempt", + "_key": "8b77185889e8" + }, + { + "text": " attribute represents the current task attempt (", + "_key": "4b896a750983", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "code" + ], + "text": "1", + "_key": "200beab3a265", + "_type": "span" + }, + { + "_key": "371d71e92854", + "_type": "span", + "marks": [], + "text": " the first time the task is executed, " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "2", + "_key": "d4b8d4e76f08" + }, + { + "marks": [], + "text": " the second and so on).", + "_key": "d87ebed10288", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "a799a6353525", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "948441f6bf29" + } + ] + }, + { + "style": "normal", + "_key": "c9ddb758947a", + "markDefs": [], + "children": [ + { + "_key": "f5261360391a", + "_type": "span", + "marks": [], + "text": "The task will then request one GB of memory. In case of an error it will be rescheduled requesting two GB and so on, until it is executed successfully or the limit of times a task can be retried is reached, forcing the termination of the pipeline." 
+ } + ], + "_type": "block" + }, + { + "_key": "7b9b86801343", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "b00985fb788a", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "155e2f0615a2", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "It is also possible to define the ", + "_key": "c946ae563000" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "errorStrategy", + "_key": "773f9340ddb1" + }, + { + "_key": "3b8107513a6e", + "_type": "span", + "marks": [], + "text": " directive in a dynamic manner. This is useful to re-execute failed jobs only if a certain condition is verified." + } + ], + "_type": "block" + }, + { + "_key": "6962755ade1e", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "d871143af841" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "bea9b9d8fc2b", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "For example the Univa Grid Engine batch scheduler returns the exit status ", + "_key": "12dbb975fcfc", + "_type": "span" + }, + { + "text": "140", + "_key": "bfb8fa411f96", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " when a job is terminated because it's using more resources than the ones requested.", + "_key": "bee9f230e8f6" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "c939affd01b5", + "markDefs": [], + "children": [ + { + "_key": "8d0ec6ac4ef5", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "By checking this exit status we can reschedule only the jobs that fail by exceeding the resources allocation. This can be done with the following directive declaration:", + "_key": "a325de18294a" + } + ], + "_type": "block", + "style": "normal", + "_key": "17e456e86bcd", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "0729c1a97d0c", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "3ec9c2214388" + }, + { + "code": "errorStrategy { task.exitStatus == 140 ? 'retry' : 'terminate' }", + "_type": "code", + "_key": "b945a6f92766" + }, + { + "_key": "1a9b82eb47fc", + "markDefs": [], + "children": [ + { + "_key": "972aba6e83a6", + "_type": "span", + "marks": [], + "text": "In this way a failed task is rescheduled only when it returns the " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "140", + "_key": "655a9c4f4cce" + }, + { + "text": " exit status. In all other cases the pipeline execution is terminated.", + "_key": "36fd6675fc2e", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "ec1cab6cea26" + } + ], + "_type": "block", + "style": "normal", + "_key": "133b9036a065", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Conclusion", + "_key": "ef21453a54ca", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "fed22172b44c" + }, + { + "children": [ + { + "_key": "a16d20ae31c9", + "_type": "span", + "marks": [], + "text": "Nextflow provides a very flexible mechanism for defining the job resource request and handling error events. 
It makes it possible to automatically reschedule failing tasks under certain conditions and to define job resource requests in a dynamic manner so that they can be adapted to the actual job's needs and to optimize the overall resource utilisation." + } + ], + "_type": "block", + "style": "normal", + "_key": "f4cc4c5d4e8e", + "markDefs": [] + } + ], + "title": "Error recovery and automatic resource management with Nextflow", + "_type": "blogPost", + "meta": { + "slug": { + "current": "error-recovery-and-automatic-resources-management" + }, + "description": "Recently a new feature has been added to Nextflow that allows failing jobs to be rescheduled, automatically increasing the amount of computational resources requested." + } + }, + { + "publishedAt": "2024-05-27T06:00:00.000Z", + "_rev": "2PruMrLMGpvZP5qAknmCjr", + "_id": "59c50c076bde", + "author": { + "_ref": "paolo-di-tommaso", + "_type": "reference" + }, + "meta": { + "slug": { + "current": "nextflow-24.04-highlights" + } + }, + "_createdAt": "2024-09-25T14:18:03Z", + "body": [ + { + "_key": "ae8c611fc8ba", + "markDefs": [ + { + "_key": "e105fd111373", + "_type": "link", + "href": "https://github.com/nextflow-io/nextflow/releases" + } + ], + "children": [ + { + "_key": "3e521ac4ecff", + "_type": "span", + "marks": [], + "text": "We release an "edge" version of Nextflow every month and a "stable" version every six months. The stable releases are recommended for production usage and represent a significant milestone. The " + }, + { + "_type": "span", + "marks": [ + "e105fd111373" + ], + "text": "release changelogs", + "_key": "80a928e79cae" + }, + { + "_key": "861bee1d1552", + "_type": "span", + "marks": [], + "text": " contain a lot of detail, so we thought we'd highlight some of the goodies that have just been released in Nextflow 24.04 stable. Let's get into it!" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "36df3ebd7735", + "children": [ + { + "_type": "span", + "text": "", + "_key": "b45186f517d0" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "9d00078d3cf8", + "markDefs": [ + { + "_type": "link", + "href": "/podcast/2024/ep41_nextflow_2404.html", + "_key": "6ebeddd8c555" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": ":::tip We also did a podcast episode about some of these changes! Check it out here: ", + "_key": "ee75968de7de" + }, + { + "_key": "0aaab535973d", + "_type": "span", + "marks": [ + "6ebeddd8c555" + ], + "text": "Channels Episode 41" + }, + { + "marks": [], + "text": ". 
:::", + "_key": "eb5138047221", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "e1a80dd9f082", + "children": [ + { + "_key": "f4d8c885f546", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "style": "h2", + "_key": "52d08cf7120d", + "children": [ + { + "text": "Table of contents", + "_key": "b2d4aa1f32e3", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "a72bd5a475e7", + "listItem": "bullet", + "children": [ + { + "_key": "b40dcc57df1a", + "_type": "span", + "marks": [], + "text": "[New features](#new-features)" + }, + { + "_type": "span", + "text": "- [Seqera Containers](#seqera-containers)\n- [Workflow output definition](#workflow-output-definition)\n- [Topic channels](#topic-channels)\n- [Process eval outputs](#process-eval-outputs)\n- [Resource limits](#resource-limits)\n- [Job arrays](#job-arrays)", + "_key": "d4dc4129e3d2" + }, + { + "_type": "span", + "marks": [], + "text": "[Enhancements](#enhancements)", + "_key": "a7d772aa11a8" + }, + { + "text": "- [Colored logs](#colored-logs)\n- [AWS Fargate support](#aws-fargate-support)\n- [OCI auto pull mode for Singularity and Apptainer](#oci-auto-pull-mode-for-singularity-and-apptainer)\n- [Support for GA4GH TES](#support-for-ga4gh-tes)", + "_key": "7d854709f6ca", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": "[Fusion](#fusion)", + "_key": "b6255689e892" + }, + { + "text": "- [Enhanced Garbage Collection](#enhanced-garbage-collection)\n- [Increased File Handling Capacity](#increased-file-handling-capacity)\n- [Correct Publishing of Symbolic Links](#correct-publishing-of-symbolic-links)", + "_key": "b9aae95e767d", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": "[Other notable changes](#other-notable-changes)", + "_key": "0700543a14d6" + } + ] + }, + { + "style": "normal", + "_key": "d861e94639a4", + "children": [ + { + "_key": "e920439bcf03", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "h2", + "_key": "522408976f0f", + "children": [ + { + "_key": "b60180b7be8b", + "_type": "span", + "text": "New features" + } + ] + }, + { + "style": "h3", + "_key": "ec80dbc37861", + "children": [ + { + "_type": "span", + "text": "Seqera Containers", + "_key": "3372eed2ef26" + } + ], + "_type": "block" + }, + { + "children": [ + { + "marks": [], + "text": "A new flagship community offering was revealed at the Nextflow Summit 2024 Boston - ", + "_key": "1f840c0d2f2d", + "_type": "span" + }, + { + "_key": "35d2aa5fed90", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Seqera Containers" + }, + { + "marks": [], + "text": ". This is a free-to-use container cache powered by ", + "_key": "c3ee2e25a45c", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "ef148eed18f6" + ], + "text": "Wave", + "_key": "35c6f76b65c4" + }, + { + "_type": "span", + "marks": [], + "text": ", allowing anyone to request an image with a combination of packages from Conda and PyPI. The image will be built on demand and cached (for at least 5 years after creation). 
There is a ", + "_key": "853037045a02" + }, + { + "text": "dedicated blog post", + "_key": "5ed3a9167370", + "_type": "span", + "marks": [ + "72f2197b9867" + ] + }, + { + "_type": "span", + "marks": [], + "text": " about this, but it's worth noting that the service can be used directly from Nextflow and not only through ", + "_key": "fec72cdd5d96" + }, + { + "_type": "span", + "marks": [ + "c23676992428" + ], + "text": "https://seqera.io/containers/", + "_key": "d49e0ec30dd8" + } + ], + "_type": "block", + "style": "normal", + "_key": "ca5a19a849db", + "markDefs": [ + { + "href": "https://seqera.io/wave/", + "_key": "ef148eed18f6", + "_type": "link" + }, + { + "_key": "72f2197b9867", + "_type": "link", + "href": "https://seqera.io/blog/introducing-seqera-pipelines-containers/" + }, + { + "href": "https://seqera.io/containers/", + "_key": "c23676992428", + "_type": "link" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "a0128f7eb3c0", + "children": [ + { + "text": "", + "_key": "788ebc17382a", + "_type": "span" + } + ] + }, + { + "children": [ + { + "_key": "c4ee7b96b966", + "_type": "span", + "marks": [], + "text": "In order to use Seqera Containers in Nextflow, simply set " + }, + { + "_key": "66cacb763596", + "_type": "span", + "marks": [ + "code" + ], + "text": "wave.freeze" + }, + { + "text": " ", + "_key": "7497a10b8214", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "without", + "_key": "dbcbd2f471e6" + }, + { + "text": " setting ", + "_key": "3e1fd8ebf38e", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "wave.build.repository", + "_key": "4ba9c42b9279" + }, + { + "_type": "span", + "marks": [], + "text": " - for example, by using the following config for your pipeline:", + "_key": "9cb3b102cb3a" + } + ], + "_type": "block", + "style": "normal", + "_key": "28bf29210bc9", + "markDefs": [] + }, + { + "style": "normal", + "_key": "ad08a7ff3d2c", + "children": [ + { + "_type": "span", + "text": "", + "_key": "26fa9ef7dfe3" + } + ], + "_type": "block" + }, + { + "code": "wave.enabled = true\nwave.freeze = true\nwave.strategy = 'conda'", + "_type": "code", + "_key": "1ffd6f1c5595" + }, + { + "_type": "block", + "style": "normal", + "_key": "0008384c8664", + "children": [ + { + "_type": "span", + "text": "", + "_key": "4e16d1920646" + } + ] + }, + { + "style": "normal", + "_key": "a879cfd5d08d", + "markDefs": [], + "children": [ + { + "text": "Any processes in your pipeline specifying Conda packages will have Docker or Singularity images created on the fly (depending on whether ", + "_key": "406e1ffb6f9b", + "_type": "span", + "marks": [] + }, + { + "text": "singularity.enabled", + "_key": "3b0cf3ba3f41", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " is set or not) and cached for immediate access in subsequent runs. These images will be publicly available. 
You can view all container image names with the ", + "_key": "03b7399718a8" + }, + { + "marks": [ + "code" + ], + "text": "nextflow inspect", + "_key": "7ef740ed7ede", + "_type": "span" + }, + { + "text": " command.", + "_key": "67b827eba7fb", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "c32a62c04264" + } + ], + "_type": "block", + "style": "normal", + "_key": "4095144d5280" + }, + { + "_type": "block", + "style": "h3", + "_key": "8d36cc81ef56", + "children": [ + { + "_type": "span", + "text": "Workflow output definition", + "_key": "1cf1c2b7c7ed" + } + ] + }, + { + "style": "normal", + "_key": "7c0209c77569", + "markDefs": [], + "children": [ + { + "text": "The workflow output definition is a new syntax for defining workflow outputs:", + "_key": "063ec8a68344", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "", + "_key": "6310a83b9768", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "77a8d365a4a2" + }, + { + "code": "nextflow.preview.output = true // [!code ++]\n\nworkflow {\n main:\n ch_foo = foo(data)\n bar(ch_foo)\n\n publish:\n ch_foo >> 'foo' // [!code ++]\n}\n\noutput { // [!code ++]\n directory 'results' // [!code ++]\n mode 'copy' // [!code ++]\n} // [!code ++]", + "_type": "code", + "_key": "e1bd2668ba0f" + }, + { + "_key": "8660a73f0e86", + "children": [ + { + "_type": "span", + "text": "", + "_key": "4b9b5117825a" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "4aee265c2462", + "markDefs": [ + { + "_type": "link", + "href": "https://nextflow.io/docs/latest/workflow.html#publishing-outputs", + "_key": "19facccbe9e1" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "It essentially provides a DSL2-style approach for publishing, and will replace ", + "_key": "135b9f12fd70" + }, + { + "text": "publishDir", + "_key": "2a367a49dba9", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " once it is finalized. It also provides extra flexibility as it allows you to publish ", + "_key": "b8a0ffbda9fa" + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "any", + "_key": "a9523b760499" + }, + { + "marks": [], + "text": " channel, not just process outputs. See the ", + "_key": "b806dcea22c7", + "_type": "span" + }, + { + "_key": "69b4fc297514", + "_type": "span", + "marks": [ + "19facccbe9e1" + ], + "text": "Nextflow docs" + }, + { + "marks": [], + "text": " for more information.", + "_key": "2e6dbc719351", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "01d28eef2756", + "children": [ + { + "text": "", + "_key": "c2c5cff274e6", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "39c23afa2dfc", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": ":::info This feature is still in preview and may change in a future release. We hope to finalize it in version 24.10, so don't hesitate to share any feedback with us! 
:::", + "_key": "42b961434f6c" + } + ], + "_type": "block" + }, + { + "_key": "8ffdab93ced1", + "children": [ + { + "text": "", + "_key": "ae6591dfb966", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "h3", + "_key": "621b21fc327b", + "children": [ + { + "text": "Topic channels", + "_key": "9339d5e1ca0d", + "_type": "span" + } + ] + }, + { + "_key": "3f55745fd17a", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Topic channels are a new channel type introduced in 23.11.0-edge. A topic channel is essentially a queue channel that can receive values from multiple sources, using a matching name or "topic":", + "_key": "0338f1e08f41", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "b31fd3ed5565", + "children": [ + { + "text": "", + "_key": "e53b7c6ed4a1", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "f6dc814963d8", + "code": "process foo {\n output:\n val('foo'), topic: 'my-topic' // [!code ++]\n}\n\nprocess bar {\n output:\n val('bar'), topic: 'my-topic' // [!code ++]\n}\n\nworkflow {\n foo()\n bar()\n\n Channel.topic('my-topic').view() // [!code ++]\n}", + "_type": "code" + }, + { + "style": "normal", + "_key": "05566be31953", + "children": [ + { + "text": "", + "_key": "c994984a2837", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://nextflow.io/docs/latest/channel.html#topic", + "_key": "d8644ea8c4bb" + } + ], + "children": [ + { + "_key": "112ec26989f8", + "_type": "span", + "marks": [], + "text": "Topic channels are particularly useful for collecting metadata from various places in the pipeline, without needing to write all of the channel logic that is normally required (e.g. using the " + }, + { + "_key": "0f351e4f849f", + "_type": "span", + "marks": [ + "code" + ], + "text": "mix" + }, + { + "marks": [], + "text": " operator). 
See the ", + "_key": "eb7fd399f9b4", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "d8644ea8c4bb" + ], + "text": "Nextflow docs", + "_key": "d0c619a4a20c" + }, + { + "_type": "span", + "marks": [], + "text": " for more information.", + "_key": "7921587dbe75" + } + ], + "_type": "block", + "style": "normal", + "_key": "a285cf3b88d9" + }, + { + "children": [ + { + "_key": "104065ec985e", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "53faf869c1ea" + }, + { + "_key": "3d52960ca449", + "children": [ + { + "_type": "span", + "text": "Process `eval` outputs", + "_key": "3e767101c687" + } + ], + "_type": "block", + "style": "h3" + }, + { + "_key": "daec0ce3c107", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Process ", + "_key": "edebff988a31", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "eval", + "_key": "eb34ce7f57df" + }, + { + "marks": [], + "text": " outputs are a new type of process output which allows you to capture the standard output of an arbitrary shell command:", + "_key": "a6a0b6a14aa3", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "012225305a60", + "children": [ + { + "_type": "span", + "text": "", + "_key": "c035ab69c4a8" + } + ], + "_type": "block", + "style": "normal" + }, + { + "code": "process sayHello {\n output:\n eval('bash --version') // [!code ++]\n\n \"\"\"\n echo Hello world!\n \"\"\"\n}\n\nworkflow {\n sayHello | view\n}", + "_type": "code", + "_key": "ccd01ea94415" + }, + { + "style": "normal", + "_key": "d56f2727d629", + "children": [ + { + "_type": "span", + "text": "", + "_key": "8a12333a2850" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "2eccf1cf4a28", + "markDefs": [ + { + "href": "https://nextflow.io/docs/latest/process.html#output-type-eval", + "_key": "164013340ab1", + "_type": "link" + } + ], + "children": [ + { + "text": "The shell command is executed alongside the task script. Until now, you would typically execute these supplementary commands in the main process script, save the output to a file or environment variable, and then capture it using a ", + "_key": "dc95b3bc8e04", + "_type": "span", + "marks": [] + }, + { + "_key": "9d2a6aff1809", + "_type": "span", + "marks": [ + "code" + ], + "text": "path" + }, + { + "marks": [], + "text": " or ", + "_key": "338655a33e5a", + "_type": "span" + }, + { + "_key": "41d39b232b1e", + "_type": "span", + "marks": [ + "code" + ], + "text": "env" + }, + { + "text": " output. The new ", + "_key": "66b4cdde231a", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "code" + ], + "text": "eval", + "_key": "5f4fd1c9c395", + "_type": "span" + }, + { + "text": " output is a much more convenient way to capture this kind of command output directly. 
See the ", + "_key": "fbc369e54f96", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "164013340ab1" + ], + "text": "Nextflow docs", + "_key": "e282c93a4844", + "_type": "span" + }, + { + "marks": [], + "text": " for more information.", + "_key": "03dbbf257276", + "_type": "span" + } + ] + }, + { + "_key": "2bd8c1ccbaae", + "children": [ + { + "_key": "0fc5686c0804", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "Collecting software versions", + "_key": "dcd0c28602d6", + "_type": "span" + } + ], + "_type": "block", + "style": "h4", + "_key": "734b41a8d204" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Together, topic channels and eval outputs can be used to simplify the collection of software tool versions. For example, for FastQC:", + "_key": "41dfd92dcc7e" + } + ], + "_type": "block", + "style": "normal", + "_key": "edaa9475d49b" + }, + { + "children": [ + { + "text": "", + "_key": "43823e387da4", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "abfaf563992e" + }, + { + "_type": "code", + "_key": "28c5fdf7cb85", + "code": "process FASTQC {\n input:\n tuple val(meta), path(reads)\n\n output:\n tuple val(meta), path('*.html'), emit: html\n tuple val(\"${task.process}\"), val('fastqc'), eval('fastqc --version'), topic: versions // [!code ++]\n\n \"\"\"\n fastqc $reads\n \"\"\"\n}\n\nworkflow {\n Channel.topic('versions') // [!code ++]\n | unique()\n | map { process, name, version ->\n \"\"\"\\\n ${process.tokenize(':').last()}:\n ${name}: ${version}\n \"\"\".stripIndent()\n }\n | collectFile(name: 'collated_versions.yml')\n | CUSTOM_DUMPSOFTWAREVERSIONS\n}" + }, + { + "children": [ + { + "_key": "9e9c8ba77cc5", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "c8e96050b8e4" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nf-core/rnaseq/pull/1109", + "_key": "dbb015531547" + }, + { + "_type": "link", + "href": "https://github.com/nf-core/rnaseq/pull/1115", + "_key": "0f01f4e3f263" + } + ], + "children": [ + { + "text": "This approach will be implemented across all nf-core pipelines, and will cut down on a lot of boilerplate code. Check out the full prototypes for nf-core/rnaseq ", + "_key": "8714327ba476", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "dbb015531547" + ], + "text": "here", + "_key": "93f8caa31d0e" + }, + { + "marks": [], + "text": " and ", + "_key": "b33a11e19fda", + "_type": "span" + }, + { + "marks": [ + "0f01f4e3f263" + ], + "text": "here", + "_key": "8cedf43eeb1f", + "_type": "span" + }, + { + "_key": "3cc6689db67b", + "_type": "span", + "marks": [], + "text": " to see them in action!" 
+ }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "41d32e2c2916"
      },
      {
        "_key": "9146b6d47cd7",
        "children": [
          {
            "_type": "span",
            "text": "",
            "_key": "72cd5747369a"
          }
        ],
        "_type": "block",
        "style": "normal"
      },
      {
        "_key": "6b1ef11fd3a0",
        "children": [
          {
            "_type": "span",
            "text": "Resource limits",
            "_key": "f44fabf72540"
          }
        ],
        "_type": "block",
        "style": "h3"
      },
      {
        "_type": "block",
        "style": "normal",
        "_key": "15076ef3b76e",
        "markDefs": [],
        "children": [
          {
            "_type": "span",
            "marks": [],
            "text": "The ",
            "_key": "e5e2010cc945"
          },
          {
            "text": "resourceLimits",
            "_key": "3a2551ec71d4",
            "_type": "span",
            "marks": [
              "strong"
            ]
          },
          {
            "text": " directive is a new process directive which allows you to define global limits on the resources requested by individual tasks. For example, if you know that the largest node in your compute environment has 24 CPUs, 768 GB of memory, and a maximum walltime of 72 hours, you might specify the following:",
            "_key": "65ddce241245",
            "_type": "span",
            "marks": []
          }
        ]
      },
      {
        "_type": "block",
        "style": "normal",
        "_key": "2ab9a610a82c",
        "children": [
          {
            "text": "",
            "_key": "5fcd12c1fa2b",
            "_type": "span"
          }
        ]
      },
      {
        "_key": "64ad47053799",
        "code": "process.resourceLimits = [ cpus: 24, memory: 768.GB, time: 72.h ]",
        "_type": "code"
      },
      {
        "_key": "9b54c9ae3902",
        "children": [
          {
            "_key": "baac49fc873e",
            "_type": "span",
            "text": ""
          }
        ],
        "_type": "block",
        "style": "normal"
      },
      {
        "_key": "d3fa1813fceb",
        "markDefs": [
          {
            "_key": "401af70f9fde",
            "_type": "link",
            "href": "https://nextflow.io/docs/latest/process.html#dynamic-computing-resources"
          },
          {
            "_type": "link",
            "href": "https://nextflow.io/docs/latest/process.html#resourcelimits",
            "_key": "b7a1925ece6e"
          }
        ],
        "children": [
          {
            "_type": "span",
            "marks": [],
            "text": "If a task requests more than the specified limit (e.g. due to ",
            "_key": "2ce266e12f3a"
          },
          {
            "text": "retry with dynamic resources",
            "_key": "20c20a706634",
            "_type": "span",
            "marks": [
              "401af70f9fde"
            ]
          },
          {
            "_key": "80ca2f00a9b8",
            "_type": "span",
            "marks": [],
            "text": "), Nextflow will automatically reduce the task resources to satisfy the limit, whereas normally the task would be rejected by the scheduler or would simply wait in the queue forever! The nf-core community has maintained a custom workaround for this problem, the "
          },
          {
            "marks": [
              "code"
            ],
            "text": "check_max()",
            "_key": "cdfd0b18dd66",
            "_type": "span"
          },
          {
            "text": " function, which can now be replaced with ",
            "_key": "d3f1f9c1e222",
            "_type": "span",
            "marks": []
          },
          {
            "_type": "span",
            "marks": [
              "code"
            ],
            "text": "resourceLimits",
            "_key": "7c9a672623c5"
          },
          {
            "text": ". 
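      {
        "_type": "block",
        "style": "normal",
        "_key": "added-reslimits-note",
        "markDefs": [],
        "children": [
          {
            "_type": "span",
            "marks": [],
            "text": "As a minimal sketch (not part of the original release notes; the process name and values below are arbitrary), a task that grows its memory request on each retry is simply capped at the configured limit instead of being rejected:",
            "_key": "added-reslimits-note-span"
          }
        ]
      },
      {
        "code": "// Illustrative example only. Assumes the config shown above:\n// process.resourceLimits = [ cpus: 24, memory: 768.GB, time: 72.h ]\nprocess ASSEMBLE {\n    memory { 256.GB * task.attempt }  // 256 GB, 512 GB, 768 GB, then 1024 GB...\n    errorStrategy 'retry'\n    maxRetries 3                      // the 4th attempt would request 1024 GB,\n                                      // which resourceLimits reduces to 768 GB\n\n    script:\n    \"\"\"\n    run_assembly.sh\n    \"\"\"\n}",
        "_type": "code",
        "_key": "added-reslimits-code"
      },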
See the ", + "_key": "1899583359ab", + "_type": "span", + "marks": [] + }, + { + "_key": "bd63d4401634", + "_type": "span", + "marks": [ + "b7a1925ece6e" + ], + "text": "Nextflow docs" + }, + { + "marks": [], + "text": " for more information.", + "_key": "954b15f9c395", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "b6290cff554b" + } + ], + "_type": "block", + "style": "normal", + "_key": "95d1ca831fb8" + }, + { + "style": "h3", + "_key": "551589835037", + "children": [ + { + "_key": "8152b5db88fe", + "_type": "span", + "text": "Job arrays" + } + ], + "_type": "block" + }, + { + "_key": "bdabcefeaa7d", + "markDefs": [], + "children": [ + { + "text": "Job arrays", + "_key": "6735621e3543", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_type": "span", + "marks": [], + "text": " are now supported in Nextflow using the ", + "_key": "0e64b777706f" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "array", + "_key": "999aabcfa026" + }, + { + "_type": "span", + "marks": [], + "text": " directive. Most HPC schedulers, and even some cloud batch services including AWS Batch and Google Batch, support a "job array" which allows you to submit many independent jobs with a single job script. While the individual jobs are still executed separately as normal, submitting jobs as arrays where possible puts considerably less stress on the scheduler.", + "_key": "3ebf2e7cb85c" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "b2589c62a953" + } + ], + "_type": "block", + "style": "normal", + "_key": "1502e87fa3c6" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "With Nextflow, using job arrays is a one-liner:", + "_key": "82f99ab75041" + } + ], + "_type": "block", + "style": "normal", + "_key": "e9ffaac0ec29" + }, + { + "style": "normal", + "_key": "6df64acb8b0b", + "children": [ + { + "_key": "cb904d24755b", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "code": "process.array = 100", + "_type": "code", + "_key": "fa7c0d0d8b2e" + }, + { + "_key": "657c9f7e6253", + "children": [ + { + "_type": "span", + "text": "", + "_key": "77a9b005acd6" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "_key": "fe40820e1e62", + "_type": "link", + "href": "https://nextflow.io/docs/latest/process.html#array" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "You can also enable job arrays for individual processes like any other directive. See the ", + "_key": "47f50deb216b" + }, + { + "_type": "span", + "marks": [ + "fe40820e1e62" + ], + "text": "Nextflow docs", + "_key": "0ec0ab201634" + }, + { + "_key": "04e0b3828edc", + "_type": "span", + "marks": [], + "text": " for more information." 
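      {
        "_type": "block",
        "style": "normal",
        "_key": "added-array-note",
        "markDefs": [],
        "children": [
          {
            "_type": "span",
            "marks": [],
            "text": "For example (an illustrative sketch, not taken from the original post; the process name and array size are arbitrary), the directive can be set on a single process like this:",
            "_key": "added-array-note-span"
          }
        ]
      },
      {
        "code": "// Illustrative example only: submit this process's tasks in batches of 50\nprocess ALIGN {\n    array 50\n\n    input:\n    path reads\n\n    script:\n    \"\"\"\n    align.sh $reads\n    \"\"\"\n}",
        "_type": "code",
        "_key": "added-array-code"
      },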
+ } + ], + "_type": "block", + "style": "normal", + "_key": "33a9534a2675" + }, + { + "children": [ + { + "_key": "d7265d1cf201", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "4685c61b51cb" + }, + { + "markDefs": [], + "children": [ + { + "_key": "645fcfe3c651", + "_type": "span", + "marks": [], + "text": ":::tip On Google Batch, using job arrays also allows you to pack multiple tasks onto the same VM by using the " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "machineType", + "_key": "13e48c906599" + }, + { + "text": " directive in conjunction with the ", + "_key": "6d0022d2bc90", + "_type": "span", + "marks": [] + }, + { + "text": "cpus", + "_key": "3a626f7bd57e", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "text": " and ", + "_key": "b390ba343f6a", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "memory", + "_key": "6f0100b86939" + }, + { + "marks": [], + "text": " directives. :::", + "_key": "11a8fdeda8b2", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "95d29d016b38" + }, + { + "style": "normal", + "_key": "84c29fa9c40f", + "children": [ + { + "_type": "span", + "text": "", + "_key": "bbdcfb99647c" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "h2", + "_key": "59326bb7d850", + "children": [ + { + "_key": "4d1e89cde68a", + "_type": "span", + "text": "Enhancements" + } + ] + }, + { + "_type": "block", + "style": "h3", + "_key": "4b706c1b484b", + "children": [ + { + "_type": "span", + "text": "Colored logs", + "_key": "84266e23add4" + } + ] + }, + { + "_key": "e5380afd470a", + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "9d24f5992ace", + "markDefs": [ + { + "href": "https://nextflow.io/blog/2024/nextflow-colored-logs.html", + "_key": "dfa18d7b8f6d", + "_type": "link" + } + ], + "children": [ + { + "_key": "b8237056c730", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Colored logs" + }, + { + "marks": [], + "text": " have come to Nextflow! Specifically, the process log which is continuously printed to the terminal while the pipeline is running. Not only is it more colorful, but it also makes better use of the available space to show you what's most important. 
But we already wrote an entire ",
            "_key": "e909f98b2645",
            "_type": "span"
          },
          {
            "text": "blog post",
            "_key": "368bb9469d2c",
            "_type": "span",
            "marks": [
              "dfa18d7b8f6d"
            ]
          },
          {
            "_type": "span",
            "marks": [],
            "text": " about it, so go check that out for more details!",
            "_key": "71b6e615d0bc"
          }
        ]
      },
      {
        "_key": "4a46295e9e26",
        "children": [
          {
            "_type": "span",
            "text": "",
            "_key": "5277f8addee9"
          }
        ],
        "_type": "block",
        "style": "normal"
      },
      {
        "_type": "block",
        "_key": "3cf067af661c"
      },
      {
        "alt": "New coloured output from Nextflow",
        "_key": "bf914b019395",
        "asset": {
          "_type": "reference",
          "_ref": "image-aca8082c7fcd2be86b3cbd8d29611c81c8127620-2532x1577-png"
        },
        "_type": "image"
      },
      {
        "children": [
          {
            "_type": "span",
            "text": "",
            "_key": "94653e98ddde"
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "0d64d965155c"
      },
      {
        "_type": "block",
        "_key": "e8b4f2b16f0e"
      },
      {
        "_type": "block",
        "style": "h3",
        "_key": "aee22ad5b92b",
        "children": [
          {
            "_key": "b31d1b72bdd4",
            "_type": "span",
            "text": "AWS Fargate support"
          }
        ]
      },
      {
        "_type": "block",
        "style": "normal",
        "_key": "6a33ada10117",
        "markDefs": [
          {
            "_type": "link",
            "href": "https://nextflow.io/docs/latest/aws.html#aws-fargate",
            "_key": "df9015e7e177"
          }
        ],
        "children": [
          {
            "_type": "span",
            "marks": [],
            "text": "Nextflow now supports ",
            "_key": "ed4efebd7c56"
          },
          {
            "_key": "34eaa40c12dd",
            "_type": "span",
            "marks": [
              "strong"
            ],
            "text": "AWS Fargate"
          },
          {
            "_key": "0e5a86990bd8",
            "_type": "span",
            "marks": [],
            "text": " for AWS Batch jobs. See the "
          },
          {
            "_type": "span",
            "marks": [
              "df9015e7e177"
            ],
            "text": "Nextflow docs",
            "_key": "67dc13efad7f"
          },
          {
            "_type": "span",
            "marks": [],
            "text": " for details.",
            "_key": "ddbfda3f3e93"
          }
        ]
      },
      {
        "children": [
          {
            "_type": "span",
            "text": "",
            "_key": "b2e06783cfb8"
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "ee2c62be20e9"
      },
      {
        "_type": "block",
        "style": "h3",
        "_key": "4ccfa0b15f77",
        "children": [
          {
            "_key": "127d720a7b2d",
            "_type": "span",
            "text": "OCI auto pull mode for Singularity and Apptainer"
          }
        ]
      },
      {
        "style": "normal",
        "_key": "0186cd80b694",
        "markDefs": [],
        "children": [
          {
            "_key": "8cefb0272343",
            "_type": "span",
            "marks": [],
            "text": "Nextflow now supports OCI auto pull mode for both Singularity and Apptainer. Historically, Singularity could run a Docker container image by converting it to the Singularity image file format via the Singularity pull command and using the resulting image file in the exec command. This adds extra overhead to the head node running Nextflow, which has to convert all container images to the Singularity format."
          }
        ],
        "_type": "block"
      },
      {
        "children": [
          {
            "_key": "63b121fda7a2",
            "_type": "span",
            "text": ""
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "8f6bc8e8aeff"
      },
      {
        "_type": "block",
        "style": "normal",
        "_key": "bfbe871517c4",
        "markDefs": [],
        "children": [
          {
            "_type": "span",
            "marks": [],
            "text": "Now Nextflow allows specifying the option ",
            "_key": "c7a4296e6c2a"
          },
          {
            "text": "ociAutoPull",
            "_key": "96c774749ab2",
            "_type": "span",
            "marks": [
              "code"
            ]
          },
          {
            "marks": [],
            "text": " for both Singularity and Apptainer. 
When enabling this setting, Nextflow delegates the pull and conversion of the Docker image directly to the ",
            "_key": "d58d68d99236",
            "_type": "span"
          },
          {
            "text": "exec",
            "_key": "4d13570323f5",
            "_type": "span",
            "marks": [
              "code"
            ]
          },
          {
            "_key": "67ba92a1b76f",
            "_type": "span",
            "marks": [],
            "text": " command."
          }
        ]
      },
      {
        "_key": "3b71e4f8b356",
        "children": [
          {
            "_type": "span",
            "text": "",
            "_key": "bf12e3f90477"
          }
        ],
        "_type": "block",
        "style": "normal"
      },
      {
        "code": "singularity.ociAutoPull = true",
        "_type": "code",
        "_key": "42d99ee2e155"
      },
      {
        "_key": "40f2dd26bdfc",
        "children": [
          {
            "_type": "span",
            "text": "",
            "_key": "221425f6ac8c"
          }
        ],
        "_type": "block",
        "style": "normal"
      },
      {
        "_type": "block",
        "style": "normal",
        "_key": "05613f2e89ea",
        "markDefs": [],
        "children": [
          {
            "text": "This moves the pulling and caching of Singularity images to the compute jobs instead of the head job, removing the need to maintain a separate image file cache.",
            "_key": "5c21500334e5",
            "_type": "span",
            "marks": []
          }
        ]
      },
      {
        "_type": "block",
        "style": "normal",
        "_key": "46d64ab0ebd8",
        "children": [
          {
            "_type": "span",
            "text": "",
            "_key": "b84cdd204874"
          }
        ]
      },
      {
        "markDefs": [
          {
            "href": "https://nextflow.io/docs/latest/config.html#scope-singularity",
            "_key": "6539c975522d",
            "_type": "link"
          }
        ],
        "children": [
          {
            "_type": "span",
            "marks": [],
            "text": "See the ",
            "_key": "964821bd744b"
          },
          {
            "marks": [
              "6539c975522d"
            ],
            "text": "Nextflow docs",
            "_key": "d9410a6b7de2",
            "_type": "span"
          },
          {
            "marks": [],
            "text": " for more information.",
            "_key": "b1f2b22badf0",
            "_type": "span"
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "ffd93508a693"
      },
      {
        "children": [
          {
            "_key": "1dac8bf88bd9",
            "_type": "span",
            "text": ""
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "f8ebe2f1dda8"
      },
      {
        "_key": "12746235b124",
        "children": [
          {
            "_type": "span",
            "text": "Support for GA4GH TES",
            "_key": "718674fca84d"
          }
        ],
        "_type": "block",
        "style": "h3"
      },
      {
        "style": "normal",
        "_key": "5d70c11c99c5",
        "markDefs": [
          {
            "_key": "6d28ff2ff74f",
            "_type": "link",
            "href": "https://ga4gh.github.io/task-execution-schemas/docs/"
          },
          {
            "_key": "a160083885e8",
            "_type": "link",
            "href": "https://www.ga4gh.org/"
          },
          {
            "_type": "link",
            "href": "https://github.com/ohsu-comp-bio/funnel",
            "_key": "eecd77a2b7ac"
          },
          {
            "href": "https://github.com/microsoft/ga4gh-tes",
            "_key": "ebd6249e99b5",
            "_type": "link"
          }
        ],
        "children": [
          {
            "_type": "span",
            "marks": [],
            "text": "The ",
            "_key": "22c1f960cd92"
          },
          {
            "text": "Task Execution Service (TES)",
            "_key": "e07d5392bd23",
            "_type": "span",
            "marks": [
              "6d28ff2ff74f"
            ]
          },
          {
            "_type": "span",
            "marks": [],
            "text": " is an API specification, developed by ",
            "_key": "8bdad5a92db6"
          },
          {
            "marks": [
              "a160083885e8"
            ],
            "text": "GA4GH",
            "_key": "0bb9a2b680c1",
            "_type": "span"
          },
          {
            "_type": "span",
            "marks": [],
            "text": ", which attempts to provide a standard way for workflow managers like Nextflow to interface with execution backends. 
Two noteworthy TES implementations are ", + "_key": "37b6cd9e4c7b" + }, + { + "_type": "span", + "marks": [ + "eecd77a2b7ac" + ], + "text": "Funnel", + "_key": "023b74efbd6b" + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "24344b030149" + }, + { + "_key": "a55645a98515", + "_type": "span", + "marks": [ + "ebd6249e99b5" + ], + "text": "TES Azure" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "81ca571996b9" + } + ], + "_type": "block" + }, + { + "_key": "66a2becd7c5e", + "children": [ + { + "text": "", + "_key": "dec181d1faa6", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow has long supported TES as an executor, but only in a limited sense, as TES did not support some important capabilities in Nextflow such as glob and directory outputs and the ", + "_key": "88b94210ef74" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "bin", + "_key": "2206cbf28ca2" + }, + { + "marks": [], + "text": " directory. However, with TES 1.1 and its adoption into Nextflow, these gaps have been closed. You can use the TES executor with the following configuration:", + "_key": "3aa8ef91e443", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "e2fd7cb5d97c" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "8c7d1a1e091f" + } + ], + "_type": "block", + "style": "normal", + "_key": "dcd607188b80" + }, + { + "_key": "3967136bc682", + "code": "plugins {\n id 'nf-ga4gh'\n}\n\nprocess.executor = 'tes'\ntes.endpoint = '...'", + "_type": "code" + }, + { + "style": "normal", + "_key": "447067039818", + "children": [ + { + "text": "", + "_key": "dcfc1e1af973", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "bef1f47fc67b", + "markDefs": [ + { + "_type": "link", + "href": "https://nextflow.io/docs/latest/executor.html#ga4gh-tes", + "_key": "0f440b394c1b" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "See the ", + "_key": "b8e67be34604" + }, + { + "marks": [ + "0f440b394c1b" + ], + "text": "Nextflow docs", + "_key": "6276972b35c4", + "_type": "span" + }, + { + "marks": [], + "text": " for more information.", + "_key": "68a7b4e484e8", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "c45acdd70931" + } + ], + "_type": "block", + "style": "normal", + "_key": "f20f35048655" + }, + { + "_key": "b3f5b3848a0d", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": ":::note To better facilitate community contributions, the nf-ga4gh plugin will soon be moved from the Nextflow repository into its own repository, ", + "_key": "d7f6e46d4ee0" + }, + { + "text": "nextflow-io/nf-ga4gh", + "_key": "6edc0cfe0a05", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": ". To ensure a smooth transition with your pipelines, make sure to explicitly include the plugin in your configuration as shown above. 
:::",
            "_key": "cbbe1ba50e00"
          }
        ],
        "_type": "block",
        "style": "normal"
      },
      {
        "children": [
          {
            "_type": "span",
            "text": "",
            "_key": "221827e061cf"
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "deaa6fef10f7"
      },
      {
        "children": [
          {
            "_type": "span",
            "text": "Fusion",
            "_key": "e138c2d8053b"
          }
        ],
        "_type": "block",
        "style": "h2",
        "_key": "b8a43250eecb"
      },
      {
        "_key": "1ce4a38931e5",
        "markDefs": [
          {
            "_key": "bb03411aacf3",
            "_type": "link",
            "href": "https://seqera.io/fusion/"
          }
        ],
        "children": [
          {
            "marks": [
              "bb03411aacf3"
            ],
            "text": "Fusion",
            "_key": "f729ba1c38e1",
            "_type": "span"
          },
          {
            "marks": [],
            "text": " is a distributed virtual file system for cloud-native data pipelines, optimized for Nextflow workloads. Nextflow 24.04 now works with a new release, Fusion 2.3. This brings a few notable quality-of-life improvements:",
            "_key": "fde17e1bf817",
            "_type": "span"
          }
        ],
        "_type": "block",
        "style": "normal"
      },
      {
        "style": "normal",
        "_key": "ed96a9bad94a",
        "children": [
          {
            "_type": "span",
            "text": "",
            "_key": "5deb6e0d44d8"
          }
        ],
        "_type": "block"
      },
      {
        "children": [
          {
            "text": "Enhanced Garbage Collection",
            "_key": "9b8874103506",
            "_type": "span"
          }
        ],
        "_type": "block",
        "style": "h3",
        "_key": "cae24b525118"
      },
      {
        "_key": "bd5deb6b3df1",
        "markDefs": [],
        "children": [
          {
            "_type": "span",
            "marks": [],
            "text": "Fusion 2.3 features an improved garbage collection system, enabling it to operate effectively with reduced scratch storage. This enhancement ensures that your pipelines run more efficiently, even with limited temporary storage.",
            "_key": "58fa1d303887"
          }
        ],
        "_type": "block",
        "style": "normal"
      },
      {
        "_key": "0e6ba71967dc",
        "children": [
          {
            "_type": "span",
            "text": "",
            "_key": "26478170d5d8"
          }
        ],
        "_type": "block",
        "style": "normal"
      },
      {
        "children": [
          {
            "_key": "b4fcd67cef84",
            "_type": "span",
            "text": "Increased File Handling Capacity"
          }
        ],
        "_type": "block",
        "style": "h3",
        "_key": "d2eef148c093"
      },
      {
        "markDefs": [],
        "children": [
          {
            "_key": "c7a0efb102db",
            "_type": "span",
            "marks": [],
            "text": "Support for more concurrently open files is another significant improvement in Fusion 2.3. This means that larger directories, such as those used by Alphafold2, can now be utilized without issues, facilitating the handling of extensive datasets."
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "d141f0f6e179"
      },
      {
        "children": [
          {
            "_type": "span",
            "text": "",
            "_key": "35505d8fa5b1"
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "69ee157f75d6"
      },
      {
        "children": [
          {
            "_type": "span",
            "text": "Correct Publishing of Symbolic Links",
            "_key": "43775737120b"
          }
        ],
        "_type": "block",
        "style": "h3",
        "_key": "54a28730d372"
      },
      {
        "style": "normal",
        "_key": "c8e0916d8498",
        "markDefs": [],
        "children": [
          {
            "text": "In previous versions, output files that were symbolic links were not published correctly — instead of the actual file, a text file containing the file path was published. 
Fusion 2.3 addresses this issue, ensuring that symbolic links are published correctly.", + "_key": "b79e5d96c7fa", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "2220802c2577", + "children": [ + { + "text": "", + "_key": "8daeb9cbdf54", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "e1b46453dbeb", + "markDefs": [], + "children": [ + { + "_key": "b93d2a601bbc", + "_type": "span", + "marks": [], + "text": "These enhancements in Fusion 2.3 contribute to a more robust and efficient filesystem for Nextflow users." + } + ] + }, + { + "_key": "a0e88cb461b8", + "children": [ + { + "_key": "e8d94230d96a", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "19fc0690bba8", + "_type": "span", + "text": "Other notable changes" + } + ], + "_type": "block", + "style": "h2", + "_key": "50091cbe17e5" + }, + { + "_key": "65703344e2fb", + "listItem": "bullet", + "children": [ + { + "text": "Add native retry on spot termination for Google Batch ([`ea1c1b`](https://github.com/nextflow-io/nextflow/commit/ea1c1b70da7a9b8c90de445b8aee1ee7a7148c9b))", + "_key": "c9a6d48a466b", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [], + "text": "Add support for instance templates in Google Batch ([`df7ed2`](https://github.com/nextflow-io/nextflow/commit/df7ed294520ad2bfc9ad091114ae347c1e26ae96))", + "_key": "721c2453e319" + }, + { + "_type": "span", + "marks": [], + "text": "Allow secrets to be used with `includeConfig` ([`00c9f2`](https://github.com/nextflow-io/nextflow/commit/00c9f226b201c964f67d520d0404342bc33cf61d))", + "_key": "2953286e2ce4" + }, + { + "_key": "505427a34383", + "_type": "span", + "marks": [], + "text": "Allow secrets to be used in the pipeline script ([`df866a`](https://github.com/nextflow-io/nextflow/commit/df866a243256d5018e23b6c3237fb06d1c5a4b27))" + }, + { + "text": "Add retry strategy for publishing ([`c9c703`](https://github.com/nextflow-io/nextflow/commit/c9c7032c2e34132cf721ffabfea09d893adf3761))", + "_key": "1d5347afe351", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [], + "text": "Add `k8s.cpuLimits` config option ([`3c6e96`](https://github.com/nextflow-io/nextflow/commit/3c6e96d07c9a4fa947cf788a927699314d5e5ec7))", + "_key": "8fe0361ec7d7" + }, + { + "marks": [], + "text": "Removed `seqera` and `defaults` from the standard channels used by the nf-wave plugin. 
([`ec5ebd`](https://github.com/nextflow-io/nextflow/commit/ec5ebd0bc96e986415e7bac195928b90062ed062))", + "_key": "2d2e4e025000", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "7863080c74d6", + "children": [ + { + "_type": "span", + "text": "", + "_key": "dba5c01be978" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "_key": "9d3fb20c3490", + "_type": "link", + "href": "https://github.com/nextflow-io/nextflow/releases/tag/v24.04.0" + } + ], + "children": [ + { + "marks": [], + "text": "You can view the full ", + "_key": "25a367774129", + "_type": "span" + }, + { + "text": "Nextflow release notes on GitHub", + "_key": "16fce96ab399", + "_type": "span", + "marks": [ + "9d3fb20c3490" + ] + }, + { + "text": ".", + "_key": "ba376df78e17", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "356b6ff740e0" + } + ], + "_type": "blogPost", + "title": "Nextflow 24.04 - Release highlights", + "_updatedAt": "2024-09-25T14:18:03Z" + }, + { + "_rev": "Ot9x7kyGeH5005E3MJ9TnO", + "title": "Edge release 19.03: The Sequence Read Archive & more!", + "publishedAt": "2019-03-19T07:00:00.000Z", + "tags": [ + { + "_type": "reference", + "_key": "d15804e240ca", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493" + } + ], + "_createdAt": "2024-09-25T14:15:43Z", + "body": [ + { + "_key": "4e136b88cc83", + "markDefs": [], + "children": [ + { + "text": "It's time for the monthly Nextflow release for March, ", + "_key": "67bc588250ea", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "edge", + "_key": "b9b46318a95d" + }, + { + "text": " version 19.03. This is another great release with some cool new features, bug fixes and improvements.", + "_key": "20d50a0505b2", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "9009129bc77d" + } + ], + "_type": "block", + "style": "normal", + "_key": "fad70a23d44e" + }, + { + "children": [ + { + "_key": "a2401babb87b", + "_type": "span", + "text": "SRA channel factory" + } + ], + "_type": "block", + "style": "h3", + "_key": "4c433f4c6a04" + }, + { + "_key": "6a699e8b3acc", + "markDefs": [ + { + "_type": "link", + "href": "https://www.ncbi.nlm.nih.gov/sra", + "_key": "cea210fea683" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "This sees the introduction of the long-awaited sequence read archive (SRA) channel factory. 
The ",
            "_key": "6df17a604a47"
          },
          {
            "_key": "b49ecd31d057",
            "_type": "span",
            "marks": [
              "cea210fea683"
            ],
            "text": "SRA"
          },
          {
            "marks": [],
            "text": " is a key public repository for sequencing data and is run in coordination between The National Center for Biotechnology Information (NCBI), The European Bioinformatics Institute (EBI) and the DNA Data Bank of Japan (DDBJ).",
            "_key": "a479b062944c",
            "_type": "span"
          }
        ],
        "_type": "block",
        "style": "normal"
      },
      {
        "children": [
          {
            "_type": "span",
            "text": "",
            "_key": "f6b6e637f639"
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "e8ad88128670"
      },
      {
        "markDefs": [
          {
            "_type": "link",
            "href": "https://github.com/nextflow-io/nextflow/issues/89",
            "_key": "b624e9be2505"
          },
          {
            "_type": "link",
            "href": "https://ewels.github.io/sra-explorer/",
            "_key": "b9dae8ab88d1"
          },
          {
            "_key": "8522e0b18f33",
            "_type": "link",
            "href": "https://www.nextflow.io/docs/latest/channel.html#fromfilepairs"
          }
        ],
        "children": [
          {
            "marks": [],
            "text": "This feature originates all the way back in ",
            "_key": "223e42485b9f",
            "_type": "span"
          },
          {
            "marks": [
              "b624e9be2505"
            ],
            "text": "2015",
            "_key": "dd692646142a",
            "_type": "span"
          },
          {
            "_key": "97d3c5938e37",
            "_type": "span",
            "marks": [],
            "text": " and was worked on during a 2018 Nextflow hackathon. It was brought to the fore again thanks to the release of Phil Ewels' excellent "
          },
          {
            "marks": [
              "b9dae8ab88d1"
            ],
            "text": "SRA Explorer",
            "_key": "66079ce2ee11",
            "_type": "span"
          },
          {
            "marks": [],
            "text": ". The SRA channel factory allows users to pull read data in FASTQ format directly from SRA by referencing a study, accession ID or even a keyword. It works in a similar way to ",
            "_key": "6e6a1dc8d5c8",
            "_type": "span"
          },
          {
            "text": "`fromFilePairs`",
            "_key": "1f3556066637",
            "_type": "span",
            "marks": [
              "8522e0b18f33"
            ]
          },
          {
            "marks": [],
            "text": ", returning a sample ID and files (single or pairs of files) for each sample.",
            "_key": "44035ef6d932",
            "_type": "span"
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "82cdcee95b66"
      },
      {
        "_key": "3963c3f311a6",
        "children": [
          {
            "text": "",
            "_key": "037dd2ed8ba4",
            "_type": "span"
          }
        ],
        "_type": "block",
        "style": "normal"
      },
      {
        "children": [
          {
            "marks": [],
            "text": "The code snippet below creates a channel containing 24 samples from a chromatin dynamics study and runs FASTQC on the resulting files.",
            "_key": "8cdf5635cc69",
            "_type": "span"
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "9efa76fde530",
        "markDefs": []
      },
      {
        "_type": "block",
        "style": "normal",
        "_key": "141eb1620078",
        "children": [
          {
            "_type": "span",
            "text": "",
            "_key": "82d4b6f3c59e"
          }
        ]
      },
      {
        "code": "Channel\n .fromSRA('SRP043510')\n .set{reads}\n\nprocess fastqc {\n input:\n set sample_id, file(reads_file) from reads\n\n output:\n file(\"fastqc_${sample_id}_logs\") into fastqc_ch\n\n script:\n \"\"\"\n mkdir fastqc_${sample_id}_logs\n fastqc -o fastqc_${sample_id}_logs -f fastq -q ${reads_file}\n \"\"\"\n}",
        "_type": "code",
        "_key": "8f0078ccd86b"
      },
      {
        "_key": "69d063032752",
        "children": [
          {
            "_type": "span",
            "text": "",
            "_key": "c7dd535fe6d2"
          }
        ],
        "_type": "block",
        "style": "normal"
      },
      {
        "_type": "block",
        "style": "normal",
        "_key": "8a99193a8c41",
        "markDefs": [
          {
            "href": 
"https://www.nextflow.io/docs/edge/channel.html#fromsra", + "_key": "ffaffa9a5edb", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "See the ", + "_key": "ab6e5e703afa" + }, + { + "_type": "span", + "marks": [ + "ffaffa9a5edb" + ], + "text": "documentation", + "_key": "33201e68ba32" + }, + { + "_type": "span", + "marks": [], + "text": " for more details. When combined with downstream processes, you can quickly open a firehose of data on your workflow!", + "_key": "f077513fb3f6" + } + ] + }, + { + "style": "normal", + "_key": "dd790e1e582c", + "children": [ + { + "_type": "span", + "text": "", + "_key": "a6109a53fc7e" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "Edge release", + "_key": "7fc30091e0b1", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "69bb5ea8442c" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Note that this is a monthly edge release. To use it simply execute the following command prior to running Nextflow:", + "_key": "814ad965fd03" + } + ], + "_type": "block", + "style": "normal", + "_key": "5bdfa216f2d7" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "5d8898e300d2" + } + ], + "_type": "block", + "style": "normal", + "_key": "473030fcf202" + }, + { + "_type": "code", + "_key": "27498f9d6701", + "code": "export NXF_VER=19.03.0-edge" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "21b5e47530e4" + } + ], + "_type": "block", + "style": "normal", + "_key": "a6c4fcbec253" + }, + { + "style": "h3", + "_key": "c18ed3a40121", + "children": [ + { + "text": "If you need help", + "_key": "c06782c67643", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "dec714c02936", + "markDefs": [ + { + "href": "https://gitter.im/nextflow-io/nextflow", + "_key": "a16101aa8b19", + "_type": "link" + }, + { + "_type": "link", + "href": "https://groups.google.com/forum/#!forum/nextflow", + "_key": "88156b52d77c" + } + ], + "children": [ + { + "text": "Please don’t hesitate to use our very active ", + "_key": "9bd30a295105", + "_type": "span", + "marks": [] + }, + { + "_key": "9639ef714aad", + "_type": "span", + "marks": [ + "a16101aa8b19" + ], + "text": "Gitter" + }, + { + "_type": "span", + "marks": [], + "text": " channel or create a thread in the ", + "_key": "2b34e025d1eb" + }, + { + "_type": "span", + "marks": [ + "88156b52d77c" + ], + "text": "Google discussion group", + "_key": "285724f17a80" + }, + { + "text": ".", + "_key": "4a9683b8e21a", + "_type": "span", + "marks": [] + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "6de54a23616a", + "children": [ + { + "text": "", + "_key": "7ebf269746bf", + "_type": "span" + } + ] + }, + { + "children": [ + { + "text": "Reporting Issues", + "_key": "ccb29324fb85", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "46902e2fd6f2" + }, + { + "style": "normal", + "_key": "f2787a57637c", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nextflow-io/nextflow/issues", + "_key": "14d16b367b78" + } + ], + "children": [ + { + "marks": [], + "text": "Experiencing issues introduced by this release? Please report them in our ", + "_key": "c6bb388ffbac", + "_type": "span" + }, + { + "_key": "28f8de0d9ae3", + "_type": "span", + "marks": [ + "14d16b367b78" + ], + "text": "issue tracker" + }, + { + "text": ". 
Make sure to fill in the fields of the issue template.", + "_key": "2073f625f6a1", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "b7b77fe2d3c9" + } + ], + "_type": "block", + "style": "normal", + "_key": "be31f953e1c9" + }, + { + "_key": "13ebcad9bc5c", + "children": [ + { + "_type": "span", + "text": "Contributions", + "_key": "51542dde8c36" + } + ], + "_type": "block", + "style": "h3" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Special thanks to the contributors of this release:", + "_key": "5d026eb2859c" + } + ], + "_type": "block", + "style": "normal", + "_key": "263a1b2bc015" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "3ba1273e1ea5" + } + ], + "_type": "block", + "style": "normal", + "_key": "4f56220b988c" + }, + { + "listItem": "bullet", + "children": [ + { + "_type": "span", + "marks": [], + "text": "Akira Sekiguchi - [pachiras](https://github.com/pachiras)", + "_key": "a29799d1598d" + }, + { + "text": "Jon Haitz Legarreta Gorroño - [jhlegarreta](https://github.com/jhlegarreta)", + "_key": "6670e4782e27", + "_type": "span", + "marks": [] + }, + { + "text": "Jonathan Leitschuh - [JLLeitschuh](https://github.com/JLLeitschuh)", + "_key": "169b63b134f2", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [], + "text": "Kevin Sayers - [KevinSayers](https://github.com/KevinSayers)", + "_key": "b2a178dd0747" + }, + { + "text": "Lukas Jelonek - [lukasjelonek](https://github.com/lukasjelonek)", + "_key": "ff8c294ce074", + "_type": "span", + "marks": [] + }, + { + "text": "Paolo Di Tommaso - [pditommaso](https://github.com/pditommaso)", + "_key": "ffc7efa1b489", + "_type": "span", + "marks": [] + }, + { + "marks": [], + "text": "Toni Hermoso Pulido - [toniher](https://github.com/toniher)", + "_key": "9ce417a91ac1", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": "Philippe Hupé [phupe](https://github.com/phupe)", + "_key": "5b208a3e65ac" + }, + { + "_type": "span", + "marks": [], + "text": "[phue](https://github.com/phue)", + "_key": "1e9500fa4f70" + } + ], + "_type": "block", + "style": "normal", + "_key": "0641be74d333" + }, + { + "style": "normal", + "_key": "34343ee97745", + "children": [ + { + "_key": "5f217248be81", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "Complete changes", + "_key": "e0b2e51cfb52" + } + ], + "_type": "block", + "style": "h3", + "_key": "b0cc543c715a" + }, + { + "listItem": "bullet", + "children": [ + { + "_type": "span", + "marks": [], + "text": "Fix Nextflow hangs submitting jobs to AWS batch #1024", + "_key": "0b3e25ecbc1d" + }, + { + "_type": "span", + "marks": [], + "text": "Fix process builder incomplete output [2fe1052c]", + "_key": "f3b800ea1d1b" + }, + { + "marks": [], + "text": "Fix Grid executor reports invalid queue status #1045", + "_key": "98ad13944a82", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": "Fix Script execute permission is lost in container #1060", + "_key": "eb810ca29d4d" + }, + { + "_type": "span", + "marks": [], + "text": "Fix K8s serviceAccount is not honoured #1049", + "_key": "aebaa57b3678" + }, + { + "text": "Fix K8s kuberun login path #1072", + "_key": "dfd49e50042d", + "_type": "span", + "marks": [] + }, + { + "marks": [], + "text": "Fix K8s imagePullSecret and imagePullPolicy #1062", + "_key": "379f83a793af", + "_type": "span" + }, 
+ { + "marks": [], + "text": "Fix Google Storage docs #1023", + "_key": "588ded32f16c", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": "Fix Env variable NXF_CONDA_CACHEDIR is ignored #1051", + "_key": "7ea2910557af" + }, + { + "_type": "span", + "marks": [], + "text": "Fix failing task due to legacy sleep command [3e150b56]", + "_key": "7a5a2328b449" + }, + { + "text": "Fix SplitText operator should accept a closure parameter #1021", + "_key": "5c519c5e6b03", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [], + "text": "Add Channel.fromSRA factory method #1070", + "_key": "10875e7a6e98" + }, + { + "text": "Add voluntary/involuntary context switches to metrics #1047", + "_key": "21192affbf70", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [], + "text": "Add noHttps option to singularity config #1041", + "_key": "99f003a3e8ec" + }, + { + "_type": "span", + "marks": [], + "text": "Add docker-daemon Singularity support #1043 [dfef1391]", + "_key": "11e07c3d42cb" + }, + { + "marks": [], + "text": "Use peak_vmem and peak_rss as default output in the trace file instead of rss and vmem #1020", + "_key": "a7fcc5a95121", + "_type": "span" + }, + { + "text": "Improve ansi log rendering #996 [33038a18]", + "_key": "9d82820c2778", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "7dda48f48328" + }, + { + "children": [ + { + "text": "", + "_key": "7c62a2ea3d0a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "2942fbdb143f" + }, + { + "children": [ + { + "_type": "span", + "text": "Breaking changes:", + "_key": "8078dc96d44a" + } + ], + "_type": "block", + "style": "h3", + "_key": "458d3220abb9" + }, + { + "_key": "e99ab15ca693", + "markDefs": [], + "children": [ + { + "_key": "c3735f6f03c7", + "_type": "span", + "marks": [], + "text": "None known." + } + ], + "_type": "block", + "style": "normal" + } + ], + "_id": "5a9b744cd374", + "meta": { + "slug": { + "current": "release-19.03.0-edge" + } + }, + "author": { + "_type": "reference", + "_ref": "evan-floden" + }, + "_type": "blogPost", + "_updatedAt": "2024-09-26T09:02:14Z" + }, + { + "_updatedAt": "2024-08-16T15:23:20Z", + "tags": [ + { + "_key": "c2000532faf0", + "_ref": "82fd60f1-c6d0-4b8a-9c5d-f971c622f341", + "_type": "reference" + } + ], + "_id": "5cf61b02-f036-49f9-850c-72e0bf3d4f35", + "author": { + "_ref": "mattia-bosio", + "_type": "reference" + }, + "_createdAt": "2024-08-15T08:44:07Z", + "_type": "blogPost", + "title": "Introducing the new Pipeline Launch forms: A leap forward in usability and functionality", + "_rev": "y83n3eQxj1PRqzuDdkeW1u", + "meta": { + "_type": "meta", + "description": "Today, we are excited to introduce the newly redesigned Pipeline launch forms, marking the first phase in a broader initiative to revamp the entire form submission experience across our platform. ", + "noIndex": false, + "slug": { + "_type": "slug", + "current": "new-pipeline-launch-forms" + } + }, + "body": [ + { + "children": [ + { + "marks": [], + "text": "At Seqera, we’re committed to listening to our users feedback and continuously improving the Platform to meet your evolving needs. 
One of the most ", + "_key": "02fd5be96db90", + "_type": "span" + }, + { + "_key": "02fd5be96db91", + "_type": "span", + "marks": [ + "88ea45bac22e" + ], + "text": "common feature requests" + }, + { + "_key": "02fd5be96db92", + "_type": "span", + "marks": [], + "text": " has been to enhance the form submission process, specifically the Pipeline Launch and Relaunch forms. Today, we are excited to introduce the newly redesigned Pipeline Launch forms, marking the first phase in a broader initiative to " + }, + { + "_key": "02fd5be96db93", + "_type": "span", + "marks": [ + "strong" + ], + "text": "revamp the entire form submission experience" + }, + { + "text": " across our platform. This update drastically simplifies interactions in the Seqera Platform, enhancing the day-to-day user experience by addressing known usability issues in our most frequently used forms.", + "_key": "02fd5be96db94", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "c3aaafb21d9e", + "markDefs": [ + { + "_type": "link", + "href": "https://feedback.seqera.io/feature-requests/p/update-forms-user-interface-including-pipeline-launch-relaunch-form-redesign", + "_key": "88ea45bac22e" + } + ] + }, + { + "id": "_fru3RxBDPY", + "_key": "8957e1a6644c", + "_type": "youtube" + }, + { + "_type": "block", + "style": "normal", + "_key": "d013fcd8ab46", + "markDefs": [ + { + "_type": "link", + "href": "https://nf-co.re/rnaseq", + "_key": "115a2d75cd73" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "em" + ], + "text": "Screen recording of the submission of the popular", + "_key": "ff0559080c5f" + }, + { + "text": " ", + "_key": "4a7b74425853", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "em", + "115a2d75cd73" + ], + "text": "nf-core rnaseq", + "_key": "825c7d2a20a2" + }, + { + "marks": [], + "text": " ", + "_key": "1371156ece2d", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "pipeline, highlighting several features of the new form.", + "_key": "61f7d7a58d6d" + } + ] + }, + { + "markDefs": [ + { + "_type": "link", + "href": "/contact-us/", + "_key": "d729831b3c3d" + } + ], + "children": [ + { + "_key": "62c6e4fa4b92", + "_type": "span", + "marks": [], + "text": "The new Pipeline Launch and Relaunch form will be available to all Cloud users in an upcoming release, but if you are interested in being one of the early adopters, please contact your Seqera Account Executive or " + }, + { + "_type": "span", + "marks": [ + "d729831b3c3d" + ], + "text": "send us an email directly", + "_key": "a327f42f4d58" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "0f90a0416b6a" + } + ], + "_type": "block", + "style": "blockquote", + "_key": "0c7e71524011" + }, + { + "_type": "block", + "style": "h2", + "_key": "8d2fef8b52c1", + "markDefs": [], + "children": [ + { + "_key": "b99fcb056cb80", + "_type": "span", + "marks": [], + "text": "Why the change?" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "User experience is at the heart of everything we do. Over time, we've received valuable feedback from our users about the forms on our platform. In particular, we gathered feedback on some of the most frequently used forms: Pipeline Launch, Relaunch and Resume forms. 
In response, we have made significant enhancements to create a more intuitive, efficient, and user-friendly experience.", + "_key": "e1d6b1a2dfc3", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "2ab0657a6c5d" + }, + { + "_type": "block", + "style": "h2", + "_key": "839b514875ce", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Key Objectives", + "_key": "eff267db0ffe0" + } + ] + }, + { + "_key": "663f9546066e", + "markDefs": [], + "children": [ + { + "text": "The redesign of the Pipeline Launch and Relaunch forms was guided by four objectives:", + "_key": "7ff0898d0be60", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Simpler navigation: ", + "_key": "1d26e055f5610" + }, + { + "_key": "5e9912500f67", + "_type": "span", + "marks": [], + "text": "The new multi-step approach ensures that users can easily navigate through pipeline launch form submissions without unnecessary steps. Key information is stored and grouped logically, allowing users to focus on the essential steps.\n" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "615515ed9874", + "listItem": "number", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_key": "29322453600b0", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Enhanced validation:" + }, + { + "_type": "span", + "marks": [], + "text": " We've added robust validation features to ensure the accuracy and completeness of submitted information, reducing errors and helping users avoid common pitfalls during pipeline configuration.\n", + "_key": "38f8ea4cc069" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "0a3c9275d682", + "listItem": "number" + }, + { + "listItem": "number", + "markDefs": [], + "children": [ + { + "marks": [ + "strong" + ], + "text": "Improved clarity:", + "_key": "9a36879cad1a0", + "_type": "span" + }, + { + "_key": "b59310c7a08f", + "_type": "span", + "marks": [], + "text": " Form content has been updated to be more concise and clear, ensuring users can quickly understand the requirements and options available to them, thus reducing confusion and improving overall efficiency.\n" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "41c43b0e4337" + }, + { + "style": "normal", + "_key": "62a1887cef00", + "listItem": "number", + "markDefs": [], + "children": [ + { + "_key": "cc7b374d1f3c0", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Enhanced key components:" + }, + { + "_key": "0b6abf765bcd", + "_type": "span", + "marks": [], + "text": " Key form components have been redesigned to offer a more intuitive user experience. This includes more dynamic control of the configured parameters, the ability to switch between a UI schema view, and interactive JSON and YAML rendering for full control every time a user launches, relaunches or resumes a pipeline." 
+ } + ], + "level": 1, + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Enhancements", + "_key": "312d4a3b6aaf0" + } + ], + "_type": "block", + "style": "h2", + "_key": "fa6adc370047", + "markDefs": [] + }, + { + "style": "normal", + "_key": "73b1e280cd30", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The redesigned Pipeline Launch and Relaunch forms come with a host of new features designed to improve usability and functionality:", + "_key": "61e7239ff41b0" + } + ], + "_type": "block" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "38620088a7ad", + "listItem": "number", + "markDefs": [], + "children": [ + { + "marks": [ + "strong" + ], + "text": "Multi-step approach:", + "_key": "cdb7790f12f30", + "_type": "span" + }, + { + "text": " Users can now navigate through forms with a streamlined, multi-step approach. If everything is set up correctly, there's no need to go through all steps –simply run what you know works.", + "_key": "1890588ea99a", + "_type": "span", + "marks": [] + } + ] + }, + { + "_key": "b97a50ad5378", + "listItem": "number", + "markDefs": [], + "children": [ + { + "_key": "527b2be40ea70", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Enhanced assistance:" + }, + { + "text": " We've improved feedback mechanisms to provide detailed information about errors or missing parameters helping users to quickly identify and rectify issues before launching pipelines.", + "_key": "e27e6ee2d88f", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "listItem": "number", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Developer-friendly:", + "_key": "b73d4356a53b0" + }, + { + "text": " Developers can switch between UI schema views and a more comprehensive parameter view using JSON and YAML interactive rendering. This flexibility allows for dynamic control of form validity and ensures that developers have the tools they need to configure their pipelines effectively.", + "_key": "b02329811097", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "5e86cf51f887" + }, + { + "_type": "block", + "style": "normal", + "_key": "407755155091", + "listItem": "number", + "markDefs": [], + "children": [ + { + "_key": "ea635d07dac20", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Enhanced rendering:" + }, + { + "_type": "span", + "marks": [], + "text": " The form now dynamically generates the UI interface for parameter input whenever a compatible schema is defined. This improvement addresses previous limitations where the UI interface was only rendered when launching a saved pipeline, as opposed to relaunching or using Quick Launch. 
With this update, the UI is rendered consistently across all launching scenarios, providing a more convenient and streamlined experience.", + "_key": "34bd6d508e99" + } + ], + "level": 1 + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "fdc5babed53d", + "listItem": "number", + "markDefs": [], + "children": [ + { + "marks": [ + "strong" + ], + "text": "Improved flow and status information:", + "_key": "afcef5fe2e9a0", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " The new design offers a smoother flow and more informative status updates, providing a clear view of the submission process at every stage.", + "_key": "45d7a470b0fb" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "marks": [ + "strong" + ], + "text": "Summary step:", + "_key": "3e527527f45a", + "_type": "span" + }, + { + "_key": "88d8bdfc44f4", + "_type": "span", + "marks": [], + "text": " A new summary view allows users to review all information at a glance before launching their pipeline.\n" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "909bfadb3e3e", + "listItem": "number" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Summary", + "_key": "1a460c4d3b490" + } + ], + "_type": "block", + "style": "h2", + "_key": "73fc1879d9a4" + }, + { + "style": "normal", + "_key": "2f8698228d89", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "By focusing on these key objectives and enhancements, we aimed to improve one of the most commonly performed actions in the Seqera Platform. The redesigned form makes it easier for new and experienced users to run pipelines in the Seqera Platform. This effort is just the beginning of our goal of enhancing the form submission experience across the platform. Moreover, this initial refactor enables us to continue improving and expanding the user experience in the future. You can expect more enhancements as we roll out additional features and improvements based on our community feedback.", + "_key": "f7514251e56d0", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "0fcc413706ad", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "77091802bb0d", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_key": "dc820fe7dbe3", + "markDefs": [ + { + "_type": "link", + "href": "https://docs.seqera.io/platform/24.1/launch/launchpad#launch-form", + "_key": "0e8e643720c4" + } + ], + "children": [ + { + "text": "Read the ", + "_key": "9654eadd2b3b0", + "_type": "span", + "marks": [] + }, + { + "_key": "bdb86ee0c748", + "_type": "span", + "marks": [ + "0e8e643720c4" + ], + "text": "official documentation" + }, + { + "_type": "span", + "marks": [], + "text": " to find out more.", + "_key": "881341a728c4" + } + ], + "_type": "block", + "style": "normal" + } + ], + "publishedAt": "2024-08-15T09:11:00.000Z" + }, + { + "body": [ + { + "_key": "8aa3cb416193", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nextflow-io/nextflow/commit/c080150321e5000a2c891e477bb582df07b7f75f", + "_key": "c4f5b0ead172" + } + ], + "children": [ + { + "marks": [], + "text": "Nextflow is growing up. The past week marked five years since the ", + "_key": "bbe2d4c241bc", + "_type": "span" + }, + { + "text": "first commit", + "_key": "a859381e5e69", + "_type": "span", + "marks": [ + "c4f5b0ead172" + ] + }, + { + "text": " of the project on GitHub. 
Like a parent reflecting on their child attending school for the first time, we know reaching this point hasn’t been an entirely solo journey, despite Paolo's best efforts!", + "_key": "1fef043c10ae", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "", + "_key": "bbaf86238dec", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "02a83dd87176" + }, + { + "_key": "3f04f1651ccf", + "markDefs": [ + { + "_type": "link", + "href": "https://gitter.im/nextflow-io/nextflow", + "_key": "ed87d807c47d" + } + ], + "children": [ + { + "_key": "9595742c88f8", + "_type": "span", + "marks": [], + "text": "A lot has happened recently and we thought it was time to highlight some of the recent evolutions. We also take the opportunity to extend the warmest of thanks to all those who have contributed to the development of Nextflow as well as the fantastic community of users who consistently provide ideas, feedback and the occasional late night banter on the " + }, + { + "marks": [ + "ed87d807c47d" + ], + "text": "Gitter channel", + "_key": "6437820f7d18", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "3040af7e4e7a" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "87ee7dc08dc5", + "children": [ + { + "_key": "909aef394280", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "fd78a8def337", + "_type": "span", + "marks": [], + "text": "Here are a few neat developments churning out of the birthday cake mix." + } + ], + "_type": "block", + "style": "normal", + "_key": "616102d157b5", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "5f116c6a9a5e" + } + ], + "_type": "block", + "style": "normal", + "_key": "5ac0bffdeba5" + }, + { + "_type": "block", + "style": "h3", + "_key": "e7c8758d1ac7", + "children": [ + { + "_type": "span", + "text": "nf-core", + "_key": "d056eb05a8e8" + } + ] + }, + { + "children": [ + { + "_key": "b00b7bc82eee", + "_type": "span", + "marks": [ + "9c09d1ca00c5" + ], + "text": "nf-core" + }, + { + "_type": "span", + "marks": [], + "text": " is a community effort to provide a home for high quality, production-ready, curated analysis pipelines built using Nextflow. The project has been initiated and is being led by ", + "_key": "eaa24b3c6570" + }, + { + "text": "Phil Ewels", + "_key": "e8d0714e004a", + "_type": "span", + "marks": [ + "ea5a8f960ef4" + ] + }, + { + "_key": "6a9e17c0d5fe", + "_type": "span", + "marks": [], + "text": " of " + }, + { + "marks": [ + "9ab93ebace41" + ], + "text": "MultiQC", + "_key": "e97a11552fa6", + "_type": "span" + }, + { + "_key": "4d0938f702a6", + "_type": "span", + "marks": [], + "text": " fame. 
The principle is that "
          },
          {
            "marks": [
              "em"
            ],
            "text": "nf-core",
            "_key": "edb2200325d7",
            "_type": "span"
          },
          {
            "text": " pipelines can be used out-of-the-box or as inspiration for something different.",
            "_key": "f91bad020ca0",
            "_type": "span",
            "marks": []
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "f92e7518086e",
        "markDefs": [
          {
            "href": "https://nf-core.github.io/",
            "_key": "9c09d1ca00c5",
            "_type": "link"
          },
          {
            "_type": "link",
            "href": "https://github.com/ewels",
            "_key": "ea5a8f960ef4"
          },
          {
            "_type": "link",
            "href": "http://multiqc.info/",
            "_key": "9ab93ebace41"
          }
        ]
      },
      {
        "children": [
          {
            "_key": "4cd173191a0c",
            "_type": "span",
            "text": ""
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "7f5348911373"
      },
      {
        "style": "normal",
        "_key": "f5c2d5b8c9a6",
        "markDefs": [
          {
            "href": "https://github.com/nf-core/cookiecutter",
            "_key": "42dd99cd7f47",
            "_type": "link"
          }
        ],
        "children": [
          {
            "_type": "span",
            "marks": [],
            "text": "As well as being a place for best-practice pipelines, other features of ",
            "_key": "5b9cacd6453c"
          },
          {
            "text": "nf-core",
            "_key": "5f3915410c62",
            "_type": "span",
            "marks": [
              "em"
            ]
          },
          {
            "text": " include the ",
            "_key": "c3aa039155e2",
            "_type": "span",
            "marks": []
          },
          {
            "_type": "span",
            "marks": [
              "42dd99cd7f47"
            ],
            "text": "cookie cutter template tool",
            "_key": "fb90e7ca7a74"
          },
          {
            "_type": "span",
            "marks": [],
            "text": " which provides a fast way to create a dependable workflow using many of Nextflow’s sweet capabilities such as:",
            "_key": "09057a667619"
          }
        ],
        "_type": "block"
      },
      {
        "style": "normal",
        "_key": "697d109130fb",
        "children": [
          {
            "_key": "975226ed8062",
            "_type": "span",
            "text": ""
          }
        ],
        "_type": "block"
      },
      {
        "_key": "d2713e4ab3ef",
        "listItem": "bullet",
        "children": [
          {
            "_key": "b4d708cb8b54",
            "_type": "span",
            "marks": [],
            "text": "_Outline:_ Skeleton pipeline script."
          },
          {
            "_type": "span",
            "marks": [],
            "text": "_Data:_ Reference Genome implementation (AWS iGenomes).",
            "_key": "fcc561f5137a"
          },
          {
            "marks": [],
            "text": "_Configuration:_ Robust configuration setup.",
            "_key": "7d300365036a",
            "_type": "span"
          },
          {
            "marks": [],
            "text": "_Containers:_ Skeleton files for Docker image generation.",
            "_key": "b74a1b5490c6",
            "_type": "span"
          },
          {
            "_key": "ca1cba9f2355",
            "_type": "span",
            "marks": [],
            "text": "_Reporting:_ HTML email functionality and HTML results output."
+ }, + { + "_type": "span", + "marks": [], + "text": "_Documentation:_ Installation, Usage, Output, Troubleshooting, etc.", + "_key": "831a2602b9f9" + }, + { + "text": "_Continuous Integration:_ Skeleton files for automated testing using Travis CI.", + "_key": "30d7b8b619e8", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "1605def1e146" + } + ], + "_type": "block", + "style": "normal", + "_key": "19b199b1bf4b" + }, + { + "style": "normal", + "_key": "e2dfd6dbe553", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "There is also a Python package with helper tools for Nextflow.", + "_key": "633c4098bb67" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "c11e49b69cc8", + "children": [ + { + "_type": "span", + "text": "", + "_key": "81cfbdc1a1c7" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "e7fb89e9f85a", + "markDefs": [ + { + "_type": "link", + "href": "https://nf-core.github.io", + "_key": "4b0e8810ccfd" + }, + { + "_key": "08b3e7a30f8d", + "_type": "link", + "href": "https://github.com/nf-core" + }, + { + "_type": "link", + "href": "https://twitter.com/nf_core", + "_key": "4ec92c8fb8af" + }, + { + "_key": "09a4ad0516de", + "_type": "link", + "href": "https://gitter.im/nf-core/Lobby" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "You can find more information about the community via the project ", + "_key": "e144ebf92cfd" + }, + { + "_type": "span", + "marks": [ + "4b0e8810ccfd" + ], + "text": "website", + "_key": "ff467c2e7702" + }, + { + "_key": "7b11a79c2924", + "_type": "span", + "marks": [], + "text": ", " + }, + { + "_type": "span", + "marks": [ + "08b3e7a30f8d" + ], + "text": "GitHub repository", + "_key": "70c08b7e50c8" + }, + { + "text": ", ", + "_key": "808402f96ea0", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "4ec92c8fb8af" + ], + "text": "Twitter account", + "_key": "b3b7d41cb353" + }, + { + "_type": "span", + "marks": [], + "text": " or join the dedicated ", + "_key": "4165cadf8be3" + }, + { + "_type": "span", + "marks": [ + "09a4ad0516de" + ], + "text": "Gitter", + "_key": "09885c5c0b3e" + }, + { + "marks": [], + "text": " chat.", + "_key": "3a513c998210", + "_type": "span" + } + ] + }, + { + "_key": "578ac60399d6", + "children": [ + { + "text": "", + "_key": "ffa9b4c9f90f", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "343b921e13bb", + "_type": "block" + }, + { + "style": "normal", + "_key": "d374168e58ae", + "markDefs": [ + { + "href": "https://nf-co.re", + "_key": "e16327d51c0b", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "e16327d51c0b" + ], + "text": "![nf-core logo](/img/nf-core-logo-min.png)", + "_key": "56a2575e5314" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "a9751b1a7c35", + "children": [ + { + "text": "", + "_key": "85387ba45fa7", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "_key": "b26424f3f6be" + }, + { + "children": [ + { + "_type": "span", + "text": "Kubernetes has landed", + "_key": "7f4b2fc9f995" + } + ], + "_type": "block", + "style": "h3", + "_key": "bfaff5b545ba" + }, + { + "markDefs": [ + { + "_key": "d1e893c4842e", + "_type": "link", + "href": "https://www.youtube.com/watch?v=4ht22ReBjno" + } + ], + "children": [ + { + "text": "As of version 0.28.0 Nextflow now has 
support for Kubernetes. If you don’t know much about Kubernetes, at its heart it is an open-source platform for the management and deployment of containers at scale. Google led the initial design and it is now maintained by the Cloud Native Computing Foundation. I found the ", + "_key": "dc97336089b0", + "_type": "span", + "marks": [] + }, + { + "_key": "7e8b5cd1f26a", + "_type": "span", + "marks": [ + "d1e893c4842e" + ], + "text": "The Illustrated Children's Guide to Kubernetes" + }, + { + "marks": [], + "text": " particularly useful in explaining the basic vocabulary and concepts.", + "_key": "dc96d8ce5cd9", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "5ca2f84aa6fc" + }, + { + "_key": "6e3835acf394", + "children": [ + { + "_key": "6e07e461afff", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://www.openshift.com/", + "_key": "47c246c27377" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Kubernetes looks be one of the key technologies for the application of containers in the cloud as well as for building Infrastructure as a Service (IaaS) and Platform and a Service (PaaS) applications. We have been approached by many users who wish to use Nextflow with Kubernetes to be able to deploy workflows across both academic and commercial settings. With enterprise versions of Kubernetes such as Red Hat's ", + "_key": "d22f0e84634e" + }, + { + "text": "OpenShift", + "_key": "fbd5c7048697", + "_type": "span", + "marks": [ + "47c246c27377" + ] + }, + { + "marks": [], + "text": ", it was becoming apparent there was a need for native execution with Nextflow.", + "_key": "f0342ab97b3f", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "1fc7ab523738" + }, + { + "children": [ + { + "_key": "13c7b18cf3ac", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "0d7541c31e89" + }, + { + "_type": "block", + "style": "normal", + "_key": "b553f44c34dd", + "markDefs": [ + { + "href": "https://www.nextflow.io/docs/latest/kubernetes.html", + "_key": "594c0928be9f", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The new command ", + "_key": "432698e935b9" + }, + { + "_key": "b400d99791d2", + "_type": "span", + "marks": [ + "code" + ], + "text": "nextflow kuberun" + }, + { + "text": " launches the Nextflow driver as a ", + "_key": "d7426e897f9f", + "_type": "span", + "marks": [] + }, + { + "text": "pod", + "_key": "c38b43092073", + "_type": "span", + "marks": [ + "em" + ] + }, + { + "text": " which is then able to run workflow tasks as other pods within a Kubernetes cluster. You can read more in the documentation on Kubernetes support for Nextflow ", + "_key": "22c493062d2b", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "594c0928be9f" + ], + "text": "here", + "_key": "6ba2a00ef3fd" + }, + { + "_key": "3de473585509", + "_type": "span", + "marks": [], + "text": "." 
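            {
              "_type": "block",
              "style": "normal",
              "_key": "k8s-example-note",
              "markDefs": [],
              "children": [
                {
                  "_type": "span",
                  "marks": [],
                  "text": "As a minimal sketch of what this looks like in practice, assuming a cluster is already configured in your local kubeconfig and using the public hello pipeline purely as an illustrative project name:",
                  "_key": "k8s-example-note-s1"
                }
              ]
            },
            {
              "_type": "code",
              "_key": "k8s-example-code",
              "code": "# Sketch: launch the Nextflow driver as a pod on the current Kubernetes context\n# (illustrative project name; see the Kubernetes documentation linked above for details)\nnextflow kuberun nextflow-io/hello"
            },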
+ } + ] + }, + { + "style": "normal", + "_key": "447e26051266", + "children": [ + { + "_type": "span", + "text": "", + "_key": "cd272c0a2f8b" + } + ], + "_type": "block" + }, + { + "alt": "Nextflow and Kubernetes", + "_key": "8c669f38a94c", + "asset": { + "_ref": "image-1d1a2e1f06d3d9a82c8ba501ad64eecad15353a9-1600x1356-png", + "_type": "reference" + }, + "_type": "image" + }, + { + "children": [ + { + "text": "", + "_key": "445eb400c7a8", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "ed158c756f2a" + }, + { + "children": [ + { + "text": "Improved reporting and notifications", + "_key": "7d80855e900c", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "d8cb79f6a4af" + }, + { + "style": "normal", + "_key": "e8148ac7d315", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Following the hackathon in September we wrote about the addition of HTML trace reports that allow for the generation HTML detailing resource usage (CPU time, memory, disk i/o etc).", + "_key": "5ec294161704", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "f8ebc3dac9c0", + "children": [ + { + "_type": "span", + "text": "", + "_key": "77a76c8950be" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "href": "https://github.com/nextflow-io/nextflow/issues/547", + "_key": "558876f359bc", + "_type": "link" + }, + { + "_type": "link", + "href": "https://github.com/nextflow-io/nextflow/issues/521", + "_key": "e55ccb974527" + }, + { + "_type": "link", + "href": "https://github.com/nextflow-io/nextflow/issues/534", + "_key": "dc9e2a00f3d7" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Thanks to valuable feedback there has continued to be many improvements to the reports as tracked through the Nextflow GitHub issues page. Reports are now able to display ", + "_key": "dd6c402b3a2d" + }, + { + "marks": [ + "558876f359bc" + ], + "text": "thousands of tasks", + "_key": "060aee4178c4", + "_type": "span" + }, + { + "_key": "0d8f312faf8a", + "_type": "span", + "marks": [], + "text": " and include extra information such as the " + }, + { + "_type": "span", + "marks": [ + "e55ccb974527" + ], + "text": "container engine used", + "_key": "e33d77616c6a" + }, + { + "_type": "span", + "marks": [], + "text": ". 
Tasks can be filtered and an ", + "_key": "cf40b3f42107" + }, + { + "marks": [ + "dc9e2a00f3d7" + ], + "text": "overall progress bar", + "_key": "5bffc38664cf", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " has been added.", + "_key": "4d5afdb739dd" + } + ], + "_type": "block", + "style": "normal", + "_key": "1aa780e08a38" + }, + { + "_key": "9a4351d22085", + "children": [ + { + "text": "", + "_key": "59f7a2850687", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "42a6cbfa1c33", + "markDefs": [ + { + "_key": "e315049e0ab8", + "_type": "link", + "href": "/misc/nf-trace-report2.html" + }, + { + "href": "https://www.nextflow.io/docs/latest/tracing.html", + "_key": "8027ea1d11b8", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "You can explore a ", + "_key": "235413b54003" + }, + { + "_type": "span", + "marks": [ + "e315049e0ab8" + ], + "text": "real-world HTML report", + "_key": "8df0f57282fa" + }, + { + "marks": [], + "text": " and more information on HTML reports can be found in the ", + "_key": "d3d909778a75", + "_type": "span" + }, + { + "marks": [ + "8027ea1d11b8" + ], + "text": "documentation", + "_key": "90027a67eed5", + "_type": "span" + }, + { + "_key": "17af8f6cf7eb", + "_type": "span", + "marks": [], + "text": "." + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "a8df6ffcc414", + "children": [ + { + "_key": "d1ccda6fda00", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/mail.html?highlight=notification#workflow-notification", + "_key": "8ad63e024fde" + } + ], + "children": [ + { + "marks": [], + "text": "There has also been additions to workflow notifications. Currently these can be configured to automatically send a notification email when a workflow execution terminates. You can read more about how to setup notifications in the ", + "_key": "0969b767aea7", + "_type": "span" + }, + { + "marks": [ + "8ad63e024fde" + ], + "text": "documentation", + "_key": "0115cfcb46a3", + "_type": "span" + }, + { + "text": ".", + "_key": "32e01dccb2e5", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "2d12550f1db3" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "1ff531171b67" + } + ], + "_type": "block", + "style": "normal", + "_key": "6ea29e491d90" + }, + { + "_key": "10c5bbd20672", + "children": [ + { + "_type": "span", + "text": "Syntax-tic!", + "_key": "759fb9adc783" + } + ], + "_type": "block", + "style": "h3" + }, + { + "_type": "block", + "style": "normal", + "_key": "50755e3f3e91", + "markDefs": [ + { + "href": "https://atom.io", + "_key": "1494006b290b", + "_type": "link" + }, + { + "href": "https://code.visualstudio.com", + "_key": "1d65ff27dce2", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "Writing workflows no longer has to be done in monochrome. 
There is now syntax highlighting for Nextflow in the popular ", + "_key": "cf41772e2fc9", + "_type": "span" + }, + { + "text": "Atom editor", + "_key": "c19e378b3d50", + "_type": "span", + "marks": [ + "1494006b290b" + ] + }, + { + "_type": "span", + "marks": [], + "text": " as well as in ", + "_key": "73b9930214c5" + }, + { + "marks": [ + "1d65ff27dce2" + ], + "text": "Visual Studio Code", + "_key": "22fb24996c84", + "_type": "span" + }, + { + "_key": "a6f6ade4059d", + "_type": "span", + "marks": [], + "text": "." + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "b3a9d1f4c765" + } + ], + "_type": "block", + "style": "normal", + "_key": "d5262d4944ad" + }, + { + "_key": "2f9f5e6b0981", + "_type": "block" + }, + { + "style": "normal", + "_key": "bf437db64fca", + "markDefs": [ + { + "_key": "76c6505ca96e", + "_type": "link", + "href": "/img/atom-min.png" + } + ], + "children": [ + { + "marks": [ + "76c6505ca96e" + ], + "text": "![Nextflow syntax highlighting with Atom](/img/atom-min.png)", + "_key": "a6e7854d13d4", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "606eb9399382", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "dddb83b4a22b" + }, + { + "_type": "block", + "_key": "ff7b5eaa8236" + }, + { + "children": [ + { + "marks": [ + "dcfcf66b920f" + ], + "text": "![Nextflow syntax highlighting with VSCode](/img/vscode-min.png)", + "_key": "eb3ebf24f620", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "585213dfdf0a", + "markDefs": [ + { + "href": "/img/vscode-min.png", + "_key": "dcfcf66b920f", + "_type": "link" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "8c2c2767792a", + "children": [ + { + "_type": "span", + "text": "", + "_key": "ae26cb55b9d2" + } + ] + }, + { + "_key": "4512d5f0e928", + "_type": "block" + }, + { + "style": "normal", + "_key": "eac24bf441ae", + "markDefs": [ + { + "_key": "96f64edce7bc", + "_type": "link", + "href": "https://atom.io/packages/language-nextflow" + }, + { + "_type": "link", + "href": "https://marketplace.visualstudio.com/items?itemName=nextflow.nextflow", + "_key": "976b5728ffc7" + } + ], + "children": [ + { + "_key": "0c927cf96241", + "_type": "span", + "marks": [], + "text": "You can find the Atom plugin by searching for Nextflow in Atoms package installer or clicking " + }, + { + "text": "here", + "_key": "58fb4bd234b1", + "_type": "span", + "marks": [ + "96f64edce7bc" + ] + }, + { + "_key": "cfdc6f528916", + "_type": "span", + "marks": [], + "text": ". 
The Visual Studio plugin can be downloaded " + }, + { + "_type": "span", + "marks": [ + "976b5728ffc7" + ], + "text": "here", + "_key": "6ddc63571fbc" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "e44be252be08" + } + ], + "_type": "block" + }, + { + "_key": "4042587d3828", + "children": [ + { + "_type": "span", + "text": "", + "_key": "b8471c465861" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "afd66570369b", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "On a related note, Nextflow is now an official language on GitHub!", + "_key": "17bc7deaf3ab", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "3c81f3acd12b", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "5af52d95f273" + }, + { + "alt": "GitHub nextflow syntax", + "_key": "1afdd1dcd82d", + "asset": { + "_ref": "image-09a2cedaf949ec00544160b7497269882cd69b48-1200x944-png", + "_type": "reference" + }, + "_type": "image" + }, + { + "_type": "block", + "style": "normal", + "_key": "468482c3a6c8", + "children": [ + { + "_key": "3017ffb20c1a", + "_type": "span", + "text": "" + } + ] + }, + { + "children": [ + { + "_key": "abd6b7e87d43", + "_type": "span", + "text": "Conclusion" + } + ], + "_type": "block", + "style": "h3", + "_key": "649f31a2d54d" + }, + { + "_type": "block", + "style": "normal", + "_key": "de40aa29e2ff", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Nextflow developments are progressing faster than ever and with the help of the community, there are a ton of great new features on the way. If you have any suggestions of your killer NF idea then please drop us a line, open an issue or even better, join in the fun.", + "_key": "cc44438b01f2", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "3b363f416ff2", + "children": [ + { + "text": "", + "_key": "e3c1503f402a", + "_type": "span" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Over the coming months Nextflow will be reaching out with several training and presentation sessions across the US and Europe. We hope to see as many of you as possible on the road.", + "_key": "7c7cde5d84d6" + } + ], + "_type": "block", + "style": "normal", + "_key": "e52f055ca522" + } + ], + "tags": [ + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "e0bcf067adab" + }, + { + "_ref": "fd7c81e0-c3bf-48a5-a383-b99ab3222308", + "_type": "reference", + "_key": "389ef23c15f8" + } + ], + "_type": "blogPost", + "_updatedAt": "2024-09-26T09:02:09Z", + "publishedAt": "2018-04-03T06:00:00.000Z", + "author": { + "_ref": "evan-floden", + "_type": "reference" + }, + "meta": { + "slug": { + "current": "nextflow-turns-5" + } + }, + "_id": "5d38232c62a9", + "title": "Nextflow turns five! Happy birthday!", + "_createdAt": "2024-09-25T14:15:38Z", + "_rev": "Ot9x7kyGeH5005E3MIwi6G" + }, + { + "meta": { + "description": "We are thrilled to announce that Seqera is joining forces with tinybio, a NYC-based tech-bio start-up known for its AI-integrated scientific tools focused on executing pipelines and analyses via natural language. 
\n", + "noIndex": false, + "slug": { + "current": "tinybio-joins-seqera-to-advance-science-for-everyone-now-through-genai", + "_type": "slug" + }, + "_type": "meta", + "shareImage": { + "_type": "image", + "asset": { + "_ref": "image-3d25c202215864675258a5c2c5084d2f656aae73-1200x629-png", + "_type": "reference" + } + } + }, + "tags": [ + { + "_ref": "d356a4d5-06c1-40c2-b655-4cb21cf74df1", + "_type": "reference", + "_key": "3395edbcdd9d" + }, + { + "_key": "c9112db92839", + "_ref": "1b55a117-18fe-40cf-8873-6efd157a6058", + "_type": "reference" + } + ], + "_createdAt": "2024-08-06T13:37:21Z", + "_updatedAt": "2024-09-03T07:58:25Z", + "title": "Seqera acquires tinybio to Advance Science for Everyone - Now Through GenAI!", + "body": [ + { + "asset": { + "_ref": "image-3d25c202215864675258a5c2c5084d2f656aae73-1200x629-png", + "_type": "reference" + }, + "_type": "image", + "_key": "39e47c87a1bd" + }, + { + "children": [ + { + "_key": "e2be5dfe2b940", + "_type": "span", + "marks": [], + "text": "We are thrilled to announce that Seqera is joining forces with " + }, + { + "marks": [ + "0eadc8ec1fed" + ], + "text": "tinybio", + "_key": "e2be5dfe2b941", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ", a NYC-based tech-bio start-up known for its AI-integrated scientific tools focused on executing pipelines and analyses via natural language. We are happy to welcome the tinybio team and community into the Seqera family.", + "_key": "e2be5dfe2b942" + } + ], + "_type": "block", + "style": "normal", + "_key": "f5d38bed2100", + "markDefs": [ + { + "_type": "link", + "href": "https://www.tinybio.cloud/", + "_key": "0eadc8ec1fed" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_key": "e583cc66c5fc0", + "_type": "span", + "marks": [], + "text": "Empowering All Scientists with Advanced Data Tools" + } + ], + "_type": "block", + "style": "h2", + "_key": "28fd4bc59876" + }, + { + "style": "normal", + "_key": "9d1bdb6acb67", + "markDefs": [ + { + "_key": "b2a101e1faea", + "_type": "link", + "href": "https://www.nature.com/articles/s41467-024-49777-x" + }, + { + "_type": "link", + "href": "https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-021-01304-y", + "_key": "8e1ad6d30663" + }, + { + "_type": "link", + "href": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8719813/", + "_key": "307be9175f5a" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Scientists spend a ", + "_key": "6a6167e26f940" + }, + { + "_type": "span", + "marks": [ + "b2a101e1faea" + ], + "text": "significant proportion of their time", + "_key": "6a6167e26f941" + }, + { + "_key": "6a6167e26f942", + "_type": "span", + "marks": [], + "text": " transforming and structuring data for analysis. In fact, a " + }, + { + "_type": "span", + "marks": [ + "8e1ad6d30663" + ], + "text": "lessons learned piece on the COVID-19 pandemic ", + "_key": "6a6167e26f943" + }, + { + "_key": "6a6167e26f944", + "_type": "span", + "marks": [], + "text": "underscored how " + }, + { + "marks": [ + "strong" + ], + "text": "issues in data analysis ", + "_key": "6a6167e26f945", + "_type": "span" + }, + { + "_key": "6a6167e26f946", + "_type": "span", + "marks": [], + "text": "and study design can " + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "significantly impact scientific breakthroughs", + "_key": "6a6167e26f947" + }, + { + "_key": "6a6167e26f948", + "_type": "span", + "marks": [], + "text": ". 
" + }, + { + "_type": "span", + "marks": [ + "307be9175f5a" + ], + "text": "As biological data continues to grow exponentially", + "_key": "6a6167e26f949" + }, + { + "marks": [], + "text": ", there is an urgent need to manage large-scale data more rapidly for accelerated scientific breakthroughs. To achieve this, we are partnering with tinybio to ", + "_key": "6a6167e26f9410", + "_type": "span" + }, + { + "_key": "6a6167e26f9411", + "_type": "span", + "marks": [ + "strong" + ], + "text": "harness the power of GenAI," + }, + { + "_type": "span", + "marks": [], + "text": " lowering the barrier for scientists to fully leverage advanced computational tools to achieve their research goals.", + "_key": "6a6167e26f9412" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "45b78a00c7e8", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "27ddeccac626", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "style": "h2", + "_key": "3ab4e5cf5dfe", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "tinybio: Specialized ChatGPT for Researchers", + "_key": "a3b9dc23dc9a0" + } + ], + "_type": "block" + }, + { + "_key": "58b28d00c516", + "markDefs": [ + { + "href": "https://www.tinybio.cloud/", + "_key": "908a8535997e", + "_type": "link" + }, + { + "_type": "link", + "href": "https://chatgpt.com/", + "_key": "4cc3db069b96" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Motivated by challenges faced as researchers experimenting with different bioinformatics packages, Sasha and Vishal founded ", + "_key": "405ee96282980" + }, + { + "marks": [ + "908a8535997e" + ], + "text": "tinybio", + "_key": "405ee96282981", + "_type": "span" + }, + { + "_key": "405ee96282982", + "_type": "span", + "marks": [], + "text": " in 2022, convinced there had to be a better, easier way to get started with bioinformatics. The initial goal of tinybio was to remove the barrier to entry for running bioinformatics packages, a mission that gained significant momentum with the announcement of " + }, + { + "text": "ChatGPT", + "_key": "405ee96282983", + "_type": "span", + "marks": [ + "4cc3db069b96" + ] + }, + { + "marks": [], + "text": " in November 2022. The tinybio co-founders recognized the potential of ", + "_key": "405ee96282984", + "_type": "span" + }, + { + "text": "leveraging GenAI ", + "_key": "0ba59e2f9664", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "marks": [], + "text": "for empowering all scientists to effectively utilize bioinformatics tools, regardless of their experience or research background. 
Ever since, tinybio have focused on applying GenAI to drive bioinformatics innovation.", + "_key": "984fcc760079", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "e289bac08afd", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "9722baf8eb0a", + "_type": "span" + } + ] + }, + { + "style": "blockquote", + "_key": "c3ff7a2beda0", + "markDefs": [], + "children": [ + { + "_key": "bc450efddd08", + "_type": "span", + "marks": [], + "text": "\"After seeing the amazing traction around our chat-based pipeline execution and analysis tool, Vishal and I knew that we needed to partner with the leader in bioinformatics pipelines to enable our vision for " + }, + { + "marks": [ + "strong" + ], + "text": "more open science", + "_key": "d051bfadfe53", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " and to ", + "_key": "f42d1845bb72" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "onboard millions more to computational biology.", + "_key": "6b8aae3beb95" + }, + { + "marks": [], + "text": " We are truly excited to be joining the Seqera team and contributing to advancing science for everyone through their Nextflow, Wave, MultiQC, and Fusion products.\" - ", + "_key": "2f742e542afe", + "_type": "span" + }, + { + "text": "Sasha Dagayev, Co-founder at tinybio", + "_key": "bba720e98780", + "_type": "span", + "marks": [ + "strong" + ] + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "8a74a60c4d9c", + "_type": "span", + "marks": [], + "text": "\ntinybio’s authentic and pragmatic approach to " + }, + { + "text": "leveraging LLMs for bioinformatics", + "_key": "f21eee87a50a", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_key": "1b90bec51a6f", + "_type": "span", + "marks": [], + "text": " is essential in bridging the gap between scientists and advanced computational capabilities to accelerate scientific discovery. By incorporating this technology, we aim to significantly enhance our existing " + }, + { + "text": "pipelines", + "_key": "d079d983b04a", + "_type": "span", + "marks": [ + "95fd0bd0acf6" + ] + }, + { + "_key": "b1b5f1bfc9ae", + "_type": "span", + "marks": [], + "text": "," + }, + { + "_key": "c42631b1fa90", + "_type": "span", + "marks": [ + "d259679e3355" + ], + "text": " containers" + }, + { + "_type": "span", + "marks": [], + "text": " and web resources, making high-quality, reproducible bioinformatics tools more accessible to researchers worldwide. 
Our goal is to ", + "_key": "a1530ae5fa4d" + }, + { + "_key": "f5a66b9f8ce7", + "_type": "span", + "marks": [ + "strong" + ], + "text": "empower the global scientific community" + }, + { + "marks": [], + "text": " with the resources they need to drive innovation and advance our understanding of complex biological systems.", + "_key": "4f48f33b58fc", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "20b81fc022bb", + "markDefs": [ + { + "_key": "95fd0bd0acf6", + "_type": "link", + "href": "https://seqera.io/pipelines/" + }, + { + "_type": "link", + "href": "https://seqera.io/containers/", + "_key": "d259679e3355" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "781c75a3bb98", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "de50ded18161", + "markDefs": [] + }, + { + "_type": "block", + "style": "h2", + "_key": "a35537f824ee", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "A New Era for AI-enabled Bioinformatics", + "_key": "e0633d49502f0", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "cf91c1dabfd0", + "markDefs": [ + { + "_type": "link", + "href": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7998306/", + "_key": "81a2231f5c08" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The biotech and bioinformatics landscape is rapidly evolving, driven in-part by technological advances in AI. The ability to analyze massive datasets, identify patterns, and generate predictive models is revolutionizing scientific research. We also believe that AI is a powerful tool to democratize and amplify access to the most sophisticated bioinformatics tools out there. By leveraging ", + "_key": "03d0891e1aa70" + }, + { + "_type": "span", + "marks": [ + "81a2231f5c08" + ], + "text": "human-centric AI", + "_key": "03d0891e1aa71" + }, + { + "_key": "03d0891e1aa72", + "_type": "span", + "marks": [], + "text": ", we can " + }, + { + "marks": [ + "strong" + ], + "text": "enable the 10x scientist", + "_key": "0a03364b0e08", + "_type": "span" + }, + { + "text": " to translate complex biological data into actionable insights, thereby expediting scientific discovery and innovation.", + "_key": "49ef31effc8b", + "_type": "span", + "marks": [] + } + ] + }, + { + "children": [ + { + "text": "\n", + "_key": "8560ad3cef8e0", + "_type": "span", + "marks": [] + }, + { + "marks": [], + "text": "Our partnership with tinybio represents a significant milestone in our journey to ", + "_key": "903ece1a151f", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "advance science for everyone through software", + "_key": "3cc381edad69" + }, + { + "text": ". This collaboration will lower the barrier of entry for a broader range of researchers to utilize bioinformatics tools effectively, facilitating groundbreaking innovations and transforming the future of genomics.", + "_key": "9d2e53834331", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "dd017c0c5d56", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "3a554d01a336", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "32ab8ed1feb7", + "_type": "span" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "\"New interaction models with powerful computational platforms are transforming not just how scientists work but also what they discover. 
By empowering scientists with modern software engineering practices, we are ", + "_key": "ce5a8009c2f90", + "_type": "span" + }, + { + "_key": "797c10fef3eb", + "_type": "span", + "marks": [ + "strong" + ], + "text": "enabling the next generation of innovations" + }, + { + "marks": [], + "text": " in personalized therapeutics, sustainable materials, better drug delivery methodologies, and green chemical and agricultural production. This acquisition marks a significant step towards accelerating scientific discoveries and enabling researchers with better software.\" -", + "_key": "f3cfe9eb53f7", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": " Evan Floden, CEO at Seqera", + "_key": "7c3b3468dfe3" + } + ], + "_type": "block", + "style": "blockquote", + "_key": "fd72009a708b" + }, + { + "_type": "block", + "style": "normal", + "_key": "849a938fb79b", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "1bcc2f6d724a", + "_type": "span", + "marks": [] + } + ] + }, + { + "style": "h2", + "_key": "57dd7a75086e", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Enhancing our Open Science Core", + "_key": "2b06457c0ed50", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Our mission at Seqera is to ", + "_key": "28b149c00eee0", + "_type": "span" + }, + { + "marks": [ + "strong" + ], + "text": "make science accessible to everyone through software", + "_key": "57dc51e60f83", + "_type": "span" + }, + { + "marks": [], + "text": ". As research becomes increasingly digitized, there is a critical need to access all available scientific research to make informed R&D decisions and ultimately accelerate the impact on patients. Central to achieving this is Open Science, which ensures reproducibility, validation and transparency across the scientific community. With AI, we want to further enhance our Open Science core, by lowering the barrier of adoption of bioinformatics tools for ", + "_key": "f068fe8ef9f4", + "_type": "span" + }, + { + "text": "millions more researchers worldwide,", + "_key": "28b149c00eee1", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "text": " driving more rapid advancements in science and medicine.", + "_key": "28b149c00eee2", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "6ace1e068755" + }, + { + "_type": "block", + "style": "normal", + "_key": "83a6b2dc9d92", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "a54ccaf62d6c0" + } + ] + }, + { + "_key": "5b4fbb778158", + "markDefs": [], + "children": [ + { + "text": "What’s Next for Seqera and tinybio?", + "_key": "6db8f88cd0a20", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2" + }, + { + "_key": "be45046290e5", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Seqera is excited to collaborate closely with tinybio’s founders Sasha Dagayev and Vishal Patel to further its mission of advancing science for everyone through software. 
Their expertise will be instrumental in driving the development of community-centric tools on Seqera.io,", + "_key": "d1f478e8ca4e0", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": " empowering scientists worldwide", + "_key": "7df2ef0b8ff2" + }, + { + "_type": "span", + "marks": [], + "text": " to leverage modern software capabilities on demand.", + "_key": "30172707b906" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "We will first focus on leveraging AI to solve the cold start problem for the next generation of scientists and ", + "_key": "a38bd6a01e2e0", + "_type": "span" + }, + { + "text": "removing barriers to entry to bioinformatics", + "_key": "6970533750e5", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_type": "span", + "marks": [], + "text": ". Existing powerful frameworks and resources, such as Nextflow, nf-core, Seqera Pipelines and Containers, have been significantly enhancing the research productivity of bioinformaticians, but come with a steep learning curve that prevents newcomers from getting started fast.", + "_key": "7f4fec95cdd5" + } + ], + "_type": "block", + "style": "normal", + "_key": "249e21a060a8" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "We want to free the next generation of scientists from wasting time in the nitty gritty of setting up various bioinformatics packages and infrastructure. We believe future scientists should be able to focus on understanding the “what” and “why” of their analysis, while the “how” is generated for them in an understandable and verifiable way. Our tools and resources provide already powerful building blocks to enable this, and we cannot wait to bring these new updates to users in the coming months. Stay tuned!", + "_key": "8ee4c6456e1a0" + } + ], + "_type": "block", + "style": "normal", + "_key": "d42b3b57dfac", + "markDefs": [] + }, + { + "_key": "6d3fbe9d99a0", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "7f7a558dfa34" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "902d0053f19e", + "markDefs": [ + { + "_type": "link", + "href": "https://hubs.la/Q02NlSWM0", + "_key": "eb86d506d8aa" + }, + { + "_key": "5f0b731c8d4f", + "_type": "link", + "href": "https://hubs.la/Q02NlVXy0" + } + ], + "children": [ + { + "text": "Interested in finding out more? 
Watch the Nextflow Channels podcast on ", + "_key": "fd5a307d6e25", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "eb86d506d8aa" + ], + "text": "GenAI for bioinformatics", + "_key": "f1a61d9d9482", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " or ", + "_key": "8a4c1b94d2cb" + }, + { + "marks": [ + "5f0b731c8d4f" + ], + "text": "subscribe to our newsletter", + "_key": "d505924c7484", + "_type": "span" + }, + { + "text": " to stay tuned!", + "_key": "09ba811e5ef1", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "blockquote" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "d1a81dc47aac", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "83248382cacc" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "About tinybio", + "_key": "113745debebc0" + } + ], + "_type": "block", + "style": "h2", + "_key": "cfcca3842875" + }, + { + "children": [ + { + "_key": "eedbe6e83ab60", + "_type": "span", + "marks": [], + "text": "tinybio is a New York City based startup focused on the application of generally available generative AI technologies to help bioinformaticians and researchers. It was started by Sasha Dagayev and Vishal Patel in 2022. To date, the company has helped thousands of researchers to resolve hundreds of thousands of bioinformatics issues." + } + ], + "_type": "block", + "style": "normal", + "_key": "0119756f1ecc", + "markDefs": [] + } + ], + "publishedAt": "2024-08-06T13:43:00.000Z", + "_type": "blogPost", + "author": { + "_type": "reference", + "_ref": "evan-floden" + }, + "_rev": "Z979U64FXLC2cFZCkgkV9v", + "_id": "5df71356-10dc-422f-bae8-e26491a560dc" + }, + { + "publishedAt": "2014-09-09T06:00:00.000Z", + "meta": { + "slug": { + "current": "nextflow-meets-docker" + } + }, + "_updatedAt": "2024-09-26T09:00:16Z", + "title": "Reproducibility in Science - Nextflow meets Docker", + "_id": "5e3eebce9153", + "body": [ + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The scientific world nowadays operates on the basis of published articles. These are used to report novel discoveries to the rest of the scientific community.", + "_key": "59b35c33b672" + } + ], + "_type": "block", + "style": "normal", + "_key": "0cd3e947bf8a" + }, + { + "_type": "block", + "style": "normal", + "_key": "6e68445ed27b", + "children": [ + { + "text": "", + "_key": "e7d0dc517871", + "_type": "span" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_key": "46b1cdc124eb", + "_type": "span", + "marks": [], + "text": "But have you ever wondered what a scientific article is? 
It is a:" + } + ], + "_type": "block", + "style": "normal", + "_key": "1213fc789f24" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "328057fd92b3" + } + ], + "_type": "block", + "style": "normal", + "_key": "21be8dbe0b1f" + }, + { + "_key": "0cccc78adb84", + "listItem": "bullet", + "children": [ + { + "marks": [], + "text": "defeasible argument for claims, supported by", + "_key": "b9d38272dc5c", + "_type": "span" + }, + { + "_key": "5565c69b4300", + "_type": "span", + "marks": [], + "text": "exhibited, reproducible data and methods, and" + }, + { + "_type": "span", + "marks": [], + "text": "explicit references to other work in that domain;", + "_key": "f431d9b9deed" + }, + { + "_key": "5e654872e5e7", + "_type": "span", + "marks": [], + "text": "described using domain-agreed technical terminology," + }, + { + "_key": "a2d6261e4f5a", + "_type": "span", + "marks": [], + "text": "which exists within a complex ecosystem of technologies, people and activities." + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "d5ade5b0849e", + "children": [ + { + "_key": "5ebbadf6c6c3", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Hence the very essence of Science relies on the ability of scientists to reproduce and build upon each other’s published results.", + "_key": "2d3e042adb1c" + } + ], + "_type": "block", + "style": "normal", + "_key": "91a69d2fbcda" + }, + { + "children": [ + { + "_key": "cc0ddddc5d07", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "5c0178061126" + }, + { + "_key": "7fdd218e7bc2", + "markDefs": [ + { + "href": "http://www.nature.com/nature/journal/v483/n7391/full/483531a.html", + "_key": "923c83780e83", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "So how much can we rely on published data? In a recent report in Nature, researchers at the Amgen corporation found that only 11% of the academic research in the literature was reproducible by their groups [", + "_key": "195742b919df" + }, + { + "_type": "span", + "marks": [ + "923c83780e83" + ], + "text": "1", + "_key": "649e32779802" + }, + { + "text": "].", + "_key": "e4931c5e781f", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "0d5ee61a36ad", + "children": [ + { + "_type": "span", + "text": "", + "_key": "5d01dfdcb150" + } + ], + "_type": "block" + }, + { + "_key": "42a6b962fe47", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "While many factors are likely at play here, perhaps the most basic requirement for reproducibility holds that the materials reported in a study can be uniquely identified and obtained, such that experiments can be reproduced as faithfully as possible. 
This information is meant to be documented in the "materials and methods" of journal articles, but as many can attest, the information provided there is often not adequate for this task.", + "_key": "c964b5781329", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "999bf374b806", + "children": [ + { + "_type": "span", + "text": "", + "_key": "0688d1c8b0bd" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "Promoting Computational Research Reproducibility", + "_key": "38ae292b3527", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "5b8bb9a1bc83" + }, + { + "children": [ + { + "marks": [], + "text": "Encouragingly scientific reproducibility has been at the forefront of many news stories and there exist numerous initiatives to help address this problem. Particularly, when it comes to producing reproducible computational analyses, some publications are starting to publish the code and data used for analysing and generating figures.", + "_key": "fa897da7d059", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "4488f3038e9f", + "markDefs": [] + }, + { + "_key": "68d409c65f72", + "children": [ + { + "_type": "span", + "text": "", + "_key": "46f455012afa" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "f87b23bd7ef3", + "markDefs": [], + "children": [ + { + "_key": "bac5245ef7b8", + "_type": "span", + "marks": [], + "text": "For example, many articles in Nature and in the new Elife journal (and others) provide a "source data" download link next to figures. Sometimes Elife might even have an option to download the source code for figures." + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "3af90a7e1a8b", + "children": [ + { + "_type": "span", + "text": "", + "_key": "fc0ec4326de2" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "f0fc53a7dc08", + "markDefs": [ + { + "href": "http://melissagymrek.com/science/2014/08/29/docker-reproducible-research.html", + "_key": "64ead0bb6a12", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "As pointed out by Melissa Gymrek ", + "_key": "8b1fb3008476", + "_type": "span" + }, + { + "text": "in a recent post", + "_key": "cfb3aed207bc", + "_type": "span", + "marks": [ + "64ead0bb6a12" + ] + }, + { + "_key": "9fe2744de1be", + "_type": "span", + "marks": [], + "text": " this is a great start, but there are still lots of problems. She wrote that, for example, if one wants to re-execute a data analyses from these papers, he/she will have to download the scripts and the data, to only realize that he/she has not all the required libraries, or that it only runs on, for example, an Ubuntu version he/she doesn't have, or some paths are hard-coded to match the authors' machine." 
+ } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "3b755927ebc8", + "children": [ + { + "text": "", + "_key": "920a61a3920a", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "If it's not easy to run and doesn't run out of the box the chances that a researcher will actually ever run most of these scripts is close to zero, especially if they lack the time or expertise to manage the required installation of third-party libraries, tools or implement from scratch state-of-the-art data processing algorithms.", + "_key": "c77f1d8de0c2", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "b7cdacb20a82" + }, + { + "_key": "38feec499605", + "children": [ + { + "text": "", + "_key": "e56664ca20ad", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "h3", + "_key": "0fe5db0e880a", + "children": [ + { + "_key": "1599518c0f64", + "_type": "span", + "text": "Here comes Docker" + } + ] + }, + { + "_key": "5f8bb85bd22b", + "markDefs": [ + { + "_key": "a57b5569587e", + "_type": "link", + "href": "http://www.docker.com" + } + ], + "children": [ + { + "text": "Docker", + "_key": "888c00fd2ed0", + "_type": "span", + "marks": [ + "a57b5569587e" + ] + }, + { + "text": " containers technology is a solution to many of the computational research reproducibility problems. Basically, it is a kind of a lightweight virtual machine where you can set up a computing environment including all the libraries, code and data that you need, within a single ", + "_key": "f4e9403190f6", + "_type": "span", + "marks": [] + }, + { + "text": "image", + "_key": "032d85fe5462", + "_type": "span", + "marks": [ + "em" + ] + }, + { + "_key": "5fe926122ea3", + "_type": "span", + "marks": [], + "text": "." + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "be7a77a7ecb0" + } + ], + "_type": "block", + "style": "normal", + "_key": "887df9fbc509" + }, + { + "style": "normal", + "_key": "68b46bcd163b", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "This image can be distributed publicly and can seamlessly run on any major Linux operating system. No need for the user to mess with installation, paths, etc.", + "_key": "4672a9e481e8" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "3bbf9a929d0a", + "children": [ + { + "_type": "span", + "text": "", + "_key": "c6e16e1a26d8" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "7503c4787585", + "markDefs": [ + { + "_type": "link", + "href": "http://www.bioinformaticszen.com/post/reproducible-assembler-benchmarks/", + "_key": "6a1af6c49833" + }, + { + "_type": "link", + "href": "https://bcbio.wordpress.com/2014/03/06/improving-reproducibility-and-installation-of-genomic-analysis-pipelines-with-docker/", + "_key": "0aa5d689e2c8" + } + ], + "children": [ + { + "text": "They just run the Docker image you provided, and everything is set up to work out of the box. Researchers have already started discussing this (e.g. 
", + "_key": "98cfa2355473", + "_type": "span", + "marks": [] + }, + { + "_key": "ee62266358db", + "_type": "span", + "marks": [ + "6a1af6c49833" + ], + "text": "here" + }, + { + "_type": "span", + "marks": [], + "text": ", and ", + "_key": "611ac5421a4d" + }, + { + "text": "here", + "_key": "b7b9fdc71e8f", + "_type": "span", + "marks": [ + "0aa5d689e2c8" + ] + }, + { + "text": ").", + "_key": "5ce952e7bd6e", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "08c0889fc07f", + "children": [ + { + "_key": "0b88ce227c33", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "h3", + "_key": "8b9ad08221da", + "children": [ + { + "text": "Docker and Nextflow: a perfect match", + "_key": "7129ed72df1b", + "_type": "span" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "One big advantage Docker has compared to ", + "_key": "078f1ae7ede8" + }, + { + "text": "traditional", + "_key": "b42ddbdaef86", + "_type": "span", + "marks": [ + "em" + ] + }, + { + "_type": "span", + "marks": [], + "text": " machine virtualisation technology is that it doesn't need a complete copy of the operating system, thus it has a minimal startup time. This makes it possible to virtualise single applications or launch the execution of multiple containers, that can run in parallel, in order to speedup a large computation.", + "_key": "9d9ffb18cff9" + } + ], + "_type": "block", + "style": "normal", + "_key": "9af1ad3e6983" + }, + { + "style": "normal", + "_key": "f34da90d14bb", + "children": [ + { + "_type": "span", + "text": "", + "_key": "ea49d39f17b4" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "8314f9492b20", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow is a data-driven toolkit for computational pipelines, which aims to simplify the deployment of distributed and highly parallelised pipelines for scientific applications.", + "_key": "701775fcf0a4" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "716701240f07", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "210cf738282e" + }, + { + "children": [ + { + "text": "The latest version integrates the support for Docker containers that enables the deployment of self-contained and truly reproducible pipelines.", + "_key": "377dda485511", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "25ac907efe1c", + "markDefs": [] + }, + { + "children": [ + { + "_key": "f7fc6dc5aede", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "10f90ba6e489" + }, + { + "children": [ + { + "text": "How they work together", + "_key": "666a0d7a5b41", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "58e1c5cf843e" + }, + { + "children": [ + { + "marks": [], + "text": "A Nextflow pipeline is made up by putting together several processes. Each process can be written in any scripting language that can be executed by the Linux platform (BASH, Perl, Ruby, Python, etc). 
Parallelisation is automatically managed by the framework and it is implicitly defined by the processes input and output declarations.", + "_key": "23ec1438f980", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "23c47b66d85c", + "markDefs": [] + }, + { + "children": [ + { + "_key": "933c374df2dc", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "42201d1a0044" + }, + { + "_key": "a138e2ccbde5", + "markDefs": [], + "children": [ + { + "_key": "223c5dc9d1bd", + "_type": "span", + "marks": [], + "text": "By integrating Docker with Nextflow, every pipeline process can be executed independently in its own container, this guarantees that each of them run in a predictable manner without worrying about the configuration of the target execution platform. Moreover the minimal overhead added by Docker allows us to spawn multiple container executions in a parallel manner with a negligible performance loss when compared to a platform " + }, + { + "text": "native", + "_key": "e42cd8aab7ab", + "_type": "span", + "marks": [ + "em" + ] + }, + { + "_type": "span", + "marks": [], + "text": " execution.", + "_key": "b5f079b31201" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "f7179fd76e64", + "children": [ + { + "_type": "span", + "text": "", + "_key": "3a9f5b7a1b82" + } + ] + }, + { + "style": "h3", + "_key": "1d6d86892b60", + "children": [ + { + "_key": "2fbf3e23d741", + "_type": "span", + "text": "An example" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "d3ede21d981e", + "markDefs": [ + { + "href": "https://github.com/nextflow-io/examples/blob/master/blast-parallel.nf", + "_key": "95f19cf8a7b5", + "_type": "link" + } + ], + "children": [ + { + "_key": "9b2064b60f14", + "_type": "span", + "marks": [], + "text": "As a proof of concept of the Docker integration with Nextflow you can try out the pipeline example at this " + }, + { + "_type": "span", + "marks": [ + "95f19cf8a7b5" + ], + "text": "link", + "_key": "d3c587ca2706" + }, + { + "_key": "bc1b2c3d296e", + "_type": "span", + "marks": [], + "text": "." + } + ] + }, + { + "_key": "7b379b25fc7b", + "children": [ + { + "text": "", + "_key": "2ff104eefa0a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_key": "995dc3ef59c4", + "_type": "span", + "marks": [], + "text": "It splits a protein sequences multi FASTA file into chunks of " + }, + { + "marks": [ + "em" + ], + "text": "n", + "_key": "3ff5e20615a0", + "_type": "span" + }, + { + "_key": "0ab8d7b6b968", + "_type": "span", + "marks": [], + "text": " entries, executes a BLAST query for each of them, then extracts the top 10 matching sequences and finally aligns the results with the T-Coffee multiple sequence aligner." + } + ], + "_type": "block", + "style": "normal", + "_key": "693e9833dfa8" + }, + { + "children": [ + { + "_key": "a59ef33a4b11", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "505523f39862" + }, + { + "_type": "block", + "style": "normal", + "_key": "86c6ece9f1af", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "In a common scenario you generally need to install and configure the tools required by this script: BLAST and T-Coffee. 
Moreover you should provide a formatted protein database in order to execute the BLAST search.", + "_key": "510cd0c4b455", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "cd3ffa966e79", + "children": [ + { + "_type": "span", + "text": "", + "_key": "471e9c00da48" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "47a60fd8d676", + "markDefs": [], + "children": [ + { + "text": "By using Docker with Nextflow you only need to have the Docker engine installed in your computer and a Java VM. In order to try this example out, follow these steps:", + "_key": "edea1d1d76b7", + "_type": "span", + "marks": [] + } + ] + }, + { + "_key": "28b0121fbecb", + "children": [ + { + "_type": "span", + "text": "", + "_key": "7e6b569fdfd8" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "451ff9229b62", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Install the latest version of Nextflow by entering the following command in your shell terminal:", + "_key": "11f987e78567", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "1226322fda36", + "children": [ + { + "_type": "span", + "text": "", + "_key": "09a3e0a66330" + } + ] + }, + { + "_type": "code", + "_key": "41933fe37d1d", + "code": " curl -fsSL get.nextflow.io | bash" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Then download the required Docker image with this command:", + "_key": "850cd0f072cf" + } + ], + "_type": "block", + "style": "normal", + "_key": "f07384de1256", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "2b15270a6664" + } + ], + "_type": "block", + "style": "normal", + "_key": "d15a6515f59e" + }, + { + "code": " docker pull nextflow/examples", + "_type": "code", + "_key": "a0fd796c7f18" + }, + { + "_type": "block", + "style": "normal", + "_key": "c08729034043", + "markDefs": [ + { + "_key": "db182b7798b3", + "_type": "link", + "href": "https://github.com/nextflow-io/examples/blob/master/Dockerfile" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "You can check the content of the image looking at the ", + "_key": "eeaa0b6f70aa" + }, + { + "_type": "span", + "marks": [ + "db182b7798b3" + ], + "text": "Dockerfile", + "_key": "867810f7eaf7" + }, + { + "_type": "span", + "marks": [], + "text": " used to create it.", + "_key": "140457af601a" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "c9b645e8abc6", + "children": [ + { + "text": "", + "_key": "200806a4a22f", + "_type": "span" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_key": "9bc8fda6eb37", + "_type": "span", + "marks": [], + "text": "Now you are ready to run the demo by launching the pipeline execution as shown below:" + } + ], + "_type": "block", + "style": "normal", + "_key": "db5f057ba47f" + }, + { + "_key": "a75943c7b299", + "children": [ + { + "text": "", + "_key": "94e89a281fe8", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "code": "nextflow run examples/blast-parallel.nf -with-docker", + "_type": "code", + "_key": "f12092554082" + }, + { + "markDefs": [], + "children": [ + { + "_key": "e11f8954e4d0", + "_type": "span", + "marks": [], + "text": "This will run the pipeline printing the final alignment out on the terminal screen. 
You can also provide your own protein sequences multi FASTA file by adding, in the above command line, the option " + }, + { + "text": "--query <file>", + "_key": "5799a212d7cd", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " and change the splitting chunk size with ", + "_key": "ca22a142e5f6" + }, + { + "_key": "68cedbfda986", + "_type": "span", + "marks": [ + "code" + ], + "text": "--chunk n" + }, + { + "marks": [], + "text": " option.", + "_key": "fb3207c6ea3b", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "96b3c7fbb2bf" + }, + { + "children": [ + { + "text": "", + "_key": "ef4b3e44fdb6", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "3379bbe73c3b" + }, + { + "style": "normal", + "_key": "e2ae936230cd", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Note: the result doesn't have a real biological meaning since it uses a very small protein database.", + "_key": "e6934599e94d" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "0107bb7f6bf1", + "children": [ + { + "_type": "span", + "text": "", + "_key": "a9a7d555a8d9" + } + ] + }, + { + "_type": "block", + "style": "h3", + "_key": "63909b8e2449", + "children": [ + { + "_type": "span", + "text": "Conclusion", + "_key": "315b06147edd" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The mix of Docker, GitHub and Nextflow technologies make it possible to deploy self-contained and truly replicable pipelines. It requires zero configuration and enables the reproducibility of data analysis pipelines in any system in which a Java VM and the Docker engine are available.", + "_key": "ddd5aea27036" + } + ], + "_type": "block", + "style": "normal", + "_key": "5f7c0b08fbed" + }, + { + "style": "normal", + "_key": "bf283b2abe07", + "children": [ + { + "_type": "span", + "text": "", + "_key": "8e17dc417921" + } + ], + "_type": "block" + }, + { + "_key": "f3ab3956c37f", + "children": [ + { + "text": "Learn how to do it!", + "_key": "9bb018a91da3", + "_type": "span" + } + ], + "_type": "block", + "style": "h3" + }, + { + "children": [ + { + "marks": [], + "text": "Follow our documentation for a quick start using Docker with Nextflow at the following link ", + "_key": "b92eeb971011", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "9cc965622b48" + ], + "text": "https://www.nextflow.io/docs/latest/docker.html", + "_key": "ff0e07795e14" + }, + { + "_type": "span", + "marks": [], + "text": " ", + "_key": "fbe6df02a5d4" + }, + { + "_type": "span", + "text": "
", + "_key": "f31359351125" + } + ], + "_type": "block", + "style": "normal", + "_key": "130158487105", + "markDefs": [ + { + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/docker.html", + "_key": "9cc965622b48" + } + ] + } + ], + "tags": [], + "_createdAt": "2024-09-25T14:14:50Z", + "_rev": "rsIQ9Jd8Z4nKBVUruy4O4v", + "_type": "blogPost", + "author": { + "_ref": "7d389002-0fae-4149-98d4-22623b6afbed", + "_type": "reference" + } + }, + { + "_createdAt": "2024-09-25T14:17:48Z", + "_id": "5f3cd5938313", + "title": "Moving toward better support through the Community forum", + "_updatedAt": "2024-09-27T09:51:34Z", + "publishedAt": "2024-08-28T06:00:00.000Z", + "author": { + "_ref": "geraldine-van-der-auwera", + "_type": "reference" + }, + "_rev": "rsIQ9Jd8Z4nKBVUruy4XGp", + "_type": "blogPost", + "tags": [ + { + "_type": "reference", + "_key": "b95b2f2fa77a", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493" + }, + { + "_ref": "3d25991c-f357-442b-a5fa-6c02c3419f88", + "_type": "reference", + "_key": "36f4314239bb" + } + ], + "meta": { + "description": "As the Nextflow community continues to grow, fostering a space where users can easily find help and share knowledge is more important than ever. In this post, we’ll explore our ongoing efforts to enhance the community forum, transitioning from Slack as the primary platform for peer-to-peer support.", + "slug": { + "current": "better-support-through-community-forum-2024" + } + }, + "body": [ + { + "style": "normal", + "_key": "f83ae119d802", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "As the Nextflow community continues to grow, fostering a space where users can easily find help and share knowledge is more important than ever. In this post, we’ll explore our ongoing efforts to enhance the community forum, transitioning from Slack as the primary platform for peer-to-peer support. By improving the forum’s usability and accessibility, we’re aiming to create a more efficient and welcoming environment for everyone. Read on to learn about the changes we’re implementing and how you can contribute to making the forum an even better resource for the community.", + "_key": "975d210f07a0", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "832741b82fb0", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "cea7bd7b09e8", + "_type": "span" + } + ] + }, + { + "children": [ + { + "text": "One of the things that impressed me the most when I joined Seqera last year as a developer advocate for the Nextflow community, was how engaged people are, and how much peer-to-peer interaction there is across a vast range of scientific domains, cultures, and geographies. 
That’s wonderful for a number of reasons, not least of which is that whenever you run into a problem —or you’re trying to do something a bit complicated or new— it’s very likely that there is someone out there who is able and willing to help you figure it out.", + "_key": "805d14899cbd", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "a5f6c9409136", + "markDefs": [] + }, + { + "_key": "51998bea4846", + "markDefs": [], + "children": [ + { + "_key": "02353600d457", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "e87bc3093a4b", + "markDefs": [ + { + "_type": "link", + "href": "https://community.seqera.io/", + "_key": "286d13f872a3" + } + ], + "children": [ + { + "_key": "e075f2913e21", + "_type": "span", + "marks": [], + "text": "For the past few months, our small team of developer advocates have been thinking about how to nurture that dynamism, and how to further improve the experience of peer-to-peer support as the Nextflow community continues to grow. We’ve come to the conclusion that the best thing we can do is make the " + }, + { + "marks": [ + "286d13f872a3" + ], + "text": "community forum", + "_key": "7c1586e24afc", + "_type": "span" + }, + { + "marks": [], + "text": " an awesome place to go for help, answers, and resources.", + "_key": "c152851896fa", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "d204598de8c3", + "markDefs": [], + "children": [ + { + "_key": "3b85f75ec79a", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "children": [ + { + "text": "Why focus on the forum?", + "_key": "3a3a6a4b84e1", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2", + "_key": "30be06bfbdb1", + "markDefs": [] + }, + { + "children": [ + { + "text": "If you’re familiar with the Nextflow Slack workspace, you know there’s a lot of activity there, and the #help channel is always hopping. It’s true, and that’s great, buuuuut using Slack has some important downsides that the forum doesn’t suffer from.", + "_key": "673398ee6c9b", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "0b70b3d41ca3", + "markDefs": [] + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "2048729a9cf8", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "359d2c52adcd", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "text": "One of the standout features of the forum is the ability to search past questions and answers really easily. Whether you're browsing directly within the forum, or using Google or some other search engine, you can quickly find relevant information in a way that’s much harder to do on Slack. 
This means that solutions to common issues are readily accessible, saving you (and the resident experts who have already answered the same question a bunch of times) a whole lot of time and effort.", + "_key": "efb40ae8dbeb", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "9b17349051bc" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "da3297d10918", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "0ffa5ca023ef" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Additionally, the forum has no barrier to access— you can view all the content without the need to join yet another app. This open access ensures that everyone can benefit from the wealth of knowledge shared by community members.", + "_key": "c1b8bc252365" + } + ], + "_type": "block", + "style": "normal", + "_key": "98bdb6cd3574", + "markDefs": [] + }, + { + "_key": "04f38b1f44fa", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "eff8fecc8cb9", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "h2", + "_key": "9c9526483719", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Immediate improvements to the forum’s ease of use", + "_key": "9f68ec3cee23" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "We’re excited to roll out a few immediate changes to the forum that should make it easier and more pleasant to use.", + "_key": "4f508160c66d", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "7d1b7e5747b9" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "d8921c79a9d0" + } + ], + "_type": "block", + "style": "normal", + "_key": "d2e6708266a5" + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "We’re introducing a new, sleeker visual design to make navigation and posting more intuitive and enjoyable. ", + "_key": "10ace6743d0d", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "952be771f2c0" + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "We’ve reorganized the categories to streamline the process of finding and providing help. Instead of having separate categories for various things (like Nextflow, Wave, Seqera Platform etc), there is now a single \"Ask for help\" category for all topics, eliminating any confusion about where to post your question. Simply put, if you need help, just post in the \"Ask for help\" category. Done.", + "_key": "123388410037", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "4017d4448ad9" + }, + { + "_key": "f12aa532d61e", + "markDefs": [], + "children": [ + { + "_key": "759ac57f39b4", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "5ec6ec2ac1c7", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "We’re also planning to mirror existing categories from the Nextflow Slack workspace, such as the jobs board and shameless promo channels, to make that content more visible and searchable. 
This will help you find opportunities and promote your work more effectively.", + "_key": "03b236799d91" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "d4a11d6ec75d", + "markDefs": [], + "children": [ + { + "_key": "5bc990578b86", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "_key": "b724d8cb9556", + "markDefs": [], + "children": [ + { + "_key": "a1e795d5bf44", + "_type": "span", + "marks": [], + "text": "What you can do to help" + } + ], + "_type": "block", + "style": "h2" + }, + { + "children": [ + { + "_key": "ad54cbb38c97", + "_type": "span", + "marks": [], + "text": "These changes are meant to make the forum a great place for peer-to-peer support for the Nextflow community. You can help us improve it further by giving us your feedback about the forum functionality (don’t be shy), by posting your questions in the forum, and of course, if you’re already a Nextflow expert, by answering questions there." + } + ], + "_type": "block", + "style": "normal", + "_key": "78765a70912e", + "markDefs": [] + }, + { + "children": [ + { + "text": "", + "_key": "1afca099c8cd", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "948a32a1f182", + "markDefs": [] + }, + { + "style": "normal", + "_key": "f2d37db417c6", + "markDefs": [ + { + "_key": "1a8df9c34f9f", + "_type": "link", + "href": "https://community.seqera.io/" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Check out the ", + "_key": "d37db990570b" + }, + { + "_key": "5a0abff9240f", + "_type": "span", + "marks": [ + "1a8df9c34f9f" + ], + "text": "community forum" + }, + { + "text": " now!", + "_key": "650faf81aead", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + } + ] + }, + { + "_createdAt": "2024-09-25T14:15:27Z", + "author": { + "_ref": "paolo-di-tommaso", + "_type": "reference" + }, + "title": "Conda support has landed!", + "meta": { + "slug": { + "current": "conda-support-has-landed" + } + }, + "body": [ + { + "_key": "8e1c781f5ec3", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Nextflow aims to ease the development of large scale, reproducible workflows allowing developers to focus on the main application logic and to rely on best community tools and best practices.", + "_key": "e4b551d5f424", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "dcc930a16cc3", + "children": [ + { + "text": "", + "_key": "c2aebdad7124", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://conda.io/docs/", + "_key": "39b964772ae4" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "For this reason we are very excited to announce that the latest Nextflow version (", + "_key": "6a8eb6043710" + }, + { + "marks": [ + "code" + ], + "text": "0.30.0", + "_key": "415330266c17", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ") finally provides built-in support for ", + "_key": "71b461737937" + }, + { + "marks": [ + "39b964772ae4" + ], + "text": "Conda", + "_key": "36ed138aec6e", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "2b940c73a6b9" + } + ], + "_type": "block", + "style": "normal", + "_key": "594f5b83ef3b" + }, + { + "_key": "eca3ea36cda2", + "children": [ + { + "_type": "span", + "text": "", + "_key": "237b6fb6c889" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "Conda is a 
popular package manager that simplifies the installation of software packages and the configuration of complex software environments. Above all, it provides access to large tool and software package collections maintained by domain specific communities such as ", + "_key": "8da796d4a02d", + "_type": "span", + "marks": [] + }, + { + "text": "Bioconda", + "_key": "126fbfff61f5", + "_type": "span", + "marks": [ + "3972c697d8eb" + ] + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "6d6dc5f05f7c" + }, + { + "marks": [ + "3c5575677800" + ], + "text": "BioBuild", + "_key": "bd989f0a0af4", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "362b40365c6a" + } + ], + "_type": "block", + "style": "normal", + "_key": "86ba305a5cf6", + "markDefs": [ + { + "_key": "3972c697d8eb", + "_type": "link", + "href": "https://bioconda.github.io" + }, + { + "_key": "3c5575677800", + "_type": "link", + "href": "https://biobuilds.org/" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "bdf458020ca9", + "children": [ + { + "_type": "span", + "text": "", + "_key": "d1443ccc9f9a" + } + ] + }, + { + "children": [ + { + "_key": "636a94f2dc78", + "_type": "span", + "marks": [], + "text": "The native integration with Nextflow allows researchers to develop workflow applications in a rapid and easy repeatable manner, reusing community tools, whilst taking advantage of the configuration flexibility, portability and scalability provided by Nextflow." + } + ], + "_type": "block", + "style": "normal", + "_key": "8d2ff7a252a9", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "b9c247c9f3b0" + } + ], + "_type": "block", + "style": "normal", + "_key": "935a6d1a1d1b" + }, + { + "_type": "block", + "style": "h3", + "_key": "d0c1004d0559", + "children": [ + { + "_type": "span", + "text": "How it works", + "_key": "bb547be66da9" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "705e94df49b1", + "markDefs": [], + "children": [ + { + "_key": "892c9465fd99", + "_type": "span", + "marks": [], + "text": "Nextflow automatically creates and activates the Conda environment(s) given the dependencies specified by each process." + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "347aa5fcd432", + "children": [ + { + "_type": "span", + "text": "", + "_key": "240c830f4bd5" + } + ] + }, + { + "style": "normal", + "_key": "dc319cfc02d4", + "markDefs": [ + { + "_type": "link", + "href": "/docs/latest/process.html#conda", + "_key": "28d0caca2c0f" + } + ], + "children": [ + { + "marks": [], + "text": "Dependencies are specified by using the ", + "_key": "3ab14c702633", + "_type": "span" + }, + { + "marks": [ + "28d0caca2c0f" + ], + "text": "conda", + "_key": "60f32ae06d7f", + "_type": "span" + }, + { + "marks": [], + "text": " directive, providing either the names of the required Conda packages, the path of a Conda environment yaml file or the path of an existing Conda environment directory.", + "_key": "d404c5c777ba", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "e3fa1e5294a7", + "children": [ + { + "text": "", + "_key": "938184641d1d", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "c540f99c0f27", + "markDefs": [], + "children": [ + { + "text": "Conda environments are stored on the file system. By default Nextflow instructs Conda to save the required environments in the pipeline work directory. 
You can specify the directory where the Conda environments are stored using the ", + "_key": "e72873bc6d31", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "conda.cacheDir", + "_key": "8b5e3d7c4e9b" + }, + { + "_type": "span", + "marks": [], + "text": " configuration property.", + "_key": "c988c4e87355" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "977562c50320", + "children": [ + { + "_type": "span", + "text": "", + "_key": "db3cd534afa2" + } + ], + "_type": "block" + }, + { + "_key": "1d08e0a0a186", + "children": [ + { + "text": "Use Conda package names", + "_key": "4b55a8decd8d", + "_type": "span" + } + ], + "_type": "block", + "style": "h4" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The simplest way to use one or more Conda packages consists in specifying their names using the ", + "_key": "f7157c59efa9" + }, + { + "marks": [ + "code" + ], + "text": "conda", + "_key": "22a792c4a534", + "_type": "span" + }, + { + "text": " directive. Multiple package names can be specified by separating them with a space. For example:", + "_key": "5a1b0a625aaf", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "6d100f5ea5e8" + }, + { + "_type": "block", + "style": "normal", + "_key": "9e1355c93a5d", + "children": [ + { + "_key": "fd929d230c24", + "_type": "span", + "text": "" + } + ] + }, + { + "_type": "code", + "_key": "c69c76e353f4", + "code": "process foo {\n conda \"bwa samtools multiqc\"\n\n \"\"\"\n your_command --here\n \"\"\"\n}" + }, + { + "children": [ + { + "_key": "2440a13e29d0", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "f7fc39c33959" + }, + { + "_type": "block", + "style": "normal", + "_key": "0dfe954377ed", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Using the above definition a Conda environment that includes BWA, Samtools and MultiQC tools is created and activated when the process is executed.", + "_key": "538d52903d80" + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "7e2dbbeaec64" + } + ], + "_type": "block", + "style": "normal", + "_key": "161fc6ca34ae" + }, + { + "_type": "block", + "style": "normal", + "_key": "dbc2e50963a7", + "markDefs": [], + "children": [ + { + "text": "The usual Conda package syntax and naming conventions can be used. 
The version of a package can be specified after the package name as shown here: ", + "_key": "9703b9139acb", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "bwa=0.7.15", + "_key": "934afae29289" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "041526567843" + } + ] + }, + { + "_key": "9a56d457a79c", + "children": [ + { + "_type": "span", + "text": "", + "_key": "f30f8854828b" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "019ecd1d0037", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The name of the channel where a package is located can be specified prefixing the package with the channel name as shown here: ", + "_key": "66d9c16b777a", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "bioconda::bwa=0.7.15", + "_key": "22bdee518abe" + }, + { + "text": ".", + "_key": "31d3ffff0f30", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "77b481802943" + } + ], + "_type": "block", + "style": "normal", + "_key": "e324d1d431c2" + }, + { + "children": [ + { + "_key": "50121e1d3843", + "_type": "span", + "text": "Use Conda environment files" + } + ], + "_type": "block", + "style": "h4", + "_key": "09dcdcfb9f33" + }, + { + "style": "normal", + "_key": "464a85127087", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "When working in a project requiring a large number of dependencies it can be more convenient to consolidate all required tools using a Conda environment file. This is a file that lists the required packages and channels, structured using the YAML format. For example:", + "_key": "2ab570ef9d18" + } + ], + "_type": "block" + }, + { + "_key": "ec5f4c03fda2", + "children": [ + { + "_type": "span", + "text": "", + "_key": "57c6b7e93f56" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "code", + "_key": "3af9d15db043", + "code": "name: my-env\nchannels:\n - bioconda\n - conda-forge\n - defaults\ndependencies:\n - star=2.5.4a\n - bwa=0.7.15" + }, + { + "style": "normal", + "_key": "3cb3ee8f4d7e", + "children": [ + { + "text": "", + "_key": "90209221496e", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "d1da0ceb0b5f", + "markDefs": [], + "children": [ + { + "_key": "00a881a6e462", + "_type": "span", + "marks": [], + "text": "The path of the environment file can be specified using the " + }, + { + "_key": "2a752b118912", + "_type": "span", + "marks": [ + "code" + ], + "text": "conda" + }, + { + "marks": [], + "text": " directive:", + "_key": "219de28fc9c7", + "_type": "span" + } + ] + }, + { + "children": [ + { + "text": "", + "_key": "0135da8d77aa", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "fc89e01355d0" + }, + { + "code": "process foo {\n conda '/some/path/my-env.yaml'\n\n '''\n your_command --here\n '''\n}", + "_type": "code", + "_key": "02fde70de661" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "e2a3a8eec5d6" + } + ], + "_type": "block", + "style": "normal", + "_key": "65aed67dd390" + }, + { + "_type": "block", + "style": "normal", + "_key": "31114b76c635", + "markDefs": [], + "children": [ + { + "_key": "97f4fc922077", + "_type": "span", + "marks": [], + "text": "Note: the environment file name " + }, + { + "marks": [ + "strong" + ], + "text": "must", + "_key": "8e7b6545afe4", + 
"_type": "span" + }, + { + "_key": "791414332dd6", + "_type": "span", + "marks": [], + "text": " end with a " + }, + { + "_key": "7dfb057dcc51", + "_type": "span", + "marks": [ + "code" + ], + "text": ".yml" + }, + { + "_type": "span", + "marks": [], + "text": " or ", + "_key": "46454509315f" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": ".yaml", + "_key": "cdfda04da021" + }, + { + "marks": [], + "text": " suffix otherwise it won't be properly recognized. Also relative paths are resolved against the workflow launching directory.", + "_key": "90814a5cce02", + "_type": "span" + } + ] + }, + { + "_key": "c916c6c3eb84", + "children": [ + { + "text": "", + "_key": "e87a28b3a9ff", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "The suggested approach is to store the the Conda environment file in your project root directory and reference it in the ", + "_key": "d7397645a0e3" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "nextflow.config", + "_key": "15ed72b3f3fd" + }, + { + "_key": "90e6f7f0d97f", + "_type": "span", + "marks": [], + "text": " directory using the " + }, + { + "text": "baseDir", + "_key": "22caeabb7f94", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "marks": [], + "text": " variable as shown below:", + "_key": "d16f7c149cff", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "b6675befb31c", + "markDefs": [] + }, + { + "_key": "fc09a3a6a53d", + "children": [ + { + "text": "", + "_key": "e005eb442107", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "code": "process.conda = \"$baseDir/my-env.yaml\"", + "_type": "code", + "_key": "5204ab192f28" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "ac8c9619d49a" + } + ], + "_type": "block", + "style": "normal", + "_key": "21779cb8c938" + }, + { + "children": [ + { + "text": "This guarantees that the environment paths is correctly resolved independently of the execution path.", + "_key": "492d3ffe6c89", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "562416248e72", + "markDefs": [] + }, + { + "_key": "e2f943e042cb", + "children": [ + { + "_type": "span", + "text": "", + "_key": "cf72e4a8ec53" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "e7d4a6a08bd2", + "markDefs": [ + { + "href": "/docs/latest/conda.html", + "_key": "66d59a50a7a4", + "_type": "link" + } + ], + "children": [ + { + "_key": "683de47047d8", + "_type": "span", + "marks": [], + "text": "See the " + }, + { + "_type": "span", + "marks": [ + "66d59a50a7a4" + ], + "text": "documentation", + "_key": "1616c8038e9c" + }, + { + "_type": "span", + "marks": [], + "text": " for more details on how to configure and use Conda environments in your Nextflow workflow.", + "_key": "41217c97018c" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "858518382baa", + "children": [ + { + "text": "", + "_key": "1c44a834e6db", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "h3", + "_key": "4bfa23325312", + "children": [ + { + "text": "Bonus!", + "_key": "561e05bafaa7", + "_type": "span" + } + ] + }, + { + "_key": "c5997755f2df", + "markDefs": [ + { + "_type": "link", + "href": "https://biocontainers.pro/", + "_key": "15b3eb805afd" + } + ], + "children": [ + { + "text": "This release includes also a better support for ", + "_key": 
"6bc93514cb23", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "15b3eb805afd" + ], + "text": "Biocontainers", + "_key": "d8d8c188144f" + }, + { + "_type": "span", + "marks": [], + "text": ". So far, Nextflow users were able to use container images provided by the Biocontainers community. However, it was not possible to collect process metrics and runtime statistics within those images due to the usage of a legacy version of the ", + "_key": "aa9139f3fd7d" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "ps", + "_key": "9bdf8e100481" + }, + { + "_type": "span", + "marks": [], + "text": " system tool that is not compatible with the one expected by Nextflow.", + "_key": "b09bee9bfd48" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "cd730902c8f9", + "children": [ + { + "_key": "14f001d46337", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The latest version of Nextflow does not require the ", + "_key": "0a8c4a6ec222" + }, + { + "marks": [ + "code" + ], + "text": "ps", + "_key": "2a2b6e490217", + "_type": "span" + }, + { + "_key": "911f5128b219", + "_type": "span", + "marks": [], + "text": " tool any more to fetch execution metrics and runtime statistics, therefore this information is collected and correctly reported when using Biocontainers images." + } + ], + "_type": "block", + "style": "normal", + "_key": "da99f5d82733" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "c243f5ce2b09" + } + ], + "_type": "block", + "style": "normal", + "_key": "b6127601042d" + }, + { + "_key": "a2f83771207c", + "children": [ + { + "_type": "span", + "text": "Conclusion", + "_key": "a9ff6a7721da" + } + ], + "_type": "block", + "style": "h3" + }, + { + "children": [ + { + "marks": [], + "text": "We are very excited by this new feature bringing the ability to use popular Conda tool collections, such as Bioconda, directly into Nextflow workflow applications.", + "_key": "a17640ed2289", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "fd196e1679d4", + "markDefs": [] + }, + { + "children": [ + { + "text": "", + "_key": "877f9b3f5ed5", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "9071da705084" + }, + { + "_type": "block", + "style": "normal", + "_key": "7c4830edabe0", + "markDefs": [ + { + "_type": "link", + "href": "/docs/latest/process.html#module", + "_key": "50a3bd45ad1f" + }, + { + "_type": "link", + "href": "/docs/latest/docker.html", + "_key": "092299e5493a" + }, + { + "_key": "e3f18233e8cb", + "_type": "link", + "href": "/docs/latest/singularity.html" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow developers have now yet another option to transparently manage the dependencies in their workflows along with ", + "_key": "032d4f84a990" + }, + { + "text": "Environment Modules", + "_key": "dcf1d1a14c31", + "_type": "span", + "marks": [ + "50a3bd45ad1f" + ] + }, + { + "marks": [], + "text": " and ", + "_key": "1bbf662db980", + "_type": "span" + }, + { + "text": "containers", + "_key": "4770201cd0f8", + "_type": "span", + "marks": [ + "092299e5493a" + ] + }, + { + "text": " ", + "_key": "e75526c363e9", + "_type": "span", + "marks": [] + }, + { + "_key": "2771d68d121d", + "_type": "span", + "marks": [ + "e3f18233e8cb" + ], + "text": "technology" + }, + { + "_type": "span", + "marks": [], + "text": ", 
giving them great configuration flexibility.", + "_key": "046d54b327b8" + } + ] + }, + { + "style": "normal", + "_key": "7af51a444e2f", + "children": [ + { + "_key": "9780b0688114", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "fd14b4fbb5ed", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The resulting workflow applications can easily be reconfigured and deployed across a range of different platforms choosing the best technology according to the requirements of the target system.", + "_key": "2cdf92b5eb3d", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + } + ], + "tags": [ + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "422fb1709920" + } + ], + "_type": "blogPost", + "_id": "5f4514b9fac7", + "_updatedAt": "2024-09-26T09:01:55Z", + "_rev": "2PruMrLMGpvZP5qAknmBcO", + "publishedAt": "2018-06-05T06:00:00.000Z" + }, + { + "body": [ + { + "_key": "a5528d44262b", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Nextflow is a powerful tool for developing scientific workflows for use on HPC systems. It provides a simple solution to deploy parallelized workloads at scale using an elegant reactive/functional programming model in a portable manner.", + "_key": "00bfb4363686", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "04bb838de795", + "children": [ + { + "text": "", + "_key": "39746ddf94fc", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "1fad2b9795ad", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "It supports the most popular workload managers such as Grid Engine, Slurm, LSF and PBS, among other out-of-the-box executors, and comes with sensible defaults for each. However, each HPC system is a complex machine with its own characteristics and constraints. For this reason you should always consult your system administrator before running a new piece of software or a compute intensive pipeline that spawns a large number of jobs.", + "_key": "72e078e6f485" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "40cb73841049", + "children": [ + { + "text": "", + "_key": "3c3bbec1528a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "02bb0c9fcaae", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In this series of posts, we will be sharing the top tips we have learned along the way that should help you get results faster while keeping in the good books of your sys admins.", + "_key": "5540df73b437" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "ca75f12995ca", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "af0f0c48bcfb" + }, + { + "_type": "block", + "style": "h3", + "_key": "e38d801cb1a9", + "children": [ + { + "_type": "span", + "text": "1. Don't forget the executor", + "_key": "382abe7979a2" + } + ] + }, + { + "style": "normal", + "_key": "defceba2784e", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow, by default, spawns parallel task executions in the computer on which it is running. This is generally useful for development purposes, however, when using an HPC system you should specify the executor matching your system. This instructs Nextflow to submit pipeline tasks as jobs into your HPC workload manager. 
This can be done adding the following setting to the ", + "_key": "1a112a3261bf" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "nextflow.config", + "_key": "64b2b37cda78" + }, + { + "marks": [], + "text": " file in the launching directory, for example:", + "_key": "22976c85146b", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "106bbe443739" + } + ], + "_type": "block", + "style": "normal", + "_key": "c64abc360e9a" + }, + { + "code": "process.executor = 'slurm'", + "_type": "code", + "_key": "a7e5a09a5ccf" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "b0be932e74d2" + } + ], + "_type": "block", + "style": "normal", + "_key": "3dc9a2b24ae2" + }, + { + "markDefs": [ + { + "href": "https://www.nextflow.io/docs/latest/executor.html", + "_key": "07b7cbd41fc8", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "With the above setting Nextflow will submit the job executions to your Slurm cluster spawning a ", + "_key": "2d57d6634e7a" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "sbatch", + "_key": "89db489d2a4f" + }, + { + "marks": [], + "text": " command for each job in your pipeline. Find the executor matching your system at ", + "_key": "9993bd02ea4c", + "_type": "span" + }, + { + "marks": [ + "07b7cbd41fc8" + ], + "text": "this link", + "_key": "a37e73b9e40e", + "_type": "span" + }, + { + "text": ". Even better, to prevent the undesired use of the local executor in a specific environment, define the ", + "_key": "f55cb4ed2f4a", + "_type": "span", + "marks": [] + }, + { + "_key": "16a3ff0eee0a", + "_type": "span", + "marks": [ + "em" + ], + "text": "default" + }, + { + "marks": [], + "text": " executor to be used by Nextflow using the following system variable:", + "_key": "25dc4e3908d2", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "e4d924a7d494" + }, + { + "_type": "block", + "style": "normal", + "_key": "fa78c4395ca1", + "children": [ + { + "_type": "span", + "text": "", + "_key": "75646083b7eb" + } + ] + }, + { + "code": "export NXF_EXECUTOR=slurm", + "_type": "code", + "_key": "ca340d29e16c" + }, + { + "style": "normal", + "_key": "2aaa9fd5f34b", + "children": [ + { + "_type": "span", + "text": "", + "_key": "9bd6557a7003" + } + ], + "_type": "block" + }, + { + "_key": "101739375aa8", + "children": [ + { + "_type": "span", + "text": "2. Nextflow as a job", + "_key": "a5a8c072ad1d" + } + ], + "_type": "block", + "style": "h3" + }, + { + "_key": "0f25f1d82e70", + "markDefs": [], + "children": [ + { + "_key": "09b666357680", + "_type": "span", + "marks": [], + "text": "Quite surely your sys admin has already warned you that the login/head node should only be used to submit job executions and not run compute intensive tasks. When running a Nextflow pipeline, the driver application submits and monitors the job executions on your cluster (provided you have correctly specified the executor as stated in point 1), and therefore it should not run compute intensive tasks." 
+ } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "b280850aa299", + "children": [ + { + "_key": "7a95ba7fa16b", + "_type": "span", + "text": "" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "1e0eda33feac", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "However, it's never a good practice to launch a long running job in the login node, and therefore a good practice consists of running Nextflow itself as a cluster job. This can be done by wrapping the ", + "_key": "5a0a18ef2483", + "_type": "span" + }, + { + "text": "nextflow run", + "_key": "41087e124c17", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_key": "c1f661b2937e", + "_type": "span", + "marks": [], + "text": " command in a shell script and submitting it as any other job. An average pipeline may require 2 CPUs and 2 GB of resources allocation." + } + ] + }, + { + "children": [ + { + "_key": "3199bdb9e22c", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "5e797e5d3dc6" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Note: the queue where the Nextflow driver job is submitted should allow the spawning of the pipeline jobs to carry out the pipeline execution.", + "_key": "0f28ad2fd649" + } + ], + "_type": "block", + "style": "normal", + "_key": "6905e8be4fd7", + "markDefs": [] + }, + { + "_key": "48ab876dfe8e", + "children": [ + { + "text": "", + "_key": "810e409c0977", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "3. Use the queueSize directive", + "_key": "17a472b82a73" + } + ], + "_type": "block", + "style": "h3", + "_key": "c5159311e7d3" + }, + { + "_key": "616b3f20a57b", + "markDefs": [], + "children": [ + { + "text": "The ", + "_key": "6cf31c352cd5", + "_type": "span", + "marks": [] + }, + { + "_key": "450d8dea6907", + "_type": "span", + "marks": [ + "code" + ], + "text": "queueSize" + }, + { + "_key": "48d506f6f05d", + "_type": "span", + "marks": [], + "text": " directive is part of the executor configuration in the " + }, + { + "text": "nextflow.config", + "_key": "2aba511383c4", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "text": " file, and defines how many processes are queued at a given time. By default, Nextflow will submit up to 100 jobs at a time for execution. Increase or decrease this setting depending your HPC system quota and throughput. For example:", + "_key": "9faeabd3b466", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "e4cb21aafbd0", + "children": [ + { + "_key": "87feb0f1dd04", + "_type": "span", + "text": "" + } + ] + }, + { + "code": "executor {\n name = 'slurm'\n queueSize = 50\n}", + "_type": "code", + "_key": "44720d6705a3" + }, + { + "style": "normal", + "_key": "3cd58646da3d", + "children": [ + { + "_type": "span", + "text": "", + "_key": "298fedf17193" + } + ], + "_type": "block" + }, + { + "style": "h3", + "_key": "4ceef9af064b", + "children": [ + { + "_type": "span", + "text": "4. Specify the max heap size", + "_key": "0edafb3a3a27" + } + ], + "_type": "block" + }, + { + "_key": "15d487949313", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The Nextflow runtime runs on top of the Java virtual machine which, by design, tries to allocate as much memory as is available. 
This is not a good practice in HPC systems which are designed to share compute resources across many users and applications. To avoid this, specify the maximum amount of memory that can be used by the Java VM using the -Xms and -Xmx Java flags. These can be specified using the ", + "_key": "d81b95aabb7a" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "NXF_OPTS", + "_key": "eafe423246c9" + }, + { + "_type": "span", + "marks": [], + "text": " environment variable.", + "_key": "922a02ea25e0" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "6c1ecf681f7e", + "children": [ + { + "_type": "span", + "text": "", + "_key": "b0b07097173a" + } + ] + }, + { + "style": "normal", + "_key": "ae13f925a9f6", + "markDefs": [], + "children": [ + { + "_key": "fab67168f4e5", + "_type": "span", + "marks": [], + "text": "For example:" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "", + "_key": "6b89ce4db8d2", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "273f8c92ca0f" + }, + { + "code": "export NXF_OPTS=\"-Xms500M -Xmx2G\"", + "_type": "code", + "_key": "8da3f66fd627" + }, + { + "style": "normal", + "_key": "bbb6af637642", + "children": [ + { + "text": "", + "_key": "6a8cd758b4c7", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The above setting instructs Nextflow to allocate a Java heap in the range of 500 MB and 2 GB of RAM.", + "_key": "db259da92a47" + } + ], + "_type": "block", + "style": "normal", + "_key": "e46997cfefe8" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "f8d576371805" + } + ], + "_type": "block", + "style": "normal", + "_key": "2636915417ff" + }, + { + "children": [ + { + "_type": "span", + "text": "5. Limit the Nextflow submit rate", + "_key": "1f98af53ef0e" + } + ], + "_type": "block", + "style": "h3", + "_key": "85b3cc54a7f9" + }, + { + "_key": "6fb19e0fb327", + "markDefs": [], + "children": [ + { + "_key": "1e551b32b4d8", + "_type": "span", + "marks": [], + "text": "Nextflow attempts to submit the job executions as quickly as possible, which is generally not a problem. However, in some HPC systems the submission throughput is constrained or it should be limited to avoid degrading the overall system performance. To prevent this problem you can use " + }, + { + "marks": [ + "code" + ], + "text": "submitRateLimit", + "_key": "90a15258ce64", + "_type": "span" + }, + { + "marks": [], + "text": " to control the Nextflow job submission throughput. This directive is part of the ", + "_key": "d66eaf79d7c4", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "executor", + "_key": "ffec3aad606c" + }, + { + "text": " configuration scope, and defines the number of tasks that can be submitted per a unit of time. The default for the ", + "_key": "22457b571cea", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "submitRateLimit", + "_key": "748448acc977" + }, + { + "text": " is unlimited. 
You can specify the ", + "_key": "ec53f6d16397", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "code" + ], + "text": "submitRateLimit", + "_key": "f4aa4853d620", + "_type": "span" + }, + { + "marks": [], + "text": " like this:", + "_key": "54783ccdc294", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "c9ea39dd969b", + "children": [ + { + "text": "", + "_key": "bccaaa02ec6b", + "_type": "span" + } + ] + }, + { + "code": "executor {\n submitRateLimit = '10 sec'\n}", + "_type": "code", + "_key": "30983a34b7e1" + }, + { + "style": "normal", + "_key": "e2ef1fdbb7cd", + "children": [ + { + "_key": "226f6f8af507", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "text": "You can also more explicitly specify it as a rate of # processes / time unit:", + "_key": "ed6856a2a65d", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "b8a7bbb229d2" + }, + { + "children": [ + { + "text": "", + "_key": "ad5c59f186de", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "8f7b1cd163bf" + }, + { + "code": "executor {\n submitRateLimit = '10/2min'\n}", + "_type": "code", + "_key": "e780801b3237" + }, + { + "children": [ + { + "text": "", + "_key": "a5a30e3483fa", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "ab766a17e1dc" + }, + { + "_type": "block", + "style": "h3", + "_key": "9dfd78000926", + "children": [ + { + "_key": "2382373bae51", + "_type": "span", + "text": "Conclusion" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Nextflow aims to give you control over every aspect of your workflow. These options allow you to shape how Nextflow communicates with your HPC system. This can make workflows more robust while avoiding overloading the executor. Some systems have hard limits, and if you do not take them into account, it will stop any jobs from being scheduled.", + "_key": "a6c1f5422318", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "3395309d751b" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "c16c805e4846" + } + ], + "_type": "block", + "style": "normal", + "_key": "3f238fc2ff8b" + }, + { + "_key": "c0d6f9fdbda6", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Stay tuned for part two where we will discuss background executions, retry strategies, maxForks and other tips.", + "_key": "4e6532baf185" + } + ], + "_type": "block", + "style": "normal" + } + ], + "publishedAt": "2021-05-13T06:00:00.000Z", + "_id": "5fa9c85b6ccb", + "tags": [ + { + "_type": "reference", + "_key": "06bcba9dfc37", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493" + } + ], + "author": { + "_type": "reference", + "_ref": "5bLgfCKN00diCN0ijmWND4" + }, + "_rev": "dXl6H5FKmU0e1y1haIlKYb", + "_createdAt": "2024-09-25T14:15:54Z", + "_updatedAt": "2024-09-26T09:02:26Z", + "_type": "blogPost", + "meta": { + "slug": { + "current": "5_tips_for_hpc_users" + } + }, + "title": "5 Nextflow Tips for HPC Users" + }, + { + "_id": "636455d7dab6", + "meta": { + "description": "Reproducibility is an important attribute of all good science. This is especially true in the realm of bioinformatics, where software is hopefully being updated, and pipelines are ideally being maintained. 
Improvements and maintenance are great, but they also bring about an important question: Do bioinformatics tools and pipelines continue to run successfully and produce consistent results despite these changes? Fortunately for us, there is an existing approach to ensure software reproducibility: testing.", + "slug": { + "current": "nf-test-in-nf-core" + } + }, + "_createdAt": "2024-09-25T14:18:27Z", + "_updatedAt": "2024-09-27T08:46:33Z", + "body": [ + { + "_key": "d78dc1997976", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The ever-changing landscape of bioinformatics", + "_key": "d942023c82c4" + } + ], + "_type": "block", + "style": "h2" + }, + { + "_key": "7b9073c525fc", + "markDefs": [], + "children": [ + { + "text": "Reproducibility is an important attribute of all good science. This is especially true in the realm of bioinformatics, where software is ", + "_key": "f1dbb3706962", + "_type": "span", + "marks": [] + }, + { + "_key": "15bf44e9649f", + "_type": "span", + "marks": [ + "strong" + ], + "text": "hopefully" + }, + { + "marks": [], + "text": " being updated, and pipelines are ", + "_key": "aafd8d3af8cf", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "ideally", + "_key": "45a0fa5cde67" + }, + { + "marks": [], + "text": " being maintained. Improvements and maintenance are great, but they also bring about an important question: Do bioinformatics tools and pipelines continue to run successfully and produce consistent results despite these changes? Fortunately for us, there is an existing approach to ensure software reproducibility: testing.", + "_key": "52abfb103fca", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "d0c633c915c7", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "7ee03a57a60f", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "h2", + "_key": "c367b0ab0276", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The wonderful world of testing", + "_key": "11007d896b7d" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_key": "3b6dc1f09c1e", + "_type": "span", + "marks": [], + "text": "\"Software testing is the process of evaluating and verifying that a software product does what it is supposed to do,\" " + }, + { + "_key": "f6068f127b27", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Lukas Forer, co-creator of nf-test." + } + ], + "_type": "block", + "style": "blockquote", + "_key": "210473de2d41" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "4f79eb3f6d53", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "63710736a2bc" + }, + { + "_key": "e45d889f3e33", + "markDefs": [], + "children": [ + { + "text": "Software testing has two primary purposes: determining whether an operation continues to run successfully after changes are made, and comparing outputs across runs to see if they are consistent. Testing can alert the developer that an output has changed so that an appropriate fix can be made. Admittedly, there are some instances when altered outputs are intentional (i.e., improving a tool might lead to better, and therefore different, results). 
However, even in these scenarios, it is important to know what has changed, so that no unintentional changes are introduced during an update.", + "_key": "d9d5dd72b69a", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "48023f7774ed", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "a339a1397c94", + "_type": "span" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_key": "8d8ad1bbc95f", + "_type": "span", + "marks": [], + "text": "Writing effective tests" + } + ], + "_type": "block", + "style": "h2", + "_key": "77444d1ae950" + }, + { + "_type": "block", + "style": "normal", + "_key": "6d8a94206b0a", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Although having any test is certainly better than having no tests at all, there are several considerations to keep in mind when adding tests to pipelines and/or tools to maximize their effectiveness. These considerations can be broadly categorized into two groups:", + "_key": "881094ed647a" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "7419bcb18756", + "listItem": "number", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Which inputs/functionalities should be tested?", + "_key": "2dfed6160324", + "_type": "span" + } + ], + "level": 1 + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "4a4e1aa8d6f0", + "listItem": "number", + "markDefs": [], + "children": [ + { + "_key": "301a43e872d2", + "_type": "span", + "marks": [], + "text": "What contents should be tested?" + } + ] + }, + { + "style": "normal", + "_key": "b5bac8bcfc4b", + "markDefs": [], + "children": [ + { + "_key": "65e27d3e4316", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Consideration 1: Testing inputs/functionality", + "_key": "b8c60e3b6e65", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "2cbeb7565520" + }, + { + "children": [ + { + "marks": [], + "text": "Generally, software will have a default or most common use case. For instance, the nf-core ", + "_key": "919508a2c099", + "_type": "span" + }, + { + "text": "FastQC", + "_key": "ef668a2e4d1e", + "_type": "span", + "marks": [ + "a0534635bae5" + ] + }, + { + "_type": "span", + "marks": [], + "text": " module is commonly used to assess the quality of paired-end reads in FastQ format. However, this is not the only way to use the FastQC module. Inputs can also be single-end/interleaved FastQ files, BAM files, or can contain reads from multiple samples. Each input type is analyzed differently by FastQC, and therefore, to increase your test coverage (", + "_key": "aeee8dd56191" + }, + { + "_key": "5d3e2b5da7de", + "_type": "span", + "marks": [ + "3c4057a25f4a" + ], + "text": "\"the degree to which a test or set of tests exercises a particular program or system\"" + }, + { + "_type": "span", + "marks": [], + "text": "), a test should be written for each possible input. Additionally, different settings can change how a process is executed. 
For example, in the ", + "_key": "be7231b257a1" + }, + { + "marks": [ + "f0209f4f4a5b" + ], + "text": "bowtie2/align", + "_key": "212c74902511", + "_type": "span" + }, + { + "marks": [], + "text": " module, aside from input files, the ", + "_key": "b12ec65bbf06", + "_type": "span" + }, + { + "text": "save_unaligned", + "_key": "fc97720fa814", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_key": "dd7961a71879", + "_type": "span", + "marks": [], + "text": " and " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "sort_bam", + "_key": "5de54b50cd12" + }, + { + "_type": "span", + "marks": [], + "text": " parameters can alter how this module functions and the outputs it generates. Thus, tests should be written for each possible scenario. When writing tests, aim to consider as many variations as possible. If some are missed, don't worry! Additional tests can be added later. Discovering these different use cases and how to address/test them is part of the development process.", + "_key": "212ceb1406f9" + } + ], + "_type": "block", + "style": "normal", + "_key": "c8994834851d", + "markDefs": [ + { + "_key": "a0534635bae5", + "_type": "link", + "href": "https://nf-co.re/modules/fastqc" + }, + { + "_key": "3c4057a25f4a", + "_type": "link", + "href": "https://www.geeksforgeeks.org/test-design-coverage-in-software-testing/" + }, + { + "_type": "link", + "href": "https://nf-co.re/modules/bowtie2_align", + "_key": "f0209f4f4a5b" + } + ] + }, + { + "_key": "bacac6bb3c60", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "8a580e492e3b" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Consideration 2: Testing outputs", + "_key": "3c702cc05af7", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "2b8b54e164e7" + }, + { + "style": "normal", + "_key": "e60a951811d0", + "markDefs": [], + "children": [ + { + "_key": "d8bf4f58b93d", + "_type": "span", + "marks": [], + "text": "Once test cases are established, the next step is determining what specifically should be evaluated in each test. Generally, these evaluations are referred to as assertions. Assertions can range from verifying whether a job has been completed successfully to comparing the output channel/file contents between runs. Ideally, tests should incorporate all outputs, although there are scenarios where this is not feasible (for example, outputs containing timestamps or paths). In such cases, it's often best to include at least a portion of the contents from the problematic file or, at the minimum, the name of the file to ensure that it is consistently produced." + } + ], + "_type": "block" + }, + { + "_key": "14103c699745", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "8771e780fa93" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "37768657518a", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Testing in nf-core", + "_key": "4afd6843bf3c" + } + ], + "_type": "block", + "style": "h2" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "nf-core is a community-driven initiative that aims to provide high-quality, Nextflow-based bioinformatics pipelines. The community's emphasis on reproducibility makes testing an essential aspect of the nf-core ecosystem. Until recently, tests were implemented using pytest for modules/subworkflows and test profiles for pipelines. 
These tests ensured that nf-core components could run successfully following updates. However, at the pipeline level, they did not check file contents to evaluate output consistency. Additionally, using two different testing approaches lacked the standardization nf-core strives for. An ideal test framework would integrate tests at all Nextflow development levels (functions, modules, subworkflows, and pipelines) and comprehensively test outputs.", + "_key": "b3cbf377efe7", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "307231c984e3" + }, + { + "children": [ + { + "text": "", + "_key": "5abfa10ccb3e", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "bc361a1713a3", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "New and Improved Nextflow Testing with nf-test", + "_key": "2192751d77ec" + } + ], + "_type": "block", + "style": "h2", + "_key": "bc5a624d3dcd", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Created by ", + "_key": "8091b5798ed0" + }, + { + "_key": "ad25834d621e", + "_type": "span", + "marks": [ + "ea58be3576b6" + ], + "text": "Lukas Forer" + }, + { + "_key": "c2fe24f9c682", + "_type": "span", + "marks": [], + "text": " and " + }, + { + "marks": [ + "8c5a2d63478e" + ], + "text": "Sebastian Schönherr", + "_key": "58b81f8f4fc9", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ", nf-test has emerged as the leading solution for testing Nextflow pipelines. Their goal was to enhance the evaluation of reproducibility in complex Nextflow pipelines. To this end, they have implemented several notable features, creating a robust testing platform:", + "_key": "50bf9532434d" + } + ], + "_type": "block", + "style": "normal", + "_key": "303f74c980d9", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/lukfor", + "_key": "ea58be3576b6" + }, + { + "_type": "link", + "href": "https://github.com/seppinho", + "_key": "8c5a2d63478e" + } + ] + }, + { + "children": [ + { + "_key": "ff83c2672764", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "76b89eba4f1e", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "dfbd6e0d8974", + "listItem": "number", + "markDefs": [ + { + "_type": "link", + "href": "https://www.nf-test.com/docs/assertions/snapshots/", + "_key": "24bb096f5c77" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Comprehensive Output Testing", + "_key": "079bcd68a42f" + }, + { + "_type": "span", + "marks": [], + "text": ": nf-test employs ", + "_key": "903ea45cd46d" + }, + { + "marks": [ + "24bb096f5c77" + ], + "text": "snapshots ", + "_key": "e7e46c57fa06", + "_type": "span" + }, + { + "text": "for handling complex data structures. 
This feature evaluates the contents of any specified output channel/file, enabling comprehensive and reliable tests that ensure data integrity following changes.\n\n", + "_key": "f10ce47db27f", + "_type": "span", + "marks": [] + } + ], + "level": 1 + }, + { + "_type": "block", + "style": "normal", + "_key": "06572a42d087", + "listItem": "number", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "A Consistent Testing Framework for All Nextflow Components", + "_key": "fac58e12e02d" + }, + { + "marks": [], + "text": ": nf-test provides a unified framework for testing everything from individual functions to entire pipelines, ensuring consistency across all components.\n\n", + "_key": "39f59e990118", + "_type": "span" + } + ], + "level": 1 + }, + { + "children": [ + { + "text": "A DSL for Tests", + "_key": "a3fa9cf8d9ab", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "text": ": Designed in the likeness of Nextflow, nf-test's intuitive domain-specific language (DSL) uses 'when' and 'then' blocks to describe expected behaviors in pipelines, facilitating easier test script writing.\n\n", + "_key": "aa4ff7357593", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "228e3822886f", + "listItem": "number", + "markDefs": [] + }, + { + "markDefs": [ + { + "href": "https://www.nf-test.com/docs/assertions/assertions/", + "_key": "b3da144ff1fe", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Readable Assertions:", + "_key": "4a9374065224" + }, + { + "_type": "span", + "marks": [], + "text": " nf-test offers a wide range of functions for writing clear and understandable ", + "_key": "0a2843073702" + }, + { + "marks": [ + "b3da144ff1fe" + ], + "text": "assertions", + "_key": "7268e74d745e", + "_type": "span" + }, + { + "marks": [], + "text": ", improving the clarity and maintainability of tests.\n\n", + "_key": "3fe849657117", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "3add4ef68d08", + "listItem": "number" + }, + { + "listItem": "number", + "markDefs": [], + "children": [ + { + "text": "Boilerplate Code Generation", + "_key": "f19688a483f8", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_key": "a5b610b560fa", + "_type": "span", + "marks": [], + "text": ": To accelerate the testing process, nf-test and nf-core tools feature commands that generate boilerplate code, streamlining the development of new tests." + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "c809fa3e1bca" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "06810366b339", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "4a26455d8bab" + }, + { + "_type": "block", + "style": "h2", + "_key": "876496675fc6", + "markDefs": [], + "children": [ + { + "text": "But wait… there's more!", + "_key": "cf76043522dc", + "_type": "span", + "marks": [] + } + ] + }, + { + "children": [ + { + "text": "The merits of having a consistent and comprehensive testing platform are significantly amplified with nf-test's integration into nf-core. This integration provides an abundance of resources for incorporating nf-test into your Nextflow development. Thanks to this collaboration, you can utilize common nf-test commands via nf-core tools and easily install nf-core modules/subworkflows that already have nf-test implemented. 
Moreover, an ", + "_key": "14a82e18cb55", + "_type": "span", + "marks": [] + }, + { + "text": "expanding collection of examples", + "_key": "9540de112e8e", + "_type": "span", + "marks": [ + "97c2d0c992bc" + ] + }, + { + "marks": [], + "text": " is available to guide you through adopting nf-test for your projects.", + "_key": "dec1bf20f8ee", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "a8e934b8ac79", + "markDefs": [ + { + "_type": "link", + "href": "https://nf-co.re/docs/contributing/tutorials/nf-test_assertions", + "_key": "97c2d0c992bc" + } + ] + }, + { + "_key": "e74b89ea6a7f", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "a8a1702d13f8", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "h2", + "_key": "4753e2f6d96c", + "markDefs": [], + "children": [ + { + "_key": "c53de44fa7a3", + "_type": "span", + "marks": [], + "text": "Adding nf-test to pipelines" + } + ] + }, + { + "children": [ + { + "_key": "dae2a6460660", + "_type": "span", + "marks": [], + "text": "Several nf-core pipelines have begun to adopt nf-test as their testing framework. Among these, " + }, + { + "text": "nf-core/methylseq", + "_key": "58b808e431e5", + "_type": "span", + "marks": [ + "2bc70c7e1e4a" + ] + }, + { + "_type": "span", + "marks": [], + "text": " was the first to implement pipeline-level nf-tests as a proof-of-concept. However, since this initial implementation, nf-core maintainers have identified that the existing nf-core pipeline template needs modifications to better support nf-test. These adjustments aim to enhance compatibility with nf-test across components (modules, subworkflows, workflows) and ensure that tests are included and shipped with each component. A more detailed blog post about these changes will be published in the future. Following these insights, ", + "_key": "b169e0266c05" + }, + { + "_key": "9491d804515e", + "_type": "span", + "marks": [ + "25b095497343" + ], + "text": "nf-core/fetchngs" + }, + { + "marks": [], + "text": " has been at the forefront of incorporating nf-test for testing modules, subworkflows, and at the pipeline level. Currently, fetchngs serves as the best-practice example for nf-test implementation within the nf-core community. 
Other nf-core pipelines actively integrating nf-test include ", + "_key": "5b97775f9d58", + "_type": "span" + }, + { + "marks": [ + "48976d9a8e15" + ], + "text": "mag", + "_key": "a2a3a9f7d95e", + "_type": "span" + }, + { + "text": ", ", + "_key": "85dbda17aa5b", + "_type": "span", + "marks": [] + }, + { + "_key": "5f4a059f43a0", + "_type": "span", + "marks": [ + "1d0f2c26d973" + ], + "text": "sarek" + }, + { + "_type": "span", + "marks": [], + "text": ", ", + "_key": "02f7c17112f3" + }, + { + "marks": [ + "ea366d3939f7" + ], + "text": "readsimulator", + "_key": "848074bd3030", + "_type": "span" + }, + { + "_key": "6ddfae3991d4", + "_type": "span", + "marks": [], + "text": ", and " + }, + { + "_type": "span", + "marks": [ + "cd398e3a9de8" + ], + "text": "rnaseq", + "_key": "f76585352138" + }, + { + "marks": [], + "text": ".", + "_key": "6fd0667e457a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "72046114f094", + "markDefs": [ + { + "href": "https://nf-co.re/methylseq/", + "_key": "2bc70c7e1e4a", + "_type": "link" + }, + { + "_type": "link", + "href": "https://nf-co.re/fetchngs", + "_key": "25b095497343" + }, + { + "_key": "48976d9a8e15", + "_type": "link", + "href": "https://nf-co.re/mag" + }, + { + "_type": "link", + "href": "https://nf-co.re/sarek", + "_key": "1d0f2c26d973" + }, + { + "_type": "link", + "href": "https://nf-co.re/readsimulator", + "_key": "ea366d3939f7" + }, + { + "_key": "cd398e3a9de8", + "_type": "link", + "href": "https://nf-co.re/rnaseq" + } + ] + }, + { + "children": [ + { + "_key": "19ec73d4cd35", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "3f7c48b90963", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Pipeline development with nf-test", + "_key": "2cff63f2bde3" + } + ], + "_type": "block", + "style": "h2", + "_key": "ebf55d38bd1b" + }, + { + "children": [ + { + "text": "For newer nf-core pipelines, integrating nf-test as early as possible in the development process is highly recommended", + "_key": "f4ddf2b9eb21", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_key": "ca5d8030582c", + "_type": "span", + "marks": [], + "text": ". An example of a pipeline that has benefitted from the incorporation of nf-tests throughout its development is " + }, + { + "_key": "56c93a20e628", + "_type": "span", + "marks": [ + "ee9a093ad86d" + ], + "text": "phageannotator" + }, + { + "_key": "b8e2adc05dee", + "_type": "span", + "marks": [], + "text": ". Although integrating nf-test during pipeline development has presented challenges, it has offered a unique opportunity to evaluate different testing methodologies and has been instrumental in identifying numerous development errors that might have been overlooked using the previous test profiles approach. Additionally, investing time early on has significantly simplified modifying different aspects of the pipeline, ensuring that functionality and output remain unaffected. 
For those embarking on creating new Nextflow pipelines, here are a few key takeaways from our experience:" + } + ], + "_type": "block", + "style": "normal", + "_key": "c08bc039b1c1", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nf-core/phageannotator", + "_key": "ee9a093ad86d" + } + ] + }, + { + "style": "normal", + "_key": "deef7bf9f138", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "97e2307b7d93", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "69c39eb6d0a1", + "listItem": "number", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Leverage nf-core modules/subworkflows extensively.", + "_key": "b073ac7697ce" + }, + { + "text": " Devoting time early to contribute modules/subworkflows to nf-core not only streamlines future development for you and your PR reviewers but also simplifies maintaining, linting, and updating pipeline components through nf-core tools. Furthermore, these modules will likely benefit others in the community with similar research interests.\n\n", + "_key": "d3f2ab684ee3", + "_type": "span", + "marks": [] + } + ], + "level": 1 + }, + { + "markDefs": [], + "children": [ + { + "_key": "e98311fc6bfe", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Prioritize incremental changes over large overhauls" + }, + { + "_key": "f0b07225e822", + "_type": "span", + "marks": [], + "text": ". Incremental changes are almost always preferable to large, unwieldy modifications. This approach is particularly beneficial when monitoring and updating nf-tests at the module, subworkflow, and pipeline levels. Introducing too many changes simultaneously can overwhelm both developers and reviewers, making it challenging to track what has been modified and what requires testing. Aim to keep changes straightforward and manageable.\n\n" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "b0b6d2ece75f", + "listItem": "number" + }, + { + "_type": "block", + "style": "normal", + "_key": "1019cbf01e85", + "listItem": "number", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Facilitate parallel execution of nf-test to generate and test snapshots", + "_key": "510a595f7bd2" + }, + { + "_key": "440720ade48a", + "_type": "span", + "marks": [], + "text": ". By default, nf-test runs each test sequentially, which can make the process of running multiple tests to generate or updating snapshots time-consuming. Implementing scripts that allow tests to run in parallel—whether via a workload manager or in the cloud—can significantly save time and simplify the process of monitoring tests for pass or fail outcomes." + } + ], + "level": 1 + }, + { + "_key": "e048140c3bf6", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "763aef5e2d70", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "Community and contribution", + "_key": "587bc5c8a72c", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2", + "_key": "54ce450dcb3d", + "markDefs": [] + }, + { + "style": "normal", + "_key": "c3699a07845f", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "nf-core is a community that relies on consistent contributions, evaluation, and feedback from its members to improve and stay up-to-date. This holds true as we transition to a new testing framework as well. 
Currently, there are two primary ways that people have been contributing in this transition:", + "_key": "85a59be968d2", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "c3a333241466", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "65f5164823c3" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "b0b2ec3b4840", + "listItem": "number", + "markDefs": [ + { + "_type": "link", + "href": "https://nf-co.re/docs/contributing/modules#migrating-from-pytest-to-nf-test", + "_key": "38773175d700" + } + ], + "children": [ + { + "_key": "ad21fd896097", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Adding nf-tests to new and existing nf-core modules/subworkflows" + }, + { + "_type": "span", + "marks": [], + "text": ". There has been a recent emphasis on migrating modules/subworkflows from pytest to nf-test because of the advantages mentioned previously. Fortunately, the nf-core team has added very helpful ", + "_key": "7d5da716a140" + }, + { + "text": "instructions ", + "_key": "c464994975f6", + "_type": "span", + "marks": [ + "38773175d700" + ] + }, + { + "_type": "span", + "marks": [], + "text": " to the website, which has made this process much more streamlined.\n\n", + "_key": "66341ec5ac40" + } + ], + "level": 1, + "_type": "block" + }, + { + "style": "normal", + "_key": "feaa89745d8d", + "listItem": "number", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nf-core/fetchngs/tree/master", + "_key": "7e39c68b626c" + }, + { + "href": "https://github.com/nf-core/sarek/tree/master", + "_key": "191255c16d8e", + "_type": "link" + }, + { + "_type": "link", + "href": "https://github.com/nf-core/rnaseq/tree/master", + "_key": "d8ce9ab578f9" + }, + { + "_key": "f0edcef01696", + "_type": "link", + "href": "https://github.com/nf-core/readsimulator/tree/master" + }, + { + "_type": "link", + "href": "https://github.com/nf-core/phageannotator", + "_key": "b8631a555d2a" + } + ], + "children": [ + { + "text": "Adding nf-tests to nf-core pipelines.", + "_key": "dfa6eefc9950", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "text": " Another area of focus is the addition of nf-tests to nf-core pipelines. 
This process can be quite difficult for large, complex pipelines, but there are now several examples of pipelines with nf-tests that can be used as a blueprint for getting started (",
        "_key": "dad9ad27bfef",
        "_type": "span",
        "marks": []
      },
      {
        "marks": [
          "7e39c68b626c"
        ],
        "text": "fetchngs",
        "_key": "604fc21a72f1",
        "_type": "span"
      },
      {
        "_type": "span",
        "marks": [],
        "text": ", ",
        "_key": "1bc44446469a"
      },
      {
        "_type": "span",
        "marks": [
          "191255c16d8e"
        ],
        "text": "sarek",
        "_key": "98ceb627e6f0"
      },
      {
        "text": ", ",
        "_key": "9cdc39f9dee9",
        "_type": "span",
        "marks": []
      },
      {
        "_type": "span",
        "marks": [
          "d8ce9ab578f9"
        ],
        "text": "rnaseq",
        "_key": "694ee75bce11"
      },
      {
        "marks": [],
        "text": ", ",
        "_key": "ad6f6f1d24e6",
        "_type": "span"
      },
      {
        "marks": [
          "f0edcef01696"
        ],
        "text": "readsimulator",
        "_key": "2688d0012056",
        "_type": "span"
      },
      {
        "_key": "7186d2ec794f",
        "_type": "span",
        "marks": [],
        "text": ", "
      },
      {
        "_type": "span",
        "marks": [
          "b8631a555d2a"
        ],
        "text": "phageannotator",
        "_key": "402a899f8a4a"
      },
      {
        "_type": "span",
        "marks": [],
        "text": ").",
        "_key": "6ce3218f741b"
      }
    ],
    "level": 1,
    "_type": "block"
  },
  {
    "children": [
      {
        "text": "These are great areas to work on & contribute in nf-core hackathons",
        "_key": "c215e4be18ec",
        "_type": "span",
        "marks": []
      }
    ],
    "_type": "block",
    "style": "blockquote",
    "_key": "d466715b8643",
    "markDefs": []
  },
  {
    "children": [
      {
        "_type": "span",
        "marks": [],
        "text": "",
        "_key": "eac11bf38236"
      }
    ],
    "_type": "block",
    "style": "normal",
    "_key": "6522171924d9",
    "markDefs": []
  },
  {
    "markDefs": [
      {
        "_type": "link",
        "href": "https://nf-co.re/events/2024/hackathon-march-2024",
        "_key": "03e0d409c794"
      }
    ],
    "children": [
      {
        "_type": "span",
        "marks": [],
        "text": "The nf-core community added a significant number of nf-tests during the recent ",
        "_key": "0a96540be983"
      },
      {
        "_key": "ef1f53c9d9c3",
        "_type": "span",
        "marks": [
          "03e0d409c794"
        ],
        "text": "hackathon in March 2024"
      },
      {
        "_type": "span",
        "marks": [],
        "text": ". Yet the role of the community is not limited to adding test code. A robust testing infrastructure requires nf-core users to identify testing errors, additional test cases, and provide feedback so that the system can continually be improved. Each of us brings a different perspective, and the development-feedback loop that results from collaboration brings about a much more effective, transparent, and inclusive system than if we worked in isolation.",
        "_key": "940548547241"
      }
    ],
    "_type": "block",
    "style": "normal",
    "_key": "7d5cc9279a82"
  },
  {
    "_type": "block",
    "style": "normal",
    "_key": "c465e55981f4",
    "markDefs": [],
    "children": [
      {
        "marks": [],
        "text": "",
        "_key": "7ae0a16a710e",
        "_type": "span"
      }
    ]
  },
  {
    "style": "h2",
    "_key": "9ac796b501ea",
    "markDefs": [],
    "children": [
      {
        "_type": "span",
        "marks": [],
        "text": "Future directions",
        "_key": "240475bef07f"
      }
    ],
    "_type": "block"
  },
  {
    "_key": "01906c795cdb",
    "markDefs": [],
    "children": [
      {
        "_type": "span",
        "marks": [],
        "text": "Looking ahead, nf-core and nf-test are poised for tighter integration and significant advancements. 
Anticipated developments include enhanced testing capabilities, more intuitive interfaces for writing and managing tests, and deeper integration with cloud-based resources. These improvements will further solidify the position of nf-core and nf-test at the forefront of bioinformatics workflow management.", + "_key": "e93f3483694e" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "317325ae8be1", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "7e000d3777f7" + } + ] + }, + { + "_type": "block", + "style": "h2", + "_key": "48c17981c567", + "markDefs": [], + "children": [ + { + "text": "Conclusion", + "_key": "39c5d50b6759", + "_type": "span", + "marks": [] + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "The integration of nf-test within the nf-core ecosystem marks a significant leap forward in ensuring the reproducibility and reliability of bioinformatics pipelines. By adopting nf-test, developers and researchers alike can contribute to a culture of excellence and collaboration, driving forward the quality and accuracy of bioinformatics research.", + "_key": "86bf517ba8af", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "cee9c104b1b5" + }, + { + "style": "normal", + "_key": "95ad7f7bb3b8", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "7ec5f81ab34f" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "0474e6707fcd", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Special thanks to everyone in the #nf-test channel in the nf-core slack workspace for their invaluable contributions, feedback, and support throughout this adoption. We are immensely grateful for your commitment and look forward to continuing our productive collaboration.", + "_key": "c5069006f601" + } + ] + }, + { + "style": "normal", + "_key": "df44a270bd7b", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "f0d145957fbf", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "blockquote", + "_key": "a7c0295748d8", + "markDefs": [ + { + "_key": "3a93111f409f", + "_type": "link", + "href": "https://www.nextflow.io/ambassadors.html" + } + ], + "children": [ + { + "_key": "e2793efe91f40", + "_type": "span", + "marks": [], + "text": "This post was contributed by a Nextflow Ambassador. Ambassadors are passionate individuals who support the Nextflow community. Interested in becoming an ambassador? 
Read more about it " + }, + { + "_key": "e2793efe91f41", + "_type": "span", + "marks": [ + "3a93111f409f" + ], + "text": "here" + }, + { + "marks": [], + "text": ".", + "_key": "e2793efe91f42", + "_type": "span" + } + ], + "_type": "block" + } + ], + "author": { + "_ref": "OWAhkDWC92JN5AHHJ7pVfj", + "_type": "reference" + }, + "title": "Leveraging nf-test for enhanced quality control in nf-core", + "publishedAt": "2024-04-03T06:00:00.000Z", + "_rev": "Ot9x7kyGeH5005E3MJ9LAK", + "tags": [ + { + "_type": "reference", + "_key": "818543be9096", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493" + } + ], + "_type": "blogPost" + }, + { + "author": { + "_ref": "L90MLvtZSPRQtUzPRoOtHk", + "_type": "reference" + }, + "body": [ + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "After diving into the Nextflow community, I've seen how it benefits bioinformatics in places like South Africa, Brazil, and France. I'm confident it can do the same for Türkiye by fostering collaboration and speeding up research. Since I became a Nextflow Ambassador, I am happy and excited because I can contribute to this development! Even though our first attempt to organize an introductory Nextflow workshop was online, it was a fruitful collaboration with RSG-Türkiye that initiated our effort to promote more Nextflow in Türkiye. We are happy to announce that we will organize a hands-on workshop soon.", + "_key": "4c1e3a223c28" + } + ], + "_type": "block", + "style": "normal", + "_key": "728aadc7df47" + }, + { + "_key": "0947163b20c5", + "children": [ + { + "_key": "96ec68814f5d", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "_key": "c1a91f132252" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "I am ", + "_key": "544e70c5b2bf" + }, + { + "_key": "458d51a67d5e", + "_type": "span", + "marks": [ + "b41f17361216" + ], + "text": "Kübra Narcı" + }, + { + "text": ", currently employed as a bioinformatician within the ", + "_key": "5c5400d98cf5", + "_type": "span", + "marks": [] + }, + { + "text": "German Human Genome Phenome Archive (GHGA) Workflows workstream", + "_key": "e6f9c346c988", + "_type": "span", + "marks": [ + "1b327c1797d0" + ] + }, + { + "_type": "span", + "marks": [], + "text": ". Upon commencing this position nearly two years ago, I was introduced to Nextflow due to the necessity of transporting certain variant calling workflows here, and given my prior experience with other workflow managers, I was well-suited for the task. Though the initial two months were marked by challenges and moments of frustration, my strong perseverance ultimately led to the successful development of my first pipeline.", + "_key": "bf74d36af211" + } + ], + "_type": "block", + "style": "normal", + "_key": "746b53fcfd5b", + "markDefs": [ + { + "_key": "b41f17361216", + "_type": "link", + "href": "https://www.ghga.de/about-us/team-members/narci-kuebra" + }, + { + "_key": "1b327c1797d0", + "_type": "link", + "href": "https://www.ghga.de/about-us/how-we-work/workstreams" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "014e9ec2d9a9", + "children": [ + { + "_type": "span", + "text": "", + "_key": "3442f255a610" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "Subsequently, owing much to the supportive Nextflow community, my interest, as well as my proficiency in the platform, steadily grew, culminating in my acceptance to the role of Nextflow Ambassador for the past six months. 
I jumped into the role since it was a great opportunity for GHGA and Nextflow to be connected even more.", + "_key": "76c859d4aad4", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "488db9c1ff14", + "markDefs": [] + }, + { + "_key": "0bb2d975455d", + "children": [ + { + "text": "", + "_key": "f8e74f191372", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "image", + "alt": "meme on bright landscape", + "_key": "85eae924205e", + "asset": { + "_ref": "image-805013bdaf2f1d396596eeb5484335d5bc8f4e10-1998x1114-png", + "_type": "reference" + } + }, + { + "children": [ + { + "marks": [], + "text": "Transitioning into this ambassadorial role prompted a solid realization: the absence of a dedicated Nextflow community in Türkiye. This revelation was a shock, particularly given my academic background in bioinformatics there, where the community’s live engagement in workflow development is undeniable. Witnessing Turkish contributors within Nextflow and nf-core Slack workspaces further underscored this sentiment. It became evident that what was lacking was a spark for organizing events to ignite the Turkish community, a task I gladly undertook.", + "_key": "e0cae04b7c15", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "daa476edbff4", + "markDefs": [] + }, + { + "children": [ + { + "_key": "1c169d51a7c2", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "1890aebcb24c" + }, + { + "style": "normal", + "_key": "e1545316fc7b", + "markDefs": [ + { + "href": "https://www.twitter.com/mribeirodantas", + "_key": "0d9f21c1a065", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "While I possessed foresight regarding the establishment of a Nextflow community, I initially faced uncertainty regarding the appropriate course of action. To address this, I sought counsel from ", + "_key": "d92384fb925f", + "_type": "span" + }, + { + "marks": [ + "0d9f21c1a065" + ], + "text": "Marcel", + "_key": "47f023f5fd19", + "_type": "span" + }, + { + "marks": [], + "text": ", given his pivotal role in the initiation of the Nextflow community in Brazil. Following our discussion and receipt of valuable insights, it became evident that establishing connections with the appropriate community from my base in Germany was a necessity.", + "_key": "12d7f0de46f8", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "204a8601a84f", + "children": [ + { + "_key": "ab3812501569", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "99a38a3c2aee", + "markDefs": [ + { + "_type": "link", + "href": "https://rsgturkey.com", + "_key": "fa6637f14b88" + } + ], + "children": [ + { + "_key": "f1d5583a6c44", + "_type": "span", + "marks": [], + "text": "This attempt led me to meet with " + }, + { + "text": "RSG-Türkiye", + "_key": "bfdf2f0d07cf", + "_type": "span", + "marks": [ + "fa6637f14b88" + ] + }, + { + "marks": [], + "text": ". RSG-Türkiye aims to create a platform for students and post-docs in computational biology and bioinformatics in Türkiye. It aims to share knowledge and experience, promote collaboration, and expand training opportunities. The organization also collaborates with universities and the Bioinformatics Council, a recently established national organization as the Turkish counterpart of the ISCB (International Society for Computational Biology) to introduce industrial and academic research. 
To popularize the field, they have offline and online talk series in university student centers to promote computational biology and bioinformatics.", + "_key": "9573ac04552a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "6dc197a3bb2c", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "1990b33f56f1" + }, + { + "style": "normal", + "_key": "58cabc9c49b9", + "markDefs": [ + { + "_type": "link", + "href": "https://www.youtube.com/watch?v=AqNmIkoQrNo&ab_channel=RSG-Turkey", + "_key": "611275fadb5f" + } + ], + "children": [ + { + "text": "Following our introduction, RSG-Türkiye and I hosted a workshop focusing on workflow reproducibility, Nextflow, and nf-core. We chose Turkish as the language to make it more accessible for participants who are not fluent in English. The online session lasted a bit more than an hour and attracted nearly 50 attendees, mostly university students but also individuals from the research and industry sectors. The strong student turnout was especially gratifying as it aligned with my goal of building a vibrant Nextflow community in Türkiye. I took the opportunity to discuss Nextflow’s ambassadorship and mentorship programs, which can greatly benefit students, given Türkiye’s growing interest in bioinformatics. The whole workshop was recorded and can be viewed on ", + "_key": "381ce5a3e2bc", + "_type": "span", + "marks": [] + }, + { + "text": "YouTube", + "_key": "34f332614a75", + "_type": "span", + "marks": [ + "611275fadb5f" + ] + }, + { + "marks": [], + "text": ".", + "_key": "4dc7a5e18632", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "b362c7e14620" + } + ], + "_type": "block", + "style": "normal", + "_key": "ccf3920707a9" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "I am delighted to report that the workshop was a success. It was not only attracting considerable interest but also marked the commencement of a promising journey. Our collaboration with RSG-Türkiye persists, with plans underway for a more comprehensive on-site training session in Türkiye scheduled for later this year. I look forward to more engagement from Turkish participants as we work together to strengthen our community. Hopefully, this effort will lead to more Turkish-language content, new mentor relations from the core Nextflow team, and the emergence of a local Nextflow ambassador.", + "_key": "7be04a1ef8fe" + } + ], + "_type": "block", + "style": "normal", + "_key": "68326f84e35e" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "2643942ce84c" + } + ], + "_type": "block", + "style": "normal", + "_key": "739060eaacb6" + }, + { + "_type": "image", + "alt": "meme on bright landscape", + "_key": "468edcc8e518", + "asset": { + "_type": "reference", + "_ref": "image-855769e39cae10399f05dc42268e931a1e49fab0-1990x1112-png" + } + }, + { + "style": "h2", + "_key": "647fdf809466", + "children": [ + { + "_key": "b12b7f94a80c", + "_type": "span", + "text": "How can I contact the Nextflow Türkiye community?" 
+ } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "a159a77bc700", + "markDefs": [], + "children": [ + { + "_key": "2e03c7c2ec13", + "_type": "span", + "marks": [], + "text": "If you want to help grow the Nextflow community in Türkiye, join the Nextflow and nf-core Slack workspaces and connect with Turkish contributors in the #region-turkiye channel. Don't be shy—say hello, and let's build up the community together! Feel free to contact me if you're interested in helping organize local hands-on Nextflow workshops. We welcome both advanced users and beginners. By participating, you'll contribute to the growth of bioinformatics in Türkiye, collaborate with peers, and access resources to advance your research and career." + } + ] + } + ], + "_type": "blogPost", + "_id": "64e877d8c56e", + "publishedAt": "2024-06-12T06:00:00.000Z", + "_createdAt": "2024-09-25T14:17:53Z", + "tags": [ + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "7400aeee0c47" + } + ], + "meta": { + "slug": { + "current": "bioinformatics-growth-in-turkiye" + } + }, + "_updatedAt": "2024-09-26T09:04:23Z", + "_rev": "hf9hwMPb7ybAE3bqEU5ikE", + "title": "Fostering Bioinformatics Growth in Türkiye" + }, + { + "_id": "66f7b16b128a", + "_createdAt": "2024-09-25T14:14:50Z", + "body": [ + { + "_key": "0cd3e947bf8a", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The scientific world nowadays operates on the basis of published articles. These are used to report novel discoveries to the rest of the scientific community.", + "_key": "59b35c33b672", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_key": "e7d0dc517871", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "6e68445ed27b" + }, + { + "style": "normal", + "_key": "1213fc789f24", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "But have you ever wondered what a scientific article is? 
It is a:", + "_key": "46b1cdc124eb", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "text": "defeasible argument for claims, supported by", + "_key": "1cd5fd8c19500", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "d0afc4278f46", + "listItem": "number" + }, + { + "style": "normal", + "_key": "bb2c7783e27a", + "listItem": "number", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "exhibited, reproducible data and methods, and", + "_key": "a8cb4d67b20b0" + } + ], + "level": 1, + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "explicit references to other work in that domain;", + "_key": "84dafb34b02b0" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "711842bd0fb2", + "listItem": "number", + "markDefs": [] + }, + { + "listItem": "number", + "markDefs": [], + "children": [ + { + "_key": "6bf9fa01139d0", + "_type": "span", + "marks": [], + "text": "described using domain-agreed technical terminology," + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "5843c477145c" + }, + { + "style": "normal", + "_key": "a00b1b760a41", + "listItem": "number", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "which exists within a complex ecosystem of technologies, people and activities.", + "_key": "5f30cefdf1580" + } + ], + "level": 1, + "_type": "block" + }, + { + "style": "normal", + "_key": "d5ade5b0849e", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "5ebbadf6c6c3", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "91a69d2fbcda", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Hence the very essence of Science relies on the ability of scientists to reproduce and build upon each other’s published results.", + "_key": "2d3e042adb1c" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "cc0ddddc5d07", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "5c0178061126", + "markDefs": [] + }, + { + "style": "normal", + "_key": "7fdd218e7bc2", + "markDefs": [ + { + "_key": "923c83780e83", + "_type": "link", + "href": "http://www.nature.com/nature/journal/v483/n7391/full/483531a.html" + } + ], + "children": [ + { + "text": "So how much can we rely on published data? In a recent report in Nature, researchers at the Amgen corporation found that only 11% of the academic research in the literature was reproducible by their groups [", + "_key": "195742b919df", + "_type": "span", + "marks": [] + }, + { + "text": "1", + "_key": "649e32779802", + "_type": "span", + "marks": [ + "923c83780e83" + ] + }, + { + "_key": "e4931c5e781f", + "_type": "span", + "marks": [], + "text": "]." + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "0d5ee61a36ad", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "5d01dfdcb150" + } + ] + }, + { + "_key": "42a6b962fe47", + "markDefs": [], + "children": [ + { + "text": "While many factors are likely at play here, perhaps the most basic requirement for reproducibility holds that the materials reported in a study can be uniquely identified and obtained, such that experiments can be reproduced as faithfully as possible. 
This information is meant to be documented in the \"materials and methods\"; of journal articles, but as many can attest, the information provided there is often not adequate for this task.", + "_key": "c964b5781329", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "999bf374b806", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "0688d1c8b0bd" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "text": "Promoting Computational Research Reproducibility", + "_key": "38ae292b3527", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2", + "_key": "5b8bb9a1bc83" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Encouragingly scientific reproducibility has been at the forefront of many news stories and there exist numerous initiatives to help address this problem. Particularly, when it comes to producing reproducible computational analyses, some publications are starting to publish the code and data used for analysing and generating figures.", + "_key": "fa897da7d059" + } + ], + "_type": "block", + "style": "normal", + "_key": "4488f3038e9f" + }, + { + "_type": "block", + "style": "normal", + "_key": "68d409c65f72", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "46f455012afa", + "_type": "span", + "marks": [] + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "For example, many articles in Nature and in the new Elife journal (and others) provide a \"source data\" download link next to figures. Sometimes Elife might even have an option to download the source code for figures.", + "_key": "bac5245ef7b8" + } + ], + "_type": "block", + "style": "normal", + "_key": "f87b23bd7ef3" + }, + { + "_type": "block", + "style": "normal", + "_key": "3af90a7e1a8b", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "fc0ec4326de2" + } + ] + }, + { + "style": "normal", + "_key": "f0fc53a7dc08", + "markDefs": [ + { + "href": "http://melissagymrek.com/science/2014/08/29/docker-reproducible-research.html", + "_key": "64ead0bb6a12", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "As pointed out by Melissa Gymrek ", + "_key": "8b1fb3008476", + "_type": "span" + }, + { + "marks": [ + "64ead0bb6a12" + ], + "text": "in a recent post", + "_key": "cfb3aed207bc", + "_type": "span" + }, + { + "text": " this is a great start, but there are still lots of problems. 
She wrote that, for example, if one wants to re-execute a data analyses from these papers, he/she will have to download the scripts and the data, to only realize that he/she has not all the required libraries, or that it only runs on, for example, an Ubuntu version he/she doesn't have, or some paths are hard-coded to match the authors' machine.", + "_key": "9fe2744de1be", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "3b755927ebc8", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "920a61a3920a" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "c77f1d8de0c2", + "_type": "span", + "marks": [], + "text": "If it's not easy to run and doesn't run out of the box the chances that a researcher will actually ever run most of these scripts is close to zero, especially if they lack the time or expertise to manage the required installation of third-party libraries, tools or implement from scratch state-of-the-art data processing algorithms." + } + ], + "_type": "block", + "style": "normal", + "_key": "b7cdacb20a82" + }, + { + "_type": "block", + "style": "normal", + "_key": "38feec499605", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "e56664ca20ad" + } + ] + }, + { + "_type": "block", + "style": "h2", + "_key": "0fe5db0e880a", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Here comes Docker", + "_key": "1599518c0f64", + "_type": "span" + } + ] + }, + { + "_key": "5f8bb85bd22b", + "markDefs": [ + { + "href": "http://www.docker.com", + "_key": "a57b5569587e", + "_type": "link" + } + ], + "children": [ + { + "marks": [ + "a57b5569587e" + ], + "text": "Docker", + "_key": "888c00fd2ed0", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " containers technology is a solution to many of the computational research reproducibility problems. Basically, it is a kind of a lightweight virtual machine where you can set up a computing environment including all the libraries, code and data that you need, within a single ", + "_key": "f4e9403190f6" + }, + { + "text": "image", + "_key": "032d85fe5462", + "_type": "span", + "marks": [ + "em" + ] + }, + { + "text": ".", + "_key": "5fe926122ea3", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "be7a77a7ecb0" + } + ], + "_type": "block", + "style": "normal", + "_key": "887df9fbc509", + "markDefs": [] + }, + { + "_key": "68b46bcd163b", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "This image can be distributed publicly and can seamlessly run on any major Linux operating system. 
No need for the user to mess with installation, paths, etc.", + "_key": "4672a9e481e8" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "3bbf9a929d0a", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "c6e16e1a26d8", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "_key": "6a1af6c49833", + "_type": "link", + "href": "http://www.bioinformaticszen.com/post/reproducible-assembler-benchmarks/" + }, + { + "_key": "0aa5d689e2c8", + "_type": "link", + "href": "https://bcbio.wordpress.com/2014/03/06/improving-reproducibility-and-installation-of-genomic-analysis-pipelines-with-docker/" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "They just run the Docker image you provided, and everything is set up to work out of the box. Researchers have already started discussing this (e.g. ", + "_key": "98cfa2355473" + }, + { + "text": "here", + "_key": "ee62266358db", + "_type": "span", + "marks": [ + "6a1af6c49833" + ] + }, + { + "_key": "611ac5421a4d", + "_type": "span", + "marks": [], + "text": ", and " + }, + { + "_type": "span", + "marks": [ + "0aa5d689e2c8" + ], + "text": "here", + "_key": "b7b9fdc71e8f" + }, + { + "text": ").", + "_key": "5ce952e7bd6e", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "7503c4787585" + }, + { + "style": "normal", + "_key": "08c0889fc07f", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "0b88ce227c33", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Docker and Nextflow: a perfect match", + "_key": "7129ed72df1b" + } + ], + "_type": "block", + "style": "h2", + "_key": "8b9ad08221da" + }, + { + "children": [ + { + "text": "One big advantage Docker has compared to ", + "_key": "078f1ae7ede8", + "_type": "span", + "marks": [] + }, + { + "text": "traditional", + "_key": "b42ddbdaef86", + "_type": "span", + "marks": [ + "em" + ] + }, + { + "_key": "9d9ffb18cff9", + "_type": "span", + "marks": [], + "text": " machine virtualization technology is that it doesn't need a complete copy of the operating system, thus it has a minimal startup time. This makes it possible to virtualize single applications or launch the execution of multiple containers, that can run in parallel, in order to speedup a large computation." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "9af1ad3e6983", + "markDefs": [] + }, + { + "_key": "f34da90d14bb", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "ea49d39f17b4", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow is a data-driven toolkit for computational pipelines, which aims to simplify the deployment of distributed and highly parallelised pipelines for scientific applications.", + "_key": "701775fcf0a4" + } + ], + "_type": "block", + "style": "normal", + "_key": "8314f9492b20", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "210cf738282e", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "716701240f07" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "25ac907efe1c", + "markDefs": [], + "children": [ + { + "text": "The latest version integrates the support for Docker containers that enables the deployment of self-contained and truly reproducible pipelines.", + "_key": "377dda485511", + "_type": "span", + "marks": [] + } + ] + }, + { + "_key": "10f90ba6e489", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "f7fc6dc5aede" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "h2", + "_key": "58e1c5cf843e", + "markDefs": [], + "children": [ + { + "_key": "666a0d7a5b41", + "_type": "span", + "marks": [], + "text": "How they work together" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "23c47b66d85c", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "A Nextflow pipeline is made up by putting together several processes. Each process can be written in any scripting language that can be executed by the Linux platform (BASH, Perl, Ruby, Python, etc). Parallelisation is automatically managed by the framework and it is implicitly defined by the processes input and output declarations.", + "_key": "23ec1438f980" + } + ] + }, + { + "_key": "42201d1a0044", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "933c374df2dc", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "a138e2ccbde5", + "markDefs": [], + "children": [ + { + "_key": "223c5dc9d1bd", + "_type": "span", + "marks": [], + "text": "By integrating Docker with Nextflow, every pipeline process can be executed independently in its own container, this guarantees that each of them run in a predictable manner without worrying about the configuration of the target execution platform. 
Moreover the minimal overhead added by Docker allows us to spawn multiple container executions in a parallel manner with a negligible performance loss when compared to a platform " + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "native", + "_key": "e42cd8aab7ab" + }, + { + "text": " execution.", + "_key": "b5f079b31201", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "3a9f5b7a1b82" + } + ], + "_type": "block", + "style": "normal", + "_key": "f7179fd76e64", + "markDefs": [] + }, + { + "_key": "1d6d86892b60", + "markDefs": [], + "children": [ + { + "text": "An example", + "_key": "2fbf3e23d741", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2" + }, + { + "children": [ + { + "_key": "9b2064b60f14", + "_type": "span", + "marks": [], + "text": "As a proof of concept of the Docker integration with Nextflow you can try out the pipeline example at this " + }, + { + "marks": [ + "95f19cf8a7b5" + ], + "text": "link", + "_key": "d3c587ca2706", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "bc1b2c3d296e" + } + ], + "_type": "block", + "style": "normal", + "_key": "d3ede21d981e", + "markDefs": [ + { + "href": "https://github.com/nextflow-io/examples/blob/master/blast-parallel.nf", + "_key": "95f19cf8a7b5", + "_type": "link" + } + ] + }, + { + "_key": "7b379b25fc7b", + "markDefs": [], + "children": [ + { + "_key": "2ff104eefa0a", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "It splits a protein sequences multi FASTA file into chunks of ", + "_key": "995dc3ef59c4" + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "n", + "_key": "3ff5e20615a0" + }, + { + "_type": "span", + "marks": [], + "text": " entries, executes a BLAST query for each of them, then extracts the top 10 matching sequences and finally aligns the results with the T-Coffee multiple sequence aligner.", + "_key": "0ab8d7b6b968" + } + ], + "_type": "block", + "style": "normal", + "_key": "693e9833dfa8" + }, + { + "_key": "505523f39862", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "a59ef33a4b11" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "86c6ece9f1af", + "markDefs": [], + "children": [ + { + "_key": "510cd0c4b455", + "_type": "span", + "marks": [], + "text": "In a common scenario you generally need to install and configure the tools required by this script: BLAST and T-Coffee. Moreover you should provide a formatted protein database in order to execute the BLAST search." + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "471e9c00da48", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "cd3ffa966e79", + "markDefs": [] + }, + { + "children": [ + { + "text": "By using Docker with Nextflow you only need to have the Docker engine installed in your computer and a Java VM. 
In order to try this example out, follow these steps:", + "_key": "edea1d1d76b7", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "47a60fd8d676", + "markDefs": [] + }, + { + "_key": "28b0121fbecb", + "markDefs": [], + "children": [ + { + "_key": "7e6b569fdfd8", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "451ff9229b62", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Install the latest version of Nextflow by entering the following command in your shell terminal:", + "_key": "11f987e78567" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "1226322fda36", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "09a3e0a66330", + "_type": "span" + } + ] + }, + { + "_key": "41933fe37d1d", + "code": " curl -fsSL get.nextflow.io | bash", + "_type": "code" + }, + { + "_key": "f07384de1256", + "markDefs": [], + "children": [ + { + "text": "Then download the required Docker image with this command:", + "_key": "850cd0f072cf", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "d15a6515f59e", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "2b15270a6664" + } + ] + }, + { + "_key": "a0fd796c7f18", + "code": " docker pull nextflow/examples", + "_type": "code" + }, + { + "style": "normal", + "_key": "c08729034043", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nextflow-io/examples/blob/master/Dockerfile", + "_key": "db182b7798b3" + } + ], + "children": [ + { + "marks": [], + "text": "You can check the content of the image looking at the ", + "_key": "eeaa0b6f70aa", + "_type": "span" + }, + { + "text": "Dockerfile", + "_key": "867810f7eaf7", + "_type": "span", + "marks": [ + "db182b7798b3" + ] + }, + { + "_key": "140457af601a", + "_type": "span", + "marks": [], + "text": " used to create it." + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "200806a4a22f" + } + ], + "_type": "block", + "style": "normal", + "_key": "c9b645e8abc6", + "markDefs": [] + }, + { + "style": "normal", + "_key": "db5f057ba47f", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Now you are ready to run the demo by launching the pipeline execution as shown below:", + "_key": "9bc8fda6eb37" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "94e89a281fe8" + } + ], + "_type": "block", + "style": "normal", + "_key": "a75943c7b299" + }, + { + "code": "nextflow run examples/blast-parallel.nf -with-docker", + "_type": "code", + "_key": "f12092554082" + }, + { + "_type": "block", + "style": "normal", + "_key": "96b3c7fbb2bf", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "This will run the pipeline printing the final alignment out on the terminal screen. 
You can also provide your own protein sequences multi FASTA file by adding, in the above command line, the option ", + "_key": "e11f8954e4d0", + "_type": "span" + }, + { + "_key": "5799a212d7cd", + "_type": "span", + "marks": [ + "code" + ], + "text": "--query <file>" + }, + { + "_key": "ca22a142e5f6", + "_type": "span", + "marks": [], + "text": " and change the splitting chunk size with " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "--chunk n", + "_key": "68cedbfda986" + }, + { + "_key": "fb3207c6ea3b", + "_type": "span", + "marks": [], + "text": " option." + } + ] + }, + { + "_key": "3379bbe73c3b", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "ef4b3e44fdb6" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "e2ae936230cd", + "markDefs": [], + "children": [ + { + "text": "Note: the result doesn't have a real biological meaning since it uses a very small protein database.", + "_key": "e6934599e94d", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "0107bb7f6bf1", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "a9a7d555a8d9", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Conclusion", + "_key": "315b06147edd" + } + ], + "_type": "block", + "style": "h2", + "_key": "63909b8e2449" + }, + { + "_type": "block", + "style": "normal", + "_key": "5f7c0b08fbed", + "markDefs": [], + "children": [ + { + "_key": "ddd5aea27036", + "_type": "span", + "marks": [], + "text": "The mix of Docker, GitHub and Nextflow technologies make it possible to deploy self-contained and truly replicable pipelines. It requires zero configuration and enables the reproducibility of data analysis pipelines in any system in which a Java VM and the Docker engine are available." + } + ] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "8e17dc417921", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "bf283b2abe07" + }, + { + "markDefs": [], + "children": [ + { + "text": "Learn how to do it!", + "_key": "9bb018a91da3", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2", + "_key": "f3ab3956c37f" + }, + { + "children": [ + { + "_key": "b92eeb971011", + "_type": "span", + "marks": [], + "text": "Follow our documentation for a quick start using Docker with Nextflow at the following link " + }, + { + "text": "https://www.nextflow.io/docs/latest/docker.html", + "_key": "ff0e07795e14", + "_type": "span", + "marks": [ + "9cc965622b48" + ] + } + ], + "_type": "block", + "style": "normal", + "_key": "130158487105", + "markDefs": [ + { + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/docker.html", + "_key": "9cc965622b48" + } + ] + } + ], + "publishedAt": "2014-09-09T06:00:00.000Z", + "_type": "blogPost", + "author": { + "_ref": "7d389002-0fae-4149-98d4-22623b6afbed", + "_type": "reference" + }, + "_rev": "hf9hwMPb7ybAE3bqEU5jP0", + "title": "Reproducibility in Science - Nextflow meets Docker", + "_updatedAt": "2024-10-02T13:36:52Z", + "meta": { + "description": "The scientific world nowadays operates on the basis of published articles. 
These are used to report novel discoveries to the rest of the scientific community.", + "slug": { + "current": "nextflow-meets-docker" + } + } + }, + { + "_updatedAt": "2024-09-27T10:07:48Z", + "tags": [ + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "e0215b5bcb38" + } + ], + "_type": "blogPost", + "meta": { + "slug": { + "current": "nextflow-24.04-highlights" + }, + "description": "We release an “edge” version of Nextflow every month and a “stable” version every six months. The stable releases are recommended for production usage and represent a significant milestone. The release changelogs contain a lot of detail, so we thought we’d highlight some of the goodies that have just been released in Nextflow 24.04 stable. Let’s get into it!" + }, + "_rev": "mvya9zzDXWakVjnX4hhWC2", + "_id": "68b0d1a77a92", + "title": "Nextflow 24.04 - Release highlights", + "publishedAt": "2024-05-27T06:00:00.000Z", + "author": { + "_type": "reference", + "_ref": "paolo-di-tommaso" + }, + "body": [ + { + "_type": "block", + "style": "normal", + "_key": "ae8c611fc8ba", + "markDefs": [ + { + "_key": "ea179dd46607", + "_type": "link", + "href": "https://github.com/nextflow-io/nextflow/releases" + } + ], + "children": [ + { + "_key": "3e521ac4ecff", + "_type": "span", + "marks": [], + "text": "We release an “edge” version of Nextflow every month and a “stable” version every six months. The stable releases are recommended for production usage and represent a significant milestone. The " + }, + { + "_type": "span", + "marks": [ + "ea179dd46607" + ], + "text": "release changelogs", + "_key": "fab18365df0e" + }, + { + "_type": "span", + "marks": [], + "text": " contain a lot of detail, so we thought we’d highlight some of the goodies that have just been released in Nextflow 24.04 stable. Let’s get into it!", + "_key": "f322940823a0" + } + ] + }, + { + "style": "normal", + "_key": "971f43d4c490", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "57d97069477b", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "children": [ + { + "marks": [ + "strong" + ], + "text": "Tip: ", + "_key": "ee75968de7de", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": "We also did a podcast episode about some of these changes! 
Check it out here: ", + "_key": "ffc555b63293" + }, + { + "_key": "0aaab535973d", + "_type": "span", + "marks": [ + "6ebeddd8c555" + ], + "text": "Channels Episode 41" + }, + { + "marks": [], + "text": ".", + "_key": "eb5138047221", + "_type": "span" + } + ], + "_type": "block", + "style": "blockquote", + "_key": "9d00078d3cf8", + "markDefs": [ + { + "href": "/podcast/2024/ep41_nextflow_2404.html", + "_key": "6ebeddd8c555", + "_type": "link" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "d861e94639a4", + "markDefs": [], + "children": [ + { + "_key": "e920439bcf03", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "_key": "522408976f0f", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "New features\n\n", + "_key": "b60180b7be8b", + "_type": "span" + } + ], + "_type": "block", + "style": "h2" + }, + { + "children": [ + { + "_key": "3372eed2ef26", + "_type": "span", + "marks": [], + "text": "Seqera Containers" + } + ], + "_type": "block", + "style": "h3", + "_key": "ec80dbc37861", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "A new flagship community offering was revealed at the Nextflow Summit 2024 Boston - ", + "_key": "1f840c0d2f2d" + }, + { + "_key": "35d2aa5fed90", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Seqera Containers" + }, + { + "_key": "c3ee2e25a45c", + "_type": "span", + "marks": [], + "text": ". This is a free-to-use container cache powered by " + }, + { + "_type": "span", + "marks": [ + "ef148eed18f6" + ], + "text": "Wave", + "_key": "35c6f76b65c4" + }, + { + "_type": "span", + "marks": [], + "text": ", allowing anyone to request an image with a combination of packages from Conda and PyPI. The image will be built on demand and cached (for at least 5 years after creation). 
There is a ", + "_key": "853037045a02" + }, + { + "text": "dedicated blog post", + "_key": "5ed3a9167370", + "_type": "span", + "marks": [ + "72f2197b9867" + ] + }, + { + "_key": "fec72cdd5d96", + "_type": "span", + "marks": [], + "text": " about this, but it's worth noting that the service can be used directly from Nextflow and not only through " + }, + { + "_type": "span", + "marks": [ + "c23676992428" + ], + "text": "https://seqera.io/containers/", + "_key": "d49e0ec30dd8" + } + ], + "_type": "block", + "style": "normal", + "_key": "ca5a19a849db", + "markDefs": [ + { + "_type": "link", + "href": "https://seqera.io/wave/", + "_key": "ef148eed18f6" + }, + { + "_type": "link", + "href": "https://seqera.io/blog/introducing-seqera-pipelines-containers/", + "_key": "72f2197b9867" + }, + { + "_key": "c23676992428", + "_type": "link", + "href": "https://seqera.io/containers/" + } + ] + }, + { + "_key": "a0128f7eb3c0", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "788ebc17382a", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "28bf29210bc9", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "In order to use Seqera Containers in Nextflow, simply set ", + "_key": "c4ee7b96b966", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "wave.freeze", + "_key": "66cacb763596" + }, + { + "text": " ", + "_key": "7497a10b8214", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "without", + "_key": "dbcbd2f471e6" + }, + { + "marks": [], + "text": " setting ", + "_key": "3e1fd8ebf38e", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "wave.build.repository", + "_key": "4ba9c42b9279" + }, + { + "text": " - for example, by using the following config for your pipeline:", + "_key": "9cb3b102cb3a", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "ad08a7ff3d2c", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "26fa9ef7dfe3", + "_type": "span" + } + ], + "_type": "block" + }, + { + "code": "wave.enabled = true\nwave.freeze = true\nwave.strategy = 'conda'", + "_type": "code", + "_key": "1ffd6f1c5595" + }, + { + "style": "normal", + "_key": "0008384c8664", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "4e16d1920646", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "a879cfd5d08d", + "markDefs": [], + "children": [ + { + "text": "Any processes in your pipeline specifying Conda packages will have Docker or Singularity images created on the fly (depending on whether ", + "_key": "406e1ffb6f9b", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "code" + ], + "text": "singularity.enabled", + "_key": "3b0cf3ba3f41", + "_type": "span" + }, + { + "marks": [], + "text": " is set or not) and cached for immediate access in subsequent runs. These images will be publicly available. You can view all container image names with the ", + "_key": "03b7399718a8", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "nextflow inspect", + "_key": "7ef740ed7ede" + }, + { + "_key": "67b827eba7fb", + "_type": "span", + "marks": [], + "text": " command." 
+ } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "4095144d5280", + "markDefs": [], + "children": [ + { + "_key": "c32a62c04264", + "_type": "span", + "marks": [], + "text": "\n" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "1cf1c2b7c7ed", + "_type": "span", + "marks": [], + "text": "Workflow output definition" + } + ], + "_type": "block", + "style": "h3", + "_key": "8d36cc81ef56" + }, + { + "_type": "block", + "style": "normal", + "_key": "7c0209c77569", + "markDefs": [], + "children": [ + { + "_key": "063ec8a68344", + "_type": "span", + "marks": [], + "text": "The workflow output definition is a new syntax for defining workflow outputs:" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "77a8d365a4a2", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "6310a83b9768" + } + ] + }, + { + "_type": "code", + "_key": "e1bd2668ba0f", + "code": "nextflow.preview.output = true // [!code ++]\n\nworkflow {\n main:\n ch_foo = foo(data)\n bar(ch_foo)\n\n publish:\n ch_foo >> 'foo' // [!code ++]\n}\n\noutput { // [!code ++]\n directory 'results' // [!code ++]\n mode 'copy' // [!code ++]\n} // [!code ++]" + }, + { + "_key": "8660a73f0e86", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "4b9b5117825a" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "marks": [], + "text": "It essentially provides a DSL2-style approach for publishing, and will replace ", + "_key": "135b9f12fd70", + "_type": "span" + }, + { + "marks": [ + "code" + ], + "text": "publishDir", + "_key": "2a367a49dba9", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " once it is finalized. It also provides extra flexibility as it allows you to publish ", + "_key": "b8a0ffbda9fa" + }, + { + "_key": "a9523b760499", + "_type": "span", + "marks": [ + "em" + ], + "text": "any" + }, + { + "text": " channel, not just process outputs. See the ", + "_key": "b806dcea22c7", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "19facccbe9e1" + ], + "text": "Nextflow docs", + "_key": "69b4fc297514" + }, + { + "text": " for more information.", + "_key": "2e6dbc719351", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "4aee265c2462", + "markDefs": [ + { + "_type": "link", + "href": "https://nextflow.io/docs/latest/workflow.html#publishing-outputs", + "_key": "19facccbe9e1" + } + ] + }, + { + "style": "normal", + "_key": "01d28eef2756", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "c2c5cff274e6", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Info:", + "_key": "42b961434f6c" + }, + { + "_type": "span", + "marks": [], + "text": " This feature is still in preview and may change in a future release. We hope to finalize it in version 24.10, so don't hesitate to share any feedback with us! 
", + "_key": "4dfccd1d64b7" + } + ], + "_type": "block", + "style": "blockquote", + "_key": "39c23afa2dfc" + }, + { + "_key": "8ffdab93ced1", + "markDefs": [], + "children": [ + { + "_key": "ae6591dfb966", + "_type": "span", + "marks": [], + "text": "\n" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "h3", + "_key": "621b21fc327b", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Topic channels", + "_key": "9339d5e1ca0d", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "ee40cdfcd859", + "markDefs": [], + "children": [ + { + "text": "Topic channels are a new channel type introduced in 23.11.0-edge. A topic channel is essentially a queue channel that can receive values from multiple sources, using a matching name or “topic”:", + "_key": "e4fe44ef7c010", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "b31fd3ed5565", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "e53b7c6ed4a1" + } + ], + "_type": "block" + }, + { + "code": "process foo {\n output:\n val('foo'), topic: 'my-topic' // [!code ++]\n}\n\nprocess bar {\n output:\n val('bar'), topic: 'my-topic' // [!code ++]\n}\n\nworkflow {\n foo()\n bar()\n\n Channel.topic('my-topic').view() // [!code ++]\n}", + "_type": "code", + "_key": "f6dc814963d8" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "c994984a2837", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "05566be31953" + }, + { + "_key": "a285cf3b88d9", + "markDefs": [ + { + "_key": "d8644ea8c4bb", + "_type": "link", + "href": "https://nextflow.io/docs/latest/channel.html#topic" + } + ], + "children": [ + { + "_key": "112ec26989f8", + "_type": "span", + "marks": [], + "text": "Topic channels are particularly useful for collecting metadata from various places in the pipeline, without needing to write all of the channel logic that is normally required (e.g. using the " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "mix", + "_key": "0f351e4f849f" + }, + { + "_type": "span", + "marks": [], + "text": " operator). 
See the ", + "_key": "eb7fd399f9b4" + }, + { + "text": "Nextflow docs", + "_key": "d0c619a4a20c", + "_type": "span", + "marks": [ + "d8644ea8c4bb" + ] + }, + { + "_type": "span", + "marks": [], + "text": " for more information.", + "_key": "7921587dbe75" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "53faf869c1ea", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "\n", + "_key": "104065ec985e" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "3d52960ca449", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Process `eval` outputs", + "_key": "3e767101c687", + "_type": "span" + } + ], + "_type": "block", + "style": "h3" + }, + { + "_type": "block", + "style": "normal", + "_key": "daec0ce3c107", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Process ", + "_key": "edebff988a31", + "_type": "span" + }, + { + "marks": [ + "code" + ], + "text": "eval", + "_key": "eb34ce7f57df", + "_type": "span" + }, + { + "_key": "a6a0b6a14aa3", + "_type": "span", + "marks": [], + "text": " outputs are a new type of process output which allows you to capture the standard output of an arbitrary shell command:" + } + ] + }, + { + "style": "normal", + "_key": "012225305a60", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "c035ab69c4a8", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "code", + "_key": "ccd01ea94415", + "code": "process sayHello {\n output:\n eval('bash --version') // [!code ++]\n\n \"\"\"\n echo Hello world!\n \"\"\"\n}\n\nworkflow {\n sayHello | view\n}" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "8a12333a2850", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "d56f2727d629", + "markDefs": [] + }, + { + "children": [ + { + "_key": "dc95b3bc8e04", + "_type": "span", + "marks": [], + "text": "The shell command is executed alongside the task script. Until now, you would typically execute these supplementary commands in the main process script, save the output to a file or environment variable, and then capture it using a " + }, + { + "text": "path", + "_key": "9d2a6aff1809", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " or ", + "_key": "338655a33e5a" + }, + { + "marks": [ + "code" + ], + "text": "env", + "_key": "41d39b232b1e", + "_type": "span" + }, + { + "text": " output. The new ", + "_key": "66b4cdde231a", + "_type": "span", + "marks": [] + }, + { + "_key": "5f4fd1c9c395", + "_type": "span", + "marks": [ + "code" + ], + "text": "eval" + }, + { + "_type": "span", + "marks": [], + "text": " output is a much more convenient way to capture this kind of command output directly. 
See the ", + "_key": "fbc369e54f96" + }, + { + "text": "Nextflow docs", + "_key": "e282c93a4844", + "_type": "span", + "marks": [ + "164013340ab1" + ] + }, + { + "text": " for more information.", + "_key": "03dbbf257276", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "2eccf1cf4a28", + "markDefs": [ + { + "_type": "link", + "href": "https://nextflow.io/docs/latest/process.html#output-type-eval", + "_key": "164013340ab1" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "\n", + "_key": "0fc5686c0804", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "2bd8c1ccbaae", + "markDefs": [] + }, + { + "style": "h4", + "_key": "734b41a8d204", + "markDefs": [], + "children": [ + { + "_key": "dcd0c28602d6", + "_type": "span", + "marks": [], + "text": "Collecting software versions" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "edaa9475d49b", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Together, topic channels and eval outputs can be used to simplify the collection of software tool versions. For example, for FastQC:", + "_key": "41dfd92dcc7e", + "_type": "span" + } + ] + }, + { + "children": [ + { + "text": "", + "_key": "43823e387da4", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "abfaf563992e", + "markDefs": [] + }, + { + "_key": "28c5fdf7cb85", + "code": "process FASTQC {\n input:\n tuple val(meta), path(reads)\n\n output:\n tuple val(meta), path('*.html'), emit: html\n tuple val(\"${task.process}\"), val('fastqc'), eval('fastqc --version'), topic: versions // [!code ++]\n\n \"\"\"\n fastqc $reads\n \"\"\"\n}\n\nworkflow {\n Channel.topic('versions') // [!code ++]\n | unique()\n | map { process, name, version ->\n \"\"\"\\\n ${process.tokenize(':').last()}:\n ${name}: ${version}\n \"\"\".stripIndent()\n }\n | collectFile(name: 'collated_versions.yml')\n | CUSTOM_DUMPSOFTWAREVERSIONS\n}", + "_type": "code" + }, + { + "_key": "c8e96050b8e4", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "9e9c8ba77cc5" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "41d32e2c2916", + "markDefs": [ + { + "href": "https://github.com/nf-core/rnaseq/pull/1109", + "_key": "dbb015531547", + "_type": "link" + }, + { + "href": "https://github.com/nf-core/rnaseq/pull/1115", + "_key": "0f01f4e3f263", + "_type": "link" + } + ], + "children": [ + { + "text": "This approach will be implemented across all nf-core pipelines, and will cut down on a lot of boilerplate code. 
Check out the full prototypes for nf-core/rnaseq ", + "_key": "8714327ba476", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "dbb015531547" + ], + "text": "here", + "_key": "93f8caa31d0e", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "b33a11e19fda" + }, + { + "_type": "span", + "marks": [ + "0f01f4e3f263" + ], + "text": "here", + "_key": "8cedf43eeb1f" + }, + { + "marks": [], + "text": " to see them in action!", + "_key": "3cc6689db67b", + "_type": "span" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "\n", + "_key": "72cd5747369a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "9146b6d47cd7", + "markDefs": [] + }, + { + "style": "h3", + "_key": "6b1ef11fd3a0", + "markDefs": [], + "children": [ + { + "text": "Resource limits", + "_key": "f44fabf72540", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "The ", + "_key": "e5e2010cc945" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "resourceLimits", + "_key": "3a2551ec71d4" + }, + { + "marks": [], + "text": " directive is a new process directive which allows you to define global limits on the resources requested by individual tasks. For example, if you know that the largest node in your compute environment has 24 CPUs, 768 GB or memory, and a maximum walltime of 72 hours, you might specify the following:", + "_key": "65ddce241245", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "15076ef3b76e", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "5fcd12c1fa2b" + } + ], + "_type": "block", + "style": "normal", + "_key": "2ab9a610a82c" + }, + { + "code": "process.resourceLimits = [ cpus: 24, memory: 768.GB, time: 72.h ]", + "_type": "code", + "_key": "64ad47053799" + }, + { + "markDefs": [], + "children": [ + { + "_key": "baac49fc873e", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "9b54c9ae3902" + }, + { + "_type": "block", + "style": "normal", + "_key": "d3fa1813fceb", + "markDefs": [ + { + "_key": "401af70f9fde", + "_type": "link", + "href": "https://nextflow.io/docs/latest/process.html#dynamic-computing-resources" + }, + { + "href": "https://nextflow.io/docs/latest/process.html#resourcelimits", + "_key": "b7a1925ece6e", + "_type": "link" + } + ], + "children": [ + { + "_key": "2ce266e12f3a", + "_type": "span", + "marks": [], + "text": "If a task requests more than the specified limit (e.g. due to " + }, + { + "_type": "span", + "marks": [ + "401af70f9fde" + ], + "text": "retry with dynamic resources", + "_key": "20c20a706634" + }, + { + "marks": [], + "text": "), Nextflow will automatically reduce the task resources to satisfy the limit, whereas normally the task would be rejected by the scheduler or would simply wait in the queue forever! The nf-core community has maintained a custom workaround for this problem, the ", + "_key": "80ca2f00a9b8", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "check_max()", + "_key": "cdfd0b18dd66" + }, + { + "_type": "span", + "marks": [], + "text": " function, which can now be replaced with ", + "_key": "d3f1f9c1e222" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "resourceLimits", + "_key": "7c9a672623c5" + }, + { + "_type": "span", + "marks": [], + "text": ". 
See the ", + "_key": "1899583359ab" + }, + { + "marks": [ + "b7a1925ece6e" + ], + "text": "Nextflow docs", + "_key": "bd63d4401634", + "_type": "span" + }, + { + "text": " for more information.", + "_key": "954b15f9c395", + "_type": "span", + "marks": [] + } + ] + }, + { + "style": "normal", + "_key": "95d1ca831fb8", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "\n", + "_key": "b6290cff554b", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "h3", + "_key": "551589835037", + "markDefs": [], + "children": [ + { + "_key": "8152b5db88fe", + "_type": "span", + "marks": [], + "text": "Job arrays" + } + ] + }, + { + "children": [ + { + "_key": "6735621e3543", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Job arrays" + }, + { + "text": " are now supported in Nextflow using the ", + "_key": "0e64b777706f", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "code" + ], + "text": "array", + "_key": "999aabcfa026", + "_type": "span" + }, + { + "text": " directive. Most HPC schedulers, and even some cloud batch services including AWS Batch and Google Batch, support a "job array" which allows you to submit many independent jobs with a single job script. While the individual jobs are still executed separately as normal, submitting jobs as arrays where possible puts considerably less stress on the scheduler.", + "_key": "3ebf2e7cb85c", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "bdabcefeaa7d", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "1502e87fa3c6", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "b2589c62a953" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_key": "82f99ab75041", + "_type": "span", + "marks": [], + "text": "With Nextflow, using job arrays is a one-liner:" + } + ], + "_type": "block", + "style": "normal", + "_key": "e9ffaac0ec29" + }, + { + "children": [ + { + "text": "", + "_key": "cb904d24755b", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "6df64acb8b0b", + "markDefs": [] + }, + { + "_key": "fa7c0d0d8b2e", + "code": "process.array = 100", + "_type": "code" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "77a9b005acd6" + } + ], + "_type": "block", + "style": "normal", + "_key": "657c9f7e6253" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "You can also enable job arrays for individual processes like any other directive. 
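For example, a minimal sketch (assuming a hypothetical ALIGN_READS process) would be to set `array 50` directly in that process body, or to scope it in the configuration with `process { withName: ALIGN_READS { array = 50 } }`, leaving other processes to submit their tasks individually.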
See the ", + "_key": "47f50deb216b" + }, + { + "_key": "0ec0ab201634", + "_type": "span", + "marks": [ + "fe40820e1e62" + ], + "text": "Nextflow docs" + }, + { + "_type": "span", + "marks": [], + "text": " for more information.", + "_key": "04e0b3828edc" + } + ], + "_type": "block", + "style": "normal", + "_key": "33a9534a2675", + "markDefs": [ + { + "_type": "link", + "href": "https://nextflow.io/docs/latest/process.html#array", + "_key": "fe40820e1e62" + } + ] + }, + { + "_key": "4685c61b51cb", + "markDefs": [], + "children": [ + { + "_key": "d7265d1cf201", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_key": "645fcfe3c651", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Tip:" + }, + { + "_type": "span", + "marks": [], + "text": " On Google Batch, using job arrays also allows you to pack multiple tasks onto the same VM by using the ", + "_key": "3867bc032618" + }, + { + "text": "machineType", + "_key": "13e48c906599", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_key": "6d0022d2bc90", + "_type": "span", + "marks": [], + "text": " directive in conjunction with the " + }, + { + "marks": [ + "code" + ], + "text": "cpus", + "_key": "3a626f7bd57e", + "_type": "span" + }, + { + "marks": [], + "text": " and ", + "_key": "b390ba343f6a", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "memory", + "_key": "6f0100b86939" + }, + { + "_type": "span", + "marks": [], + "text": " directives.", + "_key": "11a8fdeda8b2" + } + ], + "_type": "block", + "style": "blockquote", + "_key": "95d29d016b38" + }, + { + "_type": "block", + "style": "normal", + "_key": "84c29fa9c40f", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "bbdcfb99647c" + } + ] + }, + { + "_key": "59326bb7d850", + "markDefs": [], + "children": [ + { + "_key": "4d1e89cde68a", + "_type": "span", + "marks": [], + "text": "Enhancements" + } + ], + "_type": "block", + "style": "h2" + }, + { + "_type": "block", + "style": "h3", + "_key": "4b706c1b484b", + "markDefs": [], + "children": [ + { + "_key": "84266e23add4", + "_type": "span", + "marks": [], + "text": "\nColored logs" + } + ] + }, + { + "_key": "e5380afd470a", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "97322ce741c1", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "_key": "dfa18d7b8f6d", + "_type": "link", + "href": "https://nextflow.io/blog/2024/nextflow-colored-logs.html" + } + ], + "children": [ + { + "marks": [ + "strong" + ], + "text": "Colored logs", + "_key": "b8237056c730", + "_type": "span" + }, + { + "_key": "e909f98b2645", + "_type": "span", + "marks": [], + "text": " have come to Nextflow! Specifically, the process log which is continuously printed to the terminal while the pipeline is running. Not only is it more colorful, but it also makes better use of the available space to show you what's most important. 
But we already wrote an entire " + }, + { + "_type": "span", + "marks": [ + "dfa18d7b8f6d" + ], + "text": "blog post", + "_key": "368bb9469d2c" + }, + { + "_type": "span", + "marks": [], + "text": " about it, so go check that out for more details!", + "_key": "71b6e615d0bc" + } + ], + "_type": "block", + "style": "normal", + "_key": "9d24f5992ace" + }, + { + "_key": "4a46295e9e26", + "markDefs": [], + "children": [ + { + "_key": "5277f8addee9", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "3cf067af661c", + "markDefs": [], + "children": [ + { + "_key": "9dff5454a640", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "_type": "image", + "alt": "New coloured output from Nextflow", + "_key": "bf914b019395", + "asset": { + "_ref": "image-aca8082c7fcd2be86b3cbd8d29611c81c8127620-2532x1577-png", + "_type": "reference" + } + }, + { + "_type": "block", + "style": "normal", + "_key": "0d64d965155c", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "94653e98ddde", + "_type": "span", + "marks": [] + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "74ea9eaa2319", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "e8b4f2b16f0e", + "markDefs": [] + }, + { + "style": "h3", + "_key": "aee22ad5b92b", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "AWS Fargate support", + "_key": "b31d1b72bdd4" + } + ], + "_type": "block" + }, + { + "_key": "6a33ada10117", + "markDefs": [ + { + "_key": "df9015e7e177", + "_type": "link", + "href": "https://nextflow.io/docs/latest/aws.html#aws-fargate" + } + ], + "children": [ + { + "_key": "ed4efebd7c56", + "_type": "span", + "marks": [], + "text": "Nextflow now supports " + }, + { + "_key": "34eaa40c12dd", + "_type": "span", + "marks": [ + "strong" + ], + "text": "AWS Fargate" + }, + { + "_type": "span", + "marks": [], + "text": " for AWS Batch jobs. See the ", + "_key": "0e5a86990bd8" + }, + { + "_type": "span", + "marks": [ + "df9015e7e177" + ], + "text": "Nextflow docs", + "_key": "67dc13efad7f" + }, + { + "_type": "span", + "marks": [], + "text": " for details.", + "_key": "ddbfda3f3e93" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "ee2c62be20e9", + "markDefs": [], + "children": [ + { + "text": "\n", + "_key": "b2e06783cfb8", + "_type": "span", + "marks": [] + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "OCI auto pull mode for Singularity and Apptainer", + "_key": "127d720a7b2d" + } + ], + "_type": "block", + "style": "h3", + "_key": "4ccfa0b15f77" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow now supports OCI auto pull mode both Singularity and Apptainer. Historically, Singularity could run a Docker container image converting to the Singularity image file format via the Singularity pull command and using the resulting image file in the exec command. 
This adds extra overhead to the head node running Nextflow for converting all container images to the Singularity format.", + "_key": "8cefb0272343" + } + ], + "_type": "block", + "style": "normal", + "_key": "0186cd80b694" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "63b121fda7a2", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "8f6bc8e8aeff" + }, + { + "children": [ + { + "text": "Now Nextflow allows specifying the option ", + "_key": "c7a4296e6c2a", + "_type": "span", + "marks": [] + }, + { + "text": "ociAutoPull", + "_key": "96c774749ab2", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_key": "d58d68d99236", + "_type": "span", + "marks": [], + "text": " both for Singularity and Apptainer. When enabling this setting Nextflow delegates the pull and conversion of the Docker image directly to the " + }, + { + "marks": [ + "code" + ], + "text": "exec", + "_key": "4d13570323f5", + "_type": "span" + }, + { + "marks": [], + "text": " command.", + "_key": "67ba92a1b76f", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "bfbe871517c4", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "bf12e3f90477", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "3b71e4f8b356" + }, + { + "code": "singularity.ociAutoPull = true", + "_type": "code", + "_key": "42d99ee2e155" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "221425f6ac8c" + } + ], + "_type": "block", + "style": "normal", + "_key": "40f2dd26bdfc" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "This results in the running of the pull and caching of the Singularity images to the compute jobs instead of the head job and removing the need to maintain a separate image files cache.", + "_key": "5c21500334e5" + } + ], + "_type": "block", + "style": "normal", + "_key": "05613f2e89ea", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "b84cdd204874" + } + ], + "_type": "block", + "style": "normal", + "_key": "46d64ab0ebd8", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "ffd93508a693", + "markDefs": [ + { + "_key": "6539c975522d", + "_type": "link", + "href": "https://nextflow.io/docs/latest/config.html#scope-singularity" + } + ], + "children": [ + { + "text": "See the ", + "_key": "964821bd744b", + "_type": "span", + "marks": [] + }, + { + "text": "Nextflow docs", + "_key": "d9410a6b7de2", + "_type": "span", + "marks": [ + "6539c975522d" + ] + }, + { + "text": " for more information.", + "_key": "b1f2b22badf0", + "_type": "span", + "marks": [] + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_key": "1dac8bf88bd9", + "_type": "span", + "marks": [], + "text": "\n" + } + ], + "_type": "block", + "style": "normal", + "_key": "f8ebe2f1dda8" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Support for GA4GH TES", + "_key": "718674fca84d" + } + ], + "_type": "block", + "style": "h3", + "_key": "12746235b124", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "5d70c11c99c5", + "markDefs": [ + { + "_key": "6d28ff2ff74f", + "_type": "link", + "href": "https://ga4gh.github.io/task-execution-schemas/docs/" + }, + { + "_type": "link", + "href": "https://www.ga4gh.org/", + "_key": "a160083885e8" + }, + { + "_type": "link", + "href": 
"https://github.com/ohsu-comp-bio/funnel", + "_key": "eecd77a2b7ac" + }, + { + "_key": "ebd6249e99b5", + "_type": "link", + "href": "https://github.com/microsoft/ga4gh-tes" + } + ], + "children": [ + { + "text": "The ", + "_key": "22c1f960cd92", + "_type": "span", + "marks": [] + }, + { + "_key": "e07d5392bd23", + "_type": "span", + "marks": [ + "6d28ff2ff74f" + ], + "text": "Task Execution Service (TES)" + }, + { + "marks": [], + "text": " is an API specification, developed by ", + "_key": "8bdad5a92db6", + "_type": "span" + }, + { + "text": "GA4GH", + "_key": "0bb9a2b680c1", + "_type": "span", + "marks": [ + "a160083885e8" + ] + }, + { + "_type": "span", + "marks": [], + "text": ", which attempts to provide a standard way for workflow managers like Nextflow to interface with execution backends. Two noteworthy TES implementations are ", + "_key": "37b6cd9e4c7b" + }, + { + "text": "Funnel", + "_key": "023b74efbd6b", + "_type": "span", + "marks": [ + "eecd77a2b7ac" + ] + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "24344b030149" + }, + { + "marks": [ + "ebd6249e99b5" + ], + "text": "TES Azure", + "_key": "a55645a98515", + "_type": "span" + }, + { + "_key": "81ca571996b9", + "_type": "span", + "marks": [], + "text": "." + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "dec181d1faa6", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "66a2becd7c5e", + "markDefs": [] + }, + { + "_key": "e2fd7cb5d97c", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow has long supported TES as an executor, but only in a limited sense, as TES did not support some important capabilities in Nextflow such as glob and directory outputs and the ", + "_key": "88b94210ef74" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "bin", + "_key": "2206cbf28ca2" + }, + { + "_key": "3aa8ef91e443", + "_type": "span", + "marks": [], + "text": " directory. However, with TES 1.1 and its adoption into Nextflow, these gaps have been closed. 
You can use the TES executor with the following configuration:" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "dcd607188b80", + "markDefs": [], + "children": [ + { + "_key": "8c7d1a1e091f", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "code": "plugins {\n id 'nf-ga4gh'\n}\n\nprocess.executor = 'tes'\ntes.endpoint = '...'", + "_type": "code", + "_key": "3967136bc682" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "dcfc1e1af973" + } + ], + "_type": "block", + "style": "normal", + "_key": "447067039818", + "markDefs": [] + }, + { + "style": "normal", + "_key": "bef1f47fc67b", + "markDefs": [ + { + "_type": "link", + "href": "https://nextflow.io/docs/latest/executor.html#ga4gh-tes", + "_key": "0f440b394c1b" + } + ], + "children": [ + { + "marks": [], + "text": "See the ", + "_key": "b8e67be34604", + "_type": "span" + }, + { + "_key": "6276972b35c4", + "_type": "span", + "marks": [ + "0f440b394c1b" + ], + "text": "Nextflow docs" + }, + { + "_key": "68a7b4e484e8", + "_type": "span", + "marks": [], + "text": " for more information.\n" + } + ], + "_type": "block" + }, + { + "style": "blockquote", + "_key": "b3f5b3848a0d", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Note:", + "_key": "d7f6e46d4ee0" + }, + { + "_key": "f4c55623b930", + "_type": "span", + "marks": [], + "text": " To better facilitate community contributions, the nf-ga4gh plugin will soon be moved from the Nextflow repository into its own repository, " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "nextflow-io/nf-ga4gh", + "_key": "6edc0cfe0a05" + }, + { + "_type": "span", + "marks": [], + "text": ". To ensure a smooth transition with your pipelines, make sure to explicitly include the plugin in your configuration as shown above.", + "_key": "cbbe1ba50e00" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "", + "_key": "221827e061cf", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "deaa6fef10f7", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "text": "Fusion", + "_key": "e138c2d8053b", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2", + "_key": "b8a43250eecb" + }, + { + "_type": "block", + "style": "normal", + "_key": "1ce4a38931e5", + "markDefs": [ + { + "_type": "link", + "href": "https://seqera.io/fusion/", + "_key": "bb03411aacf3" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "bb03411aacf3" + ], + "text": "Fusion", + "_key": "f729ba1c38e1" + }, + { + "_type": "span", + "marks": [], + "text": " is a distributed virtual file system for cloud-native data pipeline and optimized for Nextflow workloads. Nextflow 24.04 now works with a new release, Fusion 2.3. 
This brings a few notable quality-of-life improvements:", + "_key": "fde17e1bf817" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "ed96a9bad94a", + "markDefs": [], + "children": [ + { + "text": "\n", + "_key": "5deb6e0d44d8", + "_type": "span", + "marks": [] + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Enhanced Garbage Collection", + "_key": "9b8874103506" + } + ], + "_type": "block", + "style": "h3", + "_key": "cae24b525118" + }, + { + "style": "normal", + "_key": "bd5deb6b3df1", + "markDefs": [], + "children": [ + { + "_key": "58fa1d303887", + "_type": "span", + "marks": [], + "text": "Fusion 2.3 features an improved garbage collection system, enabling it to operate effectively with reduced scratch storage. This enhancement ensures that your pipelines run more efficiently, even with limited temporary storage." + } + ], + "_type": "block" + }, + { + "_key": "0e6ba71967dc", + "markDefs": [], + "children": [ + { + "_key": "26478170d5d8", + "_type": "span", + "marks": [], + "text": "\n" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "h3", + "_key": "d2eef148c093", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Increased File Handling Capacity", + "_key": "b4fcd67cef84" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "d141f0f6e179", + "markDefs": [], + "children": [ + { + "_key": "c7a0efb102db", + "_type": "span", + "marks": [], + "text": "Support for more concurrently open files is another significant improvement in Fusion 2.3. This means that larger directories, such as those used by Alphafold2, can now be utilized without issues, facilitating the handling of extensive datasets." + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "69ee157f75d6", + "markDefs": [], + "children": [ + { + "_key": "35505d8fa5b1", + "_type": "span", + "marks": [], + "text": "\n" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Correct Publishing of Symbolic Links", + "_key": "43775737120b" + } + ], + "_type": "block", + "style": "h3", + "_key": "54a28730d372" + }, + { + "style": "normal", + "_key": "c8e0916d8498", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In previous versions, output files that were symbolic links were not published correctly — instead of the actual file, a text file containing the file path was published. 
Fusion 2.3 addresses this issue, ensuring that symbolic links are published correctly.", + "_key": "b79e5d96c7fa" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "8daeb9cbdf54" + } + ], + "_type": "block", + "style": "normal", + "_key": "2220802c2577" + }, + { + "style": "normal", + "_key": "e1b46453dbeb", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "These enhancements in Fusion 2.3 contribute to a more robust and efficient filesystem for Nextflow users.", + "_key": "b93d2a601bbc" + } + ], + "_type": "block" + }, + { + "_key": "a0e88cb461b8", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "e8d94230d96a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "h2", + "_key": "50091cbe17e5", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Other notable changes", + "_key": "19fc0690bba8", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "e0786e58bf46", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nextflow-io/nextflow/commit/ea1c1b70da7a9b8c90de445b8aee1ee7a7148c9b", + "_key": "e4f07159e17e" + } + ], + "children": [ + { + "marks": [], + "text": "Add native retry on spot termination for Google Batch (", + "_key": "33d0bf86dd0e0", + "_type": "span" + }, + { + "marks": [ + "e4f07159e17e", + "code" + ], + "text": "ea1c1b", + "_key": "33d0bf86dd0e1", + "_type": "span" + }, + { + "marks": [], + "text": ")", + "_key": "33d0bf86dd0e2", + "_type": "span" + } + ], + "level": 1, + "_type": "block" + }, + { + "style": "normal", + "_key": "48ee25f73500", + "listItem": "bullet", + "markDefs": [ + { + "_key": "865d7b3d6b7c", + "_type": "link", + "href": "https://github.com/nextflow-io/nextflow/commit/df7ed294520ad2bfc9ad091114ae347c1e26ae96" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Add support for instance templates in Google Batch (", + "_key": "f8415a680af10" + }, + { + "_type": "span", + "marks": [ + "865d7b3d6b7c", + "code" + ], + "text": "df7ed2", + "_key": "f8415a680af11" + }, + { + "marks": [], + "text": ")", + "_key": "f8415a680af12", + "_type": "span" + } + ], + "level": 1, + "_type": "block" + }, + { + "listItem": "bullet", + "markDefs": [ + { + "_key": "b541e49f7aa7", + "_type": "link", + "href": "https://github.com/nextflow-io/nextflow/commit/00c9f226b201c964f67d520d0404342bc33cf61d" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Allow secrets to be used with ", + "_key": "e55a0696efe50" + }, + { + "text": "includeConfig", + "_key": "e55a0696efe51", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "text": " (", + "_key": "e55a0696efe52", + "_type": "span", + "marks": [] + }, + { + "text": "00c9f2", + "_key": "e55a0696efe53", + "_type": "span", + "marks": [ + "b541e49f7aa7", + "code" + ] + }, + { + "marks": [], + "text": ")", + "_key": "e55a0696efe54", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "c3b4d20ad430" + }, + { + "children": [ + { + "marks": [], + "text": "Allow secrets to be used in the pipeline script (", + "_key": "25a3ca17c91a0", + "_type": "span" + }, + { + "marks": [ + "43cccf607d50", + "code" + ], + "text": "df866a", + "_key": "25a3ca17c91a1", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ")", + "_key": "25a3ca17c91a2" + } + ], + "level": 1, + "_type": "block", + 
"style": "normal", + "_key": "0871742da57f", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nextflow-io/nextflow/commit/df866a243256d5018e23b6c3237fb06d1c5a4b27", + "_key": "43cccf607d50" + } + ] + }, + { + "style": "normal", + "_key": "83defc816873", + "listItem": "bullet", + "markDefs": [ + { + "_key": "7b7b108c5872", + "_type": "link", + "href": "https://github.com/nextflow-io/nextflow/commit/c9c7032c2e34132cf721ffabfea09d893adf3761" + } + ], + "children": [ + { + "marks": [], + "text": "Add retry strategy for publishing (", + "_key": "f2806d80e7420", + "_type": "span" + }, + { + "_key": "f2806d80e7421", + "_type": "span", + "marks": [ + "7b7b108c5872", + "code" + ], + "text": "c9c703" + }, + { + "_type": "span", + "marks": [], + "text": ")", + "_key": "f2806d80e7422" + } + ], + "level": 1, + "_type": "block" + }, + { + "_key": "a87f4dfb14e9", + "listItem": "bullet", + "markDefs": [ + { + "href": "https://github.com/nextflow-io/nextflow/commit/3c6e96d07c9a4fa947cf788a927699314d5e5ec7", + "_key": "d3006ca2727f", + "_type": "link" + } + ], + "children": [ + { + "text": "Add ", + "_key": "def310e0aadb0", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "code" + ], + "text": "k8s.cpuLimits", + "_key": "def310e0aadb1", + "_type": "span" + }, + { + "_key": "def310e0aadb2", + "_type": "span", + "marks": [], + "text": " config option (" + }, + { + "_type": "span", + "marks": [ + "d3006ca2727f", + "code" + ], + "text": "3c6e96", + "_key": "def310e0aadb3" + }, + { + "text": ")", + "_key": "def310e0aadb4", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "_key": "5d78a40c3ddc", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nextflow-io/nextflow/commit/ec5ebd0bc96e986415e7bac195928b90062ed062", + "_key": "1f9d90b7dbfc" + } + ], + "children": [ + { + "marks": [], + "text": "Removed ", + "_key": "7b1c49714cce0", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "seqera", + "_key": "7b1c49714cce1" + }, + { + "text": " and ", + "_key": "7b1c49714cce2", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "defaults", + "_key": "7b1c49714cce3" + }, + { + "_key": "7b1c49714cce4", + "_type": "span", + "marks": [], + "text": " from the standard channels used by the nf-wave plugin. 
(" + }, + { + "_type": "span", + "marks": [ + "1f9d90b7dbfc", + "code" + ], + "text": "ec5ebd", + "_key": "7b1c49714cce5" + }, + { + "_type": "span", + "marks": [], + "text": ")", + "_key": "7b1c49714cce6" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "_key": "7863080c74d6", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "dba5c01be978" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "356b6ff740e0", + "markDefs": [ + { + "href": "https://github.com/nextflow-io/nextflow/releases/tag/v24.04.0", + "_key": "9d3fb20c3490", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "You can view the full ", + "_key": "25a367774129" + }, + { + "_key": "16fce96ab399", + "_type": "span", + "marks": [ + "9d3fb20c3490" + ], + "text": "Nextflow release notes on GitHub" + }, + { + "marks": [], + "text": ".", + "_key": "ba376df78e17", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + } + ], + "_createdAt": "2024-09-25T14:18:03Z" + }, + { + "body": [ + { + "style": "h2", + "_key": "d3de926d7d4e", + "children": [ + { + "_type": "span", + "text": "Introduction", + "_key": "738e30f0d3fd" + } + ], + "_type": "block" + }, + { + "asset": { + "_ref": "image-a5412f2dc14c657261e561d975324028e70cb4ae-1024x1024-png", + "_type": "reference" + }, + "_type": "image", + "alt": "Word cloud", + "_key": "51df27a86835" + }, + { + "_type": "block", + "style": "normal", + "_key": "72e9df06430f", + "markDefs": [], + "children": [ + { + "text": "Word cloud of scientific interest keywords, averaged across all applications.", + "_key": "393ce0fb4007", + "_type": "span", + "marks": [ + "em" + ] + } + ] + }, + { + "style": "normal", + "_key": "118da7abe79c", + "children": [ + { + "text": "", + "_key": "1ac6421ab376", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "bd1046cd4085", + "_type": "block" + }, + { + "style": "normal", + "_key": "4527211bc460", + "markDefs": [ + { + "_key": "af5842058746", + "_type": "link", + "href": "https://seqera.io/blog/state-of-the-workflow-2022-results/" + } + ], + "children": [ + { + "marks": [], + "text": "Our recent ", + "_key": "194e0a2dea30", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "af5842058746" + ], + "text": "The State of the Workflow 2022: Community Survey Results", + "_key": "1846c969f96f" + }, + { + "marks": [], + "text": " showed that Nextflow and nf-core have a strong global community with a high level of engagement in several countries. As the community continues to grow, we aim to prioritize inclusivity for everyone through active outreach to groups with low representation.", + "_key": "8d6ca510af5b", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "afcded4123f7", + "children": [ + { + "_type": "span", + "text": "", + "_key": "0493af51e9f3" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "Thanks to funding from our Chan Zuckerberg Initiative Diversity and Inclusion grant we established an international Nextflow and nf-core mentoring program with the aim of empowering those from underrepresented groups. 
With the first round of the mentorship now complete, we look back at the success of the program so far.", + "_key": "591167eb5f93", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "98f04ee33da7", + "markDefs": [] + }, + { + "children": [ + { + "text": "", + "_key": "ef57b732a44e", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "16f858a219f9" + }, + { + "children": [ + { + "marks": [], + "text": "From almost 200 applications, five pairs of mentors and mentees were selected for the first round of the program. Over the following four months they met weekly to work on Nextflow based projects. We attempted to pair mentors and mentees based on their time zones and scientific interests. Project tasks were left up to the individuals and so tailored to the mentee's scientific interests and schedules.", + "_key": "60ff5a9c05fd", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "074b572cd129", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "4cfcd5ae0a91" + } + ], + "_type": "block", + "style": "normal", + "_key": "219432f6cada" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "People worked on things ranging from setting up Nextflow and nf-core on their institutional clusters to developing and implementing Nextflow and nf-core pipelines for next-generation sequencing data. Impressively, after starting the program knowing very little about Nextflow and nf-core, mentees finished the program being able to confidently develop and implement scalable and reproducible scientific workflows.", + "_key": "02df0b7d42b1" + } + ], + "_type": "block", + "style": "normal", + "_key": "55cc552c50d9" + }, + { + "_type": "block", + "style": "normal", + "_key": "3ac045899f3b", + "children": [ + { + "_type": "span", + "text": "", + "_key": "529136b7574e" + } + ] + }, + { + "_type": "image", + "alt": "Map of mentor / mentee pairs", + "_key": "d910397d21ed", + "asset": { + "_type": "reference", + "_ref": "image-31539d4ccaf43ac747479baa294127db8f940419-1833x867-png" + } + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "d712594299e8" + } + ], + "_type": "block", + "style": "normal", + "_key": "48ecc573fb19" + }, + { + "_key": "ee42852d0068", + "children": [ + { + "_type": "span", + "text": "Ndeye Marième Top (mentee) & John Juma (mentor)", + "_key": "249da459c935" + } + ], + "_type": "block", + "style": "h2" + }, + { + "style": "normal", + "_key": "f8ae55ba3dbe", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "For the mentorship, Marième wanted to set up Nextflow and nf-core on the servers at the Institut Pasteur de Dakar in Senegal and learn how to develop / contribute to a pipeline. 
Her mentor was John Juma, from the ILRI/SANBI in Kenya.", + "_key": "a755e3dc6d21", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "", + "_key": "a96500950495", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "90679b4f0e1e" + }, + { + "style": "normal", + "_key": "67d2039365d6", + "markDefs": [ + { + "_key": "c6e86c42169c", + "_type": "link", + "href": "https://nf-co.re/viralrecon" + }, + { + "_type": "link", + "href": "https://gisaid.org/", + "_key": "fa5fe8e2d66b" + }, + { + "_type": "link", + "href": "https://nf-co.re/mag", + "_key": "557ae690d572" + } + ], + "children": [ + { + "_key": "c7b802c84777", + "_type": "span", + "marks": [], + "text": "Together, Marème overcame issues with containers and server privileges and developed her local config, learning about how to troubleshoot and where to find help along the way. By the end of the mentorship she was able to set up the " + }, + { + "marks": [ + "c6e86c42169c" + ], + "text": "nf-core/viralrecon", + "_key": "d150e207b445", + "_type": "span" + }, + { + "text": " pipeline for the genomic surveillance analysis of SARS-Cov2 sequencing data from Senegal as well as 17 other countries in West Africa, ready for submission to ", + "_key": "64b6246d9e10", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "fa5fe8e2d66b" + ], + "text": "GISAID", + "_key": "66bc008dff22" + }, + { + "text": ". She also got up to speed with the ", + "_key": "6e1523bba58e", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "557ae690d572" + ], + "text": "nf-core/mag", + "_key": "4ebfddd89624", + "_type": "span" + }, + { + "text": " pipeline for metagenomic analysis.", + "_key": "269184163b9f", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "a312a980c5b2" + } + ], + "_type": "block", + "style": "normal", + "_key": "875f81900a0e" + }, + { + "markDefs": [], + "children": [ + { + "_key": "e0db6b94ac72", + "_type": "span", + "marks": [], + "text": "> " + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "\"Having someone experienced who can guide you in my learning process. My mentor really helped me understand and focus on the practical aspects since my main concern was having the pipelines correctly running in my institution.\"", + "_key": "2131a66e8beb" + }, + { + "_type": "span", + "marks": [], + "text": " - Marième Top (mentee)", + "_key": "51df78dd7f77" + } + ], + "_type": "block", + "style": "normal", + "_key": "7573542163c4" + }, + { + "style": "normal", + "_key": "2a3b2363e0ba", + "children": [ + { + "_type": "span", + "text": "", + "_key": "bc759ecb2258" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "5a9aed754e56", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "> ", + "_key": "e4ef106a7a8e", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "\"The program was awesome. I had a chance to impart nextflow principles to someone I have never met before. 
Fully virtual, the program instilled some sense of discipline in terms of setting and meeting objectives.\"", + "_key": "8a8f45e2df0a" + }, + { + "_key": "b75e6b2536ba", + "_type": "span", + "marks": [], + "text": " - John Juma (mentor)" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "0beb6b65ed62", + "children": [ + { + "_key": "9ac26d2425c9", + "_type": "span", + "text": "" + } + ] + }, + { + "_key": "d61766d73f88", + "children": [ + { + "_type": "span", + "text": "Philip Ashton (mentee) & Robert Petit (mentor)", + "_key": "61b0e31178db" + } + ], + "_type": "block", + "style": "h2" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://bactopia.github.io/", + "_key": "62aa54ec1a18" + } + ], + "children": [ + { + "marks": [], + "text": "Philip wanted to move up the Nextflow learning curve and set up nf-core workflows at Kamuzu University of Health Sciences in Malawi. His mentor was Robert Petit from the Wyoming Public Health Laboratory in the USA. Robert has developed the ", + "_key": "dfe3ff994f93", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "62aa54ec1a18" + ], + "text": "Bactopia", + "_key": "9b1c81a0eb37" + }, + { + "text": " pipeline for the analysis of bacterial pipeline and it was Philip’s aim to get this running for his group in Malawi.", + "_key": "5c66190409c9", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "e1b1a9aaec4f" + }, + { + "style": "normal", + "_key": "559b0139b9e2", + "children": [ + { + "_key": "8f91adccd3e1", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Robert helped Philip learn Nextflow, enabling him to independently deploy DSL2 pipelines and process genomes using Nextflow Tower. Philip is already using his new found skills to answer important public health questions in Malawi and is now passing his knowledge to other staff and students at his institute. Even though the mentorship program has finished, Philip and Rob will continue a collaboration and have plans to deploy pipelines that will benefit public health in the future.", + "_key": "745e843dc99f" + } + ], + "_type": "block", + "style": "normal", + "_key": "fb513bd10d33" + }, + { + "style": "normal", + "_key": "9b9d000b4a7d", + "children": [ + { + "_key": "51c2e6c630cd", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "> ", + "_key": "e290a8c2277c" + }, + { + "_key": "8df88cb8a907", + "_type": "span", + "marks": [ + "em" + ], + "text": "\"I tried to learn nextflow independently some time ago, but abandoned it for the more familiar snakemake. 
Thanks to Robert’s mentorship I’m now over the learning curve and able to deploy nf-core pipelines and use cloud resources more efficiently via Nextflow Tower\"" + }, + { + "_key": "b259ce528ab7", + "_type": "span", + "marks": [], + "text": " - Phil Ashton (mentee)" + } + ], + "_type": "block", + "style": "normal", + "_key": "cf3aac1e1593" + }, + { + "_key": "faf540bb721e", + "children": [ + { + "_key": "c9acfafe2874", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "3276db06cc48", + "markDefs": [], + "children": [ + { + "text": "> ", + "_key": "2483d84cea57", + "_type": "span", + "marks": [] + }, + { + "text": "\"I found being a mentor to be a rewarding experience and a great opportunity to introduce mentees into the Nextflow/nf-core community. Phil and I were able to accomplish a lot in the span of a few months, and now have many plans to collaborate in the future.\"", + "_key": "fe592fdac35f", + "_type": "span", + "marks": [ + "em" + ] + }, + { + "marks": [], + "text": " - Robert Petit (mentor)", + "_key": "42e5e52be9dd", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "299be282ae9a", + "children": [ + { + "_type": "span", + "text": "", + "_key": "3dcdb661801c" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "283e4de78711", + "children": [ + { + "_key": "1b73fb3f98d5", + "_type": "span", + "text": "Kalayanee Chairat (mentee) & Alison Meynert (mentor)" + } + ], + "_type": "block", + "style": "h2" + }, + { + "children": [ + { + "_key": "4fbc7f8abc75", + "_type": "span", + "marks": [], + "text": "Kalayanee’s goal for the mentorship program was to set up and run Nextflow and nf-core pipelines at the local infrastructure at the King Mongkut’s University of Technology Thonburi in Thailand. Kalayanee was mentored by Alison Meynert, from the University of Edinburgh in the United Kingdom." + } + ], + "_type": "block", + "style": "normal", + "_key": "4ad6496f2d49", + "markDefs": [] + }, + { + "style": "normal", + "_key": "0b4491576528", + "children": [ + { + "_type": "span", + "text": "", + "_key": "72bf1014087b" + } + ], + "_type": "block" + }, + { + "_key": "019db67b03ac", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nf-core/configs", + "_key": "693db29767c2" + }, + { + "_type": "link", + "href": "https://nf-co.re/sarek", + "_key": "24a8c82f740e" + }, + { + "_type": "link", + "href": "https://nf-co.re/rnaseq", + "_key": "0b0903b66397" + } + ], + "children": [ + { + "_key": "b273b101c58e", + "_type": "span", + "marks": [], + "text": "Working with Alison, Kalayanee learned about Nextflow and nf-core and the requirements for working with Slurm and Singularity. Together, they created a configuration profile that Kalayanee and others at her institute can use - they have plans to submit this to " + }, + { + "_key": "085a972023a8", + "_type": "span", + "marks": [ + "693db29767c2" + ], + "text": "nf-core/configs" + }, + { + "_type": "span", + "marks": [], + "text": " as an institutional profile. 
Now that she is familiar with these tools, Kalayanee is using ",
              "_key": "b2e402259036"
            },
            {
              "_type": "span",
              "marks": [
                "24a8c82f740e"
              ],
              "text": "nf-core/sarek",
              "_key": "b26f0dc77189"
            },
            {
              "_type": "span",
              "marks": [],
              "text": " and ",
              "_key": "04c0248e6ac8"
            },
            {
              "_type": "span",
              "marks": [
                "0b0903b66397"
              ],
              "text": "nf-core/rnaseq",
              "_key": "7b44a1a5ba5b"
            },
            {
              "_key": "9b3631ca088a",
              "_type": "span",
              "marks": [],
              "text": " to analyze hundreds of samples of her own next-generation sequencing data on her local HPC environment."
            }
          ],
          "_type": "block",
          "style": "normal"
        },
        {
          "_type": "block",
          "style": "normal",
          "_key": "f5c4cb724bff",
          "children": [
            {
              "text": "",
              "_key": "26324a75d8d6",
              "_type": "span"
            }
          ]
        },
        {
          "_key": "b34b4c3c8f1c",
          "markDefs": [],
          "children": [
            {
              "_type": "span",
              "marks": [],
              "text": "> ",
              "_key": "4dba083e351d"
            },
            {
              "text": "\"The mentorship program is a great start to learn to use and develop analysis pipelines built using Nextflow. I gained a lot of knowledge through this program. I am also very lucky to have Dr. Alison Meynert as my mentor. She is very knowledgeable, kind and willing to help in every step.\"",
              "_key": "1af2f58d53fb",
              "_type": "span",
              "marks": [
                "em"
              ]
            },
            {
              "marks": [],
              "text": " - Kalayanee Chairat (mentee)",
              "_key": "e77d89d866ab",
              "_type": "span"
            }
          ],
          "_type": "block",
          "style": "normal"
        },
        {
          "style": "normal",
          "_key": "dedb8d93ba88",
          "children": [
            {
              "text": "",
              "_key": "028e43e089f0",
              "_type": "span"
            }
          ],
          "_type": "block"
        },
        {
          "_key": "9cf4e3b95ae8",
          "markDefs": [],
          "children": [
            {
              "text": "> ",
              "_key": "e6cbe7501959",
              "_type": "span",
              "marks": []
            },
            {
              "_type": "span",
              "marks": [
                "em"
              ],
              "text": "\"It was a great experience for me to work with my mentee towards her goal. The process solidified some of my own topical knowledge and I learned new things along the way as well.\"",
              "_key": "6b58b44128b8"
            },
            {
              "text": " - Alison Meynert (mentor)",
              "_key": "c08c1807714e",
              "_type": "span",
              "marks": []
            }
          ],
          "_type": "block",
          "style": "normal"
        },
        {
          "_key": "29e6bc861527",
          "children": [
            {
              "text": "",
              "_key": "4f65186449bb",
              "_type": "span"
            }
          ],
          "_type": "block",
          "style": "normal"
        },
        {
          "children": [
            {
              "text": "Edward Lukyamuzi (mentee) & Emilio Garcia-Rios (mentor)",
              "_key": "270629a92a81",
              "_type": "span"
            }
          ],
          "_type": "block",
          "style": "h2",
          "_key": "7a73aec61d83"
        },
        {
          "style": "normal",
          "_key": "074400679a39",
          "markDefs": [],
          "children": [
            {
              "_type": "span",
              "marks": [],
              "text": "For the mentoring program Edward’s goal was to understand the fundamental components of a Nextflow script and write a Nextflow pipeline for analyzing mosquito genomes. Edward was mentored by Emilio Garcia-Rios, from the EMBL-EBI in the United Kingdom.",
              "_key": "ba460c9aa91e"
            }
          ],
          "_type": "block"
        },
        {
          "style": "normal",
          "_key": "9ce3e32e8d46",
          "children": [
            {
              "text": "",
              "_key": "6cd8e8b720d8",
              "_type": "span"
            }
          ],
          "_type": "block"
        },
        {
          "children": [
            {
              "_type": "span",
              "marks": [],
              "text": "Edward learned the fundamental concepts of Nextflow, including channels, processes and operators. 
Edward works with sequencing data from the mosquito genome - with help from Emilio he wrote a Nextflow pipeline with an accompanying Dockerfile for the alignment of reads and genotyping of SNPs. Edward will continue to develop his pipeline and wants to become more involved with the Nextflow and nf-core community by attending the nf-core hackathons. Edward is also very keen to help others learn Nextflow and expressed an interest in being part of this program again as a mentor.", + "_key": "3c75d3a0accd" + } + ], + "_type": "block", + "style": "normal", + "_key": "f7708a13e9ae", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "84986d379ba1", + "children": [ + { + "_type": "span", + "text": "", + "_key": "6cb0d2b884c8" + } + ] + }, + { + "_key": "a29b3e354b68", + "markDefs": [], + "children": [ + { + "_key": "2f0bd90b55cf", + "_type": "span", + "marks": [], + "text": "> " + }, + { + "text": "\"Learning Nextflow can be a steep curve. Having a partner to give you a little push might be what facilitates adoption of Nextflow into your daily routine.\"", + "_key": "c50222763f79", + "_type": "span", + "marks": [ + "em" + ] + }, + { + "_key": "74e365660454", + "_type": "span", + "marks": [], + "text": " - Edward Lukyamuzi (mentee)" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "983403438678", + "children": [ + { + "_type": "span", + "text": "", + "_key": "a7fe9c30bb1b" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "> ", + "_key": "7d1ddd7f4ef9" + }, + { + "marks": [ + "em" + ], + "text": "\"I would like more people to discover and learn the benefits using Nextflow has. Being a mentor in this program can help me collaborate with other colleagues and be a mentor in my institute as well.\"", + "_key": "50761b319c65", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " - Emilio Garcia-Rios (mentor)", + "_key": "135710a3916e" + } + ], + "_type": "block", + "style": "normal", + "_key": "e42966884ae0" + }, + { + "_type": "block", + "style": "normal", + "_key": "8c9a2c369a57", + "children": [ + { + "text": "", + "_key": "b1710f24819e", + "_type": "span" + } + ] + }, + { + "style": "h2", + "_key": "b2fafb3d2661", + "children": [ + { + "_type": "span", + "text": "Suchitra Thapa (mentee) & Maxime Borry (mentor)", + "_key": "cc3de2e6f7c7" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "0a66f450f5d7", + "markDefs": [ + { + "href": "https://github.com/suchitrathapa/metaphlankrona", + "_key": "05c3ce85f98a", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Suchitra started the program to learn about running Nextflow pipelines but quickly moved on to pipeline development and deployment on the cloud. Suchitra and Maxime encountered some technical challenges during the mentorship, including difficulties with internet connectivity and access to computational platforms for analysis. 
Despite this, with help from Maxime, Suchitra applied her newly acquired skills and made substantial progress converting the ", + "_key": "31a64959aeab" + }, + { + "_type": "span", + "marks": [ + "05c3ce85f98a" + ], + "text": "metaphlankrona", + "_key": "4c9fb915325d" + }, + { + "marks": [], + "text": " pipeline for metagenomic analysis of microbial communities from Nextflow DSL1 to DSL2 syntax.", + "_key": "0d040aabf018", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "ca1e4a140ddd", + "children": [ + { + "text": "", + "_key": "8d4c1b00df02", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "marks": [], + "text": "Suchitra will be sharing her work and progress on the pipeline as a poster at the ", + "_key": "775eba9624a8", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "88944a9c0c59" + ], + "text": "Nextflow Summit 2022", + "_key": "24e91bc97a4e" + }, + { + "text": ".", + "_key": "d00b0ec50bf9", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "202ad20eb189", + "markDefs": [ + { + "href": "https://summit.nextflow.io/speakers/suchitra-thapa/", + "_key": "88944a9c0c59", + "_type": "link" + } + ] + }, + { + "_key": "b2db3eec3f2e", + "children": [ + { + "_type": "span", + "text": "", + "_key": "41c727116ad2" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "8ff27e09cc2b", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "> ", + "_key": "9ef9c83605ce", + "_type": "span" + }, + { + "marks": [ + "em" + ], + "text": "\"This mentorship was one of the best organized online learning opportunities that I have attended so far. With time flexibility and no deadline burden, you can easily fit this mentorship into your busy schedule. I would suggest everyone interested to definitely go for it.\"", + "_key": "a9952bbf287c", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " - Suchitra Thapa (mentee)", + "_key": "665afe3bc168" + } + ], + "_type": "block" + }, + { + "_key": "a079cd58fb3c", + "children": [ + { + "_type": "span", + "text": "", + "_key": "81259c1f4ae3" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "8f1ed3147881", + "markDefs": [], + "children": [ + { + "_key": "79fd5739f813", + "_type": "span", + "marks": [], + "text": "> " + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "\"This mentorship program was a very fruitful and positive experience, and the satisfaction to see someone learning and growing their bioinformatics skills is very rewarding.\"", + "_key": "e91d4988fd83" + }, + { + "_type": "span", + "marks": [], + "text": " - Maxime Borry (mentor)", + "_key": "fb07dd0f06bd" + } + ] + }, + { + "children": [ + { + "text": "", + "_key": "490b63262630", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "90cbe5d73720" + }, + { + "_key": "cb71844b2004", + "children": [ + { + "text": "Conclusion", + "_key": "05573d4c8c19", + "_type": "span" + } + ], + "_type": "block", + "style": "h2" + }, + { + "children": [ + { + "marks": [], + "text": "Feedback from the first round of the mentorship program was overwhelmingly positive. Both mentors and mentees found the experience to be a rewarding opportunity and were grateful for taking part. 
Everyone who participated in the program said that they would encourage others to be a part of it in the future.",
              "_key": "1f4d6f173c17",
              "_type": "span"
            }
          ],
          "_type": "block",
          "style": "normal",
          "_key": "46a962656f63",
          "markDefs": []
        },
        {
          "children": [
            {
              "_type": "span",
              "text": "",
              "_key": "e5d1023accb2"
            }
          ],
          "_type": "block",
          "style": "normal",
          "_key": "f19c2ae70921"
        },
        {
          "_key": "3e3993943317",
          "markDefs": [],
          "children": [
            {
              "_key": "803ff12c1b77",
              "_type": "span",
              "marks": [],
              "text": "> \"This is an exciting program that can help us make use of curated pipelines to advance open science. I don't mind repeating the program!\" - John Juma (mentor)"
            }
          ],
          "_type": "block",
          "style": "normal"
        },
        {
          "_key": "ffbce387b7bb",
          "children": [
            {
              "text": "",
              "_key": "5f553cd34b95",
              "_type": "span"
            }
          ],
          "_type": "block",
          "style": "normal"
        },
        {
          "asset": {
            "_ref": "image-4d5f2f1331d74bcba295e81c065d0c95cb4a0848-3246x1820-png",
            "_type": "reference"
          },
          "_type": "image",
          "alt": "Screenshot of final zoom meetup",
          "_key": "c842eac6b09f"
        },
        {
          "style": "normal",
          "_key": "82a583b16afd",
          "children": [
            {
              "_type": "span",
              "text": "",
              "_key": "a0158ad4433f"
            }
          ],
          "_type": "block"
        },
        {
          "_key": "812c162381d7",
          "markDefs": [],
          "children": [
            {
              "marks": [],
              "text": "As the Nextflow and nf-core communities continue to grow, the mentorship program will have long-term benefits beyond those that are immediately measurable. Mentees from the program are already acting as positive role models and contributing new perspectives to the wider community. Additionally, some mentees are interested in being mentors in the future and will undoubtedly support others as our communities continue to grow.",
              "_key": "b7f58b2e02a0",
              "_type": "span"
            }
          ],
          "_type": "block",
          "style": "normal"
        },
        {
          "children": [
            {
              "_key": "1a81b9d201c2",
              "_type": "span",
              "text": ""
            }
          ],
          "_type": "block",
          "style": "normal",
          "_key": "b3a0554a611a"
        },
        {
          "_key": "417226b675a8",
          "markDefs": [
            {
              "_key": "6754093172a9",
              "_type": "link",
              "href": "https://nf-co.re/mentorships"
            }
          ],
          "children": [
            {
              "marks": [],
              "text": "We were delighted with the high quality of this year’s mentors and mentees. Stay tuned for information about the next round of the Nextflow and nf-core mentorship program. Applications for round 2 will open on October 1, 2022. 
See ", + "_key": "8bef7e9fa93b", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "6754093172a9" + ], + "text": "https://nf-co.re/mentorships", + "_key": "6218f4af465d" + }, + { + "_type": "span", + "marks": [], + "text": " for details.", + "_key": "497f552b3da6" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "dcaa2822c17d", + "children": [ + { + "_type": "span", + "text": "", + "_key": "e7ff87ca2f33" + } + ], + "_type": "block" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://nf-co.re/mentorships", + "_key": "cfda61ac796d" + } + ], + "children": [ + { + "text": "Mentorship Round 2 - Details", + "_key": "acc9dcde16c1", + "_type": "span", + "marks": [ + "cfda61ac796d" + ] + } + ], + "_type": "block", + "style": "normal", + "_key": "f7c90e2b1f56" + } + ], + "_updatedAt": "2024-09-25T14:16:33Z", + "_rev": "mvya9zzDXWakVjnX4hhad8", + "title": "Nextflow and nf-core mentorship, Round 1", + "publishedAt": "2022-09-18T06:00:00.000Z", + "_id": "6915a16d4440", + "meta": { + "slug": { + "current": "czi-mentorship-round-1" + } + }, + "_type": "blogPost", + "author": { + "_ref": "chris-hakkaart", + "_type": "reference" + }, + "_createdAt": "2024-09-25T14:16:33Z" + }, + { + "_createdAt": "2024-09-25T14:17:56Z", + "author": { + "_type": "reference", + "_ref": "5bLgfCKN00diCN0ijmWOAg" + }, + "_id": "6d6127d902c3", + "_type": "blogPost", + "_updatedAt": "2024-09-27T10:13:03Z", + "body": [ + { + "_type": "block", + "style": "normal", + "_key": "0dc59615d57f", + "markDefs": [], + "children": [ + { + "_key": "3b60c698a278", + "_type": "span", + "marks": [], + "text": "In my journey with the nf-core Mentorship Program, I've mentored individuals from Malawi, Chile, and Brazil, guiding them through Nextflow and nf-core. Despite the distances, my mentees successfully adapted their workflows, contributing to the open-source community. Witnessing the transformative impact of mentorship firsthand, I'm encouraged to continue participating in future mentorship efforts and urge others to join this rewarding experience. But how did it all start?" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "69d5df6acb53", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "c33a98011e34" + } + ] + }, + { + "style": "normal", + "_key": "9398bcbc919e", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "e8dada4aa053", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "ead5b667cc13", + "_type": "span", + "marks": [], + "text": "I’m " + }, + { + "_key": "e85c1aba945a", + "_type": "span", + "marks": [ + "e8f940ae3802" + ], + "text": "Robert Petit" + }, + { + "marks": [], + "text": ", a bioinformatician at the ", + "_key": "ae7d9903f0e1", + "_type": "span" + }, + { + "text": "Wyoming Public Health Laboratory", + "_key": "7d2ee4584a81", + "_type": "span", + "marks": [ + "c26095ecdf29" + ] + }, + { + "_key": "641061945d7b", + "_type": "span", + "marks": [], + "text": ", in " + }, + { + "_type": "span", + "marks": [ + "f07b5078bb0d" + ], + "text": "Wyoming, USA", + "_key": "6d31312321b6" + }, + { + "marks": [], + "text": ". If you don’t know where that is, haha that’s fine, I’m pretty sure half the people in the US don’t know either! Wyoming is the 10th largest US state (253,000 km2), but the least populated with only about 580,000 people. 
It’s home to some very beautiful mountains and national parks, large animals including bears, wolves and the fastest land animal in the northern hemisphere, the Pronghorn. But it’s rural, can get cold (-10 C) and the high wind speeds (somedays average 50 kmph, with gusts 100+ kmph) only make it feel colder during the winter (sometimes feeling like -60 C to -40 C). You might be wondering:", + "_key": "e8a73eab3efa", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "602f8a079238", + "markDefs": [ + { + "_type": "link", + "href": "https://www.robertpetit.com/", + "_key": "e8f940ae3802" + }, + { + "_type": "link", + "href": "https://health.wyo.gov/publichealth/lab/", + "_key": "c26095ecdf29" + }, + { + "_key": "f07b5078bb0d", + "_type": "link", + "href": "https://en.wikipedia.org/wiki/Wyoming" + } + ] + }, + { + "style": "normal", + "_key": "183dc9f0842d", + "markDefs": [], + "children": [ + { + "_key": "763f9a1339a0", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "How did some random person from Wyoming get involved in the nf-core Mentorship Program, and end up being the only mentor to have participated in all three rounds?", + "_key": "28ef13de5ac3" + } + ], + "_type": "block", + "style": "normal", + "_key": "b823579e8daf" + }, + { + "_type": "block", + "style": "normal", + "_key": "b67499b92943", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "2376a0fcc76e", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "d0c7e0b61f06", + "markDefs": [ + { + "_key": "f1b65c099f8c", + "_type": "link", + "href": "https://staphopia.github.io/" + }, + { + "_type": "link", + "href": "https://bactopia.github.io/latest/", + "_key": "1f0fec1e9e9e" + } + ], + "children": [ + { + "marks": [], + "text": "I’ve been in the Nextflow world for over 7 years now (as of 2024), when I first converted a pipeline, ", + "_key": "c17e55bdaf6b", + "_type": "span" + }, + { + "text": "Staphopia", + "_key": "cb6a3634af3a", + "_type": "span", + "marks": [ + "f1b65c099f8c" + ] + }, + { + "_key": "b07ed5174555", + "_type": "span", + "marks": [], + "text": " from Ruffus to Nextflow. Eventually, I would develop " + }, + { + "_key": "3a3e6ebacb22", + "_type": "span", + "marks": [ + "1f0fec1e9e9e" + ], + "text": "Bactopia" + }, + { + "_type": "span", + "marks": [], + "text": ", one of the leading and longest maintained (5 years now!) Nextflow pipelines for the analysis of Bacterial genomes. Through Bactopia, I’ve had the opportunity to help people all around the world get started using Nextflow and analyzing their own bacterial sequencing. It has also allowed me to make numerous contributions to nf-core, mostly through the nf-core/modules. So, when I heard about the opportunity to be a mentor in the nf-core’s Mentorship Program, I immediately applied.", + "_key": "1f224387c02c" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_key": "97301639f76a", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "0e572a4da93b" + }, + { + "style": "normal", + "_key": "fc337052ecea", + "markDefs": [], + "children": [ + { + "_key": "5657300c7580", + "_type": "span", + "marks": [], + "text": "Round 1! To be honest, I didn’t know what to expect from the program. Only that I would help a mentee with whatever they needed related to Nextflow and nf-core. 
Then at the first meeting, I learned I would be working with Phil Ashton the Lead Bioinformatcian at Malawi Liverpool Wellcome Trust, in Blantyre, Malawi, and immediately sent him a “Yo!”. Phil and I had run into each other in the past because when it comes to bacterial genomics, the field is very small! Phil’s goal was to get Nextflow pipelines running on their infrastructure in Malawi to help with their public health response. We would end up using Bactopia as the model. But this mentorship wasn’t just about “running Bactopia”, for Phil it was important we built a basic understanding of how things are working on the back-end with Nextflow. In the end, Phil was able to get Nextflow, and Bactopia running, using Singularity, but also gain a better understanding of Nextflow by writing his own Nextflow code." + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "4f39ca963c6c", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "b2b725c162de", + "_type": "span", + "marks": [] + } + ] + }, + { + "_key": "33d89ba8d240", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/microbialds/hantaflow", + "_key": "0de57dfb4074" + } + ], + "children": [ + { + "text": "Round 2! When Round 2 was announced, I didn’t hesitate to apply again as a mentor. This time, I would be paired up with Juan Ugalde, an Assistant Professor at Universidad Andres Bello in Santiago, Chile. I think Juan and I were both excited by this, as similar to Phil, Juan and I had run into each other (virtually) through MetaSub, a project to sequence samples taken from public transport systems across the globe. Like many during the COVID-19 pandemic, Juan was pulled into the response, during which he began looking into Nextflow for other viruses. In particular, hantavirus, a public health concern due to it being endemic in parts of Chile. Juan had developed a pipeline for hantavirus sequence analysis, and his goal was to convert it into Nextflow. Throughout this Juan got to learn about the nf-core community and Nextflow development, which he was successful at! As he was able to convert his pipeline into Nextflow and make it publicly available as ", + "_key": "a15ed800ea80", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "0de57dfb4074" + ], + "text": "hantaflow", + "_key": "5fc1f827003e" + }, + { + "marks": [], + "text": ".", + "_key": "d923718782de", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "5a3bfd26fbef", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "b1589c7e94d8" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/icaromsc/nf-core-phiflow", + "_key": "ab8dcfb23774" + } + ], + "children": [ + { + "text": "Round 3! Well Round 3 almost didn’t happen for me, but I’m glad it did happen! At the first meeting, I learned I would be paired with Ícaro Maia Santos de Castro, at the time a PhD candidate at the University of São Paulo, in São Paulo, Brazil. We quickly learned we were both fans of One Piece, as Ícaro’s GitHub picture was Luffy from One Piece, haha and my background included a poster from One Piece. With Ícaro, we were starting with the basics of Nextflow (e.g. the nf-core training materials) with the goal of writing a Nextflow pipeline for his meta-transcriptomics dissertation work. 
We set the goal to develop his Nextflow pipeline, before an overseas move he had a few months away. He brought so many questions, his motivation never waned, and once he was asking questions about Channel Operators, I knew he was ready to write his pipeline. While writing his pipeline he learned about the nf-core/tools and also got to submit a new recipe to Bioconda, and modules to nf-core. By the end of the mentorship, Ícaro had succeeded in writing his pipeline in Nextflow and making it publicly available at ", + "_key": "dfdc9c7f7881", + "_type": "span", + "marks": [] + }, + { + "_key": "64c6328f197f", + "_type": "span", + "marks": [ + "ab8dcfb23774" + ], + "text": "phiflow" + }, + { + "_key": "83b4ca8079dc", + "_type": "span", + "marks": [], + "text": "." + } + ], + "_type": "block", + "style": "normal", + "_key": "2016f0500100" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "27d5a7f38ebb" + } + ], + "_type": "block", + "style": "normal", + "_key": "d2f1896e5357" + }, + { + "asset": { + "_ref": "image-ac9edf0b33b767e3d5ae83df7661adf22fb8ae7d-3458x1556-png", + "_type": "reference" + }, + "_type": "image", + "alt": "phiflow diagram", + "_key": "3cc0b441355f" + }, + { + "_key": "d451aa97e874", + "markDefs": [ + { + "_key": "7ae9b3c6174a", + "_type": "link", + "href": "https://github.com/icaromsc/nf-core-phiflow" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Metromap of the ", + "_key": "be80593b0452" + }, + { + "_type": "span", + "marks": [ + "7ae9b3c6174a" + ], + "text": "phiflow", + "_key": "8eb3fcb05697" + }, + { + "marks": [], + "text": " workflow", + "_key": "07e33d1f7284", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "5041f794f5c4", + "markDefs": [], + "children": [ + { + "_key": "0c00eb93ae44", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "718b14cb74b7" + } + ], + "_type": "block", + "style": "normal", + "_key": "fe47592c7a26" + }, + { + "_type": "block", + "style": "normal", + "_key": "7508a8543b25", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Through all three rounds, I had the opportunity to work with some incredible people! But the awesomeness didn’t end with my mentees. One thing that always stuck out to me was how motivated everyone was, both mentees and mentors. There was a sense of excitement and real progress was being made by every group. After the first round ended, I remember thinking to myself, “how could it get better?” Haha, well it did, and it continued to get better and better in Rounds 2 and 3. I think this is a great testament to the organizers at nf-core that put it all together, the mentors and mentees, and the community behind Nextflow and nf-core.", + "_key": "8deb67fdf9d1" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "b43572673c00", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "06d67670cc5a" + }, + { + "_type": "block", + "style": "normal", + "_key": "cc347b391b33", + "markDefs": [], + "children": [ + { + "_key": "630b1dff3a3f", + "_type": "span", + "marks": [], + "text": "For the future mentees in mentorship opportunities! Please don’t let yourself stop you from applying. Whether it’s a time issue, or a fear of not having enough experience to be productive. 
In each round, we’ve had people from all over the world, starting from the ground with no experience, to some mentees in which I wondered if maybe they should have been a mentor (some mentees did end up being a mentor in the last round!). As a mentee, it is a great opportunity to work directly with a mentor dedicated to seeing you grow and build confidence when it comes to Nextflow and bioinformatics. In addition, you will be introduced to the incredible community that is behind Nextflow and nf-core. I think you will quickly learn there are so many people in this community that are willing to help!" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "68594baa30bc", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "28d9a2855132" + } + ] + }, + { + "_key": "971c65a06813", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "For the future mentors! It’s always awesome to be able to help others learn, but sometimes the mentor needs to learn too! For me, I found the nf-core Mentorship Program to be a great opportunity to improve my skills as a mentor. But it wasn’t just from working with my mentees. During each round I was surrounded by many great role models in the form of mentors and mentees to learn from. No two groups ever had the same goals, so you really get the chance to see so many different styles of mentorship being implemented, all producing significant results for each mentee. Like I told the mentees, if the opportunity comes up again, take the chance and apply to be a mentor!", + "_key": "afe3fc0d9230" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "f91b49c8b279", + "markDefs": [], + "children": [ + { + "_key": "c238548ad66f", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "2e888d29ed15", + "markDefs": [], + "children": [ + { + "_key": "de423013e3c7", + "_type": "span", + "marks": [], + "text": "There have now been three rounds of the nf-core Mentorship Program, and I am very proud to have been a mentor in each round! During this I have learned so much and been able to help my mentees and the community grow. I look forward to seeing what the future holds for the mentorship opportunities in the Nextflow community, and I encourage potential mentors and mentees to consider joining the program!" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "a4c996e21cae", + "markDefs": [ + { + "_type": "link", + "href": "https://www.nextflow.io/ambassadors.html", + "_key": "d906fdfaf3fa" + } + ], + "children": [ + { + "_key": "22bfa1a73a1c0", + "_type": "span", + "marks": [], + "text": "This post was contributed by a Nextflow Ambassador. Ambassadors are passionate individuals who support the Nextflow community. Interested in becoming an ambassador? Read more about it " + }, + { + "marks": [ + "d906fdfaf3fa" + ], + "text": "here", + "_key": "22bfa1a73a1c1", + "_type": "span" + }, + { + "text": ".", + "_key": "22bfa1a73a1c2", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "blockquote" + } + ], + "meta": { + "description": "In my journey with the nf-core Mentorship Program, I’ve mentored individuals from Malawi, Chile, and Brazil, guiding them through Nextflow and nf-core. Despite the distances, my mentees successfully adapted their workflows, contributing to the open-source community. 
Witnessing the transformative impact of mentorship firsthand, I’m encouraged to continue participating in future mentorship efforts and urge others to join this rewarding experience. But how did it all start?", + "slug": { + "current": "empowering-bioinformatics-mentoring" + } + }, + "title": "Empowering bioinformatics: mentoring across continents with Nextflow", + "_rev": "mvya9zzDXWakVjnX4hhI78", + "publishedAt": "2024-04-25T06:00:00.000Z" + }, + { + "_updatedAt": "2024-10-02T13:57:26Z", + "publishedAt": "2015-11-13T07:00:00.000Z", + "_type": "blogPost", + "meta": { + "slug": { + "current": "mpi-like-execution-with-nextflow" + }, + "description": "The main goal of Nextflow is to make workflows portable across different computing platforms taking advantage of the parallelisation features provided by the underlying system without having to reimplement your application code." + }, + "body": [ + { + "_type": "block", + "style": "normal", + "_key": "18b241212ab3", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The main goal of Nextflow is to make workflows portable across different computing platforms taking advantage of the parallelisation features provided by the underlying system without having to reimplement your application code.", + "_key": "26b376a1b637" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "2c45a3c23878" + } + ], + "_type": "block", + "style": "normal", + "_key": "2b11f92ab457", + "markDefs": [] + }, + { + "style": "normal", + "_key": "267cad31a7e5", + "markDefs": [ + { + "href": "http://www.univa.com", + "_key": "7d38f135cd38", + "_type": "link" + }, + { + "_key": "3734f7f8eaab", + "_type": "link", + "href": "http://www.ibm.com/systems/platformcomputing/products/lsf/" + }, + { + "_type": "link", + "href": "https://computing.llnl.gov/linux/slurm/", + "_key": "d1ca6dc98d3f" + }, + { + "_key": "44dbda4ff0bd", + "_type": "link", + "href": "http://www.pbsworks.com/Product.aspx?id=1" + }, + { + "_key": "b755ed703228", + "_type": "link", + "href": "http://www.adaptivecomputing.com/products/open-source/torque/" + } + ], + "children": [ + { + "marks": [], + "text": "From the beginning Nextflow has included executors designed to target the most popular resource managers and batch schedulers commonly used in HPC data centers, such as ", + "_key": "ec8d4e637254", + "_type": "span" + }, + { + "_key": "58dd43b02265", + "_type": "span", + "marks": [ + "7d38f135cd38" + ], + "text": "Univa Grid Engine" + }, + { + "text": ", ", + "_key": "6c144329be4a", + "_type": "span", + "marks": [] + }, + { + "text": "Platform LSF", + "_key": "bf2f3c556ed8", + "_type": "span", + "marks": [ + "3734f7f8eaab" + ] + }, + { + "_key": "03849e5b2e1c", + "_type": "span", + "marks": [], + "text": ", " + }, + { + "_type": "span", + "marks": [ + "d1ca6dc98d3f" + ], + "text": "SLURM", + "_key": "7321fee6436f" + }, + { + "text": ", ", + "_key": "d21596555c73", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "44dbda4ff0bd" + ], + "text": "PBS", + "_key": "c8b73ca7f55e", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "5b9c1fe80b1d" + }, + { + "_key": "a04e2c840dd9", + "_type": "span", + "marks": [ + "b755ed703228" + ], + "text": "Torque" + }, + { + "text": ".", + "_key": "50bcbb99e92e", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "5c250ce56adc", + "_type": "span", + "marks": [] + } + ], + "_type": 
"block", + "style": "normal", + "_key": "34c29179aa81" + }, + { + "_type": "block", + "style": "normal", + "_key": "55462f27ab40", + "markDefs": [], + "children": [ + { + "_key": "369d28565d38", + "_type": "span", + "marks": [], + "text": "When using one of these executors Nextflow submits the computational workflow tasks as independent job requests to the underlying platform scheduler, specifying for each of them the computing resources needed to carry out its job." + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "ae6714a67e4b", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "5b4398d8cef4" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "This approach works well for workflows that are composed of long running tasks, which is the case of most common genomic pipelines.", + "_key": "6b6b8ac17a09", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "e806b61390ed", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "ba4ba1c958b4" + } + ], + "_type": "block", + "style": "normal", + "_key": "de57cd2037a0" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "However this approach does not scale well for workloads made up of a large number of short-lived tasks (e.g., a few seconds or sub-seconds). In this scenario the resource manager scheduling time is much longer than the actual task execution time, thus resulting in an overall execution time that is much longer than the real execution time. In some cases, this represents an unacceptable waste of computing resources.", + "_key": "74c9bf327fbe" + } + ], + "_type": "block", + "style": "normal", + "_key": "d1c617472e02" + }, + { + "children": [ + { + "text": "", + "_key": "3e794adccba4", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "36827c166c3c", + "markDefs": [] + }, + { + "children": [ + { + "_key": "16d0ed68b051", + "_type": "span", + "marks": [], + "text": "Moreover supercomputers, such as " + }, + { + "_key": "a634040eb6b9", + "_type": "span", + "marks": [ + "fedd25215f86" + ], + "text": "MareNostrum" + }, + { + "_type": "span", + "marks": [], + "text": " in the ", + "_key": "96891b265529" + }, + { + "marks": [ + "53fba5e52e1f" + ], + "text": "Barcelona Supercomputer Center (BSC)", + "_key": "7ea4f84fd081", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ", are optimized for memory distributed applications. 
In this context it is needed to allocate a certain amount of computing resources in advance to run the application in a distributed manner, commonly using the ", + "_key": "39519ee8f7f5" + }, + { + "_type": "span", + "marks": [ + "994a254ef762" + ], + "text": "MPI", + "_key": "f93838a93623" + }, + { + "_type": "span", + "marks": [], + "text": " standard.", + "_key": "3a3cab4d4271" + } + ], + "_type": "block", + "style": "normal", + "_key": "b9760bc5bcc4", + "markDefs": [ + { + "_type": "link", + "href": "https://www.bsc.es/marenostrum-support-services/mn3", + "_key": "fedd25215f86" + }, + { + "href": "https://www.bsc.es/", + "_key": "53fba5e52e1f", + "_type": "link" + }, + { + "_type": "link", + "href": "https://en.wikipedia.org/wiki/Message_Passing_Interface", + "_key": "994a254ef762" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "cffadd887b4a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "fdb60c6ac80d" + }, + { + "children": [ + { + "_key": "6faa6ccb5f63", + "_type": "span", + "marks": [], + "text": "In this scenario, the Nextflow execution model was far from optimal, if not unfeasible." + } + ], + "_type": "block", + "style": "normal", + "_key": "e16918ea234e", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "3f24538e4449", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "6fab5f4aaeca" + }, + { + "_key": "64984df2d680", + "markDefs": [], + "children": [ + { + "text": "Distributed execution", + "_key": "fc289a941fc5", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h3" + }, + { + "_type": "block", + "style": "normal", + "_key": "6ff2ec6264ea", + "markDefs": [ + { + "href": "https://ignite.apache.org/", + "_key": "df34fbf5e2a4", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "For this reason, since the release 0.16.0, Nextflow has implemented a new distributed execution model that greatly improves the computation capability of the framework. It uses ", + "_key": "951c8e652bf5" + }, + { + "_type": "span", + "marks": [ + "df34fbf5e2a4" + ], + "text": "Apache Ignite", + "_key": "f0c2f3639373" + }, + { + "text": ", a lightweight clustering engine and in-memory data grid, which has been recently open sourced under the Apache software foundation umbrella.", + "_key": "cc365f889907", + "_type": "span", + "marks": [] + } + ] + }, + { + "_key": "941b4c4b0718", + "markDefs": [], + "children": [ + { + "_key": "26ebd332450e", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "543312592f73", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "When using this feature a Nextflow application is launched as if it were an MPI application. It uses a job wrapper that submits a single request specifying all the needed computing resources. 
The Nextflow command line is executed by using the ",
              "_key": "ce049f0ddb89",
              "_type": "span"
            },
            {
              "marks": [
                "code"
              ],
              "text": "mpirun",
              "_key": "b73453ccc8f4",
              "_type": "span"
            },
            {
              "_type": "span",
              "marks": [],
              "text": " utility, as shown in the example below:",
              "_key": "c14861b6e852"
            }
          ],
          "_type": "block"
        },
        {
          "markDefs": [],
          "children": [
            {
              "_type": "span",
              "marks": [],
              "text": "",
              "_key": "8cb1944c4aba"
            }
          ],
          "_type": "block",
          "style": "normal",
          "_key": "6c33c133721c"
        },
        {
          "code": "#!/bin/bash\n#$ -l virtual_free=120G\n#$ -q <queue name>\n#$ -N <job name>\n#$ -pe ompi <number of nodes>\nmpirun --pernode nextflow run <your pipeline> -with-mpi [pipeline parameters]",
          "_type": "code",
          "_key": "a23141118ff2"
        },
        {
          "_key": "5babfd6362ac",
          "markDefs": [],
          "children": [
            {
              "marks": [],
              "text": "This tool spawns a Nextflow instance on each of the computing nodes allocated by the cluster manager.",
              "_key": "7507b30ec8eb",
              "_type": "span"
            }
          ],
          "_type": "block",
          "style": "normal"
        },
        {
          "_key": "09407123f4d4",
          "markDefs": [],
          "children": [
            {
              "text": "",
              "_key": "a6af10e3f79c",
              "_type": "span",
              "marks": []
            }
          ],
          "_type": "block",
          "style": "normal"
        },
        {
          "markDefs": [],
          "children": [
            {
              "_type": "span",
              "marks": [],
              "text": "Each Nextflow instance automatically connects with the other peers creating a ",
              "_key": "aeb172651643"
            },
            {
              "_type": "span",
              "marks": [
                "em"
              ],
              "text": "private",
              "_key": "4c09ad14ad49"
            },
            {
              "text": " internal cluster, thanks to the Apache Ignite clustering feature that is embedded within Nextflow itself.",
              "_key": "7fea7111a787",
              "_type": "span",
              "marks": []
            }
          ],
          "_type": "block",
          "style": "normal",
          "_key": "24808682a47c"
        },
        {
          "_type": "block",
          "style": "normal",
          "_key": "aea030729a84",
          "markDefs": [],
          "children": [
            {
              "marks": [],
              "text": "",
              "_key": "0e7a2b2fc6c4",
              "_type": "span"
            }
          ]
        },
        {
          "_key": "d4f7d24339e0",
          "markDefs": [],
          "children": [
            {
              "marks": [],
              "text": "The first node becomes the application driver that manages the execution of the workflow application, submitting the tasks to the remaining nodes that act as workers.",
              "_key": "69b5ae7d02d5",
              "_type": "span"
            }
          ],
          "_type": "block",
          "style": "normal"
        },
        {
          "_key": "6e320c67f753",
          "markDefs": [],
          "children": [
            {
              "text": "",
              "_key": "47eca6c9c7be",
              "_type": "span",
              "marks": []
            }
          ],
          "_type": "block",
          "style": "normal"
        },
        {
          "_type": "block",
          "style": "normal",
          "_key": "fd3b53f681b6",
          "markDefs": [],
          "children": [
            {
              "_key": "9576d4d70ec6",
              "_type": "span",
              "marks": [],
              "text": "When the application is complete, the Nextflow driver automatically shuts down the Nextflow/Ignite cluster and terminates the job execution." 
+ } + ] + }, + { + "markDefs": [], + "children": [ + { + "_key": "39ae0f301223", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "a38ffce2fb0a" + }, + { + "_type": "image", + "alt": "Nextflow distributed execution", + "_key": "ec95622b2912", + "asset": { + "_ref": "image-0358cbbcb995ce9a740f0594df4290f7476612ce-640x512-png", + "_type": "reference" + } + }, + { + "markDefs": [], + "children": [ + { + "_key": "fe8126b636d1", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "9e6e66d7e981" + }, + { + "_type": "block", + "style": "h3", + "_key": "6de3cc8d2478", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Conclusion", + "_key": "5c2668429c96", + "_type": "span" + } + ] + }, + { + "_key": "966f46280b35", + "markDefs": [], + "children": [ + { + "text": "In this way it is possible to deploy a Nextflow workload in a supercomputer using an execution strategy that resembles the MPI distributed execution model. This doesn't require to implement your application using the MPI api/library and it allows you to maintain your code portable across different execution platforms.", + "_key": "a71037747f43", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "7d2a949b4827", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "d12140bade84", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Although we do not currently have a performance comparison between a Nextflow distributed execution and an equivalent MPI application, we assume that the latter provides better performance due to its low-level optimisation.", + "_key": "6ffdc50adb21" + } + ], + "_type": "block", + "style": "normal", + "_key": "8cf23fbae9ea" + }, + { + "_type": "block", + "style": "normal", + "_key": "3763d2691239", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "eed5fa04416c" + } + ] + }, + { + "style": "normal", + "_key": "ce531c8bbe94", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Nextflow, however, focuses on the fast prototyping of scientific applications in a portable manner while maintaining the ability to scale and distribute the application workload in an efficient manner in an HPC cluster.", + "_key": "74c1eb8844b3", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "925537a87958", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "7b3e331f3a77" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "This allows researchers to validate an experiment, quickly, reusing existing tools and software components. 
This eventually makes it possible to implement an optimized version using a low-level programming language in the second stage of a project.", + "_key": "097efbe18283" + } + ], + "_type": "block", + "style": "normal", + "_key": "3c8e508c69eb" + }, + { + "_type": "block", + "style": "normal", + "_key": "9606b4a84b23", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "12ba954572e2" + } + ] + }, + { + "_key": "7e6df95505d0", + "markDefs": [ + { + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/ignite.html#execution-with-mpi", + "_key": "128364db09b7" + } + ], + "children": [ + { + "text": "Read the documentation to learn more about the ", + "_key": "e41f5c4b6f31", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "128364db09b7" + ], + "text": "Nextflow distributed execution model", + "_key": "d3f357c2333b", + "_type": "span" + }, + { + "text": ".", + "_key": "c733dcb9517d", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + } + ], + "_createdAt": "2024-09-25T14:14:58Z", + "_id": "6fe626744c3d", + "author": { + "_type": "reference", + "_ref": "paolo-di-tommaso" + }, + "_rev": "Ot9x7kyGeH5005E3MJ9MgC", + "title": "MPI-like distributed execution with Nextflow", + "tags": [ + { + "_ref": "ace8dd2c-eed3-4785-8911-d146a4e84bbb", + "_type": "reference", + "_key": "3f0ce6141fe0" + } + ] + }, + { + "_rev": "rsIQ9Jd8Z4nKBVUruy4Wgy", + "_updatedAt": "2024-10-02T13:07:03Z", + "title": "Conda support has landed!", + "publishedAt": "2018-06-05T06:00:00.000Z", + "body": [ + { + "markDefs": [], + "children": [ + { + "text": "Nextflow aims to ease the development of large scale, reproducible workflows allowing developers to focus on the main application logic and to rely on best community tools and best practices.", + "_key": "e4b551d5f424", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "8e1c781f5ec3" + }, + { + "_type": "block", + "style": "normal", + "_key": "dcc930a16cc3", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "c2aebdad7124", + "_type": "span", + "marks": [] + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "For this reason we are very excited to announce that the latest Nextflow version (", + "_key": "6a8eb6043710" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "0.30.0", + "_key": "415330266c17" + }, + { + "text": ") finally provides built-in support for ", + "_key": "71b461737937", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "39b964772ae4" + ], + "text": "Conda", + "_key": "36ed138aec6e", + "_type": "span" + }, + { + "_key": "2b940c73a6b9", + "_type": "span", + "marks": [], + "text": "." + } + ], + "_type": "block", + "style": "normal", + "_key": "594f5b83ef3b", + "markDefs": [ + { + "_key": "39b964772ae4", + "_type": "link", + "href": "https://conda.io/docs/" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_key": "237b6fb6c889", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "eca3ea36cda2" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://bioconda.github.io", + "_key": "3972c697d8eb" + }, + { + "_key": "3c5575677800", + "_type": "link", + "href": "https://biobuilds.org/" + } + ], + "children": [ + { + "text": "Conda is a popular package manager that simplifies the installation of software packages and the configuration of complex software environments. 
Above all, it provides access to large tool and software package collections maintained by domain specific communities such as ", + "_key": "8da796d4a02d", + "_type": "span", + "marks": [] + }, + { + "_key": "126fbfff61f5", + "_type": "span", + "marks": [ + "3972c697d8eb" + ], + "text": "Bioconda" + }, + { + "text": " and ", + "_key": "6d6dc5f05f7c", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "3c5575677800" + ], + "text": "BioBuild", + "_key": "bd989f0a0af4", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "362b40365c6a" + } + ], + "_type": "block", + "style": "normal", + "_key": "86ba305a5cf6" + }, + { + "_key": "bdf458020ca9", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "d1443ccc9f9a" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_key": "636a94f2dc78", + "_type": "span", + "marks": [], + "text": "The native integration with Nextflow allows researchers to develop workflow applications in a rapid and easy repeatable manner, reusing community tools, whilst taking advantage of the configuration flexibility, portability and scalability provided by Nextflow." + } + ], + "_type": "block", + "style": "normal", + "_key": "8d2ff7a252a9" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "b9c247c9f3b0", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "935a6d1a1d1b" + }, + { + "markDefs": [], + "children": [ + { + "text": "How it works", + "_key": "bb547be66da9", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2", + "_key": "d0c1004d0559" + }, + { + "markDefs": [], + "children": [ + { + "_key": "892c9465fd99", + "_type": "span", + "marks": [], + "text": "Nextflow automatically creates and activates the Conda environment(s) given the dependencies specified by each process." + } + ], + "_type": "block", + "style": "normal", + "_key": "705e94df49b1" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "240c830f4bd5", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "347aa5fcd432" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "/docs/latest/process.html#conda", + "_key": "28d0caca2c0f" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Dependencies are specified by using the ", + "_key": "3ab14c702633" + }, + { + "_type": "span", + "marks": [ + "28d0caca2c0f" + ], + "text": "conda", + "_key": "60f32ae06d7f" + }, + { + "_type": "span", + "marks": [], + "text": " directive, providing either the names of the required Conda packages, the path of a Conda environment yaml file or the path of an existing Conda environment directory.", + "_key": "d404c5c777ba" + } + ], + "_type": "block", + "style": "normal", + "_key": "dc319cfc02d4" + }, + { + "_type": "block", + "style": "normal", + "_key": "e3fa1e5294a7", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "938184641d1d", + "_type": "span" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "Conda environments are stored on the file system. By default Nextflow instructs Conda to save the required environments in the pipeline work directory. 
You can specify the directory where the Conda environments are stored using the ", + "_key": "e72873bc6d31", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "conda.cacheDir", + "_key": "8b5e3d7c4e9b" + }, + { + "marks": [], + "text": " configuration property.", + "_key": "c988c4e87355", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "c540f99c0f27", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "288b8d057085", + "markDefs": [], + "children": [ + { + "text": "\n", + "_key": "f4ab4e8334c3", + "_type": "span", + "marks": [] + } + ] + }, + { + "children": [ + { + "text": "Use Conda package names", + "_key": "4b55a8decd8d", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h3", + "_key": "1d08e0a0a186", + "markDefs": [] + }, + { + "style": "normal", + "_key": "6d100f5ea5e8", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The simplest way to use one or more Conda packages consists in specifying their names using the ", + "_key": "f7157c59efa9" + }, + { + "_key": "22a792c4a534", + "_type": "span", + "marks": [ + "code" + ], + "text": "conda" + }, + { + "_key": "5a1b0a625aaf", + "_type": "span", + "marks": [], + "text": " directive. Multiple package names can be specified by separating them with a space. For example:" + } + ], + "_type": "block" + }, + { + "_key": "9e1355c93a5d", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "fd929d230c24", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "code": "process foo {\n conda \"bwa samtools multiqc\"\n\n \"\"\"\n your_command --here\n \"\"\"\n}", + "_type": "code", + "_key": "c69c76e353f4" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "2440a13e29d0", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "f7fc39c33959", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "0dfe954377ed", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Using the above definition a Conda environment that includes BWA, Samtools and MultiQC tools is created and activated when the process is executed.", + "_key": "538d52903d80", + "_type": "span" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "7e2dbbeaec64" + } + ], + "_type": "block", + "style": "normal", + "_key": "161fc6ca34ae" + }, + { + "_type": "block", + "style": "normal", + "_key": "dbc2e50963a7", + "markDefs": [], + "children": [ + { + "text": "The usual Conda package syntax and naming conventions can be used. 
The version of a package can be specified after the package name as shown here: ", + "_key": "9703b9139acb", + "_type": "span", + "marks": [] + }, + { + "text": "bwa=0.7.15", + "_key": "934afae29289", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "marks": [], + "text": ".", + "_key": "041526567843", + "_type": "span" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "f30f8854828b", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "9a56d457a79c" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The name of the channel where a package is located can be specified prefixing the package with the channel name as shown here: ", + "_key": "66d9c16b777a" + }, + { + "text": "bioconda::bwa=0.7.15", + "_key": "22bdee518abe", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "text": ".", + "_key": "31d3ffff0f30", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "019ecd1d0037" + }, + { + "style": "normal", + "_key": "e324d1d431c2", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "\n", + "_key": "77b481802943" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "h3", + "_key": "09dcdcfb9f33", + "markDefs": [], + "children": [ + { + "_key": "50121e1d3843", + "_type": "span", + "marks": [], + "text": "Use Conda environment files" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "When working in a project requiring a large number of dependencies it can be more convenient to consolidate all required tools using a Conda environment file. This is a file that lists the required packages and channels, structured using the YAML format. For example:", + "_key": "2ab570ef9d18", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "464a85127087", + "markDefs": [] + }, + { + "_key": "ec5f4c03fda2", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "57c6b7e93f56", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "code": "name: my-env\nchannels:\n - bioconda\n - conda-forge\n - defaults\ndependencies:\n - star=2.5.4a\n - bwa=0.7.15", + "_type": "code", + "_key": "3af9d15db043" + }, + { + "_type": "block", + "style": "normal", + "_key": "3cb3ee8f4d7e", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "90209221496e", + "_type": "span" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The path of the environment file can be specified using the ", + "_key": "00a881a6e462" + }, + { + "text": "conda", + "_key": "2a752b118912", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_key": "219de28fc9c7", + "_type": "span", + "marks": [], + "text": " directive:" + } + ], + "_type": "block", + "style": "normal", + "_key": "d1da0ceb0b5f" + }, + { + "_type": "block", + "style": "normal", + "_key": "fc89e01355d0", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "0135da8d77aa", + "_type": "span" + } + ] + }, + { + "code": "process foo {\n conda '/some/path/my-env.yaml'\n\n '''\n your_command --here\n '''\n}", + "_type": "code", + "_key": "02fde70de661" + }, + { + "markDefs": [], + "children": [ + { + "text": "Note: the environment file name ", + "_key": "97f4fc922077", + "_type": "span", + "marks": [] + }, + { + "_key": "8e7b6545afe4", + "_type": "span", + "marks": [ + "strong" + ], + "text": "must" + }, + { + 
"_type": "span", + "marks": [], + "text": " end with a ", + "_key": "791414332dd6" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": ".yml", + "_key": "7dfb057dcc51" + }, + { + "_key": "46454509315f", + "_type": "span", + "marks": [], + "text": " or " + }, + { + "_key": "cdfda04da021", + "_type": "span", + "marks": [ + "code" + ], + "text": ".yaml" + }, + { + "text": " suffix otherwise it won't be properly recognized. Also relative paths are resolved against the workflow launching directory.", + "_key": "90814a5cce02", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "blockquote", + "_key": "31114b76c635" + }, + { + "style": "normal", + "_key": "c916c6c3eb84", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "e87a28b3a9ff", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "b6675befb31c", + "markDefs": [], + "children": [ + { + "_key": "d7397645a0e3", + "_type": "span", + "marks": [], + "text": "The suggested approach is to store the the Conda environment file in your project root directory and reference it in the " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "nextflow.config", + "_key": "15ed72b3f3fd" + }, + { + "text": " directory using the ", + "_key": "90e6f7f0d97f", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "baseDir", + "_key": "22caeabb7f94" + }, + { + "_type": "span", + "marks": [], + "text": " variable as shown below:", + "_key": "d16f7c149cff" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_key": "e005eb442107", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "fc09a3a6a53d" + }, + { + "_key": "5204ab192f28", + "code": "process.conda = \"$baseDir/my-env.yaml\"", + "_type": "code" + }, + { + "markDefs": [], + "children": [ + { + "_key": "ac8c9619d49a", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "21779cb8c938" + }, + { + "children": [ + { + "marks": [], + "text": "This guarantees that the environment paths is correctly resolved independently of the execution path.", + "_key": "492d3ffe6c89", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "562416248e72", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "e2f943e042cb", + "markDefs": [], + "children": [ + { + "_key": "cf72e4a8ec53", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "style": "blockquote", + "_key": "e7d4a6a08bd2", + "markDefs": [ + { + "_key": "66d59a50a7a4", + "_type": "link", + "href": "/docs/latest/conda.html" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "See the ", + "_key": "683de47047d8" + }, + { + "_key": "1616c8038e9c", + "_type": "span", + "marks": [ + "66d59a50a7a4" + ], + "text": "documentation" + }, + { + "_key": "41217c97018c", + "_type": "span", + "marks": [], + "text": " for more details on how to configure and use Conda environments in your Nextflow workflow." 
+ } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "1c44a834e6db" + } + ], + "_type": "block", + "style": "normal", + "_key": "858518382baa" + }, + { + "_type": "block", + "style": "h2", + "_key": "4bfa23325312", + "markDefs": [], + "children": [ + { + "text": "Bonus!", + "_key": "561e05bafaa7", + "_type": "span", + "marks": [] + } + ] + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://biocontainers.pro/", + "_key": "15b3eb805afd" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "This release includes also a better support for ", + "_key": "6bc93514cb23" + }, + { + "_type": "span", + "marks": [ + "15b3eb805afd" + ], + "text": "Biocontainers", + "_key": "d8d8c188144f" + }, + { + "text": ". So far, Nextflow users were able to use container images provided by the Biocontainers community. However, it was not possible to collect process metrics and runtime statistics within those images due to the usage of a legacy version of the ", + "_key": "aa9139f3fd7d", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "ps", + "_key": "9bdf8e100481" + }, + { + "_key": "b09bee9bfd48", + "_type": "span", + "marks": [], + "text": " system tool that is not compatible with the one expected by Nextflow." + } + ], + "_type": "block", + "style": "normal", + "_key": "c5997755f2df" + }, + { + "_type": "block", + "style": "normal", + "_key": "cd730902c8f9", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "14f001d46337", + "_type": "span", + "marks": [] + } + ] + }, + { + "style": "normal", + "_key": "da99f5d82733", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The latest version of Nextflow does not require the ", + "_key": "0a8c4a6ec222", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "ps", + "_key": "2a2b6e490217" + }, + { + "_key": "911f5128b219", + "_type": "span", + "marks": [], + "text": " tool any more to fetch execution metrics and runtime statistics, therefore this information is collected and correctly reported when using Biocontainers images." + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "", + "_key": "c243f5ce2b09", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "b6127601042d", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Conclusion", + "_key": "a9ff6a7721da" + } + ], + "_type": "block", + "style": "h2", + "_key": "a2f83771207c" + }, + { + "children": [ + { + "_key": "a17640ed2289", + "_type": "span", + "marks": [], + "text": "We are very excited by this new feature bringing the ability to use popular Conda tool collections, such as Bioconda, directly into Nextflow workflow applications." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "fd196e1679d4", + "markDefs": [] + }, + { + "style": "normal", + "_key": "9071da705084", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "877f9b3f5ed5", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "7c4830edabe0", + "markDefs": [ + { + "_type": "link", + "href": "/docs/latest/process.html#module", + "_key": "50a3bd45ad1f" + }, + { + "_key": "092299e5493a", + "_type": "link", + "href": "/docs/latest/docker.html" + }, + { + "_type": "link", + "href": "/docs/latest/singularity.html", + "_key": "e3f18233e8cb" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow developers have now yet another option to transparently manage the dependencies in their workflows along with ", + "_key": "032d4f84a990" + }, + { + "_type": "span", + "marks": [ + "50a3bd45ad1f" + ], + "text": "Environment Modules", + "_key": "dcf1d1a14c31" + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "1bbf662db980" + }, + { + "_type": "span", + "marks": [ + "092299e5493a" + ], + "text": "containers", + "_key": "4770201cd0f8" + }, + { + "_type": "span", + "marks": [], + "text": " ", + "_key": "e75526c363e9" + }, + { + "marks": [ + "e3f18233e8cb" + ], + "text": "technology", + "_key": "2771d68d121d", + "_type": "span" + }, + { + "marks": [], + "text": ", giving them great configuration flexibility.", + "_key": "046d54b327b8", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "7af51a444e2f", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "9780b0688114", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "fd14b4fbb5ed", + "markDefs": [], + "children": [ + { + "text": "The resulting workflow applications can easily be reconfigured and deployed across a range of different platforms choosing the best technology according to the requirements of the target system.", + "_key": "2cdf92b5eb3d", + "_type": "span", + "marks": [] + } + ] + } + ], + "_createdAt": "2024-09-25T14:15:27Z", + "_id": "7154b9b83959", + "_type": "blogPost", + "meta": { + "description": "Nextflow aims to ease the development of large scale, reproducible workflows allowing developers to focus on the main application logic and to rely on best community tools and best practices.", + "slug": { + "current": "conda-support-has-landed" + } + }, + "tags": [ + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "422fb1709920" + } + ], + "author": { + "_ref": "paolo-di-tommaso", + "_type": "reference" + } + }, + { + "_id": "728b501cc4c7", + "title": "6 Tips for Setting Up Your Nextflow Dev Environment", + "tags": [ + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "7d27c5f42160" + } + ], + "_rev": "2PruMrLMGpvZP5qAknmBKj", + "body": [ + { + "_type": "block", + "style": "normal", + "_key": "c8216ae8bc07", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "em" + ], + "text": "This blog follows up the Learning Nextflow in 2020 blog [post](https://www.nextflow.io/blog/2020/learning-nextflow-in-2020.html).", + "_key": "baea13971feb" + } + ] + }, + { + "style": "normal", + "_key": "35773849ebf9", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "cd1a014b34b1", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "384e07386d8b", + "markDefs": [], + 
"children": [ + { + "text": "This guide is designed to walk you through a basic development setup for writing Nextflow pipelines.", + "_key": "b29577b408a3", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "945b045b1063", + "markDefs": [], + "children": [ + { + "_key": "2b7b71ab9d3d", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "h2", + "_key": "cc4455ab3db1", + "markDefs": [], + "children": [ + { + "text": "1. Installation", + "_key": "185122270abc", + "_type": "span", + "marks": [] + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow runs on any Linux compatible system and MacOS with Java installed. Windows users can rely on the ", + "_key": "16de800a7e79" + }, + { + "_key": "a1d7e7c33a03", + "_type": "span", + "marks": [ + "2b1b6ebcc748" + ], + "text": "Windows Subsystem for Linux" + }, + { + "_type": "span", + "marks": [], + "text": ". Installing Nextflow is straightforward. You just need to download the ", + "_key": "f31aeb8e728e" + }, + { + "_key": "5f6016136103", + "_type": "span", + "marks": [ + "code" + ], + "text": "nextflow" + }, + { + "text": " executable. In your terminal type the following commands:", + "_key": "66ee62e7ede1", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "bf29c31a1d5b", + "markDefs": [ + { + "_key": "2b1b6ebcc748", + "_type": "link", + "href": "https://docs.microsoft.com/en-us/windows/wsl/install-win10" + } + ] + }, + { + "style": "normal", + "_key": "d17b4a46e121", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "569811fd6eeb", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_key": "1c814b48a55e", + "code": "$ curl get.nextflow.io | bash\n$ sudo mv nextflow /usr/local/bin", + "_type": "code" + }, + { + "_key": "32801bc1ff15", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "0b5e04bc88fb" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "abf34f69104d", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The first line uses the curl command to download the nextflow executable, and the second line moves the executable to your PATH. Note ", + "_key": "e71dc74e32c9", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "/usr/local/bin", + "_key": "d4abc629acf6" + }, + { + "_type": "span", + "marks": [], + "text": " is the default for MacOS, you might want to choose ", + "_key": "860380200f6a" + }, + { + "text": "~/bin", + "_key": "1b23d2c9ca53", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " or ", + "_key": "d546ea964f49" + }, + { + "marks": [ + "code" + ], + "text": "/usr/bin", + "_key": "87bbce39dff5", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " depending on your PATH definition and operating system.", + "_key": "7acb44128d55" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "19639c682cb3", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "c0e9e5d7361b" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "2. 
Text Editor or IDE?", + "_key": "09786655aa47", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2", + "_key": "fd5c0a16b777" + }, + { + "style": "normal", + "_key": "596bd6fbae5a", + "markDefs": [], + "children": [ + { + "_key": "82913152fe33", + "_type": "span", + "marks": [], + "text": "Nextflow pipelines can be written in any plain text editor. I'm personally a bit of a Vim fan, however, the advent of the modern IDE provides a more immersive development experience." + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "b57868bf9f63" + } + ], + "_type": "block", + "style": "normal", + "_key": "f6d267b17671", + "markDefs": [] + }, + { + "children": [ + { + "text": "My current choice is Visual Studio Code which provides a wealth of add-ons, the most obvious of these being syntax highlighting. With ", + "_key": "6a0ba850b5e6", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "e946c83f2eb8" + ], + "text": "VSCode installed", + "_key": "323c79f69f0f" + }, + { + "_type": "span", + "marks": [], + "text": ", you can search for the Nextflow extension in the marketplace.", + "_key": "475cc7978db9" + } + ], + "_type": "block", + "style": "normal", + "_key": "140b54ba282a", + "markDefs": [ + { + "href": "https://code.visualstudio.com/download", + "_key": "e946c83f2eb8", + "_type": "link" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "80b91bcce9ba" + } + ], + "_type": "block", + "style": "normal", + "_key": "c5783c52cac8" + }, + { + "asset": { + "_type": "reference", + "_ref": "image-46057944042068bf75b8c10ecf400e2a2813f736-1600x966-png" + }, + "_type": "image", + "alt": "VSCode with Nextflow Syntax Highlighting", + "_key": "a25636a4b194" + }, + { + "_type": "block", + "style": "normal", + "_key": "227c18352d20", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "e83f71ae33db", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "7fd3d6ab271d", + "markDefs": [], + "children": [ + { + "text": "Other syntax highlighting has been made available by the community including:", + "_key": "1cdfa72fad84", + "_type": "span", + "marks": [] + } + ] + }, + { + "listItem": "bullet", + "markDefs": [ + { + "href": "https://atom.io/packages/language-nextflow", + "_key": "f03597c5e24f", + "_type": "link" + } + ], + "children": [ + { + "_key": "836772c4f5d80", + "_type": "span", + "marks": [ + "f03597c5e24f" + ], + "text": "Atom" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "f97e2741119a" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "777bc95be6af", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/LukeGoodsell/nextflow-vim", + "_key": "cb549090dfbd" + } + ], + "children": [ + { + "text": "Vim", + "_key": "15a967de3eac0", + "_type": "span", + "marks": [ + "cb549090dfbd" + ] + } + ] + }, + { + "style": "normal", + "_key": "4d3f58f59c4b", + "listItem": "bullet", + "markDefs": [ + { + "href": "https://github.com/Emiller88/nextflow-mode", + "_key": "6032a900b326", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "6032a900b326" + ], + "text": "Emacs", + "_key": "3588a809d9770" + } + ], + "level": 1, + "_type": "block" + }, + { + "_key": "b412b511a7ae", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": 
"9bea3e344026" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_key": "4c371ac95ade", + "_type": "span", + "marks": [], + "text": "3. The Nextflow REPL console" + } + ], + "_type": "block", + "style": "h2", + "_key": "45830c179224" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The Nextflow console is a REPL (read-eval-print loop) environment that allows one to quickly test part of a script or segments of Nextflow code in an interactive manner. This can be particularly useful to quickly evaluate channels and operators behaviour and prototype small snippets that can be included in your pipeline scripts.", + "_key": "650c3df33eee", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "bb19bdaecf5b" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "4d5e7c19b132" + } + ], + "_type": "block", + "style": "normal", + "_key": "e448cf228140" + }, + { + "_key": "f6f4b6393b47", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Start the Nextflow console with the following command:", + "_key": "3ce365642e62" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "ae7b461d75a7", + "markDefs": [], + "children": [ + { + "_key": "89da3ec9e8a0", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "222a73d8b62f", + "code": "$ nextflow console", + "_type": "code" + }, + { + "_type": "block", + "style": "normal", + "_key": "12c5de946f00", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "34372c2d80f6" + } + ] + }, + { + "_type": "image", + "alt": "Nextflow REPL console", + "_key": "a9cb994c6431", + "asset": { + "_ref": "image-6f3138f809d97af5e3a85e3ab2497fd0176b8e75-1174x810-png", + "_type": "reference" + } + }, + { + "_key": "cea3a2ad26a3", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "46d3b977f70e" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Use the ", + "_key": "5c606cfc35d6" + }, + { + "text": "CTRL+R", + "_key": "d153eb3b5345", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_key": "60e393c29855", + "_type": "span", + "marks": [], + "text": " keyboard shortcut to run (" + }, + { + "_key": "0ab12a8bb048", + "_type": "span", + "marks": [ + "code" + ], + "text": "⌘+R" + }, + { + "_key": "9a3c425c5b29", + "_type": "span", + "marks": [], + "text": "on the Mac) and to evaluate your code. You can also evaluate by selecting code and use the " + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Run selection", + "_key": "f7c37edd2f60" + }, + { + "marks": [], + "text": ".", + "_key": "4b2cfd278f1e", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "98cf3b40d8f8" + }, + { + "style": "normal", + "_key": "e1be84c5755a", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "884c64999d32", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_key": "868fd834d18f", + "markDefs": [], + "children": [ + { + "text": "4. 
Containerize all the things", + "_key": "3fbcc4c2e122", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2" + }, + { + "_type": "block", + "style": "normal", + "_key": "8b6e7e88a0ba", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Containers are a key component of developing scalable and reproducible pipelines. We can build Docker images that contain an OS, all libraries and the software we need for each process. Pipelines are typically developed using Docker containers and tooling as these can then be used on many different container engines such as Singularity and Podman.", + "_key": "ff8176eae89e" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "fca42ce6f204" + } + ], + "_type": "block", + "style": "normal", + "_key": "396d31dc18f8" + }, + { + "style": "normal", + "_key": "221f5fdd90e0", + "markDefs": [ + { + "_type": "link", + "href": "https://docs.docker.com/engine/install/", + "_key": "c8cbc8462485" + } + ], + "children": [ + { + "text": "Once you have ", + "_key": "3038f38b6545", + "_type": "span", + "marks": [] + }, + { + "_key": "2be77154c648", + "_type": "span", + "marks": [ + "c8cbc8462485" + ], + "text": "downloaded and installed Docker" + }, + { + "_key": "a261b0a8af1c", + "_type": "span", + "marks": [], + "text": ", try pull a public docker image:" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "6179d57c67cd", + "markDefs": [], + "children": [ + { + "_key": "16639c0ef72d", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "_type": "code", + "_key": "8b095c2ed71c", + "code": "$ docker pull quay.io/nextflow/rnaseq-nf" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "1ef1ff3a75da", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "f70b13f87b8e" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "To run a Nextflow pipeline using the latest tag of the image, we can use:", + "_key": "293c61d93b7e" + } + ], + "_type": "block", + "style": "normal", + "_key": "d192baf967cb" + }, + { + "_type": "block", + "style": "normal", + "_key": "b775af109bba", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "2b8220fe7194", + "_type": "span" + } + ] + }, + { + "code": "$ nextflow run nextflow-io/rnaseq-nf -with-docker quay.io/nextflow/rnaseq-nf:latest", + "_type": "code", + "_key": "1bcfb7419981" + }, + { + "_type": "block", + "style": "normal", + "_key": "08093eed6c02", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "398e99711b63", + "_type": "span", + "marks": [] + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "53db58d0f758", + "markDefs": [ + { + "href": "https://seqera.io/training/#_manage_dependencies_containers", + "_key": "107be351aa22", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "To learn more about building Docker containers, see the ", + "_key": "f45d07e3d9f7" + }, + { + "marks": [ + "107be351aa22" + ], + "text": "Seqera Labs tutorial", + "_key": "575fcf632f19", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " on managing dependencies with containers.", + "_key": "a426844c2774" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "17addf9ba414", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": 
"77f2cd784c47" + }, + { + "_type": "block", + "style": "normal", + "_key": "16d8f5e61dc9", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Additionally, you can install the VSCode marketplace addon for Docker to manage and interactively run and test the containers and images on your machine. You can even connect to remote registries such as Dockerhub, Quay.io, AWS ECR, Google Cloud and Azure Container registries.", + "_key": "3f9a4fc8cfc3" + } + ] + }, + { + "style": "normal", + "_key": "f4896c16b10f", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "ed6edc30a081" + } + ], + "_type": "block" + }, + { + "_key": "fc74ddac5c77", + "asset": { + "_type": "reference", + "_ref": "image-7130bbb5139e37b38e4554d3d8ae0683571479e1-1600x891-png" + }, + "_type": "image", + "alt": "VSCode with Docker Extension" + }, + { + "children": [ + { + "text": "", + "_key": "93b79a7e589d", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "36e1cbe648b7", + "markDefs": [] + }, + { + "style": "h2", + "_key": "8e10bed8173a", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "5. Use Tower to monitor your pipelines", + "_key": "391785249836" + } + ], + "_type": "block" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://tower.nf", + "_key": "04888447b21e" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "When developing real-world pipelines, it can become inevitable that pipelines will require significant resources. For long-running workflows, monitoring becomes all the more crucial. With ", + "_key": "73e6016f3af2" + }, + { + "_type": "span", + "marks": [ + "04888447b21e" + ], + "text": "Nextflow Tower", + "_key": "2b0b58007cc8" + }, + { + "marks": [], + "text": ", we can invoke any Nextflow pipeline execution from the CLI and use the integrated dashboard to follow the workflow run.", + "_key": "b44820b00a42", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "2ff2e59e1d6a" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "309de3fbc39a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "93453427f64b", + "markDefs": [] + }, + { + "_key": "13812497c7a5", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Sign-in to Tower using your GitHub credentials, obtain your token from the Getting Started page and export them into your terminal, ", + "_key": "d68d63824b45" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "~/.bashrc", + "_key": "e9c904a62fa4" + }, + { + "_key": "d7ee7be8187a", + "_type": "span", + "marks": [], + "text": ", or include them in your nextflow.config." 
+ } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "08441c45b30e", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "b9eae55ecc18", + "_type": "span" + } + ] + }, + { + "_key": "23b87d6a070c", + "code": "$ export TOWER_ACCESS_TOKEN=my-secret-tower-key", + "_type": "code" + }, + { + "_key": "deec3f2da1ca", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "ea25815762de", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "8e8dd1bac999", + "markDefs": [], + "children": [ + { + "_key": "5a8effaafbbb", + "_type": "span", + "marks": [], + "text": "We can then add the " + }, + { + "_key": "fd9519c060bb", + "_type": "span", + "marks": [ + "code" + ], + "text": "-with-tower" + }, + { + "_key": "9f033abe5cde", + "_type": "span", + "marks": [], + "text": " child-option to any Nextflow run command. A URL with the monitoring dashboard will appear." + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "aa2c69a6559a", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "30b17bc29b9c" + } + ], + "_type": "block" + }, + { + "code": "$ nextflow run nextflow-io/rnaseq-nf -with-tower", + "_type": "code", + "_key": "3d94f6e51c5c" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "15f91331ffef" + } + ], + "_type": "block", + "style": "normal", + "_key": "436ea22b13fc" + }, + { + "_type": "block", + "style": "h2", + "_key": "771208634a3a", + "markDefs": [], + "children": [ + { + "text": "6. nf-core tools", + "_key": "79853855730e", + "_type": "span", + "marks": [] + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "34a901105331", + "markDefs": [ + { + "_key": "b2bf9ddf48cd", + "_type": "link", + "href": "https://nf-co.re/" + } + ], + "children": [ + { + "marks": [ + "b2bf9ddf48cd" + ], + "text": "nf-core", + "_key": "a7b530278830", + "_type": "span" + }, + { + "_key": "5a92affff4a0", + "_type": "span", + "marks": [], + "text": " is a community effort to collect a curated set of analysis pipelines built using Nextflow. The pipelines continue to come on in leaps and bounds and nf-core tools is a python package for helping with developing nf-core pipelines. It includes options for listing, creating, and even downloading pipelines for offline usage." 
+ } + ] + }, + { + "style": "normal", + "_key": "4c4ad91e4993", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "8ed9dc297b8b" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "094c520696df", + "_type": "span", + "marks": [], + "text": "These tools are particularly useful for developers contributing to the community pipelines on " + }, + { + "text": "GitHub", + "_key": "52e4c9139deb", + "_type": "span", + "marks": [ + "e65204e3dae2" + ] + }, + { + "_type": "span", + "marks": [], + "text": " with linting and syncing options that keep pipelines up-to-date against nf-core guidelines.", + "_key": "86235dbddeb6" + } + ], + "_type": "block", + "style": "normal", + "_key": "c1555964fba5", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nf-core/", + "_key": "e65204e3dae2" + } + ] + }, + { + "style": "normal", + "_key": "e61e548e5237", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "1cc7c456d37f" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "6f213b44e00b", + "markDefs": [], + "children": [ + { + "_key": "5457ae7b8f5e", + "_type": "span", + "marks": [ + "code" + ], + "text": "nf-core tools" + }, + { + "marks": [], + "text": " is a python package that can be installed in your development environment from Bioconda or PyPi.", + "_key": "b18abeb98d06", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "a28e8e45f0b2", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "3b4d9500f2a7" + } + ], + "_type": "block" + }, + { + "_type": "code", + "_key": "68ee1c400169", + "code": "$ conda install nf-core" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "2362a6261961" + } + ], + "_type": "block", + "style": "normal", + "_key": "e995aa968870", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "or", + "_key": "3e5fc6ac8700" + } + ], + "_type": "block", + "style": "normal", + "_key": "8365c21628b4", + "markDefs": [] + }, + { + "children": [ + { + "_key": "031e985c2bd4", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "42452766ba52", + "markDefs": [] + }, + { + "code": "$ pip install nf-core", + "_type": "code", + "_key": "7d90d1e8a372" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "d8070892db5b", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "9896671f827b" + }, + { + "_type": "image", + "alt": "nf-core tools", + "_key": "9ffdd5c319f9", + "asset": { + "_ref": "image-ebd33dce7e6c15adb14e331a26bb6e647a2a299a-1450x1022-png", + "_type": "reference" + } + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "2f92dc68b5a9", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "817387a8bec1", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "text": "Conclusion", + "_key": "1d57f03de998", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2", + "_key": "ab7498283a57" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Developer workspaces are evolving rapidly. 
While your own development environment may be highly dependent on personal preferences, community contributions are keeping Nextflow users at the forefront of the modern developer experience.", + "_key": "b4dadce8b09e" + } + ], + "_type": "block", + "style": "normal", + "_key": "7290c3dc5d2a", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "ef3d89f74f0e", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "06e1b2dce95a" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "b62845ca9057", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/features/codespaces", + "_key": "e74354fc9ce9" + }, + { + "_key": "217845d136cd", + "_type": "link", + "href": "https://www.gitpod.io/" + } + ], + "children": [ + { + "marks": [], + "text": "Solutions such as ", + "_key": "177e7be968a9", + "_type": "span" + }, + { + "_key": "46572964aade", + "_type": "span", + "marks": [ + "e74354fc9ce9" + ], + "text": "GitHub Codespaces" + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "413a457618a4" + }, + { + "_type": "span", + "marks": [ + "217845d136cd" + ], + "text": "Gitpod", + "_key": "0b9fb34520ef" + }, + { + "_type": "span", + "marks": [], + "text": " are now offering extendible, cloud-based options that may well be the future. I’m sure we can all look forward to a one-click, pre-configured, cloud-based, Nextflow developer environment sometime soon!", + "_key": "843335091dc0" + } + ] + } + ], + "_type": "blogPost", + "author": { + "_ref": "evan-floden", + "_type": "reference" + }, + "_updatedAt": "2024-10-02T15:35:31Z", + "publishedAt": "2021-03-04T07:00:00.000Z", + "meta": { + "description": "This guide is designed to walk you through a basic development setup for writing Nextflow pipelines.", + "slug": { + "current": "nextflow-developer-environment" + } + }, + "_createdAt": "2024-09-25T14:16:02Z" + }, + { + "_id": "744f5b6f1509", + "title": "Leveraging nf-test for enhanced quality control in nf-core", + "_createdAt": "2024-09-25T14:18:27Z", + "_type": "blogPost", + "body": [ + { + "_type": "block", + "style": "h1", + "_key": "d78dc1997976", + "children": [ + { + "text": "The ever-changing landscape of bioinformatics", + "_key": "d942023c82c4", + "_type": "span" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Reproducibility is an important attribute of all good science. This is especially true in the realm of bioinformatics, where software is ", + "_key": "f1dbb3706962" + }, + { + "_key": "15bf44e9649f", + "_type": "span", + "marks": [ + "strong" + ], + "text": "hopefully" + }, + { + "_type": "span", + "marks": [], + "text": " being updated, and pipelines are ", + "_key": "aafd8d3af8cf" + }, + { + "text": "ideally", + "_key": "45a0fa5cde67", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "marks": [], + "text": " being maintained. Improvements and maintenance are great, but they also bring about an important question: Do bioinformatics tools and pipelines continue to run successfully and produce consistent results despite these changes? 
Fortunately for us, there is an existing approach to ensure software reproducibility: testing.",
              "_key": "52abfb103fca",
              "_type": "span"
            }
          ],
          "_type": "block",
          "style": "normal",
          "_key": "7b9073c525fc",
          "markDefs": []
        },
        {
          "_key": "53aca8a9dac0",
          "children": [
            {
              "_key": "d4535dfd980f",
              "_type": "span",
              "text": ""
            }
          ],
          "_type": "block",
          "style": "normal"
        },
        {
          "_key": "d0c633c915c7",
          "_type": "block"
        },
        {
          "children": [
            {
              "_type": "span",
              "text": "The Wonderful World of Testing",
              "_key": "11007d896b7d"
            }
          ],
          "_type": "block",
          "style": "h1",
          "_key": "c367b0ab0276"
        },
        {
          "style": "normal",
          "_key": "210473de2d41",
          "markDefs": [],
          "children": [
            {
              "_type": "span",
              "marks": [],
              "text": "> \"Software testing is the process of evaluating and verifying that a software product does what it is supposed to do,\" > Lukas Forer, co-creator of nf-test.",
              "_key": "3b6dc1f09c1e"
            }
          ],
          "_type": "block"
        },
        {
          "children": [
            {
              "_type": "span",
              "text": "",
              "_key": "4f79eb3f6d53"
            }
          ],
          "_type": "block",
          "style": "normal",
          "_key": "63710736a2bc"
        },
        {
          "markDefs": [],
          "children": [
            {
              "_key": "d9d5dd72b69a",
              "_type": "span",
              "marks": [],
              "text": "Software testing has two primary purposes: determining whether an operation continues to run successfully after changes are made, and comparing outputs across runs to see if they are consistent. Testing can alert the developer that an output has changed so that an appropriate fix can be made. Admittedly, there are some instances when altered outputs are intentional (i.e., improving a tool might lead to better, and therefore different, results). However, even in these scenarios, it is important to know what has changed, so that no unintentional changes are introduced during an update."
            }
          ],
          "_type": "block",
          "style": "normal",
          "_key": "e45d889f3e33"
        },
        {
          "_type": "block",
          "style": "normal",
          "_key": "48023f7774ed",
          "children": [
            {
              "text": "",
              "_key": "a339a1397c94",
              "_type": "span"
            }
          ]
        },
        {
          "_key": "77444d1ae950",
          "children": [
            {
              "_type": "span",
              "text": "Writing effective tests",
              "_key": "8d8ad1bbc95f"
            }
          ],
          "_type": "block",
          "style": "h1"
        },
        {
          "markDefs": [],
          "children": [
            {
              "_type": "span",
              "marks": [],
              "text": "Although having any test is certainly better than having no tests at all, there are several considerations to keep in mind when adding tests to pipelines and/or tools to maximize their effectiveness. 
These considerations can be broadly categorized into two groups:", + "_key": "881094ed647a" + } + ], + "_type": "block", + "style": "normal", + "_key": "6d8a94206b0a" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "0cb15a73ca49" + } + ], + "_type": "block", + "style": "normal", + "_key": "9cb16e3704fe" + }, + { + "_key": "7419bcb18756", + "listItem": "bullet", + "children": [ + { + "text": "Which inputs/functionalities should be tested?", + "_key": "2dfed6160324", + "_type": "span", + "marks": [] + }, + { + "text": "What contents should be tested?", + "_key": "64d4f141b363", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "b5bac8bcfc4b", + "children": [ + { + "text": "", + "_key": "65e27d3e4316", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "h2", + "_key": "2cbeb7565520", + "children": [ + { + "_type": "span", + "text": "Consideration 1: Testing inputs/functionality", + "_key": "b8c60e3b6e65" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "c8994834851d", + "markDefs": [ + { + "_type": "link", + "href": "https://nf-co.re/modules/fastqc", + "_key": "a0534635bae5" + }, + { + "_type": "link", + "href": "https://www.geeksforgeeks.org/test-design-coverage-in-software-testing/", + "_key": "3c4057a25f4a" + }, + { + "_type": "link", + "href": "https://nf-co.re/modules/bowtie2_align", + "_key": "f0209f4f4a5b" + } + ], + "children": [ + { + "marks": [], + "text": "Generally, software will have a default or most common use case. For instance, the nf-core ", + "_key": "919508a2c099", + "_type": "span" + }, + { + "text": "FastQC", + "_key": "ef668a2e4d1e", + "_type": "span", + "marks": [ + "a0534635bae5" + ] + }, + { + "text": " module is commonly used to assess the quality of paired-end reads in FastQ format. However, this is not the only way to use the FastQC module. Inputs can also be single-end/interleaved FastQ files, BAM files, or can contain reads from multiple samples. Each input type is analyzed differently by FastQC, and therefore, to increase your test coverage (", + "_key": "aeee8dd56191", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "3c4057a25f4a" + ], + "text": "\"the degree to which a test or set of tests exercises a particular program or system\"", + "_key": "5d3e2b5da7de" + }, + { + "text": "), a test should be written for each possible input. Additionally, different settings can change how a process is executed. For example, in the ", + "_key": "be7231b257a1", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "f0209f4f4a5b" + ], + "text": "bowtie2/align", + "_key": "212c74902511", + "_type": "span" + }, + { + "_key": "b12ec65bbf06", + "_type": "span", + "marks": [], + "text": " module, aside from input files, the " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "save_unaligned", + "_key": "fc97720fa814" + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "dd7961a71879" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "sort_bam", + "_key": "5de54b50cd12" + }, + { + "_type": "span", + "marks": [], + "text": " parameters can alter how this module functions and the outputs it generates. Thus, tests should be written for each possible scenario. When writing tests, aim to consider as many variations as possible. If some are missed, don't worry! Additional tests can be added later. 
Discovering these different use cases and how to address/test them is part of the development process.", + "_key": "212ceb1406f9" + } + ] + }, + { + "style": "normal", + "_key": "bacac6bb3c60", + "children": [ + { + "text": "", + "_key": "8a580e492e3b", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "2b8b54e164e7", + "children": [ + { + "_type": "span", + "text": "Consideration 2: Testing outputs", + "_key": "3c702cc05af7" + } + ], + "_type": "block", + "style": "h2" + }, + { + "children": [ + { + "_key": "d8bf4f58b93d", + "_type": "span", + "marks": [], + "text": "Once test cases are established, the next step is determining what specifically should be evaluated in each test. Generally, these evaluations are referred to as assertions. Assertions can range from verifying whether a job has been completed successfully to comparing the output channel/file contents between runs. Ideally, tests should incorporate all outputs, although there are scenarios where this is not feasible (for example, outputs containing timestamps or paths). In such cases, it's often best to include at least a portion of the contents from the problematic file or, at the minimum, the name of the file to ensure that it is consistently produced." + } + ], + "_type": "block", + "style": "normal", + "_key": "e60a951811d0", + "markDefs": [] + }, + { + "_key": "14103c699745", + "children": [ + { + "_key": "8771e780fa93", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "37768657518a", + "children": [ + { + "_type": "span", + "text": "Testing in nf-core", + "_key": "4afd6843bf3c" + } + ], + "_type": "block", + "style": "h1" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "nf-core is a community-driven initiative that aims to provide high-quality, Nextflow-based bioinformatics pipelines. The community's emphasis on reproducibility makes testing an essential aspect of the nf-core ecosystem. Until recently, tests were implemented using pytest for modules/subworkflows and test profiles for pipelines. These tests ensured that nf-core components could run successfully following updates. However, at the pipeline level, they did not check file contents to evaluate output consistency. Additionally, using two different testing approaches lacked the standardization nf-core strives for. 
An ideal test framework would integrate tests at all Nextflow development levels (functions, modules, subworkflows, and pipelines) and comprehensively test outputs.", + "_key": "b3cbf377efe7", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "307231c984e3" + }, + { + "_type": "block", + "style": "normal", + "_key": "bc361a1713a3", + "children": [ + { + "_key": "5abfa10ccb3e", + "_type": "span", + "text": "" + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "New and Improved Nextflow Testing with nf-test", + "_key": "2192751d77ec" + } + ], + "_type": "block", + "style": "h1", + "_key": "bc5a624d3dcd" + }, + { + "_key": "303f74c980d9", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/lukfor", + "_key": "ea58be3576b6" + }, + { + "_type": "link", + "href": "https://github.com/seppinho", + "_key": "8c5a2d63478e" + } + ], + "children": [ + { + "marks": [], + "text": "Created by ", + "_key": "8091b5798ed0", + "_type": "span" + }, + { + "text": "Lukas Forer", + "_key": "ad25834d621e", + "_type": "span", + "marks": [ + "ea58be3576b6" + ] + }, + { + "_key": "c2fe24f9c682", + "_type": "span", + "marks": [], + "text": " and " + }, + { + "text": "Sebastian Schönherr", + "_key": "58b81f8f4fc9", + "_type": "span", + "marks": [ + "8c5a2d63478e" + ] + }, + { + "_type": "span", + "marks": [], + "text": ", nf-test has emerged as the leading solution for testing Nextflow pipelines. Their goal was to enhance the evaluation of reproducibility in complex Nextflow pipelines. To this end, they have implemented several notable features, creating a robust testing platform:", + "_key": "50bf9532434d" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "", + "_key": "ff83c2672764", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "76b89eba4f1e" + }, + { + "_key": "dfbd6e0d8974", + "listItem": "bullet", + "children": [ + { + "_key": "079bcd68a42f", + "_type": "span", + "marks": [], + "text": "**Comprehensive Output Testing**: nf-test employs [snapshots](https://www.nf-test.com/docs/assertions/snapshots/) for handling complex data structures. This feature evaluates the contents of any specified output channel/file, enabling comprehensive and reliable tests that ensure data integrity following changes." + }, + { + "text": "**A Consistent Testing Framework for All Nextflow Components**: nf-test provides a unified framework for testing everything from individual functions to entire pipelines, ensuring consistency across all components.", + "_key": "16a489b4c8ef", + "_type": "span", + "marks": [] + }, + { + "text": "**A DSL for Tests**: Designed in the likeness of Nextflow, nf-test's intuitive domain-specific language (DSL) uses 'when' and 'then' blocks to describe expected behaviors in pipelines, facilitating easier test script writing.", + "_key": "7d2befaefec3", + "_type": "span", + "marks": [] + }, + { + "_key": "06f9fee4cb61", + "_type": "span", + "marks": [], + "text": "**Readable Assertions**: nf-test offers a wide range of functions for writing clear and understandable [assertions](https://www.nf-test.com/docs/assertions/assertions/), improving the clarity and maintainability of tests." 
+ }, + { + "text": "**Boilerplate Code Generation**: To accelerate the testing process, nf-test and nf-core tools feature commands that generate boilerplate code, streamlining the development of new tests.", + "_key": "f0963e4d83c7", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "", + "_key": "06810366b339", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "4a26455d8bab" + }, + { + "_key": "876496675fc6", + "children": [ + { + "_type": "span", + "text": "But wait… there's more!", + "_key": "cf76043522dc" + } + ], + "_type": "block", + "style": "h1" + }, + { + "_type": "block", + "style": "normal", + "_key": "a8e934b8ac79", + "markDefs": [ + { + "_type": "link", + "href": "https://nf-co.re/docs/contributing/tutorials/nf-test_assertions", + "_key": "97c2d0c992bc" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The merits of having a consistent and comprehensive testing platform are significantly amplified with nf-test's integration into nf-core. This integration provides an abundance of resources for incorporating nf-test into your Nextflow development. Thanks to this collaboration, you can utilize common nf-test commands via nf-core tools and easily install nf-core modules/subworkflows that already have nf-test implemented. Moreover, an ", + "_key": "14a82e18cb55" + }, + { + "_key": "9540de112e8e", + "_type": "span", + "marks": [ + "97c2d0c992bc" + ], + "text": "expanding collection of examples" + }, + { + "marks": [], + "text": " is available to guide you through adopting nf-test for your projects.", + "_key": "dec1bf20f8ee", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "e74b89ea6a7f", + "children": [ + { + "_key": "a8a1702d13f8", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "h1", + "_key": "4753e2f6d96c", + "children": [ + { + "_type": "span", + "text": "Adding nf-test to pipelines", + "_key": "c53de44fa7a3" + } + ] + }, + { + "_key": "72046114f094", + "markDefs": [ + { + "_type": "link", + "href": "https://nf-co.re/methylseq/", + "_key": "2bc70c7e1e4a" + }, + { + "_type": "link", + "href": "https://nf-co.re/fetchngs", + "_key": "25b095497343" + }, + { + "_type": "link", + "href": "https://nf-co.re/mag", + "_key": "48976d9a8e15" + }, + { + "href": "https://nf-co.re/sarek", + "_key": "1d0f2c26d973", + "_type": "link" + }, + { + "_type": "link", + "href": "https://nf-co.re/readsimulator", + "_key": "ea366d3939f7" + }, + { + "_key": "cd398e3a9de8", + "_type": "link", + "href": "https://nf-co.re/rnaseq" + } + ], + "children": [ + { + "marks": [], + "text": "Several nf-core pipelines have begun to adopt nf-test as their testing framework. Among these, ", + "_key": "dae2a6460660", + "_type": "span" + }, + { + "marks": [ + "2bc70c7e1e4a" + ], + "text": "nf-core/methylseq", + "_key": "58b808e431e5", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " was the first to implement pipeline-level nf-tests as a proof-of-concept. However, since this initial implementation, nf-core maintainers have identified that the existing nf-core pipeline template needs modifications to better support nf-test. These adjustments aim to enhance compatibility with nf-test across components (modules, subworkflows, workflows) and ensure that tests are included and shipped with each component. A more detailed blog post about these changes will be published in the future. 
Following these insights, ", + "_key": "b169e0266c05" + }, + { + "_key": "9491d804515e", + "_type": "span", + "marks": [ + "25b095497343" + ], + "text": "nf-core/fetchngs" + }, + { + "_type": "span", + "marks": [], + "text": " has been at the forefront of incorporating nf-test for testing modules, subworkflows, and at the pipeline level. Currently, fetchngs serves as the best-practice example for nf-test implementation within the nf-core community. Other nf-core pipelines actively integrating nf-test include ", + "_key": "5b97775f9d58" + }, + { + "_key": "a2a3a9f7d95e", + "_type": "span", + "marks": [ + "48976d9a8e15" + ], + "text": "mag" + }, + { + "_key": "85dbda17aa5b", + "_type": "span", + "marks": [], + "text": ", " + }, + { + "_key": "5f4a059f43a0", + "_type": "span", + "marks": [ + "1d0f2c26d973" + ], + "text": "sarek" + }, + { + "_key": "02f7c17112f3", + "_type": "span", + "marks": [], + "text": ", " + }, + { + "_type": "span", + "marks": [ + "ea366d3939f7" + ], + "text": "readsimulator", + "_key": "848074bd3030" + }, + { + "_key": "6ddfae3991d4", + "_type": "span", + "marks": [], + "text": ", and " + }, + { + "_type": "span", + "marks": [ + "cd398e3a9de8" + ], + "text": "rnaseq", + "_key": "f76585352138" + }, + { + "marks": [], + "text": ".", + "_key": "6fd0667e457a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "19ec73d4cd35" + } + ], + "_type": "block", + "style": "normal", + "_key": "3f7c48b90963" + }, + { + "_key": "ebf55d38bd1b", + "children": [ + { + "_type": "span", + "text": "Pipeline development with nf-test", + "_key": "2cff63f2bde3" + } + ], + "_type": "block", + "style": "h1" + }, + { + "markDefs": [ + { + "href": "https://github.com/nf-core/phageannotator", + "_key": "ee9a093ad86d", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "For newer nf-core pipelines, integrating nf-test as early as possible in the development process is highly recommended", + "_key": "f4ddf2b9eb21" + }, + { + "marks": [], + "text": ". An example of a pipeline that has benefitted from the incorporation of nf-tests throughout its development is ", + "_key": "ca5d8030582c", + "_type": "span" + }, + { + "_key": "56c93a20e628", + "_type": "span", + "marks": [ + "ee9a093ad86d" + ], + "text": "phageannotator" + }, + { + "_key": "b8e2adc05dee", + "_type": "span", + "marks": [], + "text": ". Although integrating nf-test during pipeline development has presented challenges, it has offered a unique opportunity to evaluate different testing methodologies and has been instrumental in identifying numerous development errors that might have been overlooked using the previous test profiles approach. Additionally, investing time early on has significantly simplified modifying different aspects of the pipeline, ensuring that functionality and output remain unaffected. For those embarking on creating new Nextflow pipelines, here are a few key takeaways from our experience:" + } + ], + "_type": "block", + "style": "normal", + "_key": "c08bc039b1c1" + }, + { + "style": "normal", + "_key": "deef7bf9f138", + "children": [ + { + "_key": "97e2307b7d93", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "69c39eb6d0a1", + "listItem": "bullet", + "children": [ + { + "_type": "span", + "marks": [], + "text": "**Leverage nf-core modules/subworkflows extensively**. 
Devoting time early to contribute modules/subworkflows to nf-core not only streamlines future development for you and your PR reviewers but also simplifies maintaining, linting, and updating pipeline components through nf-core tools. Furthermore, these modules will likely benefit others in the community with similar research interests.", + "_key": "b073ac7697ce" + }, + { + "_key": "26b9639eae8f", + "_type": "span", + "marks": [], + "text": "**Prioritize incremental changes over large overhauls**. Incremental changes are almost always preferable to large, unwieldy modifications. This approach is particularly beneficial when monitoring and updating nf-tests at the module, subworkflow, and pipeline levels. Introducing too many changes simultaneously can overwhelm both developers and reviewers, making it challenging to track what has been modified and what requires testing. Aim to keep changes straightforward and manageable." + }, + { + "_type": "span", + "marks": [], + "text": "**Facilitate parallel execution of nf-test to generate and test snapshots**. By default, nf-test runs each test sequentially, which can make the process of running multiple tests to generate or updating snapshots time-consuming. Implementing scripts that allow tests to run in parallel—whether via a workload manager or in the cloud—can significantly save time and simplify the process of monitoring tests for pass or fail outcomes.", + "_key": "c68a4e9702ce" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "e048140c3bf6", + "children": [ + { + "_key": "763aef5e2d70", + "_type": "span", + "text": "" + } + ] + }, + { + "style": "h1", + "_key": "54ce450dcb3d", + "children": [ + { + "_type": "span", + "text": "Community and contribution", + "_key": "587bc5c8a72c" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "c3699a07845f", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "nf-core is a community that relies on consistent contributions, evaluation, and feedback from its members to improve and stay up-to-date. This holds true as we transition to a new testing framework as well. Currently, there are two primary ways that people have been contributing in this transition:", + "_key": "85a59be968d2", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "65f5164823c3" + } + ], + "_type": "block", + "style": "normal", + "_key": "c3a333241466" + }, + { + "style": "normal", + "_key": "b0b2ec3b4840", + "listItem": "bullet", + "children": [ + { + "text": "**Adding nf-tests to new and existing nf-core modules/subworkflows**. There has been a recent emphasis on migrating modules/subworkflows from pytest to nf-test because of the advantages mentioned previously. Fortunately, the nf-core team has added very helpful [instructions](https://nf-co.re/docs/contributing/modules#migrating-from-pytest-to-nf-test) to the website, which has made this process much more streamlined.", + "_key": "ad21fd896097", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [], + "text": "**Adding nf-tests to nf-core pipelines**. Another area of focus is the addition of nf-tests to nf-core pipelines. 
This process can be quite difficult for large, complex pipelines, but there are now several examples of pipelines with nf-tests that can be used as a blueprint for getting started ([fetchngs](https://github.com/nf-core/fetchngs/tree/master), [sarek](https://github.com/nf-core/sarek/tree/master), [rnaseq](https://github.com/nf-core/rnaseq/tree/master), [readsimulator](https://github.com/nf-core/readsimulator/tree/master), [phageannotator](https://github.com/nf-core/phageannotator)).", + "_key": "afbd5a4bac78" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "4f2b43c685b2", + "children": [ + { + "_type": "span", + "text": "", + "_key": "784c4553719e" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "c215e4be18ec", + "_type": "span", + "marks": [], + "text": "> These are great areas to work on & contribute in nf-core hackathons" + } + ], + "_type": "block", + "style": "normal", + "_key": "d466715b8643" + }, + { + "_type": "block", + "style": "normal", + "_key": "6522171924d9", + "children": [ + { + "text": "", + "_key": "eac11bf38236", + "_type": "span" + } + ] + }, + { + "_key": "7d5cc9279a82", + "markDefs": [ + { + "_type": "link", + "href": "https://nf-co.re/events/2024/hackathon-march-2024", + "_key": "03e0d409c794" + } + ], + "children": [ + { + "_key": "0a96540be983", + "_type": "span", + "marks": [], + "text": "The nf-core community added a significant number of nf-tests during the recent " + }, + { + "text": "hackathon in March 2024", + "_key": "ef1f53c9d9c3", + "_type": "span", + "marks": [ + "03e0d409c794" + ] + }, + { + "marks": [], + "text": ". Yet the role of the community is not limited to adding test code. A robust testing infrastructure requires nf-core users to identify testing errors, additional test cases, and provide feedback so that the system can continually be improved. Each of us brings a different perspective, and the development-feedback loop that results from collaboration brings about a much more effective, transparent, and inclusive system than if we worked in isolation.", + "_key": "940548547241", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "7ae0a16a710e" + } + ], + "_type": "block", + "style": "normal", + "_key": "c465e55981f4" + }, + { + "_type": "block", + "style": "h1", + "_key": "9ac796b501ea", + "children": [ + { + "_key": "240475bef07f", + "_type": "span", + "text": "Future directions" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "Looking ahead, nf-core and nf-test are poised for tighter integration and significant advancements. Anticipated developments include enhanced testing capabilities, more intuitive interfaces for writing and managing tests, and deeper integration with cloud-based resources. 
These improvements will further solidify the position of nf-core and nf-test at the forefront of bioinformatics workflow management.", + "_key": "e93f3483694e", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "01906c795cdb", + "markDefs": [] + }, + { + "_key": "317325ae8be1", + "children": [ + { + "_type": "span", + "text": "", + "_key": "7e000d3777f7" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "h1", + "_key": "48c17981c567", + "children": [ + { + "_type": "span", + "text": "Conclusion", + "_key": "39c5d50b6759" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "cee9c104b1b5", + "markDefs": [], + "children": [ + { + "text": "The integration of nf-test within the nf-core ecosystem marks a significant leap forward in ensuring the reproducibility and reliability of bioinformatics pipelines. By adopting nf-test, developers and researchers alike can contribute to a culture of excellence and collaboration, driving forward the quality and accuracy of bioinformatics research.", + "_key": "86bf517ba8af", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "95ad7f7bb3b8", + "children": [ + { + "_key": "7ec5f81ab34f", + "_type": "span", + "text": "" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Special thanks to everyone in the #nf-test channel in the nf-core Slack workspace for their invaluable contributions, feedback, and support throughout this adoption. We are immensely grateful for your commitment and look forward to continuing our productive collaboration.", + "_key": "c5069006f601" + } + ], + "_type": "block", + "style": "normal", + "_key": "0474e6707fcd", + "markDefs": [] + } + ], + "author": { + "_ref": "OWAhkDWC92JN5AHHJ7pVfj", + "_type": "reference" + }, + "_rev": "Ot9x7kyGeH5005E3MJ9YZ5", + "meta": { + "slug": { + "current": "nf-test-in-nf-core" + } + }, + "_updatedAt": "2024-09-25T14:18:27Z", + "publishedAt": "2024-04-03T06:00:00.000Z" + }, + { + "_createdAt": "2024-09-25T14:15:55Z", + "publishedAt": "2021-10-21T06:00:00.000Z", + "tags": [], + "meta": { + "slug": { + "current": "configure-git-repositories-with-nextflow" + } + }, + "_rev": "mvya9zzDXWakVjnX4hhZR4", + "body": [ + { + "_key": "db83426eb157", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Git has become the de-facto standard for source-code version control and has seen increasing adoption across the spectrum of software development.", + "_key": "25e9e166af3e" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "29f610d8a848", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "4a6de1edd4c3" + }, + { + "_type": "block", + "style": "normal", + "_key": "29f19ed5e94a", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Nextflow provides built-in support for Git and for the most popular Git hosting platforms, such as GitHub, GitLab and Bitbucket among others, which streamlines managing versions and tracking changes in your pipeline projects and facilitates collaboration across different users.", + "_key": "c5ff8338cd22", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "57a0732420d3", + "children": [ + { + "_type": "span", + "text": "", + "_key": "64aebe8532db" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "text": "In order to access public repositories, Nextflow does not require 
any special configuration, just use the ", + "_key": "27f00eedb788", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "em" + ], + "text": "http", + "_key": "cc37780843f8", + "_type": "span" + }, + { + "text": " URL of the pipeline project you want to run in the run command, for example:", + "_key": "01543c3deb83", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "84a961710db0" + }, + { + "children": [ + { + "text": "", + "_key": "793f65ffd2b3", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "52cae3774c01" + }, + { + "_type": "code", + "_key": "5c4641be7645", + "code": "nextflow run https://github.com/nextflow-io/hello" + }, + { + "_key": "b65a378a75db", + "children": [ + { + "_type": "span", + "text": "", + "_key": "eb553f73fabb" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "e4cbd05c0ff5", + "_type": "span", + "marks": [], + "text": "However, to allow Nextflow to access private repositories you will need to specify the repository credentials, and the server hostname in the case of self-managed Git server installations." + } + ], + "_type": "block", + "style": "normal", + "_key": "dc5d295ad894", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "d54c60ae3560" + } + ], + "_type": "block", + "style": "normal", + "_key": "add6deba8dc8" + }, + { + "children": [ + { + "_type": "span", + "text": "Configure access to private repositories", + "_key": "c0664fa07f41" + } + ], + "_type": "block", + "style": "h2", + "_key": "81e3472dff70" + }, + { + "markDefs": [ + { + "href": "https://www.nextflow.io/docs/edge/sharing.html", + "_key": "539fde056be1", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "This is done through a file named ", + "_key": "1dd67091500d", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "scm", + "_key": "14476fc10cc8" + }, + { + "_type": "span", + "marks": [], + "text": " placed in the ", + "_key": "e62a48c00ab4" + }, + { + "text": "$HOME/.nextflow/", + "_key": "5c31a972ccf6", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "marks": [], + "text": " directory, containing the credentials and other details for accessing a particular Git hosting solution. 
You can refer to the Nextflow documentation for all the ", + "_key": "b1923adff285", + "_type": "span" + }, + { + "_key": "e7b03d8523a4", + "_type": "span", + "marks": [ + "539fde056be1" + ], + "text": "SCM configuration file" + }, + { + "marks": [], + "text": " options.", + "_key": "d0173bb389a5", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "ce8af238c95f" + }, + { + "children": [ + { + "_key": "f931db99a858", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "6b80e41f9eaa" + }, + { + "style": "normal", + "_key": "87e9b1d949eb", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "All of these platforms have their own authentication mechanisms for Git operations which are captured in the ", + "_key": "8cda311afc36" + }, + { + "text": "$HOME/.nextflow/scm", + "_key": "07f9a474d9b9", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_key": "160de1b5bebc", + "_type": "span", + "marks": [], + "text": " file with the following syntax:" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "e7c4e7658470", + "children": [ + { + "_key": "165d3fc21c0a", + "_type": "span", + "text": "" + } + ] + }, + { + "_key": "b0cba06a0f77", + "code": "providers {\n\n '' {\n user = value\n password = value\n ...\n }\n\n '' {\n user = value\n password = value\n ...\n }\n\n}", + "_type": "code" + }, + { + "style": "normal", + "_key": "ecdc9cfc00a9", + "children": [ + { + "_type": "span", + "text": "", + "_key": "8e1bdb0f2423" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "f604b6a11bb4", + "_type": "span", + "marks": [], + "text": "Note: Make sure to enclose the provider name with " + }, + { + "_key": "57226d246dad", + "_type": "span", + "marks": [ + "code" + ], + "text": "'" + }, + { + "_key": "b3079fed105a", + "_type": "span", + "marks": [], + "text": " if it contains a " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "-", + "_key": "1e082162579f" + }, + { + "_type": "span", + "marks": [], + "text": " or a blank character.", + "_key": "98955c7ee38d" + } + ], + "_type": "block", + "style": "normal", + "_key": "4ac7ecccab00", + "markDefs": [] + }, + { + "style": "normal", + "_key": "0f8bdb917351", + "children": [ + { + "text": "", + "_key": "1ff057deab1c", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "As of the 21.09.0-edge release, Nextflow integrates with the following Git providers:", + "_key": "105ad94ebac2" + } + ], + "_type": "block", + "style": "normal", + "_key": "aa77ab205f1c" + }, + { + "children": [ + { + "_key": "3e33240e0286", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "971dca3b837d" + }, + { + "_type": "block", + "style": "h2", + "_key": "0a71528f2249", + "children": [ + { + "_type": "span", + "text": "GitHub", + "_key": "32ba83c0388d" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "57c762fa5d33", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com", + "_key": "f16cd57b9321" + }, + { + "href": "https://github.com/nf-core/", + "_key": "c3f35671c23c", + "_type": "link" + } + ], + "children": [ + { + "marks": [ + "f16cd57b9321" + ], + "text": "GitHub", + "_key": "f38fe88ae894", + "_type": "span" + }, + { + "_key": "a5520c2c8497", + "_type": "span", + "marks": [], + "text": " is one of the most well known Git providers and is home to some of the 
most popular open-source Nextflow pipelines from the " + }, + { + "marks": [ + "c3f35671c23c" + ], + "text": "nf-core", + "_key": "7416702faf2b", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " community project.", + "_key": "74ca1a4ba214" + } + ] + }, + { + "_key": "e7f45456da3c", + "children": [ + { + "_type": "span", + "text": "", + "_key": "7b8e1bf4a5d7" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "7145f788c16c", + "markDefs": [], + "children": [ + { + "text": "If you wish to use Nextflow code from a ", + "_key": "268f28dfc39d", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "public", + "_key": "7083c0d403e8" + }, + { + "_key": "fea4807986e2", + "_type": "span", + "marks": [], + "text": " repository hosted on GitHub.com, then you don't need to provide credentials (" + }, + { + "text": "user", + "_key": "d763e55c075d", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "marks": [], + "text": " and ", + "_key": "ab9488ca8dcf", + "_type": "span" + }, + { + "_key": "7680a9eb1acf", + "_type": "span", + "marks": [ + "code" + ], + "text": "password" + }, + { + "text": ") to pull code from the repository. However, if you wish to interact with a private repository or are running into GitHub API rate limits for public repos, then you must provide elevated access to Nextflow by specifying your credentials in the ", + "_key": "0d1c4e9c305e", + "_type": "span", + "marks": [] + }, + { + "text": "scm", + "_key": "24b11b423013", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_key": "b23692c605cd", + "_type": "span", + "marks": [], + "text": " file." + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "5d2ee70dcef2", + "children": [ + { + "_type": "span", + "text": "", + "_key": "310d483b51c6" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "It is worth noting that ", + "_key": "558521fd3e0e", + "_type": "span" + }, + { + "text": "GitHub recently phased out Git password authentication", + "_key": "dde2ae8f1887", + "_type": "span", + "marks": [ + "583dff4d7666" + ] + }, + { + "_type": "span", + "marks": [], + "text": " and now requires that users supply a more secure GitHub-generated ", + "_key": "162d0521b559" + }, + { + "marks": [ + "em" + ], + "text": "Personal Access Token", + "_key": "5b35024248be", + "_type": "span" + }, + { + "marks": [], + "text": " for authentication. 
With Nextflow, you can specify your ", + "_key": "0d3ac75d0a37", + "_type": "span" + }, + { + "marks": [ + "em" + ], + "text": "personal access token", + "_key": "2b52fae60022", + "_type": "span" + }, + { + "marks": [], + "text": " in the ", + "_key": "61ef76d54e9f", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "password", + "_key": "a315e0d53c0e" + }, + { + "_type": "span", + "marks": [], + "text": " field.", + "_key": "fe7ad2d79f53" + } + ], + "_type": "block", + "style": "normal", + "_key": "b6397fe35b0c", + "markDefs": [ + { + "_key": "583dff4d7666", + "_type": "link", + "href": "https://github.blog/2020-12-15-token-authentication-requirements-for-git-operations/#what-you-need-to-do-today" + } + ] + }, + { + "style": "normal", + "_key": "097d7a5f785c", + "children": [ + { + "_key": "fe993a5ec666", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "3dda18e6c7a3", + "code": "providers {\n\n github {\n user = 'me'\n password = 'my-personal-access-token'\n }\n\n}", + "_type": "code" + }, + { + "style": "normal", + "_key": "6aeccf74c673", + "children": [ + { + "text": "", + "_key": "7cee71510d27", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token", + "_key": "1df04550aa1b" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "To generate a ", + "_key": "86baf07da30f" + }, + { + "marks": [ + "code" + ], + "text": "personal-access-token", + "_key": "0dc18ee2d07e", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " for the GitHub platform, follow the instructions provided ", + "_key": "02b7101844b7" + }, + { + "marks": [ + "1df04550aa1b" + ], + "text": "here", + "_key": "75acef2e4664", + "_type": "span" + }, + { + "text": ". 
Ensure that the token has at a minimum all the permissions in the ", + "_key": "01a36f71288b", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "repo", + "_key": "16f29b294892" + }, + { + "marks": [], + "text": " scope.", + "_key": "a3196bfa2d3f", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "952110a1f518" + }, + { + "_key": "331fe594e726", + "children": [ + { + "_type": "span", + "text": "", + "_key": "01bc884633a8" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "11b4d051166b", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Once you have provided your username and ", + "_key": "684c90c68f13" + }, + { + "_key": "4a951809a0f1", + "_type": "span", + "marks": [ + "em" + ], + "text": "personal access token" + }, + { + "marks": [], + "text": ", as shown above, you can test the integration by pulling the repository code.", + "_key": "18805dd1669d", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "7b7a89cac57d", + "children": [ + { + "_key": "173686cce031", + "_type": "span", + "text": "" + } + ] + }, + { + "_type": "code", + "_key": "6ecadfbf3568", + "code": "nextflow pull https://github.com/user_name/private_repo" + }, + { + "_key": "aed0cdb9d840", + "children": [ + { + "_type": "span", + "text": "", + "_key": "cc67fd760b98" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "ab6e7b269dda", + "children": [ + { + "_type": "span", + "text": "Bitbucket Cloud", + "_key": "a5a0477abecb" + } + ], + "_type": "block", + "style": "h2" + }, + { + "markDefs": [ + { + "href": "https://bitbucket.org/", + "_key": "3136bacc4f44", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "3136bacc4f44" + ], + "text": "Bitbucket", + "_key": "816877ce28f9" + }, + { + "_key": "1ccdc2b55184", + "_type": "span", + "marks": [], + "text": " is a publicly accessible Git solution hosted by Atlassian. Please note that if you are using an on-premises Bitbucket installation, you should follow the instructions for " + }, + { + "marks": [ + "em" + ], + "text": "Bitbucket Server", + "_key": "c4f831c2d355", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " in the following section.", + "_key": "ccca23bab798" + } + ], + "_type": "block", + "style": "normal", + "_key": "039a387ad88f" + }, + { + "_key": "f36e589fd61e", + "children": [ + { + "text": "", + "_key": "626826a5c7ee", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "31c0836aabfd", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "If your Nextflow code is in a public Bitbucket repository, then you don't need to specify your credentials to pull code from the repository. 
However, if you wish to interact with a private repository, you need to provide elevated access to Nextflow by specifying your credentials in the ", + "_key": "28aa45aebcef" + }, + { + "_key": "90c4cfb52b00", + "_type": "span", + "marks": [ + "code" + ], + "text": "scm" + }, + { + "_type": "span", + "marks": [], + "text": " file.", + "_key": "aeb53fae60d5" + } + ] + }, + { + "children": [ + { + "_key": "bbc3c2470ac1", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "48c1901af265" + }, + { + "_key": "85e7ee14d36f", + "markDefs": [], + "children": [ + { + "_key": "d65fecd648fd", + "_type": "span", + "marks": [], + "text": "Please note that Bitbucket Cloud requires your " + }, + { + "marks": [ + "code" + ], + "text": "app password", + "_key": "75ea61922a88", + "_type": "span" + }, + { + "marks": [], + "text": " in the ", + "_key": "d4986cdb224a", + "_type": "span" + }, + { + "_key": "71f6ded09325", + "_type": "span", + "marks": [ + "code" + ], + "text": "password" + }, + { + "_key": "a5d5623fb085", + "_type": "span", + "marks": [], + "text": " field, which is different from your login password." + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "799598abfae9", + "children": [ + { + "_type": "span", + "text": "", + "_key": "a75e9e9cef65" + } + ] + }, + { + "_key": "f9b7a238b925", + "code": "providers {\n\n bitbucket {\n user = 'me'\n password = 'my-app-password'\n }\n\n}", + "_type": "code" + }, + { + "style": "normal", + "_key": "09606f491acd", + "children": [ + { + "_type": "span", + "text": "", + "_key": "9646cce38d3b" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "f038d4d73450", + "markDefs": [ + { + "_type": "link", + "href": "https://support.atlassian.com/bitbucket-cloud/docs/app-passwords/", + "_key": "3aea2cdcb17d" + } + ], + "children": [ + { + "text": "To generate an ", + "_key": "63640d5786f2", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "code" + ], + "text": "app password", + "_key": "063e4e96e5fd", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " for the Bitbucket platform, follow the instructions provided ", + "_key": "7324690b07ac" + }, + { + "marks": [ + "3aea2cdcb17d" + ], + "text": "here", + "_key": "10fae48e6c3e", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ". 
Ensure that the token has at least ", + "_key": "9b15db5cb258" + }, + { + "_key": "433e24cdf803", + "_type": "span", + "marks": [ + "code" + ], + "text": "Repositories: Read" + }, + { + "marks": [], + "text": " permission.", + "_key": "307d6dbc8066", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "8ed900eeeaba", + "children": [ + { + "_type": "span", + "text": "", + "_key": "0ce51a692912" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "86161c3906b6", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Once these settings are saved in ", + "_key": "7571bb64310a" + }, + { + "text": "$HOME/.nextflow/scm", + "_key": "77aa28c2d752", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": ", you can test the integration by pulling the repository code.", + "_key": "4874a251a8c4" + } + ] + }, + { + "children": [ + { + "text": "", + "_key": "90484fd813a0", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "f0b5e40a9568" + }, + { + "code": "nextflow pull https://bitbucket.org/user_name/private_repo", + "_type": "code", + "_key": "6a8d2fbe8bac" + }, + { + "style": "normal", + "_key": "fcb52c9c61ec", + "children": [ + { + "_key": "04e7113d0c4a", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "Bitbucket Server", + "_key": "bb871768db0a", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "716f4ada6f30" + }, + { + "style": "normal", + "_key": "a6856f439aff", + "markDefs": [ + { + "href": "https://www.atlassian.com/software/bitbucket/enterprise", + "_key": "1d753d5a4057", + "_type": "link" + } + ], + "children": [ + { + "marks": [ + "1d753d5a4057" + ], + "text": "Bitbucket Server", + "_key": "909b061e07bf", + "_type": "span" + }, + { + "text": " is a Git hosting solution from Atlassian which is meant for teams that require a self-managed solution. If Nextflow code resides in an open Bitbucket repository, then you don't need to provide credentials to pull code from this repository. 
However, if you wish to interact with a private repository, you need to give elevated access to Nextflow by specifying your credentials in the ", + "_key": "a2f2757daaf5", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "scm", + "_key": "5cbb999b9906" + }, + { + "marks": [], + "text": " file.", + "_key": "1ac4250a9eb7", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "", + "_key": "e3ac27b19546", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "ebedd31342e8" + }, + { + "markDefs": [], + "children": [ + { + "_key": "e3b5be6c0b05", + "_type": "span", + "marks": [], + "text": "For example, if you'd like to call your hosted Bitbucket server as " + }, + { + "text": "mybitbucketserver", + "_key": "bc3d5f1ad4c7", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": ", then you'll need to add the following snippet in your ", + "_key": "56e8677766b7" + }, + { + "text": "~/$HOME/.nextflow/scm", + "_key": "13f105e21bdc", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " file.", + "_key": "d06dee549174" + } + ], + "_type": "block", + "style": "normal", + "_key": "65653a8c49de" + }, + { + "children": [ + { + "_key": "faea57da2d23", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "864eb62586ef" + }, + { + "code": "providers {\n\n mybitbucketserver {\n platform = 'bitbucketserver'\n server = 'https://your.bitbucket.host.com'\n user = 'me'\n password = 'my-password' // OR \"my-token\"\n }\n\n}", + "_type": "code", + "_key": "446030bbf92c" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "e4533169e667" + } + ], + "_type": "block", + "style": "normal", + "_key": "9d9aa4bdaa20" + }, + { + "markDefs": [ + { + "href": "https://confluence.atlassian.com/bitbucketserver/managing-personal-access-tokens-1005339986.html", + "_key": "5d34ac38c04f", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "To generate a ", + "_key": "5cbd32ae7db1" + }, + { + "marks": [ + "em" + ], + "text": "personal access token", + "_key": "cdcb4ddf96ad", + "_type": "span" + }, + { + "text": " for Bitbucket Server, refer to the ", + "_key": "22cd80357709", + "_type": "span", + "marks": [] + }, + { + "text": "Bitbucket Support documentation", + "_key": "b809da251376", + "_type": "span", + "marks": [ + "5d34ac38c04f" + ] + }, + { + "marks": [], + "text": " from Atlassian.", + "_key": "f9a1fcc11db8", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "71f44e6c3538" + }, + { + "style": "normal", + "_key": "0a76ed183ff3", + "children": [ + { + "text": "", + "_key": "e82470410c47", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "1aceafe3a6da", + "markDefs": [], + "children": [ + { + "text": "Once the configuration is saved, you can test the integration by pulling code from a private repository and specifying the ", + "_key": "8ebcdb031c6c", + "_type": "span", + "marks": [] + }, + { + "_key": "85a738fa3172", + "_type": "span", + "marks": [ + "code" + ], + "text": "mybitbucketserver" + }, + { + "text": " Git provider using the ", + "_key": "4b73ecdf6c36", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "code" + ], + "text": "-hub", + "_key": "8962783452b5", + "_type": "span" + }, + { + "marks": [], + "text": " option.", + "_key": 
"d3ac60442424", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "cf073281bdf2", + "children": [ + { + "_type": "span", + "text": "", + "_key": "9a70dedd8fd0" + } + ], + "_type": "block" + }, + { + "_type": "code", + "_key": "be79024fd871", + "code": "nextflow pull https://your.bitbucket.host.com/user_name/private_repo -hub mybitbucketserver" + }, + { + "_key": "4b36d96af813", + "children": [ + { + "_type": "span", + "text": "", + "_key": "f8265188ee09" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "3ee72524fbea", + "markDefs": [ + { + "_key": "a48ed8372354", + "_type": "link", + "href": "https://www.atlassian.com/migration/assess/journey-to-cloud" + }, + { + "_type": "link", + "href": "https://bitbucket.org", + "_key": "0d23da771a8b" + } + ], + "children": [ + { + "marks": [], + "text": "NOTE: It is worth noting that ", + "_key": "6e1ccc176128", + "_type": "span" + }, + { + "_key": "47c8e6a84f05", + "_type": "span", + "marks": [ + "a48ed8372354" + ], + "text": "Atlassian is phasing out the Server offering" + }, + { + "marks": [], + "text": " in favor of cloud product ", + "_key": "60a502b9f779", + "_type": "span" + }, + { + "_key": "413bdc62c318", + "_type": "span", + "marks": [ + "0d23da771a8b" + ], + "text": "bitbucket.org" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "98a7bde8f731" + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "987bab6e65dd" + } + ], + "_type": "block", + "style": "normal", + "_key": "0e9c07416673" + }, + { + "_key": "327a8f593397", + "children": [ + { + "_key": "82b184b0425b", + "_type": "span", + "text": "GitLab" + } + ], + "_type": "block", + "style": "h2" + }, + { + "children": [ + { + "_key": "6ac2e33d1b06", + "_type": "span", + "marks": [ + "1dc07434701a" + ], + "text": "GitLab" + }, + { + "text": " is a popular Git provider that offers features covering various aspects of the DevOps cycle.", + "_key": "3bc887cc9704", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "206e346e8e06", + "markDefs": [ + { + "href": "https://gitlab.com", + "_key": "1dc07434701a", + "_type": "link" + } + ] + }, + { + "_key": "897ad0f10a2b", + "children": [ + { + "_type": "span", + "text": "", + "_key": "0c21d1ec2bbd" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "If you wish to run a Nextflow pipeline from a public GitLab repository, there is no need to provide credentials to pull code. 
However, if you wish to interact with a private repository, then you must give elevated access to Nextflow by specifying your credentials in the ", + "_key": "bb9c58537d98" + }, + { + "_key": "87515f32bd58", + "_type": "span", + "marks": [ + "code" + ], + "text": "scm" + }, + { + "text": " file.", + "_key": "36ace447715f", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "d6539fc8c533", + "markDefs": [] + }, + { + "children": [ + { + "text": "", + "_key": "cb07bd2ae58c", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "0db30d4cae4f" + }, + { + "_type": "block", + "style": "normal", + "_key": "aa2108121075", + "markDefs": [], + "children": [ + { + "_key": "0662258f4c56", + "_type": "span", + "marks": [], + "text": "Please note that you need to specify your " + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "personal access token", + "_key": "fa22cbc111a8" + }, + { + "_type": "span", + "marks": [], + "text": " in the ", + "_key": "d9ae4e3e0063" + }, + { + "marks": [ + "code" + ], + "text": "password", + "_key": "0725a64c71ee", + "_type": "span" + }, + { + "_key": "d3f0ccd67785", + "_type": "span", + "marks": [], + "text": " field." + } + ] + }, + { + "children": [ + { + "text": "", + "_key": "806973d01fc0", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "260a779bb088" + }, + { + "code": "providers {\n\n mygitlab {\n user = 'me'\n password = 'my-password' // or 'my-personal-access-token'\n token = 'my-personal-access-token'\n }\n\n}", + "_type": "code", + "_key": "bda02b722173" + }, + { + "_type": "block", + "style": "normal", + "_key": "6021faa14b33", + "children": [ + { + "_type": "span", + "text": "", + "_key": "1080e7dd2bbf" + } + ] + }, + { + "markDefs": [ + { + "href": "https://gitlab.com", + "_key": "d827eaef861e", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In addition, you can specify the ", + "_key": "af8649a84fee" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "server", + "_key": "3ef012f71fad" + }, + { + "_key": "10cdc9fbd301", + "_type": "span", + "marks": [], + "text": " fields for your self-hosted instance of GitLab, by default " + }, + { + "_key": "4f6038105a29", + "_type": "span", + "marks": [ + "d827eaef861e" + ], + "text": "https://gitlab.com" + }, + { + "_key": "71af938525a6", + "_type": "span", + "marks": [], + "text": " is assumed as the server." + } + ], + "_type": "block", + "style": "normal", + "_key": "d7fb5c4fc836" + }, + { + "_type": "block", + "style": "normal", + "_key": "a5b18581c76f", + "children": [ + { + "_key": "202acb3801c2", + "_type": "span", + "text": "" + } + ] + }, + { + "children": [ + { + "text": "To generate a ", + "_key": "b55c9375d33c", + "_type": "span", + "marks": [] + }, + { + "_key": "28febb25b2b7", + "_type": "span", + "marks": [ + "code" + ], + "text": "personal-access-token" + }, + { + "_type": "span", + "marks": [], + "text": " for the GitLab platform follow the instructions provided ", + "_key": "05fd10f03aa3" + }, + { + "marks": [ + "dc14c6012d22" + ], + "text": "here", + "_key": "e96252c53977", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ". 
Please ensure that the token has at least ", + "_key": "dadac30f1ddb" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "read_repository", + "_key": "8b9db3d15420" + }, + { + "_type": "span", + "marks": [], + "text": ", ", + "_key": "520384d27f7e" + }, + { + "marks": [ + "code" + ], + "text": "read_api", + "_key": "7ac82b2441b3", + "_type": "span" + }, + { + "marks": [], + "text": " permissions.", + "_key": "b8e8135dafd5", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "c355898abbbb", + "markDefs": [ + { + "href": "https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html", + "_key": "dc14c6012d22", + "_type": "link" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "b1a08fae9469", + "children": [ + { + "_type": "span", + "text": "", + "_key": "a8fb5822193d" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Once the configuration is saved, you can test the integration by pulling the repository code using the ", + "_key": "4bed0e8af1ac", + "_type": "span" + }, + { + "text": "-hub", + "_key": "3c8199eaedc7", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_key": "1c4b43f0462c", + "_type": "span", + "marks": [], + "text": " option." + } + ], + "_type": "block", + "style": "normal", + "_key": "16ae59b3713d" + }, + { + "_key": "79f72f13d8ee", + "children": [ + { + "_type": "span", + "text": "", + "_key": "fee13eebacc0" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "1d2b5d657d1c", + "code": "nextflow pull https://gitlab.com/user_name/private_repo -hub mygitlab", + "_type": "code" + }, + { + "_type": "block", + "style": "normal", + "_key": "8b8f0fc08910", + "children": [ + { + "text": "", + "_key": "f51c989b29a5", + "_type": "span" + } + ] + }, + { + "children": [ + { + "_key": "c17b37565aa4", + "_type": "span", + "text": "Gitea" + } + ], + "_type": "block", + "style": "h2", + "_key": "daf8ca6d282d" + }, + { + "style": "normal", + "_key": "70b9e891612b", + "markDefs": [ + { + "_type": "link", + "href": "https://gitea.com/", + "_key": "3f1005d9f1d1" + } + ], + "children": [ + { + "marks": [ + "3f1005d9f1d1" + ], + "text": "Gitea server", + "_key": "5ecb4adc69dd", + "_type": "span" + }, + { + "text": " is an open source Git-hosting solution that can be self-hosted. If you have your Nextflow code in an open Gitea repository, there is no need to specify credentials to pull code from this repository. 
However, if you wish to interact with a private repository, you can give elevated access to Nextflow by specifying your credentials in the ", + "_key": "dc335153b179", + "_type": "span", + "marks": [] + }, + { + "_key": "51da9d53b755", + "_type": "span", + "marks": [ + "code" + ], + "text": "scm" + }, + { + "text": " file.", + "_key": "395fcff2fefc", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "c929f74e6015", + "children": [ + { + "_key": "243bd3a67962", + "_type": "span", + "text": "" + } + ] + }, + { + "style": "normal", + "_key": "79c1e6e3c6dd", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "For example, if you'd like to call your hosted Gitea server ", + "_key": "e755bc42b798" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "mygiteaserver", + "_key": "f01d9854b6fb" + }, + { + "_type": "span", + "marks": [], + "text": ", then you'll need to add the following snippet in your ", + "_key": "f370669df5e8" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "~/$HOME/.nextflow/scm", + "_key": "defa4aefa9ad" + }, + { + "_type": "span", + "marks": [], + "text": " file.", + "_key": "dbf194cd62c6" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "3ea614397f15", + "children": [ + { + "_type": "span", + "text": "", + "_key": "38ef7bf47a1d" + } + ], + "_type": "block" + }, + { + "code": "providers {\n\n mygiteaserver {\n platform = 'gitea'\n server = 'https://gitea.host.com'\n user = 'me'\n password = 'my-password'\n }\n\n}", + "_type": "code", + "_key": "0b75791cc2f0" + }, + { + "_key": "c508af1799de", + "children": [ + { + "_key": "b221cac08a2d", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "3458d0208d0f", + "markDefs": [ + { + "href": "https://docs.gitea.io/en-us/api-usage/", + "_key": "fa597a359003", + "_type": "link" + } + ], + "children": [ + { + "text": "To generate a ", + "_key": "3e127666b123", + "_type": "span", + "marks": [] + }, + { + "_key": "3c0a60b36df9", + "_type": "span", + "marks": [ + "em" + ], + "text": "personal access token" + }, + { + "_type": "span", + "marks": [], + "text": " for your Gitea server, please refer to the ", + "_key": "9509934e123b" + }, + { + "marks": [ + "fa597a359003" + ], + "text": "official guide", + "_key": "0569600bac61", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "e6908cf372eb" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "fd85a28a2e65", + "children": [ + { + "_type": "span", + "text": "", + "_key": "5b257c2bee94" + } + ], + "_type": "block" + }, + { + "_key": "456518c081fb", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Once the configuration is set, you can test the integration by pulling the repository code and specifying ", + "_key": "42b9d3b98cf4" + }, + { + "text": "mygiteaserver", + "_key": "f7124397b2ca", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " as the Git provider using the ", + "_key": "c7f8576b8c9c" + }, + { + "_key": "6bfa2f347122", + "_type": "span", + "marks": [ + "code" + ], + "text": "-hub" + }, + { + "text": " option.", + "_key": "cb68b79bd9e8", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "", + "_key": "5be33f254f4a", + "_type": "span" + } + ], + "_type": "block", + 
"style": "normal", + "_key": "45872c570a11" + }, + { + "_key": "f9690d6ebad1", + "code": "nextflow pull https://git.host.com/user_name/private_repo -hub mygiteaserver", + "_type": "code" + }, + { + "_type": "block", + "style": "normal", + "_key": "45ab883f1536", + "children": [ + { + "_type": "span", + "text": "", + "_key": "c4cf796a4156" + } + ] + }, + { + "style": "h2", + "_key": "842faa05f75a", + "children": [ + { + "_type": "span", + "text": "Azure Repos", + "_key": "0ff65e0c7011" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "c69cf477b4e4", + "markDefs": [ + { + "_type": "link", + "href": "https://azure.microsoft.com/en-us/services/devops/repos/", + "_key": "71a9db316375" + } + ], + "children": [ + { + "text": "Azure Repos", + "_key": "3f28783b3bc6", + "_type": "span", + "marks": [ + "71a9db316375" + ] + }, + { + "_type": "span", + "marks": [], + "text": " is a part of Microsoft Azure Cloud Suite. Nextflow integrates natively Azure Repos via the usual ", + "_key": "5975cb5c35ca" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "~/$HOME/.nextflow/scm", + "_key": "4ce0a21da5b5" + }, + { + "marks": [], + "text": " file.", + "_key": "f5ac8f4f2e8e", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "11d7d1b61c1c", + "children": [ + { + "text": "", + "_key": "b1cf7539ee18", + "_type": "span" + } + ] + }, + { + "_key": "993155389ba5", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "If you'd like to use the ", + "_key": "5baedc8057d2", + "_type": "span" + }, + { + "text": "myazure", + "_key": "4cc6690a6232", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "marks": [], + "text": " alias for the ", + "_key": "e1ad83dd3b02", + "_type": "span" + }, + { + "text": "azurerepos", + "_key": "c90b90d1cd4a", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_key": "b4dd88033882", + "_type": "span", + "marks": [], + "text": " provider, then you'll need to add the following snippet in your " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "~/$HOME/.nextflow/scm", + "_key": "a7938033c173" + }, + { + "_type": "span", + "marks": [], + "text": " file.", + "_key": "a295d473fce4" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "ec3df0777828" + } + ], + "_type": "block", + "style": "normal", + "_key": "d03f25c33418" + }, + { + "_type": "code", + "_key": "236210736f10", + "code": "providers {\n\n myazure {\n server = 'https://dev.azure.com'\n platform = 'azurerepos'\n user = 'me'\n token = 'my-api-token'\n }\n\n}" + }, + { + "_key": "31c59860e4c4", + "children": [ + { + "text": "", + "_key": "f79099283b02", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "href": "https://docs.microsoft.com/en-us/azure/devops/organizations/accounts/use-personal-access-tokens-to-authenticate?view=azure-devops&tabs=preview-page", + "_key": "127ad6dea324", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "To generate a ", + "_key": "79dc56d5ee56" + }, + { + "marks": [ + "em" + ], + "text": "personal access token", + "_key": "dca3e2327b81", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " for your Azure Repos integration, please refer to the ", + "_key": "c62fa3acd703" + }, + { + "marks": [ + "127ad6dea324" + ], + "text": "official guide", + "_key": "316c2153bd98", + "_type": "span" + }, + { + "marks": [], + 
"text": " on Azure.", + "_key": "7e64d39f1cec", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "48366bae578b" + }, + { + "_key": "76137fc1af3b", + "children": [ + { + "_key": "6150348308d0", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "916ba48ce0cd", + "markDefs": [], + "children": [ + { + "_key": "650626725ba9", + "_type": "span", + "marks": [], + "text": "Once the configuration is set, you can test the integration by pulling the repository code and specifying " + }, + { + "marks": [ + "code" + ], + "text": "myazure", + "_key": "5dec137c2f5c", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " as the Git provider using the ", + "_key": "75dd36910b69" + }, + { + "marks": [ + "code" + ], + "text": "-hub", + "_key": "dc13cac6c64a", + "_type": "span" + }, + { + "_key": "2e222cc77d79", + "_type": "span", + "marks": [], + "text": " option." + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "", + "_key": "f77bcc300a08", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "b9fa19d98dfd" + }, + { + "_type": "code", + "_key": "7f947958ba86", + "code": "nextflow pull https://dev.azure.com/org_name/DefaultCollection/_git/repo_name -hub myazure" + }, + { + "_type": "block", + "style": "normal", + "_key": "896a90e04c53", + "children": [ + { + "_key": "f1a1bf7b13e6", + "_type": "span", + "text": "" + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "Conclusion", + "_key": "4d80672ba830" + } + ], + "_type": "block", + "style": "h2", + "_key": "c0c41d831c66" + }, + { + "_type": "block", + "style": "normal", + "_key": "a907b5ed9a76", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Git is a popular, widely used software system for source code management. The native integration of Nextflow with various Git hosting solutions is an important feature to facilitate reproducible workflows that enable collaborative development and deployment of Nextflow pipelines.", + "_key": "1355f72c8231" + } + ] + }, + { + "_key": "427d8ead829f", + "children": [ + { + "text": "", + "_key": "381758ec6146", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "marks": [], + "text": "Stay tuned for more integrations as we continue to improve our support for various source code management solutions! ", + "_key": "582658966ac5", + "_type": "span" + }, + { + "_key": "a3e3077a02a2", + "_type": "span", + "text": "" + }, + { + "_key": "db42a8490e7d", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "72bb7fea75ff", + "markDefs": [] + } + ], + "_updatedAt": "2024-09-26T09:02:27Z", + "author": { + "_ref": "5bLgfCKN00diCN0ijmWNOF", + "_type": "reference" + }, + "_type": "blogPost", + "title": "Configure Git private repositories with Nextflow", + "_id": "75e287c3b59d" + }, + { + "_rev": "Ot9x7kyGeH5005E3MJ9HX2", + "meta": { + "slug": { + "current": "nextflow-summit-2023-recap" + }, + "description": "After a three-year COVID-related hiatus from in-person events, Nextflow developers and users found their way to Barcelona this October for the 2022 Nextflow Summit. Held at Barcelona’s iconic Agbar tower, this was easily the most successful Nextflow community event yet!" 
+ }, + "_type": "blogPost", + "_createdAt": "2024-09-25T14:17:34Z", + "title": "Nextflow Summit 2023 recap", + "_updatedAt": "2024-09-27T12:48:28Z", + "publishedAt": "2023-10-25T06:00:00.000Z", + "author": { + "_ref": "noel-ortiz", + "_type": "reference" + }, + "body": [ + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Five days of Nextflow awesomeness in Barcelona", + "_key": "dfa6f114da10" + } + ], + "_type": "block", + "style": "h2", + "_key": "13ec8fba09fd", + "markDefs": [] + }, + { + "style": "normal", + "_key": "7115faea0544", + "markDefs": [ + { + "_type": "link", + "href": "https://nf-co.re/events/hackathon", + "_key": "55999ee7f4e6" + }, + { + "_type": "link", + "href": "https://summit.nextflow.io/", + "_key": "3bd42d641edb" + }, + { + "_type": "link", + "href": "https://nextflow.slack.com/archives/C0602TWRT5G", + "_key": "198ca41c4066" + }, + { + "_key": "b574108d0a23", + "_type": "link", + "href": "https://www.youtube.com/playlist?list=PLPZ8WHdZGxmUotnP-tWRVNtuNWpN7xbpL" + } + ], + "children": [ + { + "_key": "559145446d99", + "_type": "span", + "marks": [], + "text": "On Friday, Oct 20, we wrapped up our " + }, + { + "marks": [ + "55999ee7f4e6" + ], + "text": "hackathon", + "_key": "bdaf5551163e", + "_type": "span" + }, + { + "_key": "cd5b2f21de78", + "_type": "span", + "marks": [], + "text": " and " + }, + { + "_type": "span", + "marks": [ + "3bd42d641edb" + ], + "text": "Nextflow Summit", + "_key": "c81796a049df" + }, + { + "_type": "span", + "marks": [], + "text": " in Barcelona, Spain. By any measure, this year’s Summit was our best community event ever, drawing roughly 900 attendees across multiple channels, including in-person attendees, participants in our ", + "_key": "a0b699df91d3" + }, + { + "_type": "span", + "marks": [ + "198ca41c4066" + ], + "text": "#summit-2023", + "_key": "0f94a55ba2c3" + }, + { + "_type": "span", + "marks": [], + "text": " Slack channel, and ", + "_key": "325459a5f8d3" + }, + { + "_type": "span", + "marks": [ + "b574108d0a23" + ], + "text": "Summit Livestream", + "_key": "3bfd251ca155" + }, + { + "_key": "ef1fbd009b5a", + "_type": "span", + "marks": [], + "text": " viewers on YouTube." + } + ], + "_type": "block" + }, + { + "_key": "edab1ba1020a", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "09884e46bd47", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "5f2e408b7949", + "markDefs": [], + "children": [ + { + "_key": "31711ea0a93b", + "_type": "span", + "marks": [], + "text": "The Summit drew attendees, speakers, and sponsors from around the world. 
Over the course of the three-day event, we heard from dozens of impressive speakers working at the cutting edge of life sciences from academia, research, healthcare providers, biotechs, and cloud providers, including:" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "2f72d2d8269c", + "markDefs": [], + "children": [ + { + "_key": "dc157db099d3", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "6d027453abac", + "_type": "span", + "marks": [], + "text": "Australian BioCommons" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "277b4cd9d9f0" + }, + { + "style": "normal", + "_key": "950d9796c52e", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Genomics England", + "_key": "f19768c8befe", + "_type": "span" + } + ], + "level": 1, + "_type": "block" + }, + { + "style": "normal", + "_key": "6063a18f8ffc", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Pixelgen Technologies", + "_key": "c804e17da68b", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "578aea095813", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "University of Tennessee Health Science Center", + "_key": "9e6b6d341917" + } + ] + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Amazon Web Services", + "_key": "14f233728029" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "56a882bd19b2" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "00d51e3f95a3", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Quantitative Biology Center - University of Tübingen", + "_key": "868991d76855", + "_type": "span", + "marks": [] + } + ] + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "509a4fb683e4", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "5aec43b0c33f", + "_type": "span", + "marks": [], + "text": "Biomodal" + } + ] + }, + { + "style": "normal", + "_key": "162fb16137d0", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Matterhorn Studio", + "_key": "749b123373ed", + "_type": "span" + } + ], + "level": 1, + "_type": "block" + }, + { + "children": [ + { + "text": "Centre for Genomic Regulation (CRG)", + "_key": "a7c11f2b0f16", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "ca7d6c5fee0b", + "listItem": "bullet", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "d52420a2f46d", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Heidelberg University Hospital", + "_key": "1a8b5a1aabed", + "_type": "span" + } + ], + "level": 1 + }, + { + "_key": "d7ab65d2ca39", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "b6e5d11d3cec", + "_type": "span", + "marks": [], + "text": "MemVerge" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "a35d25f1401c", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "University of Cambridge", + "_key": "4b36ae5de5a8" + } + ], + "level": 1, + "_type": "block" 
+ }, + { + "_key": "2f889c34a4a8", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Oxford Nanopore Technologies", + "_key": "bb2795b1b81f" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "f1eccc0543c8", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Medical University of Innsbruck", + "_key": "d6cc2fc2255f", + "_type": "span" + } + ] + }, + { + "children": [ + { + "_key": "d00eb5893f35", + "_type": "span", + "marks": [], + "text": "Sano Genetics" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "4340237932e5", + "listItem": "bullet", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Institute of Genetics and Development of Rennes, University of Rennes", + "_key": "91d5758d6962", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "48de40397b33", + "listItem": "bullet" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Ardigen", + "_key": "7372eca861e2" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "fd99dc15c540", + "listItem": "bullet", + "markDefs": [] + }, + { + "_key": "4433ff96742c", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "ZS", + "_key": "63e8fbf4d935", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Wellcome Sanger Institute", + "_key": "4c300de3884d", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "43f7ac878f31" + }, + { + "_key": "86288a145d48", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "SciLifeLab", + "_key": "6456095e4b28", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "_key": "64c4fc360539", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "AstraZeneca UK Ltd", + "_key": "e4a86a0a9b61" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "06f1e7827647", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "University of Texas at Dallas", + "_key": "df6631aa419f", + "_type": "span", + "marks": [] + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_key": "3919ada6a427", + "_type": "span", + "marks": [], + "text": "Seqera" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "8075096fd8fc", + "listItem": "bullet" + }, + { + "style": "normal", + "_key": "b68026fb478b", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "3f5f32007d1f" + } + ], + "_type": "block" + }, + { + "style": "h2", + "_key": "d875f24e2ff2", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The hackathon – advancing the Nextflow ecosystem", + "_key": "6b9388404dae" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "7f638c916e12", + "markDefs": [ + { + "_key": "ab5c0bf6bc4d", + "_type": "link", + "href": "https://github.com/orgs/nf-core/projects/47/views/1" + }, + { + "href": "https://nf-co.re/", + "_key": "7c4f1146311a", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + 
"marks": [], + "text": "The week began with a three-day in-person and virtual nf-core hackathon event. With roughly 100 in-person developers, this was twice the size of our largest Hackathon to date. As with previous Hackathons, participants were divided into project groups, with activities coordinated via a single ", + "_key": "bdd59dd86bad" + }, + { + "_type": "span", + "marks": [ + "ab5c0bf6bc4d" + ], + "text": "GitHub project board", + "_key": "9f6d79c354fb" + }, + { + "_key": "b05499326529", + "_type": "span", + "marks": [], + "text": " focusing on different aspects of " + }, + { + "_type": "span", + "marks": [ + "7c4f1146311a" + ], + "text": "nf-core", + "_key": "08bdcf7834e3" + }, + { + "marks": [], + "text": " and Nextflow, including:", + "_key": "44edd8b707b5", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "6d5d41f8da35", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "f65a0937c414" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Pipelines", + "_key": "475bfe04d882", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "9d8af9de3d97", + "listItem": "bullet" + }, + { + "style": "normal", + "_key": "341e673a9669", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Modules & subworkflows", + "_key": "6467360fdc85" + } + ], + "level": 1, + "_type": "block" + }, + { + "children": [ + { + "marks": [], + "text": "Infrastructure", + "_key": "e5d17d96cd27", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "3a2ccf490a75", + "listItem": "bullet", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow & plugins development", + "_key": "396f4081c00e" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "4f93c5049c19", + "listItem": "bullet", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "7cb96ec653e1", + "markDefs": [], + "children": [ + { + "_key": "b7f913bfdd20", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "This year, the focus of the hackathon was ", + "_key": "acecd627bc2b" + }, + { + "_type": "span", + "marks": [ + "e96da4917250" + ], + "text": "nf-test", + "_key": "205d24356056" + }, + { + "_type": "span", + "marks": [], + "text": ", an open-source testing framework for Nextflow pipelines. 
The team made considerable progress applying nf-test consistently across various nf-core pipelines and modules — and of course, no Hackathon would be complete without a community cooking class, quiz, bingo, a sock hunt, and a scavenger hunt!", + "_key": "3f5cc61b1351" + } + ], + "_type": "block", + "style": "normal", + "_key": "1b066c2eca20", + "markDefs": [ + { + "href": "https://code.askimed.com/nf-test/", + "_key": "e96da4917250", + "_type": "link" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "0ba622c25bf5", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "191c96cd98ce", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "14e66d4a5a06", + "markDefs": [ + { + "_key": "878ef99f81a4", + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-20-hackathon/" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "For an overview of the tremendous progress made advancing the state of Nextflow and nf-core in three short days, view Chris Hakkaart’s talk on ", + "_key": "37a27d488495" + }, + { + "marks": [ + "878ef99f81a4" + ], + "text": "highlights from the nf-core hackathon", + "_key": "307843a6950c", + "_type": "span" + }, + { + "marks": [], + "text": ".", + "_key": "29e2a477e865", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "", + "_key": "0412744e0392", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "9c322c7a503f", + "markDefs": [] + }, + { + "_key": "d0ca596800e1", + "markDefs": [], + "children": [ + { + "text": "The Summit kicks off", + "_key": "83ca0fa4c590", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2" + }, + { + "style": "normal", + "_key": "c1a293dd1ab9", + "markDefs": [ + { + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-18-the-national-nextflow-tower-service-for-australian-researchers/", + "_key": "df9b1c2a8898", + "_type": "link" + }, + { + "_key": "74f32edad4b7", + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-18-analysing-ont-long-read-data-for-cancer-with-nextflow/" + }, + { + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-18-pixelgen-technologies-heart-nextflow/", + "_key": "9a28b4078ee2" + }, + { + "href": "https://nf-co.re/pixelator/1.0.0", + "_key": "857aeee91cac", + "_type": "link" + } + ], + "children": [ + { + "text": "The Summit began on Wednesday Oct 18 with excellent talks from ", + "_key": "1081730797d8", + "_type": "span", + "marks": [] + }, + { + "_key": "f4c1e5a1a091", + "_type": "span", + "marks": [ + "df9b1c2a8898" + ], + "text": "Australian BioCommons" + }, + { + "_key": "f2d3808b6dc1", + "_type": "span", + "marks": [], + "text": " and " + }, + { + "_type": "span", + "marks": [ + "74f32edad4b7" + ], + "text": "Genomics England", + "_key": "cbe8342f675e" + }, + { + "marks": [], + "text": ". 
This was followed by a presentation where ",
+ "_key": "1588ab3ed2d3",
+ "_type": "span"
+ },
+ {
+ "_type": "span",
+ "marks": [
+ "9a28b4078ee2"
+ ],
+ "text": "Pixelgen Technologies",
+ "_key": "40ad4d1efdac"
+ },
+ {
+ "text": " described their unique Molecular Pixelation (MPX) technologies and unveiled their new ",
+ "_key": "7ebf647520ac",
+ "_type": "span",
+ "marks": []
+ },
+ {
+ "text": "nf-core/pixelator",
+ "_key": "ee5b107cb013",
+ "_type": "span",
+ "marks": [
+ "857aeee91cac"
+ ]
+ },
+ {
+ "_key": "4bf69cc54537",
+ "_type": "span",
+ "marks": [],
+ "text": " community pipeline for molecular pixelation assays."
+ }
+ ],
+ "_type": "block"
+ },
+ {
+ "style": "normal",
+ "_key": "ba70f06c5dd7",
+ "markDefs": [],
+ "children": [
+ {
+ "text": "",
+ "_key": "ed625479b624",
+ "_type": "span",
+ "marks": []
+ }
+ ],
+ "_type": "block"
+ },
+ {
+ "style": "normal",
+ "_key": "bfa61036554a",
+ "markDefs": [
+ {
+ "href": "https://nextflow.io/blog/2023/introducing-nextflow-ambassador-program.html",
+ "_key": "eb1bde101012",
+ "_type": "link"
+ },
+ {
+ "_type": "link",
+ "href": "https://nextflow.io/blog/2023/community-forum.html",
+ "_key": "ad2cecb264cc"
+ },
+ {
+ "_key": "89ec88f218b9",
+ "_type": "link",
+ "href": "https://community.seqera.io"
+ },
+ {
+ "_type": "link",
+ "href": "https://nextflow.io/blog/2023/geraldine-van-der-auwera-joins-seqera.html",
+ "_key": "5aec553cf479"
+ },
+ {
+ "_type": "link",
+ "href": "https://www.oreilly.com/library/view/genomics-in-the/9781491975183/",
+ "_key": "5d704377dd00"
+ }
+ ],
+ "children": [
+ {
+ "_type": "span",
+ "marks": [],
+ "text": "Next, Seqera’s Phil Ewels took the stage to provide a series of community updates, including the announcement of a new ",
+ "_key": "e5b68d5732d0"
+ },
+ {
+ "text": "Nextflow Ambassador",
+ "_key": "43e5a69e4acc",
+ "_type": "span",
+ "marks": [
+ "eb1bde101012"
+ ]
+ },
+ {
+ "_type": "span",
+ "marks": [],
+ "text": " program, ",
+ "_key": "adda4940eb01"
+ },
+ {
+ "_key": "3178039d4029",
+ "_type": "span",
+ "marks": [
+ "ad2cecb264cc"
+ ],
+ "text": "a new community forum"
+ },
+ {
+ "_type": "span",
+ "marks": [],
+ "text": " at ",
+ "_key": "915ac277c5b7"
+ },
+ {
+ "text": "community.seqera.io",
+ "_key": "2eb943a6e7a3",
+ "_type": "span",
+ "marks": [
+ "89ec88f218b9"
+ ]
+ },
+ {
+ "text": ", and the exciting appointment of ",
+ "_key": "403fbe290a5f",
+ "_type": "span",
+ "marks": []
+ },
+ {
+ "marks": [
+ "5aec553cf479"
+ ],
+ "text": "Geraldine Van der Auwera",
+ "_key": "5292a77ebd78",
+ "_type": "span"
+ },
+ {
+ "marks": [],
+ "text": " as lead developer advocate for Nextflow. Geraldine is well known for her work on GATK, WDL, and Terra.bio and is the co-author of the book ",
+ "_key": "8069e0901ddf",
+ "_type": "span"
+ },
+ {
+ "text": "Genomics in the Cloud",
+ "_key": "1ea8773a5167",
+ "_type": "span",
+ "marks": [
+ "5d704377dd00"
+ ]
+ },
+ {
+ "marks": [],
+ "text": ". 
As Geraldine assumes leadership of the developer advocacy team, Phil will spend more time focusing on open-source development, as product manager of open source at Seqera.",
+ "_key": "1ec3523ea70e",
+ "_type": "span"
+ }
+ ],
+ "_type": "block"
+ },
+ {
+ "style": "normal",
+ "_key": "061df5d01f09",
+ "markDefs": [],
+ "children": [
+ {
+ "_type": "span",
+ "marks": [],
+ "text": "",
+ "_key": "d89723197808"
+ }
+ ],
+ "_type": "block"
+ },
+ {
+ "_key": "ee76d656f349",
+ "asset": {
+ "_ref": "image-1f9c53d8f6d591fa2bb366f2d4f855f964f394b8-1200x661-jpg",
+ "_type": "reference"
+ },
+ "_type": "image",
+ "alt": "Hackathon 2023 photo"
+ },
+ {
+ "children": [
+ {
+ "text": "Seqera’s Evan Floden shared his vision of the modern biotech stack for open science, highlighting recent developments at Seqera, including a revamped ",
+ "_key": "bf490f12635f",
+ "_type": "span",
+ "marks": []
+ },
+ {
+ "_key": "b69e9d36c04c",
+ "_type": "span",
+ "marks": [
+ "0538a636d073"
+ ],
+ "text": "Seqera platform"
+ },
+ {
+ "marks": [],
+ "text": ", new ",
+ "_key": "2f0f56a5a636",
+ "_type": "span"
+ },
+ {
+ "_key": "52f6fbba4423",
+ "_type": "span",
+ "marks": [
+ "d5b2801dfc0d"
+ ],
+ "text": "Data Explorer"
+ },
+ {
+ "_type": "span",
+ "marks": [],
+ "text": " functionality, and an exciting glimpse of the new Data Studios feature now in private preview. You can view ",
+ "_key": "de9c08d010f1"
+ },
+ {
+ "marks": [
+ "3759c14fb96c"
+ ],
+ "text": "Evan’s full talk here",
+ "_key": "67815eeb59ca",
+ "_type": "span"
+ },
+ {
+ "_key": "1575362dfbbb",
+ "_type": "span",
+ "marks": [],
+ "text": "."
+ }
+ ],
+ "_type": "block",
+ "style": "normal",
+ "_key": "1428e1d6e3cf",
+ "markDefs": [
+ {
+ "_type": "link",
+ "href": "https://seqera.io/platform/",
+ "_key": "0538a636d073"
+ },
+ {
+ "_type": "link",
+ "href": "https://seqera.io/blog/introducing-data-explorer/",
+ "_key": "d5b2801dfc0d"
+ },
+ {
+ "_type": "link",
+ "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-18-modern-biotech/",
+ "_key": "3759c14fb96c"
+ }
+ ]
+ },
+ {
+ "children": [
+ {
+ "_type": "span",
+ "marks": [],
+ "text": "",
+ "_key": "8415f7687a23"
+ }
+ ],
+ "_type": "block",
+ "style": "normal",
+ "_key": "4d424bbb023f",
+ "markDefs": []
+ },
+ {
+ "children": [
+ {
+ "marks": [],
+ "text": "A highlight was the keynote delivered by Erik Garrison of the University of Tennessee Health Science Center. In his talk, ",
+ "_key": "5b3c49cdcd98",
+ "_type": "span"
+ },
+ {
+ "marks": [
+ "8b363a582379"
+ ],
+ "text": "Biological revelations at the frontiers of a draft human pangenome reference",
+ "_key": "a2e5a7062f5e",
+ "_type": "span"
+ },
+ {
+ "_key": "3cb951fcc8f5",
+ "_type": "span",
+ "marks": [],
+ "text": ", Erik shared how his team's cutting-edge work applying new computational methods in the context of the Human Pangenome Project has yielded the most complete picture of human sequence evolution available to date."
+ } + ], + "_type": "block", + "style": "normal", + "_key": "6894e65d9026", + "markDefs": [ + { + "_key": "8b363a582379", + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-18-erik-garrison/" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "f26b2a607fdf", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "ba2db8650abb" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Day one wrapped up with a surprise ", + "_key": "fd62ca11e3e1" + }, + { + "text": "announcement", + "_key": "bae5880dae50", + "_type": "span", + "marks": [ + "27a0d5a00c46" + ] + }, + { + "_type": "span", + "marks": [], + "text": " that Seqera has been confirmed as the official High-Performance Computing Supplier for Alinghi Red Bull Racing at the ", + "_key": "6814471a2c12" + }, + { + "_key": "66c5f1bba5b2", + "_type": "span", + "marks": [ + "d6fb925b8f21" + ], + "text": "37th America’s Cup" + }, + { + "_type": "span", + "marks": [], + "text": " in Barcelona. This was followed by an evening reception hosted by ", + "_key": "5bc1a244f320" + }, + { + "_type": "span", + "marks": [ + "1fc0c5ec60e0" + ], + "text": "Alinghi Red Bull Racing", + "_key": "1051b345a8e9" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "65ae176cda57" + } + ], + "_type": "block", + "style": "normal", + "_key": "97a6e355a954", + "markDefs": [ + { + "_type": "link", + "href": "https://www.globenewswire.com/news-release/2023/10/20/2763899/0/en/Seqera-Sets-Sail-With-Alinghi-Red-Bull-Racing-as-Official-High-Performance-Computing-Supplier.html", + "_key": "27a0d5a00c46" + }, + { + "_type": "link", + "href": "https://www.americascup.com/", + "_key": "d6fb925b8f21" + }, + { + "_type": "link", + "href": "https://alinghiredbullracing.americascup.com/", + "_key": "1fc0c5ec60e0" + } + ] + }, + { + "style": "normal", + "_key": "54617fae136c", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "1691744cab4f" + } + ], + "_type": "block" + }, + { + "style": "h2", + "_key": "23905164f376", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Day two starts off on the right foot", + "_key": "6d20f29c1422" + } + ], + "_type": "block" + }, + { + "markDefs": [ + { + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-19-machine-learning-for-material-science/", + "_key": "0f5284738d0b", + "_type": "link" + }, + { + "_key": "0de813eee44d", + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-19-proteinfold/" + }, + { + "_key": "89852f2877d3", + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-19-nf-co2footprint/" + } + ], + "children": [ + { + "_key": "4e9fb421479a", + "_type": "span", + "marks": [], + "text": "Day two kicked off with a brisk sunrise run along the iconic Barcelona Waterfront attended by a team of hardy Summit participants. 
After that, things kicked into high gear for the morning session with talks on everything from using Nextflow to power " + }, + { + "text": "Machine Learning pipelines for materials science", + "_key": "ad268d7ad4c3", + "_type": "span", + "marks": [ + "0f5284738d0b" + ] + }, + { + "marks": [], + "text": " to ", + "_key": "4e229c6c7f16", + "_type": "span" + }, + { + "marks": [ + "0de813eee44d" + ], + "text": "standardized frameworks for protein structure prediction", + "_key": "020de77475c3", + "_type": "span" + }, + { + "_key": "13f13f9f33f9", + "_type": "span", + "marks": [], + "text": " to discussions on " + }, + { + "_type": "span", + "marks": [ + "89852f2877d3" + ], + "text": "how to estimate the CO2 footprint of pipeline runs", + "_key": "bf103d7111f3" + }, + { + "text": ".", + "_key": "b8f6542a0668", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "a0f553dacc6e" + }, + { + "markDefs": [], + "children": [ + { + "_key": "32527b356919", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "a8f1e668029b" + }, + { + "_type": "image", + "alt": "Summit 2023 photo", + "_key": "7f47ff745f15", + "asset": { + "_ref": "image-25defaeaabff1a7d9d3435f37cbf7c014263d2d0-1200x724-jpg", + "_type": "reference" + } + }, + { + "_type": "block", + "style": "normal", + "_key": "406a0f844ab5", + "markDefs": [ + { + "_key": "b72d4de0c539", + "_type": "link", + "href": "https://seqera.io/fusion/" + }, + { + "_type": "link", + "href": "https://seqera.io/wave/", + "_key": "c4f519475069" + }, + { + "_type": "link", + "href": "https://nextflow.io/docs/latest/process.html#spack", + "_key": "0bd73c94cc37" + }, + { + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-19-brendan-bouffler/", + "_key": "aaced4648329", + "_type": "link" + }, + { + "_type": "link", + "href": "https://github.com/seqeralabs/wave", + "_key": "b472902bbba1" + } + ], + "children": [ + { + "marks": [], + "text": "Nextflow creator and Seqera CTO and co-founder Paolo Di Tommaso provided an update on some of the technologies he and his team have been working on including a deep dive on the ", + "_key": "b288a6126d4a", + "_type": "span" + }, + { + "marks": [ + "b72d4de0c539" + ], + "text": "Fusion file system", + "_key": "3ede39244967", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ". Paolo also delved into ", + "_key": "5a5f3d8fdf8b" + }, + { + "text": "Wave containers", + "_key": "a0ffa5cbe273", + "_type": "span", + "marks": [ + "c4f519475069" + ] + }, + { + "marks": [], + "text": ", discussing the dynamic assembly of containers using the ", + "_key": "e79e04aa2077", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "0bd73c94cc37" + ], + "text": "Spack package manager", + "_key": "f4bdd9a13f96" + }, + { + "_key": "bd8319c06ec8", + "_type": "span", + "marks": [], + "text": ", echoing a similar theme from AWS’s " + }, + { + "_key": "04a8bd9f9d88", + "_type": "span", + "marks": [ + "aaced4648329" + ], + "text": "Brendan Bouffler" + }, + { + "marks": [], + "text": " earlier in the day. 
During the conference, Seqera announced Wave Containers as our latest ", + "_key": "f780c5c2c9aa", + "_type": "span" + }, + { + "text": "open-source", + "_key": "d8e00b3f762c", + "_type": "span", + "marks": [ + "b472902bbba1" + ] + }, + { + "marks": [], + "text": " contribution to the bioinformatics community — a huge contribution to the open science movement.", + "_key": "c59f7c0ac6af", + "_type": "span" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "ac961a6e1160" + } + ], + "_type": "block", + "style": "normal", + "_key": "53e9cdb6c719", + "markDefs": [] + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-19-harshil-patel/", + "_key": "f6c9fffbc89b" + }, + { + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-19-paolo-di-tommaso/", + "_key": "0e99edb6b3f2" + }, + { + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-19-harshil-patel/", + "_key": "3b3595e9888b" + } + ], + "children": [ + { + "text": "Paolo also provided an impressive command-line focused demo of Wave, echoing Harshil Patel’s equally impressive demo earlier in the day focused on ", + "_key": "220f0ac51a49", + "_type": "span", + "marks": [] + }, + { + "_key": "627f983618ab", + "_type": "span", + "marks": [ + "f6c9fffbc89b" + ], + "text": "seqerakit and automation on the Seqera Platform" + }, + { + "marks": [], + "text": ". Both Harshil and Paolo showed themselves to be ", + "_key": "13f2a0fc6fa2", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "\"kings of the live demo\"", + "_key": "4e2c58617a09" + }, + { + "_type": "span", + "marks": [], + "text": " for their command line mastery under pressure! 
You can view ", + "_key": "8d838d64321d" + }, + { + "_type": "span", + "marks": [ + "0e99edb6b3f2" + ], + "text": "Paolo’s talk and demos here", + "_key": "806158bd97a6" + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "6ae8e2cc1e62" + }, + { + "_type": "span", + "marks": [ + "3b3595e9888b" + ], + "text": "Harshil’s talk here", + "_key": "25ac24a79805" + }, + { + "marks": [], + "text": ".", + "_key": "8a8acb8e5901", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "46d8323268ee" + }, + { + "style": "normal", + "_key": "e518ff546fd8", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "f7f3e312a053" + } + ], + "_type": "block" + }, + { + "markDefs": [ + { + "_key": "cae7d8897f8e", + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-19-bringing-spatial-omics-to-nf-core/" + }, + { + "_key": "6331cb2a4c42", + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-19-nf-validation/" + }, + { + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-19-development-of-an-integrated-dna-and-rna-variant-calling-pipeline/", + "_key": "170f78306998" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Talks during day two included ", + "_key": "7f34e5d4fc8d" + }, + { + "marks": [ + "cae7d8897f8e" + ], + "text": "bringing spatial omics to nf-core", + "_key": "f48874cf9bf4", + "_type": "span" + }, + { + "text": ", a discussion of ", + "_key": "d24b5a84b2e6", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "6331cb2a4c42" + ], + "text": "nf-validation", + "_key": "bc0f9da507e2", + "_type": "span" + }, + { + "text": ", and a talk on the ", + "_key": "444371d2e600", + "_type": "span", + "marks": [] + }, + { + "_key": "5b9e4d6fd1f7", + "_type": "span", + "marks": [ + "170f78306998" + ], + "text": "development of an integrated DNA and RNA variant calling pipeline" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "e78f9bc1905b" + } + ], + "_type": "block", + "style": "normal", + "_key": "d2139bda7e00" + }, + { + "style": "normal", + "_key": "20493e4724c0", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "514a25eb86cc", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "346095e51ea9", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Unfortunately, there were too many brilliant speakers and topics to mention them all here, so we’ve provided a handy summary of talks at the end of this post so you can look up topics of interest.", + "_key": "6b9ee0acfc9d" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "086aa99c9b81", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "f9be4873a0f8" + }, + { + "_type": "block", + "style": "normal", + "_key": "c9d051429045", + "markDefs": [ + { + "href": "https://summit.nextflow.io/barcelona/sponsors/", + "_key": "e3299d96a7fa", + "_type": "link" + }, + { + "href": "https://summit.nextflow.io/barcelona/posters/", + "_key": "2fa668892f57", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "The Summit also featured an exhibition area, and attendees visited booths hosted by ", + "_key": "8e577ce4a71b", + "_type": "span" + }, + { + "marks": [ + "e3299d96a7fa" + ], + "text": "event sponsors", + "_key": "8c8069288586", + "_type": "span" + }, + { + 
"text": " between talks and viewed the many excellent ", + "_key": "b7ebeef68b83", + "_type": "span", + "marks": [] + }, + { + "_key": "1d65bfd34780", + "_type": "span", + "marks": [ + "2fa668892f57" + ], + "text": "scientific posters" + }, + { + "text": " contributed for the event. Following a packed day of sessions that went into the evening, attendees relaxed and socialized with colleagues over dinner.", + "_key": "0d9a9d966605", + "_type": "span", + "marks": [] + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "0d54636b27ba", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "34368684c049" + } + ] + }, + { + "_key": "dfe9ee6ddb9a", + "asset": { + "_ref": "image-199b150da416e3587b8c53cdbcd4937c4e7792ad-1200x620-jpg", + "_type": "reference" + }, + "_type": "image", + "alt": "Morning run photo" + }, + { + "_type": "block", + "style": "h2", + "_key": "a4a4adcb8ae8", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Wrapping up", + "_key": "0c9634a13b96" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "31c2de53fea8", + "markDefs": [ + { + "_key": "7bf6199160a8", + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-20-zs/" + }, + { + "_key": "4c7077cba673", + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-20-tree-of-life/" + }, + { + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-20-gwas/", + "_key": "04fc360d8753" + } + ], + "children": [ + { + "marks": [], + "text": "As things wound to a close on day three, there were additional talks on topics ranging from ZS’s ", + "_key": "c09c3422865c", + "_type": "span" + }, + { + "text": "contributing to nf-core through client collaboration", + "_key": "d1a9c44863e0", + "_type": "span", + "marks": [ + "7bf6199160a8" + ] + }, + { + "text": " to ", + "_key": "dbeb5671539f", + "_type": "span", + "marks": [] + }, + { + "_key": "1b73ff468e4a", + "_type": "span", + "marks": [ + "4c7077cba673" + ], + "text": "decoding the Tree of Life at Wellcome Sanger Institute" + }, + { + "_type": "span", + "marks": [], + "text": " to ", + "_key": "41371e9d4f42" + }, + { + "_key": "4de009b8a207", + "_type": "span", + "marks": [ + "04fc360d8753" + ], + "text": "performing large and reproducible GWAS analysis on biobank-scale data" + }, + { + "marks": [], + "text": " at Medical University of Innsbruck.", + "_key": "b5414637929a", + "_type": "span" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "7c1300d0ae72" + } + ], + "_type": "block", + "style": "normal", + "_key": "08b40ee33552" + }, + { + "_type": "block", + "style": "normal", + "_key": "aa11b3108917", + "markDefs": [ + { + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-20-multiqc/", + "_key": "6b4587651057" + }, + { + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-20-nf-test-at-nf-core/", + "_key": "fceb6bd3060e" + } + ], + "children": [ + { + "_key": "96ad9c890d10", + "_type": "span", + "marks": [], + "text": "Phil Ewels discussed " + }, + { + "marks": [ + "6b4587651057" + ], + "text": "future plans for MultiQC", + "_key": "dfda154adb89", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ", and Edmund Miller ", + "_key": "0ea20c70c33c" + }, + { + "_type": "span", + "marks": [ + "fceb6bd3060e" + ], + "text": "shared his experience working on 
nf-test", + "_key": "1307b7a34c2a" + }, + { + "marks": [], + "text": " and how it is empowering scalable and streamlined testing for nf-core projects.", + "_key": "04ac237265a9", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "311b9157a8e5", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "3174512a6e9a" + } + ], + "_type": "block" + }, + { + "_key": "e38569fe2143", + "markDefs": [ + { + "href": "https://summit.nextflow.io/boston/", + "_key": "ee74b6d67a02", + "_type": "link" + } + ], + "children": [ + { + "text": "To close the event, Evan took the stage a final time, thanking the many Summit organizers and contributors, and announcing the next Nextflow Summit Barcelona, scheduled for ", + "_key": "ae4cc0f5eda6", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "strong" + ], + "text": "October 21-25, 2024", + "_key": "0b46bbaee2fa", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ". He also reminded attendees of the upcoming North American Hackathon and ", + "_key": "935b3340346b" + }, + { + "text": "Nextflow Summit in Boston", + "_key": "a7a68f3b9173", + "_type": "span", + "marks": [ + "ee74b6d67a02" + ] + }, + { + "_key": "c747351ebf0b", + "_type": "span", + "marks": [], + "text": " beginning on November 28, 2023." + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "758832921c04", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "ad24a103371c", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "b47fbf2d850d", + "markDefs": [ + { + "_type": "link", + "href": "https://summit.nextflow.io/boston/sponsors/", + "_key": "86c64548e007" + } + ], + "children": [ + { + "_key": "b9edae03776d", + "_type": "span", + "marks": [], + "text": "On behalf of the Seqera team, thank you to our fellow " + }, + { + "text": "sponsors", + "_key": "de7f55ce8ec4", + "_type": "span", + "marks": [ + "86c64548e007" + ] + }, + { + "text": " who helped make the Nextflow Summit a resounding success. 
This year’s sponsors included:", + "_key": "450697fd7977", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "b810517195c9" + } + ], + "_type": "block", + "style": "normal", + "_key": "588ee148c259", + "markDefs": [] + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "5d89d70485b0", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "AWS", + "_key": "e7573acfa38b" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "ZS", + "_key": "2ee1670241f0", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "746a66b2fdc3", + "listItem": "bullet" + }, + { + "style": "normal", + "_key": "b1ffd44d63a6", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Element Biosciences", + "_key": "aa778211fd85" + } + ], + "level": 1, + "_type": "block" + }, + { + "_key": "4ebfab406774", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Microsoft", + "_key": "4f1abd0b15f7" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "MemVerge", + "_key": "7bb1426e7c26" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "5ff1d2e784b2", + "listItem": "bullet" + }, + { + "children": [ + { + "_key": "1cb2b998bf6a", + "_type": "span", + "marks": [], + "text": "Pixelgen Technologies" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "c9da239b4956", + "listItem": "bullet", + "markDefs": [] + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Oxford Nanopore", + "_key": "f388be299e34", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "24554aacb9a3" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "3c4c5484adc4", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Quilt", + "_key": "e1aee7cca9b3" + } + ] + }, + { + "style": "normal", + "_key": "dc2fe1c7aaf2", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "TileDB", + "_key": "4160c4548464" + } + ], + "level": 1, + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "960770e3a050", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "da618e55ce88" + }, + { + "style": "h2", + "_key": "77cfaefad7f1", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In case you missed it", + "_key": "1b09c3e734cb" + } + ], + "_type": "block" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://www.youtube.com/playlist?list=PLPZ8WHdZGxmUotnP-tWRVNtuNWpN7xbpL", + "_key": "98a590e2fc6c" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "If you were unable to attend in person, or missed a talk, you can watch all three days of the Summit on our ", + "_key": "7c2456d70960" + }, + { + "_key": "d7c1ef05b54d", + "_type": "span", + "marks": [ + "98a590e2fc6c" + ], + "text": "YouTube channel" + }, + { + "_key": "47634c56c2d2", + "_type": "span", + "marks": [], + "text": "." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "a64aa253c792" + }, + { + "_key": "d4567e3b1187", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "37f792931f4e" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://nf-co.re/events", + "_key": "a750d3220ced" + }, + { + "_type": "link", + "href": "https://seqera.io/events/seqera/", + "_key": "702c32f38a0c" + } + ], + "children": [ + { + "text": "For information about additional upcoming events including bytesize talks, hackathons, webinars, and training events, you can visit ", + "_key": "8e6d986394e2", + "_type": "span", + "marks": [] + }, + { + "_key": "531155f2a23b", + "_type": "span", + "marks": [ + "a750d3220ced" + ], + "text": "https://nf-co.re/events" + }, + { + "text": " or ", + "_key": "d8993cc0c7f4", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "702c32f38a0c" + ], + "text": "https://seqera.io/events/seqera/", + "_key": "ef154c47e0d7" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "010ae94ae238" + } + ], + "_type": "block", + "style": "normal", + "_key": "d7a2d3143541" + }, + { + "markDefs": [], + "children": [ + { + "_key": "a84f95ae41f6", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "6da26cdb5358" + }, + { + "markDefs": [], + "children": [ + { + "_key": "a9d5a7ac800c", + "_type": "span", + "marks": [], + "text": "For your convenience, a handy list of talks from Nextflow Summit 2023 are summarized below." + } + ], + "_type": "block", + "style": "normal", + "_key": "399e5c070bb2" + }, + { + "markDefs": [], + "children": [ + { + "_key": "74c351eedc32", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "8a0e9e0c0c33" + }, + { + "children": [ + { + "marks": [], + "text": "Day one (Wednesday Oct 18):", + "_key": "fbea00ef53b9", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "c15ed83c5e93", + "markDefs": [] + }, + { + "_key": "aa051e474968", + "listItem": "bullet", + "markDefs": [ + { + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-18-the-national-nextflow-tower-service-for-australian-researchers/", + "_key": "e8c419d2499b", + "_type": "link" + } + ], + "children": [ + { + "_key": "cf61cfcfe8330", + "_type": "span", + "marks": [ + "e8c419d2499b" + ], + "text": "The National Nextflow Tower Service for Australian researchers" + }, + { + "_key": "cf61cfcfe8331", + "_type": "span", + "marks": [], + "text": " – Steven Manos" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "9252aeac3230", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-18-analysing-ont-long-read-data-for-cancer-with-nextflow/", + "_key": "b531d3e23cf5" + } + ], + "children": [ + { + "marks": [ + "b531d3e23cf5" + ], + "text": "Analysing ONT long read data for cancer with Nextflow", + "_key": "eaefb2a40dd20", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " – Arthur Gymer", + "_key": "eaefb2a40dd21" + } + ], + "level": 1 + }, + { + "listItem": "bullet", + "markDefs": [ + { + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-18-community-updates/", + "_key": "36d2b864126e", + "_type": "link" + } + ], + "children": [ + { + "_key": "9f6a117b98be0", + "_type": 
"span", + "marks": [ + "36d2b864126e" + ], + "text": "Community updates" + }, + { + "text": " – Phil Ewels", + "_key": "9f6a117b98be1", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "1a4874837764" + }, + { + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-18-pixelgen-technologies-heart-nextflow/", + "_key": "67c65cdfd449" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "67c65cdfd449" + ], + "text": "Pixelgen Technologies ❤︎ Nextflow", + "_key": "cc2d11006fae0" + }, + { + "_key": "cc2d11006fae1", + "_type": "span", + "marks": [], + "text": " – John Dahlberg" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "5e428eb21860" + }, + { + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-18-modern-biotech/", + "_key": "a13a878b0d2e" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "a13a878b0d2e" + ], + "text": "The modern biotech stack", + "_key": "353b9d2726730" + }, + { + "_type": "span", + "marks": [], + "text": " – Evan Floden", + "_key": "353b9d2726731" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "2eee9ebec89b" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "bae1ef150e05", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-18-erik-garrison/", + "_key": "27c16f91c357" + } + ], + "children": [ + { + "text": "Biological revelations at the frontiers of a draft human pangenome reference", + "_key": "09e7d9bbf7e90", + "_type": "span", + "marks": [ + "27c16f91c357" + ] + }, + { + "_key": "09e7d9bbf7e91", + "_type": "span", + "marks": [], + "text": " – Erik Garrison" + } + ] + }, + { + "children": [ + { + "text": "Day two (Thursday Oct 19):", + "_key": "a91d8981eb2e", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h3", + "_key": "4140315217b7", + "markDefs": [] + }, + { + "_key": "0583360a0a0c", + "listItem": "bullet", + "markDefs": [ + { + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-19-brendan-bouffler/", + "_key": "1b9f1c8379a6", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "1b9f1c8379a6" + ], + "text": "It’s been quite a year for research technology in the cloud: we’ve been busy", + "_key": "da76db19cb640" + }, + { + "_key": "da76db19cb641", + "_type": "span", + "marks": [], + "text": " – Brendan Bouffler" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "_key": "0e95c0d938f2", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-19-nf-validation/", + "_key": "7027fac6c3dc" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "7027fac6c3dc" + ], + "text": "nf-validation: a Nextflow plugin to validate pipeline parameters and input files", + "_key": "a05a183967240" + }, + { + "text": " - Júlia Mir Pedrol", + "_key": "a05a183967241", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-19-biomodal-duet/", + "_key": "6f8cb47a9ca2" + } + ], + "children": [ + { + "text": "Computational methods for allele-specific methylation with biomodal Duet", + "_key": 
"d5b5a44cd3f00", + "_type": "span", + "marks": [ + "6f8cb47a9ca2" + ] + }, + { + "marks": [], + "text": " – Michael Wilson", + "_key": "d5b5a44cd3f01", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "aab6cd49a8ec", + "listItem": "bullet" + }, + { + "_key": "56289de94672", + "listItem": "bullet", + "markDefs": [ + { + "_key": "71c0454c662f", + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-19-machine-learning-for-material-science/" + } + ], + "children": [ + { + "text": "How to use data pipelines in Machine Learning for Material Science", + "_key": "d397369073b00", + "_type": "span", + "marks": [ + "71c0454c662f" + ] + }, + { + "_key": "d397369073b01", + "_type": "span", + "marks": [], + "text": " – Jakob Zeitler" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "fa0066cefb53", + "listItem": "bullet", + "markDefs": [ + { + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-19-proteinfold/", + "_key": "a382001d6a73", + "_type": "link" + } + ], + "children": [ + { + "text": "nf-core/proteinfold: a standardized workflow framework for protein structure prediction tools", + "_key": "e38075f5c3bf0", + "_type": "span", + "marks": [ + "a382001d6a73" + ] + }, + { + "text": " - Jose Espinosa-Carrasco", + "_key": "e38075f5c3bf1", + "_type": "span", + "marks": [] + } + ], + "level": 1 + }, + { + "listItem": "bullet", + "markDefs": [ + { + "_key": "3b8ee04a6796", + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-19-harshil-patel/" + } + ], + "children": [ + { + "marks": [ + "3b8ee04a6796" + ], + "text": "Automation on the Seqera Platform", + "_key": "0b95a098f5d50", + "_type": "span" + }, + { + "marks": [], + "text": " - Harshil Patel", + "_key": "0b95a098f5d51", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "21073ce4328c" + }, + { + "children": [ + { + "marks": [ + "a836d3080394" + ], + "text": "nf-co2footprint: a Nextflow plugin to estimate the CO2 footprint of pipeline runs", + "_key": "96e80726680c0", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " - Sabrina Krakau", + "_key": "96e80726680c1" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "4830171a504a", + "listItem": "bullet", + "markDefs": [ + { + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-19-nf-co2footprint/", + "_key": "a836d3080394", + "_type": "link" + } + ] + }, + { + "listItem": "bullet", + "markDefs": [ + { + "_key": "ccbd04974240", + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-19-bringing-spatial-omics-to-nf-core/" + } + ], + "children": [ + { + "text": "Bringing spatial omics to nf-core", + "_key": "1303d633beda0", + "_type": "span", + "marks": [ + "ccbd04974240" + ] + }, + { + "text": " - Victor Perez", + "_key": "1303d633beda1", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "d5d3472e4a0f" + }, + { + "style": "normal", + "_key": "82412086f229", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-19-bioinformatics-at-the-speed-of-cloud/", + "_key": "91008141c5ac" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "91008141c5ac" + ], + "text": "Bioinformatics at the speed of cloud: revolutionizing genomics with Nextflow and MMCloud", + 
"_key": "6d2cee7bd2060" + }, + { + "_key": "6d2cee7bd2061", + "_type": "span", + "marks": [], + "text": " - Sateesh Peri" + } + ], + "level": 1, + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "f80d6b3e1b64", + "listItem": "bullet", + "markDefs": [ + { + "_key": "bff41b6af024", + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-19-paolo-di-tommaso/" + } + ], + "children": [ + { + "_key": "f1a3a6d9d74b0", + "_type": "span", + "marks": [ + "bff41b6af024" + ], + "text": "Enabling converged computing with the Nextflow ecosystem" + }, + { + "_key": "f1a3a6d9d74b1", + "_type": "span", + "marks": [], + "text": " - Paolo Di Tommaso" + } + ], + "level": 1 + }, + { + "listItem": "bullet", + "markDefs": [ + { + "_key": "7eb3b3ffbe14", + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-19-cluster-scalable-pangenome/" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "7eb3b3ffbe14" + ], + "text": "Cluster scalable pangenome graph construction with nf-core/pangenome", + "_key": "7be6b70fa19a0" + }, + { + "_key": "7be6b70fa19a1", + "_type": "span", + "marks": [], + "text": " - Simon Heumos" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "301ca1b8221c" + }, + { + "_type": "block", + "style": "normal", + "_key": "9931a53b8b01", + "listItem": "bullet", + "markDefs": [ + { + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-19-development-of-an-integrated-dna-and-rna-variant-calling-pipeline/", + "_key": "935c530d4d9f", + "_type": "link" + } + ], + "children": [ + { + "_key": "8c8618fe75b70", + "_type": "span", + "marks": [ + "935c530d4d9f" + ], + "text": "Development of an integrated DNA and RNA variant calling pipeline" + }, + { + "text": " - Raquel Manzano", + "_key": "8c8618fe75b71", + "_type": "span", + "marks": [] + } + ], + "level": 1 + }, + { + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-19-annotation-cache/", + "_key": "46e21a74b960" + } + ], + "children": [ + { + "_key": "4a094459811f0", + "_type": "span", + "marks": [ + "46e21a74b960" + ], + "text": "Annotation cache: using nf-core/modules and Seqera Platform to build an AWS open data resource" + }, + { + "marks": [], + "text": " - Maxime Garcia", + "_key": "4a094459811f1", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "ce7e5e6b22ac" + }, + { + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-19-real-time-sequencing-analysis-with-nextflow/", + "_key": "42d6daac95aa" + } + ], + "children": [ + { + "marks": [ + "42d6daac95aa" + ], + "text": "Real-time sequencing analysis with Nextflow", + "_key": "c033faa6f6460", + "_type": "span" + }, + { + "text": " - Chris Wright", + "_key": "c033faa6f6461", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "99cd8e1cefd5" + }, + { + "_key": "f4b05abe336e", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-19-nf-core-sarek/", + "_key": "5b230e0d72a8" + } + ], + "children": [ + { + "text": "nf-core/sarek: a comprehensive & efficient somatic & germline variant calling workflow", + "_key": "b0cc094552fd0", + "_type": "span", + "marks": [ + "5b230e0d72a8" + ] + }, + { + "marks": [], + "text": " - Friederike Hanssen", + "_key": 
"b0cc094552fd1", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "cc2c10fd6022", + "listItem": "bullet", + "markDefs": [ + { + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-19-nf-test-simple-but-powerful/", + "_key": "9f02203ff86c", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "9f02203ff86c" + ], + "text": "nf-test: a simple but powerful testing framework for Nextflow pipelines", + "_key": "934bcd5734330" + }, + { + "_type": "span", + "marks": [], + "text": " - Lukas Forer", + "_key": "934bcd5734331" + } + ], + "level": 1, + "_type": "block" + }, + { + "markDefs": [ + { + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-19-empowering-distributed-precision-medicine/", + "_key": "a7829f39297d", + "_type": "link" + } + ], + "children": [ + { + "_key": "bf72a4b056030", + "_type": "span", + "marks": [ + "a7829f39297d" + ], + "text": "Empowering distributed precision medicine: scalable genomic analysis in clinical trial recruitment" + }, + { + "_key": "bf72a4b056031", + "_type": "span", + "marks": [], + "text": " - Heath Obrien" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "1cc7a74d0d2f", + "listItem": "bullet" + }, + { + "style": "normal", + "_key": "208b47a9969d", + "listItem": "bullet", + "markDefs": [ + { + "_key": "427725390738", + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-19-nf-core-pipeline-for-genomic-imputation/" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "427725390738" + ], + "text": "nf-core pipeline for genomic imputation: from phasing to imputation to validation", + "_key": "65730a995fe10" + }, + { + "_type": "span", + "marks": [], + "text": " - Louis Le Nézet", + "_key": "65730a995fe11" + } + ], + "level": 1, + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "c1995c848f46", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-19-genomics-england/", + "_key": "74b3cb21b3e8" + } + ], + "children": [ + { + "_key": "d54127dd26240", + "_type": "span", + "marks": [ + "74b3cb21b3e8" + ], + "text": "Porting workflow managers to Nextflow at a national diagnostic genomics medical service – strategy and learnings" + }, + { + "_type": "span", + "marks": [], + "text": " - Several Speakers", + "_key": "d54127dd26241" + } + ], + "level": 1 + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Day three (Thursday Oct 19):", + "_key": "e06d9730c609" + } + ], + "_type": "block", + "style": "h3", + "_key": "a69946a0b68a" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-20-zs/", + "_key": "e80e02362384" + } + ], + "children": [ + { + "text": "Driving discovery: contributing to the nf-core project through client collaboration", + "_key": "24f404fc58940", + "_type": "span", + "marks": [ + "e80e02362384" + ] + }, + { + "_type": "span", + "marks": [], + "text": " - Felipe Almeida & Juliet Frederiksen", + "_key": "24f404fc58941" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "ef8ffc25c52c", + "listItem": "bullet" + }, + { + "style": "normal", + "_key": "6b0f1dc9e65e", + "listItem": "bullet", + "markDefs": [ + { + "_key": "528ac69ad98e", + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-20-tree-of-life/" + } + ], + 
"children": [ + { + "_type": "span", + "marks": [ + "528ac69ad98e" + ], + "text": "Automated production engine to decode the Tree of Life", + "_key": "a4197b9385b50" + }, + { + "marks": [], + "text": " - Guoying Qi", + "_key": "a4197b9385b51", + "_type": "span" + } + ], + "level": 1, + "_type": "block" + }, + { + "children": [ + { + "marks": [ + "4b90e73d4b88" + ], + "text": "Building a community: experiences from one year as a developer advocate", + "_key": "d8da75b223bd0", + "_type": "span" + }, + { + "text": " - Marcel Ribeiro-Dantas", + "_key": "d8da75b223bd1", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "6f999110d021", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-20-community-building/", + "_key": "4b90e73d4b88" + } + ] + }, + { + "_key": "06fee1fa78e7", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-20-nf-core-raredisease/", + "_key": "2143c9c97bfb" + } + ], + "children": [ + { + "marks": [ + "2143c9c97bfb" + ], + "text": "nf-core/raredisease: a workflow to analyse data from patients with rare diseases", + "_key": "41de5f4e135c0", + "_type": "span" + }, + { + "marks": [], + "text": " - Ramprasad Neethiraj", + "_key": "41de5f4e135c1", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "729491b312ca0", + "_type": "span", + "marks": [ + "96a5fddfaaf2" + ], + "text": "Enabling AZ bioinformatics with Nextflow/Nextflow Tower" + }, + { + "text": " - Manasa Surakala", + "_key": "729491b312ca1", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "af3e01aba642", + "listItem": "bullet", + "markDefs": [ + { + "_key": "96a5fddfaaf2", + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-20-az/" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "0ad61b93e572", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-20-multiqc/", + "_key": "14715cd9c7d1" + } + ], + "children": [ + { + "marks": [ + "14715cd9c7d1" + ], + "text": "Bringing MultiQC into a new era", + "_key": "2810d448f28e0", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " - Phil Ewels", + "_key": "2810d448f28e1" + } + ], + "level": 1 + }, + { + "listItem": "bullet", + "markDefs": [ + { + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-20-nf-test-at-nf-core/", + "_key": "9a33b5f0722a", + "_type": "link" + } + ], + "children": [ + { + "text": "nf-test at nf-core: empowering scalable and streamlined testing", + "_key": "6175bcabe61b0", + "_type": "span", + "marks": [ + "9a33b5f0722a" + ] + }, + { + "marks": [], + "text": " - Edmund Miller", + "_key": "6175bcabe61b1", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "86ffa2e5b015" + }, + { + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-20-gwas/", + "_key": "41b3ef5ce5bd" + } + ], + "children": [ + { + "text": "Performing large and reproducible GWAS analysis on biobank-scale data", + "_key": "bb28da5ebaa70", + "_type": "span", + "marks": [ + "41b3ef5ce5bd" + ] + }, + { + "text": " - Sebastian Schönherr", + "_key": "bb28da5ebaa71", + "_type": "span", + "marks": 
[] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "112d825034f6" + }, + { + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://summit.nextflow.io/barcelona/agenda/summit/oct-20-hackathon/", + "_key": "a0b4a3b8238c" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "a0b4a3b8238c" + ], + "text": "Highlights from the nf-core hackathon", + "_key": "aed04cf277ff0" + }, + { + "marks": [], + "text": " - Chris Hakkaart", + "_key": "aed04cf277ff1", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "e8b77d3f2e50" + }, + { + "_key": "969301457a9a", + "markDefs": [], + "children": [ + { + "_key": "0f0ef353fac1", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "a5e4c059a002", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "em" + ], + "text": "In this event, we're showcasing some of the results of the project 'Optimization of computational resources for HPC workloads in the cloud using ML/AI' by Seqera Labs S.L. This project has been funded by the European Regional Development Fund (ERDF) of the European Union, coordinated and managed by RED.es, with the aim of carrying out the development of technological entrepreneurship and technological demand, within the framework of the Strategic Action on Digital Economy and Society of the State Program for R&D&I oriented towards societal challenges.", + "_key": "40d8943b35bc" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "0c458cad3365", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "286c888621d0" + }, + { + "alt": "grant logos", + "_key": "ac980d908410", + "asset": { + "_ref": "image-df17e8a21b15056284176b5b0a510e2e1d265850-1146x128-png", + "_type": "reference" + }, + "_type": "image" + } + ], + "_id": "7650013719b0", + "tags": [ + { + "_type": "reference", + "_key": "de1705230b57", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493" + } + ] + }, + { + "_type": "blogPost", + "body": [ + { + "_type": "block", + "style": "normal", + "_key": "f760214b145d", + "markDefs": [ + { + "_type": "link", + "href": "https://pubs.opengroup.org/onlinepubs/9699919799/nframe.html", + "_key": "36e716fcc3df" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "At one time, selecting a file system for distributed workloads was straightforward. Through the 1990s, the Network File System (NFS), developed by Sun Microsystems in 1984, was pretty much the only game in town. It was part of every UNIX distribution, and it presented a standard ", + "_key": "d924ae31546e" + }, + { + "_key": "7b9a7733b269", + "_type": "span", + "marks": [ + "36e716fcc3df" + ], + "text": "POSIX interface" + }, + { + "marks": [], + "text": ", meaning that applications could read and write data without modification. 
Dedicated NFS servers and NAS filers became the norm in most clustered computing environments.", + "_key": "3ff83fd939e2", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "af61c2a1ff6b", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "58459c889529", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_key": "be2304b67bef", + "markDefs": [ + { + "_key": "0df43e0198dd", + "_type": "link", + "href": "https://www.lustre.org/" + }, + { + "_type": "link", + "href": "https://www.anl.gov/mcs/pvfs-parallel-virtual-file-system", + "_key": "d6dd8a943ce3" + }, + { + "_key": "acb440cf4cb2", + "_type": "link", + "href": "https://openzfs.org/wiki/Main_Page" + }, + { + "_key": "0c713dc3e4be", + "_type": "link", + "href": "https://www.beegfs.io/c/" + }, + { + "href": "https://www.ibm.com/products/storage-scale-system", + "_key": "19d720fb5e9e", + "_type": "link" + } + ], + "children": [ + { + "_key": "bedbd03c48e7", + "_type": "span", + "marks": [], + "text": "For organizations that outgrew the capabilities of NFS, other POSIX file systems emerged. These included parallel file systems such as " + }, + { + "marks": [ + "0df43e0198dd" + ], + "text": "Lustre", + "_key": "4a2c63a9cea5", + "_type": "span" + }, + { + "marks": [], + "text": ", ", + "_key": "601d921ccead", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "d6dd8a943ce3" + ], + "text": "PVFS", + "_key": "0e5c554ec8a1" + }, + { + "_type": "span", + "marks": [], + "text": ", ", + "_key": "015ae3722947" + }, + { + "marks": [ + "acb440cf4cb2" + ], + "text": "OpenZFS", + "_key": "184156090139", + "_type": "span" + }, + { + "text": ", ", + "_key": "9ec885d644c5", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "0c713dc3e4be" + ], + "text": "BeeGFS", + "_key": "ec3d47609d10" + }, + { + "_type": "span", + "marks": [], + "text": ", and ", + "_key": "bca5f6c5c00c" + }, + { + "text": "IBM Spectrum Scale", + "_key": "15a066811fc7", + "_type": "span", + "marks": [ + "19d720fb5e9e" + ] + }, + { + "_type": "span", + "marks": [], + "text": " (formerly GPFS). Parallel file systems can support thousands of compute clients and deliver more than a TB/sec combined throughput, however, they are expensive, and can be complex to deploy and manage. While some parallel file systems work with standard Ethernet, most rely on specialized low-latency fabrics such as Intel® Omni-Path Architecture (OPA) or InfiniBand. Because of this, these file systems are typically found in only the largest HPC data centers.", + "_key": "e8e84009af5d" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "a0a5d7d426fa", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "fa91c438506f", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "h2", + "_key": "677b3412fd7d", + "markDefs": [], + "children": [ + { + "text": "Cloud changes everything", + "_key": "b11f226c8f5b", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_key": "196717039f05", + "markDefs": [ + { + "_key": "2a233c7de0a7", + "_type": "link", + "href": "https://aws.amazon.com/s3/" + } + ], + "children": [ + { + "text": "With the launch of ", + "_key": "f2fd34d71585", + "_type": "span", + "marks": [] + }, + { + "text": "Amazon S3", + "_key": "9bffc25d96c0", + "_type": "span", + "marks": [ + "2a233c7de0a7" + ] + }, + { + "marks": [], + "text": " in 2006, new choices began to emerge. 
Rather than being a traditional file system, S3 is an object store accessible through a web API. S3 abandoned traditional ideas around hierarchical file systems. Instead, it presented a simple programmatic interface and CLI for storing and retrieving binary objects.", + "_key": "4f6de03f9d46", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "23120689e080" + } + ], + "_type": "block", + "style": "normal", + "_key": "f421dde1a0bd", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "ad309f918f94", + "markDefs": [ + { + "_type": "link", + "href": "https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObject.html", + "_key": "d9a4f88dce28" + }, + { + "_key": "0d1d3dde87df", + "_type": "link", + "href": "https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObject.html" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Object stores are a good fit for cloud services because they are simple and scalable to multiple petabytes of storage. Rather than relying on central metadata that presents a bottleneck, metadata is stored with each object. All operations are atomic, so there is no need for complex POSIX-style file-locking mechanisms that add complexity to the design. Developers interact with object stores using simple calls like ", + "_key": "81641ea0b49f" + }, + { + "_key": "c7efc133c3c1", + "_type": "span", + "marks": [ + "d9a4f88dce28" + ], + "text": "PutObject" + }, + { + "marks": [], + "text": " (store an object in a bucket in return for a key) and ", + "_key": "57768a3c4c75", + "_type": "span" + }, + { + "marks": [ + "0d1d3dde87df" + ], + "text": "GetObject", + "_key": "0b4d408861d1", + "_type": "span" + }, + { + "marks": [], + "text": " (retrieve a binary object, given a key).", + "_key": "5d8862733c6c", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "911c23f4a25e", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "c3a5977d2020", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "a1054cd4bad9", + "markDefs": [ + { + "href": "https://azure.microsoft.com/en-ca/products/storage/blobs/", + "_key": "53286c954ebc", + "_type": "link" + }, + { + "href": "https://wiki.openstack.org/wiki/Swift", + "_key": "ecc7f5d8943b", + "_type": "link" + }, + { + "href": "https://cloud.google.com/storage/", + "_key": "3e9d66b449f2", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "This simple approach was ideal for internet-scale applications. It was also much less expensive than traditional file systems. As a result, S3 usage grew rapidly. 
Similar object stores quickly emerged, including Microsoft ", + "_key": "809cde29aaf8" + }, + { + "text": "Azure Blob Storage", + "_key": "d05a49423fcd", + "_type": "span", + "marks": [ + "53286c954ebc" + ] + }, + { + "_type": "span", + "marks": [], + "text": ", ", + "_key": "4cd9f5426c0c" + }, + { + "marks": [ + "ecc7f5d8943b" + ], + "text": "Open Stack Swift", + "_key": "49910fc3df6e", + "_type": "span" + }, + { + "marks": [], + "text": ", and ", + "_key": "7e6746a198e4", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "3e9d66b449f2" + ], + "text": "Google Cloud Storage", + "_key": "7c867c5c426b" + }, + { + "marks": [], + "text": ", released in 2010.", + "_key": "6f98d52453fb", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "d7ca6e26b49d", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "6f95fb711fbe" + }, + { + "_key": "c7c62bb485a8", + "markDefs": [], + "children": [ + { + "text": "Cloud object stores vs. shared file systems", + "_key": "1890d0adf95b", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2" + }, + { + "style": "normal", + "_key": "25131f7d5a54", + "markDefs": [ + { + "href": "https://availability.sre.xyz/", + "_key": "6e6040a287be", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Object stores are attractive because they are reliable, scalable, and cost-effective. They are frequently used to store large amounts of data that are accessed infrequently. Examples include archives, images, raw video footage, or in the case of bioinformatics applications, libraries of biological samples or reference genomes. Object stores provide near-continuous availability by spreading data replicas across cloud availability zones (AZs). AWS claims theoretical data availability of up to 99.999999999% (11 9's) – a level of availability so high that it does not even register on most ", + "_key": "cf0a27b25493" + }, + { + "marks": [ + "6e6040a287be" + ], + "text": "downtime calculators", + "_key": "53443b83c04f", + "_type": "span" + }, + { + "_key": "ef737530052e", + "_type": "span", + "marks": [], + "text": "!" + } + ], + "_type": "block" + }, + { + "_key": "049c31e88b49", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "6dd6498183db" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "5f370ecc94d3", + "markDefs": [ + { + "_type": "link", + "href": "https://aws.amazon.com/s3/pricing", + "_key": "a4fd33879e2f" + } + ], + "children": [ + { + "marks": [], + "text": "Because they support both near-line and cold storage, object stores are sometimes referred to as \"cheap and deep.\" Based on current ", + "_key": "206cd15f4daf", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "a4fd33879e2f" + ], + "text": "S3 pricing", + "_key": "cacf9ff109fd" + }, + { + "marks": [], + "text": ", the going rate for data storage is USD 0.023 per GB for the first 50 TB of data. Users can \"pay as they go\" — spinning up S3 storage buckets and storing arbitrary amounts of data for as long as they choose. 
Some high-level differences between object stores and traditional file systems are summarized below.", + "_key": "845635fa3f52", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "34688807e0ac" + } + ], + "_type": "block", + "style": "normal", + "_key": "df81f13abda6", + "markDefs": [] + }, + { + "body": "
<table>\n<thead>\n<tr>\n<th></th>\n<th>Cloud object stores</th>\n<th>Traditional file systems</th>\n</tr>\n</thead>\n<tbody>
<tr>\n<td>Interface / access protocol</td>\n<td>HTTP-based API</td>\n<td>POSIX interface</td>\n</tr>
<tr>\n<td>Cost</td>\n<td>$</td>\n<td>$$$</td>\n</tr>
<tr>\n<td>Scalability / capacity</td>\n<td>Practically unlimited</td>\n<td>Limited</td>\n</tr>
<tr>\n<td>Reliability / availability</td>\n<td>Extremely high</td>\n<td>Varies</td>\n</tr>
<tr>\n<td>Performance</td>\n<td>Typically lower</td>\n<td>Varies</td>\n</tr>
<tr>\n<td>Support for existing application</td>\n<td>NO</td>\n<td>YES</td>\n</tr>
</tbody>\n</table>
", + "_type": "markdownTable", + "_key": "48c632acd0dc" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "853536d84c1b" + } + ], + "_type": "block", + "style": "normal", + "_key": "11d9185b0e3b", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The downside of object storage is that the vast majority of applications are written to work with POSIX file systems. As a result, applications seldom interact directly with object stores. A common practice is to copy data from an object store, perform calculations locally on a cluster node, and write results back to the object store for long-term storage.", + "_key": "9b5ee1c7185b", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "8008370e24aa" + }, + { + "_key": "8ece30838538", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "1dc80d37b75d", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Data handling in Nextflow", + "_key": "4df4d4a7b6c6" + } + ], + "_type": "block", + "style": "h2", + "_key": "783fbcf93e71", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Unlike older pipeline orchestrators, Nextflow was built with cloud object stores in mind. Depending on the cloud where pipelines run, Nextflow manages cloud credentials and allows users to provide a path to shared data. This can be a shared file system such as ", + "_key": "7f3184c1d0a7", + "_type": "span" + }, + { + "marks": [ + "code" + ], + "text": "/my-shared-filesystem/data", + "_key": "f51835eca0e6", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " or a cloud object store, e.g., ", + "_key": "735793315d95" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "s3://my-bucket/data/", + "_key": "3e23a473fdb1" + }, + { + "_key": "2a515b1e7977", + "_type": "span", + "marks": [], + "text": "." + } + ], + "_type": "block", + "style": "normal", + "_key": "1b5b898d5ac2" + }, + { + "_key": "1b5f2c266f83", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "92b5f44c23bf", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "233f6f7832e7", + "markDefs": [ + { + "_type": "link", + "href": "https://nextflow.io/docs/latest/executor.html", + "_key": "e9ee0789870f" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Nextflow is exceptionally versatile when it comes to data handling, and can support almost any file system or object store.", + "_key": "9d087ff6ee5a" + }, + { + "text": " Internally, Nextflow uses ", + "_key": "e8422cc528b6", + "_type": "span", + "marks": [] + }, + { + "_key": "62a4a865c268", + "_type": "span", + "marks": [ + "e9ee0789870f" + ], + "text": "executors" + }, + { + "_type": "span", + "marks": [], + "text": " implemented as plug-ins to insulate pipeline code from underlying compute and storage environments. 
This enables pipelines to run without modification across multiple clouds regardless of the underlying storage technology.", + "_key": "7eded1448d14" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "b9f2cee19d81", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "abbac4984319", + "_type": "span" + } + ] + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nextflow-io/nextflow/tree/master/plugins/nf-amazon", + "_key": "3c5128af26e2" + } + ], + "children": [ + { + "_key": "fbb8e0e85817", + "_type": "span", + "marks": [], + "text": "Suppose an S3 bucket is specified as a location for shared data during pipeline execution. In that case, aided by the " + }, + { + "_type": "span", + "marks": [ + "3c5128af26e2" + ], + "text": "nf-amazon", + "_key": "ea7bf4632478" + }, + { + "_type": "span", + "marks": [], + "text": " plug-in, Nextflow transparently copies data from the S3 bucket to a file system on a cloud instance. Containerized applications mount the local file system and read and write data directly. Once processing is complete, Nextflow copies data to the shared bucket to be available for the next task. All of this is completely transparent to the pipeline and applications. The same plug-in-based approach is used for other cloud object stores such as Azure BLOBs and Google Cloud Storage.", + "_key": "856a01e6a8e7" + } + ], + "_type": "block", + "style": "normal", + "_key": "ecb08f84abd5" + }, + { + "style": "normal", + "_key": "064ce84daf06", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "f6ba5ae136af", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The Nextflow scratch directive", + "_key": "c973e2b2e8ce" + } + ], + "_type": "block", + "style": "h2", + "_key": "b930bf2973f8" + }, + { + "markDefs": [], + "children": [ + { + "text": "The idea of staging data from shared repositories to a local disk, as described above, is not new. A common practice with HPC clusters when using NFS file systems is to use local \"scratch\" storage.", + "_key": "1c0a42bbb165", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "e6072722f5a1" + }, + { + "style": "normal", + "_key": "1137b8caa5a5", + "markDefs": [], + "children": [ + { + "_key": "06c465d5ed27", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "29e33de39775", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "A common problem with shared NFS file systems is that they can be relatively slow — especially when there are multiple clients. 
File systems introduce latency, have limited IO capacity, and are prone to problems such as “hot spots” and bandwidth limitations when multiple clients read and write files in the same directory.", + "_key": "54a6ee89289a" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_key": "452c0c0dcf2d", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "8c2c1df40666" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://www.mvps.net/docs/how-to-mount-the-physical-memory-from-a-linux-system-as-a-partition/", + "_key": "bb1d97ec5e96" + } + ], + "children": [ + { + "marks": [], + "text": "To avoid bottlenecks, data is often copied from an NFS filer to local scratch storage for processing. Depending on data volumes, users often use fast solid-state drives or ", + "_key": "e903e533e29f", + "_type": "span" + }, + { + "_key": "bded42b8de47", + "_type": "span", + "marks": [ + "bb1d97ec5e96" + ], + "text": "RAM disks" + }, + { + "marks": [], + "text": " for scratch storage to accelerate processing.", + "_key": "7ed48b572c90", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "688a27a734f1" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "171815412434", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "b32ad172a4a4", + "markDefs": [] + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://nextflow.io/docs/latest/process.html?highlight=scratch#scratch", + "_key": "421399f8dcca" + } + ], + "children": [ + { + "marks": [], + "text": "Nextflow automates this data handling pattern with built-in support for a ", + "_key": "45b5acb25aab", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "421399f8dcca" + ], + "text": "scratch", + "_key": "380e4d5f70d3" + }, + { + "marks": [], + "text": " directive that can be enabled or disabled per process. If scratch is enabled, data is automatically copied to a designated local scratch device prior to processing.", + "_key": "ab39f384cb37", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "762d92553ba5" + }, + { + "_key": "6678585b57e6", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "57c1b09c041d" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "marks": [], + "text": "When high-performance file systems such as Lustre or Spectrum Scale are available, the question of whether to use scratch storage becomes more complicated. Depending on the file system and interconnect, parallel file systems performance can sometimes exceed that of local disk. In these cases, customers may set scratch to false and perform I/O directly on the parallel file system.", + "_key": "d3f880949402", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "65852ca186ac", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "12004a8a06a4" + } + ], + "_type": "block", + "style": "normal", + "_key": "6ca2759ff9c5", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Results will vary depending on the performance of the shared file system, the speed of local scratch storage, and the amount of shared data to be shuttled back and forth. 
Users will want to experiment to determine whether enabling scratch benefits pipelines performance.", + "_key": "acc21666d121" + } + ], + "_type": "block", + "style": "normal", + "_key": "574000677ea6" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "9cc9d55e591e", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "e041cca31d85" + }, + { + "_type": "block", + "style": "h2", + "_key": "4f714fe120e5", + "markDefs": [], + "children": [ + { + "_key": "6c65564d8d89", + "_type": "span", + "marks": [], + "text": "Multiple storage options for Nextflow users" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "ee25902ca02e", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Storage solutions used with Nextflow can be grouped into five categories as described below:", + "_key": "89d2c9fe37f10", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "65587d152ad4", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Traditional file systems", + "_key": "b2c0a3b7916a0", + "_type": "span" + } + ], + "level": 1, + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "3db596a90a81", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Cloud object stores", + "_key": "e0b26971b5a30", + "_type": "span", + "marks": [] + } + ], + "level": 1 + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Cloud file systems", + "_key": "8ff7f68a90d40" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "0a61dd2983b2" + }, + { + "_type": "block", + "style": "normal", + "_key": "fda64f12da09", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "High-performance cloud file systems", + "_key": "f2b4433592410" + } + ], + "level": 1 + }, + { + "style": "normal", + "_key": "fa2ecf250543", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Fusion file system v2.0", + "_key": "b4f4868accf30", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "ba01eded3b5d", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "ca72ad63a115" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The optimal choice will depend on your environment and the nature of your applications and compute environments.", + "_key": "937a00c428f2" + } + ], + "_type": "block", + "style": "normal", + "_key": "5fa0b4908a67" + }, + { + "_type": "block", + "style": "normal", + "_key": "ce8c985aabb3", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "c44395a5ade2" + } + ] + }, + { + "markDefs": [ + { + "_key": "d562f186aab2", + "_type": "link", + "href": "https://www.netapp.com/" + }, + { + "_type": "link", + "href": "https://www.ddn.com/", + "_key": "ed24907e75c6" + }, + { + "_type": "link", + "href": "https://www.hpe.com/psnow/doc/a00062172enw", + "_key": "fe0d4e818850" + }, + { + "_type": "link", + "href": "https://www.ibm.com/products/storage-scale-system", + "_key": "cbee001225b3" + } + ], + "children": [ + { + "marks": [ + "strong" + ], + "text": "Traditional file systems", + "_key": "c3f37b9df1bc", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " — These are file systems 
typically deployed on-premises that present a POSIX interface. NFS is the most popular choice, but some users may use high-performance parallel file systems. Storage vendors often package their offerings as appliances, making them easier to deploy and manage. Solutions common in on-prem HPC environments include ", + "_key": "53638d31561f" + }, + { + "_key": "e340406a5ac5", + "_type": "span", + "marks": [ + "d562f186aab2" + ], + "text": "Network Appliance" + }, + { + "_key": "6606a1c2a9a5", + "_type": "span", + "marks": [], + "text": ", " + }, + { + "_type": "span", + "marks": [ + "ed24907e75c6" + ], + "text": "Data Direct Networks", + "_key": "fef4da4975c5" + }, + { + "_type": "span", + "marks": [], + "text": " (DDN), ", + "_key": "f4436d49dc7c" + }, + { + "marks": [ + "fe0d4e818850" + ], + "text": "HPE Cray ClusterStor", + "_key": "130d64c74fc0", + "_type": "span" + }, + { + "marks": [], + "text": ", and ", + "_key": "bb6169c627c8", + "_type": "span" + }, + { + "marks": [ + "cbee001225b3" + ], + "text": "IBM Storage Scale", + "_key": "29ce17b17dce", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ". While customers can deploy self-managed NFS or parallel file systems in the cloud, most don’t bother with this in practice. There are generally better solutions available in the cloud.", + "_key": "4ab7256ed015" + } + ], + "_type": "block", + "style": "normal", + "_key": "666ad3b69798" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "0aef91bc9128" + } + ], + "_type": "block", + "style": "normal", + "_key": "330bd66284c1" + }, + { + "_type": "block", + "style": "normal", + "_key": "088db8cdb668", + "markDefs": [], + "children": [ + { + "text": "Cloud object stores", + "_key": "239f3215d516", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_key": "fb09d77bfd5d", + "_type": "span", + "marks": [], + "text": " — In the cloud, object stores tend to be the most popular solution among Nextflow users. Although object stores don’t present a POSIX interface, they are inexpensive, easy to configure, and scale practically without limit. Depending on performance, access, and retention requirements, customers can purchase different object storage tiers at different price points. Popular cloud object stores include Amazon S3, Azure BLOBs, and Google Cloud Storage. As pipelines execute, the Nextflow executors described above manage data transfers to and from cloud object storage automatically. One drawback is that because of the need to copy data to and from the object store for every process, performance may be lower than a fast shared file system." + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "cda4f0a2dd65", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "ee76949dc3ff", + "_type": "span", + "marks": [] + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "30dc0a610070", + "markDefs": [ + { + "_type": "link", + "href": "https://aws.amazon.com/efs/", + "_key": "d4a51401993a" + }, + { + "_type": "link", + "href": "https://azure.microsoft.com/en-us/products/storage/files/", + "_key": "eb9826013a6a" + }, + { + "_type": "link", + "href": "https://cloud.google.com/filestore", + "_key": "2b870dc66680" + } + ], + "children": [ + { + "_key": "43658f4e3ddc", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Cloud file systems" + }, + { + "marks": [], + "text": " — Often, it is desirable to have a shared file NFS system. 
However, these environments can be tedious to deploy and manage in the cloud. Recognizing this, most cloud providers offer cloud file systems that combine some of the best properties of traditional file systems and object stores. These file systems present a POSIX interface and are accessible via SMB and NFS file-sharing protocols. Like object stores, they are easy to deploy and scalable on demand. Examples include ", + "_key": "ff3eb6e0aeb4", + "_type": "span" + }, + { + "marks": [ + "d4a51401993a" + ], + "text": "Amazon EFS", + "_key": "f7bccbd671e9", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ", ", + "_key": "15e8d3fe5583" + }, + { + "_type": "span", + "marks": [ + "eb9826013a6a" + ], + "text": "Azure Files", + "_key": "f63889afcfe4" + }, + { + "text": ", and ", + "_key": "0bc46253f59d", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "2b870dc66680" + ], + "text": "Google Cloud Filestore", + "_key": "87883e7ef639" + }, + { + "marks": [], + "text": ". These file systems are described as \"serverless\" and \"elastic\" because there are no servers to manage, and capacity scales automatically.", + "_key": "1d54a575745f", + "_type": "span" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "283b5356a7f5", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "9ea83c9d27b0" + }, + { + "_key": "2b802f90d60f", + "markDefs": [ + { + "href": "https://aws.amazon.com/efs/pricing/", + "_key": "2ebc244cd9bf", + "_type": "link" + }, + { + "href": "https://docs.aws.amazon.com/efs/latest/ug/storage-classes.html", + "_key": "1559102899c3", + "_type": "link" + }, + { + "_key": "cf45b481281f", + "_type": "link", + "href": "https://azure.microsoft.com/en-us/pricing/details/storage/files/" + } + ], + "children": [ + { + "text": "Comparing price and performance can be tricky because cloud file systems are highly configurable. For example, ", + "_key": "47ea26292614", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "2ebc244cd9bf" + ], + "text": "Amazon EFS", + "_key": "cf97b99892b4", + "_type": "span" + }, + { + "_key": "09a3673a17a5", + "_type": "span", + "marks": [], + "text": " is available in " + }, + { + "_type": "span", + "marks": [ + "1559102899c3" + ], + "text": "four storage classes", + "_key": "b8668800d816" + }, + { + "_key": "1d613721d2aa", + "_type": "span", + "marks": [], + "text": " – Amazon EFS Standard, Amazon EFS Standard-IA, and two One Zone storage classes – Amazon EFS One Zone and Amazon EFS One Zone-IA. Similarly, Azure Files is configurable with " + }, + { + "_type": "span", + "marks": [ + "cf45b481281f" + ], + "text": "four different redundancy options", + "_key": "77f887e227ae" + }, + { + "_type": "span", + "marks": [], + "text": ", and different billing models apply depending on the offer selected. To provide a comparison, Amazon EFS Standard costs $0.08 /GB-Mo in the US East region, which is ~4x more expensive than Amazon S3.", + "_key": "c7bcbb66ddb1" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "6ce913740e3b", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "b9fc1e362530" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "From the perspective of Nextflow users, using Amazon EFS and similar cloud file systems is the same as using a local NFS system. 
Nextflow users must ensure that their cloud instances mount the NFS share, so there is slightly more management overhead than using an S3 bucket. Nextflow users and administrators can experiment with the scratch directive governing whether Nextflow stages data in a local scratch area or reads and writes directly to the shared file system.", + "_key": "0eaa9456bf63", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "575fb8ce32a9" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "53cffa6357c0" + } + ], + "_type": "block", + "style": "normal", + "_key": "46becda37fe9" + }, + { + "_key": "949b78c43880", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Cloud file systems suffer from some of the same limitations as on-prem NFS file systems. They often don’t scale efficiently, and performance is limited by network bandwidth. Also, depending on the pipeline, users may need to stage data to the shared file system in advance, often by copying data from an object store used for long term storage.", + "_key": "47a479d1cf68" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "c23ff1128259", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "bc22d2ef17a9", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "d11cde2c8233", + "markDefs": [ + { + "_key": "79d1887d8eb7", + "_type": "link", + "href": "https://cloud.tower.nf/" + } + ], + "children": [ + { + "marks": [], + "text": "For ", + "_key": "cd37f1f4a63d", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "79d1887d8eb7" + ], + "text": "Nextflow Tower", + "_key": "42cf614dac9d" + }, + { + "_type": "span", + "marks": [], + "text": " users, there is a convenient integration with Amazon EFS. Tower Cloud users can have an Amazon EFS instance created for them automatically via Tower Forge, or they can leverage an existing EFS instance in their compute environment. In either case, Tower ensures that the EFS share is available to compute hosts in the AWS Batch environment, reducing configuration requirements.", + "_key": "fd728979d8bd" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "8d67187c5883" + } + ], + "_type": "block", + "style": "normal", + "_key": "2323bccfb8be", + "markDefs": [] + }, + { + "_key": "96db3ae2e6b6", + "markDefs": [ + { + "_type": "link", + "href": "https://aws.amazon.com/fsx/lustre/", + "_key": "c3bc1358bf6b" + } + ], + "children": [ + { + "text": "Cloud high-performance file systems", + "_key": "d35a3d12eda7", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_key": "a40ab9bada9e", + "_type": "span", + "marks": [], + "text": " — For customers that need high levels of performance in the cloud, Amazon offers Amazon FSx. Amazon FSx comes in different flavors, including NetApp ONTAP, OpenZFS, Windows File Server, and Lustre. In HPC circles, " + }, + { + "_type": "span", + "marks": [ + "c3bc1358bf6b" + ], + "text": "FSx for Lustre", + "_key": "6e18d9b96aed" + }, + { + "marks": [], + "text": " is most popular delivering sub-millisecond latency, up to 1 TB/sec maximum throughput per file system, and millions of IOPs. 
Some Nextflow users with data bottlenecks use FSx for Lustre, but it is more difficult to configure and manage than Amazon S3.", + "_key": "2ae2646254a4", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "153cfdcef191", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "4676216e120b" + }, + { + "_key": "9e2a2fd1fb6b", + "markDefs": [], + "children": [ + { + "text": "Like Amazon EFS, FSx for Lustre is a fully-managed, serverless, elastic file system. Amazon FSx for Lustre is configurable, depending on customer requirements. For example, customers with latency-sensitive applications can deploy FSx cluster nodes with SSD drives. Customers concerned with cost and throughput can select standard hard drives (HDD). HDD-based FSx for Lustre clusters can be optionally configured with an SSD-based cache to accelerate performance. Customers also choose between different persistent file system options and a scratch file system option. Another factor to remember is that with parallel file systems, bandwidth scales with capacity. If you deploy a Lustre file system that is too small, you may be disappointed in the performance.", + "_key": "3f040585112b", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "75f95651fa0f", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "6ed52734e401" + } + ] + }, + { + "style": "normal", + "_key": "65a68042b927", + "markDefs": [ + { + "_type": "link", + "href": "https://aws.amazon.com/fsx/lustre/pricing/", + "_key": "5ad0dbe8f16f" + } + ], + "children": [ + { + "_key": "c49f174fb406", + "_type": "span", + "marks": [], + "text": "FSx for Lustre persistent file systems ranges from 125 to 1,000 MB/s/TiB at " + }, + { + "text": "prices", + "_key": "753043fb343e", + "_type": "span", + "marks": [ + "5ad0dbe8f16f" + ] + }, + { + "text": " ranging from ", + "_key": "20bca0622d6a", + "_type": "span", + "marks": [] + }, + { + "_key": "04e410459318", + "_type": "span", + "marks": [ + "strong" + ], + "text": "$0.145" + }, + { + "_key": "d749fff2d6a3", + "_type": "span", + "marks": [], + "text": " to " + }, + { + "text": "$0.600", + "_key": "11c093fa93fa", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_key": "7afc5d707c1c", + "_type": "span", + "marks": [], + "text": " per GB month. Amazon also offers a lower-cost scratch FSx for Lustre file systems (not to be confused with the scratch directive in Nextflow). At this tier, FSx for Lustre does not replicate data across availability zones, so it is suited to short-term data storage. Scratch FSx for Lustre storage delivers " + }, + { + "text": "200 MB/s/TiB", + "_key": "ac0fcb8862cf", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "marks": [], + "text": ", costing ", + "_key": "6fb731d777b5", + "_type": "span" + }, + { + "_key": "87dd40d4a516", + "_type": "span", + "marks": [ + "strong" + ], + "text": "$0.140" + }, + { + "marks": [], + "text": " per GB month. 
This is ", + "_key": "ac255f175e21", + "_type": "span" + }, + { + "text": "~75%", + "_key": "51ac87bddbac", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "text": " more expensive than Amazon EFS (Standard) and ", + "_key": "54bd0144aff0", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "~6x", + "_key": "d1662a42f60b" + }, + { + "marks": [], + "text": " the cost of standard S3 storage. Persistent FSx for Lustre file systems configured to deliver ", + "_key": "324a75db56f6", + "_type": "span" + }, + { + "_key": "1fe8f6ee2165", + "_type": "span", + "marks": [ + "strong" + ], + "text": "1,000 MB/s/TiB" + }, + { + "text": " can be up to ", + "_key": "9b118c1e9eec", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "strong" + ], + "text": "~26x", + "_key": "4876d340b9f8", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " the price of standard S3 object storage!", + "_key": "6fdf1752b8ae" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "ca4589c54a73", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "1ff23b1ccb59" + }, + { + "markDefs": [ + { + "href": "https://www.weka.io/", + "_key": "d7793f8843b5", + "_type": "link" + } + ], + "children": [ + { + "marks": [ + "strong" + ], + "text": "Hybrid Cloud file systems", + "_key": "7e928b595b8a", + "_type": "span" + }, + { + "text": " — In addition to the solutions described above, there are other solutions that combine the best of object stores and high-performance parallel file systems. An example is ", + "_key": "359ebb298c07", + "_type": "span", + "marks": [] + }, + { + "_key": "61ccdf002e1a", + "_type": "span", + "marks": [ + "d7793f8843b5" + ], + "text": "WekaFS™" + }, + { + "_type": "span", + "marks": [], + "text": " from WEKA. WekaFS is used by several Nextflow users and is deployable on-premises or across your choice cloud platforms. WekaFS is attractive because it provides multi-protocol access to the same data (POSIX, S3, NFS, SMB) while presenting a common namespace between on-prem and cloud resident compute environments. Weka delivers the performance benefits of a high-performance parallel file system and optionally uses cloud object storage as a backing store for file system data to help reduce costs.", + "_key": "213a8af5e9ba" + } + ], + "_type": "block", + "style": "normal", + "_key": "21ea1d8584b5" + }, + { + "_key": "296d2d96fb28", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "e2f48adcfc73", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "5d52e274c929", + "markDefs": [], + "children": [ + { + "_key": "2e901d62ffdb", + "_type": "span", + "marks": [], + "text": "From a Nextflow perspective, WekaFS behaves like any other shared file system. As such, Nextflow and Tower have no specific integration with WEKA. Nextflow users will need to deploy and manage WekaFS themselves making the environment more complex to setup and manage. However, the flexibility and performance provided by a hybrid cloud file system makes this worthwhile for many organizations." 
+ } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "", + "_key": "c1d3d15f9ad2", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "c2b9b7af28b8", + "markDefs": [] + }, + { + "markDefs": [ + { + "href": "https://seqera.io/fusion", + "_key": "f0cae0d6e931", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Fusion file system 2.0", + "_key": "ee2484e42124" + }, + { + "_type": "span", + "marks": [], + "text": " — Fusion file system is a solution developed by ", + "_key": "c355beff8bb2" + }, + { + "_type": "span", + "marks": [ + "f0cae0d6e931" + ], + "text": "Seqera Labs", + "_key": "a81f4fbd6b4d" + }, + { + "_type": "span", + "marks": [], + "text": " that aims to bridge the gap between cloud-native storage and data analysis workflows. The solution implements a thin client that allows pipeline jobs to access object storage using a standard POSIX interface, thus simplifying and speeding up most operations.", + "_key": "44e5ee571941" + } + ], + "_type": "block", + "style": "normal", + "_key": "953967068b82" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "939c5637547a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "bfd28f4bae4d" + }, + { + "_key": "5fb20cf6df96", + "markDefs": [ + { + "_key": "d0c9301ea6a0", + "_type": "link", + "href": "https://seqera.io/whitepapers/breakthrough-performance-and-cost-efficiency-with-the-new-fusion-file-system/" + } + ], + "children": [ + { + "_key": "5f782c6a9d0e", + "_type": "span", + "marks": [], + "text": "The advantage of the Fusion file system is that there is no need to copy data between S3 and local storage. The Fusion file system driver accesses and manipulates files in Amazon S3 directly. You can learn more about the Fusion file system and how it works in the whitepaper " + }, + { + "_type": "span", + "marks": [ + "d0c9301ea6a0" + ], + "text": "Breakthrough performance and cost-efficiency with the new Fusion file system", + "_key": "4724d29dbbf0" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "cd6203c659e6" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "e905660ae591", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "2105f3a80074" + }, + { + "style": "normal", + "_key": "77ec46186f3e", + "markDefs": [ + { + "_key": "5a9fe30f9b87", + "_type": "link", + "href": "https://seqera.io/whitepapers/breakthrough-performance-and-cost-efficiency-with-the-new-fusion-file-system/" + } + ], + "children": [ + { + "text": "For sites struggling with performance and scalability issues on shared file systems or object storage, the Fusion file system offers several advantages. 
", + "_key": "0030f51a5d6a", + "_type": "span", + "marks": [] + }, + { + "text": "Benchmarks conducted", + "_key": "5c7220ddaaab", + "_type": "span", + "marks": [ + "5a9fe30f9b87" + ] + }, + { + "text": " by Seqera Labs have shown that, in some cases, ", + "_key": "6c672165837b", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "strong" + ], + "text": "Fusion can deliver performance on par with Lustre but at a much lower cost.", + "_key": "b4d787073182", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " Fusion is also significantly easier to configure and manage and can result in lower costs for both compute and storage resources.", + "_key": "1fa28d30450b" + } + ], + "_type": "block" + }, + { + "_key": "e99e4311a062", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "dea09915adb7", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "5106472c51f2", + "markDefs": [], + "children": [ + { + "text": "Comparing the alternatives", + "_key": "d7a07935147b", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2" + }, + { + "markDefs": [], + "children": [ + { + "_key": "7ce0bd3fcfe3", + "_type": "span", + "marks": [], + "text": "A summary of storage options is presented in the table below:" + } + ], + "_type": "block", + "style": "normal", + "_key": "93e6ddcba003" + }, + { + "_type": "block", + "style": "normal", + "_key": "ac76bfd96e44", + "markDefs": [], + "children": [ + { + "_key": "3ee5ec9be1a1", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "_type": "markdownTable", + "_key": "fde209cc0d1b", + "body": "
<table>\n<thead>\n<tr>\n<th></th>\n<th>Traditional file systems</th>\n<th colspan=\"3\">Cloud object storage</th>\n<th colspan=\"3\">Cloud file systems</th>\n<th>Fusion FS</th>\n</tr>
<tr>\n<th></th>\n<th>NFS, Lustre, Spectrum Scale</th>\n<th>Amazon S3</th>\n<th>Azure BLOB storage</th>\n<th>Google Cloud Storage</th>\n<th>Amazon EFS</th>\n<th>Amazon FSx for Lustre</th>\n<th>Azure Files</th>\n<th>Fusion file system 2.0</th>\n</tr>\n</thead>\n<tbody>
<tr>\n<td>Deployment model</td>\n<td>Manual</td>\n<td>Serverless</td>\n<td>Serverless</td>\n<td>Serverless</td>\n<td>Serverless</td>\n<td>Serverless</td>\n<td>Serverless</td>\n<td>Serverless</td>\n</tr>
<tr>\n<td>Access model</td>\n<td>POSIX</td>\n<td>Object</td>\n<td>Object</td>\n<td>Object</td>\n<td>POSIX</td>\n<td>POSIX</td>\n<td>POSIX</td>\n<td>POSIX</td>\n</tr>
<tr>\n<td>Clouds supported</td>\n<td>On-prem, any cloud</td>\n<td>AWS only</td>\n<td>Azure only</td>\n<td>GCP only</td>\n<td>AWS only</td>\n<td>AWS only</td>\n<td>Azure only</td>\n<td>AWS, GCP and Azure <sup>1</sup></td>\n</tr>
<tr>\n<td>Requires block storage</td>\n<td>Yes</td>\n<td>Optional</td>\n<td>Optional</td>\n<td>Optional</td>\n<td>Optional</td>\n<td>No</td>\n<td>Optional</td>\n<td>No</td>\n</tr>
<tr>\n<td>Relative cost</td>\n<td>$$</td>\n<td>$</td>\n<td>$</td>\n<td>$</td>\n<td>$$</td>\n<td>$$$</td>\n<td>$$</td>\n<td>$</td>\n</tr>
<tr>\n<td>Nextflow plugins</td>\n<td>-</td>\n<td>nf-amazon</td>\n<td>nf-azure</td>\n<td>nf-google</td>\n<td>-</td>\n<td>-</td>\n<td>-</td>\n<td>nf-amazon</td>\n</tr>
<tr>\n<td>Tower support</td>\n<td>Yes</td>\n<td>Yes, existing buckets</td>\n<td>Yes, existing BLOB container</td>\n<td>Yes, existing cloud storage bucket</td>\n<td>Yes, creates EFS instances</td>\n<td>Yes, creates FSx for Lustre instances</td>\n<td>File system created manually</td>\n<td>Yes, fully automated</td>\n</tr>
<tr>\n<td>Dependencies</td>\n<td>Externally configured</td>\n<td colspan=\"6\"></td>\n<td>Wave Amazon S3</td>\n</tr>
<tr>\n<td>Cost model</td>\n<td>Fixed price on-prem, instance+block storage costs</td>\n<td>GB per month</td>\n<td>GB per month</td>\n<td>GB per month</td>\n<td>Multiple factors</td>\n<td>Multiple factors</td>\n<td>Multiple factors</td>\n<td>GB per month (uses S3)</td>\n</tr>
<tr>\n<td>Level of configuration effort (when used with Tower)</td>\n<td>High</td>\n<td>Low</td>\n<td>Low</td>\n<td>Low</td>\n<td>Medium (low with Tower)</td>\n<td>High (easier with Tower)</td>\n<td>Medium</td>\n<td>Low</td>\n</tr>
<tr>\n<td>Works best with:</td>\n<td>Any on-prem cluster manager (LSF, Slurm, etc.)</td>\n<td>AWS Batch</td>\n<td>Azure Batch</td>\n<td>Google Cloud Batch</td>\n<td>AWS Batch</td>\n<td>AWS Batch</td>\n<td>Azure Batch</td>\n<td>AWS Batch, Amazon EKS, Azure Batch, Google Cloud Batch <sup>1</sup></td>\n</tr>
</tbody>\n</table>
" + }, + { + "_key": "bb6e57fae96c", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "356c7b05f4d0" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "535de99262ec", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "So what’s the bottom line?", + "_key": "b659e110fd84", + "_type": "span" + } + ], + "_type": "block", + "style": "h2" + }, + { + "style": "normal", + "_key": "7fde6033fa3d", + "markDefs": [], + "children": [ + { + "_key": "9fef3d3d4fbb", + "_type": "span", + "marks": [], + "text": "The choice or storage solution depends on several factors. Object stores like Amazon S3 are popular because they are convenient and inexpensive. However, depending on data access patterns, and the amount of data to be staged in advance, file systems such as EFS, Azure Files or FSx for Lustre can also be a good alternative." + } + ], + "_type": "block" + }, + { + "_key": "9023f8b6c875", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "a707747fb405", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "b4c816fe7e19", + "markDefs": [ + { + "_type": "link", + "href": "https://nextflow.io/docs/latest/fusion.html", + "_key": "200a2ff7afd9" + } + ], + "children": [ + { + "marks": [], + "text": "For many Nextflow users, Fusion file system will be a better option since it offers performance comparable to a high-performance file system at the cost of cloud object storage. Fusion is also dramatically easier to deploy and manage. ", + "_key": "5dd52baf2a9d", + "_type": "span" + }, + { + "_key": "e2e03c6ba338", + "_type": "span", + "marks": [ + "200a2ff7afd9" + ], + "text": "Adding Fusion support" + }, + { + "_key": "63ca5d31b6b6", + "_type": "span", + "marks": [], + "text": " is just a matter of adding a few lines to the " + }, + { + "text": "nextflow.config", + "_key": "546c7fe61657", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "marks": [], + "text": " file.", + "_key": "6c98e446b56a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "689bc5089183", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "e9ff0e7621f5", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "7a308e0f0570", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Where workloads run is also an important consideration. For example, on-premises clusters will typically use whatever shared file system is available locally. When operating in the cloud, you can choose whether to use cloud file systems, object stores, high-performance file systems, Fusion FS, or hybrid cloud solutions such as Weka.", + "_key": "f54d8bfe630b" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "37e17f06cd13", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "a90f0fe4f383" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Still unsure what storage solution will best meet your needs? Considerich storage solution will best meet your needs? Consider joining our community at nextflow.slack.com. 
There, yd learn more about the pros and cons of the storage solutions described above.", + "_key": "01ddd205d2d0" + } + ], + "_type": "block", + "style": "normal", + "_key": "f382789ac04f", + "markDefs": [] + } + ], + "_createdAt": "2024-09-25T14:17:42Z", + "_id": "77176fa22a84", + "tags": [ + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "2a7ff6b19526" + } + ], + "author": { + "_ref": "paolo-di-tommaso", + "_type": "reference" + }, + "_updatedAt": "2024-10-14T10:27:15Z", + "publishedAt": "2023-05-04T06:00:00.000Z", + "meta": { + "description": "In this article we present the various storage solutions supported by Nextflow including on-prem and cloud file systems, parallel file systems, and cloud object stores. We also discuss Fusion file system 2.0, a new high-performance file system that can help simplify configuration, improve throughput, and reduce costs in the cloud.", + "slug": { + "current": "selecting-the-right-storage-architecture-for-your-nextflow-pipelines" + } + }, + "_rev": "mvya9zzDXWakVjnX4hhSGg", + "title": "Selecting the right storage architecture for your Nextflow pipelines" + }, + { + "publishedAt": "2016-02-04T07:00:00.000Z", + "meta": { + "slug": { + "current": "developing-bioinformatics-pipeline-across-multiple-environments" + } + }, + "_createdAt": "2024-09-25T14:15:04Z", + "_rev": "mvya9zzDXWakVjnX4hhXjG", + "_id": "7ab6b9c2f21c", + "_type": "blogPost", + "_updatedAt": "2024-09-25T14:15:04Z", + "author": { + "_ref": "evan-floden", + "_type": "reference" + }, + "body": [ + { + "children": [ + { + "text": "As a new bioinformatics student with little formal computer science training, there are few things that scare me more than PhD committee meetings and having to run my code in a completely different operating environment.", + "_key": "a4c1d97a50cd", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "0b36fa3f630d", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "3c60a17143d2", + "children": [ + { + "_type": "span", + "text": "", + "_key": "7f71794298d8" + } + ] + }, + { + "markDefs": [ + { + "_key": "db498cbe4bfb", + "_type": "link", + "href": "https://en.wikipedia.org/wiki/Univa_Grid_Engine" + }, + { + "href": "http://www.bsc.es", + "_key": "d43b1cf0fab1", + "_type": "link" + } + ], + "children": [ + { + "text": "Recently my work landed me in the middle of the phylogenetic tree jungle and the computational requirements of my project far outgrew the resources that were available on our institute’s ", + "_key": "0739d43394bf", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "db498cbe4bfb" + ], + "text": "Univa Grid Engine", + "_key": "26bbebd55519" + }, + { + "_type": "span", + "marks": [], + "text": " based cluster. 
Luckily for me, an opportunity arose to participate in a joint program at the MareNostrum HPC at the ", + "_key": "afd6a3f83f86" + }, + { + "_type": "span", + "marks": [ + "d43b1cf0fab1" + ], + "text": "Barcelona Supercomputing Centre", + "_key": "bc0092f22b6e" + }, + { + "_type": "span", + "marks": [], + "text": " (BSC).", + "_key": "e8fecd0a44d2" + } + ], + "_type": "block", + "style": "normal", + "_key": "6f0a1be725d3" + }, + { + "_type": "block", + "style": "normal", + "_key": "43f45d9085ea", + "children": [ + { + "_type": "span", + "text": "", + "_key": "7e7e58379eb8" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "8e2f3402708c", + "markDefs": [ + { + "href": "https://www.bsc.es/discover-bsc/the-centre/marenostrum", + "_key": "71b1efd06bb6", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "As one of the top 100 supercomputers in the world, the ", + "_key": "470fd7b5ac2c" + }, + { + "marks": [ + "71b1efd06bb6" + ], + "text": "MareNostrum III", + "_key": "a3d121da5a9d", + "_type": "span" + }, + { + "marks": [], + "text": " dwarfs our cluster and consists of nearly 50'000 processors. However it soon became apparent that with great power comes great responsibility and in the case of the BSC, great restrictions. These include no internet access, restrictive wall times for jobs, longer queues, fewer pre-installed binaries and an older version of bash. Faced with the possibility of having to rewrite my 16 bodged scripts for another queuing system I turned to Nextflow.", + "_key": "353e24b139b3", + "_type": "span" + } + ] + }, + { + "_key": "22e9cdc235c1", + "children": [ + { + "_type": "span", + "text": "", + "_key": "1f200d0e40c6" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Straight off the bat I was able to reduce all my previous scripts to a single Nextflow script. Admittedly, the original code was not great, but the data processing model made me feel confident in what I was doing and I was able to reduce the volume of code to 25% of its initial amount whilst making huge improvements in the readability. The real benefits however came from the portability.", + "_key": "0028ffdfa590" + } + ], + "_type": "block", + "style": "normal", + "_key": "00412316556d" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "b609f84d8d9c" + } + ], + "_type": "block", + "style": "normal", + "_key": "63fcec60e600" + }, + { + "_key": "5b6e8f1ff75d", + "markDefs": [ + { + "_key": "a28ccdd5264c", + "_type": "link", + "href": "https://en.wikipedia.org/wiki/Platform_LSF" + }, + { + "_type": "link", + "href": "/blog/2015/mpi-like-execution-with-nextflow.html", + "_key": "b56c39de2824" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "I was able to write the project on my laptop (Macbook Air), continuously test it on my local desktop machine (Linux) and then perform more realistic heavy lifting runs on the cluster, all managed from a single GitHub repository. The BSC uses the ", + "_key": "feac2fcfafb4" + }, + { + "text": "Load Sharing Facility", + "_key": "9137207ae85f", + "_type": "span", + "marks": [ + "a28ccdd5264c" + ] + }, + { + "_type": "span", + "marks": [], + "text": " (LSF) platform with longer queue times, but a large number of CPUs. My project on the other hand had datasets that require over 100'000 tasks, but the tasks processes themselves run for a matter of seconds or minutes. 
We were able to marry these two competing interests deploying Nextflow in a ", + "_key": "e1f5505bcc40" + }, + { + "marks": [ + "b56c39de2824" + ], + "text": "distributed execution manner that resemble the one of an MPI application", + "_key": "f0e0969bb592", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "840b93914905" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "e7b51bef3be8" + } + ], + "_type": "block", + "style": "normal", + "_key": "e709542e24f2" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "In this configuration, the queuing system allocates the Nextflow requested resources and using the embedded ", + "_key": "0f48af11ef84" + }, + { + "_key": "380f3e8db74c", + "_type": "span", + "marks": [ + "2ecafe164e43" + ], + "text": "Apache Ignite" + }, + { + "_type": "span", + "marks": [], + "text": " clustering engine, Nextflow handles the submission of processes to the individual nodes.", + "_key": "e9060c6026b3" + } + ], + "_type": "block", + "style": "normal", + "_key": "a3cc5d75ed20", + "markDefs": [ + { + "_type": "link", + "href": "https://ignite.apache.org/", + "_key": "2ecafe164e43" + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "b4adcbf5b019" + } + ], + "_type": "block", + "style": "normal", + "_key": "714aa6049f8d" + }, + { + "_key": "af18db018f1d", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Here is some examples of how to run the same Nextflow project over multiple platforms.", + "_key": "7f1d06ceed71", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "3c0c2c1afd59" + } + ], + "_type": "block", + "style": "normal", + "_key": "ed1edccffba6" + }, + { + "style": "h4", + "_key": "617daea92f56", + "children": [ + { + "_type": "span", + "text": "Local", + "_key": "b636a86a840f" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "acd7c0c7eb14", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "If I wished to launch a job locally I can run it with the command:", + "_key": "67b9ac226123", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "c6f30a0001e1", + "children": [ + { + "_key": "1cea880b896e", + "_type": "span", + "text": "" + } + ] + }, + { + "code": "nextflow run myproject.nf", + "_type": "code", + "_key": "a47c9d825b9e" + }, + { + "_key": "e97a016d2fc0", + "children": [ + { + "_type": "span", + "text": "Univa Grid Engine (UGE)", + "_key": "9a90f0d708f0" + } + ], + "_type": "block", + "style": "h4" + }, + { + "_key": "32578d372f9d", + "markDefs": [], + "children": [ + { + "_key": "e6fda01d6317", + "_type": "span", + "marks": [], + "text": "For the UGE I simply needed to specify the following in the " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "nextflow.config", + "_key": "0ff3f079d142" + }, + { + "_type": "span", + "marks": [], + "text": " file:", + "_key": "c69a73aca4f8" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "686743efef97", + "children": [ + { + "_key": "ffb9a5a90cdf", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "code": "process {\n executor='uge'\n queue='my_queue'\n}", + "_type": "code", + "_key": "5a41750313c4" + }, + { + "_type": "block", + "style": "normal", + "_key": "58541385a7ce", + "markDefs": [], + 
"children": [ + { + "_type": "span", + "marks": [], + "text": "And then launch the pipeline execution as we did before:", + "_key": "cb8252b251ef" + } + ] + }, + { + "children": [ + { + "text": "", + "_key": "917d1be366bc", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "0b7cf33a1c1c" + }, + { + "_type": "code", + "_key": "6bc483d39d1f", + "code": "nextflow run myproject.nf" + }, + { + "children": [ + { + "_type": "span", + "text": "Load Sharing Facility (LSF)", + "_key": "9d345ec573f6" + } + ], + "_type": "block", + "style": "h4", + "_key": "5f6ca8e3ccb4" + }, + { + "_type": "block", + "style": "normal", + "_key": "5cf745e4feb5", + "markDefs": [], + "children": [ + { + "_key": "574ad2f77840", + "_type": "span", + "marks": [], + "text": "For running the same pipeline in the MareNostrum HPC environment, taking advantage of the MPI standard to deploy my workload, I first created a wrapper script (for example " + }, + { + "marks": [ + "code" + ], + "text": "bsc-wrapper.sh", + "_key": "a7e434c9a1cd", + "_type": "span" + }, + { + "_key": "a31766699b87", + "_type": "span", + "marks": [], + "text": ") declaring the resources that I want to reserve for the pipeline execution:" + } + ] + }, + { + "style": "normal", + "_key": "85bda35c7c6c", + "children": [ + { + "text": "", + "_key": "d9c2aff93d23", + "_type": "span" + } + ], + "_type": "block" + }, + { + "code": "#!/bin/bash\n#BSUB -oo logs/output_%J.out\n#BSUB -eo logs/output_%J.err\n#BSUB -J myProject\n#BSUB -q bsc_ls\n#BSUB -W 2:00\n#BSUB -x\n#BSUB -n 512\n#BSUB -R \"span[ptile=16]\"\nexport NXF_CLUSTER_SEED=$(shuf -i 0-16777216 -n 1)\nmpirun --pernode bin/nextflow run concMSA.nf -with-mpi", + "_type": "code", + "_key": "ee8656e5d47a" + }, + { + "style": "normal", + "_key": "0f8d5e627b66", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "And then can execute it using ", + "_key": "59f700ea9f3e" + }, + { + "marks": [ + "code" + ], + "text": "bsub", + "_key": "78740e128847", + "_type": "span" + }, + { + "text": " as shown below:", + "_key": "cc2767bdec19", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "3a0517bacd79" + } + ], + "_type": "block", + "style": "normal", + "_key": "9ba90be09ded" + }, + { + "_key": "0dec03443e56", + "code": "bsub < bsc-wrapper.sh", + "_type": "code" + }, + { + "_type": "block", + "style": "normal", + "_key": "742300ca9ce1", + "markDefs": [ + { + "_type": "link", + "href": "/docs/latest/getstarted.html?highlight=resume#modify-and-resume", + "_key": "9d682263bd9a" + } + ], + "children": [ + { + "_key": "8811f7ad3a7c", + "_type": "span", + "marks": [], + "text": "By running Nextflow in this way and given the wrapper above, a single " + }, + { + "text": "bsub", + "_key": "a3a508ece92c", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_key": "4c12180e404f", + "_type": "span", + "marks": [], + "text": " job will run on 512 cores in 32 computing nodes (512/16 = 32) with a maximum wall time of 2 hours. Thousands of Nextflow processes can be spawned during this and the execution can be monitored in the standard manner from a single Nextflow output and error files. 
If any errors occur, the execution can of course be continued with " + }, + { + "marks": [ + "9d682263bd9a" + ], + "text": "`-resume` command line option", + "_key": "5d3cf754246e", + "_type": "span" + }, + { + "marks": [], + "text": ".", + "_key": "d050dd9a5d70", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "ef8fa6091372", + "children": [ + { + "text": "", + "_key": "08b706c77b83", + "_type": "span" + } + ] + }, + { + "children": [ + { + "_key": "a1ca81102c1f", + "_type": "span", + "text": "Conclusion" + } + ], + "_type": "block", + "style": "h3", + "_key": "b14a0aca49d4" + }, + { + "style": "normal", + "_key": "fb95f3ed5208", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow provides a simplified way to develop across multiple platforms and removes much of the overhead associated with running niche, user-developed pipelines in an HPC environment.", + "_key": "cac33061df57" + } + ], + "_type": "block" + } + ], + "title": "Developing a bioinformatics pipeline across multiple environments" + }, + { + "meta": { + "slug": { + "current": "better-support-through-community-forum-2024" + } + }, + "author": { + "_type": "reference", + "_ref": "geraldine-van-der-auwera" + }, + "_rev": "iDu5BZYWt2aPtfbIxmis06", + "body": [ + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "As the Nextflow community continues to grow, fostering a space where users can easily find help and share knowledge is more important than ever. In this post, we’ll explore our ongoing efforts to enhance the community forum, transitioning away from Slack as the primary platform for peer-to-peer support. By improving the forum’s usability and accessibility, we’re aiming to create a more efficient and welcoming environment for everyone. Read on to learn about the changes we’re implementing and how you can contribute to making the forum an even better resource for the community.", + "_key": "975d210f07a0", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "f83ae119d802" + }, + { + "children": [ + { + "_key": "347a6e48c727", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "8af2a8bf1867" + }, + { + "_type": "block", + "_key": "edb19bb2cb54" + }, + { + "children": [ + { + "_key": "91e8796b2eb9", + "_type": "span", + "text": "---" + } + ], + "_type": "block", + "style": "normal", + "_key": "6ad8384de8fb" + }, + { + "_key": "832741b82fb0", + "children": [ + { + "text": "", + "_key": "cea7bd7b09e8", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "a5f6c9409136", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "One of the things that impressed me the most when I joined Seqera last year as a developer advocate for the Nextflow community was how engaged people are, and how much peer-to-peer interaction there is across a vast range of scientific domains, cultures, and geographies. 
That’s wonderful for a number of reasons, not least of which is that whenever you run into a problem —or you’re trying to do something a bit complicated or new— it’s very likely that there is someone out there who is able and willing to help you figure it out.", + "_key": "805d14899cbd" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "51998bea4846", + "children": [ + { + "_type": "span", + "text": "", + "_key": "02353600d457" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "For the past few months, our small team of developer advocates have been thinking about how to nurture that dynamism, and how to further improve the experience of peer-to-peer support as the Nextflow community continues to grow. We’ve come to the conclusion that the best thing we can do is make the ", + "_key": "e075f2913e21" + }, + { + "marks": [ + "286d13f872a3" + ], + "text": "community forum", + "_key": "7c1586e24afc", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " an awesome place to go for help, answers, and resources.", + "_key": "c152851896fa" + } + ], + "_type": "block", + "style": "normal", + "_key": "e87bc3093a4b", + "markDefs": [ + { + "_type": "link", + "href": "https://community.seqera.io/", + "_key": "286d13f872a3" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "d204598de8c3", + "children": [ + { + "text": "", + "_key": "3b85f75ec79a", + "_type": "span" + } + ] + }, + { + "style": "h2", + "_key": "30be06bfbdb1", + "children": [ + { + "text": "Why focus on the forum?", + "_key": "3a3a6a4b84e1", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "0b70b3d41ca3", + "markDefs": [], + "children": [ + { + "_key": "673398ee6c9b", + "_type": "span", + "marks": [], + "text": "If you’re familiar with the Nextflow Slack workspace, you know there’s a lot of activity there, and the #help channel is always hopping. It’s true, and that’s great, buuuuut using Slack has some important downsides that the forum doesn’t suffer from." + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "359d2c52adcd", + "children": [ + { + "_key": "2048729a9cf8", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "text": "One of the standout features of the forum is the ability to search past questions and answers really easily. Whether you're browsing directly within the forum, or using Google or some other search engine, you can quickly find relevant information in a way that’s much harder to do on Slack. This means that solutions to common issues are readily accessible, saving you (and the resident experts who have already answered the same question a bunch of times) a whole lot of time and effort.", + "_key": "efb40ae8dbeb", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "9b17349051bc" + }, + { + "_type": "block", + "style": "normal", + "_key": "0ffa5ca023ef", + "children": [ + { + "_type": "span", + "text": "", + "_key": "da3297d10918" + } + ] + }, + { + "style": "normal", + "_key": "98bdb6cd3574", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Additionally, the forum has no barrier to access— you can view all the content without the need to join yet another app. 
This open access ensures that everyone can benefit from the wealth of knowledge shared by community members.", + "_key": "c1b8bc252365" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "04f38b1f44fa", + "children": [ + { + "_type": "span", + "text": "", + "_key": "eff8fecc8cb9" + } + ] + }, + { + "_type": "block", + "style": "h2", + "_key": "9c9526483719", + "children": [ + { + "_key": "9f68ec3cee23", + "_type": "span", + "text": "Immediate improvements to the forum’s ease of use" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "We’re excited to roll out a few immediate changes to the forum that should make it easier and more pleasant to use.", + "_key": "4f508160c66d" + } + ], + "_type": "block", + "style": "normal", + "_key": "7d1b7e5747b9", + "markDefs": [] + }, + { + "children": [ + { + "_key": "d8921c79a9d0", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "d2e6708266a5" + }, + { + "_key": "952be771f2c0", + "listItem": "bullet", + "children": [ + { + "text": "We’re introducing a new, sleeker visual design to make navigation and posting more intuitive and enjoyable.", + "_key": "10ace6743d0d", + "_type": "span", + "marks": [] + }, + { + "_key": "310bc1047048", + "_type": "span", + "marks": [], + "text": "We’ve reorganized the categories to streamline the process of finding and providing help. Instead of having separate categories for various things (like Nextflow, Wave, Seqera Platform etc), there is now a single \"Ask for help\" category for all topics, eliminating any confusion about where to post your question. Simply put, if you need help, just post in the \"Ask for help\" category. Done." + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "f12aa532d61e", + "children": [ + { + "_type": "span", + "text": "", + "_key": "759ac57f39b4" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "5ec6ec2ac1c7", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "We’re also planning to mirror existing categories from the Nextflow Slack workspace, such as the jobs board and shameless promo channels, to make that content more visible and searchable. This will help you find opportunities and promote your work more effectively.", + "_key": "03b236799d91" + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "5bc990578b86" + } + ], + "_type": "block", + "style": "normal", + "_key": "d4a11d6ec75d" + }, + { + "children": [ + { + "text": "What you can do to help", + "_key": "a1e795d5bf44", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "b724d8cb9556" + }, + { + "children": [ + { + "text": "These changes are meant to make the forum a great place for peer-to-peer support for the Nextflow community. 
You can help us improve it further by giving us your feedback about the forum functionality (don’t be shy), by posting your questions in the forum, and of course, if you’re already a Nextflow expert, by answering questions there.", + "_key": "ad54cbb38c97", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "78765a70912e", + "markDefs": [] + }, + { + "_key": "948a32a1f182", + "children": [ + { + "_type": "span", + "text": "", + "_key": "1afca099c8cd" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "_key": "1a8df9c34f9f", + "_type": "link", + "href": "https://community.seqera.io/" + } + ], + "children": [ + { + "marks": [], + "text": "Check out the ", + "_key": "d37db990570b", + "_type": "span" + }, + { + "marks": [ + "1a8df9c34f9f" + ], + "text": "community forum", + "_key": "5a0abff9240f", + "_type": "span" + }, + { + "_key": "650faf81aead", + "_type": "span", + "marks": [], + "text": " now!" + } + ], + "_type": "block", + "style": "normal", + "_key": "f2d37db417c6" + } + ], + "title": "Moving toward better support through the Community forum", + "_id": "7c1de90e23f8", + "publishedAt": "2024-08-28T06:00:00.000Z", + "_createdAt": "2024-09-25T14:17:48Z", + "_type": "blogPost", + "_updatedAt": "2024-09-25T14:17:48Z" + }, + { + "title": "Nextflow published in Nature Biotechnology", + "publishedAt": "2017-04-12T06:00:00.000Z", + "author": { + "_ref": "paolo-di-tommaso", + "_type": "reference" + }, + "_updatedAt": "2024-09-26T09:01:44Z", + "_rev": "Ot9x7kyGeH5005E3MIwhin", + "_createdAt": "2024-09-25T14:15:18Z", + "_type": "blogPost", + "_id": "7d68193f6d0e", + "body": [ + { + "style": "normal", + "_key": "953a2a7819e9", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "We are excited to announce the publication of our work ", + "_key": "b908981867b9", + "_type": "span" + }, + { + "text": "[Nextflow enables reproducible computational workflows](http://rdcu.be/qZVo)", + "_key": "62eed18e827c", + "_type": "span", + "marks": [ + "em" + ] + }, + { + "_type": "span", + "marks": [], + "text": " in Nature Biotechnology.", + "_key": "53af53818937" + } + ], + "_type": "block" + }, + { + "_key": "dd48598d301b", + "children": [ + { + "_key": "9db194ba4a48", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "c7061fa83e82", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The article provides a description of the fundamental components and principles of Nextflow. We illustrate how the unique combination of containers, pipeline sharing and portable deployment provides tangible advantages to researchers wishing to generate reproducible computational workflows.", + "_key": "b09630a7d59d" + } + ] + }, + { + "_key": "b938135b5028", + "children": [ + { + "_type": "span", + "text": "", + "_key": "ecbdccfe6eee" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "http://www.nature.com/news/reproducibility-1.17552", + "_key": "f14fb36f4d13" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Reproducibility is a ", + "_key": "4ebfa87b1ff0" + }, + { + "_key": "95e6ffeda7e8", + "_type": "span", + "marks": [ + "f14fb36f4d13" + ], + "text": "major challenge" + }, + { + "text": " in today's scientific environment. 
We show how three bioinformatics data analyses produce different results when executed on different execution platforms and how Nextflow, along with software containers, can be used to control numerical stability, enabling consistent and replicable results across different computing platforms. As complex omics analyses enter the clinical setting, ensuring that results remain stable brings on extra importance.", + "_key": "b418ea0b8033", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "7a5ecdf73ab7" + }, + { + "children": [ + { + "_key": "f5a887645982", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "142e50f421d7" + }, + { + "_type": "block", + "style": "normal", + "_key": "5fb43aa8903d", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Since its first release three years ago, the Nextflow user base has grown in an organic fashion. From the beginning it has been our own demands in a workflow tool and those of our users that have driven the development of Nextflow forward. The publication forms an important milestone in the project and we would like to extend a warm thank you to all those who have been early users and contributors.", + "_key": "15aa9e1f543f" + } + ] + }, + { + "style": "normal", + "_key": "a81e136b892e", + "children": [ + { + "_type": "span", + "text": "", + "_key": "3d565e44ee52" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "19f10f872ade", + "_type": "span", + "marks": [], + "text": "We kindly ask if you use Nextflow in your own work to cite the following article:" + } + ], + "_type": "block", + "style": "normal", + "_key": "d1ca40cceff0" + }, + { + "style": "normal", + "_key": "75025325728e", + "children": [ + { + "text": "", + "_key": "7ada418b7bc2", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "ad9e75ada6da", + "_type": "block" + } + ], + "tags": [ + { + "_ref": "ace8dd2c-eed3-4785-8911-d146a4e84bbb", + "_type": "reference", + "_key": "7c5815e02d68" + }, + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "516140eace9d" + } + ], + "meta": { + "slug": { + "current": "nextflow-nature-biotech-paper" + } + } + }, + { + "author": { + "_ref": "paolo-di-tommaso", + "_type": "reference" + }, + "publishedAt": "2019-05-22T06:00:00.000Z", + "meta": { + "slug": { + "current": "one-more-step-towards-modules" + } + }, + "_createdAt": "2024-09-25T14:15:42Z", + "_type": "blogPost", + "body": [ + { + "_type": "block", + "style": "normal", + "_key": "4c710f7b5cf9", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The ability to create components, libraries or module files has been among the most requested feature ever over the years.", + "_key": "f3a011ee5618", + "_type": "span" + } + ] + }, + { + "_key": "7f687916d0be", + "children": [ + { + "text": "", + "_key": "4932277ddfec", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "44f5b06fbdba", + "markDefs": [ + { + "_key": "8ed6483b9df8", + "_type": "link", + "href": "https://github.com/nextflow-io/nextflow/issues/984" + }, + { + "_type": "link", + "href": "https://github.com/nextflow-io/nextflow/releases/tag/v19.05.0-edge", + "_key": "fc6ffe45e56c" + } + ], + "children": [ + { + "text": "For this reason, today we are very happy to announce that a preview implementation of the ", + "_key": "3cc199377aff", + "_type": "span", + 
"marks": [] + }, + { + "marks": [ + "8ed6483b9df8" + ], + "text": "modules feature", + "_key": "bb0b3f56ecf4", + "_type": "span" + }, + { + "_key": "1b257fc4b972", + "_type": "span", + "marks": [], + "text": " has been merged on master branch of the project and included in the " + }, + { + "_key": "6481fb3aad21", + "_type": "span", + "marks": [ + "fc6ffe45e56c" + ], + "text": "19.05.0-edge" + }, + { + "text": " release.", + "_key": "32c2a85e18a4", + "_type": "span", + "marks": [] + } + ] + }, + { + "children": [ + { + "_key": "4d40de447fe5", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "df1985a5e5ab" + }, + { + "style": "normal", + "_key": "23b98adb7dea", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The implementation of this feature has opened the possibility for many fantastic improvements to Nextflow and its syntax. We are extremely excited as it results in a radical new way of writing Nextflow applications! So much so, that we are referring to these changes as DSL 2.", + "_key": "7be4289a80e3" + } + ], + "_type": "block" + }, + { + "_key": "53fb6ffdde93", + "children": [ + { + "_type": "span", + "text": "", + "_key": "b8161d83c0b7" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "h4", + "_key": "c5f225d8ffae", + "children": [ + { + "text": "Enabling DSL 2 syntax", + "_key": "275257db42e2", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "abbf8c1bf53f", + "_type": "span", + "marks": [], + "text": "Since this is still a preview technology and, above all, to not break any existing applications, to enable the new syntax you will need to add the following line at the beginning of your workflow script:" + } + ], + "_type": "block", + "style": "normal", + "_key": "22d0c163c4e8", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "8279fb68e52a" + } + ], + "_type": "block", + "style": "normal", + "_key": "3840f47d8836" + }, + { + "_key": "250aba294c5e", + "code": "nextflow.preview.dsl=2", + "_type": "code" + }, + { + "_type": "block", + "style": "normal", + "_key": "60659449e9c7", + "children": [ + { + "_type": "span", + "text": "", + "_key": "ac9df01a2f12" + } + ] + }, + { + "style": "h4", + "_key": "2973c98f3cca", + "children": [ + { + "text": "Module files", + "_key": "9f98e5794807", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "e03991ab837c", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "A module file simply consists of one or more ", + "_key": "428c1a0b1c20" + }, + { + "marks": [ + "code" + ], + "text": "process", + "_key": "25fb4e47f54c", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " definitions, written with the usual syntax. 
The ", + "_key": "033da6662c2e" + }, + { + "marks": [ + "em" + ], + "text": "only", + "_key": "7bd672d80c96", + "_type": "span" + }, + { + "marks": [], + "text": " difference is that the ", + "_key": "adb44ceb5baf", + "_type": "span" + }, + { + "text": "from", + "_key": "10b3c6038368", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "marks": [], + "text": " and ", + "_key": "22253dc8d3b9", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "into", + "_key": "fa35db509364" + }, + { + "_type": "span", + "marks": [], + "text": " clauses in the ", + "_key": "23b6d3125b36" + }, + { + "_key": "42b5454d6401", + "_type": "span", + "marks": [ + "code" + ], + "text": "input:" + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "38bf96121185" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "output:", + "_key": "e7406a512182" + }, + { + "marks": [], + "text": " definition blocks has to be omitted. For example:", + "_key": "06d2ab14eb5a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "d7965ecf12b1" + } + ], + "_type": "block", + "style": "normal", + "_key": "e52cb12abe62" + }, + { + "code": "process INDEX {\n input:\n file transcriptome\n output:\n file 'index'\n script:\n \"\"\"\n salmon index --threads $task.cpus -t $transcriptome -i index\n \"\"\"\n}", + "_type": "code", + "_key": "8b11e942a98e" + }, + { + "style": "normal", + "_key": "5ca331c22832", + "children": [ + { + "text": "", + "_key": "e2575f9993e3", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "6cd3e89e1a10", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The above snippet defines a process component that can be imported in the main application script using the ", + "_key": "37bce3c6f415" + }, + { + "text": "include", + "_key": "ebb80cbdff3b", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "marks": [], + "text": " statement shown below.", + "_key": "6a2a24d59739", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "4101b323f81b", + "children": [ + { + "text": "", + "_key": "d53a8228c326", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "3b9325c3da8b", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Also, module files can declare optional parameters using the usual ", + "_key": "656826ace5c7" + }, + { + "marks": [ + "code" + ], + "text": "params", + "_key": "47aea0482234", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " idiom, as it can be done in any standard script file.", + "_key": "f78e20cf0350" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "7750bcd0d74d" + } + ], + "_type": "block", + "style": "normal", + "_key": "eb7404a13aa6" + }, + { + "style": "normal", + "_key": "59685249a4e9", + "markDefs": [], + "children": [ + { + "text": "This approach, which is consistent with the current Nextflow syntax, makes very easy to migrate existing code to the new modules system, reducing it to a mere copy & pasting exercise in most cases.", + "_key": "a0ff7a773ccd", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "c03b6dee2816", + "children": [ + { + "_type": "span", + "text": "", + "_key": "7619f3f225e0" + } + ] + }, + { + "markDefs": [ + { + 
"_type": "link", + "href": "https://github.com/nextflow-io/rnaseq-nf/blob/66ebeea/modules/rnaseq.nf", + "_key": "f95d2b7af476" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "You can see a complete module file ", + "_key": "2c43f2f57e01" + }, + { + "_type": "span", + "marks": [ + "f95d2b7af476" + ], + "text": "here", + "_key": "41540219521d" + }, + { + "marks": [], + "text": ".", + "_key": "015bb00a9228", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "d966eaa19682" + }, + { + "style": "normal", + "_key": "4875661a8621", + "children": [ + { + "text": "", + "_key": "66b546786ce6", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "Module inclusion", + "_key": "f49e95042dd4" + } + ], + "_type": "block", + "style": "h3", + "_key": "f82753da2e37" + }, + { + "style": "normal", + "_key": "fff48666e292", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "A module file can be included into a Nextflow script using the ", + "_key": "728c0e34edde", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "include", + "_key": "24150f4f3fae" + }, + { + "_key": "5621799bb97e", + "_type": "span", + "marks": [], + "text": " statement. With this it becomes possible to reference any process defined in the module using the usual syntax for a function invocation, and specifying the expected input channels as they were function arguments." + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "be2ab973c684" + } + ], + "_type": "block", + "style": "normal", + "_key": "1e3b6a777803" + }, + { + "code": "nextflow.preview.dsl=2\ninclude 'modules/rnaseq'\n\nread_pairs_ch = Channel.fromFilePairs( params.reads, checkIfExists: true )\ntranscriptome_file = file( params.transcriptome )\n\nINDEX( transcriptome_file )\nFASTQC( read_pairs_ch )\nQUANT( INDEX.out, read_pairs_ch )\nMULTIQC( QUANT.out.mix(FASTQC.out).collect(), multiqc_file )", + "_type": "code", + "_key": "06f264cb8c7f" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "827f2469dfdf" + } + ], + "_type": "block", + "style": "normal", + "_key": "b9387b512a22" + }, + { + "_key": "0929de1af27b", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Notably, each process defines its own namespace in the script scope which allows the access of the process output channel(s) using the ", + "_key": "8eac355d5e48" + }, + { + "marks": [ + "code" + ], + "text": ".out", + "_key": "9a028451b73f", + "_type": "span" + }, + { + "marks": [], + "text": " attribute. 
This can be used then as any other Nextflow channel variable in your pipeline script.", + "_key": "2f39c637cb50", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "43ae5fc6feeb" + } + ], + "_type": "block", + "style": "normal", + "_key": "c9eeb20e4705" + }, + { + "_type": "block", + "style": "normal", + "_key": "7a22d17f0e7c", + "markDefs": [ + { + "href": "https://www.nextflow.io/docs/edge/dsl2.html#selective-inclusion", + "_key": "8950ee469db7", + "_type": "link" + }, + { + "href": "https://www.nextflow.io/docs/edge/dsl2.html#module-aliases", + "_key": "e4a22d98815c", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The ", + "_key": "ca0d5deff1ab" + }, + { + "_key": "d6edbba9756c", + "_type": "span", + "marks": [ + "code" + ], + "text": "include" + }, + { + "_type": "span", + "marks": [], + "text": " statement gives also the possibility to include only a ", + "_key": "1d91497d6106" + }, + { + "_key": "354e5444c7b0", + "_type": "span", + "marks": [ + "8950ee469db7" + ], + "text": "specific process" + }, + { + "_key": "dc5279cb2e4b", + "_type": "span", + "marks": [], + "text": " or to include a process with a different " + }, + { + "_type": "span", + "marks": [ + "e4a22d98815c" + ], + "text": "name alias", + "_key": "d9519d2ede9b" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "6023a24cb36c" + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "219a31383b1e" + } + ], + "_type": "block", + "style": "normal", + "_key": "d464673ed05a" + }, + { + "_type": "block", + "style": "h3", + "_key": "ee52bd6fe111", + "children": [ + { + "_key": "f43bfa868a26", + "_type": "span", + "text": "Smart channel forking" + } + ] + }, + { + "_key": "ac5cf23eff26", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "One of the most important changes of the new syntax is that any channel can be read as many times as you need removing the requirement to duplicate them using the ", + "_key": "7e69db886407", + "_type": "span" + }, + { + "text": "into", + "_key": "52a18bb24a49", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "text": " operator.", + "_key": "a17334c88dfa", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "363f38455f95", + "children": [ + { + "_type": "span", + "text": "", + "_key": "05253630746e" + } + ] + }, + { + "children": [ + { + "text": "For example, in the above snippet, the ", + "_key": "2cd321223621", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "code" + ], + "text": "read_pairs_ch", + "_key": "718f5fd50474", + "_type": "span" + }, + { + "_key": "a4aef921330d", + "_type": "span", + "marks": [], + "text": " channel has been used twice, as input both for the " + }, + { + "text": "FASTQC", + "_key": "c20fe7ae20d9", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "3af52209c6aa" + }, + { + "_key": "09d735854f39", + "_type": "span", + "marks": [ + "code" + ], + "text": "QUANT" + }, + { + "marks": [], + "text": " processes. 
Nextflow forks it behind the scene for you.", + "_key": "6587c0ce27fa", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "b5defdf7325a", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "8b719f3f7c96" + } + ], + "_type": "block", + "style": "normal", + "_key": "6b0a8d1199af" + }, + { + "style": "normal", + "_key": "6dade71cef85", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "This makes the writing of workflow scripts much more fluent, readable and ... fun! No more channel names proliferation!", + "_key": "7d11cb47adf9", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "ef51a575599a" + } + ], + "_type": "block", + "style": "normal", + "_key": "725f1e19a042" + }, + { + "children": [ + { + "_type": "span", + "text": "Nextflow pipes!", + "_key": "3b51f93a4fd4" + } + ], + "_type": "block", + "style": "h3", + "_key": "b2a42ca32824" + }, + { + "markDefs": [], + "children": [ + { + "text": "Finally, maybe our favourite one. The new DSL introduces the ", + "_key": "8c061b9c04d9", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "|", + "_key": "178687c1cd24" + }, + { + "marks": [], + "text": " (pipe) operator which allows for the composition of Nextflow channels, processes and operators together seamlessly in a much more expressive way.", + "_key": "e78c529f0747", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "3bd1047d87d9" + }, + { + "_type": "block", + "style": "normal", + "_key": "f09c795bb959", + "children": [ + { + "_type": "span", + "text": "", + "_key": "a158e044085d" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "bdf81ebf4139", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Consider the following example:", + "_key": "b3547d044031", + "_type": "span" + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "88dd53f628ac" + } + ], + "_type": "block", + "style": "normal", + "_key": "0eaaf3f151f7" + }, + { + "_type": "code", + "_key": "fef15128e9f8", + "code": "process align {\n input:\n file seq\n output:\n file 'result'\n\n \"\"\"\n t_coffee -in=${seq} -out result\n \"\"\"\n}\n\nChannel.fromPath(params.in) | splitFasta | align | view { it.text }" + }, + { + "_key": "64fea6bfcd04", + "children": [ + { + "text": "", + "_key": "62be6509a482", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "d6d2c1ea2df2", + "markDefs": [ + { + "_key": "f9d6dd012b51", + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/operator.html#splitfasta" + }, + { + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/operator.html#view", + "_key": "6879e3b1020b" + } + ], + "children": [ + { + "_key": "71b1ba0b6e5b", + "_type": "span", + "marks": [], + "text": "In the last line, the " + }, + { + "marks": [ + "code" + ], + "text": "fromPath", + "_key": "22a3bee0d5bb", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " channel is piped to the ", + "_key": "11033d58e1f9" + }, + { + "marks": [ + "f9d6dd012b51" + ], + "text": "`splitFasta`", + "_key": "124904c8ad20", + "_type": "span" + }, + { + "_key": "c03c67462963", + "_type": "span", + "marks": [], + "text": " operator whose result is used as input by the " + }, + { + "text": "align", + "_key": "6405dff0e31d", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "marks": [], + "text": " process. 
Then the output is finally printed by the ", + "_key": "71492a879ad0", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "6879e3b1020b" + ], + "text": "`view`", + "_key": "f0cd3ee8fcb2" + }, + { + "_key": "6de67f52905a", + "_type": "span", + "marks": [], + "text": " operator." + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "2b7db12fedc7", + "children": [ + { + "_type": "span", + "text": "", + "_key": "948abd42e22f" + } + ] + }, + { + "_key": "fde0774491b0", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "This syntax finally realizes the Nextflow vision of empowering developers to write complex data analysis applications with a simple but powerful language that mimics the expressiveness of the Unix pipe model but at the same time makes it possible to handle complex data structures and patterns as is required for highly parallelised and distributed computational workflows.", + "_key": "35157b5c82e4", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "e0c080f25e23" + } + ], + "_type": "block", + "style": "normal", + "_key": "380253bd9c15" + }, + { + "children": [ + { + "_key": "386f946f8819", + "_type": "span", + "text": "Conclusion" + } + ], + "_type": "block", + "style": "h4", + "_key": "366a957e5ad4" + }, + { + "style": "normal", + "_key": "5704973d0b62", + "markDefs": [], + "children": [ + { + "text": "This wave of improvements brings a radically new experience when it comes to writing Nextflow workflows. We are releasing it as a preview technology to allow users to try, test, provide their feedback and give us the possibility stabilise it.", + "_key": "0d1aed5ab263", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "e2c972777a84", + "children": [ + { + "_type": "span", + "text": "", + "_key": "43b5eead7c38" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_key": "8c15bff2c34c", + "_type": "span", + "marks": [], + "text": "We are also working to other important enhancements that will be included soon, such as remote modules, sub-workflows composition, simplified file path wrangling and more. Stay tuned!" + } + ], + "_type": "block", + "style": "normal", + "_key": "e9be88e4d4c5" + } + ], + "_updatedAt": "2024-09-26T09:02:13Z", + "tags": [ + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "06360a013f18" + } + ], + "_rev": "Ot9x7kyGeH5005E3MIwiho", + "_id": "7e6263a0ca78", + "title": "One more step towards Nextflow modules" + }, + { + "_createdAt": "2024-09-25T14:15:29Z", + "_updatedAt": "2024-09-26T09:01:59Z", + "tags": [ + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "673a4ea4a81d" + } + ], + "body": [ + { + "markDefs": [], + "children": [ + { + "_key": "813d9324e6e2", + "_type": "span", + "marks": [], + "text": "Today marks an important milestone in the Nextflow project. We are thrilled to announce three important changes to better meet users’ needs and ground the project on a solid foundation upon which to build a vibrant ecosystem of tools and data analysis applications for genomic research and beyond." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "0e8afe0b8d70" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "d4b0cc78dc07" + } + ], + "_type": "block", + "style": "normal", + "_key": "3a3863f4a032" + }, + { + "children": [ + { + "_type": "span", + "text": "Apache license", + "_key": "fde10db6acf3" + } + ], + "_type": "block", + "style": "h3", + "_key": "cdfe3da819fa" + }, + { + "style": "normal", + "_key": "44792244e8ff", + "markDefs": [ + { + "_key": "3d014720d7ab", + "_type": "link", + "href": "https://copyleft.org/guide/comprehensive-gpl-guidech5.html" + }, + { + "href": "https://opensource.com/law/14/7/lawsuit-threatens-break-new-ground-gpl-and-software-licensing-issues", + "_key": "93fe19b125a5", + "_type": "link" + }, + { + "_type": "link", + "href": "/blog/2018/clarification-about-nextflow-license.html", + "_key": "ec82f076967b" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow was originally licensed as GPLv3 open source software more than five years ago. GPL is designed to promote the adoption and spread of open source software and culture. On the other hand it has also some controversial side-effects, such as the one on ", + "_key": "6c06a52b3ce0" + }, + { + "_type": "span", + "marks": [ + "3d014720d7ab" + ], + "text": "derivative works", + "_key": "347065109c68" + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "ec0e03bfa44d" + }, + { + "marks": [ + "93fe19b125a5" + ], + "text": "legal implications", + "_key": "2971620f6297", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " which make the use of GPL released software a headache in many organisations. We have previously discussed these concerns in ", + "_key": "a472c31e32a9" + }, + { + "text": "this blog post", + "_key": "ab02fb40f7c1", + "_type": "span", + "marks": [ + "ec82f076967b" + ] + }, + { + "text": " and, after community feedback, have opted to change the project license to Apache 2.0.", + "_key": "b4a053b5f65f", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "5c97aa017386" + } + ], + "_type": "block", + "style": "normal", + "_key": "44c767c4d049" + }, + { + "markDefs": [ + { + "_key": "e966c3df2eda", + "_type": "link", + "href": "https://www.apache.org/" + } + ], + "children": [ + { + "_key": "176069757acc", + "_type": "span", + "marks": [], + "text": "This is a popular permissive free software license written by the " + }, + { + "marks": [ + "e966c3df2eda" + ], + "text": "Apache Software Foundation", + "_key": "1e8b237a8537", + "_type": "span" + }, + { + "_key": "d0e18aad916e", + "_type": "span", + "marks": [], + "text": " (ASF). Software distributed with this license requires the preservation of the copyright notice and disclaimer. It allows the freedom to use the software for any purpose, to distribute it, to modify it, and to distribute modified versions of the software without dictating the licence terms of the resulting applications and derivative works. We are sure this licensing model addresses the concerns raised by the Nextflow community and will boost further project developments." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "24832e375d5d" + }, + { + "children": [ + { + "text": "", + "_key": "39eed8c4c292", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "ab3c50d4abf6" + }, + { + "children": [ + { + "_key": "5d65248d878b", + "_type": "span", + "text": "New release schema" + } + ], + "_type": "block", + "style": "h3", + "_key": "2d77398c7215" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "In the time since Nextflow was open sourced, we have released 150 versions which have been used by many organizations to deploy critical production workflows on a large range of computational platforms and under heavy loads and stress conditions.", + "_key": "988ea5f7696b", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "e3bbac2ecfc4" + }, + { + "_type": "block", + "style": "normal", + "_key": "a688591cfb52", + "children": [ + { + "_key": "3f048f3a1b93", + "_type": "span", + "text": "" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "For example, at the Centre for Genomic Regulation (CRG) alone, Nextflow has been used to deploy data intensive computation workflows since 2014, and it has orchestrated the execution of over 12 million jobs totalling 1.4 million CPU-hours.", + "_key": "8ef197077677", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "051902f2951a" + }, + { + "_key": "e5d14977051d", + "children": [ + { + "_key": "a6903324609e", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "ca74d64a0b03", + "asset": { + "_type": "reference", + "_ref": "image-11f8ddf3c8c614c4162509d052182fc8ac15fd3d-1828x2017-png" + }, + "_type": "image", + "alt": "Nextflow release schema" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "This extensive use across different execution environments has resulted in a reliable software package, and it's therefore finally time to declare Nextflow stable and drop the zero from the version number!", + "_key": "fc98b257db1c" + } + ], + "_type": "block", + "style": "normal", + "_key": "d6b566b80c88" + }, + { + "_type": "block", + "style": "normal", + "_key": "2712d09a617f", + "children": [ + { + "text": "", + "_key": "6ba41e91c117", + "_type": "span" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "From today onwards, Nextflow will use a 3 monthly time-based ", + "_key": "87c9c5a3f695", + "_type": "span" + }, + { + "marks": [ + "em" + ], + "text": "stable", + "_key": "49d2932afa3a", + "_type": "span" + }, + { + "_key": "f5b20c01d2dc", + "_type": "span", + "marks": [], + "text": " release cycle. Today's release is numbered as " + }, + { + "text": "18.10", + "_key": "3f0ecd8c6586", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_key": "98c26c68e269", + "_type": "span", + "marks": [], + "text": ", the next one will be on January 2019, numbered as 19.01, and so on. This gives our users a more predictable release cadence and allows us to better focus on new feature development and scheduling." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "c602741a9e53" + }, + { + "_key": "817ffb93b59a", + "children": [ + { + "text": "", + "_key": "e39773424522", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "37be0ac85f79", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Along with the 3-months stable release cycle, we will provide a monthly ", + "_key": "10b85c91119c", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "edge", + "_key": "a4057e86b53c" + }, + { + "_type": "span", + "marks": [], + "text": " release, which will include access to the latest experimental features and developments. As such, it should only be used for evaluation and testing purposes.", + "_key": "e0ec9c8f2520" + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "4afe15f0567e" + } + ], + "_type": "block", + "style": "normal", + "_key": "f2be8b115a8e" + }, + { + "children": [ + { + "_key": "d0c0dbb32caf", + "_type": "span", + "text": "Commercial support" + } + ], + "_type": "block", + "style": "h3", + "_key": "4b6176657a4c" + }, + { + "children": [ + { + "text": "Finally, for organisations requiring commercial support, we have recently incorporated ", + "_key": "8c0ba81e62f6", + "_type": "span", + "marks": [] + }, + { + "text": "Seqera Labs", + "_key": "a77bb699f756", + "_type": "span", + "marks": [ + "4ea8b05c5f64" + ] + }, + { + "_type": "span", + "marks": [], + "text": ", a spin-off of the Centre for Genomic Regulation.", + "_key": "cf6dfc30e348" + } + ], + "_type": "block", + "style": "normal", + "_key": "8a661dd8aa50", + "markDefs": [ + { + "_key": "4ea8b05c5f64", + "_type": "link", + "href": "https://www.seqera.io/" + } + ] + }, + { + "children": [ + { + "_key": "09d0590f7524", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "7f72068a5e63" + }, + { + "_key": "d90312a3f126", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Seqera Labs will foster Nextflow adoption as professional open source software by providing commercial support services and exploring new innovative products and solutions.", + "_key": "92e35227f7eb", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "02711646f52e", + "children": [ + { + "_type": "span", + "text": "", + "_key": "abc00539d559" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "7ee084c62627", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "It's important to highlight that Seqera Labs will not close or make Nextflow a commercial project. Nextflow is and will continue to be owned by the CRG and the other contributing organisations and individuals.", + "_key": "3dfadc08b9cc", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "191a0deaab19", + "children": [ + { + "_type": "span", + "text": "", + "_key": "04e6fbd38948" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "Conclusion", + "_key": "7712b3d830fd", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "afc80d599ce1" + }, + { + "style": "normal", + "_key": "106bb7a098cd", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The Nextflow project has reached an important milestone. 
In the last five years it has grown and managed to become a stable technology used by thousands of people daily to deploy large scale workloads for life science data analysis applications and beyond. The project is now exiting from the experimental stage.", + "_key": "4ab7c4cae672" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "", + "_key": "cb0d5664fdd6", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "d2b9abe112a6" + }, + { + "style": "normal", + "_key": "6c37a165c05c", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "With the above changes we want to fulfil the needs of researchers, for a reliable tool enabling scalable and reproducible data analysis, along with the demand of production oriented users, who require reliable support and services for critical deployments.", + "_key": "c1abf6926a87" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "64d579b3b0ea", + "children": [ + { + "_key": "045f3840d120", + "_type": "span", + "text": "" + } + ] + }, + { + "style": "normal", + "_key": "aed87891b226", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Above all, our aim is to strengthen the community effort around the Nextflow ecosystem and make it a sustainable and solid technology in the long run.", + "_key": "27a772f40b0a" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "", + "_key": "bfb4d69cc8c9", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "ce9759552a1a" + }, + { + "children": [ + { + "text": "Credits", + "_key": "cdd0e68eb36f", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "08c57b22c384" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "We want to say thank you to all the people who have supported and contributed to this project to this stage. First of all to Cedric Notredame for his long term commitment to the project within the Comparative Bioinformatics group at CRG. The Open Bioinformatics Foundation (OBF) in the name of Chris Fields and The Ontario Institute for Cancer Research (OICR), namely Dr Lincoln Stein, for supporting the Nextflow change of license. The CRG TBDO department, and in particular Salvatore Cappadona for his continued support and advice. Finally, the user community who with their feedback and constructive criticism contribute everyday to make this project more stable, useful and powerful.", + "_key": "a8e5d52ce32f" + } + ], + "_type": "block", + "style": "normal", + "_key": "a2c0f6846993" + } + ], + "publishedAt": "2018-10-24T06:00:00.000Z", + "_rev": "mvya9zzDXWakVjnX4hhYaA", + "_type": "blogPost", + "meta": { + "slug": { + "current": "goodbye-zero-hello-apache" + } + }, + "title": "Goodbye zero, Hello Apache!", + "author": { + "_ref": "paolo-di-tommaso", + "_type": "reference" + }, + "_id": "7e9bc9ff3145" + }, + { + "author": { + "_ref": "paolo-di-tommaso", + "_type": "reference" + }, + "meta": { + "slug": { + "current": "using-docker-in-hpc-cluster" + } + }, + "_rev": "hf9hwMPb7ybAE3bqEU5qlA", + "publishedAt": "2014-11-06T07:00:00.000Z", + "body": [ + { + "markDefs": [], + "children": [ + { + "_key": "21a8e2818342", + "_type": "span", + "marks": [], + "text": "Scientific data analysis pipelines are rarely composed by a single piece of software. 
In a real world scenario, computational pipelines are made up of multiple stages, each of which can execute many different scripts, system commands and external tools deployed in a hosting computing environment, usually an HPC cluster." + } + ], + "_type": "block", + "style": "normal", + "_key": "a25b08b8f56a" + }, + { + "style": "normal", + "_key": "907655974abe", + "children": [ + { + "_key": "b9c4c3807595", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "530e70036d22", + "_type": "span", + "marks": [], + "text": "As I work as a research engineer in a bioinformatics lab I experience on a daily basis the difficulties related on keeping such a piece of software consistent." + } + ], + "_type": "block", + "style": "normal", + "_key": "49cc2b96a907", + "markDefs": [] + }, + { + "_key": "2735c64f6cdc", + "children": [ + { + "_type": "span", + "text": "", + "_key": "96364a8569d8" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "fc13a6bbc553", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Computing environments can change frequently in order to test new pieces of software or maybe because system libraries need to be updated. For this reason replicating the results of a data analysis over time can be a challenging task.", + "_key": "3934f31485ce", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "685177a023e3", + "children": [ + { + "_type": "span", + "text": "", + "_key": "145c02a03cd3" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [ + "857bdc2f8141" + ], + "text": "Docker", + "_key": "7a4165200edd" + }, + { + "_type": "span", + "marks": [], + "text": " has emerged recently as a new type of virtualisation technology that allows one to create a self-contained runtime environment. 
There are plenty of examples showing the benefits of using it to run application services, like web servers or databases.", + "_key": "4c988b03fe0d" + } + ], + "_type": "block", + "style": "normal", + "_key": "848fa4196541", + "markDefs": [ + { + "_type": "link", + "href": "http://www.docker.com", + "_key": "857bdc2f8141" + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "b78bfd6ffe15" + } + ], + "_type": "block", + "style": "normal", + "_key": "e4d8376e4189" + }, + { + "style": "normal", + "_key": "966b19a5e97e", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "However, it seems that few people have considered using Docker for the deployment of scientific data analysis pipelines on distributed clusters of computers, in order to simplify the development, the deployment and the replicability of this kind of application.", + "_key": "2ef9662fe703" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "2fc19aabd0e0", + "children": [ + { + "_key": "3414ca206b77", + "_type": "span", + "text": "" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "1cc8b6454482", + "markDefs": [ + { + "_key": "cd0e0b3d5c50", + "_type": "link", + "href": "http://www.crg.eu" + } + ], + "children": [ + { + "marks": [], + "text": "For this reason, I wanted to test the capabilities of Docker to solve these problems in the cluster available in our ", + "_key": "fc61cb5812a1", + "_type": "span" + }, + { + "marks": [ + "cd0e0b3d5c50" + ], + "text": "institute", + "_key": "7a360bced9bb", + "_type": "span" + }, + { + "_key": "d661f0453985", + "_type": "span", + "marks": [], + "text": "." + } + ] + }, + { + "_key": "7629d3de1266", + "children": [ + { + "_type": "span", + "text": "", + "_key": "76da63e7914d" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "Method", + "_key": "8bb982d2616d" + } + ], + "_type": "block", + "style": "h2", + "_key": "315712d85ed0" + }, + { + "children": [ + { + "_key": "30e6b189c0f7", + "_type": "span", + "marks": [], + "text": "The Docker engine has been installed on each node of our cluster, which runs a " + }, + { + "text": "Univa grid engine", + "_key": "c19c3428be52", + "_type": "span", + "marks": [ + "30aafdbb8c01" + ] + }, + { + "_type": "span", + "marks": [], + "text": " resource manager. 
A Docker private registry instance has also been installed in our internal network, so that images can be pulled from the local repository in a much faster way when compared to the public ", + "_key": "d254e73cbce3" + }, + { + "_key": "2b1005b58d1c", + "_type": "span", + "marks": [ + "528676649d6b" + ], + "text": "Docker registry" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "7d2a6c6bf231" + } + ], + "_type": "block", + "style": "normal", + "_key": "68959c99ae6c", + "markDefs": [ + { + "href": "http://www.univa.com/products/grid-engine.php", + "_key": "30aafdbb8c01", + "_type": "link" + }, + { + "href": "http://registry.hub.docker.com", + "_key": "528676649d6b", + "_type": "link" + } + ] + }, + { + "style": "normal", + "_key": "b9e6b3a330c5", + "children": [ + { + "_type": "span", + "text": "", + "_key": "30c775a9d7c0" + } + ], + "_type": "block" + }, + { + "_key": "e83ed6476beb", + "markDefs": [ + { + "href": "http://www.gridengine.eu/mangridengine/htmlman5/complex.html", + "_key": "51b40f4451f1", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Moreover the Univa grid engine has been configured with a custom ", + "_key": "73e7928d35b1" + }, + { + "_key": "6923f788cefe", + "_type": "span", + "marks": [ + "51b40f4451f1" + ], + "text": "complex" + }, + { + "_key": "d0ad3d4b82a1", + "_type": "span", + "marks": [], + "text": " resource type. This allows us to request a specific Docker image as a resource type while submitting a job execution to the cluster." + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "", + "_key": "d1e6e8237056", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "c75dc5d9e29b" + }, + { + "style": "normal", + "_key": "a94140a87dce", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The Docker image is requested as a ", + "_key": "6e94c79b603f" + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "soft", + "_key": "89a626adb300" + }, + { + "_type": "span", + "marks": [], + "text": " resource, by doing that the UGE scheduler tries to run a job to a node where that image has already been pulled, otherwise a lower priority is given to it and it is executed, eventually, by a node where the specified Docker image is not available. 
This will force the node to pull the required image from the local registry at the time of the job execution.", + "_key": "32156455e022" + } + ], + "_type": "block" + }, + { + "_key": "28bde6dcc640", + "children": [ + { + "_type": "span", + "text": "", + "_key": "ae16149b3b9f" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "This environment has been tested with ", + "_key": "ccd5d71ff786" + }, + { + "text": "Piper-NF", + "_key": "9b9942a451b5", + "_type": "span", + "marks": [ + "d99d7cf25964" + ] + }, + { + "_type": "span", + "marks": [], + "text": ", a genomic pipeline for the detection and mapping of long non-coding RNAs.", + "_key": "dc2383411fc4" + } + ], + "_type": "block", + "style": "normal", + "_key": "96a9dff42d0c", + "markDefs": [ + { + "_key": "d99d7cf25964", + "_type": "link", + "href": "https://github.com/cbcrg/piper-nf" + } + ] + }, + { + "_key": "f8e6f4bf0cf0", + "children": [ + { + "text": "", + "_key": "4ce27f61de6e", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "467bfe39b228", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The pipeline runs on top of Nextflow, which takes care of the tasks parallelisation and submits the jobs for execution to the Univa grid engine.", + "_key": "298907ca1526" + } + ], + "_type": "block" + }, + { + "_key": "c58da74a21dd", + "children": [ + { + "_key": "9f37ce0dcfe9", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "63571e36d056", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The Piper-NF code wasn't modified in order to run it using Docker. Nextflow is able to handle it automatically. The Docker containers are run in such a way that the tasks result files are created in the hosting file system, in other words it behaves in a completely transparent manner without requiring extra steps or affecting the flow of the pipeline execution.", + "_key": "bf36dae331c5" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "506ba08f5671", + "children": [ + { + "_type": "span", + "text": "", + "_key": "e466207d5bc4" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "marks": [], + "text": "It was only necessary to specify the Docker image (or images) to be used in the Nextflow configuration file for the pipeline. 
You can read more about this at ", + "_key": "25639f706326", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "e6d913eae644" + ], + "text": "this link", + "_key": "222284abb483" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "4ff5371122f4" + } + ], + "_type": "block", + "style": "normal", + "_key": "1fd1bfa59d24", + "markDefs": [ + { + "href": "https://www.nextflow.io/docs/latest/docker.html", + "_key": "e6d913eae644", + "_type": "link" + } + ] + }, + { + "style": "normal", + "_key": "7a535c4f399d", + "children": [ + { + "_type": "span", + "text": "", + "_key": "04405ed2f160" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "h2", + "_key": "23b8dfd2f6fd", + "children": [ + { + "text": "Results", + "_key": "59fd51c05b3d", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "c514a23baab3", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "To benchmark the impact of Docker on the pipeline performance a comparison was made running it with and without Docker.", + "_key": "80f513ffb104" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "", + "_key": "0fc80faa04e1", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "1a3121b4c3bd" + }, + { + "_type": "block", + "style": "normal", + "_key": "dc2f2fa26c98", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "For this experiment 10 cluster nodes were used. The pipeline execution launches around 100 jobs, and it was run 5 times by using the same dataset with and without Docker.", + "_key": "04272fc661bf" + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "8694183d05db" + } + ], + "_type": "block", + "style": "normal", + "_key": "6e5071ea700e" + }, + { + "markDefs": [], + "children": [ + { + "text": "The average execution time without Docker was 28.6 minutes, while the average pipeline execution time, running each job in a Docker container, was 32.2 minutes. Thus, by using Docker the overall execution time increased by something around 12.5%.", + "_key": "0bfbde9e5b22", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "fcc69544fbe0" + }, + { + "style": "normal", + "_key": "de87f9734107", + "children": [ + { + "_type": "span", + "text": "", + "_key": "8010702b167d" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "text": "It is important to note that this time includes both the Docker bootstrap time, and the time overhead that is added to the task execution by the virtualisation layer.", + "_key": "58c8a0547238", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "e9b6ebfb64d6" + }, + { + "_type": "block", + "style": "normal", + "_key": "069850deaa54", + "children": [ + { + "_type": "span", + "text": "", + "_key": "a6c16e43e354" + } + ] + }, + { + "children": [ + { + "_key": "b9381c8c62e5", + "_type": "span", + "marks": [], + "text": "For this reason the actual task run time was measured as well i.e. without including the Docker bootstrap time overhead. In this case, the aggregate average task execution time was 57.3 minutes and 59.5 minutes when running the same tasks using Docker. Thus, the time overhead added by the Docker virtualisation layer to the effective task run time can be estimated to around 4% in our test." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "681b379a78fb", + "markDefs": [] + }, + { + "style": "normal", + "_key": "47fa9b5c3fde", + "children": [ + { + "text": "", + "_key": "f8019ef1d722", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "833a6d34561a", + "markDefs": [ + { + "href": "https://registry.hub.docker.com/repos/cbcrg/", + "_key": "1904cb29f48e", + "_type": "link" + } + ], + "children": [ + { + "text": "Keeping the complete toolset required by the pipeline execution within a Docker image dramatically reduced configuration and deployment problems. Also storing these images into the private and ", + "_key": "305dea57c370", + "_type": "span", + "marks": [] + }, + { + "_key": "a893fda2d701", + "_type": "span", + "marks": [ + "1904cb29f48e" + ], + "text": "public" + }, + { + "text": " repositories with a unique tag allowed us to replicate the results without the usual burden required to set-up an identical computing environment.", + "_key": "9596447b4002", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "07022de84527", + "children": [ + { + "_key": "a40887c4fe43", + "_type": "span", + "text": "" + } + ] + }, + { + "_type": "block", + "style": "h2", + "_key": "e989ea5a171a", + "children": [ + { + "text": "Conclusion", + "_key": "df3594c6f0cf", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "3b29c57d7283", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The fast start-up time for Docker containers technology allows one to virtualise a single process or the execution of a bunch of applications, instead of a complete operating system. This opens up new possibilities, for example the possibility to "virtualise" distributed job executions in an HPC cluster of computers.", + "_key": "b610b92c2f8d" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "38dd9964c872", + "children": [ + { + "_key": "274eeadce480", + "_type": "span", + "text": "" + } + ] + }, + { + "children": [ + { + "_key": "256591933336", + "_type": "span", + "marks": [], + "text": "The minimal performance loss introduced by the Docker engine is offset by the advantages of running your analysis in a self-contained and dead easy to reproduce runtime environment, which guarantees the consistency of the results over time and across different computing platforms." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "61ef627676a3", + "markDefs": [] + }, + { + "style": "normal", + "_key": "fc7bba825666", + "children": [ + { + "text": "", + "_key": "4407a9d4873f", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "Credits", + "_key": "f49a24119778", + "_type": "span" + } + ], + "_type": "block", + "style": "h4", + "_key": "74d033d7a132" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Thanks to Arnau Bria and the all scientific systems admins team to manage the Docker installation in the CRG computing cluster.", + "_key": "4fcaeb424993", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "0284aeb960ed" + } + ], + "title": "Using Docker for scientific data analysis in an HPC cluster", + "tags": [], + "_updatedAt": "2024-09-26T09:00:20Z", + "_createdAt": "2024-09-25T14:14:53Z", + "_type": "blogPost", + "_id": "826e3c0139a2" + }, + { + "tags": [ + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "59e18ed3d8a2" + } + ], + "author": { + "_type": "reference", + "_ref": "5bLgfCKN00diCN0ijmWNOF" + }, + "title": "The Nextflow CLI - tricks and treats!", + "meta": { + "slug": { + "current": "cli-docs-release" + } + }, + "_id": "84a7ecf4acdc", + "publishedAt": "2020-10-22T06:00:00.000Z", + "_rev": "2PruMrLMGpvZP5qAknmBqW", + "_updatedAt": "2024-09-26T09:02:19Z", + "_type": "blogPost", + "body": [ + { + "_key": "f53a71042080", + "markDefs": [ + { + "_type": "link", + "href": "https://tower.nf", + "_key": "5f74049a4a0b" + }, + { + "_type": "link", + "href": "https://www.nextflow.io/docs/edge/cli.html", + "_key": "23b4bf1a0a17" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "For most developers, the command line is synonymous with agility. While tools such as ", + "_key": "02a92956a55e" + }, + { + "_key": "df77273e95ab", + "_type": "span", + "marks": [ + "5f74049a4a0b" + ], + "text": "Nextflow Tower" + }, + { + "_type": "span", + "marks": [], + "text": " are opening up the ecosystem to a whole new set of users, the Nextflow CLI remains a bedrock for pipeline development. The CLI in Nextflow has been the core interface since the beginning; however, its full functionality was never extensively documented. Today we are excited to release the first iteration of the CLI documentation available on the ", + "_key": "4bf7305364f0" + }, + { + "_key": "0d9489b78b84", + "_type": "span", + "marks": [ + "23b4bf1a0a17" + ], + "text": "Nextflow website" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "3cc4a6f05f5c" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "0964d6176602", + "children": [ + { + "_type": "span", + "text": "", + "_key": "488984dad191" + } + ], + "_type": "block" + }, + { + "_key": "3ed5e08f86e5", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "And given Halloween is just around the corner, in this blog post we'll take a look at 5 CLI tricks and examples which will make your life easier in designing, executing and debugging data pipelines. 
We are also giving away 5 limited-edition Nextflow hoodies and sticker packs so you can code in style this Halloween season!", + "_key": "7a716109cd37", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "090ecbd4f732", + "children": [ + { + "_type": "span", + "text": "", + "_key": "c2049fd5e87d" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "1. Invoke a remote pipeline execution with the latest revision", + "_key": "52c2d551540f" + } + ], + "_type": "block", + "style": "h3", + "_key": "66f038e8388a" + }, + { + "children": [ + { + "text": "Nextflow facilitates easy collaboration and re-use of existing pipelines in multiple ways. One of the simplest ways to do this is to use the URL of the Git repository.", + "_key": "2058e24358dc", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "b7f36aeb45f8", + "markDefs": [] + }, + { + "style": "normal", + "_key": "af4e488a2cd1", + "children": [ + { + "text": "", + "_key": "5c749d6a1e44", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "code", + "_key": "34a45055c363", + "code": "$ nextflow run https://www.github.com/nextflow-io/hello" + }, + { + "style": "normal", + "_key": "8712e775b594", + "children": [ + { + "_key": "b6bd611d7266", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "bd0a8586ad0f", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "When executing a pipeline using the run command, it first checks to see if it has been previously downloaded in the ~/.nextflow/assets directory, and if so, Nextflow uses this to execute the pipeline. If the pipeline is not already cached, Nextflow will download it, store it in the ", + "_key": "1a6d5f168c7b", + "_type": "span" + }, + { + "text": "$HOME/.nextflow/", + "_key": "37896a0f63f6", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "text": " directory and then launch the execution.", + "_key": "8ffc714ce324", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "7269baed76c8", + "children": [ + { + "_key": "e3aa2801e71f", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "d9b8e4db1dc9", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "How can we make sure that we always run the latest code from the remote pipeline? We simply need to add the ", + "_key": "826d92ed952a" + }, + { + "marks": [ + "code" + ], + "text": "-latest", + "_key": "05891da81525", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " option to the run command, and Nextflow takes care of the rest.", + "_key": "502fedc589ab" + } + ] + }, + { + "_key": "7b1a2ed4a7c9", + "children": [ + { + "_type": "span", + "text": "", + "_key": "7c122dcd519b" + } + ], + "_type": "block", + "style": "normal" + }, + { + "code": "$ nextflow run nextflow-io/hello -latest", + "_type": "code", + "_key": "8b41fbaf989d" + }, + { + "_type": "block", + "style": "normal", + "_key": "725641bd62f8", + "children": [ + { + "_type": "span", + "text": "", + "_key": "7dd400f9bf57" + } + ] + }, + { + "_type": "block", + "style": "h3", + "_key": "69f318f1cd6a", + "children": [ + { + "_type": "span", + "text": "2. 
Query work directories for a specific execution", + "_key": "65d9e0858203" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "For every invocation of Nextflow, all the metadata about an execution is stored including task directories, completion status and time etc. We can use the ", + "_key": "6eac33152bb4", + "_type": "span" + }, + { + "marks": [ + "code" + ], + "text": "nextflow log", + "_key": "00a46431af2c", + "_type": "span" + }, + { + "_key": "fa28d524a7e5", + "_type": "span", + "marks": [], + "text": " command to generate a summary of this information for a specific run." + } + ], + "_type": "block", + "style": "normal", + "_key": "0b65592870fc" + }, + { + "_type": "block", + "style": "normal", + "_key": "d140bf1fb8aa", + "children": [ + { + "text": "", + "_key": "335c277d6ab5", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "b0492756900f", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "To see a list of work directories associated with a particular execution (for example, ", + "_key": "87365e4fa96a" + }, + { + "text": "tiny_leavitt", + "_key": "3426c2a6d34e", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": "), use:", + "_key": "7bb2013a5b34" + } + ] + }, + { + "children": [ + { + "text": "", + "_key": "ad9f9a075c13", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "7fae984a7a79" + }, + { + "code": "$ nextflow log tiny_leavitt", + "_type": "code", + "_key": "77fc8bf4b8ac" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "e2ebd70bf39a" + } + ], + "_type": "block", + "style": "normal", + "_key": "716df9e751b8" + }, + { + "_key": "d2fd6746543f", + "markDefs": [], + "children": [ + { + "text": "To filter out specific process-level information from the logs of any execution, we simply need to use the fields (-f) option and specify the fields.", + "_key": "9fad5087922c", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "f876db1e52f6", + "children": [ + { + "_type": "span", + "text": "", + "_key": "f9376c2deca9" + } + ] + }, + { + "_key": "d9297d1e7a95", + "code": "$ nextflow log tiny_leavitt –f 'process, hash, status, duration'", + "_type": "code" + }, + { + "children": [ + { + "_key": "0462f5a09916", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "4c90ff7149cd" + }, + { + "markDefs": [], + "children": [ + { + "_key": "1201a29850d6", + "_type": "span", + "marks": [], + "text": "The hash is the name of the work directory where the process was executed; therefore, the location of a process work directory would be something like " + }, + { + "text": "work/74/68ff183", + "_key": "9286c85ecb80", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "9e9f85259301" + } + ], + "_type": "block", + "style": "normal", + "_key": "8a1c1c457041" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "2cd6d7f16c1b" + } + ], + "_type": "block", + "style": "normal", + "_key": "54974b549279" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The log command also has other child options including ", + "_key": "617ecbfb2cdf" + }, + { + "_key": "dda550682e34", + "_type": "span", + "marks": [ + "code" + ], + "text": "-before" + }, + { + "text": " and ", + 
"_key": "0daea83d286c", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "-after", + "_key": "58056e8b5c79" + }, + { + "text": " to help with the chronological inspection of logs.", + "_key": "e25877645b2d", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "6d39b22a4319" + }, + { + "_type": "block", + "style": "normal", + "_key": "d1ad4693f4f0", + "children": [ + { + "_key": "7230b4385ab6", + "_type": "span", + "text": "" + } + ] + }, + { + "_type": "block", + "style": "h3", + "_key": "de3b269917d6", + "children": [ + { + "_type": "span", + "text": "3. Top-level configuration", + "_key": "1e3076196e5e" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow emphasizes customization of pipelines and exposes multiple options to facilitate this. The configuration is applied to multiple Nextflow commands and is therefore a top-level option. In practice, this means specifying configuration options ", + "_key": "2daf8e559865" + }, + { + "marks": [ + "em" + ], + "text": "before", + "_key": "d3449bed824c", + "_type": "span" + }, + { + "text": " the command.", + "_key": "dea9cffe4c5b", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "00712450e6df", + "markDefs": [] + }, + { + "_key": "8248c23c1bdc", + "children": [ + { + "text": "", + "_key": "f682139db714", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "9cf43882c7ff", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Nextflow CLI provides two kinds of config overrides - the soft override and the hard override.", + "_key": "45b569c9d5eb", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "c82bc6e6de82", + "children": [ + { + "_key": "737ca2f8ee45", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_key": "972704119e6a", + "_type": "span", + "marks": [], + "text": "The top-level soft override "-c" option allows us to change the previous config in an additive manner, overriding only the fields included the configuration file." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "c22b3aaba489" + }, + { + "_key": "c1c3e6057958", + "children": [ + { + "_type": "span", + "text": "", + "_key": "19732bb38c4f" + } + ], + "_type": "block", + "style": "normal" + }, + { + "code": "$ nextflow -c my.config run nextflow-io/hello", + "_type": "code", + "_key": "cc13a89abfa8" + }, + { + "_type": "block", + "style": "normal", + "_key": "bf01438a7a26", + "children": [ + { + "_type": "span", + "text": "", + "_key": "dcb0f33c95be" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "675978c5735b", + "markDefs": [], + "children": [ + { + "text": "On the other hand, the hard override ", + "_key": "13301f70e6d7", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "-C", + "_key": "11a385f62eb4" + }, + { + "text": " completely replaces and ignores any additional configurations.", + "_key": "bdf5e0b90c3f", + "_type": "span", + "marks": [] + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "b2b42f53c395", + "children": [ + { + "_type": "span", + "text": "", + "_key": "630fd7da4ac9" + } + ] + }, + { + "_type": "code", + "_key": "c17e17e6962d", + "code": "$ nextflow –C my.config nextflow-io/hello" + }, + { + "markDefs": [], + "children": [ + { + "text": "Moreover, we can also use the config command to inspect the final inferred configuration and view any profiles.", + "_key": "aa336a662327", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "ffbabdf579f3" + }, + { + "_type": "block", + "style": "normal", + "_key": "b0f2177085c3", + "children": [ + { + "_key": "d40e3d9830ad", + "_type": "span", + "text": "" + } + ] + }, + { + "_type": "code", + "_key": "03a9ec8e025e", + "code": "$ nextflow config -show-profiles" + }, + { + "_key": "7f3820fe3257", + "children": [ + { + "_type": "span", + "text": "", + "_key": "cdb1e14ea76c" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "h3", + "_key": "d52cbeb9e3c6", + "children": [ + { + "_type": "span", + "text": "4. Passing in an input parameter file", + "_key": "7e2e026df196" + } + ] + }, + { + "_key": "22693c2a833e", + "markDefs": [], + "children": [ + { + "text": "Nextflow is designed to work across both research and production settings. In production especially, specifying multiple parameters for the pipeline on the command line becomes cumbersome. In these cases, environment variables or config files are commonly used which contain all input files, options and metadata. Love them or hate them, YAML and JSON are the standard formats for human and machines, respectively.", + "_key": "d3329aa2b0f1", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "d3e951f2e82c", + "children": [ + { + "_type": "span", + "text": "", + "_key": "86d45930afc0" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "marks": [], + "text": "The Nextflow run option ", + "_key": "7e992ebb5214", + "_type": "span" + }, + { + "marks": [ + "code" + ], + "text": "-params-file", + "_key": "5d04d70a5dfe", + "_type": "span" + }, + { + "_key": "a469c8a253ac", + "_type": "span", + "marks": [], + "text": " can be used to pass in a file containing parameters in either format." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "08ae7429963d", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "0e0a78f00ba9", + "children": [ + { + "text": "", + "_key": "a93227a6d3ff", + "_type": "span" + } + ] + }, + { + "_type": "code", + "_key": "47bbbee89628", + "code": "$ nextflow run nextflow-io/rnaseq -params-file run_42.yaml" + }, + { + "_type": "block", + "style": "normal", + "_key": "90471ddfa812", + "children": [ + { + "_type": "span", + "text": "", + "_key": "5770dc6e7675" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "The YAML file could contain the following.", + "_key": "744d3dd67928", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "e1ba5d9fbb05" + }, + { + "style": "normal", + "_key": "6bde078896bf", + "children": [ + { + "text": "", + "_key": "c5bfbb492b0f", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "code", + "_key": "2462af85dd93", + "code": "reads : \"s3://gatk-data/run_42/reads/*_R{1,2}_*.fastq.gz\"\nbwa_index : \"$baseDir/index/*.bwa-index.tar.gz\"\npaired_end : true\npenalty : 12" + }, + { + "_key": "8b40b488ffae", + "children": [ + { + "_type": "span", + "text": "", + "_key": "fadbe454e0a2" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "df825d8da2d5", + "_type": "span", + "text": "5. Specific workflow entry points" + } + ], + "_type": "block", + "style": "h3", + "_key": "e2b58b2a59d5" + }, + { + "_type": "block", + "style": "normal", + "_key": "773f9a9ace8c", + "markDefs": [ + { + "_key": "4b5b0c568454", + "_type": "link", + "href": "https://www.nextflow.io/blog/2020/dsl2-is-here.html" + } + ], + "children": [ + { + "marks": [], + "text": "The recently released ", + "_key": "b39078a98091", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "4b5b0c568454" + ], + "text": "DSL2", + "_key": "60034d94a3b7" + }, + { + "marks": [], + "text": " adds powerful modularity to Nextflow and enables scripts to contain multiple workflows. By default, the unnamed workflow is assumed to be the main entry point for the script, however, with numerous named workflows, the entry point can be customized by using the ", + "_key": "15f0d38017d5", + "_type": "span" + }, + { + "_key": "8e8054c25f46", + "_type": "span", + "marks": [ + "code" + ], + "text": "entry" + }, + { + "_key": "f8d1080431e8", + "_type": "span", + "marks": [], + "text": " child-option of the run command." + } + ] + }, + { + "children": [ + { + "_key": "cee6c0c5bac3", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "431bd58fe897" + }, + { + "code": "$ nextflow run main.nf -entry workflow1", + "_type": "code", + "_key": "042ce39434f7" + }, + { + "style": "normal", + "_key": "da619209a28e", + "markDefs": [ + { + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/dsl2.html#implicit-workflow", + "_key": "9ae8d4b33714" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "This allows users to run a specific sub-workflow or a section of their entire workflow script. For more information, refer to the ", + "_key": "ce97d7044e2b" + }, + { + "text": "implicit workflow", + "_key": "7ac639c27172", + "_type": "span", + "marks": [ + "9ae8d4b33714" + ] + }, + { + "_key": "19c9cc471b2b", + "_type": "span", + "marks": [], + "text": " section of the documentation." 
+ } + ], + "_type": "block" + }, + { + "_key": "7369caa20cdb", + "children": [ + { + "text": "", + "_key": "9af8e981ae11", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Additionally, as of version 20.09.1-edge, you can specify the script in a project to run other than ", + "_key": "d1ae77edbd7c" + }, + { + "marks": [ + "code" + ], + "text": "main.nf", + "_key": "ff50ad9d5e84", + "_type": "span" + }, + { + "text": " using the command line option ", + "_key": "90a223fdf06e", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "-main-script", + "_key": "ad70a5e19ee4" + }, + { + "_key": "7770f21b6cf5", + "_type": "span", + "marks": [], + "text": "." + } + ], + "_type": "block", + "style": "normal", + "_key": "db137b5fb480" + }, + { + "_type": "block", + "style": "normal", + "_key": "6e3e8fd35197", + "children": [ + { + "_key": "a214947c4f46", + "_type": "span", + "text": "" + } + ] + }, + { + "code": "$ nextflow run http://github.com/my/pipeline -main-script my-analysis.nf", + "_type": "code", + "_key": "30dd7f4e01e9" + }, + { + "_key": "ba4b90a4f69b", + "children": [ + { + "_key": "e4911c6cd4f6", + "_type": "span", + "text": "Bonus trick! Web dashboard launched from the CLI" + } + ], + "_type": "block", + "style": "h3" + }, + { + "children": [ + { + "text": "The tricks above highlight the functionality of the Nextflow CLI. However, for long-running workflows, monitoring becomes all the more crucial. With Nextflow Tower, we can invoke any Nextflow pipeline execution from the CLI and use the integrated dashboard to follow the workflow execution wherever we are. Sign-in to ", + "_key": "733e633f475b", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "78b96fc3b98d" + ], + "text": "Tower", + "_key": "835ee4efc8dc" + }, + { + "marks": [], + "text": " using your GitHub credentials, obtain your token from the Getting Started page and export them into your terminal, ", + "_key": "ab5bb1caccf7", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "~/.bashrc", + "_key": "b833263d1f5e" + }, + { + "_key": "b03539fd1f38", + "_type": "span", + "marks": [], + "text": " or include them in your " + }, + { + "_key": "3ec0efc65715", + "_type": "span", + "marks": [ + "code" + ], + "text": "nextflow.config" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "fde29a59ee97" + } + ], + "_type": "block", + "style": "normal", + "_key": "5791813a4f49", + "markDefs": [ + { + "_type": "link", + "href": "https://tower.nf", + "_key": "78b96fc3b98d" + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "1974648cab41" + } + ], + "_type": "block", + "style": "normal", + "_key": "d46e7176b3e3" + }, + { + "_type": "code", + "_key": "a59e198bb227", + "code": "$ export TOWER_ACCESS_TOKEN=my-secret-tower-key\n$ export NXF_VER=20.07.1" + }, + { + "_type": "block", + "style": "normal", + "_key": "d59d4e6b7fd2", + "children": [ + { + "_type": "span", + "text": "", + "_key": "85b813e0227e" + } + ] + }, + { + "_key": "4f8c1e69b99a", + "markDefs": [], + "children": [ + { + "_key": "61209901f441", + "_type": "span", + "marks": [], + "text": "Next simply add the "-with-tower" child-option to any Nextflow run command. A URL with the monitoring dashboard will appear." 
+ } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "9964b61dc7ba", + "children": [ + { + "text": "", + "_key": "e18c0b4a40da", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "code", + "_key": "7ffa9a76c114", + "code": "$ nextflow run nextflow-io/hello -with-tower" + }, + { + "_key": "e1864562eb67", + "children": [ + { + "_type": "span", + "text": "", + "_key": "2e926af9e987" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "Nextflow Giveaway", + "_key": "81d52ccfc71c" + } + ], + "_type": "block", + "style": "h3", + "_key": "2ea447a60afa" + }, + { + "markDefs": [], + "children": [ + { + "_key": "c83d8bbf452b", + "_type": "span", + "marks": [], + "text": "If you want to look stylish while you put the above tips into practice, or simply like free stuff, we are giving away five of our latest Nextflow hoodie and sticker packs. Retweet or like the Nextflow tweet about this article and we will draw and notify the winners on October 31st!" + } + ], + "_type": "block", + "style": "normal", + "_key": "597e62b74ed2" + }, + { + "_key": "607b67145889", + "children": [ + { + "_type": "span", + "text": "", + "_key": "5315e940ca76" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "b05b89361e1c", + "children": [ + { + "text": "About the Author", + "_key": "9e182c922cfa", + "_type": "span" + } + ], + "_type": "block", + "style": "h3" + }, + { + "children": [ + { + "_key": "98112b0da88b", + "_type": "span", + "marks": [ + "28202d2f1e4e" + ], + "text": "Abhinav Sharma" + }, + { + "marks": [], + "text": " is a Bioinformatics Engineer at ", + "_key": "ce7c8685a1a4", + "_type": "span" + }, + { + "text": "Seqera Labs", + "_key": "4fee5ddab6db", + "_type": "span", + "marks": [ + "e69245a51845" + ] + }, + { + "_key": "6f51547015be", + "_type": "span", + "marks": [], + "text": " interested in Data Science and Cloud Engineering. He enjoys working on all things Genomics, Bioinformatics and Nextflow." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "97c3b6ac58c0", + "markDefs": [ + { + "_key": "28202d2f1e4e", + "_type": "link", + "href": "https://www.linkedin.com/in/abhi18av/" + }, + { + "_type": "link", + "href": "https://www.seqera.io", + "_key": "e69245a51845" + } + ] + }, + { + "children": [ + { + "_key": "7deac9b37098", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "6c9f11bde344" + }, + { + "children": [ + { + "text": "Acknowledgements", + "_key": "9379d00d54c3", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "5ebdad595af1" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/KevinSayers", + "_key": "3734d6d2d01c" + }, + { + "href": "https://github.com/apeltzer", + "_key": "6fd08f204816", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "Shout out to ", + "_key": "f15069f01280", + "_type": "span" + }, + { + "text": "Kevin Sayers", + "_key": "048e53641c24", + "_type": "span", + "marks": [ + "3734d6d2d01c" + ] + }, + { + "text": " and ", + "_key": "7c24897fd2a0", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "6fd08f204816" + ], + "text": "Alexander Peltzer", + "_key": "e061aeb35772", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " for their earlier efforts in documenting the CLI and which inspired this work.", + "_key": "51f3e0dd06f4" + } + ], + "_type": "block", + "style": "normal", + "_key": "68c04976b571" + }, + { + "children": [ + { + "text": "", + "_key": "417a40a3ad91", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "23c15a4f6f1e" + }, + { + "children": [ + { + "_type": "span", + "marks": [ + "em" + ], + "text": "The latest CLI docs can be found in the edge release docs at [https://www.nextflow.io/docs/latest/cli.html](https://www.nextflow.io/docs/latest/cli.html).", + "_key": "513ce70bb7b5" + } + ], + "_type": "block", + "style": "normal", + "_key": "dcf58fa87de7", + "markDefs": [] + } + ], + "_createdAt": "2024-09-25T14:15:47Z" + }, + { + "body": [ + { + "_type": "block", + "style": "normal", + "_key": "a2be3f1f832f", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Innovation can be viewed as the application of solutions that meet new requirements or existing market needs. Academia has traditionally been the driving force of innovation. Scientific ideas have shaped the world, but only a few of them were brought to market by the inventing scientists themselves, resulting in both time and financial loses.", + "_key": "7baae13aa833" + } + ] + }, + { + "children": [ + { + "_key": "ebae1b93c6f8", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "1c5af5f2a373" + }, + { + "children": [ + { + "text": "Lately there have been several attempts to boost scientific innovation and translation, with most notable in Europe being the Horizon 2020 funding program. The problem with these types of funding is that they are not designed for PhDs and Postdocs, but rather aim to promote the collaboration of senior scientists in different institutions. This neglects two very important facts, first and foremost that most of the Nobel prizes were given for discoveries made when scientists were in their 20's / 30's (not in their 50's / 60's). 
Secondly, innovation really happens when a few individuals (not institutions) face a problem in their everyday life/work, and one day they just decide to do something about it (end-user innovation). Without realizing, these people address a need that many others have. They don’t do it for the money or the glory; they do it because it bothers them! Many examples of companies that started exactly this way include Apple, Google, and Virgin Airlines.", + "_key": "d5b540f77171", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "86bfb0541d66", + "markDefs": [] + }, + { + "children": [ + { + "_key": "d01647856223", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "bcafa24b7e57" + }, + { + "children": [ + { + "_type": "span", + "text": "The story of Nextflow", + "_key": "31f95ec7419f" + } + ], + "_type": "block", + "style": "h3", + "_key": "8488bd7cd9a9" + }, + { + "_key": "cb13d099e328", + "markDefs": [ + { + "_type": "link", + "href": "http://en.wikipedia.org/wiki/Dataflow_programming", + "_key": "b57662986664" + }, + { + "_type": "link", + "href": "http://en.wikipedia.org/wiki/Domain-specific_language", + "_key": "8394a05e9476" + } + ], + "children": [ + { + "_key": "612afe22983d", + "_type": "span", + "marks": [], + "text": "Similarly, Nextflow started as an attempt to solve the every-day computational problems we were facing with “big biomedical data” analyses. We wished that our huge and almost cryptic BASH-based pipelines could handle parallelization automatically. In our effort to make that happen we stumbled upon the " + }, + { + "_key": "4966ecb26fed", + "_type": "span", + "marks": [ + "b57662986664" + ], + "text": "Dataflow" + }, + { + "marks": [], + "text": " programming model and Nextflow was created. We were getting furious every time our two-week long pipelines were crashing and we had to re-execute them from the beginning. We, therefore, developed a caching system, which allows Nextflow to resume any pipeline from the last executed step. While we were really enjoying developing a new ", + "_key": "c70f6786231e", + "_type": "span" + }, + { + "text": "DSL", + "_key": "8943ff92be62", + "_type": "span", + "marks": [ + "8394a05e9476" + ] + }, + { + "_type": "span", + "marks": [], + "text": " and creating our own operators, at the same time we were not willing to give up our favorite Perl/Python scripts and one-liners, and thus Nextflow became a polyglot.", + "_key": "d49bb678b414" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "96a7ecdef5b4", + "children": [ + { + "_key": "5a312794bc02", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "ca7d8e16e685", + "markDefs": [ + { + "href": "https://www.docker.com/", + "_key": "7480592f1c6f", + "_type": "link" + }, + { + "_key": "becc2d4710fa", + "_type": "link", + "href": "https://github.com" + }, + { + "href": "https://bitbucket.org/", + "_key": "b8a8fb2b235d", + "_type": "link" + } + ], + "children": [ + { + "text": "Another problem we were facing was that our pipelines were invoking a lot of third-party software, making distribution and execution on different platforms a nightmare. 
Once again while searching for a solution to this problem, we were able to identify a breakthrough technology ", + "_key": "25eb0a849b96", + "_type": "span", + "marks": [] + }, + { + "text": "Docker", + "_key": "f9fa5b609ab4", + "_type": "span", + "marks": [ + "7480592f1c6f" + ] + }, + { + "_key": "5eb896f7e94d", + "_type": "span", + "marks": [], + "text": ", which is now revolutionising cloud computation. Nextflow has been one of the first framework, that fully supports Docker containers and allows pipeline execution in an isolated and easy to distribute manner. Of course, sharing our pipelines with our friends rapidly became a necessity and so we had to make Nextflow smart enough to support " + }, + { + "_type": "span", + "marks": [ + "becc2d4710fa" + ], + "text": "Github", + "_key": "43bf929f36c8" + }, + { + "_key": "02635a4a80df", + "_type": "span", + "marks": [], + "text": " and " + }, + { + "text": "Bitbucket", + "_key": "7a485b41d6c0", + "_type": "span", + "marks": [ + "b8a8fb2b235d" + ] + }, + { + "_key": "bd4649e35e59", + "_type": "span", + "marks": [], + "text": " integration." + } + ], + "_type": "block" + }, + { + "_key": "55ff3fa3c70c", + "children": [ + { + "_key": "41ab4d2246f7", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "c30a909f5adf", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "I don’t know if Nextflow will make as much difference in the world as the Dataflow programming model and Docker container technology are making, but it has already made a big difference in our lives and that is all we ever wanted…", + "_key": "ddb4dbc05e9d" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "43b0fa2ba2a2" + } + ], + "_type": "block", + "style": "normal", + "_key": "5ec0d02aaf68" + }, + { + "children": [ + { + "_type": "span", + "text": "Conclusion", + "_key": "b638fd3a1620" + } + ], + "_type": "block", + "style": "h3", + "_key": "0b77c16ea339" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Summarising, it is a pity that PhDs and Postdocs are the neglected engine of Innovation. They are not empowered to innovate, by identifying and addressing their needs, and to potentially set up commercial solutions to their problems. This fact becomes even sadder when you think that only 3% of Postdocs have a chance to become PIs in the UK. Instead more and more money is being invested into the senior scientists who only require their PhD students and Postdocs to put another step into a well-defined ladder. 
In todays world it seems that ideas, such as Nextflow, will only get funded for their scientific value, not as innovative concepts trying to address a need.", + "_key": "93bca796ff84" + } + ], + "_type": "block", + "style": "normal", + "_key": "95b287817bc0" + } + ], + "_type": "blogPost", + "_rev": "2PruMrLMGpvZP5qAknm7QB", + "title": "Innovation In Science - The story behind Nextflow", + "tags": [ + { + "_ref": "ace8dd2c-eed3-4785-8911-d146a4e84bbb", + "_type": "reference", + "_key": "48a472a1b789" + }, + { + "_type": "reference", + "_key": "c290bcded3ee", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493" + } + ], + "author": { + "_type": "reference", + "_ref": "7d389002-0fae-4149-98d4-22623b6afbed" + }, + "_id": "8685dbcae4cf", + "meta": { + "slug": { + "current": "innovation-in-science-the-story-behind-nextflow" + } + }, + "publishedAt": "2015-06-09T06:00:00.000Z", + "_createdAt": "2024-09-25T14:14:54Z", + "_updatedAt": "2024-09-26T09:01:10Z" + }, + { + "meta": { + "slug": { + "current": "clarification-about-nextflow-license" + }, + "description": "Over past week there was some discussion on social media regarding the Nextflow license and its impact on users’ workflow applications." + }, + "_rev": "rsIQ9Jd8Z4nKBVUruy4WZQ", + "_type": "blogPost", + "author": { + "_ref": "paolo-di-tommaso", + "_type": "reference" + }, + "_createdAt": "2024-09-25T14:15:26Z", + "title": "Clarification about the Nextflow license", + "_updatedAt": "2024-10-16T12:11:56Z", + "tags": [ + { + "_key": "b58becd39b3f", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference" + } + ], + "publishedAt": "2018-07-20T06:00:00.000Z", + "_id": "87f476831047", + "body": [ + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Over past week there was some discussion on social media regarding the Nextflow license and its impact on users' workflow applications.", + "_key": "ee78b813fab1" + } + ], + "_type": "block", + "style": "normal", + "_key": "175bb65272ff" + }, + { + "children": [ + { + "text": "", + "_key": "bec70789dbe9", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "29ae1994035c", + "markDefs": [] + }, + { + "size": "medium", + "_type": "picture", + "_key": "beaf231ca24e", + "alignment": "center", + "asset": { + "_type": "image", + "asset": { + "_ref": "image-8354a4645c0bd891522f53aa56e0c52ffc7efdce-1100x574-png", + "_type": "reference" + } + } + }, + { + "alignment": "center", + "asset": { + "_type": "image", + "asset": { + "_type": "reference", + "_ref": "image-504d32cf87bbadfd6d120e0206d78db3ecafdf11-1100x670-png" + } + }, + "size": "medium", + "_type": "picture", + "_key": "2d2bd7cb4227" + }, + { + "alignment": "center", + "asset": { + "asset": { + "_ref": "image-a86cbd83400be04b282f331944eb185e39dc1259-1100x520-png", + "_type": "reference" + }, + "_type": "image" + }, + "size": "medium", + "_type": "picture", + "_key": "4c3052ec0cd5" + }, + { + "_key": "939de290ba81", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "3d5fbec0ec22", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "h2", + "_key": "87bfec47e7b7", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "What's the problem with GPL?", + "_key": "7316efd3be6b", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "86ba355feefd", + "markDefs": [ + { + "_type": "link", + "href": 
"https://github.com/nextflow-io/nextflow/blob/c080150321e5000a2c891e477bb582df07b7f75f/src/main/groovy/nextflow/Nextflow.groovy", + "_key": "dfe16ce44dc5" + }, + { + "href": "https://www.kernel.org/doc/html/v4.17/process/license-rules.html", + "_key": "a1cd838b0b74", + "_type": "link" + }, + { + "_type": "link", + "href": "https://git-scm.com/about/free-and-open-source", + "_key": "e4ccca697567" + } + ], + "children": [ + { + "_key": "8010f6e6688a", + "_type": "span", + "marks": [], + "text": "Nextflow has been released under the GPLv3 license since its early days " + }, + { + "_type": "span", + "marks": [ + "dfe16ce44dc5" + ], + "text": "over 5 years ago", + "_key": "de3a5796bda4" + }, + { + "marks": [], + "text": ". GPL is a very popular open source licence used by many projects (like, for example, ", + "_key": "0f0640a12398", + "_type": "span" + }, + { + "text": "Linux", + "_key": "e55524b52f93", + "_type": "span", + "marks": [ + "a1cd838b0b74" + ] + }, + { + "text": " and ", + "_key": "75a849cec54e", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "e4ccca697567" + ], + "text": "Git", + "_key": "f653cdc75a04" + }, + { + "_type": "span", + "marks": [], + "text": ") and it has been designed to promote the adoption and spread of open source software and culture.", + "_key": "cac16b0a10c8" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "88045297fe1d", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "9760cd034370" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "With this idea in mind, GPL requires the author of a piece of software, ", + "_key": "67ca6f3357e2", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "derived", + "_key": "af1bdb65ba54" + }, + { + "_type": "span", + "marks": [], + "text": " from a GPL licensed application or library, to distribute it using the same license i.e. GPL itself.", + "_key": "3665030fd62c" + } + ], + "_type": "block", + "style": "normal", + "_key": "516568a7f149" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "d88862da47ae" + } + ], + "_type": "block", + "style": "normal", + "_key": "e414d92f0b02" + }, + { + "_type": "block", + "style": "normal", + "_key": "cf606a9dd27b", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "This is generally good, because this requirement incentives the growth of the open source ecosystem and the adoption of open source software more widely.", + "_key": "8078718dc20d", + "_type": "span" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "a6bd1cb31c03" + } + ], + "_type": "block", + "style": "normal", + "_key": "7a2e887730d9", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "d859390f48ab", + "markDefs": [ + { + "_type": "link", + "href": "http://ivory.idyll.org/blog/2015-on-licensing-in-bioinformatics.html", + "_key": "4d48789afaa4" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "However, this is also a reason for concern by some users and organizations because it's perceived as too strong requirement by copyright holders (who may not want to disclose their code) and because it can be difficult to interpret what a \\*derived\\* application is. 
See for example ", + "_key": "d01089b0e094" + }, + { + "_type": "span", + "marks": [ + "4d48789afaa4" + ], + "text": "this post by Titus Brown", + "_key": "ed12fb795b45" + }, + { + "text": " at this regard.", + "_key": "268f8bbf1dca", + "_type": "span", + "marks": [] + } + ] + }, + { + "_type": "block", + "style": "h2", + "_key": "e2410d32c9e3", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "b779685b201f" + } + ] + }, + { + "_key": "017924093f51", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "What's the impact of the Nextflow license on my application?", + "_key": "1e9fd9ba1417" + } + ], + "_type": "block", + "style": "h2" + }, + { + "markDefs": [ + { + "href": "https://www.gnu.org/licenses/gpl-faq.en.html#GPLStaticVsDynamic", + "_key": "be15ceecf475", + "_type": "link" + }, + { + "_type": "link", + "href": "https://www.gnu.org/licenses/gpl-faq.en.html#IfInterpreterIsGPL", + "_key": "dfa2b3bfcc6c" + } + ], + "children": [ + { + "marks": [], + "text": "If you are not distributing your application, based on Nextflow, it doesn't affect you in any way. If you are distributing an application that requires Nextflow to be executed, technically speaking your application is dynamically linking to the Nextflow runtime and it uses routines provided by it. For this reason your application should be released as GPLv3. See ", + "_key": "63c76845a7a9", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "be15ceecf475" + ], + "text": "here", + "_key": "50c2a234346e" + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "a9e11a811a32" + }, + { + "text": "here", + "_key": "81d384dcfed5", + "_type": "span", + "marks": [ + "dfa2b3bfcc6c" + ] + }, + { + "_key": "02183fb1e08b", + "_type": "span", + "marks": [], + "text": "." + } + ], + "_type": "block", + "style": "normal", + "_key": "426b98fe01e9" + }, + { + "markDefs": [], + "children": [ + { + "_key": "0a086dd63d75", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "a5b275e114bc" + }, + { + "style": "normal", + "_key": "725359980b14", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "However, this was not our original intention. We don’t consider workflow applications to be subject to the GPL copyleft obligations of the GPL even though they may link dynamically to Nextflow functionality through normal calls and we are not interested to enforce the license requirement to third party workflow developers and organizations. Therefore you can distribute your workflow application using the license of your choice. For other kind of derived applications the GPL license should be used, though.", + "_key": "48defa127178" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "202297dcde47" + } + ], + "_type": "block", + "style": "normal", + "_key": "746ae2fc9c15" + }, + { + "style": "h2", + "_key": "1fb5ead2d223", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "That's all?", + "_key": "bdca4b261402" + } + ], + "_type": "block" + }, + { + "_key": "fbae09602d68", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "No. We are aware that this is not enough and the GPL licence can impose some limitation in the usage of Nextflow to some users and organizations. 
For this reason we are working with the CRG legal department to move Nextflow to a more permissive open source license. This is primarily motivated by our wish to make it more adaptable and compatible with all the different open source ecosystems, but also to remove any remaining legal uncertainty that using Nextflow through linking with its functionality may cause.", + "_key": "aba2c8e39e71" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "30cf55885fa1", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "a4ed4b2d581b" + } + ] + }, + { + "_key": "730f9465b484", + "markDefs": [], + "children": [ + { + "text": "We are expecting that this decision will be made over the summer so stay tuned and continue to enjoy Nextflow.", + "_key": "d4a47a502983", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + } + ] + }, + { + "publishedAt": "2023-10-18T06:00:00.000Z", + "_createdAt": "2024-09-25T14:17:23Z", + "_updatedAt": "2024-10-14T09:13:40Z", + "author": { + "_ref": "mNsm4Vx1W1Wy6aYYkroetD", + "_type": "reference" + }, + "title": "Introducing the Nextflow Ambassador Program", + "_rev": "mvya9zzDXWakVjnX4hhVQQ", + "tags": [ + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "513706fc2fdb" + }, + { + "_ref": "3d25991c-f357-442b-a5fa-6c02c3419f88", + "_type": "reference", + "_key": "ce1df4cabdb1" + } + ], + "body": [ + { + "_key": "4410f5606b1d", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "We are excited to announce the launch of the Nextflow Ambassador Program, a worldwide initiative designed to foster collaboration, knowledge sharing, and community growth. It is intended to recognize and support the efforts of our community leaders and marks another step forward in our mission to advance scientific research and empower researchers.", + "_key": "a117e2b4acff", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "bf50e4abff86" + } + ], + "_type": "block", + "style": "normal", + "_key": "bc7b6c0a4c88" + }, + { + "_type": "picture", + "alt": "nf-core Hackathon in Barcelona 2023", + "_key": "fc868e5686da", + "alignment": "right", + "asset": { + "_type": "image", + "asset": { + "_ref": "image-2a9648dad75de6c930ca67d9dc43b90c9a5ce90b-900x981-jpg", + "_type": "reference" + } + }, + "size": "medium" + }, + { + "_key": "227b0fd37e00", + "markDefs": [], + "children": [ + { + "_key": "1bf450716ce6", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow ambassadors will play a vital role in:", + "_key": "85f6064419cc" + } + ], + "_type": "block", + "style": "normal", + "_key": "6130b7d70f6c", + "markDefs": [] + }, + { + "_key": "32556c7707d7", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Sharing knowledge", + "_key": "fe66504a70b00", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_key": "8762e91494e0", + "_type": "span", + "marks": [], + "text": ": Ambassadors provide valuable insights and best practices to help users make the most of Nextflow by writing training material and blog posts, giving seminars and workshops, organizing hackathons and meet-ups, and helping with community support." 
+ } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "e7a7b3f014ad", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Fostering collaboration", + "_key": "75e1c00f694b0" + }, + { + "text": ": As knowledgeable members of our community, ambassadors facilitate connections among users and developers, enabling collaboration on community projects, such as nf-core pipelines, sub-workflows, and modules, among other things, in the Nextflow ecosystem.", + "_key": "0856b210910e", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block" + }, + { + "_key": "3772fa8a407f", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Community growth", + "_key": "b7ee6750eb9c0", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_key": "1df25a8ab577", + "_type": "span", + "marks": [], + "text": ": Ambassadors help expand and enrich the Nextflow community, making it more vibrant and supportive. They are local contacts for new community members and engage with potential users in their region and fields of expertise." + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "_key": "59449ed5fb6d", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "7c4108f47a57" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "616292489f7d", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "As community members who already actively contribute to outreach, ambassadors will be supported to extend the work they’re already doing. For example, many of our ambassadors run local Nextflow training events – to help with this, the program will include “train the trainer” sessions and give access to our content library with slide decks, templates, and more. Ambassadors can also request stickers and financial support for events they organize (e.g., for pizza). Seqera is opening an exclusive travel fund that ambassadors can apply to help cover travel costs for events where they will present relevant work. Social media content written by ambassadors will be amplified by the nextflow and nf-core accounts, increasing their reach. Ambassadors will get “behind the scenes” access, with insights into running an open-source community, early access to new features, and a great networking experience. The ambassador network will enable members to be kept up-to-date with events and opportunities happening all over the world. 
To recognize their efforts, ambassadors will receive exclusive swag and apparel, a certificate for their work, and a profile on the ambassador page of our website.", + "_key": "ff67b34b21490" + } + ], + "_type": "block" + }, + { + "_key": "18a57a26bbdb", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "5830c122b5290" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "fbe6916524f0", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Meet our ambassadors", + "_key": "1a57c6363147" + } + ], + "_type": "block", + "style": "h2" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "You can visit our ", + "_key": "3627b6c3fe0b" + }, + { + "_type": "span", + "marks": [ + "0bdbee58fdb5" + ], + "text": "Nextflow ambassadors page", + "_key": "152ab0c2219a" + }, + { + "_key": "6255a610d688", + "_type": "span", + "marks": [], + "text": " to learn more about our first group of ambassadors. You will find their profiles there, highlighting their interests, expertise, and insights they bring to the Nextflow ecosystem." + } + ], + "_type": "block", + "style": "normal", + "_key": "1e7b869dbea3", + "markDefs": [ + { + "_type": "link", + "href": "https://www.nextflow.io/our_ambassadors.html", + "_key": "0bdbee58fdb5" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "99bb3d3ea0d4", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "b704ba7382b7", + "_type": "span" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "You can see snippets about some of our ambassadors below:", + "_key": "4391e48f7672", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "69067864ca03" + }, + { + "_key": "99d21bef958f", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "c7208bdf3749" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "h4", + "_key": "ad9dd568c04d", + "markDefs": [], + "children": [ + { + "text": "Priyanka Surana", + "_key": "d3ad09ee244c", + "_type": "span", + "marks": [] + } + ] + }, + { + "_key": "10afdcfe3624", + "markDefs": [ + { + "href": "https://pipelines.tol.sanger.ac.uk/pipelines", + "_key": "bd7e7fb6fcef", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Priyanka Surana is a Principal Bioinformatician at the Wellcome Sanger Institute, where she oversees the Nextflow development for the Tree of Life program. Over the last almost two years, they have released nine pipelines with nf-core standards and have three more in development. You can learn more about them ", + "_key": "5c6bbedd2c1a" + }, + { + "_type": "span", + "marks": [ + "bd7e7fb6fcef" + ], + "text": "here", + "_key": "259b698e277c" + }, + { + "text": ".", + "_key": "f93972378a73", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "3745179dfb82", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "6a4da5a1fcdb", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_key": "d9f323bb7883", + "_type": "span", + "marks": [], + "text": "She’s one of our ambassadors in the UK 🇬🇧 and has already done fantastic outreach work, organizing seminars and bringing many new users to our community! 
🤩 In the March Hackathon, she organized a local site with over 70 individuals participating in person, plus over five other events in 2023. The Nextflow community on the Wellcome Genome Campus started in March 2023 with the nf-core hackathon, and now it has grown to over 150 members across 11 different organizations across Cambridge. Currently, they are planning a day-long Nextflow Symposium in December 🤯. They do seminars, workshops, coffee meetups, and trainings. In our previous round of the Nextflow and nf-core mentorship, Priyanka mentored Lila, a graduate student in Peru, to build her first Nextflow pipeline using nf-core tools to analyze bacterial metagenomics data. This is the power of a Nextflow ambassador! Not only growing a local community but helping people all over the world to get the best out of Nextflow and nf-core 🥰." + } + ], + "_type": "block", + "style": "normal", + "_key": "39d491380594" + }, + { + "_type": "block", + "style": "normal", + "_key": "652ea87303b4", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "2ce936283694" + } + ] + }, + { + "style": "h4", + "_key": "8f7bd3003d5d", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Abhinav Sharma", + "_key": "cba4909039ca", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "de7a97f18459", + "markDefs": [ + { + "_type": "link", + "href": "https://www.youtube.com/playlist?list=PL3xpfTVZLcNikun1FrSvtXW8ic32TciTJ", + "_key": "06ab99d97dfd" + }, + { + "_type": "link", + "href": "https://twitter.com/abhi18av/status/1695863348162675042", + "_key": "0de7e52eeee3" + }, + { + "_type": "link", + "href": "https://github.com/TelethonKids/Nextflow-BioWiki", + "_key": "dc28161b87d7" + }, + { + "_key": "beca79ea6900", + "_type": "link", + "href": "https://www.gov.br/iec/pt-br/assuntos/noticias/curso-contribui-para-criacao-da-rede-norte-nordeste-de-vigilancia-genomica-para-tuberculose-no-iec" + } + ], + "children": [ + { + "_key": "b6e8d36f86c3", + "_type": "span", + "marks": [], + "text": "Abhinav is a PhD candidate at Stellenbosch University, South Africa. As a Nextflow Ambassador, Abhinav has been tremendously active in the Global South, supporting young scientists in Africa 🇿🇦🇿🇲, Brazil 🇧🇷, India 🇮🇳 and Australia 🇦🇺 leading to the growth of local communities. He has contributed to the " + }, + { + "_key": "518bdf7a4d40", + "_type": "span", + "marks": [ + "06ab99d97dfd" + ], + "text": "Nextflow training in Hindi" + }, + { + "marks": [], + "text": " and played a key role in integrating African bioinformaticians in the Nextflow and nf-core community and initiatives, showcased by the high participation of individuals in African countries who benefited from mentorship during nf-core Hackathons, Training events and prominent workshops like ", + "_key": "7d1842c315da", + "_type": "span" + }, + { + "_key": "7b14a2d7260f", + "_type": "span", + "marks": [ + "0de7e52eeee3" + ], + "text": "VEME, 2023" + }, + { + "_type": "span", + "marks": [], + "text": ". In Australia, Abhinav continues to collaborate with Patricia, a research scientist from Telethon Kids Institute, Perth (whom he mentored during the nf-core mentorship round 2), to organize monthly seminars on ", + "_key": "21fcf63931ec" + }, + { + "marks": [ + "dc28161b87d7" + ], + "text": "BioWiki", + "_key": "7dfad4122f67", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " and bootcamp for local capacity building. 
In addition, he engages in regular capacity-building sessions in Brazilian institutes such as ", + "_key": "314a2d2d508f" + }, + { + "_type": "span", + "marks": [ + "beca79ea6900" + ], + "text": "Instituto Evandro Chagas", + "_key": "3def7320d8e3" + }, + { + "marks": [], + "text": " (Belém, Brazil) and INI, FIOCRUZ (Rio de Janeiro, Brazil). Last but not least, Abhinav has contributed to the Nextflow community and project in several ways, even to the extent of contributing to the Nextflow code base and plugin ecosystem! 😎", + "_key": "0ef144ecac23", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "ab15123b64bf", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "0410219f035e" + }, + { + "_type": "block", + "style": "h4", + "_key": "bf61576534ee", + "markDefs": [], + "children": [ + { + "_key": "06626c083a6f", + "_type": "span", + "marks": [], + "text": "Robert Petit" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "Robert Petit is the Senior Bioinformatics Scientist at the ", + "_key": "901531dd83bb", + "_type": "span" + }, + { + "text": "Wyoming Public Health Laboratory", + "_key": "90ab0c0b886c", + "_type": "span", + "marks": [ + "cdf56da59ce7" + ] + }, + { + "marks": [], + "text": " 🦬 and a long-time contributor to the Nextflow community! 🥳 Being a Nextflow Ambassador, Robert has made extensive efforts to grow the Nextflow and nf-core communities, both locally and internationally. Through his work on ", + "_key": "a86514e125db", + "_type": "span" + }, + { + "text": "Bactopia", + "_key": "f3f8d1d69af5", + "_type": "span", + "marks": [ + "f40ca52fd7d3" + ] + }, + { + "_type": "span", + "marks": [], + "text": ", a popular and extensive Nextflow pipeline for the analysis of bacterial genomes, Robert has been able to ", + "_key": "df550964cc75" + }, + { + "text": "contribute to nf-core regularly", + "_key": "27708e432c84", + "_type": "span", + "marks": [ + "3a86ba5c4884" + ] + }, + { + "_type": "span", + "marks": [], + "text": ". As a Bioconda Core team member, he is always lending a hand when called upon by the Nextflow community, whether it is to add a new recipe or approve a pull request! ⚒️ He has also delivered multiple trainings to the local community in Wyoming, US 🇺🇸, and workshops at conferences, including ASM Microbe. Robert's dedication as a Nextflow Ambassador is best highlighted, and he'll agree, by his active role as a mentor. 
Robert has acted as a mentor multiple times during virtual nf-core hackathons, and he is the only person to be a mentor in all three rounds of the Nextflow and nf-core mentorship program 😍!", + "_key": "f14233ba7f6d" + } + ], + "_type": "block", + "style": "normal", + "_key": "0e6e149f9adc", + "markDefs": [ + { + "href": "https://health.wyo.gov/publichealth/lab/", + "_key": "cdf56da59ce7", + "_type": "link" + }, + { + "_type": "link", + "href": "https://bactopia.github.io/", + "_key": "f40ca52fd7d3" + }, + { + "href": "https://bactopia.github.io/v3.0.0/impact-and-outreach/enhancements/#enhancements-and-fixes", + "_key": "3a86ba5c4884", + "_type": "link" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "ca2bc13967ff" + } + ], + "_type": "block", + "style": "normal", + "_key": "a048a9278d4b" + }, + { + "_type": "block", + "style": "normal", + "_key": "4836756ed37f", + "markDefs": [], + "children": [ + { + "_key": "72727706ce42", + "_type": "span", + "marks": [], + "text": "The Nextflow Ambassador Program is a testament to the power of community-driven innovation, and we invite you to join us in celebrating this exceptional group. In the coming weeks and months, you will hear more from our ambassadors as they continue to share their experiences, insights, and expertise with the community as freshly minted Nextflow ambassadors." + } + ] + } + ], + "_type": "blogPost", + "_id": "8b9158ac2e43", + "meta": { + "slug": { + "current": "introducing-nextflow-ambassador-program" + }, + "description": "We are excited to announce the launch of the Nextflow Ambassador Program, a worldwide initiative designed to foster collaboration, knowledge sharing, and community growth. It is intended to recognize and support the efforts of our community leaders and marks another step forward in our mission to advance scientific research and empower researchers" + } + }, + { + "meta": { + "description": "After diving into the Nextflow community, I’ve seen how it benefits bioinformatics in places like South Africa, Brazil, and France. I’m confident it can do the same for Türkiye by fostering collaboration and speeding up research. Since I became a Nextflow Ambassador, I am happy and excited because I can contribute to this development! ", + "slug": { + "current": "bioinformatics-growth-in-turkiye" + } + }, + "_createdAt": "2024-09-25T14:17:53Z", + "author": { + "_ref": "L90MLvtZSPRQtUzPRoOtHk", + "_type": "reference" + }, + "publishedAt": "2024-06-12T06:00:00.000Z", + "_type": "blogPost", + "_id": "8d36fbb5a89e", + "_updatedAt": "2024-09-27T10:00:41Z", + "body": [ + { + "_type": "block", + "style": "normal", + "_key": "728aadc7df47", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "After diving into the Nextflow community, I've seen how it benefits bioinformatics in places like South Africa, Brazil, and France. I'm confident it can do the same for Türkiye by fostering collaboration and speeding up research. Since I became a Nextflow Ambassador, I am happy and excited because I can contribute to this development! Even though our first attempt to organize an introductory Nextflow workshop was online, it was a fruitful collaboration with RSG-Türkiye that initiated our effort to promote more Nextflow in Türkiye. 
We are happy to announce that we will organize a hands-on workshop soon.", + "_key": "4c1e3a223c28", + "_type": "span" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "96ec68814f5d" + } + ], + "_type": "block", + "style": "normal", + "_key": "0947163b20c5" + }, + { + "style": "normal", + "_key": "c1a91f132252", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "0c0039fb270d" + } + ], + "_type": "block" + }, + { + "markDefs": [ + { + "href": "https://www.ghga.de/about-us/team-members/narci-kuebra", + "_key": "b41f17361216", + "_type": "link" + }, + { + "_type": "link", + "href": "https://www.ghga.de/about-us/how-we-work/workstreams", + "_key": "1b327c1797d0" + } + ], + "children": [ + { + "text": "I am ", + "_key": "544e70c5b2bf", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "b41f17361216" + ], + "text": "Kübra Narcı", + "_key": "458d51a67d5e" + }, + { + "marks": [], + "text": ", currently employed as a bioinformatician within the ", + "_key": "5c5400d98cf5", + "_type": "span" + }, + { + "_key": "e6f9c346c988", + "_type": "span", + "marks": [ + "1b327c1797d0" + ], + "text": "German Human Genome Phenome Archive (GHGA) Workflows workstream" + }, + { + "marks": [], + "text": ". Upon commencing this position nearly two years ago, I was introduced to Nextflow due to the necessity of transporting certain variant calling workflows here, and given my prior experience with other workflow managers, I was well-suited for the task. Though the initial two months were marked by challenges and moments of frustration, my strong perseverance ultimately led to the successful development of my first pipeline.", + "_key": "bf74d36af211", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "746b53fcfd5b" + }, + { + "_key": "014e9ec2d9a9", + "markDefs": [], + "children": [ + { + "_key": "3442f255a610", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "488db9c1ff14", + "markDefs": [], + "children": [ + { + "_key": "76c859d4aad4", + "_type": "span", + "marks": [], + "text": "Subsequently, owing much to the supportive Nextflow community, my interest, as well as my proficiency in the platform, steadily grew, culminating in my acceptance to the role of Nextflow Ambassador for the past six months. I jumped into the role since it was a great opportunity for GHGA and Nextflow to be connected even more." + } + ] + }, + { + "style": "normal", + "_key": "0bb2d975455d", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "\n", + "_key": "f8e74f191372" + } + ], + "_type": "block" + }, + { + "alt": "meme on bright landscape", + "_key": "85eae924205e", + "asset": { + "_type": "reference", + "_ref": "image-805013bdaf2f1d396596eeb5484335d5bc8f4e10-1998x1114-png" + }, + "_type": "image" + }, + { + "_type": "block", + "style": "normal", + "_key": "daa476edbff4", + "markDefs": [], + "children": [ + { + "text": "\nTransitioning into this ambassadorial role prompted a solid realization: the absence of a dedicated Nextflow community in Türkiye. This revelation was a shock, particularly given my academic background in bioinformatics there, where the community’s live engagement in workflow development is undeniable. Witnessing Turkish contributors within Nextflow and nf-core Slack workspaces further underscored this sentiment. 
It became evident that what was lacking was a spark for organizing events to ignite the Turkish community, a task I gladly undertook.", + "_key": "e0cae04b7c15", + "_type": "span", + "marks": [] + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "1c169d51a7c2" + } + ], + "_type": "block", + "style": "normal", + "_key": "1890aebcb24c", + "markDefs": [] + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://www.twitter.com/mribeirodantas", + "_key": "0d9f21c1a065" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "While I possessed foresight regarding the establishment of a Nextflow community, I initially faced uncertainty regarding the appropriate course of action. To address this, I sought counsel from ", + "_key": "d92384fb925f" + }, + { + "text": "Marcel", + "_key": "47f023f5fd19", + "_type": "span", + "marks": [ + "0d9f21c1a065" + ] + }, + { + "_type": "span", + "marks": [], + "text": ", given his pivotal role in the initiation of the Nextflow community in Brazil. Following our discussion and receipt of valuable insights, it became evident that establishing connections with the appropriate community from my base in Germany was a necessity.", + "_key": "12d7f0de46f8" + } + ], + "_type": "block", + "style": "normal", + "_key": "e1545316fc7b" + }, + { + "_type": "block", + "style": "normal", + "_key": "204a8601a84f", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "ab3812501569" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "99a38a3c2aee", + "markDefs": [ + { + "_type": "link", + "href": "https://rsgturkey.com", + "_key": "fa6637f14b88" + } + ], + "children": [ + { + "marks": [], + "text": "This attempt led me to meet with ", + "_key": "f1d5583a6c44", + "_type": "span" + }, + { + "text": "RSG-Türkiye", + "_key": "bfdf2f0d07cf", + "_type": "span", + "marks": [ + "fa6637f14b88" + ] + }, + { + "text": ". RSG-Türkiye aims to create a platform for students and post-docs in computational biology and bioinformatics in Türkiye. It aims to share knowledge and experience, promote collaboration, and expand training opportunities. The organization also collaborates with universities and the Bioinformatics Council, a recently established national organization as the Turkish counterpart of the ISCB (International Society for Computational Biology) to introduce industrial and academic research. To popularize the field, they have offline and online talk series in university student centers to promote computational biology and bioinformatics.", + "_key": "9573ac04552a", + "_type": "span", + "marks": [] + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "1990b33f56f1", + "markDefs": [], + "children": [ + { + "_key": "6dc197a3bb2c", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "_key": "58cabc9c49b9", + "markDefs": [ + { + "href": "https://www.youtube.com/watch?v=AqNmIkoQrNo&ab_channel=RSG-Turkey", + "_key": "611275fadb5f", + "_type": "link" + } + ], + "children": [ + { + "text": "Following our introduction, RSG-Türkiye and I hosted a workshop focusing on workflow reproducibility, Nextflow, and nf-core. We chose Turkish as the language to make it more accessible for participants who are not fluent in English. The online session lasted a bit more than an hour and attracted nearly 50 attendees, mostly university students but also individuals from the research and industry sectors. 
The strong student turnout was especially gratifying as it aligned with my goal of building a vibrant Nextflow community in Türkiye. I took the opportunity to discuss Nextflow’s ambassadorship and mentorship programs, which can greatly benefit students, given Türkiye’s growing interest in bioinformatics. The whole workshop was recorded and can be viewed on ", + "_key": "381ce5a3e2bc", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "611275fadb5f" + ], + "text": "YouTube", + "_key": "34f332614a75" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "4dc7a5e18632" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "ccf3920707a9", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "b362c7e14620" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "I am delighted to report that the workshop was a success. It was not only attracting considerable interest but also marked the commencement of a promising journey. Our collaboration with RSG-Türkiye persists, with plans underway for a more comprehensive on-site training session in Türkiye scheduled for later this year. I look forward to more engagement from Turkish participants as we work together to strengthen our community. Hopefully, this effort will lead to more Turkish-language content, new mentor relations from the core Nextflow team, and the emergence of a local Nextflow ambassador.", + "_key": "7be04a1ef8fe", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "68326f84e35e" + }, + { + "style": "normal", + "_key": "739060eaacb6", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "\n", + "_key": "2643942ce84c" + } + ], + "_type": "block" + }, + { + "asset": { + "_type": "reference", + "_ref": "image-855769e39cae10399f05dc42268e931a1e49fab0-1990x1112-png" + }, + "_type": "image", + "alt": "meme on bright landscape", + "_key": "468edcc8e518" + }, + { + "_key": "647fdf809466", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "\nHow can I contact the Nextflow Türkiye community?", + "_key": "b12b7f94a80c" + } + ], + "_type": "block", + "style": "h2" + }, + { + "_type": "block", + "style": "normal", + "_key": "a159a77bc700", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "If you want to help grow the Nextflow community in Türkiye, join the Nextflow and nf-core Slack workspaces and connect with Turkish contributors in the #region-turkiye channel. Don't be shy—say hello, and let's build up the community together! Feel free to contact me if you're interested in helping organize local hands-on Nextflow workshops. We welcome both advanced users and beginners. By participating, you'll contribute to the growth of bioinformatics in Türkiye, collaborate with peers, and access resources to advance your research and career.", + "_key": "2e03c7c2ec13" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "b6c5504a2ae2" + } + ], + "_type": "block", + "style": "normal", + "_key": "68938a5b13fb", + "markDefs": [] + }, + { + "markDefs": [ + { + "href": "https://www.nextflow.io/ambassadors.html", + "_key": "58c896c9847a", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "This post was contributed by a Nextflow Ambassador. Ambassadors are passionate individuals who support the Nextflow community. 
Interested in becoming an ambassador? Read more about it ", + "_key": "b04eaa880ea10", + "_type": "span" + }, + { + "text": "here", + "_key": "b04eaa880ea11", + "_type": "span", + "marks": [ + "58c896c9847a" + ] + }, + { + "_key": "b04eaa880ea12", + "_type": "span", + "marks": [], + "text": "." + } + ], + "_type": "block", + "style": "blockquote", + "_key": "c18566ab0406" + } + ], + "title": "Fostering Bioinformatics Growth in Türkiye", + "_rev": "Ot9x7kyGeH5005E3MJ8rnE" + }, + { + "_rev": "c8Y6ejr6xtast8r4qB9SlG", + "_updatedAt": "2024-07-24T10:52:39Z", + "publishedAt": "2024-07-17T12:39:00.000Z", + "meta": { + "description": "State of the Workflow 2024 Community Survey Results: Insights from 600+ Nextflow users about the state of workflow management for scientific data analysis", + "noIndex": false, + "slug": { + "_type": "slug", + "current": "the-state-of-the-workflow-2024-community-survey-results" + }, + "_type": "meta" + }, + "author": { + "_ref": "evan-floden", + "_type": "reference" + }, + "_type": "blogPost", + "_createdAt": "2024-07-15T13:31:19Z", + "body": [ + { + "style": "h2", + "_key": "e4775c3fdbeb", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Understanding the Nextflow User Community", + "_key": "8e229381fa710" + } + ], + "_type": "block" + }, + { + "children": [ + { + "marks": [], + "text": "In April, we conducted our annual ", + "_key": "19c26daec2820", + "_type": "span" + }, + { + "marks": [ + "strong" + ], + "text": "State of the Workflow Community survey", + "_key": "eb8144492759", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " to gather insights and feedback from the ", + "_key": "cc6ecf47c330" + }, + { + "marks": [ + "3025a534b404" + ], + "text": "Nextflow", + "_key": "19c26daec2823", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " user community, and we are excited to share that this year, ", + "_key": "19c26daec2824" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "600+ Nextflow users", + "_key": "19c26daec2825" + }, + { + "_key": "19c26daec2826", + "_type": "span", + "marks": [], + "text": " participated - a 21% increase from 2023! By sharing these insights, we aim to empower researchers, developers, and organizations to leverage Nextflow effectively, fostering innovation and collaboration amongst the community. 
Here we share some key findings from the Nextflow user community.\n\n" + }, + { + "marks": [ + "c521b2714e5e", + "strong", + "underline" + ], + "text": "DOWNLOAD THE FULL SURVEY", + "_key": "a71cfed51202", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "1a4d6ab2b758", + "markDefs": [ + { + "_type": "link", + "href": "https://nextflow.io/", + "_key": "3025a534b404" + }, + { + "_type": "link", + "href": "https://hubs.la/Q02HMCZ70", + "_key": "c521b2714e5e" + } + ] + }, + { + "style": "normal", + "_key": "eefa4afc5452", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "bc60c34cfbea" + } + ], + "_type": "block" + }, + { + "style": "h2", + "_key": "a163a8498996", + "markDefs": [], + "children": [ + { + "text": "Key Findings At a Glance", + "_key": "1789872287f70", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "3773f2653df9", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "1b07495615020" + } + ], + "_type": "block" + }, + { + "_key": "ab334e5b3812", + "markDefs": [], + "children": [ + { + "_key": "159cafaf5df10", + "_type": "span", + "marks": [ + "strong" + ], + "text": "✔ Shift from HPC to public cloud " + }, + { + "text": "- Majority of biotech and industrial sectors now favor public clouds for running Nextflow, with 78% indicating plans to migrate in the next two years.\n", + "_key": "5612ee1622a6", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "marks": [ + "strong" + ], + "text": "✔ Multi-cloud deployments are on the rise", + "_key": "020a0bd3ea7a0", + "_type": "span" + }, + { + "text": " - To meet growing computational and data availability needs, 14% of Nextflow users manage workloads across two clouds.", + "_key": "020a0bd3ea7a1", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "65e306b63bb8", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "2fc2e91abdef0" + } + ], + "_type": "block", + "style": "normal", + "_key": "9660db208e5b" + }, + { + "_type": "block", + "style": "normal", + "_key": "529f6f989016", + "markDefs": [], + "children": [ + { + "text": "✔ Open Science is key for streamlining research", + "_key": "0c05be0b85150", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_key": "0c05be0b85151", + "_type": "span", + "marks": [], + "text": " - 82% of Nextflow users view Open Science as fundamental to research, advancing science for everyone." 
+ } + ] + }, + { + "style": "normal", + "_key": "03d694b833bd", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "6a2a3e4ecd55" + } + ], + "_type": "block" + }, + { + "_type": "image", + "_key": "5fd07cc3c181", + "asset": { + "_ref": "image-3fff8e82dc7ab66c289f1c32186e563997af4e7f-1200x836-png", + "_type": "reference" + } + }, + { + "style": "normal", + "_key": "3012982ecad0", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "4a6b7ff26854" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "h3", + "_key": "c100760180b5", + "markDefs": [], + "children": [ + { + "text": "Bioinformatics Analysis is Moving to Public Clouds", + "_key": "94dd65e5e77d0", + "_type": "span", + "marks": [] + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In recent years, we have witnessed a notable shift in bioinformatics analysis towards public cloud platforms, driven largely by for-profit organizations seeking enhanced reliability, scalability and flexibility in their computational workflows. Our survey found that while on-premises clusters remain the most common for users in general, the prevalence of traditional HPC environments is on a steady decline. Specifically, in the biotech industry, nearly ", + "_key": "f848cb5dd3cc0" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "three-quarters of firms now favor public clouds, ", + "_key": "f848cb5dd3cc1" + }, + { + "_type": "span", + "marks": [], + "text": "reflecting a broader industry trend toward adaptable and robust computing solutions. ", + "_key": "f848cb5dd3cc2" + } + ], + "_type": "block", + "style": "normal", + "_key": "47d223e5d87e" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "e531fa4ad4dc", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "474b62cda575" + }, + { + "style": "normal", + "_key": "d571f398163a", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "6a32f4c62759", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "text": "Multi-Cloud Deployments are Rising", + "_key": "cbc09602d1f20", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h3", + "_key": "77bfd81b8e7f" + }, + { + "_type": "block", + "style": "normal", + "_key": "d8db77c8db24", + "markDefs": [], + "children": [ + { + "_key": "b8fc76e313c00", + "_type": "span", + "marks": [], + "text": "As the industry continues to scale their workflow, they are increasingly adopting multi-cloud strategies to meet the demands of diverse computational workflows. In 2021, just " + }, + { + "_key": "b8fc76e313c01", + "_type": "span", + "marks": [ + "strong" + ], + "text": "10% of cloud batch service users" + }, + { + "_key": "b8fc76e313c02", + "_type": "span", + "marks": [], + "text": " were running workloads in " + }, + { + "_key": "b8fc76e313c03", + "_type": "span", + "marks": [ + "strong" + ], + "text": "two separate clouds" + }, + { + "marks": [], + "text": ". By 2024, this figure had ", + "_key": "b8fc76e313c04", + "_type": "span" + }, + { + "_key": "b8fc76e313c05", + "_type": "span", + "marks": [ + "strong" + ], + "text": "risen to 14%" + }, + { + "_key": "b8fc76e313c06", + "_type": "span", + "marks": [], + "text": ". Additionally, 3% of users utilized three different cloud batch services in 2021, which increased to 4% by 2024. 
This trend highlights the " + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "move towards deploying across multiple cloud providers", + "_key": "b8fc76e313c07" + }, + { + "_key": "b8fc76e313c08", + "_type": "span", + "marks": [], + "text": " to address bioinformatics' growing computational and data availability needs across various regions and technical complexities." + } + ] + }, + { + "_key": "7428ac4b77d7", + "markDefs": [], + "children": [ + { + "_key": "fa0bd7669202", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "303f6c725740", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "e835fa9dd922" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Open Science: Advancing Science for Everyone", + "_key": "cf20f469a9910" + } + ], + "_type": "block", + "style": "h3", + "_key": "dff1615fb612", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Open Science has emerged as a transformative approach within the bioinformatics community, significantly enhancing collaboration, efficiency, and cost-effectiveness. Around ", + "_key": "957a2c3bdc7a0" + }, + { + "marks": [ + "strong" + ], + "text": "82% of survey respondents", + "_key": "957a2c3bdc7a1", + "_type": "span" + }, + { + "_key": "957a2c3bdc7a2", + "_type": "span", + "marks": [], + "text": " emphasized the" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": " fundamental role of Open Science", + "_key": "957a2c3bdc7a3" + }, + { + "text": " in their research practices, reflecting strong community endorsement. Additionally, two-thirds reported ", + "_key": "957a2c3bdc7a4", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "significant time-savings", + "_key": "957a2c3bdc7a5" + }, + { + "marks": [], + "text": " through Open Science and 42% acknowledged the ", + "_key": "957a2c3bdc7a6", + "_type": "span" + }, + { + "text": "financial benefits, ", + "_key": "957a2c3bdc7a7", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "marks": [], + "text": "highlighting the value of transparency in research. This shift fosters effective knowledge sharing and collaborative advancement, accelerating research outcomes while reinforcing accountability and scientific integrity.", + "_key": "957a2c3bdc7a8", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "0e370eea37e9", + "markDefs": [] + }, + { + "children": [ + { + "_key": "e803e4047b81", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "a09d247e84e8", + "markDefs": [] + }, + { + "_type": "block", + "style": "h2", + "_key": "2850ca907c4c", + "markDefs": [], + "children": [ + { + "text": "Read the Full Report Now", + "_key": "7f3ac0c64cdf", + "_type": "span", + "marks": [] + } + ] + }, + { + "_key": "1b90b2084f12", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Our 2024 State of the Workflow Community Survey provides insights into the evolving landscape of bioinformatics and scientific computing. The shift towards public and multi-cloud platforms, combined with the transformative impact of Open Science, is reshaping the Nextflow ecosystem and revolutionizing computational workflows. 
Embracing these trends not only drives innovation but also ensures that scientific inquiry remains robust, accountable, and accessible to all, paving the way for continued progress in bioinformatics and beyond.", + "_key": "b357bf5090ac0" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "01a3a713b8430", + "_type": "span", + "marks": [], + "text": "Dive into the " + }, + { + "_type": "span", + "marks": [ + "688962534403" + ], + "text": "full report", + "_key": "01a3a713b8431" + }, + { + "_type": "span", + "marks": [], + "text": " to uncover further insights on how bioinformaticians are running pipelines, the pivotal role of the ", + "_key": "01a3a713b8432" + }, + { + "marks": [ + "517a41101c8d" + ], + "text": "nf-core community", + "_key": "01a3a713b8433", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ", and other key trends —your glimpse into the future of computational workflows awaits!\n", + "_key": "01a3a713b8434" + } + ], + "_type": "block", + "style": "normal", + "_key": "c690c04d6473", + "markDefs": [ + { + "href": "https://nf-co.re/", + "_key": "517a41101c8d", + "_type": "link" + }, + { + "_type": "link", + "href": "https://hubs.la/Q02HMCZ70", + "_key": "688962534403" + } + ] + }, + { + "_key": "d7acd64713cf", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "722484bb3cf0", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "729b6f099a85", + "markDefs": [ + { + "_key": "042ff6244a59", + "_type": "link", + "href": "https://hubs.la/Q02HMCZ70" + } + ], + "children": [ + { + "_key": "e3fcb08b34ad0", + "_type": "span", + "marks": [ + "042ff6244a59", + "strong", + "underline" + ], + "text": "DOWNLOAD THE FULL SURVEY NOW" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "7d81e062e8a9", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "\n", + "_key": "d1cf1e278328", + "_type": "span" + }, + { + "text": "\n\n", + "_key": "cbd3cd3cbad9", + "_type": "span", + "marks": [ + "strong", + "underline" + ] + } + ], + "_type": "block", + "style": "normal" + } + ], + "_id": "8e1a9fb2-814c-455b-890c-5f3f07e83da4", + "title": "The State of the Workflow 2024: Community Survey Results", + "tags": [ + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "e3b6edcaea48" + } + ] + }, + { + "meta": { + "description": "Nextflow Ambassadors are passionate individuals within the Nextflow community who play a more active role in fostering collaboration, knowledge sharing, and engagement. We launched this program at the Nextflow Summit in Barcelona last year, and it’s been a great experience so far, so we’ve been recruiting more volunteers to expand the program. ", + "slug": { + "current": "ambassador-second-call" + } + }, + "_id": "9088058a344c", + "title": "Open call for new Nextflow Ambassadors closes June 14", + "_type": "blogPost", + "author": { + "_type": "reference", + "_ref": "mNsm4Vx1W1Wy6aYYkroetD" + }, + "_updatedAt": "2024-09-27T10:10:49Z", + "_createdAt": "2024-09-25T14:17:47Z", + "_rev": "rsIQ9Jd8Z4nKBVUruy4QXm", + "publishedAt": "2024-05-17T06:00:00.000Z", + "body": [ + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Nextflow Ambassadors are passionate individuals within the Nextflow community who play a more active role in fostering collaboration, knowledge sharing, and engagement. 
We launched this program at the Nextflow Summit in Barcelona last year, and it's been a great experience so far, so we've been recruiting more volunteers to expand the program. We’re going to close applications in June with the goal of having new ambassadors start in July, so if you’re interested in becoming an ambassador, now is your chance to apply!", + "_key": "7e6ece563924", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "6d575d6cfac5" + }, + { + "children": [ + { + "_key": "91890b4b3887", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "d6d9af1e1b03", + "markDefs": [] + }, + { + "style": "normal", + "_key": "a85f39c3d1a1", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "d06b4c4c06a0" + } + ], + "_type": "block" + }, + { + "_key": "2b77bdbba894", + "markDefs": [], + "children": [ + { + "text": "The program has been off to a great start, bringing together a diverse group of 46 passionate individuals from around the globe. Our ambassadors have done a great job in their dedication to spreading the word about Nextflow, contributing significantly to the community in numerous ways, including writing insightful content, organizing impactful events, conducting training sessions, leading hackathons, and even contributing to the codebase. Their efforts have not only enhanced the Nextflow ecosystem but have also fostered a stronger, more interconnected global community.", + "_key": "6fde3fedb44d", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "ecfd3801dcbb" + } + ], + "_type": "block", + "style": "normal", + "_key": "89063167f789" + }, + { + "_key": "1dda5de30456", + "markDefs": [ + { + "_type": "link", + "href": "http://seqera.typeform.com/ambassadors/", + "_key": "5af5aa205671" + } + ], + "children": [ + { + "marks": [], + "text": "To support their endeavors, we provide our ambassadors with exclusive swag, essential assets to facilitate their work and funding to attend events where they can promote Nextflow. With the end of the first semester fast approaching, we are excited to officially announce the second cohort of the Nextflow Ambassador program will start in July. 
If you are passionate about Nextflow and eager to make a meaningful impact, we invite you to ", + "_key": "43d29cb87660", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "5af5aa205671" + ], + "text": "apply", + "_key": "87ec3a695513" + }, + { + "marks": [], + "text": " and join our vibrant community of ambassadors.", + "_key": "c18238852d01", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "h2", + "_key": "9030207786db", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "7ef43e56c34e", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "style": "h2", + "_key": "4f7778368561", + "markDefs": [], + "children": [ + { + "_key": "5fb7876f39a3", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Application Details:" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "96906ce98f410", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Call for Applications:" + }, + { + "_type": "span", + "marks": [], + "text": " Open until June 14 (23h59 any timezone)", + "_key": "96906ce98f411" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "2dcc826d6d32", + "listItem": "bullet", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Notification of Acceptance:", + "_key": "f1500bf0b0b80" + }, + { + "text": " By June 30", + "_key": "f1500bf0b0b81", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "ef4195612021", + "listItem": "bullet" + }, + { + "_key": "b1587290b29c", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Program Start:", + "_key": "a4004a93d1ed0" + }, + { + "text": " July 2024", + "_key": "a4004a93d1ed1", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "bc52415cf208", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "34497f1fb28d" + }, + { + "alt": "Ambassadors on action!", + "_key": "b90cdf8e6e54", + "asset": { + "_ref": "image-7c42127c90d2ecdf3dcf2c56f7963d7ba2319ec6-900x981-jpg", + "_type": "reference" + }, + "_type": "image" + }, + { + "children": [ + { + "text": "We seek enthusiastic individuals ready to take their contribution to the next level through various initiatives such as content creation, event organization, training, hackathons, and more. 
As an ambassador, you will receive support and resources to help you succeed in your role, including swag, necessary assets, and funding for event participation.", + "_key": "4c8084102a05", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "d221e358e68c", + "markDefs": [] + }, + { + "_key": "34301b9b17e4", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "a48e39d6bbc0" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "8b16a9fdf867", + "markDefs": [ + { + "href": "http://seqera.typeform.com/ambassadors/", + "_key": "f1de71ec581f", + "_type": "link" + } + ], + "children": [ + { + "text": "To apply, please visit our ", + "_key": "ba26a3bb98dc", + "_type": "span", + "marks": [] + }, + { + "text": "Nextflow Ambassador Program Application Page", + "_key": "a7865b583ac8", + "_type": "span", + "marks": [ + "f1de71ec581f" + ] + }, + { + "text": " and submit your application no later than 23h59 June 14 (any timezone). The form shouldn’t take more than a few minutes to complete. We are eager to welcome a new group of ambassadors who will help support the growth and success of the Nextflow community.", + "_key": "375c984f1e40", + "_type": "span", + "marks": [] + } + ] + }, + { + "style": "normal", + "_key": "98cbf3866d43", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "0d97977e18b9", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "c99b7ca4226d", + "markDefs": [], + "children": [ + { + "_key": "6accc33d932a", + "_type": "span", + "marks": [], + "text": "Thanks to all our current ambassadors for their incredible work and dedication. We look forward to seeing the new ideas and initiatives that the next cohort of ambassadors will bring to the table. Together, let's continue to build a stronger, more dynamic Nextflow community." 
+ } + ] + }, + { + "children": [ + { + "text": "", + "_key": "86b3266d82d0", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "b16255492223", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "4dce683d4c15", + "markDefs": [ + { + "_type": "link", + "href": "http://seqera.typeform.com/ambassadors/", + "_key": "7d67606dc278" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "7d67606dc278" + ], + "text": "Apply now and become a part of the Nextflow journey!", + "_key": "1db2755d0bde" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "4fc5c2981b2b", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "4706cdc9c071", + "_type": "span", + "marks": [] + } + ] + }, + { + "markDefs": [ + { + "href": "https://twitter.com/nextflowio", + "_key": "0aa9541e8d4f", + "_type": "link" + } + ], + "children": [ + { + "text": "Stay tuned for more updates and follow us on our", + "_key": "02c1551fcfb4", + "_type": "span", + "marks": [] + }, + { + "text": " social media channels", + "_key": "4c3f1e4a784a", + "_type": "span", + "marks": [ + "0aa9541e8d4f" + ] + }, + { + "marks": [], + "text": " to keep up with the latest news and events from the Nextflow community.", + "_key": "c9a94abff25c", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "13d2d98d9f53" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "62fd38b40f3e", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "4aefddc19b7d" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://www.nextflow.io/ambassadors.html", + "_key": "176ed3c55b5d" + } + ], + "children": [ + { + "_key": "ea4453f358d20", + "_type": "span", + "marks": [], + "text": "This post was contributed by a Nextflow Ambassador. Ambassadors are passionate individuals who support the Nextflow community. Interested in becoming an ambassador? Read more about it " + }, + { + "_key": "ea4453f358d21", + "_type": "span", + "marks": [ + "176ed3c55b5d" + ], + "text": "here" + }, + { + "marks": [], + "text": ".", + "_key": "ea4453f358d22", + "_type": "span" + } + ], + "_type": "block", + "style": "blockquote", + "_key": "e374f131102b" + } + ] + }, + { + "publishedAt": "2023-03-10T07:00:00.000Z", + "_id": "9332f4bb5d0c", + "_type": "blogPost", + "title": "The State of Kubernetes in Nextflow", + "meta": { + "slug": { + "current": "the-state-of-kubernetes-in-nextflow" + } + }, + "_rev": "mvya9zzDXWakVjnX4hdtjq", + "_createdAt": "2024-09-25T14:17:43Z", + "_updatedAt": "2024-09-26T09:04:14Z", + "tags": [ + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "ecf843b8c607" + } + ], + "body": [ + { + "markDefs": [], + "children": [ + { + "_key": "6fedfcdf5c23", + "_type": "span", + "marks": [], + "text": "Hi, my name is Ben, and I’m a software engineer at Seqera Labs. I joined Seqera in November 2021 after finishing my Ph.D. at Clemson University. I work on a number of things at Seqera, but my primary role is that of a Nextflow core contributor." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "19ca528b9e1f" + }, + { + "children": [ + { + "text": "", + "_key": "12ef59bcb6f5", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "e4178e30df85" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/bentsherman/tesseract", + "_key": "93f33e18ee65" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "I have run Nextflow just about everywhere, from my laptop to my university cluster to the cloud and Kubernetes. I have written Nextlfow pipelines for bioinformatics and machine learning, and I even wrote a pipeline to run other Nextflow pipelines for my ", + "_key": "211f1a1cf4c8" + }, + { + "_key": "18e7700f24ed", + "_type": "span", + "marks": [ + "93f33e18ee65" + ], + "text": "dissertation research" + }, + { + "_type": "span", + "marks": [], + "text": ". While I tried to avoid contributing code to Nextflow as a student (I had enough work already), now I get to work on it full-time!", + "_key": "2da97e62612e" + } + ], + "_type": "block", + "style": "normal", + "_key": "239fb9f09c3e" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "08dedb6623fa" + } + ], + "_type": "block", + "style": "normal", + "_key": "3f5f9c7dbf55" + }, + { + "_key": "215f08aacf9f", + "markDefs": [], + "children": [ + { + "_key": "36950a120858", + "_type": "span", + "marks": [], + "text": "Which brings me to the topic of this post: Nextflow and Kubernetes." + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "51e43986ef66", + "children": [ + { + "_type": "span", + "text": "", + "_key": "df7d83de0b1c" + } + ] + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/seqeralabs/nf-k8s-best-practices", + "_key": "d99e281a926d" + } + ], + "children": [ + { + "_key": "3da38c183fe1", + "_type": "span", + "marks": [], + "text": "One of my first contributions was a “" + }, + { + "text": "best practices guide", + "_key": "c9d06ded2986", + "_type": "span", + "marks": [ + "d99e281a926d" + ] + }, + { + "text": "” for running Nextflow on Kubernetes. The guide has helped many people, but for me it provided a map for how to improve K8s support in Nextflow. You see, Nextflow was originally built for HPC, while Kubernetes and cloud batch executors were added later. While Nextflow’s extensible design makes adding features like new executors relatively easy, support for Kubernetes is still a bit spotty.", + "_key": "a78ffc2c8f92", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "7626c57a0eaf" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "ed2e76b747eb" + } + ], + "_type": "block", + "style": "normal", + "_key": "e258f2d35f8b" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "So, I set out to make Nextflow + K8s great! Over the past year, in collaboration with talented members of the Nextflow community, we have added all sorts of enhancements to the K8s executor. In this blog post, I’d like to show off all of these improvements in one place. 
So here we go!", + "_key": "db6cdb91fc7e" + } + ], + "_type": "block", + "style": "normal", + "_key": "985649dd9414" + }, + { + "style": "normal", + "_key": "8f3dd347f7b8", + "children": [ + { + "_type": "span", + "text": "", + "_key": "53eae553e2e0" + } + ], + "_type": "block" + }, + { + "style": "h2", + "_key": "596b1d0caa85", + "children": [ + { + "_key": "3f7893bfa792", + "_type": "span", + "text": "New features" + } + ], + "_type": "block" + }, + { + "style": "h3", + "_key": "9a785fc5b84b", + "children": [ + { + "text": "Submit tasks as Kubernetes Jobs", + "_key": "9683a886bad5", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "em" + ], + "text": "New in version 22.05.0-edge.", + "_key": "3de28352a537" + } + ], + "_type": "block", + "style": "normal", + "_key": "6b2e0b286ef7" + }, + { + "style": "normal", + "_key": "9deeba062943", + "children": [ + { + "_key": "37b7c65f74af", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "children": [ + { + "marks": [], + "text": "Nextflow submits tasks as Pods by default, which is sort of a bad practice. In Kubernetes, every Pod should be created through a controller (e.g., Deployment, Job, StatefulSet) so that Pod failures can be handled automatically. For Nextflow, the appropriate controller is a K8s Job. Using Jobs instead of Pods directly has greatly improved the stability of large Nextflow runs on Kubernetes, and will likely become the default behavior in a future version.", + "_key": "b0e4a556abc5", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "82cfef1ebac1", + "markDefs": [] + }, + { + "children": [ + { + "text": "", + "_key": "0652bdd7fa09", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "19b06d3b769c" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "You can enable this feature with the following configuration option:", + "_key": "3fec01af1843" + } + ], + "_type": "block", + "style": "normal", + "_key": "2541d83949a5", + "markDefs": [] + }, + { + "style": "normal", + "_key": "3cf96fd0cd2a", + "children": [ + { + "text": "", + "_key": "44ad7c6b424d", + "_type": "span" + } + ], + "_type": "block" + }, + { + "code": "k8s.computeResourceType = 'Job'", + "_type": "code", + "_key": "25dfa10ed742" + }, + { + "children": [ + { + "_key": "8d3adf4da52a", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "fb0a48f7c4f8" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Credit goes to @xhejtman from CERIT-SC for leading the charge on this one!", + "_key": "4f5953711a45" + } + ], + "_type": "block", + "style": "normal", + "_key": "b5c1789531db" + }, + { + "style": "normal", + "_key": "86eb999dc68a", + "children": [ + { + "text": "", + "_key": "85dab948f591", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "Object storage as the work directory", + "_key": "67ef8ea88c66" + } + ], + "_type": "block", + "style": "h3", + "_key": "81880c310e2e" + }, + { + "_key": "fa6d303e08d7", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "em" + ], + "text": "New in version 22.10.0.", + "_key": "7bb9acd75f94" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "2c46cc969f49", + "children": [ + { + "_type": "span", + "text": "", + "_key": "22f11edf180c" + } + ], + "_type": "block", + "style": "normal" + }, + { 
+ "children": [ + { + "text": "One of the most difficult aspects of using Nextflow with Kubernetes is that Nextflow needs a ", + "_key": "790f109692a8", + "_type": "span", + "marks": [] + }, + { + "text": "PersistentVolumeClaim", + "_key": "67727ed103e5", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_key": "4f6c14306723", + "_type": "span", + "marks": [], + "text": " (PVC) to store the shared work directory, which also means that Nextflow itself must run inside the Kubernetes cluster in order to access this storage. While the " + }, + { + "_key": "0d0ed276038e", + "_type": "span", + "marks": [ + "code" + ], + "text": "kuberun" + }, + { + "marks": [], + "text": " command attempts to automate this process, it has never been reliable enough for production usage.", + "_key": "7f4c9a02bee5", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "dc4f7a5eed1a", + "markDefs": [] + }, + { + "_key": "64f0cd3518e6", + "children": [ + { + "_type": "span", + "text": "", + "_key": "280505e02144" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "At the Nextflow Summit in October 2022, we introduced ", + "_key": "0bef4499259c" + }, + { + "_type": "span", + "marks": [ + "64ad5275d37b" + ], + "text": "Fusion", + "_key": "12b587016fbe" + }, + { + "marks": [], + "text": ", a file system driver that can mount S3 buckets as POSIX-like directories. The combination of Fusion and ", + "_key": "39158b96e158", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "0a2d02c8fa6d" + ], + "text": "Wave", + "_key": "d5d4b8ce75cd" + }, + { + "marks": [], + "text": " (a just-in-time container provisioning service) enables you to have your work directory in S3-compatible storage. 
See the ", + "_key": "daa65d13f159", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "351da3e8fccf" + ], + "text": "Wave blog post", + "_key": "856644cb4159" + }, + { + "text": " for an explanation of how it works — it’s pretty cool.", + "_key": "8729f23bf7c4", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "9aeda8686933", + "markDefs": [ + { + "href": "https://seqera.io/fusion/", + "_key": "64ad5275d37b", + "_type": "link" + }, + { + "_key": "0a2d02c8fa6d", + "_type": "link", + "href": "https://seqera.io/wave/" + }, + { + "_type": "link", + "href": "https://nextflow.io/blog/2022/rethinking-containers-for-cloud-native-pipelines.html", + "_key": "351da3e8fccf" + } + ] + }, + { + "_key": "55d57557f9ee", + "children": [ + { + "text": "", + "_key": "b9782e4e85b3", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "c9074de4db11", + "markDefs": [], + "children": [ + { + "_key": "df43f87adc9d", + "_type": "span", + "marks": [], + "text": "This functionality is useful in general, but it is especially useful for Kubernetes, because (1) you don’t need to provision your own PVC and (2) you can run Nextflow on Kubernetes without using " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "kuberun", + "_key": "6e4cfd7b752c" + }, + { + "marks": [], + "text": " or creating your own submitter Pod.", + "_key": "ac3c9b92e1c0", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "5f9d95d7c840", + "children": [ + { + "_type": "span", + "text": "", + "_key": "86be23542dc1" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "201fbe498d30", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "This feature currently supports AWS S3 on Elastic Kubernetes Service (EKS) clusters and Google Cloud Storage on Google Kubernetes Engine (GKE) clusters.", + "_key": "a8531c38f912", + "_type": "span" + } + ] + }, + { + "_key": "de2ff593c10c", + "children": [ + { + "text": "", + "_key": "d0c1b0a294b8", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "fa45b06bee52", + "markDefs": [ + { + "_key": "040da5c03f62", + "_type": "link", + "href": "https://seqera.io/blog/deploying-nextflow-on-amazon-eks/" + } + ], + "children": [ + { + "_key": "a1c367725863", + "_type": "span", + "marks": [], + "text": "Check out " + }, + { + "_key": "e1fd7bf5116e", + "_type": "span", + "marks": [ + "040da5c03f62" + ], + "text": "this article" + }, + { + "text": " over at the Seqera blog for an in-depth guide to running Nextflow (with Fusion) on Amazon EKS.", + "_key": "9a61d9eda545", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "d37ef283a2e5", + "children": [ + { + "_type": "span", + "text": "", + "_key": "ab1fb0e5c358" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "No CPU limits by default", + "_key": "e208bce6ff06", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "968f00cfd8af" + }, + { + "_key": "70fdd51b16fd", + "markDefs": [], + "children": [ + { + "_key": "f591fae54322", + "_type": "span", + "marks": [ + "em" + ], + "text": "New in version 22.11.0-edge." 
+ } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "f860bb849034", + "children": [ + { + "text": "", + "_key": "387149a8e747", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "5a5809255ae4", + "markDefs": [ + { + "_type": "link", + "href": "https://home.robusta.dev/blog/stop-using-cpu-limits", + "_key": "e7cb2b1fa984" + } + ], + "children": [ + { + "marks": [], + "text": "We have changed the default behavior of CPU requests for the K8s executor. Before, a single number in a Nextflow resource request (e.g., ", + "_key": "b51b579d006d", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "cpus = 8", + "_key": "bea175b6b61d" + }, + { + "_type": "span", + "marks": [], + "text": ") was interpreted as both a “request” (lower bound) and a “limit” (upper bound) in the Pod definition. However, setting an explicit CPU limit in K8s is increasingly seen as an anti-pattern (see ", + "_key": "88cb0720aaed" + }, + { + "_type": "span", + "marks": [ + "e7cb2b1fa984" + ], + "text": "this blog post", + "_key": "a07452f46e0d" + }, + { + "_type": "span", + "marks": [], + "text": " for an explanation). The bottom line is that it is better to specify a request without a limit, because that will ensure that each task has the CPU time it requested, while also allowing the task to use more CPU time if it is available. Unlike other resources like memory and disk, CPU time is compressible — it can be given and taken away without killing the application.", + "_key": "54265ee67837" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "097316b4d57c" + } + ], + "_type": "block", + "style": "normal", + "_key": "2e81d729fbf0" + }, + { + "_key": "f2674bd789e0", + "markDefs": [ + { + "_type": "link", + "href": "https://www.batey.info/cgroup-cpu-shares-for-docker.html", + "_key": "77752d57cb48" + }, + { + "_type": "link", + "href": "https://www.batey.info/cgroup-cpu-shares-for-kubernetes.html", + "_key": "932038df245b" + }, + { + "_type": "link", + "href": "https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html#container_definitions", + "_key": "31774581e1a2" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "We have also updated the Docker integration in Nextflow to use ", + "_key": "42ead20f10d6" + }, + { + "marks": [ + "77752d57cb48" + ], + "text": "CPU shares", + "_key": "90fff1eb3f87", + "_type": "span" + }, + { + "_key": "34a827b39b91", + "_type": "span", + "marks": [], + "text": ", which is the mechanism used by " + }, + { + "_type": "span", + "marks": [ + "932038df245b" + ], + "text": "Kubernetes", + "_key": "209076932a8c" + }, + { + "marks": [], + "text": " and ", + "_key": "893d6fdafeb6", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "31774581e1a2" + ], + "text": "AWS Batch", + "_key": "1183f11d2904" + }, + { + "marks": [], + "text": " under the hood to define expandable CPU requests. 
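For example, a resource request like the one below (the process name is hypothetical) now produces a Pod with a CPU request but no CPU limit:

process {
    withName: 'STAR_ALIGN' {    // hypothetical process name
        cpus = 8                // request only; no limit is set by default
    }
}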
These changes make the behavior of CPU requests in Nextflow much more consistent across executors.", + "_key": "c6f5ec389dc0", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "81005f148aa4", + "children": [ + { + "_type": "span", + "text": "", + "_key": "1667dd45f8de" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "CSI ephemeral volumes", + "_key": "a8bfc0218936", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "8eda965e1d13" + }, + { + "_key": "36c41dc67bfc", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "em" + ], + "text": "New in version 22.11.0-edge.", + "_key": "60c6a1341082" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "bc4e0adabfc5" + } + ], + "_type": "block", + "style": "normal", + "_key": "3e4904ecbb8c" + }, + { + "style": "normal", + "_key": "503bde498094", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In Kubernetes, volumes are used to provide storage and data (e.g., configuration and secrets) to Pods. Persistent volumes exist independently of Pods and can be mounted and unmounted over time, while ephemeral volumes are attached to a single Pod and are created and destroyed alongside it. While Nextflow can use any persistent volume through a ", + "_key": "92858f377342" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "PersistentVolumeClaim", + "_key": "a07ab88f4d09" + }, + { + "_key": "d090f02b6d6a", + "_type": "span", + "marks": [], + "text": ", ephemeral volume types are supported on a case-by-case basis. For example, " + }, + { + "_key": "faca8b0006e5", + "_type": "span", + "marks": [ + "code" + ], + "text": "ConfigMaps" + }, + { + "marks": [], + "text": " and ", + "_key": "ac3400cab9f2", + "_type": "span" + }, + { + "text": "Secrets", + "_key": "9ade93816887", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " are two ephemeral volume types that are already supported by Nextflow.", + "_key": "e45da2b9b362" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "bb4c64da22b1", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "ef901eb0f6ea" + }, + { + "children": [ + { + "text": "Nextflow now also supports ", + "_key": "0a44936b30c5", + "_type": "span", + "marks": [] + }, + { + "_key": "97f41dfcefd3", + "_type": "span", + "marks": [ + "a270ff0f1c7c" + ], + "text": "CSI ephemeral volumes" + }, + { + "marks": [], + "text": ". CSI stands for Container Storage Interface, and it is a standard used by Kubernetes to support third-party storage systems as volumes. 
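As a rough sketch, such a volume can be declared with the same pod options used later in this post for emptyDir; the driver name and attributes below are purely illustrative, so check the Kubernetes executor documentation for the exact option syntax:

process {
    pod = [
        [csi: [driver: 'csi.example.com',                      // hypothetical CSI driver
               volumeAttributes: [volumeName: 'my-volume']],   // illustrative attributes
         mountPath: '/mnt/data']
    ]
}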
The most common example of a CSI ephemeral volume is ", + "_key": "f76c23fca848", + "_type": "span" + }, + { + "text": "Secrets Store", + "_key": "f7b24be0c1ca", + "_type": "span", + "marks": [ + "1f23a4831c66" + ] + }, + { + "_type": "span", + "marks": [], + "text": ", which is used to inject secrets from a remote vault such as ", + "_key": "abe04488aa83" + }, + { + "_key": "67eb19b02544", + "_type": "span", + "marks": [ + "15d62b58a492" + ], + "text": "Hashicorp Vault" + }, + { + "marks": [], + "text": " or ", + "_key": "471ab20a21f8", + "_type": "span" + }, + { + "_key": "2fe8c0def5a1", + "_type": "span", + "marks": [ + "5980b72f67f2" + ], + "text": "Azure Key Vault" + }, + { + "text": ".", + "_key": "de238442eb21", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "76fec9bc367c", + "markDefs": [ + { + "_type": "link", + "href": "https://kubernetes.io/docs/concepts/storage/ephemeral-volumes/#csi-ephemeral-volumes", + "_key": "a270ff0f1c7c" + }, + { + "_key": "1f23a4831c66", + "_type": "link", + "href": "https://secrets-store-csi-driver.sigs.k8s.io/getting-started/usage.html" + }, + { + "_type": "link", + "href": "https://www.vaultproject.io/", + "_key": "15d62b58a492" + }, + { + "_key": "5980b72f67f2", + "_type": "link", + "href": "https://azure.microsoft.com/en-us/products/key-vault/" + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "12f0e17dbc6e" + } + ], + "_type": "block", + "style": "normal", + "_key": "f976716877a3" + }, + { + "_key": "54b005654f30", + "markDefs": [], + "children": [ + { + "marks": [ + "em" + ], + "text": "Note: CSI persistent volumes can already be used in Nextflow through a `PersistentVolumeClaim`.", + "_key": "a5cfa7fe2385", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "f72db697264c", + "children": [ + { + "text": "", + "_key": "58370fab12d9", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "Local disk storage for tasks", + "_key": "c0831ae7868e", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "09ff1180f248" + }, + { + "style": "normal", + "_key": "cd8448b26e3c", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "em" + ], + "text": "New in version 22.11.0-edge.", + "_key": "e4f5fba6ceb3" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "86cdfabd6b5b", + "children": [ + { + "_key": "7702047c9372", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "d0dd800d3428", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow uses a shared work directory to coordinate tasks. Each task receives its own subdirectory with the required input files, and each task is expected to write its output files to this directory. As a workflow scales to thousands of concurrent tasks, this shared storage becomes a major performance bottleneck. 
We are investigating a few different ways to overcome this challenge.", + "_key": "22ca856a4129" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "307d957f7cc7", + "children": [ + { + "_key": "fd6790101dd6", + "_type": "span", + "text": "" + } + ] + }, + { + "style": "normal", + "_key": "37607e0f3bbc", + "markDefs": [], + "children": [ + { + "_key": "7ed657348aa5", + "_type": "span", + "marks": [], + "text": "One of the tools we have to reduce I/O pressure on the shared work directory is to make tasks use local storage. For example, if a task takes input file A, produces an intermediate file B, then produces an output file C, the file B can be written to local storage instead of shared storage because it isn’t a required output file. Or, if the task writes an output file line by line instead of all at once at the end, it can stream the output to local storage first and then copy the file to shared storage." + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "49ba75520b95" + } + ], + "_type": "block", + "style": "normal", + "_key": "dafaa107cbb1" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "While it is far from a comprehensive solution, local storage can reduce I/O congestion in some cases. Provisioning local storage for every task looks different on every platform, and in some cases it is not supported. Fortunately, Kubernetes provides a seamless interface for local storage, and now Nextflow supports it as well.", + "_key": "5a871aa30f1b", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "57dd28d20256" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "4653247d1da3" + } + ], + "_type": "block", + "style": "normal", + "_key": "a00191902bf3" + }, + { + "style": "normal", + "_key": "7d7957afe942", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "To provision local storage for tasks, you must (1) add an ", + "_key": "691b31a812ff", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "emptyDir", + "_key": "0623975d07a2" + }, + { + "text": " volume to your Pod options, (2) request disk storage via the ", + "_key": "018569beb90f", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "disk", + "_key": "48c052322e33" + }, + { + "marks": [], + "text": " directive, and (3) direct tasks to use the local storage with the ", + "_key": "3eefa49af60a", + "_type": "span" + }, + { + "marks": [ + "code" + ], + "text": "scratch", + "_key": "d26e41f991bd", + "_type": "span" + }, + { + "marks": [], + "text": " directive. 
Here’s an example:", + "_key": "4f970d7ccde0", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "0b0520a4a7c3", + "children": [ + { + "_key": "33c3237c31e5", + "_type": "span", + "text": "" + } + ] + }, + { + "_type": "code", + "_key": "fb79b8df0cd4", + "code": "process {\n disk = 10.GB\n pod = [ [emptyDir: [:], mountPath: '/scratch'] ]\n scratch = '/scratch'\n}" + }, + { + "style": "normal", + "_key": "d443f372f0b0", + "children": [ + { + "_type": "span", + "text": "", + "_key": "3050548789c9" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "61b7c40abe60", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "As a bonus, you can also provision an ", + "_key": "8f2949f4508d" + }, + { + "marks": [ + "code" + ], + "text": "emptyDir", + "_key": "effaba032cc5", + "_type": "span" + }, + { + "_key": "27ffb24d776a", + "_type": "span", + "marks": [], + "text": " backed by memory:" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "e34d1260c7e2" + } + ], + "_type": "block", + "style": "normal", + "_key": "e17210a0af8a" + }, + { + "_key": "01a28caf74ee", + "code": "process {\n memory = 10.GB\n pod = [ [emptyDir: [medium: 'Memory'], mountPath: '/scratch'] ]\n scratch = '/scratch'\n}", + "_type": "code" + }, + { + "_key": "726ba6601bd9", + "children": [ + { + "_key": "8b6aca8d2758", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "311a82545733", + "markDefs": [ + { + "_type": "link", + "href": "https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#setting-requests-and-limits-for-local-ephemeral-storage", + "_key": "f50835bce9df" + }, + { + "_type": "link", + "href": "https://kubernetes.io/docs/concepts/storage/volumes/#emptydir", + "_key": "8320baeaa362" + } + ], + "children": [ + { + "_key": "48d73be41965", + "_type": "span", + "marks": [], + "text": "Nextflow maps the " + }, + { + "_key": "67d0f4dbdc00", + "_type": "span", + "marks": [ + "code" + ], + "text": "disk" + }, + { + "_type": "span", + "marks": [], + "text": " directive to the ", + "_key": "50ed5bebc5e5" + }, + { + "_key": "4be78cce7fbb", + "_type": "span", + "marks": [ + "f50835bce9df" + ], + "text": "`ephemeral-storage`" + }, + { + "marks": [], + "text": " resource request, which is provided by the ", + "_key": "0812dd74d622", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "8320baeaa362" + ], + "text": "`emptyDir`", + "_key": "0d7200fda592" + }, + { + "text": " volume (another ephemeral volume type).", + "_key": "bdc237e08111", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "4e08db7d667a" + } + ], + "_type": "block", + "style": "normal", + "_key": "a1f7150536ac" + }, + { + "_type": "block", + "style": "h3", + "_key": "9e11c750defb", + "children": [ + { + "_type": "span", + "text": "Miscellaneous", + "_key": "9f5e8bf649e7" + } + ] + }, + { + "style": "normal", + "_key": "1b829fff2049", + "markDefs": [ + { + "_key": "6ac7abb08337", + "_type": "link", + "href": "https://github.com/nextflow-io/nextflow/releases" + }, + { + "href": "https://github.com/nextflow-io/nextflow/pulls?q=is%3Apr+label%3Aplatform%2Fk8s", + "_key": "c92264f0c3e7", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "Check the ", + "_key": "2793db786774", + "_type": "span" + }, + { + "text": "release 
notes", + "_key": "2081cb118d33", + "_type": "span", + "marks": [ + "6ac7abb08337" + ] + }, + { + "_key": "47b4550a4bae", + "_type": "span", + "marks": [], + "text": " or the list of " + }, + { + "_type": "span", + "marks": [ + "c92264f0c3e7" + ], + "text": "K8s pull requests", + "_key": "4072cd966d4e" + }, + { + "text": " on Github to see what else has been added. Here are some notable improvements from the past year:", + "_key": "c400131124ee", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_key": "1e209eb94429", + "children": [ + { + "text": "", + "_key": "4a3f106c4041", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "753c7b8f90cd", + "listItem": "bullet", + "children": [ + { + "_key": "84fb14f041a3", + "_type": "span", + "marks": [], + "text": "Support Pod `affinity` ([640cbed4](https://github.com/nextflow-io/nextflow/commit/640cbed4813a34887d4dc10f87fa2e4aa524d055))" + }, + { + "marks": [], + "text": "Support Pod `automountServiceAccountToken` ([1b5908e4](https://github.com/nextflow-io/nextflow/commit/1b5908e4cbbb79f93be2889eec3acfa6242068a1))", + "_key": "a891fd05695c", + "_type": "span" + }, + { + "_key": "f0fb84c615a9", + "_type": "span", + "marks": [], + "text": "Support Pod `priorityClassName` ([51650f8c](https://github.com/nextflow-io/nextflow/commit/51650f8c411ba40f0966031035e7a47c036f542e))" + }, + { + "_key": "bf15b6401aa4", + "_type": "span", + "marks": [], + "text": "Support Pod `tolerations` ([7f7cdadc](https://github.com/nextflow-io/nextflow/commit/7f7cdadc6a36d0fb99ef125f6c6f89bfca8ca52e))" + }, + { + "text": "Support `time` directive via `activeDeadlineSeconds` ([2b6f70a8](https://github.com/nextflow-io/nextflow/commit/2b6f70a8fa55b993fa48755f7a47ac9e1b584e48))", + "_key": "47336430d969", + "_type": "span", + "marks": [] + }, + { + "marks": [], + "text": "Improved control over error conditions ([064f9bc4](https://github.com/nextflow-io/nextflow/commit/064f9bc4), [58be2128](https://github.com/nextflow-io/nextflow/commit/58be2128), [d86ddc36](https://github.com/nextflow-io/nextflow/commit/d86ddc36))", + "_key": "c69add6f0be2", + "_type": "span" + }, + { + "_key": "a894896d040d", + "_type": "span", + "marks": [], + "text": "Improved support for labels and queue annotation ([9951fcd9](https://github.com/nextflow-io/nextflow/commit/9951fcd9), [4df8c8d2](https://github.com/nextflow-io/nextflow/commit/4df8c8d2))" + }, + { + "_key": "3aad99ad6ee0", + "_type": "span", + "marks": [], + "text": "Add support for AWS IAM role for Service Accounts ([62df42c3](https://github.com/nextflow-io/nextflow/commit/62df42c3), [c3364d0f](https://github.com/nextflow-io/nextflow/commit/c3364d0f), [b3d33e3b](https://github.com/nextflow-io/nextflow/commit/b3d33e3b))" + } + ], + "_type": "block" + }, + { + "_key": "d740f397e6ab", + "children": [ + { + "_key": "9ee92972edd2", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "h2", + "_key": "66adec89a99c", + "children": [ + { + "text": "Beyond Kubernetes", + "_key": "3f6eda412884", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "f9a19d2c8903", + "markDefs": [ + { + "href": "https://github.com/nextflow-io/nextflow", + "_key": "3c57fa7accdb", + "_type": "link" + } + ], + "children": [ + { + "_key": "d5181e1eebf9", + "_type": "span", + "marks": [], + "text": "We’ve added tons of value to Nextflow over the past year – not just in terms of Kubernetes support, but also in terms of 
performance, stability, and integrations with other technologies – and we aren’t stopping any time soon! We have greater ambitions still for Nextflow, and I for one am looking forward to what we will accomplish together. As always, keep an eye on this blog, as well as the " + }, + { + "text": "Nextflow GitHub", + "_key": "10a2fa123be4", + "_type": "span", + "marks": [ + "3c57fa7accdb" + ] + }, + { + "text": " page, for the latest updates to Nextflow.", + "_key": "2aba301c178e", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + } + ], + "author": { + "_ref": "8bd9c7c9-b7e7-473a-ace4-2cf6802bc884", + "_type": "reference" + } + }, + { + "body": [ + { + "markDefs": [], + "children": [ + { + "_key": "67bc588250ea", + "_type": "span", + "marks": [], + "text": "It's time for the monthly Nextflow release for March, " + }, + { + "text": "edge", + "_key": "b9b46318a95d", + "_type": "span", + "marks": [ + "em" + ] + }, + { + "_key": "20d50a0505b2", + "_type": "span", + "marks": [], + "text": " version 19.03. This is another great release with some cool new features, bug fixes and improvements." + } + ], + "_type": "block", + "style": "normal", + "_key": "4e136b88cc83" + }, + { + "style": "normal", + "_key": "fad70a23d44e", + "markDefs": [], + "children": [ + { + "_key": "9009129bc77d", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "SRA channel factory", + "_key": "a2401babb87b", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2", + "_key": "4c433f4c6a04", + "markDefs": [] + }, + { + "_key": "6a699e8b3acc", + "markDefs": [ + { + "href": "https://www.ncbi.nlm.nih.gov/sra", + "_key": "cea210fea683", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "This sees the introduction of the long-awaited sequence read archive (SRA) channel factory. The ", + "_key": "6df17a604a47", + "_type": "span" + }, + { + "marks": [ + "cea210fea683" + ], + "text": "SRA", + "_key": "b49ecd31d057", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " is a key public repository for sequencing data and run in coordination between The National Center for Biotechnology Information (NCBI), The European Bioinformatics Institute (EBI) and the DNA Data Bank of Japan (DDBJ).", + "_key": "a479b062944c" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "e8ad88128670", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "f6b6e637f639", + "_type": "span", + "marks": [] + } + ] + }, + { + "style": "normal", + "_key": "82cdcee95b66", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nextflow-io/nextflow/issues/89", + "_key": "b624e9be2505" + }, + { + "_key": "b9dae8ab88d1", + "_type": "link", + "href": "https://ewels.github.io/sra-explorer/" + }, + { + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/channel.html#fromfilepairs", + "_key": "8522e0b18f33" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "This feature originates all the way back in ", + "_key": "223e42485b9f" + }, + { + "_type": "span", + "marks": [ + "b624e9be2505" + ], + "text": "2015", + "_key": "dd692646142a" + }, + { + "_type": "span", + "marks": [], + "text": " and was worked on during a 2018 Nextflow hackathon. 
It was brought to fore again thanks to the release of Phil Ewels' excellent ", + "_key": "97d3c5938e37" + }, + { + "marks": [ + "b9dae8ab88d1" + ], + "text": "SRA Explorer", + "_key": "66079ce2ee11", + "_type": "span" + }, + { + "text": ". The SRA channel factory allows users to pull read data in FASTQ format directly from SRA by referencing a study, accession ID or even a keyword. It works in a similar way to ", + "_key": "6e6a1dc8d5c8", + "_type": "span", + "marks": [] + }, + { + "_key": "1f3556066637", + "_type": "span", + "marks": [ + "8522e0b18f33" + ], + "text": "`fromFilePairs`" + }, + { + "text": ", returning a sample ID and files (single or pairs of files) for each sample.", + "_key": "44035ef6d932", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "037dd2ed8ba4", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "3963c3f311a6" + }, + { + "style": "normal", + "_key": "9efa76fde530", + "markDefs": [], + "children": [ + { + "text": "The code snippet below creates a channel containing 24 samples from a chromatin dynamics study and runs FASTQC on the resulting files.", + "_key": "8cdf5635cc69", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "82d4b6f3c59e", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "141eb1620078" + }, + { + "_type": "code", + "_key": "8f0078ccd86b", + "code": "Channel\n .fromSRA('SRP043510')\n .set{reads}\n\nprocess fastqc {\n input:\n set sample_id, file(reads_file) from reads\n\n output:\n file(\"fastqc_${sample_id}_logs\") into fastqc_ch\n\n script:\n \"\"\"\n mkdir fastqc_${sample_id}_logs\n fastqc -o fastqc_${sample_id}_logs -f fastq -q ${reads_file}\n \"\"\"\n}" + }, + { + "markDefs": [], + "children": [ + { + "_key": "c7dd535fe6d2", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "69d063032752" + }, + { + "_key": "8a99193a8c41", + "markDefs": [ + { + "_key": "ffaffa9a5edb", + "_type": "link", + "href": "https://www.nextflow.io/docs/edge/channel.html#fromsra" + } + ], + "children": [ + { + "_key": "ab6e5e703afa", + "_type": "span", + "marks": [], + "text": "See the " + }, + { + "_type": "span", + "marks": [ + "ffaffa9a5edb" + ], + "text": "documentation", + "_key": "33201e68ba32" + }, + { + "_type": "span", + "marks": [], + "text": " for more details. When combined with downstream processes, you can quickly open a firehose of data on your workflow!", + "_key": "f077513fb3f6" + } + ], + "_type": "block", + "style": "blockquote" + }, + { + "_type": "block", + "style": "normal", + "_key": "dd790e1e582c", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "a6109a53fc7e", + "_type": "span" + } + ] + }, + { + "_key": "69bb5ea8442c", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Edge release", + "_key": "7fc30091e0b1" + } + ], + "_type": "block", + "style": "h2" + }, + { + "style": "normal", + "_key": "5bdfa216f2d7", + "markDefs": [], + "children": [ + { + "text": "Note that this is a monthly edge release. 
To use it simply execute the following command prior to running Nextflow:", + "_key": "814ad965fd03", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_key": "473030fcf202", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "5d8898e300d2", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "code", + "_key": "27498f9d6701", + "code": "export NXF_VER=19.03.0-edge" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "21b5e47530e4", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "a6c4fcbec253" + }, + { + "style": "h2", + "_key": "c18ed3a40121", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "If you need help", + "_key": "c06782c67643", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "dec714c02936", + "markDefs": [ + { + "_type": "link", + "href": "https://gitter.im/nextflow-io/nextflow", + "_key": "a16101aa8b19" + }, + { + "_type": "link", + "href": "https://groups.google.com/forum/#!forum/nextflow", + "_key": "88156b52d77c" + } + ], + "children": [ + { + "_key": "9bd30a295105", + "_type": "span", + "marks": [], + "text": "Please don’t hesitate to use our very active " + }, + { + "_key": "9639ef714aad", + "_type": "span", + "marks": [ + "a16101aa8b19" + ], + "text": "Gitter" + }, + { + "_key": "2b34e025d1eb", + "_type": "span", + "marks": [], + "text": " channel or create a thread in the " + }, + { + "_key": "285724f17a80", + "_type": "span", + "marks": [ + "88156b52d77c" + ], + "text": "Google discussion group" + }, + { + "_key": "4a9683b8e21a", + "_type": "span", + "marks": [], + "text": "." + } + ], + "_type": "block" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "7ebf269746bf", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "6de54a23616a", + "markDefs": [] + }, + { + "_type": "block", + "style": "h2", + "_key": "46902e2fd6f2", + "markDefs": [], + "children": [ + { + "_key": "ccb29324fb85", + "_type": "span", + "marks": [], + "text": "Reporting Issues" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "f2787a57637c", + "markDefs": [ + { + "_key": "14d16b367b78", + "_type": "link", + "href": "https://github.com/nextflow-io/nextflow/issues" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Experiencing issues introduced by this release? Please report them in our ", + "_key": "c6bb388ffbac" + }, + { + "_type": "span", + "marks": [ + "14d16b367b78" + ], + "text": "issue tracker", + "_key": "28f8de0d9ae3" + }, + { + "_key": "2073f625f6a1", + "_type": "span", + "marks": [], + "text": ". Make sure to fill in the fields of the issue template." 
+ } + ] + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "b7b77fe2d3c9", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "be31f953e1c9", + "markDefs": [] + }, + { + "_type": "block", + "style": "h2", + "_key": "13ebcad9bc5c", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Contributions", + "_key": "51542dde8c36" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Special thanks to the contributors of this release:", + "_key": "5d026eb2859c" + } + ], + "_type": "block", + "style": "normal", + "_key": "263a1b2bc015" + }, + { + "_key": "4f56220b988c", + "markDefs": [], + "children": [ + { + "_key": "3ba1273e1ea5", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "6b3f22c152a4", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/pachiras", + "_key": "5369da714672" + } + ], + "children": [ + { + "text": "Akira Sekiguchi - ", + "_key": "e81bb5449a7a0", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "5369da714672" + ], + "text": "pachiras", + "_key": "e81bb5449a7a1" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "href": "https://github.com/jhlegarreta", + "_key": "c0925f2bdab6", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "Jon Haitz Legarreta Gorroño - ", + "_key": "c6f069f5a0e80", + "_type": "span" + }, + { + "text": "jhlegarreta", + "_key": "c6f069f5a0e81", + "_type": "span", + "marks": [ + "c0925f2bdab6" + ] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "2e66b3970e62", + "listItem": "bullet" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "5b669598c157", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/JLLeitschuh", + "_key": "91c8afe94770" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Jonathan Leitschuh - ", + "_key": "1b96aaf6cf130" + }, + { + "_type": "span", + "marks": [ + "91c8afe94770" + ], + "text": "JLLeitschuh", + "_key": "1b96aaf6cf131" + } + ] + }, + { + "style": "normal", + "_key": "affea34a75a7", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/KevinSayers", + "_key": "33f4308d5d3d" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Kevin Sayers - ", + "_key": "48c9ee9f80660" + }, + { + "_type": "span", + "marks": [ + "33f4308d5d3d" + ], + "text": "KevinSayers", + "_key": "48c9ee9f80661" + } + ], + "level": 1, + "_type": "block" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "a24308b08e2d", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/lukasjelonek", + "_key": "e68243e2e5da" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Lukas Jelonek - ", + "_key": "d14f0f722e8d0" + }, + { + "text": "lukasjelonek", + "_key": "d14f0f722e8d1", + "_type": "span", + "marks": [ + "e68243e2e5da" + ] + } + ] + }, + { + "children": [ + { + "text": "Paolo Di Tommaso - ", + "_key": "f5e48e5d05070", + "_type": "span", + "marks": [] + }, + { + "text": "pditommaso", + "_key": "f5e48e5d05071", + "_type": "span", + "marks": [ + "ac2996650b1a" + ] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "ffd4fc519503", + "listItem": "bullet", + "markDefs": [ + { + 
"_type": "link", + "href": "https://github.com/pditommaso", + "_key": "ac2996650b1a" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "d022e687152f", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/toniher", + "_key": "624b9f1e99d8" + } + ], + "children": [ + { + "marks": [], + "text": "Toni Hermoso Pulido - ", + "_key": "6c58b2d09df40", + "_type": "span" + }, + { + "_key": "6c58b2d09df41", + "_type": "span", + "marks": [ + "624b9f1e99d8" + ], + "text": "toniher" + } + ], + "level": 1 + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "50c231e4ddd1", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/phupe", + "_key": "9acfb434ce64" + } + ], + "children": [ + { + "_key": "cff527c23e640", + "_type": "span", + "marks": [], + "text": "Philippe Hupé " + }, + { + "_type": "span", + "marks": [ + "9acfb434ce64" + ], + "text": "phupe", + "_key": "cff527c23e641" + } + ] + }, + { + "markDefs": [ + { + "href": "https://github.com/phue", + "_key": "0a65dfec9ce1", + "_type": "link" + } + ], + "children": [ + { + "_key": "2fcddd6d21f30", + "_type": "span", + "marks": [ + "0a65dfec9ce1" + ], + "text": "phue" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "54a2c2fce5e3", + "listItem": "bullet" + }, + { + "_key": "34343ee97745", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "5f217248be81", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Complete changes", + "_key": "e0b2e51cfb52" + } + ], + "_type": "block", + "style": "h2", + "_key": "b0cc543c715a" + }, + { + "_key": "07f527f87be2", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Fix Nextflow hangs submitting jobs to AWS batch #1024", + "_key": "b8293de20aaf0" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "_key": "156ea14aa69c", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "a1b6af6f06a40", + "_type": "span", + "marks": [], + "text": "Fix process builder incomplete output [2fe1052c]" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Fix Grid executor reports invalid queue status #1045", + "_key": "f2d9403bc1930", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "9c4bde892b48" + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Fix Script execute permission is lost in container #1060", + "_key": "13bc99e2a1e20" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "caf3edea2ce3" + }, + { + "_type": "block", + "style": "normal", + "_key": "06da9a0540aa", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Fix K8s serviceAccount is not honoured #1049", + "_key": "df7837bcf2220" + } + ], + "level": 1 + }, + { + "style": "normal", + "_key": "f9a1e88ef45c", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "7a5c47bc8bcc0", + "_type": "span", + "marks": [], + "text": "Fix K8s kuberun login path #1072" + } + ], + "level": 1, + "_type": "block" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "508a31af7158", + "listItem": 
"bullet", + "markDefs": [], + "children": [ + { + "text": "Fix K8s imagePullSecret and imagePullPolicy #1062", + "_key": "e59e2c153c060", + "_type": "span", + "marks": [] + } + ] + }, + { + "children": [ + { + "text": "Fix Google Storage docs #1023", + "_key": "0c348b2288010", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "735e2649dd76", + "listItem": "bullet", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Fix Env variable NXF_CONDA_CACHEDIR is ignored #1051", + "_key": "1b34fade774e0" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "521bcc06ab8d", + "listItem": "bullet" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Fix failing task due to legacy sleep command [3e150b56]", + "_key": "3e6b69e027b80", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "6775e1a2b675", + "listItem": "bullet" + }, + { + "style": "normal", + "_key": "e496ff52d99f", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "fb3f267aeb960", + "_type": "span", + "marks": [], + "text": "Fix SplitText operator should accept a closure parameter #1021" + } + ], + "level": 1, + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Add Channel.fromSRA factory method #1070", + "_key": "6074785505930" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "8d93bd17384e", + "listItem": "bullet" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "a3fd2c21eca3", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Add voluntary/involuntary context switches to metrics #1047", + "_key": "c6acc8f7dc8f0", + "_type": "span", + "marks": [] + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "Add noHttps option to singularity config #1041", + "_key": "f1c96dedff1a0", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "981b88b89fc2", + "listItem": "bullet", + "markDefs": [] + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "34f087a2c2af", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Add docker-daemon Singularity support #1043 [dfef1391]", + "_key": "9ac6353cf3fa0", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "14f4b8811e39", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Use peak_vmem and peak_rss as default output in the trace file instead of rss and vmem #1020", + "_key": "a63221c1942b0" + } + ], + "level": 1, + "_type": "block" + }, + { + "_key": "f2de7161b70d", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "4c3d54d586d90", + "_type": "span", + "marks": [], + "text": "Improve ansi log rendering #996 [33038a18]" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "_key": "2942fbdb143f", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "7c62a2ea3d0a", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "458d3220abb9", + "markDefs": [], + "children": [ + { + "text": "Breaking changes:", + "_key": "8078dc96d44a", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "None 
known.", + "_key": "c3735f6f03c7" + } + ], + "_type": "block", + "style": "normal", + "_key": "e99ab15ca693" + } + ], + "_createdAt": "2024-09-25T14:15:43Z", + "publishedAt": "2019-03-19T07:00:00.000Z", + "title": "Edge release 19.03: The Sequence Read Archive & more!", + "author": { + "_ref": "evan-floden", + "_type": "reference" + }, + "_updatedAt": "2024-10-02T11:20:50Z", + "_type": "blogPost", + "tags": [ + { + "_key": "d15804e240ca", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference" + } + ], + "_id": "93e771b66031", + "_rev": "2PruMrLMGpvZP5qAknmAsT", + "meta": { + "description": "It’s time for the monthly Nextflow release for March, edge version 19.03. This is another great release with some cool new features, bug fixes and improvements.", + "slug": { + "current": "release-19.03.0-edge" + } + } + }, + { + "_rev": "2PruMrLMGpvZP5qAknmBRn", + "title": "Addressing Bioinformatics Core Challenges with Nextflow and nf-core", + "author": { + "_type": "reference", + "_ref": "5bLgfCKN00diCN0ijmWNzV" + }, + "body": [ + { + "style": "normal", + "_key": "a68c07125513", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "I was honored to be invited to the ISMB 2024 congress to speak at the session organised by the COSI (Community of Special Interest) of Bioinformatics Cores. This session brought together bioinformatics professionals from around the world who manage bioinformatics facilities in different institutions to share experiences, discuss challenges, and explore solutions for managing and analyzing large-scale biological data. In this session, I had the opportunity to introduce Nextflow, and discuss how its adoption can help bioinformatics cores to address some of the most common challenges they face. From managing complex pipelines to optimizing resource utilization, Nextflow offers a range of benefits that can streamline workflows and improve productivity. In this blog, I'll summarize my talk and share insights on how Nextflow can help overcome some of those challenges, including meeting the needs of a wide range of users or customers, automate reporting, customising pipelines and training.", + "_key": "9937a66c6f7c" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "f42e3ef9b96c", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "d765bfaeb31e" + } + ], + "_type": "block" + }, + { + "children": [ + { + "marks": [], + "text": "Challenge 1: running multiple services\n", + "_key": "2e0bf5dbcdc5", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "3a1d4760bd41", + "markDefs": [] + }, + { + "style": "blockquote", + "_key": "35e54a9bb003", + "markDefs": [], + "children": [ + { + "marks": [ + "em", + "strong" + ], + "text": "Challenge description", + "_key": "df11910dc792", + "_type": "span" + }, + { + "marks": [ + "em" + ], + "text": ": “I have a wide range of stakeholders, and my pipelines need to address different needs in multiple scientific domains”", + "_key": "017d351b0a91", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "bfc3317714fb", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "275f34944a37", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "4f3ce74ac3d3", + "_type": "span", + "marks": [], + "text": "One of the biggest challenges faced by bioinformatics cores is catering to a diverse range of users with varying applications. 
On one hand, one might need to run analyses for researchers focused on cancer or human genetics. On the other hand, one may also need to support scientists working with mass spectrometry or metagenomics. Fortunately, the " + }, + { + "_key": "1ae990e7e40e", + "_type": "span", + "marks": [ + "3438bf75dd39" + ], + "text": "nf-core " + }, + { + "_type": "span", + "marks": [], + "text": "community has made it relatively easy to tackle these diverse needs with their curated pipelines. These pipelines are ready to use, covering a broad spectrum of applications, from genomics and metagenomics to immunology and mass spectrometry. In one of my slides I showed a non-exhaustive list, which spans genomics, metagenomics, immunology, mass spec, and more: one can find best-practice pipelines for almost any bioinformatics application imaginable, including emerging areas like imaging and spatial-omics. By leveraging this framework, one can not only tap into the expertise of the pipeline developers but also engage with them to discuss specific needs and requirements. This collaborative approach can significantly ease the deployment of a workflow, allowing the user to focus on high-priority tasks while ensuring that the analyses are always up to date and aligned with current best practices.", + "_key": "ab6b42e23951" + } + ], + "_type": "block", + "style": "normal", + "_key": "c2d57b583713", + "markDefs": [ + { + "href": "https://nf-co.re/", + "_key": "3438bf75dd39", + "_type": "link" + } + ] + }, + { + "_key": "a125c88f7086", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "53153364f048" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "h2", + "_key": "c810f2bededf", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Challenge 2: customizing applications", + "_key": "643726f033af", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "0a0108ae2436", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "em" + ], + "text": "", + "_key": "9d6feed71953" + } + ], + "_type": "block" + }, + { + "_key": "54c175c31e1c", + "markDefs": [], + "children": [ + { + "marks": [ + "em", + "strong" + ], + "text": "Challenge description:", + "_key": "5e1e74fa9c1e", + "_type": "span" + }, + { + "marks": [ + "em" + ], + "text": " “We often need to customize our applications and pipeline, to meet specific in-house needs of our users”", + "_key": "05e60248abf0", + "_type": "span" + } + ], + "_type": "block", + "style": "blockquote" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "\nWhile ready-to-use applications are a huge advantage, there are times when customisation is necessary. Perhaps the standard pipeline that works for most users doesn’t quite meet the specific needs of a facilities user or customer. Fortunately, the nf-core community has got these cases covered. With over 1,300 modules at everyone’s disposal, one can easily compose their own pipeline using the nf-core components and tooling. Should that not be enough though, one can even create a pipeline from scratch using nf-core tools. For instance, one can run a simple command like “", + "_key": "79850845b2860" + }, + { + "text": "nf-core create", + "_key": "79850845b2861", + "_type": "span", + "marks": [ + "5c99a560e367" + ] + }, + { + "_key": "79850845b2862", + "_type": "span", + "marks": [], + "text": "” followed by the name of the pipeline, and voilà! 
The software package will create a complete skeleton for the pipeline, filled with pre-compiled code and placeholders to ease customisation. This process is incredibly quick, as I demonstrated in a video clip during the talk, where a pipeline skeleton was created in just a few moments.\n" + } + ], + "_type": "block", + "style": "normal", + "_key": "1a89388c04cc", + "markDefs": [ + { + "_type": "link", + "href": "https://nf-co.re/docs/nf-core-tools/pipelines/create", + "_key": "5c99a560e367" + } + ] + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://seqera.io/containers/?__hstc=247481240.9785ace5f350c27ee07ceee486f18692.1717578742769.1727424116222.1727429443081.78&__hssc=247481240.4.1727429443081&__hsfp=3485190257", + "_key": "c641581ec657" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Of course, customization isn’t limited to pipelines. It also applies to containers, which are a crucial enabler of portability. When it comes to containers, Nextflow users have two options: an easy way and a more advanced approach. The easy way involves using ", + "_key": "1a35c1e9ca580" + }, + { + "_key": "1a35c1e9ca581", + "_type": "span", + "marks": [ + "c641581ec657" + ], + "text": "Seqera Containers" + }, + { + "_type": "span", + "marks": [], + "text": ", a platform that allows anyone to compose a container using tools from bioconda, pypi, and conda-forge. No need for logging in, just select the tools, and the URL of your container will be made available in no time. One can build containers for either Docker or Singularity, and for different platforms (amd64 or arm64).\n", + "_key": "1a35c1e9ca582" + } + ], + "_type": "block", + "style": "normal", + "_key": "49d74bdfe2e5" + }, + { + "_key": "1f1d36823f2d", + "markDefs": [ + { + "_type": "link", + "href": "https://seqera.io/wave/?__hstc=247481240.9785ace5f350c27ee07ceee486f18692.1717578742769.1727424116222.1727429443081.78&__hssc=247481240.4.1727429443081&__hsfp=3485190257", + "_key": "3ba708782ca8" + } + ], + "children": [ + { + "text": "If one is looking for more control, they can use ", + "_key": "ccdb59de63070", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "3ba708782ca8" + ], + "text": "Wave", + "_key": "ccdb59de63071" + }, + { + "_key": "ccdb59de63072", + "_type": "span", + "marks": [], + "text": " as a command line. This is a powerful tool that can act as an intermediary between the user and a container registry. Wave builds containers on the fly, allowing anyone to pass a wave build command as an evaluation inside a docker run command. It’s incredibly fast, and builds containers from conda packages in a matter of seconds. Wave, which is also the engine behind Seqera Containers, can be extremely handy to allow other operations like container augmentation. This feature enables a user to add new layers to existing containers without having to rebuild them, thanks to Docker’s layer-based architecture. 
One can simply create a folder where configuration files or executable scripts are located, pass the folder to Wave which will add the folder with a new layer, and get the URL of the augmented container on the fly.\n" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "Challenge 3: Reporting\n", + "_key": "802f20e866e8", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2", + "_key": "b581f677fbd5", + "markDefs": [] + }, + { + "style": "blockquote", + "_key": "f777c23dbbbc", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "em", + "strong" + ], + "text": "Challenge description", + "_key": "0729cddff4b9" + }, + { + "_key": "7d223b81b0d2", + "_type": "span", + "marks": [ + "em" + ], + "text": ": “I need to deliver a clear report of the analysis results, in a format that is accessible and can be used for publication purposes by my users”" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "c30d698af275", + "markDefs": [], + "children": [ + { + "_key": "9679c0d0e7fc", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "cbd516f5d2e1", + "markDefs": [ + { + "_key": "7e55fbb17c37", + "_type": "link", + "href": "https://seqera.io/multiqc/?__hstc=247481240.9785ace5f350c27ee07ceee486f18692.1717578742769.1727424116222.1727429443081.78&__hssc=247481240.4.1727429443081&__hsfp=3485190257" + } + ], + "children": [ + { + "marks": [], + "text": "Reporting is a crucial aspect of any bioinformatics pipeline, and, as with customisation, Nextflow offers different ways to approach it, suitable for different levels of expertise. The most straightforward solution involves running ", + "_key": "d1d5abb82c170", + "_type": "span" + }, + { + "text": "MultiQC", + "_key": "d1d5abb82c171", + "_type": "span", + "marks": [ + "7e55fbb17c37" + ] + }, + { + "marks": [], + "text": ", a tool that collects the output and logs of a wide range of software in a pipeline and generates a nicely formatted HTML report. This is a great option if one wants a quick and easy way to get a summary of their pipeline’s results. MultiQC is a widely used tool that supports a huge list (and growing) of bioinformatics tools and file formats, making it a great choice for many use cases.", + "_key": "d1d5abb82c172", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "d3dfdd86c6cb", + "markDefs": [], + "children": [ + { + "text": "However, if the developer needs more control over the reporting process or wants to create a custom report that meets some specific needs, it is entirely possible to engineer the reports from scratch. This involves collecting the outputs from various processes in the pipeline and passing them as an input to a process that runs an R Markdown or Quarto script. R Markdown and Quarto are popular tools for creating dynamic documents that can be parameterised, allowing anyone to customize the content and the layout of a report dynamically. 
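As a small sketch of that pattern (the container image, notebook name, and parameter are illustrative), such a reporting step might look like:

process RENDER_REPORT {
    container 'community.wave.seqera.io/library/quarto:latest'   // hypothetical image

    input:
    path 'results/*'      // collected outputs from upstream processes
    path notebook         // a parameterised report, e.g. report.qmd

    output:
    path 'report.html'

    script:
    """
    quarto render ${notebook} -P results_dir:results --output report.html
    """
}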
By using this approach, one can create a report that is tailored to your specific needs, including the types of plots and visualizations they want to include, the formatting and layouting, branding, and anything specific one might want to highlight.", + "_key": "ae2f1097f98f0", + "_type": "span", + "marks": [] + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "3ae0072e729b", + "markDefs": [ + { + "href": "https://github.com/nf-core/modules/tree/master/modules/nf-core/quartonotebook", + "_key": "098a60cd57c5", + "_type": "link" + }, + { + "_key": "dcd4133a0474", + "_type": "link", + "href": "https://github.com/nf-core/modules/tree/master/modules/nf-core/jupyternotebook" + } + ], + "children": [ + { + "text": "To follow this approach, the user can either create their own customised module, or re-use one of the available notebooks modules in the nf-core repository (quarto ", + "_key": "5b86c2cf87530", + "_type": "span", + "marks": [] + }, + { + "text": "here", + "_key": "5b86c2cf87531", + "_type": "span", + "marks": [ + "098a60cd57c5" + ] + }, + { + "_key": "5b86c2cf87532", + "_type": "span", + "marks": [], + "text": ", or jupyter " + }, + { + "_type": "span", + "marks": [ + "dcd4133a0474" + ], + "text": "here", + "_key": "5b86c2cf87533" + }, + { + "marks": [], + "text": ").", + "_key": "5b86c2cf87534", + "_type": "span" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "d751122ad9a9" + } + ], + "_type": "block", + "style": "h2", + "_key": "75136718b3b5", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "text": "Challenge 4: Monitoring", + "_key": "6203ea4d4856", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2", + "_key": "ffac27e7ede5" + }, + { + "style": "normal", + "_key": "9abc7c1538b4", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "em" + ], + "text": "", + "_key": "61f8a21a3758" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "em", + "strong" + ], + "text": "Challenge description", + "_key": "fccf1accd839" + }, + { + "_key": "0e354eb6ebae", + "_type": "span", + "marks": [ + "em" + ], + "text": ": “I need to be able to estimate and optimize runtimes as well as costs of my pipelines, fitting our cost model”" + } + ], + "_type": "block", + "style": "blockquote", + "_key": "7c8d0f0063d9" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "b3c50e3af9d8" + } + ], + "_type": "block", + "style": "normal", + "_key": "6e27826bf401" + }, + { + "_type": "block", + "style": "normal", + "_key": "3164f9d0f6c9", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Monitoring is a critical aspect of pipeline management, and Nextflow provides a robust set of tools to help you track and optimise a pipeline's performance. At its core, monitoring involves tracking the execution of the pipeline to ensure that it's running efficiently and effectively. But it's not just about knowing how long a pipeline takes to run or how much it costs - it's also about making sure each process in the pipeline is using the requested resources efficiently. With Nextflow, the user can track the resources used by each process in your pipeline, including CPU, memory, and disk usage and compare them visually with the resources requested in the pipeline configuration and reserved by each job. 
This information allows the user to identify bottlenecks and areas for optimization, so one can fine-tune their pipeline for a better resource consumption. For example, if the user notices that one process is using a disproportionate amount of memory, they can adjust the configuration to better match the actual usage.", + "_key": "2048d002f6c8" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_key": "004e1db61340", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "bceac9303b39" + }, + { + "_key": "c2f1515e006c", + "markDefs": [], + "children": [ + { + "text": "But monitoring isn't just about optimizing a pipeline's performance - it's also about reducing the environmental impact where possible. A recently developed Nextflow plugin allows to track the carbon footprint of a pipeline, including the energy consumption and greenhouse gas emissions associated with running that pipeline. This information allows one to make informed decisions about their environmental impact, and gaining better awareness or even adopting greener strategies to computing.", + "_key": "ba2d91fb488f", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "8fd55c6b98d9", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "c6905075b83c", + "markDefs": [] + }, + { + "_key": "98e7f20f2897", + "markDefs": [ + { + "_type": "link", + "href": "https://seqera.io/platform/?__hstc=247481240.9785ace5f350c27ee07ceee486f18692.1717578742769.1727424116222.1727429443081.78&__hssc=247481240.4.1727429443081&__hsfp=3485190257", + "_key": "609dccf9d6c0" + } + ], + "children": [ + { + "marks": [], + "text": "One of the key benefits of Nextflow’s monitoring system is its flexibility. The user can either use the built-in html reports for trace and pipeline execution, or could monitor a run live by connecting to ", + "_key": "3d83104383ca", + "_type": "span" + }, + { + "marks": [ + "609dccf9d6c0" + ], + "text": "Seqera Platform", + "_key": "29bd787cef6f", + "_type": "span" + }, + { + "_key": "4993c58ee0ec", + "_type": "span", + "marks": [], + "text": " and visualising its progress on a graphical interface in real time. More expert or creative users could use the trace file produced by a Nextflow execution, to create their own metrics and visualizations." 
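For example, the built-in reports and the per-task trace file can be switched on directly in nextflow.config (the trace fields below are just one possible selection):

report.enabled   = true    // HTML execution report
timeline.enabled = true    // timeline of task execution
trace {
    enabled = true
    fields  = 'task_id,name,status,realtime,%cpu,peak_rss'
}

The trace file is a plain tab-separated table, so it is easy to load into R or Python to build custom metrics and plots.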
+ } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "18c318c2de0d", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "bac3a32afd6e", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "d7b573462beb", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Challenge 5: User accessibility", + "_key": "15b53f7ac5fa" + } + ], + "_type": "block", + "style": "h2" + }, + { + "_type": "block", + "style": "normal", + "_key": "5473bca9f6b7", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "47179970c13f", + "_type": "span", + "marks": [ + "em" + ] + } + ] + }, + { + "children": [ + { + "text": "Challenge description", + "_key": "2e2afa95be08", + "_type": "span", + "marks": [ + "em", + "strong" + ] + }, + { + "_key": "9eb4409994b1", + "_type": "span", + "marks": [ + "em" + ], + "text": ": “I could balance workloads better, by giving users a certain level of autonomy in running some of my pipelines”" + } + ], + "_type": "block", + "style": "blockquote", + "_key": "f00185c2a71f", + "markDefs": [] + }, + { + "_key": "6dcce420d8e9", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "d0200ff55e60" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "f26d2acc88c1", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "User accessibility is a crucial aspect of pipeline development, as it enables users with varying levels of bioinformatics experience to run complex pipelines with ease. One of the advantages of Nextflow, is that a developer can create pipelines that are not only robust and efficient but also user-friendly. Allowing your users to run them with a certain level of autonomy might be a good strategy in a bioinformatics core to decentralise straightforward analyses and invest human resources on more complex projects. Empowering a facility’s users to run specific pipelines independently could be a solution to reduce certain workloads.", + "_key": "f21ac96da3590" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "The nf-core template includes a parameters schema, which is captured by the nf-core website to create a graphical interface for parameters configuration of the pipelines hosted under the ", + "_key": "0616bc09013d0" + }, + { + "_key": "0616bc09013d1", + "_type": "span", + "marks": [ + "4ca17ea1e558" + ], + "text": "nf-core organisation on GitHub" + }, + { + "_type": "span", + "marks": [], + "text": ". This interface allows users to fill in the necessary fields for parameters needed to run a pipeline, and allows even users with minimal experience with bioinformatics or command-line interfaces to quickly set up a run. The user can then simply copy and paste the command generated by the webpage into a terminal, and the pipeline will launch as configured. 
This approach is ideal for users who are familiar with basic computer tasks, and have a very minimal familiarity with a terminal.", + "_key": "0616bc09013d2" + } + ], + "_type": "block", + "style": "normal", + "_key": "a3f0e2b5e4a5", + "markDefs": [ + { + "_key": "4ca17ea1e558", + "_type": "link", + "href": "https://github.com/nf-core" + } + ] + }, + { + "style": "normal", + "_key": "b2f8f305d238", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "However, for users with even less bioinformatics experience, Nextflow and the nf-core template together offer an even more intuitive solution. The pipeline can be added to the launcher of the Seqera Platform, and one can provide users with a comprehensive and user-friendly interface that allows them to launch pipelines with ease. This platform offers a range of features, including access to datasets created from sample sheets, the ability to launch pipelines on a wide range of cloud environments as well as on HPC on-premise. A simple graphical interface simplifies the entire process.The Seqera Platform provides in this way a seamless and intuitive experience for users, allowing them to run pipelines without requiring extensive bioinformatics knowledge.", + "_key": "6e8da2f638dd", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "b9d92bbcc7f2", + "markDefs": [], + "children": [ + { + "_key": "e32e7da8c093", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "Challenge 6: Training", + "_key": "d5ef6e79e1b7", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2", + "_key": "22eecfe0a3d4", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "43e0146022de", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "ca2d2bad4b5b", + "_type": "span", + "marks": [ + "em" + ] + } + ] + }, + { + "_key": "2dc993b739f2", + "markDefs": [], + "children": [ + { + "text": "Challenge description", + "_key": "9c00a004ddcb", + "_type": "span", + "marks": [ + "em", + "strong" + ] + }, + { + "marks": [ + "em" + ], + "text": ": “Training my team and especially onboarding new team members is always challenging and requires documentation and good materials”", + "_key": "605dd28df498", + "_type": "span" + } + ], + "_type": "block", + "style": "blockquote" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "dac284c2fbd1" + } + ], + "_type": "block", + "style": "normal", + "_key": "343ed3b88403", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_key": "5e8d958b15160", + "_type": "span", + "marks": [], + "text": "The final challenge we often face in bioinformatics facilities is training. We all know that training is an ongoing issue, not just because of staff turnover and the need to onboard new recruits, but also because the field is constantly evolving. With new tools, techniques, and technologies emerging all the time, it can be difficult to keep up with the latest developments. However, training is crucial for ensuring that pipelines are robust, efficient, and accurate." + } + ], + "_type": "block", + "style": "normal", + "_key": "3fbc50c3b2d5" + }, + { + "style": "normal", + "_key": "c622efeeea18", + "markDefs": [ + { + "href": "https://training.nextflow.io/", + "_key": "617bdad5e617", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "Fortunately, there are now many resources available to help with training. 
The ", + "_key": "06ee93adeef30", + "_type": "span" + }, + { + "_key": "06ee93adeef31", + "_type": "span", + "marks": [ + "617bdad5e617" + ], + "text": "Nextflow training website" + }, + { + "_key": "06ee93adeef32", + "_type": "span", + "marks": [], + "text": ", for example, has been completely rebuilt recently and now offers a wealth of material suitable for everyone, from beginners to experts. Whether you’re just starting out with Nextflow or are already an experienced user, you’ll find plenty of resources to help you improve your skills. From introductory tutorials to advanced guides, the training website has everything you need to get the most out of this workflow manager." + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "136525fd628f", + "markDefs": [ + { + "href": "https://nextflow.io/ambassadors.html", + "_key": "ba36da915824", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Everyone can access the material at their own pace, but regular training events have been scheduled during the year. Additionally, there is now a network of ", + "_key": "25b0663c23fc0" + }, + { + "text": "Nextflow Ambassadors", + "_key": "25b0663c23fc1", + "_type": "span", + "marks": [ + "ba36da915824" + ] + }, + { + "_key": "25b0663c23fc2", + "_type": "span", + "marks": [], + "text": " who often organise local training events across the world. Without making comparisons with other solutions, I can easily say that the steep learning curve to get going with Nextflow is just a myth nowadays. The quality of the training material, the examples available, the frequency of events in person or online you can attend to, and more importantly a welcoming community of users, make learning Nextflow quite easy." + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In my laboratory, usually in a couple of months bachelor students are reasonably confident with the code and with running pipelines and debugging common issues.", + "_key": "66471e9376960" + } + ], + "_type": "block", + "style": "normal", + "_key": "2d5641682584" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "926fbd86c877", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "1f1bb2003823", + "markDefs": [] + }, + { + "_key": "1705670c4b9c", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Conclusions", + "_key": "f5224ed74073", + "_type": "span" + } + ], + "_type": "block", + "style": "h2" + }, + { + "_key": "1799d8cd89a4", + "markDefs": [], + "children": [ + { + "_key": "a7c8a4c38573", + "_type": "span", + "marks": [], + "text": "In conclusion, the presentation at ISMB has gathered quite some interest because I believe it has shown how Nextflow is a powerful and versatile tool that can help bioinformatics cores address those common challenges everyone has experienced. With its comprehensive tooling, extensive training materials, and active community of users, Nextflow offers a complete package that can help people streamline their workflows and improve their productivity. Although I might be biased on this, I also believe that by adopting Nextflow one also becomes part of a community of researchers and developers who are passionate about bioinformatics and committed to sharing their knowledge and expertise. 
Beginners not only will have access to a wealth of resources and tutorials, but more importantly to a supportive network of peers who can offer advice and guidance, and which is really fun to be part of." + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "ded72f00e2e2", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "8655f8e53179" + }, + { + "style": "blockquote", + "_key": "d931ce151d21", + "markDefs": [ + { + "_type": "link", + "href": "https://www.nextflow.io/ambassadors.html", + "_key": "0bb688bd7001" + } + ], + "children": [ + { + "marks": [], + "text": "This post was contributed by a Nextflow Ambassador. Ambassadors are passionate individuals who support the Nextflow community. Interested in becoming an ambassador? Read more about it ", + "_key": "f481e0dcf23c0", + "_type": "span" + }, + { + "text": "here", + "_key": "f481e0dcf23c1", + "_type": "span", + "marks": [ + "0bb688bd7001" + ] + }, + { + "_key": "f481e0dcf23c2", + "_type": "span", + "marks": [], + "text": "." + } + ], + "_type": "block" + } + ], + "meta": { + "description": "I was honored to be invited to the ISMB 2024 congress to speak at the session organised by the COSI (Community of Special Interest) of Bioinformatics Cores. This session brought together bioinformatics professionals from around the world who manage bioinformatics facilities in different institutions to share experiences, discuss challenges, and explore solutions for managing and analyzing large-scale biological data. In this session, I had the opportunity to introduce Nextflow, and discuss how its adoption can help bioinformatics cores to address some of the most common challenges they face.", + "slug": { + "current": "addressing-bioinformatics-core-challenges" + } + }, + "_type": "blogPost", + "_id": "94e1e3fb7cda", + "_updatedAt": "2024-09-27T09:48:31Z", + "tags": [ + { + "_key": "7990a90fbc77", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference" + } + ], + "_createdAt": "2024-09-25T14:17:44Z", + "publishedAt": "2024-09-11T06:00:00.000Z" + }, + { + "body": [ + { + "children": [ + { + "_key": "97b74589a3f5", + "_type": "span", + "marks": [], + "text": "The ability to resume an analysis (i.e. caching) is one of the core strengths of Nextflow. When developing pipelines, this allows us to avoid re-running unchanged processes by simply appending " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "-resume", + "_key": "49a140b29340" + }, + { + "text": " to the ", + "_key": "890e4fbf4ac2", + "_type": "span", + "marks": [] + }, + { + "_key": "7c511d616f5f", + "_type": "span", + "marks": [ + "code" + ], + "text": "nextflow run" + }, + { + "_type": "span", + "marks": [], + "text": " command. Sometimes, tasks may be repeated for reasons that are unclear. 
In these cases it can help to look into the caching mechanism, to understand why a specific process was re-run.", + "_key": "1eeb488f72d7" + } + ], + "_type": "block", + "style": "normal", + "_key": "732639609f9c", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "c6061a5c1563", + "children": [ + { + "_type": "span", + "text": "", + "_key": "f1b7cb440560" + } + ] + }, + { + "_key": "8d3f0ecc874a", + "markDefs": [ + { + "_key": "d69163952239", + "_type": "link", + "href": "https://www.nextflow.io/blog/2019/demystifying-nextflow-resume.html" + }, + { + "href": "https://www.nextflow.io/blog/2019/troubleshooting-nextflow-resume.html", + "_key": "6b2950938430", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "We have previously written about Nextflow's ", + "_key": "c785c7825c57" + }, + { + "marks": [ + "d69163952239" + ], + "text": "resume functionality", + "_key": "a469ead1685a", + "_type": "span" + }, + { + "marks": [], + "text": " as well as some ", + "_key": "195dcfda6745", + "_type": "span" + }, + { + "_key": "a3664d9281bb", + "_type": "span", + "marks": [ + "6b2950938430" + ], + "text": "troubleshooting strategies" + }, + { + "_key": "1325a4dcb566", + "_type": "span", + "marks": [], + "text": " to gain more insights on the caching behavior." + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "c30613d02626" + } + ], + "_type": "block", + "style": "normal", + "_key": "535d6923cce6" + }, + { + "_type": "block", + "style": "normal", + "_key": "44f66ec5ef29", + "markDefs": [ + { + "_key": "e4d5b3d5a97f", + "_type": "link", + "href": "https://github.com/nextflow-io/rnaseq-nf" + } + ], + "children": [ + { + "text": "In this post, we will take a more hands-on approach and highlight some strategies which we can use to understand what is causing a particular process (or processes) to re-run, instead of using the cache from previous runs of the pipeline. 
To demonstrate the process, we will introduce a minor change into one of the process definitions in the the ", + "_key": "6531bdacd749", + "_type": "span", + "marks": [] + }, + { + "text": "nextflow-io/rnaseq-nf", + "_key": "c4e41e3114a9", + "_type": "span", + "marks": [ + "e4d5b3d5a97f" + ] + }, + { + "marks": [], + "text": " pipeline and investigate how it affects the overall caching behavior when compared to the initial execution of the pipeline.", + "_key": "8e2cd89e0e54", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "157c14d02349", + "children": [ + { + "_type": "span", + "text": "", + "_key": "51c0a1cbaa6f" + } + ] + }, + { + "children": [ + { + "_key": "084d30d608a6", + "_type": "span", + "text": "Local setup for the test" + } + ], + "_type": "block", + "style": "h3", + "_key": "cf04aeeda787" + }, + { + "style": "normal", + "_key": "d067adcb4ecc", + "markDefs": [ + { + "_key": "55186110ab8d", + "_type": "link", + "href": "https://github.com/nextflow-io/rnaseq-nf" + } + ], + "children": [ + { + "_key": "d7c645f90ff3", + "_type": "span", + "marks": [], + "text": "First, we clone the " + }, + { + "marks": [ + "55186110ab8d" + ], + "text": "nextflow-io/rnaseq-nf", + "_key": "712d80280221", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " pipeline locally:", + "_key": "f7eefa643d3e" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "ab81a7544e6d" + } + ], + "_type": "block", + "style": "normal", + "_key": "891263db5db4" + }, + { + "_type": "code", + "_key": "d00c3ed26798", + "code": "$ git clone https://github.com/nextflow-io/rnaseq-nf\n$ cd rnaseq-nf" + }, + { + "children": [ + { + "text": "", + "_key": "4aa63bc48fec", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "ad94ce3688ad" + }, + { + "style": "normal", + "_key": "b57147c229ba", + "markDefs": [], + "children": [ + { + "_key": "92d74afdf7f9", + "_type": "span", + "marks": [], + "text": "In the examples below, we have used Nextflow " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "v22.10.0", + "_key": "306a66e0323d" + }, + { + "_type": "span", + "marks": [], + "text": ", Docker ", + "_key": "854bd93de532" + }, + { + "text": "v20.10.8", + "_key": "2bcc81b17709", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "text": " and ", + "_key": "9839887bddcb", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "Java v17 LTS", + "_key": "8b3a8c18db3b" + }, + { + "text": " on MacOS.", + "_key": "7d0974e67eb6", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "d1d2cc9f1076", + "children": [ + { + "_key": "d71b2b994302", + "_type": "span", + "text": "" + } + ] + }, + { + "_key": "46437579cbe2", + "children": [ + { + "_type": "span", + "text": "Pipeline flowchart", + "_key": "ff3c422a2d0a" + } + ], + "_type": "block", + "style": "h3" + }, + { + "_type": "block", + "style": "normal", + "_key": "d5c0a58d3437", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The flowchart below can help in understanding the design of the pipeline and the dependencies between the various tasks.", + "_key": "9a2bec7983d4", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "c5cc6fa8e8b0", + "children": [ + { + "text": "", + "_key": "53b58afd458f", + "_type": "span" + } + ] + }, + { + "_type": "image", + "alt": "rnaseq-nf", + "_key": 
"4092c8d3dfc4", + "asset": { + "_ref": "image-ededfe17a105d5ee8cca74f55576d2298cd702e1-732x560-png", + "_type": "reference" + } + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "722542ebc94a" + } + ], + "_type": "block", + "style": "normal", + "_key": "1082b61f3da5" + }, + { + "children": [ + { + "_type": "span", + "text": "Logs from initial (fresh) run", + "_key": "567ae2f4bf4e" + } + ], + "_type": "block", + "style": "h3", + "_key": "a8ccb71b1a11" + }, + { + "markDefs": [ + { + "_key": "4136933a3161", + "_type": "link", + "href": "https://nextflow.io/blog/2019/troubleshooting-nextflow-resume.html" + } + ], + "children": [ + { + "marks": [], + "text": "As a reminder, Nextflow generates a unique task hash, e.g. 22/7548fa… for each task in a workflow. The hash takes into account the complete file path, the last modified timestamp, container ID, content of script directive among other factors. If any of these change, the task will be re-executed. Nextflow maintains a list of task hashes for caching and traceability purposes. You can learn more about task hashes in the article ", + "_key": "65307b0af3a3", + "_type": "span" + }, + { + "_key": "2856c955181c", + "_type": "span", + "marks": [ + "4136933a3161" + ], + "text": "Troubleshooting Nextflow resume" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "d8e22dfd4f6a" + } + ], + "_type": "block", + "style": "normal", + "_key": "c717e7a68381" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "da3214fcf929" + } + ], + "_type": "block", + "style": "normal", + "_key": "1f4d6a843acf" + }, + { + "style": "normal", + "_key": "e8bd88c1ba77", + "markDefs": [], + "children": [ + { + "_key": "01f56f2f8ac7", + "_type": "span", + "marks": [], + "text": "To have something to compare to, we first need to generate the initial hashes for the unchanged processes in the pipeline. We save these in a file called " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "fresh_run.log", + "_key": "1c414c119239" + }, + { + "text": " and use them later on as "ground-truth" for the analysis. 
In order to save the process hashes we use the ", + "_key": "68b784c9dcce", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "code" + ], + "text": "-dump-hashes", + "_key": "1220e600acb8", + "_type": "span" + }, + { + "marks": [], + "text": " flag, which prints them to the log.", + "_key": "7ee9cfdbe528", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "dc4710a9f9dd", + "children": [ + { + "text": "", + "_key": "5d10b3ae0575", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/cli.html#execution-logs", + "_key": "9521f1246ec3" + } + ], + "children": [ + { + "text": "TIP:", + "_key": "32833f786885", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "marks": [], + "text": " We rely upon the ", + "_key": "ecb720b277e0", + "_type": "span" + }, + { + "_key": "b73e52b90fb3", + "_type": "span", + "marks": [ + "9521f1246ec3" + ], + "text": "`-log` option" + }, + { + "text": " in the ", + "_key": "993f5b924dd3", + "_type": "span", + "marks": [] + }, + { + "_key": "b8e71f2b6802", + "_type": "span", + "marks": [ + "code" + ], + "text": "nextflow" + }, + { + "_key": "a4ca16d2d4ee", + "_type": "span", + "marks": [], + "text": " command line interface to be able to supply a custom log file name instead of the default " + }, + { + "_key": "e39a0163a989", + "_type": "span", + "marks": [ + "code" + ], + "text": ".nextflow.log" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "fa3825cfa6cc" + } + ], + "_type": "block", + "style": "normal", + "_key": "abe1ff0cb35a" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "4084d9440bf9" + } + ], + "_type": "block", + "style": "normal", + "_key": "23ef9415249c" + }, + { + "code": "$ nextflow -log fresh_run.log run ./main.nf -profile docker -dump-hashes\n\n[...truncated…]\nexecutor > local (4)\n[d5/57c2bb] process > RNASEQ:INDEX (ggal_1_48850000_49020000) [100%] 1 of 1 ✔\n[25/433b23] process > RNASEQ:FASTQC (FASTQC on ggal_gut) [100%] 1 of 1 ✔\n[03/23372f] process > RNASEQ:QUANT (ggal_gut) [100%] 1 of 1 ✔\n[38/712d21] process > MULTIQC [100%] 1 of 1 ✔\n[...truncated…]", + "_type": "code", + "_key": "ad81cb2dbb7b" + }, + { + "_key": "7c78f0d0a134", + "children": [ + { + "_type": "span", + "text": "", + "_key": "d218e161be55" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "h3", + "_key": "869451118ccc", + "children": [ + { + "_key": "9c4db898f26a", + "_type": "span", + "text": "Edit the `FastQC` process" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "ebe56f3c4f0d", + "markDefs": [ + { + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/process.html#cpus", + "_key": "5d690c5adb72" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "After the initial run of the pipeline, we introduce a change in the ", + "_key": "f5d6bd8e240f" + }, + { + "_key": "4415821817fc", + "_type": "span", + "marks": [ + "code" + ], + "text": "fastqc.nf" + }, + { + "_key": "bff539ed3cf2", + "_type": "span", + "marks": [], + "text": " module, hard coding the number of threads which should be used to run the " + }, + { + "_key": "018ab416d98a", + "_type": "span", + "marks": [ + "code" + ], + "text": "FASTQC" + }, + { + "text": " process via Nextflow's ", + "_key": "a72770c0cd9f", + "_type": "span", + "marks": [] + }, + { + "text": "`cpus` directive", + "_key": "794c638e7ff8", + "_type": "span", + "marks": [ + 
"5d690c5adb72" + ] + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "47a4c6d2a001" + } + ] + }, + { + "_key": "fa151421a1ee", + "children": [ + { + "text": "", + "_key": "ac0064faaf09", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Here's the output of ", + "_key": "4d7d9ce13afe" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "git diff", + "_key": "1d3f59f95a3f" + }, + { + "_type": "span", + "marks": [], + "text": " on the contents of ", + "_key": "5c57909bc73a" + }, + { + "marks": [ + "code" + ], + "text": "modules/fastqc/main.nf", + "_key": "3c4a242b8580", + "_type": "span" + }, + { + "text": " file:", + "_key": "518c00887f0a", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "77f80ea998ce", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "e51edf190e03" + } + ], + "_type": "block", + "style": "normal", + "_key": "759e7c11085e" + }, + { + "_type": "code", + "_key": "8271cf37eb5f", + "code": "--- a/modules/fastqc/main.nf\n+++ b/modules/fastqc/main.nf\n@@ -4,6 +4,7 @@ process FASTQC {\n tag \"FASTQC on $sample_id\"\n conda 'bioconda::fastqc=0.11.9'\n publishDir params.outdir, mode:'copy'\n+ cpus 2\n\n input:\n tuple val(sample_id), path(reads)\n@@ -13,6 +14,6 @@ process FASTQC {\n\n script:\n \"\"\"\n- fastqc.sh \"$sample_id\" \"$reads\"\n+ fastqc.sh \"$sample_id\" \"$reads\" -t ${task.cpus}\n \"\"\"\n }" + }, + { + "_type": "block", + "style": "normal", + "_key": "8bdc995861d1", + "children": [ + { + "_type": "span", + "text": "", + "_key": "4f12aa52c341" + } + ] + }, + { + "_type": "block", + "style": "h3", + "_key": "220da63ef88b", + "children": [ + { + "_key": "d5f28464074f", + "_type": "span", + "text": "Logs from the follow up run" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_key": "1c8c8c09f191", + "_type": "span", + "marks": [], + "text": "Next, we run the pipeline again with the " + }, + { + "_key": "82b9d7bfa3dc", + "_type": "span", + "marks": [ + "code" + ], + "text": "-resume" + }, + { + "_type": "span", + "marks": [], + "text": " option, which instructs Nextflow to rely upon the cached results from the previous run and only run the parts of the pipeline which have changed. 
As before, we instruct Nextflow to dump the process hashes, this time in a file called ", + "_key": "350422ad948e" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "resumed_run.log", + "_key": "11298572f5a7" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "37584ff05daa" + } + ], + "_type": "block", + "style": "normal", + "_key": "5e76228c5234" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "f3c3318912ef" + } + ], + "_type": "block", + "style": "normal", + "_key": "4d6b4caeecc1" + }, + { + "code": "$ nextflow -log resumed_run.log run ./main.nf -profile docker -dump-hashes -resume\n\n[...truncated…]\nexecutor > local\n[d5/57c2bb] process > RNASEQ:INDEX (ggal_1_48850000_49020000) [100%] 1 of 1, cached: 1 ✔\n[55/15b609] process > RNASEQ:FASTQC (FASTQC on ggal_gut) [100%] 1 of 1 ✔\n[03/23372f] process > RNASEQ:QUANT (ggal_gut) [100%] 1 of 1, cached: 1 ✔\n[f3/f1ccb4] process > MULTIQC [100%] 1 of 1 ✔\n[...truncated…]", + "_type": "code", + "_key": "c4abca2faba2" + }, + { + "style": "normal", + "_key": "a5762de807ad", + "children": [ + { + "_type": "span", + "text": "", + "_key": "11452f6945a5" + } + ], + "_type": "block" + }, + { + "style": "h2", + "_key": "8b8863054dd6", + "children": [ + { + "_key": "6f123ae82f8a", + "_type": "span", + "text": "Analysis of cache hashes" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "From the summary of the command line output above, we can see that the ", + "_key": "fc789c77278a" + }, + { + "_key": "085496247c88", + "_type": "span", + "marks": [ + "code" + ], + "text": "RNASEQ:FASTQC (FASTQC on ggal_gut)" + }, + { + "_key": "57c6976fe5f8", + "_type": "span", + "marks": [], + "text": " and " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "MULTIQC", + "_key": "a2691e574c3b" + }, + { + "text": " processes were re-run while the others were cached. To understand why, we can examine the hashes generated by the processes from the logs of the ", + "_key": "6da2789ce429", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "code" + ], + "text": "fresh_run", + "_key": "aae77b8e6eff", + "_type": "span" + }, + { + "marks": [], + "text": " and ", + "_key": "adb3bdb8c2a6", + "_type": "span" + }, + { + "text": "resumed_run", + "_key": "1ba48e531d92", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_key": "dfc833a7556e", + "_type": "span", + "marks": [], + "text": "." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "c5a7902c1f83", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "71cca0a76695" + } + ], + "_type": "block", + "style": "normal", + "_key": "ecca541ff52c" + }, + { + "children": [ + { + "_key": "ce95e8c4179d", + "_type": "span", + "marks": [], + "text": "For the analysis, we need to keep in mind that:" + } + ], + "_type": "block", + "style": "normal", + "_key": "f70864556a59", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "b7bf95090b16", + "children": [ + { + "_type": "span", + "text": "", + "_key": "d771018d673f" + } + ] + }, + { + "style": "normal", + "_key": "1cc284086289", + "listItem": "bullet", + "children": [ + { + "_type": "span", + "marks": [], + "text": "The time-stamps are expected to differ and can be safely ignored to narrow down the `grep` pattern to the Nextflow `TaskProcessor` class.", + "_key": "6ab08f0711c0" + }, + { + "_type": "span", + "marks": [], + "text": "The _order_ of the log entries isn't fixed, due to the nature of the underlying parallel computation dataflow model used by Nextflow. For example, in our example below, `FASTQC` ran first in `fresh_run.log` but wasn’t the first logged process in `resumed_run.log`.", + "_key": "0f66c2e527c3" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "", + "_key": "ae0c37bd8b18", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "ecc58bfac7a7" + }, + { + "style": "h3", + "_key": "2215c9877647", + "children": [ + { + "_type": "span", + "text": "Find the process level hashes", + "_key": "2ee4b36a0dc4" + } + ], + "_type": "block" + }, + { + "_key": "67222908ee7e", + "markDefs": [], + "children": [ + { + "text": "We can use standard Unix tools like ", + "_key": "6586d8985b96", + "_type": "span", + "marks": [] + }, + { + "text": "grep", + "_key": "fd6f7bf8324e", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_key": "f7046f038a32", + "_type": "span", + "marks": [], + "text": ", " + }, + { + "text": "cut", + "_key": "37210f788748", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_key": "1c7827db52fe", + "_type": "span", + "marks": [], + "text": " and " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "sort", + "_key": "94325fcf0db5" + }, + { + "text": " to address these points and filter out the relevant information:", + "_key": "5366cf63f27b", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "f262e13aee1a", + "children": [ + { + "text": "", + "_key": "797876812757", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "listItem": "bullet", + "children": [ + { + "_type": "span", + "marks": [], + "text": "Use `grep` to isolate log entries with `cache hash` string", + "_key": "58517bb96137" + }, + { + "text": "Remove the prefix time-stamps using `cut -d ‘-’ -f 3`", + "_key": "cda459b32451", + "_type": "span", + "marks": [] + }, + { + "marks": [], + "text": "Remove the caching mode related information using `cut -d ';' -f 1`", + "_key": "e37cdc026396", + "_type": "span" + }, + { + "marks": [], + "text": "Sort the lines based on process names using `sort` to have a standard order before comparison", + "_key": "e4208c5d96a3", + "_type": "span" + }, + { + "text": "Use `tee` to print the resultant strings to the terminal and simultaneously save to a file", + "_key": "a6f25af82eb0", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": 
"normal", + "_key": "8c56562e9848" + }, + { + "children": [ + { + "text": "", + "_key": "5f40609fab0a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "0b8bb472af8b" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Now, let’s apply these transformations to the ", + "_key": "f69cc3652d1f" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "fresh_run.log", + "_key": "58e51715c191" + }, + { + "_key": "d0cbac16c5b5", + "_type": "span", + "marks": [], + "text": " as well as " + }, + { + "marks": [ + "code" + ], + "text": "resumed_run.log", + "_key": "c354bb5d9332", + "_type": "span" + }, + { + "text": " entries.", + "_key": "03f26c34617a", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "d9ce843c2f3f", + "markDefs": [] + }, + { + "style": "normal", + "_key": "19cfae7cb4bf", + "children": [ + { + "_type": "span", + "text": "", + "_key": "beb2d0c54850" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "c1caec0c9faa", + "listItem": "bullet", + "children": [ + { + "_type": "span", + "marks": [], + "text": "`fresh_run.log`", + "_key": "aa7208b0766f" + } + ] + }, + { + "_key": "694e5de3a124", + "children": [ + { + "_type": "span", + "text": "", + "_key": "af649ed23e7a" + } + ], + "_type": "block", + "style": "normal" + }, + { + "code": "$ cat ./fresh_run.log | grep 'INFO.*TaskProcessor.*cache hash' | cut -d '-' -f 3 | cut -d ';' -f 1 | sort | tee ./fresh_run.tasks.log\n\n [MULTIQC] cache hash: 167d7b39f7efdfc49b6ff773f081daef\n [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 47e8c58d92dbaafba3c2ccc4f89f53a4\n [RNASEQ:INDEX (ggal_1_48850000_49020000)] cache hash: ac8be293e1d57f3616cdd0adce34af6f\n [RNASEQ:QUANT (ggal_gut)] cache hash: d8b88e3979ff9fe4bf64b4e1bfaf4038", + "_type": "code", + "_key": "62f28de5f445" + }, + { + "_type": "block", + "style": "normal", + "_key": "3d36ce9fdf4c", + "children": [ + { + "_key": "1bc2fd42ff15", + "_type": "span", + "text": "" + } + ] + }, + { + "style": "normal", + "_key": "bb74a1f257a0", + "listItem": "bullet", + "children": [ + { + "text": "`resumed_run.log`", + "_key": "0475b4690945", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_key": "4d348beb89a7", + "children": [ + { + "_type": "span", + "text": "", + "_key": "bd96d2fd6eb3" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "1db225be98b5", + "code": "$ cat ./resumed_run.log | grep 'INFO.*TaskProcessor.*cache hash' | cut -d '-' -f 3 | cut -d ';' -f 1 | sort | tee ./resumed_run.tasks.log\n\n [MULTIQC] cache hash: d3f200c56cf00b223282f12f06ae8586\n [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 92478eeb3b0ff210ebe5a4f3d99aed2d\n [RNASEQ:INDEX (ggal_1_48850000_49020000)] cache hash: ac8be293e1d57f3616cdd0adce34af6f\n [RNASEQ:QUANT (ggal_gut)] cache hash: d8b88e3979ff9fe4bf64b4e1bfaf4038", + "_type": "code" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "6c719477fedc" + } + ], + "_type": "block", + "style": "normal", + "_key": "9212e03fe002" + }, + { + "children": [ + { + "_type": "span", + "text": "Inference from process top-level hashes", + "_key": "968be1f51aed" + } + ], + "_type": "block", + "style": "h3", + "_key": "18d7101f4b0e" + }, + { + "style": "normal", + "_key": "30c26048ff53", + "markDefs": [ + { + "_type": "link", + "href": "https://www.nextflow.io/blog/2019/demystifying-nextflow-resume.html", + "_key": "8134dfd14c04" + } + ], + "children": [ + { + "_key": "4a85567dbdc9", + 
"_type": "span", + "marks": [], + "text": "Computing a hash is a multi-step process and various factors contribute to it such as the inputs of the process, platform, time-stamps of the input files and more ( as explained in " + }, + { + "marks": [ + "8134dfd14c04" + ], + "text": "Demystifying Nextflow resume", + "_key": "377b1bbe58af", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " blog post) . The change we made in the task level CPUs directive and script section of the ", + "_key": "808c8a6314b6" + }, + { + "text": "FASTQC", + "_key": "f43ceeb13383", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_key": "68061a3b18bc", + "_type": "span", + "marks": [], + "text": " process triggered a re-computation of hashes:" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "907388760050" + } + ], + "_type": "block", + "style": "normal", + "_key": "99c31fd8f86b" + }, + { + "code": "--- ./fresh_run.tasks.log\n+++ ./resumed_run.tasks.log\n@@ -1,4 +1,4 @@\n- [MULTIQC] cache hash: dccabcd012ad86e1a2668e866c120534\n- [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 94be8c84f4bed57252985e6813bec401\n+ [MULTIQC] cache hash: c5a63560338596282682cc04ff97e436\n+ [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 54aa712db7c8248e7f31d5fb6535ff9d\n [RNASEQ:INDEX (ggal_1_48850000_49020000)] cache hash: 356aaa7524fb071f258480ba07c67b3c\n [RNASEQ:QUANT (ggal_gut)] cache hash: 169ced0fc4b047eaf91cd31620b22540\n", + "_type": "code", + "_key": "eff5fa5afc7e" + }, + { + "_type": "block", + "style": "normal", + "_key": "4c909f9db6ce", + "children": [ + { + "_key": "6000246ae41a", + "_type": "span", + "text": "" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "62aed64f89ff", + "markDefs": [], + "children": [ + { + "text": "Even though we only introduced changes in ", + "_key": "cadb7f0ce394", + "_type": "span", + "marks": [] + }, + { + "text": "FASTQC", + "_key": "f07dc612f194", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": ", the ", + "_key": "75cde0c19505" + }, + { + "marks": [ + "code" + ], + "text": "MULTIQC", + "_key": "e060d8f62816", + "_type": "span" + }, + { + "text": " process was re-run since it relies upon the output of the ", + "_key": "dc3ba08223c0", + "_type": "span", + "marks": [] + }, + { + "text": "FASTQC", + "_key": "d3c543a71894", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " process. 
Any task that has its cache hash invalidated triggers a rerun of all downstream steps:", + "_key": "0592006acb23" + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "8960868b9f60" + } + ], + "_type": "block", + "style": "normal", + "_key": "e1117e993450" + }, + { + "alt": "rnaseq-nf after modification", + "_key": "80b27b4ce745", + "asset": { + "_ref": "image-88ad0b925166304c5d29e44a7fdfbaa0994d6ff9-732x503-png", + "_type": "reference" + }, + "_type": "image" + }, + { + "_key": "12ce673df236", + "children": [ + { + "text": "", + "_key": "7b7a35207bf4", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "h3", + "_key": "dfe46bbe6047", + "children": [ + { + "_type": "span", + "text": "Understanding why `FASTQC` was re-run", + "_key": "d6340d4dae8e" + } + ] + }, + { + "_key": "60bf5abdd72f", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "We can see the full list of ", + "_key": "5e4c23c4219f" + }, + { + "marks": [ + "code" + ], + "text": "FASTQC", + "_key": "5e762db70882", + "_type": "span" + }, + { + "_key": "bfeb7d672cbb", + "_type": "span", + "marks": [], + "text": " process hashes within the " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "fresh_run.log", + "_key": "28e1f4a3c15c" + }, + { + "marks": [], + "text": " file", + "_key": "0c93966f7bf2", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "4343feff844a", + "children": [ + { + "_type": "span", + "text": "", + "_key": "bf51333867a9" + } + ], + "_type": "block" + }, + { + "code": "\n[...truncated…]\nNov-03 20:19:13.827 [Actor Thread 6] INFO nextflow.processor.TaskProcessor - [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 54aa712db7c8248e7f31d5fb6535ff9d; mode: STANDARD; entries:\n 1a0e496fef579b22998f099981b494f9 [java.util.UUID] a11bf24f-638a-42d6-8b50-48d3be637d54\n 195c7faea83c75f2340eb710d8486d2a [java.lang.String] RNASEQ:FASTQC\n 2bea0eee5e384bd6082a173772e939eb [java.lang.String] \"\"\"\n fastqc.sh \"$sample_id\" \"$reads\" -t ${task.cpus}\n \"\"\"\n\n 8e58c0cec3bde124d5d932c7f1579395 [java.lang.String] quay.io/nextflow/rnaseq-nf:v1.1\n 7ec7cbd71ff757f5fcdbaa760c9ce6de [java.lang.String] sample_id\n 16b4905b1545252eb7cbfe7b2a20d03d [java.lang.String] ggal_gut\n 553096c532e666fb42214fdf0520fe4a [java.lang.String] reads\n 6a5d50e32fdb3261e3700a30ad257ff9 [nextflow.util.ArrayBag] [FileHolder(sourceObj:/home/abhinav/rnaseq-nf/data/ggal/ggal_gut_1.fq, storePath:/home/abhinav/rnaseq-nf/data/ggal/ggal_gut_1.fq, stageName:ggal_gut_1.fq), FileHolder(sourceObj:/home/abhinav/rnaseq-nf/data/ggal/ggal_gut_2.fq, storePath:/home/abhinav/rnaseq-nf/data/ggal/ggal_gut_2.fq, stageName:ggal_gut_2.fq)]\n 4f9d4b0d22865056c37fb6d9c2a04a67 [java.lang.String] $\n 16fe7483905cce7a85670e43e4678877 [java.lang.Boolean] true\n 80a8708c1f85f9e53796b84bd83471d3 [java.util.HashMap$EntrySet] [task.cpus=2]\n f46c56757169dad5c65708a8f892f414 [sun.nio.fs.UnixPath] /home/abhinav/rnaseq-nf/bin/fastqc.sh\n[...truncated…]\n", + "_type": "code", + "_key": "59b0a5296248" + }, + { + "children": [ + { + "_key": "7d25680a03a7", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "b60d849573cd" + }, + { + "_key": "0a4c735fc332", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "When we isolate and compare the log entries for ", + "_key": "f7bd595d9c33" + }, + { + "_type": "span", + "marks": [ + "code" + ], + 
"text": "FASTQC", + "_key": "1dc02cb2a819" + }, + { + "_type": "span", + "marks": [], + "text": " between ", + "_key": "f4e9bb1da943" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "fresh_run.log", + "_key": "9a094d9c7082" + }, + { + "marks": [], + "text": " and ", + "_key": "441d8df01e76", + "_type": "span" + }, + { + "_key": "29b8db7190c8", + "_type": "span", + "marks": [ + "code" + ], + "text": "resumed_run.log" + }, + { + "text": ", we see the following diff:", + "_key": "0935945838f8", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "20c61c59625b", + "children": [ + { + "text": "", + "_key": "72be07ca666f", + "_type": "span" + } + ] + }, + { + "code": "--- ./fresh_run.fastqc.log\n+++ ./resumed_run.fastqc.log\n@@ -1,8 +1,8 @@\n-INFO nextflow.processor.TaskProcessor - [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 94be8c84f4bed57252985e6813bec401; mode: STANDARD; entries:\n+INFO nextflow.processor.TaskProcessor - [RNASEQ:FASTQC (FASTQC on ggal_gut)] cache hash: 54aa712db7c8248e7f31d5fb6535ff9d; mode: STANDARD; entries:\n 1a0e496fef579b22998f099981b494f9 [java.util.UUID] a11bf24f-638a-42d6-8b50-48d3be637d54\n 195c7faea83c75f2340eb710d8486d2a [java.lang.String] RNASEQ:FASTQC\n- 43e5a23fc27129f92a6c010823d8909b [java.lang.String] \"\"\"\n- fastqc.sh \"$sample_id\" \"$reads\"\n+ 2bea0eee5e384bd6082a173772e939eb [java.lang.String] \"\"\"\n+ fastqc.sh \"$sample_id\" \"$reads\" -t ${task.cpus}\n", + "_type": "code", + "_key": "c22a71af5f4b" + }, + { + "_type": "block", + "style": "normal", + "_key": "dcf5557d908c", + "children": [ + { + "_type": "span", + "text": "", + "_key": "45e6d4fe9377" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "351a3a99ba4a", + "markDefs": [], + "children": [ + { + "_key": "1cde28d3c278", + "_type": "span", + "marks": [], + "text": "Observations from the diff:" + } + ] + }, + { + "style": "normal", + "_key": "aa22c07074fb", + "children": [ + { + "text": "", + "_key": "3248c000c177", + "_type": "span" + } + ], + "_type": "block" + }, + { + "listItem": "bullet", + "children": [ + { + "_key": "efec6b5e8c12", + "_type": "span", + "marks": [], + "text": "We can see that the content of the script has changed, highlighting the new `$task.cpus` part of the command." 
+ }, + { + "text": "There is a new entry in the `resumed_run.log` showing that the content of the process level directive `cpus` has been added.", + "_key": "58866ff5d0c0", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "5cd1bda1a4ca" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "9b6667d1dd51" + } + ], + "_type": "block", + "style": "normal", + "_key": "3a8af4c40e17" + }, + { + "style": "normal", + "_key": "18e1e0f2a49c", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In other words, the diff from log files is confirming our edits.", + "_key": "5c4cfcb84d53" + } + ], + "_type": "block" + }, + { + "_key": "47d414d946ac", + "children": [ + { + "_type": "span", + "text": "", + "_key": "f8f7b4eb1f36" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "46f1aec08d07", + "children": [ + { + "text": "Understanding why `MULTIQC` was re-run", + "_key": "aca9b35068f5", + "_type": "span" + } + ], + "_type": "block", + "style": "h3" + }, + { + "_key": "da66f4fed079", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Now, we apply the same analysis technique for the ", + "_key": "503012eaa399" + }, + { + "marks": [ + "code" + ], + "text": "MULTIQC", + "_key": "1d49ab1d5509", + "_type": "span" + }, + { + "text": " process in both log files:", + "_key": "6f94f230701c", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "d9b21665048d", + "children": [ + { + "text": "", + "_key": "76d55f7f8db6", + "_type": "span" + } + ] + }, + { + "_key": "8a5be14e4e72", + "code": "--- ./fresh_run.multiqc.log\n+++ ./resumed_run.multiqc.log\n@@ -1,4 +1,4 @@\n-INFO nextflow.processor.TaskProcessor - [MULTIQC] cache hash: dccabcd012ad86e1a2668e866c120534; mode: STANDARD; entries:\n+INFO nextflow.processor.TaskProcessor - [MULTIQC] cache hash: c5a63560338596282682cc04ff97e436; mode: STANDARD; entries:\n 1a0e496fef579b22998f099981b494f9 [java.util.UUID] a11bf24f-638a-42d6-8b50-48d3be637d54\n cd584abbdbee0d2cfc4361ee2a3fd44b [java.lang.String] MULTIQC\n 56bfc44d4ed5c943f30ec98b22904eec [java.lang.String] \"\"\"\n@@ -9,8 +9,9 @@\n\n 8e58c0cec3bde124d5d932c7f1579395 [java.lang.String] quay.io/nextflow/rnaseq-nf:v1.1\n 14ca61f10a641915b8c71066de5892e1 [java.lang.String] *\n- cd0e6f1a382f11f25d5cef85bd87c3f4 [nextflow.util.ArrayBag] [FileHolder(sourceObj:/home/abhinav/rnaseq-nf/work/03/23372f156e80deb4d7183c5f509274/ggal_gut, storePath:/home/abhinav/rnaseq-nf/work/03/23372f156e80deb4d7183c5f509274/ggal_gut, stageName:ggal_gut), FileHolder(sourceObj:/home/abhinav/rnaseq-nf/work/25/433b23af9e98294becade95db6bd76/fastqc_ggal_gut_logs, storePath:/home/abhinav/rnaseq-nf/work/25/433b23af9e98294becade95db6bd76/fastqc_ggal_gut_logs, stageName:fastqc_ggal_gut_logs)]\n+ 18966b473f7bdb07f4f7f4c8445be1f5 [nextflow.util.ArrayBag] [FileHolder(sourceObj:/home/abhinav/rnaseq-nf/work/03/23372f156e80deb4d7183c5f509274/ggal_gut, storePath:/home/abhinav/rnaseq-nf/work/03/23372f156e80deb4d7183c5f509274/ggal_gut, stageName:ggal_gut), FileHolder(sourceObj:/home/abhinav/rnaseq-nf/work/55/15b60995682daf79ecb64bcbb8e44e/fastqc_ggal_gut_logs, storePath:/home/abhinav/rnaseq-nf/work/55/15b60995682daf79ecb64bcbb8e44e/fastqc_ggal_gut_logs, stageName:fastqc_ggal_gut_logs)]\n d271b8ef022bbb0126423bf5796c9440 [java.lang.String] config\n 5a07367a32cd1696f0f0054ee1f60e8b [nextflow.util.ArrayBag] 
[FileHolder(sourceObj:/home/abhinav/rnaseq-nf/multiqc, storePath:/home/abhinav/rnaseq-nf/multiqc, stageName:multiqc)]\n 4f9d4b0d22865056c37fb6d9c2a04a67 [java.lang.String] $\n 16fe7483905cce7a85670e43e4678877 [java.lang.Boolean] true", + "_type": "code" + }, + { + "_key": "c09212836c96", + "children": [ + { + "_type": "span", + "text": "", + "_key": "8ee886a40043" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Here, the highlighted diffs show the directory of the input files, changing as a result of ", + "_key": "358b7f8c6954", + "_type": "span" + }, + { + "text": "FASTQC", + "_key": "cdc51f83a135", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_key": "522f607948f8", + "_type": "span", + "marks": [], + "text": " being re-run; as a result " + }, + { + "text": "MULTIQC", + "_key": "e2734525cee4", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " has a new hash and has to be re-run as well.", + "_key": "ae996cbfa762" + } + ], + "_type": "block", + "style": "normal", + "_key": "09ba8f476164" + }, + { + "_key": "f1a7396f61c1", + "children": [ + { + "_type": "span", + "text": "", + "_key": "17edf9feb08e" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "3201b6569b94", + "_type": "span", + "text": "Conclusion" + } + ], + "_type": "block", + "style": "h2", + "_key": "09686e2723a6" + }, + { + "_type": "block", + "style": "normal", + "_key": "3bb74e45edbd", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Debugging the caching behavior of a pipeline can be tricky, however a systematic analysis can help to uncover what is causing a particular process to be re-run.", + "_key": "b6f03be3ca6a", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "76a74ca42b28", + "children": [ + { + "_key": "75cae1ccedc3", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "4827c0c18b8c", + "markDefs": [], + "children": [ + { + "text": "When analyzing large datasets, it may be worth using the ", + "_key": "a5362fa12b2e", + "_type": "span", + "marks": [] + }, + { + "text": "-dump-hashes", + "_key": "336ee9e0c0e9", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " option by default for all pipeline runs, avoiding needing to run the pipeline again to obtain the hashes in the log file in case of problems.", + "_key": "d0684f4eeda7" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "e6f16df28be6", + "children": [ + { + "_type": "span", + "text": "", + "_key": "4e0ae25fdcbc" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "While this process works, it is not trivial. We would love to see some community-driven tooling for a better cache-debugging experience for Nextflow, perhaps an ", + "_key": "5ba5dd581415", + "_type": "span", + "marks": [] + }, + { + "text": "nf-cache", + "_key": "f81d6f3cfcae", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " plugin? 
Stay tuned for an upcoming blog post describing how to extend and add new functionality to Nextflow using plugins.", + "_key": "bc95329f11fd" + } + ], + "_type": "block", + "style": "normal", + "_key": "7d26c81f4111" + } + ], + "tags": [ + { + "_key": "13481c55fbfe", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference" + } + ], + "_rev": "Ot9x7kyGeH5005E3MJ9aJ2", + "_createdAt": "2024-09-25T14:16:26Z", + "_type": "blogPost", + "title": "Analyzing caching behavior of pipelines", + "publishedAt": "2022-11-10T07:00:00.000Z", + "_id": "96dd279d9303", + "_updatedAt": "2024-09-26T09:03:01Z", + "meta": { + "slug": { + "current": "caching-behavior-analysis" + } + }, + "author": { + "_ref": "5bLgfCKN00diCN0ijmWNOF", + "_type": "reference" + } + }, + { + "meta": { + "slug": { + "current": "addressing-bioinformatics-core-challenges" + } + }, + "_updatedAt": "2024-09-25T14:17:44Z", + "publishedAt": "2024-09-11T06:00:00.000Z", + "_createdAt": "2024-09-25T14:17:44Z", + "_rev": "mvya9zzDXWakVjnX4hhani", + "author": { + "_ref": "5bLgfCKN00diCN0ijmWNzV", + "_type": "reference" + }, + "_id": "98d7cb42542d", + "title": "Addressing Bioinformatics Core Challenges with Nextflow and nf-core", + "_type": "blogPost", + "body": [ + { + "_key": "a68c07125513", + "markDefs": [], + "children": [ + { + "_key": "9937a66c6f7c", + "_type": "span", + "marks": [], + "text": "I was honored to be invited to the ISMB 2024 congress to speak at the session organised by the COSI (Community of Special Interest) of Bioinformatics Cores. This session brought together bioinformatics professionals from around the world who manage bioinformatics facilities in different institutions to share experiences, discuss challenges, and explore solutions for managing and analyzing large-scale biological data. In this session, I had the opportunity to introduce Nextflow, and discuss how its adoption can help bioinformatics cores to address some of the most common challenges they face. From managing complex pipelines to optimizing resource utilization, Nextflow offers a range of benefits that can streamline workflows and improve productivity. In this blog, I'll summarize my talk and share insights on how Nextflow can help overcome some of those challenges, including meeting the needs of a wide range of users or customers, automate reporting, customising pipelines and training." 
+ } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "d765bfaeb31e" + } + ], + "_type": "block", + "style": "normal", + "_key": "f42e3ef9b96c" + }, + { + "style": "h3", + "_key": "3a1d4760bd41", + "children": [ + { + "_type": "span", + "text": "Challenge 1: running multiple services", + "_key": "2e0bf5dbcdc5" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "df11910dc792", + "_type": "span", + "marks": [ + "em" + ], + "text": "Challenge description: “I have a wide range of stakeholders, and my pipelines need to address different needs in multiple scientific domains”" + } + ], + "_type": "block", + "style": "normal", + "_key": "35e54a9bb003" + }, + { + "_type": "block", + "style": "normal", + "_key": "bfc3317714fb", + "children": [ + { + "text": "", + "_key": "275f34944a37", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "c2d57b583713", + "markDefs": [], + "children": [ + { + "_key": "4f3ce74ac3d3", + "_type": "span", + "marks": [], + "text": "One of the biggest challenges faced by bioinformatics cores is catering to a diverse range of users with varying applications. On one hand, one might need to run analyses for researchers focused on cancer or human genetics. On the other hand, one may also need to support scientists working with mass spectrometry or metagenomics. Fortunately, the nf-core community has made it relatively easy to tackle these diverse needs with their curated pipelines. These pipelines are ready to use, covering a broad spectrum of applications, from genomics and metagenomics to immunology and mass spectrometry. In one of my slides I showed a non-exhaustive list, which spans genomics, metagenomics, immunology, mass spec, and more: one can find best-practice pipelines for almost any bioinformatics application imaginable, including emerging areas like imaging and spatial-omics. By leveraging this framework, one can not only tap into the expertise of the pipeline developers but also engage with them to discuss specific needs and requirements. This collaborative approach can significantly ease the deployment of a workflow, allowing the user to focus on high-priority tasks while ensuring that the analyses are always up to date and aligned with current best practices." + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "a125c88f7086", + "children": [ + { + "_type": "span", + "text": "", + "_key": "53153364f048" + } + ] + }, + { + "children": [ + { + "_key": "643726f033af", + "_type": "span", + "text": "Challenge 2: customising applications" + } + ], + "_type": "block", + "style": "h3", + "_key": "c810f2bededf" + }, + { + "style": "normal", + "_key": "54c175c31e1c", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "em" + ], + "text": "Challenge description: “We often need to customise our applications and pipeline, to meet specific in-house needs of our users”", + "_key": "5e1e74fa9c1e" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "6660ff183cc1", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "945f15c61323" + }, + { + "_type": "block", + "style": "normal", + "_key": "3dcc4327d6d0", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "While ready-to-use applications are a huge advantage, there are times when customisation is necessary. 
Perhaps the standard pipeline that works for most users doesn't quite meet the specific needs of a facilities user or customer. Fortunately, the nf-core community has got these cases covered. With over 1,300 modules at everyone’s disposal, one can easily compose their own pipeline using the nf-core components and tooling. Should that not be enough though, one can even create a pipeline from scratch using nf-core tools. For instance, one can run a simple command like “nf-core create” followed by the name of the pipeline, and voilà! The software package will create a complete skeleton for the pipeline, filled with pre-compiled code and placeholders to ease customisation. This process is incredibly quick, as I demonstrated in a video clip during the talk, where a pipeline skeleton was created in just a few moments.", + "_key": "cbc4ec38cd6c" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "496a6399a5b7", + "children": [ + { + "_type": "span", + "text": "", + "_key": "fa651bc0d088" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Of course, customisation isn't limited to pipelines. It also applies to containers, which are a crucial enabler of portability. When it comes to containers, Nextflow users have two options: an easy way and a more advanced approach. The easy way involves using Seqera Containers, a platform that allows anyone to compose a container using tools from bioconda, pypi, and conda-forge. No need for logging in, just select the tools, and the URL of your container will be made available in no time. One can build containers for either Docker or Singularity, and for different platforms (amd64 or arm64).", + "_key": "48e132ee96bc", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "a5b4255ead3a" + }, + { + "_type": "block", + "style": "normal", + "_key": "752d3e5c776a", + "children": [ + { + "_key": "26ad1ca68b71", + "_type": "span", + "text": "" + } + ] + }, + { + "style": "normal", + "_key": "a8e01bd63ee9", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "If one is looking for more control, they can use Wave as a command line. This is a powerful tool that can act as an intermediary between the user and a container registry. Wave builds containers on the fly, allowing anyone to pass a wave build command as an evaluation inside a docker run command. It's incredibly fast, and builds containers from conda packages in a matter of seconds. Wave, which is also the engine behind Seqera Containers, can be extremely handy to allow other operations like container augmentation. This feature enables a user to add new layers to existing containers without having to rebuild them, thanks to Docker's layer-based architecture. 
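As a rough sketch of both modes (this assumes the wave-cli client is installed; exact option names can differ between releases, so treat the flags as indicative):

```bash
# build a container from Conda packages and print its URL
wave --conda-package samtools=1.19 --conda-package bcftools=1.19

# augment an existing image with a local folder of scripts as an extra layer
wave -i ubuntu:22.04 --layer ./my-scripts
```
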
One can simply create a folder where configuration files or executable scripts are located, pass the folder to Wave, which will add it as a new layer, and get the URL of the augmented container on the fly.",
            "_key": "10c1fc1da31d"
          }
        ],
        "_type": "block"
      },
      {
        "children": [
          {
            "_key": "f404f902c8b2",
            "_type": "span",
            "text": ""
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "5b81e33af28d"
      },
      {
        "children": [
          {
            "_key": "802f20e866e8",
            "_type": "span",
            "text": "Challenge 3: Reporting"
          }
        ],
        "_type": "block",
        "style": "h3",
        "_key": "b581f677fbd5"
      },
      {
        "children": [
          {
            "_key": "399773ca629f",
            "_type": "span",
            "marks": [
              "em"
            ],
            "text": "Challenge description: “I need to deliver a clear report of the analysis results, in a format that is accessible and can be used for publication purposes by my users”"
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "f777c23dbbbc",
        "markDefs": []
      },
      {
        "children": [
          {
            "_type": "span",
            "text": "",
            "_key": "9679c0d0e7fc"
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "c30d698af275"
      },
      {
        "children": [
          {
            "text": "Reporting is a crucial aspect of any bioinformatics pipeline, and, as with customisation, Nextflow offers different ways to approach it, suitable for different levels of expertise. The most straightforward solution involves running MultiQC, a tool that collects the output and logs of a wide range of software in a pipeline and generates a nicely formatted HTML report. This is a great option if one wants a quick and easy way to get a summary of their pipeline's results. MultiQC is a widely used tool that supports a huge list (and growing) of bioinformatics tools and file formats, making it a great choice for many use cases.",
            "_key": "00dbb93826d4",
            "_type": "span",
            "marks": []
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "f883ffe7d9a5",
        "markDefs": []
      },
      {
        "style": "normal",
        "_key": "d90dbb2ff3b1",
        "children": [
          {
            "text": "",
            "_key": "476310aafae8",
            "_type": "span"
          }
        ],
        "_type": "block"
      },
      {
        "markDefs": [],
        "children": [
          {
            "marks": [],
            "text": "However, if the developer needs more control over the reporting process or wants to create a custom report that meets some specific needs, it is entirely possible to engineer the reports from scratch. This involves collecting the outputs from various processes in the pipeline and passing them as input to a process that runs an R Markdown or Quarto script. R Markdown and Quarto are popular tools for creating dynamic documents that can be parameterised, allowing anyone to customize the content and the layout of a report dynamically. 
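As a minimal illustration (the file and parameter names here are hypothetical), the reporting process might simply call:

```bash
# render a parameterised Quarto report from pipeline outputs
quarto render report.qmd -P samplesheet:samplesheet.csv -P results_dir:./results
```
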
By using this approach, one can create a report that is tailored to your specific needs, including the types of plots and visualizations they want to include, the formatting and layouting, branding, and anything specific one might want to highlight.", + "_key": "1e2d3a400ed0", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "1dfe19a5de5e" + }, + { + "_key": "e23c4c92a2e9", + "children": [ + { + "_key": "a034447ccb05", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "To follow this approach, the user can either create their own customised module, or re-use one of the available notebooks modules in the nf-core repository (quarto ", + "_key": "a4d25fd81cbe" + }, + { + "marks": [ + "3a9b4d9076f1" + ], + "text": "here", + "_key": "4a58571df9e8", + "_type": "span" + }, + { + "_key": "4a8d07cd0f0b", + "_type": "span", + "marks": [], + "text": ", or jupyter " + }, + { + "marks": [ + "95b8c542b6da" + ], + "text": "here", + "_key": "9981d7def1b8", + "_type": "span" + }, + { + "text": ").", + "_key": "bd31bbbc93f5", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "4352e3b5ef1e", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nf-core/modules/tree/master/modules/nf-core/quartonotebook", + "_key": "3a9b4d9076f1" + }, + { + "_type": "link", + "href": "https://github.com/nf-core/modules/tree/master/modules/nf-core/jupyternotebook", + "_key": "95b8c542b6da" + } + ] + }, + { + "_key": "75136718b3b5", + "children": [ + { + "text": "", + "_key": "d751122ad9a9", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "h3", + "_key": "ffac27e7ede5", + "children": [ + { + "text": "Challenge 4: Monitoring", + "_key": "6203ea4d4856", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "7c8d0f0063d9", + "markDefs": [], + "children": [ + { + "marks": [ + "em" + ], + "text": "Challenge description: “I need to be able to estimate and optimise runtimes as well as costs of my pipelines, fitting our cost model”", + "_key": "fccf1accd839", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "6e27826bf401", + "children": [ + { + "_type": "span", + "text": "", + "_key": "b3c50e3af9d8" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "3164f9d0f6c9", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Monitoring is a critical aspect of pipeline management, and Nextflow provides a robust set of tools to help you track and optimise a pipeline's performance. At its core, monitoring involves tracking the execution of the pipeline to ensure that it's running efficiently and effectively. But it's not just about knowing how long a pipeline takes to run or how much it costs - it's also about making sure each process in the pipeline is using the requested resources efficiently. With Nextflow, the user can track the resources used by each process in your pipeline, including CPU, memory, and disk usage and compare them visually with the resources requested in the pipeline configuration and reserved by each job. This information allows the user to identify bottlenecks and areas for optimisation, so one can fine-tune their pipeline for a better resource consumption. 
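The data behind these comparisons can be generated with Nextflow's built-in reporting options (the pipeline name below is a placeholder):

```bash
# produce an execution report, a timeline and a per-task trace file for a run
nextflow run my-pipeline.nf -with-report report.html -with-timeline timeline.html -with-trace trace.txt
```
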
For example, if the user notices that one process is using a disproportionate amount of memory, they can adjust the configuration to better match the actual usage.",
            "_key": "2048d002f6c8",
            "_type": "span"
          }
        ],
        "_type": "block"
      },
      {
        "children": [
          {
            "_key": "004e1db61340",
            "_type": "span",
            "text": ""
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "bceac9303b39"
      },
      {
        "_key": "c2f1515e006c",
        "markDefs": [],
        "children": [
          {
            "_type": "span",
            "marks": [],
            "text": "But monitoring isn't just about optimising a pipeline's performance - it's also about reducing the environmental impact where possible. A recently developed Nextflow plugin makes it possible to track the carbon footprint of a pipeline, including the energy consumption and greenhouse gas emissions associated with running that pipeline. This information allows one to make informed decisions about their environmental impact, gaining better awareness and even adopting greener approaches to computing.",
            "_key": "ba2d91fb488f"
          }
        ],
        "_type": "block",
        "style": "normal"
      },
      {
        "_type": "block",
        "style": "normal",
        "_key": "c6905075b83c",
        "children": [
          {
            "_type": "span",
            "text": "",
            "_key": "8fd55c6b98d9"
          }
        ]
      },
      {
        "markDefs": [],
        "children": [
          {
            "text": "One of the key benefits of Nextflow’s monitoring system is its flexibility. The user can either use the built-in HTML reports for trace and pipeline execution, or monitor a run live by connecting to the Seqera Platform and visualising its progress on a graphical interface in real time. More expert or creative users can use the trace file produced by a Nextflow execution to create their own metrics and visualisations.",
            "_key": "3d83104383ca",
            "_type": "span",
            "marks": []
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "98e7f20f2897"
      },
      {
        "_key": "18c318c2de0d",
        "children": [
          {
            "text": "",
            "_key": "bac3a32afd6e",
            "_type": "span"
          }
        ],
        "_type": "block",
        "style": "normal"
      },
      {
        "style": "h3",
        "_key": "d7b573462beb",
        "children": [
          {
            "_type": "span",
            "text": "Challenge 5: User accessibility",
            "_key": "15b53f7ac5fa"
          }
        ],
        "_type": "block"
      },
      {
        "style": "normal",
        "_key": "f00185c2a71f",
        "markDefs": [],
        "children": [
          {
            "_type": "span",
            "marks": [
              "em"
            ],
            "text": "Challenge description: “I could balance workloads better, by giving users a certain level of autonomy in running some of my pipelines”",
            "_key": "2e2afa95be08"
          }
        ],
        "_type": "block"
      },
      {
        "children": [
          {
            "_key": "d0200ff55e60",
            "_type": "span",
            "text": ""
          }
        ],
        "_type": "block",
        "style": "normal",
        "_key": "6dcce420d8e9"
      },
      {
        "markDefs": [],
        "children": [
          {
            "marks": [],
            "text": "User accessibility is a crucial aspect of pipeline development, as it enables users with varying levels of bioinformatics experience to run complex pipelines with ease. One of the advantages of Nextflow is that a developer can create pipelines that are not only robust and efficient but also user-friendly. Allowing users to run them with a certain level of autonomy can be a good strategy for a bioinformatics core to decentralise straightforward analyses and invest human resources in more complex projects. 
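One practical way to offer that autonomy is the interactive launcher included in nf-core tools, which builds a complete run command from a pipeline's parameter schema (the pipeline name is illustrative, and in recent releases of the tooling the sub-command is nf-core pipelines launch):

```bash
# guided, schema-driven launch that prompts for the required parameters
nf-core launch nf-core/rnaseq
```
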
Empowering a facility’s users to run specific pipelines independently could be a solution to reduce certain workloads.", + "_key": "6df8a93c4e19", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "8c71a5c532bd" + }, + { + "_type": "block", + "style": "normal", + "_key": "93ef800d8024", + "children": [ + { + "_type": "span", + "text": "", + "_key": "b2978cf3e4d0" + } + ] + }, + { + "_key": "d221fee7f422", + "markDefs": [], + "children": [ + { + "_key": "eec80a143544", + "_type": "span", + "marks": [], + "text": "The nf-core template includes a parameters schema, which is captured by the nf-core website to create a graphical interface for parameters configuration of the pipelines hosted under the nf-core organisation on GitHub. This interface allows users to fill in the necessary fields for parameters needed to run a pipeline, and allows even users with minimal experience with bioinformatics or command-line interfaces to quickly set up a run. The user can then simply copy and paste the command generated by the webpage into a terminal, and the pipeline will launch as configured. This approach is ideal for users who are familiar with basic computer tasks, and have a very minimal familiarity with a terminal." + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "1d4057f60c7a", + "children": [ + { + "_key": "f8fbb8b98822", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "text": "However, for users with even less bioinformatics experience, Nextflow and the nf-core template together offer an even more intuitive solution. The pipeline can be added to the launcher of the Seqera Platform, and one can provide users with a comprehensive and user-friendly interface that allows them to launch pipelines with ease. This platform offers a range of features, including access to datasets created from sample sheets, the ability to launch pipelines on a wide range of cloud environments as well as on HPC on-premise. A simple graphical interface simplifies the entire process.The Seqera Platform provides in this way a seamless and intuitive experience for users, allowing them to run pipelines without requiring extensive bioinformatics knowledge.", + "_key": "ae589922fc21", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "b2f8f305d238" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "e32e7da8c093" + } + ], + "_type": "block", + "style": "normal", + "_key": "b9d92bbcc7f2" + }, + { + "children": [ + { + "text": "Challenge 6: Training", + "_key": "d5ef6e79e1b7", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "22eecfe0a3d4" + }, + { + "_key": "2dc993b739f2", + "markDefs": [], + "children": [ + { + "marks": [ + "em" + ], + "text": "Challenge description: “Training my team and especially onboarding new team members is always challenging and requires documentation and good materials”", + "_key": "9c00a004ddcb", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "343ed3b88403", + "children": [ + { + "_type": "span", + "text": "", + "_key": "dac284c2fbd1" + } + ] + }, + { + "style": "normal", + "_key": "2e59c27e815e", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The final challenge we often face in bioinformatics facilities is training. 
We all know that training is an ongoing issue, not just because of staff turnover and the need to onboard new recruits, but also because the field is constantly evolving. With new tools, techniques, and technologies emerging all the time, it can be difficult to keep up with the latest developments. However, training is crucial for ensuring that pipelines are robust, efficient, and accurate.", + "_key": "8c0ff3b2127a" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "55fdd7b895db", + "children": [ + { + "_type": "span", + "text": "", + "_key": "deca91db6398" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "9f58cbd6d298", + "markDefs": [], + "children": [ + { + "text": "Fortunately, there are now many resources available to help with training. The Nextflow training website, for example, has been completely rebuilt recently and now offers a wealth of material suitable for everyone, from beginners to experts. Whether you're just starting out with Nextflow or are already an experienced user, you'll find plenty of resources to help you improve your skills. From introductory tutorials to advanced guides, the training website has everything you need to get the most out of this workflow manager.", + "_key": "5b67076d9671", + "_type": "span", + "marks": [] + } + ] + }, + { + "_key": "17cb2d5ec9fe", + "children": [ + { + "_type": "span", + "text": "", + "_key": "56471b3fc55b" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "7580b36fd450", + "markDefs": [], + "children": [ + { + "_key": "14edd1adc8c9", + "_type": "span", + "marks": [], + "text": "Everyone can access the material at their own pace, but regular training events have been scheduled during the year. Additionally, there is now a network of Nextflow Ambassadors who often organise local training events across the world. Without making comparisons with other solutions, I can easily say that the steep learning curve to get going with Nextflow is just a myth nowadays. The quality of the training material, the examples available, the frequency of events in person or online you can attend to, and more importantly a welcoming community of users, make learning Nextflow quite easy." + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "253e73724085", + "children": [ + { + "_type": "span", + "text": "", + "_key": "ce927302c367" + } + ] + }, + { + "children": [ + { + "text": "In my laboratory, usually in a couple of months bachelor students are reasonably confident with the code and with running pipelines and debugging common issues.", + "_key": "a073a5204bc7", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "93e1ef530589", + "markDefs": [] + }, + { + "children": [ + { + "_key": "926fbd86c877", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "1f1bb2003823" + }, + { + "_key": "1705670c4b9c", + "children": [ + { + "_key": "f5224ed74073", + "_type": "span", + "text": "Conclusions" + } + ], + "_type": "block", + "style": "h3" + }, + { + "style": "normal", + "_key": "1799d8cd89a4", + "markDefs": [], + "children": [ + { + "text": "In conclusion, the presentation at ISMB has gathered quite some interest because I believe it has shown how Nextflow is a powerful and versatile tool that can help bioinformatics cores address those common challenges everyone has experienced. 
With its comprehensive tooling, extensive training materials, and active community of users, Nextflow offers a complete package that can help people streamline their workflows and improve their productivity. Although I might be biased on this, I also believe that by adopting Nextflow one also becomes part of a community of researchers and developers who are passionate about bioinformatics and committed to sharing their knowledge and expertise. Beginners not only will have access to a wealth of resources and tutorials, but more importantly to a supportive network of peers who can offer advice and guidance, and which is really fun to be part of.", + "_key": "a7c8a4c38573", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + } + ] + }, + { + "_id": "9981e1c14fb6", + "_rev": "mvya9zzDXWakVjnX4hhATe", + "title": "Optimizing Nextflow for HPC and Cloud at Scale", + "publishedAt": "2024-01-17T07:00:00.000Z", + "_createdAt": "2024-09-25T14:18:34Z", + "meta": { + "slug": { + "current": "optimizing-nextflow-for-hpc-and-cloud-at-scale" + } + }, + "_type": "blogPost", + "body": [ + { + "style": "h2", + "_key": "a5c64d3fa9db", + "children": [ + { + "_type": "span", + "text": "Introduction", + "_key": "a5b8f1416474" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "2bb5e9e8fd62", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "A Nextflow workflow run consists of the head job (Nextflow itself) and compute tasks (defined in the pipeline script). It is common to request resources for the tasks via process directives such as ", + "_key": "909b0290c131", + "_type": "span" + }, + { + "text": "cpus", + "_key": "0161ec0fb3c6", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "text": " and ", + "_key": "4bee11fa15c1", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "code" + ], + "text": "memory", + "_key": "f1f3e1ce2fa4", + "_type": "span" + }, + { + "text": ", but the Nextflow head job also requires compute resources. Most of the time, users don’t need to explicitly define the head job resources, as Nextflow generally does a good job of allocating resources for itself. 
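When Nextflow itself is submitted as a job to a scheduler, the head job simply receives whatever that submission requests; for example, a hypothetical SLURM submission might look like this:

```bash
# reserve CPUs and memory for the Nextflow head job itself (values are illustrative)
sbatch --cpus-per-task=4 --mem=8G --wrap 'nextflow run nf-core/rnaseq -profile singularity'
```
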
For very large workloads, however, head job resource sizing becomes much more important.", + "_key": "7135b36794d0", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "cb4422f5a5d3", + "children": [ + { + "text": "", + "_key": "7d4b64055220", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "31e57873a47f", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "In this article, we will help you understand how the Nextflow head job works and show you how to tune head job resources such as CPUs and memory for your use case.", + "_key": "6db0f71f0035", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "82947e2ad6b0" + } + ], + "_type": "block", + "style": "normal", + "_key": "5784a30d6d9f" + }, + { + "_type": "block", + "_key": "ff176bd459d6" + }, + { + "style": "h2", + "_key": "4a51317ad1f4", + "children": [ + { + "_key": "c9c353c148ff", + "_type": "span", + "text": "Head job resources" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "h3", + "_key": "50cb76680805", + "children": [ + { + "_key": "15c82a3c336d", + "_type": "span", + "text": "CPUs" + } + ] + }, + { + "_key": "40f8109d05ef", + "markDefs": [ + { + "href": "https://seqera.io/platform/", + "_key": "b56720d4bfda", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "Nextflow uses a thread pool to run native Groovy code (e.g. channel operators, ", + "_key": "f47d354d4e67", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "exec", + "_key": "d65319ca1ed3" + }, + { + "_type": "span", + "marks": [], + "text": " processes), submit tasks to executors, and publish output files. The number of threads is based on the number of available CPUs, so if you want to provide more compute power to the head job, simply allocate more CPUs and Nextflow will use them. In the ", + "_key": "b568ced82ea2" + }, + { + "_type": "span", + "marks": [ + "b56720d4bfda" + ], + "text": "Seqera Platform", + "_key": "c4002901c11f" + }, + { + "_key": "2543b9b214ff", + "_type": "span", + "marks": [], + "text": ", you can use " + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Head Job CPUs", + "_key": "72a92c766252" + }, + { + "_type": "span", + "marks": [], + "text": " or ", + "_key": "97e40092fba1" + }, + { + "marks": [ + "strong" + ], + "text": "Head Job submit options", + "_key": "ed5888f40fb5", + "_type": "span" + }, + { + "marks": [], + "text": " (depending on the compute environment) to allocate more CPUs.", + "_key": "2f241dbf5175", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "46eab781b660", + "children": [ + { + "_type": "span", + "text": "", + "_key": "d102bd967cd2" + } + ] + }, + { + "style": "h3", + "_key": "dd2be54b8b36", + "children": [ + { + "text": "Memory", + "_key": "3e7500c95848", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "text": "Nextflow runs on the Java Virtual Machine (JVM), so it allocates memory based on the standard JVM options, specifically the initial and maximum heap size. 
You can view the default JVM options for your environment by running this command:", + "_key": "612779f2b19a", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "0cf56eee2305" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "05ee49d6b0db" + } + ], + "_type": "block", + "style": "normal", + "_key": "f4f2db09f770" + }, + { + "_type": "code", + "_key": "206759ae7dbb", + "code": "java -XX:+PrintFlagsFinal -version | grep 'HeapSize\\|RAM'" + }, + { + "_type": "block", + "style": "normal", + "_key": "d290cebc68cb", + "children": [ + { + "_type": "span", + "text": "", + "_key": "c34ada32828a" + } + ] + }, + { + "_key": "f7dd119a548f", + "markDefs": [], + "children": [ + { + "text": "For example, here are the JVM options for an environment with 8 GB of RAM and OpenJDK Temurin 17.0.6:", + "_key": "92d710717b5a", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "c6bd8ce8e102", + "children": [ + { + "_key": "53fda06571f6", + "_type": "span", + "text": "" + } + ] + }, + { + "_type": "code", + "_key": "0941b13f8197", + "code": " size_t ErgoHeapSizeLimit = 0\n size_t HeapSizePerGCThread = 43620760\n size_t InitialHeapSize = 127926272\n uintx InitialRAMFraction = 64\n double InitialRAMPercentage = 1.562500\n size_t LargePageHeapSizeThreshold = 134217728\n size_t MaxHeapSize = 2044723200\n uint64_t MaxRAM = 137438953472\n uintx MaxRAMFraction = 4\n double MaxRAMPercentage = 25.000000\n size_t MinHeapSize = 8388608\n uintx MinRAMFraction = 2\n double MinRAMPercentage = 50.000000\n uintx NonNMethodCodeHeapSize = 5839372\n uintx NonProfiledCodeHeapSize = 122909434\n uintx ProfiledCodeHeapSize = 122909434\n size_t SoftMaxHeapSize = 2044723200" + }, + { + "_type": "block", + "style": "normal", + "_key": "a3038caf271a", + "children": [ + { + "text": "", + "_key": "6df4b5dfb3b6", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "44ca4f03eb50", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "These settings (displayed in bytes) show an initial and maximum heap size of ~128MB and ~2GB, or 1/64 (1.5625%) and 1/4 (25%) of physical memory. These percentages are the typical default settings, although different environments may have different defaults. 
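To make the arithmetic explicit for this 8 GB example:

```bash
# default initial heap is 1/64 of physical memory, default maximum heap is 1/4
echo $(( 8 * 1024 / 64 ))   # 128  (MB)
echo $(( 8 * 1024 / 4 ))    # 2048 (MB)
```
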
In the Seqera Platform, the default settings are 40% and 75%, respectively.", + "_key": "cadbe834b5ec" + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "5940acbe5b9d" + } + ], + "_type": "block", + "style": "normal", + "_key": "c397ad909c2f" + }, + { + "style": "normal", + "_key": "506f69614741", + "markDefs": [], + "children": [ + { + "_key": "d4c61e83f51a", + "_type": "span", + "marks": [], + "text": "You can set these options for Nextflow at runtime, for example:" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "22154f186a1b", + "children": [ + { + "_type": "span", + "text": "", + "_key": "93f77bc87493" + } + ] + }, + { + "code": "# absolute values\nexport NXF_JVM_ARGS=\"-Xms2g -Xmx6g\"\n\n# percentages\nexport NXF_JVM_ARGS=\"-XX:InitialRAMPercentage=25 -XX:MaxRAMPercentage=75\"", + "_type": "code", + "_key": "24ae4750b09d" + }, + { + "_key": "a6cb4cece192", + "children": [ + { + "_key": "ef134425537c", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "d795c719889c", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "If you need to provide more memory to Nextflow, you can (1) allocate more memory to the head job and/or (2) use ", + "_key": "f9ab9567a425" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "NXF_JVM_ARGS", + "_key": "7d6d352bed72" + }, + { + "text": " to increase the percentage of available memory that Nextflow can use. In the Seqera Platform, you can use ", + "_key": "9a7cb5830476", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Head Job memory", + "_key": "11b3c7f32420" + }, + { + "_type": "span", + "marks": [], + "text": " or ", + "_key": "1992635c85f7" + }, + { + "_key": "6f9c29804a03", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Head Job submit options" + }, + { + "_type": "span", + "marks": [], + "text": " (depending on the compute environment) to allocate more memory.", + "_key": "2d972b9b25fe" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "f5e6224176d0", + "children": [ + { + "_type": "span", + "text": "", + "_key": "3fb42f43936e" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "h3", + "_key": "75a2c73ab83e", + "children": [ + { + "_type": "span", + "text": "Disk", + "_key": "4b43a741dce1" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "915c15254e7e", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The Nextflow head job is generally responsible for downloading software dependencies and transferring inputs and outputs, but the details vary depending on the environment:", + "_key": "50383ed4bf06" + } + ] + }, + { + "style": "normal", + "_key": "83025c17c878", + "children": [ + { + "_key": "c699e2b11c08", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "70f9e9a22eed", + "listItem": "bullet", + "children": [ + { + "_key": "d00531d901ab", + "_type": "span", + "marks": [], + "text": "In an HPC environment, the home directory is typically used to store pipeline code and container images, while the work directory is typically stored in high-performance shared storage. Within the work directory, task inputs are staged from previous tasks via symlinks. Remote inputs (e.g. from HTTP or S3) are first staged into the work directory and then symlinked into the task directory." 
+ }, + { + "text": "In a cloud environment like AWS Batch, each task is responsible for pulling its own container image, downloading input files from the work directory (e.g. in S3), and uploading outputs. The head job’s local storage is only used to download the pipeline code.", + "_key": "3e5e59511f98", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "d996ad8483c3", + "children": [ + { + "_type": "span", + "text": "", + "_key": "39212b9ed079" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "Overall, the head job uses very little local storage, since most data is saved to shared storage (HPC) or object storage (cloud) rather than the head job itself. However, there are a few specific cases to keep in mind, which we will cover in the following section.", + "_key": "eec7f5ab77d4", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "2cc9c0128ddb", + "markDefs": [] + }, + { + "style": "normal", + "_key": "47d1135f592d", + "children": [ + { + "text": "", + "_key": "fd18eb401f4a", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "086740d10bc1", + "children": [ + { + "text": "Common failure modes", + "_key": "924dbf7f9585", + "_type": "span" + } + ], + "_type": "block", + "style": "h2" + }, + { + "style": "h3", + "_key": "4a5e9d838a96", + "children": [ + { + "_key": "92ab5fdee5ff", + "_type": "span", + "text": "Not enough CPUs for local tasks" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "0496b1f3ab2a", + "markDefs": [], + "children": [ + { + "_key": "10b2cff5a5e1", + "_type": "span", + "marks": [], + "text": "If your workflow has any tasks that use the local executor, make sure the Nextflow head job has enough CPUs to execute these tasks. For example, if a local task requires 4 CPUs, the Nextflow head job should have at least 5 CPUs (the local executor reserves 1 CPU for Nextflow by default)." + } + ] + }, + { + "style": "normal", + "_key": "ee35f9841d97", + "children": [ + { + "_type": "span", + "text": "", + "_key": "420f08190519" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "Not enough memory for native pipeline code", + "_key": "10ad71561521", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "50ec50767506" + }, + { + "_type": "block", + "style": "normal", + "_key": "81d2d9832028", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Nextflow pipelines are a combination of native Groovy code (channels, operators, ", + "_key": "69e2b067b38a", + "_type": "span" + }, + { + "marks": [ + "code" + ], + "text": "exec", + "_key": "ebf53c04e243", + "_type": "span" + }, + { + "marks": [], + "text": " processes) and embedded shell scripts (", + "_key": "555bf1aa7b64", + "_type": "span" + }, + { + "marks": [ + "code" + ], + "text": "script", + "_key": "a2249aad4784", + "_type": "span" + }, + { + "_key": "2756793f330b", + "_type": "span", + "marks": [], + "text": " processes). Native code is executed directly by the Nextflow head job, while tasks with shell scripts are delegated to executors. Typically, tasks are used to perform the “actual” computations, while channels and operators are used to pass data between tasks." 
+ } + ] + }, + { + "style": "normal", + "_key": "eca6f115442e", + "children": [ + { + "_type": "span", + "text": "", + "_key": "0b0caf537f1a" + } + ], + "_type": "block" + }, + { + "_key": "673ae764702d", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "However much Groovy code you write, keep in mind that the Nextflow head job needs to have enough memory to execute it at the desired scale. The simplest way to determine how much memory Nextflow needs is to iteratively allocate more memory to the head job until it succeeds (e.g. start with 1 GB, then 2 GB, then 4 GB, and so on). In general, 2-4 GB is more than enough memory for the Nextflow head job.", + "_key": "6ef085921d41" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "c44fe3be40c4" + } + ], + "_type": "block", + "style": "normal", + "_key": "9399588946de" + }, + { + "style": "h3", + "_key": "1dae3baf1714", + "children": [ + { + "_key": "297c02a76c1d", + "_type": "span", + "text": "Not enough memory to stage and publish files" + } + ], + "_type": "block" + }, + { + "_key": "474522c0a694", + "markDefs": [], + "children": [ + { + "text": "In Nextflow, input files can come from a variety of sources: local files, an HTTP or FTP server, an S3 bucket, etc. When an input file is not local, Nextflow automatically stages the file into the work directory. Similarly, when a ", + "_key": "2f8814c18a13", + "_type": "span", + "marks": [] + }, + { + "_key": "dc07ea4a7c49", + "_type": "span", + "marks": [ + "code" + ], + "text": "publishDir" + }, + { + "_key": "7c1721b03618", + "_type": "span", + "marks": [], + "text": " directive points to a remote path, Nextflow automatically “publishes” the output files using the correct protocol. These transfers are usually performed in-memory." + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "267876a0d551", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "fe0b1f15041c" + }, + { + "_type": "block", + "style": "normal", + "_key": "bf5a4c018f7c", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Many users have encountered head job errors when running large-scale workloads, where the head job runs out of memory while staging or publishing files. While you can try to give more and more memory to Nextflow as in the previous example, you might be able to fix your problem by simply updating your Nextflow version. There have been many improvements to Nextflow over the past few years around file staging, particularly with S3, and overall we have seen fewer out-of-memory errors of this kind.", + "_key": "0f319858764a" + } + ] + }, + { + "children": [ + { + "_key": "9bbe3136a769", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "efdb64eb076b" + }, + { + "_key": "062be5c2fda7", + "children": [ + { + "_type": "span", + "text": "Not enough disk storage to build Singularity images", + "_key": "c41c0b0c7672" + } + ], + "_type": "block", + "style": "h3" + }, + { + "style": "normal", + "_key": "47acfffe1c69", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Singularity / Apptainer can download and convert Docker images on the fly, and it uses the head job’s local scratch storage to do so. 
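For instance, a pull like the one below (the image tag and cache path are hypothetical) unpacks the OCI layers into a local cache before writing out the final SIF file:

```bash
# point the conversion cache at roomy scratch, then convert a Docker image to SIF
export APPTAINER_CACHEDIR=/scratch/$USER/apptainer-cache
singularity pull docker://quay.io/biocontainers/samtools:1.19--h50ea8bc_0
```

In other words, every Docker image gets converted locally before it can be run. 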
This is a common pattern in HPC environments, since container images are usually published as Docker images but HPC environments usually require the use of a rootless container runtime like Singularity. In this case, make sure the head job has enough scratch storage to build each image, even if the image is eventually saved to shared storage.", + "_key": "c0f15d8f7d59", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "0f01f9b08e2d", + "children": [ + { + "text": "", + "_key": "49ddc470069a", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Since Nextflow version ", + "_key": "ec77839c953c" + }, + { + "marks": [ + "91600e1adee4" + ], + "text": "23.10.0", + "_key": "b01b764bf9d2", + "_type": "span" + }, + { + "_key": "7177995d29e3", + "_type": "span", + "marks": [], + "text": ", you can use " + }, + { + "_key": "b85570f5653a", + "_type": "span", + "marks": [ + "b6d458636134" + ], + "text": "Wave" + }, + { + "_type": "span", + "marks": [], + "text": " to build Singularity images for you. Refer to the ", + "_key": "530f5f79ac20" + }, + { + "marks": [ + "3c38eeebe73a" + ], + "text": "Nextflow documentation", + "_key": "926aab76cb4f", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " for more details.", + "_key": "6fe69ab7bbb2" + } + ], + "_type": "block", + "style": "normal", + "_key": "dc5ac7f92faf", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nextflow-io/nextflow/releases/tag/v23.10.0", + "_key": "91600e1adee4" + }, + { + "_type": "link", + "href": "https://seqera.io/wave/", + "_key": "b6d458636134" + }, + { + "href": "https://nextflow.io/docs/latest/wave.html#build-singularity-native-images", + "_key": "3c38eeebe73a", + "_type": "link" + } + ] + }, + { + "_key": "390f2d3a71ee", + "children": [ + { + "text": "", + "_key": "604b5e1d5ac9", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Additionally, Nextflow version ", + "_key": "0295f01d65a1" + }, + { + "_type": "span", + "marks": [ + "b67a5de8514b" + ], + "text": "23.11.0-edge", + "_key": "8ec473d2b159" + }, + { + "_type": "span", + "marks": [], + "text": " introduced support for ", + "_key": "331062ce894c" + }, + { + "marks": [ + "c57bf8e26eb6" + ], + "text": "Singularity OCI mode", + "_key": "93220de4286a", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ", which allows Singularity / Apptainer to use the OCI container format (the same as Docker) instead of having to build and store a SIF container image locally.", + "_key": "eb819ff89d61" + } + ], + "_type": "block", + "style": "normal", + "_key": "1a4fe8c72166", + "markDefs": [ + { + "href": "https://github.com/nextflow-io/nextflow/releases/tag/v23.11.0-edge", + "_key": "b67a5de8514b", + "_type": "link" + }, + { + "_type": "link", + "href": "https://docs.sylabs.io/guides/3.1/user-guide/oci_runtime.html", + "_key": "c57bf8e26eb6" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "cdc4551d1ac1", + "children": [ + { + "_type": "span", + "text": "", + "_key": "521b64998b9f" + } + ] + }, + { + "_type": "block", + "style": "h3", + "_key": "9b18afe1a9ed", + "children": [ + { + "_type": "span", + "text": "Failures due to head job and tasks sharing local storage", + "_key": "9665a7dc3b1a" + } + ] + }, + { + "_key": "57efae4ffefb", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": 
"There are some situations where the head job and tasks may run on the same node and thereby share the node’s local storage, for example, Kubernetes. If this storage becomes full, any one of the jobs might fail first, including the head job. You can avoid this problem by segregating the head job to its own node, or explicitly requesting disk storage for each task so that they each have sufficient storage.", + "_key": "f9c6bbc951e6" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "e09273d28413", + "children": [ + { + "_type": "span", + "text": "", + "_key": "f619b3a1fa02" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "h2", + "_key": "ced32e5402ea", + "children": [ + { + "_type": "span", + "text": "Virtual threads", + "_key": "a830db21b584" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [ + "ab3df3b2e063" + ], + "text": "Virtual threads", + "_key": "70eb3bd1cf93" + }, + { + "_type": "span", + "marks": [], + "text": " were introduced in Java 19 and finalized in Java 21. Whereas threads in Java are normally “platform” threads managed by the operating system, “virtual” threads are user-space threads that share a pool of platform threads. Virtual threads use less memory and can be context-switched faster than platform threads, so an application that uses a fixed-size pool of platform threads (e.g. one thread per CPU) could instead have thousands of virtual threads (one thread per “task”) with the same memory footprint and more flexibility – if a virtual thread is blocked (i.e. waiting on I/O), the underlying platform thread can be switched to another virtual thread that isn’t blocked.", + "_key": "274baf980833" + } + ], + "_type": "block", + "style": "normal", + "_key": "9a88c328c4c8", + "markDefs": [ + { + "_type": "link", + "href": "https://www.infoq.com/articles/java-virtual-threads/", + "_key": "ab3df3b2e063" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "cc7bdb3e3849", + "children": [ + { + "_type": "span", + "text": "", + "_key": "fd63a51f2189" + } + ] + }, + { + "_key": "790999b969e1", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nextflow-io/nextflow/releases/tag/v23.05.0-edge", + "_key": "212bd7bb36df" + }, + { + "_type": "link", + "href": "https://github.com/nextflow-io/nextflow/releases/tag/v23.10.0", + "_key": "31beb3773cf3" + } + ], + "children": [ + { + "marks": [], + "text": "Since Nextflow ", + "_key": "ca456f2186b7", + "_type": "span" + }, + { + "text": "23.05.0-edge", + "_key": "f07fc159a2e0", + "_type": "span", + "marks": [ + "212bd7bb36df" + ] + }, + { + "text": ", you can enable virtual threads by using Java 19 or later and setting the ", + "_key": "067f16d5e1b8", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "NXF_ENABLE_VIRTUAL_THREADS", + "_key": "5db9bb2a7596" + }, + { + "text": " environment variable to ", + "_key": "d528b31d48c9", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "true", + "_key": "25b5b5e548c7" + }, + { + "marks": [], + "text": ". 
Since version ", + "_key": "b011715e324f", + "_type": "span" + }, + { + "text": "23.10.0", + "_key": "92c9d1d9d7b2", + "_type": "span", + "marks": [ + "31beb3773cf3" + ] + }, + { + "_type": "span", + "marks": [], + "text": ", when using Java 21, virtual threads are enabled by default.", + "_key": "dce504ec17b3" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "a2919d119ac2", + "children": [ + { + "_key": "c4d9b427b9b0", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "6f7f37319034", + "children": [ + { + "text": "Initial Benchmark: S3 Upload", + "_key": "15cccecb3f76", + "_type": "span" + } + ], + "_type": "block", + "style": "h3" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Virtual threads are particularly useful when there are many I/O-bound tasks, such as uploading many files to S3. So to demonstrate this benefit, we wrote a pipeline… that uploads many files to S3! Here is the core pipeline code:", + "_key": "989da7571988", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "f010b356ebd9" + }, + { + "_type": "block", + "style": "normal", + "_key": "002fb61a1612", + "children": [ + { + "_type": "span", + "text": "", + "_key": "04cef866d2c1" + } + ] + }, + { + "_key": "f0725e37f52a", + "code": "params.upload_count = 1000\nparams.upload_size = '10M'\n\nprocess make_random_file {\n publishDir 's3://my-bucket/data/'\n\n input:\n val index\n val size\n\n output:\n path '*.data'\n\n script:\n \"\"\"\n dd \\\n if=/dev/random \\\n of=upload-${size}-${index}.data \\\n bs=1 count=0 seek=${size}\n \"\"\"\n}\n\nworkflow {\n index = Channel.of(1..params.upload_count)\n make_random_file(index, params.upload_size)\n}", + "_type": "code" + }, + { + "_key": "40936c74c9ec", + "children": [ + { + "text": "", + "_key": "f39b4af98e85", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/bentsherman/nf-head-job-benchmark", + "_key": "697f19083c9b" + } + ], + "children": [ + { + "text": "The full source code is available on ", + "_key": "57f786f226d3", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "697f19083c9b" + ], + "text": "GitHub", + "_key": "c46d716b3f37" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "7d8d007d8ff4" + } + ], + "_type": "block", + "style": "normal", + "_key": "4c2c7e7c1062" + }, + { + "_key": "cf31c68346f8", + "children": [ + { + "text": "", + "_key": "d05deb498072", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "0721257931d2", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "We ran this pipeline across a variety of file sizes and counts, and the results are shown below. Error bars denote +/- 1 standard deviation across three independent trials.", + "_key": "f3e251a44bad" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "6e4d5859cfbb", + "children": [ + { + "_type": "span", + "text": "", + "_key": "801fbcc72346" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "At larger scales, virtual threads significantly reduce the total runtime, at the cost of higher CPU and memory usage. 
Considering that the head job resources are typically underutilized anyway, we think the lower time-to-solution is a decent trade!", + "_key": "e061c4ecc3df" + } + ], + "_type": "block", + "style": "normal", + "_key": "9e6f04135ef7" + }, + { + "style": "normal", + "_key": "f4e83a737cac", + "children": [ + { + "_type": "span", + "text": "", + "_key": "7ef69168280f" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "563ce2f3be7f", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The reason why virtual threads are faster in this case is that Nextflow usually spends extra time waiting for files to be published after all tasks have completed. Normally, these publishing tasks are executed by a fixed-size thread pool based on the number of CPUs, but with virtual threads there is no such limit, so Nextflow can fully utilize the available network bandwidth. In the largest case (1000x 100 MB files), virtual threads reduce the runtime by over 30%.", + "_key": "fd666a835404" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "0f950feddc5f", + "children": [ + { + "_type": "span", + "text": "", + "_key": "fe818ef68e5a" + } + ] + }, + { + "_type": "image", + "alt": "CPU usage", + "_key": "b51c451bd8ac", + "asset": { + "_ref": "image-8d4e50d3b423ede644d56e4590af71d3a0eaae67-1009x300-png", + "_type": "reference" + } + }, + { + "alt": "Memory usage", + "_key": "f7b922471ff7", + "asset": { + "_ref": "image-9e09a11f318cf8f06d4b21e47ebbb76d64b2f78f-1009x300-png", + "_type": "reference" + }, + "_type": "image" + }, + { + "alt": "Workflow runtime", + "_key": "7eb4447d421f", + "asset": { + "_ref": "image-a999de9fd0169a4c36d45584a1f92da1401d4286-1009x300-png", + "_type": "reference" + }, + "_type": "image" + }, + { + "children": [ + { + "_type": "span", + "text": "Realistic Benchmark: nf-core/rnaseq", + "_key": "369171e6d50e" + } + ], + "_type": "block", + "style": "h3", + "_key": "5d7e78603109" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nf-core/rnaseq", + "_key": "f350bc5c3de4" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "To evaluate virtual threads on a real pipeline, we also ran ", + "_key": "997feda38741" + }, + { + "text": "nf-core/rnaseq", + "_key": "3480c14c2e44", + "_type": "span", + "marks": [ + "f350bc5c3de4" + ] + }, + { + "_type": "span", + "marks": [], + "text": " with the ", + "_key": "6bb89e6dc99e" + }, + { + "text": "test", + "_key": "c7537ff4a745", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "text": " profile. To simulate a run with many samples, we upsampled the test dataset to 1000 samples. The results are summarized below:", + "_key": "e313a58971ca", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "2beb960911bf" + }, + { + "_type": "block", + "style": "normal", + "_key": "5314be4138c9", + "children": [ + { + "_type": "span", + "text": "", + "_key": "7177a8258873" + } + ] + }, + { + "_type": "block", + "_key": "fbed253dc90f" + }, + { + "children": [ + { + "marks": [], + "text": "As you can see, the benefit here is not so clear. Whereas the upload benchmark was almost entirely I/O, a typical Nextflow pipeline spends most of its time scheduling compute tasks and waiting for them to finish. 
These tasks are generally not I/O bound and do not block for very long, so there may be little opportunity for improvement from virtual threads.", + "_key": "f90be1d9f756", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "94516d77c0f9", + "markDefs": [] + }, + { + "children": [ + { + "text": "", + "_key": "445848a4fcb7", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "28914d1ba496" + }, + { + "children": [ + { + "_key": "3fd5a21f0c8c", + "_type": "span", + "marks": [], + "text": "That being said, this benchmark consisted of only two runs of nf-core/rnaseq. We didn’t perform more runs here because they were so large, so your results may vary. In particular, if your Nextflow runs spend a lot of time publishing outputs after all the compute tasks have completed, you will likely benefit the most from using virtual threads. In any case, virtual threads should perform at least as well as platform threads, albeit with higher memory usage in some cases." + } + ], + "_type": "block", + "style": "normal", + "_key": "6ee128675d0c", + "markDefs": [] + }, + { + "style": "normal", + "_key": "2165e20e88f1", + "children": [ + { + "_type": "span", + "text": "", + "_key": "57e1574b3c8c" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "Summary", + "_key": "25e111ab3942" + } + ], + "_type": "block", + "style": "h2", + "_key": "e7742065ab0e" + }, + { + "markDefs": [], + "children": [ + { + "_key": "b2f865e40f05", + "_type": "span", + "marks": [], + "text": "The key to right-sizing the Nextflow head job is to understand which parts of a Nextflow pipeline are executed directly by Nextflow, and which parts are delegated to compute tasks. This knowledge will help prevent head job failures at scale." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "e38e3867d9ca" + }, + { + "_type": "block", + "style": "normal", + "_key": "7a356e59b96a", + "children": [ + { + "_type": "span", + "text": "", + "_key": "b1fb4a7ccf00" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "Here are the main takeaways:", + "_key": "7ab9d2f494b1", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "45f69eb8f70b", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "b1526e0dab8a", + "children": [ + { + "text": "", + "_key": "26c715ec4f1c", + "_type": "span" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "Nextflow uses a thread pool based on the number of available CPUs.", + "_key": "3f0db00033d2", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": "Nextflow uses a maximum heap size based on the standard JVM options, which is typically 25% of physical memory (75% in the Seqera Platform).", + "_key": "6029ed17048d" + }, + { + "marks": [], + "text": "You can use `NXF_JVM_ARGS` to make more system memory available to Nextflow.", + "_key": "e81cb0255e87", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": "The easiest way to figure out how much memory Nextflow needs is to iteratively double the memory allocation until the workflow succeeds (but usually 2-4 GB is enough).", + "_key": "1a552a3b2ba3" + }, + { + "_type": "span", + "marks": [], + "text": "You can enable virtual threads in Nextflow, which may reduce overall runtime for some pipelines.", + "_key": "b2daaf7e286f" + } + ], + "_type": "block", + "style": "normal", + "_key": "7e09fcddeb83", + "listItem": "bullet" + } + ], + "author": { + "_type": "reference", + "_ref": "8bd9c7c9-b7e7-473a-ace4-2cf6802bc884" + }, + "_updatedAt": "2024-09-26T09:05:05Z", + "tags": [ + { + "_key": "15979dbb0c37", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference" + }, + { + "_ref": "7d9ffad4-385c-433d-a409-d0bfacc3c2fe", + "_type": "reference", + "_key": "057bc211dc36" + } + ] + }, + { + "publishedAt": "2023-10-11T06:00:00.000Z", + "_updatedAt": "2024-10-14T08:34:12Z", + "_id": "99bf344d0302", + "body": [ + { + "style": "normal", + "_key": "4f19f8b99f32", + "markDefs": [], + "children": [ + { + "_key": "4a42a47ba179", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "46a73f368d43", + "alignment": "right", + "asset": { + "_type": "image", + "asset": { + "_ref": "image-dd0feaed0611ff03d655ac7bcb09cf801764ca7c-2318x2792-jpg", + "_type": "reference" + } + }, + "size": "small", + "_type": "picture" + }, + { + "style": "normal", + "_key": "87363aa07495", + "markDefs": [], + "children": [ + { + "text": "I’m excited to announce that I’m joining Seqera as Lead Developer Advocate. My mission is to support the growth of the Nextflow user community, especially in the USA, which will involve running community events, conducting training sessions, managing communications and working globally with our partners across the field to ensure Nextflow users have what they need to be successful. 
I’ll be working remotely from Boston, in collaboration with Paolo, Phil and the rest of the Nextflow team.", + "_key": "b5eaa2a47864", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "d6385fe18b0e", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "10cf00cc2484" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Some of you may already know me from my previous job at the Broad Institute, where I spent a solid decade doing outreach and providing support for the genomics research community, first for GATK, then for WDL and Cromwell, and eventually Terra. A smaller subset might have come across the O’Reilly book I co-authored, ", + "_key": "bdfaad02fe3b" + }, + { + "_type": "span", + "marks": [ + "880798044e5b" + ], + "text": "Genomics on the Cloud", + "_key": "27bcfa277867" + }, + { + "marks": [], + "text": ".", + "_key": "3efe37254ac0", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "ab2c9ac45381", + "markDefs": [ + { + "_type": "link", + "href": "https://www.oreilly.com/library/view/genomics-in-the/9781491975183/", + "_key": "880798044e5b" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "bb37dc38a0e5" + } + ], + "_type": "block", + "style": "normal", + "_key": "da0294113f54" + }, + { + "_key": "cfc71488baa4", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "This new mission is very much a continuation of my dedication to helping the research community use cutting-edge software tools effectively.", + "_key": "d0e0863e0a82" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "6751e2c3a6b6", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "ba3a0e3361ef" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "h2", + "_key": "529351406609", + "markDefs": [], + "children": [ + { + "text": "From bacterial cultures to large-scale genomics", + "_key": "79b7fc286e31", + "_type": "span", + "marks": [] + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "dc1dc2e7b283", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "To give you a brief sense of where I’m coming from, I originally trained as a wetlab microbiologist in my homeland of Belgium, so it’s fair to say I’ve come a long way, quite literally. I never took a computing class, but taught myself Python during my PhD to analyze bacterial plasmid sequencing data (72 kb of Sanger sequence!) and sort of fell in love with bioinformatics in the process. Later, I got the opportunity to deepen my bioinformatics skills during my postdoc at Harvard Medical School, although my overall research project was still very focused on wetlab work.", + "_key": "0b98f4da44b4", + "_type": "span" + } + ] + }, + { + "children": [ + { + "text": "", + "_key": "a6fb479c44e9", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "c7501e38fc6d", + "markDefs": [] + }, + { + "_key": "7385e30760a7", + "markDefs": [], + "children": [ + { + "_key": "c71ec69a8b9d", + "_type": "span", + "marks": [], + "text": "Toward the end of my postdoc, I realized I had become more interested in the software side of things, though I didn’t have any formal qualifications. 
Fortunately I was able to take a big leap sideways and found a new home at the Broad Institute, where I was hired as a Bioinformatics Scientist to build out the GATK community, at a time when it was still a bit niche. (It’s a long story that I don’t have time for today, but I’m always happy to tell it over drinks at a conference reception…)" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "0b84e93c0266" + } + ], + "_type": "block", + "style": "normal", + "_key": "7d55eb8153b0" + }, + { + "children": [ + { + "text": "The GATK job involved providing technical and scientific support to researchers, developing documentation, and teaching workshops about genomics and variant calling specifically. Which is hilarious because at the time I was hired, I had no clue what variant calling even meant! I think I was easily a month or two into the job before that part actually started making a little bit of sense. I still remember the stress and confusion of trying to figure all that out, and it’s something I always carry with me when I think about how to help newcomers to the ecosystem. I can safely say, whatever aspect of this highly multidisciplinary field is causing you trouble, I’ve struggled with it myself at some point.", + "_key": "4d1187c88da1", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "8aa04f30b6a6", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "e27dc6a2ba0d" + } + ], + "_type": "block", + "style": "normal", + "_key": "3240999ceaea", + "markDefs": [] + }, + { + "style": "normal", + "_key": "dc2cb91110a6", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Anyway, I can’t fully summarize a decade in a couple of paragraphs, but suffice to say, I learned an enormous amount on the job. And in the process, I developed a passion for helping researchers take maximum advantage of the powerful bioinformatics at their disposal. Which inevitably involves workflows.", + "_key": "5914d58458af" + } + ], + "_type": "block" + }, + { + "_key": "230fce52e3da", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "36ff2132848c", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "h2", + "_key": "08e684f8496d", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Going with the flow", + "_key": "46c367c90543" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "5f7023f1a3c6", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Over time my responsibilities at the Broad grew into supporting not just GATK, but also the workflow systems people use to run tools like GATK at scale, both on premises and increasingly, on public cloud platforms. 
My own pipelining experience has been focused on WDL and Cromwell, but I’ve dabbled with most of the mainstream tools in the space.", + "_key": "aea9e8d65e16" + } + ] + }, + { + "style": "normal", + "_key": "238d072ce291", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "24ca453e40ea", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "If I had a dollar for every time I’ve been asked the question “What’s the best workflow language?” I’d still need a full-time job, but I could maybe take a nice holiday somewhere warm. Oh, and my answer is: whatever gets the work done, plays nice with the systems you’re tied to, and connects you to a community.", + "_key": "687aa3760be9" + } + ], + "_type": "block", + "style": "normal", + "_key": "9466a72f13a0", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "3afcdef89fec", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "4ef39fa89fc5", + "_type": "span", + "marks": [] + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "9d0d6a80e37c", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "That’s one of the reasons I’ve been watching the growth of Nextflow’s popularity with great interest for the last few years. The amount of community engagement that we’ve seen around Nextflow, and especially around the development of nf-core, has been really impressive.", + "_key": "9958c4180fa5" + } + ] + }, + { + "children": [ + { + "text": "", + "_key": "d48f053c364f", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "5dd6b469f53f", + "markDefs": [] + }, + { + "style": "normal", + "_key": "fae499f0694f", + "markDefs": [ + { + "_type": "link", + "href": "https://summit.nextflow.io/", + "_key": "5351cd89fea8" + } + ], + "children": [ + { + "marks": [], + "text": "So I’m especially thrilled to be joining the Seqera team the week of the ", + "_key": "4d4e71bacc6b", + "_type": "span" + }, + { + "_key": "6f2186ab858a", + "_type": "span", + "marks": [ + "5351cd89fea8" + ], + "text": "Nextflow Summit" + }, + { + "_type": "span", + "marks": [], + "text": " in Barcelona, because it means I’ll get to meet a lot of people from the community in person during my very first few days on the job. I’m also very much looking forward to participating in the hackathon, which should be a great way for me to get started doing real work with Nextflow.", + "_key": "6efb5a1493fd" + } + ], + "_type": "block" + }, + { + "_key": "653f937e2567", + "markDefs": [], + "children": [ + { + "_key": "a6c08b688bf3", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "text": "I’m hoping to see many of you there!", + "_key": "f399c08a1e21", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "efb13805db1c" + } + ], + "author": { + "_ref": "geraldine-van-der-auwera", + "_type": "reference" + }, + "tags": [ + { + "_type": "reference", + "_key": "b892fd05191c", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493" + }, + { + "_type": "reference", + "_key": "8f1b257ce4c7", + "_ref": "3d25991c-f357-442b-a5fa-6c02c3419f88" + } + ], + "title": "Geraldine Van der Auwera joins Seqera", + "_rev": "2PruMrLMGpvZP5qAknmBzw", + "meta": { + "description": "I’m excited to announce that I’m joining Seqera as Lead Developer Advocate. 
My mission is to support the growth of the Nextflow user community, especially in the USA, which will involve running community events, conducting training sessions, managing communications and working globally with our partners across the field to ensure Nextflow users have what they need to be successful. I’ll be working remotely from Boston, in collaboration with Paolo, Phil and the rest of the Nextflow team.", + "slug": { + "current": "geraldine-van-der-auwera-joins-seqera" + } + }, + "_createdAt": "2024-09-25T14:17:21Z", + "_type": "blogPost" + }, + { + "body": [ + { + "markDefs": [], + "children": [ + { + "text": "We are excited to announce the new Nextflow 19.04.0 stable release! This version includes numerous bug fixes, enhancement and new features.", + "_key": "2e3e0e408c36", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "1293ce798784" + }, + { + "_key": "cd5e9a1cb1a9", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "d12ac6bac022", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Rich logging", + "_key": "c781eb85b887", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "2ae16458573f" + }, + { + "_type": "block", + "style": "normal", + "_key": "841bfab52a1f", + "markDefs": [], + "children": [ + { + "text": "In this release, we are making the new interactive rich output using ANSI escape characters as the default logging option. This produces a much more readable and easy to follow log of the running workflow execution.", + "_key": "9596bf43567e", + "_type": "span", + "marks": [] + } + ] + }, + { + "_key": "38de9a886329", + "src": "https://asciinema.org/a/IrT6uo85yyVoOjPa6KVzT2FXQ.js", + "_type": "script", + "id": "asciicast-IrT6uo85yyVoOjPa6KVzT2FXQ" + }, + { + "_type": "block", + "style": "normal", + "_key": "753ecb757082", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The ANSI log is implicitly disabled when the nextflow is launched in the background i.e. when using the ", + "_key": "edd8c20fd0d1" + }, + { + "marks": [ + "code" + ], + "text": "-bg", + "_key": "0ede067d2522", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " option. 
It can also be explicitly disabled using the ", + "_key": "0982d5a9e28f" + }, + { + "text": "-ansi-log false", + "_key": "72b809afd602", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "text": " option or setting the ", + "_key": "1c54585aad94", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "NXF_ANSI_LOG=false", + "_key": "081e542eff7a" + }, + { + "text": " variable in your launching environment.", + "_key": "73dfe5eb7b2e", + "_type": "span", + "marks": [] + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "2dd57c01ffd8", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "2a4b53bc34b8" + } + ] + }, + { + "_type": "block", + "style": "h2", + "_key": "4a43a604aeb5", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "NCBI SRA data source", + "_key": "96dce75dea67" + } + ] + }, + { + "markDefs": [ + { + "_type": "link", + "href": "/blog/2019/release-19.03.0-edge.html", + "_key": "e8537f3ccb99" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The support for NCBI SRA archive was introduced in the ", + "_key": "8218b827579e" + }, + { + "text": "previous edge release", + "_key": "939ba6616744", + "_type": "span", + "marks": [ + "e8537f3ccb99" + ] + }, + { + "_key": "78d3bcb898f7", + "_type": "span", + "marks": [], + "text": ". Given the very positive reaction, we are graduating this feature into the stable release for general availability." + } + ], + "_type": "block", + "style": "normal", + "_key": "0848c6628481" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "22ed3aa384b2", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "bda6e8d615a5", + "markDefs": [] + }, + { + "_key": "ca5e05eea5a5", + "markDefs": [], + "children": [ + { + "text": "Sharing", + "_key": "0238a45d2864", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2" + }, + { + "style": "normal", + "_key": "bfeabbaab108", + "markDefs": [ + { + "_key": "a3ce3eb0758f", + "_type": "link", + "href": "https://gitea.io" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "This version includes also a new Git repository provider for the ", + "_key": "8a0d6a3ee0fe" + }, + { + "_type": "span", + "marks": [ + "a3ce3eb0758f" + ], + "text": "Gitea", + "_key": "d5e84d7da68f" + }, + { + "text": " self-hosted source code management system, which is added to the already existing support for GitHub, Bitbucket and GitLab sharing platforms.", + "_key": "24d3a7dc8755", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "ed4aa0278117" + } + ], + "_type": "block", + "style": "normal", + "_key": "916d3684f3a4" + }, + { + "_key": "eaedb8c385ea", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Reports and metrics", + "_key": "cf1bd52cac86" + } + ], + "_type": "block", + "style": "h2" + }, + { + "markDefs": [], + "children": [ + { + "_key": "3310738bc78d", + "_type": "span", + "marks": [], + "text": "Finally, this version includes important enhancements and bug fixes for the task executions metrics collected by Nextflow. If you are using this feature we strongly suggest updating Nextflow to this version." 
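As a quick illustration of the logging controls described above (the pipeline name is a placeholder; the flag and variable names are the ones given in this post):

```bash
# Disable the interactive ANSI log for a single run:
nextflow run my-pipeline.nf -ansi-log false

# Or disable it for every run launched from this shell:
export NXF_ANSI_LOG=false

# ANSI logging is also turned off implicitly when launching in the background:
nextflow run my-pipeline.nf -bg
```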
+ } + ], + "_type": "block", + "style": "normal", + "_key": "988b3591dd8d" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "e9de57710282", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "5f82f78d5f1a", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "fe245998dbc5", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Remember that updating can be done with the ", + "_key": "cd05f915c458" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "nextflow -self-update", + "_key": "ed549cb472db" + }, + { + "_type": "span", + "marks": [], + "text": " command.", + "_key": "8f18357a43ee" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_key": "a83957b9891d", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "30f44037eac2" + }, + { + "markDefs": [], + "children": [ + { + "text": "Changelog", + "_key": "3b371dfe46a0", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2", + "_key": "f1f1f37b9dd0" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nextflow-io/nextflow/releases/tag/v19.04.0", + "_key": "9277ebde3433" + } + ], + "children": [ + { + "_key": "f114e425ecfb", + "_type": "span", + "marks": [], + "text": "The complete list of changes and bug fixes is available on GitHub at " + }, + { + "_type": "span", + "marks": [ + "9277ebde3433" + ], + "text": "this link", + "_key": "04bd083b25fe" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "92b162b1216e" + } + ], + "_type": "block", + "style": "normal", + "_key": "478114a01913" + }, + { + "_type": "block", + "style": "normal", + "_key": "c5b121a89469", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "a4a9334cf6b0", + "_type": "span" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "Contributions", + "_key": "6d192925f413", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2", + "_key": "21a51a852bec" + }, + { + "_type": "block", + "style": "normal", + "_key": "379654628375", + "markDefs": [], + "children": [ + { + "_key": "8a6e4adceba9", + "_type": "span", + "marks": [], + "text": "Special thanks to all people contributed to this release by reporting issues, improving the docs or submitting (patiently) a pull request (sorry if we have missed somebody):" + } + ] + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "9a8178bec8e2", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/acerjanic", + "_key": "de9eee85486b" + } + ], + "children": [ + { + "_key": "ee7a4c85f9650", + "_type": "span", + "marks": [ + "de9eee85486b" + ], + "text": "Alex Cerjanic" + } + ] + }, + { + "_key": "761b7df364b2", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/aunderwo", + "_key": "0f33f5a1e410" + } + ], + "children": [ + { + "text": "Anthony Underwood", + "_key": "906b1f8e51290", + "_type": "span", + "marks": [ + "0f33f5a1e410" + ] + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "listItem": "bullet", + "markDefs": [ + { + "href": "https://github.com/pachiras", + "_key": "94c9f6cd5a43", + "_type": "link" + } + ], + "children": [ + { + "text": "Akira Sekiguchi", + "_key": "f6662069b1e10", + "_type": "span", + "marks": [ + "94c9f6cd5a43" + ] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": 
"a57e15aead00" + }, + { + "_type": "block", + "style": "normal", + "_key": "b2ca31a0c252", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/wflynny", + "_key": "18a1f3625cbe" + } + ], + "children": [ + { + "_type": "span", + "marks": [ + "18a1f3625cbe" + ], + "text": "Bill Flynn", + "_key": "347ef51733e90" + } + ], + "level": 1 + }, + { + "style": "normal", + "_key": "78da9ee0adf2", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/glormph", + "_key": "88cd4c38bdfa" + } + ], + "children": [ + { + "_key": "eb5fc5451c440", + "_type": "span", + "marks": [ + "88cd4c38bdfa" + ], + "text": "Jorrit Boekel" + } + ], + "level": 1, + "_type": "block" + }, + { + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/olgabot", + "_key": "605357092181" + } + ], + "children": [ + { + "text": "Olga Botvinnik", + "_key": "cb8775a9efe50", + "_type": "span", + "marks": [ + "605357092181" + ] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "36304510aec2" + }, + { + "listItem": "bullet", + "markDefs": [ + { + "href": "https://github.com/olifly", + "_key": "2f595cf24408", + "_type": "link" + } + ], + "children": [ + { + "marks": [ + "2f595cf24408" + ], + "text": "Ólafur Haukur Flygenring", + "_key": "bb4378d48e480", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "678233e37665" + }, + { + "style": "normal", + "_key": "9336a9d5c4b9", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/sven1103", + "_key": "1034d450fc57" + } + ], + "children": [ + { + "marks": [ + "1034d450fc57" + ], + "text": "Sven Fillinger", + "_key": "d86cc661c2b30", + "_type": "span" + } + ], + "level": 1, + "_type": "block" + } + ], + "tags": [ + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "f7b5b91dcdfe" + } + ], + "_createdAt": "2024-09-25T14:15:45Z", + "meta": { + "description": "We are excited to announce the new Nextflow 19.04.0 stable release!", + "slug": { + "current": "release-19.04.0-stable" + } + }, + "_id": "9a5763abaee6", + "_rev": "mvya9zzDXWakVjnX4hhFqw", + "_updatedAt": "2024-10-02T11:17:27Z", + "_type": "blogPost", + "title": "Nextflow 19.04.0 stable release is out!", + "author": { + "_ref": "paolo-di-tommaso", + "_type": "reference" + }, + "publishedAt": "2019-04-18T06:00:00.000Z" + }, + { + "meta": { + "slug": { + "current": "nextflow-nf-core-ancient-env-dna" + } + }, + "_updatedAt": "2024-09-26T09:04:49Z", + "_rev": "87gw29IlgU4Z8o00zkoBn4", + "tags": [ + { + "_type": "reference", + "_key": "33a9da578f13", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493" + } + ], + "title": "Application of Nextflow and nf-core to ancient environmental eDNA", + "author": { + "_ref": "L90MLvtZSPRQtUzPRoOthC", + "_type": "reference" + }, + "_createdAt": "2024-09-25T14:18:20Z", + "_type": "blogPost", + "body": [ + { + "children": [ + { + "text": "Ancient environmental DNA (eDNA) is currently a hot topic in archaeological, ecological, and metagenomic research fields. Recent eDNA studies have shown that authentic ‘ancient’ DNA can be recovered from soil and sediments even as far back as 2 million years ago(1). 
However, as with most things metagenomics (the simultaneous analysis of the entire DNA content of a sample), there is a need to work at scale, processing the large datasets of many sequencing libraries to ‘fish’ out the tiny amounts of temporally degraded ancient DNA from amongst a huge swamp of contaminating modern biomolecules.", + "_key": "6ee19b8cf496", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "dae4e1813b08", + "markDefs": [] + }, + { + "style": "normal", + "_key": "ea06ebb13163", + "children": [ + { + "_type": "span", + "text": "", + "_key": "36933a7db853" + } + ], + "_type": "block" + }, + { + "_type": "block", + "_key": "77ca2215201f" + }, + { + "style": "normal", + "_key": "d11d07e0ef4b", + "markDefs": [], + "children": [ + { + "text": "This need to work at scale, while also conducting reproducible analyses to demonstrate the authenticity of ancient DNA, lends itself to the processing of DNA with high-quality pipelines and open source workflow managers such as Nextflow. In this context, I was invited to the Australian Center for Ancient DNA (ACAD) at the University of Adelaide in February 2024 to co-teach a graduate-level course on ‘Hands-on bioinformatics for ancient environmental DNA’, alongside other members of the ancient eDNA community. Workshop participants included PhD students from across Australia, New Zealand, and even from as far away as Estonia.", + "_key": "4870d036e61f", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "1dd3983cba45", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "54967b832196" + }, + { + "_type": "image", + "alt": "Mentor compliment about new module added", + "_key": "0c387b66c5bf", + "asset": { + "_ref": "image-d02b4256937d5f43b962628710a6075c184af539-1999x1333-jpg", + "_type": "reference" + } + }, + { + "_type": "block", + "style": "normal", + "_key": "329b3aba2f2d", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "We began the five-day workshop with an overview of the benefits of using workflow managers and pipelines in academic research, which include efficiency, portability, reproducibility, and fault-tolerance, and we then proceeded to introduce the Ph.D. 
students to installing Nextflow, and configure pipelines for running on different types of computing infrastructure.", + "_key": "0e2e5df0f4ae", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "4f02a8038d26", + "children": [ + { + "_type": "span", + "text": "", + "_key": "b4d41276f8a4" + } + ] + }, + { + "_key": "53429790bec7", + "asset": { + "_ref": "image-0158575fbb10423d65fb2520b56708b27769be97-1999x1334-jpg", + "_type": "reference" + }, + "_type": "image", + "alt": "Review comment in GitHub" + }, + { + "_key": "1491c29326e7", + "markDefs": [ + { + "href": "https://nf-co.re/eager", + "_key": "ad20e8972fa6", + "_type": "link" + }, + { + "_type": "link", + "href": "https://nf-co.re/mag", + "_key": "51aff765364e" + } + ], + "children": [ + { + "_key": "e810dadd7e6f", + "_type": "span", + "marks": [], + "text": "Over the next two days, I then introduced two well-established nf-core pipelines: " + }, + { + "marks": [ + "ad20e8972fa6" + ], + "text": "nf-core/eager", + "_key": "4486a6047c1b", + "_type": "span" + }, + { + "_key": "07ae5eb396bf", + "_type": "span", + "marks": [], + "text": " (2) and " + }, + { + "_type": "span", + "marks": [ + "51aff765364e" + ], + "text": "nf-core/mag", + "_key": "99adeef63a48" + }, + { + "_type": "span", + "marks": [], + "text": " (3), and explained to students how these pipelines can be applied to various aspects of environmental metagenomic and ancient DNA analysis: nf-core/eager is a dedicated ‘swiss-army-knife’ style pipeline for ancient DNA analysis that performs genetic data preprocessing, genomic alignment, variant calling, and metagenomic screening with specific tools and parameters to account for the characteristics of degraded DNA. nf-core/mag is a best-practice pipeline for metagenomic de novo assembly of microbial genomes that performs preprocessing, assembly, binning, bin-refinement and validation. It also contains a specific subworkflow for the authentication of ancient contigs. 
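A rough sketch of how the two pipelines introduced above are typically launched from the command line (profiles, parameter names, and file paths here are illustrative assumptions; the nf-core documentation for each pipeline gives the exact options):

```bash
# Ancient DNA processing with nf-core/eager (sample sheet and reference are placeholders):
nextflow run nf-core/eager -profile docker \
    --input samplesheet.tsv \
    --fasta reference.fasta \
    --outdir results_eager

# Metagenomic de novo assembly and binning with nf-core/mag:
nextflow run nf-core/mag -profile docker \
    --input 'reads/*_R{1,2}.fastq.gz' \
    --outdir results_mag
```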
In both cases, the students of the workshops were given practical tasks to set up and run both pipelines on real data, and time was spent exploring the extensive nf-core documentation and evaluating the outputs from MultiQC, both important components that contribute to the quality of nf-core pipelines.", + "_key": "9ab9606f7a26" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "2b400fe23552" + } + ], + "_type": "block", + "style": "normal", + "_key": "7507a44ada76" + }, + { + "_type": "block", + "style": "normal", + "_key": "93e14d68f88c", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The workshop was well received by students, and many were eager (pun intended) to start running Nextflow and nf-core pipelines on their own data at their own institutions.", + "_key": "576d8090b22a" + } + ] + }, + { + "style": "normal", + "_key": "9379b51a1b76", + "children": [ + { + "_type": "span", + "text": "", + "_key": "52a58a477f1c" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "73e51449224e", + "markDefs": [ + { + "_key": "18ce0343fdfd", + "_type": "link", + "href": "https://www.adelaide.edu.au/acad/" + }, + { + "_key": "050773095bc4", + "_type": "link", + "href": "https://www.adelaide.edu.au/environment/" + }, + { + "_type": "link", + "href": "https://www.wernersiemens-stiftung.ch/", + "_key": "9d55275745e9" + }, + { + "_key": "861edb0f5eca", + "_type": "link", + "href": "https://www.leibniz-hki.de/" + }, + { + "_key": "fe1f22ce98cd", + "_type": "link", + "href": "https://www.eva.mpg.de" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "I would like to thank Vilma Pérez at ACAD for the invitation to contribute to the workshop as well as Mikkel Winther Pedersen for being my co-instructor, and the nf-core community for continued support in the development of the pipelines. 
Thank you also to Tina Warinner for proof-reading this blog post, and I would like to acknowledge ", + "_key": "516279972059" + }, + { + "_type": "span", + "marks": [ + "18ce0343fdfd" + ], + "text": "ACAD", + "_key": "ab6602fb7bc9" + }, + { + "_type": "span", + "marks": [], + "text": ", the ", + "_key": "5912511ff7e5" + }, + { + "_type": "span", + "marks": [ + "050773095bc4" + ], + "text": "University of Adelaide Environment Institute", + "_key": "d9e37ba4b96f" + }, + { + "_key": "53546c93cded", + "_type": "span", + "marks": [], + "text": ", the " + }, + { + "_key": "e5021e4c7303", + "_type": "span", + "marks": [ + "9d55275745e9" + ], + "text": "Werner Siemens-Stiftung" + }, + { + "_type": "span", + "marks": [], + "text": ", ", + "_key": "6a4eef678443" + }, + { + "_key": "024b540cf73b", + "_type": "span", + "marks": [ + "861edb0f5eca" + ], + "text": "Leibniz HKI" + }, + { + "_type": "span", + "marks": [], + "text": ", and ", + "_key": "a9fb8690433d" + }, + { + "_type": "span", + "marks": [ + "fe1f22ce98cd" + ], + "text": "MPI for Evolutionary Anthropology", + "_key": "787354ae9a06" + }, + { + "_type": "span", + "marks": [], + "text": " for financial support to attend the workshop and support in developing nf-core pipelines.", + "_key": "d356f9e9c2bc" + } + ], + "_type": "block" + }, + { + "_key": "13927dcd664d", + "children": [ + { + "_type": "span", + "text": "", + "_key": "ea030dc31b9c" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "4c6a0fe47161", + "children": [ + { + "_type": "span", + "text": "---", + "_key": "a82644b6fb9b" + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "8443edf98333" + } + ], + "_type": "block", + "style": "normal", + "_key": "f9c090aa1ff0" + }, + { + "style": "normal", + "_key": "3d3e51b71e97", + "markDefs": [ + { + "_key": "4499c1fc685d", + "_type": "link", + "href": "https://doi.org/10.1038/s41586-022-05453-y" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "(1) Kjær, K.H., Winther Pedersen, M., De Sanctis, B. et al. A 2-million-year-old ecosystem in Greenland uncovered by environmental DNA. Nature ", + "_key": "3eaf6a7b75c4" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "612", + "_key": "0c19eb40f79d" + }, + { + "_key": "d3d3bbc72bec", + "_type": "span", + "marks": [], + "text": ", 283–291 (2022). " + }, + { + "_type": "span", + "marks": [ + "4499c1fc685d" + ], + "text": "https://doi.org/10.1038/s41586-022-05453-y", + "_key": "adaee6269eb9" + } + ], + "_type": "block" + }, + { + "_key": "98307be0b504", + "children": [ + { + "text": "", + "_key": "d177f219fd9d", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "http://doi.org/10.7717/peerj.10947", + "_key": "ced7a536f06e" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "(2) Fellows Yates, J.A., Lamnidis, T.C., Borry, M., Andrades Valtueña, A., Fagernäs, Z., Clayton, S., Garcia, M.U., Neukamm, J., Peltzer, A.. Reproducible, portable, and efficient ancient genome reconstruction with nf-core/eager. 
PeerJ 9:10947 (2021) ", + "_key": "a0cd4bb69ea0" + }, + { + "marks": [ + "ced7a536f06e" + ], + "text": "http://doi.org/10.7717/peerj.10947", + "_key": "4fdda2f8fc5f", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "1deae826f97a" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "f84e921369eb" + } + ], + "_type": "block", + "style": "normal", + "_key": "4ce7c7e43ce2" + }, + { + "style": "normal", + "_key": "94be38743543", + "markDefs": [ + { + "_type": "link", + "href": "https://doi.org/10.1093/nargab/lqac007", + "_key": "676cbf8ee601" + } + ], + "children": [ + { + "marks": [], + "text": "(3) Krakau, S., Straub, D., Gourlé, H., Gabernet, G., Nahnsen, S., nf-core/mag: a best-practice pipeline for metagenome hybrid assembly and binning, NAR Genomics and Bioinformatics, ", + "_key": "3a5f38b179f9", + "_type": "span" + }, + { + "text": "4", + "_key": "7f7f849d1146", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_key": "49915f30a28c", + "_type": "span", + "marks": [], + "text": ":1 (2022) " + }, + { + "_key": "63fad8d0a892", + "_type": "span", + "marks": [ + "676cbf8ee601" + ], + "text": "https://doi.org/10.1093/nargab/lqac007" + } + ], + "_type": "block" + } + ], + "publishedAt": "2024-04-17T06:00:00.000Z", + "_id": "9b244da82d34" + }, + { + "_type": "blogPost", + "publishedAt": "2023-07-24T06:00:00.000Z", + "meta": { + "description": "The Nextflow project originated from within an academic research group, so perhaps it’s no surprise that education is an essential part of the Nextflow and nf-core communities. Over the years, we have established several regular training resources: we have a weekly online seminar series called nf-core/bytesize and run hugely popular bi-annual Nextflow and nf-core community training online. In 2022, Seqera established a new community and growth team, funded in part by a grant from the Chan Zuckerberg Initiative “Essential Open Source Software for Science” grant. We are all former bioinformatics researchers from academia and part of our mission is to build resources and programs to support academic institutions.", + "slug": { + "current": "nextflow-goes-to-university" + } + }, + "_id": "9b37d7c481a5", + "body": [ + { + "markDefs": [ + { + "href": "https://www.youtube.com/@nf-core/playlists?view=50&sort=dd&shelf_id=2", + "_key": "51eb0944ee8d", + "_type": "link" + }, + { + "_key": "a1b64fa15713", + "_type": "link", + "href": "https://www.nextflow.io/" + }, + { + "_type": "link", + "href": "https://nf-co.re/", + "_key": "0d141144762b" + } + ], + "children": [ + { + "marks": [], + "text": "The Nextflow project originated from within an academic research group, so perhaps it’s no surprise that education is an essential part of the Nextflow and nf-core communities. Over the years, we have established several regular training resources: we have a weekly online seminar series called nf-core/bytesize and run hugely popular bi-annual ", + "_key": "a1e6a8ac31a4", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "51eb0944ee8d" + ], + "text": "Nextflow and nf-core community training online", + "_key": "a1c83572d41f" + }, + { + "marks": [], + "text": ". In 2022, Seqera established a new community and growth team, funded in part by a grant from the Chan Zuckerberg Initiative “Essential Open Source Software for Science” grant. We are all former bioinformatics researchers from academia and part of our mission is to build resources and programs to support academic institutions. 
We want to help to provide leading edge, high-quality, ", + "_key": "a8a2f07be3ed", + "_type": "span" + }, + { + "_key": "319609bf04e5", + "_type": "span", + "marks": [ + "a1b64fa15713" + ], + "text": "Nextflow" + }, + { + "marks": [], + "text": " and ", + "_key": "15668be18dee", + "_type": "span" + }, + { + "_key": "2fd4b6857cf3", + "_type": "span", + "marks": [ + "0d141144762b" + ], + "text": "nf-core" + }, + { + "_key": "206bd0aeb42f", + "_type": "span", + "marks": [], + "text": " training for Masters and Ph.D. students in Bioinformatics and other related fields." + } + ], + "_type": "block", + "style": "normal", + "_key": "d51c89973ec4" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "ed270fa7eeed" + } + ], + "_type": "block", + "style": "normal", + "_key": "671606aec285" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://bioinfo.imd.ufrn.br/site/en-US", + "_key": "98fa1d653af9" + }, + { + "_key": "95ae3e5d4d94", + "_type": "link", + "href": "https://www.ufrn.br/" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "We recently held one of our first such projects, a collaboration with the ", + "_key": "42b7195e5bc8" + }, + { + "text": "Bioinformatics Multidisciplinary Environment, BioME", + "_key": "46b09c776de5", + "_type": "span", + "marks": [ + "98fa1d653af9" + ] + }, + { + "_type": "span", + "marks": [], + "text": " at the ", + "_key": "ee3c9a7e3150" + }, + { + "text": "Federal University of Rio Grande do Norte (UFRN)", + "_key": "492cc5dc78c8", + "_type": "span", + "marks": [ + "95ae3e5d4d94" + ] + }, + { + "marks": [], + "text": " in Brazil. The UFRN is one of the largest universities in Brazil with over 40,000 enrolled students, hosting one of the best-ranked bioinformatics programs in Brazil, attracting students from all over the country. The BioME department runs courses for Masters and Ph.D. students, including a flexible course dedicated to cutting-edge bioinformatics techniques. As part of this, we were invited to run an 8-day Nextflow and nf-core graduate course. Participants attended 5 days of training seminars and presented a Nextflow project at the end of the course. Upon successful completion of the course, participants received graduate program course credits as well as a Seqera Labs certified certificate recognizing their knowledge and hands-on experience 😎.", + "_key": "65a381137efb", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "9d8f1842dc22" + }, + { + "_key": "8b987db41ddb", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "93fa24d2ac77" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "The course participants included one undergraduate student, Master's students, Ph.D. students, and postdocs with very diverse backgrounds. While some had prior Nextflow and nf-core experience and had already attended Nextflow training, others had never used it. Unsurprisingly, they all chose very different project topics to work on and present to the rest of the group. At the end of the course, eleven students chose to undergo the final project evaluation for the Seqera certification. 
They all passed with flying colors!", + "_key": "9e9233ba5efe" + } + ], + "_type": "block", + "style": "normal", + "_key": "95ae409ffc6d", + "markDefs": [] + }, + { + "_key": "4db30702ef53", + "asset": { + "_type": "reference", + "_ref": "image-01131f8592dcf8d31016b05cb8487eb568ff4fb3-3088x1737-jpg" + }, + "_type": "image" + }, + { + "_type": "block", + "style": "normal", + "_key": "bd9fef2d7a26", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Marcel with some of the students that attended the course.", + "_key": "64b7c36e38f90", + "_type": "span" + } + ] + }, + { + "_key": "ccbbf943d4f3", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "fb683ca048810" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "h2", + "_key": "fb83e1607c77", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Final projects", + "_key": "4f6206eb7dfa", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "4dc518ee10e1", + "markDefs": [], + "children": [ + { + "_key": "598db10b2880", + "_type": "span", + "marks": [], + "text": "Final hands-on projects are very useful not only to practice new skills but also to have a tangible deliverable at the end of the course. It could be the first step of a long journey with Nextflow, especially if you work on a project that lives on after the course concludes. Participants were given complete freedom to design a project that was relevant to them and their interests. Many students were very satisfied with their projects and intend to continue working on them after the course conclusion." + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "41be59fe5f7b" + } + ], + "_type": "block", + "style": "normal", + "_key": "d27c3dc12d23" + }, + { + "_type": "block", + "style": "h3", + "_key": "a723dc0c7a21", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Euryale 🐍", + "_key": "8c15efc545af", + "_type": "span" + } + ] + }, + { + "children": [ + { + "_key": "6b187dc441b7", + "_type": "span", + "marks": [ + "b63fe28f6eb5" + ], + "text": "João Vitor Cavalcante" + }, + { + "_type": "span", + "marks": [], + "text": ", along with collaborators, had developed and ", + "_key": "485a180a8a2a" + }, + { + "marks": [ + "701842123b44" + ], + "text": "published", + "_key": "aa9ddba2c84d", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " a Snakemake pipeline for Sensitive Taxonomic Classification and Flexible Functional Annotation of Metagenomic Shotgun Sequences called MEDUSA. During the course, after seeing the huge potential of Nextflow, he decided to fully translate this pipeline to Nextflow, but with a new name: Euryale. You can check the result ", + "_key": "672936f430f4" + }, + { + "marks": [ + "1ce0127637e9" + ], + "text": "here", + "_key": "b15e9d5bb14b", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " 😍 Why Euryale? 
In Greek mythology, Euryale was one of the three gorgons, a sister to Medusa 🤓", + "_key": "a4b5dca97d87" + } + ], + "_type": "block", + "style": "normal", + "_key": "2aa8a02975b8", + "markDefs": [ + { + "href": "https://www.linkedin.com/in/joao-vitor-cavalcante", + "_key": "b63fe28f6eb5", + "_type": "link" + }, + { + "href": "https://www.frontiersin.org/articles/10.3389/fgene.2022.814437/full", + "_key": "701842123b44", + "_type": "link" + }, + { + "_type": "link", + "href": "https://github.com/dalmolingroup/euryale/", + "_key": "1ce0127637e9" + } + ] + }, + { + "_key": "50fbd5600038", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "58a9436ce884", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Bringing Nanopore to Google Batch ☁️", + "_key": "4ea5b9601fee", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "ed16e8920c58" + }, + { + "_key": "de08655bb970", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/epi2me-labs/wf-alignment", + "_key": "30afb2396762" + }, + { + "href": "https://www.linkedin.com/in/daniloimparato", + "_key": "0729f1dd0df5", + "_type": "link" + }, + { + "_type": "link", + "href": "https://github.com/daniloimparato/wf-alignment", + "_key": "211f0107b536" + } + ], + "children": [ + { + "marks": [], + "text": "The Customer Workflows Group at Oxford Nanopore Technologies (ONT) has adopted Nextflow to develop and distribute general-purpose pipelines for its customers. One of these pipelines, ", + "_key": "2a0259338942", + "_type": "span" + }, + { + "text": "wf-alignment", + "_key": "546c43557f8b", + "_type": "span", + "marks": [ + "30afb2396762" + ] + }, + { + "_type": "span", + "marks": [], + "text": ", takes a FASTQ directory and a reference directory and outputs a minimap2 alignment, along with samtools stats and an HTML report. Both samtools stats and the HTML report generated by this pipeline are well suited for Nextflow Tower’s Reports feature. However, ", + "_key": "cd2bf181005d" + }, + { + "text": "Danilo Imparato", + "_key": "8dd718b54721", + "_type": "span", + "marks": [ + "0729f1dd0df5" + ] + }, + { + "_type": "span", + "marks": [], + "text": " noticed that the pipeline lacked support for using Google Cloud as compute environment and decided to work on this limitation on his ", + "_key": "f2f8992b2a4e" + }, + { + "marks": [ + "211f0107b536" + ], + "text": "final project", + "_key": "1a3645726330", + "_type": "span" + }, + { + "_key": "aa9a085e3334", + "_type": "span", + "marks": [], + "text": ", which included fixing a few bugs specific to running it on Google Cloud and making the reports available on Nextflow Tower 🤯" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "463e99401e21", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "04235bc45cf1", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_key": "947f0bd8688a", + "_type": "span", + "marks": [], + "text": "Nextflow applied to economics! 
🤩" + } + ], + "_type": "block", + "style": "h3", + "_key": "7260eeddc37b" + }, + { + "markDefs": [ + { + "href": "https://www.linkedin.com/in/galileu-nobre-901551187/", + "_key": "4b8a87de6eee", + "_type": "link" + }, + { + "_type": "link", + "href": "https://github.com/galileunobre/nextflow_projeto_1", + "_key": "10465c90626d" + } + ], + "children": [ + { + "text": "Galileu Nobre", + "_key": "703bc67f2b54", + "_type": "span", + "marks": [ + "4b8a87de6eee" + ] + }, + { + "marks": [], + "text": " is studying Economical Sciences and decided to convert his scripts into a Nextflow pipeline for his ", + "_key": "46c5ecb28dd1", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "10465c90626d" + ], + "text": "final project", + "_key": "64310262ad38" + }, + { + "text": ". The goal of the pipeline is to estimate the demand for health services in Brazil based on data from the 2019 PNS (National Health Survey), (a) treating this database to contain only the variables we will work with, (b) running a descriptive analysis to determine the data distribution in order to investigate which models would be best applicable. In the end, two regression models, Poisson, and the Negative Binomial, are used to estimate the demand. His work is an excellent example of applying Nextflow to fields outside of traditional bioinformatics 😉.", + "_key": "1d76f5973b62", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "eed89a5c82e3" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "4a6105363d71", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "afde11d90f50", + "markDefs": [] + }, + { + "style": "h3", + "_key": "0ab6c2cc8f1c", + "markDefs": [], + "children": [ + { + "_key": "49d330fc42fd", + "_type": "span", + "marks": [], + "text": "Whole-exome sequencing 🧬" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "24f4a098cfb8", + "markDefs": [ + { + "_key": "b42bc6498821", + "_type": "link", + "href": "https://github.com/RafaellaFerraz/exome" + }, + { + "_key": "cef34af031d8", + "_type": "link", + "href": "https://www.linkedin.com/in/rafaella-sousa-ferraz" + } + ], + "children": [ + { + "marks": [], + "text": "For her ", + "_key": "ba0b55ed6e6a", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "b42bc6498821" + ], + "text": "final project", + "_key": "acdf2ab2a152" + }, + { + "text": ", ", + "_key": "a3bdce18001c", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "cef34af031d8" + ], + "text": "Rafaella Ferraz", + "_key": "40e6caafdd31" + }, + { + "_type": "span", + "marks": [], + "text": " used nf-core/tools to write a whole-exome sequencing analysis pipeline from scratch. She applied her new skills using nf-core modules and sub-workflows to achieve this and was able to launch and monitor her pipeline using Nextflow Tower. Kudos to Rafaella! 
👏🏻", + "_key": "0b02b4bc1c94" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "8357f7b6dc70" + } + ], + "_type": "block", + "style": "normal", + "_key": "c1f089d93833" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "RNASeq with contamination 🧫", + "_key": "6df8e7a6d76f" + } + ], + "_type": "block", + "style": "h3", + "_key": "bef5f952a22d", + "markDefs": [] + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/iaradsouza1/tab-projeto-final", + "_key": "2ce26d041b3c" + }, + { + "href": "https://www.linkedin.com/in/iaradsouza", + "_key": "cfd0cab87952", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In her ", + "_key": "0eb4bc972908" + }, + { + "_key": "aaf4218b9a96", + "_type": "span", + "marks": [ + "2ce26d041b3c" + ], + "text": "final project" + }, + { + "_type": "span", + "marks": [], + "text": ", ", + "_key": "a24d31f9fd3a" + }, + { + "_type": "span", + "marks": [ + "cfd0cab87952" + ], + "text": "Iara Souza", + "_key": "289a12ef43de" + }, + { + "_key": "c934b4b9bb09", + "_type": "span", + "marks": [], + "text": " developed a bioinformatics pipeline that analyzed RNA-Seq data when it's required to have an extra pre-filtering step. She needed this for analyzing data from RNA-Seq experiments performed in cell culture, where there is a high probability of contamination of the target transcriptome with the host transcriptome. Iara was able to learn how to use nf-core/tools and benefit from all the \"batteries included\" that come with it 🔋😬" + } + ], + "_type": "block", + "style": "normal", + "_key": "4564b1ec2cd1" + }, + { + "_key": "d17e0050429d", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "8fdc9c47a48d" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "SARS-CoV-2 genome assembly and lineage classification 🦠", + "_key": "5e614271b6f9" + } + ], + "_type": "block", + "style": "h3", + "_key": "9af825ed2dd5", + "markDefs": [] + }, + { + "_key": "209d5b4a82af", + "markDefs": [ + { + "href": "https://www.linkedin.com/in/diego-go-tex", + "_key": "115f0423c569", + "_type": "link" + }, + { + "href": "https://github.com/diegogotex/sarscov2_irma_nf", + "_key": "bb7f4cab9138", + "_type": "link" + } + ], + "children": [ + { + "_key": "632422e1b571", + "_type": "span", + "marks": [ + "115f0423c569" + ], + "text": "Diego Teixeira" + }, + { + "_key": "9f0381c1c9ec", + "_type": "span", + "marks": [], + "text": " has been working with SARS-CoV-2 genome assembly and lineage classification. 
As his final project, he wrote a " + }, + { + "marks": [ + "bb7f4cab9138" + ], + "text": "Nextflow pipeline", + "_key": "26663d54bd35", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " aggregating all tools and analyses he's been doing, allowing him to be much more efficient in his work and have a reproducible pipeline that can easily be shared with collaborators.", + "_key": "882b9f30cfa9" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "21be266b3a74", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "eb2e9d6a5605" + }, + { + "style": "normal", + "_key": "f7353f089f7d", + "markDefs": [ + { + "href": "https://nf-co.re/modules", + "_key": "77bf0a155166", + "_type": "link" + }, + { + "_type": "link", + "href": "https://nf-co.re/pipelines", + "_key": "70f0ef63311f" + } + ], + "children": [ + { + "_key": "0bfb01e85e81", + "_type": "span", + "marks": [], + "text": "In the nf-core project, there are almost a " + }, + { + "_key": "5f4d6cc147af", + "_type": "span", + "marks": [ + "77bf0a155166" + ], + "text": "thousand modules" + }, + { + "text": " ready to plug in your pipeline, together with ", + "_key": "27d12fb3edc7", + "_type": "span", + "marks": [] + }, + { + "text": "dozens of full-featured pipelines", + "_key": "61260fa7e7e9", + "_type": "span", + "marks": [ + "70f0ef63311f" + ] + }, + { + "marks": [], + "text": ". However, in many situations, you'll need a custom pipeline. With that in mind, it's very useful to master the skills of Nextflow scripting so that you can take advantage of everything that is available, both building new pipelines and modifying public ones.", + "_key": "17fda30e3afd", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "2830d86ace99", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "ae5ca96f544b", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "185df3b0ca51", + "_type": "span", + "marks": [], + "text": "Exciting experience!" + } + ], + "_type": "block", + "style": "h2", + "_key": "2f1b0f0ec8f0" + }, + { + "_type": "block", + "style": "normal", + "_key": "ada74de71e55", + "markDefs": [], + "children": [ + { + "text": "It was an amazing experience to see what each participant had worked on for their final projects! 🤯 They were all able to master the skills required to write Nextflow pipelines in real-life scenarios, which can continue to be used well after the end of the course. For people just starting their adventure with Nextflow, it can feel overwhelming to use nf-core tools with all the associated best practices, but students surprised me by using nf-core tools from the very beginning and having their project almost perfectly fitting the best practices 🤩", + "_key": "5aeb873e4d94", + "_type": "span", + "marks": [] + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "f21cfe6563d3" + } + ], + "_type": "block", + "style": "normal", + "_key": "79b864798589" + }, + { + "_type": "block", + "style": "normal", + "_key": "0704dcad0c59", + "markDefs": [ + { + "_type": "link", + "href": "mailto:community@seqera.io", + "_key": "51fd15d277e6" + } + ], + "children": [ + { + "text": "We’d love to help out with more university bioinformatics courses like this. 
If you think your institution could benefit from such an experience, please don't hesitate to reach out to us at ", + "_key": "a4e4fc9f1900", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "51fd15d277e6" + ], + "text": "community@seqera.io", + "_key": "e407f6a7ccc9" + }, + { + "_type": "span", + "marks": [], + "text": ". We would love to hear from you!", + "_key": "282f228555fb" + } + ] + } + ], + "_updatedAt": "2024-09-27T12:42:31Z", + "author": { + "_type": "reference", + "_ref": "mNsm4Vx1W1Wy6aYYkroetD" + }, + "_createdAt": "2024-09-25T14:17:28Z", + "tags": [ + { + "_key": "31f07dee1557", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference" + } + ], + "title": "Nextflow goes to university!", + "_rev": "Ot9x7kyGeH5005E3MJ9Q5P" + }, + { + "tags": [ + { + "_key": "26346f7cd8d2", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference" + } + ], + "_id": "9b87bf9b257e", + "_rev": "mvya9zzDXWakVjnX4hZGhC", + "publishedAt": "2020-12-01T07:00:00.000Z", + "author": { + "_type": "reference", + "_ref": "ntV3A5cVsWRByk7zltFbVD" + }, + "_updatedAt": "2024-09-26T09:02:23Z", + "title": "Learning Nextflow in 2020", + "_createdAt": "2024-09-25T14:15:51Z", + "meta": { + "slug": { + "current": "learning-nextflow-in-2020" + } + }, + "_type": "blogPost", + "body": [ + { + "_key": "ec243eea7932", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "With the year nearly over, we thought it was about time to pull together the best-of-the-best guide for learning Nextflow in 2020. These resources will support anyone in the journey from total noob to Nextflow expert so this holiday season, give yourself or someone you know the gift of learning Nextflow!", + "_key": "d7828361705f" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "6874645080af", + "children": [ + { + "_key": "2e31a88f26ba", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "Prerequisites to get started", + "_key": "1109639f73a7" + } + ], + "_type": "block", + "style": "h3", + "_key": "658684f4efb2" + }, + { + "_type": "block", + "style": "normal", + "_key": "8b239fea3cb4", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "We recommend that learners are comfortable with using the command line and the basic concepts of a scripting language such as Python or Perl before they start writing pipelines. Nextflow is widely used for bioinformatics applications, and the examples in these guides often focus on applications in these topics. However, Nextflow is now adopted in a number of data-intensive domains such as radio astronomy, satellite imaging and machine learning. No domain expertise is expected.", + "_key": "bf3b7594f19e" + } + ] + }, + { + "_key": "b898b56902c5", + "children": [ + { + "text": "", + "_key": "44bddfeb8de5", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "2d5786a4bef3", + "children": [ + { + "_type": "span", + "text": "Time commitment", + "_key": "9747a82532be" + } + ], + "_type": "block", + "style": "h3" + }, + { + "style": "normal", + "_key": "11a27e290599", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "We estimate that the speediest of learners can complete the material in around 12 hours. It all depends on your background and how deep you want to dive into the rabbit-hole! 
Most of the content is introductory with some more advanced dataflow and configuration material in the workshops and patterns sections.", + "_key": "0665f71fc382" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "7eb8fa097dd4" + } + ], + "_type": "block", + "style": "normal", + "_key": "64d1a189bb3c" + }, + { + "children": [ + { + "text": "Overview of the material", + "_key": "9e3870826e12", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "3d5c12073549" + }, + { + "_type": "block", + "style": "normal", + "_key": "5323d4647624", + "listItem": "bullet", + "children": [ + { + "text": "Why learn Nextflow?", + "_key": "d677eb40c233", + "_type": "span", + "marks": [] + }, + { + "_key": "2796b9f568b7", + "_type": "span", + "marks": [], + "text": "Introduction to Nextflow - AWS HPC Conference 2020 (8m)" + }, + { + "text": "A simple RNA-Seq hands-on tutorial (2h)", + "_key": "dbe7d88a5c46", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [], + "text": "Full-immersion workshop (8h)", + "_key": "1a32933ff283" + }, + { + "_key": "795ea7702b80", + "_type": "span", + "marks": [], + "text": "Nextflow advanced implementation Patterns (2h)" + }, + { + "marks": [], + "text": "Other resources", + "_key": "ee6401fd0f6d", + "_type": "span" + }, + { + "_key": "a92255f8c98c", + "_type": "span", + "marks": [], + "text": "Community and Support" + } + ] + }, + { + "style": "normal", + "_key": "044ffc61c1c4", + "children": [ + { + "_type": "span", + "text": "", + "_key": "28deffc5d286" + } + ], + "_type": "block" + }, + { + "_key": "118b0e18f56c", + "children": [ + { + "text": "1. Why learn Nextflow?", + "_key": "ed3d2b9857dd", + "_type": "span" + } + ], + "_type": "block", + "style": "h3" + }, + { + "children": [ + { + "text": "Nextflow is an open-source workflow framework for writing and scaling data-intensive computational pipelines. It is designed around the Linux philosophy of simple yet powerful command-line and scripting tools that, when chained together, facilitate complex data manipulations. Combined with support for containerization, support for major cloud providers and on-premise architectures, Nextflow simplifies the writing and deployment of complex data pipelines on any infrastructure.", + "_key": "20f3dceceaf4", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "909d6320df79", + "markDefs": [] + }, + { + "children": [ + { + "text": "", + "_key": "47d19dfebd0b", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "e5820419d1f3" + }, + { + "style": "normal", + "_key": "18898467f90b", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The following are some high-level motivations on why people choose to adopt Nextflow:", + "_key": "fe7607bb7b02", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "885d749eeccf", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "4e0b044b2a22" + }, + { + "_key": "859fac37688b", + "listItem": "bullet", + "children": [ + { + "_type": "span", + "marks": [], + "text": "Integrating Nextflow in your analysis workflows helps you implement **reproducible** pipelines. 
Nextflow pipelines follow FDA repeatability and reproducibility guidelines with version-control and containers to manage all software dependencies.", + "_key": "db3825f8821a" + }, + { + "_type": "span", + "marks": [], + "text": "Avoid vendor lock-in by ensuring portability. Nextflow is **portable**; the same pipeline written on a laptop can quickly scale to run HPC cluster, Amazon and Google cloud services, and Kubernetes. The code stays constant across varying infrastructures allowing collaboration and avoiding lock-in.", + "_key": "14ffde60f039" + }, + { + "_key": "1dead3afb753", + "_type": "span", + "marks": [], + "text": "It is **scalable** allowing the parallelization of tasks using the dataflow paradigm without having to hard-code the pipeline to a specific platform architecture." + }, + { + "_type": "span", + "marks": [], + "text": "It is **flexible** and supports scientific workflow requirements like caching processes to prevent re-computation, and workflow reports to better understand the workflows’ executions.", + "_key": "eb963dc301df" + }, + { + "_key": "772bc4647b89", + "_type": "span", + "marks": [], + "text": "It is **growing fast** and has **long-term support**. Developed since 2013 by the same team, the Nextflow ecosystem is expanding rapidly." + }, + { + "_type": "span", + "marks": [], + "text": "It is **open source** and licensed under Apache 2.0. You are free to use it, modify it and distribute it.", + "_key": "d63e3b6eeb83" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "a54132d2dc43", + "children": [ + { + "_key": "a2bc438af49d", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "style": "h3", + "_key": "d84c1dc318ad", + "children": [ + { + "_type": "span", + "text": "2. Introduction to Nextflow from the HPC on AWS Conference 2020", + "_key": "5cce81887a6a" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "96b00b00b53f", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "This short YouTube video provides a general overview of Nextflow, the motivations behind its development and a demonstration of some of the latest features.", + "_key": "435a3a7711ea" + } + ] + }, + { + "_key": "00959684a85a", + "children": [ + { + "_type": "span", + "text": "", + "_key": "4af8305831ca" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "_key": "857e1888f93e" + }, + { + "children": [ + { + "_key": "829d63c098c8", + "_type": "span", + "text": "3. A simple RNA-Seq hands-on tutorial" + } + ], + "_type": "block", + "style": "h3", + "_key": "a99864bccac5" + }, + { + "children": [ + { + "text": "This hands-on tutorial from Seqera Labs will guide you through implementing a proof-of-concept RNA-seq pipeline. The goal is to become familiar with basic concepts, including how to define parameters, use channels for data and write processes to perform tasks. 
It includes all scripts, data and resources and is perfect for getting a flavor for Nextflow.", + "_key": "c0e963ac9edf", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "ae0844a419ba", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "c6bcd469bf91", + "children": [ + { + "text": "", + "_key": "d275a00ceea1", + "_type": "span" + } + ] + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/seqeralabs/nextflow-tutorial", + "_key": "4df27a6b0eaf" + } + ], + "children": [ + { + "marks": [ + "4df27a6b0eaf" + ], + "text": "Tutorial link on GitHub", + "_key": "0cfcb213179d", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "5851d7c30f34" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "0e614a5583a4" + } + ], + "_type": "block", + "style": "normal", + "_key": "828b83a283a8" + }, + { + "_type": "block", + "style": "h3", + "_key": "fab772cee5a7", + "children": [ + { + "_key": "c4fb9413e1d7", + "_type": "span", + "text": "4. Full-immersion workshop" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "8f32d2a5543a", + "markDefs": [], + "children": [ + { + "_key": "7e65b1d64a4d", + "_type": "span", + "marks": [], + "text": "Here you’ll dive deeper into Nextflow’s most prominent features and learn how to apply them. The full workshop includes an excellent section on containers, how to build them and how to use them with Nextflow. The written materials come with examples and hands-on exercises. Optionally, you can also follow with a series of videos from a live training workshop." + } + ] + }, + { + "children": [ + { + "text": "", + "_key": "c69b4f60efd2", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "1a3f00ffcc0b" + }, + { + "style": "normal", + "_key": "51264ab39b18", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The workshop includes topics on:", + "_key": "e209d207767e" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "fae6c7dfc61e", + "children": [ + { + "_type": "span", + "text": "", + "_key": "315f4bfd4008" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "9d931bd746b5", + "listItem": "bullet", + "children": [ + { + "_key": "81507a7f49b4", + "_type": "span", + "marks": [], + "text": "Environment Setup" + }, + { + "_type": "span", + "marks": [], + "text": "Basic NF Script and Concepts", + "_key": "c57f614b47c5" + }, + { + "_key": "77de9cbed9d1", + "_type": "span", + "marks": [], + "text": "Nextflow Processes" + }, + { + "_type": "span", + "marks": [], + "text": "Nextflow Channels", + "_key": "f6f5d745d345" + }, + { + "_key": "60caff5fde15", + "_type": "span", + "marks": [], + "text": "Nextflow Operators" + }, + { + "text": "Basic RNA-Seq pipeline", + "_key": "5b363f4f8134", + "_type": "span", + "marks": [] + }, + { + "marks": [], + "text": "Containers & Conda", + "_key": "032d8689fcfc", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": "Nextflow Configuration", + "_key": "5682300c2153" + }, + { + "marks": [], + "text": "On-premise & Cloud Deployment", + "_key": "aa44035cde13", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": "DSL 2 & Modules", + "_key": "43c582151a19" + }, + { + "marks": [], + "text": "[GATK hands-on exercise](https://seqera.io/training/handson/)", + "_key": "473cfe5962c6", + "_type": "span" + } + ] + }, + { + "children": [ + { + "_key": 
"609077460b9b", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "ba5280cb42e0" + }, + { + "_key": "16aa1d9198ab", + "markDefs": [ + { + "href": "https://seqera.io/training", + "_key": "d0c78ac0e65a", + "_type": "link" + }, + { + "_type": "link", + "href": "https://www.youtube.com/playlist?list=PLPZ8WHdZGxmUv4W8ZRlmstkZwhb_fencI", + "_key": "236ce4aa7caa" + } + ], + "children": [ + { + "_key": "8781d0eb8f42", + "_type": "span", + "marks": [ + "d0c78ac0e65a" + ], + "text": "Workshop" + }, + { + "_type": "span", + "marks": [], + "text": " & ", + "_key": "13ee9a510161" + }, + { + "text": "YouTube playlist", + "_key": "ec6476efb895", + "_type": "span", + "marks": [ + "236ce4aa7caa" + ] + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "b01c20c5f604" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "0ce1bc0ba5cd", + "children": [ + { + "_key": "cb8cc58ca50d", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "df9b986fec26", + "children": [ + { + "text": "5. Nextflow implementation Patterns", + "_key": "92844e4c5be4", + "_type": "span" + } + ], + "_type": "block", + "style": "h3" + }, + { + "_key": "66415f7c8598", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "This advanced section discusses recurring patterns and solutions to many common implementation requirements. Code examples are available with notes to follow along with as well as a GitHub repository.", + "_key": "f8515cfc7299" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "", + "_key": "c53b4ff0689d", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "4b0705b1e73d" + }, + { + "style": "normal", + "_key": "80ee91e3c0a7", + "markDefs": [ + { + "_key": "f1370ba7950d", + "_type": "link", + "href": "http://nextflow-io.github.io/patterns/index.html" + }, + { + "_type": "link", + "href": "https://github.com/nextflow-io/patterns", + "_key": "f8ee24d5440f" + } + ], + "children": [ + { + "text": "Nextflow Patterns", + "_key": "32d0e2a812b4", + "_type": "span", + "marks": [ + "f1370ba7950d" + ] + }, + { + "text": " & ", + "_key": "49c1e2037fc7", + "_type": "span", + "marks": [] + }, + { + "text": "GitHub repository", + "_key": "dcc5829fde32", + "_type": "span", + "marks": [ + "f8ee24d5440f" + ] + }, + { + "_key": "37e8a74bede3", + "_type": "span", + "marks": [], + "text": "." + } + ], + "_type": "block" + }, + { + "_key": "079742e0e59d", + "children": [ + { + "text": "", + "_key": "7ea8298717ba", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "1459fa425f98", + "children": [ + { + "text": "Other resources", + "_key": "82857d03c930", + "_type": "span" + } + ], + "_type": "block", + "style": "h3" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The following resources will help you dig deeper into Nextflow and other related projects like the nf-core community who maintain curated pipelines and a very active Slack channel. There are plenty of Nextflow tutorials and videos online, and the following list is no way exhaustive. 
Please let us know if we are missing something.", + "_key": "4a4ad395d8db" + } + ], + "_type": "block", + "style": "normal", + "_key": "dc8e523a1f91" + }, + { + "_key": "4e3718362e90", + "children": [ + { + "_key": "c167cfdf2ebc", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "b360e2d14fd2", + "children": [ + { + "_type": "span", + "text": "Nextflow docs", + "_key": "a6c8193c3542" + } + ], + "_type": "block", + "style": "h4" + }, + { + "style": "normal", + "_key": "2015f7dbc0e4", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The reference for the Nextflow language and runtime. The docs should be your first point of reference when something is not clear. Newest features are documented in edge documentation pages released every month with the latest stable releases every three months.", + "_key": "82094361ec90", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "6492d22598be", + "children": [ + { + "text": "", + "_key": "e1bfd7776825", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "3a02aab8e359", + "markDefs": [ + { + "_key": "9ea1852c987c", + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/index.html" + }, + { + "_type": "link", + "href": "https://www.nextflow.io/docs/edge/index.html", + "_key": "af39349cf0c0" + } + ], + "children": [ + { + "_key": "1dd344f8209b", + "_type": "span", + "marks": [], + "text": "Latest " + }, + { + "marks": [ + "9ea1852c987c" + ], + "text": "stable", + "_key": "b1e1681300d9", + "_type": "span" + }, + { + "_key": "b42edf4a0427", + "_type": "span", + "marks": [], + "text": " & " + }, + { + "_type": "span", + "marks": [ + "af39349cf0c0" + ], + "text": "edge", + "_key": "2fea2beeb2a7" + }, + { + "_type": "span", + "marks": [], + "text": " documentation.", + "_key": "7ac177ac52ce" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "", + "_key": "653e710aec5e", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "1096679fc93a" + }, + { + "style": "h4", + "_key": "93e5f25f3d19", + "children": [ + { + "text": "nf-core", + "_key": "800bef5d1174", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "nf-core is a growing community of Nextflow users and developers. You can find curated sets of biomedical analysis pipelines built by domain experts with Nextflow, that have passed tests and have been implemented according to best practice guidelines. 
Be sure to sign up to the Slack channel.", + "_key": "9d807c91013d", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "f42732f34f08" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "a358875c8c1e" + } + ], + "_type": "block", + "style": "normal", + "_key": "61a02123dbd6" + }, + { + "children": [ + { + "_type": "span", + "marks": [ + "cce221aede02" + ], + "text": "nf-core website", + "_key": "4f05ba4980d4" + } + ], + "_type": "block", + "style": "normal", + "_key": "f6f4170b4a29", + "markDefs": [ + { + "_type": "link", + "href": "https://nf-co.re", + "_key": "cce221aede02" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "ae3873d1e79f", + "children": [ + { + "_type": "span", + "text": "", + "_key": "de1873afa7a6" + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "Tower Docs", + "_key": "3561cb894e78" + } + ], + "_type": "block", + "style": "h4", + "_key": "7a3a9f709aef" + }, + { + "_type": "block", + "style": "normal", + "_key": "e1ad8c868c71", + "markDefs": [], + "children": [ + { + "_key": "1610ffbfea7d", + "_type": "span", + "marks": [], + "text": "Nextflow Tower is a platform to easily monitor, launch and scale Nextflow pipelines on cloud providers and on-premise infrastructure. The documentation provides details on setting up compute environments, monitoring pipelines and launching using either the web graphic interface or API." + } + ] + }, + { + "style": "normal", + "_key": "eb8798cf0866", + "children": [ + { + "_type": "span", + "text": "", + "_key": "b915de4027d2" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "38d78b8ce216", + "markDefs": [ + { + "_type": "link", + "href": "http://help.tower.nf", + "_key": "2cac74f21ed1" + } + ], + "children": [ + { + "_key": "8b71355e37ec", + "_type": "span", + "marks": [ + "2cac74f21ed1" + ], + "text": "Nextflow Tower documentation" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "ed027a236597", + "children": [ + { + "_type": "span", + "text": "", + "_key": "eed280c4a5fd" + } + ] + }, + { + "style": "h4", + "_key": "ee0086be2e84", + "children": [ + { + "text": "Nextflow Biotech Blueprint by AWS", + "_key": "a901b450219c", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "173119383a87", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "A quickstart for deploying a genomics analysis environment on Amazon Web Services (AWS) cloud, using Nextflow to create and orchestrate analysis workflows and AWS Batch to run the workflow processes.", + "_key": "f3cc3ad52963" + } + ] + }, + { + "_key": "60e67bc93de1", + "children": [ + { + "_key": "0e51751ef9c7", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "0c322dae0e9b", + "markDefs": [ + { + "_type": "link", + "href": "https://aws.amazon.com/quickstart/biotech-blueprint/nextflow/", + "_key": "197bffe6288b" + } + ], + "children": [ + { + "marks": [ + "197bffe6288b" + ], + "text": "Biotech Blueprint by AWS", + "_key": "c88e87317e13", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "12abd7a9004f", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "e10a33221640" + }, + { + "_key": "ba0967762182", + "children": [ + { + "_type": "span", + "text": "Running Nextflow by Google Cloud", + "_key": "40ecff94dcb1" + } + ], + "_type": 
"block", + "style": "h4" + }, + { + "_key": "f2382c1e02d2", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Google Cloud Nextflow step-by-step guide to launching Nextflow Pipelines in Google Cloud.", + "_key": "565dac2477f8" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "c0db76389a3a", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "1dfc00b9bf2c" + }, + { + "children": [ + { + "text": "Nextflow on Google Cloud ", + "_key": "0912e046fb90", + "_type": "span", + "marks": [ + "bad8c87da0c5" + ] + } + ], + "_type": "block", + "style": "normal", + "_key": "563aec7a4eaf", + "markDefs": [ + { + "_type": "link", + "href": "https://cloud.google.com/life-sciences/docs/tutorials/nextflow", + "_key": "bad8c87da0c5" + } + ] + }, + { + "style": "normal", + "_key": "b9a08434ef66", + "children": [ + { + "text": "", + "_key": "a3268f8eec00", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "h4", + "_key": "ab3e00f0e421", + "children": [ + { + "text": "Awesome Nextflow", + "_key": "f0b59f2c016d", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "713dde7e4800", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "A collections of Nextflow based pipelines and other resources.", + "_key": "39e255979720" + } + ], + "_type": "block" + }, + { + "_key": "224951b3ba68", + "children": [ + { + "_type": "span", + "text": "", + "_key": "543adaea9638" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nextflow-io/awesome-nextflow", + "_key": "fb2bf93afe89" + } + ], + "children": [ + { + "marks": [ + "fb2bf93afe89" + ], + "text": "Awesome Nextflow", + "_key": "5fa6c2982b91", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "8d84a704fbed" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "4e9e103c7c76" + } + ], + "_type": "block", + "style": "normal", + "_key": "f873fb78fc41" + }, + { + "_key": "6c611be6b143", + "children": [ + { + "_type": "span", + "text": "Community and support", + "_key": "4f053d76899e" + } + ], + "_type": "block", + "style": "h3" + }, + { + "style": "normal", + "_key": "6eab3043a09c", + "listItem": "bullet", + "children": [ + { + "_key": "e243b4e16933", + "_type": "span", + "marks": [], + "text": "Nextflow [Gitter channel](https://gitter.im/nextflow-io/nextflow)" + }, + { + "text": "Nextflow [Forums](https://groups.google.com/forum/#!forum/nextflow)", + "_key": "836d5c4d51d8", + "_type": "span", + "marks": [] + }, + { + "_key": "a06702e87dd8", + "_type": "span", + "marks": [], + "text": "[nf-core Slack](https://nfcore.slack.com/)" + }, + { + "_type": "span", + "marks": [], + "text": "Twitter [@nextflowio](https://twitter.com/nextflowio?lang=en)", + "_key": "c8d0f13301ee" + }, + { + "_type": "span", + "marks": [], + "text": "[Seqera Labs](https://www.seqera.io) technical support & consulting", + "_key": "77fbea1370be" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "9b17fadfabcf", + "children": [ + { + "_type": "span", + "text": "", + "_key": "4fae4f62adb7" + } + ] + }, + { + "style": "normal", + "_key": "a148e60e0c75", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Nextflow is a community-driven project. 
The list of links below has been collated from a diverse collection of resources and experts to guide you in learning Nextflow. If you have any suggestions, please make a pull request to this page on GitHub.", + "_key": "2a88c6dbbd39", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "acf60965a11e", + "children": [ + { + "_type": "span", + "text": "", + "_key": "a0a6383a2c71" + } + ] + }, + { + "_key": "87f83f89d66f", + "markDefs": [], + "children": [ + { + "_key": "9157e1c924bc", + "_type": "span", + "marks": [], + "text": "Also stay tuned for our upcoming post, where we will discuss the ultimate Nextflow development environment." + } + ], + "_type": "block", + "style": "normal" + } + ] + }, + { + "publishedAt": "2023-10-18T06:00:00.000Z", + "author": { + "_ref": "drafts.phil-ewels", + "_type": "reference" + }, + "_createdAt": "2024-09-25T14:17:08Z", + "_rev": "2PruMrLMGpvZP5qAknm7x9", + "title": "Introducing community.seqera.io", + "body": [ + { + "_key": "af2c864aced4", + "markDefs": [ + { + "_key": "4aad2eff77d4", + "_type": "link", + "href": "https://community.seqera.io/" + } + ], + "children": [ + { + "marks": [], + "text": "We are very excited to introduce the ", + "_key": "e34333baff99", + "_type": "span" + }, + { + "marks": [ + "4aad2eff77d4" + ], + "text": "Seqera community forum", + "_key": "ab2f50b90a80", + "_type": "span" + }, + { + "marks": [], + "text": " - the new home of the Nextflow community!", + "_key": "c14622f4f233", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "155b23327a72", + "children": [ + { + "_key": "af4ade57adbd", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "1a03076bc759", + "markDefs": [ + { + "_type": "link", + "href": "https://community.seqera.io/", + "_key": "e74404e2eb95" + } + ], + "children": [ + { + "marks": [ + "e74404e2eb95" + ], + "text": "community.seqera.io", + "_key": "73b05bae50d3", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "4e77ed56b803", + "children": [ + { + "_type": "span", + "text": "", + "_key": "f9bf2c9506d1" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_key": "7c1d5d497e41", + "_type": "span", + "marks": [], + "text": "The growth of the Nextflow community over recent years has been phenomenal. The Nextflow Slack organization was launched in early 2022 and has already reached a membership of nearly 3,000 members. As we look ahead to growing to 5,000 and even 50,000, we are making a new tool available to the community: a community forum." + } + ], + "_type": "block", + "style": "normal", + "_key": "bf9423671615" + }, + { + "_key": "858f7adffc4f", + "children": [ + { + "text": "", + "_key": "17726e6bf75d", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "We expect the new forum to coexist with the Nextflow Slack. The forum will be great at medium-format discussion, whereas Slack is largely designed for short-term ephemeral conversations. 
We want to support this growth of the community and believe the new forum will allow us to scale.", + "_key": "c5768d76da1c", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "fc5b290a50a4", + "markDefs": [] + }, + { + "_key": "9c4529845186", + "children": [ + { + "_type": "span", + "text": "", + "_key": "9b2e7cb27ae1" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Discourse is an open-source, web-based platform designed for online community discussions and forum-style interactions. Discourse offers a user-friendly interface, real-time notifications, and a wide range of customization options. It prioritizes healthy and productive conversations by promoting user-friendly features, such as trust levels, gamification, and robust moderation tools. Discourse is well known for its focus on fostering engaging and respectful discussions and already caters to many large developer communities. It’s able to serve immense groups, giving us confidence that it will meet the needs of our growing developer community just as well. We believe that Discourse is a natural fit for the evolution of the Nextflow community.", + "_key": "9b429a450948" + } + ], + "_type": "block", + "style": "normal", + "_key": "ebcbe96a4209" + }, + { + "_key": "29b18444f5a7", + "children": [ + { + "_type": "span", + "text": "", + "_key": "62d28e0b31de" + } + ], + "_type": "block", + "style": "normal" + }, + { + "alt": "", + "_key": "b475d3efbbb1", + "asset": { + "_ref": "image-28cbe24b43f8006f77554c66508689cefd2bda05-2880x1800-png", + "_type": "reference" + }, + "_type": "image" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The community forum offers many exciting new features. Here are some of the things you can expect:", + "_key": "119cee315a2a" + } + ], + "_type": "block", + "style": "normal", + "_key": "b0360160729a" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "06c6647ff96c" + } + ], + "_type": "block", + "style": "normal", + "_key": "887bf39895b0" + }, + { + "_type": "block", + "style": "normal", + "_key": "9be2362eeaf9", + "listItem": "bullet", + "children": [ + { + "_type": "span", + "marks": [], + "text": "**Open content:** Content on the new forum is public – accessible without login, indexed by Google, and can be linked to directly. This means that it will be much easier to find answers to your problems, as well as share solutions on other platforms.", + "_key": "54e5230a4547" + }, + { + "_type": "span", + "marks": [], + "text": "**Don’t ask the same thing twice:** It’s not always easy to find answers when there’s a lot of content available. The community forum helps you by suggesting similar topics as you write a new post. An upcoming [Discourse AI Bot](https://www.discourse.org/plugins/ai.html) may even allow you to ask questions using natural language in the future!", + "_key": "791d58ab71ae" + }, + { + "marks": [], + "text": "**Stay afloat:** The community forum will ensure developers have a space where they can post without fear that what they write might be drowned out, and where anything that our community finds useful will rise to the top of the list. 
Discourse will give life to threads with high-quality content that may have otherwise gone unnoticed and lost in a sea of new posts.", + "_key": "c66fbc153f27", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": "**Better organized:** The forum model for categories, tags, threads, and quoting forces conversations to be structured. Many questions involve the broader Nextflow ecosystem, tagging with multiple topics will cut through the noise and allow people to participate in targeted and well-labeled discussions. Importantly, maintainers can move miscategorized posts without asking the original author to delete and write again.", + "_key": "28443b6942db" + }, + { + "_key": "9b0a042d9aec", + "_type": "span", + "marks": [], + "text": "**Multi-product:** The forum has categories for Nextflow but also [Seqera Platform](https://seqera.io/platform/), [MultiQC](https://seqera.io/multiqc/), [Wave](https://seqera.io/wave/), and [Fusion](https://seqera.io/fusion/). Questions that involve multiple Seqera products can now span these boundaries, and content can be shared between posts easily." + }, + { + "text": "**Community recognition:** The community forum will encourage a healthy ecosystem of developers that provides value to everyone involved and rewards the most active users. The new forum encourages positive community behaviors through features such as badges, a trust system, and community moderation. There’s even a [community leaderboard](https://community.seqera.io/leaderboard/)! We plan to gradually introduce additional features over time as adoption grows.", + "_key": "0703c64871ba", + "_type": "span", + "marks": [] + } + ] + }, + { + "style": "normal", + "_key": "e6a92b958392", + "children": [ + { + "_key": "ae5a0e27e721", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "137813757803", + "markDefs": [ + { + "_key": "a3341627d352", + "_type": "link", + "href": "https://community.seqera.io/c/community/site-feedback/2" + }, + { + "href": "https://community.seqera.io/", + "_key": "63aacade928b", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "Online discussion platforms have been the beating heart of the Nextflow community from its inception. The first was a Google groups email list, which was followed by the Gitter instant messaging platform, GitHub Discussions, and most recently, Slack. We’re thrilled to embark on this new chapter of the Nextflow community – let us know what you think and ask any questions you might have in the ", + "_key": "87763473690d", + "_type": "span" + }, + { + "_key": "d076a1b0823e", + "_type": "span", + "marks": [ + "a3341627d352" + ], + "text": "“Site Feedback” forum category" + }, + { + "text": "! 
Join us today at ", + "_key": "9cac44f4d60d", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "63aacade928b" + ], + "text": "https://community.seqera.io", + "_key": "d954ac0ada57" + }, + { + "_type": "span", + "marks": [], + "text": " for a new and improved developer experience.", + "_key": "3d303b376c5e" + } + ], + "_type": "block" + }, + { + "_key": "81b9c93e1169", + "children": [ + { + "_type": "span", + "text": "", + "_key": "c117e901c960" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "href": "https://community.seqera.io/", + "_key": "101428e19487", + "_type": "link" + } + ], + "children": [ + { + "marks": [ + "101428e19487" + ], + "text": "Visit the Seqera community forum", + "_key": "48829f8aa46a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "f086819b9410" + } + ], + "tags": [ + { + "_type": "reference", + "_key": "793adedbd536", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493" + }, + { + "_ref": "3d25991c-f357-442b-a5fa-6c02c3419f88", + "_type": "reference", + "_key": "8416df80d2a6" + } + ], + "meta": { + "slug": { + "current": "community-forum" + } + }, + "_type": "blogPost", + "_id": "9d92d5bf3f41", + "_updatedAt": "2024-09-26T09:03:40Z" + }, + { + "author": { + "_ref": "paolo-di-tommaso", + "_type": "reference" + }, + "_id": "9de65bebe53d", + "meta": { + "slug": { + "current": "using-docker-in-hpc-cluster" + }, + "description": "Scientific data analysis pipelines are rarely composed by a single piece of software. In a real world scenario, computational pipelines are made up of multiple stages, each of which can execute many different scripts, system commands and external tools deployed in a hosting computing environment, usually an HPC cluster." + }, + "_type": "blogPost", + "publishedAt": "2014-11-06T07:00:00.000Z", + "_createdAt": "2024-09-25T14:14:53Z", + "_rev": "2PruMrLMGpvZP5qAknm9s4", + "_updatedAt": "2024-10-02T13:38:31Z", + "title": "Using Docker for scientific data analysis in an HPC cluster", + "tags": [], + "body": [ + { + "style": "normal", + "_key": "a25b08b8f56a", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Scientific data analysis pipelines are rarely composed by a single piece of software. In a real world scenario, computational pipelines are made up of multiple stages, each of which can execute many different scripts, system commands and external tools deployed in a hosting computing environment, usually an HPC cluster.", + "_key": "21a8e2818342" + } + ], + "_type": "block" + }, + { + "_key": "907655974abe", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "b9c4c3807595", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "49cc2b96a907", + "markDefs": [], + "children": [ + { + "_key": "530e70036d22", + "_type": "span", + "marks": [], + "text": "As I work as a research engineer in a bioinformatics lab I experience on a daily basis the difficulties related on keeping such a piece of software consistent." + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "2735c64f6cdc", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "96364a8569d8" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Computing environments can change frequently in order to test new pieces of software or maybe because system libraries need to be updated. 
For this reason replicating the results of a data analysis over time can be a challenging task.", + "_key": "3934f31485ce" + } + ], + "_type": "block", + "style": "normal", + "_key": "fc13a6bbc553" + }, + { + "style": "normal", + "_key": "685177a023e3", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "145c02a03cd3", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "848fa4196541", + "markDefs": [ + { + "href": "http://www.docker.com", + "_key": "857bdc2f8141", + "_type": "link" + } + ], + "children": [ + { + "marks": [ + "857bdc2f8141" + ], + "text": "Docker", + "_key": "7a4165200edd", + "_type": "span" + }, + { + "_key": "4c988b03fe0d", + "_type": "span", + "marks": [], + "text": " has emerged recently as a new type of virtualisation technology that allows one to create a self-contained runtime environment. There are plenty of examples showing the benefits of using it to run application services, like web servers or databases." + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "e4d8376e4189", + "markDefs": [], + "children": [ + { + "_key": "b78bfd6ffe15", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "However it seems that few people have considered using Docker for the deployment of scientific data analysis pipelines on distributed cluster of computer, in order to simplify the development, the deployment and the replicability of this kind of applications.", + "_key": "2ef9662fe703" + } + ], + "_type": "block", + "style": "normal", + "_key": "966b19a5e97e", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "2fc19aabd0e0", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "3414ca206b77", + "_type": "span" + } + ] + }, + { + "markDefs": [ + { + "_type": "link", + "href": "http://www.crg.eu", + "_key": "cd0e0b3d5c50" + } + ], + "children": [ + { + "text": "For this reason I wanted to test the capabilities of Docker to solve these problems in the cluster available in our ", + "_key": "fc61cb5812a1", + "_type": "span", + "marks": [] + }, + { + "_key": "7a360bced9bb", + "_type": "span", + "marks": [ + "cd0e0b3d5c50" + ], + "text": "institute" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "d661f0453985" + } + ], + "_type": "block", + "style": "normal", + "_key": "1cc8b6454482" + }, + { + "_type": "block", + "style": "normal", + "_key": "7629d3de1266", + "markDefs": [], + "children": [ + { + "_key": "76da63e7914d", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "style": "h2", + "_key": "315712d85ed0", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Method", + "_key": "8bb982d2616d" + } + ], + "_type": "block" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "http://www.univa.com/products/grid-engine.php", + "_key": "30aafdbb8c01" + }, + { + "_key": "528676649d6b", + "_type": "link", + "href": "http://registry.hub.docker.com" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The Docker engine has been installed in each node of our cluster, that runs a ", + "_key": "30e6b189c0f7" + }, + { + "marks": [ + "30aafdbb8c01" + ], + "text": "Univa grid engine", + "_key": "c19c3428be52", + "_type": "span" + }, + { + "marks": [], + "text": " resource manager. 
A Docker private registry instance has also been installed in our internal network, so that images can be pulled from the local repository in a much faster way when compared to the public ", + "_key": "d254e73cbce3", + "_type": "span" + }, + { + "text": "Docker registry", + "_key": "2b1005b58d1c", + "_type": "span", + "marks": [ + "528676649d6b" + ] + }, + { + "marks": [], + "text": ".", + "_key": "7d2a6c6bf231", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "68959c99ae6c" + }, + { + "style": "normal", + "_key": "b9e6b3a330c5", + "markDefs": [], + "children": [ + { + "_key": "30c775a9d7c0", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "e83ed6476beb", + "markDefs": [ + { + "_key": "51b40f4451f1", + "_type": "link", + "href": "http://www.gridengine.eu/mangridengine/htmlman5/complex.html" + } + ], + "children": [ + { + "_key": "73e7928d35b1", + "_type": "span", + "marks": [], + "text": "Moreover the Univa grid engine has been configured with a custom " + }, + { + "marks": [ + "51b40f4451f1" + ], + "text": "complex", + "_key": "6923f788cefe", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " resource type. This allows us to request a specific Docker image as a resource type while submitting a job execution to the cluster.", + "_key": "d0ad3d4b82a1" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "d1e6e8237056", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "c75dc5d9e29b", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "a94140a87dce", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The Docker image is requested as a ", + "_key": "6e94c79b603f" + }, + { + "marks": [ + "em" + ], + "text": "soft", + "_key": "89a626adb300", + "_type": "span" + }, + { + "_key": "32156455e022", + "_type": "span", + "marks": [], + "text": " resource, by doing that the UGE scheduler tries to run a job to a node where that image has already been pulled, otherwise a lower priority is given to it and it is executed, eventually, by a node where the specified Docker image is not available. This will force the node to pull the required image from the local registry at the time of the job execution." 
+ } + ] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "ae16149b3b9f", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "28bde6dcc640" + }, + { + "markDefs": [ + { + "_key": "d99d7cf25964", + "_type": "link", + "href": "https://github.com/cbcrg/piper-nf" + } + ], + "children": [ + { + "text": "This environment has been tested with ", + "_key": "ccd5d71ff786", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "d99d7cf25964" + ], + "text": "Piper-NF", + "_key": "9b9942a451b5" + }, + { + "marks": [], + "text": ", a genomic pipeline for the detection and mapping of long non-coding RNAs.", + "_key": "dc2383411fc4", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "96a9dff42d0c" + }, + { + "children": [ + { + "_key": "4ce27f61de6e", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "f8e6f4bf0cf0", + "markDefs": [] + }, + { + "style": "normal", + "_key": "467bfe39b228", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The pipeline runs on top of Nextflow, which takes care of the tasks parallelization and submits the jobs for execution to the Univa grid engine.", + "_key": "298907ca1526" + } + ], + "_type": "block" + }, + { + "_key": "c58da74a21dd", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "9f37ce0dcfe9" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "text": "The Piper-NF code wasn't modified in order to run it using Docker. Nextflow is able to handle it automatically. The Docker containers are run in such a way that the tasks result files are created in the hosting file system, in other words it behaves in a completely transparent manner without requiring extra steps or affecting the flow of the pipeline execution.", + "_key": "bf36dae331c5", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "63571e36d056" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "e466207d5bc4", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "506ba08f5671" + }, + { + "markDefs": [ + { + "_key": "e6d913eae644", + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/docker.html" + } + ], + "children": [ + { + "text": "It was only necessary to specify the Docker image (or images) to be used in the Nextflow configuration file for the pipeline. 
You can read more about this at ", + "_key": "25639f706326", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "e6d913eae644" + ], + "text": "this link", + "_key": "222284abb483" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "4ff5371122f4" + } + ], + "_type": "block", + "style": "normal", + "_key": "1fd1bfa59d24" + }, + { + "_key": "7a535c4f399d", + "markDefs": [], + "children": [ + { + "_key": "04405ed2f160", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "59fd51c05b3d", + "_type": "span", + "marks": [], + "text": "Results" + } + ], + "_type": "block", + "style": "h2", + "_key": "23b8dfd2f6fd", + "markDefs": [] + }, + { + "children": [ + { + "marks": [], + "text": "To benchmark the impact of Docker on the pipeline performance a comparison was made running it with and without Docker.", + "_key": "80f513ffb104", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "c514a23baab3", + "markDefs": [] + }, + { + "_key": "1a3121b4c3bd", + "markDefs": [], + "children": [ + { + "_key": "0fc80faa04e1", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "dc2f2fa26c98", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "For this experiment 10 cluster nodes were used. The pipeline execution launches around 100 jobs, and it was run 5 times by using the same dataset with and without Docker.", + "_key": "04272fc661bf" + } + ] + }, + { + "_key": "6e5071ea700e", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "8694183d05db" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "fcc69544fbe0", + "markDefs": [], + "children": [ + { + "_key": "0bfbde9e5b22", + "_type": "span", + "marks": [], + "text": "The average execution time without Docker was 28.6 minutes, while the average pipeline execution time, running each job in a Docker container, was 32.2 minutes. Thus, by using Docker the overall execution time increased by something around 12.5%." + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "8010702b167d" + } + ], + "_type": "block", + "style": "normal", + "_key": "de87f9734107", + "markDefs": [] + }, + { + "children": [ + { + "text": "It is important to note that this time includes both the Docker bootstrap time, and the time overhead that is added to the task execution by the virtualization layer.", + "_key": "58c8a0547238", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "e9b6ebfb64d6", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "a6c16e43e354" + } + ], + "_type": "block", + "style": "normal", + "_key": "069850deaa54" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "For this reason the actual task run time was measured as well i.e. without including the Docker bootstrap time overhead. In this case, the aggregate average task execution time was 57.3 minutes and 59.5 minutes when running the same tasks using Docker. 
Thus, the time overhead added by the Docker virtualization layer to the effective task run time can be estimated to around 4% in our test.", + "_key": "b9381c8c62e5", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "681b379a78fb" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "f8019ef1d722", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "47fa9b5c3fde", + "markDefs": [] + }, + { + "_key": "833a6d34561a", + "markDefs": [ + { + "_type": "link", + "href": "https://registry.hub.docker.com/repos/cbcrg/", + "_key": "1904cb29f48e" + } + ], + "children": [ + { + "marks": [], + "text": "Keeping the complete toolset required by the pipeline execution within a Docker image dramatically reduced configuration and deployment problems. Also storing these images into the private and ", + "_key": "305dea57c370", + "_type": "span" + }, + { + "text": "public", + "_key": "a893fda2d701", + "_type": "span", + "marks": [ + "1904cb29f48e" + ] + }, + { + "_type": "span", + "marks": [], + "text": " repositories with a unique tag allowed us to replicate the results without the usual burden required to set-up an identical computing environment.", + "_key": "9596447b4002" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "a40887c4fe43", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "07022de84527", + "markDefs": [] + }, + { + "style": "h2", + "_key": "e989ea5a171a", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Conclusion", + "_key": "df3594c6f0cf" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The fast start-up time for Docker containers technology allows one to virtualise a single process or the execution of a bunch of applications, instead of a complete operating system. 
This opens up new possibilities, for example the possibility to "virtualise" distributed job executions in an HPC cluster of computers.", + "_key": "b610b92c2f8d", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "3b29c57d7283" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "274eeadce480" + } + ], + "_type": "block", + "style": "normal", + "_key": "38dd9964c872" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The minimal performance loss introduced by the Docker engine is offset by the advantages of running your analysis in a self-contained and dead easy to reproduce runtime environment, which guarantees the consistency of the results over time and across different computing platforms.", + "_key": "256591933336" + } + ], + "_type": "block", + "style": "normal", + "_key": "61ef627676a3" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "4407a9d4873f" + } + ], + "_type": "block", + "style": "normal", + "_key": "fc7bba825666", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Credits", + "_key": "f49a24119778" + } + ], + "_type": "block", + "style": "h2", + "_key": "74d033d7a132", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "0284aeb960ed", + "markDefs": [], + "children": [ + { + "text": "Thanks to Arnau Bria and the all scientific systems admins team to manage the Docker installation in the CRG computing cluster.", + "_key": "4fcaeb424993", + "_type": "span", + "marks": [] + } + ] + } + ] + }, + { + "publishedAt": "2015-06-09T06:00:00.000Z", + "_createdAt": "2024-09-25T14:14:54Z", + "_rev": "Ot9x7kyGeH5005E3MJ8jOF", + "meta": { + "slug": { + "current": "innovation-in-science-the-story-behind-nextflow" + }, + "description": "Innovation can be viewed as the application of solutions that meet new requirements or existing market needs. Academia has traditionally been the driving force of innovation. Scientific ideas have shaped the world, but only a few of them were brought to market by the inventing scientists themselves, resulting in both time and financial loses." + }, + "body": [ + { + "markDefs": [], + "children": [ + { + "_key": "7baae13aa833", + "_type": "span", + "marks": [], + "text": "Innovation can be viewed as the application of solutions that meet new requirements or existing market needs. Academia has traditionally been the driving force of innovation. Scientific ideas have shaped the world, but only a few of them were brought to market by the inventing scientists themselves, resulting in both time and financial loses." + } + ], + "_type": "block", + "style": "normal", + "_key": "a2be3f1f832f" + }, + { + "style": "normal", + "_key": "1c5af5f2a373", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "ebae1b93c6f8" + } + ], + "_type": "block" + }, + { + "_key": "86bfb0541d66", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Lately there have been several attempts to boost scientific innovation and translation, with most notable in Europe being the Horizon 2020 funding program. The problem with these types of funding is that they are not designed for PhDs and Postdocs, but rather aim to promote the collaboration of senior scientists in different institutions. 
This neglects two very important facts, first and foremost that most of the Nobel prizes were given for discoveries made when scientists were in their 20's / 30's (not in their 50's / 60's). Secondly, innovation really happens when a few individuals (not institutions) face a problem in their everyday life/work, and one day they just decide to do something about it (end-user innovation). Without realizing, these people address a need that many others have. They don’t do it for the money or the glory; they do it because it bothers them! Many examples of companies that started exactly this way include Apple, Google, and Virgin Airlines.", + "_key": "d5b540f77171" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "bcafa24b7e57", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "d01647856223" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "h2", + "_key": "8488bd7cd9a9", + "markDefs": [], + "children": [ + { + "_key": "31f95ec7419f", + "_type": "span", + "marks": [], + "text": "The story of Nextflow" + } + ] + }, + { + "children": [ + { + "text": "Similarly, Nextflow started as an attempt to solve the every-day computational problems we were facing with “big biomedical data” analyses. We wished that our huge and almost cryptic BASH-based pipelines could handle parallelization automatically. In our effort to make that happen we stumbled upon the ", + "_key": "612afe22983d", + "_type": "span", + "marks": [] + }, + { + "text": "Dataflow", + "_key": "4966ecb26fed", + "_type": "span", + "marks": [ + "b57662986664" + ] + }, + { + "_key": "c70f6786231e", + "_type": "span", + "marks": [], + "text": " programming model and Nextflow was created. We were getting furious every time our two-week long pipelines were crashing and we had to re-execute them from the beginning. We, therefore, developed a caching system, which allows Nextflow to resume any pipeline from the last executed step. 
While we were really enjoying developing a new " + }, + { + "_type": "span", + "marks": [ + "8394a05e9476" + ], + "text": "DSL", + "_key": "8943ff92be62" + }, + { + "text": " and creating our own operators, at the same time we were not willing to give up our favorite Perl/Python scripts and one-liners, and thus Nextflow became a polyglot.", + "_key": "d49bb678b414", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "cb13d099e328", + "markDefs": [ + { + "href": "http://en.wikipedia.org/wiki/Dataflow_programming", + "_key": "b57662986664", + "_type": "link" + }, + { + "_key": "8394a05e9476", + "_type": "link", + "href": "http://en.wikipedia.org/wiki/Domain-specific_language" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "96a7ecdef5b4", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "5a312794bc02" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "ca7d8e16e685", + "markDefs": [ + { + "_key": "7480592f1c6f", + "_type": "link", + "href": "https://www.docker.com/" + }, + { + "href": "https://github.com", + "_key": "becc2d4710fa", + "_type": "link" + }, + { + "_type": "link", + "href": "https://bitbucket.org/", + "_key": "b8a8fb2b235d" + } + ], + "children": [ + { + "_key": "25eb0a849b96", + "_type": "span", + "marks": [], + "text": "Another problem we were facing was that our pipelines were invoking a lot of third-party software, making distribution and execution on different platforms a nightmare. Once again while searching for a solution to this problem, we were able to identify a breakthrough technology " + }, + { + "_type": "span", + "marks": [ + "7480592f1c6f" + ], + "text": "Docker", + "_key": "f9fa5b609ab4" + }, + { + "_type": "span", + "marks": [], + "text": ", which is now revolutionizing cloud computation. Nextflow has been one of the first framework, that fully supports Docker containers and allows pipeline execution in an isolated and easy to distribute manner. 
Of course, sharing our pipelines with our friends rapidly became a necessity and so we had to make Nextflow smart enough to support ", + "_key": "5eb896f7e94d" + }, + { + "_key": "43bf929f36c8", + "_type": "span", + "marks": [ + "becc2d4710fa" + ], + "text": "Github" + }, + { + "text": " and ", + "_key": "02635a4a80df", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "b8a8fb2b235d" + ], + "text": "Bitbucket", + "_key": "7a485b41d6c0" + }, + { + "text": " integration.", + "_key": "bd4649e35e59", + "_type": "span", + "marks": [] + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "55ff3fa3c70c", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "41ab4d2246f7", + "_type": "span" + } + ] + }, + { + "children": [ + { + "_key": "ddb4dbc05e9d", + "_type": "span", + "marks": [], + "text": "I don’t know if Nextflow will make as much difference in the world as the Dataflow programming model and Docker container technology are making, but it has already made a big difference in our lives and that is all we ever wanted…" + } + ], + "_type": "block", + "style": "normal", + "_key": "c30a909f5adf", + "markDefs": [] + }, + { + "style": "normal", + "_key": "5ec0d02aaf68", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "43b0fa2ba2a2", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_key": "0b77c16ea339", + "markDefs": [], + "children": [ + { + "_key": "b638fd3a1620", + "_type": "span", + "marks": [], + "text": "Conclusion" + } + ], + "_type": "block", + "style": "h2" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Summarizing, it is a pity that PhDs and Postdocs are the neglected engine of Innovation. They are not empowered to innovate, by identifying and addressing their needs, and to potentially set up commercial solutions to their problems. This fact becomes even sadder when you think that only 3% of Postdocs have a chance to become PIs in the UK. Instead more and more money is being invested into the senior scientists who only require their PhD students and Postdocs to put another step into a well-defined ladder. 
In todays world it seems that ideas, such as Nextflow, will only get funded for their scientific value, not as innovative concepts trying to address a need.", + "_key": "93bca796ff84", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "95b287817bc0" + } + ], + "_updatedAt": "2024-10-02T13:41:40Z", + "title": "Innovation In Science - The story behind Nextflow", + "author": { + "_ref": "7d389002-0fae-4149-98d4-22623b6afbed", + "_type": "reference" + }, + "_type": "blogPost", + "_id": "9e0149c75053" + }, + { + "meta": { + "slug": { + "current": "nextflow-developer-environment" + } + }, + "publishedAt": "2021-03-04T07:00:00.000Z", + "_updatedAt": "2024-09-26T09:02:35Z", + "tags": [ + { + "_key": "7d27c5f42160", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference" + } + ], + "_type": "blogPost", + "_id": "9f6dbbf11a2e", + "author": { + "_ref": "evan-floden", + "_type": "reference" + }, + "_rev": "2PruMrLMGpvZP5qAknmCUY", + "title": "6 Tips for Setting Up Your Nextflow Dev Environment", + "_createdAt": "2024-09-25T14:16:02Z", + "body": [ + { + "_type": "block", + "style": "normal", + "_key": "c8216ae8bc07", + "markDefs": [], + "children": [ + { + "text": "This blog follows up the Learning Nextflow in 2020 blog [post](https://www.nextflow.io/blog/2020/learning-nextflow-in-2020.html).", + "_key": "baea13971feb", + "_type": "span", + "marks": [ + "em" + ] + } + ] + }, + { + "children": [ + { + "text": "", + "_key": "cd1a014b34b1", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "35773849ebf9" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "This guide is designed to walk you through a basic development setup for writing Nextflow pipelines.", + "_key": "b29577b408a3" + } + ], + "_type": "block", + "style": "normal", + "_key": "384e07386d8b" + }, + { + "_key": "945b045b1063", + "children": [ + { + "text": "", + "_key": "2b7b71ab9d3d", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "185122270abc", + "_type": "span", + "text": "1. Installation" + } + ], + "_type": "block", + "style": "h3", + "_key": "cc4455ab3db1" + }, + { + "markDefs": [ + { + "_key": "2b1b6ebcc748", + "_type": "link", + "href": "https://docs.microsoft.com/en-us/windows/wsl/install-win10" + } + ], + "children": [ + { + "marks": [], + "text": "Nextflow runs on any Linux compatible system and MacOS with Java installed. Windows users can rely on the ", + "_key": "16de800a7e79", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "2b1b6ebcc748" + ], + "text": "Windows Subsystem for Linux", + "_key": "a1d7e7c33a03" + }, + { + "marks": [], + "text": ". Installing Nextflow is straightforward. You just need to download the ", + "_key": "f31aeb8e728e", + "_type": "span" + }, + { + "_key": "5f6016136103", + "_type": "span", + "marks": [ + "code" + ], + "text": "nextflow" + }, + { + "_type": "span", + "marks": [], + "text": " executable. 
In your terminal type the following commands:", + "_key": "66ee62e7ede1" + } + ], + "_type": "block", + "style": "normal", + "_key": "bf29c31a1d5b" + }, + { + "style": "normal", + "_key": "d17b4a46e121", + "children": [ + { + "_type": "span", + "text": "", + "_key": "569811fd6eeb" + } + ], + "_type": "block" + }, + { + "code": "$ curl get.nextflow.io | bash\n$ sudo mv nextflow /usr/local/bin", + "_type": "code", + "_key": "1c814b48a55e" + }, + { + "style": "normal", + "_key": "32801bc1ff15", + "children": [ + { + "_type": "span", + "text": "", + "_key": "0b5e04bc88fb" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "text": "The first line uses the curl command to download the nextflow executable, and the second line moves the executable to your PATH. Note ", + "_key": "e71dc74e32c9", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "/usr/local/bin", + "_key": "d4abc629acf6" + }, + { + "marks": [], + "text": " is the default for MacOS, you might want to choose ", + "_key": "860380200f6a", + "_type": "span" + }, + { + "text": "~/bin", + "_key": "1b23d2c9ca53", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " or ", + "_key": "d546ea964f49" + }, + { + "_key": "87bbce39dff5", + "_type": "span", + "marks": [ + "code" + ], + "text": "/usr/bin" + }, + { + "text": " depending on your PATH definition and operating system.", + "_key": "7acb44128d55", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "abf34f69104d" + }, + { + "_type": "block", + "style": "normal", + "_key": "19639c682cb3", + "children": [ + { + "_type": "span", + "text": "", + "_key": "c0e9e5d7361b" + } + ] + }, + { + "children": [ + { + "_key": "09786655aa47", + "_type": "span", + "text": "2. Text Editor or IDE?" + } + ], + "_type": "block", + "style": "h3", + "_key": "fd5c0a16b777" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow pipelines can be written in any plain text editor. I'm personally a bit of a Vim fan, however, the advent of the modern IDE provides a more immersive development experience.", + "_key": "82913152fe33" + } + ], + "_type": "block", + "style": "normal", + "_key": "596bd6fbae5a", + "markDefs": [] + }, + { + "_key": "f6d267b17671", + "children": [ + { + "_type": "span", + "text": "", + "_key": "b57868bf9f63" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://code.visualstudio.com/download", + "_key": "e946c83f2eb8" + } + ], + "children": [ + { + "text": "My current choice is Visual Studio Code which provides a wealth of add-ons, the most obvious of these being syntax highlighting. 
With ", + "_key": "6a0ba850b5e6", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "e946c83f2eb8" + ], + "text": "VSCode installed", + "_key": "323c79f69f0f" + }, + { + "text": ", you can search for the Nextflow extension in the marketplace.", + "_key": "475cc7978db9", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "140b54ba282a" + }, + { + "style": "normal", + "_key": "c5783c52cac8", + "children": [ + { + "text": "", + "_key": "80b91bcce9ba", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "image", + "alt": "VSCode with Nextflow Syntax Highlighting", + "_key": "a25636a4b194", + "asset": { + "_type": "reference", + "_ref": "image-46057944042068bf75b8c10ecf400e2a2813f736-1600x966-png" + } + }, + { + "style": "normal", + "_key": "227c18352d20", + "children": [ + { + "text": "", + "_key": "e83f71ae33db", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Other syntax highlighting has been made available by the community including:", + "_key": "1cdfa72fad84", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "7fd3d6ab271d" + }, + { + "children": [ + { + "_key": "9a7c79509562", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "349c1edce9ff" + }, + { + "_key": "eabbdb81aa1c", + "listItem": "bullet", + "children": [ + { + "_key": "e5aef83e0750", + "_type": "span", + "marks": [], + "text": "[Atom](https://atom.io/packages/language-nextflow)" + }, + { + "text": "[Vim](https://github.com/LukeGoodsell/nextflow-vim)", + "_key": "efef0a8dcb0b", + "_type": "span", + "marks": [] + }, + { + "text": "[Emacs](https://github.com/Emiller88/nextflow-mode)", + "_key": "5e333bd3d838", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "b412b511a7ae", + "children": [ + { + "_type": "span", + "text": "", + "_key": "9bea3e344026" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "4c371ac95ade", + "_type": "span", + "text": "3. The Nextflow REPL console" + } + ], + "_type": "block", + "style": "h3", + "_key": "45830c179224" + }, + { + "children": [ + { + "_key": "650c3df33eee", + "_type": "span", + "marks": [], + "text": "The Nextflow console is a REPL (read-eval-print loop) environment that allows one to quickly test part of a script or segments of Nextflow code in an interactive manner. This can be particularly useful to quickly evaluate channels and operators behaviour and prototype small snippets that can be included in your pipeline scripts." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "bb19bdaecf5b", + "markDefs": [] + }, + { + "_key": "e448cf228140", + "children": [ + { + "_key": "4d5e7c19b132", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "f6f4b6393b47", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Start the Nextflow console with the following command:", + "_key": "3ce365642e62" + } + ] + }, + { + "_key": "ae7b461d75a7", + "children": [ + { + "_type": "span", + "text": "", + "_key": "89da3ec9e8a0" + } + ], + "_type": "block", + "style": "normal" + }, + { + "code": "$ nextflow console", + "_type": "code", + "_key": "222a73d8b62f" + }, + { + "children": [ + { + "text": "", + "_key": "34372c2d80f6", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "12c5de946f00" + }, + { + "_type": "image", + "alt": "Nextflow REPL console", + "_key": "a9cb994c6431", + "asset": { + "_ref": "image-6f3138f809d97af5e3a85e3ab2497fd0176b8e75-1174x810-png", + "_type": "reference" + } + }, + { + "_key": "cea3a2ad26a3", + "children": [ + { + "_type": "span", + "text": "", + "_key": "46d3b977f70e" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "98cf3b40d8f8", + "markDefs": [], + "children": [ + { + "text": "Use the ", + "_key": "5c606cfc35d6", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "code" + ], + "text": "CTRL+R", + "_key": "d153eb3b5345", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " keyboard shortcut to run (", + "_key": "60e393c29855" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "⌘+R", + "_key": "0ab12a8bb048" + }, + { + "text": "on the Mac) and to evaluate your code. You can also evaluate by selecting code and use the ", + "_key": "9a3c425c5b29", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "strong" + ], + "text": "Run selection", + "_key": "f7c37edd2f60", + "_type": "span" + }, + { + "marks": [], + "text": ".", + "_key": "4b2cfd278f1e", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "e1be84c5755a", + "children": [ + { + "_type": "span", + "text": "", + "_key": "884c64999d32" + } + ] + }, + { + "_type": "block", + "style": "h3", + "_key": "868fd834d18f", + "children": [ + { + "_key": "3fbcc4c2e122", + "_type": "span", + "text": "4. Containerize all the things" + } + ] + }, + { + "style": "normal", + "_key": "8b6e7e88a0ba", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Containers are a key component of developing scalable and reproducible pipelines. We can build Docker images that contain an OS, all libraries and the software we need for each process. 
Pipelines are typically developed using Docker containers and tooling as these can then be used on many different container engines such as Singularity and Podman.", + "_key": "ff8176eae89e", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "fca42ce6f204", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "396d31dc18f8" + }, + { + "_type": "block", + "style": "normal", + "_key": "221f5fdd90e0", + "markDefs": [ + { + "_type": "link", + "href": "https://docs.docker.com/engine/install/", + "_key": "c8cbc8462485" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Once you have ", + "_key": "3038f38b6545" + }, + { + "text": "downloaded and installed Docker", + "_key": "2be77154c648", + "_type": "span", + "marks": [ + "c8cbc8462485" + ] + }, + { + "_type": "span", + "marks": [], + "text": ", try pull a public docker image:", + "_key": "a261b0a8af1c" + } + ] + }, + { + "_key": "6179d57c67cd", + "children": [ + { + "_type": "span", + "text": "", + "_key": "16639c0ef72d" + } + ], + "_type": "block", + "style": "normal" + }, + { + "code": "$ docker pull quay.io/nextflow/rnaseq-nf", + "_type": "code", + "_key": "8b095c2ed71c" + }, + { + "_key": "f70b13f87b8e", + "children": [ + { + "_type": "span", + "text": "", + "_key": "1ef1ff3a75da" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "d192baf967cb", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "To run a Nextflow pipeline using the latest tag of the image, we can use:", + "_key": "293c61d93b7e", + "_type": "span" + } + ] + }, + { + "_key": "b775af109bba", + "children": [ + { + "_key": "2b8220fe7194", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "code": "nextflow run nextflow-io/rnaseq-nf -with-docker quay.io/nextflow/rnaseq-nf:latest", + "_type": "code", + "_key": "1bcfb7419981" + }, + { + "_key": "08093eed6c02", + "children": [ + { + "text": "", + "_key": "398e99711b63", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "53db58d0f758", + "markDefs": [ + { + "_type": "link", + "href": "https://seqera.io/training/#_manage_dependencies_containers", + "_key": "107be351aa22" + } + ], + "children": [ + { + "marks": [], + "text": "To learn more about building Docker containers, see the ", + "_key": "f45d07e3d9f7", + "_type": "span" + }, + { + "_key": "575fcf632f19", + "_type": "span", + "marks": [ + "107be351aa22" + ], + "text": "Seqera Labs tutorial" + }, + { + "marks": [], + "text": " on managing dependencies with containers.", + "_key": "a426844c2774", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "77f2cd784c47", + "children": [ + { + "_type": "span", + "text": "", + "_key": "17addf9ba414" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "16d8f5e61dc9", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Additionally, you can install the VSCode marketplace addon for Docker to manage and interactively run and test the containers and images on your machine. 
You can even connect to remote registries such as Dockerhub, Quay.io, AWS ECR, Google Cloud and Azure Container registries.", + "_key": "3f9a4fc8cfc3" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "f4896c16b10f", + "children": [ + { + "_key": "ed6edc30a081", + "_type": "span", + "text": "" + } + ] + }, + { + "_type": "image", + "alt": "VSCode with Docker Extension", + "_key": "fc74ddac5c77", + "asset": { + "_type": "reference", + "_ref": "image-7130bbb5139e37b38e4554d3d8ae0683571479e1-1600x891-png" + } + }, + { + "_key": "36e1cbe648b7", + "children": [ + { + "_type": "span", + "text": "", + "_key": "93b79a7e589d" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "h3", + "_key": "8e10bed8173a", + "children": [ + { + "_key": "391785249836", + "_type": "span", + "text": "5. Use Tower to monitor your pipelines" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "2ff2e59e1d6a", + "markDefs": [ + { + "_type": "link", + "href": "https://tower.nf", + "_key": "04888447b21e" + } + ], + "children": [ + { + "text": "When developing real-world pipelines, it can become inevitable that pipelines will require significant resources. For long-running workflows, monitoring becomes all the more crucial. With ", + "_key": "73e6016f3af2", + "_type": "span", + "marks": [] + }, + { + "_key": "2b0b58007cc8", + "_type": "span", + "marks": [ + "04888447b21e" + ], + "text": "Nextflow Tower" + }, + { + "marks": [], + "text": ", we can invoke any Nextflow pipeline execution from the CLI and use the integrated dashboard to follow the workflow run.", + "_key": "b44820b00a42", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "", + "_key": "309de3fbc39a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "93453427f64b" + }, + { + "children": [ + { + "text": "Sign-in to Tower using your GitHub credentials, obtain your token from the Getting Started page and export them into your terminal, ", + "_key": "d68d63824b45", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "~/.bashrc", + "_key": "e9c904a62fa4" + }, + { + "_type": "span", + "marks": [], + "text": ", or include them in your nextflow.config.", + "_key": "d7ee7be8187a" + } + ], + "_type": "block", + "style": "normal", + "_key": "13812497c7a5", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "b9eae55ecc18" + } + ], + "_type": "block", + "style": "normal", + "_key": "08441c45b30e" + }, + { + "code": "$ export TOWER_ACCESS_TOKEN=my-secret-tower-key", + "_type": "code", + "_key": "23b87d6a070c" + }, + { + "_type": "block", + "style": "normal", + "_key": "deec3f2da1ca", + "children": [ + { + "_type": "span", + "text": "", + "_key": "ea25815762de" + } + ] + }, + { + "_key": "8e8dd1bac999", + "markDefs": [], + "children": [ + { + "_key": "5a8effaafbbb", + "_type": "span", + "marks": [], + "text": "We can then add the " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "-with-tower", + "_key": "fd9519c060bb" + }, + { + "_type": "span", + "marks": [], + "text": " child-option to any Nextflow run command. 
A URL with the monitoring dashboard will appear.", + "_key": "9f033abe5cde" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "aa2c69a6559a", + "children": [ + { + "_type": "span", + "text": "", + "_key": "30b17bc29b9c" + } + ] + }, + { + "_key": "3d94f6e51c5c", + "code": "$ nextflow run nextflow-io/rnaseq-nf -with-tower", + "_type": "code" + }, + { + "_type": "block", + "style": "normal", + "_key": "436ea22b13fc", + "children": [ + { + "text": "", + "_key": "15f91331ffef", + "_type": "span" + } + ] + }, + { + "_key": "771208634a3a", + "children": [ + { + "_key": "79853855730e", + "_type": "span", + "text": "6. nf-core tools" + } + ], + "_type": "block", + "style": "h3" + }, + { + "style": "normal", + "_key": "34a901105331", + "markDefs": [ + { + "_type": "link", + "href": "https://nf-co.re/", + "_key": "b2bf9ddf48cd" + } + ], + "children": [ + { + "_key": "a7b530278830", + "_type": "span", + "marks": [ + "b2bf9ddf48cd" + ], + "text": "nf-core" + }, + { + "marks": [], + "text": " is a community effort to collect a curated set of analysis pipelines built using Nextflow. The pipelines continue to come on in leaps and bounds and nf-core tools is a python package for helping with developing nf-core pipelines. It includes options for listing, creating, and even downloading pipelines for offline usage.", + "_key": "5a92affff4a0", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "4c4ad91e4993", + "children": [ + { + "_type": "span", + "text": "", + "_key": "8ed9dc297b8b" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "c1555964fba5", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nf-core/", + "_key": "e65204e3dae2" + } + ], + "children": [ + { + "text": "These tools are particularly useful for developers contributing to the community pipelines on ", + "_key": "094c520696df", + "_type": "span", + "marks": [] + }, + { + "text": "GitHub", + "_key": "52e4c9139deb", + "_type": "span", + "marks": [ + "e65204e3dae2" + ] + }, + { + "_type": "span", + "marks": [], + "text": " with linting and syncing options that keep pipelines up-to-date against nf-core guidelines.", + "_key": "86235dbddeb6" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "e61e548e5237", + "children": [ + { + "_key": "1cc7c456d37f", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "nf-core tools", + "_key": "5457ae7b8f5e", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "marks": [], + "text": " is a python package that can be installed in your development environment from Bioconda or PyPi.", + "_key": "b18abeb98d06", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "6f213b44e00b", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "a28e8e45f0b2", + "children": [ + { + "_key": "3b4d9500f2a7", + "_type": "span", + "text": "" + } + ] + }, + { + "code": "$ conda install nf-core", + "_type": "code", + "_key": "68ee1c400169" + }, + { + "style": "normal", + "_key": "e995aa968870", + "children": [ + { + "text": "", + "_key": "2362a6261961", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "or", + "_key": "3e5fc6ac8700" + } + ], + "_type": "block", + "style": "normal", + "_key": "8365c21628b4", + "markDefs": [] + }, + { + "children": [ + { + "_key": "031e985c2bd4", + "_type": "span", + "text": "" + } + ], + 
"_type": "block", + "style": "normal", + "_key": "42452766ba52" + }, + { + "code": "$ pip install nf-core", + "_type": "code", + "_key": "7d90d1e8a372" + }, + { + "style": "normal", + "_key": "9896671f827b", + "children": [ + { + "text": "", + "_key": "d8070892db5b", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "9ffdd5c319f9", + "asset": { + "_ref": "image-ebd33dce7e6c15adb14e331a26bb6e647a2a299a-1450x1022-png", + "_type": "reference" + }, + "_type": "image", + "alt": "nf-core tools" + }, + { + "_type": "block", + "style": "normal", + "_key": "817387a8bec1", + "children": [ + { + "_type": "span", + "text": "", + "_key": "2f92dc68b5a9" + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "Conclusion", + "_key": "1d57f03de998" + } + ], + "_type": "block", + "style": "h3", + "_key": "ab7498283a57" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Developer workspaces are evolving rapidly. While your own development environment may be highly dependent on personal preferences, community contributions are keeping Nextflow users at the forefront of the modern developer experience.", + "_key": "b4dadce8b09e" + } + ], + "_type": "block", + "style": "normal", + "_key": "7290c3dc5d2a", + "markDefs": [] + }, + { + "children": [ + { + "text": "", + "_key": "06e1b2dce95a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "ef3d89f74f0e" + }, + { + "_key": "b62845ca9057", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/features/codespaces", + "_key": "e74354fc9ce9" + }, + { + "_type": "link", + "href": "https://www.gitpod.io/", + "_key": "217845d136cd" + } + ], + "children": [ + { + "marks": [], + "text": "Solutions such as ", + "_key": "177e7be968a9", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "e74354fc9ce9" + ], + "text": "GitHub Codespaces", + "_key": "46572964aade" + }, + { + "text": " and ", + "_key": "413a457618a4", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "217845d136cd" + ], + "text": "Gitpod", + "_key": "0b9fb34520ef" + }, + { + "text": " are now offering extendible, cloud-based options that may well be the future. I’m sure we can all look forward to a one-click, pre-configured, cloud-based, Nextflow developer environment sometime soon!", + "_key": "843335091dc0", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + } + ] + }, + { + "meta": { + "slug": { + "current": "how_i_became_a_nextflow_ambassador" + } + }, + "publishedAt": "2024-07-24T06:00:00.000Z", + "_type": "blogPost", + "_id": "9fb1300d740c", + "author": { + "_ref": "ntV3A5cVsWRByk7zltFcTw", + "_type": "reference" + }, + "_createdAt": "2024-09-25T14:18:00Z", + "title": "How I became a Nextflow Ambassador!", + "_updatedAt": "2024-09-26T09:04:31Z", + "_rev": "5lTkDsqMC29L3wnnkjjRta", + "tags": [ + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "4dbff4eedc67" + } + ], + "body": [ + { + "style": "normal", + "_key": "500c426b9862", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "As a PhD student in bioinformatics, I aimed to build robust pipelines to analyze diverse datasets throughout my research. 
Initially, mastering Bash scripting was a time-consuming challenge, but this journey ultimately led me to become a Nextflow Ambassador, engaging actively with the expert Nextflow community.", + "_key": "f0ec468509f3" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "b3010d6d888d" + } + ], + "_type": "block", + "style": "normal", + "_key": "8f41af747a9e" + }, + { + "_type": "block", + "_key": "269ffce4ec8b" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://www.linkedin.com/in/firaszemzem/", + "_key": "b00b03e4be78" + }, + { + "_type": "link", + "href": "https://www.google.com/search?q=things+to+do+in+tunisia&sca_esv=3b07b09e3325eaa7&sca_upv=1&udm=15&biw=1850&bih=932&ei=AS2eZuqnFpG-i-gPwciJyAk&ved=0ahUKEwiqrOiRsbqHAxUR3wIHHUFkApkQ4dUDCBA&uact=5&oq=things+to+do+in+tunisia&gs_lp=Egxnd3Mtd2l6LXNlcnAiF3RoaW5ncyB0byBkbyBpbiB0dW5pc2lhMgUQABiABDIGEAAYFhgeMgYQABgWGB4yBhAAGBYYHjIGEAAYFhgeMgYQABgWGB4yBhAAGBYYHjIGEAAYFhgeMgYQABgWGB4yCBAAGBYYHhgPSOIGULYDWNwEcAF4AZABAJgBfaAB9gGqAQMwLjK4AQPIAQD4AQGYAgOgAoYCwgIKEAAYsAMY1gQYR5gDAIgGAZAGCJIHAzEuMqAH_Aw&sclient=gws-wiz-serp", + "_key": "b295948da86c" + } + ], + "children": [ + { + "marks": [], + "text": "My name is ", + "_key": "e4ff87fe5d71", + "_type": "span" + }, + { + "text": "Firas Zemzem", + "_key": "a4a464124878", + "_type": "span", + "marks": [ + "b00b03e4be78" + ] + }, + { + "text": ", a PhD student based in ", + "_key": "e034b3751759", + "_type": "span", + "marks": [] + }, + { + "text": "Tunisia", + "_key": "c151d21585cc", + "_type": "span", + "marks": [ + "b295948da86c" + ] + }, + { + "_type": "span", + "marks": [], + "text": " working with the Laboratory of Cytogenetics, Molecular Genetics, and Biology of Reproduction at CHU Farhat Hached Sousse. I was specialized in human genetics, focusing on studying genomics behind neurodevelopmental disorders. Hence Developing methods for detecting SNPs and variants related to my work was crucial step for advancing medical research and improving patient outcomes. On the other hand, pipelines integration and bioinformatics tools were essential in this process, enabling efficient data analysis, accurate variant detection, and streamlined workflows that enhance the reliability and reproducibility of our findings.", + "_key": "eef6ae3ba131" + } + ], + "_type": "block", + "style": "normal", + "_key": "f7bad2946c46" + }, + { + "_type": "block", + "style": "normal", + "_key": "1e824428b392", + "children": [ + { + "text": "", + "_key": "6a9ffb1f5f41", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "h2", + "_key": "af26b8c40c39", + "children": [ + { + "_key": "b1e436ab8ac4", + "_type": "span", + "text": "The initial nightmare of Bash" + } + ] + }, + { + "_key": "c7eb6cef405c", + "markDefs": [], + "children": [ + { + "text": "During my master's degree, I was a steadfast user of Bash scripting. Bash had been my go-to tool for automating tasks and managing workflows in my bioinformatics projects, such as variant calling. Its simplicity and versatility made it an indispensable part of my toolkit. I was writing Bash scripts for various next-generation sequencing (NGS) high-throughput analyses, including data preprocessing, quality control, alignment, and variant calling. However, as my projects grew more complex, I began to encounter the limitations of Bash. Managing dependencies, handling parallel executions, and ensuring reproducibility became increasingly challenging. 
Handling the vast amount of data generated by NGS and other high-throughput technologies was cumbersome. Using Bash became a nightmare for debugging and maintaining. I spent countless hours trying to make it work, only to be met with more errors and inefficiencies. It was nearly impossible to scale for larger datasets and more complex analyses. Additionally, managing different environments and versions of tools was beyond Bash's capabilities. I needed a solution that could handle these challenges more gracefully.", + "_key": "08499eeac344", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "3cdb36d2b3fc", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "c369112d86e0" + }, + { + "style": "h2", + "_key": "f2551aef5795", + "children": [ + { + "text": "Game-Changing Call", + "_key": "7c6daef80b69", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "text": "One evening, I received a call from my friend, Mr. HERO, a bioinformatician. As we discussed our latest projects, I vented my frustrations with Bash. Mr. HERO, as I called him, the problem-solver, mentioned a tool called Nextflow. He described how it had revolutionized his workflow, making complex pipeline management a breeze. Intrigued, I decided to look into it.", + "_key": "2a51f6d33e55", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "e24a87139eb9" + }, + { + "_key": "286f542fee1e", + "children": [ + { + "text": "", + "_key": "12838e3f1e72", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "h2", + "_key": "fd3507fa5855", + "children": [ + { + "_type": "span", + "text": "Diving Into the process", + "_key": "a66563cea88b" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "de12d4c447ae", + "markDefs": [ + { + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/index.html", + "_key": "d839d21972e9" + }, + { + "_type": "link", + "href": "https://training.nextflow.io/", + "_key": "278e369aa155" + } + ], + "children": [ + { + "text": "Reading the ", + "_key": "20fd2136fab0", + "_type": "span", + "marks": [] + }, + { + "_key": "c49af488fd23", + "_type": "span", + "marks": [ + "d839d21972e9" + ], + "text": "documentation" + }, + { + "_type": "span", + "marks": [], + "text": " and watching ", + "_key": "f7e0fdf05547" + }, + { + "marks": [ + "278e369aa155" + ], + "text": "tutorials", + "_key": "88097777e1b0", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " were my first steps. Nextflow's approach to workflow management was a revelation. Unlike Bash, Nextflow was designed to address the complexities of modern computational questions. It provided a transparent, declarative syntax for defining tasks and their dependencies and supported parallel execution out of the box. The first thing I did when I decided to convert one of my existing Bash scripts into a Nextflow pipeline was to start experimenting with simple code. Doing this was no small feat. I had to rethink my approach to workflow design and embrace a new way of defining tasks and dependencies. 
My learning curve was not too steep, so understanding how to translate my Bash logic into Nextflow's domain-specific language (DSL) was not that hard.", + "_key": "9c6b80dd16b9" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "7490ef653da9", + "children": [ + { + "text": "", + "_key": "13f7dbc1aacc", + "_type": "span" + } + ] + }, + { + "_key": "97aa3c6cc743", + "children": [ + { + "text": "Eureka Moment: First run", + "_key": "d8012d4d90b6", + "_type": "span" + } + ], + "_type": "block", + "style": "h2" + }, + { + "children": [ + { + "_key": "3be3fa88345f", + "_type": "span", + "marks": [], + "text": "The first time I ran my Nextflow pipeline, I was amazed by how smoothly and efficiently it handled tasks that previously took hours to debug and execute in Bash. Nextflow managed task dependencies, parallel execution, and error handling with ease, resulting in a faster, more reliable, and maintainable pipeline. The ability to run pipelines on different computing environments, from local machines to high-performance clusters and cloud platforms, was a game-changer. Several Nextflow features were particularly valuable: Containerization Support using Docker and Singularity ensured consistency across environments; Error Handling with automatic retry mechanisms and detailed error reporting saved countless debugging hours; Portability and scalability allowed seamless execution on various platforms; Modularity facilitated the reuse and combination of processes across different pipelines, enhancing efficiency and organization; and Reproducibility features, including versioning and traceability, ensured that workflows could be reliably reproduced and shared across different research projects and teams." + } + ], + "_type": "block", + "style": "normal", + "_key": "8f18d87f00d8", + "markDefs": [] + }, + { + "style": "normal", + "_key": "51d426ac72cd", + "children": [ + { + "_type": "span", + "text": "", + "_key": "0ae8afd271d4" + } + ], + "_type": "block" + }, + { + "_type": "image", + "alt": "meme on bright landscape", + "_key": "0fef4013c695", + "asset": { + "_type": "reference", + "_ref": "image-5bead16d3c6ba20b1181e0add921b9523e537232-762x406-png" + } + }, + { + "style": "h2", + "_key": "0f8f43a0db21", + "children": [ + { + "text": "New Horizons: Becoming a Nextflow Ambassador", + "_key": "7faf2c4cd73f", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "marks": [], + "text": "Switching from Bash scripting to Nextflow was more than just adopting a new tool. It was about embracing a new mindset. Nextflow’s emphasis on scalability, reproducibility, and ease of use transformed how I approached bioinformatics. The initial effort to learn Nextflow paid off in spades, leading to more robust, maintainable, and scalable workflows. My enthusiasm and advocacy for Nextflow didn't go unnoticed. Recently, I became a Nextflow Ambassador. 
This role allows me to further contribute to the community, promote best practices, and support new users as they embark on their own Nextflow journeys.", + "_key": "f9f81a58099f", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "b8be3b5b4d6d", + "markDefs": [] + }, + { + "_key": "19592a71b2fb", + "children": [ + { + "text": "", + "_key": "4973bbb12e4a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "886f2977f6d5", + "_type": "span", + "text": "Future Projects and Community Engagement" + } + ], + "_type": "block", + "style": "h2", + "_key": "8b2c0b6f2b9b" + }, + { + "_key": "1432b7d2fd36", + "markDefs": [], + "children": [ + { + "_key": "a2c44677f669", + "_type": "span", + "marks": [], + "text": "Currently I am working on developing a Nextflow pipeline with my team that will help in analyzing variants, providing valuable insights for medical and clinical applications. This pipeline aims to improve the accuracy and efficiency of variant detection, ultimately supporting better diagnostic for patients with various genetic conditions. As part of my ongoing efforts within the Nextflow community, I am planning a series of projects aimed at developing and sharing advanced Nextflow pipelines tailored to specific genetic rare disorder analyses. These initiative will include detailed tutorials, case studies, and collaborative efforts with other researchers to enhance the accessibility and utility of Nextflow for various bioinformatics applications. Additionally, I plan to host workshops and seminars to spread knowledge and best practices among my colleagues and other researchers. This will help foster a collaborative environment where we can all benefit from the power and flexibility of Nextflow." + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "ecf5ef2257db" + } + ], + "_type": "block", + "style": "normal", + "_key": "633e50989de1" + }, + { + "style": "h2", + "_key": "bce776024513", + "children": [ + { + "_type": "span", + "text": "Invitation for researchers over the world", + "_key": "e77719e02a68" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "b1dad293bec9", + "markDefs": [ + { + "href": "https://www.nextflow.io/slack-invite.html", + "_key": "fdebd7e0e6d4", + "_type": "link" + }, + { + "_key": "a83bb660d5d5", + "_type": "link", + "href": "https://community.seqera.io" + } + ], + "children": [ + { + "_key": "f6eb7bbb8cec", + "_type": "span", + "marks": [], + "text": "As a Nextflow Ambassador, I invite you to become part of a dynamic group of experts and enthusiasts dedicated to advancing workflow automation. Whether you're just starting or looking to deepen your knowledge, our community offers invaluable resources, support, and networking opportunities. 
You can chat with us on the " + }, + { + "text": "Nextflow Slack Workspace", + "_key": "d657a5d37719", + "_type": "span", + "marks": [ + "fdebd7e0e6d4" + ] + }, + { + "_key": "71a6c0cf89f5", + "_type": "span", + "marks": [], + "text": " and ask your questions at the " + }, + { + "_key": "ad0d39137c35", + "_type": "span", + "marks": [ + "a83bb660d5d5" + ], + "text": "Seqera Community Forum" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "080fd61997f4" + } + ], + "_type": "block" + } + ] + }, + { + "publishedAt": "2019-04-18T06:00:00.000Z", + "author": { + "_ref": "paolo-di-tommaso", + "_type": "reference" + }, + "meta": { + "slug": { + "current": "release-19.04.0-stable" + } + }, + "body": [ + { + "markDefs": [], + "children": [ + { + "_key": "2e3e0e408c36", + "_type": "span", + "marks": [], + "text": "We are excited to announce the new Nextflow 19.04.0 stable release!" + } + ], + "_type": "block", + "style": "normal", + "_key": "1293ce798784" + }, + { + "children": [ + { + "_key": "5e1c6945b7dd", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "1158e366f38a" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "This version includes numerous bug fixes, enhancement and new features.", + "_key": "f3d193d22b73" + } + ], + "_type": "block", + "style": "normal", + "_key": "373b5a5afc34" + }, + { + "children": [ + { + "_key": "d12ac6bac022", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "cd5e9a1cb1a9" + }, + { + "style": "h4", + "_key": "2ae16458573f", + "children": [ + { + "_key": "c781eb85b887", + "_type": "span", + "text": "Rich logging" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In this release, we are making the new interactive rich output using ANSI escape characters as the default logging option. This produces a much more readable and easy to follow log of the running workflow execution.", + "_key": "9596bf43567e" + } + ], + "_type": "block", + "style": "normal", + "_key": "841bfab52a1f" + }, + { + "_type": "block", + "style": "normal", + "_key": "f0d0046580e1", + "children": [ + { + "text": "", + "_key": "cea18b4e8d54", + "_type": "span" + } + ] + }, + { + "_key": "61a307c9a3ad", + "src": "https://asciinema.org/a/IrT6uo85yyVoOjPa6KVzT2FXQ.js", + "_type": "script", + "id": "asciicast-IrT6uo85yyVoOjPa6KVzT2FXQ" + }, + { + "_type": "block", + "style": "normal", + "_key": "753ecb757082", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The ANSI log is implicitly disabled when the nextflow is launched in the background i.e. when using the ", + "_key": "edd8c20fd0d1", + "_type": "span" + }, + { + "text": "-bg", + "_key": "0ede067d2522", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "text": " option. 
It can also be explicitly disabled using the ", + "_key": "0982d5a9e28f", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "code" + ], + "text": "-ansi-log false", + "_key": "72b809afd602", + "_type": "span" + }, + { + "marks": [], + "text": " option or setting the ", + "_key": "1c54585aad94", + "_type": "span" + }, + { + "text": "NXF_ANSI_LOG=false", + "_key": "081e542eff7a", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "marks": [], + "text": " variable in your launching environment.", + "_key": "73dfe5eb7b2e", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "2dd57c01ffd8", + "children": [ + { + "text": "", + "_key": "2a4b53bc34b8", + "_type": "span" + } + ] + }, + { + "_key": "4a43a604aeb5", + "children": [ + { + "_type": "span", + "text": "NCBI SRA data source", + "_key": "96dce75dea67" + } + ], + "_type": "block", + "style": "h4" + }, + { + "_key": "0848c6628481", + "markDefs": [ + { + "_key": "e8537f3ccb99", + "_type": "link", + "href": "/blog/2019/release-19.03.0-edge.html" + } + ], + "children": [ + { + "text": "The support for NCBI SRA archive was introduced in the ", + "_key": "8218b827579e", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "e8537f3ccb99" + ], + "text": "previous edge release", + "_key": "939ba6616744" + }, + { + "_key": "78d3bcb898f7", + "_type": "span", + "marks": [], + "text": ". Given the very positive reaction, we are graduating this feature into the stable release for general availability." + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "22ed3aa384b2", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "bda6e8d615a5" + }, + { + "_type": "block", + "style": "h4", + "_key": "ca5e05eea5a5", + "children": [ + { + "_type": "span", + "text": "Sharing", + "_key": "0238a45d2864" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "bfeabbaab108", + "markDefs": [ + { + "_type": "link", + "href": "https://gitea.io", + "_key": "a3ce3eb0758f" + } + ], + "children": [ + { + "marks": [], + "text": "This version includes also a new Git repository provider for the ", + "_key": "8a0d6a3ee0fe", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "a3ce3eb0758f" + ], + "text": "Gitea", + "_key": "d5e84d7da68f" + }, + { + "_key": "24d3a7dc8755", + "_type": "span", + "marks": [], + "text": " self-hosted source code management system, which is added to the already existing support for GitHub, Bitbucket and GitLab sharing platforms." + } + ] + }, + { + "style": "normal", + "_key": "916d3684f3a4", + "children": [ + { + "_key": "ed4aa0278117", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "h4", + "_key": "eaedb8c385ea", + "children": [ + { + "_type": "span", + "text": "Reports and metrics", + "_key": "cf1bd52cac86" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Finally, this version includes important enhancements and bug fixes for the task executions metrics collected by Nextflow. 
If you are using this feature we strongly suggest updating Nextflow to this version.", + "_key": "3310738bc78d" + } + ], + "_type": "block", + "style": "normal", + "_key": "988b3591dd8d", + "markDefs": [] + }, + { + "style": "normal", + "_key": "5f82f78d5f1a", + "children": [ + { + "text": "", + "_key": "e9de57710282", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "marks": [], + "text": "Remember that updating can be done with the ", + "_key": "cd05f915c458", + "_type": "span" + }, + { + "text": "nextflow -self-update", + "_key": "ed549cb472db", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " command.", + "_key": "8f18357a43ee" + } + ], + "_type": "block", + "style": "normal", + "_key": "fe245998dbc5", + "markDefs": [] + }, + { + "children": [ + { + "text": "", + "_key": "a83957b9891d", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "30f44037eac2" + }, + { + "children": [ + { + "text": "Changelog", + "_key": "3b371dfe46a0", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "f1f1f37b9dd0" + }, + { + "children": [ + { + "marks": [], + "text": "The complete list of changes and bug fixes is available on GitHub at ", + "_key": "f114e425ecfb", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "9277ebde3433" + ], + "text": "this link", + "_key": "04bd083b25fe" + }, + { + "_key": "92b162b1216e", + "_type": "span", + "marks": [], + "text": "." + } + ], + "_type": "block", + "style": "normal", + "_key": "478114a01913", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nextflow-io/nextflow/releases/tag/v19.04.0", + "_key": "9277ebde3433" + } + ] + }, + { + "style": "normal", + "_key": "c5b121a89469", + "children": [ + { + "_type": "span", + "text": "", + "_key": "a4a9334cf6b0" + } + ], + "_type": "block" + }, + { + "_key": "21a51a852bec", + "children": [ + { + "_type": "span", + "text": "Contributions", + "_key": "6d192925f413" + } + ], + "_type": "block", + "style": "h3" + }, + { + "children": [ + { + "text": "Special thanks to all people contributed to this release by reporting issues, improving the docs or submitting (patiently) a pull request (sorry if we have missed somebody):", + "_key": "8a6e4adceba9", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "379654628375", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "0472cc5ec0be", + "children": [ + { + "_type": "span", + "text": "", + "_key": "48509edd5656" + } + ] + }, + { + "style": "normal", + "_key": "34fa7855e4f9", + "listItem": "bullet", + "children": [ + { + "_type": "span", + "marks": [], + "text": "[Alex Cerjanic](https://github.com/acerjanic)", + "_key": "e6c558d9e076" + }, + { + "marks": [], + "text": "[Anthony Underwood](https://github.com/aunderwo)", + "_key": "3f29cb4b1ceb", + "_type": "span" + }, + { + "text": "[Akira Sekiguchi](https://github.com/pachiras)", + "_key": "ba6f02a2b71e", + "_type": "span", + "marks": [] + }, + { + "text": "[Bill Flynn](https://github.com/wflynny)", + "_key": "41c6749012b7", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [], + "text": "[Jorrit Boekel](https://github.com/glormph)", + "_key": "82f01c17baba" + }, + { + "text": "[Olga Botvinnik](https://github.com/olgabot)", + "_key": "af7a27fcb5d4", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [], + "text": "[Ólafur Haukur Flygenring](https://github.com/olifly)", + 
"_key": "72c1432b4cc6" + }, + { + "_type": "span", + "marks": [], + "text": "[Sven Fillinger](https://github.com/sven1103)", + "_key": "7f5cdfde074f" + } + ], + "_type": "block" + } + ], + "tags": [ + { + "_key": "f7b5b91dcdfe", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference" + } + ], + "_createdAt": "2024-09-25T14:15:45Z", + "_type": "blogPost", + "title": "Nextflow 19.04.0 stable release is out!", + "_updatedAt": "2024-09-26T09:02:16Z", + "_rev": "rsIQ9Jd8Z4nKBVUruy4PCk", + "_id": "a0070dc35376" + }, + { + "_id": "a1ada165cea2", + "publishedAt": "2024-05-01T06:00:00.000Z", + "body": [ + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Check out Nextflow's newest plugin, nf-schema! It's an enhanced version of nf-validation, utilizing JSON schemas to validate parameters and sample sheets. Unlike its predecessor, it supports the latest JSON schema draft and can convert pipeline-generated files. But what's the story behind its development?", + "_key": "a4513a814304" + } + ], + "_type": "block", + "style": "normal", + "_key": "fec211d6854c" + }, + { + "_key": "4a60da1375ba", + "children": [ + { + "_key": "711ff68fccdf", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "_key": "4762847799c3" + }, + { + "style": "normal", + "_key": "e0bd57810240", + "markDefs": [], + "children": [ + { + "marks": [ + "code" + ], + "text": "nf-validation", + "_key": "32fc1682a90d", + "_type": "span" + }, + { + "marks": [], + "text": " is a well-known Nextflow plugin that uses JSON schemas to validate parameters and sample sheets. It can also convert sample sheets to channels using a built-in channel factory. On top of that, it can create a nice summary of pipeline parameters and can even be used to generate a help message for the pipeline.", + "_key": "3423ab3d47b3", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "97c901cb385a", + "children": [ + { + "text": "", + "_key": "19dc5a60d977", + "_type": "span" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "All of this has made the plugin very popular in the Nextflow community, but it wasn’t without its issues. For example, the plugin uses an older version of the JSON schema draft, namely draft ", + "_key": "de58a944071e" + }, + { + "marks": [ + "code" + ], + "text": "07", + "_key": "90c4df2ec381", + "_type": "span" + }, + { + "marks": [], + "text": " while the latest draft is ", + "_key": "f988e11e04f7", + "_type": "span" + }, + { + "marks": [ + "code" + ], + "text": "2020-12", + "_key": "42de36ebbebc", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ". 
It also can’t convert any files/sample sheets created by the pipeline itself since the channel factory is only able to access values from pipeline parameters.", + "_key": "686d3e58beaa" + } + ], + "_type": "block", + "style": "normal", + "_key": "f33719669746" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "10b1dc1abf6a" + } + ], + "_type": "block", + "style": "normal", + "_key": "99adf1dae275" + }, + { + "style": "normal", + "_key": "d329bf40cbed", + "markDefs": [], + "children": [ + { + "_key": "904c2d34b806", + "_type": "span", + "marks": [], + "text": "But then " + }, + { + "text": "nf-schema", + "_key": "fdacef52b02d", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " came to the rescue! In this plugin we rewrote large parts of the ", + "_key": "44155a6d5db1" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "nf-validation", + "_key": "da5bb41ffb14" + }, + { + "_type": "span", + "marks": [], + "text": " code, making the plugin way faster and more flexible while adding a lot of requested features. Let’s see what’s been changed in this new and improved version of ", + "_key": "1f75962be5b8" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "nf-validation", + "_key": "a92b1a228b0f" + }, + { + "_key": "0d29924d2694", + "_type": "span", + "marks": [], + "text": "." + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "259c2a4938c2", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "26a767b57b7f" + }, + { + "children": [ + { + "_key": "356bad26a35d", + "_type": "span", + "text": "What a shiny new JSON schema draft" + } + ], + "_type": "block", + "style": "h1", + "_key": "09aa31544fef" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "To quote the official JSON schema website:", + "_key": "17c0cb183931", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "30c0bfe165be" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "66e844c9e6be" + } + ], + "_type": "block", + "style": "normal", + "_key": "c6494b9f4475" + }, + { + "_key": "3fbbe1546465", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "> “JSON Schema is the vocabulary that enables JSON data consistency, validity, and interoperability at scale.”", + "_key": "a5e2e2f95261" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "", + "_key": "ec5b1fcdda94", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "98b7b7bcbbdb" + }, + { + "markDefs": [], + "children": [ + { + "_key": "ad2e3b248f14", + "_type": "span", + "marks": [], + "text": "This one sentence does an excellent job of explaining what JSON schema is and why it was such a great fit for " + }, + { + "_key": "7e943e238b72", + "_type": "span", + "marks": [ + "code" + ], + "text": "nf-validation" + }, + { + "marks": [], + "text": " and ", + "_key": "ea5aba23e645", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "nf-schema", + "_key": "45572758ebed" + }, + { + "_type": "span", + "marks": [], + "text": ". By using these schemas, we can validate pipeline inputs in a way that would otherwise be impossible. The JSON schema drafts define a set of annotations that are used to set some conditions to which the data has to adhere. 
In our case, this can be used to determine what a parameter or sample sheet value should look like (this can range from what type of data it has to be to a specific pattern that the data has to follow).", + "_key": "159395f1f512" + } + ], + "_type": "block", + "style": "normal", + "_key": "aa6f0eebd48e" + }, + { + "_type": "block", + "style": "normal", + "_key": "a351e474ddc0", + "children": [ + { + "_type": "span", + "text": "", + "_key": "d1fb7788551d" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "The JSON schema draft ", + "_key": "d6dae2c781c0", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "code" + ], + "text": "07", + "_key": "d56cfd0cb090", + "_type": "span" + }, + { + "text": " already has a lot of useful annotations, but it lacked some special annotations that could elevate our validations to the next level. That’s where the JSON schema draft ", + "_key": "f8d0ed579e80", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "code" + ], + "text": "2020-12", + "_key": "cf45673c9a04", + "_type": "span" + }, + { + "marks": [], + "text": " came in. This draft contained a lot more specialized annotations, like dependent requirements of values (if one value is set, another value also has to be set). Although this example was already possible in ", + "_key": "0d1704531262", + "_type": "span" + }, + { + "marks": [ + "code" + ], + "text": "nf-validation", + "_key": "1b8ea632e3cb", + "_type": "span" + }, + { + "text": ", it was poorly implemented and didn’t follow any consensus specified by the JSON schema team.", + "_key": "0bdaa9769665", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "750f0398462f" + }, + { + "_key": "4aa2be9d7fe9", + "children": [ + { + "_type": "span", + "text": "", + "_key": "2722d605c485" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "image", + "alt": "meme on bright landscape", + "_key": "a2295722c5ea", + "asset": { + "_type": "reference", + "_ref": "image-fd1ad6e689a52b72d06e4c1ef912915cc41a71b7-762x675-jpg" + } + }, + { + "style": "h1", + "_key": "c9c446927d7a", + "children": [ + { + "_type": "span", + "text": "Bye-bye Channel Factory, hello Function", + "_key": "8b1873af790a" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "text": "One major shortcoming in the ", + "_key": "b7157fb54e68", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "nf-validation", + "_key": "1995c4100fe2" + }, + { + "_key": "f4a9813507ef", + "_type": "span", + "marks": [], + "text": " plugin was the lack of the " + }, + { + "marks": [ + "code" + ], + "text": "fromSamplesheet", + "_key": "235ef27d914c", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " channel factory to handle files created by the pipeline (or files imported from another pipeline as part of a meta pipeline). That’s why we decided to remove the ", + "_key": "b235bfd42409" + }, + { + "text": "fromSamplesheet", + "_key": "2728e2ff4847", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " channel factory and replace it with a function called ", + "_key": "215dda7571ce" + }, + { + "marks": [ + "code" + ], + "text": "samplesheetToList", + "_key": "c470c623b689", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " that can be deployed in an extremely flexible way. 
It takes two inputs: the sample sheet to be validated and converted, and the JSON schema used for the conversion. Both inputs can either be a ", + "_key": "1e904cace4cc" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "String", + "_key": "576234ce23a2" + }, + { + "text": " value containing the path to the files or a Nextflow ", + "_key": "8125b870ab99", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "file", + "_key": "e44183fd1ab7" + }, + { + "text": " object. By converting the channel factory to a function, we also decoupled the parameter schema from the actual sample sheet conversion. This means all validation and conversion of the sample sheet is now fully done by the ", + "_key": "9af752f7cd43", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "samplesheetToList", + "_key": "80eee62a76c2" + }, + { + "_type": "span", + "marks": [], + "text": " function. In ", + "_key": "e4e953826e0f" + }, + { + "_key": "ff8c9f61f5da", + "_type": "span", + "marks": [ + "code" + ], + "text": "nf-validation" + }, + { + "marks": [], + "text": ", you could add a relative path to another JSON schema to the parameter schema so that the plugin would validate the file given with that parameter using the supplied JSON schema. It was necessary to also add this for sample sheet inputs as they would not be validated otherwise. Due to the change described earlier, the schema should no longer be given to the sample sheet inputs because they will be validated twice that way. Last, but certainly not least, this function also introduces the possibility of using nested sample sheets. This was probably one of the most requested features and it’s completely possible right now! Mind that this feature only works for YAML and JSON sample sheets since CSV and TSV do not support nesting.", + "_key": "1b14ba0116de", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "2c4ed230a2b2" + }, + { + "_type": "block", + "style": "normal", + "_key": "c8ba503c77b8", + "children": [ + { + "_type": "span", + "text": "", + "_key": "be93e2cb56b6" + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "Configuration sensation", + "_key": "59784d954a7f" + } + ], + "_type": "block", + "style": "h1", + "_key": "cb474daea277" + }, + { + "children": [ + { + "marks": [], + "text": "In ", + "_key": "693e65d8f094", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "nf-validation", + "_key": "e6e488e20e2c" + }, + { + "_type": "span", + "marks": [], + "text": ", you could configure how the plugin worked by certain parameters (like ", + "_key": "9d8b08debc89" + }, + { + "marks": [ + "code" + ], + "text": "validationSchemaIgnoreParams", + "_key": "4e135834669f", + "_type": "span" + }, + { + "_key": "f7c641e7c78c", + "_type": "span", + "marks": [], + "text": ", which could be used to exempt certain parameters from the validation). These parameters have now been converted to proper configuration options under the " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "validation", + "_key": "c0e0e34e64c3" + }, + { + "_type": "span", + "marks": [], + "text": " scope. 
The ", + "_key": "3bad84d22141" + }, + { + "text": "validationSchemaIgnoreParams", + "_key": "ac12cc2495c0", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "marks": [], + "text": " has even been expanded into two configuration options: ", + "_key": "ae4571ca10d0", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "validation.ignoreParams", + "_key": "97bc4e7db0da" + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "fa6af27726af" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "validation.defaultIgnoreParams", + "_key": "e777cffa7ee3" + }, + { + "_key": "e581803d8ed1", + "_type": "span", + "marks": [], + "text": ". The former is to be used by the pipeline user to exclude certain parameters from validation, while the latter is to be used by the pipeline developer to set which parameters should be ignored by default. The plugin combines both options so users no longer need to supply the defaults alongside their parameters that need to be ignored." + } + ], + "_type": "block", + "style": "normal", + "_key": "1ec32da87718", + "markDefs": [] + }, + { + "style": "normal", + "_key": "21011ba2eebb", + "children": [ + { + "_key": "fb83cbc73d9c", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "style": "h1", + "_key": "2fb9951a57ab", + "children": [ + { + "text": "But, why not stick to nf-validation?", + "_key": "bd4422a1e609", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "bbe6c3bad400", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In February we released an earlier version of these changes as ", + "_key": "e7a666a0c610" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "nf-validation", + "_key": "ff9b34449feb" + }, + { + "_type": "span", + "marks": [], + "text": " version ", + "_key": "5cf52d484530" + }, + { + "text": "2.0.0", + "_key": "236330b73fe4", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "text": ". This immediately caused massive issues in quite some nf-core pipelines (I think I set a new record of how many pipelines could be broken by one release). This was due to the fact that a lot of pipelines didn’t pin the ", + "_key": "0a3d44eb930a", + "_type": "span", + "marks": [] + }, + { + "_key": "492d011079ca", + "_type": "span", + "marks": [ + "code" + ], + "text": "nf-validation" + }, + { + "_key": "d8d865f34715", + "_type": "span", + "marks": [], + "text": " version, so all these pipelines started pulling the newest version of " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "nf-validation", + "_key": "83b8f63a89f5" + }, + { + "_key": "ff70501515bf", + "_type": "span", + "marks": [], + "text": ". The pipelines all started showing errors because this release contained breaking changes. 
For that reason we decided to remove the version " + }, + { + "_key": "0454334d9b79", + "_type": "span", + "marks": [ + "code" + ], + "text": "2.0.0" + }, + { + "marks": [], + "text": " release until more pipelines pinned their plugin versions.", + "_key": "3d8809f627f0", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "86f86debaf22", + "children": [ + { + "text": "", + "_key": "0b8232ebbad5", + "_type": "span" + } + ], + "_type": "block" + }, + { + "asset": { + "_ref": "image-239f27b0217caa526c6f764d89bf310d3a3cbd68-700x449-jpg", + "_type": "reference" + }, + "_type": "image", + "alt": "meme on bright landscape", + "_key": "8febfc4db68c" + }, + { + "children": [ + { + "marks": [], + "text": "Some discussion arose from this and we decided that version ", + "_key": "08c2733b760f", + "_type": "span" + }, + { + "text": "2.0.0", + "_key": "48cb42d9638a", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " would always cause issues since a lot of older versions of the nf-core pipelines didn’t pin their nf-validation version either, which would mean that all those older versions (that were probably running as production pipelines) would suddenly start breaking. That’s why there seemed to be only one sensible solution: make a new plugin with the breaking changes! And it would also need a new name. We started collecting feedback from the community and got some very nice suggestions. I made a poll with the 5 most popular suggestions and let everyone vote on their preferred options. The last place was tied between ", + "_key": "cec69e3d1055" + }, + { + "marks": [ + "code" + ], + "text": "nf-schemavalidator", + "_key": "0b3cc483bfaa", + "_type": "span" + }, + { + "_key": "f9ee8b13de9f", + "_type": "span", + "marks": [], + "text": " and " + }, + { + "marks": [ + "code" + ], + "text": "nf-validationutils", + "_key": "a0bdcdad8fda", + "_type": "span" + }, + { + "_key": "ee37dfd1b1d4", + "_type": "span", + "marks": [], + "text": ", both with 3 votes. In third place was " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "nf-checker", + "_key": "94ce6625b953" + }, + { + "_type": "span", + "marks": [], + "text": " with 4 votes. The second place belonged to ", + "_key": "4907bd28871a" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "nf-validation2", + "_key": "393799a5b8bb" + }, + { + "text": " with 7 votes. 
And with 13 votes we had a winner: ", + "_key": "90b126b2f06b", + "_type": "span", + "marks": [] + }, + { + "_key": "cfab906bf20e", + "_type": "span", + "marks": [ + "code" + ], + "text": "nf-schema" + }, + { + "_type": "span", + "marks": [], + "text": "!", + "_key": "626b46943343" + } + ], + "_type": "block", + "style": "normal", + "_key": "c8dc926688ca", + "markDefs": [] + }, + { + "_key": "8b57ba4050e3", + "children": [ + { + "text": "", + "_key": "11220320c5ac", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "5322be25a297", + "markDefs": [], + "children": [ + { + "_key": "fd2c71d5fb7e", + "_type": "span", + "marks": [], + "text": "So, a fork was made of " + }, + { + "_key": "9b74baef694d", + "_type": "span", + "marks": [ + "code" + ], + "text": "nf-validation" + }, + { + "_type": "span", + "marks": [], + "text": " that we called ", + "_key": "9355dd8e469d" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "nf-schema", + "_key": "905247761264" + }, + { + "_type": "span", + "marks": [], + "text": ". At this point, the only breaking change was the new JSON schema draft, but some other feature requests started pouring in. That’s the reason why the new ", + "_key": "669935b43343" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "samplesheetToList", + "_key": "f20c6fd3c7e1" + }, + { + "marks": [], + "text": " function and the configuration options were implemented before the first release of ", + "_key": "212ff02dd970", + "_type": "span" + }, + { + "_key": "dd8bb6e8c322", + "_type": "span", + "marks": [ + "code" + ], + "text": "nf-schema" + }, + { + "_type": "span", + "marks": [], + "text": " on the 22nd of April 2024.", + "_key": "a722e6ebd90f" + } + ] + }, + { + "_key": "fe63f2ff0f53", + "children": [ + { + "text": "", + "_key": "fcc90647c490", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "And to try and mitigate the same issue from ever happening again, we added an automatic warning when the pipeline is being run with an unpinned version of nf-schema:", + "_key": "1c28267a8794" + } + ], + "_type": "block", + "style": "normal", + "_key": "8c6bcbf9d2c6", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "1bc316056e7b", + "children": [ + { + "_type": "span", + "text": "", + "_key": "b7e8253cf0a5" + } + ] + }, + { + "_key": "c1c83b3a7dae", + "asset": { + "_type": "reference", + "_ref": "image-698fd36bf5711cd33a2f8698de7d185307ae0475-1222x706-png" + }, + "_type": "image", + "alt": "meme on bright landscape" + }, + { + "children": [ + { + "text": "So, what’s next?", + "_key": "79f5134830cd", + "_type": "span" + } + ], + "_type": "block", + "style": "h1", + "_key": "8b4730a07b50" + }, + { + "_key": "f862941d617d", + "markDefs": [], + "children": [ + { + "text": "One of the majorly requested features is the support for nested parameters. The version ", + "_key": "705985247f01", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "2.0.0", + "_key": "d3253aefaff1" + }, + { + "_type": "span", + "marks": [], + "text": " already was getting pretty big so I decided not to implement any extra features into it. 
This is, however, one of the first features that I will try to tackle in version ", + "_key": "d0c13a9c5a99" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "2.1.0", + "_key": "4e6b9455cf99" + }, + { + "text": ".", + "_key": "636697ef8ca1", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "", + "_key": "6108cde97506", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "e232521fca71" + }, + { + "_key": "8fffb61d5034", + "markDefs": [], + "children": [ + { + "text": "Furthermore, I’d also like to improve the functionality of the ", + "_key": "be783c71018e", + "_type": "span", + "marks": [] + }, + { + "_key": "a0b32372067d", + "_type": "span", + "marks": [ + "code" + ], + "text": "exists" + }, + { + "_key": "f79d219484af", + "_type": "span", + "marks": [], + "text": " keyword to also work for non-conventional paths (like s3 and azure paths)." + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "6315a44185e8", + "children": [ + { + "text": "", + "_key": "8e802abc828f", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "5dc37dd531e8", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "It’s also a certainty that some weird bugs will pop up over time, those will, of course, also be fixed.", + "_key": "139b42394912" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "", + "_key": "11c9687736a9", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "9f554c7c3e1a" + }, + { + "style": "h1", + "_key": "6c618ada4104", + "children": [ + { + "_key": "2f928f58c111", + "_type": "span", + "text": "Useful links" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Here are some useful links to get you started on using ", + "_key": "98c5f165299b" + }, + { + "text": "nf-schema", + "_key": "9953d344176e", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "text": ":", + "_key": "0adbda61159a", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "e28b645cda1f" + }, + { + "style": "normal", + "_key": "08009208a998", + "children": [ + { + "_key": "d7a7baa98e97", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "f8f7bb1de8c6", + "markDefs": [ + { + "_key": "fdb156f6aa74", + "_type": "link", + "href": "https://nextflow-io.github.io/nf-schema/latest/migration_guide/" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "If you want to easily migrate from nf-validation to ", + "_key": "3ec880194e54" + }, + { + "marks": [ + "code" + ], + "text": "nf-schema", + "_key": "738fb112831b", + "_type": "span" + }, + { + "text": ", you can use the migration guide: ", + "_key": "634d2323a5d2", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "fdb156f6aa74" + ], + "text": "https://nextflow-io.github.io/nf-schema/latest/migration_guide/", + "_key": "8a04ea7611f1", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "b0dbc39c2777", + "children": [ + { + "text": "", + "_key": "d3878d3c54fa", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "c2ca10ab78aa", + "markDefs": [ + { + "_key": "fe3e1325cc57", + "_type": "link", + "href": "https://nextflow-io.github.io/nf-schema/latest/" + } + ], + 
"children": [ + { + "_type": "span", + "marks": [], + "text": "If you are completely new to the plugin I suggest reading through the documentation: ", + "_key": "1a0832d74efd" + }, + { + "_key": "e8359b15d7c4", + "_type": "span", + "marks": [ + "fe3e1325cc57" + ], + "text": "https://nextflow-io.github.io/nf-schema/latest/" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "b56bf0d23181", + "children": [ + { + "_type": "span", + "text": "", + "_key": "535ace53ae9e" + } + ] + }, + { + "_key": "e4f0e0c6d11c", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nextflow-io/nf-schema/tree/master/examples", + "_key": "eb0ce1a617cc" + } + ], + "children": [ + { + "text": "If you need some examples, look no further: ", + "_key": "4675fb089ce6", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "eb0ce1a617cc" + ], + "text": "https://github.com/nextflow-io/nf-schema/tree/master/examples", + "_key": "b2940dbc2f60", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "ad917c336b59", + "children": [ + { + "_type": "span", + "text": "", + "_key": "acf4ca07880a" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "10cb5305b26d", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "And to conclude this blog post, here are some very wise words from Master Yoda himself:", + "_key": "f652c8e087aa", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "58da1793654b", + "children": [ + { + "_type": "span", + "text": "", + "_key": "398089426392" + } + ] + }, + { + "_type": "image", + "alt": "meme on bright landscape", + "_key": "df414409c56f", + "asset": { + "_ref": "image-843ac5ff3a7a120ca30f9715e0ca6facb8ce52e8-620x714-jpg", + "_type": "reference" + } + } + ], + "tags": [ + { + "_type": "reference", + "_key": "9779239d538d", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493" + } + ], + "author": { + "_ref": "ntV3A5cVsWRByk7zltFcg5", + "_type": "reference" + }, + "_createdAt": "2024-09-25T14:18:25Z", + "_rev": "5lTkDsqMC29L3wnnkjjRvZ", + "_updatedAt": "2024-09-26T09:04:54Z", + "title": "nf-schema: the new and improved nf-validation", + "meta": { + "slug": { + "current": "nf-schema" + } + }, + "_type": "blogPost" + }, + { + "_createdAt": "2024-09-25T14:15:26Z", + "author": { + "_type": "reference", + "_ref": "paolo-di-tommaso" + }, + "meta": { + "slug": { + "current": "clarification-about-nextflow-license" + } + }, + "tags": [ + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "b58becd39b3f" + } + ], + "_rev": "hf9hwMPb7ybAE3bqEU5rM7", + "body": [ + { + "children": [ + { + "marks": [], + "text": "Over past week there was some discussion on social media regarding the Nextflow license and its impact on users' workflow applications.", + "_key": "ee78b813fab1", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "175bb65272ff", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "84ecccec59da", + "children": [ + { + "_key": "ac14e49589b7", + "_type": "span", + "text": "" + } + ] + }, + { + "style": "normal", + "_key": "a96b485dbcce", + "markDefs": [ + { + "_type": "link", + "href": "https://t.co/Paip5W1wgG", + "_key": "87a0f40da5a3" + }, + { + "_type": "link", + "href": "https://twitter.com/klmr/status/1016606226103357440?ref_src=twsrc%5Etfw", + "_key": "a7c3f2001fbc" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "> … don’t use 
Nextflow, yo. ", + "_key": "a657de1b93d1" + }, + { + "_type": "span", + "marks": [ + "87a0f40da5a3" + ], + "text": "https://t.co/Paip5W1wgG", + "_key": "c7f84741ceb4" + }, + { + "text": " > > — Konrad Rudolph 👨‍🔬💻 (@klmr) ", + "_key": "df20581549ab", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "a7c3f2001fbc" + ], + "text": "July 10, 2018", + "_key": "b57402b6d178" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "0c254a636365" + } + ], + "_type": "block", + "style": "normal", + "_key": "76191fb8753d" + }, + { + "_key": "4cfe474751ac", + "src": "https://platform.twitter.com/widgets.js", + "_type": "script", + "id": "" + }, + { + "_type": "block", + "style": "normal", + "_key": "7ae88b926958", + "markDefs": [ + { + "href": "https://twitter.com/commonwl?ref_src=twsrc%5Etfw", + "_key": "8da087f4436d", + "_type": "link" + }, + { + "_key": "5fc2d23ea4a7", + "_type": "link", + "href": "https://t.co/mIbdLQQxmf" + }, + { + "_key": "61d1013f2820", + "_type": "link", + "href": "https://twitter.com/jdidion/status/1016612435938160640?ref_src=twsrc%5Etfw" + } + ], + "children": [ + { + "text": "> This is certainly disappointing. An argument in favor of writing workflows in ", + "_key": "90ad98d5ae9e", + "_type": "span", + "marks": [] + }, + { + "_key": "f4b040ecb095", + "_type": "span", + "marks": [ + "8da087f4436d" + ], + "text": "@commonwl" + }, + { + "text": ", which is independent of the execution engine. ", + "_key": "6a6046a57a31", + "_type": "span", + "marks": [] + }, + { + "_key": "a34dce862274", + "_type": "span", + "marks": [ + "5fc2d23ea4a7" + ], + "text": "https://t.co/mIbdLQQxmf" + }, + { + "_key": "7024a9be1a68", + "_type": "span", + "marks": [], + "text": " > > — John Didion (@jdidion) " + }, + { + "marks": [ + "61d1013f2820" + ], + "text": "July 10, 2018", + "_key": "e207361ccfce", + "_type": "span" + } + ] + }, + { + "children": [ + { + "text": "", + "_key": "990dc03bf6cb", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "608e5a8d8321" + }, + { + "_type": "script", + "id": "", + "_key": "12505f3adfa8", + "src": "https://platform.twitter.com/widgets.js" + }, + { + "style": "normal", + "_key": "29ae1994035c", + "markDefs": [ + { + "href": "https://twitter.com/geoffjentry/status/1016656901139025921?ref_src=twsrc%5Etfw", + "_key": "a21919485f66", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "> GPL is generally considered toxic to companies due to fear of the viral nature of the license. 
> > — Jeff Gentry (@geoffjentry) ", + "_key": "62717671367d" + }, + { + "marks": [ + "a21919485f66" + ], + "text": "July 10, 2018", + "_key": "bec70789dbe9", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "7a0e1f5969b7" + } + ], + "_type": "block", + "style": "normal", + "_key": "a93885b279ac" + }, + { + "_type": "script", + "id": "", + "_key": "dee581e61df8", + "src": "https://platform.twitter.com/widgets.js" + }, + { + "_type": "block", + "style": "h3", + "_key": "87bfec47e7b7", + "children": [ + { + "text": "What's the problem with GPL?", + "_key": "7316efd3be6b", + "_type": "span" + } + ] + }, + { + "_key": "86ba355feefd", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nextflow-io/nextflow/blob/c080150321e5000a2c891e477bb582df07b7f75f/src/main/groovy/nextflow/Nextflow.groovy", + "_key": "dfe16ce44dc5" + }, + { + "_key": "a1cd838b0b74", + "_type": "link", + "href": "https://www.kernel.org/doc/html/v4.17/process/license-rules.html" + }, + { + "_type": "link", + "href": "https://git-scm.com/about/free-and-open-source", + "_key": "e4ccca697567" + } + ], + "children": [ + { + "text": "Nextflow has been released under the GPLv3 license since its early days ", + "_key": "8010f6e6688a", + "_type": "span", + "marks": [] + }, + { + "_key": "de3a5796bda4", + "_type": "span", + "marks": [ + "dfe16ce44dc5" + ], + "text": "over 5 years ago" + }, + { + "text": ". GPL is a very popular open source licence used by many projects (like, for example, ", + "_key": "0f0640a12398", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "a1cd838b0b74" + ], + "text": "Linux", + "_key": "e55524b52f93", + "_type": "span" + }, + { + "text": " and ", + "_key": "75a849cec54e", + "_type": "span", + "marks": [] + }, + { + "text": "Git", + "_key": "f653cdc75a04", + "_type": "span", + "marks": [ + "e4ccca697567" + ] + }, + { + "_type": "span", + "marks": [], + "text": ") and it has been designed to promote the adoption and spread of open source software and culture.", + "_key": "cac16b0a10c8" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "9760cd034370", + "children": [ + { + "_key": "88045297fe1d", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "516568a7f149", + "markDefs": [], + "children": [ + { + "_key": "67ca6f3357e2", + "_type": "span", + "marks": [], + "text": "With this idea in mind, GPL requires the author of a piece of software, " + }, + { + "marks": [ + "em" + ], + "text": "derived", + "_key": "af1bdb65ba54", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " from a GPL licensed application or library, to distribute it using the same license i.e. 
GPL itself.", + "_key": "3665030fd62c" + } + ] + }, + { + "_key": "e414d92f0b02", + "children": [ + { + "_key": "d88862da47ae", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "cf606a9dd27b", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "This is generally good, because this requirement incentives the growth of the open source ecosystem and the adoption of open source software more widely.", + "_key": "8078718dc20d" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "a6bd1cb31c03" + } + ], + "_type": "block", + "style": "normal", + "_key": "7a2e887730d9" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "However, this is also a reason for concern by some users and organizations because it's perceived as too strong requirement by copyright holders (who may not want to disclose their code) and because it can be difficult to interpret what a ", + "_key": "d01089b0e094" + }, + { + "_type": "span", + "text": "\\*", + "_key": "ffcfc0ec7171" + }, + { + "_type": "span", + "marks": [], + "text": "derived", + "_key": "d2705bd819a2" + }, + { + "_key": "a85fe6c2d77f", + "_type": "span", + "text": "\\*" + }, + { + "text": " application is. See for example ", + "_key": "2690b960e6b9", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "4d48789afaa4" + ], + "text": "this post by Titus Brown", + "_key": "ed12fb795b45", + "_type": "span" + }, + { + "text": " at this regard.", + "_key": "268f8bbf1dca", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "d859390f48ab", + "markDefs": [ + { + "_type": "link", + "href": "http://ivory.idyll.org/blog/2015-on-licensing-in-bioinformatics.html", + "_key": "4d48789afaa4" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "e2410d32c9e3", + "children": [ + { + "_key": "b779685b201f", + "_type": "span", + "text": "" + } + ] + }, + { + "_type": "block", + "style": "h4", + "_key": "017924093f51", + "children": [ + { + "_type": "span", + "text": "What's the impact of the Nextflow license on my application?", + "_key": "1e9fd9ba1417" + } + ] + }, + { + "markDefs": [ + { + "_key": "be15ceecf475", + "_type": "link", + "href": "https://www.gnu.org/licenses/gpl-faq.en.html#GPLStaticVsDynamic" + }, + { + "_type": "link", + "href": "https://www.gnu.org/licenses/gpl-faq.en.html#IfInterpreterIsGPL", + "_key": "dfa2b3bfcc6c" + } + ], + "children": [ + { + "text": "If you are not distributing your application, based on Nextflow, it doesn't affect you in any way. If you are distributing an application that requires Nextflow to be executed, technically speaking your application is dynamically linking to the Nextflow runtime and it uses routines provided by it. For this reason your application should be released as GPLv3. 
See ", + "_key": "63c76845a7a9", + "_type": "span", + "marks": [] + }, + { + "text": "here", + "_key": "50c2a234346e", + "_type": "span", + "marks": [ + "be15ceecf475" + ] + }, + { + "_key": "a9e11a811a32", + "_type": "span", + "marks": [], + "text": " and " + }, + { + "marks": [ + "dfa2b3bfcc6c" + ], + "text": "here", + "_key": "81d384dcfed5", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "02183fb1e08b" + } + ], + "_type": "block", + "style": "normal", + "_key": "426b98fe01e9" + }, + { + "style": "normal", + "_key": "a5b275e114bc", + "children": [ + { + "_type": "span", + "text": "", + "_key": "0a086dd63d75" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "725359980b14", + "markDefs": [], + "children": [ + { + "text": "**However, this was not our original intention. We don’t consider workflow applications to be subject to the GPL copyleft obligations of the GPL even though they may link dynamically to Nextflow functionality through normal calls and we are not interested to enforce the license requirement to third party workflow developers and organizations. Therefore you can distribute your workflow application using the license of your choice. For other kind of derived applications the GPL license should be used, though. **", + "_key": "48defa127178", + "_type": "span", + "marks": [] + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "746ae2fc9c15", + "children": [ + { + "_type": "span", + "text": "", + "_key": "202297dcde47" + } + ] + }, + { + "children": [ + { + "text": "That's all?", + "_key": "bdca4b261402", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "1fb5ead2d223" + }, + { + "style": "normal", + "_key": "fbae09602d68", + "markDefs": [], + "children": [ + { + "_key": "aba2c8e39e71", + "_type": "span", + "marks": [], + "text": "No. We are aware that this is not enough and the GPL licence can impose some limitation in the usage of Nextflow to some users and organizations. For this reason we are working with the CRG legal department to move Nextflow to a more permissive open source license. This is primarily motivated by our wish to make it more adaptable and compatible with all the different open source ecosystems, but also to remove any remaining legal uncertainty that using Nextflow through linking with its functionality may cause." 
+ } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "", + "_key": "a4ed4b2d581b", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "30cf55885fa1" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "We are expecting that this decision will be made over the summer so stay tuned and continue to enjoy Nextflow.", + "_key": "d4a47a502983" + } + ], + "_type": "block", + "style": "normal", + "_key": "730f9465b484", + "markDefs": [] + } + ], + "_type": "blogPost", + "_id": "a27166ee2aaf", + "title": "Clarification about the Nextflow license", + "publishedAt": "2018-07-20T06:00:00.000Z", + "_updatedAt": "2024-09-26T09:01:54Z" + }, + { + "meta": { + "slug": { + "current": "5-more-tips-for-nextflow-user-on-hpc" + } + }, + "_rev": "mvya9zzDXWakVjnX4hhCWc", + "publishedAt": "2021-06-15T06:00:00.000Z", + "tags": [ + { + "_type": "reference", + "_key": "8dc4bd5f302a", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493" + } + ], + "body": [ + { + "_key": "1e079bccd85f", + "markDefs": [ + { + "_type": "link", + "href": "/blog/2021/5_tips_for_hpc_users.html", + "_key": "bfb7c4f40434" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In May we blogged about ", + "_key": "c046a7075e94" + }, + { + "marks": [ + "bfb7c4f40434" + ], + "text": "Five Nextflow Tips for HPC Users", + "_key": "8dffc4f9faa5", + "_type": "span" + }, + { + "_key": "f2ddefd9788f", + "_type": "span", + "marks": [], + "text": " and now we continue the series with five additional tips for deploying Nextflow with on HPC batch schedulers." + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "19821646d6ac", + "children": [ + { + "text": "", + "_key": "9ddf2ce8d302", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "h3", + "_key": "dc2bfb4d8f87", + "children": [ + { + "_type": "span", + "text": "1. Use the scratch directive", + "_key": "01ef60790a82" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_key": "a5e06a741038", + "_type": "span", + "marks": [], + "text": "To allow the pipeline tasks to share data with each other, Nextflow requires a shared file system path as a working directory. When using this model, a common recommendation is to use the node's local scratch storage as the job working directory to avoid unnecessary use of the network shared file system and achieve better performance." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "ff3d627e7f99" + }, + { + "_key": "c2f1ac172f15", + "children": [ + { + "_key": "b0358dd5e166", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "766afde1bbbd", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow implements this best-practice which can be enabled by adding the following setting in your ", + "_key": "6b6ea5ab02b9" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "nextflow.config", + "_key": "45490e6d5d3b" + }, + { + "_type": "span", + "marks": [], + "text": " file.", + "_key": "6735391e2415" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "d7ace5b3ca18", + "children": [ + { + "_key": "5da7b5339ecc", + "_type": "span", + "text": "" + } + ] + }, + { + "code": "process.scratch = true", + "_type": "code", + "_key": "02fe0e00b2a7" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "b2b279450e48" + } + ], + "_type": "block", + "style": "normal", + "_key": "d2d7bb191194" + }, + { + "style": "normal", + "_key": "d80fb760f6fa", + "markDefs": [], + "children": [ + { + "text": "When using this option, Nextflow:", + "_key": "0c47b9a53384", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "8244e6f55d52", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "ce810f8e9d10" + }, + { + "_key": "23dc40c260cf", + "listItem": "bullet", + "children": [ + { + "_type": "span", + "marks": [], + "text": "Creates a unique directory in the computing node's local `/tmp` or the path assigned by your cluster via the `TMPDIR` environment variable.", + "_key": "d4066fbc5255" + }, + { + "text": "Creates a [symlink](https://en.wikipedia.org/wiki/Symbolic_link) for each input file required by the job execution.", + "_key": "14c7471ec385", + "_type": "span", + "marks": [] + }, + { + "marks": [], + "text": "Runs the job in the local scratch path. Copies the job output files into the job shared work directory assigned by Nextflow.", + "_key": "6b3195981c77", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "d4afe9f642bf", + "children": [ + { + "_key": "c69eb890a3eb", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "2. Use -bg option to launch the execution in the background", + "_key": "d8e69ea668d2" + } + ], + "_type": "block", + "style": "h3", + "_key": "b7d2badc9baf" + }, + { + "children": [ + { + "_key": "c6f481f101e0", + "_type": "span", + "marks": [], + "text": "In some circumstances, you may need to run your Nextflow pipeline in the background without losing the execution output. 
In this scenario use the " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "-bg", + "_key": "ea12253295cd" + }, + { + "_type": "span", + "marks": [], + "text": " command line option as shown below.", + "_key": "1b372e263059" + } + ], + "_type": "block", + "style": "normal", + "_key": "54073a68f72f", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "b85a8de0d7b2", + "children": [ + { + "_key": "0cce4c63d040", + "_type": "span", + "text": "" + } + ] + }, + { + "_type": "code", + "_key": "8b5e2de93bed", + "code": "nextflow run -bg > my-file.log" + }, + { + "_type": "block", + "style": "normal", + "_key": "91e9d34ab66f", + "children": [ + { + "_type": "span", + "text": "", + "_key": "5264e2054015" + } + ] + }, + { + "style": "normal", + "_key": "e66b198a1e92", + "markDefs": [], + "children": [ + { + "_key": "1949ca52abcd", + "_type": "span", + "marks": [], + "text": "This can be very useful when launching the execution from an SSH connected terminal and ensures that any connection issues don't stop the pipeline. You can use " + }, + { + "_key": "7c866d93dd99", + "_type": "span", + "marks": [ + "code" + ], + "text": "ps" + }, + { + "_key": "ecfcabadac2a", + "_type": "span", + "marks": [], + "text": " and " + }, + { + "_key": "929e119c2b64", + "_type": "span", + "marks": [ + "code" + ], + "text": "kill" + }, + { + "marks": [], + "text": " to find and stop the execution.", + "_key": "7b314c62d6f0", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "487075f9f234", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "94878577c759" + }, + { + "children": [ + { + "text": "3. Disable interactive logging", + "_key": "ddd311cc0801", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "70e9706e7e4c" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow has rich terminal logging which uses ANSI escape codes to update the pipeline execution counters interactively. However, this is not very useful when submitting the pipeline execution as a cluster job or in the background. In this case, disable the rich ANSI logging using the command line option ", + "_key": "e873cb7f406f" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "-ansi-log false", + "_key": "1beece2a9231" + }, + { + "_key": "0cb747b1d521", + "_type": "span", + "marks": [], + "text": " or the environment variable " + }, + { + "text": "NXF_ANSI_LOG=false", + "_key": "a739f0a18bb2", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "f2cc5643031a" + } + ], + "_type": "block", + "style": "normal", + "_key": "c1f6f23aff82" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "533bb7cfb10b" + } + ], + "_type": "block", + "style": "normal", + "_key": "9bae6268dad3" + }, + { + "_key": "af36a9faaa29", + "children": [ + { + "_type": "span", + "text": "4. 
Cluster native options", + "_key": "32773edf1aba" + } + ], + "_type": "block", + "style": "h3" + }, + { + "markDefs": [ + { + "href": "https://www.nextflow.io/docs/latest/process.html#cpus", + "_key": "45b91467b042", + "_type": "link" + }, + { + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/process.html#memory", + "_key": "6f1095c8d436" + }, + { + "_key": "4e0d3d51b11b", + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/process.html#disk" + } + ], + "children": [ + { + "_key": "db0e7952fe3a", + "_type": "span", + "marks": [], + "text": "Nextlow has portable directives for common resource requests such as " + }, + { + "text": "cpus", + "_key": "73895a9b020d", + "_type": "span", + "marks": [ + "45b91467b042" + ] + }, + { + "marks": [], + "text": ", ", + "_key": "8d7820f7f3bb", + "_type": "span" + }, + { + "text": "memory", + "_key": "fbf16c8d59c1", + "_type": "span", + "marks": [ + "6f1095c8d436" + ] + }, + { + "_key": "c94428074a20", + "_type": "span", + "marks": [], + "text": " and " + }, + { + "marks": [ + "4e0d3d51b11b" + ], + "text": "disk", + "_key": "34f31b12a268", + "_type": "span" + }, + { + "marks": [], + "text": " allocation.", + "_key": "674c55b3ca7b", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "2bbc3d94ca92" + }, + { + "style": "normal", + "_key": "b4e2367f0e17", + "children": [ + { + "_type": "span", + "text": "", + "_key": "4700d4255ba0" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "da2b346173c8", + "_type": "span", + "marks": [], + "text": "These directives allow you to specify the request for a certain number of computing resources e.g CPUs, memory, or disk and Nextflow converts these values to the native setting of the target execution platform specified in the pipeline configuration." + } + ], + "_type": "block", + "style": "normal", + "_key": "516302240cf2", + "markDefs": [] + }, + { + "style": "normal", + "_key": "11a5b181732b", + "children": [ + { + "_type": "span", + "text": "", + "_key": "bf4e7b5073e8" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "3e4b55d3d37c", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "However, there can be settings that are only available on some specific cluster technology or vendors.", + "_key": "38e4e68000f6" + } + ] + }, + { + "_key": "a6d4d4b1773a", + "children": [ + { + "_type": "span", + "text": "", + "_key": "2e53882af5d6" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "b115bff69e24", + "markDefs": [ + { + "href": "https://www.nextflow.io/docs/latest/process.html#clusterOptions", + "_key": "894e1a7d3a32", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "The ", + "_key": "05b6aa762afd", + "_type": "span" + }, + { + "_key": "bcf1d9c3b577", + "_type": "span", + "marks": [ + "894e1a7d3a32" + ], + "text": "clusterOptions" + }, + { + "marks": [], + "text": " directive allows you to specify any option of your resource manager for which there isn't direct support in Nextflow.", + "_key": "65f30d18a9e7", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "aef3b822f954", + "children": [ + { + "_key": "61d8396a97ed", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "h3", + "_key": "d5097179e6f5", + "children": [ + { + "text": "5. 
Retry failing jobs increasing resource allocation", + "_key": "188ff7cae9e9", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "738a13554797", + "markDefs": [], + "children": [ + { + "_key": "eaa1f97cc36b", + "_type": "span", + "marks": [], + "text": "A common scenario is that instances of the same process may require different computing resources. For example, requesting an amount of memory that is too low for some processes will result in those tasks failing. You could specify a higher limit which would accommodate the task with the highest memory utilization, but you then run the risk of decreasing your job’s execution priority." + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "f0ef6221b3af", + "children": [ + { + "_type": "span", + "text": "", + "_key": "5f1bd5086ec8" + } + ], + "_type": "block" + }, + { + "_key": "1c44ba489d73", + "markDefs": [], + "children": [ + { + "text": "Nextflow provides a mechanism that allows you to modify the amount of computing resources requested in the case of a process failure and attempt to re-execute it using a higher limit. For example:", + "_key": "eec942bfa508", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "0bb9719ac1e0", + "children": [ + { + "text": "", + "_key": "824cd1f57ed7", + "_type": "span" + } + ], + "_type": "block" + }, + { + "code": "process foo {\n\n memory { 2.GB * task.attempt }\n time { 1.hour * task.attempt }\n\n errorStrategy { task.exitStatus in 137..140 ? 'retry' : 'terminate' }\n maxRetries 3\n\n script:\n \"\"\"\n your_job_command --here\n \"\"\"\n}", + "_type": "code", + "_key": "19507665b71c" + }, + { + "_key": "c0f503f0cf67", + "children": [ + { + "_type": "span", + "text": "", + "_key": "4e617dceddfe" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In the above example the memory and execution time limits are defined dynamically. The first time the process is executed the task.attempt is set to 1, thus it will request 2 GB of memory and one hour of maximum execution time.", + "_key": "d2b5dd2fdc3b" + } + ], + "_type": "block", + "style": "normal", + "_key": "c7e9fb85fe6d" + }, + { + "style": "normal", + "_key": "35e7bff90950", + "children": [ + { + "text": "", + "_key": "2a1bd20061e1", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "487fc35e66d5", + "_type": "span", + "marks": [], + "text": "If the task execution fails, reporting an exit status in the range between 137 and 140, the task is re-submitted (otherwise it terminates immediately). This time the value of task.attempt is 2, thus increasing the amount of the memory to four GB and the time to 2 hours, and so on." + } + ], + "_type": "block", + "style": "normal", + "_key": "1e12eb2be3be" + }, + { + "_type": "block", + "style": "normal", + "_key": "ef37c1e2c593", + "children": [ + { + "_type": "span", + "text": "", + "_key": "a0a7ffb7f6cf" + } + ] + }, + { + "style": "normal", + "_key": "03cc831da6b2", + "markDefs": [], + "children": [ + { + "_key": "8ea3741f842b", + "_type": "span", + "marks": [], + "text": "NOTE: These exit statuses are not standard and can change depending on the resource manager you are using. Consult your cluster administrator or scheduler administration guide for details on the exit statuses used by your cluster in similar error conditions." 
+ } + ], + "_type": "block" + }, + { + "_key": "54000e295e84", + "children": [ + { + "_key": "a54f4bd5ecd6", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "Conclusion", + "_key": "5d31a1142abd" + } + ], + "_type": "block", + "style": "h3", + "_key": "1af0fb865cb9" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow aims to give you control over every aspect of your workflow. These Nextflow options allow you to shape how Nextflow submits your processes to your executor, that can make your workflow more robust by avoiding the overloading of the executor. Some systems have hard limits which if you do not take into account, no processes will be executed. Being aware of these configuration values and how to use them is incredibly helpful when working with larger workflows. ", + "_key": "53bbce01c2c8" + }, + { + "_key": "0ee52d6a2216", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "9952ef367fe5", + "markDefs": [] + } + ], + "_id": "a3774eee3abe", + "author": { + "_ref": "5bLgfCKN00diCN0ijmWND4", + "_type": "reference" + }, + "title": "Five more tips for Nextflow user on HPC", + "_createdAt": "2024-09-25T14:15:52Z", + "_updatedAt": "2024-09-26T09:02:24Z", + "_type": "blogPost" + }, + { + "tags": [ + { + "_type": "reference", + "_key": "52686bd7856a", + "_ref": "ace8dd2c-eed3-4785-8911-d146a4e84bbb" + }, + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "2346151467fb" + } + ], + "meta": { + "description": "This is a guest post authored by Maxime Garcia from the Science for Life Laboratory in Sweden. Max describes how they deploy complex cancer data analysis pipelines using Nextflow and Singularity. We are very happy to share their experience across the Nextflow community.", + "slug": { + "current": "caw-and-singularity" + } + }, + "_updatedAt": "2024-10-14T09:30:32Z", + "author": { + "_ref": "c121be61-087a-4ca7-a3c2-1729e5d706f3", + "_type": "reference" + }, + "body": [ + { + "markDefs": [], + "children": [ + { + "_key": "a5016cfc0d3f", + "_type": "span", + "marks": [ + "em" + ], + "text": "This is a guest post authored by Maxime Garcia from the Science for Life Laboratory in Sweden. Max describes how they deploy complex cancer data analysis pipelines using Nextflow and Singularity. We are very happy to share their experience across the Nextflow community." + } + ], + "_type": "block", + "style": "normal", + "_key": "d3ae7d6b8b48" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "58cbc65b5de5", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "795be87b0121" + }, + { + "_type": "block", + "style": "h2", + "_key": "5a64db12bd62", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The CAW pipeline", + "_key": "143e44dfb014" + } + ] + }, + { + "alignment": "right", + "asset": { + "asset": { + "_ref": "image-b65e611f8f184178c224e7d045a5a63f4e7eac9f-197x100-png", + "_type": "reference" + }, + "_type": "image" + }, + "size": "small", + "_type": "picture", + "alt": "CAW logo", + "_key": "3a04869c3005" + }, + { + "children": [ + { + "_type": "span", + "marks": [ + "b5370da3669b" + ], + "text": "Cancer Analysis Workflow", + "_key": "9cbbdd284485" + }, + { + "marks": [], + "text": " (CAW for short) is a Nextflow based analysis pipeline developed for the analysis of tumour: normal pairs. 
It is developed in collaboration with two infrastructures within ", + "_key": "ad26c75ea4f4", + "_type": "span" + }, + { + "_key": "9b4acb7fa111", + "_type": "span", + "marks": [ + "376de3c4a017" + ], + "text": "Science for Life Laboratory" + }, + { + "marks": [], + "text": ": ", + "_key": "b2878166cde1", + "_type": "span" + }, + { + "_key": "668aa26bdca4", + "_type": "span", + "marks": [ + "5eac11f777ec" + ], + "text": "National Genomics Infrastructure" + }, + { + "marks": [], + "text": " (NGI), in The Stockholm ", + "_key": "ee4e4dd86a75", + "_type": "span" + }, + { + "_key": "12be2148a727", + "_type": "span", + "marks": [ + "c10c77ac5016" + ], + "text": "Genomics Applications Development Facility" + }, + { + "_key": "a6862a44a7f8", + "_type": "span", + "marks": [], + "text": " to be precise and " + }, + { + "_type": "span", + "marks": [ + "edb4c35acfd4" + ], + "text": "National Bioinformatics Infrastructure Sweden", + "_key": "1b61c1ecda59" + }, + { + "text": " (NBIS).", + "_key": "7a41341e9854", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "662c2673c6bf", + "markDefs": [ + { + "_key": "b5370da3669b", + "_type": "link", + "href": "http://opensource.scilifelab.se/projects/sarek/" + }, + { + "href": "https://www.scilifelab.se/", + "_key": "376de3c4a017", + "_type": "link" + }, + { + "_type": "link", + "href": "https://ngisweden.scilifelab.se/", + "_key": "5eac11f777ec" + }, + { + "_key": "c10c77ac5016", + "_type": "link", + "href": "https://www.scilifelab.se/facilities/ngi-stockholm/" + }, + { + "href": "https://www.nbis.se/", + "_key": "edb4c35acfd4", + "_type": "link" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "05e1a3bffe31", + "markDefs": [], + "children": [ + { + "_key": "4775462972f7", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "style": "normal", + "_key": "28b5d5e00f3f", + "markDefs": [ + { + "href": "https://software.broadinstitute.org/gatk/best-practices/", + "_key": "400b1a893741", + "_type": "link" + }, + { + "href": "https://github.com/broadinstitute/mutect/", + "_key": "29d1f2d25a2d", + "_type": "link" + }, + { + "href": "https://github.com/broadgsa/gatk-protected/", + "_key": "ec5a7e979f1f", + "_type": "link" + }, + { + "href": "https://github.com/Illumina/strelka/", + "_key": "ffc06c8a5432", + "_type": "link" + }, + { + "_type": "link", + "href": "https://github.com/ekg/freebayes/", + "_key": "2a2f890970e8" + }, + { + "_key": "affda0050980", + "_type": "link", + "href": "https://github.com/broadgsa/gatk-protected/" + }, + { + "_type": "link", + "href": "https://github.com/Illumina/manta/", + "_key": "3d1af233cd71" + }, + { + "href": "https://github.com/Crick-CancerGenomics/ascat/", + "_key": "75d0de940ae1", + "_type": "link" + }, + { + "href": "http://snpeff.sourceforge.net/", + "_key": "3c1912c340f6", + "_type": "link" + }, + { + "_key": "bc474e49db24", + "_type": "link", + "href": "https://www.ensembl.org/info/docs/tools/vep/index.html" + }, + { + "href": "http://multiqc.info/", + "_key": "7062b31b0eb9", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "CAW is based on ", + "_key": "54c97875242f" + }, + { + "text": "GATK Best Practices", + "_key": "f4d2fb053447", + "_type": "span", + "marks": [ + "400b1a893741" + ] + }, + { + "_type": "span", + "marks": [], + "text": " for the preprocessing of FastQ files, then uses various variant calling tools to look for somatic SNVs and small indels (", + "_key": "b8fa3833e0d5" + }, + { + "_type": 
"span", + "marks": [ + "29d1f2d25a2d" + ], + "text": "MuTect1", + "_key": "dfc943c67a90" + }, + { + "_type": "span", + "marks": [], + "text": ", ", + "_key": "0f3600d339b1" + }, + { + "_type": "span", + "marks": [ + "ec5a7e979f1f" + ], + "text": "MuTect2", + "_key": "a99a6bca29bc" + }, + { + "_type": "span", + "marks": [], + "text": ", ", + "_key": "a6b4d26158f8" + }, + { + "text": "Strelka", + "_key": "80ff453e43b3", + "_type": "span", + "marks": [ + "ffc06c8a5432" + ] + }, + { + "text": ", ", + "_key": "0afcd1050f50", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "2a2f890970e8" + ], + "text": "Freebayes", + "_key": "dd6f5967f128" + }, + { + "marks": [], + "text": "), (", + "_key": "f759ba55a4ab", + "_type": "span" + }, + { + "_key": "ec8897e6cfe0", + "_type": "span", + "marks": [ + "affda0050980" + ], + "text": "GATK HaplotyeCaller" + }, + { + "marks": [], + "text": "), for structural variants(", + "_key": "2a408fe12fc4", + "_type": "span" + }, + { + "text": "Manta", + "_key": "26af2e6cca8b", + "_type": "span", + "marks": [ + "3d1af233cd71" + ] + }, + { + "_key": "277875b6e802", + "_type": "span", + "marks": [], + "text": ") and for CNVs (" + }, + { + "_type": "span", + "marks": [ + "75d0de940ae1" + ], + "text": "ASCAT", + "_key": "3f1ad521d274" + }, + { + "_type": "span", + "marks": [], + "text": "). Annotation tools (", + "_key": "af749cb8be5f" + }, + { + "_key": "4bd03567c4f1", + "_type": "span", + "marks": [ + "3c1912c340f6" + ], + "text": "snpEff" + }, + { + "_key": "f3804b9f0ef7", + "_type": "span", + "marks": [], + "text": ", " + }, + { + "_type": "span", + "marks": [ + "bc474e49db24" + ], + "text": "VEP", + "_key": "d64ba9fe7363" + }, + { + "text": ") are also used, and finally ", + "_key": "f0c23ac43bfe", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "7062b31b0eb9" + ], + "text": "MultiQC", + "_key": "4ff43b64b73f", + "_type": "span" + }, + { + "_key": "c759b797cfc5", + "_type": "span", + "marks": [], + "text": " for handling reports." 
+ } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "236fa86215f7", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "fab4f4751001", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "f44d0fa4804b", + "markDefs": [ + { + "_key": "1115444b3f9c", + "_type": "link", + "href": "https://github.com/SciLifeLab/CAW/" + }, + { + "_type": "link", + "href": "https://gitter.im/SciLifeLab/CAW/", + "_key": "4f341343a76a" + } + ], + "children": [ + { + "_key": "0927870d18a0", + "_type": "span", + "marks": [], + "text": "We are currently working on a manuscript, but you're welcome to look at (or even contribute to) our " + }, + { + "text": "github repository", + "_key": "4ae440c4e9b0", + "_type": "span", + "marks": [ + "1115444b3f9c" + ] + }, + { + "_type": "span", + "marks": [], + "text": " or talk with us on our ", + "_key": "1c7e6f22debd" + }, + { + "text": "gitter channel", + "_key": "f39052b7cf1e", + "_type": "span", + "marks": [ + "4f341343a76a" + ] + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "65b9842de381" + } + ] + }, + { + "style": "normal", + "_key": "cbae57b40bee", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "6ee67b01bef2", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "a1e675dd46b5", + "_type": "span", + "marks": [], + "text": "Singularity and UPPMAX" + } + ], + "_type": "block", + "style": "h2", + "_key": "82375d0c0c2c", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "marks": [ + "6b1dda8f8f1a" + ], + "text": "Singularity", + "_key": "dbbc5e85c35e" + }, + { + "_key": "69dff9bf4e2c", + "_type": "span", + "marks": [], + "text": " is a tool package software dependencies into a contained environment, much like Docker. It's designed to run on HPC environments where Docker is often a problem due to its requirement for administrative privileges." + } + ], + "_type": "block", + "style": "normal", + "_key": "ad28f23966ae", + "markDefs": [ + { + "href": "http://singularity.lbl.gov/", + "_key": "6b1dda8f8f1a", + "_type": "link" + } + ] + }, + { + "_key": "da1510958d2b", + "markDefs": [], + "children": [ + { + "_key": "302b409d9911", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "8b40855a955f", + "markDefs": [ + { + "_type": "link", + "href": "https://uppmax.uu.se/", + "_key": "f82be797dbce" + }, + { + "_type": "link", + "href": "https://www.uppmax.uu.se/projects-and-collaborations/snic-sens/", + "_key": "83733d3d4572" + } + ], + "children": [ + { + "marks": [], + "text": "We're based in Sweden, and ", + "_key": "3f2e0e95376c", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "f82be797dbce" + ], + "text": "Uppsala Multidisciplinary Center for Advanced Computational Science", + "_key": "24111e45e4c0" + }, + { + "_type": "span", + "marks": [], + "text": " (UPPMAX) provides Computational infrastructures for all Swedish researchers. 
Since we're analyzing sensitive data, we are using secure clusters (with a two factor authentication), set up by UPPMAX: ", + "_key": "45b292061e84" + }, + { + "_type": "span", + "marks": [ + "83733d3d4572" + ], + "text": "SNIC-SENS", + "_key": "a9fe33761df1" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "6b3a7646742b" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "41726010f3fa", + "markDefs": [], + "children": [ + { + "_key": "223b538c3960", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "children": [ + { + "_key": "b418a11871ae", + "_type": "span", + "marks": [], + "text": "In my case, since we're still developing the pipeline, I am mainly using the research cluster " + }, + { + "_type": "span", + "marks": [ + "d9ccc539cbaf" + ], + "text": "Bianca", + "_key": "f4766386d16f" + }, + { + "marks": [], + "text": ". So I can only transfer files and data in one specific repository using SFTP.", + "_key": "cb5de068d31a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "1e48dd24f1df", + "markDefs": [ + { + "_type": "link", + "href": "https://www.uppmax.uu.se/resources/systems/the-bianca-cluster/", + "_key": "d9ccc539cbaf" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "c81d9aca8939", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "dc1cbdcbc771" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "711a5e817264", + "markDefs": [ + { + "_type": "link", + "href": "http://modules.sourceforge.net/", + "_key": "f8a2d45700e5" + } + ], + "children": [ + { + "text": "UPPMAX provides computing resources for Swedish researchers for all scientific domains, so getting software updates can occasionally take some time. Typically, ", + "_key": "fd1d47c71b0c", + "_type": "span", + "marks": [] + }, + { + "text": "Environment Modules", + "_key": "45ebdc608a21", + "_type": "span", + "marks": [ + "f8a2d45700e5" + ] + }, + { + "marks": [], + "text": " are used which allow several versions of different tools - this is good for reproducibility and is quite easy to use. However, the approach is not portable across different clusters outside of UPPMAX.", + "_key": "1c0b2168107c", + "_type": "span" + } + ] + }, + { + "_key": "dfdc4d7a5265", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "87a0557545fb" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "b4ce1f703f10", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Why use containers?", + "_key": "1af05e5d272f", + "_type": "span" + } + ], + "_type": "block", + "style": "h2" + }, + { + "_key": "a451f3b0f292", + "markDefs": [ + { + "href": "https://www.docker.com/", + "_key": "819ab4284c07", + "_type": "link" + }, + { + "href": "http://singularity.lbl.gov/", + "_key": "e722a5846d88", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "The idea of using containers, for improved portability and reproducibility, and more up to date tools, came naturally to us, as it is easily managed within Nextflow. 
We cannot use ", + "_key": "a627e3673f4c", + "_type": "span" + }, + { + "marks": [ + "819ab4284c07" + ], + "text": "Docker", + "_key": "7f88b603e6ff", + "_type": "span" + }, + { + "marks": [], + "text": " on our secure cluster, so we wanted to run CAW with ", + "_key": "d10fd389a827", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "e722a5846d88" + ], + "text": "Singularity", + "_key": "9ff91e90189c" + }, + { + "marks": [], + "text": " images instead.", + "_key": "680217768a1b", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "594afc570d37" + } + ], + "_type": "block", + "style": "normal", + "_key": "583735e027d4", + "markDefs": [] + }, + { + "_type": "block", + "style": "h2", + "_key": "5cc82bec4c7c", + "markDefs": [], + "children": [ + { + "text": "How was the switch made?", + "_key": "b9eba064ce25", + "_type": "span", + "marks": [] + } + ] + }, + { + "_key": "2fd5b5b4bc8d", + "markDefs": [ + { + "href": "https://github.com/SciLifeLab/CAW/blob/master/buildContainers.nf", + "_key": "0cc4622f76ee", + "_type": "link" + } + ], + "children": [ + { + "_key": "81ba9fe29a97", + "_type": "span", + "marks": [], + "text": "We were already using Docker containers for our continuous integration testing with Travis, and since we use many tools, I took the approach of making (almost) a container for each process. Because this process is quite slow, repetitive and I~~'m lazy~~ like to automate everything, I made a simple NF " + }, + { + "_type": "span", + "marks": [ + "0cc4622f76ee" + ], + "text": "script", + "_key": "c80dcde038ad" + }, + { + "_key": "40d58c805faa", + "_type": "span", + "marks": [], + "text": " to build and push all docker containers. 
Basically it's just " + }, + { + "marks": [ + "code" + ], + "text": "build", + "_key": "e1f1d906d978", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "10e5431bf0af" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "pull", + "_key": "468c093d09fa" + }, + { + "marks": [], + "text": " for all containers, with some configuration possibilities.", + "_key": "57938a3ddb31", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "65165a0413c5", + "markDefs": [], + "children": [ + { + "_key": "1e1281cc4660", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "7981801cbeed", + "code": "docker build -t ${repository}/${container}:${tag} ${baseDir}/containers/${container}/.\n\ndocker push ${repository}/${container}:${tag}", + "_type": "code" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "300b67a22d42", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "dc0efc43a64d" + }, + { + "_type": "block", + "style": "normal", + "_key": "71b4b6d6cc7f", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Since Singularity can directly pull images from DockerHub, I made the build script to pull all containers from DockerHub to have local Singularity image files.", + "_key": "7cbb1511785f", + "_type": "span" + } + ] + }, + { + "children": [ + { + "text": "", + "_key": "8ec40d876508", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "eeddff34f6f9", + "markDefs": [] + }, + { + "code": "singularity pull --name ${container}-${tag}.img docker://${repository}/${container}:${tag}", + "_type": "code", + "_key": "7d5e902fc937" + }, + { + "style": "normal", + "_key": "51c0d6ad7268", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "3240935023e8", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "After this, it's just a matter of moving all containers to the secure cluster we're using, and using the right configuration file in the profile. I'll spare you the details of the SFTP transfer. 
This is what the configuration file for such Singularity images looks like: ", + "_key": "dfa0c39bab14" + }, + { + "_type": "span", + "marks": [ + "cebbf0178117" + ], + "text": "`singularity-path.config`", + "_key": "0480b3a9196d" + } + ], + "_type": "block", + "style": "normal", + "_key": "6a828caff9a6", + "markDefs": [ + { + "href": "https://github.com/SciLifeLab/CAW/blob/master/configuration/singularity-path.config", + "_key": "cebbf0178117", + "_type": "link" + } + ] + }, + { + "_key": "0537320059ac", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "eb76bcac33bc" + } + ], + "_type": "block", + "style": "normal" + }, + { + "code": "/*\nvim: syntax=groovy\n-*- mode: groovy;-*-\n * -------------------------------------------------\n * Nextflow config file for CAW project\n * -------------------------------------------------\n * Paths to Singularity images for every process\n * No image will be pulled automatically\n * Need to transfer and set up images before\n * -------------------------------------------------\n */\n\nsingularity {\n enabled = true\n runOptions = \"--bind /scratch\"\n}\n\nparams {\n containerPath='containers'\n tag='1.2.3'\n}\n\nprocess {\n $ConcatVCF.container = \"${params.containerPath}/caw-${params.tag}.img\"\n $RunMultiQC.container = \"${params.containerPath}/multiqc-${params.tag}.img\"\n $IndelRealigner.container = \"${params.containerPath}/gatk-${params.tag}.img\"\n // I'm not putting the whole file here\n // you probably already got the point\n}", + "_type": "code", + "_key": "6d498f3b1666" + }, + { + "style": "normal", + "_key": "2255d8ad22ba", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "094a711d3b0b", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "b792900f8d76", + "markDefs": [], + "children": [ + { + "_key": "3e773860cba6", + "_type": "span", + "marks": [], + "text": "This approach ran (almost) perfectly on the first try, except a process failing due to a typo on a container name..." + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "8cb827f459d9", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "74150686d861", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "text": "Conclusion", + "_key": "617aec92026d", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h2", + "_key": "cf88e53506ee" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "This switch was completed a couple of months ago and has been a great success. We are now using Singularity containers in almost all of our Nextflow pipelines developed at NGI. 
Even if we do enjoy the improved control, we must not forgot that:", + "_key": "ca50cf71162e" + } + ], + "_type": "block", + "style": "normal", + "_key": "449863778fd8", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "7c9d480ae59c", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "239f444c4481" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "With great power comes great responsibility!", + "_key": "6b83a6d639a8" + } + ], + "_type": "block", + "style": "blockquote", + "_key": "3b97c6dd8db9" + }, + { + "_key": "039aa16680d4", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "e22b13c61dc4", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "h2", + "_key": "7d6d9a7c878e", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Credits", + "_key": "5966c10264f9", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "92c3bb0be030", + "markDefs": [ + { + "_key": "b929c7c6c76f", + "_type": "link", + "href": "https://github.com/Hammarn" + }, + { + "_type": "link", + "href": "http://phil.ewels.co.uk/", + "_key": "db38f32bd3af" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Thanks to ", + "_key": "280624830fa5" + }, + { + "_key": "e783ab24478a", + "_type": "span", + "marks": [ + "b929c7c6c76f" + ], + "text": "Rickard Hammarén" + }, + { + "text": " and ", + "_key": "29dd480cdda4", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "db38f32bd3af" + ], + "text": "Phil Ewels", + "_key": "439fe18a2258" + }, + { + "_type": "span", + "marks": [], + "text": " for comments and suggestions for improving the post.", + "_key": "3ed8309b3f11" + } + ], + "_type": "block", + "style": "normal" + } + ], + "title": "Running CAW with Singularity and Nextflow", + "publishedAt": "2017-11-16T07:00:00.000Z", + "_type": "blogPost", + "_createdAt": "2024-09-25T14:15:12Z", + "_rev": "2PruMrLMGpvZP5qAknmBCU", + "_id": "a4034ff2cdea" + }, + { + "_createdAt": "2024-09-25T14:15:41Z", + "body": [ + { + "style": "normal", + "_key": "fb2392e4e9c0", + "markDefs": [], + "children": [ + { + "marks": [ + "em" + ], + "text": "Continuing our [series on understanding Nextflow resume](blog/2019/demystifying-nextflow-resume.html), we wanted to delve deeper to show how you can report which tasks contribute to a given workflow output.", + "_key": "fc662a1848ad", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "c3cd3ea88f85", + "children": [ + { + "_type": "span", + "text": "", + "_key": "66d75995885d" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "h3", + "_key": "f4d92f006f30", + "children": [ + { + "_type": "span", + "text": "Easy provenance reports", + "_key": "0512fc297730" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "cd54c0ed0e33", + "markDefs": [], + "children": [ + { + "_key": "adcd51405ee4", + "_type": "span", + "marks": [], + "text": "When provided with a run name or session ID, the log command can return useful information about a pipeline execution. This can be composed to track the provenance of a workflow result." 
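{
  "_type": "block",
  "style": "normal",
  "_key": "added-ex-log-note",
  "markDefs": [],
  "children": [
    {
      "_type": "span",
      "marks": [],
      "text": "As a preview of the options described below, the field selection (-f) and filtering (-F) flags can also be combined in a single call; this sketch reuses the example run name and fields shown later in the post:",
      "_key": "added-ex-log-note-span"
    }
  ]
},
{
  "_type": "code",
  "_key": "added-ex-log-code",
  "code": "$ nextflow log tiny_fermat -F 'process =~ /fastqc/' -f 'name,hash,workdir,container'"
},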
+ } + ] + }, + { + "style": "normal", + "_key": "34b3875a806e", + "children": [ + { + "text": "", + "_key": "64142e7a0455", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "4faf29fdd28d", + "markDefs": [], + "children": [ + { + "text": "When supplying a run name or session ID, the log command lists all the work directories used to compute the final result. For example:", + "_key": "aa0e185ab4f2", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "14f1680196a1", + "children": [ + { + "text": "", + "_key": "c596fbee8164", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "code", + "_key": "e6f3ab714843", + "code": "$ nextflow log tiny_fermat\n\n/data/.../work/7b/3753ff13b1fa5348d2d9b6f512153a\n/data/.../work/c1/56a36d8f498c99ac6cba31e85b3e0c\n/data/.../work/f7/659c65ef60582d9713252bcfbcc310\n/data/.../work/82/ba67e3175bd9e6479d4310e5a92f99\n/data/.../work/e5/2816b9d4e7b402bfdd6597c2c2403d\n/data/.../work/3b/3485d00b0115f89e4c202eacf82eba" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "569c3d9b47a6" + } + ], + "_type": "block", + "style": "normal", + "_key": "593076e6715a" + }, + { + "style": "normal", + "_key": "cf86330d9ac5", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Using the option ", + "_key": "d0f45d4ab4b5", + "_type": "span" + }, + { + "marks": [ + "code" + ], + "text": "-f", + "_key": "eb86d2ab6f2d", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " (fields) it’s possible to specify which metadata should be printed by the log command. For example:", + "_key": "2c209ff5da4f" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "05b1f2514c38", + "children": [ + { + "text": "", + "_key": "a53b0cf5bc04", + "_type": "span" + } + ] + }, + { + "_type": "code", + "_key": "c7e30ac79fc0", + "code": "$ nextflow log tiny_fermat -f 'process,exit,hash,duration'\n\nindex\t0\t7b/3753ff\t2s\nfastqc\t0\tc1/56a36d\t9.3s\nfastqc\t0\tf7/659c65\t9.1s\nquant\t0\t82/ba67e3\t2.7s\nquant\t0\te5/2816b9\t3.2s\nmultiqc\t0\t3b/3485d0\t6.3s" + }, + { + "_key": "56735f730d29", + "children": [ + { + "_type": "span", + "text": "", + "_key": "34d617b3165a" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "cac2d17fcc69", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The complete list of available fields can be retrieved with the command:", + "_key": "18179227b432", + "_type": "span" + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "21cc1360e125" + } + ], + "_type": "block", + "style": "normal", + "_key": "d03debfaf753" + }, + { + "code": "$ nextflow log -l", + "_type": "code", + "_key": "b8f4d0997480" + }, + { + "_key": "b491d6d4ea99", + "children": [ + { + "text": "", + "_key": "e3db9de273aa", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "marks": [], + "text": "The option ", + "_key": "36717c095732", + "_type": "span" + }, + { + "_key": "2a45f19a6b3d", + "_type": "span", + "marks": [ + "code" + ], + "text": "-F" + }, + { + "_key": "ef47ba6d4745", + "_type": "span", + "marks": [], + "text": " allows the specification of filtering criteria to print only a subset of tasks. 
For example:" + } + ], + "_type": "block", + "style": "normal", + "_key": "9ef427e91347", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "585f950128d6" + } + ], + "_type": "block", + "style": "normal", + "_key": "f9af4cef6a4c" + }, + { + "code": "$ nextflow log tiny_fermat -F 'process =~ /fastqc/'\n\n/data/.../work/c1/56a36d8f498c99ac6cba31e85b3e0c\n/data/.../work/f7/659c65ef60582d9713252bcfbcc310", + "_type": "code", + "_key": "521a74a6b4d3" + }, + { + "_key": "8d829bcc7d44", + "children": [ + { + "text": "", + "_key": "0f9842c4ff36", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "57d24075daa8", + "_type": "span", + "marks": [], + "text": "This can be useful to locate specific tasks work directories." + } + ], + "_type": "block", + "style": "normal", + "_key": "991bd3c9e4d2", + "markDefs": [] + }, + { + "_key": "0a3f9a0e8d0b", + "children": [ + { + "_type": "span", + "text": "", + "_key": "5beb1f685bba" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "c097fa3a2dce", + "_type": "span", + "marks": [], + "text": "Finally, the " + }, + { + "text": "-t", + "_key": "e0bba423dac1", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " option allows for the creation of a basic custom HTML provenance report that can be generated by providing a template file, in any format of your choice. For example:", + "_key": "70d9dbf8a9aa" + } + ], + "_type": "block", + "style": "normal", + "_key": "d1151715f567", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "3e9787370ff0" + } + ], + "_type": "block", + "style": "normal", + "_key": "5ee46f369aea" + }, + { + "code": "
\n${name}\n\nScript:\n${script}\n\n- Exit: ${exit}\n- Status: ${status}\n- Work dir: ${workdir}\n- Container: ${container}\n\n
", + "_type": "code", + "_key": "76d4c582b3aa" + }, + { + "_key": "0012924785cc", + "children": [ + { + "text": "", + "_key": "1d2c8442dbb0", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "By saving the above snippet in a file named template.html, you can run the following command:", + "_key": "615e3e126f2a" + } + ], + "_type": "block", + "style": "normal", + "_key": "d66e33f035ee", + "markDefs": [] + }, + { + "_key": "c78fdfb7358c", + "children": [ + { + "_key": "d87bc69b0823", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "code", + "_key": "cb60a9af324e", + "code": "$ nextflow log tiny_fermat -t template.html > provenance.html" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "70bb91464e96" + } + ], + "_type": "block", + "style": "normal", + "_key": "e41a1783bea0" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Open it in your browser, et voilà!", + "_key": "3d5c5ebaefb2", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "6f8372c1690d" + }, + { + "_type": "block", + "style": "normal", + "_key": "af37c980ec67", + "children": [ + { + "text": "", + "_key": "2743498f0681", + "_type": "span" + } + ] + }, + { + "style": "h2", + "_key": "fa8804a8112f", + "children": [ + { + "_key": "a9cf171c24ff", + "_type": "span", + "text": "Conclusion" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "181f5543549d", + "markDefs": [], + "children": [ + { + "text": "This post introduces a little know Nextflow feature and it's intended to show how it can be used to produce a custom execution report reporting some - basic - provenance information.", + "_key": "28904187aa65", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_key": "3a507e19fac4", + "children": [ + { + "_type": "span", + "text": "", + "_key": "2066eba9b9c8" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "In future releases we plan to support a more formal provenance specification and execution tracking features.", + "_key": "d6b46878fd4d" + } + ], + "_type": "block", + "style": "normal", + "_key": "302f5433f72c", + "markDefs": [] + } + ], + "_updatedAt": "2024-09-26T09:02:12Z", + "tags": [ + { + "_key": "9be02c19052a", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference" + } + ], + "meta": { + "slug": { + "current": "easy-provenance-report" + } + }, + "publishedAt": "2019-08-29T06:00:00.000Z", + "_type": "blogPost", + "title": "Easy provenance reporting", + "_id": "a54d0c678021", + "_rev": "mvya9zzDXWakVjnX4hhZJ8", + "author": { + "_ref": "evan-floden", + "_type": "reference" + } + }, + { + "meta": { + "slug": { + "current": "scaling-with-aws-batch" + } + }, + "publishedAt": "2017-11-08T07:00:00.000Z", + "_id": "a89c9586c709", + "_updatedAt": "2024-09-26T09:01:49Z", + "body": [ + { + "markDefs": [ + { + "_type": "link", + "href": "https://aws.amazon.com/batch/", + "_key": "c5daa195af6b" + } + ], + "children": [ + { + "text": "The latest Nextflow release (0.26.0) includes built-in support for ", + "_key": "a6ff24002d19", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "c5daa195af6b" + ], + "text": "AWS Batch", + "_key": "2f5d1fdba72e" + }, + { + "_type": "span", + "marks": [], + "text": ", a managed computing service that allows the execution of containerised 
workloads over the Amazon EC2 Container Service (ECS).", + "_key": "cf0deed10a24" + } + ], + "_type": "block", + "style": "normal", + "_key": "a27bea3529f6" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "0033e90abc20" + } + ], + "_type": "block", + "style": "normal", + "_key": "64551967199c" + }, + { + "_key": "6bd066c7d2ef", + "markDefs": [], + "children": [ + { + "text": "This feature allows the seamless deployment of Nextflow pipelines in the cloud by offloading the process executions as managed Batch jobs. The service takes care to spin up the required computing instances on-demand, scaling up and down the number and composition of the instances to best accommodate the actual workload resource needs at any point in time.", + "_key": "901e2f9d6dee", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "e40ff2cebbc5", + "children": [ + { + "text": "", + "_key": "21e679683da1", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "ae0bee544e0b", + "markDefs": [], + "children": [ + { + "_key": "76f3a7e14a40", + "_type": "span", + "marks": [], + "text": "AWS Batch shares with Nextflow the same vision regarding workflow containerisation i.e. each compute task is executed in its own Docker container. This dramatically simplifies the workflow deployment through the download of a few container images. This common design background made the support for AWS Batch a natural extension for Nextflow." + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "ab7f794ce634", + "children": [ + { + "_type": "span", + "text": "", + "_key": "23a77ae26b42" + } + ] + }, + { + "_type": "block", + "style": "h3", + "_key": "04c08c613b64", + "children": [ + { + "_type": "span", + "text": "Batch in a nutshell", + "_key": "59b81a947826" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "51e484bcc0fc", + "markDefs": [], + "children": [ + { + "text": "Batch is organised in ", + "_key": "b84859ea78b7", + "_type": "span", + "marks": [] + }, + { + "text": "Compute Environments", + "_key": "1ac148f94824", + "_type": "span", + "marks": [ + "em" + ] + }, + { + "_type": "span", + "marks": [], + "text": ", ", + "_key": "26d5c72c810c" + }, + { + "_key": "d50161486bf3", + "_type": "span", + "marks": [ + "em" + ], + "text": "Job queues" + }, + { + "marks": [], + "text": ", ", + "_key": "713b89df6679", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "Job definitions", + "_key": "0b8171533b19" + }, + { + "_key": "34623d9eb1f7", + "_type": "span", + "marks": [], + "text": " and " + }, + { + "marks": [ + "em" + ], + "text": "Jobs", + "_key": "3156d6add377", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "8033d0dff989" + } + ] + }, + { + "children": [ + { + "text": "", + "_key": "057f450d6c7d", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "4c5e35cf7c66" + }, + { + "children": [ + { + "marks": [], + "text": "The ", + "_key": "3dbee83bb41d", + "_type": "span" + }, + { + "marks": [ + "em" + ], + "text": "Compute Environment", + "_key": "73b0764aafb2", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " allows you to define the computing resources required for a specific workload (type). 
You can specify the minimum and maximum number of CPUs that can be allocated, the EC2 provisioning model (On-demand or Spot), the AMI to be used and the allowed instance types.", + "_key": "310c1529c95e" + } + ], + "_type": "block", + "style": "normal", + "_key": "66e4b9376be8", + "markDefs": [] + }, + { + "_key": "b85e1da8840f", + "children": [ + { + "_type": "span", + "text": "", + "_key": "2041abf88edd" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "be04b79e4cd4", + "_type": "span", + "marks": [], + "text": "The " + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "Job queue", + "_key": "de5c0eb32a39" + }, + { + "text": " definition allows you to bind a specific task to one or more Compute Environments.", + "_key": "16914999e841", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "09c83ab3ecc4", + "markDefs": [] + }, + { + "_key": "d957f9d07381", + "children": [ + { + "_key": "bbad292f996a", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Then, the ", + "_key": "bb6b890c12e4" + }, + { + "marks": [ + "em" + ], + "text": "Job definition", + "_key": "3b1da5f7a535", + "_type": "span" + }, + { + "_key": "c0ec3f61ab10", + "_type": "span", + "marks": [], + "text": " is a template for one or more jobs in your workload. This is required to specify the Docker image to be used in running a particular task along with other requirements such as the container mount points, the number of CPUs, the amount of memory and the number of retries in case of job failure." + } + ], + "_type": "block", + "style": "normal", + "_key": "22fc8eba9ad0", + "markDefs": [] + }, + { + "style": "normal", + "_key": "a707904a826d", + "children": [ + { + "_key": "f38fd874f8c8", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "e4f9237d8881", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Finally the ", + "_key": "902b5367a1b1" + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "Job", + "_key": "70a7aae9ebf3" + }, + { + "marks": [], + "text": " binds a Job definition to a specific Job queue and allows you to specify the actual task command to be executed in the container.", + "_key": "c4f9593730ba", + "_type": "span" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "9819563c5f4b", + "children": [ + { + "_type": "span", + "text": "", + "_key": "b20a925bcca4" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "The job input and output data management is delegated to the user. 
This means that if you only use Batch API/tools you will need to take care to stage the input data from a S3 bucket (or a different source) and upload the results to a persistent storage location.", + "_key": "b63889686655" + } + ], + "_type": "block", + "style": "normal", + "_key": "5981c02a9677", + "markDefs": [] + }, + { + "_key": "c9d2e692bbbe", + "children": [ + { + "text": "", + "_key": "9d4fac58df9e", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "23a2c18c51cf", + "markDefs": [], + "children": [ + { + "_key": "e813f4c4d5f4", + "_type": "span", + "marks": [], + "text": "This could turn out to be cumbersome in complex workflows with a large number of tasks and above all it makes it difficult to deploy the same applications across different infrastructure." + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "38ef40e7dbd2" + } + ], + "_type": "block", + "style": "normal", + "_key": "bb51576688e4" + }, + { + "children": [ + { + "_key": "109945bda3d0", + "_type": "span", + "text": "How to use Batch with Nextflow" + } + ], + "_type": "block", + "style": "h3", + "_key": "88d72649e790" + }, + { + "_type": "block", + "style": "normal", + "_key": "f35ce16deb48", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Nextflow streamlines the use of AWS Batch by smoothly integrating it in its workflow processing model and enabling transparent interoperability with other systems.", + "_key": "b90254c4328a", + "_type": "span" + } + ] + }, + { + "children": [ + { + "text": "", + "_key": "c12c292131d7", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "984feea8e2e5" + }, + { + "_type": "block", + "style": "normal", + "_key": "10f96c579225", + "markDefs": [ + { + "_type": "link", + "href": "http://docs.aws.amazon.com/batch/latest/userguide/compute_environments.html", + "_key": "4c0375e4bea1" + }, + { + "_key": "f027756f3bc5", + "_type": "link", + "href": "http://docs.aws.amazon.com/batch/latest/userguide/job_queues.html" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "To run Nextflow you will need to set-up in your AWS Batch account a ", + "_key": "54b2e618a132" + }, + { + "text": "Compute Environment", + "_key": "591d8f21aa6f", + "_type": "span", + "marks": [ + "4c0375e4bea1" + ] + }, + { + "_type": "span", + "marks": [], + "text": " defining the required computing resources and associate it to a ", + "_key": "3127335cd55b" + }, + { + "marks": [ + "f027756f3bc5" + ], + "text": "Job Queue", + "_key": "2c396bd8334c", + "_type": "span" + }, + { + "_key": "82a7fdb6bca3", + "_type": "span", + "marks": [], + "text": "." + } + ] + }, + { + "_key": "8e5c8137a2be", + "children": [ + { + "_key": "b0069029a864", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "marks": [], + "text": "Nextflow takes care to create the required ", + "_key": "1370afbe1fad", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "Job Definitions", + "_key": "37d082404afc" + }, + { + "_type": "span", + "marks": [], + "text": " and ", + "_key": "bc85f33088dc" + }, + { + "marks": [ + "em" + ], + "text": "Job", + "_key": "7542b524d9b2", + "_type": "span" + }, + { + "marks": [], + "text": " requests as needed. 
This spares some Batch configurations steps.", + "_key": "16bb547eeabb", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "5b773f037ad7", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "2cc15770048c", + "children": [ + { + "_key": "311649c5ce71", + "_type": "span", + "text": "" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In the ", + "_key": "83e8ec5d32e7" + }, + { + "_key": "12191a14a258", + "_type": "span", + "marks": [ + "code" + ], + "text": "nextflow.config" + }, + { + "marks": [], + "text": ", file specify the ", + "_key": "b4edeec1d5af", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "awsbatch", + "_key": "06bd4c7b0394" + }, + { + "text": " executor, the Batch ", + "_key": "7e7fd58dbb47", + "_type": "span", + "marks": [] + }, + { + "text": "queue", + "_key": "c310437fe1f5", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "marks": [], + "text": " and the container to be used in the usual manner. You may also need to specify the AWS region and access credentials if they are not provided by other means. For example:", + "_key": "d6cdb6d0ddf1", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "88c87fbeb5f2" + }, + { + "children": [ + { + "text": "", + "_key": "9be875ce8fc2", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "e9be99a49e43" + }, + { + "code": "process.executor = 'awsbatch'\nprocess.queue = 'my-batch-queue'\nprocess.container = your-org/your-docker:image\naws.region = 'eu-west-1'\naws.accessKey = 'xxx'\naws.secretKey = 'yyy'", + "_type": "code", + "_key": "097fc170b106" + }, + { + "_key": "34437eb32b3c", + "markDefs": [ + { + "_key": "8edf91599a37", + "_type": "link", + "href": "https://hub.docker.com/" + }, + { + "_type": "link", + "href": "https://quay.io/", + "_key": "71bd27ed336e" + }, + { + "_type": "link", + "href": "https://aws.amazon.com/ecr/", + "_key": "32dedd97b75c" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Each process can eventually use a different queue and Docker image (see Nextflow documentation for details). The container image(s) must be published in a Docker registry that is accessible from the instances run by AWS Batch eg. ", + "_key": "0a799aea5cb3" + }, + { + "marks": [ + "8edf91599a37" + ], + "text": "Docker Hub", + "_key": "821fbefcb0c0", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ", ", + "_key": "96c2fa32efda" + }, + { + "marks": [ + "71bd27ed336e" + ], + "text": "Quay", + "_key": "343e1bca5758", + "_type": "span" + }, + { + "_key": "aac20010ead2", + "_type": "span", + "marks": [], + "text": " or " + }, + { + "marks": [ + "32dedd97b75c" + ], + "text": "ECS Container Registry", + "_key": "3a7752c02bf0", + "_type": "span" + }, + { + "marks": [], + "text": ".", + "_key": "71186ef67e51", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "", + "_key": "df58912f583e", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "0856d00cc55a" + }, + { + "_key": "6ca75ce52f68", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The Nextflow process can be launched either in a local computer or a EC2 instance. 
The latter is suggested for heavy or long running workloads.", + "_key": "bd5cb1adbfc1" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "cd9ed3bc6102", + "children": [ + { + "_type": "span", + "text": "", + "_key": "966082ad7d3c" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "446efd365987", + "_type": "span", + "marks": [], + "text": "Note that input data should be stored in the S3 storage. In the same manner the pipeline execution must specify a S3 bucket as a working directory by using the " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "-w", + "_key": "d1fb5981742a" + }, + { + "_key": "21a700327770", + "_type": "span", + "marks": [], + "text": " command line option." + } + ], + "_type": "block", + "style": "normal", + "_key": "091b61bcb1b6" + }, + { + "_type": "block", + "style": "normal", + "_key": "ba6d057adaa0", + "children": [ + { + "_type": "span", + "text": "", + "_key": "b0a4389d5afd" + } + ] + }, + { + "_key": "050bcae08875", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "A final caveat about custom containers and computing AMI. Nextflow automatically stages input data and shares tasks intermediate results by using the S3 bucket specified as a work directory. For this reason it needs to use the ", + "_key": "d9fe6548083b" + }, + { + "_key": "fe2fdcf86c8d", + "_type": "span", + "marks": [ + "code" + ], + "text": "aws" + }, + { + "_key": "7ff23b944ade", + "_type": "span", + "marks": [], + "text": " command line tool which must be installed either in your process container or be present in a custom AMI that can be mounted and accessed by the Docker containers." + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "8ad255b2b520", + "children": [ + { + "_key": "18d6712ba839", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "You may also need to create a custom AMI because the default image used by AWS Batch only provides 22 GB of storage which may not be enough for real world analysis pipelines.", + "_key": "d51164488694" + } + ], + "_type": "block", + "style": "normal", + "_key": "3ffb1e78152c" + }, + { + "_key": "2144e5cdb0d1", + "children": [ + { + "_type": "span", + "text": "", + "_key": "b7477277cd0a" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "6db8d938a47e", + "markDefs": [ + { + "_key": "2ace7239aa52", + "_type": "link", + "href": "/docs/latest/awscloud.html#custom-ami" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "See the documentation to learn ", + "_key": "3f3208d1b209" + }, + { + "text": "how to create a custom AMI", + "_key": "269577d1d9a6", + "_type": "span", + "marks": [ + "2ace7239aa52" + ] + }, + { + "_type": "span", + "marks": [], + "text": " with larger storage and how to setup the AWS CLI tools.", + "_key": "36ebab5aab17" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "d4eea62db1ce", + "children": [ + { + "_type": "span", + "text": "", + "_key": "278e1eecfff9" + } + ] + }, + { + "_type": "block", + "style": "h3", + "_key": "e973dac36071", + "children": [ + { + "_type": "span", + "text": "An example", + "_key": "37061abeb372" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "In order to validate Nextflow integration with AWS Batch, we used a simple RNA-Seq pipeline.", + 
"_key": "c1a63c9f67e7", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "ae03d5f7b076", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "3cad6c2caf78" + } + ], + "_type": "block", + "style": "normal", + "_key": "85207a7f515a" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://www.encodeproject.org/search/?type=Experiment&award.project=ENCODE&replicates.library.biosample.donor.organism.scientific_name=Homo+sapiens&files.file_type=fastq&files.run_type=paired-ended&replicates.library.nucleic_acid_term_name=RNA&replicates.library.depleted_in_term_name=rRNA", + "_key": "a26cfeec8608" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "This pipeline takes as input a metadata file from the Encode project corresponding to a ", + "_key": "efb3a33706b3" + }, + { + "_key": "b5e5153b6ab8", + "_type": "span", + "marks": [ + "a26cfeec8608" + ], + "text": "search\nreturning all human RNA-seq paired-end datasets" + }, + { + "_key": "a267024be8e7", + "_type": "span", + "marks": [], + "text": " (the metadata file has been additionally filtered to retain only data having a SRA ID)." + } + ], + "_type": "block", + "style": "normal", + "_key": "3f43f30cc919" + }, + { + "_type": "block", + "style": "normal", + "_key": "d70bf06e0a47", + "children": [ + { + "text": "", + "_key": "3eea0fa2f108", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "a45a549e2767", + "markDefs": [ + { + "_key": "776f9d9e4de0", + "_type": "link", + "href": "https://combine-lab.github.io/salmon/" + }, + { + "href": "http://multiqc.info/", + "_key": "d14d204dd6fd", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "The pipeline automatically downloads the FASTQ files for each sample from the EBI ENA database, it assesses the overall quality of sequencing data using FastQC and then runs ", + "_key": "5ded8110aca7", + "_type": "span" + }, + { + "text": "Salmon", + "_key": "9935fb547858", + "_type": "span", + "marks": [ + "776f9d9e4de0" + ] + }, + { + "_type": "span", + "marks": [], + "text": " to perform the quantification over the human transcript sequences. Finally all the QC and quantification outputs are summarised using the ", + "_key": "b0788b1cfece" + }, + { + "_type": "span", + "marks": [ + "d14d204dd6fd" + ], + "text": "MultiQC", + "_key": "9d391e1fb686" + }, + { + "_key": "9de81524e5b5", + "_type": "span", + "marks": [], + "text": " tool." 
+ } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "", + "_key": "73100dab1547", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "b285e3ea5c86" + }, + { + "markDefs": [], + "children": [ + { + "text": "For the sake of this benchmark we used the first 38 samples out of the full 375 samples dataset.", + "_key": "eef1f873a8c5", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "67fa3af86519" + }, + { + "_key": "9817cefe782c", + "children": [ + { + "_type": "span", + "text": "", + "_key": "eb692a3892f2" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "9ec96316e352", + "markDefs": [ + { + "_key": "266a0d48aeef", + "_type": "link", + "href": "/blog/2016/more-fun-containers-hpc.html" + } + ], + "children": [ + { + "marks": [], + "text": "The pipeline was executed both on AWS Batch cloud and in the CRG internal Univa cluster, using ", + "_key": "6a096c586c3a", + "_type": "span" + }, + { + "marks": [ + "266a0d48aeef" + ], + "text": "Singularity", + "_key": "2819f1d8f61e", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " as containers runtime.", + "_key": "d00aa9928879" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "", + "_key": "54a6eabbf1b5", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "c5d0d0992d9a" + }, + { + "style": "normal", + "_key": "4cfea0912233", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nextflow-io/rnaseq-encode-nf", + "_key": "7fefbd96e81d" + } + ], + "children": [ + { + "_key": "5b3290f61980", + "_type": "span", + "marks": [], + "text": "It's worth noting that with the exception of the two configuration changes detailed below, we used exactly the same pipeline implementation at " + }, + { + "_type": "span", + "marks": [ + "7fefbd96e81d" + ], + "text": "this GitHub repository", + "_key": "1d352b86f86b" + }, + { + "text": ".", + "_key": "85f362058326", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_key": "91192c4feb86", + "children": [ + { + "_type": "span", + "text": "", + "_key": "9d9671d34072" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The AWS deploy used the following configuration profile:", + "_key": "9820cf05e8e6" + } + ], + "_type": "block", + "style": "normal", + "_key": "2ff1a09bd415" + }, + { + "_key": "92975dbf0739", + "children": [ + { + "text": "", + "_key": "5fbbe3cd31a2", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "code": "aws.region = 'eu-west-1'\naws.client.storageEncryption = 'AES256'\nprocess.queue = 'large'\nexecutor.name = 'awsbatch'\nexecutor.awscli = '/home/ec2-user/miniconda/bin/aws'", + "_type": "code", + "_key": "4592e11fc83e" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "While for the cluster deployment the following configuration was used:", + "_key": "aa7505b75841", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "dc2a1896df81" + }, + { + "_key": "8ffaaf4850c7", + "children": [ + { + "_type": "span", + "text": "", + "_key": "00ef349299e9" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "code", + "_key": "94ecfc21795d", + "code": "executor = 'crg'\nsingularity.enabled = true\nprocess.container = \"docker://nextflow/rnaseq-nf\"\nprocess.queue = 'cn-el7'\nprocess.time = '90 
min'\nprocess.$quant.time = '4.5 h'" + }, + { + "_type": "block", + "style": "h3", + "_key": "fcb72214e5b1", + "children": [ + { + "_type": "span", + "text": "Results", + "_key": "6c9eb3d23dd1" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "cb46ac2b6e2a", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The AWS Batch Compute environment was configured to use a maximum of 132 CPUs as the number of CPUs that were available in the queue for local cluster deployment.", + "_key": "58fe7ed4d4af", + "_type": "span" + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "a0dbe35ceaea" + } + ], + "_type": "block", + "style": "normal", + "_key": "b76c5bf05b8b" + }, + { + "children": [ + { + "_key": "ed40ffb699c2", + "_type": "span", + "marks": [], + "text": "The two executions ran in roughly the same time: 2 hours and 24 minutes when running in the CRG cluster and 2 hours and 37 minutes when using AWS Batch." + } + ], + "_type": "block", + "style": "normal", + "_key": "5472ac8f16e8", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "e8266ca879ab", + "children": [ + { + "_type": "span", + "text": "", + "_key": "2d548a45e657" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "8d63bb01f293", + "markDefs": [], + "children": [ + { + "text": "It must be noted that 14 jobs failed in the Batch deployment, presumably because one or more spot instances were retired. However Nextflow was able to re-schedule the failed jobs automatically and the overall pipeline execution completed successfully, also showing the benefits of a truly fault tolerant environment.", + "_key": "46d418a9f554", + "_type": "span", + "marks": [] + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "ea044ef42311", + "children": [ + { + "text": "", + "_key": "e3adea8acb35", + "_type": "span" + } + ] + }, + { + "children": [ + { + "text": "The overall cost for running the pipeline with AWS Batch was ", + "_key": "8399fa01f3a4", + "_type": "span", + "marks": [] + }, + { + "text": "$5.47", + "_key": "48cd265fa7b3", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "marks": [], + "text": " ($ 3.28 for EC2 instances, $1.88 for EBS volume and $0.31 for S3 storage). This means that with ~ $55 we could have performed the same analysis on the full Encode dataset.", + "_key": "e6d37261d9c4", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "e7a649de56a5", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "1bf88a585e42", + "children": [ + { + "_type": "span", + "text": "", + "_key": "8c57b6b08683" + } + ] + }, + { + "style": "normal", + "_key": "56e539a917f3", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "It is more difficult to estimate the cost when using the internal cluster, because we don't have access to such detailed cost accounting. However, as a user, we can estimate it roughly comes out at $0.01 per CPU-Hour. 
The pipeline needed around 147 CPU-Hour to carry out the analysis, hence with an estimated cost of ", + "_key": "cb8dcbe040b9" + }, + { + "_key": "bb501100898e", + "_type": "span", + "marks": [ + "strong" + ], + "text": "$1.47" + }, + { + "_type": "span", + "marks": [], + "text": " just for the computation.", + "_key": "201f0afff356" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "3ddd57449eee", + "children": [ + { + "_key": "4289cbc0d09b", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "6e4756eef22c", + "markDefs": [ + { + "_key": "984670afa53a", + "_type": "link", + "href": "https://cdn.rawgit.com/nextflow-io/rnaseq-encode-nf/db303a81/benchmark/aws-batch/report.html" + }, + { + "href": "https://cdn.rawgit.com/nextflow-io/rnaseq-encode-nf/db303a81/benchmark/crg-cluster/report.html", + "_key": "2336fe7d376f", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The execution report for the Batch execution is available at ", + "_key": "661beb8ebf81" + }, + { + "marks": [ + "984670afa53a" + ], + "text": "this link", + "_key": "15ab5c295c43", + "_type": "span" + }, + { + "marks": [], + "text": " and the one for cluster is available ", + "_key": "84435d5f2267", + "_type": "span" + }, + { + "text": "here", + "_key": "b6b4f315e877", + "_type": "span", + "marks": [ + "2336fe7d376f" + ] + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "50279fe6483b" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "b7ef4d33aec2" + } + ], + "_type": "block", + "style": "normal", + "_key": "7ba1d92d1a03" + }, + { + "_type": "block", + "style": "h3", + "_key": "f04a8cde4634", + "children": [ + { + "_key": "a408b3fd6b05", + "_type": "span", + "text": "Conclusion" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "This post shows how Nextflow integrates smoothly with AWS Batch and how it can be used to deploy and execute real world genomics pipeline in the cloud with ease.", + "_key": "cdda08689332" + } + ], + "_type": "block", + "style": "normal", + "_key": "41cbe2ece04f" + }, + { + "style": "normal", + "_key": "ef2d1a8c9ddf", + "children": [ + { + "text": "", + "_key": "2342d2dce2b7", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "83d202a4d949", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The auto-scaling ability provided by AWS Batch along with the use of spot instances make the use of the cloud even more cost effective. Running on a local cluster may still be cheaper, even if it is non trivial to account for all the real costs of a HPC infrastructure. 
However the cloud allows flexibility and scalability not possible with common on-premises clusters.", + "_key": "77b2da54c3ff", + "_type": "span" + } + ] + }, + { + "children": [ + { + "text": "", + "_key": "6a08276c2978", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "2756a293a535" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "We also demonstrate how the same Nextflow pipeline can be ", + "_key": "f21ed0c5c93d" + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "transparently", + "_key": "6dcb4a5ef09a" + }, + { + "_key": "b07d840ccef9", + "_type": "span", + "marks": [], + "text": " deployed in two very different computing infrastructure, using different containerisation technologies by simply providing a separate configuration profile." + } + ], + "_type": "block", + "style": "normal", + "_key": "ed4011357f0b" + }, + { + "children": [ + { + "text": "", + "_key": "d1da09179183", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "ba41ec9396ad" + }, + { + "_type": "block", + "style": "normal", + "_key": "9b46363220a7", + "markDefs": [], + "children": [ + { + "_key": "e13a3d5af8e9", + "_type": "span", + "marks": [], + "text": "This approach enables the interoperability across different deployment sites, reduces operational and maintenance costs and guarantees consistent results over time." + } + ] + }, + { + "_key": "4d01301006d9", + "children": [ + { + "_type": "span", + "text": "", + "_key": "ad30f5421ec7" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_key": "6d75d603ec58", + "_type": "span", + "text": "Credits" + } + ], + "_type": "block", + "style": "h3", + "_key": "35a97ee2841b" + }, + { + "_key": "56863abc994b", + "markDefs": [ + { + "href": "https://twitter.com/fstrozzi", + "_key": "dcc1c8a19617", + "_type": "link" + }, + { + "href": "https://github.com/emi80", + "_key": "5be6936f03af", + "_type": "link" + }, + { + "_key": "2cf585b55b24", + "_type": "link", + "href": "https://gitter.im/skptic" + } + ], + "children": [ + { + "marks": [], + "text": "This post is co-authored with ", + "_key": "ea1bba6d5b45", + "_type": "span" + }, + { + "marks": [ + "dcc1c8a19617" + ], + "text": "Francesco Strozzi", + "_key": "75fd6ae0caa2", + "_type": "span" + }, + { + "marks": [], + "text": ", who also helped to write the pipeline used for the benchmark in this post and contributed to and tested the AWS Batch integration. 
Thanks to ", + "_key": "1253c482491b", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "5be6936f03af" + ], + "text": "Emilio Palumbo", + "_key": "d2617c9e8488" + }, + { + "_type": "span", + "marks": [], + "text": " that helped to set-up and configure the AWS Batch environment and ", + "_key": "b40d2b1cb79c" + }, + { + "_type": "span", + "marks": [ + "2cf585b55b24" + ], + "text": "Evan Floden", + "_key": "94cfba4288fe" + }, + { + "_type": "span", + "marks": [], + "text": " for the comments.", + "_key": "9abe7f2c2ec0" + } + ], + "_type": "block", + "style": "normal" + } + ], + "author": { + "_type": "reference", + "_ref": "paolo-di-tommaso" + }, + "_createdAt": "2024-09-25T14:15:21Z", + "title": "Scaling with AWS Batch", + "_rev": "Ot9x7kyGeH5005E3MJ9Z5w", + "tags": [ + { + "_key": "eaca60cde6d5", + "_ref": "ace8dd2c-eed3-4785-8911-d146a4e84bbb", + "_type": "reference" + }, + { + "_type": "reference", + "_key": "bb0317333425", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493" + }, + { + "_key": "e01020702cf2", + "_ref": "9161ec05-53f8-455a-a931-7b41f6ec5172", + "_type": "reference" + } + ], + "_type": "blogPost" + }, + { + "tags": [ + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "37ceee71feed" + } + ], + "_id": "a8cb7eaf028f", + "_rev": "Ot9x7kyGeH5005E3MJ9H4s", + "_type": "blogPost", + "body": [ + { + "_key": "f75a154b7386", + "markDefs": [], + "children": [ + { + "text": "Nextflow is a powerful workflow manager that supports multiple container technologies, cloud providers and HPC job schedulers. It shouldn't be a surprise that wide ranging functionality leads to a complex interface, but comes with the drawback of many subcommands and options to remember. For a first-time user (and sometimes even for some long-time users) it can be difficult to remember everything. This is not a new problem for the command-line; even very common applications such as grep and tar are famous for having a bewildering array of options.", + "_key": "6560e56a4bfc", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "5951a7417533" + } + ], + "_type": "block", + "style": "normal", + "_key": "55b8b933afd8" + }, + { + "_type": "image", + "alt": "xkcd charge making fun of tar tricky command line arguments", + "_key": "a9de4db08bd2", + "asset": { + "_ref": "image-f697fe230bc989080eca53947a3e4156a638edc3-1425x458-png", + "_type": "reference" + } + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "e35c6c509bc1", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "aad105f28d55" + }, + { + "_type": "block", + "style": "normal", + "_key": "4cd58ef73e84", + "markDefs": [ + { + "href": "https://fig.io", + "_key": "a62a06fee244", + "_type": "link" + } + ], + "children": [ + { + "text": "Many tools have sprung up to make the command-line more user friendly, such as tldr pages and rich-click. ", + "_key": "35f9d119469b", + "_type": "span", + "marks": [] + }, + { + "_key": "7e75600d809a", + "_type": "span", + "marks": [ + "a62a06fee244" + ], + "text": "Fig" + }, + { + "_type": "span", + "marks": [], + "text": " is one such tool that adds powerful autocomplete functionality to your terminal. 
Fig gives you graphical popups with color-coded contexts more dynamic than shaded text for recent commands or long blocks of text after pressing tab.", + "_key": "a469d091a44d" + } + ] + }, + { + "children": [ + { + "_key": "f36a27dbf6be", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "e3224234577c", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Fig is compatible with most terminals, shells and IDEs (such as the VSCode terminal), is fully supported in MacOS, and has beta support for Linux and Windows. In MacOS, you can simply install it with ", + "_key": "2d7fcc372f9c", + "_type": "span" + }, + { + "_key": "3831dd13324c", + "_type": "span", + "marks": [ + "code" + ], + "text": "brew install --cask fig" + }, + { + "marks": [], + "text": " and then running the ", + "_key": "0fa3b46b6fa2", + "_type": "span" + }, + { + "_key": "a6c1794d78b8", + "_type": "span", + "marks": [ + "code" + ], + "text": "fig" + }, + { + "_type": "span", + "marks": [], + "text": " command to set it up.", + "_key": "914df0890acd" + } + ], + "_type": "block", + "style": "normal", + "_key": "b379766c3001" + }, + { + "children": [ + { + "_key": "a267fe6b5a99", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "0e9a4c4c6f85", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "We have now added Nextflow for Fig. Thanks to Figs open source core we were able to contribute specifications in Typescript that will now be automatically added for anyone installing or updating Fig. Now, with Fig, when you start typing your Nextflow commands, you’ll see autocomplete suggestions based on what you are typing and what you have typed in the past, such as your favorite options.", + "_key": "d96b558679a5", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "56f3ac3a6ce2" + }, + { + "_type": "block", + "style": "normal", + "_key": "2b63fcac5618", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "d49989af6761" + } + ] + }, + { + "_type": "image", + "alt": "GIF with a demo of nextflow log/list subcommands", + "_key": "5aca24e6c893", + "asset": { + "_ref": "image-b5b6632ecf6e637444ece4dbdbd5b23f6597da8e-544x226-gif", + "_type": "reference" + } + }, + { + "_key": "9387ad6302f0", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "ee080466aacf", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "51bc87e82916", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The Fig autocomplete functionality can also be adjusted to suit our preferences. Suggestions can be displayed in alphabetical order or as a list of your most recent commands. Similarly, suggestions can be displayed all the time or only when you press tab.", + "_key": "423239c354d9" + } + ], + "_type": "block" + }, + { + "_key": "a99369f11cc6", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "3c5d27413ec0", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "c2a2675f811c", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The Fig specification that we've written not only suggests commands and options, but dynamic inputs too. 
For example, finding previous run names when resuming or cleaning runs is tedious and error prone. Similarly, pipelines that you’ve already downloaded with ", + "_key": "ea8268732f1c" + }, + { + "text": "nextflow pull", + "_key": "8bb11ec55485", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "marks": [], + "text": " will be autocompleted if they have been run in the past. You won't have to remember the full names anymore, as Fig generators in the autocomplete allow you to automatically complete the run name after typing a few letters where a run name is expected. Importantly, this also works for pipeline names!", + "_key": "a56455a6cabf", + "_type": "span" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "18d9fc8a36e1" + } + ], + "_type": "block", + "style": "normal", + "_key": "a4e36c275a92", + "markDefs": [] + }, + { + "asset": { + "_ref": "image-f07a6c11fb87db69b34d76ca7891f13d53d46d6b-544x226-gif", + "_type": "reference" + }, + "_type": "image", + "alt": "GIF with a demo of nextflow pull/run/clean/view/config subcommands", + "_key": "01e0127db20c" + }, + { + "_type": "block", + "style": "normal", + "_key": "640afa574235", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "48ac5bc6c959" + } + ] + }, + { + "style": "normal", + "_key": "020a40b04e2f", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Fig for Nextflow will make you increase your productivity regardless of your user level. If you run multiple pipelines during your day you will immediately see the benefit of Fig. Your productivity will increase by taking advantage of this autocomplete function for run and project names. For Nextflow newcomers it will provide an intuitive way to explore the Nextflow CLI with built-in help text.", + "_key": "69aa04a1743a" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "b4bbe613326e", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "d8cc0a1ecbd6" + }, + { + "_key": "73c03bb18209", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "While Fig won’t replace the need to view help menus and documentation it will undoubtedly save you time and energy searching for commands and copying and pasting run names. Take your coding to the next level using Fig!", + "_key": "81e644d6e14d" + } + ], + "_type": "block", + "style": "normal" + } + ], + "meta": { + "slug": { + "current": "turbocharging-nextflow-with-fig" + }, + "description": "Nextflow is a powerful workflow manager that supports multiple container technologies, cloud providers and HPC job schedulers. It shouldn’t be a surprise that wide ranging functionality leads to a complex interface, but comes with the drawback of many subcommands and options to remember. For a first-time user (and sometimes even for some long-time users) it can be difficult to remember everything. This is not a new problem for the command-line; even very common applications such as grep and tar are famous for having a bewildering array of options." 
+ }, + "author": { + "_ref": "mNsm4Vx1W1Wy6aYYkroetD", + "_type": "reference" + }, + "_createdAt": "2024-09-25T14:16:51Z", + "_updatedAt": "2024-09-30T08:58:53Z", + "title": "Turbo-charging the Nextflow command line with Fig!", + "publishedAt": "2022-09-22T06:00:00.000Z" + }, + { + "author": { + "_type": "reference", + "_ref": "evan-floden" + }, + "publishedAt": "2016-04-13T06:00:00.000Z", + "_createdAt": "2024-09-25T14:15:02Z", + "body": [ + { + "style": "normal", + "_key": "cba73ea90c34", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Publication time acts as a snapshot for scientific work. Whether a project is ongoing or not, work which was performed months ago must be described, new software documented, data collated and figures generated.", + "_key": "5f320abf91ae" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "7623aba73446" + } + ], + "_type": "block", + "style": "normal", + "_key": "69766958048a", + "markDefs": [] + }, + { + "markDefs": [ + { + "_key": "7e2f7c394d29", + "_type": "link", + "href": "http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0080278" + } + ], + "children": [ + { + "marks": [], + "text": "The monumental increase in data and pipeline complexity has led to this task being performed to many differing standards, or ", + "_key": "0c2c80566eab", + "_type": "span" + }, + { + "marks": [ + "7e2f7c394d29" + ], + "text": "lack of thereof", + "_key": "819b7c5107da", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ". We all agree it is not good enough to simply note down the software version number. But what practical measures can be taken?", + "_key": "3fb3d26f289d" + } + ], + "_type": "block", + "style": "normal", + "_key": "8a072ec99c73" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "a3b5fa0b3f44" + } + ], + "_type": "block", + "style": "normal", + "_key": "ea75a4fa6b7c", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "fd5fb06d3ca0", + "markDefs": [ + { + "href": "https://doi.org/10.1038/nbt.3519", + "_key": "61b0db71dcea", + "_type": "link" + }, + { + "_key": "26ea0e0e16a6", + "_type": "link", + "href": "https://github.com/pachterlab/kallisto_paper_analysis" + } + ], + "children": [ + { + "marks": [], + "text": "The recent publication describing ", + "_key": "96392679886b", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "Kallisto", + "_key": "81f47fd9dd9c" + }, + { + "marks": [], + "text": " ", + "_key": "bc13f50e1ad7", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "61b0db71dcea" + ], + "text": "(Bray et al. 2016)", + "_key": "7fd4aab4181a" + }, + { + "text": " provides an excellent high profile example of the growing efforts to ensure reproducible science in computational biology. 
The authors provide a GitHub ", + "_key": "a7a5e0d6b40b", + "_type": "span", + "marks": [] + }, + { + "_key": "61e5d8e27eeb", + "_type": "span", + "marks": [ + "26ea0e0e16a6" + ], + "text": "repository" + }, + { + "text": " that ", + "_key": "3041d88d62d5", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "“contains all the analysis to reproduce the results in the kallisto paper”", + "_key": "4b509f322d10" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "83d74dbaa54e" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "1454c2fd344f", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "29e012023c5f" + }, + { + "style": "normal", + "_key": "8508053ba838", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "They should be applauded and indeed - in the Twittersphere - they were. The corresponding author Lior Pachter stated that the publication could be reproduced starting from raw reads in the NCBI Sequence Read Archive through to the results, which marks a fantastic accomplishment.", + "_key": "f144dcac581c", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "0b5674226ba4", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "2bc7d1a82754" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "blockquote", + "_key": "bbc9b1f9e026", + "markDefs": [ + { + "_type": "link", + "href": "https://t.co/qiu3LFozMX", + "_key": "5bead306418d" + }, + { + "href": "https://twitter.com/yarbsalocin", + "_key": "51022986aeee", + "_type": "link" + }, + { + "_type": "link", + "href": "https://twitter.com/hjpimentel", + "_key": "d530563a9dcf" + }, + { + "href": "https://twitter.com/pmelsted", + "_key": "5497ad974caf", + "_type": "link" + }, + { + "_type": "link", + "href": "https://twitter.com/hashtag/kallisto?src=hash", + "_key": "927c728e2321" + }, + { + "_key": "e93a9bd3aaf4", + "_type": "link", + "href": "https://x.com/lpachter/status/717279998424457216" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Hoping people will notice ", + "_key": "5f54d11bf956" + }, + { + "text": "https://t.co/qiu3LFozMX", + "_key": "99f551dd9741", + "_type": "span", + "marks": [ + "5bead306418d" + ] + }, + { + "text": " by ", + "_key": "4c52f8e38e05", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "51022986aeee" + ], + "text": "@yarbsalocin", + "_key": "c2940a00293b", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " ", + "_key": "83bf33ed7a70" + }, + { + "_type": "span", + "marks": [ + "d530563a9dcf" + ], + "text": "@hjpimentel", + "_key": "0f708f63d667" + }, + { + "text": " ", + "_key": "73f453b44931", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "5497ad974caf" + ], + "text": "@pmelsted", + "_key": "567bdcef374d", + "_type": "span" + }, + { + "marks": [], + "text": " reproducing ALL the ", + "_key": "c1ea7565e54a", + "_type": "span" + }, + { + "marks": [ + "927c728e2321" + ], + "text": "#kallisto", + "_key": "b20793eb1a0e", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " paper from SRA→results — ", + "_key": "5e9694a8da86" + }, + { + "_key": "9f041c58024f", + "_type": "span", + "marks": [ + "e93a9bd3aaf4" + ], + "text": "Lior Pachter (@lpachter) April 5, 2016" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "930b63c1e9e9", + "_type": "span" + } + ], + "_type": "block", 
+ "style": "normal", + "_key": "0b3b37911304", + "markDefs": [] + }, + { + "children": [ + { + "_key": "5f2bd862187b", + "_type": "span", + "marks": [], + "text": "They achieve this utilising the workflow framework " + }, + { + "_key": "4399a4320a61", + "_type": "span", + "marks": [ + "af7ddf44eeb3" + ], + "text": "Snakemake" + }, + { + "_type": "span", + "marks": [], + "text": ". Increasingly, we are seeing scientists applying workflow frameworks to their pipelines, which is great to see. There is a learning curve, but I have personally found the payoffs in productivity to be immense.", + "_key": "0fc3985e59fd" + } + ], + "_type": "block", + "style": "normal", + "_key": "49a95b38fb15", + "markDefs": [ + { + "_type": "link", + "href": "https://bitbucket.org/snakemake/snakemake/wiki/Home", + "_key": "af7ddf44eeb3" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "d5d1a1f4261f", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "ddc3ea03eacc", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "5c637751afb9", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "As both users and developers of Nextflow, we have long discussed best practice to ensure reproducibility of our work. As a community, we are at the beginning of that conversation - there are still many ideas to be aired and details ironed out—nevertheless, we wished to provide a state-of-play as we see it and describe what is possible with Nextflow in this regard.", + "_key": "7fad820c405d", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "89f111eb0c5f", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "261184d4db07" + }, + { + "style": "h3", + "_key": "53af2b1aa323", + "markDefs": [], + "children": [ + { + "text": "Guaranteed Reproducibility", + "_key": "d604801300e0", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "This is our goal. It is one thing for a pipeline to be able to be reproduced in your own hands, on your machine, yet is another for this to be guaranteed so that anyone anywhere can reproduce it. What I mean by guaranteed is that when a given pipeline is executed, there is only one result which can be output. 
Envisage what I term the ", + "_key": "432b05ff8001" + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "reproducibility triangle", + "_key": "331f7f81a527" + }, + { + "_type": "span", + "marks": [], + "text": ": consisting of data, code and compute environment.", + "_key": "5117c2128079" + } + ], + "_type": "block", + "style": "normal", + "_key": "afdd74f103a6" + },
+ { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "ddd3129b920a" + } + ], + "_type": "block", + "style": "normal", + "_key": "8a5d54d81c2b" + },
+ { + "alt": "Reproducibility Triangle", + "_key": "ccd455638d2d", + "asset": { + "_ref": "image-2e4a76f057a6ea8be016e34319d34338f54e2c86-500x416-png", + "_type": "reference" + }, + "_type": "image" + },
+ { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "9f9356b72d5c", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "670b0644c8eb" + },
+ { + "_key": "28da8642b59f", + "markDefs": [], + "children": [ + { + "marks": [ + "strong" + ], + "text": "Figure 1:", + "_key": "d754132200d0", + "_type": "span" + }, + { + "_key": "e95480fdf992", + "_type": "span", + "marks": [], + "text": " The Reproducibility Triangle. " + }, + { + "_key": "6cd53aba89b6", + "_type": "span", + "marks": [ + "em" + ], + "text": "Data" + }, + { + "_key": "d0cbb2af3026", + "_type": "span", + "marks": [], + "text": ": raw data such as sequencing reads, genomes and annotations but also metadata such as experimental design. " + }, + { + "_key": "524635e25b4d", + "_type": "span", + "marks": [ + "em" + ], + "text": "Code" + }, + { + "_type": "span", + "marks": [], + "text": ": scripts, binaries and libraries/dependencies. ", + "_key": "e5a689c4ea5e" + }, + { + "marks": [ + "em" + ], + "text": "Environment", + "_key": "3258f2806d94", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ": operating system.", + "_key": "728bcab06aaf" + } + ], + "_type": "block", + "style": "normal" + },
+ { + "_key": "d35e57ac0af5", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "f89270ee696d" + } + ], + "_type": "block", + "style": "normal" + },
+ { + "children": [ + { + "_key": "0c0f861ae39f", + "_type": "span", + "marks": [], + "text": "If there is any change to one of these, then reproducibility is no longer guaranteed. For years there have been solutions to each of these individual components. But they have lived a somewhat discrete existence: data in databases such as the SRA and Ensembl, code on GitHub and compute environments in the form of virtual machines. We think that in the future science must embrace solutions that integrate each of these components natively and holistically."
+ } + ], + "_type": "block", + "style": "normal", + "_key": "c461f2f95fb8", + "markDefs": [] + },
+ { + "_type": "block", + "style": "normal", + "_key": "2b14649d95af", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "af37059ce15c" + } + ] + },
+ { + "markDefs": [], + "children": [ + { + "text": "Implementation", + "_key": "e5380a5efa5b", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h3", + "_key": "dc2316f0cf4f" + },
+ { + "children": [ + { + "marks": [], + "text": "Nextflow provides a solution to reproducibility through version control and sandboxing.", + "_key": "739636a3892c", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "70026b8253e9", + "markDefs": [] + },
+ { + "_key": "2cf3d7c1a163", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "7dfbbe3a650b" + } + ], + "_type": "block", + "style": "normal" + },
+ { + "_type": "block", + "style": "h4", + "_key": "ec4d0eb245ed", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Code", + "_key": "633334c87665" + } + ] + },
+ { + "markDefs": [ + { + "href": "https://www.nextflow.io/docs/latest/sharing.html", + "_key": "ab8060ad378c", + "_type": "link" + }, + { + "_key": "43f7f5bc9896", + "_type": "link", + "href": "https://github.com/cbcrg/kallisto-nf" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Version control is provided via ", + "_key": "ad2f5ca0d91b" + }, + { + "text": "native integration with GitHub", + "_key": "059377e340ec", + "_type": "span", + "marks": [ + "ab8060ad378c" + ] + }, + { + "_type": "span", + "marks": [], + "text": " and other popular code management platforms such as Bitbucket and GitLab. Pipelines can be pulled, executed, developed, collaborated on and shared. For example, the command below will pull a specific version of a ", + "_key": "d5bb3fa94486" + }, + { + "_type": "span", + "marks": [ + "43f7f5bc9896" + ], + "text": "simple Kallisto + Sleuth pipeline", + "_key": "7dae48213716" + }, + { + "_key": "56f4b7c6bd01", + "_type": "span", + "marks": [], + "text": " from GitHub and execute it. 
The " + }, + { + "_key": "5232fd8b765d", + "_type": "span", + "marks": [ + "code" + ], + "text": "-r" + }, + { + "_type": "span", + "marks": [], + "text": " parameter can be used to specify a tag, branch or revision that was previously defined in the Git repository.", + "_key": "e1a2bbc55368" + } + ], + "_type": "block", + "style": "normal", + "_key": "154ad35165b3" + },
+ { + "_type": "block", + "style": "normal", + "_key": "f01d7cd6e6b3", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "3ce083bd2a37", + "_type": "span", + "marks": [] + } + ] + },
+ { + "_type": "code", + "_key": "063ed29b86b6", + "code": "nextflow run cbcrg/kallisto-nf -r v0.9" + },
+ { + "_type": "block", + "style": "h4", + "_key": "4549f16ee2e0", + "markDefs": [], + "children": [ + { + "text": "Environment", + "_key": "a65e9ab76968", + "_type": "span", + "marks": [] + } + ] + },
+ { + "_type": "block", + "style": "normal", + "_key": "6fe3b29ef335", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Sandboxing during both development and execution is another key concept; version control alone does not ensure that the dependencies and the compute environment are the same.", + "_key": "cb0f09fd2113" + } + ] + },
+ { + "_key": "e81d52e3b936", + "markDefs": [], + "children": [ + { + "_key": "84e0d21c8c7a", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + },
+ { + "markDefs": [ + { + "href": "https://github.com/cbcrg/kallisto-nf/blob/master/nextflow.config", + "_key": "43d0cadf8547", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "A simplified implementation of this places all binaries, dependencies and libraries within the project repository. In Nextflow, any binaries within the ", + "_key": "61abfb341a1f", + "_type": "span" + }, + { + "text": "bin", + "_key": "96d51552a726", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "marks": [], + "text": " directory of a repository are added to the path. Also, within the Nextflow ", + "_key": "c284ff05a995", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "43d0cadf8547" + ], + "text": "config file", + "_key": "d96cf6761a7a" + }, + { + "_key": "ae100b3fd6c5", + "_type": "span", + "marks": [], + "text": ", environment variables such as " + }, + { + "_key": "c066652812c7", + "_type": "span", + "marks": [ + "code" + ], + "text": "PERL5LIB" + }, + { + "marks": [], + "text": " can be defined so that they are automatically added during task execution.", + "_key": "2904daadcb91", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "89fe413ac8de" + },
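+ { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "As a minimal sketch (the lib path below is only a placeholder), such a variable can be exported to the environment of every task with an env block in the config file:", + "_key": "b7e2f0a91c330" + } + ], + "_type": "block", + "style": "normal", + "_key": "b7e2f0a91c33" + },
+ { + "_type": "code", + "_key": "c8f3a1b02d44", + "code": "// Illustrative nextflow.config snippet: export PERL5LIB to every task environment\nenv {\n  PERL5LIB = \"$baseDir/lib\"  // placeholder path - point this at your pipeline's Perl libraries\n}" + },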
+ { + "style": "normal", + "_key": "79e9722ef756", + "markDefs": [], + "children": [ + { + "_key": "a74a19ef06f0", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + },
+ { + "style": "normal", + "_key": "aa2b914682cb", + "markDefs": [ + { + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/docker.html", + "_key": "664650dd2782" + }, + { + "href": "https://doi.org/10.7717/peerj.1273", + "_key": "ff72db4f068b", + "_type": "link" + }, + { + "_type": "link", + "href": "https://github.com/cbcrg/kallisto-nf/blob/master/Dockerfile", + "_key": "663612c7ebe4" + } + ], + "children": [ + { + "text": "This can be taken a step further with containerisation such as ", + "_key": "99260bf2fbd2", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "664650dd2782" + ], + "text": "Docker", + "_key": "8c424b7a92c6", + "_type": "span" + }, + { + "_key": "bba1c6a27e26", + "_type": "span", + "marks": [], + "text": ". We have recently published " + }, + { + "_type": "span", + "marks": [ + "ff72db4f068b" + ], + "text": "work", + "_key": "cac6719ff634" + }, + { + "_key": "aa94dac853bc", + "_type": "span", + "marks": [], + "text": " about this: briefly, a " + }, + { + "_key": "ba006d52a3d1", + "_type": "span", + "marks": [ + "663612c7ebe4" + ], + "text": "Dockerfile" + }, + { + "marks": [], + "text": " containing the instructions on how to build the Docker image resides inside a repository. This provides a specification for the operating system, software, libraries and dependencies to be run.", + "_key": "1a3177d34b7c", + "_type": "span" + } + ], + "_type": "block" + },
+ { + "markDefs": [], + "children": [ + { + "_key": "c4ed951897af", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "a8cf9cacaf93" + },
+ { + "markDefs": [ + { + "_type": "link", + "href": "https://docs.docker.com/engine/userguide/containers/dockerimages/#image-digests", + "_key": "23622e140f58" + }, + { + "href": "https://github.com/cbcrg/kallisto-nf/blob/master/nextflow.config", + "_key": "43ab127a52f1", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The images themselves also have content-addressable identifiers in the form of ", + "_key": "5adc09cbc701" + }, + { + "_type": "span", + "marks": [ + "23622e140f58" + ], + "text": "digests", + "_key": "3050a89c8a16" + }, + { + "_type": "span", + "marks": [], + "text": ", which ensure that not a single byte of information, from the operating system through to the libraries pulled from public repos, has been changed. 
This container digest can be specified in the ", + "_key": "fa7b594fa3a4" + }, + { + "text": "pipeline config file", + "_key": "0d7883602bc9", + "_type": "span", + "marks": [ + "43ab127a52f1" + ] + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "6efc234ef0a1" + } + ], + "_type": "block", + "style": "normal", + "_key": "5fd4a8d94d05" + },
+ { + "_type": "block", + "style": "normal", + "_key": "14003b18e837", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "08ec3eeadd49" + } + ] + },
+ { + "_type": "code", + "_key": "923f32b3fa4b", + "code": "process {\n  container = \"cbcrg/kallisto-nf@sha256:9f84012739...\"\n}" + },
+ { + "_key": "e71bf00f9d90", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "When doing so, Nextflow automatically pulls the specified image from Docker Hub and manages the execution of the pipeline tasks from within the container in a transparent manner, i.e. without having to adapt or modify your code.", + "_key": "26c3c4bd8dae", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + },
+ { + "_type": "block", + "style": "normal", + "_key": "80dc492d4427", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "f4c8c6c29723" + } + ] + },
+ { + "children": [ + { + "text": "Data", + "_key": "7b85683afc21", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h4", + "_key": "8f007167b89c", + "markDefs": [] + },
+ { + "_key": "d679694fe1a6", + "markDefs": [ + { + "_key": "136e8629821f", + "_type": "link", + "href": "https://git-lfs.github.com/" + } + ], + "children": [ + { + "text": "Data is currently one of the more challenging aspects to address. ", + "_key": "3bf7ae44d388", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "em" + ], + "text": "Small data", + "_key": "716821e6f805", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " can be easily version controlled within git-like repositories. For larger files, the ", + "_key": "adb62c76f899" + }, + { + "_key": "6b7a239f1f46", + "_type": "span", + "marks": [ + "136e8629821f" + ], + "text": "Git Large File Storage" + }, + { + "_type": "span", + "marks": [], + "text": ", for which Nextflow provides built-in support, may be one solution. Ultimately though, the real home of scientific data is in publicly available, programmatically accessible databases.", + "_key": "945dcc236462" + } + ], + "_type": "block", + "style": "normal" + },
+ { + "children": [ + { + "marks": [], + "text": "", + "_key": "783591e90724", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "69963e55a7cf", + "markDefs": [] + },
+ { + "_type": "block", + "style": "normal", + "_key": "32868c03369b", + "markDefs": [ + { + "_key": "9a477d3af0dd", + "_type": "link", + "href": "http://www.ncbi.nlm.nih.gov/sra" + }, + { + "_type": "link", + "href": "http://www.ensembl.org/index.html", + "_key": "c9cefef98a8c" + }, + { + "href": "https://www.ncbi.nlm.nih.gov/bioproject/", + "_key": "749e82521fda", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "Providing out-of-the-box solutions is difficult given the hugely varying nature of the data and metadata within these databases. 
We are currently looking to incorporate the most highly used ones, such as the ", + "_key": "f613f6b1ce0e", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "9a477d3af0dd" + ], + "text": "SRA", + "_key": "07434c15efe8" + }, + { + "text": " and ", + "_key": "6323c0f4f978", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "c9cefef98a8c" + ], + "text": "Ensembl", + "_key": "26f0fa74de34" + }, + { + "marks": [], + "text": ". In the long term we have an eye on initiatives, such as ", + "_key": "3c5922e8098b", + "_type": "span" + }, + { + "text": "NCBI BioProject", + "_key": "34bb1aa55541", + "_type": "span", + "marks": [ + "749e82521fda" + ] + }, + { + "_key": "aeda864f983c", + "_type": "span", + "marks": [], + "text": ", with the idea that there is a single identifier for both the data and metadata that can be referenced in a workflow." + } + ] + },
+ { + "style": "normal", + "_key": "161b53f3ea8b", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "0be53de8bebf", + "_type": "span" + } + ], + "_type": "block" + },
+ { + "style": "normal", + "_key": "11a1ad204c5f", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Adhering to the practices above, one could imagine a single line of code which would appear within a publication.", + "_key": "19190022a1ae", + "_type": "span" + } + ], + "_type": "block" + },
+ { + "_type": "block", + "style": "normal", + "_key": "7d325bd67bb1", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "71b4239b53e8", + "_type": "span", + "marks": [] + } + ] + },
+ { + "_type": "code", + "_key": "1d90aec1859a", + "code": "nextflow run [user/repo] -r [version] --data [DB_reference:data_reference] -with-docker" + },
+ { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The result would be guaranteed to be reproducible by anyone who wished to do so.", + "_key": "73e1046e94aa" + } + ], + "_type": "block", + "style": "normal", + "_key": "629bc88e769c" + },
+ { + "_type": "block", + "style": "normal", + "_key": "a71b1c7b4d7b", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "0a8bfc98310e" + } + ] + },
+ { + "children": [ + { + "text": "Conclusion", + "_key": "394ffd350a70", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h3", + "_key": "117d2af784bd", + "markDefs": [] + },
+ { + "style": "normal", + "_key": "7f97e99f5d75", + "markDefs": [], + "children": [ + { + "text": "With this approach the reproducibility triangle is complete. But it must be noted that this does not guard against conceptual or implementation errors. It does not replace proper documentation. What it does is provide transparency to a result.", + "_key": "25cdcaada353", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + },
+ { + "markDefs": [], + "children": [ + { + "_key": "699d6035b2fb", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "643f25477e97" + },
+ { + "markDefs": [], + "children": [ + { + "_key": "052500178e51", + "_type": "span", + "marks": [], + "text": "The assumption that the deterministic nature of computation makes results insusceptible to irreproducibility is clearly false. We consider Nextflow, with its other features such as its polyglot nature, out-of-the-box portability and native support across HPC and cloud environments, to be an ideal solution in our everyday work. We hope to see more scientists adopt this approach to their workflows."
+ } + ], + "_type": "block", + "style": "normal", + "_key": "166f27dd3abb" + }, + { + "_key": "c38e84692e4d", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "e35307cbe396" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "c6c05673694a", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The recent efforts by the ", + "_key": "910d0653b96c", + "_type": "span" + }, + { + "marks": [ + "em" + ], + "text": "Kallisto", + "_key": "e5c41a30e7e5", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " authors highlight the appetite for increasing these standards and we encourage the community at large to move towards ensuring this becomes the normal state of affairs for publishing in science.", + "_key": "28f14677bc76" + } + ], + "_type": "block" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "d3ae045f3879", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "d1f93400432a", + "markDefs": [] + }, + { + "style": "h3", + "_key": "7d99e15e63e7", + "markDefs": [], + "children": [ + { + "text": "References", + "_key": "c1db291ab55c", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "276e6c3f8fa5", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Bray, Nicolas L., Harold Pimentel, Páll Melsted, and Lior Pachter. 2016. “Near-Optimal Probabilistic RNA-Seq Quantification.” Nature Biotechnology, April. Nature Publishing Group. doi:10.1038/nbt.3519.", + "_key": "699a5750edf8", + "_type": "span" + } + ], + "level": 1, + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Di Tommaso P, Palumbo E, Chatzou M, Prieto P, Heuer ML, Notredame C. (2015) “The impact of Docker containers on the performance of genomic pipelines.” PeerJ 3 doi.org:10.7717/peerj.1273.", + "_key": "59b9b239afe20" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "9e3180f28f7d", + "listItem": "bullet" + }, + { + "_type": "block", + "style": "normal", + "_key": "7a5fe9fe2e5d", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Garijo D, Kinnings S, Xie L, Xie L, Zhang Y, Bourne PE, et al. (2013) “Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome.” PLoS ONE 8(11): e80278. doi:10.1371/journal.pone.0080278", + "_key": "c03e71b97e030", + "_type": "span", + "marks": [] + } + ], + "level": 1 + } + ], + "_rev": "mvya9zzDXWakVjnX4hhFgM", + "meta": { + "description": "Publication time acts as a snapshot for scientific work. Whether a project is ongoing or not, work which was performed months ago must be described, new software documented, data collated and figures generated.", + "slug": { + "current": "best-practice-for-reproducibility" + } + }, + "_updatedAt": "2024-10-02T13:53:07Z", + "_type": "blogPost", + "_id": "a9076f5bf5bc", + "title": "Workflows & publishing: best practice for reproducibility" + }, + { + "_rev": "Ot9x7kyGeH5005E3MJ8dKl", + "_id": "ac64884a081a", + "_updatedAt": "2024-09-30T09:57:17Z", + "title": "Setting up a Nextflow environment on Windows 10", + "author": { + "_ref": "evan-floden", + "_type": "reference" + }, + "meta": { + "slug": { + "current": "setup-nextflow-on-windows" + }, + "description": "For Windows users, getting access to a Linux-based Nextflow development and runtime environment used to be hard. 
Users would need to run virtual machines, access separate physical servers or cloud instances, or install packages such as Cygwin or Wubi. Fortunately, there is now an easier way to deploy a complete Nextflow development environment on Windows." + }, + "body": [ + { + "markDefs": [ + { + "_type": "link", + "href": "http://www.cygwin.com/", + "_key": "6becb2c3b516" + }, + { + "href": "https://wiki.ubuntu.com/WubiGuide", + "_key": "bfc704fda1d5", + "_type": "link" + } + ], + "children": [ + { + "_key": "6cb202f46482", + "_type": "span", + "marks": [], + "text": "For Windows users, getting access to a Linux-based Nextflow development and runtime environment used to be hard. Users would need to run virtual machines, access separate physical servers or cloud instances, or install packages such as " + }, + { + "_type": "span", + "marks": [ + "6becb2c3b516" + ], + "text": "Cygwin", + "_key": "f0d573589aef" + }, + { + "_key": "ef4cfa1b0908", + "_type": "span", + "marks": [], + "text": " or " + }, + { + "_key": "c118989fc9d8", + "_type": "span", + "marks": [ + "bfc704fda1d5" + ], + "text": "Wubi" + }, + { + "text": ". Fortunately, there is now an easier way to deploy a complete Nextflow development environment on Windows.", + "_key": "d32f8bea0015", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "5fc3324cbc64" + }, + { + "_type": "block", + "style": "normal", + "_key": "b03c749cbcfa", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "10aee690ecff" + } + ] + }, + { + "style": "normal", + "_key": "12930d82455d", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The Windows Subsystem for Linux (WSL) allows users to build, manage and execute Nextflow pipelines on a Windows 10 laptop or desktop without needing a separate Linux machine or cloud VM. 
Users can build and test Nextflow pipelines and containerized workflows locally, on an HPC cluster, or their preferred cloud service, including AWS Batch and Azure Batch.", + "_key": "da9bece122cf", + "_type": "span" + } + ], + "_type": "block" + },
+ { + "style": "normal", + "_key": "382bb307a059", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "0b56bb88fcf2" + } + ], + "_type": "block" + },
+ { + "markDefs": [], + "children": [ + { + "text": "This document provides a step-by-step guide to setting up a Nextflow development environment on Windows 10.", + "_key": "5717951568be", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "bd367e17439d" + },
+ { + "_type": "block", + "style": "normal", + "_key": "64ed8a2e0537", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "ef95ade3ef62", + "_type": "span", + "marks": [] + } + ] + },
+ { + "style": "h2", + "_key": "ab4b8dbcec0d", + "markDefs": [], + "children": [ + { + "text": "High-level Steps", + "_key": "aa4c6912125f", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + },
+ { + "children": [ + { + "_type": "span", + "marks": [], + "text": "The steps described in this guide are as follows:", + "_key": "8258e15b4c650" + } + ], + "_type": "block", + "style": "normal", + "_key": "329409232b59", + "markDefs": [] + },
+ { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "b6063eaca1ed", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Install Windows PowerShell", + "_key": "ea5ff14c8db10", + "_type": "span", + "marks": [] + } + ] + },
+ { + "_type": "block", + "style": "normal", + "_key": "0a4cac1194f1", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Configure the Windows Subsystem for Linux (WSL2)", + "_key": "c3987f32b2360", + "_type": "span" + } + ], + "level": 1 + },
+ { + "style": "normal", + "_key": "9bcd0541e3ff", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "d420e301f1990", + "_type": "span", + "marks": [], + "text": "Obtain and Install a Linux distribution (on WSL2)" + } + ], + "level": 1, + "_type": "block" + },
+ { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Install Windows Terminal", + "_key": "008b4838e96c0" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "a0576a66796a", + "listItem": "bullet" + },
+ { + "children": [ + { + "text": "Install and configure Docker", + "_key": "60926ae0c2aa0", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "3ccc55792260", + "listItem": "bullet", + "markDefs": [] + },
+ { + "_type": "block", + "style": "normal", + "_key": "b01f8daa0a26", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Download and install an IDE (VS Code)", + "_key": "30f1f73adc6a0" + } + ], + "level": 1 + },
+ { + "style": "normal", + "_key": "9d8f7d2a49bb", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Install and test Nextflow", + "_key": "bc7752c8b52d0", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block" + },
+ { + "level": 1, + "_type": "block", + "style": "normal", + "_key": 
"df3e27ebf14a", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Configure X-Windows for use with the Nextflow console", + "_key": "d86ff3b7f2930", + "_type": "span" + } + ] + }, + { + "children": [ + { + "text": "Install and configure GIT", + "_key": "a070ba3e0edd0", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "005aa228f956", + "listItem": "bullet", + "markDefs": [] + }, + { + "_key": "f76f28c39e5a", + "markDefs": [], + "children": [ + { + "_key": "485fd4116dc4", + "_type": "span", + "marks": [], + "text": "Install Windows PowerShell" + } + ], + "_type": "block", + "style": "h2" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "PowerShell is a cross-platform command-line shell and scripting language available for Windows, Linux, and macOS. If you are an experienced Windows user, you are probably already familiar with PowerShell. PowerShell is worth taking a few minutes to download and install.", + "_key": "cb103108b784" + } + ], + "_type": "block", + "style": "normal", + "_key": "a39fe896f5f0" + }, + { + "_type": "block", + "style": "normal", + "_key": "5264507e8c58", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "8a5d70d215de", + "_type": "span" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "PowerShell is a big improvement over the Command Prompt in Windows 10. It brings features to Windows that Linux/UNIX users have come to expect, such as command-line history, tab completion, and pipeline functionality.", + "_key": "634eeadc57ce0" + } + ], + "_type": "block", + "style": "normal", + "_key": "9c87f925feda", + "markDefs": [] + }, + { + "children": [ + { + "marks": [], + "text": "You can obtain PowerShell for Windows from GitHub at the URL ", + "_key": "8b90055f21120", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "b3bd7e72c17c" + ], + "text": "https://github.com/PowerShell/PowerShell", + "_key": "8b90055f21121" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "8b90055f21122" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "02e7bcabe8c4", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/PowerShell/PowerShell", + "_key": "b3bd7e72c17c" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "Download and install the latest stable version of PowerShell for Windows x64 - e.g., ", + "_key": "94a72f264d060", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "0d98d3aa635b" + ], + "text": "powershell-7.1.3-win-x64.msi", + "_key": "94a72f264d061" + }, + { + "text": ".", + "_key": "94a72f264d062", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "ba9bf52ee9e7", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/PowerShell/PowerShell/releases/download/v7.1.3/PowerShell-7.1.3-win-x64.msi", + "_key": "0d98d3aa635b" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "afabf2127302", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://docs.microsoft.com/en-us/powershell/scripting/install/installing-powershell-core-on-windows?view=powershell-7.1", + "_key": "bb0b9d1a7d3a" + } + ], + "children": [ + { + "text": "If you run into difficulties, Microsoft provides detailed instructions ", + "_key": "9707c601034e0", + "_type": "span", + 
"marks": [] + }, + { + "_type": "span", + "marks": [ + "bb0b9d1a7d3a" + ], + "text": "here", + "_key": "9707c601034e1" + }, + { + "text": ".", + "_key": "9707c601034e2", + "_type": "span", + "marks": [] + } + ], + "level": 1 + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "579bf907f6c30" + } + ], + "_type": "block", + "style": "normal", + "_key": "984904fd86f0", + "markDefs": [] + }, + { + "_key": "03fdea3f0902", + "markDefs": [], + "children": [ + { + "_key": "18ee1a18a7fc", + "_type": "span", + "marks": [], + "text": "Configure the Windows Subsystem for Linux (WSL)" + } + ], + "_type": "block", + "style": "h2" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Enable the Windows Subsystem for Linux", + "_key": "ae1277bdeafa", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "4b81656587f7" + }, + { + "children": [ + { + "text": "Make sure you are running Windows 10 Version 1903 with Build 18362 or higher. You can check your Windows version by selecting WIN-R (using the Windows key to run a command) and running the utility ", + "_key": "4a71d41d86a1", + "_type": "span", + "marks": [] + }, + { + "text": "winver", + "_key": "2e54f7b07151", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "3840f308f8bc" + } + ], + "_type": "block", + "style": "normal", + "_key": "f6e80cc6159a", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "19e8ee9da3ba", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "68a21de6f0d5", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "0e932cecd2f0", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "From within PowerShell, run the Windows Deployment Image and Service Manager (DISM) tool as an administrator to enable the Windows Subsystem for Linux. 
To run PowerShell with administrator privileges, right-click on the PowerShell icon from the Start menu or desktop and select \"", + "_key": "43b4139d7f70", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "Run as administrator\".", + "_key": "61b473bfcfbf" + } + ] + },
+ { + "_key": "f7031369f153", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "80b18208e94a", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + },
+ { + "_key": "a3572b8176cb", + "code": "PS C:\\WINDOWS\\System32> dism.exe /online /enable-feature /featurename:Microsoft-Windows-Subsystem-Linux /all /norestart\nDeployment Image Servicing and Management tool\nVersion: 10.0.19041.844\nImage Version: 10.0.19041.1083\n\nEnabling feature(s)\n[==========================100.0%==========================]\nThe operation completed successfully.\n", + "_type": "code" + },
+ { + "style": "normal", + "_key": "62b8296b72f0", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "df72f56ba302", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + },
+ { + "style": "normal", + "_key": "e5292856a4bb", + "markDefs": [ + { + "_type": "link", + "href": "https://docs.microsoft.com/en-us/windows-hardware/manufacture/desktop/what-is-dism", + "_key": "f0f427230150" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "You can learn more about DISM ", + "_key": "2b97a42e89e0" + }, + { + "_type": "span", + "marks": [ + "f0f427230150" + ], + "text": "here", + "_key": "177fc6259b91" + }, + { + "_key": "0e94cbf87b82", + "_type": "span", + "marks": [], + "text": "." + } + ], + "_type": "block" + },
+ { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "1aeb6c1e5b9e", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "5657e8ed3b39" + },
+ { + "children": [ + { + "marks": [], + "text": "Step 2: Enable the Virtual Machine Feature", + "_key": "136e79f4d2c4", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "754691a86770", + "markDefs": [] + },
+ { + "markDefs": [], + "children": [ + { + "text": "Within PowerShell, enable Virtual Machine Platform support using DISM. 
If you have trouble enabling this feature, make sure that virtual machine support is enabled in your machine's BIOS.", + "_key": "6e9bd6c47908", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "d51d10aa103e" + },
+ { + "style": "normal", + "_key": "f3e970691daa", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "7f931ff4a923" + } + ], + "_type": "block" + },
+ { + "code": "PS C:\\WINDOWS\\System32> dism.exe /online /enable-feature /featurename:VirtualMachinePlatform /all /norestart\nDeployment Image Servicing and Management tool\nVersion: 10.0.19041.844\nImage Version: 10.0.19041.1083\nEnabling feature(s)\n[==========================100.0%==========================]\nThe operation completed successfully.", + "_type": "code", + "_key": "5520cb110999" + },
+ { + "_key": "eba584b559ab", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "29ff1beb31ee" + } + ], + "_type": "block", + "style": "normal" + },
+ { + "_key": "f36a31712562", + "markDefs": [], + "children": [ + { + "text": "After enabling the Virtual Machine Platform support, ", + "_key": "9d790a95f55c", + "_type": "span", + "marks": [] + }, + { + "_key": "4fdd4e017eb5", + "_type": "span", + "marks": [ + "strong" + ], + "text": "restart your machine" + }, + { + "_key": "1a41edbfa5d8", + "_type": "span", + "marks": [], + "text": "." + } + ], + "_type": "block", + "style": "normal" + },
+ { + "_key": "97bf3acd4db5", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "be9949c88e48" + } + ], + "_type": "block", + "style": "normal" + },
+ { + "style": "h3", + "_key": "c75b04d7f3c8", + "markDefs": [], + "children": [ + { + "text": "Step 3: Download the Linux Kernel Update Package", + "_key": "b798f4650238", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + },
+ { + "_type": "block", + "style": "normal", + "_key": "ccf3eb03f17a", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "b0902dd4666d", + "_type": "span", + "marks": [] + } + ] + },
+ { + "_type": "block", + "style": "normal", + "_key": "3db61b10b4e6", + "markDefs": [ + { + "href": "https://docs.microsoft.com/en-us/windows/wsl/compare-versions", + "_key": "cc630f9ac0f7", + "_type": "link" + } + ], + "children": [ + { + "_key": "d40395d0c95d", + "_type": "span", + "marks": [], + "text": "Nextflow users will want to take advantage of the latest features in WSL 2. You can learn about differences between WSL 1 and WSL 2 " + }, + { + "marks": [ + "cc630f9ac0f7" + ], + "text": "here", + "_key": "cabb90f743d5", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ". 
Before you can enable support for WSL 2, you'll need to download the kernel update package at the link below:", + "_key": "5e9787db0d3e" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_key": "654b932aa775", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "6cfb572f3e6b" + }, + { + "children": [ + { + "marks": [ + "ef6839e5bee4" + ], + "text": "WSL2 Linux kernel update package for x64 machines", + "_key": "1ff7abf51131", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "c8cd0ad46716", + "markDefs": [ + { + "href": "https://wslstorestorage.blob.core.windows.net/wslblob/wsl_update_x64.msi", + "_key": "ef6839e5bee4", + "_type": "link" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "2afa2b6e6a2c", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "5179d301a2af", + "_type": "span", + "marks": [] + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "Once downloaded, double click on the kernel update package and select "Yes" to install it with elevated permissions.", + "_key": "6e51fe37791e", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "943f86771a75" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "42702d5a30e6", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "3ac54b5f90d2" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "STEP 4: Set WSL2 as your Default Version", + "_key": "3efced757707" + } + ], + "_type": "block", + "style": "h3", + "_key": "3001c33962bf" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "From within PowerShell:", + "_key": "0ff377632cfc" + } + ], + "_type": "block", + "style": "normal", + "_key": "fa8235c5b29b" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "822a79e1c2b2" + } + ], + "_type": "block", + "style": "normal", + "_key": "afcedbb7f568", + "markDefs": [] + }, + { + "_type": "code", + "_key": "2aaa6e836228", + "code": "PS C:\\WINDOWS\\System32> wsl --set-default-version 2\nFor information on key differences with WSL 2 please visit https://aka.ms/wsl2" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "ee6baf9e64d1" + } + ], + "_type": "block", + "style": "normal", + "_key": "8b04fddd5f49", + "markDefs": [] + }, + { + "style": "normal", + "_key": "17156ee9a66e", + "markDefs": [ + { + "href": "https://docs.microsoft.com/en-us/windows/wsl/install-win10#manual-installation-steps", + "_key": "7f550bf28027", + "_type": "link" + } + ], + "children": [ + { + "text": "If you run into difficulties with any of these steps, Microsoft provides detailed installation instructions ", + "_key": "c0ff56067c1a", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "7f550bf28027" + ], + "text": "here", + "_key": "fd7e8bb25a66" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "f6b074e4ceb5" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "aae45645d044" + } + ], + "_type": "block", + "style": "normal", + "_key": "4f0b07d0ad03", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_key": "3c026ae4f649", + "_type": "span", + "marks": [], + "text": "Obtain and Install a Linux Distribution on WSL" + } + ], + "_type": "block", + "style": "h2", + "_key": 
"04f9eac6b121" + }, + { + "_key": "9f904f27cf5f", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "If you normally install Linux on VM environments such as VirtualBox or VMware, this probably sounds like a lot of work. Fortunately, Microsoft provides Linux OS distributions via the Microsoft Store that work with the Windows Subsystem for Linux.", + "_key": "218a9cafdb88" + } + ], + "_type": "block", + "style": "normal" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "41b2795787bd", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://aka.ms/wslstore", + "_key": "952403cf27c7" + } + ], + "children": [ + { + "marks": [], + "text": "Use this link to access and download a Linux Distribution for WSL through the Microsoft Store - ", + "_key": "868ae80777a00", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "952403cf27c7" + ], + "text": "https://aka.ms/wslstore", + "_key": "868ae80777a01" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "868ae80777a02" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "We selected the Ubuntu 20.04 LTS release. You can use a different distribution if you choose. Installation from the Microsoft Store is automated. Once the Linux distribution is installed, you can run a shell on Ubuntu (or your installed OS) from the Windows Start menu.", + "_key": "0f7f42fb56110", + "_type": "span", + "marks": [] + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "a539076d5661", + "listItem": "bullet" + }, + { + "_type": "block", + "style": "normal", + "_key": "670295762c2c", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "When you start Ubuntu Linux for the first time, you will be prompted to provide a UNIX username and password. The username that you select can be distinct from your Windows username. The UNIX user that you create will automatically have ", + "_key": "56cd257ad09b0", + "_type": "span" + }, + { + "text": "sudo", + "_key": "56cd257ad09b1", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "marks": [], + "text": " privileges. 
Whenever a shell is started, it will default to this user.", + "_key": "56cd257ad09b2", + "_type": "span" + } + ], + "level": 1 + },
+ { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "d796eb3a74700", + "_type": "span", + "marks": [], + "text": "After setting your username and password, update your packages on Ubuntu from the Linux shell using the following command:" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "af0ccab84aea" + },
+ { + "_type": "block", + "style": "normal", + "_key": "444823c753be", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "ab37442c295d" + } + ] + },
+ { + "code": "sudo apt update && sudo apt upgrade", + "_type": "code", + "_key": "1cb29d30cc99" + },
+ { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "445880c922e5", + "_type": "span", + "marks": [], + "text": "This is also a good time to add any additional Linux packages that you will want to use." + } + ], + "_type": "block", + "style": "normal", + "_key": "7ca439dc9c88" + },
+ { + "_type": "code", + "_key": "7ca439dc9c89", + "code": "sudo apt install net-tools" + },
+ { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "cc7a9ea3d5cc" + } + ], + "_type": "block", + "style": "normal", + "_key": "278ec0d9c8d1" + },
+ { + "_type": "block", + "style": "h2", + "_key": "8c15f3ddfcd3", + "markDefs": [], + "children": [ + { + "_key": "0749e537492f", + "_type": "span", + "marks": [], + "text": "Install Windows Terminal" + } + ] + },
+ { + "style": "normal", + "_key": "832c3474ea13", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/microsoft/terminal", + "_key": "3166a8b799e9" + } + ], + "children": [ + { + "_key": "9d03e0884fbc", + "_type": "span", + "marks": [], + "text": "While not necessary, it is a good idea to install " + }, + { + "text": "Windows Terminal", + "_key": "9c331d34b97f", + "_type": "span", + "marks": [ + "3166a8b799e9" + ] + }, + { + "marks": [], + "text": " at this point. When working with Nextflow, it is handy to interact with multiple command lines at the same time. For example, users may want to execute flows, monitor logfiles, and run Docker commands in separate windows.", + "_key": "e9a753f8078c", + "_type": "span" + } + ], + "_type": "block" + },
+ { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "5cd2f1fd107d", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "9825d4220934" + },
+ { + "_type": "block", + "style": "normal", + "_key": "f913144549ed", + "markDefs": [], + "children": [ + { + "_key": "294e87174ecc", + "_type": "span", + "marks": [], + "text": "Windows Terminal provides an X-Windows-like experience on Windows. It helps organize your various command-line environments - Linux shell, Windows Command Prompt, PowerShell, AWS or Azure CLIs."
+ } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "bee5a82a7393" + } + ], + "_type": "block", + "style": "normal", + "_key": "92a237142137", + "markDefs": [] + }, + { + "asset": { + "_ref": "image-f02c195fad1be5c9053179106e86221dfca766d8-1381x903-png", + "_type": "reference" + }, + "_type": "image", + "alt": "Windows Terminal", + "_key": "d7709d5f7894" + }, + { + "_type": "block", + "style": "normal", + "_key": "c87d0b905484", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "7b73916c7b7d", + "_type": "span", + "marks": [] + } + ] + }, + { + "children": [ + { + "_key": "cc49d134695f", + "_type": "span", + "marks": [], + "text": "Instructions for downloading and installing Windows Terminal are available at: " + }, + { + "text": "https://docs.microsoft.com/en-us/windows/terminal/get-started", + "_key": "7ebc0c43bb95", + "_type": "span", + "marks": [ + "9681967ad783" + ] + }, + { + "marks": [], + "text": ".", + "_key": "b108f6264291", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "b4adee730117", + "markDefs": [ + { + "_key": "9681967ad783", + "_type": "link", + "href": "https://docs.microsoft.com/en-us/windows/terminal/get-started" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "183f0555d75e", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "345b6596968f", + "markDefs": [] + }, + { + "style": "normal", + "_key": "6f9ad4446cdf", + "markDefs": [ + { + "_type": "link", + "href": "https://docs.microsoft.com/en-us/windows/terminal/command-line-arguments", + "_key": "1287146d4fb6" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "It is worth spending a few minutes getting familiar with available commands and shortcuts in Windows Terminal. 
Documentation is available at ", + "_key": "eece809028f7" + }, + { + "_key": "6d43d46dd894", + "_type": "span", + "marks": [ + "1287146d4fb6" + ], + "text": "https://docs.microsoft.com/en-us/windows/terminal/command-line-arguments" + }, + { + "text": ".", + "_key": "042485158273", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + },
+ { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "b65093e88c56", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "e6deead5247a" + },
+ { + "_key": "1b16982af4e6", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Some Windows Terminal commands you'll need right away are provided below:", + "_key": "aba059ce6503" + } + ], + "_type": "block", + "style": "normal" + },
+ { + "style": "normal", + "_key": "86e39aa442b9", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "620546696488", + "_type": "span" + } + ], + "_type": "block" + },
+ { + "style": "normal", + "_key": "8db03ef84d9a", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Split the active window vertically: SHIFT ALT =", + "_key": "b2ef918f73ef", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + },
+ { + "style": "normal", + "_key": "8db03ef84d9b", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Split the active window horizontally: SHIFT ALT -", + "_key": "b2ef918f73f0", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + },
+ { + "style": "normal", + "_key": "8db03ef84d9c", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Resize the active window: SHIFT ALT arrow keys", + "_key": "b2ef918f73f1", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + },
+ { + "style": "normal", + "_key": "8db03ef84d9d", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Open a new window under the current tab: ALT v (the new tab icon along the top of the Windows Terminal interface)", + "_key": "b2ef918f73f2", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + },
+ { + "style": "normal", + "_key": "c62578cd7567", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "adaa98f7ebcd" + } + ], + "_type": "block" + },
+ { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Installing Docker on Windows", + "_key": "128896cf62f8" + } + ], + "_type": "block", + "style": "h2", + "_key": "70a2f926bec5" + },
+ { + "_key": "2fafc2b4a9e8", + "markDefs": [ + { + "href": "https://dev.to/bowmanjd/install-docker-on-windows-wsl-without-docker-desktop-34m9", + "_key": "f7517fb7b0a8", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "There are two ways to install Docker for use with the WSL on Windows. One method is to install Docker directly on a hosted WSL Linux instance (Ubuntu in our case) and have the docker daemon run on the Linux kernel as usual. An installation recipe for people that choose this \"native Linux\" approach is provided ", + "_key": "f7ceff158e6b", + "_type": "span" + }, + { + "_key": "21a65b83e337", + "_type": "span", + "marks": [ + "f7517fb7b0a8" + ], + "text": "here" + }, + { + "_key": "c468f0b7ca47", + "_type": "span", + "marks": [], + "text": "." + } + ], + "_type": "block", + "style": "normal" + },
+ { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "29edc1888975" + } + ], + "_type": "block", + "style": "normal", + "_key": "2f1fa80b3668", + "markDefs": [] + },
+ { + "children": [ + { + "_key": "b7a338b888e4", + "_type": "span", + "marks": [], + "text": "A second method is to run " + }, + { + "text": "Docker Desktop", + "_key": "0a82b53903ea", + "_type": "span", + "marks": [ + "2b9f6b2e5017" + ] + }, + { + "_type": "span", + "marks": [], + "text": " on Windows. While Docker is more commonly used in Linux environments, it can be used with Windows also. The Docker Desktop supports containers running on Windows and Linux instances running under WSL. 
Docker Desktop provides some advantages for Windows users:", + "_key": "c8074f9eee53" + } + ], + "_type": "block", + "style": "normal", + "_key": "626e152582e2", + "markDefs": [ + { + "href": "https://www.docker.com/products/docker-desktop", + "_key": "2b9f6b2e5017", + "_type": "link" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "f384370784b3", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "cac45639abb1", + "_type": "span", + "marks": [] + } + ] + }, + { + "style": "normal", + "_key": "67a96507c31d", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The installation process is automatedDocker Desktop provides a Windows GUI for managing Docker containers and images (including Linux containers running under WSL)Microsoft provides Docker Desktop integration features from within Visual Studio Code via a VS Code extensionDocker Desktop provides support for auto-installing a single-node Kubernetes clusterThe Docker Desktop WSL 2 back-end provides an elegant Linux integration such that from a Linux user's perspective, Docker appears to be running natively on Linux.", + "_key": "2332ada26657", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "f73ae75e7f79", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "548b85eac6ca" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "7f9b3c7c9f10", + "markDefs": [ + { + "_type": "link", + "href": "https://www.docker.com/blog/new-docker-desktop-wsl2-backend/", + "_key": "aff57c711a7e" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "An explanation of how the Docker Desktop WSL 2 Back-end works is provided ", + "_key": "91562c515d63" + }, + { + "_type": "span", + "marks": [ + "aff57c711a7e" + ], + "text": "here", + "_key": "2e3d36f3373f" + }, + { + "text": ".", + "_key": "b70aba7c5508", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "63c19668969b", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "1952a921c0a6", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "text": "Step 1: Install Docker Desktop on Windows", + "_key": "f711a17c9ed9", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h3", + "_key": "8ae5a39f20d2" + }, + { + "_key": "0db910c1930c", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Download and install Docker Desktop for Windows from the following link: https://desktop.docker.com/win/stable/amd64/Docker%20Desktop%20Installer.exeFollow the on-screen prompts provided by the Docker Desktop Installer. The installation process will install Docker on Windows and install the Docker back-end components so that Docker commands are accessible from within WSL.After installation, Docker Desktop can be run from the Windows start menu. The Docker Desktop user interface is shown below. Note that Docker containers launched under WSL can be managed from the Windows Docker Desktop GUI or Linux command line.The installation process is straightforward, but if you run into difficulties, detailed instructions are available [here](https://docs.docker.com/docker-for-windows/install/).\n\n![Nextflow Visual Studio Code Extension](/img/docker-images.png)\n\nThe Docker Engineering team provides an architecture diagram explaining how Docker on Windows interacts with WSL. 
Additional details are available [here](https://code.visualstudio.com/blogs/2020/03/02/docker-in-wsl2).\n\n![Nextflow Visual Studio Code Extension](/img/docker-windows-arch.png)", + "_key": "c1fd967080ac", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "2d702659331f", + "markDefs": [], + "children": [ + { + "_key": "0e3e027d35ee", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "h3", + "_key": "32ca7f6c2d3f", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Step 2: Verify the Docker installation", + "_key": "942090e26fe6" + } + ], + "_type": "block" + }, + { + "_key": "4a10fde1dff8", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Now that Docker is installed, run a Docker container to verify that Docker and the Docker Integration Package on WSL 2 are working properly.", + "_key": "0ff61c27cfb3", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "b062e53f3562" + } + ], + "_type": "block", + "style": "normal", + "_key": "987858ec513d", + "markDefs": [] + }, + { + "_key": "805111643132", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Run a Docker command from the Linux shell as shown below below. This command downloads a **centos** image from Docker Hub and allows us to interact with the container via an assigned pseudo-tty. Your Docker container may exit with exit code 139 when you run this and other Docker containers. If so, don't worry – an easy fix to this issue is provided shortly.\n\n```console\n$ docker run -ti centos:6\n[root@02ac0beb2d2c /]# hostname\n02ac0beb2d2c\n```\nYou can run Docker commands in other Linux shell windows via the Windows Terminal environment to monitor and manage Docker containers and images. For example, running `docker ps` in another window shows the running CentOS Docker container.\n\n```console\n$ docker ps\nCONTAINER ID IMAGE COMMAND CREATED STATUS NAMES\nf5dad42617f1 centos:6 \"/bin/bash\" 2 minutes ago Up 2 minutes happy_hopper\n```", + "_key": "880074fca088", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_key": "d2b2719c6d0c", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "48d5598d6083" + }, + { + "markDefs": [], + "children": [ + { + "text": "Step 3: Dealing with exit code 139", + "_key": "26183cfa9327", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h3", + "_key": "bff3301d6dc8" + }, + { + "_type": "block", + "style": "normal", + "_key": "e7ecbf23f2e1", + "markDefs": [ + { + "_key": "e89fc0187c0e", + "_type": "link", + "href": "https://dev.to/damith/docker-desktop-container-crash-with-exit-code-139-on-windows-wsl-fix-438" + }, + { + "_key": "fd5b05ee3dd5", + "_type": "link", + "href": "https://unix.stackexchange.com/questions/478387/running-a-centos-docker-image-on-arch-linux-exits-with-code-139" + } + ], + "children": [ + { + "_key": "95dfa02661f9", + "_type": "span", + "marks": [], + "text": "You may encounter exit code " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "139", + "_key": "f1bd067a4afc" + }, + { + "_key": "fb17703958c3", + "_type": "span", + "marks": [], + "text": " when running Docker containers. 
This is a known problem when running containers with specific base images within Docker Desktop. Good explanations of the problem and solution are provided " + }, + { + "_type": "span", + "marks": [ + "e89fc0187c0e" + ], + "text": "here", + "_key": "de1a4f8624b3" + }, + { + "marks": [], + "text": " and ", + "_key": "adc6c2208999", + "_type": "span" + }, + { + "text": "here", + "_key": "aec7155b7451", + "_type": "span", + "marks": [ + "fd5b05ee3dd5" + ] + }, + { + "_key": "d1d7c35d65eb", + "_type": "span", + "marks": [], + "text": "." + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "363576a213dc" + } + ], + "_type": "block", + "style": "normal", + "_key": "d4bcf251cf7f" + }, + { + "children": [ + { + "_key": "a10a94310c91", + "_type": "span", + "marks": [], + "text": "The solution is to add two lines to a " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": ".wslconfig", + "_key": "346f9ae4fb86" + }, + { + "marks": [], + "text": " file in your Windows home directory. The ", + "_key": "4ec97c28a2c9", + "_type": "span" + }, + { + "marks": [ + "code" + ], + "text": ".wslconfig", + "_key": "a6eb54b8e772", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " file specifies kernel options that apply to all Linux distributions running under WSL 2.", + "_key": "e371dae48574" + } + ], + "_type": "block", + "style": "normal", + "_key": "ec25c415d7a7", + "markDefs": [] + }, + { + "children": [ + { + "text": "", + "_key": "ce923e7c81fe", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "a8ae2f42cbaa", + "markDefs": [] + }, + { + "_key": "9e6ba742843a", + "markDefs": [], + "children": [ + { + "_key": "c3df7054aac3", + "_type": "span", + "marks": [], + "text": "Some of the Nextflow container images served from Docker Hub are affected by this bug since they have older base images, so it is a good idea to apply this fix." + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "25cfd07a16bc", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "ac912ef0fc86" + }, + { + "_type": "block", + "style": "normal", + "_key": "e3efaadbdca3", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "Edit the `.wslconfig` file in your Windows home directory. You can do this using PowerShell as shown:\n\n```powershell\nPS C:\\Users\\ notepad .wslconfig\n```\nAdd these two lines to the `.wslconfig` file and save it:\n\n```ini\n[wsl2]\nkernelCommandLine = vsyscall=emulate\n```\nAfter this, **restart your machine** to force a restart of the Docker and WSL 2 environment. 
After making this correction, you should be able to launch containers without seeing exit code `139`.", + "_key": "7308913ea661", + "_type": "span", + "marks": [] + } + ] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "29b092f51f9e", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "1912704ebfe2" + }, + { + "_type": "block", + "style": "h2", + "_key": "6931352cd4ca", + "markDefs": [], + "children": [ + { + "text": "Install Visual Studio Code as your IDE (optional)", + "_key": "e2eb3f4fffe9", + "_type": "span", + "marks": [] + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "b2d8b070f1ad", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Developers can choose from a variety of IDEs depending on their preferences. Some examples of IDEs and developer-friendly editors are below:", + "_key": "72d67eaa0141" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "069f31834fc2" + } + ], + "_type": "block", + "style": "normal", + "_key": "cfe41810bc50" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Visual Studio Code - https://code.visualstudio.com/Download (Nextflow VSCode Language plug-in [here](https://github.com/nextflow-io/vscode-language-nextflow/blob/master/vsc-extension-quickstart.md))Eclipse - https://www.eclipse.org/VIM - https://www.vim.org/ (VIM plug-in for Nextflow [here](https://github.com/LukeGoodsell/nextflow-vim))Emacs - https://www.gnu.org/software/emacs/download.html (Nextflow syntax highlighter [here](https://github.com/Emiller88/nextflow-mode))JetBrains PyCharm - https://www.jetbrains.com/pycharm/IntelliJ IDEA - https://www.jetbrains.com/idea/Atom – https://atom.io/ (Nextflow Atom support available [here](https://atom.io/packages/language-nextflow))Notepad++ - https://notepad-plus-plus.org/", + "_key": "f4d4054d2e38" + } + ], + "_type": "block", + "style": "normal", + "_key": "09ae17a1fc63", + "listItem": "bullet" + }, + { + "style": "normal", + "_key": "cb1483c80d3b", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "0d6dd41ccddc", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "We decided to install Visual Studio Code because it has some nice features, including:", + "_key": "f5f8f0b5d1f8" + } + ], + "_type": "block", + "style": "normal", + "_key": "ea3a84d72b88", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_key": "b4522e54021a", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "2d71bf55c0b5" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Support for source code control from within the IDE (Git)Support for developing on Linux via its WSL 2 Video Studio Code BackendA library of extensions including Docker and Kubernetes support and extensions for Nextflow, including Nextflow language support and an [extension pack for the nf-core community](https://github.com/nf-core/vscode-extensionpack).", + "_key": "b698d2fb3ade", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "706f75ba0b40", + "listItem": "bullet" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "f57986a11c52" + } + ], + "_type": "block", + "style": "normal", + "_key": "47991ae3524d" + }, + { + "_key": 
"2d1b93179ab3", + "markDefs": [ + { + "href": "https://code.visualstudio.com/Download", + "_key": "e2aa64150669", + "_type": "link" + } + ], + "children": [ + { + "_key": "39570acdbe14", + "_type": "span", + "marks": [], + "text": "Download Visual Studio Code from " + }, + { + "marks": [ + "e2aa64150669" + ], + "text": "https://code.visualstudio.com/Download", + "_key": "b23f50069442", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " and follow the installation procedure. The installation process will detect that you are running WSL. You will be invited to download and install the Remote WSL extension.", + "_key": "3b5b9a4a6993" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "b02d2a9a4e0b" + } + ], + "_type": "block", + "style": "normal", + "_key": "3934494593f6" + }, + { + "children": [ + { + "text": "Within VS Code and other Windows tools, you can access the Linux file system under WSL 2 by accessing the path `\\\\wsl$\\`. In our example, the path from Windows to access files from the root of our Ubuntu Linux instance is: [**\\\\wsl$\\Ubuntu-20.04**](file://wsl$/Ubuntu-20.04).", + "_key": "7e6eaf5fd417", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "e90ca1e0c84e", + "listItem": "bullet", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "b4072a0708b8", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "85d8a19702ee", + "_type": "span" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "66834936418e", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Note that the reverse is possible also – from within Linux, ", + "_key": "22dd59fb1a5a" + }, + { + "_key": "da58a2e3e83c", + "_type": "span", + "marks": [ + "code" + ], + "text": "/mnt/c" + }, + { + "_key": "064282dc9b10", + "_type": "span", + "marks": [], + "text": " maps to the Windows C: drive. You can inspect " + }, + { + "text": "/etc/mtab", + "_key": "01566a3bd6ca", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_key": "6ee1e89a3148", + "_type": "span", + "marks": [], + "text": " to see the mounted file systems available under Linux." + } + ] + }, + { + "style": "normal", + "_key": "b4b5cb5ee77e", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "4ab3288f643a" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "It is a good idea to install Nextflow language support in VS Code. You can do this by selecting the Extensions icon from the left panel of the VS Code interface and searching the extensions library for Nextflow as shown. 
The Nextflow language support extension is on GitHub at https://github.com/nextflow-io/vscode-language-nextflow\n\n![Nextflow Visual Studio Code Extension](/img/nf-vscode-ext.png)", + "_key": "c7a5365e4a40", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "36639de3bab1", + "listItem": "bullet", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "1c7e8da68c1b", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "4d93359955ca" + }, + { + "style": "h2", + "_key": "4041772604c4", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Visual Studio Code Remote Development", + "_key": "27e7829bbd7c" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "7258aae5f401", + "markDefs": [], + "children": [ + { + "_key": "f88a9ec5b0c9", + "_type": "span", + "marks": [], + "text": "Visual Studio Code Remote Development supports development on remote environments such as containers or remote hosts. For Nextflow users, it is important to realize that VS Code sees the Ubuntu instance we installed on WSL as a remote environment. The Diagram below illustrates how remote development works. From a VS Code perspective, the Linux instance in WSL is considered a remote environment." + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "408ec60281a8", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "a74ffe0d03c5" + }, + { + "children": [ + { + "text": "Windows users work within VS Code in the Windows environment. However, source code, developer tools, and debuggers all run Linux on WSL, as illustrated below.", + "_key": "fd17b19d8f7b", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "8befd6bd030a", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "f8bd1447acd7" + } + ], + "_type": "block", + "style": "normal", + "_key": "076004a3ade5" + }, + { + "_type": "image", + "alt": "The Remote Development Environment in VS Code", + "_key": "2eb73bc6dfbe", + "asset": { + "_type": "reference", + "_ref": "image-c865d3c82010e9ca67f28de8b6d0812668758dca-958x308-png" + } + }, + { + "_key": "ed1a6ca8baca", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "db9d957eab20", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "marks": [], + "text": "An explanation of how VS Code Remote Development works is provided ", + "_key": "b66cf224f537", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "3db9f8df4ef0" + ], + "text": "here", + "_key": "c4ec2d3db2c2" + }, + { + "_key": "df671cb13891", + "_type": "span", + "marks": [], + "text": "." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "326f5da14a98", + "markDefs": [ + { + "href": "https://code.visualstudio.com/docs/remote/remote-overview", + "_key": "3db9f8df4ef0", + "_type": "link" + } + ] + }, + { + "_key": "ffa3e6c4ab17", + "markDefs": [], + "children": [ + { + "_key": "575707f2ba3c", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "d1f07399b193", + "markDefs": [], + "children": [ + { + "text": "VS Code users see the Windows filesystem, plug-ins specific to VS Code on Windows, and access Windows versions of tools such as Git. If you prefer to develop in Linux, you will want to select WSL as the remote environment.", + "_key": "68813f5534f9", + "_type": "span", + "marks": [] + } + ] + }, + { + "style": "normal", + "_key": "1a15220f5ae7", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "255fb71288a9" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "To open a new VS Code Window running in the context of the WSL Ubuntu-20.04 environment, click the green icon at the lower left of the VS Code window and select ", + "_key": "5e404734e971" + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "\"New WSL Window using Distro ..\"", + "_key": "9608dca75900" + }, + { + "_key": "e74cc82b6ab0", + "_type": "span", + "marks": [], + "text": " and select " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "Ubuntu 20.04", + "_key": "83c0bca6fe35" + }, + { + "text": ". You'll notice that the environment changes to show that you are working in the WSL: ", + "_key": "dc979953a92c", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "Ubuntu-20.04", + "_key": "3b136f2e25ed" + }, + { + "_key": "2b05ecc5dc9a", + "_type": "span", + "marks": [], + "text": " environment." + } + ], + "_type": "block", + "style": "normal", + "_key": "8477aa50b556" + }, + { + "style": "normal", + "_key": "ccce0d447c82", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "688c05a2e65a", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "212505adace0", + "asset": { + "_ref": "image-21d824cfc6f7e089c8540b039bf70ba6d894cffa-1045x322-png", + "_type": "reference" + }, + "_type": "image", + "alt": "Selecting the Remote Dev Environment within VS Code" + }, + { + "style": "normal", + "_key": "87de85dd1fed", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "1c0f9e8084ac" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Selecting the Extensions icon, you can see that different VS Code Marketplace extensions run in different contexts. The Nextflow Language extension installed in the previous step is globally available. 
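In addition to selecting the WSL remote from within VS Code, you can start a WSL-connected window directly from the Ubuntu shell. A minimal sketch, assuming VS Code and the Remote WSL extension are installed (the project path is a placeholder):

```console
# From the Ubuntu shell: open a project folder in a VS Code window attached to WSL
$ cd ~/my-pipeline
$ code .
```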
It works when developing on Windows or developing on WSL: Ubuntu-20.04.", + "_key": "033680399f86", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "97b4bf2c2e39" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "854147eeda3a" + } + ], + "_type": "block", + "style": "normal", + "_key": "f38eb41209d4", + "markDefs": [] + }, + { + "children": [ + { + "text": "The Extensions tab in VS Code differentiates between locally installed plug-ins and those installed under WSL.", + "_key": "a13e2a66abf8", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "c760b9bc3f44", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "635e73c6d746" + } + ], + "_type": "block", + "style": "normal", + "_key": "cdf08a96ee81", + "markDefs": [] + }, + { + "_key": "de928cb3b887", + "asset": { + "_type": "reference", + "_ref": "image-b57b2e771cb9d391e3b288644597b973dabb7847-1103x460-png" + }, + "_type": "image", + "alt": "Local vs. Remote Extensions in VS Code" + }, + { + "_key": "3f8e67c10872", + "markDefs": [], + "children": [ + { + "_key": "e19b3d008590", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "h2", + "_key": "2ffc1b459c1b", + "markDefs": [], + "children": [ + { + "text": "Installing Nextflow", + "_key": "3bbe7dc8af6d", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "31fa19c1467b", + "markDefs": [ + { + "_key": "1c8bdbdf490a", + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/getstarted.html#installation" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "With Linux, Docker, and an IDE installed, now we can install Nextflow in our WSL 2 hosted Linux environment. Detailed instructions for installing Nextflow are available at ", + "_key": "b79d05c8e70c" + }, + { + "_type": "span", + "marks": [ + "1c8bdbdf490a" + ], + "text": "https://www.nextflow.io/docs/latest/getstarted.html#installation", + "_key": "08461233b8ae" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "f1ba391b9d9d" + } + ], + "_type": "block", + "style": "normal", + "_key": "5f889674e0b7" + }, + { + "_key": "8a6810742a6c", + "markDefs": [], + "children": [ + { + "text": "Step 1: Make sure Java is installed (under WSL)", + "_key": "2f13a264ae06", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h3" + }, + { + "style": "normal", + "_key": "4d6baf7bd227", + "markDefs": [ + { + "_type": "link", + "href": "https://linuxize.com/post/install-java-on-ubuntu-18-04/", + "_key": "1f09e58f94ac" + } + ], + "children": [ + { + "marks": [], + "text": "Java is a prerequisite for running Nextflow. Instructions for installing Java on Ubuntu are available ", + "_key": "8d72534f3e99", + "_type": "span" + }, + { + "text": "here", + "_key": "1848e7405401", + "_type": "span", + "marks": [ + "1f09e58f94ac" + ] + }, + { + "text": ". 
To install the default OpenJDK, follow the instructions below in a Linux shell window:", + "_key": "f14ddc4774cb", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "_key": "fc856575d20b", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "85c7a65db5d4", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Update the _apt_ package index:\n\n```bash\nsudo apt update\n```\nInstall the latest default OpenJDK package\n\n```bash\nsudo apt install default-jdk\n```\nVerify the installation\n\n```bash\njava -version\n```", + "_key": "b16587feea85" + } + ], + "_type": "block", + "style": "normal", + "_key": "1f74277d542d" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "175092575205", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "253c9fdb33b3" + }, + { + "_type": "block", + "style": "h3", + "_key": "827381d0653c", + "markDefs": [], + "children": [ + { + "text": "Step 2: Make sure curl is installed", + "_key": "dacfc23890b7", + "_type": "span", + "marks": [] + } + ] + }, + { + "style": "normal", + "_key": "2d33fb95e34a", + "markDefs": [], + "children": [ + { + "text": "curl", + "_key": "17e848035518", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_key": "4d3ed4a3325c", + "_type": "span", + "marks": [], + "text": " is a convenient way to obtain Nextflow. " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "curl", + "_key": "69c2189e8f8b" + }, + { + "text": " is included in the default Ubuntu repositories, so installation is straightforward.", + "_key": "d1a971e13a3d", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "01fc5feac2bc", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "f5caa8a2e947" + }, + { + "style": "normal", + "_key": "058b01d709cd", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "From the shell:\n\n```bash\nsudo apt update\nsudo apt install curl\n```\nVerify that `curl` works:\n\n```console\n$ curl\ncurl: try 'curl --help' or 'curl --manual' for more information\n```", + "_key": "28c2d3fde4e6", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "11ede508fb85" + } + ], + "_type": "block", + "style": "normal", + "_key": "58e9c0adcfc0" + }, + { + "style": "h3", + "_key": "96cf5665f7ec", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "STEP 3: Download and install Nextflow", + "_key": "43f8a055417e" + } + ], + "_type": "block" + }, + { + "_key": "bceff68ed977", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Use `curl` to retrieve Nextflow into a temporary directory and then install it in `/usr/bin` so that the Nextflow command is on your path:\n\n```bash\nmkdir temp\ncd temp\ncurl -s https://get.nextflow.io | bash\nsudo cp nextflow /usr/bin\n```\nMake sure that Nextflow is executable:\n\n```bash\nsudo chmod 755 /usr/bin/nextflow\n```\n\nor if you prefer:\n\n```bash\nsudo chmod +x /usr/bin/nextflow\n```", + "_key": "e26ac8292f82" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "de70e677b837", + "markDefs": [], + "children": 
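If you prefer not to copy the Nextflow launcher into `/usr/bin` with sudo, a per-user layout works just as well. A sketch, assuming you are happy to keep the binary under `~/bin`:

```console
$ mkdir -p ~/bin
$ cd ~/bin
$ curl -s https://get.nextflow.io | bash
$ chmod +x ~/bin/nextflow

# Make sure ~/bin is on the PATH (add this line to ~/.bashrc to make it permanent)
$ export PATH="$HOME/bin:$PATH"
```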
[ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "b3c247fef8a1" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "text": "Step 4: Verify the Nextflow installation", + "_key": "65c3814a76b8", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h3", + "_key": "62c13a393584" + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Make sure Nextflow runs:\n\n```console\n$ nextflow -version\n\n N E X T F L O W\n version 21.04.2 build 5558\n created 12-07-2021 07:54 UTC (03:54 EDT)\n cite doi:10.1038/nbt.3820\n http://nextflow.io\n```\nRun a simple Nextflow pipeline. The example below downloads and executes a sample hello world pipeline from GitHub - https://github.com/nextflow-io/hello.\n\n```console\n$ nextflow run hello\n\nN E X T F L O W ~ version 21.04.2\nLaunching `nextflow-io/hello` [distracted_pare] - revision: ec11eb0ec7 [master]\nexecutor > local (4)\n[06/c846d8] process > sayHello (3) [100%] 4 of 4 ✔\nCiao world!\n\nHola world!\n\nBonjour world!\n\nHello world!\n```", + "_key": "06c1d10e12d2" + } + ], + "_type": "block", + "style": "normal", + "_key": "fc05db644cb0" + }, + { + "_key": "56ab48f63343", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "18a0f30811fc", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Step 5: Run a Containerized Workflow", + "_key": "2592088d054f" + } + ], + "_type": "block", + "style": "h3", + "_key": "5fa4f2dd07f4" + }, + { + "style": "normal", + "_key": "b40ecea19814", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nextflow-io/blast-example", + "_key": "7801cfd7c9ae" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "To validate that Nextflow works with containerized workflows, we can run a slightly more complicated example. A sample workflow involving NCBI Blast is available at ", + "_key": "e70b89aaccea" + }, + { + "marks": [ + "7801cfd7c9ae" + ], + "text": "https://github.com/nextflow-io/blast-example", + "_key": "128b42af6c2e", + "_type": "span" + }, + { + "text": ". 
Rather than installing Blast on our local Linux instance, it is much easier to pull a container preloaded with Blast and other software that the pipeline depends on.", + "_key": "c90986f784f2", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "03fa9a25a03f" + } + ], + "_type": "block", + "style": "normal", + "_key": "c1313b454b5c", + "markDefs": [] + }, + { + "markDefs": [ + { + "_key": "e46e32c61c16", + "_type": "link", + "href": "https://hub.docker.com/r/nextflow/examples" + } + ], + "children": [ + { + "marks": [], + "text": "The ", + "_key": "fd2dd435cc3a", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "nextflow.config", + "_key": "5c5a7e01af20" + }, + { + "text": " file for the Blast example (below) specifies that process logic is encapsulated in the container ", + "_key": "763cdccc977c", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "nextflow/examples", + "_key": "e410cc3d7af6" + }, + { + "_key": "04448505be39", + "_type": "span", + "marks": [], + "text": " available from Docker Hub (" + }, + { + "_key": "47f1de1cd394", + "_type": "span", + "marks": [ + "e46e32c61c16" + ], + "text": "https://hub.docker.com/r/nextflow/examples" + }, + { + "marks": [], + "text": ").", + "_key": "50451370d176", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "37e7d897956c" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "8231851f55ea", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "faf6bf352637" + }, + { + "style": "normal", + "_key": "33377bbb7430", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "On GitHub: [nextflow-io/blast-example/nextflow.config](https://github.com/nextflow-io/blast-example/blob/master/nextflow.config)\n\n```groovy\nmanifest {\n nextflowVersion = '>= 20.01.0'\n}\n\nprocess {\n container = 'nextflow/examples'\n}\n```\nRun the _blast-example_ pipeline that resides on GitHub directly from WSL and specify Docker as the container runtime using the command below:\n\n```console\n$ nextflow run blast-example -with-docker\nN E X T F L O W ~ version 21.04.2\nLaunching `nextflow-io/blast-example` [sharp_raman] - revision: 25922a0ae6 [master]\nexecutor > local (2)\n[aa/a9f056] process > blast (1) [100%] 1 of 1 ✔\n[b3/c41401] process > extract (1) [100%] 1 of 1 ✔\nmatching sequences:\n>lcl|1ABO:B unnamed protein product\nMNDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNS\n>lcl|1ABO:A unnamed protein product\nMNDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNS\n>lcl|1YCS:B unnamed protein product\nPEITGQVSLPPGKRTNLRKTGSERIAHGMRVKFNPLPLALLLDSSLEGEFDLVQRIIYEVDDPSLPNDEGITALHNAVCA\nGHTEIVKFLVQFGVNVNAADSDGWTPLHCAASCNNVQVCKFLVESGAAVFAMTYSDMQTAADKCEEMEEGYTQCSQFLYG\nVQEKMGIMNKGVIYALWDYEPQNDDELPMKEGDCMTIIHREDEDEIEWWWARLNDKEGYVPRNLLGLYPRIKPRQRSLA\n>lcl|1IHD:C unnamed protein product\nLPNITILATGGTIAGGGDSATKSNYTVGKVGVENLVNAVPQLKDIANVKGEQVVNIGSQDMNDNVWLTLAKKINTDCDKT\n```\nNextflow executes the pipeline directly from the GitHub repository and automatically pulls the nextflow/examples container from Docker Hub if the image is unavailable locally. The pipeline then executes the two containerized workflow steps (blast and extract). 
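Related to the automatic image pull mentioned above: if you would rather not have the download happen on the first pipeline run, you can pull the container referenced in `nextflow.config` ahead of time and confirm it is cached locally:

```console
$ docker pull nextflow/examples
$ docker images nextflow/examples
```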
The pipeline then collects the sequences into a single file and prints the result file content when pipeline execution completes.", + "_key": "24431e9f4e19" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "8ccd59644c7a", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "8b0ff97170bb" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Configuring an XServer for the Nextflow Console", + "_key": "1916480ec6bb" + } + ], + "_type": "block", + "style": "h2", + "_key": "e7076c68da06", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Pipeline developers will probably want to use the Nextflow Console at some point. The Nextflow Console's REPL (read-eval-print loop) environment allows developers to quickly test parts of scripts or Nextflow code segments interactively.", + "_key": "21dd8f41ab11" + } + ], + "_type": "block", + "style": "normal", + "_key": "914fd17b2ef1" + }, + { + "children": [ + { + "_key": "b2054a9aaccd", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "68abb1ebf3ab", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "6d0655eb0504", + "markDefs": [ + { + "_key": "acb59e6de910", + "_type": "link", + "href": "https://medium.com/javarevisited/using-wsl-2-with-x-server-linux-on-windows-a372263533c3" + } + ], + "children": [ + { + "marks": [], + "text": "The Nextflow Console is launched from the Linux command line. However, the Groovy-based interface requires an X-Windows environment to run. You can set up X-Windows with WSL using the procedure below. A good article on this same topic is provided ", + "_key": "c8eee28eb408", + "_type": "span" + }, + { + "text": "here", + "_key": "de1ae2a9095f", + "_type": "span", + "marks": [ + "acb59e6de910" + ] + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "10529ef8f82d" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_key": "b0e89848cf8c", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "68afd60df4b4" + }, + { + "_key": "d228004a387d", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Download an X-Windows server for Windows. In this example, we use the _VcXsrv Windows X Server_ available from source forge at https://sourceforge.net/projects/vcxsrv/.Accept all the defaults when running the automated installer. The X-server will end up installed in `c:\\Program Files\\VcXsrv`.The automated installation of VcXsrv will create an _\"XLaunch\"_ shortcut on your desktop. It is a good idea to create your own shortcut with a customized command line so that you don't need to interact with the XLaunch interface every time you start the X-server.Right-click on the Windows desktop to create a new shortcut, give it a meaningful name, and insert the following for the shortcut target:\n\n```powershell\n\"C:\\Program Files\\VcXsrv\\vcxsrv.exe\" :0 -ac -terminate -lesspointer -multiwindow -clipboard -wgl -dpi auto\n```\nInspecting the new shortcut properties, it should look something like this:\n\n![X-Server (vcxsrc) Properties](/img/xserver.png)Double-click on the new shortcut desktop icon to test it. Unfortunately, the X-server runs in the background. 
When running the X-server in multiwindow mode (which we recommend), it is not obvious whether the X-server is running.One way to check that the X-server is running is to use the Microsoft Task Manager and look for the XcSrv process running in the background. You can also verify it is running by using the `netstat` command from with PowerShell on Windows to ensure that the X-server is up and listening on the appropriate ports. Using `netstat`, you should see output like the following:\n\n```powershell\nPS C:\\WINDOWS\\system32> **netstat -abno | findstr 6000**\n TCP 0.0.0.0:6000 0.0.0.0:0 LISTENING 35176\n TCP 127.0.0.1:6000 127.0.0.1:56516 ESTABLISHED 35176\n TCP 127.0.0.1:6000 127.0.0.1:56517 ESTABLISHED 35176\n TCP 127.0.0.1:6000 127.0.0.1:56518 ESTABLISHED 35176\n TCP 127.0.0.1:56516 127.0.0.1:6000 ESTABLISHED 35176\n TCP 127.0.0.1:56517 127.0.0.1:6000 ESTABLISHED 35176\n TCP 127.0.0.1:56518 127.0.0.1:6000 ESTABLISHED 35176\n TCP 172.28.192.1:6000 172.28.197.205:46290 TIME_WAIT 0\n TCP [::]:6000 [::]:0 LISTENING 35176\n```\nAt this point, the X-server is up and running and awaiting a connection from a client.Within Ubuntu in WSL, we need to set up the environment to communicate with the X-Windows server. The shell variable DISPLAY needs to be set pointing to the IP address of the X-server and the instance of the X-windows server.The shell script below will set the DISPLAY variable appropriately and export it to be available to X-Windows client applications launched from the shell. This scripting trick works because WSL sees the Windows host as the nameserver and this is the same IP address that is running the X-Server. You can echo the $DISPLAY variable after setting it to verify that it is set correctly.\n\n```console\n$ export DISPLAY=$(cat /etc/resolv.conf | grep nameserver | awk '{print $2}'):0.0\n$ echo $DISPLAY\n172.28.192.1:0.0\n```\nAdd this command to the end of your `.bashrc` file in the Linux home directory to avoid needing to set the DISPLAY variable every time you open a new window. This way, if the IP address of the desktop or laptop changes, the DISPLAY variable will be updated accordingly.\n\n```bash\ncd ~\nvi .bashrc\n```\n\n```bash\n# set the X-Windows display to connect to VcXsrv on Windows\nexport DISPLAY=$(cat /etc/resolv.conf | grep nameserver | awk '{print $2}'):0.0\n\".bashrc\" 120L, 3912C written\n```\nUse an X-windows client to make sure that the X- server is working. Since X-windows clients are not installed by default, download an xterm client as follows via the Linux shell:\n\n```bash\nsudo apt install xterm\n```\nAssuming that the X-server is up and running on Windows, and the Linux DISPLAY variable is set correctly, you're ready to test X-Windows.\n\nBefore testing X-Windows, do yourself a favor and temporarily disable the Windows Firewall. The Windows Firewall will very likely block ports around 6000, preventing client requests on WSL from connecting to the X-server. You can find this under Firewall & network protection on Windows. Clicking the \"Private Network\" or \"Public Network\" options will show you the status of the Windows Firewall and indicate whether it is on or off.\n\nDepending on your installation, you may be running a specific Firewall. 
In this example, we temporarily disable the McAfee LiveSafe Firewall as shown:\n\n![Ensure that the Firewall is not interfering](/img/firewall.png)With the Firewall disabled, you can attempt to launch the xterm client from the Linux shell:\n\n```bash\nxterm &\n```\nIf everything is working correctly, you should see the new xterm client appear under Windows. The xterm is executing on Ubuntu under WSL but displays alongside other Windows on the Windows desktop. This is what is meant by \"multiwindow\" mode.\n\n![Launch an xterm to verify functionality](/img/xterm.png)Now that you know X-Windows is working correctly turn the Firewall back on, and adjust the settings to allow traffic to and from the required port. Ideally, you want to open only the minimal set of ports and services required. In the case of the McAfee Firewall, getting X-Windows to work required changing access to incoming and outgoing ports to _\"Open ports to Work and Home networks\"_ for the `vcxsrv.exe` program only as shown:\n\n![Allowing access to XServer traffic](/img/xserver_setup.png)With the X-server running, the `DISPLAY` variable set, and the Windows Firewall configured correctly, we can now launch the Nextflow Console from the shell as shown:\n\n```bash\nnextflow console\n```\n\nThe command above opens the Nextflow REPL console under X-Windows.\n\n![Nextflow REPL Console under X-Windows](/img/repl_console.png)", + "_key": "d211c16a1779" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "f0275284b98d", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "9098733f1b8a" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Inside the Nextflow console, you can enter Groovy code and run it interactively, a helpful feature when developing and debugging Nextflow pipelines.", + "_key": "2b6ad911f5bc" + } + ], + "_type": "block", + "style": "normal", + "_key": "f4fef554096a" + }, + { + "markDefs": [], + "children": [ + { + "_key": "4143c1234c4a", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "b22fa24ff37d" + }, + { + "_key": "af52952d5c1a", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Installing Git", + "_key": "35870481806a" + } + ], + "_type": "block", + "style": "h1" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Collaborative source code management systems such as BitBucket, GitHub, and GitLab are used to develop and share Nextflow pipelines. To be productive with Nextflow, you will want to install Git.", + "_key": "eff9192462fc" + } + ], + "_type": "block", + "style": "normal", + "_key": "723c5dbea44e" + }, + { + "_key": "3c149e20f09d", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "f08378476176" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "marks": [], + "text": "As explained earlier, VS Code operates in different contexts. When running VS Code in the context of Windows, VS Code will look for a local copy of Git. When using VS Code to operate against the remote WSL environment, a separate installation of Git installed on Ubuntu will be used. 
(Note that Git is installed by default on Ubuntu 20.04)", + "_key": "ee9043276923", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "38ee4d1cd649", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "109f9c1c3d44", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "862393f175cc" + } + ] + }, + { + "children": [ + { + "text": "Developers will probably want to use Git both from within a Windows context and a Linux context, so we need to make sure that Git is present in both environments.", + "_key": "93c560ce7104", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "b8f8dfcb927b", + "markDefs": [] + }, + { + "style": "normal", + "_key": "24a9036ce325", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "0b1a6996a0b9" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "67a14d3cf30a", + "_type": "span", + "marks": [], + "text": "Step 1: Install Git on Windows (optional)" + } + ], + "_type": "block", + "style": "h3", + "_key": "51a9ae0990cd" + }, + { + "style": "normal", + "_key": "0255a4977720", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Download the install the 64-bit Windows version of Git from https://git-scm.com/downloads.Click on the Git installer from the Downloads directory, and click through the default installation options. During the install process, you will be asked to select the default editor to be used with Git. (VIM, Notepad++, etc.). Select Visual Studio Code (assuming that this is the IDE that you plan to use for Nextflow).\n\n![Installing Git on Windows](/img/git-install.png)The Git installer will prompt you for additional settings. If you are not sure, accept the defaults. When asked, adjust the `PATH` variable to use the recommended option, making the Git command line available from Git Bash, the Command Prompt, and PowerShell.After installation Git Bash, Git GUI, and GIT CMD will appear as new entries under the Start menu. If you are running Git from PowerShell, you will need to open a new Windows to force PowerShell to reset the path variable. By default, Git installs in C:\\Program Files\\Git.If you plan to use Git from the command line, GitHub provides a useful cheatsheet [here](https://training.github.com/downloads/github-git-cheat-sheet.pdf).After installing Git, from within VS Code (in the context of the local host), select the Source Control icon from the left pane of the VS Code interface as shown. 
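Whichever environment you use Git from, keep in mind that the Windows and WSL installations maintain separate configuration. A minimal first-time setup on the Ubuntu side looks like the following (the name and email are placeholders):

```console
$ git config --global user.name "Your Name"
$ git config --global user.email "you@example.com"
$ git config --global --list
```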
You can open local folders that contain a git repository or clone repositories from GitHub or your preferred source code management system.\n\n![Using Git within VS Code](/img/git-vscode.png)Documentation on using Git with Visual Studio Code is provided at https://code.visualstudio.com/docs/editor/versioncontrol", + "_key": "d78b87e46d50" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "0f32cde0fab3", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "144893a25ca4", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "h3", + "_key": "3d4ac2b1cdbd", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Step 2: Install Git on Linux", + "_key": "653e5d2c0889", + "_type": "span" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "Open a Remote VS Code Window on **\\*WSL: Ubuntu 20.04\\*** (By selecting the green icon on the lower-left corner of the VS code interface.)Git should already be installed in `/usr/bin`, but you can validate this from the Ubuntu shell:\n\n```console\n$ git --version\ngit version 2.25.1\n```\nTo get started using Git with VS Code Remote on WSL, select the _Source Control icon_ on the left panel of VS code. Assuming VS Code Remote detects that Git is installed on Linux, you should be able to _Clone a Repository_.Select \"Clone Repository,\" and when prompted, clone the GitHub repo for the Blast example that we used earlier - https://github.com/nextflow-io/blast-example. Clone this repo into your home directory on Linux. You should see _blast-example_ appear as a source code repository within VS code as shown:\n\n![Using Git within VS Code](/img/git-linux-1.png)Select the _Explorer_ panel in VS Code to see the cloned _blast-example_ repo. Now we can explore and modify the pipeline code using the IDE.\n\n![Using Git within VS Code](/img/git-linux-2.png)After making modifications to the pipeline, we can execute the _local copy_ of the pipeline either from the Linux shell or directly via the Terminal window in VS Code as shown:\n\n![Using Git within VS Code](/img/git-linux-3.png)With the Docker VS Code extension, users can select the Docker icon from the left code to view containers and images associated with the Nextflow pipeline.Git commands are available from within VS Code by selecting the _Source Control_ icon on the left panel and selecting the three dots (…) to the right of SOURCE CONTROL. Some operations such as pushing or committing code will require that VS Code be authenticated with your GitHub credentials.\n\n![Using Git within VS Code](/img/git-linux-4.png)", + "_key": "6b5acb1f85c7", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "4c10a381b45a", + "listItem": "bullet" + }, + { + "_key": "bac3a5917d29", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "12faa00f6b4c" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "138a8537f9af", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Summary", + "_key": "2cb8ba9ed7ef", + "_type": "span" + } + ], + "_type": "block", + "style": "h2" + }, + { + "style": "normal", + "_key": "ee21f6530a7a", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "With WSL2, Windows 10 is an excellent environment for developing and testing Nextflow pipelines. 
Users can take advantage of the power and convenience of a Linux command line environment while using Windows-based IDEs such as VS-Code with full support for containers.", + "_key": "b21885fd5027" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "46225a359393", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "1b86b9b55660", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Pipelines developed in the Windows environment can easily be extended to compute environments in the cloud.", + "_key": "50b769503196" + } + ], + "_type": "block", + "style": "normal", + "_key": "44f7288da3fd" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "6795e83080f0" + } + ], + "_type": "block", + "style": "normal", + "_key": "e36337855b8b", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "While installing Nextflow itself is straightforward, installing and testing necessary components such as WSL, Docker, an IDE, and Git can be a little tricky. Hopefully readers will find this guide helpful. ", + "_key": "79b5d195edde", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "429b6c49b3b4" + } + ], + "_createdAt": "2024-09-25T14:16:23Z", + "publishedAt": "2021-10-13T06:00:00.000Z", + "_type": "blogPost" + }, + { + "_createdAt": "2024-09-25T14:17:38Z", + "meta": { + "description": "We have talked about Google Cloud Batch before. Not only that, we were proud to announce Nextflow support to Google Cloud Batch right after it was publicly released, back in July 2022. How amazing is that? But we didn’t stop there! The Nextflow official documentation also provides a lot of useful information on how to use Google Cloud Batch as the compute environment for your Nextflow pipelines.", + "slug": { + "current": "nextflow-with-gbatch" + } + }, + "_rev": "hf9hwMPb7ybAE3bqEU5pVR", + "tags": [ + { + "_key": "eb00fc73a313", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference" + }, + { + "_ref": "7d9ffad4-385c-433d-a409-d0bfacc3c2fe", + "_type": "reference", + "_key": "1186b8914c2c" + } + ], + "author": { + "_ref": "mNsm4Vx1W1Wy6aYYkroetD", + "_type": "reference" + }, + "title": "Get started with Nextflow on Google Cloud Batch", + "publishedAt": "2023-02-01T07:00:00.000Z", + "_type": "blogPost", + "body": [ + { + "children": [ + { + "marks": [ + "46532c5914ac" + ], + "text": "We have talked about Google Cloud Batch before", + "_key": "b4a1cc918280", + "_type": "span" + }, + { + "_key": "20b9df914b63", + "_type": "span", + "marks": [], + "text": ". Not only that, we were proud to announce Nextflow support to Google Cloud Batch right after it was publicly released, back in July 2022. How amazing is that? But we didn't stop there! The " + }, + { + "marks": [ + "0d07d5df5d3f" + ], + "text": "Nextflow official documentation", + "_key": "f99f39ade5ae", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " also provides a lot of useful information on how to use Google Cloud Batch as the compute environment for your Nextflow pipelines. Having said that, feedback from the community is valuable, and we agreed that in addition to the documentation, teaching by example, and in a more informal language, can help many of our users. 
So, here is a tutorial on how to use the Batch service of the Google Cloud Platform with Nextflow 🥳", + "_key": "c591fd0e447a" + } + ], + "_type": "block", + "style": "normal", + "_key": "4649b669d1dd", + "markDefs": [ + { + "href": "https://www.nextflow.io/blog/2022/deploy-nextflow-pipelines-with-google-cloud-batch.html", + "_key": "46532c5914ac", + "_type": "link" + }, + { + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/google.html", + "_key": "0d07d5df5d3f" + } + ] + }, + { + "style": "normal", + "_key": "8eb442784072", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "31dc0ef53452", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "090be1a79409", + "_type": "span", + "marks": [], + "text": "Running an RNAseq pipeline with Google Cloud Batch" + } + ], + "_type": "block", + "style": "h2", + "_key": "14fae6b49728" + }, + { + "_type": "block", + "style": "normal", + "_key": "bea5369d85c8", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nf-core/rnaseq", + "_key": "427e6a38fc12" + }, + { + "href": "https://github.com/nextflow-io/rnaseq-nf", + "_key": "5ff301bdcbdc", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Welcome to our RNAseq tutorial using Nextflow and Google Cloud Batch! RNAseq is a powerful technique for studying gene expression and is widely used in a variety of fields, including genomics, transcriptomics, and epigenomics. In this tutorial, we will show you how to use Nextflow, a popular workflow management tool, to run a proof-of-concept RNAseq pipeline to perform the analysis on Google Cloud Batch, a scalable cloud-based computing platform. For a real Nextflow RNAseq pipeline, check ", + "_key": "44dd00decee7" + }, + { + "marks": [ + "427e6a38fc12" + ], + "text": "nf-core/rnaseq", + "_key": "65f2b5b882ae", + "_type": "span" + }, + { + "_key": "a7b9f4ebe96b", + "_type": "span", + "marks": [], + "text": ". For the proof-of-concept RNAseq pipeline that we will use here, check " + }, + { + "marks": [ + "5ff301bdcbdc" + ], + "text": "nextflow-io/rnaseq-nf", + "_key": "52d6ae27ef64", + "_type": "span" + }, + { + "_key": "17bc9da0f2f7", + "_type": "span", + "marks": [], + "text": "." + } + ] + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "3f41aeb8d77b", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "41fc79f02b25" + }, + { + "_key": "fb5d83ab259f", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow allows you to easily develop, execute, and scale complex pipelines on any infrastructure, including the cloud. Google Cloud Batch enables you to run batch workloads on Google Cloud Platform (GCP), with the ability to scale up or down as needed. Together, Nextflow and Google Cloud Batch provide a powerful and flexible solution for RNAseq analysis.", + "_key": "ae719a82b164" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "fa66649596e9", + "markDefs": [], + "children": [ + { + "_key": "af2c64c1704c", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "style": "normal", + "_key": "2a80ab269d30", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "We will walk you through the entire process, from setting up your Google Cloud account and installing Nextflow to running an RNAseq pipeline and interpreting the results. 
By the end of this tutorial, you will have a solid understanding of how to use Nextflow and Google Cloud Batch for RNAseq analysis. So let's get started!", + "_key": "e550a19a3312", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "bb5a1cd49cf3", + "markDefs": [], + "children": [ + { + "_key": "e96fcb32b940", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "_type": "block", + "style": "h2", + "_key": "69bc270b7a93", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Setting up Google Cloud CLI (gcloud)", + "_key": "9dba215d7da8" + } + ] + }, + { + "style": "normal", + "_key": "205f28088e9c", + "markDefs": [ + { + "_type": "link", + "href": "https://cloud.google.com/sdk/docs/install", + "_key": "3c67e3a142ba" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In this tutorial, you will learn how to use the gcloud command-line interface to interact with the Google Cloud Platform and set up your Google Cloud account for use with Nextflow. If you do not already have gcloud installed, you can follow the instructions ", + "_key": "17ec0a58b37a" + }, + { + "_key": "186e87235cb9", + "_type": "span", + "marks": [ + "3c67e3a142ba" + ], + "text": "here" + }, + { + "marks": [], + "text": " to install it. Once you have gcloud installed, run the command ", + "_key": "3038220f1b5f", + "_type": "span" + }, + { + "marks": [ + "code" + ], + "text": "gcloud init", + "_key": "15fa3021f802", + "_type": "span" + }, + { + "marks": [], + "text": " to initialize the CLI. You will be prompted to choose an existing project to work on or create a new one. For the purpose of this tutorial, we will create a new project. Name your project "my-rnaseq-pipeline". There may be a lot of information displayed on the screen after running this command, but you can ignore it for now.", + "_key": "8429998e9683", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "\nSetting up Batch and Storage in Google Cloud Platform\n", + "_key": "eedb31169aaa", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "c37b701f727c" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Enable Google Batch", + "_key": "f14713d6d4af" + } + ], + "_type": "block", + "style": "h3", + "_key": "ab72c428fa0e" + }, + { + "style": "normal", + "_key": "75af9d0f088e", + "markDefs": [ + { + "_type": "link", + "href": "https://cloud.google.com/batch/docs/get-started", + "_key": "28252753c849" + } + ], + "children": [ + { + "text": "According to the ", + "_key": "f7a60d7ecd4e", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "28252753c849" + ], + "text": "official Google documentation", + "_key": "2d6145bb7907" + }, + { + "_key": "1b0903328014", + "_type": "span", + "marks": [], + "text": " " + }, + { + "text": "Batch is a fully managed service that lets you schedule, queue, and execute [batch processing](https://en.wikipedia.org/wiki/Batch_processing) workloads on Compute Engine virtual machine (VM) instances. 
Batch provisions resources and manages capacity on your behalf, allowing your batch workloads to run at scale", + "_key": "3f97db81f636", + "_type": "span", + "marks": [ + "em" + ] + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "2d1bd7a00989" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "5040271b7464", + "markDefs": [], + "children": [ + { + "_key": "69ea0742f94d", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "661825f15f6c", + "markDefs": [], + "children": [ + { + "_key": "1c1aa05a4f6b", + "_type": "span", + "marks": [], + "text": "The first step is to download the " + }, + { + "_key": "25e679175bd4", + "_type": "span", + "marks": [ + "code" + ], + "text": "beta" + }, + { + "_key": "c75f8c836136", + "_type": "span", + "marks": [], + "text": " command group. You can do this by executing:" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_key": "90cc0b1ab3f9", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "7e214178bebe" + }, + { + "_key": "835a4ca38cf9", + "code": "$ gcloud components install beta", + "_type": "code" + }, + { + "_type": "block", + "style": "normal", + "_key": "c2ad92d563c1", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "7a158845ce56" + } + ] + }, + { + "children": [ + { + "_key": "a1ee6ca9d18e", + "_type": "span", + "marks": [], + "text": "Then, enable billing for this project. You will first need to get your account id with" + } + ], + "_type": "block", + "style": "normal", + "_key": "70e8032679f5", + "markDefs": [] + }, + { + "_key": "7d0ce4c2119c", + "markDefs": [], + "children": [ + { + "_key": "7e72b15ee683", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "code": "$ gcloud beta billing accounts list", + "_type": "code", + "_key": "50ff076983b0" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "b737505423a4", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "4195dfe1ea37" + }, + { + "_key": "524fa7a1eb2d", + "markDefs": [], + "children": [ + { + "_key": "5ab19436a930", + "_type": "span", + "marks": [], + "text": "After that, you will see something like the following appear in your window:" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "a8758fd7f668", + "markDefs": [], + "children": [ + { + "_key": "0c25657b093a", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "code", + "_key": "b07972ec3773", + "code": "ACCOUNT_ID NAME OPEN MASTER_ACCOUNT_ID\nXXXXX-YYYYYY-ZZZZZZ My Billing Account True" + }, + { + "style": "normal", + "_key": "83185677499c", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "b9e4cccf8227" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "text": "If you get the error “Service Usage API has not been used in project 842841895214 before or it is disabled”, simply run the command again and it should work. Then copy the account id, and the project id and paste them into the command below. 
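For illustration only — assuming your project id also happens to be my-rnaseq-pipeline (a project's id can differ from its display name) and using the placeholder account id printed above — the billing link command of the next step, filled in, would look roughly like this:

```
$ gcloud beta billing projects link my-rnaseq-pipeline --billing-account XXXXXX-YYYYYY-ZZZZZZ
```

Substitute your own project id and billing account id when you run the generic form below.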
This will enable billing for your project id.", + "_key": "330f7d74a793", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "7eb027cfdc57" + }, + { + "_key": "07fb6666dd89", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "c145c7f7579f" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "ec8ac3156e35", + "code": "$ gcloud beta billing projects link PROJECT-ID --billing-account XXXXXX-YYYYYY-ZZZZZZ", + "_type": "code" + }, + { + "_key": "9832bacce6ac", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "02546a432cab", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "b23c4c13ab0c", + "markDefs": [], + "children": [ + { + "text": "Next, you must enable the Batch API, along with the Compute Engine and Cloud Logging APIs. You can do so with the following command:", + "_key": "e450e0eb53d9", + "_type": "span", + "marks": [] + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "55db5d6abe0d" + } + ], + "_type": "block", + "style": "normal", + "_key": "5860dca2e6e1", + "markDefs": [] + }, + { + "_key": "a76f1708fd2c", + "code": "$ gcloud services enable batch.googleapis.com compute.googleapis.com logging.googleapis.com", + "_type": "code" + }, + { + "style": "normal", + "_key": "bbc58295ae43", + "markDefs": [], + "children": [ + { + "_key": "85ce60ce4c08", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "aa40d9186315", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "You should see a message similar to the one below:", + "_key": "ecfe9ee6ed90", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "478838fb4aa1", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "96623ecbd2fb" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "13b2c8def5d7", + "code": "Operation \"operations/acf.p2-AAAA-BBBBB-CCCC--DDDD\" finished successfully.", + "_type": "code" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "\n", + "_key": "4cf8182570f7", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "447269e67a29" + }, + { + "_key": "5fc786597ec0", + "markDefs": [], + "children": [ + { + "_key": "2e57cbc2616f", + "_type": "span", + "marks": [], + "text": "Create a Service Account" + } + ], + "_type": "block", + "style": "h3" + }, + { + "markDefs": [ + { + "_key": "847a3cb2f2b1", + "_type": "link", + "href": "https://cloud.google.com/iam/docs/creating-managing-service-accounts#iam-service-accounts-create-gcloud" + } + ], + "children": [ + { + "text": "In order to access the APIs we enabled, you need to ", + "_key": "bdbc9cd54151", + "_type": "span", + "marks": [] + }, + { + "_key": "f8754daffbe4", + "_type": "span", + "marks": [ + "847a3cb2f2b1" + ], + "text": "create a Service Account" + }, + { + "_type": "span", + "marks": [], + "text": " and set the necessary IAM roles for the project. 
You can create the Service Account by executing:", + "_key": "292a6706da91" + } + ], + "_type": "block", + "style": "normal", + "_key": "1ba584cd052a" + }, + { + "style": "normal", + "_key": "b44b9fbd646d", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "d0de6d313002" + } + ], + "_type": "block" + }, + { + "_type": "code", + "_key": "ad4bdca3e7fc", + "code": "$ gcloud iam service-accounts create rnaseq-pipeline-sa" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "ee17255d7084" + } + ], + "_type": "block", + "style": "normal", + "_key": "d0aa14575fff" + }, + { + "_key": "f1b2318907a6", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "After this, set appropriate roles for the project using the commands below:", + "_key": "e04b70cf3db5", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "a6810ad3fc2e" + } + ], + "_type": "block", + "style": "normal", + "_key": "12c91813881c", + "markDefs": [] + }, + { + "code": "$ gcloud projects add-iam-policy-binding my-rnaseq-pipeline \\\n--member=\"serviceAccount:rnaseq-pipeline-sa@my-rnaseq-pipeline.iam.gserviceaccount.com\" \\\n--role=\"roles/iam.serviceAccountUser\"\n\n$ gcloud projects add-iam-policy-binding my-rnaseq-pipeline \\\n--member=\"serviceAccount:rnaseq-pipeline-sa@my-rnaseq-pipeline.iam.gserviceaccount.com\" \\\n--role=\"roles/batch.jobsEditor\"\n\n$ gcloud projects add-iam-policy-binding my-rnaseq-pipeline \\\n--member=\"serviceAccount:rnaseq-pipeline-sa@my-rnaseq-pipeline.iam.gserviceaccount.com\" \\\n--role=\"roles/logging.viewer\"\n\n$ gcloud projects add-iam-policy-binding my-rnaseq-pipeline \\\n--member=\"serviceAccount:rnaseq-pipeline-sa@my-rnaseq-pipeline.iam.gserviceaccount.com\" \\\n--role=\"roles/storage.admin\"", + "_type": "code", + "_key": "7d13f848f2f1" + }, + { + "style": "normal", + "_key": "ec64464c2904", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "\n", + "_key": "e875150619c1" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "Create your Bucket", + "_key": "e3d3b9d98e3b", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "h3", + "_key": "f4aa41527974", + "markDefs": [] + }, + { + "_key": "c5f2664254e4", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Now it's time to create your Storage bucket, where both your input, intermediate and output files will be hosted and accessed by the Google Batch virtual machines. Your bucket name must be globally unique (across regions). For the example below, the bucket is named rnaseq-pipeline-nextflow-bucket. 
However, as this name has now been used you have to create a bucket with a different name", + "_key": "aea32544ddbb" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_key": "764d4980c45a", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "cd8cfca1df31" + }, + { + "_type": "code", + "_key": "3047bd50bec2", + "code": "$ gcloud storage buckets create gs://rnaseq-pipeline-bckt" + }, + { + "children": [ + { + "text": "", + "_key": "7b9a1734f864", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "23f682b84f50", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "text": "Now it's time for Nextflow to join the party! 🥳", + "_key": "60aadeda5d84", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "72168f669946" + }, + { + "style": "normal", + "_key": "81ae14ff217c", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "e48066d029e4", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "children": [ + { + "_key": "52387da857ae", + "_type": "span", + "marks": [], + "text": "Setting up Nextflow to make use of Batch and Storage" + } + ], + "_type": "block", + "style": "h2", + "_key": "ef10a5c1bd12", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "\nWrite the configuration file", + "_key": "a6b6f2c1fcb4" + } + ], + "_type": "block", + "style": "h3", + "_key": "b8c5b54c4fcb" + }, + { + "markDefs": [], + "children": [ + { + "text": "Here you will set up a simple RNAseq pipeline with Nextflow to be run entirely on Google Cloud Platform (GCP) directly from your local machine.", + "_key": "77b38786aaf1", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "a64ea1bb143f" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "8a54e6cb5f2c", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "3fcff2378bc4" + }, + { + "_type": "block", + "style": "normal", + "_key": "3b770c6f35c3", + "markDefs": [], + "children": [ + { + "_key": "0864d5a89f16", + "_type": "span", + "marks": [], + "text": "Start by creating a folder for your project on your local machine, such as “rnaseq-example”. It's important to mention that you can also go fully cloud and use a Virtual Machine for everything we will do here locally." 
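For example, a minimal way to create and enter that folder from a terminal (the folder name is just the suggestion above; any name will do):

```
$ mkdir rnaseq-example
$ cd rnaseq-example
```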
+ } + ] + }, + { + "_key": "d1c559e688d7", + "markDefs": [], + "children": [ + { + "_key": "c5925ece0d84", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "49396905ad09", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Inside the folder that you created for the project, create a file named ", + "_key": "907c4f5ad88a" + }, + { + "_key": "def2c020fae8", + "_type": "span", + "marks": [ + "code" + ], + "text": "nextflow.config" + }, + { + "_type": "span", + "marks": [], + "text": " with the following content (remember to replace PROJECT-ID with the project id you created above):", + "_key": "cc3b7bce788b" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "a02ccabc2d79" + } + ], + "_type": "block", + "style": "normal", + "_key": "df579ad7106f", + "markDefs": [] + }, + { + "_type": "code", + "_key": "bfa5cca99c69", + "code": "workDir = 'gs://rnaseq-pipeline-bckt/scratch'\n\nprocess {\n executor = 'google-batch'\n container = 'nextflow/rnaseq-nf'\n errorStrategy = { task.exitStatus==14 ? 'retry' : 'terminate' }\n maxRetries = 5\n}\n\ngoogle {\n project = 'PROJECT-ID'\n location = 'us-central1'\n batch.spot = true\n}" + }, + { + "style": "normal", + "_key": "98d65726a516", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "2c8ece844ecc", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_key": "77ffe96a523b", + "markDefs": [], + "children": [ + { + "text": "The ", + "_key": "0654936d159f", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "workDir", + "_key": "2aa97fb5721a" + }, + { + "_key": "e4e06aff9f3b", + "_type": "span", + "marks": [], + "text": " option tells Nextflow to use the bucket you created as the work directory. Nextflow will use this directory to stage our input data and store intermediate and final data. Nextflow does not allow you to use the root directory of a bucket as the work directory -- it must be a subdirectory instead. Using a subdirectory is also just a good practice." + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "602c084fe2df", + "markDefs": [], + "children": [ + { + "_key": "e2748099361c", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "700ef61290f0", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The ", + "_key": "43f0459bb9cd" + }, + { + "_key": "37ccb6e69e32", + "_type": "span", + "marks": [ + "code" + ], + "text": "process" + }, + { + "text": " scope tells Nextflow to run all the processes (steps) of your pipeline on Google Batch and to use the ", + "_key": "612c5f17b82e", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "code" + ], + "text": "nextflow/rnaseq-nf", + "_key": "c461db2400ad", + "_type": "span" + }, + { + "_key": "f85d9cfa0759", + "_type": "span", + "marks": [], + "text": " Docker image hosted on DockerHub (default) for all processes. Also, the error strategy will automatically retry any failed tasks with exit code 14, which is the exit code for spot instances that were reclaimed." 
+ } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "0d276d928470", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "7df72c094a49" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "272b9ef92bc9", + "markDefs": [ + { + "href": "https://www.nextflow.io/docs/latest/google.html#spot-instances", + "_key": "bb2936cf5ab8", + "_type": "link" + } + ], + "children": [ + { + "marks": [], + "text": "The ", + "_key": "08fce59dc3a2", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "google", + "_key": "0b85966cada5" + }, + { + "_type": "span", + "marks": [], + "text": " scope is specific to Google Cloud. You need to provide the project id (don't provide the project name, it won't work!), and a Google Cloud location (leave it as above if you're not sure of what to put). In the example above, spot instances are also requested (more info about spot instances ", + "_key": "f80063b91e65" + }, + { + "text": "here", + "_key": "961d20fa85d9", + "_type": "span", + "marks": [ + "bb2936cf5ab8" + ] + }, + { + "marks": [], + "text": "), which are cheaper instances that, as a drawback, can be reclaimed at any time if resources are needed by the cloud provider. Based on what we have seen so far, the ", + "_key": "2c91a5815680", + "_type": "span" + }, + { + "marks": [ + "code" + ], + "text": "nextflow.config", + "_key": "5e1e957ae643", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " file should contain "rnaseq-nxf" as the project id.", + "_key": "c6366d563a98" + } + ], + "_type": "block" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "27729dc1e5a0" + } + ], + "_type": "block", + "style": "normal", + "_key": "890a8ebfebc8", + "markDefs": [] + }, + { + "_key": "4a448756320a", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Use the command below to authenticate with Google Cloud Platform. Nextflow will use this account by default when you run a pipeline.", + "_key": "4bdada2e36c3" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "text": "", + "_key": "0fbcd665df67", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "8727de833ac3" + }, + { + "code": "$ gcloud auth application-default login", + "_type": "code", + "_key": "f488d364d980" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "\n", + "_key": "f0e00801c9f4" + } + ], + "_type": "block", + "style": "normal", + "_key": "0aeb881f24b7", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Launch the pipeline!", + "_key": "cf451d36d8ca" + } + ], + "_type": "block", + "style": "h3", + "_key": "8ccbb8d91e90" + }, + { + "_type": "block", + "style": "normal", + "_key": "c8f94520bac9", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/nextflow-io/rnaseq-nf", + "_key": "bf3ec4dfea75" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "With that done, you’re now ready to run the proof-of-concept RNAseq Nextflow pipeline. 
Instead of asking you to download it, or copy-paste something into a script file, you can simply provide the GitHub URL of the RNAseq pipeline mentioned at the beginning of ", + "_key": "0aceea84d950" + }, + { + "text": "this tutorial", + "_key": "820002441a2e", + "_type": "span", + "marks": [ + "bf3ec4dfea75" + ] + }, + { + "marks": [], + "text": ", and Nextflow will do all the heavy lifting for you. This pipeline comes with test data bundled with it, and for more information about it and how it was developed, you can check the public training material developed by Seqera Labs at <https: training.nextflow.io="">.", + "_key": "52f2161a08d9", + "_type": "span" + } + ] + }, + { + "style": "normal", + "_key": "53d76002b446", + "markDefs": [], + "children": [ + { + "_key": "b448f79a4dcd", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "children": [ + { + "text": "One important thing to mention is that in this repository there is already a ", + "_key": "8a51df595c74", + "_type": "span", + "marks": [] + }, + { + "text": "nextflow.config", + "_key": "6ad035af60a9", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "text": " file with different configuration, but don't worry about that. You can run the pipeline with the configuration file that we have wrote above using the ", + "_key": "a610e7f77a8e", + "_type": "span", + "marks": [] + }, + { + "_key": "7290d7c3e7ef", + "_type": "span", + "marks": [ + "code" + ], + "text": "-c" + }, + { + "marks": [], + "text": " Nextflow parameter. Run the command line below:", + "_key": "35f5b94c5c5a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "be4636722fc8", + "markDefs": [] + }, + { + "style": "normal", + "_key": "e90a099c9c63", + "markDefs": [], + "children": [ + { + "_key": "f841ab6b5771", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "code": "$ nextflow run nextflow-io/rnaseq-nf -c nextflow.config", + "_type": "code", + "_key": "7d482473d32b" + }, + { + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "05417cd2d683", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "7a6a5b37fd9d" + }, + { + "_type": "block", + "style": "normal", + "_key": "fdf87214a7f7", + "markDefs": [ + { + "_key": "ecffbe50d053", + "_type": "link", + "href": "https://github.com/nextflow-io/rnaseq-nf/blob/ed179ef74df8d5c14c188e200a37fff61fd55dfb/modules/multiqc/main.nf#L5" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "While the pipeline stores everything in the bucket, our example pipeline will also download the final outputs to a local directory called ", + "_key": "b5e70e86e7d5" + }, + { + "text": "results", + "_key": "57cdfeb51cd3", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": ", because of how the ", + "_key": "1bb98dfa2883" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "publishDir", + "_key": "7f24faa22f2f" + }, + { + "_type": "span", + "marks": [], + "text": " directive was specified in the ", + "_key": "ec9fc60f4f41" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "main.nf", + "_key": "d94baec681fc" + }, + { + "_type": "span", + "marks": [], + "text": " script (example ", + "_key": "2cf5b0503440" + }, + { + "text": "here", + "_key": "3c0ca9c35f09", + "_type": "span", + "marks": [ + "ecffbe50d053" + ] + }, + { + "text": "). 
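As a rough sketch of what such a directive looks like (this is illustrative, not the exact code of the pipeline; the process name, inputs, and output file are placeholders), a process publishing its results might be written as:

```
process MULTIQC {
    // Copy the final report out of the task work directory;
    // params.outdir can be a local path or a gs:// URL
    publishDir params.outdir, mode: 'copy'

    input:
    path '*'

    output:
    path 'multiqc_report.html'

    script:
    """
    multiqc .
    """
}
```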
If you want to avoid the egress cost associated with downloading data from a bucket, you can change the ", + "_key": "e3eb83dafbc7", + "_type": "span", + "marks": [] + }, + { + "text": "publishDir", + "_key": "e2663d779e5f", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " to another bucket directory, e.g. ", + "_key": "cda80dbda0dd" + }, + { + "marks": [ + "code" + ], + "text": "gs://rnaseq-pipeline-bckt/results", + "_key": "8515bf18a5d9", + "_type": "span" + }, + { + "text": ".", + "_key": "f1a5c5569291", + "_type": "span", + "marks": [] + } + ] + }, + { + "style": "normal", + "_key": "fe166b1cfebe", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "7bae53def502", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In your terminal, you should see something like this:", + "_key": "a5d345b3c49b" + } + ], + "_type": "block", + "style": "normal", + "_key": "31dd792d1971" + }, + { + "_type": "block", + "style": "normal", + "_key": "2988b5de445d", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "e561e553fdfa", + "_type": "span", + "marks": [] + } + ] + }, + { + "alt": "Nextflow ongoing run on Google Cloud Batch", + "_key": "57b6920fd4d5", + "asset": { + "_ref": "image-f828eff746c5383b57ed1a8943f8bfd64f224475-1714x656-png", + "_type": "reference" + }, + "_type": "image" + }, + { + "_key": "b754a6329358", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "6f0a9dfe3b2b", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "You can check the status of your jobs on Google Batch by opening another terminal and running the following command:", + "_key": "d2bad5bbba43" + } + ], + "_type": "block", + "style": "normal", + "_key": "4e0d6e461a2e", + "markDefs": [] + }, + { + "_key": "e71d191cf0a4", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "06179d9ea5e4", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "code": "$ gcloud batch jobs list", + "_type": "code", + "_key": "042a2d28ce93" + }, + { + "_type": "block", + "style": "normal", + "_key": "e2fd1c323765", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "630d9653ec5d" + } + ] + }, + { + "style": "normal", + "_key": "b4f5c23a00dc", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "By the end of it, if everything worked well, you should see something like:", + "_key": "256c3085089f" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "b34980829a20", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "39a762d7636b" + }, + { + "_key": "47d737b659d2", + "asset": { + "_ref": "image-79042f36d93b14d4f6efb0eafa75d2043f6b797e-1728x866-png", + "_type": "reference" + }, + "_type": "image", + "alt": "Nextflow run on Google Cloud Batch finished" + }, + { + "style": "normal", + "_key": "f445bee37ae3", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "bbf66adba559", + "_type": "span", + "marks": [] + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "text": "And that's all, folks! 
😆", + "_key": "7a4c0728b276", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "6a911548ebb7" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "d5cded0acc2a" + } + ], + "_type": "block", + "style": "normal", + "_key": "fa6a8b7a8ab9", + "markDefs": [] + }, + { + "children": [ + { + "text": "You will find more information about Nextflow on Google Batch in ", + "_key": "54d4d3a3b8dc", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "e04f54fcfedf" + ], + "text": "this blog post", + "_key": "982ebc5705a1" + }, + { + "_key": "3faefd1b3a4d", + "_type": "span", + "marks": [], + "text": " and the " + }, + { + "_type": "span", + "marks": [ + "79f110285473" + ], + "text": "official Nextflow documentation", + "_key": "2fb54fa8c31e" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "587770709603" + } + ], + "_type": "block", + "style": "normal", + "_key": "0d663c3aa42a", + "markDefs": [ + { + "_type": "link", + "href": "https://www.nextflow.io/blog/2022/deploy-nextflow-pipelines-with-google-cloud-batch.html", + "_key": "e04f54fcfedf" + }, + { + "_key": "79f110285473", + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/google.html" + } + ] + }, + { + "_key": "0cfd24fe661b", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "340bdcc16d70", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "0c7956bb62ab", + "markDefs": [], + "children": [ + { + "_key": "7861ad82e5c5", + "_type": "span", + "marks": [], + "text": "Special thanks to Hatem Nawar, Chris Hakkaart, and Ben Sherman for providing valuable feedback to this document." + } + ], + "_type": "block", + "style": "normal" + } + ], + "_updatedAt": "2024-09-30T08:50:18Z", + "_id": "aedb1fe824f3" + }, + { + "publishedAt": "2016-09-01T06:00:00.000Z", + "_id": "aeef97ac1a63", + "author": { + "_ref": "paolo-di-tommaso", + "_type": "reference" + }, + "_rev": "Ot9x7kyGeH5005E3MJ8jXd", + "title": "Deploy your computational pipelines in the cloud at the snap-of-a-finger", + "tags": [ + { + "_ref": "9161ec05-53f8-455a-a931-7b41f6ec5172", + "_type": "reference", + "_key": "81e954e89fdf" + }, + { + "_type": "reference", + "_key": "ffc19a407407", + "_ref": "7d9ffad4-385c-433d-a409-d0bfacc3c2fe" + }, + { + "_key": "05c75abe91c4", + "_ref": "ace8dd2c-eed3-4785-8911-d146a4e84bbb", + "_type": "reference" + }, + { + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference", + "_key": "beacc0d048ff" + } + ], + "body": [ + { + "_key": "630793c836af", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "em" + ], + "text": "Learn how to deploy and run a computational pipeline in the Amazon AWS cloud with ease thanks to Nextflow and Docker containers", + "_key": "ca060ca49785" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "39d41a18734a", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "7d9f2d41d405" + } + ] + }, + { + "children": [ + { + "_key": "2c0f4ff0fae8", + "_type": "span", + "marks": [], + "text": "Nextflow is a framework that simplifies the writing of parallel and distributed computational pipelines in a portable and reproducible manner across different computing platforms, from a laptop to a cluster of computers." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "684b15f5b645", + "markDefs": [] + }, + { + "_key": "acf4e6979b97", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "2f3daf3616bd", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "Indeed, the original idea, when this project started three years ago, was to implement a tool that would allow researchers in ", + "_key": "cf9bc9950bde", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "a1da9a3dd25b" + ], + "text": "our lab", + "_key": "bf0016703f19" + }, + { + "text": " to smoothly migrate their data analysis applications in the cloud when needed - without having to change or adapt their code.", + "_key": "9aae6272f542", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "48b1c13a5713", + "markDefs": [ + { + "_type": "link", + "href": "http://www.crg.eu/es/programmes-groups/comparative-bioinformatics", + "_key": "a1da9a3dd25b" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "265a6672a1ed", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "cec0dcfeaa53" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "12cd963ce635", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "However to date Nextflow has been used mostly to deploy computational workflows within on-premise computing clusters or HPC data-centers, because these infrastructures are easier to use and provide, on average, cheaper cost and better performance when compared to a cloud environment.", + "_key": "2f7f41e0cf3b" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "0587c018a9be", + "markDefs": [], + "children": [ + { + "_key": "1454a535519f", + "_type": "span", + "marks": [], + "text": "" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_key": "3d1f7076c19b", + "_type": "span", + "marks": [], + "text": "A major obstacle to efficient deployment of scientific workflows in the cloud is the lack of a performant POSIX compatible shared file system. These kinds of applications are usually made-up by putting together a collection of tools, scripts and system commands that need a reliable file system to share with each other the input and output files as they are produced, above all in a distributed cluster of computers." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "80e8764b9658" + }, + { + "_type": "block", + "style": "normal", + "_key": "9386f2fad05c", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "956382a0f767", + "_type": "span" + } + ] + }, + { + "_key": "fb70f47974bc", + "markDefs": [ + { + "href": "https://aws.amazon.com/efs/", + "_key": "b33502dca5f2", + "_type": "link" + } + ], + "children": [ + { + "_key": "04565282ca81", + "_type": "span", + "marks": [], + "text": "The recent availability of the " + }, + { + "marks": [ + "b33502dca5f2" + ], + "text": "Amazon Elastic File System", + "_key": "a55b4ac66a3a", + "_type": "span" + }, + { + "marks": [], + "text": " (EFS), a fully featured NFS based file system hosted on the AWS infrastructure represents a major step in this context, unlocking the deployment of scientific computing in the cloud and taking it to the next level.", + "_key": "e1af081c1ada", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_key": "bc234372001c", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "cf2132938604" + }, + { + "_key": "28522dfe5238", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow support for the cloud", + "_key": "e5c3ecf4aa21" + } + ], + "_type": "block", + "style": "h3" + }, + { + "_key": "b07679c92183", + "markDefs": [ + { + "href": "https://github.com/gc3-uzh-ch/elasticluster", + "_key": "1817716e129a", + "_type": "link" + }, + { + "_key": "245fa2f66569", + "_type": "link", + "href": "https://aws.amazon.com/hpc/cfncluster/" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Nextflow could already be deployed in the cloud, either using tools such as ", + "_key": "766506450f43" + }, + { + "_key": "74bb174ca48f", + "_type": "span", + "marks": [ + "1817716e129a" + ], + "text": "ElastiCluster" + }, + { + "_key": "e94f0c168854", + "_type": "span", + "marks": [], + "text": " or " + }, + { + "_key": "83f380d4cd0e", + "_type": "span", + "marks": [ + "245fa2f66569" + ], + "text": "CfnCluster" + }, + { + "_key": "409cd6646840", + "_type": "span", + "marks": [], + "text": ", or by using custom deployment scripts. However the procedure was still cumbersome and, above all, it was not optimised to fully take advantage of cloud elasticity i.e. the ability to (re)shape the computing cluster dynamically as the computing needs change over time." 
+ } + ], + "_type": "block", + "style": "normal" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "549e376a2b86" + } + ], + "_type": "block", + "style": "normal", + "_key": "1d33b6723ee5" + }, + { + "markDefs": [ + { + "href": "https://ignite.apache.org/", + "_key": "9cecfffe15c6", + "_type": "link" + } + ], + "children": [ + { + "_key": "ce0a26eb76bd", + "_type": "span", + "marks": [], + "text": "For these reasons, we decided it was time to provide Nextflow with a first-class support for the cloud, integrating the Amazon EFS and implementing an optimised native cloud scheduler, based on " + }, + { + "_key": "cc365c077581", + "_type": "span", + "marks": [ + "9cecfffe15c6" + ], + "text": "Apache Ignite" + }, + { + "text": ", with a full support for cluster auto-scaling and spot/preemptible instances.", + "_key": "193fc7402a87", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "798eaf8b860c" + }, + { + "style": "normal", + "_key": "08b5176ed6bb", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "", + "_key": "36d9f2abc4a9" + } + ], + "_type": "block" + }, + { + "style": "normal", + "_key": "59f5d299aa98", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "In practice this means that Nextflow can now spin-up and configure a fully featured computing cluster in the cloud with a single command, after that you need only to login to the master node and launch the pipeline execution as you would do in your on-premise cluster.", + "_key": "f8d89735a299", + "_type": "span" + } + ], + "_type": "block" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "49a576232da9", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "e0c2a05daf9f", + "markDefs": [] + }, + { + "children": [ + { + "marks": [], + "text": "Demo!", + "_key": "6f91aefed9a8", + "_type": "span" + } + ], + "_type": "block", + "style": "h3", + "_key": "ac50703612c8", + "markDefs": [] + }, + { + "style": "normal", + "_key": "b260a4f98d3f", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Since a demo is worth a thousands words, I've record a short screencast showing how Nextflow can setup a cluster in the cloud and mount the Amazon EFS shared file system.", + "_key": "fbadc8174784", + "_type": "span" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "2026ae20f123", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "7174ff2876c6" + }, + { + "id": "asciicast-9vupd4d72ivaz6h56pajjjkop", + "_key": "681b40a00cf4", + "src": "https://asciinema.org/a/9vupd4d72ivaz6h56pajjjkop.js", + "_type": "script" + }, + { + "_key": "2f511cbb7277", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "Note: in this screencast it has been cut the Ec2 instances startup delay. 
It required around 5 minutes to launch them and setup the cluster.", + "_key": "cbc5c21ac6a3", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "Let’s recap the steps showed in the demo:", + "_key": "855ae945052c0", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "e8dba145dcea", + "markDefs": [] + }, + { + "_key": "ce35d599b1b0", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The user provides the cloud parameters (such as the VM image ID and the instance type) in the ", + "_key": "3030211520ac0" + }, + { + "marks": [ + "code" + ], + "text": "nextflow.config", + "_key": "3030211520ac1", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " file.", + "_key": "3030211520ac2" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "_key": "14e8f83ee46f", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "text": "To configure the EFS file system you need to provide your EFS storage ID and the mount path by using the ", + "_key": "3c8f6ebc36e40", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "code" + ], + "text": "sharedStorageId", + "_key": "3c8f6ebc36e41", + "_type": "span" + }, + { + "text": " and ", + "_key": "3c8f6ebc36e42", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "sharedStorageMount", + "_key": "3c8f6ebc36e43" + }, + { + "_type": "span", + "marks": [], + "text": " properties.", + "_key": "3c8f6ebc36e44" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "_key": "49da15281ffa", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://aws.amazon.com/ec2/spot/", + "_key": "ea27c1ed26e2" + } + ], + "children": [ + { + "marks": [], + "text": "To use ", + "_key": "a857ae57001f0", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "ea27c1ed26e2" + ], + "text": "EC2 Spot", + "_key": "a857ae57001f1" + }, + { + "marks": [], + "text": " instances, just specify the price you want to bid by using the ", + "_key": "a857ae57001f2", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "spotPrice", + "_key": "a857ae57001f3" + }, + { + "_type": "span", + "marks": [], + "text": " property.", + "_key": "a857ae57001f4" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "ca521505a419", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The AWS access and secret keys are provided by using the usual environment variables.", + "_key": "af540aa1d4550" + } + ], + "level": 1, + "_type": "block" + }, + { + "children": [ + { + "text": "The ", + "_key": "ac4ad9e23e5a0", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "nextflow cloud create", + "_key": "ac4ad9e23e5a1" + }, + { + "_type": "span", + "marks": [], + "text": " launches the requested number of instances, configures the user and access key, mounts the EFS storage and setups the Nextflow cluster automatically. 
Any Linux AMI can be used, it is only required that the ", + "_key": "ac4ad9e23e5a2" + }, + { + "text": "cloud-init", + "_key": "ac4ad9e23e5a3", + "_type": "span", + "marks": [ + "8b0adb4e8fa4" + ] + }, + { + "marks": [], + "text": " package, a Java 7+ runtime and the Docker engine are present.", + "_key": "ac4ad9e23e5a4", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "136511b46870", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://cloudinit.readthedocs.io/en/latest/", + "_key": "8b0adb4e8fa4" + } + ] + }, + { + "_key": "eafb92e2439b", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "When the cluster is ready, you can SSH in the master node and launch the pipeline execution as usual with the ", + "_key": "cb59aab18cc30", + "_type": "span" + }, + { + "_key": "cb59aab18cc31", + "_type": "span", + "marks": [ + "code" + ], + "text": "nextflow run " + }, + { + "marks": [], + "text": " command.", + "_key": "cb59aab18cc32", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "8597a82c1b96", + "listItem": "bullet", + "markDefs": [ + { + "_key": "b82208e33918", + "_type": "link", + "href": "https://github.com/pditommaso/paraMSA" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "For the sake of this demo we are using ", + "_key": "c83af1aa91450" + }, + { + "_key": "c83af1aa91451", + "_type": "span", + "marks": [ + "b82208e33918" + ], + "text": "paraMSA" + }, + { + "text": ", a pipeline for generating multiple sequence alignments and bootstrap replicates developed in our lab.", + "_key": "c83af1aa91452", + "_type": "span", + "marks": [] + } + ] + }, + { + "_key": "ecdc762fa445", + "listItem": "bullet", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/pditommaso/paraMSA#dependencies-", + "_key": "2955af234e4d" + } + ], + "children": [ + { + "_key": "779548980c360", + "_type": "span", + "marks": [], + "text": "Nextflow automatically pulls the pipeline code from its GitHub repository when the execution is launched. This repository includes also a dataset which is used by default. 
" + }, + { + "_key": "779548980c361", + "_type": "span", + "marks": [ + "2955af234e4d" + ], + "text": "The many bioinformatic tools used by the pipeline" + }, + { + "marks": [], + "text": " are packaged using a Docker image, which is downloaded automatically on each computing node.", + "_key": "779548980c362", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal" + }, + { + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The pipeline results are uploaded automatically in the S3 bucket specified by the ", + "_key": "ff025a28367b0", + "_type": "span" + }, + { + "text": "--output s3://cbcrg-eu/para-msa-results", + "_key": "ff025a28367b1", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "marks": [], + "text": " command line option.", + "_key": "ff025a28367b2", + "_type": "span" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "0e9762246ee0" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "ae1fe4b13b08", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "099f6e6e3e4f0", + "_type": "span", + "marks": [], + "text": "When the computation is completed, the cluster can be safely shutdown and the EC2 instances terminated with the " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "nextflow cloud shutdown", + "_key": "099f6e6e3e4f1" + }, + { + "_type": "span", + "marks": [], + "text": " command.", + "_key": "099f6e6e3e4f2" + } + ] + }, + { + "_key": "3aef94403d37", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Try it yourself", + "_key": "2d9490d80afe" + } + ], + "_type": "block", + "style": "h3" + }, + { + "children": [ + { + "_type": "span", + "marks": [ + "strike-through" + ], + "text": "We are releasing the Nextflow integrated cloud support in the upcoming version", + "_key": "b609703d32240" + }, + { + "marks": [], + "text": " ", + "_key": "b609703d32241", + "_type": "span" + }, + { + "marks": [ + "strike-through", + "code" + ], + "text": "0.22.0", + "_key": "b609703d32242", + "_type": "span" + }, + { + "marks": [], + "text": ".", + "_key": "b609703d32243", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "067dc7e150ac", + "markDefs": [] + }, + { + "style": "normal", + "_key": "ccfe18996832", + "markDefs": [], + "children": [ + { + "_key": "e58c1df4ceae0", + "_type": "span", + "marks": [], + "text": "Nextflow integrated cloud support is available from version " + }, + { + "_type": "span", + "marks": [ + "code" + ], + "text": "0.22.0", + "_key": "e58c1df4ceae1" + }, + { + "_type": "span", + "marks": [], + "text": ". 
To use it just make sure to have this or an higher version of Nextflow.", + "_key": "e58c1df4ceae2" + } + ], + "_type": "block" + }, + { + "markDefs": [ + { + "href": "https://blogs.windows.com/buildingapps/2016/03/30/run-bash-on-ubuntu-on-windows/", + "_key": "db69dae7f23e", + "_type": "link" + } + ], + "children": [ + { + "text": "Bare in mind that Nextflow requires a Unix-like operating system and a Java runtime version 7+ (Windows 10 users which have installed the ", + "_key": "72802990ca610", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "db69dae7f23e" + ], + "text": "Ubuntu subsystem", + "_key": "72802990ca611" + }, + { + "marks": [], + "text": " should be able to run it, at their risk..).", + "_key": "72802990ca612", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "ff272accb032" + }, + { + "_type": "block", + "style": "normal", + "_key": "81a99e924220", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Once you have installed it, you can follow the steps in the above demo. For your convenience we made publicly available the EC2 image ", + "_key": "f7218a77b9510" + }, + { + "_type": "span", + "marks": [ + "strike-through", + "code" + ], + "text": "ami-43f49030", + "_key": "f7218a77b9511" + }, + { + "_type": "span", + "marks": [], + "text": " ", + "_key": "f7218a77b9512" + }, + { + "text": "ami-4b7daa32", + "_key": "f7218a77b9513", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_key": "f7218a77b9514", + "_type": "span", + "marks": [], + "text": "* (EU Ireland region) used to record this screencast." + } + ] + }, + { + "code": "AWS_ACCESS_KEY_ID=\"\"\nAWS_SECRET_ACCESS_KEY=\"\"\nAWS_DEFAULT_REGION=\"\"", + "_type": "code", + "_key": "376dd2513c8f" + }, + { + "markDefs": [ + { + "href": "/docs/latest/awscloud.html", + "_key": "f948c593b53a", + "_type": "link" + } + ], + "children": [ + { + "text": "Refer to the ", + "_key": "28d43b3643dd", + "_type": "span", + "marks": [] + }, + { + "_key": "67bc77273041", + "_type": "span", + "marks": [ + "f948c593b53a" + ], + "text": "documentation" + }, + { + "_type": "span", + "marks": [], + "text": " for configuration details.", + "_key": "f1cc819c9957" + } + ], + "_type": "block", + "style": "normal", + "_key": "2ba7d7e2ac65" + }, + { + "children": [ + { + "marks": [], + "text": "", + "_key": "b9a1cdcc6129", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "81fd727b3a97", + "markDefs": [] + }, + { + "children": [ + { + "_key": "e7ebc886f81f0", + "_type": "span", + "marks": [], + "text": "* Update: the AMI has been updated with Java 8 on Sept 2017." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "1ced4a3bd3dc", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Conclusion", + "_key": "f358c65b605a0" + } + ], + "_type": "block", + "style": "h3", + "_key": "2cc15dbf661f", + "markDefs": [] + }, + { + "children": [ + { + "text": "Nextflow provides state of the art support for cloud and containers technologies making it possible to create computing clusters in the cloud and deploy computational workflows in a no-brainer way, with just two commands on your terminal.", + "_key": "9de3d14334370", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "7516610cc32a", + "markDefs": [] + }, + { + "style": "normal", + "_key": "d3819dff3f1c", + "markDefs": [], + "children": [ + { + "_key": "8abe576c3487", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "c9e1e8329179", + "markDefs": [], + "children": [ + { + "text": "In an upcoming post I will describe the autoscaling capabilities implemented by the Nextflow scheduler that allows, along with the use of spot/preemptible instances, a cost effective solution for the execution of your pipeline in the cloud.", + "_key": "1159b45786f50", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "4b551af3d13b", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "", + "_key": "892a3a2bf3f5", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "27bb21b95547", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Credits", + "_key": "d377c2b0851d0" + } + ], + "_type": "block", + "style": "h4" + }, + { + "_type": "block", + "style": "normal", + "_key": "91a4a6a8158c", + "markDefs": [ + { + "_key": "274330e3fd43", + "_type": "link", + "href": "https://github.com/skptic" + }, + { + "_key": "0df7e27557f4", + "_type": "link", + "href": "https://github.com/skptic/paraMSA/" + } + ], + "children": [ + { + "marks": [], + "text": "Thanks to ", + "_key": "46e754fb2a670", + "_type": "span" + }, + { + "_key": "46e754fb2a671", + "_type": "span", + "marks": [ + "274330e3fd43" + ], + "text": "Evan Floden" + }, + { + "_type": "span", + "marks": [], + "text": " for reviewing this post and for writing the ", + "_key": "46e754fb2a672" + }, + { + "text": "paraMSA", + "_key": "46e754fb2a673", + "_type": "span", + "marks": [ + "0df7e27557f4" + ] + }, + { + "_type": "span", + "marks": [], + "text": " pipeline.", + "_key": "46e754fb2a674" + } + ] + } + ], + "_updatedAt": "2024-10-02T11:10:02Z", + "meta": { + "description": "Nextflow is a framework that simplifies the writing of parallel and distributed computational pipelines in a portable and reproducible manner across different computing platforms, from a laptop to a cluster of computers.", + "slug": { + "current": "deploy-in-the-cloud-at-snap-of-a-finger" + } + }, + "_createdAt": "2024-09-25T14:15:03Z", + "_type": "blogPost" + }, + { + "meta": { + "slug": { + "current": "a-nextflow-docker-murder-mystery-the-mysterious-case-of-the-oom-killer" + } + }, + "_updatedAt": "2024-09-26T09:03:26Z", + "author": { + "_ref": "graham-wright", + "_type": "reference" + }, + "_createdAt": "2024-09-25T14:16:53Z", + "title": "A Nextflow-Docker Murder Mystery: The mysterious case of the “OOM killer”", + "_type": "blogPost", + "body": [ + { + "_key": "921a218068a1", + "markDefs": [], + "children": [ + { + "_key": "f9a0a5cf1f7d", + "_type": "span", + 
"marks": [], + "text": "Most support tickets crossing our desks don’t warrant a blog article. However, occasionally we encounter a genuine mystery—a bug so pervasive and vile that it threatens innocent containers and pipelines everywhere. Such was the case of the " + }, + { + "text": "_OOM killer_", + "_key": "c31776687546", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "d6cd82bf17e2" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "e77524293478" + } + ], + "_type": "block", + "style": "normal", + "_key": "be6f8a334fce" + }, + { + "_key": "6dec27b831af", + "markDefs": [], + "children": [ + { + "_key": "5b522bcc11cf", + "_type": "span", + "marks": [], + "text": "In this article, we alert our colleagues in the Nextflow community to the threat. We also discuss how to recognize the killer’s signature in case you find yourself dealing with a similar murder mystery in your own cluster or cloud." + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "", + "_key": "6ef608b6e99e", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "ad823dcccd23" + }, + { + "_type": "block", + "_key": "bcdf0f10559e" + }, + { + "children": [ + { + "_type": "span", + "text": "To catch a killer", + "_key": "408c5ac1de6c" + } + ], + "_type": "block", + "style": "h2", + "_key": "a4fd510f0f8d" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "In mid-2022, Nextflow jobs began to mysteriously die. Containerized tasks were being struck down in the prime of life, seemingly at random. By November, the body count was beginning to mount: Out-of-memory (OOM) errors were everywhere we looked!", + "_key": "07ef1c65a478" + } + ], + "_type": "block", + "style": "normal", + "_key": "ce0730067cb0" + }, + { + "_type": "block", + "style": "normal", + "_key": "a6105ec0a0cf", + "children": [ + { + "_type": "span", + "text": "", + "_key": "20d614ba8d4b" + } + ] + }, + { + "children": [ + { + "text": "It became clear that we had a serial killer on our hands. Unfortunately, identifying a suspect turned out to be easier said than done. Nextflow is rather good at restarting failed containers after all, giving the killer a convenient alibi and plenty of places to hide. Sometimes, the killings went unnoticed, requiring forensic analysis of log files.", + "_key": "c0baa94d2e78", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "8de44d626d28", + "markDefs": [] + }, + { + "_key": "35ff0ec86c97", + "children": [ + { + "text": "", + "_key": "caa1cdb17447", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "While we’ve made great strides, and the number of killings has dropped dramatically, the killer is still out there. 
In this article, we offer some tips that may prove helpful if the killer strikes in your environment.", + "_key": "8a52f399c577", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "94c787871e96", + "markDefs": [] + }, + { + "_key": "5c16408f9f58", + "children": [ + { + "_type": "span", + "text": "", + "_key": "2e0782a631a6" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "2592344525eb", + "children": [ + { + "_key": "c74c2988d0f4", + "_type": "span", + "text": "Establishing an MO" + } + ], + "_type": "block", + "style": "h2" + }, + { + "style": "normal", + "_key": "08cc3fd565d3", + "markDefs": [ + { + "href": "https://aws.amazon.com/ec2/", + "_key": "43762d19f52d", + "_type": "link" + } + ], + "children": [ + { + "text": "Fortunately for our intrepid investigators, the killer exhibited a consistent ", + "_key": "cab0331d20eb", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": "modus operandi", + "_key": "f4e6127ab2ce" + }, + { + "marks": [], + "text": ". Containerized jobs on ", + "_key": "9a8d7ee5753c", + "_type": "span" + }, + { + "text": "Amazon EC2", + "_key": "3876e025bc71", + "_type": "span", + "marks": [ + "43762d19f52d" + ] + }, + { + "_key": "d1f37512f579", + "_type": "span", + "marks": [], + "text": " were being killed due to out-of-memory (OOM) errors, even when plenty of memory was available on the container host. While we initially thought the killer was native to the AWS cloud, we later realized it could also strike in other locales." + } + ], + "_type": "block" + }, + { + "_key": "9b45d99fb90d", + "children": [ + { + "_key": "17ce11c40ae5", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "marks": [], + "text": "What the killings had in common was that they tended to occur when Nextflow tasks copied large files from Amazon S3 to a container’s local file system via the AWS CLI. As some readers may know, Nextflow leverages the AWS CLI behind the scenes to facilitate data movement. The killer’s calling card was an ", + "_key": "8aeee33a8f11", + "_type": "span" + }, + { + "_key": "c175aa336abc", + "_type": "span", + "marks": [ + "code" + ], + "text": "[Errno 12] Cannot allocate memory" + }, + { + "text": " message, causing the container to terminate with an exit status of 1.", + "_key": "d80341c18b2c", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "6f269d922f57", + "markDefs": [] + }, + { + "children": [ + { + "_key": "36a792046b86", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "434511b25598" + }, + { + "code": "Nov-08 21:54:07.926 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'NFCORE_SAREK:SAREK:MARKDUPLICATES:BAM_TO_CRAM:SAMTOOLS_STATS_CRAM (004-005_L3.SSHT82)'\nCaused by:\n Essential container in task exited\n..\nCommand error:\n download failed: s3://myproject/NFTower-Ref/Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.fasta to ./Homo_sapiens_assembly38.fasta [Errno 12] Cannot allocate memory", + "_type": "code", + "_key": "2fdbd87d8e62" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "15c187b9823d" + } + ], + "_type": "block", + "style": "normal", + "_key": "249c1ce2b467" + }, + { + "_key": "c215be19ce92", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "The problem is illustrated in the diagram below. 
In theory, Nextflow should have been able to dispatch multiple containerized tasks to a single host. However, tasks were being killed with out-of-memory errors even though plenty of memory was available. Rather than being able to run many containers per host, we could only run two or three and even that was dicey! Needless to say, this resulted in a dramatic loss of efficiency.", + "_key": "b7f96a02a12c", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "7a3b93f25527", + "children": [ + { + "_type": "span", + "text": "", + "_key": "dde2ae353eef" + } + ], + "_type": "block" + }, + { + "asset": { + "_ref": "image-798f67cc016f892ff5c67d9bb1a9bfbbfa0b44db-1368x853-jpg", + "_type": "reference" + }, + "_type": "image", + "alt": "", + "_key": "53b8b9ab38c3" + }, + { + "style": "normal", + "_key": "1eac506fa658", + "markDefs": [], + "children": [ + { + "_key": "e77df9b01dcd", + "_type": "span", + "marks": [], + "text": "Among our crack team of investigators, alarm bells began to ring. We asked ourselves, " + }, + { + "marks": [ + "em" + ], + "text": "“Could the killer be inside the house?”", + "_key": "a2b58de8591b", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " Was it possible that Nextflow was nefariously killing its own containerized tasks?", + "_key": "adf040f5dc65" + } + ], + "_type": "block" + }, + { + "_key": "ca522c167692", + "children": [ + { + "text": "", + "_key": "3b2926ffa258", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "marks": [], + "text": "Before long, reports of similar mysterious deaths began to trickle in from other jurisdictions. It turned out that the killer had struck ", + "_key": "83daedd7ca1a", + "_type": "span" + }, + { + "marks": [ + "398d91603eed" + ], + "text": "Cromwell", + "_key": "e59042933c40", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " also (", + "_key": "ff744900b2c3" + }, + { + "_type": "span", + "marks": [ + "7a3389448b8b" + ], + "text": "see the police report here", + "_key": "f00d2fb00df8" + }, + { + "text": "). We breathed a sigh of relief that we could rule out Nextflow as the culprit, but we still had a killer on the loose and a series of container murders to solve!", + "_key": "48f46ecb5914", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "0671b574c676", + "markDefs": [ + { + "_type": "link", + "href": "https://cromwell.readthedocs.io/en/stable/", + "_key": "398d91603eed" + }, + { + "href": "https://github.com/aws/aws-cli/issues/5876", + "_key": "7a3389448b8b", + "_type": "link" + } + ] + }, + { + "children": [ + { + "_key": "e53b76f620f0", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "fda7e67a5b2e" + }, + { + "style": "h2", + "_key": "dea0fae090d1", + "children": [ + { + "_type": "span", + "text": "Recreating the scene of the crime", + "_key": "a051f797f8b3" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "3d090bb242ac", + "markDefs": [ + { + "_key": "50e8bf6ae39a", + "_type": "link", + "href": "https://codefresh.io/blog/docker-memory-usage/" + } + ], + "children": [ + { + "text": "As any good detective knows, recreating the scene of the crime is a good place to start. It turned out that our killer had a profile and had been targeting containers processing large datasets since 2020. 
We came across an excellent ", + "_key": "cf2c7b77bb2a", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "50e8bf6ae39a" + ], + "text": "codefresh.io article", + "_key": "e18a93ca4103" + }, + { + "_key": "fe9bd0520aaf", + "_type": "span", + "marks": [], + "text": " by Saffi Hartal, discussing similar murders and suggesting techniques to lure the killer out of hiding and protect the victims. Unfortunately, the suggested workaround of periodically clearing kernel buffers was impractical in our Nextflow pipeline scenario." + } + ] + }, + { + "children": [ + { + "_key": "1781602dd3ce", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "3ffc8796c70d" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://codefresh.io/blog/docker-memory-usage/", + "_key": "90faa7606e49" + } + ], + "children": [ + { + "_key": "805bebc15921", + "_type": "span", + "marks": [], + "text": "We borrowed the Python script from " + }, + { + "_type": "span", + "marks": [ + "90faa7606e49" + ], + "text": "Saffi’s article", + "_key": "acaf5220eb5b" + }, + { + "text": " designed to write huge files and simulate the issues we saw with the Linux buffer and page cache. Using this script, we hoped to replicate the conditions at the time of the murders.", + "_key": "57825ef43383", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "d8f7d04ead1e" + }, + { + "_type": "block", + "style": "normal", + "_key": "3d04bbd5f1c2", + "children": [ + { + "_key": "c165f3785871", + "_type": "span", + "text": "" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "Using separate SSH sessions to the same docker host, we manually launched the Python script from the command line to run in a Docker container, allocating 512MB of memory to each container. This was meant to simulate the behavior of the Nextflow head job dispatching multiple tasks to the same Docker host. We monitored memory usage as each container was started.", + "_key": "47f1e24bd993", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "ba4c3eb6544f", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "9c9334805e98" + } + ], + "_type": "block", + "style": "normal", + "_key": "0562e6a441ae" + }, + { + "code": "$ docker run --rm -it -v $PWD/dockertest.py:/dockertest.py --entrypoint /bin/bash --memory=\"512M\" --memory-swap=0 python:3.10.5-slim-bullseye", + "_type": "code", + "_key": "c639cb8de5f2" + }, + { + "_key": "584d80971d6c", + "children": [ + { + "_key": "2c5f040b107f", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "446f123a5684", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Sure enough, we found that containers began dying with out-of-memory errors. Sometimes we could run a single container, and sometimes we could run two. Containers died even though memory use was well under the cgroups-enforced maximum, as reported by docker stats. 
As containers ran, we also used the Linux ", + "_key": "69f61d202854" + }, + { + "text": "free", + "_key": "3c0cf43b418f", + "_type": "span", + "marks": [ + "code" + ] + }, + { + "_type": "span", + "marks": [], + "text": " command to monitor memory usage and the combined memory used by kernel buffers and the page cache.", + "_key": "479d1145e96f" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "7a690f00cf5b", + "children": [ + { + "text": "", + "_key": "72604d2ba60a", + "_type": "span" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "Developing a theory of the case", + "_key": "494e1c44c4cd" + } + ], + "_type": "block", + "style": "h2", + "_key": "14d27278c9d1" + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "From our testing, we were able to clear both Nextflow and the AWS S3 copy facility since we could replicate the out-of-memory error in our controlled environment independent of both.", + "_key": "dee4eb4bc801" + } + ], + "_type": "block", + "style": "normal", + "_key": "f567ee8ed3ad", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "8aeb952019ad" + } + ], + "_type": "block", + "style": "normal", + "_key": "7271da666b72" + }, + { + "_key": "bf090244efff", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "We had multiple theories of the case: ", + "_key": "5edd2d4e551e", + "_type": "span" + }, + { + "text": "_Was it Colonel Mustard with an improper cgroups configuration? Was it Professor Plum and the size of the SWAP partition? Was it Mrs. Peacock running a Linux 5.20 kernel?_", + "_key": "ee1f41645a3b", + "_type": "span", + "marks": [ + "strong" + ] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_key": "e5a4fde47159", + "children": [ + { + "_type": "span", + "text": "", + "_key": "374ad28439da" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "For the millennials and Gen Zs in the crowd, you can find a primer on the CLUE/Cluedo references [here](https://en.wikipedia.org/wiki/Cluedo)", + "_key": "5e56ed7f110e", + "_type": "span", + "marks": [ + "em" + ] + } + ], + "_type": "block", + "style": "normal", + "_key": "02865e27ee4d", + "markDefs": [] + }, + { + "style": "normal", + "_key": "72c35ce2e054", + "children": [ + { + "_key": "c7a8c1a1ce96", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "5f3b67fc26b9", + "markDefs": [], + "children": [ + { + "_key": "9bdeae700314", + "_type": "span", + "marks": [], + "text": "To make a long story short, we identified several suspects and conducted tests to clear each suspect one by one. Tests included the following:" + } + ] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "d48401818439" + } + ], + "_type": "block", + "style": "normal", + "_key": "f0df90042862" + }, + { + "listItem": "bullet", + "children": [ + { + "text": "We conducted tests with EBS vs. NVMe disk volumes to see if the error was related to page caches when using EBS. The problems persisted with NVMe but appeared to be much less severe.", + "_key": "c35ee4cc5e4c", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [], + "text": "We attempted to configure a swap partition as recommended in this [AWS article](https://repost.aws/knowledge-center/ecs-resolve-outofmemory-errors), which discusses similar out-of-memory errors in Amazon ECS (used by AWS Batch). 
AWS provides good documentation on managing container [swap space](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/container-swap.html) using the `--memory-swap` switch. You can learn more about how Docker manages swap space in the [Docker documentation](https://docs.docker.com/config/containers/resource_constraints/).", + "_key": "c208eb9d5d10" + }, + { + "_key": "8ed6e78f9788", + "_type": "span", + "marks": [], + "text": "Creating swap files on the Docker host and making swap available to containers using the switch `--memory-swap=\"1g\"` appeared to help, and we learned a lot in the process. Using this workaround we could reliably run 10 containers simultaneously, whereas previously, we could run only one or two. This was a good workaround for static clusters but wasn’t always helpful in cloud batch environments. Creating the swap partition requires root privileges, and in batch environments, where resources may be provisioned automatically, this could be difficult to implement. It also didn’t explain the root cause of why containers were being killed. You can use the commands below to create a swap partition:" + } + ], + "_type": "block", + "style": "normal", + "_key": "9f313666b5aa" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "459cba96044e" + } + ], + "_type": "block", + "style": "normal", + "_key": "9d81bf348759" + }, + { + "code": "$ sudo dd if=/dev/zero of=/mnt/2GiB.swap bs=2048 count=1048576\n$ mkswap /mnt/2GiB.swap\n$ swapon /mnt/2GiB.swap", + "_type": "code", + "_key": "8c731f952235" + }, + { + "_key": "cfdc3617ffbe", + "children": [ + { + "_type": "span", + "text": "", + "_key": "2e7953bcf589" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "A break in the case!", + "_key": "b7b7d3d8bc37" + } + ], + "_type": "block", + "style": "h2", + "_key": "194dcfe2ac01" + }, + { + "_key": "50aade2d4302", + "markDefs": [ + { + "_type": "link", + "href": "https://github.com/jordeu", + "_key": "e7cb6fcf7ad4" + } + ], + "children": [ + { + "_key": "7794be435182", + "_type": "span", + "marks": [], + "text": "On Nov 16th, we finally caught a break in the case. A hot tip from Seqera Lab’s own " + }, + { + "text": "Jordi Deu-Pons", + "_key": "674ae8f919d2", + "_type": "span", + "marks": [ + "e7cb6fcf7ad4" + ] + }, + { + "text": ", indicated the culprit may be lurking in the Linux kernel. 
He suggested hard coding limits for two Linux kernel parameters as follows:", + "_key": "3b25200bde23", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "b11d096287fa", + "children": [ + { + "_type": "span", + "text": "", + "_key": "1e15465c4310" + } + ] + }, + { + "code": "$ echo \"838860800\" > /proc/sys/vm/dirty_bytes\n$ echo \"524288000\" > /proc/sys/vm/dirty_background_bytes", + "_type": "code", + "_key": "7a37f264dca4" + }, + { + "_key": "eda9dad47f5b", + "children": [ + { + "_key": "91815c12d850", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "6c45c00ff748", + "markDefs": [ + { + "_type": "link", + "href": "https://bugzilla.kernel.org/show_bug.cgi?id=207273", + "_key": "11198ccf63b9" + } + ], + "children": [ + { + "_key": "d89292b7e950", + "_type": "span", + "marks": [], + "text": "While it may seem like a rather unusual and specific leap of brilliance, our tipster’s hypothesis was inspired by this " + }, + { + "_key": "17e46fb6f2e0", + "_type": "span", + "marks": [ + "11198ccf63b9" + ], + "text": "kernel bug" + }, + { + "_type": "span", + "marks": [], + "text": " description. With this simple change, the reported memory usage for each container, as reported by docker stats, dropped dramatically. ", + "_key": "92d057219907" + }, + { + "_key": "c61f05380c3d", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Suddenly, we could run as many containers simultaneously as physical memory would allow." + }, + { + "_key": "600fcf8b731c", + "_type": "span", + "marks": [], + "text": " It turns out that this was a regression bug that only manifested in newer versions of the Linux kernel." + } + ] + }, + { + "children": [ + { + "text": "", + "_key": "8910bdde4b34", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "4db151f9c95c" + }, + { + "style": "normal", + "_key": "81c132be6d8c", + "markDefs": [ + { + "_type": "link", + "href": "https://docs.kernel.org/admin-guide/sysctl/vm.html", + "_key": "bd6dde83b5a4" + } + ], + "children": [ + { + "text": "By hardcoding these ", + "_key": "ae074ccf8d49", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [ + "bd6dde83b5a4" + ], + "text": "kernel parameters", + "_key": "80a4adffc629" + }, + { + "_key": "f5cca63efb9d", + "_type": "span", + "marks": [], + "text": ", we were limiting the number of dirty pages the kernel could hold before writing pages to disk. 
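If you want to see what limits are in effect on a given host, they can be read back with the standard sysctl utility; this is a minimal check, assuming procps-style sysctl tooling is available on the Docker host.\n\n```bash\n# Read back the current dirty-page settings (read-only; values shown are host-specific)\n$ sysctl vm.dirty_bytes vm.dirty_background_bytes vm.dirty_ratio vm.dirty_background_ratio\n```\n\n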
When these variables were not set, they defaulted to 0, and the default parameters "
      },
      {
        "_type": "span",
        "marks": [
          "code"
        ],
        "text": "dirty_ratio",
        "_key": "ae8b2a10c191"
      },
      {
        "_type": "span",
        "marks": [],
        "text": " and ",
        "_key": "9a02e02465cf"
      },
      {
        "_key": "a8a3e7135c88",
        "_type": "span",
        "marks": [
          "code"
        ],
        "text": "dirty_background_ratio"
      },
      {
        "_type": "span",
        "marks": [],
        "text": " took effect instead.",
        "_key": "af2c0fd84d42"
      }
    ],
    "_type": "block"
  },
  {
    "children": [
      {
        "_key": "0304bf8e65ea",
        "_type": "span",
        "text": ""
      }
    ],
    "_type": "block",
    "style": "normal",
    "_key": "7d6672d455b7"
  },
  {
    "style": "normal",
    "_key": "499958799272",
    "markDefs": [],
    "children": [
      {
        "_type": "span",
        "marks": [],
        "text": "In high-load conditions (such as data-intensive Nextflow pipeline tasks), processes accumulated dirty pages faster than the kernel could flush them to disk, eventually leading to the out-of-memory condition. By hard coding the dirty pages limit, we forced the kernel to flush the dirty pages to disk, thereby avoiding the bug. This also explained why the problem was less pronounced using NVMe storage, where flushing to disk occurred more quickly, thus mitigating the problem.",
        "_key": "834b906d9aa0"
      }
    ],
    "_type": "block"
  },
  {
    "style": "normal",
    "_key": "e57eb9588154",
    "children": [
      {
        "_type": "span",
        "text": "",
        "_key": "91e773cb2970"
      }
    ],
    "_type": "block"
  },
  {
    "children": [
      {
        "_key": "352838799b96",
        "_type": "span",
        "marks": [],
        "text": "Further testing determined that the bug appeared reliably on the newer "
      },
      {
        "_key": "f86c737a7cca",
        "_type": "span",
        "marks": [
          "44252d4752d3"
        ],
        "text": "Amazon Linux 2 AMI using the 5.10 kernel"
      },
      {
        "text": ". The bug did not seem to appear when using the older Amazon Linux 2 AMI running the 4.14 kernel version.",
        "_key": "5430eda10cea",
        "_type": "span",
        "marks": []
      }
    ],
    "_type": "block",
    "style": "normal",
    "_key": "70e1a0b5155f",
    "markDefs": [
      {
        "_type": "link",
        "href": "https://aws.amazon.com/about-aws/whats-new/2021/11/amazon-linux-2-ami-kernel-5-10/",
        "_key": "44252d4752d3"
      }
    ]
  },
  {
    "_key": "78d2703d8286",
    "children": [
      {
        "_type": "span",
        "text": "",
        "_key": "256b971b4e69"
      }
    ],
    "_type": "block",
    "style": "normal"
  },
  {
    "_type": "block",
    "style": "normal",
    "_key": "3527b81e0b08",
    "markDefs": [],
    "children": [
      {
        "_key": "119c1f6d5191",
        "_type": "span",
        "marks": [],
        "text": "We now had two solid strategies to resolve the problem and thwart our killer:"
      }
    ]
  },
  {
    "style": "normal",
    "_key": "125db74e86a0",
    "children": [
      {
        "_type": "span",
        "text": "",
        "_key": "95644fc28402"
      }
    ],
    "_type": "block"
  },
  {
    "_key": "c5a33c58751a",
    "listItem": "bullet",
    "children": [
      {
        "marks": [],
        "text": "Create a swap partition and run containers with the `--memory-swap` flag set.",
        "_key": "3880a39cfeeb",
        "_type": "span"
      },
      {
        "_key": "b8508f27364f",
        "_type": "span",
        "marks": [],
        "text": "Set `dirty_bytes` and `dirty_background_bytes` kernel variables on the Docker host before launching the jobs."
+ } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "1ab45fadff4c", + "children": [ + { + "_type": "span", + "text": "", + "_key": "945f68a653e4" + } + ] + }, + { + "_key": "966bc3be7946", + "children": [ + { + "_key": "d065ab8b4b7f", + "_type": "span", + "text": "The killer is (mostly) brought to justice" + } + ], + "_type": "block", + "style": "h2" + }, + { + "children": [ + { + "marks": [], + "text": "Avoiding the Linux 5.10 kernel was obviously not a viable option. The 5.10 kernel includes support for important processor architectures such as Intel® Ice Lake. This bug did not manifest earlier because, by default, AWS Batch was using ECS-optimized AMIs based on the 4.14 kernel. Further testing showed us that the killer could still appear in 4.14 environments, but the bug was harder to trigger.", + "_key": "c1cb581c8255", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "27663780cfca", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "ae05961cf9dd", + "children": [ + { + "_key": "487d33d9b827", + "_type": "span", + "text": "" + } + ] + }, + { + "children": [ + { + "_key": "134dbd0f6100", + "_type": "span", + "marks": [], + "text": "We ended up working around the problem for Nextflow Tower users by tweaking the kernel parameters in the compute environment deployed by Tower Forge. This solution works reliably with AMIs based on both the 4.14 and 5.10 kernels. We considered adding a swap partition as this was another potential solution to the problem. However, we were concerned that this could have performance implications, particularly for customers running with EBS gp2 magnetic disk storage." + } + ], + "_type": "block", + "style": "normal", + "_key": "be7e81ee0506", + "markDefs": [] + }, + { + "children": [ + { + "_key": "01c3a149d9ec", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "18071b548eef" + }, + { + "style": "normal", + "_key": "6ef32dfea2fe", + "markDefs": [ + { + "_type": "link", + "href": "https://seqera.io/fusion/", + "_key": "ccea40ca1519" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Interestingly, we also tested the ", + "_key": "e8669d46ba9b" + }, + { + "_key": "4d7e95966b94", + "_type": "span", + "marks": [ + "ccea40ca1519" + ], + "text": "Fusion v2 file system" + }, + { + "_key": "b9dcf34acee6", + "_type": "span", + "marks": [], + "text": " with NVMe disk. Using Fusion, we avoided the bug entirely on both kernel versions without needing to adjust kernel partitions or add a swap partition." 
+ }
      ],
      "_type": "block"
    },
    {
      "children": [
        {
          "_type": "span",
          "text": "",
          "_key": "45e35b64e0f6"
        }
      ],
      "_type": "block",
      "style": "normal",
      "_key": "ce95fc57abb8"
    },
    {
      "style": "h2",
      "_key": "061d4e63b566",
      "children": [
        {
          "_type": "span",
          "text": "Some helpful investigative tools",
          "_key": "d7bf628c7af5"
        }
      ],
      "_type": "block"
    },
    {
      "_key": "4d723debcd6e",
      "markDefs": [],
      "children": [
        {
          "marks": [],
          "text": "If you find evidence of foul play in your cloud or cluster, here are some useful investigative tools you can use:",
          "_key": "364947422104",
          "_type": "span"
        }
      ],
      "_type": "block",
      "style": "normal"
    },
    {
      "_type": "block",
      "style": "normal",
      "_key": "d1ed412af12f",
      "children": [
        {
          "text": "",
          "_key": "5e2b71d96c73",
          "_type": "span"
        }
      ]
    },
    {
      "children": [
        {
          "_key": "d75c9d6f707a",
          "_type": "span",
          "marks": [],
          "text": "After manually starting a container, use [docker stats](https://docs.docker.com/engine/reference/commandline/stats/) to monitor the CPU and memory used by each container compared to available memory."
        },
        {
          "text": "\n\n",
          "_key": "2b47dff82d73",
          "_type": "span"
        },
        {
          "_type": "span",
          "text": "```bash\n$ watch docker stats\n```\n",
          "_key": "209c7d99fce1"
        },
        {
          "text": "The Linux [free](https://linuxhandbook.com/free-command/) utility is an excellent way to monitor memory usage. You can track total, used, and free memory and monitor the combined memory used by kernel buffers and page cache reported in the _buff/cache_ column.",
          "_key": "88f4ebe9bc7b",
          "_type": "span",
          "marks": []
        },
        {
          "text": "\n\n",
          "_key": "7bf0a6034d35",
          "_type": "span"
        },
        {
          "_type": "span",
          "text": "```bash\n$ free -h\n```\n",
          "_key": "6ff70d24a519"
        },
        {
          "_type": "span",
          "marks": [],
          "text": "After a container was killed, we executed the command below on the Docker host to confirm why the containerized Python script was killed.",
          "_key": "db810eea8c4e"
        },
        {
          "text": "\n\n",
          "_key": "de47afd7ef08",
          "_type": "span"
        },
        {
          "_type": "span",
          "text": "```bash\n$ dmesg -T | grep -i 'killed process'\n```\n",
          "_key": "0776770c1561"
        },
        {
          "marks": [],
          "text": "We used the Linux [htop](https://man7.org/linux/man-pages/man1/htop.1.html) command to monitor CPU and memory usage to check the results reported by Docker and double-check CPU and memory use.",
          "_key": "757439c7d5d0",
          "_type": "span"
        },
        {
          "marks": [],
          "text": "You can use the command [systemd-cgtop](https://www.commandlinux.com/man-page/man1/systemd-cgtop.1.html) to validate cgroup settings and ensure you are not running into arbitrary limits imposed by _cgroups_.",
          "_key": "66a3a0305497",
          "_type": "span"
        },
        {
          "marks": [],
          "text": "Related to the _cgroups_ settings described above, you can inspect various memory-related limits directly from the file system. You can also use an alias to make the large numbers associated with _cgroups_ parameters easier to read. For example:",
          "_key": "483c337fb623",
          "_type": "span"
        },
        {
          "_type": "span",
          "text": "\n\n",
          "_key": "5e89a9c20ee6"
        },
        {
          "text": "```bash\n$ alias n='numfmt --to=iec-i'\n$ cat /sys/fs/cgroup/memory/docker/DOCKER_CONTAINER/memory.limit_in_bytes | n\n512Mi\n```\n",
          "_key": "3d39cca1190e",
          "_type": "span"
        },
        {
          "_key": "40b519d111de",
          "_type": "span",
          "marks": [],
          "text": "You can clear the kernel buffer and page cache that appears in the buff/cache column reported by the Linux _free_ command using either of these commands:"
        },
        {
          "text": "\n\n",
          "_key": "32fa153df724",
          "_type": "span"
        },
        {
          "_type": "span",
          "text": "```bash\n$ echo 1 > /proc/sys/vm/drop_caches\n$ sysctl -w vm.drop_caches=1\n```",
          "_key": "e94063adc2c2"
        }
      ],
      "_type": "block",
      "style": "normal",
      "_key": "70af4c107389",
      "listItem": "bullet"
    },
    {
      "_key": "b0e5db5b6f03",
      "children": [
        {
          "_key": "3f69ebb25ee6",
          "_type": "span",
          "text": ""
        }
      ],
      "_type": "block",
      "style": "normal"
    },
    {
      "style": "h2",
      "_key": "bfaea298d778",
      "children": [
        {
          "_type": "span",
          "text": "The bottom line",
          "_key": "9983c287fe21"
        }
      ],
      "_type": "block"
    },
    {
      "markDefs": [],
      "children": [
        {
          "_type": "span",
          "marks": [],
          "text": "While we’ve come a long way in bringing the killer to justice, out-of-memory issues still crop up occasionally. It’s hard to say whether these are copycats, but you may still run up against this bug in a dark alley near you!",
          "_key": "e9ba1f4bf6f4"
        }
      ],
      "_type": "block",
      "style": "normal",
      "_key": "fc6d63ccd1a8"
    },
    {
      "_type": "block",
      "style": "normal",
      "_key": "68097011b30c",
      "children": [
        {
          "_type": "span",
          "text": "",
          "_key": "365c5988100e"
        }
      ]
    },
    {
      "style": "normal",
      "_key": "747514ba2b1e",
      "markDefs": [],
      "children": [
        {
          "_type": "span",
          "marks": [],
          "text": "If you run into similar problems, hopefully, some of the suggestions offered above, such as tweaking kernel parameters or adding a swap partition on the Docker host, can help.",
          "_key": "e6a417fea73f"
        }
      ],
      "_type": "block"
    },
    {
      "_key": "a22f18a260ca",
      "children": [
        {
          "_type": "span",
          "text": "",
          "_key": "0e54e245e5d0"
        }
      ],
      "_type": "block",
      "style": "normal"
    },
    {
      "markDefs": [
        {
          "_key": "6040550cc9f0",
          "_type": "link",
          "href": "https://seqera.io/blog/breakthrough-performance-and-cost-efficiency-with-the-new-fusion-file-system/"
        }
      ],
      "children": [
        {
          "marks": [],
          "text": "For some users, a good workaround is to use the ",
          "_key": "561d0176fb03",
          "_type": "span"
        },
        {
          "text": "Fusion file system",
          "_key": "8bb17d3c9194",
          "_type": "span",
          "marks": [
            "6040550cc9f0"
          ]
        },
        {
          "text": " instead of Nextflow’s conventional approach based on the AWS CLI. 
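A minimal sketch of what enabling this could look like is shown below, assuming a recent Nextflow release with Wave and Fusion support and an object-storage work directory; the wave.enabled and fusion.enabled settings are the relevant options, and the exact configuration for your pipeline may differ.\n\n```bash\n# Append the Wave and Fusion options to the pipeline configuration (illustrative only)\n$ cat >> nextflow.config <<'EOF'\nwave.enabled   = true\nfusion.enabled = true\nEOF\n```\n\n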
As explained above, the combination of more efficient data handling in Fusion and fast NVMe storage means that dirty pages are flushed more quickly, and containers are less likely to reach hard limits and exit with an out-of-memory error.", + "_key": "40c733c03312", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "df2f815540f4" + }, + { + "_key": "badd99dc330d", + "children": [ + { + "_type": "span", + "text": "", + "_key": "d4a8e05bf0a4" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "df4001fb64d3", + "markDefs": [ + { + "_type": "link", + "href": "https://seqera.io/whitepapers/breakthrough-performance-and-cost-efficiency-with-the-new-fusion-file-system/", + "_key": "e8ee625d8247" + }, + { + "_key": "83edf0d1548f", + "_type": "link", + "href": "https://join.slack.com/t/nextflow/shared_invite/zt-11iwlxtw5-R6SNBpVksOJAx5sPOXNrZg" + } + ], + "children": [ + { + "_key": "139bbea65dfc", + "_type": "span", + "marks": [], + "text": "You can learn more about the Fusion file system by downloading the whitepaper " + }, + { + "_key": "e1c121dfff57", + "_type": "span", + "marks": [ + "e8ee625d8247" + ], + "text": "Breakthrough performance and cost-efficiency with the new Fusion file system" + }, + { + "marks": [], + "text": ". If you encounter similar issues or have ideas to share, join the discussion on the ", + "_key": "9cbf35096886", + "_type": "span" + }, + { + "marks": [ + "83edf0d1548f" + ], + "text": "Nextflow Slack channel", + "_key": "b50a1816ba4c", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ".", + "_key": "9630dff1dd4d" + } + ], + "_type": "block" + } + ], + "publishedAt": "2023-06-19T06:00:00.000Z", + "_rev": "Ot9x7kyGeH5005E3MJ4Uhd", + "tags": [ + { + "_key": "a7067c0a748d", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493", + "_type": "reference" + } + ], + "_id": "af3e5b699ba4" + }, + { + "body": [ + { + "style": "h2", + "_key": "5c3efce1d43a", + "markDefs": [], + "children": [ + { + "text": "Streamlining containers lifecycle", + "_key": "9f757a56788c0", + "_type": "span", + "marks": [ + "strong" + ] + } + ], + "_type": "block" + }, + { + "_key": "133925eb69a3", + "markDefs": [], + "children": [ + { + "text": "In the bioinformatics landscape, containerized workflows have become crucial for ensuring reproducibility in data analysis. By encapsulating applications and their dependencies into", + "_key": "bbd08134fa220", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "strong" + ], + "text": " portable, self-contained packages", + "_key": "bbd08134fa221", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ", containers enable seamless distribution across diverse computing environments. 
However, this innovation comes with its own set of challenges such as maintaining and validating collections of images, operating private registries and limited tool access.", + "_key": "bbd08134fa222" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "5406af305818", + "markDefs": [ + { + "_key": "3af825e880ca", + "_type": "link", + "href": "https://seqera.io/wave/" + }, + { + "_type": "link", + "href": "https://seqera.io/containers/", + "_key": "df13db0993d9" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Seqera’s ", + "_key": "ea38e78f202c0" + }, + { + "_type": "span", + "marks": [ + "3af825e880ca" + ], + "text": "Wave", + "_key": "ea38e78f202c1" + }, + { + "_type": "span", + "marks": [], + "text": " tackles these challenges by offering a suite of features designed to simplify the configuration, provisioning and management of software containers for data pipelines at scale. In this blog, we will explore common pitfalls of managing containerized workflows, examine how Wave overcomes these obstacles, and discover how ", + "_key": "ea38e78f202c2" + }, + { + "_type": "span", + "marks": [ + "df13db0993d9" + ], + "text": "Seqera Containers", + "_key": "ea38e78f202c3" + }, + { + "_key": "ea38e78f202c4", + "_type": "span", + "marks": [], + "text": " further enhances the Wave user experience." + } + ] + }, + { + "children": [ + { + "_key": "d494224541cb", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "ce9241113361", + "markDefs": [] + }, + { + "style": "blockquote", + "_key": "e2920de8bcf3", + "markDefs": [ + { + "href": "https://hubs.la/Q02P4r9W0", + "_key": "34a3f2d2cbdc", + "_type": "link" + } + ], + "children": [ + { + "_key": "3d0c7b96ea780", + "_type": "span", + "marks": [ + "34a3f2d2cbdc" + ], + "text": "Read the Whitepaper Now!" + } + ], + "_type": "block" + }, + { + "markDefs": [], + "children": [ + { + "_key": "9b968ce154aa", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "95228c8b8d8b" + }, + { + "markDefs": [], + "children": [ + { + "marks": [ + "strong" + ], + "text": "Handling containerized workflows at scale is not easy", + "_key": "1c7b66a371820", + "_type": "span" + } + ], + "_type": "block", + "style": "h2", + "_key": "24eb80288494" + }, + { + "markDefs": [ + { + "href": "https://biocontainers.pro/", + "_key": "d3824d3cdd3a", + "_type": "link" + } + ], + "children": [ + { + "_type": "span", + "marks": [], + "text": "Software containers have been heavily adopted as a solution to streamline both the configuration and deployment of dependencies in complex data pipelines. However, maintaining containers at scale is not without its difficulties. Building, storing and distributing container images is an error-prone and tedious task that increases the cognitive load on software engineers, ultimately diminishing their productivity. Community-maintained container collections, such as ", + "_key": "d660fe7595e30" + }, + { + "_type": "span", + "marks": [ + "d3824d3cdd3a" + ], + "text": "BioContainers", + "_key": "d660fe7595e31" + }, + { + "_type": "span", + "marks": [], + "text": ", have emerged to mitigate some of these challenges. 
However, still, several problems remain:", + "_key": "d660fe7595e32" + } + ], + "_type": "block", + "style": "normal", + "_key": "2057a7ea39aa" + }, + { + "level": 1, + "_type": "block", + "style": "normal", + "_key": "3099a4010964", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Publicly Accessible Container Images", + "_key": "a01a55bd260e0" + }, + { + "_type": "span", + "marks": [], + "text": ": Issues with stability can compromise reliability. Typically unsuitable for non-academic organizations due to security and compliance concerns.\n\n", + "_key": "a01a55bd260e1" + } + ] + }, + { + "markDefs": [], + "children": [ + { + "_key": "2b4dea072f470", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Limited Tool Access: " + }, + { + "_key": "2b4dea072f471", + "_type": "span", + "marks": [], + "text": "Access is restricted to only to specific tools or collections (e.g. BioConda). Organizations often need the flexibility to assemble and deploy custom containers.\n\n" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "592901bbf056", + "listItem": "bullet" + }, + { + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "API Rate Limits:", + "_key": "f3612e248c410" + }, + { + "_type": "span", + "marks": [], + "text": " Public registries often impose low API rate limits and afford low-rate or low-quality SLAs, making them unsuitable for production workloads.\n\n", + "_key": "f3612e248c411" + } + ], + "level": 1, + "_type": "block", + "style": "normal", + "_key": "1969250dec82", + "listItem": "bullet" + }, + { + "_type": "block", + "style": "normal", + "_key": "9ea73e59c666", + "listItem": "bullet", + "markDefs": [], + "children": [ + { + "_key": "5a3fdfc32935", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Egress Costs" + }, + { + "text": ": Use of private registries can incur outbound data transfer costs, particularly when deploying pipelines at scale across multiple regions or cloud providers.\n\n", + "_key": "dd0bf6ca10b3", + "_type": "span", + "marks": [] + } + ], + "level": 1 + }, + { + "style": "normal", + "_key": "1bd6e257c774", + "markDefs": [ + { + "_key": "f54c460b69c3", + "_type": "link", + "href": "https://seqera.io/wave/" + } + ], + "children": [ + { + "marks": [], + "text": "Seqera’s ", + "_key": "2d0b37d34c330", + "_type": "span" + }, + { + "marks": [ + "f54c460b69c3" + ], + "text": "Wave", + "_key": "2d0b37d34c331", + "_type": "span" + }, + { + "marks": [], + "text": " solves these problems by simplifying the management of containerized bioinformatics workflows by ", + "_key": "2d0b37d34c332", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "provisioning containers on-demand during pipeline execution.", + "_key": "2d0b37d34c333" + }, + { + "_key": "2d0b37d34c334", + "_type": "span", + "marks": [], + "text": " This approach ensures the delivery of container images that are defined precisely depending on requirements of each pipeline task in terms of dependencies and platform architecture. 
The process is "
      },
      {
        "_type": "span",
        "marks": [
          "strong"
        ],
        "text": "completely transparent and fully automated,",
        "_key": "2d0b37d34c335"
      },
      {
        "text": " eliminating the need to manually create, upload and maintain the numerous container images required for pipeline execution.",
        "_key": "2d0b37d34c336",
        "_type": "span",
        "marks": []
      }
    ],
    "_type": "block"
  },
  {
    "_type": "block",
    "style": "normal",
    "_key": "b7b00332c37f",
    "markDefs": [],
    "children": [
      {
        "_type": "span",
        "marks": [],
        "text": "By integrating containers as ",
        "_key": "a75abd67740a0"
      },
      {
        "_key": "a75abd67740a1",
        "_type": "span",
        "marks": [
          "strong"
        ],
        "text": "dynamic pipeline components "
      },
      {
        "marks": [],
        "text": "rather than standalone artifacts, Wave streamlines development, enhances reliability, and reduces maintenance overhead. This makes it easier for developers and operations teams to build, deploy, and manage containers efficiently and securely.",
        "_key": "a75abd67740a2",
        "_type": "span"
      }
    ]
  },
  {
    "_key": "2bcc79444f9d",
    "markDefs": [],
    "children": [
      {
        "text": "",
        "_key": "ff8ecd1929ad",
        "_type": "span",
        "marks": []
      }
    ],
    "_type": "block",
    "style": "normal"
  },
  {
    "style": "h2",
    "_key": "bc1396a38011",
    "markDefs": [],
    "children": [
      {
        "marks": [
          "strong"
        ],
        "text": "How does Wave work?",
        "_key": "ada86664aefe0",
        "_type": "span"
      }
    ],
    "_type": "block"
  },
  {
    "markDefs": [
      {
        "_type": "link",
        "href": "https://training.nextflow.io/basic_training/containers/#container-directives",
        "_key": "0d6219f32dfe"
      }
    ],
    "children": [
      {
        "_key": "d2a901bcfde60",
        "_type": "span",
        "marks": [],
        "text": "Wave transforms containers and pipeline management by allowing bioinformaticians to specify container requirements directly within their pipeline definitions. Instead of referencing manually created container images in "
      },
      {
        "_type": "span",
        "marks": [
          "0d6219f32dfe"
        ],
        "text": "Nextflow’s ",
        "_key": "d2a901bcfde61"
      },
      {
        "_key": "d2a901bcfde62",
        "_type": "span",
        "marks": [
          "0d6219f32dfe",
          "em"
        ],
        "text": "container"
      },
      {
        "marks": [
          "0d6219f32dfe"
        ],
        "text": " directive",
        "_key": "d2a901bcfde63",
        "_type": "span"
      },
      {
        "marks": [],
        "text": ", developers can either include a Dockerfile in the directory where the process' module is defined or just instruct Wave to use the Conda package associated with the process definition. By using this information, Wave provisions a container on-demand either using an existing container image in the target registry matching the specified requirement or building a new one on-the-fly to fulfill a new request, and returns the container URI pointing to the Wave container for process execution. The built container is then pushed to a destination registry and returned to the pipeline for execution, ensuring seamless integration and optimization across ",
        "_key": "d2a901bcfde64",
        "_type": "span"
      },
      {
        "_key": "d2a901bcfde65",
        "_type": "span",
        "marks": [
          "strong"
        ],
        "text": "diverse computational architectures."
+ } + ], + "_type": "block", + "style": "normal", + "_key": "86f8349d8dd9" + }, + { + "style": "normal", + "_key": "4bd12760fcd8", + "markDefs": [ + { + "_type": "link", + "href": "https://www.nextflow.io/docs/latest/config.html", + "_key": "b0443a516524" + } + ], + "children": [ + { + "marks": [], + "text": "Wave can also direct containers into a registry specified in the ", + "_key": "201624e167ae0", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "b0443a516524" + ], + "text": "nextflow.config file", + "_key": "201624e167ae1" + }, + { + "text": ", along with other pipeline settings. This means containers can be served from cloud registries closer to where pipelines are executed, delivering ", + "_key": "201624e167ae2", + "_type": "span", + "marks": [] + }, + { + "text": "better performance and reducing network traffic", + "_key": "201624e167ae3", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "marks": [], + "text": ". Moreover, Wave operates independently, serving as a versatile tool for bioinformaticians across various platforms and workflows. By employing ", + "_key": "201624e167ae4", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "multi-level caching,", + "_key": "201624e167ae5" + }, + { + "marks": [], + "text": " Wave ensures that containers are built only once or when the Dockerfile changes, enhancing efficiency and streamlining the management of bioinformatics workflows.", + "_key": "201624e167ae6", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "image", + "_key": "6f161a64aa34", + "asset": { + "_ref": "image-63c1caffc660a4c615ef2551318bc7b8fb8eca7b-2165x680-png", + "_type": "reference" + } + }, + { + "_key": "e33f18dee5bf", + "markDefs": [], + "children": [ + { + "_key": "2bb9a056f41c0", + "_type": "span", + "marks": [ + "strong", + "em" + ], + "text": "Figure 1." 
+ }, + { + "_type": "span", + "marks": [ + "em" + ], + "text": " Wave —a smart container provisioning and augmentation service for Nextflow.", + "_key": "2bb9a056f41c1" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "f6e484ee66d6", + "markDefs": [], + "children": [ + { + "text": "", + "_key": "4e5278eb85f4", + "_type": "span", + "marks": [] + } + ] + }, + { + "_key": "1f7634971dca", + "markDefs": [], + "children": [ + { + "_key": "a220fc9a291d0", + "_type": "span", + "marks": [ + "strong" + ], + "text": "Key features of Wave" + } + ], + "_type": "block", + "style": "h2" + }, + { + "children": [ + { + "marks": [], + "text": "✔ ", + "_key": "5178d679f1fd0", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "Access private container repositories", + "_key": "5178d679f1fd1" + }, + { + "text": ": Seamlessly integrate Nextflow pipelines with Seqera Platform to grant access to private container repositories.", + "_key": "5178d679f1fd2", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "14d7473d0d0f", + "markDefs": [] + }, + { + "_type": "block", + "style": "normal", + "_key": "ee1c448e55ff", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "✔ ", + "_key": "770c482bc4f10" + }, + { + "text": "On-demand container provisioning:", + "_key": "770c482bc4f11", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "_key": "770c482bc4f12", + "_type": "span", + "marks": [], + "text": " Automatically provision containers (via Dockerfile or Conda packages) based on dependencies in your Nextflow pipeline, enhancing efficiency, reducing errors, and eliminating the need for separate container builds and maintenance." + } + ] + }, + { + "_key": "dfc5be5494ed", + "markDefs": [], + "children": [ + { + "_key": "753a230208440", + "_type": "span", + "marks": [], + "text": "✔ " + }, + { + "marks": [ + "strong" + ], + "text": "Enhanced security", + "_key": "753a230208441", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": ": Each new container provisioned by Wave undergoes a security scan to identify potential vulnerabilities.", + "_key": "753a230208442" + } + ], + "_type": "block", + "style": "normal" + }, + { + "_type": "block", + "style": "normal", + "_key": "e50b6a0ff82a", + "markDefs": [], + "children": [ + { + "_key": "7ec429cf22a90", + "_type": "span", + "marks": [], + "text": "✔" + }, + { + "text": " Create multi-tool and multi-package containers", + "_key": "7ec429cf22a91", + "_type": "span", + "marks": [ + "strong" + ] + }, + { + "marks": [], + "text": ": Easily build and manage containers with diverse tools and packages, streamlining complex workflows with multiple dependencies.", + "_key": "7ec429cf22a92", + "_type": "span" + } + ] + }, + { + "children": [ + { + "marks": [], + "text": "✔", + "_key": "51c15bf17a880", + "_type": "span" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": " Provision multi-format and multi-platform containers: ", + "_key": "51c15bf17a881" + }, + { + "_key": "51c15bf17a882", + "_type": "span", + "marks": [], + "text": "Automatically provision containers for Docker or Singularity based on your Nextflow pipeline configuration and platform, including ARM64 containers for AWS Graviton if a compatible Dockerfile or Conda package is provided." 
+ } + ], + "_type": "block", + "style": "normal", + "_key": "a8c485516400", + "markDefs": [] + }, + { + "markDefs": [], + "children": [ + { + "_key": "741df5fa35600", + "_type": "span", + "marks": [], + "text": "✔ " + }, + { + "marks": [ + "strong" + ], + "text": "Mirror Public and Private Repositories", + "_key": "741df5fa35601", + "_type": "span" + }, + { + "text": ": Mirror the containers needed by your pipelines in a registry co-located with where pipeline execution is carried out, allowing optimized data transfer costs and accelerated execution of pipeline tasks.", + "_key": "741df5fa35602", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "8d580f86471d" + }, + { + "_key": "e1cf4e3eabf4", + "markDefs": [ + { + "_type": "link", + "href": "https://hubs.la/Q02P4r9W0", + "_key": "45a6b1f720b3" + } + ], + "children": [ + { + "marks": [ + "45a6b1f720b3" + ], + "text": "Download the Whitepaper", + "_key": "5ebb5fcdeb6b", + "_type": "span" + }, + { + "marks": [], + "text": " to explore features in more detail", + "_key": "835caf3a91f0", + "_type": "span" + } + ], + "_type": "block", + "style": "blockquote" + }, + { + "style": "h2", + "_key": "418255c8286d", + "markDefs": [], + "children": [ + { + "text": "Seqera Containers for publicly accessible container images", + "_key": "f4fa5324fa620", + "_type": "span", + "marks": [ + "strong" + ] + } + ], + "_type": "block" + }, + { + "_key": "6b076c6e5be9", + "markDefs": [ + { + "_type": "link", + "href": "https://hubs.la/Q02P4rwk0", + "_key": "49dd8bebf517" + } + ], + "children": [ + { + "_key": "f2dc6721adb90", + "_type": "span", + "marks": [], + "text": "With the newly launched " + }, + { + "_key": "f2dc6721adb91", + "_type": "span", + "marks": [ + "49dd8bebf517" + ], + "text": "Seqera Containers" + }, + { + "_type": "span", + "marks": [], + "text": ", the Wave experience is elevated even further. Now, instead of browsing existing container images as with a traditional container registry, users can just specify which tools they require through an ", + "_key": "f2dc6721adb92" + }, + { + "_type": "span", + "marks": [ + "strong" + ], + "text": "intuitive and user-friendly web interface. ", + "_key": "f2dc6721adb93" + }, + { + "text": "This will find an existing container image for the required tool(s) or build a container on-the-fly using the Wave service. Currently it supports any software package provided by the Bioconda, Conda forge and Pypi Conda channels. 
Containers can be built in both Docker and Singularity image formats and for the linux/amd64 and linux/arm64 CPU architectures.",
          "_key": "f2dc6721adb94",
          "_type": "span",
          "marks": []
        }
      ],
      "_type": "block",
      "style": "normal"
    },
    {
      "_key": "4c985acdcf31",
      "markDefs": [],
      "children": [
        {
          "_key": "d950641601c00",
          "_type": "span",
          "marks": [],
          "text": ""
        }
      ],
      "_type": "block",
      "style": "normal"
    },
    {
      "style": "normal",
      "_key": "e81e3d89583c",
      "markDefs": [
        {
          "_type": "link",
          "href": "https://seqera.io/containers/",
          "_key": "171c64cfcfbe"
        },
        {
          "href": "https://community.wave.seqera.io/",
          "_key": "339b65aacd58",
          "_type": "link"
        }
      ],
      "children": [
        {
          "_type": "span",
          "marks": [],
          "text": "Additionally, ",
          "_key": "3a00b180c5180"
        },
        {
          "_type": "span",
          "marks": [
            "171c64cfcfbe"
          ],
          "text": "Seqera Containers",
          "_key": "3a00b180c5181"
        },
        {
          "marks": [],
          "text": " are stored permanently and are publicly accessible via the registry host ",
          "_key": "3a00b180c5182",
          "_type": "span"
        },
        {
          "_key": "8c65146fbd29",
          "_type": "span",
          "marks": [
            "339b65aacd58"
          ],
          "text": "community.wave.seqera.io"
        },
        {
          "_type": "span",
          "marks": [],
          "text": ". This ensures that any future requests for the same package will return the exact container image, guaranteeing reproducibility across runs. The Seqera Containers project was developed in collaboration with Amazon Web Services, which is sponsoring the container hosting infrastructure.\n",
          "_key": "11bc24deeace"
        }
      ],
      "_type": "block"
    },
    {
      "_key": "7035fe8ea204",
      "asset": {
        "_ref": "image-d505d0b687501b2f43a47a688dc2e096886fbfff-883x451-jpg",
        "_type": "reference"
      },
      "_type": "image"
    },
    {
      "style": "normal",
      "_key": "c5f9e1119426",
      "markDefs": [],
      "children": [
        {
          "_key": "5497721bd8e40",
          "_type": "span",
          "marks": [
            "strong",
            "em"
          ],
          "text": "Figure 2"
        },
        {
          "marks": [
            "em"
          ],
          "text": ". Snapshot of Seqera Containers, demonstrating how you can create containers with the tools you want, on the fly.",
          "_key": "5497721bd8e41",
          "_type": "span"
        },
        {
          "_type": "span",
          "marks": [],
          "text": "\n",
          "_key": "5fccd7d33cef"
        }
      ],
      "_type": "block"
    },
    {
      "_type": "block",
      "style": "h2",
      "_key": "aa383c5b254c",
      "markDefs": [],
      "children": [
        {
          "_type": "span",
          "marks": [
            "strong"
          ],
          "text": "Discover the benefits of Wave",
          "_key": "f811f3a434440"
        }
      ]
    },
    {
      "markDefs": [],
      "children": [
        {
          "_type": "span",
          "marks": [],
          "text": "",
          "_key": "9da5596afc7b0"
        }
      ],
      "_type": "block",
      "style": "normal",
      "_key": "a9639af3c287"
    },
    {
      "markDefs": [],
      "children": [
        {
          "text": "Wave offers a transformative solution to the complexities of managing containerized bioinformatics workflows. By integrating containers directly into pipelines and prioritizing flexibility and efficiency, Wave streamlines development, enhances security, and optimizes performance across diverse computing environments. 
Deep dive into how Wave can revolutionize your workflow management by downloading our whitepaper today.", + "_key": "a44c589047e10", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "5f5fe5035844" + }, + { + "markDefs": [ + { + "_type": "link", + "href": "https://hubs.la/Q02P4r9W0", + "_key": "29092c152215" + } + ], + "children": [ + { + "text": "Download the Wave Whitepaper", + "_key": "08116ce2b1b70", + "_type": "span", + "marks": [ + "29092c152215" + ] + } + ], + "_type": "block", + "style": "blockquote", + "_key": "c4ebd70557cd" + }, + { + "markDefs": [], + "children": [ + { + "_key": "2d5affde12080", + "_type": "span", + "marks": [], + "text": "" + } + ], + "_type": "block", + "style": "normal", + "_key": "2deff6b4aab7" + } + ], + "title": "Wave: rethinking software containers for data pipelines", + "_rev": "n1tMSWxwIdUSjJ5EuKAZgf", + "meta": { + "_type": "meta", + "description": "In the bioinformatics landscape, containerized workflows have become crucial for ensuring reproducibility in data analysis. By encapsulating applications and their dependencies into portable, self-contained packages, containers enable seamless distribution across diverse computing environments.", + "noIndex": false, + "slug": { + "_type": "slug", + "current": "wave-rethinking-software-containers-for-data-pipelines" + } + }, + "tags": [ + { + "_ref": "6f35c54a-0d93-4aef-9d80-bd4ccb6527b4", + "_type": "reference", + "_key": "e6e4331ef27a" + } + ], + "_createdAt": "2024-09-09T07:56:25Z", + "_id": "b032b7fb-8dc8-464e-b4c8-18cc9b8c2dd1", + "publishedAt": "2024-09-10T07:44:00.000Z", + "_type": "blogPost", + "_updatedAt": "2024-09-10T08:00:16Z", + "author": { + "_ref": "paolo-di-tommaso", + "_type": "reference" + } + }, + { + "publishedAt": "2019-06-24T06:00:00.000Z", + "author": { + "_ref": "evan-floden", + "_type": "reference" + }, + "_type": "blogPost", + "_createdAt": "2024-09-25T14:15:40Z", + "_updatedAt": "2024-09-26T09:02:10Z", + "tags": [ + { + "_type": "reference", + "_key": "eb87b3c77818", + "_ref": "b6511053-299b-4aa5-8957-94fb9ebc9493" + } + ], + "_rev": "2PruMrLMGpvZP5qAknmCOf", + "body": [ + { + "markDefs": [], + "children": [ + { + "_key": "9eda387f5652", + "_type": "span", + "marks": [ + "em" + ], + "text": "This two-part blog aims to help users understand Nextflow’s powerful caching mechanism. Part one describes how it works whilst part two will focus on execution provenance and troubleshooting. You can read part two [here](/blog/2019/troubleshooting-nextflow-resume.html)" + } + ], + "_type": "block", + "style": "normal", + "_key": "a0038c3496f6" + }, + { + "_type": "block", + "style": "normal", + "_key": "a62228a6f33c", + "children": [ + { + "_type": "span", + "text": "", + "_key": "bf939ce5fea4" + } + ] + }, + { + "children": [ + { + "_type": "span", + "marks": [], + "text": "Task execution caching and checkpointing is an essential feature of any modern workflow manager and Nextflow provides an automated caching mechanism with every workflow execution. When using the ", + "_key": "0f157ad53e2c" + }, + { + "_key": "abcb16c7e3e9", + "_type": "span", + "marks": [ + "code" + ], + "text": "-resume" + }, + { + "_type": "span", + "marks": [], + "text": " flag, successfully completed tasks are skipped and the previously cached results are used in downstream tasks. 
But understanding the specifics of how it works and debugging situations when the behaviour is not as expected is a common source of frustration.", + "_key": "d00803befbac" + } + ], + "_type": "block", + "style": "normal", + "_key": "a0a90c40e5de", + "markDefs": [] + }, + { + "children": [ + { + "text": "", + "_key": "863cfcbcbcec", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "f2a87f598217" + }, + { + "_key": "28038a4c2e07", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "The mechanism works by assigning a unique ID to each task. This unique ID is used to create a separate execution directory, called the working directory, where the tasks are executed and the results stored. A task’s unique ID is generated as a 128-bit hash number obtained from a composition of the task’s:", + "_key": "f7c24b81bf4f" + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "text": "", + "_key": "a85689829432", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "2b0cfe26487e" + }, + { + "listItem": "bullet", + "children": [ + { + "text": "Inputs values", + "_key": "98b881bb230a", + "_type": "span", + "marks": [] + }, + { + "_type": "span", + "marks": [], + "text": "Input files", + "_key": "a9a6e7a97bc2" + }, + { + "_key": "303e2463fbeb", + "_type": "span", + "marks": [], + "text": "Command line string" + }, + { + "text": "Container ID", + "_key": "79a035697977", + "_type": "span", + "marks": [] + }, + { + "text": "Conda environment", + "_key": "0646807a7636", + "_type": "span", + "marks": [] + }, + { + "_key": "bc03cfe61532", + "_type": "span", + "marks": [], + "text": "Environment modules" + }, + { + "text": "Any executed scripts in the bin directory", + "_key": "bf1bd9fd74dc", + "_type": "span", + "marks": [] + } + ], + "_type": "block", + "style": "normal", + "_key": "35c0279c2956" + }, + { + "style": "normal", + "_key": "7067505c5ab7", + "children": [ + { + "_key": "6daa7198fa9c", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "h3", + "_key": "938a70f11dd1", + "children": [ + { + "_key": "fdba84779d7b", + "_type": "span", + "text": "How does resume work?" + } + ] + }, + { + "children": [ + { + "text": "The ", + "_key": "9478002adc02", + "_type": "span", + "marks": [] + }, + { + "marks": [ + "code" + ], + "text": "-resume", + "_key": "9fbeee74a241", + "_type": "span" + }, + { + "marks": [], + "text": " command line option allows for the continuation of a workflow execution. It can be used in its most basic form with:", + "_key": "0449dbbfd432", + "_type": "span" + } + ], + "_type": "block", + "style": "normal", + "_key": "4570c84b513b", + "markDefs": [] + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "82ed9dce3990" + } + ], + "_type": "block", + "style": "normal", + "_key": "2b03758708ab" + }, + { + "_type": "code", + "_key": "5f04d57d35c3", + "code": "$ nextflow run nextflow-io/hello -resume" + }, + { + "_key": "f8e874bb071a", + "children": [ + { + "_key": "31877c9ee9a3", + "_type": "span", + "text": "" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "c052cdb8617e", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "In practice, every execution starts from the beginning. 
However, when using resume, before launching a task, Nextflow uses the unique ID to check if:", + "_key": "396203fa52b0", + "_type": "span" + } + ], + "_type": "block" + }, + { + "_type": "block", + "style": "normal", + "_key": "cb790d4b09ea", + "children": [ + { + "text": "", + "_key": "6406a75549df", + "_type": "span" + } + ] + }, + { + "_key": "ef009469b4c3", + "listItem": "bullet", + "children": [ + { + "_type": "span", + "marks": [], + "text": "the working directory exists", + "_key": "a2b1c4c335d8" + }, + { + "_key": "19bcbeca5997", + "_type": "span", + "marks": [], + "text": "it contains a valid command exit status" + }, + { + "_key": "30a032f4ed82", + "_type": "span", + "marks": [], + "text": "it contains the expected output files." + } + ], + "_type": "block", + "style": "normal" + }, + { + "children": [ + { + "_type": "span", + "text": "", + "_key": "a614a24cbb5b" + } + ], + "_type": "block", + "style": "normal", + "_key": "8a74ac56192a" + }, + { + "_key": "dcf55fe767e3", + "markDefs": [], + "children": [ + { + "_type": "span", + "marks": [], + "text": "If these conditions are satisfied, the task execution is skipped and the previously computed outputs are applied. When a task requires recomputation, ie. the conditions above are not fulfilled, the downstream tasks are automatically invalidated.", + "_key": "0f1954be453f" + } + ], + "_type": "block", + "style": "normal" + }, + { + "style": "normal", + "_key": "199342d58c3a", + "children": [ + { + "_key": "05f7601d3b9e", + "_type": "span", + "text": "" + } + ], + "_type": "block" + }, + { + "_key": "dc4d05993843", + "children": [ + { + "_type": "span", + "text": "The working directory", + "_key": "8c2442a3062e" + } + ], + "_type": "block", + "style": "h3" + }, + { + "_type": "block", + "style": "normal", + "_key": "ee7ce20f9af0", + "markDefs": [], + "children": [ + { + "marks": [], + "text": "By default, the task work directories are created in the directory from where the pipeline is launched. This is often a scratch storage area that can be cleaned up once the computation is completed. A different location for the execution work directory can be specified using the command line option ", + "_key": "cb732b14846b", + "_type": "span" + }, + { + "marks": [ + "code" + ], + "text": "-w", + "_key": "1cb8eba9d1b0", + "_type": "span" + }, + { + "_type": "span", + "marks": [], + "text": " e.g.", + "_key": "f86e71ba3981" + } + ] + }, + { + "_type": "block", + "style": "normal", + "_key": "bf3170ded749", + "children": [ + { + "_type": "span", + "text": "", + "_key": "32bd47631379" + } + ] + }, + { + "code": "$ nextflow run