Relatively straightforward WDL workflow that executes STAR+DESeq2 on RNAseq data

getwilds/ww-star-deseq2

ww-star-deseq2

Project Status: Experimental – Useable, some support, not open to feedback, unstable API.

This WILDS WDL workflow aligns RNA sequencing reads using STAR's two-pass methodology and then runs a differential expression analysis on the results with DESeq2. It is intended to be a relatively straightforward demonstration of an RNA sequencing pipeline within the context of the WILDS ecosystem.

Basic Usage

For Fred Hutch users who are new to WDL, we recommend using PROOF to submit this workflow directly to the on-premises HPC cluster, as it simplifies interaction with Cromwell and provides a user-friendly front-end for job submission and tracking. To do this:

  1. Start by either cloning or downloading a copy of this repository to your local machine.
    • Cloning: git clone https://github.com/getwilds/ww-star-deseq2.git
    • Downloading: Click the green "Code" button in the top right corner, then click "Download ZIP".
  2. Update ww-star-deseq2-inputs.json with your sample names (omics_sample_name) and FASTQ file paths (R1 and R2).
  3. Update ww-star-deseq2-options.json with the location where output data should be saved (final_workflow_outputs_dir).
  4. Submit the WDL file along with your customized JSON files to the Fred Hutch cluster via PROOF by following our SciWiki documentation.
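
Steps 2 and 3 can be sketched in Python as below. This is a minimal illustration, not the repository's own tooling: only the field names omics_sample_name, R1, R2, and final_workflow_outputs_dir come from this README. The helper names, the sample name, and every path are hypothetical, and treating R1 and R2 as single file paths is an assumption. The enclosing key for the sample list, and any further fields the workflow needs, should be taken from the ww-star-deseq2-inputs.json template shipped with the repository.

```python
import json


def make_sample(name, r1, r2):
    """Build one sample record using the field names mentioned in step 2."""
    return {"omics_sample_name": name, "R1": r1, "R2": r2}


def set_output_dir(options, out_dir):
    """Return a copy of the options dict with the output location from step 3 set."""
    updated = dict(options)
    updated["final_workflow_outputs_dir"] = out_dir
    return updated


# Hypothetical sample name and paths, for illustration only.
sample = make_sample(
    "sample_01",
    "/fh/fast/example_lab/fastq/sample_01_R1.fastq.gz",
    "/fh/fast/example_lab/fastq/sample_01_R2.fastq.gz",
)
options = set_output_dir({}, "/fh/fast/example_lab/star_deseq2_results")

# Print the fragments so they can be compared against the template files.
print(json.dumps(sample, indent=2))
print(json.dumps(options, indent=2))
```

In practice you would load the existing JSON files with json.load, apply changes like these, and write them back with json.dump, so that any settings already present in the templates are preserved.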

Additional Notes:

  • Keep in mind that all file paths in the JSON files must be visible to the Fred Hutch cluster, e.g. a location under /fh/fast/ or an AWS S3 bucket. Input file paths on your local machine won't work in PROOF.
  • Specific reference genome files can be provided as inputs, but if none are provided, the workflow will automatically download a GRCh38 reference genome and use that. For the first go-around, we recommend starting with the default reference files.
  • To avoid duplication of reference genome data, we highly recommend executing this workflow with call caching enabled in the options JSON (write_to_cache and read_from_cache, both already set to true in the provided file).
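
The first and third notes above lend themselves to a quick pre-submission check. The sketch below is a rough heuristic under stated assumptions: the prefix list (/fh/fast/ and s3://) is inferred from the examples in the note and is not an exhaustive list of cluster-visible locations, the rule "a string containing a slash is a path" is a simplification, and all function names and example values are hypothetical.

```python
def iter_strings(value):
    """Yield every string found anywhere inside a parsed JSON value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)


# Assumed prefixes, based on the examples in the note above; extend as needed.
VISIBLE_PREFIXES = ("/fh/fast/", "s3://")


def local_looking_paths(inputs):
    """Return path-like strings that do not start with an assumed visible prefix."""
    return [
        s
        for s in iter_strings(inputs)
        if ("/" in s or "\\" in s) and not s.startswith(VISIBLE_PREFIXES)
    ]


def caching_enabled(options):
    """Report whether both call-caching flags are set to true."""
    return (
        options.get("write_to_cache") is True
        and options.get("read_from_cache") is True
    )


# Hypothetical inputs: one cluster-visible path and one laptop path.
example_inputs = {
    "samples": [
        {
            "omics_sample_name": "sample_01",
            "R1": "/fh/fast/example_lab/sample_01_R1.fastq.gz",
            "R2": "/Users/me/Downloads/sample_01_R2.fastq.gz",
        }
    ]
}
example_options = {"write_to_cache": True, "read_from_cache": True}

print(local_looking_paths(example_inputs))
print(caching_enabled(example_options))
```

A non-empty list from local_looking_paths is only a prompt to look more closely; other legitimate locations, such as URLs, would also be flagged by this simple prefix test.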

Advanced Usage

For users outside of Fred Hutch or more advanced users who would like to run the workflow locally, command line execution is relatively straightforward:

java -jar cromwell-86.jar run ww-star-deseq2.wdl --inputs ww-star-deseq2-inputs.json --options ww-star-deseq2-options.json

Although Cromwell is demonstrated here, this pipeline is not specific to Cromwell and can be run using whichever WDL execution method you prefer (miniwdl, Terra, HealthOmics, etc.).
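
If you prefer to launch the run from a script, the Cromwell command above can be wrapped as follows. This is a small sketch rather than part of the repository: the function names are hypothetical, and it assumes java is on your PATH and that the jar, WDL, and JSON files sit in the working directory, as the command above implies.

```python
import subprocess


def cromwell_command(jar, wdl, inputs, options):
    """Assemble the argument list for the Cromwell invocation shown above."""
    return [
        "java", "-jar", jar,
        "run", wdl,
        "--inputs", inputs,
        "--options", options,
    ]


def run_workflow(jar, wdl, inputs, options):
    """Run Cromwell and raise CalledProcessError if it exits non-zero."""
    subprocess.run(cromwell_command(jar, wdl, inputs, options), check=True)


cmd = cromwell_command(
    "cromwell-86.jar",
    "ww-star-deseq2.wdl",
    "ww-star-deseq2-inputs.json",
    "ww-star-deseq2-options.json",
)
# Show the command without executing it.
print(" ".join(cmd))
```

Calling run_workflow with the same four arguments starts the run. Passing the command as a list avoids shell quoting problems if any of the file paths contain spaces.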

Support

For questions, bugs, and/or feature requests, reach out to the Fred Hutch Data Science Lab (DaSL) at wilds@fredhutch.org, or open an issue on our issue tracker.

Contributing

If you would like to contribute to this WILDS WDL workflow, see our contribution guidelines as well as our WILDS Contributor Guide for more details.

License

Distributed under the MIT License. See LICENSE for details.