# Week 2 tests
We could not find any pipeline in this repository. It only provides SOPs for 16S data processing and microbiome analysis. The markdown file summarizes the phases of the analysis, the terminology involved, and the criteria one could use to judge whether a pipeline is good.
This is a DADA2 pipeline implemented in Nextflow. It ran successfully with a run time of 46 minutes, the setup was easy, and it was last updated 3 months ago.
```bash
nextflow run uct-cbio/16S-rDNA-dada2-pipeline --reads '*_R{1,2}.fastq.gz' --trimFor 24 --trimRev 25 --reference 'gg_13_8_train_set_97.fa.gz' -profile standard
```
- Most of its results were output in RDS format, which could not be visualized directly (see the sketch after the output tree below).
```
.
├── FastQC_post_filter_trim
│ ├── 1_S103_L001_fastqc_postfiltertrim
│ │ ├── 1_S103_L001.R1.filtered_fastqc.zip
│ │ └── 1_S103_L001.R2.filtered_fastqc.zip
│ ├── 1a_S103_L001_fastqc_postfiltertrim
│ │ ├── 1a_S103_L001.R1.filtered_fastqc.zip
│ │ └── 1a_S103_L001.R2.filtered_fastqc.zip
│ ├── 2_S115_L001_fastqc_postfiltertrim
│ │ ├── 2_S115_L001.R1.filtered_fastqc.zip
│ │ └── 2_S115_L001.R2.filtered_fastqc.zip
│ ├── 2a_S115_L001_fastqc_postfiltertrim
│ │ ├── 2a_S115_L001.R1.filtered_fastqc.zip
│ │ └── 2a_S115_L001.R2.filtered_fastqc.zip
│ └── multiqc_report.html
├── dada2-Alignment
│ └── aligned_seqs.fasta
├── dada2-BIOM
│ └── dada2.biom
├── dada2-Chimera-Taxonomy
│ ├── seqtab_final.RDS
│ └── tax_final.RDS
├── dada2-Derep
│ ├── 1_S103_L001.ddF.RDS
│ ├── 1_S103_L001.ddR.RDS
│ ├── 1_S103_L001.merged.RDS
│ ├── 1a_S103_L001.ddF.RDS
│ ├── 1a_S103_L001.ddR.RDS
│ ├── 1a_S103_L001.merged.RDS
│ ├── 2_S115_L001.ddF.RDS
│ ├── 2_S115_L001.ddR.RDS
│ ├── 2_S115_L001.merged.RDS
│ ├── 2a_S115_L001.ddF.RDS
│ ├── 2a_S115_L001.ddR.RDS
│ └── 2a_S115_L001.merged.RDS
├── dada2-FilterAndTrim
│ ├── 1_S103_L001.R1.filtered.fastq.gz
│ ├── 1_S103_L001.R2.filtered.fastq.gz
│ ├── 1_S103_L001.trimmed.txt
│ ├── 1_S103_L001_fastqc
│ │ ├── 1_S103_L001_R1_001_fastqc.zip
│ │ └── 1_S103_L001_R2_001_fastqc.zip
│ ├── 1a_S103_L001.R1.filtered.fastq.gz
│ ├── 1a_S103_L001.R2.filtered.fastq.gz
│ ├── 1a_S103_L001.trimmed.txt
│ ├── 1a_S103_L001_fastqc
│ │ ├── 1a_S103_L001_R1_001_fastqc.zip
│ │ └── 1a_S103_L001_R2_001_fastqc.zip
│ ├── 2_S115_L001.R1.filtered.fastq.gz
│ ├── 2_S115_L001.R2.filtered.fastq.gz
│ ├── 2_S115_L001.trimmed.txt
│ ├── 2_S115_L001_fastqc
│ │ ├── 2_S115_L001_R1_001_fastqc.zip
│ │ └── 2_S115_L001_R2_001_fastqc.zip
│ ├── 2a_S115_L001.R1.filtered.fastq.gz
│ ├── 2a_S115_L001.R2.filtered.fastq.gz
│ ├── 2a_S115_L001.trimmed.txt
│ ├── 2a_S115_L001_fastqc
│ │ ├── 2a_S115_L001_R1_001_fastqc.zip
│ │ └── 2a_S115_L001_R2_001_fastqc.zip
│ ├── all.trimmed.csv
│ └── multiqc_report.html
├── dada2-Inference
│ ├── all.ddF.RDS
│ └── all.ddR.RDS
├── dada2-LearnErrors
│ ├── errorsF.RDS
│ └── errorsR.RDS
├── dada2-Phangorn
│ ├── phangorn.tree.RDS
│ ├── tree.GTR.newick
│ └── tree.newick
├── dada2-ReadTracking
│ └── all.readtracking.txt
├── dada2-SeqTable
│ ├── mergers.RDS
│ └── seqtab.RDS
└── pipeline_info
├── dada2_DAG.svg
├── dada2_report.html
├── dada2_timeline.html
└── dada2_trace.txt
```
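Most of these outputs are RDS objects, so the quickest way to look at them is from R. A minimal sketch, assuming R is installed, using one of the file names from the tree above:

```bash
# Hedged sketch: inspect the final sequence table from the shell via R.
# The file path comes from the output tree above; requires R on the PATH.
# dim() confirms the samples-by-ASVs shape of the table.
Rscript -e 'st <- readRDS("dada2-Chimera-Taxonomy/seqtab_final.RDS"); dim(st)'
```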
This pipeline was cumbersome because it required too many adjustments, so the setup was not easy. It was last updated 3 years ago. The required adjustments included:
- Change the projectName. Keep it short.
- Change the path of the rawReads directory.
- Change the path to the qiimeMappingFile.
- Change the path to the outDir directory.
- Based on your data and analysis, change other pipeline configuration settings.
- Change path to singularity containers.
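Since these settings are ordinary Nextflow parameters, they could also be supplied on the command line. A minimal sketch, where the pipeline name and every path are hypothetical placeholders:

```bash
# Hypothetical invocation; the parameter names follow the list above,
# but all paths and values are placeholders to adapt to your data.
nextflow run <pipeline> \
    --projectName myproject \
    --rawReads /path/to/raw_reads \
    --qiimeMappingFile /path/to/qiime_mapping.tsv \
    --outDir /path/to/results \
    -with-singularity /path/to/containers/image.sif
```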
There were two pipelines in this repository:
- Qiime2 (Nextflow)
- Dada2 (R)
It was last updated 8 months ago.
- We input our data, and the pipeline exited with a "no input files found" error.
- We resorted to running every command individually and noticed errors in the kind of data being passed into some processes, including:
  - The variables holding the reads after basename splitting were not working.
  - The 'samples' variable used in the pipeline was picking up all forward and reverse reads together.
- To fix these, we used the variables holding the data from before the basename splitting, and the 'samples' variable was corrected to separate per-sample variables for forward and reverse reads (see the sketch after this list).
- As a result of these adjustments, we corrected the following parts of the pipeline:
  - rawplotFreads and rawplotRreads
  - rawplotR and rawplotF
  - random_samples
  - dereplicating reads (samples)
  - making a sequence table (samples)
- Lengths and indexes were also adjusted to match our sample sizes.
- We also fixed the labeling of the plots for checking quality after filtering and trimming, which had been swapped (the forward label on the reverse plot and vice versa).
- The pipeline had errors due to modules not being imported.
- The visualization.nf module had two processes combined in one, so it output two results: the OTU table and the PCoA.
- A possible fix for the module errors is a script that automatically loads the modules.
- The PCoA step was separated into its own process so that it outputs its own results (still under adjustment).
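A minimal shell sketch of the forward/reverse separation described above, assuming the read-naming pattern seen in the file trees (the data directory is a placeholder):

```bash
# Hedged illustration of the correction: collect forward and reverse
# reads into separate variables rather than one 'samples' glob that
# mixes both orientations. Naming pattern assumed from the trees above.
fwd_reads=( data/*_R1_001.fastq.gz )   # forward reads only
rev_reads=( data/*_R2_001.fastq.gz )   # reverse reads only
echo "Forward: ${#fwd_reads[@]} files; Reverse: ${#rev_reads[@]} files"
```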
```
.
├── Rdb
│ └── reference_db
├── artifacts
├── chimeras
├── dereps
├── fastqc_post
├── fastqc_raw
├── filter
├── merge
├── multiqc_postfastqc
├── multiqc_raw
├── orient
├── otus
├── pipeline_info
├── trimming
└── visualization
```
Some of the results were empty:
- fastqc_post
- fastqc_raw
We were able to visualize:
- Alpha diversity
- Beta diversity
- Shannon diversity
- Taxonomic barplots
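Assuming these visualizations are standard QIIME 2 .qzv artifacts (an assumption, not confirmed by the output tree), they can be viewed like this; the file name is a hypothetical placeholder:

```bash
# Hedged sketch: open a QIIME 2 visualization artifact locally.
# The path is a placeholder for an output in the visualization
# directory of the tree above.
qiime tools view visualization/alpha_diversity.qzv
# Alternatively, drag the .qzv file into https://view.qiime2.org
```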
- The visualization script is not implemented in the pipeline.
- This is a good, easy-to-set-up QIIME 2 pipeline with a run time of 21 minutes.
- Well documented
- Regularly updated; it was last updated a month ago.
- Visualizations in HTML
- Runs in multiple container engines: Singularity/Docker/Podman/Shifter
```bash
# 16S rRNA gene amplicon analysis of Illumina paired-end data
nextflow run nf-core/ampliseq -profile singularity --input "data" --FW_primer "GTGYCAGCMGCCGCGGTAA" --RV_primer "GGACTACNVGGGTWTCTAAT" --metadata "data/Metadata.tsv"
```
Using Docker gave this error when running the command:
- Command error: docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Running the same command with Singularity gave this error:
- Caused by: Failed to pull singularity image
- We noticed that disk space was the issue and ran the command again on the HPC node, where it completed successfully with Singularity.
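A hedged troubleshooting sketch for the two errors above; the cache path is a hypothetical placeholder, and systemd is assumed for starting Docker:

```bash
# 1) Docker: check whether the daemon is running before retrying.
docker info > /dev/null 2>&1 || sudo systemctl start docker

# 2) Singularity: check free space, then point Nextflow's image cache
#    at a larger filesystem before re-running (placeholder path).
df -h .
export NXF_SINGULARITY_CACHEDIR=/scratch/$USER/singularity
nextflow run nf-core/ampliseq -profile singularity --input "data" \
    --FW_primer "GTGYCAGCMGCCGCGGTAA" --RV_primer "GGACTACNVGGGTWTCTAAT" \
    --metadata "data/Metadata.tsv"
```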
Tools used by the pipeline include:
- Cutadapt
- DADA2
- FastQC and MultiQC
- QIIME 2
- SBDI