
Week 2 tests

Nelly-Wambui edited this page Jan 27, 2022 · 11 revisions

H3ABioNet-SOPs

We could not find a pipeline in this repository.

It only provides SOPs for 16S data processing and microbiome analysis.

We also noticed that its Markdown file summarizes the phases and terminology of the analysis, and lists criteria for judging whether a pipeline is good.

H3ABioNet-TADA

This is a DADA2 pipeline implemented in Nextflow.

It ran successfully, with a run time of 46 minutes.

Setup was easy.

It was last updated 3 months ago.

Run command

nextflow run uct-cbio/16S-rDNA-dada2-pipeline --reads '*_R{1,2}.fastq.gz' --trimFor 24 --trimRev 25 --reference 'gg_13_8_train_set_97.fa.gz' -profile standard
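The `--reads '*_R{1,2}.fastq.gz'` argument is a glob that groups each sample's forward (R1) and reverse (R2) files into a pair. The Python sketch below approximates that grouping. It assumes file names end exactly in `_R1.fastq.gz` or `_R2.fastq.gz`. It is not Nextflow's own pairing code, and `pair_reads` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Matches "<sample>_R1.fastq.gz" or "<sample>_R2.fastq.gz", mirroring the
# brace glob '*_R{1,2}.fastq.gz' (an approximation, not Nextflow's code).
PAIR_RE = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])\.fastq\.gz$")


def pair_reads(filenames):
    """Group file names into {sample: (R1, R2)}; hypothetical helper."""
    groups = defaultdict(dict)
    for name in filenames:
        m = PAIR_RE.match(name)
        if m is None:
            continue  # names that do not fit the glob are skipped
        groups[m.group("sample")][m.group("read")] = name
    # Keep only samples for which both mates were found.
    return {
        sample: (reads["1"], reads["2"])
        for sample, reads in sorted(groups.items())
        if "1" in reads and "2" in reads
    }


files = ["1_S103_L001_R1.fastq.gz", "1_S103_L001_R2.fastq.gz",
         "2_S115_L001_R1.fastq.gz", "notes.txt"]
print(pair_reads(files))  # only 1_S103_L001 has both mates
```

A name such as `1_S103_L001_R1_001.fastq.gz` does not match this glob, because the pattern requires the name to end in `_R1.fastq.gz`. The sketch simply skips such files.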

Functionality

  • Most results are output as RDS files (serialized R objects), which could not readily be visualized.

.
├── FastQC_post_filter_trim
│   ├── 1_S103_L001_fastqc_postfiltertrim
│   │   ├── 1_S103_L001.R1.filtered_fastqc.zip
│   │   └── 1_S103_L001.R2.filtered_fastqc.zip
│   ├── 1a_S103_L001_fastqc_postfiltertrim
│   │   ├── 1a_S103_L001.R1.filtered_fastqc.zip
│   │   └── 1a_S103_L001.R2.filtered_fastqc.zip
│   ├── 2_S115_L001_fastqc_postfiltertrim
│   │   ├── 2_S115_L001.R1.filtered_fastqc.zip
│   │   └── 2_S115_L001.R2.filtered_fastqc.zip
│   ├── 2a_S115_L001_fastqc_postfiltertrim
│   │   ├── 2a_S115_L001.R1.filtered_fastqc.zip
│   │   └── 2a_S115_L001.R2.filtered_fastqc.zip
│   └── multiqc_report.html
├── dada2-Alignment
│   └── aligned_seqs.fasta
├── dada2-BIOM
│   └── dada2.biom
├── dada2-Chimera-Taxonomy
│   ├── seqtab_final.RDS
│   └── tax_final.RDS
├── dada2-Derep
│   ├── 1_S103_L001.ddF.RDS
│   ├── 1_S103_L001.ddR.RDS
│   ├── 1_S103_L001.merged.RDS
│   ├── 1a_S103_L001.ddF.RDS
│   ├── 1a_S103_L001.ddR.RDS
│   ├── 1a_S103_L001.merged.RDS
│   ├── 2_S115_L001.ddF.RDS
│   ├── 2_S115_L001.ddR.RDS
│   ├── 2_S115_L001.merged.RDS
│   ├── 2a_S115_L001.ddF.RDS
│   ├── 2a_S115_L001.ddR.RDS
│   └── 2a_S115_L001.merged.RDS
├── dada2-FilterAndTrim
│   ├── 1_S103_L001.R1.filtered.fastq.gz
│   ├── 1_S103_L001.R2.filtered.fastq.gz
│   ├── 1_S103_L001.trimmed.txt
│   ├── 1_S103_L001_fastqc
│   │   ├── 1_S103_L001_R1_001_fastqc.zip
│   │   └── 1_S103_L001_R2_001_fastqc.zip
│   ├── 1a_S103_L001.R1.filtered.fastq.gz
│   ├── 1a_S103_L001.R2.filtered.fastq.gz
│   ├── 1a_S103_L001.trimmed.txt
│   ├── 1a_S103_L001_fastqc
│   │   ├── 1a_S103_L001_R1_001_fastqc.zip
│   │   └── 1a_S103_L001_R2_001_fastqc.zip
│   ├── 2_S115_L001.R1.filtered.fastq.gz
│   ├── 2_S115_L001.R2.filtered.fastq.gz
│   ├── 2_S115_L001.trimmed.txt
│   ├── 2_S115_L001_fastqc
│   │   ├── 2_S115_L001_R1_001_fastqc.zip
│   │   └── 2_S115_L001_R2_001_fastqc.zip
│   ├── 2a_S115_L001.R1.filtered.fastq.gz
│   ├── 2a_S115_L001.R2.filtered.fastq.gz
│   ├── 2a_S115_L001.trimmed.txt
│   ├── 2a_S115_L001_fastqc
│   │   ├── 2a_S115_L001_R1_001_fastqc.zip
│   │   └── 2a_S115_L001_R2_001_fastqc.zip
│   ├── all.trimmed.csv
│   └── multiqc_report.html
├── dada2-Inference
│   ├── all.ddF.RDS
│   └── all.ddR.RDS
├── dada2-LearnErrors
│   ├── errorsF.RDS
│   └── errorsR.RDS
├── dada2-Phangorn
│   ├── phangorn.tree.RDS
│   ├── tree.GTR.newick
│   └── tree.newick
├── dada2-ReadTracking
│   └── all.readtracking.txt
├── dada2-SeqTable
│   ├── mergers.RDS
│   └── seqtab.RDS
└── pipeline_info
    ├── dada2_DAG.svg
    ├── dada2_report.html
    ├── dada2_timeline.html
    └── dada2_trace.txt

H3ABioNet-16S

This pipeline was cumbersome to set up because it required too many manual adjustments.

It was last updated 3 years ago.

The advised adjustments

  • Change the projectName. Keep it short.
  • Change the path of the rawReads directory.
  • Change the path to the qiimeMappingFile.
  • Change the path to the outDir directory.
  • Change other pipeline configuration settings based on your data and analysis.
  • Change the path to the Singularity containers.
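Nextflow can also read parameter overrides from a JSON file passed with `nextflow run <pipeline> -params-file params.json`, which avoids editing the configuration file directly. The Python sketch below writes such a file. It assumes the settings above are exposed as params named `projectName`, `rawReads`, `qiimeMappingFile` and `outDir`, so check these names against the pipeline's own config. The paths are placeholders. The Singularity container path is left out because its parameter name is not given here.

```python
import json
from pathlib import Path

# Assumed param names, taken from the adjustment list above; confirm them
# against the pipeline's nextflow.config. All paths are placeholders.
params = {
    "projectName": "wk2test",  # keep it short
    "rawReads": "/path/to/raw_reads",
    "qiimeMappingFile": "/path/to/mapping.txt",
    "outDir": "/path/to/results",
}


def missing_inputs(p):
    """Return the input params whose paths do not exist on this machine."""
    return [key for key in ("rawReads", "qiimeMappingFile")
            if not Path(p[key]).exists()]


def write_params_file(p, dest="params.json"):
    """Write params as JSON for use with `-params-file`."""
    Path(dest).write_text(json.dumps(p, indent=2))
    return dest


if missing_inputs(params):
    print("Fix these paths first:", missing_inputs(params))
else:
    print("Wrote", write_params_file(params))
```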

MBBU/16S-Accreditation

This repository contains two pipelines:

  • Qiime2 Nextflow
  • Dada2 R

It was last updated 8 months ago.

Errors encountered and adjustments made in Dada2 R:

Errors encountered

  • After we supplied our data and ran the pipeline, it exited with a "no input files found" error.
  • We therefore ran each command individually and found errors in the data being passed into some processes:
    ◦ The variables holding the reads after splitting off their basenames did not work.
    ◦ The 'samples' variable used in the pipeline picked up all forward and reverse reads together.

Adjustments made

  • We used the variables holding the reads before the basename splitting.
  • We replaced the 'samples' variable with separate sample variables for the forward and reverse reads.
  • As a result of these adjustments, we corrected the following parts of the pipeline:
    ◦ rawplotFreads and rawplotRreads
    ◦ rawplotR and rawplotF
    ◦ random_samples
    ◦ dereplicating reads (samples)
    ◦ making a sequence table (samples)
  • We also adjusted lengths and indexes to match our sample sizes.
  • We also fixed the labels of the post-filtering and trimming quality plots, which had been swapped (the forward reads were shown in the reverse plot and vice versa).
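The idea behind these fixes is sketched below in Python; the pipeline itself is written in R. The sketch keeps forward and reverse reads in separate lists, derives one sample name per forward file, and caps the number of randomly chosen plotting samples at the number of samples actually present. The `_R1_001`/`_R2_001` suffixes follow the FastQC file names in the TADA output tree. The function names and the default of two plotted samples are hypothetical.

```python
import random

# Assumed raw-read suffixes (Illumina-style names such as 1_S103_L001_R1_001).
FWD, REV = "_R1_001.fastq.gz", "_R2_001.fastq.gz"


def split_reads(filenames):
    """Split a mixed file list into sorted forward and reverse lists."""
    fwd = sorted(f for f in filenames if f.endswith(FWD))
    rev = sorted(f for f in filenames if f.endswith(REV))
    if len(fwd) != len(rev):
        raise ValueError(f"{len(fwd)} forward vs {len(rev)} reverse files")
    return fwd, rev


def sample_names(fwd):
    """One name per sample, from forward files only (avoids double counting)."""
    return [f[: -len(FWD)] for f in fwd]


def pick_for_plotting(fwd, rev, k=2, seed=0):
    """Pick up to k samples for quality plots, never more than exist."""
    n = min(k, len(fwd))
    idx = sorted(random.Random(seed).sample(range(len(fwd)), n))
    # The same indexes are used for both lists, so the R1/R2 plots stay matched.
    return [fwd[i] for i in idx], [rev[i] for i in idx]


files = ["1_S103_L001_R1_001.fastq.gz", "1_S103_L001_R2_001.fastq.gz",
         "2_S115_L001_R1_001.fastq.gz", "2_S115_L001_R2_001.fastq.gz"]
fwd, rev = split_reads(files)
print(sample_names(fwd))  # ['1_S103_L001', '2_S115_L001']
```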

Errors encountered and adjustments made in Qiime2 Nextflow:

Errors encountered

  • The pipeline failed because modules were not imported.
  • The visualization.nf module combined two processes in one, so it produced two results: the OTU table and the PCoA.

Adjustments made

  • A possible fix for the module error is a script that loads the modules automatically.
  • The PCoA step was moved into its own process so that it outputs its own results (still in progress).

Functionality

.
├── Rdb
│   └── reference_db
├── artifacts
├── chimeras
├── dereps
├── fastqc_post
├── fastqc_raw
├── filter
├── merge
├── multiqc_postfastqc
├── multiqc_raw
├── orient
├── otus
├── pipeline_info
├── trimming
└── visualization

Some results were empty:

  • fastqc_post
  • fastqc_raw
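To spot empty outputs like these without opening each folder, the Python sketch below lists the top-level result directories that contain no files at any depth. The results path is a placeholder for the pipeline's output directory, and `empty_output_dirs` is a hypothetical helper.

```python
from pathlib import Path


def empty_output_dirs(results_dir):
    """Return names of top-level result directories with no files inside."""
    root = Path(results_dir)
    return sorted(
        d.name for d in root.iterdir()
        # A directory holding only empty subdirectories also counts as empty.
        if d.is_dir() and not any(p.is_file() for p in d.rglob("*"))
    )


# Example (placeholder path): print(empty_output_dirs("results"))
```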

We were able to visualize:

  • Alpha diversity
  • Beta diversity
  • Shannon diversity
  • Taxonomic barplots

Cons

  • The visualization script is not integrated into the pipeline.

nf-core/ampliseq

Pros

  • A good, easy-to-set-up Qiime2 pipeline, with a run time of 21 minutes.
  • Well documented
  • Regularly updated: it was last updated a month ago.
  • Visualizations in HTML
  • Runs with multiple container engines: Singularity/Docker/Podman/Shifter

Run command

#16S rRNA gene amplicon analysis of Illumina paired-end data
nextflow run nf-core/ampliseq -profile singularity --input "data" --FW_primer "GTGYCAGCMGCCGCGGTAA" --RV_primer "GGACTACNVGGGTWTCTAAT" --metadata "data/Metadata.tsv"
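The primers passed to `--FW_primer` and `--RV_primer` contain IUPAC degenerate bases (for example `Y` = C/T, `M` = A/C, `N` = any base). The Python sketch below turns such a primer into a regular expression and checks whether a read starts with it. It only illustrates degenerate-primer matching. ampliseq itself trims primers with Cutadapt, and this is not its implementation.

```python
import re

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_regex(primer):
    """Compile a degenerate primer into a regex anchored at the read start."""
    return re.compile("^" + "".join(f"[{IUPAC[b]}]" for b in primer.upper()))


FW = primer_regex("GTGYCAGCMGCCGCGGTAA")
RV = primer_regex("GGACTACNVGGGTWTCTAAT")
print(bool(FW.match("GTGCCAGCAGCCGCGGTAATACGGAG")))  # True
```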

Errors encountered and adjustments made

Errors encountered

Running the command with Docker gave this error:

  • Command error: docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

Running the same command with Singularity gave this error:

  • Caused by: Failed to pull singularity image

Adjustments made

  • We found that insufficient disk space was the cause, so we ran the command again on the HPC node, where it ran successfully with Singularity.
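Because the Singularity pull failed for lack of disk space, a quick pre-flight check before launching may save a failed run. The Python sketch below assumes images are cached under `NXF_SINGULARITY_CACHEDIR` when that variable is set. Otherwise it checks the current directory, where Nextflow creates its `work` directory by default. The 10 GB threshold is an arbitrary placeholder, not a requirement of ampliseq.

```python
import os
import shutil

# Arbitrary placeholder threshold; size it to the images you expect to pull.
MIN_FREE_GB = 10


def singularity_cache_dir():
    """Cache location from NXF_SINGULARITY_CACHEDIR, else the current directory."""
    return os.environ.get("NXF_SINGULARITY_CACHEDIR", os.getcwd())


def has_space(path, min_free_gb=MIN_FREE_GB):
    """True if the filesystem holding `path` has at least min_free_gb free."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return free_gb >= min_free_gb


target = singularity_cache_dir()
if not has_space(target):
    print(f"Less than {MIN_FREE_GB} GB free under {target}; "
          "pulling Singularity images may fail.")
```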

Functionality

  • Cutadapt
  • DADA2
  • FastQC and MultiQC
  • QIIME 2
  • SBDI