Week 4 tests

What we did:

Testing of the nf-core pipeline using different datasets(ITS data and 18S data)
Running Yosef's DADA2 pipeline
Running 16S-Accreditation DADA2 pipeline

Challenges

Obtaining test data

The metagenomic studies datasets we got did not have metadata

nf-core/ampliseq

16S Datasets with nf-core/ampliseq

Stingless bee microbiome data

Command

nextflow run nf-core/ampliseq -profile singularity --input "/node/cohort4/ckigen/data/honeybee/microbiome-data" --FW_primer "CCTACGGGNGGCWGCAG" --RV_primer "GACTACHVGGGTATCTAATCC" --metadata "../metadata.tsv" -resume

Runtime

28 min 49 s

Output

.
├── cutadapt
├── dada2
├── fastqc
├── multiqc
├── overall_summary.tsv
├── pipeline_info
└── qiime2

ITS Datasets with nf-core/ampliseq

Food metagenomic dataset

Command

 nextflow run nf-core/ampliseq -profile singularity --input "/node/cohort4/ckigen/ITS-metagenomics/dish-metagenomics/data/seq" --FW_primer "GGAAGTAAAAGTCGTAACAAGG" --RV_primer "GCTGCGTTCTTCATCGATGC" --illumina_pe_its --metadata "./metadata.tsv" -resume

Dummy Metadata

ID    treatment
SRR3003942      a
SRR3003943      b
SRR3003944      c
SRR3003945      d

Runtime

28 min 5 s

Output

.
├── cutadapt
├── dada2
├── fastqc
├── multiqc
├── overall_summary.tsv
├── pipeline_info
└── qiime2

Yosef's DADA2 pipeline

Error 1: Reading the sample metadata into R

While running the nf-core data, we got this error in reading the sample metadata into R :

Error in `.rowNamesDF<-`(x, value = value) : 
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘a’, ‘b’

After trying to fix it by adding row.names="samples", an idea we borrowed from the 16S-Accreditation DADA2 pipeline, we got this error:

Error in data[[rowvar]] : 
  attempt to select less than one element in get1index

Solution was adding "row.names=NULL"

Initially:
sdata <- read.csv("stingless_bee_sample_metadata.csv", sep = ',', header = TRUE)

Changed to:
sdata <- read.table("/node/cohort4/nelly/16s-rRNA-Project/metadata.tsv", sep = '\t', header = TRUE, row.names=NULL)

Error 2: Extracting the filtered taxonomy and feature tables for barplot plotting

The parameters in making the taxonomy table were not agreeing to what we had in our phyloseq object:

Error in phyloseq_to_df(physeq1, addtax = T, addtot = F, addmaxrank = F) : 
  Error: Sample names in 'physeq' could not be automatically converted to the syntactically valid column names in data.frame (see 'make.names'). Consider renaming with 'sample_names'.

We changed the addtax = T to addtax = F and it worked

Initially:
tax_table <- phyloseq_to_df(physeq1, addtax = T, addtot = F, addmaxrank = F)

Changed to:
tax_table <- phyloseq_to_df(physeq1, addtax = F, addtot = F, addmaxrank = F)

We encountered the same error in creating the silva_classified object and solved it in the same way

Initially:
silva_classified <- phyloseq_to_df(physeq2, addtax = T, addtot = F, addmaxrank = F)

Changed to:
silva_classified <- phyloseq_to_df(physeq2, addtax = F, addtot = F, addmaxrank = F)

Error 3: BLAST for all Microbiota

In merging the silva classified taxonomy with the blast classified ones, the silva classified data object was being merged with a data object that had not yet been created(merged_data)
We solved this by merging with the cumulation object instead, which had already been created:

Initially:
silva_blast_raw <- as.data.frame(bind_rows(silva_classified, merged_data))

Changed to:
silva_blast_raw <- as.data.frame(bind_rows(silva_classified, cumulation))

Error 4: Removing the first staxids and making the OTU column the first

There was an object blast_out_1 mentioned that had not been created (might have been a typo)

Solution was changing it to blast_out, which had already been created:

Initially:
blast_results <- merge(blast_taxa, blast_out_1, by = "OTU", all = FALSE)

Changed to:
blast_results <- merge(blast_taxa, blast_out, by = "OTU", all = FALSE)

On solving that, we got this error:

Error in fix.by(by.y, y) : 'by' must specify a uniquely valid column

We ran the objects being merged individually and realized that only the blast_taxa object had OTU
We solved the problem by naming the first column of the blast_out as OTU instead of qseqid:

Initially:
colnames <- c("qseqid",
              "sseqid",
              "evalue",
              "bitscore",
              "sgi",
              "sacc")

Changed to:
colnames <- c("OTU",
              "sseqid",
              "evalue",
              "bitscore",
              "sgi",
              "sacc")

As a result, the blast_taxa$OTU <- blast_out$qseqid could not run, but seeing that its intended purpose was not fulfilled, we commented it out

Error 5: Assigning to different genera/clusters

This was the last point we got to in running this pipeline and we encountered the following error:

Error: object 'Lactobacillus' not found

The problem hasn't been solved yet

MBBU/16S-Accreditation pipeline

Error 1: Creating a phyloseq object

We encountered the following errors:

Error in data[[rowvar]] : attempt to select less than one element
In addition: Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote,  :
  incomplete final line found by readTableHeader on '../demo.txt'

Error in validObject(.Object) : invalid class “phyloseq” object: Component sample names do not match. Try sample_names()

We solved them by specifying the sampleID names in the metadata to one variable and providing it as the row.names object

Initially:
samdata <- read.table("set5_meta.txt",header = TRUE,row.names = "sample")

Changed to:
sample <- c("Dog1", "Dog2", "Dog3", "Dog8", "Dog9", "Dog10", "Dog15", "Dog16", "Dog17", "Dog22", "Dog23", "Dog24", "Dog29")
samdata <- read.table("/node/cohort4/nelly/16s-rRNA-Project/practice.dataset1.meta.txt",header = TRUE,row.names = sample)

Error 2: Rooting the tree

In the cloned script, this part came before the part for creating a phyloseq object yet it required a phyloseq object
The solution we had for this was shifting its position to make it go after creating the phyloseq object

Error 3: Missing BV variable

We encountered this error in the following parts of the script:

#Taxonomy plots.
#Phylum
phylum_plot <-  plot_bar(phyloseq_object, x="sample", fill="Phylum") + facet_wrap(~BV, scales="free_x")

#Genus
Genus_plot <-  plot_bar(ps.top50, x="sample", fill="Genus") + facet_wrap(~BV, scales="free_x")

#visualising alpha diversity
top30plot <- plot_bar(ps.top30, x="Age", fill="Inflammation") + facet_wrap(~BV, scales="free_x")

Due to lack of knowledge of what should go into the facet_wrap, we were not able to solve that particular BV issue

Breakthroughs

nf-core pipeline

We successfully ran pipeline using:
- Another set of 16S data(Stingless bee microbiome dataset)
- ITS data(Food metagenomic dataset)
We also realized the pipeline has a website in which we launched the pipeline version that allowed us to input values as per our data

Yosef's DADA2 pipeline

We tested it using 2 sets of data (Stingless bee microbiome data and the nf-core data)
We were able to solve the most of challenges indicated above
Changes that should be made to be specific to one's data are in the following steps:
- Loading the fastq files into R: input the path to the fastq files
- Extracting the file names: should match the files format
- Primer trimming: input primers specific to the sequences
- Path to cutadapt: input the path to the available cutadapt version
- Filtering and trimming data: change the parameters to fit the sequences being run
- Taxonomic classification: indicate paths to silva_nr_v138_train_set.fa.gz and silva_species_assignment_v138.fa.gz
- Reading the sample metadata into R: provide metadata file path(note separator in the file and update accordingly) and metadata column names
- Create BIOM file: change path to the biom file
- Extracting the filtered taxonomy and feature tables for barplot plotting: tax_table object and silva_classified object parameters might change for different datasets
- BLAST for all Microbiota: change the M_blast_abundance object data.frame column parameters and M_to_blast_dada2_SB_sequences oject path
- Running blast: provide paths to blastn, blast_db, and input objects
- Removing .1-9 string to get the rightful accession numbers: provide paths to taxaId and blast_taxa objects
- Merging the blast taxonomic classification to blast abundance table: provide path to merged_data_all_1 object

16S-Accreditation DADA2 pipeline

We tested the pipeline using 2 sets of data(Dog data and nf-core data)
We progressed from where we had got stuck in the last code-review(creating a phyloseq object)
Changes that should be made to be specific to one's data are in the following steps:
- Setwd("path to working directory"): input the path to the fastq files
- Reading sample names using r: should match the files format
- One holding the file names of all the forward and reverse reads: should match the files format
- Variables holding file names for the forward and reverse filtered reads: should match the files format
- Looking at a specific samples quality profile: should be according to the number of samples
- Plotting quality profiles for random samples: should be according to the number of samples
- Trimming and filtering: change the parameters to fit the sequences being run
- Checking the quality profile again: should be according to the number of samples
- Plotting Quality Profiles For Random Samples After Filtering: should be according to the number of samples
- Assigning Taxonomy: indicate paths to silva_nr_v138_train_set.fa.gz and silva_species_assignment_v138.fa.gz
- Alternative training database in assigning taxonomy: indicate path to rdp_train_set_18.fa
- Creating a phyloseq object: provide metadata file path(note separator in the file and update accordingly) and specify and metadata column names to the sample variable

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Week 4 tests

What we did:

Challenges

Obtaining test data

nf-core/ampliseq

16S Datasets with nf-core/ampliseq

Stingless bee microbiome data

Command

Runtime

Output

ITS Datasets with nf-core/ampliseq

Food metagenomic dataset

Command

Dummy Metadata

Runtime

Output

Yosef's DADA2 pipeline

Error 1: Reading the sample metadata into R

Error 2: Extracting the filtered taxonomy and feature tables for barplot plotting

Error 3: BLAST for all Microbiota

Error 4: Removing the first staxids and making the OTU column the first

Error 5: Assigning to different genera/clusters

MBBU/16S-Accreditation pipeline

Error 1: Creating a phyloseq object

Error 2: Rooting the tree

Error 3: Missing BV variable

Breakthroughs

nf-core pipeline

Yosef's DADA2 pipeline

16S-Accreditation DADA2 pipeline

Clone this wiki locally