Sensitivity analysis

Clara Qin edited this page Jul 17, 2020 · 1 revision

View the outputs of the quality filtering sensitivity analysis here.

Motivation

Sequencing reads are lost at every step of the DADA2 processing pipeline, but we found that most reads were lost at the quality filtering step. What impact does this have on the downstream analysis? The quality filtering sensitivity analysis demonstrates how various combinations of DADA2 quality filtering parameters affect the outputs of the DADA2 pipeline and, ultimately, the ecological inference. Specifically, the analysis asks the following:

  • What are the effects of quality filtering parameters on the number of remaining reads?
  • What are the effects of quality filtering parameters on merging success?
  • What are the effects of quality filtering parameters on taxonomic assignment?
  • What are the effects of quality filtering parameters on alpha and beta diversity metrics?

The parameters that we vary to observe their downstream effects include the following:

  • truncQ: Each read is truncated at the first base whose quality score is less than or equal to truncQ; the default is 2.
  • maxEE: After truncation, reads with more than maxEE expected errors are discarded. Expected errors are calculated from the nominal definition of the quality score Q, summed over the bases of the read: EE = sum(10^(-Q/10)).
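
To make the two parameters concrete, the following base-R sketch applies them to a single made-up read. The quality scores, the threshold values, and the helper names truncate_at_q and expected_errors are illustrative assumptions, not part of DADA2 or this repository; in the actual pipeline, DADA2's filterAndTrim() applies truncQ and maxEE for you.

```r
# Hypothetical Phred quality scores for one read (made-up values).
quals <- c(38, 38, 37, 35, 30, 25, 20, 12, 2, 2)

# Truncate at the first position whose score is <= trunc_q,
# mirroring the truncQ rule described above.
truncate_at_q <- function(q, trunc_q = 2) {
  hit <- which(q <= trunc_q)
  if (length(hit) == 0) return(q)  # nothing to truncate
  q[seq_len(hit[1] - 1)]           # keep only the bases before the first hit
}

# Expected errors from the nominal definition: EE = sum(10^(-Q/10)).
expected_errors <- function(q) sum(10^(-q / 10))

kept <- truncate_at_q(quals, trunc_q = 2)
ee <- expected_errors(kept)

max_ee <- 2            # illustrative threshold, not a recommendation
passes <- ee <= max_ee # reads with more than max_ee expected errors are discarded
```

Here the read is cut just before the first base with a score of 2, and the expected errors are then computed on the remaining bases only, which is why a stricter truncQ can change whether a read passes a given maxEE.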

Running the quality filtering sensitivity analysis

The output of the initial sensitivity analysis can be viewed here. To reproduce the analysis yourself, use the ./code/test_dada2_params.R and ./code/test_dada2_params_plots.Rmd scripts.

Before running test_dada2_params.R, you must have met the following requirements:

  1. Installed all dependencies outlined in the dependencies page.
  2. Modified the parameters in ./code/params.R to match your working directory and desired data subset. You may choose to use a different set of directories for this sensitivity analysis (e.g. a sensitivity/ subdirectory) than you would for your ordinary processing pipeline.
  3. Used 00_new_server_setup.R to create a directory structure containing all relevant microbial marker gene sequencing files for your analysis (as specified in params.R).

Once these requirements have been met, the sensitivity analysis can be run:

  1. In R/RStudio, set your working directory to the project directory (e.g. NEON_soil_microbe_processing/), not the code/ subdirectory.
  2. Run test_dada2_params.R. This can take several days, with most of the processing time spent on taxonomy assignment. To speed this up on a Mac, set MULTITHREAD = TRUE in the params.R file. The outputs from this script are saved as several .Rds files in the ./data/ subdirectory. These .Rds files are then picked up by test_dada2_params_plots.Rmd for plotting.
  3. Be sure to modify the root.dir argument at the top of the test_dada2_params_plots.Rmd file to match your project directory.
  4. Knit test_dada2_params_plots.Rmd. This can take several minutes. The output should include many plots and accompanying text that aim to answer some of the questions posed in the Motivation section above.
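
The run steps above can also be scripted from an R session. This is a minimal sketch, assuming the rmarkdown package is installed and that the path you pass in is your project directory; the wrapper run_sensitivity_analysis is a hypothetical helper, not part of the repository.

```r
# Hypothetical helper: runs the analysis script, then renders the plots document.
run_sensitivity_analysis <- function(project_dir) {
  scripts <- file.path(project_dir, "code",
                       c("test_dada2_params.R", "test_dada2_params_plots.Rmd"))

  # Fail early if the project directory does not contain the expected scripts.
  missing <- scripts[!file.exists(scripts)]
  if (length(missing) > 0) {
    stop("Not found: ", paste(missing, collapse = ", "))
  }

  # Work from the project directory (not code/), restoring the old one on exit.
  old_wd <- setwd(project_dir)
  on.exit(setwd(old_wd), add = TRUE)

  source("./code/test_dada2_params.R")                     # can take several days
  rmarkdown::render("./code/test_dada2_params_plots.Rmd")  # equivalent to knitting
}

# Example call (placeholder path; uncomment to run):
# run_sensitivity_analysis("NEON_soil_microbe_processing")
```

The root.dir argument at the top of the .Rmd file still needs to match your project directory before rendering, as described in step 3.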