Sensitivity analysis
View the outputs of the quality filtering sensitivity analysis here.
Sequencing reads are lost at every step of the DADA2 processing pipeline, but we identified that most of the reads were being lost at the quality filtering step. What impacts would this have on the downstream analysis? The purpose of the quality filtering sensitivity analysis was to demonstrate the effects of various DADA2 quality filtering parameter combinations on the outputs of the DADA2 pipeline, and ultimately, on the ecological inference. Specifically, this script asks the following:
- What are the effects of quality filtering parameters on the volume of remaining reads?
- What are the effects of quality filtering parameters on merging success?
- What are the effects of quality filtering parameters on taxonomic assignment?
- What are the effects of quality filtering parameters on alpha and beta diversity metrics?
The parameters that we vary to observe their downstream effects include the following:
- truncQ: Each read is truncated at the first base whose quality score is less than or equal to truncQ; default 2.
- maxEE: After truncation, reads with more than maxEE expected errors are discarded. Expected errors are calculated from the nominal definition of the quality score: EE = sum(10^(-Q/10)).
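To make the two parameters concrete, here is a minimal base R sketch of the truncation rule and the expected-error formula applied to a single read. The quality scores, the helper names (truncate_and_score, passes_filter) and the grid values are hypothetical and chosen only for illustration; they are not the values used in the analysis, and the sketch ignores the other filters that DADA2 applies (such as minimum length).

```r
# Toy Phred quality scores for one read (hypothetical values, not NEON data).
quals <- c(38, 37, 37, 35, 30, 28, 20, 12, 2, 2)

# Truncate at the first position whose quality is <= truncQ,
# then sum the per-base error probabilities 10^(-Q/10).
truncate_and_score <- function(quals, truncQ = 2) {
  low <- which(quals <= truncQ)
  if (length(low) > 0) {
    quals <- quals[seq_len(low[1] - 1)]
  }
  list(length = length(quals), ee = sum(10^(-quals / 10)))
}

# A read is kept only if its expected errors do not exceed maxEE.
passes_filter <- function(quals, truncQ = 2, maxEE = Inf) {
  truncate_and_score(quals, truncQ)$ee <= maxEE
}

# Hypothetical grid of parameter combinations (illustrative values only).
grid <- expand.grid(truncQ = c(2, 12), maxEE = c(0.05, 2))
grid$kept <- mapply(function(q, e) passes_filter(quals, q, e),
                    grid$truncQ, grid$maxEE)
print(grid)
```

With these toy scores, the read fails only under the combination truncQ = 2 and maxEE = 0.05: the base with quality 12 survives truncation and contributes most of the expected errors. Raising truncQ to 12 trims that base and lets the shorter read pass, which illustrates why the two parameters are varied together.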
The output of the initial sensitivity analysis can be viewed here. If you want to reproduce the analysis yourself, you can do so with the ./code/test_dada2_params.R script and the ./code/test_dada2_params_plots.Rmd script.
Before running test_dada2_params.R, you must have completed the following requirements:
- Installed all dependencies outlined in the dependencies page.
- Modified the parameters in ./code/params.R to match your working directory and desired data subset. You may choose to use a different set of directories for this sensitivity analysis (e.g. a sensitivity/ subdirectory) than you would for your ordinary processing pipeline.
- Used 00_new_server_setup.R to create a directory structure containing all relevant microbial marker gene sequencing files for your analysis (as specified in params.R).
Once these requirements have been met, the sensitivity analysis can be run:
- In R/RStudio, be sure to set your working directory to be the project directory (e.g. NEON_soil_microbe_processing/), not the code/ subdirectory.
- Run test_dada2_params.R. This can take several days, with most of the processing time spent on taxonomy assignment. To speed this up on a Mac, you can go to the params.R file and set MULTITHREAD = TRUE. The outputs from this script will be saved as several .Rds files in the ./data/ subdirectory. These .Rds files will then be picked up by test_dada2_params_plots.Rmd for plotting.
- Be sure to modify the root.dir argument at the top of the test_dada2_params_plots.Rmd file to match your project directory.
- Knit test_dada2_params_plots.Rmd. This can take several minutes. The output should include many plots and accompanying text that aim to answer some of the questions posed above.
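If you want to inspect the saved .Rds files before knitting, a sketch along the following lines may help. The helper read_rds_outputs, the demo directory and the file name toy_output.Rds are hypothetical; the actual names of the files written by test_dada2_params.R are not listed here, so the demonstration uses a temporary directory with a made-up object.

```r
# Hypothetical helper: read every .Rds file in a directory into a named list.
read_rds_outputs <- function(data_dir = "./data") {
  paths <- list.files(data_dir, pattern = "\\.Rds$", full.names = TRUE)
  out <- lapply(paths, readRDS)
  names(out) <- tools::file_path_sans_ext(basename(paths))
  out
}

# Demonstration on a temporary directory with a made-up object.
demo_dir <- file.path(tempdir(), "sensitivity_demo")
dir.create(demo_dir, showWarnings = FALSE)
saveRDS(data.frame(truncQ = 2, maxEE = 8),
        file.path(demo_dir, "toy_output.Rds"))
outputs <- read_rds_outputs(demo_dir)
str(outputs)

# From the project directory, the documented steps might then be run as
# follows (left commented out because the first can take several days):
# source("./code/test_dada2_params.R")
# rmarkdown::render("./code/test_dada2_params_plots.Rmd")
```

Calling read_rds_outputs() with no arguments from the project directory would look in ./data/, the location the script is described as writing to.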