Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
adding example for cross contamination
1 parent 7c9fdd1
commit 6240cff
Showing 4 changed files with 148 additions and 0 deletions.
17 changes: 17 additions & 0 deletions
use_case_examples/contamination_detection_example/1_before_starting.sh
@@ -0,0 +1,17 @@
```bash
# Download SRR25626360 which represents WGS of Haemophilus influenzae
nohup fastq-dump --fasta 60 SRR25626360 2>&1 &

### Download SRR24210460 which represents WGS of Mycoplasma pneumoniae from library MDY
nohup fastq-dump --fasta 60 SRR24210460 2>&1 &

### Download SRR7217470 which represents WGS of Chlamydia pneumoniae
nohup fastq-dump --fasta 60 SRR7217470 2>&1 &

### Download SRR5962942 which represents WGS of Streptococcus pneumoniae
nohup fastq-dump --fasta 60 SRR5962942 2>&1 &

### Download SRR26202532 which represents WGS of Bordetella pertussis
nohup fastq-dump --fasta 60 SRR26202532 2>&1 &

### Download SRR2830253, reads of a healthy human lung microbiome
nohup fastq-dump --fasta 60 SRR2830253 2>&1 &
```
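Because every `fastq-dump` call above runs in the background, `2_before_starting.sh` can start before the FASTA files are ready. Below is a minimal sketch of a pre-flight check, assuming the `<accession>.fasta` output names that the later scripts already rely on. `check_fastas` is a hypothetical helper, not part of this commit. It only confirms that each file exists and is non-empty, not that its download has finished, so also make sure no `fastq-dump` process is still running (for example with `pgrep fastq-dump`).

```bash
#!/usr/bin/env bash
# check_fastas is a hypothetical helper (not part of this commit).
# It reports every <accession>.fasta that is missing or empty and
# returns 1 if at least one problem was found, 0 otherwise.
check_fastas() {
  local acc missing=0
  for acc in "$@"; do
    if [ ! -s "${acc}.fasta" ]; then
      echo "missing or empty: ${acc}.fasta" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ]
}
```

For example, `check_fastas SRR25626360 SRR24210460 SRR7217470 SRR5962942 SRR26202532 SRR2830253 && bash 2_before_starting.sh` only builds the wells once all six files are present.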
30 changes: 30 additions & 0 deletions
use_case_examples/contamination_detection_example/2_before_starting.sh
@@ -0,0 +1,30 @@
```bash
# Create the example sample data for a patient with respiratory symptoms who wants to find out which pathogen is causing those symptoms.

# Before moving on, make sure the reads needed to create the sample dataset are available. Please reference create_reference_database.md

# Create samples that will be loaded to the 96-well tray

# Negative control, so just reads from a healthy lung
cat SRR2830253.fasta > negative_control_well_11.fasta

# Positive control with H. influenzae
cat SRR25626360.fasta SRR2830253.fasta > positive_control_well_23.fasta

# Sample 1
cat SRR25626360.fasta SRR2830253.fasta SRR25626360.fasta > positive_control_well_64.fasta

# Sample 2
cat SRR24210460.fasta SRR2830253.fasta SRR25626360.fasta > sample_well_80.fasta

# I check one of my negative controls, which contains only healthy lung reads, so none of the reference pathogens should be detected here
# no contamination

# I check one of my positive controls for M. pneumoniae, which should not have H. influenzae
# no contamination

# I check one of my positive controls for H. influenzae, which should not have M. pneumoniae
# contamination

# I check one of my samples for H. influenzae, which should not have M. pneumoniae
# contamination
```
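Each well file is a plain concatenation of whole FASTA files, so its number of records should equal the sum over its inputs. Below is a minimal sketch of that sanity check, assuming one header line (starting with `>`) per read, which holds for `fastq-dump --fasta` output because only sequence lines are wrapped. `count_records` and `check_well` are hypothetical helpers, not part of this commit.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of this commit) for checking the well files.

# Print the number of FASTA records (header lines starting with '>') in a file.
count_records() {
  # grep -c prints 0 but exits 1 when nothing matches;
  # "|| true" keeps the function's exit status at 0 in that case.
  grep -c '^>' "$1" || true
}

# check_well OUT IN1 [IN2 ...]
# Succeeds if OUT holds exactly as many records as all inputs together.
check_well() {
  local out=$1 expected=0 actual f
  shift
  for f in "$out" "$@"; do
    [ -f "$f" ] || { echo "no such file: $f" >&2; return 1; }
  done
  for f in "$@"; do
    expected=$(( expected + $(count_records "$f") ))
  done
  actual=$(count_records "$out")
  if [ "$actual" -ne "$expected" ]; then
    echo "${out}: ${actual} records, expected ${expected}" >&2
    return 1
  fi
}
```

For example, `check_well positive_control_well_64.fasta SRR25626360.fasta SRR2830253.fasta SRR25626360.fasta` should succeed right after the script above has run.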
95 changes: 95 additions & 0 deletions
...ase_examples/contamination_detection_example/contamination_detection_example.md
@@ -0,0 +1,95 @@
# Cross-Contamination Detection Example
A research study is investigating how microbial communities are shaped by the type of respiratory disease a patient has. Samples were collected from two patients: one is positive for M. pneumoniae and the other for H. influenzae. To save time and money, a single 96-well tray is used for both patients' samples. Before downstream analysis can be performed, we want to know whether cross-contamination between samples occurred while the tray was being loaded, so we randomly chose wells 11, 23, 64, and 80 to check.

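Make sure all bacterial reads needed to create your reference dataset (also known as the training dataset) are available. `1_before_starting.sh` starts every download in the background with `nohup`, so wait until all downloads have finished before running `2_before_starting.sh`, which combines the reads into the well samples.
```bash
bash 1_before_starting.sh
```
```bash
bash 2_before_starting.sh
```
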
### Sketch your training dataset and samples with your preferred parameters

#### Using k=31
Note: the training and sample datasets must be sketched with the same k-size. Since we are sketching from a list of genomes (`genome_list.csv`), we can use the following `sourmash sketch fromfile` command:
```bash
sourmash sketch fromfile genome_list.csv -p dna,k=31,scaled=1000,abund -o training_database.k31.sig.zip
```

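`sourmash sketch fromfile` reads the file paths from the `genome_filename` column of `genome_list.csv`, so every path listed there has to exist before sketching. Below is a minimal sketch of a pre-flight check, assuming a simple comma-separated file without quoted fields (true for the `genome_list.csv` in this commit). `check_genome_list` is a hypothetical helper, not part of sourmash or this commit.

```bash
#!/usr/bin/env bash
# check_genome_list is a hypothetical helper (not part of this commit).
# It finds the genome_filename column by its header name and reports
# every listed path that is missing or empty. Returns 1 on any problem.
check_genome_list() {
  local csv=$1 paths path bad=0
  paths=$(awk -F',' '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "genome_filename") col = i; next }
    col && $col != "" { print $col }
    END { if (!col) exit 2 }
  ' "$csv") || { echo "could not read a genome_filename column from $csv" >&2; return 1; }
  while IFS= read -r path; do
    [ -n "$path" ] || continue
    if [ ! -s "$path" ]; then
      echo "missing or empty: $path" >&2
      bad=1
    fi
  done <<< "$paths"
  [ "$bad" -eq 0 ]
}
```

Running `check_genome_list genome_list.csv` before the `sourmash sketch fromfile` command surfaces a failed download by name.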
Sketch the negative control reads from well 11:
```bash
sourmash sketch dna negative_control_well_11.fasta -p k=31,scaled=1000,abund -o negative_control_well_11.k31.sig.zip
```

Sketch the positive control from well 23:
```bash
sourmash sketch dna positive_control_well_23.fasta -p k=31,scaled=1000,abund -o positive_control_well_23.k31.sig.zip
```

Sketch the positive control from well 64:
```bash
sourmash sketch dna positive_control_well_64.fasta -p k=31,scaled=1000,abund -o positive_control_well_64.k31.sig.zip
```

Sketch the sample from well 80:
```bash
sourmash sketch dna sample_well_80.fasta -p k=31,scaled=1000,abund -o sample_well_80.k31.sig.zip
```

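The four sample sketches differ only in the input file, so they can also be written as a loop. Below is a minimal sketch, assuming the well files created by `2_before_starting.sh` are in the current directory. `sketch_wells` and `sig_path` are hypothetical names, and the `-p` parameters are the same ones used above. Pass the same k-size that the training database was sketched with.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of this commit).

# sig_path FASTA K: output name used in this example,
# e.g. sample_well_80.fasta 31 -> sample_well_80.k31.sig.zip
sig_path() {
  echo "${1%.fasta}.k${2}.sig.zip"
}

# sketch_wells K FASTA...: run the same sourmash sketch command on each well file,
# stopping at the first failure.
sketch_wells() {
  local k=$1 fa
  shift
  for fa in "$@"; do
    sourmash sketch dna "$fa" -p "k=${k},scaled=1000,abund" -o "$(sig_path "$fa" "$k")" || return 1
  done
}
```

Usage: `sketch_wells 31 negative_control_well_11.fasta positive_control_well_23.fasta positive_control_well_64.fasta sample_well_80.fasta` produces the same four `.k31.sig.zip` files as the individual commands.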
### Make training data for k=31
```bash
python ../../make_training_data_from_sketches.py --ref_file training_database.k31.sig.zip --ksize 31 --ani_thresh 0.95 --out_prefix 'training_database.k31'
```

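One way to get a feel for what an ANI threshold means at k=31: under a simple substitution-only model where each base differs independently with probability 1 - ANI, a k-mer is shared unchanged only if all k of its bases match, which happens with probability ANI^k. The helper below (hypothetical, not part of this commit or YACHT) evaluates that expected fraction of shared k-mers. It is a back-of-the-envelope approximation, not a description of how YACHT applies `--ani_thresh` internally.

```bash
#!/usr/bin/env bash
# expected_shared_kmers is a hypothetical helper (not part of this commit or YACHT).
# expected_shared_kmers ANI K prints ANI^K (3 significant digits): the chance that
# a k-mer is unchanged when every base independently matches with probability ANI.
expected_shared_kmers() {
  awk -v a="$1" -v k="$2" 'BEGIN { printf "%.3g\n", a ^ k }'
}
```

With k=31, `expected_shared_kmers 0.95 31` prints `0.204`, so roughly a fifth of k-mers are still expected to match, while `expected_shared_kmers 0.50 31` prints `4.66e-10`, so essentially none are.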
### Identify which pathogens are present in each well
```bash
python ../../run_YACHT.py --json 'training_database.k31_config.json' --sample_file 'negative_control_well_11.k31.sig.zip' --significance 0.99 --min_coverage 1 0.5 0.1 0.05 0.01 --out_filename 'negative_control_well_11_k31_result.xlsx' --outdir './'
```

```bash
python ../../run_YACHT.py --json 'training_database.k31_config.json' --sample_file 'positive_control_well_23.k31.sig.zip' --significance 0.99 --min_coverage 1 0.5 0.1 0.05 0.01 --out_filename 'positive_control_well_23_k31_result.xlsx' --outdir './'
```

```bash
python ../../run_YACHT.py --json 'training_database.k31_config.json' --sample_file 'positive_control_well_64.k31.sig.zip' --significance 0.99 --min_coverage 1 0.5 0.1 0.05 0.01 --out_filename 'positive_control_well_64_k31_result.xlsx' --outdir './'
```

```bash
python ../../run_YACHT.py --json 'training_database.k31_config.json' --sample_file 'sample_well_80.k31.sig.zip' --significance 0.99 --min_coverage 1 0.5 0.1 0.05 0.01 --out_filename 'sample_well_80_k31_result.xlsx' --outdir './'
```

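Deciding whether a well is contaminated comes down to comparing the organisms YACHT reports as present with the ones that were deliberately put into that well by `2_before_starting.sh`. Below is a minimal sketch, assuming you have exported both lists to plain-text files with one reference name per line (for example the accession names from `genome_list.csv`). Extracting the detected names from the `.xlsx` result depends on YACHT's output layout and is not shown. `unexpected_hits` is a hypothetical helper and needs bash for process substitution.

```bash
#!/usr/bin/env bash
# unexpected_hits is a hypothetical helper (not part of this commit).
# unexpected_hits EXPECTED DETECTED
# Prints names that appear in DETECTED but not in EXPECTED (one name per line
# in both files). Any output means something unexpected was found in the well.
unexpected_hits() {
  # comm needs sorted input; -13 keeps only lines unique to the second file.
  comm -13 <(sort -u "$1") <(sort -u "$2")
}
```

For well 23, for instance, the expected list would hold only `SRR25626360` (the healthy-lung run is not in the training database), and any name printed by `unexpected_hits` would point to cross-contamination.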
### Results
Using a k-size of 31 and an ANI threshold of 0.95, YACHT finds XYZ

## Let's decrease the ANI threshold to 0.50

### Make training data for k=31 at ANI 0.50
```bash
python ../../make_training_data_from_sketches.py --ref_file training_database.k31.sig.zip --ksize 31 --ani_thresh 0.50 --out_prefix 'training_database.k31_ani0.50'
```

### Pathogen Detection using YACHT
Identify which pathogens are present in each well, starting with the negative control (well 11):
```bash
python ../../run_YACHT.py --json 'training_database.k31_ani0.50_config.json' --sample_file 'negative_control_well_11.k31.sig.zip' --significance 0.99 --min_coverage 1 0.5 0.1 0.05 0.01 --out_filename 'negative_control_well_11_k31_ani0.50_result.xlsx' --outdir './'
```

Positive control (well 23):
```bash
python ../../run_YACHT.py --json 'training_database.k31_ani0.50_config.json' --sample_file 'positive_control_well_23.k31.sig.zip' --significance 0.99 --min_coverage 1 0.5 0.1 0.05 0.01 --out_filename 'positive_control_well_23_k31_ani0.50_result.xlsx' --outdir './'
```

Positive control (well 64):
```bash
python ../../run_YACHT.py --json 'training_database.k31_ani0.50_config.json' --sample_file 'positive_control_well_64.k31.sig.zip' --significance 0.99 --min_coverage 1 0.5 0.1 0.05 0.01 --out_filename 'positive_control_well_64_k31_ani0.50_result.xlsx' --outdir './'
```

Sample (well 80):
```bash
python ../../run_YACHT.py --json 'training_database.k31_ani0.50_config.json' --sample_file 'sample_well_80.k31.sig.zip' --significance 0.99 --min_coverage 1 0.5 0.1 0.05 0.01 --out_filename 'sample_well_80_k31_ani0.50_result.xlsx' --outdir './'
```

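Running the four wells in a loop keeps the commands identical and gives every well its own result file, so no run overwrites another. Below is a minimal sketch, assuming the same `run_YACHT.py` flags used above and the `../../` relative path of this example directory. `run_yacht_wells` is a hypothetical helper, and the output name pattern `<well>_<tag>_result.xlsx` is our own choice.

```bash
#!/usr/bin/env bash
# run_yacht_wells is a hypothetical helper (not part of this commit).
# run_yacht_wells CONFIG_JSON TAG SIG...: run run_YACHT.py once per sample sketch,
# writing <well>_<TAG>_result.xlsx so results never overwrite each other.
run_yacht_wells() {
  local config=$1 tag=$2 sig well
  shift 2
  for sig in "$@"; do
    well=${sig%.k*.sig.zip}   # e.g. sample_well_80.k31.sig.zip -> sample_well_80
    python ../../run_YACHT.py --json "$config" --sample_file "$sig" \
      --significance 0.99 --min_coverage 1 0.5 0.1 0.05 0.01 \
      --out_filename "${well}_${tag}_result.xlsx" --outdir './' || return 1
  done
}
```

Usage: `run_yacht_wells training_database.k31_ani0.50_config.json k31_ani0.50 negative_control_well_11.k31.sig.zip positive_control_well_23.k31.sig.zip positive_control_well_64.k31.sig.zip sample_well_80.k31.sig.zip`.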
### Results
Decreasing the ANI threshold to 0.50 with a k-size of 31, YACHT finds XYZ
6 changes: 6 additions & 0 deletions
use_case_examples/contamination_detection_example/genome_list.csv
@@ -0,0 +1,6 @@
```csv
0,name,genome_filename,protein_filename
1,SRR25626360,SRR25626360.fasta,
2,SRR24210460,SRR24210460.fasta,
3,SRR7217470,SRR7217470.fasta,
4,SRR5962942,SRR5962942.fasta,
5,SRR26202532,SRR26202532.fasta,
```