Skip to content

Commit

Permalink
adding example for cross contamination
Browse files Browse the repository at this point in the history
  • Loading branch information
bioinfwithjudith committed Nov 17, 2023
1 parent 7c9fdd1 commit 6240cff
Show file tree
Hide file tree
Showing 4 changed files with 148 additions and 0 deletions.
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
# Download SRR25626360 which represents WGS of Haemophilus influenzae
nohup fastq-dump --fasta 60 SRR25626360 2>&1 &

### Download SRR24210460 which represents WGS of mycoplasma pneumoniae from library MDY
nohup fastq-dump --fasta 60 SRR24210460 2>&1 &

### Download SRR7217470 which represents WGS of Chlamydia pneumoniae
nohup fastq-dump --fasta 60 SRR7217470 2>&1 &

### Download SRR5962942 which represents WGS of Streptococcus pneumoniae
nohup fastq-dump --fasta 60 SRR5962942 2>&1 &

### Download SRR26202532 which represents WGS of Bordetella pertussis
nohup fastq-dump --fasta 60 SRR26202532 2>&1 &

### Download SRR2830253, reads of a healthy human lung microbiome
nohup fastq-dump --fasta 60 SRR2830253 2>&1 &
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
# Create the example sample data for a patient with respiratory symptoms seeks to find out the pathogen that is causing them these symptoms.

# Before moving on. Make sure reads needed to create sample dataset are available. Please reference create_reference_database.md

# Create samples that will be loaded to the 96-well tray

# Negative control, so just reads from a healthy lung
cat SRR2830253.fasta negative_control_well_11.fasta

# Positive control with H. influenzae
cat SRR25626360.fasta SRR2830253.fasta > positive_control_well_23.fasta

# Sample 1
cat SRR25626360.fasta SRR2830253.fasta SRR25626360.fasta > positive_control_well_64.fasta

# Sample 2
cat SRR24210460.fasta SRR2830253.fasta SRR25626360.fasta > sample_well_80.fasta

# I check one of my negative controls, which is a healthy lung example and we should not detect any bacteria here
# no contamination

# I check one of my positive controls for M. pneumonaie which should not have H. influenzae
# no contamination

# I check one of my positive controls for H. influenzae which should not have M. pneumonaie
# contamination

# I check one of my samples for H. influenzae which should not have M. pneumonaie
# contamination

Original file line number Diff line number Diff line change
@@ -0,0 +1,95 @@
# Pathogen Detection Example
A research is being conducted on how microbial communities are being shaped dependin on the type of respiratory disease you have. Samples were collected from two patience in wihch one patient is M. pneumoniae positive and the other in H. influenzae. To save time and many, only one 96-well tray will be used for both samples. Before downstream analysis can be performed, we want to know if cross contamination between samples occured during the loading of the 96-well tray and we randomly choose wells 11, 23, 64, and 80.

Make sure all bacterial reads needed to create your reference dataset also known as a training dataset are available.
```bash
bash 1_before_starting.sh
```
```bash
bash 2_before_starting.sh
```

### Sketch your training dataset and sample to your preference.

#### Using k=31
Note: training and sample datasets are required to have the same ksize. Please note that since we are sketching from a list of genomes. We can use the following sourmash sketch command:
```bash
sourmash sketch fromfile genome_list.csv -p dna,k=31,scaled=1000,abund -o training_database.k31.sig.zip
```

Sketch the negative control reads from well 11
```bash
sourmash sketch dna negative_control_well_11.fasta -p k=31,scaled=1000,abund -o negative_control_well_11.k31.sig.zip
```

Sketch the positive control from well 23
```bash
sourmash sketch dna positive_control_well_23.fasta -p k=31,scaled=1000,abund -o positive_control_well_23.k31.sig.zip
```

Sketch the positive control from well 64
```bash
sourmash sketch dna positive_control_well_64.fasta -p k=31,scaled=1000,abund -o positive_control_well_64.k31.sig.zip
```

Sketch the sample from well 80
```bash
sourmash sketch dna sample_well_80.fasta -p k=31,scaled=1000,abund -o sample_well_80.k31.sig.zip
```

### Make training data for k=31
```bash
python ../../make_training_data_from_sketches.py --ref_file training_database.k31.sig.zip --ksize 31 --ani_thresh 0.95 --out_prefix 'training_database.k31'
```

### Identify whether the patient has a infection and what pathogen is causing the disease.
```bash
python ../../run_YACHT.py --json 'training_database.k31_config.json' --sample_file 'negative_control_well_11.k31.sig.zip' --significance 0.99 --min_coverage 1 0.5 0.1 0.05 0.01 --out_filename 'negative_control_well_11_k31_result.xlsx' --outdir './'
```

```bash
python ../../run_YACHT.py --json 'training_database.k31_config.json' --sample_file 'positive_control_well_23.k31.sig.zip' --significance 0.99 --min_coverage 1 0.5 0.1 0.05 0.01 --out_filename 'positive_control_well_23_k31_result.xlsx' --outdir './'
```

```bash
python ../../run_YACHT.py --json 'training_database.k31_config.json' --sample_file 'positive_control_well_64.k31.sig.zip' --significance 0.99 --min_coverage 1 0.5 0.1 0.05 0.01 --out_filename 'positive_control_well_64_k31_result.xlsx' --outdir './'
```

```bash
python ../../run_YACHT.py --json 'training_database.k31_config.json' --sample_file 'sample_well_80.k31.sig.zip' --significance 0.99 --min_coverage 1 0.5 0.1 0.05 0.01 --out_filename 'sample_well_80_k31_result.xlsx' --outdir './'
```

### Results
Using a ksize of 31 at ANI 0.95, YACHT finds XYZ

## Let's decrease ANI to 0.50

### Make training data for k=31
```bash
python ../../make_training_data_from_sketches.py --ref_file training_database.k31.sig.zip --ksize 31 --ani_thresh 0.50 --out_prefix 'training_database.k31_ani0.50'
```

### Pathogen Detection using YACHT
Identify whether the patient has a infectin and what pathogen is causing the disease.
```bash
python ../../run_YACHT.py --json 'training_database.k31_ani0.50_config.json' --sample_file 'negative_control_well_11.k31.sig.zip' --significance 0.99 --min_coverage 1 0.5 0.1 0.05 0.01 --out_filename 'k31_ani0.50_result.xlsx' --outdir './'
```

Identify whether the patient has a infectin and what pathogen is causing the disease.
```bash
python ../../run_YACHT.py --json 'training_database.k31_ani0.50_config.json' --sample_file 'positive_control_well_23.k31.sig.zip' --significance 0.99 --min_coverage 1 0.5 0.1 0.05 0.01 --out_filename 'k31_ani0.50_result.xlsx' --outdir './'
```

Identify whether the patient has a infectin and what pathogen is causing the disease.
```bash
python ../../run_YACHT.py --json 'training_database.k31_ani0.50_config.json' --sample_file 'positive_control_well_64.k31.sig.zip' --significance 0.99 --min_coverage 1 0.5 0.1 0.05 0.01 --out_filename 'k31_ani0.50_result.xlsx' --outdir './'
```

Identify whether the patient has a infectin and what pathogen is causing the disease.
```bash
python ../../run_YACHT.py --json 'training_database.k31_ani0.50_config.json' --sample_file 'sample_well_80.k31.sig.zip' --significance 0.99 --min_coverage 1 0.5 0.1 0.05 0.01 --out_filename 'k31_ani0.50_result.xlsx' --outdir './'
```


### Results
Decreasing ANI to 0.50 and using a ksize of 31, YACHT finds XYZ
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
0,name,genome_filename,protein_filename
1,SRR25626360,SRR25626360.fasta,
2,SRR24210460,SRR24210460.fasta,
3,SRR7217470,SRR7217470.fasta,
4,SRR5962942,SRR5962942.fasta,
5,SRR26202532,SRR26202532.fasta,

0 comments on commit 6240cff

Please sign in to comment.