GitHub - gsmith98/ComputationalGenomicsProject: Computational Genomics Group Project


Computational Genomics

Saranya Akumalla
Cameron Davis
Graham Smith
Jack Zhan

This directory contains all the code we used for the assignment, from our
group's GitHub. Some extra data files and results are not included in the
GitHub repository because of their size.

diamond-0.8.26 contains all the source code for diamond, along with its
documentation. We added support for different reduced alphabets, which are
selected with the --reduced-alphabet flag, followed by a reduced alphabet
with groups separated by underscores.
Diamond requires a database, nr.dmnd, that is built using its makedb command
(examples in diamond's documentation). With this built, the alignment commands
can then be run. We have one built on our AWS instance, but it is a bit too
large to upload.
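
As an illustration of the underscore-separated format, here is a minimal
Python sketch that splits such a string into groups and builds an argument
list for an alignment run. It is not code from this repository. The function
names, the example grouping, and the file names are assumptions; only the
--reduced-alphabet flag comes from our modified diamond, while -d, -q and -o
are standard diamond options.

```python
def parse_reduced_alphabet(spec):
    """Map each residue letter to the index of its underscore-separated group."""
    mapping = {}
    for index, group in enumerate(spec.split("_")):
        if not group:
            raise ValueError("empty group in %r" % spec)
        for residue in group:
            if residue in mapping:
                raise ValueError("residue %r appears twice" % residue)
            mapping[residue] = index
    return mapping


def blastx_command(alphabet, database="nr", query="reads.fna",
                   out="matches.m8"):
    """Build (but do not run) an argument list for the modified diamond.

    The database, query and output names are placeholders.
    """
    parse_reduced_alphabet(alphabet)  # fail early on a malformed alphabet
    return ["diamond", "blastx", "-d", database, "-q", query, "-o", out,
            "--reduced-alphabet", alphabet]


# Illustrative grouping of the 20 amino acids into 11 groups.
EXAMPLE_ALPHABET = "KREDQN_C_G_H_ILV_M_F_Y_W_P_STA"
```

The returned list can be passed to subprocess.call on a machine where the
modified diamond binary and the nr.dmnd database are available.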

Database_Downloader.py needs to be run with Python 2.7. We used the search
query "ecoli" and database "protein" (prompted at runtime). We used 37 threads
to download the 37 URLs. Output is put in the directory "database_files",
created by the program. final_database.fasta is the compiled and filtered
sequence.
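
The one-thread-per-URL pattern described above can be sketched as follows.
This is not the code in Database_Downloader.py: it is written for Python 3
rather than 2.7, the output file naming is made up, and the fetch argument is
a hypothetical callable that takes a URL and returns bytes (so the sketch can
be tried without network access).

```python
import os
from concurrent.futures import ThreadPoolExecutor


def download_all(urls, fetch, out_dir="database_files"):
    """Fetch every URL on its own thread and save each result to out_dir.

    Returns the list of written file paths, in the same order as urls.
    """
    os.makedirs(out_dir, exist_ok=True)

    def worker(job):
        index, url = job
        # Hypothetical naming scheme: one numbered file per URL.
        path = os.path.join(out_dir, "part_%02d.fasta" % index)
        with open(path, "wb") as handle:
            handle.write(fetch(url))
        return path

    # One worker per URL, e.g. 37 threads for 37 URLs.
    with ThreadPoolExecutor(max_workers=max(1, len(urls))) as pool:
        return list(pool.map(worker, enumerate(urls)))
```

For real downloads, fetch could be something like
lambda url: urllib.request.urlopen(url).read().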

gatherData.py runs successive tests of diamond with varying settings. The
settings can be tweaked in the program.
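
A settings sweep of this kind could look roughly like the sketch below. It
does not reproduce gatherData.py: the option names are invented for
illustration, and run_one is a hypothetical callable that would launch one
diamond run for a given settings dictionary and return its result.

```python
import itertools


def settings_grid(options):
    """Yield one dict per combination of the listed option values."""
    names = sorted(options)
    for values in itertools.product(*(options[name] for name in names)):
        yield dict(zip(names, values))


def run_sweep(options, run_one):
    """Call run_one(settings) for every combination and collect the results."""
    return [(settings, run_one(settings))
            for settings in settings_grid(options)]
```

Changing which settings are tested then only means editing the options
dictionary passed to run_sweep.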

parseDiamondOutput.py is just used by gatherData.py to parse the results.
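
For reference, a minimal sketch of reading .m8 output is shown below. It
assumes the standard 12-column BLAST tabular layout and is not the code in
parseDiamondOutput.py; the Hit type and function names are made up here.

```python
from collections import namedtuple

# Column names of the standard 12-column BLAST tabular (.m8) format.
M8_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
             "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
Hit = namedtuple("Hit", M8_FIELDS)


def parse_m8_line(line):
    """Convert one tab-separated .m8 line into a Hit with typed fields."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != len(M8_FIELDS):
        raise ValueError("expected %d columns, got %d"
                         % (len(M8_FIELDS), len(parts)))
    values = [parts[0], parts[1], float(parts[2])]
    values.extend(int(field) for field in parts[3:10])
    values.extend([float(parts[10]), float(parts[11])])
    return Hit(*values)


def parse_m8(lines):
    """Parse an iterable of lines, skipping blank and comment lines."""
    return [parse_m8_line(line) for line in lines
            if line.strip() and not line.startswith("#")]
```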

ecoli.py can be used to get reads from sequence.fasta, and place them into
reads.fna. We generated our reads and have them stored in reads.fna (and
ecoli_reads.fna).
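
One simple way to draw reads from a reference sequence is sketched below. It
is not necessarily what ecoli.py does: uniform random start positions, a
fixed read length, the seed, and the read naming are all assumptions made for
illustration, and the FASTA reader assumes a single-record file.

```python
import random


def read_fasta_sequence(lines):
    """Join the sequence lines of a single-record FASTA file into one string."""
    return "".join(line.strip() for line in lines
                   if line.strip() and not line.startswith(">"))


def sample_reads(genome, count, length, seed=0):
    """Return count substrings of the given length at random positions."""
    if length > len(genome):
        raise ValueError("read length exceeds sequence length")
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    reads = []
    for _ in range(count):
        start = rng.randrange(len(genome) - length + 1)
        reads.append(genome[start:start + length])
    return reads


def format_fasta(reads, prefix="read"):
    """Render reads as FASTA text with hypothetical names read_0, read_1, ..."""
    return "".join(">%s_%d\n%s\n" % (prefix, i, read)
                   for i, read in enumerate(reads))
```

With these helpers, the lines of sequence.fasta would go through
read_fasta_sequence, and the output of format_fasta would be written to a
file such as reads.fna.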

make_hist.py and line_plot.py output to the directory figs/, creating data
from the .m8 results files.

Our write-up also discusses some of the programs and how we ran them.