Home
Jupyter, RCurl, rjson, IRkernel, pheatmap, ggplot2, RColorBrewer
BacDiveApiCrawler.R CleanProTrait.R CombineData.R SourceMicrobialData.ipynb
BacDiveCrawler()
retrieves information from the BacDive API, organizing it into a formatted table
BacDiveCrawler(usrname, pass, num_requests = 10, save_file = TRUE)
usrname the username for a verified BacDive account
pass the password for a corresponding BacDive account
num_requests the number of bacterial entries to asynchronously download
save_file if true, saves a .csv to the working directory containing the information extracted from the BacDive API
Designed to traverse the API provided by BacDive. The BacDive API exposes a database that can be easily queried and returns microbial physiology data in JSON format. Each species has its own ‘page’, which details information such as taxonomy, morphology, strain information, and more. This script currently records only a selected subset of traits, so more data could be extracted if support for it were implemented.
Returns a data.frame containing information extracted from BacDive
Because this traverses the site’s API, its speed is limited by the internet connection and by how quickly the site’s server responds. In addition, requesting too many bacterial entries at once may be too demanding for BacDive's server.
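As an illustration of recording only selected traits from a parsed entry, here is a minimal base-R sketch. The nested list, its field names, and the `get_field` helper are all hypothetical; the real BacDive JSON is structured differently and BacDiveApiCrawler.R may extract traits in another way.

```r
# Hypothetical parsed entry, standing in for what a JSON parser would return.
entry <- list(
  taxonomy = list(genus = "Escherichia", species = "Escherichia coli"),
  morphology = list(gram_stain = "negative")
)

# Walk a nested list along `path`; return NA if any key is missing.
get_field <- function(x, path) {
  for (key in path) {
    if (!is.list(x) || is.null(x[[key]])) return(NA_character_)
    x <- x[[key]]
  }
  as.character(x)
}

# Traits to record, each mapped to its (assumed) location in the entry.
wanted <- list(
  genus = c("taxonomy", "genus"),
  species = c("taxonomy", "species"),
  gram_stain = c("morphology", "gram_stain"),
  oxygen = c("physiology", "oxygen_tolerance")  # absent above, so it becomes NA
)

# One-row data.frame with one column per selected trait.
row <- as.data.frame(lapply(wanted, function(p) get_field(entry, p)),
                     stringsAsFactors = FALSE)
```

Rows built this way for each entry could then be stacked with `rbind()`; missing traits stay as `NA` rather than causing an error.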
CleanProTrait()
retrieves information from a file downloaded from the ProTrait Atlas, formatting it into a table
CleanProTrait(save_file = TRUE)
save_file if true, saves a .csv to the working directory containing the information extracted from the ProTrait Atlas
Designed to extract information from a table created by ProTrait. The ProTrait table does not group traits into general categories; instead, each specific trait (gram-positive, pathogenic in animals, aerobe, etc.) has its own column. This script therefore reorganizes the table into generalized traits, making it easier to use for purposes such as annotation. It first checks whether the ProTrait file already exists in the working directory. If it does not, the script downloads the file to the working directory before formatting it.
Returns a data.frame containing information extracted from ProTrait
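The reorganization into generalized traits can be sketched as follows. This is a toy example in base R: the column names, the 0/1 encoding, and the `collapse_group` helper are assumptions for illustration and do not reflect ProTrait's real headers or what CleanProTrait.R does internally.

```r
# Toy stand-in for the ProTrait table: one 0/1 column per specific trait.
wide <- data.frame(
  species = c("A", "B", "C"),
  gram_positive = c(1, 0, 0),
  gram_negative = c(0, 1, 0),
  aerobe = c(1, 0, 1),
  anaerobe = c(0, 1, 0),
  stringsAsFactors = FALSE
)

# Map each generalized trait to the specific columns that belong to it.
groups <- list(
  gram_stain = c("gram_positive", "gram_negative"),
  oxygen = c("aerobe", "anaerobe")
)

# For every row, report the name of the flagged column in the group
# (NA if none is flagged; ties resolve to the first flagged column).
collapse_group <- function(df, cols) {
  unname(apply(df[, cols, drop = FALSE], 1, function(r) {
    hit <- cols[r == 1]
    if (length(hit) == 0) NA_character_ else hit[1]
  }))
}

tidy <- data.frame(species = wide$species,
                   lapply(groups, function(cols) collapse_group(wide, cols)),
                   stringsAsFactors = FALSE)
```

Here `tidy` has one column per generalized trait (`gram_stain`, `oxygen`) instead of one column per specific trait.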
CombineData()
combines the given tables into a single, formatted table
CombineData(protrait, bacdive, save_file = TRUE)
protrait a data.frame containing the traits sourced from the ProTrait Atlas
bacdive a data.frame containing the traits sourced from the BacDive database
save_file if true, saves a .csv to the working directory containing the information resulting from the combined table
CombineData.R is a script that merges the tables extracted from BacDive and ProTrait. This script is required because the column labels produced for each of these tables are different and there are different traits extracted in general. This script works to create one cohesive table.
Returns a data.frame containing information from the combined tables
This script cannot merge two arbitrary tables; it is limited to the tables produced by the ProTrait and BacDive functions. It also does not yet handle duplicate species.
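To show what reconciling column labels and differing trait sets can involve, here is a minimal base-R sketch on toy tables. The column names and the `pad` helper are hypothetical, and the final deduplication step is only one possible approach; CombineData.R itself does not handle duplicates.

```r
# Toy inputs; column names are illustrative, not the scripts' real output.
protrait <- data.frame(Organism = c("A", "B"), Gram = c("positive", "negative"),
                       stringsAsFactors = FALSE)
bacdive <- data.frame(species = c("B", "C"), gram_stain = c("negative", "positive"),
                      oxygen = c("anaerobe", "aerobe"), stringsAsFactors = FALSE)

# 1. Rename columns to a shared vocabulary.
names(protrait) <- c("species", "gram_stain")

# 2. Add columns missing from either table, filled with NA, so rbind() lines up.
all_cols <- union(names(protrait), names(bacdive))
pad <- function(df) {
  for (col in setdiff(all_cols, names(df))) df[[col]] <- NA_character_
  df[, all_cols]
}
combined <- rbind(pad(protrait), pad(bacdive))

# 3. One possible way to drop duplicate species: keep the first row per species.
#    This discards later rows instead of merging their values.
deduped <- combined[!duplicated(combined$species), ]
```

Keeping the first row means species "B" retains the ProTrait row, whose `oxygen` is `NA`, even though the BacDive row had a value. A real merge would need a rule for combining such rows.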
HeatMap.R
load.abundance.data()
loads an abundance table from a .csv file in the format required by the heat map functions
load.abundance.data(path, column = 1)
path the path from the working directory to the .csv file containing the abundance table
column the column number containing the feature names
The abundance table needs to be loaded into R so that the row names are the feature names, the column names are the sample names, and all values are numeric.
Returns a numerical matrix created from the abundance table
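A minimal base-R sketch of loading a table into that shape is shown below. The file contents are made up, and this is not necessarily how load.abundance.data() is implemented.

```r
# Write a tiny abundance table to a temporary file so the example is self-contained.
path <- tempfile(fileext = ".csv")
writeLines(c("feature,sample1,sample2",
             "taxonA,10,0",
             "taxonB,3,7"), path)

# row.names = 1 uses the first column as row names (compare the `column` argument).
abund <- as.matrix(read.csv(path, row.names = 1, check.names = FALSE))

# Force double storage; any non-numeric text would become NA with a warning.
storage.mode(abund) <- "double"
```

The result has features as row names, samples as column names, and only numeric values.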
load.meta.data()
loads metadata from a .csv file in the format required by the heat map functions
load.meta.data(path, tax_column = 1)
path the path from the working directory to the .csv file containing the metadata
tax_column the column number containing the identifying name (taxonomic or sample name) for the metadata
This can be used to load feature or sample metadata. Metadata needs to be loaded in such a way that the row names are the identifying names and the traits are the column names.
Returns a data.frame containing information extracted from the metadata
This will eliminate all duplicate entries from the metadata without merging their data, which can result in data loss.
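The data-loss risk can be illustrated with a small base-R example. It assumes the first occurrence of each name is the one kept, which is an assumption about load.meta.data() rather than something the source states.

```r
meta <- data.frame(
  name = c("taxonA", "taxonA", "taxonB"),
  oxygen = c(NA, "aerobe", "anaerobe"),
  stringsAsFactors = FALSE
)

# Keeping only the first row per name discards the second taxonA row,
# and with it the only non-missing oxygen value for that taxon.
kept <- meta[!duplicated(meta$name), ]

# Use the identifying names as row names, as the heat map functions expect.
rownames(kept) <- kept$name
kept$name <- NULL
```

After this step `kept["taxonA", "oxygen"]` is `NA`, so it can be worth merging duplicate rows by hand before loading.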
create.correlogram()
creates a heat map based on the correlation of features given an abundance table and feature metadata.
create.correlogram(data, feature_meta, show = TRUE)
data abundance data in a numerical matrix
feature_meta a data.frame containing feature metadata
show if true, will display the graph upon completion
The features need to be the rows of the abundance data.
Returns a pheatmap with the following components: row hclusters, column hclusters, kmeans, and gtable
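For intuition about what a feature correlogram is built from, the sketch below computes a feature-by-feature correlation matrix and a clustering order in base R. The abundance values are invented, and the choice of Pearson correlation and of `1 - correlation` as a distance are assumptions; create.correlogram() may use different settings.

```r
# 4 features (rows) x 5 samples (columns) of made-up abundances.
abund <- matrix(c(10, 12, 3, 1, 6,
                   8,  9, 2, 0, 5,
                   1,  0, 7, 8, 4,
                   0,  1, 9, 6, 3),
                nrow = 4, byrow = TRUE,
                dimnames = list(paste0("taxon", 1:4), paste0("s", 1:5)))

# cor() correlates columns, so transpose to correlate features with each other.
feature_cor <- cor(t(abund))

# Hierarchical clustering on correlation distance gives a dendrogram order.
ord <- hclust(as.dist(1 - feature_cor))$order
clustered <- feature_cor[ord, ord]
```

A matrix like `feature_cor` could then be passed to `pheatmap::pheatmap()`, which performs its own row and column clustering by default.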
create.heatmap()
creates a heat map based on relative abundance, with row and column dendrograms based on given metadata
create.heatmap(data, sample_meta, feature_meta, percentile = 0.75, show = FALSE, omit_na = TRUE)
data abundance data in a numerical matrix
sample_meta a data.frame containing sample metadata
feature_meta a data.frame containing feature metadata
percentile a filter for displaying only entries with a threshold correlation
show if true, will display the graph upon completion
omit_na whether to eliminate entries that are missing metadata
The features need to be the rows of the abundance data.
Returns a pheatmap with the following components: row hclusters, column hclusters, kmeans, and gtable
one.v.all()
uses the create.heatmap function, but filters the metadata such that it labels only a single feature category and type, labeling all others as 'other'
one.v.all(data, sample_meta, feature_meta, which = 2, percentile = 0.75, show = FALSE, column, trait)
data abundance data in a numerical matrix
sample_meta a data.frame containing sample metadata
feature_meta a data.frame containing feature metadata
which a number representing whether to filter the sample(1) or feature(2) metadata
percentile a filter for displaying only entries with a threshold correlation
show if true, will display the graph upon completion
column the column number with the feature category
trait the specific feature type to use
Compares one feature type against all others in a feature category (e.g. aerobic respiration vs. all other oxygen requirements). The features need to be the rows of the abundance data. Any number of feature categories can be supplied, but only one will be used.
Returns a pheatmap with the following components: row hclusters, column hclusters, kmeans, and gtable
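The relabeling step behind a one-versus-all comparison can be sketched in base R as follows. The `relabel_one_v_all` helper and the toy metadata are hypothetical and only mirror the `column` and `trait` arguments described above.

```r
feature_meta <- data.frame(
  oxygen = c("aerobe", "anaerobe", "facultative", "aerobe"),
  row.names = paste0("taxon", 1:4),
  stringsAsFactors = FALSE
)

# Relabel one metadata column so that only `trait` keeps its name;
# every other non-missing value becomes "other".
relabel_one_v_all <- function(meta, column, trait) {
  values <- as.character(meta[[column]])
  meta[[column]] <- ifelse(is.na(values), NA_character_,
                           ifelse(values == trait, trait, "other"))
  meta
}

relabeled <- relabel_one_v_all(feature_meta, column = 1, trait = "aerobe")
```

With this metadata, the annotation shows only two groups, "aerobe" and "other", which makes a single trait easier to pick out in the heat map.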
all.one.v.all()
uses the one.v.all function to create a heatmap for every feature type found
all.one.v.all(data, sample_meta, feature_meta, which = 2, percentile = 0.75, show = FALSE, column, directory = '')
data abundance data in a numerical matrix
sample_meta a data.frame containing sample metadata
feature_meta a data.frame containing feature metadata
which a number representing whether to filter the sample(1) or feature(2) metadata
percentile a filter for displaying only entries with a threshold correlation
show if true, will display the graph upon completion
column the column number with the feature category
directory the path from the working directory to where the file should be saved
Creates a heatmap for every feature type found (e.g. three forms of oxygen requirement). The features need to be the rows of the abundance data. Any number of feature categories can be supplied, but only one will be used. Files are named automatically based on the trait.
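The per-trait iteration and file naming might look like the sketch below. The folder name, the `.png` extension, and the naming pattern are assumptions, since the source only says that files are named after the trait.

```r
oxygen <- c("aerobe", "anaerobe", "facultative", "aerobe", NA)
directory <- "heatmaps"  # hypothetical output folder

# One heat map per distinct, non-missing feature type.
traits <- sort(unique(oxygen[!is.na(oxygen)]))

# One output file per trait; the extension is an assumption.
files <- file.path(directory, paste0(traits, ".png"))
```

Each element of `traits` would then be passed as the `trait` argument of one.v.all(), with the matching entry of `files` as the output path.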