Development of CoVizu is primarily being carried out on workstations and servers running Ubuntu 16.04+, but we have also run the system on desktop computers running macOS 10.13. The following software is required:
- a C build environment
- Mozilla geckodriver v0.26+
- Python 3.6 or higher, and the following modules:
- GNU sed stream editor
- TN93 v1.0.6
- FastTree2 version 2.1.10+, compiled for double precision
- TreeTime version 0.7.5+
- R 3.6+, and the following packages:
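Before running the pipeline, you could use a script like the one below as a quick sanity check. It confirms that a suitable Python interpreter is in use and that the command-line tools listed above can be found on your PATH. This is a minimal sketch and is not part of CoVizu. The executable names in REQUIRED_EXECUTABLES are assumptions that depend on how each tool was installed. The script only checks that each program exists. It does not check program versions, Python modules, R packages, or the C build environment.

```python
import shutil
import sys

# Hypothetical executable names for the tools listed above; the actual
# names depend on how each tool was installed (e.g. a double-precision
# FastTree build may be installed under a different name).
REQUIRED_EXECUTABLES = ["sed", "geckodriver", "tn93", "FastTree", "treetime", "Rscript"]


def python_version_ok(version_info=None, minimum=(3, 6)):
    """
    Check that a Python version meets the minimum requirement.

    :param version_info: tuple, version to check (defaults to the running interpreter)
    :param minimum: tuple, minimum (major, minor) version required
    :return: bool, True if version_info is at least minimum
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= minimum


def find_missing(executables):
    """
    Look up programs on the PATH.

    :param executables: list, names of programs to look up
    :return: list, names that could not be found
    """
    return [name for name in executables if shutil.which(name) is None]


if __name__ == "__main__":
    if not python_version_ok():
        sys.exit("Python 3.6 or higher is required")
    missing = find_missing(REQUIRED_EXECUTABLES)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required executables were found")
```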
To retrieve data from GISAID, you must agree to the terms of their Database Access Agreement and then complete and submit the registration form. Note that access credentials are only necessary for back-end development. For front-end development, we upload the necessary JSON files to our GitHub repository. These files do not contain any genome sequence data. They store only sequence labels, associated metadata (date and country of sample collection), and the results of our clustering and phylogenetic reconstruction analyses.
The automated GISAID download scripts (SeliniumAutobot.py and …) require the following environment variables. You can define them by adding lines such as the following to the .bashrc shell script in your home directory:
export gisaid_u_variable='<your GISAID username>'
export gisaid_pw_variable='<your GISAID password>'
This does mean that these database access credentials are written to a plain text file in your filesystem. Consequently, you should only use this approach if you are running CoVizu in a fairly secure computing environment (e.g., a physically- and password-secured workstation, ideally with multi-factor authentication).
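A Python script can read these credentials from its environment through os.environ. The sketch below is a hypothetical helper. get_gisaid_credentials is not a function from the CoVizu scripts, which may read the variables differently. If either variable is missing, it fails with a clear message instead of passing empty values on to the download step.

```python
import os


def get_gisaid_credentials(environ=None):
    """
    Read GISAID credentials from the gisaid_u_variable and
    gisaid_pw_variable environment variables.

    :param environ: dict-like, environment to read from (defaults to os.environ)
    :return: tuple, (username, password)
    :raises RuntimeError: if either variable is missing or empty
    """
    if environ is None:
        environ = os.environ
    username = environ.get("gisaid_u_variable")
    password = environ.get("gisaid_pw_variable")
    if not username or not password:
        raise RuntimeError(
            "Define gisaid_u_variable and gisaid_pw_variable "
            "(e.g. in ~/.bashrc) before running the download scripts"
        )
    return username, password
```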
The download scripts can be automated with crontab on Linux. For example, the following entry runs autobot.py every day at midnight (minute 0, hour 0):
0 0 * * * nohup python /home/covid/autobot.py >> /home/covid/Autobot.log 2>&1
Note that the absolute paths above are specific to the filesystem on our own server. Replace them with paths appropriate to your own system.
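Because the paths must be adapted to your own system, it can help to generate the crontab entry rather than edit it by hand. make_cron_entry is a hypothetical helper. It is a sketch that assumes you keep the same nohup and append-to-log pattern as the entry above. It rebuilds that line from your own paths and labels the five scheduling fields.

```python
import shlex


def make_cron_entry(script_path, log_path, minute=0, hour=0, python="python"):
    """
    Build a crontab line that runs a download script once a day.

    :param script_path: str, absolute path to the script on your system
    :param log_path: str, absolute path of the log file to append to
    :param minute: int, minute of the hour (0-59) at which to run
    :param hour: int, hour of the day (0-23) at which to run
    :param python: str, Python interpreter to invoke
    :return: str, a line suitable for adding with crontab -e
    """
    # The five schedule fields are: minute, hour, day of month, month, day of week.
    # '*' in the last three fields means "every day".
    schedule = f"{minute} {hour} * * *"
    # shlex.quote wraps paths containing spaces or other shell-special characters.
    command = (
        f"nohup {python} {shlex.quote(script_path)} "
        f">> {shlex.quote(log_path)} 2>&1"
    )
    return f"{schedule} {command}"
```

Paste the returned line into your crontab with crontab -e. Cron runs jobs with a minimal environment and does not normally read ~/.bashrc. Variables exported there, such as the GISAID credentials, may therefore not be visible to the scheduled job unless your crontab or script sets them.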
- Use four spaces to indent, and follow the other conventions described in the PEP8 style guide. A Python-aware IDE such as PyCharm or Atom can automate this for you.
- Organize your code into functions to facilitate testing and so that methods can be called from other scripts.
- Isolate the main loop of your code under if __name__ == '__main__' so that it is executed only when the script is run at the top level (from the command line).
- Use argparse where possible to provide help documentation for command line execution.
- Use lowercase with underscores to separate words in function and variable names. Do not use camel-case.
- Every function should open with a docstring. If the function is very brief, with a small number of self-explanatory arguments, a one-line docstring is fine. Otherwise, use a multi-line docstring. Use :param varname: description entries to document arguments, and :return: type, description entries to document return values.
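The toy script below shows these conventions together. It uses snake_case names, functions that can be imported and tested, :param: and :return: docstrings, argparse for the command line interface, and a main guard. The function names and the per-country counting task are illustrative only and do not come from the CoVizu code base.

```python
import argparse
from collections import Counter


def count_by_country(countries):
    """
    Tally the number of samples collected in each country.

    :param countries: list, country of sample collection for each record
    :return: dict, number of samples keyed by country name
    """
    return dict(Counter(countries))


def parse_args(argv=None):
    """Parse command line arguments; argparse also generates the --help text."""
    parser = argparse.ArgumentParser(
        description="Count samples per country (illustrative example)."
    )
    parser.add_argument(
        "countries", nargs="*", help="country of collection, one entry per sample"
    )
    return parser.parse_args(argv)


def main(argv=None):
    """Run the script from the command line."""
    args = parse_args(argv)
    for country, count in sorted(count_by_country(args.countries).items()):
        print(f"{country}\t{count}")


# Only runs when the file is executed directly, not when it is imported.
if __name__ == "__main__":
    main()
```

If the file were saved under a hypothetical name such as count_countries.py, running python count_countries.py Canada Canada Peru would print a tab-separated count for each country. Running python count_countries.py --help would show the usage text that argparse generates.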
- Indent with two spaces
- Use base R whenever possible
- Use . to separate words in variable and function names, not _
- Use the #' prefix to document functions, e.g.:
    #' @param node: str, label of current node variant
    #' @param parent: str, label of current node's parental variant
    #' @param el: str, edge list from minimum spanning tree
    #' @return linearized vector of parent->child pairs
    traverse <- function(node, parent, el, edges=c()) {
      # ... function body omitted ...
    }
- Place any package requirements (e.g., require(igraph)) at the top of the script.
Do not push commits to the master branch. All development should be tracked in the dev branch and then merged into master after it has been tested.
If you are implementing something new that could break parts of the code that other developers are working on, create a new branch and merge it into dev after you have finished building and testing the code.
Please try to write concise and informative commit messages.