Development environment

Development of CoVizu is primarily being carried out on workstations and servers running Ubuntu 16.04+. However, we have also run the system on desktop computers running macOS 10.13.

Dependencies

GISAID database access

To retrieve data from GISAID, you must agree to the terms of their Database Access Agreement and then complete and submit the registration form. Access credentials are only necessary if you are doing back-end development. For front-end development, we upload the necessary JSON files to our GitHub repository. These files do not contain any genome sequence data; they store only sequence labels, associated metadata (date and country of sample collection), and results from our clustering and phylogenetic reconstruction analyses.

The automated GISAID download scripts (SeliniumAutobot.py and ) require the following environment variables. One way to define them is to add these lines to the .bashrc shell script in your home directory:

export gisaid_u_variable='<your GISAID username>'
export gisaid_pw_variable='<your GISAID password>'

This does mean that these database access credentials are written to a plain text file in your filesystem. Consequently, you should only use this approach if you are running CoVizu in a fairly secure computing environment (e.g., a physically- and password-secured workstation, ideally with multi-factor authentication).
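As a minimal sketch of how a download script could pick up these variables, the hypothetical helper below reads them from the environment and fails with a clear message if either is missing. The function name `get_gisaid_credentials` and its `environ` argument are assumptions for illustration and are not taken from the CoVizu scripts.

```python
import os


def get_gisaid_credentials(environ=None):
    """
    Read GISAID credentials from environment variables.
    Hypothetical helper, not part of the CoVizu code base.

    :param environ: dict-like, mapping of environment variables;
                    defaults to os.environ
    :return: tuple, (username, password)
    """
    if environ is None:
        environ = os.environ

    # variable names as exported in the .bashrc example above
    keys = ('gisaid_u_variable', 'gisaid_pw_variable')
    missing = [key for key in keys if not environ.get(key)]
    if missing:
        raise RuntimeError(
            "Missing environment variable(s): {}".format(', '.join(missing))
        )
    return environ[keys[0]], environ[keys[1]]
```

Accepting the mapping as an argument keeps the function testable without touching the real environment.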

The downloading scripts can be automated through crontab on Linux:

0 0 * * * nohup python /home/covid/autobot.py >> /home/covid/Autobot.log 2>&1

Note that the absolute paths above are specific to the filesystem on our own server; replace them with your own paths!

Coding style

Python

  • Use four spaces to indent, and other conventions described in the PEP8 style guide. It is easier to use a Python-aware IDE like PyCharm or Atom to automate this for you.

  • Organize your code into functions to facilitate testing and so that methods can be called from other scripts.

  • Isolate the main loop of your code under if __name__ == '__main__' to be executed only if the script is being run at the top level (from the command line).

  • Try to use argparse to provide help documentation for command line execution.

  • Use lowercase with underscores to separate words for function and variable names. Do not use camel-case.

  • Every function should open with a docstring. If the function is very brief with a small number of self-explanatory arguments, then a one-line docstring is fine. Otherwise, use a multi-line docstring. Use :param varname: description entries to document arguments. Use :return: type, description entries to document return values.
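The conventions above can be sketched in a small script skeleton. Everything in it is hypothetical (the task, the function names `count_labels`, `parse_args` and `main`, and the default prefix); it only illustrates the layout: functions with docstrings, `argparse` for help text, lowercase names with underscores, and the main routine isolated under `if __name__ == '__main__'`.

```python
import argparse


def count_labels(labels, prefix):
    """
    Count the sequence labels that start with a given prefix.

    :param labels: list, sequence labels (str)
    :param prefix: str, prefix to match at the start of each label
    :return: int, number of labels that start with the prefix
    """
    return sum(1 for label in labels if label.startswith(prefix))


def parse_args(argv=None):
    """
    Define and parse the command-line interface.

    :param argv: list, argument strings; defaults to sys.argv[1:]
    :return: argparse.Namespace, parsed arguments
    """
    parser = argparse.ArgumentParser(
        description="Count sequence labels that start with a prefix."
    )
    parser.add_argument('labels', nargs='*',
                        help="input, zero or more sequence labels")
    parser.add_argument('--prefix', default='hCoV-19',
                        help="option, prefix to match (hypothetical default)")
    return parser.parse_args(argv)


def main(argv=None):
    """ Parse arguments and print the number of matching labels. """
    args = parse_args(argv)
    print(count_labels(args.labels, args.prefix))


if __name__ == '__main__':
    # executed only when the script is run from the command line
    main()
```

Because the work is done in `count_labels`, another script can import and call it directly without triggering the command-line interface.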

R

  • Indent with two spaces
  • Use base R whenever possible
  • Use . to separate words in variable and function names, not _
  • Use #' prefix to document functions, e.g.:
    #' @param node: str, label of current node variant
    #' @param parent: str, label of current node's parental variant
    #' @param el: str, edge list from minimum spanning tree
    #' @return linearized vector of parent->child pairs
    traverse <- function(node, parent, el, edges=c()) {
  • Place any package requirements (e.g., require(igraph)) at the top of the script

Committing your code

Do not push commits to the master branch. All development should be tracked in the dev branch and then merged into master after it has been tested.

If you are trying to implement something new that can potentially break parts of the code that other developers may be working on, then create a new branch and merge it into dev after you have finished building and testing the code.

Please try to write concise and informative commit messages.