A starter tutorial for bio scientists and engineers wanting to access data sources in an automated fashion using web services.
One of the most frustrating things about domain-specific research is developing your own ideas on top of existing tools and datasets. This tutorial aims to give a straightforward introduction by automating access to a third-party REST API (BiGG).
This tutorial assumes you have a general understanding of programming, primarily in the Python programming language.
It is suggested that you do this tutorial on Linux or OS X (all commands in this tutorial assume one of these platforms).
If you are not familiar with the following concepts, it is highly suggested that you take a few minutes to research them to get a general understanding:
- Python programming language (we will be using python 2.7)
- REST (Representational State Transfer) - Wikipedia
Make sure you create a folder for the work that you will do:
mkdir web-service-bio-tutorial
cd web-service-bio-tutorial
To organize our dependencies, we need to set up our Python environment. We will use virtual environments, which let us create a development sandbox for all of our project dependencies.
pip install virtualenv
virtualenv venv
# Alternatively, use -p to select the Python 2.7 interpreter explicitly
virtualenv -p /usr/bin/python2.7 venv
source venv/bin/activate
If you don't have pip, you may need to install it first (on Ubuntu: sudo apt-get install python-pip).
You may notice now that there is a (venv) marker at the beginning of your terminal line. This means it worked!
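If you would rather confirm the sandbox from inside Python than rely on the prompt marker, a minimal sketch follows. The helper name `in_virtualenv` is made up for this illustration. The check rests on the assumption that older virtualenv releases set `sys.real_prefix`, while newer releases and the built-in venv module make `sys.base_prefix` differ from `sys.prefix`.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases mark the sandbox by setting sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # Newer virtualenv releases and the built-in venv module instead make
    # sys.base_prefix differ from sys.prefix.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


print(in_virtualenv())
```

Run inside the activated environment, this should print True. Outside of it, it should print False.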
Now that we are running in our virtual environment (our sandbox), we need to install a dependency:
Requests - a nice HTTP library that will help us access the BiGG API
To install the dependency, we will run:
pip install requests
If you have completed the tutorial this far, it means you are ready to start coding!
While it may be easy to access information on models, reactions, metabolites, and genes by simply using the frontend search engine of BiGG, the researchers and students at UCSD have also provided us with a RESTful web service to access the data in the BiGG database in an automated fashion.
The information on how to access BiGG via its API can be found in the BiGG documentation. You will notice that they provide curl examples. While it is possible to make system calls to curl from Python, there is a much cleaner way: using the requests library to access the data directly.
Let's begin by creating our main.py file:
touch main.py
Opening main.py in our favorite text editor or IDE, we will begin by importing requests:
# main.py
import requests
Now that we have requests imported, we can use it to access a specific HTTP route provided by the BiGG data access API. From the BiGG documentation, we know we can use curl to access the database version:
curl 'http://bigg.ucsd.edu/api/v2/database_version'
and we expect a response to be of the form:
{
"bigg_models_version": "1.1.0",
"api_version": "v2",
"last_updated": "2016-03-21 17:24:11.138365"
}
We see that this is a JSON-based API, so we know we can use Python dictionaries to represent this data.
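To make the JSON-to-dictionary mapping concrete, here is a small offline sketch. It uses only the standard library `json` module and the sample response shown above, so no network call is involved. The variable names `raw` and `info` are chosen for this illustration only.

```python
import json

# The sample response from the BiGG documentation, as raw JSON text.
raw = """
{
  "bigg_models_version": "1.1.0",
  "api_version": "v2",
  "last_updated": "2016-03-21 17:24:11.138365"
}
"""

# json.loads turns a JSON object into a Python dictionary.
info = json.loads(raw)

print(isinstance(info, dict))       # True
print(info["api_version"])          # v2
print(info["bigg_models_version"])  # 1.1.0
```

Each JSON key becomes a dictionary key, so a value is read with ordinary square-bracket indexing.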
So let's get the info we need in main.py:
import requests
res = requests.get("http://bigg.ucsd.edu/api/v2/database_version")
print "HTTP Response Code: ", res.status_code
bigg_info = res.json()
print "BiGG Model Version: ", bigg_info["bigg_models_version"]
The first line after the import, res = requests.get("..."), makes a RESTful (HTTP) web service call to GET the information we want (in this case, the model version number).
The second line, print ... , res.status_code, simply verifies that the call was successful. You should see a 200 status code, which is the standard HTTP code for a successful request.
The third line (bigg_info = res.json()) is the most important one. Here we take the response object from our HTTP request, grab the payload (in this case, the model version and other things), convert that payload from JSON to a Python dictionary, and assign it to our variable bigg_info.
The fourth line simply prints the bigg_models_version.
Awesome! We now have our first RESTful access of the BiGG API (and it only took 5 lines of code)!
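Printing the status code works for a first look, but in a longer script you may prefer to stop as soon as a call fails. The sketch below is one possible way to do that, not part of the original main.py. `payload_or_error` and `FakeResponse` are hypothetical names introduced here, and the fake response exists only so the helper can be tried without a network connection.

```python
def payload_or_error(res):
    """Return the decoded JSON payload of a response, or raise if the call failed.

    `res` is expected to behave like a requests response: it needs a
    `status_code` attribute and a `json()` method.
    """
    if res.status_code != 200:
        raise RuntimeError("Request failed with HTTP status {0}".format(res.status_code))
    return res.json()


class FakeResponse(object):
    """Offline stand-in for a requests response (illustration only)."""

    def __init__(self, status_code, payload):
        self.status_code = status_code
        self._payload = payload

    def json(self):
        return self._payload


# A successful call hands back the dictionary.
ok = FakeResponse(200, {"api_version": "v2"})
print(payload_or_error(ok)["api_version"])  # v2

# A failed call raises instead of silently continuing.
try:
    payload_or_error(FakeResponse(404, {}))
except RuntimeError as err:
    print(err)  # Request failed with HTTP status 404
```

In main.py the same helper could wrap the real call, for example payload_or_error(requests.get("http://bigg.ucsd.edu/api/v2/database_version")). The requests library also provides res.raise_for_status(), which raises on 4xx and 5xx responses.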
OK, so we got some basic information, but what if we wanted to get information about a specific reaction such as Adenosine deaminase? Let's go to the BiGG docs:
Looks like there is a general reaction access route: curl 'http://bigg.ucsd.edu/api/v2/universal/reactions/ADA'. Let's use that. Notice that the print statements from the bigg_info step have been removed.
import requests
# Get BiGG database info
res = requests.get("http://bigg.ucsd.edu/api/v2/database_version")
bigg_info = res.json()
# Get ADA reaction from BiGG
res_reaction = requests.get("http://bigg.ucsd.edu/api/v2/universal/reactions/ADA")
ada_reaction_info = res_reaction.json()
metabolite_bigg_id = ada_reaction_info["metabolites"][0]["bigg_id"]
# Get more information about every metabolite associated with the ADA reaction
for metabolite in ada_reaction_info["metabolites"]:
    # Build the universal metabolite route from the metabolite's bigg_id
    met_url = "http://bigg.ucsd.edu/api/v2/universal/metabolites/{0}".format(metabolite["bigg_id"])
    res_met = requests.get(met_url)
    met = res_met.json()
    # Print the metabolite name and the first formula associated with it
    print met["name"], met["formulae"][0]
You will notice that we first got the Adenosine deaminase reaction and then looped through its metabolites, each time requesting more general information about the metabolite based on its bigg_id. In this tutorial, we simply printed the name of the metabolite and the first formula associated with it. However, in your application, with a little knowledge of Python, you can use that information and add to it based on your specific needs.
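As one example of building on that loop, the sketch below splits it into two small helpers: one builds the metabolite URL and one reduces a payload to the fields printed above. The helper names `metabolite_url` and `summarize_metabolite`, the id "h2o", and the sample payloads are all made up for this illustration and are not real BiGG output. Only the "name" and "formulae" keys and the URL pattern come from the code above.

```python
BIGG_API = "http://bigg.ucsd.edu/api/v2"  # base URL used throughout this tutorial


def metabolite_url(bigg_id):
    """Build the universal metabolite route for a given bigg_id."""
    return "{0}/universal/metabolites/{1}".format(BIGG_API, bigg_id)


def summarize_metabolite(met):
    """Reduce a metabolite payload to its name and first formula."""
    formulae = met.get("formulae") or []
    return {
        "name": met.get("name"),
        # Guard against a missing or empty list instead of indexing blindly.
        "formula": formulae[0] if formulae else None,
    }


# Hand-written stand-ins for API payloads (NOT real BiGG responses).
with_formula = {"name": "Example metabolite", "formulae": ["C1H2"]}
without_formula = {"name": "Another example", "formulae": []}

print(metabolite_url("h2o"))
print(summarize_metabolite(with_formula)["formula"])     # C1H2
print(summarize_metabolite(without_formula)["formula"])  # None
```

Keeping the URL building and the payload handling in separate functions lets you test the payload handling without a network connection. The guard also avoids an IndexError if a metabolite comes back with an empty formulae list.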
In this brief (and very basic) tutorial, we showed you how to access a database (BiGG) full of biological information. You can use this as a starter to add the ability to automate the harvesting of information for your python projects and research.
If you felt like it was too much work to copy and paste the few lines of python code above into your own python environment, don't worry, you can always clone this repo with the python file in it and run in a terminal:
#Clone repo and cd into the cloned folder
git clone https://github.com/bjyurkovich/rest-tutorial-for-bio-science.git
cd rest-tutorial-for-bio-science
#Set up your env and install requests dependency
pip install virtualenv
virtualenv venv
# Alternatively, use -p to select the Python 2.7 interpreter explicitly
virtualenv -p /usr/bin/python2.7 venv
source venv/bin/activate
pip install -r requirements.txt
#run it!
python main.py
Head over to the Escher Interactive tutorial to see an example of how the BiGG API is used in conjunction with KEGG, PDB, and ChEBI!