WebSTR-API provides access to the data via different “endpoints”: URL paths that are formed in a certain way. To access endpoints you can either use the curl tool in the Terminal or simply edit the following links directly in your browser. It is also possible to use WebSTR-API in your code; see the Python and R examples at the bottom of this page.
Default output is in JSON format; details for each endpoint are described in the documentation at http://webstr-api.ucsd.edu/docs. It is also possible to download results in CSV format by adding "&download=True" to a request. For example, https://webstr-api.ucsd.edu/repeats?gene_names=HTT&download=True will download a CSV with all repeats associated with the HTT gene as a repeats.csv file.
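As a minimal sketch of the download option, the snippet below appends download=True to an existing request URL and saves the response to disk. The helper names with_download and save_csv are hypothetical and not part of WebSTR-API, and the sketch assumes the server returns the CSV body directly in the response.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode
from urllib.request import urlopen


def with_download(url):
    """Return url with download=True appended to its query string."""
    parts = urlsplit(url)
    # Keep existing parameters (including repeated ones) in their original order.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("download", "True"))
    return urlunsplit(parts._replace(query=urlencode(query)))


def save_csv(url, path):
    """Fetch the CSV version of a request and write it to path (hypothetical helper)."""
    with urlopen(with_download(url)) as resp, open(path, "wb") as out:
        out.write(resp.read())


csv_url = with_download("https://webstr-api.ucsd.edu/repeats?gene_names=HTT")
print(csv_url)
# To actually download the file (requires network access):
# save_csv("https://webstr-api.ucsd.edu/repeats?gene_names=HTT", "repeats.csv")
```

Building the query with urllib.parse rather than string concatenation avoids having to decide between "?" and "&" by hand.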
If you are interested in extracting STRs located in a genomic region of interest, the “repeats” endpoint is the most important. A region can be specified by genomic coordinates, common gene names, or Ensembl IDs.
Access repeats via genomic coordinates:
Example request for chromosome 4, coordinates 3186810-4121716:
http://webstr-api.ucsd.edu/repeats?region_query=4:3186810-4121716
Example output:
[{"repeat_id":86397,"chr":"chr4","start":3186839,"end":3186860,"msa":"T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T",
"motif":"T","period":1,"copies":22,"ensembl_id":"ENSG00000197386","strand":"+","gene_name":"HTT",
"gene_desc":"huntingtin","total_calls":29,"frac_variable":0.896551724137931,"avg_size_diff":3.68965517241379},
{"repeat_id":1550762,"chr":"chr4","start":3313970,"end":3313989,"msa":null,"motif":"AAAT",
"period":4,"copies":5,"ensembl_id":"ENSG00000248840","strand":"+","gene_name":null,"gene_desc":null,
"total_calls":null,"frac_variable":null,"avg_size_diff":null},
....
]
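To illustrate how the returned records can be handled, the sketch below parses the two records shown above (trimmed to a few fields) with Python's json module and keeps only the repeats that have call statistics. The 1-based inclusive length check is an assumption inferred from these two records, not something stated in this documentation.

```python
import json

# Two records copied from the example output above, trimmed to a few fields.
sample = """
[{"repeat_id":86397,"chr":"chr4","start":3186839,"end":3186860,
  "motif":"T","period":1,"copies":22,"gene_name":"HTT","total_calls":29},
 {"repeat_id":1550762,"chr":"chr4","start":3313970,"end":3313989,
  "motif":"AAAT","period":4,"copies":5,"gene_name":null,"total_calls":null}]
"""

repeats = json.loads(sample)  # JSON null becomes Python None

# Keep only repeats that have call statistics attached.
with_calls = [r for r in repeats if r["total_calls"] is not None]

for r in repeats:
    # Assumption: coordinates are 1-based and inclusive, so length = end - start + 1.
    length = r["end"] - r["start"] + 1
    print(r["repeat_id"], r["motif"], length, length == r["period"] * r["copies"])
```

Fields such as gene_name and total_calls can be null, so code that consumes the output should check for None before using them.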
Access repeats via common gene names: http://webstr-api.ucsd.edu/repeats?gene_names=HTT
Access repeats via Ensembl ID: http://webstr-api.ucsd.edu/repeats?ensembl_ids=ENSG00000197386
To get repeats from several genes, chain gene_names parameters together in the following manner: https://webstr-api.ucsd.edu/repeats/?gene_names=HTT&gene_names=AGRN
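The repeated-parameter form can also be generated programmatically. A minimal sketch using only the Python standard library; the function name repeats_url is hypothetical:

```python
from urllib.parse import urlencode

BASE_URL = "https://webstr-api.ucsd.edu/repeats/"


def repeats_url(gene_names):
    """Build a repeats request with one gene_names parameter per gene."""
    # doseq=True expands the list into repeated key=value pairs.
    query = urlencode({"gene_names": list(gene_names)}, doseq=True)
    return f"{BASE_URL}?{query}"


print(repeats_url(["HTT", "AGRN"]))
```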
The repeatinfo endpoint requires a repeat ID from our database, available via the repeats endpoint (see the example above). Example request: http://webstr-api.ucsd.edu/repeatinfo/?repeat_id=1
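One way to connect the two endpoints is to collect the repeat_id values from a repeats response and build one repeatinfo request per ID. The sketch below is illustrative only: the function name repeatinfo_urls is hypothetical, and the records stand in for a parsed repeats response, using the IDs from the example output above.

```python
from urllib.parse import urlencode

REPEATINFO_URL = "http://webstr-api.ucsd.edu/repeatinfo/"

# Stand-in for a parsed repeats response; IDs taken from the example output above.
repeats = [{"repeat_id": 86397}, {"repeat_id": 1550762}]


def repeatinfo_urls(records):
    """Return one repeatinfo request URL per record."""
    return [
        f"{REPEATINFO_URL}?{urlencode({'repeat_id': r['repeat_id']})}"
        for r in records
    ]


for url in repeatinfo_urls(repeats):
    print(url)
```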
Get gene features via gene name: http://webstr-api.ucsd.edu/genefeatures/?gene_names=HTT
Here is an example of importing repeats for genes of interest into a pandas DataFrame directly from the API using the requests library.
import requests
import pandas as pd
resp = requests.get('http://webstr-api.ucsd.edu/repeats/?gene_names=HTT&gene_names=AGRN')
df_repeats = pd.DataFrame.from_records(resp.json())
The same can be achieved in R using the httr and jsonlite packages.
library(httr)
library(jsonlite)
res = GET("http://webstr-api.ucsd.edu/repeats/?gene_names=HTT&gene_names=AGRN")
repeats = fromJSON(rawToChar(res$content))