DaSH 2017 Progress Report

Progress Report

Haplotype Frequency Curation Service

An initial version of the Haplotype Frequency Curation Service (https://github.com/nmdp-bioinformatics/service-haplotype-frequency-curation) was implemented in 2017. Endpoints for getting and posting Haplotype Frequencies and Populations are in place: http://phycus.b12x.org:8080/swagger-ui.html. Code has also been added to support a delete endpoint for Haplotype Frequencies.

Perl, Python and Java clients have been checked in with the service and all make use of Swagger code generation.

The Java client is multi-module and includes an infrastructure for command line tools, with a basic tool implemented for pushing haplotype frequencies in a standard file format to the Frequency Curation Service with some basic annotations.

/* More to be said about Perl and Python clients here - @mhalagan-nmdp, @hpeberhard */

Discussions regarding useful (pragmatic?) annotation of haplotype frequencies and populations are underway, and the implementation/upload of some real-world examples will likely fuel the conversation further. Further discussion and implementation of access control will likely be necessary before certain frequency sets can be uploaded. Clarity around the annotation of haplotype frequency sets and populations will help determine how to implement duplicate detection within the service.

/* Other additions? @fscheel, @sauter, @HofmannJ, @jbrelsf2-nmdp, @pbashyal-nmdp */

HL7 FHIR

IHIWS Follow Up

Primate MHC

I started with the goal of extending tools like feature-service, GFE, ACT to non-human primate MHC.

After locating the NHP.dat file on the IPD website, I noticed that it fails to conform to the EMBL standard in many ways. Summary of the nhp.dat analysis:

the file does not parse with the EMBL and IMGT BioPython parsers
the RA (Reference Author) field has non-UTF-8 characters (control characters)
the ID field needs to have 7 fields separated by “;” but has only one
the annotation does not carry over the GenBank annotation
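
Two of the checks above (the ID line field count and the stray characters in RA lines) can be reproduced with a few lines of standard-library Python. This is only a minimal sketch, not the analysis that was actually run: the function name `check_embl_lines` is made up for illustration, the `NHP00001` identifier in the usage note is hypothetical, and the sketch assumes the usual EMBL layout of a two-letter line code followed by three spaces.

```python
import re
from typing import Iterable, List, Tuple

# C0 control characters other than tab, plus DEL.
# Line endings are stripped before this pattern is applied.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b-\x1f\x7f]")


def check_embl_lines(lines: Iterable[bytes]) -> List[Tuple[int, str]]:
    """Return (line number, problem) pairs for an EMBL-style flat file.

    Lines are taken as bytes so that invalid UTF-8 can be reported
    instead of raising while the file is being read.
    """
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        raw = raw.rstrip(b"\r\n")
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            problems.append((lineno, "not valid UTF-8"))
            text = raw.decode("utf-8", errors="replace")

        if text.startswith("ID   "):
            # The EMBL ID line is expected to carry 7 ';'-separated fields.
            fields = text[5:].split(";")
            if len(fields) != 7:
                problems.append(
                    (lineno, f"ID line has {len(fields)} field(s), expected 7")
                )
        elif text.startswith("RA   ") and CONTROL_CHARS.search(text):
            problems.append((lineno, "RA line contains control characters"))
    return problems
```

To run it over a file, open the file in binary mode and pass the handle directly, for example `with open("nhp.dat", "rb") as fh: print(check_embl_lines(fh))`. A well-formed ID line such as `ID   X56734; SV 1; linear; mRNA; STD; PLN; 1859 BP.` passes, while a line holding only an identifier (for example the hypothetical `ID   NHP00001`) is reported.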

I wrote my own parser and was able to parse out 9605 GenBank accession IDs for 6812 alleles at 374 loci in 53 non-human primate species.

Many of the alleles are defined based on cDNA, so there is no genomic annotation to be found. But even the alleles (great apes) with gDNA annotations in GenBank are apparently unannotated in IPD-NHP.

So I built a MySQL "BioSQL" database and loaded it with the 9605 GenBank entries, linked back to the corresponding 6812 alleles from IPD-NHP. From here it is now possible to use BioPython for feature-level analysis. (@mmaiers-nmdp)
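
As a rough illustration of the load-and-analyze step described above, the sketch below uses Biopython's `BioSQL` module. It is not the code that was used for this work. It assumes a MySQL database in which the BioSQL schema has already been created and that the `MySQLdb` driver is installed; the connection settings, the `ipd_nhp` namespace, and both function names are hypothetical.

```python
from collections import defaultdict


def load_genbank_into_biosql(genbank_path, namespace="ipd_nhp"):
    """Load GenBank records into a BioSQL namespace; return the record count."""
    # Imported here so that feature_lengths() below stays usable without
    # Biopython's database layer or a MySQL driver installed.
    from Bio import SeqIO
    from BioSQL import BioSeqDatabase

    # Hypothetical connection settings; replace with real ones.
    server = BioSeqDatabase.open_database(
        driver="MySQLdb",
        user="biosql",
        passwd="change-me",
        host="localhost",
        db="bioseqdb",
    )
    try:
        db = server.new_database(
            namespace, description="GenBank entries for IPD-NHP alleles"
        )
        count = db.load(SeqIO.parse(genbank_path, "genbank"))
        server.commit()
        return count
    finally:
        server.close()


def feature_lengths(record):
    """Group the lengths of a record's features by feature type.

    Works on any record-like object whose features expose .type and a
    sized .location, such as a Biopython SeqRecord or a record fetched
    back from BioSQL.
    """
    lengths = defaultdict(list)
    for feature in record.features:
        lengths[feature.type].append(len(feature.location))
    return dict(lengths)
```

Once records are loaded, one can be fetched back with `server[namespace].lookup(accession=...)` and passed to `feature_lengths` to compare, for instance, exon and intron lengths across alleles.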

Etc

