Commit

WFS1
pnrobinson committed Nov 26, 2023
1 parent 257da3e commit 212ce61
Showing 5 changed files with 272 additions and 526 deletions.
3 changes: 3 additions & 0 deletions docs/collections.md
@@ -29,6 +29,9 @@ were mainly created using the Python library [pyphetools](https://github.com/mon
| [STXBP1](https://github.com/monarch-initiative/phenopacket-store/blob/main/notebooks/STXBP1/Xian_2022_STXBP1.ipynb){:target="_blank"} | 463 phenopackets; [Developmental and epileptic encephalopathy 4](https://omim.org/entry/612164){:target="_blank"} |
| [SUOX](https://github.com/monarch-initiative/phenopacket-store/blob/main/notebooks/SUOX/SUOX_Li_PMID_36303223_CreatePhenopackets.ipynb){:target="_blank"} | 35 phenopackets; [Sulfite oxidase deficiency](https://omim.org/entry/272300){:target="_blank"} |
| [TRAF7](){:target="_blank"} | 45 phenopackets; [Cardiac, facial, and digital anomalies with developmental delay](https://omim.org/entry/618164){:target="_blank"} |
| [WFS1](https://github.com/monarch-initiative/phenopacket-store/tree/main/notebooks/WFS1){:target="_blank"} | 16 phenopackets; [Wolfram syndrome 1](https://omim.org/entry/222300){:target="_blank"}, [Deafness, autosomal dominant 6](https://omim.org/entry/600965){:target="_blank"} |





207 changes: 105 additions & 102 deletions notebooks/WFS1/PMID_18688868.ipynb
@@ -19,48 +19,53 @@
"name": "stdout",
"output_type": "stream",
"text": [
"pyphetools version 0.6.4\n"
"Using pyphetools version 0.8.30\n"
]
}
],
"source": [
"import phenopackets as php\n",
"from google.protobuf.json_format import MessageToDict, MessageToJson\n",
"from google.protobuf.json_format import Parse, ParseDict\n",
"import pandas as pd\n",
"pd.set_option('display.max_colwidth', None) # show entire column contents, important!\n",
"from collections import defaultdict\n",
"import numpy as np\n",
"from IPython.display import display, HTML\n",
"from pyphetools.creation import *\n",
"from pyphetools.visualization import *\n",
"from pyphetools.validation import *\n",
"import pyphetools\n",
"print(f\"pyphetools version {pyphetools.__version__}\")"
"print(f\"Using pyphetools version {pyphetools.__version__}\")"
]
},
{
"cell_type": "markdown",
"id": "8606e7eb",
"metadata": {},
"source": [
"<h2>Importing HPO data</h2>\n",
"<p>pyphetools uses the Human Phenotype Ontology (HPO) to encode phenotypic features. The recommended way of doing this is to ingest the hp.json file using HpoParser, which in turn creates an HpoConceptRecognizer object. </p>\n",
"<p>The HpoParser can accept a hpo_json_file argument if you want to use a specific file. If the argument is not passed, it will download the latext hp.json file from the HPO GitHub site and store it in a new subdirectory called hpo_data. It will not download the file if the file is already downloaded.</p>"
"<h2>Importing HPO data</h2>"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "5a7789fc",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"HPO version 2023-10-09\n"
]
}
],
"source": [
"parser = HpoParser()\n",
"hpo_cr = parser.get_hpo_concept_recognizer()\n",
"hpo_version = parser.get_version()\n",
"hpo_ontology = parser.get_ontology()\n",
"PMID = \"PMID:18688868\"\n",
"title = \"Autoimmune disease in a DFNA6/14/38 family carrying a novel missense mutation in WFS1\"\n",
"metadata = MetaData(created_by=\"ORCID:0000-0002-5648-2155\", pmid=PMID, pubmed_title=title)\n",
"metadata.default_versions_with_hpo(version=hpo_version)"
"metadata.default_versions_with_hpo(version=hpo_version)\n",
"print(f\"HPO version {hpo_version}\")"
]
},
{
@@ -79,7 +84,7 @@
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_excel('../../data/WFS1/PMID_18688868.xlsx')"
"df = pd.read_excel('input/PMID_18688868.xlsx')"
]
},
{
@@ -194,16 +199,23 @@
"</div>"
],
"text/plain": [
" patient_id ... V:2\n",
"0 Sex ... male\n",
"1 Age ... 17\n",
"2 Variant ... c.2576G>A\n",
"3 Low-frequency sensorineural hearing impairment ... +\n",
"4 Progressive sensorineural hearing impairment ... +\n",
"5 Graves disease ... -\n",
"6 Crohn's disease ... -\n",
" patient_id II:2 III:1 \\\n",
"0 Sex female female \n",
"1 Age 97 55 \n",
"2 Variant c.2576G>A c.2576G>A \n",
"3 Low-frequency sensorineural hearing impairment + + \n",
"4 Progressive sensorineural hearing impairment + + \n",
"5 Graves disease - + \n",
"6 Crohn's disease - - \n",
"\n",
"[7 rows x 7 columns]"
" III:3 IV:2 IV:4 V:2 \n",
"0 female female female male \n",
"1 69 38 43 17 \n",
"2 c.2576G>A c.2576G>A c.2576G>A c.2576G>A \n",
"3 + + + + \n",
"4 + + + + \n",
"5 - - - - \n",
"6 - + - - "
]
},
"execution_count": 4,
@@ -221,11 +233,7 @@
"metadata": {},
"source": [
"<h1>Converting to row-based format</h1>\n",
"<p>To use pyphetools, we need to have the individuals represented as rows (one row per individual) and have the items of interest be encoded as column names. The required transformations for doing this may be different for different input data, but often we will want to transpose the table (using the pandas <tt>transpose</tt> function) and set the column names of the new table to the zero-th row. After this, we drop the zero-th row (otherwise, it will be interpreted as an individual by the pyphetools code).</p>\n",
"<p>After this step is completed, the remaining steps to create phenopackets are the same as in the \n",
" <a href=\"http://localhost:8888/notebooks/notebooks/Create%20phenopackets%20from%20tabular%20data%20with%20individuals%20in%20rows.ipynb\" target=\"__blank\">row-based notebook</a>.</p>\n",
" \n",
"Furthermore, for this specific case, there is a Count features row that we want dropped, so we filter out any row that does not have Patient in the first column."
"<p>For this specific case, there is a Count features row that we want dropped, so we filter out any row that does not have Patient in the first column.</p>"
]
},
{
@@ -320,14 +328,33 @@
"</div>"
],
"text/plain": [
"patient_id Sex Age ... Graves disease Crohn's disease\n",
"II:2 female 97 ... - -\n",
"III:1 female 55 ... + -\n",
"III:3 female 69 ... - -\n",
"IV:2 female 38 ... - +\n",
"IV:4 female 43 ... - -\n",
"patient_id Sex Age Variant \\\n",
"II:2 female 97 c.2576G>A \n",
"III:1 female 55 c.2576G>A \n",
"III:3 female 69 c.2576G>A \n",
"IV:2 female 38 c.2576G>A \n",
"IV:4 female 43 c.2576G>A \n",
"\n",
"[5 rows x 7 columns]"
"patient_id Low-frequency sensorineural hearing impairment \\\n",
"II:2 + \n",
"III:1 + \n",
"III:3 + \n",
"IV:2 + \n",
"IV:4 + \n",
"\n",
"patient_id Progressive sensorineural hearing impairment Graves disease \\\n",
"II:2 + - \n",
"III:1 + + \n",
"III:3 + - \n",
"IV:2 + - \n",
"IV:4 + - \n",
"\n",
"patient_id Crohn's disease \n",
"II:2 - \n",
"III:1 - \n",
"III:3 - \n",
"IV:2 + \n",
"IV:4 - "
]
},
"execution_count": 5,
@@ -378,7 +405,6 @@
"metadata": {},
"outputs": [],
"source": [
"hpo_cr = parser.get_hpo_concept_recognizer()\n",
"generator = SimpleColumnMapperGenerator(df=dft, observed='+', excluded='-', hpo_cr=hpo_cr)\n",
"column_mapper_d = generator.try_mapping_columns()"
]
@@ -427,7 +453,7 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": 9,
"id": "da4d5706",
"metadata": {},
"outputs": [
@@ -455,7 +481,7 @@
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": 10,
"id": "23a2aefc-9ec9-4517-b7cb-bd117c6a9b5a",
"metadata": {},
"outputs": [],
@@ -474,7 +500,7 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": 11,
"id": "3e64dc08",
"metadata": {},
"outputs": [
@@ -548,7 +574,7 @@
"5 17 P17Y"
]
},
"execution_count": 16,
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
@@ -560,7 +586,7 @@
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": 12,
"id": "71f664cc",
"metadata": {},
"outputs": [
@@ -634,7 +660,7 @@
"5 male MALE"
]
},
"execution_count": 17,
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
@@ -646,7 +672,7 @@
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": 14,
"id": "f6581a8a",
"metadata": {},
"outputs": [],
@@ -660,76 +686,56 @@
" variant_mapper=varMapper, \n",
" metadata=metadata,\n",
" pmid=PMID)\n",
"encoder.set_disease(disease_id='OMIM:600965', label='Deafness, autosomal dominant 6')"
"deafness_as6 = Disease(disease_id='OMIM:600965', disease_label='Deafness, autosomal dominant 6')\n",
"encoder.set_disease(deafness_as6)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": 15,
"id": "fd367ed6",
"metadata": {},
"outputs": [],
"outputs": [
{
"data": {
"text/html": [
"<h2>Cohort validation</h2>\n",
"<p>No errors found for the cohort with 6 individuals</p>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"individuals = encoder.get_individuals()"
"individuals = encoder.get_individuals()\n",
"cvalidator = CohortValidator(cohort=individuals, ontology=hpo_ontology, min_hpo=1, allelic_requirement=AllelicRequirement.MONO_ALLELIC)\n",
"qc = QcVisualizer(ontology=hpo_ontology, cohort_validator=cvalidator)\n",
"display(HTML(qc.to_summary_html()))"
]
},
{
"cell_type": "code",
"execution_count": 22,
"execution_count": 16,
"id": "5d044b78",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table style=\"border: 2px solid black;\">\n",
"\n",
"<table style=\"border: 2px solid black; align: \"left\">\n",
"<caption>6 phenopackets - PMID:18688868 (n=6)</caption>\n",
"\n",
"<tr>\n",
" <th>Individual</th>\n",
" <th>Disease</th>\n",
" <th>Genotype</th>\n",
" <th>Phenotypic features</th>\n",
" </tr>\n",
" \n",
"<tr>\n",
"<td>II:2 (FEMALE; P97Y)</ts>\n",
"<td>Deafness, autosomal dominant 6 (OMIM:600965)</ts>\n",
"<td>NM_006005.3:c.2576G>A (heterozygous)</td>\n",
"<td class=\"table-data\">Low-frequency sensorineural hearing impairment (HP:0008573); Progressive sensorineural hearing impairment (HP:0000408)</td>\n",
"</tr>\n",
"<tr>\n",
"<td>III:1 (FEMALE; P55Y)</ts>\n",
"<td>Deafness, autosomal dominant 6 (OMIM:600965)</ts>\n",
"<td>NM_006005.3:c.2576G>A (heterozygous)</td>\n",
"<td class=\"table-data\">Low-frequency sensorineural hearing impairment (HP:0008573); Progressive sensorineural hearing impairment (HP:0000408); Graves disease (HP:0100647)</td>\n",
"</tr>\n",
"<tr>\n",
"<td>III:3 (FEMALE; P69Y)</ts>\n",
"<td>Deafness, autosomal dominant 6 (OMIM:600965)</ts>\n",
"<td>NM_006005.3:c.2576G>A (heterozygous)</td>\n",
"<td class=\"table-data\">Low-frequency sensorineural hearing impairment (HP:0008573); Progressive sensorineural hearing impairment (HP:0000408)</td>\n",
"</tr>\n",
"<tr>\n",
"<td>IV:2 (FEMALE; P38Y)</ts>\n",
"<td>Deafness, autosomal dominant 6 (OMIM:600965)</ts>\n",
"<td>NM_006005.3:c.2576G>A (heterozygous)</td>\n",
"<td class=\"table-data\">Low-frequency sensorineural hearing impairment (HP:0008573); Progressive sensorineural hearing impairment (HP:0000408); Crohn's disease (HP:0100280)</td>\n",
"</tr>\n",
"<tr>\n",
"<td>IV:4 (FEMALE; P43Y)</ts>\n",
"<td>Deafness, autosomal dominant 6 (OMIM:600965)</ts>\n",
"<td>NM_006005.3:c.2576G>A (heterozygous)</td>\n",
"<td class=\"table-data\">Low-frequency sensorineural hearing impairment (HP:0008573); Progressive sensorineural hearing impairment (HP:0000408)</td>\n",
"</tr>\n",
"<tr>\n",
"<td>V:2 (MALE; P17Y)</ts>\n",
"<td>Deafness, autosomal dominant 6 (OMIM:600965)</ts>\n",
"<td>NM_006005.3:c.2576G>A (heterozygous)</td>\n",
"<td class=\"table-data\">Low-frequency sensorineural hearing impairment (HP:0008573); Progressive sensorineural hearing impairment (HP:0000408)</td>\n",
"</tr>\n",
"</table>\n"
"<tr><th style=\"text-align: left;font-weight: bold;\">Individual</th><th style=\"text-align: left;font-weight: bold;\">Disease</th><th style=\"text-align: left;font-weight: bold;\">Genotype</th><th style=\"text-align: left;font-weight: bold;\">Phenotypic features</th></tr>\n",
"<tr><td style=\"text-align: left;\">II:2 (FEMALE; P97Y)</td><td style=\"text-align: left;\">Deafness, autosomal dominant 6 (OMIM:600965)</td><td style=\"text-align: left;\">NM_006005.3:c.2576G>A (heterozygous)</td><td style=\"text-align: left;\">Low-frequency sensorineural hearing impairment (HP:0008573); Progressive sensorineural hearing impairment (HP:0000408); excluded: Graves disease (HP:0100647); excluded: Crohn's disease (HP:0100280)</td></tr>\n",
"<tr><td style=\"text-align: left;\">III:1 (FEMALE; P55Y)</td><td style=\"text-align: left;\">Deafness, autosomal dominant 6 (OMIM:600965)</td><td style=\"text-align: left;\">NM_006005.3:c.2576G>A (heterozygous)</td><td style=\"text-align: left;\">Low-frequency sensorineural hearing impairment (HP:0008573); Progressive sensorineural hearing impairment (HP:0000408); Graves disease (HP:0100647); excluded: Crohn's disease (HP:0100280)</td></tr>\n",
"<tr><td style=\"text-align: left;\">III:3 (FEMALE; P69Y)</td><td style=\"text-align: left;\">Deafness, autosomal dominant 6 (OMIM:600965)</td><td style=\"text-align: left;\">NM_006005.3:c.2576G>A (heterozygous)</td><td style=\"text-align: left;\">Low-frequency sensorineural hearing impairment (HP:0008573); Progressive sensorineural hearing impairment (HP:0000408); excluded: Graves disease (HP:0100647); excluded: Crohn's disease (HP:0100280)</td></tr>\n",
"<tr><td style=\"text-align: left;\">IV:2 (FEMALE; P38Y)</td><td style=\"text-align: left;\">Deafness, autosomal dominant 6 (OMIM:600965)</td><td style=\"text-align: left;\">NM_006005.3:c.2576G>A (heterozygous)</td><td style=\"text-align: left;\">Low-frequency sensorineural hearing impairment (HP:0008573); Progressive sensorineural hearing impairment (HP:0000408); Crohn's disease (HP:0100280); excluded: Graves disease (HP:0100647)</td></tr>\n",
"<tr><td style=\"text-align: left;\">IV:4 (FEMALE; P43Y)</td><td style=\"text-align: left;\">Deafness, autosomal dominant 6 (OMIM:600965)</td><td style=\"text-align: left;\">NM_006005.3:c.2576G>A (heterozygous)</td><td style=\"text-align: left;\">Low-frequency sensorineural hearing impairment (HP:0008573); Progressive sensorineural hearing impairment (HP:0000408); excluded: Graves disease (HP:0100647); excluded: Crohn's disease (HP:0100280)</td></tr>\n",
"<tr><td style=\"text-align: left;\">V:2 (MALE; P17Y)</td><td style=\"text-align: left;\">Deafness, autosomal dominant 6 (OMIM:600965)</td><td style=\"text-align: left;\">NM_006005.3:c.2576G>A (heterozygous)</td><td style=\"text-align: left;\">Low-frequency sensorineural hearing impairment (HP:0008573); Progressive sensorineural hearing impairment (HP:0000408); excluded: Graves disease (HP:0100647); excluded: Crohn's disease (HP:0100280)</td></tr>\n",
"</table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
@@ -740,16 +746,14 @@
}
],
"source": [
"from IPython.display import HTML, display\n",
"\n",
"phenopackets = [i.to_ga4gh_phenopacket(metadata=metadata.to_ga4gh()) for i in individuals]\n",
"table = PhenopacketTable(phenopacket_list=phenopackets)\n",
"individuals = cvalidator.get_error_free_individual_list()\n",
"table = PhenopacketTable(individual_list=individuals, metadata=metadata)\n",
"display(HTML(table.to_html()))"
]
},
{
"cell_type": "code",
"execution_count": 23,
"execution_count": 17,
"id": "23f1094f",
"metadata": {},
"outputs": [
@@ -764,8 +768,7 @@
"source": [
"output_directory = \"phenopackets\"\n",
"Individual.output_individuals_as_phenopackets(individual_list=individuals,\n",
" pmid=PMID,\n",
" metadata=metadata.to_ga4gh(),\n",
" metadata=metadata,\n",
" outdir=output_directory)"
]
},
@@ -794,7 +797,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
"version": "3.11.3"
}
},
"nbformat": 4,
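The markdown cell removed in this commit ("Converting to row-based format") describes the reshaping step the notebook still performs: transpose the spreadsheet so that each individual becomes a row, promote the zero-th row to column headers, and drop that row. A minimal pandas sketch of that step follows, assuming a small hypothetical excerpt of the spreadsheet built inline instead of being read from `input/PMID_18688868.xlsx`.

```python
import pandas as pd

# Hypothetical excerpt with the same layout as the spreadsheet: the first column
# lists the items, and every further column is one individual from the pedigree.
df = pd.DataFrame({
    "patient_id": ["Sex", "Age", "Variant", "Graves disease", "Crohn's disease"],
    "II:2": ["female", 97, "c.2576G>A", "-", "-"],
    "III:1": ["female", 55, "c.2576G>A", "+", "-"],
    "IV:2": ["female", 38, "c.2576G>A", "-", "+"],
})

# Transpose: individuals become rows and the old column names become the index.
dft = df.transpose()
# Use the zero-th row (the item names) as the column headers.
dft.columns = dft.iloc[0]
# Drop that row so it is not mistaken for an individual.
dft = dft.drop(dft.index[0])

print(dft)
```

Because the header row is assigned from a row whose label is `patient_id`, the new column index carries that name, which is why `patient_id` appears in the top-left corner of the notebook's printed data frame.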

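The mapper outputs in this diff show raw spreadsheet values being normalized: an age of `17` becomes the ISO 8601 duration `P17Y`, and `male` becomes `MALE`. The stdlib sketch below illustrates that normalization under two assumptions: ages are whole years, and sex values use the GA4GH phenopacket `Sex` enum names (`UNKNOWN_SEX`, `FEMALE`, `MALE`, `OTHER_SEX`). It is not the pyphetools implementation, and `years_to_iso8601`/`normalize_sex` are hypothetical names.

```python
# Hypothetical helpers; pyphetools performs this step with its own column mappers.
SEX_CODES = {"male": "MALE", "female": "FEMALE"}


def years_to_iso8601(age) -> str:
    """Turn an age in whole years (int or numeric string) into an ISO 8601 duration."""
    years = int(str(age).strip())
    if years < 0:
        raise ValueError(f"age must be non-negative, got {age!r}")
    return f"P{years}Y"


def normalize_sex(value: str) -> str:
    """Map a free-text sex entry to a phenopacket Sex enum name; unknown text gives UNKNOWN_SEX."""
    return SEX_CODES.get(str(value).strip().lower(), "UNKNOWN_SEX")


for raw_age, raw_sex in [(97, "female"), ("17", "male")]:
    print(raw_age, years_to_iso8601(raw_age), raw_sex, normalize_sex(raw_sex))
```

Falling back to `UNKNOWN_SEX` rather than raising keeps a single unparseable cell from aborting the whole cohort; a stricter variant could raise instead.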

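The notebook builds its phenotype mappers with `SimpleColumnMapperGenerator(df=dft, observed='+', excluded='-', hpo_cr=hpo_cr)`, so each `+` cell becomes an observed HPO term and each `-` cell an excluded one. The regenerated `PhenopacketTable` now lists excluded terms explicitly. The sketch below reproduces that logic with the standard library only. The HPO identifiers are copied from the table output in this commit rather than looked up through the `HpoConceptRecognizer`, and `encode_features`/`summarize` are hypothetical names.

```python
from typing import Dict, List, Tuple

# HPO ids copied from the PhenopacketTable output in this commit (hypothetical lookup table).
HPO_IDS: Dict[str, str] = {
    "Low-frequency sensorineural hearing impairment": "HP:0008573",
    "Progressive sensorineural hearing impairment": "HP:0000408",
    "Graves disease": "HP:0100647",
    "Crohn's disease": "HP:0100280",
}


def encode_features(row: Dict[str, object], observed: str = "+",
                    excluded: str = "-") -> List[Tuple[str, str, bool]]:
    """Return (hpo_id, label, is_excluded) for every phenotype column with a +/- value."""
    features = []
    for label, cell in row.items():
        hpo_id = HPO_IDS.get(label)
        if hpo_id is None:
            continue  # not a phenotype column (e.g. Sex, Age, Variant)
        value = str(cell).strip()
        if value == observed:
            features.append((hpo_id, label, False))
        elif value == excluded:
            features.append((hpo_id, label, True))
        # any other value (blank, "?", "n/a") is left out as not assessed
    return features


def summarize(features: List[Tuple[str, str, bool]]) -> str:
    """Format features like the table: observed terms first, then 'excluded:' terms."""
    parts = [f"{label} ({hpo_id})" for hpo_id, label, exc in features if not exc]
    parts += [f"excluded: {label} ({hpo_id})" for hpo_id, label, exc in features if exc]
    return "; ".join(parts)


# Row for individual III:1, transcribed from the transposed data frame in this diff.
row_iii_1 = {
    "Sex": "female", "Age": 55, "Variant": "c.2576G>A",
    "Low-frequency sensorineural hearing impairment": "+",
    "Progressive sensorineural hearing impairment": "+",
    "Graves disease": "+",
    "Crohn's disease": "-",
}
print(summarize(encode_features(row_iii_1)))
```

For III:1 this prints the same phenotype string as that individual's row in the new `PhenopacketTable` output.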
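The new validation cell passes `allelic_requirement=AllelicRequirement.MONO_ALLELIC` to `CohortValidator`. For an autosomal dominant disorder such as DFNA6, this presumably means that each individual should carry exactly one heterozygous variant allele. That reading is an assumption, not a description of the validator's internals. The following is a stdlib sketch of such a check over hypothetical `{individual_id: [(variant, zygosity), ...]}` records; `mono_allelic_errors` is a hypothetical name.

```python
from typing import Dict, List, Tuple

# (HGVS variant, zygosity) pairs for one individual; hypothetical record layout.
Genotype = List[Tuple[str, str]]


def mono_allelic_errors(cohort: Dict[str, Genotype]) -> List[str]:
    """Return one message per individual that does not carry exactly one heterozygous allele."""
    errors = []
    for individual_id, genotype in cohort.items():
        het = [v for v, zygosity in genotype if zygosity == "heterozygous"]
        if len(genotype) != 1 or len(het) != 1:
            errors.append(f"{individual_id}: expected one heterozygous variant, found {genotype}")
    return errors


cohort = {
    "II:2": [("NM_006005.3:c.2576G>A", "heterozygous")],
    "V:2": [("NM_006005.3:c.2576G>A", "heterozygous")],
    "X:1": [("NM_006005.3:c.2576G>A", "homozygous")],  # hypothetical failing case
}
print(mono_allelic_errors(cohort))
```

Only the invented `X:1` entry is reported. The six individuals in this commit each carry one heterozygous `NM_006005.3:c.2576G>A` allele and would pass.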