Normalize #75

Open
wants to merge 7 commits into
base: main
Changes from 2 commits
1 change: 1 addition & 0 deletions docs/index.md
@@ -19,6 +19,7 @@
- [filter_results_top_n](./filter_results_top_n.md)
- [lookup](./lookup.md)
- [lookup_and_score](./lookup_and_score.md)
- [normalize_nodes](./normalize_nodes.md)
- [overlay](./overlay.md)
- [overlay_compute_jaccard](./overlay_compute_jaccard.md)
- [overlay_compute_ngd](./overlay_compute_ngd.md)
29 changes: 29 additions & 0 deletions docs/normalize_nodes.md
@@ -0,0 +1,29 @@
# normalize nodes

This operation rewrites the identifiers of qgraph and kgraph nodes to their preferred identifiers and records each knode's equivalent identifiers in a node property. When two kgraph nodes normalize to the same preferred identifier, they are merged into a single knode that contains the union of the properties of the two original nodes, and every edge attached to either original node is reattached to the merged knode. Qnodes are never merged, so the structure of the query is preserved. Because kgraph node identifiers change, the node bindings in the results are updated to match.
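
The behaviour described above can be illustrated with a minimal sketch. It is not the implementation in this PR: the function name, the `preferred` lookup table, the `equivalent_identifiers` property name, and the `EX:` identifiers are all assumptions made for illustration, and conflicts between scalar properties are resolved by simply keeping the first value seen.

```python
from copy import deepcopy


def normalize_nodes(message, preferred, equivalents_key="equivalent_identifiers"):
    """Return a copy of `message` with node identifiers rewritten.

    `preferred` is a hypothetical dict mapping an identifier to its preferred
    identifier; identifiers missing from it are left unchanged.
    """
    msg = deepcopy(message)
    kgraph = msg["knowledge_graph"]

    def norm(curie):
        return preferred.get(curie, curie)

    # Qnodes: rewrite ids in place. Qnodes are never merged.
    for qnode in msg["query_graph"]["nodes"].values():
        if qnode.get("ids"):
            qnode["ids"] = [norm(c) for c in qnode["ids"]]

    # Knodes: re-key by preferred id; nodes that collide are merged.
    merged = {}
    for old_id, knode in kgraph["nodes"].items():
        new_id = norm(old_id)
        target = merged.setdefault(new_id, {})
        for key, value in knode.items():
            if key not in target:
                target[key] = value
            elif isinstance(value, list) and isinstance(target[key], list):
                # Union of list-valued properties, preserving order.
                target[key] = target[key] + [v for v in value if v not in target[key]]
            # Scalar conflict: keep the first value (arbitrary choice in this sketch).
        equivalents = target.setdefault(equivalents_key, [])
        for curie in (new_id, old_id):
            if curie not in equivalents:
                equivalents.append(curie)
    kgraph["nodes"] = merged

    # Kedges: point subject and object at the (possibly merged) knodes.
    for kedge in kgraph["edges"].values():
        kedge["subject"] = norm(kedge["subject"])
        kedge["object"] = norm(kedge["object"])

    # Results: node bindings must follow the renamed knodes.
    for result in msg.get("results") or []:
        for bindings in result["node_bindings"].values():
            for binding in bindings:
                binding["id"] = norm(binding["id"])
    return msg


# Made-up identifiers: EX:2 is assumed to normalize to EX:1.
message = {
    "query_graph": {"nodes": {"n1": {"ids": ["EX:2"]}, "n2": {"ids": ["EX:1"]}}, "edges": {}},
    "knowledge_graph": {
        "nodes": {
            "EX:1": {"name": "thing", "categories": ["biolink:NamedThing"]},
            "EX:2": {"name": "same thing", "categories": ["biolink:NamedThing"]},
        },
        "edges": {"e": {"subject": "EX:2", "object": "EX:1", "predicate": "biolink:related_to"}},
    },
    "results": [{"node_bindings": {"n1": [{"id": "EX:2"}]}, "edge_bindings": {}}],
}
normalized = normalize_nodes(message, {"EX:2": "EX:1"})
```

In this sketch the two knodes collapse into one keyed by `EX:1`, while both qnodes remain in the query graph with the same normalized identifier.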

### examples

- [input](../examples/normalize_nodes/messages/01_premerged_message.json), [output](../examples/normalize_nodes/messages/02_postmerged_message.json)

### input requirements

None

### output guarantees

None

### allowed changes

- modify qnodes
- modify knodes
- remove knodes
- modify kedges
- modify node bindings

### parameters

```yaml
[]
```
170 changes: 170 additions & 0 deletions examples/normalize_nodes/messages/01_premerged_message.json
@@ -0,0 +1,170 @@
{
Review comment (Member): Should this file be named something like 01_prenormalized_message.json instead of premerged?

"message": {
"query_graph": {
"nodes": {
"n1": {
"ids": ["HGNC:11603"],
"categories": [
"biolink:Gene"
]
},
"n2": {
"ids": ["NCBIGene:9496"],
"categories": [
"biolink:Gene"
]
},
"n3": {
"ids": ["MONDO:0005002"],
"categories": [
"biolink:Disease"
]
},
"n4": {
"ids": ["DOID:3083"],
"categories": [
"biolink:Disease"
]
},
"n5": {
"categories": [
"biolink:Disease"
]
}
},
"edges": {
"e1": {
"subject": "n1",
"object": "n3"
},
"e2": {
"subject": "n2",
"object": "n4",
"predicates": ["biolink:related_to"]
},
"e3": {
"subject": "n1",
"object": "n5"
}
}
},
"knowledge_graph": {
"nodes": {
"HGNC:11603": {
"name": "TBX4",
"categories": [
"biolink:Gene"
]
},
"NCBIGene:9496": {
"name": "T-box transcription factor 4",
"categories": [
"biolink:Gene"
]
},
"MONDO:0005002": {
"name": "chronic obstructive pulmonary disease",
"categories": [
"biolink:Disease"
]
},
"DOID:3083": {
"name": "chronic obstructive pulmonary disease",
"categories": [
"biolink:Disease"
]
},
"UMLS:CN202575": {
"name": "heritable pulmonary arterial hypertension",
"categories": [
"biolink:Disease"
]
}
},
"edges": {
"a8575c4e-61a6-428a-bf09-fcb3e8d1644d": {
"subject": "HGNC:11603",
"object": "MONDO:0005002",
"predicate": "biolink:related_to"
},
"2d38345a-e9bf-4943-accb-dccba351dd04": {
"subject": "NCBIGene:9496",
"object": "DOID:3083",
"predicate": "biolink:related_to"
},
"044a7916-fba9-4b4f-ae48-f0815b0b222d": {
"subject": "HGNC:11603",
"object": "UMLS:CN202575",
"predicate": "biolink:related_to"
}
}
},
"results": [
{
"node_bindings": {
"n1": [
{
"id": "HGNC:11603"
}
],
"n3": [
{
"id": "MONDO:0005002"
}
]
},
"edge_bindings": {
"e1": [
{
"id": "a8575c4e-61a6-428a-bf09-fcb3e8d1644d"
}
]
}
},
{
"node_bindings": {
"n2": [
{
"id": "NCBIGene:9496"
}
],
"n4": [
{
"id": "DOID:3083"
}
]
},
"edge_bindings": {
"e2": [
{
"id": "2d38345a-e9bf-4943-accb-dccba351dd04"
}
]
}
},
{
"node_bindings": {
"n1": [
{
"id": "HGNC:11603"
}
],
"n5": [
{
"id": "UMLS:CN202575"
}
]
},
"edge_bindings": {
"e3": [
{
"id": "044a7916-fba9-4b4f-ae48-f0815b0b222d"
}
]
}
}
]
},
"logs": null,
"status": null
}