-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathAttribute_object_examples.yaml
458 lines (394 loc) · 21.2 KB
/
Attribute_object_examples.yaml
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
####Active Version uploaded to EPC Github Repo here: https://github.com/NCATSTranslator/Evidence-Provenance-Confidence-Working-Group/blob/master/Attribute_object_examples.yaml
# Biolink/TRAPI 'Attribute' Object Modeling and Data Examples
TRAPI 'ATTRIBUTE' MODELING PROPOSAL:
## Attribute Schema Proposal (defined below using BiolinkML yaml syntax)
## Corresponds roughly to what Richard B. proposed in the PR here (https://github.com/NCATSTranslator/ReasonerAPI/pull/186/files),
## and discussed in the ticket here (https://github.com/NCATSTranslator/ReasonerAPI/issues/185).
## It extends the current 0.9 TRAPI Attribute model with additional slots to represent the nature of the relationship between the
## Attribute's value and the Edge/Node from which it hangs.
attribute:
description: >-
An object that holds any type of attribute of an Edge or Node in a KG, as defined in the Attribute object itself.
In a sense an Attribute object represents a reified slot. It allows data creators to define new 'slots' as objects
in the data, and specify in definition of the Attribute the type of value it holds, and the relationship of this
value to the Edge or Node form which it hangs. In this way, the core schema can be extended conceputally 'in the data'
to hold additional types of information about an Edge or Node without having to add new slots/features to the schema itself.
slots:
- attribute_type_id # a new proposed field. name t.b.d. other proposals include 'relationship_id', 'attribute_id', 'attribute_type', 'relation_id', 'slot_id'
- attribute_from_source # a new proposed field. name t.b.d. other proposals include 'source_relationship', 'attribute name', 'relation_name', 'slot_name'
- value
- value_type_id # previously called 'type', 'value_type'
- value_type_from_source # previously called 'name' , and 'value_type_name', 'value_type_from_source'
- value_url
- value_source
- description
slot_usage: # definitions of each slot in the Attribute class
attribute_type_id:
description: >-
A curie/iri of a ontology property that describes the type of relationship asserted to hold between the Attribute's value and
the Edge from which the Attribute hangs. e.g. that a p-value represents SEPIO:0000440 ('supporting evidence for') the Edge.
required: false
multivalued: false
range: curie
attribute_from_source:
description: >-
The original name or identifier expressing relationship between the Attribute's value and the Association/Edge from which it hanges,
as provided by the source of the Association. If no name or identifier for this relationship is provided by the source, leave this slot blank.
# Previous definition: "The preferred label for the ontology property captured in the attribute type slot, as given by the data creator
# (may be the same or different from the primary label of the ontology property)"
required: false
multivalued: false
range: string
value:
description: >-
The value of the property in the attribute_type slot. Could be a numerical value (e.g. the value of a supporting experimental measurement
or calculation), a string (e.g. a free-text name for an entity or concept), or a curie representing some entity or concepts
in the domain (e.g. a PMID representing a publiation, or an ECO code representing a type of evidence, or a blob of structured json).
required: true
multivalued: true
range: string
value_type_id:
description: >-
A class iri/curie for the 'semantic type' of the data captured in the value slot. Can be specified as a term from an external ontology
(e.g. EDAM, SIO, OBI, STATO), or a Biolink class. May be broader than the specific data type defined by the soruce, in cases where
there is no ontology term precise enough to represent the more specific concept (e.g. source term "magama p-value" --> EDAM:data_1169 'p-value')
required: false
multivalued: true
range: curie
value_type_from_source:
description: >-
The original name of the semantic type/data type of the info captured in the 'value' slot, as provided by the source of the Association.
If no name/identifier is provided by the source, this slot should be blank.
# Previous definition: Preferred name for the curie specified in the value_type slot, as given by the data creator
# (may be the same or different from the primary label of the ontology class)
required: false
multivalued: false
range: string
value_source:
description: >-
The resource that asserted/defined the relationship defined in the Attribute between
the value it holds and the entity/edge from which it hangs.
required: false
multivalued: false
range: string
value_url:
description: >-
An address of a web page or document that represents or describes the entity represented in
the 'value' field.
required: false
multivalued: true
range: string
description:
description: >-
A free text description of the value, and/or the intended meaning/use of the attribute-value pair
required: false
multivalued: false
range: string
close_mappings:
- FHIR:Extension # MHB: at some point we should consider patterning our model after the
# FHIR extensibility mechanism, and using their names/conventions for things
# https://www.hl7.org/fhir/extensibility.html
#----------------------------------------------------------------------------------------------------#
ATTRIBUTE OBJECT EXAMPLES TO DEMONSTRATE THE MODEL:
## The Example below demonstrate three different categories of values:
### 1. Attribute taking a quantitative value/data item (e.g. a p-value representing evidence supporting the edge)
attributes: # a p-value representing evidence supporting the assertion expressed in the edge
- attribute_type_id: SEPIO:0000440 # has_supporting_evidence
attribute_from_source: "evidence" # term the source may have used to describe the relationship of the p-value to the edge
value: 0.0000316
value_type_id: EDAM:data_1169 # p-value
value_type_from_source: "p-value"
value_source: ChEMBL
### 2. Attribute taking a CURIE as a value, that represents an *instance* of some class of entity (e.g. a publication describing evidence for the edge)
attributes: # a publication describing evidence used to support the edge
- attribute_type_id: SEPIO:0000438 # has evidence from source
attribute_from_source: "evidence from" # term the source may have used to describe the relationship of the publication to the edge
value: PMID:2012345
value_type_id: EDAM:data_1187 # PMID
value_type_from_source: "publication"
value_source: ChEMBL
description: "Smith et al 2018" # short description of what the value represents
attributes: # the agent who authored the knowledge expressed in the edge
- attribute_type_id: SEPIO:0000130 # authored by
attribute_from_source: "creator" # term the source may have used to describe the relationship of the ORCID to the edge
value: ORCID:0000-0001-8732-0928
value_type_id: EDAM:data_XXXX # ORCID
value_type_from_source: "identifier"
value_url: https://orcid.org/0000-0001-8732-0928
value_source: ChEMBL
description: "Eric Deutsch"
### 3. Attribute taking a value represented by an ontology term (e.g. an ECO code describing the type of evidence supporting the edge)
attributes: # an ECO code describing the type of evidence supporting the edge
- attribute_type_id: SEPIO:0000398 # has evidence of type
attribute_from_source: "type of evidence" # term the source may have used to describe the relationship of the ECO code to the edge
value: ECO:00000006
value_type_id: EDAM:data_0966 # ontology term
value_type_from_source: "code"
value_source: ChEMBL
description: "experimental evidence"
#----------------------------------------------------------------------------------------------------#
FULL ASSOCIATION EXAMPLES DEMONSTRTING THE PROPOSED MODEL, USING REAL KP DATA/USE CASES:
#### Example 1. Clinical Data Services association generated using a statistical methods based on observed-expected ratios, applied to a specific COHD cohort/dataset
Source Data from COHD API (http://cohd.smart-api.info/#/reasoner/query):
"knowledge_graph": {
"edges": [
{
"id": "ke000000",
"source_id": "OMOP:192855", # "Cancer in situ of urinary bladder"
"target_id": "OMOP:4021588", # "Instillation of therapeutic substance into bladder"
"type": "biolink:correlated_with"
"attributes": [
{
"name": "confidence_interval",
"source": "COHD",
"type": "EDAM:data_0951",
"value": [
7.374817270079623,
7.872219873513473
]
},
{
"name": "dataset_id",
"source": "COHD",
"type": "EDAM:data_1048",
"value": 3
},
{
"name": "expected_count",
"source": "COHD",
"type": "EDAM:operation_3438",
"value": 0.0846235661353298
},
{
"name": "ln_ratio",
"source": "COHD",
"type": "EDAM:data_1772",
"value": 7.645692224215023
},
{
"name": "observed_count",
"source": "COHD",
"type": "EDAM:data_0006",
"value": 177
}
],
}
]
}
TRAPI-Compliant Representation of this Association:
edges:
- id: cohd.Association001
category: biolink:Association
subject: OMOP:192855 # "Cancer in situ of urinary bladder"
predicate: biolink:correlated_with
object: OMOP:4021588 # "Instillation of therapeutic substance into bladder"
attributes:
- attribute_type_id: SEPIO:0000145 # has_evidence_from_source
attribute_from_source: null # null b/c the source does not explicitly describe the EPC relationship
value: cohd:dataset3
value_type_id: EDAM:data_1048 # Database ID. Might it be more appropritate to use IAO:0000100 'data set'?
value_type_from_source: null # no name provided in the source cohd data
value_source: COHD
- attribute_type_id: SEPIO:0000440 # has_supporting_evidence
attribute_from_source: null
value: 177
value_type_id: EDAM:data_0006 # data
value_type_from_source: observed count # this is cohd's more specific name for the type of data
value_source: COHD
- attribute_type_id: SEPIO:0000440 # has_supporting_evidence
attribute_from_source: null
value: 0.0846235661353298
value_type_id: EDAM:operation_3438 # calculation
value_type_from_source: expected count # this is cohd's more specific name for the type of data
value_source: COHD
- attribute_type_id: SEPIO:0000440 # has_supporting_evidence
attribute_from_source: null
value: 7.645692224215023
value_type_id: EDAM:data_1772 # score
value_type_from_source: ln ratio # this is cohd's more specific name for the type of data
value_source: COHD
- attribute_type_id: SEPIO:0000440 # has_supporting_evidence
attribute_from_source: null
value:
- 7.374817270079623
- 7.872219873513473
value_type_id: EDAM:data_0951 # statistical estimate score
value_type_from_source: confidence interval # this is cohd's more specific name for the type of data
value_source: COHD
# Since the confidence intervals are related to the ln_ratio, we could capture these inside the ln_ratio object in a nested Attribute
# - attribute_type_id: SEPIO:0000440 # has_supporting_evidence
# attribute_from_source: null
# value: 7.645692224215023
# value_type_id: EDAM:data_1772
# value_type_from_source: ln ratio
# value_source: COHD
# attributes:
# - attribute_type_id: SEPIO:XXXXXXX
# attribute_from_source: has_confidence_interval
# value:
# - 7.374817270079623
# - 7.872219873513473
# value_type_id: EDAM:data_0951
# value_type_from_source: confidence_interval
# value_source: COHD
# Proposing an Attribute to capture the statistical/computational method used to generate the Association
# - attribute_type_id: SEPIO:0000141 # was_specified_by
# attribute_from_source: "method"
# value: cohd.association/obsExpRatio
# value_type_id: EDAM:operation_3658 # statistical inference
# value_type_from_source: statistical inference
# Proposing an Attribute to capture the study group population that defines the context in which the association is held to be true
# - attribute_type_id: biolink:populationContextQualifier
# attribute_from_source: null
# value: cohd:cohort3-100
# value_type_id: STATO:0000193 # study group population
# value_type_from_source: study group population
#### Example 2. Text mining KP association generated from NLP-based text mining of publication text
Source Data from Text Mining KP API (http://35.232.64.189/docs): (soon to be TRAPI compliant)
# Edge:
graph": {
"nodes": [
{
"id": "n0",
"type": "chemical_substance",
"curie": "CHEBI:3215" # bupivacaine
},
{
"id": "n1",
"type": "gene_product",
"curie": "PR:000031567" # LRRC3B
}
],
"edges": [
{
"id": "e0",
"source_id": "n0",
"target_id": "n1",
"type": "negatively_regulates_entity_to_entity"
}
]
# Edge Properties:
{
'publication': 'PMID:29085514',
'score': '0.99956816',
'sentence': 'The administration of 50 ?g/ml bupivacaine promoted maximum breast cancer cell invasion, and suppressed LRRC3B mRNA expression in cells.',
'subject_spans': 'start: 31, end: 42',
'object_spans': 'start: 104, end: 110',
'provided_by': 'TMProvider'
}
TRAPI-Compliant Representation of this Association:
edges:
- id: tmkp.Association001
category: biolink:ChemicalToGeneAssociation
subject: CHEBI:3215 # bupivacaine
predicate: biolink:negatively_regulates_entity_to_entity
object: NCBIGene:51176 # LRRC3B
attributes:
- attribute_type_id: SEPIO:0000438 # has_supporting_evidence_from_source
attribute_from_source: "source publication" # what the source might have called the relationship
value: PMID:29085514
value_type_id: biolink:Publication # here a biolink term is used to type the value.
value_type_from_source: "PMID"
value_ source: TMProvider
- attribute_type_id: SEPIO:0000440 # has_supporting_evidence
value: "The administration of 50 ?g/ml bupivacaine promoted maximum breast cancer cell invasion, and suppressed LRRC3B mRNA expression in cells."
value_type_id: EDAM:data_3671 # text, or SIO:000113 'sentence'
value_type_from_source: sentence text
value_source: TMProvider
attributes: # exploring option of capturing this metadata in a nested Attribute, since it naturally represents a part of the sentence
- attribute_type_id: SIO:000028 # has part
value: 'start: 31, end: 42'
value_type_from_source: subject span
- attribute_type_id: SIO:000028 # has part
value: 'start: 104, end: 110'
value_type_from_source: object span
- attribute_type_id: SEPIO:0000440 # has_supporting_evidence
value: 0.99956816
value_type_id: EDAM:data_1772 # score
value_type_from_source: sentence confidence score
value_source: TMProvider
#### Example 3. Molecular Interaction association from Molepro KP, derived from CHEMBL
Source Data from MolePro chembl targets serice/API (https://translator.broadinstitute.org/chembl/targets/transform):
{ # the Association asserts that CID:2244 (Aspirin) - is correlated with - NCBIGene:51176 (LEF1)
"id": "NCBIGene:51176", # the object/target of the Association
"identifiers": {
"entrez": "NCBIGene:51176"
},
"source": "CMAP compound-to-gene transformer" # the Agent that authored the Association
"attributes": [], # attributes of the object/target node (none)
"biolink_class": "Gene", # the biolink category of the object of the Association
"connections": [ # connections = edges/associations, between the subject captured in the 'source_element_id' slot of the connection object below, and the object captured in the id slot above
{ "source_element_id": "CID:2244", # the subject/source of the Association
"type": "correlated_with" # the predicate of the Association
"attributes": [ # attributes of the Edge/Association
{
"name": "CMAP similarity score",
"provided_by": "CMAP compound-to-gene transformer",
"source": "CMAP",
"type": "CMAP similarity score",
"value": "99.68"
},
{
"name": "reference",
"provided_by": "CMAP compound-to-gene transformer",
"source": "CMAP",
"type": "reference",
"value": "PMID:29195078"
},
{
"name": "about CMAP",
"provided_by": "CMAP compound-to-gene transformer",
"source": "CMAP",
"type": "about CMAP",
"url": "https://clue.io/cmap",
"value": "https://clue.io/cmap"
},
{
"name": "CMAP touchstone data version",
"provided_by": "CMAP compound-to-gene transformer",
"source": "CMAP",
"type": "CMAP touchstone data version",
"url": "https://api.clue.io/api/touchstone-version",
"value": "1.1.1.2"
}
],
}
]
}
TRAPI-Compliant Representation of this Association:
edges:
- id: molepro.Association001
category: biolink:ChemicalToGeneAssociation
subject: CID:2244 # Aspirin
predicate: biolink:correlated_with
object: NCBIGene:51176 # LEF1
attributes:
- attribute_type_id: SEPIO:0000440 # has_supporting_evidence
value: 99.68
value_type_id: CMAP similarity score # MolePro does not provide a coded type from EDAM - assuming this name is a placeholder?
value_type_from_source: CMAP similarity score
value_source: CMAP
- attribute_type_id: SEPIO:0000438 # has_supporting_evidence_from_source
value: PMID:29195078
value_type_id: reference # MolePro uses free text here at present
value_type_from_source: reference
value_source: CMAP
- attribute_type_id: SEPIO:0000442 # has_reference
value: https://clue.io/cmap
value_type_id: SIO:000302 # web page
value_type_from_source: about CMAP
value_source: CMAP
value_url: https://clue.io/cmap
- attribute_type_id: SEPIO:0000438 # has_supporting_evidence_from_source
value: CMAP touchstone v1.1.1.2 # MHB: I framed the version as the dataset itself, named with its version
value_type_id: IAO:0000100 # MolePro uses free text here, "CMAP touchstone data version" - propose a description slot to capture things like this
value_type_from_source: data set
value_source: CMAP
value_url: https://api.clue.io/api/touchstone-version
description: "CMAP touchstone data version"
# Proposing that the CMAP compound-to-gene transformer tool mentioned in each Attrbute be pulled out and represtned as the author / creator of the Association record
# - attribute_type_id: SEPIO:0000130 # authored by (MHB: or if this is the agent that created the record, not the knowledge, use SEPIO:record_created_by)
# attribute_from_source: null
# value: CMAP compound-to-gene transformer
# value_type_id: SWO:0000001 # software
# value_type_from_source: software