Squashed commit of the following:
commit 502581d
Merge: 37ce60f 713f527
Author: Evan Morris <evandietzmorris@gmail.com>
Date:   Mon Dec 9 16:09:17 2024 -0500

    Merge pull request #265 from RobokopU24/add_GHaction

    Added a GitHub action...

commit 37ce60f
Merge: cd9a6c5 90c6231
Author: Evan Morris <evandietzmorris@gmail.com>
Date:   Mon Dec 9 16:07:41 2024 -0500

    Merge pull request #242 from RobokopU24/DnlRKorn-patch-4

    Fixed regex pattern issue in loadCTD.py

commit cd9a6c5
Merge: fcf6611 d63d4bb
Author: Evan Morris <evandietzmorris@gmail.com>
Date:   Mon Dec 9 16:04:50 2024 -0500

    Merge pull request #240 from RobokopU24/DnlRKorn-patch-2

    Dynamically load latest version of GenomeAlliance data

commit fcf6611
Merge: 43874d9 2a39e3a
Author: Evan Morris <evandietzmorris@gmail.com>
Date:   Mon Dec 9 16:00:09 2024 -0500

    Merge pull request #272 from RobokopU24/drugcentral-logscale-potencies

    Convert activity type mapping to log-scale.

commit 2a39e3a
Author: Evan Morris <evandietzmorris@gmail.com>
Date:   Tue Dec 3 12:18:36 2024 -0500

    making all instances of affinity/affinity parameter use the constants

commit 43874d9
Merge: 1867684 3491e3f
Author: Evan Morris <evandietzmorris@gmail.com>
Date:   Mon Dec 2 11:53:28 2024 -0500

    Merge pull request #267 from RobokopU24/LINCS

    Lincs

commit 3491e3f
Author: Evan Morris <evandietzmorris@gmail.com>
Date:   Mon Dec 2 11:51:21 2024 -0500

    fixing parser see commit description

    fixing several bugs and broken variable names
    - fixing source data download location
    - source data delimiter is comma not tab
    - cleaning up properties (they need to be a dictionary, but node props were unnecessary anyway)

    simplifying/fixing qualifier handling
    - using predicates like RO:0002212 includes directionality and will normalize to qualified version, old implementation didn't work anyway, so this is better

commit d2bead1
Author: Evan Morris <evandietzmorris@gmail.com>
Date:   Mon Dec 2 11:46:07 2024 -0500

    fixing missing comma, making quote usage consistent

commit 46f8117
Author: Evan Morris <evandietzmorris@gmail.com>
Date:   Tue Nov 26 10:56:15 2024 -0500

    general clean up, fixing imports, removing template comments

commit bc1e533
Author: beasleyjonm <85600465+beasleyjonm@users.noreply.github.com>
Date:   Mon Nov 25 16:38:44 2024 -0500

    Changed how activity parameters are described in log-scale.

commit 896e7af
Author: beasleyjonm <85600465+beasleyjonm@users.noreply.github.com>
Date:   Sat Nov 23 21:03:48 2024 -0500

    Convert activity type mapping to log-scale.

    The activity and potency types on Drug Central are log-scaled, so let's add the "p" in front of the activity types to reflect that.

commit 1867684
Merge: 52149ee 224d0b5
Author: beasleyjonm <85600465+beasleyjonm@users.noreply.github.com>
Date:   Thu Nov 21 12:19:08 2024 -0500

    Merge pull request #260 from RobokopU24/Issue257

    Add in all Monarch KG edge properties on ingest.

commit 52149ee
Merge: 6b8f389 7bbdc00
Author: beasleyjonm <85600465+beasleyjonm@users.noreply.github.com>
Date:   Thu Nov 21 12:14:21 2024 -0500

    Merge pull request #268 from RobokopU24/collapsed_qualifiers_kg

    Collapsed qualifiers kg

commit 7bbdc00
Author: beasleyjonm <85600465+beasleyjonm@users.noreply.github.com>
Date:   Thu Nov 7 12:35:08 2024 -0500

    Update collapse_qualifiers.py

commit 0839bcc
Author: Jon-Michael Beasley <jmb@JonMichaelsMBP.lan>
Date:   Thu Oct 31 13:42:46 2024 -0400

    Updated to fix code and add option to create collapsed qualifier Neo4j dump.

commit 0abf23c
Author: Jon-Michael Beasley <jmb@JonMichaelsMBP.lan>
Date:   Thu Oct 31 11:30:51 2024 -0400

    Added script to collapse object qualifier statements to the edge predicates.

commit 56e8b5b
Author: James Chung <jchung@renci.org>
Date:   Thu Oct 31 08:14:15 2024 -0400

    return variable added

commit af3037a
Author: James Chung <jchung@renci.org>
Date:   Wed Oct 30 23:35:18 2024 -0400

    LINCS parsers first try

commit b2eef57
Author: James Chung <jchung@renci.org>
Date:   Wed Oct 30 23:34:52 2024 -0400

    LINCS parser first try

commit 713f527
Author: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com>
Date:   Tue Oct 29 10:51:47 2024 -0400

    setting environment variable

commit 26b1073
Author: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com>
Date:   Tue Oct 29 10:34:55 2024 -0400

    trying again with fresh eyes

commit 99fc5ee
Author: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com>
Date:   Tue Oct 29 10:29:49 2024 -0400

    trying again with fresh eyes

commit cc587c2
Author: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com>
Date:   Tue Oct 29 10:18:03 2024 -0400

    trying again with fresh eyes

commit 689d6e9
Author: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com>
Date:   Tue Oct 29 10:02:14 2024 -0400

    new script for action

commit ff79e6a
Author: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com>
Date:   Tue Oct 29 09:58:35 2024 -0400

    trying again with fresh eyes

commit 91d695e
Author: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com>
Date:   Mon Oct 28 17:23:43 2024 -0400

    error tracking/logging

commit 7423a00
Author: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com>
Date:   Mon Oct 28 17:20:02 2024 -0400

    GitHub issues error

commit 24d3c91
Author: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com>
Date:   Mon Oct 28 17:16:19 2024 -0400

    split pull requests and issues

commit 736d185
Author: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com>
Date:   Mon Oct 28 15:58:49 2024 -0400

    split pull requests and issues

commit 55be1e5
Author: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com>
Date:   Mon Oct 28 15:50:50 2024 -0400

    let's try this with github-script@v6 instead...

commit 76bc6bb
Author: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com>
Date:   Mon Oct 28 15:44:57 2024 -0400

    fix double/single quote issue

commit 745d0c3
Author: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com>
Date:   Mon Oct 28 15:43:27 2024 -0400

    fix double/single quote issue

commit e887346
Author: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com>
Date:   Mon Oct 28 15:22:29 2024 -0400

    fix double/single quote issue

commit 979bc72
Author: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com>
Date:   Mon Oct 28 15:19:08 2024 -0400

    bump

commit ff0fc91
Author: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com>
Date:   Mon Oct 28 15:15:29 2024 -0400

    bump

commit 2ff1393
Author: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com>
Date:   Mon Oct 28 15:14:13 2024 -0400

    actually invoke python...

commit 3e95ae8
Author: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com>
Date:   Mon Oct 28 15:10:24 2024 -0400

    update to the latest versions of checkout and setup-python

commit e528db4
Author: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com>
Date:   Mon Oct 28 15:04:06 2024 -0400

    bump

commit 56e9918
Author: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com>
Date:   Mon Oct 28 15:01:21 2024 -0400

    Update requirements.txt

    Will need PyGithub for this...

commit 0d62da8
Author: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com>
Date:   Mon Oct 28 14:51:45 2024 -0400

    Added a GitHub action to automatically add the label "Biological Context QC" to issues and PRs that mention "predicate" or "biolink:"

    Automatically assign this event to Kathleen

commit 6b8f389
Merge: b5fcc32 f604791
Author: Evan Morris <evandietzmorris@gmail.com>
Date:   Thu Oct 24 15:40:23 2024 -0400

    Merge pull request #263 from RobokopU24/hgnc_fix

    updating HGNC file location and version date for new HGNC set up

commit f604791
Author: Evan Morris <evandietzmorris@gmail.com>
Date:   Thu Oct 24 13:23:43 2024 -0400

    updating HGNC file location and version date for new HGNC set up

commit b5fcc32
Merge: bc28a73 ff3099e
Author: Evan Morris <evandietzmorris@gmail.com>
Date:   Thu Oct 24 13:06:41 2024 -0400

    Merge pull request #262 from RobokopU24/dependabot/pip/mysql-connector-python-9.1.0

    Bump mysql-connector-python from 8.4.0 to 9.1.0

commit ff3099e
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Thu Oct 24 17:01:59 2024 +0000

    Bump mysql-connector-python from 8.4.0 to 9.1.0

    Bumps [mysql-connector-python](https://github.com/mysql/mysql-connector-python) from 8.4.0 to 9.1.0.
    - [Changelog](https://github.com/mysql/mysql-connector-python/blob/trunk/CHANGES.txt)
    - [Commits](mysql/mysql-connector-python@8.4.0...9.1.0)

    ---
    updated-dependencies:
    - dependency-name: mysql-connector-python
      dependency-type: direct:production
    ...

    Signed-off-by: dependabot[bot] <support@github.com>

commit 224d0b5
Author: Daniel Korn <dkorn@ht1.cluster>
Date:   Thu Sep 19 15:29:18 2024 -0400

    Added all available metadata present in Monarch KG to edge properties.

commit 5e9151b
Author: Daniel Korn <dkorn@ht1.cluster>
Date:   Thu Sep 19 13:03:18 2024 -0400

    Changed Monarch KG ingest to automatically pull from the latest version. Also made process to check the metadata yaml file for the publishing date of the latest version.

commit 90c6231
Author: DnlRKorn <6885702+DnlRKorn@users.noreply.github.com>
Date:   Thu Jul 25 13:57:52 2024 -0400

    Fixed regex pattern issue in loadCTD.py

    Made the regex pattern a "raw" string.

commit d63d4bb
Author: DnlRKorn <6885702+DnlRKorn@users.noreply.github.com>
Date:   Thu Jul 25 11:48:15 2024 -0400

    Dynamically load latest version of GenomeAlliance data

    Previously genome alliance data was frozen to version 5.3.0; added some code to get the latest version instead.
EvanDietzMorris committed Dec 19, 2024
1 parent d4d979a commit 40bbb95
Showing 14 changed files with 455 additions and 54 deletions.
46 changes: 46 additions & 0 deletions .github/scripts/Bio_QC_check.py
@@ -0,0 +1,46 @@
import os
import requests

PREDICATE_KEYWORDS = ["predicate", "biolink:", "edges"]
LABEL_NAME = "Biological Context QC"  # Label to add if keywords are found

# GitHub API variables
GITHUB_TOKEN = os.getenv("GITHUB_TOKEN")
REPO_NAME = os.getenv("GITHUB_REPOSITORY")
ISSUE_NUMBER = os.getenv("ISSUE_NUMBER")
# Report only whether the token is set; never print the secret itself.
print("GITHUB_TOKEN set:", bool(GITHUB_TOKEN))
print("REPO_NAME:", REPO_NAME)
print("ISSUE_NUMBER:", ISSUE_NUMBER)

headers = {"Authorization": f"Bearer {GITHUB_TOKEN}"}
api_url = f"https://api.github.com/repos/{REPO_NAME}"

def get_issue_details(issue_number):
    response = requests.get(f"{api_url}/issues/{issue_number}", headers=headers)
    response.raise_for_status()
    return response.json()

def add_label(issue_number, label_name):
    response = requests.post(
        f"{api_url}/issues/{issue_number}/labels",
        headers=headers,
        json={"labels": [label_name]}
    )
    response.raise_for_status()
    print(f"Label '{label_name}' added to issue/PR #{issue_number}")

def check_keywords_in_text(text, keywords):
    return any(keyword in text for keyword in keywords)

def main():
    issue_details = get_issue_details(ISSUE_NUMBER)
    # The API returns null for an empty title or body, so fall back to an empty string.
    title = issue_details["title"] or ""
    body = issue_details["body"] or ""

    if check_keywords_in_text(title, PREDICATE_KEYWORDS) or check_keywords_in_text(body, PREDICATE_KEYWORDS):
        add_label(ISSUE_NUMBER, LABEL_NAME)
    else:
        print("No predicate keywords found.")

if __name__ == "__main__":
    main()
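The keyword check in this script can be tried in isolation. The following is a minimal sketch, not the committed code: `contains_keyword` is a hypothetical helper, and it assumes that matching should ignore case and that a missing description arrives as `None`. Neither assumption is stated in the commit.

```python
from typing import Iterable, Optional

# Keyword list copied from Bio_QC_check.py.
PREDICATE_KEYWORDS = ["predicate", "biolink:", "edges"]


def contains_keyword(text: Optional[str], keywords: Iterable[str]) -> bool:
    """Return True if any keyword occurs in text (hypothetical helper).

    None or empty text never matches; matching ignores case (an assumption).
    """
    if not text:
        return False
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


if __name__ == "__main__":
    print(contains_keyword("Update Biolink:affects mapping", PREDICATE_KEYWORDS))
    print(contains_keyword(None, PREDICATE_KEYWORDS))
```

Lower-casing both sides makes the check slightly broader than the committed substring test, so it may label more issues than the original script would.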
32 changes: 32 additions & 0 deletions .github/workflows/label-predicate-changes.yml
@@ -0,0 +1,32 @@
name: 'Label Predicate Changes'

on:
  pull_request:
    types: [opened, edited, synchronize]
  issues:
    types: [opened, edited]

jobs:
  label_check:
    runs-on: ubuntu-latest

    steps:
      - name: Check out code
        uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: 3.9

      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install PyGithub
      - name: Run predicate check
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          ISSUE_NUMBER: ${{ github.event.pull_request.number || github.event.issue.number }}
        run: |
          python .github/scripts/Bio_QC_check.py
52 changes: 47 additions & 5 deletions Common/build_manager.py
@@ -18,10 +18,12 @@
 from Common.supplementation import SequenceVariantSupplementation
 from Common.meta_kg import MetaKnowledgeGraphBuilder, META_KG_FILENAME, TEST_DATA_FILENAME
 from Common.redundant_kg import generate_redundant_kg
+from Common.collapse_qualifiers import generate_collapsed_qualifiers_kg

 NODES_FILENAME = 'nodes.jsonl'
 EDGES_FILENAME = 'edges.jsonl'
 REDUNDANT_EDGES_FILENAME = 'redundant_edges.jsonl'
+COLLAPSED_QUALIFIERS_FILENAME = 'collapsed_qualifier_edges.jsonl'


 class GraphBuilder:
@@ -118,6 +120,51 @@ def build_graph(self, graph_id: str):
                                 generate_test_data=needs_test_data)

         output_formats = graph_spec.graph_output_format.lower().split('+') if graph_spec.graph_output_format else []
+        nodes_filepath = os.path.join(graph_output_dir, NODES_FILENAME)
+        edges_filepath = os.path.join(graph_output_dir, EDGES_FILENAME)
+
+        if 'redundant_jsonl' in output_formats:
+            self.logger.info(f'Generating redundant edge KG for {graph_id}...')
+            redundant_filepath = edges_filepath.replace(EDGES_FILENAME, REDUNDANT_EDGES_FILENAME)
+            generate_redundant_kg(edges_filepath, redundant_filepath)
+
+        if 'redundant_neo4j' in output_formats:
+            self.logger.info(f'Generating redundant edge KG for {graph_id}...')
+            redundant_filepath = edges_filepath.replace(EDGES_FILENAME, REDUNDANT_EDGES_FILENAME)
+            generate_redundant_kg(edges_filepath, redundant_filepath)
+            self.logger.info(f'Starting Neo4j dump pipeline for redundant {graph_id}...')
+            dump_success = create_neo4j_dump(nodes_filepath=nodes_filepath,
+                                             edges_filepath=redundant_filepath,
+                                             output_directory=graph_output_dir,
+                                             graph_id=graph_id,
+                                             graph_version=graph_version,
+                                             logger=self.logger)
+
+            if dump_success:
+                graph_output_url = self.get_graph_output_URL(graph_id, graph_version)
+                graph_metadata.set_dump_url(f'{graph_output_url}graph_{graph_version}_redundant.db.dump')
+
+        if 'collapsed_qualifiers_jsonl' in output_formats:
+            self.logger.info(f'Generating collapsed qualifier predicates KG for {graph_id}...')
+            collapsed_qualifiers_filepath = edges_filepath.replace(EDGES_FILENAME, COLLAPSED_QUALIFIERS_FILENAME)
+            generate_collapsed_qualifiers_kg(edges_filepath, collapsed_qualifiers_filepath)
+
+        if 'collapsed_qualifiers_neo4j' in output_formats:
+            self.logger.info(f'Generating collapsed qualifier predicates KG for {graph_id}...')
+            collapsed_qualifiers_filepath = edges_filepath.replace(EDGES_FILENAME, COLLAPSED_QUALIFIERS_FILENAME)
+            generate_collapsed_qualifiers_kg(edges_filepath, collapsed_qualifiers_filepath)
+            self.logger.info(f'Starting Neo4j dump pipeline for {graph_id} with collapsed qualifiers...')
+            dump_success = create_neo4j_dump(nodes_filepath=nodes_filepath,
+                                             edges_filepath=collapsed_qualifiers_filepath,
+                                             output_directory=graph_output_dir,
+                                             graph_id=graph_id,
+                                             graph_version=graph_version,
+                                             logger=self.logger)
+
+            if dump_success:
+                graph_output_url = self.get_graph_output_URL(graph_id, graph_version)
+                graph_metadata.set_dump_url(f'{graph_output_url}graph_{graph_version}_collapsed_qualifiers.db.dump')
+
         if 'neo4j' in output_formats:
             self.logger.info(f'Starting Neo4j dump pipeline for {graph_id}...')
             dump_success = create_neo4j_dump(nodes_filepath=nodes_filepath,
@@ -131,11 +178,6 @@ def build_graph(self, graph_id: str):
                 graph_output_url = self.get_graph_output_URL(graph_id, graph_version)
                 graph_metadata.set_dump_url(f'{graph_output_url}graph_{graph_version}.db.dump')

-        if 'redundant_jsonl' in output_formats:
-            self.logger.info(f'Generating redundant edge KG for {graph_id}...')
-            redundant_filepath = edges_filepath.replace(EDGES_FILENAME, REDUNDANT_EDGES_FILENAME)
-            generate_redundant_kg(edges_filepath, redundant_filepath)
-
     def build_dependencies(self, graph_spec: GraphSpec):
         for subgraph_source in graph_spec.subgraphs:
             subgraph_id = subgraph_source.id
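The `build_graph` changes select outputs by splitting `graph_output_format` on `+`. A small sketch of that parsing follows. `parse_output_formats` and `unknown_formats` are hypothetical helpers, and `KNOWN_FORMATS` lists only the format names visible in this hunk; the repository may accept others.

```python
from typing import List, Optional

# Format names taken from the build_graph hunk; possibly incomplete.
KNOWN_FORMATS = {
    "neo4j",
    "redundant_jsonl",
    "redundant_neo4j",
    "collapsed_qualifiers_jsonl",
    "collapsed_qualifiers_neo4j",
}


def parse_output_formats(graph_output_format: Optional[str]) -> List[str]:
    """Split a '+'-joined format string the same way build_graph does."""
    return graph_output_format.lower().split("+") if graph_output_format else []


def unknown_formats(formats: List[str]) -> List[str]:
    """Hypothetical check: return requested formats not listed above."""
    return [fmt for fmt in formats if fmt not in KNOWN_FORMATS]


if __name__ == "__main__":
    formats = parse_output_formats("Neo4j+collapsed_qualifiers_jsonl")
    print(formats)
    print(unknown_formats(formats))
```

Because the membership tests compare whole list items, requesting `redundant_neo4j` does not also trigger the plain `neo4j` branch.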
171 changes: 171 additions & 0 deletions Common/collapse_qualifiers.py
@@ -0,0 +1,171 @@
try:
    from tqdm import tqdm
    TQDM_AVAILABLE = True
except ImportError:
    TQDM_AVAILABLE = False

from Common.biolink_constants import PREDICATE, QUALIFIED_PREDICATE, SUBJECT_DERIVATIVE_QUALIFIER, SUBJECT_FORM_OR_VARIANT_QUALIFIER, SUBJECT_PART_QUALIFIER, \
    SUBJECT_DIRECTION_QUALIFIER, SUBJECT_ASPECT_QUALIFIER, OBJECT_DERIVATIVE_QUALIFIER, OBJECT_FORM_OR_VARIANT_QUALIFIER, \
    OBJECT_PART_QUALIFIER, OBJECT_DIRECTION_QUALIFIER, OBJECT_ASPECT_QUALIFIER, CAUSAL_MECHANISM_QUALIFIER, \
    ANATOMICAL_CONTEXT_QUALIFIER, SPECIES_CONTEXT_QUALIFIER
from Common.biolink_utils import get_biolink_model_toolkit
from Common.utils import quick_jsonl_file_iterator
from Common.kgx_file_writer import KGXFileWriter

### The goal of this script is to collapse the qualifiers, which are in edge properties, into a single statement, then replace the
### existing predicate label with the collapsed qualifier statement.

### Call the biolink model toolkit to get the list of all qualifiers. This may change, but the way qualifiers are handled is currently hard-coded in this script.
bmt = get_biolink_model_toolkit()

def write_edge_no_q(edge, predicate, qualifiers):
    tmp_edge = edge.copy()
    tmp_edge[PREDICATE] = f"{predicate}"
    for qualifier in qualifiers.keys():
        tmp_edge.pop(qualifier, None)
    return tmp_edge

def aspect_qualifier_semantic_adjustment(aspect_qualifier):
    # TODO check if other aspect qualifiers besides molecular interaction need to be treated differently.
    if aspect_qualifier.split('_')[-1] == 'interaction':
        aspect_conversion = aspect_qualifier + "_with"
    else:
        aspect_conversion = aspect_qualifier + "_of"
    return aspect_conversion

def form_or_variant_qualifier_semantic_adjustment(form_or_variant_qualifier):
    # TODO check if other form_or_variant_qualifier qualifiers besides molecular interaction need to be treated differently.
    form_or_variant_conversion = form_or_variant_qualifier + "_of"
    return form_or_variant_conversion

def causal_mechanism_qualifier_semantic_adjustment(causal_mechanism_qualifier):
    # TODO check if other causal_mechanism qualifiers besides molecular interaction need to be treated differently.
    causal_mechanism_qualifier = "via_"+ causal_mechanism_qualifier
    return causal_mechanism_qualifier

def species_context_qualifier_semantic_adjustment(species_context_qualifier):
    species_context_qualifier = "in_"+ species_context_qualifier
    return species_context_qualifier

def anatomical_context_qualifier_semantic_adjustment(anatomical_context_qualifier, species_context_qualifier=False):
    if species_context_qualifier == False:
        anatomical_context_qualifier = "in_"+ anatomical_context_qualifier
    return anatomical_context_qualifier

def generate_collapsed_qualifiers_kg(infile, edges_file_path):

    with KGXFileWriter(edges_output_file_path=edges_file_path) as kgx_file_writer:
        for edge in tqdm(quick_jsonl_file_iterator(infile)) if TQDM_AVAILABLE else quick_jsonl_file_iterator(infile):

            try:
                edge_predicate = edge['predicate']
            except KeyError:
                print(f"Collapsed Qualifiers Graph Failed - missing predicate on edge: {edge}")
                break

            qualifiers = {key:value for key, value in edge.items() if bmt.is_qualifier(key)}
            # Count the qualifiers and print a warning if the number handled in the next section doesn't match the number detected.
            # This will help warn us if new qualifiers are added in the future while giving us the option to still run the script as is.
            qualifier_count = len(qualifiers.keys())
            counted_qualifiers = 0

            # The following section crafts a new collapsed qualifier statement to replace the edge predicate, but needs to do some semantic adjustment.
            # This is where to edit if the biolink model ever changes and handles qualifiers differently.
            # Take guidance from: https://biolink.github.io/biolink-model/reading-a-qualifier-based-statement/
            # Example jsonl edge used here: {"subject":"UNII:7PK6VC94OU","predicate":"biolink:affects","object":"NCBIGene:6531","primary_knowledge_source":"infores:ctd","description":"decreases activity of","NCBITaxon":"9606","publications":["PMID:30776375"],"knowledge_level":"knowledge_assertion","agent_type":"manual_agent","subject_direction_qualifier":"increased","subject_aspect_qualifier":"abundance","subject_form_or_variant_qualifier":"mutant_form","subject_derivative_qualifier":"transcript","subject_part_qualifier":"polyA_tail","object_aspect_qualifier":"activity","object_direction_qualifier":"upregulated","object_form_or_variant_qualifier":"wildtype_form","object_derivative_qualifier":"protein","object_part_qualifier":"catalytic_site","causal_mechanism_qualifier":"phosphorylation","species_context_qualifier":"human","anatomical_context_qualifier":"liver","qualified_predicate":"biolink:causes"}

            qualifier_statement = ""

            # Add on subject direction and aspect qualifiers first. eg. "increased_abundance_of_"
            if SUBJECT_DIRECTION_QUALIFIER in qualifiers.keys():
                counted_qualifiers+= 1
                qualifier_statement+= qualifiers[SUBJECT_DIRECTION_QUALIFIER]
                qualifier_statement+= "_"
            if SUBJECT_ASPECT_QUALIFIER in qualifiers.keys():
                counted_qualifiers+= 1
                qualifier_statement+= aspect_qualifier_semantic_adjustment(qualifiers[SUBJECT_ASPECT_QUALIFIER])
                qualifier_statement+= "_"
            # Add on subject form_or_variant qualifiers. eg. "increased_abundance_of_mutant_form_of_"
            if SUBJECT_FORM_OR_VARIANT_QUALIFIER in qualifiers.keys():
                counted_qualifiers+= 1
                qualifier_statement+= form_or_variant_qualifier_semantic_adjustment(qualifiers[SUBJECT_FORM_OR_VARIANT_QUALIFIER])
                qualifier_statement+= "_"
            # Add placeholder slot for subject node. eg. "increased_abundance_of_mutant_form_of_<subject_node>"
            qualifier_statement+= "<subject_node>_"
            # Add on subject derivative and part qualifiers. eg. "increased_abundance_of_mutant_form_of_<subject_node>_transcript_polyA_tail"
            if SUBJECT_DERIVATIVE_QUALIFIER in qualifiers.keys():
                counted_qualifiers+= 1
                qualifier_statement+= qualifiers[SUBJECT_DERIVATIVE_QUALIFIER]
                qualifier_statement+= "_"
            if SUBJECT_PART_QUALIFIER in qualifiers.keys():
                counted_qualifiers+= 1
                qualifier_statement+= qualifiers[SUBJECT_PART_QUALIFIER]
                qualifier_statement+= "_"

            # Add the qualified predicate. eg. "increased_abundance_of_mutant_form_of_<subject_node>_transcript_polyA_tail_causes"
            if QUALIFIED_PREDICATE in qualifiers.keys():
                counted_qualifiers+= 1
                qualifier_statement+= qualifiers[QUALIFIED_PREDICATE].replace("biolink:","")
                qualifier_statement+= "_"

            # Add on object direction and aspect qualifiers. eg. "increased_abundance_of_mutant_form_of_<subject_node>_transcript_polyA_tail_causes_upregulated_activity_of"
            if OBJECT_DIRECTION_QUALIFIER in qualifiers.keys():
                counted_qualifiers+= 1
                qualifier_statement+= qualifiers[OBJECT_DIRECTION_QUALIFIER]
                qualifier_statement+= "_"
            if OBJECT_ASPECT_QUALIFIER in qualifiers.keys():
                counted_qualifiers+= 1
                qualifier_statement+= aspect_qualifier_semantic_adjustment(qualifiers[OBJECT_ASPECT_QUALIFIER])
                qualifier_statement+= "_"
            # Add on object form_or_variant qualifiers. eg. "increased_abundance_of_mutant_form_of_<subject_node>_transcript_polyA_tail_causes_upregulated_activity_of_wildtype_form_of"
            if OBJECT_FORM_OR_VARIANT_QUALIFIER in qualifiers.keys():
                counted_qualifiers+= 1
                qualifier_statement+= form_or_variant_qualifier_semantic_adjustment(qualifiers[OBJECT_FORM_OR_VARIANT_QUALIFIER])
                qualifier_statement+= "_"
            # Add placeholder slot for object node. eg. "increased_abundance_of_mutant_form_of_<subject_node>_transcript_polyA_tail_causes_upregulated_activity_of_wildtype_form_of_<object_node>"
            qualifier_statement+= "<object_node>"

            # Add on object derivative and part qualifiers. eg. "increased_abundance_of_mutant_form_of_<subject_node>_transcript_polyA_tail_causes_upregulated_activity_of_wildtype_form_of_<object_node>_protein_catalytic_site"
            # Need to start putting "_" before each qualifier as any given one could be the last in the statement.
            if OBJECT_DERIVATIVE_QUALIFIER in qualifiers.keys():
                counted_qualifiers+= 1
                qualifier_statement+= "_"
                qualifier_statement+= qualifiers[OBJECT_DERIVATIVE_QUALIFIER]
            if OBJECT_PART_QUALIFIER in qualifiers.keys():
                counted_qualifiers+= 1
                qualifier_statement+= "_"
                qualifier_statement+= qualifiers[OBJECT_PART_QUALIFIER]

            # Add on mechanism qualifiers. eg. "increased_abundance_of_mutant_form_of_<subject_node>_transcript_polyA_tail_causes_upregulated_activity_of_wildtype_form_of_<object_node>_protein_catalytic_site_via_phosphorylation"
            if CAUSAL_MECHANISM_QUALIFIER in qualifiers.keys():
                counted_qualifiers+= 1
                qualifier_statement+= "_"
                qualifier_statement+= causal_mechanism_qualifier_semantic_adjustment(qualifiers[CAUSAL_MECHANISM_QUALIFIER])

            # Add on species qualifiers. eg. "increased_abundance_of_mutant_form_of_<subject_node>_transcript_polyA_tail_causes_upregulated_activity_of_wildtype_form_of_<object_node>_protein_catalytic_site_via_phosphorylation_in_human"
            if SPECIES_CONTEXT_QUALIFIER in qualifiers.keys():
                counted_qualifiers+= 1
                qualifier_statement+= "_"
                qualifier_statement+= species_context_qualifier_semantic_adjustment(qualifiers[SPECIES_CONTEXT_QUALIFIER])

            # Add on anatomical context qualifiers. eg. "increased_abundance_of_mutant_form_of_<subject_node>_transcript_polyA_tail_causes_upregulated_activity_of_wildtype_form_of_<object_node>_protein_catalytic_site_via_phosphorylation_in_human_liver"
            if ANATOMICAL_CONTEXT_QUALIFIER in qualifiers.keys():
                counted_qualifiers+= 1
                qualifier_statement+= "_"
                if SPECIES_CONTEXT_QUALIFIER in qualifiers.keys():
                    species_qualifier = True
                else:
                    species_qualifier = False
                qualifier_statement+= anatomical_context_qualifier_semantic_adjustment(qualifiers[ANATOMICAL_CONTEXT_QUALIFIER], species_qualifier)

            if counted_qualifiers < qualifier_count:
                print(f"Qualifiers on edge: {edge} are not all being handled correctly. Please revise collapse_qualifiers.py to handle all qualifiers.")

            # Either rewrite the original edge if no qualifier collapsing happened, or rewrite with new predicate from qualifier_statement.
            # The node placeholders are always appended, so qualifier_statement is never empty;
            # use the count of handled qualifiers to decide whether any collapsing happened.
            edges_to_write = []
            if counted_qualifiers > 0:
                edges_to_write.append(write_edge_no_q(edge, qualifier_statement, qualifiers))
            else:
                edges_to_write.append(edge)

            kgx_file_writer.write_normalized_edges(edges_to_write)
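The collapsing logic is easier to follow on a reduced case. The sketch below is a simplified stand-in, not the committed function: it handles only the direction, aspect and qualified-predicate qualifiers, and it uses literal key strings taken from the example edge in the comments, on the assumption that they match the values of the `Common.biolink_constants` names. `aspect_phrase` and `collapse_qualifiers` are hypothetical names.

```python
from typing import Callable, Dict, List, Optional


def aspect_phrase(aspect: str) -> str:
    """Append '_with' to interaction aspects and '_of' to all others."""
    suffix = "_with" if aspect.split("_")[-1] == "interaction" else "_of"
    return aspect + suffix


def collapse_qualifiers(edge: Dict[str, str]) -> Optional[str]:
    """Build a collapsed predicate from a subset of qualifiers.

    Returns None when the edge carries none of the handled qualifiers,
    so the caller can keep the original predicate.
    """
    parts: List[str] = []
    handled = 0

    def add(key: str, transform: Callable[[str], str] = lambda value: value) -> None:
        nonlocal handled
        if key in edge:
            parts.append(transform(edge[key]))
            handled += 1

    add("subject_direction_qualifier")
    add("subject_aspect_qualifier", aspect_phrase)
    parts.append("<subject_node>")
    add("qualified_predicate", lambda value: value.replace("biolink:", ""))
    add("object_direction_qualifier")
    add("object_aspect_qualifier", aspect_phrase)
    parts.append("<object_node>")
    return "_".join(parts) if handled else None


if __name__ == "__main__":
    example = {
        "subject": "UNII:7PK6VC94OU",
        "predicate": "biolink:affects",
        "object": "NCBIGene:6531",
        "qualified_predicate": "biolink:causes",
        "object_direction_qualifier": "decreased",
        "object_aspect_qualifier": "activity",
    }
    print(collapse_qualifiers(example))
```

Collecting the pieces in a list and joining once avoids tracking where each underscore belongs, which the full script does by hand.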
2 changes: 2 additions & 0 deletions Common/data_sources.py
@@ -19,6 +19,7 @@
 HMDB = 'HMDB'
 HUMAN_GOA = 'HumanGOA'
 INTACT = 'IntAct'
+LINCS = 'LINCS'
 LITCOIN = 'LitCoin'
 LITCOIN_W_BAGEL_SERVICE = 'LitCoinBagelService'
 LITCOIN_ENTITY_EXTRACTOR = 'LitCoinEntityExtractor'
@@ -67,6 +68,7 @@
     HUMAN_GOA: ("parsers.GOA.src.loadGOA", "HumanGOALoader"),
     HUMAN_STRING: ("parsers.STRING.src.loadSTRINGDB", "HumanSTRINGDBLoader"),
     INTACT: ("parsers.IntAct.src.loadIA", "IALoader"),
+    LINCS: ("parsers.LINCS.src.loadLINCS", "LINCSLoader"),
     LITCOIN: ("parsers.LitCoin.src.loadLitCoin", "LitCoinLoader"),
     LITCOIN_W_BAGEL_SERVICE: ("parsers.LitCoin.src.loadLitCoin", "LitCoinBagelServiceLoader"),
     # LITCOIN_ENTITY_EXTRACTOR: ("parsers.LitCoin.src.loadLitCoin", "LitCoinEntityExtractorLoader"),
3 changes: 2 additions & 1 deletion parsers/BINDING/src/loadBINDINGDB.py
@@ -11,7 +11,8 @@
 from Common.utils import GetData, GetDataPullError
 from Common.loader_interface import SourceDataLoader
 from Common.extractor import Extractor
-from Common.biolink_constants import PUBLICATIONS, AFFINITY, AFFINITY_PARAMETER, KNOWLEDGE_LEVEL, AGENT_TYPE, KNOWLEDGE_ASSERTION, MANUAL_AGENT
+from Common.biolink_constants import PUBLICATIONS, AFFINITY, AFFINITY_PARAMETER, KNOWLEDGE_LEVEL, AGENT_TYPE, \
+    KNOWLEDGE_ASSERTION, MANUAL_AGENT

 # Full Binding Data.

2 changes: 1 addition & 1 deletion parsers/CTD/src/loadCTD.py
@@ -533,7 +533,7 @@ def convert_predicates(predicate):
     :return:
     """
     # the capture regex
-    regex = '\/|\ |\^'
+    regex = r'\/|\ |\^'

     # clean up the predicate
     cleaned_predicate = re.sub(regex, '_', predicate)
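The raw-string change matters because sequences such as `\/` are not valid string escapes in Python; in a normal string literal they trigger a DeprecationWarning on Python 3.6 and later and a SyntaxWarning from Python 3.12. The pattern itself is unchanged. A short sketch of its effect follows; `clean_predicate` is a hypothetical wrapper, not a function in `loadCTD.py`.

```python
import re

# Same pattern as in convert_predicates: match '/', a space, or '^'.
REGEX = r'\/|\ |\^'


def clean_predicate(predicate: str) -> str:
    """Replace each '/', space or '^' with an underscore (hypothetical wrapper)."""
    return re.sub(REGEX, '_', predicate)


if __name__ == "__main__":
    print(clean_predicate("increases^expression"))
    print(clean_predicate("a/b c"))
```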