GO: top-level mappings tab has inaccuracies #297

Open
jvendetti opened this issue Jan 9, 2024 · 6 comments


@jvendetti
Member

From @caufieldjh:

Some mappings are not accessible? E.g. if I am on https://bioportal.bioontology.org/ontologies/GO/?p=mappings and I click CYTOKINE I get “No mappings found”


The top-level Mappings tab in BioPortal for the GO ontology shows a mapping count of 59 for mappings between GO and CYTOKINE:

Screenshot 2024-01-09 at 11 06 52 AM

There is no such CYTOKINE ontology in BioPortal currently. I checked the production server and there is no physical directory that matches this ontology acronym.

@jvendetti
Member Author

@alexskr - with regard to the above problem description, what is the status of the script that generates mapping counts? I know that we resurrected this process now that we're on AllegroGraph, and on December 15th you mentioned in the bioportal-operations Slack channel that it was running in production. Did it run to completion? Are we back to executing once weekly on Saturday nights?

My first two ideas here are that 1) there's an issue in our mapping count generation code, or 2) if an ontology is deleted, somehow the associated triples that materialize mappings aren't cleanly removed from the triplestore.

@alexskr
Member

alexskr commented Jan 9, 2024

The mapping counts job is enabled and completes successfully according to the logs.

BioPortal has the CYTOKINE ontology with the CYTO acronym.

@caufieldjh

So CYTOKINE as an ontology is here: https://bioportal.bioontology.org/ontologies/CYTO
but the mapping link above (https://bioportal.bioontology.org/mappings/GO?target=https%3A%2F%2Fdata.bioontology.org%2Fontologies%2FCYTO) reports that CYTO has no mappings.

@jvendetti
Member Author

Ugh. Not enough caffeine this morning. 😩

Indeed, CYTOKINE is not an ontology acronym but rather the name of the ontology. The acronym is CYTO, accessible here: https://bioportal.bioontology.org/ontologies/CYTO. This doesn't appear to be an issue with the Rails application, as the relevant REST call to retrieve the mappings between GO and CYTO returns a total count value of 59 along with an empty collection.

Screenshot 2024-01-09 at 2 33 05 PM

@alexskr
Member

alexskr commented Jan 9, 2024

@jvendetti
Member Author

I took a detailed look at the log file for the last run of the cron_mapping_counts job, in particular the section that contains log output for calculation of the pairwise mapping counts. For any given ontology, it looks like this:

I, [2024-01-06T06:55:42.922878 #24967]  INFO -- : Ontology: GO. 539 mapping pair counts to record...
I, [2024-01-06T06:55:42.922926 #24967]  INFO -- : ------------------------------------------------
I, [2024-01-06T06:55:42.922949 #24967]  INFO -- : Mapping count saved for the pair [GO, PW]: 46. 538 counts remaining for GO...
I, [2024-01-06T06:55:42.922967 #24967]  INFO -- : Mapping count saved for the pair [GO, OHMI]: 5. 537 counts remaining for GO...
I, [2024-01-06T06:55:42.922981 #24967]  INFO -- : Mapping count saved for the pair [GO, CIDIT_V1_2]: 731. 536 counts remaining for GO...

# ... and so on

If you spot check any of the pairwise counts that appear in the log, BioPortal returns mapping data. For example - for the first log entry above, the relevant REST call would be https://data.bioontology.org/mappings?ontologies=GO,PW, and it returns the expected collection of mappings with 46 elements.
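The spot check described above can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, and the response field names `totalCount` and `collection` are assumptions about the JSON page returned by the mappings endpoint. Fetching the URL (which requires an API key) is left out so the sketch stays self-contained.

```python
from urllib.parse import urlencode

MAPPINGS_ENDPOINT = "https://data.bioontology.org/mappings"


def pairwise_mappings_url(source, target):
    """Build the REST URL for the mappings between two ontology acronyms."""
    # safe="," keeps the comma between the two acronyms unescaped.
    query = urlencode({"ontologies": f"{source},{target}"}, safe=",")
    return f"{MAPPINGS_ENDPOINT}?{query}"


def has_phantom_count(page, total_key="totalCount", items_key="collection"):
    """Return True when a page reports a positive total but no mappings.

    `page` is the parsed JSON response; the key names are assumptions.
    """
    total = page.get(total_key) or 0
    items = page.get(items_key) or []
    return total > 0 and len(items) == 0


print(pairwise_mappings_url("GO", "PW"))
```

A response with a count of 59 and an empty collection, as seen for GO and CYTO, would be flagged by `has_phantom_count`, while the GO and PW response would not.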

For the cases mentioned in previous comments where BioPortal shows a mapping count, but no mappings are materialized, the log file shows no entries for pairwise count calculation. In other words, I searched the log file for entries like this:

Mapping count saved for the pair [GO, CYTO]
Mapping count saved for the pair [CYTO, GO-EXT]
Mapping count saved for the pair [CYTO, CL]

... and came up with nothing.
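For anyone who wants to repeat the log search, here is a minimal sketch, assuming the log lines follow the format shown earlier in this comment. The function name and the regular expression are my own and are not part of the cron_mapping_counts job.

```python
import re

# Matches e.g. "Mapping count saved for the pair [GO, PW]: 46."
PAIR_LINE = re.compile(
    r"Mapping count saved for the pair "
    r"\[(?P<source>[^,\]]+), (?P<target>[^\]]+)\]: (?P<count>\d+)\."
)


def recorded_pairs(log_lines):
    """Collect {(source, target): count} from the pairwise count log lines."""
    pairs = {}
    for line in log_lines:
        match = PAIR_LINE.search(line)
        if match:
            key = (match["source"], match["target"])
            pairs[key] = int(match["count"])
    return pairs


sample = [
    "I, [2024-01-06T06:55:42.922949 #24967]  INFO -- : Mapping count saved for the pair [GO, PW]: 46. 538 counts remaining for GO...",
    "I, [2024-01-06T06:55:42.922967 #24967]  INFO -- : Mapping count saved for the pair [GO, OHMI]: 5. 537 counts remaining for GO...",
    "I, [2024-01-06T06:55:42.922981 #24967]  INFO -- : Mapping count saved for the pair [GO, CIDIT_V1_2]: 731. 536 counts remaining for GO...",
]
pairs = recorded_pairs(sample)
print(("GO", "CYTO") in pairs)
```

Run against the full log file, a lookup such as `("GO", "CYTO") in pairs` makes the missing entries easy to confirm.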

CYTO is a very old ontology, last uploaded in 2015. One possible scenario is that mappings between CYTO and these other ontologies existed at some point in the past and the counts were persisted in the triplestore. Queries issued against the current triplestore content no longer appear to locate mappings between these ontologies. I don't see any logic in the codebase that removes a MappingCount object when mappings that once existed between two ontologies no longer exist.
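To illustrate the missing cleanup step, here is a hypothetical sketch in Python (not the actual codebase). It assumes persisted and freshly computed counts are both available as dictionaries keyed by an ordered (source, target) pair; any persisted pair that the latest run did not produce is a candidate for removal.

```python
def stale_pairs(persisted, current):
    """Return persisted ontology pairs that the latest count run did not produce.

    Both arguments map (source, target) tuples to counts. This layout is an
    assumption for illustration, not how MappingCount objects are stored.
    """
    return {pair for pair in persisted if pair not in current}


# Example values: 59 is the count shown in the UI, 46 is from the log.
persisted = {("GO", "PW"): 46, ("GO", "CYTO"): 59}
current = {("GO", "PW"): 46}

to_delete = stale_pairs(persisted, current)
print(sorted(to_delete))
```

In this example only the GO and CYTO pair would be selected for deletion, since it has a persisted count but no corresponding entry in the latest run.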

Persisting mapping counts in the triplestore is suboptimal and was developed to work around 4store scaling issues. Now that BioPortal runs on AllegroGraph, it would be ideal to see whether we could return to using COUNT queries in our live system rather than relying on persisted counts.


3 participants