GO: top-level mappings tab has inaccuracies #297
Comments
@alexskr - with regard to the above problem description, I'm wondering what the status is of the script that generates mapping counts. I know that we resurrected this process now that we're on AllegroGraph, and on December 15th you mentioned in the bioportal-operations Slack channel that it was running in production. Did it run to completion? Are we back to executing once weekly on Saturday nights? My first two ideas here are that 1) there's an issue in our mapping count generation code, or 2) if an ontology is deleted, somehow the associated triples that materialize mappings aren't cleanly removed from the triplestore.
The mappings count job is enabled and completes successfully according to the logs. BioPortal has a CYTOKINE ontology with the CYTO acronym.
So CYTOKINE as an ontology is here: https://bioportal.bioontology.org/ontologies/CYTO
Ugh. Not enough caffeine this morning. 😩 Indeed, CYTOKINE is not an ontology acronym, but rather the name of the ontology, and the acronym is CYTO, accessible here: https://bioportal.bioontology.org/ontologies/CYTO. This doesn't appear to be an issue with the Rails application, as the relevant REST call to retrieve the mappings between GO and CYTO is returning a total count value of 59 along with an empty collection.
Mappings between CYTO and a few other ontologies similarly have a positive total count value but return an empty collection: https://bioportal.bioontology.org/mappings/CYTO?target=https%3A%2F%2Fdata.bioontology.org%2Fontologies%2FGO-EXT
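For anyone who wants to flag this symptom across several ontology pairs, here is a minimal sketch of the check described above: a response that reports a positive total but carries an empty collection. It works on an already-parsed JSON payload rather than calling the API. The field names `totalCount` and `collection` are assumptions about the paged response shape, and the helper name `is_stale_count` and both sample payloads are hypothetical.

```python
from typing import Any, Mapping


def is_stale_count(page: Mapping[str, Any]) -> bool:
    """Return True when a paged response reports mappings but lists none.

    Assumes the payload exposes `totalCount` and `collection` keys;
    adjust the key names if the real response differs.
    """
    total = page.get("totalCount") or 0
    collection = page.get("collection") or []
    return total > 0 and len(collection) == 0


# Hypothetical payloads shaped like the two cases discussed in this thread.
go_cyto = {"totalCount": 59, "collection": []}
go_pw = {"totalCount": 46, "collection": [{"id": i} for i in range(46)]}

print(is_stale_count(go_cyto))  # True: count persisted, no mappings returned
print(is_stale_count(go_pw))    # False: count and collection agree
```

The check is only meaningful for the first page of results, since later pages of a healthy response can legitimately be shorter than the total.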
I took a detailed look at the log file for the last run of the mapping counts job:
I, [2024-01-06T06:55:42.922878 #24967] INFO -- : Ontology: GO. 539 mapping pair counts to record...
I, [2024-01-06T06:55:42.922926 #24967] INFO -- : ------------------------------------------------
I, [2024-01-06T06:55:42.922949 #24967] INFO -- : Mapping count saved for the pair [GO, PW]: 46. 538 counts remaining for GO...
I, [2024-01-06T06:55:42.922967 #24967] INFO -- : Mapping count saved for the pair [GO, OHMI]: 5. 537 counts remaining for GO...
I, [2024-01-06T06:55:42.922981 #24967] INFO -- : Mapping count saved for the pair [GO, CIDIT_V1_2]: 731. 536 counts remaining for GO...
# ... and so on
If you spot-check any of the pairwise counts that appear in the log, BioPortal returns mapping data. For example, for the first log entry above, the relevant REST call would be https://data.bioontology.org/mappings?ontologies=GO,PW, and it returns the expected collection of mappings with 46 elements.
For the cases mentioned in previous comments where BioPortal shows a mapping count, but no mappings are materialized, the log file shows no entries for pairwise count calculation. In other words, I searched the log file for entries like this:
... and came up with nothing.
CYTO is a very old ontology, last uploaded in 2015. One possible scenario is that mappings between CYTO and these other ontologies existed at some point in the past and the counts were persisted in the triplestore. It doesn't look like queries issued against the current triplestore content locate mappings between these ontologies. I don't see any logic in the codebase that would handle the case where a ...
Persisting mapping counts in the triplestore is suboptimal and was developed to work around 4store scaling issues. Now that BioPortal runs on AllegroGraph, it would be ideal to see if we could return to using COUNT queries in our live system, rather than relying on persisted counts.
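The log search described above can be repeated with a small script. This is a rough sketch that assumes the log lines follow the "Mapping count saved for the pair [A, B]: N." format quoted earlier in this comment. The function name `saved_pair_counts` is hypothetical, and the sample lines are copied from the log excerpt above.

```python
import re
from typing import Dict, Iterable, Tuple

# Matches e.g. "Mapping count saved for the pair [GO, PW]: 46."
PAIR_RE = re.compile(
    r"Mapping count saved for the pair \[([^,\]]+), ([^\]]+)\]: (\d+)\."
)


def saved_pair_counts(lines: Iterable[str]) -> Dict[Tuple[str, str], int]:
    """Collect every (source, target) pair count recorded in the log."""
    counts: Dict[Tuple[str, str], int] = {}
    for line in lines:
        match = PAIR_RE.search(line)
        if match:
            source, target, count = match.groups()
            counts[(source, target)] = int(count)
    return counts


# Lines copied from the log excerpt above.
sample = [
    "I, [2024-01-06T06:55:42.922949 #24967] INFO -- : Mapping count saved for the pair [GO, PW]: 46. 538 counts remaining for GO...",
    "I, [2024-01-06T06:55:42.922967 #24967] INFO -- : Mapping count saved for the pair [GO, OHMI]: 5. 537 counts remaining for GO...",
    "I, [2024-01-06T06:55:42.922981 #24967] INFO -- : Mapping count saved for the pair [GO, CIDIT_V1_2]: 731. 536 counts remaining for GO...",
]

counts = saved_pair_counts(sample)
print(counts[("GO", "PW")])       # 46
print(("GO", "CYTO") in counts)   # False: no pairwise entry in this excerpt
```

Running it over the full log file and comparing the resulting pairs against the counts shown in the UI would list every pair whose count was not regenerated in the last run.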
From @caufieldjh:
The top-level Mappings tab in BioPortal for the GO ontology shows a mapping count of 59 for mappings between GO and CYTOKINE:
There is no such CYTOKINE ontology in BioPortal currently. I checked the production server and there is no physical directory that matches this ontology acronym.