Validate kgx files against monarch-app schema #478

Open
kevinschaper opened this issue Nov 16, 2023 · 1 comment
@kevinschaper
Member

The iri column is coming in from kg-phenio, through monarch-ingest. It's not yet defined in the schema, so Solr represents it as a multivalued column, which isn't what we want.

For the moment, #474 is going out of its way to trim the iri field out of Solr documents to avoid problems when creating pydantic instances, and this issue is so that we don't lose track of that hack.
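As a rough illustration of that kind of workaround (not the actual code in #474), a minimal sketch could drop the unmodelled field from each Solr document before it is handed to a model constructor. The names `FIELDS_TO_DROP` and `trim_solr_doc`, and the sample document, are hypothetical:

```python
from typing import Any

# Hypothetical: fields that reach Solr but are not defined in the schema.
FIELDS_TO_DROP = {"iri"}


def trim_solr_doc(doc: dict[str, Any], drop: set[str] = FIELDS_TO_DROP) -> dict[str, Any]:
    """Return a copy of a Solr document without the unmodelled fields."""
    return {key: value for key, value in doc.items() if key not in drop}


# Placeholder document; Solr returns the undeclared field as a list.
solr_doc = {
    "id": "EX:1",
    "name": "example entity",
    "iri": ["http://example.org/EX_1"],
}
trimmed = trim_solr_doc(solr_doc)
```

The trimmed dictionary can then be passed to the model constructor without the multivalued `iri` getting in the way.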

On the monarch-ingest / linkml-solr side, we probably want to avoid passing extra fields from the TSV file to Solr. It would probably have been better to get an index-time error.
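One way to get an early failure, sketched here with only the standard library, is to compare the TSV header against the set of schema-defined fields before anything is sent to Solr. The function name `check_tsv_columns` and the field set are assumptions for illustration; this is not how monarch-ingest or linkml-solr currently behave:

```python
import csv
import io


def check_tsv_columns(tsv_file, schema_fields):
    """Read a TSV file, raising if the header has columns the schema lacks."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    extra = [col for col in (reader.fieldnames or []) if col not in schema_fields]
    if extra:
        raise ValueError(f"Columns not defined in schema: {', '.join(extra)}")
    return list(reader)


# Hypothetical schema fields and a small in-memory TSV without extra columns.
schema_fields = {"id", "name"}
rows = check_tsv_columns(io.StringIO("id\tname\nEX:1\texample\n"), schema_fields)
```

A file whose header also contained `iri` would raise a `ValueError` here instead of being indexed silently.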

As for iri itself, right now we handle that expansion via CURIEs in the app, so if we include it, it would only be populated for phenio. We could choose to populate it for other entities as well, or we could leave it out of our kg-phenio ingest and stick with handling CURIE expansion only at the code level.
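For context, code-level CURIE expansion can be as small as a prefix lookup. The sketch below is not the app's implementation; `PREFIX_MAP` and `expand_curie` are hypothetical names, and the two prefixes are just illustrative OBO-style entries:

```python
from typing import Optional

# Illustrative prefix map; a real one would come from a shared prefix registry.
PREFIX_MAP = {
    "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
    "HP": "http://purl.obolibrary.org/obo/HP_",
}


def expand_curie(curie: str, prefix_map: dict = PREFIX_MAP) -> Optional[str]:
    """Expand a CURIE to an IRI, or return None if the prefix is unknown."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in prefix_map:
        return None
    return prefix_map[prefix] + local_id


iri = expand_curie("HP:0000001")
```

With expansion handled like this for every entity, a stored `iri` column adds little beyond what the identifier already provides.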

@kevinschaper kevinschaper added this to the 2023-12 Release milestone Nov 16, 2023
@kevinschaper kevinschaper self-assigned this Nov 16, 2023
@kevinschaper kevinschaper changed the title from "iri field wasn't in schema, but still made it into Solr from kg-phenio" to "Validate kgx files against monarch-app schema" Mar 26, 2024
@kevinschaper
Member Author

I want to add a note that I tried this out and found a lot of spurious failures where linkml-validate complained about types: nodes whose name is a number failed for not being a string, and single values in multivalued fields failed for not being lists. We probably want to run the validator as a module rather than from the CLI, so that we can swallow some categories of errors, or we want to validate against a more strictly typed file.
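To make the "swallow some categories of errors" idea concrete, here is a minimal sketch that filters validation messages by pattern once they are available as strings. It does not call linkml itself; the message wording is an assumption modelled on JSON-Schema-style errors, and `IGNORED_PATTERNS` and `filter_messages` are hypothetical names:

```python
import re

# Assumed message shapes for the two noisy categories described above.
IGNORED_PATTERNS = [
    re.compile(r"is not of type 'string'"),  # e.g. a numeric node name
    re.compile(r"is not of type 'array'"),   # e.g. a single value in a multivalued field
]


def filter_messages(messages, ignored=IGNORED_PATTERNS):
    """Keep only the messages that match none of the ignored patterns."""
    return [msg for msg in messages if not any(p.search(msg) for p in ignored)]


# Hypothetical validator output.
messages = [
    "123 is not of type 'string'",
    "'biolink:Gene' is not of type 'array'",
    "'id' is a required property",
]
remaining = filter_messages(messages)
```

Only the required-property message survives the filter in this example, which is the kind of error we would still want to see.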

@kevinschaper kevinschaper removed this from the 2024-05 Release milestone Jun 1, 2024