Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve testing of round trip serialization #480

Merged
merged 30 commits into from
Jan 13, 2024
Merged

Improve testing of round trip serialization #480

merged 30 commits into from
Jan 13, 2024

Conversation

hrshdhgd
Copy link
Contributor

@hrshdhgd hrshdhgd commented Dec 14, 2023

This fixes: mapping-commons/sssom#321

So the issue was when a tsv file is sssom converted to a JSON file and then the same JSON file is sssom parsed back, there was an error thrown :

ValueError: ['Some prefixes'] are used in the SSSOM mapping set but it does not exist in the prefix map

This happened because the @context block of the JSON file which contained the prefix map was never considered while forming the prefix_map and the metadata items (mapping_set_id, license etc) were also ignored. In addition to this, an absence of the external metadata file path param made it get the default metadata and prefix map internally which is a clear bug in the system.

This PR resolves that issue.

@hrshdhgd hrshdhgd marked this pull request as draft December 14, 2023 02:15
@hrshdhgd hrshdhgd marked this pull request as ready for review December 14, 2023 02:53
@hrshdhgd hrshdhgd requested a review from matentzn December 14, 2023 02:53
src/sssom/parsers.py Outdated Show resolved Hide resolved
src/sssom/parsers.py Outdated Show resolved Hide resolved
src/sssom/parsers.py Outdated Show resolved Hide resolved
tests/data/basic_subset.tsv Show resolved Hide resolved
tests/test_parsers.py Outdated Show resolved Hide resolved
tests/test_parsers.py Outdated Show resolved Hide resolved
@hrshdhgd hrshdhgd requested a review from matentzn December 14, 2023 14:27
src/sssom/parsers.py Outdated Show resolved Hide resolved
@hrshdhgd hrshdhgd requested a review from matentzn December 14, 2023 22:10
@cthoyt
Copy link
Member

cthoyt commented Jan 2, 2024

I am pretty skeptical of this. Can you make a 100% explicit test that shows what is going on here? Loading these opaque files makes it difficult to review this PR

@matentzn matentzn marked this pull request as draft January 4, 2024 11:04
@matentzn matentzn self-assigned this Jan 4, 2024
@matentzn matentzn marked this pull request as ready for review January 4, 2024 15:57
@matentzn matentzn requested a review from cthoyt January 4, 2024 15:57
@matentzn
Copy link
Collaborator

matentzn commented Jan 4, 2024

@cthoyt I have now updated the code to reflect your design in the parse_sssom_table method. As we wont have Harshad for a while, and I will have to deal with maintaining SSSOM for a bit, if you can find a minute to see if what I am doing makes sense, that would be much appreciated.

tests/test_parsers.py Outdated Show resolved Hide resolved
@cthoyt cthoyt changed the title SSSOM tsv => json => tsv Improve testing of round trip serialization Jan 5, 2024
@cthoyt
Copy link
Member

cthoyt commented Jan 13, 2024

@matentzn current issues are now due to data quality with the test data. I can't address further since it's going to require debugging some mismatch between the RDF and the LinkML schema

INFO:root:Unmapped predicated: {rdflib.term.URIRef('https://w3id.org/sssom/comment'), rdflib.term.URIRef('https://w3id.org/sssom/creator_id'), rdflib.term.URIRef('https://w3id.org/sssom/license'), rdflib.term.URIRef('https://w3id.org/sssom/mapping_date')}
INFO:root:Triple processed = 2459, unprocessed = 86


Ran 1 test in 6.724s

FAILED (errors=1)

Error
Traceback (most recent call last):
  File "/Users/cthoyt/dev/sssom-py/tests/test_parsers.py", line 250, in test_parse_sssom_rdf
    msdf = from_sssom_rdf(g=self.rdf_graph, prefix_map=self.df_converter, meta=self.metadata)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/cthoyt/dev/sssom-py/src/sssom/parsers.py", line 451, in from_sssom_rdf
    RDFLibLoader().load(
  File "/Users/cthoyt/.virtualenvs/indra/lib/python3.11/site-packages/linkml_runtime/loaders/rdflib_loader.py", line 256, in load
    objs = self.from_rdf_graph(g, schemaview=schemaview, target_class=target_class, prefix_map=prefix_map, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/cthoyt/.virtualenvs/indra/lib/python3.11/site-packages/linkml_runtime/loaders/rdflib_loader.py", line 205, in from_rdf_graph
    return [target_class(**x) for x in root_dicts]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/cthoyt/.virtualenvs/indra/lib/python3.11/site-packages/linkml_runtime/loaders/rdflib_loader.py", line 205, in <listcomp>
    return [target_class(**x) for x in root_dicts]
            ^^^^^^^^^^^^^^^^^
  File "<string>", line 31, in __init__
  File "/Users/cthoyt/.virtualenvs/indra/lib/python3.11/site-packages/sssom_schema/datamodel/sssom_schema.py", line 110, in __post_init__
    self.MissingRequiredField("license")
  File "/Users/cthoyt/.virtualenvs/indra/lib/python3.11/site-packages/linkml_runtime/utils/yamlutils.py", line 273, in MissingRequiredField
    raise ValueError(f"{field_name} must be supplied")
ValueError: license must be supplied

this is, indeed, 141, and was wrong before.
@matentzn matentzn requested a review from cthoyt January 13, 2024 06:16
@matentzn
Copy link
Collaborator

@cthoyt thank you for your help on this; I fixed the remaining test and now everything passes. If you don't see any further issue, I think this is ready to go!

@cthoyt cthoyt merged commit f40a1c8 into master Jan 13, 2024
6 checks passed
@cthoyt cthoyt deleted the issue-321 branch January 13, 2024 09:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

JSON serialisation format is woefully underspecified
3 participants