Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow extended filtering on relationships #524

Open
wants to merge 17 commits into
base: develop
Choose a base branch
from

Conversation

merkys
Copy link
Member

@merkys merkys commented Jun 12, 2024

This PR addresses #437 by allowing fully-fledged filtering on relationships. Prior to it, filtering was allowed, but only on id and meta.description of relationships. Now all properties of related entries should in principle be allowed to filter on, to arbitrary depth (OPTIONAL).

This introduces a slight backwards-incompatibility by abandoning special treatment for description. Thus queries which previously matched/mismatched, but were perfectly legal, now will return HTTP 400 for all standard entry types except files where description property is defined since v1.2.0. I am not sure how severe this incompatibility is - if we want to avoid it, we will have to mandate the special handling of description by re-routing it to meta.description of a relationship instead of entry's property.

Edit: Full backwards compatibility is now restored, entry properties are now accessible through target.

Fixes #437.

@merkys merkys requested review from rartino and ml-evs June 12, 2024 11:29
@rartino
Copy link
Contributor

rartino commented Jun 12, 2024

I'm confused. Even if we disregard that the meta subfield of the JSON:API relationship only makes sense in JSON:API, what in this PR explains why the meta subfield accesses the meta of the relationship, rather than the meta of the linked object?

@merkys merkys added the topic/filtering-language Issue discussing changes and improvements to the query and filtering language label Jun 13, 2024
Copy link
Member

@ml-evs ml-evs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wholeheartedly support this, and indeed have been breaking the spec a bit myself to allow filtering on DOIs of references via this exact mechanism. This would be straightforward to enable in optimade-python-tools, as we currently hvae to disable filtering on fields other than ID in joins.

The only issue I see here is touched on by @rartino, previously we would allow filtering on the relationship description (which nobody ever used, and the filter was never implemented afaik), but somehow this is a breaking change that removes this functionality. Since we do not have a top-level description field of an entry, this could be kept in unambiguously, though slightly annoying for implementers, I think it makes sense (and we could never standardize a field called description for each entry in this case).

optimade.rst Outdated
**Note**: formulating queries on relationships with entries that have specific property values is a multi-step process.
For example, to find all structures with bibliographic references where one of the authors has the last name "Schmidt" is performed by the following two steps:
Support for queries on fields of arbitrary depth is OPTIONAL.
For example, search for all structures with bibliographic references where one of the authors has the last name "Schmidt" could be performed with the following query:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we actually need to revise this query example, since "name" is the required field for a references author and lastname is optional. How about a filter on DOI?

Or (in addition to this example) something more standardized, give me all references that correspond to structures of a given formula (or composition range)?

e.g., all literature references for Tantalum: /v1/references?filter=structures.elements HAS "Ta"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Query on all structures coming from the same DOI sounds good. One more suggestion of a bit more complicated query would be to filter for structures of certain anonymous formula related to publications appearing in year 2024.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added the said examples in 85c0a94, can you give them a look and mark as resolved if OK?

optimade.rst Show resolved Hide resolved
@merkys
Copy link
Member Author

merkys commented Jun 14, 2024

The only issue I see here is touched on by @rartino, previously we would allow filtering on the relationship description (which nobody ever used, and the filter was never implemented afaik), but somehow this is a breaking change that removes this functionality.

A somewhat drastic move would be to say that this design choice was a bug 😅 In a sense this would not be false, as meta for references object is JSON:API-specific, and we are trying to keep a generalized approach as much as possible.

Since we do not have a top-level description field of an entry, this could be kept in unambiguously, though slightly annoying for implementers, I think it makes sense (and we could never standardize a field called description for each entry in this case).

files entry type has top-level description property since v1.2.0, which complicates things a bit.

@merkys merkys mentioned this pull request Jun 15, 2024
@rartino
Copy link
Contributor

rartino commented Jun 15, 2024

I don't think filtering on the human readable description of a relationship is particularly crucial. However, the "role" of a relationship discussed in #523 is crucial to be able to filter on. (E.g. "does this calculation have an output structure that contains sodium").

@merkys
Copy link
Member Author

merkys commented Jun 15, 2024

However, the "role" of a relationship discussed in #523 is crucial to be able to filter on. (E.g. "does this calculation have an output structure that contains sodium").

Right, I agree that #523 needs further discussion. Nevertheless #523 should not block this PR.

@rartino
Copy link
Contributor

rartino commented Jun 15, 2024

Since this PR asks to change how the dot operator works, I think we should try to peek forward enough so we do not paint ourselves into another corner.

Are there good reasons to not let the relationship still be the 'relationship object' (or strictly in OPTIMADE a list of relationship objects) and then let the target of the relationship object explicitly live under, e.g., 'target'?

To clarify, we could adopt a model of the structures relationship in an optimade calculation to work like:

"structures": [
  {
    "description": "Initial structure for relaxation",
    "role": "input",
    "target": <here lives the structure entry>
},
{
    "description": "Final structure after relaxation",
    "role": "output",
    "target": <here lives the structure entry>
}

This would mean that a query for all calculations that have an output structure that contains sodium could be expressed as:

calculations?filter=structures.role:structures.target.elements HAS "output":HAS "Na"

(As I type this up, I realize that we are actually missing the construct with the "inner" HAS for zip list constructs in our grammar, but that seems like an oversight that we should fix, given that we support e.g. the full set of fuzzy string operators in that position.)

Edit: note, this means that @ml-evs's query on DOIs now would be, e.g.:

structures?filter=references.target.doi HAS "10.1103/PhysRev.140.A1133"

@merkys
Copy link
Member Author

merkys commented Jun 21, 2024

Are there good reasons to not let the relationship still be the 'relationship object' (or strictly in OPTIMADE a list of relationship objects) and then let the target of the relationship object explicitly live under, e.g., 'target'?

I am not sure if I am getting this right, so bear with me. Do you mean retaining entry's relationships as they are (v1.2.0) and then including explicit copies of the related entries inside entry's "attributes"? If so, I think explicit inclusion might be redundant as JSON:API's "included" will cater for that already. Moreover, we may encounter arbitrary depth issues, which JSON:API avoids by only providing immediate relationships. Nevertheless such approach would indeed greatly simplify the specification of dot operator in filters.

@ml-evs
Copy link
Member

ml-evs commented Nov 4, 2024

I have had a query related to this and think it would be worth discussing again at an upcoming meeting. For my implementations I am still breaking compliance by allowing filtering on references.doi...

@ml-evs ml-evs self-requested a review November 15, 2024 16:40
@rartino
Copy link
Contributor

rartino commented Nov 15, 2024

From the web meeting: how do one filter on role=input AND target.doi = XYZ? Some kind of adaption of the zip syntax, probably.

@ml-evs ml-evs self-assigned this Nov 15, 2024
Copy link
Member

@ml-evs ml-evs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This quick review just works up the target that we discussed in the meeting. It feels a little cumbersome and feels strange to be introducing a filter-specific syntax inside a "field" name, but it achieves our goals in a backwards-compatible way and there seemed to be consensus (plus it is easy for servers to implement). I tried to come up with some other options (e.g., relationships.structures.nsites = 20 or included.structures.nsites=20) that match other existing keywords, but they are also not satisfactory (and would be more of a breaking change).

optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
merkys and others added 7 commits November 18, 2024 09:41
Co-authored-by: Matthew Evans <7916000+ml-evs@users.noreply.github.com>
Co-authored-by: Matthew Evans <7916000+ml-evs@users.noreply.github.com>
Co-authored-by: Matthew Evans <7916000+ml-evs@users.noreply.github.com>
Co-authored-by: Matthew Evans <7916000+ml-evs@users.noreply.github.com>
Co-authored-by: Matthew Evans <7916000+ml-evs@users.noreply.github.com>
@merkys
Copy link
Member Author

merkys commented Nov 18, 2024

I have updated the PR according to comments on Friday's web meeting. The PR now retains full backwards compatibility and allows for extended filtering on relationships through target field.

I did not implement role field yet which has as well been mentioned as I feel it deserves more discussion. If role is supposed to take values from some enumerator, its values has to be discussed. If role is supposed to be free-form, then description could be reused instead.

@merkys merkys requested a review from ml-evs November 18, 2024 08:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic/filtering-language Issue discussing changes and improvements to the query and filtering language
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Extended filtering on relationships
3 participants