-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow extended filtering on relationships #524
base: develop
Are you sure you want to change the base?
Allow extended filtering on relationships #524
Conversation
I'm confused. Even if we disregard that the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wholeheartedly support this, and indeed have been breaking the spec a bit myself to allow filtering on DOIs of references via this exact mechanism. This would be straightforward to enable in optimade-python-tools, as we currently hvae to disable filtering on fields other than ID in joins.
The only issue I see here is touched on by @rartino, previously we would allow filtering on the relationship description (which nobody ever used, and the filter was never implemented afaik), but somehow this is a breaking change that removes this functionality. Since we do not have a top-level description field of an entry, this could be kept in unambiguously, though slightly annoying for implementers, I think it makes sense (and we could never standardize a field called description for each entry in this case).
optimade.rst
Outdated
**Note**: formulating queries on relationships with entries that have specific property values is a multi-step process. | ||
For example, to find all structures with bibliographic references where one of the authors has the last name "Schmidt" is performed by the following two steps: | ||
Support for queries on fields of arbitrary depth is OPTIONAL. | ||
For example, search for all structures with bibliographic references where one of the authors has the last name "Schmidt" could be performed with the following query: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we actually need to revise this query example, since "name" is the required field for a references
author and lastname
is optional. How about a filter on DOI?
Or (in addition to this example) something more standardized, give me all references that correspond to structures of a given formula (or composition range)?
e.g., all literature references for Tantalum: /v1/references?filter=structures.elements HAS "Ta"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Query on all structures coming from the same DOI sounds good. One more suggestion of a bit more complicated query would be to filter for structures of certain anonymous formula related to publications appearing in year 2024.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added the said examples in 85c0a94, can you give them a look and mark as resolved if OK?
A somewhat drastic move would be to say that this design choice was a bug 😅 In a sense this would not be false, as
|
I don't think filtering on the human readable description of a relationship is particularly crucial. However, the "role" of a relationship discussed in #523 is crucial to be able to filter on. (E.g. "does this calculation have an output structure that contains sodium"). |
Since this PR asks to change how the dot operator works, I think we should try to peek forward enough so we do not paint ourselves into another corner. Are there good reasons to not let the relationship still be the 'relationship object' (or strictly in OPTIMADE a list of relationship objects) and then let the target of the relationship object explicitly live under, e.g., 'target'? To clarify, we could adopt a model of the structures relationship in an optimade calculation to work like:
This would mean that a query for all calculations that have an output structure that contains sodium could be expressed as:
(As I type this up, I realize that we are actually missing the construct with the "inner" HAS for zip list constructs in our grammar, but that seems like an oversight that we should fix, given that we support e.g. the full set of fuzzy string operators in that position.) Edit: note, this means that @ml-evs's query on DOIs now would be, e.g.:
|
I am not sure if I am getting this right, so bear with me. Do you mean retaining entry's relationships as they are (v1.2.0) and then including explicit copies of the related entries inside entry's |
I have had a query related to this and think it would be worth discussing again at an upcoming meeting. For my implementations I am still breaking compliance by allowing filtering on |
From the web meeting: how do one filter on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This quick review just works up the target
that we discussed in the meeting. It feels a little cumbersome and feels strange to be introducing a filter-specific syntax inside a "field" name, but it achieves our goals in a backwards-compatible way and there seemed to be consensus (plus it is easy for servers to implement). I tried to come up with some other options (e.g., relationships.structures.nsites = 20
or included.structures.nsites=20
) that match other existing keywords, but they are also not satisfactory (and would be more of a breaking change).
Co-authored-by: Matthew Evans <7916000+ml-evs@users.noreply.github.com>
Co-authored-by: Matthew Evans <7916000+ml-evs@users.noreply.github.com>
Co-authored-by: Matthew Evans <7916000+ml-evs@users.noreply.github.com>
Co-authored-by: Matthew Evans <7916000+ml-evs@users.noreply.github.com>
Co-authored-by: Matthew Evans <7916000+ml-evs@users.noreply.github.com>
I have updated the PR according to comments on Friday's web meeting. The PR now retains full backwards compatibility and allows for extended filtering on relationships through I did not implement |
This PR addresses #437 by allowing fully-fledged filtering on relationships. Prior to it, filtering was allowed, but only on
id
andmeta.description
of relationships. Now all properties of related entries should in principle be allowed to filter on, to arbitrary depth (OPTIONAL).This introduces a slight backwards-incompatibility by abandoning special treatment fordescription
. Thus queries which previously matched/mismatched, but were perfectly legal, now will return HTTP 400 for all standard entry types exceptfiles
wheredescription
property is defined since v1.2.0. I am not sure how severe this incompatibility is - if we want to avoid it, we will have to mandate the special handling ofdescription
by re-routing it tometa.description
of a relationship instead of entry's property.Edit: Full backwards compatibility is now restored, entry properties are now accessible through
target
.Fixes #437.