-
Notifications
You must be signed in to change notification settings - Fork 171
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add SearchAfterMixin for ES search_after capability #4536
base: master
Are you sure you want to change the base?
Conversation
2262eb3
to
a70ebb7
Compare
2b92b48
to
d93f032
Compare
mock_document = MagicMock() | ||
mock_search = MagicMock() | ||
mock_document.search.return_value = mock_search | ||
|
||
mock_search.query.return_value = mock_search | ||
mock_search.sort.return_value = mock_search | ||
mock_search.extra.return_value = mock_search | ||
|
||
mock_result1 = MagicMock() | ||
mock_result1.pk = 1 | ||
mock_result1.meta.sort = ["sort1"] | ||
|
||
mock_result2 = MagicMock() | ||
mock_result2.pk = 2 | ||
mock_result2.meta.sort = ["sort2"] | ||
|
||
mock_search.execute.side_effect = [ | ||
[mock_result1, mock_result2], | ||
[], | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not the right way to test. This is mocking everything, including ES search after behavior. All the ES related tests in course-discovery hit test ES instance. The search after mixin tests should aim for that too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the tests to use ElasticsearchTestMixin
instead.
break | ||
|
||
all_ids.update(ids) | ||
search_after = results[-1].meta.sort |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add safety check here, like
search_after = results[-1].meta.sort if results[-1] else None
Also, have you tried it locally? By default, results[-1].meta.sort
is a Elastic.AttrList
, which is not json serializable. Using results[-1].meta.sort.copy()
will convert it into a simple list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch for the search_after check.
I've tested this locally, I'm using this search_after
value just to be sent back to the ES query execution.
d93f032
to
7a8f3b7
Compare
identified_course_ids.update( | ||
i.key | ||
for i in CourseRun.search( | ||
query, | ||
partner=ESDSLQ('term', partner=partner.short_code), | ||
identifiers=ESDSLQ('terms', **{'key.raw': course_run_ids}) | ||
).source(['key']) | ||
) | ||
if course_uuids: | ||
course_uuids = [UUID(course_uuid) for course_uuid in course_uuids.split(',')] | ||
specified_course_ids += course_uuids | ||
|
||
log.info(f"Specified course ids: {specified_course_ids}") | ||
identified_course_ids.update( | ||
Course.search( | ||
query, | ||
partner=ESDSLQ('term', partner=partner.short_code), | ||
identifiers=ESDSLQ('terms', **{'uuid': course_uuids}) | ||
).values_list('uuid', flat=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside from this code that updates the querying logic for ES (so rather than fetching all the products AND THEN filter it, we're fetching only the necessary products), this is just a copy of https://github.com/openedx/course-discovery/blob/a948a741b1839e845e07c762bab6ee51622daea5/course_discovery/apps/api/v1/views/catalog_queries.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change not compatible with v1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be, yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requires changing the pkSearchableMixin
a bit as well if we were to keep a single CatalogQueryContainsViewSet
.
de8f6a3
to
e2052c4
Compare
def test_fetch_all_courses(self, mock_get_documents): | ||
query = "Course*" | ||
mock_get_documents.return_value = [CourseDocument] | ||
|
||
queryset = CourseProxy.search(query=query) | ||
|
||
self.assertEqual(len(queryset), self.total_courses) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the page_size here? I think we should explicitly have courses greater than page size to emphasize that search will return everything in a single call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, the results should not contain any duplicates. That needs to be checked as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll double check that but given the correct sort id, it should never have duplicates.
For subsequent pages, they simply include this `next` value as the `search_after` parameter in their next request. | ||
|
||
A new mixin, SearchAfterMixin, will be created and added to enable the search_after functionality in catalog query endpoints, such as `/api/v1/catalog/query_contains`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- We should add more details of the mixin
- nit: v2/catalog/query_contains
partner=ESDSLQ('term', partner=partner.short_code), | ||
identifiers=ESDSLQ('terms', **{'uuid': course_uuids}), | ||
document=CourseDocument | ||
).values_list('uuid', flat=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this uses values_list while course_run_ids is using comprehension. We can should make it consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's rather consistent now.
e2052c4
to
b566d5e
Compare
b566d5e
to
66e7fae
Compare
PROD-4233
Adds a new
SearchAfterMixin
to be added in place ofPkSearchableMixin
that allows using search_after. Once this mixin is used, it will bypass the default search limit of 10k by making multiple calls to ES in case we have more than 10k records in an index.Previously, we faced an issue regarding the search limit, resulting in less records to be returned. We increased the
MAX_RESULT_WINDOW
before but a better way is to use search_after capability for an optimal and flexible result.This PR also adds a v2
CatalogQueryContainsViewSet
that fixes the querying mechanism by filtering the items at the time of when we're executing queries on ES. In v1, ourCatalogQueryContainsViewSet
first searched all the records AND THEN filtered it.Testing Instructions:
update_index
locally in Discovery shell./api/v1/catalog/query_contains/
endpoint and add a sample query like this: http://localhost:18381/api/v2/catalog/query_contains/?course_uuids=2de67490-f748-4efd-8532-b445f7ecc6f9,f9f1e100-668a-4fd5-a966-a127de1f69de&query=org:edXYou can set
ELASTICSEARCH_DSL_QUERYSET_PAGINATION
to your specific value in order to test the behavior. While the end result will be the same but this can affect the number of times the search_after mechanism is called.