-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: [FC-0074] add support for annotated python dicts as avro map type #433
Open
mariajgrimaldi
wants to merge
17
commits into
main
Choose a base branch
from
MJG/add-support-for-dicts
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
17 commits
Select commit
Hold shift + click to select a range
7825196
feat: add support for Avro dict
Ian2012 9aaa04e
refactor: add support for dicts as avro maps instead of record
mariajgrimaldi 743bcc4
test: add more test cases for serializing dicts
mariajgrimaldi f4aa824
refactor: address quality issues from latest commits
mariajgrimaldi f7beb63
refactor: address PR reviews
mariajgrimaldi 482c8d9
fix: address quality issues
mariajgrimaldi 7d9b8b8
test: add deserializer tests for dicts
mariajgrimaldi c76b3d1
test: add tests for deserialization errors
mariajgrimaldi ec1fdec
test: add test cases for schema generation with dict types
mariajgrimaldi 2d225b9
fix: address quality issues
mariajgrimaldi 2839ae7
refactor: rewrite comment with suggestion
mariajgrimaldi 4389cfb
refactor: include map type instead of replacing record
mariajgrimaldi 8f67ea2
refactor: drop changes in forum events
mariajgrimaldi 9ee35f2
refactor: drop changes in forum events
mariajgrimaldi a32ac76
refactor: address PR reviews
mariajgrimaldi 2980a93
test: add missing test for de-serializing complex types
mariajgrimaldi 7eabdaa
refactor: check for keys and values simple types while de-serializing
mariajgrimaldi File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,7 +2,7 @@ | |
import io | ||
import os | ||
from datetime import datetime | ||
from typing import List | ||
from typing import List, Union | ||
from unittest import TestCase | ||
from uuid import UUID, uuid4 | ||
|
||
|
@@ -43,6 +43,7 @@ def generate_test_data_for_schema(schema): # pragma: no cover | |
'string': "default", | ||
'double': 1.0, | ||
'null': None, | ||
'map': {'key': 'value'}, | ||
} | ||
|
||
def get_default_value_or_raise(schema_field_type): | ||
|
@@ -71,6 +72,9 @@ def get_default_value_or_raise(schema_field_type): | |
elif sub_field_type == "record": | ||
# if we're dealing with a record, recurse into the record | ||
data_dict.update({key: generate_test_data_for_schema(field_type)}) | ||
elif sub_field_type == "map": | ||
# if we're dealing with a map, "values" will be the type of values in the map | ||
data_dict.update({key: {"key": get_default_value_or_raise(field_type["values"])}}) | ||
else: | ||
raise Exception(f"Unsupported type {field_type}") # pylint: disable=broad-exception-raised | ||
|
||
|
@@ -112,6 +116,24 @@ def generate_test_event_data_for_data_type(data_type): # pragma: no cover | |
datetime: datetime.now(), | ||
CCXLocator: CCXLocator(org='edx', course='DemoX', run='Demo_course', ccx='1'), | ||
UUID: uuid4(), | ||
dict[str, str]: {'key': 'value'}, | ||
dict[str, int]: {'key': 1}, | ||
dict[str, float]: {'key': 1.0}, | ||
dict[str, bool]: {'key': True}, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. what about union types: dict[str, Union[str, int]]: ..., There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I added more test cases and this covers even more than I initially thought: a701f78. Thanks for the suggestion! |
||
dict[str, CourseKey]: {'key': CourseKey.from_string("course-v1:edX+DemoX.1+2014")}, | ||
dict[str, UsageKey]: {'key': UsageKey.from_string( | ||
"block-v1:edx+DemoX+Demo_course+type@video+block@UaEBjyMjcLW65gaTXggB93WmvoxGAJa0JeHRrDThk", | ||
)}, | ||
dict[str, LibraryLocatorV2]: {'key': LibraryLocatorV2.from_string('lib:MITx:reallyhardproblems')}, | ||
dict[str, LibraryUsageLocatorV2]: { | ||
'key': LibraryUsageLocatorV2.from_string('lb:MITx:reallyhardproblems:problem:problem1'), | ||
}, | ||
dict[str, List[int]]: {'key': [1, 2, 3]}, | ||
dict[str, List[str]]: {'key': ["hi", "there"]}, | ||
dict[str, dict[str, str]]: {'key': {'key': 'value'}}, | ||
dict[str, dict[str, int]]: {'key': {'key': 1}}, | ||
dict[str, Union[str, int]]: {'key': 'value'}, | ||
dict[str, Union[str, int, float]]: {'key': 1.0}, | ||
} | ||
data_dict = {} | ||
for attribute in data_type.__attrs_attrs__: | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a breaking change, right? Were dicts only introduced and used with the forums events, which you plan to break anyway? You don't have a changelog entry or a major version change (yet), but is that your plan?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I understand, dictionaries were never fully supported. Besides this list with known unserializable events, which contain dictionaries in their payloads, we can confirm this by generating a schema for an event using dictionaries. Let's take this event as an example:
Which immediately raises a serialization error:
Because dicts were not a type supported in event_bus/avro/schema.py::_create_avro_field_definition before this PR. So no events with type dicts were ever sent through the event bus. Forum events were not supported either since they were listed in the known unserialiazable events, but I included them here as an example and will remove them shortly.
To maintain backward compatibility, I'll include the map type as an additional type instead of changing "record" to "map".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @mariajgrimaldi.
/schemas
introduced in https://github.com/openedx/openedx-events/pull/225/files for unit testing? Is that part of the missing coverage?UPDATE: If we are missing test coverage, it would be great to get that back. Or, if this is a non-breaking change, then you should drop "record" to clean up the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some tests to solve the missing coverage. However, I still wonder if we're breaking something by replacing "record" with "map". I also checked the /schemas added in https://github.com/openedx/openedx-events/pull/225/files and they are still to be in the repository. Could you clarify which schemas are missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops. I now see the schemas. In theory, if our tests are working as planned, these should be snapshots of all the schemas and if there is no change detected, all would be good.
You say you fear dropping record is a breaking change. Is it possible to create a breaking test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my understanding, changing "record" to "map" should not be a breaking change since dicts were not supported before this PR. By testing the code with dicts without any support, I can infer that
L66
was only used to warn developers that dicts (mapped to avro records, which is a type we can't use because it clashes with the data attributes mapping) without annotations were not allowed, but either dicts with annotations since they were not supported. In any case, I added a test case that checks that the schema generated for dicts should map tomap
for future reference.Although I said I wondered whether this was a breaking change given my findings and considering what you mentioned before, I think we should trust the repo's coverage.