Dataclasses for Recogniser, Ranker and Linker output data types #281

Closed
thobson88 opened this issue Nov 14, 2024 · 3 comments

@thobson88
Collaborator

thobson88 commented Nov 14, 2024

AIM: replace nested dictionaries with appropriate data structures implemented as Python dataclasses.

Subtask of #276.

@thobson88
Collaborator Author

thobson88 commented Nov 17, 2024

Dataclass structure

[Figure: T-Res dataclasses class diagram (draw.io)]

Key:

  • Fields (attributes) above the line
  • Methods (functions) below the line
  • Arrows denote subclasses
  • Diamonds denote composition
  • Green indicates an abstract base class
  • Orange indicates a concrete class
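
To make the diagram conventions in the key concrete, here is a minimal Python sketch. Every class, field and method name in it (`Prediction`, `LinkedMention`, `Link`, `label`, `best_link`) is a hypothetical stand-in, not taken from the diagram, and the Wikidata IDs are placeholders. It only illustrates an abstract base class, a concrete subclass, and composition.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Link:
    """Hypothetical component class, used to show composition (diamond)."""
    wikidata_id: str
    score: float


@dataclass(frozen=True)
class Prediction(ABC):
    """Hypothetical abstract base class (green in the diagram)."""
    mention: str  # a field: shown above the line

    @abstractmethod
    def label(self) -> str:  # a method: shown below the line
        """Return a label for this prediction."""


@dataclass(frozen=True)
class LinkedMention(Prediction):
    """Hypothetical concrete subclass (orange, joined by an arrow)."""
    links: Tuple[Link, ...] = ()  # composition: holds Link instances

    def best_link(self) -> Optional[Link]:
        # Highest-scoring link, or None when there are no links.
        return max(self.links, key=lambda link: link.score, default=None)

    def label(self) -> str:
        best = self.best_link()
        return best.wikidata_id if best else "NIL"


linked = LinkedMention("Bath", (Link("Q1", 0.2), Link("Q2", 0.8)))
print(linked.label())
```

Because `Prediction` declares an abstract method, trying to instantiate it directly raises `TypeError`; only concrete subclasses such as `LinkedMention` can be created.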

@thobson88
Collaborator Author

thobson88 commented Nov 21, 2024

Some advantages of dataclasses over nested dictionaries:

  • code is more comprehensible and easier to discuss and reason about (by referring to classes by name)
  • less error-prone: explicit references to named data fields, rather than arbitrary indexing via magic strings/numbers
  • data provenance: dataclass instances retain all of the info and logic from which a prediction is generated, rather than just the final scores
  • algorithm clarity: e.g. all business logic lives in the appropriate Recogniser/Ranker/Linker subclasses and bookkeeping is done in the dataclasses
  • no need for utility functions to switch format between different dictionary representations
  • automatic field type checking with pydantic.dataclasses
  • automatic sorting in lists (with order=True)
  • immutability (with frozen=True)
  • pretty printing (via the generated __repr__)
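
A few of the points above can be illustrated with a short sketch. It uses the standard-library `dataclasses` module so that it runs without extra dependencies. The `Candidate` class, its fields, and the dictionary layout are hypothetical and the IDs are placeholders. Swapping in the decorator from `pydantic.dataclasses` should additionally give field type validation.

```python
from dataclasses import dataclass, FrozenInstanceError

# Nested-dictionary style: fields are reached through magic strings.
candidate_dict = {"mention": "Bath", "link": {"id": "Q1", "score": 0.8}}
dict_score = candidate_dict["link"]["score"]


# Dataclass style: named fields, ordering and immutability come for free.
@dataclass(frozen=True, order=True)
class Candidate:  # hypothetical class, for illustration only
    score: float  # listed first, so ordering compares scores first
    wikidata_id: str
    mention: str


candidates = [
    Candidate(0.2, "Q2", "Bath"),
    Candidate(0.8, "Q1", "Bath"),
]

# Automatic sorting: instances compare field by field, in declaration order.
ranked = sorted(candidates, reverse=True)
best = ranked[0]

# Pretty printing via the generated __repr__.
print(best)  # Candidate(score=0.8, wikidata_id='Q1', mention='Bath')

# Immutability: assigning to a field of a frozen instance raises an error.
try:
    best.score = 1.0
except FrozenInstanceError:
    print("Candidate instances are read-only")
```

Putting `score` first is a design choice for this sketch: with `order=True`, instances compare like tuples of their fields, so the first field dominates the sort order.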

@thobson88
Collaborator Author

thobson88 commented Nov 21, 2024

TODO: add fields latlon and wkdt_class into the WikidataLink class structure.
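
One possible shape for this change is sketched below. Only the names `WikidataLink`, `latlon` and `wkdt_class` come from the TODO. The `wqid` field, the field types, and the `None` defaults are assumptions made for illustration, and the real class may hold other fields.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class WikidataLink:
    wqid: str  # hypothetical identifier field, assumed for this sketch
    # Assumed type: (latitude, longitude) pair, absent when unknown.
    latlon: Optional[Tuple[float, float]] = None
    # Assumed type: identifier of the entity's Wikidata class.
    wkdt_class: Optional[str] = None


# Placeholder values only.
link = WikidataLink("Q1", latlon=(51.38, -2.36), wkdt_class="Q2")
print(link)
```

Giving the new fields defaults would keep existing call sites that construct a `WikidataLink` without them working unchanged.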
