Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IO Package refactor #172

Open
wants to merge 63 commits into
base: develop
Choose a base branch
from
Open

IO Package refactor #172

wants to merge 63 commits into from

Conversation

javihern98
Copy link
Contributor

@javihern98 javihern98 commented Jan 13, 2025

This PR is a combination of #164 and #156 with many consistency checks. It covers the whole IO package for better consistency

Closes #164
Closes #156

Summary of changes

Dependency management

  • Removed extra FMR as we make httpx a mandatory dependency (we need it as well in IO package to download files from URL in read_sdmx)
  • Updated Poetry.lock

IO package

Consistency checks (#164)

  • Changed signature from readers and writers as specified in Checking for inconsistencies in parsers and writers methods #164. Normalization across the whole IO package to use Sequence
  • Changed short_urn to be a property on Maintainable Artefact
  • Ensure we use Dataflow=... and DataStructure=... across the IO package as short_urn for Schemas
  • Removed MessageType and used SDMXFormat on writers
  • Updated signature of input processor to use same terminology all across the functions

Convenience methods (#156)

  • Changed input processor to infer SDMX format and flavor (subformat, like SDMX-ML 2.1 Generic)
  • Added SDMXFormat enumeration with supported flavors
  • Added read_sdmx as a general reader for Path, URL or Strings
  • Added get_datasets to extract the Sequence of Datasets with the possibility to use a structures file to extract the Schema

Model package

  • Moved ActionType to Dataset (makes sense only here)
  • Cleaned up Message module and used same typing as readers and writers
  • Moved SubmissionResult (from RegistryInterface) to submission.py
  • Tag on Concept and Item Reference (Util package) due to the use of Concept Identity of both of them in the Component class

Tests changes

  • Adapted tests to the changes above and removed extra tests. Code Coverage reached at 100%

javihern98 and others added 30 commits December 18, 2024 13:23
Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
… their classes. Added read_sdmx method to reader

Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
… content. Use ItemReference as ConceptIdentity if necessary. Fixed tests accordingly.

Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
…dd-read_sdmx-convenience-method

# Conflicts:
#	src/pysdmx/model/message.py
…es. Added get_datasets methods. Maxed code coverage in all code.

Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
…been split into write data and metadata). Read and write XML tests updated to work with the new methods.
Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
…detection on SDMX-ML. Fixed tests.

Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
…_datasets method.

Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
…rage.

Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
…ency.

Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
…thod' into 156-add-read_sdmx-convenience-method
…writing and removing redundancies

Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
…consistencies-in-parsers-and-writers-methods' into io_refactor

# Conflicts:
#	src/pysdmx/io/xml/sdmx21/writer/structure.py
#	tests/io/xml/sdmx21/reader/test_reader.py
#	tests/io/xml/sdmx21/writer/test_structures_writing.py
Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
…ers file structure.

Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
…ML reader. Added some error handling if a wrong SDMX message is used as input.

Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
… DataStructure). Changed Submission to RegistryInterface. Added better documentation for structures formats. Fixed tests for io and csv.

Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
… a list.

Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
src/pysdmx/io/xml/sdmx21/writer/structure.py Dismissed Show dismissed Hide dismissed
Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
@javihern98 javihern98 marked this pull request as ready for review January 13, 2025 16:52
SDMX_ML_2_1_DATA_GENERIC = "SDMX-ML 2.1 Generic"
SDMX_ML_2_1_REGISTRY_INTERFACE = "SDMX-ML 2.1 Registry Interface"
SDMX_ML_2_1_ERROR = "SDMX-ML 2.1 Error"
SDMX_JSON_2 = "SDMX-JSON 2.0.0"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should also distinguish between Structure and Data, as we do with XML?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, let me check the Standard, in any case this has no impact in the rest of the code as we have a NotImplemented

Copy link
Contributor Author

@javihern98 javihern98 Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed to differentiate between SDMX_JSON_2_DATA and SDMX_JSON_2_STRUCTURE

src/pysdmx/io/enums.py Show resolved Hide resolved
@@ -439,6 +440,23 @@ class DataStructureDefinition(MaintainableArtefact, frozen=True, kw_only=True):

components: Components

def to_schema(self) -> Schema:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this necessary? The reason Schema was introduced was to represent the response of an SDMX-REST Schema query. In this context, it contains very useful pieces of information like the artefacts property, which should list the URNs of all the artefacts that have been considered to generate the schema. If you don't provide this piece of information, then I don't see what converting to a Schema brings as benefits... Just a question :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is necessary to have a simple interface between the DataStructureDefinition and Schema. For sure then I will need to fill the artefacts as well.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we discuss this? Thanks :)

Copy link
Contributor Author

@javihern98 javihern98 Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussion:

  • I will add the artefacts when generating the Schema. This includes the Concepts, Codelists and DataConstraints (not implemented as of today) of the DataStructureDefinition
  • We need to close the gap as much as we can between the Schema and DataStructureDefinition, Dataflow and ProvisionAgreement to make these classes interchangeable. This will be addressed in future releases

src/pysdmx/model/submission.py Show resolved Hide resolved
src/pysdmx/io/csv/sdmx10/writer/__init__.py Outdated Show resolved Hide resolved
Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
Signed-off-by: javier.hernandez <javier.hernandez@meaningfuldata.eu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Checking for inconsistencies in parsers and writers methods Add read_sdmx convenience method
3 participants