This is a summary of the discussion on what the Dataset class must include:
Data: version 1.0 uses Pandas; subsequent versions will add streaming capabilities.
Attached attributes: attributes at dataset level (a simple dictionary for now, to be changed later).
Relationship to the Schema class, or its URN.
@sosna and @stratosn proposed the following code, which was accepted by everyone and will be implemented after 1.0, as some changes in parsers and writers are required.
    from dataclasses import dataclass
    from datetime import datetime
    from typing import Any, Generator, Optional, Sequence, Union

    from pysdmx.model import MetadataReport, DataProvider, Schema


    @dataclass
    class _Component:
        id: str
        value: Any


    @dataclass
    class Dimension(_Component):
        pass


    @dataclass
    class DataAttribute(_Component):
        pass


    @dataclass
    class Measure(_Component):
        pass


    @dataclass
    class _Package:
        key: str  # Full key (cf. MEDAL) A.F.G.M.*
        dimensions: Sequence[Dimension]
        attributes: Optional[Sequence[DataAttribute]]
        name: Optional[str]
        metadata: Optional[Sequence[Union[MetadataReport, str]]]


    @dataclass
    class Observation(_Package):
        measures: Sequence[Measure]


    @dataclass
    class _ObsPackage(_Package):
        observations: Generator[Observation, None, None]
        obs_count: Optional[int]
        start_period: Optional[str]
        end_period: Optional[str]
        last_updated: Optional[datetime]


    @dataclass
    class Series(_ObsPackage):
        pass


    @dataclass
    class Group(_Package):
        pass


    @dataclass
    class Dataset(_ObsPackage):
        packages: Generator[Union[Group, Series, Observation], None, None]
        provider: Optional[DataProvider]
        structure: Union[Schema, str]  # Schema or the SDMX URN of the structure

        @property
        def groups(self):
            # A view on the packages of type Group
            return (p for p in self.packages if isinstance(p, Group))

        @property
        def series(self):
            # A view on the packages of type Series
            return (p for p in self.packages if isinstance(p, Series))


    @dataclass
    class PandasDataset(Dataset):
        def to_pandas(self):
            pass
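To make the `groups` and `series` views in the proposal easier to picture, here is a minimal, standard-library-only sketch. It is not the pysdmx implementation: the classes are stripped-down stand-ins that keep only a `key` (plus a `value` on observations), the URN string is a placeholder, and `packages` is held in a list rather than a generator so the views can be iterated more than once (an assumption made only for this illustration).

```python
from dataclasses import dataclass
from typing import Iterator, List, Union


# Simplified stand-ins for the proposed classes (hypothetical, not pysdmx).
@dataclass
class Group:
    key: str


@dataclass
class Series:
    key: str


@dataclass
class Observation:
    key: str
    value: float


@dataclass
class Dataset:
    structure: str  # stand-in for "Schema or the SDMX URN of the structure"
    packages: List[Union[Group, Series, Observation]]

    @property
    def groups(self) -> Iterator[Group]:
        # Lazy view on the packages of type Group
        return (p for p in self.packages if isinstance(p, Group))

    @property
    def series(self) -> Iterator[Series]:
        # Lazy view on the packages of type Series
        return (p for p in self.packages if isinstance(p, Series))


ds = Dataset(
    structure="urn:example:structure",  # placeholder, not a real SDMX URN
    packages=[
        Group("A.*"),
        Series("A.F"),
        Observation("A.F.2020", 1.5),
        Series("A.G"),
    ],
)

series_keys = [s.key for s in ds.series]
group_keys = [g.key for g in ds.groups]
```

In the proposal itself `packages` is a generator, so each view would consume it as it iterates; whether that one-pass behaviour is acceptable is a design question for the streaming versions.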
Do we think it might be possible to do this with Narwhals to make this dataframe agnostic? (I am a huge lover of pysdmx and moving all my sdmx code over to it, but also a huge polars user :))
Thanks for the suggestion and kind words, @gabrielgellner, we'll definitely have a look at it! Indeed, some of us in the Dev team cannot use Pandas (as there is no guarantee the dataset would fit in memory) and so we will soon need to look at adding more options.
Hi @gabrielgellner. This is scheduled for next year. I would like to first investigate all possible libraries (including Dask, Modin, etc.). I will certainly investigate Narwhals as well; it seems quite straightforward!
The main goal is to keep all functionalities but add the possibility of loading datasets bigger than memory. As we have already achieved most functionalities, the remaining goals are to add compatibility to SDMX-ML 3.0 and gather more use cases for data consumers and producers in the early months of 2025.
Low-memory data loading is tricky: we need to find a common interface between lxml and these "low-memory libraries" that avoids reading the whole XML file, adding the necessary "pointers" so that data is read only when it is needed. It takes time and effort to produce good-quality software that anyone can use in a production environment. This need has come up repeatedly in conversations with potential users, so we will prioritize it.
We also need a common way to handle data efficiently throughout the library, bearing in mind that it should be something users easily recognize (like Pandas) and that does not add much cognitive load when interacting with the actual data.
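As a rough illustration of reading observations without loading the whole XML file, the sketch below streams a document with `iterparse` and yields one flat record per observation. It uses the standard library's `xml.etree.ElementTree` rather than lxml, and the sample document is a simplified, namespace-free stand-in whose element and attribute names are assumptions for this example, not real SDMX-ML. It does not reflect how pysdmx will implement this.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Dict, Iterator

# Simplified, hypothetical sample (real SDMX-ML uses namespaces and a header).
SAMPLE = b"""<DataSet>
  <Series FREQ="A" REF_AREA="CH">
    <Obs TIME_PERIOD="2020" OBS_VALUE="1.5"/>
    <Obs TIME_PERIOD="2021" OBS_VALUE="1.7"/>
  </Series>
  <Series FREQ="A" REF_AREA="FR">
    <Obs TIME_PERIOD="2020" OBS_VALUE="2.1"/>
  </Series>
</DataSet>"""


def iter_observations(source: BinaryIO) -> Iterator[Dict[str, str]]:
    """Yield one record per Obs, merged with its parent Series attributes."""
    series_attrs: Dict[str, str] = {}
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start" and elem.tag == "Series":
            # Attributes are already available on the "start" event.
            series_attrs = dict(elem.attrib)
        elif event == "end" and elem.tag == "Obs":
            yield {**series_attrs, **elem.attrib}
            elem.clear()  # drop the attributes of the processed observation
        elif event == "end" and elem.tag == "Series":
            elem.clear()  # drop the (already emptied) child observations


rows = list(iter_observations(io.BytesIO(SAMPLE)))
```

Because `iter_observations` is a generator, a consumer could process records in batches instead of collecting them all into a list as done here for demonstration.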