Implement DataHub Catalogue client for dp-catalogue library #2902
…t classes can be inherited
- implement `__init__` and `create_or_update_table` for DataHubCatalogueClient using DataHub gms
- add tests to check DataHub client is submitting requests as expected (in the format supported by DataHub python sdk gms emitter)
- add packages to pypoetry.toml
- refactoring
… on name or enum
- clear up datahub docstring/comments
python-libraries/data-platform-catalogue/data_platform_catalogue/client.py
- handle misconfigured api_urls in DataHub client
- make tags api call optional if not present
- update tests for changes
…DataHubCatalogueClient class
- update `DataHubCatalogueClient.create_or_update_table` method to create domain and data product if they don't exist but are passed as `data_product_metadata`
- associate tables with data products when created in DataHub
linting
From Datahub codebase
4e89ba1 to 4275112
This is needed in addition to the SchemaMetadata aspect
I have tested using this branch to create a dataset on the apps and tools instance. See https://datahub.apps-tools.development.data-platform.service.justice.gov.uk/dataset/urn:li:dataset:(urn:li:dataPlatform:glue,my_data_product2.my_table3,PROD)/Schema?is_lineage_mode=false&schemaFilter= I'm reasonably confident it's working end to end now, although there are some things we might need to come back to. Notes:
The aim here is to hide catalogue-specific details from the user of this library. This means we should be able to type any value as BaseCatalogueClient and then substitute any of the concrete classes without breaking things (see the Liskov substitution principle). This cannot be the case if the base class has methods which accept *any* arguments and the concrete implementations only handle specific arguments. To fix this I've modified the abstract methods to specify the arguments. Since the schema_fqn, database_fqn, etc. arguments reflected the OpenMetadata graph structure, I've replaced them with a generic "DataLocation" argument, which does not enforce a particular hierarchy.
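As a rough illustration of the point above, here is a minimal sketch of an abstract method with an explicit signature and a generic location argument. `BaseCatalogueClient`, `DataHubCatalogueClient`, `upsert_table` and `DataLocation` are names from this PR; the `DataLocation` fields, the `OpenMetadataCatalogueClient` class name, the `register` helper and the returned strings are assumptions made up for the example, not the library's actual code.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class DataLocation:
    """Where a table lives, with no catalogue-specific hierarchy.

    Field names are hypothetical, chosen only for this sketch.
    """

    fully_qualified_name: str
    platform_type: str = "glue"


class BaseCatalogueClient(ABC):
    @abstractmethod
    def upsert_table(self, table_name: str, location: DataLocation) -> str:
        """Create or update a table and return an identifier for it."""


class DataHubCatalogueClient(BaseCatalogueClient):
    def upsert_table(self, table_name: str, location: DataLocation) -> str:
        # Placeholder body: a real client would call the catalogue here.
        return f"datahub:{location.fully_qualified_name}.{table_name}"


class OpenMetadataCatalogueClient(BaseCatalogueClient):
    def upsert_table(self, table_name: str, location: DataLocation) -> str:
        # Same signature as the base class, so callers need no special cases.
        return f"openmetadata:{location.fully_qualified_name}.{table_name}"


def register(client: BaseCatalogueClient) -> str:
    """Caller code typed against the base class only."""
    return client.upsert_table("my_table", DataLocation("my_database"))
```

Because both concrete classes accept exactly the arguments the base class declares, `register` works with either one without knowing which catalogue sits behind it.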
python-libraries/data-platform-catalogue/data_platform_catalogue/client/openmetadata.py
- Pass a fully qualified table name ('database.table' rather than 'table') so that a container is created rather than allocating to the default container.
- Remove dummy value for mandatory platformSchema argument. Just use empty string.
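To make the fully qualified name concrete, the sketch below builds a dataset URN string in the same shape as the one in the test link earlier in this conversation. `build_dataset_urn` is a hypothetical helper written for illustration, not a function from this library or the DataHub SDK, and the `PROD` default is an assumption taken from that link.

```python
def build_dataset_urn(platform: str, database: str, table: str, env: str = "PROD") -> str:
    """Build a dataset URN using 'database.table' as the dataset name."""
    # Qualify the table with its database rather than passing the bare table name.
    qualified_name = f"{database}.{table}"
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{qualified_name},{env})"


urn = build_dataset_urn("glue", "my_data_product2", "my_table3")
```

With the values from the test dataset, `urn` has the same form as the URN in the link: `urn:li:dataset:(urn:li:dataPlatform:glue,my_data_product2.my_table3,PROD)`.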
6d2ffbd to 37130d2
Priya approved.
`__init__` and `upsert_table` for DataHubCatalogueClient using DataHub gms
Further changes - @MatMoore