Dp 2977 #2995

murdo-moj · 2024-01-18T09:29:40Z

Copy curated data in s3 between versions upon table deletion

MatMoore · 2024-01-18T16:02:39Z

containers/daap-python-base/src/var/task/versioning.py

@@ -200,6 +207,21 @@ def update_metadata_remove_schemas(self, schema_list: list[str]) -> str:
                logger=self.logger,
            ).run()

+            # Copy data files in the curated bucket


I thought this wasn't needed because it was already handled by the Athena query in CuratedDataCopier.

Rather than copying all the parquet files in S3, we have an UNLOAD query that creates the new parquet files from the latest load timestamp in the previous version. Am I misunderstanding how this works?

UNLOAD (
    SELECT * FROM {previous_major_database}.{curated_table.name}
    WHERE load_timestamp = (
        SELECT MAX(load_timestamp) FROM {previous_major_database}.{curated_table.name}
    )
)
TO '{self.table_path}'
WITH (
    format = 'parquet',
    compression = 'SNAPPY',
    partitioned_by = ARRAY['load_timestamp']
)

It is triggered from https://github.com/ministryofjustice/data-platform/blob/main/containers/daap-python-base/src/var/task/curated_data/curated_data_loader.py#L215
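
To make the point above concrete, here is a minimal sketch of how a query like this could be assembled so that only the latest load of the previous version is carried forward. It is not the repository's actual code: the function `build_unload_query`, the `CuratedTable` dataclass, and the example database and bucket names are all hypothetical, and the sketch only builds the SQL string rather than running it against Athena.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CuratedTable:
    """Hypothetical stand-in for the curated table object used in the query."""

    name: str


def build_unload_query(
    previous_major_database: str, curated_table: CuratedTable, table_path: str
) -> str:
    """Build an UNLOAD statement that writes only the rows from the most recent
    load_timestamp of the previous version's table to table_path as parquet."""
    source = f"{previous_major_database}.{curated_table.name}"
    return (
        "UNLOAD (\n"
        f"    SELECT * FROM {source}\n"
        "    WHERE load_timestamp = (\n"
        f"        SELECT MAX(load_timestamp) FROM {source}\n"
        "    )\n"
        ")\n"
        f"TO '{table_path}'\n"
        "WITH (\n"
        "    format = 'parquet',\n"
        "    compression = 'SNAPPY',\n"
        "    partitioned_by = ARRAY['load_timestamp']\n"
        ")"
    )


if __name__ == "__main__":
    # Example values are made up for illustration only.
    print(
        build_unload_query(
            previous_major_database="example_product_v1",
            curated_table=CuratedTable(name="example_table"),
            table_path="s3://example-curated-bucket/example_product/v2/example_table/",
        )
    )
```

If this reading is right, the new version's table path receives a fresh parquet snapshot of the latest load, so a separate copy of every data file in the curated bucket may be redundant.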

github-actions · 2024-02-18T01:48:27Z

This pull request is being marked as stale because it has been open for 30 days with no activity. Remove stale label or comment to keep the pull request open.

github-actions · 2024-02-25T01:48:29Z

This pull request is being closed because it has been open for a further 7 days with no activity. If this is still a valid pull request, please reopen it. Thank you!

murdo-moj added 2 commits January 17, 2024 17:05

Copy remaining data files in curated bucket after table delete

5106047

for delete table, only copy data files which aren't deleted

55a6884

murdo-moj requested a review from a team January 18, 2024 09:29

github-actions bot assigned murdo-moj Jan 18, 2024

murdo-moj added 2 commits January 18, 2024 09:30

Bumped changelog

e189724

Linting changes

6d546cc

murdo-moj mentioned this pull request Jan 18, 2024

🐛 Delete table doesn't copy s3 data #2977

Closed

MatMoore reviewed Jan 18, 2024

github-actions bot added the stale label Feb 18, 2024

github-actions bot closed this Feb 25, 2024

jacobwoffenden deleted the dp-2977 branch May 9, 2024 14:42
