Dp 2977 #2995
Conversation
murdo-moj commented Jan 18, 2024 • edited
- Copy curated data in S3 between versions upon table deletion (a rough sketch of this kind of copy is shown below)
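As an illustration of what copying curated data between version prefixes in S3 can look like, here is a minimal sketch using boto3 server-side copies. The bucket and prefix names (`curated-bucket`, `v1/`, `v2/`) are hypothetical placeholders, not taken from this PR:

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical names for illustration only; the real paths live in the pipeline config.
BUCKET = "curated-bucket"
SOURCE_PREFIX = "database/table/v1/"
DEST_PREFIX = "database/table/v2/"

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=SOURCE_PREFIX):
    for obj in page.get("Contents", []):
        new_key = DEST_PREFIX + obj["Key"][len(SOURCE_PREFIX):]
        # Server-side copy: the objects never leave S3.
        s3.copy_object(
            Bucket=BUCKET,
            Key=new_key,
            CopySource={"Bucket": BUCKET, "Key": obj["Key"]},
        )
```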
@@ -200,6 +207,21 @@ def update_metadata_remove_schemas(self, schema_list: list[str]) -> str:
            logger=self.logger,
        ).run()

        # Copy data files in the curated bucket
I thought this wasn't needed because it was already handled by the Athena query in CuratedDataCopier. Rather than copying all the parquet files in S3, we have an UNLOAD query that creates the new parquet from the latest load_timestamp in the previous version. Am I misunderstanding how this works?
UNLOAD (
    SELECT *
    FROM {previous_major_database}.{curated_table.name}
    WHERE load_timestamp = (
        SELECT MAX(load_timestamp)
        FROM {previous_major_database}.{curated_table.name}
    )
)
TO '{self.table_path}'
WITH (
    format = 'parquet',
    compression = 'SNAPPY',
    partitioned_by = ARRAY['load_timestamp']
)
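For context, a rough sketch of how such an UNLOAD might be executed through boto3's Athena client, assuming this is roughly what CuratedDataCopier does internally. The database, table, and S3 paths are placeholders standing in for the templated values in the query above:

```python
import time
import boto3

athena = boto3.client("athena")

# Placeholders mirroring the templated query above; the real values come from CuratedDataCopier.
previous_major_database = "my_db_v1"
table_name = "my_table"
table_path = "s3://curated-bucket/my_db_v2/my_table/"

unload_query = f"""
UNLOAD (
    SELECT *
    FROM {previous_major_database}.{table_name}
    WHERE load_timestamp = (
        SELECT MAX(load_timestamp)
        FROM {previous_major_database}.{table_name}
    )
)
TO '{table_path}'
WITH (
    format = 'parquet',
    compression = 'SNAPPY',
    partitioned_by = ARRAY['load_timestamp']
)
"""

# Athena still requires a result output location even for UNLOAD queries.
execution = athena.start_query_execution(
    QueryString=unload_query,
    ResultConfiguration={"OutputLocation": "s3://curated-bucket/athena-results/"},
)

# Poll until the query reaches a terminal state.
query_id = execution["QueryExecutionId"]
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)
```

The advantage of this approach over copying parquet files object-by-object is that Athena rewrites only the rows from the latest load_timestamp, so stale loads in the previous version are not carried forward.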
This pull request is being marked as stale because it has been open for 30 days with no activity. Remove the stale label or comment to keep the pull request open.
This pull request is being closed because it has been open for a further 7 days with no activity. If this is still a valid pull request, please reopen it. Thank you!