Destination S3 Data Lake: parallelize data cleaner; handle failure case #52123

Open
wants to merge 1 commit into
base: master


edgao
Contributor

@edgao edgao commented Jan 23, 2025

Previously:

  • The cleaner was very slow because it processed everything serially. It now runs in parallel.
  • If a table had no files (e.g. because you killed the test partway through creating a table), the cleaner would crash on catalog.loadTable. This is now handled in a try-catch.
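
For readers without the diff open, here is a minimal sketch of the two changes described above: cleaning tables concurrently, and falling back to a plain drop when loading a table fails. It is not the code in this PR. `TableCatalog`, `InMemoryCatalog`, and `cleanAll` are hypothetical names invented for illustration, and a fixed thread pool from `java.util.concurrent` stands in for whatever concurrency mechanism the connector actually uses.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.Executors

// Hypothetical stand-in for the real catalog; NOT the connector's or Iceberg's API.
interface TableCatalog {
    fun listTables(): List<String>
    // Returns the table's file paths; throws if the table has no files.
    fun loadTable(name: String): List<String>
    fun dropTable(name: String)
}

// Thread-safe in-memory catalog, used only to make the sketch runnable.
class InMemoryCatalog(initial: Map<String, List<String>>) : TableCatalog {
    private val tables: MutableMap<String, List<String>> = ConcurrentHashMap(initial)
    val dropped: MutableSet<String> = ConcurrentHashMap.newKeySet<String>()

    override fun listTables(): List<String> = tables.keys.toList()

    override fun loadTable(name: String): List<String> {
        val files = tables[name] ?: throw NoSuchElementException("unknown table $name")
        check(files.isNotEmpty()) { "table $name has no files" }
        return files
    }

    override fun dropTable(name: String) {
        tables.remove(name)
        dropped.add(name)
    }
}

// Cleans every table on a fixed-size pool instead of one at a time.
fun cleanAll(catalog: TableCatalog, parallelism: Int = 4): List<String> {
    val pool = Executors.newFixedThreadPool(parallelism)
    try {
        val tasks = catalog.listTables().map { name ->
            Callable {
                try {
                    // A real cleaner would delete the returned files here.
                    catalog.loadTable(name)
                } catch (e: Exception) {
                    // Assumption: the load failed because the table has no files,
                    // so there is nothing to delete and a plain drop is enough.
                }
                catalog.dropTable(name)
                name
            }
        }
        // invokeAll blocks until every task has finished.
        return pool.invokeAll(tasks).map { it.get() }
    } finally {
        pool.shutdown()
    }
}

fun main() {
    val catalog = InMemoryCatalog(
        mapOf("users" to listOf("a.parquet"), "half_created" to emptyList())
    )
    println(cleanAll(catalog, parallelism = 2))
}
```

Catching a broad `Exception` mirrors the snippet under review; a narrower exception type would avoid dropping a table after an unrelated failure, at the cost of knowing exactly what the catalog throws.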


} catch (e: Exception) {
// catalog.loadTable will fail if the table has no files.
// In this case, we can just hard drop the table, because we know it has
// no

nit: The comment format is a bit weird?

Contributor

@frifriSF59 frifriSF59 left a comment

Small comment, but looks good otherwise.
