-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[HUDI-5769] Ensure partitions created by async indexer are not deleted by regular writers #12662
base: master
Are you sure you want to change the base?
Conversation
243dfa7
to
899cd0a
Compare
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
Show resolved
Hide resolved
899cd0a
to
0c0d9b4
Compare
thanks for the PR desc, it helps. btw, for bullet 4 in your PR desc
woudn't the ingestion writer also log updates for RLI, since the table config's inflight metadata partitions will say RLI. |
can you create a follow up ticket for doc updates. This def needs to go into some runbook in our docs. |
if (dataWriteConfig.isRecordIndexEnabled()) { | ||
HoodieData<HoodieRecord> additionalUpdates = getRecordIndexAdditionalUpserts(partitionToRecordMap.get(MetadataPartitionType.RECORD_INDEX.getPartitionPath()), commitMetadata); | ||
partitionToRecordMap.put(RECORD_INDEX.getPartitionPath(), partitionToRecordMap.get(MetadataPartitionType.RECORD_INDEX.getPartitionPath()).union(additionalUpdates)); | ||
if (dataWriteConfig.isRecordIndexEnabled() && RECORD_INDEX.isMetadataPartitionAvailable(dataMetaClient)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
guess this was not the design.
the regular ingestion writer should log updates for partitions fully available and for partitions that are inflight.
if not, the async indexer has to do full catch up. The catch up is meant just for a race condition to ensure we do not miss any updates, but in most cases, the catch up phase should not have any work to do.
lets discuss if our understanding is different
@@ -1064,12 +1069,21 @@ private boolean shouldDeleteMetadataPartition(MetadataPartitionType partitionTyp | |||
case RECORD_INDEX: | |||
metadataIndexDisabled = !config.isRecordIndexEnabled(); | |||
break; | |||
// PARTITION_STATS should have same behavior as COLUMN_STATS | |||
case PARTITION_STATS: | |||
metadataIndexDisabled = !config.isPartitionStatsIndexEnabled(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:) I also had to fix this in my partition stats patch.
0c0d9b4
to
179fc3b
Compare
Change Logs
Ensure partitions created by async indexer are not deleted by regular writers.
HoodieTable
TestHoodieIndexer
Let's understand the interplay between index enable/disable boolean flag and the new config. For illustration prupose, we will assume,
files
and column_statsare enabled by ingestion writer, and
record_index` is created using async indexer.Existing boolean configs:
hoodie.metadata.enable
,hoodie.metadata.index.column.stats.enable
andhoodie.metadata.record.index.enable
.New string config:
hoodie.metadata.index.drop
(this takes the index name/MDT partition name as value)hoodie.metadata.enable
andhoodie.metadata.index.column.stats.enable
set to true. And,files
andcolumn_stats
index are present.hoodie.metadata.enable
andhoodie.metadata.record.index.enable
to true. It checks the available partitions and automatically sets the columns stats config to true as well -hudi/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieIndexer.java
Line 182 in c049e9b
hoodie.metadata.index.drop
config, so even though thehoodie.metadata.record.index.enable
is false in ingestion writer config, the record_index is not deleted.hoodie.metadata.record.index.enable=false
but alsohoodie.metadata.index.drop=record_index
.To summarise, there is a boolean flag for enabling.disabling an index, and there is a string config to explicitly specify what index to delete. The interplay between these two decides finally, whether index needs to be disabled/dropped or not.
Impact
Ensure partitions created by async indexer are not deleted by regular writers.
Risk level (write none, low medium or high below)
low
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".
ticket number here and follow the instruction to make
changes to the website.
Contributor's checklist