Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add possibility to acquire permits on primary shards with different checks #119794

Merged
merged 4 commits into from
Jan 10, 2025

Conversation

arteam
Copy link
Contributor

@arteam arteam commented Jan 8, 2025

Since #42241 we check that the shard must be in a primary mode for acquiring a primary permit on it. We would like customize this check and an option to perform different checks before running the onPermitAcquired listener. For example, we would to skip the primary mode check when we acquire primary permits during recovering of a hollow indexing shard.

See ES-10487

@arteam arteam added >non-issue :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. labels Jan 8, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

@elasticsearchmachine elasticsearchmachine added Team:Distributed Indexing Meta label for Distributed Indexing team v9.0.0 labels Jan 8, 2025
…hecks

Since elastic#42241 we check that the shard must be in a primary mode for acquiring a
primary permit on it. We would like customize this check and an option to
perform different checks before running the `onPermitAcquired` listener.
For example, we would to skip the primary mode check when we acquire primary permits
during recovering of a hollow indexing shard.

See ES-10487
@arteam arteam force-pushed the acquire-primary-permit-custom-check branch from 23917ed to 94631d5 Compare January 8, 2025 20:12
@arteam arteam requested review from kingherc and fcofdez January 9, 2025 07:55
Copy link
Contributor

@kingherc kingherc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. A couple of comments.

Copy link
Contributor

@kingherc kingherc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@arteam arteam merged commit 6ca7e75 into elastic:main Jan 10, 2025
16 checks passed
@arteam arteam deleted the acquire-primary-permit-custom-check branch January 10, 2025 13:04
/**
* Check to run before running the primary permit operation
*/
public enum PrimaryPermitCheck {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit late, but wouldn't be possible to somehow relax the assertion in #isPrimaryMode when the engine is hollow instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @fcofdez ! It's possible, but it's unclear what's the best way to do it since the hollow info/logic is in serverless code. In the POC, I remember I simply relaxed the assertion by allowing it if the shard is in the recovery process. Not sure though it's better than this PR since it was more generic (recovering shards). Feel free to shoot a specific proposal though to see if it's better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fcofdez I believe the reason was that IndexShard doesn't know about HollowEngine which is a serverless concept, so you would need to expose this abstraction to Engine. I think disabling the primary mode check explicitly seems to be a good first approach and we can revert to the engine check if that makes more sense.

@fcofdez
Copy link
Contributor

fcofdez commented Jan 14, 2025

I've been chatting with @kingherc and I think that we could avoid this change if we acquire the permits in the target node during the primary context handover in serverless as that's called before the shard is marked as started in

@Override
public void onRecoveryDone(
final RecoveryState state,
ShardLongFieldRange timestampMillisFieldRange,
ShardLongFieldRange eventIngestedMillisFieldRange
) {
shardStateAction.shardStarted(
shardRouting,
primaryTerm,
"after " + state.getRecoverySource(),
timestampMillisFieldRange,
eventIngestedMillisFieldRange,
ActionListener.noop()
);
}
. That would avoid having to change this API.

@kingherc
Copy link
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. >non-issue Team:Distributed Indexing Meta label for Distributed Indexing team v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants