Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] collation-generation: resolve mismatch between descriptor and commitments core index #7104

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

sw10pa
Copy link
Member

@sw10pa sw10pa commented Jan 9, 2025

Issue

[#7107] Core Index Mismatch in Commitments and Descriptor

Description

This PR resolves a bug where normal (non-malus) undying collators failed to generate and submit collations, resulting in the following error:

ERROR tokio-runtime-worker parachain::collation-generation: Failed to construct and distribute collation: V2 core index check failed: The core index in commitments doesn't match the one in descriptor.

More details about the issue and reproduction steps are described in the related issue.

Summary of Fix

  • Core index is now derived from commitments (via UMP signals) instead of sequential claim queue indexes;
  • The fix ensures functionality remains unchanged for parachains not using UMP signals;
  • Added checks to stop processing if multiple cores are assigned but the same core is selected repeatedly.

TODO

  • Implement the fix;
  • Add tests;
  • Add PRdoc.

@sw10pa sw10pa requested a review from alindima January 9, 2025 14:40
@sw10pa sw10pa linked an issue Jan 9, 2025 that may be closed by this pull request
@sw10pa sw10pa self-assigned this Jan 9, 2025
@sw10pa sw10pa added I2-bug The node fails to follow expected behavior. I10-unconfirmed Issue might be valid, but it's not yet known. labels Jan 9, 2025
@sw10pa sw10pa requested a review from sandreim January 10, 2025 12:18
Copy link
Contributor

@alindima alindima left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

Comment on lines +383 to +386
// Malicious behavior detected: The parachain is assigned to
// multiple cores but repeatedly selects the same core. This is
// unexpected and the process should stop here.
gum::error!(target: LOG_TARGET, "Parachain repeatedly selected the same core index");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not malicious behavior, could be legitimately only building for one core (without elastic scaling). I would also demote this log to info level


let descriptor_core_index = match commitments.core_selector() {
// If a valid CoreSelector is found, calculate the core index based on it.
Ok(Some((core_selector, _cq_offset))) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should take into account the claim queue offset, if we do have a core selector

core_selector.0 as usize % cores_to_build_on.len();

if cores_to_build_on.len() > 1 &&
prev_core_selector_index == core_selector_index
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're only checking if the previous core selector is the same, but what if the core selectors look like 1,2,1?
In this case we should stop after building for 1 and 2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I2-bug The node fails to follow expected behavior. I10-unconfirmed Issue might be valid, but it's not yet known.
Projects
Status: Backlog
Development

Successfully merging this pull request may close these issues.

Core Index Mismatch in Commitments and Descriptor
2 participants