
[WIP] malus-collator: implement initial version of malicious collator submitting same collation to all backing groups #6924

Draft · wants to merge 12 commits into master

Conversation

@sw10pa (Member) commented Dec 17, 2024


Description

Modified the undying collator to include a malus mode, in which it submits the same collation to all assigned backing groups.

TODO

@sw10pa sw10pa added the T10-tests This PR/Issue is related to tests. label Dec 17, 2024
@sw10pa sw10pa requested review from alindima and sandreim December 17, 2024 12:17
@sw10pa sw10pa self-assigned this Dec 17, 2024
@paritytech-workflow-stopper

All GitHub workflows were cancelled due to the failure of one of the required jobs.
Failed workflow url: https://github.com/paritytech/polkadot-sdk/actions/runs/12372830027
Failed job name: test-linux-stable

@alindima (Contributor) left a comment:

Looking good overall!

/// If true, the collator will behave maliciously by submitting
/// the same collations to all assigned backing groups.
#[arg(long, default_value_t = false)]
pub malus: bool,
Contributor:

We could have this malus flag map to an enum of variants describing which malicious behaviour it triggers.
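The enum-of-variants suggestion could look roughly like this. Everything here is hypothetical: `MalusMode`, its variant names, and the `FromStr` parsing are invented for illustration, using only the standard library (the real CLI would more likely derive clap's `ValueEnum`):

```rust
use std::str::FromStr;

/// Hypothetical set of malicious behaviours the collator could trigger.
/// (The PR currently exposes only a boolean `--malus` flag.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MalusMode {
    /// Behave honestly (the default).
    None,
    /// Submit the same collation to all assigned backing groups.
    DuplicateCollations,
}

impl FromStr for MalusMode {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "none" => Ok(MalusMode::None),
            "duplicate-collations" => Ok(MalusMode::DuplicateCollations),
            other => Err(format!("unknown malus mode: {other}")),
        }
    }
}

fn main() {
    // With clap, the enum could replace the bool flag, e.g.:
    // #[arg(long, value_enum, default_value_t = MalusMode::None)]
    let mode: MalusMode = "duplicate-collations".parse().unwrap();
    println!("{mode:?}");
}
```

New behaviours would then become new variants instead of additional boolean flags.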

@@ -19,7 +19,7 @@
use polkadot_cli::{Error, Result};
use polkadot_node_primitives::CollationGenerationConfig;
use polkadot_node_subsystem::messages::{CollationGenerationMessage, CollatorProtocolMessage};
use polkadot_primitives::Id as ParaId;
use polkadot_primitives::{self, Id as ParaId};
Contributor:

Is the `self` import used?

Comment on lines +396 to +400
let scheduled_cores: Vec<CoreIndex> = claim_queue
.iter()
.filter_map(move |(core_index, paras)| {
Some((*core_index, *paras.get(claim_queue_offset.0 as usize)?))
})
Contributor:

Have a look at `ClaimQueueSnapshot`; you can reuse some of its code via `iter_claims_at_depth`.
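For illustration only, here is a standard-library toy model of what a depth-indexed claim-queue lookup does. The real `ClaimQueueSnapshot` and `iter_claims_at_depth` live in the polkadot codebase and differ in types and signatures; `claims_at_depth` and the plain integer aliases below are invented stand-ins:

```rust
use std::collections::BTreeMap;

type CoreIndex = u32;
type ParaId = u32;

/// Toy model of the relay chain claim queue: for each core, the queue of
/// paras scheduled at increasing depths.
fn claims_at_depth(
    claim_queue: &BTreeMap<CoreIndex, Vec<ParaId>>,
    depth: usize,
) -> impl Iterator<Item = (CoreIndex, ParaId)> + '_ {
    claim_queue
        .iter()
        .filter_map(move |(core, paras)| Some((*core, *paras.get(depth)?)))
}

fn main() {
    let mut queue = BTreeMap::new();
    queue.insert(0, vec![2000, 2000]);
    queue.insert(1, vec![2001]); // nothing at depth 1 on this core
    queue.insert(2, vec![2000, 2001]);

    // Cores that have a claim at depth 1:
    let cores: Vec<_> = claims_at_depth(&queue, 1).collect();
    println!("{cores:?}"); // [(0, 2000), (2, 2001)]
}
```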

let claim_queue = match client.runtime_api().claim_queue(relay_parent) {
Ok(claim_queue) =>
if claim_queue.is_empty() {
log::info!(target: LOG_TARGET, "Claim queue is empty.");
Contributor:

We can remove this case because it'll be caught by scheduled_cores.is_empty()

})
.collect();

if scheduled_cores.is_empty() {
Contributor:

This log is super spammy until the para is scheduled, and it's not an error. We should wait for a new slot (the 6 seconds) before retrying. Same for all the other `continue`s here.

I would also log something if `scheduled_cores` has only one core assigned, since in that case this malus collator will not do anything malicious.


overseer_handle
.send_msg(
CollatorProtocolMessage::DistributeCollation {
@alindima (Contributor) commented Jan 13, 2025:

Add a comment explaining why we're not using `SubmitCollation`.

commitments: commitments.clone(),
};

let candidate_receipt = ccr.to_plain();
Contributor:

Avoid building a `CommittedCandidateReceiptV2` if you only need the plain receipt.

charlie: js-script ./assign-core.js with "2,2000,57600" return is 0 within 600 seconds

# Ensure parachains made progress.
alice: parachain 2000 block height is at least 10 within 300 seconds
Contributor:

We need to add some better assertions, including log checks that validators indeed received invalid candidates.
Also, if you want this test to run in CI, it needs to be added to the GitLab config in polkadot.yml.

# Register three cores assigned to this parachain.
alice: js-script ./assign-core.js with "0,2000,57600" return is 0 within 600 seconds
bob: js-script ./assign-core.js with "1,2000,57600" return is 0 within 600 seconds
charlie: js-script ./assign-core.js with "2,2000,57600" return is 0 within 600 seconds
Contributor:

Core 2 is already assigned by default when force-registering the parachain.

@@ -0,0 +1,54 @@
[settings]
Contributor:

The DSL is deprecated; we have a goal of porting tests to the Zombienet SDK.

@@ -86,6 +90,8 @@ pub struct GraveyardState {
pub zombies: u64,
// Grave seal.
pub seal: [u8; 32],
// Increasing sequence number for core selector.
pub core_selector_number: u8,
Contributor:

I was expecting us to use the block number (LSB); is there anything preventing us from doing so?

Member (Author):

Since we use upward messages to place UMP signals and specify the core index, we needed this `core_selector_number` for the core selector. Do you mean doing something else?

Contributor:

Andrei means that real parachains typically use the block number for the core selector.
The undying collator's state does not store the block number, so this `core_selector_number` is effectively the block number.
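A minimal sketch of the block-number-as-core-selector idea. The function name is made up; the assumption is that the core selector only needs an incrementing `u8`, which the least significant byte of the block number already provides:

```rust
/// Derive the core selector from the block number's least significant byte,
/// instead of storing a separate counter in the collator state.
fn core_selector_from_block_number(block_number: u32) -> u8 {
    // LSB of the block number; wraps naturally every 256 blocks.
    (block_number % 256) as u8
}

fn main() {
    for n in [0u32, 1, 255, 256, 300] {
        println!("block {n} -> selector {}", core_selector_from_block_number(n));
    }
}
```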

}

// Wait before repeating the process.
sleep(Duration::from_secs(6 as u64));
Contributor:

Why do we need to wait 6s? If the relay chain skips a slot, we build another candidate, which is different from what the honest collator is doing. We should be building on each relay parent we import.

Member (Author):

Do you mean no waiting at all, or should the waiting time somehow depend on the imports?

Contributor:

The current code very roughly mimics what the slot-based collator does (every slot, pick the best relay chain block).
What Andrei is suggesting is to simulate what the lookahead collator does and subscribe to relay chain block import notifications instead. This would indeed be closer to what the honest collator does. I think for a test collator either is fine.
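A toy, standard-library sketch of the import-notification approach; `ImportNotification` and `build_on_imports` are invented stand-ins for the real subscription API (in practice the collator would subscribe to the relay chain client's import stream):

```rust
use std::sync::mpsc;
use std::thread;

/// Toy stand-in for a relay chain block import notification.
pub struct ImportNotification {
    pub block_number: u32,
}

/// Build one collation per imported relay parent, instead of sleeping a
/// fixed 6 seconds per loop iteration. Returns the relay parents built on.
pub fn build_on_imports(rx: mpsc::Receiver<ImportNotification>) -> Vec<u32> {
    rx.into_iter()
        .map(|notification| {
            // A lookahead-style collator would produce a collation here.
            notification.block_number
        })
        .collect()
}

fn main() {
    let (tx, rx) = mpsc::channel();

    // Simulated relay chain announcing three imported blocks.
    thread::spawn(move || {
        for n in 1..=3u32 {
            tx.send(ImportNotification { block_number: n }).unwrap();
        }
    });

    println!("built on relay parents {:?}", build_on_imports(rx));
}
```

The loop exits when the sender side is dropped, so no fixed timer is needed; a skipped relay chain slot simply produces no notification and no extra candidate.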

@@ -119,6 +125,7 @@ pub fn execute_transaction(mut block_data: BlockData) -> GraveyardState {
// Chain hash the seals and burn CPU.
block_data.state.seal = hash_state(&block_data.state);
}
block_data.state.core_selector_number += 1;
Contributor:

We need to use `wrapping_add`.

Some info about overflow in Rust: https://doc.rust-lang.org/reference/behavior-not-considered-unsafe.html#integer-overflow
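A minimal sketch of the suggested fix; `next_core_selector` is a hypothetical helper illustrating why `wrapping_add` is preferable to `+=` for a `u8` counter:

```rust
/// In debug builds, `core_selector_number += 1` panics once the counter
/// reaches `u8::MAX`; in release builds it silently wraps. `wrapping_add`
/// makes the wrap-around explicit and consistent across build profiles.
fn next_core_selector(current: u8) -> u8 {
    current.wrapping_add(1)
}

fn main() {
    println!("{}", next_core_selector(41));      // 42
    println!("{}", next_core_selector(u8::MAX)); // wraps to 0
}
```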

Labels
T10-tests This PR/Issue is related to tests.
Projects
Status: Backlog
3 participants