Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lbp: rack awareness #195

Merged
merged 4 commits into from
Nov 20, 2024
Merged

lbp: rack awareness #195

merged 4 commits into from
Nov 20, 2024

Conversation

muzarski
Copy link
Collaborator

@muzarski muzarski commented Oct 17, 2024

Fix: #156
See: scylladb/cpp-driver#73

Changes

Implemented cass_[cluster/exeuction_profile]_set_load_balance_rack_aware[_n]. Adjusted the documentation, and extended unit tests for API related to LBP.

Important note about documentation and default behaviour of rack aware policy

I removed this part of the docstring:

 * With empty local_rack and local_dc,  default local_dc and local_rack
 * is chosen from the first connected contact point,
 * and no remote hosts are considered in query plans.
 * If relying on this mechanism, be sure to use only contact
 * points from the local rack.

There are multiple reasons for this:

  • this behaviour does not make much sense, and we should not mimic it IMO
  • rust-driver does not behave like this
  • this is not even true for cpp-driver

Why it's not true for cpp-driver:
If you carefully study the changes introduced to cpp-driver in the aforementioned
commit, you will notice that it's not possible for the driver to use rack aware
policy with an empty strings. This is because API functions reject
empty string, thus RackAwarePolicy object is never constructed in such case.

CassError cass_cluster_set_load_balance_rack_aware_n(CassCluster* cluster, const char* local_dc,
                                                   size_t local_dc_length,
                                                   const char* local_rack,
                                                   size_t local_rack_length) {
  if (local_dc == NULL || local_dc_length == 0 || local_rack == NULL || local_rack_length == 0) {
    return CASS_ERROR_LIB_BAD_PARAMS;
  }
  cluster->config().set_load_balancing_policy(new RackAwarePolicy(
      String(local_dc, local_dc_length), String(local_rack, local_rack_length)));
  return CASS_OK;
}

Why is this part of docstring included in cpp-driver then? No idea. Maybe,
because cass_cluster_set_load_balance_dc_aware mentions something similar
for empty (non-specified) dc. However, in this case it's true, since dc awareness is enabled
by default in cpp-driver. See the docstring:

 * Configures the cluster to use DC-aware load balancing.
 * For each query, all live nodes in a primary 'local' DC are tried first,
 * followed by any node from other DCs.
 *
 * <b>Note:</b> This is the default, and does not need to be called unless
 * switching an existing from another policy or changing settings.
 * Without further configuration, a default local_dc is chosen from the
 * first connected contact point, and no remote hosts are considered in
 * query plans. If relying on this mechanism, be sure to use only contact
 * points from the local DC.

Default node location preference

Cpp-driver, is dc-aware by default. This is not true for rust-driver, and probably will never be. In rust-driver, by default there are no node location preferences. I adjusted the documentation of cass_cluster_set_load_balance_dc_aware by removing the mention of dc awareness being enabled by default.

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have implemented Rust unit tests for the features/changes introduced.
  • [ ] I have enabled appropriate tests in .github/workflows/build.yml in gtest_filter.
  • [ ] I have enabled appropriate tests in .github/workflows/cassandra.yml in gtest_filter.

scylla-rust-wrapper/src/cluster.rs Outdated Show resolved Hide resolved
* first connected contact point, and no remote hosts are considered in
* query plans. If relying on this mechanism, be sure to use only contact
* points from the local DC.
*
* @deprecated The remote DC settings for DC-aware are not suitable for most
* scenarios that require DC failover. There is also unhandled gap between
* replication factor number of nodes failing and the full cluster failing. Only
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally dislike the machanism of using first contact point DC as local DC (/rack), but I wonder if this won't cause compatibility issues.

We could probably implement it by, in case dc name is empty, following mechanism:

  • After session is connected check the DC of first contact point
  • Then create new LBP with this DC set as local
  • Swap it in session using something like this:
let handle = session.get_default_execution_profile_handle().clone();
let new_profile = handle.pointee_to_builder().load_balancing_policy(...).build();
handle.map_to_another_profile(new_profile);

I'm not sure preserving this mechanism is worth introducing such trickery - probably not.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow! Is it going to be the first time when profile mapping feature is actually useful (and used by anyone!) ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I said, I'm not convinced we should do this - actually, I'm more and more convinced that we shouldn't. So unfortunately it won't :(

@muzarski
Copy link
Collaborator Author

v2: Introduced NodeLocationPreferences enum. Removed DcAwareness and RackAwareness structs.

@muzarski muzarski requested a review from Lorak-mmk October 17, 2024 14:26
scylla-rust-wrapper/src/cluster.rs Outdated Show resolved Hide resolved
include/cassandra.h Show resolved Hide resolved
scylla-rust-wrapper/src/cluster.rs Outdated Show resolved Hide resolved
scylla-rust-wrapper/src/cluster.rs Outdated Show resolved Hide resolved
include/cassandra.h Show resolved Hide resolved
* first connected contact point, and no remote hosts are considered in
* query plans. If relying on this mechanism, be sure to use only contact
* points from the local DC.
*
* @deprecated The remote DC settings for DC-aware are not suitable for most
* scenarios that require DC failover. There is also unhandled gap between
* replication factor number of nodes failing and the full cluster failing. Only
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow! Is it going to be the first time when profile mapping feature is actually useful (and used by anyone!) ?

scylla-rust-wrapper/src/cluster.rs Outdated Show resolved Hide resolved
scylla-rust-wrapper/src/cluster.rs Outdated Show resolved Hide resolved
@muzarski muzarski marked this pull request as draft October 22, 2024 09:39
@muzarski
Copy link
Collaborator Author

v2.1: rebased on master. (now there is only one cassadra.h header.

Still waiting for: #197

@muzarski muzarski marked this pull request as ready for review November 18, 2024 20:41
@muzarski
Copy link
Collaborator Author

muzarski commented Nov 18, 2024

v3: Rebased on master, and addressed all comments. Moving out of draft, since it's ready for review again.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit: " cluster: define and implement cass_cluster_set_load_balance_rack_aware[_n]"

Nit: First line of commit message is too long which causes GH to display it badly. I think you can drop "define and".

After reading the commit message, I wonder - do we handle DC aware like cpp-driver? Meaning that we treat first contact point as local DC. If not, should we even try? Or just remove this from docs?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see you removed it in the third commit. Maybe we should have an issue to discuss this? I'm generally all for improving weird behavior of cpp-driver, even if the change would be slightly breaking. In this case I wonder if it isn't too breaking. Isn't this something that users could depend on?

I guess providing a migration guide could be an option? That would still benefit from an issue so we don't forget about this incompatibility.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened an issue: #204

This is an extension introduced by Scylla's fork of cpp-driver.
See: scylladb/cpp-driver@9691ec0

Note:
I removed this part of the docstring:
```
 * With empty local_rack and local_dc,  default local_dc and local_rack
 * is chosen from the first connected contact point,
 * and no remote hosts are considered in query plans.
 * If relying on this mechanism, be sure to use only contact
 * points from the local rack.
```

There are multiple reasons for this:
- this behaviour does not make much sense, and we should not mimic it IMO
- rust-driver does not behave like this
- this is not even true for cpp-driver

Why it's not true for cpp-driver:
If you carefully study the changes introduced to cpp-driver in the aforementioned
commit, you will notice that it's not possible for the driver to use rack aware
policy with an empty strings. This is because API functions reject
empty string, thus RackAwarePolicy object is never constructed in such case.
```
CassError cass_cluster_set_load_balance_rack_aware_n(CassCluster* cluster, const char* local_dc,
                                                   size_t local_dc_length,
                                                   const char* local_rack,
                                                   size_t local_rack_length) {
  if (local_dc == NULL || local_dc_length == 0 || local_rack == NULL || local_rack_length == 0) {
    return CASS_ERROR_LIB_BAD_PARAMS;
  }
  cluster->config().set_load_balancing_policy(new RackAwarePolicy(
      String(local_dc, local_dc_length), String(local_rack, local_rack_length)));
  return CASS_OK;
}
```

Why is this part of docstring included in cpp-driver then? No idea. Maybe,
because `cass_cluster_set_load_balance_dc_aware` mentions something similar
for empty (non-specified) dc. However, in this case it's true, since dc awareness is enabled
by default in cpp-driver. See the docstring:
```
 * Configures the cluster to use DC-aware load balancing.
 * For each query, all live nodes in a primary 'local' DC are tried first,
 * followed by any node from other DCs.
 *
 * <b>Note:</b> This is the default, and does not need to be called unless
 * switching an existing from another policy or changing settings.
 * Without further configuration, a default local_dc is chosen from the
 * first connected contact point, and no remote hosts are considered in
 * query plans. If relying on this mechanism, be sure to use only contact
 * points from the local DC.
```
This is an extension to the extension.
cpp-driver does not implement it for some reason.
This is not true (and I doubt it will ever be) for cpp-rust-driver.
Added test cases for rack-awareness, and extended dc-awareness
tests by empty and nullptr parameters checks.
@muzarski
Copy link
Collaborator Author

v3.1: fixed commit titles and opened and issue regarding default local DC

@muzarski muzarski requested a review from Lorak-mmk November 19, 2024 16:35
@muzarski muzarski merged commit 74c1cc9 into scylladb:master Nov 20, 2024
11 checks passed
@muzarski muzarski mentioned this pull request Nov 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement RackAwarePolicy loadbalancing policy
3 participants