HBASE-28484: Add ability to replicate to a different tableName #6484

Closed
wants to merge 1 commit into from

@eab148 eab148 commented Nov 20, 2024

Design document

Jira

Currently, replication can only occur if the source and sink clusters both house tables with the same (tableName, family) pairs. This requirement exists so that the sink cluster knows where to persist the data it receives from the source cluster. In this PR, we loosen the naming constraint and give clients more control over the names of their sink namespaces and tableNames.

cc: @rmdmattingly @hgromer @krconv @ndimiduk @bbeaudreault

* Convert sourceToSinkTableOverrides Object to Map.
*/
public static Map<TableName, TableName>
convert2Map(ReplicationProtos.SourceToSinkTableOverride[] sourceToSinkTableOverrides) {

nit: rename to convertTableOverridesToMap?

if (sourceToSinkTableOverrides == null || sourceToSinkTableOverrides.length == 0) {
return null;
}
Map<TableName, TableName> sourceToSinkTableOverridesMap = new HashMap<>();

you could pre-size this map at n entries
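
A minimal sketch of the pre-sizing suggestion, assuming plain strings stand in for HBase's TableName and a hypothetical TableOverride class stands in for the protobuf SourceToSinkTableOverride message. The HashMap constructor takes an initial capacity rather than an expected entry count, so the sketch divides by the default 0.75 load factor to avoid a rehash while filling the map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the protobuf override message; not the real HBase type.
final class TableOverride {
  final String source;
  final String sink;

  TableOverride(String source, String sink) {
    this.source = source;
    this.sink = sink;
  }
}

final class TableOverrideConverter {
  private TableOverrideConverter() {
  }

  // Mirrors the null-or-empty check from the diff, then builds a map sized for n entries.
  static Map<String, String> convertTableOverridesToMap(TableOverride[] overrides) {
    if (overrides == null || overrides.length == 0) {
      return null;
    }
    // HashMap takes a capacity, not an entry count: divide by the default load
    // factor (0.75) so inserting overrides.length entries never triggers a rehash.
    int capacity = (int) (overrides.length / 0.75f) + 1;
    Map<String, String> result = new HashMap<>(capacity);
    for (TableOverride override : overrides) {
      result.put(override.source, override.sink);
    }
    return result;
  }
}
```

On Java 19 and later, HashMap.newHashMap(n) performs the same capacity calculation; the CI reports in this thread show Java 17, so the sketch computes it by hand.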

Comment on lines 120 to 127
* Sets an explicit map of source to sink namespaces that should be replicated to the given peer.
* If the map is empty for a namespace, the source namespace is used for the given peer. Use
* {@link #setSourceToSinkTableOverrides} to override the namespace overrides set in this method
* for a given table.
* @param sourceToSinkNamespaceOverrides A map from a source namespace to sink namespace. By
* default, edits will be replicated to the same namespace
* as the source namespace. A null or empty collection can
* be passed to indicate there are no overrides.

What happens if the map of table overrides only covers a subset of the peer's tables? I wonder whether this method should throw in that case so that we don't need to explicitly support it

@eab148 eab148 Nov 25, 2024

If the overrides only cover a subset of the peer's tables, then we assume that the sink tableName is the same as the source tableName for every table that does not appear in the map.
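
The fallback described here can be sketched as a lookup with a default. This is an illustration only: SinkTableResolver is a hypothetical name, and strings stand in for HBase's TableName so the example compiles without HBase on the classpath.

```java
import java.util.Map;

// Hypothetical helper, not part of the PR: resolves the sink table for a source table.
final class SinkTableResolver {
  private final Map<String, String> sourceToSinkTableOverrides;

  SinkTableResolver(Map<String, String> overrides) {
    // A null map is treated as "no overrides", matching the Javadoc in the diff.
    this.sourceToSinkTableOverrides = overrides == null ? Map.of() : Map.copyOf(overrides);
  }

  // Tables missing from the map replicate to a sink table with the same name.
  String resolveSinkTable(String sourceTable) {
    return sourceToSinkTableOverrides.getOrDefault(sourceTable, sourceTable);
  }
}
```

With a single override from ns:a to ns:b, resolving ns:a yields ns:b, while any other table resolves to itself.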

Comment on lines 109 to 110
ReplicationPeerConfigBuilder
setSourceToSinkTableOverrides(Map<TableName, TableName> sourceToSinkTableOverrides);

Can these keys and values be TableNames instead? That would allow you to make fewer assumptions about the namespace, and simplify down to only one map of overrides

@eab148 eab148 Nov 25, 2024

My original motivation for having namespace- and tableName-scoped overrides was twofold.

  1. Every imaginable naming override can be categorized into one of these two buckets. Say, for example, we want to support column family name translations in the future. We’d need to provide a family → sinkFamily map for the given tableName because multiple tableNames could have the same column family name. On the other hand, say we want to add a prefix to all tables in the namespace N. We’d need to add a tableNamePrefix field to a given NamespaceOverride object.

  2. This framework follows the existing architecture for replication. Currently, in the ReplicationPeerConfig, clients can opt namespaces or (tableName, family) pairs into or out of replication. Thus, it makes sense to allow users to opt namespaces or tableNames into naming overrides for replication.

This is copied from the design doc, which I did not link before you reviewed the PR (apologies for that)
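
A rough sketch of how the two scopes could be combined, assuming (per the Javadoc in the diff) that a tableName override takes precedence over a namespace override. ScopedOverrideResolver is a hypothetical name, and strings in the form namespace:qualifier stand in for TableName; the PR's actual resolution logic is not shown in this thread.

```java
import java.util.Map;

// Hypothetical illustration of two override scopes; not the PR's implementation.
final class ScopedOverrideResolver {
  private final Map<String, String> namespaceOverrides;
  private final Map<String, String> tableOverrides;

  ScopedOverrideResolver(Map<String, String> namespaceOverrides,
    Map<String, String> tableOverrides) {
    this.namespaceOverrides = Map.copyOf(namespaceOverrides);
    this.tableOverrides = Map.copyOf(tableOverrides);
  }

  String resolveSinkTable(String sourceTable) {
    // 1. A table-scoped override wins outright.
    String tableOverride = tableOverrides.get(sourceTable);
    if (tableOverride != null) {
      return tableOverride;
    }
    // 2. Otherwise apply a namespace-scoped override, keeping the qualifier.
    int sep = sourceTable.indexOf(':');
    String namespace = sep < 0 ? "default" : sourceTable.substring(0, sep);
    String qualifier = sep < 0 ? sourceTable : sourceTable.substring(sep + 1);
    String sinkNamespace = namespaceOverrides.get(namespace);
    if (sinkNamespace != null) {
      return sinkNamespace + ":" + qualifier;
    }
    // 3. No override applies: the sink name equals the source name.
    return sourceTable;
  }
}
```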

If the two override scopes do not make sense, I am happy to converge on tableName-scoped overrides alone, since every namespace-scoped override can be encoded as a set of tableName overrides.
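
To illustrate the encoding mentioned here, the following sketch expands one namespace override into per-table overrides. NamespaceOverrideExpander and its parameters are hypothetical, and strings in the form namespace:qualifier stand in for TableName. One consequence of this encoding is that the caller has to supply the list of tables, so a table created later in the namespace would need its own entry.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper showing how a namespace override can be flattened into table overrides.
final class NamespaceOverrideExpander {
  private NamespaceOverrideExpander() {
  }

  static Map<String, String> expand(String sourceNamespace, String sinkNamespace,
    List<String> sourceTables) {
    Map<String, String> tableOverrides = new LinkedHashMap<>();
    String prefix = sourceNamespace + ":";
    for (String table : sourceTables) {
      // Only tables in the source namespace get an entry; others are left untouched.
      if (table.startsWith(prefix)) {
        tableOverrides.put(table, sinkNamespace + ":" + table.substring(prefix.length()));
      }
    }
    return tableOverrides;
  }
}
```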

@rmdmattingly rmdmattingly left a comment

Is there any overlap between this work and #5819?

eab148 commented Nov 25, 2024

Is there any overlap between this work and #5819?

Both PRs empower users to replicate data from tableNameA to tableNameB, but this one is a bit more extensible for future naming translation additions (like adding a prefix to all tableNames in namespace N).

I spoke to @hgromer and we decided that it made sense to put up a new PR for this Jira.

@eab148 eab148 marked this pull request as ready for review November 25, 2024 15:19
@eab148 eab148 force-pushed the HBASE-28484-eboland branch from cb2a0fe to 1cbd15a Compare November 26, 2024 16:13
@eab148 eab148 force-pushed the HBASE-28484-eboland branch from 1cbd15a to fd4d358 Compare December 17, 2024 14:59
@eab148 eab148 changed the title from "HBase-28484: Add ability to replicate to a different tableName" to "HBASE-28484: Add ability to replicate to a different tableName" Dec 26, 2024
@eab148 eab148 force-pushed the HBASE-28484-eboland branch from 72ed428 to f366b1d Compare December 31, 2024 00:45
@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 41s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ master Compile Tests _
+0 🆗 mvndep 0m 17s Maven dependency ordering for branch
+1 💚 mvninstall 2m 47s master passed
+1 💚 compile 3m 31s master passed
+1 💚 checkstyle 0m 47s master passed
+1 💚 spotbugs 2m 1s master passed
+1 💚 spotless 0m 42s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 10s Maven dependency ordering for patch
+1 💚 mvninstall 2m 51s the patch passed
+1 💚 compile 3m 17s the patch passed
+1 💚 javac 3m 17s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 49s the patch passed
-1 ❌ spotbugs 1m 43s /new-spotbugs-hbase-server.html hbase-server generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
+1 💚 hadoopcheck 9m 31s Patch does not cause any errors with Hadoop 3.3.6 3.4.0.
+1 💚 spotless 0m 38s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 0m 20s The patch does not generate ASF License warnings.
37m 6s
Reason Tests
SpotBugs module:hbase-server
Exception is caught when Exception is not thrown in org.apache.hadoop.hbase.replication.regionserver.ReplicationSink.getReplicationSinkTranslator() At ReplicationSink.java:[line 192]
Subsystem Report/Notes
Docker ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6484/14/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #6484
JIRA Issue HBASE-28484
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname Linux 8dcdf392e763 5.4.0-200-generic #220-Ubuntu SMP Fri Sep 27 13:19:16 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / f366b1d
Default Java Eclipse Adoptium-17.0.11+9
Max. process+thread count 85 (vs. ulimit of 30000)
modules C: hbase-common hbase-server U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6484/14/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.
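
The SpotBugs message in the report above corresponds to the REC_CATCH_EXCEPTION pattern: a catch (Exception e) around code that only declares narrower checked exceptions. The body of getReplicationSinkTranslator() is not shown in this thread, so the following is only a hypothetical sketch of how such a warning is commonly resolved when a class is instantiated reflectively. Translator, IdentityTranslator, and TranslatorLoader are illustrative names, not HBase classes.

```java
// Hypothetical types for illustration; they do not exist in HBase.
interface Translator {
  String translate(String tableName);
}

final class IdentityTranslator implements Translator {
  @Override
  public String translate(String tableName) {
    return tableName;
  }
}

final class TranslatorLoader {
  private TranslatorLoader() {
  }

  static Translator load(String className) {
    try {
      Class<?> clazz = Class.forName(className);
      return clazz.asSubclass(Translator.class).getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      // Catch only the checked exceptions the reflective calls can throw,
      // instead of a blanket catch (Exception e) that also swallows RuntimeExceptions.
      throw new IllegalArgumentException("Cannot instantiate translator " + className, e);
    }
  }
}
```

ReflectiveOperationException is the common superclass of ClassNotFoundException, NoSuchMethodException, InstantiationException, IllegalAccessException, and InvocationTargetException, so one narrow catch covers every checked exception in the try block.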

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 39s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 17s Maven dependency ordering for branch
+1 💚 mvninstall 2m 42s master passed
+1 💚 compile 1m 17s master passed
+1 💚 javadoc 0m 44s master passed
+1 💚 shadedjars 4m 51s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 13s Maven dependency ordering for patch
+1 💚 mvninstall 2m 40s the patch passed
+1 💚 compile 1m 25s the patch passed
+1 💚 javac 1m 25s the patch passed
+1 💚 javadoc 1m 12s the patch passed
+1 💚 shadedjars 5m 15s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
+1 💚 unit 2m 21s hbase-common in the patch passed.
+1 💚 unit 189m 23s hbase-server in the patch passed.
217m 45s
Subsystem Report/Notes
Docker ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6484/14/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #6484
JIRA Issue HBASE-28484
Optional Tests javac javadoc unit compile shadedjars
uname Linux d60469152d07 5.4.0-200-generic #220-Ubuntu SMP Fri Sep 27 13:19:16 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / f366b1d
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6484/14/testReport/
Max. process+thread count 4651 (vs. ulimit of 30000)
modules C: hbase-common hbase-server U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6484/14/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

eab148 commented Jan 6, 2025

I messed up my local and remote branches, and this PR is now waiting indefinitely for a commit to be processed. I opened a new branch to work around this issue and continue development.

@eab148 eab148 closed this Jan 6, 2025