HBASE-28484: Add ability to replicate to a different tableName #6484
 * Convert sourceToSinkTableOverrides Object to Map.
 */
public static Map<TableName, TableName>
  convert2Map(ReplicationProtos.SourceToSinkTableOverride[] sourceToSinkTableOverrides) {
nit: rename to convertTableOverridesToMap?
if (sourceToSinkTableOverrides == null || sourceToSinkTableOverrides.length == 0) {
  return null;
}
Map<TableName, TableName> sourceToSinkTableOverridesMap = new HashMap<>();
you could pre-size this map at n entries
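To illustrate the pre-sizing suggestion, here is a minimal sketch in plain Java. It is not the PR's code: the class name `PresizedOverrides`, the helper `capacityFor`, and the use of `String` pairs in place of `TableName` and the protobuf type are all assumptions made to keep the example self-contained. It relies only on the documented `HashMap` behavior that a resize happens once the entry count exceeds capacity times the load factor (0.75 by default).

```java
import java.util.HashMap;
import java.util.Map;

class PresizedOverrides {
  /** Initial capacity large enough to hold expectedSize entries at load factor 0.75. */
  static int capacityFor(int expectedSize) {
    return (int) (expectedSize / 0.75f) + 1;
  }

  /**
   * Hypothetical stand-in for the conversion under review.
   * Each element of overrides is a {source, sink} pair.
   */
  static Map<String, String> toMap(String[][] overrides) {
    if (overrides == null || overrides.length == 0) {
      return null; // mirrors the null return in the reviewed snippet
    }
    // Size the map once, up front, so inserting the known number of entries never rehashes.
    Map<String, String> result = new HashMap<>(capacityFor(overrides.length));
    for (String[] pair : overrides) {
      result.put(pair[0], pair[1]);
    }
    return result;
  }
}
```

Passing the entry count directly as the initial capacity would still allow one resize, which is why the sketch divides by the load factor first.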
 * Sets an explicit map of source to sink namespaces that should be replicated to the given peer.
 * If the map is empty for a namespace, the source namespace is used for the given peer. Use
 * {@link #setSourceToSinkTableOverrides} to override the namespace overrides set in this method
 * for a given table.
 * @param sourceToSinkNamespaceOverrides A map from a source namespace to sink namespace. By
 *                                       default, edits will be replicated to the same namespace
 *                                       as the source namespace. A null or empty collection can
 *                                       be passed to indicate there are no overrides.
What happens if the map of table overrides only covers a subset of the peer's tables? I wonder whether this method should throw in that case so that we don't need to explicitly support it
If the overrides cover only a subset of the peer's tables, we assume that the sink tableName is the same as the source tableName for every table that is not in the map.
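The fallback described in this reply can be sketched as a single lookup. This is an illustration only, not the PR's implementation: `SinkTableResolver` and `resolveSinkTable` are hypothetical names, and table names are plain strings here rather than `TableName` objects.

```java
import java.util.Map;

class SinkTableResolver {
  /**
   * Returns the sink table for sourceTable: the override when one is configured,
   * otherwise the source name itself. A null map is treated as "no overrides".
   */
  static String resolveSinkTable(String sourceTable, Map<String, String> overrides) {
    if (overrides == null) {
      return sourceTable;
    }
    return overrides.getOrDefault(sourceTable, sourceTable);
  }
}
```

With this shape, a partial override map needs no special handling, which is the alternative to throwing that the review question raises.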
ReplicationPeerConfigBuilder
  setSourceToSinkTableOverrides(Map<TableName, TableName> sourceToSinkTableOverrides);
Can these keys and values be TableNames instead? That would allow you to make fewer assumptions about the namespace, and simplify down to only one map of overrides
My original motivation for having namespace-scoped and tableName-scoped overrides was twofold:
1. Every imaginable naming override can be categorized into one of these two buckets. Say, for example, we want to support column family name translations in the future. We'd need to provide a family → sinkFamily map for the given tableName, because multiple tableNames could have the same column family name. On the other hand, say we want to add a prefix to all tables in the namespace N. We'd need to add a tableNamePrefix field to a given NamespaceOverride object.
2. This framework follows the existing architecture for replication. Currently, in the ReplicationPeerConfig, clients can opt namespaces or (tableName, family) pairs into or out of replication. Thus, it makes sense to allow users to opt namespaces or tableNames into naming overrides for replication.
This is copied from the design doc, which I did not link before you reviewed the PR (apologies for that).
If the two override scopes do not make sense, I am happy to converge on tableName-scoped overrides alone, since every namespace-scoped override can be encoded as a set of tableName overrides.
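As a rough sketch of that encoding, the following expands namespace overrides into per-table overrides. Everything here is hypothetical (`NamespaceOverrideExpander`, `expand`, string table names in `namespace:qualifier` form), and it assumes the set of source tables is known when the map is built; tables created afterwards would need the map to be regenerated.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NamespaceOverrideExpander {
  /**
   * Builds table-level overrides from namespace-level ones.
   * tables: source table names written as "namespace:qualifier".
   * nsOverrides: source namespace -> sink namespace.
   */
  static Map<String, String> expand(List<String> tables, Map<String, String> nsOverrides) {
    Map<String, String> result = new HashMap<>();
    for (String table : tables) {
      int sep = table.indexOf(':');
      if (sep < 0) {
        continue; // names without an explicit namespace are not handled in this sketch
      }
      String sinkNamespace = nsOverrides.get(table.substring(0, sep));
      if (sinkNamespace != null) {
        // Keep the qualifier, swap only the namespace part.
        result.put(table, sinkNamespace + table.substring(sep));
      }
    }
    return result;
  }
}
```

Tables whose namespace has no override are simply left out of the result, so they keep their source name on the sink.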
Is there any overlap between this work and #5819?
Both PRs let users replicate data from tableNameA to tableNameB, but this one is a bit more extensible for future naming translations (like adding a prefix to all tableNames in namespace N). I spoke to @hgromer and we decided that it made sense to put up a new PR for this Jira.
cb2a0fe to 1cbd15a
1cbd15a to fd4d358
72ed428 to f366b1d
💔 -1 overall
This message was automatically generated.
🎊 +1 overall
This message was automatically generated.
I messed up my local and remote branches, and this PR is now waiting indefinitely for a commit to be processed. I opened a new branch to work around this issue and continue development.
Design document
Jira
Currently, replication can only occur if the source and sink clusters both house tables with the same (tableName, family) pairs. This requirement exists so that the sink cluster knows where to persist the data it receives from the source cluster. In this PR, we loosen the naming constraint and give clients more configuration power over the name of their sink namespaces and tableNames.
cc: @rmdmattingly @hgromer @krconv @ndimiduk @bbeaudreault