Create a v2 snapshot when running etcdutl migrate command #19168
base: main
Conversation
Force-pushed 250f708 to d8f3b56
Codecov Report
Attention: Patch coverage is
... and 26 files with indirect coverage changes

@@ Coverage Diff @@
##             main   #19168      +/-   ##
==========================================
- Coverage   68.83%   68.81%   -0.03%
==========================================
  Files         420      420
  Lines       35678    35737      +59
==========================================
+ Hits        24560    24592      +32
- Misses       9688     9705      +17
- Partials     1430     1440      +10
Force-pushed d8f3b56 to 033f4cf
Force-pushed d71fc9f to f59e8b8
This is a huge PR; let me break it down into smaller PRs to make the review easier.
Force-pushed 43a213e to acffdb5
Force-pushed acffdb5 to 174c8b4
So many Codecov warnings; is there any way to hide them?
Force-pushed df49a51 to 099e356
Force-pushed 099e356 to 1187204
Force-pushed 1187204 to 1180f3a
Force-pushed 5695acd to e3fb899
/retest
/test pull-etcd-robustness-arm64
Note that previously I was thinking that creating a v2 snapshot file is just an optional step when executing the command. But it would definitely be better to make it a required step; otherwise we will always see the "cannot downgrade storage, WAL contains newer entries" error message when executing it.

See also #13405 (comment)

cc @fuweid @siyuanfoundation @serathius PTAL
What about the invariant in etcd/server/storage/backend.go (lines 98 to 101 in 0dcd015)?
We cannot just snapshot without updating the db. That's not an easy thing to change. For
Not sure I got your point. The
I thought about it before, but the answer is no. Reasons:
e3fb899
to
7589807
Compare
etcdutl/etcdutl/common.go (Outdated)
if err := w.SaveSnapshot(walpb.Snapshot{Index: ci, Term: term, ConfState: &confState}); err != nil {
	return err
}
if err := w.Save(raftpb.HardState{Term: term, Commit: ci, Vote: st.Vote}, nil); err != nil {
Not sure about adding a HardState here, what's the purpose? We use the CI read from the db file, which means the index here is for sure committed; the commit index might be further along than the db, which is only flushed once every 5 seconds. I'm worried that it might cause problems by breaking the monotonicity of the commit index.
Not sure about adding a HardState here, what's the purpose?

To ensure the snapshot index is committed; otherwise the snapshot may be filtered out by etcdserver on bootstrap. Of course, I agree that we need to ensure the commit index never decreases. Will update later.

We use the CI read from the db file, which means the index here is for sure committed

It is NOT guaranteed, because etcd applies the entries and syncs the WAL HardState asynchronously. It's easy to verify.
First, make a change like the one below:
$ git diff
diff --git a/server/storage/wal/wal.go b/server/storage/wal/wal.go
index f3d7bc5f4..9597a0ccd 100644
--- a/server/storage/wal/wal.go
+++ b/server/storage/wal/wal.go
@@ -961,6 +961,13 @@ func (w *WAL) Save(st raftpb.HardState, ents []raftpb.Entry) error {
 		return nil
 	}
 
+	if st.Commit > 5 {
+		w.lg.Info("########### Sleeping 10 seconds", zap.Uint64("Index", st.Commit))
+		time.Sleep(10 * time.Second)
+		w.lg.Info("########### Panicking after 10 seconds")
+		panic("non empty hard state")
+	}
+
 	mustSync := raft.MustSync(st, w.state, len(ents))
 	// TODO(xiangli): no more reference operator
Execute a couple of etcdctl put commands after starting etcd; then etcdserver will panic. You will find that the consistent_index is greater than the commitIndex.
$ ./etcd-dump-db iterate-bucket ../../default.etcd/member/snap/db meta --decode
key="term", value=2
key="storageVersion", value="3.6.0"
key="consistent_index", value=6
key="confState", value="{\"voters\":[10276657743932975437],\"auto_leave\":false}"
$ ./etcd-dump-logs ../../default.etcd/
Snapshot:
empty
Start dumping log entries from snapshot.
WAL metadata:
nodeID=8e9e05c52164694d clusterID=cdf818194e3a8c32 term=2 commitIndex=5 vote=8e9e05c52164694d
WAL entries: 6
lastIndex=6
term index type data
1 1 conf method=ConfChangeAddNode id=8e9e05c52164694d
2 2 norm
2 3 norm header:<ID:7587884288152690690 > cluster_member_attr_set:<member_ID:10276657743932975437 member_attributes:<name:"default" client_urls:"http://localhost:2379" > >
2 4 norm header:<ID:7587884288152690692 > cluster_version_set:<ver:"3.6.0" >
2 5 norm header:<ID:7587884288152690694 > put:<key:"k1" value:"v1" >
2 6 norm header:<ID:7587884288152690695 > put:<key:"k2" value:"v2" >
Entry types (Normal,ConfigChange) count is : 6
Updated.
Good observation, you are right: there is nothing beneficial about fsyncing the HardState to the WAL. Still, I don't think there is anything beneficial about committing the Snapshot either.
Probably I did not explain it clearly.
Firstly, the consistent_index may be greater than the commit index (already confirmed in my comment above).
Secondly, when etcdserver bootstraps, it only loads v2 snapshots with Index <= commit index (see below):
etcd/server/storage/wal/wal.go
Lines 645 to 652 in 363b166
// filter out any snaps that are newer than the committed hardstate
n := 0
for _, s := range snaps {
	if s.Index <= state.Commit {
		snaps[n] = s
		n++
	}
}
So we need to commit the snapshot index, but of course only when it's greater than the existing commitIndex.
Ok, I think I now understand the motivation for writing the HardState; however, I see another problem. If the reason to add generation of a snapshot is to prevent etcd v3.5 from going back in the WAL and reading SetClusterVersion("3.6"), then what about situations where SetClusterVersion has a raft index after the CI?

PS: Please don't resolve conversations on a discussion that didn't get an answer accepted by both sides. It slows down review.
then what about situations where SetClusterVersion has a raft index after the CI?

Theoretically it's possible, but in practice it's unlikely. Creating a v2 snapshot is just a best-effort action.

PS: Please don't resolve conversations on a discussion that didn't get an answer accepted by both sides. It slows down review.

The motivation is that resolving the conversation gives reviewers the impression that it's ready for a further round of review, instead of still being stuck on the existing comment. Also, based on previous experience, usually there is no response to a comment for a long time, and sometimes no followup comments at all (this isn't aimed at you, just a generic comment on the review process). So I tend to resolve a conversation if I think everything has been resolved.
Improvements:
- I will try to keep a conversation open a little longer next time, even if I think everything is resolved.
- Please also feel free to unresolve a conversation if you or anyone else still has a comment.
Force-pushed 7589807 to ae5d208
cc @serathius
Also added test to cover the etcdutl migrate command Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>
Force-pushed ae5d208 to 732e7ef
Thanks for implementing this, it gave me a good insight.
ClusterVersionSet was introduced in v3.5 and is annotated as such (etcd/api/membershippb/membership.proto, lines 40 to 44 in 32cfd45). The reason why v3.5 etcd still uses the old
cc @siyuanfoundation can you also take a look, as you worked on downgrades?
I think if we change how … This reminds me that, because of the WAL compatibility of v3.5 and v3.6, I explicitly wanted to add a fake new entry to the WAL that could be used to test downgrades and these compatibility checks. I think I just ended up using
It means that we need to partially roll back #13405, which is what I had been trying to avoid doing at the last minute before we release 3.6.0. Anyway, let me see what the effort and the impact would be.
Just raised #19263, PTAL
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: ahrtr, fuweid, siyuanfoundation. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Refer to #17911 (comment)

This PR will make the etcdutl migrate command fully functional. It creates a v2 snapshot from the v3 store, so you will never see the error below anymore when executing the etcdutl migrate command.

After executing the migrate command for all members, you just need to directly replace the binary of each member, and the offline downgrade is done. Of course, it's still recommended to follow the online downgrade process, as it doesn't break the workload. cc @ivanvc @jmhbnz
It also adds a separate etcdutl v2snapshot create command. It's just a manual last-resort solution for any potential issue; usually we don't need it.

I need to add an e2e test. I may also break this down into smaller PRs.