Create a v2 snapshot when running etcdutl migrate command #19168

Open
ahrtr wants to merge 1 commit into main from etcdutl_snapshot_20250110
Conversation

@ahrtr (Member) commented Jan 10, 2025

Refer to #17911 (comment)

This PR will make the etcdutl migrate command fully functional.

  • It creates a v2 snapshot from the v3 store (see the sketch after this list).
    You will never see the error below anymore when executing the etcdutl migrate command:

    Error: cannot downgrade storage, WAL contains newer entries, as the target version (3.5.0) is lower than the version (3.6.0) detected from WAL logs

    After executing the migrate command for all members, you just need to replace the binary of each member directly, and the offline downgrade is done. Of course, it's still recommended to follow the online downgrade process, as it doesn't break the workload. cc @ivanvc @jmhbnz

  • It also adds a separate etcdutl v2snapshot create command.
    It's just a manual, last-resort solution for any potential issue. Usually we don't need it.
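
A rough Go sketch of what creating a v2 snapshot from the v3 store involves (an illustration only, not the code in this PR; readMetaFromBackend is a hypothetical helper, and the import paths follow the 3.6 layout and may differ):

package sketch

import (
	"errors"
	"path/filepath"

	"go.etcd.io/etcd/server/v3/etcdserver/api/snap"
	"go.etcd.io/etcd/server/v3/storage/wal"
	"go.etcd.io/etcd/server/v3/storage/wal/walpb"
	"go.etcd.io/raft/v3/raftpb"
	"go.uber.org/zap"
)

// createV2Snapshot sketches the flow: read the consistent index, term and
// ConfState from the bbolt db, write a *.snap file, and record the snapshot
// in the WAL so etcdserver picks it up on bootstrap.
func createV2Snapshot(lg *zap.Logger, dataDir string) error {
	ci, term, confState, err := readMetaFromBackend(filepath.Join(dataDir, "member", "snap", "db"))
	if err != nil {
		return err
	}

	// Persist the snapshot file (member/snap/*.snap). The real implementation
	// also fills raftSnap.Data with the serialized (v2) membership store,
	// which is omitted here.
	ss := snap.New(lg, filepath.Join(dataDir, "member", "snap"))
	raftSnap := raftpb.Snapshot{
		Metadata: raftpb.SnapshotMetadata{Index: ci, Term: term, ConfState: confState},
	}
	if err := ss.SaveSnap(raftSnap); err != nil {
		return err
	}

	// Record the snapshot marker in the WAL. The WAL must be read through
	// before new records can be appended.
	w, err := wal.Open(lg, filepath.Join(dataDir, "member", "wal"), walpb.Snapshot{})
	if err != nil {
		return err
	}
	defer w.Close()
	if _, _, _, err := w.ReadAll(); err != nil {
		return err
	}
	// (The PR also saves a HardState when needed; omitted here.)
	return w.SaveSnapshot(walpb.Snapshot{Index: ci, Term: term, ConfState: &confState})
}

// readMetaFromBackend is a stand-in for reading the "meta" bucket of the
// bbolt db (consistent_index, term, confState); it is not implemented in
// this sketch.
func readMetaFromBackend(dbPath string) (uint64, uint64, raftpb.ConfState, error) {
	return 0, 0, raftpb.ConfState{}, errors.New("not implemented in this sketch")
}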

I need to add e2e tests. I may also break it down into smaller PRs.

@ahrtr ahrtr force-pushed the etcdutl_snapshot_20250110 branch 2 times, most recently from 250f708 to d8f3b56 on January 10, 2025 18:21

codecov bot commented Jan 10, 2025

Codecov Report

Attention: Patch coverage is 68.33333% with 19 lines in your changes missing coverage. Please review.

Project coverage is 68.81%. Comparing base (32cfd45) to head (732e7ef).

Files with missing lines             Patch %   Lines
etcdutl/etcdutl/common.go            77.35%    6 Missing and 6 partials ⚠️
etcdutl/etcdutl/migrate_command.go   0.00%     7 Missing ⚠️

Additional details and impacted files

Files with missing lines             Coverage Δ
etcdutl/etcdutl/migrate_command.go   0.00% <0.00%> (ø)
etcdutl/etcdutl/common.go            74.68% <77.35%> (+28.52%) ⬆️

... and 26 files with indirect coverage changes

@@            Coverage Diff             @@
##             main   #19168      +/-   ##
==========================================
- Coverage   68.83%   68.81%   -0.03%     
==========================================
  Files         420      420              
  Lines       35678    35737      +59     
==========================================
+ Hits        24560    24592      +32     
- Misses       9688     9705      +17     
- Partials     1430     1440      +10     

Continue to review full report in Codecov by Sentry.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 32cfd45...732e7ef. Read the comment docs.

@ahrtr ahrtr force-pushed the etcdutl_snapshot_20250110 branch from d8f3b56 to 033f4cf on January 10, 2025 19:36
@ahrtr ahrtr force-pushed the etcdutl_snapshot_20250110 branch 2 times, most recently from d71fc9f to f59e8b8 on January 11, 2025 16:15
@ahrtr (Member, Author) commented Jan 11, 2025

This is a huge PR; let me break it down into smaller PRs to make the review easier.

@ahrtr ahrtr force-pushed the etcdutl_snapshot_20250110 branch 4 times, most recently from 43a213e to acffdb5 on January 13, 2025 10:13
@etcd-io etcd-io deleted a comment from k8s-ci-robot Jan 13, 2025
@ahrtr ahrtr force-pushed the etcdutl_snapshot_20250110 branch from acffdb5 to 174c8b4 on January 13, 2025 12:25
@serathius (Member) commented:

So many codecov warnings, is there any way to hide them?

@ahrtr ahrtr force-pushed the etcdutl_snapshot_20250110 branch 2 times, most recently from df49a51 to 099e356 on January 15, 2025 16:15
@ahrtr ahrtr force-pushed the etcdutl_snapshot_20250110 branch from 099e356 to 1187204 on January 15, 2025 19:06
@ahrtr ahrtr force-pushed the etcdutl_snapshot_20250110 branch from 1187204 to 1180f3a on January 15, 2025 19:33
@ahrtr ahrtr force-pushed the etcdutl_snapshot_20250110 branch from 5695acd to e3fb899 on January 16, 2025 15:33
@ahrtr (Member, Author) commented Jan 16, 2025

/retest

@ahrtr (Member, Author) commented Jan 16, 2025

/test pull-etcd-robustness-arm64

@ahrtr (Member, Author) commented Jan 20, 2025

Note that previously I was thinking that creating a v2 snapshot file would be just an optional step when executing etcdutl migrate.

But it would definitely be better to make it a required step; otherwise we will always see the cannot downgrade storage, WAL contains newer entries error message when executing etcdutl migrate on a newly created cluster, due to the cluster version being set to the new version (e.g. 3.6, when migrating from 3.6 to 3.5).

See also #13405 (comment)

cc @fuweid @siyuanfoundation @serathius PTAL

@serathius (Member) commented:

What about the invariant snapshot.Metadata.Index < db.consistentIndex?

// RecoverSnapshotBackend recovers the DB from a snapshot in case etcd crashes
// before updating the backend db after persisting raft snapshot to disk,
// violating the invariant snapshot.Metadata.Index < db.consistentIndex. In this
// case, replace the db with the snapshot db sent by the leader.

We cannot just snapshot without updating the db. That's not an easy thing to change.

For the cannot downgrade storage, WAL contains newer entries, as the target version (3.5.0) is lower than the version (3.6.0) detected from WAL logs error, we might consider not treating SetClusterVersion("3.6") as an entry that makes the WAL version equal to 3.6.

@ahrtr (Member, Author) commented Jan 21, 2025

What about the invariant snapshot.Metadata.Index < db.consistentIndex?

// RecoverSnapshotBackend recovers the DB from a snapshot in case etcd crashes
// before updating the backend db after persisting raft snapshot to disk,
// violating the invariant snapshot.Metadata.Index < db.consistentIndex. In this
// case, replace the db with the snapshot db sent by the leader.

We cannot just snapshot without updating the db. That's not an easy thing to change.

Not sure I got your point. etcdutl migrate is just an offline tool, and we are going to create a v2 snapshot file based on the v3 store (bbolt db). Recovering the DB from a snapshot in case etcd crashes isn't etcdutl's responsibility.

For the cannot downgrade storage, WAL contains newer entries, as the target version (3.5.0) is lower than the version (3.6.0) detected from WAL logs error, we might consider not treating SetClusterVersion("3.6") as an entry that makes the WAL version equal to 3.6.

I thought about it before, but the answer is NO. Reasons:

  • 3.5 doesn't write SetClusterVersion entries to the WAL file. If we see such entries, they were definitely generated by 3.6 or a later version.
    Example for 3.6:

    $ ./etcd-dump-logs ../../default.etcd/
    Snapshot:
    empty
    Start dumping log entries from snapshot.
    WAL metadata:
    nodeID=8e9e05c52164694d clusterID=cdf818194e3a8c32 term=2 commitIndex=5 vote=8e9e05c52164694d
    WAL entries: 5
    lastIndex=5
    term	     index	type	data
       1	         1	conf	method=ConfChangeAddNode id=8e9e05c52164694d
       2	         2	norm	
       2	         3	norm	header:<ID:7587884260861681666 > cluster_member_attr_set:<member_ID:10276657743932975437 member_attributes:<name:"default" client_urls:"http://localhost:2379" > > 
       2	         4	norm	header:<ID:7587884260861681668 > cluster_version_set:<ver:"3.6.0" > 
       2	         5	norm	header:<ID:7587884260861681669 > put:<key:"k1" value:"v1" > 
    
    Entry types (Normal,ConfigChange) count is : 5
    

    Example for 3.5:

    $ ./etcd-dump-logs ../../default.etcd/
    Snapshot:
    empty
    Start dumping log entries from snapshot.
    WAL metadata:
    nodeID=8e9e05c52164694d clusterID=cdf818194e3a8c32 term=2 commitIndex=5 vote=8e9e05c52164694d
    WAL entries: 5
    lastIndex=5
    term	     index	type	data
       1	         1	conf	method=ConfChangeAddNode id=8e9e05c52164694d
       2	         2	norm	
       2	         3	norm	method=PUT path="/0/members/8e9e05c52164694d/attributes" val="{\"name\":\"default\",\"clientURLs\":[\"http://localhost:2379\"]}"
       2	         4	norm	method=PUT path="/0/version" val="3.5.0"
       2	         5	norm	header:<ID:7587884260899009541 > put:<key:"k1" value:"v1" > 
    
    Entry types (Normal,ConfigChange) count is : 5
    
  • We can revisit this in 3.7, as it won't be needed anymore.

Resolved review threads: etcdutl/etcdutl/common.go, etcdutl/etcdutl/common_test.go
@ahrtr ahrtr force-pushed the etcdutl_snapshot_20250110 branch from e3fb899 to 7589807 on January 22, 2025 09:49
if err := w.SaveSnapshot(walpb.Snapshot{Index: ci, Term: term, ConfState: &confState}); err != nil {
	return err
}
if err := w.Save(raftpb.HardState{Term: term, Commit: ci, Vote: st.Vote}, nil); err != nil {
@serathius (Member) commented:

Not sure about adding a HardState here; what's the purpose? We use the CI read from the db file, which means the index here is for sure committed; the commit index might be further ahead than the db, which is only flushed once every 5 seconds. I'm worried that it might cause problems by breaking the monotonicity of the commit index.

@ahrtr (Member, Author) commented:

Not sure about adding a HardState here, what's the purpose?

To ensure the snapshot index is committed; otherwise the snapshot may be filtered out by etcdserver on bootstrap. Of course, I agree that we need to ensure the commitIndex never decreases. Will update later.

We use the CI read from the db file, which means the index here is for sure committed

It is NOT guaranteed, because etcd applies entries and syncs the HardState to the WAL asynchronously. It's easy to verify.

First, make a change like the one below:

$ git diff
diff --git a/server/storage/wal/wal.go b/server/storage/wal/wal.go
index f3d7bc5f4..9597a0ccd 100644
--- a/server/storage/wal/wal.go
+++ b/server/storage/wal/wal.go
@@ -961,6 +961,13 @@ func (w *WAL) Save(st raftpb.HardState, ents []raftpb.Entry) error {
                return nil
        }
 
+       if st.Commit > 5 {
+               w.lg.Info("########### Sleeping 10 seconds", zap.Uint64("Index", st.Commit))
+               time.Sleep(10 * time.Second)
+               w.lg.Info("########### Panicking after 10 seconds")
+               panic("non empty hard state")
+       }
+
        mustSync := raft.MustSync(st, w.state, len(ents))
 
        // TODO(xiangli): no more reference operator

Execute a couple of etcdctl put commands after starting etcd; etcdserver will then panic, and you will find that the consistentIndex is greater than the commitIndex.

$ ./etcd-dump-db iterate-bucket ../../default.etcd/member/snap/db meta --decode
key="term", value=2
key="storageVersion", value="3.6.0"
key="consistent_index", value=6
key="confState", value="{\"voters\":[10276657743932975437],\"auto_leave\":false}"

$ ./etcd-dump-logs ../../default.etcd/
Snapshot:
empty
Start dumping log entries from snapshot.
WAL metadata:
nodeID=8e9e05c52164694d clusterID=cdf818194e3a8c32 term=2 commitIndex=5 vote=8e9e05c52164694d
WAL entries: 6
lastIndex=6
term	     index	type	data
   1	         1	conf	method=ConfChangeAddNode id=8e9e05c52164694d
   2	         2	norm	
   2	         3	norm	header:<ID:7587884288152690690 > cluster_member_attr_set:<member_ID:10276657743932975437 member_attributes:<name:"default" client_urls:"http://localhost:2379" > > 
   2	         4	norm	header:<ID:7587884288152690692 > cluster_version_set:<ver:"3.6.0" > 
   2	         5	norm	header:<ID:7587884288152690694 > put:<key:"k1" value:"v1" > 
   2	         6	norm	header:<ID:7587884288152690695 > put:<key:"k2" value:"v2" > 

Entry types (Normal,ConfigChange) count is : 6

@ahrtr (Member, Author) commented:

Updated.

@serathius (Member) commented:

Good observation, you are right; there is nothing beneficial about fsyncing the HardState to the WAL. Still, I don't think there is anything beneficial about committing the snapshot.

@ahrtr (Member, Author) commented:

Probably I did not explain it clearly.

Firstly, the consistent_index may be greater than the commit index (already confirmed by my comment above).

Secondly, when etcdserver bootstraps, it only loads v2 snapshots with Index <= commitIndex (see below):

// filter out any snaps that are newer than the committed hardstate
n := 0
for _, s := range snaps {
	if s.Index <= state.Commit {
		snaps[n] = s
		n++
	}
}

So we need to commit the snapshot index, but of course only when it's greater than the existing commitIndex.
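
A minimal sketch of that guard (an illustration only, with a hypothetical helper name, not the code in this PR):

package sketch

import "go.etcd.io/raft/v3/raftpb"

// hardStateForSnapshot illustrates the guard discussed above: the commit
// index recorded in the WAL must never decrease, so it is only raised to the
// snapshot index (the consistent_index read from the db) when that index is
// ahead of the commit index already persisted in the HardState.
func hardStateForSnapshot(st raftpb.HardState, snapIndex uint64) (raftpb.HardState, bool) {
	if snapIndex <= st.Commit {
		// Nothing to save; writing a smaller commit index would break monotonicity.
		return st, false
	}
	st.Commit = snapIndex
	return st, true
}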

@serathius (Member) commented Jan 23, 2025:

OK, I think I now understand the motivation for putting the HardState; however, I see another problem. If the reason to add snapshot generation is to prevent etcd v3.5 from going back in the WAL and reading SetClusterVersion("3.6"), then what about situations where SetClusterVersion has a raft index after the CI?

PS: Please don't resolve conversations that haven't received an answer accepted by both sides. It slows down review.

@ahrtr (Member, Author) commented:

then what about situations where SetClusterVersion has a raft index after the CI?

Theoretically it's possible, but in practice it's unlikely. Creating a v2 snapshot is just a best-effort action.

PS: Please don't resolve conversations that haven't received an answer accepted by both sides. It slows down review.

The motivation is that resolving a conversation gives reviewers the impression that it's ready for a further round of review, instead of still being stuck on an existing comment.

Also, based on previous experience, a comment often gets no response for a long time, sometimes no follow-up comments at all (this isn't aimed at you, just a general comment on the review process). So I tend to resolve a conversation if I think everything has been resolved.

Improvement:

  • I will try to keep it open a little longer next time, even if I think everything is resolved.
  • Please also feel free to unresolve it if you or anyone else still has comments in a conversation.

@ahrtr ahrtr force-pushed the etcdutl_snapshot_20250110 branch from 7589807 to ae5d208 on January 22, 2025 16:50
@ahrtr (Member, Author) commented Jan 23, 2025

cc @serathius

@ahrtr ahrtr closed this Jan 23, 2025
@ahrtr ahrtr deleted the etcdutl_snapshot_20250110 branch January 23, 2025 16:25
@ahrtr ahrtr restored the etcdutl_snapshot_20250110 branch January 23, 2025 16:31
@ahrtr ahrtr reopened this Jan 23, 2025
Also added test to cover the etcdutl migrate command

Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>
@ahrtr ahrtr force-pushed the etcdutl_snapshot_20250110 branch from ae5d208 to 732e7ef on January 23, 2025 16:33
@ah8ad3 (Contributor) left a comment:

Thanks for implementing this; it gave me good insight.

@serathius (Member) commented:

3.5 doesn't write SetClusterVersion entries to the WAL file. If we see such entries, they were definitely generated by 3.6 or a later version.

ClusterVersionSet was introduced in v3.5 and is annotated as such

message ClusterVersionSetRequest {
  option (versionpb.etcd_version_msg) = "3.5";
  string ver = 1;
}

The reason v3.5 etcd still uses the old method=PUT path="/0/version" val="3.5.0" is backward compatibility. Versioned WAL entries were introduced only in v3.6, so before that the etcd maintainers added new protos across multiple releases to ensure compatibility. SetClusterVersion was added in v3.5 but not used, to ensure it doesn't break v3.4 during an upgrade. SetClusterVersion started being used in v3.6, but it is v3.5-compatible because v3.5 understands it.

@serathius (Member) commented:

cc @siyuanfoundation: can you also take a look, as you worked on downgrades?

@serathius (Member) commented Jan 23, 2025

We can revisit this in 3.7, as it won't be needed anymore.

I think if we change how SetClusterVersion is handled in WAL version detection, then we would not need this PR at all. I think that would be a simpler solution. I might have made a mistake by implementing it like this just to be safe. However, if there are no incompatible entries in the WAL, there is no reason not to allow a downgrade without a snapshot. Also, as there are no new entries in v3.6, we would always be able to downgrade to v3.5.

This reminds me that, because of the WAL compatibility of v3.5 and v3.6, I explicitly wanted to add a fake new entry to the WAL that could be used to test downgrades and these compatibility checks. I think I just ended up using SetClusterVersion for that.
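
A rough sketch of that alternative (an illustration only, not etcd's actual WAL version-detection code; entryRequiresV36 is a hypothetical placeholder for the versionpb-annotation-driven check):

package sketch

import (
	"github.com/coreos/go-semver/semver"

	"go.etcd.io/etcd/api/v3/etcdserverpb"
	"go.etcd.io/raft/v3/raftpb"
)

// minimalWALVersion illustrates the idea: when computing the minimal etcd
// version required to replay the WAL, ClusterVersionSet entries are skipped,
// since v3.5 already understands them.
func minimalWALVersion(ents []raftpb.Entry) *semver.Version {
	minVer := semver.New("3.5.0")
	for _, e := range ents {
		if e.Type != raftpb.EntryNormal || len(e.Data) == 0 {
			continue
		}
		var req etcdserverpb.InternalRaftRequest
		if err := req.Unmarshal(e.Data); err != nil {
			continue
		}
		// ClusterVersionSet alone should not force the detected WAL version
		// up to 3.6.
		if req.ClusterVersionSet != nil {
			continue
		}
		if entryRequiresV36(&req) {
			minVer = semver.New("3.6.0")
		}
	}
	return minVer
}

// entryRequiresV36 is a stand-in for the real per-entry check; it always
// returns false in this sketch.
func entryRequiresV36(r *etcdserverpb.InternalRaftRequest) bool { return false }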

@ahrtr (Member, Author) commented Jan 23, 2025

I think if we change how SetClusterVersion is handled in WAL version detection, then we would not need this PR at all.

It means that we need to partially roll back #13405, which is what I had been trying to avoid doing at the last minute before we release 3.6.0.

Anyway, let me see what the effort and the impact would be.

@ahrtr (Member, Author) commented Jan 23, 2025

Just raised #19263, PTAL

@k8s-ci-robot commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahrtr, fuweid, siyuanfoundation

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
