forked from pkj415/KopiaTLA
-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathKopiaGC.tla
275 lines (233 loc) · 17.5 KB
/
KopiaGC.tla
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
------------------------------ MODULE KopiaGC ------------------------------
(*
This is a spec to model behaviours of the interaction of Snapshot and GC processes in Kopia.
It is based out of my understanding of the code as of this commit - 9c3d419bc396c0446f21c68813ca195196843eab
General rules of thumb - Avoid using sequences where possible, using sets/bags can reduce state space.
Things to possibly consider later -
1. Had started with a bag of contents for snapshots. Reverted to using a set because
a snapshot will not do anything in case the same content is to be written again. Because, it doesn't
refresh the index from the blob store. This can be relaxed later if we have such a case.
2. Similarly, not considering refresh of indices and snapshot manifest midway in a GC process.
*)
\* TODO - A deleted content entry can get removed during small index compaction if the entry is older than SkipDeletedOlderThan
EXTENDS Integers, Sequences, FiniteSets, Bags, TLC
CONSTANT
NumContents, \* Restrict behaviours to snapshots that can contain contents with content IDs 0..NumContents-1
MaxSnapshotsIssued, \* Maximum number of snapshots that can be issued. Constraint needed to restrict state space, else TLC will run forever
MaxSnapshotTime, \* Maximum number of logical time steps a snapshot process can take. Post that it will not be able to mark itself "completed".
MaxLogicalTime, \* Can't keep TLC running forever
MaxGCsIssued
(* TODO - Explain how logical time is modeled, and how this field changes the scope of possible behaviours *)
VARIABLES
(* Format - Set of content info entries.
{
[content_id |-> 0, deleted |-> TRUE/FALSE, timestamp |-> 0]
}
*)
index,
(* Format - Bag of records.
{
[status |-> "in_progress", "completed", "deleted", \* "completed" is analogous to the manifest for the snapshot being written to the store
contents_written |-> Contents that have been written to the blob storage for the snapshot,
index |-> Point in time view of index read at the start of snapshotting process, later this is updated on every
flush of indices. Note that the update will only append flushed indices to this field. The indices are not refreshed from
the global indices.
index_blob_to_be_flushed |-> index blob that can be flushed to global index,
start_timestamp |-> start_timestamp]
}
*)
snapshots,
(*
{
[
snapshots |-> Point in time view of the snapshots in the repository. Populated when this GC is triggered.
index |-> Point in time view of the global index.
contents_deleted |-> set of deleted content ids,
deletions_to_be_flushed |-> index blob of deletion entries that can be flushed to global index
]
}
*)
gcs,
current_timestamp
\* actionPreformedInTimestep
KopiaInit == /\ snapshots = EmptyBag
/\ current_timestamp = 0
/\ index = {}
/\ gcs = EmptyBag
\* /\ actionPreformedInTimestep = FALSE
NonEmptyPowerset(s) == (SUBSET s \ {{}})
IndexBlobsTypeOK == /\ IsABag(index)
/\ BagToSet(index) \in SUBSET(
[content_id: 0..NumContents-1,
deleted: {TRUE, FALSE},
timestamp: 0..MaxLogicalTime])
KopiaTypeOK == /\ IndexBlobsTypeOK
/\ current_timestamp \in 0..MaxLogicalTime
HasContentInfo(idx_blobs, content_id) == LET matching_content_infos == {
content_info \in idx_blobs: content_info.content_id = content_id}
IN IF matching_content_infos # {} THEN TRUE
ELSE FALSE
\* An index blob will have just one entry for each content ID. It can't have more than one. If this assumption breaks, change this logic.
\* Call this only if HasContentInfo is TRUE.
GetContentInfo(idx_blobs, content_id) ==
LET matching_content_infos == {
content_info \in idx_blobs: content_info.content_id = content_id}
max_timestamp == CHOOSE timestamp \in {content_info.timestamp: content_info \in matching_content_infos}:
\A content_info \in matching_content_infos: timestamp >= content_info.timestamp
latest_matching_content_infos == {
content_info \in matching_content_infos: content_info.timestamp = max_timestamp}
IN IF \E content_info \in latest_matching_content_infos: content_info.deleted = FALSE
THEN CHOOSE content_info \in latest_matching_content_infos: content_info.deleted = FALSE
ELSE CHOOSE content_info \in latest_matching_content_infos: TRUE
\* Define set of all content IDs in the idx_blobs irrespective of whether they are deleted or not.
ContentIDs(idx_blobs) == {content_info.content_id: content_info \in idx_blobs}
\* Either of the implementations of MergeIndices can be used. The first one compacts on the fly.
\* TODO - Handle the case where a deleted entry has the same timestamp as a normal entry for a given content ID. Or maybe don't handle it and leave it
\* as is? If left as is, at a single timestamp there can be a deleted and a normal entry for a single content ID. We can should the normal entry.
\*MergeIndices(idx_blobs_1, idx_blobs_2) == {content_info \in idx_blobs_1 \cup idx_blobs_2:
\* /\
\* \/ ~ HasContentInfo(idx_blobs_1, content_info.content_id)
\* \/ GetContentInfo(idx_blobs_1, content_info.content_id).timestamp <= content_info.timestamp
\* /\
\* \/ ~ HasContentInfo(idx_blobs_2, content_info.content_id)
\* \/ GetContentInfo(idx_blobs_2, content_info.content_id).timestamp <= content_info.timestamp}
\* TODO - For now keeping MergeIndcies a simple union. If possible, check the above commented version and use it.
MergeIndices(idx_blobs_1, idx_blobs_2) == idx_blobs_1 \cup idx_blobs_2
(* A thought (mostly incorrect) -
\* The case of a snapshot process being past its max time limit doesn't require explicit specification, it is as good as a smaller snapshot being completed
\* and then deleted. Note that this will miss the case where a snapshot in progress has written blobs but not indices, this can't be simulated using the
\* case of a smaller snapshot which is deleted, because deletion happens after completion which in turn is post all indices are flushed.
\* Also, the case of a partial snapshot flushing indices post its maximum time limit is not considered. If clocks can be arbitrary *)
WriteContents(snap) == /\ \E contents_to_write \in NonEmptyPowerset(0..NumContents-1 \ snap.contents_written):
\* Pick only those contents which have not been written till now. Even if a snapshot has the same content multiple times
\* it would only write it to the blob and add to index once.
LET contents_non_existing == {c_id \in contents_to_write:
(\/ ~ HasContentInfo(snap.index, c_id)
\/ GetContentInfo(snap.index, c_id).deleted)}
content_info_entries == [content_id: contents_non_existing,
timestamp: {current_timestamp},
deleted: {FALSE}]
updated_snapshot == [snap EXCEPT !.contents_written = snap.contents_written \cup contents_to_write,
!.index_blob_to_be_flushed = snap.index_blob_to_be_flushed \cup content_info_entries]
IN snapshots' = (snapshots (-) SetToBag({snap})) (+) SetToBag({updated_snapshot})
FlushIndex(snapshot) ==
LET
updated_snapshot == [snapshot EXCEPT !.index = MergeIndices(snapshot.index, snapshot.index_blob_to_be_flushed),
!.index_blob_to_be_flushed = {}]
IN
/\ snapshots' = (snapshots (-) SetToBag({snapshot})) (+) SetToBag({updated_snapshot})
/\ index' = MergeIndices(index, snapshot.index_blob_to_be_flushed)
TriggerSnapshot == /\ snapshots' = snapshots (+) SetToBag({[
status |-> "in_progress", contents_written |-> {},
index |-> index,
index_blob_to_be_flushed |-> {},
start_timestamp |-> current_timestamp]})
SnapshotProcessing ==
\/ \* Trigger a snapshot
/\ BagCardinality(snapshots) < MaxSnapshotsIssued
/\ TriggerSnapshot
/\ UNCHANGED <<index, current_timestamp, gcs>>
\/
/\ \E snapshot \in BagToSet(snapshots):
\/ \* Write some contents to a blob
\* TODO - Should this be allowed post snapshot.start_timestamp + MaxSnapshotTime?
/\ snapshot.status = "in_progress"
/\ WriteContents(snapshot)
/\ UNCHANGED <<index, current_timestamp, gcs>>
\/ \* Flush index
/\ snapshot.status = "in_progress"
/\ snapshot.index_blob_to_be_flushed # {}
/\ FlushIndex(snapshot)
/\ UNCHANGED <<current_timestamp, gcs>>
\/ \* Complete snapshot
/\ snapshot.status = "in_progress"
/\ current_timestamp < snapshot.start_timestamp + MaxSnapshotTime
/\ snapshot.index_blob_to_be_flushed = {} \* There is nothing to be flushed
/\ LET updated_snapshot == [snapshot EXCEPT !.status = "completed"]
IN /\ UNCHANGED <<index, current_timestamp, gcs>>
/\ snapshots' = (snapshots (+) SetToBag({updated_snapshot})) (-) SetToBag({snapshot})
\/ \* Delete a snapshot
/\ snapshot.status = "completed"
/\ LET updated_snapshot == [snapshot EXCEPT !.status = "deleted"]
IN /\ snapshots' = (snapshots (+) SetToBag({updated_snapshot})) (-) SetToBag({snapshot})
/\ UNCHANGED <<index, current_timestamp, gcs>>
UnusedContentIDs(idx_blobs, snaps) == {content_id \in ContentIDs(idx_blobs):
/\ ~ GetContentInfo(idx_blobs, content_id).deleted} \ UNION{snap.contents_written: snap \in snaps}
\* TriggerGC stores a point in time view of snapshots and the global index in the GC process record.
\* Consider only completed snapshots in TriggerGC, this helps in reducing state space. Reasoning - If all snapshots were kept in the GC process record,
\* two states with GC processes having the same set of completed snapshots but different set of non-completed snapshots would be differentiated,
\* even though the behaviour following those states won't be affected by the non-completed snapshots stored in the GC record (GC doesn't use the
\* information in non-completed records).
\* Consider only content_info entries which are old enough. Do it in Trigger instead of having a check in UnusedContentIDs (to
\* reduce state space). Here MaxSnapshotTime acts analogous to minContentAge in actual code.
TriggerGC ==
/\ gcs' = gcs (+) SetToBag({[snapshots |-> {snap \in BagToSet(snapshots): snap.status = "completed"},
index |-> {content_info \in index: current_timestamp >= (content_info.timestamp + MaxSnapshotTime)},
contents_deleted |-> {},
deletions_to_be_flushed |-> {}
]})
\* TODO - If content iteration is in some specific order, the state space can be reduced. Right now the specification allows
\* deletion in any order.
DeleteContents(gc) == /\ \E content_ids_to_delete \in
NonEmptyPowerset(UnusedContentIDs(gc.index, gc.snapshots) \ gc.contents_deleted): \* Earlier gc.content_deleted was a set of contents and not content ids. That wasn't required and leads to a larger state space. TODO - Ponder on this.
LET contents_to_delete == [content_id: content_ids_to_delete, timestamp: {current_timestamp}, deleted: {TRUE}]
updated_gc == [gc EXCEPT !.contents_deleted = gc.contents_deleted \cup {content.content_id: content \in contents_to_delete},
!.deletions_to_be_flushed = gc.deletions_to_be_flushed \cup contents_to_delete]
IN
/\ gcs' = (gcs (-) SetToBag({gc})) (+) SetToBag({updated_gc})
FlushDeletedContents(gc) == LET
\* No need to update the gc.index like we did for snapshots. This is because once a content is deleted by its
\* presence in gc.contents_deleted, it will never be checked again in the gc.index.
updated_gc == [gc EXCEPT !.deletions_to_be_flushed = {}]
IN
/\ gcs' = (gcs (-) SetToBag({gc})) (+) SetToBag({updated_gc})
/\ index' = MergeIndices(index, gc.deletions_to_be_flushed)
GC_Processing == \/
\* Trigger GC. Load the index blobs and completed snapshots (not all, to save on state space) at the current_timestamp.
/\ BagCardinality(gcs) < MaxGCsIssued
/\ TriggerGC
/\ UNCHANGED <<snapshots, index, current_timestamp>>
\/
\* Delete some contents which are in the index blobs, but not referenced by the snapshots.
/\ \E gc \in BagToSet(gcs):
/\ DeleteContents(gc)
/\ UNCHANGED <<snapshots, index, current_timestamp>>
\/ \* Flush index blob of deleted entries
/\ \E gc \in BagToSet(gcs):
/\ gc.deletions_to_be_flushed # {}
/\ FlushDeletedContents(gc)
/\ UNCHANGED <<snapshots, current_timestamp>>
KopiaNext == \/
/\ SnapshotProcessing
\/
/\ GC_Processing
\/
\* Tick time forward.
/\ current_timestamp < MaxLogicalTime
/\ current_timestamp' = current_timestamp + 1
/\ UNCHANGED <<index, snapshots, gcs>>
(* Invariants and safety check *)
\* TODO - In this specification a content being marked as deleted is as good as the content not existing i.e., removed by compaction (Stmnt A). This is different
\* from what deletion means in the GC2 spec (which is the actual meaning of the deleted=True marker on a content info) (Stmnt B).
\* Actually, I have written this spec with the meaning of stmnt A in mind. This doesn't change anything for us, but for clarity, I will refactor later to
\* indicate the true meaning as per Kopia code i.e., Stmnt B.
GCInvariant == \A snap \in {snap \in BagToSet(snapshots): snap.status = "completed"}:
/\ \A content_id \in snap.contents_written:
/\ HasContentInfo(index, content_id)
/\ ~ GetContentInfo(index, content_id).deleted
GetContentInfoCheck == ~ \E content1, content2 \in index:
/\ content1 # content2
/\ content1.content_id = content2.content_id
/\ content1.timestamp = content2.timestamp
/\ content1.deleted = TRUE
/\ content2.deleted = FALSE
/\ GetContentInfo(index, content1.content_id).deleted = TRUE
GetContentInfoCheck2 == ~ \E content1, content2 \in index:
/\ content1 # content2
/\ content1.content_id = content2.content_id
/\ content1.timestamp < content2.timestamp
/\ GetContentInfo(index, content1.content_id) = content1
=============================================================================
\* Modification History
\* Last modified Fri May 01 07:56:47 CDT 2020 by pkj
\* Created Fri Apr 10 15:50:28 CDT 2020 by pkj