-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathotm-gateway.html
522 lines (504 loc) · 25.2 KB
/
otm-gateway.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
<!DOCTYPE html>
<html>
<head>
<meta charset='utf-8'>
<title>OTM Gateway API Specification</title>
<script
src='https://www.w3.org/Tools/respec/respec-w3c'
class='remove'></script>
<script class='remove'>
var respecConfig = {
specStatus: "base",
editors: [{
name: "Tom Johnson",
company: "University of California, Santa Barbara Library",
companyURL: "https://www.library.ucsb.edu/"
},{
name: "Sibyl Schaefer",
company: "University of California, San Diego",
companyURL: "https://ucsd.edu"
}],
gitHub: "ucsdlib/otm-specs",
shortName: "otm-gateway-api",
wg: "One to Many Working Group",
wgURI: "https://wiki.lyrasis.org/display/OTM",
edDraftURI: "https://github.com/ucsdlib/otm-specs/blob/master/otm-gateway.html",
maxTocLevel: 3,
localBiblio: {
"RFC7617": {
title: "The 'Basic' HTTP Authentication Scheme",
href: "https://tools.ietf.org/html/rfc7617"
},
"RFC8493": {
title: "The BagIt File Packaging Format (V1.0)",
href: "https://tools.ietf.org/html/rfc8493"
},
"S3API": {
title: "Amazon S3 REST API Introduction",
href: "https://docs.aws.amazon.com/AmazonS3/latest/API/Welcome.html"
}
},
otherLinks: [{
key: "OTM Specifications",
data: [{
value: "One to Many (OTM) Specifications Overview",
href: "."
},
{
value: "OTM Preservation Workflow",
href: "preservation-workflow.html"
},
{
value: "OTM Bridge API Specification",
href: "otm-bridge.html"
},
{
value: "OTM Appendix - Audit Events",
href: "audit-appendix.html"
},
{
value: "OTM Appendix - Hyrax Workflow",
href: "hyrax-workflow.html"
}]
}]
};
</script>
<link rel="stylesheet" href="otm-styles.css">
</head>
<body>
<p class='copyright'>This document is licensed under a
<a class='subfoot' href='https://creativecommons.org/licenses/by/4.0/' rel='license'>
Creative Commons Attribution 4.0 License
</a>.
</p>
<section id='abstract'>
<p>
The One To Many (OTM) Gateway API Specification is one of two APIs used to support communication between digital
content repository systems (Repository) and distributed digital preservation systems (DDP). These APIs work in tandem
to allow content captured in Repository systems to be copied to DDP systems for preservation.
</p>
<p>
The preservation Gateway functions as an aggregating cache for preservation requests originating with a repository
and destined for a DDP via the <a href="otm-bridge.html">One to Many Bridge API</a>. The Gateway API provides a
synchronous interface for requests related to preservation of content, even when the preservation system's
interactions are fundamentally asynchronous.
</p>
</section>
<section>
<h2>Status of This Document</h2>
<p>This document is a draft of a specification, created as part of the One to Many grant, funded by the Andrew W.
Mellon Foundation.</p>
</section>
<section id="conformance">
</section>
<section id="introduction" class="informative">
<h2>Introduction</h2>
<p>
This specification describes APIs enabling digital object repository systems to manage preservation of content. The
interfaces described here are intended to fit within the larger context of the <a
href="preservation-workflow.html">One to Many Preservation Workflow</a>.
</p>
<p>
This specification defines a set of interfaces for synchronous interactions for depositing, restoring, and purging
objects across multiple distributed digital preservation (DDP) systems, using a single system managed by the
repository administrators. Because DDPs (and the <a href="otm-bridge.html">One to Many Bridge API</a>) perform
deposit and retrieval actions asynchronously, the Gateway's synchronous interfaces are needed to support real-time
feedback for repository managers and curators. The Gateway also serves as a short-term cache of content to be
preserved, allowing the repository system to continue its normal object life-cycle while ensuring exact bitwise
preservation of the objects at the time they were selected for preservation. The Gateway relieves the repository of
the requirement to manage preservation beyond calls to the Gateway API. To this end, Gateway implementations should
seek to provide robust guarantees to the repository about the eventual preservation of content.
</p>
<p>
For background, it may be helpful to read the
<a href="https://wiki.lyrasis.org/display/OTM/Project+Overview+and+Goals">One to Many Project Overview and Goals</a>
and the <a href="https://wiki.lyrasis.org/display/OTM/User+Stories">One to Many User Stories</a>.
</p>
<p class="note">
The gateway interfaces are designed with substantial overlap with the S3 API ([[S3API]]).
Interactions between the repository and the gateway closely follow patterns used by
S3-compatible object stores. This design descision is intended to support ease of
implementation and code reuse for repository systems that may already support S3 for
object storage, or may wish to replicate preserved content in an object store as well
as a DDP.
</p>
<section>
<h3>A note about <em>Objects</em></h3>
<p>The Gateway's repository-facing interfaces are concerned with the management of <em>objects</em>, represented by
<code>object-id</code> URL arguments and packaged in the BagIt format. In the context of the repository, it's normal
for these <em>objects</em> to be meaningful and interpretable beyond their bitwise content.</p>
<p>The Gateway exposes the <em>object</em> content as <em>files</em> within <em>file groups</em> for use by the
Bridge. This distinction reflects a separation of concerns: the Bridge and DDP are ultimately concerned with
guarantee of bitwise preservation and retrieval of <em>files</em>; the repository and Gateway are concerned with the
presentation and interpretation of those <em>files</em> as representations of meaningful resources. By making this
separation explicit, the Gateway and Bridge disclaim any out-of-band coordination of meaning between the repository
and the DDP.</p>
<p>The semantics of the <code>object-id</code>, the scope of the objects (i.e. which files are included), and the
relationships between the files are decisions left to the repository administrators. Repositories should include in
each <em>object</em> sufficient information for an agent to determine its schema, understand its structure, and
reconstruct it within the repository given only the <em>file group</em>, its <em>file group identifier</em>, and the
contents of the individual files. </p>
</section>
</section>
<section>
<h2>Repository API</h2>
<section>
<h3><dfn>Gateway Service Description</dfn></h3>
<p>Provides a description of this Gateway, including the API version, and the available preservation providers.</p>
<ul>
<li>Request Type: <code>GET</code></li>
<li>Request Path: <code>/</code></li>
<li>Response Body: <code>JSON</code>
<ul>
<li><code>gateway-version</code>: The current version of the Gateway API.</li>
<li><code>providers</code>: The preservation providers available through this Gateway.
<ul>
<li><code>name</code>: The name of the preservation provider; e.g. "chronopolis". This name MUST be unique
in the context of this gateway. Gateways SHOULD NOT change the names of preservation providers.</li>
</ul>
</li>
</ul>
</li>
<li>Response Code: <code>200</code> (on success)</li>
<pre class="example" title="Gateway Service Description JSON">
{
"gateway-version" : "0.1.0",
"providers" : [{
"name": "ddp1_name"
},
{
"name": "ddp2_name"
}]
}
</pre>
</ul>
</section>
<section>
<h3><dfn>Deposit Object</dfn></h3>
<p>Creates or updates a Gateway Object for preservation of an Object from the repository.</p>
<p>Upon acceptance of the request, the Gateway MUST create a unique version identifier for the object, corresponding
to the exact content in this request. This identifier MUST be returned to the client in a
<code>x-otm-version-id</code> response header.</p>
<p>The request body MUST be a BagIt <em>bag</em> as defined by [[RFC8493]], packaged and compressed in the media type
specified by the <code>Content-Type</code> request header. The Gateway MUST request deposit of the payload files to
the preservation provider given in <code>x-otm-preservation-provider</code> via the <a
href="otm-bridge.html#deposit-content">Bridge's Deposit Content</a> endpoint.</p>
<p>For efficiency, the requester MAY omit some or all payload files from the <em>bag</em>, instead providing a
<code>fetch.txt</code> as specified in Section 2.2.3 of [[RFC8493]]. The URL given in the <code>fetch.txt</code> MUST
be reachable by the Bridge (i.e. on the public internet) using the credentials and authentication methods described
in <a href="#transfer-file">Transfer File</a>. In the case of files provided in this way, the Gateway MAY implement
<a href="#transfer-file">Transfer File</a> as a redirect to the given URL.</p>
<ul>
<li>Request Type: <code>PUT</code></li>
<li>Request Path: <code>/{object-id}</code></li>
<li>Request Headers:
<ul>
<li><code>Content-Type</code>: Media type of the object to be deposited.</li>
<li><code>Content-Length</code>: Size in bytes of the object to be deposited</li>
<li><code>x-otm-preservation-provider</code>: the name of preservation systems targeted for this deposit.</li>
</ul>
</li>
<li>Request Body: Compressed Bag</li>
<li>Response Code: <code>200</code> (on success)</li>
<li>Response Headers:
<ul>
<li><code>ETag</code>: Entity tag for the deposited object.</li>
<li><code>x-otm-version-id</code>: Version of the object.</li>
</ul>
</li>
<pre class="example" title="Deposit Object Request">
PUT /af48c3d HTTP/1.1
Host: preservation-gateway.institution.edu
Date: Tue, 02 Jul 2019 20:15:00 GMT
Content-Type: application/zip
Content-Length: 493285
Content-MD5: 4efcb3d98ce0fabfd585eb6c4332859
[493285 bytes of object data]
</pre>
<pre class="example" title="Deposit Object Response">
HTTP/1.1 200 OK
Date: Tue, 02 Jul 2019 20:15:00 GMT
ETag: "4efcb3d98ce0fabfd585eb6c4332859"
Content-Length: 0
Server: OTM Preservation Gateway
</pre>
<ul class="note">
<li>The use of <code>fetch.txt</code> allows the repository to avoid the cost of packaging large payload files
and transmitting them to the Gateway. In exchange for this efficiency the repository sacrifices the caching
properties of the Gateway, increasing the likelihood of local changes causing checksum mismatches and resulting
deposit failures. Repositories that can guarantee the availability of content indefinitely (e.g. because files
are immutably versioned within the repository itself) won't suffer this drawback and may wish to use this method
of deposit exclusively.</li>
</ul>
</section>
<section>
<h3><dfn>Get Object Audit</dfn></h3>
<p>Describes the status and history of the object's preservation. For more information about auditing, see
<a href="audit-appendix.html">OTM Appendix - Audit Events</a>.</p>
<p>The information exposed by this endpoint is intended to be informational, allowing repository administrators to
monitor deposit activity and resolve issues encountered during the deposit process. The Gateway MUST report any
errors that would prevent it from requesting deposit via the Bridge in the <code>gateway-errors</code> field. The
Gateway MAY choose to poll status information from the Bridge asynchronously, and is not required to to provide
real-time information. The repository SHOULD NOT rely on the information in fields other than
<code>gateway-errors</code> to be up-to-date in real time.</p>
<ul>
<li>Request Type: <code>GET</code></li>
<li>Request Path: <code>/{object-id}/audit ? versionId=</code></li>
<li>Response Body: <code>JSON</code></li>
<ul>
<li><code>object-id</code>: The id of the object this describes.</li>
<li><code>deposits</code>: Data about the deposited versions; this exposes any errors within the Gateway (e.g.
failed <em>bag</em> validation) and the <a href="otm-bridge.html#get-deposit-status">Bridge status</a>.
<ul>
<li><code>version</code>: The object version this deposit corresponds to.</li>
<li><code>gateway-errors</code>: Human readable description of any errors encountered by the gateway.</li>
<li><code>status</code>: The status text given by the
<a href="otm-bridge.html#get-deposit-status">Bridge</a>.</li>
<li><code>file-count</code>: The count of files included in the deposit, as reported by the
<a href="otm-bridge.html#get-deposit-status">Bridge</a>.</li>
<li><code>details</code>: Additional details about the state of the deposit.</li>
</ul>
</li>
<li><code>audit-events</code>: A list of audit events intended to be formatted for display to curators and
repository administrators; see the Bridge's <a href="otm-bridge.html#get-audit-log">Get Audit Log</a> for more
information.</li>
</ul>
<li>Response Code: <code>200</code> (on success)</li>
<pre class="example" title="Object Audit JSON">
[
"object-id": "af48c3d",
"deposits": [
{
"version" "20190702T201500.001",
"gateway-errors": null,
"status": "DEPOSIT_ACCEPTED",
"file-count": "2",
"details": ""
}
],
"audit-events": []
]
</pre>
</ul>
</section>
<section>
<h3><dfn>Initiate Restore</dfn></h3>
<p>Request restore of an Object and all its contents for later retrieval.</p>
<p>The client can request restore of a specific version of the object by providing a <code>versionId</code> URL
parameter. If this parameter is present, the Gateway MUST restore the exact content of the requested version. When
the client does not specify a version, the Gateway SHOULD seek to restore the most recent version.</p>
<ul>
<li>Request Type: <code>POST</code></li>
<li>Request Path: <code>/{object-id} ? restore & versionId=</code></li>
<li>Response Code:
<ul>
<li><code>202</code> (if the object restore has been initiated and the object is not yet available for
retrieval)</li>
<li><code>200</code> (if the object is restored and available for retrieval)</li>
<li><code>409</code> (if there is a restore already in progress)</li>
</ul>
</li>
</ul>
<pre class="example" title="Restore Object Request">
POST /af48c3d?restore HTTP/1.1
Host: preservation-gateway.institution.edu
Date: Tue, 02 Jul 2019 20:35:00 GMT
Content-Length: 0
</pre>
<pre class="example" title="Restore Object Version Request">
POST /af48c3d?restore&versionId=20190702T201500.001 HTTP/1.1
Host: preservation-gateway.institution.edu
Date: Tue, 02 Jul 2019 20:35:00 GMT
Content-Length: 0
</pre>
<pre class="example" title="Restore Object Response">
HTTP/1.1 202 Accepted
Date: Tue, 02 Jul 2019 20:35:00 GMT
Content-Length: 0
Server: OTM Preservation Gateway
</pre>
<pre class="example" title="Restore Object Response for an object that is available">
HTTP/1.1 200 OK
Date: Tue, 02 Jul 2019 20:35:00 GMT
Content-Length: 0
Server: OTM Preservation Gateway
</pre>
<pre class="example" title="Restore Object Response when a restore is already in progress and a new one will not be
created">
HTTP/1.1 409 Conflict
Date: Tue, 02 Jul 2019 20:35:00 GMT
Content-Type: application/xml
Server: OTM Preservation Gateway
<?xml version="1.0" encoding="UTF-8"?>
<Error>
<Code>RestoreAlreadyInProgress</Code>
<Message>Object restore is already in progress.</Message>
<Resource>/af48c3d<Resource>
</Error>
</pre>
</section>
<section>
<h3><dfn>Retrieve Object</dfn></h3>
<p>Retrieve the content of an object.</p>
<p>The client can request retrieval of a specific version of the object by providing a <code>versionId</code> URL
parameter. If this parameter is present, the Gateway MUST provide the exact content of the requested version or
return a failure response code. When the client does not specify a version, the Gateway MUST return the most recent
available version. In either case, the response MUST include an <code>x-otm-version-id</code> header specifying the
identifier for returned version. </p>
<ul>
<li>Request Type: <code>GET</code></li>
<li>Request Path: <code>/{object-id} ? versionId=</code></li>
<li>Request Headers:
<ul>
<li><code>If-Match</code></li>
<li><code>If-None-Match</code></li>
<li><code>Accept</code></li>
</ul>
</li>
<li>Response Code:
<ul>
<li><code>200</code> (on success)</li>
<li><code>403</code> (when attempting to access an unavailable preserved object; i.e. one that has not been
restored)</li>
<li><code>412</code> (when an If-Match or If-None-Match header fails)</li>
</ul>
</li>
<li>Response Headers:
<ul>
<li><code>Content-Type</code></li>
<li><code>ETag</code></li>
<li><code>x-otm-version-id</code>: Version of the object.</li>
</ul>
</li>
</ul>
<pre class="example" title="Retrieve Object Request">
GET /af48c3d HTTP/1.1
Host: preservation-gateway.institution.edu
Date: Tue, 02 Jul 2019 20:45:00 GMT
Content-Length: 0
</pre>
<pre class="example" title="Retrieve Object Version Request">
GET /af48c3d?versionId=20190702T201500.001 HTTP/1.1
Host: preservation-gateway.institution.edu
Date: Tue, 02 Jul 2019 20:45:00 GMT
Content-Length: 0
</pre>
<pre class="example" title="Retrieve Object Response">
HTTP/1.1 200 OK
Date: Tue, 02 Jul 2019 20:45:00 GMT
ETag: "4efcb3d98ce0fabfd585eb6c4332859"
Content-Length: 493285
Content-Type: application/zip
x-otm-version-id: 20190702T201500.001
Server: OTM Preservation Gateway
[493285 bytes of object data]
</pre>
<pre class="example" title="Retrieve Object Response for Unavailable Object">
HTTP/1.1 403 Forbidden
Date: Tue, 02 Jul 2019 20:45:00 GMT
ETag: "4efcb3d98ce0fabfd585eb6c4332859"
Content-Type: application/xml
Server: OTM Preservation Gateway
<?xml version="1.0" encoding="UTF-8"?>
<Error>
<Code>InvalidObjectState</Code>
<Message>The Object is not available</Message>
<Resource>/af48c3d</Resource>
</Error>
</pre>
</section>
<section>
<h3><dfn>Purge Object</dfn></h3>
<p>Initiates a purge of the object from the preservation system. This will result in eradication of the object's
content. If a <code>versionId</code> is provided, the Gateway MUST request deletion objects matching the specified
version. Otherwise it MUST request deletion of all versions of the requested object. </p>
<ul>
<li>Request Type: <code>DELETE</code></li>
<li>Request Path: <code>/{object-id} ? versionId=</code></li>
<li>Response Code: <code>204</code> (on success)</li>
</ul>
<pre class="example" title="Purge Object Request">
DELETE /af48c3d HTTP/1.1
Host: preservation-gateway.institution.edu
Date: Tue, 02 Jul 2019 20:25:00 GMT
Content-Type: text/plain
</pre>
<pre class="example" title="Purge Object Response">
HTTP/1.1 204 NoContent
Date: Tue, 02 Jul 2019 20:25:00 GMT
Content-Length: 0
Server: OTM Preservation Gateway
</pre>
<pre class="example" title="Purge Object Version Request">
DELETE /af48c3d?versionId=20190702T201500.001 HTTP/1.1
Host: preservation-gateway.institution.edu
Date: Tue, 02 Jul 2019 20:25:00 GMT
Content-Type: text/plain
</pre>
<pre class="example" title="Purge Object Version Response">
HTTP/1.1 204 NoContent
Date: Tue, 02 Jul 2019 20:25:00 GMT
Content-Length: 0
Server: OTM Preservation Gateway
</pre>
</section>
</section>
<section>
<h2>Bridge API</h2>
<section>
<h3><dfn>Transfer File</dfn></h3>
<p>Transfer a cached file. This endpoint allows a Bridge to pull content for storage in a preservation service.</p>
<p>The client MUST provide a <code>versionId</code> parameter to guarantee the object fetched for preservation is the
exact version requested. For the same reason, the client SHOULD use the <code>If-Match</code> header.</p>
<p>This endpoint MUST support HTTP Basic Authentication as described in [[RFC7617]]. Appropriate credentials should
be provided to the Bridge via the <a href="otm-bridge.html#register">Register</a> endpoint. Gateways SHOULD use
different credentials for each Bridge/preservation system.</p>
</p>
<ul>
<li>Request Type: <code>GET</code></li>
<li>Request Path: <code>/{object-id}/{fileName} ? versionId=</code></li>
<li>Request Headers:
<ul>
<li><code>Authorization</code></li>
<li><code>If-Match</code></li>
</ul>
</li>
<li>Response Code:
<ul>
<li><code>200</code> (on success)</li>
<li><code>302</code> (when the file is not cached by the Gateway, but is available to the Bridge at an external
URL)</li>
<li><code>401</code> (when the given credentials do not allow access to the file)</li>
<li><code>404</code> (if the file is not present)</li>
<li><code>412</code> (if the requested file does not match the <code>If-Match</code> checksum.</li>
</ul>
</li>
<li>Response Headers:
<ul>
<li><code>Content-Type</code></li>
<li><code>ETag</code></li>
<li><code>x-otm-version-id</code></li>
</ul>
</li>
</ul>
<pre class="example" title="Transfer File Request">
GET /af48c3d/file1?versionId=20190702T201500.001 HTTP/1.1
Host: preservation-gateway.institution.edu
Date: Tue, 02 Jul 2020 12:00:00 GMT
If-Match: a93eddb6387aaaa61f6192926214d338
Authorization: Basic QWxhZGRpbjpPcGVuU2VzYW17
</pre>
<pre class="example" title="Transfer File Response">
HTTP/1.1 200 OK
Date: Tue, 02 Jul 2020 12:00:01 GMT
ETag: "Tag: "4efcb3d98ce0fabfd585eb6c4332859"
Content-Length: 2357
Content-Type: application/octet-stream
x-otm-version-id: 20190702T201500.001
Server: OTM Preservation Gateway
[2357 bytes of object data]
</pre>
</section>
</section>
</body>
</html>