created | last updated | status | title | authors | |
---|---|---|---|---|---|
2019-05-27 |
2019-05-27 |
implemented |
Authentication in downloads |
|
This document describes a proposal to add authentication to downloads
by bazel during the creation of external repositories. It is the outcome
of a discusion in the bazel team, based on an earlier proposal on
http downloads using .netrc
and the email
discussion
on said proposal.
Bazel uses external repositories to import dependencies that are not part of the main repository. Those external dependencies often are standard open-source libraries, but not in all cases; sometimes they are repositories private to a company or some other organisation. In the latter case, usually some form of authentication (on top of being in the right network) is required when accessing those sources.
When fetching external resources, bazel supports two fundamental mechanisms,
- bazel can be asked to download a file, possibly with a specified hash, from one of a specified list of URLs (taking whichever is best reachable), and
- bazel can be told to call an external program like
git
. In the latter case, obviously authentication can happen by whatever mechanism the external program supports. The former case, however, is often preferred, as in that way bazel's caching mechanism, that is shared between all workspaces, can be used. The API function to tell bazel to download a file isdownload
from the repository context. Currently, it does not support authentication.
The proposed API change is to extend the repository-context functions download
and download_and_extract
by an additional, optional (with default value {}
)
parameter auth
. This parameter has to be a dict
. For every URL specified to
download from, if it is a key of that dict
, then the value has to be a dict
as well, with one entry "type"
and other entries depending on the value
of the the specified type. For the type "basic"
, the additional entries are
"login"
and "password"
with the obvious semantics.
For example, a call telling bazel to download a file could look as follows.
ctx.download(
urls = ["https://company-mirror.example.com/files/data.txt",
"http://public.example.org/files/data.txt",
"ftp://public.example.org/files/data.txt",
],
sha256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
auth = {"https://company-mirror.example.com/files/data.txt" :
{"type" : "basic",
"login" : "allice",
"password" : "thisismysecret",
},
"ftp://public.example.org/files/data.txt" :
{"type" : "basic",
"login" : "anonymous",
"password" : "alice@example.com",
},
})
In this example, the auth
aparamter would specify, that the same file could
be obtained from company-mirror.example.com
via https
using a proper
user name and password, but also form public.example.org
, either via http
without authentification, or via standard anonyomous ftp
.
Of course, the authentication dictionary would not be committed into the
workspace; it would only be constructed by the rule in memory from some
form of credential store, e.g., a
.netrc
file. To simplify that use case, a Starlark function will be provided to compute
that dictonary from a list of URLs and a path to a .netrc
file.
Moreover, the version of http_archive
(and related rules) that ships embedded
in bazel will use functions to honor a .netrc
file in the user's home
direcotry, unless explicitly told to user other (including no) credentials by
an explict auth
dict passed as argument.
The proposed change to the API is deliberately minimal, explicit, and agnostic to the actual mechanism the credentials are stored. The reason for this design choice are the following.
-
The
.netrc
file format is not the only way to store credentials on disk. By separating off parsing of the credential file and moving it to Starlark, adaptions and support for other file formats is more easy and in control of the user (without requring additional changes to the build API). -
The approach of being very explicit about for which URLs authentication is needed, instead of having a global default through a side channel (like an option specifying a
.netrc
file), ensures that authentication is only used where desired. While a.netrc
files specifies which credential to use for which domain, and, in general, a download providing basic auth credentials will also be successfull if none are needed, eagerly providing credentials might still be a problem. Consider a domain, say a git-hosting site, that contains both, private and public repositories. Then, accessing the private repositories will require credentials, so the domain will be in the.netrc
file. The same hosting site, on the other hand, might also provide public repositories added to the workspace by rules (via the*_deps()
pattern). For those, the rule author might decide to use plainhttp
URLs (instead ofhttps
) to avoid the SSL-handshake overhead, using that the SHA256 hash of the file is enough to ensure integrity. Now, eagerly providing the available credentials will not make the download fail, but stil cause an unintended clear-text transmission. -
By moving the logic to Starlark as far as possible, extensions are more easy. In particular, related feature requests, like URL-rewriting to go through a local mirror, can be implemented in pure Starlark and are hence out of scope of this proposal.
If the additional parameter auth
is not given, the functions download
and download_and_extract
behave the same way as before. So this is a
fully backwards-compatible extension.