-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CWL: support for uploading (private) containers #180
Comments
While I can well imagine that being able to use private container images is an important use case particularly in industry settings, I find uploading large files via workflow attachments a bit problematic. For big workflows we may be talking about gigs of data. I'd presume this to be difficult to implement both for clients and services, especially in an interoperable manner. It's also not very user-friendly. If the problem is lack of access, why not rather pass along the necessary credentials in some way? |
We have run into this in the past. For instance, AWS ECR requires a log in or a vendor's docker repo requires log in. I haven't found a good way to handle this. I don't like the solution of uploading a docker image with the workflow. Reason being, the image may already exist on the running instance and not need the upload. |
Maybe we need to allow multiple stages of uploads. Or a query interface. |
Really I would think the engine running the CWL step, when pulling the image, needs to be able to perform a Docker login. So providing the credentials in some manner would make the most sense. For AWS ECR, our nodes would have iam role capabilities so no credentials need to be passed. For a vendors Docker repo, credentials need to be provided somehow. |
While I definitely have seen this issue and been on the receiving end of it, I do not really see this as the responsibility of WES ( If anything this feels more of a challenge for TRS then it does for WES). If we were to try to implement it though I do think it would provide considerable challenges, security concerns and barriers to new implementors. Along those lines I have a number of concerns with this idea in no specific order
|
@mr-c would an alternative solution be for the specific Wes engine to provide information on what container registries it supports (similar to supported file types) or maybe even as far as what specific registry to push to (ie it can list a preferred registry, and maybe Wes can provide a way to get limited access to push to that registry out of scope of a run request) |
Not sure if this is actually as straight-forward as it sounds. I don't know much about container registries, if there are fundamentally different types of them, and with different access mechanisms. But even when just configuring Docker registry servers, there are bound to be many that have public content, available without authentication, and that a given WES wouldn't necessarily know about. So that list of supported registries would almost certainly be incomplete. Conversely, registries may require authentication for some but not all of the hosted resources, and in these cases for a given service being on a list of "supported" registries wouldn't be sufficient. I agree that a lot of these considerations are rather in the scope of TRS, which is supposed to be a means of granting access to workflows and tools (including containers), or at least resolving them via TRS URIs, similar to DRS. It is, however, unlikely that we can come up with a authorization mecahnism similar to Passport for container files that will be widely adopted by container registries. I don't really see any way around passing access tokens/credentials to realize this use case if attaching images to the run request is not an option (and I strongly agree with @patmagee that it probably shouldn't be an intimate part of WES). However, @mr-c, since I was just working on a WES implementation these last few days and have looked at the specs intimately again: I'm pretty sure there is nothing in the WES specs that explicitly forbids you from doing what you are asking for. If your WES client and server-side implementation can deal with the object sizes and you are able to load the files into the daemon and make them available to the actual workflow engine - that should actually be possible without having to violate WES specs. |
Tagging @denis-yuen and @coverbeck here |
FWIW, while the official TRS specification has no stance on private/shared resources. dockstore.org's implementation does accept optional auth for controlling access to your own/shared workflows. Expanding that to Docker containers is a reasonable idea with the caveat that from a practical standpoint, concerns 1, 3, 4 do seem challenging even in a TRS context. If I understand correctly though, in a theoretical world where TRS included a Docker registry, wouldn't you still need to essentially pass credentials around? Instead of passing around, say, Docker hub credentials one would end up passing around TRS credentials (or GA4GH passports)? |
Thanks @denis-yuen. I also think it's worthwhile to think about access control in TRS similar to DRS. Having non-public workflows is also an important, releated use case. But perhaps one could also play through the idea that TRS could be implemented by a container registry directly. That could address the problem, although I'm not quite sure how easy it would be to convince Docker Hub, Quay, ECR and smaller container registries to adopt it - especially since it is way too elaborate for the rather simple use case it would solve (resolve an identifier to a server and image and check whether the request has permissions). @mr-c: Isn't there some effort in OCI, perhaps, towards a more uniform way of gaining access to non-public container images across container registries? |
1 similar comment
Thanks @denis-yuen. I also think it's worthwhile to think about access control in TRS similar to DRS. Having non-public workflows is also an important, releated use case. But perhaps one could also play through the idea that TRS could be implemented by a container registry directly. That could address the problem, although I'm not quite sure how easy it would be to convince Docker Hub, Quay, ECR and smaller container registries to adopt it - especially since it is way too elaborate for the rather simple use case it would solve (resolve an identifier to a server and image and check whether the request has permissions). @mr-c: Isn't there some effort in OCI, perhaps, towards a more uniform way of gaining access to non-public container images across container registries? |
@uniqueg OCI has a standard for accessing container registries; beyond that I don't know |
If the submitter's workflow uses one or more local/private containers, but the WES server doesn't have access to those containers (or a registry with them), then WES is not usable.
For CWL I propose that the containers may be uploaded (in the OCI image format) as part of the WES request, perhaps to a "_wes_docker_container" directory to avoid clashes.
The references to these non-public containers may have been via
dockerPull
,dockerLoad
, ordockerImport
; if nodockerImageId
was also specified, then thedockerImageId
should be added by the WES client to make clear which of the uploaded containers theDockerRequirement
refers to.https://www.commonwl.org/v1.0/CommandLineTool.html#DockerRequirement
The text was updated successfully, but these errors were encountered: