-
Notifications
You must be signed in to change notification settings - Fork 192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Download: Fix support for Seqera Community Containers #3179
Comments
Great write up, thanks @MatthiasZepper! Two minor notes:
|
Also note that there are two things happening here: the first comments you linked to are for a PR that hasn't been merged. We have not yet adopted Seqera Containers so it's not unexpected that nf-core/tools has not yet been updated. However, there is organic uptake of Seqera Containers by the community, putting them into old-style |
That is not my problem, I actually think that the SHAs are really useful. The problem is, that they now differ! Previously, a container declaration looked like this:
Because the def prioritize_direct_download(self, container_list):
d = {}
for c in container_list:
if re.match(r"^$|(?!^http)", d.get(k := re.sub(".*/(.*)", "\\1", c), "")):
log.debug(f"{c} matches and will be saved as {k}")
d[k] = c
return sorted(list(d.values())) With the new Seqera Community Containers, the native Singularity image and the Docker image have different SHAs, because they are technically two different images:
But that makes it a lot harder to determine, which of the records should be given priority. Particularly when downloading multiple revisions of a pipeline at the same time, there are commonly multiple versions of the same tool in the raw list of container images that somehow need to be consolidated. This problem is anything but trivial and maybe downloading both is the only way forward. (Alternatively, making an API request to Seqera Containers to obtain the alternative URI would be better of course).
I have not yet considered this aspect, actually. It is true, that it is nice to not need Singularity when downloading, but de facto we can assume that is it available in most cases. For MacOS, I wrote a virtual machine, so that works as well. I have no/few objections to pull the images instead of using https:// downloads, that is relatively simple to implement, if the priority and image SHA resolution is somehow solved.
Yep, but it is the very same people discussing the official proposal and approving the PRs for the "organic" updates to modules. Unfortunately, the offline capability of a pipeline is essentially an untestable feature, because I can't take a Github runner offline, even though I tried to catch some download issues during a test on the pipeline level. Once a pipeline is released, the code is unfortunately out in the wild and therefore, I wish that offline usability was not an afterthought of module or pipeline updates. |
Writing up the blog post / migration guide for Seqera Containers adoption now. It should massively simplify the task for Do you see any issues with this @MatthiasZepper ? |
I haven't read your blog post yet, so hard to tell if I see issues with it - where do I find the draft? But I already commented on your POC drafts that I really like the idea of having a central YAML/JSON that is being generated upon pipeline release and that contains all container URIs. That will indeed help tons, since one does not have to sift through all modules and configs to find container declarations.
I am starting to lean towards telling people to use an older version of tools for that purpose and just dropping the support entirely. If it needs to be, one could probably retain the old code and switch to it if the requested pipeline revision does not have said YAML yet.
Yes, that would be much appreciated. I did not know that If it was able to return the containers URIs also for a different platform architecture than the one it currently runs on, that would be the cherry on the cake (e.g. I execute it on a Mac and ARM64 architecture, but I would like to get the container URIs for an amd64 Linux pipeline execution). Since it considers the configs, I think that should be possible already...I should try that out. |
@mirpedrol is this one closed then? |
Not fully:
|
Description of the bug
While I got the impression that the adoption of Seqera Community Containers would be tightly linked to the new dynamically created container paths upon pipeline release, this does not hold true.
We now already have 70 modules (mainly local but also some in the module's directory), which have no support by
nf-core (pipelines) download
and may not run offline.This shows mainly by two means:
oras://
protocol is simply ignored. Instead, the Docker version is converted, which takes much longer. Also, the converted image will not be found if the containerEngine is set to 'Singularity', because the SHA of the native Singularity image and the Docker converted image differ.depot.galaxyproject.org-community.wave.seqera.io-library-python_numpy_pip_checkqc_interop-b5301d9801b8e66b.img
. Evidently there is no use for this and if a user copies the files withcp -L
, a lot of unessessary data duplication happens.I would have preferred a synchronized and coordinated update, but since this stuff is now out in the wild, I reckon that
nf-core (pipelines) download
should be adapted to:gather_registries()
hardcodescommunity.wave.seqera.io
similar todepot.galaxyproject.org
, so it is not required to explicitly configure this registry, e.g. with the--library
/-l
argument.singularity_image_filenames()
handles the filenames correctly. Mind the library subfolder, such that the resulting image name should becommunity.wave.seqera.io-library-python_pip_samshee-84a770c9853c725d.img
fororas://community.wave.seqera.io/library/python_pip_samshee:84a770c9853c725d
prioritize_direct_download()
(or write a dedicated function for this purpose) to also prioritizeoras://
over the Docker images and adapt the corresponding test. Mind that the SHA sums differ, and this needs to be accounted for:oras://community.wave.seqera.io/library/python_pip_samshee:84a770c9853c725d
andcommunity.wave.seqera.io/library/python_pip_samshee:e8a5c47ec32efa42
are equivalent. Just ignoring the SHA sums however could prove risky, as there is no way to assure that the correct version of the tool is used then. Mind that it is not uncommon for a pipeline to use tools in different versions within the same release. Therefore, downloading both is possibly the only way forward without an API request to Seqera Community Containers.get_singularity_images()
should sort theoras://
links tocontainers_pull
. Ensure that the output path is correct, the pulling works. Lastly, fix the test.symlink_singularity_images()
handles the images correctly and trims the hard-coded registries prior to building the symlinks. Update the test.Thx
Command used and terminal output
No response
System information
No response
The text was updated successfully, but these errors were encountered: