Support OpenTF style artifacts #3313
Thanks @caniszczyk, OpenTF artifacts would be a nice addition to Artifact Hub 🙂 Hi @omry-hay 👋 Let me explain a bit how Artifact Hub indexes content. Please note that AH does not host or serve any of the supported artifact kinds; we just periodically collect and index some metadata about them. Any organization or user can add repositories to Artifact Hub. At the moment we support several repository kinds, like Helm charts, OLM operators, etc. The generic tracker relies on a custom metadata file and a flexible directory structure that supports one or more packages per repository, including multiple versions per package if needed. Data unique to the artifact kind can be added in the form of custom annotations. Some examples of how other projects organize the Artifact Hub metadata for their artifacts:
Now we can move to OpenTF specifics 🙂 What artifact kinds would you like to start with? IMHO the best approach would be to add the AH metadata file to each of the providers/modules repositories and let them list themselves on Artifact Hub. As an example, this would mean adding an You can start working on this as soon as you'd like. That way, when we have AH ready for the new kinds, you can list them straight away 😉 Please let us know if you have any questions!
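To make the generic tracker layout described above a bit more concrete, here is a minimal Go sketch that computes where each per-version metadata file would live in a repository. It assumes a `<root>/<package>/<version>/artifacthub-pkg.yml` layout and uses hypothetical package names; the Artifact Hub repositories guide is the authority on the exact structure for each kind.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// metadataFile is the per-version metadata file name assumed for the
// Artifact Hub generic tracker.
const metadataFile = "artifacthub-pkg.yml"

// metadataPaths returns one metadata file path per package version, using
// the assumed layout <root>/<package>/<version>/artifacthub-pkg.yml.
// Packages are visited in sorted order so the output is deterministic.
func metadataPaths(root string, versions map[string][]string) []string {
	names := make([]string, 0, len(versions))
	for name := range versions {
		names = append(names, name)
	}
	sort.Strings(names)

	var paths []string
	for _, name := range names {
		for _, v := range versions[name] {
			// path (not filepath) keeps forward slashes, as in a git repo.
			paths = append(paths, path.Join(root, name, v, metadataFile))
		}
	}
	return paths
}

func main() {
	// Hypothetical packages and versions, for illustration only.
	pkgs := map[string][]string{
		"example-provider": {"1.0.0", "1.1.0"},
		"example-module":   {"0.3.0"},
	}
	for _, p := range metadataPaths("packages", pkgs) {
		fmt.Println(p)
	}
}
```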
Hey @tegioz, thanks for the warm welcome and the very concise and clear intro!
I think I may not have fully understood the above ☝️, though. We have a couple of use cases that we're looking into right now; maybe what you mentioned above could help with them, or maybe a different direction within ArtifactHub could.
We would love your thoughts or pointers as to whether we could extend ArtifactHub to support these use cases, and if so, how.
No worries @roni-frantchi! I think we may be talking about two different goals, maybe there was a misunderstanding 😇
We'd be super happy to help with However, even if Maybe AH could even be some sort of provider for the OpenTF registry in order to discover content to serve (just thinking out loud). Harbor, for example, uses Artifact Hub as a provider to replicate content from. This is implemented on the Harbor side by relying on the Artifact Hub API. Hope this helps 🙂
Thank you @tegioz! Yes, being able to resolve artifacts is indeed our first priority ATM, and we were wondering whether AH considers that to be in scope (as opposed to hosting the artifacts themselves, which is clearly out of scope).
This is also an approach to consider. The thing is, on our end it would mean adapting the CLI, which we'd rather avoid before getting feedback from the community that this is the direction OpenTF should take. That's why we're trying to preserve the resolution API (aside from the target host, due to the T&C rug pull, clearly).
I think I didn't explain myself well, let me try again 😇 When I mentioned that AH could be a provider for the OpenTF registry, I meant for the new tool that is yet to be written. I was seeing the full picture as follows:
Hope this clarifies it 🙂
Got it, thanks @tegioz. I get the upside of using AH to discover providers and modules, of course. On the other hand, if there's a need to deploy a dedicated TF registry service providing a package resolution API that matches what the CLI expects, and given that there is a clear, imposed convention for how packages are resolved (see quote below), what would be the benefit of using AH over letting the registry follow that convention, hitting the GitHub API and fetching the packages stored there?
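For reference, the by-convention resolution being discussed boils down to mapping a registry address to a GitHub repository. A minimal Go sketch, assuming the `terraform-provider-<name>` naming rule for providers and `terraform-<system>-<name>` for modules (the public Terraform Registry's publishing conventions); it ignores hostnames, case normalization and any actual GitHub API calls.

```go
package main

import (
	"fmt"
	"strings"
)

// providerRepoURL maps a provider address "namespace/name" to the GitHub
// repository implied by the terraform-provider-<name> naming convention.
func providerRepoURL(addr string) (string, error) {
	parts := strings.Split(addr, "/")
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", fmt.Errorf("invalid provider address %q", addr)
	}
	return fmt.Sprintf("https://github.com/%s/terraform-provider-%s", parts[0], parts[1]), nil
}

// moduleRepoURL maps a module address "namespace/name/system" to the GitHub
// repository implied by the terraform-<system>-<name> naming convention.
func moduleRepoURL(addr string) (string, error) {
	parts := strings.Split(addr, "/")
	if len(parts) != 3 || parts[0] == "" || parts[1] == "" || parts[2] == "" {
		return "", fmt.Errorf("invalid module address %q", addr)
	}
	return fmt.Sprintf("https://github.com/%s/terraform-%s-%s", parts[0], parts[2], parts[1]), nil
}

func main() {
	p, _ := providerRepoURL("hashicorp/aws")
	m, _ := moduleRepoURL("terraform-aws-modules/vpc/aws")
	fmt.Println(p)
	fmt.Println(m)
}
```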
I was thinking that AH could be useful in that case to discover repositories (e.g. list all OpenTF provider repository URLs, list all the official OpenTF module repositories, the ones published by certain organizations, etc.). As I saw it, it'd just be about fetching the repository URLs and some minimal metadata: the entry point that would let you process the repositories as if they had been provided manually in a config file. So it'd be an optional mechanism to discover available repositories, not an alternative way of processing them. Once you had those URLs, you could read their content from GitHub following those conventions. Artifact Hub lets users and organizations add repositories themselves from the UI, so more and more are added every day in an unattended way. You could achieve the same by maintaining a list of repositories in a file somehow and letting users submit PRs to keep it up to date, for example, or by any other mechanism. It's not a huge win and probably not a priority, but maybe it could be useful at some point, both for the public registry and for private deployments 🙂
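The optional discovery step described above could be as simple as merging whatever a discovery source returns with a manually maintained list, then handing the result to the same processing code. A minimal Go sketch; `Repository`, `mergeRepositories` and the URL normalization rules are hypothetical, not an existing AH or OpenTF API.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Repository is the minimal information a discovery source would need to
// return: enough to locate the repository and process it afterwards.
type Repository struct {
	Name string
	URL  string
}

// normalizeURL makes URLs comparable by trimming a trailing slash and ".git".
func normalizeURL(u string) string {
	u = strings.TrimSuffix(strings.TrimSpace(u), "/")
	return strings.TrimSuffix(u, ".git")
}

// mergeRepositories combines repositories listed manually (e.g. in a config
// file) with those returned by an optional discovery source. Manual entries
// win on conflicts, and the result is sorted by normalized URL.
func mergeRepositories(manual, discovered []Repository) []Repository {
	byURL := make(map[string]Repository)
	for _, r := range discovered {
		byURL[normalizeURL(r.URL)] = r
	}
	for _, r := range manual {
		byURL[normalizeURL(r.URL)] = r // manual entries override discovered ones
	}
	out := make([]Repository, 0, len(byURL))
	for u, r := range byURL {
		r.URL = u
		out = append(out, r)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].URL < out[j].URL })
	return out
}

func main() {
	// Hypothetical inputs: one private repo from a config file, one discovered.
	manual := []Repository{{Name: "my-private-provider", URL: "https://git.example.com/org/private-provider.git"}}
	discovered := []Repository{{Name: "example-provider", URL: "https://github.com/example-org/terraform-provider-example"}}
	for _, r := range mergeRepositories(manual, discovered) {
		fmt.Printf("%s\t%s\n", r.Name, r.URL)
	}
}
```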
Hey @tegioz, thanks, and apologies it took me so long to circle back. Looking beyond our initial alpha and its interim registry (which, as described, is more of a by-convention proxy to GitHub releases and their artifacts), I'd like to explore what you mentioned earlier a little more with you: what are your thoughts about OpenTF enhancing its client side so it can also use the ArtifactHub API, rather than just the official API (which will continue to be supported, for the sake of private registries)?
Would love your thoughts here
No worries 👍
Sure, that'd be great! Just one thing I'd like to share with you first: I've been talking to @cynthia-sg and @mattfarina (the other Artifact Hub maintainers) about this new kind and we all agreed that we'd need to hold on a bit until OpenTF is part of the Linux Foundation / CNCF. Everything listed at the moment on AH is part of a foundation, as it's been a requirement since the beginning of the project, and we'd like to continue honoring this requirement. Hope this makes sense 😇
Yes, that's right. Artifact Hub periodically visits all the registered repositories and automatically indexes new versions as they become available. I think we'll likely add a new tracker source for this artifact kind: the current ecosystem is quite large, and it'd make things easier on your side. But we'll need to think about it a bit more 🙂
Correct. Owners can even claim ownership of repositories in an automated way. So you could add all repositories (let's say under the OpenTF org in AH), and their respective owners could eventually claim ownership if they wished. That'd allow them to request the verified publisher and official badges (when applicable), as well as receive notifications when something goes wrong processing their repo, among other things.
One way of handling this would be to create a new single endpoint for this integration on AH. That endpoint would return a list of the OpenTF repositories listed on AH, including the information you'd need to operate on them. The OpenTF CLI tool could cache this data locally (we'll make sure it's properly cached in our CDN as well) and refresh it when it's older than X hours (something we can agree on once it's ready). We've used this approach with success in other similar cases (please see the issues below for more information):
Absolutely! Thanks for the responses validating my thinking!
I wonder what your thoughts are on that one too 🙏
Maybe I'm missing something, but it sounds like you're suggesting a single endpoint for all OpenTF packages, which sounds like a huge response if it's indeed meant to contain all of the thousands of providers/modules, each with possibly hundreds of versions. On the other hand, from looking at the issues you've shared, it seems that such endpoints were added for specific popular packages, for their clients to use?
Oh, I thought the paragraph that starts with "Correct. Owners can..." was answering that as well 🙂 Yes, that wouldn't be a problem. We'd just need to coordinate their addition a bit, to measure how everything goes as more and more are added.
Well, this would depend on the amount of information needed. If we were able to keep this to a minimum, a list of 6k URLs plus some extra details wouldn't be that big. Including all the versions would probably be too much, but I wonder whether that's really needed, or whether the OpenTF CLI tool could interact directly with the repositories once they've been located. I was thinking of something like the repository name, maybe a short description, and the URL. I was hoping that would be enough of a starting point for the tool to follow on its own. A file like that could be cached locally for a few hours and would allow the tool to use that information even if AH was down.
Oh no, those endpoints are listing all Helm charts actually 😉 They were added for their clients to use, but listing all content available. We did it this way because those tools required either doing a lot of concurrent searches or fetching a lot of packages each time they run. So instead of hitting the AH API thousands of times (per each user of that tool), we prepared the information they needed in a single endpoint (easy for them to fetch, and easy as well for us to serve and cache).
The AH API exposes some endpoints to fetch a single package, or to search for packages or repositories. That's already available. The problem is that, depending on how tools use those endpoints, there could be a significant impact on the AH service. And when we receive a lot of requests, rate limits start kicking in, as we need to keep AH healthy for all users. The OpenTF tool has the potential to have a lot of users, and those users may build tools on top of it, tools that may require hitting the AH API a lot. So given that the datasets in AH are relatively small and that we can afford to add some special endpoints in some situations, relying on a data dump is an option we can consider sometimes 😇 As an example, I've run a test and a gzipped JSON file with the names and URLs of 6k repositories would be ~110KB. This can indeed get bigger as we add more data (e.g. the gzipped Nova dump is about 1.9MB). But some other tools, like the Helm CLI, hit the AH API directly without relying on a data dump, and that's perfectly fine too. That tool has a My intention was to give you more options so that, if the OpenTF tool required heavier use of the AH API for its operations (like the other tools we provided those endpoints for), we could consider providing a data dump for your specific use case if that would help (I wish I had that option sometimes with some APIs we rely on!). On the other hand, if you only expect sporadic searches and fetching some information for a single package, then you could rely on the API as is, as those endpoints are already implemented.
Hey @tegioz, Arel from OpenTofu here (the new name of OpenTF, under the Linux Foundation) 👋🏻 Currently, the OpenTofu project has a set of requirements for the OpenTofu Registry, outlined in this comment. I'd like to continue this discussion here and see how ArtifactHub could help us create this Registry; afterwards, I'll create an RFC issue in OpenTofu with a suggestion as to how to accomplish that. So I'd like to pick up the conversation where it left off. From what I understand, creating a dump endpoint that provides the names and URLs of all repositories is possible on the AH side. However, currently
Having Would there be any possible way of dealing with this requirement of OpenTofu? It seems that, at the very least, we'd need to get the available versions of the providers and modules from AH; we might then be able to generate the download URL by convention from the GitHub release of the repository that AH has directed us to. However, ideally we'd prefer to be able to determine the release artifact download URL easily without resorting to that (as right now, all providers and modules are forced to be on GitHub). Regarding a few of the other requirements here:
Can AH store the public GPG keys used for artifact signature validation? We need to be able to get the GPG key to validate the signature of the downloaded artifact.
If the dump endpoint could include versions, would it also be able to include deprecation warnings/information from somewhere?
Would this amount of load from a single IP address be OK for AH (from the standpoint of a dump endpoint)? Thank you very much for the help 🙏🏻
Hi @RLRabinowitz 👋 The main goal of Artifact Hub is to provide a UI where users can explore and discover multiple kinds of artifacts. However, it was never meant to be a potential SPOF that could block an entire ecosystem. We intentionally don't store any artifacts, just some metadata about them. As of today, if Artifact Hub were down, users shouldn't be blocked from interacting with any of the artifacts/repositories listed on it, and we'd like to keep it that way 🙂 That said, Artifact Hub could provide an alternative to the UI side of the Terraform Registry for OpenTofu. But this would be much easier to achieve the other way around: AH indexing content from OpenTofu-based registries, instead of AH collecting information from the GH repositories. The main reason for this is that we are not immune to GH rate limits either 😅 If we were obtaining this metadata from you, this wouldn't be a problem (for us 😇). As you mentioned, the Terraform Registry requires repositories to be on GitHub and, IIRC, relies on GH webhook notifications for updates. Artifact Hub does not require repositories to be hosted on GitHub, so publishers can use GitLab, Bitbucket, Gitea, etc. For git-based repositories, we use a poll model that relies on some git operations to detect changes and process repositories when needed. So if we were processing the GitHub repositories ourselves, having a very large number of them on the same external provider could be problematic for us as well. As mentioned in a previous comment, there are around 6k TF providers/modules repositories, so if we were to process them, we'd need to roll this out progressively over time. The suggestion of adding a dump endpoint was precisely to provide you with an API that was less likely to be rate limited. Our top priority is to keep the web application available at https://artifacthub.io/ up and running for all users.
So when we detect any API usage that has a considerable impact on the service, we're forced to apply rate limits. This is less of a problem for a dump endpoint, as we can cache aggressively at the CDN level, and it becomes a cost problem instead (something we may need to deal with as well, and rate limit at some point). But when I proposed this dump endpoint, it was with the intention of providing an optional way of discovering repositories that the OpenTofu CLI tool would then process itself. As an example, the Helm CLI has a Hope this makes sense (and helps!) 🙂
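The git-based poll model mentioned above could look roughly like this: ask the remote for its current HEAD with `git ls-remote` (no clone needed) and only reprocess a repository when that commit differs from the one recorded last time. This is a minimal Go sketch assuming a `git` binary on `PATH`; it is not Artifact Hub's actual implementation, and `remoteHead`, `changed` and the `lastSeen` map are hypothetical names.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// parseHead extracts the commit hash for HEAD from `git ls-remote` output,
// whose lines have the form "<sha>\t<ref>".
func parseHead(out string) (string, error) {
	for _, line := range strings.Split(strings.TrimSpace(out), "\n") {
		fields := strings.Fields(line)
		if len(fields) == 2 && fields[1] == "HEAD" {
			return fields[0], nil
		}
	}
	return "", fmt.Errorf("HEAD not found in ls-remote output")
}

// remoteHead asks the remote for its current HEAD without cloning it.
func remoteHead(url string) (string, error) {
	out, err := exec.Command("git", "ls-remote", url, "HEAD").Output()
	if err != nil {
		return "", err
	}
	return parseHead(string(out))
}

// changed reports whether a repository should be reprocessed, given the
// commit hash recorded the last time it was processed.
func changed(lastSeen map[string]string, url, head string) bool {
	return lastSeen[url] != head
}

func main() {
	// Canned `git ls-remote` output, so this demo needs no network access;
	// a real poller would call remoteHead(url) for each repository instead.
	sample := "3f2c1e0a9b8d7c6e5f4a3b2c1d0e9f8a7b6c5d4e\tHEAD\n"
	head, err := parseHead(sample)
	if err != nil {
		panic(err)
	}
	// Hypothetical record of the last commit processed per repository.
	lastSeen := map[string]string{"https://example.com/org/repo.git": "0000000000000000000000000000000000000000"}
	fmt.Println("reprocess:", changed(lastSeen, "https://example.com/org/repo.git", head))
}
```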
Thank you @tegioz 😄 So, the use-case of In So, listing the versions is an important part of
So yeah, if such a service / endpoint would be down, the main flow of
That's an interesting approach for a different RFC, where we statically host information about providers and modules (repositories, their versions, download links, and other metadata). The documentation for that RFC is WIP, and maybe AH could be a nice option there. In such a case, would users be able to use the AH UI to find information about the providers (and their docs)? Would it still require the providers to be added manually (or via API) to AH?
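If versions were hosted statically as described, the client would still need to pick one. A rough Go sketch, assuming a hypothetical JSON index format (field names are illustrative only) and handling only the two-part pessimistic constraint `~> MAJOR.MINOR` from the Terraform version constraint syntax, which OpenTofu inherits: it selects the highest version that is `>= MAJOR.MINOR.0` and `< (MAJOR+1).0.0`. Pre-release versions are not supported.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// providerIndex is a hypothetical static document listing a provider's
// versions and download links; the field names are illustrative only.
type providerIndex struct {
	Repository string `json:"repository"`
	Versions   []struct {
		Version     string `json:"version"`
		DownloadURL string `json:"download_url"`
	} `json:"versions"`
}

// parseSemver splits "X.Y.Z" into integers (no pre-release support here).
func parseSemver(v string) ([3]int, bool) {
	var out [3]int
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return out, false
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return out, false
		}
		out[i] = n
	}
	return out, true
}

// semverLess compares two parsed versions component by component.
func semverLess(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// newestPessimistic returns the highest version allowed by "~> major.minor",
// i.e. >= major.minor.0 and < (major+1).0.0. Unparseable versions are skipped.
func newestPessimistic(versions []string, major, minor int) (string, bool) {
	best, found := "", false
	var bestV [3]int
	lower := [3]int{major, minor, 0}
	for _, s := range versions {
		v, ok := parseSemver(s)
		if !ok || semverLess(v, lower) || v[0] != major {
			continue
		}
		if !found || semverLess(bestV, v) {
			best, bestV, found = s, v, true
		}
	}
	return best, found
}

func main() {
	doc := `{"repository": "https://github.com/example-org/terraform-provider-example",
	  "versions": [{"version": "1.1.0", "download_url": "https://example.com/1.1.0.zip"},
	               {"version": "1.4.2", "download_url": "https://example.com/1.4.2.zip"},
	               {"version": "2.0.0", "download_url": "https://example.com/2.0.0.zip"}]}`
	var idx providerIndex
	if err := json.Unmarshal([]byte(doc), &idx); err != nil {
		panic(err)
	}
	var versions []string
	for _, v := range idx.Versions {
		versions = append(versions, v.Version)
	}
	if v, ok := newestPessimistic(versions, 1, 2); ok {
		fmt.Println("selected", v) // selected 1.4.2
	}
}
```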
No worries 😄
TBH, this is a position we'd rather not be in for any supported artifact kind. In this particular case, we could be rate limited at any point given the number of repositories expected; we don't have any guarantees, so we wouldn't be able to provide them either. And IMHO, building and supporting a solution that may be critical for many organizations on top of these uncertainties wouldn't be right. AH can't be a potential blocker for the OpenTofu ecosystem.
Yes, users would be able to use the AH UI to find information about the providers, read their docs, etc. We could also display the warnings you mentioned in your previous comment, or even other OpenTofu-specific views. The easiest way to achieve this would be, once you've collected all the information about the available providers and modules, to generate (and keep up to date over time) the required AH metadata files for all of them in a git repository (this can be easily automated). Artifact Hub would periodically visit the metadata repository and index new/updated/deleted content as needed. This way, you could create an OpenTofu organization in AH and publish that content under it. There'd be no need to list providers individually, but users/orgs would have the ability to do so if they wished. You can see more details about how this would work in my first comment in this issue. IMHO this is the right way to integrate OpenTofu artifacts in Artifact Hub. This is something we can have ready pretty quickly, so once you are ready on your side, please let us know and we'll get it done 😇
OK. So that would mean that we'd need to go down a different route for the "discovery" part of Now, regarding using the AH UI:
This is possible, especially for first-party providers, but a lot of providers/modules are 3rd party and would require someone to contribute that. I don't think we'd get many 3rd party providers onto AH this way, at least not initially. Would using a specialized tracker source help on that front? (Though I assume you'd prefer to use the generic tracker.)
This approach is more likely IMO, as it doesn't require every 3rd party provider or module to handle the AH integration itself. However, would that mean those repositories wouldn't be able to apply for verified and official status? Also, I have a question regarding documentation. Today, the documentation in AH is basically the README.md, right? In Thanks again for all the help 🙏🏻
I agree, I think it'd be best if you generated the metadata for all providers once you have processed them. A specialized tracker could talk to an API provided by the OpenTofu registry, but as you said, we'd prefer to use the generic tracker, as it makes everything easier to maintain down the line. We support quite a few artifact kinds at the moment and the number keeps growing, so we need to try to keep things as simple as possible 😇 Regarding the verified and official status, we'll need to think about something for this. It doesn't help end users when packages that should be official aren't marked as such, so this is something we need to find a solution for, and not just for OpenTofu. We have a similar situation with OLM operators, where we get most of them from a single repository as well. The README file is quite important in AH due to the way the UI displays it. But we'd be open to including something specific for OpenTofu that allows displaying some extra information in a special way. We've done this for some artifact kinds, like Helm. Some examples: https://artifacthub.io/packages/helm/artifact-hub/artifact-hub?modal=values-schema