-
Notifications
You must be signed in to change notification settings - Fork 403
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ArgoCD healthcheck and GrafanaDashboard #1444
Comments
Hi @zelig81 , that is really nice. When it gets merged we are planning to change the other CRs to match how status is managed in the alert. How exactly they will look like we haven't decided. So in short, your issue is in our TODO, we just haven't created any issue for it. So this will be a great way for us to keep track of it. Thank you. When those changes get merged, it would be great if we can work on custom health checks for ArgoCD and write some docs about it. If you are interested in contributing the new status solution to the existing CRs that would be great. Else someone maintainer will take a stab at it in the future. |
I would be delighted to assist with this. Just please let me know the format of the conditions. |
In general, we just need to understand which status/condition will be relevant for which of 4 states of the resource in terms of ArgoCD (Progressing; Healthy; Degraded; Suspended(if relevant) ) It will be most nice to have all the types of CR to follow this convention (Grafana, GrafanaFolder, GrafanaDatasource, GrafanaDashboard, GrafanaAlertrule...). Although it can look a little bit strange, it will allow us to make a single ArgoCD healthcheck to rule them all :) |
@zelig81 we haven't come that far to have given it any thought on how the status would look like. It would be great if you took a look at the Alert implementation and see how it's done there and see how if you think it would fit for the other CRs. We are also planning to go towards finalizers in the other CRs which should remove the need for some of the existing statuses. It's probably a good idea to do that at the same time. But I do understand that it will make this refactor allot bigger. So please come with a suggestion on how you think it could be done, and we will try to provide feedback as quickly as possible. The maintainers also have weekly bug scrub meetings, and if you want to join and talk about this refactoring, you are also free to do that. Btw, I think it sounds like a great idea to have a healtcheck in the upstream argo repo around this. |
let's continue the conversation on slack |
Hi, i don't know the status of this issue but i got a few information to add as of v5.14: I noticed that when deploying resources with argoCD if the resource fails to sync it will keep the status field empty on argo and are marked as healthy and synced, in this case the CR also shows status: {} The only resource that seems to be working is GrafanaNotificationPolicy, showing the status degraded and the error message when there is an issue |
Is your feature request related to a problem? Please describe.
I created an ArgoCD healthcheck to show if ArgoCD app (and related GrafanaDashboard resource) is OK or not. If the Dashboard has some issue, it is not replicated in the status of the GrafanaDashboard object. It is just not resynced.
Here is the error message in the grafana-operator that I want to be propagated to the object (here: the dashboard json is broken on purpose - title changed to "title-break"):
Describe the solution you'd like
I propose to update
status.lastMessage
with this error message (like it is done in GrafanaDatasource: "error 400: Dashboard title cannot be empty")Describe alternatives you've considered
Meanwhile, I'm waiting for 5 minutes since the last resync and set the status of the healthcheck about failure to sync for more than 5 minutes and a plea to check in grafana-operator logs.
Additionally - it can be nice to make the status of the object to following usual k8s resources convention like:
Additional context
My current ArgoCD healthcheck:
The text was updated successfully, but these errors were encountered: