Self Service Log Ingestion #3518
Comments
We see that we can use pod logs but do we want to force customers to create pod logs for log ingestion? Can we allow them to collect logs at the namespace level (with annotations and so on)? |
How much effort is it to create PodLogs for customers? I would love to have some label based stuff where we can just say "add this label and it's automatically ingested" because that makes it quite flexible and intuitive. It will also help us with multi-tenancy I believe. |
The issue I have is not that pod logs don't make sense, but I would think they should only be used on really rare occasions. Ideally, an annotation/label on the pod or namespace should be enough to set the tenant for most logs, and that would make profiles and traces collection easier. I would only use pod logs if the pod needs a custom pipeline imo. What I'm not sure about is whether we can get all logs for a namespace when it's annotated, unless the pod has its own label or is equipped with a pod log? I would think we could do something with drops but I'm not sure. Maybe @TheoBrigitte knows if log sources can exclude data taken from other sources? |
When using Alloy as the logging agent installed within a workload cluster, we configure it in a way which allows retrieving logs from specific namespaces and/or pods. This solution makes use of 2 different PodLogs (with mutual exclusion):
Those PodLogs would be configured by us and customers would only deal with labels on their resources. With this solution we might face a problem with resource usage on the kubelets: as all the log traffic would go through the Kubernetes API, the network and CPU usage on the kubelet might be problematic, especially in cases where many/all pods would be monitored. I opened an upstream issue requesting to add the namespace metadata within the |
Did you take a look at this? https://grafana.com/docs/alloy/latest/reference/components/loki/loki.source.kubernetes/ |
Looking at it, this would be simpler than the currently used |
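For reference, a minimal sketch of what a loki.source.kubernetes pipeline could look like; the target discovery and the Loki endpoint here are illustrative and not the exact configuration discussed:

// Discover pods through the Kubernetes API.
discovery.kubernetes "pods" {
  role = "pod"
}

// Tail the discovered pods' logs via the Kubernetes API server
// (no DaemonSet or host file access required).
loki.source.kubernetes "pods" {
  targets    = discovery.kubernetes.pods.targets
  forward_to = [loki.write.default.receiver]
}

// Illustrative Loki endpoint, matching the example used later in this thread.
loki.write "default" {
  endpoint {
    url = "https://loki.svc/loki/api/v1/push"
  }
}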
I quite like that we do not have to run it as a daemonset though :D But why do you not have the namespace? I thought those should give you |
Oh you meant namespace labels, nevermind |
Using a combination of loki.source.podlogs and loki.relabel it is possible to set the tenant id from a pod label. In the following example the tenant id is taken from the pod label foo. Here is the config and the PodLogs resource I used:
// Collect logs from pods selected by the PodLogs resources, via the Kubernetes API.
loki.source.podlogs "default" {
  forward_to = [loki.relabel.default.receiver]
}

// Copy the value of the "foo" label into the reserved __tenant_id__ label,
// then drop the original "foo" label from the streams.
loki.relabel "default" {
  forward_to = [loki.write.default.receiver]

  rule {
    action        = "replace"
    source_labels = ["foo"]
    target_label  = "__tenant_id__"
    replacement   = "$1"
    regex         = "(.*)"
  }

  rule {
    action = "labeldrop"
    regex  = "^foo$"
  }
}

// Push the resulting streams to Loki.
loki.write "default" {
  endpoint {
    url = "https://loki.svc/loki/api/v1/push"
  }
}
apiVersion: monitoring.grafana.com/v1alpha2
kind: PodLogs
metadata:
  name: pod-tenant-id-from-label
spec:
  # Select all pods in all namespaces.
  selector: {}
  namespaceSelector: {}
  relabelings:
    # Copy the pod label "foo" onto the log streams as the "foo" label.
    - action: replace
      sourceLabels: ["__meta_kubernetes_pod_label_foo"]
      targetLabel: "foo"
      replacement: "$1"
      regex: "(.*)"

It is also possible to set the tenant id using the |
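The sentence above is cut off; for completeness, one other way to set the tenant id (an assumption about what was meant, not a confirmed quote) is the tenant_id argument on the loki.write endpoint:

// Sketch: apply a fixed tenant to everything pushed through this endpoint.
// The tenant value here is a hypothetical example.
loki.write "default" {
  endpoint {
    url       = "https://loki.svc/loki/api/v1/push"
    tenant_id = "example-tenant"
  }
}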
Current prototype idea

Improvements we want to explore:
|
Using |
The potential new join feature would not help in our case as this would only allow enriching metadata in the |
What if you enrich then drop logs instead of trying to discover only those we should "scrape"? |
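As a sketch of the "enrich then drop" idea being asked about (assuming the pod label foo from the earlier example has already been attached to the log entries), entries without the label could be dropped in a relabel step; this illustrates the question, not the configuration actually used:

loki.relabel "filter" {
  forward_to = [loki.write.default.receiver]

  // Keep only entries that carry a non-empty "foo" label; everything else is dropped.
  rule {
    action        = "keep"
    source_labels = ["foo"]
    regex         = ".+"
  }
}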
There would still be no way to match the resulting targets against a local file as the |
|
if you join based on the extracted labels from |
The namespace metadata is only present when using the
|
We can't load components like The way to load dynamic configuration into Alloy is via modules. A module is described by a |
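For context, here is a rough sketch of how a module and its import could look; the file path, module name and the pipeline inside are hypothetical, and the truncated sentence above may have described a different mechanism:

// module.alloy (hypothetical file shipped as dynamic configuration):
// a custom component exposing a log receiver that strips the "foo" label.
declare "log_pipeline" {
  argument "write_to" {}

  loki.relabel "default" {
    forward_to = argument.write_to.value

    rule {
      action = "labeldrop"
      regex  = "^foo$"
    }
  }

  export "receiver" {
    value = loki.relabel.default.receiver
  }
}

// Main configuration: import the module file and instantiate the custom component.
import.file "custom" {
  filename = "/etc/alloy/module.alloy"
}

custom.log_pipeline "default" {
  write_to = [loki.write.default.receiver]
}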
It is currently not possible to use a Kyverno policy to label the kube-system namespace, as Kyverno lacks permissions to do so
|
Just linking the Alloy internal |
Load testing story

I used Loki canary to load test the logging pipeline and see how the current setup with Promtail compares with Alloy logs. Logs were only ingested from pods in the kube-system namespace. I looked at the Kubernetes API server pods usage and network traffic. Loki canary was started at 14:08 UTC, generating approximately 10k log lines per minute. Kubernetes API server usage stayed the same when using Promtail (which scrapes logs from disk). Usage is ~10 times higher when using Alloy (tailing logs through the API server).

Loki stats

Comparison of data being ingested by Loki
Kube API server pods

Comparison of API server pod resources (memory is not relevant and stays the same).
Node resources usage

Adding this for information but nothing relevant here, resource usage stayed roughly the same. |
Since the last results showed concerning performance when tailing logs through the Kubernetes API server, I experimented with a new solution which allows fetching logs from files on disk and discovering targets using labels. This solution requires an additional container which updates the list of labeled namespaces directly in Alloy's configuration. Here is a high level overview of what this solution looks like. The additional container is a 4-line shell script described here https://github.com/giantswarm/logging-operator/pull/235/files#diff-405f451506c2146b4cf915863a848d9a42f39a3b292a9b6f9ada78b4eac32598R57-R61 |
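A rough sketch of what the file-based collection could look like; the path glob and the hard-coded namespace stand in for the list that the extra container would keep updated in the configuration:

// Match pod log files on the node's disk for a labeled namespace.
// The sidecar described above would rewrite this list of namespaces.
local.file_match "labeled_namespaces" {
  path_targets = [
    {__path__ = "/var/log/pods/kube-system_*/*/*.log", namespace = "kube-system"},
  ]
}

// Tail the matched files directly from disk instead of going through
// the Kubernetes API server.
loki.source.file "pods" {
  targets    = local.file_match.labeled_namespaces.targets
  forward_to = [loki.write.default.receiver]
}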
Does this sidecar container work well with clustering? Also, what do you see as concerning? A bit more CPU looks okay to me because with your tests, you fetch 10k log lines from 6 nodes and not 230 like on bigger clusters, right? So the metrics you get will definitely be higher than the actual usage. |
I haven't tested this new container with clustering mode, but it should work fine with it.
Yeah, maybe in the end performance is not so bad. Anyway, with this solution there's no way to override the tenant id, so I am going with solution 1 (PodLogs). |
Here is the source for the graph above: self-service-logging-2024-10-21-1114.excalidraw.gz |
We are good to go here
This will be available from CAPI v30.0.0 releases
Last point: announce this to everyone including customers. How do we proceed? Do we include this in the v30.0.0 release announcement? |
Yeah would be good to have it in the v30.0.0 release announcement. Do we have release notes where we add that? |
I'll do the post and make sure we have this also in the v30.0.0 release announcement |
We can only craft the release announcement when the next releases are being worked on. Tenet will ping when this happens. I added a todo as a reminder here. |
Taking the release announcement out of the scope for now, to unblock this ticket. The release will be handled separately. |
Motivation
We want customers to be able to ingest whatever data is relevant for them in a self-service way, and this also includes logs. So we need to make sure we have a way for them to add their own data sources for logs.
Todo
Outcome