Loki OTEL ingestion only registers first log in batch #22232

Open
FredrikAugust opened this issue Jan 17, 2025 · 0 comments
Labels
sink: opentelemetry Anything `opentelemetry` sink related type: bug A code related bug.

Comments


A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

I recently rewrote our log aggregation pipeline to use Vector, which works very well, but as I was rolling it out I noticed that the vast majority of our logs weren't showing up in our Grafana Cloud Loki instance. After further investigation it turned out that when sink.otel.protocol.batch.max_events > 1, only the first log event in each batch is registered. I created a reproduction for this: https://github.com/FredrikAugust/vector-otel-loss-repro. When batching is turned off by setting protocol.batch.max_events = 1, it works perfectly.

I suspect this is because, to ingest several logs at the same time, the log events have to be concatenated into a single JSON structure (per the protobuf definition: https://github.com/open-telemetry/opentelemetry-proto/blob/main/opentelemetry/proto/logs/v1/logs.proto#L38).
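To illustrate the suspected cause: each Vector event carries its own `resourceLogs` envelope, and the OTLP/HTTP logs endpoint expects one request body whose `resourceLogs` array holds all the records, rather than several newline-delimited bodies. A minimal sketch of the expected merge (the sample events are hypothetical, not taken from the repro):

```python
import json

# Two Vector events, each wrapped in its own OTLP-shaped envelope
# (as produced by the `otel` remap transform in the config below).
events = [
    {"resourceLogs": [{"resource": {"attributes": []},
                       "scopeLogs": [{"scope": {},
                                      "logRecords": [{"body": {"stringValue": "first"}}]}]}]},
    {"resourceLogs": [{"resource": {"attributes": []},
                       "scopeLogs": [{"scope": {},
                                      "logRecords": [{"body": {"stringValue": "second"}}]}]}]},
]

# What OTLP expects for a batch: ONE body with the envelopes concatenated.
merged = {"resourceLogs": [rl for ev in events for rl in ev["resourceLogs"]]}
body = json.dumps(merged)

# What newline-delimited framing sends instead: one JSON document per event,
# so a server that parses a single OTLP payload only sees the first record.
ndjson = "\n".join(json.dumps(ev) for ev in events)

print(len(merged["resourceLogs"]))  # 2
```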

Configuration

data_dir: ./vector-data-dir
api:
  enabled: true
  address: 127.0.0.1:8686
  playground: true
sources:
  vector:
    type: file
    include: [ ./*.log ]
    ignore_checkpoints: true
transforms:
  otel:
    type: remap
    inputs: [ vector ]
    source: |
      . = parse_json!(.message)
      # Bootstrap structure
      .resourceLogs = [{
        "resource": {
          "attributes": [
            {"key": "k8s.cluster.name", "value": {"stringValue": "${CLUSTER_ENVIRONMENT}"}},
            {"key": "k8s.container.name", "value": {"stringValue": .kubernetes.container_name}},
            {"key": "k8s.namespace.name", "value": {"stringValue": .kubernetes.pod_namespace}},
            {"key": "k8s.pod.name", "value": {"stringValue": .kubernetes.pod_name}},
            {"key": "service.instance.id", "value": {"stringValue": .kubernetes.pod_uid}},
            {"key": "service.name", "value": {"stringValue": .kubernetes.pod_labels."app.kubernetes.io/name"}},
            {"key": "service.version", "value": {"stringValue": .kubernetes.container_image}}
          ]
        },
        "scopeLogs": [{
          "scope": {
          },
          "logRecords": [{
            "timeUnixNano": to_unix_timestamp(now(), unit: "nanoseconds"),
            "observedTimeUnixNano": to_unix_timestamp(now(), unit: "nanoseconds"),
            "body": {
              "stringValue": .message
            },
            "attributes": []
          }]
        }]
      }]
  isolate-otel-envelope:
    type: remap
    inputs: [ routes._unmatched, backend-parsed-logs ]
    source: |
      . = { "resourceLogs": .resourceLogs }
  routes:
    type: route
    inputs: [ otel ]
    route:
      backend: '.kubernetes.container_name == "init" && .kubernetes.pod_namespace == "init"'
  backend-parsed-logs:
    type: remap
    inputs: [ routes.backend ]
    source: |
      parsed_message, err = parse_json(.resourceLogs[0].scopeLogs[0].logRecords[0].body.stringValue)
      if err != null {
          log("Failed to parse following as JSON: " + string!(.resourceLogs[0].scopeLogs[0].logRecords[0].body.stringValue))
          parsed_message = {"msg": .resourceLogs[0].scopeLogs[0].logRecords[0].body.stringValue}
      }
      .resourceLogs[0].scopeLogs[0].logRecords[0].body.stringValue = parsed_message.msg
      .resourceLogs[0].scopeLogs[0].logRecords[0].severityText = "INFO"
      if parsed_message.data != null {
        data = flatten!(parsed_message.data)
        for_each(data) -> |key, value| {
          .resourceLogs[0].scopeLogs[0].logRecords[0].attributes = push(array!(.resourceLogs[0].scopeLogs[0].logRecords[0].attributes), {
            "key": "data." + string(key),
            "value": {"stringValue": to_string!(value)}
          })
        }
      } else if parsed_message.error != null {
        error = flatten!(parsed_message.error)
        for_each(error) -> |key, value| {
          .resourceLogs[0].scopeLogs[0].logRecords[0].attributes = push(array!(.resourceLogs[0].scopeLogs[0].logRecords[0].attributes), {
            "key": "error." + string(key),
            "value": {"stringValue": to_string!(value)}
          })
        }
      }
sinks:
  stdout:
    type: console
    inputs:
      - isolate-otel-envelope
    encoding:
      codec: json
      json:
        pretty: false
  loki-otel:
    type: opentelemetry
    inputs: [ isolate-otel-envelope ]
    protocol:
      batch:
        max_events: 1
      type: http
      uri: "$uri"
      method: post
      request:
        headers:
          content-type: application/json
      encoding:
        codec: json
      framing:
        method: newline_delimited
      auth:
        strategy: basic
        user: '$user'
        password: >-
          ${password:?missing password}

Version

vector 0.44.0 (aarch64-apple-darwin 3cdc7c3 2025-01-13 21:26:04.735691656)

Debug Output

https://gist.github.com/FredrikAugust/7acd38595d1e6ab7af34b842477764bc

Example Data

https://raw.githubusercontent.com/FredrikAugust/vector-otel-loss-repro/refs/heads/master/logs.log

Additional Context

As mentioned, with batching disabled everything works as expected. The stdout sink always showed all the correct logs, regardless of the batch setting.

References

#22188

@FredrikAugust FredrikAugust added the type: bug A code related bug. label Jan 17, 2025
@pront pront added the sink: opentelemetry Anything `opentelemetry` sink related label Jan 17, 2025