OTEP: Recording exceptions as log based events #4333

lmolkova · 2024-12-10T02:44:05Z

Related to open-telemetry/semantic-conventions#1536

Changes

Recording exceptions as span events is problematic since it

ties recording exceptions to tracing/sampling
duplicates exceptions recorded by instrumented libraries on logs
does not leverage log features such as typical log filtering based on severity

This OTEP provides guidance on how to record exceptions using OpenTelemetry logs focusing on minimizing duplication and providing context to reduce the noise.

If accepted, the follow-up spec changes are expected to replace existing (stable) documents:

Related issues Document generic approach for span status (code + description) and exception event when instrumented code throws semantic-conventions#1536
~~Related OTEP(s) #~~
Links to the prototypes (when adding or changing features)
CHANGELOG.md file updated for non-trivial changes
~~spec-compliance-matrix.md updated if necessary~~

oteps/4333-recording-exceptions-on-logs.md

pellared · 2024-12-10T05:27:14Z

I think this is a related issue:

Define whether instrumentations can emit non-event log records #4234

oteps/4333-recording-exceptions-on-logs.md

carlosalberto · 2025-01-07T00:01:12Z

A small doubt:

If this instrumentation supports tracing, it should capture the error in the scope of the processing span.

Although (I think) it's not called out, I'm understanding exceptions should now be explicitly reported as both 1) Span.Event and 2) Log/Event? i.e. coding wise you should do this:

currentSpan.recordException(e);
logger.logRecordBuilder
    .addException(e);

Is this the case?

jsuereth

Overall I'm very supportive. Just some nits and one mitigation I'd like to see called out/addressed.

jsuereth · 2025-01-07T13:10:14Z

oteps/4333-recording-exceptions-on-logs.md

+
+5. An error should be logged with appropriate severity depending on the available context.
+
+   - Errors that don't indicate any issue should be recorded with severity not higher than `Info`.


What's an example of this? I'm struggling to understand when this would be used.

Most popular is exception-driven logic - you check that something is available and it throws if it's not. You try to delete something and it throws if someone concurrently deleted it.
You send a request, it times-out, you retry and get 'already exists', etc.

It's not always the best development style, but there are plenty of real-life examples. Adding one to the text.

oteps/4333-recording-exceptions-on-logs.md

jsuereth · 2025-01-07T13:12:13Z

oteps/4333-recording-exceptions-on-logs.md

+4. It's not recommended to record the same error as it propagates through the stack trace or
+   attach the same instance of exception to multiple log records.
+
+5. An error should be logged with appropriate severity depending on the available context.


I like the goal of the taxonomy, but think we need to crisp up the language around Info/Warning

makes sense, updated each severity bullet with examples and some criteria related to error impact. PTAL

jsuereth · 2025-01-07T13:23:58Z

oteps/4333-recording-exceptions-on-logs.md

+> OTel should provide API like `setException` when creating log record that will record only necessary information depending
+> on the configuration and log severity.
+
+It should not be an instrumentation library concern to decide whether exception stack trace should be recorded or not.


Two things:

SHOULD is normative, so please capatilize (and I think IS a normative statement here).

This may not be language neutral, so I think SHOULD is the right guidance here. For example, in Rust, stack traces are something you can opt-in on an error. They leave some details to libraries (see Rust Backtrace/Source capabilities on https://docs.rs/thiserror/latest/thiserror/ e.g. or C++ [prior to 23] https://github.com/jeremy-rifkin/cpptrace).
Additionally, in some highly green-thread/async APIs, I've seen custom stack traces created (e.g. Scala's ZIO where they try to preserve logical stack when physical stack is a confusing mess of work-stealing green-threads. We should allow these to interact with exception reporting in OTEL in some fashion.

I agree with the sentiment, I'd expand the wording though to allow languages like Rust/C++ (and Java ecosystem) to provide stack trace compatibility with their library ecosystem.

updated to use normative language. Also added

The signature of the method is to be determined by each language
and can be overloaded as appropriate including ability to customize stack trace
collection.

It MUST be possible to efficiently set exception information on a log record based on configuration
and without using the setException method.

jsuereth · 2025-01-07T13:28:37Z

oteps/4333-recording-exceptions-on-logs.md

+     with appropriate severity (or stop reporting them).
+   - We should provide opt-in mechanism for existing instrumentations to switch to logs.
+
+2. Recording exceptions as log-based events would result in UX degradation for users


I think we should also call out the fact that now we have two channels of exporting/batching/recording information of exceptions and Traces. In this new world, you may see a trace before an exception or vice versa, and one may be dropped where the other is not.

We probably need some other mitigatioin should that requiring knowledge of an exception event under a Span is no longer needed (e.g. more aggressively using Span.status and attributes around "transient failures" as we discussed in Semconv SIG.

Added a clarification in the OTEP that logs are intended to replace span events (gracefully). I.e. there should be just one channel of communication at a time.

oteps/4333-recording-exceptions-on-logs.md

Co-authored-by: Joao Grassi <5938087+joaopgrassi@users.noreply.github.com>

Co-authored-by: Trask Stalnaker <trask.stalnaker@gmail.com>

…r span events

oteps/4333-recording-exceptions-on-logs.md

adriangb · 2025-01-24T11:49:24Z

Doesn't this potentially make data much harder to query?
I can't possibly be familiar with all of the ways backends store and make data queryable so I will take our backend as an example. We essentially store the data in a relational table and make it queryable via SQL.

If the exception is on the span itself I can do something like:

select *
from records
where span_name = 'db query' and has_exceptions(events)

Whereas if it's emitted as a (non span) event (aka a log?) we would have to either re-assemble during ingest (very difficult) or it ends up as it's own row, which means the query would have to be something like:

select *
from records r1
where span_name = 'db query' and exists (
  select
  from records r2
  where r1.trace_id = r2.trace_id and r2.parent_span_id = r1.span_id and has_exceptions(r2.events)
)

pellared · 2025-01-24T12:30:27Z

Doesn't this potentially make data much harder to query? I can't possibly be familiar with all of the ways backends store and make data queryable so I will take our backend as an example. We essentially store the data in a relational table and make it queryable via SQL.

If the exception is on the span itself I can do something like:
select *
from records
where span_name = 'db query' and has_exceptions(events)
Whereas if it's emitted as a (non span) event (aka a log?) we would have to either re-assemble during ingest (very difficult) or it ends up as it's own row, which means the query would have to be something like:
select *
from records r1
where span_name = 'db query' and exists (
  select
  from records r2
  where r1.trace_id = r2.trace_id and r2.parent_span_id = r1.span_id and has_exceptions(r2.events)
)

How is this related to the OTEP? Many backends already show log records related to the given trace. It looks like a complain to the trace-log correlation. Or even https://github.com/open-telemetry/opentelemetry-specification/blob/main/oteps/0265-event-vision.md#relationship-to-span-events (different OTEP).

re-assemble during ingest (very difficult)

I guess the difficulty depends on the given backend design.

pellared · 2025-01-24T12:52:05Z

oteps/4333-recording-exceptions-on-logs.md

+   - OpenTelemetry API and/or SDK in the future may provide opt-in span events -> log-based events conversion,
+     but that's not enough - instrumentations will have to change their behavior to report errors
+     as logs with appropriate severity.
+   - We should provide opt-in mechanism for existing instrumentations to switch to logs.


This is a limited mitigation that will not work for instrumentations that we do not own.

pellared · 2025-01-24T12:56:16Z

oteps/4333-recording-exceptions-on-logs.md

+   for any component following existing [exception guidance](/specification/trace/exceptions.md).
+
+   **Mitigation:**
+   - OpenTelemetry API and/or SDK in the future may provide opt-in span events -> log-based events conversion,


In the long term, errors recorded on logs will replace span events
(according to Event vision OTEP).

I think this mechanism will be necessary for migrating from Span Events to Exception Log Records and we should probably have a separate OTEP for this.

adriangb · 2025-01-24T13:11:01Z

How is this related to the OTEP? Many backends already show log records related to the given trace. It looks like a complain to the trace-log correlation. Or even https://github.com/open-telemetry/opentelemetry-specification/blob/main/oteps/0265-event-vision.md#relationship-to-span-events (different OTEP).

The OTEP is proposing to move data from the span to related logs, so understanding the impact of this on user experience is important. I guess you could say my question is more in general about https://github.com/open-telemetry/opentelemetry-specification/blob/main/oteps/0265-event-vision.md#relationship-to-span-events but I think exceptions are a particularly important tipping point.

Many backends already show log records related to the given trace. It looks like a complain to the trace-log correlation

We do as well. But do any of them support doing this programmatically? It's one thing to have the backend know how to retrieve that data, it's another thing to do it programatically e.g. to build a dashboard of error rate by endpoint. If they do I'd like to see what that query looks like before and after this proposal from a general user's perspective.

Kludex · 2025-01-24T16:03:55Z

oteps/4333-recording-exceptions-on-logs.md

+   **Mitigation:**
+   - OpenTelemetry API and/or SDK may provide span events -> log events conversion.
+     See also [Event vision OTEP](./0265-event-vision.md#relationship-to-span-events).


Is the mitigation supposed to be long term instead of short term for trace-only backends then?

opentelemetry-specification/oteps/0265-event-vision.md

Lines 41 to 46 in c55ed5c

### Relationship to Span Events

Events are intended to replace Span Events in the long term.

Span Events will be deprecated to signal that users should prefer Events.

Interoperability between Events and Span Events will be defined in the short term.

It seems this section is proposing a long term mitigation, but the link provided specifies that it should be a short term solution.

adriangb · 2025-01-24T16:04:42Z

I also want to confirm what the proposed behavior is based on this scenario. It was not clear to me from reading the OTEP.

from opentelemetry.sdk.resources import SERVICE_NAME, Resource

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter
from opentelemetry.sdk.trace import ReadableSpan

resource = Resource(attributes={SERVICE_NAME:"test"})

def formatter(span: ReadableSpan) -> str:
    out: list[str] = []
    out.append(f"{span.name}")
    for event in span.events:
        if event.name == "exception":
            attributes = event.attributes or {}
            stacktrace = attributes.get("exception.stacktrace") or ""
            out.append(f"  {event.name}: {stacktrace}")
    out.append("")
    return "\n".join(out)

trace_provider = TracerProvider(resource=resource)
processor = BatchSpanProcessor(ConsoleSpanExporter(formatter=formatter))
trace_provider.add_span_processor(processor)
trace.set_tracer_provider(trace_provider)

tracer = trace.get_tracer("test")



def inner() -> None:
    with tracer.start_as_current_span("inner"):
        raise ValueError('the message')

def outer() -> None:
    with tracer.start_as_current_span("outer"):
        inner()


try:
    with tracer.start_as_current_span("root"):
        outer()
except Exception as e:
    pass  # keep the output clean

Currently this gives (cleanup up a bit):

inner
  File "/Users/adriangb/GitHub/platform/test.py", line 32, in inner
    raise ValueError('the message')
ValueError: the message

outer
  File "/Users/adriangb/GitHub/platform/test.py", line 36, in outer
    inner()
  File "/Users/adriangb/GitHub/platform/test.py", line 32, in inner
    raise ValueError('the message')
ValueError: the message

root
  File "/Users/adriangb/GitHub/platform/test.py", line 41, in <module>
    outer()
  File "/Users/adriangb/GitHub/platform/test.py", line 36, in outer
    inner()
  File "/Users/adriangb/GitHub/platform/test.py", line 32, in inner
    raise ValueError('the message')
ValueError: the message

The point is that each span has the exception recorded on it, and they each have a different stack trace.

Under this new paradigm, would a log/event be emitted for each span the exception bubbles up through? Or would only a single log/event be emitted for the first span it bubbles though? The current version of this PR states:

It's NOT RECOMMENDED to record the same error as it propagates through the call stack, or to attach the same instance of an exception to multiple log records.

If this means that only a single log would be emitted we'd loose information on the fact that this exception bubbled up through the other spans. Even if the span's status code and message get set, I still don't have the stack trace that bubbled up through the event, meaning information is lost relative to the current system.

trask · 2025-01-24T15:53:28Z

oteps/4333-recording-exceptions-on-logs.md

+> [!NOTE]
+> Throughout this OTEP, the terms exception and error are defined as follows:
+>
+> - *Error* refers to a general concept describing any non-success condition,
+>   which may manifest as an exception, non-successful status code, or an invalid
+>   response.
+> - *Exception* specifically refers to runtime exceptions and their associated stack traces.


trask · 2025-01-24T15:53:52Z

oteps/4333-recording-exceptions-on-logs.md

+
+## Motivation
+
+Today OTel supports recording *exceptions* using span events available through Trace API. Outside of OTel world,


Suggested change

Today OTel supports recording *exceptions* using span events available through Trace API. Outside of OTel world,

Today OTel supports recording *exceptions* using span events available through the Trace API. Outside the OTel world,

trask · 2025-01-24T15:55:13Z

oteps/4333-recording-exceptions-on-logs.md

+## Motivation
+
+Today OTel supports recording *exceptions* using span events available through Trace API. Outside of OTel world,
+*errors* are usually recorded by user apps and libraries using logging libraries


Suggested change

*errors* are usually recorded by user apps and libraries using logging libraries

*errors* are usually recorded by user apps and libraries by using logging libraries,

trask · 2025-01-24T15:55:31Z

oteps/4333-recording-exceptions-on-logs.md

+
+Today OTel supports recording *exceptions* using span events available through Trace API. Outside of OTel world,
+*errors* are usually recorded by user apps and libraries using logging libraries
+and may be recorded as OTel logs via logging bridge.


Suggested change

and may be recorded as OTel logs via logging bridge.

and may be recorded as OTel logs via a logging bridge.

trask · 2025-01-24T15:56:31Z

oteps/4333-recording-exceptions-on-logs.md

+*errors* are usually recorded by user apps and libraries using logging libraries
+and may be recorded as OTel logs via logging bridge.
+
+Errors recorded on logs have the following advantages over span events:


Suggested change

Errors recorded on logs have the following advantages over span events:

Using logs to record errors has the following advantages over using span events:

trask · 2025-01-24T16:10:52Z

oteps/4333-recording-exceptions-on-logs.md

+      - Background job terminates with an exception
+      - HTTP framework error handler catches exception thrown by the application code.
+
+        Note: some frameworks use exceptions as a communication mechanism when request fails. For example,


Suggested change

Note: some frameworks use exceptions as a communication mechanism when request fails. For example,

Note: some frameworks use exceptions as a communication mechanism when a request fails. For example,

trask · 2025-01-24T16:11:02Z

oteps/4333-recording-exceptions-on-logs.md

+      - HTTP framework error handler catches exception thrown by the application code.
+
+        Note: some frameworks use exceptions as a communication mechanism when request fails. For example,
+        Spring users can throw [ResponseStatusException](https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/web/server/ResponseStatusException.html)


Suggested change

Spring users can throw [ResponseStatusException](https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/web/server/ResponseStatusException.html)

Spring users can throw a [ResponseStatusException](https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/web/server/ResponseStatusException.html)

trask · 2025-01-24T16:11:18Z

oteps/4333-recording-exceptions-on-logs.md

+
+        Note: some frameworks use exceptions as a communication mechanism when request fails. For example,
+        Spring users can throw [ResponseStatusException](https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/web/server/ResponseStatusException.html)
+        exception to return unsuccessful status code. Such exceptions represent errors already handled by the application code.


Suggested change

exception to return unsuccessful status code. Such exceptions represent errors already handled by the application code.

exception to return an unsuccessful status code. Such exceptions represent errors already handled by the application code.

trask · 2025-01-24T16:12:01Z

oteps/4333-recording-exceptions-on-logs.md

+
+   - Errors that result in application shutdown SHOULD be recorded with severity `Fatal`.
+
+      - The application detects an invalid configuration at startup and shuts down.


Suggested change

- The application detects an invalid configuration at startup and shuts down.

Examples:

- The application detects an invalid configuration at startup and shuts down.

trask · 2025-01-24T16:12:54Z

oteps/4333-recording-exceptions-on-logs.md

+   to describe the context that the exception was thrown in.
+   They are also encouraged to define their own error events and enrich them with exception details.
+
+7. OTel SDK SHOULD record stack traces on exceptions with severity `Error` or higher and SHOULD allow users to


Suggested change

7. OTel SDK SHOULD record stack traces on exceptions with severity `Error` or higher and SHOULD allow users to

7. The OTel SDK SHOULD record stack traces on exceptions with severity `Error` or higher and SHOULD allow users to

dmontagu · 2025-01-24T16:41:11Z

oteps/4333-recording-exceptions-on-logs.md

+4. It's NOT RECOMMENDED to record the same error as it propagates through the call stack, or
+   to attach the same instance of an exception to multiple log records.


In python, because exceptions are objects, if you have exceptions bubbling through nested spans, I'm not sure how you'd be able to achieve this while still being able to distinguish which specific exception was raised through the higher spans.

Here is a concrete example:

import asyncio from opentelemetry.sdk.resources import SERVICE_NAME, Resource from opentelemetry import trace from opentelemetry.sdk.trace import TracerProvider from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter from opentelemetry.sdk.trace import ReadableSpan resource = Resource(attributes={SERVICE_NAME:"test"}) def formatter(span: ReadableSpan) -> str: out: list[str] = [] out.append(f"{span.name}") for event in span.events: if event.name == "exception": attributes = event.attributes or {} stacktrace = attributes.get("exception.stacktrace") or "" out.append(f" {event.name}: {stacktrace}") out.append('') return "\n".join(out) trace_provider = TracerProvider(resource=resource) processor = BatchSpanProcessor(ConsoleSpanExporter(formatter=formatter)) trace_provider.add_span_processor(processor) trace.set_tracer_provider(trace_provider) tracer = trace.get_tracer("test") async def task_1(fail: bool): with tracer.start_as_current_span("task_1"): if fail: raise ValueError('fail') async def task_2(fail: bool): with tracer.start_as_current_span("task_2"): if fail: raise ValueError('fail') async def task_3(fail: bool): with tracer.start_as_current_span("task_3"): if fail: raise ValueError('fail') async def root(fail_1: bool, fail_2: bool, fail_3: bool) -> None: with tracer.start_as_current_span("root"): # We want to run all tasks concurrently, attempting to finish as many as possible, even if some fail results = await asyncio.gather(task_1(fail_1), task_2(fail_2), task_3(fail_3), return_exceptions=True) # Here is some concrete logic for handling the produced exceptions, # but note that we could just as easily raise the last error, ignore certain exception types, etc. for result in results: if isinstance(result, Exception): raise result async def main(): # No failures await root(False, False, False) # All failures try: await root(True, True, True) except Exception as e: pass # keep the output clean asyncio.run(main())

If you run this script, you get the following output:

task_1 task_2 task_3 root task_1 exception: Traceback (most recent call last): File "python3.10/site-packages/opentelemetry/trace/__init__.py", line 589, in use_span yield span File "python3.10/site-packages/opentelemetry/sdk/trace/__init__.py", line 1104, in start_as_current_span yield span File "scratch_file.py", line 34, in task_1 raise ValueError('fail') ValueError: fail task_2 exception: Traceback (most recent call last): File "python3.10/site-packages/opentelemetry/trace/__init__.py", line 589, in use_span yield span File "python3.10/site-packages/opentelemetry/sdk/trace/__init__.py", line 1104, in start_as_current_span yield span File "scratch_file.py", line 40, in task_2 raise ValueError('fail') ValueError: fail task_3 exception: Traceback (most recent call last): File "python3.10/site-packages/opentelemetry/trace/__init__.py", line 589, in use_span yield span File "python3.10/site-packages/opentelemetry/sdk/trace/__init__.py", line 1104, in start_as_current_span yield span File "scratch_file.py", line 46, in task_3 raise ValueError('fail') ValueError: fail root exception: Traceback (most recent call last): File "python3.10/site-packages/opentelemetry/trace/__init__.py", line 589, in use_span yield span File "python3.10/site-packages/opentelemetry/sdk/trace/__init__.py", line 1104, in start_as_current_span yield span File "scratch_file.py", line 59, in root raise result File "scratch_file.py", line 34, in task_1 raise ValueError('fail') ValueError: fail

With the current behavior, you can tell by looking in the stack trace of the exception bubbling through the "root" span that the raised exception was coming from task_1. However, if we were to only record each exception once, I'm not sure how you'd be able to determine which exception was raised through the "root" span (e.g., if the code were changed to raise the last exception rather than the first).

While this may feel somewhat contrived (a consequence of trying to keep the example fairly minimal), I am confident that there are lots of python use cases today where, in the face of an exception, you might try a fallback codepath, and if that fallback codepath fails you still raise the error coming from the original execution path. (And there are obviously scenarios where you would want to just raise the error of the fallback codepath.) So, if each of those exceptions end up bubbling through their own spans, and subsequently bubble through a parent span, I think we need to retain a way to know which exception was raised through the parent span.

Given these examples, and the flexibility of python, it's hard for me to imagine how you'd be able to know programmatically whether there was information loss if you don't record the same error as it passes through spans higher in the call stack, therefore I would consider it a regression if we lost the current behavior of recording an event every time an exception passes through a span, even if the same exception has already been recorded as passing through some child span. (And in particular, the events should retain the full stack trace — note that the otel status message isn't enough to distinguish the specific exception in the example above, but the stack trace is enough.)

I see this note below:

It also may be desirable by some vendors/apps to record all exception details at all levels.

As long as this can be done for an arbitrary python application and isn't dependent on the instrumentations or otel SDK behavior, then I think everything is okay, I just want to point out that retaining that option is important, at least for python.

lmolkova changed the title ~~OTEP: Recording exceptions and errors with OpenTelemetry~~ OTEP: Recording exceptions as log based events Dec 10, 2024

cijothomas reviewed Dec 10, 2024

View reviewed changes

oteps/4333-recording-exceptions-on-logs.md Outdated Show resolved Hide resolved

cijothomas reviewed Dec 10, 2024

View reviewed changes

oteps/4333-recording-exceptions-on-logs.md Outdated Show resolved Hide resolved

cijothomas reviewed Dec 10, 2024

View reviewed changes

oteps/4333-recording-exceptions-on-logs.md Show resolved Hide resolved

cijothomas reviewed Dec 10, 2024

View reviewed changes

oteps/4333-recording-exceptions-on-logs.md Outdated Show resolved Hide resolved

cijothomas reviewed Dec 10, 2024

View reviewed changes

oteps/4333-recording-exceptions-on-logs.md Outdated Show resolved Hide resolved

pellared assigned lmolkova Dec 10, 2024

trask reviewed Dec 11, 2024

View reviewed changes

tedsuo added the OTEP OpenTelemetry Enhancement Proposal (OTEP) label Dec 12, 2024

lmolkova force-pushed the exceptions-on-logs-otep branch 2 times, most recently from b06a09f to 76c7d85 Compare December 17, 2024 17:30

reyang reviewed Dec 20, 2024

View reviewed changes

oteps/4333-recording-exceptions-on-logs.md Outdated Show resolved Hide resolved

trask mentioned this pull request Dec 21, 2024

Allow user to enable/disable storing entire stack trace in spans open-telemetry/opentelemetry-java-instrumentation#431

Open

lmolkova mentioned this pull request Dec 27, 2024

What's the use case for exception.escaped? open-telemetry/semantic-conventions#1516

Closed

lmolkova marked this pull request as ready for review December 27, 2024 17:41

lmolkova requested review from a team as code owners December 27, 2024 17:41

lmolkova force-pushed the exceptions-on-logs-otep branch 4 times, most recently from 1a1ea49 to 5ddfd05 Compare December 27, 2024 18:28

lmolkova mentioned this pull request Dec 28, 2024

Add common guidance on recording errors on spans and metrics, clarify DB conventions open-telemetry/semantic-conventions#1716

Merged

3 tasks

joaopgrassi reviewed Jan 2, 2025

View reviewed changes

lmolkova force-pushed the exceptions-on-logs-otep branch from db27087 to e9f38aa Compare January 3, 2025 01:42

jsuereth reviewed Jan 7, 2025

View reviewed changes

pellared reviewed Jan 7, 2025

View reviewed changes

oteps/4333-recording-exceptions-on-logs.md Outdated Show resolved Hide resolved

This was referenced Jan 7, 2025

Define whether instrumentations can emit non-event log records #4234

Closed

logs: Remove Events operations in favor of making whole Logger user-facing #4352

Merged

lmolkova and others added 19 commits January 20, 2025 21:48

OTEP: Recording exceptions and errors with OpenTelemetry

50ef640

filename

cfc75d1

feedback and cleanup

28b7590

cleanups

5473397

feedback: recording all exceptions by default is too noisy

39264f6

minor fixes

e33290f

minor fixes

91e4e3c

clean up

810c864

changelog and lint

23a723c

more cleanups and lint

5087efc

ore fixes

9d5a075

more cleanups and lint

d8ad5c0

Apply suggestions from code review

e7556b2

Co-authored-by: Joao Grassi <5938087+joaopgrassi@users.noreply.github.com>

feedback

26f599c

remove the note

81d17ab

Apply suggestions from code review

3285ea7

Co-authored-by: Trask Stalnaker <trask.stalnaker@gmail.com>

feedback: more details on severity, language-specific, replacement fo…

b53cda0

…r span events

more feedback, define error/exception

f035e98

lint

61d9551

lmolkova force-pushed the exceptions-on-logs-otep branch from 5a4b286 to 61d9551 Compare January 21, 2025 05:48

pellared reviewed Jan 21, 2025

View reviewed changes

oteps/4333-recording-exceptions-on-logs.md Show resolved Hide resolved

pellared reviewed Jan 24, 2025

View reviewed changes

trask mentioned this pull request Jan 24, 2025

Glossary term Event needs more explanation open-telemetry/opentelemetry.io#6069

Open

Kludex reviewed Jan 24, 2025

View reviewed changes

trask reviewed Jan 24, 2025

View reviewed changes

dmontagu reviewed Jan 24, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

OTEP: Recording exceptions as log based events #4333

OTEP: Recording exceptions as log based events #4333

lmolkova commented Dec 10, 2024 •

edited

Loading

pellared commented Dec 10, 2024 •

edited

Loading

carlosalberto commented Jan 7, 2025

jsuereth left a comment

jsuereth Jan 7, 2025

lmolkova Jan 14, 2025

jsuereth Jan 7, 2025

lmolkova Jan 18, 2025

jsuereth Jan 7, 2025

lmolkova Jan 18, 2025

jsuereth Jan 7, 2025

lmolkova Jan 18, 2025

adriangb commented Jan 24, 2025

pellared commented Jan 24, 2025 •

edited

Loading

pellared Jan 24, 2025 •

edited

Loading

pellared Jan 24, 2025

adriangb commented Jan 24, 2025 •

edited

Loading

Kludex Jan 24, 2025 •

edited

Loading

adriangb commented Jan 24, 2025 •

edited

Loading

trask Jan 24, 2025

trask Jan 24, 2025

trask Jan 24, 2025

trask Jan 24, 2025

trask Jan 24, 2025

trask Jan 24, 2025

trask Jan 24, 2025

trask Jan 24, 2025

trask Jan 24, 2025

trask Jan 24, 2025

dmontagu Jan 24, 2025 •

edited

Loading


		5. An error should be logged with appropriate severity depending on the available context.

		- Errors that don't indicate any issue should be recorded with severity not higher than `Info`.

	### Relationship to Span Events

	Events are intended to replace Span Events in the long term.
	Span Events will be deprecated to signal that users should prefer Events.

	Interoperability between Events and Span Events will be defined in the short term.


		## Motivation

		Today OTel supports recording exceptions using span events available through Trace API. Outside of OTel world,

	errors are usually recorded by user apps and libraries using logging libraries
	errors are usually recorded by user apps and libraries by using logging libraries,

	and may be recorded as OTel logs via logging bridge.
	and may be recorded as OTel logs via a logging bridge.

	Errors recorded on logs have the following advantages over span events:
	Using logs to record errors has the following advantages over using span events:

	Note: some frameworks use exceptions as a communication mechanism when request fails. For example,
	Note: some frameworks use exceptions as a communication mechanism when a request fails. For example,

	Spring users can throw [ResponseStatusException](https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/web/server/ResponseStatusException.html)
	Spring users can throw a [ResponseStatusException](https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/web/server/ResponseStatusException.html)

	exception to return unsuccessful status code. Such exceptions represent errors already handled by the application code.
	exception to return an unsuccessful status code. Such exceptions represent errors already handled by the application code.


		- Errors that result in application shutdown SHOULD be recorded with severity `Fatal`.

		- The application detects an invalid configuration at startup and shuts down.

	7. OTel SDK SHOULD record stack traces on exceptions with severity `Error` or higher and SHOULD allow users to
	7. The OTel SDK SHOULD record stack traces on exceptions with severity `Error` or higher and SHOULD allow users to

		4. It's NOT RECOMMENDED to record the same error as it propagates through the call stack, or
		to attach the same instance of an exception to multiple log records.

OTEP: Recording exceptions as log based events #4333

Are you sure you want to change the base?

OTEP: Recording exceptions as log based events #4333

Conversation

lmolkova commented Dec 10, 2024 • edited Loading

Changes

pellared commented Dec 10, 2024 • edited Loading

carlosalberto commented Jan 7, 2025

jsuereth left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

adriangb commented Jan 24, 2025

pellared commented Jan 24, 2025 • edited Loading

pellared Jan 24, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

adriangb commented Jan 24, 2025 • edited Loading

Kludex Jan 24, 2025 • edited Loading

Choose a reason for hiding this comment

adriangb commented Jan 24, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dmontagu Jan 24, 2025 • edited Loading

Choose a reason for hiding this comment

lmolkova commented Dec 10, 2024 •

edited

Loading

pellared commented Dec 10, 2024 •

edited

Loading

pellared commented Jan 24, 2025 •

edited

Loading

pellared Jan 24, 2025 •

edited

Loading

adriangb commented Jan 24, 2025 •

edited

Loading

Kludex Jan 24, 2025 •

edited

Loading

adriangb commented Jan 24, 2025 •

edited

Loading

dmontagu Jan 24, 2025 •

edited

Loading