
Upgrade storage integration test: use TraceWriter #6437

Merged
18 commits merged into jaegertracing:main from update-spanWriter on Jan 2, 2025

Conversation

@ekefan (Contributor) commented Dec 28, 2024

Which problem is this PR solving?

Description of the changes

  • Incrementally swaps the fields of StorageIntegration to align with the v2 storage API while still supporting the v1 API
    • replaced SpanWriter with TraceWriter (see the sketch below)
  • Updates the test functions accordingly to work with the updated fields
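For context, here is a minimal sketch of the two writer shapes involved in the swap. The local interface names (spanWriterV1, traceWriterV2) are illustrative only, and the import paths are assumptions rather than the exact packages touched by this PR:

```go
package sketch

import (
	"context"

	"github.com/jaegertracing/jaeger/model" // v1 span model (assumed import path)
	"go.opentelemetry.io/collector/pdata/ptrace"
)

// spanWriterV1 is the shape of the v1 writer the integration suite used to hold:
// spans are written one at a time.
type spanWriterV1 interface {
	WriteSpan(ctx context.Context, span *model.Span) error
}

// traceWriterV2 is the shape of the v2 writer it holds after this PR: whole
// OTLP batches, matching the WriteTraces signature quoted later in this thread.
type traceWriterV2 interface {
	WriteTraces(ctx context.Context, td ptrace.Traces) error
}
```

Holding the v2 shape lets the suite keep exercising v1 backends too, since a v1 span writer can be adapted behind the batch-oriented interface.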

How was this change tested?

  • CI

Checklist

codecov bot commented Dec 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.29%. Comparing base (244b759) to head (1559b2c).
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6437      +/-   ##
==========================================
- Coverage   96.30%   96.29%   -0.02%     
==========================================
  Files         371      371              
  Lines       21160    21173      +13     
==========================================
+ Hits        20379    20389      +10     
- Misses        598      600       +2     
- Partials      183      184       +1     
| Flag | Coverage Δ |
|------|------------|
| badger_v1 | 10.74% <76.92%> (+0.23%) ⬆️ |
| badger_v2 | 2.80% <76.92%> (+0.21%) ⬆️ |
| cassandra-4.x-v1-manual | 16.62% <76.92%> (+0.25%) ⬆️ |
| cassandra-4.x-v2-auto | 2.73% <76.92%> (+0.21%) ⬆️ |
| cassandra-4.x-v2-manual | 2.73% <76.92%> (+0.21%) ⬆️ |
| cassandra-5.x-v1-manual | 16.62% <76.92%> (+0.25%) ⬆️ |
| cassandra-5.x-v2-auto | 2.73% <76.92%> (+0.21%) ⬆️ |
| cassandra-5.x-v2-manual | 2.73% <76.92%> (+0.21%) ⬆️ |
| elasticsearch-6.x-v1 | 20.31% <76.92%> (+0.21%) ⬆️ |
| elasticsearch-7.x-v1 | 20.38% <76.92%> (+0.21%) ⬆️ |
| elasticsearch-8.x-v1 | 20.55% <76.92%> (+0.21%) ⬆️ |
| elasticsearch-8.x-v2 | 2.79% <76.92%> (+0.21%) ⬆️ |
| grpc_v1 | 12.38% <76.92%> (+0.22%) ⬆️ |
| grpc_v2 | 9.15% <76.92%> (+0.20%) ⬆️ |
| kafka-3.x-v1 | 10.58% <76.92%> (+0.23%) ⬆️ |
| kafka-3.x-v2 | 2.80% <76.92%> (+0.21%) ⬆️ |
| memory_v2 | 2.80% <76.92%> (+0.21%) ⬆️ |
| opensearch-1.x-v1 | 20.43% <76.92%> (+0.21%) ⬆️ |
| opensearch-2.x-v1 | 20.43% <76.92%> (+0.21%) ⬆️ |
| opensearch-2.x-v2 | 2.79% <76.92%> (+0.21%) ⬆️ |
| tailsampling-processor | 0.52% <0.00%> (+0.12%) ⬆️ |
| unittests | 95.16% <100.00%> (-0.02%) ⬇️ |

Flags with carried forward coverage won't be shown.

@ekefan changed the title from "Upgrade storage integration test, use TraceWriter" to "Upgrade storage integration test: use TraceWriter" on Dec 28, 2024
@yurishkuro added the changelog:ci (Change related to continuous integration / testing) label on Dec 28, 2024
@yurishkuro (Member) left a comment:

lgtm, with some small nits. Also please check why CI is not green.

@ekefan (Contributor, Author) commented Dec 29, 2024

> lgtm, with some small nits. Also please check why CI is not green.

For otlp_json encoding, the Kafka test fails on large spans.
From the console logs, I discovered that no traces are found when trying to read...

I'm trying to figure out why.

@yurishkuro (Member) commented:

I found this error:
2024-12-29T12:34:09.540-0500 info internal/retry_sender.go:126 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "traces", "name": "kafka", "error": "Failed to deliver 1 messages due to kafka: invalid configuration (Attempt to produce message larger than configured Producer.MaxMessageBytes: 3052630 > 1000000)", "interval": "30.935639755s"}

I'd call this a bug in the OTEL Kafka exporter: it is not respecting the max message size for Kafka. In other words, if for whatever reason the collector received a very large payload and accepted it, the exporter should not fail to export it just because it's large; it should try to split the payload into chunks of acceptable size.

For the purpose of this PR we can probably just increase this parameter to 3MB (but I am not sure if Kafka's internal configuration also needs to be increased). Alternatively we can change the e2e test to not send the whole trace all at once, but break it into, say, 1000-span chunks. The ideal fix would be to correct the OTEL kafkaexporter to respect the message size.

@ekefan (Contributor, Author) commented Dec 29, 2024

> For the purpose of this PR we can probably just increase this parameter to 3MB (but I am not sure if Kafka's internal configuration also needs to be increased). Alternatively we can change the e2e test to not send the whole trace all at once, but break it into, say, 1000-span chunks. The ideal fix would be to correct the OTEL kafkaexporter to respect the message size.

For the alternative solution, this is the point where the change would go:

func (w *traceWriter) WriteTraces(ctx context.Context, td ptrace.Traces) error {
	// create chunks of trace if span count > 1000
	return w.exporter.ConsumeTraces(ctx, td)
}

I have a question, please:

  • Why doesn't this exporter return an error saying it can't write a trace with that many spans?

@yurishkuro (Member) commented:

Good question. It's because there is no error when it does export - it sends the payload to the OTLP receiver in the collector, which accepts it and passes it down the pipeline. In the pipeline we have a batch processor that always responds without an error, because it groups the spans and then sends them in the background; at that point the error happens, but there's no place to report it except the logs. It's a flaw in the OTEL collector batch processor - it could have been implemented to return the error to all clients whose payload failed to be exported in a batch.
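To make that concrete, here is a simplified, hypothetical illustration of the fire-and-forget decoupling (not the actual OTel collector batch processor code): the synchronous ConsumeTraces call hands the batch to a background loop and returns nil, so a later Kafka failure can only be logged.

```go
package sketch

import (
	"context"
	"log"

	"go.opentelemetry.io/collector/consumer"
	"go.opentelemetry.io/collector/pdata/ptrace"
)

// batchingConsumer mimics the decoupling: accept now, export later.
type batchingConsumer struct {
	next  consumer.Traces    // e.g. the Kafka exporter
	queue chan ptrace.Traces // batches waiting to be exported
}

// ConsumeTraces always succeeds from the caller's point of view.
func (b *batchingConsumer) ConsumeTraces(_ context.Context, td ptrace.Traces) error {
	b.queue <- td // hand off to the background loop
	return nil    // the caller never learns about downstream failures
}

// run exports in the background; errors have nowhere to go but the log.
func (b *batchingConsumer) run() {
	for td := range b.queue {
		if err := b.next.ConsumeTraces(context.Background(), td); err != nil {
			log.Printf("Exporting failed: %v", err)
		}
	}
}
```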

@yurishkuro (Member) commented:

And yes, in the writeTrace function we can split a trace into several chunks.
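For illustration, a minimal sketch of what a chunked writeTrace could look like, assuming the traceWriter type from the snippet above and a 1000-span limit per message; the chunkTraces helper and its copying strategy are hypothetical and not necessarily what this PR ended up doing:

```go
func (w *traceWriter) WriteTraces(ctx context.Context, td ptrace.Traces) error {
	if td.SpanCount() <= 1000 {
		return w.exporter.ConsumeTraces(ctx, td)
	}
	for _, chunk := range chunkTraces(td, 1000) {
		if err := w.exporter.ConsumeTraces(ctx, chunk); err != nil {
			return err
		}
	}
	return nil
}

// chunkTraces copies spans into fresh ptrace.Traces values, starting a new
// chunk once maxSpans is reached, so each exported Kafka message stays small.
// For simplicity it re-creates the resource/scope envelope for every span.
func chunkTraces(td ptrace.Traces, maxSpans int) []ptrace.Traces {
	var chunks []ptrace.Traces
	var current ptrace.Traces
	count := maxSpans // forces a new chunk before the first span
	for i := 0; i < td.ResourceSpans().Len(); i++ {
		rs := td.ResourceSpans().At(i)
		for j := 0; j < rs.ScopeSpans().Len(); j++ {
			ss := rs.ScopeSpans().At(j)
			for k := 0; k < ss.Spans().Len(); k++ {
				if count == maxSpans {
					current = ptrace.NewTraces()
					chunks = append(chunks, current)
					count = 0
				}
				dstRS := current.ResourceSpans().AppendEmpty()
				rs.Resource().CopyTo(dstRS.Resource())
				dstSS := dstRS.ScopeSpans().AppendEmpty()
				ss.Scope().CopyTo(dstSS.Scope())
				ss.Spans().At(k).CopyTo(dstSS.Spans().AppendEmpty())
				count++
			}
		}
	}
	return chunks
}
```

Splitting on the writer side would keep the e2e payloads under the exporter's default 1 MB Producer.MaxMessageBytes without touching the Kafka or collector configuration.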

@yurishkuro (Member) commented:

I filed a ticket, #6439, to have a proper fix upstream, but meanwhile we can fix writeTrace to unblock this work.

ekefan added 4 commits January 1, 2025 23:35, each signed off by Emmanuel Emonueje Ebenezer <eebenezer949@gmail.com>.
@ekefan force-pushed the update-spanWriter branch from 03b059d to 9ce3f14 on January 1, 2025 22:44.
ekefan added 13 commits January 1, 2025 23:46, each signed off by Emmanuel Emonueje Ebenezer <eebenezer949@gmail.com>. Commit messages include:
  • use standard for traces
  • upgrade test for `V1TraceToOtelTrace`
  • upgrade test
  • improve function structure
@ekefan force-pushed the update-spanWriter branch from 9ce3f14 to 861eea1 on January 1, 2025 22:55, then added one more signed-off commit.
Comment on lines +57 to +63
// Add span1 and span2
scope1 := resources.At(0).ScopeSpans().At(0)
for i := 1; i <= 2; i++ {
	span := scope1.Spans().AppendEmpty()
	span.SetSpanID(pcommon.SpanID([8]byte{0, 0, 0, 0, 0, 0, 0, byte(i)}))
	span.SetName(fmt.Sprintf("span%d", i))
}
@yurishkuro (Member) commented:

why not do this in the same loop at L54?

@ekefan (Contributor, Author) replied:

I wanted it to be explicit that scope1 and scope3 have two spans each, while scope2 has only one.

@yurishkuro yurishkuro merged commit 83b64d6 into jaegertracing:main Jan 2, 2025
54 checks passed
@yurishkuro (Member) commented:

Thanks

@ekefan ekefan deleted the update-spanWriter branch January 2, 2025 00:24
Labels
area/storage · changelog:ci (Change related to continuous integration / testing) · v2
Projects
None yet
2 participants