
Upgrade storage integration test: use TraceWriter #6437

Merged
18 commits merged into jaegertracing:main from update-spanWriter on Jan 2, 2025

Conversation

@ekefan (Contributor) commented Dec 28, 2024

Which problem is this PR solving?

Description of the changes

  • Incrementally swaps the fields of StorageIntegration to align with the v2 storage API while still supporting the v1 API
    • replaced SpanWriter with TraceWriter (see the sketch below)
  • Updates the test functions accordingly to work with the updated fields
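For context, here is a minimal sketch of the two writer shapes involved in the swap. The local interface names (spanWriterV1, traceWriterV2) are illustrative only, and the import paths are assumptions rather than the exact packages touched by this PR:

```go
package sketch

import (
	"context"

	"github.com/jaegertracing/jaeger/model" // v1 span model (assumed import path)
	"go.opentelemetry.io/collector/pdata/ptrace"
)

// spanWriterV1 is the shape of the v1 writer the integration suite used to hold:
// spans are written one at a time.
type spanWriterV1 interface {
	WriteSpan(ctx context.Context, span *model.Span) error
}

// traceWriterV2 is the shape of the v2 writer it holds after this PR: whole
// OTLP batches, matching the WriteTraces signature quoted later in this thread.
type traceWriterV2 interface {
	WriteTraces(ctx context.Context, td ptrace.Traces) error
}
```

Holding the v2 shape lets the suite keep exercising v1 backends too, since a v1 span writer can be adapted behind the batch-oriented interface.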

How was this change tested?

  • CI

Checklist

codecov bot commented Dec 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.29%. Comparing base (244b759) to head (1559b2c).
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6437      +/-   ##
==========================================
- Coverage   96.30%   96.29%   -0.02%     
==========================================
  Files         371      371              
  Lines       21160    21173      +13     
==========================================
+ Hits        20379    20389      +10     
- Misses        598      600       +2     
- Partials      183      184       +1     
| Flag | Coverage Δ |
|------|------------|
| badger_v1 | 10.74% <76.92%> (+0.23%) ⬆️ |
| badger_v2 | 2.80% <76.92%> (+0.21%) ⬆️ |
| cassandra-4.x-v1-manual | 16.62% <76.92%> (+0.25%) ⬆️ |
| cassandra-4.x-v2-auto | 2.73% <76.92%> (+0.21%) ⬆️ |
| cassandra-4.x-v2-manual | 2.73% <76.92%> (+0.21%) ⬆️ |
| cassandra-5.x-v1-manual | 16.62% <76.92%> (+0.25%) ⬆️ |
| cassandra-5.x-v2-auto | 2.73% <76.92%> (+0.21%) ⬆️ |
| cassandra-5.x-v2-manual | 2.73% <76.92%> (+0.21%) ⬆️ |
| elasticsearch-6.x-v1 | 20.31% <76.92%> (+0.21%) ⬆️ |
| elasticsearch-7.x-v1 | 20.38% <76.92%> (+0.21%) ⬆️ |
| elasticsearch-8.x-v1 | 20.55% <76.92%> (+0.21%) ⬆️ |
| elasticsearch-8.x-v2 | 2.79% <76.92%> (+0.21%) ⬆️ |
| grpc_v1 | 12.38% <76.92%> (+0.22%) ⬆️ |
| grpc_v2 | 9.15% <76.92%> (+0.20%) ⬆️ |
| kafka-3.x-v1 | 10.58% <76.92%> (+0.23%) ⬆️ |
| kafka-3.x-v2 | 2.80% <76.92%> (+0.21%) ⬆️ |
| memory_v2 | 2.80% <76.92%> (+0.21%) ⬆️ |
| opensearch-1.x-v1 | 20.43% <76.92%> (+0.21%) ⬆️ |
| opensearch-2.x-v1 | 20.43% <76.92%> (+0.21%) ⬆️ |
| opensearch-2.x-v2 | 2.79% <76.92%> (+0.21%) ⬆️ |
| tailsampling-processor | 0.52% <0.00%> (+0.12%) ⬆️ |
| unittests | 95.16% <100.00%> (-0.02%) ⬇️ |

Flags with carried forward coverage won't be shown.

@ekefan changed the title from "Upgrade storage integration test, use TraceWriter" to "Upgrade storage integration test: use TraceWriter" on Dec 28, 2024
@yurishkuro added the changelog:ci (Change related to continuous integration / testing) label on Dec 28, 2024
@yurishkuro (Member) left a comment:

lgtm, with some small nits. Also please check why CI is not green.

@ekefan (Contributor, Author) commented Dec 29, 2024

> lgtm, with some small nits. Also please check why CI is not green.

For otlp_json encoding, the Kafka test fails on large spans.
From the console logs, I discovered that no traces are found when trying to read...

I'm trying to figure out why.

@yurishkuro (Member) commented:

I found this error:
2024-12-29T12:34:09.540-0500 info internal/retry_sender.go:126 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "traces", "name": "kafka", "error": "Failed to deliver 1 messages due to kafka: invalid configuration (Attempt to produce message larger than configured Producer.MaxMessageBytes: 3052630 > 1000000)", "interval": "30.935639755s"}

I'd call this a bug in the OTEL Kafka exporter: it is not respecting the max message size for Kafka. In other words, if for whatever reason the collector received a very large payload and accepted it, the exporter should not fail to export it just because it's large; it should try to split the payload into chunks of acceptable size.

For the purpose of this PR we can probably just increase this parameter to 3MB (but I am not sure if Kafka's internal configuration also needs to be increased). Alternatively we can change the e2e test to not send the whole trace all at once, but break it into, say, 1000-span chunks. The ideal fix would be to correct the OTEL kafkaexporter to respect the message size.

@ekefan (Contributor, Author) commented Dec 29, 2024

> For the purpose of this PR we can probably just increase this parameter to 3MB (but I am not sure if Kafka's internal configuration also needs to be increased). Alternatively we can change the e2e test to not send the whole trace all at once, but break it into, say, 1000-span chunks. The ideal fix would be to correct the OTEL kafkaexporter to respect the message size.

For the alternative solution, this is the point where the change would go:

func (w *traceWriter) WriteTraces(ctx context.Context, td ptrace.Traces) error {
	// create chunks of trace if span count > 1000
	return w.exporter.ConsumeTraces(ctx, td)
}

I have a question, please:

  • Why doesn't this exporter return an error saying it can't write a trace with that many spans?

@yurishkuro (Member) commented:

Good question. It's because there is no error when it does export - it sends the payload to the OTLP receiver in the collector, which accepts it and passes it down the pipeline. In the pipeline we have a batch processor that always responds without an error, because it groups the spans and then sends them in the background; at that point the error happens, but there's no place to report it except the logs. It's a flaw in the OTEL collector batch processor - it could have been implemented to return the error to all clients whose payload failed to be exported in a batch.
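To make that concrete, here is a simplified, hypothetical illustration of the fire-and-forget decoupling (not the actual OTel collector batch processor code): the synchronous ConsumeTraces call hands the batch to a background loop and returns nil, so a later Kafka failure can only be logged.

```go
package sketch

import (
	"context"
	"log"

	"go.opentelemetry.io/collector/consumer"
	"go.opentelemetry.io/collector/pdata/ptrace"
)

// batchingConsumer mimics the decoupling: accept now, export later.
type batchingConsumer struct {
	next  consumer.Traces    // e.g. the Kafka exporter
	queue chan ptrace.Traces // batches waiting to be exported
}

// ConsumeTraces always succeeds from the caller's point of view.
func (b *batchingConsumer) ConsumeTraces(_ context.Context, td ptrace.Traces) error {
	b.queue <- td // hand off to the background loop
	return nil    // the caller never learns about downstream failures
}

// run exports in the background; errors have nowhere to go but the log.
func (b *batchingConsumer) run() {
	for td := range b.queue {
		if err := b.next.ConsumeTraces(context.Background(), td); err != nil {
			log.Printf("Exporting failed: %v", err)
		}
	}
}
```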

@yurishkuro (Member) commented:

And yes, in the writeTrace function we can split a trace into several chunks.
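For illustration, a minimal sketch of what a chunked writeTrace could look like, assuming the traceWriter type from the snippet above and a 1000-span limit per message; the chunkTraces helper and its copying strategy are hypothetical and not necessarily what this PR ended up doing:

```go
func (w *traceWriter) WriteTraces(ctx context.Context, td ptrace.Traces) error {
	if td.SpanCount() <= 1000 {
		return w.exporter.ConsumeTraces(ctx, td)
	}
	for _, chunk := range chunkTraces(td, 1000) {
		if err := w.exporter.ConsumeTraces(ctx, chunk); err != nil {
			return err
		}
	}
	return nil
}

// chunkTraces copies spans into fresh ptrace.Traces values, starting a new
// chunk once maxSpans is reached, so each exported Kafka message stays small.
// For simplicity it re-creates the resource/scope envelope for every span.
func chunkTraces(td ptrace.Traces, maxSpans int) []ptrace.Traces {
	var chunks []ptrace.Traces
	var current ptrace.Traces
	count := maxSpans // forces a new chunk before the first span
	for i := 0; i < td.ResourceSpans().Len(); i++ {
		rs := td.ResourceSpans().At(i)
		for j := 0; j < rs.ScopeSpans().Len(); j++ {
			ss := rs.ScopeSpans().At(j)
			for k := 0; k < ss.Spans().Len(); k++ {
				if count == maxSpans {
					current = ptrace.NewTraces()
					chunks = append(chunks, current)
					count = 0
				}
				dstRS := current.ResourceSpans().AppendEmpty()
				rs.Resource().CopyTo(dstRS.Resource())
				dstSS := dstRS.ScopeSpans().AppendEmpty()
				ss.Scope().CopyTo(dstSS.Scope())
				ss.Spans().At(k).CopyTo(dstSS.Spans().AppendEmpty())
				count++
			}
		}
	}
	return chunks
}
```

Splitting on the writer side would keep the e2e payloads under the exporter's default 1 MB Producer.MaxMessageBytes without touching the Kafka or collector configuration.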

@yurishkuro (Member) commented:

I filed a ticket, #6439, to have a proper fix upstream, but meanwhile we can fix writeTrace to unblock this work.

ekefan added 4 commits January 1, 2025 23:35, each signed off by Emmanuel Emonueje Ebenezer <eebenezer949@gmail.com>.
@ekefan force-pushed the update-spanWriter branch from 03b059d to 9ce3f14 on January 1, 2025 22:44.
ekefan added 13 commits January 1, 2025 23:46, each signed off by Emmanuel Emonueje Ebenezer <eebenezer949@gmail.com>. Commit messages include:
  • use standard for traces
  • upgrade test for `V1TraceToOtelTrace`
  • upgrade test
  • improve function structure
@ekefan force-pushed the update-spanWriter branch from 9ce3f14 to 861eea1 on January 1, 2025 22:55, then added one more signed-off commit.
Comment on lines +57 to +63
// Add span1 and span2
scope1 := resources.At(0).ScopeSpans().At(0)
for i := 1; i <= 2; i++ {
	span := scope1.Spans().AppendEmpty()
	span.SetSpanID(pcommon.SpanID([8]byte{0, 0, 0, 0, 0, 0, 0, byte(i)}))
	span.SetName(fmt.Sprintf("span%d", i))
}
@yurishkuro (Member) commented:

why not do this in the same loop at L54?

@ekefan (Contributor, Author) replied:

I wanted it to be explicit that scope1 and scope3 have two spans each, while scope2 has only one.

@yurishkuro yurishkuro merged commit 83b64d6 into jaegertracing:main Jan 2, 2025
54 checks passed
@yurishkuro (Member) commented:

Thanks

@ekefan ekefan deleted the update-spanWriter branch January 2, 2025 00:24
Labels
area/storage · changelog:ci (Change related to continuous integration / testing) · v2
Projects
None yet
2 participants