TaskMetadata as header #238

ocadaruma · 2024-06-13T08:53:30Z

Rework of #80 since it's too outdated.

Motivation

Currently, DecatonClient serializes tasks in the DecatonTaskRequest protobuf format because Kafka did not yet support record headers when Decaton was first developed
- Kafka has supported record headers for a long time now, so it is natural to use them to embed task metadata

Summary of changes

Deprecate DecatonTaskRequest

The DecatonTaskRequest protobuf is now needed only to parse Decaton tasks produced by old clients/retry-processors, so I moved it to .internal and marked it as deprecated
DecatonTaskRequest will be removed in a future major release

DecatonClient

DecatonClient now produces the serialized task directly as the record value, instead of wrapping it in DecatonTaskRequest
TaskMetadata is stored in record headers from now on
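
As a rough illustration of the new layout, the sketch below models a record as a value plus a header map and stores the metadata under the dt_meta key that TaskMetadataUtil defines in this PR. SketchRecord and toRecord are hypothetical names, the map only stands in for Kafka's record headers, and the metadata bytes are treated as opaque (in the PR they are a protobuf-encoded TaskMetadata). This is not the actual DecatonClient code.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a Kafka record: a value plus string-keyed byte[] headers. */
final class SketchRecord {
    final byte[] value;
    final Map<String, byte[]> headers = new LinkedHashMap<>();

    SketchRecord(byte[] value) {
        this.value = value;
    }
}

final class HeaderMetadataSketch {
    // Header key as defined in TaskMetadataUtil in this PR.
    static final String METADATA_HEADER_KEY = "dt_meta";

    /** Puts the serialized task in the value as-is and the serialized metadata in a header. */
    static SketchRecord toRecord(byte[] serializedTask, byte[] serializedMetadata) {
        SketchRecord record = new SketchRecord(serializedTask);
        record.headers.put(METADATA_HEADER_KEY, serializedMetadata);
        return record;
    }

    public static void main(String[] args) {
        byte[] task = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] metadata = {0x01, 0x02}; // placeholder bytes, not real protobuf output
        SketchRecord record = toRecord(task, metadata);
        System.out.println(new String(record.value, StandardCharsets.UTF_8));
        System.out.println(record.headers.containsKey(METADATA_HEADER_KEY));
    }
}
```

With this layout the record value can be read by any consumer that understands the task serialization, without knowing about a Decaton-specific envelope.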

TaskExtractor#extract signature change

To extract TaskMetadata from record headers, TaskExtractor#extract now accepts ConsumerRecord<byte[], byte[]> instead of just byte[]

ProcessorProperties, RetryQueueingProcessor

A new decaton.task.metadata.as.header config is introduced. It controls whether retry tasks are produced with header metadata or in the deprecated DecatonTaskRequest format.

Trivial

Updated copyright header config

Breaking changes

DecatonTaskRequest package change to com.linecorp.decaton.protocol.internal
KafkaProducerSupplier signature change
DecatonClient no longer wraps tasks as DecatonTaskRequest
TaskExtractor signature change
Retry tasks are no longer wrapped as DecatonTaskRequest unless decaton.task.metadata.as.header is set to false

IMPORTANT NOTE

To upgrade Decaton to 9.0.0 (which will contain this PR) from prior releases, users MUST follow the procedure below carefully, because the topic format has changed. Otherwise the processor may raise errors.
General rule:
- You must upgrade processors first, before upgrading clients
Procedures depending on the use case:
- A: Using retry-queueing && downtime is not acceptable:
  - (1) Upgrade all processors with setting decaton.task.metadata.as.header = false
  - (2) Upgrade clients
  - (3) Set decaton.task.metadata.as.header = true later, at a time of your choosing
- B: Using retry-queueing && downtime is acceptable:
  - (1) Shut down all processors, then start them with the new version
  - (2) Upgrade clients
- C: Not using retry-queueing:
  - (1) Upgrade processors as usual
  - (2) Upgrade clients
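
For procedure A, the two config phases could look roughly like the sketch below. Only the property key comes from this PR; MigrationConfigSketch and its methods are hypothetical, and plain java.util.Properties is used as a stand-in because how the value actually reaches the processor (for example through ProcessorProperties) is not shown here.

```java
import java.util.Properties;

final class MigrationConfigSketch {
    // Property key introduced by this PR.
    static final String METADATA_AS_HEADER = "decaton.task.metadata.as.header";

    /** Step A(1): upgraded processors keep producing retry tasks in the legacy format. */
    static Properties stepOne() {
        Properties props = new Properties();
        props.setProperty(METADATA_AS_HEADER, "false");
        return props;
    }

    /** Step A(3): after clients are upgraded, switch retry tasks to header metadata. */
    static Properties stepThree() {
        Properties props = new Properties();
        props.setProperty(METADATA_AS_HEADER, "true");
        return props;
    }

    public static void main(String[] args) {
        System.out.println("A(1): " + stepOne());
        System.out.println("A(3): " + stepThree());
    }
}
```

Step A(2), upgrading the clients, happens between the two phases and needs no processor-side config change.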

TODOs

Prepare a detailed migration guide and include it in release note

ocadaruma · 2024-06-14T03:08:55Z

processor/src/main/java/com/linecorp/decaton/processor/runtime/internal/TaskRequest.java

-     * This class will live until the task process has been completed.
-     * To lessen heap pressure, rawRequestBytes should be purged by calling this once the task is extracted.
-     */
-    public void purgeRawRequestBytes() {


I found that purging rawRequestBytes doesn't make much sense, because ProcessingContextImpl.task.taskDataBytes holds the pre-extraction bytes anyway, and ProcessingContextImpl lives until the task is completed.

The exception is when the task is serialized as DecatonTaskRequest.
In this case, task.taskDataBytes and ConsumerRecord#value differ, so purging the latter might be effective. However, since DecatonTaskRequest is already deprecated, I think we don't need to take care of this case.

ProcessingContextImpl lives until the task is completed.

Not really if users implement it in this way?

Completion comp = context.deferCompletion(); executor.execute(() -> { ...; comp.complete(); });

In this case the reference for the ProcessingContext itself gets cut hence GC-able?

H-m that sounds right.

Let me reconsider this then

I reverted the code so that it still purges after extraction (644474b)
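
The purging idea discussed in this thread can be sketched as follows. TaskRequestSketch is a hypothetical, simplified holder and not Decaton's TaskRequest; only the names rawRequestBytes and purgeRawRequestBytes are taken from the diff. The point is that nulling the field lets the raw array be collected even while the holder itself stays reachable until the task completes.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical, simplified holder that lives until task completion. */
final class TaskRequestSketch {
    private byte[] rawRequestBytes;

    TaskRequestSketch(byte[] rawRequestBytes) {
        this.rawRequestBytes = rawRequestBytes;
    }

    byte[] rawRequestBytes() {
        return rawRequestBytes;
    }

    /** Drops the reference so the array becomes collectable once nothing else holds it. */
    void purgeRawRequestBytes() {
        rawRequestBytes = null;
    }
}

final class PurgeDemo {
    /** Extracts the task from the raw bytes, then purges them from the holder. */
    static String extractThenPurge(TaskRequestSketch request) {
        String task = new String(request.rawRequestBytes(), StandardCharsets.UTF_8);
        request.purgeRawRequestBytes();
        return task;
    }

    public static void main(String[] args) {
        TaskRequestSketch request =
                new TaskRequestSketch("task".getBytes(StandardCharsets.UTF_8));
        System.out.println(extractThenPurge(request));
        System.out.println(request.rawRequestBytes() == null);
    }
}
```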

kawamuray

Let me hand this back for now with the following comments

kawamuray · 2024-07-05T03:53:03Z

common/src/main/java/com/linecorp/decaton/common/TaskMetadataUtil.java

+ * under the License.
+ */
+
+package com.linecorp.decaton.common;


Hm, is this the only class to be placed in a common module? Since the processor module has a dependency on the producer module, I think we've been placing these kinds of classes in the producer module; maybe that's sufficient?

Sounds good

kawamuray · 2024-07-05T03:54:12Z

common/src/main/java/com/linecorp/decaton/common/TaskMetadataUtil.java

+import com.linecorp.decaton.protocol.Decaton.TaskMetadataProto;
+
+public class TaskMetadataUtil {
+    private static final String METADATA_HEADER_KEY = "dt_meta";


I was actually expecting to decompose protobuf struct into the individual header fields, because why not?

Have you considered pros/cons of putting a single serialized value into the header vs decomposed primitive fields into multiple headers?

Considering the pros/cons below, I decided to go with a single protobuf value.

Decomposed primitive fields

Pros

We can access each field without decoding entire metadata (not sure about the concrete use case though)

Cons

Since a header is essentially just a Map<String, byte[]>, we need to decide how to encode primitives as byte arrays (endianness, charset encoding, varint or fixed length, ...)

Encoded protobuf in header

Pros

The "Cons" above don't apply. The byte-array representation is defined by the protobuf spec

Cons

The "Pros" above are not available

not sure about the concrete use case though

only debug cases I think.

ok, I think your point is fair. let's go with the current way then.
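
To make the encoding concern from this thread concrete, the sketch below encodes a single numeric value as a header-style byte[] with two different byte orders. HeaderEncodingSketch and its methods are hypothetical and not part of Decaton; they only show that, with decomposed fields, producer and consumer must agree on such details themselves, whereas a protobuf value leaves them to the protobuf spec.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

final class HeaderEncodingSketch {
    /** Encodes a long as a fixed 8-byte array in the given byte order. */
    static byte[] encodeLong(long value, ByteOrder order) {
        return ByteBuffer.allocate(Long.BYTES).order(order).putLong(value).array();
    }

    /** Decodes an 8-byte array as a long in the given byte order. */
    static long decodeLong(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getLong();
    }

    public static void main(String[] args) {
        byte[] big = encodeLong(1L, ByteOrder.BIG_ENDIAN);
        byte[] little = encodeLong(1L, ByteOrder.LITTLE_ENDIAN);
        System.out.println(Arrays.toString(big));
        System.out.println(Arrays.toString(little));
        // A reader that assumes the wrong byte order silently gets a different value.
        System.out.println(decodeLong(big, ByteOrder.LITTLE_ENDIAN));
    }
}
```

The same kind of agreement would be needed for string charsets and for variable-length versus fixed-length integers.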

kawamuray · 2024-07-05T05:30:41Z

processor/src/main/java/com/linecorp/decaton/processor/runtime/ProcessorProperties.java

+     * <p>
+     * <b>CAUTION!!! YOU MAY NEED TO SET THIS TO FALSE WHEN YOU UPGRADE FROM 8.0.1 OR EARLIER</b>
+     * <p>
+     * Please read <a href="https://github.com/line/decaton/releases/tag/v8.0.1">Decaton 9.0.0 Release Note</a> carefully.


TODO: update link?

kawamuray · 2024-07-05T05:41:09Z

processor/src/main/java/com/linecorp/decaton/processor/runtime/TaskExtractor.java

     * If the method throws an exception, the task will be discarded and processor continues to process subsequent tasks.
     */
-    DecatonTask<T> extract(byte[] bytes);
+    DecatonTask<T> extract(ConsumerRecord<byte[], byte[]> record);


Given our current situation would it be a good idea to expose more kafka specific interfaces in our public interfaces?

Yeah, that's what I wondered.
Giving full access to ConsumerRecord might be useful for advanced users, but at the same time it couples Decaton tightly to Kafka.

I propose the strategy below. WDYT?

Creating a class to hold necessary information (e.g. ConsumedRecord? name TBD. Which will include key: byte[], value: byte[], headers: Headers)

Change TaskExtractor#extract signature to extract(ConsumedRecord record)

Why not extract(key: byte[], value: byte[], headers: Headers)?

To prevent extract() signature changes in the future when we want to provide more fields, which would require all users to rewrite their extractor impls
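
The holder-object proposal above can be sketched as follows. ConsumedRecordSketch and ExtractorSketch are hypothetical names (the real class name was still TBD at this point), and a plain Map stands in for Kafka's Headers. Because extractors receive a single holder, a new accessor can later be added to the holder without changing the extract signature that users implement.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Map;

/** Hypothetical holder for the parts of a consumed record an extractor needs. */
final class ConsumedRecordSketch {
    private final byte[] key;
    private final byte[] value;
    private final Map<String, byte[]> headers; // stand-in for Kafka's Headers

    ConsumedRecordSketch(byte[] key, byte[] value, Map<String, byte[]> headers) {
        this.key = key;
        this.value = value;
        this.headers = Collections.unmodifiableMap(headers);
    }

    byte[] key() { return key; }
    byte[] value() { return value; }
    Map<String, byte[]> headers() { return headers; }
    // A later version could add e.g. a timestamp accessor here
    // without touching ExtractorSketch#extract.
}

/** Hypothetical extractor interface taking the holder instead of positional arguments. */
interface ExtractorSketch<T> {
    T extract(ConsumedRecordSketch record);
}

final class ConsumedRecordDemo {
    // Example user implementation: decode the value as a UTF-8 string.
    static final ExtractorSketch<String> UTF8_EXTRACTOR =
            record -> new String(record.value(), StandardCharsets.UTF_8);

    public static void main(String[] args) {
        ConsumedRecordSketch record = new ConsumedRecordSketch(
                null, "payload".getBytes(StandardCharsets.UTF_8), Collections.emptyMap());
        System.out.println(UTF8_EXTRACTOR.extract(record));
    }
}
```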

kawamuray · 2024-07-05T05:57:17Z

processor/src/main/java/com/linecorp/decaton/processor/runtime/internal/TaskRequest.java

-     * This class will live until the task process has been completed.
-     * To lessen heap pressure, rawRequestBytes should be purged by calling this once the task is extracted.
-     */
-    public void purgeRawRequestBytes() {


ProcessingContextImpl lives until the task is completed.

Not really if users implement it in this way?

Completion comp = context.deferCompletion(); executor.execute(() -> { ...; comp.complete(); });

In this case the reference for the ProcessingContext itself gets cut hence GC-able?

kawamuray · 2024-07-05T06:02:17Z

client/src/main/java/com/linecorp/decaton/client/internal/DecatonTaskProducer.java

    }

-    public CompletableFuture<PutTaskResult> sendRequest(byte[] key, DecatonTaskRequest request,


So is this class now essentially a slightly different KafkaProducer implementation which at least gives a CF as the return value of sendRequest? Is it worth keeping the class itself then..?

It also fills in preset configs (e.g. acks=all).
So I think it's worth keeping

kawamuray

just a few minor points.

kawamuray · 2024-08-09T10:07:15Z

client/src/main/java/com/linecorp/decaton/client/internal/TaskMetadataUtil.java

@@ -0,0 +1,58 @@
+/*
+ * Copyright 2020 LINE Corporation


My bad. Gonna fix

kawamuray · 2024-08-13T10:06:25Z

processor/src/main/java/com/linecorp/decaton/processor/runtime/TaskExtractor.java

     * If the method throws an exception, the task will be discarded and processor continues to process subsequent tasks.
     */
-    DecatonTask<T> extract(byte[] bytes);
+    DecatonTask<T> extract(ConsumedRecord record);


So we're effectively exposing kafka specific API (Headers) to users through this?

Yes.
In fact, we already expose Headers through ProcessingContext#headers.

We might extend Decaton to support other types of MQ in the future, but then we would need to change ProcessingContext#headers too, so I think we're OK to expose headers for now.

Of course, we should be careful to avoid coupling Decaton to a specific MQ as much as possible.

kawamuray

LGTM, cool 👍

ocadaruma added 5 commits June 13, 2024 17:53

TaskMetadata as header

eadc5c1

fix metadata drop

e307321

fix test

02261cf

add test

a0ddf01

add integration test

e52c367

ocadaruma added the breaking change Breaking change for a public API label Jun 14, 2024

ocadaruma marked this pull request as ready for review June 14, 2024 03:03

ocadaruma requested a review from kawamuray June 14, 2024 03:03

ocadaruma commented Jun 14, 2024

kawamuray suggested changes Jul 5, 2024

address comments

644474b

ocadaruma force-pushed the metadata-as-header-2 branch from 7984c60 to 644474b Compare July 19, 2024 06:02

ocadaruma requested a review from kawamuray July 19, 2024 06:43

kawamuray suggested changes Aug 13, 2024

update license header

51ac9eb

ocadaruma requested a review from kawamuray August 13, 2024 10:27

kawamuray approved these changes Aug 13, 2024

kawamuray merged commit 5ef1bea into line:master Aug 13, 2024
5 checks passed

ocadaruma mentioned this pull request Aug 14, 2024

Update docs and examples for 9.0.0 #239

Merged

ocadaruma deleted the metadata-as-header-2 branch August 14, 2024 12:11

This was referenced Aug 15, 2024

Make Decaton can consume any topic with deserializer #241

Merged

Add IT to check protocol migration works #243

Merged
