
Add task batching doc #150

Merged: 10 commits, Jul 30, 2024

Conversation

ryamagishi
Contributor

Motivation

Related Issue: #128
Related PR: #139

@ryamagishi ryamagishi force-pushed the add-task-batching-doc branch from e2813da to 0b5959e on April 10, 2022 12:33
Contributor

@ocadaruma ocadaruma left a comment


Thanks for the follow-up!

Left some feedback.
Also, would you mind updating the example project? (https://github.com/line/decaton/tree/master/docs/example/src/main/java/example)

When the downstream DB supports batching I/O (which is often very efficient)

== Usage
To use `Task Batching`, you only need to instantiate `BatchingProcessor`.
Contributor

Practically, developers will create their own class that inherits BatchingProcessor rather than instantiating it directly, I guess.
So "you only need to implement BatchingProcessor" sounds more appropriate?

Contributor Author

The description and examples have been updated to use InsertHelloTaskBatchingProcessor, which inherits BatchingProcessor<HelloTask>.

)
}

private static BatchingProcessor<HelloTask> createBatchingProcessor(long lingerMillis, int capacity) {
Contributor

As in the comment above, I guess the practical usage would be implementing a class that inherits BatchingProcessor.

Then how about fixing this example like so?

Contributor Author

I did the same. #150 (comment)
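As an aside for readers following this thread, the pattern agreed on here (implementing a subclass rather than instantiating `BatchingProcessor` directly) might look roughly like the sketch below. The `BatchingProcessor` base class in this snippet is a simplified local stand-in so the code is self-contained; decaton's real class has a richer API, and the database insert is simulated with an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified local stand-in for decaton's abstract BatchingProcessor, kept
// here only so the subclassing pattern compiles on its own.
abstract class BatchingProcessor<T> {
    final long lingerMillis; // time limit before a partial batch is flushed
    final int capacity;      // size limit that triggers an immediate flush

    BatchingProcessor(long lingerMillis, int capacity) {
        this.lingerMillis = lingerMillis;
        this.capacity = capacity;
    }

    abstract void processBatchingTasks(List<T> batchingTasks);
}

class HelloTask {
    final String name;

    HelloTask(String name) {
        this.name = name;
    }
}

// The suggested pattern: implement BatchingProcessor rather than instantiate
// it directly, putting the batched I/O into processBatchingTasks.
class InsertHelloTaskBatchingProcessor extends BatchingProcessor<HelloTask> {
    // Stands in for the downstream DB; real code would issue one batched INSERT.
    final List<List<HelloTask>> insertedBatches = new ArrayList<>();

    InsertHelloTaskBatchingProcessor(long lingerMillis, int capacity) {
        super(lingerMillis, capacity);
    }

    @Override
    void processBatchingTasks(List<HelloTask> batchingTasks) {
        insertedBatches.add(new ArrayList<>(batchingTasks));
    }
}
```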

[source,java]
.HelloTask.java
----
public class HelloTask {
Contributor

Let's make this example look more "real" by naming this class "InsertHelloTask" or something? :)

Contributor Author

Sorry, maybe I don't understand this comment properly.

I created InsertHelloTaskBatchingProcessor and wrote the insertion points in the comment, but please let me know if there are any problems.
https://github.com/line/decaton/pull/150/files#diff-6cfdc5f934448411f9f9ff81c4cf114233a240ab01765dec212c7eac322f9961R70

Comment on lines 93 to 97
In this section, we will briefly explain how Task Batching is implemented.
All the magic happens in `BatchingProcessor`; when a task comes to this processor, the following things happen:

1. The task is put into an in-memory window.
2. When the size or time reaches its limit, `processBatchingTasks(List)` is called with the stored `batchingTasks`.
Contributor

I think what we should include in an "Implementation" section is something developers should be aware of that comes from the implementation details (rather than from the "API"), not just "how this feature is implemented".

e.g. In rate-limiting doc, it's mentioned that it adopts token-bucket algorithm which allows some "bursting" against configured rate-limit, so developers can notice that it's not suitable when they have to limit the rate strictly. (https://github.com/line/decaton/blob/master/docs/rate-limiting.adoc#implementation)

Then, I think we don't need an Implementation section for BatchingProcessor? Or do you have some ideas you want to mention in this section?

Contributor Author

Thanks for the detailed explanation.
Removed the "Implementation" section.
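For reference, the windowing behavior that the quoted section described (tasks accumulate in an in-memory window, and `processBatchingTasks` fires when a size or time limit is reached) can be sketched as below. This is a hypothetical simplification, not decaton's implementation; the time-based (linger) flush is only noted in a comment so the snippet stays deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of the windowing logic, not decaton's implementation:
// tasks accumulate in an in-memory window that is flushed once the size limit
// is hit. In the real BatchingProcessor a scheduled executor also flushes a
// partial window after lingerMillis; that part is omitted here for brevity.
class BatchingWindow<T> {
    private final int capacity;
    private final Consumer<List<T>> processBatchingTasks;
    private final List<T> window = new ArrayList<>();

    BatchingWindow(int capacity, Consumer<List<T>> processBatchingTasks) {
        this.capacity = capacity;
        this.processBatchingTasks = processBatchingTasks;
    }

    synchronized void add(T task) {
        window.add(task);                // 1. put the task into the in-memory window
        if (window.size() >= capacity) { // 2. size limit reached, so flush
            flush();
        }
    }

    synchronized void flush() {
        if (!window.isEmpty()) {
            processBatchingTasks.accept(new ArrayList<>(window));
            window.clear();
        }
    }
}
```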

@ocadaruma
Contributor

@ryamagishi Hi, long time no see. Do you still have a plan to complete this PR?

@ryamagishi
Contributor Author

Hi @ocadaruma, I apologize for the delay and for forgetting about this PR. If you would permit, I would like to complete it by the end of this month. I need some time to refresh my memory on the work, so I appreciate your understanding.

@ocadaruma
Contributor

If you would permit, I would like to complete it by the end of this month

That's totally fine. Thank you!

@ryamagishi ryamagishi force-pushed the add-task-batching-doc branch from e0c9d97 to 332262a on July 27, 2024 06:41
@ryamagishi
Contributor Author

@ocadaruma
I apologize for the late response.
I have corrected the parts you commented on.
If there is anything that needs to be corrected, I will deal with it right away so that it can be merged.
Please check it when you have time.

@@ -7,7 +7,7 @@ plugins {

ext {
DECATON_VERSION = getProperty("version") + (getProperty("snapshot").toBoolean() ? "-SNAPSHOT" : "")
PROTOBUF_VERSION = "3.3.0"
PROTOBUF_VERSION = "3.22.3"
Contributor Author

The example build failed, so I upgraded it.

Contributor

@ocadaruma ocadaruma left a comment


Thanks for the update, the current content LGTM!

I left one more request. Please take a look.

Contributor

Could you add a Caution: somewhere to mention the points below?:

  • batch-flush is done in BatchingProcessor's scheduled executor thread
  • Which means the parallelism of flushing has to be controlled by ProcessorScope, not only by the decaton.partition.concurrency config. i.e.:
    • parallelize flushing per partition: ProcessorScope.PARTITION
    • parallelize flushing per processor thread: ProcessorScope.THREAD

Some users pointed out that this behavior might be confusing, so it's worth mentioning.

Contributor Author

Thanks! I've added it.
7b0112a
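To make the caution above concrete, here is a toy model (hypothetical, not decaton's API) of why flush parallelism follows the processor scope: the processor supplier is invoked once per scope instance, so a PARTITION scope yields one batching window (and one flush schedule) per partition, while a THREAD scope yields one per processor thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Toy model (hypothetical, not decaton's API) of how processor scope controls
// flush parallelism: the supplier runs once per scope instance, so each
// instance owns its own batching window and flushes independently.
class ScopeModel {
    enum Scope { PARTITION, THREAD }

    static <P> List<P> instantiate(Supplier<P> supplier, Scope scope,
                                   int partitions, int threadsPerPartition) {
        int instances = scope == Scope.PARTITION
                ? partitions                        // one flusher per partition
                : partitions * threadsPerPartition; // one flusher per processor thread
        List<P> processors = new ArrayList<>();
        for (int i = 0; i < instances; i++) {
            processors.add(supplier.get());
        }
        return processors;
    }
}
```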

@ryamagishi ryamagishi force-pushed the add-task-batching-doc branch 2 times, most recently from fc79b8c to 2bd4b5e on July 30, 2024 09:16
@ryamagishi ryamagishi force-pushed the add-task-batching-doc branch from 387fa4f to 7b0112a on July 30, 2024 09:22
@ryamagishi
Contributor Author

@ocadaruma
I have responded to your comments.
Please take a look.

Contributor

@ocadaruma ocadaruma left a comment


Thank you! LGTM

@ocadaruma ocadaruma merged commit 08614dd into line:master Jul 30, 2024
5 checks passed