Add task batching doc #150
Conversation
Force-pushed from e2813da to 0b5959e
Thanks for the follow-up! I left some feedback.
Also, would you mind updating the example project? (https://github.com/line/decaton/tree/master/docs/example/src/main/java/example)
docs/task-batching.adoc (Outdated)

```adoc
When the downstream DB supports batching I/O (which is often very efficient)

== Usage

To use `Task Batching`, you only need to instantiate `BatchingProcessor`.
```
Practically, developers will create their own class that inherits `BatchingProcessor` rather than instantiating it directly, I guess.
So "you only need to implement `BatchingProcessor`" sounds more appropriate?
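For illustration, such a subclass might look like the sketch below. This is hypothetical: it assumes `BatchingProcessor` takes `(lingerMillis, capacity)` constructor arguments matching the `createBatchingProcessor` signature quoted later, exposes a `processBatchingTasks(List)` callback as described elsewhere in this thread, and wraps each buffered task in a `BatchingTask` carrying the task and its deferred completion.

```java
import java.util.List;
import java.util.stream.Collectors;

// package path assumed; adjust to wherever BatchingProcessor actually lives
import com.linecorp.decaton.processor.processors.BatchingProcessor;

public class InsertHelloTaskBatchingProcessor extends BatchingProcessor<HelloTask> {

    public InsertHelloTaskBatchingProcessor(long lingerMillis, int capacity) {
        super(lingerMillis, capacity);
    }

    @Override
    protected void processBatchingTasks(List<BatchingTask<HelloTask>> batchingTasks) {
        List<HelloTask> tasks = batchingTasks.stream()
                                             .map(batchingTask -> batchingTask.task)
                                             .collect(Collectors.toList());
        insertBatchToDb(tasks); // hypothetical batched INSERT into the downstream DB
        // complete each task only after the whole batch has been persisted
        batchingTasks.forEach(batchingTask -> batchingTask.completion.complete());
    }

    private void insertBatchToDb(List<HelloTask> tasks) {
        // e.g. one multi-row INSERT; details omitted
    }
}
```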
The description and examples have been updated to use `InsertHelloTaskBatchingProcessor`, which inherits `BatchingProcessor<HelloTask>`.
```java
    )
}

private static BatchingProcessor<HelloTask> createBatchingProcessor(long lingerMillis, int capacity) {
```
As in the comment above, I guess the practical usage would be implementing a class that inherits `BatchingProcessor`.
Then how about fixing this example like so?
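The suggested shape might look like this sketch (the topic name and the linger/capacity values are placeholders; registering a processor supplier with `thenProcess` and a `ProcessorScope` is Decaton's standard pattern, though the exact import paths here are assumptions):

```java
import com.linecorp.decaton.processor.runtime.ProcessorScope;
import com.linecorp.decaton.processor.runtime.ProcessorsBuilder;
import com.linecorp.decaton.protobuf.ProtocolBuffersDeserializer;

// Register the subclass directly instead of going through a
// createBatchingProcessor() factory method:
ProcessorsBuilder.consuming("my-decaton-topic",
                            new ProtocolBuffersDeserializer<>(HelloTask.parser()))
                 .thenProcess(() -> new InsertHelloTaskBatchingProcessor(1000, 100),
                              ProcessorScope.THREAD);
```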
I did the same. #150 (comment)
```adoc
[source,java]
.HelloTask.java
----
public class HelloTask {
```
Let's make this example look more "real" by naming this class "InsertHelloTask" or something? :)
Sorry, maybe I don't understand this comment properly.
I created `InsertHelloTaskBatchingProcessor` and wrote the insertion points in the comment, but please let me know if there are any problems.
https://github.com/line/decaton/pull/150/files#diff-6cfdc5f934448411f9f9ff81c4cf114233a240ab01765dec212c7eac322f9961R70
docs/task-batching.adoc (Outdated)

```adoc
In this section, we will briefly explain how Task Batching is implemented.
All the magic happens in `BatchingProcessor`; when a task comes to this processor, the following things happen:

1. The task will be put into an in-memory window.
2. When the size or time reaches its limit, `processBatchingTasks(List)` is called with the stored `batchingTasks`.
```
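For intuition, the windowing behavior described in the quoted section boils down to something like this standalone sketch (not Decaton's actual code; just a buffer that flushes on whichever of the capacity or linger limit is hit first):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class BatchingWindow<T> {
    private final int capacity;
    private final Consumer<List<T>> flusher;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private List<T> buffer = new ArrayList<>();

    BatchingWindow(long lingerMillis, int capacity, Consumer<List<T>> flusher) {
        this.capacity = capacity;
        this.flusher = flusher;
        // time-based flush: fires every lingerMillis regardless of buffer size
        scheduler.scheduleAtFixedRate(this::flush, lingerMillis, lingerMillis,
                                      TimeUnit.MILLISECONDS);
    }

    synchronized void put(T task) {
        buffer.add(task);
        if (buffer.size() >= capacity) {
            flush(); // size-based flush
        }
    }

    private synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<T> batch = buffer;
        buffer = new ArrayList<>();
        flusher.accept(batch); // corresponds to processBatchingTasks(List)
    }
}
```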
I think what we should include in the "Implementation" section is something developers should notice that follows from the implementation details (rather than the "API"), not just an account of how the feature is implemented.
e.g. the rate-limiting doc mentions that it adopts the token-bucket algorithm, which allows some "bursting" beyond the configured rate limit, so developers can tell it's not suitable when they have to limit the rate strictly. (https://github.com/line/decaton/blob/master/docs/rate-limiting.adoc#implementation)
Then I don't think we need an Implementation section for BatchingProcessor? Or do you have some ideas you want to mention in this section?
Thanks for the detailed explanation.
Removed the "Implementation" section.
@ryamagishi Hi, long time no see. Do you still plan to complete this PR?
Hi @ocadaruma, I apologize for the delay and for forgetting about this PR. If you would permit, I would like to complete it by the end of this month. I need some time to refresh my memory on the work, so I appreciate your understanding.
That's totally fine. Thank you!
…BatchingProcessor instead of direct instantiation.
Force-pushed from e0c9d97 to 332262a
@ocadaruma
```diff
@@ -7,7 +7,7 @@ plugins {

 ext {
     DECATON_VERSION = getProperty("version") + (getProperty("snapshot").toBoolean() ? "-SNAPSHOT" : "")
-    PROTOBUF_VERSION = "3.3.0"
+    PROTOBUF_VERSION = "3.22.3"
```
The example build failed, so I upgraded it.
Thanks for the update, the current content LGTM!
I left one more request. Please take a look.
Could you add a Caution: somewhere to mention the following?

- batch-flush is done in `BatchingProcessor`'s scheduled executor thread
- which means the parallelism of flushing has to be controlled by `ProcessorScope`, not only by the `decaton.partition.concurrency` config, i.e.:
  - parallelize flushing per partition: `ProcessorScope.PARTITION`
  - parallelize flushing per processor thread: `ProcessorScope.THREAD`

Some users pointed out that this behavior might be confusing, so it's worth mentioning.
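To make the point concrete, here is a sketch of the two choices (reusing the hypothetical `InsertHelloTaskBatchingProcessor` from earlier; `builder` stands for an assumed `ProcessorsBuilder` mid-chain):

```java
// Either: one processor instance, hence one scheduled flush, per partition
builder.thenProcess(() -> new InsertHelloTaskBatchingProcessor(1000, 100),
                    ProcessorScope.PARTITION);

// Or: one instance per processor thread, so flushing also scales with the
// decaton.partition.concurrency setting
builder.thenProcess(() -> new InsertHelloTaskBatchingProcessor(1000, 100),
                    ProcessorScope.THREAD);
```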
Thanks! I've added it.
7b0112a
Force-pushed from fc79b8c to 2bd4b5e
Force-pushed from 387fa4f to 7b0112a
@ocadaruma
Thank you! LGTM
Motivation

Add a doc for BatchingProcessor, like task-compaction.adoc.

Related Issue: #128
Related PR: #139