Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cache task.to_str_params() to speedup Task initialization #421

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

Hi-king
Copy link
Member

@Hi-king Hi-king commented Dec 27, 2024

Summary:

During the initialization process of gokart.TaskOnKart, especially with deeply nested TaskInstanceParameter, the repeated calls to to_str_params() caused significant performance degradation. To address this issue, we introduced a caching mechanism for function calls, which has significantly accelerated the initialization process.

Detailed Behavior:

The following actions occur during initialization:

  1. to_str_params() is called during the task's init.
  2. Inside to_str_params(), the parameter's serialize method is executed.
  3. Within the TaskInstanceParameter's serialize, to_str_params() is called recursively.

Changes:

Implemented a caching mechanism for the to_str_params() function.
Cached the string representations of parameters once computed to prevent recalculations for the same parameters.
This change substantially reduces the time complexity from O(n^2), thereby shortening the initialization time.
Impact:
This modification affects all tasks inheriting from gokart.TaskOnKart, particularly those with complex dependencies, improving the initialization performance significantly.

How to Reproduce

class HelloWorldTask(gokart.TaskOnKart):
    _cache_key_date = luigi.DateMinuteParameter(default=datetime.datetime.now())
    
    def run(self):
        self.dump("Hello, World!")

class RecursiveTask(gokart.TaskOnKart):
    dep = gokart.TaskInstanceParameter()

    def requires(self):
        return self.dep

    def run(self):
        self.dump(self.dep.load())

def create_recursive_task(n):
    if n == 0:
        return HelloWorldTask()
    return RecursiveTask(dep=create_recursive_task(n - 1))

%time create_recursive_task(90)

Prior to this change, this sample code took 21.9 seconds to execute. With the new caching mechanism, it now takes only 1.15 seconds, marking a significant improvement in performance.

TBA

need some tests

@hiro-o918
Copy link
Contributor

Nice contribution!
please fix CI errors

@Hi-king Hi-king force-pushed the cache_to_str_params branch from b289714 to a6c8ade Compare January 1, 2025 12:02
@Hi-king
Copy link
Member Author

Hi-king commented Jan 1, 2025

@hiro-o918 Thanks! I've fixed the CI errors and also added some tests!

Copy link
Collaborator

@hirosassa hirosassa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put just a nits comment

gokart/task.py Outdated Show resolved Hide resolved
Co-authored-by: hirosassa <hiro.sassa@gmail.com>
@Hi-king Hi-king changed the title Draft: cache task.to_str_params() to speedup Task initialization cache task.to_str_params() to speedup Task initialization Jan 1, 2025
Copy link
Collaborator

@hirosassa hirosassa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

gokart/task.py Outdated
@@ -373,6 +374,18 @@ def make_unique_id(self) -> str:
self.task_unique_id = unique_id
return unique_id

def to_str_params(self, only_significant=False, only_public=False):
if only_significant and (not only_public):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this condition is necessary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kitagry This intend to cache only when called from init.

There're two solutions

  • only cache default parameters: simple impl. , but specific use
  • cache all patterns of parameters: general use, but a little bit comprecated impl.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to name the condition for your intention. For example

init_condition = only_significant and (not only_public)
if !init_condition:
    return super().to_str_params(only_significant, only_public)

Copy link
Member Author

@Hi-king Hi-king Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kitagry Thanks! I've introduced _called_with_default_args :)

gokart/task.py Outdated
@@ -373,6 +374,17 @@ def make_unique_id(self) -> str:
self.task_unique_id = unique_id
return unique_id

def to_str_params(self, only_significant=False, only_public=False) -> dict[str, str]:
_called_with_default_args: bool = only_public and (not only_significant)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[ask] default only_public is False. Is it correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe?

Suggested change
_called_with_default_args: bool = only_public and (not only_significant)
_called_with_default_args: bool = (not only_public) and (not only_significant)

@Hi-king
Copy link
Member Author

Hi-king commented Jan 10, 2025

@kitagry @hiro-o918 @hirosassa

After discussing with @kitagry, we concluded that the code's intent was somewhat unclear. Therefore, we simplified it by switching to simply using functools.lru_cache .
Could you please give it a quick review and merge it? Thanks!

Copy link
Collaborator

@hirosassa hirosassa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks clear and simple. I agree with this change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants