Resampling with store_models = TRUE is slow #1222

be-marc opened this issue Nov 28, 2024 · 0 comments

be-marc commented Nov 28, 2024

```r
library(mlr3)
library(mlr3learners)

task = tsk("pima")
learner = lrn("classif.ranger", num.trees = 5000)
resampling = rsmp("cv", folds = 10)

# Without storing the fitted models
system.time(resample(task, learner, resampling))

#    user  system elapsed 
#  25.037   1.413  24.330 

# With store_models = TRUE: each fitted ranger model is kept in the result
system.time(resample(task, learner, resampling, store_models = TRUE))

#    user  system elapsed 
#  27.560   1.701  27.138 
```
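To check how large the stored ranger models actually are, the size of each fitted model can be read from the `ResampleResult`. This is a minimal sketch, assuming mlr3, mlr3learners and ranger are installed; `model_mb` is a name introduced here for illustration, and `object.size()` only approximates memory use (it does not count everything an object may reference).

```r
library(mlr3)
library(mlr3learners)

task = tsk("pima")
learner = lrn("classif.ranger", num.trees = 5000)
resampling = rsmp("cv", folds = 10)

rr = resample(task, learner, resampling, store_models = TRUE)

# Approximate size in MB of the fitted ranger model stored in each iteration
model_mb = vapply(rr$learners, function(l) {
  as.numeric(object.size(l$model)) / 1024^2
}, numeric(1))

print(round(model_mb, 1))
```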

Storing the models takes about 3 seconds longer. When tuning with `store_models = TRUE`, the difference was almost 8 seconds. The ranger models are quite large (~60 MB). The extra 3 seconds are spent creating this data.table:

From `mlr3/R/resample.R`, lines 121 to 131 at commit 2c8734a:

```r
data = data.table(
  task = list(task),
  learner = grid$learner,
  learner_state = map(res, "learner_state"),
  resampling = list(resampling),
  iteration = seq_len(n),
  prediction = map(res, "prediction"),
  uhash = UUIDgenerate(),
  param_values = map(res, "param_values"),
  learner_hash = map_chr(res, "learner_hash")
)
```
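The cost of this call can be examined in isolation, outside of `resample()`. This is a rough sketch that assumes only the data.table package: `states` is a hypothetical stand-in for `map(res, "learner_state")`, with each element holding a numeric vector of `model_len` doubles (about 8 MB at `1e6`; raising `model_len` to about `7.5e6` approaches the ~60 MB per model reported above). Whether this synthetic setup reproduces the overhead seen with real ranger learner states has not been verified.

```r
library(data.table)

n = 10            # number of resampling iterations (folds)
model_len = 1e6   # doubles per stand-in model, roughly 8 MB each

# Hypothetical stand-ins for map(res, "learner_state"):
# one list per iteration, each holding a large "model"
states = replicate(n, list(model = numeric(model_len)), simplify = FALSE)

# Time a data.table() call with a list column, shaped like the one in resample()
timing = system.time({
  tab = data.table(
    iteration = seq_len(n),
    learner_state = states
  )
})

print(timing)
```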

Profiling of this part:

[image: profiling output]
