Refactor of get_tune_schedule() #978

Open

wants to merge 7 commits into base: tune-schedule

Conversation

@hfrick hfrick (Member) commented Jan 17, 2025

Here's the refactor of get_tune_schedule()! The basic idea is to schedule the stages in order, starting at the preprocessing stage and working down to the postprocessing stage, handling one stage at a time and pushing the remaining parameters into a nested tibble (see the sketch below).
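
To make the nesting idea concrete, here is a minimal sketch of the first step with a made-up grid and made-up parameter names (none of this is taken from the PR): the preprocessing parameters stay at the top level and everything else is pushed into a nested model_stage column.

library(tidyr)
library(dplyr)

# toy grid: one recipe parameter, one model parameter, one tailor parameter
grid <- crossing(
  threshold = c(0.1, 0.2),
  trees     = c(100, 200),
  cut       = c(0.4, 0.6)
)

# schedule the preprocessing stage and push the rest into a nested tibble,
# one row per preprocessing candidate
grid %>%
  nest(.by = all_of("threshold"), .key = "model_stage")
#> 2 rows, one per value of `threshold`, each holding a 4-row `model_stage`
#> tibble with `trees` and `cut`; the same nesting then repeats for the
#> model and predict stages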

I've made a PR into tune-schedule so that you can see the diffs to the previous version clearly. I understand that branch to be our place to work things out, so I'm happy to make a separate PR into main if we are happy with how get_tune_schedule() looks. I think the tests could also make use of that separation of the stages into smaller scheduling functions, but I didn't do this here because I wanted you to see how the tests changed for this refactor.

The refactor leads to 0-row tibbles when there are no tuning parameters at all (which we discussed in the team meeting) and to small changes in the order of the columns. The ordering of the rows (for preprocessing) now also stays the same between the incoming grid and the outgoing schedule.

Since this is the second round of working over this scheduling function, there's no need to review "only" at a high level; hit me with your nits so that this part is ready for main!


# ------------------------------------------------------------------------------
get_param_info <- function(wflow) {
param_info <- tune_args(wflow) %>%
Member Author:

Using tune_args() here instead of a parameter set object, due to the considerations I've laid out in #974 (comment).
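
For illustration, here is a small sketch of what tune_args() gives you for a tuned model spec (the toy spec below is mine, not from the PR, and the printed output is paraphrased):

parsnip::boost_tree(trees = tune::tune(), min_n = tune::tune(), mode = "regression") %>%
  tune::tune_args()
#> a tibble with one row per tuning parameter; its `id` and `source` columns
#> (here "model_spec"; "recipe" and "tailor" for the other stages) are what
#> the scheduling helpers filter on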

mod_tune_bst <- boost_tree(trees = tune(), min_n = tune(), mode = "regression")
mod_tune_rf <- rand_forest(min_n = tune(), mode = "regression")
mod_tune_bst <- parsnip::boost_tree(trees = tune(), min_n = tune(), mode = "regression")
mod_tune_rf <- parsnip::rand_forest(min_n = tune(), mode = "regression")

if (rlang::is_installed("probably")) {

adjust_tune_min <-
Member Author:

Given that we usually use `rec` in the names of recipe objects, I would like to advocate for calling tailor objects something with `tailor` rather than `adjust_`.

@simonpcouch simonpcouch (Contributor) left a comment:

Going to hold off on a proper review until I can carve out a solid chunk of time, but re:

> I've made a PR into tune-schedule so that you can see the diffs to the previous version clearly. I understand that branch to be our place to work things out, so I'm happy to make a separate PR into main if we are happy with how get_tune_schedule() looks. I think the tests could also make use of that separation of the stages into smaller scheduling functions, but I didn't do this here because I wanted you to see how the tests changed for this refactor.

Totally makes sense, thanks! I'm definitely on board for the workflow of taking chunks of that PR and refactoring + reviewing more in-detail and then sending those smaller portions into main as we do so.

Just eyeballing the diffs, it looks like this PR still makes use of the UseSpacesForTab: No setting. I see that it probably makes sense to keep that setting around to prevent conflicts with—and more easily diff against—tune-schedule, but I'd advocate for reverting back to UseSpacesForTab: Yes and reformatting the smaller chunks at some point before we send them into main. I can imagine a couple different ways that workflow could look (wait to reformat, merge into tune-schedule, reformat that whole PR to line up with the rest of the repo, extract out the relevant bits and merge to main?), but whatever results in the least work for the implementer has a thumbs-up from me.
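
For reference, that setting lives in the project's .Rproj file, so reverting would mean something like the two fields below (a hedged sketch; other fields omitted and the indent width assumed):

UseSpacesForTab: Yes
NumSpacesForTab: 2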

@hfrick hfrick requested review from topepo and simonpcouch January 17, 2025 15:40
@simonpcouch simonpcouch (Contributor) left a comment:

Very readable, very concise. A huge step up from compute_grid_info() or any of its refactors. Got a lot of joy from reviewing this one. Bravo to y'all!

+1 to working in some tests at the level of the newly separate functions, but fine with me to wait for a separate PR to make that happen.

Huzzah🙆

suppressPackageStartupMessages(library(tailor))
suppressPackageStartupMessages(library(purrr))
suppressPackageStartupMessages(library(dplyr))

Contributor:

I think my usual Do-Repeat-Yourself-ery would have these library calls within each test_that() chunk, though I know that's probably not the standard set in other places in this repo

Member Author:

Repeating the library calls in each test_that() expression is actually my least favorite approach to this problem 😅

Because:
At least when executing such test code in the active console, the library calls have a lasting effect, but they are less visible for quick cross-file scanning when they (unintentionally) affect something elsewhere. It could be that testthat sandboxes this behaviour, but I think it might not.

library(testthat) 

test_that("tune function is not found", {
  expect_error(
    tune_grid(),
    "could not find"
  )
})
#> Test passed 🎊

test_that("find it with library", {
  library(tune)
  expect_error(
    tune_grid(),
    "The first argument"
  )
})
#> Test passed 🌈

test_that("tune function is still not found", {
  expect_error(
    tune_grid(),
    "could not find"
  )
})
#> ── Failure: tune function is still not found ───────────────────────────────────
#> `tune_grid()` threw an error with unexpected message.
#> Expected match: "could not find"
#> Actual message: "The first argument to [tune_grid()] should be either a model or workflow."
#> Backtrace:
#>     ▆
#>  1. ├─testthat::expect_error(tune_grid(), "could not find")
#>  2. │ └─testthat:::quasi_capture(...)
#>  3. │   ├─testthat (local) .capture(...)
#>  4. │   │ └─base::withCallingHandlers(...)
#>  5. │   └─rlang::eval_bare(quo_get_expr(.quo), quo_get_env(.quo))
#>  6. ├─tune::tune_grid()
#>  7. └─tune:::tune_grid.default()
#>  8.   └─rlang::abort(msg)
#> Error:
#> ! Test failed

Created on 2025-01-21 with reprex v2.1.0

Contributor:

Ah, I see. Dang, doesn't seem like there's a great option then. Feel free to keep as-is. :)

schedule <- grid %>%
tidyr::nest(.by = all_of(param_pre_stage), .key = "model_stage")

# schedule next stages recursively
Contributor:

Apologies if this feels like a nit, but trying to work this comment into my mental model made me struggle with this code a bit longer than I might have otherwise: is there actually any recursion happening in this code?

Member Author:

A) > hit me with your nits

B) Not a nit, but a valuable comment! You're right, I suppose it's not quite the right word. What would you call it? Something with "nested"? Or just "schedule next stages within `schedule_model_stage_i()`"? I just want to give the reader a heads-up that all stages will be taken care of, even though you can only "see" scheduling the immediate next stage from that point in the code.

Contributor:

Yeah, I see what you're trying to call out! Maybe "nested iteration"? Or possibly just point out "each model stage will also iterate across predict and post stages"
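
To spell the nested iteration out for readers of this thread, here is a hedged sketch of the shape of that step; the exact call is inferred from the function signatures shown in this diff, not copied from the PR:

# each model_stage tibble is scheduled in turn; schedule_model_stage_i()
# then nests and schedules the predict and post stages inside it
schedule %>%
  dplyr::mutate(
    model_stage = purrr::map(
      model_stage,
      schedule_model_stage_i,
      param_info = param_info,
      wflow = wflow
    )
  )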

# schedule model parameters
schedule <- min_model_grid(model_stage, model_param, wflow)

# push remaining paramters into the next stage
Contributor:

Suggested change
# push remaining paramters into the next stage
# push remaining parameters into the next stage


# ------------------------------------------------------------------------------
# Merge in submodel fit value (if any)
schedule_model_stage_i <- function(model_stage, param_info, wflow){
Contributor:

this is so so smart and i love it

Member Author:

Max worked it out and now I got to let it shine 😄

tidyr::unnest(post_stage),
grid_nada
)
expect_equal(nrow(sched_nada), 0)

expect_s3_class(
sched_nada,
Contributor:

I can't comment on the line directly since it's unchanged in this PR, but tidy style would include no trailing newline within the test_that({}) expression

schedule_stages <- function(grid, param_info, wflow) {
# schedule preprocessing stage and push the rest into a nested tibble
param_pre_stage <- param_info %>%
filter(source == "recipe") %>%
Member:

Suggested change
filter(source == "recipe") %>%
dplyr::filter(source == "recipe") %>%

# schedule preprocessing stage and push the rest into a nested tibble
param_pre_stage <- param_info %>%
filter(source == "recipe") %>%
pull(id)
Member:

Suggested change
pull(id)
dplyr::pull(id)


# schedule next stages recursively
schedule %>%
mutate(
Member:

Suggested change
mutate(
dplyr::mutate(

# Merge in submodel fit value (if any)
schedule_model_stage_i <- function(model_stage, param_info, wflow){
model_param <- param_info %>%
filter(source == "model_spec") %>%
Member:

Suggested change
filter(source == "model_spec") %>%
dplyr::filter(source == "model_spec") %>%

schedule_model_stage_i <- function(model_stage, param_info, wflow){
model_param <- param_info %>%
filter(source == "model_spec") %>%
pull(id)
Member:

Suggested change
pull(id)
dplyr::pull(id)

filter(source == "model_spec") %>%
pull(id)
non_submodel_param <- param_info %>%
filter(source == "model_spec" & !has_submodel) %>%
Member:

Suggested change
filter(source == "model_spec" & !has_submodel) %>%
dplyr::filter(source == "model_spec" & !has_submodel) %>%

pull(id)
non_submodel_param <- param_info %>%
filter(source == "model_spec" & !has_submodel) %>%
pull(id)
Member:

Suggested change
pull(id)
dplyr::pull(id)


# push remaining paramters into the next stage
next_stage <- model_stage %>%
tidyr::nest(.by = all_of(non_submodel_param), .key = "predict_stage")
Member:

Suggested change
tidyr::nest(.by = all_of(non_submodel_param), .key = "predict_stage")
tidyr::nest(.by = dplyr::all_of(non_submodel_param), .key = "predict_stage")

Member:

OK, I think you get it at this point. This is all just protection for being invoked inside of worker processes.

Member Author:

Thanks for all the code suggestions! I find it totally okay for the reviewer to do one example and then hand it back to whoever opened the PR.

About the worker processes: do we expect to execute get_tune_schedule() in parallel? My understanding is that we first schedule and then send the to-dos off to the workers, i.e. we would not expect to call get_tune_schedule() in parallel, no?

~.x %>% dplyr::group_nest(!!!symbs$sub, .key = "post_stage")
)
)
schedule_stages <- function(grid, param_info, wflow) {
Member:

Do we need param_info as an argument? Since it is created by get_param_info(), we could call that immediately with wflow.
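
A quick sketch of that alternative signature, assuming get_param_info() stays as defined earlier in this diff (body elided):

schedule_stages <- function(grid, wflow) {
  # derive the parameter info from the workflow instead of passing it in
  param_info <- get_param_info(wflow)
  # ... the rest of the scheduling stays unchanged ...
}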

if (any(param$source == "tailor")) {
post_id <- param$id[param$source == "tailor"]
og_cls <- class(schedule)
if (nrow(param) == 0) {
Member:

It's a little gross to code here, but this class structure feels like a good solution (until we know that it is not).

@topepo topepo (Member) left a comment:

Also, since I forgot to say it in the review... this looks great. Big improvement on my refactor.
