Formulas don't seem to be searched for globals #87

Open
DavisVaughan opened this issue Mar 27, 2023 · 1 comment
@DavisVaughan

First seen in futureverse/furrr#256.

I've reduced that down to this minimal-ish example. It seems like formula objects aren't searched for globals? I'm not quite sure. If you do decide to look inside formula objects for globals, it will be important to look those globals up in the formula's environment rather than in the standard envir argument.

The only workaround I found was to wrap the formula creation in local(), so the constant lives in an environment that gets shipped to the worker along with the formula.

library(future)

set.seed(123)

plan(multisession, workers = 2)

df <- data.frame(
  y = sample(10),
  x = sample(10)
)

constant <- rep(0, nrow(df))
formula <- y ~ x + constant

# This works
model.matrix(formula, data = df)
#>    (Intercept)  x constant
#> 1            1 10        0
#> 2            1  5        0
#> 3            1  3        0
#> 4            1  8        0
#> 5            1  1        0
#> 6            1  4        0
#> 7            1  6        0
#> 8            1  9        0
#> 9            1  7        0
#> 10           1  2        0
#> attr(,"assign")
#> [1] 0 1 2

# This doesn't
result <- future::future({
  model.matrix(formula, data = df)
})

# Uh oh
value(result)
#> Error in eval(predvars, data, env): object 'constant' not found

# //////////////////////////////////////////////////////////////////////////////

# globals is not seeing `constant`
globals::globalsOf(quote({
  model.matrix(formula, data = df)
}))
#> $`{`
#> .Primitive("{")
#> 
#> $model.matrix
#> function (object, ...) 
#> UseMethod("model.matrix")
#> <bytecode: 0x7fb5667dcbb0>
#> <environment: namespace:stats>
#> 
#> $formula
#> y ~ x + constant
#> 
#> $df
#>     y  x
#> 1   3 10
#> 2  10  5
#> 3   2  3
#> 4   8  8
#> 5   6  1
#> 6   9  4
#> 7   1  6
#> 8   7  9
#> 9   5  7
#> 10  4  2
#> 
#> attr(,"where")
#> attr(,"where")$`{`
#> <environment: base>
#> 
#> attr(,"where")$model.matrix
#> <environment: package:stats>
#> attr(,"name")
#> [1] "package:stats"
#> attr(,"path")
#> [1] "/Library/Frameworks/R.framework/Versions/4.2/Resources/library/stats"
#> 
#> attr(,"where")$formula
#> <environment: R_GlobalEnv>
#> 
#> attr(,"where")$df
#> <environment: R_GlobalEnv>
#> 
#> attr(,"class")
#> [1] "Globals" "list"

# Note that the environment of `formula` is the global env.
# Since the global env isn't serialized to the worker, the `constant` won't
# be available over on the worker

# //////////////////////////////////////////////////////////////////////////////

# Here is a trick that does work, taking advantage of the fact that a `formula`
# keeps track of its environment

formula <- local({
  constant <- rep(0, nrow(df))
  y ~ x + constant
})

# Note the env here isn't the global env
formula
#> y ~ x + constant
#> <environment: 0x7fb5706b6a38>

# So now this works
result <- future::future({
  model.matrix(formula, data = df)
})

value(result)
#>    (Intercept)  x constant
#> 1            1 10        0
#> 2            1  5        0
#> 3            1  3        0
#> 4            1  8        0
#> 5            1  1        0
#> 6            1  4        0
#> 7            1  6        0
#> 8            1  9        0
#> 9            1  7        0
#> 10           1  2        0
#> attr(,"assign")
#> [1] 0 1 2
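
To make the "look in the formula's environment" point concrete, here is a minimal base R sketch. `formula_globals()` is a hypothetical helper written for illustration; it is not part of globals or future, and it only approximates what a real implementation would need to do. It lists the variables a formula needs that `data` does not supply and checks whether each can be found starting from the formula's own environment.

```r
# Hypothetical helper (not part of globals or future): report which formula
# variables are missing from `data` and whether each one can be found when
# the lookup starts from the formula's own environment.
formula_globals <- function(formula, data) {
  env <- environment(formula)
  # Variables referenced by the formula that are not columns of `data`
  vars <- setdiff(all.vars(formula), names(data))
  found <- vapply(vars, exists, logical(1), envir = env, inherits = TRUE)
  list(env = env, vars = vars, found = found)
}

df <- data.frame(y = 1:3, x = 3:1)

# Formula created next to `constant`; its environment is wherever this runs
constant <- rep(0, nrow(df))
f_global <- y ~ x + constant

# Formula created inside local(); its environment holds its own `constant`
f_local <- local({
  constant <- rep(0, nrow(df))
  y ~ x + constant
})

g <- formula_globals(f_global, df)
l <- formula_globals(f_local, df)

g$vars
l$vars
identical(g$env, l$env)
```

Both formulas need `constant`, but the lookup starts in different environments, which is why only the `local()` version carries the value along when the formula is serialized.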
@twest820

twest820 commented Apr 7, 2023

An observation that may be relevant to issue prioritization: I've updated the fit_gam() function mentioned in futureverse/furrr#256 to use the local() workaround. It looks like the workaround is required whenever plan() uses more than one worker, and that future lacks a way to query the current number of workers, which suggests there isn't an easy way for functions to bypass the workaround in single-worker cases. I can work around this as well in my current use cases, so it's not a big deal, but it does strike me as odd that workers = 1 breaks local().
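
One way a function could apply the workaround unconditionally, regardless of the plan, is to rebuild the formula's environment itself. The sketch below uses base R only. `localize_formula()` is a hypothetical name, not an existing future or globals function, and it assumes every non-data variable in the formula can be found from the formula's current environment.

```r
# Hypothetical helper: copy the formula's non-data variables into a fresh
# environment and attach that environment to the formula, mimicking what
# wrapping the formula creation in local() achieves.
localize_formula <- function(formula, data) {
  src <- environment(formula)
  vars <- setdiff(all.vars(formula), names(data))
  # The parent stays `src`, so lookups of anything not copied behave as before
  env <- new.env(parent = src)
  for (v in vars) {
    if (exists(v, envir = src, inherits = TRUE)) {
      assign(v, get(v, envir = src, inherits = TRUE), envir = env)
    }
  }
  environment(formula) <- env
  formula
}

df <- data.frame(y = 1:3, x = 3:1)
constant <- rep(0, nrow(df))

f <- localize_formula(y ~ x + constant, df)

# `constant` now lives in the formula's own environment
ls(environment(f))
model.matrix(f, data = df)
```

Keeping `src` as the parent is a design choice: if `src` is a function's execution environment rather than the global environment, serializing the formula would also carry everything in `src`, so a leaner parent may be preferable there.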
