PERFORMANCE: Set options(socketOptions = "no-delay") on workers by default(?) #72

Closed
HenrikBengtsson opened this issue Nov 9, 2021 · 1 comment

@HenrikBengtsson
Collaborator

Background

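Setting options(socketOptions = "no-delay") on parallel "snow" workers during their startup significantly decreases the communication latency on Linux. Per ?socketConnection, "no-delay" enables the TCP_NODELAY socket option, so the socket flushes its send buffer immediately instead of waiting to collect more output. This requires R (>= 4.1.0). See ?socketConnection for details.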

Proposal

Have makeClusterPSOCK(), or more precisely makeNodePSOCK(), set this option by default on each parallel worker.

There are a few alternatives:

  1. We could make rscript_startup = quote(options(socketOptions = "no-delay")) the new default. The drawback is that any code that already sets rscript_startup explicitly replaces this default, so it would not benefit from the faster setting.
  2. The same drawback would apply if we could do rscript_options = list(socketOptions = "no-delay"), cf. issue #70, "makeClusterPSOCK(): Add rscript_options".
  3. Introduce a new argument socketOptions that sets this option on the parallel workers. It can default to socketOptions = "no-delay".

I'm leaning towards Alt 3.
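
For reference, Alt 1 can already be tried by hand through the existing rscript_startup argument. The following is only a minimal sketch, assuming a local worker and R (>= 4.1.0) for the option to have any effect. It asks the worker to report its socketOptions setting. On parallelly versions that already set this option by default, the worker reports the same value either way, so this only illustrates the mechanism.

```r
library(parallelly)

# Alt 1 by hand: evaluate options(socketOptions = "no-delay") on the worker
# during its startup, before it enters its event loop.
cl <- makeClusterPSOCK(
  1,
  rscript_startup = quote(options(socketOptions = "no-delay"))
)

# Ask the worker which value of 'socketOptions' it ended up with
worker_opts <- parallel::clusterEvalQ(cl, getOption("socketOptions"))[[1]]
print(worker_opts)

parallel::stopCluster(cl)
```

A caller that passes its own rscript_startup would have to repeat the options() call there, which is the drawback noted for Alt 1.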

cc/ @jeffkeller87

@HenrikBengtsson
Collaborator Author

Added Alt 3 in parallelly (>= 1.28.1-9003). The default can be controlled by option parallelly.makeNodePSOCK.socketOptions and env var R_PARALLELLY_MAKENODEPSOCK.SOCKETOPTIONS.

library(parallel)
library(parallelly)

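clA <- makeClusterPSOCK(1)                              ## uses the default, socketOptions = "no-delay"
clB <- makeClusterPSOCK(1, socketOptions = "no-delay")  ## same as the default, set explicitly
clC <- makeClusterPSOCK(1, socketOptions = NULL)        ## do not set socketOptions on the worker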

stats <- bench::mark(
  clusterEvalQ(clA, iris)[[1]],
  clusterEvalQ(clB, iris)[[1]],
  clusterEvalQ(clC, iris)[[1]]
)
print(stats)
#> # A tibble: 3 × 13
#>   expression                        min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
#>   <bch:expr>                   <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
#> 1 clusterEvalQ(clA, iris)[[1]]    772µs   1.02ms     960.     23.8KB     0      480     0      500ms
#> 2 clusterEvalQ(clB, iris)[[1]]    797µs   1.04ms     961.     23.8KB     2.04   471     1      490ms
#> 3 clusterEvalQ(clC, iris)[[1]]    865µs  43.85ms      28.2    23.8KB     0       15     0      531ms
#> # … with 4 more variables: result <list>, memory <list>, time <list>, gc <list>
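
To confirm that the argument reaches the workers, one can query the option on each worker directly. This is a rough sketch, assuming parallelly (>= 1.28.1-9003), no socketOptions setting in the workers' own startup files, and that option parallelly.makeNodePSOCK.socketOptions is not set in the main session. The helper worker_socket_options() is a hypothetical name used only for this illustration.

```r
library(parallelly)

# Hypothetical helper: return R option 'socketOptions' as seen by the
# first worker of cluster 'cl'
worker_socket_options <- function(cl) {
  parallel::clusterEvalQ(cl, getOption("socketOptions"))[[1]]
}

cl_default <- makeClusterPSOCK(1)                        # default socketOptions
cl_unset   <- makeClusterPSOCK(1, socketOptions = NULL)  # option left untouched

opts_default <- worker_socket_options(cl_default)
opts_unset   <- worker_socket_options(cl_unset)
str(list(default = opts_default, unset = opts_unset))

parallel::stopCluster(cl_default)
parallel::stopCluster(cl_unset)
```

Under those assumptions, the default worker should report "no-delay" and the other one NULL, which matches the latency gap in the benchmark above.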
