as.cluster run asynchronous parallel? #13

Open
Gibbsdavidl opened this issue Sep 19, 2018 · 1 comment

Comments

@Gibbsdavidl

Hi there,
Great package! Really cool. I am using the plan() function along with as.cluster(), where Docker images are pulled on the VMs. The issue is that this process runs sequentially over the list of VMs, when it seems like it could/should happen asynchronously in parallel. That would save a lot of time if you have a long list of VMs (as made using googleComputeEngineR).

Here's what I'm talking about:

my_rscript <- c("docker", 
                "run", c("--net=host","--shm-size=10G"),
                "gibbsdavidl/google_r:v3", 
                "Rscript")

plan(cluster, 
     workers = as.cluster(vms, 
                          docker_image="gibbsdavidl/google_r:v3",
                          rscript=my_rscript)
     )

The as.cluster() call pulls the Docker image for each VM in order, one at a time, and it's really slow. I could probably make a 'skinnier' Docker image, though.
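One possible workaround (sketched here in Python rather than R, since the idea is language-agnostic) is to pre-pull the image on every VM concurrently before calling as.cluster(), so that its sequential per-VM pulls hit the local Docker cache. `pull_image()` and the hostnames below are placeholders; real code would shell out to something like `ssh <host> docker pull gibbsdavidl/google_r:v3`.

```python
from concurrent.futures import ThreadPoolExecutor

def pull_image(host):
    # Stand-in for: subprocess.run(["ssh", host, "docker", "pull",
    #                               "gibbsdavidl/google_r:v3"], check=True)
    return f"pulled on {host}"

hosts = ["vm1", "vm2", "vm3"]  # placeholder VM names

# Run every pull at the same time; total wall time is roughly the
# slowest single pull rather than the sum of all pulls.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(pull_image, hosts))
```

With the images already cached, the per-VM setup done by as.cluster() should be much faster even though it still runs sequentially.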

Thanks!
-dave

@HenrikBengtsson
Collaborator

Sorry for the slow reply. Yes, this would be a neat feature, especially when setting up remote connections is "slow" or when setting up a lot of parallel workers. I actually have an old note of mine on this:

Would it make sense to "parallelize" makeClusterPSOCK(), i.e.

  1. launch all workers
  2. then connect to each of them

instead of as now:

  1. for all workers,
    a. launch it
    b. connect to it

A downside of this approach is when the setup of the connections fails: by then we might have launched lots of zombie workers. The risk of this happening could be mitigated by first making sure that a single worker can be set up, and only after that has been confirmed, launching the rest in parallel.
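The launch-all-then-connect idea, including the mitigation above, can be sketched as follows (in Python rather than R, purely for illustration; `launch()` and `connect()` are stand-ins for the real PSOCK worker setup steps, and the hostnames are placeholders):

```python
from concurrent.futures import ThreadPoolExecutor

def launch(host):
    # Stand-in for starting the remote Rscript worker process.
    return {"host": host, "proc": f"worker@{host}"}

def connect(worker):
    # Stand-in for opening the socket connection back to the worker.
    return {**worker, "connected": True}

def make_cluster(hosts):
    # Mitigation: set up one worker end-to-end first, so a configuration
    # error fails fast instead of leaving a fleet of zombie workers.
    first = connect(launch(hosts[0]))

    # 1. Launch all remaining workers in parallel ...
    with ThreadPoolExecutor() as pool:
        rest = list(pool.map(launch, hosts[1:]))

    # 2. ... then connect to each of them.
    rest = [connect(w) for w in rest]
    return [first] + rest

cluster = make_cluster(["vm1", "vm2", "vm3"])
```

This contrasts with the current behavior, where each worker is launched and connected to before the next one is touched.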

PS. Note that future::makeClusterPSOCK() has nothing to do with the Future API per se. I've deliberately implemented it such that it could be moved elsewhere, e.g. incorporated into the parallel package itself. In other words, the same feature request applies to the sibling parallel::makePSOCKcluster() as well.

@HenrikBengtsson HenrikBengtsson transferred this issue from futureverse/future Oct 20, 2020