Hi there,
Great package! Really cool. I am using the plan function along with as.cluster, where docker images are being pulled on the VMs. The issue is that this process is done sequentially for a list of VMs ... and it seems like it could/should take place asynchronously, in parallel ... which would save a lot of time if you have a long list of VMs (as made using googleComputeEngineR).
The as.cluster call pulls the docker image for each VM in order ... one at a time ... and it's really slow. I could probably make a 'skinnier' docker image however.
Thanks!
-dave
Sorry for the slow reply. Yes, this would be a neat feature, especially when setting up remote connections is "slow" or when setting up a lot of parallel workers. I actually have an old note of mine on this:
Would it make sense to "parallelize" makeClusterPSOCK(), i.e.
1. launch all workers
2. then connect to each of them
instead of as now:
1. for all workers,
   a. launch it
   b. connect to it
A downside of this approach is what happens when the setup of the connections fails: we might then have launched lots of zombie workers. The risk of this happening could be mitigated by first making sure that one worker can be set up, and launching the rest in parallel only after that has been confirmed.
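To make the idea concrete, here is a minimal, language-neutral sketch in Python of the "confirm one worker first, then launch the rest in parallel" strategy. It is not how makeClusterPSOCK() is implemented. `setup_cluster`, `launch`, `connect`, and `max_parallel` are hypothetical names invented for this illustration; `launch` and `connect` stand in for whatever starts a remote worker and opens a connection to it.

```python
from concurrent.futures import ThreadPoolExecutor


def setup_cluster(hosts, launch, connect, max_parallel=8):
    """Launch-then-connect setup, guarded by a single-worker check.

    `launch(host)` and `connect(handle)` are hypothetical callables
    supplied by the caller (assumptions, not part of any real API).
    """
    if not hosts:
        return []

    # Step 0: set up one worker end to end. If this raises, nothing else
    # has been launched yet, so no zombie workers are left behind.
    first = connect(launch(hosts[0]))

    rest = hosts[1:]
    if not rest:
        return [first]

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # Step 1: launch all remaining workers concurrently.
        handles = list(pool.map(launch, rest))
        # Step 2: connect to each launched worker (results keep host order).
        connections = list(pool.map(connect, handles))

    return [first] + connections
```

The first worker is still set up sequentially, so the slow part (e.g. pulling a docker image) is paid once up front; only after it succeeds are the remaining workers started side by side. A failure in step 1 or 2 can still leave launched workers behind, so a real implementation would also need cleanup.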
PS. Note that future::makeClusterPSOCK() has nothing to do with the Future API per se. I've deliberately implemented it such that it could be moved elsewhere, e.g. incorporated into the parallel package itself. In other words, the same feature request applies to the sibling parallel::makePSOCKcluster() as well.