Error setting up cloud instance #220

Open
jnmaloof opened this issue Feb 18, 2021 · 1 comment

jnmaloof commented Feb 18, 2021

I am submitting my first job through cloudml, and the install step on the cloud instance fails.

Log from the Google Cloud console:

The replica master 0 exited with a non-zero status of 1. 
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-req-build-kgurwctg/setup.py", line 163, in <module>
    cmdclass         = { "install": CustomCommands }
  File "/opt/conda/lib/python3.7/site-packages/setuptools/__init__.py", line 161, in setup
    return distutils.core.setup(**attrs)
  File "/opt/conda/lib/python3.7/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/opt/conda/lib/python3.7/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/opt/conda/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/tmp/pip-req-build-kgurwctg/setup.py", line 138, in run
    self.RunCustomCommandList(PIP_INSTALL_KERAS)
  File "/tmp/pip-req-build-kgurwctg/setup.py", line 119, in RunCustomCommandList
    self.RunCustomCommand(command, True)
  File "/tmp/pip-req-build-kgurwctg/setup.py", line 102, in RunCustomCommand
    raise RuntimeError(message)
RuntimeError: Command ['pip', 'install', 'h5py', 'pyyaml', 'requests', 'Pillow', 'scipy', '--upgrade'] failed: exit code 1

To find out more about why your job exited please check the logs: https://console.cloud.google.com/logs/viewer?project=894990183050&resource=ml_job%2Fjob_id%2Fcloudml_2021_02_18_030427919&advancedFilter=resource.type%3D%22ml_job%22%0Aresource.labels.job_id%3D%22cloudml_2021_02_18_030427919%22
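
The traceback above only reports that the pip command exited with code 1; the actual cause is in pip's own output, which the summary does not show. A minimal sketch, assuming a Python 3.7 environment similar to the worker image, re-runs a command and keeps the tail of its combined output. `run_and_report` and `PIP_COMMAND` are hypothetical names for illustration and are not part of cloudml or its setup.py; the command here is invoked through the current interpreter (`python -m pip`) rather than the bare `pip` seen in the log.

```python
import subprocess
import sys

# The package list is copied from the RuntimeError message in the log.
# Assumption: running pip via the current interpreter targets the same
# environment that the failing setup.py used.
PIP_COMMAND = [sys.executable, "-m", "pip", "install",
               "h5py", "pyyaml", "requests", "Pillow", "scipy", "--upgrade"]


def run_and_report(command, tail_lines=20):
    """Run `command` and return (exit_code, last `tail_lines` output lines).

    Hypothetical helper for illustration; not part of cloudml.
    """
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so pip's error text is kept
        universal_newlines=True,   # decode output as text (works on 3.7)
    )
    tail = result.stdout.splitlines()[-tail_lines:]
    return result.returncode, tail
```

Calling `run_and_report(PIP_COMMAND)` and printing the returned lines should surface which of the five packages failed to build or download, which the one-line RuntimeError does not say.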

jnmaloof (Author) commented

To explore this a bit more, I:

  1. Installed cloudml from this repository rather than from CRAN.

  2. Ran the example MNIST script:

library(cloudml)
dir.create("mnist-train")
file.copy(system.file("examples/mnist/train.R", package = "cloudml"), "mnist-train")
setwd("mnist-train")
cloudml_train()

The first error that pops up is probably not consequential:

ERROR: You have configured your Cloud SDK installation to be fixed to version [220.0.0]. Make sure this is a valid archived Cloud SDK version.

But things go wrong when installing Matrix 1.3-2, where I get:

curl: (22) The requested URL returned error: 404 Not Found
FAILED
Error in getSourceForPkgRecord(pkgRecord, srcDir(project), availablePackagesSource(repos = repos), :
Failed to retrieve package sources for Matrix 1.3-2 from CRAN (internet connectivity issue?)
Calls: retrieve_packrat_packages ... restoreImpl -> playActions -> installPkg -> getSourceForPkgRecord
Execution halted
Command ['Rscript', '/root/.local/lib/python3.7/site-packages/cloudml-model/cloudml/deploy.R'] failed: exit code 1
Command '['python3', '-m', 'cloudml-model.cloudml.deploy', 'Rscript', '--job-dir', 'gs://jm-dl-r-2/r-cloudml/staging']' returned non-zero exit status 1.
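
The log does not show which URL returned the 404. CRAN keeps only the latest release of a package under `src/contrib` and moves superseded releases to `src/contrib/Archive/<package>/`, so one possible explanation (an assumption, not confirmed by the log) is that Matrix 1.3-2 was requested from a location where it no longer lives. A minimal sketch for checking both conventional locations; the mirror URL and both function names are illustrative assumptions, not part of cloudml or packrat:

```python
import urllib.error
import urllib.request

# Assumption: the main CRAN site; the worker may have used another mirror.
CRAN_MIRROR = "https://cran.r-project.org"


def cran_source_urls(package, version, mirror=CRAN_MIRROR):
    """Return the two conventional CRAN locations for a source tarball."""
    tarball = "{}_{}.tar.gz".format(package, version)
    return {
        "current": "{}/src/contrib/{}".format(mirror, tarball),
        "archive": "{}/src/contrib/Archive/{}/{}".format(mirror, package, tarball),
    }


def url_status(url, timeout=10):
    """Return the HTTP status code of a HEAD request, e.g. 200 or 404.

    Network-level failures (DNS, timeouts) are not caught and will raise
    urllib.error.URLError.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code
```

Passing each URL from `cran_source_urls("Matrix", "1.3-2")` to `url_status` shows whether the version is still at the current location, only in the archive, or missing from both.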

Full logs in CSV and JSON are attached:

Archive.zip
