This repository has been archived by the owner on May 22, 2024. It is now read-only.

Healthcheck failure, stardog never converge #69

Open
earthquakesan opened this issue May 6, 2018 · 6 comments

Comments

@earthquakesan

I have run:

stardog-graviton launch stardog-cloud

After the volumes and VMs were initialized, I got a timeout while waiting for Stardog to come up.

When I run the status command:

 % stardog-graviton status stardog-cloud
The instance is not healthy
Stardog is available here: http://bla.amazonaws.com:5821
Stardog is internally available here: http://internal-bla.amazonaws.com:5821
ssh is available here: bla.amazonaws.com
Failed: Failed do the post Get http://bla.amazonaws.com:5821/admin/cluster: dial tcp 52.29.xxx.xxx:5821: i/o timeout
Please check the log files:
	/home/ivan/.graviton/logs/graviton.log
	/home/ivan/.graviton/deployments/stardog-cloud/logs/graviton.log

I can connect to the instance via SSH, and the 3 ZK, 3 Stardog, and 1 Bastion nodes are running and visible in the AWS console.
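
The `i/o timeout` in the status output means the TCP connection to port 5821 never completed, which can be confirmed independently of graviton. A minimal sketch in Python, assuming only the host and port shown in the output above; the function name `port_is_reachable` is made up for illustration and is not part of graviton:

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS failures.
        return False
```

Calling `port_is_reachable("bla.amazonaws.com", 5821)` from the local machine and again from the bastion may help narrow things down: if the port is unreachable from both, Stardog is probably not listening, rather than the firewall blocking the request.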

@pdmars
Member

pdmars commented May 7, 2018

This can be caused by a number of different issues. Usually the first thing to check is the Stardog logs themselves, to see whether the nodes are unable to start for some reason. Are you able to get the logs of the Stardog nodes (stardog-graviton logs) to see if there are any errors in them?

@earthquakesan
Author

@pdmars it gives me a logs/ folder with an empty graviton.log file.

@earthquakesan
Author

I have opened another issue for the logs command.

I ssh'ed into one of the Stardog nodes and looked at the logs directly. There is a problem with the license (I passed it on to the responsible colleagues). It would be good if graviton could catch and report this directly instead of only running healthchecks, e.g. by parsing the logs and telling the user that the cluster could not come up because the license has expired.
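
The log-parsing idea could look roughly like the sketch below. It is only an illustration in Python: the thread does not show the exact wording Stardog writes to its log, so the pattern simply matches any line that mentions a license, and `find_license_lines` is a hypothetical helper, not an existing graviton function.

```python
import re
from typing import Iterable, List

# Hypothetical pattern: the real Stardog log message is not shown in this
# thread, so this matches any line mentioning "license" or "licence".
LICENSE_PATTERN = re.compile(r"licen[sc]e", re.IGNORECASE)


def find_license_lines(log_lines: Iterable[str]) -> List[str]:
    """Return the log lines that mention a license, stripped of whitespace."""
    return [line.strip() for line in log_lines if LICENSE_PATTERN.search(line)]
```

A tool could run this over each node's log after a failed healthcheck and print any matches alongside the timeout message, so the user sees a likely cause without having to ssh into the nodes.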

@earthquakesan
Author

@pdmars hi Paul!

I've got a new license. Graviton still fails to start up properly. The command I use is:

stardog-graviton launch --volume-size 25 --env="STARDOG_JAVA_ARGS=\"-Xms12g -Xmx12g -XX:MaxDirectMemorySize=12g\"" --sd-instance-type="r4.large" stardog-cloud

The stdout is:

Creating the new deployment stardog-cloud
- Initializing terraform......
- Calling out to terraform to create the volumes...
- Calling out to terraform to stop builder instances...
Successfully created the volumes.
- Initializing terraform...
- Creating the instance VMs......
Successfully created the instance.
Waiting for stardog to come up...
The instance is healthy
\ Opening the firewall......
Successfully opened up the instance.
Failed: Timed out waiting for all the cluster nodes
Please check the log files:
	/home/ivan/.graviton/logs/graviton.log
	/home/ivan/.graviton/deployments/stardog-cloud/logs/graviton.log

As far as I can see from the logs, Stardog is up and running. What could be causing the launch command to fail, and how can it be solved?

@earthquakesan
Author

OK, the license I have right now is not valid for the cluster version. Two out of three Stardog nodes will not come up. That's the reason for the script failure.

P.S. Again, I had to ssh into all the instances and check the logs to see what was going on. Please propagate the logs to the user when launch times out. That would save me a lot of time.
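
One way to make a launch timeout more informative is to track each node separately while waiting, so the final message can name the nodes that never joined. The following is a rough Python sketch under stated assumptions: `wait_for_nodes` and the `is_up` probe are hypothetical and do not reflect how graviton actually implements its wait loop.

```python
import time
from typing import Callable, Dict, Iterable


def wait_for_nodes(
    nodes: Iterable[str],
    is_up: Callable[[str], bool],
    timeout: float,
    interval: float = 5.0,
) -> Dict[str, bool]:
    """Poll every node until all are up or the timeout expires.

    Returns a mapping of node name to its last observed state, so the
    caller can name the nodes that never came up.
    """
    status = {node: False for node in nodes}
    deadline = time.monotonic() + timeout
    while True:
        # Only re-probe nodes that have not been seen up yet.
        for node in list(status):
            if not status[node]:
                status[node] = is_up(node)
        if all(status.values()) or time.monotonic() >= deadline:
            return status
        time.sleep(interval)
```

On timeout, the caller can list `[n for n, up in status.items() if not up]` and fetch the logs for just those nodes. In the situation described here, that list would contain the two Stardog nodes that refused to start.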

@pdmars
Member

pdmars commented May 21, 2018

Good to hear you figured out the issue. We'll work on improving the error message and propagating the reason; however, depending on the nature of the failure, that can be a bit tricky. If you are able to ssh into all of the nodes without providing a password, are you able to use the logs command to gather up all the Stardog logs?
