Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix #11 #13

Merged
merged 19 commits into from
Mar 22, 2016
Merged

fix #11 #13

merged 19 commits into from
Mar 22, 2016

Conversation

matthewvalimaki
Copy link
Contributor

  • Add consul-template
  • Update existing examples
  • Clean up some scripts

Signed-off-by: Matthew Valimaki matthew.valimaki@gmail.com

- Add consul-template
- Update existing examples
- Clean up some scripts

Signed-off-by: Matthew Valimaki <matthew.valimaki@gmail.com>
@matthewvalimaki
Copy link
Contributor Author

I've updated user-consul-nginx-nodejs to show how consul-template is used as it was the only one where we actually have two services in one container.

@matthewvalimaki
Copy link
Contributor Author

Forgot to mention:

  • consul-template will always be running as I could not find a way to tell s6 not to launch it, see run and the if-clause there. If there's a way to to tell s6 not to launch consul-template we need to implement it. Simple exit did not do it.
  • consul-template runs as root, which right now is due to permissions to edit configuration files such as /etc/nginx/conf.d/default.conf. I would prefer having a group which then is given permission to specific files. Perhaps use existing consul user and group?

@@ -0,0 +1,23 @@
#!/usr/bin/with-contenv sh
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I have a couple of ideas to make this part a little more elegant.

s6 has the concept of a down file, http://skarnet.org/software/s6/servicedir.html. If a down file exists in a service directory, s6 won't start the service. This container should have a down file in root/etc/services.d/consul-template by default.

We can then move the num templates logic into an init script, https://github.com/just-containers/s6-overlay#executing-initialization-andor-finalization-tasks. The init script would remove the down file if there was something for consul-template to do.

In order to stop consul-template running indefinitely, after everything has finished, we could done one of two things:

  • use s6-svc -d /var/run/s6/services/consul-template to bring down this service and don't restart it
  • or write a down file, and then exit (s6 will notice the down file and not restart it)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, perhaps the looping logic could be tidied up by changing this from a shell script to an execline script and utilising s6-svwait, http://skarnet.org/software/s6/s6-svwait.html. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to stop consul-template running indefinitely

It is meant to run indefinitely as it monitors changes and updates files accordingly. That is why there's s6 command to restart, see examples/user-consul-nginx-nodejs/root/etc/consul-template/conf.d/nginx/nginx.conf and command = "s6-svc -h /var/run/s6/services/nginx".

I will add the down file and init script as that is exactly what is needed. Thanks for pointing this feature out!

Also, perhaps the looping logic could be tidied up by changing this from a shell script to an execline script and utilising s6-svwait

I did try s6-svwait but unfortunately what happens is that while consul agent is up it is not usable immediately causing consul-template to throw up an error as it tries to query for data for templates. We could try notification-fd however I am not really sure how to implement this. If you have an example or know how to please let me know.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The best way to handle this stuff is through s6-rc which is brand new. You can view this issue on s6-overly for more information just-containers/s6-overlay#112. Until then, I think we should keep it really simple and run with the loop. At such a time s6-rc is integrate into s6-overlay, then we can refactor.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matthewvalimaki, I was reading the consul-template documentation a little more (I haven't actually used it before). Could we make use of the retry option? Would that cause consul-template not to error? Perhaps then we could remove the Consul nodes watch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smebberson I haven't been clear. It does retry but problem is that it
logs an error. So what I'm trying to avoid is that error from happening.
Nothing crashes and consul-template does continue to work correctly.
On Dec 16, 2015 4:22 PM, "Scott Mebberson" notifications@github.com wrote:

In alpine-consul/root/etc/services.d/consul-template/run
#13 (comment)
:

@@ -0,0 +1,23 @@
+#!/usr/bin/with-contenv sh

@matthewvalimaki https://github.com/matthewvalimaki, I was reading the
consul-template documentation a little more (I haven't actually used it
before). Could we make use of the retry option? Would that cause
consul-template not to error? Perhaps then we could remove the Consul
nodes watch?


Reply to this email directly or view it on GitHub
https://github.com/smebberson/docker-alpine/pull/13/files#r47854240.

@smebberson
Copy link
Owner

@matthewvalimaki, to make this super complete I think it might need a consul watch implementation, https://www.consul.io/docs/commands/watch.html. Having the benefit that configuration could be re-run when the nodejs.app IP changes? What do you think? Are you interested in adding that?

@smebberson
Copy link
Owner

@matthewvalimaki, btw, thanks for this. It's going to be a great addition. I'm happy to help add the extra bits and pieces if you like.

@matthewvalimaki
Copy link
Contributor Author

@smebberson No need for consul watch as consul-template already handles this. So if IP, port changes or new nodes with same service and tag become available /etc/nginx/conf.d/default.conf will be automatically rewritten and nginx through s6 restarted.

And no problem, I like these Docker images and use them myself and want to improve them so that I have less to worry about downstream :)

Signed-off-by: Matthew Valimaki <matthew.valimaki@gmail.com>
… to be brought up when templates actually exist.

Signed-off-by: Matthew Valimaki <matthew.valimaki@gmail.com>
…to use it

Signed-off-by: Matthew Valimaki <matthew.valimaki@gmail.com>
@matthewvalimaki
Copy link
Contributor Author

@smebberson Improvements pushed, see matthewvalimaki@ae74691.

@smebberson
Copy link
Owner

@matthewvalimaki, these changes look great. Given the scenario we're faced with, the loop you have implemented should work fine. To implement notifications, well, I think it will require some effort from Consul's part. But I think we should raise it as a bug. Either consul-template should not throw an error but repeat connection attempts for a specified amount of time, or consul should repeat when it is not only up, but ready to start receiving requests.

I wasn't aware that consul-template already does the job of consul watch also. That is great.

Thanks again. I'll look to merge this in ASAP.

@matthewvalimaki
Copy link
Contributor Author

@smebberson actually I believe by using consul watch, something you brought up, and its Type: nodes we probably could remove the loop I introduced and clean up consul-template run. Consul Watches documentation for Type: nodes states the following:

The "nodes" watch type is used to watch the list of available nodes. It has no parameters.

Then in combination with consul handler executing a shell script we should be able to know when consul agent is in usable state.

alpine-consul and alpine-consul-base would see the following new file introduced in /etc/consul.d/_watch-node.conf and /etc/consul.d/bootstrap/_watch-node.conf. Actually the file should be in /etc/consul.d/watches/_watch-node.conf and then symlinked to mentioned paths:

{
    "watches": [
      {
          "type": "nodes",
          "handler": "/usr/sbin/consul-nodes-handler.sh"
      }
    ]
}

consul-nodes-handler.sh would then be responsible for:

  1. Checks if templates exist.
  2. Remove /etc/services.d/consul-template/down if templates did exist.
  3. Executes s6-svc -u /var/run/s6/services.d/consul-template to start consul-template.
  4. rm symlink
  5. consul reload to reload configuration with now removed watch.

Unfortunately as it is https://github.com/smebberson/docker-alpine/blob/master/alpine-consul-base/root/etc/services.d/consul/run is recursively set to go through /etc/consul.d/ so it would load up /etc/consul.d/watches every time. @smebberson what if we rename /etc/consul.d to /etc/consul and then have all default configurations in /etc/consul/conf.d? That way we could have /etc/consul/watches and do the symlink I mentioned above.

/etc/cont-finish.d/01-consul-template would re-create the symlink and down removed by handler.

  • By consul reload in the handler we ensure that the handler gets executed only once.
  • By using new finish script we ensure that when services (container) goes down we reset to original state.

@smebberson
Copy link
Owner

@matthewvalimaki, yep, I like the sound of that. I think that would be a great addition and tidy things up nicely! Great suggestion.

…onf.d

- Update consul-template setup

Signed-off-by: Matthew Valimaki <matthew.valimaki@gmail.com>
Signed-off-by: Matthew Valimaki <matthew.valimaki@gmail.com>
Signed-off-by: Matthew Valimaki <matthew.valimaki@gmail.com>
- Move consul-nodes-handler from sbin to bin to better reflect other images

Signed-off-by: Matthew Valimaki <matthew.valimaki@gmail.com>
@matthewvalimaki
Copy link
Contributor Author

@smebberson I've updated code to reflect what was discussed earlier with one exception:
Instead of /etc/cont-finish.d/01-consul-template I opted to use /etc/cont-init.d/00-consul-template to guarantee that necessary down and symlink are created in case services are not finished properly.

fi

# 3) Execute `s6-svc -u` to notify supervisor that service should start
s6-svc -u /var/run/s6/services.d/consul-template
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be s6-svc -u /var/run/s6/services/consul-template?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it should. Thanks for spotting this. I'll take care of it.

Signed-off-by: Matthew Valimaki <matthew.valimaki@gmail.com>
- Remove useless loop which was replaced by init and additional script
- Fix down file location from /etc to /var/run/s6/services/consul-template/down which is created by s6-overlay stage 2.

Signed-off-by: Matthew Valimaki <matthew.valimaki@gmail.com>
@smebberson
Copy link
Owner

@matthewvalimaki, sorry, I didn't realise this was ready for review again. I assume it is?

@matthewvalimaki
Copy link
Contributor Author

No problem and yes it is ready again.
A side note: would it be easy to implement tests for these? Haven't looked
into that yet at all but I would like to work on those.
On Dec 12, 2015 1:06 PM, "Scott Mebberson" notifications@github.com wrote:

@matthewvalimaki https://github.com/matthewvalimaki, sorry, I didn't
realise this was ready for review again. I assume it is?


Reply to this email directly or view it on GitHub
#13 (comment)
.

#!/usr/bin/execlineb -S1

# only tell s6 to bring down the entire container, if it isn't already doing so
# http://skarnet.org/software/s6/s6-supervise.html
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matthewvalimaki, I think it would be better if consul-template automatically restarted rather than bringing down the container (to do this just remove this file). consul-template is a secondary service to consul itself and I think it would be fine to have s6 automatically restart it.

I think it would also go aways to resolving the consul-template runs before consul is ready issue (aside from the retry option I've just commented about).

What do you think?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matthewvalimaki, we might even be able to add a sleep 1 in the finish script so that s6 doesn't restart it so quickly.

I guess I'm just not 100% on the consul watch to wait for nodes. I definitely like it better than the shell loop. But it just feels like it is making the container logic heavy. But, without s6-rc, if that is what it takes, then that is what it takes I guess.

Anyway, let me know your final thoughts and if you think the container definitely needs the watch, then I'll get to merging it in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smebberson not a big deal to me but my reasoning against would be that if
template fails it means that configuration probably failed too and so
template in my mind is as essential as consul. In the end user defines
behavior so again not a big deal.
On Dec 16, 2015 4:46 PM, "Scott Mebberson" notifications@github.com wrote:

In alpine-consul/root/etc/services.d/consul-template/finish
#13 (comment)
:

@@ -0,0 +1,8 @@
+#!/usr/bin/execlineb -S1
+
+# only tell s6 to bring down the entire container, if it isn't already doing so
+# http://skarnet.org/software/s6/s6-supervise.html

@matthewvalimaki https://github.com/matthewvalimaki, I think it would
be better if consul-template automatically restarted rather than bringing
down the container (to do this just remove this file). consul-template is
a secondary service to consul itself and I think it would be fine to have
s6 automatically restart it.

I think it would also go aways to resolving the consul-template runs
before consul is ready issue (aside from the retry option I've just
commented about).

What do you think?


Reply to this email directly or view it on GitHub
https://github.com/smebberson/docker-alpine/pull/13/files#r47856359.

@matthewvalimaki
Copy link
Contributor Author

@smebberson I'll do those group changes.

@smebberson according to init-stages stage 2 i. /etc/fix-attrs.d happens before stage 2. iii which is where copying happens. I have not tested this but I would think the problematic dir in question /var/run/s6/services/*/supervise would still be root:root.

@smebberson to me it feels like s6-overlay should have 4th step in stage 2 after copying files.

…image versions.

Signed-off-by: Matthew Valimaki <matthew.valimaki@gmail.com>
@matthewvalimaki
Copy link
Contributor Author

@smebberson consul-template user and group added. I've updated user-consul-nginx-nodejs example to make use of this change, see the Dockerfile.

Running consul-template as non-root does cause another problem: reloading of services.
Having consul-template belong to the same group as nginx so that consul-template can rewrite configuration works. But in order for that configuration to be loaded up one would need to tell s6 to reload nginx leading us back to the problem of /var/run/s6/services/*/supervise belonging to root:root.

Right now this problem is solved for consul-template in /etc/services.d/consul/run where I made it chown -R consul:consul /var/run/s6/services/consul-template. Unless s6-overlay adds something to manage this I think the best place is indeed in services.d/run.

To make things cleaner we could introduce a new group which owns, along with root, all /var/run/s6/services/*/supervise so that users such as consul-template by belonging to that group could easily manage services. This way in all run scripts where we use s6-setuidgid we should also modify the /var/run/s6/services/*/supervise.

@smebberson
Copy link
Owner

@matthewvalimaki, have you seen the response in just-containers/s6-overlay#130 ? This goes along with our thinking. Let's set it up as you've stated:

Introduce a new group which owns, along with root, all /var/run/s6/services/*/supervise so that users such as consul-template by belonging to that group could easily manage services. This way in all run scripts where we use s6-setuidgid we should also modify the /var/run/s6/services/*/supervise.

I would like to some how keep this out of the individual run scripts if possible, do you agree? I agree with your sentiment about a 4th stage. I'll follow up in the same issue on s6-overlay and see what happens.

@smebberson
Copy link
Owner

@matthewvalimaki, it seems as though we should be able to set everything up in some cont-init.d scripts however? Reading http://skarnet.org/software/s6-portable-utils/s6-hiercopy.html, it says owner and group are preserved. Although, it feels like this setup stuff should be in alpine-base?

@smebberson
Copy link
Owner

@matthewvalimaki, are you still keen on pushing this one through?

@matthewvalimaki
Copy link
Contributor Author

Yes. I am currently on vacationing causing the delay.
On Jan 27, 2016 12:49 AM, "Scott Mebberson" notifications@github.com
wrote:

@matthewvalimaki https://github.com/matthewvalimaki, are you still keen
on pushing this one through?


Reply to this email directly or view it on GitHub
#13 (comment)
.

@matthewvalimaki
Copy link
Contributor Author

@smebberson I tested the idea of setting permissions on /etc/services.d/consul to root:s6 (s6 being the new group) and s6-hiercopy does seem to retain ownership. Unfortunately s6-svc -h /var/run/s6/services/consul/ threw permission denied and code 111. Once I changed ownership to consul:s6 the service restart worked. Problem seems to be that s6 requires user to be defined (in this case consul) and group is irrelevant. Adding group write permission seems to do the trick. I followed up on your comment at just-containers/s6-overlay#130 (comment).

Unfortunately /var/run/s6/services/consul/supervise/ is filled after cont-init.d therefore any permission fixes we might want to do would have to be done from run.

@smebberson
Copy link
Owner

Okay, cool. No worries, thanks for following that up. So, is everything ready in this PR to be merged?

@smebberson
Copy link
Owner

@matthewvalimaki, ping! Let me know if this is ready to go. I want to get this merged and then resolve the other issues on this repo. Thanks!

Signed-off-by: Matthew Valimaki <matthew.valimaki@gmail.com>
…mplate` work properly with `s6-svc`.

Signed-off-by: Matthew Valimaki <matthew.valimaki@gmail.com>
@matthewvalimaki
Copy link
Contributor Author

@smebberson pong! Sorry for the delay. This PR has existed far too long. Not only has it held back new releases but it's hard to capture all that has been discussed even when those are written above.

Please review if Dockerfile could be improved, the first RUN seems a bit heavy to me.

All changes were made to alpine-consul-base:

  • Added s6 usergroup.
    • consul and consul-template users belong to this group.
  • Per Permissions for service management just-containers/s6-overlay#130 (comment)
    • /etc/services.d/consul-template/supervise/control is created in Dockerfile.
    • chmod g+w /etc/services.d/consul-template/supervise/control is done in Dockerfile.
    • A note of previous bullet point requirement was added to alpine-consul-base/README.md. Any and all services that consul-template is expected to manage through s6-svc need to have s6 group as the owner of this particular file.
  • chmod o-w /etc/services.d/consul-template/down is done in Dockerfile to prevent potential security issue as anybody could remove that file.

@smebberson
Copy link
Owner

@matthewvalimaki, this looks good. I'm going to pull this down, work it up a little more and then merge it into master. We can always improve on things along the way. My gut feel is this is a little heavy straight in alpine-consul-base but I'm more than happy with the overall capability this brings and I'm happy to see how it goes.

I'll get onto this first thing tomorrow morning (don't want to start this so late in the day here).

@matthewvalimaki
Copy link
Contributor Author

@smebberson sounds good. Please do whatever changes you want. I just want
consul-template to be available :) Once we have this merged I will
provide PR for nomad.

On Tue, Mar 1, 2016 at 1:29 AM, Scott Mebberson notifications@github.com
wrote:

@matthewvalimaki https://github.com/matthewvalimaki, this looks good.
I'm going to pull this down, work it up a little more and then merge it
into master. We can always improve on things along the way. My gut feel is
this is a little heavy straight in alpine-consul-base but I'm more than
happy with the overall capability this brings and I'm happy to see how it
goes.

I'll get onto this first thing tomorrow morning (don't want to start this
so late in the day here).


Reply to this email directly or view it on GitHub
#13 (comment)
.

@matthewvalimaki
Copy link
Contributor Author

@smebberson understanding that this + other issues cause quite a backlog but I was wondering how is it going? Do you need help with anything? I rather help you with this project than start maintaining my fork so let me know if there's anything I could do. I have quite a few things depending on these images :)

@smebberson
Copy link
Owner

@matthewvalimaki. Sorry! Working on it now ;)

@smebberson
Copy link
Owner

@matthewvalimaki, I'm getting some weird errors and I can't get it to run. I guess you weren't experience anything like the below?

Consul Template returned errors:
open /etc/nginx/conf.d/186949828: permission denied

I don't have heaps of experience with consul-template so I'm not sure what it would be doing. Or even why it would be looking in that directory?

Any ideas?

@smebberson
Copy link
Owner

@matthewvalimaki, I managed to get past that. Running into a few other issues now. But I'm working through them.

@matthewvalimaki
Copy link
Contributor Author

Sorry, sounds like I made a broken or :(

A side note: remember that we can use pre-built consul binary as of 0.6.
On Mar 17, 2016 6:35 PM, "Scott Mebberson" notifications@github.com wrote:

@matthewvalimaki https://github.com/matthewvalimaki, I managed to get
past that. Running into a few other issues now. But I'm working through
them.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#13 (comment)

@smebberson
Copy link
Owner

@matthewvalimaki, take a look at https://github.com/smebberson/docker-alpine/tree/matthewvalimaki-add-consul-template. That is my integration branch. Everything seems to be running smoothly now. There was a number of issues due to permissions not allowing certain things to occur (such as removing the consul nodes watch symlink - you didn't experience anything like that?).

The only thing I'm not happy with is the weird permissions I had to have on the /etc/nginx directory. I had to set them to 674 (??) so that consul-template could write to the directory. When I run with 664 I get:

Consul Template returned errors:
open /etc/nginx/conf.d/default.conf: permission denied

Despite the fact that consul-template has been added to the nginx group? Can you see why that would happen?

If you want to run this locally, you'll have to build (in this order):

  • alpine-consul-base
  • alpine-consul-nginx
  • alpine-consul-nginx-nodejs
  • examples/user-consul-nginx-nodejs

If you could give me your input on that - that'd be great. I'll wait to merge into master until you get back to me.

Also, before I merge into master, I'll release v1.0.0 of alpine-consul-nginx, and alpine-consul-nginx-nodejs and then merge those into my integration branch so that people wanting to keep the old versions can.

@matthewvalimaki
Copy link
Contributor Author

@smebberson please merge #12 while you're at working on nodejs :) I'll think about your questions now and see if I can come up with something. Thanks for the cleanup.

@matthewvalimaki
Copy link
Contributor Author

@smebberson going back to your last questions:

There was a number of issues due to permissions not allowing certain things to occur (such as removing the consul nodes watch symlink - you didn't experience anything like that?).

I did! And I thought I pushed up those fixes, very similar to what you did. I'm sorry, I should've made a clean clone to test it again. The symlink was one of the very first permissions issues I encountered.

Do you still have similar issues elsewhere?

The only thing I'm not happy with is the weird permissions I had to have on the /etc/nginx directory. I had to set them to 674 (??) so that consul-template could write to the directory.

I did not see that. Probably having Windows as host causes some issues for permissions for me. As the file you mentioned, /etc/nginx/conf.d/default.conf, is added when building I suspect the ownership or permissions being incorrect.

@smebberson smebberson merged commit 36c447e into smebberson:master Mar 22, 2016
@smebberson
Copy link
Owner

@matthewvalimaki this is finally done! I've version bumped and Git tagged, and Docker hub tagged alpine-consul-base, alpine-consul-nginx and alpine-consul-nginx-nodejs. All seem to be working well. Thanks for your PR!

@matthewvalimaki
Copy link
Contributor Author

@smebberson awesome! Good job! Now I get to provide Nomad and other
improvements. These images are by far the best with Hashi stuff :)
On Mar 22, 2016 4:41 PM, "Scott Mebberson" notifications@github.com wrote:

@matthewvalimaki https://github.com/matthewvalimaki this is finally
done! I've version bumped and Git tagged, and Docker hub tagged
alpine-consul-base, alpine-consul-nginx and alpine-consul-nginx-nodejs.
All seem to be working well. Thanks for your PR!


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#13 (comment)

@matthewvalimaki matthewvalimaki deleted the add-consul-template branch July 21, 2016 20:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants