Upgrading from 1.7.2 to 1.10.0 causes connections to hang #1899
Comments
@MakMukhi Let me know if there is further troubleshooting required to help you isolate the issue. |
My guess is this is fixed by #1889, since you are dialing "unix://..." (via |
@dfawley I gave that fix a try and am still seeing the client hang and timeout. Is it possible other changes are needed for this to be a full fix, as well? |
@stevvooe can you try turning on info logging and see if anything useful comes out? Thanks!
|
@dfawley It looks to be blocking on the balancer picker:
~/g/s/g/c/containerd ❯❯❯ sudo GRPC_GO_LOG_SEVERITY_LEVEL=INFO ctr --debug images pull docker.io/library/wordpress:latest update-grpc-110 ✭ ✱ ◼
INFO[0000] connecting to containerd
INFO[0000] dialing containerd address="/run/containerd/containerd.sock"
2018/03/07 13:38:54 dialing to target with scheme: "unix"
2018/03/07 13:38:54 ccResolverWrapper: sending new addresses to cc: [{run/containerd/containerd.sock 0 <nil>}]
2018/03/07 13:38:54 ClientConn switching balancer to "pick_first"
2018/03/07 13:38:54 pickfirstBalancer: HandleSubConnStateChange: 0xc420077f40, CONNECTING
The first two log lines I added to |
Is this with #1889? What is the actual address that is being passed to grpc.Dial()? Is it In this case, you are confusing our target parsing logic. We now follow the naming scheme defined in the grpc spec here: https://github.com/grpc/grpc/blob/master/doc/naming.md#name-syntax. The only exception is our default scheme is "passthrough" instead of "dns", for historical-compatibility reasons. Because you actually want your dialer to see
(When #1741 is fixed, you won't even need a custom dialer for unix sockets, so this whole mess would not be a problem.) |
With all due respect, I am confusing nothing. We updated the package and now the dialer code doesn't work. Whatever changes were put in have created an incompatibility that is breaking our code. I get the same output, whether or not #1889 is applied. It doesn't seem to have any effect. The above output is with the patch applied. I modified the above example to show the address passed to dial:
~/g/s/g/c/containerd ❯❯❯ sudo ctr plugins -d type~=snapshotter update-grpc-110 ✭ ✱ ◼
INFO[0000] connecting to containerd
INFO[0000] dialing containerd address="/run/containerd/containerd.sock"
INFO[0000] dialing containerd address_passed_to_dial="unix:///run/containerd/containerd.sock"
2018/03/08 12:13:02 dialing to target with scheme: "unix"
2018/03/08 12:13:02 ccResolverWrapper: sending new addresses to cc: [{run/containerd/containerd.sock 0 <nil>}]
2018/03/08 12:13:02 ClientConn switching balancer to "pick_first"
2018/03/08 12:13:02 pickfirstBalancer: HandleSubConnStateChange: 0xc4200b3d70, CONNECTING
INFO[0000] address in Dialer wrapper address="run/containerd/containerd.sock"
INFO[0000] address in dialer address="run/containerd/containerd.sock"
INFO[0000] address trimmed for net.DialTimeout address="run/containerd/containerd.sock"
ERRO[0000] dial error error="dial unix run/containerd/containerd.sock: connect: no such file or directory"
As you can see, we are passing in the
Here is the patch to containerd used to debug this:
diff --git a/client.go b/client.go
index 2ac256dd..b9360247 100644
--- a/client.go
+++ b/client.go
@@ -40,6 +40,7 @@ import (
"github.com/containerd/containerd/dialer"
"github.com/containerd/containerd/errdefs"
"github.com/containerd/containerd/images"
+ "github.com/containerd/containerd/log"
"github.com/containerd/containerd/namespaces"
"github.com/containerd/containerd/platforms"
"github.com/containerd/containerd/plugin"
@@ -94,16 +95,23 @@ func New(address string, opts ...ClientOpt) (*Client, error) {
)
}
connector := func() (*grpc.ClientConn, error) {
- conn, err := grpc.Dial(dialer.DialAddress(address), gopts...)
+
+ log.L.WithField("address", address).Infoln("dialing containerd")
+ addressPassedToDial := dialer.DialAddress(address)
+ log.L.WithField("address_passed_to_dial", addressPassedToDial).Infoln("dialing containerd")
+ conn, err := grpc.Dial(addressPassedToDial, gopts...)
if err != nil {
return nil, errors.Wrapf(err, "failed to dial %q", address)
}
return conn, nil
}
+ log.L.Infoln("connecting to containerd")
conn, err := connector()
if err != nil {
return nil, err
}
+
+ log.L.Infoln("connected to containerd")
return &Client{
conn: conn,
connector: connector,
diff --git a/cmd/ctr/main.go b/cmd/ctr/main.go
index ec41c59a..a72c8fba 100644
--- a/cmd/ctr/main.go
+++ b/cmd/ctr/main.go
@@ -18,7 +18,6 @@ package main
import (
"fmt"
- "io/ioutil"
"log"
"os"
@@ -45,7 +44,7 @@ var extraCmds = []cli.Command{}
func init() {
// Discard grpc logs so that they don't mess with our stdio
- grpclog.SetLogger(log.New(ioutil.Discard, "", log.LstdFlags))
+ grpclog.SetLogger(log.New(os.Stderr, "", log.LstdFlags))
cli.VersionPrinter = func(c *cli.Context) {
fmt.Println(c.App.Name, version.Package, c.App.Version)
diff --git a/dialer/dialer.go b/dialer/dialer.go
index 766d3449..35dec9ec 100644
--- a/dialer/dialer.go
+++ b/dialer/dialer.go
@@ -20,6 +20,7 @@ import (
"net"
"time"
+ "github.com/containerd/containerd/log"
"github.com/pkg/errors"
)
@@ -30,6 +31,7 @@ type dialResult struct {
// Dialer returns a GRPC net.Conn connected to the provided address
func Dialer(address string, timeout time.Duration) (net.Conn, error) {
+ log.L.WithField("address", address).Infof("address in Dialer wrapper")
var (
stopC = make(chan struct{})
synC = make(chan *dialResult)
@@ -43,6 +45,7 @@ func Dialer(address string, timeout time.Duration) (net.Conn, error) {
default:
c, err := dialer(address, timeout)
if isNoent(err) {
+ log.L.WithError(err).Error("dial error")
<-time.After(10 * time.Millisecond)
continue
}
diff --git a/dialer/dialer_unix.go b/dialer/dialer_unix.go
index e7d19583..8a3fddc5 100644
--- a/dialer/dialer_unix.go
+++ b/dialer/dialer_unix.go
@@ -25,6 +25,8 @@ import (
"strings"
"syscall"
"time"
+
+ "github.com/containerd/containerd/log"
)
// DialAddress returns the address with unix:// prepended to the
@@ -47,6 +49,8 @@ func isNoent(err error) bool {
}
func dialer(address string, timeout time.Duration) (net.Conn, error) {
+ log.L.WithField("address", address).Infof("address in dialer")
address = strings.TrimPrefix(address, "unix://")
+ log.L.WithField("address", address).Infof("address trimmed for net.DialTimeout")
return net.DialTimeout("unix", address, timeout)
} |
The behavior change brings us in line with the gRPC spec. I'm sorry for the breakage, but your input happens to exactly follow the format the spec defines, so it unfortunately doesn't end up following the fallback behavior intended to maintain backward compatibility for most users. You should be able to fix this by prefixing your target with "passthrough:///". Then the full text after that will be handed directly to your custom dialer, as it was before. Let me know if that doesn't work and I'll take another look. |
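The workaround suggested above can be sketched in Go as plain string handling. This is a minimal illustration, not containerd or grpc-go code: `dialAddress` and `socketPath` only mirror the prefixing and trimming steps visible in the debug patch, while `passthroughTarget` and `seenByDialer` are hypothetical helpers that model the claim that everything after "passthrough:///" reaches the custom dialer unchanged.

```go
package main

import (
	"fmt"
	"strings"
)

// passthroughPrefix is the prefix suggested in the thread.
const passthroughPrefix = "passthrough:///"

// dialAddress mirrors what the thread shows for containerd's
// dialer.DialAddress: it prepends "unix://" to the socket path.
func dialAddress(address string) string {
	return "unix://" + address
}

// passthroughTarget (hypothetical helper) builds the target string that
// would be passed to grpc.Dial under the suggested workaround.
func passthroughTarget(address string) string {
	return passthroughPrefix + dialAddress(address)
}

// seenByDialer (hypothetical helper) models the assumption that the text
// after "passthrough:///" is handed to the custom dialer unmodified.
func seenByDialer(target string) string {
	return strings.TrimPrefix(target, passthroughPrefix)
}

// socketPath mirrors the trimming step in containerd's unix dialer.
func socketPath(addr string) string {
	return strings.TrimPrefix(addr, "unix://")
}

func main() {
	target := passthroughTarget("/run/containerd/containerd.sock")
	fmt.Println("target passed to grpc.Dial:", target)
	fmt.Println("path passed to net.DialTimeout:", socketPath(seenByDialer(target)))
}
```

Under these assumptions the leading slash survives, so the dialer would open /run/containerd/containerd.sock rather than the relative path seen in the error log.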
@dfawley How is this possibly correct behavior? Is the unix dialing scheme even supported? We still have to inject our own dialer for this to work at all. If the existing dialing stack doesn't understand the scheme, why is it parsing and modifying it? This seems completely broken. For example, if you parse off the scheme for |
Once unix support is done, you will not need a custom dialer.
In the spec, it says: It's true that http URLs grab the separating slash along with the path, but in this case, pulling the slash along with it would not be usable; the endpoint name is typically a DNS name like "google.com". Parsing it out as "/google.com" would be a problem. FWIW, there was a gRFC published for these changes before they were implemented, since they were fairly significant. You can find them at the proposal repo here: https://github.com/grpc/proposal/blob/master/L9-go-resolver-balancer-API.md |
@dfawley With this scheme, how would one continue using a custom dialer if the package incorrectly returns the path for the filesystem? I appreciate the RFC, but it seems like important details still need to be figured out. |
@stevvooe When the unix scheme is supported, you'd use it like so:
You can use your existing custom dialer similarly:
|
No system in the world implements it this way. |
cc @markdroth |
I expect the i.e. +1 for |
Apparently C implements it this way, despite what is in the gRPC naming spec (which is what we used for our implementation). We'll look into this: #1911. |
@dfawley Can we reopen this issue for tracking? Please let us know if you need help with building the use case. |
We probably will avoid upgrading, for now. I hope a solution can be found that doesn't break everyone. I'll also try to attend the community meeting to get a better understanding of the proposal. I know I reviewed earlier versions of this but the current proposal doesn't look very familiar. |
@stevvooe There could be a solution for this issue: For Filed #1943 for this. |
Please answer these questions before submitting your issue.
What version of gRPC are you using?
Was using 1.7.2 and now using 1.10.0
What version of Go are you using (go version)?
1.10
What operating system (Linux, Windows, …) and version?
Confirmed on Ubuntu 16.04, 17.10 and Amazon Linux.
What did you do?
We upgraded the grpc package from 1.7.2 to 1.10.0 (see containerd/containerd#2186). The code connects using the following: https://github.com/containerd/containerd/blob/master/client.go#L71. This is typically over a unix socket.
Details are here: containerd/containerd#2185.
What did you expect to see?
It should not hang.
What did you see instead?
It hangs until there is a timeout. Here is the user experience with a containerd client: