Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Connect function #11

Merged
merged 8 commits into from
Feb 7, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
55 changes: 49 additions & 6 deletions Gopkg.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

162 changes: 162 additions & 0 deletions connection/connection.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,162 @@
/*
Copyright 2019 The Kubernetes Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package connection
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need the usual copyright and license boiler plate code at the beginning of this file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


import (
"context"
"errors"
"net"
"strings"
"time"

"github.com/kubernetes-csi/csi-lib-utils/protosanitizer"
"google.golang.org/grpc"
"k8s.io/klog"
)

const (
// Interval of logging connection errors
connectionLoggingInterval = 10 * time.Second
)

// Connect opens insecure gRPC connection to a CSI driver. Address must be either absolute path to UNIX domain socket
// file or have format '<protocol>://', following gRPC name resolution mechanism at
// https://github.com/grpc/grpc/blob/master/doc/naming.md.
//
// The function tries to connect indefinitely every second until it connects. The function automatically disables TLS
// and adds interceptor for logging of all gRPC messages at level 5.
//
// For a connection to a Unix Domain socket, the behavior after
// loosing the connection is configurable. The default is to
// log the connection loss and reestablish a connection. Applications
// which need to know about a connection loss can be notified by
// passing a callback with OnConnectionLoss and in that callback
// can decide what to do:
// - exit the application with os.Exit
// - invalidate cached information
// - disable the reconnect, which will cause all gRPC method calls to fail with status.Unavailable
//
// For other connections, the default behavior from gRPC is used and
// loss of connection is not detected reliably.
func Connect(address string, options ...Option) (*grpc.ClientConn, error) {
return connect(address, []grpc.DialOption{}, options)
}

// Option is the type of all optional parameters for Connect.
type Option func(o *options)

// OnConnectionLoss registers a callback that will be invoked when the
// connection got lost. If that callback returns true, the connection
// is restablished. Otherwise the connection is left as it is and
// all future gRPC calls using it will fail with status.Unavailable.
func OnConnectionLoss(reconnect func() bool) Option {
return func(o *options) {
o.reconnect = reconnect
}
}

type options struct {
reconnect func() bool
}

// connect is the internal implementation of Connect. It has more options to enable testing.
func connect(address string, dialOptions []grpc.DialOption, connectOptions []Option) (*grpc.ClientConn, error) {
var o options
for _, option := range connectOptions {
option(&o)
}

dialOptions = append(dialOptions,
grpc.WithInsecure(), // Don't use TLS, it's usually local Unix domain socket in a container.
grpc.WithBackoffMaxDelay(time.Second), // Retry every second after failure.
grpc.WithBlock(), // Block until connection succeeds.
grpc.WithUnaryInterceptor(LogGRPC), // Log all messages.
)
unixPrefix := "unix://"
if strings.HasPrefix(address, "/") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't handle an explicit unix:/// prefix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dial() handles unix:/// just fine without any special dialer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure? grpc/grpc-go#1741 was asking for this and then the feature request moved to grpc/grpc-go#1911, which is still open.

I was under the impression that the custom dialer was used specifically to work around that shortcoming in gRPC. Perhaps I misunderstood. 🤷‍♂️

Looks like a good test case, don't you think? 😉

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does indeed work. Then I don't understand what those open gRPC issues are about. It looks like the custom dialer is needed because an absolute path without the unix:/// prefix will not be treated as a Unix domain socket.

I'm still a bit wary about not always using that dialer, though. It creates two different code paths in our Connect function which (theoretically) could have different behavior. In practice I've not found any relevant difference, but the slow tests in PR jsafrane#1 now only cover the case with an absolute path. I only tried the unix:/// variant of those by editing the test source code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added code that removes custom dialer and uses unix:// for all sockets. This changes behavior of existing sidecar containers that used custom dialer in old releases, IMO it should be fine.

// It looks like filesystem path.
address = unixPrefix + address
}

if strings.HasPrefix(address, unixPrefix) {
// state variables for the custom dialer
haveConnected := false
lostConnection := false
reconnect := true

dialOptions = append(dialOptions, grpc.WithDialer(func(addr string, timeout time.Duration) (net.Conn, error) {
if haveConnected && !lostConnection {
// We have detected a loss of connection for the first time. Decide what to do...
// Record this once. TODO (?): log at regular time intervals.
klog.Errorf("Lost connection to %s.", address)
// Inform caller and let it decide? Default is to reconnect.
if o.reconnect != nil {
reconnect = o.reconnect()
}
lostConnection = true
}
if !reconnect {
return nil, errors.New("connection lost, reconnecting disabled")
}
conn, err := net.DialTimeout("unix", address[len(unixPrefix):], timeout)
if err == nil {
// Connection restablished.
haveConnected = true
lostConnection = false
}
return conn, err
}))
} else if o.reconnect != nil {
return nil, errors.New("OnConnectionLoss callback only supported for unix:// addresses")
}

klog.Infof("Connecting to %s", address)

// Connect in background.
var conn *grpc.ClientConn
var err error
ready := make(chan bool)
go func() {
conn, err = grpc.Dial(address, dialOptions...)
close(ready)
}()

// Log error every connectionLoggingInterval
ticker := time.NewTicker(connectionLoggingInterval)
defer ticker.Stop()

// Wait until Dial() succeeds.
for {
select {
case <-ticker.C:
klog.Warningf("Still connecting to %s", address)

case <-ready:
return conn, err
}
}
}

// LogGRPC is gPRC unary interceptor for logging of CSI messages at level 5. It removes any secrets from the message.
func LogGRPC(ctx context.Context, method string, req, reply interface{}, cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption) error {
klog.V(5).Infof("GRPC call: %s", method)
klog.V(5).Infof("GRPC request: %s", protosanitizer.StripSecrets(req))
err := invoker(ctx, method, req, reply, cc, opts...)
klog.V(5).Infof("GRPC response: %s", protosanitizer.StripSecrets(reply))
klog.V(5).Infof("GRPC error: %v", err)
return err
}
Loading