Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Goworker does not recover from loss of connection to redis #15

Open
joamaki opened this issue Mar 10, 2014 · 3 comments
Open

Goworker does not recover from loss of connection to redis #15

joamaki opened this issue Mar 10, 2014 · 3 comments

Comments

@joamaki
Copy link

joamaki commented Mar 10, 2014

If the connection to redis goes down (due to redis restart, netsplit, ...) the library does
not recover, but only logs the error:

2014-03-10 19:07:23,682 DEBG 'my-worker' stdout output:
1394474843680784649 [Error] Error on my-worker-1:14479-poller:my_queue getting job from [my_queue]: use of closed network connection

Goworker should recognize the redis client instance is dead and try reconnecting.

@subosito
Copy link

I modified GetConn to check connection status and perform redial. Here's the changes:

 func GetConn() (*RedisConn, error) {
    resource, err := pool.Get()

    if err != nil {
        return nil, err
    }
+
+   // Performs simple connection check. Redial if needed
+   _, err = resource.(*RedisConn).Do("PING")
+   if err != nil {
+       resource.(*RedisConn).Close()
+       resource, err = redisConnFromUri(uri)
+       if err != nil {
+           return nil, err
+       }
+   }
+
    return resource.(*RedisConn), nil
 }

Not sure if this is recommended way when working with vitess' pools.

I can make a PR if this changes are OK :)

@jpatters
Copy link
Contributor

jpatters commented Nov 4, 2017

I know this is an old thread but maybe this will help someone like me who found it when trying to handle disconnects.
This solution does not work because the connection does not get put back in the pool. When the pool tries to close it hangs on draining the chan that contains the connections in the pool.
I opted to handle the error outside of goworker. It required a patch to get it to actually throw an error though. See my PR here #60

@xescugc
Copy link
Contributor

xescugc commented Feb 3, 2021

The main issue is that on the poller.poll it would only return an error if it happens before the goroutine, once the goroutine is lunched Work will always return nil and just finish.

The only way I can think of having an error be returned as everything is running in goroutines and the main Work is blocked with ah sync.WaitGroup would be to have a make(chan error, 0) be part of the internal struct on the poller.newPoller so we can "just" <- poller.err and have the WaitGroup be also in the select (with a goroutine and a channel that i'll be closed once it's done).

  	waitCh := make(chan struct{})
	go func() {
		monitor.Wait()
		close(waitCh)
	}()
	select {
	case <-waitCh:
	case err := <-poller.error:
		// quite to gracefully shut down
		quit <- true
		// Return the poller error
		return err
	}

All the possible errors from the workers are also related to Redis and and they'll be closed using the quite (well they'll fail as there will be no connection to Redis but that's expected hehe).

Would this be ok to open a PR for?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants