How to handle connection interruption/timeout in PubSub listen()?
#386

Comments

What happens when the connection to Redis goes down or times out while listening for PubSub messages via a long-running "pubsub.listen()"?
Any tips on how I can gracefully handle / restart an interrupted connection?
I do appreciate your help!

Actually, I didn't find any strict description of this behavior in the documentation, but I should say that it didn't work as expected. I'm using a long-running "pubsub.listen()" in production and it can hang unexpectedly forever.

Currently, I'm working on some kind of "pubsub.listen()"-thread + watchdog-thread implementation for my script, sketched below. This should give me information about the whole script's behavior, not only the socket's behavior.
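To illustrate, here is a minimal sketch of that listener-thread + watchdog-thread layout. The channel name, the timing values, and the restart action are all hypothetical; the point is that the watchdog observes the whole script (time since the last message), not just the socket.

```python
import threading
import time

import redis

CHANNEL = 'events'              # hypothetical channel name
last_message_at = time.time()   # shared liveness timestamp

def listener():
    global last_message_at
    r = redis.StrictRedis(host='localhost', port=6379)
    pubsub = r.pubsub()
    pubsub.subscribe(CHANNEL)
    for message in pubsub.listen():   # can hang forever if the socket dies
        last_message_at = time.time()
        print(message)

def watchdog(max_silence=300):
    # If nothing has arrived within max_silence seconds, assume the
    # listener is wedged and restart (e.g. exit and let supervisor/systemd
    # bring the script back up).
    while True:
        time.sleep(10)
        if time.time() - last_message_at > max_silence:
            print('listener looks dead; restarting')

threading.Thread(target=watchdog, daemon=True).start()
listener_thread = threading.Thread(target=listener, daemon=True)
listener_thread.start()
listener_thread.join()
```

One caveat: a legitimately quiet channel is indistinguishable from a dead socket here, so max_silence has to be tuned to the expected traffic.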

This won't fix the hang problem. The permanent hang is easy to recreate. Run your redis-server in the foreground of your terminal, put a breakpoint immediately before the while loop in SocketBuffer._read_from_socket(), then run the commands in this gist in your interpreter: https://gist.github.com/endophage/11238000

Once you're listening, kill your redis-server. You'll find the while loop gets stuck in an infinite loop where self._sock.recv(chunksize) immediately returns an empty string every time it's called. I would expect a socket.error to be raised, but it isn't (points big foam blame finger at core devs). The only way I think this can sensibly be detected is to count how many times you've looped and received the empty string, pick a number of loops after which you decide the connection has failed, and then attempt to send the PING command to see if the redis-server is still reachable.

This sounds like a pretty awful solution, but self._sock.recv() should block until data is available as long as the redis-server is still there, so consistently getting an empty string back is a good sign the connection is no longer valid.
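For reference, here is a sketch of that detection done at the application level rather than inside SocketBuffer: poll with PubSub.get_message(timeout=...) (available since redis-py 2.10) and, after a stretch of silence, send a PING on a regular command connection to check that the server is still reachable. The channel name and the intervals are made up.

```python
import time

import redis

r = redis.StrictRedis(host='localhost', port=6379)

def listen_forever(channel='events', idle_limit=30):
    """Consume pub/sub messages, reconnecting when the connection dies."""
    while True:
        pubsub = r.pubsub()
        pubsub.subscribe(channel)
        quiet_since = time.time()
        try:
            while True:
                message = pubsub.get_message(timeout=1.0)
                if message is not None:
                    quiet_since = time.time()
                    print(message)
                elif time.time() - quiet_since > idle_limit:
                    # No traffic for a while: probe the server. This raises
                    # redis.ConnectionError if it is no longer reachable.
                    r.ping()
                    quiet_since = time.time()
        except redis.ConnectionError:
            # Connection is dead: loop around, reconnect, and resubscribe.
            time.sleep(1)
```

Note that r.ping() checks that the server is reachable over a regular command connection; it doesn't exercise the pub/sub socket itself, so it's a heuristic in the same spirit as the loop-counting idea above.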

I have a fix for the hang but it's an ugly ass hack (basically the PING thing I described above, but crow-barred in just to confirm it works). We're experiencing this as a production issue at my work (fortunately in a non-critical but highly prized component), so I'll be actively working on a more elegant fix over the next couple of days.

I am using AWS ElastiCache and running this code under supervisor. My problem is that the listener stops listening if it is kept quiet for a long time (say 4 or 5 days): once we send messages with "publish" after 4 or 5 days, the listener no longer works, i.e. the code in the "trigger.listen()" loop is not executed.

Will the listener time out after a certain number of hours or days?

But once we rerun the above code, the listener works perfectly (it receives the published messages).

Actually, we have 2 nodes in AWS ElastiCache, but for this pub/sub we are using the master node, so there should be no chance of disconnecting.

We are using version 2.10.3 of redis-py.

Any help will be greatly appreciated.

I too am having issues with the subscriber not listening according to the publisher (the result of the publish is a 0 instead of a 1). I've taken steps to ensure that the thread with the subscribed client is running and will restart if it stops; the thread is running just fine and everything appears to be in order, aside from the fact that the publish returns a 0, indicating nothing is listening. Bouncing the service doing the subscribing fixes it, but that is not a viable solution. As @sriman mentioned above, it takes a few days to happen and I can't put my finger on it, but it has happened many, many times since we started using redis pubsub months ago.
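For what it's worth, PUBLISH returns the number of subscribers that received the message, so the 0 described above can double as a health check on the publishing side. A tiny sketch (channel name hypothetical):

```python
import redis

r = redis.StrictRedis(host='localhost', port=6379)

# publish() returns how many subscribers received the message.
receivers = r.publish('events', 'hello')
if receivers == 0:
    # Nothing is listening on 'events': alert, or trigger a
    # restart of the subscriber service.
    print('publish reached no subscribers')
```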

I have faced this problem too, mainly because I am using AWS ElastiCache, which tends to close the connection sometimes; there is no specific use case for when it happens. In the Python code, whenever the connection closes it is not reconnected again, which is why the subscriber can no longer receive the messages it subscribed to. To work around this, this library has a socket keepalive option; give it a shot (see the sketch below).
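Here is a sketch of what enabling that keepalive option can look like. The TCP_KEEP* constants are Linux-specific and the values are illustrative, not recommendations.

```python
import socket

import redis

r = redis.StrictRedis(
    host='localhost',
    port=6379,
    socket_keepalive=True,
    socket_keepalive_options={
        socket.TCP_KEEPIDLE: 60,    # start probing after 60s of idleness
        socket.TCP_KEEPINTVL: 15,   # probe every 15s after that
        socket.TCP_KEEPCNT: 3,      # drop the link after 3 failed probes
    },
)
```

With keepalives on, the OS probes an idle connection and eventually errors the socket out, so a connection silently dropped by ElastiCache surfaces as a ConnectionError rather than a permanent hang.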