Nothing springs to mind, but I will think about this one for a while. Are
there likely other outstanding requests to the same kernel at the time,
and/or from the same requesting socket? If so, can you ballpark how many?
What are you using to indicate that pyzmq thinks the message has been sent?
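
For reference, one way to check this from the Python side is pyzmq's message tracking. This is only a minimal sketch, not the cell server's code: the helper name `send_tracked` and the timeout value are hypothetical. Note the limit of what it shows: a finished tracker means libzmq has released the buffer, not that the peer received the message, and a plain `send` returning only means the message was queued.

```python
import zmq


def send_tracked(sock, frames, timeout=1.0):
    """Send a multipart message and wait for libzmq to release the buffers.

    Hypothetical helper. Returns True if the tracker finished within
    `timeout` seconds, False otherwise. True does NOT prove delivery.
    """
    # copy=False together with track=True makes pyzmq return a MessageTracker.
    tracker = sock.send_multipart(frames, copy=False, track=True)
    try:
        tracker.wait(timeout)
    except zmq.NotDone:
        # libzmq was still holding the message when the timeout expired.
        return False
    return True
```

Logging the return value next to each execute_request would show whether the sporadic failures coincide with sends that libzmq never finished with.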
-MinRK
On Wed, Feb 12, 2014 at 10:45 AM, Jason Grout
<jason-sage@creativetrax.com> wrote:
> Hi everyone,
>
> I'm trying to track down a problem we're seeing in the Sage cell server
> with sending computation messages to an IPython kernel. This may end up
> being a problem with using pyzmq or zmq, so apologies in advance if it
> turns out to be OT for this list.
>
> The tl;dr version is: it appears that in some very sporadic cases, pyzmq
> is sending a message (an execute_request message) to a kernel's shell
> channel tcp port on localhost, but wireshark never registers that
> message being sent, and the kernel that is supposed to receive the
> message never acts on it. My question is: does anyone have suggestions
> on debugging this or narrowing down the problem?
>
> The (abbreviated, simplified) long version: in the Sage cell server, we
> start up a number of IPython kernels that we keep waiting around for
> computations. When a computation is requested, we hook up the kernel's
> shell/iopub/heartbeat channels (i.e., create pyzmq zmqstream objects
> connecting to the tcp ports corresponding to the kernel's
> shell/io/heartbeat channels), send an execute_request, and assemble an
> answer for the user from output coming back on the iopub channel. When
> the system is under moderate load, every now and then (maybe every 300
> computations), we send an execute_request message to one of these
> kernels that is waiting around, and I see the zmq socket code claiming
> that it sent the message, but wireshark indicates that the message was
> never transmitted when looking at raw tcp traffic, and the kernel acts
> like it never received the message. We didn't change the high water
> mark for zmq, and I'm running zmq 3.2.2 and pyzmq 14.0.1. I've spent a
> long time narrowing the issue down to a zmq message not being sent, even
> though pyzmq seems to have thought it sent it. Does anyone have any
> suggestions for narrowing this down more, or possible causes?
>
> I realize that my setup is a bit complicated, and I've tried to simplify
> the issues (but hopefully not too much). Any suggestions or help would
> be appreciated. The next thing I'm going to do is (a) upgrade zmq to
> 4.x, and (b) insert some debugging statements in the zmq library itself
> to see if the C zmq library thinks it sent the message.
>
> Thanks,
>
> Jason
> _______________________________________________
> IPython-dev mailing list
> IPython-dev@scipy.org
> http://mail.scipy.org/mailman/listinfo/ipython-dev