Dang. Didn't save the log. Pardon me.
I pasted only the exceptions and thought they were all about the
0x26ed968d880001 session, but now I see that what I posted above has
TIMED_OUT on another session altogether. Above I skipped pasting some
exceptions, thinking them on the same session, but now it seems they
probably were not.
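
For anyone sorting out a saved client log in a similar spot, here is a minimal sketch (not from the thread) that groups log lines by the session id they mention, so exceptions from different sessions are not lumped together. It assumes each relevant line contains "session 0x..." on one line, as in the ClientCnxn messages quoted below. The class name SessionLogGrouper, the sample lines, and the second session id are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SessionLogGrouper {
    // Assumption: session ids appear as "session 0x<hex>" on a single log line.
    private static final Pattern SESSION = Pattern.compile("session\\s+(0x[0-9a-fA-F]+)");

    /** Groups lines by the first session id each mentions; lines without one are skipped. */
    public static Map<String, List<String>> groupBySession(List<String> lines) {
        Map<String, List<String>> bySession = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = SESSION.matcher(line);
            if (m.find()) {
                bySession.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(line);
            }
        }
        return bySession;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines; only the first session id comes from the thread.
        List<String> sample = List.of(
            "org.apache.zookeeper.ClientCnxn: Exception closing session 0x26ed968d880001 to ...",
            "java.io.IOException: Read error rc = -1",
            "org.apache.zookeeper.ClientCnxn: Exception closing session 0xabc to ...");
        groupBySession(sample).forEach(
            (id, ls) -> System.out.println(id + " -> " + ls.size() + " line(s)"));
    }
}
```

Stack-trace lines that follow a matching line carry no session id and are dropped here; keeping them attached to the preceding session would need a little extra state.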

I'm trying to track a case where zk seems all of a sudden, client-side, to
give up the ghost w/ exceptions like those pasted above --
connectivity probs. There has been pollution in here, where GC pauses
longer than the session timeout would trigger TIMED_OUT, but those
have been tamed.
I'll be back if I get another instance on my hook.
Meantime, thanks for the comments.
St.Ack
On Mon, Feb 22, 2010 at 6:43 PM, Mahadev Konar <maha...@yahoo-inc.com> wrote:
> Hi Stack,
> the other interesting part is with the session:
> 0x26ed968d880001
>
> Looks like it gets disconnected from one of the servers (TIMEOUT). Do you
> see any of these messages: "Attempting connection to server" in the logs
> before you see all the consecutive
>
> org.apache.zookeeper.ClientCnxn: Exception closing session
> 0x26ed968d880001 to sun.nio.ch.selectionkeyi...@788ab708
> java.io.IOException: Read error rc = -1
> java.nio.DirectByteBuffer[pos=0 lim=4 cap=4]
> at
> org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:701)
> at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)
>
> and....
>
>
> From the client 0x26ed968d880001?
>
> Thanks
> mahadev
>
>
> On 2/22/10 11:42 AM, "Stack" <st...@duboce.net> wrote:
>
>> The thing that seems odd to me is that the connectivity complaints are
>> out of the zk client, right? Why is it failing getting to member 14,
>> and why not move to another ensemble member if there's an issue w/ 14? And if
>> there were a general connectivity issue, I'd think that the running
>> hbase cluster would be complaining at about the same time (it's talking
>> to datanodes and masters at this time).
>>
>> (Thanks for the input lads)
>>
>> St.Ack
>>
>>
>> On Mon, Feb 22, 2010 at 11:26 AM, Mahadev Konar <maha...@yahoo-inc.com>
>> wrote:
>>> I also looked at the logs. Ted might have a point. It does look like the
>>> ZooKeeper servers are doing fine (though, as Ted mentions, the skew is a
>>> little concerning; that might be due to very few packets served by
>>> the first server). Other than that, latencies of 300 ms at max should not
>>> cause any timeouts.
>>> Also, the number of packets received is pretty low, meaning that it wasn't
>>> serving huge traffic. Is there any way we can check whether the network
>>> connection from the client to the server is flaky?
>>>
>>> Thanks
>>> mahadev
>>>
>>>
>>> On 2/22/10 10:40 AM, "Ted Dunning" <ted.dunn...@gmail.com> wrote:
>>>
>>>> Not sure this helps at all, but these times are remarkably asymmetrical. I
>>>> would expect members of a ZK cluster to have very comparable times.
>>>>
>>>> Additionally, 345 ms is nowhere near large enough to cause a session to
>>>> expire. My take is that ZK doesn't think it caused the timeout.
>>>>
>>>> On Mon, Feb 22, 2010 at 10:18 AM, Stack <st...@duboce.net> wrote:
>>>>
>>>>> Latency min/avg/max: 2/125/345
>>>>> ...
>>>>> Latency min/avg/max: 0/7/81
>>>>> ...
>>>>> Latency min/avg/max: 1/1/1
>>>>>
>>>>> Thanks for any pointers on how to debug.
>>>>>
>>>
>>>
>
>