zookeeper-user mailing list archives

No, I don't have that data. I'll try to get it next time.
On Jul 31, 2012, at 5:13 PM, Patrick Hunt <phunt@apache.org> wrote:
> Any monitoring of mem, gc, disk, etc... that might give some
> additional insight? Perhaps the disks were loaded and that was slowing
> things? Or swapping/gc of the jvm? You might be able to tune to
> resolve some of that.
>
> One thing you can try is copying the snapshot file to an empty
> datadir on a separate machine and trying to start a 2 node cluster
> (where the second node starts with an empty datadir).
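
[Editor's note: the suggestion above can be sketched as the following untested outline; the hostnames, paths, and zoo.cfg entries are hypothetical and need adapting to your install.]

```shell
# 1. Copy the big snapshot into an otherwise-empty datadir on node A
#    (the trailing log files can be copied too, for a faithful replay):
#      scp /data/zk/version-2/snapshot.3a000139a0 nodeA:/tmp/zk-test/version-2/
# 2. Give node B a completely empty datadir.
# 3. Configure both nodes as a 2-server ensemble in each zoo.cfg, e.g.:
#      server.1=nodeA:2888:3888
#      server.2=nodeB:2888:3888
# 4. Start node A, then node B, and time how long the empty follower
#    takes to sync the snapshot from the leader:
#      bin/zkServer.sh start
```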
>
> Patrick
>
> On Tue, Jul 31, 2012 at 3:34 PM, Jordan Zimmerman
> <jordan@jordanzimmerman.com> wrote:
>>> Seems you are down to 4gb now. That still seems way too high for
>>> "coordination" operations… ?
>>
>> A big problem currently is detritus nodes. People use lock recipes for
>> various movie IDs and they leave garbage parent nodes around in the
>> thousands. I've written some gc tasks to clean them up, but it's been a
>> slow process to get everyone to use them. I know there is a JIRA to help
>> with this but I don't know the status.
>>
>> -JZ
>>
>> On Jul 31, 2012, at 3:17 PM, Patrick Hunt <phunt@apache.org> wrote:
>>
>>> On Tue, Jul 31, 2012 at 3:14 PM, Jordan Zimmerman
>>> <jordan@jordanzimmerman.com> wrote:
>>>> There were a lot of creations but I removed those nodes last night. How
>>>> long does it take to clear out of the snapshot?
>>>
>>> The snapshot is a copy of whatever is in the znode tree at the time
>>> the snapshot is taken (so the removal takes effect the next time a
>>> snapshot is taken). You can see the dates and the epoch number if that
>>> gives you any insight (the epoch is the upper 32 bits of the zxid in
>>> the filename).
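
[Editor's note: as a concrete sketch of the filename decoding described above, using snapshot.3a000139a0 from the ls later in the thread (bash arithmetic; the shift logic is the point, the particular filename is just an example).]

```shell
# A snapshot's hex suffix is the zxid of the last committed transaction;
# the upper 32 bits of the zxid are the epoch, the lower 32 the counter.
f=snapshot.3a000139a0            # filename taken from the ls output below
zxid=$((16#${f#snapshot.}))      # parse the hex suffix as an integer
epoch=$((zxid >> 32))            # upper 32 bits -> epoch (0x3a = 58)
counter=$((zxid & 0xffffffff))   # lower 32 bits -> per-epoch counter
echo "epoch=$epoch counter=$counter"
```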
>>>
>>> Seems you are down to 4gb now. That still seems way too high for
>>> "coordination" operations... ?
>>>
>>> Patrick
>>>
>>>>
>>>> On Jul 31, 2012, at 2:52 PM, Patrick Hunt <phunt@apache.org> wrote:
>>>>
>>>>> You have an 11gig snapshot file. That's very large. Did someone
>>>>> unexpectedly overload the server with znode creations?
>>>>>
>>>>> When a follower comes up the leader needs to serialize the znodes to
>>>>> the snapshot file and stream it to the follower, which saves it
>>>>> locally and then deserializes it. (11 GB in 15 minutes averages about
>>>>> 12 MB/second for this process.)
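
[Editor's note: the 12 MB/second figure above can be checked against the snapshot size from the ls listing; a quick sketch using integer shell arithmetic, so the result is truncated.]

```shell
# Back-of-the-envelope check of the sync throughput estimate.
snap_bytes=11126306780          # size of snapshot.3900a6b149 from the ls
sync_secs=$((15 * 60))          # the reported ~15-minute sync time
rate=$(( snap_bytes / sync_secs / 1000000 ))   # bytes/s -> MB/s, truncated
echo "~${rate} MB/s"            # prints ~12 MB/s
```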
>>>>>
>>>>> Oftentimes this is exacerbated by the max heap setting and GC interactions.
>>>>>
>>>>> Patrick
>>>>>
>>>>> On Tue, Jul 31, 2012 at 2:23 PM, Jordan Zimmerman
>>>>> <jordan@jordanzimmerman.com> wrote:
>>>>>> BTW - this is 3.3.5
>>>>>>
>>>>>> On Jul 31, 2012, at 2:22 PM, Jordan Zimmerman
>>>>>> <jordan@jordanzimmerman.com> wrote:
>>>>>>
>>>>>>> We've had a few outages of our ZK cluster recently. When trying to
>>>>>>> bring the cluster back up, it's been taking 10-15 minutes for the
>>>>>>> followers to sync with the leader. Any idea what might cause this?
>>>>>>> Here's an ls of the data dir:
>>>>>>>
>>>>>>> -rw-r--r-- 1 zookeeperserverprod nac    67108880 Jul 31 20:39 log.3900a4bc75
>>>>>>> -rw-r--r-- 1 zookeeperserverprod nac    67108880 Jul 31 20:40 log.3900a634ee
>>>>>>> -rw-r--r-- 1 zookeeperserverprod nac    67108880 Jul 31 21:21 log.3a00000001
>>>>>>> -rw-r--r-- 1 zookeeperserverprod nac    67108880 Jul 31 21:22 log.3a000139a2
>>>>>>> -rw-r--r-- 1 zookeeperserverprod nac  9279729723 Jul 31 20:42 snapshot.3900a634ec
>>>>>>> -rw-r--r-- 1 zookeeperserverprod nac 11126306780 Jul 31 21:09 snapshot.3900a6b149
>>>>>>> -rw-r--r-- 1 zookeeperserverprod nac  4153727423 Jul 31 21:22 snapshot.3a000139a0
>>>>>>>
>>>>>>
>>>>
>>