I believe there is a bug in this commit. The "cnxns" size() call isnot being synchronized. This will lead to invalid results at best, atworst outright failure (hard to say w/o knowing the implementation ofHashSet).

Camille can you work with Neha to get this fixed? Perhaps in themeantime (if it's going to take a while) you can revert this change,re-open the jira, update the patch, and reapply at some later time?

The implementation of HashSet in jdk 1.6 uses an underlying HashMap, whichuses a bog-standard int for the size. So, I completely agree that we canget invalid results for a point in time. But does anyone really care aboutthe *exact* moment-in-time number of cnxns in the system, in a system wherecnxns are coming and going? At best you'll see a value that will likely beout of date the moment you read it.

But, if we're going to do this, I do think there may be a case forrefactoring to a concurrent data structure given that we are randomlysyncing on the cnxns set in a way that makes adding additional interactionswith it error prone in this way. And we should definitely remove thatgetConnections method, if anyone ever iterated over that set they would bein a sad concurrency situation.Seems possibly worth a whole new ticket to do this. What do you think?

Also, Pat, the JIRA is still open. I will look to revert in a minutealthough I think the sync changes may necessitate a new JIRA entirely.

On Wed, Dec 28, 2011 at 1:31 PM, Camille Fournier <[EMAIL PROTECTED]> wrote:> After looking for a few mins, here are my observations:>> The implementation of HashSet in jdk 1.6 uses an underlying HashMap, which> uses a bog-standard int for the size. So, I completely agree that we can> get invalid results for a point in time. But does anyone really care about> the *exact* moment-in-time number of cnxns in the system, in a system where> cnxns are coming and going? At best you'll see a value that will likely be> out of date the moment you read it.>> But, if we're going to do this, I do think there may be a case for> refactoring to a concurrent data structure given that we are randomly> syncing on the cnxns set in a way that makes adding additional interactions> with it error prone in this way. And we should definitely remove that> getConnections method, if anyone ever iterated over that set they would be> in a sad concurrency situation.> Seems possibly worth a whole new ticket to do this. What do you think?>> Thanks,> C>> On Wed, Dec 28, 2011 at 1:12 PM, Camille Fournier <[EMAIL PROTECTED]>wrote:>>> I'm not sure it's meaningful enough to be worth the sync overhead. We>> should look into that. The alternative if we must is using a proper>> concurrent collection. Neha, any thoughts?>>>> C>> On Dec 28, 2011 12:45 PM, "Patrick Hunt" <[EMAIL PROTECTED]> wrote:>>>>> I believe there is a bug in this commit. The "cnxns" size() call is>>> not being synchronized. This will lead to invalid results at best, at>>> worst outright failure (hard to say w/o knowing the implementation of>>> HashSet).>>>>>> Camille can you work with Neha to get this fixed? Perhaps in the>>> meantime (if it's going to take a while) you can revert this change,>>> re-open the jira, update the patch, and reapply at some later time?>>>>>> Patrick>>>>>> On Wed, Dec 28, 2011 at 6:55 AM, <[EMAIL PROTECTED]> wrote:>>> > Author: camille>>> > Date: Wed Dec 28 14:55:37 2011>>> > New Revision: 1225200>>> >>>> > URL: http://svn.apache.org/viewvc?rev=1225200&view=rev>>> > Log:>>> > ZOOKEEPER-1321: Add number of client connections metric in JMX and srvr>>> (Neha Narkhede via camille)>>> >>>> > Modified:>>> > zookeeper/trunk/CHANGES.txt>>> >>>> zookeeper/trunk/src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java>>> >>>> zookeeper/trunk/src/java/main/org/apache/zookeeper/server/NIOServerCnxnFactory.java>>> >>>> zookeeper/trunk/src/java/main/org/apache/zookeeper/server/NettyServerCnxn.java>>> >>>> zookeeper/trunk/src/java/main/org/apache/zookeeper/server/NettyServerCnxnFactory.java>>> >>>> zookeeper/trunk/src/java/main/org/apache/zookeeper/server/ServerCnxnFactory.java>>> >>>> zookeeper/trunk/src/java/main/org/apache/zookeeper/server/ServerStats.java>>> >>>> zookeeper/trunk/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java>>> >>>> zookeeper/trunk/src/java/main/org/apache/zookeeper/server/ZooKeeperServerBean.java>>> >>>> zookeeper/trunk/src/java/main/org/apache/zookeeper/server/ZooKeeperServerMXBean.java>>> >>>> zookeeper/trunk/src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java>>> >>>> zookeeper/trunk/src/java/test/org/apache/zookeeper/test/FourLetterWordsTest.java>>> >>>> > Modified: zookeeper/trunk/CHANGES.txt>>> > URL:>>> http://svn.apache.org/viewvc/zookeeper/trunk/CHANGES.txt?rev=1225200&r1=1225199&r2=1225200&view=diff>>> >>>> =============================================================================>>> > --- zookeeper/trunk/CHANGES.txt (original)>>> > +++ zookeeper/trunk/CHANGES.txt Wed Dec 28 14:55:37 2011>>> > @@ -162,6 +162,8 @@ IMPROVEMENTS:>>> >>>> > ZOOKEEPER-1342. quorum Listener & LearnerCnxAcceptor are missing>>> > thread names (Rakesh R via phunt)>>> > +>>> > + ZOOKEEPER-1321. Add number of client connections metric in JMX and

I'd recommend a new ticket to fix the issues you identified (refactor,etc...) Neha was looking for further items to work on, this would be agood one.

Patrick

On Wed, Dec 28, 2011 at 11:18 AM, Camille Fournier <[EMAIL PROTECTED]> wrote:> Also, Pat, the JIRA is still open. I will look to revert in a minute> although I think the sync changes may necessitate a new JIRA entirely.>> On Wed, Dec 28, 2011 at 1:31 PM, Camille Fournier <[EMAIL PROTECTED]> wrote:>> After looking for a few mins, here are my observations:>>>> The implementation of HashSet in jdk 1.6 uses an underlying HashMap, which>> uses a bog-standard int for the size. So, I completely agree that we can>> get invalid results for a point in time. But does anyone really care about>> the *exact* moment-in-time number of cnxns in the system, in a system where>> cnxns are coming and going? At best you'll see a value that will likely be>> out of date the moment you read it.>>>> But, if we're going to do this, I do think there may be a case for>> refactoring to a concurrent data structure given that we are randomly>> syncing on the cnxns set in a way that makes adding additional interactions>> with it error prone in this way. And we should definitely remove that>> getConnections method, if anyone ever iterated over that set they would be>> in a sad concurrency situation.>> Seems possibly worth a whole new ticket to do this. What do you think?>>>> Thanks,>> C>>>> On Wed, Dec 28, 2011 at 1:12 PM, Camille Fournier <[EMAIL PROTECTED]>wrote:>>>>> I'm not sure it's meaningful enough to be worth the sync overhead. We>>> should look into that. The alternative if we must is using a proper>>> concurrent collection. Neha, any thoughts?>>>>>> C>>> On Dec 28, 2011 12:45 PM, "Patrick Hunt" <[EMAIL PROTECTED]> wrote:>>>>>>> I believe there is a bug in this commit. The "cnxns" size() call is>>>> not being synchronized. This will lead to invalid results at best, at>>>> worst outright failure (hard to say w/o knowing the implementation of>>>> HashSet).>>>>>>>> Camille can you work with Neha to get this fixed? Perhaps in the>>>> meantime (if it's going to take a while) you can revert this change,>>>> re-open the jira, update the patch, and reapply at some later time?>>>>>>>> Patrick>>>>>>>> On Wed, Dec 28, 2011 at 6:55 AM, <[EMAIL PROTECTED]> wrote:>>>> > Author: camille>>>> > Date: Wed Dec 28 14:55:37 2011>>>> > New Revision: 1225200>>>> >>>>> > URL: http://svn.apache.org/viewvc?rev=1225200&view=rev>>>> > Log:>>>> > ZOOKEEPER-1321: Add number of client connections metric in JMX and srvr>>>> (Neha Narkhede via camille)>>>> >>>>> > Modified:>>>> > zookeeper/trunk/CHANGES.txt>>>> >>>>> zookeeper/trunk/src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java>>>> >>>>> zookeeper/trunk/src/java/main/org/apache/zookeeper/server/NIOServerCnxnFactory.java>>>> >>>>> zookeeper/trunk/src/java/main/org/apache/zookeeper/server/NettyServerCnxn.java>>>> >>>>> zookeeper/trunk/src/java/main/org/apache/zookeeper/server/NettyServerCnxnFactory.java>>>> >>>>> zookeeper/trunk/src/java/main/org/apache/zookeeper/server/ServerCnxnFactory.java>>>> >>>>> zookeeper/trunk/src/java/main/org/apache/zookeeper/server/ServerStats.java>>>> >>>>> zookeeper/trunk/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java>>>> >>>>> zookeeper/trunk/src/java/main/org/apache/zookeeper/server/ZooKeeperServerBean.java>>>> >>>>> zookeeper/trunk/src/java/main/org/apache/zookeeper/server/ZooKeeperServerMXBean.java>>>> >>>>> zookeeper/trunk/src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java>>>> >>>>> zookeeper/trunk/src/java/test/org/apache/zookeeper/test/FourLetterWordsTest.java

> I think it needs to be fixed. It's obviously incorrect. Also if> someone changes the underlying implementation at some point it might> bite them.>> Given it wasn't applied to 3.4 branch yet I'd suggest revert, fix the> patch, then reapply to both branches. (that's easiest/bulletproof imo)>> I'd recommend a new ticket to fix the issues you identified (refactor,> etc...) Neha was looking for further items to work on, this would be a> good one.>> Patrick>> On Wed, Dec 28, 2011 at 11:18 AM, Camille Fournier <[EMAIL PROTECTED]>> wrote:> > Also, Pat, the JIRA is still open. I will look to revert in a minute> > although I think the sync changes may necessitate a new JIRA entirely.> >> > On Wed, Dec 28, 2011 at 1:31 PM, Camille Fournier <[EMAIL PROTECTED]>> wrote:> >> After looking for a few mins, here are my observations:> >>> >> The implementation of HashSet in jdk 1.6 uses an underlying HashMap,> which> >> uses a bog-standard int for the size. So, I completely agree that we can> >> get invalid results for a point in time. But does anyone really care> about> >> the *exact* moment-in-time number of cnxns in the system, in a system> where> >> cnxns are coming and going? At best you'll see a value that will likely> be> >> out of date the moment you read it.> >>> >> But, if we're going to do this, I do think there may be a case for> >> refactoring to a concurrent data structure given that we are randomly> >> syncing on the cnxns set in a way that makes adding additional> interactions> >> with it error prone in this way. And we should definitely remove that> >> getConnections method, if anyone ever iterated over that set they would> be> >> in a sad concurrency situation.> >> Seems possibly worth a whole new ticket to do this. What do you think?> >>> >> Thanks,> >> C> >>> >> On Wed, Dec 28, 2011 at 1:12 PM, Camille Fournier <[EMAIL PROTECTED]> >wrote:> >>> >>> I'm not sure it's meaningful enough to be worth the sync overhead. We> >>> should look into that. The alternative if we must is using a proper> >>> concurrent collection. Neha, any thoughts?> >>>> >>> C> >>> On Dec 28, 2011 12:45 PM, "Patrick Hunt" <[EMAIL PROTECTED]> wrote:> >>>> >>>> I believe there is a bug in this commit. The "cnxns" size() call is> >>>> not being synchronized. This will lead to invalid results at best, at> >>>> worst outright failure (hard to say w/o knowing the implementation of> >>>> HashSet).> >>>>> >>>> Camille can you work with Neha to get this fixed? Perhaps in the> >>>> meantime (if it's going to take a while) you can revert this change,> >>>> re-open the jira, update the patch, and reapply at some later time?> >>>>> >>>> Patrick> >>>>> >>>> On Wed, Dec 28, 2011 at 6:55 AM, <[EMAIL PROTECTED]> wrote:> >>>> > Author: camille> >>>> > Date: Wed Dec 28 14:55:37 2011> >>>> > New Revision: 1225200> >>>> >> >>>> > URL: http://svn.apache.org/viewvc?rev=1225200&view=rev> >>>> > Log:> >>>> > ZOOKEEPER-1321: Add number of client connections metric in JMX and> srvr> >>>> (Neha Narkhede via camille)> >>>> >> >>>> > Modified:> >>>> > zookeeper/trunk/CHANGES.txt> >>>> >> >>>>> zookeeper/trunk/src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java> >>>> >> >>>>> zookeeper/trunk/src/java/main/org/apache/zookeeper/server/NIOServerCnxnFactory.java> >>>> >> >>>>> zookeeper/trunk/src/java/main/org/apache/zookeeper/server/NettyServerCnxn.java> >>>> >> >>>>> zookeeper/trunk/src/java/main/org/apache/zookeeper/server/NettyServerCnxnFactory.java> >>>> >> >>>>> zookeeper/trunk/src/java/main/org/apache/zookeeper/server/ServerCnxnFactory.java> >>>> >> >>>>> zookeeper/trunk/src/java/main/org/apache/zookeeper/server/ServerStats.java

On Wed, Dec 28, 2011 at 12:58 PM, Ted Yu <[EMAIL PROTECTED]> wrote:> w.r.t. getConnections() method, I think we can keep it - just clone a new> Set and return the clone.> In javadoc, we document that getConnections() incurs additional cost in> cloning.

Sounds reasonable but given there are no callers of this function whykeep it? I'd still rather we remove it given it's dead code. Perhapsthere are locations where we should be using and aren't?

>> On Wed, Dec 28, 2011 at 11:24 AM, Patrick Hunt <[EMAIL PROTECTED]> wrote:>>> I think it needs to be fixed. It's obviously incorrect. Also if>> someone changes the underlying implementation at some point it might>> bite them.>>>> Given it wasn't applied to 3.4 branch yet I'd suggest revert, fix the>> patch, then reapply to both branches. (that's easiest/bulletproof imo)>>>> I'd recommend a new ticket to fix the issues you identified (refactor,>> etc...) Neha was looking for further items to work on, this would be a>> good one.>>>> Patrick>>>> On Wed, Dec 28, 2011 at 11:18 AM, Camille Fournier <[EMAIL PROTECTED]>>> wrote:>> > Also, Pat, the JIRA is still open. I will look to revert in a minute>> > although I think the sync changes may necessitate a new JIRA entirely.>> >>> > On Wed, Dec 28, 2011 at 1:31 PM, Camille Fournier <[EMAIL PROTECTED]>>> wrote:>> >> After looking for a few mins, here are my observations:>> >>>> >> The implementation of HashSet in jdk 1.6 uses an underlying HashMap,>> which>> >> uses a bog-standard int for the size. So, I completely agree that we can>> >> get invalid results for a point in time. But does anyone really care>> about>> >> the *exact* moment-in-time number of cnxns in the system, in a system>> where>> >> cnxns are coming and going? At best you'll see a value that will likely>> be>> >> out of date the moment you read it.>> >>>> >> But, if we're going to do this, I do think there may be a case for>> >> refactoring to a concurrent data structure given that we are randomly>> >> syncing on the cnxns set in a way that makes adding additional>> interactions>> >> with it error prone in this way. And we should definitely remove that>> >> getConnections method, if anyone ever iterated over that set they would>> be>> >> in a sad concurrency situation.>> >> Seems possibly worth a whole new ticket to do this. What do you think?>> >>>> >> Thanks,>> >> C>> >>>> >> On Wed, Dec 28, 2011 at 1:12 PM, Camille Fournier <[EMAIL PROTECTED]>> >wrote:>> >>>> >>> I'm not sure it's meaningful enough to be worth the sync overhead. We>> >>> should look into that. The alternative if we must is using a proper>> >>> concurrent collection. Neha, any thoughts?>> >>>>> >>> C>> >>> On Dec 28, 2011 12:45 PM, "Patrick Hunt" <[EMAIL PROTECTED]> wrote:>> >>>>> >>>> I believe there is a bug in this commit. The "cnxns" size() call is>> >>>> not being synchronized. This will lead to invalid results at best, at>> >>>> worst outright failure (hard to say w/o knowing the implementation of>> >>>> HashSet).>> >>>>>> >>>> Camille can you work with Neha to get this fixed? Perhaps in the>> >>>> meantime (if it's going to take a while) you can revert this change,>> >>>> re-open the jira, update the patch, and reapply at some later time?>> >>>>>> >>>> Patrick>> >>>>>> >>>> On Wed, Dec 28, 2011 at 6:55 AM, <[EMAIL PROTECTED]> wrote:>> >>>> > Author: camille>> >>>> > Date: Wed Dec 28 14:55:37 2011>> >>>> > New Revision: 1225200>> >>>> >>> >>>> > URL: http://svn.apache.org/viewvc?rev=1225200&view=rev>> >>>> > Log:>> >>>> > ZOOKEEPER-1321: Add number of client connections metric in JMX and>> srvr>> >>>> (Neha Narkhede via camille)>> >>>> >>> >>>> > Modified:>> >>>> > zookeeper/trunk/CHANGES.txt>> >>>> >>> >>>>>> zookeeper/trunk/src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java>> >>>> >>> >>>>>> zookeeper/trunk/src/java/main/org/apache/zookeeper/server/NIOServerCnxnFactory.java>> >>>> >>> >>

I don't think creating it as a flat sync is a good idea, so if we want thisI think we need to refactor that structure to be concurrent.

COn Dec 28, 2011 2:24 PM, "Patrick Hunt" <[EMAIL PROTECTED]> wrote:

> I think it needs to be fixed. It's obviously incorrect. Also if> someone changes the underlying implementation at some point it might> bite them.>> Given it wasn't applied to 3.4 branch yet I'd suggest revert, fix the> patch, then reapply to both branches. (that's easiest/bulletproof imo)>> I'd recommend a new ticket to fix the issues you identified (refactor,> etc...) Neha was looking for further items to work on, this would be a> good one.>> Patrick>> On Wed, Dec 28, 2011 at 11:18 AM, Camille Fournier <[EMAIL PROTECTED]>> wrote:> > Also, Pat, the JIRA is still open. I will look to revert in a minute> > although I think the sync changes may necessitate a new JIRA entirely.> >> > On Wed, Dec 28, 2011 at 1:31 PM, Camille Fournier <[EMAIL PROTECTED]>> wrote:> >> After looking for a few mins, here are my observations:> >>> >> The implementation of HashSet in jdk 1.6 uses an underlying HashMap,> which> >> uses a bog-standard int for the size. So, I completely agree that we can> >> get invalid results for a point in time. But does anyone really care> about> >> the *exact* moment-in-time number of cnxns in the system, in a system> where> >> cnxns are coming and going? At best you'll see a value that will likely> be> >> out of date the moment you read it.> >>> >> But, if we're going to do this, I do think there may be a case for> >> refactoring to a concurrent data structure given that we are randomly> >> syncing on the cnxns set in a way that makes adding additional> interactions> >> with it error prone in this way. And we should definitely remove that> >> getConnections method, if anyone ever iterated over that set they would> be> >> in a sad concurrency situation.> >> Seems possibly worth a whole new ticket to do this. What do you think?> >>> >> Thanks,> >> C> >>> >> On Wed, Dec 28, 2011 at 1:12 PM, Camille Fournier <[EMAIL PROTECTED]> >wrote:> >>> >>> I'm not sure it's meaningful enough to be worth the sync overhead. We> >>> should look into that. The alternative if we must is using a proper> >>> concurrent collection. Neha, any thoughts?> >>>> >>> C> >>> On Dec 28, 2011 12:45 PM, "Patrick Hunt" <[EMAIL PROTECTED]> wrote:> >>>> >>>> I believe there is a bug in this commit. The "cnxns" size() call is> >>>> not being synchronized. This will lead to invalid results at best, at> >>>> worst outright failure (hard to say w/o knowing the implementation of> >>>> HashSet).> >>>>> >>>> Camille can you work with Neha to get this fixed? Perhaps in the> >>>> meantime (if it's going to take a while) you can revert this change,> >>>> re-open the jira, update the patch, and reapply at some later time?> >>>>> >>>> Patrick> >>>>> >>>> On Wed, Dec 28, 2011 at 6:55 AM, <[EMAIL PROTECTED]> wrote:> >>>> > Author: camille> >>>> > Date: Wed Dec 28 14:55:37 2011> >>>> > New Revision: 1225200> >>>> >> >>>> > URL: http://svn.apache.org/viewvc?rev=1225200&view=rev> >>>> > Log:> >>>> > ZOOKEEPER-1321: Add number of client connections metric in JMX and> srvr> >>>> (Neha Narkhede via camille)> >>>> >> >>>> > Modified:> >>>> > zookeeper/trunk/CHANGES.txt> >>>> >> >>>>> zookeeper/trunk/src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java> >>>> >> >>>>> zookeeper/trunk/src/java/main/org/apache/zookeeper/server/NIOServerCnxnFactory.java> >>>> >> >>>>> zookeeper/trunk/src/java/main/org/apache/zookeeper/server/NettyServerCnxn.java> >>>> >> >>>>> zookeeper/trunk/src/java/main/org/apache/zookeeper/server/NettyServerCnxnFactory.java> >>>> >> >>>>> zookeeper/trunk/src/java/main/org/apache/zookeeper/server/ServerCnxnFactory.java> >>>> >> >>>>> zookeeper/trunk/src/java/main/org/apache/zookeeper/server/ServerStats.java> >>>> >> >>>>> zookeeper/trunk/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java

They seem like distinct changes to me. In particular getting the sizeis going to happen infrequently (monitoring pull) so I don't see aproblem fixing the existing patch in the same way the code currentlyhandles cnxns access, with a separate jira to do the refactoring. Am Imissing something?

Patrick

On Wed, Dec 28, 2011 at 1:10 PM, Camille Fournier <[EMAIL PROTECTED]> wrote:> I don't think creating it as a flat sync is a good idea, so if we want this> I think we need to refactor that structure to be concurrent.>> C> On Dec 28, 2011 2:24 PM, "Patrick Hunt" <[EMAIL PROTECTED]> wrote:>>> I think it needs to be fixed. It's obviously incorrect. Also if>> someone changes the underlying implementation at some point it might>> bite them.>>>> Given it wasn't applied to 3.4 branch yet I'd suggest revert, fix the>> patch, then reapply to both branches. (that's easiest/bulletproof imo)>>>> I'd recommend a new ticket to fix the issues you identified (refactor,>> etc...) Neha was looking for further items to work on, this would be a>> good one.>>>> Patrick>>>> On Wed, Dec 28, 2011 at 11:18 AM, Camille Fournier <[EMAIL PROTECTED]>>> wrote:>> > Also, Pat, the JIRA is still open. I will look to revert in a minute>> > although I think the sync changes may necessitate a new JIRA entirely.>> >>> > On Wed, Dec 28, 2011 at 1:31 PM, Camille Fournier <[EMAIL PROTECTED]>>> wrote:>> >> After looking for a few mins, here are my observations:>> >>>> >> The implementation of HashSet in jdk 1.6 uses an underlying HashMap,>> which>> >> uses a bog-standard int for the size. So, I completely agree that we can>> >> get invalid results for a point in time. But does anyone really care>> about>> >> the *exact* moment-in-time number of cnxns in the system, in a system>> where>> >> cnxns are coming and going? At best you'll see a value that will likely>> be>> >> out of date the moment you read it.>> >>>> >> But, if we're going to do this, I do think there may be a case for>> >> refactoring to a concurrent data structure given that we are randomly>> >> syncing on the cnxns set in a way that makes adding additional>> interactions>> >> with it error prone in this way. And we should definitely remove that>> >> getConnections method, if anyone ever iterated over that set they would>> be>> >> in a sad concurrency situation.>> >> Seems possibly worth a whole new ticket to do this. What do you think?>> >>>> >> Thanks,>> >> C>> >>>> >> On Wed, Dec 28, 2011 at 1:12 PM, Camille Fournier <[EMAIL PROTECTED]>> >wrote:>> >>>> >>> I'm not sure it's meaningful enough to be worth the sync overhead. We>> >>> should look into that. The alternative if we must is using a proper>> >>> concurrent collection. Neha, any thoughts?>> >>>>> >>> C>> >>> On Dec 28, 2011 12:45 PM, "Patrick Hunt" <[EMAIL PROTECTED]> wrote:>> >>>>> >>>> I believe there is a bug in this commit. The "cnxns" size() call is>> >>>> not being synchronized. This will lead to invalid results at best, at>> >>>> worst outright failure (hard to say w/o knowing the implementation of>> >>>> HashSet).>> >>>>>> >>>> Camille can you work with Neha to get this fixed? Perhaps in the>> >>>> meantime (if it's going to take a while) you can revert this change,>> >>>> re-open the jira, update the patch, and reapply at some later time?>> >>>>>> >>>> Patrick>> >>>>>> >>>> On Wed, Dec 28, 2011 at 6:55 AM, <[EMAIL PROTECTED]> wrote:>> >>>> > Author: camille>> >>>> > Date: Wed Dec 28 14:55:37 2011>> >>>> > New Revision: 1225200>> >>>> >>> >>>> > URL: http://svn.apache.org/viewvc?rev=1225200&view=rev>> >>>> > Log:>> >>>> > ZOOKEEPER-1321: Add number of client connections metric in JMX and>> srvr>> >>>> (Neha Narkhede via camille)>> >>>> >>> >>>> > Modified:>> >>>> > zookeeper/trunk/CHANGES.txt>> >>>> >>> >>>>>> zookeeper/trunk/src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java>> >>>> >>> >>>>>> zookeeper/trunk/src/java/main/org/apache/zookeeper/server/NIOServerCnxnFactory.java

Well that would be bad news then - see "stat" command processing andsimilar. All the more reason to look at this in more depth.

Patrick> On Dec 28, 2011 4:24 PM, "Patrick Hunt" <[EMAIL PROTECTED]> wrote:>>> They seem like distinct changes to me. In particular getting the size>> is going to happen infrequently (monitoring pull) so I don't see a>> problem fixing the existing patch in the same way the code currently>> handles cnxns access, with a separate jira to do the refactoring. Am I>> missing something?>>>> Patrick>>>> On Wed, Dec 28, 2011 at 1:10 PM, Camille Fournier <[EMAIL PROTECTED]>>> wrote:>> > I don't think creating it as a flat sync is a good idea, so if we want>> this>> > I think we need to refactor that structure to be concurrent.>> >>> > C>> > On Dec 28, 2011 2:24 PM, "Patrick Hunt" <[EMAIL PROTECTED]> wrote:>> >>> >> I think it needs to be fixed. It's obviously incorrect. Also if>> >> someone changes the underlying implementation at some point it might>> >> bite them.>> >>>> >> Given it wasn't applied to 3.4 branch yet I'd suggest revert, fix the>> >> patch, then reapply to both branches. (that's easiest/bulletproof imo)>> >>>> >> I'd recommend a new ticket to fix the issues you identified (refactor,>> >> etc...) Neha was looking for further items to work on, this would be a>> >> good one.>> >>>> >> Patrick>> >>>> >> On Wed, Dec 28, 2011 at 11:18 AM, Camille Fournier <[EMAIL PROTECTED]>>> >> wrote:>> >> > Also, Pat, the JIRA is still open. I will look to revert in a minute>> >> > although I think the sync changes may necessitate a new JIRA entirely.>> >> >>> >> > On Wed, Dec 28, 2011 at 1:31 PM, Camille Fournier <[EMAIL PROTECTED]>> >>> >> wrote:>> >> >> After looking for a few mins, here are my observations:>> >> >>>> >> >> The implementation of HashSet in jdk 1.6 uses an underlying HashMap,>> >> which>> >> >> uses a bog-standard int for the size. So, I completely agree that we>> can>> >> >> get invalid results for a point in time. But does anyone really care>> >> about>> >> >> the *exact* moment-in-time number of cnxns in the system, in a system>> >> where>> >> >> cnxns are coming and going? At best you'll see a value that will>> likely>> >> be>> >> >> out of date the moment you read it.>> >> >>>> >> >> But, if we're going to do this, I do think there may be a case for>> >> >> refactoring to a concurrent data structure given that we are randomly>> >> >> syncing on the cnxns set in a way that makes adding additional>> >> interactions>> >> >> with it error prone in this way. And we should definitely remove that>> >> >> getConnections method, if anyone ever iterated over that set they>> would>> >> be>> >> >> in a sad concurrency situation.>> >> >> Seems possibly worth a whole new ticket to do this. What do you>> think?>> >> >>>> >> >> Thanks,>> >> >> C>> >> >>>> >> >> On Wed, Dec 28, 2011 at 1:12 PM, Camille Fournier <>> [EMAIL PROTECTED]>> >> >wrote:>> >> >>>> >> >>> I'm not sure it's meaningful enough to be worth the sync overhead.>> We>> >> >>> should look into that. The alternative if we must is using a proper>> >> >>> concurrent collection. Neha, any thoughts?>> >> >>>>> >> >>> C>> >> >>> On Dec 28, 2011 12:45 PM, "Patrick Hunt" <[EMAIL PROTECTED]> wrote:>> >> >>>>> >> >>>> I believe there is a bug in this commit. The "cnxns" size() call is>> >> >>>> not being synchronized. This will lead to invalid results at best,>> at>> >> >>>> worst outright failure (hard to say w/o knowing the implementation>> of>> >> >>>> HashSet).>> >> >>>>>> >> >>>> Camille can you work with Neha to get this fixed? Perhaps in the>> >> >>>> meantime (if it's going to take a while) you can revert this>> change,>> >> >>>> re-open the jira, update the patch, and reapply at some later time?

What would you like to do about these? I had proposed having aseparate 4lw port some time back in order to simplify/isolate. Shouldwe do this for 3.5.0? (deprecate the old port usage but keep it aroundfor a while) Something else?

Patrick

> On Wed, Dec 28, 2011 at 4:32 PM, Patrick Hunt <[EMAIL PROTECTED]> wrote:>>> On Wed, Dec 28, 2011 at 1:26 PM, Camille Fournier <[EMAIL PROTECTED]>>> wrote:>> > I'm not sure I agree in the assumption that monitoring pulls happen>> > infrequently...>>>> Well that would be bad news then - see "stat" command processing and>> similar. All the more reason to look at this in more depth.>>>> Patrick>>>>>> > On Dec 28, 2011 4:24 PM, "Patrick Hunt" <[EMAIL PROTECTED]> wrote:>> >>> >> They seem like distinct changes to me. In particular getting the size>> >> is going to happen infrequently (monitoring pull) so I don't see a>> >> problem fixing the existing patch in the same way the code currently>> >> handles cnxns access, with a separate jira to do the refactoring. Am I>> >> missing something?>> >>>> >> Patrick>> >>>> >> On Wed, Dec 28, 2011 at 1:10 PM, Camille Fournier <[EMAIL PROTECTED]>>> >> wrote:>> >> > I don't think creating it as a flat sync is a good idea, so if we want>> >> this>> >> > I think we need to refactor that structure to be concurrent.>> >> >>> >> > C>> >> > On Dec 28, 2011 2:24 PM, "Patrick Hunt" <[EMAIL PROTECTED]> wrote:>> >> >>> >> >> I think it needs to be fixed. It's obviously incorrect. Also if>> >> >> someone changes the underlying implementation at some point it might>> >> >> bite them.>> >> >>>> >> >> Given it wasn't applied to 3.4 branch yet I'd suggest revert, fix the>> >> >> patch, then reapply to both branches. (that's easiest/bulletproof>> imo)>> >> >>>> >> >> I'd recommend a new ticket to fix the issues you identified>> (refactor,>> >> >> etc...) Neha was looking for further items to work on, this would be>> a>> >> >> good one.>> >> >>>> >> >> Patrick>> >> >>>> >> >> On Wed, Dec 28, 2011 at 11:18 AM, Camille Fournier <>> [EMAIL PROTECTED]>>> >> >> wrote:>> >> >> > Also, Pat, the JIRA is still open. I will look to revert in a>> minute>> >> >> > although I think the sync changes may necessitate a new JIRA>> entirely.>> >> >> >>> >> >> > On Wed, Dec 28, 2011 at 1:31 PM, Camille Fournier <>> [EMAIL PROTECTED]>> >> >>> >> >> wrote:>> >> >> >> After looking for a few mins, here are my observations:>> >> >> >>>> >> >> >> The implementation of HashSet in jdk 1.6 uses an underlying>> HashMap,>> >> >> which>> >> >> >> uses a bog-standard int for the size. So, I completely agree that>> we>> >> can>> >> >> >> get invalid results for a point in time. But does anyone really>> care>> >> >> about>> >> >> >> the *exact* moment-in-time number of cnxns in the system, in a>> system>> >> >> where>> >> >> >> cnxns are coming and going? At best you'll see a value that will>> >> likely>> >> >> be>> >> >> >> out of date the moment you read it.>> >> >> >>>> >> >> >> But, if we're going to do this, I do think there may be a case for>> >> >> >> refactoring to a concurrent data structure given that we are>> randomly>> >> >> >> syncing on the cnxns set in a way that makes adding additional>> >> >> interactions>> >> >> >> with it error prone in this way. And we should definitely remove

That is a very good question. This should definitely be fixed in 3.5. Ithink your general idea in that tracker comment is right on:

One option that would make this a lot less ugly would be to deprecate> support for 4letterwords on the client port in 3.5.0 (dep, not remove,> although provide an option to turn off via config), we'd designate a new> port specifically for 4letterwords. We'd fix this problem on the new port,> the old port would have the existing limitations.>> 3.6.0 we would remove 4lw on client port entirely.>> This is good for a number of reasons IMO – one is that having 4lw on the> client port is a bit of a security issue in some customers - as any client> has access to this port and it cannot by definition be firewalled. Many> admins would like to firewall this.>> In the process we could:> 1) make the new port fully b/w compatible with the existing functionality> 2) enable long lived sessions rather than polling> 3) provide support for "extended command syntax" which would enhance 4lw> features (for example to control the format of the output)> 4) etc...>

Especially the points 1-4... we need this to be a handshake model, not justa connect and dump model.I want to work on this, but I will probably need someone a bit better atpython to rewrite your monitoring server to use it.I would also welcome help on this issue, if there's someone else isinterested in tackling it, I am happy to provide feedback and guidance.

I would suggest start clean, "link" the new jira to the old jiras, butmake the new description a useful summary of what this umbrella jirashould do and the issues faced. I say umbrella because we should splitit into sub-parts, as the whole will be relatively big chunk of work.

If you could also add a sub-task right off the bat to address thecnxns issue I think that would be helpful as well.

Patrick

On Wed, Dec 28, 2011 at 2:21 PM, Camille Fournier <[EMAIL PROTECTED]> wrote:> :)>> That is a very good question. This should definitely be fixed in 3.5. I> think your general idea in that tracker comment is right on:>> One option that would make this a lot less ugly would be to deprecate>> support for 4letterwords on the client port in 3.5.0 (dep, not remove,>> although provide an option to turn off via config), we'd designate a new>> port specifically for 4letterwords. We'd fix this problem on the new port,>> the old port would have the existing limitations.>>>> 3.6.0 we would remove 4lw on client port entirely.>>>> This is good for a number of reasons IMO – one is that having 4lw on the>> client port is a bit of a security issue in some customers - as any client>> has access to this port and it cannot by definition be firewalled. Many>> admins would like to firewall this.>>>> In the process we could:>> 1) make the new port fully b/w compatible with the existing functionality>> 2) enable long lived sessions rather than polling>> 3) provide support for "extended command syntax" which would enhance 4lw>> features (for example to control the format of the output)>> 4) etc...>>>> Especially the points 1-4... we need this to be a handshake model, not just> a connect and dump model.> I want to work on this, but I will probably need someone a bit better at> python to rewrite your monitoring server to use it.> I would also welcome help on this issue, if there's someone else is> interested in tackling it, I am happy to provide feedback and guidance.>> Should I start clean and make a new tracker or fold it into 1197?>> C>> On Wed, Dec 28, 2011 at 5:12 PM, Patrick Hunt <[EMAIL PROTECTED]> wrote:>>> On Wed, Dec 28, 2011 at 1:53 PM, Camille Fournier <[EMAIL PROTECTED]>>> wrote:>> > Fair enough. Syncs for all! It's a drop in the bucket compared to the>> fact>> > that we aren't handling the fundamental socket connections for 4lws>> > correctly anyway.>>>> sorry. I see a blog post coming on. ;-)>>>> You're referring to the following (any more?)>> https://issues.apache.org/jira/browse/ZOOKEEPER-1197>> https://issues.apache.org/jira/browse/ZOOKEEPER-1237>>>> is this comment still a correct interpretation of the issue?>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-737?focusedCommentId=13109067&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13109067>>>> What would you like to do about these? I had proposed having a>> separate 4lw port some time back in order to simplify/isolate. Should>> we do this for 3.5.0? (deprecate the old port usage but keep it around>> for a while) Something else?>>>> Patrick>>>> > On Wed, Dec 28, 2011 at 4:32 PM, Patrick Hunt <[EMAIL PROTECTED]> wrote:>> >>> >> On Wed, Dec 28, 2011 at 1:26 PM, Camille Fournier <[EMAIL PROTECTED]>>> >> wrote:>> >> > I'm not sure I agree in the assumption that monitoring pulls happen>> >> > infrequently...>> >>>> >> Well that would be bad news then - see "stat" command processing and>> >> similar. All the more reason to look at this in more depth.>> >>>> >> Patrick>> >>>> >>>> >> > On Dec 28, 2011 4:24 PM, "Patrick Hunt" <[EMAIL PROTECTED]> wrote:>> >> >>> >> >> They seem like distinct changes to me. In particular getting the size>> >> >> is going to happen infrequently (monitoring pull) so I don't see a>> >> >> problem fixing the existing patch in the same way the code currently>> >> >> handles cnxns access, with a separate jira to do the refactoring. Am>> I>> >> >> missing something?

NEW: Monitor These Apps!

All projects made searchable here are trademarks of the Apache Software Foundation.
Service operated by Sematext