This issue appears to be a configuration problem:
1. The HBase client uses the NIO (socket) API, which allocates direct memory.
2. The default -XX:MaxDirectMemorySize value is equal to the -Xmx value, so if no full GC occurs, direct memory is never reclaimed. Unfortunately, the GC configuration parameters of our client do not produce any full GC.
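As a rough, self-contained sketch of points 1 and 2 (the class name and sizes below are illustrative, not HBase code): each direct ByteBuffer reserves native memory outside the Java heap, and that memory is only released after a GC actually collects the buffer object itself, so without a full GC the reservation just accumulates up to -XX:MaxDirectMemorySize.

```java
import java.nio.ByteBuffer;

// Illustration only (not HBase code): direct buffers live outside the Java
// heap. Dropping the last reference does NOT free the native memory; that
// happens only when a GC cycle collects the ByteBuffer object and runs its
// cleaner. With the default -XX:MaxDirectMemorySize (== -Xmx) and no full
// GC, the reserved native memory keeps growing.
public class DirectMemoryDemo {
    public static void main(String[] args) {
        long total = 0;
        for (int i = 0; i < 8; i++) {
            ByteBuffer buf = ByteBuffer.allocateDirect(1024 * 1024); // 1 MiB off-heap
            total += buf.capacity();
            // 'buf' becomes unreachable here, but its native memory is not
            // released until GC collects the ByteBuffer object itself.
        }
        System.out.println("reserved " + total / (1024 * 1024) + " MiB of direct memory");
    }
}
```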

Thank you all. I think it's the same problem as the link provided by Stack, because the heap size is stable but the non-heap size keeps growing, so I don't think it is the CMS GC bug. We also know the content of the problem memory section; all the records contain info like the following:
"|www.hostname00000000000002087075.comlhggmdjapwpfvkqvxgnskzzydiywoacjnpljkarlehrnzzbpbxc||||||460|||||||||||Agent||||"
"BBZHtable_UFDR_058,048342220093168-02570"
........

Koji tried "-XX:-CMSConcurrentMTEnabled" and confirmed that all the socket-related objects were being collected properly. This option forces the concurrent marker to be single-threaded. This was for HDFS, but I think the same applies here.
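For reference, a client could try the flag Koji used like this (HBASE_OPTS in hbase-env.sh is the usual hook; the exact flag combination here is an illustrative assumption, not a tested recommendation):

```shell
# In hbase-env.sh (illustrative): keep CMS but disable the multi-threaded
# concurrent marker, i.e. the -XX:-CMSConcurrentMTEnabled workaround above.
export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled"
```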

Could you (or somebody) point me to where the client is using NIO? I'm looking at HBaseClient and I do not see references to NIO; also it seems that all work is handed off to separate threads (HBaseClient.Connection), and the JDK will not cache more than 3 direct buffers per thread.

Did Gaojinchao attach the stack dump, and did people receive it (Lars?)? Could someone, or Gaojinchao, attach it to the jira?

-Shrijeet

On Sun, Dec 4, 2011 at 12:24 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> Thanks. Now the question is: How many connection threads do we have?
>
> I think there is one per regionserver, which would indeed be a problem.
> Need to look at the code again (I'm only partially familiar with the client code).
>
> Either the client should chunk (like the server does), or there should be a limited number of threads that
> perform IO on behalf of the client (or both).
>
> -- Lars
>
> ----- Original Message -----
> From: Gaojinchao <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; lars hofhansl <[EMAIL PROTECTED]>
> Cc: Chenjian <[EMAIL PROTECTED]>; wenzaohua <[EMAIL PROTECTED]>
> Sent: Saturday, December 3, 2011 11:22 PM
> Subject: Re: Feedback Re: Suspected memory leak
>
> This is the dump stack.
>
> -----Original Message-----
> From: lars hofhansl [mailto:[EMAIL PROTECTED]]
> Sent: December 4, 2011 14:15
> To: [EMAIL PROTECTED]
> Cc: Chenjian; wenzaohua
> Subject: Re: Feedback Re: Suspected memory leak
>
> Dropping user list.
>
> Could you (or somebody) point me to where the client is using NIO?
> I'm looking at HBaseClient and I do not see references to NIO; also it seems that all work is handed off to
> separate threads (HBaseClient.Connection), and the JDK will not cache more than 3 direct buffers per thread.
>
> It's possible (likely?) that I missed something in the code.
>
> Thanks.
>
> -- Lars
>
> ________________________________
> From: Gaojinchao <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Cc: Chenjian <[EMAIL PROTECTED]>; wenzaohua <[EMAIL PROTECTED]>
> Sent: Saturday, December 3, 2011 7:57 PM
> Subject: Feedback Re: Suspected memory leak
>
> Thank you for your help.
>
> This issue appears to be a configuration problem:
> 1. The HBase client uses the NIO (socket) API, which allocates direct memory.
> 2. The default -XX:MaxDirectMemorySize value is equal to the -Xmx value, so if no full GC occurs, direct memory is never reclaimed. Unfortunately, the GC configuration parameters of our client do not produce any full GC.
>
> This is only a preliminary result; all tests are still running, and any further results will be fed back.
> Finally, I will update our story in https://issues.apache.org/jira/browse/HBASE-4633.
>
> If our digging is correct, should we set a default value for "-XX:MaxDirectMemorySize" to prevent this situation?
>
> Thanks
>
> -----Original Message-----
> From: bijieshan [mailto:[EMAIL PROTECTED]]
> Sent: December 2, 2011 15:37
> To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Cc: Chenjian; wenzaohua
> Subject: Re: Suspected memory leak
>
> Thank you all.
> I think it's the same problem as the link provided by Stack, because the heap size is stable but the non-heap size keeps growing. So I don't think it is the CMS GC bug.
> And we know the content of the problem memory section; all the records contain info like the following:
> "|www.hostname00000000000002087075.comlhggmdjapwpfvkqvxgnskzzydiywoacjnpljkarlehrnzzbpbxc||||||460|||||||||||Agent||||";
> "BBZHtable_UFDR_058,048342220093168-02570"
> ........
>
> Jieshan.
>
> -----Original Message-----
> From: Kihwal Lee [mailto:[EMAIL PROTECTED]]
> Sent: December 2, 2011 4:20
> To: [EMAIL PROTECTED]
> Cc: Ramakrishna s vasudevan; [EMAIL PROTECTED]
> Subject: Re: Suspected memory leak
>
> Adding to the excellent write-up by Jonathan:
> Since finalizers are involved, it takes two GC cycles to collect these objects. Due to a bug/bugs in the CMS GC, collection may not happen and the heap can grow really big. See http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7112034 for details.

I think Jinchao wasn't using HBASE-4508 in his 0.90 distribution because 0.90.5 hasn't been released. Assuming the NIO consumption is related to the number of connections from the client side, it would help to perform benchmarking on 0.90.5.

Jinchao: Please attach the stack trace to HBASE-4633 so that we can verify our assumptions.

I think HBASE-4508 is unrelated. The "connections" I am referring to are HBaseClient.Connection objects (not HConnections). It turns out that HBaseClient.Connection.setParam is actually called directly by the client threads, which means we can get an unlimited number of DirectByteBuffers (until we get a full GC).

The JDK will cache 3 per thread, each with a size sufficient to serve the IO. So sending some large requests from many threads will lead to OOM.
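A back-of-the-envelope sketch of why this blows up (the thread count and request size below are assumptions for illustration, not measurements): if the JDK's internal NIO cache keeps up to 3 direct buffers per thread, each sized to the largest write that thread performed, then worst-case direct memory scales with thread count times request size.

```java
// Illustration only: worst-case direct-memory footprint when every client
// thread ends up caching 'buffersPerThread' direct buffers, each as large
// as the biggest request that thread ever sent.
public class DirectCacheEstimate {
    static long worstCaseBytes(int threads, int buffersPerThread, long largestRequestBytes) {
        return (long) threads * buffersPerThread * largestRequestBytes;
    }

    public static void main(String[] args) {
        // Assumed example: 200 client threads, 3 cached buffers each,
        // 10 MiB largest put -> 6000 MiB of direct memory worst case.
        long bytes = worstCaseBytes(200, 3, 10L * 1024 * 1024);
        System.out.println(bytes / (1024 * 1024) + " MiB of direct memory worst case");
    }
}
```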

I think that was a related thread that Stack forwarded a while back from the asynchbase mailing lists.

Jinchao, could you add a text version (not a png image, please :-) ) of this to the jira?

-- Lars

Second, you may wish to experiment with "-XX:+UseParallelGC -XX:+UseParallelOldGC" rather than CMS GC. I have been trying this recently on some of my app servers and hadoop servers, and it certainly does fix the problem of non-Java heap growth. The concern with parallel GC is that full GCs (which are the solution to the non-heap memory problem, it would seem) take too long. Personally, I consider this reasoning fallacious, since full GC is bound to occur sooner or later, and when using the CMS GC with this bug in effect, they can be fatal (and even without this bug, CMS uses a single thread for a full GC AFAIK). The numbers for parallel GC on a 2G heap are not terrible, even without tuning, even with old processors (max pause 2.8 sec, avg pause 1 sec for a full GC, with minor collections outnumbering the major at least 3:1, total overhead 1.3%). If your application can tolerate a second or two of latency once in a while, you can switch to parallelOldGC and call it a day.
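As a concrete starting point for the experiment described above (the heap size and flag set are illustrative assumptions, not tuned values):

```shell
# Illustrative JVM options for trying the throughput collector instead of
# CMS: full GCs become stop-the-world but run multi-threaded, and they do
# reclaim direct memory.
export HBASE_OPTS="$HBASE_OPTS -Xmx2g -XX:+UseParallelGC -XX:+UseParallelOldGC"
```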

The fact that some installations are trying to deal with ~24GB heaps sounds like a design issue to me; HBase and Hadoop are already designed to scale horizontally, and this emphasis on scaling vertically just because the hardware comes in a certain size sounds misguided. But not having that hardware, I might be missing something.

Finally, you might look at changing the vm.swappiness parameter in the Linux kernel (I think it's in sysctl.conf). I have set swappiness to 0 for my servers, and I'm happy with it. I don't know the exact mechanism, but it certainly appears that there's a memory pressure feedback of some sort going on between the kernel and the JVM. Perhaps it has to do with the total commit charge appearing lower (just physical instead of physical + swap) when swappiness is low. I'd love to hear from someone with a deep understanding of OS memory allocation about this.
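A minimal example of the swappiness change described above (run as root; the sysctl.conf path is per the note above):

```shell
# Set swappiness to 0 at runtime, then persist it across reboots.
sysctl -w vm.swappiness=0
echo "vm.swappiness = 0" >> /etc/sysctl.conf
```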

Hope this helps,
Sandy
