We are running a stress test on our 5-node cluster and are getting the expected mean latency of 10 ms. However, around 20 out of 25 million reads have a latency of more than 4 seconds. Can anyone provide insight into what we can do to meet a sub-second SLA for each and every read?
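A per-read sub-second SLA is a statement about the extreme tail, not the mean. 20 slow reads out of 25 million is 8 in 10 million (8e-7), so about 99.99992% of reads are already within the limit. The sketch below shows one way a stress test could report this: count reads over the SLA and compute nearest-rank percentiles from the recorded latencies. The class and method names are hypothetical and not part of HBase.

```java
import java.util.Arrays;

/** Hypothetical helper for summarizing stress-test latencies (not part of HBase). */
final class LatencyReport {

    private LatencyReport() {}

    /** Number of samples strictly above the SLA threshold. */
    static long countAbove(long[] latenciesMs, long slaMs) {
        return Arrays.stream(latenciesMs).filter(l -> l > slaMs).count();
    }

    /** Nearest-rank percentile for p in (0, 100]. */
    static long percentile(long[] latenciesMs, double p) {
        if (latenciesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }

    /** Fraction of requests that met the SLA. */
    static double fractionWithinSla(long total, long violations) {
        return (double) (total - violations) / total;
    }

    public static void main(String[] args) {
        // Figures from the thread: about 20 of 25 million reads took longer than 4 seconds.
        double within = fractionWithinSla(25_000_000L, 20L);
        System.out.printf("reads within SLA: %.5f%%%n", within * 100);
    }
}
```

Reporting the maximum and the count of SLA violations, rather than only the mean, is what makes a handful of multi-second outliers visible in the first place.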

1. A 4 sec max latency is not that bad, taking into account the 12GB heap. It can be much larger. What is your SLA?
2. Block evictions are the result of a poor cache hit rate and the root cause of periodic stop-the-world GC pauses (the max latencies you have been observing in the test).
3. The block cache consists of 3 parts (25% young generation, 50% tenured, 25% permanent; in LruBlockCache terms, the single-access, multi-access and in-memory priorities). The permanent part is for CFs with IN_MEMORY = true (you can specify this when you create the CF). A block is first stored in the 'young gen' space, then gets promoted to the 'tenured gen' space on a repeat access (or gets evicted). Maybe your 'perm gen' space is underutilized? That is exactly 25% of 4GB (1GB). Although HBase LruBlockCache should use all the space allocated for the block cache, there is no guarantee (as usual). If you don't have in_memory column families you may decrease
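The 25% / 50% / 25% split in point 3 is easy to work out for a given heap. The sketch below is a hypothetical helper, not an HBase API: it computes the block cache size from the heap size and the cache fraction (the value set by `hfile.block.cache.size`) and then the three priority buckets. Note that 30% of a 12GB heap works out to 3.6GB, so the "4GB" in the thread is presumably rounded.

```java
/**
 * Hypothetical sketch (not an HBase API): split a block cache budget into the
 * 25% / 50% / 25% priority shares described above.
 */
final class BlockCacheBuckets {
    static final double SINGLE_SHARE = 0.25;    // first-time blocks ("young gen")
    static final double MULTI_SHARE = 0.50;     // blocks accessed more than once ("tenured")
    static final double IN_MEMORY_SHARE = 0.25; // blocks from IN_MEMORY column families ("perm")

    private BlockCacheBuckets() {}

    /** Block cache size for a heap size and a cache fraction such as hfile.block.cache.size. */
    static double cacheSizeGb(double heapGb, double cacheFraction) {
        return heapGb * cacheFraction;
    }

    /** Returns {single, multi, inMemory} bucket sizes in GB. */
    static double[] bucketSizesGb(double cacheGb) {
        return new double[] {cacheGb * SINGLE_SHARE, cacheGb * MULTI_SHARE, cacheGb * IN_MEMORY_SHARE};
    }

    public static void main(String[] args) {
        double cacheGb = cacheSizeGb(12.0, 0.30); // the thread's 12GB heap with a 30% block cache
        double[] b = bucketSizesGb(cacheGb);
        System.out.printf("cache %.1f GB: single %.2f, multi %.2f, in-memory %.2f GB%n",
            cacheGb, b[0], b[1], b[2]);
    }
}
```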

We are running a stress test on our 5-node cluster and are getting the expected mean latency of 10 ms. However, around 20 out of 25 million reads have a latency of more than 4 seconds. Can anyone provide insight into what we can do to meet a sub-second SLA for each and every read?

2. We have a 4GB block cache set up (30% of the 12GB heap). 3GB of the block cache got filled up, but around 1GB remained free. There are a large number of cache evictions.

Questions for the experts:

1. If 1GB of the block cache is still free, why is HBase evicting blocks from the cache?

4. We see memory usage go up to 10GB three times before dropping sharply to 5GB.

Any help is highly appreciated,

Thanks,
Saurabh.

Confidentiality Notice: The information contained in this message, including any attachments hereto, may be confidential and is intended to be read only by the individual or entity to whom this message is addressed. If the reader of this message is not the intended recipient or an agent or designee of the intended recipient, please note that any review, use, disclosure or distribution of this message or its attachments, in any form, is strictly prohibited. If you have received this message in error, please immediately notify the sender and/or [EMAIL PROTECTED] and delete or destroy any copy of this message and its attachments.
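Regarding question 1: an LRU block cache of this kind does not wait until it is completely full. It starts evicting when usage crosses an "acceptable" watermark and frees space until usage drops to a lower "minimum" watermark, so part of the capacity always looks free. The sketch below models only that behaviour, in aggregate bytes. The 85% and 75% watermarks are assumptions based on our reading of older LruBlockCache sources (later releases changed them), so check the version you run. Under those assumptions, a cache hovering around 3GB of a roughly 4GB budget would be expected, which may also contribute to the 1GB that appears free.

```java
/**
 * Hypothetical model of watermark-based eviction, tracked in aggregate bytes.
 * Not the HBase implementation: the real cache evicts whole blocks, by priority,
 * from a background thread. Both factors below are assumptions.
 */
final class EvictionWatermarks {
    static final double ACCEPTABLE_FACTOR = 0.85; // assumed: eviction starts above this fill level
    static final double MIN_FACTOR = 0.75;        // assumed: eviction frees space down to this level

    private final long capacityBytes;
    private long usedBytes;
    private long evictedBytes;

    EvictionWatermarks(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    long acceptableBytes() {
        return (long) (capacityBytes * ACCEPTABLE_FACTOR);
    }

    long minBytes() {
        return (long) (capacityBytes * MIN_FACTOR);
    }

    /** Adds a block; once usage passes the acceptable mark, frees space down to the min mark. */
    void cacheBlock(long blockBytes) {
        usedBytes += blockBytes;
        if (usedBytes > acceptableBytes()) {
            long toFree = usedBytes - minBytes();
            evictedBytes += toFree;
            usedBytes -= toFree;
        }
    }

    long usedBytes() {
        return usedBytes;
    }

    long evictedBytes() {
        return evictedBytes;
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        EvictionWatermarks cache = new EvictionWatermarks(4 * gb);
        // Stream 64KB blocks (the default HFile block size) through a 4GB cache.
        for (int i = 0; i < 200_000; i++) {
            cache.cacheBlock(64 * 1024);
        }
        System.out.printf("in use: %.2f GB of 4 GB, evicted so far: %.2f GB%n",
            (double) cache.usedBytes() / gb, (double) cache.evictedBytes() / gb);
    }
}
```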

1. Our SLA is less than one second; we cannot afford any latency above 1 second.

We can increase the heap size if that helps; we have enough memory on the servers. What would be the optimal heap size?

2. The cache hit ratio is 95%. One thing I don't understand: we have allocated only 4GB of the 12GB heap to the block cache, which leaves 8GB for the rest of the JVM. There are no writes, and the MemStore is empty. Is 8GB not enough for HBase to process the requests? What are the most memory-consuming objects in a region server?

3. We will change the CF to IN_MEMORY and report back on the performance difference.

> 1. 4 sec max latency is not that bad taking into account 12GB heap. It can be much larger. What is your SLA?
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: [EMAIL PROTECTED]
>
> From: Saurabh Yahoo [[EMAIL PROTECTED]]
> Sent: Wednesday, August 28, 2013 5:10 AM
> Subject: experiencing high latency for few reads in HBase
>
> We observe the following things -
>
> 1. Reads are evenly distributed among 5 nodes. CPUs remain under 5% utilized.

A 1s SLA is tough in HBase (or any large-memory JVM application).

It might be achievable if you presplit your table and play with JDK7 and the G1 collector, but nobody here will vouch for such an SLA at the 99th percentile.

I heard some folks have experimented with 30GB heaps and G1 and have reported max GC times of 200ms, but I have not verified that.
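Presplitting creates the regions up front instead of relying on splits, which also spreads the key range across servers from the start. As an illustration only, the sketch below computes evenly spaced split keys for row keys that begin with a fixed-width hex prefix, such as a hash of the natural key; that key layout is an assumption. It follows the same idea as `RegionSplitter.HexStringSplit` in HBase but is not that class. The resulting byte arrays could be passed as split keys to `HBaseAdmin.createTable(HTableDescriptor, byte[][])`.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: evenly spaced split keys for row keys with a fixed-width hex prefix. */
final class HexSplits {

    private HexSplits() {}

    /**
     * Returns numRegions - 1 split keys dividing the prefix space [0, 16^width) evenly.
     * Assumes every row key starts with a zero-padded, lowercase hex string of `width` characters.
     */
    static byte[][] splitKeys(int numRegions, int width) {
        if (width < 1 || width > 8) {
            throw new IllegalArgumentException("width must be between 1 and 8");
        }
        long keySpace = 1L << (4 * width); // 16^width
        if (numRegions < 2 || numRegions > keySpace) {
            throw new IllegalArgumentException("numRegions must be between 2 and 16^width");
        }
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            long boundary = keySpace * i / numRegions; // multiply first to keep the spacing even
            String hex = String.format("%0" + width + "x", boundary);
            splits[i - 1] = hex.getBytes(StandardCharsets.US_ASCII);
        }
        return splits;
    }

    public static void main(String[] args) {
        for (byte[] key : splitKeys(5, 8)) {
            System.out.println(new String(key, StandardCharsets.US_ASCII));
        }
    }
}
```

Even split points only balance load if the keys really are spread uniformly over the prefix, which is why the sketch assumes a hashed prefix.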

What version have you deployed?

On 08/29/2013 01:29 AM, lars hofhansl wrote:
> A 1s SLA is tough in HBase (or any large memory JVM application).
>
> -- Lars
>
> ----- Original Message -----
> From: Saurabh Yahoo <[EMAIL PROTECTED]>
> Sent: Wednesday, August 28, 2013 3:17 PM
> Subject: Re: experiencing high latency for few reads in HBase
>
> Hi Vlad,
>
> Thanks for your response.
>
> On Aug 28, 2013, at 3:15 PM, Vladimir Rodionov <[EMAIL PROTECTED]> wrote:
>> 1. 4 sec max latency is not that bad taking into account 12GB heap. It can be much larger. What is your SLA?

> The 0.94.11 release includes an optimization for MultiGets: https://issues.apache.org/jira/browse/HBASE-9087
>
> What version have you deployed?

I just moved from 0.94.10 to 0.94.11. Tremendous improvement in our app's query response: it went down from 1.7 sec to 1.3 sec. Concurrent tests are also good, but latency still degrades exponentially, reaching 10 secs with 8 concurrent clients. There might be a bug lurking in there somewhere that is probably affecting us.
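To see how latency scales with client count in a controlled way, a small harness that runs the same operation from 1, 2, 4 and 8 threads can help separate client-side effects from server-side ones. The sketch below is hypothetical: `contendedOp` is a stand-in that serializes all callers on one shared lock, to show the shape of the curve a single hot lock produces; in a real test it would be replaced by a call through the HBase client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical load harness (not an HBase tool): latency of an operation under N concurrent clients. */
final class ConcurrencySweep {

    private ConcurrencySweep() {}

    /** Runs callsPerClient invocations of op on each of `clients` threads; returns all latencies in ns. */
    static long[] run(Runnable op, int clients, int callsPerClient) {
        ExecutorService pool = Executors.newFixedThreadPool(clients);
        try {
            List<Future<long[]>> futures = new ArrayList<>();
            for (int c = 0; c < clients; c++) {
                futures.add(pool.submit(() -> {
                    long[] lat = new long[callsPerClient];
                    for (int i = 0; i < callsPerClient; i++) {
                        long start = System.nanoTime();
                        op.run();
                        lat[i] = System.nanoTime() - start;
                    }
                    return lat;
                }));
            }
            long[] all = new long[clients * callsPerClient];
            int pos = 0;
            for (Future<long[]> f : futures) {
                long[] lat = f.get();
                System.arraycopy(lat, 0, all, pos, lat.length);
                pos += lat.length;
            }
            return all;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for clients", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a client failed", e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }

    static long max(long[] values) {
        long m = Long.MIN_VALUE;
        for (long v : values) {
            m = Math.max(m, v);
        }
        return m;
    }

    // Hypothetical stand-in for a code path serialized by one shared lock.
    private static final Object LOCK = new Object();

    static void contendedOp() {
        synchronized (LOCK) {
            long until = System.nanoTime() + 100_000; // hold the lock for about 0.1 ms
            while (System.nanoTime() < until) {
                // busy-wait while holding the lock
            }
        }
    }

    public static void main(String[] args) {
        for (int clients : new int[] {1, 2, 4, 8}) {
            long[] lat = run(ConcurrencySweep::contendedOp, clients, 200);
            System.out.printf("%d clients: max %.2f ms%n", clients, max(lat) / 1e6);
        }
    }
}
```

With one shared lock, each call can queue behind every other client, so the maximum latency tends to grow with the number of clients; comparing a real HBase run against such a baseline can hint at whether a single hot lock is involved.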

Are your GC logs enabled? Can you see any long pauses in them?

On Fri, Aug 30, 2013 at 4:45 AM, Kiru Pakkirisamy <[EMAIL PROTECTED]> wrote:

> I just moved from 0.94.10 to 0.94.11. Tremendous improvement in our app's
> query response.
>
> Regards,
> - kiru
>
> From: Federico Gaule <[EMAIL PROTECTED]>
> Sent: Thursday, August 29, 2013 5:37 AM
> Subject: Re: experiencing high latency for few reads in HBase
>
> In 0.94.11 Release, has been included an optimization for MultiGets:
> https://issues.apache.org/jira/browse/HBASE-9087
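Adrien's question can be answered from the GC log itself. On the HotSpot JVMs of that era, running the RegionServer with `-Xloggc:<file>`, `-XX:+PrintGCDetails` and `-XX:+PrintGCApplicationStoppedTime` writes a line for every stop-the-world pause, including non-GC safepoints. The sketch below scans such a log and lists pauses longer than a threshold. The class is hypothetical and the line format is assumed from JDK 6/7 output, so adjust the pattern to what your logs actually contain.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: report long stop-the-world pauses from a HotSpot GC log. */
final class StoppedTimeScanner {

    // Assumed line format from -XX:+PrintGCApplicationStoppedTime on JDK 6/7; other JVMs may differ.
    private static final Pattern STOPPED = Pattern.compile(
        "Total time for which application threads were stopped: (\\d+\\.\\d+) seconds");

    private StoppedTimeScanner() {}

    /** Returns every pause, in seconds, that is longer than thresholdSeconds. */
    static List<Double> longPauses(Iterable<String> logLines, double thresholdSeconds) {
        List<Double> pauses = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = STOPPED.matcher(line);
            if (m.find()) {
                double seconds = Double.parseDouble(m.group(1));
                if (seconds > thresholdSeconds) {
                    pauses.add(seconds);
                }
            }
        }
        return pauses;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java StoppedTimeScanner <gc.log>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        for (double p : longPauses(lines, 1.0)) { // 1.0 s matches the SLA discussed in the thread
            System.out.printf("stopped for %.3f s%n", p);
        }
    }
}
```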

I did check GC earlier. Not much: about 0.0x seconds. I think there is a lock contention issue. 0.94.11 is also more stable than 0.94.10: with 64 concurrent clients on 0.94.10, one region server or another would crash.

> Are your GC logs enabled? Can you see any long pause in it?
>
> Adrien Mogenet
> http://www.borntosegfault.com
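A common way to test the lock-contention theory is to take several thread dumps of a RegionServer with `jstack` while latency is high and look for many handler threads BLOCKED on the same monitor. The sketch below does the equivalent inside the current JVM with the standard `ThreadMXBean` API, grouping BLOCKED threads by the lock they wait on. It is a hypothetical helper; inspecting a remote RegionServer would need `jstack` or a JMX connection instead. Threads parked on `java.util.concurrent` locks show up as WAITING rather than BLOCKED, so this sketch only covers `synchronized` monitors.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: count BLOCKED threads per monitor in the current JVM. */
final class BlockedThreads {

    private BlockedThreads() {}

    /** Maps each contended lock (as reported by ThreadInfo.getLockName) to its number of BLOCKED waiters. */
    static Map<String, Integer> blockedByLock() {
        Map<String, Integer> counts = new TreeMap<>();
        // dumpAllThreads(lockedMonitors, lockedSynchronizers): ownership details are not needed here.
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED && info.getLockName() != null) {
                counts.merge(info.getLockName(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws InterruptedException {
        Object shared = new Object();
        Thread[] waiters = new Thread[3];
        synchronized (shared) {
            for (int i = 0; i < waiters.length; i++) {
                waiters[i] = new Thread(() -> {
                    synchronized (shared) {
                        // acquire and release immediately
                    }
                });
                waiters[i].start();
            }
            for (Thread t : waiters) {
                while (t.getState() != Thread.State.BLOCKED) {
                    Thread.yield(); // wait until t is blocked on `shared`
                }
            }
            System.out.println(blockedByLock());
        }
        for (Thread t : waiters) {
            t.join();
        }
    }
}
```

A single dump proves little; the same lock name showing up with many waiters across consecutive dumps would be much stronger evidence of contention.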

> Right, 4 sec is good.
> @Saurabh - so your read is getting 20 out of 25 million rows? Is this a Get or a Scan?
> BTW, in this stress test how many concurrent clients do you have?
>
> Regards,
> - kiru
>
> ________________________________
> From: Vladimir Rodionov <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Sent: Wednesday, August 28, 2013 12:15 PM
> Subject: RE: experiencing high latency for few reads in HBase
>
> 1. A 4 sec max latency is not that bad, taking into account the 12GB heap. It can be much larger. What is your SLA?
> 2. Block evictions are the result of a poor cache hit rate and the root cause of periodic stop-the-world GC pauses (the max latencies you have been observing in the test).
> 3. The block cache consists of 3 parts (25% young generation, 50% tenured, 25% permanent). The permanent part is for CFs with IN_MEMORY = true (you can specify this when you create the CF). A block is first stored in the 'young gen' space, then gets promoted to the 'tenured gen' space (or gets evicted). Maybe your 'perm gen' space is underutilized? That is exactly 25% of 4GB (1GB). Although HBase LruBlockCache should use all the space allocated for the block cache, there is no guarantee (as usual). If you don't have in_memory column families you may decrease
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: [EMAIL PROTECTED]
>
> ________________________________________
> From: Saurabh Yahoo [[EMAIL PROTECTED]]
> Sent: Wednesday, August 28, 2013 5:10 AM
> To: [EMAIL PROTECTED]
> Subject: experiencing high latency for few reads in HBase
>
> Hi,
>
> We are running a stress test in our 5-node cluster and we are getting the expected mean latency of 10ms. But we are seeing around 20 reads out of 25 million with latency of more than 4 seconds. Can anyone provide insight into what we can do to meet a sub-second SLA for each and every read?
>
> We observe the following things:
>
> 1. Reads are evenly distributed among the 5 nodes. CPUs remain under 5% utilized.
>
> 2. We have a 4gb block cache (30% of the 12gb heap) set up. 3gb of the block cache got filled up but around 1gb remained free. There are a large number of cache evictions.
>
> Questions to experts:
>
> 1. If there is still 1gb of free block cache available, why is HBase evicting blocks from the cache?
>
> 4. We are seeing memory go up to 10gb three times before dropping sharply to 5gb.
>
> Any help is highly appreciated,
>
> Thanks,
> Saurabh.
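Point 3 in Vladimir's reply is simple arithmetic on the cache size. A minimal sketch of it, assuming the 25% / 50% / 25% split he quotes and a 4 GB cache with no IN_MEMORY column families; the names below are illustrative, not HBase configuration keys:

```python
GB = 1024 ** 3

# Priority shares as described in the reply: single-access ("young gen"),
# multi-access ("tenured gen") and in-memory ("perm gen") blocks.
SPLIT = {"single": 0.25, "multi": 0.50, "memory": 0.25}


def bucket_sizes(cache_bytes, split=SPLIT):
    """Nominal capacity of each priority bucket, in bytes."""
    return {name: int(cache_bytes * share) for name, share in split.items()}


def usable_without_in_memory(cache_bytes, split=SPLIT):
    """Capacity that ordinary (non-IN_MEMORY) column families can occupy
    under this simplified model: only the single and multi buckets."""
    sizes = bucket_sizes(cache_bytes, split)
    return sizes["single"] + sizes["multi"]


if __name__ == "__main__":
    cache = 4 * GB  # the 4gb block cache from the original post
    for name, size in bucket_sizes(cache).items():
        print(f"{name:>6}: {size / GB:.2f} GB")
    print(f"usable without IN_MEMORY CFs: {usable_without_in_memory(cache) / GB:.2f} GB")
```

With a 4 GB cache this gives 1 GB + 2 GB for ordinary blocks and leaves the 1 GB in-memory share idle, which matches the 3gb used / 1gb free that Saurabh reported. The real LruBlockCache uses these shares as soft targets when choosing what to evict, so treat this as an illustration of the reasoning in point 3 rather than a model of the implementation.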

Saurabh, we are able to handle 600K rows x columns in 400 msec. We have restructured what was a 40-million-row table as 400K rows and columns. We Get about 100 of the rows from these 400K, do quite a bit of calculation in the coprocessor (almost a group-by/order-by) and return within this time. Maybe you should consider replacing the MultiGets with a Scan with a Filter. I like the FuzzyRowFilter, even though you might need to match with an exact key. It works only with fixed-length keys. (I do have an issue right now: it is not scaling to multiple clients.)
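Kiru's remark that FuzzyRowFilter works only with fixed-length keys follows from how fuzzy matching works: a template row key is paired with a mask that marks each byte position as fixed or free. The following pure-Python sketch shows that matching rule; it is not HBase code, it assumes the convention mask byte 0 = must match and 1 = any value (check the FuzzyRowFilter javadoc for your HBase version), and the key layout is hypothetical:

```python
def fuzzy_match(row_key: bytes, template: bytes, mask: bytes) -> bool:
    """Return True if row_key agrees with template at every fixed position.

    mask[i] == 0: row_key[i] must equal template[i] (fixed byte)
    mask[i] == 1: any value is accepted at position i (free byte)
    Positions are absolute byte offsets, so they only line up with the same
    logical fields when every key uses the same fixed-length layout.
    """
    if len(template) != len(mask):
        raise ValueError("template and mask must have the same length")
    if len(row_key) < len(template):
        return False
    # zip stops at the template length, so only that prefix of the key is checked.
    return all(m == 1 or r == t for r, t, m in zip(row_key, template, mask))


# Hypothetical key layout: 4-byte customer id, "_", 8-byte date (yyyymmdd).
TEMPLATE = b"????_20130828"
MASK = bytes([1] * 4 + [0] * 9)

if __name__ == "__main__":
    print(fuzzy_match(b"C042_20130828", TEMPLATE, MASK))   # True
    print(fuzzy_match(b"C042_20130829", TEMPLATE, MASK))   # False: date differs
    print(fuzzy_match(b"C0042_20130828", TEMPLATE, MASK))  # False: 5-byte id shifts the date
```

The last call shows the fixed-length requirement: a 5-byte customer id moves the date one byte to the right, so the fixed positions no longer line up and the row is rejected.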

> ________________________________
> From: Saurabh Yahoo <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Cc: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Sent: Wednesday, August 28, 2013 3:20 PM
> Subject: Re: experiencing high latency for few reads in HBase
>
> Thanks Kiru. We need less than 1 sec latency.
>
> We are using both multiGet and get.
>
> We have three concurrent clients running 10 threads each (that makes a total of 30 concurrent clients).
>
> Thanks,
> Saurabh.

Increasing the Java heap size will actually make latency worse. You can't guarantee a 1 sec max latency if you run a Java app (unless your heap size is much less than 1GB). I have never heard of a strict maximum latency limit. Usually, it's the 99%, 99.9% or 99.99% query percentiles.
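Stating the SLA as a percentile, as suggested above, can be checked against the numbers in the original post. A minimal sketch using the nearest-rank percentile definition (one of several in common use) on a latency histogram; the histogram is synthetic, assuming every read other than the 20 slow ones took 10 ms:

```python
import math
from fractions import Fraction


def percentile_from_histogram(hist, pct):
    """Nearest-rank percentile from a {latency_ms: count} histogram.

    pct should be a string or Fraction such as "99.99" so the rank is
    computed exactly, without float rounding.
    """
    total = sum(hist.values())
    rank = max(math.ceil(Fraction(pct) / 100 * total), 1)
    seen = 0
    for latency in sorted(hist):
        seen += hist[latency]
        if seen >= rank:
            return latency
    raise ValueError("empty histogram")


if __name__ == "__main__":
    hist = {10: 24_999_980, 4000: 20}  # synthetic: 20 slow reads out of 25 million
    for pct in ("99", "99.99", "99.9999", "99.99999", "100"):
        print(pct, percentile_from_histogram(hist, pct))
```

With 20 slow reads out of 25 million, even the 99.9999th percentile stays at 10 ms; only the 99.99999th percentile and the maximum see the 4-second reads.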

You can greatly reduce your 99.xxx% percentile latency by storing your data in 2 replicas on two different region servers. Issue two read operations to those two region servers in parallel and take the first response. Probability theory states that the probability of two independent events (slow requests) both occurring is the product of the individual events' probabilities.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: [EMAIL PROTECTED]
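The probability argument above, worked through with the numbers from the original post. A minimal sketch, assuming the two replicas' slow events are independent (a correlated cause, such as both servers pausing for GC at the same moment, would break that assumption); the latency model in the simulation is synthetic and deliberately exaggerates the slow-read rate so the effect is visible in a small run:

```python
import random


def both_slow_probability(slow_reads: int, total_reads: int) -> float:
    """P(two independent parallel reads are both slow) = p * p."""
    p = slow_reads / total_reads
    return p * p


def synthetic_latency_ms(rng: random.Random, p_slow: float) -> int:
    """Hypothetical latency model: 10 ms normally, 4000 ms with probability p_slow."""
    return 4000 if rng.random() < p_slow else 10


def count_slow(trials: int, p_slow: float, hedged: bool, seed: int = 42) -> int:
    """Count requests slower than 1 s, optionally reading from two replicas."""
    rng = random.Random(seed)
    slow = 0
    for _ in range(trials):
        latency = synthetic_latency_ms(rng, p_slow)
        if hedged:
            # The second replica answers independently; keep whichever responds first.
            latency = min(latency, synthetic_latency_ms(rng, p_slow))
        if latency > 1000:
            slow += 1
    return slow


if __name__ == "__main__":
    print(both_slow_probability(20, 25_000_000))  # about 6.4e-13
    # p_slow is exaggerated to 1% so the difference shows up in 100,000 trials.
    print(count_slow(100_000, 0.01, hedged=False))
    print(count_slow(100_000, 0.01, hedged=True))
```

With p = 20 / 25,000,000 = 8e-7, the estimate for both replicas being slow is 6.4e-13, or about 1.6e-5 expected double-slow reads per 25 million. The price is roughly twice the read traffic, since every request is issued twice.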

> ________________________________________
> From: Saurabh Yahoo [[EMAIL PROTECTED]]
> Sent: Wednesday, August 28, 2013 4:18 PM
> To: [EMAIL PROTECTED]
> Subject: Re: experiencing high latency for few reads in HBase
>
> Thanks Kiru,
>
> Scan is not an option for our use cases. Our reads are pretty random.
>
> Any other suggestions to bring down the latency?
>
> Thanks,
> Saurabh.

Saurabh,
I have a suspicion that the few high-latency responses are happening because of a "hot" region (or regions). I vaguely remember you mentioning that the data is evenly distributed across all regions. I hope your test also goes across them evenly. You may want to check the read requests to the regions.
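One way to act on this suggestion is to compare each region's read request count against the typical region. A minimal sketch, assuming you have already collected a mapping of region name to read request count (for example from the region server web UI or region metrics); the region names, counts and the 2x threshold are hypothetical, and the median is used so a single hot region does not inflate the baseline:

```python
from statistics import median


def hot_regions(read_requests, factor=2.0):
    """Regions whose read count exceeds `factor` x the median, hottest first.

    read_requests: {region_name: read_request_count}
    """
    if not read_requests:
        return []
    baseline = median(read_requests.values())
    hot = [(name, count) for name, count in read_requests.items() if count > factor * baseline]
    hot.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in hot]


if __name__ == "__main__":
    # Hypothetical counts collected from the cluster.
    sample = {"region-a": 1_000, "region-b": 1_100, "region-c": 950, "region-d": 9_000}
    print(hot_regions(sample))  # ['region-d']
```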

Thanks Kiru. Yes, our reads are going across the region servers evenly. I did not see any issue with that.

On Aug 29, 2013, at 11:59 AM, Kiru Pakkirisamy <[EMAIL PROTECTED]> wrote:

> Saurabh,
> I have a suspicion that the few high-latency responses are happening because of a "hot" region (or regions).
> I vaguely remember you mentioning that the data is evenly distributed across all regions.
> I hope your test also goes across them evenly. You may want to check the read requests to the regions.
>
> Regards,
> - kiru
>
> ________________________________
> From: Saurabh Yahoo <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Cc: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Sent: Thursday, August 29, 2013 2:49 AM
> Subject: Re: experiencing high latency for few reads in HBase
>
> Hi Vlad,
>
> We do have a strict latency requirement, as it is financial data requiring direct access from clients.
>
> Are you saying that it is not possible to achieve sub-second latency using HBase (because it is based on Java)?

> Yes. HBase won't guarantee strict sub-second latency.
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: [EMAIL PROTECTED]

If you lost a RS or otherwise moved a region around (the HBase balancer?), then it would throw off the locality. You'll want to compact any time a region moves for any reason (unless it moved to another RS with which it had locality). Try major compacting again now and see if the locality index goes up to 100.

On Thu, Aug 29, 2013 at 1:52 PM, Saurabh Yahoo <[EMAIL PROTECTED]> wrote:
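The locality index mentioned here is, roughly, the share of a region's HDFS block bytes that have a replica on the region server hosting it. A minimal sketch of that ratio, assuming you know each block's size and replica hosts; it illustrates the idea rather than reproducing HBase's own metric code, and the host names and sizes are hypothetical:

```python
MB = 1024 ** 2


def locality_index(blocks, server_host):
    """Percentage (0-100) of a region's block bytes with a replica on server_host.

    blocks: iterable of (size_in_bytes, replica_hosts) pairs for the region's store files.
    """
    total = 0
    local = 0
    for size, hosts in blocks:
        total += size
        if server_host in hosts:
            local += size
    return 100.0 * local / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical region that moved from rs1 to rs2: its files were written
    # while it lived on rs1, and only one of them happens to have a replica on rs2.
    before = [(128 * MB, {"rs1", "rs3", "rs4"}), (128 * MB, {"rs1", "rs2", "rs5"})]
    # After a major compaction on rs2, HDFS's default placement puts a replica
    # of the rewritten file on the writing node, rs2.
    after = [(256 * MB, {"rs2", "rs3", "rs5"})]
    print(locality_index(before, "rs2"))  # 50.0
    print(locality_index(after, "rs2"))   # 100.0
```

This is why a major compaction after a region move tends to bring the index back toward 100: the compaction rewrites the region's data from the new server.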

Usually, either a cluster restart or a major compaction helps improve the locality index. There is an issue in region assignment after table disable/enable in 0.94.x (x < 11) which breaks HDFS locality; it is fixed in 0.94.11.

You can write your own routine to manually "localize" a particular table using the public HBase client API.
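The thread does not say what such a routine would look like. One possible shape, sketched here as an assumption rather than Vladimir's method: for each region, find the region server that holds the largest share of that region's block bytes locally, and move the region there (for example with `HBaseAdmin.move` in 0.94-era clients) or, alternatively, major-compact it where it is. The sketch below only computes the plan and makes no HBase calls; all names and numbers are hypothetical:

```python
def local_bytes(blocks, server):
    """Bytes of the region's HDFS blocks that have a replica on `server`."""
    return sum(size for size, hosts in blocks if server in hosts)


def best_server(blocks, servers):
    """Server with the most local bytes; ties go to the alphabetically first name."""
    # max() returns the first maximal element, so sorting makes ties deterministic.
    return max(sorted(servers), key=lambda server: local_bytes(blocks, server))


def plan_moves(regions, current_server, servers):
    """Return {region: target_server} for regions not already on their best server.

    regions: {region_name: [(block_size_bytes, {replica_host, ...}), ...]}
    current_server: {region_name: hosting_server}
    """
    moves = {}
    for region, blocks in regions.items():
        target = best_server(blocks, servers)
        if target != current_server[region]:
            moves[region] = target
    return moves


if __name__ == "__main__":
    servers = ["rs1", "rs2", "rs3"]
    regions = {
        "r1": [(100, {"rs1", "rs2"}), (300, {"rs1", "rs3"})],
        "r2": [(200, {"rs2", "rs3"})],
    }
    current = {"r1": "rs2", "r2": "rs2"}
    print(plan_moves(regions, current, servers))  # {'r1': 'rs1'}
```

Moving regions only helps when some server already holds most of a region's blocks; when locality is poor everywhere, compacting in place is the simpler option.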

But the locality index should not matter if most of your data is IN_MEMORY and you run the test after a few warm-up runs to make sure the blocks are already cached (i.e. blockCacheHit is high or blockCacheMiss is low), right?
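One way to check "after a few runs the blocks are already cached" is to compute the hit ratio for the test window only. Block cache hit and miss counters are cumulative, so the ratio since server start still includes cold-start misses. A minimal sketch, assuming you sample the region server's hit and miss counts once after warm-up and once at the end of the test; `CacheSample` and the numbers are illustrative, not an HBase API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheSample:
    hits: int    # cumulative block cache hits when the sample was taken
    misses: int  # cumulative block cache misses when the sample was taken


def interval_hit_ratio(before: CacheSample, after: CacheSample) -> float:
    """Hit ratio of only the block requests served between two samples."""
    hits = after.hits - before.hits
    misses = after.misses - before.misses
    total = hits + misses
    if total <= 0:
        raise ValueError("no block cache requests between the two samples")
    return hits / total


if __name__ == "__main__":
    after_warm_up = CacheSample(hits=1_000, misses=9_000)   # illustrative numbers
    end_of_test = CacheSample(hits=991_000, misses=19_000)
    print(interval_hit_ratio(after_warm_up, end_of_test))   # 0.99
    # The cumulative ratio, which still counts cold-start misses, is lower:
    print(end_of_test.hits / (end_of_test.hits + end_of_test.misses))
```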

Usually, either a cluster restart or a major compaction helps improve the locality index. There is an issue in region assignment after table disable/enable in 0.94.x (x < 11) which breaks HDFS locality. It is fixed in 0.94.11.
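The locality index is, roughly, the share of a region server's store-file bytes that have an HDFS replica on that server's own data node; HBase computes and reports its own figure. The sketch below only illustrates that arithmetic, plus a check for the 0.94.0 to 0.94.10 range mentioned above. It assumes you have already collected the block-to-host mapping (for example from the NameNode); both function names are hypothetical.

```python
from __future__ import annotations


def locality_index(blocks: list[tuple[int, set[str]]], host: str) -> float:
    """Share of store-file bytes whose HDFS block has a replica on `host`.

    blocks: (block size in bytes, hosts holding a replica) for every block
    of the store files served by the region server running on `host`.
    """
    total = sum(size for size, _ in blocks)
    if total == 0:
        return 0.0
    local = sum(size for size, hosts in blocks if host in hosts)
    return local / total


def affected_by_assignment_bug(version: str) -> bool:
    """True for 0.94.0 through 0.94.10; the thread says 0.94.11 has the fix."""
    parts = version.split(".")
    if len(parts) < 2:
        return False
    major, minor = parts[0], parts[1]
    # Tolerate vendor suffixes such as "0.94.6-cdh4.4.0".
    patch = int(parts[2].split("-")[0]) if len(parts) > 2 else 0
    return (major, minor) == ("0", "94") and patch < 11


if __name__ == "__main__":
    blocks = [(128, {"dn1", "dn2", "dn3"}), (128, {"dn2", "dn3", "dn4"}), (64, {"dn1", "dn4", "dn5"})]
    print(locality_index(blocks, "dn1"))  # 0.6
    print(affected_by_assignment_bug("0.94.10"), affected_by_assignment_bug("0.94.11"))  # True False
```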

You can write your own routine to manually "localize" a particular table using the public HBase client API.
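The thread doesn't say what such a routine looks like. One possible approach, sketched here under stated assumptions: for each region, find the region server whose data node already holds the most bytes of that region's store files, and move the region there. Gathering `local_bytes` and issuing the moves are left out; in the 0.94 Java client the move itself could be done with something like `HBaseAdmin.move(encodedRegionName, destServerName)`. `plan_moves` is a hypothetical helper, not part of HBase.

```python
from __future__ import annotations


def plan_moves(
    local_bytes: dict[str, dict[str, int]], current: dict[str, str]
) -> dict[str, str]:
    """Suggest {region: server} moves that raise each region's locality.

    local_bytes[region][server]: bytes of the region's store files with an
    HDFS replica on that server's data node (hypothetical input).
    current[region]: server hosting the region right now.
    """
    moves: dict[str, str] = {}
    for region, per_server in local_bytes.items():
        if not per_server:
            continue
        # sorted() makes ties break deterministically by server name.
        best = max(sorted(per_server), key=lambda s: per_server[s])
        here = per_server.get(current.get(region, ""), 0)
        if per_server[best] > here:
            moves[region] = best
    return moves


if __name__ == "__main__":
    local = {"r1": {"rs1": 100, "rs2": 300}, "r2": {"rs1": 500, "rs3": 200}}
    print(plan_moves(local, {"r1": "rs1", "r2": "rs1"}))  # {'r1': 'rs2'}
```

The sketch ignores load: piling many regions onto one server can hurt latency more than better locality helps, and the balancer may move regions again later, so a real routine would also cap the number of regions per server.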


We have 10TB of data on disk; it would not fit in memory. Also, on the first read HBase needs to read from disk, and it has to go over the network to read blocks that are stored on other data nodes.
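A back-of-the-envelope check of why the block cache can't absorb these reads, using the numbers in this thread: 5 region servers, 4GB of block cache each (assuming the 4GB figure is per server), and 10TB of data (assuming that is the size before HDFS replication). Under the simplifying assumption of uniformly random reads with no hot keys, a cache holding a fraction f of the data serves roughly a fraction f of the reads; index and Bloom filter blocks also take cache space, so the real figure would be lower still.

```python
GB = 1024 ** 3
TB = 1024 ** 4


def random_read_hit_ratio(nodes: int, cache_per_node: int, data_size: int) -> float:
    """Approximate block cache hit ratio when reads are uniformly random.

    With no hot keys, a cache that holds a fraction f of the data serves
    roughly a fraction f of the reads.
    """
    return min(1.0, nodes * cache_per_node / data_size)


if __name__ == "__main__":
    ratio = random_read_hit_ratio(nodes=5, cache_per_node=4 * GB, data_size=10 * TB)
    print(f"{ratio:.4%}")  # 0.1953%
```

At about 0.2%, nearly every read has to go to disk, and over the network whenever the block is not local.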


Another point that could help you stay under the "1s SLA": enable direct byte buffers for LruBlockCache. Have a look at HBASE-4027.

On Thu, Aug 29, 2013 at 9:27 PM, Kiru Pakkirisamy <[EMAIL PROTECTED]> wrote:



Vladimir Rodionov 2013-08-28, 19:18

