If someone was contemplating running multiple HBase region servers per node, what would you tell him? I have lots of RAM on my machines and I want HBase to be able to make the most of it. Heaps larger than 12-16G have significant GC penalties. I could try to have a larger cache off-heap, but that feature is considered to be not fully baked yet (according to people in the know). The option left over is to run multiple region servers per node. What are the downsides to that?
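To make the arithmetic behind the question concrete, here is a rough sketch of how many region servers a node could host if each heap is capped at the 12-16G range mentioned above. The function name, the 128 GB node, and the 32 GB held back for the OS, DataNode, and page cache are all hypothetical values chosen for illustration, and the sketch ignores per-JVM off-heap overhead.

```python
def region_servers_per_node(total_ram_gb, reserved_gb, heap_per_rs_gb):
    """Return how many region server heaps of the given size fit in usable RAM.

    All inputs are illustrative assumptions, not HBase settings.
    """
    usable_gb = total_ram_gb - reserved_gb
    if usable_gb <= 0 or heap_per_rs_gb <= 0:
        return 0
    # Whole region servers only, so round down.
    return int(usable_gb // heap_per_rs_gb)


# Hypothetical node: 128 GB of RAM, 32 GB reserved, 16 GB heap per region server.
print(region_servers_per_node(128, 32, 16))
```

With a single region server and the same heap cap, most of that RAM would sit outside HBase's heap, which is the motivation for the question.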


Currently it's very possible to run multiple region servers per machine. People who have benchmarked it have even found that it's faster (there are more WALs being used in addition to the extra heap). It's a headache to run and administer, but if you have machines with lots of spindles it's an option. It's not the recommended way to run HBase, though, so there may be scary edges.

Yes, G1 looks promising if you have a very read-heavy workload (we've been running it for integration tests for about a month or so). But until multiple WALs are in HBase (it's going to be a while), some users might want to try other solutions.

On Mon, Jul 29, 2013 at 2:22 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> Have you seen this thread?
>
> http://search-hadoop.com/m/PhUxhdw5Mz1/otis+g1gc&subj=Re+G1+before+after+GC+time+graph
>
> On Mon, Jul 29, 2013 at 2:19 PM, Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN) <[EMAIL PROTECTED]> wrote:
>
>> If someone was contemplating running multiple HBase region servers per
>> node, what would you tell him? I've lots of RAM on my machines and I want
>> HBase to be able to make the most of it. Heaps larger than 12-16G have
>> significant gc penalties. I could try to have a larger cache off-heap, but
>> that feature is considered to be not fully baked in yet (according to
>> people in the know). The option left over is to run multiple region servers
>> per node. What are the downsides to that?
>>
>> Regards,
>> -sudarshan

G1 doesn't really make our write path much better if you have uneven region writes (zipfian distribution or the like).

Lately I've been seeing the memstore blocking size per region being a major factor. In fact I'm thinking of opening a jira to remove it by default.
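As a rough illustration of why uneven region writes matter here, the sketch below computes the share of writes each region would receive under a Zipfian distribution. The helper name, the region count, and the exponent of 1.0 are assumptions made for the example; nothing here is measured from a real cluster.

```python
def zipf_shares(num_regions, exponent=1.0):
    """Return the fraction of writes landing on each region, hottest first.

    Region k (1-based rank) gets weight 1 / k**exponent, normalized to sum to 1.
    """
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_regions + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical table with 10 regions on one region server.
shares = zipf_shares(10)
for rank, share in enumerate(shares, start=1):
    print(f"region {rank}: {share:.1%} of writes")
```

Under these assumptions the hottest region takes about a third of all writes, so its memstore fills far faster than the others and is the one most likely to hit the per-region blocking limit, regardless of which garbage collector is in use.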

Can you elaborate on the blocking being a major factor? Are you referring to the default value of 7 slowing down writes? I don't think removing that feature is a great idea. Here are a couple of things that it is helpful for:

1.) Slows down the write path so we are less likely to end up with 1000 storefiles per region server
2.) Slows down the write path enough for the end user to realize it is slow, and thus start troubleshooting/optimizing the write path.

I think it might be fair to bump to 15 - 20 by default, but this is the LAST option I touch when troubleshooting the write path. Usually when we start to have blocking issues it is caused by a few other issues:

1.) Too many regions for the allowed memstore upper/lower limit, so we are flushing too small
2.) HLogs that are too small, or too few of them, so we are rolling prematurely
3.) Ingest rate is too fast for the memstore size, which needs to be raised
4.) Drives that are too slow or too few to keep up with the compaction churn, which needs to be tuned
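The first cause in the list above can be sketched as simple arithmetic: if the global memstore budget is shared by many actively written regions, each region gets flushed well before it reaches the configured flush size. This is a deliberately rough model that assumes writes are spread evenly; the 12 GB heap, the 0.4 memstore fraction, and the 128 MB flush size are assumed values for illustration, so check them against your own configuration.

```python
def flush_size_estimate_mb(heap_gb, memstore_fraction, active_regions):
    """Estimate the memstore size (MB) each region reaches before a forced flush.

    Rough model: the global memstore budget is split evenly across regions.
    """
    global_limit_mb = heap_gb * 1024 * memstore_fraction
    return global_limit_mb / active_regions


def flushing_too_small(heap_gb, memstore_fraction, active_regions, flush_size_mb):
    """True when regions would be flushed before reaching the configured flush size."""
    estimate = flush_size_estimate_mb(heap_gb, memstore_fraction, active_regions)
    return estimate < flush_size_mb


# Assumed values: 12 GB heap, 40% of heap for memstores, 128 MB flush size.
for regions in (30, 200):
    estimate = flush_size_estimate_mb(12, 0.4, regions)
    small = flushing_too_small(12, 0.4, regions, 128)
    print(f"{regions} regions: ~{estimate:.1f} MB per flush, too small: {small}")
```

Smaller flushes mean more storefiles per store, which is what eventually trips the blocking limit discussed above.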

The defaults for HBase out of the box here are decent for testing, but not optimized for a production workload. If this is not what you were talking about, sorry for the long rant :)

On Tue, Jul 30, 2013 at 2:34 AM, Elliott Clark <[EMAIL PROTECTED]> wrote:

It's possible to walk to the middle of the Golden Gate Bridge, evade security, and take a swan dive into the Bay. If you're in NY, substitute Golden Gate with Brooklyn ....

The point is that just because something is possible doesn't mean that it's a good idea to do it.

Elliott makes this point...
> It's a head-ache to run and administer, but if you have
> machines with lots of spindles it's an option. Though it's not the
> recommended way to run HBase so there may be scary edges.

The key phrase ... "It's a head-ache to run and administer". In the real world... there are things like BOFHs, and it's best not to get on their bad side. ;-)

But there are other options... If you do have too many spindles, too much memory and too many cores.... you have another option... Virtualize your nodes. You will take a performance hit on disk and network I/O, but it gets balanced out. VMware (err, Pivotal) has made claims about improving their virtualization software so that there is less overhead. Note that YMMV.

This way you can run more RS on the existing hardware. Note that you don't run HBase alone and with the loss of memory and cores, you would end up reducing the number of slots per server.

But it's a viable solution, and it's recommended over trying to run multiple RS on a node.


