Make Thrift server thread pool bounded and add a command-line UI test

Details

Description

This started as an internal hotfix after we found that the Thrift server had spawned 15000 threads. To bound the thread pool size, I added a custom thread pool server implementation, HBaseThreadPoolServer, to the HBase codebase and made the following parameters configurable both from the command line and as config settings: minWorkerThreads, maxWorkerThreads, and maxQueuedRequests. Under increasing load, the server creates a new thread for every connection until the pool size reaches minWorkerThreads. After that, it puts new connections into the queue and only creates a new thread when the queue is full. If an attempt to create a new thread fails, the server drops the connection. The default TThreadPoolServer would crash in that case, but that never happened because its thread pool was unbounded; instead, the server would hang indefinitely, consume a lot of memory, and cause huge latency spikes on the client side.
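
The growth policy described above (fill up to minWorkerThreads, then queue, then grow to maxWorkerThreads, then drop) matches the documented behavior of java.util.concurrent.ThreadPoolExecutor with a bounded queue. The following is a minimal sketch under that assumption, not the actual HBaseThreadPoolServer code: the class name BoundedPoolSketch and its helper methods are hypothetical, and each "connection" is simulated by a task that blocks until released.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; not part of the HBase patch.
class BoundedPoolSketch {

    /** Builds a pool bounded by the three limits named in the description. */
    static ThreadPoolExecutor newPool(int minWorkerThreads, int maxWorkerThreads,
                                      int maxQueuedRequests) {
        return new ThreadPoolExecutor(
            minWorkerThreads,                 // core size: threads created eagerly
            maxWorkerThreads,                 // hard upper bound on threads
            60, TimeUnit.SECONDS,             // idle timeout for the extra threads
            new ArrayBlockingQueue<Runnable>(maxQueuedRequests));
    }

    /** Submits simulated connections that block on the latch; returns how many were dropped. */
    static int countRejected(ThreadPoolExecutor pool, int connections,
                             CountDownLatch release) {
        int rejected = 0;
        for (int i = 0; i < connections; i++) {
            try {
                pool.execute(() -> {
                    try {
                        release.await();      // hold the worker, like a long-lived connection
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                rejected++;                   // pool and queue are both full: drop it
            }
        }
        return rejected;
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = newPool(2, 4, 3);
        int rejected = countRejected(pool, 10, release);
        System.out.println("threads=" + pool.getPoolSize()
            + " queued=" + pool.getQueue().size()
            + " rejected=" + rejected);
        release.countDown();                  // let the simulated connections finish
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

With the limits 2/4/3 and ten simulated connections, this prints threads=4 queued=3 rejected=3: two threads are created immediately, three connections are queued, two more threads are created once the queue is full, and the remaining three connections are rejected.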

Another part of this fix is refactoring and unit testing the command-line part of the Thrift server. The logic there is fairly complicated, and the existing TestThriftServer class does not test that part at all. The new TestThriftServerCmdLine test starts the Thrift server on a random port with various combinations of options and talks to it through the client API from another thread.
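
The general shape of such a test (server on a port chosen at run time, client on a different thread) can be sketched without any Thrift dependency. The example below is an illustration only, not the TestThriftServerCmdLine code: the class RandomPortServerSketch, its methods, and the "ack:" reply format are hypothetical, and a plain line-based socket server stands in for the Thrift server. It binds to port 0 so that the operating system picks a free port; the real test may select its port differently.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical illustration; a plain socket server stands in for the Thrift server.
class RandomPortServerSketch {

    /** Starts a thread that serves one connection and replies "ack:" + the request line. */
    static Thread startServer(ServerSocket listener) {
        Thread t = new Thread(() -> {
            try (Socket s = listener.accept();
                 BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream()));
                 PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
                out.println("ack:" + in.readLine());
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }, "sketch-server");
        t.setDaemon(true);   // do not keep the JVM alive if the test fails early
        t.start();
        return t;
    }

    /** Binds to an OS-assigned port, runs the server thread, and sends one request. */
    static String roundTrip(String request) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 50, loopback)) {
            int port = listener.getLocalPort();   // the port the OS actually assigned
            Thread server = startServer(listener);
            try (Socket s = new Socket(loopback, port);
                 PrintWriter out = new PrintWriter(s.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream()))) {
                out.println(request);
                String reply = in.readLine();
                server.join(5000);                // wait for the server thread to finish
                return reply;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("ping"));
    }
}
```

Keeping the listening socket open from the moment the port is chosen avoids the race in which another process grabs the port between selection and server start.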

TEST PLAN
Unit tests, cluster test with a Python Thrift client.
I will post an update when I'm done with testing.

Phabricator
added a comment - 24/Nov/11 09:50 mbautin requested code review of "[jira] HBASE-4863 Make HBase Thrift server more configurable and add a command-line UI test".
Reviewers: JIRA, Kannan, tedyu, stack
REVISION DETAIL
https://reviews.facebook.net/D531
AFFECTED FILES
src/main/java/org/apache/hadoop/hbase/thrift/HBaseThreadPoolServer.java
src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
src/main/java/org/apache/hadoop/hbase/util/Threads.java
src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
src/test/java/org/apache/hadoop/hbase/thrift/TestThriftServer.java
src/test/java/org/apache/hadoop/hbase/thrift/TestThriftServerCmdLine.java
src/test/java/org/apache/hadoop/hbase/util/TestThreads.java

Ted Yu
added a comment - 24/Nov/11 16:55 - edited I got a compilation error:
testRunThriftServer[0](org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine) Time elapsed: 2.047 sec <<< ERROR!
java.lang.Error: Unresolved compilation problem:
Cannot make a static reference to the non-static method getColumnDescriptors() from the type TestThriftServer
at org.apache.hadoop.hbase.thrift.TestThriftServer.createDropTable(TestThriftServer.java:111)
Since HBaseThreadPoolServer extends TServer, I think a better name for the class would be TBoundedThreadPoolServer (TThreadPoolServer is in thrift).

Phabricator
added a comment - 24/Nov/11 17:11 tedyu has commented on the revision "[jira] HBASE-4863 Make HBase Thrift server more configurable and add a command-line UI test".
INLINE COMMENTS
src/main/java/org/apache/hadoop/hbase/thrift/HBaseThreadPoolServer.java:64 Please add javadoc for the keys.
These keys should be placed into hbase-default.xml.
src/main/java/org/apache/hadoop/hbase/thrift/HBaseThreadPoolServer.java:80 Is TIME_TO_WAIT_AFTER_SHUTDOWN_MS a better name for this constant?
REVISION DETAIL
https://reviews.facebook.net/D531

Phabricator
added a comment - 24/Nov/11 19:46 tedyu has commented on the revision "[jira] HBASE-4863 Make HBase Thrift server more configurable and add a command-line UI test".
Should similar changes in thrift/ThriftServer.java be applied to thrift2/ThriftServer.java?
INLINE COMMENTS
src/main/java/org/apache/hadoop/hbase/thrift/HBaseThreadPoolServer.java:111 Should this become a parameter that the user can adjust?
src/main/java/org/apache/hadoop/hbase/thrift/HBaseThreadPoolServer.java:263 Should ttx.getType() be logged?
src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java:179 Should read 'Exactly one '
REVISION DETAIL
https://reviews.facebook.net/D531

Ted Yu
added a comment - 25/Nov/11 01:56 In thrift2/ThriftServer.java:
} else {
server = getTThreadPoolServer(protocolFactory, processor, transportFactory, inetSocketAddress);
where
TThreadPoolServer.Args serverArgs = new TThreadPoolServer.Args(serverTransport);
It would be nice to incorporate TBoundedThreadPoolServer into the above module. This can be done in a separate JIRA.

Hadoop QA
added a comment - 25/Nov/11 23:11 -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12505163/4863.addendum
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
-1 patch. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/372//console
This message is automatically generated.