Details

Description

If HBASE-794 delivers first-class support for talking Thrift directly to the HMaster and HRS, then pure C and C++ client libraries become possible.

The C client library would wrap a Thrift core.

The C++ client library can provide a class hierarchy quite close to o.a.h.h.client and, ideally, identical semantics. It should be just a wrapper around the C API, for economy.
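A minimal sketch of that layering, assuming a hypothetical C API: the `hb_client_*` functions below are invented for illustration and backed by an in-memory stub, not a real HBase library. The C++ class only owns the handle and turns C error codes into exceptions, which is roughly what "just a wrapper around the C API" would mean.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <stdexcept>
#include <string>

// ---- Hypothetical C API, stubbed in memory (not a real HBase library) ----
struct hb_client { std::map<std::string, std::string> rows; };  // opaque to C callers

extern "C" {
hb_client* hb_client_open(const char* quorum) {
  (void)quorum;  // a real client would connect here
  return new hb_client();
}
void hb_client_close(hb_client* c) { delete c; }
int hb_client_put(hb_client* c, const char* row, const char* value) {
  assert(c != nullptr);
  c->rows[row] = value;
  return 0;
}
// Returns 0 on success, -1 if the row is missing, -2 if buf is too small.
int hb_client_get(hb_client* c, const char* row, char* buf, std::size_t buflen) {
  assert(c != nullptr);
  auto it = c->rows.find(row);
  if (it == c->rows.end()) return -1;
  if (it->second.size() + 1 > buflen) return -2;
  std::memcpy(buf, it->second.c_str(), it->second.size() + 1);
  return 0;
}
}

// ---- Thin C++ wrapper: owns the handle, maps error codes to exceptions ----
namespace hbase {
class Client {
 public:
  explicit Client(const std::string& quorum)
      : handle_(hb_client_open(quorum.c_str())) {
    if (!handle_) throw std::runtime_error("hb_client_open failed");
  }
  ~Client() { hb_client_close(handle_); }
  Client(const Client&) = delete;
  Client& operator=(const Client&) = delete;

  void put(const std::string& row, const std::string& value) {
    if (hb_client_put(handle_, row.c_str(), value.c_str()) != 0)
      throw std::runtime_error("put failed");
  }
  std::string get(const std::string& row) {
    char buf[256];
    int rc = hb_client_get(handle_, row.c_str(), buf, sizeof buf);
    if (rc == -1) throw std::out_of_range("row not found: " + row);
    if (rc != 0) throw std::runtime_error("get failed");
    return std::string(buf);
  }

 private:
  hb_client* handle_;
};
}  // namespace hbase
```

All the real work stays behind the C functions, so C callers and the C++ wrapper share one implementation.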

Inside my employer there is a lot of resistance to HBase because many dev teams have a strong C/C++ bias. The real issue, however, is client-side integration, not a fundamental objection. (What runs server side and how it is managed is a secondary consideration.)

Thrift RPC has come a long way - there's a much better server available, and I'm working on a much more compact protocol (THRIFT-110) that would keep wire size down. It might be a mature enough project for you guys to take a look.

Bryan Duxbury
added a comment - 17/Dec/08 18:25

Just by way of FYI, here is how the C++ interface to HDFS is done: "HDFS provides a C++ library called libhdfs that mirrors the Java interface. In fact, it works using the Java Native Interface (JNI) to call a Java HDFS client. Hadoop comes with pre-built libhdfs binaries for 32-bit Linux, but for other platforms you will need to build them yourself using the instructions at http://wiki.apache.org/hadoop/LibHDFS."

stack
added a comment - 19/Jan/09 18:55

After discussion at the LA Hackathon, it is resolved that a fat C based client API and a fat Java based one would be co-supported. Initially the C client would be marked experimental, until the frequency of changes to the Java client API drops to a maintenance level only.

Andrew Purtell
added a comment - 30/Jan/09 22:48

Andrew, here's another serialization package in case you hadn't seen the post from Doug:

I propose we add a new Hadoop subproject for Avro, a serialization system. My ambition is for Avro to replace both Hadoop's RPC and to be used for most Hadoop data files, e.g., by Pig, Hive, etc.
Initial committers would be Sharad Agarwal and me, both existing Hadoop committers. We are the sole authors of this software to date.
The code is currently at:
http://people.apache.org/~cutting/avro.git/
To learn more:
git clone http://people.apache.org/~cutting/avro.git/ avro
cat avro/README.txt

stack
added a comment - 03/Apr/09 07:40

I am not exactly sure I understand this issue. We had similar worries, as our shop is <bold>very</bold> C++ biased, and we went with the Thrift client. We now write solely C++ code, and tbh hitting a Thrift server local to the data is faster than falling back to the RPC mechanism anyway. Would it be enough to write an efficient C++ based Thrift server? I would love to see the Thrift API be the focus of API development, as there are still numerous features which haven't been moved out of the Java API. Anyway, just my two cents; I will totally help out with any C++ API.

Alex Newman
added a comment - 29/Apr/09 03:45

Hosting the region assignment table in ZK will simplify the implementation of a C/C++ client. We can use the ZK C API to look up region locations independent of the master so would only have to talk with regionservers. Can start then as an async RPC engine mediating client requests to region servers and not much more (low level C API). Incrementally add smarts from there (higher level C++ API).
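A minimal sketch of the client-side lookup step, assuming the region assignments have already been read (for example from ZooKeeper) into a map keyed by region start key. The `RegionLocator` class and the server names are hypothetical; the point is only that, once the assignment table is in hand, routing a row key to a regionserver is a sorted-map lookup that never involves the master.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical client-side cache of region assignments.
class RegionLocator {
 public:
  // Register a region by its start key; "" is the start key of the first region.
  void add_region(const std::string& start_key, const std::string& server) {
    regions_[start_key] = server;
  }

  // Return the server hosting the region whose key range contains `row`,
  // i.e. the region with the greatest start key that is <= row.
  const std::string& locate(const std::string& row) const {
    assert(!regions_.empty());
    auto it = regions_.upper_bound(row);  // first region starting after `row`
    assert(it != regions_.begin());       // requires a region with start key ""
    --it;                                 // step back to the containing region
    return it->second;
  }

 private:
  std::map<std::string, std::string> regions_;  // start key -> "host:port"
};
```

An async RPC engine could then be layered on top: locate the server, send the request to it, and refresh the cached map when a regionserver reports that a region has moved.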

Andrew Purtell
added a comment - 12/Nov/09 16:29

I have looked into Avro quite a bit the last weeks so I was thinking that I could probably easily provide an Avro interface alongside the Thrift interface.

What I don't quite understand is how this issue fits into all that. Thrift and Avro can be used with C/C++, but after reading this I have the feeling you mean something other than just a Thrift-like client interface. If those turn out to be separate things, I'll open a new issue and discuss it further there.

Lars Francke
added a comment - 12/Feb/10 16:25

The intent of this issue is to build a fat client in C, wrap it in C++, and have it talk directly to the master and regionservers without any gateway/connector process as intermediary. The C++ wrapper would have a similar class structure and API to o.a.h.h.client. No need for any Java except on the servers. No intermediary to be a potential bottleneck.

The notion has a reasonable argument but it's a lot of work. The rationale for taking it on has become less convincing over time as the Thrift and REST connectors have been satisfying enough for users. There was a fair amount of interest in the 0.19 days but that has waned as far as I can see.

Andrew Purtell
added a comment - 12/Feb/10 17:22

I just started using Thrift and HBase. This seems like a typical question, but I can't seem to find an answer. Our "number-crunching" code is all C++. I am planning on using Thrift to load data from HBase into the client. The problem is that we're talking about a LOT of data. In a typical scenario I spawn 100 processes and each loads about 10GB of data, so roughly 1TB in total. So I'm not sure if Thrift will be fast enough. For management tasks we're going to write everything in Java, so that is not a problem. The question then is: is it better to write custom wrappers in JNI and bypass Thrift completely, purely for performance reasons?

Also, what is the timeline for the C client? From the discussion here, my guess is that, like libhdfs, HBase will provide C++ wrappers using JNI.

If Thrift is the way to go, then we are looking at creating a tool that takes an ODBC data source and loads all the data from one table into an HBase table. Again, this will be in C++. Only if we find that the overhead of Thrift is too much will we shift to Java, but that would mean double work writing clients for both Java and C++. Anyway, we could provide this code to the community.

Abhimanyu
added a comment - 30/Sep/10 19:46

stack
added a comment - 28/Jul/12 22:48 Mikhail, I think this is worth an announcement on the user mailing list. It's great stuff. If you don't want to do it, I will (better if you do it). I added a note to the refguide and pushed it out.

I'd be delighted to resolve this issue (excellent!) but just to be sure: Do we want to hold it open as a vehicle for moving the native-cpp-hbase-client code into the HBase tree proper, or no? If the latter, let's resolve.

Andrew Purtell
added a comment - 28/Jul/12 23:19

Andrew Purtell
added a comment - 28/Jul/12 23:28 Mikhail Bautin Perhaps someone more versed in Thrift and its C++ language support in particular could say, but can we plug in Thrift's TSasl{Client,Server}Transport here for authenticated opens and optional wire encryption?

Hey Andrew. Someone here at Cloudera is working on SASL support for the Thrift C++ bindings, I believe – at least the client side – which should be compatible with the Java server. Hopefully we'll post it to THRIFT-1620 in the coming weeks.

Todd Lipcon
added a comment - 29/Jul/12 06:02

Andrew Purtell
added a comment - 17/Jan/13 17:53 - edited I don't think the thrift core makes sense anymore, considering protobuf.
I would agree. The embedded thrift servers in the regionservers were an experiment at FB that they've backed away from. THRIFT-1620 is open with no implementation available.
I think an HBase client library implemented in C is a mandatory feature for a database approaching 1.0 release.
The PB work is not finished.
The scope of building a C client is not just the transport, it's also duplicating or replacing all of the functionality of the fat Java client.
Various discussions about "native client" usually end with the notion of a Grand Unified Client Project: lighter weight async client, perhaps asynchbase itself or in the mold of it, talking PB to the cluster, with a sync API layered on top. It might be straightforward to build a C++ analogue to asynchbase with std::async (don't know enough about C++11 to say for sure). That does not provide an answer for C folks though.
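A minimal sketch of the "sync API layered on top of an async client" shape using C++11 `std::async`. `AsyncClient`, `SyncClient`, and the in-memory table are hypothetical stand-ins for a real RPC engine, so this only illustrates the layering and says nothing about the wire protocol.

```cpp
#include <cassert>
#include <future>
#include <map>
#include <mutex>
#include <string>

// Hypothetical async client: every call returns a future immediately.
class AsyncClient {
 public:
  std::future<void> put(std::string row, std::string value) {
    assert(!row.empty());
    return std::async(std::launch::async, [this, row, value]() -> void {
      std::lock_guard<std::mutex> lock(mu_);
      table_[row] = value;  // stands in for an RPC to a regionserver
    });
  }

  std::future<std::string> get(std::string row) {
    return std::async(std::launch::async, [this, row]() -> std::string {
      std::lock_guard<std::mutex> lock(mu_);
      auto it = table_.find(row);
      return it == table_.end() ? std::string() : it->second;
    });
  }

 private:
  std::mutex mu_;
  std::map<std::string, std::string> table_;
};

// Sync API layered on top: each call simply blocks on the future.
class SyncClient {
 public:
  void put(const std::string& row, const std::string& value) {
    async_.put(row, value).get();
  }
  std::string get(const std::string& row) { return async_.get(row).get(); }

 private:
  AsyncClient async_;
};
```

Note that `std::async` spawns work per call and offers no callback chaining, so a real analogue to asynchbase would likely need its own event loop; that part is outside this sketch.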

The scope of building a C client is not just the transport, it's also duplicating or replacing all of the functionality of the fat Java client.

Agreed; I mean the construction of a fully-featured client implementation available via C, not just transport. I've been out of C/C++ for a number of years and I'm entirely ignorant of C++11, so I cannot comment on implementation details. I do know that it's fairly commonplace to wrap a C++ library with C bindings, so that decision can be left up to the implementor.
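A minimal sketch of wrapping a C++ library with C bindings, assuming a hypothetical `hbase::Table` class backed by an in-memory map. The `hb_table_*` names are invented for illustration. The usual ingredients are an opaque handle type, functions with C linkage, and a catch-all so that no C++ exception crosses into C code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <string>

// Hypothetical C++ library class (in-memory stand-in).
namespace hbase {
class Table {
 public:
  void put(const std::string& row, const std::string& value) { rows_[row] = value; }
  bool get(const std::string& row, std::string* out) const {
    auto it = rows_.find(row);
    if (it == rows_.end()) return false;
    *out = it->second;
    return true;
  }

 private:
  std::map<std::string, std::string> rows_;
};
}  // namespace hbase

// C bindings: opaque handle plus free functions with C linkage.
extern "C" {
typedef struct hb_table hb_table;  // never defined; C callers only hold pointers

hb_table* hb_table_new(void) {
  try {
    return reinterpret_cast<hb_table*>(new hbase::Table());
  } catch (...) {
    return nullptr;  // allocation failure reported as NULL
  }
}

void hb_table_free(hb_table* t) { delete reinterpret_cast<hbase::Table*>(t); }

// Returns 0 on success, -1 on any C++ exception.
int hb_table_put(hb_table* t, const char* row, const char* value) {
  assert(t && row && value);
  try {
    reinterpret_cast<hbase::Table*>(t)->put(row, value);
    return 0;
  } catch (...) {
    return -1;
  }
}

// Returns 0 on success, -1 if missing, -2 if buf is too small, -3 on exception.
int hb_table_get(const hb_table* t, const char* row, char* buf, std::size_t buflen) {
  assert(t && row && buf);
  try {
    std::string value;
    if (!reinterpret_cast<const hbase::Table*>(t)->get(row, &value)) return -1;
    if (value.size() + 1 > buflen) return -2;
    std::memcpy(buf, value.c_str(), value.size() + 1);
    return 0;
  } catch (...) {
    return -3;
  }
}
}
```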

Nick Dimiduk
added a comment - 17/Jan/13 18:09

I will throw out there that libhdfs "cheats" by linking to libjvm.so and pulling in the HDFS client bytecode as engine. I presume we don't want this, but it would be a half measure that stands in for something comprehensive.

Andrew Purtell
added a comment - 17/Jan/13 18:13

Ted Dunning
added a comment - 10/Jan/14 22:09 Another way to put this is that if nobody cares enough to even put up a patch after 5 years is this issue simply moot?
Shouldn't reality be recognized? Shouldn't this be closed as WONT_FIX?

Another way to put this is that if nobody cares enough to even put up a patch after 5 years is this issue simply moot?

This issue has been superseded by the use of protobuf in RPCs instead of Thrift and by the commit of the start of a C/C++ client library; see HBASE-9977. Closing this issue in favor of something else is fine, but WONTFIX is the incorrect resolution.

Andrew Purtell
added a comment - 10/Jan/14 22:30
