Coprocessors

Folks:
This is my first post on the HBase user mailing list.
I have the following scenario:
I have an HBase table of up to a billion keys. I'm looking to support an application where, on some user action,
I need to fetch multiple columns for up to 250K keys and do some sort of aggregation on them. Fetching all
that data and doing the aggregation in my application takes about a minute.
I'm looking to co-locate the aggregation logic with the region servers to
a. Distribute the aggregation
b. Avoid having to fetch large amounts of data over the network (this could potentially be cross-datacenter)
Neither observers nor aggregation endpoints work for this use case. Observers don't return data to
the client, while aggregation endpoints work in the context of scans, not multi-gets. (Are these assumptions correct?)
I'm looking to write a service that runs alongside the region servers and acts as a proxy between my application and
the region servers.
I plan to use the logic in the HBase client's HConnectionManager to segment my request of 1M rowkeys into
sub-requests per region server. These are sent to the proxy, which fetches the data from the region
server, aggregates locally, and sends the results back. Does this sound reasonable, or even a useful thing to pursue?
Regards,
-sudarshan
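
Editor's sketch of goals (a) and (b) in the post above: each server reduces its own rows to a small partial aggregate, and only those partials cross the network. This is plain Python with no HBase API involved; the names `Partial`, `partial_aggregate`, and `merge_partials` and the sample values are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Partial:
    """Small per-server summary that is cheap to ship to the client."""
    count: int = 0
    total: float = 0.0


def partial_aggregate(values):
    """Runs next to the data: reduce many cell values to one Partial."""
    p = Partial()
    for v in values:
        p.count += 1
        p.total += v
    return p


def merge_partials(partials):
    """Runs on the client: combine per-server partials into one result."""
    merged = Partial()
    for p in partials:
        merged.count += p.count
        merged.total += p.total
    return merged


# Hypothetical cell values held by three region servers.
server_values = [[1.0, 2.0, 3.0], [10.0], [4.0, 6.0]]
partials = [partial_aggregate(vs) for vs in server_values]
result = merge_partials(partials)
average = result.total / result.count if result.count else None
```

An average is carried as (count, total) rather than as a per-server average, because averages of averages are wrong when servers hold different numbers of rows.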

Re: Coprocessors

You might want to have a look at Phoenix (https://github.com/forcedotcom/phoenix), which does that and
more, and gives a SQL/JDBC interface.
-- Lars
________________________________
From: Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN) <skadambi@...>
To: user@...
Sent: Thursday, April 25, 2013 2:44 PM
Subject: Coprocessors

Re: Coprocessors

I don't think Phoenix will solve his problem.
He also needs to explain more about his problem before we can start to think about it.
On Apr 25, 2013, at 4:54 PM, lars hofhansl <larsh@...> wrote:
> You might want to have a look at Phoenix (https://github.com/forcedotcom/phoenix), which does that and
> more, and gives a SQL/JDBC interface.
>
> -- Lars

Re: Coprocessors

Phoenix might be able to solve the problem if the keys are structured in
the binary format that it understands; otherwise you are better off reloading
that data into a table created via Phoenix. But I will let James tackle this
question.
Regarding your use case, why can't you do the aggregation using observers?
You should be able to do the aggregation and return a new Scanner to your
client.
And Lars is right about the range scans that Phoenix does. It does restrict
things and will also do parallel scans for you based on what you
select/filter.
-Viral
On Thu, Apr 25, 2013 at 3:12 PM, Michael Segel <michael_segel@...> wrote:
> I don't think Phoenix will solve his problem.
>
> He also needs to explain more about his problem before we can start to
> think about the problem.

Re: Coprocessors

Thanks, Lars. I briefly looked into Phoenix, but it appeared to do full-table scans to perform the
aggregation. The same goes for Impala. If you think otherwise, I'll look into it again.

Re: Coprocessors

> I'm looking to write a service that runs alongside the region servers and
> acts a proxy b/w my application and the region servers.
>
> I plan to use the logic in HBase client's HConnectionManager, to segment
> my request of 1M rowkeys into sub-requests per region-server. These are
> sent over to the proxy which fetches the data from the region server,
> aggregates locally and sends data back. Does this sound reasonable or even
> a useful thing to pursue?
>
>
This is essentially what coprocessor endpoints (called through
HTable.coprocessorExec()) do. (One difference is that there is a
parallel request per region, not per region server, though that is a
potential optimization that could be made as well.)
The tricky part I see for the case you describe is splitting your full set
of row keys up correctly per region. You could send the full set of row
keys to each endpoint invocation, and have the endpoint implementation
filter down to only those keys present in the current region. But that
would be a lot of overhead on the request side. You could split the row
keys into per-region sets on the client side, but I'm not sure we provide
sufficient context for the Batch.Callable instance you provide to
coprocessorExec() to determine which region it is being invoked against.
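
Editor's sketch of the client-side split mentioned above: group the requested row keys by the region whose key range contains them. It is plain Python, not the HBase client API. In a real client the sorted region start keys would come from the client's region metadata; here they are hard-coded, hypothetical values, and `split_keys_by_region` is an illustrative name.

```python
from bisect import bisect_right
from collections import defaultdict


def split_keys_by_region(row_keys, region_start_keys):
    """Group row keys by the region whose [start, next_start) range holds them.

    region_start_keys must be sorted and begin with b"" (the first region
    has an empty start key).
    """
    groups = defaultdict(list)
    for key in row_keys:
        # Index of the last region whose start key is <= key.
        idx = bisect_right(region_start_keys, key) - 1
        groups[region_start_keys[idx]].append(key)
    return dict(groups)


# Hypothetical table with three regions and four requested keys.
region_start_keys = [b"", b"O3", b"O6"]
row_keys = [b"O1-F1", b"O3-F1", b"O5-F1", b"O7-F1"]
batches = split_keys_by_region(row_keys, region_start_keys)
```

Each batch can then be sent only to the region that owns it, which avoids shipping the full key set to every endpoint invocation. Region boundaries can change between the lookup and the call (splits, moves), so a real implementation would need to handle stale boundaries.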

Re: Coprocessors

On 04/25/2013 03:35 PM, Gary Helmling wrote:
> This is essentially what coprocessor endpoints (called through
> HTable.coprocessorExec()) basically do. (One difference is that there is a
> parallel request per-region, not per-region server, though that is a
> potential optimization that could be made as well).
>
> The tricky part I see for the case you describe is splitting your full set
> of row keys up correctly per region. You could send the full set of row
> keys to each endpoint invocation, and have the endpoint implementation
> filter down to only those keys present in the current region. But that
> would be a lot of overhead on the request side. You could split the row
> keys into per-region sets on the client side, but I'm not sure we provide
> sufficient context for the Batch.Callable instance you provide to
> coprocessorExec() to determine which region it is being invoked against.
Sudarshan,
In our head branch of Phoenix (we're targeting this for a 1.2 release in
two weeks), we've implemented a skip scan filter that functions similarly
to a batched get, except:
1) it's more flexible in that it can jump not only from a single key to

Re: Coprocessors

Michael: Fair enough. Let me see what relevant information I can add to what I've already said:
1. To Lars' point, my 250K keys are unlikely to fall into fewer than 250K sub-ranges.
2. Here's a bit more about my schema:
2.1 My rowkeys are composed of 2 entities - let's call them object-id and field-type. An object (O1) has 100s of
field types (F1,F2,F3...). Each object-id - field-type pair has 100s of attributes (A1,A2,A3).
2.2 My rowkeys are O1-F1, O1-F2, O1-F3, etc.
2.3 My primary application (not the one my original post was about) accesses by these rowkeys.
2.4 My application that does aggregation is given a bunch of objects <O1, O2, O3>, a field-type <F1>, a bunch
of attributes <A1,A2> and some computation to perform.
2.5 As you can see, scans are unlikely to be useful when fetching O1-F1, O2-F1, O3-F1 etc.
Viral: How do I tackle aggregation using observers? Let's say I override the postGet method. I do a
multi-get from my client and my method gets called on each region server for each row. What is the next step
with this approach?
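
Editor's sketch of point 2.5 above: with rowkeys laid out as object-id then field-type, the keys for one field type across many objects are not adjacent in sort order. The key format (`O1-F1`), the `-` separator, and the tiny 3x3 data set are hypothetical stand-ins for the schema described in the message.

```python
def build_row_keys(object_ids, field_type, sep="-"):
    """Compose one row key per object for a single field type."""
    return [f"{obj}{sep}{field_type}" for obj in object_ids]


# Hypothetical stored keys: three objects, three field types each.
stored = sorted(f"O{o}-F{f}" for o in range(1, 4) for f in range(1, 4))
wanted = build_row_keys(["O1", "O2", "O3"], "F1")

# Positions of the wanted keys in sorted order. The gaps between them mean
# a single range scan from the first to the last wanted key would also
# read rows that were never requested.
positions = [stored.index(k) for k in wanted]
rows_in_range = positions[-1] - positions[0] + 1
```

Here three wanted rows span a range of seven stored rows. With hundreds of field types per object the ratio gets much worse, which is why a plain range scan is a poor fit for this access pattern.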

Re: Coprocessors

Thanks for the additional info, Sudarshan. This would fit well with the
implementation of Phoenix's skip scan.
    CREATE TABLE t (
        object_id INTEGER NOT NULL,
        field_type INTEGER NOT NULL,
        attrib_id INTEGER NOT NULL,
        value BIGINT
        CONSTRAINT pk PRIMARY KEY (object_id, field_type, attrib_id));

    SELECT count(value), sum(value), avg(value) FROM t
    WHERE object_id IN (?,?,?) AND field_type IN (?,?,?) AND attrib_id IN (?,?,?)
and then your client would do whatever additional computation it needed
on the results it got back.
Would that fit with what you're trying to do?
James
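
Editor's sketch of the general seek-instead-of-scan idea behind a skip scan: walk the wanted keys in sorted order and jump forward to each one rather than reading every row in between. This is a toy model in plain Python and does not show Phoenix's actual implementation; `skip_scan`, the tuple keys, and the sample data are hypothetical.

```python
from bisect import bisect_left


def skip_scan(sorted_rows, wanted_keys):
    """Return (matches, seeks) for the wanted keys found in sorted_rows.

    sorted_rows is a sorted list of (key, value) tuples standing in for
    the rows of one region.
    """
    keys = [k for k, _ in sorted_rows]
    matches, seeks, pos = [], 0, 0
    for target in sorted(wanted_keys):
        # Seek forward from the current position to the next wanted key.
        pos = bisect_left(keys, target, pos)
        seeks += 1
        if pos == len(keys):
            break  # ran off the end of the region
        if keys[pos] == target:
            matches.append(sorted_rows[pos])
            pos += 1
    return matches, seeks


# Hypothetical region: 5 objects x 3 field types, key = (object_id, field_type).
rows = [((o, f), o * 10 + f) for o in range(1, 6) for f in range(1, 4)]
wanted = [(o, 1) for o in (1, 3, 5)]
matches, seeks = skip_scan(rows, wanted)
total = sum(v for _, v in matches)
```

The toy version touches three rows with three seeks instead of reading all fifteen, and the aggregate is computed next to the data.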

Re: Coprocessors

Hi,
Let's reiterate what you've said...
You have a set of objects <O1, O2, ..., On> and some field type <F1>, where F1 is part of your
composite key. You want to fetch back a set of rows and then do some aggregation on the attributes.
There was a similar discussion about this where someone had a random set of values and was having performance
issues.
If your set of objects is in sort order and you have only one field type <F1>, you should be able to do the
multi-gets.
Are you currently using multi-gets?
On Apr 25, 2013, at 5:36 PM, Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN)
<skadambi@...> wrote:

Re: Coprocessors

James: First of all, this looks quite promising.
The table schema outlined in your other message is correct, except that attrib_id will not be in the primary
key. Will that be a problem for the skip-scan filter's performance? (It doesn't seem like it...)
Could you share any benchmark numbers? I want to try this out right away, but I have to wait for my
cluster administrator to upgrade us from HBase 0.92 first!
----- Original Message -----
From: user <at> hbase.apache.org
To: user <at> hbase.apache.org
At: Apr 25 2013 18:45:14

Re: Coprocessors

Our performance engineer, Mujtaba Chohan has agreed to put together a
benchmark for you. We only have a four node cluster of pretty average
boxes, but it should give you an idea.
No performance impact from attrib_id not being part of the PK, since
you're not filtering on it (if I understand things correctly).
A few more questions for you:
- How many rows should we use? 1B?
- How many rows would be filtered by object_id and field_type?
- Any particular key distribution, or is random fine?
- What's the minimum key size we should use for object_id and
field_type? 2 bytes each?
- Any particular kind of aggregation? count(attrib1)? sum(attrib1)? A
sample query would be helpful.
Since you're upgrading, use the latest on the 0.94 branch, 0.94.7.
Thanks,
James
On 04/25/2013 04:19 PM, Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN) wrote:

Re: Coprocessors

Sudarshan,
Below are the results that Mujtaba put together. He put together two
versions of your schema: one with the ATTRIBID as part of the row key
and one with it as a key value. He also benchmarked the query time both
when all of the data was in the cache and when all of the data was
read off of disk.
Let us know if you have any questions/follow up.
Thanks,
James (& Mujtaba)
Compute Average over 250K random rows in 1B row table

ATTRIBID in row key
                      Data from HBase cache    Data loaded from disk
Phoenix Skip Scan     1.4 sec                  31 sec
HBase Batched Gets    3.8 sec                  58 sec
HBase Range Scan      -                        10+ min

ATTRIBID as key value
                      Data from HBase cache    Data loaded from disk
Phoenix Skip Scan     1.7 sec                  37 sec
HBase Batched Gets    4.0 sec                  82 sec
HBase Range Scan      -                        10+ min

Details
-------
HBase 0.94.7, Hadoop 1.0.4
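
Editor's sketch: the relative speedup of the skip scan over batched gets, computed from the timings reported above. The dictionary layout and names are illustrative. The range-scan rows are left out because only a lower bound ("10+ min") was reported for them.

```python
# Timings in seconds, copied from the benchmark results above.
timings = {
    ("row key", "cache"): {"skip_scan": 1.4, "batched_gets": 3.8},
    ("row key", "disk"): {"skip_scan": 31, "batched_gets": 58},
    ("key value", "cache"): {"skip_scan": 1.7, "batched_gets": 4.0},
    ("key value", "disk"): {"skip_scan": 37, "batched_gets": 82},
}

# Ratio > 1 means the skip scan was faster than batched gets.
speedups = {
    case: round(t["batched_gets"] / t["skip_scan"], 2)
    for case, t in timings.items()
}

for (layout, source), ratio in speedups.items():
    print(f"ATTRIBID as {layout}, data from {source}: {ratio}x")
```

On these numbers the skip scan is roughly 1.9x to 2.7x faster than batched gets, with the larger gains when the data is served from cache.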