Hi,
I am seeing odd behavior, with coprocessor performance lagging a client-side Get.
I have a table with 500,000 rows. Each row has a variable number of columns in one column family (in this case about 600,000 columns in total are processed).
When I try to get a specific 55 rows, the client side completes in half the time of the coprocessor endpoint.
I am using 55 RowFilters on the coprocessor scan side. The rows are processed in exactly the same way in both cases.
Any pointers on how to debug this scenario?


There are 31 regions on 4 nodes x 8 CPUs. I am on 0.94.6 (from Hortonworks).
It seems to behave the way linwukang describes: it is almost a full table scan in the coprocessor. Actually, when I set more specific ColumnPrefixFilters, performance went down.
I want to do things on the server side because I don't want to send 500K column/values to the client. I cannot believe a single-threaded client that does some calculations and a group-by beats the coprocessor running across 31 regions.


> Ted,
> Here is the method signature/protocol:
>
>     public Map<String, Double> getFooMap(Map<String, Double> input, int topN) throws IOException;
>
> Regards,
> - kiru
>
> Kiru Pakkirisamy | webcloudtech.wordpress.com
>
> ________________________________
> From: Ted Yu <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]; Kiru Pakkirisamy <[EMAIL PROTECTED]>
> Sent: Thursday, August 8, 2013 8:40 PM
> Subject: Re: Client Get vs Coprocessor scan performance
>
> Can you give us a bit more information?
>
> How do you deliver the 55 rowkeys to your endpoint?
> How many regions do you have for this table?
>
> What HBase version are you using?
>
> Thanks


Hi Kiru,

Sorry for my poor English. If you perform a batch Get using HTable.get(List<Get>), it is not really a single-threaded operation. It first sorts and groups the Gets you pass in by region, then executes the Get operations for each group in a thread pool. The HTable.get(List<Get>) call only connects to the region servers that contain the rows you query, say 2 of 4 RS (if the 55 rows are distributed over only 2 RS, 4 regions). The connect and disconnect operations are costly. By contrast, if you don't specify a start key and end key, the coprocessors deployed on all 31 regions will be called, though most of them are irrelevant.

In your case, you can first sort and group the rows yourself, then use a multi-threaded client to invoke the coprocessor protocol per group, so that only the coprocessors on the right regions are called. Since the region location information is cached by the client, this is not too costly. (I don't know whether there is an interface to do this, i.e. to perform a coprocessor call batched like a batch Get.)

2013/8/9 Kiru Pakkirisamy <[EMAIL PROTECTED]>
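The sort-and-group step described above can be sketched in plain Java. This is a hypothetical helper (the class and method names are our own, and region start keys are simplified to Strings, whereas HTable#getStartKeys() actually returns byte[][]); the idea is to bucket the requested row keys by the region that holds them, so each bucket can drive one scoped coprocessor call on its own thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RegionGrouper {
    // Group row keys by the region that contains them, given the sorted
    // list of region start keys. A row belongs to the region whose start
    // key is the greatest one that is <= the row key.
    public static Map<String, List<String>> groupByRegion(List<String> startKeys,
                                                          List<String> rowKeys) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String row : rowKeys) {
            int idx = Collections.binarySearch(startKeys, row);
            if (idx < 0) idx = -idx - 2;        // insertion point minus one
            String regionStart = startKeys.get(Math.max(idx, 0));
            groups.computeIfAbsent(regionStart, k -> new ArrayList<>()).add(row);
        }
        return groups;
    }

    public static void main(String[] args) {
        // First region has the empty start key, as in HBase.
        List<String> startKeys = Arrays.asList("", "g", "p");
        List<String> rows = Arrays.asList("apple", "zebra", "hat", "cat");
        // One coprocessor invocation (on its own thread) per group,
        // scoped with start/end keys so only the relevant region is hit.
        System.out.println(groupByRegion(startKeys, rows));
    }
}
```

Each resulting group would then be submitted to a thread pool, with the group's region boundaries passed as the scan's start/end keys so the endpoint is invoked only where the rows actually live.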




I think this fixes my issue. On our dev cluster, what used to take 1200 msec is now in the 700-800 msec range. Thanks again. I will soon deploy this to our performance cluster, where our query is in the 15 sec range.


Maybe I spoke too soon. HBASE-6870 fixes the full table scan (as verified by the read-request metrics on the regions). But performance with RowFilter is very bad, actually worse than a full table scan; I don't know how that can happen. I hope my API usage is right: all I am doing is add the RowFilters to a FilterList and call setFilter on the scan. I tried looking into AggregateImplementation (which is mentioned as the unit test for this bug) but did not follow through, because I am in a rush for a good workaround. I have now replaced the RowFilters with a Get on the Region (in a loop), after making sure each key is within the startKey and endKey of the region. I think this is getting my data right, and performance is very good, almost half the time of the full-scan code we had in the coprocessor earlier. Are there any gotchas or bad side effects to using a Get on the Region?
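The boundary check mentioned above (only issuing a region-local Get when the key falls inside the region) can be sketched as follows. This is an illustration, not the HBase implementation: the class and method names are our own, an empty byte array marks the first/last region as in 0.94's HRegionInfo, and the comparison mimics Bytes.compareTo's unsigned lexicographic order.

```java
public class RegionBoundary {
    // True if rowKey falls in [startKey, endKey) for a region.
    // An empty startKey means "first region"; an empty endKey means "last region".
    public static boolean contains(byte[] startKey, byte[] endKey, byte[] rowKey) {
        boolean afterStart = startKey.length == 0 || compare(rowKey, startKey) >= 0;
        boolean beforeEnd = endKey.length == 0 || compare(rowKey, endKey) < 0;
        return afterStart && beforeEnd;
    }

    // Unsigned lexicographic byte comparison, like HBase's Bytes.compareTo().
    static int compare(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }
}
```

Inside the endpoint, keys that fail this check are simply skipped, so each region serves only its own rows and no cross-region lookups are attempted.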



Hey Kiru,

Another option for you may be to use Phoenix (https://github.com/forcedotcom/phoenix). In particular, our skip scan may be what you're looking for: http://phoenix-hbase.blogspot.com/2013/05/demystifying-skip-scan-in-phoenix.html. Under the covers, the skip scan does a series of parallel scans, taking advantage of both coprocessors and SEEK_NEXT_USING_HINT. As you can see, it's more than 2x faster than the batched-get approach. On top of that, your queries don't have to be only point gets; range scans leverage it as well.

Thanks,
James
@JamesPlusPlus

On Sat, Aug 10, 2013 at 11:15 PM, Kiru Pakkirisamy <[EMAIL PROTECTED]> wrote:

> ________________________________
> From: Ted Yu <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Cc: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Sent: Thursday, August 8, 2013 10:44 PM
> Subject: Re: Client Get vs Coprocessor scan performance
>
> I think you need HBASE-6870, which went into 0.94.8.
>
> Upgrading should boost coprocessor performance.
>
> Cheers

James,
We actually planned to use Phoenix for this project, but we did not have much time to design on top of Phoenix. Also, this app is more like a 'search' app, and I wanted it to be doing just key lookups; there are no writes, and everything is in the block cache. Still, yes, let me take a look at your code. Maybe we will get a chance to rewrite this on top of Phoenix.
Thanks for your tip and the reminder,


Ted, can you elaborate a little on why this issue boosts performance? I couldn't figure out from the issue's comments whether the coprocessor exec scans the entire .META. table or an entire user table, so I don't understand the actual improvement.

> I think you need HBASE-6870, which went into 0.94.8.
>
> Upgrading should boost coprocessor performance.
>
> Cheers

Ted,
On a table with 600K rows, Get'ting 100 rows seems to be faster than the FuzzyRowFilter (with a mask over the whole length of the key). I thought FuzzyRowFilter's SEEK_NEXT_USING_HINT would help. All this is on the client side; based on that client-side performance, I have not changed my coprocessor to use the FuzzyRowFilter (it still does multiple Gets inside the coprocessor). Also, I am seeing very bad concurrent query performance. Is there anything that would make coprocessors almost single-threaded across multiple invocations?
Again, all this is after putting in 0.94.10 (for HBASE-6870's sake), which seems to be very good at bringing the regions online fast and balanced. Thanks, much appreciated.

> bq. Get'ting 100 rows seems to be faster than the FuzzyRowFilter (mask on the whole length of the key)
>
> In this case the Gets are very selective. The number of rows the FuzzyRowFilter was evaluated against would be much higher. It would be nice if you remember the time each took.
>
> bq. Also, I am seeing very bad concurrent query performance
>
> Were the multi Gets performed by your coprocessor within the region boundary of the respective coprocessor? Just to confirm.
>
> bq. that would make Coprocessors almost single threaded across multiple invocations?
>
> Let me dig into the code some more.
>
> Cheers
>
> ________________________________
> From: Ted Yu <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Sent: Saturday, August 17, 2013 4:19 PM
> Subject: Re: Client Get vs Coprocessor scan performance
>
> HBASE-6870 targeted the whole-table scanning done for each coprocessorService call, which exhibited itself through:
>
> HTable#coprocessorService -> getStartKeysInRange -> getStartEndKeys -> getRegionLocations -> MetaScanner.allTableRegions(getConfiguration(), getTableName(), false)
>
> With the fix, the cached region locations in HConnectionImplementation are used instead.
>
> Cheers

Ted,
Re: multiple Gets vs FuzzyRowFilter. It looks like my row/column processing is mixed in and not giving a definitive view of the performance of either interface. I will do more testing on this by writing a simpler test program.

A Get on an HRegion throws an exception if the key is not there, so now I just catch it and skip the processing for such keys. I think my "single-threaded" behavior may be related to key locality: if I shut down one region server, the processing shifts to another, so locality is probably the issue. I might use multiple tables (duplicating the data) and randomly pick a table for lookups; that might keep all the nodes busy. But one thing that is perplexing is why the heap usage goes down when all my tables are supposed to be IN_MEMORY. (This was not the case with 0.94.6, where the heap usage grew and stayed up.) Thanks again.

James,
I am using the FuzzyRowFilter or the Gets within a coprocessor. It looks like I cannot use your SkipScanFilter by itself, as it has lots of Phoenix imports. I thought of writing my own custom filter, then saw that the FuzzyRowFilter in the 0.94 branch also has an implementation of getNextKeyHint(); the catch is that it only works well with fixed-length keys if I want a complete match on the keys. After padding my keys to fixed length, it seems to be fine.
Once I confirm the key-locality and other issues (like heap), I will try to benchmark this table alone against Phoenix on another cluster. Thanks.
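FuzzyRowFilter matches positionally, byte for byte, which is why variable-length keys have to be padded to one fixed width before the fuzzy mask lines up (in 0.94's convention, a 0 mask byte means "must match"). A minimal sketch of the padding step; the width of 16 and the zero pad byte are assumptions, and the class name is our own:

```java
import java.util.Arrays;

public class KeyPadder {
    // Fixed key width; an assumption for illustration. Choose a pad byte
    // that cannot appear in real keys, or collisions become possible.
    static final int WIDTH = 16;

    // Right-pad a row key with zero bytes so every key occupies the
    // same byte positions for positional (fuzzy) matching.
    public static byte[] pad(byte[] key) {
        if (key.length > WIDTH) {
            throw new IllegalArgumentException("key longer than fixed width");
        }
        return Arrays.copyOf(key, WIDTH);   // zero-filled up to WIDTH
    }

    // Mask for an exact match over the whole padded key:
    // every byte 0 = "must match" (0.94 FuzzyRowFilter convention).
    public static byte[] exactMatchMask() {
        return new byte[WIDTH];
    }
}
```

Each (paddedKey, mask) pair would then be handed to the FuzzyRowFilter; with an all-zeros mask the filter degenerates to exact-key matching but can still seek forward via SEEK_NEXT_USING_HINT instead of filtering row by row.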


Kiru,
If you're able to post the key values, row key structure, and data types you're using, I can post the Phoenix code to query against it. You're doing some kind of aggregation too, right? If you could explain that part too, that would be helpful. It's likely that you can just query the existing HBase data you've already created on the same cluster you're already using (provided you put the Phoenix jar on all the region servers - use our 2.0.0 version that just came out). Might be interesting to compare the amount of code necessary in each approach as well.
Thanks,
James


If I can create a Phoenix schema mapping to this existing table, that would be great. I actually do a group-by on the column values and return another value which is a function of the stored value and an input double value. The input is a Map<String, Double> and the return is also a Map<String, Double>.
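The Map-in/Map-out aggregation described above can be sketched in plain Java. This is a hypothetical stand-in: the thread never specifies the actual function applied to the stored value and the input double, so multiplication is used purely for illustration, and `ScoreAggregator`/`aggregate` are invented names.

```java
import java.util.HashMap;
import java.util.Map;

public class ScoreAggregator {
    // Hypothetical server-side calculation: for each stored column
    // (qualifier like "C_10345" mapped to a Double), look up the
    // caller-supplied input value for that qualifier and emit a
    // combined score. Qualifiers absent from the input map are skipped,
    // so only requested columns are returned to the client.
    static Map<String, Double> aggregate(Map<String, Double> stored,
                                         Map<String, Double> input) {
        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, Double> e : stored.entrySet()) {
            Double in = input.get(e.getKey());
            if (in == null) {
                continue; // qualifier not requested by the caller
            }
            // Stand-in for the real function of (storedValue, inputValue).
            result.merge(e.getKey(), e.getValue() * in, Double::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> stored = new HashMap<>();
        stored.put("C_10345", 2.0);
        stored.put("C_20000", 3.0);
        Map<String, Double> input = new HashMap<>();
        input.put("C_10345", 0.5);
        System.out.println(aggregate(stored, input)); // {C_10345=1.0}
    }
}
```

Running this inside a coprocessor endpoint (rather than on the client) is what avoids shipping the full 500K column/values over the wire.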



Kiru,
What's your column family name? Just to confirm: the column qualifier of your key value is C_10345, and this stores a value as a Double using Bytes.toBytes(double)? Are any of the Double values negative? Any other key values?

Can you give me an idea of the kind of fuzzy filtering you're doing on the 7-char row key? We may want to model that as a set of row key columns in Phoenix to leverage the skip scan more.

How about I model your aggregation as an AVG over a group of rows? What would your GROUP BY expression look like? Are you grouping based on a part of the 7-char row key? Or on some other key value?
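The question about negative doubles matters because of how Bytes.toBytes(double) encodes values. A small plain-Java equivalent (ByteBuffer stands in for the HBase Bytes utility; the encoding shown - big-endian IEEE-754 bits - is what the 0.94 Bytes class produces, to the best of my reading):

```java
import java.nio.ByteBuffer;

public class DoubleEncoding {
    // Equivalent of HBase's Bytes.toBytes(double): the raw IEEE-754
    // bit pattern written big-endian as 8 bytes.
    static byte[] toBytes(double d) {
        return ByteBuffer.allocate(8)
                         .putLong(Double.doubleToRawLongBits(d))
                         .array();
    }

    static double toDouble(byte[] b) {
        return Double.longBitsToDouble(ByteBuffer.wrap(b).getLong());
    }

    public static void main(String[] args) {
        // Round trip is lossless.
        System.out.println(toDouble(toBytes(12.25))); // 12.25
        // Why negatives are worth asking about: the sign bit is the most
        // significant bit, so a negative double's first byte is >= 0x80
        // and sorts AFTER every positive double under unsigned byte
        // comparison (which is how HBase orders bytes).
        System.out.println((toBytes(-1.0)[0] & 0xFF)
                         > (toBytes(1.0)[0] & 0xFF)); // true
    }
}
```

Since all the values in this table are positive, plain byte-order comparisons on the stored doubles stay consistent with numeric order.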

> James,
> Rowkey - String - len 7
> Col = String - variable length - but looks like C_10345
> Col value = Double
>
> If I can create a Phoenix schema mapping to this existing table, that would be great. I actually do a group-by on the column values and return another value which is a function of the stored value and an input double value. The input is a Map<String, Double> and the return is also a Map<String, Double>.
>
> Regards,
> - kiru
>
> Kiru Pakkirisamy | webcloudtech.wordpress.com

James,
I have only one column family, cp. Yes, that is how I store the Double. No, the doubles are always positive.
The keys look like "A14568 "; there are fewer than a million of them, and I added the alphabetic prefix to randomize them.
I group them based on the C_ suffix and, say, order them by the Double (to simplify it).
Is there a way to do a sort of "user defined function" on a column? That would take care of my calculation on that double. Thanks again.
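Given the schema details above (7-char key, family cp, C_* double qualifiers), a Phoenix mapping could look roughly like the DDL below. This is a sketch under assumptions, not code from the thread: the table name, the declared C_10345 column (the real qualifiers are dynamic and variable), and the DOUBLE type are all guesses, and exact syntax may differ across Phoenix versions.

```java
public class PhoenixDdl {
    // Build a hypothetical Phoenix DDL string mapping the existing
    // HBase table: a 7-char VARCHAR row key and the "cp" family.
    // Only one illustrative qualifier is declared; in practice the
    // dynamic C_* columns would be supplied per-query.
    static String ddl(String table) {
        return "CREATE TABLE \"" + table + "\" (\n"
             + "  pk VARCHAR(7) NOT NULL PRIMARY KEY,\n"
             + "  \"cp\".\"C_10345\" DOUBLE)";
    }

    public static void main(String[] args) {
        System.out.println(ddl("kiru_table"));
        // Execution would go through the Phoenix JDBC driver, e.g.:
        //   DriverManager.getConnection("jdbc:phoenix:<zk-quorum>")
        //       .createStatement().execute(ddl("kiru_table"));
        // (Requires the Phoenix jar on the region servers; omitted here.)
    }
}
```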



