I think it is time for us to have another meeting. Yahoo would be happy to host if this works for everybody. How about Wednesday, 2/9 4-6 pm. Please, let us know if you are planning to attend and if the date/time works for you.

Things that come to mind to discuss and as always feel free to suggest others:

- Error handling proposal - this might be easier to finalize face-to-face
- Pig 0.9 plan
- Pig Roadmap beyond 0.9
    o What do we want to do in Pig.next?
    o Are we ready for Pig 1.0

If making Pig thread safe (i.e., two threads each running a different Pig script) is important, then we need to change some of the APIs from static singleton access to a dependency injection pattern. In that case, this should probably be done before 1.0.

For example, UDFContext should be passed to the UDF after construction (similar to the ServletContext in Servlet, or the way Hadoop passes the context to tasks).

Also, a clearly separated API that does not depend on the Pig implementation would help. For example, UDFContext is in org.apache.pig.impl.util when it would be better in org.apache.pig.api (or at least an interface defining it should be there).
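The injection idea above can be sketched roughly as follows. This is a minimal illustration, not Pig's actual API: the `UdfContext` interface, `ContextAwareUdf` base class, `SimpleUdfContext`, and `Greeting` UDF are all hypothetical names invented for the example. The point is only that each script run hands its own context object to the UDF after construction, so nothing is shared through a static singleton.

```java
import java.util.Properties;

// Hypothetical interface (assumption): stands in for an implementation-independent
// view of the UDF context. It is not Pig's real UDFContext class.
interface UdfContext {
    Properties getUdfProperties(Class<?> udfClass);
}

// Hypothetical base class: the runtime injects the context after construction,
// instead of the UDF reaching for a static singleton.
abstract class ContextAwareUdf<T> {
    private UdfContext context;

    final void setContext(UdfContext context) {
        this.context = context;
    }

    protected final UdfContext context() {
        if (context == null) {
            throw new IllegalStateException("context has not been injected yet");
        }
        return context;
    }

    abstract T exec(String input);
}

// Example UDF that reads a per-script property from the injected context.
final class Greeting extends ContextAwareUdf<String> {
    @Override
    String exec(String input) {
        Properties props = context().getUdfProperties(Greeting.class);
        return props.getProperty("prefix", "") + input;
    }
}

// Trivial context implementation holding one property per script run.
final class SimpleUdfContext implements UdfContext {
    private final Properties props = new Properties();

    SimpleUdfContext(String prefix) {
        props.setProperty("prefix", prefix);
    }

    @Override
    public Properties getUdfProperties(Class<?> udfClass) {
        return props;
    }
}

public class InjectionDemo {
    // Each call models one script run with its own, unshared context.
    static String run(String prefix, String input) {
        Greeting udf = new Greeting();
        udf.setContext(new SimpleUdfContext(prefix));
        return udf.exec(input);
    }

    public static void main(String[] args) {
        System.out.println(run("scriptA:", "x"));
        System.out.println(run("scriptB:", "x"));
    }
}
```

Because the two runs never touch shared static state, they could execute on separate threads without interfering, which is the property a singleton-based context cannot guarantee.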

Julien


I may be wrong, but I think predicate pushdown is designed for, but not actually implemented in, the current LoadPushDown interface (you can only push projections). If I am wrong, that's great, but if not, that would be an important feature to add, as people are trying to connect Pig to "smart" storage systems like RDBMSes, HBase, and Cassandra more and more. I think we only kind of simulate this with partition keys info, which is not always sufficient.


Are you talking about LoadMetadata.setPartitionFilter? PartitionFilterOptimizer will do that.

Daniel

Dmitriy Ryaboy wrote:
> I may be wrong but I think predicate pushdown is designed for, but not
> actually implemented in the current LoadPushdown interface (you can only
> push projections). If I am wrong, that's great.. but if not, that would be
> an important feature to add, as people are trying to connect Pig to "smart"
> storage systems like rdbmses, HBase, and Cassandra more and more. I think
> we only kind of simulate this with partition keys info, which is not always
> sufficient
>
> D
>
> On Wed, Jan 26, 2011 at 2:41 PM, Julien Le Dem <[EMAIL PROTECTED]> wrote:
>
>> If making Pig Thread safe (i.e.: two threads running a different pig
>> script) is important then we need to change some of the APIs from static
>> singleton access to a dependency injection pattern.
>> In that case, this should probably be done before 1.0
>> For example: UDFContext should be passed to the UDF after construction
>> (similar to the ServletContext in Servlet or the way Hadoop passes the
>> context to tasks)
>> Also a clearly separated API that does not depend on the Pig implementation
>> would help.
>> For example UDFContext is in org.apache.pig.impl.util when it would be
>> better in org.apache.pig.api (Or at least an interface defining it)
>>
>> Julien
>>
>> On 1/24/11 10:14 AM, "Olga Natkovich" <[EMAIL PROTECTED]> wrote:
>>
>> Hi Guys,
>>
>> I think it is time for us to have another meeting. Yahoo would be happy to
>> host if this works for everybody. How about Wednesday, 2/9 4-6 pm. Please,
>> let us know if you are planning to attend and if the date/time works for
>> you.
>>
>> Things that come to mind to discuss and as always feel free to suggest
>> others:
>>
>> - Error handling proposal - this might be easier to finalize
>> face-to-face
>> - Pig 0.9 plan
>> - Pig Roadmap beyond 0.9
>> o What do we want to do in Pig.next?
>> o Are we ready for Pig 1.0
>>
>> Olga


> While there is a lively discussion on this thread, I have not actually
> gotten any responses to having the meeting with exception of 1 person :).
>
> Please, let me know by the end of the week if you are planning to attend.
> If we don't get at least a few more responses I suggest we postpone the
> meeting.
>
> Thanks,
>
> Olga
>
> -----Original Message-----
> From: Dmitriy Ryaboy [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, January 26, 2011 6:04 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Pig developer meeting in February
>
> Right, we do partition filtering, but not true predicate pushdown.
>
> On Wed, Jan 26, 2011 at 5:59 PM, Daniel Dai <[EMAIL PROTECTED]> wrote:
>
> > Are you talking about LoadMetadata.setPartitionFilter?
> > PartitionFilterOptimizer will do that.
> >
> > Daniel


> Are you saying that as long as one claims every column as a partition,
> all filters will be pushed down?

Exactly. The javadocs are heavily worded for partition pruning, since that was the primary use case for predicate pushdown at the time. But you will get all the filter expressions if you claim that all the columns are partition columns. Partition columns have no special semantics in Pig apart from this.
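A simplified model of the behaviour described above, for illustration only. It does not reproduce Pig's PartitionFilterOptimizer or its Expression classes; `Condition`, `Split`, and `split` are hypothetical stand-ins. The sketch assumes the filter is a list of AND-ed conditions, each on a single column, and shows that a condition is handed to the loader only when its column is among the declared partition keys, so declaring every column as a key hands over everything.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PartitionPushdownSketch {
    // Hypothetical stand-in for one AND-ed condition such as "age > 30".
    static final class Condition {
        final String column;
        final String text;

        Condition(String column, String text) {
            this.column = column;
            this.text = text;
        }
    }

    // Conditions handed to the loader versus conditions left in the pipeline.
    static final class Split {
        final List<String> pushed = new ArrayList<>();
        final List<String> residual = new ArrayList<>();
    }

    // A condition is pushed only if its column was declared as a partition key.
    static Split split(List<Condition> conjuncts, String[] partitionKeys) {
        Set<String> keys = new HashSet<>(Arrays.asList(partitionKeys));
        Split result = new Split();
        for (Condition c : conjuncts) {
            if (keys.contains(c.column)) {
                result.pushed.add(c.text);
            } else {
                result.residual.add(c.text);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Condition> filter = Arrays.asList(
                new Condition("dt", "dt == '2011-01-26'"),
                new Condition("age", "age > 30"));

        // Only "dt" declared: the age condition stays in the pipeline.
        Split some = split(filter, new String[] {"dt"});
        System.out.println(some.pushed + " / " + some.residual);

        // Every column declared: the whole filter reaches the loader.
        Split all = split(filter, new String[] {"dt", "age"});
        System.out.println(all.pushed + " / " + all.residual);
    }
}
```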

> Will the filters also be applied to the data the loader returns, even
> if the loader accepts the expression?

I think the filter will be deleted from the logical plan if it is pushed up, so it won't be applied in the pipeline later on. Daniel, can you confirm whether that's the case with the new logical plan?

Ashutosh

On Thu, Jan 27, 2011 at 17:21, Julien Le Dem <[EMAIL PROTECTED]> wrote:
> Me too.
> Julien
>
> On 1/27/11 4:09 PM, "Dmitriy Ryaboy" <[EMAIL PROTECTED]> wrote:
>
>> Ok yeah I'll come :).
>>
>> On Thu, Jan 27, 2011 at 3:17 PM, Olga Natkovich <[EMAIL PROTECTED]> wrote:
>>
>>> While there is a lively discussion on this thread, I have not actually
>>> gotten any responses to having the meeting with exception of 1 person :).
>>>
>>> Please, let me know by the end of the week if you are planning to attend.
>>> If we don't get at least a few more responses I suggest we postpone the
>>> meeting.
>>>
>>> Thanks,
>>>
>>> Olga


What do you mean by true predicate pushdown? We hand over the full filter expression to the loader in that method. That, I guess, is sufficient info to push more processing down to the storage layer, e.g. to do range queries in HBase. Pig doesn't have any more information about filters than that to push, unless you want the full logical plan.
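As a rough illustration of the range-query case mentioned above, a loader that receives a filter on its row key could reduce it to a start/stop key pair for the storage scan. This sketch does not use Pig's Expression classes or the HBase client; `Cmp` and `toKeyRange` are hypothetical, and it assumes the filter is a list of AND-ed `>=` and `<` comparisons on the key column.

```java
import java.util.Arrays;
import java.util.List;

public class RangeFromFilterSketch {
    // Hypothetical stand-in for one AND-ed comparison on the row key column.
    static final class Cmp {
        final String op;    // ">=" or "<"
        final String value;

        Cmp(String op, String value) {
            this.op = op;
            this.value = value;
        }
    }

    // Returns {start, stop}: start is inclusive, stop is exclusive,
    // and null means that side of the range is unbounded.
    static String[] toKeyRange(List<Cmp> conjuncts) {
        String start = null;
        String stop = null;
        for (Cmp c : conjuncts) {
            if (c.op.equals(">=")) {
                // Keep the largest lower bound.
                if (start == null || c.value.compareTo(start) > 0) {
                    start = c.value;
                }
            } else if (c.op.equals("<")) {
                // Keep the smallest upper bound.
                if (stop == null || c.value.compareTo(stop) < 0) {
                    stop = c.value;
                }
            } else {
                throw new IllegalArgumentException("unsupported operator: " + c.op);
            }
        }
        return new String[] {start, stop};
    }

    public static void main(String[] args) {
        String[] range = toKeyRange(Arrays.asList(
                new Cmp(">=", "b"), new Cmp("<", "m"), new Cmp(">=", "d")));
        System.out.println(range[0] + " .. " + range[1]);
    }
}
```

A real loader would walk the expression tree it is given and would have to decide what to do with operators it cannot translate; the sketch simply rejects them.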

Ashutosh

On Wed, Jan 26, 2011 at 18:04, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
> Right, we do partition filtering, but not true predicate pushdown.

Ashutosh, where do we do that? I thought we did, too, but didn't find it last time I looked. LoadPushDown has this:

/**
 * Set of possible operations that Pig can push down to a loader.
 */
enum OperatorSet {PROJECTION};

There is also this in LoadMetadata, but it is pretty explicit in the comments about this being partition-specific. Are you saying that as long as one claims every column as a partition, all filters will be pushed down? Will the filters also be applied to the data the loader returns, even if the loader accepts the expression? That would be useful for loaders that have the ability to apply probabilistic filters, for example.
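To make the probabilistic-filter question concrete, here is a small sketch of why re-applying the filter after the loader matters in that case. Nothing here is Pig API: `ApproxSet`, `load`, and `query` are hypothetical, and the one-hash bit set is only a toy stand-in for something like a Bloom filter. The loader-side check can let extra records through (false positives) but never drops a wanted one, so an exact check afterwards restores the precise result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

public class ProbabilisticFilterSketch {
    // Toy Bloom-style set with a single hash function:
    // false positives are possible, false negatives are not.
    static final class ApproxSet {
        private final BitSet bits;
        private final int size;

        ApproxSet(int size, List<String> members) {
            this.size = size;
            this.bits = new BitSet(size);
            for (String m : members) {
                bits.set(index(m));
            }
        }

        private int index(String s) {
            return Math.floorMod(s.hashCode(), size);
        }

        boolean mightContain(String s) {
            return bits.get(index(s));
        }
    }

    // Loader side: cheap approximate filtering, may return a superset.
    static List<String> load(List<String> stored, ApproxSet pushed) {
        List<String> out = new ArrayList<>();
        for (String record : stored) {
            if (pushed.mightContain(record)) {
                out.add(record);
            }
        }
        return out;
    }

    // Pipeline side: the exact filter is applied again to the loader's output.
    static List<String> query(List<String> stored, List<String> wanted, int bitSetSize) {
        List<String> candidates = load(stored, new ApproxSet(bitSetSize, wanted));
        List<String> out = new ArrayList<>();
        for (String record : candidates) {
            if (wanted.contains(record)) {
                out.add(record);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> stored = Arrays.asList("a", "b", "c", "d");
        List<String> wanted = Arrays.asList("b", "d");
        // With a 1-bit set every record collides, so the loader returns everything
        // and only the second, exact pass produces the right answer.
        System.out.println(load(stored, new ApproxSet(1, wanted)));
        System.out.println(query(stored, wanted, 1));
    }
}
```

If the filter were removed from the plan once the loader accepted it, the first of the two lists above is what the rest of the script would see.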


Dmitriy Ryaboy 2011-01-28, 00:15
