We keep running into problems (e.g. KAFKA-946) that are basically due to the fact that the Kafka committers don't seem to mostly be Hadoop developers and aren't doing a good job of maintaining this code (keeping it tested, improving it, documenting it, writing tutorials, getting it moved over to the more modern APIs, getting it working with newer Hadoop versions, etc.).

A couple of options:
1. We could try to get someone in the Kafka community (either a current committer or not) who would adopt this as their baby (it's not much code).
2. We could just let Camus take over this functionality. They already have a more sophisticated consumer and the producer is pretty minimal.

So are there any people who would like to adopt the current Hadoop contrib code?

Conversely, would it be possible to provide the same or similar functionality in Camus and just delete these?

If the Hadoop consumer/producer use case will remain relevant for Kafka (I assume it will), it would make sense to have the core components (Kafka input/output format at least) as part of Kafka, so that it could be built, tested and versioned together to maintain compatibility.
This would also make it easier to build custom MR jobs on top of Kafka, rather than having to decouple stuff from Camus. It would also be less confusing for users, at least when starting to use Kafka.

Camus could use those instead of providing its own.

That being said, we did some work on the consumer side (0.8 and the new(er) MR API). We could probably try to rewrite them to use Camus, or fix Camus, or whatever, but please consider this alternative as well.

Thanks,
Cosmin

On 7/3/13 11:06 AM, "Sam Meder" <[EMAIL PROTECTED]> wrote:

> I think it makes sense to kill the hadoop consumer/producer code in
> Kafka, given, as you said, Camus and the simplicity of the Hadoop
> producer.
>
> /Sam
>
> On Jul 2, 2013, at 5:01 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
>
>> We currently have a contrib package for consuming and producing messages
>> from mapreduce (
>> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tree;f=contrib;h=e53e1fb34893e733b10ff27e79e6a1dcbb8d7ab0;hb=HEAD
>> ).
>> [...]
>>
>> -Jay


I guess I am more concerned about the long term than the short term. I think if you guys want to have all the Hadoop+Kafka stuff then we should move the producer there, and it sounds like it would be possible to get similar functionality from the existing consumer code. I am not in a rush; I just want to figure out a plan.

The alternative is if there is anyone who is interested in maintaining this stuff in Kafka. The current state, where it is poorly documented and maintained, is not good.

> We can easily make a Camus configuration that would mimic the
> functionality of the hadoop consumer in contrib. [...]
>
> @Jay, would this work for now?
>
> Ken
>
> On Wed, Jul 3, 2013 at 10:57 AM, Felix GV <[EMAIL PROTECTED]> wrote:
>> IMHO, I think Camus should probably be decoupled from Avro before the
>> simpler contribs are deleted.
>> [...]

We can easily make a Camus configuration that would mimic the functionality of the hadoop consumer in contrib. It may require the addition of a BinaryWritable decoder, and a couple of minor code changes. As for the producer, we don't have anything in Camus that does what it does. But maybe we should at some point. In the meantime, Gaurav is going to take a look at what is in contrib and see if it is easily fixed. I have a feeling it probably will take minimal effort, and allow us to kick the can down the road till we get more time to properly address this.

> IMHO, I think Camus should probably be decoupled from Avro before the
> simpler contribs are deleted.
>
> We don't actually use the contribs, so I'm not saying this for our sake,
> but it seems like the right thing to do to provide simple examples for this
> type of stuff, no...?
>
> --
> Felix

Over at the Wikimedia Foundation, we're trying to figure out the best way to do our ETL from Kafka into Hadoop. We don't currently use Avro and I'm not sure if we are going to. I came across this post.

If the plan is to remove the hadoop-consumer from Kafka contrib, do you think we should not consider it as one of our viable options?

Camus can be made to work without avro. You will need to implement a message decoder and a data writer. We need to add a better tutorial on how to do this, but it isn't that difficult. If you decide to go down this path, you can always ask questions on this list. I try to make sure each email gets answered, but it can take me a day or two.
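A rough sketch of what such a non-Avro decoder could look like. The interface below is a simplified stand-in for the real Camus decoder API (which lives in camus-api and is richer: topic-aware init, wrapper types), so treat this as an illustration of the idea rather than the actual contract:

```java
import java.nio.charset.StandardCharsets;

// Simplified stand-in for Camus's decoder API -- illustrative only,
// not the real camus-api interface.
interface MessageDecoder<R> {
    R decode(byte[] payload);
}

// A non-Avro decoder that treats each Kafka message as a UTF-8 string.
class StringMessageDecoder implements MessageDecoder<String> {
    @Override
    public String decode(byte[] payload) {
        return new String(payload, StandardCharsets.UTF_8);
    }
}

public class DecoderSketch {
    public static void main(String[] args) {
        MessageDecoder<String> decoder = new StringMessageDecoder();
        byte[] raw = "hello kafka".getBytes(StandardCharsets.UTF_8);
        System.out.println(decoder.decode(raw)); // prints "hello kafka"
    }
}
```

The data writer is the mirror image: it takes the decoded record and writes it out in whatever on-disk format you need instead of Avro container files.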

-Ken

On Aug 7, 2013, at 9:33 AM, [EMAIL PROTECTED] wrote:

> Hi all,
>
> Over at the Wikimedia Foundation, we're trying to figure out the best way
> to do our ETL from Kafka into Hadoop. [...]
>
> Thanks!
> -Andrew

We also have a need today to ETL from Kafka into Hadoop, and we do not currently use Avro, nor do we have any plans to.

So is the official direction based on this discussion to ditch the Kafka contrib code and direct people to use Camus without Avro as Ken described, or are both solutions going to survive?

I can put time into the contrib code and/or work on documenting the tutorial on how to make Camus work without Avro.

Which is the preferred route, for the long term?

Thanks,
Andrew

On Wednesday, August 7, 2013 10:50:53 PM UTC-6, Ken Goodhope wrote:
> Camus can be made to work without avro. You will need to implement a
> message decoder and a data writer. [...]
>
> -Ken

The contrib code is simple and probably wouldn't require too much work to fix, but it's a lot less robust than Camus, so you would ideally need to do some work to make it solid against all edge cases, failure scenarios and performance bottlenecks...

I would definitely recommend investing in Camus instead, since it already covers a lot of the challenges I'm mentioning above, and also has more community support behind it at the moment (as far as I can tell, anyway), so it is more likely to keep getting improvements than the contrib code.

--
Felix

On Thu, Aug 8, 2013 at 9:28 AM, <[EMAIL PROTECTED]> wrote:

> We also have a need today to ETL from Kafka into Hadoop [...]
>
> Thanks,
> Andrew

I hope to have that done sometime in the morning and would be happy to share it if others can benefit from it.

Thanks,
Andrew

On Thursday, August 8, 2013 7:18:27 PM UTC-6, Felix GV wrote:
> The contrib code is simple and probably wouldn't require too much work to
> fix, but it's a lot less robust than Camus [...]

Dibyendu,
According to the pull request https://github.com/linkedin/camus/pull/15 it was merged into the camus-kafka-0.8 branch. I have not checked whether the code was subsequently removed; however, at least one of the important files from this patch (camus-api/src/main/java/com/linkedin/camus/etl/RecordWriterProvider.java) is still present.

I just checked and that patch is in the 0.8 branch. Thanks for working on backporting it, Andrew. We'd be happy to commit that work to master.

As for the kafka contrib project vs Camus, they are similar but not quite identical. Camus is intended to be a high-throughput ETL for bulk ingestion of Kafka data into HDFS, whereas what we have in contrib is more of a simple KafkaInputFormat. Neither can really replace the other. If you had a complex hadoop workflow and wanted to introduce some Kafka data into that workflow, using Camus would be gigantic overkill and a pain to set up. On the flip side, if what you want is frequent, reliable ingest of Kafka data into HDFS, a simple InputFormat doesn't provide you with that.

I think it would be preferable to simplify the existing contrib Input/OutputFormats by refactoring them to use the more stable, higher-level Kafka APIs. Currently they use the lower-level APIs. This should make them easier to maintain, and user-friendly enough to avoid the need for extensive documentation.
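The shape of that refactor might look something like the following: the InputFormat's record reader simply drains an iterator handed out by the high-level consumer, instead of managing offsets, leaders and retries itself through the low-level API. Everything here is a hypothetical sketch; a plain `Iterator<byte[]>` stands in for Kafka's consumer stream, and the class names are illustrative:

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical record reader wrapping a high-level consumer stream.
// A plain Iterator<byte[]> stands in for the real consumer iterator so the
// sketch is self-contained; the Hadoop RecordReader plumbing is omitted.
class KafkaRecordReaderSketch {
    private final Iterator<byte[]> stream;
    private byte[] currentValue;

    KafkaRecordReaderSketch(Iterator<byte[]> stream) {
        this.stream = stream;
    }

    // Mirrors Hadoop's RecordReader.nextKeyValue(): advance and buffer one message.
    boolean nextKeyValue() {
        if (!stream.hasNext()) {
            return false;
        }
        currentValue = stream.next();
        return true;
    }

    byte[] getCurrentValue() {
        return currentValue;
    }
}

public class ReaderSketch {
    public static void main(String[] args) {
        Iterator<byte[]> fakeStream =
                Arrays.asList("m1".getBytes(), "m2".getBytes()).iterator();
        KafkaRecordReaderSketch reader = new KafkaRecordReaderSketch(fakeStream);
        int count = 0;
        while (reader.nextKeyValue()) {
            count++;
        }
        System.out.println(count); // prints 2
    }
}
```

The win is that broker failover and offset management stay inside the Kafka client, rather than being reimplemented inside the InputFormat.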

> On Fri, Aug 9, 2013 at 9:39 AM, <[EMAIL PROTECTED]> wrote:
>> Hi Ken,
>>
>> I am also working on making Camus fit for non-Avro messages for our
>> requirement.
>>
>> I see you mentioned this patch
>> (https://github.com/linkedin/camus/commit/87917a2aea46da9d21c8f67129f6463af52f7aa8)
>> which supports a custom data writer for Camus. But this patch is not pulled
>> into the camus-kafka-0.8 branch. Is there any plan for doing the same?
>>
>> Regards,
>> Dibyendu

1. We aren't deprecating anything. I just noticed that the Hadoop contrib package wasn't getting as much attention as it should.

2. Andrew or anyone--if there is anyone using the contrib package who would be willing to volunteer to kind of adopt it, that would be great. I am happy to help in whatever way I can. The practical issue is that most of the committers are either using Camus or not using Hadoop at all, so we just haven't been doing a good job of documenting, bug fixing, and supporting the contrib packages.

3. Ken, if you could document how to use Camus, that would likely make it a lot more useful to people. I think most people would want a full-fledged ETL solution and would likely prefer Camus, but very few people are using Avro.


I would like to do this refactoring since I did a high level consumer a while ago. A few weeks ago I had opened KAFKA-949, Kafka on Yarn, which I was also hoping to contribute. It's almost done. KAFKA-949 is paired with BIGTOP-989, which adds Kafka 0.8 to the Bigtop distribution. KAFKA-949 basically allows Kafka brokers to be started up as sysvinit services and would ease some of the startup/configuration issues that newbies have when getting started with Kafka. Ideally I would like to fold a number of kafka/bin/* commands into the kafka service. Andrew, please let me know if you would like to pick this up instead. Thanks!

Kam


I'm about to add an init script for MirrorMaker as well, so mirroring can be daemonized and run as a service.

On Aug 12, 2013, at 8:16 PM, Kam Kasravi <[EMAIL PROTECTED]> wrote:

> I would like to do this refactoring since I did a high-level consumer a while ago.
> A few weeks ago I had opened KAFKA-949, Kafka on Yarn, which I was also hoping to contribute.
> It's almost done. KAFKA-949 is paired with BIGTOP-989, which adds Kafka 0.8 to the Bigtop distribution.
> KAFKA-949 basically allows Kafka brokers to be started up as sysvinit services and would ease some of the startup/configuration issues that newbies have when getting started with Kafka. Ideally I would like to fold a number of kafka/bin/* commands into the kafka service. Andrew, please let me know if you would like to pick this up instead. Thanks!
>
> Kam
>
> From: Jay Kreps <[EMAIL PROTECTED]>
> Sent: Saturday, August 10, 2013 3:30 PM
> Subject: Re: Kafka/Hadoop consumers and producers
>
> So guys, just to throw my 2 cents in:
>
> 1. We aren't deprecating anything. I just noticed that the Hadoop contrib package wasn't getting as much attention as it should.
>
> 2. Andrew or anyone--if there is anyone using the contrib package who would be willing to volunteer to kind of adopt it, that would be great. I am happy to help in whatever way I can. The practical issue is that most of the committers are either using Camus or not using Hadoop at all, so we just haven't been doing a good job of documenting, bug fixing, and supporting the contrib packages.
>
> 3. Ken, if you could document how to use Camus, that would likely make it a lot more useful to people. I think most people would want a full-fledged ETL solution and would likely prefer Camus, but very few people are using Avro.
>
> -Jay
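The sysvinit service Kam mentions isn't included in the thread; purely as an illustration, here is a minimal sketch of what such a broker control script could look like. The paths (KAFKA_HOME, PIDFILE, LOGFILE) and the function name are assumptions, not taken from KAFKA-949 or BIGTOP-989.

```shell
#!/bin/bash
# Hypothetical sysvinit-style control function for a Kafka broker.
# KAFKA_HOME, PIDFILE and LOGFILE are illustrative defaults only.
KAFKA_HOME=${KAFKA_HOME:-/usr/share/kafka}
PIDFILE=${PIDFILE:-/tmp/kafka-$$.pid}
LOGFILE=${LOGFILE:-/tmp/kafka-$$.log}

kafka_service() {
  case "$1" in
    start)
      if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
        echo "kafka already running"
        return 0
      fi
      # Launch the stock startup script in the background and record its pid.
      nohup "$KAFKA_HOME/bin/kafka-server-start.sh" \
        "$KAFKA_HOME/config/server.properties" >"$LOGFILE" 2>&1 &
      echo $! > "$PIDFILE"
      echo "kafka started"
      ;;
    stop)
      if [ -f "$PIDFILE" ]; then
        kill "$(cat "$PIDFILE")" 2>/dev/null
        rm -f "$PIDFILE"
      fi
      echo "kafka stopped"
      ;;
    status)
      if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
        echo "kafka is running"
      else
        echo "kafka is not running"
      fi
      ;;
    *)
      echo "Usage: kafka_service {start|stop|status}"
      return 2
      ;;
  esac
}

# No broker is installed here, so status simply reports it is not running.
status_out=$(kafka_service status)
echo "$status_out"
```

A real init script would additionally run the broker under a dedicated user and hook into the distribution's rc levels, which is exactly the packaging work KAFKA-949 and the Wikimedia debianization address.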

Kam,
I am perfectly fine if you pick this up. After thinking about it for a while, we are going to upgrade to Kafka 0.8.0 and also use Camus, as it more closely matches our use case, with the caveat that we do not use Avro. With that said, I will try and work on the back-port of the custom data writer patch [1]; however, I am not sure how quickly I will get this done, as we are going to work towards upgrading our Kafka cluster.

Thanks,
Andrew

[1] https://github.com/linkedin/camus/commit/87917a2aea46da9d21c8f67129f6463af52f7aa8


> What installs all the kafka dependencies under /usr/share/java?

The debian/ work was done mostly by another WMF staffer. We tried and tried to make sbt behave with debian standards, most importantly the one that requires that .debs can be created without needing to connect to the internet, aside from official apt repositories.

Many of the /usr/share/java dependencies are handled by apt. Any that aren't available in an official apt somewhere have been manually added to the ext/ directory.

> Thanks Andrew - I like the shell wrapper - very clean and simple.
> What installs all the kafka dependencies under /usr/share/java?
>
> From: Andrew Otto <[EMAIL PROTECTED]>
> Sent: Monday, August 12, 2013 7:00 PM
> Subject: Re: Kafka/Hadoop consumers and producers
>
> We've done a bit of work over at Wikimedia to debianize Kafka and make it behave like a regular service.
>
> https://github.com/wikimedia/operations-debs-kafka/blob/debian/debian
>
> Most relevant, Ken, is an init script for Kafka:
> https://github.com/wikimedia/operations-debs-kafka/blob/debian/debian/kafka.init
>
> And a bin/kafka shell wrapper for the kafka/bin/*.sh scripts:
> https://github.com/wikimedia/operations-debs-kafka/blob/debian/debian/bin/kafka
>
> I'm about to add an init script for MirrorMaker as well, so mirroring can be daemonized and run as a service.
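The bin/kafka wrapper linked above dispatches to the stock kafka/bin/*.sh scripts. A stripped-down sketch of that idea (not the actual Wikimedia script; KAFKA_BIN is an assumed location) might look like:

```shell
#!/bin/bash
# Hypothetical dispatcher in the spirit of the Wikimedia bin/kafka wrapper:
# "kafka topics --list" runs kafka-topics.sh, and so on. KAFKA_BIN is an
# assumed location for the stock scripts.
KAFKA_BIN=${KAFKA_BIN:-/usr/share/kafka/bin}

kafka() {
  cmd="$1"
  if [ $# -gt 0 ]; then shift; fi
  script="$KAFKA_BIN/kafka-${cmd}.sh"
  if [ -z "$cmd" ] || [ ! -x "$script" ]; then
    echo "Usage: kafka <command> [args...]"
    echo "Available commands:"
    for s in "$KAFKA_BIN"/kafka-*.sh; do
      [ -e "$s" ] || continue
      s=${s##*/kafka-}
      echo "  ${s%.sh}"
    done
    return 1
  fi
  # Hand off to the real script, passing remaining arguments through.
  exec "$script" "$@"
}

# Demo against a throwaway directory of stub scripts (no Kafka needed):
KAFKA_BIN=$(mktemp -d)
printf '#!/bin/sh\necho topics: "$@"\n' > "$KAFKA_BIN/kafka-topics.sh"
chmod +x "$KAFKA_BIN/kafka-topics.sh"
out=$( (kafka topics --list) )   # subshell, since kafka() execs the target
echo "$out"
```

Folding the many kafka/bin/* entry points behind one command like this is essentially what Kam proposes doing inside the sysvinit service as well.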


Yeah, there seems to be a constant struggle between the 'java way' of doing things, e.g. Maven downloading the internet, and the 'debian way', e.g. be paranoid about everything, make sure the build process is 100% repeatable.

Bigtop should definitely do whatever Bigtop thinks is best. This Makefile technique works for us now, but will probably require a lot of manual maintenance as Kafka grows.

On Aug 13, 2013, at 6:03 PM, Kam Kasravi <[EMAIL PROTECTED]> wrote:

> Thanks - I'll ask on bigtop regarding the .deb requirement - it seems they don't abide by this.
> I may merge a bit of your work into BIGTOP-989 if that's ok with you. I do know the bigtop folks would like to see sbt support.
>
> From: Andrew Otto <[EMAIL PROTECTED]>
> Sent: Tuesday, August 13, 2013 1:03 PM
> Subject: Re: Kafka/Hadoop consumers and producers
>
> The sbt build system has been replaced with Make:
> https://github.com/wikimedia/operations-debs-kafka/blob/debian/debian/patches/our-own-build-system.patch
>
> You should be able to build a .deb by checking out the debian branch and running:
>
> git-buildpackage -uc -us
>
> -Ao

I agree with you. We are looking for a simple solution for getting data from Kafka to Hadoop, and we would rather not introduce another component into the solution. I tried using Camus earlier (non-Avro), but the documentation is too sparse to get it working correctly. In the meantime, can the Kafka Hadoop consumer/producer be documented well so we can try it out ASAP? :) Thanks.



All projects made searchable here are trademarks of the Apache Software Foundation.
Service operated by Sematext