since /path/to/hadoop-cluster1/hdfs-site.xml does not have information about
hadoop-cluster2-logicalname's namenodes. One option is to add
hadoop-cluster2-logicalname's namenodes to
/path/to/hadoop-cluster1/hdfs-site.xml. But with many clusters, this
becomes a problem. Is there any other cleaner approach to solving this?

The option you have enumerated at the end is the current way to set up a
multi-cluster environment. That is, all the client-side configurations will
include the following:
- Logical service names (either for federation or HA)
- The corresponding physical namenode address information
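As a sketch, the client-side hdfs-site.xml entries for one HA nameservice would look something like the following (the nameservice and host names here are placeholders matching the example in this thread):

```xml
<!-- Sketch of client-side HA settings; nameservice and host names are placeholders -->
<configuration>
  <!-- Logical service name(s) this client knows about -->
  <property>
    <name>dfs.nameservices</name>
    <value>hadoop-cluster2-logicalname</value>
  </property>
  <!-- The namenodes behind the logical name -->
  <property>
    <name>dfs.ha.namenodes.hadoop-cluster2-logicalname</name>
    <value>nn1,nn2</value>
  </property>
  <!-- Physical RPC addresses for each namenode -->
  <property>
    <name>dfs.namenode.rpc-address.hadoop-cluster2-logicalname.nn1</name>
    <value>nn1.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.hadoop-cluster2-logicalname.nn2</name>
    <value>nn2.example.com:8020</value>
  </property>
  <!-- How the client picks/fails over between the namenodes -->
  <property>
    <name>dfs.client.failover.proxy.provider.hadoop-cluster2-logicalname</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
</configuration>
```

With entries like these present, `hdfs://hadoop-cluster2-logicalname/` resolves on the client side; without them, it cannot.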

For simpler management, one could use XML include (XInclude) to pull in an
XML document that defines all the namespaces and namenodes.
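A minimal sketch of that approach, assuming your Hadoop version's configuration parser processes XIncludes (the file name all-nameservices.xml is hypothetical):

```xml
<?xml version="1.0"?>
<!-- hdfs-site.xml: pull the shared nameservice definitions from one file -->
<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
  <!-- all-nameservices.xml (hypothetical) would hold the dfs.nameservices,
       dfs.ha.namenodes.*, and dfs.namenode.rpc-address.* properties for
       every cluster, so each cluster's hdfs-site.xml only adds its own
       local properties below this include. -->
  <xi:include href="all-nameservices.xml"/>
</configuration>
```

The shared file can then be maintained in one place and distributed unchanged to every cluster's config directory.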

> Hello Devs,
>
> With hadoop 1.0, when there was a single namespace, one could access any HDFS
> cluster using any other hadoop config. Something like this:
>
> hadoop --config /path/to/hadoop-cluster1 hdfs://hadoop-cluster2:8020/
>
> Since the NameNode host and port were passed directly as part of the URI, if the
> hdfs client version matched, one could talk to different clusters without
> needing to have access to cluster-specific configuration.
>
> With Hadoop 2.0 or HA mode, we only specify a logical name for the namenode and
> rely on hdfs-site.xml to resolve the logical name to the two underlying namenode
> hosts.
>
> So, you cannot do something like
>
> hadoop --config /path/to/hadoop-cluster1 hdfs://hadoop-cluster2-logicalname/
>
> since /path/to/hadoop-cluster1/hdfs-site.xml does not have information about
> hadoop-cluster2-logicalname's namenodes.
>
> One option is to add hadoop-cluster2-logicalname's namenodes to
> /path/to/hadoop-cluster1/hdfs-site.xml. But with many clusters, this
> becomes a problem.
> Is there any other cleaner approach to solving this?
>
> --
> Have a Nice Day!
> Lohit

--
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.


But that does present a problem if you have to change the DNS address of
one of the HA namenodes. It forces you to update the config on all other
clusters that want to talk to it. If you only have a few clusters that is
probably not a big deal, but it can be problematic if you have many
different clusters that talk to each other.

--Bobby

On 11/4/13 4:15 PM, "lohit" <[EMAIL PROTECTED]> wrote:

>Thanks Suresh!
>
>--
>Have a Nice Day!
>Lohit

> But that does present a problem if you have to change the DNS address of
> one of the HA namenodes.

Not sure what you mean by this? Do you mean the hostname of one of the
namenodes changes? If so, why is this not a problem for a single-namenode
deployment? How do applications addressing a namenode in a different
cluster handle the change?

> It forces you to update the config on all other
> clusters that want to talk to it. If you only have a few clusters that is
> probably not a big deal, but it can be problematic if you have many
> different clusters that talk to each other.

--
http://hortonworks.com/download/


Suresh,

You are correct, I did not explain myself very well. If one of the
namenodes has a hardware failure, then in order to avoid updating the
configs for every single service that talks to HDFS you have to make sure
the replacement box appears to the network to be exactly the same as the
original. This is not impossible, as you mentioned.

The more common case when this is problematic is upgrading clusters from
non-HA to HA, or adding in new HA clusters, because there is no existing
IP address/config to be copied. Every time this happens, all existing
services must have new configs pushed to be able to talk to the
new/updated HDFS. This includes Gateways, RMs, Compute Nodes, Oozie
Servers, etc.

Again, this is not that big of a deal for a small setup, but for a large
setup it can be painful.

--Bobby

On 11/5/13 4:57 PM, "Suresh Srinivas" <[EMAIL PROTECTED]> wrote:

>On Tue, Nov 5, 2013 at 6:57 AM, Bobby Evans <[EMAIL PROTECTED]> wrote:
>
>> But that does present a problem if you have to change the DNS address of
>> one of the HA namenodes.
>
>Not sure what you mean by this? Do you mean the hostname of one of the
>namenodes changes? If so, why is this not a problem for a single-namenode
>deployment? How do applications addressing a namenode in a different
>cluster handle the change?

We've discussed a few times adding a FailoverProxyProvider which would use
DNS records for this. For example, you'd add a SRV record (or multiple A
records) for the logical name, pointing to the physical hosts backing the
cluster. I think it would help reduce client-side configuration pretty
neatly, though it has the disadvantage that your DNS admins need to get in
the loop.
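To make the idea concrete, the zone for such a logical name might carry records along these lines (the record names, hosts, addresses, and the `_hdfs._tcp` service label are all hypothetical; HDFS defines no such layout today):

```
; Hypothetical SRV records resolving a logical name to its two namenodes
_hdfs._tcp.hadoop-cluster2-logicalname.example.com. 300 IN SRV 0 0 8020 nn1.example.com.
_hdfs._tcp.hadoop-cluster2-logicalname.example.com. 300 IN SRV 0 0 8020 nn2.example.com.

; Or, with plain A records, one name resolving to both physical hosts
hadoop-cluster2-logicalname.example.com. 300 IN A 192.0.2.10
hadoop-cluster2-logicalname.example.com. 300 IN A 192.0.2.11
```

Such a provider would look the logical name up at connect time instead of reading a namenode list out of hdfs-site.xml, so replacing or adding a namenode would only require a DNS change.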

--
Todd Lipcon
Software Engineer, Cloudera


Todd Lipcon 2013-11-06, 19:40

