> Hi,
>
> I've been working on security in Hadoop and have come up with a design for
> the same. I ran some basic experiments to evaluate the design. Here's the
> report for the same.
>
> Feedback/comments/discussions on this would be great.
>
> Amandeep
>
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz

This is a good paper with test data to go alongside the theory.

Introduction
=======
-I'd cite NFS as a good equivalent design: the same "we trust you to be who you say you are" protocol, similar assumptions about the network ("only trusted machines get on it")
-If EC2 does not meet these requirements, you could argue it's the fault of EC2; there's no fundamental reason why it can't offer private VPNs for clusters the way other infrastructure (VMware) can
-the whoami call is done by the command-line client; different clients don't even have to do that. Mine doesn't.
-it is not the "superuser" in the unix sense, "root", that runs jobs; it is whichever user started hadoop on that node. It can still be a locked-down user with limited machine rights.

Attacks
===
Add:
-unauthorised nodes spoofing other IP addresses (via ARP attacks) and becoming nodes in the cluster. You could acquire and then keep or destroy data, or pretend to do work and return false values. Or come up as a spoof namenode or datanode and disrupt all work.
-denial-of-service attacks: too many heartbeats, etc.
-spoof clients running malicious code on the tasktrackers.

Protocol
=====
-SSL does need to deal with trust; unless you want to pay for every server certificate (you may be able to share them), you'll need to set up your own CA and issue private certs -leaving you with the problem of securely distributing the CA public keys and getting SSL private keys out to nodes securely (and not having anything on the net trying to use your kickstart server to boot a VM with the same MAC address as a trusted server just to get at those keys).

-I'll have to get somebody who understands security protocols to review the paper. One area I'd flag as trouble is that on virtual machines, clock drift can be choppy and non-linear. You also have to worry about clients not being in the right time zone. It is good for everything to work off one clock (say the namenode's) rather than their own. Amazon's S3 authentication protocol has this bug, as do the bits of WS-DM which take absolute times rather than relative ones (presumably to make operations idempotent). At the very least, the namenode needs an operation to return its current time, which callers can then work off.

Implementation
========
-any implementation should be allowed to use different (userid, credentials) than (whoami, ~/.hadoop). This is to allow workflow servers and the like to schedule work as different users.
-the server side should log successes/failures to different log categories; with that and JMX instrumentation you can track security attacks.
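The "different log categories" point can be sketched in a few lines. This is a generic illustration, not Hadoop's actual logging setup; the category names `security.auth.success` and `security.auth.failure` are invented:

```python
import logging

# Separate categories let monitoring (JMX, in Hadoop's case) alarm on
# failure rates without parsing a mixed log. Names here are hypothetical.
auth_ok = logging.getLogger("security.auth.success")
auth_fail = logging.getLogger("security.auth.failure")

def record_auth(user: str, success: bool) -> str:
    """Route an authentication event to the right category; return its name."""
    category = auth_ok if success else auth_fail
    category.info("user=%s", user)
    return category.name

# A burst of events on the failure category is what an attack tracker watches.
```

The same idea applies to any server: one counter per category is enough to spot a brute-force attempt.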

Overall, a nice paper. Do you have the patches to try it out on a bigger cluster?

> Amandeep Khurana wrote:
>> Apparently, the file attached was stripped off. Here's the link for where you can get it:
>> http://www.soe.ucsc.edu/~akhurana/Hadoop_Security.pdf
>
> -I'd cite NFS as a good equivalent design: the same "we trust you to be who you say you are" protocol, similar assumptions about the network ("only trusted machines get on it")
> [...]

I'll look into the NFS security stuff in detail and then add it later.

Where did EC2 come into the picture?

Yes, the whoami can be bypassed; that's why the whole design is built around authentication.

By superuser, I meant the user who starts the hadoop instance... Will make it clearer in the writing.

> Attacks
> ===
> Add
> -unauthorised nodes spoofing other IP addresses (via ARP attacks) and becoming nodes in the cluster. [...]
> -denial of service attacks: too many heartbeats, etc
> -spoof clients running malicious code on the tasktrackers.

I haven't looked at these attacks. This paper is not focussing on that. This can definitely be looked at and incorporated at a later stage. Let's go step by step. (Debatable)

> Protocol
> =====
> -SSL does need to deal with trust; unless you want to pay for every server certificate (you may be able to share them), you'll need to set up your own CA and issue private certs [...]

SSL is a possible solution but the details aren't the focus of this design. Regarding the other keys, there is a format around which they are created and you don't need a CA for that.

> -One area I'd flag as trouble is that on virtual machines, clock drift can be choppy and non-linear. [...] At the very least, the namenode needs an operation to return its current time, which callers can then work off.

The time issue is definitely a concern and has to be somehow cracked. The namenode giving its time is a good idea. But the sync would still be important. There is a way to sync the time across the cluster. I don't remember it clearly, but I have it on my "little" cluster. I'll look that up.

> Implementation
> -any implementation should be allowed to use different (userid, credentials) than (whoami, ~/.hadoop). This is to allow workflow servers and the like to schedule work as different users.

Yes, that's the intention.
So, you log into the system by giving a command like:

bin/hadoop login <userid>

The namenode asks for a password and authenticates it against the underlying unix system (or a separate user oracle if we want that).

Thanks! Just my first attempt at writing a paper. Glad you like it and gave some valuable feedback.
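The server side of a login like that could be sketched as below. This is a toy, not the proposed implementation: a real deployment would delegate to PAM or the unix password database, and the user table, salt, and PBKDF2 parameters here are all stand-ins:

```python
import hashlib
import hmac
import os

def hash_password(password: str, salt: bytes) -> bytes:
    # PBKDF2 stands in for whatever the "user oracle" actually stores.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

# Namenode-side records, created when an account is provisioned.
_salt = os.urandom(16)
users = {"amandeep": (_salt, hash_password("secret", _salt))}

def login(userid: str, password: str) -> bool:
    """What the namenode would do on `bin/hadoop login <userid>`."""
    entry = users.get(userid)
    if entry is None:
        return False
    salt, stored = entry
    # Constant-time compare avoids leaking prefix information.
    return hmac.compare_digest(hash_password(password, salt), stored)
```

On success the namenode would then hand the client whatever session state the protocol defines; only the password check itself is shown here.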

The code that I added is kind of crude right now. It can be tested on a large cluster, but I'd rather wait for some more inputs from others who've been working on security or have thoughts around it. If this design is accepted by everyone, I can go ahead and write up the code properly and we can test it thereafter.

Amandeep Khurana wrote:
> Thanks for the feedback Steve.
>
> My response on the points that you have mentioned is written inline below.
>
> [...]
>
> I'll look into the NFS security stuff in detail and then add it later.

The key point about NFS security is there was none: back in the early eighties, the idea of a linux laptop getting on your wifi network was not conceivable, so you really could trust workstations. It was only with PC-NFS that the assumptions started to fail.

> Where did EC2 come into the picture?

It's an example of a place where Hadoop is deployed where the assumptions that only trusted users have network access (and/or only fixed IP addresses can join the cluster) don't hold.

> Yes, the whoami can be bypassed, thats why the whole thing around authentication.
>
> By superuser, I meant the user who starts the hadoop instance... Will make it clearer in the writing.

OK

> I haven't looked at these attacks. This paper is not focussing on that. This can definitely be looked at and incorporated at a later stage. Let's go step by step. (Debatable)

I was just broadening the list of attacks. Spoofing joining the cluster is something to fear.

> SSL is a possible solution but the details aren't the focus of this design. Regarding the other keys, there is a format around which they are created and you don't need a CA for that.

>> It is good for everything to work off one clock (say the namenode's) rather than their own. [...]

NTP is the normal protocol; everyone tries to use it. But asking the NN for its clock would avoid having to rely on everything being in sync at the OS level, and would let the client detect when its clock had drifted too far off for a conversation. One recurrent problem of mine is machines that are on NTP but whose time zones are wrong; they are perfectly accurate to the second but 8 hours out.
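The "ask the NN for its clock" idea amounts to the client keeping an offset rather than trusting its own clock. A minimal sketch (Hadoop has no such RPC; a real version would also account for round-trip delay):

```python
def clock_offset(namenode_now: float, local_now: float) -> float:
    """Offset (seconds) to add to the local clock to approximate the NN's clock."""
    return namenode_now - local_now

def namenode_time(local_now: float, offset: float) -> float:
    """The client's best estimate of the namenode's current time."""
    return local_now + offset

# A host that is NTP-accurate but in the wrong time zone is a whole number
# of hours off; epoch-seconds arithmetic like this is immune to that, and
# the offset also absorbs VM clock drift between resyncs.
offset = clock_offset(namenode_now=1_000_000.0, local_now=1_000_000.0 - 8 * 3600)
```

A large offset (here, exactly 8 hours) is also how the client could detect that its clock is too far gone for the protocol to proceed.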

1. The Jira covers only authentication using Kerberos. I don't think Kerberos is the best way to do it since I feel the scalability is limited. All keys have to be negotiated by the Kerberos server. The design in the paper has a little different protocol for authentication.

2. The Jira doesn't cover the access control aspect of things. As a client, I can skip talking to the NN and get blocks from the DN straight away. There is no way to prevent it. This paper takes care of that aspect as well.

Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz

On Fri, Mar 20, 2009 at 12:54 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:

> Amandeep Khurana wrote:
>> http://www.soe.ucsc.edu/~akhurana/Hadoop_Security.pdf
>
> How does this relate to the current proposal in Jira?
>
> https://issues.apache.org/jira/browse/HADOOP-4343
>
> Doug

I haven't looked into the proposal, but a meta comment:

I don't think there is a real reason for Hadoop to favor this design or stay only with HADOOP-4343 or another proposal at this stage. It is healthy if we have different designs and implementations proceed independently. If you are willing to, I think you should proceed with a prototype so that others interested can play with it. This is true not just for this feature, but many others as well.

This of course should not discourage others from reviewing your design.

Raghu.

Amandeep Khurana wrote:
> Bouncing the thread... Waiting to hear from people about the proposal.
>
> [...]

A related meta comment.

Our community uses X509 for a single-sign-on solution for a few thousand physicists. There's been increased interest in HDFS lately, and it would be very attractive to this community if Hadoop used a lightweight but secure solution based upon Kerberos as in HADOOP-4343 (something like kerberos to initialize a session token and use that with the service).

This would be especially useful because the likely implementation would use JSSE - we'd be able to replace the kerberos implementation and, with a little work, drop the Globus implementation into place. We'd be able to use our single-sign-on and make the organization very happy.

Brian

On Mar 24, 2009, at 11:29 PM, Raghu Angadi wrote:

> I haven't looked into the proposal, but a meta comment:
>
> I don't think there is a real reason for Hadoop to favor this design or only stay with HADOOP-4343 or another proposal at this stage. [...]
>
> Raghu.

Yes, an additional benefit of using Hadoop proprietary "delegation tokens" for delegation as described in HADOOP-4343, as opposed to using Kerberos TGT/service tickets, is that Kerberos is only used at the "edge" of Hadoop. Delegation tokens don't depend on Kerberos and can be coupled with non-Kerberos authentication mechanisms (such as SSL) used at the edge.
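A delegation token of that kind can be sketched as an HMAC over an identifier plus a relative expiry, verifiable by any party holding the master secret with no trip back to the KDC. The field layout and secret handling below are invented for illustration, not Hadoop's real wire format:

```python
import hashlib
import hmac

# Hypothetical namenode master secret; in practice it would be rotated.
MASTER_SECRET = b"rotated-namenode-secret"

def issue_token(owner: str, issued_at: int, lifetime: int) -> tuple:
    """Issued at the 'edge' after Kerberos/SSL auth; expiry is relative."""
    ident = f"{owner}:{issued_at}:{lifetime}".encode()
    mac = hmac.new(MASTER_SECRET, ident, hashlib.sha256).digest()
    return ident, mac

def verify_token(ident: bytes, mac: bytes, now: int) -> bool:
    """Any holder of MASTER_SECRET can verify; Kerberos is not consulted."""
    expected = hmac.new(MASTER_SECRET, ident, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, mac):
        return False
    owner, issued_at, lifetime = ident.decode().split(":")
    return now < int(issued_at) + int(lifetime)
```

Because the token is self-contained, the edge authentication mechanism (Kerberos, SSL, or anything else) can be swapped without touching the token path.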

Kan

On 3/24/09 4:37 PM, "Brian Bockelman" <[EMAIL PROTECTED]> wrote:

> A related meta comment.
>
> Our community uses X509 for a single-sign-on solution for a few thousand physicists. [...]
>
> Brian

Amandeep Khurana wrote:
> 1. The Jira covers only authentication using Kerberos. I dont think Kerberos is the best way to do it since I feel the scalability is limited. All keys have to be negotiated by the Kerberos server.

The design in HADOOP-4343 seeks to minimize the number of key negotiations. Do you think that's insufficient? If so, please add a comment on that issue.

> 2. The Jira doesn't cover the access control aspect of things. As a client, I can skip talking to the NN and get blocks from the DN straight away. There is no way to prevent it. This paper takes care of that aspect as well.

The intent is that access to a block on a datanode will require authentication. Currently it does not, but as security features are added this clearly must change. HADOOP-4343 does not mention how this will be done, but I believe it must be implemented in the same timeframe as namenode authentication.

As Raghu said, the security design for Hadoop is far from complete and your contributions here are very welcome.

> 2. The Jira doesn't cover the access control aspect of things. As a client, I can skip talking to the NN and get blocks from the DN straight away. There is no way to prevent it. This paper takes care of that aspect as well.

Have you looked at HADOOP-4359? In that JIRA, we discussed the idea of using public-key signed capabilities and dismissed it in favor of symmetric-key based capabilities. That said, you're welcome to explore the public-key idea further.

> Have you looked at HADOOP-4359? In that JIRA, we discussed the idea of using public-key signed capabilities and dismissed it in favor of symmetric-key based capabilities. That said, you're welcome to explore the public-key idea further.

Yes, I read through that. The issue with that approach is that the moment a single DN gets compromised somehow (which isn't a big deal in a big system containing 1000s of nodes), the symmetric key gets exposed and the entire system is compromised. The whole idea of asymmetric key crypto is to allow only a single authorized principal to sign stuff.

> The design in HADOOP-4343 seeks to minimize the number of key negotiations. Do you think that's insufficient? If so, please add a comment on that issue.

The NN doing key negotiations is fundamentally not feasible. That's the limitation of Kerberos and there's only a certain degree to which it can be optimized. The design I proposed in the paper is a little different from Kerberos, in that the clients negotiate the keys. This frees up the NN from the responsibility to do this task.

> The intent is that access to a block on a datanode will require authentication. Currently it does not, but as security features are added this clearly must change. HADOOP-4343 does not mention how this will be done, but I believe it must be implemented in the same timeframe as namenode authentication.

Agreed.

> As Raghu said, the security design for Hadoop is far from complete and your contributions here are very welcome.

Got that.

> The intent is that access to a block on a datanode will require authentication. Currently it does not, but as security features are added this clearly must change. HADOOP-4343 does not mention how this will be done, but I believe it must be implemented in the same timeframe as namenode authentication.

We plan to use capability tokens issued by the NN to control accesses to the DN (see HADOOP-4359). If the DN authenticates users, those capability tokens can be made non-transferable. This will improve security since stolen tokens can't be used by the attacker. Another benefit of having authentication is to be able to establish an encrypted communication channel afterwards (if the authentication protocol used supports it). However, I think DN user authentication may not be necessary for many use cases and can be addressed after NN authentication is done.
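The non-transferable part - binding the requesting user into the token so a stolen copy is useless - can be sketched like this. The layout is hypothetical; HADOOP-4359 discusses the real design, and the shared symmetric key reflects the trade-off that JIRA settled on:

```python
import hashlib
import hmac

# Symmetric key shared between namenode and datanodes (per HADOOP-4359's
# chosen trade-off); the value here is a placeholder.
NN_DN_SECRET = b"shared-namenode-datanode-key"

def grant_capability(user: str, block_id: str) -> bytes:
    """Namenode side: the MAC binds the capability to both block and user."""
    return hmac.new(NN_DN_SECRET, f"{user}|{block_id}".encode(),
                    hashlib.sha256).digest()

def datanode_check(authenticated_user: str, block_id: str, cap: bytes) -> bool:
    """Datanode side: only valid for the user the DN actually authenticated."""
    expected = hmac.new(NN_DN_SECRET, f"{authenticated_user}|{block_id}".encode(),
                        hashlib.sha256).digest()
    return hmac.compare_digest(expected, cap)

cap = grant_capability("alice", "blk_42")
```

Because the DN recomputes the MAC over the identity it authenticated, a token stolen from alice does mallory no good.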

> Yes, I read through that. The issue with that approach is that the moment a single DN gets compromised somehow (which isn't a big deal in a big system containing 1000s of nodes), the symmetric key gets exposed and the entire system is compromised. [...]

Yes, I discussed this point in the JIRA. It's a trade-off between security and performance and I think it's worth taking for our cluster setup. In our setup, all the nodes of a cluster are located in the same datacenter and managed in the same way. While securing 1000 nodes is certainly harder than securing one node, it's not like you have 1000 desktops spread around. You're welcome to submit a patch for the public-key solution. It can be useful for some other cluster setups.

> The NN doing key negotiations is fundamentally not feasible. Thats the limitation of Kerberos and there's only a certain degree to which it can be optimized. The design I proposed in the paper is a little different from Kerberos, where the clients negotiate the keys. This frees up the NN from the responsibility to do this task.

You've lost me. What are you referring to when you say key negotiations? As far as I read from your paper, you didn't propose anything new for the authentication between the NN and the user, simply mentioning it will be a Kerberos-like protocol. If you are referring to those capabilities for accessing the DN, those are issued by the NN, right?

> We plan to use capability tokens issued by NN to control accesses to DN (see HADOOP-4359). If DN authenticates users, those capability tokens can be made non-transferable. [...]

Got it. There is no user authentication at the DN. I'm not sure why you got that impression. Authentication is done only once, by the NN. Thereafter it's only capabilities being passed around. However, there are 2 main differences:

1. You plan to use a symmetric key and I proposed an asymmetric key.
2. The authentication protocol you plan to use is Kerberos and I don't think that's scalable. Hence the different one that my paper talks about.

> You've lost me. What are you referring to when you say key negotiations? As far as I read from your paper, you didn't propose anything new for the authentication between the NN and the user, simply mentioning it will be a Kerberos-like protocol. If you are referring to those capabilities for accessing the DN, those are issued by the NN, right?

My bad. I read your doc again and I guess you are referring to the protocol you proposed in the paper for authentication to the datanode using the namenode as a trusted third party. But the namenode is certainly involved in the issuing of the ticket, right? Whereas if you use Kerberos, that task can be off-loaded to the Kerberos KDC.

> Yes, I discussed this point in the JIRA. It's a trade-off between security and performance and I think it's worth taking for our cluster setup. [...]

Makes sense... Performance definitely is a concern, but if you look at the
results that I got out of the basic testing I did, it's really not big.

> Kan

On 3/25/09 1:04 PM, "Kan Zhang" <[EMAIL PROTECTED]> wrote:
> My bad. I read your doc again and I guess you are referring to the
> protocol you proposed in the paper for authenticating to the datanode
> using the namenode as a trusted third party. But the namenode is
> certainly involved in issuing the ticket, right? Whereas if you use
> Kerberos, that task can be off-loaded to the Kerberos KDC.

The NN issues a ticket to a client once, and the client goes ahead and
negotiates the keys. So we don't need a Kerberos KDC, and no other
principal in the system is loaded... At the same time, the NN has full
control over who gets into the system.
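The negotiation step described above — the NN issues a ticket once, and the client then agrees on session keys with each datanode directly — can be sketched as a Diffie-Hellman style exchange. This is only an illustrative toy, not the protocol from the paper: the group parameters, function names, and key-derivation step are all invented for the example, and a real implementation would use standardized group parameters and an authenticated exchange.

```python
import hashlib
import secrets

# Toy Diffie-Hellman group (illustration only; a real deployment would use
# a vetted group, e.g. one of the RFC 3526 MODP groups).
P = 2**127 - 1  # a Mersenne prime, far too small for real security
G = 3

def dh_keypair():
    """Pick a private exponent and compute the public value G^x mod P."""
    private = secrets.randbelow(P - 2) + 1
    public = pow(G, private, P)
    return private, public

def session_key(own_private, peer_public):
    """Derive a symmetric session key from the shared DH secret."""
    shared = pow(peer_public, own_private, P)
    return hashlib.sha256(str(shared).encode()).hexdigest()

# The client and the datanode each contribute one exponent; the namenode
# is not involved in this step -- it only issued the ticket beforehand.
client_priv, client_pub = dh_keypair()
dn_priv, dn_pub = dh_keypair()

assert session_key(client_priv, dn_pub) == session_key(dn_priv, client_pub)
```

This is the sense in which the NN is "freed up": it authorizes the client once, while the per-datanode key material is produced pairwise without any further round trips to the NN or to a KDC.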

Amandeep Khurana wrote:
> On Wed, Mar 25, 2009 at 12:23 PM, Kan Zhang <[EMAIL PROTECTED]> wrote:
>> On 3/25/09 2:49 AM, "Doug Cutting" <[EMAIL PROTECTED]> wrote:
>>>> 2. The Jira doesn't cover the access control aspect of things. As a
>>>> client, I can skip talking to the NN and get blocks from the DN
>>>> straight away. There is no way to prevent it. This paper takes care
>>>> of that aspect as well.
>>>
>>> The intent is that access to a block on a datanode will require
>>> authentication. Currently it does not, but as security features are
>>> added this clearly must change. HADOOP-4343 does not mention how this
>>> will be done, but I believe it must be implemented in the same
>>> timeframe as namenode authentication.
>>
>> We plan to use capability tokens issued by the NN to control accesses
>> to the DN (see HADOOP-4359). If the DN authenticates users, those
>> capability tokens can be made non-transferable. This will improve
>> security since stolen tokens can't be used by the attacker. Another
>> benefit of having authentication is being able to establish an
>> encrypted communication channel afterwards (if the authentication
>> protocol used supports it). However, I think DN user authentication
>> may not be necessary for many use cases and can be addressed after NN
>> authentication is done.
>
> Got it. There is no user authentication at the DN. I'm not sure why you
> got that impression. Authentication is done only once, by the NN.
> Thereafter it's only capabilities being passed around. However, there
> are 2 main differences:
> 1. You plan to use a symmetric key, and I proposed an asymmetric key.
> 2. The authentication protocol you plan to use is Kerberos, and I don't
> think that's scalable. Hence the different one that my paper talks
> about.
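The symmetric-key capability design being contrasted here (the HADOOP-4359 direction) can be sketched roughly as follows. This is an assumption-laden toy, not the actual Hadoop implementation: the claim fields, key, and expiry scheme are invented. What it illustrates is the trade-off raised in the thread: every DN must hold the same MAC key that the NN uses, so one compromised DN yields a token-forging key; in the asymmetric variant, DNs would hold only the NN's public key and could verify tokens but not forge them.

```python
import hashlib
import hmac
import json
import time

# In the symmetric design, this key is shared between the NN and every DN.
# A single compromised DN therefore reveals the token-forging key.
# (Key and field names are invented for this sketch.)
CLUSTER_KEY = b"shared-between-nn-and-all-dns"

def issue_capability(user, block_id, ttl_seconds=600):
    """NN side: bind a user to a block with an expiry, then MAC the claims."""
    claims = {"user": user, "block": block_id,
              "exp": int(time.time()) + ttl_seconds}
    payload = json.dumps(claims, sort_keys=True).encode()
    tag = hmac.new(CLUSTER_KEY, payload, hashlib.sha256).hexdigest()
    return payload, tag

def verify_capability(payload, tag):
    """DN side: check the MAC and the expiry before serving the block."""
    expected = hmac.new(CLUSTER_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, tag):
        return False
    return json.loads(payload)["exp"] > time.time()

payload, tag = issue_capability("amandeep", "blk_12345")
assert verify_capability(payload, tag)
# A client cannot quietly swap in a different block id:
assert not verify_capability(payload.replace(b"blk_12345", b"blk_99999"), tag)
```

MAC generation and verification here are one hash each, which is the performance argument for the symmetric design; swapping `hmac.new` for a public-key signature makes issuing and checking tokens noticeably more expensive, which is the trade-off Kan refers to above.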

Brian's points about X.509 integration are relevant: those come from people who have to worry about trust.

There's a separate issue bubbling up here, and that is US government export rules regarding encryption and the like. Apache already has to deal with that, and has a page covering the status: http://www.apache.org/licenses/exports/

Generally, if you use JSch or the Bouncy Castle implementation of JSSE, then it's not your project's problem. Building security and encryption support more directly into the app is something that needs to be looked at very carefully. It's where legal issues take priority over coding ones.
