hadoop-common-dev mailing list archives

Hi Tianyou -
As was discussed on the pre-summit calls, we were approaching the summit from a clean slate.
Perhaps, that wasn't articulated in the summary of those calls very well - I thought that
it was.
In any case, the agreed upon approach to move forward was to agree on the moving parts that
needed to be worked on, prioritize them and start creating subtasks for them.
Using one of the existing jiras would work for this but using both doesn't make a lot of sense
to me.
My wording regarding the alignment of 9392 and 9533 is regrettable.
The point is that the SSO server instance/s based approach that is now apparent in 9392 is
very much the same thing that 9533 attempted to introduce.
Yes, there are a number of differences in the details of the documents. If
we are starting from a clean slate then it is too early to talk about many of those details.
Part of the difficulty in reconciling the two jiras has been related to having to consume
the whole thing at once and try and agree on all the details of all the components - much
like trying to boil the ocean all at once.
Starting anew allows us to:
1. establish and agree on the components and broad stroke interaction patterns.
2. identify individual pieces to work on and agree on their finer details - this is where
the differences will be rationalized
3. break up the workload and deliver the overall vision
This approach allows us to boil the ocean one pot at a time.
If you would like to keep the jiras separate - which I would see as unfortunate - then the
server instance aspects should be in 9533.
This would include the endpoints used for the flows, the hosting of the pluggable authentication
mechanisms created in 9392, trust relationship management required across instances, etc.
9533 is a jira for a Hadoop SSO Server.
Unfortunately, I believe this approach would leave us exactly where we started.
So, again the discussion points were not really addressed.
It seems that you and Kai have provided your preference for the jira question - though you
have really added another option, which is to keep things the same - which we can make work.
We still need an opinion on the list of components in this thread.
My suggestion is that you take your document and make sure that from a high level all the
major components are represented here.
If not, describe anything else that is needed and why.
We also need to determine the first component to drill down into. Brian and I both see the
HSSO Tokens as central to the implementations of other components, so they should probably be
tackled first.
By the way, this drilling down into the details of each of the components is where we will
rationalize the differences in implementations/approaches.
> Our updated design doc clearly addresses the authorization and proxy flow which are important
for users.
Yes - this is goodness. I don't see the fact that more flows are described as a difference.
Those use cases that are needed by our users will need to be implemented.
Once we get to the components that need to provide for these flows we will need to define
them for that component/s.
> HSSO can continue to be layered on top of TAS via federation.
I don't know what this actually means.
HSSO was to be a SSO server instance that hosts the endpoints for the required flows in acquiring
the necessary tokens.
You would have to explain to me what layering on top of TAS via federation means.
In fact, I don't even want to reference HSSO in this thread anymore - its aspects are represented
in the components list of this thread as the SSO Server Instance.
> Please review the design doc we have uploaded to understand the differences. I am sure
Kai will also add more details about the differences between these JIRAs.
At this point, it is important that you make sure the components represented in this thread
are sufficient for your ideas.
We will not be well served by continuing to compare and contrast.
This thread and those to follow are part of the collaboration process - once the work items
are identified through this thread the collaboration on individual components can certainly
happen in jiras.
If we want a new jira to host this higher level discussion that is fine too.
You should use your work on 9392 within this process to help drive the discussion and definition
of the components identified here.
So…
At this point, I think that we should commit to moving this thread forward and not backward
by pointing to silo'd jiras.
This highest level pass of identifying the components should have been the easy part.
We need to close down on this list and move on to the more challenging discussions of the component
details.
Can we do this?
Is there another approach that folks would like to take here?
thanks,
--larry
On Jul 4, 2013, at 12:19 AM, "Li, Tianyou" <tianyou.li@intel.com> wrote:
> Hi Larry,
>
> I participated in the design discussion at Hadoop Summit. I do not remember any
discussion of abandoning the current JIRAs, which track a lot of good input from others
in the community that is important for us to consider as we move forward with the work. I recommend
we continue to move forward with the two JIRAs that we have each been working
on, as well as the other JIRAs that others in the community continue to work on.
>
> "Your latest design revision actually makes it clear that you are now targeting exactly
what was described as HSSO - so comparing and contrasting is not going to add any value."
> That is not my understanding. As Kai has pointed out in response to your comment on HADOOP-9392,
a lot of these updates predate last week's discussion at the summit. Fortunately the discussion
at the summit was in line with our thinking on the required revisions from discussing with
others in the community prior to the summit. Our updated design doc clearly addresses the
authorization and proxy flow which are important for users. HSSO can continue to be layered
on top of TAS via federation.
>
> "Personally, I think that continuing the separation of 9533 and 9392 will do this effort
a disservice. There doesn't seem to be enough differences between the two to justify separate
jiras anymore."
> Actually I see many key differences between 9392 and 9533. Andrew and Kai have also pointed
out there are key differences when comparing 9392 and 9533. Please review the design doc we
have uploaded to understand the differences. I am sure Kai will also add more details about
the differences between these JIRAs.
>
> The work proposed by us on 9392 addresses additional user needs beyond what 9533 proposes
to implement. We should figure out some of the implementation specifics for those JIRAs so
both of us can keep moving on the code without colliding. Kai has also recommended the same
as his preference in response to your comment on 9392.
>
> Let's work that out as a community of peers so we can all agree on an approach to move
forward collaboratively.
>
> Thanks,
> Tianyou
>
> -----Original Message-----
> From: Larry McCay [mailto:lmccay@hortonworks.com]
> Sent: Thursday, July 04, 2013 4:10 AM
> To: Zheng, Kai
> Cc: common-dev@hadoop.apache.org
> Subject: Re: [DISCUSS] Hadoop SSO/Token Server Components
>
> Hi Kai -
>
> I think that I need to clarify something...
>
> This is not an update for 9533 but a continuation of the discussions that are focused
on a fresh look at a SSO for Hadoop.
> We've agreed to leave our previous designs behind and therefore we aren't really seeing
it as an HSSO layered on top of TAS approach or an HSSO vs TAS discussion.
>
> Your latest design revision actually makes it clear that you are now targeting exactly
what was described as HSSO - so comparing and contrasting is not going to add any value.
>
> What we need you to do at this point, is to look at those high-level components described
on this thread and comment on whether we need additional components or any that are listed
that don't seem necessary to you and why.
> In other words, we need to define and agree on the work that has to be done.
>
> We also need to determine those components that need to be done before anything else
can be started.
> I happen to agree with Brian that #4 Hadoop SSO Tokens are central to all the other components
and should probably be defined and POC'd in short order.
>
> Personally, I think that continuing the separation of 9533 and 9392 will do this effort
a disservice. There doesn't seem to be enough differences between the two to justify separate
jiras anymore. It may be best to file a new one that reflects a single vision without the
extra cruft that has built up in either of the existing ones. We would certainly reference
the existing ones within the new one. This approach would align with the spirit of the discussions
up to this point.
>
> I am prepared to start a discussion around the shape of the two Hadoop SSO tokens: identity
and access. If this is what others feel the next topic should be.
> If we can identify a jira home for it, we can do it there - otherwise we can create another
DISCUSS thread for it.
>
> thanks,
>
> --larry
>
>
> On Jul 3, 2013, at 2:39 PM, "Zheng, Kai" <kai.zheng@intel.com> wrote:
>
>> Hi Larry,
>>
>> Thanks for the update. Good to see that with this update we are now aligned on most
points.
>>
>> I have also updated our TokenAuth design in HADOOP-9392. The new revision incorporates
feedback and suggestions in related discussion with the community, particularly from Microsoft
and others attending the Security design lounge session at the Hadoop summit. Summary of the
changes:
>> 1. Revised the approach to now use two tokens, Identity Token plus Access Token,
particularly considering our authorization framework and compatibility with HSSO;
>> 2. Introduced Authorization Server (AS) from our authorization framework into
the flow that issues access tokens for clients with identity tokens to access services;
>> 3. Refined proxy access token and the proxy/impersonation flow;
>> 4. Refined the browser web SSO flow regarding access to Hadoop web services;
>> 5. Added Hadoop RPC access flow regarding CLI clients accessing Hadoop services
via RPC/SASL;
>> 6. Added client authentication integration flow to illustrate how desktop logins
can be integrated into the authentication process to TAS to exchange identity token;
>> 7. Introduced fine grained access control flow from authorization framework, I
have put it in appendices section for the reference;
>> 8. Added a detailed flow to illustrate Hadoop Simple authentication over TokenAuth,
in the appendices section;
>> 9. Added secured task launcher in appendices as possible solutions for Windows
platform;
>> 10. Moved low level content and less relevant parts from the main body into the
appendices section.
>>
>> As we all think about how to layer HSSO on TAS in TokenAuth framework, please take
some time to look at the doc and then let's discuss the gaps we might have. I would like to
discuss these gaps with a focus on the implementation details so we are all moving towards
getting code done. Let's continue this part of the discussion in HADOOP-9392 to allow for
better tracking on the JIRA itself. For discussions related to Centralized SSO server, suggest
we continue to use HADOOP-9533 to consolidate all discussion related to that JIRA. That way
we don't need extra umbrella JIRAs.
>>
>> I agree we should speed up these discussions and agree on some of the implementation
specifics so both of us can get moving on the code while not stepping on each other in our work.
>>
>> Look forward to your comments and comments from others in the community. Thanks.
>>
>> Regards,
>> Kai
>>
>> -----Original Message-----
>> From: Larry McCay [mailto:lmccay@hortonworks.com]
>> Sent: Wednesday, July 03, 2013 4:04 AM
>> To: common-dev@hadoop.apache.org
>> Subject: [DISCUSS] Hadoop SSO/Token Server Components
>>
>> All -
>>
>> As a follow up to the discussions that were had during Hadoop Summit, I would like
to introduce the discussion topic around the moving parts of a Hadoop SSO/Token Service.
>> There are a couple of related Jiras that can be referenced and may or may not be
updated as a result of this discuss thread.
>>
>> https://issues.apache.org/jira/browse/HADOOP-9533
>> https://issues.apache.org/jira/browse/HADOOP-9392
>>
>> As the first aspect of the discussion, we should probably state the overall goals
and scoping for this effort:
>> * An alternative authentication mechanism to Kerberos for user
>> authentication
>> * A broader capability for integration into enterprise identity and
>> SSO solutions
>> * Possibly the advertisement/negotiation of available authentication
>> mechanisms
>> * Backward compatibility for the existing use of Kerberos
>> * No (or minimal) changes to existing Hadoop tokens (delegation, job,
>> block access, etc)
>> * Pluggable authentication mechanisms across: RPC, REST and webui
>> enforcement points
>> * Continued support for existing authorization policy/ACLs, etc
>> * Keeping more fine grained authorization policies in mind - like attribute based
access control
>> - fine grained access control is a separate but related effort that
>> we must not preclude with this effort
>> * Cross cluster SSO
>>
>> In order to tease out the moving parts here are a couple high level and simplified
descriptions of SSO interaction flow:
>>                        +------+
>> +------+ credentials 1 | SSO  |
>> |CLIENT|-------------->|SERVER|
>> +------+ :tokens       +------+
>>   2 |
>>     | access token
>>     V :requested resource
>> +-------+
>> |HADOOP |
>> |SERVICE|
>> +-------+
>>
>> The above diagram represents the simplest interaction model for an SSO service in
Hadoop.
>> 1. client authenticates to SSO service and acquires an access token
>>   a. client presents credentials to an authentication service endpoint
>>      exposed by the SSO server (AS) and receives a token representing the
>>      authentication event and verified identity
>>   b. client then presents the identity token from 1.a. to the token
>>      endpoint exposed by the SSO server (TGS) to request an access token
>>      to a particular Hadoop service and receives an access token
>> 2. client presents the Hadoop access token to the Hadoop service for
>>    which the access token has been granted and requests the desired
>>    resource or services
>>   a. access token is presented as appropriate for the service endpoint
>>      protocol being used
>>   b. Hadoop service token validation handler validates the token and
>>      verifies its integrity and the identity of the issuer
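The simple flow above (steps 1.a, 1.b and 2) can be sketched in a few lines of Python. This is
an illustration only, not a proposed design: the names SsoServer, authenticate,
request_access_token and HadoopService, the JSON-plus-HMAC token layout, the claim names, and
the single shared secret are all assumptions made for this sketch. The thread has not settled on
a token format, and PKI-based signatures are the more likely trust mechanism.

```python
import base64
import hashlib
import hmac
import json


def _sign(key: bytes, claims: dict) -> str:
    """Serialize claims and append an HMAC-SHA256 tag (hypothetical token layout)."""
    body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    tag = hmac.new(key, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + tag


def _verify(key: bytes, token: str) -> dict:
    """Return the claims if the tag matches, otherwise raise ValueError."""
    body, _, tag = token.rpartition(".")
    expected = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag.encode(), expected.encode()):
        raise ValueError("token failed integrity check")
    return json.loads(base64.urlsafe_b64decode(body.encode()))


class SsoServer:
    """Stand-in for the SSO server exposing the AS and TGS style endpoints."""

    def __init__(self, key: bytes, users: dict):
        self._key = key
        self._users = users  # username -> password; placeholder credential store

    def authenticate(self, username: str, password: str) -> str:
        """Step 1.a: credentials in, identity token out."""
        stored = self._users.get(username)
        if stored is None or not hmac.compare_digest(stored.encode(), password.encode()):
            raise PermissionError("bad credentials")
        return _sign(self._key, {"type": "identity", "sub": username, "iss": "sso"})

    def request_access_token(self, identity_token: str, service: str) -> str:
        """Step 1.b: identity token in, access token scoped to one service out."""
        claims = _verify(self._key, identity_token)
        if claims.get("type") != "identity":
            raise ValueError("identity token required")
        return _sign(self._key, {"type": "access", "sub": claims["sub"],
                                 "iss": "sso", "aud": service})


class HadoopService:
    """Stand-in for a Hadoop service with a token validation handler."""

    def __init__(self, name: str, key: bytes):
        self.name = name
        self._key = key

    def handle(self, access_token: str, resource: str) -> str:
        """Step 2: check integrity, issuer and audience, then serve the resource."""
        claims = _verify(self._key, access_token)
        if (claims.get("type") != "access" or claims.get("iss") != "sso"
                or claims.get("aud") != self.name):
            raise PermissionError("token not valid for this service")
        return f"{claims['sub']} read {resource} from {self.name}"


if __name__ == "__main__":
    key = b"demo-shared-secret"
    sso = SsoServer(key, {"alice": "s3cret"})
    hdfs = HadoopService("hdfs", key)
    identity = sso.authenticate("alice", "s3cret")
    access = sso.request_access_token(identity, "hdfs")
    print(hdfs.handle(access, "/data/part-0"))
```

In this sketch an access token issued for one service is rejected by any other service, which
mirrors the "access token to a particular Hadoop service" wording in step 1.b.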
>>
>> +------+
>> | IdP  |
>> +------+
>>   1 ^ credentials
>>     | :idp_token
>>     |                  +------+
>> +------+ idp_token 2   | SSO  |
>> |CLIENT|-------------->|SERVER|
>> +------+ :tokens       +------+
>>   3 |
>>     | access token
>>     V :requested resource
>> +-------+
>> |HADOOP |
>> |SERVICE|
>> +-------+
>>
>>
>> The above diagram represents a slightly more complicated interaction model for an
SSO service in Hadoop that removes Hadoop from the credential collection business.
>> 1. client authenticates to a trusted identity provider within the
>>    enterprise and acquires an IdP specific token
>>   a. client presents credentials to an enterprise IdP and receives a
>>      token representing the authentication identity
>> 2. client authenticates to SSO service and acquires an access token
>>   a. client presents idp_token to an authentication service endpoint
>>      exposed by the SSO server (AS) and receives a token representing the
>>      authentication event and verified identity
>>   b. client then presents the identity token from 2.a. to the token
>>      endpoint exposed by the SSO server (TGS) to request an access token
>>      to a particular Hadoop service and receives an access token
>> 3. client presents the Hadoop access token to the Hadoop service for
>>    which the access token has been granted and requests the desired
>>    resource or services
>>   a. access token is presented as appropriate for the service endpoint
>>      protocol being used
>>   b. Hadoop service token validation handler validates the token and
>>      verifies its integrity and the identity of the issuer
>>
>> Considering the above set of goals and high level interaction flow description, we
can start to discuss the component inventory required to accomplish this vision:
>>
>> 1. SSO Server Instance: this component must be able to expose endpoints for both
authentication of users by collecting and validating credentials and federation of identities
represented by tokens from trusted IdPs within the enterprise. The endpoints should be composable
so as to allow for multifactor authentication mechanisms. They will also need to return tokens
that represent the authentication event and verified identity as well as access tokens for
specific Hadoop services.
>>
>> 2. Authentication Providers: pluggable authentication mechanisms must be easily created
and configured for use within the SSO server instance. They will ideally allow the enterprise
to plug in their preferred components from off the shelf as well as provide custom providers.
Supporting existing standards for such authentication providers should be a top priority concern.
There are a number of standard approaches in use in the Java world: JAAS loginmodules, servlet
filters, JASPIC authmodules, etc. A pluggable provider architecture that allows the enterprise
to leverage existing investments in these technologies and existing skill sets would be ideal.
>>
>> 3. Token Authority: a token authority component would need to have the ability to
issue, verify and revoke tokens. This authority will need to be trusted by all enforcement
points that need to verify incoming tokens. Using something like PKI for establishing trust
will be required.
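To make the issue/verify/revoke responsibilities of a token authority concrete, here is a rough
Python sketch. Everything in it is an assumption made for illustration: the TokenAuthority class,
its method names, the claim names (jti, sub, exp), the "payload|tag" layout and the in-memory
revocation set are hypothetical. HMAC with a shared secret is used only to keep the example
self-contained; it stands in for the PKI-based signatures anticipated above.

```python
import hashlib
import hmac
import json
import secrets
import time
from typing import Optional


class TokenAuthority:
    """Hypothetical authority that issues, verifies and revokes tokens."""

    def __init__(self, key: bytes, lifetime_seconds: int = 300):
        self._key = key
        self._lifetime = lifetime_seconds
        self._revoked = set()  # ids of tokens revoked before their expiry

    def _tag(self, payload: str) -> str:
        # HMAC-SHA256 stands in for a PKI signature in this sketch.
        return hmac.new(self._key, payload.encode(), hashlib.sha256).hexdigest()

    def issue(self, subject: str, now: Optional[float] = None) -> str:
        """Create a signed token with a random id and an expiry time."""
        now = time.time() if now is None else now
        claims = {"jti": secrets.token_hex(8), "sub": subject,
                  "exp": now + self._lifetime}
        payload = json.dumps(claims, sort_keys=True)
        return payload + "|" + self._tag(payload)

    def verify(self, token: str, now: Optional[float] = None) -> dict:
        """Check signature, expiry and revocation; return the claims."""
        now = time.time() if now is None else now
        payload, _, tag = token.rpartition("|")
        if not hmac.compare_digest(tag.encode(), self._tag(payload).encode()):
            raise ValueError("bad signature")
        claims = json.loads(payload)
        if claims["exp"] <= now:
            raise ValueError("token expired")
        if claims["jti"] in self._revoked:
            raise ValueError("token revoked")
        return claims

    def revoke(self, token: str, now: Optional[float] = None) -> None:
        """Record the id of a currently valid token as revoked."""
        self._revoked.add(self.verify(token, now)["jti"])


if __name__ == "__main__":
    authority = TokenAuthority(b"demo-shared-secret")
    token = authority.issue("alice")
    print(authority.verify(token)["sub"])
    authority.revoke(token)
    try:
        authority.verify(token)
    except ValueError as err:
        print(err)
```

One design question the sketch glosses over: here the authority holds the revocation list
centrally, whereas enforcement points that verify tokens locally would need some way to learn
about revocations, or would have to rely on short token lifetimes instead.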
>>
>> 4. Hadoop SSO Tokens: the exact shape and form of the sso tokens will need to be
considered in order to determine the means by which trust and integrity are ensured while
using them. There may be some abstraction of the underlying format provided through interface
based design but all token implementations will need to have the same attributes and capabilities
in terms of validation and cryptographic verification.
>>
>> 5. SSO Protocol: the lowest common denominator protocol for SSO server interactions
across client types would likely be REST. Depending on the REST client in use it may require
explicitly coding to the token flow described in the earlier interaction descriptions or a
plugin may be provided for things like HTTPClient, curl, etc. RPC clients will have this taken
care of for them within the SASL layer and will leverage the REST endpoints as well. This likely
implies trust requirements for the RPC client to be able to trust the SSO server's identity
cert that is presented over SSL.
>>
>> 6. REST Client Agent Plugins: required for encapsulating the interaction with the
SSO server for the client programming models. We may need these for many client types: e.g.
Java, JavaScript, .Net, Python, cURL etc.
>>
>> 7. Server Side Authentication Handlers: the server side of the REST, RPC or webui
connection will need to be able to validate and verify the incoming Hadoop tokens in order
to grant or deny access to requested resources.
>>
>> 8. Credential/Trust Management: throughout the system - on client and server sides
- we will need to manage and provide access to PKI and potentially shared secret artifacts
in order to establish the required trust relationships to replace the mutual authentication
that would be otherwise provided by using kerberos everywhere.
>>
>> So, discussion points:
>>
>> 1. Are there additional components that would be required for a Hadoop SSO service?
>> 2. Should any of the above described components be considered not actually necessary
or poorly described?
>> 3. Should we create a new umbrella Jira to identify each of these as a subtask?
>> 4. Should we just continue to use 9533 for the SSO server and add additional subtasks?
>> 5. What are the natural seams of separation between these components and any dependencies
between one and another that affect priority?
>>
>> Obviously, each component that we identify will have a jira of its own - more than
likely - so we are only trying to identify the high level descriptions for now.
>>
>> Can we try and drive this discussion to a close by the end of the week? This will
allow us to start breaking out into component implementation plans.
>>
>> thanks,
>>
>> --larry
>