Kan Zhang
added a comment - 13/Nov/09 21:31 The current thinking on the fields to include in the tokenID is as follows.
tokenID = {issuerID, ownerID, renewerID, issueDate, maxLifetime, serialNo, masterKeyID}
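
As a purely hypothetical illustration (class name, field types, and encoding are assumptions, not part of the patch), serializing these tokenID fields into an opaque byte array might look like:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch of serializing the proposed tokenID fields.
// Field types and ordering are assumptions for illustration only.
public class TokenIdSketch {
    public static byte[] serialize(String issuerID, String ownerID, String renewerID,
                                   long issueDate, long maxLifetime,
                                   int serialNo, int masterKeyID) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeUTF(issuerID);
        out.writeUTF(ownerID);
        out.writeUTF(renewerID);
        out.writeLong(issueDate);   // e.g. System.currentTimeMillis() at issue time
        out.writeLong(maxLifetime); // maximum lifetime, e.g. in milliseconds
        out.writeInt(serialNo);
        out.writeInt(masterKeyID);
        out.flush();
        return buf.toByteArray();   // opaque bytes; only the server decodes them
    }
}
```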

Attaching a preliminary patch to facilitate discussions on how code should be structured and what interfaces should be adopted. It's incomplete, but should suffice for a high-level discussion.

A few new classes are introduced in the patch. DelegationTokenHandler is the main class. It is responsible for choosing master keys and updating them. It also uses master keys to generate, regenerate, or verify tokens. Master keys are not exposed; one must invoke methods on a DelegationTokenHandler for any operation involving master keys. A DelegationTokenHandler is only needed on the server side. The client uses tokens generated by a handler and does not require access to the handler.
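
A minimal sketch of the handler idea (class and method names are assumed, and the real patch may differ): the handler holds the master key internally and computes or verifies tokenAuth without ever exposing the key to callers:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;
import java.util.Arrays;

// Hypothetical sketch: the handler owns the master key and its Mac;
// callers pass token bytes in and get tokenAuth bytes out.
public class DelegationTokenHandlerSketch {
    private final Mac mac; // shared Mac, guarded by synchronization

    public DelegationTokenHandlerSketch(byte[] masterKey) throws GeneralSecurityException {
        mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(masterKey, "HmacSHA1"));
    }

    // Generate tokenAuth = HMAC(masterKey, tokenID).
    public synchronized byte[] generate(byte[] tokenID) {
        return mac.doFinal(tokenID);
    }

    // Verify by recomputing tokenAuth; the key never leaves the handler.
    public synchronized boolean verify(byte[] tokenID, byte[] tokenAuth) {
        return Arrays.equals(mac.doFinal(tokenID), tokenAuth);
    }
}
```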

DelegationToken is the token object generated by the handler and sent to the client over the wire. It supports serialization and should contain the methods the client needs in order to use the token. A token has two fields, tokenID and tokenAuth. When used for authentication, tokenID is effectively the username and tokenAuth is the password. The client doesn't need to understand what is encoded in the tokenID; both fields are opaque to the user. TokenID can encode as many data items as necessary, and the server needs to be able to encode and decode those items. That logic is captured in the handler. We may add different types of tokens in the future, and each type will have its own handler. Tokens generated by different handlers may have different types but otherwise look the same to the client.
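
To make the two-field layout concrete, here is a hypothetical sketch (the Writable-style write/readFields signatures are an assumption based on the discussion, not the actual patch):

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical sketch of the two-field token; both fields are opaque
// byte arrays from the client's point of view.
public class DelegationTokenSketch {
    private byte[] tokenID;   // effectively the username during authentication
    private byte[] tokenAuth; // effectively the password (e.g. an HMAC)

    public DelegationTokenSketch(byte[] tokenID, byte[] tokenAuth) {
        this.tokenID = tokenID;
        this.tokenAuth = tokenAuth;
    }

    // Writable-style serialization for sending the token over the wire.
    public void write(DataOutput out) throws IOException {
        out.writeInt(tokenID.length);
        out.write(tokenID);
        out.writeInt(tokenAuth.length);
        out.write(tokenAuth);
    }

    public void readFields(DataInput in) throws IOException {
        tokenID = new byte[in.readInt()];
        in.readFully(tokenID);
        tokenAuth = new byte[in.readInt()];
        in.readFully(tokenAuth);
    }

    public byte[] getTokenID()   { return tokenID; }
    public byte[] getTokenAuth() { return tokenAuth; }
}
```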

Note that the client needs to be able to find out who the tokenIssuer is so that it can pick the right token from its cache (a token can only be used against its issuing server). A tokenIssuer field or getTokenIssuer() method needs to be added to DelegationToken or its superclass Token. Let's ignore this issue for now.

An alternative design is to break tokenID into individual data items and store those items as separate fields inside the token; tokenID can then be computed on demand from those fields. The benefit is that users can retrieve those fields without deserializing. However, apart from tokenIssuer mentioned above and possibly expirationDate, it's unclear whether the client needs to know anything more about the token. Moreover, different types of tokens may require different fields, and it would be harder to abstract a common Token class than in the opaque model. Maybe we should just separate out tokenIssuer and expirationDate, and put everything else into an opaque field that only the handler understands.
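
The hybrid suggested at the end of this paragraph could look like the following (purely illustrative; all names are assumptions):

```java
// Hypothetical sketch of the hybrid layout: the two items the client
// actually needs are explicit fields; everything else stays opaque.
public class HybridTokenSketch {
    private final String tokenIssuer;   // lets the client pick the right cached token
    private final long expirationDate;  // lets the client discard expired tokens
    private final byte[] opaqueID;      // remaining tokenID items; only the handler decodes these
    private final byte[] tokenAuth;

    public HybridTokenSketch(String tokenIssuer, long expirationDate,
                             byte[] opaqueID, byte[] tokenAuth) {
        this.tokenIssuer = tokenIssuer;
        this.expirationDate = expirationDate;
        this.opaqueID = opaqueID;
        this.tokenAuth = tokenAuth;
    }

    public String getTokenIssuer() { return tokenIssuer; }

    public boolean isExpired(long now) { return now > expirationDate; }

    public byte[] getOpaqueID()  { return opaqueID; }
    public byte[] getTokenAuth() { return tokenAuth; }
}
```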

Another alternative is to move the operations that use master keys from DelegationTokenHandler to DelegationToken, making the handler a key storage facility that doesn't care about how keys are used. However, the keys are needed to initialize Mac objects, which are then used to compute tokenAuth, and it's better to share a single Mac object per key. The handler provides a convenient place to synchronize access to the shared Mac object. Conceptually, a token is a credential; once it's generated, it doesn't change. The client that uses a token never needs to recompute tokenAuth or access the key. The server does need access to the key, and it seems a good idea to hide the key in a single place - the handler, where all the key management happens.
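
The shared-Mac point can be sketched as follows (a hypothetical fragment; it assumes one Mac per master key, with access serialized by locking on that Mac):

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one shared Mac per master key, created once and
// synchronized on for each use, so keys never leave the handler.
public class SharedMacSketch {
    private final Map<Integer, Mac> macs = new HashMap<Integer, Mac>();

    public void addKey(int keyID, byte[] key) throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(key, "HmacSHA1"));
        macs.put(keyID, mac);
    }

    // Mac is not thread-safe, so serialize access to the shared instance.
    public byte[] computeAuth(int keyID, byte[] tokenID) {
        Mac mac = macs.get(keyID);
        synchronized (mac) {
            return mac.doFinal(tokenID);
        }
    }
}
```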

Kan Zhang
added a comment - 13/Nov/09 21:37
Thoughts?

Owen O'Malley
added a comment - 22/Nov/09 21:46 Here is a rough sketch of what I'd propose. The relevant differences:
1. The TokenIdentifiers are pulled out and made into classes. Each kind of Token will define three classes:
a. The TokenIdentifier class that contains the fields of the token.
b. The TokenPicker class searches through the tokens in a user's Subject to find the token for a given RPC connection.
c. The SecretManager class handles the secrets that are used to create and validate the tokens.
2. The serialization of each of the TokenIdentifiers is done via standard Writable interfaces.
3. The client-side Tokens are not sub-classed. They just contain bytes for the serialized token identifier and corresponding password. They also have a "kind", which represents what kind of token they are, and "service", which represents which instance of the service the token is for. For HDFS delegation tokens, they would be "hdfs.delegation" and "$namenode:$port".
4. The TokenIdentifiers are subclassed and store their values as explicit fields, which makes using their values much easier. It will also be easier to move over to Avro when our RPC supports it. That will simplify adding versioning to the token identifiers.
5. Using thread-local Macs means that the servers don't need to hold a global lock while they compute the HMAC-SHA1.
6. Dividing up the token handlers into SecretManagers means that all of the common code for interfacing to SASL will be shared.
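
Point 5 above can be sketched like this (hypothetical; it assumes HMAC-SHA1 as in the earlier discussion, and the method name is illustrative):

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;

// Hypothetical sketch of point 5: each thread gets its own Mac, so no
// global lock is held while computing the HMAC-SHA1.
public class ThreadLocalMacSketch {
    private static final ThreadLocal<Mac> MAC = new ThreadLocal<Mac>() {
        @Override protected Mac initialValue() {
            try {
                return Mac.getInstance("HmacSHA1");
            } catch (GeneralSecurityException e) {
                throw new IllegalStateException("HmacSHA1 unavailable", e);
            }
        }
    };

    // Initialize this thread's Mac with the given key and compute the password.
    public static byte[] createPassword(byte[] identifier, byte[] key)
            throws GeneralSecurityException {
        Mac mac = MAC.get();
        mac.init(new SecretKeySpec(key, "HmacSHA1"));
        return mac.doFinal(identifier);
    }
}
```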

Owen O'Malley
added a comment - 02/Dec/09 06:49 The generic parts of this should be done as MAPREDUCE-1250 .
Comments on the patch:
1. The algorithm name should be protected, not public.
2. The key generator should be a non-static field and the method should synchronize on the key generator.
3. In JobTokenSecretManager, the secret key should not be a static, but a non-static field.
4. Don't bother with the WritableFactory for DelegationTokenIdentifier.
5. The generic interface should be in common in org.apache.hadoop.security.token.
6. Delegation tokens should be in HDFS in org.apache.hadoop.hdfs.security.token.delegation.
7. Job tokens should be in MapReduce in org.apache.hadoop.mapreduce.security.token.job.
8. The thread should catch Throwable at the top level and stop the server.
9. The thread should catch InterruptedException at the top level and return immediately.
10. activate should be renamed startThreads() and stopTokenReceiver should be renamed stopThreads().
11. Keep the SecretKey details in the various SecretManagers rather than spreading them out.
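
Point 2 above could look like this (a hypothetical fragment; the class name is illustrative, and KeyGenerator usage is assumed):

```java
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of comment 2: the key generator is a non-static
// field, and key generation synchronizes on it because KeyGenerator is
// not guaranteed to be thread-safe.
public class KeyGenSketch {
    private final KeyGenerator keyGen; // per-instance, not static

    public KeyGenSketch() throws NoSuchAlgorithmException {
        keyGen = KeyGenerator.getInstance("HmacSHA1");
    }

    public SecretKey generateSecret() {
        synchronized (keyGen) {
            return keyGen.generateKey();
        }
    }
}
```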

Devaraj Das
added a comment - 11/Aug/11 19:10 Eli, the delegation token feature is in 20-security and in trunk as part of other JIRAs. I don't have that list handy, but I am closing this one as a duplicate.

Devaraj Das
added a comment - 11/Aug/11 21:22 The implementation is not very different.
The SecretManager in particular is in trunk at common/src/java/org/apache/hadoop/security/token/SecretManager.java.