
What do we want from secure computer systems? Here is a
reasonable goal:

Computers are as
secure as real world systems, and people believe it.

Most real world systems are not very secure by the absolute
standard suggested above. It’s easy to break into someone’s house. In fact, in
many places people don’t even bother to lock their houses, although in Manhattan
they may use two or three locks on the front door. It’s fairly easy to steal
something from a store. You need very little technology to forge a credit card,
and it’s quite safe to use a forged card at least a few times.

Real security is
about punishment, not about locks; about accountability, not access control

Why do people live with such poor security in real world
systems? The reason is that real world security is not about perfect defenses
against determined attackers. Instead, it’s about

·value,

·locks, and

·punishment.

The bad guys balance the value of what they gain against the
risk of punishment, which is the cost of punishment times the probability of
getting punished. The main thing that makes real world systems sufficiently
secure is that bad guys who do break in are caught and punished often enough to
make a life of crime unattractive. The purpose of locks is not to provide
absolute security, but to prevent casual intrusion by raising the threshold for
a break-in.

Security is about
risk management

Well, what’s wrong with perfect defenses? The answer is
simple: they cost too much. There is a good way to protect personal belongings
against determined attackers: put them in a safe deposit box. After 100 years
of experience, banks have learned how to use steel and concrete, time locks,
alarms, and multiple keys to make these boxes quite secure. But they are both
expensive and inconvenient. As a result, people use them only for things that
are seldom needed and either expensive or hard to replace.

Practical security balances the cost of protection and the
risk of loss, which is the cost of recovering from a loss times its
probability. Usually the probability is fairly small (because the risk of punishment
is high enough), and therefore the risk of loss is also small. When the risk is
less than the cost of recovering, it’s better to accept it as a cost of doing
business (or a cost of daily living) than to pay for better security. People
and credit card companies make these decisions every day.
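To make the arithmetic concrete, here is a minimal Python sketch of this risk calculation; the numbers are invented for illustration and are not from any real analysis.

    # Risk of loss = cost of recovering from a loss times its probability.
    def risk_of_loss(recovery_cost, probability):
        return recovery_cost * probability

    protection_cost = 100.0   # annual cost of better security (assumed)
    recovery_cost = 5000.0    # cost of recovering from one loss (assumed)
    probability = 0.01        # annual probability of a loss (assumed)

    risk = risk_of_loss(recovery_cost, probability)   # 50.0
    if risk < protection_cost:
        print("accept the risk as a cost of doing business")
    else:
        print("pay for better security")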

With computers, on the other hand, security is only a matter
of software, which is cheap to manufacture, never wears out, and can’t be
attacked with drills or explosives. This makes it easy to drift into thinking
that computer security can be perfect, or nearly so. The fact that work on
computer security has been dominated by the needs of national security has made
this problem worse. In this context the stakes are much higher and there are no
police or courts available to punish attackers, so it’s more important not to
make mistakes. Furthermore, computer security has been regarded as an offshoot
of communication security, which is based on cryptography. Since cryptography
can be nearly perfect, it’s natural to think that computer security can be as
well.

What’s wrong with this reasoning? It ignores two critical
facts:

·Secure systems are complicated, hence imperfect.

·Security gets in the way of other things you
want.

The end result should not be surprising. We don’t have
“real” security that guarantees to stop bad things from happening, and the main
reason is that people don’t buy it. They don’t buy it because the danger is
small, and because security is a pain.

·Since the danger is small, people prefer to buy
features. A secure system has fewer features because it has to be implemented
correctly. This means that it takes more time to build, so naturally it lacks
the latest features.

·Security is a pain because it stops you from
doing things, and you have to do work to authenticate yourself and to set it
up.

A secondary reason we don’t have “real” security is that
systems are complicated, and therefore both the code and the setup have bugs
that an attacker can exploit. This is the reason that gets all the attention,
but it is not the heart of the problem.

1 Implementing security

The job of computer security is to defend against vulnerabilities.
These take three main forms:

1) Bad (buggy or hostile) programs.

2) Bad (careless or hostile) agents, either programs or people, giving bad instructions to good but gullible programs.

3) Bad agents tapping or spoofing communications.

Case (2) can be cascaded through several levels of gullible
agents. Clearly agents that might get instructions from bad agents must be
prudent, or even paranoid, rather than gullible.

Broadly speaking, there are five defensive strategies:

1) Coarse: Isolate—keep everybody out. It provides the best security, but it keeps you from using information or services from others, and from providing them to others. This is impractical for all but a few applications.

2) Medium: Exclude—keep the bad guys out. It’s all right for programs inside this defense to be gullible. Code signing and firewalls do this.

3) Fine: Restrict—let the bad guys in, but keep them from doing damage. Sandboxing does this, whether the traditional kind provided by an operating system process, or the modern kind in a Java virtual machine. Sandboxing typically involves access control on resources to define the holes in the sandbox. Programs accessible from the sandbox must be paranoid; it’s hard to get this right.

4) Recover—undo the damage. Backup systems and restore points are examples. This doesn’t help with secrecy, but it helps a lot with integrity and availability.

5) Punish—catch the bad guys and prosecute them. Auditing and police do this.

The well-known access control model shown in Figure 1 provides the framework for these strategies. In this
model, a guard controls the access of requests for service to valued resources,
which are usually encapsulated in objects. The guard’s job is to decide whether
the source of the request, called a principal, is allowed to do the
operation on the object. To decide, it uses two kinds of information: authentication
information from the left, which identifies the principal who made the request,
and authorization information from the right, which says who is allowed
to do what to the object. There are many ways to make this division. The reason
for separating the guard from the object is to keep it simple.

Of course security still depends on the object to implement
its methods correctly. For instance, if a file’s read method
changes its data, or the write method fails to debit the quota,
or either one touches data in other files, the system is insecure in spite of
the guard.
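As an illustration, here is a minimal Python sketch of this model: a guard that authenticates the principal, authorizes the operation, and audits the decision, separated from the object it protects. The classes and the ACL format are assumptions made for the example, not part of any real system.

    class Guard:
        def __init__(self, acl):
            self.acl = acl            # authorization info: set of (principal, operation)
            self.audit_log = []       # auditing: record every decision

        def check(self, principal, operation, obj):
            allowed = (principal, operation) in self.acl
            self.audit_log.append((principal, operation, obj, allowed))
            return allowed

    class File:
        def __init__(self, data, guard):
            self.data = data
            self.guard = guard        # isolation: all access goes through the guard

        def read(self, principal):
            if not self.guard.check(principal, "read", self):
                raise PermissionError("access denied")
            return self.data          # a correct read must not change the data

    guard = Guard(acl={("Alice@Intel", "read")})
    f = File("report contents", guard)
    print(f.read("Alice@Intel"))      # allowed; the decision is also audited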

Another model is sometimes used when secrecy in the face of
bad programs is a primary concern: the information flow control model shown
in Figure
2 [‎5].
This is roughly a dual of the access control model, in which the guard decides
whether information can flow to a principal.

In either model, there are three basic mechanisms for implementing
security. Together, they form the gold standard for security (since they all
begin with Au):

·Authenticating principals,
answering the question “Who said that?” or “Who is getting that information?”.
Usually principals are people, but they may also be groups, machines, or
programs.

·Authorizing access,
answering the question “Who is trusted to do which operations on this object?”.

·Auditing the decisions of
the guard, so that later it’s possible to figure out what happened and why.

2 Access control

Figure
1 shows the overall model for access control. It
says that principals make requests on objects; this is the basic paradigm of object-oriented programming
or of services. The job of security is to decide whether a particular request
is allowed; this is done by the guard,
which needs to know who is making the request (the principal), what the request
is, and what the target of the request is (the object). The guard is often
called the relying party, since it relies on the information in the
request and in policy to make its decision. Because all trust is local, the
guard has the final say about how to interpret all the incoming information. For
the guard to do its job it needs to see every request on the object; to ensure
this the object is protected by an isolation
boundary that blocks all access to the object except over a channel that
passes through the guard. There are many ways to implement principals,
requests, objects and isolation, but this abstraction works for all of them.

Access Control: Access
control is broken down into authentication, authorization, and auditing.

Policy and User Model:
Access control policy is set by human beings—sometimes trained, sometimes
not.

This paper addresses one piece of the security model: access
control. It gives an overview that extends from setting authentication policy
through authenticating a request to the mechanics of checking access. It then
discusses the major elements of authentication and authorization in turn.

Every action that requires a security decision, whether it
is a user command, a system call, or the processing of a message from the net,
is represented in the model of Figure 1 as a request from a principal over a channel. Each
request must pass through a guard or relying party that makes an access control
decision. That decision consists of a series of steps:

1. Do direct authentication, which establishes the principal directly making the request. The most common example of this is verifying a cryptographic signature on a message; in this case the principal is the cryptographic key that verifies the signature. Another example is accepting input from the keyboard, which is the principal directly making the request.[2]

2. (Optionally) Associate one or more other principals with the principal of step 1. These could be groups or attributes.

3. Do authorization, which determines whether any of these principals is allowed to have the request fulfilled on that object.
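The following Python sketch walks through these three steps. The channel table, group table, and ACL are assumed data structures invented for the example.

    channels = {"K_alice": "Alice@Intel"}            # step 1: direct authentication
    groups = {"Alice@Intel": ["Atom@Microsoft"]}     # step 2: associated principals
    acl = {("Atom@Microsoft", "read")}               # step 3: authorization info

    def decide(channel, operation):
        principal = channels.get(channel)            # step 1: who said that?
        if principal is None:
            return False
        candidates = [principal] + groups.get(principal, [])      # step 2
        return any((p, operation) in acl for p in candidates)     # step 3

    print(decide("K_alice", "read"))                 # True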

The boundary between authentication and authorization,
however, is not clear. Different experts draw it in different places. It is
also not particularly relevant, since it makes little sense to do one without
the other.

Figure
3 shows the basic elements of authentication and how
they are used to log on a user, access a resource, and then do a network logon
to another host. Note the distinction between the elements that are part of a
single host and external token sources such as domain controllers and STS’s.
For concreteness, the figure describes the process of authenticating a user as
logon to Windows, that is, as creating a Windows session that can speak for the
user; in Windows a SID is a 128-bit binary identifier for a principal. However,
exactly the same mechanisms can be used to log onto an application such as SQL
Server, or to authenticate a single message, so it covers these cases equally
well.

A distributed system may involve systems (and people) that
belong to different organizations and are managed differently. To do access
control cleanly in such a system (as opposed to the local systems that are well
supported by Windows domains, as in the previous example) we need a way to
treat uniformly all the infor­mation that contributes to the decision to grant
or deny access. Consider the following example, illustrated in Figure 4:

Alice at Intel is part of a team working on a joint Intel-Microsoft
project called Atom. She logs in to her Intel workstation, using a smart
card to authenticate herself, and connects using SSL to a project web page
called Spectra
at Microsoft. The web page grants her access because:

1) The request comes over an SSL connection secured with a connection key KSSL created using the Diffie-Hellman key exchange protocol.

2) Microsoft’s group database says that Alice@Intel.com is in the Atom group.

3) The ACL on the Spectra page says that Atom has read/write access.

In the figure, Alice’s requests to Spectra travel
over the SSL channel (represented by the fat arrow), which is secured by the
key KSSL. In contrast, the reasoning about trust that allows Spectra
to conclude that it should grant the requests runs clockwise around the circle
of double arrows; note that requests never travel on this path.

From this example we can see that many different kinds of
information contribute to the access control decision:

·Authenticated session keys

·User passwords or public keys

·Delegations from one system to another

·Group memberships

·ACL entries.

We want to do a number of things with this information:

·Keep track of how secure channels are authenticated,
whether by passwords, smart cards, or systems.

·Make it secure for Microsoft to accept Intel’s
authentication of Alice.

·Handle delegation of authority to a system, for
example, Alice’s logon system.

·Handle authorization via ACLs like the one on
the Spectra
page.

·Record the reasons for an access control
decision so that it can be audited later.

This section describes the basic concepts, informally but in
considerable detail: principals and identifiers; speaks-for and trust; tokens;
paths, security domains, attributes, and groups; global identifiers; how to
choose identifiers and names, and freshness or consistency. Sections ‎5 and ‎6 describe the components of the architecture and how
they use these concepts.

A principal is the source of a request in the model of Figure 1;
it is the answer to the questions:

·“Who made this request?” (authentication)

·“Who is trusted for this request?” (authorization—for
example, who is on the ACL)

We say that the principal says the request, as in P says “do read report.doc”. In addition to saying requests, principals can also say speaks-for statements or claims, as explained in section 4.2.

Principals are not only people and devices. Executable code is
a principal. An input/output channel and a cryptographic signing key are
principals. So are groups such as Microsoft-FTE and attributes such as age=32.
We treat all these uniformly because they can all be answers to the question
“Who is trusted for this request?”. Furthermore, if we interpret the question
“Who made this request?” broadly, they can all be answers to this question as
well: a request can be made directly only by a channel or key, but it can be
made indirectly by a person (or device) that controls the key, or by a group
that such a person is a member of.

It turns out to be convenient to treat objects or resources as
principals too, even though they don’t make requests.

Principals can be either simple or compound. Simple
principals are denoted by identifiers, which are strings. Intuitively,
identifiers are labels used for people, computers and other devices, applications,
attributes, channels, resources, etc., or groups of these.[5]
Compound principals are explained in section ‎5.8.

Channels are special because they are the only direct
principals: a computer can tell directly that a request comes from a channel,
without any other information. Thus any authentication of a request must start
with a channel. A cryptographic signing key is the most important kind of
channel.

An identifier is a string; often the string encodes a path,
as explained below. The string can be meaningful (to humans), or it can be
meaningless; for example, it can encode a binary number. (Occasionally an identifier is meaningful, but not as a string of characters; a picture, for example.) This distinction is important because access control policy must be expressed in terms of meaningful identifiers so that people can understand it, and also because people care about the meanings of a meaningful identifier such as coke.com, but no one cares about the bit pattern of a binary identifier. Of course there are gray areas in this taxonomy; a name such as davcdata.exe is not meaningful to most people, and a phone number might be very meaningful. But the taxonomy is useful nonetheless.

Meaningless identifiers in turn can be direct or not. This
leads to a three-way classification of identifiers:

·name: an identifier that is meaningful
to humans.

·ID: a meaningless identifier that is
not direct. In this taxonomy an identifier such as xpz5914@hotmail.com
is probably an ID, not a name, since it probably isn’t meaningful.

·direct: a meaningless identifier that
identifies a channel. There are three kinds of direct identifiers:

·key: a cryptographic key (most simply,
a public key) that can verify a signature on a request. We view a signing key
as a channel, and say that messages signed by the key arrive on the channel
named by that key.[6]

·hash: a cryptographic collision-free
hash of data (code, other files, keys, etc.): different data is guaranteed to
have different hashes. A hash H can
say X if a suitable encoding of “This data says
X” appears in the data of which H is
the hash. For code we usually hash a manifest that includes the hash
of each member file. This has the same collision-free property as a hash of the
contents of all the files.

·handle: an identifier provided by the
host for some channel, such as the keyboard (Strictly speaking, the wire from
the keyboard.) or a pipe.

An identifier can be a path,
which is a sequence of strings, just like a path name for a file such as C:\program files\Adobe\Acrobat6.
It can be encoded as a single string using some syntactic convention. There are
a number of different syntactic conventions for representing a path as a single
string; the file name example uses “\” as a separator. The canonical
form is left-to-right with / as the separator. A path can be rooted in a key,
such as KVerisign/andy@intel.com
(or KVerisign/com/intel/andy in the canonical
form for paths); such a path is called fully qualified. A path not
rooted in a key is rooted in self,
the local environment interpreting the identifier; it is like a relative file
name because its meaning depends on the context.

Authentication must start with a channel, for example, with
a cryptographic signature key. But it must end up with access control policy,
which has to be expressed in terms of names so that people can understand it.
To bridge the gap between channels and names we use the notion of
“speaks-for”. We say that a channel speaks for a user, for example, if we trust
that every request that arrives on the channel comes from the user, in other
words, if the channel is trusted to speak for the user.

But the notion of speaks-for is much more general than this,
as the example of section ‎3 illustrates. What is the common element in all the
steps of the example and all the different kinds of information? There is a chain of trust running from the request at one end to the Spectra resource at the other. A link of this chain has the form

“Principal P speaks for principal Q
about statements in set R”

For example, KSSL speaks for KAlice
about everything, and Atom@Microsoft speaks for Spectra
about read and write. We write “about R”
as shorthand for “about statements in set R”. Often P is called
the subject and R is called the rights.

The idea of “P speaks for Q about R” is
that

if P says
something about R, then Q says it too

That is, P is trusted
as much as Q, at least for statements
in R. Put another way, Q takes
responsibility for anything that P says about R. A third way: P
is a more powerful principal than Q (at least with respect to R)
since P’s statements are taken at least as seriously as Q’s (and
perhaps more seriously). Thus P has all of Q’s authority about R.

The notion of principal is very general, encompassing any
entity that we can imagine making statements or being trusted. Secure channels,
people, groups, attributes, systems, program images, and resource objects are
all principals. The notion of speaks-for is also very general; some examples
are:

·Binding a key to a user name.

·Binding a program hash to a name for the program.

·Allowing an authority to certify a set of names.

·Making a user a member of a group.

·Assigning a principal an attribute.

·Granting a principal access to a resource by putting it on the resource’s ACL.

The idea of “about R” is that R is some way of
describing a set of things that P (and therefore Q) might say.
You can think of R as a pattern or predicate that characterizes this set
of statements, or you can think of it as some rights that P can exercise
as much as Q can. In the example of section 3, R is “all statements” except for the last step, where it is “read and write requests”. It’s up to the guard of the object that
gets the request to figure out whether the request is in R, so the
interpretation of R’s encoding can be local to the object. For example,
we could refine “read and write requests” to “read and write requests for files
whose names match /users/lampson/security/*.doc”. In most ACEs today, R
is encoded as a bit vector of permissions, and you can’t say anything as complicated
as the previous sentence.

We can write this P ⇒R Q for short, or just P ⇒ Q without any subscript if R is “all statements”. With this notation the chain for the example is:

KSSL ⇒ Klogon ⇒ KAlice ⇒ Alice@Intel ⇒ Atom@Microsoft ⇒r/w Spectra

A single speaks-for fact such as KAlice ⇒ Alice@Intel is called a claim. The principal on the left is the subject.
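Here is a hypothetical Python sketch of how a guard might check such a chain. Each claim is a (subject, target, rights) triple, with "*" standing for “all statements”; the representation is an assumption made for illustration.

    claims = {
        ("K_SSL", "K_logon", "*"),
        ("K_logon", "K_Alice", "*"),
        ("K_Alice", "Alice@Intel", "*"),
        ("Alice@Intel", "Atom@Microsoft", "*"),
        ("Atom@Microsoft", "Spectra", "r/w"),    # the ACL entry
    }

    def speaks_for(p, q, right):
        # Does p speak for q about 'right', via some chain of claims?
        # (Assumes the claim set is acyclic.)
        if p == q:
            return True
        return any(s == p and r in ("*", right) and speaks_for(t, q, right)
                   for (s, t, r) in claims)

    print(speaks_for("K_SSL", "Spectra", "r/w"))  # True: grant the request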

The way to think about it is that ⇒ is “greater than or equal”: the more powerful principal goes on the left, and the less powerful one on the right. So role=architect ⇒ Slava means that everyone in the architect role has all the power that Slava has. This is unlikely to be what you want. The other way, Slava ⇒ role=architect, means that Slava has all the power that the architect role has. This is a reasonable way to state the implications for security of making Slava an architect.

Figure
4 shows how the chain of trust is related to the
various principals. Note that the “speaks for” arrows are quite independent of
the flow of bytes: trust flows clockwise around the loop, but no data traverses
this path. The example shows that claims can abstract from a wide variety of real-world
facts:

·A key can speak for a person (KAlice ⇒ Alice@Intel) or for a naming authority (KIntel ⇒ Intel.com).

·A person can speak for a group (Alice@Intel ⇒ Atom@Microsoft).

·A person or group can speak for a resource, usually by being on the ACL of the resource (Atom@Microsoft ⇒r/w Spectra). We say that Spectra makes this claim by putting Atom on its ACL.

4.2.1 Establishing claims: Delegation

How does a claim get established? It can be built in; such facts
appear in the trust root, discussed in section ‎5.1. Or it can be derived from other claims, or from
statements made by principals, according to a few simple rules:

From the definition of ⇒, if Q' says P ⇒ Q and Q' ⇒ Q then Q says P ⇒ Q, and it follows from (S3) that P ⇒ Q. So a principal is trusted to delegate the authority of any principal it speaks for, not just its own authority. Frequently a delegation is restricted so that the delegate P speaks for Q only for requests (this is the usual interpretation of an X.509 end-entity certificate, for example, or membership in a group) or only for further delegation (an X.509 CA certificate, or GROUP_ADD/REMOVE_MEMBER permission on the ACL for a group).

A claim usually has a validity period, which is an interval
of real time during which it is valid. When applying the rules to derive a claim
from other claims and tokens, intersect their validity periods to get the validity
period of the derived claim. This ensures that the derived claim is only valid
when all of the inputs to its derivation are valid. A claim can be the result of a query to some authority A. For example, if the result of a query “Is P in group G?” to a database of group memberships is “Yes”, that is an encoding of the claim P ⇒ G. The validity period of such a statement is often just the instant at which the response is made, although the querier might choose to cache it and believe it for a longer time.
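A minimal sketch of the intersection rule, assuming validity periods are represented as (not-before, not-after) pairs; the times are arbitrary.

    def intersect(v1, v2):
        start, end = max(v1[0], v2[0]), min(v1[1], v2[1])
        return (start, end) if start <= end else None   # None: never valid

    token_validity = (1000, 5000)   # from the token used in the derivation
    claim_validity = (2000, 9000)   # from another input claim
    derived = intersect(token_validity, claim_validity)  # (2000, 5000)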

A claim made by a principal is called a token (not to
be confused with a user authentication token such as a SecurID device). Many
tokens are called certificates, but this paper uses the more general term
except when discussing X.509 certificates specifically. The rule ‎(S3) tells you whether or not to believe a token; section ‎4.5 on global identifiers gives the most important
example of this.

where KI is the issuer key, KS
is the subject key, “name” is the certified name, KV is Verisign’s
key, H(code) is the hash value of the code being signed, “publisher” is the
name of the code’s publisher, KD is the key of the domain
controller, SU is the SID of the user and SG
is the SID of the group of which the user is a member. XrML tokens can do all
of these things, and more besides.

A token can be signed in several different ways, which don’t
change the meaning of a token to its intended recipient, but do affect how
difficult it is to forward:

·A token signed by a public key, like a X.509
certificate, can be forwarded to anyone without the cooperation of the third
party. From a security point of view it is like a broadcast.

·A token signed by a symmetric key, like a Kerberos
ticket, can be returned to its sender for forwarding to anyone with whom the
sender shares a symmetric key.

·A token that is just sent on an authenticated
channel cannot be forwarded, since there’s no way to prove to anyone that the
sender said it.

In a token the principals on both sides of the ⇒ must be represented by identifiers, and it’s important for
these identifiers to be unambiguous. A fully qualified identifier (one that
starts with a key or hash) is unambiguous. Other identifiers depend on the context,
that is, on some convention between the issuer and the consumer of the token.

Like a claim, a token usually has a validity period; see
section ‎4.2.2. For example, a Kerberos token is typically valid for
eight hours.

A token is the most common way for a principal to communicate
a claim to others, but it is not the only way. You can ask a principal A
“Do you say P ⇒ Q?” or “What principal does P speak for?” and get back “A says ‘yes’” or “A says ‘Q’”. Such a statement only makes sense as a response
to the original query; to be secure it must not only be signed by (some
principal that speaks for) A, but also be bound securely to the query
(for example, by a secure RPC protocol), so that an adversary can’t later
supply it as the response to some other query.

4.4 Organizing principals

4.4.1 Security domains

A security domain is a collection of principals (users,
groups, computers, servers and other resources) to which a particular set of
policies apply, or in other words, that have common management. Usually we will
just say “domain”. It normally comprises:

·A key KD.

·A namespace based on that key.

·A trust root—a set of claims of the form Kj1 /\ Kj2 ... ⇒ identifier-pattern

·ACLs for the trust root and the accounts, which
define the administrators of the domain.

·A set of accounts—statements of the form KD says Ki ⇒ KD/N
for principals with names in its namespace.

·A set of resources and policies for those resources

The essential property of paths is that namespaces with
different roots are independent, just as different file system volumes are
independent. In fact, namespaces with different prefixes are
independent, just as file system directories with different names are independent.
This means that anybody with a public key K can create a namespace
rooted in that key. Such a namespace is the most important part of a security
domain. Because of ‎(S2), K speaks for the domain. Because of ‎(S3), if you know K-1 you can delegate
authority over any part of the domain, and since K is public, anyone can
verify these delegations. This means that authentication can happen independent
of association with any domain controller. Of course, you can also rely on a
third party such as a domain controller to do it for you, and this is necessary
if K is a symmetric key.

For example, an application such as SQL Server can create
its own domain of objects, IDs, names and authorities that has no elements in
common with the Windows domain of objects, IDs, names and authorities for the
machine on which SQL Server is running. However, the SQL Server can use part or
all the Windows security domain if that is desired. That use is controlled by
policy, in the form of trust root contents and issued tokens.

Here are some other examples of operating in multiple
security domains:

1. A user takes a work laptop home and connects to the home network, which has no connection to the work security domain.

2. A consultant has a laptop that is used in working with two competing companies. For each company, the consultant has a virtual machine with its own virtual disk. Each of those virtual machines joins the Windows domain of its respective company. The host OS, however, is managed by the consultant and has its own local domain.

Sometimes we distinguish between resource domains and
account domains, depending on whether the domain mostly contains resources or
objects, or mostly contains users or subjects.

Domains can be nested. A child domain has its own
management, but can also be managed by its parent.

4.4.2 Attributes

An attribute such as age=32 is a special kind of path, and thus is a principal like any other. This one has two components, the name age and the value 32; they are separated by “=” rather than “/” to emphasize the idea that 32 is a value for the attribute name age, but this is purely syntactic.[7] The claim Paul ⇒ age=32 expresses the fact that Paul has the attribute age=32. Like any path, an attribute should be global if it is to be passed between machines: Koasis/age=32. However, unlike file names or people, we expect that most attributes with the same name in many different namespaces will have the same intended meaning in all of them. A claim can translate the attribute from one namespace to another. For example, WA/dmv/age ⇒ NY/rmv/age means that New York trusts WA/dmv for the age attribute. Translation can involve intermediaries: WA/dmv/age ⇒ US/age and US/age ⇒ NY/rmv/age means that New York trusts US for age, and US in turn trusts Washington (presumably US trusts lots of other states as well, but these claims don’t say anything about that). Locally, of course, it’s fine to use age=32; it’s a local name, and if you want to translate US/age=32 to age=32 you need a trust root entry US/age ⇒ age. In fact, from the point of view of trust age=32 is just like a nickname. The difference is that we expect lots of translations, because we expect lots of principals to agree about the meaning of age, whereas we don’t expect wide agreement about the meaning of Bob.

Because of the broad scope of many attribute names such as age,
the name of an attribute can change as it is expressed in different languages
and even different scripts. Therefore it is often necessary to use an ID rather
than a name for the attribute in policy. For example, an X.509 object
identifier or OID is such an ID. Sections ‎4.5 and ‎4.6 discuss the implications of this; what they say
applies to attributes as well.

A Boolean-valued attribute (one with a value that is true or
false), such as over21, defines a group; we normally write it that
way rather than as over21=true. The next section discusses groups.

4.4.3 Groups and conditions

A condition is a Boolean expression over attribute names and
values, such as “microsoft.com/division == ‘sales’ &
microsoft.com/region == ‘NW’”. A condition is a principal; every
principal that speaks for attributes whose values cause the expression to
evaluate “true” speaks for the condition. In the preceding example,
every Microsoft employee in the northwest sales region would speak for it.

For use in conditions, identifiers are considered to be
Boolean-valued attributes that evaluate true for the principals that speak for
them. Hence the condition paul@microsoft.com | carl@microsoft.com
is true for paul@microsoft.com and carl@microsoft.com.
It is also true for the key K if K ⇒ paul@microsoft.com.

In addition, there are special attributes, such as time,
that may be used in conditions; every principal is considered to speak for
them. For example, “time >= 0900 & time <= 1500 & shift == ‘day’ & jobtitle == ‘operator’” would be true for all day-shift operators between 9am and 3pm.
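A small Python sketch of evaluating a condition against a principal’s attributes; the attribute store and its layout are assumptions for the example.

    attributes = {
        "microsoft.com/division": "sales",
        "microsoft.com/region": "NW",
    }

    def condition(attrs):
        # microsoft.com/division == 'sales' & microsoft.com/region == 'NW'
        return (attrs.get("microsoft.com/division") == "sales"
                and attrs.get("microsoft.com/region") == "NW")

    # The principal speaks for the condition iff it evaluates true:
    print(condition(attributes))    # True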

If C is a
condition, and a principal P has attributes
whose values cause C to evaluate
true, then we write:

P ⇒ C

We can give a condition an identifier (a name or an ID) by saying
that the condition speaks for the identifier:

C ⇒ identifier

We call such an identifier a group.[8]
A group is thus a principal with zero or more other principals that speak for
it. If a principal speaks for the group, we say that it is a member of the group. Today’s groups are
defined by a condition that is just the “or” of a list of members. In such a
case, it’s possible to provide a complete list of all the group members, but
this is not always true. The distinction is important for a principal with the
authority to define members, but it is invisible to access control, which only
cares about a requestor P presenting
a claim P ⇒ G and G ⇒ resource being on the ACL.

Such an authority will only issue such a claim if it:

·Has access to a complete list of the group members
(such as Paul, Carl, Charlie), and P is in it, or

·Has access to a partial list of the group
members and P is on its partial list; there may be several such lists,
each accessible to a different issuer, or

·Knows that P satisfies the condition that
defines the group (such as age>=21).

The question of who is trusted to assert P ⇒ G,
that is, who can define the members of a group, is part of authorization.

To avoid confusion, identifiers communicated
between computer systems should be global. If a set of systems doesn’t communicate
with the rest of the world, they only need to agree among each other. However,
when these systems suddenly do need to share identifiers (perhaps because they
merge with another set of systems), collisions of identifiers can occur, requiring
a massive renaming of entities. To avoid such problems, all identifiers that
might travel between computers should be global, except perhaps names intended
to communicate to a human being.

An identifier is global if everyone agrees
on its meaning, that is, when presented with a request and some supporting
evidence, everyone either agrees on whether the identifier is the principal
that made the request or doesn’t know. A key or hash is automatically global;
cryptography makes it so. Other identifiers are paths (perhaps of length one).

A path rooted in a key, such as Kintel/andy@intel.com, is called fully qualified. Such identifiers are global, because Kintel is global, and according to rule (S2) above it can say what other keys can speak for identifiers rooted in itself. For example, Kintel can establish that Andy’s key Kandy speaks for the name Kintel/andy@intel.com, by signing a certificate (token):

(C1) Kintel says Kandy ⇒ Kintel/andy@intel.com

Paths not rooted in keys are rooted in self,
the local environment interpreting the identifier. They are not global and
therefore should not be sent outside the local environment.

We would like to treat an identifier like andy@intel.com
(or /com/intel/andy
in the canonical form) as global, even though it is not rooted in a key,
because we want to keep keys out of most policy. This is a conventionally global
identifier: we make it very likely that almost everybody agrees about what
speaks for it, by making it very likely that everyone agrees that Kandy ⇒ andy@intel.com. We do that by getting the same agreement that Kintel ⇒ intel.com; then everyone will accept Kintel’s certificate (C1). Of course this is the same problem, and we can solve it in the same way: agree that Kverisign ⇒ com, and get a certificate

(C2) Kverisign says Kintel ⇒ intel.com

This recursion has to stop somewhere, and it
stops in a special part of the security policy called the trust root,
where some of these facts are built in. The essential idea is:

Provided
their trust roots agree and they have the same tokens, two parties will agree
on what keys speak for a conventionally global identifier.

One case in which the parties might disagree
is while a key is being rolled over or replaced, but only if they have different
tokens—one has heard about the key change and the other one hasn’t.

Section ‎5.1 discusses the trust root in detail, and section ‎5.1.1 explains how to make it likely that two trust roots
agree.

Although any kind of path could be a
conventionally global identifier, the ones that people care most about are DNS
names (see section ‎4.7). Email names are important too, but they usually
don’t require special attention because there’s a single DNS name that authenticates
a given email name.

·Meaningful (to humans): When security
policy such as group definitions, access control lists, etc. is displayed to
humans, identifiers must be meaningful, since people must be able to understand
the policy. Only names are meaningful. Another consequence is that only names
are controversial: no one cares what bit pattern your public key has, or what
domain ID your SID uses, but people do care who controls microsoft.com
or mit.edu.

·Long-lived: The identifier doesn’t need
to change when encryption keys or names change. This is desirable, because much
security policy is long-lived: the identifier may appear on ACLs for objects
that last for decades, and that are scattered over the internet or written on
DVDs. Neither names nor direct identifiers can be guaranteed to be long-lived,
since people get married, join a new organization, or otherwise change their
minds about names, and keys can be compromised and need to change.

·Direct: some identifiers must be
direct, since only direct identifiers can actually make requests. Direct
identifiers are neither meaningful nor long-lived.[9]

The following table summarizes the choices:

Identifier type       Meaningful   Long-lived   Direct
Name                  yes          no           no
ID                    no           possibly     no
Direct (keys etc.)    no           no           yes

We can distinguish three main places where an identifier may
appear:

·As the direct source of a request, where it must
be direct, since all the machine directly knows about the source of a request
is the channel it arrives on.

·In the user’s view of access control policy,
where it must be meaningful, in other words, a name.

·In access control policy stored in the system,
where it’s desirable for it to be long-lived, but it could have none of these
properties as long as there is extra machinery to make up the lack.

As peer-to-peer operation grows—both personal P2P and corporate
P2P—identifiers for principals will show up in access control policy far and
wide. An identifier might be on ACLs on machines and DVDs all over the world,
with no record of where those machines are. It might also be in tokens such as XrML
licenses, SAML or XACML tokens, certificates in various forms, etc., which are
another way to express access control policy. These signed statements can be
carried anywhere, can be backed up, can be transferred from one machine to
another. Again, there is no requirement that each such statement have its
location registered in any central place. Hence it’s often desirable for the
identifiers in access control policy to be long-lived.

Since no identifiers satisfy all the requirements, there
have to be ways of mapping among them:

·When a request or a token comes in, it can only
be authenticated as coming from a direct principal, that is, a channel C,
so there must be a mapping C ⇒ P to a stored
principal.

·When a user wants to examine or edit policy they
need to see a meaningful principal M, so there must be mappings in both
directions M ⇒ P and P ⇒ M.

Any kind of identifier can appear in stored access control
policy. As we have seen, however, it’s often important for stored identifiers
to be long lived, so that the policy doesn’t have to change when the
identifiers change. It’s therefore advantageous to use a particular kind of ID
called a SID for stored policy, because SIDs are carefully constructed to be
long-lived; see section ‎4.7. There has to be a reliable correspondence between
SIDs and names so that policy can be read and written by people, but this
correspondence can change with time. There also has to be a reliable
SID↔key correspondence so that requests can get access.[10]

The preferred approach to keys is complementary to this one:
the only long-term place to store keys should be the trust root (see section ‎5.1), which contains facts about principals that are
installed manually and accepted on faith in reasoning about authentication.

4.6.1 Anonymity

Sometimes people want to avoid using the same identifier for
all their interactions with the world, because they want to preserve their
anonymity. A variation on this is that they don’t want their actions at one web
site, for example, to be correlated with their actions at another site; this
kind of correlation is called tracking.

Since there is no shortage of encryption keys or identifiers,
it’s easy for a computer to generate as many identifiers for me as I want, for
example, a different one for every web site I interact with. The computer can
keep track of which identifier to use at which site. If you are really
paranoid, you can use a different identifier each time you go to the same
site.
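One plausible way to implement this is to derive a per-site key from a single master secret, so the computer need not even store the individual identifiers; this HMAC-based derivation is an illustrative assumption, not something specified here.

    import hashlib, hmac, os

    master_secret = os.urandom(32)      # kept private by my computer

    def site_identifier(site_name):
        # one stable identifier per site, unlinkable across sites
        return hmac.new(master_secret, site_name.encode(),
                        hashlib.sha256).hexdigest()

    print(site_identifier("amazon.com"))    # used only at amazon.com
    print(site_identifier("ebay.com"))      # unrelated to the one above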

In many cases, this by itself is sufficient. Sometimes,
however, a web site or other party may want to know something about me: that I
am over 18, or have a decent credit rating, or whatever. For this purpose a
mutually trusted third party such as Live or Consumer Reports can
authenticate one of my identifiers, certifying, for example, Kbwl-amazon ⇒ over18. The protocol for this is simple: I authenticate to Live, I give Kbwl-amazon to Live and ask for a certificate, and I get back Klive says Kbwl-amazon ⇒ over18.

SIDs contain a 96-bit domain identifier plus a 32-bit relative identifier within the domain. Thus the structure is D/R. To distinguish SIDs from other identifiers we prefix SID, so the full identifier is SID/D/R, but we will usually
omit the SID/ prefix
here. Roughly speaking, D corresponds to something like microsoft.com, and R to blampson or the server red-msg-70, so D/R corresponds
to blampson@microsoft.com
or red-msg-70.microsoft.com.

3. There are plenty of them, so they don’t have to be rationed (except to prevent denial of service attacks on ID services that map SIDs to keys).

4. They are (two-part) paths D/R, so that a key that speaks for a domain D can speak for lots of SIDs in that domain.

Because of (1) and (2) a SID is a long-lived identifier that is suitable for long-lived policy such as ACLs.

Since there are plenty of domain identifiers, you can get a new one just by choosing a 96-bit random number; this is reasonable because one D is as good as another. The
chance of an accidental collision is very small (once every 8,000 years if
there are a thousand new domains per second); we consider collisions caused by
malice shortly. Some domains will have only a few SIDs (that is, a few values
of R for one D), for example, a domain for a person, family, or
small organization. But most SIDs will probably be in large domains belonging
to corporations or to Internet services such as Live or Yahoo.
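The collision arithmetic, as a Python sketch: by the birthday approximation a collision among random 96-bit identifiers becomes likely only after about 2^48 registrations.

    import math, secrets

    D = secrets.randbits(96)                     # a new domain identifier

    rate = 1000                                  # new domains per second (assumed)
    expected = math.sqrt(2 ** 96)                # ~2.8e14 domains to a likely collision
    years = expected / rate / (3600 * 24 * 365)  # roughly 9,000 years
    print(f"{years:,.0f} years")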

As we saw in the previous section, we need to know K ⇒ D/R so that we can authenticate a statement signed by K as coming from D/R. We also need to know name ⇒ D/R and D/R ⇒ name so that users can read and change policy that is stored in terms of SIDs. These mappings could be strictly local if the local administrator takes responsibility for setting up and maintaining them, but in general they will come from someone who speaks for D/R (for example, someone who speaks for D) or for name (for example, microsoft.com if name is billg@microsoft.com).

Note that joining a Windows domain is quite different from learning KD ⇒ D. A machine can only be joined to one
domain, and a domain joined machine trusts its domain controller for any
SID, and also for various management functions. A machine or session can know
about lots of domains, and it trusts each one only for its own SIDs.

To simplify the handling of domain key changes and malicious (as opposed to accidental) conflicts for domain identifiers, it’s desirable to have one or more domain ID services, which are intended to issue tokens KDR says KD ⇒ D. Then instead of having a trust root entry for each D that you encounter, you only need one that says KDR ⇒ SID/* for each ID service that you want to trust. For greater security, you could configure your trust root with n domain ID services and a requirement that k of them agree on KD ⇒ D before it is believed; see section 5.1.2 for more on this. As with other kinds of trust root entries, an entry KD ⇒ D for a specific domain takes precedence, or disagreement is referred to the administrator; see section 5.1. For this to work well, there should not be too many ID services and the scope of each one should be wide.

The domain ID service can work as a simple web service with no human operator involvement only because what it records has no intrinsic value. The ID service is designed specifically and only to meet the needs of authentication. It offers only one public query: “Is KD ⇒ D a registered claim?”[11] It is intentionally not a general purpose directory, and it is intentionally limited so that it can never become one. Nothing stops people from making more general directories, but those are not domain ID services.

In addition to the
query, there is one operation for registering new values of D. The input
parameters are D, a public key KD, and an optional
password PW encrypted by KDR that can be used for
resetting KD. The request is signed by KD-1.
There is no other authentication. In particular, there is no linkage to any PII
or to any other information that would require human operators at the domain ID
service. After success, KD ⇒ D is a registered claim.
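A Python sketch of the service’s two operations, with the cryptographic signature check reduced to a placeholder; the function names are assumptions made for the example.

    registered = {}                      # D -> public key K_D

    def verify_signature(request, key):
        return True                      # placeholder for checking K_D's signature

    def register(D, K_D, request):
        # the request must be signed by K_D's private half; nothing else is checked
        if verify_signature(request, K_D) and D not in registered:
            registered[D] = K_D
            return True
        return False

    def is_registered_claim(K_D, D):
        # the single public query: is "K_D => D" a registered claim?
        return registered.get(D) == K_D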

Windows domains
today implement a highly simplified version of this scheme, since a domain
joined machine trusts its domain controller for any SID.

The purpose of a
name is to be meaningful to a human. Most useful names are paths, and the
preferred (conventionally) global names are DNS and email names such as research.microsoft.com or billg@microsoft.com. As we did
with SIDs, to distinguish DNS names from other identifiers we prefix DNS, so the full identifier is DNS/com/microsoft/research, but
we will usually omit the DNS/
prefix here and use the standard DNS syntax.

The crucial
security questions about a name are what real world entity it identifies, and
what key or SID speaks for it. To answer the second question, you consult the trust
root, together with any tokens that are relevant. Thus the trust root might
contain

KVerisign ⇒ DNS/*; Kbillg ⇒ billg@microsoft.com

Here the second name is
written in its conventional email form; as a canonical path name it would be DNS/com/microsoft/email/billg.
The rule for trust roots (see section ‎5.1) is that the more specific entry governs, so that
what Verisign or Microsoft have to say about billg@microsoft.com will be ignored.

Today’s X.509 trust
roots usually grant a certificate authority such as Verisign authority over all
DNS names; that is what the KVerisign ⇒ DNS/* claim in the example
says. Although there are ways to limit the names that such a key can speak for,
today they are obscure. Such limits are of fundamental importance, and need to
be easy to set and understand.

Adding an entry for
a name to the trust root must be a human decision, so the procedure by which
the human decides that it’s the right thing to do, called a ceremony,
must be carefully designed. A ceremony is like a network protocol, but it includes human components as well as computers.

Secure
communication requires more than assurance that a message came from a known
source; it also requires freshness, a guarantee that the message is sufficiently
recent. Without freshness, a bad guy can make trouble by replaying old
messages, which might well be misinterpreted in the current context. For
example, consider a request to a service to write a check for $10,000. Replaying
this request should not result in a second check. Or consider a request that
asks “Does key K speak for microsoft.com?”
and expects a yes or no answer. If a previous request that asked “Does key Kmicrosoft
speak for microsoft.com?”
got a “yes” answer, it should not be possible to replay this answer and get the
requester to accept it as the answer to the later request.

There are many ways
to ensure freshness. In a request-response protocol like the second example
above, you tag the request with a sequence number and demand the same sequence
number in the response. Such a tag is called a nonce or challenge.
To ensure that an incoming message is fresh, in particular that it was
generated since you chose a nonce, you insist that it contain some evidence
that the sender received that nonce.

The essential
property of a nonce is that it is not reused; nonces may be ordered, but this
is usually unimportant. If you want to prevent the responder from precomputing
the response, a nonce must be unpredictable; frequently this is not a
requirement. Often there are two layers of freshness. For example, a sequence
of requests might be carried on a channel that is secured with a fresh key.
Then the nonces need only be unique within that sequence, since a different
sequence of requests will be secured with a different key. In this example the
sequence numbers on the messages don’t need to be unpredictable.
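A minimal sketch of such a challenge-response exchange, using an HMAC over a shared key to bind the response to the nonce; the message format is an assumption made for the example.

    import hashlib, hmac, os

    shared_key = os.urandom(32)

    def make_nonce():
        return os.urandom(16)                    # fresh, never reused

    def respond(nonce, answer):
        # the response binds the answer to the requester's nonce
        tag = hmac.new(shared_key, nonce + answer, hashlib.sha256).digest()
        return answer, tag

    def accept(nonce, answer, tag):
        expected = hmac.new(shared_key, nonce + answer, hashlib.sha256).digest()
        return hmac.compare_digest(tag, expected)   # fails for a replayed answer

    nonce = make_nonce()
    answer, tag = respond(nonce, b"yes")
    print(accept(nonce, answer, tag))            # True; an old answer will not verify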

To ensure that a
key is fresh, generate it by hashing some data that includes a newly generated
random number. For two party two-way communication, each party should generate
its own random number to be included in the hashed data; this gives each party
assurance of freshness, and also ensures a good key even if one of the parties
is not good at generating random numbers.
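As a sketch, assuming the parties already share a secret from a prior exchange:

    import hashlib, os

    shared_secret = os.urandom(32)   # e.g., from a key exchange (assumed)
    nonce_a = os.urandom(16)         # contributed by party A
    nonce_b = os.urandom(16)         # contributed by party B

    # Fresh, and a good key even if one party's random numbers are weak:
    session_key = hashlib.sha256(shared_secret + nonce_a + nonce_b).digest()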

For broadcast
communication such as a certificate signed with a public key, nonces don’t work
because the receivers don’t send anything to the broadcaster beforehand. Instead,
we usually rely on a timestamp in the certificate for freshness. The validity
period in a token is an example of such a timestamp. You might also want to use
a timestamp to avoid a round trip, for instance when sending email. It’s not as
conclusive as a nonce because of clock skew (and perhaps because it’s predictable).

There is a
fundamental tradeoff between consistency (or freshness) and availability. A
is consistent with B if A’s view of B’s state agrees with B’s
actual state.[12] The only way to ensure this is for A
to hold a lock on B’s state, but this means that A has to
communicate with B to acquire the lock, and after that B can’t
change its state until A releases the lock. This is usually unacceptable
in a distributed system because it hurts availability too much: if A and
B can’t communicate, one of them is going to be stuck.[13]

The alternative is
for A to settle for a view of B’s state at some time in the past;
often this is cached information. Now there is a tradeoff among freshness (how
far in the past?), availability, and performance (how often does A check
for changes in B’s state?). This tradeoff is fundamental; no cleverness
in the implementation can avoid it. The choice is between acting on old (perhaps
cached) information, and getting stuck when you can’t communicate. This is a
management decision and it must be exposed to management control. At least two
parameters must be settable by the relying party (perhaps taking account of
hints in the token):

1. How old data can be and still be acted on (the tradeoff between freshness and availability).

2. How frequently data should be refreshed (the tradeoff between freshness and performance).

The way to get the
freshest information is for A to ask B for its state right now.
This still doesn’t guarantee perfect consistency, since B’s state can
change between the time that B sends its reply and the time that A
receives and acts on it, but it’s the best you can do for consistency without a
lock. The way to get the greatest availability and the least communication cost
is for A to act on any view it has of B’s state, no matter how
old.

This issue shows up
most often for authentication in the validity period of a token. A short
validity period means that the token is fresh, but also that new tokens must be
issued and distributed frequently. A long validity period means that once you
have the token you’re good to go, but the token’s issuer might have changed its
mind about the claims in it. Note that there’s nothing to stop a relying party
from using a different validity period from the one in the token.

If you have issued
a token and you want to cancel it, is there any alternative to letting the
validity period expire? Well, yes and no. Yes, because you may be able to
revoke the token. No, because the revocation is just another kind of token,
with a shorter validity period.

The idea behind revocation is that you need two tokens to justify a claim: the original token Tk that is “issuer says subject ⇒ ... as long as revoker confirms”, and another confirmation token “revoker says Tk is still valid” that has a much shorter validity period than Tk. This is
better than simply issuing Tk with a short validity period because the revoker
is optimized for issuing confirmation tokens cheaply, quickly, and with high
availability. It can’t grant any access by itself, and it doesn’t need any
detailed information about the principals involved. Its database consists
simply of tokens revoked by their issuers. When queried about Tk, it
checks that database and issues a confirmation token if the database doesn’t
say that Tk is revoked.

To add an entry to
the revoker’s database, the original issuer writes a token “issuer says the token identified by TkId
has been revoked” and sends it to the revoker. TkId could be a hash of
the original token or a serial number embedded in the original token. The
revoker puts (issuer, TkId) in its database. Since issuers can only
revoke their own tokens, the revoker doesn’t need to know anything about the
issuers (unless it wants them to pay). The only harm the revoker can do is to
revoke tokens without instructions, that is, mount a denial of service attack.
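A minimal sketch of such a revoker; the token format and the 300-second validity period are illustrative assumptions.

    revoked = set()                  # database of (issuer, token id) pairs

    def revoke(issuer, tk_id):
        # issuers can only revoke their own tokens
        revoked.add((issuer, tk_id))

    def confirm(issuer, tk_id, now, validity=300):
        # issue a short-lived confirmation token unless the token is revoked
        if (issuer, tk_id) in revoked:
            return None
        return {"says": "token %s is still valid" % tk_id,
                "valid_from": now, "valid_to": now + validity}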

Because it is much
simpler than most issuers and because it can’t grant any access by itself, the
revoker can afford to issue confirmation tokens with short validity periods,
and it can be replicated for high availability. It’s important to understand,
however, that this is a difference of degree and not of kind. The tradeoffs described
in section ‎4.9.1 still apply; only the parameters are different. For systems
that are expected to be connected to the Internet, it’s reasonable to use a
validity period of a few minutes (or the length of a session, if that is
greater). Policy might say that if you can’t contact a revoker, you should accept
the token anyway.

There are several schemes
for revocation. The original X.509 standard specifies a method called a
Certificate Revocation List (CRL), but this has fallen out of favor. The revocation
scheme usually used for X.509 certificates is the Internet standard OCSP; see [‎3]. It’s undecided what revocation scheme should be used for
other tokens; currently there is none.

This section describes the core components of authentication,
highlighted in Figure
5: the trust root, token sources, and the speaks-for
engine. Then it touches briefly on other components: user logon, device and app
authentication, compound principals, and capabilities.

Where do these claims come from? They can be known, (that
is, built in), or they can be deduced from other claims or from tokens, which
are claims made by known principals. The trust root holds the built in claims,
token sources supply tokens, and the speaks-for engine makes the deductions.
Thus these components are the core of authentication:

1. The trust root holds claims that we know, such as KVerisign ⇒ Verisign. All trust is local, so the trust root is the basis of all trust.

The trust root is a local store, protected from tampering,
that holds things that a system (a machine, a session, an application) knows
to be true. Everything that a system knows about authentication is based on
facts held in its trust root. The trust root needs to be tamper-resistant
because attackers who can modify it can assign themselves all the power of any
principal allowed on the system.

The trust root is a set of claims (speaks-for facts) that
say what keys (or other identifiers) are trusted and what identifiers (names,
SIDs) they can speak for. Typical trust root entries are:

KD Þ SID/D (key KD speaks for domain identifier D)

KMicrosoft Þ microsoft.com (key KMicrosoft speaks for the name microsoft.com)

KVerisign Þ DNS/* (the key KVerisign speaks for any DNS name)

KDR Þ SID/* (key KDR speaks for all domain identifiers)

Because all trust is local, the trust root is local, and it
must be set up manually. It must also be protected, like any other local store
whose integrity is important. Because manual setup is expensive and
error-prone, a trust root usually delegates a lot of authority to some third
party such as a domain controller or certificate authority. The third claim example
above, KVerisignÞDNS/*,
is such a delegation. It says that Verisign’s key is trusted for any DNS name.
Another example of such a delegation is the first one above, KDÞSID/D,
which delegates authority over the domain identifier D to the key KD.

All
trust is partial.

For convenience people tend to delegate a great deal of
authority in the trust root. For example:

·Today Microsoft Update is trusted by default to
change entries in a Windows X.509 trust root.

This is not necessary, however. In a speaks-for claim, a
delegation can be as specific as desired. Existing encodings of claims are not
completely general, but, for example, name constraints in an X.509 certificate
can either allow or forbid any set of subtrees of the DNS or email namespace.

A very convenient way of limiting the authority of the
delegation in the trust root is the rule that “most specific wins”. According
to this rule, a trust root with the two entries

KVerisign Þ DNS/*; KMS Þ microsoft.com

means that KVerisign speaks for every DNS
name except those that start with microsoft.com. It may also be desirable
to find out what key KVerisign says speaks for microsoft.com,
and notify an administrator if that key is different from KMS.
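
A minimal sketch of such a lookup in Python. The dictionary encoding of the trust root and the prefix-length measure of specificity are illustrative assumptions.

```python
def pattern_specificity(pattern):
    # "DNS/*" is less specific than "DNS/microsoft.com": a longer
    # explicit prefix wins. This metric is an illustrative choice.
    return len(pattern.rstrip("*"))

def trusted_key(trust_root, name):
    """trust_root maps a name pattern to the key trusted for it,
    e.g. {"DNS/*": "KVerisign", "DNS/microsoft.com": "KMS"}."""
    matches = [p for p in trust_root
               if p == name or (p.endswith("*") and name.startswith(p[:-1]))]
    if not matches:
        raise KeyError(f"no trust root entry for {name}")
    return trust_root[max(matches, key=pattern_specificity)]

root = {"DNS/*": "KVerisign", "DNS/microsoft.com": "KMS"}
assert trusted_key(root, "DNS/microsoft.com") == "KMS"      # most specific wins
assert trusted_key(root, "DNS/example.com") == "KVerisign"  # falls back to DNS/*
```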

As we saw in section ‎4.5, we would like to use names such as microsoft.com
as global identifiers. Since this name doesn’t start with a key and therefore
is not fully qualified, however, and since all trust is local, this can only be
done by convention. There is nothing except convention to stop two different
trust roots from trusting two different keys to speak for microsoft.com,
or from delegating authority over *.com to two different third parties
that have different ideas about what PKI speaks for microsoft.com.

Our goal is that “normal” trust roots should agree on conventionally
global identifiers (SIDs and DNS names). We can’t force them to agree, but we
can encourage them to consult friends, neighbors and recognized authorities,
and to compare their contents and notify administrators of any disagreements.

As long as trust roots delegate authority to the same third
parties they will agree. If they delegate to two different third parties that
agree, the trust roots will also agree. So it is desirable to systematically
detect and report cases where recognized authorities disagree.

The cryptographic mechanisms used in distributed authentication
merely take the place, in the digital world, of human authentication processes.
These are not just human-scale scenarios performed faster and more accurately,
however; they are scenarios that are too complex for unaided humans. Therefore
it’s important that human intervention be needed as seldom as possible.

It’s simple to roll over a cryptographic key automatically,
which is fortunate since good cryptographic hygiene demands that this be done
at regular intervals. The owner of the old key simply signs a token Kold says Knew Þ Kold.
Both keys will be valid for some period of time. The main use of these tokens
is to persuade each authority that issued a certificate for Kold
to issue an equivalent certificate for Knew.

When a cryptographic key is stolen or otherwise compromised,
or the corresponding secret key is lost, things are not so simple. If the key
is compromised but not lost, often the first step is to revoke it with a
revocation certificate Kold says “Kold is no longer valid”; by a slight
extension of ‎(S3), everyone believes this. See section ‎4.9.2.

The lost or compromised key must now be replaced with a new
key. That replacement process requires authentication. In the simplest case,
there is an authority responsible for asserting that the key speaks for a SID
or name, for example, a trust root (the base case), Verisign or a domain ID service.
This authority must have a suitable ceremony for replacing the key. Here are five
examples of such a ceremony:

·You sign a replacement request with a backup key.

·You visit the bank in person.

·You give your mother’s maiden name.

·You call up your associates in a P2P system on
the phone and tell them to change their trust roots.

·Microsoft takes out full page ads in every major
newspaper announcing that the Microsoft Update key has been compromised and
explaining what you should do to update the trust root of your Windows systems.

5.2 Token sources

Recall that a token is a signed claim (speaks-for statement):
issuer says P Þ Q.
In today’s Windows, the sources of tokens are highly specialized to particular
protocols. For example, a domain controller provides Kerberos tokens, and the
SSL protocols obtain server and client certificates. Any entity that obeys a
suitable protocol (like the STS protocol for Web Services) can be a source of tokens.

The same host may get tokens from many sources, and any kind
of token source can be local, remote, or both. In addition to coming from domain
controllers, protocols such as SSL and IPSec, and Web Services Security Token
Services, tokens can come from public key certificate authorities, from peer
machines, from searches over web pages or online databases that contain tokens,
from Personal Trusted Devices such as smart cards or (trusted) cellphones, and
from many other places. In corporate scenarios most if not all tokens will
probably come from the corporate authentication authority, but in P2P scenarios
they will often come from peer machines as well as from services such as Live.
This means that a standard Windows machine needs to be a token source.

The simplest kind of token to manage is signed by a key, and
therefore can be stored anywhere since its security depends only on the
signature and not on where it is stored. If the token is signed by a public
key, anyone can verify it. However, a token can also be signed by a symmetric
key, and in this case it usually must come from a trusted online source that
shares the symmetric key with the recipient of the token.
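
5.3 Speaks-for engine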

The job of the speaks-for engine is to derive conclusions
about what principals are trusted, starting from claims and adding information
derived from tokens. The starting claims are:

·The ones in the trust root.

·If you are checking access to a resource that
has an ACL, the claims in the ACL. Recall that we view an ACL entry as a claim
of the form SID Þpermissions resource.

Today this reasoning is done in a variety of different
places. For example, in Windows:

·Logon, both interactive and network, derives the
groups and privileges that a user speaks for; this is called group expansion.
Part of this work is done in the host, part in the domain controller.

·X.509 certificate chain validation, which is
used to authenticate SSL connections, for example, derives the name that a
public key speaks for. In Windows it also does group expansion and optionally
maps a certified name to a local account.

·AccessCheck uses an NT token, which asserts
that a thread speaks for every SID in a set, and an access control list, which
asserts that every SID in a set speaks for a resource, to check that a thread
making a request has the necessary access to (that is, speaks for) the resource.

·A Web Services STS takes authentication tokens
supplied as input and a query, and produces new tokens that match the query. It
can do this in any way it likes, but in many cases it has a database that
encodes a set of claims (for example, associating keys with users or users with
attributes), and the tokens it produces are just the ones that the speaks-for
engine would produce from those claims and the inputs.

Although some or all of these specialized reasoning engines
may survive for reasons of performance or expediency, or because they implement
specialized restrictions, every conclusion about trust should be derived from a
set of input claims and tokens using a few simple rules.

The implementation of this tenet is a speaks-for engine,
a piece of code that takes a set of claims and tokens as input and produces all
of the claims that follow from this input. More practically, it produces all of
the claims that match some query. In general, the query defines a set of
claims. For example, for an access to a resource, the query is “Does this
request speak for this resource about this operation”. For group expansion, the
query is “What are all the groups that this principal speaks for”.

The speaks-for engine produces one or more chains of trust
demonstrating that principal P speaks for resource T about access
R. For example, in section 3 we saw how to demonstrate that KSSL Þr/w Spectra
by deriving the chain of trust

KSSL Þ Klogon Þ KAlice Þ Alice@Intel Þ Atom@Microsoft Þr/w Spectra

Each link in this chain corresponds to a claim, either already in the trust
root or derived from a token. For example, we derive KAlice Þ Alice@Intel.com
from the token KIntel says KAlice Þ Alice@Intel.com, using the claim
KIntel Þ Intel.com. This fact comes either from the trust root or from
another token KVerisign says KIntel Þ Intel.com, using the claim
KVerisign Þ *.com. So the main chain of trust has auxiliary chains hanging
off it to justify the use of tokens. The entire structure forms a proof tree
for the conclusion KSSL Þr/w Spectra.

When P is a set of SIDs in an NT token, R is a
permission expressed in the bit mask form used in Windows and Unix ACLs, and T
has an ACL, this is a very simple, very efficient computational proof.
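
The derivation above can be sketched in a few lines of Python. This toy engine treats claims as (p, q) pairs meaning p Þ q, admits a token's claim only after checking the issuer's authority, and ignores rights subscripts such as r/w; the names and the simplified authority check are assumptions for illustration.

```python
# Claims are pairs (p, q) meaning "p speaks for q". Tokens are triples
# (issuer_key, p, q) meaning "issuer_key says p => q"; a token's claim
# is admitted only if the claims already in hand let us conclude that
# the issuer's key has authority over q.

def speaks_for(claims, p, q):
    """True if a chain p => ... => q exists in the claim set."""
    frontier, seen = {p}, set()
    while frontier:
        cur = frontier.pop()
        if cur == q:
            return True
        seen.add(cur)
        frontier |= {b for (a, b) in claims if a == cur} - seen
    return False

trust_root = {("KIntel", "Intel.com")}
tokens = [("KIntel", "KAlice", "Alice@Intel.com")]

claims = set(trust_root)
for issuer, p, q in tokens:
    # Simplified authority check: the issuer's key must speak for the
    # domain that the subject name belongs to.
    domain = q.split("@")[-1]
    if speaks_for(claims, issuer, domain):
        claims.add((p, q))

assert speaks_for(claims, "KAlice", "Alice@Intel.com")
```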

The full speaks-for calculus extends the flexibility and
power of this statement. P can be a principal other than a SID. T can
be the name of a resource or a named group of resources. Rights R can be
expressed as names and as named groups of rights. A principal P can
delegate to Q its right R to T by the token P says Q ÞR T
(if P has the right to do this).

For example, what can be delegated in an X.509 certificate
chain is the permission to speak for some portion of the namespace for which
the chain’s root key can speak. This does not include the ability to define
groups, for example, because group definition is outside the X.509 certificate
scope. For that, one can use another encoding of a speaks-for statement
(perhaps in SAML, XACML or ISO REL). From the speaks-for engine deduction we
can establish that some key (bound to an ID by X.509) speaks for some group
(defined by the other encoding, e.g. SAML), without having to teach SAML to
understand X.509 or teach X.509 to understand SAML.

5.4 Additional components

Figure
6 shows all the components of authentication. They are
(starting in the lower left corner of the figure and roughly tracing the arrows
in the figure, which follow the walkthrough in section 3.1; * marks components already discussed):

User Logon Agent: a
module that is responsible for gathering authentication information from
human users.

Token Sources (User Authentication):
a source, whether local or remote, such as the Kerberos KDC or an STS,
that verifies a logon and provides SIDs or other identifiers to represent
the logged-on principal.

*Token Sources (Claims
(groups), Token issue): a source of group and attribute information.
This information may either be obtained over a secure channel, or issued
as a token.

Translator: a
dispatcher and a collection of components, each of which verifies the
signature on a token and translates that token into an internal claim.

App Manifest: a data
structure that completely specifies an application (listing the modules of
the application and the hash of each module).

TPM: hardware
support for strong verification of application manifests and of the entire
stack on which the application runs.

App Logon: code that
compares an application being loaded into a process against the manifest
for that application and, when the two agree, assigns an appropriate SID
to that process.

*Speaks-for Engine: the
module that derives claims according to the speaks-for calculus—of primary
use in authorization but used in authentication to deduce group
memberships.

NT Token: the
existing Windows NT Token—of which there is at least one per session—containing
a collection of SIDs identifying the system on which the logon initiated,
the user, groups to which this process belongs, and the application ID of
the process’s application. In other applications of the architecture this
will be a general security context, that is, a principal. Authentication
verifies that the user and app speak for this principal.

Other claim sources:
token or claim sources that do not fit the model of Token Sources—tokens
or claims can come from anywhere.

Cert / claim cache: a
local cache of certificates or claims (in general, tokens)—in either
external or internal form.

Transient key store: a
protected and confidential store of cryptographic keys (symmetric keys and
private keys) by which this session authenticates (proves) itself to
remote entities.

Logon (out): the
module with which this session authenticates (proves) itself to a remote
entity, including both protocols for authentication with negotiation and
the user interface that allows a human operator to decide what information
to release to the remote system (the CardSpace Identity Selector).
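
5.5 User logon

User logon connects a human user to the system. It does two things: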

·It authenticates the user to the host, giving
the host evidence that the user is typing on the keyboard and viewing the
screen.

·It optionally also makes it possible for the
host to convince others that it is acting on behalf of the user without any
more user interaction. This process of convincing others is called network
logon.

There are many subtleties in user authentication that are
beyond the scope of this paper. Here are the steps of user authentication in
its most straightforward form:

1.The
user agent in the host collects some evidence that it interacted with the user,
called credentials: a nonce signed by a key or password, biometric samples
(the output of a biometric reader: measurements of fingerprints, irises, or
whatever), a one-time password, etc. Modularity here is for the data
collection, which is likely to depend on the type of evidence, and often on the
particular hardware device that provides it.

2.It
passes this evidence to logon along with the user name.

3.Logon
sends the evidence, together with a temporary logon session key Klogon,
over a secure channel to a user authentication service that understands this
kind of evidence; the service may be local, like the Windows SAM (Security
Accounts Manager), or may be remote (as in the figure) like a domain controller.
Modularity here is for the protocol used to communicate with the service.[14]

4.The
authentication service evaluates the evidence, and if it is convinced it
returns “yes, this evidence speaks for this user name”.

5.In
addition, to support single sign-on it returns tokens authority says Klogon Þ user name
and authority says Klogon Þ user SID.
It may also return additional information such as Klogon Þ authentication method
or Klogon Þ logon location.
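
Here is a compact sketch of these steps, with a toy in-memory service standing in for the SAM or a domain controller; the verify() interface, the password check, and the token encoding are all illustrative assumptions.

```python
import hmac
import secrets
from dataclasses import dataclass

@dataclass
class AuthService:
    """A toy stand-in for the SAM or a domain controller, reached over
    what would in reality be a secure channel."""
    users: dict    # user name -> password
    sids: dict     # user name -> SID

    def verify(self, name, password, k_logon):
        stored = self.users.get(name)
        if stored is None or not hmac.compare_digest(stored, password):
            return None    # the evidence does not speak for this user
        # Step 5: tokens supporting single sign-on,
        # "authority says Klogon => user name" and "... => user SID".
        return [("authority", k_logon, name),
                ("authority", k_logon, self.sids[name])]

svc = AuthService(users={"alice": "hunter2"}, sids={"alice": "S-1-5-21-1"})
k_logon = secrets.token_hex(32)                    # temporary logon session key
tokens = svc.verify("alice", "hunter2", k_logon)   # steps 2-4
assert tokens and tokens[1][2] == "S-1-5-21-1"
```

5.6 Device authentication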

Device authentication is more subtle than you think. As much
as possible, computers and other digital devices should authenticate to each
other cryptographically with tokens of the form K says ... As we have seen, for
these to be useful the key K must speak for some meaningful name. This
section explains how such names get established, using the example of very
simple devices such as a light switch or a thermostat. More powerful devices
with better I/O, such as PCs, can use the same ideas, but they can be much more
chatty.

It is a fundamental fact of cryptographic security that keys
must be established initially by some out of band mechanism. There are several
ways to do this, but two of them seem practical and are unencumbered by intellectual
property restrictions: a pre-assigned meaningful name and a key ferry. This
section describes both of them.

You might think that this is a lot of bother over nothing,
but consider that lots of wireless microphones and even cameras are likely to
be installed in bedrooms in apartments. Some neighbors will certainly be
strongly motivated to eavesdrop on these devices. Because the wireless channel
is a broadcast channel, the neighbor can mount a “man-in-the-middle” attack that
intercepts the messages passing between the device and your computer, and
pretends to be the device to the computer and the computer to the device.

5.6.1 Device authentication by name

For device authentication, the simplest such mechanism is
for the manufacturer to install a key K-1 in the device, give
it a name dn, and provide a certificate manufacturer says K Þ dn,
for example, Honeywell says K Þ thermo524XN12.Honeywell.com.
In this example the out of band channel is a piece of paper with the name thermo524XN12
printed on it that comes in the box with the thermostat. After installing the
thermostat in the living room, the user goes to a computer, asks it to look
around for a new device, reads the name off the screen, compares it with the
name on the paper, and assigns the thermostat a meaningful name such as LivingRoomThermostat.
Of course a hash of the device’s key would do instead of a name, but it may be
less meaningful to the user (not that 524XN12 is very meaningful). This
protocol only authenticates the device to the computer, not the other way
around, but now the computer can “capture” the device by sending it an “only
listen to this key” message.

In many important cases this assignment needs to be done
only once, even though many different people and computers will interact with
the device. For example, a networked projector installed in Microsoft
conference room 27/1145 might be given the name projector.27-1145.microsoft.com
by the IT department that installs it. When you walk into the conference room
and ask your laptop to look around for available projectors, seeing one that
can authenticate with that name should be good enough security for almost
anyone. Because this name is very meaningful, authenticating to it is just like
authenticating to any other service such as a remote file system.

In many other important cases this assignment only needs to
be done very rarely because the device belongs to one computer, which is the
device’s exclusive user until the computer is replaced. This is typical for an
I/O device such as a scanner or keyboard.

5.6.2 Device authentication by key ferry

There are three disadvantages to pre-assigned names that
might make you want to use a different scheme:

·You might lose the piece of paper, in which case
the device becomes useless.

·You might not trust the manufacturer to assign
the name correctly and uniquely.

·You might not trust the user to compare the
displayed name with the printed one correctly (or at all, since users like to
just click OK).

The alternative to a pre-assigned name as an out of band
channel is some sort of physical contact. What makes this problem different
from peer-to-peer user authentication is that the device may have very little I/O,
and does not have an owner that you can talk to. There are various ways to
solve this problem, but the simplest one that doesn’t assume a cable or other
direct physical connection is a “key ferry”. This is a special gadget that can
communicate with both host and device using channels that are physically
secure. This communication can be quite minimal: upload a key from host into
ferry at one end; download the key out of ferry into device at the other end. The
simplest ferry would plug into USB on the host and the device.
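
5.7 Application authentication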

This section explains how to authenticate applications.
While it’s also important to understand how apps are isolated so that it makes
sense to hold an app responsible for its requests, that is out of scope here.

The basic idea is that apps are principals just like users:

·An app is registered in a domain, with an AppSID
and a name. This domain is typically the publisher’s domain.

·An app is authenticated by the hash of a binary
image, just as a user is authenticated by a key.

·When a host makes a new execution environment
(process, app domain, etc.) and loads a binary image into it, the new
environment gets the hash of the image (and everything that the hash speaks
for) as its ID.

·User, machine, and app identifiers can all
appear on ACLs or as group members.

Also like users, apps can be put into groups, but this is
even more important for apps than it is for users because groups are the tool
for managing multiple versions of apps. Like any group membership, the fact
that an app is a member of the group can be recorded in AD, or it can be
represented in a certificate that is digitally signed by an appropriate authority.
Like groups containing users, groups containing apps can nest to make management
easier. For example, the GoodApps group might have members GoodOffice,
GoodAcrobat,
etc.

AppSIDs are probably assigned from the same space as user,
group, and machine SIDs, though frequently the AppSIDs are from a “foreign”
domain, that of the software publisher (e.g. Microsoft). The assignment is
encoded in a signed certificate (usually in the manifest) that associates the
binary image with an AppSID and a name in the publisher’s domain.

AppSIDs can also be assigned locally by a domain or machine
administrator. This must always be done for locally generated applications, and
can be done for third party applications (where the AppSID is assigned as part
of some approval process). The application is identified by a hash just as in
the published case. The local administrator can sign a manifest just like the
publisher, or can define a group locally or in AD.

ACLs list the users, machines, and applications that are allowed
to access the resource. Sensitive resources might only be accessible through
applications in the GoodApps group. Specialized resources might only be
accessible to specific applications (plus things like backup and restore utilities).

5.7.1 AppSIDs and versions

A certificate for an app is a signed statement that says
something like “hash 743829 => MS/Word12.3.1, s-msft-word12.3.1”.
Applications contain many files; a manifest
is a data structure that defines the entire contents of the application. The
manifest includes hashes of all the component files, and it’s the hash of the
manifest that defines the app.

The manifest can reference system components that are not distributed
with the app (e.g. system .dlls). Such a component is considered to be part of the
platform on which the app is running, not part of the app; see section ‎5.7.2, and it is referred to by a name, which need not
change if the component is patched. There are many complications having to do
with side-by-side execution that are not relevant here; it’s the platform’s job
to ensure that the name gets bound appropriately for both security and compatibility.
In this respect an app treats a platform component just like a kernel call.

The way this is normally encoded is that the publisher
includes the principals that the app speaks for (such as MS/Word12.3.1,
s-msft-word12.3.1)
in the manifest, and then simply signs the hash of the manifest. This is just a
useful coding trick. Of course, the signer of the manifest (or other app
certificate) must be authoritative for the domain of the SID and for the name,
just as for any other speaks-for statement.

If the system trusts its file store, it can verify the
manifest at install time and cache it. This also covers cases where
installation includes updates to registry settings and such.
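
Here is a minimal sketch of install-time manifest verification: hash each component file, compare with the manifest, then hash the manifest itself to get the identity-defining hash. The manifest encoding and the use of SHA-256 are illustrative assumptions, and the publisher's signature check is only indicated in a comment.

```python
import hashlib
import json

def file_hash(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def verify_manifest(manifest):
    """manifest is an illustrative structure:
    {"speaks_for": ["MS/Word12", ...], "files": {path: sha256hex, ...}}.
    Returns the app's identity: the hash of the manifest itself.
    A real implementation would first check the publisher's signature
    on this hash against the trust root."""
    for path, expected in manifest["files"].items():
        if file_hash(path) != expected:
            raise ValueError(f"{path} does not match its manifest hash")
    canonical = json.dumps(manifest, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()  # the AppID-defining hash
```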

There may be good reasons not to change AppSIDs with each small
version change such as a patch. Changing the AppSID requires updating all
policy that references it. Some admins will want to do so; others will not. An
admin can avoid having to update lots of policy by adding a level of indirection,
defining a group and putting the AppSID for each new version into the group;
this gives the admin complete control. Publishers can make the admin’s life easier
by including multiple AppSIDs in a manifest. For example, the manifest for a
version of Word might say that it is Word, Word12, and Word12SP2
as well as Word12.3.1. In SP3, the first two SIDs remain the same.
Then Contoso ITG can say MS/Word12, MS/Word11.7.3
=> Contoso/GoodWord.
Since all trust is local, the structure of the name space for an app is in the
end up to the administrator of the machine that runs it. The job of a publisher
like Microsoft is to provide some versions and names that are useful to lots of
customers, not to meet every conceivable need.

The only assertions an app can make directly are ones
encoded in its manifest. When the app is running it depends on its host environment to provide the isolation
that is needed for an app identity to make any sense. Typically the host
environment is itself hosted, so the entire app identity is actually a stack:

StockChart

IE 7.0.1

Vista + patch44325

Viridian hypervisor + patch7654

MachineSID

At the bottom, the
machine gets its identity from a key it holds. Ideally this key is protected by
the TPM.

We could describe the identity of the app by hashing
together the hashes of all the things below it on the stack, just as we hashed
all the files of the app together in the manifest. This is probably not a good
idea, however, because if there are ten versions of each level in the stack
there will be 100,000 different versions—hard to manage. It’s better to manage
each level separately.

Access control of course sees the whole stack. Taking
account of plausible group memberships, an ACL might say GoodApp on GoodOS on GoodMachine gets access, where “on” here is an informal
operator that makes a single principal out of an app running on a host. This
makes it easy for the administrator to decide independently which apps, which
OS’s, and which machines are good. Going further, the administrator might
define GoodApp on GoodOS on GoodMachine Þ GoodStuff
and just put GoodStuff on ACLs.

Note that the policy for what stacks are acceptable might
come from the app rather than user or administrator. The main example of this
is DRM, in which some remote service that the app calls, such as the license
server, demands some kind of evidence that it is running on a suitably secure
hardware and OS. The app’s manifest might even declare its requirements, but of
course an untrustworthy host could ignore them, so the license server has to
check the evidence itself.[15]

When a running program loads some new code into itself (a
dll, a macro, etc.), it has a number of options about the appID of the
resulting execution environment. It can:

1.Use
the new code’s appID to decide not to load it at all.

2.Trust
the code and keep the same AppID the host had before. This is typically what
happens at an extensibility point, or in general when an app calls LoadLibrary.

3.Downgrade
its own AppID to reflect less trust in the new code.

4.Sandbox
the new code and add another level to the stack. Of course the credibility of
the resulting AppID is only as good as the isolation of the sandbox.

ACL entries on the operation of loading code can express this
choice. Note that when an app calls CreateProcess, for example, it
is not loading new code into itself, but asking its host OS to create a sibling
execution environment, and it’s the host’s job to assign the appID for the new
process, which might have different, even greater rights than the app that called
CreateProcess.
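
5.8 Compound principals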

Simple principals
that appear in access control policy are usually human beings, devices or
applications. In many cases, two or three of these will actually provide proof
(authenticate a request). Today only one principal typically provides proof—either
a human being or a computer system. Multiple proofs of origin can be used to
strengthen security. One important example of this is combining a user
identifier and an appID. There are two main ways this can be done:

Protected
subsystem: access is granted only to the combination of two principals,
not to either of them alone—for example, opening of a file for backup can
be allowed to a registered backup operator, but only when that operator is
also running a registered backup application.

Restricted
Process: the desired access is granted only if each of the two or more
principals qualifies for that access individually[16]—for example, an applet downloaded from
a web page at xyz.com
might be allowed to access things on xyz.com but not on the user’s local machine, and the
user running that applet might have access only to objects that the user
and the applet both can access.

These two ways of
combining principals correspond to and and or. The principal billg and HeadTrax is billg running the HeadTrax protected subsystem;
Windows doesn’t currently have a way to add such an appID to a security context.
The principal billg or MyDoom is billg running the MyDoom virus; in Windows today
this is a billg process
with a MyDoom restricted
token.

A Windows security
context (or NT token) is a set of SIDs that defines a principal: the and
of all those SIDs. This principal can exercise all the power that any of those
SIDs can exercise. Thus when a security context makes a request, the interpretation
is that each of the SIDs independently makes that request; if any of them is on
the resource’s ACL, the request is granted. So security context says request is
SID1 says request and SID2 says request ..., which is another way of saying
that security context = SID1 and SID2 and ....

There are other
uses for compound principals made with and. Financial institutions often
demand what they call dual control: two principals have to make a request in
order for it to get access to an object such as a bank account. In speaks-for
terms, this is P1 and P2 Þ object. The method for making long-term keys
fault-tolerant described in section 5.1.2 is another example of this, which
generalizes and to k-of-n.

There are also
other uses for compound principals made with or. In fact, an ACL is such
a principal. It says that (ACE1 or ... or ACEn) Þ object.

5.9 Capabilities

A
capability for an object is a claim that some principal speaks for the object
immediately, without any indirection. A familiar example in operating systems
is a file descriptor or file handle for an open file. When a process opens the
file, the OS checks that it speaks for some principal on the file’s ACL, and
then creates a handle for the open file. The handle encodes the claim that the
process speaks directly for reads and writes of the file, without any further
checking; this claim is encoded in the OS data structure for the handle. A
capability is thus a summary of a trust chain. Usually it has a quite
limited period of validity, in order to avoid the need to revoke it if the
trust chain becomes invalid.

For
a capability to work without a common host such as an OS, it must be in a token
of the form object says P Þ object that the object issues after
evaluating a trust chain. Later P can make a request along with this
token, and the object will grant access without having to examine the whole
chain. Such a token doesn’t have to be secret, since it only grants authority
to P.
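
A minimal sketch of such a capability, using a MAC under a key known only to the object; the token encoding and the five-minute validity period are illustrative assumptions.

```python
import hashlib
import hmac
import time

# The object holds a secret key and issues "object says P => object"
# capabilities as MACs over (P, expiry). The token need not be kept
# secret: it grants authority only to P, who must still prove it is P.
OBJECT_KEY = b"object's private MAC key"   # illustrative

def issue_capability(principal, lifetime_s=300):
    # Called only after the full trust chain for principal has checked out.
    expiry = int(time.time()) + lifetime_s   # short validity: no revocation
    mac = hmac.new(OBJECT_KEY, f"{principal}|{expiry}".encode(),
                   hashlib.sha256).hexdigest()
    return principal, expiry, mac

def check_capability(principal, expiry, mac):
    # Grant access without re-examining the trust chain.
    if time.time() > expiry:
        return False
    good = hmac.new(OBJECT_KEY, f"{principal}|{expiry}".encode(),
                    hashlib.sha256).hexdigest()
    return hmac.compare_digest(mac, good)

cap = issue_capability("KAlice")
assert check_capability(*cap)
```

6 Authorization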

The main problem with authorization is
management. Products usually have enough raw functionality to express the
customer’s intent, but there is so much detail to master that ordinary mortals
are overwhelmed. The administrator (or user) needs a way to build a model
of the system that drastically reduces the number of items they need to configure.
The model needs to not only handle enterprise level security, but also “scale
down” to small businesses and homes where there is no professional IT
administrator, to peer-to-peer systems, and to mobile platforms and small devices.

Authorization also needs to be feasible to
implement. It needs to scale up to
the Internet, avoiding algorithms and data structures that only work for
intranet-sized systems or that depend on having a single management authority
for the whole system. Everything that works locally should work on the
Internet. Authorization needs to support least
privilege, by taking account of application as well as user identity, so
that trusted apps can get more privileges and untrusted ones fewer; this must
work even though apps come in many versions and are extensible. And it needs to
be efficient: fast in the common
case and reasonable in complex cases, even in a large system; it needs to
identify problem cases so that people setting policy can avoid them.

6.1 Overview

The underlying semantics of authorization is
the notion of “speaks-for”: there is a chain of principals, starting with the
principal making a request (typically a channel on which the request is transmitted
or an encryption key that signs the request) and ending with the resource. The
chain from KSSL to Spectra in section 3 is an example.

We call the part of this chain closer to the user
“authentication”, and the part closer to the resource “authorization”. This
division is somewhat arbitrary, since there is no sharp dividing line.

In order
to make authorization more manageable, you can build a model that collects resources
into scopes and defines roles, each with a set of
predefined permissions to execute operations on the resources in the scope. In
addition, you can build a template
for a scope and its roles, and then instantiate the template multiple times for
different collections of resources that have the same pattern of authorization
policy. Figure
7 is an overview that shows the main steps in
specifying and checking authorization.

This model-based access control (MBAC)
organizes resources into scopes and
principals making requests into roles.

1.The developer or IT architect defines templates for scopes and roles that can
be used repeatedly in similar situations.

2.The administrator or owner makes instances of these templates, groups
resources into scopes, and assigns principals to roles.

The remainder of the picture shows how to
implement the policy that the model defines.

3.The system compiles or synchronizes the model’s policy into groups, claims, and ACLs on resources
used to do access checks efficiently. When a service starts it acquires its own
identity and resource groups, along with those of its enclosing execution environments
(OS, device, etc.).

4.The user logs in to a service and acquires
groups and claims from the directory or STS to add to the identifiers she
already has. The system combines these with resource manager claims and service
trust policy to obtain a set of principals that the service thinks the user
speaks for.

5.Finally, the set of principals is checked
against the ACL for the resource the user is trying to access.

The templates and instances are part of
MBAC. The acquisition and access check are part of implementation. The model
and implementation are connected when the policy is synchronized.

6.2 Model-Based Access Control (MBAC)

The idea of MBAC is to make authorization
policy accessible to ordinary mortals; think of it as Excel for authorization.
The main customer pain point is that security management is too hard. There are
thousands of security knobs (individual ACLs, privileges, resource names, etc.)
on each computer, and in a large installation there are thousands of computers.
No human can keep that number of separate objects in mind. The model conceals
the complexity of the underlying implementation from users and administrators
(though they can dive down into individual groups and ACLs if they really need
to).

MBAC shines when complex policies apply to
multiple objects. It reduces repetitive manual effort by the administrator, and
makes it easy to find out what the policy is after a long history of
incremental changes. Our examples are necessarily contrived, since something
simple enough to put in this paper is simple enough to do manually. So use your
imagination to see how the reduction in administrative work is actually substantial
for real world scenarios.

Figure 8: The admin sees two scopes, emerald
and amber; both are instances of
a project repository template.
A project has two roles, dev and pm.
Sondra is a dev for emerald and a pm
for amber.

Figure
8 shows the administrator’s view of a model for part of
a system—two project repositories that are scopes for resources, one for the emerald
project and one for the amber project. Each project has
two roles: one for PMs and one for devs. When deploying a project repository you
create a group for each role, containing the users who are in that role for
that project. Thus a scope is a collection of resources, and a role is a collection
of principals.

This is a simple model—the admin just puts a
user, such as Sondra, into the correct group, and all the permissions and
memberships are created as a consequence. The actual situation might be
messier, as Figure
9 shows. Administering this manually would be quite difficult,
but with MBAC the administrator doesn’t have to worry about the mess when
configuring authorization policy.

Someone has to worry, of course, and that
person is the designer of the template, typically a developer or an IT architect.
Figure
10 shows the SharePoint
template and the emerald.specs scope that is an
instance of it. Such a leaf scope corresponds to an instance of a service along
with (a subset of) its resources. The developer of the service, in addition to
coding the service, creates a scope template that defines the roles for
the service. A role determines the permissions for a user in that role. Each
role is tailored to enable a user to perform some task—like being a teller, or
an HR benefits clerk, or in this example, a contributor or viewer of documents
on a SharePoint server. A viewer can read documents; a contributor can edit
documents, and also is a viewer (this is an example of role nesting). These
predefined roles determine the combination of permissions that get tested, to
make sure that they correctly enable the desired tasks. Thus the developer or
IT architect is responsible for all the details of authorization policy within
the scope. From the point of view of the administrator, all the ACLs are
immutable.

Figure 10: A template and an instance emerald.specs
for Sharepoint; Sondra is a viewer.

The administrator instantiates the scope
template to create a scope. The same template can be used to create many scopes.
Figure
10 shows one of these, in which the contributor and
viewer roles have the same permissions for the SharePoint resource in the scope
that the corresponding role templates had in the template. The administrator
has put Sondra into the viewer role for the emerald.specs
scope. Each scope precisely mirrors the scope template and has the resources,
roles, and permissions defined in the template, just as each instance of a
class in an object oriented programming language precisely mirrors the class
definition.
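
Here is a minimal sketch of templates and instantiation in Python; the data structures, the role-nesting encoding, and the role and permission names are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class ScopeTemplate:
    # Defined once by the developer or IT architect.
    name: str
    roles: dict                    # role name -> set of permissions
    nests: dict = field(default_factory=dict)   # role -> role it includes

@dataclass
class Scope:
    # Created by the administrator; precisely mirrors its template.
    name: str
    groups: dict                   # role name -> set of principals

def instantiate(template, scope_name):
    return Scope(scope_name, {role: set() for role in template.roles})

def permissions(template, scope, user):
    perms = set()
    for role, members in scope.groups.items():
        if user in members:
            perms |= template.roles[role]
            nested = template.nests.get(role)
            if nested:
                perms |= template.roles[nested]   # e.g. contributor is a viewer
    return perms

sharepoint = ScopeTemplate("SharePoint",
                           roles={"viewer": {"read"}, "contributor": {"edit"}},
                           nests={"contributor": "viewer"})
specs = instantiate(sharepoint, "emerald.specs")
specs.groups["viewer"].add("Sondra")              # the admin's only job
assert permissions(sharepoint, specs, "Sondra") == {"read"}
```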

An IT architect can create higher level templates.
In Figure
11 SharePoint is used to create the project repository
we described earlier. The project has two subparts, called
specs and source. The PM
role is assigned to the contributor role in the specs
server, and the viewer role in the source
server. A part’s roles constitute the interface that it exports to containing
scopes. The smallest parts are actual services such as SharePoint;
composite parts such as project contain subparts. The
architect can nest these as deeply as necessary. We expect that there will be a
market for templates that are useful to more than one organization.

Figure 11: Build bigger parts from smaller
ones. The specs and source scope templates are
SharePoint scope templates that are parts of the outer project
scope template, and the inner contributor and viewer
role templates are populated from the outer pm
and dev ones.

Because the IT architect defines this for
all project repositories, all the admin has to do is instantiate the model; she
no longer needs to understand all of the details. Two instances of the project
template called emerald and amber
would get us back to Figure
8.

6.3 The model and the real world

This section explains how the model is connected
to the code and data in the real world that it is modeling. Although usually we
ignore the distinction between the model and the real world, in this section we
need to be clear about it, so we call the real world thing that corresponds to
an object in the model its entity.

The goal is to keep the model and the real
world synchronized, so that changes in entities (and especially creation of new
entities) are reflected in the model, and the access control policy set by the
model is reflected in its entities. There are three basic issues in synchronization:

1.Naming:
An object in the model and its entity in the real world are not necessarily
named in the same way.

2.Delay:
An object and its entity are supposed to be in sync, but there may be some delay.

3.Aggregation:
When entities change, how are the changes aggregated for notifying the model?

6.3.1 Naming: Paths and handles

Objects are named by paths: sequences of
field names and queries (for selecting an object from a set-valued field).
Entities are named by handles, which are opaque from the viewpoint of
the model. The handle must have enough information to enable secure communication
with the root entity.

Because paths and handles are different in
general, there has to be a way to map between them. In particular, if the model
wants to refer to an object’s entity, it needs the entity’s handle. Similarly,
if an entity wants to refer to its object, it needs the object’s path. We take
the view that MBAC should work without any changes to entities, as long as they
have some sort of interface that is adequate for implementing the get,
set, and enum methods described below.
Thus the model needs to keep track of each object’s handle, which it can do by
storing it as part of the object.

In some cases a path may itself be a
suitable handle. For example, the model for a file system has objects that correspond
to directories and files with isomorphic names. Thus a directory object do
has a set-valued contents field whose elements
are the files and directories in do, each with a name
field. So a file with pathname a\b corresponds to the object
whose path is contents?{.name=”a”}.contents?{.name=”b”}.
As this example illustrates, a path may include queries, and hence to use a
path as a handle the entities have to be able to understand a query well enough
to follow a path. The simplest kind of query has the form [.name =
“foo”], where name is a primary key, and this
shouldn’t be too hard for an entity.

6.3.2 The model is in charge

The model can read, and perhaps change, the
abstract fields of an entity that correspond to fields of the model by invoking
the get and set
methods of a corresponding object: obj.get(f) allows
the model to read the value of field f in the
entity, and obj.set(f, value) allows the
model to set the access control policy of the entity. If f
is an object, get returns a handle to that
object; see below. If a field is a large set, these methods are not suitable,
so set fields have a different method: obj.enum(f, i)
returns a handle to the ith element of the set, or nil
if it has fewer elements (along with a generation number that increases every
time something happens to change the object numbering). To change the
membership of the set you use operations on the containing scope, such as create.
Using these APIs a model can fully explore its entity (as long as the entity
isn’t changing too fast), learn the handles of all the entities, fill in all the
fields of the model, and tell the entity the values of any fields that are
determined by the model (normally roles).
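
Here is a minimal sketch of this interface and of a crawl that fills in a model; the field names, the generation number, and the dict-based model are illustrative assumptions.

```python
class Entity:
    """Minimal real-world entity exposing the get/set/enum interface."""

    def __init__(self, fields):
        self._fields = fields        # field name -> value, set, or object
        self._generation = 0         # a real entity would bump this whenever
                                     # something changes the object numbering

    def get(self, f):
        return self._fields[f]       # for an object field, this is a handle

    def set(self, f, value):
        self._fields[f] = value      # the model pushes policy (normally roles)

    def enum(self, f, i):
        members = sorted(self._fields[f])
        handle = members[i] if i < len(members) else None
        return handle, self._generation

def crawl(entity, model):
    """Read out the entire state with get and enum: the fallback when
    change notification has been lost and the model must resync."""
    for f in ("name", "roles"):
        model[f] = entity.get(f)
    model["members"], i = [], 0
    while True:
        handle, _gen = entity.enum("members", i)
        if handle is None:
            break
        model["members"].append(handle)
        i += 1

e = Entity({"name": "emerald.specs", "roles": {},
            "members": {"sondra", "bill"}})
model = {}
crawl(e, model)
assert model["members"] == ["bill", "sondra"]
```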

In order to use MBAC, an entity must
implement these APIs. It may also need to implement query
and assign APIs to deal efficiently
with large sets of objects. To reflect changes to the entity in the model more
efficiently than by polling we may also want a change log. Entries in this log
are (h, f) pairs, meaning that
field f of entity h
has changed.

6.3.3 Notification and aggregation

With these APIs the only way for the model
to find out about changes in the entities is to do a crawl, that is,
read out the entire state again with get and enum.
This seems impractical for models of any size, so it’s necessary to have some
kind of change notification. Notification has three issues:

1.It
has to be extremely reliable, since if any changes are missed the model’s state
will diverge from reality, and the only way to get it back in sync is do to a
crawl.

2.The
entity’s name space is handles, so it can only report changes in terms of
handles. These have to be mapped to paths.

3.It
might be desirable to aggregate all the notifications below some point in the
tree.

6.4 Scale Up

Current OS authorization mechanisms can
scale quite well to enterprises (one Windows AD installation exists that holds
6 million users, for example). They need some work, however, if they are to
scale to the Internet, both because things can get much bigger on the Internet,
and because there’s no single management authority that is universally trusted.

There are some basic features of access
control that are important for scaling up:

1.All authentication and authorization statements
(speaks-for statements) can be represented in three different ways:

·They can be stored locally (for example, in the trust
root).

·They can be held in a database on the network
(for example, active directory) and delivered over a secure authenticated connection.

·They can be expressed in a digitally signed
certificate (for example, X.509 or SAML tokens), which can be stored and
forwarded among the various parties in the transaction.

The
first and third ways permit offline operation and offload of online services
(caching). The third way means that claims can be transmitted via untrusted
parties.

2.All principal identifiers that are passed from
one system to another are globally unique. This means that there’s no ambiguity
about the meaning of an identifier.

3.Any system or domain can make use of statements
from any other domain. It is trust policy, rather than domain boundaries, that
distinguishes friend from foe.

4.There is an unavoidable tradeoff among
freshness, availability, and performance. If you want the latest information
about whether a key is revoked, for example, you cannot proceed if the source
of that information is unavailable, and you must pay for the communication to
get it. This tradeoff should be controlled by policy, rather than being baked
in. For example, here are two possible policies for key revocation:

·Fail without a fresh OCSP for every access.

·If OCSP isn’t available, treat all cached
statements as valid for some period.

Neither one is unconditionally better than the other; it’s a
matter for administrators’ judgment to choose the appropriate one.

In
addition to these general principles, there are two topics that require special
attention in scaling to the Internet:

·Trust in attribute claims made by other
authorities.

·Handling groups, because both the number of
groups that a principal belongs to and the total size of a group can become extremely
large.

6.4.1 Scale Up: Attribute Claims

An attribute differs from a group in two
ways:

·It can have a value associated with it, for
example, birthdate.

·There may not be a single authority responsible
for its definition. For example, birthdates may be certified by any one of 50
state driver’s license issuing authorities.

For scaling up, only the second point is important. The first
one is handled by conditions.

It is a system’s trust policy that handles
attributes from other authorities. For example, consider using a driver’s license
from another state to verify date of birth at a bar in New York. It’s convenient
for states to agree on the string name of this property. Oasis.org
is a standards organization, and we will use oasis.org/birthdate
as the standard name.

The first step is for the bar’s trust policy
to say what the primary authority is for this property:

KNY Þ oasis.org/birthdate

Then the primary authority says which other sources to trust:

KNY says KWA/oasis.org/birthdate Þ oasis.org/birthdate

This says that New York believes Washington about birth dates.
If they have a broader agreement, New York might believe Minnesota about all
properties defined by oasis.

KNY says KMN/oasis.org/* Þ oasis.org/*

Name translation can be done, too. Suppose
Illinois doesn’t adopt the oasis name:

KNY says KIL/DOB Þ oasis.org/birthdate
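
Here is a minimal sketch of this trust policy as a lookup table; the key names and the table encoding are illustrative assumptions.

```python
# The bar's trust policy, written as claims about who may assert the
# standard attribute name.
PRIMARY = {"oasis.org/birthdate": "KNY"}     # KNY => oasis.org/birthdate

# "KNY says X => oasis.org/birthdate": which foreign assertions the
# primary authority accepts, including a name translation for Illinois.
ACCEPTED = {
    ("KWA", "oasis.org/birthdate"): "oasis.org/birthdate",
    ("KIL", "DOB"): "oasis.org/birthdate",
}

def standard_attribute(issuer_key, attribute_name):
    """Map an attribute asserted by some authority to the standard name
    it speaks for, or None if the trust policy does not accept it."""
    if PRIMARY.get(attribute_name) == issuer_key:
        return attribute_name                # the primary authority itself
    return ACCEPTED.get((issuer_key, attribute_name))

assert standard_attribute("KNY", "oasis.org/birthdate") == "oasis.org/birthdate"
assert standard_attribute("KIL", "DOB") == "oasis.org/birthdate"
assert standard_attribute("KOR", "DOB") is None   # no claim covers Oregon
```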

6.4.2 Scale Up: Group Claims

Group membership is a scaling problem today,
at least in large organizations. The reason is that a user can be a member of
lots of groups, and a group can have lots of members. Today Windows manages
this problem in two ways:

·By distinguishing client and resource
groups (also called domain global and domain local groups in Windows), and
imposing restrictions on how they can be used.

·By allowing only administrators to define groups
used for security.

Figure 12: Corporate subscribers can access
CACM online. The arrows are group membership.

Figure
12 illustrates the problem. Imagine that ACM creates a
group of corporate subscribers to its online digital library. There are 1000 corporate
members, each with 10-1,000,000 employees, for a total of millions of individual
members. Furthermore, every Microsoft employee may implicitly be a member of
thousands of such groups, since Microsoft subscribes to lots of services. Thus
a client may be in too many groups to list, and a resource may define a group
with too many members to list.

In addition, there may be a privacy problem:
the client may not want to disclose all its group memberships, and the server
may not want to disclose all the groups that it’s using for access control.

This is the group expansion, or path
discovery, problem. The solution that Windows adopts today, and that we generalize,
is to distinguish two kinds of groups:

·Client
groups (also called push groups),
which the client is responsible for asserting when it contacts the resource. An
individual identifier is a special case of a client group. Thus in Figure 12, the client groups are green: billg,
FTE-Redmond6, and MicrosoftFTE.
A requestor’s client groups are thus known to all resources (subject to privacy
constraints), but there can only be a limited number of them.

·Resource
groups (also called pull groups),
which the resource is responsible for keeping track of and expanding as far as
client group members. In Figure
12 the resource groups are blue: ACMCorpSubs
and CACMAccess. The resource thus
knows all the client groups that are members, but there can be only a limited
number of them.

A client group can only have other client
groups as members. This means that there can be only one transition from green
to blue in the figure. The client asserts all its client group memberships, and
the resource expands its resource groups to the first level of client groups.
Consequently, if there is any path
from the client to the resource, what the client presents and what the resource
knows will intersect and the resource will know it should grant access.
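
The access decision can be sketched as a set intersection. The group data is taken loosely from Figure 12; the dictionary encoding is an illustrative assumption.

```python
def expand_resource_groups(resource_groups, acl_entry):
    """Expand a resource group only as far as first-level client-group
    members; client groups are never expanded by the resource."""
    frontier, client_members = [acl_entry], set()
    while frontier:
        g = frontier.pop()
        for member in resource_groups.get(g, {g}):
            if member in resource_groups:   # still a resource (pull) group
                frontier.append(member)
            else:                           # a client (push) group: stop here
                client_members.add(member)
    return client_members

# Blue resource groups and green client groups, after Figure 12.
resource_groups = {
    "CACMAccess": {"ACMCorpSubs"},
    "ACMCorpSubs": {"MicrosoftFTE"},   # 1000 corporate members in reality
}
client_asserts = {"billg", "FTE-Redmond6", "MicrosoftFTE"}  # pushed by client

granted = bool(client_asserts &
               expand_resource_groups(resource_groups, "CACMAccess"))
assert granted
```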

Client groups are a generalization of
today’s domain global groups in AD. Unlike domain global groups, client groups
can have members from other domains, but the client must know all the client groups it belongs to so that it can assert
them, because the resource won’t try to expand client groups.

Resource groups are a generalization of
today’s domain local groups in AD. Unlike domain local groups, resource groups
can be listed on the ACL of any resource so long as the resource has permission
to read the group membership. It’s the resource administrator’s job to limit
the total size of the group, measured in first-level client groups. The resource
may cache the membership of third
party resource groups.

An added complication is that today Windows
eagerly discovers all the resource groups in a domain a client belongs to when
the client connects to any resource in the domain. This makes subsequent access
checks efficient, and the protocols allow the client and the resource to
negotiate at connection time, but if the domain is big (for example, if it contains
lots of big file servers) there might be too many resource groups. To handle
this, resources may use smaller resource scopes than an entire domain – for
example, a service.

To sum up, the way to handle large-scale
group expansion is by distinguishing client and resource groups. This extends
what Windows does today in five ways:

1.The
client and resource can negotiate what group memberships (or other attributes)
are needed.

2.Both
client and resource can query selected third parties for groups.

3.Both
client and resource can cache third party groups. The client must do this,
since it must assert all its client groups.

4.The
resource can use a smaller scope to limit the number of resource groups that
get discovered when the client connects.

5.The
client can be configured to know which groups the resource requires.

Appendix: Basic facts about cryptography

Distributed computer security depends heavily on
cryptography, since that is the only practical way to secure communication
between two machines that are not in the same room. You can describe
cryptography at two levels:

·Concrete: how to manipulate the bits

·Abstract: what the operations are and what properties
they have

This section explains abstract cryptography; you can take it on
faith that there are concrete ways to implement the abstraction, and that only
experts need to know the details.

Cryptography depends on keys. The essential idea is that if
you don’t know the key, you can’t do X, for various values of X. The key is the
only thing that is secret; everything about the algorithms and protocols is
public. There are two basic kinds of cryptography: public key (for example, RSA
or elliptic curve) and symmetric (for example, RC4, DES, or AES). In public key
(sometimes called asymmetric) cryptography, keys come in pairs, a public
key K and a secret key K-1. The public key is public,
and the secret key is the only thing that is kept secret. In symmetric crypto
there is only one key, so K = K-1.

Cryptography is useful for two things: signing and sealing.
Signing provides integrity: an assurance that signed data hasn’t changed since
it was signed. Sealing provides secrecy: only the intended recipients can learn
any of the bits of the original data even if anyone can see all the bits of the
sealed data.

For signing, the primitives are Sign(K-1,
data), which returns a signature, and Verify(K,
data, signature), which returns true if and
only if signature = Sign(K-1, data).
The essential property is that to make a signature that verifies with K
requires knowing K-1, so if you verify a signature, you know
it was made by someone that knew K-1. With public key, you
can verify without being able to sign, and everyone can know K, so the
signature is like a network broadcast. With symmetric crypto, anyone who can
verify can also sign, since K = K-1, so the signature
is basically from one signer to one verifier, and there’s no way for the
verifier to prove just from the signature that the signature came from the
signer rather than from the verifier itself.

For sealing, the primitives are Seal(K,
data), which returns sealed data, and Unseal(K-1,
sealedData), which returns data if and only if sealedData =
Seal(K,
data). The essential property is that you can’t learn any bits of data
(other than its length) from sealedData unless you know K-1.
With public key, anyone can seal data with K (since K is public)
so that only one party can unseal it; thus lots of people can send different
secrets to the same place. With symmetric crypto, the sealing is basically from
one sealer to one unsealer.
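
Here is a sketch of the four primitives using the third-party pyca/cryptography package (an assumption; any library with RSA signing and sealing would do).

```python
# pip install cryptography  (the pyca/cryptography package)
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

k_inv = rsa.generate_private_key(public_exponent=65537, key_size=2048)  # K^-1
k = k_inv.public_key()                                                  # K

data = b"issuer says KAlice => Alice@Intel.com"

# Signing: only the holder of K^-1 can sign; anyone who knows K can verify.
pss = padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                  salt_length=padding.PSS.MAX_LENGTH)
signature = k_inv.sign(data, pss, hashes.SHA256())
try:
    k.verify(signature, data, pss, hashes.SHA256())   # raises if forged
except InvalidSignature:
    raise SystemExit("signature did not verify")

# Sealing: anyone who knows K can seal; only the holder of K^-1 can unseal.
oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)
sealed = k.encrypt(b"a session key", oaep)
assert k_inv.decrypt(sealed, oaep) == b"a session key"
```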

There’s a trick that uses public key sealing to get the
effect of a signature in one important case; it’s the usual way of using a certificate
to authenticate an SSL session. Suppose you have made up a symmetric key K
(usually a session key) and you want to know K Þ P, that is, any messages
signed with K that you don’t sign yourself come from another party P.
Suppose you have a certificate for P, that is, you know KP Þ P. This means that only P knows KP-1.
The usual way to authenticate K is to get a signed statement KP says K Þ P from P. Instead, you
can compute SK = Seal(KP, K)
and send it to P in the clear. Only P can unseal SK, so
only P (and you) can know K.
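The trick can be sketched in the same style (my sketch; both parties are simulated in one process, and the variable names are mine):

    # We (the client) invent a session key K and seal it with P's
    # public key K_P, learned from P's certificate; only P can unseal it.
    import hmac, hashlib, os
    from cryptography.hazmat.primitives.asymmetric import rsa, padding
    from cryptography.hazmat.primitives import hashes

    oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                        algorithm=hashes.SHA256(), label=None)

    kp_secret = rsa.generate_private_key(public_exponent=65537,
                                         key_size=2048)   # K_P^-1, known only to P
    kp_public = kp_secret.public_key()                    # K_P, from the certificate

    k = os.urandom(32)                      # the session key K we made up
    sk = kp_public.encrypt(k, oaep)         # SK = Seal(K_P, K), sent in the clear

    # P unseals SK; a later MAC under K that we didn't compute
    # ourselves must therefore have come from P.
    k_at_p = kp_secret.decrypt(sk, oaep)
    msg = b"a message from P"
    tag = hmac.new(k_at_p, msg, hashlib.sha256).digest()
    assert hmac.compare_digest(tag, hmac.new(k, msg, hashlib.sha256).digest())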

[1]
My colleagues Martin Abadi, Carl Ellison, Charlie Kaufman, and Paul Leach made many suggestions for improvement and clarification. Some of these ideas originated in the Taos authentication system [4, 6].

[2]
See the appendix for a sketch of what you need to know about cryptography.

[3]
Saying that the workstation signs with the public key K_logon means that it encrypts with the corresponding private key. Through the magic of public-key cryptography, anyone who knows the public key can verify this signature. This is not the only way to authenticate an SSL connection, but it is the simplest to explain.

[4]
Intel can do this with an X.509 certificate, or by responding to a query “Is K_Alice the key for Alice@Intel.com?”, or in some other secure way.

[5]
Programs usually can deal only with identifiers, not with the real-world principals
that they denote. In this paper we will ignore this distinction for the most
part.

[6]
For a symmetric key we can use a hash of it as the public name of the channel,
though of course this is not enough to verify a signature.
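As a small illustration (my own sketch, using Python’s standard hashlib), the channel keyed by k can be named publicly by a hash of k:

    import hashlib, os

    k = os.urandom(32)                              # the symmetric channel key (secret)
    channel_name = hashlib.sha256(k).hexdigest()    # public name; reveals nothing useful
                                                    # about k, but by itself cannot verify
                                                    # a signature made with k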

[7]
Sometimes people call age=32 an “attribute-value” or an “attribute-value pair”, and call age an “attribute”. This is perfectly good English; it might even be better English than calling age=32 an attribute. But it is confusing to have both meanings for “attribute” floating around. In this paper, “attribute” means the pair age=32, and age is the attribute name. Sometimes we say “the age attribute”, meaning an attribute whose name is age.

[8]
This is not the only meaning of ‘group’ in English, in computing, or in
security, but it is the usual meaning and the one we adopt.

[9]
The hash of some data is long-lived in the sense that it won’t change. However,
the hashes that are important for access control are hashes of code, and the
hash of code that you care about changes frequently, because of patches and new
versions. So in practice a hash has a much shorter lifetime than many keys.

[10]
Preferring names would also work, and it would be simpler since there would be no need for the SID↔name correspondence, but it leads to inconvenience when a name changes, and to insecurity when a name is reused.

Preferring keys seems appealing at first, since although it needs a key↔name correspondence, it doesn’t need anything else. Unfortunately, it’s insecure when a key is compromised, unless the key in policy is treated not as a direct identifier but as something that can be mapped reliably to a key that is currently valid. This makes a key harder to handle than a SID: since you can’t tell by looking at a key whether it has been compromised, you have to do this mapping every time.

[12]
More precisely, the view is some function v of B’s state s_B, and A knows v(s_B^past), where s_B^past is some past value of s_B. A is consistent with B if v(s_B^past) = v(s_B).
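A toy illustration of this definition (mine, in Python): A caches v(s_B^past) and remains consistent as long as the view, not necessarily the whole state, is unchanged.

    # v projects out the part of B's state that A's cached view depends on.
    v = lambda state: state["groups"]

    s_B_past = {"groups": {"eng"}, "load": 10}   # B's state when A last looked
    cached = v(s_B_past)                         # what A knows: v(s_B_past)

    s_B = {"groups": {"eng"}, "load": 99}        # B's state now: changed, but
    assert cached == v(s_B)                      # not the view, so A is still
                                                 # consistent with B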

[13]
Sometimes a special kind of lock called a lease is acceptable; this is a
lock that times out. A lease prevents its issuer from changing the state until
either the leaseholder releases it, or the lease times out. People usually
don’t use leases for security information, but they could.

[14]
You might think that one protocol could work for any kind of authentication factor. There are two reasons for using different protocols. One is purely historical: existing services used particular protocols. The other is that some protocols, such as Kerberos, depend on the fact that the workstation has a key that it can use to communicate secrets to the service. In Kerberos, for example, the user’s password is the source for such a key; biometric samples can’t serve this purpose. Other protocols, such as SSL, create a secure channel to the service and authenticate it starting with nothing but a trust root entry for a generic authority such as Verisign. As far as I know, SSL secure channel setup, together with conventions for finding the service to use, encapsulating the evidence, and allowing for interaction between the user and the service, would be a universal protocol.

[15]
The app itself could also demand properties from its host, but since the host
has complete control over the app, this demand could not be enforced very
securely. Ideally the evidence for the license server is a chain of
certificates rooted in the hardware TPM’s key.

[16]
This kind of access is provided today in Windows by the restricted token, in which one has effectively two NT tokens, one for the user’s principals and one for a service ID. AccessCheck is called with each of those tokens, and the Boolean results of those calls are then ANDed.
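A minimal sketch of the restricted-token rule (mine, in Python; access_check and the token and ACL shapes are hypothetical stand-ins, not the actual Windows AccessCheck API):

    # Access is granted only if BOTH the user's token and the
    # service's token independently pass the check.
    def access_check(token: set[str], acl: dict[str, set[str]], right: str) -> bool:
        """True if any principal in the token is granted the right by the ACL."""
        return any(right in acl.get(principal, set()) for principal in token)

    user_token = {"Alice", "Employees"}
    service_token = {"WebServerService"}
    acl = {"Employees": {"read"}, "WebServerService": {"read"}}

    # The restricted token grants access only when both checks succeed.
    allowed = (access_check(user_token, acl, "read") and
               access_check(service_token, acl, "read"))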