Something You Know, Have, or Are

Methods
for authenticating people differ significantly from those for
authenticating machines and programs because of the major
differences in the capabilities of people versus computers. Computers are great at
doing large calculations quickly and correctly, and they have large
memories in which they can store and later retrieve gigabytes of
information. Humans don't.
So we need to use
different methods to authenticate people.
In particular, the cryptographic protocols we've already discussed
are not well suited when the principal being authenticated is
a person (with all the associated limitations).

All approaches for human authentication rely on at least one of the following:

Something you know (e.g., a password).
This is the most common kind of authentication used for
humans. We use passwords every day to access our systems.
Unfortunately, something that you know can become something you
just forgot. And if you write it down, then other people might find it.

Something you have (e.g., a smart card).
This form of human authentication removes the problem of
forgetting something you know, but some object now must be with you
any time you want to be authenticated. And such an object might be stolen,
whereupon it becomes something the attacker has.

Something you are (e.g., a fingerprint).
Base authentication on something intrinsic to the principal
being authenticated. It's much harder to
lose a fingerprint than a wallet. Unfortunately, biometric sensors
are fairly expensive and (at present) not very accurate.

We now explore each category in depth.

Something You Know

The idea here is that you know a secret --- often called
a password --- that
nobody else does.
Thus, knowledge of a secret distinguishes you from all other individuals.
And the authentication system
simply needs to check to see if the person claiming to be you knows
the secret.

Unfortunately, use of secrets is not a panacea.
If the secret is entered at some sort of keyboard,
an eavesdropper
("shoulder surfing") might see the secret being typed.
For authenticating machines, we used
challenge/response protocols to avoid sending a secret (key) over the
wire where it could be intercepted by a wiretapper.
But we can't force humans to engage in a challenge/response protocol on
their own, because people cannot be expected to do cryptographic calculations.
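To make concrete what a challenge/response exchange demands, here is a minimal sketch in Python (HMAC-SHA-256 stands in for whatever MAC a real protocol would use; all names are hypothetical):

```python
import hashlib
import hmac
import os

SHARED_KEY = os.urandom(32)  # secret key known to both parties

def make_challenge() -> bytes:
    """Server picks a fresh random nonce for each authentication attempt."""
    return os.urandom(16)

def respond(key: bytes, challenge: bytes) -> bytes:
    """Client proves knowledge of the key without ever sending it."""
    return hmac.new(key, challenge, hashlib.sha256).digest()

def verify(key: bytes, challenge: bytes, response: bytes) -> bool:
    """Server recomputes the MAC and compares in constant time."""
    expected = hmac.new(key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)

challenge = make_challenge()
response = respond(SHARED_KEY, challenge)
assert verify(SHARED_KEY, challenge, response)
```

The response is a 256-bit function of a random nonce; no human could compute it unaided, which is why such protocols suit machines but not people.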

Furthermore, people will tend to choose passwords that
are easy to remember, which usually means that the password is easy to guess.
Or they choose passwords that are difficult to guess but are also difficult
to remember (so the passwords must be written down and
then are easy for an attacker to find).

Even if a password is not
trivial to guess, it might succumb to an offline search of the
password space. An offline search needs some way to check a guess
without using the system itself, and some methods used today for storing
passwords do provide such a way. (See below.)

Finally, changing a password requires human intervention. Thus,
compromised passwords could remain valid for longer than is desirable. And
there must be some mechanism for resetting the password (because
passwords will get forgotten and compromised). This mechanism could
itself be vulnerable to social-engineering attacks, which rely
on convincing a human with the authority to change or access information
that it is necessary to do so.

With all these concerns about passwords, you might wonder what is
required for a password to be considered a good one.
There are three dimensions, and they interact so that strengthening one can
be used to offset a weakness in another.

Length.
This is the easiest dimension for people to strengthen.
Longer passwords are better.
A good way to get a long password that is seemingly random
yet easy to remember is to think of a passphrase (like the first
words of a song) and then
generate the password from the first letters of the passphrase.

Character set.
The more characters that can be used in a password, the greater the number of
possible combinations of characters, so the larger the password space.
Searching a larger password space requires more work by an attacker.

Randomness.
Choose a password from a language (English, say) and
an attacker can leverage regularities in this language to
reduce the work needed in searching the password space
(because certain passwords are now "impossible").
For instance,
given the phonotactic and orthographic constraints of English, an
attacker searching for an English word need not try passwords containing
sequences like krz
(although this would be a perfectly reasonable sequence to try if the
password were known to be in Polish).
Mathematically, it turns out that English has about 1.3 bits of
information per character. Thus it takes 49 characters to get 64
bits of "secret",
which comes out to about 10 words (at 5 characters on average
per word).
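The entropy arithmetic in this paragraph can be checked with a short calculation (1.3 bits/character and 5 characters/word are the estimates used above):

```python
import math

BITS_PER_CHAR = 1.3   # estimated entropy of English text, per character
TARGET_BITS = 64      # desired amount of "secret"
AVG_WORD_LEN = 5      # average English word length, in characters

chars_needed = math.ceil(TARGET_BITS / BITS_PER_CHAR)  # 50 (the text rounds to 49)
words_needed = round(chars_needed / AVG_WORD_LEN)      # about 10 words
```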

When passwords are used for authenticating a user,
the system must have a way to check whether the password
entered is valid.
Simply storing a file with the list of usernames and associated passwords,
however, is a bad idea because if the confidentiality of this file were ever
compromised all would be lost.
(Similarly, backup copies of this file would
have to be afforded the same level of protection, since people rarely
ever change their passwords.)
Better not to store actual passwords on-line.
So instead we might compute a cryptographic hash of the
password, and store that.
Now, the user enters a password;
the system computes a hash of that password;
and the system then compares that hash with what has been stored in the
password file.
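A minimal sketch of this check, with SHA-256 standing in for whatever cryptographic hash the system uses (the username and password here are hypothetical):

```python
import hashlib

def h(password: str) -> str:
    """Cryptographic hash of a password (SHA-256 as a stand-in)."""
    return hashlib.sha256(password.encode()).hexdigest()

# The system stores only hashes, never the passwords themselves.
password_file = {"alice": h("correct horse battery staple")}

def check(user: str, attempt: str) -> bool:
    """Hash the entered password and compare with the stored hash."""
    return password_file.get(user) == h(attempt)

check("alice", "correct horse battery staple")  # True
check("alice", "wrong guess")                   # False
```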

Even when password hashes instead of actual passwords are what is being stored,
the integrity of this file of hashes must still be protected.
Otherwise an attacker could insert a different hash
(for a password the attacker knows) and log into the system using that
new password.

The problem with having a password file that is not confidential --- even
if cryptographic hashes are what is being stored --- is the possibility of
offline dictionary attacks.
Here, the attacker
computes the hash of every word in some dictionary and then compares each
hash with the stored password hashes.
If any match, the attacker
has learned a password.
An alternative to confidentiality for defending against offline dictionary attacks
is use of salt. Salt is a random number that is associated
with a user and is added to that user's password
when the hash is computed. With high probability, a given
pair of users will not have the same salt value.
And the system stores both h(password + salt) and the salt for each
account.
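A sketch of salted storage as just described (16 bytes of random salt and SHA-256 are assumptions; real systems vary):

```python
import hashlib
import os

def store(password: str):
    """Pick a fresh random salt and compute h(password + salt).
    Both the hash and the salt go into the password file."""
    salt = os.urandom(16)
    digest = hashlib.sha256(password.encode() + salt).hexdigest()
    return digest, salt

def check(attempt: str, digest: str, salt: bytes) -> bool:
    """Rehash the attempt with the stored salt and compare."""
    return hashlib.sha256(attempt.encode() + salt).hexdigest() == digest

stored_hash, stored_salt = store("hunter2")
assert check("hunter2", stored_hash, stored_salt)
assert not check("hunter3", stored_hash, stored_salt)
```

Because each user gets a different random salt, a table of precomputed hashes is useless across accounts.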

Salt does not make it more difficult for an attacker to guess the
password for a given account, since the salt
for each account is stored in the clear.
What salt does, however, is make it harder for the
attacker to perpetrate an offline dictionary attack against all users.
When salt is used,
all the words in the dictionary would have to be rehashed for every user.
What formerly could be seen as a "wholesale" attack has been transformed
into a "retail" one.

Salt is used in most UNIX implementations.
The salt in early versions of UNIX was 12 bits,
formed from the system time and the process
identifier when an
account was created.
Unfortunately, 12 bits is hopelessly small nowadays.
Even an old PC can perform 13,000 crypts/sec, which means
such a PC can hash a 20,000-word dictionary under
every possible value of a 12-bit salt in under two hours.
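The arithmetic behind that estimate, using the numbers from the text:

```python
DICT_WORDS = 20_000       # words in the dictionary
SALT_VALUES = 2 ** 12     # possible values of a 12-bit salt
CRYPTS_PER_SEC = 13_000   # throughput of an old PC

total_hashes = DICT_WORDS * SALT_VALUES       # 81,920,000
hours = total_hashes / CRYPTS_PER_SEC / 3600  # about 1.75 hours
```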

Secret Salt

Another defense against offline dictionary attacks
is to use secret salt (invented by
Manber and independently by Abadi and Needham).
In this scheme, we select a small set of possible "secret salt" values from a large
space.
The password file then stores for each user:
userid, h(password, public salt, secret
salt), public salt.
Note that the value of the secret salt used in computing the hash is not
saved anyplace.
When secret salt is being employed, a user login involves having the system guess
the value of secret salt that was used in computing the stored, hashed password;
the guess involves checking through the possible secret salt values.
The effect is to make computing a hashed password very expensive for attackers.
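A sketch of enrollment and login with secret salt (the 4096-value secret-salt space, SHA-256, and all names are illustrative assumptions):

```python
import hashlib
import os
import secrets

SECRET_SALT_SPACE = 4096  # number of possible secret-salt values (assumption)

def h(password: str, public_salt: bytes, secret_salt: int) -> str:
    """h(password, public salt, secret salt), per the scheme above."""
    data = password.encode() + public_salt + secret_salt.to_bytes(2, "big")
    return hashlib.sha256(data).hexdigest()

def enroll(password: str):
    public_salt = os.urandom(16)
    secret_salt = secrets.randbelow(SECRET_SALT_SPACE)  # used, then discarded
    return h(password, public_salt, secret_salt), public_salt

def login(attempt: str, stored_hash: str, public_salt: bytes) -> bool:
    """The system must guess the secret salt, trying every possible value."""
    return any(h(attempt, public_salt, s) == stored_hash
               for s in range(SECRET_SALT_SPACE))

stored, pub = enroll("hunter2")
assert login("hunter2", stored, pub)
assert not login("wrong", stored, pub)
```

A legitimate login does SECRET_SALT_SPACE/2 hashes on average, a tolerable cost; an attacker's dictionary attack is slowed by the same factor on every single guess.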

Examples of Password Systems

We now outline several widely-used password systems.

Unix.
Unix stores a hashed, salted password along with the salt. For
the hash, it iterates DES 25 times, encrypting a block of zeros with the
password as the key; the 12-bit salt perturbs the DES computation.
As discussed above,
this is not strong enough for today's machines. Some
versions of Unix employ a shadow password file, so that it is harder
for an attacker to retrieve the hashed passwords.
The hashed passwords then reside in a file such as
/etc/shadow or (on BSD systems) /etc/master.passwd.

FreeBSD.
FreeBSD stores a hashed password (where the hash is based on
MD5). There is no limit to the length of the password, and 48
bits of salt are used.

OpenBSD.
OpenBSD does a hash based on blowfish encryption, and then stores
the hashed password along with 128 bits of salt. The system guarantees
that no two accounts will have the same salt value.

Windows NT/2000/XP.
NT stores two password hashes: one called the LanMan hash and
another called the NT hash. The LanMan hash is used for backwards
compatibility with Windows 95/98, and it is a very weak scheme:
the password is converted to uppercase, padded with nulls to 14
characters, and split into two 7-character halves; each half is used
as a DES key to encrypt a fixed constant, and the two ciphertexts
are concatenated to form the hash.

To see the weakness, consider how much work an attacker would have to
do to break this scheme. The numbers and uppercase letters together
make up 36 characters. Each half of a 14-character password then has
36^7 possible values, which comes out as 78,364,164,096.
The actual work factor then is 2 x 36^7 (whereas
the theoretical work factor for 14 characters is 36^14 =
36^7 x 36^7).

Note that if upper and lower case were both allowed, then there would
be (2 x 26) + 10 = 62 possible characters and thus
62^7 = 3,521,614,606,208 possible values, which is about 45 times
greater than the LanMan value.
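The work-factor arithmetic for LanMan can be computed directly:

```python
half_space = 36 ** 7          # one 7-character half: digits + uppercase
lanman_work = 2 * half_space  # the two halves are attacked independently
full_space = 36 ** 14         # cost of searching 14 characters at once

assert half_space == 78_364_164_096
assert full_space == half_space * half_space

mixed_case_half = 62 ** 7     # if lower case were allowed too
assert mixed_case_half == 3_521_614_606_208
assert mixed_case_half // half_space == 44   # roughly 45 times larger
```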

The NT hash is somewhat better. In the NT operating system, there was
still a 14-character limit, although this limit was removed in Windows
2000 and XP. The password is hashed with MD4 to produce a 128-bit
hash. This hash is stored in the system, but no salt is used at all.

Defense Against Password Theft: A Trusted Path

Given schemes that make passwords hard to guess,
an attacker might be tempted to try theft.
The attack is:
install some sort of program that produces a window resembling
a login prompt or otherwise invites the user to reveal a password.
Users will then type their passwords into this program,
which saves the passwords for later use by the attacker.

How can you defend against such attacks?
What we would like is some way for a user to determine the
pedigree of any window purporting to be a login
prompt. If each point in the pedigree is trusted, then the login prompt
window must be trusted and it is safe to enter a password.
This idea is called a trusted path.

To implement a trusted path, the keyboard driver
recognizes a certain key sequence (Ctl-Alt-Del in Windows) and
always then transfers control to some trusted software that displays a
(password prompt) window and reads the contents.
Users are educated to type passwords only into windows that appear
after typing that special key sequence.

Notice, however,
that this scheme requires that a trusted keyboard driver is executing.
So, that means the system must be running an operating system that is trusted
to prevent keyboard driver substitutions.
One might expect that rebooting the machine would be a way to ensure that
a trusted operating system is executing (presuming you trust whatever
operating system is installed),
but what if the OS image on the disk had been altered by an attacker?
So, one must be certain that the operating system software
stored on the disk has not been modified, too.
But even that's not enough.
What about the boot loader, which might have been altered to read a
boot block from a non-standard location on the disk?
And so it goes.
Even if you start each session by booting from your own fresh OS CD,
a ROM or even the hardware might have been hacked by an attacker.
Physical security of the hardware then must also have been maintained.
In the end, though, to the extent that you can trust all layers from the
hardware to the keyboard driver,
the resulting trusted path provides a way to defend against attacks implemented
by programs that attempt
to steal passwords by spoofing.

Something You Have

Instead of basing authentication on something a principal knows and can
forget, maybe we should base it on something the principal has.
Various token/card technologies support authentication along these lines.
For all of these, two-factor
authentication becomes important --- an authentication
process that involves two independent means of authenticating the principal.
So, we might require that a principal not only possess a device but also know
some secret password (often known as a PIN, or personal identification number).
Without two-factor authentication,
stealing the device would allow an attacker to impersonate the owner of the device;
with two-factor authentication, the attacker would still have another authentication
burden to overcome.

Here are examples of technologies for authentication based on
something a principal might possess:

A magnetic stripe card (e.g., Cornell ID, credit card).
One serious problem with these cards is that they are fairly easy
to duplicate. It only costs about $50 to buy a writer, and it's easy
to get your hands on cards to copy them. To get around these
problems, banks implement two-factor authentication by requiring knowledge
of a 4- to 7-digit PIN whenever the card is used.

Short PINs are problematic.
First, they admit guessing attacks.
Banks defend against this by limiting the number of guesses before they
will confiscate the card.
Second, there is the matter of how to check whether a PIN that has
been entered is the correct one.
Storing the PIN on the card's magnetic stripe is not a good idea
because a thief who steals the card can easily determine the
associated PIN (and then subvert the 2-factor authentication protocol).
Storing an encrypted copy of the PIN on the card's magnetic stripe does
not exhibit this vulnerability, though.

Proximity card or RFID.
These cards transmit stored information to a monitor via RF.
There is currently
a debate in this country as to the merits of using RF proximity cards
(RFID tags) for identification of people and products.
Walmart has talked about putting RFID tags on every
product it shelves, and both the German and U.S.
governments are including them in passports.
With RFID tags on Walmart products, for example, somebody with a suitable receiver
could tell what you have purchased (even though your purchase is
hidden in a bag) --- and this is seen by some as a privacy violation.
With RFID tags in passports, somebody with a suitable receiver could remotely
identify on the street citizens of a given country and single them out for
"special treatment" (likely unpleasant).

There are two types of RF proximity cards: passive and active. The
former are not powered; they use the RF energy from the requester to
reply with whatever information is being stored by the card.
The latter are powered and broadcast information, allowing anyone
who is in range and has a receiver to query the card.
You could imagine that if RF tags are put into
passports,
then some people might start carrying them in special Faraday-cage
passport holders, because now an interloper can learn about someone
without the victim's knowledge (or permission).

Challenge/Response cards and Cryptographic Calculators.
These are also called smart cards and perform some sort
of cryptographic calculation.
Sometimes the card will have memory, and sometimes it will
have an associated PIN.
A smart card transforms the authentication problem for humans, because
we are no longer constrained by stringent computational and storage
limitations.
Unfortunately, today's smart cards are vulnerable to power-analysis attacks.
Furthermore, one must exercise care in using a cryptographic calculator --- if it
is used to generate digital signatures, for example, then somehow the
device owner must be made
aware of what documents are being signed.

One prevalent form of smartcard is the RSA SecurID.
It continuously displays an encrypted time value,
and each SecurID encrypts with a different key.
Whoever has an RSA SecurID card responds to server challenges by
typing the encrypted time (so, in effect, it is a secret) ---
a server, knowing what key is associated with each user's
card, can then authenticate a user.
(The server must be somewhat generous with respect to what
times it will accept.
Accept too many and replay attacks become possible;
accept too few and message-delivery delays and execution times prevent people
from authenticating themselves.)
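The server-side leniency can be sketched as checking a small window of time steps. This is only an illustration of the idea (the 60-second step and HMAC-SHA-1 are assumptions, not RSA's actual algorithm):

```python
import hashlib
import hmac
import time

STEP = 60    # token value changes every 60 seconds (assumption)
WINDOW = 1   # accept the current step plus one step on either side

def code_for(key: bytes, step: int) -> str:
    """Derive the 6-digit code a card would display for a given time step."""
    digest = hmac.new(key, step.to_bytes(8, "big"), hashlib.sha1).digest()
    return str(int.from_bytes(digest[:4], "big") % 10**6).zfill(6)

def verify(key: bytes, submitted: str, now: float) -> bool:
    """Be 'somewhat generous': try a few adjacent time steps."""
    current = int(now) // STEP
    return any(code_for(key, current + off) == submitted
               for off in range(-WINDOW, WINDOW + 1))

key = b"per-card secret key"
now = time.time()
token_display = code_for(key, int(now) // STEP)
assert verify(key, token_display, now)
```

Widening WINDOW tolerates clock drift and typing delay but enlarges the attacker's replay opportunity, which is exactly the trade-off noted above.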

Something You Are

Since people forget things and lose things, one might
contemplate basing an authentication scheme for humans on something
that a person is.
After all, we recognize people we interact with not because of some password
protocol but because of how they look or how they sound --- "something they are".
Authentication based on "something you are"
will employ behavioral and physiological
characteristics of the principal. These characteristics must be easy
to measure accurately, and preferably they are difficult to spoof.
For example, we might use

Retinal scan

Fingerprint reader

Handprint reader

Voice print

Keystroke timing

Signature

To implement such a biometric authentication scheme
some representation for the characteristic of interest is stored.
Subsequently, when authenticating that person, the characteristic
is measured and compared with what has been stored.
An exact match is not expected, nor should one be required, given the
error rates associated with biometric sensors.
(For example, fingerprint readers today normally exhibit
error rates upwards of 5%.)

Methods to subvert a fingerprint reader give some indication of
the difficulties of deploying unsupervised biometric sensors as the sole
means of authenticating humans.
Attacks include:

Steal a finger.
Difficult to do without the owner of the finger noticing.
Good supervision of the biometric sensor defends against this attack.

Steal a fingerprint.
Lifting a fingerprint is not that hard (at least, according to those
TV crime-drama shows).
Again, though, good human
supervision of the biometric sensor defends against this attack
because a guard will notice if somebody is not inserting a naked finger into
the reader.

Replace the biometric sensor.
At first glance, this type of attack might seem even more difficult
to execute than the two above.
Social engineering might be easier for the attacker to employ here, though.
It suffices that the guard believe that the sensor should be changed
(maybe because the old one is "broken").

There are several well known problems with biometric-based authentication
schemes:

Reliability of the method.
Similarity of physical features (faces, hands, or fingerprints) and
inaccuracy of measurement may together conspire to create an unacceptably
high false acceptance rate (FAR).

Cost and availability.
Currently, some readers cost $40-50 and more.
Are end users willing to pay that much for an authentication method
that does not work as well as passwords?

Unwillingness or inability to interact with biometric input devices.
Some people are uncomfortable putting a body part into a machine;
some are uncomfortable having lasers shined in their eyes for a retinal scan;
and some don't have fingers or eyes to be measured.

Compromise the biometric database or system.
It might be possible to circumvent the system's biometric sensor and provide
an "input" from another source.
The sensor is, after all, connected to a system and
hijacking that channel might be possible.
Knowledge of the stored representation for a characteristic would then
allow an attacker to inject the correct characteristic and impersonate anyone.

Revocation.
What does it mean to revoke a fingerprint?

The literature on biometric authentication uses the following vocabulary
to characterize what a scheme does and how well it works:

FAR (false acceptance rate): the probability that the
system will fail to reject an impostor (aka FMR, false match rate).

FRR (false reject rate): the probability that the system
will reject a bona fide principal (aka FNMR, false non-match rate).

One-to-one matching: Compare live template with a specific stored
template in the system. This corresponds to authentication.

One-to-many matching: Compare live templates with all stored
templates in the system. This corresponds to identification.
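Given similarity scores recorded for genuine users and impostors, FAR and FRR at a chosen acceptance threshold can be computed as follows (all scores here are hypothetical):

```python
def far(impostor_scores, threshold):
    """Fraction of impostor attempts the system wrongly accepts."""
    return sum(s >= threshold for s in impostor_scores) / len(impostor_scores)

def frr(genuine_scores, threshold):
    """Fraction of genuine attempts the system wrongly rejects."""
    return sum(s < threshold for s in genuine_scores) / len(genuine_scores)

impostors = [0.10, 0.35, 0.62, 0.20, 0.55]  # hypothetical similarity scores
genuine = [0.80, 0.92, 0.58, 0.75, 0.88]

far(impostors, 0.6)  # 0.2 (one impostor scores 0.62 >= 0.6)
frr(genuine, 0.6)    # 0.2 (one genuine user scores 0.58 < 0.6)
```

Raising the threshold lowers FAR but raises FRR, and vice versa, so a deployment must pick the trade-off appropriate to its use.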

Summary

Having looked at all these methods for authentication, we can see
that as a secondary form of authentication (but not identification!)
biometrics might be promising. The most likely form of authentication in
the future, however, will be a combination of something you have and
something you know. Passwords will be around for a long time yet.