HTTP is a stateless protocol: in theory, the server cannot tell whether a request comes from the same client as the previous ones. Various means of tracking user identity, legitimate and not, have appeared over the years, but anonymity is still a main feature of the web: a user cannot be forced to maintain a single identity across different websites.

HTTP headers

HTTP requests carry headers and metadata that can serve as a first step towards identifying a single client, starting with the IP address of the underlying TCP connection. Parameters of this kind are not reliable identifiers: many ISPs assign addresses dynamically, and several users behind a NAT may share one. Yet they are still used as a first line of defense against attacks: a security hole was recently exposed in Twitter that allowed attackers to brute-force user passwords just by changing their IP address.
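To make the "first line of defense" idea concrete, here is a minimal sketch of counting failed logins per IP address in a sliding time window. The class name, the limits, and the in-memory storage are assumptions made for illustration; they do not describe how Twitter or any other site implements this.

```python
from collections import defaultdict, deque

# Hypothetical limits, chosen only for illustration.
MAX_FAILURES = 5
WINDOW_SECONDS = 60.0


class FailedLoginLimiter:
    """Tracks recent failed logins per IP address in a sliding time window."""

    def __init__(self, max_failures=MAX_FAILURES, window=WINDOW_SECONDS):
        self.max_failures = max_failures
        self.window = window
        self._failures = defaultdict(deque)  # ip -> timestamps of failures

    def _prune(self, ip, now):
        # Drop failures that have fallen out of the time window.
        recent = self._failures[ip]
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        return recent

    def record_failure(self, ip, now):
        self._prune(ip, now).append(now)

    def is_blocked(self, ip, now):
        return len(self._prune(ip, now)) >= self.max_failures
```

Because the counter is keyed only on the IP address, an attacker who switches address starts again from zero. That is the weakness described above, and the reason this kind of check can only ever be a first line of defense.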

In many cases, the standard method for tracking a user is a long-lived cookie: a piece of text that the client application agrees to send back to the server with each subsequent request. Other authentication methods exist, such as basic access authentication, but they cannot match cookies in user experience, security, or flexibility. Cookies thus became the foundation of user identity on modern websites: if you present this secret cookie, I am sure you can only be legitimate user X.
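As a rough sketch of the cookie mechanism, the following uses Python's standard `http.cookies` module to issue a long-lived session cookie and to recognize the user when the client sends it back. The `SESSIONS` dictionary, the cookie name `session_id`, and the 30-day lifetime are assumptions chosen for the example, not a recommendation.

```python
import secrets
from http.cookies import SimpleCookie

# In-memory session store: session id -> user name (hypothetical, for illustration).
SESSIONS = {}


def issue_session_cookie(username):
    """Return the value of a Set-Cookie header for a freshly created session."""
    session_id = secrets.token_urlsafe(32)  # unguessable random identifier
    SESSIONS[session_id] = username
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    cookie["session_id"]["httponly"] = True  # hide it from page scripts
    cookie["session_id"]["secure"] = True  # send it only over HTTPS
    cookie["session_id"]["max-age"] = 30 * 24 * 3600  # long-lived: 30 days
    cookie["session_id"]["path"] = "/"
    return cookie["session_id"].OutputString()


def user_from_cookie_header(cookie_header):
    """Map the Cookie header of a later request back to a user, or None."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session_id")
    if morsel is None:
        return None
    return SESSIONS.get(morsel.value)
```

The server trusts whoever presents a known session identifier, so the whole scheme rests on that value being secret and impossible to guess.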

Cookies, however, are machine- and browser-specific: they must be recreated every time the user switches to a different client. This necessarily shifts the focus of user identity from tracking clients to providing credentials.

Credentials

Of course, using site-specific credentials as the primary means of authentication resulted in a plethora of username-password pairs to remember. At least the username part was simplified by reusing an existing unique key such as an email address; but since giving the same password to many websites exposes you to the weakest of them, you still have to remember many passwords.

So, apart from improving password managers inside browsers, web engineers tried to solve the problem of single authentication: a single credential pair that may give you access to any web service.

Old and new standards

Several open standards are trying to replace the noisy "registration or login screen" authentication method. The key to each is its adoption rate: you wouldn't trust a credit card that worked in only 1% of shops, so each standard needs wide adoption to succeed.

We won't go into the details of each authentication flow here, since each of these methods would require a dedicated post; we'll focus on the differences between them.

OpenID gives the user a URL hosted by a provider website such as Google or Yahoo!. When you enter this URL in the website that requires authentication, you are sent to the OpenID provider, which validates your identity and redirects you back to the original website.

OAuth's mechanism is similar (a 3rd party authenticates you, and you are redirected back), but the metaphor and intent are different: the OpenID provider tells the website your identity, while OAuth will give it access to your data on the provider itself.

For example, you may use Google's OpenID to log into Facebook; Facebook will only recognize that you are the same user each time you log in again. By contrast, you may log into many websites like Disqus using Twitter as an OAuth provider: this lets the website pull personal details such as your real name and pseudonym (depending on the provider's privacy settings).
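The redirect to a third party and back can be sketched as follows. The parameter names follow the OAuth 2.0 authorization code flow, which is an assumption on my part: the providers mentioned above may use other versions of the protocol. The endpoint, client id, and redirect URI are hypothetical placeholders.

```python
import secrets
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical endpoint and client registration, for illustration only.
AUTHORIZE_URL = "https://provider.example/oauth/authorize"
CLIENT_ID = "my-client-id"
REDIRECT_URI = "https://website.example/callback"


def build_authorization_url(scope):
    """Return (url, state): where to send the user, and the value to remember."""
    state = secrets.token_urlsafe(16)  # ties the callback to this browser session
    query = urlencode({
        "response_type": "code",
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "scope": scope,
        "state": state,
    })
    return f"{AUTHORIZE_URL}?{query}", state


def extract_code(callback_url, expected_state):
    """Return the authorization code from the callback, or None if state mismatches."""
    params = parse_qs(urlparse(callback_url).query)
    if params.get("state", [None])[0] != expected_state:
        return None
    return params.get("code", [None])[0]
```

The website would then exchange the returned code for an access token in a server-to-server call, which is omitted here. Checking `state` guards against a callback that was not triggered by this user's own login attempt.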

BrowserID is yet another single sign-on mechanism, where the key is your email address: once you have verified an address, the browser can log you in anywhere with a single click. Unfortunately, this mechanism is experimental and requires support from both email providers and your browser.

You also have to consider proprietary standards in addition to these open ones. In some cases, such as Facebook Connect, they predate their open equivalent (Facebook OAuth), but the trade-off between the reach of a proprietary platform and the long-term benefits of an open one is a bigger discussion than user identity alone. If your application already depends heavily on Facebook's Graph API, leveraging it for authentication too may be a sensible strategy, even given the lock-in.

Conclusions

We are far from having a single sign-on for the whole web, but third-party authentication has moved us in this direction, and decentralized systems like BrowserID may offer an alternative that doesn't depend on a private company like Facebook or Twitter.

Note, however, that many vendors view user accounts as strategically important and, even with the best third-party authentication technologies available, may always stick to the old "registration or login" form (see Amazon's checkout process in the picture). Owning a list of users and their explicitly provided data is sometimes far more valuable than owning just a list of tokens for borrowing that data from other services: tokens may expire, and services may be temporarily unavailable or cease to work.

Meanwhile, I worked on automatically connecting identities from different social networks that belong to the same person (e.g. Giorgio Sironi's LinkedIn and @giorgiosironi). While we wait for the utopia of a single Internet driver's license, as Jeff Atwood calls it, it may be interesting to seamlessly allow identification, or at least tracking, across multiple providers.