Asynchronous, High-Performance Login for Web Farms

Introduction

During my consulting engagements, I often run into people who say, "some things just can't be made asynchronous," even after they agree on the inherent scalability that asynchronous communication patterns bring. One often-cited example is user authentication: taking a username and password combination and authenticating it against some back-end store. For purposes here, I'm going to assume a database.

The Setup

So that the example is itself secure, assume that the password is one-way hashed before being stored. Also, given a reasonable network infrastructure, our web servers will be isolated in the DMZ and will have to access some application server that, in turn, will communicate with the DB. There's also a good chance of something like round-robin load balancing between web servers, especially for things like user login.
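The one-way hashing just mentioned can be sketched as follows. This is a hypothetical Python illustration, not part of the article's implementation; the article doesn't prescribe an algorithm, so PBKDF2-HMAC-SHA256 and the iteration count are my assumptions:

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # PBKDF2 work factor; tune for your hardware


def hash_password(password, salt=None):
    """One-way hash a password; only (salt, digest) is ever stored."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, stored_digest):
    """Re-hash the candidate with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored_digest)
```

Because verification only needs the salt and the digest, the plaintext password never has to reach the database at all.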

Before diving into the meat of it, I wanted to preface with a few words. One of the commonalities I've found when people dismiss asynchrony is that they don't consider a real deployment environment, or scaling up a solution to multiple servers, farms, or datacenters.

The Synchronous Solution

In the synchronous solution, each web server will contact the app server for each user login request. In other words, the load on the app server and, consequently, on the database server will be proportional to the number of logins. One property of this load is its data locality, or rather, the lack of it. Given that user U logged in, the DB won't necessarily gain any performance benefits by loading all username/password data into memory for the same page as user U. Another property is that this data is very non-volatile—it doesn't change that often.

I won't go too far into the synchronous solution because it has been analyzed numerous times before. The bottom line is that the database is the bottleneck. You could use sharding solutions. Many of the large sites have numerous read-only databases for this kind of data, with one master for updates that replicates out to the read-only replicas. That's great if you're using a nice, cheap database like MySQL (the M in LAMP); not so nice if you're running Oracle or MS SQL Server.

Regardless of what you're doing in your data tier, you're there. Wouldn't it be nice to close the loop in the web servers? Even if you are using Apache, that's going to be less iron, electricity, and cooling all around. That's what the asynchronous solution is all about—capitalizing on the low cost of memory to save on other things.

The Asynchronous Solution

In the asynchronous solution, you cache username/hashed-password pairs in memory on your web servers and authenticate against that. Now, let's analyze how much memory that takes:

Usernames are usually 12 or fewer characters, but take an average of 32 to be safe. Using Unicode at two bytes per character, you get to 64 bytes for the username. Hashed passwords can run between 256 and 512 bits, depending on the algorithm; divide by 8 and you have at most 64 bytes. That's about 128 bytes altogether. So, you can safely cache about 8 million of these entries in 1 GB of memory per web server. If you've got a million users, first of all, good for you. Second, that's just around 128 MB of memory, relatively nothing even for a cheap 2 GB web server.
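The arithmetic above can be double-checked in a few lines (Python, just for the numbers):

```python
USERNAME_CHARS = 32                    # generous average username length
USERNAME_BYTES = USERNAME_CHARS * 2    # two bytes per character in Unicode (UTF-16)
HASH_BYTES = 512 // 8                  # worst-case 512-bit hash
ENTRY_BYTES = USERNAME_BYTES + HASH_BYTES

entries_per_gb = (1024 ** 3) // ENTRY_BYTES
million_user_mb = 1_000_000 * ENTRY_BYTES / (1024 ** 2)

print(ENTRY_BYTES)       # 128 bytes per entry
print(entries_per_gb)    # 8388608, i.e. roughly 8 million entries per GB
print(million_user_mb)   # roughly 122 MiB for a million users
```

A million users fit comfortably in well under 128 MB, exactly as claimed.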

Also, consider the fact that, when registering a new user, you can check whether such a username is already taken at the web server level. That doesn't mean it won't be checked again in the DB to account for concurrency issues, but that the load on the DB is further reduced. Other things to notice include no read-only replicas and no replication. Simple. Our web servers are the "replicas."
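The web-server-side cache described in the last two paragraphs can be sketched as a small class. All names here are hypothetical; this is a minimal illustration of the idea, not the article's code:

```python
import hmac


class LoginCache:
    """Per-web-server, in-memory cache of username -> hashed password."""

    def __init__(self):
        self._entries = {}

    def load_full_list(self, pairs):
        # Called on startup/subscription with the full list from the app server.
        self._entries.update(pairs)

    def apply_update(self, username, hashed_password):
        # Called whenever a new-user registration is published.
        self._entries[username] = hashed_password

    def authenticate(self, username, hashed_candidate):
        # Login is served entirely from local memory; no DB round trip.
        stored = self._entries.get(username)
        return stored is not None and hmac.compare_digest(stored, hashed_candidate)

    def username_taken(self, username):
        # Cheap pre-check at registration; the DB still enforces
        # uniqueness to handle concurrent registrations.
        return username in self._entries
```

Note that `username_taken` is only an optimization: a false negative is still caught by the database's authoritative check.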

The Authentication Service

What makes it all work is the "Authentication Service" on the app server. It was always there in the synchronous solution: it fielded all the login requests from the web servers and, of course, let them register new users and do all the regular stuff. The difference is that it now publishes a message when a new user is registered (or rather, validated, as part of the internal long-running workflow). It also allows subscribers to receive the list of all username/hashed-password pairs. It's also quite likely that it would keep the same data in memory, too.
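Before getting to any messaging specifics, the service's two new responsibilities, handing new subscribers the full list and publishing each validated registration, can be sketched language-neutrally. This Python sketch uses plain callbacks to stand in for bus subscriptions; all names are hypothetical:

```python
class AuthenticationService:
    """App-server service: authoritative store plus publish/subscribe of updates."""

    def __init__(self):
        self._store = {}        # mirrors the DB's username/hash table in memory
        self._subscribers = []  # callbacks standing in for bus subscriptions

    def subscribe(self, on_update):
        # A new subscriber gets the full current list, then incremental updates.
        self._subscribers.append(on_update)
        return dict(self._store)

    def register_user(self, username, hashed_password):
        # The authoritative uniqueness check lives here (backed by the DB).
        if username in self._store:
            return False
        self._store[username] = hashed_password
        for notify in self._subscribers:
            notify(username, hashed_password)  # publish the single update
        return True
```

The key design point is that web servers never query this service at login time; they only receive pushed updates from it.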

I'll explain the implementation of this solution using the open source communication framework nServiceBus, but the same elements can be found in any messaging or ESB solution.

By using nServiceBus's facility for sending multiple logical messages in one physical message, you can model both the publication of single updates and the return of the full list with the same logical message type. Let me define that message: