Asynchronous, High-Performance Login for Web Farms

Often during my consulting engagements I run into people who say, "some things just can't be made asynchronous," even after they agree about the inherent scalability that asynchronous communication patterns bring. One often-cited example is user authentication - taking a username and password combination and authenticating it against some back-end store. For our purposes here, I'm going to assume a database.

The Setup

Just so that the example is itself secure, we'll assume that the password is one-way hashed before being stored. Also, given a reasonable network infrastructure, our web servers will be isolated in the DMZ and will have to access some application server which, in turn, will communicate with the DB. There's also a good chance of something like round-robin load-balancing between web servers, especially for things like user login.

Before diving into the meat of it, I wanted to preface with a few words. One of the commonalities I've found when people dismiss asynchrony is that they don't consider a real deployment environment, or scaling up a solution to multiple servers, farms, or datacenters.

In the synchronous solution, each one of our web servers will be contacting the app server for each user login request. In other words, the load on the app server and, consequently, on the database server will be proportional to the number of logins. One property of this load is its data locality, or rather, the lack of it. Given that user U logged in, the DB won't necessarily gain any performance benefit from having loaded the page containing user U's row into memory - the next login is unlikely to hit that same page. Another property is that this data is very non-volatile - it doesn't change that often.

I won't go too far into the synchronous solution since it's been analysed numerous times before. The bottom line is that the database is the bottleneck. You could use sharding solutions. Many of the large sites have numerous read-only databases for this kind of data, with one master for updates replicating out to the read-only replicas. That's great if you're using a nice cheap database like MySQL (the M of LAMP), not so nice if you're running Oracle or MS SQL Server.

Regardless of what you're doing in your data tier, you're still making the round trip there. Wouldn't it be nice to close the loop in the web servers? Even if you're running cheap Apache boxes, that's going to mean less iron, electricity, and cooling all around. That's what the asynchronous solution is all about - capitalizing on the low cost of memory to save on other things.

The Asynchronous Solution

In the asynchronous solution, we cache username/hashed-password pairs in memory on our web servers, and authenticate against that. Let's analyse how much memory that takes.

Usernames are usually 12 characters or less, but let's take an average of 32 to be sure. Using Unicode, we get to 64 bytes for the username. Hashed passwords can run between 256 and 512 bits depending on the algorithm; divide by 8 and you have at most 64 bytes. That's about 128 bytes altogether, so we can safely cache 8 million of these with 1GB of memory per web server. If you've got a million users, first of all, good for you. Second, that's just 128 MB of memory - relatively nothing even for a cheap 2GB web server.

Also, consider the fact that when registering a new user we can check if such a username is already taken at the web server level. That doesn't mean it won't be checked again in the DB to account for concurrency issues, but that the load on the DB is further reduced. Other things to notice include no read-only replicas and no replication. Simple. Our web servers are the "replicas".

The Authentication Service

What makes it all work is the "Authentication Service" on the app server. This was always there in the synchronous solution. It is what used to field all the login requests from the web servers, and, of course, allowed them to register new users and all the regular stuff. The difference is that now it publishes a message when a new user is registered (or rather, is validated - all a part of the internal long-running workflow). It also allows subscribers to receive the list of all username/hashed-password pairs. It's also quite likely that it would keep the same data in memory too.

I'm going to be explaining the implementation of this solution using the open source communication framework nServiceBus, but the same elements will be found in any messaging or ESB solution. By using nServiceBus's facility for sending multiple logical messages in one physical message, we can model both the publication of single updates and the return of the full list with the same logical message. Let's define that message:
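A minimal sketch of those message definitions, assuming the nServiceBus convention of plain serializable classes implementing the framework's IMessage marker interface (the exact type names here are illustrative):

```csharp
using System;
using NServiceBus;

// Sent by a web server to request the full list of users.
[Serializable]
public class GetAllUsernamesMessage : IMessage
{
}

// Published for each newly registered (validated) user, and also
// returned many-at-a-time as the response to GetAllUsernamesMessage.
[Serializable]
public class UsernameInUseMessage : IMessage
{
    public string Username { get; set; }
    public byte[] HashedPassword { get; set; }
}
```

Note that the same UsernameInUseMessage type serves double duty: one at a time for incremental updates, many at a time for the full list.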

When the Authentication Service receives the GetAllUsernamesMessage, its message handler accesses its cache of usernames and hashed passwords, and builds a message that is returned to the caller as follows:
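A sketch of that handler on the app server; this assumes the IHandleMessages&lt;T&gt;/IBus style of API (handler interface names have varied across nServiceBus versions), and IUserCache/UserRecord are hypothetical stand-ins for the service's in-memory store:

```csharp
using System.Collections.Generic;
using NServiceBus;

public class GetAllUsernamesMessageHandler : IHandleMessages<GetAllUsernamesMessage>
{
    public IBus Bus { get; set; }          // injected by the framework
    public IUserCache Cache { get; set; }  // hypothetical in-memory store

    public void Handle(GetAllUsernamesMessage message)
    {
        var reply = new List<UsernameInUseMessage>();
        foreach (UserRecord user in Cache.GetAll())
            reply.Add(new UsernameInUseMessage
            {
                Username = user.Username,
                HashedPassword = user.HashedPassword
            });

        // All the logical messages go back in one physical message.
        Bus.Reply(reply.ToArray());
    }
}
```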

When the app server sends the full list, multiple objects of the type UsernameInUseMessage are sent in one physical message to that web server. However, the bus object that runs on the web server dispatches each of these logical messages one at a time to the web server's message handler.
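That message handler on the web server side would look something like this (again assuming the IHandleMessages&lt;T&gt; style; Cache is a hypothetical thread-safe, in-memory store of username/hashed-password pairs on this web server):

```csharp
using NServiceBus;

public class UsernameInUseMessageHandler : IHandleMessages<UsernameInUseMessage>
{
    public void Handle(UsernameInUseMessage message)
    {
        // Invoked once per logical message, whether it arrived as a
        // single published update or as part of the full startup list.
        Cache.Store(message.Username, message.HashedPassword);
    }
}
```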

So, when it comes time to actually authenticate a user, this is what the web page (or controller, if you're doing MVC) would call:
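Something along these lines - entirely local, with no remote calls; HashOf (applying the same one-way hash used at registration) and Cache are hypothetical helpers:

```csharp
public bool Authenticate(string username, string password)
{
    byte[] stored = Cache.GetHashedPassword(username);
    if (stored == null)
        return false;  // unknown username

    byte[] supplied = HashOf(password);
    if (supplied.Length != stored.Length)
        return false;

    // Compare the hash of the supplied password to the cached hash.
    for (int i = 0; i < stored.Length; i++)
        if (supplied[i] != stored[i])
            return false;

    return true;
}
```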

That UsernameInUseMessage would eventually arrive at all the web servers subscribed.
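On the app server, at the point in the long-running registration workflow where the new user has been validated and written to the DB, pushing that update out is a single call (a sketch; the surrounding workflow code is omitted, and newUser is a hypothetical local variable):

```csharp
// Goes out to every web server currently subscribed.
Bus.Publish(new UsernameInUseMessage
{
    Username = newUser.Username,
    HashedPassword = newUser.HashedPassword
});
```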

Performance/Security Trade-Offs

When looking deeper into this workflow, we realize that it could be implemented as two separate message handlers, with the email address taking the place of the workflow Id. The problem with this alternate, better-performing solution has to do with security. By removing the dependence on the workflow Id, we've in essence stated that we're willing to receive a UserValidatedMessage without having previously received the RegisterUserMessage.

Since the processing of the UserValidatedMessage is relatively expensive - writing to the DB and publishing messages to all web servers - a malicious user could perform a denial-of-service (DoS) attack without that many messages, thus flying under the radar of many detection systems. Spoofing a GUID that would result in a valid workflow instance is much more difficult. Also, since workflow instances would probably be stored in some in-memory, replicated data grid, the relative cost of a lookup would be quite small - small enough to avoid a DoS until a detection system picked it up.

Improved Bandwidth & Latency

The bottom line is that you're getting much more out of your web tier this way, rather than hammering your data tier and having to scale it out much sooner. Also, notice that there is much less network traffic this way. Not such a big deal for usernames and passwords, but other scenarios built in the same way may need more data. Of course, the time it takes us to log a user in is much shorter as well since we don't have to cross back and forth from the web server (in the DMZ) to the App server, to the DB server.

The important thing to remember in this solution is the use of pub/sub. nServiceBus merely provides a simple API for designing the system around pub/sub. And publishing is where you get the serious scalability. As you get more users, you'll obviously need more web servers. The thing is that you probably won't need more database servers just to handle logins. In this case, you also get lower latency per request since all the work that needs to be done can be done locally on the server that received the request.

ETags make it even better

For the more advanced crowd, I'll wrap it up with ETags. Since web servers do go down, and the cache will be cleared, what we can do is write that cache to disk (probably in a background thread) and "tag" it with something the server gave us along with the last UsernameInUseMessage we received. That way, when the web server comes back up, it can send that ETag along with its GetAllUsernamesMessage so that the app server will only send the changes that occurred since. This can be done REST style as well, using HTTP GET with an "If-None-Match" header carrying the ETag (or "If-Modified-Since" if you tag with a timestamp). All this drives down network usage even more, at the insignificant cost of some disk space on the web servers.
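As a sketch, assuming GetAllUsernamesMessage is extended with an ETag field and DiskCache is a hypothetical on-disk snapshot of the local cache, web server startup becomes:

```csharp
// Warm the in-memory cache from the last on-disk snapshot, then ask
// the app server only for what changed since that snapshot was tagged.
Cache.LoadFrom(DiskCache);
Bus.Send(new GetAllUsernamesMessage { ETag = DiskCache.ETag });
```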

And in closing...

Even if you don't have anything more than a single physical server today, and it acts as your web server and database server, this solution won't slow things down. If anything, it'll speed it up. Regardless, you're much better prepared to scale out than before - no need to rip and replace your entire architecture just as you get 8 million Facebook users banging down your front door.

More Info

nServiceBus is an open source communications framework that makes building enterprise .NET systems easier. By providing scalability-critical features like publish/subscribe support, integrated long-running workflows, and deep extensibility, nServiceBus provides a solid foundation for any distributed system.

About the author

Udi Dahan is The Software Simplist, a Microsoft Solutions Architect MVP, recognized .Net expert, and a member of both the Microsoft Architects and Technologists Councils.

Udi provides clients all over the world with training, mentoring and high-end architecture consulting services, specializing in Service-Oriented, scalable and secure .NET architecture design, and Web services.

He is a member of the International Speakers Bureau of the International .NET Association (INETA), an associate member of the International Association of Software Architects (IASA), a frequent conference presenter, a Dr. Dobb's sponsored expert on Web Services, SOA, & XML, and a regularly published author.

Comments

The alternative and probably more scalable approach would be a network-attached cache like IBM ObjectGrid or one of the "gigoherence" competitors. A network-attached cache can hold millions, if not hundreds of millions, of such pairs in the collective memory of the web farm and then provide a login service to the farm. You're adding a little latency because of the network hop to fetch the data, but the benefit is that you are no longer limited to what fits in a single address space in terms of how much you can store. As the farm scales out, the grid scales out in parallel to keep up. You're now limited to what fits in the memory of the farm, not a single process.

I don't see where there is any asynchronous activity here. The user shows up on the login screen, which blocks until he provides credentials, which are then sent on to the authentication mechanism (cached or DB, either way), which blocks, and then the results are returned and the user is routed to either a welcome screen or an error. There is no asynchronous activity here, is there?

Now the idea of caching credential data in memory is great, and will certainly speed that up. But how does it make it asynchronous?

What I thought the article was going to be about was an AJAXy sort of thing, where users are allowed into the app as soon as they have provided their credentials. Initially they have no more access than a guest, but at least the UI has options they can start using right away. Meanwhile, in the background, asynchronously, their credentials are being authenticated (however that happens) and the results are applied against their first activity that requires authentication. So they don't wait for authentication up front, they only wait for it after they have initiated an activity that requires it.

I enjoyed reading this, but was just confused about exactly what the technique brings to the table.

I agree. I too enjoyed the article. The approach, whilst not new, is elegant and has clearer options for scalability than what you might call "traditional" options. But if you have a user at one end, sending authentication data to an authenticator at the other end, waiting for a yes/no result that is elemental to what can happen next, then that to me is synchronous in nature.

The Ajax model could work well in handling the to and fro over two asynchronous steps, but you'd need timeouts and retries because ultimately you need that authentication yes/no to get to the secured stuff.

The asynchronous part of the solution deals with registering new users and the "long-running" workflow involved. Keeping the cache updated across the farm is also handled asynchronously/push-based with respect to servers that didn't have the user register there.

From the user perspective, logging in is a blocking process. Asynchrony is not a concept found in human-computer interaction design. Users care about blocking and speed (particularly for blocking processes).

The speed that is achieved is by having the entire login process occur on the web server. From an overall system perspective, the load on the DB decreases thus increasing the scalability of other aspects of the system.

The choice of technology you bring up is an interesting one. I could definitely see changing the implementation of the Cache object on the web server to store its data in a distributed, in-memory cache rather than just locally in memory. However, the overall solution could still look the same.

You do bring up an interesting architectural trade-off - an extra network hop vs using less memory. I'll definitely be looking at that in greater detail in my consulting engagements.

Of course there is asynchrony in the example. But it is only used to implement a distributed cache. I would really not try to implement such a scheme myself. It is much better to simply use a mature cache like billy mentioned.

By using this you can even simplify the registration of users. When a new user is registered, you simply add it to the distributed cache, so all machines know the user. This way you do not need explicit message passing.

I agree that this isn't necessarily the end of the line for even the architectural analysis. Given that we have such a cache that efficiently utilizes memory, we'd want an intelligent load/data partitioning scheme such that requests go to servers that have the data needed by that request locally (in the distributed cache), so that we can save the extra network hop (which is critical in some environments).

I'm really not trying to be argumentative here. But I believe words have meaning, and that their meanings are important, especially in technology.

The asynchronous part of the solution deals with registering new users and the "long-running" workflow involved.

I don't see how the login process is long-running or asynchronous. Can you explain how it is? In my mind it blocks for less time, since it is going to a cache, but it still blocks.

Keeping the cache updated across the farm is also handled asynchronously/push-based with respect to servers that didn't have the user register there.

I doubt it, but I can't be sure. In every caching example I've seen, going to the cache is a blocking synchronous action. If the cache is fresh, you at least have to block the caller long enough to retrieve the data from the cache. And if the cache is stale, the caller will block for as long as it takes to refresh the cache.

From the user perspective, logging in is a blocking process.

Right, and that's why I am confused that the article is about "asynchrony" when from the user's perspective it is not.

Asynchrony is not a concept found in human-computer interaction design. Users care about blocking and speed (particularly for blocking processes).

I'm not sure those two sentences jibe. Users definitely understand asynchrony, and they like it! When I rename a folder in Lotus Notes, and the entire application blocks, instead of just the one folder I am renaming, I very much understand that is asynchrony at play. Like you said, users care about blocking, and blocking is just another way of saying "not asynchronous".

The speed that is achieved is by having the entire login process occur on the web server. From an overall system perspective, the load on the DB decreases thus increasing the scalability of other aspects of the system.

There is no question that there is a speed increase here, and that the DB load is decreased. But speed and asynchrony are two completely different things. A process can be slow, and asynchronous, such as the delivery of snail mail. A process can be fast and asynchronous, such as sending somebody an IM. A process can be slow and synchronous, such as driving to work in traffic. A process can also be fast and synchronous, such as flying to work in your private jet.

I am going to go re-read the article just to make sure I am not missing something here.

You're correct, though, about the login process blocking on the local cache - and, indeed, for much less time than when working with the DB.

And if the cache is stale, the caller will block for as long as it takes to refresh the cache.

That's one of the differences that this solution embodies. The local cache does not deal with refreshing itself - so the calling thread will still not block while something (the cache) is going to the DB.

When I rename a folder in Lotus Notes, and the entire application blocks, instead of just the one folder I am renaming, I very much understand that is asynchrony at play.

I would submit that you're not representative of most users (neither am I). From my experience, there are also quite a few programmers who don't understand asynchrony. Anyway, at least we agree that the solution leads to a faster login process and that the end user would benefit from that.

I will wrap up by saying that the title could have been made more precise, but I found that it was long enough already. You have my apologies for the lack of clarity.