Posts

Office 365, ADFS, and SQL

He’s an issue I’ve just run into that there doesn’t seem to be a good answer to on the Internet. When you are building a highly available ADFS farm to enable Single Sign On for Office 365, should you use the Windows Integrated Database (WID) that comes with Windows Server or store the ADFS Configuration on a SQL server? Put simply, if you are using ADFS *only* for Office 365, there is no need to use SQL server for storing the configuration database. Here’s why.

What Does ADFS with SQL Get You?

I spent a good day or so trying to answer this question because I came in after the initial configuration of a Hybrid Exchange migration to Office 365 that was set up using SQL to store ADFS. When we went to test failover, we ran into a *mess* of problems with this configuration. So I’m going to try to explain things in a way that makes sense, rather than just telling you what Technet says in their article on the subject.

Storing your ADFS configuration in a SQL database gives you 4 things:

The ability to detect and block SAML and WS-Federation Token Replay attempts.

The ability to use SAML artifact resolution

The ability to use SQL fail-over to increase availability of the Configuration database

Scalability!!!

Using WID with ADFS prevents these features from being available to you. Here’s what each of those lines of text mean (Why no, I’m not going to just give you that without definitions like Technet does. Can you tell I’m a little grumpy about Technet’s coverage of this subject?)

SAML and WS-Federation Token Replay Attempts

This is actually a potential security vulnerability that comes about because of how Token authentication mechanisms work. In some cases, an application that accepts a federated token identity for access may allow users to “replay” a previously issued authentication Token to bypass the authentication mechanisms used to issue that Token. You can do this by ending a session with a federated application, then returning to the application with a cached version of the interaction. The easiest way to do this is to just push the Back button in a web browser. If the federated application isn’t properly designed, it will accept the cached version of the token used in the previous session and continue operating just like someone never logged out.

It should be noted that this is a serious security issue. If you have an environment where there are systems exposed to public use by multiple users (Kiosks, for example), you will want to make sure that Token Replay attempts are dropped. But here’s the thing, Token Replay is a vulnerability that affects the application you are using federated login to access. It is not a vulnerability of ADFS itself. Basically, if the applications that you are using ADFS to federate with are designed properly, there is no need to have your ADFS environment provide this kind of protection. Office 365 was designed to prevent Token Replay attempts from succeeding. So if you are only federating with Office 365, you don’t need to have this functionality in your ADFS environment. So, no need to have a SQL based Configuration Database for this reason.

SAML Artifact Resolution

I have tried for a while to find a good definition for what SAML Artifact Resolution actually is. I’m still not 100% on it, because the technical documentation for the SAML 2.0 standard is about as informative as a block of graphite. However, it seems like this is essentially a method through which both sides of a SAML token exchange can check on and authenticate one another using a single session, rather than multiple sessions. This comes in handy for situations where load balancing is in use and the application requests verification of the token provider.

The good news is that you don’t have to understand SAML Artifact Resolution. Office 365 doesn’t use it, so you don’t have to worry about having it. No SQL required for this purpose.

SQL Failover Capabilities

At first glance, having SQL server failover capabilities for your ADFS configuration database sounds like a great idea. I mean, SQL server has great failover capabilities built in. The problem is, so does ADFS. When you use WID for your ADFS Configuration database, the first server in the Farm will store a read/write enabled copy of the configuration database. Each additional ADFS server added to the farm will pull a copy of this database and store it in a read only format. So in a WID based ADFS farm, every ADFS server in the farm has a copy of the configuration database by default. ADFS Proxy servers don’t store or need access to the configuration database, so only the backend ADFS servers will benefit at all from SQL integration.

But here’s the thing, for SQL integration to work properly in a way that allows full SQL failover capabilities, you need to have a valid SQL cluster. With SQL 2008R2 and earlier, this means that you have to have at minimum 2 SQL servers installed on a version of Windows that has Cluster Services available. Cluster services requires that you have shared storage before it will create a failover cluster for SQL versions prior to 2012. If you have 2012, you can manage much simpler failover, but if you’re limited to SQL 2008R2 and earlier, you’re going to end up with a significantly higher infrastructure requirement to get SQL failover working. It you can use SQL 2012 to store the ADFS database, there is some advantage to using SQL clustering for ADFS, but the advantage doesn’t actually justify the costs of doing so.

“What about SQL Mirroring?” You ask. Well, that’s a good idea in theory. You can have your ADFS config database mirrored on two SQL servers, but you will run into some major headaches using this technique, especially if you ever need to run a failover. First off, ADFS configuration with SQL server requires that you use the SFConfig command line tool to create the config database and add servers to the farm. This is a huge pain to do, but it’s necessary. Second, if you ever have to fail over to the other SQL server for any reason, you have to reconfigure every one of the servers in the farm. SQL mirroring doesn’t utilize a clustered VIP, so the configuration with SFConfig requires you to point the server to the active SQL server. When you activate the SQL database on the other server, things may or may not work without reconfiguration. If they do work, there’s a chance that they might *stop* working at any time in the future without notice until you run SFConfig on each member of the farm and use the now active mirror’s address in the command. Then you may have to log in to each of your Proxy servers and run the Proxy Config wizard to reinitialize the proxy trust between it and the ADFS farm. That means that if you want to fail over with SQL mirroring, you have to log in to 4 servers at minimum and reconfigure each one every time the active SQL mirror changes. This is a truly unwieldy and excessive failover process. It isn’t worth the loss of expense from not using a full SQL cluster to do this for ADFS’s configuration database.

The failover abilities of SQL allow you to have constantly available access to a writable copy of the ADFS configuration database. If you use WID and the first server in the farm goes down, you cannot make changes to your ADFS configuration. No adding new servers, no modifications to the ADFS functions, etc. The good news is that the only time you need access to a writable copy of the database is when you are adding new servers to the farm or making modifications to ADFS functions. If you’re using Office 365, you almost never have to make changes to either of these once its set up and running. ADFS will continue operating as long as even one server in the farm is accessible without a writable copy of the database. So realistically, you don’t need SQL failover capabilities for ADFS’s configuration database.

Scalability!!!

Scalability is one of my favorite buzz words. Scalability means “I can add more servers to handle this workload without any problems.” And SQL allows you to do that. However, WID integrated ADFS allows you to do so as well. The limitation, though, is that you can only have up to 5 servers in a WID based ADFS farm (NOTE: Proxy servers don’t count against this maximum number). “Only five servers!?” you may ask in heated exasperation. “Yes!” I answer. “Well, I can’t stand limitations! I will use SQL!” you respond. “You will never need more than 4 ADFS servers in ever!” I retort. ADFS is one of the most light-weight roles available for Windows server. I have deployed ADFS on numerous environments with user counts well past the 10,000 mark and the ADFS servers almost never measure a blip on the performance monitors. This is because ADFS does three things, it provides a website for users to log in to, it queries Active Directory to authenticate users, and it generates a small token that contains less than 1KB of data that it then sends to a remote service. This entire process requires about 10 seconds per user per session, and that includes about 9 seconds of someone typing in their credentials. So there isn’t really even much need to have load balancing capabilities on ADFS. One server can handle monstrous loads. If you have an environment with 100,000 users accessing numerous federated applications, you might need to add 1 or 2 extra servers to your farm to handle the load, but I doubt it. You’re better off just adding RAM and processing power to your ADFS servers to improve performance than adding extra servers to the farm.

If you want multi-site high availability, you may need 5 servers, but probably not more. Let’s say you want to have site resiliency in your ADFS configuration as well as failover capability in your primary and secondary site. This configuration requires a minimum of 4 ADFS servers. Two for the primary site, two for the failover site, and 4 proxy servers that don’t count against the maximum of 5 servers. There you have it, full site-resiliency and in-site failover capability using less than 5 servers. You can even add a third site to the farm without having any issues if you want using the remaining server. But let’s be honest here, if there is an emergency that takes down two geographically dispersed datacenters at the same time, you’re going to be looking for the nearest shotgun to fight off zombies/anarchist bandits, not checking to make sure your ADFS farm is still working, so triple hot-site resiliency for ADFS is probably more than anyone needs. So you don’t need SQL for ADFS if you want scalability. It’s scalable enough without SQL.

EXPLOSIONS! I mean Conclusions

If you are planning to build an ADFS farm for Single Sign On with Office 365, you will ask the question, “Do I need SQL?” The answer to that question, in pretty much every possible case is “Not if you’re just going to use it for Office 365.” SQL integrated ADFS configurations add some features, sure, but there is almost no situation where any of those features must exist for ADFS to work with Office 365. So don’t bother even trying it out. It adds an unnecessary level of complexity, cost, and management difficulties and give no advantages whatsoever.

Thanks for pointing that out. I don’t have an opportunity to test that out. My previous experience (and the impetus for this particular post) was having to fix a SQL based ADFS cluster for O365 authentication that was causing major problems in failover testing. I don’t know if it was set up exactly as required by the site you linked to (I do recall reading that article and running each of the commands myself at one time, though), but I do know that every time we failed-over the SQL mirror, we would have to re-run FSConfig to point ADFS to the SQL mirror (in addition to a bit of back-end work on the SQL cluster to activate the mirror node) for ADFS to resume normal operation. After about a week of fiddling, testing, and research, we determined that the SQL cluster was not necessary for that environment’s needs, and switched them to using ADFS with the Windows Integrated Database. Note that this was a couple years ago, and the latest edition of SQL Server probably works better than the one we were using at the time (2008R2, if I remember correctly).
Still, unless there is a defined requirement to protect against token replay, and provide artifact resolution, there isn’t much advantage to deploying a SQL based ADFS cluster. The native high-availability features of ADFS will work very well for any environment with less than 10,000 users (The environment I worked with was about 2500 users, and 2 ADFS servers was more than enough to meet their needs).

Excellent article I had no idea Token replay was application specific and didn’t apply to Office 365. Judging by your terminology (ADFS proxies) I’m assuming this was written about ADFS 2. Any changes you’re aware of with ADFS 3?

I’m not aware of any changes in ADFS 3.0, and don’t expect there are any. There isn’t much different between 2.0/2.1 and 3.0 as far as back-end operation is concerned. The primary changes were improvements to PowerShell configuration and implementation of the Application Proxy system for Windows Server 2012 R2.