Kerberos Delegation and Troubleshooting

When things go wrong with Active Directory’s (AD’s) Kerberos implementation, troubleshooting can be a daunting task. Ninety-nine percent of the time, everything just works—so opportunities to practice troubleshooting are limited. When you do need to solve a problem, it’s important to have a good technical understanding of the protocol. One of Kerberos’s most complicated configurable components is the concept of delegation.

In a nutshell, delegation lets a user access an application, and the application then accesses another service in the context of that user. A common example is a website that accesses a SQL Server database. Rather than each database access occurring in the context of a service account, each request to the database is made in the context of the user accessing the website.

For background information about Kerberos in AD, see “Kerberos in Active Directory.” For the purposes of this article, let’s assume that your AD forest is running at the Windows Server 2003 or higher functional level and that your application servers are running Server 2003 or later, unless otherwise noted. Server 2003 introduced numerous improvements to the Kerberos implementation in AD.

Kerberos Delegation

At a conference where I recently presented on Kerberos delegation, I asked the audience members to raise their hands if they had ever had to configure Kerberos delegation. A large portion of the audience did so. I then asked those with raised hands to keep them up if they had gotten the delegation configured properly on the first attempt. Only a couple of hands remained raised. Unfortunately, little documentation exists on the topics of delegation and constrained delegation, even though delegation configuration is a critical component of many enterprise applications.

As I mentioned earlier, the most common example of implementing delegation is to access an application (usually a web application) that subsequently accesses a resource such as a SQL Server database. In order to access the database, the application has to use credentials to make the connection. A common approach is to connect via a dedicated service account that has read and write access to all the necessary data in the database, as Figure 1 shows. The application is then responsible for managing access controls to the data itself because the service account has access to everything.

Another option is for the data to be controlled using SQL Server’s native access management capabilities on a per-user or per-group basis. In order for the controls to be effective, the application needs to make the connection in the context of the user who is accessing the application, as Figure 2 shows. The process under which this occurs is known as Kerberos delegation, or more frequently Kerberos constrained delegation.

In order to access the SQL Server system in Figure 2, the web server needs to obtain a service ticket to the SQL Server service. The service ticket must be for the user accessing the web application (e.g., User 1), not the web server’s service account. Thus, the web server presents User 1’s service ticket that was used to access the website (e.g., for www.contoso.com) to the Key Distribution Center (KDC) and requests a service ticket to the SQL Server system. The KDC evaluates the delegation settings in AD for the web server; if it’s permitted to delegate to the SQL Server system, the KDC takes the presented service ticket as proof that the user is authenticated and returns a new service ticket for the user to the SQL Server system. Figure 3 shows this exchange of information.

So far we’ve made the assumption that no configuration is required in AD for delegation to work as Figure 3 shows, but this isn’t the case, and for good reason. If any service could simply delegate authentication to any other service, a malicious person could lure a user into authenticating to that person’s service, giving the attacker access to every service on the network that the unwitting user has access to.

The default setting in the Microsoft Management Console (MMC) Active Directory Users and Computers snap-in’s Delegation tab is to not trust the user for delegation. This means that services running in the context of the account can’t delegate authentication. Windows 2000 Server supports the Trust this user for delegation to any service (Kerberos only) option, which you can see in Figure 4. With this option enabled, the service can request a service ticket on behalf of the user to any other service in your environment. This option is inherently insecure, and as a general rule you should have almost no use for it in your environment.

The preferred configuration, which Figure 4 shows, is to enable constrained delegation by selecting Trust this user for delegation to specific services only. This setting limits the service (or computer) account to only be able to request to delegate authentication to the services listed. In this case, service tickets can only be requested on behalf of other users to the SQL Server service on sql.contoso.com. When you click Add, you must browse for the user (i.e., service account) or computer hosting the service to which you want to allow delegation. In this case I selected the SQL Server service account. As Figure 5 shows, you’ll see a list of Service Principal Names (SPNs) defined on the selected user or computer from which you can select the services to allow delegation to.
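To make the difference between the Delegation tab options concrete, here is a minimal Python sketch of the decision the KDC makes when a service asks to delegate. It is a simplified model, not the actual KDC logic. The userAccountControl bit value and the idea of a per-account list of permitted SPNs (stored in AD in the msDS-AllowedToDelegateTo attribute) are real; the ServiceAccount class, the may_delegate function, the account name, and the port number in the example SPN are hypothetical names chosen for illustration, and SPN comparison is assumed to be case-insensitive.

```python
from dataclasses import dataclass, field

# userAccountControl bit set by "Trust this user for delegation to any service".
TRUSTED_FOR_DELEGATION = 0x80000


@dataclass
class ServiceAccount:
    """Hypothetical stand-in for the delegation settings on an AD account."""
    name: str
    user_account_control: int = 0
    # Models the SPN list shown on the Delegation tab for constrained delegation.
    allowed_to_delegate_to: list = field(default_factory=list)


def may_delegate(account, target_spn):
    """Return True if this simplified model would allow delegation to target_spn."""
    # Unconstrained delegation: any target service is acceptable.
    if account.user_account_control & TRUSTED_FOR_DELEGATION:
        return True
    # Constrained delegation: the target must appear in the configured list.
    allowed = {spn.lower() for spn in account.allowed_to_delegate_to}
    return target_spn.lower() in allowed


web = ServiceAccount("svc-web", allowed_to_delegate_to=["MSSQLSvc/sql.contoso.com:1433"])
print(may_delegate(web, "MSSQLSvc/sql.contoso.com:1433"))
print(may_delegate(web, "cifs/files.contoso.com"))
```

With the default setting (no flag and an empty list), the function returns False for every target, which mirrors the behavior described above.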

Protocol Transition

Protocol transition is an added function of the AD Kerberos implementation that Microsoft introduced in Server 2003. So far in our discussion, when the www.contoso.com application needed to access SQL Server as the current user, the web server presented the user’s service ticket for the web application to the KDC to obtain a service ticket to the SQL Server system, as Figure 3 shows. This scenario is only possible if the user authenticates to the website using Kerberos. If the user authenticates via forms-based logon, through another protocol such as NTLM, or perhaps with another mechanism such as an RSA SecurID token, the web application can’t obtain a service ticket on the user’s behalf because Kerberos isn’t involved.

To enable this scenario, you can configure a service account or computer account to perform protocol transition, which lets the service or computer account request a service ticket to a service without having a service ticket from the user. In lieu of presenting the user’s service ticket to the website, the service account presents its own ticket-granting ticket (TGT) and requests a service ticket to itself in the name of the user. Figure 6 shows the sequence of Kerberos requests and replies when protocol transition is performed.

It’s important to note that because of the sensitivity of protocol transition, it’s available only in conjunction with Kerberos constrained delegation. To configure protocol transition, select Use any authentication protocol rather than Use Kerberos only when you enable constrained delegation. The sensitivity of this configuration comes from the fact that you’re giving the application the ability to make a claim to the KDC that it successfully authenticated a user regardless of whether it actually did so. To limit the risk, you must configure the services that the application can make this claim to in the form of a service ticket.
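The interaction between the two constrained delegation choices can be summarized in a few lines of Python. This is only a sketch of the rules described above, assuming that the Use any authentication protocol option corresponds to the TRUSTED_TO_AUTH_FOR_DELEGATION bit in userAccountControl; the function name, its parameters, and the port number in the example SPN are hypothetical.

```python
# userAccountControl bit set by "Use any authentication protocol".
TRUSTED_TO_AUTH_FOR_DELEGATION = 0x1000000


def can_reach_backend(user_has_kerberos_ticket, user_account_control,
                      allowed_spns, target_spn):
    """Simplified model: can the service get a ticket to target_spn for the user?"""
    # Constrained delegation always limits the targets, with or without
    # protocol transition.
    if target_spn.lower() not in {spn.lower() for spn in allowed_spns}:
        return False
    # The user authenticated with Kerberos, so the service holds the user's
    # service ticket and can present it as evidence.
    if user_has_kerberos_ticket:
        return True
    # No Kerberos evidence (forms logon, NTLM, token): the service must be
    # allowed to perform protocol transition.
    return bool(user_account_control & TRUSTED_TO_AUTH_FOR_DELEGATION)


sql = "MSSQLSvc/sql.contoso.com:1433"
print(can_reach_backend(False, 0, [sql], sql))
print(can_reach_backend(False, TRUSTED_TO_AUTH_FOR_DELEGATION, [sql], sql))
```

Note that the target list is checked first in both branches, which reflects why protocol transition is offered only together with constrained delegation.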

For the exchange illustrated in Figure 6 to be successful, several prerequisites are necessary in addition to configuring the service account’s settings. First, the service account must be able to access the group membership of the user it’s trying to obtain a service ticket for. This access is granted through membership in the AD Windows Authorization Access group. This group is delegated read access to the AD tokenGroupsGlobalAndUniversal attribute.

Next, to actually perform delegated authentication, the service account also needs the Act as Part of the Operating System (SeTcbPrivilege) and Impersonate a Client After Authentication (SeImpersonatePrivilege) security privileges. The Act as Part of the Operating System privilege is especially sensitive and by default is only granted to the SYSTEM account. If you grant this right to the service account running a web application and the application is compromised, the attacker will have full control of the server. Typically applications that make large-scale use of protocol transition, such as single sign-on (SSO) tools, implement a special service that runs under the SYSTEM account and performs the necessary Kerberos calls on behalf of the web application.

Troubleshooting Kerberos

Delegation is one of the most difficult Kerberos components to configure, and misconfiguration leads to broken applications. Numerous other, smaller problems can also occur with Kerberos, and it’s important to recognize and remediate them.

Two utilities that are commonly used to monitor Kerberos behavior on a Windows machine, as well as to troubleshoot, are Klist and Kerbtray. Klist is a command-line utility that’s built in to Windows. This tool lets you see all the tickets currently cached for a session, as well as view the TGT. Simply run klist to view the cached tickets; run klist tgt to view the TGT. To purge cached tickets (and the TGT), run klist purge. Purging tickets lets you get a new TGT with updated group membership stored in it without logging off.

Kerbtray is available in the Microsoft Windows Server 2003 Resource Kit and the Microsoft Windows 2000 Server Resource Kit. The data presented in Kerbtray is the same as Klist; however, Kerbtray runs in the system tray and provides a graphical view rather than text-based output.

The most common problem AD administrators face with Kerberos is duplicate SPNs. SPNs are used to identify services running in the environment. When a user requests a service ticket in order to access a service, the user specifies the SPN of the service he or she is trying to access as part of that request. The KDC subsequently searches for an account holding that SPN and encrypts the ticket using the account’s secret. If more than one account has the same SPN defined on it, the KDC won’t be able to properly encrypt the ticket because there’s more than one secret that can be used to encrypt the ticket.

Duplicate SPNs frequently occur when machines are joined to one domain and then joined to another domain in the forest, leaving an orphaned computer account behind in the old domain. Duplicates can also occur if an SPN is manually entered on multiple user or computer accounts. When the KDC receives a request for a service ticket and finds multiple matches for the specified SPN, an event similar to the one in Figure 7 is logged in the domain controller’s (DC’s) system log.
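If you have exported the SPNs registered on each account (for example, from a directory query), spotting duplicates is a simple grouping exercise. The following Python sketch assumes the export has already been loaded into a dictionary that maps account names to their SPNs; the function name and the sample accounts are hypothetical, and SPNs are compared case-insensitively.

```python
from collections import defaultdict


def find_duplicate_spns(accounts):
    """Return {spn: [accounts]} for every SPN registered on more than one account.

    accounts maps an account name to an iterable of SPN strings.
    """
    holders = defaultdict(set)
    for account, spns in accounts.items():
        for spn in spns:
            holders[spn.lower()].add(account)
    return {spn: sorted(names) for spn, names in holders.items() if len(names) > 1}


sample = {
    "CONTOSO\\web01$": ["HTTP/www.contoso.com"],
    "CONTOSO\\svc-web": ["http/www.contoso.com", "http/www"],
}
print(find_duplicate_spns(sample))
```

Any SPN that appears in the result is one the KDC can’t resolve to a single account, so one of the registrations needs to be removed.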

Time synchronization is critical to proper Kerberos operation. If the client, server, or KDC clocks aren’t synchronized, Kerberos won’t work correctly. Kerberos uses a timestamp to secure the various messages it depends on; when clocks don’t match across the environment, tickets are erroneously invalidated. By default, AD lets clocks drift a maximum of five minutes in either direction.
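The tolerance check itself is easy to express. This Python sketch compares two clock readings against the five-minute default mentioned above; the function and constant names are hypothetical, and the timestamps are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Default tolerance described above: five minutes in either direction.
MAX_SKEW = timedelta(minutes=5)


def within_skew(client_time, kdc_time, max_skew=MAX_SKEW):
    """Return True if the two clocks differ by no more than max_skew."""
    return abs(client_time - kdc_time) <= max_skew


kdc = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(within_skew(kdc + timedelta(minutes=4), kdc))
print(within_skew(kdc - timedelta(minutes=6), kdc))
```

Using timezone-aware timestamps matters here: a machine with the right local time but the wrong time zone is still outside the tolerance.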

Kerberos is highly dependent on DNS. When you define SPNs, you define them in terms of the service’s DNS name (e.g., http/www.contoso.com). If you access the www.contoso.com website via a name for which no SPN is defined, Kerberos won’t work correctly. In order to support browsing to applications via just their host name, it’s typical to also define an SPN for the host name (e.g., http/www) so that users don’t have to enter the service’s Fully Qualified Domain Name (FQDN). One scenario in which Kerberos never works is when a service is accessed via IP address. In this case, authentication typically falls back to NTLM.
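To see which SPN a given URL implies, you can derive it from the host portion of the URL. The Python sketch below is a rough approximation that assumes the http service class used in the examples above; real clients may also resolve aliases or include port numbers, which isn’t modeled here. The function name is hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def spn_for_url(url, service_class="http"):
    """Return the SPN implied by the URL's host, or None for an IP address."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no host in URL: {url!r}")
    try:
        ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so the host name is used to build the SPN.
        return f"{service_class}/{host}"
    # IP address: no SPN applies, so expect a fallback to NTLM.
    return None


print(spn_for_url("https://www.contoso.com/app"))
print(spn_for_url("http://www/"))
print(spn_for_url("http://10.0.0.5/app"))
```

Each distinct result is an SPN that must be registered on the account running the service if users are to reach it with Kerberos under that name.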

When authentication falls back to NTLM, applications that depend on Kerberos delegation but don’t implement protocol transition will fail. Sometimes NTLM is used because of either a server or browser configuration issue rather than a Kerberos problem. One tool you can use to troubleshoot these types of issues with web applications is a free utility called Fiddler. You can download Fiddler from www.fiddler2.com.
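When you capture the HTTP traffic, the Authorization request header gives a strong hint about which protocol was actually used. The Python sketch below classifies a header value by looking at the first bytes of the decoded token: NTLM messages begin with the signature NTLMSSP followed by a zero byte, whereas GSS-API tokens (which is how Kerberos is normally carried under the Negotiate scheme) begin with the byte 0x60. This is a heuristic under those assumptions rather than a full parser, and the function name and return labels are hypothetical.

```python
import base64


def classify_auth_header(value):
    """Classify an HTTP Authorization header value as 'ntlm', 'gssapi', or 'other'."""
    scheme, _, token = value.strip().partition(" ")
    scheme = scheme.lower()
    if scheme == "ntlm":
        return "ntlm"
    if scheme != "negotiate" or not token.strip():
        return "other"
    try:
        raw = base64.b64decode(token.strip())
    except ValueError:
        # Malformed base64: not something this heuristic can classify.
        return "other"
    if raw.startswith(b"NTLMSSP\x00"):
        # NTLM carried inside the Negotiate scheme: Kerberos was not used.
        return "ntlm"
    if raw[:1] == b"\x60":
        # GSS-API framing, typically a Kerberos ticket wrapped in SPNEGO.
        return "gssapi"
    return "other"


ntlm_token = base64.b64encode(b"NTLMSSP\x00\x01\x00\x00\x00").decode()
print(classify_auth_header("Negotiate " + ntlm_token))
```

A result of ntlm for an application that relies on delegation without protocol transition tells you to look at the browser and server configuration, SPNs, and DNS before suspecting the delegation settings.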

Another common problem is an age-old issue known as token bloat. Kerberos stores a user’s group membership (among other things) in the Privilege Attribute Certificate (PAC) section of the user’s TGT and subsequently inside the service tickets. When the amount of memory required to store the group membership exceeds a certain value, the membership can no longer be fully stored in the PAC. The Microsoft article “New resolution for problems with Kerberos authentication when users belong to many groups” discusses how to adjust a registry setting (MaxTokenSize), as well as how to calculate each group’s contribution to the total size of the token. Although adjusting this registry setting can temporarily solve this problem, a better solution is to reevaluate your organization’s group membership strategy.
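The Microsoft article’s estimate is TokenSize = 1200 + 40d + 8s, where d is the number of domain local groups, universal groups outside the user’s account domain, and groups represented in SID history, and s is the number of global groups and universal groups inside the user’s account domain. The Python sketch below applies that formula. The 12,000-byte limit is an assumption based on the default MaxTokenSize for systems of the era this article covers, so check the value on your own platform; the function and constant names are hypothetical.

```python
# Assumed default MaxTokenSize in bytes; verify against your own systems.
ASSUMED_MAX_TOKEN_SIZE = 12000


def estimate_token_size(d, s):
    """Estimate token size in bytes using TokenSize = 1200 + 40d + 8s."""
    return 1200 + 40 * d + 8 * s


def exceeds_limit(d, s, limit=ASSUMED_MAX_TOKEN_SIZE):
    """Return True if the estimated token size is larger than the limit."""
    return estimate_token_size(d, s) > limit


print(estimate_token_size(10, 100))
print(exceeds_limit(200, 500))
```

Because each group counted in d costs five times as much as one counted in s, domain local groups and SID history entries are usually the first places to look when trimming membership.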

Quick and Easy Solutions

In general, you can count on Kerberos to work without incident. But if you need to set up an application that requires Kerberos constrained delegation, you need a solid understanding of how delegated authentication works. When Kerberos does break down, numerous tools let you troubleshoot the problem to find quick and easy solutions.