Protect AD from Administrative Errors

Imagine that you're the enterprise administrator of a multidomain Active Directory (AD) environment. You're attending a presentation in which your new CIO, Steve Johanson, is justifying the sizable IT budget to the shareholders. The meeting is supposed to start in 5 minutes, and your CIO can't access his presentation on the company SAN. When you look up his account to make sure he has the necessary access permissions, you find that his account is missing. You look at the change log and see that your junior administrator was supposed to remove the account for Steve Johnson, who just retired. Then it dawns on you: the wrong user was removed. Now it's panic time. Fortunately, the CIO knows a few good jokes and can entertain the shareholders while you reanimate his user account, give him a new password, and add him back to all the groups in the other domains so he can access the presentation as well as the rest of his reference material. The CIO understands that mistakes happen, but you wish it could all have been avoided.

Most administrators have been in situations in which a mistake led to users being accidentally deleted, removed from groups, or granted access they shouldn't have. Although you can purchase expensive AD backup utilities or set up complicated scripts that let you recover an account in only a few minutes, wouldn't it be better to avoid these mistakes altogether?

Protecting AD objects from administrative errors is challenging. One approach is to have administrators check each other's changes before implementing them. Another is to use third-party tools to automate changes. A lesser-known approach is to use selective authentication, a trust option introduced in Windows Server 2003, on an external trust.

The selective authentication solution takes some work to set up initially, but it provides an effective way to control and audit AD changes. When selective authentication is enabled on a trust, users (in this case, administrators) in the trusted domain can authenticate only to those computers in the trusting domain on which they've been explicitly granted the Allowed to Authenticate permission, so you control which resources they can reach.

Here's how to set up an AD environment for selective authentication:

In the production forest, set up a lag site that contains one domain controller (DC) but no associated subnets, so that no clients are mapped to the lag site and its DC isn't used for routine user authentication. Then set up a strict replication schedule that either allows replication only at very limited times or requires all replication to be triggered manually. You control this through the schedule on the site link that connects the lag site to the rest of the forest. (Turning off all scheduled replication on a site link will generate spanning tree error events on other DCs.)

Set up a second forest (the Admin Forest) that contains two or more DCs for redundancy. Place in this forest all the administrator accounts whose changes you want to validate.

Set up a trust between the two forests. It can be an external (domain) trust or a forest trust, but it must be one-way: the production domain is the trusting side, and the admin domain is the trusted side. (In the New Trust Wizard on the production side, this is a one-way outgoing trust.) Instead of using the default authentication method, choose selective authentication.

Grant authentication permission. You now have a group of administrator accounts in the Admin Forest that can see the trust to the production forest but can't authenticate to any of its resources. So, grant that administrator group the Allowed to Authenticate permission on the computer object of the DC in the lag site (the lag DC).

Grant activity rights. Go through your standard delegation procedure to grant the administrators the rights they need to perform their jobs, such as adding or deleting objects, modifying DNS properties, and creating Group Policy Objects (GPOs).
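To make the effect of the trust and permission steps concrete, here is a small model of the authentication decision. It is a sketch under stated assumptions, not how Windows implements trusts: the domain names, computer names, and group names are hypothetical, and "authentication" is reduced to a yes/no check. It shows two things: access flows opposite to the direction of the trust, and with selective authentication an Admin Forest account gets in only where its account or one of its groups holds Allowed to Authenticate.

```python
# Hypothetical model of a one-way trust with selective authentication.
# It illustrates the access logic only; it is not how Windows implements it.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Trust:
    trusting: str    # production domain (holds the resources)
    trusted: str     # admin domain (holds the administrator accounts)
    selective: bool  # True = selective authentication, False = domain-wide


@dataclass
class Computer:
    name: str
    domain: str
    # Principals from the trusted domain granted "Allowed to Authenticate" on this computer object.
    allowed_to_authenticate: set[str] = field(default_factory=set)


def can_authenticate(trusts: list[Trust], account_domain: str,
                     principals: set[str], target: Computer) -> bool:
    """principals = the user's own account plus its group memberships."""
    if account_domain == target.domain:
        return True  # same domain; no trust is involved in this simplified model
    for t in trusts:
        # Access flows opposite to the trust: the trusting side accepts the trusted side's accounts.
        if t.trusted == account_domain and t.trusting == target.domain:
            if not t.selective:
                return True  # domain-wide authentication
            return bool(principals & target.allowed_to_authenticate)
    return False  # no trust in this direction


trusts = [Trust(trusting="prod.example.com", trusted="admin.example.net", selective=True)]
lag_dc = Computer("LAGDC01", "prod.example.com", {"ADMIN\\ProdAdmins"})
hub_dc = Computer("HUBDC01", "prod.example.com")  # no permission granted
admin_dc = Computer("ADMDC01", "admin.example.net")

jr_admin = {"ADMIN\\jradmin", "ADMIN\\ProdAdmins"}
print(can_authenticate(trusts, "admin.example.net", jr_admin, lag_dc))  # True
print(can_authenticate(trusts, "admin.example.net", jr_admin, hub_dc))  # False
print(can_authenticate(trusts, "prod.example.com", {"PROD\\someuser"}, admin_dc))  # False
```

Keep in mind that authentication is not authorization: passing this check only gets an administrator to the lag DC, and the rights delegated in the last step still decide what he or she can change there.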

Selective authentication combined with the Allowed to Authenticate permission on a single DC forces all changes to happen only on that machine. With this setup, administrators can perform their duties, but any mistakes are restricted to one DC in a site that doesn't perform any user authentication. The changes remain there until the replication schedule permits them to propagate. If the replication schedule is manual (i.e., no scheduled times for replication), the changes won't propagate until somebody manually releases them.
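How long a change sits on the lag DC depends entirely on the site link schedule. The sketch below models the schedule as a simplified weekly grid of one-hour slots (roughly how the site link schedule dialog presents it) and asks when pending changes will next be allowed to leave the lag site. It is an assumption-laden illustration: it doesn't read or write the real schedule attribute, and the class and method names are hypothetical. An empty grid stands for the manual-only case, in which someone has to trigger replication, for example with Replicate Now in Active Directory Sites and Services or with `repadmin /replicate`.

```python
# Hypothetical model of a site link schedule as a weekly grid of one-hour slots.
# It does not read or write the real AD schedule attribute.
from dataclasses import dataclass, field
from typing import Optional

DAYS = ("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
HOURS_PER_WEEK = 7 * 24


@dataclass
class SiteLinkSchedule:
    # (day, hour) slots in which replication may run; an empty set means
    # replication happens only when someone triggers it manually.
    open_slots: set[tuple[str, int]] = field(default_factory=set)

    def allow(self, day: str, hour: int) -> None:
        if day not in DAYS or not 0 <= hour < 24:
            raise ValueError(f"invalid slot: {day} {hour}")
        self.open_slots.add((day, hour))

    def next_window(self, day: str, hour: int) -> Optional[tuple[str, int]]:
        """First open slot at or after (day, hour), wrapping around the week.
        Returns None when the schedule is manual-only."""
        start = DAYS.index(day) * 24 + hour
        for offset in range(HOURS_PER_WEEK):
            slot = (start + offset) % HOURS_PER_WEEK
            candidate = (DAYS[slot // 24], slot % 24)
            if candidate in self.open_slots:
                return candidate
        return None


strict = SiteLinkSchedule()
strict.allow("Sun", 2)  # one window a week: Sunday 02:00-03:00

print(strict.next_window("Mon", 9))              # ('Sun', 2)
print(SiteLinkSchedule().next_window("Mon", 9))  # None: manual release only
```

A mistake made on Monday morning under this strict schedule would sit on the lag DC until Sunday at 2:00 A.M., which is the review window the verification administrators work within.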

Here's how to use this solution in practice. Separate your administrators into two groups. The administrators in one group make changes on the lag DC. The administrators in the other group regularly review all the changes that have been made on the lag DC. If the changes are acceptable, they force a replication into the live environment. If the changes aren't valid, contain mistakes, or violate company policy, they inform the administrator who made the changes so that he or she can remedy the situation.
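The review step can be sketched as splitting the pending changes on the lag DC into approved and rejected sets. Everything here is hypothetical: changes are plain records rather than real replication metadata, and the reviewer rule is just an example. One design point the sketch encodes: ordinary replication from the lag DC sends every outstanding change, so in this model the reviewer releases replication only when nothing has been rejected.

```python
# Hypothetical model of the two-group review workflow; "changes" are plain
# records here, not real AD replication metadata.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Change:
    author: str
    target_dn: str
    action: str  # e.g. "delete", "modify"


def review(pending: list[Change],
           is_acceptable: Callable[[Change], bool]) -> tuple[list[Change], list[Change]]:
    """Split pending lag-DC changes into (approved, rejected) using a reviewer-supplied check."""
    approved: list[Change] = []
    rejected: list[Change] = []
    for change in pending:
        (approved if is_acceptable(change) else rejected).append(change)
    return approved, rejected


pending = [
    Change("jradmin", "CN=Steve Johnson,OU=Retired,DC=prod,DC=example,DC=com", "delete"),
    Change("jradmin", "CN=Steve Johanson,OU=Executives,DC=prod,DC=example,DC=com", "delete"),
]


# Example reviewer rule (hypothetical): never delete anything in the Executives OU.
def no_executive_deletes(c: Change) -> bool:
    return not (c.action == "delete" and "OU=Executives" in c.target_dn)


approved, rejected = review(pending, no_executive_deletes)
# Replication pushes everything pending, so hold it until every rejected change is fixed.
release_replication = not rejected
print(len(approved), len(rejected), release_replication)  # 1 1 False
```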

So, how does a verification administrator check the changes? In Windows Server 2003 and earlier, the easiest way is to enable the Audit directory service access policy on the lag DC. Changes made on the DC are then recorded in the security log (for objects whose system access control lists, or SACLs, are configured for auditing). Because all changes are made on a single DC, the verification administrator just has to search one log for change events that have occurred since the last replication.
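The "search one log since the last replication" step boils down to a time-and-type filter. The sketch below assumes the events have already been exported from the lag DC's security log into simple records (reading the live log is out of scope). The event IDs are an assumption worth checking for your OS version: 5136 (object modified), 5137 (object created), and 5141 (object deleted) are the Directory Service Changes events from Server 2008 and later, while Windows Server 2003 logs directory service access under different IDs.

```python
# Sketch: pick out directory-change events recorded since the last replication.
# Events are assumed to have been exported from the lag DC's security log into
# simple records; reading the live log is out of scope here.
from dataclasses import dataclass
from datetime import datetime

# Assumption: Server 2008+ "Directory Service Changes" event IDs
# (5136 modified, 5137 created, 5141 deleted). Adjust for your OS version.
DS_CHANGE_EVENT_IDS = {5136, 5137, 5141}


@dataclass
class Event:
    time: datetime
    event_id: int
    summary: str


def changes_since(events: list[Event], last_replication: datetime) -> list[Event]:
    """Directory-change events newer than the last replication, oldest first."""
    return sorted(
        (e for e in events if e.time > last_replication and e.event_id in DS_CHANGE_EVENT_IDS),
        key=lambda e: e.time,
    )


last_rep = datetime(2024, 5, 1, 2, 0)
log = [
    Event(datetime(2024, 4, 30, 15, 0), 5136, "modified CN=HR Share"),  # already replicated
    Event(datetime(2024, 5, 1, 9, 30), 4624, "logon"),                   # not a directory change
    Event(datetime(2024, 5, 1, 10, 5), 5141, "deleted CN=Steve Johanson"),
]
for e in changes_since(log, last_rep):
    print(e.time, e.event_id, e.summary)
```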

Windows Server 2008 introduced better tools for reviewing directory service changes, such as the Active Directory database mounting tool (Dsamain). With this tool, you can mount an AD database from a backup or from a snapshot created with the Ntdsutil utility (for example, one taken right after the last replication) and expose it over LDAP, then use a script to compare its objects with those on the live lag DC, letting you see all changes that have yet to propagate. Server 2008 also has enhanced auditing of directory service changes, which records more information about each change (including old and new attribute values) and lets you create custom views that show only changed objects.
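The comparison script can be as simple as a diff of two object sets keyed by distinguished name. This is a minimal sketch under an explicit assumption: both sides have already been fetched into plain dictionaries mapping DN to attribute values (for instance over LDAP from the mounted snapshot and from the lag DC), and that fetching is not shown. The DNs and attributes are made up for illustration.

```python
# Sketch of the snapshot-vs-live comparison. Both sides are plain dicts
# mapping DN -> {attribute: value}; fetching them from the mounted snapshot
# and from the lag DC is assumed and not shown.


def diff_directories(snapshot: dict[str, dict[str, str]],
                     live: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Report DNs added, deleted, or modified since the snapshot was taken."""
    added = sorted(dn for dn in live if dn not in snapshot)
    deleted = sorted(dn for dn in snapshot if dn not in live)
    modified = sorted(dn for dn in snapshot.keys() & live.keys() if snapshot[dn] != live[dn])
    return {"added": added, "deleted": deleted, "modified": modified}


snapshot = {
    "CN=Steve Johanson,OU=Executives": {"title": "CIO"},
    "CN=Steve Johnson,OU=Staff": {"title": "Analyst"},
}
live = {
    "CN=Steve Johnson,OU=Staff": {"title": "Analyst (retired)"},
    "CN=New Hire,OU=Staff": {"title": "Engineer"},
}
print(diff_directories(snapshot, live))
```

In this example the deleted list contains the CIO's account, exactly the kind of entry a verification administrator should catch before releasing replication.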

There are also third-party audit tools that you can use. These tools let you capture changes in real time and compare different databases on different DCs, providing an easy way to see what has changed.

Had the selective authentication solution been in place, the opening scenario would have played out much differently. The junior administrator sees he needs to delete the account for Steve Johnson, so he logs on in the Admin Forest with his Admin Forest account and opens the Active Directory Users and Computers (ADUC) Microsoft Management Console (MMC) snap-in. He navigates to the production forest and tries to connect to a DC. Because selective authentication is enabled on the trust, he can connect only to the lag DC; every other DC returns an access denied message. He searches for Steve Joh* and accidentally deletes Steve Johanson on the lag DC. At this point, the mistake has been made, but it's confined to the lag DC.
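The root of the mix-up is that a wildcard search returns both accounts. The snippet below uses Python's `fnmatch` only to mimic the wildcard (it is not an LDAP client), lowercasing both sides to approximate the case-insensitive matching AD applies to names; the third name is made up.

```python
# Illustration of why a wildcard search is risky: "Steve Joh*" matches both
# accounts. fnmatch only mimics the wildcard here; it is not an LDAP client.
from fnmatch import fnmatchcase

names = ["Steve Johnson", "Steve Johanson", "Sam Jones"]
# Lowercase both sides to approximate case-insensitive name matching.
matches = [n for n in names if fnmatchcase(n.lower(), "steve joh*")]
print(matches)  # ['Steve Johnson', 'Steve Johanson']
```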

The verification administrator logs on to the production forest and looks at the changes made on the lag DC. He notices that the account for CIO Steve Johanson has been deleted. Instead of replicating the change to another site and allowing it to spread throughout the forest, he simply contacts the junior administrator about the problem. He also takes the lag DC offline until after the CIO's meeting is over. The CIO can access his resources and won't know about the mistake until he sees the monthly status report—at which point he will thank you profusely.

Note that there are a few caveats when using this solution:

The chances of an erroneous DS change affecting the production environment are reduced but not eliminated. A verification administrator might miss a problem and propagate an erroneous change. This is especially likely when a large volume of changes is being made, because verification administrators can get overwhelmed by the number of events and not examine them as closely as they should.

The domain and enterprise administrator accounts still exist in the production forest and can make changes. So, if they really wanted to, administrators could circumvent the system and make changes directly on any DC in the production forest instead of on the lag DC.

Although these caveats exist, they're offset by the solution's potential benefits. Besides the obvious one (i.e., reducing the chance that an erroneous change impacts the production environment), the benefits include the following:

You have a straightforward way to audit and report on AD object changes (especially if you use Server 2008) because every change takes place on one DC.

You add a bit of protection against account compromise. If an administrator account is compromised, the attacker's changes are confined to the lag DC until they replicate. So, all you need to do is wipe the lag DC and the Admin Forest DCs clean, which is much less work than rebuilding AD and all its data.

Obviously, this solution isn't well suited to a large multinational forest because it would create a tsunami of change verifications. It's also not well suited to a call center or Help desk that does password resets, because new passwords need to be available to users immediately.

However, this solution is well suited for:

Organizational units (OUs) that contain highly visible accounts, such as the CIO's account.

Small AD environments in which untrained staff work as AD administrators.

Small AD environments in which an erroneous change can be catastrophic.

Probationary administrators. (You can make sure that they know what they're doing before you let them loose.)

Administrators of critical services, such as DNS.

Configuration administrators of line-of-business (LOB) applications that store data in AD, where a mistake can make the application nonfunctional.

Using selective authentication in an external trust provides an effective solution for protecting AD objects from administrative errors. Although it requires some upfront work to set up, it can save you a lot of grief later on. As an advanced Microsoft feature, selective authentication is one more security tool that you can pull out of your bag of tricks.