David Everett here again with an interesting issue that causes the Advertising test in DCdiag.exe to fail when verifying the role of a global catalog (GC).

A customer called Microsoft Product Support to determine why the Advertising test in dcdiag.exe was reporting that the global catalog role was not working on a Windows Server 2008 Read-only domain controller (RODC) when all other indicators suggested it was functioning normally. DCDiag.exe reported the DC was advertising as a GC but DCDiag couldn’t perform a search against the GC when the command was issued local to the server or remotely:

Dcdiag /test:advertising /v /s:RODC

<..snip..>

Doing primary tests

Testing server: SITEZ\RODC01 Starting test: Advertising The DC RODC01 is advertising itself as a DC and having a DS. The DC RODC01 is advertising as an LDAP server The DC RODC01 is not advertising as having a writeable directory because it is an RODC The DC RODC01 is advertising as a Key Distribution Center The DC RODC01 is advertising as a time server Ldap search capabality attribute search failed on server RODC01, return value = 81 Server RODC01 is advertising as a global catalog, but it could not be verified that the server thought it was a GC. ......................... RODC01 failed test Advertising

Determine the health of the Global Catalog

There are some simple tests that can be done to verify the DC is advertising the Global Catalog role. First, make a connection to the W2K8 DC's ROOTDSE over port 389 or 3268 to determine if the DC has sourced and is advertising the global catalog:

isGlobalCatalogReady: TRUE; isSynchronized: TRUE;

Another useful test to verify the DC is advertising as a GC is to use the /GC parameter in nltest.exe and observe that GC is listed in the Flags:

The first two tests essentially confirm what dcdiag.exe already reports, namely that the server is advertising as a GC. The real question now is, “Do LDAP searches correctly retrieve objects from the global catalog?” Since the DC resides in contoso.com (the forest root domain) the search should be made to query an object from a different domain in the same forest over global catalog LDAP port 3268. The example below shows the GC successfully returns the child domain’s Administrator account:

This problem occurs when an administrator removes a domain controller machine account using adsiedit.mscbutfails to remove the objects from the Configuration partition and then promotes a new DC with the same name into a different site.

Apparently an RODC account was pre-created in the wrong Active Directory site using the Pre-created Read-only Domain Controller account... option in DSA.MSC. The Active Directory Promotion Wizard prompts the administrator to type the hostname and select the Site where the prospective DC will reside. The wizard uses this information to create the computer account and its corresponding NTDS Settings object. At some point, the unoccupied RODC machine account was deleted from the Domain Controllers OU using adsiedit.msc but its corresponding NTDS Settings object was left in the Configuration partition. Eventually the server was promoted as a DC into the correct site and successfully promoted to be a GC.

It’s important to note that this condition isn’t specific to RODCs. The same issue occurs if a writable DC is removed from metadata in the same way and a server with the same name is later promoted into a different site.

All NTDS Settings objects have a parent server object named after the DC. This parent server object contains a dNSHostName attribute that is populated with the fully qualified domain name of the DC. In this case the identically named stale and valid NTDS Settings objects in different sites have a dNSHostName attribute with the same FQDN. The DCDIAG Advertising test searches the CN=Sites,CN=Configuration,DC=Contoso,DC=Com container for a Server object whose dNSHostName attribute matches the fully qualified computer name of the DC being targeted. Once it finds the object with the matching dNSHostname it retrieves the objectGUID of the subordinate NTDS Settings object and attempts to contact the target domain controller by its fully qualified CNAME which would normally be registered under the DNS zone “_msdcs.contoso.com”.

If DCDIAG discovers a Server object whose dNSHostName attribute matches the targeted DC but the ObjectGUID on the subordinate NTDS Settings object wasn't created by the last DCPROMO promotion or machine account pre-creation, then the LDAP bind to the targeted DC will fail with LDAP error 81. To get a better understanding of what this error means, download Err.exe and pass it the error code and you find it translates to "LDAP_SERVER_DOWN".

Identify Valid and Invalid NTDS Settings objects and clean up

1. Determine the DSA object GUID and Site name that the DC is currently registering in DNS as a CNAME record by running this command:

Hello, David Everett here again. Recently a customer contacted Microsoft Product Support to determine why the Connect to Domain Controller option in Active Directory Users and Computers (aka: ADUC or dsa.msc) was generating an incomplete list of Domain Controllers (DCs) for one domain. Even though the list of available DCs was truncated we found we could manually enter the name of any DC not in the list and Active Directory Users and Computers would connect to the DC without issue.

Determining the scope of the issue:

Wanting to see if the truncated list of DCs was specific to Active Directory Users and Computers or if other tools also failed to locate all the DCs we ran nltest.exe /dclist:contoso.com. The output shown below revealed a complete list of domain controllers for contoso.com but many were missing their [DS] Site: information. We found that those DCs missing their [DS] Site: information happened to be the same DCs missing when Connect to Domain Controller was selected. One final observation was that the list of available DCs varied from one DC to the next when selecting Connect to Domain Controller.

The two DCs in the Los Angeles site saw themselves in the list of available DCs but not the other DC in the same site. Suspecting Active Directory (AD) replication might be at fault we ran Repadmin /showrepl * /csv > Showrepl.csv and found AD replication was free of errors forest wide.

Checking for Database Inconsistencies:

Since AD Replication was not at fault our focus switched to AD database inconsistencies. We focused on three primary objects which house all of the metadata needed for DC discovery:

The Distinguished Name (DN) of the DC’s object in the domain partition

Using LDP.EXE we connected to both DCs in the Los Angeles site and gathered dumps of all three objects for both DCs and compared the output. For those who tend to avoid this tool, see MSKB 252335 on how to Bind and Connect but make certain to select CN=Configuration,DC=forestrootdomain from the Base DN: drop-down. Expand the configuration partition on the left until you locate the server object of the DC that was restored.

This LDP dump of LADC01’s Server object in the Configuration partition was taken while bound to LADC01 (notice the DC name in blue title bar indicating which DC we’re bound to). Looking at the third attribute from the bottom we find the serverReference forward link attribute in the list of attributes. This attribute contains the DN path of the corresponding DC object in the DC=contoso,DC=com partition. Below is an LDP dump of LADC01’s Server object while bound to LADC02. Notice the serverReference forward link attribute is missing which indicates it is not populated on this DC’s copy of the AD Database.

When we examined the LDP dumps of LADC02’s Server object we found the same was true. LADC02 had a DN for its own DC object but the LDP dump of LADC02’s Server object taken while bound to LADC01 had an empty serverReference attribute. Finally, those DCs which always appear in the list of domain controllers had a populated serverReference attribute on all DCs.

To determine how widespread this issue was we queried the serverReference attribute for both Los Angeles DCs from every DC in the forest using the repadmin /showattr command below. DCs that returned a serverReference attribute had the DC object DN and those DCs that had no serverReference attribute were empty:

We connected to the configuration partition on LADC01 using adsiedit.msc and manually added the “CN=LADC02,OU=Domain Controllers,DC=contoso,DC=com” DN to the serverReference attribute on LADC02. This change made LADC02 appear in the list of available DCs when the Connect to Domain Controller option was selected in Active Directory Users and Computers. Also, nltest.exe /dclist:contoso.com now showed [DS] Site: LosAngeles next to LADC02.contoso.com on all DCs. Not shown here, but once the DN of the DC’s object in the contoso.com domain was added to the serverReference attribute, the serverReferenceBL back-link attribute was automatically populated on the DC object in the contoso.com domain.

Determining how this occurred:

Now that we understood why the DC list was incomplete we started looking for how this occurred. To do this we gathered replication metadata from these three objects for both LADC01 and LADC02. The command used to gather the metadata from LADC01 for LADC02’s server object in the configuration partition is:

Comparing the objects dumped from LADC01 and LADC02 we found the Ver (version) numbers matched. It wasn’t until we looked at metadata of the DC object in the domain partition and compared it with the corresponding Server object in the configuration partition that we understood what occurred.

Here is a showobjmeta dump of CN=LADC02,CN=Servers,CN=LosAngeles,CN=Sites,CN=Configuration,DC=contoso,DC=com from LADC01:

The Version number of the attributes on LADC01’s DC object in the domain partition have a USN that is 100,000 higher than the DC’s corresponding Server object in the configuration partition. This strongly suggests the DC object in the Domain Controllers OU was authoritatively restored with the default version increase of 100,000 while the DC’s corresponding Server object in the configuration partition was not authoritatively restored. The customer then remembered accidentally deleting several of the DCs a while back and performing an authoritative restore on the entire Domain Controllers OU.

Understanding the inconsistencies:

Now that we knew an authoritative restore of the domain controllers OU was performed we needed to determine why the serverReference and serverReferenceBL attributes for restored DCs were missing and different across all DCs.

Anyone who has performed authoritative restores of users and groups will recall an issue where group membership is not correct on replica DCs after users and groups are authoritatively restored; this is discussed at length in KB280079. In the case of restored users and groups, when a user is deleted their membership from the remaining group is removed. If the user is then restored, but the group is not, the membership will not be restored on any DC except the DC where the restore took place. For those wondering what this has to do with DCs being restored, it is identical. DCs are security principals just like users, and the DC’s server object in the configuration partition behaves much like a group. If the DC object is deleted from the domain partition the serverReference attribute containing the forward link will be NULL’ed out on the server object in the configuration partition. If just the DC object in the domain partition is restored the serverReference attribute on the corresponding server object in the configuration partition will not be updated on replica DCs once the restored DC object inbound replicates to them.

Avoiding this issue:

Since the release of Windows Server 2003 Service Pack 1 ntdsutil.exe has automatically created LDF files for all partitions in the forest where restored objects have back-links. This is discussed further in MSKB 840001. In the case of user accounts you ensure all users have the correct group membership on all DCs by allowing the restored user accounts to replicate to all DCs/GCs. Once all DCs have the restored account you use ldifde -i -f <AR*.ldf> and import the user’s group membership against to the recovery DC. Doing this ensures the user’s DN is added to the member attribute on the group and the version of the member attribute is bumped higher causing it to replicate to all DCs. Since all DCs have a copy of the restored user account in their local database the DN on the member attribute is retained. As a rule of thumb, if you are authoritatively restoring users, computers or groups you should always import the LDF files created by ntdsutil.exe and avoid issues like this.

Or even better, deploy Windows Server 2008 R2 and enable the AD Recycle Bin – it automatically handles back links and forward links.

Hi all Rob Greene here again. I thought I would share with you how to do some common tasks with a Windows Server 2008 clustered Certification Authority (CA). When the CA is clustered there are definitely different steps that need to be taken when you:

Make a change to the behavior of the CA by using certutil.exe with –setreg or –delreg switches.

Modify the registry values in the HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\CertSvc hive.

Renew the CA’s certificate.

In the past before the Certification Authority service (CertSvc) was supported in a cluster, you could make these changes and then stop and start the CertSvc service without a problem. This is still the case when the Certification Authority has not been clustered.

However, when you have the Certification Authority configured as a cluster you must avoid starting and stopping the service outside of the Cluster Administrator snap-in (Cluadmin.msc). The reason is that the Cluster Service not only keeps track of the service state for CertSvc, it is also responsible for making sure that the registry key location HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\CertSvc is saved to the quorum disk when it notices a change to the registry location. This is noted in the CA Cluster whitepaper also, which is required reading for anyone clustering CA’s.

Changing the behavior of the Certification Authority

If you need to make a change to the behavior of the Certification Authority with CertUtil.exe or direct registry modification, you must always follow the steps below:

1. Logon to the active clustered Certification Authority node. If you are unsure which node currently owns the resource do the following:

a. Launch the failover Cluster Management MMC.

b. Select the Certification Authority resource, and in the right hand pane you will see the “Current Owner” (See figure 1).

4. Take the Certification Authority resource (Service) offline and then bring it back online. We have to take the resource offline and back online since the CertSvc service will not read any registry key changes without being restarted, and as I stated above when the CA is clustered you should refrain from stopping and starting the CertSvc service directly.

a. Right click on the Certification Authority resource in the tree view pane.

b. Select either “Bring this service or application online” or “Take this service or application offline” (See figure 2).

Figure 2 - Taking the resource offline / online

Renewing the subordinate Certification Authority certificate

This section discusses the steps that need to be done when renewing a subordinate CA certificate. A Root certification authority shouldn’t be clustered and instead should be configured as an offline root.

Verify the request file location

When the CA certificate is renewed it will stop the CertSvc service to generate the certificate request file. The request file location and name is dictated by the following registry key:

Before renewing the CA you will want to make sure that the registry key points to a valid file system path. If it is not, either the renewal will fail silently, or you might get the error “The system cannot find the file specified” when you attempt to renew the CA. If you have to change this value do the following on the active CA node:

Renewing the Certification Authority certificate

As noted earlier, if the CertSvc service is stopped or started outside of the Failover Cluster Management snap-in the cluster system is not aware of any changes that are done to the registry. Here is a high level process of what happens when a CA is renewed so that you can understand why the below steps are necessary on a clustered CA:

1. CertSvc service is stopped to generate the certificate request file. It reads the RequestFileName registry value to determine where and what the file name should be for the request file.

2. CertSvc service is started once the request file has been generated.

3. CertSvc service is stopped again to install the issued certificate from the CA.

4. The CACertHash registry value is updated to include the new CA certificate hash. NOTE: NEVER DELETE OR MODIFY this registry value unless directed by Microsoft support. Modifying this registry key can cause the CA not to function properly or in some cases to not even start!

Here are the actual steps to renew the CA on a cluster.

1. Open the Failover Cluster Management snap-in.

2. “Pause” the inactive Certification Authority node. If you are unsure about which server is the active node see Figure 1.

3. Once the inactive node is paused you can renew the CA’s certificate. Please review the following TechNet article to help with the process of actually getting the subordinate CA certificate renewed.

4. Once you have gotten the CA’s certificate renewed by the root CA, and installed the new certificate to the subordinate CA you will need to take the Certification Authority resource offline and then back online within the Failover Cluster Management snapin.

a. Right click on the Certification Authority resource in the tree view pane.

b. Select either “Bring this service or application online” or “Take this service or application offline” (See figure 2 above).

5. Open the Certification Authority snapin, and target the Clustered Network resource name.

6. Right click on the Certification Authority name and select properties.

7. If you renewed with a new key pair you should see several certificates listed as show in figure 4.

Figure 4 - Certification Authority properties.

8. Once you have verified that the Certification Authority is using the renewed CA certificate you can “Resume” the node that was paused in step 2.

Since the Certification Authority service is configured as a generic service the above processes must be adhered to when managing a clustered CA. If changes are made outside of the Cluster service’s knowledge then the nodes will never be in sync and clustering will fail