Understanding and Detecting Secure Channel Problems

The Microsoft domain infrastructure has several complicated dependencies. Active Directory (AD), for example, relies on a consistent, working schema for the objects and attributes in its database, requires network connectivity to peer domain controllers (DCs) so that updates replicate promptly, and needs a correct DNS configuration, among other environmental dependencies.

Every computer that’s joined to a domain—whether it’s a workstation client, a server, or a DC—requires connectivity to DCs in order to fulfill some of the service requirements that AD domains require. For workstations and servers, that connectivity is to the DCs in the domain that they’re a member of, as well as trusting domains’ DCs. DCs in one domain need connectivity to DCs in other trusting and trusted domains. The name describing the cached values in that inter-domain connectivity is the “domain secure channel.” To be clear, there are two kinds of secure channels: secure channels from a domain member to a DC in its domain, and trust secure channels between a trusted and trusting DC.

Why Do Secure Channels Matter?

Why does someone in support care about the health of the secure channel? The reason is that all domain-related services rely on the secure channel to a greater or lesser extent. Can’t get Group Policy? Check the secure channel. Can’t access a network resource? Check the secure channel. Can’t log on to the domain? Check the secure channel. Of course, there are other things that can cause these same problems, but few are more difficult to diagnose—or more common—than a problematic secure channel.

What do these computers do with these secure channels? The flippant answer is, “Anything domain-related, of course!” All domain-related services rely on the ability to locate a DC to send the service request to. This is true for a domain member (e.g., a workstation, a member server) as well as a DC. The availability of a responsive DC is really all that a secure channel is. If a server can’t be contacted to send the requests to, the services fail.

For example, a user who connects to a SharePoint site that’s configured to use Kerberos needs to request a Kerberos ticket to pass to that SharePoint server for authentication. The user’s computer looks to the cached secure channel information for that domain (a cache that’s maintained by the NetLogon service) to find which DC to send the Kerberos ticket request to. If that DC is unresponsive for whatever reason, the ticket request won’t occur, and the SharePoint connection won’t be authenticated using Kerberos. Depending on the SharePoint configuration, this might result in a lack of access to that site, all because of a secure channel–related problem.
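To see which DC a computer currently has cached for a domain, you can ask NLTest. The following is a minimal sketch rather than a full diagnostic; contoso.com is a placeholder domain name, not one taken from this article.

```powershell
# Placeholder domain name; substitute the domain you are checking.
$domain = 'contoso.com'

# Report the state of the secure channel as of the last time it was used,
# including the DC that NetLogon has cached for this domain.
nltest "/sc_query:$domain"

# Ask the DC locator which DC it would currently return for the domain.
nltest "/dsgetdc:$domain"
```

If the DC named by the first command is the one that is unresponsive, that points to the kind of failed ticket request described above.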

Let’s walk through a typical multi-domain scenario. Suppose User A from Domain A has logged on to Computer B in Domain B. A logon Group Policy for the user is processed, and a Domain A DC is queried via LDAP to determine which Group Policies are applicable to User A. How does Computer B, which is a member of Domain B, know where to send the network traffic in order to find out which Domain A policies to process? It’s able to do that because the network location information for that domain and a DC in it are kept up-to-date constantly. That information is kept up-to-date by the NetLogon service on every domain-joined Windows computer. The NetLogon service is constantly maintaining its list of available DCs and domains (when trusts are present). Below is a NetLogon debug log snippet that shows some of that ongoing routine. Of course, you can view your own NetLogon debug log information on your computers by following the steps in the Microsoft article “Enabling debug logging for the NetLogon service.”
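As a rough sketch of what enabling that logging looks like, the commands below use the debug flag value given in the Microsoft article and the default log location. Run them from an elevated prompt. On older Windows versions, the NetLogon service might need a restart before the change takes effect.

```powershell
# Turn on verbose NetLogon debug logging.
nltest /dbflag:0x2080ffff

# Show the most recent log entries; -Tail needs PowerShell 3.0 or later.
Get-Content -Path "$env:windir\debug\netlogon.log" -Tail 20

# Turn debug logging back off when you have finished.
nltest /dbflag:0x0
```

The log grows quickly on a busy computer, so it is usually worth turning the flag off once you have captured the behavior you are interested in.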

At a high level, the causes of a secure channel problem can be boiled down to network connectivity problems. If the connectivity problems are intermittent, then whenever the network is working, all services work as well. If the connectivity problems persist, they are likely to cause a broken secure channel. A broken secure channel means that the shared secret (the computer account password) held by the computer no longer matches the one AD holds for it, and as a consequence the computer is no longer trusted. The net effect in that situation is that no one can log on to the domain from that computer or gain access to domain resources.

On a client computer or member server, a broken secure channel is bad because it might affect that computer’s authentication to network services and any other services it provides. On a DC, it could prevent AD replication and cause unexpected logon and access problems if left untreated.

Identifying a Secure Channel Problem

The best way to identify whether a computer is having a secure channel problem is to do something that ultimately calls the I_NetLogonControl2 function. I_NetLogonControl2 is one of the functions used in the NetLogon service (which is present on any Windows computer in any OS version) to keep track of which domains and which DCs are accessible.

An administrator or support person has three easy ways to call that function and get a quick “thumbs up” or “thumbs down” about connectivity to a specific domain and DC: NLTest, PowerShell, and WMI.

NLTest.exe. NLTest.exe shipped in the Windows 2000 and Windows Server 2003 Support Tools and is included by default in most later Windows OSs. NLTest.exe’s /sc_verify switch calls I_NetLogonControl2, and you simply need to supply the domain you’re concerned about.

In those cases where the secure channel problem can’t heal itself, that is, when the computer’s shared secret differs from what AD has for that computer, the NLTest.exe /sc_reset switch can repair the problem.
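A minimal sketch of both switches follows. The domain name contoso.com and the computer name SERVER01 are placeholders, not values from this article.

```powershell
# Placeholder domain name; substitute the domain you are checking.
$domain = 'contoso.com'

# Verify the secure channel from the local computer to that domain.
nltest "/sc_verify:$domain"

# Run the same check against a remote computer (placeholder name).
nltest /server:SERVER01 "/sc_verify:$domain"

# If verification fails and does not recover, reset the secure channel.
nltest "/sc_reset:$domain"
```

Run these from an elevated prompt. A reset changes state on the computer, so it makes sense to try /sc_verify first.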

PowerShell. The Test-ComputerSecureChannel PowerShell cmdlet was added in PowerShell 2.0. This cmdlet also calls I_NetLogonControl2 but provides less detail in its test: it simply returns a Boolean response of True if the domain secure channel is healthy and a DC is reachable, and False if not.

PS C:\> Test-ComputerSecureChannel
True

Similar to NLTest.exe, Test-ComputerSecureChannel can also fix the problem when you add its -Repair switch.
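One way to combine the test and the repair is sketched below. It assumes you run it from an elevated Windows PowerShell session on the affected computer and can supply a domain account that is allowed to reset the computer account password.

```powershell
if (Test-ComputerSecureChannel) {
    Write-Output 'Secure channel is healthy.'
}
else {
    # Prompt for a domain account with rights to reset the computer account.
    $cred = Get-Credential

    # -Repair attempts to re-establish the secure channel and returns
    # True or False to indicate whether it succeeded.
    Test-ComputerSecureChannel -Repair -Credential $cred
}
```

Adding the common -Verbose parameter to either call makes the cmdlet print a short message about what it found in addition to the Boolean result.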

WMI. By using the Win32_NTDomain class, Windows Management Instrumentation (WMI) can query all the domains that the computer knows of. WMI can be useful in situations in which you can’t rely on PowerShell being installed on the computer you’re testing. Note that the following sample (where Win32_NTDomain is called via PowerShell’s Get-WmiObject cmdlet using its alias gwmi) shows only the local domain but would actually return every domain that the local domain has trusts with.
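A query along those lines might look like the following. This is an illustrative sketch rather than the original sample, and the property selection is simply one reasonable choice.

```powershell
# Query every domain NetLogon knows about and show the cached DC and status.
Get-WmiObject -Class Win32_NTDomain |
    Select-Object DomainName, DomainControllerName, Status
```

The alias form, gwmi Win32_NTDomain, returns the same objects with all of their properties. Get-WmiObject also accepts a -ComputerName parameter, so the query can be pointed at a remote computer.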

Note that a Status of OK in that example corresponds to a True return from Test-ComputerSecureChannel and indicates that the secure channel is working; any other status corresponds to False.

Fixing a Secure Channel Problem

In the case of Microsoft Customer Service & Support’s Commercial Technical Support, we send an optional data-collection package to customers who call us for support. Within that package, we use WMI’s Win32_NTDomain class (called via PowerShell) rather than PowerShell’s native Test-ComputerSecureChannel cmdlet, because we want to be sure that the test will run even on older OSs such as Windows XP and Windows 2003. What specifically do we do in our test? Below are two script samples, using the same methods we use, that you can call via PowerShell on your own.

For the Microsoft diagnostic case, we also make it into a simple function so that we can reuse it.
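A reusable function in that spirit could look like the sketch below. The function name Test-SecureChannelWmi and the Healthy property are hypothetical names chosen for illustration, not the ones used in the Microsoft diagnostic package.

```powershell
# Hypothetical helper: reports the secure channel status of each domain
# that a computer knows about, using the Win32_NTDomain WMI class.
function Test-SecureChannelWmi {
    param(
        [string]$ComputerName = $env:COMPUTERNAME
    )

    # Entries without a DomainName (such as an entry for the local computer
    # itself, if one is returned) are skipped. Healthy is True only when
    # WMI reports a Status of OK.
    Get-WmiObject -Class Win32_NTDomain -ComputerName $ComputerName |
        Where-Object { $_.DomainName } |
        Select-Object DomainName, DomainControllerName, Status,
            @{ Name = 'Healthy'; Expression = { $_.Status -eq 'OK' } }
}

# Example usage: list only the domains that do not report OK.
Test-SecureChannelWmi | Where-Object { -not $_.Healthy }
```

The sketch sticks to cmdlets and syntax available in PowerShell 2.0 so that it can run on older systems. It reports only what WMI returns, so it is worth confirming a suspect computer with NLTest as well.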

Detecting secure channel problems in an enterprise environment is the tough part. Fixing them can be much easier. Hopefully, this article gives you some tools to find those problems easily when they’re happening in your environment.

Discuss this Article 1

We have a test environment consisting of Win2k3 servers and Win XP workstations. It appears that someone changed the time on a DC for testing resulting in some weird replication behaviour. Ultimately we decommissioned and rebuilt one of the DCs, however this has the side effect where some computers changed their account passwords on the decommissioned DC and these were not replicated to the other DC. Consequently, when the DC was decommissioned these can no longer authenticate.

So I want to scan the network, connecting to each machine using the local admin account using VBScript to get the status of the secure channel.

And this is where your blog comes in. I can connect to the computers, however I have found that all of them return the win32_ntdomain status as OK.

I've created a virtual environment and set about creating a VM where the password is incorrect and the status still returns OK even though nltest and netdom report access denied. Any ideas?