I recently ran into an issue that was difficult to resolve. I couldn’t find the particular cause documented anywhere else on the web, so I thought I’d record it here.

The issue was with a service account that connected to SQL Server and intermittently failed to log on.

Errors reported in the Windows Application Event log were:
SSPI handshake failed with error code 0x8009030c
Login failed. The login is from an untrusted domain and cannot be used with Windows authentication.
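For reference, the hex value in the first message is a Windows SSPI status code. The sketch below is a small lookup for a few such codes as I understand them from the Windows headers; the function name `describe_sspi_error` is made up for illustration, and the table is deliberately incomplete.

```python
# A few SSPI status codes and their meanings (as defined in winerror.h).
# This table is illustrative only and far from complete.
SSPI_ERRORS = {
    0x8009030C: ("SEC_E_LOGON_DENIED", "The logon attempt failed"),
    0x8009030E: ("SEC_E_NO_CREDENTIALS",
                 "No credentials are available in the security package"),
    0x80090311: ("SEC_E_NO_AUTHENTICATING_AUTHORITY",
                 "No authority could be contacted for authentication"),
}


def describe_sspi_error(code_text):
    """Translate a hex code such as '0x8009030c' into a readable name."""
    code = int(code_text, 16)  # base 16 accepts the '0x' prefix
    name, meaning = SSPI_ERRORS.get(code, ("UNKNOWN", "not in this small table"))
    return f"{name}: {meaning}"


print(describe_sspi_error("0x8009030c"))
```

Note that the code only says the logon was denied. It does not say why, so a locked-out account looks the same as a bad password at this level.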

The login attempt didn’t appear to get as far as the SQL instance, so no further information could be captured in a failed Logins trace.

This was affecting a large number of application servers that used the same service account. Fortunately it was limited to the development and test environments, so there was no production impact.

The problem was that the account was getting locked out. A service was running every half hour and using the account to connect to SQL Server, but with the wrong password. We also had a process running to unlock locked service accounts, so the account would start working again after a few minutes.

The resolution was to stop that service for good, as it was no longer required. We were able to identify where the failed logins were coming from via the Active Directory audit logs for the account in question.
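As a rough sketch of that last step: if account-lockout events (event ID 4740 in the domain controller Security log) have been exported to a list of records, the machines causing the lockouts can be tallied as below. The field names `event_id`, `target_account` and `caller_computer` are assumptions about how the export was shaped, not a real export format, and the sample rows are invented.

```python
from collections import Counter

LOCKOUT_EVENT_ID = 4740  # "A user account was locked out"


def lockout_sources(events, account):
    """Count lockout events for one account, grouped by calling computer.

    `events` is an iterable of dicts with the hypothetical keys
    'event_id', 'target_account' and 'caller_computer'.
    Returns (computer, count) pairs, most frequent first.
    """
    counts = Counter()
    for event in events:
        if int(event["event_id"]) != LOCKOUT_EVENT_ID:
            continue
        if event["target_account"].lower() != account.lower():
            continue
        counts[event["caller_computer"].upper()] += 1
    return counts.most_common()


# Invented sample data, for illustration only.
sample = [
    {"event_id": "4740", "target_account": "svc_app", "caller_computer": "devbox07"},
    {"event_id": "4740", "target_account": "SVC_APP", "caller_computer": "DEVBOX07"},
    {"event_id": "4740", "target_account": "svc_other", "caller_computer": "TEST01"},
    {"event_id": "4624", "target_account": "svc_app", "caller_computer": "APP01"},
]
print(lockout_sources(sample, "svc_app"))
```

In our case a tally like this would have pointed at a single machine, the one running the half-hourly service with the stale password.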

This was particularly difficult to troubleshoot because the error was misleading: it talks about an untrusted domain and says nothing about the account being locked out.

“We also had a process running to unlock locked service accounts – so the account would start working again after a few minutes.” This is the worst kind of band-aid. Automated treatment of symptoms instead of correctly addressing the root cause of issues as they arise just makes for bigger problems down the road.

Agreed, I say “we” in the loosest sense. It was in development though – I guess they were trying to stop one developer’s mistakes from impacting everyone – though in the end that’s exactly what happened.

We’re getting the same issue, but we’re unable to see the actual login being used in either the Windows Event Viewer or the SQL Server log. I can tell which server it’s coming from since the IP address is in the error, but that hasn’t been helpful yet. Do you have an approach for actually retrieving the login attempting the connection?

You may be able to find it in the Active Directory logs if your issue has a similar cause to the one we experienced. Another possible cause, though, is a timeout when connecting to AD, in which case you could be stuck (I’ve seen this before when the network was getting swamped).