ESXi and Likewise – troubleshooting guide – part 2

In the last part of this small series, we discussed the theoretical background of the components and technology involved in adding an ESXi host to a Windows AD environment. Now it is time to describe the troubleshooting options and some real-life problems with their solutions.

Let’s start by dividing all ESXi/Likewise issues into categories:

Domain Join Failures

Here are the most common reasons that an attempt to join a domain fails:

The user name or password of the account used to join the domain is incorrect.

The name of the domain is mistyped.

The name of the OU is mistyped.

The local hostname is invalid.

The domain controller is unreachable from the client because of a firewall or because the NTP service is not running on the domain controller.

Make sure the nameserver entry in /etc/resolv.conf contains the IP address of a DNS server that can resolve the name of the domain you are trying to join.
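As a rough illustration of the resolv.conf check, here is a minimal Python sketch that pulls the nameserver entries out of the file contents so they can be compared with the DNS servers that know the AD domain. The function name and the sample addresses are made up for this example and are not part of Likewise or ESXi.

```python
def nameservers(resolv_conf_text):
    """Return the addresses found on 'nameserver' lines, in file order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        # Ignore comments, which resolv.conf marks with '#' or ';'.
        line = line.split("#", 1)[0].split(";", 1)[0].strip()
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


# Sample file contents; the addresses are placeholders, not real servers.
sample = """\
# sample resolv.conf
search example.local
nameserver 192.0.2.10
nameserver 192.0.2.11
"""

print(nameservers(sample))
```

To check a real host, pass in the contents of the file instead of the sample, for example `nameservers(open("/etc/resolv.conf").read())`.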

Make Sure nsswitch.conf Is Configured to Check DNS for Host Names

The /etc/nsswitch.conf file must contain the following line:

hosts: files dns
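The same check can be sketched in Python: read the hosts line and confirm that dns is among the lookup sources. This is only an illustration; the function names are invented for this example.

```python
def hosts_sources(nsswitch_text):
    """Return the lookup sources listed on the 'hosts:' line, or [] if absent."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if line.startswith("hosts:"):
            return line[len("hosts:"):].split()
    return []


def dns_enabled_for_hosts(nsswitch_text):
    """True when host name lookups are allowed to fall through to DNS."""
    return "dns" in hosts_sources(nsswitch_text)


print(dns_enabled_for_hosts("hosts: files dns\n"))  # True
print(dns_enabled_for_hosts("hosts: files\n"))      # False
```

For a real host, feed it the file contents, for example `dns_enabled_for_hosts(open("/etc/nsswitch.conf").read())`.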

Ensure that DNS Queries Are Not Using the Wrong Network Interface Card

If the ESX host is multi-homed, the DNS queries might be going out the wrong network interface card. Temporarily disable all the NICs except for the card on the same subnet as your domain controller or DNS server and then test DNS lookups to the AD domain. If this works, re-enable all the NICs and edit the local or network routing tables so that the AD domain controllers are accessible from the host.

Determine Whether the DNS Server Is Configured to Return SRV Records

Your DNS server must be set to return SRV records so the domain controller can be located. It is common for non-Windows (BIND) DNS servers not to be configured to return SRV records.

Diagnose by executing the following command:

nslookup -q=srv _ldap._tcp.ADdomainToJoin.com
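If you want to script this lookup, a small Python wrapper can build the SRV name without accidental spaces and call nslookup for you. This is a sketch under assumptions: the function names are invented here, example.local is a placeholder domain, and nslookup must be available on the machine that runs the script.

```python
import subprocess


def srv_query_name(domain):
    """Build the SRV name for a domain's LDAP service."""
    # Remove stray whitespace and dots so no gap ends up inside the name.
    return "_ldap._tcp." + domain.strip().strip(".")


def nslookup_command(domain):
    """Return the nslookup argument list for the SRV query shown above."""
    return ["nslookup", "-q=srv", srv_query_name(domain)]


def run_srv_lookup(domain, timeout=10):
    """Run nslookup and return its raw text output (needs nslookup on PATH)."""
    result = subprocess.run(
        nslookup_command(domain),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        timeout=timeout,
    )
    return result.stdout


# Only prints the command; call run_srv_lookup() to actually query DNS.
print(" ".join(nslookup_command(" example.local. ")))
```

The output of `run_srv_lookup()` is returned unparsed on purpose, because the nslookup output layout differs between platforms.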

Make Sure that the Global Catalog Is Accessible

The global catalog for Active Directory must be accessible. Diagnose by executing the following command:

nslookup -q=srv _ldap._tcp.gc._msdcs.ADrootDomain.com

From the list of IP addresses in the results, choose one or more addresses and test whether they are accessible on Port 3268 by using telnet.
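Where telnet is not available, the port test can be approximated with a plain TCP connect. The sketch below uses only the Python standard library; the function name is made up for this example, and a successful connect only shows that the port is reachable, not that the global catalog answers queries correctly.

```python
import socket

GC_PORT = 3268  # global catalog port mentioned above


def port_open(host, port=GC_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Call it once per address returned by the SRV lookup, for example `port_open("192.0.2.10")`, where the address is a placeholder for one of your global catalog servers.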

Verify that the Client Can Connect to the Domain on Port 123

The Windows Time service must be running on the domain controller.

On a Linux computer, run the following command as root:

ntpdate -d -u DC_hostname
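If ntpdate is not at hand, a single SNTP request over UDP port 123 gives a similar answer. The following Python sketch sends a minimal client packet and reads the server's transmit timestamp. It is an illustration only: the function names are invented here, it assumes the DC is reachable over IPv4, and it does not do the delay and offset calculations that ntpdate performs.

```python
import socket
import struct

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def build_request():
    """Return a 48-byte SNTP client request (LI=0, version 3, mode 3)."""
    return b"\x1b" + 47 * b"\x00"


def parse_reply(packet):
    """Return (stratum, server time as Unix seconds) from an NTP reply."""
    if len(packet) < 48:
        raise ValueError("NTP reply is shorter than 48 bytes")
    stratum = packet[1]
    # The transmit timestamp occupies bytes 40-47: seconds, then fraction.
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return stratum, seconds - NTP_EPOCH_OFFSET + fraction / 2 ** 32


def query_time(server, timeout=3.0):
    """Ask the server for the time; raises socket.timeout if nothing answers."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_request(), (server, 123))
        packet, _ = sock.recvfrom(512)
    return parse_reply(packet)
```

A timeout from `query_time("DC_hostname")` points to a firewall blocking UDP 123 or to a time service that is not running on the DC.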

Log-in/Authentication issues

Make Sure You Are Joined to the Domain

Check ‘lw-lsa get-status’

Clear the Cache

Clear the cache to ensure that the client computer recognizes the user’s ID.

# ad-cache --delete-all

Clear the Likewise Kerberos cache to make sure there is not an issue. Execute the following command at the shell prompt with the user account that you are troubleshooting:

~ # kdestroy

Check the Status of the Likewise Authentication Daemon

# /etc/init.d/lsassd status

Check Communication between the Likewise Daemon and AD

Verify that you can ping the DC from the ESXi host.

Make Sure the AD Authentication Provider Is Running

# lw-lsa get-status

If the result does not include the AD authentication provider, or indicates that it is offline, restart the authentication daemon.

Check whether you can log on with SSH by executing the following command:

ssh DOMAIN\\username@localhost

Lsassd crashes for various reasons, for example during trust enumeration

Analyze the lsassd, netlogond and lwiod logs to see exactly where the Likewise daemon is crashing.

Look into the hostd logs and tcpdump output to get more information.

Kerberos related issues

Start by looking at packet captures (on both sides, ESXi and AD) to see whether a proper TGT and TGS are being issued.

The problem can also be related to the Kerberos cache. In that case, empty the Kerberos cache using the ‘kdestroy’ command mentioned earlier.

Hostd crash in Likewise code

Gather a full log bundle and engage VMware GSS.

Windows AD server related issues

Gather guest OS logs and engage MS Support.

OK, so now we have all the troubleshooting options and methodology in one place. It is time for a real-life story based on one of my recent service requests: the customer is unable to log in using Active Directory credentials. The login fails with invalid credentials even though “Authentication Services” shows that the host is joined to the correct domain. The issue is seen on most of the hosts within the environment. Only 2 hosts do not suffer from the problem, and we could not find any difference in their configuration. The customer is running the latest 6.0 build: 4600944.

Some other symptoms observed while troubleshooting the issue step by step:

We found that IPv6 communication was disabled on the problematic ESXi hosts, while the DC was still using IPv6. After a couple of tests we confirmed that after enabling IPv6 on ESXi, or disabling it completely on the DC side, there was finally no error with adding a host to the domain or with DC authentication.

To clarify the whole situation, we decided to investigate further with VMware Support. GSS confirmed that they had located the issue:

“…with the newer versions (vSphere 6) of ESXi in case it receives kdc in IPv6 format. In that situation the host will try to connect with IPv6. In case host has IPv6 disabled it will fail to join the domain “
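To spot this mismatch before opening a support request, you can check which address families the DC name resolves to and compare that with the IPv6 setting of the host. The Python sketch below only mirrors the situation described here; it does not reproduce what Likewise does internally. All function names are invented for this example and the addresses are documentation placeholders.

```python
import ipaddress
import socket


def split_by_family(addresses):
    """Split textual IP addresses into (ipv4_list, ipv6_list)."""
    v4, v6 = [], []
    for text in addresses:
        # Strip an IPv6 zone suffix such as '%vmk0' before parsing.
        ip = ipaddress.ip_address(text.split("%", 1)[0])
        (v6 if ip.version == 6 else v4).append(text)
    return v4, v6


def resolve_all(hostname):
    """Return every address the local resolver reports for hostname."""
    infos = socket.getaddrinfo(hostname, None)
    return sorted({info[4][0] for info in infos})


def ipv6_mismatch(dc_addresses, host_ipv6_enabled):
    """True when the DC is published over IPv6 but the host has IPv6 disabled."""
    _, v6 = split_by_family(dc_addresses)
    return bool(v6) and not host_ipv6_enabled


dc_addresses = ["192.0.2.10", "2001:db8::10"]  # placeholder addresses
print(ipv6_mismatch(dc_addresses, host_ipv6_enabled=False))  # True
```

In practice you would replace the placeholder list with `resolve_all("DC_hostname")`, run from a machine that uses the same DNS servers as the ESXi host.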
