ESXi and Likewise – troubleshooting guide – part 1

Last week I had to troubleshoot strange issue related to Active Directory integration with ESXI (6.0 version), this was motivation to prepare small (two articles) series about ESXi / Likewise integration and troubleshooting based on my latest experience.

VMware use Powerbroker Identity Services (Formerly known as Likewise) for adding ESX host to windows AD environment. To begin as usual is good to have some theoretical background about related components and technology and we describe it all in this part.

Below some of the basics of PAM and Kerberos:

PAM (Pluggable authentication module) – It’s a mechanism to integrate multiple low-level application schemes into a high-level APIs. All the application programs like hostd, dcui etc use PAM for creating users and authenticating them. They are referred to system-auth file which in turn refers to /etc/security/login.map. login.map maps ‘vpxa’ user to system-auth-local and all other users are mapped to system-auth-generic. PAM on its own can’t implement Kerberos ; It’s not possible for a PAM module to request a Kerberos service ticket (TGS) from a Kerberos key distribution center (KDC).
Kerberos – Kerberos protocol is designed to provide reliable authentication over open and insecure networks where communications between the hosts belonging to it may be intercepted. So we can say Kerberos is an authentication protocol for trusted hosts on untrusted networks.

If you interested in more deep knowledge on this topic, take a look at this Kerberos tutorial: http://www.kerberos.org/software/tutorial.html

Before we describe communication with ESXi lets gathers together all Likewise components:
• Lsassd – The Likewise authentication daemon handles authentication, authorization, caching, and idmap lookups,
• Netlogond – Detects the optimal domain controller and global catalog and caches the data,
• Lwiod – The Likewise input-output service. It communicates over SMB with SMB servers,
• Caches – To maintain the current state and to improve performance, the Likewise agent caches information in several files, all of which are in /etc/likewise/db/
• lsass-adcache.filedb – Cache managed by the AD authentication provider,
• netlogon-cache.filedb – Domain controller affinity cache, managed by netlogond,
• pstore.filedb – Repository storing the join state and machine password.

OK, now it’s time to consider how Likewise extends Kerberos authentication to ESXi:

1. User logs in to ESX (c# or web client),
2. Username and password are sent to PAM,
3. pam_lsass.so library communicates with the Lsassd,
4. from username and password Lsassd generates a secret key,
5. using the secret key Lsassd request a TGT, from AD’s KDC,
6. The KDC verifies the secret key and then grants the ESXi Host a TGT,
7. ESXi host and the KDC exchange messages to authenticate the client,
8. Lsassd can use the TGT request service tickets for other services such as ssh.

To clarify more lets discuss important algoritms (netlogon) related to this process to address common questions:

1. How Netlogond finds the best DC (prioritization) ?

Likewise Netlogon obtains a list of candidate Domain Controllers using DNS. The algorithm for doing this is based on the algorithm used in Windows Netlogon. Each candidate Domain Controller which matches the site criteria is queried with a CLDAP request for the Netlogon attribute. The time to respond for each Domain Controller is stored as PingTime in the DomainControllerInfo output parameter. The Domain Controller with the lowest PingTime is returned to the caller.

2. Prefered DC

Netlogon attempts to find the domain controller which responds the quickest to CLDAP pings with a preference for domain controllers in the same site. The algorithm is rather complex 😉
If the request includes a site, then the query order is:

a) Preferred domain controller plugin with the requested site
b) DNS with the requested site

Last part of this article will discuss important packets in tcpdump – very important in case of troubleshooting problems with joining ESXi to domain :
1. CLDAP – Usage of CLDAP packets depends upon the attribute:

If attribute = netlogon -> These CLDAP pings are used by netlogond to verify the aliveness of the domain controller and also check whether the domain controller matches a specific set of requirements. netlogon version(NtVer) etc.
If attribute = time -> These’re used for selecting the nearest DC by netlogon during domain controller discovery phase.
2. ARP – Address resolution protocol is used for resolution of network layer address into link layer address i.e. IP address to MAC address. These’re common packets not specific to AD.

3. DNS – DNS queries related to srv records. Microsoft decided to use SRV records as a key part of the procedure whereby a client finds a domain controller. So where do these records come from? They are registered with DNS by the NetLogon service of a domain controller when it starts. There are actually quite a few of these records, but for right now let’s just look at two of them, the ones that have to do with domain controllers. They are in the following formats:
_ldap._tcp.dc._msdcs.dnsdomainname _ldap._tcp.sitename._sites.dc._msdcs.dnsdomainname
4. KRB5 : For Krb5 you would see four packets viz. AS-REQ, AS-REP, TGS-REQ, TGS-REP:
• AS_REQ is the initial user authentication request (i.e. made with kinit) This message is directed to the KDC component known as Authentication Server (AS);
• AS_REP is the reply of the Authentication Server to the previous request. Basically it contains the TGT (encrypted using the TGS secret key) and the session key (encrypted using the secret key of the requesting user);
• TGS_REQ is the request from the client to the Ticket Granting Server (TGS) for a service ticket. This packet includes the TGT obtained from the previous message and an authenticator generated by the client and encrypted with the session key;
• TGS_REP is the reply of the Ticket Granting Server to the previous request. Located inside is the requested service ticket (encrypted with the secret key of the service) and a service session key generated by TGS and encrypted using the previous session key generated by the AS;

5. LDAP : A standards-based protocol that is used for communication between directory clients and a directory service. LDAP is the primary directory access protocol for Active Directory. LDAP searches are the most common LDAP operations that are performed against an Active Directory domain controller. An LDAP search retrieves information about all objects within a specific scope that have certain characteristics, for example, the telephone number of every person in a department.
6. NBNS – NBNS serves much the same purpose as DNS does: translate human-readable names to IP addresses

9. TCP- Network traffic who uses TCP as transmission protocol viz. Kerberos,ssh,https,ldap etc.
Processes transmit data by calling on the TCP and passing buffers of data as arguments. The TCP packages the data from these buffers into segments and calls on the internet module [e.g. IP] to transmit each segment to the destination TCP.

10. SMB – Server Message Block,also known as Common Internet File Systems(CIFS) operates as application layer network protocol mainly used for providing shared access to files,printers,serial port, and miscellaneous communications between nodes on a network. It also provides an authenticated inter-process communication mechanism.
Now we are prepared for troubleshooting ! Stay tuned for next part 🙂