New Product Walkthrough – Hybrid vSphere SSL Certificate Replacement

[Update 9/6/2017] For those who attended or are interested in the session entitled vSphere Certificate Management for Mere Mortals (SER2936BU at VMworld US 2017, or SER2936BE in Barcelona), I’ve posted my deck to vSphere Central.

The VMware Certificate Authority (VMCA) was first introduced in vSphere 6.0 to improve the lifecycle management of SSL Certificates. This post will explain a little bit about the VMCA and its capabilities while also making a recommendation on how to deploy certificates in your environment. Finally, a new click-by-click walkthrough has been created to serve as a guide as you are planning the certificate replacement process.

VMCA Overview

Over time, certificates within a vSphere environment have become much more important. Certificates ensure that communication between services, solutions, and users is secure and that systems are who we think they are. By default, VMCA acts as a root certificate authority. The certificates it issues chain to the VMCA, whose root certificate is self-signed because it is the end of the chain. These VMCA-signed certificates generate those thumbprint and browser security warnings you may be used to seeing because they are not trusted by the client computers by default.

The VMCA acts as a central point from which certificates can be deployed to a vSphere environment without having to manually create Certificate Signing Requests (CSRs) or manually install the certificates once they are minted. The VMCA, working in conjunction with its new purpose-built certificate store called the VMware Endpoint Certificate Store (VECS), has made managing certificates much easier than in prior vSphere releases.

As shown in the graphic below, the VMCA operates within the Platform Services Controller (PSC). Depending on the topology of your installation, you can choose to deploy a vCenter Server with an embedded PSC or utilize separate external PSCs. The VMCA then issues certificates to any vCenter Servers and associated ESXi hosts that are registered to it. Many of the certificates issued by the VMCA are for internal service-to-service communication within vCenter Server. These services, also called Solution Users, use the certificates to authenticate to one another. As vSphere Users and Administrators, we do not interact directly with these services and therefore these certificates are less impactful to our overall certificate strategy. Note that a vCenter Server has four Solution Users while a PSC has one. A vCenter Server with an embedded PSC has four Solution Users as well.

In vSphere 6.0 we also added a reverse proxy to vCenter Server so that when we do need to communicate with vCenter Server services, that communication is all done via port 443 and secured by the Machine SSL certificate of the vCenter Server. The Machine SSL certificate becomes the primary way in which users secure communications with vCenter Server and the PSC. Remember those annoying web browser certificate warnings when accessing the vSphere Web Client? Those are caused by an untrusted (and perhaps self-signed) Machine SSL certificate.
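To make the thumbprint warning concrete, here is a minimal Python sketch that fetches whatever certificate a server presents on port 443 and computes its SHA-1 thumbprint, the value a client typically asks you to confirm when the certificate is untrusted. The function names and the hostname in the usage comment are illustrative assumptions and do not come from any VMware tooling.

```python
import hashlib
import ssl


def sha1_thumbprint(der_bytes: bytes) -> str:
    """Return the SHA-1 thumbprint of DER-encoded bytes as colon-separated hex."""
    digest = hashlib.sha1(der_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def machine_ssl_thumbprint(host: str, port: int = 443) -> str:
    """Fetch the certificate presented on host:port and return its thumbprint.

    The certificate is retrieved without being validated, so this also works
    against an endpoint whose certificate the client does not yet trust.
    """
    pem = ssl.get_server_certificate((host, port))
    return sha1_thumbprint(ssl.PEM_cert_to_DER_cert(pem))


# Hypothetical usage (placeholder hostname, requires network access):
#   print(machine_ssl_thumbprint("vcenter.example.com"))
```

Comparing this value before and after replacing the Machine SSL certificate is one simple way to confirm that the endpoint is presenting the new certificate.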

The real value of the VMCA is in the automation of replacing and renewing certificates without having to manually generate CSRs, mint certificates, then manually install those certificates. If you’ve replaced certificates in a vSphere 5.x (or prior) environment then you know the challenges and time commitment involved in that process prior to the VMCA. The VMCA allows us to drastically reduce the overhead of the certificate lifecycle. I should note that use of the VMCA is not required. The VMCA can essentially be bypassed and custom certificates can be requested and installed for each of the different vSphere components; however, this comes with a higher operational cost. Additionally, it may introduce more opportunity for misconfiguration, which could lead to a lower standard of security. Tread wisely. Next, let’s take a look at some different operational models for the VMCA along with a recommendation on the best approach.

The Subordinate CA Approach

One of the operational models of the VMCA is to act as a Subordinate (or Intermediate) Certificate Authority. Initially, with the release of vSphere 6.0 and the VMCA, this was a rather attractive option for customers. As a sub CA to an already established Certificate Authority in an environment, the VMCA could issue certificates to vCenter Server and ESXi hosts that would be inherently trusted, easily getting rid of those pesky self-signed certificate errors. However, over time it became very apparent that the risk of this model outweighs the benefit. From a security perspective, with the VMCA as a Subordinate CA, a rogue administrator with full access to the PSC could mint fully valid certificates that are trusted all the way up to the organization’s Root CA. In talking with our customers, many of whom operate in a highly security conscious manner, this type of risk is a deal breaker for the Security teams in those organizations.

The Full Custom Approach

The Subordinate CA approach sounded like a great win for operational simplicity but its downfall was the security risk. On the other end of the spectrum we have the Full Custom approach where every certificate within the vSphere environment is replaced by a unique custom certificate minted by a Root CA. This approach is, in theory, the most secure but as previously mentioned, it introduces a lot more complexity and opportunity for misconfiguration, thereby impacting security negatively. It gains higher security at a high operational cost, which means generating a CSR for each vCenter Server and PSC VM, each Solution User, and each ESXi host. This could be hundreds or thousands of CSRs to generate and certificates to manage. Once that’s all done then you must worry about renewing all those certs or replacing revoked certificates. This is definitely a tradeoff in simplicity and time in order to gain more security.

The Rise of the Hybrid Approach

The question now becomes, “How can we take advantage of the Certificate Lifecycle benefits of the VMCA (and VECS), mitigate the risk of a subordinate CA, and reduce the overall time and effort it takes to manage all of this?” And thus, a hybrid model was born. A few short months after vSphere 6.0 was released, Mike Foley wrote about a new approach in a post titled, “Custom certificate on the outside, VMware CA (VMCA) on the inside – Replacing vCenter 6.0’s SSL Certificate.” With this “hybrid” approach, custom certificates are used for the Machine SSL certificates of the Platform Services Controller and vCenter Server VMs and then the VMCA is left to manage the Solution Users and ESXi host certificates.

This method of certificate lifecycle management does not use the VMCA as a subordinate CA. It lets the VMCA function as an independent CA and issue the internal Solution User and ESXi host certificates. Meanwhile, custom certificates from an external CA will adhere to the controls of the Enterprise PKI policies. Put these two pieces together and this hybrid approach reduces the work of certificate lifecycle management for Operations while increasing security with the custom certificates. This model even meets strict auditing standards such as those of the IRS.
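The difference in trust between the Subordinate CA and Hybrid models can be illustrated with a toy Python model. This is only a sketch of the reasoning: it tracks issuer relationships by name and does no real cryptography, and every certificate name in it (including the rogue one) is a made-up placeholder.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str  # equal to subject for a self-signed root


def build(certs: Iterable[Cert]) -> Dict[str, Cert]:
    """Index certificates by subject name."""
    return {c.subject: c for c in certs}


def chains_to_trusted_root(subject: str, certs: Dict[str, Cert],
                           trusted_roots: Set[str]) -> bool:
    """Follow issuer links from subject; True if a trusted root is reached."""
    seen: Set[str] = set()
    current = subject
    while current not in seen:
        if current in trusted_roots:
            return True
        if current not in certs:
            return False
        seen.add(current)
        current = certs[current].issuer
    return False  # looped back to an untrusted self-signed root


ENTERPRISE_ROOT = "Enterprise Root CA"

# Subordinate CA model: the VMCA signing certificate is issued by the enterprise root.
subordinate_model = build([
    Cert(ENTERPRISE_ROOT, ENTERPRISE_ROOT),
    Cert("VMCA", ENTERPRISE_ROOT),
    Cert("vCenter Machine SSL", "VMCA"),
    Cert("rogue.example.com", "VMCA"),  # minted by a rogue administrator
])

# Hybrid model: the VMCA stays an independent self-signed root;
# only the Machine SSL certificate comes from the enterprise PKI.
hybrid_model = build([
    Cert(ENTERPRISE_ROOT, ENTERPRISE_ROOT),
    Cert("VMCA", "VMCA"),
    Cert("vCenter Machine SSL", ENTERPRISE_ROOT),
    Cert("ESXi host", "VMCA"),
    Cert("rogue.example.com", "VMCA"),
])

client_trust = {ENTERPRISE_ROOT}  # what a typical corporate client trusts

print(chains_to_trusted_root("rogue.example.com", subordinate_model, client_trust))
print(chains_to_trusted_root("rogue.example.com", hybrid_model, client_trust))
print(chains_to_trusted_root("vCenter Machine SSL", hybrid_model, client_trust))
```

In this model, the rogue certificate is trusted by clients only in the Subordinate CA case, while the Machine SSL certificate remains trusted in the Hybrid case because it chains directly to the enterprise root.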

Let’s look at an example. Consider a vSphere 6.x environment that contains 4 Platform Services Controllers and 6 vCenter Servers across 2 sites with 50 hosts per vCenter Server. Let’s look at replacing certificates in this environment while comparing and contrasting the Subordinate CA, Full Custom, and Hybrid approaches we discussed earlier.

First, if we were to use the Subordinate CA approach we would want each PSC in the SSO Domain to also be a Subordinate CA. While not a requirement, this ensures consistency across the environment and will make life easier if there is ever a need to repoint a vCenter Server from one PSC to another. Given that each PSC will be a Subordinate CA, we need to generate a CSR for each of those PSC Sub CAs and submit them to the Root CA. Once that is completed and the VMCAs are fitted with their new signing certificates, the VMCAs can then issue Solution User, Machine SSL, and Host certificates. So, in this environment we only have to manually manage 4 CSRs to get 4 certificates. Not bad. But remember, most security teams will forbid this type of deployment because of the risks involved.

Next, let’s go to the Full Custom approach. Recall that this method uses custom certificates for everything. So, we need to generate CSRs for the Solution Users, Machine SSL, and Hosts. This adds up to 338 CSRs to generate the required certificates: 10 for Machine SSL, 28 for Solution Users (four per vCenter Server and one per PSC), and 300 for hosts. Whoa, that’s going to take some time. Not only that, but when it comes to renewal time you get to do this all over again, not to mention that certificates could get revoked, hosts could be replaced, and other operations could require a new certificate. You should be able to see that this causes the most management overhead but it is the most secure way of deploying certificates. There are some environments that may require this approach but for a “normal” production environment this should not be required.

Last, the Hybrid approach mashes components of the previous two together to get the best of both worlds. We still need to manually generate a handful of CSRs for the Machine SSL certificates of each PSC and vCenter Server, which gives us 10 certificates to install and manage over time. And by letting the VMCA do its thing, we gain operational benefits as we grow our datacenter. We don’t have to mint a new CSR for every new ESXi host we add into a cluster. We just add it and let VMCA do its thing. The same is true for Solution Users.
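The CSR counts for the three approaches in this example can be reproduced with a short calculation. The Python sketch below uses the environment size and Solution User counts stated earlier in this post; the function name and dictionary keys are just illustrative.

```python
def csr_counts(pscs: int = 4,
               vcenters: int = 6,
               hosts_per_vcenter: int = 50,
               solution_users_per_vcenter: int = 4,
               solution_users_per_psc: int = 1) -> dict:
    """Count the CSRs that must be handled manually under each approach."""
    machine_ssl = pscs + vcenters
    solution_users = (vcenters * solution_users_per_vcenter
                      + pscs * solution_users_per_psc)
    hosts = vcenters * hosts_per_vcenter
    return {
        # One signing-certificate CSR per PSC acting as a Subordinate CA.
        "Subordinate CA": pscs,
        # Every Machine SSL, Solution User, and host certificate is custom.
        "Full Custom": machine_ssl + solution_users + hosts,
        # Only Machine SSL certificates are custom; VMCA handles the rest.
        "Hybrid": machine_ssl,
    }


for approach, count in csr_counts().items():
    print(f"{approach}: {count} CSRs")
```

Changing `hosts_per_vcenter` shows how the Full Custom total grows with every host added, while the Subordinate CA and Hybrid totals stay flat.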

Below is a table that captures the totals for each of the methods we’ve discussed.

Conclusion

What we have found in talking to customers who are embracing the hybrid approach is that security teams are most concerned with securing the control plane of the administrators with certificates issued by the security team via their enterprise PKI. The hybrid approach addresses that for securing access to vSphere by replacing the Machine SSL certificate. Per best practices, access to ESXi management should be limited in nature and only done on an isolated network. To address administrative access to functions like the ESXi UI (introduced in 5.5 U3 and 6.0 U2), the VMCA CA certificate can be exported and added to the Trusted Root Certification Authorities container in an Active Directory group policy.

As you can see, the Hybrid approach is the best of both worlds. It addresses the security needs of the Security Team by protecting access to vCenter Server while it also addresses the operational needs of the IT team.

Comments

I do not see how the PSC should be any less secure than any other Issuing CA. I think that the number of people who should have access to the PSC should be very limited. Don’t the same problems apply to any other issuing CA?

If someone gets a hold of your Sub CA certificate for the PSC, you can always revoke your Sub CA certificate. It is not that hard to replace, as it is only one certificate that is done manually. If, on the other hand, the same thing happens in your Hybrid setup, you have to replace a bunch of certificates.

I recommend that you have your PSC as the third tier in your PKI infrastructure (Root CA -> Issuing CA -> VMCA Sub Certificate). That way you can easily revoke your VMCA Sub CA Certificate, hence invalidating all “illegal” certificates.

Since we do not check CRLs in vSphere, you can revoke the subCA signing certificate but it won’t invalidate the certificates for vCenter Server, PSC, or ESXi hosts. Maybe that’s not a big deal for you either, though.

In general, and through talking to many customers and Enterprise PKI teams, having a subCA that they do not control is a non-starter. Sure, you can revoke the subCA’s certificate but that could be after damage has already been done. They’d rather prevent the malicious activity than reactively clean it up.

The recommendation of the Hybrid approach comes through all of these interactions, and we feel it is the correct approach in general. As with everything, there will be exceptions. You can decide which approach is best for your org and customers.