The Platform Services Controller (PSC), introduced in vSphere 6.0, has been a challenge for many people upgrading to it. I have struggled to identify the best architecture to follow. This article assumes that you want a multi-vCenter single sign-on domain with external PSCs. There are a few key items to consider when architecting PSCs:

Recovery

If you lose all PSCs, you cannot connect a vCenter to a new PSC; you must reinstall the vCenter, losing all data.

To recover all failed PSCs, restore a single PSC from backup (image-level backup is supported), then redeploy new PSCs for the rest. Restoring multiple PSCs may introduce inconsistencies, depending on when each backup was taken.

In 6.5, a vCenter cannot be repointed to a PSC in a different site within the same domain (in 6.0 it can).

No 6.x version of vCenter covered here (6.0 and 6.5) supports repointing to a PSC in a different domain.

If you lose all PSCs at a site, you can install new PSCs at that site as long as at least one PSC at another site survived, then repoint the vCenter to the new PSC.

Replication

All PSC replication is bidirectional, but it is not automatically arranged in a ring (this is a big one).

By default, each PSC replicates with only a single other PSC (the one you select when installing the additional PSC).

Site names do not have anything to do with replication today; they are a logical construct for load balancers and future use.

Changes are scoped to the domain, not to a site. In other words, all changes at all sites are replicated to all other PSCs that are part of the domain.

If you use a load balancer configuration for PSCs and the active PSC fails, the load balancer redirects to another PSC and no reconfiguration is required.

Site name is important with load balancers: place all PSCs behind a load balancer in their own site. Non-load-balanced PSCs at the same location should have a different site name.
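To make the default replication behavior concrete, here is a minimal Python sketch that models agreements as pairs of PSCs. The host names (psc-a1 and so on) and the helper functions are hypothetical and only illustrate the idea; this is not a VMware tool or API. It assumes each new PSC was installed against the previously deployed one, which leaves a chain rather than a ring.

```python
from collections import Counter

def default_agreements(installs):
    """Each (new_psc, partner) pair is the partner selected at install time.
    Replication is bidirectional, so an agreement is an unordered pair."""
    return {frozenset(pair) for pair in installs}

def partner_counts(agreements):
    """Count how many replication partners each PSC has."""
    counts = Counter()
    for agreement in agreements:
        counts.update(agreement)
    return counts

def missing_ring_links(agreements):
    """Assuming the agreements form a single chain, return the one extra
    agreement that would close it into a ring (the two chain ends)."""
    ends = sorted(n for n, c in partner_counts(agreements).items() if c == 1)
    return [tuple(ends)] if len(ends) == 2 else []

# Hypothetical install order: psc-a1 is the first PSC in the domain.
installs = [
    ("psc-a2", "psc-a1"),
    ("psc-b1", "psc-a2"),
    ("psc-b2", "psc-b1"),
]
agreements = default_agreements(installs)
print(len(agreements))                 # three agreements for four PSCs
print(missing_ring_links(agreements))  # the link you would add by hand
```

In this model the installer leaves psc-a1 and psc-b2 with one partner each, so a ring only exists once an agreement between those two is created manually.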

Features

PSCs have to be part of the same domain to use Enhanced Linked Mode (ELM).

Performance

A PSC can replicate with one or many other PSCs, but each additional partner has a performance cost, so keep the number of replication partners to a minimum.

Topology

A ring is the supported best-practice topology today.

PSCs know each other by IP address or domain name (ensure DNS is correct, including PTR records). Using an IP address is discouraged because it can never be changed; using an FQDN allows for IP mobility.

PSCs are authentication sources, so NTP is critical, and all PSCs should use the same NTP source. (If you join one PSC to AD, all of them need to be joined to the same AD. It is best not to mix appliance and Windows PSCs.)

The only reason to have external PSCs is to use Enhanced Linked Mode. If you don't need ELM, use an embedded PSC with vCenter and back vCenter up at the same time. See http://vmware.com/go/psctree
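Since PSCs find each other by name, it can help to confirm that forward and reverse DNS agree before deploying. The following is a rough sketch using Python's standard socket module; the function name check_dns and the example host name are my own assumptions, and the resolver functions are passed in so the example can run without a real DNS server.

```python
import socket

def check_dns(fqdn, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve fqdn to an IP, then resolve that IP back (PTR) and compare.
    Returns (ok, detail)."""
    try:
        ip = forward(fqdn)
        ptr_name = reverse(ip)[0]  # gethostbyaddr returns (name, aliases, addrs)
    except OSError as exc:         # covers socket.gaierror and socket.herror
        return False, f"lookup failed for {fqdn}: {exc}"
    if ptr_name.rstrip(".").lower() != fqdn.rstrip(".").lower():
        return False, f"{fqdn} -> {ip} -> {ptr_name} (PTR mismatch)"
    return True, f"{fqdn} -> {ip} -> {ptr_name}"

# Offline demonstration with stand-in resolvers (hypothetical name and address).
def fake_forward(name):
    return "10.0.0.11"

def fake_reverse(ip):
    return ("psc-a1.example.local", [], [ip])

ok, detail = check_dns("psc-a1.example.local", fake_forward, fake_reverse)
print(ok, detail)
```

Calling check_dns with only the FQDN uses the real system resolver, which is what you would run against each planned PSC name.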

Scalability

Current limits are 8 PSCs per domain in 6.0 and 10 per domain in 6.5.

With all of these items in hand here are some design tips:

Always have n+1 PSCs. In other words, never have a single PSC in a domain when using ELM.
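As a quick sanity check on a planned design, the limits and the n+1 tip above can be expressed in a few lines of Python. The limits are the ones quoted in this article for 6.0 and 6.5; the function and variable names are hypothetical.

```python
# Per-domain PSC limits as quoted in this article (not read from any VMware API).
PSC_LIMITS = {"6.0": 8, "6.5": 10}

def check_domain_size(version, psc_count, uses_elm=True):
    """Return a list of problems with the planned PSC count (empty if none)."""
    problems = []
    limit = PSC_LIMITS.get(version)
    if limit is None:
        problems.append(f"no limit recorded for version {version}")
    elif psc_count > limit:
        problems.append(f"{psc_count} PSCs exceeds the limit of {limit} for {version}")
    if uses_elm and psc_count < 2:
        problems.append("ELM domain has a single PSC; add at least one more")
    return problems

print(check_domain_size("6.5", 4))  # no problems
print(check_domain_size("6.0", 9))  # over the 6.0 limit
print(check_domain_size("6.5", 1))  # violates n+1
```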

Choosing a topology is a challenging question. Let’s identify some design elements to consider:

Failure of a single component should not create replication partitions

Complexity of setup should be minimized

Number of replication agreements should be minimized for performance reasons

Scaling out additional PSC’s should be as simple as possible

Ring

I spent some time in the ISP world and learned to love rings. They create two paths to every destination and are easy to set up and maintain. They do have issues when two points fail at the same time, which can partition the ring until one of the two is restored. VMware recommends a ring topology for PSCs at the time of this article as shown below:

Let’s review this topology against the design elements:

Failure of a single component should not create replication partitions

Each PSC replicates with its same-site peer and one remote-site peer, making sure its changes are stored at two sites in two copies, which are then replicated locally and remotely (all four get it).

Let’s evaluate against the design elements:

Failure of a single component should not create replication partitions

True. Because of the ring, there are four ways for everything to replicate.

Complexity of setup should be minimized

The setup requires forethought and at least one manual replication agreement.

Number of replication agreements should be minimized for performance reasons

It has more replication agreements

Scaling out additional PSC’s should be as simple as possible

Adding a new PSC means potentially more replication agreements or more design
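The "single failure should not create a partition" element can be checked mechanically for any proposed set of agreements. Below is a small Python sketch, assuming a four-PSC, two-site layout with hypothetical names (a1, a2, b1, b2). It treats agreements as an undirected graph and tests whether the surviving PSCs stay connected after every combination of failures.

```python
from itertools import combinations

def is_connected(nodes, links):
    """Depth-first search: True if every node is reachable from the first."""
    nodes = set(nodes)
    if not nodes:
        return True
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for a, b in links:
            if a == node and b in nodes:
                stack.append(b)
            elif b == node and a in nodes:
                stack.append(a)
    return seen == nodes

def survives_failures(nodes, links, failures):
    """True if the surviving PSCs stay in one replication partition for
    every possible combination of `failures` failed PSCs."""
    for failed in combinations(nodes, failures):
        alive = [n for n in nodes if n not in failed]
        alive_links = [(a, b) for a, b in links
                       if a not in failed and b not in failed]
        if not is_connected(alive, alive_links):
            return False
    return True

nodes = ["a1", "a2", "b1", "b2"]
ring = [("a1", "a2"), ("a2", "b2"), ("b2", "b1"), ("b1", "a1")]
chain = ring[:-1]  # what you have before the ring is closed

print(survives_failures(nodes, ring, 1))   # True: any single PSC can fail
print(survives_failures(nodes, ring, 2))   # False: two opposite failures split it
print(survives_failures(nodes, chain, 1))  # False: a middle PSC failure splits it
```

The same function can be pointed at a larger list of agreements when adding PSCs, to see whether the new design still tolerates a single failure.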

Update: The VVD team reached out and wanted to be clear that adding additional sites is pretty easy. I believe the challenge comes when you try to identify disaster zones. Because PSCs replicate all changes everywhere, it does not matter if all replication agreements fail; you can still regenerate a site.

Which option should I use?

That is really up to you. I personally love the simplicity of a ring. Neither of these options increases the availability of the PSC layer; they are about data consistency and integrity. Use a load balancer if your management plane SLA does not allow for downtime.

About Author

Joseph Griffiths is a virtualization-focused solutions architect who works with complex cloud-based solutions. He currently holds many IT certifications including VMware VCDX-DCV and VCDX-CMA #143. This blog represents his random technical notes and thoughts. The thoughts expressed here do not reflect Joseph’s current employer in any way. You can follow Joseph on Twitter @Gortees