Journey to the Virtual World

Monthly Archives: December 2017

I travel globally meeting VMware customers. A very popular request among customers is a simple set of dashboards that answer these questions:

Are the VMs served well?

If not, which VMs are affected? By what problems (CPU, RAM, Disk, Network)? How bad?

Is it because of Villain VMs, consuming an excessive amount of shared resources? If yes, who are they?

Are the problems spread across clusters, networks, datastores? Or are they isolated to a specific part of my IaaS?

How long has the problem been happening? Is there a pattern?

Is my Infrastructure running hot?

This could be a reason why the VMs were not served well.

If yes, which part? I need to see the 4 IaaS elements (CPU, RAM, Disk, Network), and easily spot where the problems are.

Blue = 0% = cold. Not used.

Red = 100% = hot. Highly utilized.

Compute: Which clusters are running hot? Is it CPU or RAM?

Are the clusters balanced? Hosts in a cluster should have a similar color.

Are the hosts of equal capacity? The bigger the host, the bigger the box, so I can spot it easily.

Select an ESXi host, then click dashboard navigation to drill down into the Troubleshoot a Host dashboard.

Storage: Which datastores are running hot?

Hot = busy processing lots of IOPS.

Select a datastore, then click dashboard navigation to drill down into the Troubleshoot a Datastore dashboard.

Network: Which LAN or VXLAN carries a lot of traffic?

The bigger the network (number of VMs or ports), the bigger the box.

The higher the traffic, the redder the color.
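To make the heat map encoding above concrete, here is a minimal Python sketch of the two rules: box size follows capacity, and color runs from blue (0%) to red (100%). The function names and the linear blue-to-red blend are my own assumptions for illustration; this is not how vRealize Operations renders its heat map widget.

```python
def utilization_to_rgb(utilization_pct):
    """Map 0-100% utilization to an (r, g, b) tuple from blue (cold) to red (hot)."""
    # Clamp so out-of-range samples still produce a valid color.
    u = max(0.0, min(100.0, float(utilization_pct))) / 100.0
    red = round(255 * u)
    blue = round(255 * (1 - u))
    return (red, 0, blue)


def box_area_share(capacities):
    """Return each object's share of total capacity, which drives its box size."""
    total = sum(capacities.values())
    if total == 0:
        return {name: 0.0 for name in capacities}
    return {name: cap / total for name, cap in capacities.items()}


# Hypothetical hosts: capacity in GHz (assumed values for illustration only).
hosts = {"esxi-01": 1, "esxi-02": 3}
print(box_area_share(hosts))    # bigger host, bigger box
print(utilization_to_rgb(100))  # fully red = running hot
```

A balanced cluster would produce boxes of similar color; one red box among blue ones is the imbalance to look for.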

Using the dashboard best practices covered here, I translated the above into 4 dashboards. I added a simple VM Reclamation dashboard to complete the functionality. The picture below shows the functional relationship among the dashboards.

The result was 5 simple dashboards. It’s a lite version of Operationalize Your World, which has 50 dashboards. As a result, the import step is much simpler. It’s also upgradeable to the full OYW.

Are the VMs served well?

The above shows the present data. It’s suitable for a live NOC screen, where you can see it from a distance. All you want to see is green! You can customize the thresholds by editing each widget.

It is easy to spot the villain VMs. They are the biggest! If a single box occupies a relatively large area, that means you have a VM consuming a large percentage of your shared environment.
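The "biggest box" idea boils down to each VM's share of the total consumption. Below is a rough Python sketch of that calculation. The `find_villains` name, the sample numbers, and the 10% default threshold are assumptions of mine, not values taken from the dashboards.

```python
def find_villains(vm_usage, share_threshold=0.10):
    """Return (vm_name, share) pairs whose share of total usage exceeds
    the threshold, largest share first."""
    total = sum(vm_usage.values())
    if total == 0:
        return []
    shares = {name: used / total for name, used in vm_usage.items()}
    villains = [(name, share) for name, share in shares.items()
                if share > share_threshold]
    return sorted(villains, key=lambda item: item[1], reverse=True)


# Hypothetical IOPS per VM on a shared datastore (illustrative numbers only).
usage = {"vm-a": 700, "vm-b": 200, "vm-c": 100}
for name, share in find_villains(usage, share_threshold=0.5):
    print(f"{name} consumes {share:.0%} of the shared resource")
```

Pick the threshold to suit your environment size: in a cluster with hundreds of VMs, even a few percent from one VM may be worth a look.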

The above is not so good at showing The Past. Unlike The Present (which has 1 data point), the past has many. For that, we need to use a line chart. This is why the next dashboard is required.

Were the VMs served well?

Is my Infra running hot?

Was my Infra running hot?

Which clusters had the problem? Is it CPU or RAM? How bad is it?

Both max and average lines are shown so you get a better idea.

If max is high but average is low, no one may complain yet. This is your proactive window!
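Here is a small Python sketch of how the max and average lines can be read together. The 80% threshold and the label names are assumptions I picked for illustration; adjust them to your own definition of "hot".

```python
def classify_utilization(samples, hot_threshold=80.0):
    """Classify a series of utilization percentages using its max and average."""
    if not samples:
        return "no data"
    peak = max(samples)
    average = sum(samples) / len(samples)
    if average >= hot_threshold:
        # Hot most of the time: users are likely to feel it.
        return "sustained"
    if peak >= hot_threshold:
        # Only brief peaks so far: the chance to act before complaints arrive.
        return "proactive window"
    return "healthy"


# Hypothetical cluster CPU utilization samples, in percent.
print(classify_utilization([20, 30, 95, 25]))
print(classify_utilization([85, 90, 95]))
```

The first series has a high peak but a low average, so it lands in the proactive window. The second is hot on average, which is the case where complaints are likely.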

Which datastores had the problem?

How bad is the situation?

Is the IO stuck in the queue?

What can I easily reclaim?

I focus on powered-off VMs and idle VMs, as they are easier to reclaim than active VMs.

From this dashboard, you can select a VM, then click dashboard navigation to drill down into the VM Utilization dashboard.
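The reclamation logic can be sketched as a simple filter. In the Python below, the `VM` record, the `reclaim_candidates` helper, and the 2% average CPU cut-off for "idle" are all hypothetical; the dashboard relies on the product's own definition of idle, which I am not reproducing here.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    powered_on: bool
    avg_cpu_pct: float  # average CPU usage over the observation period


def reclaim_candidates(vms, idle_cpu_pct=2.0):
    """Split VMs into the two easy reclamation buckets: powered off and idle."""
    powered_off = [vm.name for vm in vms if not vm.powered_on]
    idle = [vm.name for vm in vms
            if vm.powered_on and vm.avg_cpu_pct < idle_cpu_pct]
    return {"powered_off": powered_off, "idle": idle}


# Hypothetical inventory for illustration.
inventory = [
    VM("web-01", True, 35.0),
    VM("test-07", True, 0.5),
    VM("old-db", False, 0.0),
]
print(reclaim_candidates(inventory))
```

Active VMs are left out on purpose, since right-sizing them needs a conversation with the application owner.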

Implementation

Compared to the Operationalize Your World import step, this is much easier. It does not require preparation, which is time-consuming. The strikethrough steps are not required.

A quick tip to help you spend less time creating your vRealize Operations dashboards. Sometimes when creating a dashboard, you need to understand what metrics are available, the relationships between objects, properties, and so on. I created this simple dashboard to help with that.

Just import this dashboard and save your precious time! It looks like this:

Hope you find it useful. Do reach out via LinkedIn and Twitter. Thanks for reading!