Journey to the Virtual World

Tag Archives: Sunny Dua

In the previous post, I shared how we can quickly answer fundamental questions such as:

Are there any users out there who need more RAM or CPU?

If yes, who are they, and how short are they? When did it happen, and how often?

We covered CPU, let’s cover RAM now 🙂

RAM is not so simple. As you can see here, the Cached Memory and Free Memory are not visible outside the Guest OS. This means the counter you need to use should ideally be from the Guest, and not from the Hypervisor. The post here shows that it is possible that they differ.

The good thing in VDI is, Horizon View comes with the agent out of the box. The vRealize Operations for Horizon agent has been integrated into the base Horizon View agent. As a result, there is no need to deploy the vRealize Operations End Point agent.

Now, there are 2 ways we can determine when a user needs more RAM:

RAM usage is high.

Available RAM is low.

I’m using the second one as it’s easier for you to see. If I show RAM Usage is 13574 MB, you still need to know the total configured RAM (e.g. 16 GB RAM), and then subtract the number. Well, that will take you to the Available RAM 🙂
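The subtraction above, as a minimal Python sketch. The function name is made up for illustration, and 16 GB is taken as 16 × 1024 MB:

```python
def available_ram_mb(configured_mb: float, used_mb: float) -> float:
    """Available RAM is what is left of the configured RAM after usage."""
    return configured_mb - used_mb


configured_mb = 16 * 1024  # 16 GB configured, expressed in MB
used_mb = 13574            # the RAM Usage figure from the text

print(available_ram_mb(configured_mb, used_mb))  # 2810
```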

Since we have lots of VDI users, the first thing we need to do is to ensure no one has utilization that is too high, or runs out of available RAM. A super metric comes in handy here. To find out if anyone runs out of Available RAM, you can create the super metric below.

Once you do that, it’s a matter of showing them as a line chart on the dashboard.
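To make the idea concrete, here is a rough Python sketch of what such a super metric works out. It is not vRealize Operations super metric syntax, and the desktop names and readings are made up: at each collection cycle, take the lowest Available RAM across all desktops.

```python
from typing import Dict, List


def min_available_ram(samples: Dict[str, List[float]]) -> List[float]:
    """For each point in time, return the lowest Available RAM (MB) of any desktop.

    `samples` maps a desktop name to its Available RAM readings,
    one reading per collection cycle, with all lists the same length.
    """
    return [min(readings) for readings in zip(*samples.values())]


# Hypothetical readings for three desktops over four collection cycles.
samples = {
    "desktop-01": [2810, 2500, 2200, 2600],
    "desktop-02": [4100, 3900, 350, 3800],
    "desktop-03": [1900, 1850, 1800, 1750],
}

print(min_available_ram(samples))  # [1900, 1850, 350, 1750]
```

The dip to 350 MB in the third cycle is the kind of thing that stands out on a single line chart, even with many desktops behind it.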

You do the same thing for the Committed Byte. Why do I use Committed Byte and not Memory In Use? Can you guess why?

Memory In Use can easily be determined. It is just Total RAM – Available RAM.

Committed Byte, on the other hand, does not always go hand in hand with Memory Usage. See this blog for the explanation. So we need to complement our Available RAM (MB) with Committed Memory (%). vRealize Operations for Horizon has the metric too.

The 2 super metrics will provide a good overview of the entire environment. We can just look at the 2 line charts, and at a glance we know if everyone is doing well. If not, the list next to it will tell us which user was affected. The list is just using the standard View widget, which I covered in a previous post.
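As a sketch of the logic behind that list, assuming we already have each user's lowest Available RAM and highest Committed Memory (%) for the period. The user names, the data, and both thresholds are hypothetical; pick values that suit your own environment:

```python
from typing import Dict, List, NamedTuple


class RamSummary(NamedTuple):
    min_available_mb: float   # lowest Available RAM seen in the period
    max_committed_pct: float  # highest Committed Memory (%) seen in the period


def affected_users(
    summaries: Dict[str, RamSummary],
    available_floor_mb: float,
    committed_ceiling_pct: float,
) -> List[str]:
    """Return users who dipped below the floor or rose above the ceiling."""
    return sorted(
        user
        for user, s in summaries.items()
        if s.min_available_mb < available_floor_mb
        or s.max_committed_pct > committed_ceiling_pct
    )


# Hypothetical data and thresholds, purely for illustration.
summaries = {
    "marie": RamSummary(min_available_mb=350, max_committed_pct=62),
    "john": RamSummary(min_available_mb=2100, max_committed_pct=91),
    "aisha": RamSummary(min_available_mb=1800, max_committed_pct=55),
}

print(affected_users(summaries, available_floor_mb=500, committed_ceiling_pct=80))
# ['john', 'marie']
```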

V4V 6.2 lets you map the user name with the VM name and Windows name.

Hope that helps you in making sure your VDI users are happy, and productive! 🙂

VDI workload differs from Server workload. As a result, we cannot use the same approach to right-size them. You probably know it well, so let me just highlight some of the differences:

Usage is spiky, not predictable.

A server, generally speaking, has a nice predictable pattern in any given 5 minutes. The CPU and RAM do not go from 5% to 95% back and forth within 1 minute.

Humans do not work non-stop.

Typing time, thinking time, meeting, coffee break, travelling, public holiday, sick leave, etc. A server, being a machine, has none of this 🙂

In this age of mobile cloud, there is no fixed “working hours”. Each user has his or her own work schedule. We cannot average across a long time period. 1 hour is probably as long as you want it when it comes to averaging.

You are a user too. How long are you willing to wait for an application to launch? Yup, 1 minute 🙂

For the time being, I’d ignore Disk (IOPS) and Network, and just focus on Compute (CPU and RAM).

As I have shared in this blog, RAM behaves differently from CPU. As a result, we need different counters for CPU and RAM.

For CPU, we should use the data from outside the Guest.
For RAM, we should use the data from inside the Guest.

Picking the right counter is critical. As you can see here, choosing the wrong counter can result in a wrong decision.

Setting aside the technology and tools, when should we give a user more CPU or more RAM?

Well… when she needs more.

How do we define “more”?

We must certainly look at her workload.

If we want to be less generous, we consider the workload in the past 1 week.

If we want to provide a high performance, snappy VDI experience, then any given day is enough to warrant an upsize. We don’t wait for 1 week of unacceptable performance.

Ok, how do we get insight into the workload in the past 1 day? There is no point in getting the average of the last 24 hours, as she likely only generates workload for 8 hours. Maybe even less, as she may have meetings or phone calls, or may not even be in the office. As we said, the average will be low.

What we need is the Max of any given 5 minutes. This gives us insight into whether she demanded more resources. 5 minutes is a good, balanced window. Going to 1 minute will be too sensitive. Going to 10 minutes is too long for a user to wait.
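Here is a small Python sketch of why the 24-hour average hides the peak. The day of data is made up, and each value is assumed to already be a 5-minute sample, so 24 hours gives 288 of them:

```python
from statistics import mean
from typing import List, Tuple


def average_and_peak(five_min_samples: List[float]) -> Tuple[float, float]:
    """Return (average, maximum) of a day's worth of 5-minute samples."""
    return mean(five_min_samples), max(five_min_samples)


# Hypothetical day: 288 five-minute samples (24 hours).
# Mostly idle at 5%, 8 hours of work at 40%, and two busy samples at 95%.
day = [5] * 190 + [40] * 96 + [95] * 2

avg, peak = average_and_peak(day)
print(round(avg, 1), peak)  # 17.3 95
```

The average says she is barely using the desktop. The maximum says she waited on the CPU at least twice that day.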

vRealize Operations provides this via its View widget. The following screenshot shows that it can display the Maximum during the sample period.

Besides the Maximum, what else do you notice?

Yup, I show Standard Deviation. I’ve shown below how you add it. Michael Ryom has written an excellent explanation here. Please read it first.

I’d use a simple example. Say user Marie’s CPU workload average is 50% in the past 1 day, and the standard deviation is 10%. For data that is roughly normally distributed, about 95% of it falls within 2 standard deviations of the average. That means in the past 24 hours, about 95% of her workload falls between 30% and 70%. If the max is 95%, she was outside that band, at or near that peak, for at most 5% of the past 1 day. That’s still 72 minutes, a long time from her viewpoint. 3 standard deviations takes us to 99.7%. That means 99.7% of the time, her CPU workload falls between 20% and 80%. The remaining 0.3% translates into about 4 minutes in the last 24 hours. So, as Michael said, the devil is in the detail, and now you have the details 🙂
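The arithmetic in Marie’s example, as a short Python sketch. It assumes the workload is roughly normally distributed, which is what the 95% and 99.7% figures rely on; the function names are just for illustration:

```python
from typing import Tuple

MINUTES_PER_DAY = 24 * 60  # 1440


def band(mean_pct: float, std_dev_pct: float, k: int) -> Tuple[float, float]:
    """Range covered by k standard deviations on either side of the average."""
    return mean_pct - k * std_dev_pct, mean_pct + k * std_dev_pct


def minutes_outside(coverage: float) -> float:
    """Minutes per day that fall outside a band covering `coverage` of the data."""
    return (1 - coverage) * MINUTES_PER_DAY


print(band(50, 10, 2), round(minutes_outside(0.95), 1))   # (30, 70) 72.0
print(band(50, 10, 3), round(minutes_outside(0.997), 1))  # (20, 80) 4.3
```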

Let’s now take a real example. Notice the first one has an average of 21.52%. Standard Deviation is only 2.24%. Maximum, however, is a whopping 96%, way off the range, so we can tell quickly that it is not normal. Since the sample period below is 24 hours (1440 minutes), such a low standard deviation means the peak is a one-off data point in vRealize Operations.

Zooming into the VM to plot the entire 24 hours, we can see it’s indeed a one-off. Bingo! 🙂
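One rough way to put a number on "off the range" is to count how many standard deviations the maximum sits above the average. This is only an illustrative check using the three figures above, not something taken from the screenshot:

```python
def deviations_above_average(maximum: float, average: float, std_dev: float) -> float:
    """How many standard deviations the maximum sits above the average."""
    return (maximum - average) / std_dev


z = deviations_above_average(maximum=96, average=21.52, std_dev=2.24)
print(f"{z:.2f}")  # 33.25
```

Compare that with the 2 or 3 standard deviations that cover nearly all of a normal day. A peak that far out, with a standard deviation that small, can only have been brief.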

Now that you have insight, you can confidently decide if that’s a one-off instance, or something that does need an upsize. BTW, since this is VDI, your starting line is probably 2 vCPU (I’d avoid going 1 vCPU) and you should only increment 1 vCPU at a time. In other words, I won’t jump from 2 vCPU to 4, 6, 8. I’d go 2, 3, 4, 5, because every extra vCPU hits my consolidation ratio.

Yes, that’s all you need to find out which user needs more CPU. Simple, yet accurate. Sometimes, as engineers, we over-engineer a solution 🙂

What about RAM?

Well… that’s a topic of another blog. I want you to review this first. Let me know which counter to use!

I must have done a terrible thing in my previous life because I had to share a tiny hotel room with Sunny Dua. We were invited to present at VMworld to share real world stories on how customers operationalise performance and capacity management. The presentation was rated very high, and one reason was we shared a lot of dashboards. One key request was for us to share the dashboards so others can just download and go.

Sharing a room with Sunny also taught me about the man, and his relentless passion for vRealize Operations. I lost count of how many times we discussed vRealize Operations at breakfast, lunch, dinner, and in the hotel room!

Both of us love Russell Peters, the famous comedian. We joked about “Be a man. Do the right thing!”. The right thing to do is to keep on sharing the power of vRealize Operations (and Log Insight). So I am super glad that Sunny has taken the initiative to share the dashboards openly, so you can freely download them. Yup, no registration required. We are not interested in your personal particulars 🙂

PS: After VMworld, we were also invited to share at VMware vForum, where we had a chance to share deeper. It also uses vRealize Operations 6.1. You can find the complete deck here. One reason for sharing is that we are members of the VMware CTO Ambassadors program. It is our duty to bridge the field (employees, partners, customers) with the product team.