Monitor Your Virtual SQL Server Environment (Part 4 of 5)

Problem

Your last tip walked us through the planning stage and now I have a dozen or so virtual SQL Servers under by responsibility. As a DBA do I need to change the way I manage these servers? Is there anything I need to do differently? Check out this tip to learn about monitoring your virtual SQL Servers.

Solution

Well, you finally have some virtual servers. Most of you reading this tip probably already have a few virtual SQL Servers. Maybe they're mostly test systems but, most likely, you own a few light-weight, virtualized production systems. Those of you who have virtualized already will know that the sky did not fall and your company still has a need for your services. What you may notice is the picture has grown larger. If we take the analogy further and say that the frame around the picture are the tools you use to manage the environment then what you need to do is increase the frame or the tools you normally use so that you can see the larger picture. In this tip I'll focus on the new tools and terminology you'll need to effectively manage the new virtual environment.

VMWare Terminology for SQL Server Professionals

Keep in mind how a virtual environment provides a level of abstraction between the virtual machine and the physical hardware. This abstraction is what tends to confuse normal monitoring tools like Performance Monitor(Perfmon). Perfmon assumes the physical hardware is static. It doesn't understand the possibility that the hardware assigned to the server is simply an abstraction and can be dynamically added and removed based on set conditions or that there is a 3rd party controlling limits on how much hardware the server can see. Furthermore, there is an entire physical world occurring beneath the abstraction, which a tool like Perfmon cannot see. For example, if the CPU spikes Perfmon may see it, but have no idea that it was due to a new virtual machine being moved onto the host. A best practice is to be leery of time based counters. The reason for this is virtual machines can be overcommitted, which means the host will present more dedicated resources than what is actually reserved. Over time it can look as if the resources are constrained or available, but this may not be the case.

To see the big picture you'll need to spend some time getting to know the language. Perfmon has hundreds of counters, but you don't need to know all of them and you'll end up using only a few. The same thing goes for the VM counters. Here is a shortlist of the most common metrics and these are the ones you should definitely keep in your tool belt:

I've purposely left out any network related counters since I haven't encountered many SQL errors related to network problems. I'm not saying I haven't, but if you suspect network errors based on wait types then contact your network administrator first before bothering the VM team. The other counters are helpful when troubleshooting overall performance bottlenecks in a VM system and I'll briefly explain a few of them.

The most important concept to understand about resources in a VM environment is they are all pooled resources. You can have reserved memory, but that's much different then saying that memory is dedicated. Another way to look at it is the IO assigned to a physical host or cluster of hosts is an aggregate number. For example, if you have two physical hosts each with 16 GB of RAM and 4 dual-core CPUs at 4 GHz then you have 32 GB of RAM and 256 GHz of computing power. This aggregate number is what is available to be sliced up for each VM. Here is a diagram from the VMware architecture white paper demonstrating the concept of aggregate resources (I've provided a reference to the white paper in the Next Steps section):

SQL Server Memory Management on VMWare - Swapping and Ballooning

The two memory concepts to keep in mind are swapping and ballooning. There tends to be confusion between the them, but they completely different concepts. Keep in mind that a VM knows nothing about the host it runs on, but the host is aware of all the VMs. When you install the balloon driver on the guest VM it allows that guest to become "aware" of the host memory. They can now talk to each other. What this means is the host can then request that the guest give up some unused memory so another guest can use it. This can be an efficient and equitable relationship and most of the memory allocation is controlled by the guest OS. Swapping on the other hand is less friendly. Swapping means the host is stealing memory from the guest. Swapping also means the host is stealing memory whether or not that memory is kernel memory or clean buffer pages. This can cause serious problems for the guest. It is this reason that if you suspect memory performance problems you should first look at your swapping counters. Both excessive ballooning or excessive swapping could mean you have severely overcommitted your memory resources.

SQL Server CPU Management on VMWare

You'll want to look at the Ready counter for CPU issues. What is ready time? Ready time is the amount of time a guest vCPU waits on pCPU cycle. It's kind of like waiting in a bread line for a loaf of bread. If you have to wait a long time it usually means there aren't enough resources available handing out the bread. Like memory swapping this could also be a question of over committing resources. If you have 3 guests each with 2 vCPUs, but the host only has 4 pCPUs then maybe you underestimated how much idle CPU cycles would be available. You'll need to revisit your estimated workloads and add additional resources as necessary.

A concept I find interesting in VMWare is the concept of allocating CPU resources based on a proportional share-based algorithm (see the article on CPU Scheduling in Next Steps). What this essentially means is CPU is reserved for a VM and this reservation is called an "entitlement". So a guest OS is entitled to an assigned number of vCPUs. If the guest is not using a significant amount of its entitlement then that guest is stamped as a high priority or, in other worlds, most likely to get some of its entitlement taken away. This is classic you-snooze-you-lose means of allocating CPU resources. It doesn't matter what application is running or how critical the system is, what matters is whether or not you are using the resources you were assigned. If you aren't then the host will grab them so someone else can use it. Keep this in mind when setting up virtual machines on the same host. It may be in your best interest not to combine multiple end-of-quarter financial applications on the same host since both will need additional resources at the same time.

Summary of SQL Server Challenges on VMWare

This article only scratched the surface of the challenges you may face after virtualizing SQL Server. Here is a quick list of things I've noticed after virtualizing hundreds of SQL Servers of all shapes and sizes:

In most cases the hardware you use for your virtual environment will be faster than your physical hardware. This may significantly lessen the performance impact.

Be aware of what applications you mix and match on host. Don't obsess over it, but keep it in mind when provisioning your hosts.

Always be suspicious of the SAN. Look at disk latency and understand the SAN configuration.

Don't be afraid to ask for more IO. It is easy to provision, but don't go overboard. You won't encounter too many virtual machines needing more than 8 vCPUs or 16 GB of vRAM.

Get involved with the virtual community. Like SQL Server, the VM community is large and helpful. They even have their on conference - VMWorld. If you are handling a lot of virtual servers maybe you can get your company to pay for you to go?

After virtualizing your SQL Server there tends to be a habit of blaming every new problem on virtualization. Don't fall into this trap.

I think if setup correctly, virtualization is a great thing for SQL Server. It does cause some headaches like additional technology to learn, server sprawl, and even a small loss of control over your environment. Overall though the changes are fairly painless and are helpful to the organization. A virtual environment offers new opportunities like cloning, virtual desktop environments, and the cloud. My experiences with SQL Server and virtualization has for the most part been positive and I expect yours will be too. You'll find managing SQL in a virtual environment is not much different than managing it in physical environment. The same rules apply, but you're simply adding an additional level of abstraction. By better understanding the additional layer you'll find performance tuning just as fun and challenging as it was for your physical servers.