The Virtualization Room

It may be free, but it’s not forgotten — or is it? VMware Server 2 is officially in beta, and while Yankee Group’s Gary Chen took the release as a sign of continued development on the part of VMware, virtualization.info’s Alessandro Perilli isn’t so sure:

It’s evident the company is spending most of its R&D and marketing efforts around ESX Server and Workstation. In 11 months no company (in a big ecosystem of over 200 technology partners) developed products for VMware Server, despite its price and wide feature-set could push adoption like no others.

In other virtualization news, Sun Microsystems Inc. CEO Jonathan Schwartz will take the stage at Oracle’s OpenWorld conference today and is expected to officially announce the company’s xVM virtualization strategy: an x86 hypervisor based on Xen and Sun xVM Ops Center for unified management. Sun has also rounded up support for xVM from Advanced Micro Devices (AMD), Intel, Microsoft, MySQL, Quest Software and Symantec and is launching OpenxVM.org, which Sun describes as “an open source community for developers building next-generation datacenter virtualization and management technologies.”

Speaking of Oracle, VMware isn’t taking this whole Oracle VM thing lying down. Going live today on VMware’s Web site is a page devoted to running Oracle on VMware ESX, with links to white papers, case studies and other resources.

ESX 3i is a great opportunity not only to reduce the local storage footprint requirement, but also to make additional RAM and CPU available to virtual machines. ESX 3i is a hardware-integrated hypervisor that delivers VMware ESX Server in a local footprint of around 32 MB.

The small footprint is attractive because it provides a quick install, minimal build time for adding hosts, and consistent configuration across the VMware Infrastructure. The other, more attractive and possibly overlooked, benefit is that removing the customized Red Hat Enterprise Linux (RHEL) operating system that hosts the hypervisor in ESX 3.0.2 and earlier can free up between two and three percent of local CPU and RAM resources. Alone, that is not much, but consider a large VMware implementation – saving that much on every host can effectively reduce the number of VMware ESX host systems you need by giving those resources back to the virtual machines. For example, if you have 100 VMware ESX hosts, then with certain host configurations you have effectively added the CPU and RAM power of 2 or 3 VMware hosts by removing the RHEL layer.
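As a back-of-the-envelope sketch of that arithmetic, here is the 100-host example in Python. The function name is my own, and it assumes every host recovers the same percentage of CPU and RAM, which real host configurations will not guarantee.

```python
def host_equivalents_freed(host_count: int, overhead_percent: float) -> float:
    """Return the capacity handed back to VMs, in whole-host equivalents.

    Assumption: every host recovers the same percentage of CPU and RAM.
    """
    return host_count * overhead_percent / 100


# The 100-host example from the text, at the low and high ends of the estimate.
for percent in (2, 3):
    freed = host_equivalents_freed(100, percent)
    print(f"{percent}% overhead removed: {freed:.1f} host equivalents")
```

Plug in your own host count to see whether the savings amount to a whole host or just a rounding error.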

This is a positive direction for the ESX product. For those of you who are historically Windows administrators, how frequently have you tried to do something in the ESX RHEL that didn’t quite turn out as you expected? My secondary virtualization mentor told me:

“If you don’t know anything about Linux or Unix – that will be great for an ESX administrator. If you do have experience there, don’t make assumptions based on the standard product” when introducing ESX.

For most situations where I have tried to do tasks outside of Virtual Center or the ESX install, some other issue has arisen. The exception, of course, is when VMware documentation gives Linux commands to perform tasks; David Davis’ recent blog on enabling SSH and SFTP on ESX is a good example. By removing that layer, the ESX product is more aligned with what it needs to do: providing horsepower to guest operating systems with central management.

Does the world really need another hypervisor? Oracle clearly thinks so, and it announced Oracle VM, its version of the open source Xen hypervisor. The cool thing is Oracle pledges to support a wide swath of its enterprise apps running in a VM. But for now, the company’s per-CPU pricing remains unchanged, negating much of the economic incentive to virtualize an application.

In the market for a new server? Dell’s new PowerEdge R900 server is seemingly tailor-made to run as a virtualization host. Among other impressive specs, the 4U behemoth features 32 dual in-line memory module (DIMM) slots, which, by my feeble math, equals 128 GB of total system memory (32 slots * 4 GB = 128 GB).

Or if it’s Hewlett-Packard hardware that your data center sports, know that there are some new virtualization features available in HP Insight Control, the company’s x86 server hardware management software. Among them are Virtual Machine Manager 3.0 and the HP Server Migration Pack, which will take you from physical to virtual and back again.

Editors’ note: Welcome to Virtualization Log, a new feature we’re trying out on the SearchServerVirtualization.com blog. Look here for a daily roundup of virtualization news and tips published on the main site and on sister TechTarget publications.

At Gartner’s Data Center Summit 2007 in London yesterday, analysts said virtualization will be the most significant factor in adding agility to data centers through 2012.

I think we already figured that, since virtualization can significantly cut back the number of servers, space, power and cooling demands in data centers.

The takeaway from Gartner’s declaration: If you aren’t at least looking at virtualization for your data center, you are falling behind businesses that already are, and that isn’t a good place to be.

Gartner had some recommendations to organizations planning or implementing virtualization:

– When looking at IT projects, balance the virtualized and unvirtualized services. Also look at the investments and trade-offs;
– Reuse virtualized services across the portfolio. Every new project does not warrant a new virtualization technology or approach;
– Understand the impact of virtualization on the project’s life cycle. In particular, look for licensing, support and testing constraints;
– Focus not just on virtualization platforms, but also on the management tools and the impact on operations;
– Look for emerging standards for the management and virtualization space.

So, you are very proud of yourself because you can roll out Windows virtual systems like popcorn, right? Well, don’t forget to ensure that you are using the correct version of VMware tools. This is important because it provides an optimized inventory of hardware for the guest operating system. On a Windows guest operating system (OS), take a look at the device manager and see how many devices have VMware, Inc. listed as the manufacturer for the device. The VMware tools will apply the correct drivers to the SCSI and RAID controllers, network interfaces, video display adapters, and many more.

Why Does This Matter?

The presence of VMware tools is good, but just as important is the version of VMware tools. Each VMware product has its own version of VMware tools, and if you migrate via VMotion or the VMware converter, your version of the tools may be out of date. Some devices will be recognized natively with obsolete versions of VMware tools, while others may not be identified in the Device Manager. The most likely candidate here is the network interface. For example, suppose you have a virtual machine hosted on VMware ESX 2.5.4 and you wish to migrate this guest to your newer VMware ESX 3.0.2 system. Your migration via your tool of choice will proceed correctly enough, but you may soon discover an issue with the VM.

Last week I wrote an article on IT environments that chose Network File System (NFS) for their shared VMware storage, and at least one large IT shop corroborates my story. An IT administrator at a well-known investment management firm writes that he runs 45 VMware ESX 3.0.2 hosts that run more than 1,000 VMs entirely on Network Appliance network-attached storage (NAS) 3070 boxes — and with great success.

“We haven’t seen any issue with speed, that is for sure,” he writes.

Before switching to NetApp, the firm ran its environment on EMC and Hitachi storage area networks (SANs). The admin described the latter as “a pain,” “expensive,” and suffering from SCSI lock, manageability and host bus adapter (HBA) issues.

By moving to NetApp NAS, the firm has also realized another benefit: improved data protection. “We also love the fact that we save a lot of money on the backup solution. We just use snaps to another NetApp — no agents, no tapes, no overpaid workers, no maintenance contracts on over 1,000 servers.”

Bless its heart, NetApp also chimed in on the article, taking umbrage at statements made by Fairway Consulting Group CEO James Price. About a year ago, NetApp began testing NFS for VMware at the behest of customers looking for more manageable storage but who were worried about the ability of NAS and NFS to scale. What they found is that “NFS is robust enough to run production environments,” said Vaughn Stewart, virtualization evangelist with the company. In the coming months, NetApp plans to publish results of tests performed in conjunction with VMware.

At the same time, NetApp is working with VMware to get the company on board with NetApp’s “NFS is good” message. As it stands, “VMware is inconsistent throughout its documentation about the role of NFS,” said Phil Brotherton, NetApp senior director of enterprise solutions. It may be a tall order, as the storage community has a strong bias in favor of Fibre Channel SANs.

But in Brotherton’s view, some of that preference is a bit self-serving. “A lot of people are trained on a technology, and that’s a good reason to be biased toward it,” he said, adding that many shops have sunk a lot of money into existing Fibre Channel infrastructure. “But I also see a lot of people try to spin technical arguments to justify what is really a sunken cost argument. . . . I would love to see the discussion move past performance to why people are really using NFS; performance is not the issue.”

When it comes to displacing SAN with NFS, our nameless IT administrator echoed Brotherton’s opinion. “I can tell you that there are some old-school SAN guys (myself included) that are scared that they might not be needed as much as they think. It is becoming easier and easier to use NFS for most everything. There are certain cases where a SAN is needed, but it is not necessary for every case.”

A recent discussion on the OpenBSD mailing list led to the assertion that virtualization decreases security. For those interested, a summary of the discussion is available on Kernel Trap. But proponents on both sides of the argument have taken to throwing about emotionally driven comments rather than thinking objectively about the subject. Of course, because the original comment labeled as “stupid” and “deluded” all those who think virtualization somehow contributes to security weaknesses, who can really blame people for getting a bit emotional? All the flame-war commentary aside, the question remains: does virtualization weaken security? The original argument that virtualization can diminish security was based on two points:

If software engineers cannot create an OS or application without bugs, what hope does a virtualization solution have to be bug-free?

x86 hardware is ill-suited for virtualization.

Bug-free software
The first point does two things: it lumps all software engineers, operating systems, and applications into one pool and assumes that it is possible to find bug-free code.

Addressing the sub-points in order, while it is true that software engineers are human (and we make mistakes) and that software in general has a track record of imperfections, it is also true that the world does not judge all software engineers or software to be the same. In fact, I would guess that a lot of members of the OpenBSD mailing list prefer OpenBSD to, let’s say, Windows. However, there are many readers of this blog who may prefer Windows to OpenBSD, or Linux, or OS X, etc. The same preference could be applied to office suites (OpenOffice, StarOffice, MS Office, KOffice, etc.). The fact of the matter is that we all have our own preferences: we do not judge software to be the same.

Secondly, the first point argues that the community should expect a bug-free hypervisor, and anything less contributes to the decrease of the overall security of a server platform. This is a very lofty expectation indeed! A very long time ago I wrote to Slashdot, heck, almost seven years ago now, and asked why it was not possible for developers to spend more time on projects and produce bug-free software. Commander Taco (the ring-leader of Slashdot) himself replied to me and said that it was a foolish expectation: software is 1) written by humans and 2) far too complex today to be without errors. However, people still judge some operating systems to be more secure than others. The same for kernels. How can such a judgment possibly be made if all software has bugs? The answer is “easily.” We observe the rapidity with which bugs are discovered in software, the impact that they have on the IT infrastructure across the world, the speed at which operating system and independent software vendors (OSVs and ISVs) release patches, and how easily those patches are applied without affecting the rest of the server platform, and then we judge the security of a piece of software. Therefore we do not judge a piece of software to be secure because it contains no bugs, but rather by the history of its imperfections and how quickly blemishes are removed.

Notice that I did not say whether or not I agreed with Mr. Slashdot. I do not. I do believe that software designed to be general-purpose, such as today’s OSes, is doomed to be bug-laden, simply because it lacks a specific purpose and too many conditions have to be accounted for. However, imagine if the same leeway were given to the software that runs our air-traffic control systems. Or military installations. Such software is held to a higher standard, and it can be in part because it is designed with a specific purpose. The same is true of hypervisors: they are designed specifically for one purpose. They do not yet have all the cruft and bloat sitting on top of them that today’s OSes do. Here’s hoping that the ISVs producing today’s leading virtualization solutions step up to the challenge.

In short, I believe that there is a reasonable expectation that hypervisors will be a lot more secure than general purpose applications written on top of general purpose OSes could ever hope to be.

x86 hardware
Yes, the x86 instruction set was never designed to be virtualized, but to say that the instruction set has not grown well above and beyond its original intentions is to do an injustice to the original minds at Intel who took part in creating one of the most persevering pieces of technology to date. With the first set of virtualization extensions, those created to solve the problems of ring compression and ring aliasing, the x86 instruction set was given a breath of new life. And with the latest extensions, enabling live migrations of virtual machines across multiple generations of processor versions, the staying power of the x86 instruction set in a world with virtualization has been increased even further.

My point is simply that just because the x86 instruction set was not designed with virtualization in mind does not mean that it cannot work, and work securely. That is the beauty of x86: it can be extended to do what we need it to do. History has spoken.

Conclusions
With both of the original arguments shown to be false, is the conclusion of the original argument then reversed? Not completely. While virtualization does not decrease security, the potential for it to do so is there. Hypervisors are software, and although they are a lot less likely to have bugs than a general-purpose piece of software, bugs can still occur. However, to blanketly state that virtualization decreases security is far too general as there are many different implementations of virtualization. For example, if VMware ESX was found to have a memory sharing bug that allowed one virtual machine to read and write the memory of another, does this mean that XenSource is immediately compromised? Of course not. So even when a bug in a hypervisor is found it does not immediately mean that all virtualization is suddenly subject to the same problem.

As I stated earlier, the security of any software package is judged by the number of bugs historically observed, their impact (potential or real), and how quickly the parties responsible for said software fix the bug. While the potential is there, it is far too soon to observe whether or not virtualization decreases security. Only time will tell.

What do you do when you are giving a session on virtualization migrations, and no one shows up? You sit down and blog about how you are giving a session on virtualization migrations and no one showed up! In all fairness, TechTarget is giving away a Mercedes-Benz right now, so I think people are eagerly awaiting that drawing, and my session is right after lunch, AND it *is* a repeat of one I did earlier today. Yes, those are the things that I will tell myself tonight as I curl into a fetal position and sob in a corner.

DCD 2007 has been a natural extension of last year’s conference. 2006 saw virtualization come into its own in the enterprise, and 2007 has seen virtualization mature into a ready-to-use, ready-to-integrate solution for many of today’s data center related problems. One of those problems has been disaster recovery, business continuity, and business resumption. This year’s DCD conference has had no shortage of innovative and instructional sessions on how to create cost-effective BC solutions using virtualization. Two technologies that are paving the way for these SMB to enterprise BC solutions are iSCSI and 10 GbE. Virtualization requires shared storage to enable any of the features used in DR and BC scenarios (LiveMigration, Resource Scheduling, Power Management), but for the longest time shared storage has meant fibre channel SANs, an expensive proposition for many. The continued success of iSCSI and the eventual commoditization of 10 GbE will enable enterprise-class shared storage at a fraction of the cost for SMBs and enterprise data centers alike. Inexpensive, enterprise-class shared storage will result in the availability of virtualization’s high end features that contribute to DR and BC solutions for all organizations — small, large, and everything in between.

Once again, DCD 2007 was a natural extension of 2006 — the data center has continued to evolve around virtualization. What will next year bring? My best guess? To steal from a colleague, we are going to start hearing about the highly-utilized data center, the dynamic data center, the data center that can be managed and monitored from a single console. And who is to say that the data center management of the future will require any human interaction at all?

Simply put, ROI is defined as the “ratio of money gained or lost on an investment relative to the amount of money invested”. One formula used to determine ROI is “net income plus interest divided by the book value of assets equals Return On Investment.”
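To make the two definitions above concrete, here is a minimal sketch of both. The function names and the dollar figures are illustrative assumptions of mine, not numbers from any real project.

```python
def simple_roi(net_gain: float, amount_invested: float) -> float:
    """Money gained (positive) or lost (negative) relative to money invested."""
    if amount_invested <= 0:
        raise ValueError("amount_invested must be positive")
    return net_gain / amount_invested


def book_value_roi(net_income: float, interest: float,
                   book_value_of_assets: float) -> float:
    """Net income plus interest, divided by the book value of assets."""
    if book_value_of_assets <= 0:
        raise ValueError("book_value_of_assets must be positive")
    return (net_income + interest) / book_value_of_assets


# Illustrative figures only.
print(f"Simple ROI: {simple_roi(25_000, 100_000):.1%}")
print(f"Book-value ROI: {book_value_roi(90_000, 10_000, 500_000):.1%}")
```

Note that a cost-saving IT project fits the first form more naturally: the "gain" is simply what you no longer spend.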

In real terms, when you invest in a technology for your business, it’s about more than that. IT-related ROI often needs to provide cost savings, rather than generate revenue. In the case of virtualization for consolidation, this is often a simple calculation made difficult by many variables.

Variable 1) Power

This is a hot topic in the virtualization world, and has been ever since energy costs spiked and data center electric bills started going through the roof. Tracking the ROI of energy savings requires discipline, but in a large environment the numbers can be significant. It’s important to get a good baseline before implementing server consolidation via virtualization, which means getting the bills from the previous few years and calculating average monthly and yearly energy costs. Then, after the project is complete the process must be repeated and the results compared. Lastly, as the elapsed time periods pre- and post-project are matched up, the calculation must be re-run.

As an example, if you have an average cost of $50,000 per year for power over five years pre-consolidation, you need to break out each year and then each month, so that in the first month post-project you can compare against that same month the year before, then after six months against the same six months the year before, and so on. This shows how fluid ROI can be over time, and how important it is to be disciplined in tracking numbers like these to show success and failure rates over the long haul, not just the last quarter. Whether to include market fluctuations in power costs in your calculations is a decision I’ll leave to the reader. I personally don’t, because there’s one thing I can count on: Costs go up. If the bill is paid by the company as a whole and includes other loads such as cube farms and the cafeteria, the calculations can still be made, but good luck removing the non-IT variables if needed (like, say, the cafeteria closing for a month for renovations… that will cut power use dramatically).
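The same-period comparison described above can be sketched in a few lines. The monthly figures below are made up for illustration, and the function name is hypothetical; the point is only that each post-project month is matched against the same calendar month in the baseline, and the calculation is re-run as more months elapse.

```python
def same_period_savings(baseline_by_month: dict, post_by_month: dict) -> float:
    """Compare each post-project month with the same month in the baseline.

    Both arguments map a month number (1-12) to a power cost in dollars.
    Only months present in post_by_month are compared, so the function can
    be re-run as each new post-project bill arrives.
    """
    months = sorted(post_by_month)
    before = sum(baseline_by_month[m] for m in months)
    after = sum(post_by_month[m] for m in months)
    return before - after


# Hypothetical baseline (pre-project averages) and three post-project bills.
baseline = {1: 4600, 2: 4400, 3: 4100, 4: 3900, 5: 4000, 6: 4300}
post = {1: 3100, 2: 3000, 3: 2800}

print("Savings so far:", same_period_savings(baseline, post))
```

A positive result is money saved; a negative one means the bill went up over the matching period.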

Variable 2) Long-Term Staffing and Consulting

The hardest calculation of them all. How much did it cost for you to pay those consultants? How much time did your staff invest in the project, and how much is that time worth as an overall portion of their salary and benefits? Do you even calculate benefits as a factor in ROI? How much was spent on training and other job-related benefits during the time? Did a server fall on somebody’s foot and cause a Worker’s Compensation claim? How much time are staff members going to spend on administration? How does this impact other processes, and what’s the cost to them? The short answer is that you will spend more on virtualization experts, but less on hardware technicians, because there will be less hardware to break. This teeter-totter of staffing will carry over into several types of team – including networking, storage, etc. Tally the fully-burdened costs and compare them to pre- and post-project figures. Nobody likes to think about laying people off because they aren’t necessary anymore, but retasking is good for the soul, and often for the career of the retasked. That means you need to calculate the training costs outside of virtualization as well.

Variable 3) Infrastructure Hardware and Software

The easiest calculation of them all. How much did it cost you to acquire all of your assets, and over how long a period of time? What is the average cost per year for an average growth rate? How much can you then expect to spend over an equivalent period of time in the future using that average, versus how much you project to spend using virtualization-based server consolidation? If you use chargebacks, what do you charge and how can that be reduced? If you reduce chargeback costs, should you be factoring in their lower costs to your ROI calculation on a separate line item?

Variable 4) Services Reduction

That’s right – fewer services. Less management of services too. Backup and DR come to mind as prime services that can be reduced. A smart shop backs up as many virtual machines as it can using storage snapshots or virtual machine snapshots and then moves those snapshots to a remote location without the need for tape. That means no more tape pickups, which is a service reduction. Even for those shops that have systems where backups of the data in the guest machine still need to be completed, there’s a serious reduction in services because there’s a huge reduction in tapes used and stored. There are also faster restore times. For example, if a file server falls over due to an OS corruption caused by a conflicting set of patches, restore from the snapshot and you’re in business. No call for tape, no waiting for delivery, and only minimal downtime. This is just one area where services are reduced, yet greater service is provided. Others include provisioning new servers, which in a large environment is time-consuming and costly. Replacing dozens of servers sitting cold in a DR facility with a few hefty virtualized systems can reduce physical storage costs just in terms of rack space and square footage. Needless to say, the calculations for this vary from shop to shop, and you will have to find your own service reduction ROI points. Some places to look:

Reducing tapes

Reducing tape and DR facility storage fees

Decreasing time and personnel costs to prepare new infrastructure

Decreasing hardware support / warranty contracts

Variable 5) Service Increases

Availability comes to mind here – no more worrying about hardware failures requiring a huge restore window means a huge bump in availability numbers. In the case of DR, there’s most likely a pre-determined cost per picosecond of business downtime – that figure is just ripe for plugging into an ROI calculation (albeit on a separate line), because with tools like VMware’s VMotion, HA, and DRS, the time-to-recover from failures is drastically reduced. This means that the company is losing less money due to an outage, and therefore each tracked outage can be tallied up and compared to the pre-virtualization outages, yielding a good source of ROI from loss-aversion.
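Here is a rough sketch of that loss-aversion tally. The per-minute cost, the outage durations, and the function names are all assumptions for illustration; substitute your own tracked outages over equal pre- and post-project periods.

```python
def outage_cost(minutes_down: float, cost_per_minute: float) -> float:
    """Business cost of a single outage."""
    return minutes_down * cost_per_minute


def loss_aversion(pre_outages: list, post_outages: list,
                  cost_per_minute: float) -> float:
    """Downtime cost avoided, comparing equal pre- and post-project periods.

    Each list holds outage durations in minutes for the tracked period.
    """
    pre_total = sum(outage_cost(m, cost_per_minute) for m in pre_outages)
    post_total = sum(outage_cost(m, cost_per_minute) for m in post_outages)
    return pre_total - post_total


# Hypothetical outage logs (minutes) and a hypothetical $500 per minute.
pre_virtualization = [240, 180, 300]
post_virtualization = [15, 20, 10]

print("Loss avoided:", loss_aversion(pre_virtualization, post_virtualization, 500))
```

Keep this figure on its own line of the ROI report, since it is money not lost rather than money not spent.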

That’s the positive part of ROI – remember that ROI comes with a built-in double-edged sword – some costs will go up. In the services arena, you will pay more for the increased networking required for good remote DR. In the training arena, you will pay more for virtualization training. In salaries, you will pay more for virtualization experts. The list goes on. The “trick” of ROI is in being complete – finding all of the increases and decreases in costs that virtualization brings. I’m willing to bet that any environment with more than ten servers will get positive ROI in less than a year. The long and short of doing an ROI analysis is this – it’s a long, involved process that won’t give real numbers worth a darn if you don’t take the time to analyze your entire business-technology environment for the correct numbers. Claiming a positive ROI by server consolidation alone is a great win, but not at the cost of missing other aspects of your business’ ROI. To sum up, look at the following for sources of ROI:

Hardware Costs

Software Costs

Physical Storage Costs

Downtime Costs (averaged w/ equal periods, pre-project)

Consumables Costs (tapes)

Chargeback Costs

Salary and Benefits Costs

Training Costs

Consultant Costs

Energy Costs

Put these into two main columns: what you spent on the project and in production post-project, and what you saved from pre-project expenditures. Adjust for inflation and print.
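A minimal sketch of that two-column tally follows. The category amounts are invented for illustration, the function name is hypothetical, and the inflation adjustment is left out to keep the example short.

```python
def roi_report(spent: dict, saved: dict) -> dict:
    """Total the 'spent' and 'saved' columns and derive net savings and ROI.

    Each argument maps a cost category to a dollar amount.
    """
    total_spent = sum(spent.values())
    total_saved = sum(saved.values())
    net = total_saved - total_spent
    return {
        "total_spent": total_spent,
        "total_saved": total_saved,
        "net": net,
        # ROI is undefined when nothing was spent.
        "roi": net / total_spent if total_spent else None,
    }


# Hypothetical figures, using categories from the list above.
spent = {"Hardware": 100_000, "Software": 80_000,
         "Consultant": 50_000, "Training": 20_000}
saved = {"Hardware": 180_000, "Energy": 30_000,
         "Consumables (tapes)": 15_000, "Downtime": 75_000}

report = roi_report(spent, saved)
print(f"Spent {report['total_spent']}, saved {report['total_saved']}, "
      f"net {report['net']}, ROI {report['roi']:.0%}")
```

The same category can appear in both columns: you may spend on new hardware while saving on the hardware you no longer need to buy.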

The VMware Certified Professional exam isn’t the easiest exam in the world, nor the hardest (opinions diverge on whether Red Hat, Citrix, or Cisco wins that title for their premier certification levels), but it’s rapidly becoming one of the most sought-after enterprise certifications available. Since I’m still prepping for the test, and haven’t sat for it, I’m a little leery of making that statement, but I’ve sat through enough CBTs that are acclaimed for their similarity to the exam that I feel safe putting it out there, and I’ve taken the only class that VMware has geared towards the VCP exam. The question is why… Why is the VCP so valuable?

Because it’s a hot technology in the truest sense of the word, and one that isn’t likely to go away anytime soon. The story is slightly reminiscent of another darling of Wall Street, Citrix Systems, which appeared to fill in a void left by the need to remotely access applications and which has consistently improved its product lines and scope of business over the years. Interestingly enough, Citrix and VMware are about to go head to head in the market for virtualization customers following the Citrix / XenSource deal, and Citrix has always had a strong certification program (perhaps we’ll see some new certs, maybe CCXA – Citrix Certified Xen Admin or maybe CCVA – Citrix Certified Virtualization Admin). VMware, like Citrix, has made its mark with a single product (ESX), and then branched out to add more and more to the product line, always ensuring that there was a clear line of sight back from all of the added products to their main line (aka their “Core Competency” for those paradigm-shifting process re-engineering, value-adding self-empowered framework info-architects out there). This focus will make VMware an adopted technology almost everywhere, and another indicator of VMware’s “hotness” is its impressive number of customers. VMware claims 100% of the Fortune 100 and another 20,000 enterprise-level customers. That’s an impressive stock of customers in a relatively short lifespan.

Look no further than the IPO for proof that VMware is hot:

What do these numbers mean when you put them together? People are using the product at all levels of business, and business is expanding into new market spaces, new market locations, and new customers. This means jobs are open for qualified staff, contracts are available for qualified consultants, and in both cases, money is there to be made. Like the MCSE when Windows NT4 debuted, the CNE when Novell 4 debuted, and all of the other business-transforming certifications, having the VCP is a ticket to a higher salary. In fact, the VCP is one of the hottest salary items out there, judging from this comparison I made using the tools at indeed.com (in US dollars, covering the US market):

Looking at those numbers, I’m astonished… a single-test exam cert beats out all but one of the other infrastructure certs in that list, including the Microsoft Certified Systems Engineer (seven exams at minimum), the Red Hat Certified Engineer (three exams, one of which is an on-site, hands-on lab), and the CCEA (five exams, one of which is also a hands-on lab). This is the market in a nutshell – the growth of virtualization is fueling a need, one that exceeds the need for other certifications, and so demand is driving up the earnings potential of VCPs just as much as the expertise level of the cert, if not more. In the long run, it will fade back in with the pack, but by then VMware will have undoubtedly expanded its certification program to include the typical technician/sysadmin/architect tier that most certification tracks fit into. The long and short of why VMware’s certifications will remain valued above other certs is that all other non-hardware, IT-related products can run on top of VMware, inside guest machines. This in turn means that at the basic level, VMware will be more important than them all because it’s the root of the tree. Trained, certified people will command bigger salaries because VMware’s root-level position means it will also be the top of the food chain – hardware / virtualization / operating system / application.

This isn’t just a US trend; there are numbers from many countries as well. I qualified the chart above by its location because there’s also this graph from itjobswatch.co.uk that I’d like to share:

Want to travel the world? Get VMware certified and go to work for the right company!

The really great thing about the VCP is that it’s cross-disciplinary – you have to have (or learn) some good all-around IT knowledge to pass the exam. Much like Citrix’s exams, you need to know more than just the core product. For the Citrix CCA and up, it takes Windows, networking, and Citrix knowledge. For the VCP exam you need to know a little bit about operating system administration, hardware configuration, shared storage, and networking in order to pass, because each of these plays a role inside the Virtual Infrastructure platform. Personally, I can’t wait to take the exam – it’s not going to have one little bit of impact on my current position or any probable future positions (I’m at the top of the food chain in my day job), but I want it for the same reason I hold a CCA – at heart I’m a generalist who loves to know how things work (the official line is “so that I can understand their impact on the business and their relevance to meeting the shared goals of our company’s mission”, but I have to admit, I also enjoy the learning and the tinkering for their own sakes).

It’s not just a certification, like the slew of them we can all put on our resumes to impress the next interviewer; it’s a career-enhancing move. Does the certification make you any more of an expert than you may or may not be? That’d be debatable on a case-by-case, person-by-person basis, but it’s definitely a mark that you know what’s hot, and aren’t “stale”. The hotness-factor of the VCP is high, the demand is there in the market, and the value a VCP can add to your career makes it worth the time and effort to earn.