I have recently received a significant number of requests to add Red Hat’s KVM-based Red Hat Enterprise Virtualization to the Virtualization Matrix. After several long flights and weekends … here it is.

With the RHEV 3.x release, Red Hat created an offering that many of my clients perceive as a cost-effective, “good enough” (open-source based) alternative to e.g. VMware for many use cases, and the 3.1 release improves on this further.
Red Hat is up against strong competition for the coveted “alternative virtualization vendor” spot. While Microsoft has released the “all inclusive” Windows Server / System Center 2012 release (already added to the Virtualization Matrix) with greatly improved virtualization and included private cloud capabilities, Citrix is starting to carve out a niche for the desktop virtualization and cloud service provider market with XenServer and its (Apache CloudStack based) CloudPlatform suite.

What’s new with RHEV 3.1?

So what’s new in 3.1? Among the most anticipated features are the ability to perform live snapshots (now possible) and the ability to perform live storage migration (now a Technology Preview), but there is far more in this release:

Admin Portal:

Cross-platform User Interface – the new web admin portal interface, introduced as a technology preview in RHEV 3.0, is now fully supported

Windows independent – it is now a complete replacement for the Windows Presentation Framework (WPF) interface used in previous releases, including internationalization and an improved user experience

Reporting – reporting functionality is now exposed from within the Administration Portal itself (standalone Reporting Portal is still available). Reporting dashboards for the system, specific data centers, or specific clusters, are now available from the Dashboard tab.

Tasks – a “Tasks” tab has been added to the Admin Portal to monitor long-running operations and tasks

Guests:

Windows drivers – virtio-win drivers for Windows guests are now available as inf and ini files on the guest tools ISO (in addition to virtual floppy)

CPU Pinning – it’s now possible to pin the virtual CPUs (vCPUs) of a guest virtual machine to specific physical CPU cores on the host from the UI to control performance aspects
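Under the covers RHEV hosts run KVM with libvirt, and pinning a guest’s vCPUs from the UI corresponds conceptually to a libvirt `<cputune>` definition like this sketch (the vCPU and host-core numbers are purely illustrative):

```xml
<!-- Illustrative libvirt domain fragment: pin vCPU 0 and vCPU 1
     of the guest to physical host cores 2 and 3 respectively -->
<cputune>
  <vcpupin vcpu='0' cpuset='2'/>
  <vcpupin vcpu='1' cpuset='3'/>
</cputune>
```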

P2V – a new physical-to-virtual conversion tool (previously only V2V was available) – it provides an ISO image to boot the physical machine from CD or USB, select its disks and export them for conversion to a virtual machine

Network:

Hot Plug for vNICs – Hot plugging and unplugging of vNICs attached to a vm is now supported (without stopping the vm)

Bridge-less Network Support – it’s now possible to define logical networks on a virtualization host without requiring a bridge to support that network (except if the logical network is marked as a “virtual machine network”)

New Network Setup Dialog – add or remove networks, add or remove bonds, and attach networks to bonds or detach networks from bonds in a single transaction.

Port Mirroring – configure the virtual Network Interface Card (vNIC) of a virtual machine to run in promiscuous mode, allowing the vm to monitor all traffic to other vNICs (e.g. useful for intrusion detection)

Configurable Maximum Transmission Unit (MTU) – configure the MTU of a logical network from the UI

Storage:

Live Snapshots – Snapshots of a vm can now be created without first having to stop it

Clone Virtual Machine from Snapshot – support for creating vms from snapshots

Floating Disks and Shared Disks – Floating Disks can be attached, and detached, from virtual machines throughout the data center as required; Shared Disks are disks that are attached to multiple virtual machines at the same time.

Hot Plug of Disks – attach disks to, and detach disks from vms without first having to stop the virtual machine

Direct LUN Support – attach any block device to a virtual machine as a disk by specifying the block device’s GUID (without VDSM hook script)

Cross Storage Domain Virtual Machines – create a vm which has disks on multiple different storage domains (previously all disks for a virtual machine had to be stored on the same storage domain)

Automatic Storage Domain Recovery – when a storage domain becomes temporarily inactive or non-operational, RHEV-M will now automatically recover and update the status of the storage domain when it becomes available again.

Linux Command Line Interface – a CLI for interacting with RHEV-M (the manager) via the REST API is now available.

Python Software Development Kit (SDK) – a Python SDK for interacting with RHEV-M via the REST API is now available.

Session Support, non-administrative User API Access
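To give a flavour of the REST API that both the CLI and the Python SDK wrap, here is a minimal Python sketch that builds (but does not send) a request listing all virtual machines. The hostname and credentials are placeholders, and the `/vms` resource path reflects the RHEV 3.x REST API; in practice you would use the SDK or CLI rather than raw HTTP.

```python
import base64
import urllib.request

# Placeholder endpoint and credentials -- substitute your own RHEV-M details.
RHEVM_API = "https://rhevm.example.com/api"
USER = "admin@internal"
PASSWORD = "secret"

def build_vms_request(base_url, user, password):
    """Build (without sending) a REST request that lists all VMs."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(f"{base_url}/vms")
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Accept", "application/xml")  # RHEV-M answers in XML by default
    return req

req = build_vms_request(RHEVM_API, USER, PASSWORD)
print(req.full_url)  # https://rhevm.example.com/api/vms
```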

As I said before, interest in RHEV is great, but actual adoption will be determined by Red Hat’s ability to expand the ecosystem around RHEV (competing with the massive ecosystem around VMware and Microsoft’s naturally extensive ecosystem around the Windows platform).
Enhancements in the management of peripheral and operational aspects will be key (beyond the fundamental virtualization platform management). In addition, Red Hat will be keen to create a well-articulated cloud strategy, positioning its current CloudForms (IaaS) and OpenShift (PaaS) platforms, specifically in the context of OpenStack.

The recently announced acquisition of ManageIQ (Cloud Management and Automation) could provide the needed arsenal to bolster its capabilities in these areas …

Ahead of the general availability of System Center 2012 SP1, I’ve added an extensive feature comparison of Hyper-V in Windows Server 2012 and System Center 2012 SP1 (100+ individual features) to the Virtualization Matrix.

The matrix now includes a separate listing for each edition of Windows Server 2012 with its virtualization / cloud related features:

Windows Server 2012 Standard

Windows Server 2012 Datacenter

Free Hyper-V Server + Free Management

Free Hyper-V Server + System Center 2012 SP1

A couple of comments:

Please note that SP1 (System Center) is required to manage Windows 2012 hosts (GA expected early 2013), so all System Center related features listed are based on SP1.

You can select the appropriate ‘Editions’, e.g. Standard, Datacenter or Hyper-V Server, under the vendor (Microsoft) and product (Hyper-V 2012) – then simply click on the “refresh” button (see picture)

A new tick-box now allows you to include “previous versions” in any comparison, e.g. older versions of vSphere, Hyper-V and XenServer and – soon to come – Red Hat Enterprise Virtualization (RHEV) 3.1.

As the free ‘Hyper-V Server 2012’ can be managed either with the (fee-based) System Center (SP1) or with the included (free) management tools (Server Manager, Hyper-V Manager, PowerShell etc), I’ve decided to add separate “editions” for ‘Hyper-V Server 2012 with System Center’ as well as ‘Hyper-V Server without System Center’. This will avoid the confusion I’ve seen arising when mixing ‘fee’ and ‘free’ in any comparison.

What’s New in Windows Server 2012 and System Center 2012 – the main improvements, amongst many others, are included in the Virtualization Matrix comparison.

Some IT departments now have to prove that Hyper-V is “not good enough” if they want to justify continued use of VMware for all workloads

Given the interest I have seen from clients it is clear that Windows Server 2012 & System Center will make a significant impact in the virtualization space this year. The long list of enhancements is impressive. But the most appealing aspect is the “all-inclusive” packaging that comes with Cloud, Desktop Virtualization as well as physical and virtual operations management that specifically appeals to the small / medium enterprises.

I’ve had comments from several of my clients (including larger ones) with existing MS licensing agreements that they now have to prove to their management that Hyper-V is “not good enough” if they want to justify continued use of VMware for all workloads. So the tables have turned a little.

But let’s also be realistic, while some VMware customers might be longing for the “second-vendor” alternative, they are also used to a mature, intuitive and sophisticated product.

Microsoft will have to demonstrate that they can match these high expectations over time, otherwise potential customers could simply end up using this “alternative” as leverage for price negotiations with VMware – certainly a scenario Microsoft will want to avoid …

Enjoy!

Andy

PS Hover over the individual feature or click on it (pop-up) to get background information and details on the evaluation (see below for an example).

Eyes glaze over, the meaningless automatic nodding starts and you can feel the person’s mind is miles away … yes, I admit, I have had several such fruitless attempts at explaining the concept and benefits of ‘stateless versus dedicated desktops’.

The inconsistency of today’s “VDI” terminology doesn’t help, and that includes the description of the relationship between the user and the desktop image.
Stateless desktops are often also referred to as ‘non-persistent’, ‘pooled’ and ‘dynamic’, and dedicated desktops as ‘persistent’ or ‘private’.

The image deployment model has a fundamental impact on the arguably most important metric for VDI – cost per user – so getting your point across to even the potentially less technical folks is imperative.

… I hope your eyes haven’t glazed over yet …

Actually it’s simple – a stateless desktop is to the IT administrator what a hotel room is to a property manager.

Assume a property manager (IT administrator) has been tasked to provide and maintain accommodation (virtual desktops) on a tight budget to a large number of tenants (VDI users … you get the drift …). He knows that unless housing is considered functional and homely by the tenants (user experience) the project will be considered a failure.

He evaluates two approaches:

Hotel apartments (stateless desktops)

Residential area with private properties (dedicated desktops)

Let’s see how these approaches compare …

Hotel: (stateless desktop)

The tenant checks into the hotel and gets any available apartment (desktop) allocated.

On check-in the tenant typically brings their suitcase (user profile) that they use to populate or customize the apartment with personal things they need or like while in it.

They use it for a period of time and check out when it is no longer needed, which means all personal belongings are taken away and stored in their suitcase for the next visit.

The apartment will be cleaned (“reset”) so that the next tenant finds it “as new” (changes to the desktop will be discarded; making the desktop itself “stateless”)

The apartment will then be made available to any other potential tenant.

The next time our tenant checks into the hotel he/she will (most likely) get a totally different apartment (remember … desktop) and won’t care as long as it provides the same functionality.

Tenant’s View (user):

Functionality: Good, as the apartments are equipped with all the facilities and appliances commonly required e.g. kitchen, bathroom etc (MS office, Mail applications etc. built into the Golden image)

Personalization: Typically adequate; the level of personalization varies. Depending on how big your suitcase is and what you are allowed to bring, the hotel chain will provide centralized storage for permanent personal items outside your apartment or hotel (network drives, folder redirection), and some even provide the equivalent of a personal designer service that allows for advanced customization of your apartment to make it really feel like yours (advanced profile management software like AppSense, RES etc).

Major functionality upgrades: The hotel will obviously not allow you to buy a personal home cinema system (your favourite PC game) and permanently install it in the hotel apartment. You could … but be assured that it won’t be there next time you check in (remember, you’ll check into a different room and changes you make to the apartment get cleaned up anyway). The hotel could however provide custom services (applications) through alternative methods if required, think of it as ‘room service’ without having to install a kitchen (e.g. application streaming or XenApp publishing).

Property Manager’s View (IT administrator):

Build and Maintenance Effort: Low – a collection of standard “cookie cutter” apartments from a common blue-print (golden image). NB: I’ll avoid the delta disk/Linked Clone analogy. A common set of furniture and appliances (apps) can be maintained across all apartments. Any custom services like room service that complement the base functionality require additional facilities (cost) but can be handled centrally (e.g. streamed apps).

Availability Requirements: Low – if an apartment becomes unavailable due to scheduled maintenance or unforeseen problems (flooded bath = image corrupted through user error) the tenant can simply check out and check in to another apartment (connect to another desktop). There is no dependency between the tenant and an apartment (user and desktop image). Even if the entire hotel experiences a power cut (host failure) with all apartments becoming unusable, the tenant can simply check into an apartment in another hotel as long as total capacity across all hotels is sufficient.

Utilization: The apartments can be oversubscribed (need to accommodate number of concurrent tenants only)

So the stateless desktop provides the user with a set of common capabilities and applications, plus a mechanism to personalize, use and permanently store personal data that is accessible from any desktop, but natively does not allow you to install personal applications into the image. You will never own the desktop, but the user experience is close to that of a privately owned one – suitable for most users.

The stateless desktop allows the administrator to build desktops from a common image base that is easily deployed and maintained; the stateless desktop itself does not need to be made highly available and can easily be replaced with another available desktop if the desktop (or host) becomes unavailable.

The critical personal data is logically separated from the common image (using ‘suitcase’ and ‘centralized hotel storage space’). This results in greatly reduced storage, availability and backup requirements, allowing the use of cheaper local storage for the common desktop image files as described in detail in this post here.

In Contrast – Private Properties (Dedicated Desktops):

Properties are built from a common blue-print, with custom-built “executive” houses also on offer

Even if they are built from a common blue-print, tenants can and will customize them in any way they want, so over time each property becomes unique.

Tenant’s View (user) :

Functionality and major upgrades: “Unlimited” – the properties are already equipped with common facilities (applications) and the tenant can install any additional ones they would like.

Personalization: Great, the tenant owns the property, they will personalize every aspect of the house and permanently store personal items anywhere in the property (personal data anywhere in the image).

Property Manager’s View (IT administrator):

Build and Maintenance Effort: High, any custom build will require additional design effort (image). Even if the initial build is from a common blue-print the property becomes unique over time anyway. Maintaining and supporting these additional facilities (applications) as well as controlling compliance with property regulations increases cost significantly.

Utilization: The property is yours; if you are not using it, no one else can – it will remain empty (desktop unused). No oversubscription is possible.

Availability Requirements: Very high – if the property becomes unavailable due to scheduled maintenance or unforeseen issues, the tenant is ‘homeless’. In the event that the home itself (your desktop) gets destroyed, it would have to be rebuilt from scratch (assuming your property manager maintains updated “build plans” of your ever-changing property – image backups) – all driving up the maintenance cost for your property significantly. If the infrastructure (e.g. electricity) running your private property fails (host failure), the property will be unusable unless it has been built with redundant/shared facilities that can take over and run your property instead (host-level failover using shared storage) – again driving up cost massively. There is an inherent dependency between the tenant and the property (user and his/her desktop image).

… Reality is that the tenant will probably check into a ‘hotel’ at this point ;)

So while the tenant – or rather the desktop user – will appreciate the potentially unlimited level of personalization and upgradable functionality, this scenario is a nightmare for the IT organization.

Maintenance of a large number of unique images requires careful backup and availability planning, maintaining the additional applications (or correcting issues they can cause) will result in significant administrative overhead compared to stateless images.

The infrastructure required to run these highly available images will drive up cost significantly – specifically through a drastic increase in shared storage requirements.

The private property approach is however the one we are used to (who wants to live in a hotel?) … and for VDI users with specific requirements or simply executives who want maximum functionality whatever the cost a dedicated desktop has its place. We often see hybrid deployments and the key to success (reducing cost) is a careful user categorization and analysis of functional requirements to increase the share of stateless desktops in your environment.

Future:

We have seen technologies becoming mainstream that blend the two approaches. They have been around for some time as point solutions like Unidesk but are increasingly integrated into the vendors’ own suites, with Citrix’s personal vDisk being a great example.

Imagine you are in a hotel that provides the futuristic feature of a “floating” personal room that can be detached and magically attached to any of the apartments.

The tenant is allowed to store any personal items and even install the above mentioned home cinema system (or any other personal applications) in this “floating personal room” (personal vDisk).
When the user moves between apartments, the personal room is detached and reattached to the new apartment, retaining the personalization and functionality it provides over and above the standard apartment – even if the apartment was cleaned or refurbished (image reset or recomposed).

If you are familiar with VMware View’s “persistent disk” or Verde’s “user disk” implementation, you know that this personal “room” exists today but can only be used to store your suitcase items (profile) and items you’d have put into hotel storage (my documents etc.), surviving a clean of the apartment or even a refurbishment (reset or recomposing of the image). If, however, you decided to install the above home cinema system (a personal application) in this room, it would still be there after the apartment was cleaned/refurbished but would not function anymore.

Why? Well, the installation of these applications also makes changes to the base image – think of it as installing a power junction in the standard hotel apartment (not your personal room) to power your home cinema system. There is no intelligence that tracks the dependencies and changes, so when you try to reattach the magic floating room to a new apartment the required power junction is simply not there. The home cinema system is still physically in your room but won’t function.

When using the personal vDisk a filter driver in the image will track all changes and ensure that they are routed to your “personal room” and more importantly that they continue to exist in isolation from the base image (think of it as installing a duplicate power supply in the floating room rather than utilizing the existing one in the apartment).

The result is a model that preserves the best of both approaches in the VDI world: a stateless base image (with all the associated benefits) combined with a “layer” (room) of personal applications and customizations, requiring only those to be highly available and backed up rather than the entire image.

Even the personal vDisk has limitations today: in reality it does not float automatically between desktops (following the user) but is associated with the desktop and needs to be manually reattached to the new “hotel apartment” by the administrator in recovery situations. But we are halfway there, and other vendors are working on similar functionality (e.g. VMware’s Mirage).

It’s just a question of time … and … well, inventing floating hotel rooms …

When we introduced a building block approach to our reference architecture many questions from the wider team revolved around the scaling maximums and limitations of the respective desktop virtualization solution in order to create valid configurations and correctly sized building blocks.

It became quickly apparent that while e.g. in VMware’s case vSphere maximums were well documented, the virtual desktop specific guidelines are scattered around in different documents, some are not listed at all and others (e.g. storage related) are based on 3rd party vendor recommendations rather than limits specified by VMware.

How many systems per “cluster”, how many VMs per replica/base image or LUN, how many broker or management servers per building block?

So I thought it’d be worthwhile to summarize the (high-level) guidelines and assumptions we used for View, XenDesktop and Verde (verified by the respective vendors) in this post.

In a nutshell, understanding the scaling limitations allows you to “assemble” systems into clusters, building blocks and larger constructs (PODs) by combining them with management and other peripheral components.

I’ve created a few graphics to show you the approach conceptually in the example below.

Figure 1 – Step 1: Assembling hosts into a cluster and adding storage and management components to create a building block for XenDesktop on vSphere

Figure 2 – Step 2: Adding broker and other access components to the building block for user access

Storage:

Storage maximums with VMware View are less clearly defined (mainly best practices have been published)

Typically the recommended maximum number is 64-128 linked clones (VMs) per VMFS datastore

VAAI storage systems typically allow for numbers higher than 128

(the limit without VAAI is primarily driven by SCSI reservations on the VMFS file system during metadata updates – with VAAI-enabled LUNs, ESX uses the atomic test-and-set (ATS) algorithm to lock the LUN, greatly reducing the impact)

Storage vendors have shown that an NFS-based datastore (NFS does not use VMFS) can support up to 256 and more linked clones (the actual theoretical maxima of VMFS and NFS for vSphere are higher).

Max 64TB per LUN (VMFS); the maximum size of NFS LUNs is often determined by the NFS storage array (check with your vendor)

The maximum size listed above is not intended to be a “recommended” number as this will be determined by various factors (performance per LUN, vm sizes, operational concerns like backup etc)

Max 1000 VMs per MCS base image (equivalent of View replica disk)
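As a quick sanity check of these numbers, the guideline of 64–128 linked clones per VMFS datastore translates directly into datastore counts for a given pool size. A rough sketch (the pool sizes are illustrative):

```python
import math

# Datastores needed for a linked-clone pool under the 64-128
# clones-per-VMFS-datastore guideline discussed above.
def datastores_needed(desktops, clones_per_datastore=128):
    return math.ceil(desktops / clones_per_datastore)

print(datastores_needed(2000))      # 16 datastores at 128 clones each
print(datastores_needed(2000, 64))  # 32 at the conservative 64-clone limit
```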

Management Server and Maximum Number of Connections:

The number of supported users depends heavily on the actual load generated by registrations or logins per minute, so the numbers below are only a high-level guideline.

Provisioning Server: A single virtual server (4 vCPU and 32GB RAM) will support approximately 1000 users
Always use N+1 servers for redundancy.

License Server: A single Citrix License Server (2 vCPU, 4GB RAM) can issue approximately 170 licenses per second, or over 300,000 licenses every 30 minutes.
Because of this scalability, a single virtual license server with VM-level HA can be implemented (if the license server is down, a grace period of 30 days is available).

A single Web Interface server has the capacity to handle more than 30,000 connections per hour.
Two Web Interface servers should always be configured and load balanced through NetScaler to provide redundancy and balance load (NetScaler design is outwith the scope of this paper).

For environments with smaller numbers of users (e.g. <500; actual numbers will depend on user activity) the Web Interface service as well as the license server can reside on the XenDesktop controller instance rather than on a dedicated server.

Citrix NetScaler Access Gateway: Provides secure remote access and single sign-on capabilities (e.g. from outside the corporate network). A single Access Gateway can provide 10,000+ concurrent ICA connections and should be deployed in N+1 configurations.
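These per-server guidelines can be turned into a simple N+1 sizing helper. A sketch using the approximate 1000-users-per-Provisioning-Server figure above (the user counts are illustrative):

```python
import math

# N+1 sizing for Provisioning Servers at ~1000 users per server,
# per the guideline above; the +1 provides redundancy.
def provisioning_servers(users, users_per_server=1000):
    return math.ceil(users / users_per_server) + 1

print(provisioning_servers(3500))  # 4 for load + 1 spare = 5
print(provisioning_servers(800))   # 1 for load + 1 spare = 2
```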

Virtual Bridges VERDE 6.0

One of the aspects I really like about VERDE is the fact that its scaling limitations are far simpler to deal with, as the product inherently provides an architecture that allows for easy horizontal scaling.

Rather than having to use dedicated management servers (potentially with multiple UIs) each VERDE Server comes with an integrated connection broker, a hypervisor to run VDI sessions, and is managed from a single Management Console.

Servers can be clustered together using Virtual Bridges’ stateless cluster algorithm; in addition, the Distributed Connection Brokering architecture eliminates any single choke point and therefore increases the scalability and availability of the VDI solution.

Max 10,000 hosts per VERDE cluster

Max 1,000,000 VMs per VERDE cluster

Max 2000 users per connection broker (typically each hypervisor host also acts as a broker, so the alignment of “users per server” automatically takes care of that)

Cloud Branch Servers are sized like regular clusters

Storage

Shared storage needs to be NFS based

Shared storage typically only contains the golden image and the VERDE persistent disks for VERDE user profile management (plus the small cluster metadata), greatly reducing the shared storage requirements!

Yes, it can be that simple … ;)

PS I have included more details on creating building blocks and the surrounding design consideration in the upcoming Redbook.
Please always check the latest vendor documentations for official (and updated) numbers where available.

As you probably know, View 5 (in conjunction with vSphere 5) introduced a software-based GPU function that gives users basic DirectX and OpenGL capability without the use of a physical GPU (unlike e.g. Citrix’s HDX3D, which requires GPU passthrough). Typical target use cases include Aero and low-end 3D animations, not “high-end” 3D engineering applications.

One of the questions I’ve been regularly asked is “what amount of overhead does this create?”

As we have no physical GPU in this scenario to execute the graphics related work (3D rendering etc.), the (general purpose) system CPU will have to perform this task. As the CPU is the most likely bottleneck in the majority of VDI deployments, enabling 3D should have a direct impact on user density – but how much?

So I wanted to qualify and quantify the impact of enabling 3D as part of our reference architecture testing. Using LoginVSI we investigated the maximum user density (VSImax) when:

Configuring 3D capability for all desktops in a View 5 pool without enabling an Aero theme (this configuration gives users the ability to run e.g. Google Earth but has no Aero animations configured)

Configuring 3D capability for the pool and additionally enabling an Aero theme for the users

Both results will be compared to the baseline (PCoIP connections to the same pool without any 3D capability enabled)

As the steps to enable 3D capability are not well covered in the View documentation, I’ll give a step-by-step log of what we did … in case you are less interested in the “how” – here are the high-level results:

Conclusion:

Enabling 3D capability for the pool and image causes essentially zero overhead. This means that you can enable 3D capabilities for a pool without impact on user density, even if you are not planning to use this capability in the near future. That allows users to run 3D applications like e.g. Google Earth if needed without having to reconfigure the pool (these 3D applications will of course create additional load when used).

Enabling Aero capabilities should be done with care. Only enable it for users who really require this level of user experience, as the impact on overall user density is significant.
As Aero is enabled at the user level, this can be done easily with the needed granularity.

As a side comment, I do like the View 5 capability to enable basic 3D functionality for users without a physical GPU, regardless of the (expected) overhead. I expect increased interest in the VDI/3D Graphics area over the next 12 months given e.g. the product announcements regarding physical GPU support with NVIDIA at VMworld 2011.
But we’ll also keep a close eye on Citrix which is currently arguably the market leader in the high-end space with HDX3D Pro capabilities. The recent NVIDIA announcements around the “Kepler” GPU with virtualization capabilities are very promising as this approach will overcome the crippling 1:1 (1 vm per physical GPU) relationship with traditional GPU passthrough.

If you want to understand how to enable 3D/Aero on a View 5 pool simply read on …

Pool capabilities without 3D

We have an existing View 5 pool with 80 virtual desktops. The LoginVSI benchmark measured that we can support up to 74 multimedia users (VSImax) on our server (dual socket, 12 cores) when connecting with PCoIP.

Let’s test first what happens if we try to run a 3D app in a non-3D enabled vm …

We connect to the Internet and try to install Google Earth – it fails with:

Applying any of the above suggested actions will not resolve the issue

Additionally when we try to configure Aero in the vm it fails (as expected).

The above confirms that you can neither run 3D (DirectX/OpenGL) applications nor enable Aero effects for users in a non-3D enabled pool.

Configuring 3D

Before you start, remember that you will need both vSphere 5 and View 5 in order to configure 3D. With virtual hardware version 8, the VMware Tools install the 3D-capable graphics adapter as the new default adapter, so ensure that you have the Tools updated. For this article we assume that you have already installed and configured View 5.

Preparing the Master Image for 3D

The first step in enabling 3D is to prepare the master image (for the linked clones). The View user documentation is a bit loose here and seems to omit that you need to prepare the master image before configuring the pool settings.

Note: In order to prepare the image for e.g. Aero use, you will have to manually perform the following operations on the master image before creating the pool: perform the configuration changes and then take a snapshot for the linked clones.
As Windows was initially installed on a non-3D capable system, Aero is not enabled, nor has Windows established the (required) User Experience Index or attempted to enable the required Aero-related services.
Therefore, if you only enable 3D for the View pool (without the following image changes), the updates required to enable Aero will be missing, and any changes you perform as a user in the individual vms will be discarded on refresh (e.g. logoff). The result is that you would e.g. be able to run Google Earth (if installed) but not Aero.

We took a clone of our existing master image to have an independent image stream (snapshots) for 3D

Open a console to the cloned master vm (referred to as “3D Master” from now on)

Important: Shutdown or power off the vm after the reconfiguration – do not just “reset” the vm

We then tried to install Google Earth again – this time it installed OK and worked as expected!

At this point we took a snapshot “Google Earth without Aero enabled” which we will use later to test the supported number of users with LoginVSI.

Enabling Aero

We went on to prepare the image for Aero. As you might know from personal experience, enabling Aero (after installing Win 7 on a non-3D capable system) can feel slightly random, but here are the steps that worked for us:

Most VDI optimization procedures will disable services required for Aero functionality (including the official VMware optimization script):

The easiest way to enable Aero is to go to > Control Panel > Troubleshooting > Display Aero Desktop Effects > and follow the wizard ..

This time (as 3D was enabled in the vm settings) the wizard was able to fix most issues. Even though it wrongly indicated that the Desktop Window Manager service was disabled, the applied actions were successful.

Go to > System Properties > Advanced and apply “best appearance” – at this stage we still weren’t able to select advanced Aero options (e.g. Aero Peek)

The important step is to run the “Windows Experience Index” assessment for the system to confirm the appropriate 3D capability

Confirm that the 3D capability was recognized

Now we can enable advanced settings like the Aero Peek appearance

At this stage we created a second snapshot “Aero Enabled”

We now have two snapshots for the same master image: one with Google Earth installed but Aero not enabled, and a second with 3D and Aero capabilities enabled. This allows us to test the overhead of enabling general 3D capability as well as the delta of running Aero for all users.

Note: The Aero theme is a user setting, not a computer setting, so even with our second snapshot we will still have to enable an Aero theme for users of this image to get Aero working in the linked clone desktops.

Create a new pool, or recompose the existing pool by editing its pool settings with the desired snapshot.

Enable 3D as seen below; this setting will automatically enable 3D for all images of the pool (so you don’t have to edit the individual virtual machine settings)

For our test we selected the maximum amount of VRAM

You will see a “reconfigure virtual machine” task in vCenter – after the reconfiguration you can verify that the “enable 3D support” tick box has been set for all the virtual machines in the pool.

You are not done yet: in order for Aero to be effective, ensure that you have enabled Aero for the user (e.g. using GPOs as shown in the screenshot below). The default user theme is a non-Aero theme when installing Win 7 in a non-3D capable vm. So while you might assume that at this point you have enabled Aero for all users of the pool, a test will show you that users logging in will not have Aero capabilities.

Enable an Aero theme as shown for all users of the pool:

Determining the overhead (impact on user density)

We simply used the respective snapshot to recompose our View pool and ran another series of LoginVSI tests.

The first test was done using the 3D enabled snapshot, with neither Aero configured nor an Aero theme enabled for the users:

As you can see above, the test shows that there is basically no overhead when enabling the 3D capability on the pool; the user number stayed easily within the 5% bracket that we allow for VSImax fluctuation when running tests (we took the averages of 3 tests).

We then recomposed the pool with the image that had 3D enabled and Aero configured, and enabled an Aero theme for all 80 users of the pool.

After the first test run we reduced the number of vms to 60 in order to avoid a skewed result due to many idling virtual machines.

The result is shown below:

The Aero-enabled test clearly shows the overhead created by the additional Aero graphics workload. The impact on the CPU of emulating 3D functionality causes the number of supported users to drop by over 35%.
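The density numbers above can be sanity-checked with a quick back-of-envelope calculation (the VSImax baseline of 74, the 5% fluctuation bracket and the ~35% Aero drop are the figures quoted in this article; your own values will of course vary per environment and workload):

```python
# Back-of-envelope check of the density figures quoted in the article.
baseline_vsimax = 74      # users supported without 3D/Aero (measured VSImax)
fluctuation = 0.05        # 5% bracket allowed for VSImax fluctuation between runs

# Any result above this still counts as "no measurable overhead"
lower_bound = baseline_vsimax * (1 - fluctuation)   # ~70.3 users

# Enabling Aero for all users cost over 35% of density
aero_drop = 0.35
aero_vsimax = baseline_vsimax * (1 - aero_drop)     # ~48 users

print(round(lower_bound, 1), round(aero_vsimax, 1))  # prints: 70.3 48.1
```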

OK, hopefully that gave you some insight into a) how to enable 3D and Aero and b) what type of overhead you should expect when doing so. The actual overhead will of course depend on the individual 3D workloads or animations you decide to run in your virtual desktops.

It is well understood that a key inhibitor to VDI solutions (amongst general complexity, technical limitations and migration effort) is the upfront capital cost. As I’ve been leading this project architecturally I want to elaborate a little on the importance of our storage design approach.
Just this month, two of my (larger financial) customers have estimated 40% and 50% respectively of the projected VDI project CAPEX to be related to enterprise storage, primarily due to the specific IOPS requirements of VDI.
So let’s have a closer look at the storage approach for our RA … As you can tell from the above, our desired (but of course not the only) use case will be the pooled “non-persistent/stateless” desktop that enables users to connect to a new/different desktop image every time they log in, while keeping aspects of the user experience persistent (profiles).
This allows the use of local storage instead of shared storage, as no user-associated data will reside persistently in the image; in the (unlikely) case of host failures, users can simply reconnect to a desktop hosted on another system without the need for e.g. VMware HA.
I will not discuss each of the architectural approaches in detail again here (e.g. persistent vs. non-persistent, positioning VDI vs. SBC etc.) and I ask for forgiveness for brushing over important alternatives discussed in previous articles on this blog.

Importance of in-depth performance analysis

As stated in the overview, the key design principle of our RA approach is to radically reduce the cost of VDI by utilizing local SSD storage instead of shared storage where possible.
Without going into great detail (see the final publication for details) I want to assure you that we have performed extensive analysis particularly of the storage related aspects in order to validate this approach. I ensured we measured and documented all IOPS performance aspects and monitored latency on all storage components. The collected data does not only validate the local SSD architecture but also gives us unmatched insight into the IOPS distribution and allows us to create sizing models for local and shared storage approaches which we will feed into new sizing tools. The above example illustrates the detail of the storage related data collected for every test (IOPS and latency measurements of a single test on each storage tier).

So, local instead of shared storage for VDI …

This approach is not new but unfortunately still not promoted widely enough. Why?
Ok, allow me to be slightly controversial … review the majority of VDI reference architecture documents out there yourself and you’ll see that they are primarily created with/by major storage vendors … now, would it be in your interest to promote local storage if your goal is to sell enterprise SAN/NAS? I’ll let you be the judge …
And yes, you could argue “what about you, IBM – you are a storage vendor, no?” – let me say that common sense does sometimes prevail ;)

So why have we decided to make the ‘local storage’ architecture the core of our approach?

To gain maximum return on investment, non-persistent/stateless virtual desktops should be the default approach in any VDI deployment (reduced storage requirements, minimised size & number of images, pooled images etc.). To be blunt – if one argues “I can’t do it with non-persistent, I need all desktops to be dedicated” then VDI is most likely the inappropriate approach anyway (e.g. high-end user requirements across the board) or the capability of “stateless” is misunderstood.

Cost: Shifting IOPS load from shared storage to local storage allows you to significantly reduce cost – get a quote from any storage vendor for the same capacity/performance configured on their enterprise SAN/NAS and compare it with the equivalent local storage cost and you will get a feel for the massive delta!

Building blocks (servers) with local storage allow for simple linear capacity scaling – add another system and you will get a linear capacity gain – no complexity in estimating the impact on shared storage.
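The linear scaling argument above can be sketched in a couple of lines (the users-per-block figure here is an illustrative assumption, not a measured result – your own building-block density comes out of the sizing tests):

```python
# Sketch of the building-block scaling model: with self-contained blocks
# (server + local SSD) total capacity grows linearly with the block count.
def total_capacity(blocks: int, users_per_block: int = 70) -> int:
    """Total supported users for n identical building blocks.

    users_per_block is an assumed illustrative density, not a measured value.
    """
    return blocks * users_per_block

print(total_capacity(1), total_capacity(4))  # prints: 70 280
```

No shared-storage headroom modelling is needed: each added block brings its own storage performance with it.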

So please approach VDI with non-persistent desktops and treat persistent as the “exception”! Local storage goes architecturally hand-in-hand with stateless desktops.
Of course the reality is that there will be “exceptions” in most deployments and we will absolutely cater for those, but let’s be clear: a deployment with 100% persistent desktops has little chance of (financial) success.

So what about those “exceptions”?

We all know that in most deployments you will be asked to provide persistent desktops. I’m sure I made clear that you should validate any “demands” for persistent desktops – do NOT assume that the requestor has already done that! However, if persistent desktops are indeed required, our architecture will provide a hybrid of persistent and non-persistent desktops with the same building blocks (local SSD removed) simply through the addition of external storage – win/win.

One more comment – you will probably be familiar with interesting third-party ‘SAN caching/optimization’ appliances like Atlantis’ ILIO (with their latest diskless feature) that try to address the storage cost issue. We have tested Atlantis in the past and seen very efficient offload, so I am absolutely not discounting solutions like that – there is a place for them, specifically if primarily persistent desktops are required, and we have worked with e.g. Atlantis in the past to provide ILIO-based solutions.
So why have we not included SAN caching appliances (at least at this stage)?

I’m a believer in simplicity – most VDI solutions today are clearly already too complex and non-integrated.
Introducing an additional layer (of 3rd party components) should only be done if the return absolutely justifies this.
From my experience the additional (licensing) cost and the additional support layer make the simple local SSD storage approach the preferred model where appropriate.
Costs for SSDs decrease rapidly while capacity and durability go up – SSDs are arguably becoming a primary storage technology.

Most major VDI vendors are increasingly integrating caching algorithms; you will be familiar with e.g. XenServer’s IntelliCache, VMware’s announced Storage Accelerator (CBRC) and Verde’s Storage Optimiser.
No, they are functionally not identical to e.g. ILIO (too big a topic to go into detail) – they address the issue in varying ways, and primarily only the “read” IOPS – which is great for ‘boot storms’ but less so for the majority of “working state” IOPS (which are writes). However, they are/will be vendor-integrated, provide at least a subset of the functionality natively as part of the product, are fully compatible with our local SSD approach, and I personally prefer having one throat to choke in case of any issues.

I suppose the summary of this is that I have yet to see a SAN or SAN optimization appliance based building block that will flat out beat “local SSD” on price and simplicity for non-persistent desktops …
Again, let me be clear: there is no one-size-fits-all approach and I am by no means implying that there won’t be cases where primarily persistent desktops, SAN + SAN optimization appliances or of course Terminal Services-like solutions are appropriate (or, in TS’s case, potentially even more appropriate) – I have made my view absolutely clear on this before.

“So what about shared storage then, are you telling me I can get away without it completely?”

Ehhm, I’d be a fool to claim that … so let me be clear. There are types of data, even in a non-persistent desktop environment, that you need to keep available from any host/desktop and that therefore need to be hosted on shared storage … primarily the bits that give the user the feeling of persistency, i.e. the user profile (desktop settings etc.) and any persistent user data (stored documents etc.).
This is nothing new and has been achieved through various methods like roaming profiles, folder redirection etc. for ages, and is increasingly enhanced through product features like VMware Persona Management, Citrix personal vDisk and Microsoft’s UE-V.
Bottom line is that you will need some shared storage …

Our architectural approach on this is clear and should (hopefully) make you happy ..

As explained above – we absolutely minimize shared storage requirements by placing the heaviest load on local SSD and only use shared storage for persistent user data and profiles (these are typically already on shared storage in physical desktop environments – so most likely no additional investment at all)

We understand that most of you already have a preferred storage platform – continue to use your own if you want to – our building block systems provide IP, FCoE and iSCSI based storage capabilities.

In order to further minimize shared storage cost and allow maximum flexibility we suggest a file-based (not block-based) storage system (NFS or CIFS) – again, this will depend on your environment.

In the next post we’ll move on to share more of the preliminary results – continuing with “enabling 3D and Aero capabilities in View 5 – impact on user density and step-by step instructions” – coming soon …

As promised before, I want to start sharing some of our experiences with our on-going (IBM) VDI reference architecture (RA) work.
I’ve discussed VDI patterns and inhibitors (based on real-world client engagements) in various previous articles so I’ll cut to the chase …

What have we been doing? We have set up three industry-leading VDI solutions in our labs (US and UK) and are performing architecture verification and LoginVSI performance tests on different sets of IBM hardware: IBM Blades, IBM Rack systems and the recently announced IBM PureSystems (which has great potential to become the ideal platform for VDI – I’ll get into more detail in another post).

A key design principle of our Reference Architecture is to address the arguably most common inhibitor to VDI – storage cost. Our approach will radically reduce the storage requirements for your VDI deployment by favouring local SSD storage instead of large-scale shared storage arrays.

We have been working closely with each individual vendor during the project and I’m grateful for their help (since our initial architectural workshop with the vendors last year).
It is also important to point out that the purpose of this document is NOT to compare the vendor solutions AGAINST each other but to demonstrate the suitability of our architectural approach on IBM hardware for each solution.

A key objective for the project is to determine the supported user density for individual workloads in order to create scalable building blocks and sizing models – essentially make sure our approach works, scales and gets you to the best price point.

That’s not all though – since I have the privilege to lead this effort architecturally I also wanted to provide additional value for the VDI enthusiast – or indeed you sceptics ;) – by investigating specific aspects of performance and user density.
I am frequently asked “will using PCoIP instead of RDP create any overhead?” or “does filling up memory in my server help or harm user density?” …

So what is the impact on user density (and therefore on the all-important metric of cost/user) when:

So we set out to determine the following values (example View 5 / dual socket Intel Westmere 6-core / 192GB RAM):

You can see that I’ve already listed some of the results as “teasers”. Note, for instance, the decrease in supported users when using PCoIP instead of RDP for all users: 20% fewer users is significant, but of course PCoIP provides e.g. advanced graphics and redirection capabilities. Also, the results are measured using the default settings (e.g. BTL enabled etc.) so tuning is absolutely possible.
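It is worth spelling out how a drop in user density feeds through to the all-important cost-per-user metric. A quick sketch (the server cost and baseline user count below are illustrative assumptions, not figures from our tests; only the ~20% PCoIP delta comes from the article):

```python
# How fewer supported users translates into higher cost per user.
# cost/user = server cost / supported users.
server_cost = 10_000.0               # assumed illustrative cost per server
users_rdp = 100                      # assumed baseline user count with RDP
users_pcoip = int(users_rdp * 0.8)   # ~20% fewer users with PCoIP (default settings)

cost_rdp = server_cost / users_rdp       # 100.0 per user
cost_pcoip = server_cost / users_pcoip   # 125.0 per user

print(cost_rdp, cost_pcoip)  # prints: 100.0 125.0
```

Note the asymmetry: a 20% drop in density means a 25% increase in cost per user.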

We’ll discuss and share the other results in the following posts. Of course, all the results (and more) will also be officially published (IBM Redpaper) as they become available (starting with VMware View); I will post the link(s) to any documents here as well.

Throughout our testing we have performed extensive performance analysis on all aspects (user density, IOPS, latency, network etc) and the findings will be fed into new VDI sizing tools that we’ll make available in due course.

So … a VDI approach that will allow you to reduce your cost per user, allow you to add scalable building blocks as you grow and all bundled with first-hand technical insight and sizing guides based on our testing … sounds interesting? If yes, then feel free to check out the following posts and upcoming publications …

PS A massive credit to the extended IBM team and the VDI vendor teams for their help with this project!

Background:

IBM SmartCloud Entry can notify users about important activities on the cloud (e.g. new project created, virtual machine deployed, request approved etc.). Google (and other common email systems) require SSL-based authentication.
IBM SmartCloud Entry currently uses non-SSL email for its notifications. By following this article you can use a relay server to forward the non-SSL mail (generated by SCE) to Gmail or other external mail systems.

In the process we will install a free Windows-based mail server that will establish an SSL mail “proxy” relationship to Gmail for you. Gmail will then send mail to any Gmail user directly or relay it to other mail systems.
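The flow we are building can be sketched with Python’s standard smtplib (hostnames and addresses below are placeholders for your environment): SCE hands a plain, non-SSL SMTP message to the local hMailServer relay, which then forwards it to Gmail over SSL on its behalf.

```python
# Minimal sketch of the relay leg: a plain (non-SSL) SMTP submission to the
# local hMailServer relay. Hostnames/addresses are placeholders.
import smtplib
from email.message import EmailMessage

def send_via_relay(msg: EmailMessage, relay_host: str, relay_port: int = 25) -> None:
    """Deliver a message through the non-SSL relay (hMailServer)."""
    with smtplib.SMTP(relay_host, relay_port) as smtp:
        smtp.send_message(msg)

msg = EmailMessage()
msg["From"] = "sce@local.yourdomain.com"   # the local domain configured in hMailServer
msg["To"] = "user@example.com"             # any external address
msg["Subject"] = "SmartCloud Entry notification test"
msg.set_content("Test notification relayed via hMailServer.")

# send_via_relay(msg, "relay-host")  # uncomment with your relay's hostname
```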

Pre-requisites:

Note: You will need a valid Gmail account that is used to authenticate any mail requests (it is suggested that you create a dedicated mail account)

hMailServer needs to be installed on a computer (or vm) that has access to both the SmartCloud Entry environment and the Internet

1 – Install hMailServer as Email Relay Server

You can use other products with similar functionality – the purpose of this article is not to endorse the product but to explain how to enable the function for IBM SmartCloud Entry.

Add a name for your local domain, e.g. local.yourdomain.com, and save it

Go to > Settings > Protocols > SMTP > Delivery of e-mail:

Local host name: enter “localhost” or the full host name (this value is irrelevant here)

Remote host name: “smtp.gmail.com”

Remote TCP/IP port: “465”

Important: Server requires authentication: yes (checked)

User name: user@gmail.com (enter a valid Gmail account that will be used to authenticate the relay requests)

Password: enter your Gmail account password

Important: Use SSL: Yes (checked)

Click “save”

Go to > Settings > Advanced > IP Ranges > Internet

Lower IP – Upper IP = leave as is (all access)

Other >

Anti-Spam: No (Cleared)

Anti-Virus: No (Cleared)

Require SMTP Authentication:

Local to local e-mail addresses: No (Cleared)

Local to external e-mail addresses: No (Cleared)

External to local e-mail addresses: No (Cleared)

External to external e-mail addresses: No (Cleared)

Save

Exit

Please note that in order to properly secure your mail server in production environments, you should limit the scope of IP addresses and the type of mail traffic, and enable spam and anti-virus functionality.
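If you want to verify the upstream leg independently of hMailServer, a small check can confirm that the Gmail account you entered above can authenticate over SSL on smtp.gmail.com:465 (credentials below are placeholders; run this only from a host with Internet access):

```python
# Sanity check for the SSL leg: can the relay's Gmail account log in?
import smtplib

def verify_gmail_login(user: str, password: str) -> bool:
    """Return True if an SSL login to Gmail's SMTP server succeeds."""
    try:
        with smtplib.SMTP_SSL("smtp.gmail.com", 465, timeout=10) as smtp:
            smtp.login(user, password)
        return True
    except (smtplib.SMTPException, OSError):
        return False

# verify_gmail_login("user@gmail.com", "password")  # placeholders - use the relay account
```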

2 – Configure SCE to use the hMailServer Relay Server

We have configured the mail relay server; now we need to point SCE at it (instead of it directly attempting to send mail to Gmail).

Locate the email.properties file on the SCE system (user directory) as seen below

Change the IP address of the relay host to the system where you installed hMailServer (this can be the SCE system itself – not a statement of official support though)
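For illustration only, the relevant part of email.properties might look something like the following after the change. The key names here are hypothetical (they follow common JavaMail conventions) – verify them against the actual file shipped with your SCE version; the point is simply that the host value must point at the hMailServer system and the port stays at plain SMTP:

```properties
# Hypothetical excerpt of email.properties - verify the real key names
# in your SCE release. 192.168.1.50 is a placeholder for the relay host.
mail.smtp.host=192.168.1.50
mail.smtp.port=25
```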

3 – Enable the user to receive email notifications and test the setup

In SCE, ensure that the user has email notifications enabled, as seen below.

To test, add a user (ensure that the email address specified here is a valid external email address – it does not have to be a Gmail account)

Tick the “Send notifications …” box

Save the new user and verify that an email has been sent to the address specified.

Tip: In case of problems you can enable logging on the hMailServer as shown below

IMHO the only way to provide relevant coverage of vendor capabilities on Virtualizationmatrix.com is through hands-on experience with the products in client projects or lab tests – finding the time to always document this in detail is, however, a challenge.
So this “test log” was initially not intended to be a blog post, but I decided to publish it after running a few colleagues through my experience with VMM 2012 and they asked me to share it with others.

So here it goes …

Scenario:

Migrate one of our Hyper-V lab clusters from VMM 2008 R2 to VMM 2012 and evolve the managed virtualization environment into a private cloud that facilitates controlled Self-Service access for visitors (remote demo users and developers).

Upgrade the VMM management of our existing 2-node Hyper-V cluster “HypVCluster” to VMM 2012 (RC) by installing a NEW instance of VMM (not an upgrade of the existing instance).

The new SCVMM instance will reside in a highly available virtual machine hosted on this Hyper-V cluster. The new high availability feature for SCVMM (where you can install SCVMM as a “cluster aware application” on each node) will not be used (no “hard” technical reason – mainly to keep VMM “portable” as a vhd in our test environment).

As the VMM appliance is based on evaluation versions of W2K8R2 Standard and SQL 2008 R2, we will activate Windows to make it permanent and use an existing (full) version of SQL 2008 R2 for the DB (installed on the old SCVMM server: SCVMMR2.eebc.dom) to avoid any expirations.

Preparation:

Accounts: Even if you normally don’t bother with dedicated accounts in test environments – DO create a dedicated VMM “service account” in your domain. Do not use your “default” admin account: various steps check that you don’t use it, and hard stops and problems can occur if you do – more details are covered in the VMM documentation http://technet.microsoft.com/en-us/library/gg697600.aspx

Make the VMM service domain account a member of the local admins group on your new VMM server (vhd)

If you use an external SQL server, ensure you have an account authorized for the DB creation

Get Started:

Start the virtual machine, run through the initial Windows setup, and adjust domain and network settings to integrate the vm into your environment.

Start the VMM setup by clicking the existing icon on the desktop of the VMM server

Ensure you specify the created “VMM service account” (I’d suggest adding it as a “run as” account in VMM to allow for easy future re-use)

Specify the SQL instance you want to use (in our case “SCVMMR2”, but in your case it could be the bundled evaluation version)

Check that the setup finished without any issues – you will now have the VMM start icon on your desktop – start the VMM console.

Get a feel for the GUI and explore the new Office-style “ribbon” menu, but do not start adding hosts or clusters yet

Configure the Fabric

Before we upgrade and add our hosts we will prepare the “fabric” environment we want to add the hosts to.
Fabric is the new collective term for servers, network and storage managed by VMM.

Host Groups:

Host groups are hierarchical folder structures to group virtual machine hosts in meaningful ways, often based on physical site location and resource allocation.

From the Fabric pane start off by creating a hierarchical folder structure that reflects your logical infrastructure layout (e.g. by datacentre locations and sub-locations).

Review the available properties you can set on the host group level, like “Placement Rules”, “Network” and “Storage” allocations.
Note: Resources typically need to be allocated on the host group level before they can be assigned to hosts and clusters, so familiarize yourself with these options (right-click on the host group and select Properties)

Library:

If you are unfamiliar with the VMM library, simply think of it as a repository for any resources you might need to access, such as virtual hard disks, virtual floppy disks, ISO images and application packages (new in SCVMM 2012), as well as virtual machine and service templates and profiles that reside in the VMM database.
The library in VMM 2012 has been enhanced to support additional resource types and cloud structures. You can also store driver files that are used during the bare-metal deployment of a Hyper-V host, and custom resources that would normally not be recognized as VMM resources (such as scripts)

Explore the library view of the default library (installed during setup), including the templates and profiles as well as “equivalent objects” (you can now mark identical objects as equivalent – this allows VMM to have multiple sources for the same object in order to decrease dependency on physical resources and consider locality)
Note: VMM has no integrated mechanism to synchronise changes to equivalent objects (you need to ensure e.g. replication or a manual copy after changes)

Optional: Install a second library server (just to test some of the new functions like marking objects as equivalent for location-sensitive deployments) – we installed a second library server: “w2k8r2-trial-0.eebc.dom”.
Note: If you plan to implement a private cloud I strongly suggest reviewing the topic “Implementing a private cloud” in the VMM documentation before creating your final library structure, as organisations might require dedicated library resources (to ensure access to THEIR resources)

Observed problem: Adding the second library server did not reliably add all the default resources (it skipped some of the default categories); we circumvented this by manually copying the remaining files from one library to the new one.

Comment: There is no integrated functionality to view/monitor capacity information on the library servers, i.e. to avoid running out of space or to make decisions on where to store images.

Note: We will later verify the assignment of the logical networks to hosts by mapping them to the physical adapters (once the hosts have been added). As part of logical network creation, you can create network sites to define the VLANs and IP subnets that are associated with the logical network in each physical location.

Comments:

Reviewing the structure of the logical networks and their relationship to the hierarchical structure (e.g. host groups) is not easily done; there is no good view in VMM to get an “at a glance” picture of the architectural network structure.

Setting up Storage

I will deploy the new SMI-S based storage management that allows the admin to perform common storage activities from the VMM GUI (e.g. create LUNs, assign storage etc.) and also allows certain storage functions to be offloaded directly to the storage array (if you are familiar with vSphere, this is a similar – but not identical – approach to VMware’s VAAI/VASA).

Note: You need a supported storage array to integrate VMM with SMI-S. You can of course still use standard storage with non-SMI-S based storage allocation, but you won’t be able to manage it through VMM.

Comment: You can work with existing disk resources or create new ones. Depending on the storage, you might have to perform some actions using the native array GUI.

In our case I created a new “SCVMM” aggregate on the Nseries using the array GUI

Download and install your SMI-S provider – in our case I installed the Nseries (NetApp) SMI-S provider on the SCVMM server (it could obviously be on another server)

Add the hostname of the SMI-S provider system to Providers under Storage (without SSL) – in our case the array was discovered OK and the new aggregate “SCVMM” was listed

Observation: Please note that there can be delays in updating the array status in VMM after VMM-driven configuration updates; if problems occur, ensure you “refresh” (Fabric pane -> Storage -> Providers) before performing new actions

Select the disk resources you want to manage through VMM – I selected the “SCVMM” aggregate to be managed by VMM

I also tested the array interaction by creating and deleting a test LUN through SCVMM and verified the activities through the Nseries GUI – all successful.

OK, we have prepared the fabric environment: we now have our hierarchical folder structure, added library servers to store images, created logical networks and prepared the storage.
Let’s add the hosts.

Adding Host Resources

Preparing Hosts

If you have not already done so add the storage multi-path (MPIO) feature to each host before adding the host/cluster to VMM. MPIO will then be configured automatically when adding hosts to VMM.

As the IBM blades are configured with Broadcom NICs, I installed and configured the BASP failover driver with the defaults.
Note: If the hosts are already configured for Hyper-V (as in our case) you will have to un-associate the NIC from the Hyper-V virtual switch, as the BASP installation will otherwise not be able to continue, failing with the error “The selected Adapter is bound to Hyper-V Virtual Network …”:

Re-associate the logical network (Hyper-V switch) with the team (rather than a physical adapter) as shown

Note: If you receive warning 26179 when adding hosts/clusters (“Couldn’t enable multi-path i/o for known storage arrays xxx”) you have either not configured or incorrectly configured multi-path on the hosts before adding them. VMM will attempt to configure MPIO when adding a host. Correct the MPIO settings before continuing.

If the hosts were part of an SCVMM 2008 cluster, remove the existing SCVMM agents from the hosts before adding them to the new SCVMM instance

Add Hosts/Cluster

From the Fabric pane, select the appropriate host group and add the cluster (you can specify a cluster node and it will pick up the existing cluster).

Your cluster should now be imported into the new VMM instance and any existing virtual machines should be visible and operational.

Adding Storage to the Cluster

The storage allocation to hosts can be slightly confusing to the new user.

Make sure that you can see the storage array and any resources on the array you want to use from VMM

As part of this you should create storage classifications to describe the properties (e.g. if you have different storage tiers)

Then “allocate” storage to a host group (a folder containing hosts or clusters): Properties -> Storage.
You can allocate existing storage pools or existing (unmapped) LUNs, or create new LUNs (from free space on an existing pool) and allocate them.

Then (and only then) can you “assign” storage to the cluster: select the cluster -> Properties.
Note: you can add (assign) LUNs as “available storage” (think “normal” LUN) or “shared volumes” (think “cluster shared volumes”) – for what it’s worth, I don’t like the naming convention here, nor the way of allocation

Feel free to convert between CSVs and “normal” LUNs – I selected all shared storage as CSV for the obvious advantages (there aren’t many reasons why you’d want “normal” LUNs in a cluster scenario)

In our case we had two existing cluster LUNs (Quorum + 1x CSV) on an existing aggregate (30 spindles) from the initial SCVMM 2008 R2 managed cluster and, as mentioned above, I added a “SCVMM” aggregate with 2 additional LUNs (5 spindles) on the Nseries

We then selected both aggregates (original + new) to be managed by VMM and created 2 VMM storage classifications to reflect the performance differences (spindles) as shown below

Comment: There is no feature to exclude LUNs of a managed storage pool from management (in our case we added an aggregate that also contains LUNs not used for the VMM environment). This distorts the capacity information (as unrelated LUNs are included) and introduces potential admin errors (e.g. one can delete unrelated LUNs).

Observation: I ensured that the managed disk pool was allocated to the host group, but any attempt to add the new storage pool (or LUNs within the pool) to the cluster failed with error 26184 “The Storage Group existing for xxx doesn’t match storage group setting at array xxx”

Resolution: As VMM will create relevant LUN-to-host mappings at this point, any existing conflicting configurations may cause problems. Use the native array GUI to remove invalid old mappings for the HBAs/hosts (in our case in the “initiator” section of the Nseries GUI). After deleting invalid old mappings the process worked.

As expected, after fixing the “ghost mappings” the assignment of available storage automatically created the respective LUN-to-host (initiator) mappings on the Nseries storage.

When the storage was assigned to the cluster, VMM also automatically created the cluster resources (as seen in Failover Cluster Manager)

I then converted the volumes to CSVs – no problems – the CSVs were created automatically and made available to the cluster nodes.

Verify Host Network Config

Again, verify that MPIO is configured correctly: Admin Tools -> MPIO. If you added MPIO before adding the hosts (or configured MPIO correctly by hand) you should see something like the below

Observation: Adding an additional host to an existing cluster fails with error 25343: “No network adapter found on host xxx that matches cluster virtual network xxx”. The error refers to a mismatch with the VIRTUAL network; however, the recommended action points out that you should set the LOGICAL network on the NIC.

Therefore do NOT just try to create a matching VIRTUAL network like below on the host:

Instead, as described above, select the host before adding it to the cluster -> properties -> hardware -> NIC and ensure that the associated logical network is connected correctly.

Configuring Dynamic Optimization and Power Optimization

Dynamic Optimization for Hyper-V (again, if you are familiar with vSphere think “DRS”) is now very easy to set up. Forget the extremely awkward SCOM/PRO dependency required for even basic optimization in SCVMM 2008.

In the properties for the host group containing the cluster, enable Dynamic Optimization with the appropriate settings – literally nothing else is required at this stage …

10 minutes later the first “optimization” took place:

Note: Power optimization requires direct out-of-band BMC access for IPMI control (i.e. try to ping the BMC IP address from the VMM server). Since the BladeCenter chassis uses central management of the blades through its management module, this will not work on our setup.
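For the curious, the core idea behind this kind of DRS-style balancing can be sketched in a few lines. This is purely illustrative Python, not VMM’s actual placement algorithm – the threshold value and data shapes are invented for the example:

```python
# Illustrative sketch of a Dynamic-Optimization-style rebalancing decision.
# NOT VMM's actual algorithm: when a host's load exceeds a threshold, pick a
# VM to live-migrate from the busiest host to the least-loaded host.

def pick_migration(hosts, threshold=0.8):
    """hosts: dict host -> list of (vm_name, load) tuples, load as a
    fraction of host capacity. Returns (vm, src, dst) or None."""
    load = {h: sum(l for _, l in vms) for h, vms in hosts.items()}
    src = max(load, key=load.get)          # busiest host
    if load[src] <= threshold:
        return None                        # balanced enough; nothing to do
    dst = min(load, key=load.get)          # least-loaded host
    # move the smallest VM that brings src under the threshold, if any
    for vm, l in sorted(hosts[src], key=lambda x: x[1]):
        if load[src] - l <= threshold and load[dst] + l <= threshold:
            return (vm, src, dst)
    return None

hosts = {"hv1": [("vm-a", 0.5), ("vm-b", 0.4)], "hv2": [("vm-c", 0.2)]}
pick_migration(hosts)  # -> ("vm-b", "hv1", "hv2")
```

The point being: once the thresholds are set on the host group, decisions like this happen periodically in the background with no SCOM/PRO plumbing required.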

VMM Updates (WSUS)

VMM now supports compliance scanning and remediation of the fabric servers (again, think “VMware Update Manager” in vSphere). VMM supports orchestrated updates of Hyper-V host clusters: it places one cluster node at a time in maintenance mode and installs updates while the VMs are live migrated off the node. If the cluster does not support live migration, VMM saves state for the virtual machines.
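The orchestrated flow just described can be sketched roughly as follows. All names here are hypothetical placeholders (this is not a real VMM API), it just captures the one-node-at-a-time sequencing:

```python
# Rough sketch of the orchestrated cluster remediation flow: one node at a
# time enters maintenance mode (evacuating VMs first), gets patched, and
# returns to service. Function and step names are hypothetical, not VMM's API.

def remediate_cluster(nodes, supports_live_migration):
    log = []
    for node in nodes:
        # Maintenance mode evacuates the node: live migration if supported,
        # otherwise the VMs are placed into saved state.
        action = "live migrate VMs" if supports_live_migration else "save VM state"
        log.append(f"{node}: enter maintenance mode ({action})")
        log.append(f"{node}: install updates")
        log.append(f"{node}: exit maintenance mode")
    return log

remediate_cluster(["hv1", "hv2"], supports_live_migration=True)
```

Because only one node is drained at a time, cluster workloads stay online throughout – provided the remaining nodes can absorb the evacuated VMs.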

We will install a dedicated WSUS server for VMM (installed on the VMM server). You can also use an existing WSUS server in conjunction with SCCM.

Reviewed the default baselines and created a new test baseline – assigned the critical and security baselines to the “All Hosts” host group

Comment: There seems to be no intuitive method of filtering/selecting updates at this stage, and the baselines are not continuously maintained (e.g. if you sorted all updates by “critical” and created an “all critical updates” baseline, critical updates released in the future are not automatically added to this baseline)

Scanned all hosts for compliance:

Remediated the non-compliant server (if I had a non-compliant cluster then remediation would have put the hosts into maintenance mode in round-robin fashion before applying updates). This is what you should see after the remediation:

Comments: This all works and is straightforward, but …

- No integrated WSUS synchronization (to download new updates) – only “on-demand” (a marketing term for “manual”)
- No dynamic updates of baselines to include new updates (i.e. by category, “all critical”)

So in order to stay updated one needs to:

1) Manually sync the WSUS server (to download new updates)

2) Manually update baselines to include the new (synced) updates
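Since a baseline is just a static snapshot of updates, step 2 amounts to re-running the filter that originally built the baseline and hand-adding whatever is missing. A small illustration of the gap (this models the behaviour, not VMM’s actual object model):

```python
# Illustrative: a VMM baseline is a static list of updates, so after a WSUS
# sync, newly downloaded updates that match the original filter (e.g. the
# "Critical" classification) must be re-added by hand. This models that gap.

def missing_from_baseline(all_updates, baseline_ids, classification="Critical"):
    """all_updates: list of (update_id, classification) after a sync;
    baseline_ids: set of ids currently in the baseline.
    Returns updates matching the filter but absent from the baseline."""
    return [uid for uid, c in all_updates
            if c == classification and uid not in baseline_ids]

updates = [("KB111", "Critical"), ("KB222", "Security"), ("KB333", "Critical")]
missing_from_baseline(updates, {"KB111"})  # -> ["KB333"]: add this one manually
```

If baselines were dynamic (defined by the filter rather than its result), this manual reconciliation step would disappear.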

OK, so now we have added our hosts and associated them with storage and logical networks, enabled Dynamic Optimization, and configured updates for the hosts.

Our virtualization environment is basically configured and we could go ahead creating VMs and templates and deploying workloads. However, what we really want is to create a private cloud …

Private Cloud

I will assume that the reader is familiar with the concept of a private cloud. Essentially we want to create an environment that allows us not only to pool our underlying resources (which we have essentially done) but to enable shared self-service access for users from different organisations and to delegate management, without requiring users to ask the private cloud provider for administrative changes beyond increasing capacity and quotas as their needs change. While you can create a private cloud from Hyper-V hosts, VMware ESX hosts, or Citrix XenServer hosts, we will only use Hyper-V hosts here.

Scenario:

We want to make the resources in the host group “ATS Lab” available through two private clouds:

Private Clouds:

Cloud 1: ATS Department

Cloud 2: Visitors and Test/Dev

Capacity:

The ATS Cloud will have unlimited capacity quotas on the underlying resources

Visitor and Dev Cloud will have limitations on memory, storage and number of virtual machines

Network:

All will have access to the same logical network (DHCP)

Only ATS will additionally be given a dedicated IP pool (fixed IPs)

Storage Tiers:

ATS: Gold

Visitors: Silver

Library:

ATS: Both Library shares on SCVMMLibrary1

Visitors: SCVMMLibrary2

Prepare Cloud Libraries

Please spend some time to properly plan the library structure to accommodate multiple organisations/departments

Create read-only library folder structures (not shares) on the library server(s) that allow dedicated folders (with unique paths for each “organisation”) to store VMs. You can see below that we created dedicated “write” folders on the same library server as the “read-only” library share, but not within the share! (I suggest reviewing the impact of user rights and folder structures in the documentation.)

Note that (just as a test) in this example we have selected separate folders on the same server (Library1) for both orgs to store VMs (while the read-only shares are dedicated to Library1 and Library2 respectively). This is by no means intended to be a “best practices” library setup.

Comment: A “reference library layout” in the GA VMM documentation would be useful – the library structure can be confusing given the different types of folders, shares and access requirements for the cloud libraries (in addition to the standard libraries)

Creating the “ATS Cloud”

From the “VMs and Services” Pane select “create cloud”, then specify the cloud properties

Creating the “Visitors and Dev_Test” Cloud

Verify that the clouds were successfully created from the VMs and Services Pane

Comment: The capacity settings have some inconsistencies and limitations:

Danger of “over-committing” capacity

There is no way to guarantee resources – you can only limit/cap their usage

There is no warning when “overcommitting”, e.g. you may have only 36GB of physical RAM combined in the resource pool shared by two clouds, yet you can set a “limit” of e.g. 64GB on each cloud – there is no warning and no visibility of how much of the resource has been “committed” (it’s not committed as such, as it’s a “limit”)

Values are shown as “unlimited” – which is strictly speaking correct but meaningless, i.e. how much is “unlimited”?
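The overcommit gap is easy to illustrate with the numbers above: 36GB of physical RAM shared by two clouds, each capped at 64GB. Here is the simple sanity check that VMM 2012 does not perform for you (illustrative Python, with `None` standing in for an “unlimited” quota):

```python
# Illustrative check that VMM 2012 does not perform: compare the sum of
# per-cloud capacity limits against the physical capacity of the shared
# host group. None represents an "unlimited" quota.

def overcommit_ratio(physical_gb, cloud_limits_gb):
    """Ratio of committed limits to physical capacity, or None if any
    cloud is unlimited (in which case the ratio is unbounded)."""
    if any(limit is None for limit in cloud_limits_gb):
        return None
    return sum(cloud_limits_gb) / physical_gb

# The example from the text: 36 GB physical, two clouds limited to 64 GB each.
overcommit_ratio(36, [64, 64])   # ~3.56x overcommitted -- no warning from VMM
overcommit_ratio(36, [None, 16]) # None: "unlimited" makes the ratio meaningless
```

A warning at cloud-creation time whenever this ratio exceeds 1 would address both complaints at once.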

Configuring Self Service

Self-service users can deploy their virtual machines and services to private clouds.

Role-level quotas on the self-service user role are used to allocate compute capacity and storage within the cloud.

Member-level quotas set individual limits for self-service user role members.

Self-service users can also create their own templates and profiles. The Author action for a self-service user role grants self-service users authoring rights. Users with authoring rights can create hardware profiles, guest operating system profiles, application profiles, SQL Server profiles, virtual machine templates, and service templates.
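How the two quota levels combine for a single user can be sketched as follows. This is an illustrative model, not VMM’s actual code: the role-level quota caps the combined usage of all role members, the member-level quota caps each individual user, and `None` again stands for “unlimited”:

```python
# Sketch of how role-level and member-level quotas combine for one user.
# Role-level caps the combined usage of ALL role members; member-level caps
# each individual user. None means "unlimited". Not VMM's actual code.

def effective_user_quota(member_limit, role_limit, used_by_others):
    """What one user can still consume, given what other role members use."""
    role_remaining = None if role_limit is None else max(role_limit - used_by_others, 0)
    limits = [l for l in (member_limit, role_remaining) if l is not None]
    return min(limits) if limits else None  # None -> effectively unlimited

effective_user_quota(2, None, 0)  # member cap of 2 VMs, role unrestricted -> 2
effective_user_quota(None, 10, 7) # no member cap, role cap 10, 7 used -> 3
```

This min-of-both-levels value is what a self-service user would actually want to see when logged in, which becomes relevant in the quota observation further below.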

Preparation:

You typically create security group(s) in active directory and associate Self Service User Roles to these groups.

In AD:

Created Security Groups “SelfService_ATS” and “SelfService_visitors”

Added “ATS1” and “Visitor1” as new test users to the respective groups

Creating the Self Service User roles in VMM

We will create two Self Service User Roles in order to test different levels of entitlements to the cloud environments.

ATS Self Service User (“unrestricted” access to both clouds)

“Visitors Self Service User” (restricted access to the visitors cloud (only), without “Author” rights and with a limited quota of max 2 VMs per user)
(The Author right determines whether a user can create their own templates)

Create ATS Self Service User Role:

From the Settings Pane -> Create User Role

Added the “SelfService_ATS” role

Gave access to both private clouds (ATS and Visitors)

Granted all Self Service rights

Created and shared a folder for the user role data path (where SS users will be able to upload and share the physical resources that they use to create service templates and deploy services)

Be sure to give the user group associated with the role read and write access on the share

Created and shared a folder for the user role data path (see above example)

Note: In order to test assignment and sharing of resources between user roles, I subsequently created and added a VM template and a guest OS profile to the library and added them as available resources to the ATS Self Service User role only!

Observation: Capacity and quota assignment is straightforward in VMM, but viewing the (effective) allocations is not intuitive, as the cloud overview does not seem to correctly reflect the assigned values. Example:

The “visitors” User role restricts the member to the following quotas:

the role (group) level is unrestricted

the member level is restricted (e.g. to 2 VMs only) as shown below

However, logging in as the self-service user “visitor1” (which is a member of the security group associated with the “visitor” user role) does not display any limitations in “Quota for visitor1” – see below:

Changing the role-level quota to a restricted amount is, however, correctly reflected. So the user is only able to see group-level quotas, NOT user-level quotas (which would be more appropriate for self-service users in order to understand what is available to the particular user when logged in).

Logging in as Self Service User

We are now logging in as the respective Self Service Users to verify the correct resource assignment.
Note that you can concurrently log in as administrator and self service user(s) from the same system as shown below

Log in as user “ATS1”

Note that there is no Fabric Pane

Library Pane: As expected we can see all cloud library resources (not the physical library servers)

The assigned resources (a guest profile, as an example) are visible.

Note: Pay attention to the context menu options. ATS1 can create new templates, as we have given the user role “Author” rights.

Now log in as Self Service user “visitor1”

As expected we can see only the cloud library resources associated with the visitor’s cloud but not the other library resources.

No resources (e.g. guest profiles, as shown) are available yet, as we have not assigned any to the visitors user role.

Note: Pay attention to the context menus – as we have NOT given the “Author” right to the visitor user role, there is no option to create a template (see the limited menu options) – all working as expected …

Sharing Resources between Self Service Users

Finally we want to test the ability of VMM 2012 to share resources between Self Service Users. SS users can be entitled to resources either through their user role or through object-based sharing of resources, if the user role rights (“Actions”, as defined above in the user role) allow that to happen.
As mentioned above, the ATS SS users already have two resources allocated (one guest profile and one VM template) – the Visitor SS users have not been allocated any.

In order to share resources between ATS and visitors, the ATS SS user role must have the “share” action enabled and the Visitors SS user role must have the “receive” action enabled (we did this when we created the user roles)

Also note that the SS user must be the owner of a resource in order to share it (e.g. must have created the resource or been made the owner by an admin)
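The sharing rules boil down to three conditions. A tiny sketch (the data shapes are hypothetical, not VMM’s object model): the sharing role needs the “Share” action, the sharer must own the resource, and the receiving role needs the “Receive” action:

```python
# Sketch of the three conditions for sharing a library resource between
# self-service user roles, as described in the text. The data shapes are
# hypothetical, not VMM's object model.

def can_share(resource_owner, sharer, sharer_actions, receiver_actions):
    return (
        "Share" in sharer_actions       # sharer's role has the Share action
        and resource_owner == sharer    # sharer owns the resource
        and "Receive" in receiver_actions  # receiving role has Receive
    )

can_share("ats1", "ats1", {"Share", "Author"}, {"Receive"})  # -> True
can_share("admin", "ats1", {"Share"}, {"Receive"})           # -> False (not owner)
```

If any of the three checks fails, the share option simply does not apply – which is exactly what we verify in the test below.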

We logged in as user “ats1” and created a test guest profile “Shared Guest Profile”

In the properties of the resource (library view), ats1 can now share the resource with other user roles (that have the “receive” action enabled), see below.

Note: Ensure that the logged-in user is the owner of the resource, and add other user roles for access as desired

After performing the above action and logging in with “visitor1” we can now see the shared profile being available to “visitor1” as expected.

Deploying a Resource to the Cloud

Finally let’s deploy a test virtual machine to the cloud using the “visitors” user role.

After specifying Source, virtual machine name and virtual hardware you are asked to specify whether you want to deploy to the cloud.

As expected we are able to (only) specify the visitors cloud as target

OK, that was my test log from the initial upgrade to SCVMM, covering the required steps to configure the fabric resources (storage, network, and compute), updates to the fabric servers, Dynamic Optimization, and finally the creation of private clouds and Self Service User roles.

I still have to write up the Service Profiles section and will test the App Controller hybrid cloud functionality. As I’ve also tested the new bare-metal deploy function I might add another blog on this.

Due to its nature this was clearly a less “opinionated” article – but don’t get your hopes up too high – the next one might just be the opposite ;)

So why would anyone want an entry-level cloud solution?

Let me rewind … many many years ago I was sent some test code for a very basic web interface allowing self-service requests for virtual machines – developed by a single VMware employee in his spare time – looking back, this was the first time I actually “did cloud”. And I liked it because it was exactly what I wanted at the time – a simple way to enable, control and streamline resource requests.

With marketing engines blazing today we seem to have forgotten what drove these initial efforts and often it feels that vendor capabilities drive our (perceived) cloud requirements rather than the other way around (as it should be). It seems that everyone today is brainwashed into thinking they are a public cloud service provider (and I understand that some IT departments indeed become some sort of “service providers” for internal divisions but typically with totally different security requirements).

“With marketing engines blazing it seems that everyone today is brainwashed into thinking they are a public cloud service provider”

So it’s not surprising that security concerns, compliance and business process integration challenges often spring to mind first when listing cloud adoption inhibitors.

However, on a more practical level, in my experience for many smaller private cloud projects the upfront implementation effort, with its associated cost, complexity, and lack of in-house skills, is the first (and still often final) hurdle.

And yes, this doesn’t come as a surprise: the first time we installed vCloud Director last year it took us the best part of 4 days – a far cry from the “next, next, next” experience many got so used to with e.g. vCenter (and don’t get me wrong – vCloud Director has a great UI and this is by no means a “VMware only” issue).

So why is it that even vendors like VMware who are known for intuitive management UIs struggle to deliver a simple, “end-user” installable cloud management suite?

To a certain extent it’s the nature of the beast … a full multi-tenant cloud management stack is vastly more complex, as it touches and incorporates not only the layers of the classical server infrastructure but also the extended network, security, and, more importantly, the interfacing business support systems. When combined with the inherent requirement for system-wide orchestration and extensibility through comprehensive APIs for each individual customer environment, it is clear that by its very nature it will not be “simple main-stream” for some time.

Efforts like VMware’s vCloud Director Appliance (for evaluation purposes) show that this is a recognised problem.

So while there is no easy short-term solution, the question you should ask is whether YOUR environment always really needs all the bells and whistles of a “full-blown” cloud management stack … and I’m by no means implying that the answer will always be “no” …

The question you should ask is whether YOUR environment really needs all the bells and whistles of a “full-blown” cloud management stack

So – you might argue – nothing new here … one needs to understand the specific functional and operational requirements of the environment and translate that into one’s custom solution – that’s what we (architects) do, right …?

Well, unfortunately today’s reality is that you are likely to be presented with a “one-size-fits-all” approach by most vendors when it comes to “your” cloud solution – it’s typically “take that or nothing” – unless I’ve just missed the e.g. vCD “light” version? ;)

What is Starter Kit for Cloud (SKC)?

When we showed off “IBM SmartCloud Entry delivered by IBM Starter Kit for Cloud” at VMworld 2011, I was seriously taken by surprise at how much interest it generated, but retrospectively it clearly makes sense.

OK, so what is the IBM Starter Kit for Cloud? In a simplified way it is a browser-based orchestration layer that is installed on your existing virtualization environment to provide cloud-like functionality. Take for instance your existing vSphere infrastructure, install SKC and point it to your vCenter server. It will automatically surface your existing vSphere workloads and templates and add extended self service portal functionality.

Take your existing vSphere infrastructure, install SKC and point it to your vCenter server. It will automatically surface your existing vSphere workloads and templates and add extended self service portal functionality

So what’s good about it? (And I am conscious of the fact that I’m an IBM employee covering an IBM product, so please bear with me before shouting “fix”.)

It has an extremely intuitive user interface. Yes, I can hear some of you … “An intuitive user interface from IBM??”. I am probably the first to admit that our UIs can sometimes be intimidating to the novice user, but if you are e.g. familiar with the IBM Storwize interface http://youtu.be/aHC5X_-gzw0 then you know that great attention is being paid internally to user experience, and the SKC UI clearly reflects that.

It installs in minutes … Now, I really mean that – I have put together a short “Our Angry Boss Wants a Cloud” video that captures the entire install process in our lab environment. It also gives you an overview of the interface and overall functionality. If you have a few minutes then have a look above. And yes – the humor is intended, but bear in mind that I’m German ;)

It provides the core functionality for private cloud portals: a web-based self-service user portal, project-based workload entitlement, request and approval management with email notifications, and basic metering and billing for the deployed workloads.

Multi-virtualization vendor support. OK, so today it only “cloudifies” VMware vSphere and IBM System p Unix systems (separate editions), but given IBM’s publicly stated policy of open choice it would seem logical that SKC would be extended to support other x86 hypervisors from a single SKC instance in the future. (I am not making any official forward-looking statements, but think of e.g. KVM as an additional virtualization platform.)

Attractive price point and easily extensible. SKC is priced per server (so independent of the number of virtual machines!) and can be purchased for under $2K with 1 year of S&S.

SKC has a documented REST API that allows for integration and customization of SKC in your environment.

As always I’ll be straight on this blog (even if I talk about one of “our” products) …

… So what is SKC (not) … ?

It is what it says on the tin: an entry cloud solution. It is e.g. not intended to be a fully fledged multi-tenant cloud solution for service providers – IBM has other products in the portfolio addressing this space – see our Cloud Service Provider (CSP2) offering or our SmartCloud Portfolio.

SKC is not intended to be a full multi-tenant public cloud solution for Service Providers – there are other products in IBM’s portfolio to address this space

To give you an example: while you can e.g. create virtual networks in SKC, it does not have secure network isolation à la vShield with VMware vCD. And if you look for all the advanced functions IBM’s ISDM or VMware’s vCloud Director + (fee-based) extensions can provide, then don’t be disappointed not to find them all in SKC. Also be aware that the currently supported vSphere version is 4.1 (with support for v5 coming early next year).

I really do like SKC for what it is (otherwise I wouldn’t cover it here) – so if the core of what you need is delegation of resource provisioning, control of VM sprawl through request and approval management, and basic metering and billing, and you feel that other offerings are too complex, too costly, and simply overkill for what you need, then I can only suggest evaluating SKC (see HERE for details and contact).

SKC is by no means the “one-size-fits-all” answer to all cloud scenarios – it is simply another option in your architectural toolbox – you will still need to determine whether it fits your needs …

If it does fit, I believe it can simplify your job greatly and give you quick time to value on your journey to the white fluffy thing …