It is every IT administrator’s worst nightmare.
All the employees’ desktops have been virtualised and are running on a server. The pilot project worked well and everyone was happy, but then the team tried to scale it up and now it’s Monday morning and 3,000 users have just walked in with their lattes and croissants, sat down at …

COMMENTS

Only a surprise to noobs

Whether it's hundreds of users coming in on a Monday morning and all trying to access their email at the same time (off the one single server, which was only capacity-planned for a steady-state load).

Or the hundreds of call centre staff who all go <click> when their shift starts: like the email or Windows servers "storm", but much more intense, as they all start within a few minutes of each other.

Or (worst of all) bringing up a system after a crash when EVERYone tries to log in continuously just as soon as they see the login screen.

It was even a problem in the days of mainframes, when everyone tried to fill in their weekly timesheets at 16:30 on a Friday afternoon. (They had to be done before you left, and you couldn't fill them in earlier 'cos you didn't know what you'd be doing. Well, you did: you'd be waiting for CROVM4 to respond for about 15 minutes.)

But, of course, no manager is prepared to shell out for a system that's specc'd at 500% of their steady-state capacity requirements, to handle a workload that will only exist for a few minutes once or twice a week.

Re: Only a surprise to noobs

That's pretty much what I was thinking - you'd have to have been pretty crap in your initial project to *not* think "Hmm, how're we going to deal with peak load situations?" I mean it's not like they don't already happen even with domain-bound workstations that use networked authentication and home directories...

Yes, there'll still be idiots who get caught out by it, but then that's the nature of idiots, isn't it?

Hmm.

I wonder if there's mileage in caching some of the OS components on the client machines. Sure, you'd need slightly more expensive and complex and powerful clients, but on the other hand if they booted using a local kernel and userland and then accessed files and applications over the network, think of the bandwidth saving!

So what you're saying..

No

What he's saying is that VDI only works if you have an obscene budget and/or more money than sense, or if your users access the system sequentially, and we all know how quick sequential access is, right chaps? ;-)

More proof this cloud BS is simply another method of trying to shift more server units

Everything's underutilised and that's a good thing

All computers (servers, PCs etc.) are underutilised. When they ARE fully utilised (i.e. running at 100% capacity) the response time is so bad that everyone complains. The basic problem is that t'management buy hardware, not a service, and baulk at the idea of spending £-mega on a machine that will sit around rusting for most of the time.

Although if you look in any company car park, you'll see more £-mega of cars sitting around rusting, and nobody is worried about that. That's because the owners of those fine vehicles are prepared to pay for the utility they get (i.e. not having to sit or stand beside, or be groped by, some smelly stranger on public transport) rather than for the utilisation they get, measured in minutes of use per day.

The basic problem is that people fixate on the reports from performance monitoring tools, rather than the quality of the service they are getting.
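The claim that response time collapses near 100% utilisation is standard queueing behaviour. As a minimal sketch, assuming the textbook M/M/1 model (random arrivals, one server, exponentially distributed service times; my choice of model, not the commenter's), mean response time is R = S / (1 - rho), where S is the service time and rho the utilisation. The 0.1 s service time is a hypothetical figure.

```python
def mm1_response_time(service_time: float, utilisation: float) -> float:
    """Mean time a request spends queued plus in service, M/M/1 model."""
    if not 0.0 <= utilisation < 1.0:
        # At or above 100% busy the queue grows without bound.
        raise ValueError("utilisation must be in [0, 1)")
    return service_time / (1.0 - utilisation)


if __name__ == "__main__":
    service_time = 0.1  # seconds per request (hypothetical)
    for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
        print(f"{rho:4.0%} busy -> {mm1_response_time(service_time, rho):5.2f} s per request")
```

Under this model, going from 50% to 99% busy makes each request about 50 times slower (0.2 s versus 10 s), which is why headroom that looks like waste on a monitoring report is what keeps the service usable.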

KSM

Using one shared disk image is not always enough; this is where technologies like Linux/KVM's KSM (Kernel Samepage Merging) can drastically cut the memory requirements on the host server. Red Hat claim to have had 600 virtual machines running on a server with 256GB of RAM and 48 cores. That's almost certainly overdoing it, but it can be done with KVM/KSM.
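To see why merging identical pages helps, here is a minimal sketch that only counts pages; it is not how KSM is implemented. Real KSM runs in the kernel, scans memory regions that have been marked mergeable (the hypervisor does this for guest RAM), and maps identical pages to a single copy-on-write page. The page contents and VM names below are made up. For scale, 256GB spread over 600 VMs is roughly 437MB of physical RAM per guest.

```python
PAGE_SIZE = 4096  # bytes; the usual x86-64 page size


def merged_page_count(vms: dict[str, list[bytes]]) -> tuple[int, int]:
    """Return (pages before merging, pages after merging identical pages).

    Pages are compared by their full contents, so only byte-for-byte
    identical pages end up shared.
    """
    total = 0
    distinct: set[bytes] = set()
    for pages in vms.values():
        total += len(pages)
        distinct.update(pages)
    return total, len(distinct)


def demo_counts() -> tuple[int, int]:
    # Three hypothetical guests booted from the same image: three identical
    # "OS" pages each, plus one page of private data per guest.
    os_pages = [bytes([i]) * PAGE_SIZE for i in range(3)]
    vms = {f"vm{n}": os_pages + [bytes([100 + n]) * PAGE_SIZE] for n in range(3)}
    return merged_page_count(vms)


if __name__ == "__main__":
    before, after = demo_counts()
    print(f"{before} pages before merging, {after} after")  # 12 before, 6 after
```

The more of each guest's memory is common OS code and data, the closer the "after" figure gets to one guest's worth plus the private pages.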

Intelligent clients

Perhaps we could develop an Intelligent Thin Client™. The author touches on the subject of direct-attached storage, but what if we could somehow extrapolate that concept further and have a client which could process its own boot sequence, not just by utilising local storage but by utilising client-side memory and processing capacity?

Such a magic bullet could and would massively alleviate the burden on network and server resources, and even allow users to continue working should the back-end infrastructure fail. It'd also cost a fraction of upgrading to a fibre-core LAN/WAN/SAN and doubling the number of servers in the VDI cluster to provide sufficient failover capacity (IMHO).

Already invented

Sorry!

Actually the compelling case for VDI is security and support for large enterprises with multiple sites. Thin clients require little skill to maintain and support, and upgrades can all be managed from the data centre. A corrupted VM can be blatted and a fresh one started in moments; an upgrade requires a few images to be modified offline, not an estate of thousands of machines.
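The "blat it and start a fresh one" workflow is commonly built on desktops that are thin copy-on-write clones of a shared master image. A minimal sketch, assuming a toy disk modelled as a dict of block number to bytes; `GoldenImage`, `LinkedCloneVM` and their methods are hypothetical names, not any VDI product's API.

```python
class GoldenImage:
    """Shared, read-only master image (toy model: block number -> bytes)."""

    def __init__(self, blocks: dict[int, bytes]):
        self.blocks = dict(blocks)


class LinkedCloneVM:
    """A VM disk as a private copy-on-write overlay on a shared golden image."""

    def __init__(self, base: GoldenImage):
        self.base = base
        self.overlay: dict[int, bytes] = {}  # only blocks this VM has written

    def read(self, block: int) -> bytes:
        # Prefer this VM's own changes; fall back to the shared image.
        return self.overlay.get(block, self.base.blocks.get(block, b""))

    def write(self, block: int, data: bytes) -> None:
        # Writes land in the overlay and never touch the golden image.
        self.overlay[block] = data

    def reset(self) -> None:
        # "Blatting" a corrupted VM: throw away its private changes.
        self.overlay.clear()

    def recompose(self, new_base: GoldenImage) -> None:
        # Upgrade: point the clone at a new image built offline, dropping its delta.
        self.base = new_base
        self.overlay.clear()


if __name__ == "__main__":
    v1 = GoldenImage({0: b"os-v1"})
    vm = LinkedCloneVM(v1)
    vm.write(0, b"corrupted")
    vm.reset()
    print(vm.read(0))  # b'os-v1'
    vm.recompose(GoldenImage({0: b"os-v2"}))
    print(vm.read(0))  # b'os-v2'
```

Here an upgrade means building one new golden image and re-pointing clones at it rather than patching each machine, which matches the workflow described; how persistent user data is kept across such a refresh varies between products.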

Plan 9

Well

We are about to deploy a large VDI implementation, and we have been told from day 1 by our vendor that you always maintain a configurable pool of pre-booted instances on each server, and that you need to be very careful with bandwidth between your storage and servers. They also suggested that holding the VM image on a flash disk in the server might not go amiss.
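The advice about a pool of pre-booted instances can be sketched as follows. This is a minimal model, assuming the broker keeps a target number of idle desktops per host and tops the pool up off-peak; `PrebootedPool` and the `boot_vm` callback are hypothetical names, not any vendor's API.

```python
from collections import deque
from typing import Any, Callable


class PrebootedPool:
    """Keeps a configurable number of idle, already-booted desktops on a host."""

    def __init__(self, target_idle: int, boot_vm: Callable[[], Any]):
        self.target_idle = target_idle
        self.boot_vm = boot_vm  # stands in for powering on one desktop
        self.idle: deque = deque()
        self.boots_on_demand = 0
        self.replenish()

    def replenish(self) -> None:
        # Intended to run off-peak so these boots don't collide with logons.
        while len(self.idle) < self.target_idle:
            self.idle.append(self.boot_vm())

    def assign(self) -> Any:
        # A user at 9am gets a warm desktop if one is waiting...
        if self.idle:
            return self.idle.popleft()
        # ...otherwise a cold boot happens right in the middle of the storm.
        self.boots_on_demand += 1
        return self.boot_vm()


if __name__ == "__main__":
    counter = iter(range(1000))
    pool = PrebootedPool(target_idle=3, boot_vm=lambda: f"desktop-{next(counter)}")
    for _ in range(5):
        print(pool.assign())
    print("cold boots during logon storm:", pool.boots_on_demand)  # 2
```

With a pool of 3 and 5 simultaneous users, the first three get warm desktops and only two cold boots land in the Monday-morning window; `target_idle` is the configurable trade-off between idle resources and boot-storm load.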

To me that was "No Sh** Sherlock", but then I've been doing this a long time, and remember all about session start-up on our VAX servers at 9am on a Monday morning. Mind you, we only had to worry about RS232 connectors to VT220s and the odd terminal concentrator. Nor did I have to cope with managers who thought that VPNs increased the available bandwidth on a link.

Load- and StressTest during and after PoC

VDI is complex, not just in terms of performance but in many other ways (as the other comments show). But the basics are the same as with Server Based Computing (SBC), which we've been doing for the last ten years.

During the design phase you should always consider capacity, and therefore the scaling requirements and possibilities. The impact of booting a machine is huge, and logging in a user generates about 150% of the load of a regular user (as a guideline). Jim Moyle wrote an excellent article about IOPS in a VDI environment which I highly recommend: http://www.jimmoyle.com/2011/05/windows-7-iops-for-vdi-deep-dive/
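The 150% guideline turns into simple capacity arithmetic. A minimal sketch, assuming storage load scales linearly with users; the 10 IOPS per user below is a hypothetical placeholder, not a figure from this comment or the linked article, so substitute your own measurements. Boot load is deliberately left out because it is only described as "huge".

```python
def peak_iops(users: int, steady_iops_per_user: float,
              fraction_logging_in: float = 1.0, login_factor: float = 1.5) -> float:
    """Storage IOPS needed while a fraction of users log in simultaneously.

    login_factor=1.5 encodes the rule of thumb that a logon costs about
    150% of a regular user's load; everything else is an input.
    """
    logging_in = users * fraction_logging_in
    working = users - logging_in
    return (logging_in * steady_iops_per_user * login_factor
            + working * steady_iops_per_user)


if __name__ == "__main__":
    users, per_user = 3000, 10.0  # 3,000 users from the article; 10 IOPS is hypothetical
    print("steady state:", peak_iops(users, per_user, fraction_logging_in=0.0))  # 30000.0
    print("Monday 9am:  ", peak_iops(users, per_user))                           # 45000.0
```

With these placeholder numbers the storage has to deliver 45,000 IOPS at 9am against 30,000 in steady state, and the gap only widens if boots overlap with logons.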

The thing is though, you should always test the performance during and after a PoC (or in the development phase). What is the impact of one machine, one user or certain processes? What happens if you increase the users? What happens if all users write their billable hours at the same time?

This is why a LoadTest is key to success. A StressTest would have prevented the problems described in the article, or at least you would have known about them upfront (and could have informed the management before the managers started complaining).

I wrote a two-part article about LoadTest / StressTest best practices here: http://www.ingmarverheij.com/2011/05/loadtesting-best-practices-part-1/.

LoadTesting is relatively easy and cheap if you know what your objectives are. If you have the chance (and I bet you do), make sure you can validate your design (and as a bonus, you'll be able to see the impact of changes in the infrastructure or applications).
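At its core a load/stress test steps up the number of simultaneous users and watches latency. A minimal sketch using Python threads; `FakeLogonServer` is a hypothetical stand-in with a fixed number of logon slots, whereas a real test would drive actual logons against the VDI environment.

```python
import statistics
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FakeLogonServer:
    """Hypothetical server that can only service `capacity` logons at once."""

    def __init__(self, capacity: int, logon_seconds: float):
        self._slots = threading.Semaphore(capacity)
        self._logon_seconds = logon_seconds

    def logon(self) -> None:
        with self._slots:  # callers queue here when every slot is busy
            time.sleep(self._logon_seconds)


def timed(fn) -> float:
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def load_test(target, user_counts):
    """For each step, fire n simultaneous logons; return {n: (median, worst)}."""
    results = {}
    for n in user_counts:
        with ThreadPoolExecutor(max_workers=n) as pool:
            latencies = list(pool.map(lambda _: timed(target), range(n)))
        results[n] = (statistics.median(latencies), max(latencies))
    return results


if __name__ == "__main__":
    server = FakeLogonServer(capacity=5, logon_seconds=0.05)
    for n, (median, worst) in load_test(server.logon, [5, 20, 50]).items():
        print(f"{n:>3} users: median {median:.2f}s, worst {worst:.2f}s")
```

Median and worst-case latency climb once the step size outgrows the server's capacity, which is the knee you want to find before 3,000 users find it for you. Threads are adequate here because the simulated work is waiting, not computing.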

It's a pity

All this technology spent on parallel processing and all we get is the same problem we've had since the dawn of time. Man simply can't sneeze and hiccup at the same time, not that I'm sure one would want to.