Facilities Management

I have often had the opportunity to decide which vendors or contractors to use during my career and, like many, I chose the easy way out – I just extended the contract or agreement. In hindsight, I have to say, I missed a great opportunity. I continued to dance with the devil I knew rather than with the angel I didn’t. On one occasion, however, I took a different approach by inviting several vendors to compete for a contract. We wrote a very thorough specification and evaluation based on what we needed, backed it up with research, and articulated it so that we could compare apples to apples. After we narrowed our selection to three vendors and had them perform for us, we evaluated the results – which showed that we should go with a new vendor. The decision gave us exactly what we needed at a very significant cost saving. Add to this the fact that the new company was also local and responded almost immediately to our issues, and you can see why it was a very satisfying outcome....

It’s not uncommon for data centers to spend hundreds of thousands of dollars a year maintaining vital equipment. A typical maintenance budget can run 1 percent to 3 percent of your initial capital investment for each year of operation, and this amount goes up as the equipment ages. It’s a necessary evil if you expect high levels of reliability in your data center operations. But curiously, in this industry (and many others, by the way), the budget allocated to maintaining our most important asset, our people, is usually a mere fraction by comparison. Are maintenance budgets misallocated? It’s well known that human error is one of the largest causes of downtime – if not the largest. It’s just an observation, but in my personal experience I often see budgeted resources allocated in a way that doesn’t align with what we say is important. Many budgets look like this:

Equipment maintenance budget – $600,000/year
Training for technicians budget – $10,000/year (if that)...
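The budget figures in this paragraph are simple percentages, so a short calculation makes the imbalance explicit. This is a minimal sketch, assuming the 1 to 3 percent range quoted above applies as a flat yearly rate (it ignores the increase as equipment ages); the function names are hypothetical.

```python
def maintenance_budget_range(capital_investment, low_pct=1.0, high_pct=3.0):
    """Yearly maintenance budget if it runs low_pct..high_pct of the initial capital investment."""
    return capital_investment * low_pct / 100, capital_investment * high_pct / 100


def implied_capital_range(annual_maintenance, low_pct=1.0, high_pct=3.0):
    """Initial capital investment implied by a yearly maintenance spend at those percentages."""
    # A higher percentage implies a smaller capital base, so high_pct gives the low end.
    return annual_maintenance * 100 / high_pct, annual_maintenance * 100 / low_pct


def maintenance_to_training_ratio(maintenance_budget, training_budget):
    """Dollars spent maintaining equipment for every dollar spent training technicians."""
    return maintenance_budget / training_budget


if __name__ == "__main__":
    # Figures from the example budget in the text
    equipment_maintenance = 600_000
    technician_training = 10_000

    low, high = implied_capital_range(equipment_maintenance)
    print(f"Implied capital investment: ${low:,.0f} to ${high:,.0f}")

    ratio = maintenance_to_training_ratio(equipment_maintenance, technician_training)
    print(f"Maintenance-to-training ratio: {ratio:.0f}:1")
```

Run against the example budget, a $600,000 maintenance line implies an initial capital investment of roughly $20 million to $60 million, while technician training receives one dollar for every 60 spent on equipment.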

Ever wonder what it’s like to be the facilities manager of a major data center? Here is a “normal” day – though in retrospect, there really is no such thing as a “normal” day for a facilities manager.

0500 – 22 unread emails

Somewhere close to 5:00 a.m., thoughts begin to penetrate your slumber. Your eyes open and confirm again that, for you, an alarm clock is not a necessity. As you go about your morning routine, a call comes in from Security about a contractor who wants to bring in an employee who is not on the access list. During this call, the contractor calls you. Switching between calls, you determine that this person actually works for a subcontractor of the contractor and is needed to help install a piece of equipment integral to the contractor’s project. You acquiesce and give the guard your approval, telling him exactly where this individual is permitted to be and for how long. Two cell phones, radio, keys, computer bag – cold coffee – and you’re out the door....

I can only speculate as to what caused Amazon’s latest outage, an apparent “loss of power.” But this week, I’m going to express my opinions in no uncertain terms – fair warning. In my experience, most organizations actually CHOOSE to have outages. I don’t care what their sales slogans promise. They choose to have outages. If you don’t believe me, just read their SLAs (Service Level Agreements). Most offer some sort of guarantee of uptime or service availability. Amazon guarantees 99.95 percent uptime – or about 0.72 minutes of downtime a day (0.05 percent of 1,440 minutes). That translates to more than four hours a year. Beyond that, most will give you “credit” for the loss of service, either as a billing credit or as additional services. So as long as the outages total less than about four hours per year, no foul. You might even get a “We’re sorry.” Rackspace offers a 100 percent uptime guarantee but will only reimburse 5 percent of your monthly fee for every half hour of outage. So if you have 10 hours of downtime (20 half hours at 5 percent each), you don’t have to pay that month’s fee. Not a great option if your business is global and your average revenue is a million dollars an hour....
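The SLA arithmetic in this paragraph is easy to check for yourself. This is a minimal sketch using only the percentages quoted above; it assumes a 365-day year, that only completed half-hour increments earn credit, and that the credit is capped at one month’s fee. Those assumptions and the function names are illustrative, not the actual terms of either provider’s SLA.

```python
import math

MINUTES_PER_DAY = 24 * 60
HOURS_PER_YEAR = 365 * 24  # ignores leap years


def allowed_downtime(uptime_pct):
    """Downtime an uptime guarantee still permits, as (minutes per day, hours per year)."""
    unavailable_fraction = 1 - uptime_pct / 100
    return unavailable_fraction * MINUTES_PER_DAY, unavailable_fraction * HOURS_PER_YEAR


def outage_credit_pct(outage_hours, pct_per_half_hour=5.0, cap_pct=100.0):
    """Service credit as a percentage of the monthly fee.

    Assumption: only completed half-hour increments earn credit, and the
    credit can never exceed cap_pct of the fee. Check your actual SLA.
    """
    half_hours = math.floor(outage_hours * 2)
    return min(half_hours * pct_per_half_hour, cap_pct)


if __name__ == "__main__":
    per_day, per_year = allowed_downtime(99.95)
    print(f"99.95% uptime allows {per_day:.2f} min/day, {per_year:.2f} h/year")

    outage_hours = 10
    revenue_per_hour = 1_000_000  # the global-business scenario from the text
    print(f"Credit: {outage_credit_pct(outage_hours):.0f}% of one monthly fee")
    print(f"Revenue lost: ${outage_hours * revenue_per_hour:,}")
```

With these numbers, 99.95 percent uptime allows about 0.72 minutes a day (about 4.38 hours a year), and 10 hours of downtime earns a credit of 100 percent of one monthly fee, set against $10 million in lost revenue at $1 million an hour.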

Rivalries between people are caused by competing goals, limited resources, or personal issues. As the leader of an organization, you can control goals and resources. When it comes to personal issues, more often than not you wind up removing the people involved from the organization one way or another. Groups within an organization behave much like individuals: collectively, they have competing goals and must vie for limited resources. It’s natural that rivalries develop as groups with divergent goals compete for limited resources, but left unchecked, rivalries can be very detrimental to the efficiency of the organization as a whole. So how do you, as the leader of such an organization, prevent this type of rivalry and create an environment that promotes cooperation and organizational efficiency?

Communicated goals tend to unify organizations

The answer lies in communicating the goals and priorities of the organization. While goals usually remain consistent over long periods of time, priorities can change from day to day – even hour to hour. Yet when everyone knows the goals and understands the priorities, they will naturally work toward them. Leadership needs to make sure that each group understands how its tasks help achieve the goals and where it fits into the priorities of the organization as a whole. It sounds simple enough, but it’s actually far more difficult to put into practice. So hopefully an example or two will illustrate how things can go awry....

In the mission critical environments industry, we often say that “failure is not an option” – and for the most part, we believe that and work toward that goal. But the stark reality is that failure is inevitable. At some point in the future, everything will fail. We do not have unlimited resources, nor do we have perfect engineering or flawless operations. Whether we look at air travel, nuclear power plants, or even the brakes on our cars, failure occurs. We build backup systems for those inevitable failures, but even the backup systems will fail. I have seen quadruple backup processes fail. The only thing we can do is try to mitigate the consequences of failure and/or how often it occurs. So how do we cope with inherently dangerous or expensive processes and their inevitable failure? In some cases, we deem failure an “acceptable risk.” We quantify the chance of failure in various ways: Mean Time Between Failures (MTBF), an event with a likelihood of 1 in 200,000 years, 100-year flood plains, et cetera. The death rate on commercial airlines in the United States between 1995 and 2000 was about three deaths per 10 billion passenger miles. Not bad, but not perfect either – three people died for every 10 billion passenger miles flown. Is that an acceptable risk? Probably not if you’re one of the three. But even in the highly regulated, inspected and trained world of commercial air travel, failures occur. What I wonder is this: if we doubled the resources that industry devotes to safety, would we see a reduction to 1.5 deaths per 10 billion passenger miles? What if we spent ten times what we do now? Would the statistic drop to 0.3 deaths per 10 billion passenger miles? At what point do we run out of resources, and can we ever get to zero failures? The answer is no; there will always be something unforeseen – just ask the management of the Fukushima nuclear power plants....
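The point that backups reduce, but never eliminate, failure can be made concrete with a little probability. This is a minimal sketch under two stated assumptions: a constant failure rate, so the chance of a unit failing within t hours is 1 - exp(-t/MTBF), and independent failures among redundant units, so the chance that all N fail is p^N. The function names and the 100,000-hour MTBF are hypothetical, chosen only for illustration.

```python
import math

HOURS_PER_YEAR = 8760


def prob_failure_within(hours, mtbf_hours):
    """Chance a unit fails at least once within `hours`, assuming a constant failure rate of 1/MTBF."""
    return 1.0 - math.exp(-hours / mtbf_hours)


def prob_all_fail(p_single, copies):
    """Chance that every one of `copies` redundant units fails, assuming independent failures."""
    return p_single ** copies


def prob_at_least_once(annual_prob, years):
    """Chance an event with a fixed yearly probability happens at least once in `years`."""
    return 1.0 - (1.0 - annual_prob) ** years


if __name__ == "__main__":
    # Hypothetical unit with a 100,000-hour MTBF, over one year of operation
    p = prob_failure_within(HOURS_PER_YEAR, 100_000)
    for n in (1, 2, 4):
        print(f"{n} unit(s): P(all fail within a year) = {prob_all_fail(p, n):.2e}")

    # A "100-year flood" has a 1 percent chance in any given year
    print(f"100-year flood over 30 years: {prob_at_least_once(0.01, 30):.0%}")
```

Each added unit shrinks the probability, but it never reaches zero, and a 100-year flood still has roughly a 26 percent chance of occurring during a 30-year facility life. In practice, common-mode failures (a shared power feed, the same maintenance error repeated on every unit) break the independence assumption, so p^N tends to understate the real risk.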