Friday, May 30, 2008

Looking for inspiration, I checked out my latest Google Alerts for "cloud computing" and found an interesting--perhaps even disturbing--trend: people are locking in their definitions of cloud computing. The problem is these definitions are largely inconsistent.

Cloud computing describes a systems architecture. Period. This particular architecture assumes nothing about the physical location, internal composition or ownership of its component parts. It represents the entire computing stack from software to hardware, though system boundaries (e.g. where does one system stop and another begin) may be difficult to define. Components are simply integrated or consumed as need requires and economics allow.

For what it's worth, I have found myself shifting a little; not so much on the definition, but on what exactly it defines. Given the largely consensus opinion that Cloud Computing refers to a service model, I am willing to concede that the description above really describes a "Cloud Oriented Architecture" for a complex integrated environment. The true definition of cloud computing is still evolving in my mind.

Now, back to the posts at hand. What I believe I am seeing these days is a split between two camps: the "cloud computing is only about services" camp, and the "cloud computing is getting whatever you need from the Internet" camp.

"There seems to be a group myopia around so-called ‘cloud computing’ and it’s definitions. What we’re really talking about are ‘cloud services’ of which, ‘computing’, is only a subset. It gets worse when you have people talking about Software as a Service (SaaS) as a ‘cloud’ service. Things continue to become murkier when the SaaS crowd, bloggers, and reporters start making up new definitions for cloud services using SaaS-like terms such as Platform as a Service (PaaS) and Infrastructure as a Service (IaaS)."

"When I think of a service as cloud computing, it is characterized by being an offering of nearly unlimited capacity (although it may be billed differently at different utilizations) which has some sort of generic utility but beyond certain minimal architectural requirements there should be no inherent specificity in what it may or should do. It may be a service of a certain type of utility, perhaps storage, raw processing capability, or data storage, but in the same way that a datacenter does not restrict what servers you may host with them, it should not restrict what sort of data you store, process, or serve."

[Some definition links removed]

Sort of a "cloud services have a cloudy definition" kind of definition.

"Cloud Computing (Figure 1.0) is a commercial extension of computing resources like computation cycles and storage offered as a metered service similar to a physical public utility like electricity, water, natural gas, or telephone network. It enables a computing system to acquire or release computing resources on demand in a manner such that the loss of any one component of the system will not cause total system failure. Cloud computing also allows the deployment of software applications into an environment running the necessary technology stack for the purposes of development, staging, or production of a software application. It does all this in a way that minimizes the necessary interaction with the underlying layers of the technology stack. In this way cloud computing obfuscates much of the complexity that underlies Software as a Service (SaaS) or batch computing software applications. To explain better though, let's simplify that and break it down this definition to it's constituent parts."

Langley's definition is more closely aligned with utility computing, but may be best summarized as an "if you can run it on the Internet, it's a cloud" definition.

All of which leads to a gap in terminology that gets filled by whatever reaches the vacuum at the moment: what do you call a "cloud-like" infrastructure in a private data center? As I noted to the Google Groups Cloud Computing alias:

"[H]ere (is) how I arrived at that conclusion:

If "grid computing" is about running job-based tasks in a MPP model (e.g. HPC) (as it seems to be defined for many), and

If "utility computing" is a business model for providing computing on an as-needed, bill-for-what-you-use basis, and

If "cloud computing" is a market model describing services provided over the Internet (which it is for most of the Web 2.0 world), and

If "virtualization" describes providing software layers in the execution stack to decouple software from the hard resources it depends on (and it is important to note for the purposes of this argument that "resource-pooled" does NOT require virtualization in this sense; it is quite possible to run your software on bare metal server pools, as we did at Cassatt)

Then, what do we call the systems/infrastructure model where resources are pooled together, and used for a variety of workloads, including both job-based and "always running" tasks (such as web applications, management and monitoring applications, security applications, etc.)?

Do we redefine "grid" to cover the expanded role of resource-pooled computing (as 3TERA seems wont to do)? Do we leverage "utility computing" as an adjective for platforms that can deliver that business model for those that own infrastructure (as Cassatt and IBM tend to do)? Does the term "virtualization" represent a broader view than how VMWare, Microsoft and Citrix are defining it? Is there another term (such as "resource-pooled computing"--ugh) that would better serve the discussion?"

I'm still hunting for the answer to that one.

However, in terms of my definition of cloud computing, I have to say I lean towards the "anything you can run on the Internet" camp, as it--to me--best represents what an actual drawing of a cloud means in a system diagram. Just "go to the cloud" and get what you need, whether it's a complete CRM system or a simple purchasing service. This eliminates a million potential grey areas at the boundaries of the "only about services" definition. Is PayPal a cloud service? Why or why not?

I'd love to hear from those of you that are beginning to see some consensus in online communities about what constitutes a cloud or cloud service and what doesn't. In the meantime, I am settling down for another long summer of fog (this is the Bay Area, after all), though I'll have plenty of company, I'm sure.

Turns out that Luis has set up a group blog that I feel is much better targeted at the Alfresco audience in general, so I am going to kill MiningAlfresco before it really gets started. Sorry for the false alarm, but check out The Gang: Thoughts from the Alfresco Field Technical Team if you remain interested in the topic.

Thursday, May 29, 2008

I didn't want to sully this blog by introducing a whole bunch of ECM/Alfresco stuff here, so I created a second blog for that content. Mining Alfresco will cover my experiences in learning the ECM market, Alfresco (look for a lot of technical postings), and how all of that relates to the topic of this blog, Cloud Computing. If you have an interest in ECM or "Content in the Cloud", you may want to check it out and subscribe.

I also want to apologize for the "dead time" in my posting, but as you can imagine spinning up a new job takes a lot of focus. I'll try to fit in several new posts in the coming days.

Thursday, May 22, 2008

Ken Oestreich blogged recently about the very cool, probably landmark release of Cassatt that just became available, Cassatt Active Response 5.1. He very eloquently runs down the biggest feature--demand based policies--so I won't repeat all of that here. What I thought I would do instead is relate my personal thoughts on monitoring based policies and how they are the key disruptive technology for data centers today.

To be sure, everyone is talking about server virtualization in the data center market today, and that's fine. Its core short-term benefit, physical system consolidation and increased utilization, is key for cost-constrained IT departments, and features such as live motion and automatic backup are creating new opportunities that should be carefully considered. However, virtualization alone is limited in its applications, and does little to actually optimize a data center over time. (This is why VMware is emphasizing management over just virtualizing servers these days.)

The technology that will make the long term difference is resource optimization: applying automation technologies to tuning how and when physical and virtual infrastructure is used to solve specific business needs. It is the automation software that will really change the "deploy and babysit" culture of most data centers and labs today. The new description will be more like "deploy and ignore".

To really optimize resource usage in real time, the automation software must use a combination of monitoring (aka "measure"), a policy engine or other logic system (aka "analyze") and interfaces to the control systems of the equipment and software it is managing (aka "respond"). It turns out that the "respond" part of the equation is actually pretty straightforward--lots of work, but straightforward. Just write "driver"-like components that know how to talk to various data center equipment (e.g. Windows, DRAC, Cisco NX-OS, NetApp Data ONTAP, etc.), and that handle error conditions by responding directly or forwarding the information to the policy engine.
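To make the measure/analyze/respond split concrete, here is a minimal Python sketch of one pass through such a loop. It is my own illustration and not Cassatt's implementation: the names Action, RecordingDriver and control_cycle are hypothetical, and the "driver" only records what it was asked to do rather than talking to real equipment.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]


@dataclass
class Action:
    target: str  # name of the managed resource (hypothetical)
    verb: str    # what to do with it, e.g. "power_off"


class RecordingDriver:
    """Stand-in for a "driver"-like component; it only records requests."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def apply(self, action: Action) -> None:
        self.log.append(f"{action.verb}:{action.target}")


def control_cycle(
    measure: Callable[[], Dict[str, Metrics]],
    analyze: Callable[[str, Metrics], List[Action]],
    drivers: Dict[str, RecordingDriver],
) -> List[Action]:
    """Run one measure -> analyze -> respond pass and return the actions taken."""
    taken: List[Action] = []
    for target, metrics in measure().items():    # measure
        for action in analyze(target, metrics):  # analyze
            drivers[target].apply(action)        # respond
            taken.append(action)
    return taken
```

The point of the shape is that measure and analyze are passed in by whoever configures the system, while the drivers stay generic.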

The other two, however, require more immediate configuration by the end user. Measure and analyze, in fact, are where the entire set of Service Level Automation (SLAuto) parameters are defined and executed on. So, this is where the key user interface between the SLAuto system and end user has to happen.

What Cassatt has announced is a new user interface to define demand based policies as the end user sees fit. For example, what defines an idle server? Some systems use very little CPU while they wait for something to happen (at which point they get much busier), so simply measuring CPU isn't good enough in those cases. Ditto for memory in systems that are compute intensive but handle very little state.

What Cassatt did that is so brilliant (and so unique) is to allow the end user to leverage the full range of SNMP attributes for their OS, as well as JMX and even scripts running on the monitored system to create expressions that define an idle metric that is right for that system. For example, on a test system you may in fact say that a system is idle when the master test controller software indicates that no test is being run on that box. On another system, you may say it's idle when no user accounts are currently active. It's up to you to define when to attempt to shut down a box, or reduce capacity for a scale-out application.
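As a rough illustration of per-system idle definitions (again a sketch of the idea, not Cassatt's expression language), each class of system can carry its own rule over whatever readings are collected for it. The metric names tests_running and active_user_sessions, and the system class names, are assumptions made up for this example.

```python
from typing import Callable, Dict

Reading = Dict[str, float]
IdleRule = Callable[[Reading], bool]


def idle_on_test_rig(reading: Reading) -> bool:
    # Idle when the test controller reports no test on the box,
    # regardless of how much CPU the box happens to be using.
    return reading["tests_running"] == 0


def idle_on_login_host(reading: Reading) -> bool:
    # Idle when no user accounts are currently active.
    return reading["active_user_sessions"] == 0


# One rule per class of system, chosen by the end user.
IDLE_RULES: Dict[str, IdleRule] = {
    "test_rig": idle_on_test_rig,
    "login_host": idle_on_login_host,
}


def is_idle(system_class: str, reading: Reading) -> bool:
    return IDLE_RULES[system_class](reading)
```

Note that a test rig with nearly zero CPU use still counts as busy while a test is assigned to it, which is exactly the case a CPU-only rule gets wrong.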

Even when such an "idle" system is identified, Cassatt gives you the ability to go further and write some "spot checks" to make sure the system is actually OK to shut down. For example, in the aforementioned test system, Cassatt may determine that it's worth trying to power down a system, but a spot check could be run to determine whether a given process is still running, or whether an administrator account is currently logged in to the box, either of which would indicate to Cassatt that it should ignore that system for now.
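The spot-check idea can be sketched as a second gate that only runs once a system already looks idle. This is a hypothetical Python sketch; the function name and the convention that a check returns True when shutdown is safe are my assumptions.

```python
from typing import Callable, Iterable

# A spot check returns True when it is safe to proceed with shutdown.
SpotCheck = Callable[[], bool]


def ok_to_power_down(idle: bool, spot_checks: Iterable[SpotCheck]) -> bool:
    """Allow shutdown only if the system is idle and every spot check passes."""
    if not idle:
        return False  # not idle, so the spot checks are never run
    return all(check() for check in spot_checks)
```

A check for "no administrator logged in" or "the given process is not running" would each be one SpotCheck; any single failure leaves the system alone for now.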

I know of no one else that has this level of GUI configurable monitor/analyze/respond sophistication today. If anyone wants to challenge that, feel free. Now that I no longer work at Cassatt, I'd be happy to learn about (and write about) alternatives in the marketplace. Just remember that it has to be easy to configure and execute these policies, and scripting the policies themselves is not good enough.

It is clear from the rush to release resource optimization products for the cloud, such as RightScale, Scalr, and others, that this will be a key feature for distributed systems moving forward. In my opinion, Cassatt has launched itself into the lead spot for on premises enterprise utility computing. I can't wait to see who responds with the next great advancement.

However, I was beginning to feel like I needed another beefed up system of my own at home to act as a multi-guest virtual "server farm" for various experiments, etc., that may include scale-out benchmarking, interesting integration issues, etc. My initial thought was an 8-core Mac Pro loaded with memory and disk, which would have set me back about $6500. So I asked Luis what he thought, and he said, "Don't Bother. Whenever I need a bunch of servers to test with, I generally find [Amazon] EC2 works perfectly fine."

You could have heard the head slap a mile away.

With all of my focus being on enterprise computing the last two years, I had totally lost sight of the "individual" applications of a cloud like EC2. I no longer have to think about building up a server farm of my own, or purchase a big honkin' dual Quad-core tower, or even reserve space on the corporate "cluster library". I just need my credit card, my Amazon account, and a little time with the "Getting Started" tutorial, and I have all the server resources I need at a price that is a fraction of buying the big box, with billing that allows me to easily expense work-related computing. Damn, I love the modern world!

Now, all of this probably seems so obvious to all of you out there, and it probably cracks you up to see a cloud computing blogger miss this opportunity to "reach for the clouds", so to speak. However, I think this is indicative of the change that both individuals and enterprises must go through to take advantage of this new breed of technologies.

I, like many Fortune 500 IT departments, am an old school client-server/SOA guy. I have a "use the right tool for the job" mentality, driven by years of pain trying to force procedural pegs into SOA holes. This mentality leads to a "best of breed" bias that leads one to worry about the ground up implementation of any software solution. If a tool was found that reliably hid some of that implementation, that was awesome and incredibly helpful to productivity. However, one still needed to understand how the server worked with the OS worked with the middleware worked with the application implementation to be comfortable going to production.

To me, Amazon, Mosso, Cassatt and others are indicative of a major change in this mentality. With reliable shared configurations of systems (or a reliable systematic infrastructure for matching compute tasks to disparate resources that can handle those tasks), application developers now need to know less and less about the server, networking and storage part of the equation. Now, with the focus from the OS on up the stack, developers can start shopping for the infrastructure that makes economic sense for the problem they are trying to solve. The trick, of course, is to remember there are alternatives to buying your own servers.

So, this week I started to play with Amazon EC2, S3 and Cloud Services' new instance management tool, Cloud Studio. Let me just say, I am incredibly impressed with what I've done so far, which is little more than creating, starting and terminating instances (with a little between-machine networking thrown in for fun). Even using Amazon's command line tools, it is a pretty straightforward process to get either a 32-bit or 64-bit server, but when you add the visual cues of Cloud Studio, it just becomes so simple it boggles the mind.

Now, there are definitely disadvantages to using Amazon for some problems. Windows support is out, for instance. (Anyone have a good suggestion for a true on-demand pricing option for Windows? Mosso would work, I hear, but they have a fixed upfront price that is a little steep for my general needs.) Also, any work that involves large amounts of data transfer ups the ante greatly. (Kevin Burton talked about this some time ago--see his note about bandwidth pricing just below the last quote, about half way down.) However, I will never again forget to consider the cloud before "own your own" for any computing task I have in my personal world.

Saturday, May 17, 2008

As I discussed in my last post, the change of jobs gives me the opportunity to broaden the coverage of this blog somewhat beyond the basic topic of delivering SLAuto to enterprise data centers. To more completely reflect this, and (quite frankly) to increase visibility to those searching for information about cloud computing and utility computing, I have changed the title and description of this blog.

Now titled "The Wisdom of Clouds" (with absolute apologies to James Surowiecki and his great book, The Wisdom of Crowds) this blog will discuss cloud computing, utility computing, SaaS, PaaS and HaaS as they relate to both the enterprise and individual users. This really isn't much of a departure from the topics covered in the last year or so--in fact, I considered sub-titling the blog "Covering your *aaSes since 2006"--but the explicit description allows more people to more readily discover my ramblings.

For those who have been following this blog for some time, as well as those who have just discovered it, I thank you. I hope you will join me in creating and shaping "the wisdom of clouds".

Thursday, May 15, 2008

It is with mixed feelings that I announce that I am leaving Cassatt, effective COB tomorrow. I want to state first and foremost that this change was for personal and family reasons, and NOT because of issues with either the company or technology at Cassatt. I had a phenomenal two years with the company, and will remain in touch with much of the organization in the coming years. Cassatt still has the most technology independent solution to data center optimization and on-premises utility computing infrastructure. I still firmly believe in the vision and opportunity that is Cassatt.

That being said, the commute was killing me, and with Owen in preschool, Emery home with a nanny that rightfully deserves reasonable working hours each day, and Mia in clinicals for her sonography program, something had to give. Not being in a hurry to make a change, I dabbled in conversation with a few traditional enterprise sales companies, but none of them blew me away. Then, unexpectedly, Matt Asay contacted me via LinkedIn (the world's BEST professional network), and asked me if I would be interested in his current open source endeavor, Alfresco.

Now, Enterprise Content Management (ECM) has never been my gig, but I thought Matt's pitch was interesting, so I took a call with him. As he covered the company, the technology and the opportunity, I found myself getting more and more excited. After then talking to the VP of Alliances, Martin Musierowicz, and the Senior Director of Solutions Engineering, Luis Sala, and downloading and playing with the technology, I felt it was an opportunity that met both my immediate and long term needs. It was almost a no-brainer to accept the offer when it arrived.

The position, a solutions engineering role working with their Alliances group, lets me work from home, so the commute couldn't be better (though there are clients all around the Bay Area, and there will be occasional travel both domestically and internationally). It is also with a growing open source company, which I am extremely excited about. Oh...and I get a Mac!

In that first conversation with Matt, I asked him a key question: "Matt, you've seen my blog; I remain keenly interested in cloud/utility computing. How does that fit in with Alfresco's needs, and would I be able to continue exploring the subject as part of my job?" In response, Matt said flat out that Alfresco very much encourages the building of personal brands, and that cloud computing is indeed a subject of interest to Alfresco. So, I will continue this blog, though I will probably rename it later this week. (Mostly for better search visibility...)

As for ECM, I intend to build some real expertise in the space very quickly, so I will create a separate blog (probably ecm.jamesurquhart.com) to cover that space, and my experience with Alfresco. There will be some cross-linking between the blogs as I discover how the two technologies can help each other, but I will endeavor to keep each on topic as much as possible.

(One interesting side effect of creating an ECM blog is that I risk landing in the crosshairs of James McGovern. How fun would that be?)

I will miss my friends and colleagues at Cassatt, but I am very much looking forward to this new future. As always, I can be contacted on LinkedIn, FriendFeed or at jurquhart (no spammers) at yahoo dot com.

Monday, May 12, 2008

A few days ago there was significant coverage of Project Caroline, Sun's new open source cloud computing platform and service offering. While seemingly taking a page directly out of Google's play book, Caroline is actually interesting for a few key differences (adapted from Rich Zippel's blog):

It is an open source research project, not an actual product offering at this time. This means Sun's services are offered for free. Of course, there is one catch with regards to the Sun offering: you must have a Grid account, and you will be charged for resources used on that grid.

The source code for the entire stack is freely available today. Not just the programming APIs, as in Google's case, but the entire stack. If you are comfortable using Glassfish, Postgres, and "limiting" languages to Java, Perl, Python, Ruby, and PHP, you can start your own Caroline-compatible cloud computing company today. Just remember, it's a research project, so all of this is subject to change.

In some ways, this is what you would expect from Sun: an engineering research project touted as the future of computing. No charge for the software, etc, but note that Sun can actually monetize this through the Grid-hosted offering.

I still hold some Sun stock, so I'm actually a little excited about the possibility that there may be an actual new revenue stream here. Could you imagine, Sun actually branching out from pure hardware? The timing is good too, as they may have a better prescription than their more successful competitors, at a time when sales to corporate data centers may be hitting their peak. If handled well (which is a big "if" with Sun), this could guarantee a growing revenue stream for decades to come, even if corporate IT nearly stops buying servers.

I haven't played with Caroline yet, but I think Sun is at least marketing the platform I hoped that Google, or Microsoft, or Adobe, or someone out there would have built. Yeah, it's Sun, so it's probably a computer science dissertation project to configure and manage the thing, but who else is doing five languages on industry standard infrastructure with RDBMS support?

I'm hoping to get around to evaluating this in some detail in the next couple of weeks, so stay tuned.

Friday, May 09, 2008

A few weeks ago I joined the Google Groups Cloud Computing group as a charter member of sorts. Today there is an excellent thread going on regarding the various elements needed to make a cloud market work. It started with Reuven Cohen of Enomaly proposing the need for a new marketing term...er, technology concept...he calls a Virtual Private Cloud. The idea here is to create a logical container for a variety of resources located across multiple data centers and technology, making it all appear as a single homogeneous computing environment with security, management (SLAuto?), etc.

However, what has spawned from that original proposal is a wide ranging, but thoughtful discussion of what is needed to allow for an open market for compute resources in the cloud. Participating are several of the vendors supplying various services for Amazon EC2 today (and, I'm sure, a wide range of others in the future), and several end users of primarily "Computational Grid" technologies.

"Condor (http://www.cs.wisc.edu/condor/) is a project that enables this sort of scenario. But in so doing, things go full sircle [sic], and suddenly the paradigm is the old days of mainframe computing where the notion of "job" is separate from the underlying computing resource."

I replied

"I hate the notion of every software executable being a "job". [With most] high availability applications running additional instances [of "permanent" processes] in excess capacity at another site is a distinctly possible scenario.

I prefer the term "software payload" to describe what gets moved from cloud provider to cloud provider, at least at the HaaS level."

The idea of a "job" is just not applicable to many user-facing applications.

"Application scaling IMHO will always involve a mixture of automated systems and programing changes to the application. I don't think this aspect of cloud computing can ever be completely automated.

The typical "throwing hardware at it" works up to a point and in cloud computing will be no different since the a cloud system is still based upon the Von Neumann architecture. There is a point where it becomes more than a sysadmin/scaling challenge. Programming changes will need to made to the specific application. What scales to X, won't scale to Y because of a different bottleneck."

And I agree, adding

"I see two architectural problems to be managed when building an application for the cloud:

- Scalability - which you cover below

- Fluidity - which is the ability to move an application, *application tier* or service between cloud infrastructures without rewriting or reconfiguring the software payload"

Geoffrey Fox notes that much work has been done to analyze and design Computational Grid economies. I assumed that Geoffrey was talking about grid computing models based on splitting up "jobs", so I noted:

"The problem is partially that Computational Grid computing is a subclass of the Cloud Computing metrics and standard problem. See earlier notes about long running processes versus "jobs"."

In other words, I'm not sure how far people have gone to analyze enterprise computing on a grid versus just HPC, etc.

There is so much more to read on this thread and others in this rapidly growing group; not the least of which will be the responses to my thoughts here. If you haven't joined yet, and you are interested in cloud computing strategy and tactics, I recommend that you get involved.

Now, I'm a huge Gabriel fan, so this was interesting in part because I feel for the guy and hope nothing of great value was stored on those servers. However, my interest was piqued by the realization that this highlights one of the key values of decoupling software from hardware. To illustrate this advantage, I'd like to paraphrase Heisenberg's famous Uncertainty Principle:

In shared resource computing, you can locate the server, but you cannot firmly define what is running on the server (over time); conversely, you can define the software image, but it is difficult to firmly locate which server it is running on (over time).

Thus, if someone comes into a data center that is sharing server resources in a utility computing like model and steals a server, they will very likely get no data whatsoever. Conversely, if they want the data, they have to steal all of the storage associated with the server image, which in many environments is spread amongst several physical drives; is dependent on the network infrastructure in which it is running; and is useless without both a compatible server to execute it, and a compatible management system to deliver it to that server.

To me, this greatly enhances system security over dedicated server models. If Gabriel's stuff had been PXE booted on random servers around the hosting center, from distributed storage systems, he might have foiled his thief's plans. He certainly would have made it much more technically difficult for them.

The more I learn about decoupling software from hardware, whether through server virtualization or policy-based dynamic deployment, the more I think it's a no-brainer for most computing applications. Plus, it makes SLAuto possible--which has its own benefits, of course.

Saturday, May 03, 2008

I've been quite silent for a week or two, mostly because of my responsibilities as a sales engineer: doing my part in closing key deals for my employer. I've spent this time sitting in meetings, installing and configuring software, and measuring power savings in large dev/test lab installations. (By large I mean hundreds approaching thousands of servers.) All in all, it's been a successful couple of weeks, but it's kept me from keeping too close an eye on the big news coming out of the cloud and utility computing markets.

However, as I thought about this more, I realized that I have drifted significantly from my core subject, Service Level Automation (or SLAuto), in the last six months or so--mostly due to the incredible burst of cloud computing innovation to be announced and/or delivered in that time frame. I still believe that there are two key components to an open cloud market that scales:

Portable platforms that allow customers to change vendors on a whim

Automation that takes action to acquire, release or replace services based on pre-determined service targets

The latter, simply said, is SLAuto.
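As a very rough Python sketch of what "acquire, release or replace based on pre-determined service targets" could look like, consider a planner that compares the current set of instances against a target. The target here is just a minimum and maximum instance count, which is my simplifying assumption; real service targets would more likely be response times or throughput. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    name: str
    healthy: bool


@dataclass
class ServiceTarget:
    min_instances: int  # fewest healthy instances the service may run on
    max_instances: int  # most healthy instances worth paying for


def plan(instances: List[Instance], target: ServiceTarget) -> List[str]:
    """Return the acquire/release/replace steps needed to meet the target."""
    healthy = [i for i in instances if i.healthy]
    broken = [i for i in instances if not i.healthy]
    need = target.min_instances - len(healthy)
    steps: List[str] = []
    for inst in broken:
        if need > 0:
            steps.append(f"replace {inst.name}")  # swap for a working one
            need -= 1
        else:
            steps.append(f"release {inst.name}")  # not needed, stop paying
    steps.extend(["acquire"] * max(need, 0))      # still short of the minimum
    for inst in healthy[target.max_instances:]:   # above the maximum
        steps.append(f"release {inst.name}")
    return steps
```

Nothing in the planner is tied to a particular vendor, which is where the portability half of the argument comes in: the same plan could be carried out against whichever provider is cheapest at the moment.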

Of course, what is happening is sort of the nascent birth of cloud computing technologies, where the DNA hasn't had a chance to recombine to build long term survivability into any given "species" yet. We all knew that AWS was doing cool things, but who knew that they would cross the chasm in terms of customer demand as completely as they did? Yet, there is no portability story for Amazon (at least not off of Amazon); and the market forming for SLAuto (see RightScale and others) is tightly tied to the Amazon platform.

The rest of the "big" announcements are worse: Microsoft has no concept of management in Live Mesh (other than synchronization) that I can see, and Google and Yahoo are both building platforms with developers in mind, where service levels are a business agreement, not a platform differentiator. I understand we are taking baby steps here, but I wonder how long it is before corporate IT realizes that they are both a) locked in (at least in an economic sense), and b) paying too much to operate software that doesn't even run in their data center.

Now, I say all of this, but truth be told, most corporate IT shops don't do SLAuto today. So, why should this change in the cloud? I hinted at it earlier: scale. Not scale of functional execution or data access, as we usually think of the term, but scale of market--the speed at which companies will need to respond to the ever evolving marketplace for cloud services and platforms. As the self-professed "open" nature of Google and Yahoo's platforms becomes more of a reality, combined with true innovation in "industry" standard APIs (for capacity management, code platforms and feature integration), there is little doubt that pressure will be on the IT shop to optimize the cost of delivering business services to the rest of the company. Again, I argue that this cannot be done without SLAuto. Prove me wrong.

I am really concerned that SLAuto is still considered "bleeding edge" in most IT shops. It's not rocket science, and the future of IT cost management almost certainly has to be built around it. On the other hand, perhaps as some of the customers I worked with the last couple of weeks serve as references to the value of SLAuto--at least in terms of energy costs--more of them will understand its urgency.

About Me

James Urquhart is a widely experienced enterprise software field technologist. James started his career programming a manufacturing job tracking system on the Macintosh (circa 1991), and slowly expanded his experience to include distributed systems architectures, online community and identity systems, and most recently utility computing and cloud computing architectures. He has held positions in pre- and post-sales services, software engineering, product marketing, and program management for the online developer communities of one of the largest developer sites in the world. His admittedly schizophrenic background is driven by a desire to work with technologies that are disruptive, but that simplify computing overall.

James is also an avid blogger. His primary blog, recently renamed "The Wisdom of Clouds" (http://blog.jamesurquhart.com), is focused on utility computing, cloud computing and their effect on enterprises and individuals.

In addition to his online work, James is the father of two children: a son, Owen; and a daughter, Emery; and the husband of the perfect friend and wife, Mia. James lives in Alameda, CA, and plays rock and bluegrass guitar.