January 18, 2010

The Missing Piece in the Virtualization Stack (Part 1)

This post and the next will discuss how virtualization and cloud computing, as we know them today, are only a small part of the solution for today’s IT inefficiencies. While new technologies and delivery models have made it much simpler to manage the infrastructure, this is not where our core inefficiencies lie. Virtualization principles must be extended to higher levels of the application stack, to make it easier for all of us to manage, tune and integrate applications. Otherwise we will continue to spend most of our time on things that don’t provide real value to the business.

What do we really spend our time on?

If you’ve been in the application development space for a while, I'm sure you are familiar with the current application development cycle. The diagram below shows a typical cycle. As you can see, we spend a large part of our time on things that don’t provide real value to our business.

Typical application development lifecycle

The continuous demand for scale has made things even worse – many of us are forced to repeat this cycle over and over again, every time we are faced with new scaling requirements.

The promise of virtualization/cloud

Virtualization and cloud computing aim to remove a large part of the overhead involved in setting up the infrastructure (buying new hardware, setting it up, installing it, etc.). Indeed, we can now start a new machine just by calling an API, lease a machine, or even completely outsource our entire infrastructure to a public hosting provider.

Does this solve all of our problems?

As I outlined in the diagram above, setting up the infrastructure is only part of the challenge in developing a new business application. If you measure the complexity and effort required, plugging an application into the infrastructure isn’t necessarily the biggest challenge. Most of us spend most of our time maintaining our code, plumbing it to other services within our organizations and continuously tuning it. In recent years, with the growth of data volumes on the one hand and the demand for better efficiency on the other, I have found that most of the time (and cost!) is spent on dealing with these two conflicting requirements: each demand for additional scaling forces us to go through a complete cycle of tuning and design, and in some cases through a complete product selection phase, to meet the demand.

Last week alone, I found myself spending a good amount of time in discussion with a large telco ISV that built its solution through a combination of storage devices, databases, and so on. In the telco world many of these services face both an increase in the size of the data (per user) and an increase in the number of users. Imagine the increase in the size of pictures that you’re able to send through your phone. It started with a few KB, is now up to 100KB and will soon reach megabytes of data per message as camera resolution grows. Multiply that by the number of users and messages per second and you get a classic scaling challenge. In this telco ISV’s specific case, it is fairly easy to partition the problem based on users (personally, I believe that this is only a temporary assumption, as I'm sure that with the likes of Twitter it will no longer hold true). Now, they could have gone the traditional way of scaling, which is to duplicate their system several times, with each unit dealing with a smaller number of users. That sounds easy, so why are they still reluctant to do it? The answer is fairly simple – cost. Duplication is potentially an easy solution, but a fairly inefficient one. Taking into account that there is only so much a customer would be willing to pay, that cost will come out of their own pocket and affect their profit margin. It may even cost them their business, as at this point they might be beaten by a competitor who comes up with a more efficient solution.
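To get a feel for the arithmetic, here is a rough back-of-the-envelope sketch. The user count, the per-user message rate and the capacity of a single duplicated unit are all hypothetical numbers that I picked for illustration; only the three message sizes echo the figures mentioned above.

```python
import math


def bytes_per_second(message_bytes, users, messages_per_user_per_s):
    """Aggregate payload rate: message size x users x per-user message rate."""
    return message_bytes * users * messages_per_user_per_s


def units_needed(total_load, unit_capacity):
    """How many duplicated system units the load requires, rounded up."""
    return math.ceil(total_load / unit_capacity)


# All figures below are hypothetical, chosen only to show the shape of the growth.
USERS = 1_000_000
MESSAGES_PER_USER_PER_S = 0.01
UNIT_CAPACITY = 50 * 1024 * 1024  # bytes/s that one duplicated unit can absorb (assumed)

for label, size in [("a few KB", 4 * 1024), ("100 KB", 100 * 1024), ("1 MB", 1024 * 1024)]:
    load = bytes_per_second(size, USERS, MESSAGES_PER_USER_PER_S)
    units = units_needed(load, UNIT_CAPACITY)
    print(f"{label:>8}: {load / 1024 / 1024:10.1f} MB/s -> {units} unit(s)")
```

The point is not the exact numbers but the shape: if each unit is simply a copy of the whole system, the number of copies (and the cost) grows in step with the load.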

The elephant in the room

Now here is the elephant in the room – would virtualization or cloud solve their problem? It might be a solution – but for only a small part of their challenge. And that’s my point. We spent a good deal of the past year talking about cloud and virtualization as the solution for all of our inefficiency problems, but we forgot that they cover only a small part – in some cases even a fairly insignificant part – of our challenge. Our main real-life challenge is not how to make our infrastructure more efficient but how to make our business more efficient!

To illustrate this gap, I like to use the three questions below:

Assuming that with cloud and virtualization you can easily create a new machine by calling an API…

Q1: What would happen to your existing application when you add a new machine? A1: Nothing – it wouldn’t even know that the machine exists unless we told it (through manual work).

Q2: Assuming that you addressed (1) – which part of your application would you run on that new machine? A2: It depends... we need to measure and see => meaning more manual work…

Q3: Assuming that you addressed (2) – what do you expect would be the impact of the new hardware capacity on your application, in terms of latency/throughput or concurrent users? A3: We wouldn’t know until we measure it in real life => meaning lots of manual tuning, testing, optimization and in some cases redesigning your entire system.

The solution

The challenge I was trying to outline doesn’t necessarily point to a flaw in virtualization or even in cloud computing, which is basically an outsourced version of virtualization. It has more to do with the fact that the IT world has applied the concept of virtualization only to the lower part of the stack, the infrastructure, and expected that this would solve all of its inefficiencies. Conceptually, I believe that virtualization is the way to go, but it needs to be applied through the entire stack, as I outlined in one of my earlier posts (The Missing Piece in Cloud Computing – Middleware Virtualization). To learn how to apply the concept of virtualization through the entire stack, we first need to understand how virtualization works in the layers where it is already used. If you examine different virtualization technologies such as storage virtualization, operating system virtualization and desktop virtualization, a pattern emerges:

The Virtualization pattern

1. Break big physical resources into smaller logical units

2. Decouple the application from the physical resources

3. Provide an abstraction that makes all the small units look like one big unit
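The three steps above can be sketched in a few lines of Python. This is only a minimal illustration under my own assumptions: the class name VirtualStore, the use of plain dictionaries as the "small units" and the hash-based routing are hypothetical choices and do not describe any specific product.

```python
import hashlib


class VirtualStore:
    """One logical store backed by several small units (plain dicts here)."""

    def __init__(self, unit_count):
        # Step 1: the big resource is broken into smaller logical units.
        self._units = [dict() for _ in range(unit_count)]

    def _unit_for(self, key):
        # Step 2: only the abstraction knows which unit holds a given key.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._units)
        return self._units[index]

    # Step 3: callers see a single store with put/get, not the units behind it.
    def put(self, key, value):
        self._unit_for(key)[key] = value

    def get(self, key):
        return self._unit_for(key)[key]

    def __len__(self):
        return sum(len(unit) for unit in self._units)


store = VirtualStore(unit_count=4)
for user_id in range(100):
    store.put(f"user-{user_id}", {"messages": 0})
print(len(store), store.get("user-7"))
```

The application code at the bottom never mentions a unit; that is the decoupling the pattern relies on.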

Scaling pattern of a virtual resource

When you scale a virtualized resource, you plug in more small physical resources and thus increase your capacity. The abstraction layer is responsible for detecting these new resources and adding them to its pool. Since the application is decoupled from these resources, it “sees” the increased capacity without having to worry about where those resources are located.
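Here is a small sketch of that scaling behavior. It is an illustration only: ResourcePool and its methods are hypothetical names, and the detection of a new resource is reduced to an explicit add_unit call, whereas a real abstraction layer would discover new resources on its own.

```python
class ResourcePool:
    """Presents many small units to the application as one pool of capacity."""

    def __init__(self):
        self._units = {}  # unit name -> capacity contributed by that unit
        self._used = 0

    def add_unit(self, name, capacity):
        # Stand-in for detection: in this sketch the new unit is registered by hand.
        self._units[name] = capacity

    @property
    def capacity(self):
        return sum(self._units.values())

    @property
    def free(self):
        return self.capacity - self._used

    def allocate(self, amount):
        # The application asks the pool for capacity, never a specific unit.
        if amount > self.free:
            raise RuntimeError("pool exhausted")
        self._used += amount


pool = ResourcePool()
pool.add_unit("node-1", capacity=8)
pool.allocate(6)

try:
    pool.allocate(6)  # does not fit: only 2 of 8 are free
except RuntimeError as error:
    print("before scaling:", error)

pool.add_unit("node-2", capacity=8)  # plug in one more small resource
pool.allocate(6)  # the same call now succeeds, unchanged
print("capacity:", pool.capacity, "free:", pool.free)
```

Note that the application-side call, allocate(6), is identical before and after the new unit is added; only the pool changed.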

Making it more efficient through resource sharing and pooling

The way to make the solution more efficient is to pool resources and share them among multiple instances of the application. This is often called multi-tenancy. The general idea is that you can pool the resources of multiple users of your application and assume that none of them is going to require your full capacity, so you can put them on the same underlying hardware. Obviously, one of the biggest challenges with multi-tenancy is isolation, i.e., how to let each user “feel” as if she is running on her own dedicated resource.
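A very simplified sketch of the idea follows. The per-tenant quota used here as the isolation mechanism is my own assumption for the sake of the example, as are the names SharedPool, add_tenant and allocate; real isolation involves far more than a counter.

```python
class SharedPool:
    """One pool of capacity shared by several tenants, each capped by a quota."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._quota = {}
        self._used = {}

    def add_tenant(self, tenant, quota):
        self._quota[tenant] = quota
        self._used[tenant] = 0

    def used(self, tenant):
        return self._used[tenant]

    def allocate(self, tenant, amount):
        # Isolation: a tenant can never take more than its own quota.
        if self._used[tenant] + amount > self._quota[tenant]:
            raise PermissionError(f"{tenant} would exceed its quota")
        # Sharing: all tenants draw from the same underlying capacity.
        if sum(self._used.values()) + amount > self.capacity:
            raise RuntimeError("shared pool exhausted")
        self._used[tenant] += amount


# Quotas add up to 12 on a pool of 10: we bet that tenants rarely peak together.
shared = SharedPool(capacity=10)
shared.add_tenant("tenant-a", quota=6)
shared.add_tenant("tenant-b", quota=6)

shared.allocate("tenant-a", 4)
shared.allocate("tenant-b", 3)

try:
    shared.allocate("tenant-a", 3)  # 4 + 3 exceeds tenant-a's quota of 6
except PermissionError as error:
    print(error)
```

The efficiency comes from the overcommitment (quotas larger than the hardware), and the risk comes from the same place, which is why isolation is the hard part.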

I know this is a fairly simplistic view of the concept, and obviously doing this for a mission-critical application running in production is going to require much more thought. In Part 2 of this post I’ll discuss in more depth how to apply these principles through the entire stack.
