Monday, 18 October 2010

This is the second blog entry in a series documenting the underlying points I made in my recent talk at the OSGi Community Event in London. The talk was entitled "OSGi And Private Cloud"; the slides are available here and the agenda is as follows:

In this section of the talk I look at where OSGi fits into the Cloud architecture. However, as the community event was co-hosted with JAX London, it wasn't a given that everyone at my talk would know OSGi. This is possibly also true for others reading this blog, so to make sure we're all starting from the same page, I'll briefly explain the basics of what OSGi is about for those who have not come across it before.

OSGi: A Quick Review

I've been working with Richard Hall, Karl Pauls and Stuart McCulloch on writing OSGi In Action, which explains OSGi from first principles to advanced use cases, so if you want to know more that's a good place to look. However, here I'd like to give my elevator pitch for OSGi, which goes something like this...

OSGi first became a standard in 1999 and provides a set of specifications for building dynamic modular Java applications. It has success stories in every area of Java development, from embedded devices, through desktop applications, to enterprise applications. The core features that OSGi provides a Java application are:

Modules - the building blocks from which to create applications

Life cycle - control when modules are installed or uninstalled and customise their behaviour when they are activated

Services - minimal coupling between modules

You might say that none of these are new ideas, so why is OSGi important? The key is in the standardisation of these fundamental axioms of Java applications. Instead of every software stack having a new and inventive way of wiring classloaders together, booting components, or connecting component A to component B, OSGi provides a minimal flexible specification that allows us to get interoperability between modules and let developers get on with the interesting part of building applications.

An Uncomfortable Truth

To see where OSGi fits into the Cloud story it's worth taking a brief segue to consider a point made by Kirk Knoernschild at the OSGi community event in February this year. Namely that we are generating more and more code with every passing day:

Lines of code double every 7 years

50% of development time spent understanding code

90% of software cost is maintenance and evolution

By 2017, we'll have written not only double the amount of code written in the past 7 years but more than the total amount of code ever written combined! Object Orientation has helped in encapsulating our code so that changes in private implementation details do not affect consumers. But in fact OO turns out to be just a stopgap and it is reaching the limits of its capabilities. If you refactor public objects or methods you still need to worry about who is consuming them, and without modules this can be a hard question to answer.

Eric Newcomer of Credit Suisse gave another good talk at the recent community event on the scale of software development at the bank. The message I took away from this presentation is that within any large organisation, one can probably locate an example of virtually any computer algorithm ever conceived (in fact if you look hard enough you will more than likely find two). If we look out to small and medium sized organisations, sure, they won't have pre-canned examples, but any developer worth their salt can probably do one of three things: knock up some approximation within a couple of days, find an open source library to do the same job, or part with some money to a vendor to get the job done.

The message from these two presentations is that, as we move into the era of Cloud computing, the real problem is not how to author code but how to manage and reuse code, and to do so at scale. As businesses grow and Cloud makes hardware cheaper and cheaper to use, market competition is driving computer software to larger and larger scales to cope with increased processing, network and storage volumes. This scaling tends to lead to more complexity, often in an exponential relationship. But what do I mean by scale when talking about software in the Cloud? And how do we tame the complexity versus scale curve?

Types of Scale

There are three measures of scale that I think are of relevance to this discussion of OSGi and the Cloud:

Operational scale - the number of processors, network interfaces, storage options required to perform a function

Architectural scale - the number and diversity of software components required to make up a system to perform a function

Administrative scale - the number of configuration options that our architectures and our algorithms generate

In fact, I think we've got pretty good patterns by now for dealing with operational scale. As we increase the number of physical resources at our disposal, this drives the class of software algorithms required to perform a function. To pick a random selection: Actors, CEP, DHTs and Grid are just some of the useful software patterns for use in the Cloud. However, I think architectural and administrative scale are often less well managed.

In terms of architectural scale, think about all the libraries we have that perform similar functions: logging, data access layers, RPC frameworks, web frameworks. How many of the millions of lines of code that we are generating are boilerplate copies of each other? Redundant architecture is a major problem in the growth of software. Which parts are really providing value to the business? Which parts are harmless clones of each other? Which parts are a maintenance cost that should be replaced? We employ abstractions to protect ourselves from underlying implementations, but these abstractions can themselves become maintenance costs. I would argue that as software engineers we are suffering from the paradox of choice.

When managing code we need to worry about updating code, as code is very rarely a static entity; bugs are fixed, new APIs are created, old ones are deprecated and removed. With the volume of code in existence, we need mechanisms to manage the complexity created by the constant churn of logic that makes up our business systems. This leads us onto the problems of administrative scale.

Administrative scale hampers our ability to reason about and evolve deployed systems. The human brain has evolved to deal with relatively small connected graphs. But software today consists of multiple configuration options - libraries that implement APIs, network configurations, storage configurations, queue depths, the list is endless. When we look at the interconnected nature of many software architectures, do we really know what impact changing parts of the configuration will have?

All this brings me to...

OSGi Cloud Benefits

In Part 1 of this series of blogs I mentioned that the NIST definition of a cloud includes the statement that: "Cloud software takes full advantage of the cloud paradigm by being service oriented with a focus on statelessness, low coupling, modularity and semantic interoperability". To my mind OSGi has these bases covered.

OSGi is a specification for modular Java that encourages low coupling via the use of services and it certainly allows you to build stateless applications. OSGi also promotes semantic interoperability via the fact that the code runs in a JVM and is abstracted from the underlying platform. Higher order levels of interoperability can easily be enabled using API and implementation modules that provide service abstractions around common platform functions, such as accessing data or scheduling tasks.

But why should cloud software have these features?

Returning to my theme of scale and complexity from the previous section, modularity and service orientated architectures enable encapsulation of coherent components to help reduce architectural complexity. Semantic interoperability aids in the war against administrative complexity as the same code can run no matter what hardware or network environment it is deployed in. Finally, stateless architectures are just a good design goal for dealing with production scale.

OK interesting, but you might say that "TechnologyX (pick your favourite) can also provide these features, so really sell me on the OSGi cloud benefits". In which case I propose that there are four additional benefits of OSGi with respect to Cloud software which I'll deal with in turn:

Dynamic: Clouds can be tempestuous environments with latency and contention being major factors. In these sorts of environments software that is designed to cope with runtime dependencies coming and going is more robust than static architectures. Consider the analogy with civil engineering and bridge or aeroplane wing design - rigid architectures are more fragile than those that incorporate degrees of flexibility. The Remote Services chapter of the OSGi 4.2 specification promotes a discovery based services API. Thus, if a service dependency is lost due to movements in the Cloud, client code does not go into a spin waiting on socket connect timeouts; it just gracefully moves into whatever state makes most sense and the rest of the processing can continue.
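To make the "dependencies coming and going" idea concrete, here is a minimal sketch in plain Java. It is deliberately not the OSGi service API: ToyServiceRegistry, PriceService and PricingClient are hypothetical names invented for this illustration, and a real OSGi application would use the framework's service registry instead. The point is only the shape of the client code, which looks its dependency up each time and degrades gracefully when it is absent.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

final class DynamicServiceDemo {

    // Toy stand-in for a dynamic service registry; NOT the OSGi API.
    static final class ToyServiceRegistry {
        private final Map<Class<?>, Object> services = new ConcurrentHashMap<>();

        <T> void register(Class<T> type, T service) {
            services.put(type, service);
        }

        void unregister(Class<?> type) {
            services.remove(type);
        }

        // Returns the service if one is currently registered, otherwise empty.
        <T> Optional<T> lookup(Class<T> type) {
            return Optional.ofNullable(type.cast(services.get(type)));
        }
    }

    // Hypothetical service interface used only for this illustration.
    interface PriceService {
        double priceOf(String symbol);
    }

    static final class PricingClient {
        private final ToyServiceRegistry registry;

        PricingClient(ToyServiceRegistry registry) {
            this.registry = registry;
        }

        // Looks the service up on every call, so it never holds a stale reference,
        // and falls back to a sensible state when the dependency has gone away.
        String describe(String symbol) {
            return registry.lookup(PriceService.class)
                    .map(s -> symbol + " = " + s.priceOf(symbol))
                    .orElse(symbol + " = unavailable");
        }
    }

    public static void main(String[] args) {
        ToyServiceRegistry registry = new ToyServiceRegistry();
        PricingClient client = new PricingClient(registry);

        System.out.println(client.describe("ACME")); // no service registered yet
        registry.register(PriceService.class, symbol -> 42.0);
        System.out.println(client.describe("ACME")); // service present
        registry.unregister(PriceService.class);
        System.out.println(client.describe("ACME")); // service gone again
    }
}
```

The client never blocks or throws when the service disappears; it simply reports that the price is unavailable and carries on, which is the behaviour the bridge analogy is getting at.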

Extensible: Clouds are all about expansion, so, following good XP principles, when you start a new project you should only develop the parts of the application you actually know are needed. Versioned module dependencies and service interfaces then mean that you can easily abstract or update simple implementations as the application's demands grow.
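As a rough illustration of what a versioned dependency buys you, the sketch below checks a version against a range written in interval notation such as "[1.0,2.0)", the style OSGi uses for package imports. This is a simplified stand-in written for this post, not the framework's own implementation: VersionRangeDemo and its methods are hypothetical names, only numeric major.minor.micro versions are handled, and qualifiers are ignored.

```java
import java.util.Arrays;

final class VersionRangeDemo {

    // Parses "major[.minor[.micro]]" into three ints; missing parts default to 0.
    // Simplification: any fourth segment (a qualifier) is ignored.
    static int[] parse(String version) {
        String[] parts = version.trim().split("\\.");
        int[] result = new int[3];
        for (int i = 0; i < parts.length && i < 3; i++) {
            result[i] = Integer.parseInt(parts[i]);
        }
        return result;
    }

    // Checks a version against interval notation such as "[1.0,2.0)":
    // '[' and ']' are inclusive bounds, '(' and ')' are exclusive bounds.
    static boolean inRange(String range, String version) {
        String r = range.trim();
        boolean lowInclusive = r.charAt(0) == '[';
        boolean highInclusive = r.charAt(r.length() - 1) == ']';
        String[] bounds = r.substring(1, r.length() - 1).split(",");

        int[] v = parse(version);
        int low = Arrays.compare(v, parse(bounds[0]));   // v compared with lower bound
        int high = Arrays.compare(v, parse(bounds[1]));  // v compared with upper bound

        return (lowInclusive ? low >= 0 : low > 0)
            && (highInclusive ? high <= 0 : high < 0);
    }

    public static void main(String[] args) {
        String range = "[1.0,2.0)";
        System.out.println(inRange(range, "1.0.0")); // true: lower bound is inclusive
        System.out.println(inRange(range, "1.4.2")); // true: a compatible upgrade
        System.out.println(inRange(range, "2.0.0")); // false: upper bound is exclusive
    }
}
```

A consumer that declares the range "[1.0,2.0)" can have its simple 1.0 provider replaced by a more capable 1.4.2 without any change on the consumer side, while an incompatible 2.0.0 is kept out.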

Lightweight: Clouds are meant to be light, right? OSGi promotes modular design. Modular designs in turn allow you to tune a software deployment to the actual task at hand. OSGi lifecycle and dynamic services patterns even allow us to extend an application at runtime. This enables all sorts of interesting new use cases, for example:

if you need to get diagnostics information out of the software, only deploy the diagnostics components for the time that they are needed - for the rest of the time run lean

if you need to scale up a certain component's processing power, swap an in-memory job queue for a distributed processing queue and when you're done swap it back again.

Self describing: OSGi bundles contain a description of the module in their jar manifest files. This helps in the war against administrative complexity, notably via automation and audit. Just as an OSGi framework can validate that a bundle has been deployed with all its necessary dependencies, it is also possible to reverse this process and download required dependencies automatically. There are several implementations of this pattern already in use in OSGi today: Nimble, OBR and P2. This simplifies deployments by allowing software engineers to focus on what they want to deploy instead of what they need to deploy.

In terms of audit, OSGi bundles have a number of standardised headers to describe meta features such as name, description, license and documentation, so if you want to find out about a piece of software in a bundle just look at its manifest information. Once you get into this frame of thinking, it also makes sense to embed other metadata such as author, build date and business unit in the bundle. This sort of meta information can greatly benefit system admins and system builders in the future.
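The following sketch shows how little is needed to read this self-description, using only the JDK's java.util.jar.Manifest class. The manifest text is made up for illustration (the com.example bundle and package names are hypothetical), and the dependency check is a deliberately naive stand-in for what a real framework or a tool like OBR does: it splits simple comma-separated headers and ignores attributes such as version ranges.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.jar.Manifest;

final class BundleManifestDemo {

    // Illustrative manifest; the bundle name and package names are made up.
    static final String SAMPLE =
        "Manifest-Version: 1.0\n" +
        "Bundle-ManifestVersion: 2\n" +
        "Bundle-SymbolicName: com.example.billing\n" +
        "Bundle-Version: 1.2.0\n" +
        "Bundle-Description: Example billing service\n" +
        "Import-Package: org.osgi.framework,com.example.storage\n" +
        "Export-Package: com.example.billing.api\n";

    static Manifest parse(String text) {
        try {
            return new Manifest(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Splits a simple comma-separated header into package names.
    // Simplification: attributes such as ;version="[1.0,2.0)" are not handled.
    static Set<String> packages(Manifest manifest, String header) {
        String value = manifest.getMainAttributes().getValue(header);
        Set<String> result = new LinkedHashSet<>();
        if (value != null) {
            for (String p : value.split(",")) {
                result.add(p.trim());
            }
        }
        return result;
    }

    // Imports of the bundle that are not covered by the available packages.
    static Set<String> missingImports(Manifest manifest, Set<String> available) {
        Set<String> missing = packages(manifest, "Import-Package");
        missing.removeAll(available);
        return missing;
    }

    public static void main(String[] args) {
        Manifest m = parse(SAMPLE);
        System.out.println(m.getMainAttributes().getValue("Bundle-SymbolicName"));
        System.out.println(m.getMainAttributes().getValue("Bundle-Version"));
        // Pretend only the framework package is currently available.
        System.out.println(missingImports(m, Set.of("org.osgi.framework")));
    }
}
```

The set returned by missingImports is exactly the list a provisioning tool would need to go and fetch, and the same manifest lookup answers audit questions such as "which version of this bundle is deployed?".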

OSGi Cloud Services

To conclude this post, assuming I've managed to convince you of the benefits of OSGi in Cloud architectures, here are some ideas for potential cloud OSGi services (definitely non-exhaustive):

MapReduce services - Hadoop or Bigtable implementations?

Batch services - Plug and play Grids?

NoSQL services - Scalable data for the Cloud!

Communications services - Email, Calendars, IM, Twitter?

Social networking services - Cross platform widgets?

Billing services - Making money in the Cloud!

AJAX/HTML 5.0 services - Pluggable UI architectures?

These would enable developers to start building modular, dynamic, scalable applications for the Cloud and are in fact pretty simple to achieve if there's the will power to make it happen.

I think OSGi provides an excellent foundation for building Cloud software. There are things it doesn't do, but its extensible nature means that it is very easy to build additional tools on top of it and really start addressing the problems of scale. I'll look at some of the tools I've been working on in this area in the final post.

So all good, right? Well, there are of course still challenges, so in the next post I'll look at some of these and discuss how to overcome them. In the meantime, I'm very interested in any feedback on the ideas found in this post.