Geir Magnusson on Cloud Computing


Bio: Geir Magnusson Jr. is VP of Engineering at 10gen Inc., a "cloud" technology startup. His diverse background includes being a technology leader at Joost, Intel, IBM/Gluecode, and Adeptra. He serves on the Board of Directors of the Apache Software Foundation and has been instrumental in Apache Harmony, Geronimo, and Velocity.

QCon is a conference that is organized by the community, for the community. The result is a high quality conference experience where a tremendous amount of attention and investment has gone into having the best content on the most important topics presented by the leaders in our community. QCon is designed with the technical depth and enterprise focus of interest to technical team leads, architects, and project managers.

That is a difficult question. It means a lot of things to a lot of people. I think the best definition is IT-related capabilities and services that someone delivers over a network, whether that's the Internet or a private network. It's really about capabilities and resources that are run by somebody else and delivered to you, the consumer. The consumer could be a developer building an application, or an end user with an iPhone.

Well, I think you are right that cloud computing has become an overloaded term. It's an umbrella for a lot of things; I sometimes joke that it now means everything in IT. Personally speaking, having spent a year at a company that focuses on the cloud computing area, I am looking forward to us as an industry coming up with a new term, or a set of terms, that helps people disambiguate the various aspects of what is now called cloud computing. You are right that one way to partition cloud computing is the "as a service" analogy people use. It may be a cliché at this point, or maybe not, but I think it gives people a good handle on at least one way to partition it.

So think of it as a stack, or at least a capability stack, and go from the top to the bottom. At the top you have things like Salesforce: software as a service, where a complete application is delivered to you over a network (the Internet, in the case of Salesforce), and you don't really have to worry about it beyond setting up payment and using it. Right below that you have platform as a service. You mentioned AppEngine, which is a good example: it is programmable, and the service provided is an environment to run programs in. You still have to provide the program, so it is more developer-focused.

Below that, you can sometimes partition out tools as a service. For example, SimpleDB from Amazon is something I might call a tool as a service: it's an ingredient that you, as a developer or architect, can use in an application you are building, but by itself it doesn't serve much purpose. It needs to be used by something else, and that something would be your program. In the case of SimpleDB, that might be a program running on Amazon's EC2, which sits lower down the stack in what I think of as infrastructure as a service. There you can run as many operating systems and virtual servers as you like; it's elastic in its ability to meet your demands, and I think that is a good way to describe that tier of services.

If we go back to the "as a service" model, I think the tier that has been most easily understood and most widespread in its use is infrastructure as a service. With Amazon EC2, people can take the operating systems they know and use in their non-cloud environments, whether in managed data centers, data centers they run themselves, or just the machine under the desk, and use them the same way, with the same software stacks they use elsewhere.

GoGrid is an example, Joyent is an example; there are lots of people going after infrastructure as a service, I think because it is the most accessible and easiest to use, and it is the easiest to transition to from your current environment, because for the most part your application stacks don't have to change. Monitoring may differ a little, and the fact that VMs can disappear is certainly a problem, although that is no different from a server crashing in your machine room, even if it seems to happen at a slightly higher rate than with physical hardware. But you are probably designing to handle that already, so it's not that big a leap to move from one to the other.
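Designing for a vanished VM usually means a client can fail over to another instance, just as it would if a physical server crashed. The following is a minimal Python sketch under stated assumptions: each replica is represented by a hypothetical zero-argument callable that raises `ConnectionError` when its VM is gone. A real deployment would add timeouts, health checks, and backoff.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class AllReplicasFailed(Exception):
    """Raised when no replica could answer the request."""


def call_with_failover(replicas: Sequence[Callable[[], T]]) -> T:
    """Try each replica in order and return the first successful result.

    A replica that raises ConnectionError is treated as a VM that has
    disappeared and is skipped; any other exception propagates unchanged.
    """
    errors: List[ConnectionError] = []
    for replica in replicas:
        try:
            return replica()
        except ConnectionError as exc:
            # The instance behind this replica is gone; try the next one.
            errors.append(exc)
    raise AllReplicasFailed(f"{len(errors)} replica(s) unavailable")


# Usage: the first (hypothetical) VM has vanished, the second still answers.
def vanished_vm() -> str:
    raise ConnectionError("instance terminated")


def healthy_vm() -> str:
    return "ok"


print(call_with_failover([vanished_vm, healthy_vm]))  # prints "ok"
```

Only `ConnectionError` triggers failover here, so genuine application bugs still surface instead of being silently retried elsewhere.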

What we are going to see is a little bit longer term, because right now, like I said, you can take existing stuff and move it right over to the cloud. You can take your existing Ruby on Rails app and deploy it to someone like Engine Yard. That doesn't really change much of what you develop, but it does let you focus your time and capital on solving business problems rather than having to be an operations organization. So in the near term, I think that has been really useful for software development organizations.

In the longer term, what is going to be interesting is what happens as services further up the stack become commonplace. Instead of getting a virtual machine to run an operating system with your regular stack, you start incorporating tools as a service, ingredients like that, into your application. You start using SimpleDB, or you go higher up the stack and start running your application on AppEngine. Those services have new APIs and new paradigms for your applications: how you think about persistence, how you think about concurrency, and how you do transactions.

A lot of the limitations of these environments, be it SimpleDB or AppEngine, are really a direct result of the scalability they give you. But they bring changes for developers compared with what they are used to: having their own local, private relational database with its ACID properties. Some of those things are going to, if not go away, have to be handled in a different way by the application developer. I think that is where we are going to see the immediate changes, which will influence how people design their data models and object models, and certainly how they think about persistence. For the enterprise, it may be more of an operational matter for enterprise developers, though it's not entirely clear what you mean by enterprise developers: do you mean large corporations, or do you mean server-side infrastructure, which I don't think is particular to the enterprise?
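One common way application code compensates for the missing multi-record ACID transactions is optimistic concurrency: keep a version number with each item, write only if the version is unchanged since the read, and retry otherwise. The Python sketch below uses a hypothetical in-memory `VersionedStore`; it is not the SimpleDB or AppEngine API, only an illustration of the pattern an application might layer on top of such a service.

```python
from typing import Callable, Dict, Tuple


class VersionConflict(Exception):
    """Raised when a conditional write finds a different version than expected."""


class VersionedStore:
    """Hypothetical in-memory store whose only write is a conditional put."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[int, int]] = {}  # key -> (value, version)

    def get(self, key: str) -> Tuple[int, int]:
        # Missing keys read as value 0 at version 0.
        return self._data.get(key, (0, 0))

    def put_if_version(self, key: str, value: int, expected_version: int) -> None:
        _, current_version = self.get(key)
        if current_version != expected_version:
            raise VersionConflict(key)  # someone else wrote since our read
        self._data[key] = (value, current_version + 1)


def update(store: VersionedStore, key: str, fn: Callable[[int], int],
           max_retries: int = 5) -> int:
    """Optimistic read-modify-write: re-read and retry when a conflict occurs."""
    for _ in range(max_retries):
        value, version = store.get(key)
        new_value = fn(value)
        try:
            store.put_if_version(key, new_value, version)
            return new_value
        except VersionConflict:
            continue  # lost the race; start over from a fresh read
    raise VersionConflict(f"gave up on {key!r} after {max_retries} attempts")


store = VersionedStore()
update(store, "counter", lambda v: v + 1)
update(store, "counter", lambda v: v + 1)
print(store.get("counter"))  # prints (2, 2)
```

The retry loop replaces the lock a relational database would have taken for you, which is one concrete example of ACID guarantees becoming the application developer's responsibility.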

But what I am hoping to see is that these "by the drink" services, where I can very easily spin up an environment to put a Ruby on Rails app out, will help enterprises break free of the constraints that internal IT departments place on them for purchasing, capacity management, and deployment planning, and allow them to be more agile: quickly spinning up apps and changing apps. Maybe it will be department-level to start, but historically that's how we have seen a lot of these disruptive technologies enter the enterprise. If you look at how Linux got into the enterprise, it was the print server under somebody's desk that just worked. Hopefully we'll see these technologies get into the enterprise the same way.

That is a very important point. It's not clear to me how Engine Yard could be considered a candidate for lock-in. I am not an Engine Yard user, but my understanding is that Engine Yard and services like it provide a hosted platform for a Rails application, and it is standard in the sense that you should be able to move the Rails app you run on your laptop with your stack plus MySQL, or in your server room, right onto Engine Yard, and conceivably you don't have to make API changes to do this. AppEngine is a different story. I am a big fan of AppEngine; I like it a lot.

People do have the concern that you could be "stuck" if you use AppEngine, and to Google's credit they have made the basic container environment of their infrastructure available as open source under a very nice license, the Apache license. So conceivably you can take the AppEngine SDK, put it into a more robust web-serving infrastructure, find a persistent data store to put underneath it, and run your AppEngine applications yourself. To be frank, we have a MongoDB adapter to do just that: you can put MongoDB underneath AppEngine as the persistent data store, and it is a scalable, high-performance data store. So if you can find a way to host the AppEngine API in a more robust container than the SDK, you could run your AppEngine app in your own environment.
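Swapping the store underneath works because the application talks to an API rather than to one particular backend. This is a minimal Python sketch of that principle, not the AppEngine SDK or the MongoDB adapter: `Datastore`, `InMemoryDatastore`, `JsonDocumentDatastore`, and `rename_user` are hypothetical names chosen for illustration.

```python
import json
from abc import ABC, abstractmethod
from typing import Dict, Optional


class Datastore(ABC):
    """The only persistence API the application code depends on."""

    @abstractmethod
    def put(self, key: str, doc: dict) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[dict]: ...


class InMemoryDatastore(Datastore):
    """Backend that keeps Python dicts, e.g. for local development."""

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}

    def put(self, key: str, doc: dict) -> None:
        self._docs[key] = dict(doc)  # copy so later caller mutations don't leak in

    def get(self, key: str) -> Optional[dict]:
        doc = self._docs.get(key)
        return dict(doc) if doc is not None else None


class JsonDocumentDatastore(Datastore):
    """Backend that stores serialized JSON, standing in for a document database."""

    def __init__(self) -> None:
        self._blobs: Dict[str, str] = {}

    def put(self, key: str, doc: dict) -> None:
        self._blobs[key] = json.dumps(doc)

    def get(self, key: str) -> Optional[dict]:
        blob = self._blobs.get(key)
        return json.loads(blob) if blob is not None else None


def rename_user(store: Datastore, user_id: str, new_name: str) -> dict:
    """Application logic: written against Datastore, not a specific backend."""
    user = store.get(user_id) or {"id": user_id}
    user["name"] = new_name
    store.put(user_id, user)
    return user


# Swapping the backend requires no change to rename_user.
for backend in (InMemoryDatastore(), JsonDocumentDatastore()):
    rename_user(backend, "u1", "Ada")
    print(type(backend).__name__, backend.get("u1"))
```

The design choice is to confine provider-specific calls to the backend classes, so moving to a different host means writing one new `Datastore` subclass rather than touching application logic.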

Now, that said, Google does offer a lot of APIs that you can't duplicate outside Google. They seem to do OK with search, for example. They have their shopping cart service. There are a lot of services that are really useful, such as the integration with logins and user identity services, but as a developer you have to make that choice. You have to say "I am going to use this", and hopefully you understand the implications of lock-in, but the trade-off may be worth it to you, knowing that if you do want to leave, you will have to make changes. As long as it is clear to everybody which services can be moved and which cannot, I think it is OK for people to make the decision to use them.

I think open source has played an incredible role in making Java what it is today. I thought about this question a bit (in all honesty, you gave me these questions ahead of time), and I don't want to overstate the role of open source in Java. I have personally been involved with open source in Java for nine years now, so my perspective is clouded, and I admit that up front. But if I remember what happened with Apache Tomcat and the Jakarta project, the partnership, in a way, between Sun and Apache to make what was then server-side Java open source, ubiquitous, and consumable by all, I think that is what really made Java spread so fast for server-side use.

All of a sudden it was very easy to get applications written in Java, using Java standards (JSP at the time), up and running for free, easily and quickly, and there was a community around it that could help you through things. That is important. After that, I think you can find lots of other places where open source has continued to influence the Java ecosystem, and also how people think about their software, how they expect to get it, and how they expect to be able to use it. Look at how the path of EJB changed. EJB 1 and 2 were very much defined by the expert group, which was, let's be honest, a fairly closed group of entities working together.

But EJB 3 was really influenced a lot by what was happening in the open source world. Hibernate, for example, provided tremendous influence, or at least momentum, to make that change, to break away from the older model into what we have today. So I think there are a lot of places where open source deserves credit. There are powerful influences in IDEs: there are options and choices, whether you like them or not. There is Eclipse, there is IntelliJ IDEA, which is closed, and there is NetBeans, which is open. Then there is logging.

The Log4j project at Apache had an incredible influence on what eventually became logging in Java, in the Java SDK itself. Spring is another example. Spring has had an incredible effect on enterprise Java, in the J2EE or Java EE sense. It was an open source project done by people who really understood the problems and wanted to see things done differently, and it has helped drive significant change toward openness in this ecosystem as well. Those are a couple of examples, and I don't think I am overstating it when I say open source has played a big role.

The Apache Harmony project is a completely different implementation, so in terms of direct effect on the basic, tangible code, there is none. It is good for the project, though: while Sun's implementation of Java was available only under non-open-source licenses, we were very careful at Apache Harmony to ensure that we didn't accidentally take any of that code.

The fact that it is now open source has helped us relax that policy. We still won't take any of the code, but because it's open source, we can relax some of the protections we had in place to ensure that we didn't bring copyrighted material that was not under an open source license into the project. As for the larger implications of that question: first of all, the fact that Sun did this is very good. The more open source Java, the better; there is no way this could be construed as a bad thing. I also think the fact that there are two different licenses is good.

I think the world is big, there are lots of people in it, and lots of them have an interest in using Java technology, and there is no license that works for everybody. There are people who do not want to use software under a given license, either the GPL or the Apache license, or who don't want to contribute to a project under one of them, so diversity is good. It allows communities to form and serve their own interests rather than forcing everyone into one box. There has certainly been some negative effect: people do ask why we need two open source implementations of Java.

It's a good question, because certainly the important thing about Java is compatibility. That said, we got similar questions when we were starting an open source implementation of Java EE. There was JBoss at the time, and we were starting the Apache Geronimo project under the Apache license at Apache, and people were afraid that the world was going to end and the sky was going to fall: that there would be all these incompatibilities with frameworks, and that fracturing the market would be a disaster. It turned out to strengthen the market.

You now have multiple implementations that are all good, maybe good at different things, with their own strong and weak points, but they give you choices and they are all compatible. The compatibility promise, as far as you can be compatible in J2EE or Java EE, is met by all of these implementations, and I think that is really healthy. So we'll see what happens. The obvious thing I would bring up, as the Apache representative in the JCP, is that the bigger issue is our struggle to get the TCK license so that we can certify that the Apache implementation is compatible with the spec. But quite frankly, that issue should be orthogonal to how Sun licenses its own implementations, because that is the idea of the JCP: multiple independent, compatible implementations can coexist in the ecosystem, as long as they satisfy the compatibility requirements so that the software behaves as users expect. We keep working at this, and we'll see what happens.

The project would certainly love to be able to pass the TCK and demonstrate that it is compatible with the spec. Honestly, I think that is first and foremost what it has to do; that is table stakes. If you can't pass the TCK, you are not Java compatible. After that there is a whole raft of work to do on performance and stability, things the TCK cannot address because they are out of its scope. The project continues, but to be honest, the lack of the TCK has been problematic. It's suffocating the project, because if you participate in a project, you want to be sure the project can fulfill its purpose.

You wouldn't contribute to a project that could never be what it is trying to be. So the lack of the TCK, or the trouble getting it, is sort of, I don't know, what was the phrase used to describe what Microsoft did to Netscape? Suffocating it, or denying it air? It's doing the same thing. That is my biggest worry: whether we can resolve this problem and get the TCK license before the project is effectively strangled. That said, it is good technology. The class library is, as far as we can tell without the TCK, complete at least for Java 5, and they are working on Java 6. The class library is being used in several places, the most notable being Google's use of it in the Google phone.

As part of the Android project, they have taken the Harmony class library and are using it with their own VM, Dalvik. I think that is a very strong vote of confidence in the technology, if you are willing to put it into phones, which have some interesting serviceability requirements: you can't simply send somebody out to patch them. It will be interesting to see where the project goes; there are opportunities in some of the new runtime ideas people are having. OSGi is a fascinating platform; it is just missing a runtime underneath it. Apache Harmony would serve very well in that role, I think, because it has the complete Java API, which OSGi is built on, and it has some very nice modularization built in.

From the beginning of the project we decided to modularize the class library, as OSGi bundles, as it turns out, and that makes it easy to construct a subset of the functionality if that is all you need. That wouldn't be considered Java compatible, because the promise of the Java spec is that the complete API is available to you. That said, it is interesting to see the conversations around the next SDK, Java 7; they are talking about the same idea. How do we modularize Java in a way that makes deployment easier, so that you still meet the breadth of the API but might load some modules later, when you need them? It's a nice way to solve the problem of having a rich class library while also wanting to deliver in a lightweight way, say, for network deployments. All these options exist for Harmony, but we've got this big elephant in the room, which is passing the compatibility tests.
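Constructing a subset from a modular library comes down to computing which modules a requested set depends on, directly or transitively. This is a small Python sketch over a hypothetical dependency graph; the module names are illustrative and are not Harmony's actual bundle list. An OSGi framework would instead derive these dependencies from bundle manifest headers such as `Import-Package` and `Require-Bundle`.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical module dependency graph, loosely modeled on a modular class library.
DEPENDS_ON: Dict[str, List[str]] = {
    "lang": [],
    "io": ["lang"],
    "net": ["io", "lang"],
    "security": ["lang", "io"],
    "sql": ["lang", "io"],
    "xml": ["io"],
}


def required_modules(roots: Iterable[str], graph: Dict[str, List[str]]) -> Set[str]:
    """Return every module reachable from `roots` through the dependency graph."""
    needed: Set[str] = set()
    stack = list(roots)
    while stack:
        module = stack.pop()
        if module in needed:
            continue  # already included; avoids revisiting shared dependencies
        if module not in graph:
            raise KeyError(f"unknown module: {module}")
        needed.add(module)
        stack.extend(graph[module])  # visit this module's dependencies next
    return needed


print(sorted(required_modules(["net"], DEPENDS_ON)))  # prints ['io', 'lang', 'net']
```

Modules outside the computed set could be left out of a lightweight deployment or loaded later on demand; such a subset would not, by itself, be Java compatible.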

Yes, that has actually been done: people have taken it and slimmed down parts of it, though I don't think anything was ever done with that. But it was very clear that the Harmony class library would have been a good superset from which the pieces for ME profiles could have been picked. Personally, I am not really sure how much life ME has left in it. It will be interesting to see what happens with ME over time, because devices are becoming so powerful that they are capable of running a complete SE stack. But we'll see; it's still an interesting and big community, and there are still billions of devices running ME, so let's see how this plays out.

I believe AppEngine did a pretty awesome job. They even went as far as adopting the Java standards JDO and JPA to abstract their data store layer. I don't feel it's a *high* risk for vendor lock-in, but it would require some thought in abstracting certain parts of your code base in case you intend to move away from them.