
Learn firsthand about OpenStack, its challenges and opportunities, market adoption and CERN’s engagement in the community. My goal is to interview all 24 members of the OpenStack board, and I will post these talks sequentially at Dell TechCenter. To make these interviews easier to read, I have structured them around what I (subjectively) consider the most important takeaways. Enjoy reading!

#1 Takeaway: CERN’s interest in OpenStack results from a growing need to do more with fewer resources

Rafael: Why did CERN choose OpenStack and which alternatives did you consider?

Tim: Our interest in the areas of virtualization and cloud has been ongoing for a number of years. But we got to an interesting point about 18 months ago; we had an agreement to have a second, Hungary-based data center for CERN, roughly doubling our computing capacity. However, the IT staff numbers are fixed, so this basically means that we need to look very carefully at the areas where we could be saving, both operationally and in computing resources, and make things more efficient. We had a number of investigations into virtualization on Hyper-V and System Center Virtual Machine Manager (SCVMM), based on Microsoft technologies, and also around OpenNebula. Both of those were able to access good test beds, but when we looked at the scale we would be facing over the next 18 months to 2 years, it was clear that on their own these wouldn’t allow us to provide an efficient solution.

At the same time the OpenStack project got announced and was starting to get to the point where we could be testing it. Around the OpenStack Diablo release time frame we started to have a look, and we were impressed with how much functionality was there. But we were also aware that there was a lot more that needed to be done before we could do large scale deployments. From that point of view we have been gradually building up the OpenStack deployment over the past year, and now we are running around 2,000 guests on 500 hypervisors, with the aim of getting to about 15,000 hypervisors by 2015. We need to get to 90 percent of our computer center running on the virtualized environment.

Rafael: … which would probably be the biggest OpenStack deployment so far, is this correct?

Tim: My impression is that by 2015 we would not be the biggest. In fact, one of the things that we find encouraging around OpenStack is that people are talking of these sorts of numbers. Architecturally it’s a very scalable model, but it’s more than just the pure architecture. It’s the question of not having to do it on your own but also having a chance to share with others who have similar experiences.

#2 Takeaway: CERN is expecting limitations to scalability to significantly decrease with the OpenStack Grizzly release

Rafael: What are the roadblocks that currently stand in the way of large scale OpenStack deployments?

Tim: One of the developments we have been watching very closely has been the cells development. Clearly a number of sites are pushing the thousand-plus hypervisor scale at the moment, but the key breakthrough will be with the OpenStack Grizzly release, when the cells functionality is there, and this will allow us to construct hierarchies of cells of compute resources. This would remove one of the major limitations in terms of the total scalability. My understanding is that the code was dropped about two weeks ago … that’s one of the areas that we have been investigating in the short term.

#3 Takeaway: Multi-hypervisor support is crucial. CERN is using Hyper-V and KVM hypervisors in its OpenStack environment (and is working closely with vendors to enhance functionalities)

Rafael: Can you tell us a bit more about your collaboration with the OpenStack Hyper-V team at Microsoft?

Tim: We have been very happy users of Hyper-V and SCVMM for our server consolidation environment. We were keen that we would be able to carry on working in a mode where the hypervisors are a tactical choice rather than a strategic choice. The multi-hypervisor support was very important to us. We want to be able to test out performance and compatibility questions on different hypervisors without being forced into a mode that says: “If you choose this cloud solution you must have this hypervisor.” We started to work with Hyper-V when it was first in OpenStack, and when it failed to keep up with the testing, we were very keen to work with Microsoft and the various companies that were working on Hyper-V functionality, most importantly to bring it to equivalence with hypervisors such as KVM.

Together with Microsoft we have been going through a lot of testing, validation of different combinations and also filling in the various areas where there are small functionality gaps, things like getting the consoles working. Within our OpenStack environment now we are running both Hyper-V and KVM hypervisors.

#4 Takeaway: Quo vadis, OpenStack? That question needs to be answered by the OpenStack Board of Directors and Technical Community

Rafael: Which challenges do you see ahead of the OpenStack community in general as well as for the OpenStack board?

Tim: If I take things from the board’s perspective … it’s been a very interesting six months since I was elected as an individual member. Working through some of the legal aspects and the construction of the board has clearly been the focus for the last six months. We now need to clarify some fairly hard issues around where OpenStack is going. In particular, there is a key discussion, together with the Technical Committee, to define the overall direction for what OpenStack is, and that will be key in defining where we go looking forward. On top of that, the other item that we do need to do some work on over the next 12 months is to improve the election process for the individual board members. Along with the Platinum and Gold Members, there are eight board members who represent the individual members of the community. We need to make sure that this process is a transparent and representative one.

#5 Takeaway: Configuration and management of OpenStack and the ability to move workloads between OpenStack clouds need to be enhanced

Rafael: What needs to be worked on in OpenStack?

Tim: As I look ahead, one of the key things to establish will be clear feedback loops between the user community and the developers of the software. Now that OpenStack is in a state where it can be deployed in production and run, and we are getting reasonable experience of production deployments … we now need to get those experiences back into the main line code and move towards a mode where we improve the out-of-the-box experience for the end users.

CERN has the advantage of having a highly skilled team. We are able to bring some very good system administrators to work through some of the more difficult aspects of configuring and managing an OpenStack environment. That experience needs to be improved, so that it becomes a standard part of the product and the associated ecosystem. Examples of this are the work with Puppet Labs we are doing … so that it becomes very easy to configure and deploy an OpenStack environment using Puppet. Most of this is just a question of coding the best practices into a set of scripts, a set of configuration details … such that people who are deploying OpenStack don’t need to be OpenStack experts.

The other item, I think, looking ahead is … we need to find a way to move workloads between OpenStack environments more easily. We need to define that core set of functions that everyone expects to be available in an OpenStack cloud and to be able to validate it … so that when clouds are OpenStack compliant, we are able to be sure that the workload that we run, for example, in the CERN private cloud could equally be deployed, as we need it, to a commercial OpenStack provider.

Rafael: How do early adopters like CERN propel and influence OpenStack market adoption?

Tim: CERN has the benefit of being a very open environment. We are able to be very explicit, very detailed in terms of how we use the various forms of computing resources. This means that we can perform outreach in a way that some of the private cloud companies, for example, are not able to. On top of that … because many of our staff are on limited duration contracts … we have seen these people arriving at CERN, working for a while with OpenStack and then leaving CERN when their contracts come to an end. And this also helps to propagate and enlarge the pool of skilled resources available to deploy and use OpenStack.

I think CERN can act in a role of being relatively leading edge. However, we also have to be very aware that we do not want to be in a situation where CERN’s requirements are unique. We don’t have the resources available to develop our own cloud solution. We have to make sure that whenever we have something that we think is a unique CERN requirement, we check that with other people who are deploying similar styles of cloud. In that respect things like peer review, the Open Design Summit and the user stories are very useful for us, as they allow us to check that what we are doing is aligned with the direction that the rest of the industry is going in.

Rafael: What are your expectations towards Dell as you move into cloud, virtualization and … OpenStack?

Tim: CERN’s hardware procurement model is based on a very open tendering process. Under that we would write up specifications for the kind of machines that we are looking for and send them to a large number of vendors, and then we will get back proposals for solutions. Where OpenStack is interesting for us in this respect is … it allows us to potentially be asking for a core subset of functionality in a modular way, so that vendors who wish to do enhancements are able to do so while remaining within the core functionality that we will require. One example would be how you do bare metal management: we wouldn’t need to require that you use specific baseboard controllers, but instead we could be asking to say: “You must be providing hardware which is compatible with this implementation of OpenStack.” And that gives vendors a lot more opportunity to innovate … and equally on our side means that we ensure we don’t end up in a situation where we have a large variety of different hardware, which increases our cost dramatically.

#7 Takeaway: CERN is applying the cattle (commodity) and pets (sophisticated) model for its hardware configurations

Rafael: What are your thoughts on cloud ready hardware in general? Should hardware become more stupid or should it become more intelligent in the future?

Tim: We have a lot of discussions around the role from basic commodity hardware through to the more sophisticated hardware configurations. In this respect we found a model by Cloudscaling to be a very useful approach, which is ‘pets’ and ‘cattle’. The aim is … where we can, we want to move towards a situation where the software is providing the redundancy, and we are able to move to a mode where we don’t have machines that are critical. That having been said, software takes a while to arrive at that level. We run a wide variety of applications written by physicists all over the world, and it’s not possible to guarantee that everyone would have written things to the level of redundancy that a large cloud provider can expect as part of the cattle style model.

What we are hoping to be able to do is to provide a cattle model for a large majority of our environment, but also to have pets, which are the more custom machines looked after more carefully, and to be able to sustain and support that … but within a single framework. This means starting to look at things like the ability to restart virtual machines in the event of a hypervisor failing, or built-in migration capabilities in the event of a hardware failure. But we are certainly moving towards the mode of saying we want to see our redundancy moving higher in the stack, so that we are able to cover underlying hardware, networking and equally operating system software failures in a more resilient fashion.

Rafael: Tim, thank you very much for this interview!

Tim: No problem, and thanks for all the work; the previous interviews that you have done have been very interesting.