Whew… Yesterday, Dell announced TWO OpenStack block storage capabilities (Equallogic & Ceph) for our OpenStack Essex Solution (I’m on the Dell OpenStack/Crowbar team) and community edition. The addition of block storage effectively fills the “persistent storage” gap in the solution. I’m quadruply excited because we now have:

both Nova drivers’ code in the open as part of our open source Crowbar work

Frankly, I’ve been having trouble sitting on the news until Dell World because both features have been available on GitHub since before the announcement (EQLX and Ceph-Barclamp). Such is the emerging intersection of corporate marketing and open source.

As you may expect, we are delivering them through Crowbar; however, we’ve already had customers pick up the EQLX code and apply it without Crowbar.

The Equallogic+Nova Connector

If you are using Crowbar 1.5 (Essex 2) then you already have the code! Of course, you still need the admin information for your SAN – we automated the Nova Volume integration, not the configuration of the storage system itself.

We have it under a split test so you need to do the following to enable the configuration options:

1. Install OpenStack as normal.
2. Create the Nova proposal.
3. Enter “Raw” attribute mode.
4. Change the “volume_type” to “eqlx”.
5. Save.

The Equallogic options should be available in the custom attribute editor! (of course, you can edit in raw mode too)
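For illustration only, here is a minimal sketch of what the raw-mode edit amounts to. Only the “volume_type”: “eqlx” change is described above; the SAN admin keys shown are hypothetical placeholders, so check your Crowbar build for the actual attribute names:

```python
# Minimal sketch of the raw-attribute edit to the Nova proposal.
# Only "volume_type": "eqlx" is confirmed above; the SAN keys are
# illustrative placeholders for the Equallogic admin information.
import json

nova_proposal_attrs = {
    "volume_type": "eqlx",           # switch the Nova volume backend to Equallogic
    "eqlx": {                        # hypothetical sub-section for SAN admin details
        "san_ip": "10.0.0.50",       # array management address (placeholder)
        "san_login": "grpadmin",     # array admin user (placeholder)
        "san_password": "secret",    # array admin password (placeholder)
    },
}

# Paste the resulting JSON into the proposal's raw attribute editor, then save and apply.
print(json.dumps(nova_proposal_attrs, indent=2))
```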

Usage note: the integration uses SSH sessions. It has been performance tested but not tested at scale.

The Ceph+Nova Connector

The Ceph capability includes a Ceph barclamp! That means that all the work to set up and configure Ceph is done automatically by Crowbar. Even better, their Nova barclamp (Ceph provides it from their site) will automatically find the Ceph proposal and link the components together!
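Conceptually, the linking step is a lookup: the Nova side asks Crowbar for an applied Ceph proposal and copies its connection details into the Nova volume configuration. A rough Python sketch of that idea follows; the function and field names are hypothetical (the actual barclamps are Chef/Ruby), so treat it as an illustration of the pattern, not the real code:

```python
# Rough sketch of the auto-linking idea (hypothetical names; not the real barclamp code).
def find_ceph_proposal(proposals):
    """Return the first Ceph proposal found, if any (hypothetical data shape)."""
    return next((p for p in proposals if p.get("barclamp") == "ceph"), None)

def link_nova_to_ceph(nova_attrs, proposals):
    """Copy Ceph connection details into the Nova volume configuration."""
    ceph = find_ceph_proposal(proposals)
    if ceph is None:
        return nova_attrs                     # no Ceph deployed; leave Nova unchanged
    nova_attrs["volume_type"] = "rbd"         # placeholder backend name
    nova_attrs["rbd_monitors"] = ceph["attributes"]["monitors"]  # placeholder field
    return nova_attrs
```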

This turned out to be a major open cloud gab fest! In addition to the Dell OpenStack leads (Greg and me), we had the Nova Project Technical Lead (PTL, Vish Ishaya, @vish), HP’s Cloud Architect (Alex Howells, @nixgeek), and Opscode OpenStack cookbook master (Matt Ray, @mattray). We were joined by several other Chef Summit attendees with OpenStack interest, including a pair of engineers from Spain.

We’d planned to demo using Knife-OpenStack against the Crowbar Diablo build. Unfortunately, the knife-openstack plugin is out of date (August 15th?!). We need Keystone support. Anyone up for that?

Highlights

There’s no way I can recapture everything that was said, but here are some highlights I jotted down on the way home.

After the miss with Keystone and the Diablo release, solving the project dependency problem has become a priority. Vish talked at length about the ambiguity challenge of Keystone being both required and incubated. He said we were not formal enough around new projects even though we had dependencies on them. In future releases, new projects (specifically, Quantum) will not be allowed to be dependencies.

The focus for Essex is on quality and stability. The plan is for Essex to be a long-term supported (LTS) release tied to the Ubuntu LTS. That’s putting pressure on all the projects to ensure quality, lock features early, and avoid unproven dependencies.

There is a lot of activity around storage and companies are creating volume plug-ins for Nova. Vish said he knew of at least four.

Networking also has a lot of activity. Quantum is moving quickly but may not emerge as a core project in time for Essex. There was general agreement that Quantum is “the killer app” for OpenStack and will take cloud to the next level. The Quantum Open vSwitch implementation is completely open source and free. Some other plugins may require proprietary hardware and/or software, but there is definitely a (very) viable and completely open source option for Quantum networking.

HP has some serious cloud mojo going on. Alex talked about defects they have found and submitted fixes back to core. He also hinted about some interesting storage and networking IP that’s going into their OpenStack deployment. Based on his comments, I don’t expect those to become public so I’m going to limit my observations about them here.

We talked about hypervisors for a while. KVM and XenServer (via XAPI) were the primary topics. We did talk about LXC & OpenVZ as popular approaches too. Vish said that some of the XenServer work is using Xen Storage Manager to manage SAN images.

Vish is seeing a constant rise in committers. It’s hard to judge because some committers appear to be individuals acting on behalf of teams (10 to 20 people).

Reminder: 12/8 Meetup @ Austin!

Based on our last meetup, it appears deployment is a hot topic, so we’ll kick off with that – bring your experiences, opinions, and thoughts! We’ll also open the floor to other OpenStack topics – open technical and business discussions – no commercials please!

We’ll also talk about organizing future OpenStack meetups! If your company is interested in sponsoring a future meetup, find Joseph George at the meetup and he can work with you on details.

Jon Dickinson, the Project Technical Lead for Swift (Object Storage), was there and presented information on the current Swift offering. It is interesting to note that Swift releases continuously, while most of OpenStack, like Nova (Compute), releases on the six-month development cycle.

Stephen and Jim Plamondon from Rackspace presented information on the overall community and talked about yesterday’s announcement from Internap about their Compute public cloud and about the MercadoLibre 600-node Compute cloud running their business:

“With 58 million users of MercadoLibre.com and growing rapidly, we need to provide our teams instant access to computing resources without heavy administrative layers. With OpenStack, our internal users can instantly provision what they need without having to wait for a system administrator,” said Alejandro Comisario, Infrastructure Senior Engineer, MercadoLibre, the largest online trading platform in Latin America. “With our success running OpenStack Compute in production, we plan to roll OpenStack Diablo out more broadly across the company, and have appreciated the community support in this venture, especially through the OpenStack Forums, where we are also global moderators.”

We also discussed the OpenStack API issue, which is a significant open question at this time: should OpenStack focus on creating an API specification and then let multiple implementations of that API move forward, or build one implementation of the API as the official OpenStack? (See my post for more on this.)

The Crowbar testing cycle drove two significant architectural changes that are interesting as general challenges and important in the details for Crowbar adopters.

Challenge #1: Configuration Sequence.

Crowbar has control of every step of deployment, from discovery and BIOS/RAID configuration to base image, core services, and applications. That’s a great value prop, but there’s a chicken-and-egg problem: how do you set the RAID for a system when you have not decided which applications you are going to install on it?

The urgency of solving this problem became obvious during our first full integration tests. Nova and Swift need very different hardware configurations. In our first Crowbar flows, we would configure the hardware before you selected the purpose of the node. This was an effect of “rushing” into a Chef client ready state.

We also needed a concept of collecting enough nodes to deploy a solution. Building an OpenStack cloud requires that you have enough capacity to build the components of the system in the correct sequence.

Our solution was to inject a “pause” state just after node discovery. In the current Crowbar state machine, nodes pause after discovery. This allows you to assign them into the roles that you want them to play in your system.

In testing, we’ve found that the pause state helps manage the system deployment; however, it also added a new user action requirement.
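To make the sequencing concrete, here is a minimal Python sketch of the idea, assuming illustrative state names rather than Crowbar’s actual internal states: nodes stop after discovery and only proceed to hardware configuration once an operator has allocated them to roles.

```python
# Minimal sketch of the "pause after discovery" flow (state names are illustrative).
class Node:
    """Illustrative node with a pause ("discovered") state after discovery."""

    def __init__(self, name):
        self.name = name
        self.state = "discovering"
        self.roles = []

    def discover(self):
        # Hardware inventory is collected, but nothing is configured yet.
        self.state = "discovered"    # the pause state: waits for operator input

    def allocate(self, roles):
        # The operator assigns the node's purpose (e.g. nova-compute vs. swift-storage).
        if self.state != "discovered":
            raise RuntimeError("node must be discovered before allocation")
        self.roles = list(roles)
        self.state = "hardware-configuring"   # BIOS/RAID can now match the workload
```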

Challenge #2: Multi-Master Updates

In Chef, the owner of a node’s data in the centralized database is the node, not the server. This is a logical (but not a typical) design pattern and has interesting side effects. Specifically, updates from Chef Client runs on the nodes are considered authoritative and will over-write changes made on the server.

This is correct behavior because Chef’s primary focus is updating the node (edge) and not the central system (core). If the authority was reversed then we would miss critical changes that Chef effected on the nodes. From this perspective, the server is a collection point for data that is owned/maintained at the nodes.

Unfortunately, Crowbar’s original design was to inject configuration into the Chef server’s node objects. We found that Crowbar’s changes could be silently lost since the server is not the owner of the data. This is not a locking issue – it is a data ownership issue. Crowbar was not talking to the master of the data when it made updates!

To correct this problem, we (really Greg Althaus in a coding blitz) changed Crowbar to store data in a special role mapped to each node. This works because roles are mastered on the server. Crowbar can make reliable updates to the node’s dedicated role without worrying that the remote data will override changes.

This pattern is a better separation of concerns because Crowbar and barclamp configuration is stored in a very clearly delineated location (a role named crowbar-[node]) and is not mixed with edge configuration data.
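A small sketch of the ownership split, written in Python purely for illustration (the real implementation is Chef/Ruby and the names here are hypothetical): the node remains the master of its own node object, while Crowbar writes only to a per-node role that the server masters.

```python
# Illustrative sketch of the data-ownership split (hypothetical names; not Chef's API).
class ChefServer:
    """Toy stand-in for the Chef server's stores."""

    def __init__(self):
        self.nodes = {}   # node objects: the authoritative copy lives on the node (edge)
        self.roles = {}   # roles: mastered on the server (core)

def crowbar_role_name(node_name):
    return f"crowbar-{node_name}"             # one dedicated role per node

def crowbar_update(server, node_name, settings):
    # Crowbar writes only to the server-owned role, never to the node object,
    # so a chef-client run on the node cannot silently overwrite these values.
    role = server.roles.setdefault(crowbar_role_name(node_name), {})
    role.update(settings)

def chef_client_save(server, node_name, node_attrs):
    # The node stays authoritative for its own object; this upload wins over any
    # server-side edits to the node object, which is why Crowbar avoids making them.
    server.nodes[node_name] = dict(node_attrs)
```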

It turns out that these two design changes are tightly coupled. Simultaneous edge/server writes became very common after we added the pause state. They are infrequent for single-node changes; however, the frequency increases when you are changing a system of interconnected nodes through multiple states.

More simply put: Crowbar is busy changing the node configs at exactly the same time the nodes are busy changing their own configuration.

Note: I want to repeat that Crowbar is not tied to Dell hardware! We have modules that are specifically for our BIOS/RAID, but Crowbar will happily do all the other great deployment work if those barclamps are missing.

<service bulletin> Server virtualization is not cloud: it is a commonly used technology that creates convenient resource partitions for cloud operations and infrastructure as a service providers. </service bulletin>

OpenStack claims support for nearly every virtualization platform on the market. While the basics of “what is virtualization” are common across all platforms, there are important variances in how these platforms are deployed. It is important to understand these variances to make informed choices about virtualization platforms.

Of course, there are many more hypervisors and many different ways to deploy the three I’m referencing.

This picture shows all three options as a single system. In practice, only operators wishing to avoid exposure to RESTful recreational activities would implement multiple virtualization architectures in a single system. Let’s explore the three options:

OS + Hypervisor (KVM) architecture deploys the hypervisor as a free-standing application on top of an operating system (OS). In this model, the service provider manages the OS and the hypervisor independently. This means that the OS needs to be maintained, but it also allows the OS to be enhanced to better manage the cloud or add other functions (shared storage). Because they are least restricted, free-standing hypervisors lead the virtualization innovation wave.

Bare Metal Hypervisor (XenServer) architecture integrates the hypervisor and the OS as a single unit. In this model, the service provider manages the hypervisor as a single unit. This makes it easier to support and maintain the hypervisor because the platform can be tightly controlled; however, it limits the operator’s ability to extend or multi-purpose the server. In this model, operators may add agents directly to the individual hypervisor but would not make changes to the underlying OS or resource allocation.

Clustered Hypervisor (ESX + vCenter) architecture integrates multiple servers into a single hypervisor pool. In this model, the service provider does not manage the individual hypervisor; instead, they operate the environment through the cluster supervisor. This makes it easier to perform resource balancing and fault tolerance within the domain of the cluster; however, the operator must rely on the supervisor because directly managing the system creates a multi-master problem. Lack of direct management improves supportability at the cost of flexibility. Scale is also a challenge for clustered hypervisors because their span of control is limited to practical resource boundaries: this means that large clouds add complexity as they deal with multiple clusters.

Clearly, choosing a virtualization architecture is difficult with significant trade-offs that must be considered. It would be easy to get lost in the technical weeds except that the ultimate choice seems to be more stylistic.

Ultimately, the choice of virtualization approach comes down to your capability to manage and support cloud operations. The Hypervisor+OS approach offers maximum flexibility and minimum cost but requires an investment to build a level of competence. Generally, this choice pervades an overall approach to embrace open cloud operations. Selecting more controlled models for virtualization reduces risk for operations and allows operators to leverage (at a price, of course) their vendor’s core competencies and mature software delivery timelines.

While all of these choices are seeing strong adoption in the general market, I have been looking at the OpenStack community in particular. In that community, the primary architectural choice is an agent per host instead of clusters. KVM is favored for development and is the hypervisor of NASA’s Nova implementation. XenServer has strong support from both Citrix and Rackspace.