HP: OpenStack's networking nightmare Neutron 'was everyone's fault'

OpenStack Summit: HP has admitted that the networking component of OpenStack sucks, and that the community needs to have a serious think about how to develop it in the future.

The enterprise IT company's chief operating officer for cloud, Saar Gillai, told El Reg on Tuesday that the reason why OpenStack's "Neutron" system has had so many problems is "we approached it as the community wrong."

Neutron is the OpenStack project's "networking-as-a-service" technology, which is meant to help administrators create, configure, and manage software-defined networks.

The tech is a core project, and its stability is absolutely crucial for the creation of large infrastructure-as-a-service clouds based on OpenStack.

Neutron is based on Quantum, which Nicira contributed to the project. VMware later acquired Nicira, and its staff continue to contribute to the technology. Many early users of Quantum paired it with Nicira's "NSX" plug-in, which made use of the company's software-defined networking tech.

Unfortunately, Neutron had serious problems when used without the NSX plug-in.

"This is the only project where we required a third-party [component] to be whole," explains Red Hat's director of product management for its virtualization business unit, Andrew Cathrow. "The challenge really was that Neutron really grew up around one vendor, and when people were talking about Neutron they were really talking about Nicira."

HP ran into this weakness when building its own public cloud service. The problems were so severe that HP was forced to rewrite the networking component of OpenStack for its own cloud, Gillai confirmed. "We ran into [Neutron's problems] on public cloud and that's why we stayed off of mainline [OpenStack] for a long, long time," he explains. Now, with HP Helion, the company is trying to develop a stock OpenStack distribution based on the main community brand and fix networking along the way.

Part of the difficulty is that the kinds of problems HP ran into only appear at very large scale, so many smaller production deployments of OpenStack never experienced them. But as the tech limbers up for significant production deployments, Neutron's problems are becoming more and more apparent, prompting users to complain at a user panel held on Monday.

"It was everyone's fault," Gillai says. "I think the community understands that in the next 12 months a lot of work will be done. We ran into [Neutron problems] hot and heavy in our public cloud. It's something that has to get solved. It will require vendors to step aside and accept they've got to follow the OpenStack way of doing things."

Neutron's problems stem both from its own design and from the software-defined networking systems it plugs into.

"We tried supporting a bunch of the commercial software-defined networks [in Neutron]," explained Piston Cloud Computing's chief technology officer Joshua McKenty in a chat with El Reg. "The theory was, 'is it just the Open vSwitch in Neutron is crap?' – [but] even the commercial ones aren't where they need to be."

The problem is that the community "focused on shiny layer four through seven features" at the expense of base networking tech, McKenty said.

"I expect in the next twelve months Neutron will get better," Gillai says. "It's a big problem, I agree. To some extent you could argue it stayed this way because, quote unquote, people could get away with it. At the end of the day it's now caused enough problems at customers that we have to go fix it."

Gillai's comments are echoed by Red Hat. "We've got all the vendors in the ecosystem acknowledging it," said Red Hat's Cathrow. "I feel positive about the future."

In many ways, Neutron's failure and planned rebirth are a metaphor for OpenStack as a whole: the tech promised too much at the start, became overly dependent on vendors, and was only fixed when paying punters started to confront its weaknesses. As the OpenStack collective learn these lessons, the hope is that they will run into fewer errors, and perhaps make good on their plan to provide a viable cloud operating system to telcos and other businesses.

"We have to look back and learn the lessons about when things leave incubation, and I believe those decisions have been learnt," Cathrow said. "We all carry scars from those decisions." ®