OpenStack comes up huge for Walmart

For those skeptics who still think OpenStack isn’t ready for prime time, here’s a tidbit: @WalmartLabs is now running in excess of 100,000 cores of OpenStack on its compute layer. And that’s growing by the day.

San Bruno, California–based @WalmartLabs, which is the e-commerce innovation and development arm for the [company]Walmart[/company] retail colossus, started working with OpenStack about a year and a half ago, at first relying heavily on the usual vendors but increasingly building up its in-house talent pool, Amandeep Singh Juneja, senior director of cloud operations and engineering, said in an interview.

Building a private cloud at public cloud scale

@WalmartLabs has about 3,600 employees worldwide, 1,500 of whom are in the Bay Area. Juneja estimated the organization has hired about 1,000 engineers in the last year or so — no mean feat given that there are lots of companies, including the OpenStack vendors, in the market for this expertise.

“Traditionally, Walmart is vendor-heavy in its big technology investments — name a vendor and we’ve worked with it and that was also true with OpenStack,” Juneja noted. “We started about one and a half years ago with all the leading distribution vendors involved … we did our first release with Havana and [company]Rackspace[/company]. But then we invested internally in building our own engineering muscle. We attended all the meet-ups and summits.” Havana is the code name for the eighth OpenStack code release.

Amandeep Singh Juneja, @WalmartLabs

Nothing says big like Walmart. It has around $480 billion in annual revenue, more than 2 million employees, and more than 11,000 retail locations worldwide (including Sam’s Club and Walmart International venues). Walmart.com claims more than 140 million weekly visitors. So scale was clearly an issue from the get-go.

What @WalmartLabs loved about OpenStack was that it could be molded and modified to fit its specifications, without vendor lock-in.

AWS need not apply

This is a massive private cloud built on a public cloud scale. There are also some macro issues at play here. Since parent company Walmart competes tooth and nail with [company]Amazon.com[/company], the chances of Walmart using Amazon Web Services public cloud are nil. (I asked Juneja whether Walmart would ever use any public cloud capabilities and he politely responded that this question was above his pay grade.)

The beauty of open-source projects like OpenStack is that new capabilities continually come online and there is a community of deeply technical people working on the code. Going forward, Juneja is particularly interested in Ironic, an OpenStack project to enable provisioning of bare-metal (as opposed to virtual) machines, and in the Trove database-as-a-service project. Trove, he noted, has matured a bit, and Walmart will be using more DBaaS going forward.

Another work in progress is the construction of a multi-petabyte object store using the OpenStack Swift technology, but there are also plans to bring more block storage in-house, possibly using OpenStack Cinder. And the team is looking at Neutron for software-defined network projects.

One thing Walmart must deal with is its brick-and-mortar roots. The ability to order online and pick up in the store means that what @WalmartLabs builds must interact with inventory and other systems already running the Walmart/Sam’s Club storefronts. Non-e-commerce-related IT projects are run by Walmart’s Information Services Division at the company’s Bentonville, Arkansas headquarters.

So the ability of the shiny new OpenStack systems to interface with infrastructure that has been in place for decades, some of it for as long as 50 years, is critical. It also amounts to a full-employment act for all those @WalmartLabs engineers.

Note: this story was updated at 11:30 a.m. PST to reflect that Walmart is running 100K+ cores, not nodes, of OpenStack

For bare-metal provisioning, check out Ubuntu Metal as a Service (MAAS), a fully open-source tool that has been years in development and is highly praised in the community. +1 on off-the-shelf cloud platforms. 1,000 cloud engineers for a 5,000-server deployment seems far-fetched.

Sounds like they could have gotten a lot more bang for their buck with some off-the-shelf cloud automation/orchestration software!

So: 1,000 Bay Area salaries at an average of $120K (and that's a low estimate). That's $120M a year in salaries alone!

Furthermore, 100,000 cores deployed equates to roughly 6,250 servers if I'm generous and assume 8-core CPUs in two-socket servers. That means they have 1,000 engineers to manage 6.25 servers each. That's pathetic by the standards of any respectable IT operations outfit. Those numbers would get you fired most places!
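That back-of-envelope math is easy to verify; here is a minimal Python sketch, where the per-server core count, engineer headcount, and average salary are all the commenter's assumptions rather than figures from the article:

```python
# Back-of-envelope check of the commenter's figures.
# All inputs below are the commenter's assumptions, not confirmed numbers.
cores = 100_000            # cores cited in the article
cores_per_server = 8 * 2   # assumed: 8-core CPUs, two sockets per server
engineers = 1_000          # assumed: engineers hired in the past year
avg_salary = 120_000       # assumed average Bay Area salary, in USD

servers = cores / cores_per_server          # 6,250 servers
servers_per_engineer = servers / engineers  # 6.25 servers per engineer
annual_payroll = engineers * avg_salary     # $120,000,000 per year

print(servers, servers_per_engineer, annual_payroll)
# → 6250.0 6.25 120000000
```

Note that doubling the assumed core count per server (e.g. newer CPUs) halves the server estimate, so the "6.25 servers each" figure is quite sensitive to that guess.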

You make a good point. We’ve made the decision that we don’t need Rackspace. We certainly aren’t Walmart, but with a spend of $40,000 a month across all three or four of our Rackspace accounts, we aren’t small potatoes.

We’ve had infrastructure with Rackspace the last four years and things have increasingly become worse:

(1) On our dedicated environment, I get a new lead tech or team what seems like every other month. As a result, there is no continuity, and I'm forced to re-explain our business, application, and configuration, negating any advantage of "fanatical support."

(2) Getting new dedicated hardware takes far too long. A month ago, I wanted to meet to discuss adding three new servers; our coordinator was unavailable for a week, and then we were quoted two-week lead times. I asked about their "rapid deployment" option, and apparently it isn't available to existing customers!

(3) Contrary to the marketing, there is no such thing as fanatical support on any of the public cloud products. The marketing says you get a dedicated team, but that's only true if a 1,000+ Racker call center constitutes a dedicated team. The promises made in our quote, which are summarized here: https://www.rackspace.com/managed_hosting/support/dedicatedteam just aren't reflective of reality at all.

(4) When digging deeper on the support side, the marketing-speak is simply overpowering. "Certified cloud engineers"? Certified in what? What exactly is a cloud engineer? Everyone I interact with is a Linux sysadmin; am I missing out on the cloud engineers?

(5) When I raised this disconnect with my AM, she emphasized, like a broken record, that Rackspace is unique because of fanatical support. After using AWS for our staging environment over the last six months, I can say that our support experiences there have been better, faster, and more consistent. At the end of the discussion, my AM wanted to upsell me to a "private cloud" that was more suitable for enterprises. It would cost more and still tie me to dedicated hardware: no advantage whatsoever for our growing business.

(6) As a last-ditch effort, I was going to reach out to a number of VPs and directors from the public cloud group whom I had met at a customer advisory meeting a few years ago, only to find that every single one of them had left the company.

All this to say: we're moving to AWS, where we'll save 60% or more and have a better, more reliable environment. I imagine Rackspace just didn't deliver Walmart enough value, and they decided they could do it better themselves. That's where we are.

Just to clarify: the Walmart team still relies on Rackspace for OpenStack support. In fact, Amandeep Juneja (interviewed in this article) will be speaking at our upcoming Rackspace::Solve event in San Francisco on March 4th. http://www.rackspacesolve.com/sanfrancisco.html

“…at first relying heavily on the usual vendors but increasingly building up its in-house talent pool”

And this is exactly how they made it production-ready: by applying significant in-house engineering effort to fill the many, many gaps in OpenStack. Most companies don't have access to that kind of resources and capital, so I would not use Walmart as evidence that OpenStack is production-ready.

Most companies also don’t need to run 100,000 cores. There are plenty of vendors out there who can assist companies running hundreds to tens of thousands of cores and make it as easy as consuming any other cloud resource.

Some in-house talent is needed, but that is true of any open-source project. CERN has Linux experts to work out whether an issue is local to CERN or a product problem. We have the same for Microsoft Exchange, to work out whether a fault is a badly configured phone or a problem with IMAP on Exchange.