Automation is the key to saving cash in the cloud. But how to do it?

We’re a month out from our Structure conference, so we took the opportunity on this week’s Structure Show podcast to talk about one of the underlying themes of every Structure event: scalability. More specifically, how companies — from startups to the Fortune 500, web giants to cloud providers — build systems big enough to handle their needs but simple enough to actually manage.

We shed some light on this with two guests who have rather different ideas on how infrastructure should be managed: Bill Fathers, senior vice president and general manager of the Hybrid Cloud Services Business Unit at VMware; and Florian Leibert, founder and CEO of a systems-management startup called Mesosphere.

Here are the highlights of their interviews, but anyone interested in the economics and competitive landscape of cloud computing, or how the open source Mesos technology (and others like it) is running some large web properties, will certainly want to hear the whole thing. Alternatively, they can attend Structure on June 18 and 19 in San Francisco. Fathers himself will be speaking, as will engineering vice presidents from Leibert’s former employers — and Mesos devotees — Twitter and Airbnb.

VMware isn’t worried about the vaunted economies of scale

Fathers made it very clear during the interview that VMware, for a variety of reasons, doesn’t really view cloud providers such as Amazon Web Services and Google as legitimate competitors — even if they’re famously able to keep driving down the prices of computing and storage by adding efficiencies to their infrastructure.

“The single biggest driver that we see for getting your unit costs lower is automation and virtualization. So at the end of the day, if you’re very, very good at virtualizing your compute, your network and your storage, and you can automate it, you get to unit economics that make you world-class,” Fathers said. “We’re in 7 data centers but we’ll probably be in 15 by the end of the year. We’re already big enough that we’re starting to see some benefits of economies of scale, but having run very large global data center companies for many years, the economies of scale benefits are actually marginal. If fundamentally you haven’t virtualized and automated, the benefits of economies of scale actually start to dwindle.”

He continued: “When you get into uber, uber scale and you’re trying to take out another cent, then you get into things like building your own servers and building your own network switches, but we’re a long way from that. So, I’m genuinely not worried about my ability to operate at unit economics that will keep us competitive, because of our investments in virtualization.”

Bill Fathers, SVP and GM of VMware’s hybrid cloud services.

Still, VMware isn’t just standing idle as other cloud providers try to drive down prices by designing even their hardware from the ground up. “We have built our own switches and we’ve built our own firewalls — we’ve built everything — we just happen to have built it in software,” Fathers said. “So our entire stack is self-constructed, and it’s software, which is why we feel like we have the unit economics to compete.”

And why doesn’t VMware think cloud computing ultimately comes down to a storage-and-compute price war? Because the real value comes in selling higher-level services that big-time enterprises need for their mission-critical applications.

“[I]f you get dragged into a bare-knuckle fight about pure storage or pure compute, then I think you are in trouble,” Fathers explained. “… So, we’re using our own technology to get our own unit costs very low, we’re adding value on top of the platform that are directly related to our clients (often things that they couldn’t do for the same price themselves) and I think that’s increasingly how this will pan out.”

On the other hand, maybe virtualized servers aren’t all that

Mesosphere’s Leibert acknowledges that VMware’s — and its customers’ — general view of virtualization resulted in some great savings around consolidation over the past decade, but times are changing.

“If we look at VMs and how VMs have evolved in maybe the last 10 years or so … the hardware was really growing in terms of capacity and applications were relatively small, so virtual machines came in and allowed you to place — manually — multiple small applications on these big servers,” Leibert explained. “Today, if we look at a lot of applications that are being written — like Spark, like Hadoop and so forth — they are pretty much distributed systems from the get-go. And we are also using commodity hardware. So, today, this VM model doesn’t make as much sense any more. Rather than splitting up the applications onto multiple machines, we have to aggregate all the machines and present them to the application as a pool of resources.”


Enter Mesos, the open source resource-management software that runs entire data centers for companies like Twitter and eBay, and large cloud environments for companies such as Airbnb. It was built to be similar to Google’s vaunted Borg software that manages its hundreds of thousands of servers. Just specify the compute and memory resources an application needs, and the software handles the rest, including rebalancing load should servers go down. Leibert describes it (especially when combined with the Marathon technology Mesosphere built) as kind of like a platform-as-a-service layer, but much more.

And Mesos can run atop pretty much any Linux machine.
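
To make the "pool of resources" idea concrete, here is a minimal toy sketch in Python. It is not Mesos or Marathon code and does not reflect how either actually schedules work; the `Node` and `Pool` classes, the first-fit placement rule, and the node sizes are all assumptions made for illustration. It only shows the general pattern described above: declare the CPU and memory an application needs, let the software pick machines, and re-place tasks when a machine disappears.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One machine in the pool (hypothetical model, not a Mesos type)."""
    name: str
    cpus: float
    mem_mb: int
    tasks: dict = field(default_factory=dict)  # task id -> (cpus, mem_mb)

    def free(self):
        """Return (free cpus, free memory in MB) on this node."""
        used_cpus = sum(c for c, _ in self.tasks.values())
        used_mem = sum(m for _, m in self.tasks.values())
        return self.cpus - used_cpus, self.mem_mb - used_mem


class Pool:
    """Treats many machines as a single pool of CPU and memory."""

    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}

    def place(self, task_id, cpus, mem_mb):
        """First-fit placement: return the chosen node name, or None."""
        for node in self.nodes.values():
            free_cpus, free_mem = node.free()
            if free_cpus >= cpus and free_mem >= mem_mb:
                node.tasks[task_id] = (cpus, mem_mb)
                return node.name
        return None

    def launch(self, app_id, cpus, mem_mb, instances):
        """Place `instances` copies of an app; return {task_id: node name}."""
        return {
            f"{app_id}.{i}": self.place(f"{app_id}.{i}", cpus, mem_mb)
            for i in range(instances)
        }

    def fail(self, node_name):
        """Drop a dead node and re-place its tasks on the remaining nodes."""
        dead = self.nodes.pop(node_name)
        return {tid: self.place(tid, c, m) for tid, (c, m) in dead.tasks.items()}


# Three machines with different hardware profiles form one pool.
pool = Pool([Node("a", 4, 8192), Node("b", 4, 8192), Node("c", 8, 16384)])

# The application only states what it needs, not where it should run.
placement = pool.launch("web", cpus=2, mem_mb=2048, instances=3)

# When node "a" goes down, its tasks are moved to whatever capacity is left.
moved = pool.fail("a")
```

In this toy run the caller never names a server: the three `web` instances land wherever capacity exists, and losing node `a` simply triggers new placements on `b` and `c`. A real system would also need health checks, resource isolation and a smarter placement policy, none of which are modeled here.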

“HubSpot, for example, is also running their infrastructure on Amazon, and after switching to Mesos they cut their costs (their Amazon bill) in half. So there’s tremendous benefits to running Mesos on top of your VMs even,” Leibert said. “That doesn’t even account for the savings that you’ll have by not having to have as many [site reliability engineers], for example, looking after your nodes because your cluster looks very homogeneous when Mesos is running on top of it. Even if the hardware profiles differ from one box to another, the system looks really, really homogeneous. That just makes administration much simpler.”

Asked whether it’s a fair argument that enterprises might not want to trust mission-critical or other important applications to a Mesos environment, Leibert suggested that Twitter — a company valued at nearly $20 billion — probably considers its service a mission-critical application.

“The key thing here is that Mesos is a really proven system,” he said. “… If one of the largest communication platforms of the internet is using this as the backend, it must be a battle-tested system.”

He added: “[T]hink about how many resources Twitter, Airbnb and Mesosphere, and other companies, have pooled into this project to make it a really highly available, fault-tolerant system that can scale to tens of thousands of servers. We did a back-of-the-envelope calculation, and we came up with a number — more than 30 million [man hours] have been invested into this project over the last four years or five years.”