Druid, an open source database designed for real-time analysis, is moving to the Apache 2 software license in the hope of spurring more use of, and innovation around, the project. It was open sourced in late 2012 under the GPL, which is generally considered more restrictive than the Apache license in terms of how software can be reused.

Druid was created by advertising analytics startup Metamarkets (see disclosure) and is used by numerous large web companies, including eBay, Netflix, PayPal, Time Warner Cable and Yahoo. Because of the nature of Metamarkets’ business, Druid requires data to include a timestamp and is probably best described as a time-series database. It’s designed to ingest terabytes of data per hour and is often used for things such as analyzing user or network activity over time.
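For a sense of what that timestamp-centric design looks like in practice, here is a minimal sketch of a query against Druid’s native JSON-over-HTTP API. The broker address and the “network_events” datasource are assumptions for illustration, not anything Metamarkets has published.

```python
# A minimal sketch of a Druid "timeseries" query over the native JSON API.
# The broker address and the "network_events" datasource are hypothetical;
# the point is that every row carries a timestamp, so time-bucketed
# aggregations like this are what Druid is built for.
import requests

query = {
    "queryType": "timeseries",
    "dataSource": "network_events",          # hypothetical datasource
    "granularity": "hour",
    "intervals": ["2015-02-01/2015-02-02"],  # one day of activity
    "aggregations": [
        {"type": "count", "name": "events"},
        {"type": "longSum", "name": "bytes", "fieldName": "bytes_sent"},
    ],
}

resp = requests.post("http://druid-broker:8082/druid/v2/", json=query)
for bucket in resp.json():
    print(bucket["timestamp"], bucket["result"])
```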

Mike Driscoll, Metamarkets’ co-founder and CEO, is confident that now is the time for open source tools to really catch on — even more so than they already have in the form of Hadoop and various NoSQL data stores — because of the ubiquity of software as a service and the emergence of new resource managers such as Apache Mesos. In the former case, open source technologies underpin multiuser applications that require a high degree of scale and flexibility at the infrastructure level, while in the latter case databases like Druid are simply delivered as a service internally from a company’s pool of resources.

However it happens, Driscoll said, “I don’t think proprietary databases have long for this world.”

Disclosure: Metamarkets is a portfolio company of True Ventures, which is also an investor in Gigaom.

Google just finished off another record-setting quarter and year for infrastructure spending, according to the company’s earnings report released last week. The web giant spent more than $3.5 billion on “real estate purchases, production equipment, and data center construction” during the fourth quarter of 2014 and nearly $11 billion for the year.

As we have explained many times before, spending on data centers and the gear to fill them is a big part of building a successful web company. When you’re operating on the scale of companies such as Google, Microsoft, Amazon and even Facebook, better infrastructure (in terms of hardware and software) means a better user experience. When you’re getting into the cloud computing business as Google is — joining Amazon and Microsoft before it — more servers also mean more capacity to handle users’ workloads.

Google Vice President of Infrastructure — and father of the CAP theorem — Eric Brewer will be speaking at our Structure Data conference in March and will share some of the secrets to building the software systems that run across all these servers.

But even among its peers, Google’s capital expenditures are off the chart. Amazon spent just more than $1.1 billion in the fourth quarter and just under $4.9 billion for the year. Microsoft spent nearly $1.5 billion on infrastructure in its second fiscal quarter, which ended Dec. 31, and just under $5.3 billion over its past four quarters. Facebook spent just over $1.8 billion in 2014 (although that was a 34 percent jump from 2013’s total).

Among U.S. government agencies, the adoption of cloud computing hasn’t been moving full steam ahead, to say the least. Even though the Obama administration unveiled its cloud-first initiative in 2011, calling for government agencies to move their aging legacy IT systems to the cloud, those agencies have not made great strides in modernizing their infrastructure.

In fact, a September 2014 U.S. Government Accountability Office report on federal agencies and cloud computing found that while several agencies have increased the share of their IT budgets spent on cloud services since 2012 (the GAO studied seven agencies in 2012 and followed up on them in 2014), “the overall increase was just 1 percent.” The report said the agencies’ small increase in cloud spending relative to their overall budgets was because they had “legacy investments in operations and maintenance” and were not going to move those systems to the cloud unless they were slated to be either replaced or upgraded.

But there are at least a few diamonds in the rough. The CIA recently found a home for its cloud on Amazon Web Services. And in 2012, NASA contracted with cloud service broker InfoZen on a five-year, $40 million project to migrate and maintain NASA’s web infrastructure — including NASA.gov — on the Amazon cloud.

This particular initiative, known as the NASA Web Enterprise Services Technology (WestPrime) contract, was singled out in July 2013 as a successful cloud-migration project in an otherwise scathing NASA Office of Inspector General audit report on NASA’s progress in moving to cloud technology.

Moving to the cloud

In August, InfoZen detailed the specifics of its project and claimed it took 22 weeks to migrate 110 NASA websites and applications to the cloud. As a result of the project’s success, the Office of Inspector General recommended that NASA departments use the WestPrime contract or a similar contract in order to meet policy requirements and move to the cloud.

The WestPrime contract primarily deals with NASA’s web applications and doesn’t take into account high-performance computing endeavors like rocket-ship launches, explained Jonathan Davila, the InfoZen cloud architect and DevOps lead who helped with the migration. However, don’t let that lead you to believe that migrating NASA’s web services was a simple endeavor.

Just moving NASA’s “flagship portal,” nasa.gov, which contains roughly 150 applications and around 200,000 pages of content, took about 13 weeks, said Roopangi Kadakia, a web services executive at NASA. And not only did NASA.gov and its related applications have to be moved, they also had to be upgraded from old technology.

NASA was previously using an out-of-support proprietary content management system and used InfoZen to help move that over to a “cloudy Drupal open-source system,” she said, which helped modernize the website so it could withstand periods of heavy traffic.

“NASA.gov has been one of the top visited places in the world from a visitor perspective,” said Kadakia. When a big event like the landing of the Mars Rover occurs, NASA can experience traffic that “would match or go above CNN or other large, highly trafficked sites,” she said.

NASA’s Rover Curiosity lands on Mars

NASA has three cable channels that the agency runs continually on its site, so it wasn’t just looking for a cloud infrastructure tailored to handle worst-case traffic spikes; it needed something that could keep up with the media-rich content NASA consistently streams, she said.

The contract vehicle takes into account that the cost of cloud services can fluctuate with needs and performance (a site might get a spike in traffic one day and see it drop the next). Kadakia estimates that NASA could end up spending around $700,000 to $1 million on AWS for the year; the agency can put $1.5 million into the account to cover any unforeseen costs, and any money not spent can be saved.

“I think of it like my service card,” she said. “I can put 50 bucks in it. I may not use it all and I won’t lose that money.”

Updating the old

NASA also had to sift through old applications on its system that were “probably not updated from a tech perspective for seven-to-ten years,” said Kadakia. Some of the older applications’ underlying architecture and security risks weren’t properly documented, so NASA had to do an audit of these applications to “mitigate all critical vulnerabilities,” some of which its users didn’t even know about.

“They didn’t know all of the functionalities of the app,” said Kadakia. “Do we assume it works [well]? That the algorithms are working well? That was a costly part of the migration.”

After moving those apps, NASA had to define a change-management process for its applications so that each time something got altered or updated, there was documentation to help keep track of the changes.

To help with the nitty-gritty details of transferring those applications to AWS and setting up new servers, NASA used the Ansible configuration-management tool, said Davila. When InfoZen came on board, the apps were stored in a co-located data center where they weren’t being managed well, he explained, and many server operating systems weren’t being updated, leaving them vulnerable to security threats.

Without the configuration-management tool, Davila said, it would “probably take us a few days to patch every server in the environment” using shell scripts. Now, the team can “patch all Linux servers in, like, 15 minutes.”
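The article doesn’t include InfoZen’s actual playbooks, but as a rough illustration of that kind of fleet-wide patch run, an Ansible ad-hoc command driven from Python might look something like the sketch below. The inventory file, host group, fork count and package module are all assumptions.

```python
# A sketch of a fleet-wide patch run using Ansible's ad-hoc CLI from Python.
# The inventory file, host group, fork count and package module are
# assumptions for illustration; InfoZen's real playbooks aren't public.
import subprocess

def patch_linux_servers(inventory="hosts.ini", group="webservers", forks=50):
    """Update every package on each host in `group`, many hosts in parallel."""
    cmd = [
        "ansible", group,
        "-i", inventory,
        "-m", "yum",                  # assumes RHEL/CentOS-style hosts
        "-a", "name=* state=latest",
        "--become",                   # escalate to root for the update
        "-f", str(forks),             # number of hosts patched concurrently
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    patch_linux_servers()
```

The parallelism is the whole point: one command fans out across the inventory instead of a script looping over servers one at a time.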

NASA currently has a streamlined devops environment in which spinning up new servers is faster than before, he explained. Whereas it used to take NASA roughly one to two hours to load up an application stack, it now takes around ten minutes.

What about the rest of the government?

Kadakia claimed that moving to the cloud has saved NASA money, especially as the agency cleaned out its system and took a hard look at how old applications were originally set up.

The agency is also looking at optimizing its applications to fit in with the more modern approach of coupled-together application development, she explained. This could include updating or developing applications that share the same data sets, which would have previously been a burden, if not impossible, to do.

A historical photo of the quad, showing Hangar One in the back before its shell was removed. Photo courtesy of NASA.

Larry Sweet, NASA’s CIO, has taken notice of the cloud-migration project’s success and sent a memo to the entire NASA organization urging other NASA properties to consider the WestPrime contract first if they want to move to the cloud, Kadakia said.

While it’s clear that NASA’s web services have benefited from being upgraded and moved to the cloud, it remains unclear whether other government agencies will follow suit.

David Linthicum, a senior vice president at Cloud Technology Partners and a Gigaom analyst, said he believes there isn’t a sense of urgency for these agencies to convert to cloud infrastructure.

“The problem is that there has to be a political will,” said Linthicum. “I just don’t think it exists.”

Much as President Obama appointed an Ebola czar during this fall’s Ebola outbreak, there should be a cloud czar responsible for overseeing the rejiggering of agency IT systems, he said.

“A lot of [government] IT leaders don’t really like the cloud right now,” said Linthicum. “They don’t believe it will move them in the right direction.”

Part of the problem stems from the contractors the government is used to working with. Organizations like Lockheed Martin and Northrop Grumman “don’t have cloud talent,” he said, and are not particularly suited to guiding agencies looking to move to the cloud.

Still, now that NASA’s web services and big sites are in the cloud, perhaps other agencies will begin taking notice.

Microsoft is putting on a cloud computing full-court press this week, complete with appearances by CEO Satya Nadella on CNBC and at a press event in San Francisco on Monday. There’s a lot of talk about the scale of its cloud platform, and about its ability to serve hyperscale, enterprise and hybrid cloud environments. But scale alone, or even the biggest and best virtual servers around, won’t make or break Microsoft in the cloud: applications will.

It’s not that scale isn’t important, but it’s just a game that Microsoft has to prove it can play. When Nadella and cloud boss Scott Guthrie talk about Microsoft’s 19 data center regions, $4.5 billion in annual infrastructure spending and millions of servers under management (with current capacity to house about 10 million, if need be), they’re not talking about anything Google or Amazon Web Services can’t do or aren’t already doing. They’re really saying, “See, three can play at this game.”

Frankly, if the name of the game is just building out capacity in a race to host the most virtual servers at the lowest price, it’s not clear Microsoft can win, or that it would even want to. That’s only a defensible position for so long, until someone or something comes along to pull the rug out from under it like cloud computing did to the server business. The coronation as the “Dell, HP or IBM of the Cloud” is not exactly an honor Microsoft, Google or Amazon are fighting to claim.

But if you look a little below the surface, Microsoft’s real strength as a cloud provider begins to stand out: its software. It still sells a lot of it, which is what makes its hybrid cloud and enterprise cloud stories (“hyperscale” cloud is the third leg of the cloud-messaging trifecta the company has been pushing lately) so compelling. By making Azure play nice with current customers’ SQL Server, Windows Server and Active Directory environments, Microsoft has ensured there’s a very large installed base predisposed to give Azure a serious look.

Microsoft doesn’t even have to cling to the past in order to keep the enterprise software business it has spent decades building legitimate. On Monday, the company announced a new appliance called the Cloud Platform System, which is essentially the entire suite of Azure services and features preinstalled on Dell hardware and ready to run inside customers’ data centers. Four years after Microsoft first floated the idea, Azure is now officially both a cloud platform and a new software product to sell.

Nadella encapsulated the idea during his speech at Monday’s event, stating that when Microsoft says “mobile-first,” it means “the mobility of the individual experience.” Finally, after years of talking about it, Microsoft under Nadella seems serious about delivering a single experience regardless of where the underlying software is running, or what that experience might be.

However, systems software is secondary to Microsoft’s primary software assets, which are still its applications. Microsoft went on the offensive — and risked coming across as petty (or, worse, unfunny) — against Google with the “Scroogled” campaign because Google’s line of collaboration and productivity services is a significant threat to a multibillion-dollar business Microsoft enjoys. A big reason the company is able to claim that 80 percent of the Fortune 500 is running on the Microsoft cloud, that its cloud brings in $4.4 billion in revenue per year, and that it’s storing 30 trillion objects in Azure Storage is that Bing, Office 365 and Xbox Live all run on Azure.

Google has the same sort of cycle in place, mind you, with the data it generates in search and its experience building web infrastructure leading to everything from Google Now to the rather impressive BigQuery, but it’s still playing catchup to Microsoft when it comes to getting its products used by those lucrative corporate customers.

Even when Microsoft announced its new Azure Marketplace on Monday — a service designed to cater to software vendors in the cloud much like Windows did on the desktop — I found myself most impressed with the promise offered by Microsoft’s own applications. A live demonstration highlighted the ease with which Azure users can now deploy a Hadoop cluster running Cloudera’s software, but that’s not too novel considering Cloudera launched a product just last week to enable the same thing on the AWS cloud, and that Microsoft already offers its own Hadoop service, called HDInsight, as well as a tight partnership with Cloudera rival Hortonworks.

The part where Cloudera’s Mike Olson queried the company’s Impala SQL-on-Hadoop engine using a natural language interface in Excel — now that was something! Being able to query data in Excel, just by typing simple queries, could be a much bigger factor in the deployment of new Azure instances running Hadoop than the mere fact that deployment is now easier.

So, yeah, scale does matter in cloud computing and it does matter to Microsoft, but not just for the sake of running the most virtual servers at the lowest margins, or being the place where all the hot startups want to be. AWS and Google are going to win those fights; it’s in their DNA.

Microsoft knows how to build commercial software and how to sell it. VMware under Paul Maritz (a former Microsoft executive himself) made a short-lived play to load up on applications and become the Microsoft of the cloud, but it didn’t have the clout, the experience or the fortitude to carry through on it. If Microsoft is going to be the one legacy IT vendor that remains relevant in the decade to come, it’s going to be because it figures out how to do what it does at cloud scale.

Salesforce.com CEO Marc Benioff (pictured above) has always talked about his company as a cloud computing provider, but as popular as its applications have been, Salesforce has never operated at the same scale as its infrastructure-based cousins such as Amazon Web Services or Google. However, things might be about to change thanks to a new data analytics service, called Wave, that the company is announcing on Monday.

The new service, which is underpinned by a search-like, massively parallel processing architecture, represents a “dramatic expansion” of the company’s data center footprint, according to Keith Bigelow, general manager and senior vice president of the Salesforce Analytics Cloud. In order to provide capacity for Wave, he said, Salesforce has deployed thousands of servers and “many, many, many, many petabytes of storage” into its data centers — an order of magnitude more capacity than all the rest of Salesforce.com.

Even if it’s still nowhere near the estimated million-plus servers operated by massive-scale operations such as Google and Microsoft (possibly even Facebook), Bigelow’s assertions about the scale of Wave suggest Salesforce understands the amount of capacity that’s required not just to serve lots of users, but to store, analyze and visualize their data, as well.

Servers, and lots of them, are a big part of what keeps major web services humming. Source: Microsoft

That’s a good thing, because the cloud part of Wave is what will ultimately make or break its success. Sure, Salesforce.com users have long complained about the product’s weak analytics features, but there are now plenty of good, easy and relatively inexpensive ways to analyze that data via API connections to other analytics applications. Making Wave a reasonably priced and highly scalable general-purpose analytics application, however, is where Salesforce really stands to benefit financially.

Research firm IDC estimates business analytics to be a roughly $40 billion market, Bigelow said, and Salesforce expects to take a big piece of it. It might be easy enough to distinguish itself from the legacy vendors that still command billions of that revenue, but the trickier part will be fending off the rising field of next-generation vendors that are increasingly based in the cloud, as well.

From a product perspective, Wave looks about like what you’d expect from an analytics application built in 2014. It’s intuitive, it’s fast, and, based on the demo I saw, it looks easy enough to drill down into data points, join datasets (in fact, it will suggest fields that might make good join candidates) or switch from view to view. Its mobile app is clearly designed for small displays, showing as much information as possible while keeping clutter to a minimum.
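Salesforce didn’t say how Wave picks those suggested join fields, but the general idea is easy to sketch. Here is a toy illustration (not Salesforce’s algorithm) that scores column pairs across two datasets by how much their values overlap; all of the data and names are invented.

```python
# A toy illustration of "suggest fields that might make good join candidates":
# score every column pair across two datasets by value overlap and return the
# best-scoring pairs. This is not Salesforce's algorithm; the data is made up.
import pandas as pd

def suggest_join_keys(left: pd.DataFrame, right: pd.DataFrame, top_n: int = 3):
    scores = []
    for lcol in left.columns:
        for rcol in right.columns:
            lvals, rvals = set(left[lcol].dropna()), set(right[rcol].dropna())
            if not lvals or not rvals:
                continue
            overlap = len(lvals & rvals) / min(len(lvals), len(rvals))
            scores.append((overlap, lcol, rcol))
    return sorted(scores, reverse=True)[:top_n]

opportunities = pd.DataFrame({"account_id": [1, 2, 3], "amount": [100, 250, 80]})
support_cases = pd.DataFrame({"case_id": [10, 11], "account": [2, 3]})
print(suggest_join_keys(opportunities, support_cases))
# Highest-scoring pair should be ("account_id", "account")
```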

This being Salesforce, Wave certainly connects to data from the company’s other services, but also includes connectors to data in other business applications such as SAP and Oracle. And not to worry, users can also upload their own data in spreadsheet form if they need to analyze something not generated via a connected application or database.

What will be really interesting is to see how Salesforce evolves Wave in the months and years to come — whether it starts working in more advanced machine learning and predictive analytics, or perhaps begins targeting even smaller businesses than it already does. Bigelow said to “stay tuned” on the innovation front, because the company plans to iterate very fast.

Salesforce already changed the course of the CRM market and now, he said, “We think there’s the same opportunity to hit the reset button on the analytics industry.”

Analytics startup Interana launched last week promising a product for analyzing event data that marries a super-scalable data store with an easy visualization layer. Gigaom covered the launch, touching on the company’s Facebook roots and its theory on why event data matters, but co-founders Ann and Bobby Johnson also came on the Structure Show podcast to delve a little deeper into these topics — and to explain why starting a company with your spouse can be not only rewarding, but also a wise business decision.

Here are some highlights from the interview, but do yourself a favor and listen to the whole thing. You’ll get co-host Barb Darrow’s and my thoughts on the big HP breakup, and then the whole story from the Johnsons on why they launched Interana, how companies are using it, and all that’s possible when you can track behaviors of anything from servers to customers over years.

Taking Facebook’s experiences to the mainstream

Bobby Johnson, Interana’s co-founder and CTO, spent nearly six years at Facebook and led its engineering team. Interana’s third co-founder, Lior Abraham, was also a Facebook engineer and was responsible for a popular analytics tool called Scuba. Here’s how their experiences at the social network giant informed the product design at Interana.

“We saw a lot of data. We built a lot of things to handle large interconnected data and try to make things really fast,” Bobby explained. “From that experience we just saw a lot of things that we could do better in the analytics space, so we’re taking a lot of those things that we learned from scaling Facebook and applying them to this problem.”

One of the things he saw was how fast the right tools could spread from their initial users and uses into entirely new areas:

One of the first things I wrote at Facebook was a program called Scribe, which just collects logs. And when I first wrote it I thought we had four use cases we were targeting and we could think of another three or four we thought might be interesting. So we stood it up and we made it really easy to stick in other datasets, and within six months we had more than 100 datasets. It turned out that people want to know the numbers, we never had a problem convincing people to care about the numbers. The problem was just it’s often sufficiently hard to get to them that people just kind of give up and they make a guess instead.

Ann Johnson answered on behalf of Abraham, describing his experience building Scuba:

He put in a lightweight backend that let you save two weeks worth of data, and people loved it. He thought it would be used by the performance team, but within months it was being used by all of engineering. And then it moved not only to engineering, but to customer service and HR and marketing, and they were all using the same simple interface.

The Interana founding team. Source: Interana

Rich data plus timestamps equals behavioral insights

“[N]ow that storage is very cheap, people are saving richer data, and what that rich data looks like is just a series of events over time,” Ann explained. “And the reason people are saving it is all of a sudden that opens up looking at behavior. So you can look at the behavior of your users, you can look at the behavior of your products. And the cool thing about behavior is that’s where a lot of your really interesting business metrics are.”

Asked about the difference between Interana and something like Splunk, she added, “The thing about machine data … that data kind of gets boring after a couple weeks; you never really go back and look at it. When you’re looking at user behavior data, you look at it over 10 years, you look at it for as long as you can.”
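Her point is easy to make concrete. Once data is stored as timestamped events per user, behavioral questions reduce to simple group-bys. Here is a minimal sketch (generic pandas, not Interana’s engine) of a signup-to-purchase funnel over invented event data:

```python
# A minimal illustration of behavioral analysis over timestamped events:
# who signed up and later purchased? This is generic pandas, not Interana's
# engine, and all of the data is invented.
import pandas as pd

events = pd.DataFrame([
    {"user": "a", "event": "signup",   "ts": "2014-10-01"},
    {"user": "a", "event": "purchase", "ts": "2014-10-03"},
    {"user": "b", "event": "signup",   "ts": "2014-10-02"},
    {"user": "c", "event": "purchase", "ts": "2014-10-02"},  # never signed up
])
events["ts"] = pd.to_datetime(events["ts"])

# First time each user performed each event, as a user-by-event table.
first = events.groupby(["user", "event"])["ts"].min().unstack()

# A user "converts" if their first purchase comes after their first signup.
converted = first["purchase"] > first["signup"]
print("signup -> purchase conversion:", converted.mean())  # 1 of 3 users
```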

The Facebook effect is already happening

Interana’s beta users include big-name customers largely in the web space (Sony, Jive, Asana, Tinder and Orange Silicon Valley among them), and they’re already doing exactly what Interana hoped: deploying the product for specific use cases and then quickly finding new ways to use it.

“We call them ‘off-label’ uses,” Ann joked. “… The performance guys are always hungry for more data. The performance guys always run off in the corner and start using it as well. That’s happened in at least half of our beta customers.”

She added, “We’re trying very hard with our pricing to enable this kind of viral spread throughout the company, because that’s what we saw at Facebook and that’s what we’re seeing at our beta customers.”

A screenshot of the funnel analysis in Interana.

I asked whether that might not be easier with a cloud service rather than enterprise software, and Ann said Interana was initially planning to launch a cloud service (and still easily could) but was brought down to earth in part by customers’ concerns over having to move too much data.

“We were so sure we were going to do this in the cloud at first. We were like, ‘Salesforce did it in the cloud, everybody’s doing it in the cloud, we’ll do it in the cloud!'” she said. “But as soon as we started out doing some of our customer research, everyone was like, ‘No. We will not talk to you if you’re in the cloud.'”

The family that wire-wraps motherboards together …

Technology aside, the most notable thing about Interana might be that Ann and Bobby are married — a rare thing among tech startups. Ask them, though, and it’s not weird at all, especially if it’s indeed true that investors look for good teams as much as good ideas.

“It’s wonderful,” Ann said of the startup experience so far. “We were at Caltech together, we used to wire-wrap boards together, we’d build crazy stuff for parties. We love building stuff together, we always knew that. It was always our dream to work together, but we never seemed to get it to work when we were working at other companies.”

That kind of relationship is hard to beat, Bobby said: “One of the things about starting a company is you have to have people you really trust and really know you can work with, and we’ve been doing projects together for 15 years. I can’t imagine starting a company with somebody I didn’t know that well.”

Chinese search engine Baidu is trying to speed up the performance of its deep learning models for image search using field programmable gate arrays, or FPGAs, made by Altera. Baidu has been experimenting with FPGAs for a while (including with Altera rival Xilinx’s gear) as a means of boosting performance on its convolutional neural networks without having to go whole hog down the GPU route. FPGAs are likely most applicable in production data centers, where they can be paired with existing CPUs to serve queries, while GPUs can still power much of the behind-the-scenes training of deep learning models.

The tech world noticed when Facebook announced on Monday the TODO group it had launched along with a collection of other large web companies, including Google and Twitter. Short for “talk openly, develop openly,” the group plans to turn its members’ shared experiences creating and managing open source projects — thousands of them collectively — into resources that can help other companies get started down the open source path. James Pearce, Facebook’s head of open source and developer advocacy, came on the Structure Show podcast this week to talk about TODO and, more importantly, why open source software matters.

It’s an entertaining and insightful interview, definitely worth listening to for anyone considering open sourcing their software or wondering why anybody else would want to. There’s even a funny story about how the group got its name. But here are some highlights.

GitHub is just the first step

About a year into his new open source role, Pearce has had to wrap his head around hundreds of open source projects at Facebook and a culture that encourages open software, even if it didn’t prescribe any measures for doing it right. He explained his viewpoint on when a project is ready for public consumption:

Ultimately, if an engineering team has created a technology that they think is valuable to their community or is valuable to other companies that are operating at our scale, then by default we will consider open sourcing it. It’s a strong part of our culture, it’s a strong part of our identity. We obviously built Facebook originally on open source software, so we feel we have a default obligation to share back.

Of course, we need to make sure — and this is the main criterion, I think — that the engineering teams that are open sourcing technology are going to follow through. Actually, the releasing part is pretty easy … The challenges actually come once we’ve launched and the community starts to adopt a technology, because that’s when we need to continue to support it, continue to improve it, continue to work with the community. And, I’ll be honest, in the past that’s an area where we haven’t done such a good job.

However, Pearce added, “Over the last year, I’ve been kind of firm about this and, as a result, the projects that we have put out have been markedly more successful than those that were sort of done without a lot of thought and without a lot of discipline.”

James Pearce. Credit: Facebook

Navigating the license soup

Asked about Facebook’s approach to the myriad open source licenses available, Pearce explained that the company tries to release code under the best license for each project:

We do not have one single license that we use on every project. We are sensitive, I guess you could say, to cultural expectations of different communities. For example, when we’re open sourcing some of our big data infrastructure that’s perhaps Java-based, we might default more toward something like Apache because that’s more compatible with things like Maven and the kind of tools developers are used to using there. In other cases, particularly with our mobile libraries, we default to BSD, and if you look back through our portfolio there’s a variety of other stuff. … I think it’s more important for us to know that the way we license something is going to resonate well with the community that we think is going to use it than just trying to impose one model across all our products.

Internally, Facebook has its own list of open source licenses it likes, and if developers find software they like under one of those licenses, they’re free to use it. “We try not to let it get in the way of producing high-quality software and shipping it as fast as possible,” Pearce said.

Bringing together a diverse group in TODO

As competitive as companies such as Twitter, Google and Facebook can be on the product front, they’re starting to come together on the open source front. In fact, Pearce noted, the companies involved in TODO actually share more experiences and viewpoints than they have philosophical differences. During an interview on Monday at Facebook’s @Scale conference, he said the WebScaleSQL initiative that Facebook, Google, Twitter and LinkedIn launched in March served as an inspiration that a group like TODO could actually exist and work together.

“I’ve been amazed at how common our experiences have been as a group,” he elaborated during the podcast. “… ‘Oh my goodness, did we not think to talk to each other before we went off and rebuilt all these wheels?!’ The commonality is far greater than anything that’s different.”

He added, “Walmartlabs is one of the [companies] that people are a little surprised to see; they don’t necessarily associate Walmart with fast-moving open source. But actually, if you go to github.com/walmartlabs, you’re going to be extremely surprised because there are tens if not hundreds of great projects on there that they’ve been building.”

TODO’s small but soon-to-expand group of original members.

Quality over quantity, guidelines over mandates

Pearce said TODO isn’t really interested in dictating what types of open source licenses its members or others must use, or in trying to set quotas about how frequently they must open source stuff. Rather, the group will focus on things such as building tooling and developing best practices, maybe building a directory of open source projects and labeling them based on how old they are or how well they’ve been maintained.

“If I look at the Facebook portfolio, it may be that only some of our own projects meet that criteria,” he acknowledged. “We have some big flagship projects that I know we run really well, but I’m honest in admitting that there are some other small projects that, yeah, we still could do a better job on. I would rather see tens or hundreds of projects run really, really well than I would see thousands or tens of thousands of projects that have been thrown over the wall and forgotten about.”

Open source isn’t just for Silicon Valley, or even just for web companies

Although TODO began its life very focused on tech companies, specifically those in Silicon Valley, Pearce said the goal is to expand broadly to any type of company that wants to get involved. During our chat on Monday, he suggested to me that even companies like John Deere, which many people don’t associate with software design, might have something to add. Give the organization a few months, and that just might come true.

“If we can help companies that haven’t got a history of open source get more comfortable with the concept, both of using it and of releasing their own software, then that will have been very, very successful,” he said during the podcast. “… Hundreds of companies have reached out to us since Monday; it’s been very exciting. … We’ve had universities reaching out to us, we’ve had airlines manufacturers reaching out to us, we’ve had all sorts of very surprising sectors.”

Facebook announced a pair of open source efforts Monday at its inaugural @Scale conference in San Francisco, including an organization called TODO that will help other companies get started down their own open source paths, and a new networking tool called mcrouter.

TODO, which is short for “talk openly, develop openly,” is a collaboration between Facebook and other large technology companies that are strong open source advocates. They include Box, Dropbox, GitHub, Google, Khan Academy, Stripe, Square, Twitter and Walmart Labs. Facebook promised more details on the project in the coming weeks, but James Pearce, the company’s head of developer advocacy and open source, shared the following description in a blog post:

[O]ur overall goal in this collaboration is to make open source easier for everyone. We want to run better, more impactful open source programs in our own companies; we want to make it easier for people to consume the technologies we open source; and we want to help create a roadmap for companies that want to create their open source programs but aren’t sure how to proceed.

This is not the first time Facebook has collaborated with other web companies in order to advance open source technology. In March, it announced a collaboration with three other companies — including TODO partners Google and Twitter — on a database project called WebScaleSQL. The technology is a version of the popular MySQL database designed to help other companies run it at large scale without requiring them to spend untold amounts of time and resources re-engineering it like the WebScaleSQL contributors had to.

A diagram of mcrouter. Credit: Facebook

Facebook’s other announcement, mcrouter, is the latest in a string of open source releases the company has made over the past several years, including popular technologies such as Cassandra and HipHop, and the hardware-based Open Compute Project. The gist is that it’s a tool for handling traffic on Facebook’s cache layer, where the company keeps data in memory so it can be accessed and served faster than it could be from hard disk inside a traditional database system.

Mcrouter is a memcached protocol router that is used at Facebook to handle all traffic to, from, and between thousands of cache servers across dozens of clusters distributed in our data centers around the world. It is proven at massive scale — at peak, mcrouter handles close to 5 billion requests per second. Mcrouter was also proven to work as a standalone binary in an Amazon Web Services setup when Instagram used it last year before fully transitioning to Facebook’s infrastructure.
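Because mcrouter speaks the standard memcached protocol, applications don’t need to change to sit behind it: they talk to mcrouter exactly as they would to memcached, and mcrouter handles routing across the cache fleet. A hedged sketch of what that looks like from the client side, using the pymemcache library and an assumed local mcrouter port:

```python
# mcrouter is wire-compatible with memcached, so any memcached client works.
# The port below is an assumption (whatever the local mcrouter instance was
# started with); mcrouter, not the application, decides which cache servers
# actually hold the data.
from pymemcache.client.base import Client

cache = Client(("127.0.0.1", 5000))    # local mcrouter endpoint
cache.set("user:42:name", "Ada")
print(cache.get("user:42:name"))       # b'Ada'
```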

It doesn’t have an enticing name like cloud computing or the appconomy, but cluster management really is some sexy stuff. Important, too: Done right, it’s the thing that makes the web run by letting companies including Google, Facebook and Twitter scale to billions of users without spending every spare dollar and every spare second of engineer time managing their servers.

Mesosphere is a startup (read more about it here and here) that’s built on top of the Apache Mesos technology. Mesos is essentially an open source version of the system that Google uses to automate its data centers, with the end result being that many applications and services can share the same set of resources simultaneously because the system ensures that each gets everything it needs in order to run optimally. Mesosphere makes it easier to deploy Mesos and achieve those benefits, and also adds some tooling on top of it.

In addition to the new Mesosphere cluster-deployment features, the two companies, Mesosphere and Google, also worked together to integrate Kubernetes and Mesos, giving joint users the option to manage their Docker containers with Kubernetes and manage the whole cluster (Docker containers included) with Mesos. To borrow an analogy from Docker creator Solomon Hykes in a recent podcast interview, if a Docker application is a Lego brick, Kubernetes would be like a kit for building the Millennium Falcon and the Mesos cluster would be like a whole Star Wars universe made of Legos.
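To make the Kubernetes half of that analogy concrete, here is a minimal, generic sketch (not the Mesosphere-specific integration) that asks a Kubernetes cluster to run a single Docker container as a pod, using the official Kubernetes Python client; the image name and namespace are placeholders:

```python
# A generic illustration of handing a Docker container to Kubernetes as a pod.
# Assumes the official `kubernetes` Python client and a reachable cluster
# config; the image and namespace are placeholders, and this is not the
# Mesosphere-specific integration described in the article.
from kubernetes import client, config

config.load_kube_config()  # reads cluster credentials from ~/.kube/config

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="hello-web"),
    spec=client.V1PodSpec(
        containers=[client.V1Container(name="web", image="nginx:1.7")]
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```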

In an interview about the Mesosphere partnership, Google Cloud product manager Craig McLuckie described the evolution of Google’s systems from requiring an “inordinate” amount of effort to manage into the epitome of automation they are today. It was the move to containers, and then to Borg as the data center operating system, if you will, that really made the difference.

“The number of services we were able to maintain massively increased, and we were able to focus on other parts of the organization,” he said. Urs Hölzle, Google’s senior vice president for technical infrastructure, explained this evolution in more detail at our Structure conference in June.

[Video: http://www.youtube.com/watch?v=I9R4P0TLViA]

That’s the same pitch cloud computing providers have been making for years, though few (save for those pushing platform-as-a-service offerings) really had an end-to-end automation story to speak of. The cloud has always made it drastically easier to procure resources and launch applications, but infrastructure as a service did not mean distributed architectures, high availability and pooled resources as a service. In many instances, those things still require some real effort to achieve (see, e.g., what Netflix has built for itself atop Amazon Web Services).

And although there’s no guarantee the world will buy into Mesosphere’s approach, or even into Google’s push around containers and Kubernetes, the cat is out of the bag when it comes to cluster management. Being able to scale like Google is cute, but being able to run like Google is sexy.

Companies offering cloud computing services or private-cloud software are going to have to figure out a strategy for providing this type of capability, or be left looking like yesterday’s news.