Even though VMware initially called its Amazon competitor vCloud Hybrid Service, make no mistake: it’s the company’s public cloud (now renamed vCloud Air).

And VMware really wants workloads that might now be running on Amazon Web Services to come on over, says Bill Fathers, EVP and GM of cloud services for VMware. That’s a tall order. Face it, AWS has been around, as an old boss would have said, since “Hector was a pup.” Its first services launched in 2006, and vCloud Air is, what, a two-year-old toddler? Fathers said VMware now has thousands of customers on vCloud Air, although that wasn’t the original plan. Initially, VMware wanted a few hundred key companies to act as “beachhead clients” that would derive real value from its cloud, especially from vCloud Air’s networking infrastructure, and it has surpassed that goal, he said.

But Fathers’ point is that only a small percentage of total computing is now running on any public cloud: he thinks it’s about 5 percent, up from the two to three percent he estimated last June at Structure. That means there’s a ton of work up for grabs.

Fathers positions both the Google relationship and this week’s Partner Exchange announcements as part of that push. And he’s clearly not backing away from a fight with the biggest of big clouds.

Beyond the VMware universe, there was a bit of big data moving and shaking, with Cloudera buying Xplain.io and its self-service query modeling expertise and DataStax picking up Aurelius, the keeper of the Titan graph database. Could this be a sign of even more M&A to come? That’s something we’ll hear more about at Structure Data, March 18-19 in New York, from the CEOs of Cloudera, Hortonworks and other data powerhouses, so book your tickets now.

The landscape of data solutions has been significantly disrupted in the last several years, on multiple fronts. Another such disruption is taking place now, with the mainstreaming of in-memory database (IMDB) products.

“For me personally, no, I didn’t see this coming,” DataStax Co-founder and CTO Jonathan Ellis admitted during a recent phone interview about the $106 million series E round of venture capital the company announced on Thursday.

“I felt that we were at the right place at the right time in terms of distributed databases being a problem that the industry needed to solve, but I don’t know that I saw quite this level of success,” he continued. “My wife was telling me after one of my recent trips, ‘When you started this, I thought I was signing up for four years of you traveling and being gone, and now you’re raising a series E.'”

But DataStax, which was founded in 2010 and sells a commercial version of the open source Apache Cassandra distributed database, is officially a big company. Its latest round of financing brings the company’s total equity investment to $190 million and values the company at $830 million. Kleiner Perkins Caufield & Byers led the round, which also included new investors ClearBridge, Cross Creek, Wasatch, PremjiInvest and Comcast Ventures. DataStax’s existing investors all participated as well and, as co-founder and Chief Customer Officer Matt Pfeil noted, Series A investor Lightspeed Venture Partners “really went above and beyond” what was required.

They’re all excited by DataStax’s impressive growth: an employee headcount that has doubled to 350 in 2014 and is expected to hit 450 by year’s end; a customer base of more than 500 companies that includes 25 percent of the Fortune 100; and offices and engineers spread across the United States, Europe (where a large number of its Cassandra developers are based) and Asia. New investor PremjiInvest came on board to help the company expand into India.

Not bad for a company that grew out of a kumbaya moment about six years ago, when open source non-relational databases began popping out of the woodwork, all collectively unifying under the NoSQL banner. Cassandra itself came out of Facebook, which developed it for some specific applications but eventually began moving future services to a similar open source technology called HBase. For a while, there was a collective rallying cry that relational databases couldn’t scale and NoSQL — pick a project, any project — was the answer.

For a look back to the good, old days, check out Pfeil pitching Riptano (as DataStax was originally called) at our Structure 2010 launchpad competition.

“If your dataset sits on one machine, it almost doesn’t matter what technology you use,” Ellis said.

Some of the other projects featured in the NoSQL Tapes.

At the high end of the market, though, where customers have big data and big budgets, technology does matter. But building a technology that can hold up at that scale in customers’ data centers without requiring them to employ a team of Facebook- or Google-level engineers means making some sacrifices. For DataStax, that meant, somewhat reluctantly and not always naturally, saying goodbye to its hacker roots and focusing on how to get its technology into enterprises.

That’s not to say DataStax has moved entirely away from the early adopter crowd. The company runs its own startup program that Ellis said includes more than 300 participants in Europe alone. And it’s still making some engineering bets that lean futuristic today and will take some time to reach the mainstream, such as the support for Apache Spark, which speeds up analytic jobs, added in the latest version of the DataStax Enterprise software.

“We’re kind of straddling two worlds right now,” Ellis explained. “We had our roots in kind of the Silicon Valley, early-adopter crowd, and we’re expanding rapidly in the more traditional enterprise market. I think the former of those is the one that’s like, ‘Oh, Spark is the greatest thing since sliced bread,’ and it hasn’t really registered quite yet with the less tech-centric customers.”

The latest DataStax marketecture diagram.

Despite its focus on being the biggest, fastest, most-scalable database around, though, the DataStax team knows that it can’t sacrifice usability entirely in the name of performance and scale. Ellis didn’t come right out and say it, but he did suggest that unlike web startups or other engineering-centric companies that will run numerous databases for numerous different tasks, enterprise customers are often looking to settle on one relational database system (such as Oracle or MySQL) and one NoSQL system. That might mean there isn’t room for DataStax, Couchbase and MongoDB within the same company.

“On the one hand, they’re finding that Oracle is a really terrible way to scale to millions of users, but on the other hand they don’t want to go full-on polyglot persistence because that’s a maintenance madhouse,” he said. “So really what they want to do is standardize on a relatively small toolset that they can use over and over. … Often that’s Oracle and Cassandra or MySQL and Cassandra. … Those two are general enough that you … might not necessarily need to add Redis into the mix, for instance.”

Competing primarily against, and often sitting side by side with, a behemoth like Oracle seems a long way from when Ellis and Pfeil left Rackspace to start DataStax four years ago, Riptano moniker and rhinoceros logo in tow. Not that they’re complaining about the company’s good fortune.

“The hacker crowd using Node.js, we might not be the best fit for them,” Ellis said, “but I’m OK making that tradeoff in exchange for being a better tool for the Fortune 500 and the Fortune 1000.”

NoSQL startup DataStax announced on Wednesday that it has added an in-memory option to its commercial version of the Cassandra key-value database. Cassandra is seeing an uptick in adoption right now because of its scalability and ability to span data centers, and the ability to serve data from memory instead of disk will make it a lot faster, too. If the approaches of startups like DataStax, MemSQL and others are any indication, it looks like databases of the future will feature broad ranges of capabilities, data formats and storage options.

DataStax, a San Mateo, Calif.-based startup offering a commercial version of the Apache Cassandra NoSQL database, has raised a $45 million series D round led by new investor Scale Venture Partners. Existing investors Lightspeed Venture Partners, Crosslink Capital and Meritech Capital Partners, and new investors DFJ Growth and Next World Capital, also contributed. DataStax has now raised a total of $83.7 million in venture capital since launching in 2010 as Riptano.

Even if Facebook and Twitter (this time for real, I’ve heard) have stopped building applications on Cassandra, we’ve covered its continued use in companies such as eBay, Netflix, Eventbrite and BloomReach. There’s also a London-based company called Acunu that has built a real-time analytics platform on top of Cassandra; its founder and CEO, Tim Moreton, will be presenting at our Structure: Europe conference in September. For its commercial version of the database, called DataStax Enterprise, DataStax claims more than 300 customers, including 20 companies in the Fortune 100 (one of which is Disney).

DataStax’s OpsCenter management software, which also supports Hadoop and Solr for search.

Making hay one Oracle customer at a time

Cassandra’s success with such large users has to do with its ability to handle large-scale online applications that demand steady levels of performance, DataStax CEO Billy Bosworth told me. Scalability and performance have never been among Cassandra’s shortcomings, and the database is capable of replicating data across data centers. Large companies used to choose Oracle for applications that needed these capabilities, but now that NoSQL options are around and relatively mature, companies are rethinking whether the relational database model was ever really correct for some applications in the first place.
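Replicating data across data centers means placing copies of each row on nodes in more than one location. The minimal Python sketch below illustrates the general idea in simplified form, assuming a hash ring: nodes get positions on the ring, and for each key we walk the ring clockwise, taking nodes until each data center has its configured number of replicas. It loosely mirrors the per-data-center replica counts Cassandra lets operators set, but it ignores racks, virtual nodes and consistency levels; MD5 stands in for Cassandra’s partitioner, and the function, node and data-center names are all hypothetical.

```python
import bisect
import hashlib
from typing import Dict, List, Tuple


def ring_token(value: str) -> int:
    # Hypothetical stand-in for a partitioner: map a string to a 64-bit ring position.
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


def place_replicas(
    key: str,
    nodes: List[Tuple[str, str]],
    replicas_per_dc: Dict[str, int],
) -> List[str]:
    """Pick replica nodes for `key`, honoring a per-data-center replica count.

    `nodes` is a list of (node_name, data_center) pairs. Starting at the first
    node whose token follows the key's token, walk the ring clockwise and take
    nodes from each data center until that data center's count is satisfied.
    """
    ring = sorted((ring_token(name), name, dc) for name, dc in nodes)
    tokens = [t for t, _, _ in ring]
    start = bisect.bisect_right(tokens, ring_token(key)) % len(ring)
    remaining = dict(replicas_per_dc)  # copy so the caller's dict is untouched
    chosen: List[str] = []
    for step in range(len(ring)):
        _, name, dc = ring[(start + step) % len(ring)]
        if remaining.get(dc, 0) > 0:
            chosen.append(name)
            remaining[dc] -= 1
    return chosen


nodes = [
    ("a1", "us-east"), ("a2", "us-east"), ("a3", "us-east"),
    ("b1", "eu-west"), ("b2", "eu-west"), ("b3", "eu-west"),
]
# Two copies in each data center, so losing either site still leaves two replicas.
print(place_replicas("order:1001", nodes, {"us-east": 2, "eu-west": 2}))
```

Because the placement depends only on hashes, every node can compute the same replica list for a key without consulting a central coordinator, which is part of why this style of design scales out and spans sites without a single point of failure.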


And a good number of them, Bosworth said, are deciding it wasn’t and switching to Cassandra. When he joined DataStax in 2011, he thought NoSQL adoption would mirror that of client-server adoption, meaning a handful of customers moving to Cassandra from relational databases by 2014 would be a sign of success. Everything actually happened much faster: “The scale I had anticipated for 2014 actually hit us in 2012,” Bosworth said.

In fact, he added, the company wasn’t even out raising money; investors came to DataStax with a lot of money at a good valuation, so it jumped on the opportunity. The new capital will help the company expand its European presence as well as generally ramp up product development and marketing.

Open source FTW!

He credits the open source nature of many NoSQL databases with helping speed the pace of adoption beyond what previous technological shifts have experienced. That makes sense. When it’s free (at least in terms of licenses) to get started with and scale a technology, the monetary barriers drop so low that developer time and curiosity might be the biggest obstacles to trying something new that isn’t offered as part of a commercial software product.

Of course, DataStax hasn’t exactly hurt its own cause. The company maintains a free Community Edition of its Cassandra distribution that also includes a lightweight version of the company’s management software, as well as a full-featured Enterprise edition that comes prepackaged with Hadoop and Solr for search. Both of DataStax’s products received substantial upgrades as of Tuesday, coinciding with the release of Apache Cassandra 2.0.

But even though DataStax has raised a lot of money and landed some big-name users, Bosworth isn’t so blind as to declare victory over the database market. He thinks we’re in an era of “polyglot persistence” (which is the database version of polyglot programming) in which developers choose the right database for the right job. Based on how many companies are running Cassandra, a relational database, and even other NoSQL stores such as MongoDB or maybe HBase, he’s probably right.

Last year was a good year for NoSQL outfit DataStax. The big data company’s customer base increased roughly tenfold to 270, including 20 Fortune 100 firms and names such as eBay, Netflix and Thomson Reuters. It also picked up a $25 million C round in October, with one of the intended uses of that funding being global expansion. Now it’s making good on that promise by opening a European subsidiary.

Because much of the company’s new customer base is located in Europe, the Middle East and Africa (EMEA), its latest move makes sense: DataStax has opened a London office, and it’s a full-on subsidiary rather than just a branch office.

As DataStax CEO Billy Bosworth told me, the idea here is to be able to respond quickly to European market demands, which range from language differences to a different style of partnership:

“Without any presence in EMEA, we ended up in 2012 with 10 percent of our customers located in the EMEA region – that was 100 percent inbound; we didn’t do any programs or outbound activity. We have Scoreloop in Germany, the mobile gaming platform, and Trademob, the mobile app platform. We have mobile carriers who are decommissioning Oracle because they have to have a multi-data-center solution, and a London-based bank chose DataStax over Oracle for their ecommerce platform.

“In the UK, the business aspect of it is not that different from the U.S. … but as you move into the European continent, you do want to have some local language skills. And when you move into France and Spain and Italy, now you’re into a very boutique partner network. Those partners have very good relationships with their customers but are often not on the same scale as a big [systems integrator] like Accenture. The only way to really get close enough to that partner network is for us to be in the region as well.”

With a portfolio as open-source-centric as DataStax’s is, Bosworth added, the company is also looking forward to hosting “a ton of meet-ups in the region” in the coming months.

Disney is a massive company, but when it comes to its big data platform, the entertainment conglomerate looks a lot like a startup. Kind of, that is. By the sheer power of its will (and ingenuity), a small team has been able to craft a large custom platform out of Hadoop, NoSQL databases and other open-source technologies. But for better or for worse, doing big data at such a large company means playing by a different set of rules.

When it came to putting a big data platform in place, Arun Jacob, director of data solutions in the Disney Technology Solutions & Services group, told a room at the IE Group Big Data Innovation conference in Boston on Thursday that Disney chose to build something from scratch rather than buy software from a large vendor. Cost certainly played a role, but it was really flexibility that drove the decision.

Reduce, reuse, recycle

In order to provide the most value to the company, Disney’s big data platform has to be everything to everyone, which turns out to be a tall order. Initially, Jacob said, “We treated ourself like a small consulting organization and we had something to sell.” When a division wanted to use the platform for a particular function, Jacob would say yes and then get busy figuring out how to build it.

Architecturally, it’s all about being able to recompose the path data takes through the platform and the components that are used for each particular purpose, or being able to easily replace pieces altogether if something better comes along. The Disney platform has a foundation of Hadoop, Cassandra and MongoDB complemented by a suite of other tools for particular use cases. The operations team uses the platform to view, analyze and index error messages, while another division runs a recommendation engine on top of it. Application developers get the high-throughput, low-latency data access they need, while the analytics team has the higher-latency data access it requires.

However, although Jacob wanted to keep costs down with open source software, he did have a luxury that most startups don’t — a budget for outsourcing and the occasional product. When he needed support with a Hadoop cluster, he could call Cloudera. When an implementation of Solandra (an open source search engine built atop Solr and Cassandra) tipped over under the weight of Disney’s scale, he bought the enterprise edition of DataStax’s Cassandra-based product (Solandra’s creator had since taken a job with DataStax and was expanding upon Solandra’s capabilities in DataStax Enterprise).

Flexibility isn’t free

The Solandra incident actually underscores the tradeoffs that come when you use free open-source software and don’t reach for the checkbook at any sign of trouble. “You pay for [open-source projects] late at night, you pay for them by learning to run them, you pay for them by reading people’s source code who even if you could read it, it still doesn’t make any sense,” Jacob said. But those things can be overcome if you’re willing to put in the time.

And at a company the size of Disney, those problems, and a whole lot more, have to be overcome. For example, Jacob explained, you can fudge your way around things like fault tolerance, high availability and security when you’re standing up a deployment, but you do have to figure out a way to achieve them eventually.

Ready for mass consumption

You also have to make systems built on open-source software consumable by everyone who needs to use them. That means it’s not enough to just build a scalable and stable system; the system also has to be easy enough for thousands of internal developers of all types and all skill levels to use. In a six-person startup, Jacob said, it’s easy enough for everyone to just learn Hadoop in a month and then start using it, but that’s not the case in a large enterprise.

So his team made it easy.

To “remove the excuses” for business users not loading their data into the system, the team made it so that users just need to point a custom-built user interface at their files. (Disney’s platform is growing at 5TB a day, and there are still many other types of data it needs to house, Jacob said.) Because it has built wrappers around the technology, Jacob’s team doesn’t talk about Hadoop and MongoDB to internal users, only about analytics and queries. It also built client frameworks in a number of programming languages so developers can interact with the platform without writing RESTful API calls.

In some cases, the team hid the platform’s complexity from users not to make it easier to use, but to keep loose-cannon developers from doing something crazy that could take down the whole cluster. It could expose all the controls and knobs in a NoSQL database, but “they tend to shoot each other,” Jacob said. “First they shoot themselves, then they shoot each other.”

Still, after all the work he put into building Disney’s big data platform, it’s not exactly a process Jacob is hoping to repeat as the platform evolves. The tools for managing big data are getting better, he said, so he still does a build-versus-buy analysis when it’s time to make a change. Building custom tools is fine when you don’t have a choice, but it’s not always wise when buying something could save untold man-hours and headaches.

Update: DataStax has informed me that the slides previously linked to here have been removed. If you want more technical details on Disney’s big data platform, a slide deck from Jacob’s recent presentation at the Cassandra Summit is available here.

Hadoop is on its way to becoming the de facto platform for the next generation of data-based applications, but it’s not without flaws. Ironically, one of Hadoop’s biggest shortcomings now is also one of its biggest strengths going forward: the Hadoop Distributed File System.

Within the Apache Software Foundation, HDFS is always improving in terms of performance and availability. Honestly, it’s probably fine for the majority of Hadoop workloads that are running in pilot projects, skunkworks projects or generally non-demanding environments. And technologies such as HBase that are built atop HDFS speak to its versatility as a storage system even for non-MapReduce applications.

But if the growing number of options for replacing HDFS signifies anything, it’s that HDFS isn’t quite where it needs to be. Some Hadoop users have strict demands around performance, availability and enterprise-grade features, while others aren’t keen on its direct-attached storage (DAS) architecture. Concerns around availability might be especially valid for anyone (read “almost everyone”) who’s using an older version of Hadoop without the High Availability NameNode. Here are eight products and projects whose proprietors argue they can deliver what HDFS can’t:

Cassandra (DataStax)

Not a file system at all but an open source, NoSQL key-value store, Cassandra has become a viable alternative to HDFS for web applications that rely on fast data access. DataStax, a startup commercializing the Cassandra database, has fused Hadoop atop Cassandra to provide web applications fast access to data processed by Hadoop, and Hadoop fast access to data streaming into Cassandra from web users.

Cleversafe

Cleversafe got into the HDFS-replacement business on Monday, announcing a product that will fuse Hadoop MapReduce with the company’s Dispersed Storage Network system. By fully distributing metadata across the cluster (instead of relying on a single NameNode) and not relying on replication, Cleversafe says it’s much faster, more reliable and more scalable than HDFS.

GPFS (IBM)

IBM has been selling its General Parallel File System to high-performance computing customers for years (including within some of the world’s fastest supercomputers), and in 2010 it tuned GPFS for Hadoop. IBM claims the GPFS-SNC (Shared Nothing Cluster) edition is so much faster than HDFS in part because it runs at the kernel level, as opposed to atop the OS like HDFS.

Isilon (EMC)

EMC has offered its own Hadoop distributions for more than a year, but in January 2012 it unveiled a new method for making HDFS enterprise-class: replace it with EMC Isilon’s OneFS file system. Technically, as EMC’s Chuck Hollis explained at the time, because Isilon supports the NFS, CIFS and HDFS protocols, a single Isilon NAS system can handle ingesting, processing and analyzing data.

Lustre

Lustre is an open source high-performance file system that some claim can make for an HDFS alternative where performance is a major concern. Truth be told, I haven’t heard of this combination running anywhere in the wild, but HPC storage provider Xyratex wrote a paper on the combination in 2011, claiming a Lustre-based cluster (even with InfiniBand) will be faster and cheaper than an HDFS-based cluster.

MapR File System

The MapR File System is probably the best-known HDFS alternative, as it’s the basis of MapR’s increasingly popular and well-funded Hadoop distribution. Not only does MapR claim its file system is two to five times faster than HDFS on average (and up to 20 times faster in some cases), but it has features such as mirroring, snapshots and high availability that enterprise customers love.

Top executives at NoSQL startups are putting on a brave face in response to Amazon Web Services’ new DynamoDB offering. They roundly cite the new product (as well as Oracle’s October entrance into the space) as validation for the technology NoSQL companies have been pushing for years, while generally dismissing the competitive ramifications of having major vendors now playing in the same pool. But is that confidence justified?

Validation is good

Dwight Merriman, CEO of MongoDB proprietor 10gen, summed up the general sentiment of his peers in an email response to my request for a comment:

The Amazon Dynamo DB announcement is further validation that NoSQL is a big deal, and we are excited to see large players like Oracle and Amazon recognizing the need for alternatives to the relational database. Their entry into the field makes it clear to all large enterprises that this is an important trend – as we have seen that traditional databases do not fit well with cloud computing. New database technologies will be needed in the cloud, and also in the enterprise private cloud.

DataStax CEO Billy Bosworth makes a similar argument on his blog, as did new Cloudant CEO Derek Schoettle during a Friday-morning phone call. He said DynamoDB is “awesome” and Cloudant is “excited about it.” “[AWS] will be a competitor by default,” he said, “but their success will be our success.” As the saying goes, and as GigaOM Pro’s Jo Maitland explains in a research note on DynamoDB (subscription req’d), a rising tide lifts all boats.

But is competition really good?

However, there are plenty of reasons for NoSQL-based startups to fear these new big-name competitors. When competing against Oracle, the challenge will be to convince large enterprises that third-party NoSQL databases are a better fit with existing Oracle ecosystems than is Oracle’s custom-built offering. Nobody ever got fired for buying Oracle, and if it’s offering NoSQL as part of an integrated data environment that also includes a relational database, data warehouse and Hadoop, there might be a natural inclination to just go with Oracle.

With AWS and DynamoDB, however, NoSQL companies find themselves fighting for the websites and other web-based customers that are now their bread and butter. Sid Anand, who helped transition Netflix from Oracle to AWS’s SimpleDB to Cassandra and who now is on the LinkedIn infrastructure team, wrote on his blog earlier this week that “[i]f [your NoSQL database] is not hosted (e.g. by AWS), be prepared to hire a fleet of ops folks to support it yourself. If you don’t have the manpower, I recommend AWS’[s] DynamoDB.”

It appears some are following his advice. One commenter on a blog post by Apache Cassandra chairman (and DataStax co-founder) Jonathan Ellis detailing the technical differences between Cassandra and DynamoDB wrote, “Cassandra’s tech is superior, as far as I can tell. But we’ll probably be using DynamoDB until there is an equivalent managed host service for Cassandra. Moving to Cassandra is simply too expensive right now.”

It depends whom you ask

As with most cloud services, at least in their initial incarnations, DynamoDB definitely favors simplicity over lots of features and fine-grained control. Amazon CTO Werner Vogels explains as much in his post announcing the service. If those things are important, users are almost certainly better off choosing a full-featured database.

Ellis’ aforementioned post lays out the reasons one might choose Cassandra. A spokesperson for Basho, which develops the Riak database, sent me a list of three questions everyone should ask when choosing a NoSQL option:

Is this solution proprietary or open-source?

Is my data secure? Is the solution fault tolerant?

What are the querying capabilities for search and indexing?

Basho might very well argue that Riak is superior to DynamoDB on all counts, and CTO Justin Sheehy said via email that Riak runs on any infrastructure and very likely will cost less to run over time. Assuming that’s true, it’s really just an extension of the familiar tradeoffs involved in choosing cloud-based servers or relational databases, now applied to NoSQL databases.

Cloudant CEO Schoettle acknowledges there’s “about 60 percent overlap” between DynamoDB and Cloudant, but says companies dealing with large data sets and trying to solve complex problems would be better off choosing his company’s hosted CouchDB-based service. While DynamoDB is “essentially a key-value store with a hash methodology,” Cloudant offers integrated search, replication and advanced data analysis capabilities. It also offers SSDs if customers need them.
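To make the “key-value store with a hash methodology” description a little more concrete, here is a minimal Python sketch of a store that hashes each key to decide which partition holds it. It illustrates the general technique only and is not DynamoDB’s or Cloudant’s actual design; the class name `HashPartitionedStore`, the use of MD5 and the default of four partitions are all assumptions made for the example.

```python
import hashlib


class HashPartitionedStore:
    """Toy key-value store that places each key by hashing it (illustrative only)."""

    def __init__(self, num_partitions: int = 4) -> None:
        # One dict per partition stands in for a separate storage node.
        self.partitions = [dict() for _ in range(num_partitions)]

    def _partition_for(self, key: str) -> int:
        # MD5 is used only for a stable, evenly spread hash, not for security.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.partitions)

    def put(self, key: str, value: object) -> None:
        self.partitions[self._partition_for(key)][key] = value

    def get(self, key: str, default: object = None) -> object:
        # A lookup by exact key goes straight to one partition.
        return self.partitions[self._partition_for(key)].get(key, default)


store = HashPartitionedStore()
store.put("user:42", {"name": "Ada"})
print(store.get("user:42"))  # {'name': 'Ada'}
```

The sketch also shows the tradeoff Schoettle is pointing at: fetching a value by its exact key touches a single partition, but a question such as “which users are named Ada?” has no index to consult and would have to scan every partition, which is where integrated search and analysis features come in.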

There also are a handful of hosted MongoDB options available, including MongoHQ and MongoLab, and MongoDB instances are available through a number of IaaS and PaaS providers. DataStax’s Cassandra database is currently in private beta on the Heroku platform.

So perhaps NoSQL vendors really are right to welcome Amazon’s DynamoDB with open arms. “You can perhaps get a little weak in the legs [when you hear you’re competing with Amazon],” Schoettle said, but Amazon will go a long way toward educating potential customers on NoSQL, generally. When they realize they need something more, the existing camp of NoSQL vendors will be there to help.

DataStax, a Burlingame, Calif.-based NoSQL startup, has created the first commercial distribution of the Apache Cassandra database and has just closed an $11 million Series B funding round. The money came from new investor Crosslink Capital and existing backer Lightspeed Venture Partners. Neither piece of news should come as a shock because as NoSQL products have been maturing over the past year, money has followed.

The new product is called DataStax Enterprise, and it melds the Cassandra database with Hadoop and DataStax’s existing OpsCenter product. Essentially, it sounds a lot like Brisk, the Hadoop distribution DataStax announced at our Structure: Big Data conference in March, only with some additional management features and enterprise fine tuning. What that means is a product designed to deliver real-time application performance and heavy-duty analytics on the same physical infrastructure, with both workloads benefiting from each other’s presence. If need be, Hadoop gets speedy access to web data, as do web applications to Hadoop data.


Hopefully for DataStax, though, the new product (DataStax’s first commercial release other than its OpsCenter management tool) will put Cassandra back in the limelight. DataStax is pushing the analytics angle pretty hard, and that could turn out to be a smart decision in a very crowded NoSQL space. The Hadoop (plus Hive) integration makes DataStax Enterprise stand out as almost a high-speed unstructured data warehouse, on top of Cassandra’s proven reputation as a database for real-time, webscale applications.

Cassandra was an early darling in the NoSQL space, in large part because of its Facebook roots, but it has been somewhat overshadowed recently by projects such as CouchDB, MongoDB and HBase that have garnered lots of press and big-time users. The former two have commercial versions in place and are finding large-enterprise traction thanks to Couchbase and 10gen, respectively, and even Facebook chose HBase over Cassandra to power numerous new features such as Messaging, real-time analytics and its “social inbox.” Couchbase and 10gen have also raised a lot of money recently, to the tune of $14 million and $20 million, respectively, in the last two months.

DataStax also is rolling out a DataStax Community Edition today, which is a more-polished version of the free, open source Apache Cassandra distribution. Both products will be available in the fourth quarter of this year.