Items Tagged: Cloudera

If you are in IT, 2013 is going to be the year that you will want to dive into the "big data" pool if you haven't been pushed in already. But don't worry - it's no longer sink or swim. For one, we'll be here to help coach IT folks through it all. And while the concepts, terminology and hype have been all over the place, once you start floating around you'll find that under the surface much of what fills the big data pool is familiar IT infrastructure, data management, and services re-cast around a few easy-to-grasp innovations.
For example, if you are in IT and asked to pick a Hadoop distro to stand up, you'd probably start with evaluating the three main distributions of Hadoop (other than getting it straight off Apache) followed by other downstream OEM'd and pre-integrated versions. The main distros are from Cloudera, Hortonworks, and MapR. I didn't really appreciate the differences until talking with all three individually (at 2012 NY Strata, see below).

When we talk about big data today we aren't talking just about the data and its three V’s (or up to 15 depending on who you consult), but more and more about the promise of big transformation to the data center. In other words, it’s about big money. First, consider recent news about some key Hadoop distro vendors. Many of them are now billion-dollar players, with much of that valuation resting on speculation and expectation of future data center occupation....

For the IT crowd just now getting used to the idea of big data's HDFS (Hadoop Distributed File System) and its peculiarities, there is an alternative open source big data storage layer coming from Cloudera called Kudu. Like HDFS, Kudu is designed to be hosted across a scale-out cluster of commodity systems, but it is specifically intended to support more low-latency analytics.
At its heart, Kudu sits between the capabilities of HDFS and HBase to meet the growing use of interactive drill-down analytics (e.g. Impala) and the faster time-to-response Spark platform. It's a combination of on-disk column-store technology (for low-latency queries) fronted by an in-memory write layer (for low-latency updates/inserts), and it is fully distributed across the cluster....

It's time to look at big data again. Last week I was at Cloudera's growing and vibrant annual analyst event to hear the latest from the folks who know what's what. Then this week Strata (the conference for data scientists) brings lots of public big data vendor announcements. A noticeable shift this year is less focus on how to apply big data and more on maturing enterprise features intended to ease wider data center-level adoption. A good example is the "mixed big data workload QoS" cluster optimization solution from Pepperdata.

In the last few months I’ve been really bullish on Apache Spark as a big enabler of wider big data solution adoption. Recently we got the great opportunity to conduct some deep Spark market research (with Cloudera’s sponsorship) and were able to survey nearly seven thousand (6900+) highly qualified technical and managerial people working with big data from around the world. ... Some highlights: First, across the broad range of industries, company sizes, and big data maturities, over one-half (54%) of respondents are already actively using Spark to solve a primary organizational use case. That’s an incredible adoption rate....

Galactic Exchange announced today that the cloud version of ClusterGX™, which it bills as the world's easiest to deploy and use Enterprise-Class Spark/Hadoop clustering platform, is now available for deployment on AWS free of charge.