Five or six years ago, analysts working with big datasets made queries and got the results back overnight. The data world was revolutionized a few years ago when Hadoop and other tools made it possible to get the results from queries in minutes. But the revolution continues: analysts now demand sub-second, near real-time query results. Fortunately,...

"Data is here, it's growing, and it's powerful." Author Cathy O'Neil argues that the right approach to data is skeptical, not cynical: it understands that, while powerful, data science tools often fail. Data is nuanced, and "a really excellent skeptic puts the term 'science' into 'data science.'" The big data revolution shouldn't be dismissed as hype,...

Although you don't need a large computing infrastructure to process massive amounts of data with Apache Hadoop, it can still be difficult to get started. This practical guide shows you how to quickly launch data analysis projects in the cloud by using Amazon Elastic MapReduce (EMR), the hosted Hadoop framework in Amazon Web Services (AWS). Authors...

How can you get your data from frontend servers to Hadoop in near real time? With this complete reference guide, you'll learn Flume's rich set of features for collecting, aggregating, and writing large amounts of streaming data to the Hadoop Distributed File System (HDFS), Apache HBase, SolrCloud, Elastic Search, and other systems. Using Flume...

Planning to deploy and maintain a public, private, or hybrid cloud service? This cookbook's handy how-to recipes help you quickly learn and install Apache CloudStack, along with several API clients, API wrappers, data architectures, and configuration management technologies that work as part of CloudStack's ecosystem. You'll learn how to use Vagrant,...

Get up and running with OpenStack Swift, the free, open source solution for deploying high-performance object storage clusters at scale. In this practical guide, Joe Arnold, co-founder and CEO of SwiftStack, brings you up to speed on the basic concepts of object storage and walks you through what you need to know to plan, build, operate, and measure...