Tag: Amazon EMR

Wangechi Doble is an AWS Solutions Architect. Introduction: Apache HBase is an open-source, column-oriented, distributed NoSQL database that runs on the Apache Hadoop framework. In the AWS Cloud, you can choose to deploy Apache HBase on Amazon Elastic Compute Cloud (Amazon EC2) and manage it yourself or leverage Apache HBase as a managed service on […]

Nick Corbett is a Big Data Consultant for AWS Professional Services. Amazon Elastic MapReduce (Amazon EMR) is a web service that makes it easy to process large amounts of data efficiently. Amazon EMR uses the popular open source framework Apache Hadoop combined with several other AWS products to do such tasks as web indexing, data […]

Jonathan Fritz is a Senior Product Manager for Amazon EMR. Please note: Amazon EMR now officially supports Spark. For more information about Spark on EMR, visit the Spark on Amazon EMR page or read Intent Media’s guest post on the AWS Big Data Blog about Spark on EMR. Over the last five […]

Jonathan Fritz is a Senior Product Manager for Amazon EMR. AWS Solutions Architect Manjeet Chayel also contributed to this post. The EMR File System (EMRFS) is an implementation of HDFS that allows Amazon Elastic MapReduce (Amazon EMR) clusters to store data on Amazon Simple Storage Service (Amazon S3). Many Amazon EMR customers use it to […]

Markus Schmidberger is a Senior Big Data Consultant for AWS Professional Services. Big Data is on every CIO’s mind. It is synonymous with technologies like Hadoop and the ‘NoSQL’ class of databases. Another technology shaking things up in Big Data is R. This blog post describes how to set up R, RHadoop packages and RStudio […]

This is a guest post by Kyle Porter, a Sales Engineer at Simba Technologies. Jon Einkauf, a Senior Product Manager for Amazon Elastic MapReduce, and AWS Senior Technical Writer Jeff Slone also contributed to this post. Note: Ports have changed on EMR 4.x. Before walking through this post, please consult the EMR documentation to […]

Rahul Bhartia is an AWS Solutions Architect. Introduction: Hadoop provides a great ecosystem of tools for extracting value from data in various formats and sizes. Originally focused on large-batch processing with tools like MapReduce, Pig and Hive, Hadoop now provides many tools for running interactive queries on your data, such as Impala, Drill, and Presto. […]

Steve McPherson is a Senior Manager for Amazon Elastic MapReduce. Note: This post was updated 2/8/16. The Presto bootstrap action documented in the original post has been deprecated because EMR now offers a Presto-Sandbox as a full-fledged EMR application. For details, see the EMR sandbox. Amazon Elastic MapReduce (EMR) is a fully managed Hadoop-as-a-service platform […]