Using The Hortonworks Virtual Sandbox

Powered By Apache Hadoop

This work by Hortonworks, Inc. is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Legal Notice

Copyright 2012 Hortonworks, Inc.

The text of and illustrations in this document are licensed by Hortonworks under a Creative Commons Attribution Share Alike 3.0 Unported license ("CC-BY-SA"). An explanation of CC-BY-SA is available at sa/3.0/. In accordance with CC-BY-SA, if you distribute this document or an adaptation of it, you must provide the URL for the original version. Hortonworks, as the licensor of this document, waives the right to enforce, and agrees not to assert, Section 4d of CC-BY-SA to the fullest extent permitted by applicable law.

About the Hortonworks Virtual Sandbox

The Hortonworks Virtual Sandbox is a hosted environment that runs in an Amazon Web Services (AWS) EC2 environment. It provides a packaged environment for demonstration and trial use of the Apache Hadoop ecosystem on a single node (pseudo-distributed mode). For more details on the suite of components included in the Hortonworks Virtual Sandbox (including the Hortonworks Data Platform), see: About Hortonworks Data Platform.

The Virtual Sandbox is accessible as an Amazon Machine Image (AMI) and requires that you have an account with AWS. An AMI is a pre-configured operating system and virtual application software image that is used to create a virtual machine within the Amazon Elastic Compute Cloud (EC2).

Prerequisites

- Ensure that you have an Amazon Web Services (AWS) account ID (see: EC2 Getting Started Guide).
- To connect to the AWS instance from a Windows client machine, install PuTTY on your local client machine. See: Installing PuTTY.
- To execute import jobs for Apache Sqoop or to execute Java Map/Reduce programs, you must install the Java Development Kit (JDK) on the AWS instance.

Launching Hortonworks Virtual Sandbox

Step 1: Configure AWS instances.

Step 1-a: Log into the AWS Management Console and select the EC2 tab. On the left-hand side, click EC2 Dashboard. Under the Getting Started section, click the Launch Instance button. You should now see the Create New Instance wizard.

Keep the default selection (Classic Wizard) and click Continue.

Step 1-b: Select the Hortonworks Virtual Sandbox AMI.

- Select the Community AMIs tab.
- In the Viewing: pull-down menu, make sure it is set to either All Images or Public Images.
- In the Search input box, enter hortonworks and press the Return key. Alternatively, paste the AMI ID in the search box on the AWS Management Console and press Enter.
- Select the Hortonworks-HDP Single-Node AMI that is found.

Note: The AMI returned is specific to your EC2 Region. For information on using an AMI from a different Region, refer to the FAQ "Where can I find AMIs for a specific AWS region?".

Step 1-c: Select the instance type.

- Keep the default value for the Number of Instances.
- From the Instance Type drop-down, select Large (m1.large) and click Continue.
- Keep the default Advanced Instance Options and click Continue.
- Enter a name for the instance and click Continue.

Step 1-d: Select the EC2 Key Pair. You can use one of the following options:

Option I: Select from existing Key Pair. If you already have existing EC2 Key Pairs, select the Key Pair you would like to assign to the instances.

(Screen shot: Key Pair selection step.)

OR

Option II: Alternatively, use the following instructions:

- Select Create a new Key Pair and provide a name for your Key Pair.
- Click Create & Download your Key Pair. This downloads the Key Pair file (for example: ec2-keypair) to your local client machine. Make a note of this location.
- Click Continue.

Step 2: Configure AWS security groups. You can use one of the following options:

Option I: Select existing AWS security group. If you have an existing AWS security group, select Choose one or more of your existing security groups. Ensure that your security group opens all the ports specified in Table 1: Mandatory Ports For Hortonworks Virtual Sandbox.

OR

Option II: Alternatively, use the following instructions:

- Select Create a new Security Group.
- Provide arbitrary names for the Group Name and Group Description.
- Under the Inbound Rule section, select Custom TCP Rule from the Create a New Rule drop-down.
- Provide the value for Port Range and click the Add Rule button for each row in the following table:

TCP Port (Service)    Source
22 (SSH)              /0
80 (HTTP)             /0
(HTTP)                /0
443 (HTTPS)           /0
[remaining rows not recoverable]

Table 1: Mandatory Ports For Hortonworks Virtual Sandbox

IMPORTANT: This step can make the AWS instance vulnerable. It is therefore strongly recommended that you do not load sensitive data into your AWS instance.

(Screen shot: resulting security group settings on the AWS Management Console.)

Step 3: Review and launch.

- Review all of your settings. Ensure that your instance type is a Large (m1.large) instance.
- Click Launch. The instance should take two to three minutes to launch.
- On your AWS Management Console, click the Instances link (in the left-hand navigation menu). This opens the My Instances page on the right-hand side.
- On the My Instances page, scroll to your newly launched AWS instance. (This instance has the name you provided in Step 1-c.)
- Select this instance and copy the public DNS name. (The public DNS name is also required for Step 4 and Step 6.)

Step 4: Connect to your AWS instance using SSH.

Step 4-a: On the AWS Management Console, browse to the My Instances page. Right-click on the row containing your instance and click Connect.

Step 4-b: Connect to the AWS instance using SSH. You will need the Key Pair file downloaded in Step 1-d and the Public DNS Name for your AWS instance (obtained in Step 3).

For UNIX:

    cd <Full_Path_To_Private_Key_Pair_File_On_Your_Client_Machine>
    chmod 400 <Name_Of_Key_Pair_File>
    ssh -i <Full_Path_To_Key_Pair_File> <Public_DNS_Name_For_AWS_Instance>

For Windows, see: Connecting from a Windows Machine using PuTTY.

INFO: You can also copy this command from the AWS Management Console.

Step 5: Install Java JDK.

The Java JDK is required if you want to compile Java programs for use with Map/Reduce or to import Sqoop jobs. By default, the Hortonworks Virtual Sandbox does not include the Java JDK. To install the JDK, use the instructions listed below:

- On your local client machine, use your browser to download the Oracle Java JDK version 6, update 31. Download the 64-bit JDK installer binary file (with the *.bin extension).
- From your local client machine, perform a secure copy of the downloaded JDK binary file to the AWS instance:

    cd <Path_To_Private_Key_Pair_File_On_Client_Machine>
    scp -i <Private_Key_Pair_File_Name> <Path_To_Java_JDK_Binary_File_On_Client_Machine>

Smoke Test Result
================================================================
Hadoop Smoke Test : Pass
Pig Smoke Test : Pass
Zookeeper Smoke Test : Pass
Hbase Smoke Test : Pass
Hcat Smoke Test : Pass
Templeton Smoke Test : Pass
Hive Smoke Test : Pass
Sqoop Smoke Test : Pass
Oozie Smoke Test : Pass
=================================================================

Step 7: Enable access to the HDP Monitoring Dashboard from your local client machine.

Step 7-a: On your local machine, open a command line utility and execute the following command. (The public DNS name for the AWS instance can be obtained from Step 3.)

    ping -c1 <Public_DNS_Name_For_AWS_Instance>

RESULT: The output of this command shows the IP address of your AWS instance.

Step 7-b: On your local client machine, open the hosts file and add the line shown below. Replace <ec2-public-ip-address> with the IP address obtained in Step 7-a.

For Unix (use root privileges), open the file with:

    sudo vi /etc/hosts

and add the line:

    <ec2-public-ip-address> hortonworks-sandbox.localdomain hortonworks-sandbox

For Windows (use Administrator privileges), open the file with:

    notepad \WINDOWS\system32\drivers\etc\hosts

and add the line:

    <ec2-public-ip-address> hortonworks-sandbox.localdomain hortonworks-sandbox

INFO: The hosts file entry maps the sandbox host name to the AWS instance so that you can access the HDP Monitoring Dashboard from your local client machine. For more details, see: Using the Hosts file.

Step 7-c: Use the following URL to access the HDP Monitoring Dashboard:

Step 7-d: To use the Nagios UI, click on the Nagios tab. Use the following credentials: nagiosadmin/admin.

NOTE: The Hortonworks Virtual Sandbox provides a total disk space of 840 GB for your data operations. Data on the AWS instance store persists across a reboot of the instance, but it is not persisted under the following circumstances:

- Failure of an underlying drive.
- Running an instance on degraded hardware.
- Stopping an Amazon EBS-backed instance.
- Terminating an instance.

For more details, see: Amazon EC2 Instance Storage.

Using Hortonworks Virtual Sandbox

The Hortonworks tutorials are located here: /root/tutorial. You can also go to the Virtual Sandbox page and execute these tutorials. We are working hard to add more tutorials, so check back often. You can also take the Hortonworks training; check the latest class schedule here.

Frequently Asked Questions (FAQs)

Q. I am unable to connect to the HDP Monitoring Dashboard.

This problem arises if the firewall on your AWS instance is enabled. Follow the instructions listed below to disable the firewall settings for your AWS instance:

Step 1: Check whether the existing firewall settings for the AWS instance are disabled:

    /etc/init.d/iptables status

If the output of this command does not show that the firewall is already stopped, execute the next step to disable the existing firewall settings (stop iptables).

Step 2: Execute the following command to disable the existing firewall settings:

    /etc/init.d/iptables stop

Q. I am unable to see the metrics on the Ganglia UI.

This happens when your Ganglia server fails to start. Use the following steps to restart the Ganglia server:

- Connect to the AWS instance (using a command line utility or PuTTY).
- Execute the following commands as the root user (su -):

    /etc/init.d/hdp-gmetad restart
    /etc/init.d/hdp-gmond restart

Q. How do I execute the smoke tests to verify that my HDP components are working correctly?

Smoke tests are executed after the AWS instance is launched. These tests can be executed again using the following command:

    /etc/init.d/hdp-stack syscheck

Q. How do I start or stop individual Hadoop services?

To start or stop all the Hadoop services, you can use the following auxiliary scripts provided with the Hortonworks Virtual Sandbox:

To stop HDP services:

    /etc/init.d/hdp-stack stop

To start HDP services:

    /etc/init.d/hdp-stack start
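When the smoke tests are re-run, the result listing can be scanned for failures instead of being read by eye. The following is a minimal sketch, not part of the Hortonworks tooling: check_smoke_results is a hypothetical helper name, and it assumes the tests print one "<Component> Smoke Test : <Result>" line per component to standard output, in the same layout as the Smoke Test Result listing in this guide.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the sandbox).
# Reads smoke test output on stdin and prints every component whose
# result is not "Pass".
# Exit status: 0 = all passed, 1 = at least one failure,
#              2 = no result lines were found at all.
check_smoke_results() {
    awk -F' *: *' '
        /Smoke Test *:/ {
            name = $1
            result = $2
            sub(/^[ \t]+/, "", name)       # trim leading whitespace
            sub(/[ \t\r]+$/, "", result)   # trim trailing whitespace
            seen++
            if (result != "Pass") {
                print "FAILED: " name
                bad++
            }
        }
        END {
            if (seen == 0) {
                print "no smoke test results found"
                exit 2
            }
            exit (bad > 0)
        }
    '
}
```

Assuming the syscheck command writes its results to standard output in that layout, it could be used on the AWS instance as: /etc/init.d/hdp-stack syscheck | check_smoke_results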
