6 The Fundamentals of Hadoop
Hadoop evolved directly from commodity scientific supercomputing clusters developed in the 1990s.
Hadoop consists of a parallel execution framework called Map/Reduce and the Hadoop Distributed File System (HDFS).

7 Latest Developments

8 HDFS
Very high fault tolerance
Files cannot be updated in place, but corrections can be appended
File blocks are replicated multiple times
Three types of nodes:
  Name Node (directory)
  Backup Node (checkpoint)
  Data Node (actual data)
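The points on this slide (append-only files, replicated blocks, a directory node separate from the data nodes) can be illustrated with a small in-memory toy. This is only a sketch under stated assumptions, not HDFS code: the class names, the 4-byte block size, the round-robin placement, and the choice to start a new block on every append are all hypothetical simplifications. The replication factor of 3 matches the usual HDFS default.

```python
class DataNode:
    """Toy data node: stores the actual block contents."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes


class NameNode:
    """Toy name node: keeps the directory of files, blocks, and replica locations."""

    def __init__(self, data_nodes, block_size=4, replication=3):
        if replication > len(data_nodes):
            raise ValueError("need at least as many data nodes as replicas")
        self.data_nodes = data_nodes
        self.block_size = block_size    # toy value, assumed for illustration
        self.replication = replication  # copies kept of every block
        self.directory = {}             # filename -> [(block_id, [node names])]

    def write(self, filename, data):
        """Create a new file; existing files cannot be overwritten."""
        if filename in self.directory:
            raise FileExistsError(filename)
        self.directory[filename] = []
        self._store(filename, data)

    def append(self, filename, data):
        """Add data at the end; blocks already written are never modified."""
        self._store(filename, data)

    def _store(self, filename, data):
        entries = self.directory[filename]
        count = len(self.data_nodes)
        for offset in range(0, len(data), self.block_size):
            block_id = f"{filename}#{len(entries)}"
            chunk = data[offset:offset + self.block_size]
            # Round-robin placement on distinct nodes (assumed policy).
            start = len(entries) % count
            targets = [self.data_nodes[(start + i) % count]
                       for i in range(self.replication)]
            for node in targets:
                node.blocks[block_id] = chunk
            entries.append((block_id, [node.name for node in targets]))

    def read(self, filename, failed=()):
        """Reassemble a file from any surviving replica of each block."""
        result = b""
        for block_id, holders in self.directory[filename]:
            live = [node for node in self.data_nodes
                    if node.name in holders and node.name not in failed]
            if not live:
                raise IOError(f"all replicas of {block_id} are lost")
            result += live[0].blocks[block_id]
        return result
```

With four data nodes and three replicas per block, any two nodes can be marked as failed and `read` still returns the full file, which is the fault-tolerance property the slide refers to.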

9 MapReduce
A programming framework made up of a library and a runtime, much like .NET
Map Function - Takes a task and breaks it down into small tasks
Reduce Function - Combines the partial answers into the combined list
Master (Job Tracker)
  Is where you submit a query
  Manages the Task Trackers, which do the actual Map or Reduce tasks
Workers (Task Trackers)
  Do the work; just as each node in the cluster has a Data Node, it also has a Task Tracker
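The Map and Reduce functions described on this slide can be sketched with a single-process word count. This is an illustration of the programming model only, not the Hadoop API: the function names (`map_words`, `shuffle`, `reduce_counts`, `run_job`) are hypothetical, and in a real cluster the map and reduce calls would run as separate tasks on different Task Trackers.

```python
from collections import defaultdict


def map_words(line):
    """Map: break one input line into small partial answers, (word, 1) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all values that share a key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, counts):
    """Reduce: combine the partial answers for one key."""
    return word, sum(counts)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence in one process."""
    mapped = [pair for line in lines for pair in map_words(line)]
    grouped = shuffle(mapped)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())
```

Calling `run_job(["Hadoop stores data", "hadoop processes data"])` returns a dictionary of per-word totals; each `map_words` call depends only on its own line, which is what lets the framework spread map work across workers.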

14 HDFS and MapReduce
The main node runs the Job Tracker and the Name Node; the Name Node controls the files.
Each node runs two processes: Task Tracker and Data Node
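Because every node runs both a Task Tracker and a Data Node, the Job Tracker can send a map task to a node that already holds the block it needs. The sketch below illustrates that idea with a hypothetical `assign_map_tasks` helper; the least-loaded tie-breaking and the fallback to any node are assumptions made for the example, and actual Hadoop scheduling involves more than this.

```python
def assign_map_tasks(block_locations, nodes):
    """Assign one map task per block, preferring nodes that hold a replica.

    block_locations: dict of block_id -> list of node names holding a replica
    nodes: list of node names with a running Task Tracker
    Returns a dict of block_id -> chosen node name.
    """
    load = {node: 0 for node in nodes}
    assignment = {}
    for block_id, holders in sorted(block_locations.items()):
        # Nodes that can read the block from their own Data Node.
        local = [node for node in holders if node in load]
        # Fall back to any node if no replica holder is available.
        candidates = local or list(nodes)
        # Pick the least-loaded candidate; break ties by name for determinism.
        chosen = min(candidates, key=lambda node: (load[node], node))
        assignment[block_id] = chosen
        load[chosen] += 1
    return assignment
```

For example, with blocks `b0` on `n1` and `n2`, `b1` on `n1` and `n3`, and `b2` on `n2` and `n3`, every task ends up on a node that stores its block, so no block has to cross the network before the map step.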

15 Hive and Pig
MapReduce
  Java; requires writing many lines of code
Pig
  Mostly used by Yahoo!; heavily used for data processing
  Shares some constructs with SQL (e.g. filtering, selecting, grouping, and ordering), but the syntax is very different from SQL
  Is more verbose
  Needs a lot of training for users with limited procedural programming background
  Gives you more control over the flow of data
Hive
  Mostly used by Facebook for analytic purposes
  Relatively easier for developers with SQL experience
  Less control over optimization of data flows compared to Pig
  Not as efficient as MapReduce
  Higher productivity for data scientists and developers

19 Hortonworks
June 2011: funded with $23 million from Yahoo! and Benchmark Capital as an independent company
Named after Horton the Elephant (Horton Hears a Who!)
Employs contributors to the Apache Hadoop project
October 2011: partnered with Microsoft (Azure and Windows Server)
Cloudera
Founded in October 2008
Started the effort to be Microsoft Azure Certified in October 2014

35 Summary
Understand your data growth to determine when to scale out.
Determine the right tool for the workload you have.
Choose the right deployment of Big Data solutions.
Hybridize; do not start from scratch!
