3 About the Hadoop File System (HDFS)
- WORM (write once, read many) access model
- Uses commodity hardware with the expectation that failures will occur
- Reads data in large, contiguous data blocks and processes very large files
- Is hardware agnostic
- Assumes that moving computation is cheaper than moving data
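
To make the "large, contiguous data blocks" point concrete, the following Python sketch counts how many blocks a file would occupy. The 128 MiB block size is an assumption chosen for illustration (the block size is configurable in HDFS), and `block_count` is a hypothetical helper, not part of any Hadoop API.

```python
# Assumed block size for illustration only; HDFS lets this be configured.
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MiB in bytes


def block_count(file_size_bytes, block_size=BLOCK_SIZE):
    """Return the number of blocks needed to hold a file of the given size."""
    if file_size_bytes < 0:
        raise ValueError("file size must be non-negative")
    if block_size <= 0:
        raise ValueError("block size must be positive")
    # Integer ceiling division: a partially filled last block still counts.
    return -(-file_size_bytes // block_size)


one_tib = 1024 ** 4
print(block_count(one_tib))  # 8192
```

Under this assumption a 1 TiB file maps to 8192 blocks, so a reader streams long sequential runs rather than many small pieces.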

4 HDFS Performance is Limited
- HDFS premise: moving computation is cheaper than moving data
- The data ALWAYS has to be moved
  - Either from local disk
  - Or from the network
- Data movement also includes
  - Replication operations for availability
  - Results data movement
- And with a good network: the network wins
- Hadoop performance is gated by file system performance
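
The claim that the network can win is a throughput comparison, which the short Python sketch below works through. Both throughput figures and the data size are hypothetical values picked only to illustrate the arithmetic; they are not measurements from the slides, and `transfer_seconds` is an illustrative helper.

```python
def transfer_seconds(size_bytes, throughput_bytes_per_s):
    """Time to move size_bytes at a sustained throughput (ignores latency)."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return size_bytes / throughput_bytes_per_s


# All three values below are assumptions for illustration.
DATA_SIZE = 10**9              # 1 GB of input data
LOCAL_DISK_BPS = 100 * 10**6   # assumed single local disk: 100 MB/s
NETWORK_BPS = 1000 * 10**6     # assumed network link to shared storage: 1 GB/s

local = transfer_seconds(DATA_SIZE, LOCAL_DISK_BPS)
remote = transfer_seconds(DATA_SIZE, NETWORK_BPS)
print(f"local disk: {local:.1f} s, network: {remote:.1f} s")
```

With these assumed numbers the read from local disk takes 10.0 s and the read over the network takes 1.0 s. The conclusion flips whenever the local disk is faster than the link, so the comparison depends entirely on the hardware in use.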

5 Hadoop File System (HDFS) Challenges
- Performance
  - Lack of caching for random loads
  - Slow file modifications due to WORM and synchronous replication
  - HTTP used for data transfer cannot use DMA
- Scalability
  - Large block sizes limit the number of files
  - Resources cannot be fully used when the data is not local to the CPU
  - HDFS RAID can eliminate the need for replication but impacts CPU
- Storage
  - Not POSIX compliant; no general-purpose access
  - Data transfer into and out of the Hadoop environment is required
  - Data replication adds storage costs
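
The replication storage cost can be estimated with simple arithmetic, sketched below in Python. The replication factor of 3 is an assumption here (it is the commonly used HDFS default, but it is configurable), the 100 TB data set is hypothetical, and `raw_capacity_needed` is an illustrative helper rather than a Hadoop function.

```python
def raw_capacity_needed(usable_bytes, replication_factor=3):
    """Raw storage consumed when every byte is stored replication_factor times."""
    if usable_bytes < 0:
        raise ValueError("usable size must be non-negative")
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return usable_bytes * replication_factor


TB = 10**12
usable = 100 * TB                    # hypothetical data set size
raw = raw_capacity_needed(usable)    # assumed replication factor of 3
overhead = raw - usable
print(raw // TB, overhead // TB)     # 300 200
```

Under these assumptions, 100 TB of data consumes 300 TB of raw disk, of which 200 TB is pure replication overhead.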
