SecureMR: A Service Integrity Assurance Framework for MapReduce

Wei Wei, Juan Du, Ting Yu, Xiaohui Gu
Department of Computer Science, North Carolina State University
Raleigh, North Carolina, United States
{wwei5, jdu}@ncsu.edu, {gu, yu}@csc.ncsu.edu

Abstract

MapReduce has become increasingly popular as a powerful parallel data processing model. To deploy MapReduce as a data processing service over open systems such as service-oriented architecture, cloud computing, and volunteer computing, we must provide the necessary security mechanisms to protect the integrity of MapReduce data processing services. In this paper, we present SecureMR, a practical service integrity assurance framework for MapReduce. SecureMR consists of five security components, which provide a set of practical security mechanisms that not only ensure MapReduce service integrity and prevent replay and Denial of Service (DoS) attacks, but also preserve the simplicity, applicability, and scalability of MapReduce. We have implemented a prototype of SecureMR based on Hadoop, an open source MapReduce implementation. Our analytical study and experimental results show that SecureMR can ensure data processing service integrity while imposing low performance overhead.

I. INTRODUCTION

MapReduce is a parallel data processing model, proposed by Google to simplify parallel data processing on large clusters [1]. Recently, many organizations have adopted the MapReduce model and developed their own implementations, such as Google MapReduce [1] and Yahoo's Hadoop [2], as well as thousands of MapReduce applications. Moreover, MapReduce has been adopted by many academic researchers for data processing in different research areas, such as high-end computing [3], data-intensive scientific analysis [4], large-scale semantic annotation [5], and machine learning [6].
Current data processing systems using MapReduce mainly run on clusters belonging to a single administration domain. As open systems, such as Service-Oriented Architecture (SOA) [7], [8], Cloud Computing [9], and Volunteer Computing [10], [11], increasingly emerge as promising platforms for cross-domain resource and service integration, MapReduce deployed over open systems will become an attractive solution for large-scale, cost-effective data processing services. As a forerunner in this area, Amazon deploys MapReduce as a web service using Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (Amazon S3). It provides a public data processing service for researchers, data analysts, and developers to efficiently and cost-effectively process vast amounts of data [12]. However, in open systems, in addition to communication security threats such as eavesdropping attacks, replay attacks, and Denial of Service (DoS) attacks, MapReduce faces a data processing service integrity issue, since service providers in open systems may come from different administration domains that are not always trustworthy.