Abstract

Cloud computing has attracted considerable interest from both academia and industry, since it provides efficient resource management, low cost, and fast deployment. However, concerns about security and privacy have become the main obstacle to the large-scale application of cloud computing. Encryption is one way to relieve these concerns, but it makes efficient data utilization a challenging problem. To address this problem, secure and privacy-preserving keyword search over large-scale cloud data has been proposed and widely studied. In this paper, we present a thorough survey of secure and privacy-preserving keyword search over large-scale cloud data. We investigate existing research works category by category, where the categories are defined by search functionality. In each category, we first elaborate on the key ideas of existing research works and then summarize some open and interesting problems.

Introduction

Cloud computing has attracted considerable interest from both academia and industry, since it provides efficient resource management, low cost, and fast deployment (Armbrust et al., 2010). For both personal and enterprise users, cloud computing creates many opportunities for innovation, collaboration, and convenience. Because of its large potential economic benefits, many companies have deployed their own cloud centers. Examples include Amazon's Elastic Compute Cloud (EC2), Google's App Engine, Microsoft's Azure, and IBM's Blue Cloud.

Although cloud computing has many benefits, both individual and enterprise users are reluctant to outsource their sensitive data (e.g., personal health records, financial records) to the cloud server, because once these data are outsourced, the data owners lose direct control over them. The Cloud Service Provider (CSP) may promise to preserve the security of these data by using techniques such as firewalls, virtualization, and Intrusion Detection Systems (IDS). However, since the CSP takes full control of the data, these techniques cannot prevent employees of the CSP from revealing sensitive data. Encryption is one way to solve this problem, but it makes traditional plaintext-based search schemes impractical. A naive solution is to download all the encrypted files and decrypt them locally to find the desired files. This is clearly unrealistic, since it would impose unacceptable communication and computation costs on end users, whose communication and computation capabilities are often constrained. Therefore, a secure and privacy-preserving search scheme over encrypted cloud data is highly desirable.

To address this problem, secure and privacy-preserving keyword search over large-scale cloud data has been proposed and widely studied. A secure search system typically includes three entities: the data owner, the cloud server, and the data users. The data owner outsources encrypted files and indexes to the cloud server. Authorized data users generate secret trapdoors and submit them to the cloud server. The cloud server then returns the search results without learning the sensitive data. Existing secure search schemes can be categorized according to several criteria.

Second, based on the adopted encryption method, these works can be categorized into symmetric encryption based schemes and asymmetric encryption based schemes. The symmetric encryption based schemes often achieve high efficiency, while the asymmetric encryption based schemes achieve strong security.

Third, based on the threat model, existing research works consider two different models. The first assumes the cloud server to be “curious but honest”, i.e., the cloud server will follow the proposed schemes but will try to learn the sensitive data of both the data owner and the data users. The second assumes the cloud server to be “dishonest” (Zhang et al., 2015), i.e., the cloud server may return false retrieval results; the corresponding research works therefore seek to verify the retrieval results.

Fourth, based on the number of data owners involved in the search system, existing research works are divided into the single-owner model and the multi-owner model. Of the two, the multi-owner model is better suited to real-world deployment.

Fifth, based on the number of cloud servers, there are two kinds of research works: secure keyword search on a centralized cloud server, and secure keyword search among distributed cloud servers.
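To make the three-entity workflow and the two threat models described above more concrete, the following is a minimal sketch in Python. It is not taken from any surveyed scheme. It assumes a symmetric setting in which the data owner and authorized data users share two secret keys, uses HMAC-SHA256 as the trapdoor function, and attaches a MAC tag to each result list so that a data user can detect a tampered or truncated answer. All names (`trapdoor`, `result_tag`, `build_index`, `server_search`, `verify`) and the sample data are hypothetical, and file encryption itself is omitted: the file identifiers stand in for encrypted files.

```python
import hashlib
import hmac
import os
from typing import Dict, List, Tuple

# token -> (matching file ids, MAC tag over the token and the ids)
Index = Dict[bytes, Tuple[List[str], bytes]]


def trapdoor(search_key: bytes, keyword: str) -> bytes:
    """Data user / data owner: derive a keyed token that hides the keyword."""
    return hmac.new(search_key, keyword.lower().encode("utf-8"), hashlib.sha256).digest()


def result_tag(mac_key: bytes, token: bytes, file_ids: List[str]) -> bytes:
    """MAC over the token and the sorted, length-prefixed file ids."""
    message = token
    for fid in sorted(file_ids):
        encoded = fid.encode("utf-8")
        message += len(encoded).to_bytes(4, "big") + encoded
    return hmac.new(mac_key, message, hashlib.sha256).digest()


def build_index(search_key: bytes, mac_key: bytes,
                documents: Dict[str, List[str]]) -> Index:
    """Data owner: build the index that is outsourced to the cloud server."""
    postings: Dict[bytes, List[str]] = {}
    for file_id, keywords in documents.items():
        for kw in {k.lower() for k in keywords}:
            postings.setdefault(trapdoor(search_key, kw), []).append(file_id)
    return {tok: (sorted(ids), result_tag(mac_key, tok, ids))
            for tok, ids in postings.items()}


def server_search(index: Index, token: bytes) -> Tuple[List[str], bytes]:
    """Cloud server: look up the token without knowing keys or keywords."""
    return index.get(token, ([], b""))


def verify(mac_key: bytes, token: bytes, file_ids: List[str], tag: bytes) -> bool:
    """Data user: check that the returned list matches the owner's tag."""
    return hmac.compare_digest(result_tag(mac_key, token, file_ids), tag)


# Hypothetical demo data: file ids stand in for encrypted files.
search_key = os.urandom(32)
mac_key = os.urandom(32)
documents = {
    "file-1": ["diabetes", "insulin"],
    "file-2": ["loan", "mortgage"],
    "file-3": ["diabetes", "diet"],
}
index = build_index(search_key, mac_key, documents)

token = trapdoor(search_key, "diabetes")
file_ids, tag = server_search(index, token)
print(file_ids, verify(mac_key, token, file_ids, tag))  # honest answer
print(verify(mac_key, token, file_ids[:1], tag))        # truncated answer
```

In this sketch the honest answer verifies and the truncated one does not. The sketch is deliberately weak in other respects: the deterministic tokens let the server see which queries repeat and which files match them, and an empty answer cannot be verified, so a dishonest server could still claim that nothing matches.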