Transcript of "A Survey on Large-Scale Decentralized Storage Systems to be used by Volunteer Computing Systems "

A Survey on Large-Scale Decentralized Storage Systems to be used by Volunteer Computing Systems

Umit Cavus Buyuksahin, Maria Stylianou, Nicos Demetriou, Muhammad Adnan Khan

Umit Cavus Buyuksahin, Universitat Politecnica de Catalunya (UPC). E-mail: ucbuyuksahin@gmail.com. Maria Stylianou, Universitat Politecnica de Catalunya (UPC). E-mail: marsty5@gmail.com. Nicos Demetriou, Universitat Politecnica de Catalunya (UPC). E-mail: nicosdem7@gmail.com. Muhammad Adnan Khan, Universitat Politecnica de Catalunya (UPC). E-mail: malikadnan78@gmail.com.

Abstract—Over the last decades, distributed systems have been promoted for extended computations and presented as the ideal storage space for large amounts of data. Distributed storage systems have moved from the centralized architecture to a more decentralized approach. This change allows such systems to be used by volunteer computing systems, where the exploitation of any available storage and resources is essential and greatly needed. This survey explores the characteristics of scalable decentralized storage systems that can be used by volunteer computing systems and discusses the various existing systems in terms of the specified characteristics. For each surveyed system we give a brief description and state whether the required properties are ensured.

Index Terms—decentralized storage systems, volunteer computing systems

I. INTRODUCTION

Storage is one of the fundamental parts of computing [1]. Although it is slower than RAM, it offers great persistence at low cost. Thus, central storage systems were constructed and focused on reliability, stability, and efficiency. However, nowadays computation is not limited to a central storage space; it is executed in a global environment like the Internet. As the Internet becomes part of this computation, it produces huge amounts of information that need to be gathered and stored. To address this challenge, distributed storage systems were introduced. In this design, the data stored by hosts become geographically distributed. Because of this distribution and the appearance of huge demands, new challenges arise, such as fault-tolerance, availability, security, robustness, survivability, scalability, and anonymity.

With the growth of the Internet, distributed storage systems are able to scale to larger numbers of users. This growth has raised the difficulty of having one central point for administrating the system. Therefore, it is observed in other surveys that these systems are moving from the centralized architecture to a more decentralized approach [1].

Meanwhile, supercomputers are situated among us executing big computations which require huge storage, power and computational resources, and lead to a rapid decrease of their capacity. Due to this demand, researchers turn to unused storage resources. Globally, there are many personal computers whose resources are not fully used by their owners. Volunteer computing systems aim to use this storage for enormous-sized computations by considering these machines as if they were parts of a huge supercomputer. This is a powerful way to utilize distributed resources in order to complete large-scale tasks.

Volunteer computing systems have two main bases [7]. The first one is the computational base, in which large computation tasks are split into smaller tasks which are assigned to volunteer participants' computers. The second base is called the participative base, and it deploys a large number of volunteer participants who offer their resources.

One of the well-known volunteer computing systems is SETI@home, launched by the BOINC project [8]. Nowadays, SETI@home works with about one million computers, which provide an approximately 70 TeraFLOPS processing rate [8]. Of course, this resource usage can be increased when we look at the potential resources in the world; however, this is unnecessary since the network is growing rapidly.

These volunteer computing systems produce huge amounts of computational data that should be stored. This data may be used for later processing or shared with other scientific organizations, contributing to science. However, today's volunteer computing systems use centralized storage systems [9] to distribute data to participants, and therefore suffer from the limitations of centralized storage, such as fault-tolerance, availability and scalability.

In order to overcome these limitations, new storage systems have been developed which are decentralized and can be used by volunteer computing systems efficiently. As previously mentioned, there are many kinds of decentralized storage systems. However, not all of them are suitable to be used in volunteer computing systems. In this survey we study several storage systems, discuss their characteristics and challenges, and propose the most suitable one to be used in volunteer computing systems.

The rest of the paper is organized as follows. In Section II, we present related work done by other researchers in the field. In Section III, design issues of decentralized storage systems that can be used in volunteer computing systems are examined by extracting characteristics. In Section IV we briefly overview some of the existing decentralized storage systems. Later on, in Section V we compare them regarding their characteristics and benefits and propose the most suitable one to be used in volunteer computing systems. Finally, in Section VI we conclude the survey with our final remarks about the systems studied.

II. RELATED WORKS

In this section we present the different surveys related to the subject on which we are focused. [3] discusses the different properties of peer-to-peer based distributed file systems. It shows the various benefits of using P2P systems, the design issues and properties. In addition, it presents the major distributed file systems, comparing the advantages and disadvantages of each one in detail. As well, [4] provides an insight into existing storage systems, giving a good overview of each and describing the important characteristics they should have. In [1], a variety of distributed storage systems is covered in depth, presenting their functionalities and putting the reader into the problems that these systems face and the solutions proposed to overcome them. A short but rich paper, [2], discusses the evolving area of distributed storage systems and gives a brief summary of some related systems in order to provide a broader view of the subject.

III. PRINCIPAL CHARACTERISTICS OF DECENTRALIZED STORAGE SYSTEMS

Several decentralized storage systems have been proposed over the last years. However, not all of them are suitable for volunteer computing. Specific characteristics should be examined, and we should ensure their existence in the intended storage systems, in order to meet the requirements of volunteer computing systems. Below, we analyze the most important ones, their specifications and effects.

1) Symmetry: Symmetry is a desired characteristic as much for decentralized storage systems as for volunteer computing systems. In the case of storage systems, and more precisely in pure peer-to-peer systems, symmetry exists when all peers are on the same level with equivalent functionality [3]. Similarly, in the case of volunteer computing systems, no volunteer participant has priority or special treatment compared to others. Also, volunteers do not need permission from an administrator to execute a task or to save data; this is done, by definition, independently and automatically.

2) Availability: In volunteer computing systems, it is expected that participants cannot be forced to enter or leave the system at specific moments. Data should be reachable independently of the peers' status, of their location and of the time of the request. Therefore, availability is an essential property for decentralized storage systems in order to be used in volunteer computing systems.

3) Scalability: Another important issue that has to be considered in both storage and volunteer computing systems is the system's scalability. Apparently, in decentralized systems, it is mandatory that they can scale enough regarding the number of nodes. Scalability is an essential property for these systems, in order to ensure that their functionality is preserved as the system's size increases.

4) Anonymity: In volunteer computing systems, it is highly desirable for volunteers to keep their identity secret while offering their resources. People are less willing to help when they are required to share personal information. Therefore, anonymity in volunteering can increase the number of participants, which is highly appreciated and encouraged. What is more, anonymity can be a way to prevent the denial of access to particular groups of people, which is possible when personal information is shared.

5) Robustness: Both types of systems, storage and volunteer computing, are prone to failures, as machines may crash, reboot, or change location with different network characteristics and capabilities. In order to efficiently associate decentralized storage systems with volunteer ones, the former should be robust enough to handle these changes and repair themselves in the case of failures, in order to preserve this advantage in volunteer computing systems as well.

IV. DECENTRALIZED STORAGE SYSTEMS

In the following section, we present a short summary of the storage systems studied, referring to the previously explained properties.

A. FreeHaven

FreeHaven [10] first came with a solution for anonymity, whose implementation is not commonly handled by distributed storage systems. This means that it enables peers to distribute and share data anonymously by protecting peers' identities. The other goals of FreeHaven are: (a) persistence, for determining the lifetime of documents; (b) flexibility, for changing system functions; (c) accountability, for limiting damage to the system.

Since there is no hierarchy and all nodes are on the same level, it is a pure peer-to-peer system: it is symmetric and balanced. Despite the fact that nodes do not have special capabilities, unlike client-server systems, they have special roles, such as the author who initially creates documents, the publisher who puts the documents into the FreeHaven system, the reader who takes documents from the system, and the servers who provide storage. All these nodes have a pseudonym, and nodes know each other by their pseudonyms. Thus, locating the peers is a difficult issue. In addition, tracing the routes is a difficult issue as well, since FreeHaven uses onion routing for broadcasting the queries. The difficulty of both locating peers and tracing routes protects the user identity; that is, it supplies anonymous communication. Server nodes periodically trade parts of documents, called shares, with each other. That trading gives flexibility to the system, in the sense that servers can join and leave easily and without special treatment. For trading, nodes are chosen from a node list that is ordered by reputation. While a successful trade increases a node's reputation, malicious behavior decreases it [1]. In order to avoid malicious behavior and limit damage to the system, each node notifies its buddies about share movements. This buddy mechanism supplies accountability. Moreover, FreeHaven is also robust, since it can keep a document even if a high threshold of its shares is lost.

Because of its pursuit of anonymity, persistence, flexibility and accountability, efficiency and convenience are ignored. In order to supply availability it uses a trading mechanism instead of a replication mechanism; thus, the system is not highly available [2]. Finally, inefficient broadcasts for communication make FreeHaven less efficient.
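The reputation-ordered trading and buddy notifications described above can be sketched as follows. This is a toy model under our own assumptions (class and method names are hypothetical), not FreeHaven's actual protocol:

```python
class Node:
    """Toy model of FreeHaven-style share trading (names are hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.reputation = {}   # peer name -> reputation score
        self.movements = []    # buddy notifications about share movements

    def pick_partner(self, peers):
        # Trading partners are chosen from a node list ordered by reputation.
        return max(peers, key=lambda p: self.reputation.get(p.name, 0))

    def trade(self, peers, buddy, share, success):
        partner = self.pick_partner(peers)
        # A successful trade increases the partner's reputation;
        # malicious behavior decreases it.
        delta = 1 if success else -1
        self.reputation[partner.name] = self.reputation.get(partner.name, 0) + delta
        if success:
            # The buddy mechanism supplies accountability: the share's buddy
            # is notified about the movement.
            buddy.movements.append((share, partner.name))
        return partner
```

In this sketch the buddy record is what lets a node later audit where its shares went, which is the accountability property the paper describes.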

B. FreeNet

FreeNet [11] is an adaptive pure peer-to-peer storage system for publication, replication, and anonymity of authors/readers while retrieving data. Like FreeHaven, the first goal of FreeNet is anonymity and privacy. However, the anonymity of FreeNet does not cover the whole network; it applies just to file transactions, because FreeNet provides anonymity at the application layer instead of the transport layer. Thus, discovering source and destination is infeasible. The other goals of FreeNet are deniability, resistance, efficiency and decentralization.

The nodes in the peer-to-peer FreeNet network query a file that is represented by a location-independent key, obtained from hash functions for anonymity. Each node maintains a local store that is accessible for others to read and write, and a dynamic routing table that includes other peers' addresses with their keys. Whenever a node receives a request, it first checks its local store. If the data exists, it returns it; otherwise it forwards the request to the node that has the nearest key in the routing table. Furthermore, if the request is successful, the intended data returns along the same path as the request. While data is retrieved, each node on the way also caches this data and inserts the new key into its own routing table. This mechanism provides transparent replication and increases connectivity in the system. In order to cope with limited storage capacity efficiently, node storage is managed as an LRU (Least Recently Used) cache, which means data items are sorted by the time of the most recent request. Therefore, the most recently requested data will be at the end of the queue. This mechanism does not ensure long-term survivability for less-requested files.

The FreeNet protocol is packet-oriented and uses self-contained messages. Each message contains a hops-to-live limit, a depth counter and a randomly generated transaction ID, which makes the corresponding file traceable by nodes. Hops-to-live is set by the sender of the message and prevents indefinite message forwarding. The depth counter is used for setting a sufficient hops-to-live value to ensure that the request will reach its destination; thus, it is incremented at each node. These three values are used for the insert, retrieve and request operations. In order to supply anonymity, FreeNet uses probabilistic routing that does not direct communication towards specific receivers.

Since probabilistic routing is used for providing anonymity, performance and reliability are not addressed. Like FreeHaven, in order to supply anonymous communication, performance is sacrificed. However, because of its dynamic storage and routing, the FreeNet network is highly scalable [3]. Moreover, it is robust against big failures.

C. Ivy

Ivy [12] is another peer-to-peer storage system, with a file-system-like interface. There is no centralized or dedicated component; thus each user is on the same level. Although many other peer-to-peer storage systems support either read or write operations for only one owner, Ivy supports both read and write operations. However, the number of users that can use Ivy is limited. Thus, it is designed to be utilized by small groups of cooperative users.

All peers are identical and have the ability to work either as a client or as a server. Because of its symmetric architecture, it is called pure peer-to-peer. Each node has two main components: Chord/DHash for reliable P2P distributed storage, and the Ivy server for transferring data between peers. This architecture is log based. Each peer has its own log that includes user information and changes in the file system. Thus, for each NFS operation a log record is created that is stored by Chord/DHash. Since the logs are immutable and are kept indefinitely, peers can withdraw any changes. This flexibility is one of the best properties of Ivy. All users can read any logs, subject to some file permission attributes.

When a file system is created, a set of logs is created and a group of peers is set upon these logs. An entry pointing to a file's log is put on a view array. This array is traversed by all peers in order to create a snapshot. The logs are ordered in the array and peers use them for records. Thus, some users may use one of the logs concurrently. This causes conflicts, since Ivy permits concurrent write operations. For this purpose, Ivy uses close-to-open consistency within a group of peers. In this consistency model, the Ivy server waits for DHash, which will receive new log receipts, in order to commit a modify operation. Then that modification is announced. For each NFS operation, peers take the latest view array from DHash. Then peers check concurrent view vectors that affect the same file by traversing the logs. In any conflict condition, differences are analyzed and merged. For file modification an optimistic approach is used, although for file creation a locking approach is used. Thus, when the number of users increases, performance decreases. Because of its limited scalability [1], Ivy is suited for a small group of users.

Every user stores a log of their modifications and, at a specified time interval, generates a snapshot, a process which requires them to retrieve the logs of all participating users. Although retrieving the logs of all peers causes a performance bottleneck, peers can freely change a file system regardless of other peers' state. The immutable and indefinitely stored logs can be used for withdrawing changes, but this operation is highly costly. As a result, Ivy distributes its storage, but it only supports a limited write-once/read-many interface [1].

D. Frangipani

Frangipani [13] is a high-performance distributed storage system that is utilized by a cooperative group of users. It is not a pure peer-to-peer system, since there is an administrator. It aims to minimize the operations of the administrator; that means Frangipani stays simple while many nodes are joining [1]. Moreover, it is designed to be used in an institution that has a secure and private network. Thus, it is not very scalable. However, it provides users with good performance, since it stripes data between servers, increasing performance with the number of active servers. Frangipani can also be configured to replicate data [1]. Therefore, it offers redundancy and resilience to failures. This is a crucial property for volunteer computing systems.

Frangipani has three main components. The first one is the Petal server, which provides a virtual disk interface to distributed storage. It looks like local storage; thus it offers a transparent interface to users, since the distributed storage is hidden. The second component is the distributed locking service. It supports consistency in the manner of a multiple-readers/single-writer locking philosophy. There are two types of locks, read and write. When there are multiple changes to a file, this service serializes them to keep consistency by using these locks. Since Frangipani keeps all files in a consistent state through this locking mechanism, its performance degrades considerably. The third component is the Frangipani file server module, which provides a file-system-like interface. It communicates with the other components to stay in a consistent state with a determined block capacity. Moreover, the Frangipani file server deploys write-ahead redo logging of metadata for recovery. When an error is detected in the file server, the logged data written in a special area of the Petal server is used for recovery. This mechanism, together with replication, makes Frangipani more robust.

As a result, Frangipani is a distributed file system that can be scalable in terms of size and performance. However, network capacity is a barrier to its performance, because of its design. One of the biggest design problems in Frangipani is that it assumes a secure interconnect in order to scale and operate within an institution [1]. Because of this issue, it suffers not only in performance but also in scalability. Besides, it assumes that all nodes in the system are trusted, and thus it cannot supply a secure system. Subsequently, the locking mechanism for keeping the system consistent can cause a dramatic performance drop.

E. Ceph

Ceph [16] is a distributed file system that provides excellent performance, reliability and scalability, and separates data and metadata to the maximum extent. It leverages the intelligence of Object Storage Devices (OSDs) to distribute the complexity surrounding data access, and utilizes a highly adaptive distributed metadata cluster architecture, improving scalability and reliability.

Ceph eliminates file allocation tables and lists and replaces them with generating functions. It comprises clients, clusters of OSDs (which store all data and metadata), and metadata server (MDS) clusters (which manage the namespace: files and directories). File data are striped onto predictably named objects using a special-purpose data distribution function, CRUSH (Controlled Replication Under Scalable Hashing), which assigns objects to storage devices. A novel metadata cluster architecture distributes the responsibility for managing the file system directory hierarchy. Clients run on each host, executing application code and exposing a file system interface to applications. The client code runs entirely in user space and can be accessed either by linking to it directly or as a mounted file system. CRUSH maps data onto a sequence of objects. If one or more clients open a file for read access, an MDS grants them the capability to read and cache file content. The Ceph synchronization model retains its simplicity by providing correct read-write and shared-write semantics between clients via synchronous I/O, and by extending the application interface to relax consistency for performance-conscious distributed applications. File and directory metadata in Ceph is very small, consisting almost only of directory entries (file names) and inodes (80 bytes); in comparison with conventional file systems, no file allocation metadata is necessary. In Ceph, object names are constructed using the inode number and distributed to OSDs using CRUSH. In order for Ceph to distribute large amounts of data, a strategy is adopted that distributes new data randomly, migrates a random subsample of existing data to new devices, and uniformly redistributes data from removed devices. To maintain system availability and ensure data safety in a scalable fashion, RADOS (Reliable Autonomic Distributed Object Store) manages its own replication of data using a variant of primary-copy replication. By acknowledging updates only when data safety is provided, RADOS allows Ceph to realize low-latency updates for efficient application synchronization and well-defined data safety semantics. For certain failures, such as disk errors or corrupted data, OSDs can self-report. Failures that make an OSD unreachable on the network, however, require active monitoring, which RADOS distributes by having each OSD monitor those peers with which it shares placement groups. To facilitate fast recovery, OSDs maintain a version number for each object and a log of recent changes (names and versions of updated or deleted objects) for each placement group. The Ceph OSD manages its local object storage with EBOFS, an Extent and B-tree based Object File System.

By shedding design assumptions like allocation lists, data are totally separated from metadata management, allowing them to scale independently. RADOS leverages intelligent OSDs to manage data replication, failure detection and recovery, low-level disk allocation, scheduling, and data migration without placing a burden on any central server. Finally, Ceph's metadata management architecture provides a single uniform directory hierarchy, which obeys POSIX semantics, with performance that scales as new metadata servers join the system.

F. TFS

TFS [17] provides background tasks with large amounts of unreliable storage, without an impact on the performance of standard file access operations. It allows a peer-to-peer storage system to provide more storage and double its performance, and it has an impact on replication in peer-to-peer storage systems. The problem with contributory storage systems is that application performance degrades: as more storage is activated, file system operations quickly degrade, and this is why TFS aims at transparency, i.e., a non-burdening effect on system performance while contributory processes are running. Another problem is that disks are often half empty and users are not keen to contribute their free space. TFS is a system that contributes all of the idle space while keeping a very low load on the performance of the local user's system. It stores files in the file system's free space and minimizes interference with the file system's block allocation policy. Other normal files can overwrite the contributed files at any time.
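The overwrite-wins behavior described for TFS can be sketched as follows. This is a toy in-memory model under our own assumptions (class and method names are hypothetical), meant only to illustrate how ordinary writes take precedence over contributed blocks and how an overwritten contribution later surfaces as an error:

```python
class TinyTFS:
    """Toy model of TFS-style transparent contribution: contributed blocks
    live in free space, and ordinary (local) writes may overwrite them."""

    def __init__(self):
        self.blocks = {}  # block number -> ("local" | "contrib", data)

    def write_local(self, block, data):
        # Ordinary files always win: a contributed block is simply
        # overwritten, so later contributory reads see an error.
        self.blocks[block] = ("local", data)

    def write_contrib(self, block, data):
        if self.blocks.get(block, ("free", None))[0] == "local":
            return False  # never disturb ordinary data
        self.blocks[block] = ("contrib", data)
        return True

    def read_contrib(self, block):
        kind, data = self.blocks.get(block, ("free", None))
        if kind != "contrib":
            # Mirrors TFS returning an error for an overwritten file,
            # which the peer-to-peer layer treats as a lost replica.
            raise IOError("contributed block was overwritten or absent")
        return data
```

The design choice this illustrates is why TFS needs no reservation bookkeeping: contributed data is sacrificial by construction, and the replication layer above simply re-replicates whatever the local user happens to destroy.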

In addition, there is no impact on the bandwidth needed for replication. TFS is useful for replicated storage systems executing on stable machines with plenty of bandwidth (an environment similar to the one used in FARSITE). In a stable network, TFS can offer substantially more storage than dynamic techniques. A small contribution of storage has little impact on the file system's performance, and so TFS ensures the transparency of contributed data. In exchange for performance, it sacrifices file persistence, as it provides good file system performance by minimizing the amount of work needed by the system when writing ordinary files. It records which blocks have been overwritten by marking them as overwritten. If an attempt is made to open an overwritten file, the system returns an error, the inode/directory entry for that file is deleted, and it is denoted as free. Every time a file is deleted, TFS detects this and replicates the file, returning an error to peers.

TFS leaves the allocation of local files intact, avoiding issues of fragmentation; TFS stores files in such a way that they are completely transparent to local access. TFS consistently provides at least as much storage without overloading local performance. TFS can provide about 40 percent more storage than the best user-space technique when the network is quite stable and enough bandwidth is available. This may raise questions concerning availability, but TFS primarily depends on the distributed system's characteristics, such as machine availability, bandwidth and the amount of storage available.

G. OceanStore

OceanStore [6] is a global storage infrastructure which automatically recovers from failures of servers and the network, easily incorporates new resources into the system, and adjusts to usage patterns. It combines erasure codes with a Byzantine agreement protocol for consistent update serialization, even when malicious servers are present.

OceanStore consists of individual servers, each cooperating to provide a service. Such a group of servers is called a pool. Data flows freely between these pools, thus creating replicas of a data object anywhere and increasing availability. Because OceanStore is composed of untrusted servers, it utilizes redundancy and client-side cryptographic techniques to protect data. OceanStore attacks the problem of storage-level maintenance with four mechanisms: a self-organizing routing infrastructure, m-of-n data coding with repair, Byzantine update commitment, and introspective replica management. Erasure coding transforms a block of input data into fragments, which are spread over many servers; only a fraction of the fragments are needed to reconstruct the original block. A replica of an object must be exactly the same as the original, despite any failures or corruption of fragments. OceanStore resolves this by naming each object and its associated fragments by the result of a secure hash function over the contents of the object, called a globally unique identifier (GUID). A node can act as a server that stores objects, as a client that initiates requests, as a router that forwards messages, or as all of these. A unique identifier, the NodeID (location and semantics independent), is assigned to each node, and Tapestry (a self-organizing routing and object location subsystem) uses local neighbor maps to route messages to their destination NodeID, digit by digit. When an OceanStore server inserts a replica into the system, Tapestry publishes its location by putting a pointer to the replica's location at each hop between the new replica and the object's root node. In order to locate an object, a client routes a request towards the object's root until it encounters a replica pointer, which routes directly to that replica.

When a node wants to join, it chooses a random NodeID and a node close to itself. Through routing from this NodeID, it finds other existing nodes that share suffixes of increasing length, generates a full routing table, and all the neighbors are notified. When a node disappears, its neighbors detect the absence and use backpointers to inform the nodes relying on it. In addition, a server can be removed from OceanStore when it becomes obsolete, needs scheduled maintenance, or has component failures. A shutdown script is executed to inform the system of the server's removal. Even if this script is not used, OceanStore will detect and correct the server's absence. OceanStore's design provides scalability, fault tolerance, self-maintenance and distributed storage through adaptation.

H. Antiquity

Antiquity [14] provides storage services for file systems and backup applications. It is a wide-area distributed storage system whose design assumes that all servers will eventually fail, and it tries to keep data integrity even under these failures. Antiquity was developed in the context of OceanStore.

In its model, the client can be an end-user machine, the server in a client-server system, or a replicated service. The system identifies the client and its append-only log by a cryptographic key pair. A log is stored in chunks, and when a new chunk needs to be allocated the administrator is consulted, who authenticates the client and selects a set of storage servers that can host the new chunk. In order to maintain data security, high availability and, most of all, stored data integrity, it uses a secure log which is replicated on multiple servers. This way, durability is ensured: no data is lost and all logs can be read. In the case that some logs are not modifiable, due to the failure of some servers or a lack of replicas, a quorum repair protocol replaces lost replicas and eventually restores modifiability. In addition, Antiquity uses dynamic Byzantine fault-tolerant (threshold) quorums to provide consistency among replicas. When the data is replicated on multiple servers, it can be retrieved later even under server failures. What is more, Antiquity uses distributed hash tables to connect the storage servers and to monitor the liveness and availability of servers. These tables store only pointers that identify the servers on which the actual data are stored.

Antiquity's design pursues integrity, incremental secure write with random read access, durability, consistency and efficiency with low overhead. The results from a simulation showed that in almost all checks performed, a quorum of servers was reachable and in a consistent state, thus providing a high degree of availability and consistency. The quorum repair process balances availability and consistency even further.
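The integrity property of such a secure log can be sketched as a hash chain, where each record's hash covers the previous record's hash, so tampering with stored history is detectable. This toy version is our own illustration (signatures, chunking and replication are omitted), not Antiquity's actual format:

```python
import hashlib

class SecureLog:
    """Toy append-only, hash-chained log (signatures/replication omitted)."""

    def __init__(self):
        self.entries = []  # list of (payload, chain_hash) pairs

    def _link(self, prev_hash, payload):
        # Each entry's hash covers the previous entry's hash,
        # chaining the whole history together.
        return hashlib.sha256(prev_hash + payload).digest()

    def append(self, payload):
        prev = self.entries[-1][1] if self.entries else b""
        self.entries.append((payload, self._link(prev, payload)))

    def verify(self):
        prev = b""
        for payload, chain in self.entries:
            if chain != self._link(prev, payload):
                return False  # tampering breaks the chain
            prev = chain
        return True
```

A storage server (or a quorum of them) can thus re-verify an entire log it hosts without trusting the writer, which is the sense in which durability here means "no data is lost and all logs can be read" intact.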

Concerning the scalability issue, as each log uses a single administrator and multiple instances are allowed, the role of the administrator scales well and different logs can use different administrators.

I. BigTable

BigTable [18] is a large-scale distributed storage system for managing structured data. It is built on top of several existing Google technologies, such as the Google File System, Chubby, and Sawzall, and is used by many of Google's online services. Its contributors' primary goals are flexibility, high performance and availability.

Essentially, BigTable is a "sparse, distributed, persistent multi-dimensional sorted map" that indexes each row, column and timestamp tuple to an array of bytes [19]. Data in BigTable is maintained in tables that are partitioned into row ranges called tablets. Tablets are the units of data distribution and load balancing in BigTable. BigTable consists of three major components: a library that is linked into every client, one master server, and many tablet servers, each managing some number of tablets. Different versions of data are ordered using timestamps. BigTable supports single-row transactions, which can be used to perform atomic read-modify-write sequences on data stored under a single row key.

Overall, BigTable is tremendously scalable, offering data availability and high performance to its users. However, it does not deal with issues like security among the nodes or fault tolerance.

J. Dynamo

Dynamo is a key-value storage system that maps keys to values. It is developed and managed by Amazon, which makes it a proprietary database [21]. However, its design has informed other systems, such as Cassandra. High availability and scalability are the main design goals of Dynamo. It offers incremental scalability, meaning the system can grow one node at a time. Moreover, there is no central administrator and all nodes are on the same level.

Dynamo is a combination of distributed hash tables (DHTs) and databases [20]. Keys created by hashing data are stored in a circular system structure: each key is assigned to the nearest node in the clockwise direction. Moreover, there are virtual nodes that mimic a physical node but can be responsible for more than one position on the ring. This mechanism provides incremental scalability by solving the partitioning problem. Dynamo has an effective replication mechanism to increase the availability of data in the system: each data item is replicated to a specified number of its successors, so each node holds replicated data of its predecessors. In addition, the system may keep more than one version of a file to increase availability. Since this causes inconsistency, vector clocks are used to determine the causal relationship between different versions. These properties increase Dynamo's durability as well as its availability. Besides, the "always writable" property targeted by Dynamo is the second reason for using vector clocks. When a user wants to perform a write operation, the coordinator, who is responsible for all operations, first sends the vector clock to the reachable nodes selected from a preference-order list. Write operations complete according to the number of responses received; namely, this mechanism is based on quorums. Lastly, if a node does not give any response, it is assumed to have failed. When it is removed from the ring, all surrounding nodes adjust to the new state.

Dynamo targets the main problems of database management, such as scalability, availability, reliability and performance. While it offers a highly-available and scalable system, it keeps performance high while handling failures. However, anonymity is not among Dynamo's goals.

K. MongoDB

MongoDB [22] is a scalable, high-performance, open-source, document-oriented structured storage system. It provides document-oriented storage with full index support, auto-sharding, sophisticated replication, and compatibility with the Map/Reduce paradigm.

Instead of storing data in tables and rows, as is regularly done with relational databases, MongoDB stores data with dynamic schemas. The goal of MongoDB is to bridge the gap between key-value stores and relational databases. MongoDB has two separate constructs for multi-node topologies, which are often combined in the highest-performance systems: replica sets and sharded replica sets. Replica sets are an asynchronous cluster replication technology, and sharding is an automatic data distribution system. Increasing the number of instances in a replica set provides horizontal scalability for read performance and fault tolerance. Increasing the number of shards (each one being a replica set) distributes distinct data, providing horizontal scalability for write performance.

MongoDB has features similar to those of relational databases, like indexes and dynamic queries. It accomplishes availability by supporting asynchronous replication of data between servers, and it also features a backup and repair mechanism using journaling, which increases durability and robustness. Changing the data model from relational to document-oriented provides greater agility through flexible schemas and easier horizontal scalability.

L. Riak

Riak [23] is a key-value storage system inspired by Dynamo. Like Dynamo, it is distributed, highly-available and scalable. It uses a map-reduce mechanism to reduce the functional limitations of key-value stores and to increase the power of querying over data stored in the Riak system. Riak provides a fault-tolerant service to its users, and this property increases its robustness.

Since it is inspired by Amazon's Dynamo storage system, analyzed above, Riak has many similarities with it. It includes both database storage and distributed hash tables (DHTs). Like Dynamo, it uses consistent hashing to map keys onto its ring, and thus all nodes on this ring are identical.
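The ring structure shared by Dynamo and Riak can be sketched as follows. The node names, number of virtual nodes, and replica count are illustrative assumptions, not either system's actual parameters.

```python
import hashlib
from bisect import bisect_right

def h(key: str) -> int:
    # Hash a key (or virtual-node name) to a point on the ring.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes=8, replicas=3):
        self.replicas = replicas
        # Virtual nodes: each physical node owns several points on the
        # ring, smoothing key distribution and easing incremental scaling.
        self.points = sorted((h(f"{n}#{v}"), n)
                             for n in nodes for v in range(vnodes))

    def preference_list(self, key):
        # Walk clockwise from the key's position, collecting the first
        # `replicas` distinct physical nodes (the key's successors).
        i = bisect_right(self.points, (h(key),))
        out = []
        while len(out) < self.replicas:
            node = self.points[i % len(self.points)][1]
            if node not in out:
                out.append(node)
            i += 1
        return out

ring = Ring(["A", "B", "C", "D"])
print(ring.preference_list("user:42"))  # three distinct nodes for this key
```

Replicating each key to its successors on the ring is what gives each node replicated data of its predecessors, as described above.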

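Dynamo's quorum-based write rule described above reduces to counting acknowledgements. A minimal sketch with simulated nodes; the `send` callback and the value of W are assumptions for illustration, not Amazon's implementation.

```python
# Sketch of a quorum write: the coordinator contacts the nodes in a key's
# preference list and declares success once W acknowledgements arrive.

def quorum_write(preference_list, send, w):
    acks = 0
    for node in preference_list:
        if send(node):       # True if the node acknowledged the write
            acks += 1
        if acks >= w:
            return True      # quorum reached; remaining replicas can be
                             # brought up to date asynchronously
    return False             # too few replies: write cannot complete

alive = {"A", "B", "C"}
ok = quorum_write(["A", "B", "D"], lambda n: n in alive, w=2)
assert ok  # A and B acknowledge, so W=2 is met even though D is down
```

A node that never responds is simply not counted, which matches the text above: it is presumed failed and later removed from the ring.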
Whenever a node joins the network, it is assigned key-range partitions, which are then replicated to reach a more available system. Like Dynamo, write and read operations are done based on quorums. Concurrent operation requests are not handled with locks, for performance reasons; instead of a lock mechanism, vector clocks are used to make the system strong against failures and to keep it consistent. Another powerful point of Riak is its use of the map-reduce method in querying: request messages are directed to a set of nodes instead of being propagated over all nodes.

Riak has a symmetric structure in the node manner, since it does not have any super or master node. Moreover, it meets several design issues of the intended decentralized storage systems, such as high availability, scalability and robustness. However, anonymity is not handled in this design and, since it is a relatively new system, it has many compatibility problems.

M. Pastis

Pastis [24] is a completely decentralized P2P file system with multiple users performing read and write operations. It uses Past, a highly-scalable P2P storage service which provides a distributed hash table abstraction. It combines Past with Pastry, a P2P key-based routing algorithm, to route messages between large numbers of Past nodes.

For every file, Pastis keeps an inode in which the file's metadata is stored. Each inode is stored in User Certificate Blocks (UCB) and file contents are stored in Content Hash Blocks (CHB). When a user writes to a file, the version counter is increased and saved to the corresponding inode together with the user's id. To avoid conflicts, if a second user appears and tries to write to the same file, a procedure is triggered to resolve the conflict by comparing the counters and users' ids from other replicas in the network.

The combination of Past and Pastry characterizes Pastis as a highly-scalable system in terms of network size and number of concurrent clients. Good locality helps in acquiring optimized routes, while self-organization as well as fault tolerance are achieved thanks to the design. Data is replicated among the nodes and is therefore characterized by high availability. Write access control and data integrity are implemented, and Pastis is considered secure under the assumption that users trust each other.

N. TotalRecall

TotalRecall [25] is a P2P storage system that gives high consideration to an important property of storage systems: availability. The system administrator can specify an availability target and, by studying the previous behavior of the peers, the system can predict their future availability despite the dynamically changing nature of the environment. Depending on the condition of the system, TotalRecall may use replication, erasure-code or hybrid techniques for preserving its redundancy, while it can dynamically repair itself using eager or lazy repair.

Apart from the peers, the TotalRecall system consists of the master host, the storage host and the client host. This description allows us to say that it follows a loose P2P scheme, since not everybody has the role of a peer. However, it is described as scalable despite the number of hosts that join and leave the system consecutively. Because of the replication technique, data is persistent and available. High consistency is also observed in the system.

TotalRecall could be used for Volunteer Computing in the case where lazy repair is chosen together with erasure codes. With these options, TotalRecall performs better in dynamic environments with a high possibility of unavailability.

O. Farsite

The Farsite system [26] is a serverless, distributed storage system that runs on a set of machines and takes advantage of their unused storage and network resources. Although it provides the semantics of a central NTFS file server, it is able to scale and run on several machines using a portion of their storage. Users have access to private and public files through a location-transparent environment. Data replicas are encrypted to provide security, since the nodes themselves are not secure. Moreover, these replicas are distributed among several nodes to provide a reliable system despite the unreliability and frequent unavailability of the nodes. The file structure is based on a hierarchy, maintained by a distributed directory service.

Atomicity and scalability are two important properties of the Farsite system. All tasks are designed as fully atomic actions in order to remain undivided while they get executed. Farsite could be used for Volunteer Computing, since the management operations can be distributed among the machines and security is provided by the encryption algorithm used. Though, it could be used only for small volunteer computing systems, since it can scale up to a certain number of nodes.

P. Storage@home

Storage@home [27] is a distributed storage infrastructure designed to store huge amounts of data across many machines which join the system as volunteers. It is based on Folding@home and made its appearance to face the problems of that previous system. More precisely, the contributors address the problems of backing up and distributing data efficiently among the nodes, keeping in mind the limited bandwidth and the small donation of storage from each node.

Storage@home consists of the volunteers, who have an agent installed on their machines, a registration server, a metadata server, an identity server and a policy engine. The Metadata Server is responsible for storing information about the location of the files stored in the system and for allowing queries for those files. The Identity Server is responsible for the security and identity functionality, as well as for effectively tracking the location of IP hosts, whether they are mobile or dynamic. The Registration Server is responsible for linking the users' profiles from the old system, Folding@home, with this new proposed system. This task is hard to implement, since a beneficial aspect of Storage@home is anonymity and the intentional omission of users' information. The Policy Engine behaves as the master of the system, as it coordinates all of its components.
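The Pastis conflict-resolution step described earlier, comparing version counters and user ids across replicas, can be sketched as follows. Breaking ties by user id is an assumption for illustration; the paper's exact rule may differ.

```python
# Each write stores an increased version counter plus the writer's id in
# the file's inode; concurrent replicas are reconciled by comparing the
# (counter, user_id) pairs found on other replicas in the network.

def newer(a, b):
    # a and b are (version_counter, user_id) pairs from two replicas.
    return a if (a[0], a[1]) > (b[0], b[1]) else b

assert newer((3, "alice"), (2, "bob")) == (3, "alice")  # higher counter wins
assert newer((3, "alice"), (3, "bob")) == (3, "bob")    # tie broken by id
```

Because the rule is deterministic, every replica that sees the same set of competing writes converges on the same winner without any locking.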

It is responsible for planning where to put replicas of data in order to minimize the chances of data loss, how data can be retrieved, and how it should be transferred to reach the node that sent a query. It also remains vigilant to perform repair operations when needed.

Storage@home has vital requirements that help it preserve its nature as a storage system as well as a volunteer computing system. As a storage system, it should handle failure and recovery operations effectively, and as a volunteer computing system it should manage the relocation of data stored on hosts that have disappeared. While maintaining the above requirements, the authors needed to face several challenges regarding volunteer recruiting and motivation, the policy risk and host relocation. With respect to recruiting volunteers and keeping them motivated, the system needed to adopt a reward scheme that offers points to volunteers in order to motivate them and put them in a friendly competition that makes participation fun. Regarding the policy risk, it was quite common for Storage@home to get blocked by companies, ISPs and new policies; therefore, storing replicas in different nations, states and ISPs appeared to be a fair solution. Last but not least, host relocation was another great challenge that needed to be considered. The system had to deal with hundreds of students who were changing residence, most of the time decreasing their bandwidth, and becoming slower and less effective. Also, the problem of machines being switched off for long periods for traveling or maintenance purposes was costly to the system, and consequently a penalization policy was introduced to make the volunteers more responsible about informing the system of any changes in their condition. In general, this system appears to be reliable, as it manages to prevent the loss of data. It is able to work with thousands of volunteers, showing great scalability, and its functionality is preserved in the presence of churn. Internet connections appear to be the bottleneck in system performance, showing that any other possible pitfalls of the system are not significant, as they cannot bypass the bandwidth problem.

V. DISCUSSION

All systems described offer storage distribution following different approaches and architectures. In this section, we discuss to what extent these systems have the properties that are needed in volunteer computing systems. In Table 1 we gather all systems and characteristics together, showing a clear view of their state.

1) Symmetry: As previously mentioned, in pure peer-to-peer systems all peers are on the same level, with equivalent functionality. Since each volunteer participant does not have priority over other participants, and although they are controlled by the central server of the system, the intended distributed systems should be purely peer-to-peer.

In the world of storage systems, scientists have trouble presenting systems with "independent" nodes that work without the guidance of an administrator. In the Frangipani file system, there is an administrator who arranges the states of nodes, and nodes need to take permission from the administrator in order to perform a task. Thus, this system does not provide a symmetric node network and it is not proper for volunteer computing systems. Also, in MongoDB there are three kinds of nodes: Standard, Passive, and Arbiter. Similarly to MongoDB, BigTable has master nodes and many chunk servers. Antiquity, too, contains the role of an administrator among the peers, who is responsible for the allocation of new chunks of file logs. Thus MongoDB, BigTable and Antiquity are not symmetric in the node manner. Moreover, Farsite is based on a centralized scheme: some nodes have, for a period of time, authority over some files, their content, directory, and user permissions. Similarly, TotalRecall consists of different types of nodes, each type having different responsibilities regarding the files. Therefore, in both systems nodes cannot work freely without the permission of other "master" nodes. Last but not least, in OceanStore nodes can have different roles, such as server, client, router, or all of them; thus it is not symmetric.

The rest of the systems, as can be seen in Table 1, consist of equal nodes and are subsequently characterized as symmetric.

2) Availability: In volunteer computing systems, participants can enter and leave the system at random times. In order to retrieve data, the intended storage systems should be highly available despite the unavailability of the participants.

Most of the systems analyzed are highly available, as shown in Table 1. The FreeHaven system, though, presents a limited level of availability, since there is no replication mechanism, only periodic trading, which keeps data available. Similarly, FreeNet has limited availability because of its lack of replication mechanisms and also because it suffers from poor long-term survivability, especially for non-popular files. The DHash component of Ivy makes it highly available, since DHash replicates and distributes the blocks of files; thus, participants' logs can be available even if the participants themselves are not. Moreover, Frangipani has cluster member components that are large abstract containers at a highly available block level; these cluster members make Frangipani highly available. Ceph accomplishes availability using RADOS, which manages data replication following a primary-copy replication scheme and also provides update synchronization of the data. One of OceanStore's main goals is to provide availability, as data flows freely and replicas of the data are created. Antiquity uses a secure log which is distributed among multiple servers, thus providing a high degree of availability and ensuring that all data can be accessed. If for any reason some data is lost, a repair service is available for recovery.

Furthermore, the Farsite system replicates data in order to ensure availability even with the frequent unavailability of the nodes. Likewise, Pastis implements a lazy replication protocol to manage replicas on different nodes. TotalRecall has as a main goal the provision of availability, and it offers different ways to ensure it, such as redundancy management with specified mechanisms, replication, and dynamic repairs in case nodes leave the system permanently.
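The availability gains from replication discussed in this section can be quantified with a simple model, assuming nodes fail independently with the same probability (an idealization for volunteer environments).

```python
# Back-of-the-envelope availability of k-way replication: the data is
# unreachable only if all k replicas are down at the same time.

def replicated_availability(p: float, k: int) -> float:
    # p is the probability that a single node is up; k is the replica count.
    return 1 - (1 - p) ** k

# Even poorly-available volunteer nodes yield high availability with a
# handful of replicas:
print(replicated_availability(0.5, 1))  # 0.5
print(replicated_availability(0.5, 3))  # 0.875
print(replicated_availability(0.5, 5))  # 0.96875
```

This is why systems like TotalRecall can meet an administrator-specified availability target by tuning the redundancy level to the observed behavior of the peers.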

Systems         Symmetry  Availability  Scalability  Anonymity  Robustness
FreeHaven       Yes       Mid           Low          High       High
FreeNet         Yes       Mid           High         Mid        High
Ivy             Yes       High          Mid          No         High
Frangipani      No        High          High         No         High
Ceph            Yes       High          High         No         High
OceanStore      No        High          High         No         High
Antiquity       No        High          High         No         High
BigTable        No        High          High         No         High
Dynamo          Yes       High          High         No         High
MongoDB         No        High          High         No         High
Riak            Yes       High          High         No         High
Pastis          Yes       High          High         No         High
TotalRecall     No        High          High         No         High
Farsite         No        High          Mid          No         High
Storage@home    Yes       High          High         High       High

Table I: Comparison of different storage systems

3) Scalability: Scalability is an additional required property. There are three main scaling techniques: replication for spreading copies of data, caching for reusing cached data, and distribution of divided computation [5]. Thus, the intended decentralized storage systems should have replication or a similar mechanism.

Of the systems studied, only three do not show high results on the scalability issue. FreeHaven and Ivy do not have scalability as a primary goal and are therefore not highly scalable. Farsite is limited to scale up to about 10^5 nodes, which is quite restrictive.

Unlike these systems, Frangipani is designed to be highly scalable: the Petal servers, which work cooperatively to supply virtual disks to users, are distributed in order to increase scalability. The rest of the storage systems are classified as large-scale storage systems, since they are specifically designed to offer scalability.

4) Anonymity: Participants in volunteer computing systems want to keep their identities secret from others. Thus, the intended distributed systems should provide anonymity. From our research, we found that most of the systems do not support anonymity, as it was not among their main concerns.

Systems like FreeHaven and FreeNet offer anonymity, as they focus on their participants' needs. They propose to hide users' identities, thus increasing resistance against censorship; in fact, for this purpose they sacrifice efficiency. Like them, users in Ceph and Storage@home are anonymous. Moreover, in Ceph the code runs directly in user space, and the RADOS and CRUSH processes are executed without revealing any information about the identity of the client, even when data are distributed.

Anonymity is not a design issue in Frangipani; thus, each user in the Frangipani file system is noticeable and can be detected easily. Like Frangipani, large-scale decentralized storage systems such as Dynamo, Riak, BigTable, and MongoDB do not handle anonymity as a design issue.

5) Robustness: By definition, volunteers can come and go, may crash or change their network status. Therefore, volunteer computing systems, and by extension storage systems, should be robust enough to face these situations.

All systems studied are highly robust, thanks to various mechanisms. In FreeHaven, while peers are trading, copies of data are kept for a while until the peers prove trustworthy. Although this mechanism is not good for performance, it increases the robustness of FreeHaven. Moreover, the buddy system makes it robust, since the buddies of each node can regenerate lost data. Frangipani uses a write-ahead redo logging mechanism to recover from failures easily. In the Freenet protocol, a failure message is forwarded back to the owner of the request without being propagated to other nodes; thus, the original requester can make another request. This property makes Freenet robust against failures.

MongoDB and Riak have replication mechanisms that make these systems large-scale and fault-tolerant; these characteristics provide highly robust systems. Like them, BigTable and Dynamo have great robustness, since they are highly scalable.

Ceph has a very good mechanism for disk failure monitoring and detection, as well as fast recovery, using different structures for the file system and keeping a version number for each object. In addition, OceanStore's main goal is to provide a high level of failure recovery, offering fault tolerance and self-maintenance mechanisms with automatic repair. Antiquity's quorum repair recovers from failures and replaces lost replicas, which makes the system quite robust.

Storage@home provides self-repair operations for each node involved. Pastis takes advantage of the fault-tolerance property of the storage layer it is based on, the Past DHT. In TotalRecall, things are even easier: since it deals primarily with availability, it addresses this issue using repair mechanisms, which also help preserve robustness. The Farsite system was designed in such a way that it handles Byzantine faults and is therefore more robust.

TFS is mainly a file system that works underneath storage systems. Its availability and anonymity are dependent on the nodes' state and on whether the nodes themselves can be available and anonymous. Thus, it is not included in our