The success of the P2P idea has created a huge diversity of approaches, among which overlay networks, for example, Gnutella, Kazaa, Chord, Pastry, Tapestry, P-Grid, or DKS, have received specific attention from both developers and researchers. A wide variety of algorithms, data structures, and architectures have been proposed. The terminologies and abstractions used, however have become quite inconsistent since the P2P paradigm has attracted people from many different communities, e.g., networking, databases, distributed systems, graph theory, complexity theory, biology, etc. In this paper we propose a reference model for overlay networks which is capable of modeling different approaches in this domain in a generic manner It is intended to allow researchers and users to assess the properties of concrete systems, to establish a common vocabulary for scientific discussion, to facilitate the qualitative comparison of the systems, and to serve as the basis for defining a standardized API to make overlay networks interoperable.

We present a framework for building and evaluating P2P systems in simulation, local execution, and distributed deployment. Such uniform system evaluations increase confidence in the obtained results. We briefly introduce the Kompics component model and its P2P framework. We describe the component architecture of a Kompics P2P system and show how to define experiment scenarios for large dynamic systems. The same experiments are conducted in reproducible simulation, in real-time execution on a single machine, and distributed over a local cluster or a wide area network. This demonstration shows the component oriented design and the evaluation of two P2P systems implemented in Kompics: Chord and Cyclon. We simulate the systems and then we execute them in real time. During real-time execution we monitor the dynamic behavior of the systems and interact with them through their web-based interfaces. We demonstrate how component-oriented design enables seamless switching between alternative protocols.

Message-passing concurrency (MPC) is increasingly being used to build systems software that scales well on multi-core hardware. Functional programming implementations of MPC, such as Erlang, have also leveraged their stateless nature to build middleware that is not just scalable, but also dynamically reconfigurable. However, many middleware platforms lend themselves more naturally to a stateful programming model, supporting session and application state. A limitation of existing programming models and frameworks that support dynamic reconfiguration for stateful middleware, such as component frameworks, is that they are not designed for MPC.

In this paper, we present Kompics, a component model and programming framework, that supports the construction and composition of dynamically reconfigurable middleware using stateful, concurrent, message-passing components. An added benefit of our approach is that by decoupling our component execution model, we can run the same code in both simulation and production environments. We present the architectural patterns and abstractions that Kompics facilitates and we evaluate them using a case study of a non-trivial key-value store that we built using Kompics. We show how our model enables the systematic development and testing of scalable, dynamically reconfigurable middleware.

At the core of any distributed system is a set of concurrent distributed algorithms that coordinate the functionality of the distributed system. We present a software architecture, Kompics that is component-based and compositional which facilitates building distributed protocols. The underlying computation model subsumes that of event-based systems, SEDA (staged event-driven architecture.) and thread-based models. We illustrate various salient features of Kompics such as ease of use, compositionality and configurability through a series of well chosen distributed protocols.

Distributed key-value stores employed in data centers treat each key-value pair as a shared memory register. For fault-tolerance and performance, each key-value pair is replicated. Various models exist for the consistency of data amongst the replicas. While atomic consistency, also known as linearizability, provides the strongest form of consistency for read and write operations, various key-value stores, such as Cassandra, and Dynamo, offer only eventual consistency instead. One main motivation for such a decision is performance degradation when guaranteeing atomic consistency. In this paper, we use time with known bounded uncertainty to improve the performance of write operations, while maintaining atomic consistency. We show how to use the concept of commit wait in a shared memory register to perform a write operation in one phase (message round trip), instead of two. We evaluate the solution experimentally by comparing it to ABD, a well-known algorithm for achieving atomic consistency in an asynchronous network, which uses two phases for write operations. We also compare our protocol to an eventually consistent register. Our experiments show an improved throughput, and lower write latency, compared to the ABD algorithm.

Stream processors are emerging in industry as an apparatus that drives analytical but also mission critical services handling the core of persistent application logic. Thus, apart from scalability and low-latency, a rising system need is first-class support for application state together with strong consistency guarantees, and adaptivity to cluster reconfigurations, software patches and partial failures. Although prior systems research has addressed some of these specific problems, the practical challenge lies on how such guarantees can be materialized in a transparent, non-intrusive manner that relieves the user from unnecessary constraints. Such needs served as the main design principles of state management in Apache Flink, an open source, scalable stream processor. We present Flink's core pipelined, in-flight mechanism which guarantees the creation of lightweight, consistent, distributed snapshots of application state, progressively, without impacting continuous execution. Consistent snapshots cover all needs for system reconfiguration, fault tolerance and version management through coarse grained rollback recovery. Application state is declared explicitly to the system, allowing efficient partitioning and transparent commits to persistent storage. We further present Flink's backend implementations and mechanisms for high availability, external state queries and output commit. Finally, we demonstrate how these mechanisms behave in practice with metrics and largedeployment insights exhibiting the low performance trade-offs of our approach and the general benefits of exploiting asynchrony in continuous, yet sustainable system deployments.

8.

Carbone, Paris

et al.

KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.

Distributed stateful stream processing enables the deployment and execution of large scale continuous computations in the cloud, targeting both low latency and high throughput. One of the most fundamental challenges of this paradigm is providing processing guarantees under potential failures. Existing approaches rely on periodic global state snapshots that can be used for failure recovery. Those approaches suffer from two main drawbacks. First, they often stall the overall computation which impacts ingestion. Second, they eagerly persist all records in transit along with the operation states which results in larger snapshots than required. In this work we propose Asynchronous Barrier Snapshotting (ABS), a lightweight algorithm suited for modern dataflow execution engines that minimises space requirements. ABS persists only operator states on acyclic execution topologies while keeping a minimal record log on cyclic dataflows. We implemented ABS on Apache Flink, a distributed analytics engine that supports stateful stream processing. Our evaluation shows that our algorithm does not have a heavy impact on the execution, maintaining linear scalability and performing well with frequent snapshots.

In our data-centric society, online services, decision making, and other aspects are increasingly becoming heavily dependent on trends and patterns extracted from data. A broad class of societal-scale data management problems requires system support for processing unbounded data with low latency and high throughput. Large-scale data stream processing systems perceive data as infinite streams and are designed to satisfy such requirements. They have further evolved substantially both in terms of expressive programming model support and also efficient and durable runtime execution on commodity clusters. Expressive programming models offer convenient ways to declare continuous data properties and applied computations, while hiding details on how these data streams are physically processed and orchestrated in a distributed environment. Execution engines provide a runtime for such models further allowing for scalable yet durable execution of any declared computation. In this chapter we introduce the major design aspects of large scale data stream processing systems, covering programming model abstraction levels and runtime concerns. We then present a detailed case study on stateful stream processing with Apache Flink, an open-source stream processor that is used for a wide variety of processing tasks. Finally, we address the main challenges of disruptive applications that large-scale data streaming enables from a systemic point of view.

Aggregation queries on data streams are evaluated over evolving and often overlapping logical views called windows. While the aggregation of periodic windows were extensively studied in the past through the use of aggregate sharing techniques such as Panes and Pairs, little to no work has been put in optimizing the aggregation of very common, non-periodic windows. Typical examples of non-periodic windows are punctuations and sessions which can implement complex business logic and are often expressed as user-defined operators on platforms such as Google Dataflow or Apache Storm. The aggregation of such non-periodic or user-defined windows either falls back to expensive, best-effort aggregate sharing methods, or is not optimized at all.

In this paper we present a technique to perform efficient aggregate sharing for data stream windows, which are declared as user-defined functions (UDFs) and can contain arbitrary business logic. To this end, we first introduce the concept of User-Defined Windows (UDWs), a simple, UDF-based programming abstraction that allows users to programmatically define custom windows. We then define semantics for UDWs, based on which we design Cutty, a low-cost aggregate sharing technique. Cutty improves and outperforms the state of the art for aggregate sharing on single and multiple queries. Moreover, it enables aggregate sharing for a broad class of non-periodic UDWs. We implemented our techniques on Apache Flink, an open source stream processing system, and performed experiments demonstrating orders of magnitude of reduction in aggregation costs compared to the state of the art.

The DIGHT project is addressing the problem of building a scalable and highly available information store for the Electronic Health Records (EHRs) of the over one billion citizens of India.

There has been much recent interest in information services that offer to manage an individual's healthcare records in electronic form, with systems such as Microsoft HealthVault and Google Health receiving widespread media attention. These systems are, however, proprietary and fears have been expressed over how the information stored in them will be used. In relation to these developments, countries with nationalized healthcare systems are also investigating the construction of healthcare information systems that store Electronic Health Records (EHRs) for their citizens.

Internet Connectivity Establishment (ICE) is becoming increasingly important for P2P systems on the open Internet, as it enables NAT-bound peers to provide accessible services. A problem for P2P systems that provide ICE services is how peers discover good quality ICE servers for NAT traversal, that is, the TURN and STUN servers that provide relaying and hole-punching services, respectively. Skype provides a P2P-based solution to this problem, where super-peers provide ICE services. However experimental analysis of Skype indicates that peers perform a random walk of super-peers to find one with an acceptable round-trip latency. In this paper, we discuss a self organizing approach to discovering good quality ICE servers in a P2P system based the walk Topology. The walk Topology uses information about each peer’s ability to provide ICE services (open IP address, available bandwidth and expected session times) to construct a topology where the “better” peers for providing ICE services cluster in the center of the topology; this adaptation of the super-peer search space reduces the problem of finding a good quality ICE server from a random walk to a gradient ascent search.

We present the SC-ABD algorithm that implements sequentially consistent distributed shared memory (DSM). The algorithm tolerates that less than half of the processes are faulty (crash-stop). Compared to the multi-writer ABD algorithm, SC-ABD requires one instead of two round-trips of communication to perform a write operation, and an equal number of round-trips (two) to perform a read operation. Although sequential consistency is not a compositional consistency condition, the provided correctness proof is compositional.

This paper gives a brief summary of our experience in applying a physics-style approach for analyzing the behavior of structured overlay networks that deploy self-organization and self-repair policies. Such systems are not always simple to model analytically and simulation of scales of interest can be prohibitive. Physicists deal with scale by characterizing a system using intensive variables, i.e. variables that are size independent. The approach proved its substantial usefulness when applied to satisfiability theory and it is the hope that it can be as useful in the field of large-scale distributed systems. We report here our finding of one simple self-organization-related intensive variable, and a more complex self-repair-related intensive variable.

In the majority of structured peer-to-peer overlay networks a graph with a desirable topology is constructed. In most cases, the graph is maintained by a periodic activity performed by each node in the graph to preserve the desirable structure in face of the continuous change of the set of nodes. The interaction of the autonomous periodic activities of the nodes renders the performance analysis of such systems complex and simulation of scales of interest can be prohibitive. Physicists, however, are accustomed to dealing with scale by characterizing a system using intensive variables, i.e. variables that are size independent. The approach has proved its usefulness when applied to satisfiability theory. This work is the first attempt to apply it in the area of distributed systems. The contribution of this paper is two-fold. First, we describe a methodology to be used for analyzing the performance of large scale distributed systems. Second, we show how we applied the methodology to find an intensive variable that describe the characteristic behavior of the Chord overlay network, namely, the ratio of the magnitude of perturbation of the network (joins/failures) to the magnitude of periodic stabilization of the network.

In the majority of structured peer-to-peer overlay networks a graph with a desirable topology is constructed. In most cases, the graph is maintained by a periodic activity performed by each node in the graph to preserve the desirable structure in face of the continuous change of the set of nodes. The interaction of the autonomous periodic activities of the nodes renders the performance a﻿nalysis of such systems complex and simulation of scales of interest can be prohibitive. Physicists, however, are accustomed to dealing with scale by characterizing a system using intensive variables, i.e. variables that are size independent. The approach has proved its usefulness when applied to satisfiability theory. This work is the first attempt to apply it in the area of distributed systems. The contribution of this paper is two-fold. First, we describe a methodology to be used for analyzing the performance of large scale distributed systems. Second, we show how we applied the methodology to find two intensive variables that describe the characteristic behavior of the Chord overlay network, the variables are: 1) The density of nodes in the identifier space and 2) The ratio of the magnitude of perturbation of the network (joins/failures) to the magnitude of periodic stabilization of the network.

19.

El-Ansary, Sameh

et al.

Swedish Institute of Computer Science (SICS).

Haridi, Seif

KTH, School of Information and Communication Technology (ICT), Microelectronics and Information Technology, IMIT.

In this paper, we present a complete analytical study of dynamic membership (aka churn) in structured peer-to-peer networks. We use a master-equation-based approach, which is used traditionally in non-equilibrium statistical mechanics to describe steady-state or transient phenomena. We demonstrate that this methodology is infact also well suited to describing structured overlay networks by an application to the Chord system. For any rate of churn and stabilization rates, and any system size, we accurately account for the functional form of: the distribution of inter-node distances, the probability of network disconnection, the fraction of failed or incorrect successor and finger pointers and show how we can use these quantities to predict both the performance and consistency of lookups under churn. Additionally, we also discuss how churn may actually be of different 'types' and the implications this will have for structured overlays in general. All theoretical predictions match simulation results to a high extent. The analysis includes details that are applicable to a generic structured overlay deploying a ring as well as Chord-specific details that can act as guidelines for analyzing other systems.

In this paper, we present a complete analytical study of dynamic membership (aka churn) in structured peer-to-peer networks. We use a master-equation-based approach, which is used traditionally in non-equilibrium statistical mechanics to describe steady-state or transient phenomena. We demonstrate that this methodology is infact also well suited to describing structured overlay networks by an application to the Chord system. For any rate of churn and stabilization rates, and any system size, we accurately account for the functional form of: the distribution of inter-node distances, the probability of network disconnection, the fraction of failed or incorrect successor and finger pointers and show how we can use these quantities to predict both the performance and consistency of lookups under churn. Additionally, we also discuss how churn may actually be of different ’types’ and the implications this will have for structured overlays in general. All theoretical predictions match simulation results to a high extent. The analysis includes details that are applicable to a generic structured overlay deploying a ring as well as Chord-specific details that can act as guidelines for analyzing other systems.

Locating entities in peer-to-peer environments is a fundamentaloperation. Recent studies show that the concept of distributed hash table can be used to design scalable lookup schemes with good performance (i.e. small routing table and lookup length). In this paper, we propose a simple framework for deriving decentralized lookup algorithms. The proposed framework is simple in that it is based on the well-known concept of k-ary search. To demonstrate the applicability of our framework, we show how it can be used to instantiate Chord. When deriving a generalized Chord from our framework, we obtain better performance in terms of the routing table size (38% smaller than the generalization suggested by the Chord authors).

In this position paper, we present an efficient algorithm for performing a broadcast operation with minimal cost in structured DHT-based P2P networks. In a system of N nodes, a broadcast message originating at an arbitrary node reaches all other nodes after exactly N - 1 messages. We emphasize the perception of a class of DHT systems as a form of distributed k-ary search and we take advantage of that perception in constructing a spanning tree that is utilized for efficient broadcasting. We consider broadcasting as a basic service that adds to existing DHTs the ability to search using arbitrary queries as well as dissiminate/collect global information.

Structured peer-to-peer systems rely on replication as a basic means to provide fault-tolerance in presence of high churn. Most select replicas using either multiple hash functions, successor-lists, or leaf-sets. We show that all three alternatives have limitations. We present and provide full algorithmic specification for a generic replication scheme called symmetric replication which only needs O(1) message for every join and leave operation to maintain any replication degree. The scheme is applicable to all existing structured peer-to-peer systems, and can be implemented on-top of any DHT. The scheme has been implemented in our DKS system, and is used to do load-balancing, end-to-end fault-tolerance, and to increase the security by using distributed voting. We outline an extension to the scheme, implemented in DKS, which adds routing proximity to reduce latencies. The scheme is particularly suitable for use with erasure codes, as it can be used to fetch a random subset of the replicas for decoding.

25.

Ghodsi, Ali

et al.

KTH, School of Information and Communication Technology (ICT), Microelectronics and Information Technology, IMIT.

Haridi, Seif

KTH, School of Information and Communication Technology (ICT), Microelectronics and Information Technology, IMIT.

Alima, Luc Onana

KTH, School of Information and Communication Technology (ICT), Microelectronics and Information Technology, IMIT.

Structured peer-to-peer systems have emerged as infrastructures for resource sharing in large-scale, distributed, and dynamic environments. One challenge in these systems is to efficiently maintain routing information in the presence of nodes joining, leaving, and failing. Many systems use costly periodic stabilization protocols to ensure that the routing information is up-to-date.In this paper, we present a novel technique called correction-on-change, which identifies and notifies all nodes that have outdated routing information as a result of a node joining, leaving, or failing. Effective failure handling is simplified as the detection of a failure triggers a correction-on-change which updates all the nodes that have a pointer to the failed node. The resulting system has increased robustness as nodes with stale routing information are immediately updated.We proof the correctness of the algorithms and evaluate its performance by means of simulation. Experimental results show that for the same amount of maintenance bandwidth correction-on-change makes the system by far more robust when compared to periodic stabilization. Moreover, compared to adaptive stabilization which adjusts its frequency to the dynamism in the system, correction-on-change gives the same performance but with considerably less maintenance bandwidth. As correction-on-change immediately updates incorrect routing entries the average lookup length is maintained close to the theoretical average in the presence of high dynamism. We show how the technique can be applied to our DKS system as well as the Chord system.

We present two broadcast algorithms that can be used on top of distributed hash tables (DHTs) to perform group communication and arbitrary queries. Unlike other P2P group communication mechanisms, which either embed extra information in the DHTs or use random overlay networks, our algorithms take advantage of the structured DHT overlay networks without maintaining additional information. The proposed algorithms do not send any redundant messages. Furthermore the two algorithms ensure 100% coverage of the nodes in the system even when routing information is outdated as a result of dynamism in the network. The first algorithm performs some correction of outdated routing table entries with a low cost of correction traffic. The second algorithm exploits the nature of the broadcasts to extensively update erroneous routing information at the cost of higher correction traffic. The algorithms are validated and evaluated in our stochastic distributed-algorithms simulator.

When evaluating Peer-to-Peer content distribution systems by means of simulation, it is of vital importance to correctly mimic the bandwidth dynamics behaviour of the underlying network. In this paper, we propose a scalable and accurate flow-level network simulation model based on an evolution of the classical progressive filling algorithm which follows the max-min fairness idea. We build on top of the current state of the art by applying an optimization to reduce the cost of each bandwidth allocation/deallocation operation on a node-based directed network model. Unlike other works, our evaluation of the chosen approach focuses both on efficiency and on accuracy. Our experiments show that, in terms of scalability, our bandwidth allocation algorithm outperforms existing directed models when simulating large-scale structured overlay networks. In terms of accuracy we show that allocation dynamics of our proposed solution follow those of the NS-2 packet-level simulator by a small and nearly constant offset for the same scenarios. To the best of our knowledge, this is the first time that an accuracy study has been conducted on an improvement of the classical progressive filling algorithm.

Currently, the development of overlay network systems typically produces two software artifacts: a simulator to model key protocols and a production system for a WAN environment. However, this methodology requires the maintenance of two implementations, as well as adding both development overhead and the potential for errors, through divergence in the different code bases. This paper describes how our message-passing component model, called Kompics, is used to build overlay network systems using a P2P component framework, where the same implementation can be simulated or deployed in a production environment. Kompics enables two different modes of simulation: deterministic simulation for reproducible debugging, and emulation mode for stress-testing systems. We used our P2P component framework to build and evaluate overlay systems, and we show how our model lowers the programming barrier for simulating and deploying overlay network systems.

Peer-to-peer systems have recently received tremendous amount of popularity in both research and commercial endeavors. This paper argues for the systematic exploration of a hybrid of centralized and peer-to-peer system design. We give an example application of peer-to-peer architecture to an inherently centralized service and show how this application raises an interesting research question in the field of epidemic information dissemination. We propose a previously unexplored push mechanism for the distribution of updates for system software that exists in millions of copies.

Structured peer-to-peer overlay networks have recently emerged as good candidate infrastructure for building novel large-scale and robust Internet applications in which participating peers share computing resources as equals. In the past three year, various structured peer-to-peer overlay networks have been proposed, and probably more are to come. We present a framework for understanding, analyzing and designing structured peer-to-peer overlay networks. The main objective of the paper is to provide practical guidelines for the design of structured overlay networks by identifying a fundamental element in the construction of overlay networks: the embedding of k–ary trees. Then, a number of effective techniques for maintaining these overlay networks are discussed. The proposed framework has been effective in the development of the DKS system, whose preliminary design appears in [2].

In this position paper we argue for exploiting the synergy between gossip-based algorithms and structured overlay networks (SON). These two strands of research have both aimed at building fault-tolerant, dynamic, self-managing, and large-scale distributed systems. Despite the common goals, the two areas have, however, been relatively isolated. We focus on three problem domains where there is an untapped potential of using gossiping combined with SONs. We argue for applying gossip-based membership for ring-based SONs---such as Chord and Bamboo---to make them handle partition mergers and loopy networks. We argue that small world SONs---such as Accordion and Mercury---are specifically well-suited for gossip-based membership management. The benefits would be better graph-theoretic properties. Finally, we argue that gossip-based algorithms could use the overlay constructed by SONs. For example, many unreliable broadcast algorithms for SONs could be augmented with anti-entropy protocols. Similarly, gossip-based aggregation could be used in SONs for network size estimation and load-balancing purposes.

Key/value stores which are built on structured overlay networks often lack support for atomic transactions and strong data consistency among replicas. This is unfortunate, because consistency guarantees and transactions would allow a wide range of additional application domains to benefit from the inherent scalability and fault-tolerance of DHTs. The Scalaris key/value store supports strong data consistency and atomic transactions. It uses an enhanced Paxos Commit protocol with only four communication steps rather than six. This improvement was possible by exploiting information from the replica distribution in the DHT. Scalaris enables implementation of more reliable and scalable infrastructure for collaborative Web services that require strong consistency and atomic changes across multiple items.

We define a practical algorithm for distributed rational tree unification and prove its correctness in both the off-line and on-line cases. We derive the distributed algorithm from a centralized one, showing clearly the trade-offs between local and distributed execution. The algorithm is used to realize logic variables in the Mozart Programming System, which implements the Oz language (see http://www.mozart-oz.org). Oz appears to the programmer as a concurrent object-oriented language with dataflow synchronization Logic variables implement the dataflow behavior. We show that logic variables can easily be added to the more restricted models of Java and ML, thus providing an alternative way to do concurrent programming in these languages. We present common distributed programming idioms in a network-transparent way using logic variables. We show that in common cases the algorithm maintains the same message latency as explicit message passing. In addition, it is able to handle uncommon cases that arise from the properties of latency tolerance and third-party independence. This is evidence that using logic variables in distributed computing is beneficial at both the system and language levels. At the system level, they improve latency tolerance and third-party independence. At the language level, they help make network-transparent distribution practical.

Much progress has been made in distributed computing in the areas of distribution structure, open computing, fault tolerance, and security. Yet, writing distributed applications remains difficult because the programmer has to manage models of these areas explicitly. A major challenge is to integrate the four models into a coherent development platform. Such a platform should make it possible to cleanly separate an application's functionality from the other four concerns. Concurrent constraint programming, an evolution of concurrent logic programming, has both the expressiveness and the formal foundation needed to attempt this integration. As a first step, we have designed and built a platform that separates an application's functionality from its distribution structure. We have prototyped several collaborative tools with this platform, including a shared graphic editor whose design is presented in detail. The platform efficiently implements Distributed Oz, which extends the Oz language with constructs to express the distribution structure and with basic primitives for open computing, failure detection and handling, and resource control. Oz appears to the programmer as a concurrent object-oriented language with dataflow synchronization. Oz is based on a higher-order, state-aware, concurrent constraint computation model.

37.

Havelka, Dragan

et al.

KTH, School of Information and Communication Technology (ICT), Microelectronics and Information Technology, IMIT.

Schulte, Christian

KTH, School of Information and Communication Technology (ICT), Microelectronics and Information Technology, IMIT.

Brand, P

Haridi, Seif

KTH, School of Information and Communication Technology (ICT), Microelectronics and Information Technology, IMIT.

Strong mobility enables migration of entire computations combining code, data, and execution state (such as stack and program counter) between sites of computation. This is in contrast to weak mobility where migration is confined to just code and data. Strong mobility is essential for many applications where reconstruction of execution states is either difficult or even impossible: load balancing, reduction of network latency and traffic, and resource-related migration, just to name a few. This paper presents a model, programming abstractions, implementation, and evaluation of thread-based strong mobility. The model extends and takes advantage of a distributed programming model based on automatic synchronization through dataflow variables. It comes as a natural extension of dataflow computing which carefully separates issues concerning distribution and mobility. The programming abstractions capture various migration scenarios which differ in how the source and destination site relate to the site initiating migration. The implementation is based on replicating concurrent lightweight threads between sites controlled by migration managers.

Distributed Hash Tables (DHT) with order-preserving hash functions require load balancing to ensure an even item-load over all nodes. While previous item-balancing algorithms only improve the load imbalance, we argue that due to the cost of moving items, the competing goal of minimizing the used network traffic must be addressed as well. We aim to improve on existing algorithms by augmenting them with approximations of global knowledge, which can be distributed in a DHT with low cost using gossip mechanisms. In this paper we present initial simulation-based results from a decentralized balancing scheme extended with knowledge about the average node load. In addition, we discuss future work including a centralized auction-based algorithm that will be used as a benchmark.

Many SMEs and non-profit organizations suffer when their Web servers become unavailable due to flash crowd effects when their web site becomes popular. One of the solutions to the flash-crowd problem is to place the web site on a scalable CDN (Content Delivery Network) that replicates the content and distributes the load in order to improve its response time. In this paper, we present our approach to building a scalable Web Hosting environment as a CDN on top of a structured peer-to-peer system of collaborative web-servers integrated to share the load and to improve the overall system performance, scalability, availability and robustness. Unlike cluster-based solutions, it can run on heterogeneous hardware, over geographically dispersed areas. To validate and evaluate our approach, we have developed a system prototype called DOH (DKS Organized Hosting) that is a CDN implemented on top of the DKS (Distributed K-nary Search) structured P2P system with DHT (Distributed Hash table) functionality [9]. The prototype is implemented in Java, using the DKS middleware, the Jetty web-server, and a modified JavaFTP server. The proposed design of CDN has been evaluated by simulation and by evaluation experiments on the prototype.

Peer-to-peer (P2P) mechanisms allow users with limited resources to distribute content to a large audience, without the need of intermediaries.

These P2P mechanisms, however, appear to be ill-suited for mobile devices, given their limited resources: battery, bandwidth, and connectivity. Even Spotify, a commercial straming service where desktop clients stream about 80% of the data via P2P, does not use P2P on mobile devices.

This paper describes Tribler Mobile, a mobile app that allows users to broadcast their own videos to potentially large audiences

directly from their devices. Our system delegates most of the distribution tasks to boosters running on desktop computers. Our mechanisms are designed to be fully-decentralized and consider mobile devices’ limitations.

Tribler Mobile is available as open-source software and have been installed by almost 500 users on their Android devices.

Spotify is a large-scale peer-assisted music streaming service. Spotify’s P2P network serves 80% of music data to desktop clients. On the other hand, the rapidly growing number of mobile clients do not use P2P but instead stream all data from Spotify’s servers.

We enable P2P on a Spotify mobile client and empirically eval- uate the impact of P2P protocols (in particular low-bandwidth traffic between peers) on energy consumption, both on 3G and

Wifi. On 3G, current P2P protocols are highly energy inefficient, but simple modifications bring consumption close to the client-server configuration. On Wifi, the extra energy cost of enabling P2P is much lower.

Finally, we propose a protocol modification to further integrate mobile devices in Spotify’s P2P network according to their capa- bilities (power source, access network). This allows us to break the artificial division between desktop and mobile platforms and dynamically adapt as resources become (un)available to the device.

Iterative computations are in the core of large-scale graph processing. In these applications, a set of parameters is continuously refined, until a fixed point is reached. Such fixed point iterations often exhibit non-uniform computational behavior, where changes propagate with different speeds throughout the parameter set, making them active or inactive during iterations. This asymmetrical behavior can lead to a many redundant computations, if not exploited. Many specialized graph processing systems and APIs exist that run iterative algorithms efficiently exploiting this asymmetry. However, their functionality is sometimes vaguely defined and due to their different programming models and terminology used, it is often challenging to derive equivalence between them. We describe an optimization framework for iterative graph processing, which utilizes dataset dependencies. We explain several optimization techniques that exploit asymmetrical behavior of graph algorithms. We formally specify the conditions under which, an algorithm can use a certain technique. We also design template execution plans, using a canonical set of dataflow operators and we evaluate them using real-world datasets and applications. Our experiments show that optimized plans can significantly reduce execution time, often by an order of magnitude. Based on our experiments, we identify a trade-off that can be easily captured and could serve as the basis for automatic optimization of large-scale graph-processing applications.

Efficient processing of large-scale graphs in distributed environments has been an increasingly popular topic of research in recent years. Inter-connected data that can be modeled as graphs appear in application domains such as machine learning, recommendation, web search, and social network analysis. Writing distributed graph applications is inherently hard and requires programming models that can cover a diverse set of problems, including iterative refinement algorithms, graph transformations, graph aggregations, pattern matching, ego-network analysis, and graph traversals. Several high-level programming abstractions have been proposed and adopted by distributed graph processing systems and big data platforms. Even though significant work has been done to experimentally compare distributed graph processing frameworks, no qualitative study and comparison of graph programming abstractions has been conducted yet. In this survey, we review and analyze the most prevalent high-level programming models for distributed graph processing, in terms of their semantics and applicability. We review 34 distributed graph processing systems with respect to the graph processing models they implement and we survey applications that appear in recent distributed graph systems papers. Finally, we discuss trends and open research questions in the area of distributed graph processing.

What makes a Web site popular?2004In: Communications of the ACM, ISSN 0001-0782, E-ISSN 1557-7317, Vol. 47, no 2, p. 51-55Article in journal (Refereed)

Abstract [en]

Several factors which affect the popularity of the web sites are discussed. A computational model involving two superimposed interaction networks with random connections is developed which links the sites as well as organizes social interactions among internet users. It is suggested that understanding of information flows and connection networks surroundings is necessary to ensure a constant increase in user interest, loyalty and market share. It is also recommended that E-marketers should investigate and leverage the long-term rampifications of information network structures for predicting the behavior of internet users towards their organizations' web sites.

This paper describes a novel language independent model for distribution of language entities, which allows for fine-grained instrumentation of entity consistency protocols on a per-entity basis. The model is implemented as a middleware component, designed to enhance arbitrary high-level programming languages with distribution support on the language entity level. The middleware library is extendable using internal interfaces to add new protocols over three different aspects of distribution.

This paper presents the design and implementation of a decentralized home-migration protocol for the Distribution SubSystem(DSS) middleware. The DSS provides generic distribution support for shared data structures in open distributed systems. Previous approaches for migrating homes, such as forward pointers, broadcasts, and centralized directory services are known to have disadvantages. We propose using a structured P2P system to store the location of migrated homes. This enables seamless migration of homes, without the need for lengthy forward pointer chains that degrade performance and robustness. Nor is a dedicated data-base that requires administrative effort needed. The presented design depicts how the self-organizing aspects of peer-to-peer computing can be used to construct a fault-tolerant, scalable, efficient, distributed home-location service to enhance middleware functionality.

The paper presents a modular design of a distribution middleware that supports the wide variety of entities that exist in high level languages. Such entities are classified into mutables, immutables and transients. The design is factorized in order to allow multiple consistency protocols for the same entity type, and multiple coordination strategies for implementing the protocols that differ in their failure behavior. The design is implemented and evaluated. It shows a very competitive performance.

This paper describes a novel language independentmodel for distribution of language entities, which allowsfor fine-grained instrumentation of entity consistencyprotocols on a per-entity basis. The model is implementedas a middleware component, designed to enhancearbitrary high-level programming languages withdistribution support on the language entity level. Themiddleware library is extendable using internal interfacesto add new protocols over three different aspectsof distribution.

One of the problems of middleware for shared state is that they are designed, explicitly or implicitly, for symmetric networks. However, since the Internet is not symmetric, end-to-end process connectivity cannot be guaranteed. Our solution to this is to provide the middleware with a network abstraction layer that masks the asymmetry of the network and provides the illusion of a symmetric network. We describe the communication service of our middleware, the Distribution Subsystem (DSS), which carefully separates connections to remote processes from the protocols that communicate over them. This separation is used to plug-in a peer-to-peer module to provide symmetric and persistent connectivity. The P2P module can provide both up-to-date addresses for mobile processes as well as route discovery to overcome asymmetric links.

We introduce a scheme for distributed garbage collectionthat is an extension of Weighted Reference Counting. This scheme represents weights as fractions. It solves the problem of limited weight, preservesthe property of third-party independence, and does not induce extra messages for reference merging.