Abstract

As the amount of data in todays Internet is growing larger, users are exposed to too much information, which becomes increasingly more difficult to comprehend. Publish/subscribe systems leverage this problem by providing loosely-coupled communications between producers and consumers of data in a network. Data consumers, i.e., subscribers, are provided with a subscription mechanism, to express their interests in a subset of data, in order to be notified only when some data that matches their subscription is generated by the producers, i.e., publishers. Most publish/subscribe systems today, are based on the client/server architectural model. However, to provide the publish/subscribe service in large scale, companies either have to invest huge amount of money for over-provisioning the resources, or are prone to frequent service failures. Peer-to-peer overlay networks are attractive alternative solutions for building Internet-scale publish/subscribe systems. However, scalability comes with a cost: a published message often needs to traverse a large number of uninterested (unsubscribed) nodes before reaching all its subscribers. We refer to this undesirable traffic, as relay overhead. Without careful considerations, the relay overhead might sharply increase resource consumption for the relay nodes (in terms of bandwidth transmission cost, CPU, etc) and could ultimately lead to rapid deterioration of the system’s performance once the relay nodes start dropping the messages or choose to permanently abandon the system. To mitigate this problem, some solutions use unbounded number of connections per node, while some other limit the expressiveness of the subscription scheme.
In this thesis work, we introduce two systems called Vitis and Vinifera, for topic-based and content-based publish/subscribe models, respectively. Both these systems are gossip-based and significantly decrease the relay overhead. We utilize novel techniques to cluster together nodes that exhibit similar subscriptions. In the topic-based model, distinct clusters for each topic are constructed, while clusters in the content-based model are fuzzy and do not have explicit boundaries. We augment these clustered overlays by links that facilitate routing in the network. We construct a hybrid system by injecting structure into an otherwise unstructured network. The resulting structures resemble navigable small-world networks, which spans along clusters of nodes that have similar subscriptions. The properties of such overlays make them an ideal platform for efficient data dissemination in large-scale systems. The systems requires only a bounded node degree and as we show, through simulations, they scale well with the number of nodes and subscriptions and remain efficient under highly complex subscription patterns, high publication rates, and even in the presence of failures in the network. We also compare both systems against some state-of-the-art publish/subscribe systems. Our measurements show that both Vitis and Vinifera significantly outperform their counterparts on various subscription and churn scenarios, under both synthetic workloads and real-world traces.