Introduction

The Digital Age has brought about unprecedented growth in the amount
of data being generated, the number of data consumers, and the diversity
of their interests and locations. Traditionally, users poll
sources for information, but for many applications, polling is hardly
scalable and may miss important events. The alternative offered by
publish/subscribe systems is to push notifications to
users with matching interests. This approach suits many applications,
ranging from personal, commercial, and medical to environmental, military,
and security. However, traditional publish/subscribe systems are
becoming inadequate for advanced applications, where users want to
receive information that has been filtered, joined, and summarized, and
only when certain conditions are met.
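
To make the push model concrete, here is a minimal sketch of a content-based publish/subscribe broker. It is an illustration only, not the system described on this page: the Broker class, its methods, and the "price" attribute are all hypothetical names chosen for the example. Subscribers register a predicate once, and each published event is pushed to every subscriber whose predicate matches, so no one has to poll.

```python
from typing import Callable, Dict, List, Tuple

Event = Dict[str, float]
Predicate = Callable[[Event], bool]


class Broker:
    """Toy content-based publish/subscribe broker (hypothetical, for illustration)."""

    def __init__(self) -> None:
        # Each entry pairs a subscriber name with its interest predicate.
        self.subscriptions: List[Tuple[str, Predicate]] = []
        # Notifications pushed so far, kept per subscriber for inspection.
        self.inbox: Dict[str, List[Event]] = {}

    def subscribe(self, user: str, predicate: Predicate) -> None:
        """Register a subscriber's interest once, up front."""
        self.subscriptions.append((user, predicate))
        self.inbox.setdefault(user, [])

    def publish(self, event: Event) -> List[str]:
        """Push the event to every subscriber whose predicate matches it."""
        notified = []
        for user, predicate in self.subscriptions:
            if predicate(event):
                self.inbox[user].append(event)
                notified.append(user)
        return notified


broker = Broker()
broker.subscribe("alice", lambda e: e["price"] > 100)
broker.subscribe("bob", lambda e: e["price"] < 50)
print(broker.publish({"price": 120.0}))  # only alice's predicate matches
```

The predicates here are simple, stateless filters over a single event. The advanced subscriptions discussed next (filtered, joined, summarized, conditional) need more than this toy broker can express.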

This project aims to build a next-generation
publish/subscribe system that meets these new challenges. We are developing
an end-to-end solution whose techniques range from subscription
processing and indexing to dissemination network design. Together, they
support efficient and powerful subscription functionality,
allowing users to control precisely what they want and when they want it.

One main feature distinguishing our approach from previous work is joint consideration of subscription processing and notification dissemination.
Traditionally, these problems are considered separately by database and
networking communities. However, there exists a wide spectrum of
interesting alternatives for interfacing processing with dissemination.
We propose a promising novel approach called reformulation
that allows complex, stateful subscriptions to be handled by simple,
stateless dissemination mechanisms, with a clean system design that is
easy to implement and scale. A cost-based optimizer, inspired by
database query optimization, chooses the best processing and
dissemination strategies jointly and dynamically.
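
The text above does not spell out how reformulation works, so the following is only a hypothetical sketch of the general idea, under loudly stated assumptions: the server keeps the state needed by a join-style subscription, turns each incoming event into self-contained messages, and hands those messages to a dissemination step that matches them against simple range predicates without consulting any state. The Message format, S_STATE, RANGE_SUBS, and all function names are invented for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Message:
    """A reformulated, self-contained message (hypothetical format)."""
    key: str
    payload: str
    s_val: float


# Stateful side (assumed): stored values, indexed by join key.
S_STATE: Dict[str, List[float]] = {"ibm": [10.0, 42.0], "sun": [7.0]}

# Stateless side (assumed): each subscriber is just a range over s_val.
RANGE_SUBS: Dict[str, Tuple[float, float]] = {
    "alice": (0.0, 20.0),
    "bob": (40.0, 50.0),
}


def reformulate(key: str, payload: str) -> List[Message]:
    """Join an incoming event with stored state, yielding stateless messages."""
    return [Message(key, payload, v) for v in S_STATE.get(key, [])]


def disseminate(msg: Message) -> List[str]:
    """Match one message against range predicates; no state is consulted."""
    return sorted(u for u, (lo, hi) in RANGE_SUBS.items() if lo <= msg.s_val <= hi)


def handle(key: str, payload: str) -> Dict[str, List[Message]]:
    """Process an event end to end: reformulate, then disseminate."""
    delivered: Dict[str, List[Message]] = {}
    for msg in reformulate(key, payload):
        for user in disseminate(msg):
            delivered.setdefault(user, []).append(msg)
    return delivered


print(handle("ibm", "earnings"))
```

In this sketch, the join is resolved entirely in reformulate, so disseminate could in principle be replaced by any stateless content-based matching mechanism. Whether the project's actual design draws the line in the same place is not stated in the source.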

Besides system building, this project tackles many new
algorithmic challenges, including scalably processing a large
number of complex subscriptions; exploiting event and subscription
characteristics to combat worst-case complexity; balancing semantic
similarity and network proximity in dissemination network design; and
efficiently maintaining statistics for high-dimensional events and
subscriptions.

Progress

In the first year of this project, we made progress on the following specific research problems: (1) ProSem
system development and demonstration; (2) scalable processing and
dissemination of select-join subscriptions; (3) dissemination network
design for wide-area publish/subscribe; (4) scalable processing and
dissemination of value-based notification conditions; (5)
input-sensitive scalable continuous join query processing; (6) querying
uncertain data; (7) maintaining data summaries. A detailed description
of our contributions can be found below in our 2007-2008 project report.

In the second year of this project, we continued to
make progress on (1), (3), (5), (6), and (7). In addition, we worked on
the following problems: (8) a generator for wide-area content-based
publish/subscribe workloads, which we are making public; (9) extending
our ideas and framework beyond relational data and queries to the
context of information retrieval and extraction; (10) data aggregation and
scheduling over a network. A detailed description of our contributions
can be found below in our 2008-2009 project report.

In the third year of this project, we wrapped up our
work on (8), and embarked on the following problems: (11) dissemination
network design for wide-area publish/subscribe; (12) scalable support
for range top-k subscriptions; (13) scalable support for top-k
preference subscriptions. A detailed description of our contributions
can be found below in our 2009-2010 project report.

In the fourth and final year of this project, we
wrapped up our work on (11), (12), and (13), and further studied (14)
subscriptions that arise in the context of computational journalism and
(15) finding one-of-the-few patterns. A detailed description of these
efforts can be found below in our final project report.

Badrish Chandramouli and Jun Yang. "End-to-End Support for Joins in Large-Scale Publish/Subscribe Systems." In Proceedings of the 34th International Conference on Very Large Data Bases (VLDB '08), Auckland, New Zealand, August 2008. Acceptance rate: 16.5%. Available for download: paper.

P. K. Agarwal, T. Mølhave, L. Arge, and M. Revsbæk, "Scalable
algorithms for large high-resolution terrain data", First
International Conference on Computing for Geospatial Research and
Applications, 2010.