Data Management in Distributed Stream Processing Systems


Dynamic data-driven applications need to ingest and react to large amounts of information about their environment. In response, the scientific community is adopting on-the-fly data stream processing to avoid the large wait times involved in storing data to disk or a database temporarily while processing data through a reduction/analysis pipeline. The stream processing systems must be highly efficient and scalable in processing data that vary in size, metadata, information content and importance. The dynamic nature of data streams introduces significant and interesting challenges for stream provenance, asynchronous stream joins and missing stream data. This dissertation addresses these challenges. The proposed solutions are implemented and verified in the Calder stream processing system, a continuous query grid service that enables application web services to submit long running, continuously executing queries on data streams. Stream provenance is addressed by both an information model and a collection model, which enables recording of the system activities with minimal increase in real-time processing latency. This approach is validated by experimentally quantifying the perturbation overhead of provenance collection and the scalability of the prototype provenance service implemented in Calder. The challenge of memory conservation in asynchronous stream joins is addressed using a rate-sizing algorithm that sets the join-window size in sliding window joins based on stream rates. A performance study of the rate-sizing algorithm using a realistic workload makes an argument for the use of time-based join window sizes, and for dynamic adaptation of join window sizes in response to stream rates. The problem of temporary gaps in stream data is addressed by a data estimation approach using Kalman filters. 
Experimental results show that the Kalman filter approach enables real-time, one-pass prediction of incoming events with good accuracy.

Linear Kalman filtering method is used in the AForecast algorithm to approximately generate optimal forecasting ... Financial Prediction: An adaptive Kalman filter was used in predicting market data in [62]. ... An example of using a single-constraint-at-a-time Kalman filter is the HiBall Tracking System discussed in [110].
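
To make the idea of estimating missing stream data with a Kalman filter concrete, here is a minimal sketch in Python. It is not the implementation used in Calder. The scalar random-walk state model, the noise variances, and the names ScalarKalmanFilter and fill_gaps are all assumptions chosen for illustration. Each event is handled in a single pass: the filter first predicts, then either corrects with the observed value or, when the event is missing (None), emits the prediction as the estimate.

```python
from typing import Iterable, List, Optional


class ScalarKalmanFilter:
    """One-dimensional Kalman filter with an assumed random-walk state model."""

    def __init__(self, initial_estimate: float, initial_variance: float,
                 process_variance: float, measurement_variance: float) -> None:
        self.x = initial_estimate      # current state estimate
        self.p = initial_variance      # variance of the estimate
        self.q = process_variance      # assumed process noise variance
        self.r = measurement_variance  # assumed measurement noise variance

    def predict(self) -> float:
        # Random walk: the expected state is unchanged, uncertainty grows by q.
        self.p += self.q
        return self.x

    def update(self, z: float) -> float:
        # Blend the prediction with the measurement using the Kalman gain.
        k = self.p / (self.p + self.r)
        self.x += k * (z - self.x)
        self.p *= (1.0 - k)
        return self.x


def fill_gaps(events: Iterable[Optional[float]],
              kf: ScalarKalmanFilter) -> List[float]:
    """Return one estimate per event; None marks a missing event."""
    estimates = []
    for z in events:
        predicted = kf.predict()
        # A missing event is replaced by the one-step prediction.
        estimates.append(predicted if z is None else kf.update(z))
    return estimates


if __name__ == "__main__":
    kf = ScalarKalmanFilter(initial_estimate=20.0, initial_variance=1.0,
                            process_variance=0.01, measurement_variance=0.5)
    stream = [20.1, 20.3, None, None, 20.6]
    print(fill_gaps(stream, kf))
```

With a random-walk model the estimate simply holds its last value across a gap while its variance grows, so the first measurement after the gap is weighted more heavily. A model with a trend term would be needed to extrapolate a changing signal.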

Title: Data Management in Distributed Stream Processing Systems

Author:

Publisher: ProQuest - 2007

ISBN-13:
