An estimated 5 exabytes of data are now created every two days, roughly
as much as was created from the dawn of time up until 2003. Moreover,
2007 was estimated to be the first year in which it was no longer
possible to store all the data being produced. This massive amount of
real-time streaming data opens up challenging new discovery tasks. Some
of them are already addressed by mature algorithms, while new challenges
keep emerging, including learning on not one but multiple streams. This
tutorial has two parts. The first part gives an introduction to recent
advances in algorithmic techniques and tools for coping with the
challenges of stream mining. The second part discusses state-of-the-art
research on mining multiple streams: distributed streams and
interdependent relational streams.

Concept drift plays a central role in this tutorial. In the first part,
we address it in the context of conventional one-stream mining to set
the scene. In the second part, we revisit it after introducing
multiple-stream mining, and we also consider machine learning methods
that are appropriate for incremental data and slow streams.

NOTICE: This tutorial is longer than the other ECML-PKDD 2012 tutorials.

Presenters

Albert Bifet. Researcher at Yahoo! Research Barcelona. He is the author of the book Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams. He is one of the core developers of the MOA software environment for implementing algorithms and running experiments for online learning from evolving data streams.

João Gama.
Researcher at LIAAD, University of Porto, working in the Machine
Learning group. His main research interest is in Learning from Data
Streams. He has published more than 80 articles. He served as Co-chair of
ECML 2005, DS09, ADMA09 and a series of Workshops on KDDS and Knowledge
Discovery from Sensor Data with ACM SIGKDD. He is the author of a recent
book on Knowledge Discovery from Data Streams.

Ricard Gavaldà. Professor at the Department of Software, U. Politècnica de Catalunya – BarcelonaTech. He has published over 70 papers and supervised 7 Ph.D. students. His current research interests are algorithmics of machine learning and data mining, with emphasis on streaming and adaptive methods. He is also working on the use of data mining in autonomic and green computing.

Georg Krempl. Postdoctoral researcher in the Knowledge Management & Discovery (KMD) lab at the Otto-von-Guericke-University Magdeburg, Germany. He holds a doctorate from the University of Graz, Austria. His main research interest is learning on evolving, drifting data. He has given several courses on data mining, statistics and optimization to students in different degree programmes at Univ. Graz and, since 2011, at Univ. Magdeburg.

Mykola Pechenizkiy.
Assistant Professor at the Department of Computer Science, Eindhoven
University of Technology, the Netherlands. He has broad research
interests in data mining and its application to various (adaptive)
information systems serving industry, commerce, medicine and education.
He has organized several workshops and conferences in these areas.

Bernhard Pfahringer.
Associate Professor with the Computer Science Department of the
University of Waikato. His main research interests are in Machine
Learning and Data Mining, especially in efficient algorithms, stream
mining, randomization, and applications.

Indrė Žliobaitė.
Lecturer in computational intelligence at Bournemouth University, UK
and a research task leader within the INFER.eu project. Her research
interests and competences focus on online predictive modeling,
context awareness and adaptation over time, and predictive analytics
applications.

Acknowledgements: I. Žliobaitė’s involvement has been supported by the EC-funded INFER project within the Marie Curie Industry and Academia Partnerships and Pathways (IAPP) programme under grant agreement no. 251617.