Decomposable Graphical Models are of high relevance for complex industrial applications. The Markov network approach is one of their most prominent representatives and an important tool to decompose uncertain knowledge in high dimensional domains. Relational and possibilistic decompositions also turn out to be useful to make reasoning in such domains feasible. Compared to conditioning a decomposable model on given evidence, learning the structure of the model from data and fusing several decomposable models are much more complicated. The important belief change operation of revision has been almost entirely disregarded in the past, although the problem of inconsistencies is of utmost relevance for real world applications. In this talk these problems are addressed by presenting several successful complex industrial applications.

Bio

Rudolf Kruse is Professor at the Faculty of Computer Science at the University of Magdeburg in Germany. He obtained his Ph.D. and his Habilitation in Mathematics from the Technical University of Braunschweig in 1980 and 1984 respectively. Following a stay at the Fraunhofer Gesellschaft, he joined the Technical University of Braunschweig as a professor of computer science in 1986. Since 1996 he has been a professor in the Computational Intelligence Group in Magdeburg. He has coauthored 15 monographs and 25 books as well as more than 350 peer-refereed scientific publications in various areas with 16000 citations. He is associate editor of several scientific journals. Rudolf Kruse is Fellow of the International Fuzzy Systems Association (IFSA), Fellow of the European Association for Artificial Intelligence (EURAI/ECCAI), and Fellow of the Institute of Electrical and Electronics Engineers (IEEE). His group is successful in various industrial applications in cooperation with companies such as Volkswagen, SAP, Daimler, and British Telecom. His current main research interests include data science and intelligent systems.

The psychological state of a person is characterised by cognitive and emotional variables which can be inferred by psychometric methods. Using the word lists from the Linguistic Inquiry and Word Count, designed to infer a range of psychological states from the word usage of a person, we studied temporal changes in the average expression of psychological traits in the general population. We sampled the contents of Twitter in the United Kingdom at hourly intervals for a period of four years, revealing a strong diurnal rhythm in most of the psychometric variables, and finding that two independent factors can explain 85% of the variance across their 24-h profiles. The first has peak expression time starting at 5am/6am; it correlates with measures of analytical thinking, with the language of drive (e.g., power and achievement), and with personal concerns. It is anticorrelated with the language of negative affect and social concerns. The second factor has peak expression time starting at 3am/4am; it correlates with the language of existential concerns, and anticorrelates with expression of positive emotions. Overall, we see strong evidence that our language changes dramatically between night and day, reflecting changes in our concerns and underlying cognitive and emotional processes. These shifts occur at times associated with major changes in neural activity and hormonal levels.

Bio

Fabon obtained his PhD in Computer Science in 2013 “on Learning and Representation from Texts for both Emotional and dynamical Information” at the University of Pierre et Marie Curie, in the DAPA department at LIP6. After graduating he held a short post-doctoral position at LIP6, working on building interpretable models for the classification of multivariate time series data. At this time he developed an interest in the analysis of time series data, and in the Fourier transform as a means to extract meaningful features from data. He joined the University of Bristol as a research associate in 2014, where he worked on efficient machine learning algorithms for data streams, and developed tools to study human behaviour at a collective level via the analysis of social media and large samples of press archives. He combined his work on information dynamics and his interest in the study of emotions to research periodic patterns of emotions and mental health. His results provide evidence that a share of the variance in our collective behaviours and emotions is predictable across the year, and over the 24-h cycle.

This talk describes our efforts to abstract from the animal visual system the computational principles to explain images in video. We develop a hierarchical, distributed architecture of dynamical systems that self-organizes to explain the input imagery using an empirical Bayes criterion with sparseness constraints and dual state estimation. The interpretation of the images is mediated through causes that flow top down and change the priors for the bottom up processing. We will present preliminary results in several data sets.

Bio

Jose C. Principe (M’83-SM’90-F’00) is a Distinguished Professor of Electrical and Computer Engineering and Biomedical Engineering at the University of Florida where he teaches advanced signal processing, machine learning and artificial neural networks (ANNs) modeling. He is BellSouth Professor and the Founder and Director of the University of Florida Computational NeuroEngineering Laboratory (CNEL), www.cnel.ufl.edu. His primary area of interest is processing of time varying signals with adaptive neural models. The CNEL Lab has been studying signal and pattern recognition principles based on information theoretic criteria (entropy and mutual information).

Dr. Principe is an IEEE Fellow. He is a past Chair of the Technical Committee on Neural Networks of the IEEE Signal Processing Society, Past-President of the International Neural Network Society, and Past-Editor in Chief of the IEEE Transactions on Biomedical Engineering. He is a member of the Advisory Board of the University of Florida Brain Institute. Dr. Principe has more than 800 publications. He has directed 92 Ph.D. dissertations and 65 Master's theses. In 2000 he wrote an interactive electronic book entitled “Neural and Adaptive Systems”, published by John Wiley and Sons, and more recently co-authored several books: “Brain Machine Interface Engineering”, Morgan and Claypool; “Information Theoretic Learning”, Springer; and “Kernel Adaptive Filtering”, Wiley.

In this talk I will describe two of the main subjects the data science team at Dailymotion is focusing on. I will start by describing how a video is automatically characterized in terms of verticals (sport, music, ...) and topics (coming from Wikipedia) using multi-modal approaches based on the sound and the images of the video but also the text characterizing it. In a second step, I will describe how we are able to pick, out of a catalog of 250 million videos, the most relevant videos for millions of users, in particular using sequence models for session-based recommendation.

Bio

Yves Mabiala is a data scientist leading the data science team at Dailymotion. He is currently working on large scale recommendation problems and content characterization from raw signals (audio, video).
Prior to Dailymotion he worked at Thales as a research scientist in the data science lab, where he focused on large scale unsupervised anomaly detection in cyber-security, credit card fraud detection, and unsupervised sequence learning, especially applied to predictive maintenance.
He was also a member of the LIP6/Thales joint lab, where he worked with the ComplexNetwork team on studying the dynamics of large graphs and with the MLIA team on time series representation learning.

Circadian regulation of sleep, cognition, and metabolic state is driven by a central clock, which is in turn entrained by environmental signals. Understanding the circadian regulation of mood, which is vital for coping with day-to-day needs, requires large datasets and has classically utilised subjective reporting. We use a massive dataset of over 800 million Twitter messages collected over the course of 4 years in the United Kingdom. We extract robust signals of the changes that happen during the course of the day in the collective expression of emotions and fatigue. We use methods of statistical analysis and Fourier analysis to identify periodic structures, extrema, and change-points, and compare the stability of these events across seasons and weekends. We reveal strong, but different, circadian patterns for positive and negative moods. The cycles of fatigue and anger appear remarkably stable across seasons and weekend/weekday boundaries. Positive mood and sadness interact more in response to these changing conditions. Anger and, to a lesser extent, fatigue show a pattern that inversely mirrors the known circadian variation of plasma cortisol concentrations. Most quantities show a strong inflexion in the morning. Since circadian rhythm and sleep disorders have been reported across the whole spectrum of mood disorders, we suggest that analysis of social media could provide a valuable resource for the understanding of mental disorders.
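
As an illustration of how Fourier analysis can expose a periodic structure in an hourly signal, here is a minimal sketch. It is not the pipeline used in the study: the function names (`dft_amplitudes`, `dominant_period`) and the synthetic series are assumptions made for the example, and a plain discrete Fourier transform stands in for whatever spectral method the authors applied.

```python
import cmath
import math


def dft_amplitudes(series):
    """Return (frequency index k, amplitude) pairs for k = 1 .. n//2.

    The mean is removed first so that the constant component does not
    dominate the spectrum.
    """
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    amplitudes = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        amplitudes.append((k, abs(coeff) / n))
    return amplitudes


def dominant_period(series, hours_per_sample=1.0):
    """Period (in hours) of the strongest non-constant frequency component."""
    k, _ = max(dft_amplitudes(series), key=lambda pair: pair[1])
    return len(series) * hours_per_sample / k


# Hypothetical data: four days of hourly values with a 24-h cycle
# and a weaker 12-h harmonic.
hourly = [
    math.sin(2 * math.pi * t / 24) + 0.3 * math.sin(2 * math.pi * t / 12)
    for t in range(96)
]
print(dominant_period(hourly))
```

On real data the series would be the hourly average of a mood indicator, and the amplitude at the 24-h frequency gives a rough measure of how strongly circadian the signal is.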

Bio

Fabon defended his PhD thesis in Computer Science in 2013 “on Learning and Representation from Texts for both Emotional and dynamical Information” at the University of Pierre et Marie Curie, in the DAPA department at LIP6. After graduating he held a short post-doctoral position at LIP6, working on building interpretable models for the classification of multivariate time series data. At this time he developed an interest in the analysis of time series data, and in the Fourier transform as a means to extract meaningful features from data. He joined the University of Bristol as a research associate in 2014, where he worked on efficient machine learning algorithms for data streams, and developed tools to study human behaviour at a collective level via the analysis of social media and large samples of press archives. He combined his work on information dynamics and his interest in the study of emotions to research periodic patterns of emotions and mental health. His results provide evidence that a share of the variance in our collective behaviours and emotions is predictable across the year, and over the 24-h cycle.

In the context of the Linked Open Data effort, a significant number
of public SPARQL endpoints have been made available on the Web to provide
query-based access to various types of datasets. Many such endpoints have
sacrificed high availability because maintaining a server that provides a
reliable SPARQL endpoint is costly. To address this issue we have started
investigating approaches that shift some of the effort of executing queries
from the server to the clients; these approaches rely only on data access
interfaces that are limited to simple types of requests. In this two-part
talk I will first introduce two such interfaces and present experimental
results that highlight their respective properties. Thereafter, in the second
part of the talk, I will introduce an abstract machine model that allows us to
study such client-server scenarios formally. I will present results of such a
study based on which we have drawn a fairly complete expressiveness lattice
that shows the interplay between several combinations of client and server
capabilities. Additionally, I will show the usefulness of our model to
formally analyze the fine-grain interplay between several metrics such as the
number of requests sent to the server, and the bandwidth of communication
between client and server.

Bio

Olaf is an Assistant Professor at the Department of Computer and
Information Science of Linköping University. He holds a Ph.D. in Computer
Science from the Humboldt-Universität zu Berlin, and worked previously as a
postdoctoral research fellow at the Cheriton School of Computer Science at the
University of Waterloo and, thereafter, at the Hasso Plattner Institute,
Potsdam. Olaf is interested in problems related to the management of data and
databases. His focus in this broad context is on data on the Web and on graph
data, as well as on problems in which the data is distributed over multiple,
autonomous and/or heterogeneous sources. Regarding these topics, Olaf's
interests range from systems-building related research (e.g., efficient storage
of data, query processing, and query optimization) all the way to theoretical
foundations (e.g., complexity and expressive power of query languages). Olaf
was honored with the SWSA Distinguished Dissertation Award in 2015 for his
Ph.D. dissertation “Querying a Web of Linked Data: Foundations and Query
Execution,” and he has received two best research paper awards (ESWC 2009 and
ESWC 2015). Olaf is leader or contributor of several open source projects,
most notably SQUIN, which is a novel query processing system for the Semantic
Web. He co-organized international research workshops, served on multiple
program committees, and participated as an invited expert in the provenance
incubator group and the provenance working group of the World Wide Web
Consortium.

Astronomy is facing a paradigm shift caused by the exponential growth of the sample size, data complexity and data generation rates of new sky surveys. To cope with this shift to data-driven science, new computational intelligence, machine learning and statistical approaches are needed. In this talk I will present two main applications. The first is to discriminate periodic versus non-periodic light curves, and then estimate the period of the periodic ones. Light curves are one-dimensional time series of the brightness of a star versus time. We have developed several methods based on the correntropy function (generalized correlation using information theoretical learning concepts), which outperform conventional approaches. Results using 32.8 million light curves will be presented. Interestingly, some of these techniques can be applied to other problems such as sleep EEG analysis, and I will present preliminary results on this topic too.
The second application is automated real-time transient detection in astronomical images. The aim is to achieve real-time detection of supernovae and other transients with the Dark Energy Camera. A novel transient detection pipeline was developed. We have been applying convolutional neural nets (deep learning) to discriminate between true transients and bogus transients, among other techniques, e.g., non-negative matrix factorization combined with random forests. Results using 1.5 million images will be presented. The new pipeline was successfully tested online in February 2015, finding more than 100 supernovae in a few days of telescope observation.

Bio

Pablo A. Estévez received his professional title in electrical engineering (EE) from Universidad de Chile, in 1981, and the M.Sc. and Dr.Eng. degrees from the University of Tokyo, Japan, in 1992 and 1995, respectively. He is a Full Professor with the Electrical Engineering Department, University of Chile, and former Chairman of the EE Department in the period 2006-2010.

Prof. Estévez is one of the founders of the Millennium Institute of Astrophysics (MAS), Chile, which was created in January 2014. He is currently leading the Astroinformatics/Astrostatistics group at MAS. He has been an Invited Researcher with the NTT Communication Science Laboratory, Kyoto, Japan; the Ecole Normale Supérieure, Lyon, France, and a Visiting Professor with the University of Tokyo.

Prof. Estévez is an IEEE Fellow. He is currently the President of the IEEE Computational Intelligence Society (CIS) for the term 2016-2017. He has served as IEEE CIS President-elect (2015), CIS Vice-president of Members Activities (2011-2014), CIS ADCOM Member-at-Large (2008-2010), CIS Distinguished Lecturer (2006-2011) and as an Associate Editor of the IEEE Transactions on Neural Networks (2007-2012).

Prof. Estévez served as conference chair of the International Joint Conference on Neural Networks (IJCNN), held in July 2016, in Vancouver, Canada, and general chair of the Workshop on Self-Organizing Maps (WSOM), held in December 2012, in Santiago, Chile. Currently he is serving as general co-chair of the 2018 IEEE World Congress on Computational Intelligence, WCCI 2018, to be held in Rio de Janeiro, Brazil, July 2018.

His current research interests include big data, deep learning, neural networks, self-organizing maps, data visualization, feature selection, information-theoretic learning, time series analysis, and advanced signal and image processing. One of his main topics of research is the application of computational intelligence techniques to astronomical datasets and EEG signals.

Multi-criteria aggregation is a pervasive problem appearing in many technological domains. During this presentation we shall discuss some issues related to this task. One issue is the modeling of multi-criteria decision functions, and a related issue is the evaluation of these decision functions in the face of uncertain information. One case we shall consider is the evaluation of the OWA operator when the satisfaction of the individual criteria is expressed via a probability distribution. We shall also consider the case of interval criteria satisfactions. We shall look at the role of fuzzy measures in the modeling process. One issue that must be dealt with is the ordering of the complex uncertain criteria satisfactions that is required to use the Choquet integral in the criteria aggregation.
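
To make the OWA operator concrete, the following minimal sketch computes an ordered weighted average and, as one possible way to handle probabilistic satisfactions, its expected value over discrete distributions. The expected-value treatment and the independence of the criteria are assumptions made for this illustration, not necessarily the approach presented in the talk; the function names `owa` and `expected_owa` are hypothetical.

```python
from itertools import product


def owa(weights, values):
    """Ordered weighted average.

    The weights are applied to the values sorted in descending order,
    so weights[0] always multiplies the largest satisfaction.
    """
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


def expected_owa(weights, distributions):
    """Expected OWA value for uncertain criteria satisfactions.

    Each entry of `distributions` is a list of (value, probability)
    pairs for one criterion. Criteria are ASSUMED independent, and all
    joint outcomes are enumerated, so this only suits small examples.
    """
    total = 0.0
    for outcome in product(*distributions):
        probability = 1.0
        values = []
        for value, p in outcome:
            probability *= p
            values.append(value)
        total += probability * owa(weights, values)
    return total


satisfactions = [0.2, 0.9, 0.5]
print(owa([1.0, 0.0, 0.0], satisfactions))  # behaves like max
print(owa([0.0, 0.0, 1.0], satisfactions))  # behaves like min
print(expected_owa([0.5, 0.5], [[(1.0, 0.5), (0.0, 0.5)], [(0.5, 1.0)]]))
```

The choice of weights moves the operator between the maximum (all weight on the first position) and the minimum (all weight on the last), with the arithmetic mean in between.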

Bio

Ronald R. Yager is Director of the Machine Intelligence Institute and Professor of Information Systems at Iona College. He is editor-in-chief of the International Journal of Intelligent Systems. He has published over 500 papers and edited over 30 books in areas related to fuzzy sets, human behavioral modeling, decision-making under uncertainty and the fusion of information. He is among the world’s most highly cited researchers with over 57,000 citations in Google Scholar. He was the 2016 recipient of the IEEE Frank Rosenblatt Award, the most prestigious honor given out by the IEEE Computational Intelligence Society. He was the recipient of the IEEE Computational Intelligence Society Pioneer award in Fuzzy Systems. He received the special honorary medal of the 50th Anniversary of the Polish Academy of Sciences. He received the Lifetime Outstanding Achievement Award from the International Fuzzy Systems Association. He received honorary doctorate degrees, honoris causa, from the Azerbaijan Technical University and the State University of Information Technologies, Sofia, Bulgaria. Dr. Yager is a fellow of the IEEE, the New York Academy of Sciences and the Fuzzy Systems Association. He has served at the National Science Foundation as program director in the Information Sciences program. He was a NASA/Stanford visiting fellow and a research associate at the University of California, Berkeley. He has been a lecturer at NATO Advanced Study Institutes. He was a visiting distinguished scientist at King Saud University, Riyadh, Saudi Arabia. He was an honorary professor at Aalborg University in Denmark. He received his undergraduate degree from the City College of New York and his Ph.D. from the Polytechnic Institute of New York University. He recently edited a volume entitled Intelligent Methods for Cyber Warfare.

Hervé Le Borgne has been a researcher at CEA LIST since 2006, carrying out
research on computer vision and multimedia retrieval. He
received his PhD from INP Grenoble in 2004 and worked as a post-doc
at Dublin City University from 2004 to 2006. He has published more than 50
articles in international conferences and journals. His research
interests include multimedia retrieval, computer vision, machine
learning and, more generally, multimedia mining for the extraction of
semantics. He has served as a reviewer for several international
conferences and journals, including Computer Vision and Image
Understanding and Multimedia Tools and Applications. He has been a
project manager since 2006, both for publicly funded projects and
industrial contracts. He has supervised 15 master's students and co-advised
one PhD in collaboration with Ecole Centrale Paris. Currently, he
co-advises two PhD students, in collaboration with CNAM and Ecole
Centrale Paris.

Benjamin Labbé has been a researcher at CEA LIST since 2011, carrying out
technology transfer and research on computer vision and multimedia retrieval.
He received his PhD in computer science from INSA Rouen in 2011.
His research interests began with machine learning: during his PhD he designed
multiclass and novelty-detecting support vector machines in the context
of naval infrared defensive systems. His interests have since extended
to computer vision and large-scale multimedia retrieval. One of his most recent
achievements is the transfer to industrial partners of the image retrieval
software framework ELISE for copy detection, instance search and semantic image annotation.

DAPA seminar of 9 March 2017 at 10:00

Massive Online Analytics for the Internet of Things (IoT)

Albert Bifet (Telecom ParisTech)

Location: room 405, corridor 24-25, 4 place Jussieu, 75005 Paris

Abstract:

Big Data and the Internet of Things (IoT) have the potential to fundamentally shift the way we interact with our surroundings. The challenge of deriving insights from the Internet of Things (IoT) has been recognized as one of the most exciting and key opportunities for both academia and industry. Advanced analysis of big data streams from sensors and devices is bound to become a key area of data mining research as the number of applications requiring such processing increases. Dealing with the evolution over time of such data streams, i.e., with concepts that drift or change completely, is one of the core issues in stream mining. In this talk, I will present an overview of data stream mining, and I will introduce some popular open source tools for data stream mining.

Bio

Albert Bifet is Associate Professor at Telecom ParisTech and Honorary Research Associate at the WEKA Machine Learning Group at University of Waikato. Previously he worked at Huawei Noah's Ark Lab in Hong Kong, Yahoo Labs in Barcelona, University of Waikato and UPC BarcelonaTech. He is the author of a book on Adaptive Stream Mining and Pattern Learning and Mining from Evolving Data Streams. He is one of the leaders of the MOA and Apache SAMOA software environments for implementing algorithms and running experiments for online learning from evolving data streams. He served as Co-Chair of the Industrial track of IEEE MDM 2016 and ECML PKDD 2015, and as Co-Chair of BigMine (2015, 2014, 2013, 2012) and of the ACM SAC Data Streams Track (2017, 2016, 2015, 2014, 2013, 2012).

Models for pessimistic or optimistic decisions under different uncertain scenarios

Giulianella Coletti (University of Perugia, Italy)

Location: LIP6, room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

Abstract:

We consider a problem in the area of decisions under uncertainty, i.e., we study situations where a (not necessarily complete) preference relation is given on an arbitrary set of gambles and the decision model of reference is the Choquet expected value with respect to a belief or a plausibility function. To this end we introduce two rationality principles which are necessary and sufficient conditions for the existence of a belief function or a plausibility function such that the corresponding Choquet integral represents the relation.
Nevertheless, a decision maker may sometimes be unable or unwilling to give preferences among gambles, and may only be able to specify preferences under the hypothesis that a particular event happens. In other words, the decision maker may not be able to express a preference relation under a generic scenario, but can assess it under various scenarios, which are taken into account at the same time. So we need to consider the Choquet expected values with respect to a conditional belief or plausibility function as decision model.
As is well known, there are several notions of conditioning for belief or plausibility functions in the literature. The choice among these conditioning notions heavily affects the properties of the relations represented by the above model.
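
The pessimistic and optimistic readings of the model can be illustrated in the unconditional case with a minimal sketch. It relies on the standard identity that, for a belief function given by a mass assignment m over focal sets, the Choquet integral of a gamble equals the sum of m(A) times the minimum of the gamble over A, with the maximum replacing the minimum for the plausibility function. The states, masses and payoffs are made up for the example, the function names are hypothetical, and conditioning is not covered.

```python
def choquet_belief(mass, gamble):
    """Choquet expected value w.r.t. the belief function induced by `mass`.

    `mass` maps focal sets (frozensets of states) to their mass;
    `gamble` maps each state to a payoff. Each focal set contributes
    its worst payoff, which gives the pessimistic evaluation.
    """
    return sum(m * min(gamble[s] for s in focal) for focal, m in mass.items())


def choquet_plausibility(mass, gamble):
    """Choquet expected value w.r.t. the plausibility function.

    Each focal set contributes its best payoff, which gives the
    optimistic evaluation.
    """
    return sum(m * max(gamble[s] for s in focal) for focal, m in mass.items())


# Hypothetical mass assignment over three states; the masses sum to 1.
mass = {
    frozenset({"a"}): 0.5,
    frozenset({"b", "c"}): 0.3,
    frozenset({"a", "b", "c"}): 0.2,
}
gamble = {"a": 10.0, "b": 0.0, "c": 4.0}

print(choquet_belief(mass, gamble))
print(choquet_plausibility(mass, gamble))
```

A gamble is preferred under the pessimistic model when its belief-based value is higher, and under the optimistic model when its plausibility-based value is higher; the two values coincide when all focal sets are singletons, i.e., when the mass assignment is an ordinary probability distribution.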

Bio

Giulianella Coletti is a Full Professor of Probability and Mathematical Statistics at the University of Perugia, Italy. She has been Coordinator of the "Dottorato di Ricerca" (Ph.D. school) "Mathematics and Informatics to Handle and Represent Information and Knowledge" at the University of Perugia since 2003 and Supervisor for the "Dottorato di Ricerca" (Ph.D. school) in Mathematics, Computer Science and Statistics, promoted by the University of Florence, the University of Perugia and INdAM, since 2012. She has also been a member of the Scientific Committee of INdAM (Istituto Nazionale di Alta Matematica) since 2013. Her main fields of interest are probability, non-additive uncertainty measures, decision making, and theory of measurements.
She is the author of more than 160 articles and 1 book (published by Kluwer), and she is the editor of 4 books (published by Elsevier, Springer, Plenum Press, and CNR Applied Mathematics Monographs - Giardini Editori) and of some special issues of international journals.

The realization of the Internet of Things (IoT) is creating an unprecedented tidal wave of data, consisting of continuous measurements collected from an enormous number of sensors. The goal is to better understand, model, and analyze real-world phenomena, interactions, and behaviors. Consequently, there is an increasingly pressing need for developing techniques able to index and mine very large collections of sequences, or data series. This need is also present across several applications in diverse domains, ranging (among others) from engineering, telecommunications, and finance, to astronomy, neuroscience, and the web. It is not unusual for the applications mentioned above to involve numbers of data series in the order of hundreds of millions to billions, which are often not analyzed in their full detail due to their sheer size.

In this talk, we describe recent efforts in designing techniques for indexing and mining truly massive collections of data series that will enable scientists to easily analyze their data. We show that the main bottleneck in mining such massive datasets is the time taken to build the index, and we thus introduce solutions to this problem. Furthermore, we discuss novel techniques that adaptively create data series indexes, allowing users to correctly answer queries before the indexing task is finished. We also show how our methods allow mining on datasets that would otherwise be completely untenable, including the first published experiments using one billion data series.

Finally, we present our vision for the future in big sequence management research.

Bio

Themis Palpanas is a professor of computer science at the Paris
Descartes University (France), where he is a director of the Data
Intensive and Knowledge Oriented Systems (diNo) group. He received
the BS degree from the National Technical University of Athens,
Greece, and the MSc and PhD degrees from the University of Toronto,
Canada. He has previously held positions at the University of Trento
and the IBM T.J. Watson Research Center. He has also worked for the
University of California, Riverside, and visited Microsoft Research
and the IBM Almaden Research Center. His research solutions have been
implemented in world-leading commercial data management products
and he is the author of nine US patents. He is the recipient of
three Best Paper awards (including ICDE and PERCOM), and the IBM
Shared University Research (SUR) Award in 2012, which represents
a recognition of research excellence at worldwide level. He has been
a member of the IBM Academy of Technology Study on Event Processing,
and is a founding member of the Event Processing Technical Society.
He has served as General Chair for VLDB 2013, the top international
conference on databases. His research has been supported by the EU,
CNRS, NSF, Facebook, IBM Research, Hewlett Packard Labs, and Telecom
Italia.

The mining of frequent itemsets from uncertain databases has become a very active topic within the data mining community over the last few years. Although the extraction process within binary databases constitutes a deterministic problem, the uncertain case is based on expectation. Recently, a new type of database has emerged, referred to as the evidential database, which handles data that are both uncertain and imprecise. In this talk, we present an applied case study of the use of evidential databases in the field of chemistry. Then, we shed light on the WEvAC approach for predicting the properties of amphiphile molecules.

Furthermore, most existing approaches to pattern mining, which are based on procedural programs (as we often use/develop), would require specific and lengthy development to support the addition of extra constraints. Because of this lack of flexibility, such systems are not suitable for experts to analyze their data. Recent research on pattern mining has suggested using declarative paradigms such as SAT, ASP or CP to provide more flexible tools for pattern mining. The ASP framework has proven to be a good candidate for developing flexible pattern mining tools. It provides a simple and principled way of incorporating experts' constraints within programs.

Bio

Ahmed Samet is a post-doctoral researcher at the University of Rennes 1. He received his M.Sc. degree in Computer Science from the Université de Tunis (Tunisia) in 2010. He then obtained a Ph.D. in Computer Science within a Cotutelle agreement between the Université de Tunis (Tunisia) and the Université d'Artois (France). He first held a position as a postdoctoral researcher with Sorbonne University: Université de technologie de Compiègne (France). His research topics involve decision making, machine learning under uncertainty and data mining.

Over the past years, challenges in data management have gained more and more attention. Assessment of the quality of data is one such challenge that has tremendous potential. In this talk, we review the current state of the art in the measurement of data quality and argue that there is a great need for fundamental research to establish formal systems for the measurement of data quality. We review a formal framework that was proposed very recently and expresses quality in an ordinal manner. We then show the role of uncertainty modelling within this framework. We conclude the talk by reviewing the role of fusion functions within systems of measurement of data.

Bio

Antoon Bronselaer is assistant professor at Ghent University and member of the DDCM research group (http://ddcm.ugent.be). Over the past ten years, he has been conducting research in the field of data quality, with an emphasis on the application of uncertainty models.

Linguistic summarization techniques make it easy to gain insight into large amounts of data by describing the main properties of the data linguistically. We focus on a specific type of data, namely process data, i.e., event logs that contain information about when some activities were performed for a particular customer case. An event log may contain many different sequences, because actions or events are often performed in slightly different orders for different customer cases.

We discuss protoforms that are designed to capture process specific information. Linguistic summaries can capture information on the tasks or sequences of tasks that are frequently executed as well as properties of these tasks or sequences, such as their throughput and service time. Such information is of specific interest in the context of process analysis and diagnosis.
Through a case study with data from practice, we show that the knowledge derived from these linguistic summaries is useful for identifying problems in processes and establishing best practices.

Bio

Anna Wilbik received her Ph.D. degree in computer science from the Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland in 2010. She is currently an Assistant Professor at the School of Industrial Engineering, Eindhoven University of Technology, The Netherlands. In 2011 she was a Post-doctoral Fellow at the Electrical and Computer Engineering Department, University of Missouri, Columbia, MO, USA. In 2012 she participated in the TOP 500 Innovators: Science - Management - Commercialization Program of the Polish Ministry of Science and Higher Education. Her research interests include linguistic summaries, data analysis, machine learning, and computational intelligence with a focus on applications in healthcare.

Over recent years, increasing quantities of data have been generated and recorded about many aspects of our lives. In cases such as internet logs, physical access logs, transaction records, email and phone records, the data consists of multiple overlapping sequences of events related to different individuals and entities. Identification and analysis of such event sequences is an important task which can be used to find similar groups, predict future behaviour and to detect anomalies. It is ideally suited to a collaborative intelligence approach, in which human analysts provide insight and interpretation, while machines perform data collection, repetitive processing and visualisation. An important aspect of this process is the common definition of terms used by humans and machines to identify and categorise similar (and dissimilar) events.

In this talk we will argue that fuzzy set theory gives a natural framework for the exchange of information, and interaction, between analysts and machines. We will describe a new approach to the definition of fuzzy hierarchies, and show how this enables event sequences to be extracted, compared and mined at different levels of resolution.

Bio

Trevor Martin (M’07) is a Professor of artificial intelligence at the University of Bristol, U.K. He received the B.Sc. degree in chemical physics from the University of Manchester, in 1978, and the Ph.D. degree in quantum chemistry from the University of Bristol, in 1984. Since 2001, he has been funded by British Telecommunications (BT) as a Senior Research Fellow, for his research on soft computing in intelligent information management, including areas such as the semantic Web, soft concept hierarchies, and user modeling.