abstract = "Due to the advances in data capture and storage
techniques over the last decade, the size of
Multivariate Time Series (MTS) data being recorded has
grown massively. Many of these MTS are characterised by
a large number of interdependent variables with large
possible time lags. If new and useful knowledge is to
be automatically learnt from this type of data in order
to aid understanding of the underlying processes, a
paradigm must be identified that is capable of
modelling data with these characteristics while
remaining transparent in how it models the
data. A key challenge is that the space of possible
models is very large, since it depends not only on
the number of time series variables but also on the
size of the possible time lags between causes and
effects.

In this thesis a general framework is described for
automatically learning probabilistic models from MTS
with large time lags and high dimensionality in order
to explain the underlying processes involved.
Specifically, a novel method to learn dynamic Bayesian
networks for explanation from these series is
developed. This involves an efficient pre-processing
stage, which effectively groups MTS variables in order
to reduce the dimensionality of the problem. After
pre-processing, a combination of Evolutionary
Programming, Genetic Algorithms and heuristics is used
to speed up convergence when learning models. In
addition, an approach is investigated for the off-line
learning of dynamic Bayesian networks with changing
dependency structures. All experiments have been
carried out on a mixture of synthetic and real data
taken from an oil refinery repository. The resultant
models are used to generate explanations that are
evaluated in several ways, including reviewing the
feedback from chemical process engineers. These results
demonstrate that the proposed framework is very
promising in terms of both efficiency and accuracy",