Computational Induction of Scientific Process Models

This project aims to develop a framework that unifies two separate but
central themes in information technology -- computational simulation
of models to explain important phenomena and computational induction
of knowledge from observed regularities in data. Unlike most previous
work in machine learning and data mining, the approach emphasizes
methods that generate knowledge in established scientific formalisms,
incorporate domain knowledge where possible, focus on causal and
explanatory models, address induction from observational time-series
data, and are embedded in a simulation environment that scientists
can use for model development.

Our approach centers on a new class of models that consist of
interacting quantitative processes, and on the problem of inducing such
models from time-series data. Computational challenges that we will
address include reducing overfitting and variance, inducing conditions
on processes, handling large, heterogeneous data sets with missing
values, and scaling to complex models. We will incorporate the resulting
algorithms in a trainable simulation environment that lets users
construct models manually or induce them from data, then simulate
their behavior. Experimental evaluation will involve both Earth
Science observations from the Ross Sea and synthetic data.
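
As a rough illustration of what inducing a process model from time-series
data can mean, the minimal sketch below treats a model as a set of processes
that each contribute a term to a variable's rate of change, then searches over
process subsets and parameter values for the combination that best reproduces
an observed series. Everything in it is an assumption made for illustration:
the process names (growth, loss), their functional forms, the Euler
integrator, the parameter grid, and the exhaustive search are hypothetical and
are not the project's algorithms.

```python
from itertools import combinations, product

# Hypothetical process library: each process maps (state x, rate parameter)
# to a contribution to dx/dt. Names and forms are illustrative assumptions.
PROCESSES = {
    "growth": lambda x, rate: rate * x,  # exponential growth
    "loss": lambda x, rate: -rate,       # constant removal
}


def simulate(structure, x0, steps, dt=0.1):
    """Euler-integrate dx/dt = sum of the active processes' contributions."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        dxdt = sum(PROCESSES[name](x, rate) for name, rate in structure.items())
        xs.append(x + dt * dxdt)
    return xs


def sse(predicted, observed):
    """Sum of squared errors between two equal-length series."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def induce(observed, grid, dt=0.1):
    """Try every non-empty process subset and every grid setting of its
    rates; return the structure with the lowest simulation error."""
    best, best_err = None, float("inf")
    names = sorted(PROCESSES)
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            for rates in product(grid, repeat=k):
                structure = dict(zip(subset, rates))
                predicted = simulate(structure, observed[0], len(observed) - 1, dt)
                err = sse(predicted, observed)
                if err < best_err:
                    best, best_err = structure, err
    return best, best_err


# Synthetic series generated by a known model, then recovered by search.
observed = simulate({"growth": 0.3}, x0=1.0, steps=30)
model, error = induce(observed, grid=[0.1, 0.2, 0.3, 0.4, 0.5])
```

Because the synthetic series here is noise-free and produced by the same
simulator, the search recovers the generating structure exactly. Noisy data,
missing values, and larger process libraries, which are the challenges named
above, would call for more careful search and parameter estimation than this
sketch attempts.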

The trainable simulation environment will let Earth scientists search
the space of candidate models systematically, producing more accurate
models in much less time. Moreover, the novel computational methods
should aid model construction in other fields such as systems biology and
engineering. Both the environment and sample models will be used
in courses and made available through future versions of this Web site.