IPM Timeline

The Institute for the Study of Learning and Expertise (ISLE)
has been working on Inductive Process Modeling (IPM)
for the last decade.
This page, based on abstracts and other excerpts from supporting papers,
summarizes the project's trajectory in terms of research problems and solutions.

Highlights

2002

A novel research problem is proposed: constructing a process model from continuous data.

The Inductive Process Modeling (IPM) algorithm is described.

2003

Induction of process models is made more robust.

PROMETHEUS (a model construction GUI) is announced.

IPM is applied to inducing ecosystem models from background knowledge and time-series data.

2004

IPM is applied to photosynthesis regulation.

IPM supports computational revision of models.

2005

HIPM carries out constrained search for hierarchical process models.

Overfitting in process model induction is addressed.

2006

Generic processes (i.e., templates) are introduced.

IPM is applied to biochemical kinetics.

The problem of missing data is addressed.

2007

An inductive logic programming approach to learning declarative bias is introduced.

An approach for extracting constraints on process model construction is introduced.

A logical formalism is introduced.

A computational method for acquiring scientific knowledge from candidate process models is introduced.

2010

SC-IPM (Structural Constraints for IPM) is introduced.

MISC (a procedure that learns and transfers constraints) is introduced.

Spatio-temporal process modeling is introduced.

2011

Application of SC-IPM to human physiological models is proposed.

2012

An automated approach to discovering constraints is introduced.

Papers

[We] pose a novel research problem for machine learning
that involves constructing a process model from continuous data.
We claim that casting learned knowledge in terms of processes with associated equations
is desirable for scientific and engineering domains, where such notations are commonly used.
We also argue that existing induction methods are not well suited to this task,
although some techniques hold partial solutions.
In response, we describe an approach to learning process models from time-series data
and illustrate its behavior in a population dynamics domain.

[We] revisit the problem of inducing a process model from time-series data.
We illustrate this task with a realistic ecosystem model,
review an initial method for its induction,
then identify three challenges that require extension of this method.
These include dealing with unobservable variables,
finding numeric conditions on processes,
and preventing the creation of models that overfit the training data.
We describe responses to these challenges and present experimental evidence
that they have the desired effects.

Most AI research on scientific model construction aims
to automate this process using discovery techniques.
In contrast, we describe an interactive environment for model construction
that lets the user construct, edit, and visualize scientific models,
use them to make predictions, and call on discovery methods to revise them
in ways that better fit the available data.

Ecosystem models are used to interpret and predict the interactions
of species and their environment.
In this paper, we address the task of inducing ecosystem models
from background knowledge and time-series data,
and we review IPM, an algorithm that addresses this problem.

We address the task of inducing explanatory models
from observations and knowledge about candidate biological processes,
using the illustrative problem of modeling photosynthesis regulation.
We cast both models and background knowledge
in terms of processes that interact to account for behavior.

Most ecological models are developed manually by scientists,
who decide on their basic structure, tune their parameters,
compare them against available data, and refine them in response.
In contrast, most work on computational scientific discovery
has emphasized the automated generation of models
from data and background knowledge.
We believe that computational tools for model revision
offer great practical value to scientists
by decreasing the time required to search for models
while letting them retain control over the search space. ...

Research on inductive process modeling combines background knowledge
with time-series data to construct explanatory models,
but previous work has placed few constraints on search through the model space.
We present an extended formalism that organizes process knowledge
in a hierarchical manner, and we describe HIPM,
a system that carries out constrained search for hierarchical process models.
We report experiments that suggest this approach
produces more accurate and plausible models with less effort.

[IPM] uses background knowledge about possible component processes
to construct quantitative models of dynamical systems.
... previous methods for this task tend to overfit the training data,
which suggests ensemble learning as a likely response.
However, such techniques combine models in ways that reduce comprehensibility,
making their output much less accessible to domain scientists.

[We] introduce a new approach that induces a set of process models
from different samples of the training data
and uses them to guide a final search through the space of model structures.
Experiments with synthetic and natural data suggest this method reduces error
and decreases the chance of including unnecessary processes in the model.
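
The idea of inducing models from different samples and letting them guide a final search can be sketched roughly as follows. This is only an illustration under stated assumptions, not the published method: the names `bootstrap_process_votes`, `stable_processes`, and `toy_induce` are hypothetical, the learner is a toy stand-in, and "guiding" the final search is assumed here to mean restricting it to processes that were selected in at least half of the bootstrap rounds.

```python
import random
from collections import Counter


def bootstrap_process_votes(data, induce, rounds=50, seed=0):
    """Induce one model per bootstrap sample; return each process's selection frequency."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(rounds):
        # Resample the training data with replacement (same size as the original).
        sample = [rng.choice(data) for _ in data]
        votes.update(induce(sample))
    return {proc: count / rounds for proc, count in votes.items()}


def stable_processes(frequencies, threshold=0.5):
    """Keep only processes selected in at least `threshold` of the rounds."""
    return {proc for proc, freq in frequencies.items() if freq >= threshold}


def toy_induce(sample):
    """Hypothetical stand-in learner: returns the set of processes in its model.

    It always includes "growth" and adds "spurious_inflow" only when the
    outlier value 99.0 dominates the sample, mimicking an overfit model.
    """
    model = {"growth"}
    if sample.count(99.0) >= 3:
        model.add("spurious_inflow")
    return model


data = [1.0, 1.2, 1.1, 0.9, 1.3, 1.0, 0.8, 1.1, 1.2, 99.0]
frequencies = bootstrap_process_votes(data, toy_induce)
kept = stable_processes(frequencies)  # processes allowed in the final search
```

Because the result is a single set of admissible processes rather than an averaged ensemble, the final model stays as readable as any other process model.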

[We] present an approach that represents candidate models as sets of quantitative processes
and that treats revision as search through a model space
which is guided by time-series observations
and constrained by background knowledge cast as generic processes
that serve as templates for the specific processes used in models.

We address the task of inducing explanatory models
from observations and knowledge about candidate biological processes,
using the illustrative problem of modeling photosynthesis regulation.
We cast both models and background knowledge
in terms of processes that interact to account for behavior.
We demonstrate [IPM's] use both on photosynthesis
and on a second domain, biochemical kinetics.

[We] discuss approaches to learning with missing values in time series,
noting that these efforts are typically applied for descriptive modeling tasks
that use little background knowledge.
We also point out that these methods assume that data are missing at random --
a condition that may not hold in scientific domains.
[We] compare an expectation maximization approach
with one that simply ignores the missing data.

[We] present a language for stating process models and background knowledge
in terms familiar to scientists,
along with an interactive environment for knowledge discovery
that lets the user construct, edit, and visualize scientific models,
use them to make predictions, and revise them to better fit available data.
We report initial studies in three domains
that illustrate the operation of this environment
and the results of a user study carried out with domain scientists.

Scientists investigate the dynamics of complex systems with quantitative models,
employing them to synthesize knowledge, to explain observations,
and to forecast future system behavior.
Complete specification of systems is impossible,
so models must be simplified abstractions.
Thus, the art of modeling involves deciding which system elements to include
and determining how they should be represented.
We view modeling as search through a space of candidate models
that is guided by model objectives, theoretical knowledge, and empirical data.

In this contribution, we introduce a method for representing process-based models
that facilitates the discovery of models that explain observed behavior.
This representation casts dynamic systems
as interacting sets of processes that act on entities.
Using this approach, a modeler first encodes relevant ecological knowledge
into a library of generic entities and processes,
then instantiates these theoretical components,
and finally assembles candidate models from these elements.
We illustrate this methodology with a model of the Ross Sea ecosystem.
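
The three steps described above (encode a library, instantiate it, assemble candidates) might look like the following minimal sketch. Everything named here is an assumption made for illustration: the classes `Entity` and `GenericProcess`, the entity kinds, and the three library entries are hypothetical and are not taken from the Ross Sea model.

```python
from dataclasses import dataclass
from itertools import combinations, permutations


@dataclass(frozen=True)
class Entity:
    name: str
    kind: str  # generic entity type, e.g. "producer"


@dataclass(frozen=True)
class GenericProcess:
    name: str
    roles: tuple  # generic entity types the process relates, in order


def instantiate(generic, entities):
    """Bind each role of a generic process to a distinct entity of the required kind."""
    bound = []
    for combo in permutations(entities, len(generic.roles)):
        if all(e.kind == role for e, role in zip(combo, generic.roles)):
            bound.append((generic.name, tuple(e.name for e in combo)))
    return bound


def candidate_models(instances):
    """Assemble every non-empty combination of instantiated processes."""
    return [set(c) for r in range(1, len(instances) + 1)
            for c in combinations(instances, r)]


# Hypothetical library and entities.
entities = [Entity("phytoplankton", "producer"),
            Entity("zooplankton", "grazer"),
            Entity("nitrate", "nutrient")]
library = [GenericProcess("growth", ("producer", "nutrient")),
           GenericProcess("grazing", ("grazer", "producer")),
           GenericProcess("loss", ("producer",))]

instances = [inst for g in library for inst in instantiate(g, entities)]
models = candidate_models(instances)
```

Typing the roles is what keeps the instantiation step small: a generic process is only bound to entities of the kinds it names, so nonsensical pairings never enter the candidate space.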

In this paper, we introduce an inductive logic programming approach
to learning declarative bias.
The target learning task is inductive process modeling, which we briefly review.
Next we discuss our approach to bias induction while emphasizing predicates
that characterize the knowledge and models associated with the HIPM system.
We then evaluate how the learned bias
affects the space of model structures that HIPM considers
and how well it generalizes to other search problems in the same domain.

Results indicate that the bias reduces the size of the search space
without removing the most accurate structures.
In addition, our approach reconstructs known constraints in population dynamics.
We conclude the paper by discussing a generalization of the technique
to learning bias for inductive logic programming.

In this paper, we introduce an approach for extracting constraints
on process model construction.
We begin by clarifying the type of knowledge produced by our method
and how one may apply it.
Next, we review the task of inductive process modeling,
which provides the required data.

We then introduce a logical formalism and a computational method
for acquiring scientific knowledge from candidate process models.
Results suggest that the learned constraints make sense ecologically
and may provide insight into the nature of the modeled domain.

In this paper, we pose a novel research problem for machine learning
that involves constructing a process model from continuous data.
We claim that casting learned knowledge
in terms of processes with associated equations
is desirable for scientific and engineering domains,
where such notations are commonly used.
We also argue that existing induction methods
are not well suited to this task,
although some techniques hold partial solutions.
In response, we describe an approach to learning process models
from time-series data and illustrate its behavior in three domains.

In previous publications, we have reported a computational approach
to constructing explanatory process models of dynamic systems
from time-series data and background knowledge.
We have not aimed to mimic the detailed behavior of human researchers,
but we maintain that our systems address the same tasks
as ecologists, biologists, and other theory-guided scientists,
and that they carry out search through similar problem spaces. ...

Scientific modeling is a creative activity that can benefit from computational support.
This chapter reports five challenges that arise in developing such aids,
as illustrated by PROMETHEUS,
a software environment that supports the construction and revision of explanatory models.
These challenges include the paucity of relevant data,
the need to incorporate prior knowledge, the importance of comprehensibility,
an emphasis on explanation, and the practicality of user interaction.

The responses to these challenges include the use
of quantitative processes to encode models and background knowledge,
as well as the combination of AND/OR search through a space of model structures
with gradient descent to estimate parameters.
This chapter reports our experiences with PROMETHEUS on three scientific modeling tasks
and some lessons we have learned from those efforts.
This chapter concludes by noting additional challenges
that were not apparent at the outset of our work.
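
As a rough illustration of pairing structure search with gradient-descent parameter estimation, consider the sketch below. It makes several simplifying assumptions that are not from the chapter: exhaustive enumeration of process subsets stands in for AND/OR search, parameters are fit directly to observed rates of change rather than to simulated trajectories, and a small complexity penalty breaks ties in favor of simpler structures. The process names and the functions `fit` and `search` are hypothetical.

```python
from itertools import combinations

# Hypothetical candidate processes: each maps x to its contribution to dx/dt
# per unit of its rate parameter.
PROCESSES = {
    "growth": lambda x: x,
    "crowding": lambda x: -x * x,
    "inflow": lambda x: 1.0,
}


def residuals(w, feats, ys):
    """Predicted minus observed rate of change for each data point."""
    return [sum(wj * fj for wj, fj in zip(w, row)) - y for row, y in zip(feats, ys)]


def fit(structure, xs, ys, lr=0.02, iters=5000):
    """Estimate one rate parameter per process by gradient descent on mean squared error."""
    names = sorted(structure)
    feats = [[PROCESSES[name](x) for name in names] for x in xs]
    n = len(xs)
    w = [0.0] * len(names)
    for _ in range(iters):
        res = residuals(w, feats, ys)
        grad = [2.0 / n * sum(r * row[j] for r, row in zip(res, feats))
                for j in range(len(w))]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    mse = sum(r * r for r in residuals(w, feats, ys)) / n
    return dict(zip(names, w)), mse


def search(xs, ys, penalty=1e-3):
    """Fit every non-empty subset of processes and return the best-scoring one."""
    best = None
    for size in range(1, len(PROCESSES) + 1):
        for structure in combinations(PROCESSES, size):
            params, mse = fit(structure, xs, ys)
            score = mse + penalty * size  # prefer simpler structures on ties
            if best is None or score < best[0]:
                best = (score, structure, params)
    return best[1], best[2]


# Synthetic observations generated from dx/dt = 1.0*x - 0.3*x^2.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [1.0 * x - 0.3 * x * x for x in xs]
structure, params = search(xs, ys)
```

The learning rate and iteration count are tuned to this small synthetic data set; other data would likely need different settings or feature scaling.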

Research on computational models of scientific discovery
investigates both the induction of descriptive laws
and the construction of explanatory models.
Although the work in law discovery
centers on knowledge-lean approaches to searching a problem space,
research on deeper modeling tasks emphasizes the pivotal role of domain knowledge.
As an example, our own research on inductive process modeling
uses information about candidate processes
to explain why variables change over time.

However, our experience with IPM,
an artificial intelligence system that implements this approach,
suggests that process knowledge is insufficient
to avoid consideration of implausible models.
To this end, the discovery system needs additional knowledge
that constrains the model structures.
We report on an extended system, SC-IPM,
that uses such information to reduce its search through the space of candidates
and to produce models that human scientists find more plausible.
We also argue that although people carry out less extensive search than SC-IPM,
they rely on the same forms of knowledge -- processes and constraints --
when constructing explanatory models.

People constantly apply acquired knowledge to new learning tasks,
but machines almost never do.
Research on transfer learning attempts to address this dissimilarity.
Working within this area,
we report on a procedure that learns and transfers constraints
in the context of inductive process modeling, which we review.
After discussing the role of constraints in model induction,
we describe the learning method, MISC,
and introduce our metrics for assessing the cost and benefit of transferred knowledge.
The reported results suggest that cross-domain transfer is beneficial
in the scenarios that we investigated,
lending further evidence that this strategy is a broadly effective means
for increasing the efficiency of learning systems.

Quantitative modeling plays a key role in the natural sciences,
and systems that address the task of inductive process modeling
can assist researchers in explaining their data.
In the past, such systems have been limited to data sets
that recorded change over time,
but many interesting problems involve both spatial and temporal dynamics.

To meet this challenge, we introduce SCISM,
an integrated intelligent system which solves the task
of inducing process models that account for spatial and temporal variation.
We also integrate SCISM with a constraint learning method
to reduce computation during induction.
Applications to ecological modeling
demonstrate that each system fares well on the task,
but that the enhanced system does so much faster than the baseline version.

In this paper, we review the paradigm of inductive process modeling
and examine its application to human physiology.
This framework represents models as a set of interacting processes,
each with associated differential or algebraic equations
that express causal relations among variables.
Simulating such a quantitative process model
produces trajectories for variables over time
that one can compare to observations.
Background knowledge about candidate processes enables search
through the space of model structures and their associated parameters,
and thus the identification of quantitative models that explain time-series data.
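
To make the representation concrete, here is a minimal sketch of a model as a set of interacting processes, each contributing terms to the derivatives of the variables it affects. It is an assumption-laden illustration rather than the IPM implementation: the predator-prey processes, the parameter names `r`, `a`, `e`, and `m`, and the use of forward Euler integration are all hypothetical choices.

```python
def growth(state, p):
    """Prey grows in proportion to its own density."""
    return {"prey": p["r"] * state["prey"]}


def predation(state, p):
    """Predators remove prey and convert part of the flux into their own growth."""
    flux = p["a"] * state["prey"] * state["pred"]
    return {"prey": -flux, "pred": p["e"] * flux}


def death(state, p):
    """Predators die at a constant per-capita rate."""
    return {"pred": -p["m"] * state["pred"]}


def derivatives(processes, state, params):
    """Sum the contributions of all processes to each variable's rate of change."""
    total = {var: 0.0 for var in state}
    for proc in processes:
        for var, rate in proc(state, params).items():
            total[var] += rate
    return total


def simulate(processes, state, params, dt, steps):
    """Integrate the model with forward Euler; return the list of states over time."""
    trajectory = [dict(state)]
    for _ in range(steps):
        d = derivatives(processes, state, params)
        state = {var: state[var] + dt * d[var] for var in state}
        trajectory.append(dict(state))
    return trajectory


def sse(trajectory, observations, var):
    """Sum of squared errors between a simulated variable and its observations."""
    return sum((s[var] - o) ** 2 for s, o in zip(trajectory, observations))


model = [growth, predation, death]
params = {"r": 0.5, "a": 0.02, "e": 0.1, "m": 0.3}
trajectory = simulate(model, {"prey": 10.0, "pred": 2.0}, params, dt=0.1, steps=100)
```

Comparing `sse` across candidate structures and parameter settings is the kind of score a search over model space could minimize.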

We present an initial process model for aspects of human physiology,
consider its uses for health monitoring,
and discuss the induction of such models.
In closing, we consider related efforts on physiological modeling
and our plans for collecting data to evaluate our framework in this domain.

Scientists use two forms of knowledge in the construction of explanatory models:
generalized entities and processes that relate them;
and constraints that specify acceptable combinations of these components.
Previous research on inductive process modeling,
which constructs models from knowledge and time-series data,
has relied on handcrafted constraints.

In this paper, we report an approach to discovering such constraints
from a set of models that have been ranked
according to their error on observations.
Our approach adapts inductive techniques for supervised learning
to identify process combinations that characterize accurate models.
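
One very simple way to picture constraint discovery from ranked models is sketched below. It is a stand-in for the supervised learning techniques the paper adapts, not a reproduction of them: models are labeled accurate or inaccurate by rank, and two kinds of candidate constraints are read off the labels. The function name `learn_constraints`, the `top_fraction` cutoff, and the process names are hypothetical.

```python
from itertools import combinations


def learn_constraints(ranked_models, top_fraction=0.5):
    """Derive candidate constraints from (process_set, error) pairs.

    Returns processes present in every accurate model, and process pairs
    that co-occur only in inaccurate models.
    """
    ordered = sorted(ranked_models, key=lambda m: m[1])
    cut = max(1, int(len(ordered) * top_fraction))
    accurate = [procs for procs, _ in ordered[:cut]]
    inaccurate = [procs for procs, _ in ordered[cut:]]
    all_procs = sorted(set().union(*(procs for procs, _ in ranked_models)))

    required = [p for p in all_procs if all(p in m for m in accurate)]
    excluded_pairs = [
        (a, b) for a, b in combinations(all_procs, 2)
        if any(a in m and b in m for m in inaccurate)
        and not any(a in m and b in m for m in accurate)
    ]
    return required, excluded_pairs


# Hypothetical candidate models with their errors on observations.
ranked = [
    ({"logistic_growth", "grazing"}, 0.10),
    ({"logistic_growth", "grazing", "decay"}, 0.12),
    ({"logistic_growth", "exponential_growth", "grazing"}, 0.90),
    ({"exponential_growth", "decay"}, 1.40),
]
required, excluded_pairs = learn_constraints(ranked)
```

Constraints of this form can be checked against a candidate structure before any parameters are fit, which is where the reduction in search would come from.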

We evaluate the method's ability to reconstruct known constraints
and to generalize well to other modeling tasks in the same domain.
Experiments with synthetic data indicate that the approach
can successfully reconstruct known modeling constraints.
Another study using natural data suggests that transferring constraints
acquired from one modeling scenario to another within the same domain
considerably reduces the amount of search for candidate model structures
while retaining the most accurate ones.