4
Advantages of Quantitative Process Models
Process models are a good target for discovery systems because:
- they refer to notations and mechanisms familiar to scientists;
- they embed quantitative relations within qualitative structure;
- they provide dynamical predictions of changes over time;
- they offer causal and explanatory accounts of phenomena;
- while retaining the modularity needed to support induction.
Quantitative process models provide an important alternative to formalisms used currently in machine learning and discovery.

5
Inductive Process Modeling
Training data: observed values for a set of continuous variables as they vary over time or situations.
Background knowledge: generic processes that characterize causal relationships among variables in terms of conditional equations.
Induction: combines the training data and the background knowledge to produce the learned model.
Learned model: a specific process model that explains the observed values and predicts future data accurately.

6
Generic Processes as Background Knowledge
Our framework casts background knowledge as generic processes that specify:
- the variables involved in a process and their types;
- the parameters appearing in a process and their ranges;
- the forms of conditions on the process; and
- the forms of associated equations and their parameters.
Generic processes are building blocks from which one can compose a specific quantitative process model.
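As a rough illustration, the four kinds of information a generic process specifies could be held in a small record like the one below. This is only a sketch: the class name `GenericProcess`, its field names, the `accepts` helper, the type labels `species` and `nutrient`, and the example decay process are all hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class GenericProcess:
    """Hypothetical container for one generic process (illustrative only)."""
    name: str
    variables: Dict[str, str]                    # role name -> required variable type
    parameters: Dict[str, Tuple[float, float]]   # parameter name -> (low, high) range
    conditions: List[str] = field(default_factory=list)  # forms of conditions
    equations: List[str] = field(default_factory=list)   # forms of equations

    def accepts(self, binding: Dict[str, str], var_types: Dict[str, str]) -> bool:
        """Return True if every role is bound to a variable of the required type."""
        return set(binding) == set(self.variables) and all(
            var_types.get(binding[role]) == required
            for role, required in self.variables.items()
        )


# An invented example process: exponential loss of some species.
exponential_decay = GenericProcess(
    name="exponential_decay",
    variables={"S": "species"},
    parameters={"rate": (0.0, 1.0)},
    equations=["d[S]/dt = -rate * S"],
)

# Assumed type assignments for two of the variables named in the slides.
var_types = {"phyto": "species", "nitro": "nutrient"}
ok = exponential_decay.accepts({"S": "phyto"}, var_types)
rejected = exponential_decay.accepts({"S": "nitro"}, var_types)
```

Keeping the types and ranges alongside the equation forms is what lets a search procedure instantiate a process only with compatible variables.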

8
Previous Results: The IPM Algorithm
Langley et al. (2002) reported IPM, an algorithm that constructs process models from generic components in four stages:
1. Find all ways to instantiate known generic processes with specific variables, subject to type constraints;
2. Combine instantiated processes into candidate generic models, with limits on the total number of processes;
3. For each generic model, carry out gradient descent search through parameter space to find good parameter values;
4. Select the parameterized model that produces the lowest mean squared error on the training data.
We showed that IPM could induce accurate process models from noisy time series, but it tended to include extra processes.
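The enumeration in stages 1 and 2 can be sketched in a few lines. This is not the IPM implementation; the function names, the dictionary encoding of generic processes, the example processes, and the rule that one variable may not fill two roles of the same process are all assumptions made for illustration. Stages 3 and 4 (parameter fitting and model selection) are left out.

```python
from itertools import combinations, product


def instantiate(generic_processes, var_types):
    """Stage 1 (sketch): bind each role to every variable of the matching type."""
    instances = []
    for name, roles in generic_processes.items():
        role_names = list(roles)
        # For each role, collect the variables whose type satisfies the constraint.
        candidates = [
            [v for v, t in var_types.items() if t == roles[r]] for r in role_names
        ]
        for combo in product(*candidates):
            # Assumption: a variable may not play two roles in one process.
            if len(set(combo)) == len(combo):
                instances.append((name, tuple(zip(role_names, combo))))
    return instances


def candidate_structures(instances, max_processes):
    """Stage 2 (sketch): every non-empty set of instances up to the size limit."""
    for k in range(1, max_processes + 1):
        yield from combinations(instances, k)


# Hypothetical background knowledge: role name -> required type.
generic_processes = {
    "decay": {"S": "species"},
    "predation": {"prey": "species", "predator": "species"},
}
# Hypothetical type assignments for variables named in the slides.
var_types = {"phyto": "species", "zoo": "species", "nitro": "nutrient"}

instances = instantiate(generic_processes, var_types)
structures = list(candidate_structures(instances, max_processes=2))
```

The cap on the number of processes matters because the number of candidate structures grows combinatorially with the number of instantiated processes.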

9
The Revised IPM Algorithm
We have revised and extended the IPM algorithm so that it now:
- Accepts as input those variables that can appear in the induced model, both observable and unobservable;
- Utilizes the parameter-fitting routine to estimate initial values for unobservable variables;
- Invokes the parameter-fitting method to induce the thresholds on process conditions; and
- Selects the parameterized model with the lowest description length: $M_d = (M_v + M_c) \log(n) + n \log(M_e)$.
We have evaluated the new system on synthetic and natural data.
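The description-length criterion is simple to compute once its terms are fixed. The slide does not define the symbols, so the sketch below assumes that M_v is the number of variables, M_c the number of parameters, M_e the mean squared error, and n the number of training cases, and that the logarithm is natural. The function name and the two candidate models are invented for illustration.

```python
import math


def description_length(n_variables, n_parameters, n_samples, mse):
    """M_d = (M_v + M_c) * log(n) + n * log(M_e), under the assumed symbol meanings."""
    return (n_variables + n_parameters) * math.log(n_samples) + n_samples * math.log(mse)


# Two invented candidates: (label, M_v, M_c, M_e). The larger model fits
# slightly better but pays a higher complexity cost.
candidates = [
    ("larger", 4, 8, 0.010),
    ("simpler", 4, 5, 0.011),
]
n = 100  # assumed number of training cases

scored = [(label, description_length(mv, mc, n, me)) for label, mv, mc, me in candidates]
best = min(scored, key=lambda pair: pair[1])
```

With these made-up numbers the simpler model has the lower description length, which shows how the first term can outweigh a small gain in fit.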

10
Evaluation of the IPM Algorithm
To demonstrate IPM's ability to induce process models, we ran it on synthetic data for a known system:
1. We used the aquatic ecosystem model to generate data sets over 100 time steps for the variables nitro and phyto;
2. We replaced each true value $x$ with $x(1 + r \cdot n)$, where $r$ followed a Gaussian distribution ($\mu = 0$, $\sigma = 1$) and $n > 0$;
3. We ran IPM on these noisy data, giving it type constraints and generic processes as background knowledge.
In two experiments, we let IPM determine the initial values and thresholds given the correct structure; in a third study, we let it search through a space of 256 generic model structures.
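The noise step in item 2 amounts to multiplicative Gaussian noise. A minimal sketch follows, assuming that n acts as a noise level (for example 0.05 for 5% noise); the function name `add_noise`, the `seed` argument, and the sample trajectory are hypothetical.

```python
import random


def add_noise(values, noise_level, seed=None):
    """Replace each x with x * (1 + r * noise_level), where r ~ N(0, 1)."""
    rng = random.Random(seed)  # a local generator keeps runs reproducible
    return [x * (1.0 + rng.gauss(0.0, 1.0) * noise_level) for x in values]


# Invented values standing in for one simulated variable.
clean = [10.0, 9.5, 9.1, 8.8, 8.6]
noisy = add_noise(clean, noise_level=0.05, seed=1)
```

Because the noise is multiplicative, the perturbation scales with the magnitude of each value rather than being a fixed offset.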

11
Experimental Results with IPM
The main results of our studies with IPM on synthetic data were:
1. The system infers accurate estimates for the initial values of unobservable variables like zoo and residue;
2. The system induces estimates of condition thresholds on nitro that are close to the target values; and
3. The MDL criterion selects the correct model structure in all runs with 5% noise, but in only 40% of runs with 10% noise.
These results suggest that the basic approach is sound, but that we should consider other MDL schemes and other responses to overfitting.