Several activity-based transportation models are now becoming operational and are entering the stage of application for the modelling of travel demand. Some of these models use decision rules to support their decision making instead of principles of utility maximization. Decision rules can be derived from different modelling approaches. In a previous study, it was shown that Bayesian networks outperform decision trees and that they are better suited to capture the complexity of the underlying decision-making. However, one disadvantage is that Bayesian networks are somewhat limited in terms of interpretation and efficiency when rules are derived from the network, while rules derived from decision trees in general have a simple and direct interpretation. Therefore, in this study, the idea of combining decision trees and Bayesian networks was explored in order to maintain the potential advantages of both techniques. The paper reports the findings of a methodological study that was conducted in the context of Albatross, which is a sequential rule-based model of activity scheduling behaviour. The results of this study suggest that integrated Bayesian networks and decision trees can be used for modelling the different choice facets of Albatross with better predictive power than CHAID decision trees. Another conclusion is that there are initial indications that the new way of integrating decision trees and Bayesian networks has produced a decision tree that is structurally more stable.

1. INTRODUCTION

Over the last decade, activity-based transportation models have set the standard for modelling travel demand. The most important characteristic of these models is that travel demand is derived from the activities that individuals and households need or wish to perform. The main advantage is that travel no longer has an isolated existence in these models, but is perceived as a way to perform activities and to realize particular goals in life.

Several activity-based models are now becoming operational and are entering the stage of application in transport planning (e.g. Bhat, et al. 2004; Bowman, et al. 1998; Arentze and Timmermans 2000). This multitude of modelling attempts seems to converge into two approaches. First, discrete choice utility-maximizing models that were originally developed for trip and tour data were extended to activity-based models by including more facets. The second approach emphasizes the need for rule-based computational process models, since several scholars claim that utility-maximizing models do not always reflect the true behavioural mechanisms underlying travel decisions (people may reason more in terms of “if-then” structures than in terms of utility-maximizing decisions). For this reason, several studies have shown an increasing interest in computational process models to predict activity-travel patterns. This study contributes to this line of research by focusing on one operational computational process model, i.e. the Albatross system (A Learning Based Transportation Oriented Simulation System), developed by Arentze and Timmermans (2000) for the Dutch Ministry of Transport. Albatross is a multi-agent rule-based system that predicts which activities are conducted where, when, for how long, with whom and the transport mode involved. It uses decision rules to predict each of those facets (where, when, etc.) and to support scheduling decisions.
These decision rules can be derived by various decision tree induction algorithms (C4.5, CHAID, CART, etc.). Comparative studies by Wets, et al. (2000) and Moons, et al. (2004) found evidence that different kinds of decision tree induction algorithms achieve comparable results. A previous study by Janssens, et al. (2004) suggested that Bayesian networks outperform decision trees and that they are better suited to capture the complexity of the underlying decision-making process. In particular, it was found that Bayesian networks are potentially valuable for taking into account the many (inter)dependencies among the variables that make up the complex decision-making process. However, the study also revealed that Bayesian networks had some disadvantages. First, in cases where decision rules need to be derived from the Bayesian network, the technique seemed to be somewhat limited. In particular, each decision rule that is derived from the network contains the same number of conditions, resulting in potentially sub-optimal decision-making. Second, the interpretation of the rules may be an issue. It should be realized that decision rules which are derived from decision trees have a simple direct interpretation: condition states are directly related to choices. In contrast, Bayesian networks link more variables in sometimes complex, direct and indirect ways, making interpretation more problematic. Consequently, it may be interesting to explore the possibility of combining these approaches and to examine where advantages can be maintained. In this paper, the results of such a study are reported. To this end, a novel classification technique is proposed that integrates decision trees and Bayesian networks. The new heuristic is referred to as a Bayesian Network Augmented Tree (BNT) in the remainder of this paper.

The remainder of the paper is organized as follows. First, the conceptual framework underlying the Albatross system is briefly discussed in order to provide some background information with respect to this transportation model. Next, we elaborate on the traditional decision tree formalism as it is also used in the original Albatross system. Third, Bayesian network learning is introduced. In this section, general concepts such as parameter learning, entering evidence and structural learning are given, along with a problem formulation and a description of the new BNT classification algorithm. Section 5 then describes the design of the experiments that were carried out to validate the new approach and gives an overview of the data that were used. Section 6 provides a detailed quantitative analysis and compares the performance of Bayesian networks and BNT at two levels: the activity pattern level and the trip level. Finally, conclusions and implications for the development and application of future activity-based models of travel demand are reported.

2. THE ALBATROSS SYSTEM

The Albatross system (Arentze and Timmermans 2000) is a computational process model that relies on a set of decision rules to predict activity-travel patterns. Rules are typically extracted from activity diary data. The activity scheduling agent of Albatross is the core of the system, which controls the scheduling processes in terms of a sequence of steps. These steps are based on an assumed sequential execution of decision tables to predict activity-travel patterns (see Figure 1). The first step involves, for each person, decisions about which activities to select, with whom the activity is conducted and the duration of the activity. The order in which the (non-work) activities are evaluated is pre-defined as: daily shopping, services, non-daily shopping, social and leisure activities. The assignment of a scheduling position to each selected activity is the result of the next two steps. After a start time interval is selected for an activity, trip-chaining decisions determine for each activity whether the activity has to be connected with a previous and/or next activity. Those trip-chaining decisions are not only important for timing activities but also for organizing trips into tours. The next steps involve the choice of transport mode for work (referred to as mode1), the choice of transport mode for other purposes (referred to as mode2) and the choice of location. Possible interactions between mode and location choices are taken into account by using location information as conditions of mode selection rules. Each decision in the Albatross system (see oval boxes of Figure 1) is extracted from activity-travel diary data using a Chi-squared based technique (hereafter referred to as CHAID (Kass 1980)). As mentioned above, CHAID is a widely used decision tree induction method.

<INSERT FIGURE 1 HERE>

3. DECISION TREES

3.1. General Concepts

Decision trees are state-of-the-art techniques, which are used to make decisions from a set of instances. There are two types of nodes in a decision tree: decision nodes and leaves. Leaves are the terminal nodes of the tree and they specify the ultimate decision of the tree. Decision nodes involve testing a particular attribute. Usually, the test at a decision node compares an attribute value with a constant. Ultimately, to classify an unlabeled instance, the case is routed down the tree according to the values of the attributes tested in successive decision nodes and, when a leaf is reached, the instance is classified according to the probability distribution over all classification possibilities.

The decision tree is typically constructed by means of a “divide-and-conquer” approach. This means that first an attribute is selected to place at the root node of the tree. This root node splits up and divides the dataset into different subsets, one for every value of the root node. Each value is specified by a branch. Then, the construction of the tree becomes a recursive problem, since the process can be repeated for every branch of the tree.
It should be noted that only those instances that actually reach the branch are used in the construction of the tree. In order to determine which attribute to split on, given a set of examples with different classes, different algorithms can be adopted (C4.5, CHAID, CART). The CHAID algorithm in Albatross starts at a root tree node, dividing into child tree nodes until leaf tree nodes terminate branching. The splits are determined using the Chi-squared test.

After the decision tree is constructed, it is easy to convert the tree into a rule set by deriving a rule for each path in the tree that starts at the root and ends at a leaf node. Decision rules are often represented in a decision table formalism. A decision table represents an exhaustive set of mutually exclusive expressions that link conditions to particular actions, preferences or decisions. The decision table formalism guarantees that the choice heuristics are exclusive, consistent and complete. A simplified example of a decision tree along with its corresponding decision table is represented in Figure 2.

<INSERT FIGURE 2 HERE>

3.2. Decision trees: problem formulation

Despite their huge popularity, it has already been shown in other application domains (Bloemer, et al. 2003) that the model structure of decision trees can sometimes be unstable. This means that when carrying out multiple tests, mostly the same variables enter the decision tree but the order in which they enter the tree differs. The reason for this is known as “variable masking”, i.e. if one variable is highly correlated with another, then a small change in the sample data (given several tests) may shift the split in the tree from one variable to another.
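The conversion of a tree into a rule set described in Section 3.1 can be sketched in a few lines of Python. This is only an illustrative sketch: the `Node` and `Leaf` classes, the function `tree_to_rules` and the small example tree (including its leaf decisions) are hypothetical and are not taken from Albatross.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class Leaf:
    decision: str  # the choice predicted at this terminal node


@dataclass
class Node:
    attribute: str  # the attribute tested at this decision node
    branches: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)


Rule = Tuple[List[Tuple[str, str]], str]  # (conditions, decision)


def tree_to_rules(tree: Union[Node, Leaf], conditions=()) -> List[Rule]:
    """Derive one rule for every root-to-leaf path of the tree."""
    if isinstance(tree, Leaf):
        return [(list(conditions), tree.decision)]
    rules: List[Rule] = []
    for value, child in tree.branches.items():
        # Extend the path with the test taken on this branch.
        rules.extend(tree_to_rules(child, conditions + ((tree.attribute, value),)))
    return rules


# Hypothetical example tree; the leaf decisions are assumptions.
example_tree = Node("Driving License", {
    "yes": Node("Number of cars", {"1": Leaf("bike"), ">1": Leaf("car")}),
    "no": Leaf("bike"),
})

rules = tree_to_rules(example_tree)
for conds, decision in rules:
    print(" AND ".join(f"{a}={v}" for a, v in conds), "->", decision)
```

Because every instance follows exactly one path, the resulting rules are mutually exclusive and exhaustive, which is what the decision table formalism requires. Note that the rules need not contain the same number of conditions.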

4. BAYESIAN NETWORK LEARNING

4.1. General Concepts

A Bayesian network consists of two components (Pearl, 1988): first, a directed acyclic graph (DAG) in which nodes represent stochastic domain variables and directed arcs represent conditional dependencies between the variables (see Definitions 1-3) and second, a probability distribution for each node as represented by conditional dependencies captured with the directed acyclic graph (see Definition 4). Bayesian networks are powerful representation and visualization tools that enable users to conceptualise the association between variables. However, as will be explained below, Bayesian networks can also be used for making predictions. To formalize, the following definitions are relevant:

Definition 1 A directed acyclic graph (DAG) is a directed graph that contains no directed cycles.■

Definition 2 A directed graph G can be defined as an ordered pair that consists of a finite set V of vertices or nodes and an adjacency relation E on V. The graph G is denoted as (V,E). For each (a,b)∈E (a and b are nodes) there is a directed edge from node a to node b. In this representation, a is called a parent of b and b is called a child of a. In a graph, this is represented by an arrow which is drawn from node a to node b. For any a∈V, (a,a)∉E, which means that an arc cannot have a node as both its start and end point. Each node in a network corresponds to a particular variable of interest. ■


Definition 3 Edges in a Bayesian network represent direct conditional dependencies between the variables. The absence of edges between variables denotes statements of independence. We say that variables B and C are independent given a set of variables A if P(c|b,a)=P(c|a) for all values a, b and c of variables A, B and C. Variables B and C are also said to be independent conditional on A. ■

Definition 4 A Bayesian network also represents distributions, in addition to representing statements of independence. A distribution is represented by a set of conditional probability tables (CPT). Each node X has an associated CPT that describes the conditional distribution of X given different assignments of values for its parents. ■

<INSERT FIGURE 3 HERE>

The definitions mentioned above are graphically illustrated in Figure 3 by means of a simple hypothetical example. Learning Bayesian networks has traditionally been divided into two categories (Cheng, et al. 1997): structural and parameter learning. Since these learning phases are relevant for the new integrated BNT classifier, the following sections elaborate on them in detail.

4.2. Parameter learning

Parameter learning determines the prior CPT of each node of the network, given the link structures and the data. It can therefore be used to examine quantitatively the strength of the identified effect. As mentioned above, a conditional probability table P(A|B1,…,Bn) has to be attached to each variable A with parents B1,…,Bn. Note that if A has no parents, the table reduces to unconditional probabilities P(A). According to this logic, for the example Bayesian network depicted in Figure 3, the prior unconditional and conditional probabilities to specify are: P(Driving License); P(Gender); P(Number of cars); P(Mode choice|Driving License, Gender, Number of cars). Since the variables “Number of cars”, “Gender” and “Driving License” are not conditionally dependent on other variables, calculating their prior frequency distribution is straightforward. Calculating the initial probabilities for the “Mode choice” variable is computationally more demanding.

In order to calculate the prior probabilities for the “Mode choice” variable, the conditional probability table for P(Mode Choice|Driving License, Gender, Number of cars) was set up in the first part of Table 1. Again, this is a straightforward calculation.
In order to get the prior probabilities for the Mode Choice variable, we now first have to calculate the joint probability P(Choice, Gender, Number of cars, Driving License) and then marginalize “Number of cars”, “Driving License” and “Gender” out. This can be done by applying the product rule of probability, which states that: P(Choice, Gender, Number of cars, Driving License) = P(Choice|Gender, Number of cars, Driving License) * P(Gender, Number of cars, Driving License). Since “Gender”, “Number of cars” and “Driving License” are independent, the equation can be simplified for this example as: P(Choice, Gender, Number of cars, Driving License) = P(Choice|Gender, Number of cars, Driving License) * P(Gender) * P(Number of cars) * P(Driving License). Note that P(Gender=male; Gender=female) = (0.75; 0.25), P(Driving License=yes; Driving License=no) = (0.6; 0.4) and P(Number of cars=1; Number of cars>1) = (0.2; 0.8), which are the prior frequency distributions for those three variables. Using this information, the joint probabilities were calculated in the middle part of Table 1. Marginalizing “Gender”, “Number of cars” and “Driving License” out of P(Choice, Gender, Number of cars, Driving License) yields P(Mode Choice=bike; Mode Choice=car) = (0.506; 0.494). These are the prior probabilities for the “Mode choice” variable. Of course, computations become more complex when “Gender”, “Number of cars” and “Driving License” are dependent. Fortunately, in these cases, probabilities can be calculated automatically by means of probabilistic inference algorithms that are implemented in Bayesian network-enabled software.

<INSERT TABLE 1 HERE>

4.3. Entering evidence

In fact, Figure 3 only depicts the prior distributions for each variable. This is useful but not very innovative information. An important strength of Bayesian networks, however, is that they can compute posterior probability distributions of the variable under consideration, given that the values of some other variables are known. In this case, the known states of variables can be entered as evidence in the network.
When evidence is entered, this is likely to change the states of other variables as well, since they are conditionally dependent. This is demonstrated by entering the evidence in the network that the “Mode choice” variable is equal to “car”. In this case, evidence on “Mode choice” arrives in the form of P*(Mode Choice) = (0, 1), where P* indicates that we are no longer calculating prior probabilities. Then P*(Choice, Gender, Number of cars, Driving License) = P(Number of cars, Gender, Driving License|Mode choice) * P*(Mode Choice) = (P(Choice, Gender, Number of cars, Driving License) * P*(Mode Choice)) / P(Mode Choice). This means that the joint probability table for “Choice”, “Number of cars”, “Driving License” and “Gender” is updated by multiplying by the new distributions and dividing by the old ones. The multiplication consists of annihilating all entries with “Choice”=“bike”. The division by P(Mode Choice) only has an effect on entries with Mode Choice=“car”, so the division is by P(Mode Choice=“car”). For this simple example, the calculations can be found in the lower part of Table 1. The distributions P*(Number of cars), P*(Gender) and P*(Driving License) are calculated through marginalization of P*(Choice, Gender, Number of cars, Driving License). This means that P*(Gender=male; Gender=female) = (0.765; 0.235), P*(Number of cars=1; Number of cars>1) = (0.255; 0.745) and P*(Driving License=yes; Driving License=no) = (0.522; 0.478) when evidence was entered that the “Mode choice” variable equals car. The calculation in this example is simple; in real-life situations, however, it is likely that conditionally dependent relationships between the “choice” variable and other variables exist as well, and as a result the evidence will propagate through the whole network. More information about efficient algorithms for propagation of evidence in Bayesian networks can be found in Pearl (1988) and in Jensen, Lauritzen and Olesen (1990).

4.4. Structural learning

Structural learning determines the dependence and independence of variables and suggests a direction of causation (or association), in other words, the position of the links in the network. Experts can provide the structure of the network using domain knowledge. However, the structure can also be extracted from empirical data. Especially the latter option offers important and interesting opportunities for transportation travel demand modelling because it enables one to visually identify which variable or combination of variables influences the target variable of interest. Structural learning can be divided into two categories: search & scoring methods and dependency analysis methods. Algorithms belonging to the first category interpret the learning problem as a search for the structure that best fits the data.
Different scoring criteria have been suggested to evaluate the structure, such as the Bayesian scoring method (Cooper and Herskovits, 1992; Heckerman, Geiger and Chickering, 1995) and minimum description length (Lam and Bacchus, 1994).

A Bayesian network is essentially a descriptive probabilistic graphical model that is potentially well suited for unsupervised learning. However, the technique can also be tuned so that it becomes suitable for supervised (or classification) learning, just like decision trees, neural networks or, for instance, support vector machines. A number of Bayesian network classifiers (e.g. Naïve Bayes, Tree augmented Naïve Bayes, General Bayesian network) have been developed for this purpose.
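As a rough illustration of the marginalization (Section 4.2) and evidence-entering (Section 4.3) steps, the sketch below works on the two-variable table P(Mode choice, Gender), i.e. the joint distribution after “Number of cars” and “Driving License” have already been marginalized out. The four joint probabilities are the ones used in the calculation of Section 4.6; the helper names `marginal` and `enter_evidence` are assumptions introduced for this sketch only.

```python
# Joint P(Mode choice, Gender); values as used in the Section 4.6 calculation.
joint = {
    ("bike", "male"): 0.372,
    ("car", "male"): 0.378,
    ("bike", "female"): 0.134,
    ("car", "female"): 0.116,
}


def marginal(table, axis):
    """Sum the joint table over every variable except the one at position `axis`."""
    out = {}
    for key, p in table.items():
        out[key[axis]] = out.get(key[axis], 0.0) + p
    return out


def enter_evidence(table, axis, state):
    """Posterior joint after observing that the variable at `axis` equals `state`.

    Entries inconsistent with the evidence are annihilated; the remaining
    entries are divided by the prior probability of the observed state.
    """
    norm = marginal(table, axis)[state]
    return {k: (p / norm if k[axis] == state else 0.0) for k, p in table.items()}


prior_mode = marginal(joint, 0)              # P(Mode choice)
posterior = enter_evidence(joint, 0, "car")  # P*(Mode choice, Gender)
posterior_gender = marginal(posterior, 1)    # P*(Gender)

print({k: round(v, 3) for k, v in prior_mode.items()})
print({k: round(v, 3) for k, v in posterior_gender.items()})
```

Under these inputs the prior for “Mode choice” and the posterior for “Gender” agree with the values (0.506; 0.494) and (0.765; 0.235) reported in the text. In a full network the same update is carried out on the complete joint table, or more efficiently by a propagation algorithm.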

4.5. Bayesian network classifiers: problem formulation

While Bayesian network classifiers have proven to give accurate results in a transportation context (Janssens, et al. 2004; Torres and Huber, 2003), their Achilles' heel is the decision rules which can be derived from the Bayesian network. As mentioned above, each decision rule that is derived from the network contains the same number of conditions, resulting in potentially sub-optimal decision-making. Second, Bayesian networks link more variables in sometimes complex, direct and indirect ways, making interpretation more problematic.

To illustrate this, the procedure of transforming a Bayesian network into a decision table (i.e. rule-based form) is shown in the second part of Figure 4. In this figure, probability distributions of the target variable are calculated for every possible combination of states. This can be done by entering evidence for those states in the network (see section 4.3). An example is shown in the middle part of Figure 4.

<INSERT FIGURE 4 HERE>

As can be seen from this figure, every rule contains the same number of condition variables. For the example shown here, this number is equal to 4. Moreover, the number of rules that are derived from the network is fixed and can be determined in advance for a particular network. This number is equal to the number of possible combinations of states (values of the condition variables). Therefore, the total number of rules that has to be derived from the network shown in Figure 4 is equal to 2*3*2*7=84, assuming that the duration attribute is taken as the class attribute. Especially when more nodes are incorporated, this number is likely to become extremely large. While this does not need to be a problem as such, it is obvious that a number of these decision rules will be redundant as they will never be “fired”. This flaw has no influence on the total accuracy of each Bayesian network classifier (see Janssens, et al. 2004), but it is clearly a sub-optimal solution, not only because some of the rules will never be used, but also because this large number of conditions does not favour interpretation.

For both reasons mentioned here, i.e. the possibility of combining the advantage of Bayesian networks (taking into account the interdependencies among variables) and the advantage of decision trees (deriving easily understandable and flexible (i.e. non-fixed) decision rules), and for the reason mentioned before, i.e. dealing with the variable masking problem in decision trees, the idea to integrate both techniques into a new classifier was conceived.
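The fixed size of a decision table derived from a Bayesian network can be checked with a short enumeration. In the sketch below, only the state counts (2, 3, 2 and 7) come from the text; the variable names `C1` to `C4` are placeholders, since the actual condition variables are only shown in Figure 4.

```python
from itertools import product
from math import prod

# Number of states per condition variable (names are placeholders).
state_counts = {"C1": 2, "C2": 3, "C3": 2, "C4": 7}

# One rule is needed for every combination of condition states.
n_rules = prod(state_counts.values())

# Enumerate the combinations explicitly; each tuple is the condition part
# of one rule and always has one entry per condition variable.
combinations = list(product(*(range(n) for n in state_counts.values())))

print(n_rules, len(combinations))
```

Adding one more condition variable with k states multiplies the number of rules by k, which is why the table grows so quickly when more nodes are incorporated.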


4.6. Towards a new integrated classifier

In the integrated BNT classifier, the idea is to derive a decision tree from a Bayesian network (that is built upon the original data) instead of deriving the tree directly from the original data. By doing so, it is expected that the structure of the tree is more stable, especially because the variable correlations are already taken into account in the Bayesian network, which may reduce the variable masking problem. To the best of our knowledge, the idea of building decision trees in this way has not been explored in previous studies.

In order to select a particular decision node in the BNT classifier, the mutual information value that is calculated between two nodes in the Bayesian network is used. This mutual information value is to some extent equivalent to the entropy measure that C4.5 decision trees use. It is defined as the expected entropy reduction of one node due to a finding (observation) related to the other node. The dependent variable is called the query variable (denoted by the symbol Q), the independent variables are called findings variables (denoted by the symbol F). Therefore, the expected reduction in entropy (measured in bits) of Q due to a finding related to F can be calculated according to the following equation (Pearl, 1988):

$$I(Q, F) = \sum_{q} \sum_{f} p(q, f) \log_2 \left( \frac{p(q, f)}{p(q)\, p(f)} \right) \qquad (1)$$

where p(q,f) is the joint probability that a particular state of Q (q) and a particular state of F (f) occur together; p(q) is the prior probability that a state q of Q will occur and p(f) is the prior probability that a state f of F will occur. The probabilities are summed across all states of Q and across all states of F.

The expected reduction in entropy of the dependent variable can be calculated for the various findings variables. The findings variable that obtains the highest reduction in entropy is selected as the root node in the tree.
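Equation (1) can be evaluated directly from a joint probability table. The following sketch is one possible way to do so; the function name `mutual_information` and the dictionary layout are assumptions made for illustration. The joint probabilities of “Mode choice” and “Gender” and the three mutual information values in `reported` are the figures given in the worked example of this section.

```python
from math import log2


def mutual_information(joint):
    """Expected entropy reduction I(Q, F) in bits.

    `joint` maps (q, f) state pairs to the joint probability p(q, f).
    """
    p_q, p_f = {}, {}
    for (q, f), p in joint.items():
        p_q[q] = p_q.get(q, 0.0) + p  # marginal of the query variable
        p_f[f] = p_f.get(f, 0.0) + p  # marginal of the findings variable
    return sum(
        p * log2(p / (p_q[q] * p_f[f]))
        for (q, f), p in joint.items()
        if p > 0  # zero cells contribute nothing to the sum
    )


# Joint P(Mode choice, Gender) from the worked example.
joint_mode_gender = {
    ("bike", "male"): 0.372,
    ("car", "male"): 0.378,
    ("bike", "female"): 0.134,
    ("car", "female"): 0.116,
}
mi_gender = mutual_information(joint_mode_gender)
print(round(mi_gender, 5))

# Root selection: pick the findings variable with the largest value.
reported = {"Gender": 0.00087, "Driving License": 0.01781, "Number of cars": 0.01346}
root = max(reported, key=reported.get)
print(root)
```

The base-2 logarithm gives the result in bits, and with these inputs the value for “Gender” agrees with the reported 0.00087 up to rounding. For the lower levels of the tree, the same function would be applied to joint tables obtained after entering evidence for the branch under consideration.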
To better illustrate the idea of building a BNT classifier, we consider again the network that was shown in Figure 3 by way of example. In this case, the dependent variable is “Mode choice” and the different findings variables are “Driving License”, “Gender” and “Number of cars”. In a first step, we can for instance calculate the expected reduction in entropy between the “Mode choice” and the “Gender” variable.

The calculation of the joint probabilities P(Mode_i, Gender_j) for i={bike, car} and j={male, female} is exactly the same as explained in section 4.2. The calculation of the individual prior probabilities P(Mode_i) and P(Gender_j) is straightforward as well (see section 4.2). As a result, formula (1) gives:

$$I(\text{Mode choice}, \text{Gender}) = 0.372 \log_2 \frac{0.372}{0.75 \cdot 0.506} + 0.378 \log_2 \frac{0.378}{0.75 \cdot 0.494} + 0.134 \log_2 \frac{0.134}{0.25 \cdot 0.506} + 0.116 \log_2 \frac{0.116}{0.25 \cdot 0.494} = 0.00087$$

In a similar way, I(Mode choice, Driving License) = 0.01781 and I(Mode choice, Number of cars) = 0.01346 can be calculated. Since I(Mode choice, Driving License) > I(Mode choice, Number of cars) > I(Mode choice, Gender), the variable Driving License is selected as the root node of the tree (see Figure 5). Once the root node has been determined, the tree is split up into different branches according to the different states (values) of the root node. To this end, evidence can be entered for each state of the root node in the Bayesian network and the entropy value can be re-calculated for all other combinations between the findings nodes (except for the root node) and the query node. The node which achieves the highest entropy reduction is used for splitting up that particular branch of the root node. In our example, the root node “Driving License” has two branches: Driving License=yes and Driving License=no. For the split in the first branch (Driving License=yes), only two variables have to be taken into account (since the root node is excluded): “Number of cars” and “Gender”. The way in which the expected reduction in entropy is calculated is the same as shown above, except that evidence needs to be entered for the node “Driving License”, i.e. P(Driving License=yes; Driving License=no) = (1; 0) (since we are in the first branch). The procedure for doing this was already described in section 4.3. This yields I(Mode choice, Gender) = 0.02282 and I(Mode choice, Number of cars) = 0.07630.
Since I(Mode choice, Number of cars) > I(Mode choice, Gender), the variable “Number of cars” is selected as the next split in this first branch. Finally, the whole process becomes recursive and needs to be repeated for all possible branches in the tree. Computer code has been written to automate the whole process. The final decision tree for this simple Bayesian network is shown in Figure 5.

<INSERT FIGURE 5 HERE>

5. DATA AND DESIGN OF THE EXPERIMENTS

5.1. Data

The activity diary data used in this study were collected in the municipalities of Hendrik-Ido-Ambacht and Zwijndrecht in the Netherlands (South Rotterdam region) to develop the Albatross model system (Arentze and Timmermans 2000). The data involve a full activity diary, implying that both in-home and out-of-home activities were reported. The sample covered all seven days of the week, but individual respondents were requested to complete the diaries for two designated consecutive days. Respondents were asked, for each successive activity, to provide information about the nature of the activity, the day, start and end time, the location where the activity took place, the transport mode, the travel time, accompanying individuals and whether the activity was planned or not. A pre-coded scheme was used for activity reporting. After cleaning, a data set of a random sample of 1649 respondents was used in the experiments.

There are some general variables that are used for each choice facet of the Albatross model (i.e. each oval box). These include (among others) household and person characteristics that might be relevant for the segmentation of the sample. Each dimension also has its own extensive list of more specific variables, which are not described here in detail.

5.2. Design of the Experiments

The aim of this study is to examine both the predictive capabilities and the potential advantages of the BNT classifier. To this end, the predictive performance of this integration technique is compared with a decision tree learning algorithm (CHAID) and with original Bayesian network learning.

For the CHAID decision tree approach, experiments were conducted for the full set of decision agents of the Albatross system. First, decision trees were extracted from activity-travel diaries. These decision trees were then converted into decision tables as described in section 3.1. Next, the decision tables were successively executed to predict the activity-travel patterns for the randomly selected sample of 1649 respondents.

For the Bayesian network approach, a Bayesian network was constructed for every decision agent using a structural learning algorithm developed by Cheng, et al. (1997). This implies that the structure of the network was not imposed on the basis of a priori domain knowledge, but was learned from the data. The structural learning algorithm was also enhanced by adding a pruning stage. This pruning stage aims at reducing the size of the network without resulting in any loss of relevant information or loss of accuracy. This means that nodes which are not valuable for decision-making are pruned away. In order to decide which nodes in the network are suitable for pruning, the reduction in entropy between two nodes was calculated using equation (1), shown in section 4.6. A large entropy reduction indicates a potentially important and useful node in the network. An entropy reduction of less than 0.05 bits was used as a threshold to prune the network. Once the pruned network is constructed for every decision agent, the model can be used for prediction. To this end, probability distributions of all the variables in the networks have to be computed. A parameter learning algorithm developed by Lauritzen (1995) was used to calculate these probability distributions. An example of such a distribution can be seen in the left part of Figure 4, where each state in the network is shown with its belief level (probability) expressed as a percentage and a bar chart. The last step is to transform the predictive model to the decision table formalism. The approach for doing so was already described in section 4.5.

For building the BNT classifier, a decision tree is not derived directly from the original data, but from the Bayesian networks that are built in the previous step. The procedure for doing this was explained in section 4.6, while the approach in section 3.1 can be used here as well for converting the tree into a decision table format.
Once again, the decision tables are then sequentially executed to predict activity-travel patterns.

In the next section, we report the results of detailed quantitative analyses that were conducted to evaluate the BNT classifier for every decision agent in the Albatross model. The results of the three alternative approaches are validated in terms of accuracy percentages. The techniques are also compared at both the activity pattern level and the trip level.

6. RESULTS

6.1. Model Comparison: Accuracy Results

To be able to test the validity of the presented models on a holdout sample, only a subset of the cases is used to build the models (i.e., the “training set”). The decline in goodness-of-fit between this training set and the validation set is taken as an indicator of the degree of overfitting. The purpose of the validation test is also to evaluate the predictive ability of the three techniques for a new set of cases. For each decision step, we used a random sample of 75% of the cases to build and optimise the models. The remaining 25% of the cases was presented to the models as “unseen” data and served as the validation set. The accuracy percentages that indicate the predictive performance of the three models on the training and validation sets are presented in Table 2.

It can be seen that the accuracy percentages of the BNT classifier are comparable with those of the Bayesian network approach. This is not surprising, because Bayesian networks were used as the underlying structure of the decision trees. More important is the observation that the newly proposed BNT classifier outperformed the CHAID decision trees for all nine decision agents of the Albatross model. This means that by using Bayesian networks as the underlying structure for building decision trees, better results can be obtained than with a traditional CHAID-based decision tree approach. In terms of the validity of the three models, we can conclude that the degree of overfitting (i.e., the difference between the accuracy on the training set and on the validation set) is low for all decision agents. Therefore, we conclude that the transferability of the models to a new set of cases is satisfactory. The next section examines the results of the models at the pattern level.

<INSERT TABLE 2 HERE>

6.2. Activity Pattern Level Analysis

As explained before, the set of decision tables derived for each model predicts activity schedules, assuming the sequential execution of the decision tables depicted in Figure 1. At the activity pattern level, sequence alignment methods (SAM) (Joh, et al., 2001a) were used to calculate the similarity between observed and generated activity schedules.
This measure allows users to evaluate the goodness-of-fit. SAM originally stems from work in molecular biology, where it is used to measure the biological distance between DNA and RNA strings. Later, it was used in time use research (Wilson, 1998). To account for differences in composition as well as in the sequential order of elements, SAM determines the minimum effort required to make two strings identical using insertion, deletion and substitution operations.

The mean SAM distances between the observed and the predicted schedules are shown in Table 3. SAM distances were calculated separately for the qualitative activity pattern attributes (activity type, with-whom, location and mode). In addition, both “UDSAM” and “MDSAM” measures were calculated. UDSAM represents a weighted sum of attribute SAM values, where activity type was given a weight of two units and the other attributes a weight of one unit. To account for the multidimensionality that is incorporated in the Albatross model, the MDSAM measure (Joh, et al., 2001b) was used. The lower the SAM measure, the higher the degree of similarity between observed and predicted activity sequences.

<INSERT TABLE 3 HERE>

The number of decision rules derived from the Bayesian network is much larger than the number of rules extracted from either decision tree. This difference should be attributed to the different nature of the technique; the number of variables that are incorporated is the same. This larger number of rules has no negative impact on performance; in fact, Table 3 provides evidence to the contrary. However, as said before, Bayesian networks likely suffer from (i) suboptimal decision making because many rules may never be used and (ii) interpretation difficulties. The integration approach developed in this study aimed at solving both potential problems. It can be seen from Table 3 that the decision tree using the Bayesian network as the underlying structure performed better than the state-of-the-art CHAID decision tree approach. Compared to Bayesian networks, however, some of the very good performance is lost. On the other hand, the rules derived from the integrated decision tree are easier to understand and do not contain a fixed number of variables in their conditions.

Thus, it seems that the problem can be reduced to a trade-off between model complexity and accuracy. Since the CHAID decision tree approach and the BNT classifier are comparable in terms of model complexity (both approaches contain more or less the same number of decision rules), the BNT classifier is a better way of predicting activity schedules when a pattern level performance measure is used for validation.
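
The pattern-level measures of this section can be made concrete with a short Python sketch. It assumes that the SAM distance for a single attribute is a plain edit distance with unit costs for insertion, deletion and substitution; the operation weights actually used by Joh, et al. (2001a) are not given here and may differ. UDSAM is then formed as the weighted sum described above (weight two for activity type, one for the other attributes). MDSAM is not covered. All function names and the example schedules are hypothetical.

```python
def sam_distance(observed, predicted):
    """Minimum number of insertions, deletions and substitutions
    (unit cost each, an assumption) needed to make the sequences identical."""
    prev = list(range(len(predicted) + 1))
    for i, obs in enumerate(observed, start=1):
        curr = [i]
        for j, pred in enumerate(predicted, start=1):
            cost = 0 if obs == pred else 1
            curr.append(min(
                prev[j] + 1,         # delete an observed element
                curr[j - 1] + 1,     # insert a predicted element
                prev[j - 1] + cost,  # substitute, or keep a match
            ))
        prev = curr
    return prev[-1]


# Weights as stated in the text: activity type counts double.
UDSAM_WEIGHTS = {"activity_type": 2, "with_whom": 1, "location": 1, "mode": 1}


def udsam(observed, predicted, weights=UDSAM_WEIGHTS):
    """Weighted sum of the per-attribute SAM distances."""
    return sum(
        w * sam_distance(observed[attr], predicted[attr])
        for attr, w in weights.items()
    )


# Hypothetical observed and predicted schedules for one person-day.
observed = {
    "activity_type": ["home", "work", "shop", "home"],
    "with_whom": ["alone", "alone", "partner", "alone"],
    "location": ["z1", "z2", "z3", "z1"],
    "mode": ["none", "car", "car", "car"],
}
predicted = {
    "activity_type": ["home", "work", "home"],
    "with_whom": ["alone", "alone", "alone"],
    "location": ["z1", "z2", "z1"],
    "mode": ["none", "car", "bike"],
}
print(udsam(observed, predicted))  # 6
```

In this example the predicted schedule misses one activity, which costs one operation for activity type, with-whom and location, and two operations for mode, giving 2·1 + 1 + 1 + 2 = 6. Averaging such distances over all schedules would give mean values of the kind reported in Table 3.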

6.3. Trip Matrix Level Analysis

The last measure to evaluate the predictive performance is calculated at the trip level. The origins and destinations of each trip, derived from the activity patterns, are used to build OD matrices. The origin locations are represented in the rows of the matrix and the destination locations in the columns. The number of trips from a certain origin to a certain destination is used as a matrix entry. A third dimension was added to the matrix to break down the interactions according to a third variable. The third dimensions considered are day of the week, transport mode and primary activity. The two-dimensional case (no third dimension) was considered as well. In order to determine the degree of correspondence between predicted and observed matrices, a correlation coefficient was calculated. To this end, the cells of the matrix were rearranged into one array, and the correlation was calculated by comparing the corresponding elements of the predicted and the observed array. The results are presented in Table 4.

It can be seen from Table 4 that the Bayesian network model generates higher correlation coefficients between observed and predicted OD matrices than the original CHAID-based Albatross model and the BNT classifier. Unfortunately, the good performance of BNT at the activity pattern level could not be maintained at the trip level. The correlation coefficient is especially low for the OD matrix in which the primary activity is taken as the third dimension. Although this should be the subject of future research, it is believed that the integrated approach predicted fewer activities in the activity schedule than both CHAID and Bayesian networks. This deficiency is less apparent at the pattern level than at the trip level. The smaller number of rules used in the integrated decision tree may be responsible for this.

<INSERT TABLE 4 HERE>

7. CONCLUSION AND DISCUSSION OF THE RESULTS

Several activity-based models are nowadays becoming operational and are entering the stage of application in transport planning. Some of these models (like Albatross) rely on a set of decision rules that are derived from activity-travel diary data rather than on principles of utility maximization. While the use of rules may have some theoretical advantages, the performance of several rule induction algorithms in models of activity-travel behaviour is not well understood. This is unfortunate because there is some empirical evidence that decision tree induction algorithms are relatively sensitive to random fluctuations in the data. To add to the growing literature on the performance of alternative decision tree induction algorithms, we proposed in this paper a way of combining Bayesian networks and decision trees.
The idea was motivated by the results of a previous study (Janssens, et al., 2004), which suggested that Bayesian networks outperformed decision trees but that model complexity also increased significantly along with the increase in accuracy. For this reason, this study was designed to examine whether a decision tree (which is implicitly always less complex) that uses the structure of a Bayesian network to select its decision nodes (referred to as the Bayesian network augmented tree classifier, BNT) can simultaneously achieve accuracy results comparable to Bayesian networks and an easier and less complex model structure, comparable to traditional decision trees.

In order to test the validity of the proposed approach and its transferability to a new set of cases, datasets were split up into training and validation sets. The predictive performance of the new approach was evaluated at three different levels. The test has shown that the BNT approach indeed achieved accuracy results comparable to those of Bayesian networks and, along with the Bayesian networks, outperformed CHAID decision trees for all decision agents of the Albatross model. Moreover, the results showed that the decrease in model complexity of the BNT classifier also led to a decrease in performance at the activity pattern level in comparison with Bayesian networks. However, when the BNT approach was compared to the equally complex CHAID decision tree, BNT outperformed CHAID at the pattern level.

Finally, at the third level of validation, the trip matrix level, correlation coefficients between observed and predicted origin-destination matrices showed that Bayesian networks outperform both CHAID and BNT. Thus, the good results of the integrated decision tree were not maintained at the trip level. It is believed that the technique predicted fewer activities in the activity schedule than both CHAID and Bayesian networks. While the smaller number of rules that are used may be responsible for this, this finding should be the subject of additional research.

In summary, this study has shown some interesting results. There are some initial indications that the new way of integrating decision trees and Bayesian networks may produce a decision tree that is structurally more stable and less vulnerable to the variable masking problem. Additionally, the results at the activity pattern level and trip level suggest, at least for the Albatross data, a trade-off between model accuracy and model complexity. When the main issue is the interpretation and the general understanding of the decision rules, the integrated BNT approach may be favoured above CHAID decision trees when decisions need to be made at the pattern level. At a more detailed level, one may benefit from the use of the CHAID approach. However, when the main issue is model accuracy, Bayesian networks should be favoured. Further research should examine the behaviour of the three techniques under evaluation on other data sets.