This paper provides algorithms that use an information-theoretic analysis to learn Bayesian network structures from data. Based on our three-phase learning framework, we develop efficient algorithms that can effectively learn Bayesian networks, requiring only polynomial numbers of conditional independence (CI) tests in typical cases. We provide precise conditions that specify when these algorithms are guaranteed to be correct as well as empirical evidence (from real world applications and simulation tests) that demonstrates that these systems work efficiently and reliably in practice.

In this paper, we empirically evaluate algorithms for learning four types of Bayesian network (BN) classifiers -- Naïve-Bayes, tree-augmented Naïve-Bayes, BN-augmented Naïve-Bayes and general BNs, where the latter two are learned using two variants of a conditional-independence (CI) based BN-learning algorithm. Experimental results show that the classifiers learned using the CI-based algorithms are competitive with (or superior to) the best known classifiers, based on both Bayesian networks and other formalisms, and that the computational time for learning and using these classifiers is relatively small. Moreover, these results also suggest a way to learn yet more effective classifiers; we demonstrate empirically that this new algorithm does work as expected. Collectively, these results argue that BN classifiers deserve more attention in the machine learning and data mining communities.

1 INTRODUCTION Many tasks -- including fault diagnosis, pattern recognition and forecasting -- c...
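For orientation, the simplest of the four classifier families compared above is Naïve-Bayes; a minimal discrete sketch (illustrative only, with a simple add-one smoothing scheme, not the paper's implementation):

```python
# Minimal discrete Naive-Bayes: class priors plus per-feature value counts,
# smoothed with add-one counts and one extra slot for unseen values.
from collections import Counter, defaultdict

def train_nb(X, y):
    """Class priors and per-(feature, class) value counts."""
    priors = Counter(y)
    cond = defaultdict(Counter)   # (feature index, class label) -> value counts
    for row, label in zip(X, y):
        for j, v in enumerate(row):
            cond[(j, label)][v] += 1
    return priors, cond

def predict_nb(priors, cond, row):
    """Pick the class maximizing prior * product of smoothed likelihoods."""
    n = sum(priors.values())
    best, best_p = None, -1.0
    for label, c in priors.items():
        p = c / n
        for j, v in enumerate(row):
            counts = cond[(j, label)]
            p *= (counts[v] + 1) / (sum(counts.values()) + len(counts) + 1)
        if p > best_p:
            best, best_p = label, p
    return best

# Toy training data: feature 0 tracks the class label.
X = [[1, 0], [1, 1], [0, 0], [0, 1]]
y = ["pos", "pos", "neg", "neg"]
priors, cond = train_nb(X, y)
print(predict_nb(priors, cond, [1, 0]))   # -> "pos"
```

The tree-augmented and BN-augmented variants in the abstract relax this model's assumption that features are independent given the class.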

by
Jie Cheng, David A. Bell, Weiru Liu
- In Proceedings of the Sixth ACM International Conference on Information and Knowledge Management

This paper presents an efficient algorithm for learning Bayesian belief networks from databases. The algorithm takes a database as input and constructs the belief network structure as output. The construction process is based on the computation of mutual information of attribute pairs. Given a data set that is large enough, this algorithm can generate a belief network very close to the underlying model, and at the same time enjoys a time complexity of O(N^4) on conditional independence (CI) tests. When the data set has a normal DAG-Faithful (see Section 3.2) probability distribution, the algorithm guarantees that the structure of a perfect map [Pearl, 1988] of the underlying dependency model is generated. To evaluate this algorithm, we present experimental results on three versions of the well-known ALARM network database, which has 37 attributes and 10,000 records. The results show that this algorithm is accurate and efficient. The proof of correctness and the analysis of c...
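The pairwise mutual-information computation this abstract describes can be sketched as follows; the toy data and the edge ranking are illustrative, not the authors' code:

```python
# Empirical mutual information I(X;Y) between discrete attribute columns,
# used to rank candidate edges by strength of dependence.
from collections import Counter
from itertools import combinations
from math import log2

def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete columns."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        # p_joint * n * n / (px * py) equals p(x,y) / (p(x) * p(y))
        mi += p_joint * log2(p_joint * n * n / (px[x] * py[y]))
    return mi

# Rank attribute pairs of a toy database by mutual information.
data = {
    "A": [0, 0, 1, 1, 0, 1, 1, 0],
    "B": [0, 0, 1, 1, 0, 1, 0, 0],   # strongly dependent on A
    "C": [0, 1, 0, 1, 0, 1, 0, 1],   # independent of A in this sample
}
pairs = sorted(
    ((mutual_information(data[u], data[v]), u, v)
     for u, v in combinations(data, 2)),
    reverse=True,
)
for mi, u, v in pairs:
    print(f"I({u};{v}) = {mi:.3f}")
```

High-scoring pairs become candidate edges; the abstract's O(N^4) bound concerns the conditional tests that follow this ranking phase.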

This paper investigates methods for learning predictive classifiers based on Bayesian belief networks (BN) -- primarily unrestricted Bayesian networks and Bayesian multinets. We present our algorithms for learning these classifiers, and discuss how these methods address the overfitting problem and provide a natural method for feature subset selection. Using a set of standard classification problems, we empirically evaluate the performance of various BN-based classifiers. The results show that the proposed BN and Bayesian multinet classifiers are competitive with (or superior to) the best known classifiers, based on both BN and other formalisms, and that the computational time for learning and using these classifiers is relatively small. These results argue that BN-based classifiers deserve more attention in the data mining community.

1 Introduction Many tasks -- including fault diagnosis, pattern recognition and forecasting -- can be viewed as classification, as each r...

This paper addresses the problem of learning Bayesian network structures from data by using an information-theoretic dependency analysis approach. Based on our three-phase construction mechanism, two efficient algorithms have been developed. One of our algorithms deals with a special case where the node ordering is given; it requires only O(N^2) CI tests and is correct given that the underlying model is DAG-Faithful [Spirtes et al., 1996]. The other algorithm deals with the general case and requires O(N^4) conditional independence (CI) tests. It is correct given that the underlying model is monotone DAG-Faithful (see Section 4.4). A system based on these algorithms has been developed and distributed through the Internet. The empirical results show that our approach is efficient and reliable.

1 Introduction The Bayesian network is a powerful knowledge representation and reasoning tool under conditions of uncertainty. A Bayesian network is a directed acyclic graph ...
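A hedged sketch of the kind of CI test being counted here: empirical conditional mutual information I(X;Y|Z) compared against a small threshold. The threshold value and the toy data are assumptions for illustration, not the authors' exact procedure:

```python
# CI test via empirical conditional mutual information on discrete data:
# I(X;Y|Z) = sum over (x,y,z) of p(x,y,z) * log[ p(x,y,z) p(z) / (p(x,z) p(y,z)) ]
from collections import Counter
from math import log2

def conditional_mutual_information(xs, ys, zs):
    """Empirical I(X;Y|Z) in bits for discrete sequences of equal length."""
    n = len(xs)
    c_xyz = Counter(zip(xs, ys, zs))
    c_xz = Counter(zip(xs, zs))
    c_yz = Counter(zip(ys, zs))
    c_z = Counter(zs)
    cmi = 0.0
    for (x, y, z), c in c_xyz.items():
        # c * c_z / (c_xz * c_yz) is the ratio of empirical probabilities above
        cmi += (c / n) * log2(c * c_z[z] / (c_xz[(x, z)] * c_yz[(y, z)]))
    return cmi

def ci_test(xs, ys, zs, threshold=0.01):
    """Declare X and Y conditionally independent given Z if I(X;Y|Z) is tiny."""
    return conditional_mutual_information(xs, ys, zs) < threshold

# Deterministic chain X -> Z -> Y: X and Y are marginally dependent
# but independent once Z is known.
x = [0, 0, 1, 1, 0, 1, 0, 1]
z = x[:]                       # Z copies X
y = z[:]                       # Y copies Z
print(ci_test(x, y, z))        # True: conditioning on Z screens off X from Y
print(ci_test(x, y, [0] * 8))  # False: with a constant Z this is the marginal test
```

The O(N^2) vs. O(N^4) bounds in the abstract count how many such tests the two algorithms may issue.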

Motivation: In drug discovery a key task is to identify characteristics that separate active (binding) compounds from inactive (non-binding) ones. An automated prediction system can help reduce the resources necessary to carry out this task. Results: Two methods for prediction of molecular bioactivity for drug design are introduced and shown to perform well on a data set previously studied as part of the KDD (Knowledge Discovery and Data Mining) Cup 2001. The data is characterized by very few positive examples, a very large number of features (describing three-dimensional properties of the molecules) and rather different distributions between training and test data. Two techniques are introduced specifically to tackle these problems: a feature selection method for unbalanced data, and a classifier which adapts to the distribution of the unlabeled test data (a so-called transductive method). We show that both techniques improve identification performance and that, in conjunction, they provide an improvement over using only one of the techniques. Our results suggest the importance of taking into account the characteristics of this data, which may also be relevant in other problems of a similar type. Availability: Matlab source code is available at
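As one illustrative reading of "a feature selection method for unbalanced data" (the paper's actual method is not specified in this abstract), features can be scored by the gap between their mean value in the rare positive class and in the negative class, so that the few positives are not drowned out by the majority class:

```python
# Stand-in feature score for unbalanced binary classification:
# per-feature absolute gap between class-conditional means.
def feature_scores(X, y):
    """X: list of feature vectors; y: 0/1 labels. Returns |mean_pos - mean_neg| per feature."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    d = len(X[0])
    scores = []
    for j in range(d):
        mean_pos = sum(row[j] for row in pos) / len(pos)
        mean_neg = sum(row[j] for row in neg) / len(neg)
        scores.append(abs(mean_pos - mean_neg))
    return scores

# Feature 0 separates the classes; feature 1 is noise.
X = [[1, 0], [1, 1], [0, 0], [0, 1], [0, 0], [0, 1]]
y = [1, 1, 0, 0, 0, 0]   # few positives, many negatives
print(feature_scores(X, y))   # -> [1.0, 0.0]
```

Keeping only the top-scoring features is one way to shrink the very large feature set the abstract mentions before training a classifier.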

Although many algorithms have been designed to construct Bayesian network structures using different approaches and principles, they all employ only two methods: those based on independence criteria, and those based on a scoring function and a search procedure (although some methods combine the two). Within the score+search paradigm, the dominant approach uses local search methods in the space of directed acyclic graphs (DAGs), where the usual choices for defining the elementary modifications (local changes) that can be applied are arc addition, arc deletion, and arc reversal. In this paper, we propose a new local search method that uses a different search space, and which takes account of the concept of equivalence between network structures: restricted acyclic partially directed graphs (RPDAGs). In this way, the number of different configurations of the search space is reduced, thus improving efficiency. Moreover, although the final result must necessarily be a local optimum given the nature of the search method, the topology of the new search space, which avoids making early decisions about the directions of the arcs, may help to find better local optima than those obtained by searching in the DAG space.
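The three elementary DAG moves listed above (arc addition, deletion, reversal) can be sketched as a greedy local search; the scoring function here is a placeholder argument standing in for BIC/BDeu-style scores, and the RPDAG search space itself is not shown:

```python
# Greedy local search over DAGs using the classical move set:
# arc addition, arc deletion, arc reversal (acyclicity preserved).
from itertools import permutations

def is_acyclic(nodes, arcs):
    """DFS cycle check over a directed graph given as a set of (u, v) arcs."""
    adj = {u: [v for a, v in arcs if a == u] for u in nodes}
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {u: WHITE for u in nodes}
    def dfs(u):
        color[u] = GRAY
        for v in adj[u]:
            if color[v] == GRAY or (color[v] == WHITE and not dfs(v)):
                return False
        color[u] = BLACK
        return True
    return all(color[u] != WHITE or dfs(u) for u in nodes)

def neighbors(nodes, arcs):
    """All DAGs reachable by one arc addition, deletion, or reversal."""
    out = []
    for u, v in permutations(nodes, 2):
        if (u, v) not in arcs and (v, u) not in arcs:
            cand = arcs | {(u, v)}                      # addition
            if is_acyclic(nodes, cand):
                out.append(cand)
    for u, v in arcs:
        out.append(arcs - {(u, v)})                     # deletion (always acyclic)
        cand = (arcs - {(u, v)}) | {(v, u)}             # reversal
        if is_acyclic(nodes, cand):
            out.append(cand)
    return out

def hill_climb(nodes, score, arcs=frozenset()):
    """Greedy ascent: move to the best-scoring neighbor until none improves."""
    arcs = set(arcs)
    while True:
        best = max(neighbors(nodes, arcs), key=score, default=arcs)
        if score(best) <= score(arcs):
            return arcs
        arcs = best

# Toy score: distance to a known target structure (stands in for a data score).
target = {("A", "B"), ("B", "C")}
score = lambda arcs: -len(arcs ^ target)
print(hill_climb(["A", "B", "C"], score))
```

Searching over RPDAGs instead collapses score-equivalent DAGs (e.g. A -> B and B -> A with no other arcs) into one state, which is the efficiency gain the abstract describes.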

In this paper we present a novel constraint-based structural learning algorithm for causal networks. A set of conditional independence and dependence statements (CIDS) is derived from the data, which describes the relationships among the variables. Although we implicitly assume that there exists a perfect map for the true, yet unknown, distribution, there need not be a perfect map for the CIDSs derived from the limited data. The reason is that the distribution of limited data might differ from the true probability distribution due to sampling noise. We derive a necessary condition for the existence of a perfect map given a set of CIDSs and utilize it to check for inconsistencies. If an inconsistency is detected, the algorithm finds all Bayesian networks with a minimum number of edges such that a maximum number of CIDSs is represented in each of the multiple solutions. The advantages of our approach are illustrated using the ALARM network data set.