from the UCI repository (699 patterns), and the Heart disease dataset from Statlog (270 patterns). An ensemble consisting of two networks, each with five hidden nodes, was trained using NC. We use 5-fold cross-validation, and 40 trials from uniform random weights in
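Negative correlation (NC) learning, in the common formulation of Liu and Yao, adds to each member's squared error a penalty p_i = (F_i - F_bar) * sum_{j != i} (F_j - F_bar), where F_bar is the ensemble's mean output. The minimal sketch below assumes that formulation and a penalty strength `lam` (a hypothetical name). It computes the penalty and the output gradient that backpropagation would pass into each network; the training loop for the two five-hidden-node networks is not shown.

```python
from typing import List, Tuple

def nc_penalties_and_grads(outputs: List[float], target: float,
                           lam: float) -> Tuple[List[float], List[float]]:
    """Per-member NC penalty and gradient of E_i with respect to F_i.

    E_i = 0.5 * (F_i - d)**2 + lam * p_i, with
    p_i = (F_i - F_bar) * sum_{j != i} (F_j - F_bar).
    Treating F_bar as fixed, dE_i/dF_i = (1 - lam)(F_i - d) + lam (F_bar - d).
    """
    m = len(outputs)
    f_bar = sum(outputs) / m  # ensemble output: simple average of members
    penalties, grads = [], []
    for i, f_i in enumerate(outputs):
        others = sum(f_j - f_bar for j, f_j in enumerate(outputs) if j != i)
        penalties.append((f_i - f_bar) * others)  # equals -(f_i - f_bar)**2
        grads.append((1 - lam) * (f_i - target) + lam * (f_bar - target))
    return penalties, grads
```

With `lam = 0` each network is trained independently on its own error. Larger values add a term that pushes each member away from the ensemble mean, which is the source of the negative correlation.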

Then we apply it to two real-world medical diagnosis datasets, the breast-cancer dataset and the heart disease dataset.
4.1. A Synthetic Dataset
A two-variable synthetic dataset is generated by the two-dimensional gamma distribution. Two classes of data are

using different sets of internal nodes. The same behavior is seen for k = 4 and k = 5. In all cases the discovered decision trees differ syntactically per fold and random seed.
The Heart Disease Data Set
The results on the Heart disease data set are displayed in Table 6. All our GP algorithms show a large improvement in misclassification performance over our simple GP algorithm. In all but two cases

are tabulated in Table III. Table III shows that the generalization ability of NeC4.5 with µ = 100% is better than that of C4.5. In detail, pairwise two-tailed t-tests indicate that there are ten data sets (balance, breast, cleveland, credit, heart, iris, vehicle, waveform21, waveform40, and wine) where NeC4.5 with µ = 100% is significantly more accurate than C4.5, while there is no significant
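A pairwise two-tailed t-test of this kind compares the accuracies of two learners measured on the same folds. The stdlib sketch below assumes exactly ten paired accuracies (for example from 10-fold cross-validation, which the excerpt does not state). That lets it hard-code the two-tailed 5% critical value for 9 degrees of freedom, about 2.262, instead of computing a p-value.

```python
import math
from typing import Sequence

# Two-tailed 5% critical value of Student's t with 9 degrees of freedom,
# valid only for exactly 10 paired accuracies (assumption: 10 folds).
T_CRIT_DF9 = 2.262

def paired_t_statistic(acc_a: Sequence[float], acc_b: Sequence[float]) -> float:
    """t statistic of the per-fold accuracy differences acc_a - acc_b."""
    if len(acc_a) != len(acc_b) or len(acc_a) < 2:
        raise ValueError("need two equal-length samples of at least 2 pairs")
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    if var == 0.0:
        # Identical differences: t is 0 or unbounded.
        return 0.0 if mean == 0 else math.copysign(math.inf, mean)
    return mean / math.sqrt(var / n)

def significantly_different(acc_a: Sequence[float], acc_b: Sequence[float],
                            t_crit: float = T_CRIT_DF9) -> bool:
    """Two-tailed decision at the significance level implied by t_crit."""
    return abs(paired_t_statistic(acc_a, acc_b)) > t_crit
```

For other numbers of folds, `t_crit` has to be replaced by the matching critical value.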

perform differently in 19 out of 27 cases. For some rows, the test consistently indicates no difference between any two of the three schemes, in particular for the iris and Hungarian heart disease datasets. However, most rows contain at least one cell where the outcomes of the test are not consistent. The row labeled "consistent" at the bottom of the table lists the number of datasets for which all

For a simple comparison, we give the following statistics:
-- Comparing PCL, C4.5, Bagging and Boosting, PCL achieved the best accuracy on 5 data sets (i.e., breast-w, cleve, heart, HIV, and promoter); Bagging won on 1 data set (hypothyroid); and Boosting achieved the best accuracy on 4 data sets (i.e., hepatitis, lymph, sick and splice).
-- Comparing

greatly offsets its weakness in the conciseness of the generated rule sets. A typical rule set generated by the proposed algorithm is shown in Table 3, which is obtained from one run on the data set Heart disease.
IV. CONCLUSIONS
In this paper, we propose a novel rule learning algorithm that employs a neural network ensemble as a front-end process. The algorithm trains a neural network ensemble at

has 351 examples in 33 dimensions and is slightly noisy. The heart data set has 270 examples in 13 dimensions. The Pima Indians diabetes data set has 768 examples in eight dimensions. These last two data sets have a high degree of overlap which leads to a dense model for

Recognition Experiments
In this section we compare the Bayesian-Transduction (BT) algorithm and the kernel perceptron when used within the typicalness framework. We ran experiments on two toy datasets and the well-known heart benchmark data set. For the artificial data, one dataset was created using a uniform prior over w such that ‖w‖ = 1 (this is the correct prior for Bayesian
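A uniform prior over unit-norm weight vectors ‖w‖ = 1 can be sampled by normalizing an isotropic Gaussian vector. The sketch below does that and labels inputs with sign(w · x). Drawing the inputs uniformly from [-1, 1]^d is an assumption, since the excerpt does not say how the inputs were generated, and `make_toy_dataset` is a hypothetical helper.

```python
import math
import random
from typing import List, Tuple

def sample_unit_vector(dim: int, rng: random.Random) -> List[float]:
    """Draw w uniformly from the unit sphere ||w|| = 1.

    An isotropic Gaussian vector has a uniformly distributed direction,
    so normalising it gives a uniform sample on the sphere.
    """
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            return [c / norm for c in v]

def make_toy_dataset(n: int, dim: int, seed: int = 0
                     ) -> Tuple[List[float], List[Tuple[List[float], int]]]:
    """Hypothetical toy set: inputs uniform on [-1, 1]^dim, labels sign(w . x)."""
    rng = random.Random(seed)
    w = sample_unit_vector(dim, rng)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        y = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
        data.append((x, y))
    return w, data
```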

decision nodes may also improve the accuracy of the tree because samples from real world problems may be better separated by oblique hyperplanes. This is the case with the heart disease data set (HeartD in Table 2) where significant improvement is achieved by the neural network methods over C4.5. There is no significant difference in the accuracy and size of the decision trees generated by
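The advantage of an oblique test w · x ≤ θ over an axis-parallel test x_f ≤ t shows up on a toy problem whose true boundary is the diagonal x_1 + x_2 = 4. In the sketch below, the data, the helper names, and the majority-vote leaves with labels in {0, 1} are all hypothetical. A single oblique node separates the 5 × 5 grid perfectly. No single axis-parallel split can, because the grid columns and rows straddle the diagonal, so at least one side always receives both labels.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]

def oblique_goes_left(x: Vector, w: Vector, theta: float) -> bool:
    """Oblique decision node: route x left iff w . x <= theta."""
    return sum(wi * xi for wi, xi in zip(w, x)) <= theta

def split_errors(points: List[Vector], labels: List[int],
                 goes_left: Callable[[Vector], bool]) -> int:
    """Training errors when each side of the split predicts its majority label."""
    errors = 0
    for side in (True, False):
        ys = [y for p, y in zip(points, labels) if goes_left(p) == side]
        ones = sum(ys)
        errors += min(ones, len(ys) - ones)
    return errors

def best_axis_parallel_errors(points: List[Vector], labels: List[int]) -> int:
    """Fewest errors achievable by any single test x[f] <= t."""
    best = len(points)
    for f in range(len(points[0])):
        for t in sorted({p[f] for p in points}):
            errs = split_errors(points, labels, lambda p, f=f, t=t: p[f] <= t)
            best = min(best, errs)
    return best

# Toy 5x5 grid labelled by the diagonal boundary x1 + x2 > 4.
grid = [(float(i), float(j)) for i in range(5) for j in range(5)]
labels = [1 if x1 + x2 > 4 else 0 for x1, x2 in grid]
```

Here `split_errors(grid, labels, lambda p: oblique_goes_left(p, (1.0, 1.0), 4.0))` is 0, whereas `best_axis_parallel_errors(grid, labels)` is positive.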

where the base learner solves (6) exactly, then to examine LPBoost in a more realistic environment.
5.1 Boosting Decision Tree Stumps
We used decision tree stumps as a base learner on six UCI datasets: Cancer (9,699), Heart (13,297), Sonar (60,208), Ionosphere (34,351), Diagnostic (30,569), and Musk (166,476). The number of features and number of points in each dataset are shown in parentheses
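A decision stump is a one-level tree: a single threshold on a single feature. Inside a boosting method such as LPBoost, the base learner must respect example weights. The minimal sketch below therefore picks the feature, threshold, and polarity with the smallest weighted error, assuming labels in {-1, +1} and non-negative weights. It is an exhaustive search for illustration, not the implementation used in the experiments.

```python
from typing import Optional, Sequence, Tuple

Stump = Tuple[int, float, int, float]  # (feature, threshold, polarity, weighted error)

def fit_stump(X: Sequence[Sequence[float]], y: Sequence[int],
              w: Sequence[float]) -> Stump:
    """Exhaustively choose the stump with the smallest weighted error.

    The stump predicts `polarity` when x[feature] > threshold, else -polarity.
    """
    best: Optional[Stump] = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidate thresholds: one below every value, then midpoints.
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in thresholds:
            for polarity in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (polarity if row[f] > t else -polarity) != yi)
                if best is None or err < best[3]:
                    best = (f, t, polarity, err)
    assert best is not None
    return best

def predict_stump(stump: Stump, row: Sequence[float]) -> int:
    f, t, polarity, _ = stump
    return polarity if row[f] > t else -polarity
```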

attribute values rather than as missing values. In auto, the class variable was the make of the automobile. In the breast cancer domains, all features were treated as continuous. The heart disease data sets were recoded to use discrete values where appropriate. All attributes were treated as continuous in the king-rook-vs-king (krk) data set. In lymphography, the lymph-nodes-dimin, lymph-nodes-enlar,

rates ranging from 71.4% to 74.4%. Further, [19] reports a 76% correct prediction rate using 75% of the data for training.
Heart Disease (Cleveland). The Cleveland Clinic Foundation heart disease dataset, contributed to the repository by R. Detrano, contains 303 observations, 165 of which describe healthy people and 138 sick ones; 7 observations are incomplete, and 2 of the observations of healthy

starting with 0. Unknown values are set to 0.5.
Heart Disease
This dataset concerning heart disease diagnosis contains 4 sub-databases collected from 4 locations. Each database has the same instance format. We used two of them in our experiment: one from Cleveland
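The excerpt states only that unknown values are set to 0.5. One natural reading, assumed here rather than confirmed by the source, is that attributes are first rescaled to [0, 1], so 0.5 is the midpoint of every attribute's range. The sketch below applies min-max scaling to one attribute and maps unknowns (represented as `None`, also an assumption) to 0.5.

```python
from typing import List, Optional, Sequence

def scale_with_unknowns(column: Sequence[Optional[float]]) -> List[float]:
    """Min-max scale one attribute to [0, 1]; unknown values (None) become 0.5."""
    known = [v for v in column if v is not None]
    if not known:
        return [0.5] * len(column)
    lo, hi = min(known), max(known)
    span = hi - lo
    scaled = []
    for v in column:
        if v is None:
            scaled.append(0.5)  # unknown -> midpoint of the scaled range
        elif span == 0:
            scaled.append(0.0)  # constant attribute carries no information
        else:
            scaled.append((v - lo) / span)
    return scaled
```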

situation. The performance variation among the member models in bagging is rather small because they are derived from the same learning algorithm using bootstrap samples. Section 3.3 shows that a small performance variation among member models
⁴ The heart dataset used by Breiman (1996b; 1996c) is omitted because it was heavily modified from the original one.

treatment is done for unknown values, with each algorithm exploiting its own characteristics. The PEBLS and HOODG algorithms cannot handle unknown values; thus, they are used only on the four datasets without unknown values (diabetes, heart, liver and lymphography). For each database and algorithm, a classification model is induced using the specified training set: when run with fixed default

representation. To demonstrate our interpretation, we consider the alternating tree presented in Figure 4. This tree is the result of running our learning algorithm for six iterations on the cleve data set from Irvine. This is a data set of heart disease diagnostics for which the goal is to discriminate between sick and healthy people.³
³ In our mapping, positive classification corresponds to healthy and
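In an alternating decision tree, an instance follows every splitter node it reaches rather than a single path. The classification is the sign of the sum of all prediction values collected along the way. The sketch below shows that evaluation rule. The tree itself, its attribute names (`ca`, `chol`, `thalach`), thresholds, and prediction values are hypothetical and are not the tree of Figure 4.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Example = Dict[str, float]

@dataclass
class PredictionNode:
    value: float
    splitters: List["SplitterNode"] = field(default_factory=list)

@dataclass
class SplitterNode:
    condition: Callable[[Example], bool]
    if_true: PredictionNode
    if_false: PredictionNode

def adtree_score(node: PredictionNode, x: Example) -> float:
    """Sum prediction values over every path that x reaches from this node."""
    total = node.value
    for splitter in node.splitters:
        branch = splitter.if_true if splitter.condition(x) else splitter.if_false
        total += adtree_score(branch, x)
    return total

def adtree_classify(root: PredictionNode, x: Example) -> int:
    return 1 if adtree_score(root, x) >= 0 else -1

# Hypothetical tree: two splitters under the root, one nested splitter.
chol_split = SplitterNode(lambda x: x["chol"] <= 240,
                          PredictionNode(0.2), PredictionNode(-0.1))
root = PredictionNode(0.06, [
    SplitterNode(lambda x: x["ca"] <= 0.5,
                 PredictionNode(0.4, [chol_split]), PredictionNode(-0.6)),
    SplitterNode(lambda x: x["thalach"] > 150,
                 PredictionNode(0.3), PredictionNode(-0.3)),
])
```

Under the mapping in the excerpt's footnote, a positive score would correspond to the healthy class.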

rules and some very high (say 90%) confidence rules using approaches similar to mining top rules. Experimental results using the Mushroom, the Cleveland heart disease, and the Boston housing datasets are reported to evaluate the efficiency of the proposed approach.
1 Introduction
Association rules [1] were proposed to capture significant dependence between items in transactional datasets. For
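Support and confidence are the two measures behind association rules. supp(A) is the fraction of records that contain itemset A, and conf(A ⇒ C) = supp(A ∪ C) / supp(A). The minimal sketch below treats each record as a set of attribute=value items (a hypothetical encoding of the heart disease data) and applies a high confidence cut-off such as 90%.

```python
from typing import List, Set

def support(itemset: Set[str], records: List[Set[str]]) -> float:
    """Fraction of records containing every item of `itemset`."""
    return sum(1 for r in records if itemset <= r) / len(records)

def confidence(antecedent: Set[str], consequent: Set[str],
               records: List[Set[str]]) -> float:
    """conf(A => C) = supp(A | C) / supp(A); 0.0 if A never occurs."""
    supp_a = support(antecedent, records)
    if supp_a == 0:
        return 0.0
    return support(antecedent | consequent, records) / supp_a

def is_high_confidence(antecedent: Set[str], consequent: Set[str],
                       records: List[Set[str]], min_conf: float = 0.9) -> bool:
    """True for rules meeting a high confidence cut-off such as 90%."""
    return confidence(antecedent, consequent, records) >= min_conf

# Hypothetical records encoded as attribute=value items.
records = [
    {"cp=asympt", "exang=yes", "disease"},
    {"cp=asympt", "exang=yes", "disease"},
    {"cp=asympt", "exang=no"},
    {"cp=typical", "exang=no"},
]
```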

seem to be especially well suited for these reduction techniques. For example, RT3 required less than 2% of the original storage for the Heart Swiss dataset, yet it achieved even higher generalization accuracy than the kNN algorithm. On the other hand, some datasets were less suited to these techniques. On the Vowel dataset, for example, RT3 required over 45% of the

and Robert Detrano, of the V.A. Medical Center, Long Beach and Cleveland Clinic Foundation, for supplying the heart disease dataset. Please see the documentation in the UCI Repository for detailed information on all datasets.
Appendix A
This appendix describes how, for each one of P prototypes, the relevant features are chosen

available in the UCI Machine Learning Repository² [21], and some of them have even been used to compare different pruning methods [25], [20], [3]. The database heart is actually the union of four data sets on heart disease, with the same number of attributes but collected in four distinct places (Hungary, Switzerland, Cleveland, and Long Beach).³ Of the 76 original attributes, only 14 have been

only 2 attributes and achieve a higher accuracy rate on the testing data. The separator generated from the pruned network is depicted in Fig. 5.
B. Detailed analysis 2: Cleveland Heart Disease Dataset
The dataset consists of 303 patterns. We discarded patterns with missing attribute values and used only the remaining 297 patterns. The patterns were divided randomly into training and testing sets.
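Discarding the patterns with missing values and splitting the rest at random can be sketched as follows. The sketch assumes the comma-separated layout of the UCI `processed.cleveland.data` file: 13 attributes followed by a diagnosis value from 0 to 4, with `?` marking a missing value. It also assumes the diagnosis is mapped to 0 (absent) versus 1 (present). The training fraction and seed are hypothetical parameters, since the excerpt does not give the split sizes.

```python
import random
from typing import Iterable, List, Tuple

Dataset = Tuple[List[List[float]], List[int]]

def parse_cleveland(lines: Iterable[str]) -> Dataset:
    """Parse rows of 13 attributes + diagnosis; skip rows containing '?'."""
    X: List[List[float]] = []
    y: List[int] = []
    for line in lines:
        fields = [f.strip() for f in line.strip().split(",")]
        if len(fields) != 14 or "?" in fields:
            continue  # blank/short line or pattern with a missing value
        values = [float(v) for v in fields]
        X.append(values[:13])
        y.append(1 if values[13] > 0 else 0)  # 1..4 -> disease present
    return X, y

def random_split(X: List[List[float]], y: List[int], train_fraction: float,
                 seed: int = 0) -> Tuple[Dataset, Dataset]:
    """Shuffle indices once and cut them into training and testing sets."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = round(train_fraction * len(idx))
    train, test = idx[:cut], idx[cut:]
    return (([X[i] for i in train], [y[i] for i in train]),
            ([X[i] for i in test], [y[i] for i in test]))
```

Applied to the 303-row Cleveland file, the filter should keep the 297 complete patterns mentioned above.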

resulted in a fairly small number of prototypes that can achieve a very good level of classification accuracy. For example, Aha's IB3 algorithm achieves 79% accuracy on the Cleveland heart disease data set [Murphy and Aha, 1994] while retaining only approximately 4% of the 303 instances [Aha, 1990]. Results such as this hint that a small number of prototypes will suffice on some data. In Table
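Retaining about 4% of the 303 instances means classifying with roughly a dozen stored prototypes (0.04 × 303 ≈ 12). Once a prototype set has been chosen, classification is a nearest-neighbour lookup over it. The sketch below shows only that lookup and the storage fraction. How IB3 selects its prototypes is not reproduced, and the prototypes in the test are hypothetical.

```python
import math
from typing import Hashable, List, Sequence, Tuple

Prototype = Tuple[Sequence[float], Hashable]

def nearest_prototype(x: Sequence[float], prototypes: List[Prototype]) -> Hashable:
    """1-nearest-neighbour classification against the retained prototypes."""
    best_label, best_dist = None, math.inf
    for vector, label in prototypes:
        d = math.dist(x, vector)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

def storage_fraction(n_retained: int, n_instances: int) -> float:
    """Share of the training instances kept as prototypes."""
    return n_retained / n_instances
```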

to test learned models on noise-free examples (including noisy variants of the KRK and LED domains) but for the natural domains we tested on possibly noisy examples. The large variant of the Soybean data set was used and the 5-class variant of the Heart data set was used.
5.1. Does using multiple rule sets lead to lower error?
In this section we present results of an experiment designed to answer the

we expected IDTM to fail miserably, given that the chances of matching continuous features in the table are slim without preprocessing the data. Although C4.5 clearly outperforms IDTM on most datasets, IDTM outperforms C4.5 on the heart dataset and achieves similar performance on nine out of the 22 datasets (australian, cleve, crx, german, hepatitis, horse-colic, iris, lymphography, and
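A decision table with a majority default classifies an instance by an exact match on a chosen subset of features. When no stored row matches, it falls back to the training majority class. The sketch below is a plain batch version, not the incremental IDTM procedure. It illustrates why unpreprocessed continuous features hurt: values such as 0.78 rarely reappear exactly, so most lookups fall through to the default.

```python
from collections import Counter
from typing import Dict, Hashable, Optional, Sequence, Tuple

class DecisionTableMajority:
    """Exact-match lookup on selected features with a majority-class default."""

    def __init__(self, features: Sequence[int]):
        self.features = list(features)
        self.table: Dict[Tuple, Counter] = {}
        self.majority: Optional[Hashable] = None

    def fit(self, X: Sequence[Sequence], y: Sequence[Hashable]) -> "DecisionTableMajority":
        self.table = {}
        for row, label in zip(X, y):
            key = tuple(row[f] for f in self.features)
            self.table.setdefault(key, Counter())[label] += 1
        self.majority = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, row: Sequence) -> Hashable:
        key = tuple(row[f] for f in self.features)
        counts = self.table.get(key)
        if counts is None:
            # No stored row matches exactly: typical for raw continuous values.
            return self.majority
        return counts.most_common(1)[0][0]
```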

in error. The execution time on a Sparc20 for feature subset selection using ID3 ranged from under five minutes for breast-cancer (Wisconsin), cleve, heart and vote to about an hour for most datasets. DNA took 29 hours, followed by chess at four hours. The DNA run took so long because of ever-increasing estimates that did not really improve the test-set accuracy.
7 Conclusions
We reviewed the
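Wrapper feature subset selection scores each candidate subset by running the induction algorithm itself (here ID3) and estimating its accuracy, which is why run times grow quickly with the number of features. The minimal sketch below uses greedy forward selection for brevity; the excerpt does not say which search strategy was used. A caller-supplied `evaluate` function stands in for a cross-validated ID3 accuracy estimate, and both it and the toy scorer in the test are hypothetical.

```python
from typing import Callable, FrozenSet, Tuple

def forward_selection(n_features: int,
                      evaluate: Callable[[FrozenSet[int]], float]
                      ) -> Tuple[FrozenSet[int], float]:
    """Greedy wrapper search: repeatedly add the single feature that most
    improves the accuracy estimate; stop when no addition improves it."""
    selected: FrozenSet[int] = frozenset()
    best_score = evaluate(selected)
    while True:
        candidates = [selected | {f} for f in range(n_features) if f not in selected]
        if not candidates:
            return selected, best_score
        score, subset = max(((evaluate(c), c) for c in candidates),
                            key=lambda pair: pair[0])
        if score <= best_score:
            return selected, best_score
        selected, best_score = subset, score
```

Each step calls `evaluate` once per remaining feature, so with an expensive estimator such as repeated ID3 runs the cost is dominated by the number of features.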

together, since their test costs have different scales (see Appendix A). The test costs in the Heart Disease dataset, for example, are substantially larger than the test costs in the other four datasets. Third, it is difficult to combine average costs for different values of k in a fair manner, since more weight

The five selected datasets were: echocardiogram, hayes-roth, heart, horse-colic, and iris. These datasets (marked in Table 7.1 with a * symbol beside their name) contain a sampling of attribute types and domains. For

however, naive Bayes performs very well, and on some datasets (such as heart-c and labor) it performs considerably better than the OB1 results shown (presumably because its attribute independence assumption isn't violated). The next section investigates
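Naive Bayes scores each class by its prior times a product of per-attribute conditional probabilities. That product is exactly where the attribute independence assumption enters. The minimal sketch below handles nominal attributes and adds Laplace smoothing; the smoothing constant `alpha` is an assumption, not taken from the excerpt.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Set, Tuple

class CategoricalNaiveBayes:
    """Naive Bayes for nominal attributes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, X: Sequence[Sequence[Hashable]], y: Sequence[Hashable]):
        self.classes = sorted(set(y))
        self.class_counts = Counter(y)
        self.n = len(y)
        self.counts: Dict[Tuple[int, Hashable], Counter] = defaultdict(Counter)
        self.values: List[Set[Hashable]] = [set() for _ in range(len(X[0]))]
        for row, label in zip(X, y):
            for f, v in enumerate(row):
                self.counts[(f, label)][v] += 1
                self.values[f].add(v)
        return self

    def log_score(self, row: Sequence[Hashable], c: Hashable) -> float:
        """log P(c) + sum_f log P(x_f | c): the independence assumption in action."""
        score = math.log(self.class_counts[c] / self.n)
        for f, v in enumerate(row):
            num = self.counts[(f, c)][v] + self.alpha
            den = self.class_counts[c] + self.alpha * len(self.values[f])
            score += math.log(num / den)
        return score

    def predict(self, row: Sequence[Hashable]) -> Hashable:
        return max(self.classes, key=lambda c: self.log_score(row, c))
```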

from the UCI Machine Learning Repository [13]: Wisconsin Diagnosis Breast Cancer (WDBC), Ionosphere, and Cleveland Heart. The fourth dataset is related to the nontraditional authorship attribution problem of the Federalist Papers [7], and the fifth dataset is a dataset used for training in a computer-aided detection

and in the stopping criteria. Both methods were allowed the same maximum number of iterations.
8.1. Boosting Decision Tree Stumps
We used decision tree stumps as base hypotheses on the following six datasets: Cancer (9,699), Diagnostic (30,569), Heart (13,297), Ionosphere (34,351), Musk (166,476), and Sonar (60,208). The number of features and number of points in each dataset are shown, respectively,

0.85. On euthyroid, threshold-moving is the best, under-sampling is the worst in the effective range, while the ensemble methods become poor when PCF(+) is bigger than 0.8. On the remaining nine data sets all the methods work well. On heart-s the ensemble methods are slightly better than others. On heart the ensemble methods are clearly better than over-sampling, under-sampling, and
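PCF(+) is the probability cost function of Drummond and Holte's cost curves: PCF(+) = p(+)C(-|+) / (p(+)C(-|+) + p(-)C(+|-)), where C(-|+) is the cost of misclassifying a positive example. Threshold-moving leaves the trained model unchanged and only shifts its decision threshold. The sketch below assumes a model that outputs a calibrated estimate of P(+|x). It predicts positive when the expected cost of calling the example negative is at least that of calling it positive. The function names are hypothetical.

```python
def pcf_positive(p_pos: float, cost_fn: float, cost_fp: float) -> float:
    """Probability cost function PCF(+).

    p_pos:   prior probability of the positive class, p(+)
    cost_fn: cost of misclassifying a positive as negative, C(-|+)
    cost_fp: cost of misclassifying a negative as positive, C(+|-)
    """
    weighted_pos = p_pos * cost_fn
    return weighted_pos / (weighted_pos + (1.0 - p_pos) * cost_fp)

def threshold_moving_predict(prob_pos: float, cost_fn: float, cost_fp: float) -> int:
    """Return 1 (positive) when P(+|x) * C(-|+) >= P(-|x) * C(+|-).

    Rearranged, this compares P(+|x) with the moved threshold
    C(+|-) / (C(+|-) + C(-|+)) instead of the cost-blind 0.5.
    """
    threshold = cost_fp / (cost_fp + cost_fn)
    return 1 if prob_pos >= threshold else 0
```

For example, equal priors with a false-negative cost four times the false-positive cost give PCF(+) = 0.8 and a decision threshold of 0.2.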

profile instead of using all attributes in the original clinical data. The results remain the same.
RESULTS
We evaluated the system by applying it to heart disease, diabetes, and breast cancer. All data sets were obtained from the UCI Repository of Machine Learning databases and domain theories.⁷
Heart Disease
Four clinical data sets were used. These sets consist of patients who had been referred for

The other datasets (Echocardiogram, Glass 2, Heart and Hepatitis) are small, and the results of the experiments are not normally distributed, so the t-test cannot be applied.
Dataset   ECL-LSDc   ECL-LSDf   ECL-LUD

by replacing a node's test with the test at one of its children, so perhaps m=1 gives more latitude in the pruning phase. Information gain (turning the g parameter on) was a big winner on several datasets: vehicle, segment, hypothyroid, heart and cleve. Turning on the s parameter helped in tic-tac-toe and monk1.
Table 5: Experimental results: Accuracies for C4.5, C4.5-AP, and C4.5* from running on
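By default C4.5 ranks candidate splits by gain ratio; its g option switches to plain information gain. The sketch below computes both criteria for a nominal attribute so they can be compared. It ignores C4.5's handling of continuous attributes and missing values.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence

def entropy(labels: Sequence[Hashable]) -> float:
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Entropy of the labels minus the weighted entropy after splitting on `values`."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def gain_ratio(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Information gain normalised by the split information (entropy of `values`)."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0
```

Gain ratio penalizes attributes with many distinct values, so the two criteria can choose different tests on the same data.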

with C̃ = C̄/(2a), [...] = decision value at x using the linear kernel with C̄. We can observe the result of Theorems 8 and 9 from Figure 1. The contours show the five-fold cross-validation accuracy on the data set heart for different values of r and C. The contours with a = 1 are on the left-hand side, while those with a = 0.01 are on the right-hand side. Other parameters considered here are log2 C from [...] and log2(-r) from
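The sigmoid kernel is K(x, y) = tanh(a x·y + r). A first-order Taylor expansion in a gives tanh(a x·y + r) ≈ tanh(r) + a(1 - tanh²(r)) x·y, so for small a the kernel is a constant plus a scaled linear kernel. The sketch below checks this approximation numerically. It illustrates why small-a sigmoid results can be compared with a linear kernel; it is not a reproduction of Theorems 8 and 9.

```python
import math
from typing import Sequence

def dot(x: Sequence[float], y: Sequence[float]) -> float:
    return sum(xi * yi for xi, yi in zip(x, y))

def sigmoid_kernel(x: Sequence[float], y: Sequence[float], a: float, r: float) -> float:
    """K(x, y) = tanh(a * x.y + r)."""
    return math.tanh(a * dot(x, y) + r)

def linearized_sigmoid_kernel(x: Sequence[float], y: Sequence[float],
                              a: float, r: float) -> float:
    """First-order expansion in a: tanh(r) + a * (1 - tanh(r)**2) * x.y."""
    t = math.tanh(r)
    return t + a * (1.0 - t * t) * dot(x, y)
```

In the SVM dual, a constant added to the kernel cancels through the constraint sum_i alpha_i y_i = 0, and scaling the kernel by k is equivalent to scaling C by k. Together these suggest that a small-a sigmoid kernel behaves like a linear kernel with a correspondingly rescaled C.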

of the UCI database. From left to right: Pima, Ionosphere, and Heart datasets. Top: small fraction of data with missing variables (50%); bottom: large number of observations with missing variables (90%).
The experimental results are summarized by the graphs (1). The robust

models optimized carefully using the IVGA implementation. The model search of our IVGA implementation was able to discover the best grouping, i.e., the one with the smallest cost.
3.2. Arrhythmia data set
The identification of different types of heart problems, namely cardiac arrhythmias, is carried out based on electrocardiography measurements from a large number of electrodes. We used a freely

small clusters. We can also see that for c ∈ [1.5, 4] the number of instances and CPU time are reduced significantly. The results presented in Table 2 show that appropriate values for the heart disease data set are c ∈ [0, 1.5], because further decrease in c leads to changes in the cluster structure of the data set. We can again see that these values of c allow significant reduction in the number of

c     Clusters  Instances  CPU time
1.0   7         152/297    26.73
1.5   6         122/297    14.43
2.0   5         107/297     8.25
4.0   5          65/297     5.05
6.0   5          41/297     3.34
8.0   5          28/297     3.22
The results presented in Table 9 show that appropriate values for the heart disease dataset are c ∈ [0, 1.5], because further decrease in c leads to changes in the cluster structure of the dataset. We can again see that these values of c allow significant reduction in the number of

in which they are found to work. This approach is compared to the equivalent global evolutionary computation approach with respect to predicting the occurrence of heart disease in the Cleveland data set. It outperforms a global approach, but the space of attributes within which this evolutionary process occurs can greatly affect the efficiency of the technique.
1. Introduction
The idea here is to

Heart disease status is known. By evaluating a new patient's attributes with respect to the separating plane a diagnosis is made. The Cleveland Heart Disease Database (Heart) is a publicly available dataset that contains information on 297 patients using 13 attributes [6]. A second application, as discussed previously, is the diagnosis of breast cancer. To evaluate whether a tumor is benign or

namely Ljubljana breast cancer, Wisconsin breast cancer, Hepatitis and Heart disease. In two data sets, Ljubljana breast cancer and Heart disease, the difference was quite small. In the other two data sets, Wisconsin breast cancer and Hepatitis, the difference was larger. Note that although