completely parameter-free! Neither a threshold nor the number of clusters needs to be specified. The numbers of clusters discovered by our algorithms seem to be very reasonable choices: for the Votes dataset most people vote according to the official position of their political parties, so having two clusters is natural; for the Mushrooms dataset, notice that both ROCK and LIMBO achieve much better

have been created from the #-informative words, and have been labelled, the classifier is all set and ready to go. Five repetitions of 10-fold cross-validation were performed in a similar manner to the voting dataset and the results collated. Standard statistical techniques, including Wilcoxon Mann-Whitney ranking [33] and Student's t-test [30], were used to analyse the results. The Wilcoxon ranking was used
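The repeated cross-validation protocol mentioned in this excerpt can be sketched as follows. This is only an illustrative sketch, not the authors' code: the function name `repeated_kfold_indices`, the seed, and the sample count of 435 (borrowed from the voting dataset described in other excerpts) are assumptions, and the excerpt does not say whether the folds were stratified.

```python
import random


def repeated_kfold_indices(n_samples, n_folds=10, n_repeats=5, seed=0):
    """Yield (train, test) index lists for repeated k-fold cross-validation.

    Hypothetical helper: each repetition reshuffles the indices and then
    partitions them into n_folds disjoint test folds.
    """
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        # Deal the shuffled indices round-robin into n_folds folds.
        folds = [order[i::n_folds] for i in range(n_folds)]
        for held_out in range(n_folds):
            test = folds[held_out]
            train = [idx
                     for j, fold in enumerate(folds) if j != held_out
                     for idx in fold]
            yield train, test


# 5 repetitions x 10 folds gives 50 train/test splits (435 is an assumed size).
splits = list(repeated_kfold_indices(435))
```

Each of the 50 splits would yield one test-set score per classifier, and those paired scores are what a rank test or t-test would then compare.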

problem with nine features that can take on between two and five values. The relative performances of the policies are closer to each other, but their behaviour is similar to Figure 4(b). The votes dataset (Figure 4(d)) is a binary class problem (democrat vs. republican), with 16 binary features, 435 instances, and a positive class probability of 0.61. In the votes dataset, there is a high proportion

In order to further test the HPBP algorithm on a sequential learning task drawn from a real-world
database, we selected the 1984 Congressional Voting Records database from the UCI
repository (Murphy & Aha, 1992)

1. Table 1 also indicates the number of observations for which some data values are missing. In our experiments, we removed the observations with missing values, with the exception of the voting data set, where almost half of the observations contained missing values. In this data set, there are sixteen attributes, and all of them are binary. In this case, we have substituted the missing binary

respectively. The three datasets are public-domain datasets. The vote dataset contains the votes of each U.S. House of Representatives Congressman on the sixteen key votes. The problem is learning a concept for distinguishing

validation and the results are therefore based on averages of the test set calculated over 10 runs. Unless specified otherwise, all results are based on C4.5 without pruning.
3 AN EXAMPLE: THE VOTE DATASET
In order to illustrate the problem with small disjuncts and introduce a way of measuring this problem, we examine the concept learned by C4.5 from the Vote dataset. Figure 1 shows how the correctly

reported the configuration of the network in detail. From their configuration, the learning rate 2.0 and the range of initial weights [-0.3, +0.3] are adopted in our experiments.
Vote
This dataset consists of the voting records of 435 congressmen on 16 issues in the 1984 congress, 2nd session. The votes are classified into "yea", "nay", and "unknown". The classification problem is to

takes a few minutes on average and leaves a few thousand entries in the hash table. For the larger Votes dataset the run takes 24 minutes on average and leaves around 16000 entries in the hash table. During testing, whenever a new belief state b o a was generated that was not in the hash table, b o a was

the cluster which has the highest score with respect to that transaction. We performed clustering of transactions on the 1984 United States Congressional Voting Records Database provided by [MM96]. The data set includes 435 transactions, each corresponding to one Congressman's votes on 16 key issues. We removed class values from each transaction, and we followed the steps specified in Section 2.1 to

Assistant-R and LFC achieve significantly better results (99.95% confidence level). This result confirms that RELIEFF estimates the quality of attributes better than the information gain. On the VOTE data set the naive Bayesian classifier is the worst, while both versions of Assistant are comparable to the rule based classifier by Smyth et al. [31]. The most interesting results appear in the MESH

is linearly separable.
1984 United States Congressional Voting Records Database
This data set includes votes for each of the 435 U.S. House of Representatives Congressmen. There are 267 democrats and 168 republicans. The chosen attributes represent 16 key votes. Possible values for the
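As a quick consistency check on the figures quoted across these excerpts, the class counts above can be turned into a class prior. This is a small arithmetic sketch; the variable names are illustrative, and treating "democrat" as the positive class is an assumption, since the excerpts that quote a positive class probability of 0.61 do not all name it.

```python
# Class counts as stated in the excerpt above.
n_democrats = 267
n_republicans = 168

# Total should match the 435 instances quoted in the other excerpts.
n_total = n_democrats + n_republicans

# Prior of the majority class (assumed here to be the "positive" class).
majority_prior = n_democrats / n_total

print(n_total, round(majority_prior, 2))  # prints: 435 0.61
```

The same number is also the accuracy of a baseline that always predicts the majority class, which is a useful reference point for the classifier comparisons reported in the other excerpts.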

1990 [17]. For our experiment, 315 samples were randomly selected for training, 35 samples were selected for cross-validation, and 349 for testing. 2. United States Congressional voting records Dataset. The dataset consists of the voting records of 435 congressmen on 16 major issues in the 98th Congress. The votes are classified into one of the three different types of votes: yea, nay, and

(table 6) indicates that the attributes are irrelevant to the class. On the VOTE data set the naive Bayesian classifier is the worst, while both versions of Assistant are comparable to the rule based classifier by Smyth et al. (1990). The most interesting results appear in the MESH

75 times, while a non-discriminative feature such as feature 18 is bought an average of only 2 times. For some budgets, the 0/1 error of SFL is nearly half that generated by Round-Robin. The votes dataset (Figures 3.7(a) and 3.7(b)) is a binary class problem (democrat vs. republican), with 16 binary features, 435 instances, and a positive class probability of 0.61. In the votes dataset, there is a

problem with nine features that can take on between two and five values. The relative performances of the policies are closer to each other, but their behaviour is similar to Figure 4(a). The votes dataset (Figure 4(c)) is a binary class problem (whether or not republican), with 16 binary features, 435 instances, and a positive class probability of 0.61. In the votes dataset, there is a high