
This paper describes REGAL, a distributed genetic-algorithm-based system designed for learning First Order Logic concept descriptions from examples. The system is a hybrid between the Pittsburgh and the Michigan approaches, as the population constitutes a redundant set of partial concept descriptions, each evolved separately. In order to increase effectiveness, REGAL is specifically tailored to the concept learning task; hence, REGAL is task-dependent but, on the other hand, domain-independent. The system proved to be particularly robust with respect to parameter settings across a variety of application domains. REGAL is based on a selection operator, called the Universal Suffrage operator, which provably allows the population to asymptotically converge, on average, to an equilibrium state in which several species coexist. The system is presented in both a serial and a parallel version, and a new distributed computational model is proposed and discussed. The system has been test...
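The Universal Suffrage idea can be sketched concretely: each of g randomly drawn positive examples "elects" one individual among those covering it, with election probability proportional to fitness. Below is a minimal Python sketch of that voting scheme, assuming set-valued individuals and a membership-based `covers` relation; all names are illustrative, not REGAL's actual interface.

```python
import random

def universal_suffrage(population, covers, examples, g, fitness, rng):
    """Each of g sampled positive examples elects one covering individual,
    chosen with probability proportional to fitness (a sketch of the idea;
    REGAL's exact scheme has further details)."""
    elected = []
    for e in rng.sample(examples, g):
        voters = [p for p in population if covers(p, e)]
        if voters:  # an uncovered example elects no one
            weights = [fitness(p) for p in voters]
            elected.append(rng.choices(voters, weights=weights)[0])
    return elected
```

Because each example chooses only among individuals that cover it, selection pressure is distributed across examples, which is what lets several species (partial descriptions) coexist at equilibrium.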

by
Richard Sutton
- In Proceedings of the Tenth National Conference on Artificial Intelligence (AAAI-92)


Appropriate bias is widely viewed as the key to efficient learning and generalization. I present a new algorithm, the Incremental Delta-Bar-Delta (IDBD) algorithm, for the learning of appropriate biases based on previous learning experience. The IDBD algorithm is developed for the case of a simple, linear learning system---the LMS or delta rule with a separate learning-rate parameter for each input. The IDBD algorithm adjusts the learning-rate parameters, which are an important form of bias for this system. Because bias in this approach is adapted based on previous learning experience, the appropriate testbeds are drifting or non-stationary learning tasks. For particular tasks of this type, I show that the IDBD algorithm performs better than ordinary LMS and in fact finds the optimal learning rates. The IDBD algorithm extends and improves over prior work by Jacobs and by me in that it is fully incremental and has only a single free parameter. This paper also extends previous work by pr...
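The update described above can be sketched concretely. Following the paper, each input i carries its own learning rate α_i = e^(β_i); the β_i are adapted with a single meta step-size θ using a trace h_i of recent weight changes. A minimal NumPy sketch (array names are illustrative):

```python
import numpy as np

def idbd_step(w, beta, h, x, y, theta=0.01):
    """One IDBD update for a linear (LMS) unit with a per-input
    learning rate alpha_i = exp(beta_i)."""
    delta = y - w @ x                      # prediction error of the linear unit
    beta += theta * delta * x * h          # meta step on the log learning rates
    alpha = np.exp(beta)                   # per-input learning rates
    w += alpha * delta * x                 # LMS step, but with alpha_i per input
    # decaying trace of recent weight changes; the decay is clipped at zero
    h = h * np.maximum(0.0, 1.0 - alpha * x * x) + alpha * delta * x
    return w, beta, h
```

Because β_i grows only while the current error correlates with recent changes to w_i, inputs that repeatedly drive useful updates earn large learning rates, while irrelevant inputs are driven toward tiny ones.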

...pabilities of the IDBD algorithm were assessed using a series of tracking tasks---supervised-learning or concept-learning tasks in which the target concept drifts over time and has to be tracked (cf. Schlimmer 1987). Non-stationary tasks are more appropriate here than conventional learning tasks because we are trying to assess the IDBD algorithm's ability to learn biases during early learning and then use them ...


Feature selection is an integral part of most learning algorithms. Because data often contain irrelevant and redundant attributes, selecting only the relevant attributes can be expected to yield higher predictive accuracy from a machine learning method. In this paper, we propose the use of a three-layer feedforward neural network to select those input attributes that are most useful for discriminating classes in a given set of input patterns. A network pruning algorithm is the foundation of the proposed algorithm. By adding a penalty term to the error function of the network, redundant network connections can be distinguished from relevant ones by their small weights when the network training process has been completed. A simple criterion to remove an attribute, based on the accuracy rate of the network, is developed. The network is retrained after removal of an attribute, and the selection process is repeated until no attribute meets the criterion for removal. Our ...


The anomaly detection problem has been widely studied in the computer security literature. In this paper we present a machine learning approach to anomaly detection. Our system builds user profiles based on command sequences and compares current input sequences to the profile using a similarity measure. The system must learn to classify current behavior as consistent or anomalous with past behavior using only positive examples of the account's valid user. Our empirical results demonstrate that this is a promising approach to distinguishing the legitimate user from an intruder.
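One run-weighted similarity measure of the kind used for this task can be sketched as follows: consecutive matching commands earn increasing credit, so ordered runs of familiar behavior score much higher than scattered coincidences. (This is an illustration; the paper's exact measure and its threshold-selection procedure are not reproduced here.)

```python
def similarity(a, b):
    """Positional similarity between two equal-length command sequences;
    each match extends a run and earns credit equal to the run length."""
    score, run = 0, 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        score += run
    return score

def classify(profile, seq, threshold):
    """Consistent if seq is similar enough to some profiled sequence,
    anomalous otherwise (one-class decision: only valid-user data)."""
    best = max(similarity(p, seq) for p in profile)
    return "consistent" if best >= threshold else "anomalous"
```

Note that only positive (valid-user) sequences are needed: the profile is the training data, and the threshold separates "close to the profile" from "far from it".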


In anomaly detection, the normal behavior of a process is characterized by a model, and deviations from the model are called anomalies. In behavior-based approaches to anomaly detection, the model of normal behavior is constructed from an observed sample of normally occurring patterns. Models of normal behavior can represent either the set of allowed patterns (positive detection) or the set of anomalous patterns (negative detection). A formal framework is given for analyzing the tradeoffs between positive and negative detection schemes in terms of the number of detectors needed to maximize coverage. For realistically sized problems, the universe of possible patterns is too large to represent exactly (in either the positive or negative scheme). Partial matching rules generalize the set of allowable (or unallowable) patterns, and the choice of matching rule affects the tradeoff between positive and negative detection. A new match rule is introduced, called r-chunks, and the generalizations induced by different partial matching rules are characterized in terms of the crossover closure. Permutations of the representation can be used to achieve more precise discrimination between normal and anomalous patterns. Quantitative results are given for the recognition ability of contiguous-bits matching together with permutations.
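The two partial matching rules discussed can be sketched directly: an r-chunks detector is a (position, window) pair that must match exactly at its position, while a contiguous-bits detector is a full-length string that must agree with the input in at least r consecutive positions. (A minimal illustration, not the paper's implementation.)

```python
def rchunk_match(detector, s):
    """r-chunks: detector is (position, window); it matches s iff the
    window equals s's substring at that position."""
    i, w = detector
    return s[i:i + len(w)] == w

def rcontiguous_match(d, s, r):
    """Contiguous-bits rule: full-length detector d matches s iff they
    agree in at least r consecutive positions."""
    run = 0
    for a, b in zip(d, s):
        run = run + 1 if a == b else 0
        if run >= r:
            return True
    return False
```

The generalization each rule induces differs: an r-chunks detector constrains only one window, whereas a contiguous-bits detector constrains the whole string, which is why the choice of rule shifts the positive-versus-negative tradeoff described above.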

...roblem is solvable for noise-free classes and certain formal domains, in many practical cases, the problem is known to be computationally intractable [27] or to lead to substantial overgeneralization [26]. The statistical community has examined the closely related problems of outlier detection and robust statistics, finding that the effectiveness of the learning system depends critically on domain-dep...

by
Giulia Pagallo
- Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, 1989


We investigate the problem of learning DNF concepts from examples using decision trees as a concept description language. Due to the replication problem, DNF concepts do not always have a concise decision tree description when the tests at the nodes are limited to the initial attributes. However, the representational complexity may be overcome by using high level attributes as tests. We present a novel algorithm that modifies the initial bias determined by the primitive attributes by adaptively enlarging the attribute set with high level attributes. We show empirically that this algorithm outperforms a standard decision tree algorithm for learning small random DNF with and without noise, when the examples are drawn from the uniform distribution.
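The remedy described, enlarging the attribute set with high-level attributes, can be illustrated with conjunction features: a DNF such as (x0 AND x1) OR (x2 AND x3) forces replicated subtrees over the primitives, but becomes trivially separable once the conjunctions themselves are attributes. A minimal sketch (the pair-selection step, which FRINGE performs adaptively from the tree's fringe, is hard-coded here):

```python
import numpy as np

def add_conjunctions(X, pairs):
    """Append high-level attributes x_i AND x_j as new boolean columns
    of the example matrix X."""
    new = [X[:, i] & X[:, j] for i, j in pairs]
    return np.column_stack([X] + new)
```

With the two conjunctions added, the target DNF is simply the disjunction of the two new columns, so a decision tree needs only one test per term instead of a replicated subtree.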

Hybrid Intelligent Systems that combine knowledge-based and artificial neural network systems typically have four phases: domain knowledge representation, mapping of this knowledge into an initial connectionist architecture, network training, and rule extraction. The final phase is important because it can provide a trained connectionist architecture with explanation power and validate its output decisions. Moreover, it can be used to refine and maintain the initial knowledge acquired from domain experts. In this paper, we present three rule extraction techniques. The first technique extracts a set of binary rules from any type of neural network. The other two techniques are specific to feedforward networks with a single hidden layer of sigmoidal units. Technique 2 extracts partial rules that represent the most important embedded knowledge with an adjustable level of detail, while the third technique provides a more comprehensive and universal approach. A rule eval...
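As a baseline for the network-independent style of extraction, a naive black-box scheme can be sketched: enumerate all binary input vectors and record those the network classifies positively; each recorded vector is a maximally specific rule. This enumeration is only feasible for small input counts and is an illustration of the black-box setting, not the paper's first technique.

```python
from itertools import product

def extract_binary_rules(net, n_inputs):
    """Treat net as a black box over binary inputs; return the input
    combinations it fires on (each one a maximally specific rule)."""
    return [bits for bits in product([0, 1], repeat=n_inputs)
            if net(bits)]
```

Practical extraction techniques, including those in the paper, avoid this exponential enumeration by exploiting the network's structure (e.g. per-unit weight analysis in sigmoidal networks).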

... instances are divided into a training set of size 341 and a test set of size 342. Other popular data sets that have been used as benchmarks for rule extraction approaches are the Monk [49], Mushroom [21], and DNA promoter [54] data sets. The inputs of all three of these data sets are symbolic/discrete by nature. Since we want to test more general problems that may include continuous-valued variables, Iri...


This paper extends the currently accepted model of inductive bias by identifying six categories of bias and separates inductive bias from the policy for its selection (the inductive policy). We analyze existing "bias selection" systems, examining the similarities and differences in their inductive policies, and identify three techniques useful for building inductive policies. We then present a framework for representing and automatically selecting a wide variety of biases and describe experiments with an instantiation of the framework addressing various pragmatic tradeoffs of time, space, accuracy, and the cost of errors. The experiments show that a common framework can be used to implement policies for a variety of different types of bias selection, such as parameter selection, term selection, and example selection, using similar techniques. The experiments also show that different tradeoffs can be made by the implementation of different policies; for example, from the same data different rule sets can be learned based on different tradeoffs of accuracy versus the cost of erroneous predictions.

...es, others do not. STABB (and FRINGE (Pagallo, 1989) and CITRE (Matheus, 1989)) starts each search of the hypothesis space from scratch, not constructing a concept description across biases. STAGGER (Schlimmer, 1987), on the other hand, constructs its concept description as the system's bias is modified by adding new terms. The VBMS system (Rendell, Seshu, & Tcheng, 1987) choose...