Propositional Learning Systems

Reference: Bratko section 18.5

Aim:

To describe an algorithm whose input is a collection of instances
and their correct classification and whose output is one or more
sentences in some form of logic, describing each of the possible
output classes in terms of the input attributes.

Rather than searching for discriminant functions, symbolic learning systems
find expressions equivalent to sentences in some form of logic. For example, we
may distinguish objects according to two attributes: size and colour. We may
say that an object belongs to class 3 if its colour is red and its size is very
small to medium. Following the notation of Michalski (1983), the class 3
region of Figure 1 may, for example, be written as:

    class 3 <= [colour = red][size = very small .. medium]

Note that this kind of description partitions the universe into blocks, unlike
the function approximation methods that find smooth surfaces to discriminate
classes.
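Such a block description is easy to represent and test directly. The sketch below is a minimal illustration, not Michalski's notation: the attribute names and the dict-of-sets encoding are assumptions made for the example.

```python
# A conjunctive class description maps each attribute to the set of
# values it allows; an instance is covered when every condition holds.
class3 = {
    "colour": {"red"},
    "size": {"very_small", "small", "medium"},
}

def covers(description, instance):
    """True if the instance satisfies every condition of the description."""
    return all(instance[attr] in allowed
               for attr, allowed in description.items())

print(covers(class3, {"colour": "red", "size": "small"}))   # True
print(covers(class3, {"colour": "blue", "size": "small"}))  # False
```

Each description carves out one rectangular block of the universe; the union of a class's descriptions is its region in Figure 1.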

Interestingly, one of the popular early machine learning algorithms, Aq
(Michalski, 1973), had its origins in switching theory. One of the concerns of
switching theory is to find ways of minimising logic circuits, that is,
simplifying the truth table description of the function of a circuit to a
simple expression in Boolean logic. Many of the algorithms in switching theory
take tables like Figure 1 and search for the best way of covering all of the
entries in the table.

Figure 1: Discrimination on attributes and values

Aq uses a covering algorithm to build its concept description:

cover := {};
repeat
    select one positive example, e;
    construct the set of all conjunctive expressions
        that cover e and no negative example in E-;
    choose the `best' expression, x, from this set;
    add x as a new disjunct of the concept;
    remove all positive examples covered by x
until there are no positive examples left;
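The loop above can be sketched in Python. This is a hedged toy version, not Michalski's implementation: candidate_expressions stands in for Aq's "star" generation and simply enumerates conjunctions of the seed example's own attribute-value pairs, and the data is assumed consistent (no positive example identical to a negative one).

```python
from itertools import combinations

def covers(expr, instance):
    """An expression is a dict of attribute -> required value."""
    return all(instance.get(a) == v for a, v in expr.items())

def candidate_expressions(seed, negatives):
    """All conjunctions of the seed's attribute-value pairs that
    (by construction) cover the seed and cover no negative example."""
    items = list(seed.items())
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            expr = dict(combo)
            if not any(covers(expr, n) for n in negatives):
                yield expr

def aq_cover(positives, negatives):
    cover, remaining = [], list(positives)
    while remaining:
        seed = remaining[0]
        # The 'best' expression here covers the most remaining positives,
        # breaking ties in favour of the more compact conjunction.
        best = max(candidate_expressions(seed, negatives),
                   key=lambda e: (sum(covers(e, p) for p in remaining),
                                  -len(e)))
        cover.append(best)  # add as a new disjunct of the concept
        remaining = [p for p in remaining if not covers(best, p)]
    return cover
```

For instance, with positives {colour=red, size=small} and {colour=red, size=medium} and a single negative {colour=blue, size=small}, the loop returns the single disjunct [colour = red].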

The `best' expression is usually some compromise between the desire to cover
as many positive examples as possible and the desire to have as compact and
readable a representation as possible. In designing Aq, Michalski was
particularly concerned with the expressiveness of the concept description
language.

A drawback of the Aq learning algorithm is that, unlike ID3, it does not use
statistical information present in the training sample to guide induction.

Summary:

Aq iteratively constructs a logic-based description of concepts on
the basis of instances described by features.

APPENDIX

Michalski has always argued in favour of rule-based representations over tree
structured representations, on the grounds of readability. When the domain is
complex, decision trees can become very `bushy' and difficult to understand,
whereas rules tend to be modular and can be read in isolation from the rest of
the knowledge-base constructed by induction. On the other hand, decision tree
induction programs are usually very fast. A compromise is to use decision tree
induction to build an initial tree and then derive rules from the tree, thus
transforming an efficient but opaque representation into a transparent one
(Quinlan, 1987).
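The core of that transformation is simple: each root-to-leaf path of the tree becomes one rule whose conditions are the tests along the path. The sketch below illustrates the idea only; the nested-tuple tree encoding and the attribute values are assumptions for the example, and Quinlan's method additionally prunes the resulting rules.

```python
# A tree is either a class label (a leaf, here a string) or a pair
# (attribute, {value: subtree}) giving the test at an internal node.
tree = ("colour", {
    "red": ("size", {"small": "class3", "large": "class1"}),
    "blue": "class2",
})

def tree_to_rules(node, conditions=()):
    """Yield (conditions, class) pairs, one per root-to-leaf path."""
    if isinstance(node, str):              # leaf: emit one rule
        yield (list(conditions), node)
        return
    attribute, branches = node
    for value, subtree in branches.items():
        yield from tree_to_rules(subtree, conditions + ((attribute, value),))

for conds, label in tree_to_rules(tree):
    print(" AND ".join(f"{a}={v}" for a, v in conds), "=>", label)
```

Each rule can then be read in isolation, which is exactly the modularity Michalski argues for.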

It is instructive to compare the shapes that are produced by various learning
systems when they partition the universe. Figure 1 demonstrates one weakness of
decision trees and other symbolic classifiers. Since they approximate
partitions with rectangles (if the universe is 2-dimensional), there is an
inherent inaccuracy when dealing with domains with continuous attributes.
Function approximation methods and IBL may be able to attain higher accuracy,
but at the expense of transparency of the resulting theory. It is more
difficult to make general comments about genetic algorithms since the encoding
method will affect both accuracy and readability.

As we have seen, useful insights into induction can be gained by visualising it
as searching for a cover of objects in the universe. Unfortunately, there are
limits to this geometric interpretation of learning. If we wish to learn
concepts that describe complex objects and relationships between the objects,
it becomes very difficult to visualise the universe. For this reason, it is
often useful to rely on reasoning about the concept description language.

As we saw, the cover in Figure 1 can be expressed as clauses in propositional
logic. We can establish a correspondence between sentences in the concept
description language (the hypothesis language) and a diagrammatic
representation of the concept. More importantly, we can create a correspondence
between generalisation and specialisation operations on the sets of objects and
generalisation and specialisation operations on the sentences of the language.

Figure 2: Generalisation as set covering

For example, Figure 2 shows two sets, labelled class 1 and class 2. It is
clear that class 1 is a generalisation of class 2, since it includes a larger
number of objects in the universe. We also call class 2 a specialisation of
class 1. By convention, we say the description of class 1 is a generalisation
of the description of class 2. Thus,

(1)

is a generalisation of

(2)

Once we have established the correspondence between sets of objects and their
descriptions, it is often convenient to forget about the objects and consider
only that we are working with expressions in a language. The reason is simple:
beyond a certain point of complexity, it is not possible to visualise sets,
but it is relatively easy to apply simple transformations on sentences in a
formal language. For example, clause (2) can be generalised very easily to
clause (1) by dropping one of the conditions.
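The dropping-condition rule can be checked mechanically: removing a conjunct from a description can only enlarge the set of instances it covers. A minimal sketch (the attributes and the small universe are illustrative assumptions, not the clauses of the figure):

```python
def covers(description, instance):
    """An instance is covered if every condition of the conjunction holds."""
    return all(instance.get(a) == v for a, v in description.items())

# A tiny two-attribute universe of four objects.
universe = [{"colour": c, "size": s}
            for c in ("red", "blue") for s in ("small", "large")]

specific = {"colour": "red", "size": "small"}  # the more specific clause
general = {"colour": "red"}                    # one condition dropped

covered_specific = [x for x in universe if covers(specific, x)]
covered_general = [x for x in universe if covers(general, x)]

# Dropping a condition generalises: everything the specific clause
# covers is still covered, and typically more objects besides.
assert all(x in covered_general for x in covered_specific)
assert len(covered_general) >= len(covered_specific)
```

This is the syntactic counterpart of moving from the smaller set to the larger set in Figure 2: a one-token edit on the sentence corresponds to a set inclusion in the universe.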