Missing values

- Does absence of value have some significance?
  - Yes → “missing” is a separate value
  - No → “missing” must be treated in a special way
    - Solution A: assign instance to most popular branch
    - Solution B: split instance into pieces (sketched in code below)
      - Pieces receive weight according to fraction of training
        instances that go down each branch
      - Classifications from leaf nodes are combined using the
        weights that have percolated to them

Classification rules

- Popular alternative to decision trees
- Antecedent (pre-condition): a series of tests (just like the
  tests at the nodes of a decision tree)
  - Tests are usually logically ANDed together (but may also be
    general logical expressions)
- Consequent (conclusion): class, set of classes, or probability
  distribution assigned by rule
- Individual rules are often logically ORed together
  - Conflicts arise if different conclusions apply
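To make Solution B from the Missing values slide concrete, here is a minimal Python sketch: an instance whose test attribute is missing is split into weighted pieces, one per branch, and the class distributions at the leaves are combined using the weights that percolate down. The Node/Leaf classes, field names, and the toy tree are illustrative, not from the book.

    from collections import defaultdict

    class Leaf:
        def __init__(self, class_counts):
            self.class_counts = class_counts       # e.g. {"yes": 3, "no": 1}

    class Node:
        def __init__(self, attribute, branches, fractions):
            self.attribute = attribute
            self.branches = branches               # value -> subtree
            self.fractions = fractions             # value -> fraction of training
                                                   # instances down that branch

    def class_distribution(tree, instance, weight=1.0):
        """Return class -> accumulated weight, splitting on missing values."""
        if isinstance(tree, Leaf):
            total = sum(tree.class_counts.values())
            return {c: weight * n / total for c, n in tree.class_counts.items()}
        value = instance.get(tree.attribute)       # None means "missing"
        if value is not None:
            pieces = [(tree.branches[value], 1.0)]
        else:                                      # send a weighted piece down
            pieces = [(sub, tree.fractions[v])     # every branch
                      for v, sub in tree.branches.items()]
        dist = defaultdict(float)
        for subtree, fraction in pieces:
            for c, w in class_distribution(subtree, instance,
                                           weight * fraction).items():
                dist[c] += w
        return dict(dist)

    tree = Node("outlook",
                {"sunny": Leaf({"yes": 2, "no": 3}),
                 "rainy": Leaf({"yes": 3, "no": 2})},
                {"sunny": 0.5, "rainy": 0.5})
    print(class_distribution(tree, {}))            # {'yes': 0.5, 'no': 0.5}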
From trees to rules

- Easy: converting a tree into a set of rules (see the sketch below)
  - One rule for each leaf:
    - Antecedent contains a condition for every node on the path
      from the root to the leaf
    - Consequent is class assigned by the leaf
- Produces rules that are unambiguous
  - Doesn't matter in which order they are executed
- But: resulting rules are unnecessarily complex
  - Pruning to remove redundant tests/rules

From rules to trees

- More difficult: transforming a rule set into a tree
  - Tree cannot easily express disjunction between rules
- Example: rules which test different attributes

    If a and b then x
    If c and d then x

- Symmetry needs to be broken
  - Corresponding tree contains identical subtrees
    (“replicated subtree problem”)
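The easy direction (trees → rules) is mechanical enough to sketch in a few lines of Python: walk from the root to each leaf, collecting one condition per node on the path. The Node/Leaf encoding and the small weather-style tree are illustrative, not from the book.

    class Leaf:
        def __init__(self, cls):
            self.cls = cls

    class Node:
        def __init__(self, attribute, branches):
            self.attribute = attribute
            self.branches = branches               # attribute value -> subtree

    def tree_to_rules(tree, conditions=()):
        """Yield one (antecedent, class) rule per leaf of the tree."""
        if isinstance(tree, Leaf):
            yield list(conditions), tree.cls
        else:
            for value, subtree in tree.branches.items():
                yield from tree_to_rules(
                    subtree, conditions + ((tree.attribute, value),))

    tree = Node("outlook", {
        "sunny": Node("humidity", {"high": Leaf("no"), "normal": Leaf("yes")}),
        "overcast": Leaf("yes"),
    })
    for antecedent, cls in tree_to_rules(tree):
        print(" and ".join(f"{a} = {v}" for a, v in antecedent), "->", cls)
    # outlook = sunny and humidity = high -> no
    # outlook = sunny and humidity = normal -> yes
    # outlook = overcast -> yes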
A tree for a simple disjunction

(figure: decision tree for the two rules on the previous slide)

The exclusive-or problem

    If x = 1 and y = 0 then class = a
    If x = 0 and y = 1 then class = a
    If x = 0 and y = 0 then class = b
    If x = 1 and y = 1 then class = b
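As a quick sanity check, the four rules above are just the truth table of exclusive-or: class “a” exactly when x and y differ. A tiny Python sketch (names illustrative) makes the pattern explicit; no single axis-parallel split can capture it, which is why a tree needs to test y separately under each branch of x.

    def xor_class(x, y):
        # Class "a" exactly when x and y differ.
        return "a" if x != y else "b"

    for x in (0, 1):
        for y in (0, 1):
            print(f"x = {x}, y = {y} -> class = {xor_class(x, y)}")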
A tree with a replicated subtree

    If x = 1 and y = 1 then class = a
    If z = 1 and w = 1 then class = a
    Otherwise class = b

(figure: the corresponding tree, with the subtree testing z and w replicated)

“Nuggets” of knowledge

- Are rules independent pieces of knowledge? (It seems easy to
  add a rule to an existing rule base.)
- Problem: ignores how rules are executed
- Two ways of executing a rule set:
  - Ordered set of rules (“decision list”)
    - Order is important for interpretation
  - Unordered set of rules
    - Rules may overlap and lead to different conclusions for the
      same instance
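A minimal Python sketch of the two execution modes, using the rule set above. The encoding of a rule as an (antecedent, class) pair of dicts is illustrative, not from the book.

    def matches(antecedent, instance):
        return all(instance.get(a) == v for a, v in antecedent.items())

    rules = [({"x": 1, "y": 1}, "a"),
             ({"z": 1, "w": 1}, "a")]

    def decision_list(rules, instance, default="b"):
        """Ordered execution: the first rule whose antecedent matches wins."""
        for antecedent, cls in rules:
            if matches(antecedent, instance):
                return cls
        return default

    def unordered(rules, instance):
        """Unordered execution: collect every conclusion that applies.
        Two or more distinct classes would signal a conflict."""
        return {cls for antecedent, cls in rules if matches(antecedent, instance)}

    print(decision_list(rules, {"x": 1, "y": 1, "z": 0, "w": 0}))  # a
    print(unordered(rules, {"x": 0, "y": 0, "z": 0, "w": 0}))      # set(): no rule applies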
Interpreting rules

- What if two or more rules conflict?
  - Give no conclusion at all?
  - Go with rule that is most popular on training data?
  - …
- What if no rule applies to a test instance?
  - Give no conclusion at all?
  - Go with class that is most frequent in training data?
  - …

Special case: boolean class

- Assumption: if instance does not belong to class “yes”, it
  belongs to class “no”
- Trick: only learn rules for class “yes” and use default rule
  for “no”

    If x = 1 and y = 1 then class = a
    If z = 1 and w = 1 then class = a
    Otherwise class = b

- Order of rules is not important. No conflicts!
- Rule can be written in disjunctive normal form
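Because only one class is predicted explicitly and everything else falls to the default, the rule set above collapses into a single expression in disjunctive normal form, an OR of ANDs. A minimal sketch (function names illustrative):

    def is_class_a(x, y, z, w):
        # Disjunctive normal form of the two rules for class "a".
        return (x == 1 and y == 1) or (z == 1 and w == 1)

    def classify(x, y, z, w):
        return "a" if is_class_a(x, y, z, w) else "b"  # default covers "b"

    print(classify(1, 1, 0, 0))  # a
    print(classify(0, 0, 1, 0))  # b (falls through to the default)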
Association rules

- Association rules …
  - … can predict any attribute and combinations of attributes
  - … are not intended to be used together as a set
- Problem: immense number of possible associations
  - Output needs to be restricted to show only the most predictive
    associations → only those with high support and high confidence

Support and confidence of a rule

- Support: number of instances predicted correctly
- Confidence: number of correct predictions, as proportion of all
  instances that rule applies to
- Example: 4 cool days with normal humidity

    If temperature = cool then humidity = normal

  → Support = 4, confidence = 100%
- Normally: minimum support and confidence pre-specified
  (e.g. 58 rules with support ≥ 2 and confidence ≥ 95% for the
  weather data)
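These two numbers are straightforward to compute. The Python sketch below follows the definitions on the slide, with rules encoded as attribute → value dicts and a toy dataset standing in for the weather data (both illustrative):

    def support_and_confidence(data, antecedent, consequent):
        applies = [inst for inst in data
                   if all(inst.get(a) == v for a, v in antecedent.items())]
        correct = [inst for inst in applies
                   if all(inst.get(a) == v for a, v in consequent.items())]
        support = len(correct)                     # correct predictions
        confidence = len(correct) / len(applies) if applies else 0.0
        return support, confidence

    # Toy data: 4 cool days with normal humidity, plus one hot day.
    data = [{"temperature": "cool", "humidity": "normal"}] * 4 + \
           [{"temperature": "hot", "humidity": "high"}]
    print(support_and_confidence(data,
                                 {"temperature": "cool"},
                                 {"humidity": "normal"}))  # (4, 1.0)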
Interpreting association rules

- Interpretation is not obvious:

    If windy = false and play = no then outlook = sunny
                                        and humidity = high

  is not the same as

    If windy = false and play = no then outlook = sunny
    If windy = false and play = no then humidity = high

- It means that the following also holds:

    If humidity = high and windy = false and play = no
    then outlook = sunny

Rules with exceptions

- Idea: allow rules to have exceptions
- Example: rule for iris data

    If petal-length ≥ 2.45 and petal-length < 4.45 then Iris-versicolor

- New instance:

    Sepal length   Sepal width   Petal length   Petal width   Type
    5.1            3.5           2.6            0.2           Iris-setosa

- Modified rule:

    If petal-length ≥ 2.45 and petal-length < 4.45 then Iris-versicolor
    EXCEPT if petal-width < 1.0 then Iris-setosa
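A minimal sketch of how the modified rule is read: the exception is tested only when the main rule fires, and it overrides the rule's conclusion (function name illustrative):

    def classify(petal_length, petal_width):
        if 2.45 <= petal_length < 4.45:
            if petal_width < 1.0:                  # EXCEPT clause
                return "Iris-setosa"
            return "Iris-versicolor"
        return None                                # the rule does not apply

    print(classify(2.6, 0.2))  # Iris-setosa: the new instance above
    print(classify(4.0, 1.3))  # Iris-versicolor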
A more complex example

- Exceptions to exceptions to exceptions …
  (a small evaluator is sketched after these slides)

    default: Iris-setosa
    except if petal-length ≥ 2.45 and petal-length < 5.355
              and petal-width < 1.75
           then Iris-versicolor
           except if petal-length ≥ 4.95 and petal-width < 1.55
                  then Iris-virginica
           else if sepal-length < 4.95 and sepal-width ≥ 2.45
                  then Iris-virginica
    else if petal-length ≥ 3.35
         then Iris-virginica
         except if petal-length < 4.85 and sepal-length < 5.95
                then Iris-versicolor

Advantages of using exceptions

- Rules can be updated incrementally
  - Easy to incorporate new data
  - Easy to incorporate domain knowledge
- People often think in terms of exceptions
- Each conclusion can be considered just in the context of rules
  and exceptions that lead to it
  - Locality property is important for understanding large rule sets
  - “Normal” rule sets don't offer this advantage
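One way to execute such nested exceptions is a small recursive evaluator. The sketch below encodes each rule as an illustrative (condition, class, exceptions) triple: else-if siblings are tried in order, and a firing rule's conclusion is overridden if one of its own exceptions fires in turn.

    def evaluate(rules, default, inst):
        """Try sibling rules in order; a firing rule's conclusion may be
        overridden by one of its own exceptions, recursively."""
        for condition, cls, exceptions in rules:
            if condition(inst):
                return evaluate(exceptions, cls, inst)
        return default

    iris_rules = [
        (lambda i: 2.45 <= i["petal-length"] < 5.355 and i["petal-width"] < 1.75,
         "Iris-versicolor",
         [(lambda i: i["petal-length"] >= 4.95 and i["petal-width"] < 1.55,
           "Iris-virginica", []),
          (lambda i: i["sepal-length"] < 4.95 and i["sepal-width"] >= 2.45,
           "Iris-virginica", [])]),
        (lambda i: i["petal-length"] >= 3.35,
         "Iris-virginica",
         [(lambda i: i["petal-length"] < 4.85 and i["sepal-length"] < 5.95,
           "Iris-versicolor", [])]),
    ]

    print(evaluate(iris_rules, "Iris-setosa",
                   {"sepal-length": 5.1, "sepal-width": 3.5,
                    "petal-length": 1.4, "petal-width": 0.2}))  # Iris-setosa
    print(evaluate(iris_rules, "Iris-setosa",
                   {"sepal-length": 6.0, "sepal-width": 3.0,
                    "petal-length": 4.5, "petal-width": 1.4}))  # Iris-versicolor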
More on exceptions

- Default … except if … then …
  is logically equivalent to
  if … then … else …
  (where the else specifies what the default did)
- But: exceptions offer a psychological advantage
  - Assumption: defaults and tests early on apply more widely
    than exceptions further down
  - Exceptions reflect special cases

Rules involving relations

- So far: all rules involved comparing an attribute value to a
  constant (e.g. temperature < 45)
- These rules are called “propositional” because they have the
  same expressive power as propositional logic
- What if the problem involves relationships between examples
  (e.g. the family tree problem from above)?
  - Can't be expressed with propositional rules
  - More expressive representation required
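To illustrate the limitation, a minimal sketch: a propositional rule compares an attribute with a constant, while a relational rule compares attributes (or examples) with each other, something no fixed attribute-versus-constant test can express. The block example and all names are illustrative, not from the slides:

    def propositional(instance):
        # Propositional: an attribute compared with a constant.
        return "hot" if instance["temperature"] >= 45 else "not hot"

    def relational(block):
        # Relational: two attributes of the same example compared with
        # each other; no single attribute-vs-constant test captures this.
        return "lying" if block["width"] > block["height"] else "standing"

    print(propositional({"temperature": 50}))       # hot
    print(relational({"width": 4, "height": 2}))    # lying
    print(relational({"width": 2, "height": 4}))    # standing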