GATree Home

What is GATree?
This work is an attempt to overcome the use of greedy heuristics and search the decision tree space in a natural way. More specifically, we make use of genetic algorithms (GAs) to directly evolve binary decision trees in the quest for the one that most closely matches the target concept. In doing so, we adopt a natural representation of the search space, using actual decision trees rather than binary strings. We couple this objective with a preference for simplicity: we use GAs to robustly evolve decision trees that are both accurate and simple.
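To make the idea of evolving actual trees (rather than binary strings) concrete, here is a minimal Python sketch of a tree genome with subtree crossover and node mutation. It is an illustration only, not GATree's implementation: the `Node` layout, the threshold-style test, and the names `random_tree`, `crossover` and `mutate` are all assumptions made for this example.

```python
import copy
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """A binary decision tree node: a leaf if `label` is set, otherwise a test."""
    feature: int = 0               # index of the attribute tested (assumed form)
    threshold: float = 0.0         # go left if x[feature] <= threshold
    label: Optional[int] = None    # class label for leaves, None for tests
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def random_tree(rng: random.Random, n_features: int, n_classes: int, depth: int) -> Node:
    """Grow a random tree in which every test node has exactly two children."""
    if depth == 0 or rng.random() < 0.3:
        return Node(label=rng.randrange(n_classes))
    return Node(feature=rng.randrange(n_features),
                threshold=rng.random(),
                left=random_tree(rng, n_features, n_classes, depth - 1),
                right=random_tree(rng, n_features, n_classes, depth - 1))


def nodes(tree: Node) -> List[Node]:
    """Return every node of the tree in pre-order."""
    out = [tree]
    if tree.label is None:
        out += nodes(tree.left) + nodes(tree.right)
    return out


def replace_with(target: Node, source: Node) -> None:
    """Overwrite `target` in place with a deep copy of `source`."""
    target.__dict__.update(copy.deepcopy(source).__dict__)


def crossover(rng: random.Random, a: Node, b: Node) -> Node:
    """Subtree crossover: a copy of `a` with one random subtree taken from `b`."""
    child = copy.deepcopy(a)
    replace_with(rng.choice(nodes(child)), rng.choice(nodes(b)))
    return child


def mutate(rng: random.Random, tree: Node, n_features: int, n_classes: int) -> Node:
    """Node mutation: re-draw the test of one test node or the label of one leaf."""
    child = copy.deepcopy(tree)
    node = rng.choice(nodes(child))
    if node.label is None:
        node.feature = rng.randrange(n_features)
        node.threshold = rng.random()
    else:
        node.label = rng.randrange(n_classes)
    return child
```

Because both operators work on whole subtrees, every offspring is itself a valid decision tree, and no decoding step from a bit string is needed.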

Preference vs. Procedural Bias
A preference bias is based on the learner’s behavior, while a procedural bias is based on the learner’s design. For example, C4.5 is biased towards accurate, small trees (a preference bias) and uses the gain-ratio metric and minimum-error pruning (two different procedural biases). A preference bias is usually desirable, since it determines the characteristics of the produced tree. An inadequate procedural bias, on the other hand, may severely affect the quality of the output. The proposed search imposes a new, weak procedural bias, one that allows the concept learner to consider a relatively large number of hypotheses in a relatively efficient manner. This weak bias employs global metrics of tree quality. We thus shift from “how to induce a tree” (standard, impurity-based induction) to “what criteria an induced tree must satisfy”.
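A global metric of tree quality can be as simple as a score computed on the finished tree from its accuracy and its size. The Python sketch below shows one possible form. It is an assumption made for illustration, not necessarily the metric GATree uses; the `Node` layout, the function names and the `size_penalty` parameter are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """Leaf if `label` is set, otherwise a threshold test (assumed form)."""
    feature: int = 0
    threshold: float = 0.0
    label: Optional[int] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def predict(tree: Node, x: Sequence[float]) -> int:
    """Follow the tests from the root down to a leaf and return its label."""
    while tree.label is None:
        tree = tree.left if x[tree.feature] <= tree.threshold else tree.right
    return tree.label


def size(tree: Node) -> int:
    """Count all nodes (tests and leaves) in the tree."""
    if tree.label is not None:
        return 1
    return 1 + size(tree.left) + size(tree.right)


def fitness(tree: Node, X, y, size_penalty: float = 1000.0) -> float:
    """Score a whole tree: reward correct classifications, discount by size.

    `size_penalty` is a hypothetical knob. Large values make size matter
    little, so accuracy dominates; small values favour compact trees.
    """
    correct = sum(predict(tree, x) == t for x, t in zip(X, y))
    s = size(tree)
    return correct ** 2 * size_penalty / (s ** 2 + size_penalty)
```

Note that the score says nothing about how the tree was built. It only states what a good tree looks like, which is the shift from "how" to "what" described above.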

What is special about GATree?
(a) GATree can continue decision tree evolution for as long as needed. Given ample resources, we can expect an increasingly better-fitting decision tree. We can also stop the evolution whenever the results are satisfactory, since we evolve complete solutions to the problem.
(b) GATree allows the user to select the characteristics of the resulting decision tree. It's easy to prefer smaller or more accurate trees.
(c) GATree can provide a set of totally different decision trees that are all close matches to the target concept. Any of these trees can be used as an alternative to the best-fit one.
(d) There are certain domains where statistical inducers cannot produce optimal trees. GATree can overcome global or local minima. Please read the papers that present this approach and its benefits.
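Points (a) and (c) follow from the shape of a generational GA loop, which the Python sketch below illustrates. It is a generic loop written for this page, not GATree's code: the function `evolve` and its parameters (`breed`, `good_enough`, `keep`) are hypothetical, and individuals can be any objects for which `fitness` and `breed` are defined.

```python
import random
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def evolve(population: List[T],
           fitness: Callable[[T], float],
           breed: Callable[[random.Random, T, T], T],
           max_generations: int,
           good_enough: Optional[float] = None,
           rng: Optional[random.Random] = None,
           keep: int = 3) -> List[T]:
    """Evolve complete candidates and return the `keep` fittest ones.

    The loop can stop at any generation, because every individual in the
    population is already a complete solution.
    """
    rng = rng or random.Random()
    pop = sorted(population, key=fitness, reverse=True)
    for _ in range(max_generations):
        # Stop early once the current best is satisfactory.
        if good_enough is not None and fitness(pop[0]) >= good_enough:
            break
        # The fitter half of the population acts as the parent pool.
        parents = pop[: max(2, len(pop) // 2)]
        children = [breed(rng, rng.choice(parents), rng.choice(parents))
                    for _ in range(len(pop) - 1)]
        # Elitism: the best individual always survives, so the best
        # fitness never decreases from one generation to the next.
        pop = sorted([pop[0]] + children, key=fitness, reverse=True)
    return pop[:keep]
```

A larger `max_generations` simply buys more search, and the returned list holds several high-scoring candidates rather than a single winner. This sketch does not remove duplicates, so a real system would need an extra step to ensure the alternatives are genuinely different.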