Abstract

The identification and management of context over time has become an important topic in machine learning research over the last ten years. Although established systems detect context changes successfully, there is room for further research towards an optimal system that both identifies context and detects context changes in large datasets efficiently. Building on recent advances in Bayesian network learning and graphical presentation, this research extends the work of Schlimmer (118, 119), Widmer (34) and Harries (39) to derive context and detect the point of concept drift in large datasets automatically. In the research described in this thesis, we have made the following breakthroughs:

1. By combining Java Bayesian Network Classifiers (JBNC) and JavaBayes, we obtain a very rich package for learning and understanding concepts. JBNC (51, 57), with its node-discarding facility, has outperformed the Naive Bayes classifier (111) in producing accurate Bayesian network structures. The resulting Bayesian network structures, with the contextual attributes (125) and the relationships between these attributes, are displayed graphically by JavaBayes (56). The contextual values of the attributes are identified statistically from the probability tables stored within each attribute node. Finally, the Boolean characterization (118), which is the context, is derived.

2. Based on the observation that a dataset with only one hidden context produces good self-test accuracy, we propose novel "Top-down" and "Bottom-up" learning methods that automatically detect the disjoint points at which the concept begins to drift within large datasets. These methods do not use the traditional "Windowing" method (30), which is very popular among existing methods. Instead, the proposed methods use simple search operators.

3. Learning accuracy mechanisms and noise handling methods are proposed so that the learning methods can detect concept drift efficiently in a real-life environment.

The Music Chord (34) and Vowel (126) datasets were previously used by Widmer (34) to verify that METAL(B) and METAL(IB) detect concept drift in a real-life environment. The methods proposed in this thesis are further validated on these datasets to show their efficiency and effectiveness in learning from real-life datasets compared with the META-Learning System.