Public examination of PhD thesis: “Feature Extraction for Supervised Learning in Knowledge Discovery Systems”, 20.12.05, 12:00, Agora, Aud 2

Slide 7: How to construct a good representation space (RS) for supervised learning (SL)?
[Overview diagram: original vs. extracted features, annotated with the research questions]
- RQ1: How important is it to use class information in the feature extraction (FE) process?
- RQ2: Is FE data-oriented, SL-oriented, or both?
- RQ3: Is FE for dynamic integration of base-level classifiers useful in a similar way as for a single base-level classifier?
- RQ4: Which features (original, extracted, or both) are useful for SL?
- RQ5: How many extracted features are useful for SL?
- RQ6: How to cope with the presence of contextual features in the data, and with data heterogeneity?
- RQ7: What is the effect of sample reduction on the performance of FE for SL?

Slide 8: Research Problem, Contribution, and Method
- Research problem: studying both the theoretical background and the practical aspects of FE for SL in knowledge discovery systems (KDSs).
- Main contribution: a many-sided analysis of the research problem; an ensemble of relatively small contributions.
- Research method: a multimethodological approach to the construction of an artefact for data mining (DM), following Nunamaker et al. (1990-91), cycling through theory building, observation, experimentation, and DM artefact development.

Slide 9: Further Research
- How to support decision making on the selection of an appropriate DM strategy for the problem under consideration?
- When is FE useful for SL?
- What is the effect of FE on the interpretability of results and the transparency of SL?

Slide 11: Research Questions
- RQ1: How important is it to use class information in the FE process?
- RQ2: Is FE a data-driven or a hypothesis-driven constructive induction?
- RQ3: Is FE for dynamic integration of base-level classifiers useful in a similar way as for a single base-level classifier?
- RQ4: Which features (original, extracted, or both) are useful for SL?
- RQ5: How many extracted features are useful for SL?

Slide 12: Research Questions (cont.)
- RQ6: How to cope with the presence of contextual features in the data, and with data heterogeneity?
- RQ7: What is the effect of sample reduction on the performance of FE for SL?
- RQ8: When is FE useful for SL?
- RQ9: What is the effect of FE on the interpretability of results and the transparency of SL?
- RQ10: How to decide on the selection of an appropriate DM strategy for the problem under consideration?

Slide 13: RQ1: Use of class information in FE
Reference: Tsymbal A., Puuronen S., Pechenizkiy M., Baumgarten M., Patterson D. 2002. Eigenvector-based Feature Extraction for Classification (Article I, FLAIRS’02).
- The use of class information in the FE process is crucial for many datasets: class-conditional FE can result in better classification accuracy, whereas solely variance-based FE has no effect on the accuracy or even deteriorates it.
- There is no single superior technique, but nonparametric approaches are more stable across various dataset characteristics.
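The contrast between solely variance-based and class-conditional FE can be illustrated on synthetic data (a hypothetical example, not taken from the article): PCA picks the highest-variance direction regardless of the labels, while a Fisher-style class-conditional direction uses the class means and within-class scatter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two classes that overlap along the highest-variance axis but are
# separable along a low-variance axis: variance-based FE (PCA) picks
# the wrong direction, class-conditional FE (Fisher's discriminant) does not.
n = 200
class0 = rng.normal(loc=[0.0, -1.0], scale=[5.0, 0.3], size=(n, 2))
class1 = rng.normal(loc=[0.0, 1.0], scale=[5.0, 0.3], size=(n, 2))
X = np.vstack([class0, class1])
y = np.array([0] * n + [1] * n)

# Variance-based direction: leading eigenvector of the total covariance.
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
w_pca = eigvecs[:, np.argmax(eigvals)]

# Class-conditional direction: Fisher's w = Sw^{-1} (m1 - m0).
m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
Sw = np.cov(class0, rowvar=False) + np.cov(class1, rowvar=False)
w_lda = np.linalg.solve(Sw, m1 - m0)
w_lda /= np.linalg.norm(w_lda)

def separation(w):
    """Between-class mean distance of the 1-D projection, in pooled-std units."""
    p0, p1 = X[y == 0] @ w, X[y == 1] @ w
    pooled_std = np.sqrt((p0.var() + p1.var()) / 2)
    return abs(p0.mean() - p1.mean()) / pooled_std

sep_pca, sep_lda = separation(w_pca), separation(w_lda)
print(f"PCA separation: {sep_pca:.2f}, class-conditional separation: {sep_lda:.2f}")
```

On data of this shape the class-conditional projection separates the classes well while the variance-based one mixes them, matching the finding that class information can be crucial.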

Slide 14: RQ2: Is FE a data- or hypothesis-driven constructive induction?
Reference: Pechenizkiy M. 2005. Impact of the Feature Extraction on the Performance of a Classifier: kNN, Naïve Bayes and C4.5 (Article III, AI’05).
[Diagram: two variants of the DM process. The search for the most appropriate FE technique yields an FE model that transforms the train set; the search for the most appropriate SL technique then yields an SL model, which is applied to the test set for prediction.]
- The ranking of FE techniques according to the accuracy results of an SL technique can vary greatly across datasets.
- Different FE techniques also behave differently when integrated with different SL techniques.
- Consequently, the selection of the FE method is not independent of the selection of the classifier.
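A minimal sketch of the consequence drawn above: evaluate FE and SL choices jointly on held-out data rather than fixing the FE technique first. The toy data, the two FE options (identity vs. a 1-component PCA), and the two deliberately simple classifiers are all illustrative assumptions, not the article's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 2-class data: axis 0 is informative, axis 1 is high-variance noise.
n = 150
X = np.column_stack([
    np.r_[rng.normal(-1, 0.5, n), rng.normal(1, 0.5, n)],  # informative
    rng.normal(0, 2.0, 2 * n),                             # noisy, high variance
])
y = np.r_[np.zeros(n), np.ones(n)].astype(int)
idx = rng.permutation(2 * n)
train, test = idx[:200], idx[200:]

def fe_identity(Xtr, Xte):
    return Xtr, Xte

def fe_pca1(Xtr, Xte):
    # Keep only the leading principal component (here: the noise axis).
    mu = Xtr.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xtr - mu, full_matrices=False)
    w = Vt[0]
    return (Xtr - mu) @ w[:, None], (Xte - mu) @ w[:, None]

def nearest_centroid(Xtr, ytr, Xte):
    cents = np.array([Xtr[ytr == c].mean(axis=0) for c in (0, 1)])
    d = ((Xte[:, None, :] - cents[None]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def one_nn(Xtr, ytr, Xte):
    d = ((Xte[:, None, :] - Xtr[None]) ** 2).sum(axis=2)
    return ytr[d.argmin(axis=1)]

# Joint search over (FE technique, SL technique) pairs.
scores = {}
for fe_name, fe in [("identity", fe_identity), ("pca1", fe_pca1)]:
    Xtr, Xte = fe(X[train], X[test])
    for sl_name, sl in [("centroid", nearest_centroid), ("1nn", one_nn)]:
        scores[(fe_name, sl_name)] = (sl(Xtr, y[train], Xte) == y[test]).mean()

best_pair = max(scores, key=scores.get)
print(scores, "best:", best_pair)
```

Because the FE options change which classifier works best (and vice versa), the best pair can only be found by searching the combinations together.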

Slide 16: RQ4: How to construct a good RS for SL?
Reference: Pechenizkiy M., Tsymbal A., Puuronen S. 2004. PCA-based feature transformation for classification: issues in medical diagnostics (Article II, CBMS’2004).
- Which features (original, extracted, or both) are useful for SL?
- Combining the original features with the extracted features can be beneficial for SL on many datasets, especially when tree-based inducers such as C4.5 are used for classification.
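The combined representation space is simple to build: extract components and append them to the original feature matrix, so the learner can draw on both. A minimal sketch with synthetic data (the choice of PCA via SVD and of two components is illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))   # synthetic "original" features

# Extract principal components from the centered data via SVD.
mu = X.mean(axis=0)
Xc = X - mu
_, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2                           # keep the two leading components
Z = Xc @ Vt[:k].T               # extracted features

# Combined representation space: original features plus extracted ones.
X_combined = np.hstack([X, Z])
print(X_combined.shape)         # (100, 7)
```

A tree inducer run on `X_combined` can then split on whichever column, original or extracted, is most informative at each node.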

Slide 18: RQ5: How many extracted features are useful?
Reference: Pechenizkiy M., Tsymbal A., Puuronen S. 2004. PCA-based feature transformation for classification: issues in medical diagnostics (Article II, CBMS’2004).
- Criteria for selecting the most useful transformed features are often based on the variance accounted for by the features to be selected:
  - keep all components whose corresponding eigenvalues are significantly greater than one;
  - or use a ranking procedure: select the principal components that have the highest correlations with the class attribute.
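Both selection criteria can be sketched in a few lines on synthetic data (the data, thresholds, and variable names are illustrative assumptions): an eigenvalue-greater-than-one rule on the correlation matrix, and a ranking of components by their correlation with the class attribute.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 200, 4
X = rng.normal(size=(n, d))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=n)   # correlated pair -> one large eigenvalue
y = (X[:, 0] > 0).astype(float)                # class depends on the correlated pair

# Criterion 1: keep components of the correlation matrix with eigenvalue > 1.
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]              # eigh returns ascending order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
kept_kaiser = np.flatnonzero(eigvals > 1.0)

# Criterion 2: rank components by |correlation| with the class attribute.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
components = Xs @ eigvecs
class_corr = np.abs([np.corrcoef(components[:, j], y)[0, 1] for j in range(d)])
ranking = np.argsort(class_corr)[::-1]

print("eigenvalues:", np.round(eigvals, 3))
print("kept by eigenvalue rule:", kept_kaiser, "ranked by class correlation:", ranking)
```

The two rules can disagree: the eigenvalue rule is purely variance-based, while the ranking rule uses the class attribute, echoing RQ1.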

Slide 21: RQ8: When is FE useful for SL?
- The Kaiser-Meyer-Olkin (KMO) criterion accounts for total and partial correlation.
- General recommendation: IF KMO > 0.5 THEN apply PCA.
- This recommendation rarely works in the context of SL.
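For concreteness, a sketch of one common formulation of the KMO measure, computed from the squared raw correlations and the squared partial correlations obtained via the inverse correlation matrix (the slide does not prescribe this exact computation, and the datasets below are synthetic):

```python
import numpy as np

def kmo(X):
    """Kaiser-Meyer-Olkin measure: total vs. partial correlation (one common formulation)."""
    R = np.corrcoef(X, rowvar=False)
    S = np.linalg.inv(R)
    # Partial correlations from the inverse of the correlation matrix.
    d = np.sqrt(np.diag(S))
    P = -S / np.outer(d, d)
    mask = ~np.eye(R.shape[0], dtype=bool)
    r2 = (R[mask] ** 2).sum()   # squared raw correlations (off-diagonal)
    p2 = (P[mask] ** 2).sum()   # squared partial correlations (off-diagonal)
    return r2 / (r2 + p2)

rng = np.random.default_rng(4)
# Strongly inter-correlated features (one latent factor plus noise): high KMO.
latent = rng.normal(size=(300, 1))
X_corr = latent + 0.3 * rng.normal(size=(300, 5))
# Independent features: KMO near 0.5.
X_indep = rng.normal(size=(300, 5))

print(f"KMO (correlated): {kmo(X_corr):.2f}, KMO (independent): {kmo(X_indep):.2f}")
```

By the rule on the slide, PCA would be recommended for the first dataset, yet KMO never looks at the class attribute, which is one reason the rule transfers poorly to SL.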

Slide 22: RQ9: What is the effect of FE on interpretability?
Reference: Pechenizkiy M., Tsymbal A., Puuronen S. 2004. PCA-based feature transformation for classification: issues in medical diagnostics (Article II, CBMS’2004).
- Interpretability refers to whether a classifier is easy to understand:
  - rule-based classifiers such as decision trees and association rules are very easy to interpret;
  - neural networks and other connectionist, “black-box” classifiers have low interpretability.
- FE enables:
  - new concepts, and with them new understanding;
  - summarising the information from a large number of features into a limited number of components;
  - transformation formulae that convey the importance of the original features;
  - a better RS, hence a better neighbourhood and better interpretability by analogy with similar medical cases;
  - visual analysis by projecting the data onto 2D or 3D plots.
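The point about transformation formulae can be made concrete: each principal component is a weighted sum of the original features, and printing those weights shows which features drive it. The feature names and data below are hypothetical stand-ins, not from the article.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
# Hypothetical clinical-style features; weight and BMI measure related quantities.
weight = rng.normal(70, 10, n)
bmi = weight / 3 + rng.normal(0, 1, n)
age = rng.normal(50, 15, n)
X = np.column_stack([weight, bmi, age])
names = ["weight", "bmi", "age"]

# PCA on standardized features: eigenvectors of the correlation structure.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xs, rowvar=False))
loadings = eigvecs[:, np.argsort(eigvals)[::-1]]

# The transformation formula of the leading component reveals which
# original features it is built from.
pc1 = loadings[:, 0]
formula = " + ".join(f"{w:+.2f}*{f}" for w, f in zip(pc1, names))
print("PC1 =", formula)
```

Here the leading component loads mainly on the two related features, so the formula itself names the underlying quantity they share.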

Slide 23: RQ9: Feature Extraction & Interpretability (cont.)
Reference: Pechenizkiy M., Tsymbal A., Puuronen S. 2004. PCA-based feature transformation for classification: issues in medical diagnostics (Article II, CBMS’2004).
Objectivity of interpretability:
- The assessment of interpretability relies on the user’s perception of the classifier.
- The assessment of an algorithm’s practicality depends much on the user’s background, preferences, and priorities.
- Most of the characteristics related to practicality can be described only by reporting users’ subjective evaluations.
- Thus, interpretability issues are disputable and difficult to evaluate, and many conclusions on interpretability are relative and subjective.
- Collaboration between DM researchers and domain experts is needed for further analysis of interpretability issues.