32 Supplementary Lecture Slides
  - Note: The slides following the end-of-chapter summary are supplementary slides that could be useful for supplementary reading or teaching
  - These slides may have corresponding text in the book chapters, but were omitted from the author’s own course lectures due to limited time
  - The slides in other chapters follow a similar convention and treatment
April 13, 2017
Data Mining: Concepts and Techniques

38 Are All the “Discovered” Patterns Interesting?
  - Data mining may generate thousands of patterns: not all of them are interesting
      - Suggested approach: human-centered, query-based, focused mining
  - Interestingness measures
      - A pattern is interesting if it is easily understood by humans, valid on new or test data with some degree of certainty, potentially useful, novel, or validates some hypothesis that a user seeks to confirm
  - Objective vs. subjective interestingness measures
      - Objective: based on statistics and structures of patterns, e.g., support, confidence, etc.
      - Subjective: based on user’s belief in the data, e.g., unexpectedness, novelty, actionability, etc.
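
The two objective measures named above, support and confidence, can be sketched as follows. This is a minimal illustration, assuming transactions are represented as sets of item names; the function names and the sample data are hypothetical and do not come from the slides.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(transactions: List[Transaction], itemset: FrozenSet[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(
    transactions: List[Transaction],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
) -> float:
    """Estimate of P(consequent | antecedent) for the rule antecedent => consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, antecedent | consequent) / base


# Hypothetical toy data, for illustration only.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

rule_support = support(transactions, frozenset({"bread", "milk"}))
rule_confidence = confidence(transactions, frozenset({"bread"}), frozenset({"milk"}))
```

Here the rule bread => milk holds in 2 of 4 transactions (support 0.5) and in 2 of the 3 transactions that contain bread (confidence about 0.67). Subjective measures such as unexpectedness depend on the user's prior beliefs and are not captured by counts like these.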

39 Find All and Only Interesting Patterns?
  - Find all the interesting patterns: Completeness
      - Can a data mining system find all the interesting patterns? Do we need to find all of the interesting patterns?
      - Heuristic vs. exhaustive search
      - Association vs. classification vs. clustering
  - Search for only interesting patterns: An optimization problem
      - Can a data mining system find only the interesting patterns?
      - Approaches
          - First generate all the patterns and then filter out the uninteresting ones
          - Generate only the interesting patterns: mining query optimization
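
The two approaches listed above can be contrasted in a small sketch. It assumes, purely for illustration, that "interesting" means "meets a minimum support count"; the second function uses Apriori-style level-wise pruning as one possible way to avoid generating uninteresting candidates. All function names and the sample data are hypothetical.

```python
from itertools import combinations


def support_count(transactions, itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def generate_then_filter(transactions, min_count):
    """Approach 1: enumerate every non-empty itemset, then keep the frequent ones."""
    items = sorted(set().union(*transactions))
    result, examined = {}, 0
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            examined += 1
            candidate = frozenset(combo)
            count = support_count(transactions, candidate)
            if count >= min_count:
                result[candidate] = count
    return result, examined


def generate_only_frequent(transactions, min_count):
    """Approach 2: grow candidates level by level, pruning as early as possible."""
    items = sorted(set().union(*transactions))
    result, examined = {}, 0
    level = [frozenset([i]) for i in items]
    while level:
        k = len(level[0]) + 1
        survivors = []
        for candidate in level:
            examined += 1
            count = support_count(transactions, candidate)
            if count >= min_count:
                result[candidate] = count
                survivors.append(candidate)
        # Join frequent itemsets into candidates one item larger.
        joined = {a | b for a in survivors for b in survivors if len(a | b) == k}
        # Drop any candidate with an infrequent subset: it cannot be frequent.
        level = [
            c for c in joined
            if all(frozenset(s) in result for s in combinations(c, k - 1))
        ]
    return result, examined


# Hypothetical toy data, for illustration only.
transactions = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"d"}),
]

all_patterns, examined_all = generate_then_filter(transactions, min_count=2)
pruned_patterns, examined_pruned = generate_only_frequent(transactions, min_count=2)
```

Both functions return the same frequent itemsets, but the pruned version examines fewer candidates. The pruning is only valid because support never increases when an itemset grows; other interestingness measures may not allow the same shortcut.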

41 A Few Announcements (Sept. 1)
  - A new section CS412ADD: CRN and its rules/arrangements
  - 4th Unit for I2CS students
      - Survey report for mining new types of data
  - 4th Unit for on-campus students
      - High-quality implementation of one selected data mining algorithm in the textbook (to be discussed with TA/Instructor)
      - Or, a research report if you plan to devote your future research thesis to data mining

42 Why Data Mining Query Language?
  - Automated vs. query-driven?
      - Finding all the patterns autonomously in a database? Unrealistic, because the patterns could be too many yet uninteresting
  - Data mining should be an interactive process
      - The user directs what is to be mined
  - Users must be provided with a set of primitives to be used to communicate with the data mining system
  - Incorporating these primitives in a data mining query language
      - More flexible user interaction
      - Foundation for design of graphical user interface
      - Standardization of data mining industry and practice

47 DMQL: A Data Mining Query Language
  - Motivation
      - A DMQL can provide the ability to support ad-hoc and interactive data mining
      - By providing a standardized language like SQL
          - The hope is to achieve an effect similar to the one SQL has had on relational databases
          - Foundation for system development and evolution
          - Facilitate information exchange, technology transfer, commercialization and wide acceptance
  - Design
      - DMQL is designed with the primitives described earlier