4 Training Bayesian Networks. Scenario: the network structure is given and all variables are observable: learn only the CPTs (conditional probability tables), similar to naive Bayes. Data Mining: Concepts and Techniques
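
When every variable is observed, a CPT entry can be estimated by counting. The sketch below is a minimal illustration, not the lecture's own code: the function name `learn_cpt`, the record layout (one dict per tuple), and the toy `rain`/`wet` variables are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def learn_cpt(records, child, parents):
    """Estimate P(child | parents) by relative frequency in fully observed records."""
    counts = defaultdict(Counter)
    for row in records:
        key = tuple(row[p] for p in parents)   # one CPT row per parent configuration
        counts[key][row[child]] += 1
    cpt = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        cpt[key] = {value: n / total for value, n in counter.items()}
    return cpt

# Hypothetical toy data: 3 rainy days (2 wet), 4 dry days (1 wet).
records = [
    {"rain": 1, "wet": 1}, {"rain": 1, "wet": 1}, {"rain": 1, "wet": 0},
    {"rain": 0, "wet": 0}, {"rain": 0, "wet": 0}, {"rain": 0, "wet": 1},
    {"rain": 0, "wet": 0},
]
cpt_wet = learn_cpt(records, child="wet", parents=["rain"])
```

Here `cpt_wet[(1,)][1]` is the estimate of P(wet=1 | rain=1). Parent configurations that never occur in the data get no row at all, so a practical version would likely add smoothing.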

5 Training Bayesian Networks. Scenario: network structure known, some variables hidden: use a gradient descent (greedy hill-climbing) method, i.e., search for a solution along the steepest descent of a criterion function (similar to neural network training). Example optimization function: the likelihood of observing the data. Weights are initialized to random probability values. At each iteration, the method moves towards what appears to be the best solution at the moment, without backtracking. Weights are updated at each iteration and converge to a local optimum.
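
The idea can be sketched on a tiny network with one hidden binary variable H and two observed binary children X1 and X2. This is an illustration under stated assumptions, not the slide's exact procedure: each probability is stored as the sigmoid of a free parameter so that it stays inside (0, 1) without a separate renormalization step, and the names `joint`, `log_likelihood`, and `train`, the toy data, and the step size are all made up for the example.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def joint(x1, x2, a, b1, b2):
    """Return [P(H=0, x1, x2), P(H=1, x1, x2)] for the network H -> X1, H -> X2."""
    pi = sigmoid(a)                      # P(H=1)
    out = []
    for h in (0, 1):
        prior = pi if h == 1 else 1.0 - pi
        t1, t2 = sigmoid(b1[h]), sigmoid(b2[h])   # P(X1=1|H=h), P(X2=1|H=h)
        out.append(prior * (t1 if x1 else 1.0 - t1) * (t2 if x2 else 1.0 - t2))
    return out

def log_likelihood(data, a, b1, b2):
    """Log-likelihood of the observed pairs, with the hidden H summed out."""
    return sum(math.log(sum(joint(x1, x2, a, b1, b2))) for x1, x2 in data)

def train(data, steps=200, lr=0.1, seed=0):
    rng = random.Random(seed)
    a = rng.uniform(-1, 1)               # random initial weights
    b1 = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    b2 = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    history = [log_likelihood(data, a, b1, b2)]
    n = len(data)
    for _ in range(steps):
        ga, g1, g2 = 0.0, [0.0, 0.0], [0.0, 0.0]
        pi = sigmoid(a)
        for x1, x2 in data:
            j = joint(x1, x2, a, b1, b2)
            z = j[0] + j[1]
            post = [j[0] / z, j[1] / z]  # P(H=h | x1, x2)
            ga += post[1] - pi
            for h in (0, 1):
                g1[h] += post[h] * (x1 - sigmoid(b1[h]))
                g2[h] += post[h] * (x2 - sigmoid(b2[h]))
        # Greedy step along the gradient of the log-likelihood, no backtracking.
        a += lr * ga / n
        for h in (0, 1):
            b1[h] += lr * g1[h] / n
            b2[h] += lr * g2[h] / n
        history.append(log_likelihood(data, a, b1, b2))
    return (a, b1, b2), history

# Hypothetical data: X1 and X2 usually agree, which a hidden common cause can explain.
data = [(1, 1)] * 40 + [(0, 0)] * 40 + [(1, 0)] * 10 + [(0, 1)] * 10
params, history = train(data)
```

Comparing `history[0]` with `history[-1]` shows how far the likelihood moved. A different seed may end at a different local optimum, which is the limitation noted above.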

6 Training Bayesian Networks. Scenario: network structure unknown, all variables observable: search through the model space to reconstruct the network topology. Define a total order of the variables. Construct sequences and, for each sequence, remove the variables that do not affect the current variable. Create arcs using the remaining dependencies.
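
The ordering-based construction can be sketched as follows. This is an illustration only: it assumes a conditional-independence test is available as a function `independent(x, y, given)`, which in practice would be a statistical test on the data. Here it is replaced by a hand-written oracle (`chain_oracle`, a hypothetical name) for a chain A -> B -> C.

```python
def build_structure(order, independent):
    """For each variable, keep as parents only the predecessors it still depends on."""
    arcs = []
    for i, var in enumerate(order):
        parents = list(order[:i])                # candidates: all earlier variables
        for cand in list(parents):
            rest = [p for p in parents if p != cand]
            # Drop the candidate if it does not affect var given the other candidates.
            if independent(var, cand, frozenset(rest)):
                parents = rest
        arcs.extend((p, var) for p in parents)   # one arc per remaining dependency
    return arcs

def chain_oracle(x, y, given):
    """Hypothetical oracle for A -> B -> C: C is independent of A once B is known."""
    return (x, y) == ("C", "A") and "B" in given

arcs = build_structure(["A", "B", "C"], chain_oracle)
```

With this oracle the result is the two arcs of the chain. A different total order can produce a different, often denser, network for the same distribution.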

14 Nearest Neighbor Classification. Choosing the value of k: if k is too small, the classifier is sensitive to noise points; if k is too large, the neighborhood may include points from other classes.
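
A small sketch makes the effect of k concrete. The data set, the query point, and the function name `knn_classify` are made up for illustration. One "B" point sits as noise inside the "A" cluster.

```python
import math
from collections import Counter

def knn_classify(train, query, k):
    """Majority vote among the k training points closest to the query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [
    ((1.0, 1.0), "A"), ((1.0, 2.0), "A"), ((2.0, 1.0), "A"),
    ((1.5, 1.5), "B"),                      # noise point inside the A cluster
    ((5.0, 5.0), "B"), ((5.0, 6.0), "B"), ((6.0, 5.0), "B"),
]
query = (1.6, 1.6)

label_k1 = knn_classify(train, query, k=1)  # decided by the single noise point
label_k3 = knn_classify(train, query, k=3)  # the noise point is outvoted by two A points
label_k7 = knn_classify(train, query, k=7)  # the neighborhood now spans both clusters
```

With k=1 the noise point decides the answer, with k=3 the local majority wins, and with k=7 the far-away cluster dominates the vote.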

15 Nearest Neighbor Classification. Scaling issues: attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes. Example: the height of a person may vary from 1.5 m to 1.8 m, the weight from 90 lb to 300 lb, and the income from $10K to $1M. Solution? One common choice is to normalize each attribute to a common range such as [0, 1]. Real-valued prediction: for a given unknown tuple, return the mean of the values of its k nearest neighbors.
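
Both points can be sketched briefly. Min-max normalization is used here as one possible answer to the scaling question; it is an assumption for the example, not necessarily the method the lecture intends. The function names and the sample rows are also hypothetical.

```python
import math

def min_max_scale(rows):
    """Rescale every attribute (column) to the range [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]   # avoid dividing by zero
    return [tuple((v - lo) / sp for v, lo, sp in zip(row, lows, spans)) for row in rows]

def knn_regress(train_x, train_y, query, k):
    """Real-valued prediction: mean target value of the k nearest neighbors."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

# (height in m, weight in lb, income in $): without scaling, income dominates distances.
people = [(1.5, 90.0, 10_000.0), (1.8, 300.0, 1_000_000.0), (1.65, 195.0, 505_000.0)]
scaled = min_max_scale(people)

train_x = [(0.0,), (1.0,), (2.0,), (10.0,)]
train_y = [1.0, 2.0, 3.0, 50.0]
prediction = knn_regress(train_x, train_y, query=(1.0,), k=3)
```

After scaling, each attribute contributes on the same footing to the Euclidean distance. The regression call averages the targets of the three closest training points and ignores the distant outlier.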

17 Collaborative Filtering: kNN Classification/Prediction in Action. User perspective: lots of online products, books, movies, etc.; reduce my choices, please. Manager perspective: "If I have 3 million customers on the web, I should have 3 million stores on the web." (CEO of Amazon.com [SCH01])

19 How Does It Work? Each user has a profile. Users rate items, either explicitly (score from 1 to 5) or implicitly (web usage mining: time spent viewing the item, navigation path, etc.). The system does the rest. How? Collaborative filtering (based on kNN!).
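
A user-based variant can be sketched as follows. It assumes cosine similarity over co-rated items and a similarity-weighted average of the k most similar users' ratings; both are common choices, but the slides do not say which ones the lecture uses. The user names and ratings are invented.

```python
import math

def cosine(u, v):
    """Cosine similarity between two rating profiles, over items both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict(ratings, user, item, k=2):
    """Predict a rating from the k most similar users who rated the item."""
    neighbours = [
        (cosine(ratings[user], profile), profile[item])
        for name, profile in ratings.items()
        if name != user and item in profile
    ]
    top = sorted(neighbours, reverse=True)[:k]
    weight = sum(sim for sim, _ in top)
    if weight == 0:
        return None                      # no usable neighbours (e.g. a new user)
    return sum(sim * rating for sim, rating in top) / weight

ratings = {
    "alice": {"book": 5, "film": 4, "game": 1},
    "bob":   {"book": 5, "film": 5, "game": 1, "album": 5},
    "carol": {"book": 1, "film": 2, "game": 5, "album": 1},
    "dave":  {"book": 4, "film": 4, "album": 4},
}
alice_album = predict(ratings, "alice", "album", k=2)
```

Alice's tastes resemble Bob's and Dave's far more than Carol's, so the prediction for the album lands between their ratings of 5 and 4.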

25 Challenges for Recommender Systems. Collaborative systems: scalability, quality of recommendations, dealing with new users (no history available). Content-based systems: false negatives, false positives, limited by the features used to describe the items.

27 Support Vector Machines: Overview. A relatively new classification method for both separable and non-separable data. Features: sound mathematical foundation; training time can be slow, but efficient methods are being developed; robust and accurate, less prone to overfitting. Applications: handwritten digit recognition, speaker identification, September 15,

35 Support Vector Machines. What if the problem is not linearly separable? Introduce slack variables $\xi_i \ge 0$ into the constraints: $\mathbf{w} \cdot \mathbf{x}_i + b \ge 1 - \xi_i$ for $y_i = +1$, and $\mathbf{w} \cdot \mathbf{x}_i + b \le -1 + \xi_i$ for $y_i = -1$. Upper bound on the training errors: $\sum_i \xi_i$.
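
For a fixed hyperplane, the smallest slack that satisfies the soft-margin constraint for a point (x_i, y_i) is max(0, 1 - y_i (w . x_i + b)). A misclassified point needs a slack greater than 1, which is why the sum of the slacks bounds the number of training errors from above. The sketch below checks this on made-up points; the hyperplane, the data, and the function name `slack` are illustrative assumptions.

```python
def slack(w, b, x, y):
    """Smallest slack that satisfies y * (w . x + b) >= 1 - slack."""
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    return max(0.0, 1.0 - margin)

w, b = (1.0, 0.0), 0.0                   # hypothetical hyperplane x1 = 0
points = [
    ((2.0, 0.0), 1),                     # outside the margin: no slack needed
    ((0.5, 0.0), 1),                     # inside the margin but on the correct side
    ((-0.5, 0.0), 1),                    # misclassified
    ((-2.0, 0.0), -1),                   # outside the margin: no slack needed
]

slacks = [slack(w, b, x, y) for x, y in points]
errors = sum(1 for x, y in points
             if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) < 0)
total_slack = sum(slacks)
```

Only the third point is a training error, and its slack alone already exceeds 1.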

36 Nonlinear Support Vector Machines. What if the decision boundary is not linear? Transform the data into a higher-dimensional space and search for a hyperplane in the new space. Convert the hyperplane back to the original space.
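
A one-dimensional toy case shows the idea. The mapping phi(x) = (x, x^2), the data, and the hand-picked hyperplane are all assumptions made for illustration; no SVM is actually trained here.

```python
def phi(x):
    """Map a 1-D point into a 2-D feature space."""
    return (x, x * x)

def classify(z, w, b):
    """Sign of the linear decision function w . z + b."""
    return 1 if sum(wi * zi for wi, zi in zip(w, z)) + b > 0 else -1

def separable_1d(xs, ys):
    """Brute-force check for a threshold that separates the labels on the line."""
    pts = sorted(xs)
    cuts = [pts[0] - 1.0] + [(p + q) / 2 for p, q in zip(pts, pts[1:])] + [pts[-1] + 1.0]
    for t in cuts:
        for sign in (1, -1):
            if all((1 if sign * (x - t) > 0 else -1) == y for x, y in zip(xs, ys)):
                return True
    return False

xs = [-2.0, -1.0, 1.0, 2.0]
ys = [1, -1, -1, 1]                      # outer points positive, inner points negative

linear_ok = separable_1d(xs, ys)         # no single threshold works in 1-D
w, b = (0.0, 1.0), -2.5                  # hand-picked hyperplane in the (x, x^2) space
predictions = [classify(phi(x), w, b) for x in xs]
```

In the new space the hyperplane x^2 = 2.5 separates the classes. Mapped back to the original space it becomes the nonlinear rule |x| > sqrt(2.5), roughly 1.58.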

38 Support Vector Machines: Comments and Research Issues. Robust and accurate, with nice generalization properties. Effective in (insensitive to) high dimensions: complexity is characterized by the number of support vectors rather than the dimensionality. Scalability in training: while speed in the test phase is largely solved, training on very large datasets is an unsolved problem. Extension to regression analysis. Extension to multiclass SVM is still a research issue. Kernel selection is still a research issue.

39 SVM Related Links. SVM web sites. Representative implementations: LIBSVM: an efficient implementation of SVM, with multi-class classification. SVM-light: simpler, but its performance is not better than LIBSVM; supports only binary classification and only the C language. SVM-torch: another recent implementation, also written in C.

40 SVM Introduction Literature. Statistical Learning Theory by Vapnik: extremely hard to understand, and it contains many errors. C. J. C. Burges. A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, 2(2). Better than Vapnik's book, but still too hard for an introduction, and the examples are not intuitive. The book An Introduction to Support Vector Machines by N. Cristianini and J. Shawe-Taylor: also hard for an introduction, but its explanation of Mercer's theorem is better than in the literature above. The neural network book by Haykin: contains one nice chapter introducing SVMs.
