Course language
Course language will be English. This applies to lectures, exercises, announcements, etc.
Why? Essentially all machine learning publications and books are written in English, and knowing the original terms is crucial.
If strongly preferred, you may contact the course staff in German. English is encouraged, though, because we may use your (anonymized) question to clarify points for the entire class.
Stefan Roth, Department of Computer Science GRIS 3

Exam & Bonus System
- There will most likely be an oral exam, probably during the first week after classes end. It can be taken in English or German. Details depend on how many students end up taking the class.
- There will be a bonus of up to a full grade for those who do well in the homework assignments. Details TBA.
- Exercises: To get credit for 2+2 SWS (weekly hours: lecture + exercises), you need to actively participate in the exercises and turn in the homework assignments. If you do not hand in homework assignments regularly, you can only get credit for the lecture part (2 SWS).

Style
- Lectures: I would like the lectures to be at least partly interactive, maybe more interactive than you are used to. This is meant to help both you and me. You are encouraged to ask questions!
- Exercises: Mostly interactive. You are encouraged to ask detailed questions!
- Your participation counts: it earns a bonus for the final exam.

Homework assignments
- A mix of written (pen-and-paper) and programming assignments. We will have around 4-5 assignments.
- Programming assignments are in MATLAB, a standard environment for scientific computing.
  - Goal: work with some real data to gain first-hand knowledge of how the techniques we will learn actually work.
  - Introduction during the first exercise (next week).
- The last assignment may be a larger, project-like one: stay tuned...

Readings
- Additional readings: At times I will post papers and tutorials. They will be available on, or linked from, the course web page.
- I will often assign weekly readings: please read them and come to class prepared!
- The Bishop book is a good investment, because it is also a very useful reference.

How does it fit into your course plan?
- Diplom: Anwendungsbezogene Informatik (applied computer science)
  - Possibly Praktische or Theoretische Informatik (practical or theoretical computer science), if you can find someone who will count ML I toward this.
  - Note that I will not be able to offer an exam in Theoretische Informatik.
- B.Sc. / M.Sc.: Human Computer Systems (see the Modulhandbuch, i.e. the module handbook); not Data Knowledge Engineering.
- If you are strongly interested in machine learning, you should:
  - take ML: Statistical Methods for HCS credit, and
  - take ML: Symbolische Methoden (symbolic methods) for DKE credit.

Machine Learning
What is ML? What is its goal?
- Develop a machine / an algorithm that learns to perform a task from past experience.
Why? What for?
- Fundamental component of every intelligent and/or autonomous system
- Discovering rules and patterns in data
- Automatic adaptation of systems
- Attempting to understand human / biological learning

Machine Learning: Examples
Example 1: Recognition of handwritten digits
- These digits are given to us as small digital images.
- We have to build a machine that decides which digit each image shows.
- Obvious challenge: there are many different ways in which people handwrite.

Machine Learning: Examples
Example 2: Classification of fish (salmon vs. sea bass)
FIGURE 1.1. The objects to be classified are first sensed by a transducer (camera), whose signals are preprocessed. Next the features are extracted and finally the classification is emitted, here either salmon or sea bass. Although the information flow is often chosen to be from the source to the classifier, some systems employ information flow in which earlier levels of processing can be altered based on the tentative or preliminary response in later levels (gray arrows). Yet others combine two or more stages into a unified step, such as simultaneous segmentation and feature extraction. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.
[Figure: histograms of the length feature (horizontal axis: length; vertical axis: count) for salmon and sea bass, with the threshold l* marked.]
FIGURE 1.2. Histograms for the length feature for the two categories. No single threshold value of the length will serve to unambiguously discriminate between the two categories; using length alone, we will have some errors. The value marked l* will lead to the smallest number of errors, on average. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.
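The idea behind l* in Figure 1.2 can be sketched in a few lines of Python. The sketch tries each candidate threshold on the length feature and keeps the one with the fewest training errors. This is a minimal illustration only. The fish lengths are made-up values. The rule "longer fish are sea bass" and the function name `best_threshold` are assumptions.

```python
def best_threshold(lengths, labels):
    """Return (threshold, errors) for the hypothetical rule
    'predict sea bass if length > threshold, else salmon'
    that makes the fewest errors on the given (training) samples."""
    best = None
    for t in sorted(set(lengths)):  # each observed length is a candidate threshold
        errors = sum(
            ("sea bass" if x > t else "salmon") != y
            for x, y in zip(lengths, labels)
        )
        if best is None or errors < best[1]:
            best = (t, errors)
    return best


# Made-up training data: the length ranges overlap (a salmon of 7, a sea bass of 6).
lengths = [3, 4, 5, 6, 7, 6, 8, 9, 10, 11]
labels = ["salmon"] * 5 + ["sea bass"] * 5

print(best_threshold(lengths, labels))  # (7, 1)
```

Because the two classes overlap, even the best threshold still makes one error on this data. This is the situation Figure 1.2 describes.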

Some Key Challenges
We need generalization!
- We cannot simply memorize the training set.
- What if we see an input that we haven't seen before?
  - Different shape of the digit image (unknown writer)
  - Dirt on the picture, etc.
- We need to learn what is important for carrying out our task.
This is one of the most crucial points, and we will return to it many times.
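A toy sketch makes the memorization problem concrete. It assumes tiny 3x3 binary "digit images", made up for illustration, and compares two hypothetical classifiers:
- an exact lookup table that only memorizes the training set;
- a nearest-neighbour rule that picks the closest stored example by Hamming distance.

A single pixel of "dirt" is enough to break the lookup table.

```python
# Made-up 3x3 binary images, flattened row by row.
TRAIN = {
    (0, 1, 0,
     0, 1, 0,
     0, 1, 0): "1",
    (1, 1, 1,
     1, 0, 1,
     1, 1, 1): "0",
}


def lookup_classify(image):
    """Pure memorization: only works for images seen during training."""
    return TRAIN.get(tuple(image))  # None for any unseen image


def nearest_neighbour_classify(image):
    """Return the label of the training image with the fewest differing pixels."""
    def hamming(a, b):
        return sum(p != q for p, q in zip(a, b))
    best_image = min(TRAIN, key=lambda stored: hamming(stored, image))
    return TRAIN[best_image]


# A "1" with one pixel of dirt in the bottom-right corner.
dirty_one = (0, 1, 0,
             0, 1, 0,
             0, 1, 1)
print(lookup_classify(dirty_one))             # None
print(nearest_neighbour_classify(dirty_one))  # 1
```

Nearest neighbour is only one simple way to go beyond exact memorization. Whether a pixel-wise distance captures what is important for the task is exactly the question raised above.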

Generalization
How do we achieve generalization?
[Figure: salmon and sea bass samples plotted by lightness and width, separated by an overly complex decision boundary; a novel test point is marked ?.]
FIGURE 1.5. Overly complex models for the fish will lead to decision boundaries that are complicated. While such a decision may lead to perfect classification of our training samples, it would lead to poor performance on future patterns. The novel test point marked ? is evidently most likely a salmon, whereas the complex decision boundary shown leads it to be classified as a sea bass. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.
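The effect described in Figure 1.5 can be reproduced with a toy one-dimensional example. The sketch rests on these assumptions:
- The lightness values are made up, and one training sea bass is deliberately atypical.
- A 1-nearest-neighbour classifier stands in for the "overly complex model", because it fits every training point, including the atypical one.
- A nearest-class-mean classifier stands in for a simpler model.

```python
def one_nn(train_x, train_y, x):
    """1-nearest-neighbour: copy the label of the closest training point."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]


def nearest_mean(train_x, train_y, x):
    """Simpler model: assign x to the class whose mean lightness is closest."""
    means = {}
    for label in set(train_y):
        values = [v for v, y in zip(train_x, train_y) if y == label]
        means[label] = sum(values) / len(values)
    return min(means, key=lambda label: abs(means[label] - x))


def accuracy(classify, train_x, train_y, xs, ys):
    """Fraction of points in (xs, ys) that the classifier labels correctly."""
    correct = sum(classify(train_x, train_y, x) == y for x, y in zip(xs, ys))
    return correct / len(xs)


# Made-up lightness values; the sea bass at 3.6 is an atypical sample.
train_x = [1, 2, 3, 4, 6, 7, 8, 9, 3.6]
train_y = ["salmon"] * 4 + ["sea bass"] * 5
test_x = [3.5, 3.7, 6.5, 2.5]
test_y = ["salmon", "salmon", "sea bass", "salmon"]

for model in (one_nn, nearest_mean):
    print(model.__name__,
          accuracy(model, train_x, train_y, train_x, train_y),  # training accuracy
          accuracy(model, train_x, train_y, test_x, test_y))    # test accuracy
```

The 1-NN model is perfect on the training set but gets only half of the test points right. The simpler model misclassifies the atypical training fish but labels all four test fish correctly.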

Some Key Challenges
Input x:
- Features: Choosing the right features is very important.
  - Coding and use of domain knowledge.
  - May allow for invariance (e.g. to the volume and pitch of a voice).
- Curse of dimensionality: If the features are too high-dimensional, we will run into trouble (more later).
  - Dimensionality reduction
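A common way to see why high-dimensional features cause trouble is sketched below; it is an illustration, not the lecture's own derivation. Suppose each feature axis is split into a fixed number of bins. The number of cells, and with it the amount of training data needed to populate them, then grows exponentially with the dimension D. A related effect: the fraction of a D-dimensional unit sphere's volume that lies in a thin outer shell of thickness ε is $1 - (1-\varepsilon)^D$, which approaches 1 as D grows. The function names are hypothetical.

```python
def num_cells(bins_per_dim, dim):
    """Number of cells when each of `dim` feature axes is split into `bins_per_dim` bins."""
    return bins_per_dim ** dim


def shell_fraction(eps, dim):
    """Fraction of a D-dimensional unit sphere's volume between radius 1-eps and 1.
    Volume scales as r**D, so the inner sphere holds (1-eps)**D of the total."""
    return 1 - (1 - eps) ** dim


for d in (1, 2, 3, 10, 100):
    print(d, num_cells(10, d), round(shell_fraction(0.05, d), 3))
```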

Some Key Challenges
How do we measure performance?
- 99% correct classification in speech recognition: What does that really mean?
  - That we understand the meaning of the sentence? That we understand every word? For all speakers?
- We need more concrete numbers:
  - % of correctly classified letters
  - average distance driven (until an accident...)
  - % of games won
  - % of correctly recognized words, sentences, etc.
- Training vs. testing performance!
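Computing two of the metrics above on the same output shows why "99% correct" is ambiguous. This minimal sketch makes several assumptions:
- Each hypothesis has the same number of words as its reference. Real speech recognizers are usually scored with an edit-distance-based word error rate, which this sketch does not implement.
- The sentences are made up.
- The function names are hypothetical.

```python
def word_accuracy(references, hypotheses):
    """Fraction of words recognized correctly, compared word by word (assumes equal lengths)."""
    correct = total = 0
    for ref, hyp in zip(references, hypotheses):
        for r, h in zip(ref.split(), hyp.split()):
            correct += r == h
            total += 1
    return correct / total


def sentence_accuracy(references, hypotheses):
    """Fraction of sentences in which every word is correct."""
    correct = sum(ref == hyp for ref, hyp in zip(references, hypotheses))
    return correct / len(references)


refs = ["the cat sat down", "machine learning is fun"]
hyps = ["the cat sat down", "machine learning is fan"]
print(word_accuracy(refs, hyps))      # 0.875
print(sentence_accuracy(refs, hyps))  # 0.5
```

The same output scores 87.5% at the word level but only 50% at the sentence level. As the slide stresses, such numbers are meaningful mainly when measured on test data rather than on the training data.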

Some Key Challenges
Which is the right model? The learned parameters w can mean a lot of different things:
- w may characterize the family of functions or the model space
- w may index the hypothesis space
- w may be a vector, an adjacency matrix, a graph, ...
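One concrete reading of "w characterizes the family of functions" comes from the polynomial curve-fitting example in Bishop 1.1, where the model is $y(x, \mathbf{w}) = \sum_{j=0}^{M} w_j x^j$. Choosing the order M fixes the family (the model space), and each weight vector w picks one function from it. This is a minimal sketch; the helper name `poly_model` is hypothetical.

```python
def poly_model(x, w):
    """Evaluate y(x, w) = w[0] + w[1]*x + ... + w[M]*x**M using Horner's scheme."""
    y = 0.0
    for coeff in reversed(w):
        y = y * x + coeff
    return y


# The order M = len(w) - 1 fixes the family; the values in w select one member.
line = [1.0, 2.0]            # M = 1: y = 1 + 2x
parabola = [1.0, 0.0, 3.0]   # M = 2: y = 1 + 3x^2
print(poly_model(2.0, line), poly_model(2.0, parabola))  # 5.0 13.0
```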

Some Key Challenges
Even if we have solved the other problems, computation is usually quite hard:
- Learning often involves some kind of optimization:
  - Find (search for) the best model parameters.
  - Often we have to deal with thousands of training examples.
- Given a model, we must compute the prediction efficiently.
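A minimal sketch of "learning as optimization": fit the two parameters of a straight line $y = w_0 + w_1 x$ by gradient descent on the mean squared error. The toy data, learning rate and number of iterations are assumptions, chosen so that this small convex problem converges. Gradient descent is just one possible optimizer, not necessarily the one used later in the course.

```python
def fit_line_gd(xs, ys, lr=0.05, steps=2000):
    """Minimize E(w0, w1) = mean((w0 + w1*x - y)**2) by plain gradient descent."""
    w0, w1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [w0 + w1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error w.r.t. w0 and w1.
        grad_w0 = 2.0 / n * sum(residuals)
        grad_w1 = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        w0 -= lr * grad_w0
        w1 -= lr * grad_w1
    return w0, w1


# Noise-free toy data generated from y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w0, w1 = fit_line_gd(xs, ys)
print(round(w0, 4), round(w1, 4))  # 1.0 2.0
```

Every step touches every training example. This hints at why computation becomes a concern once there are thousands of them, or many more.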

Why is machine learning interesting (for you)?
- Machine learning is a challenging problem that is far from being solved.
  - Our learning systems are primitive compared to us humans. Think about what a child can learn, and how quickly!
- It combines insights and tools from many fields and disciplines:
  - Traditional artificial intelligence (logic, semantic networks, ...)
  - Statistics
  - Complexity theory
  - Artificial neural networks
  - Psychology
  - Adaptive control, ...

Why is machine learning interesting (for you)?
- It allows you to apply theoretical skills that you may otherwise use only rarely.
- It has lots of applications:
  - Computer vision
  - Computational linguistics
  - Search (think Google)
  - Digital assistants
  - Computer systems, ...
- It is a growing field: many major companies are hiring people with machine learning knowledge.
  - Anecdote: At a recent workshop on computer graphics, about 2/3 of the groups said they would benefit from more machine learning knowledge.

Credits
- Large parts of the lecture material were developed by Prof. Bernt Schiele for previous iterations of this course.
- Many figures that I will use are taken directly from the books by Chris Bishop and by Duda, Hart & Stork.

Brief Review of Basic Probability
We usually do not mention the random variable (RV) explicitly (for brevity). Instead of p(X = x) we write:
- p(X) if we want to denote the probability distribution of a particular random variable X;
- p(x) if we want to denote the value of the probability that the random variable takes the value x.
It should be obvious from the context whether we mean the random variable itself or a value that the random variable can take.
Some people use upper-case P(X = x) for (discrete) probability distributions. I usually don't.
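As a loose analogy, the notational distinction can be mirrored in code. A table mapping every value to its probability plays the role of p(X), while looking up one entry gives a single number p(x). This sketch assumes a fair six-sided die and uses Python's `fractions.Fraction` to keep the probabilities exact.

```python
from fractions import Fraction

# p(X): the whole distribution of a fair die, one probability per value.
p_X = {value: Fraction(1, 6) for value in range(1, 7)}

# p(x): the probability of one particular value, here x = 3.
p_3 = p_X[3]

# A valid discrete distribution is non-negative and sums to 1.
assert all(prob >= 0 for prob in p_X.values())
assert sum(p_X.values()) == 1

# Probability of an event = sum over the values it contains, e.g. "X is even".
p_even = sum(prob for value, prob in p_X.items() if value % 2 == 0)
print(p_3, p_even)  # 1/6 1/2
```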

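Continuous RVs
[Figure: a probability density function (pdf) p(x) and the corresponding cumulative distribution function (cdf) P(x); a small interval of width δx is marked on the x axis.]
- For a continuous variable x, the probability of x lying in the interval (x, x + δx) is $p(x)\,\delta x$ as $\delta x \to 0$. The cdf is $P(z) = \int_{-\infty}^{z} p(x)\,dx$.
- We can work with a density (pdf) as if it were a probability distribution. For simplicity we usually use the same notation for both.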
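The relation between pdf and cdf can be checked numerically. This sketch uses a Gaussian density as the example; that choice is an assumption, since any density would do. It computes the cdf with `math.erf` and confirms that P(x + δx) - P(x) is approximately p(x) δx for small δx. It also shows that a density value can exceed 1, one reason why p(x) is not itself a probability.

```python
import math


def gauss_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian probability density p(x)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def gauss_cdf(x, mu=0.0, sigma=1.0):
    """Gaussian cumulative distribution function P(x): probability of a value <= x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


x, dx = 0.5, 1e-4
interval_prob = gauss_cdf(x + dx) - gauss_cdf(x)  # probability of landing in (x, x + dx]
approx = gauss_pdf(x) * dx                        # p(x) * dx
print(interval_prob, approx)

# A narrow density can exceed 1 even though probabilities cannot.
print(gauss_pdf(0.0, sigma=0.1))  # about 3.99
```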

Preview
- Review of some basics about probability
- Bayesian decision theory
- Loss functions
Disclaimer: It will get quite a bit more mathematical than this :) Don't get scared away, but be aware that this will not be a walk in the park.

Readings for next week
- Introduction to ML: Bishop 1.0, 1.1
- Review of the basics of probability: Bishop 1.2 (you can skip … and … for now)
- Decision theory: Bishop 1.5
For the curious:
- Probability: you could also look at MacKay 2
- Brush up on information theory: Bishop 1.6
