The Data Science Toolbox and Machine Learning

*Laptop required. In order to participate, you must bring your own portable computer.

Course Outline

The primary aim of this course is to introduce probabilistic machine learning methods to social scientists working with high-dimensional data. These methods consist of constructing and estimating explicit statistical models for generating data, and so are well suited for structural modelling and uncertainty quantification, which lie at the heart of much of the empirical work in social science. The course develops the methodology for solving canonical problems such as variable selection in regression models with many covariates and dimensionality reduction for vast input spaces. It also provides an introduction to deep learning techniques. The course is built around hands-on application of data science techniques using Python. Particular emphasis is placed on learning from high-dimensional unstructured data, such as text. The ultimate objective of the course is to provide students with tools that allow them to develop new, application-specific models and algorithms that go beyond those covered in the course itself.
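
To give a flavour of variable selection with many covariates, the following is a minimal Python sketch and not course material. It assumes one common technique, the lasso fitted by cyclic coordinate descent, which this outline does not itself name; the function names, the simulated data and the penalty value `lam=0.5` are all hypothetical choices made for illustration.

```python
import random


def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta; equals y because beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the residual that excludes column j's own fit.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Simulated data (hypothetical): 200 observations, 20 covariates, only the first two matter.
rng = random.Random(0)
n, p = 200, 20
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [3.0 * row[0] - 2.0 * row[1] + rng.gauss(0.0, 0.5) for row in X]

beta_hat = lasso_coordinate_descent(X, y, lam=0.5)
selected = [j for j, b in enumerate(beta_hat) if b != 0.0]
print("selected covariates:", selected)
```

The L1 penalty sets small coefficients exactly to zero, which is what makes the fit act as a variable selector; the surviving coefficients are shrunk toward zero, so in practice the penalty is usually tuned, for example by cross-validation.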

The Data Science Toolbox

This will be a hands-on session introducing a range of tools commonly used in Data Science.

Alexandros Karatzoglou is the Scientific Director at Telefonica Research, where his research focuses on Machine Learning. Alexandros received his PhD in Machine Learning from the Vienna University of Technology (TUWIEN). During his PhD he was a frequent visitor to the Statistical Machine Learning group at the ANU/NICTA in Canberra, Australia. He has over 50 papers in the field and has won 3 best paper awards at the ACM RecSys and ECMLPKDD conferences. He is also the author of the core machine learning R package kernlab, and enjoys giving lectures on Machine Learning, Recommender Systems and Computational Statistics.

Ilias Leontiadis is currently a Research Associate at Telefonica Research. He was previously a researcher at the University of Cambridge and received his PhD from University College London (UCL). His research interests include mobile systems, pervasive computing, wireless networks, sensor networks, mobile phone privacy and mobility modeling.

Bayesian Machine Learning in Social Sciences

*Laptop required. In order to participate, you must bring your own portable computer.

Applicants to all Summer School programs should meet the basic entry requirements. In addition, Data Science participants are expected to have a basic knowledge of linear algebra, basic computing skills, and familiarity with any kind of programming language (not necessarily R or Python).

For this course, previous knowledge of Bayesian statistics is not required; the first day of the course will provide an authoritative overview of all key ideas. However, participants are strongly encouraged to have studied Chapters 1-2 (minimally) and also Chapter 3 (ideally) of Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2014, 3rd edition). Bayesian data analysis (Vol. 2). Boca Raton, FL: CRC Press, or another textbook that covers this type of material. The course will be based on R code that will be provided to participants, hence prior knowledge of R is not compulsory either. Again, we strongly encourage participants who are completely unfamiliar with R to go through a tutorial such as this very short intro by Torfs and Brauer. Those already familiar with the basic concepts may want to look into a more advanced tutorial, such as this one by Venables, Smith, and the R Core Team.

Course Outline

In the course we develop the following main components of modern statistical methodology:

High-dimensional regression and estimation of treatment effects in the presence of a very large number of instruments. We cover both modern optimization-based penalized likelihood approaches, such as the lasso, and probabilistic inference approaches, such as Bayesian variable selection and model averaging.

Advanced computational methods for statistical learning with models as in i and ii, such as Markov chain Monte Carlo and variational Bayes.

We also develop one successful approach to learning content from text data, known as topic modelling, which is an instance of the high-dimensional Bayesian models we discuss in i and ii, and which relies computationally on the methods covered in iii.
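
As an illustration of the Markov chain Monte Carlo idea named in the components above, here is a minimal Python sketch and not course material. It assumes a deliberately simple model (a normal mean with known noise variance and a normal prior) and the random-walk Metropolis algorithm; the function names, prior scale, step size and simulated data are hypothetical. The model is conjugate, so the exact posterior mean is available to check the sampler against.

```python
import math
import random


def log_posterior(theta, data, prior_sd=10.0, noise_sd=1.0):
    """Unnormalised log posterior: Normal(0, prior_sd^2) prior, Normal(theta, noise_sd^2) likelihood."""
    log_prior = -0.5 * (theta / prior_sd) ** 2
    log_lik = sum(-0.5 * ((x - theta) / noise_sd) ** 2 for x in data)
    return log_prior + log_lik


def random_walk_metropolis(data, n_samples=5000, step=0.5, seed=0):
    """Draw from the posterior of theta using Gaussian random-walk proposals."""
    rng = random.Random(seed)
    theta = 0.0
    current = log_posterior(theta, data)
    draws = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, data)
        # Accept with probability min(1, posterior ratio); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < candidate - current:
            theta, current = proposal, candidate
        draws.append(theta)
    return draws


# Simulated data (hypothetical): 50 observations with true mean 2 and unit noise.
data_rng = random.Random(1)
data = [data_rng.gauss(2.0, 1.0) for _ in range(50)]

draws = random_walk_metropolis(data)
kept = draws[1000:]  # discard the first 1000 draws as burn-in
post_mean = sum(kept) / len(kept)

# Exact conjugate posterior mean, valid for prior_sd=10 and noise_sd=1.
exact_mean = sum(data) / (len(data) + 1.0 / 10.0 ** 2)
print("MCMC estimate:", post_mean, "exact:", exact_mean)
```

The sampler only needs the posterior up to a normalising constant, which is why the same recipe extends to models where no closed-form posterior exists.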

Alongside the presentation of these statistical tools, we will discuss economic applications that use them, for example treatment effect estimation and central bank communication. The course also includes lab sessions, delivered by the three instructors, that provide hands-on experience with the material.

About the Instructors

Stephen Hansen is Associate Professor in the Department of Economics at the University of Oxford and a Fellow of University College. He was previously Assistant then Associate Professor of Economics at Pompeu Fabra University after receiving his PhD in Economics from the London School of Economics in 2009. He is an academic consultant to the Bank of England, a Turing Fellow at the Alan Turing Institute, and an Associate of the Centre for Economic Policy Research, CESifo, and the Oxford Man Institute. A main line of his research addresses the organization of monetary policymaking institutions, and his recent work has developed the use of machine learning algorithms for text data to answer questions in this area. His academic papers have been published in leading international journals, including the Quarterly Journal of Economics, Review of Economic Studies, Journal of Monetary Economics, and Journal of International Economics.

Omiros Papaspiliopoulos is ICREA Research Professor at UPF. He is the Scientific Director of the Barcelona GSE Master's Degree in Data Science. His research has appeared in the top journals in Statistics, including several articles in the Journal of the Royal Statistical Society Series B, Biometrika and the Annals of Statistics. He has been an Associate Editor for the first two journals and a Deputy Editor for Biometrika. He has delivered more than 80 invited talks, and has given courses at ENSAE in Paris, the Berlin Mathematical School, the Department of Mathematics at the University of Copenhagen, and the Engineering Department at Osaka University. In 2010 he was awarded the Royal Statistical Society's Guy Medal in Bronze. His research interests include Monte Carlo Methods, Computational Methods, Bayesian Statistics, Stochastic Processes and Machine Learning.

David Rossell is Ramón y Cajal Fellow at UPF. Previous appointments include the Biostatistics Dept. at MD Anderson Cancer Center (Houston, USA), IRB Barcelona (Spain) as head of the Biostatistics & Bioinformatics Unit, and the Dept. of Statistics at the University of Warwick as Assistant/Associate Professor. His work combines methodological research, such as theory, computation and the development of modelling techniques, with a substantial amount of inter-disciplinary work in biomedicine, chemistry and the social sciences. Specific research interests include high-dimensional inference, experimental design, dimensionality reduction and applied statistical modelling, with emphasis on the Bayesian approach. He has authored over 35 peer-reviewed publications and has developed a number of R packages implementing statistical methodology. He has taught undergraduate and graduate courses at the University of Warwick, the Oxford-Warwick OxWaSP PhD program for Big Data and at UPF, as well as shorter courses and workshops at various universities, and has given more than 70 invited talks at international conferences and workshops.

Laptop required for practical courses

Practical courses will be held in a lecture room, not in a computer lab. Participants must bring a laptop in order to follow these sessions.

Entry requirements

Applicants to all Summer School programs should meet the basic entry requirements. In addition, Data Science participants are expected to have a basic knowledge of linear algebra, basic computing skills, and familiarity with any kind of programming language (not necessarily R or Python).

Certificate

At the conclusion of the Summer Schools, participants will receive a certificate for the number of hours attended. All Barcelona GSE courses require an average of twice the lecture hours for readings, pre-readings and class preparation. Interested students should check with their universities to see if these hours are transferable into ECTS credits.