Bayesian Analysis for the Social Sciences

Transcription

1 Bayesian Analysis for the Social Sciences Simon Jackman Stanford University November 9, 2012 Simon Jackman (Stanford) Bayesian Analysis for the Social Sciences November 9, / 32

2 Introduction to Bayesian Inference
Bayesian inference relies exclusively on Bayes' Theorem: $p(h \mid \text{data}) \propto p(h)\, p(\text{data} \mid h)$
h is usually a parameter (but could also be a data point, a model, or a hypothesis).
p are probability densities (or probability mass functions in the case of discrete h and/or discrete data).
$p(h)$ is a prior density; $p(\text{data} \mid h)$ is the likelihood, or conditional density of the data given h.
$p(h \mid \text{data})$ is the posterior density for h given the data.
This gives rise to the Bayesian mantra: a posterior density is proportional to the prior times the likelihood.
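As a minimal sketch of "posterior is proportional to prior times likelihood", the following grid approximation multiplies a prior by a likelihood at each candidate value of h and then normalizes. The data (7 successes in 10 Bernoulli trials), the flat prior, and the helper name `grid_posterior` are assumptions made for illustration; they do not come from the slides.

```python
# Grid approximation: posterior is proportional to prior times likelihood.
# Hypothetical data: 7 successes in 10 Bernoulli trials; flat prior on h.

def grid_posterior(successes, trials, prior, grid):
    """Return normalized posterior mass on each grid point."""
    unnormalized = [
        prior(h) * h**successes * (1 - h) ** (trials - successes)
        for h in grid
    ]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

grid = [i / 100 for i in range(101)]   # candidate values of h in [0, 1]
flat_prior = lambda h: 1.0             # assumed prior: uniform on [0, 1]
posterior = grid_posterior(7, 10, flat_prior, grid)

# With a flat prior the posterior mode coincides with the likelihood peak.
posterior_mode = grid[posterior.index(max(posterior))]
print(posterior_mode)
```

Swapping `flat_prior` for any other non-negative function of h changes the posterior without touching the rest of the code, which is the point of the mantra.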

6 Introduction to Bayesian Inference
$p(h \mid \text{data}) \propto p(h)\, p(\text{data} \mid h)$
Bayesian inference involves computing, summarizing and communicating the posterior density $p(h \mid \text{data})$. How to do this is what this class is about.
Depending on the problem, doing all this is easy or hard; we solve the hard cases with computing power.
We're working with densities (or sometimes, mass functions).
Bayesian point estimates are single-number summaries of a posterior density.
Uncertainty is assessed and communicated in various ways: e.g., the standard deviation of the posterior, the width of the interval spanning the 2.5th to 97.5th percentiles of the posterior, etc.
Sometimes we can just draw a picture; details and examples are coming.
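The summaries listed above can be sketched from simulated posterior draws. The Beta(8, 4) posterior used here is a hypothetical choice (it is what a flat prior and 7 successes in 10 trials would give), and the seed and number of draws are arbitrary.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical posterior: Beta(8, 4), e.g. a flat prior updated with
# 7 successes in 10 trials.
draws = [random.betavariate(8, 4) for _ in range(20000)]

post_mean = statistics.fmean(draws)        # a point estimate
post_sd = statistics.stdev(draws)          # spread of the posterior
cuts = statistics.quantiles(draws, n=40)   # 39 cut points at 2.5% steps
interval = (cuts[0], cuts[-1])             # 2.5th and 97.5th percentiles

print(f"mean={post_mean:.3f} sd={post_sd:.3f}")
print(f"95% interval=({interval[0]:.3f}, {interval[1]:.3f})")
```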

8 Why Be Bayesian?
Conceptual simplicity: say what you mean and mean what you say (subjective probability).
A foundation for inference that does not rest on the thought experiment of repeated sampling.
Uniformity of application: no special tweaks for this or that data analysis. Apply Bayes' Rule.
Modern computing makes Bayesian inference easy and nearly universally applicable.

9 Conceptual Simplicity
$p(h \mid \text{data}) \propto p(h)\, p(\text{data} \mid h)$
The posterior density (or mass function) $p(h \mid \text{data})$ is a complete characterization of beliefs after looking at data.
As such it contains everything we need for making inferences.
Examples:
the posterior probability that a regression coefficient is positive, negative or lies in a particular interval;
the posterior probability that a subject belongs to a particular latent class;
the posterior probability that a hypothesis is true; or,
the posterior probability that a particular statistical model is the true model among a family of statistical models.
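The first of these examples can be sketched directly from a posterior distribution. The normal posterior below, with mean 0.3 and standard deviation 0.2 for a regression coefficient, is purely hypothetical and chosen only to make the arithmetic concrete.

```python
from statistics import NormalDist

# Hypothetical posterior for a regression coefficient: N(0.3, 0.2^2).
posterior = NormalDist(mu=0.3, sigma=0.2)

# Posterior probability that the coefficient is positive.
p_positive = 1 - posterior.cdf(0.0)

# Posterior probability that the coefficient lies in [0.1, 0.5].
p_in_interval = posterior.cdf(0.5) - posterior.cdf(0.1)

print(f"P(coef > 0 | data) = {p_positive:.3f}")
print(f"P(0.1 <= coef <= 0.5 | data) = {p_in_interval:.3f}")
```

These are direct probability statements about the coefficient itself, read straight off the posterior.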

13 Subjective Uncertainty
How do we do statistical inference in situations where repeated sampling is infeasible?
Inference when we have the entire population and hence no uncertainty due to sampling: e.g., parts of comparative political economy.
Bayesians rely on a notion of subjective uncertainty: e.g., h is a random variable because we don't know its value.
Bayes' Theorem tells us how to manage that uncertainty, how to update beliefs about h in light of data.
Contrast the objectivist notion of probability: probability as a property of the object under study (e.g., coins, decks of cards, roulette wheels, people, groups, societies).

14 Subjective Uncertainty
Many Bayesians regard objectivist probability as metaphysical nonsense.
de Finetti: "PROBABILITY DOES NOT EXIST. The abandonment of superstitious beliefs about ... Fairies and Witches was an essential step along the road to scientific thinking. Probability, too, if regarded as something endowed with some kind of objective existence, is no less a misleading misconception, an illusory attempt to exteriorize or materialize our true probabilistic beliefs. In investigating the reasonableness of our own modes of thought and behaviour under uncertainty, all we require, and all that we are reasonably entitled to, is consistency among these beliefs, and their reasonable relation to any kind of relevant objective data ('relevant' in as much as subjectively deemed to be so). This is Probability Theory."

15 Subjective Uncertainty
Bayesian probability statements are thus about states of mind over states of the world, and not about states of the world per se.
Borel: one can guess the outcome of a coin toss while the coin is still in the air and its movement is perfectly determined, or even after the coin has landed but before one reviews the result.
That is, subjective uncertainty obtains irrespective of objective uncertainty (however conceived).
Not just any subjective uncertainty will do: beliefs must conform to the rules of probability. E.g., $p(h)$ should be proper: $\int_H p(h)\, dh = 1$ and $p(h) \geq 0 \;\; \forall\, h \in H$.

21 Cromwell's Rule: the dangers of dogmatism
$p(h \mid \text{data}) \propto p(h)\, p(\text{data} \mid h)$
$p(h \mid \text{data}) = 0 \;\; \forall\, h$ s.t. $p(h) = 0$.
Cromwell's Rule: After the English deposed, tried and executed Charles I in 1649, the Scots invited Charles' son, Charles II, to become king. The English regarded this as a hostile act, and Oliver Cromwell led an army north. Prior to the outbreak of hostilities, Cromwell wrote to the synod of the Church of Scotland: "I beseech you, in the bowels of Christ, consider it possible that you are mistaken."
A dogmatic prior that assigns zero probability to a hypothesis can never be revised.
Likewise, a hypothesis with prior weight of 1.0 can never be refuted.
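A small sketch of why a zero prior can never be revised: with discrete hypotheses, the posterior weight is the prior weight times the likelihood, renormalized, so a zero stays zero however strongly the data favour that hypothesis. The three hypotheses, their prior weights, and the likelihood values are made up for illustration.

```python
def update(prior, likelihood):
    """One application of Bayes' rule over a discrete set of hypotheses."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: value / total for h, value in unnormalized.items()}

# Hypothetical setup: hypothesis "A" is dogmatically ruled out a priori,
# yet each new observation is far more likely under "A" than under "B" or "C".
prior = {"A": 0.0, "B": 0.5, "C": 0.5}
likelihood = {"A": 0.99, "B": 0.005, "C": 0.005}

posterior = prior
for _ in range(5):  # five rounds of data, all strongly favouring "A"
    posterior = update(posterior, likelihood)

print(posterior)
```

Giving "A" any positive prior weight, however small, lets the same data move the posterior toward it.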

23 Bayesian Point Estimates
Bayes estimates: single-number summaries of a posterior density.
But which one? E.g., mode, median, mean, some quantile(s)?
Different loss functions rationalize different point estimates.
Loss: Let H be a set of possible states of nature h, and let $a \in A$ be actions available to the researcher. Then define $l(h, a)$ as the loss to the researcher from taking action a when the state of nature is h.
Expected loss: Given a posterior distribution for h, $p(h \mid y)$, the posterior expected loss of an action a is $m(p(h \mid y), a) = \int_H l(h, a)\, p(h \mid y)\, dh$.
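To illustrate how the choice of loss function picks out a point estimate, the sketch below approximates the posterior expected loss by a sum over a grid and searches for the action that minimizes it. Two standard results are being checked numerically: quadratic loss leads to the posterior mean and absolute loss to the posterior median. The skewed posterior shape, a Beta(2, 5) density, and the grid step are assumptions.

```python
# Discrete approximation to a skewed posterior on [0, 1].
# Assumed shape: proportional to h * (1 - h)**4, i.e. a Beta(2, 5) density.
step = 0.005
grid = [i * step for i in range(201)]
weights = [h * (1 - h) ** 4 for h in grid]
total = sum(weights)
posterior = [w / total for w in weights]

def expected_loss(action, loss):
    """Grid version of m(p, a): sum over h of l(h, a) * p(h | y)."""
    return sum(loss(h, action) * p for h, p in zip(grid, posterior))

quadratic = lambda h, a: (h - a) ** 2
absolute = lambda h, a: abs(h - a)

# Search the grid for the action with the smallest expected loss.
best_quadratic = min(grid, key=lambda a: expected_loss(a, quadratic))
best_absolute = min(grid, key=lambda a: expected_loss(a, absolute))

# Posterior mean and median computed directly, for comparison.
post_mean = sum(h * p for h, p in zip(grid, posterior))
cumulative = 0.0
for h, p in zip(grid, posterior):
    cumulative += p
    if cumulative >= 0.5:
        post_median = h
        break

print(best_quadratic, post_mean)
print(best_absolute, post_median)
```

Because the assumed posterior is skewed, the two loss functions lead to visibly different estimates; for a symmetric posterior they would agree.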

28 HPD intervals
HPDs can be a series of disjoint intervals, e.g., for a bimodal density.
These are uncommon, but in such a circumstance presenting a picture of the density might be the reasonable thing to do.
See Example 1.7, p. 28: $y_i \sim N(0, R)$, subject to extreme missingness.
[Figure: the posterior density of $q(r) = r_{12} / \sqrt{r_{11} r_{22}}$; horizontal axis: correlation coefficient]
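A rough sketch of how a highest-posterior-density region can come out as disjoint intervals: rank grid points by density, keep the highest ones until they hold 95% of the mass, then group neighbouring points into intervals. The bimodal density here is an assumed equal mixture of two normals, not the posterior from Example 1.7, and the function names are illustrative.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Assumed bimodal posterior: equal mixture of N(-2, 0.5^2) and N(2, 0.5^2).
step = 0.01
grid = [-5 + step * i for i in range(1001)]
density = [0.5 * normal_pdf(x, -2, 0.5) + 0.5 * normal_pdf(x, 2, 0.5)
           for x in grid]
total = sum(density)
mass = [d / total for d in density]

def hpd_region(mass, coverage=0.95):
    """Indices of the highest-density grid points holding `coverage` mass."""
    order = sorted(range(len(mass)), key=lambda i: mass[i], reverse=True)
    chosen, accumulated = [], 0.0
    for i in order:
        chosen.append(i)
        accumulated += mass[i]
        if accumulated >= coverage:
            break
    return sorted(chosen)

def to_intervals(indices, grid):
    """Group consecutive grid indices into (lower, upper) intervals."""
    intervals, start, prev = [], indices[0], indices[0]
    for i in indices[1:]:
        if i != prev + 1:          # a gap in the indices ends an interval
            intervals.append((grid[start], grid[prev]))
            start = i
        prev = i
    intervals.append((grid[start], grid[prev]))
    return intervals

intervals = to_intervals(hpd_region(mass), grid)
print(intervals)
```

With well-separated modes the low-density valley between them is excluded, so the region splits into one interval around each mode.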

29 Bayesian Consistency
For anything other than a dogmatic/degenerate prior (see the earlier discussion of Cromwell's Rule), more and more data will overwhelm the prior.
Bayesian asymptotics: with an arbitrarily large amount of sample information relative to prior information, the posterior density tends to the likelihood (normalized to be a density over h).
Central limit arguments: since likelihoods are usually approximately normal in large samples, so too are posterior densities.

30 Bayesian Consistency
The prior remains fixed across the sequence, as sample size increases and h* is held constant. In this example, n = 6, 30, 90, 450 across the four columns.

31 Bayesian Consistency
The prior remains fixed across the sequence, as sample size increases and h* is held constant. In this example, n = 6, 30, 150, 1500 across the four columns.
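The idea that data overwhelm a fixed, non-dogmatic prior can be sketched with conjugate Beta-binomial updating. The Beta(8, 2) prior and the sample proportion of 1/3 are hypothetical, and the sample sizes are used only as an illustrative sequence; none of this reproduces the figures from the slides.

```python
# Conjugate Beta-binomial updating: a Beta(a, b) prior and y successes
# in n trials give a Beta(a + y, b + n - y) posterior.
prior_a, prior_b = 8.0, 2.0      # assumed prior, with mean 0.8
sample_proportion = 1 / 3        # held constant as n grows (hypothetical)

def posterior_summary(n):
    """Posterior mean and standard deviation after n observations."""
    y = n * sample_proportion
    a, b = prior_a + y, prior_b + n - y
    mean = a / (a + b)
    sd = (a * b / ((a + b) ** 2 * (a + b + 1))) ** 0.5
    return mean, sd

results = {n: posterior_summary(n) for n in (6, 30, 150, 1500)}
for n, (mean, sd) in results.items():
    print(f"n={n:5d}  posterior mean={mean:.4f}  sd={sd:.4f}")
```

As n grows, the posterior mean moves from near the prior mean toward the sample proportion and the posterior standard deviation shrinks, which is the behaviour described above.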
