Data Settings (cont.)

Is the average rent (\(Y_i\)) of OSU students different for different kinds of accommodation (dorm, apartment, house)?

Regression settings (ST552):

Simple: One outcome variable and one continuous explanatory variable

How much does the rent paid by OSU students (\(Y_i\)) decrease with the number of people they live with (\(X_i\))?

Multiple: One outcome variable and two or more explanatory variables

What’s the average rent that OSU students pay for a \(Z\)-square-foot house with \(X\) bedrooms that is \(D\) miles from campus?

For the next few weeks…

We will focus on the one-sample random sampling setting.

Measure \(Y\) on \(n\) randomly sampled units from a population of interest.

Interested in some question/hypothesis about some parameter of the population.
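The sampling step ("measure \(Y\) on \(n\) randomly sampled units") can be sketched as drawing a simple random sample without replacement. This is a minimal sketch, assuming the whole population is available as a Python list; the rent values and the helper `draw_sample` are hypothetical and used only for illustration.

```python
import random

def draw_sample(population, n, seed=None):
    """Hypothetical helper: return a simple random sample of n units, drawn without replacement."""
    rng = random.Random(seed)  # seeded generator so the draw is reproducible
    return rng.sample(population, n)

# Hypothetical population: monthly rent (dollars) for every unit of interest.
population_rents = [450, 520, 610, 700, 725, 800, 850, 900, 975, 1100]

# Measure Y on n = 4 randomly sampled units.
sample_rents = draw_sample(population_rents, n=4, seed=1)
print(sample_rents)
```

Fixing the seed only makes the example repeatable; in practice each new draw gives a different sample, which is why estimates computed from it vary.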

Parameters of interest

Parameter: some summary measure of \(Y\) for all units in the population

Population mean: average of variable of interest for all units in the population

Population median: median of variable of interest for all units in the population

Population variance: variance of variable of interest for all units in the population

… any one-number summary of the variable of interest for all units in the population
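To make the definitions concrete, the sketch below computes three population parameters for a tiny hypothetical population (the rent values are made up for illustration). Note that the population variance divides by the population size \(N\), which is what Python's `statistics.pvariance` does.

```python
import statistics

# Hypothetical population: rent (dollars) for every unit in a tiny population.
population_rents = [500, 600, 600, 800, 1000]

pop_mean = statistics.mean(population_rents)           # population mean
pop_median = statistics.median(population_rents)       # population median
pop_variance = statistics.pvariance(population_rents)  # population variance (divides by N)

print(pop_mean, pop_median, pop_variance)
```

Here the mean is 700, the median is 600, and the variance is \(160000 / 5 = 32000\). In real problems these values are unknown because we never observe every unit.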

Questions about parameters

Point Estimate: the single best guess of the population parameter value

Interval Estimate: a range of likely values for the population parameter

Hypothesis test: is a specific value of the population parameter plausible?
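The three kinds of questions can be illustrated on a small hypothetical sample of rents. This is a rough sketch, not the course's method: the interval uses a large-sample normal approximation (mean \(\pm 1.96\) standard errors) purely for illustration, and the "test" is only an informal check of whether a hypothesized value falls inside that interval.

```python
import math
import statistics

# Hypothetical sample of n = 8 monthly rents (dollars); not real data.
sample_rents = [650, 720, 580, 900, 810, 700, 760, 640]

# Point estimate: the sample mean as a single best guess of the population mean.
point_estimate = statistics.mean(sample_rents)

# Interval estimate: rough 95% interval, mean +/- 1.96 * standard error.
# Normal approximation for illustration only; with small n a t-based
# interval would usually be preferred.
std_error = statistics.stdev(sample_rents) / math.sqrt(len(sample_rents))
interval = (point_estimate - 1.96 * std_error, point_estimate + 1.96 * std_error)

# Informal plausibility check of a hypothesized population mean.
hypothesized_mean = 600
plausible = interval[0] <= hypothesized_mean <= interval[1]

print(point_estimate, interval, plausible)
```

With these made-up numbers the point estimate is 720 and the interval is roughly (649, 791), so a population mean of 600 would not look plausible.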

Your Turn

Do people support the idea of a single payer health system?

Discuss with a neighbor: what might be the population, variable, parameter, and question/hypothesis?

Population:

Variable:

Parameter:

Question/Hypothesis:

Probability Review

Population Distribution

The population distribution is the distribution of \(Y\) for the entire population.

It tells us how likely different values of \(Y\) are across its range.

In particular, it provides a probability model for \(Y\), so we can find probabilities such as:

\[
P(Y \in (a, b]) = P(a < Y \le b)
\]

In words: the probability that, for a unit drawn at random from the population, the value of the variable of interest lies between \(a\) and \(b\) (strictly, greater than \(a\) and less than or equal to \(b\)).
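Given a probability model for \(Y\), this probability can be computed from the cumulative distribution function \(F(y) = P(Y \le y)\) as \(P(a < Y \le b) = F(b) - F(a)\). The sketch below assumes, purely for illustration, that rent follows a Normal distribution with mean 700 and standard deviation 150 dollars; the helper `prob_between` is hypothetical and uses Python's `statistics.NormalDist`.

```python
from statistics import NormalDist

def prob_between(dist, a, b):
    """Hypothetical helper: P(a < Y <= b) computed as F(b) - F(a), where F(y) = P(Y <= y)."""
    return dist.cdf(b) - dist.cdf(a)

# Hypothetical population model: rent ~ Normal(mean=700, sd=150) dollars.
rent_model = NormalDist(mu=700, sigma=150)

# P(550 < Y <= 850): the probability of a rent within one sd of the mean.
p = prob_between(rent_model, 550, 850)
print(round(p, 4))
```

The identity \(F(b) - F(a)\) holds for any distribution because \(F\) accumulates probability up to and including each point; only the Normal model here is an assumption.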

Common distributions

It’s sometimes convenient to assume mathematical forms for population distributions.

Continuous distributions: the possible values form an interval of the real line (possibly the whole real line)