is a sample statistic U used to provide a one-point approximation to the value of an unknown parameter θ.

Sampling Error

is given by the absolute value of the difference between the estimated value and true value of the parameter of interest.

|U - θ|

Geometrically: Sampling error measures the distance between the two points U and θ.

Unbiased Estimator

an estimator U of θ such that E(U) = θ.

Bias

E(U) - θ
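
A minimal simulation sketch of bias, assuming a standard normal population (so the true variance θ = 1): the divide-by-n variance estimator comes out near 0.9, while the divide-by-(n-1) version is unbiased.

    import numpy as np

    rng = np.random.default_rng(0)
    n, reps = 10, 100_000
    samples = rng.normal(0.0, 1.0, size=(reps, n))  # true variance = 1

    # divide-by-n estimator (ddof=0) vs. divide-by-(n-1) estimator (ddof=1)
    print(samples.var(axis=1, ddof=0).mean())  # ~0.9, so bias = E(U) - θ ≈ -0.1
    print(samples.var(axis=1, ddof=1).mean())  # ~1.0, so bias ≈ 0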

More efficient

Suppose U and W are two unbiased estimators for θ. Then U is ___________ provided that it has a smaller variance.

Relative Efficiency

of U to W:

100[V(W)/V(U)]
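
As an illustrative sketch (the choice of estimators is an assumption, not from the cards): for a normal population, compare the sample mean U with the sample median W, both unbiased for the population mean. The simulated relative efficiency of U to W comes out near 157%, so the mean is more efficient.

    import numpy as np

    rng = np.random.default_rng(1)
    samples = rng.normal(size=(100_000, 25))  # 100,000 samples of size n = 25
    U = samples.mean(axis=1)                  # sample mean
    W = np.median(samples, axis=1)            # sample median

    # relative efficiency of U to W: 100 * V(W) / V(U)
    print(100 * W.var() / U.var())            # ~157 (%), so U is more efficient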

Confidence Level

the percentage 100(1-α)%

Confidence Interval

Sample Mean ± Critical Value × Standard Error

Margin of Error

an upper limit on the size of the sampling error:

P(sampling error < MOE) = 1 - α

Prototype Confidence Interval for the Mean- Assumptions

Normal Population

VSRS

σ is known

Prototype Confidence Interval for the Mean- Calculation

xbar ± Zα/2 (σ/√n)
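
A sketch of this calculation with hypothetical numbers (the data, σ = 0.2, and α = 0.05 are all assumptions for illustration):

    import numpy as np
    from scipy import stats

    x = np.array([4.9, 5.1, 5.3, 4.8, 5.0, 5.2])  # hypothetical sample
    sigma, alpha = 0.2, 0.05                       # sigma assumed known

    z = stats.norm.ppf(1 - alpha / 2)              # Zα/2 ≈ 1.96
    moe = z * sigma / np.sqrt(len(x))
    print(f"{x.mean():.3f} ± {moe:.3f}")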

Interpretation of Confidence Level

Under repeated random sampling, the statistical procedure yields an interval containing the true value θ of the parameter of interest 100(1-α)% of the time.
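
This interpretation can be checked by simulation; a sketch assuming a N(10, 2²) population and 95% intervals (all values hypothetical):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    mu, sigma, n, alpha = 10.0, 2.0, 25, 0.05
    z = stats.norm.ppf(1 - alpha / 2)

    samples = rng.normal(mu, sigma, size=(10_000, n))
    xbar = samples.mean(axis=1)
    moe = z * sigma / np.sqrt(n)

    # fraction of intervals xbar ± moe that contain the true mean
    print(((xbar - moe <= mu) & (mu <= xbar + moe)).mean())  # ~0.95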

Large Sample Confidence Interval- Assumptions

VSRS

n is sufficiently large (i.e., n > 30)

σ is unknown

Large Sample Confidence Interval- Calculations

xbar ± Zα/2 (s/√n)
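
Same sketch as the prototype interval, except the sample standard deviation s replaces the unknown σ (the data are generated here only for illustration):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    x = rng.normal(50, 8, size=40)              # hypothetical large sample, n > 30
    alpha = 0.05

    z = stats.norm.ppf(1 - alpha / 2)
    moe = z * x.std(ddof=1) / np.sqrt(len(x))   # s replaces sigma
    print(f"{x.mean():.2f} ± {moe:.2f}")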

T-Distributions appear to be bell-shaped but actually...

have fatter tails than normal curves

Variance of a T-Distribution

df/(df-2), for df > 2

is greater than 1 but converges to 1 as df → ∞

T-Distribution converges to...

the standard normal distribution as df → ∞
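
A quick numeric check of both facts (the 97.5th percentile is chosen arbitrarily for the comparison):

    from scipy import stats

    for df in (3, 5, 30, 300):
        # variance df/(df-2) shrinks toward 1; the t quantile approaches z
        print(df, stats.t(df).var(), stats.t.ppf(0.975, df))
    print("normal:", 1.0, stats.norm.ppf(0.975))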

Small Sample Confidence Interval- Assumptions

VSRS

normal population

σ is unknown

Small Sample Confidence Interval for the Mean- Calculations

xbar ± tα/2 (s/√n), with n-1 degrees of freedom
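
A sketch with a hypothetical small sample:

    import numpy as np
    from scipy import stats

    x = np.array([9.8, 10.2, 10.4, 9.9, 10.1])     # hypothetical sample
    alpha = 0.05

    t = stats.t.ppf(1 - alpha / 2, df=len(x) - 1)  # tα/2 with n-1 df
    moe = t * x.std(ddof=1) / np.sqrt(len(x))
    print(f"{x.mean():.3f} ± {moe:.3f}")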

Large Sample Confidence Interval for the Proportion- Assumptions

Binomial Population

VSRS

n is sufficiently large (np≥5, n(1-p)≥5)

Large Sample Confidence Interval for the Proportion- Calculation

pbar ± Zα/2 √(pbar(1-pbar)/n)
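
A sketch with hypothetical counts, checking the large-sample condition first:

    import numpy as np
    from scipy import stats

    n, successes = 400, 248          # hypothetical data
    pbar = successes / n
    alpha = 0.05

    assert n * pbar >= 5 and n * (1 - pbar) >= 5   # large-sample check

    moe = stats.norm.ppf(1 - alpha / 2) * np.sqrt(pbar * (1 - pbar) / n)
    print(f"{pbar:.3f} ± {moe:.3f}")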

Null Hypothesis

statement about the population of interest.

Alternative Hypothesis

statement about the population of interest that contradicts the null hypothesis.

Test Statistic

sample statistic that is used to summarize the evidence.

Significance

the probability α of making a Type I error (rejecting a true null hypothesis).

Fisher's Theory

(Heuristic) The P-value is the probability of observing a value of the test statistic "at least as extreme as" what was actually observed.

- focuses on Type I error (false positive)

Interpreting P-Values

the P-value is computed by temporarily assuming the null hypothesis is true.

a low P-value casts doubt upon the assumption that the null hypothesis is true. Thus, we reject it.

a high P-value is consistent with the assumption that the null hypothesis is true. Thus, we do not reject it.

Neyman-Pearson Theory

power calculations

the p-value is the smallest significance level for which the null hypothesis can be rejected.

Type I Error

false positive

Type II Error

false negative

Decision Rule

specifies the set of values of the test statistic for which the null hypothesis is rejected in favor of the alternative hypothesis and the set of values for which the null hypothesis is accepted.

Rejection Region

consists of all values of the test statistic for which the null hypothesis is rejected.

Acceptance Region

consists of all values of the test statistic for which the null hypothesis is accepted.

Critical Value

value that separates the rejection region from the acceptance region.
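
For a two-sided z-test at α = 0.05, a sketch of the resulting decision rule (the observed statistic 2.31 is a made-up value):

    from scipy import stats

    alpha = 0.05
    crit = stats.norm.ppf(1 - alpha / 2)   # critical value ≈ 1.96
    z_obs = 2.31                           # hypothetical observed test statistic

    # rejection region: |z| > crit; acceptance region: |z| <= crit
    print("reject H0" if abs(z_obs) > crit else "accept H0")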

Power

1-β, where β is the probability of making a Type II error
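
A sketch of a Neyman-Pearson power calculation for a one-sided z-test of H0: μ = 0 against a hypothetical alternative μ = 0.5 (σ, n, and α are also assumptions):

    import numpy as np
    from scipy import stats

    mu0, mu1, sigma, n, alpha = 0.0, 0.5, 1.0, 25, 0.05
    crit = mu0 + stats.norm.ppf(1 - alpha) * sigma / np.sqrt(n)  # reject if xbar > crit

    beta = stats.norm.cdf(crit, loc=mu1, scale=sigma / np.sqrt(n))  # P(Type II error)
    print(1 - beta)   # power ≈ 0.80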

Large Sample Hypothesis Test for the Proportion- Assumptions

Binomial Population

VSRS

n is sufficiently large (based on null hypothesis value of p)

Large Sample Hypothesis Test for the Proportion- Calculations

z = (pbar - p₀)/√(p₀(1-p₀)/n)
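
A sketch with hypothetical counts, testing H0: p = 0.5 two-sided:

    import numpy as np
    from scipy import stats

    n, successes, p0 = 500, 290, 0.5   # hypothetical data; H0: p = p0
    pbar = successes / n

    z = (pbar - p0) / np.sqrt(p0 * (1 - p0) / n)
    p_value = 2 * stats.norm.sf(abs(z))    # two-sided P-value
    print(f"z = {z:.2f}, P-value = {p_value:.4f}")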

P-Value

probability of observing a value of the test statistic "at least as extreme as" what was actually observed.

the p-value is the smallest significance level for which the null hypothesis can be rejected.

**p-value is the observed significance level**

Regression Statistics- Multiple R

positive square root of R Square. Large value indicates a good fit.

Regression Statistics-R Square

number indicates that ___% of the variation in the response variable Y is explained by its linear relationship with the explanatory variable X.

Regression Statistics-Adjusted R Square

adjustment of R Square for degrees of freedom. The adjustment serves as a penalty for using too many explanatory variables.

Regression Statistics-Standard Error

"Standard Error of Regression" and estimates the standard deviation of the error specification in the model. A relatively small standard error is another indication of a good fit.

ANOVA- First Column

"Regression" always refers to the part that is explained and "Residual" to the part that is unexplained. The "Total" is typically the sum of the explained and unexplained components, wherever this way of thinking is useful.

ANOVA- Second Column

analyzes degrees of freedom. The total degrees of freedom is always n-1. If there is only one explanatory variable, Regression df = 1. The Residual degrees of freedom is what is left over (e.g., if n = 20, the total df is 19, so the Residual df is 19 - 1 = 18).

ANOVA- Third Column

analyzes the variation in the response variable in terms of Sum of Squares. The Total Sum of Squares is simply the sum of the squared deviations from the sample mean for Y. The Regression SS is the variation in the response variable Y that is explained by the regression line. The Residual SS is interpreted as the variation in the response variable Y that is not explained by the linear regression line.

ANOVA- Fourth Column

Mean Square is a Sum of Squares divided by the appropriate degrees of freedom. It is easy to check that the square root of the Residual MS is equal to the Standard Error of Regression listed in the Regression Statistics.

ANOVA- Fifth Column

The F-Statistic is the ratio of MS Regression to MS Residual. A large value of F supports the alternative hypothesis that the linear regression model fits the data against the null hypothesis that it does not.
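
The whole table can be rebuilt by hand for simple linear regression; a sketch on made-up data (n = 20, so Residual df = 18):

    import numpy as np

    rng = np.random.default_rng(4)
    n = 20
    x = rng.uniform(0, 10, n)
    y = 2.0 + 0.8 * x + rng.normal(0, 1.5, n)   # hypothetical data

    b1, b0 = np.polyfit(x, y, 1)                # least-squares slope, intercept
    yhat = b0 + b1 * x

    ss_total = ((y - y.mean()) ** 2).sum()      # Total SS
    ss_reg = ((yhat - y.mean()) ** 2).sum()     # explained (Regression SS)
    ss_res = ((y - yhat) ** 2).sum()            # unexplained (Residual SS)

    ms_reg, ms_res = ss_reg / 1, ss_res / (n - 2)
    print("R Square      :", ss_reg / ss_total)
    print("Standard Error:", np.sqrt(ms_res))   # = sqrt(Residual MS)
    print("F             :", ms_reg / ms_res)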