```python
import pandas as pd

# re-import so I don't have to unmelt
heig = pd.read_csv('data/Galton.csv', index_col=None)

# get counts of values for the bubble plot
h = pd.DataFrame({'count': heig.groupby(["child", "parent"]).size()}).reset_index()
h[:10]
```

Consider data with an outcome (Y) and a predictor (X). The standard deviation of the predictor is one half that of the outcome. The correlation between the two variables is 0.5. What would be the value of the slope coefficient for the regression model with Y as the outcome and X as the predictor?
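The question can be checked numerically with the slope identity $\beta_1 = \mathrm{Cor}(Y, X) \, \mathrm{Sd}(Y)/\mathrm{Sd}(X)$; the specific values below just restate the problem's givens (taking $\mathrm{Sd}(Y) = 1$ without loss of generality).

```python
# Slope of Y ~ X: beta1 = cor(X, Y) * sd(Y) / sd(X)
# Givens: sd(X) = 0.5 * sd(Y) and cor = 0.5, so beta1 = 0.5 * 2 = 1
cor = 0.5
sd_y = 1.0           # arbitrary scale; only the ratio sd(Y)/sd(X) matters
sd_x = 0.5 * sd_y
beta1 = cor * sd_y / sd_x
print(beta1)  # 1.0
```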

Students were given two hard tests and scores were normalized to have empirical mean 0 and variance 1. The correlation between the scores on the two tests was 0.4. What would be the expected score on Quiz 2 for a student who had a normalized score of 1.5 on Quiz 1?
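Because both scores are normalized to mean 0 and variance 1, the regression slope equals the correlation, so the prediction is simply $r \times x$; a quick sketch of the arithmetic:

```python
# With normalized scores, E[Quiz2 | Quiz1 = x] = r * x (regression to the mean:
# the prediction 0.6 is pulled back toward the mean from the observed 1.5)
r = 0.4
x = 1.5
expected = r * x
print(expected)  # 0.6
```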

Consider taking the slope from fitting Y as the outcome and X as the predictor, β1, and the slope from fitting X as the outcome and Y as the predictor, γ1, and dividing the two as β1/γ1. What is this ratio always equal to?
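Since $\beta_1 = \mathrm{Cor}(Y,X)\,\mathrm{Sd}(Y)/\mathrm{Sd}(X)$ and $\gamma_1 = \mathrm{Cor}(Y,X)\,\mathrm{Sd}(X)/\mathrm{Sd}(Y)$, the ratio is $\mathrm{Var}(Y)/\mathrm{Var}(X)$. A sketch verifying this on simulated (made-up) data:

```python
import numpy as np

# beta1 = cov/var(x) and gamma1 = cov/var(y), so beta1/gamma1 = var(y)/var(x)
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2 * x + rng.normal(size=100)
beta1 = np.polyfit(x, y, 1)[0]   # slope of Y ~ X
gamma1 = np.polyfit(y, x, 1)[0]  # slope of X ~ Y
print(np.isclose(beta1 / gamma1, np.var(y) / np.var(x)))  # True
```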

```python
from __future__ import print_function
"""
Edward Tufte uses this example from Anscombe to show 4 datasets of x
and y that have the same mean, standard deviation, and regression
line, but which are qualitatively different.

matplotlib fun for a rainy day
"""
from pylab import *

x = array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5])
y1 = array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])
y2 = array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])
y3 = array([7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73])
x4 = array([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8])
y4 = array([6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89])

def fit(x):
    return 3 + 0.5 * x

xfit = array([amin(x), amax(x)])

subplot(221)
plot(x, y1, 'ks', xfit, fit(xfit), 'r-', lw=2)
axis([2, 20, 2, 14])
setp(gca(), xticklabels=[], yticks=(4, 8, 12), xticks=(0, 10, 20))
text(3, 12, 'I', fontsize=20)

subplot(222)
plot(x, y2, 'ks', xfit, fit(xfit), 'r-', lw=2)
axis([2, 20, 2, 14])
setp(gca(), xticklabels=[], yticks=(4, 8, 12), yticklabels=[], xticks=(0, 10, 20))
text(3, 12, 'II', fontsize=20)

subplot(223)
plot(x, y3, 'ks', xfit, fit(xfit), 'r-', lw=2)
axis([2, 20, 2, 14])
text(3, 12, 'III', fontsize=20)
setp(gca(), yticks=(4, 8, 12), xticks=(0, 10, 20))

subplot(224)
xfit = array([amin(x4), amax(x4)])
plot(x4, y4, 'ks', xfit, fit(xfit), 'r-', lw=2)
axis([2, 20, 2, 14])
setp(gca(), yticklabels=[], yticks=(4, 8, 12), xticks=(0, 10, 20))
text(3, 12, 'IV', fontsize=20)

# verify the stats
pairs = (x, y1), (x, y2), (x, y3), (x4, y4)
for x, y in pairs:
    print('mean=%1.2f, std=%1.2f, r=%1.2f' % (mean(y), std(y), corrcoef(x, y)[0][1]))

show()
```

The estimated standard error can be used to create a confidence interval for $\theta$ via $\hat \theta \pm Q_{1-\alpha/2} \hat \sigma_{\hat \theta}$,
where $Q_{1-\alpha/2}$ is the relevant quantile from either a normal or a $t$ distribution.
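A minimal sketch of this construction, with a made-up estimate and standard error (the values below are illustrative, not from the data):

```python
from scipy import stats

# Hypothetical numbers: 95% interval as theta_hat +/- Q_{1-alpha/2} * se
theta_hat = 2.0   # assumed point estimate
se = 0.3          # assumed standard error of the estimate
alpha = 0.05
df = 48
q = stats.t.ppf(1 - alpha / 2, df)  # t quantile; stats.norm.ppf for the normal case
ci = (theta_hat - q * se, theta_hat + q * se)
print(ci)
```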

In the case of regression with iid sampling assumptions and normal errors, our inferences will follow
very similarly to what you saw in your inference class.

We won't cover asymptotics for regression analysis, but suffice it to say that under assumptions
on the way in which the $X$ values are collected, the iid sampling model, and the mean model,
the normal results hold and can be used to create tests and confidence intervals.
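As a concrete sketch of such an interval, the slope's standard error and a $t$-based 95% interval can be computed by hand; the simulated data below are assumptions for illustration only.

```python
import numpy as np
from scipy import stats

# Simulated (made-up) data: y = 1 + 2x + normal noise
rng = np.random.default_rng(1)
n = 50
x = rng.normal(size=n)
y = 1 + 2 * x + rng.normal(size=n)

# Least-squares slope and intercept
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()

# Residual standard error (df = n - 2) and the slope's standard error
resid = y - (beta0 + beta1 * x)
sigma = np.sqrt(np.sum(resid ** 2) / (n - 2))
se_beta1 = sigma / np.sqrt(np.sum((x - x.mean()) ** 2))

# t-based 95% confidence interval for the slope
q = stats.t.ppf(0.975, n - 2)
ci = (beta1 - q * se_beta1, beta1 + q * se_beta1)
print(ci)
```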