* Each item (testing question) has three parameters:
* a - discrimination (how well this item distinguishes between people of different ability levels)
* b - difficulty (items with a larger b are harder to answer correctly)
* c - the guessing probability (the probability that someone who knows nothing will guess the correct answer)

* Let us generate now the 100 items, for all 1000 test takers

* Each item will have a different a,b, and c.
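* The ability parameter theta is used below, but its generation is not shown in this excerpt.
* A minimal setup consistent with the 1000 test takers mentioned above might be (the standard normal draw is my assumption):
clear
set obs 1000
gen theta = rnormal()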

* Let's also generate a total score for each person
gen total_score = 0

forv i = 1/100 {

local a = .2 + runiform()/4
local b = .3 + runiform()/2
local c = .2 + runiform()/3
* I am not very sure what the best way of parameterizing these items is, since I know this matters.

local a`i' = `a'
local b`i' = `b'
local c`i' = `c'

* We will generate item responses as pi(theta) = ci + (1-ci)/(1 + exp(-ai*(theta-bi)))
* This says that, for a given person, the probability of getting item i right is a function of the item parameters ai, bi, and ci as well as the personal ability theta.
gen item`i' = rbinomial(1, `c'+(1-`c')/(1+exp(-`a'*(theta-`b'))))

* Add the result of this item to the total score
replace total_score = total_score + item`i'
}

Wednesday, August 29, 2012

* Let's imagine we would like to estimate how effective each of 90 teachers is, where the teachers are spread across 3 grades and each teaches 20 students. Every student is assigned a teacher, and teachers are assigned to students randomly (an important assumption that is often violated).

clear
set obs 90

gen teffect = rnormal()+1.5
label var teffect "True Teacher Effect"

gen tid = _n

* This should create 3 different grades for the teachers to be assigned to
gen grade = ceil(_n/30)

* Multiplying start_level by .75 implies that students retain 75% of the ability that they had going into the school year.
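* The student-level data generation is not shown in this excerpt. A rough sketch consistent
* with the surrounding comments might look like the following (the grade effects and the noise
* scale are my assumptions):
expand 20                        // 20 students per teacher
gen start_level = rnormal()      // ability going into the school year
gen current_level = .75*start_level + teffect + 2*grade + rnormal()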

* Now we want to see how well we can infer teacher ability

tab tid, gen(tid_)

* We might want to start with a straightforward regression of current achievement on teacher id
reg current_level tid_2-tid_90 start_level

* Since tid=1 is omitted due to multicollinearity, we will set its effect equal to 0 as a base of reference.
gen reg_res = 0 if tid==1

* Note, you may want to identify the true magnitude of the teacher effect. This, however, is not possible because all students have received teachers. Therefore, at best we can only hope to estimate how good teachers are relative to each other.

forv i = 2/90 {
cap replace reg_res = _b[tid_`i'] if tid == `i'
}

* The problem with this is now we have to figure out how to compare teachers.

* One way would be to correlate the estimated teacher effect with the true teacher effect (which we know).
corr reg_res teffect

* This correlation looks really bad primarily because teachers are only teaching in one grade each and grades have different learning effects.

* We can see that generally there is a correlation between higher true teacher effects and higher estimated teacher effects across all grades. However, within grades the correlation is even clearer.
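* A quick way to check this (my addition, not in the original post) is to look at the correlation grade by grade:
bys grade: corr reg_res teffect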
* One may attempt to correct this problem by including grade dummies
reg current_level tid_2-tid_90 i.grade start_level

* However, the system experiences multicollinearity issues and sometimes drops the grade dummies.

Alan: The above will error out at some point when it hits its recursion limit,
and Mata will print out a (long) traceback log intended to help in
debugging when a Mata function hits some system limit. Due to such deep
recursion, this traceback log will be long and so the output from
recursive_call() will scroll off the top of your screen. In any case,
after running the above, you can then re-enter Mata and see what the value
of "a" is:

mata:
a
end

*/

You should see 32766.

As an aside, you will notice that I imposed an upper limit of 10000 recursive calls; if left uncontrolled it will keep going until it gives an error. I am impressed with the speed and power of Mata!

recursive_call
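The definition of recursive_call is not included in this excerpt. A minimal Mata sketch of the kind of function being discussed (my reconstruction, using the 10000-call cap mentioned above) might look like:

mata:
a = 0
void recursive_call()
{
	external a          // access the global counter created above
	a = a + 1
	// Assumed cap of 10000 calls; remove the condition to run until Mata's recursion limit is hit
	if (a < 10000) recursive_call()
}
recursive_call()
end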

Monday, August 27, 2012

* In order for OLS to be the best unbiased estimator, the error must be distributed normally (along with other OLS assumptions).

* Failure of the normality assumption does not cause bias but may make the estimator less efficient than some non-linear estimators. However, so long as the error is spherical (homoskedastic and uncorrelated), the OLS estimator is still BLUE (best linear unbiased estimator).

* Likewise, for the p-values and confidence intervals to be exact, the underlying error distribution must be normal.

* However, by the central limit theorem, so long as E(error^2) is finite, the coefficient estimates are asymptotically normally distributed.
* Of course, asymptotic approximations sometimes only become accurate in very large samples.

* In this post I will explore what happens to confidence intervals when the error is not normally distributed and the sample size is "small".

cap program drop randols
program define randols, rclass

* This tells Stata to call the first argument of the command randols N and the second argument distribution
args N distribution

* The first argument sets the number of observations
clear
set obs `N'

* Generate the variable
gen u = `distribution'

gen x = rnormal()

gen y = 1*x + u

reg y x

* The degrees of freedom are N less the number of parameters estimated (2)
local DOF = `N'-2

* Construct the 90% confidence interval for the coefficient on x
local lowerCI = _b[x] - invttail(`DOF',.05)*_se[x]
local upperCI = _b[x] + invttail(`DOF',.05)*_se[x]

* Now we check if the true coefficient is within the CI
* Since the true coefficient is 1, we are checking if the CI encloses 1.
if `lowerCI'<1 & `upperCI'>1 return scalar within = 1
if !(`lowerCI'<1 & `upperCI'>1) return scalar within = 0
end

randols 100 runiform()
return list

* Now we are set up to do some tests to see how well the 90% CI is working.

* If the CI is too wide (the standard error estimates are too large) then the CI will enclose the true value more than 90% of the time.

* If the CI is too tight then the CI will enclose the true value too few times.

* As a means of comparison let's start with a normal distribution of the error.

* Set the random seed so as to create replicable runs.
simulate r(within), reps(2000) seed(1011) : randols 100 rnormal()
sum

* We can see that the standard error estimates are working well: the CI encloses the true coefficient 90.55% of the time.

* We do not expect exactly 90% since our simulation is a series of random draws as well.
* Now let's try a uniform distribution. Note: because the uniform has a non-zero mean this will affect the constant estimate, but since we are not interested in the constant we don't mind.
simulate r(within), reps(2000) seed(1011) : randols 100 runiform()
sum
* Likewise, the uniform error distribution does not present a serious problem for estimating the CI.

* Perhaps the Bernoulli(.5) will cause more trouble.
simulate r(within), reps(2000) seed(1011) : randols 100 rbinomial(1,.5)
sum
* Likewise, the normal-theory CIs work very well even with Bernoulli(.5) errors.

* We can see that the coefficient estimates above approximate the true values.

* However, this type of analysis of explanatory power by variable might only be useful when the explanatory variables are orthogonal. If the explanatory variables were correlated, then the variance in one variable is not independent of the variance in the other, and thus together they will have less explanatory power than two orthogonal explanatory variables, ceteris paribus.
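* A small sketch (my addition, not from the original post) of how correlated regressors share explanatory power:
clear
set obs 10000
gen x1 = rnormal()
gen x2 = .8*x1 + rnormal()       // x2 correlated with x1
gen y  = x1 + x2 + rnormal()
reg y x1
reg y x1 x2                      // note how little R-squared the correlated x2 adds
replace x2 = rnormal()           // now make x2 orthogonal to x1
replace y  = x1 + x2 + rnormal()
reg y x1
reg y x1 x2                      // the orthogonal x2 adds more explanatory power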

Friday, August 24, 2012

* For random draws from any well-behaved distribution, the mean of a sample of those draws is approximately normally distributed as the sample size gets large.

* This might seem at first like a lot of mumbo jumbo but this result is easily demonstrable and extremely useful.

* To show the central limit theorem (CLT) in action we will use Stata's simulate command.

* First we need to program a "program" to simulate

cap program drop randdraw
program define randdraw

clear

* This tells Stata to call the first argument of the command randdraw N and the second argument distribution
args N distribution

* The first argument sets the number of observations
set obs `N'

* Generate the variable
gen x = `distribution'

sum x

end

* It is not necessary that the mean be equal to zero, but it is useful for comparison between different distributions.
randdraw 1000 "rpoisson(10)-10"
* This command drew 1000 draws from the Poisson(10) distribution less 10, which has E(poisson(10)-10) = E(poisson(10)) - 10 = 10 - 10 = 0.

simulate mean_x =r(mean), reps(1000): randdraw 1000 "rpoisson(10)-10"

hist mean_x, kden
* Looks pretty normal, but that is not so surprising because the Poisson distribution begins to look pretty normal once its parameter gets as large as 10.

* So how exactly does the central limit relate to all of this and why is it so cool?

* The argument is that as N, the number of draws going into each mean calculation, gets large, the distribution of the means begins to look normal.

simulate mean_x =r(mean), reps(1000): randdraw 10 "rbinomial(1,.05)-.05"
* This is the mean of a Bernoulli variable (weighted to .05), either 0 or 1, with each mean representing 10 draws.
hist mean_x

* We can see that this is clearly not normally distributed. However what happens when we increase the draws to 100 rather than 10?
simulate mean_x =r(mean), reps(1000): randdraw 100 "rbinomial(1,.05)-.05"
hist mean_x

* Already things are beginning to look a lot more like a normal distribution. Obviously it is not normal because the potential outcomes are discrete.

* This is what is meant by asymptotically converging on a normal distribution. As N gets large the distribution of the means begins to look more like a normal distribution (so long as the variance of x is finite).
* Crazy, huh?

* Now things are looking very close to normal. Of course there will be some randomness in the 1000 draws which have enough inherent variation to make even a normal variable look non-normal.

* This might take a little while but let's increase the draws one more time.
simulate mean_x =r(mean), reps(1000): randdraw 1000000 "rbinomial(1,.05)-.05"
hist mean_x, kden

* So using an x variable that is Bernoulli(.05) leads to means that do not start looking normal until around 100,000 draws. This might sound bad but it really is not.

* This is because we usually do not need the sample means themselves TO BE normal, only to be well approximated by a normal distribution (and their standardized values to be well approximated by a chi-squared distribution for null hypothesis tests).

* In addition, a Bernoulli(.05) is an extreme distribution with a large skew and discrete outcomes. Most distributions, even binary and count distributions, are going to have properties more similar to a normal distribution, causing more rapid convergence to normality.

* However, let's allow for some momentum (whatever speed the point had at the previous step is retained during this step, plus some random movement). This is in effect a data source that exhibits a random walk even after first differencing.
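* A minimal sketch (my construction; the original post's code is not included here) of a point moving with momentum:
clear
set obs 500
gen t = _n
tsset t
gen shock = rnormal()
gen speed = shock
replace speed = speed[_n-1] + shock in 2/L    // speed carries over from the previous step plus a new shock
gen position = sum(speed)                     // position is the running sum of speed
* First-differencing position recovers speed, which is itself a random walk.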

* The method is argued to require weaker assumptions than natural experiments.

* The method seems deceptively simple.

* Imagine there is some rule implemented at z = c, where z is a value that varies across the population.

* z could be correlated with the outcome variable of interest, y. However, if one were to look at the group of individuals whose value of z is sufficiently close to c, then one would find that the only remaining difference between them is the result of either receiving the treatment T or not receiving it.
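* The data generation and the first specifications are not shown in this excerpt. A sketch of the
* kind of setup described above might look like the following (the cutoff, functional form, effect
* sizes, and scales are all my assumptions):
clear
set obs 10000
gen SAT = 1600 + floor(runiform()*81)*10        // SAT-like scores between 1600 and 2400
gen scholarship = SAT >= 2100                   // assumed cutoff rule at z = c
gen mentoring = runiform() < .5
* Performance depends non-linearly on SAT, so a linear control leaves bias that shrinks as the window around the cutoff narrows
gen performance = .000005*SAT^2 + 3*scholarship + 2*mentoring + rnormal()*5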

* We can see that our estimates are closer. However, they are not perfect, and it does not help if we restrict our data further.
* Specification 2:
reg performance SAT scholarship mentoring if SAT > 2070 & SAT < 2180

* It is easy to see the problem with the RD approach in this example. RD is highly sensitive to sample selection.

* If our selection is too narrow then we do not have enough data to identify the effect or the confidence interval for useful analysis.

* In specification 0 we can see that our confidence interval does not enclose the true parameter, while in all other specifications it does.

* However, from specifications 2-4 we might be forced to conclude that, after using RD for "precision", the scholarship had an effect not significantly different from zero.

* We can see that the estimated effect of mentoring suffers just as badly as the estimated effect of the scholarship when we restrict the sample.

* Ultimately, using RD to restrict the sample is going to have an equally deleterious effect on the other coefficients of interest.

* This might sound overall negative. However, when you have large enough sample sizes things start improving.

* Let's imagine that the administrator is able to use multiple years of students to estimate the effect of the program.

* Increase the sample size to 2 or 3 times its current level and include a year fixed effect, and the RD estimator will still outperform the biased estimator, which cannot improve through the inclusion of more data.

* Example 2: Imagine you are trying to calculate the recommended cooking times for a frozen food that you have developed that are best for the most people.

* You know that for each 1000 feet above sea level you need to reduce the temperature by 5% (this is entirely fictional). You have experimented and found that the ideal cooking temperature at sea level is 350. The result is E(Y|x) = 350*(.95)^x, where x is elevation in thousands of feet.

* Now you want to calculate a single cooking temperature recommendation that gives the ideal temperature for the entire population.

* You know 30% of consumers live at sea level, 20% at 500 feet, 20% at 1000, 10% at 2000, 10% at 3500, and 10% at 5000.
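* A quick worked sketch (my code, not the original post's) of one natural candidate: the
* population-weighted average of the ideal temperatures implied by the numbers above.
clear
input x share
0    .3
.5   .2
1    .2
2    .1
3.5  .1
5    .1
end
gen ideal_temp = 350*(.95)^x
gen contrib = ideal_temp*share
quietly sum contrib
di "Population-weighted ideal temperature: " r(sum)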

Saturday, August 18, 2012

* Imagine that you have some data set that is missing some of the outcome values of interest but has a complete set of explanatory variables. You might be concerned that selection into the sample is correlated with the errors and with your explanatory variables of interest, thus creating potential bias.

* Imagine that you are interested in estimating if obedience school for dogs has the potential to reduce their risk of biting people. As the y variable you have self-reported (by owners) number of bites. As the explanatory variable you have breed aggressiveness and an indicator if the dog went to obedience school.

* Imagine also that you have information on the aggressiveness of the owners of the dogs which is correlated with the error in estimating the number of bites. It is also an explanatory variable of selection.
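* The data generation is not included in this excerpt. A sketch consistent with the story above
* (all parameter values are my assumptions):
clear
set obs 5000
gen breed_agg = rnormal()
gen owner_agg = rnormal()
gen n_classes = rpoisson(2)
gen u = 2*owner_agg + rnormal()                  // error correlated with owner aggressiveness
gen bites = 3 + 2*breed_agg - .5*n_classes + u
* Selection into the sample depends on classes taken and on owner aggressiveness
gen s = rbinomial(1, normal(.5 + .3*n_classes - .8*owner_agg))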

reg bites breed_agg n_classes
* We can see in this case the estimates look good (absent of selection)

replace bites = . if s == 0

reg bites breed_agg n_classes
* Now the effect of classes seems greatly diminished because of the correlation between selection and the error component resulting from owner aggressiveness.

* There are two ways I can think of to generate unbiased estimates.

* We must do this by removing the correlation between selection and the error.

reg bites breed_agg n_classes owner_agg
* This is the easiest way to do so. However, this post is about inverse probability weighting, so that is what we will do.

probit s n_classes owner_agg

predict shat
* We want to estimate the probability of selection from observables

gen ishat = 1/shat
* Then take its inverse to use as the pweight

reg bites breed_agg n_classes [pweight=ishat]
* We can see this estimate is working well though the previous regression was generally better.

Thursday, August 16, 2012

* Imagine we have a sample of hotels and prices of rooms in those hotels
clear
set obs 200
set seed 101
* This sets the random seed at 101. Thus every time this simulation is run it will produce identical results.
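* The price data generation is omitted from this excerpt. A minimal sketch consistent with the
* discussion below (the true star coefficient of 20 comes from the text; everything else is my assumption):
gen id = _n
gen star = ceil(runiform()*5)              // star rating, fixed within hotel
expand 12                                  // assume 12 observations (e.g. months) per hotel
bys id: gen month = _n
gen seasonality = sin(2*_pi*month/12)
gen p = 20*star + 10*seasonality + rnormal()*40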

sum p
* There are some values of p which are less than 0, but we can think of those as special deals, coupons, refunds, or other situations that might result in the effective price being less than 0 dollars.
* If we were to eliminate the prices less than 0, then we would be imposing left censoring, which is a different problem. See "tobit". This blog has several posts touching on the use of the tobit.

* Estimate the price through direct OLS
reg p star seasonality

* Set the panel level observation
xtset id

* Though we have panel data, we cannot effectively use fixed effects or random effects approaches to identify the effect that having one more star has on prices.
xtreg p star seasonality, fe
xtreg p star seasonality, re

* Now let's attempt a two-step method to more efficiently identify the error structure and re-estimate the OLS.

reg p star seasonality
* The OLS regression looks pretty good. Let's see if we can improve on it. Note, the 95% confidence interval did not capture the true coefficient of 20 but that is not necessarily a problem.

reg p star seasonality, robust
reg p star seasonality, cluster(id)
* Using robust and cluster robust estimates of the standard error does not change the variance so much that the 95% confidence interval encloses the 20.
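* The MLE estimation referred to below is not shown in this excerpt; presumably something along the lines of:
xtreg p star seasonality, mle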

* Using the MLE estimator we seem to have gained precision in the 3rd decimal place of the coefficients. The 95% CI still does not enclose the 20, but it is closer. Still, this is not indicative of a problem. Perhaps if we simulated this 1,000 times and the CI failed to enclose the 20 substantially more than 50 of those times, then we might be worried.

% This is a comment; it is not shown in the final output.
% The following shows a little of the typesetting power of LaTeX
\begin{align*}
&\text{First Stage: Standard OLS} \\
(1.1)\quad & X = Z\widehat{\gamma} \\
(1.2)\quad & Z'X = Z'(Z\widehat{\gamma}) \\
(1.3)\quad & Z'X = Z'Z\widehat{\gamma} \\
(1.4)\quad & (Z'Z)^{-1}Z'X = (Z'Z)^{-1}Z'Z\widehat{\gamma} \\
(1.5)\quad & (Z'Z)^{-1}Z'X = \widehat{\gamma}
\end{align*}