Sunday, July 15, 2012

Use bootstrapped draws for simulating draws - expand method

* This post presents an alternative to the resampling method from the last post.

webuse mheart0, clear

* First let's generate an observation index
gen obs_id = _n

* Suppose you want to test how well an estimator works on data sampled from this data set.

* There are obviously many ways to do this.

* One way is to draw about 1,000 observations from the data, generate a dependent variable, and then test how well your estimator works.

sum

* First we want to mark the draws but we can see that bmi is missing some information.

* For our purposes we could either drop the observations for which bmi is missing or linearly impute bmi.

* Let's just impute bmi:

reg bmi age smokes attack female hsgrad marstatus alcohol hightar

predict bmi_fill

replace bmi = bmi_fill if bmi==.

sum bmi
drop bmi_fill
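* A quick check, assuming every regressor above is observed whenever bmi is missing: the count below should then be 0.
count if missing(bmi)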

di "Now what we want is approximately 1,000 results (we do not need to be exact)"

di "We have " _N " observations"

local obs_add =1000/_N

di "So we need to add approximately " round(`obs_add') " observations per observation"

* One way to do this is to give each observation a random number of copies, so that some observations get more copies and some get fewer.

* The uniform distribution is a natural choice. However, runiform() has an expected value of 1/2, so we multiply by 2 so that the expected number of copies per observation matches the target.
gen add = round(`obs_add'*runiform()*2)

* Note: alternative distributions might be any non-negative distribution for which you can specify the expected value.
* For example: Poisson. This distribution is less likely to drop observations (assign them zero copies) and represents the initial observations more proportionally.
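* A minimal sketch of the Poisson alternative. The variable name add_pois is hypothetical, and it is dropped right away so that the rest of this post still uses add.
* rpoisson(m) draws from a Poisson distribution with mean m, so no rescaling is needed here.
gen add_pois = rpoisson(`obs_add')
tab add_pois
drop add_pois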

tab add
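* A quick check, assuming add was generated as above: the sum of add is the number of observations we will have after expanding, so it should be close to 1,000.
quietly sum add
di "Total observations after expanding: " r(sum)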
* First, let's drop any observations that were assigned zero copies

drop if add == 0

* Now the command expand is very useful because it allows us to easily duplicate observations