The following function simulates a data set for survival analysis. There are two covariates in addition to the survival time and status. All variables have missing values. The missing value patterns for the survival time and status variables are identical.

In [2]:

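def gendat(n):
    # Covariates
    x1 = np.random.normal(size=n)
    x2 = np.random.normal(size=n)
    # The event time
    event_time = np.random.exponential(size=n) * np.exp(x1)
    # The duration of observation
    obs_time = np.random.exponential(size=n)
    # The event or censoring time
    time = np.where(event_time < obs_time, event_time, obs_time)
    # The censoring status (1 = event observed, 0 = censored),
    # stored as float so that the column can hold NaN
    status = np.where(time == event_time, 1.0, 0.0)
    df = pd.DataFrame({"time": time, "status": status, "x1": x1, "x2": x2})
    # Create missing values; assign through df.iloc directly rather than
    # through df.<column>.iloc, which sets values on a possible copy
    df.iloc[0:100, df.columns.get_loc("time")] = np.nan
    df.iloc[0:100, df.columns.get_loc("status")] = np.nan
    df.iloc[80:150, df.columns.get_loc("x1")] = np.nan
    df.iloc[140:200, df.columns.get_loc("x2")] = np.nan
    return df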

Next we create a data set and wrap it in a MICEData object. We will have the survival status variable "follow" the time variable. When using PMM to do imputation, each imputed time value is actually an observed time value from the data set. By having status follow time, the status value and time value will always be obtained from the same observation. Note that this requires that the status and time values have the same missingness pattern, which would normally be the case in practice.
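The idea of having status "follow" time under PMM can be sketched in plain Python. This is only an illustration under stated assumptions, not the MICEData implementation: the function name `pmm_impute_with_follow`, the `predicted` values and the toy rows are all hypothetical. For each row with a missing time, a donor is drawn from the observed rows whose predicted values are closest, and both time and status are copied from that single donor.

```python
import random


def pmm_impute_with_follow(rows, predicted, k=3, rng=None):
    """Impute missing (time, status) pairs by predictive mean matching.

    rows: list of dicts with keys "time" and "status" (None if missing).
    predicted: one model prediction per row (hypothetical input here).
    """
    rng = rng or random.Random(0)
    donors = [i for i, r in enumerate(rows) if r["time"] is not None]
    out = [dict(r) for r in rows]
    for i, r in enumerate(rows):
        if r["time"] is not None:
            continue
        # The k observed rows whose predictions are closest to this row's
        nearest = sorted(donors, key=lambda d: abs(predicted[d] - predicted[i]))[:k]
        d = rng.choice(nearest)
        # "Follow": take time and status from the same donor row
        out[i]["time"] = rows[d]["time"]
        out[i]["status"] = rows[d]["status"]
    return out


rows = [
    {"time": 0.5, "status": 1},
    {"time": 2.0, "status": 0},
    {"time": 1.2, "status": 1},
    {"time": None, "status": None},
    {"time": 3.1, "status": 0},
]
predicted = [0.4, 1.9, 1.0, 1.1, 3.0]  # made-up predictions for illustration
completed = pmm_impute_with_follow(rows, predicted, k=2)
observed_pairs = {(r["time"], r["status"]) for r in rows if r["time"] is not None}
```

Because the pair is copied as a unit, every imputed (time, status) combination is one that actually occurs in the observed data.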

Also note that we must indicate that the string "status" is a variable name, so that the corresponding data are taken from the data frame rather than the string being used as a literal value.

Finally, the intercept is not identified in a proportional hazards regression model, so we must explicitly exclude the intercept by including "0 + " in the formula.
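To see why the intercept is not identified, the small check below evaluates a Cox partial log-likelihood with and without a constant added to the linear predictor. It is a sketch under simplifying assumptions (no tied event times, made-up data, a hypothetical helper `cox_partial_loglike`), not the code that PHReg uses. The constant multiplies both the numerator and every term of the risk-set sum by the same factor, so it cancels.

```python
import math


def cox_partial_loglike(times, status, linpred):
    """Cox partial log-likelihood, assuming no tied event times."""
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, status)):
        if d_i != 1:
            continue  # censored observations contribute only via risk sets
        # Risk set: subjects still under observation at time t_i
        risk = [math.exp(linpred[j]) for j, t_j in enumerate(times) if t_j >= t_i]
        total += linpred[i] - math.log(sum(risk))
    return total


# Toy data, invented for illustration
times = [0.3, 0.9, 1.4, 2.2, 2.8]
status = [1, 0, 1, 1, 0]
x1 = [0.5, -1.0, 0.2, 1.3, -0.4]
beta = 0.7

without_intercept = cox_partial_loglike(times, status, [beta * x for x in x1])
with_intercept = cox_partial_loglike(times, status, [5.0 + beta * x for x in x1])
print(without_intercept, with_intercept)
```

The two printed values agree up to floating-point error, so no value of the intercept fits the data better than any other.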


Next we run a few rounds of imputation.

In [4]:

for j in range(5):
    x = idata.next_sample()

Here is a histogram of the imputed survival times in the final iteration:

In [5]:

_ = idata.plot_imputed_hist('time')

Here is a scatterplot showing the fitted and imputed or observed survival times at the final MICE iteration. Note that since the linear predictor corresponds to the hazard rate, there is an inverse relationship between the "time fitted" axis (which is actually a fitted hazard ratio) and the observed or imputed time axis.
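The inverse relationship can be illustrated with a small simulation. This is a sketch assuming a constant baseline hazard of 1, so that a subject with hazard ratio `hr` has exponentially distributed survival time with rate `hr`; the hazard ratios and sample size are arbitrary choices, not values from the data set above.

```python
import random

rng = random.Random(0)
hazard_ratios = [0.5, 1.0, 2.0, 4.0]  # arbitrary illustrative values
mean_times = []
for hr in hazard_ratios:
    # Constant baseline hazard of 1 (an assumption): hazard is hr,
    # so survival times are exponential with rate hr
    draws = [rng.expovariate(hr) for _ in range(5000)]
    mean_times.append(sum(draws) / len(draws))

for hr, m in zip(hazard_ratios, mean_times):
    print("hazard ratio %.1f -> mean survival time %.3f" % (hr, m))
```

Larger hazard ratios give shorter average survival times, which is the downward trend expected in the scatterplot.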

In [6]:

_ = idata.plot_fit_obs('time')

Here is a scatterplot showing the relationship between the survival times and the values of the first predictor variable.

In [7]:

_ = idata.plot_bivariate('time', 'x1', jitter=(0, 0.1))
