# If either known syndrome or secondary diagnosis
demographic['synd_or_disab'] = demographic.apply(lambda x: x['secondary_diagnosis'] or x['known_synd'], axis=1)
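For 0/1 indicator columns, the row-wise `apply` above can also be written as a vectorized boolean OR. The sketch below uses a small hypothetical frame (`demo`) and assumes both columns are coded 1 = present, 0 = absent. On complete data the two versions agree. They can differ when values are missing, because `NaN or y` returns `NaN`, while `NaN` cast to `bool` becomes `True`, so check how missing values are coded before swapping one for the other.

```python
import pandas as pd

# Hypothetical toy data; 1 = present, 0 = absent is an assumed coding
demo = pd.DataFrame({
    "secondary_diagnosis": [0, 1, 0, 1],
    "known_synd": [0, 0, 1, 1],
})

# Row-wise version, mirroring the notebook cell
rowwise = demo.apply(lambda x: x["secondary_diagnosis"] or x["known_synd"], axis=1)

# Vectorized alternative: 1 if either flag is set, else 0
vectorized = (
    demo["secondary_diagnosis"].astype(bool) | demo["known_synd"].astype(bool)
).astype(int)

print(rowwise.tolist())
print(vectorized.tolist())
```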

Sibling counts recorded with the missing-value code (4) were recoded as None so that pandas treats them as missing.

In [53]:

demographic.loc[demographic.sib==4,'sib']=None
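Assigning `None` through `.loc` works here. An alternative that makes the conversion to a float column holding `NaN` explicit is `Series.mask`, sketched below on hypothetical sibling counts and assuming, as in the cell above, that 4 is the missing-value code.

```python
import pandas as pd

# Hypothetical sibling counts; 4 is assumed to be the missing-value code
demo = pd.DataFrame({"sib": [0, 1, 4, 2, 4]})

# mask() replaces values where the condition is True with NaN,
# upcasting the integer column to float so NaN can be stored
demo["sib"] = demo["sib"].mask(demo["sib"] == 4)

print(demo["sib"].isnull().sum())  # number of recoded values
print(demo["sib"].dtype)
```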

We reduced the number of race categories, pooling every category other than Caucasian, Black, Hispanic, or Asian into "Other" because of the small sample sizes in those categories. Category 7 (unknown) was recoded as missing.

In [54]:

races = ["Caucasian", "Black or African American", "Hispanic or Latino", "Asian", "Other"]
demographic = demographic.rename(columns={"race": "_race"})
demographic["race"] = demographic._race.copy()
demographic.loc[demographic.race == 7, 'race'] = None
demographic.loc[demographic.race > 3, 'race'] = 4
print("_race:")
print(demographic._race.value_counts())
print("race:")
print(demographic.race.value_counts())
print("There are {0} null values for race".format(demographic.race.isnull().sum()))
# Replace with recoded column
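The recoding above can be checked on toy data. This sketch assumes that codes 0 to 3 correspond, in order, to the first four entries of `races`, that codes 4 to 6 are the small groups being pooled, and that 7 is unknown; the `raw` series is hypothetical. It also shows one way to attach the `races` labels with `pd.Categorical.from_codes`, where a code of -1 marks a missing value.

```python
import numpy as np
import pandas as pd

races = ["Caucasian", "Black or African American", "Hispanic or Latino", "Asian", "Other"]

# Hypothetical raw codes (assumed: 0-3 retained groups, 4-6 small groups, 7 = unknown)
raw = pd.Series([0, 1, 2, 3, 4, 5, 6, 7, 0, 7])

race = raw.astype(float)   # float so NaN can be stored
race[race == 7] = np.nan   # unknown -> missing
race[race > 3] = 4         # pool small groups into "Other"; NaN > 3 is False, so NaN stays NaN

# Map codes to labels; -1 is the missing-value code for from_codes
labels = pd.Categorical.from_codes(race.fillna(-1).astype(int), categories=races)
print(pd.Series(labels).value_counts(dropna=False))
```

Applying the `== 7` step before the `> 3` step matters: in the reverse order, unknowns would first be pooled into "Other" and never become missing.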