How do I ____ in SUDAAN? | SUDAAN FAQ

This page illustrates some of the "peculiarities" often encountered when
using SUDAAN. For more information about a specific procedure or option,
please see the SUDAAN help available through SAS or consult the SUDAAN manuals.
All examples use the CHIS adult data set.

Missing data/dummy variables How do I use categorical variables in
SUDAAN?

Dummy variables

Suppose that you are running a regression or a logistic regression, and
instead of getting the results that you expect, you find that SUDAAN considers
a large portion of your data to be missing. You use proc print or a similar
procedure to look at your data set, and it looks fine to you. What has
happened? You may have included in your model and on the subgroup statement
dummy variables that are coded 0/1.

While this is the standard way of coding dummy variables, SUDAAN considers the
cases coded as 0 to be missing for variables that are listed on the subgroup
statement. In other words, SUDAAN treats non-positive values as missing in
variables that are used as categorical independent variables. Hence, when
SUDAAN does a listwise deletion of missing data, a large portion of your cases
may be deleted, possibly to the point of making the model impossible to
estimate. (Please see pages 165-166 of the SUDAAN manual for a complete
description of the subgroup statement, including valid values for subgroups,
and see the section on the subgroup statement below for an example.)

Consider the example below in which srsex is coded 1/2 and newvar1 is coded
0/1. As you can see, a warning is printed in the log and the number of cases
used in the analysis is about 4050 fewer than there should be (the 4050 cases
that are coded 0 in the data step).

You have several ways of dealing with this problem. Perhaps the easiest is to
not list the 0/1 variable on the subgroup statement. In many ways the subgroup
statement in SUDAAN is like the class statement in SAS. In the same way that
you would not list a 0/1 variable on the class statement in SAS, you do not
list a 0/1 variable on the subgroup statement in SUDAAN. Another solution is
to recode the 0/1 variable to be a 1/2 variable. If you have a variable that
is 0/1/2, then you need to recode it as well (for example, to 1/2/3). You can
do this in a data step before running the procedure.
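The data-step recode described above might look like the following minimal sketch. The toy data, the id variable, the output data set temp02 and the new variable newvar1_12 are all hypothetical names used only for illustration; only newvar1 and temp01 come from the example on this page.

```sas
/* Hypothetical toy data: newvar1 is a 0/1 dummy variable. */
data temp01;
  input id newvar1;
  datalines;
1 0
2 1
3 0
4 1
;
run;

/* Shift the coding so that both levels are positive (0 -> 1, 1 -> 2). */
data temp02;
  set temp01;
  /* Values other than 0 or 1, including missing, stay missing in newvar1_12. */
  if newvar1 in (0, 1) then newvar1_12 = newvar1 + 1;
run;

/* Cross-tabulate old against new to confirm the mapping. */
proc freq data = temp02;
  tables newvar1 * newvar1_12 / missing;
run;
```

Writing the recoded values to a new variable, rather than overwriting newvar1, keeps the original 0/1 version available for use as a plain regressor that is not listed on the subgroup statement.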

Opened SAS data file TEMP01 for reading.
DATA WARNING:
The matrix for estimable parameters is singular.
The model may be overspecified. You should reduce the number of
variables on the right-hand side and refit the model before attempting
to draw any conclusions.
DATA WARNING:
Degrees of freedom for OVERALL contrast are less than maximum number of estimable parameters
You may wish to rerun this job with a tolerance (TOL) of 1.000000e-007 and 1.000000e-005

This problem is caused by the dummy variable newvar1. SUDAAN used 51339 cases
for the analysis above, which means that the 4050 cases coded as 0 in the data
step above were treated as missing. Although in the example below we have
recoded the problem variable in a data step, you could also use the recode
statement in SUDAAN to temporarily recode the variable. If you have many
variables that need to be recoded, you may want to use an array in a data
step. These options are perhaps most useful when you really want to have the
dummy variable listed on the subgroup statement, such as when you are using
proc crosstab. As mentioned above, you could also list only the categorical
variables coded with non-zero values on the subgroup statement.
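When many 0/1 variables need the same treatment, the array approach mentioned above could be sketched as follows. The data sets dummies01 and dummies12 and the variables id, dum1, dum2 and dum3 are hypothetical; substitute your own variable list.

```sas
/* Hypothetical toy data with three 0/1 dummies; a period is a missing value. */
data dummies01;
  input id dum1 dum2 dum3;
  datalines;
1 0 1 0
2 1 1 .
3 0 0 1
;
run;

data dummies12;
  set dummies01;
  array dums {*} dum1 dum2 dum3;
  do i = 1 to dim(dums);
    /* 0 -> 1 and 1 -> 2; any other value, including missing, is left alone. */
    if dums{i} in (0, 1) then dums{i} = dums{i} + 1;
  end;
  drop i;   /* the loop counter is not needed in the output data set */
run;
```

Note that this version overwrites the original variables in the new data set, so dummies01 remains the only copy of the 0/1 coding.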

The subgroup statement

In this example we have a 0/1 variable (newvar1) and we are not
using it on the subgroup statement. If you want to have the table
broken out by the values of newvar1, then you need to recode it to be a
1/2 variable and include it on the subgroup statement and include the
number of levels on the levels statement.

Creating interaction terms on the model statement

In proc regress, proc rlogist and proc survival, you can use a * between two
variables (such as two categorical variables or one categorical and one
continuous variable) to create an interaction term on the model statement.
However, you cannot do this with two continuous variables; you need to create
the interaction term in a data step before running the model.
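For two continuous variables, the data-step product might be built as in the sketch below. The variable ab23 is the continuous variable used elsewhere on this page; x2, ab23_x2, id and the data set names inter01 and inter02 are hypothetical.

```sas
/* Hypothetical toy data with two continuous variables. */
data inter01;
  input id ab23 x2;
  datalines;
1 35 2.5
2 60 1.0
3 48 .
;
run;

data inter02;
  set inter01;
  /* The product is missing whenever either component is missing. */
  ab23_x2 = ab23 * x2;
run;
```

You would then list ab23, x2 and ab23_x2 together on the model statement, without listing any of them on the subgroup statement.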

Date/time

To suppress the printing of the time and/or date at the top of your results,
you can use the notime and/or nodate option on the print
statement in all of the analysis procedures.

Limiting the number of observations

If you are working with a very large data set and you find that running
procedures takes a while, you can use the maxobs = option on the proc
statement of all analysis procedures to limit the number of observations that
are read in. This can be very useful when you are debugging a program.
Just remember to delete that option once your program is working correctly.
Compare the results of the two proc regress calls below.

Stars instead of numbers

If you see stars in your output where numbers should be, the column is too
narrow to display the value. Increase the column width (which is specified on
the setenv statement) until it is wide enough to display the results
correctly. In the example below, the column width (colwidth = 10) is not large
enough, even though it is set higher than the default.
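A wider column could be requested along the lines of the sketch below. This is an illustration under stated assumptions, not the page's original example: the design option, the design variables stratum, psu and wt, and the value 18 are placeholders, so use the design statements appropriate to your survey and whatever width your results need.

```sas
/* Sketch only: stratum, psu and wt are placeholder design variables. */
proc sort data = temp01;
  by stratum psu;          /* SUDAAN expects data sorted by the nest variables */
run;

proc descript data = temp01 filetype = sas design = wr;
  nest stratum psu;
  weight wt;
  var ab23;
  setenv colwidth = 18;    /* assumed value; increase further if stars remain */
run;
```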

Proc records crashing

The records procedure will not work properly if your data set has formats
with negative values, as the CHIS data set does. SUDAAN has been notified of
this problem.

Reference categories for categorical independent variables and how to change them

By default, the last category (i.e., the highest numbered category) is used
as the reference category when you have categorical predictors in a regression
model. In this example, srsex is coded 1 = male and 2 = female, and
racehpra is coded 1 = Latino, 2 = Pacific Islander, 3 = AIAN and 4 =
Asian.
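One way to change the reference category is the reflevel statement, sketched below under several assumptions: the dependent variable y and the design variables stratum, psu and wt are placeholders, the data are assumed to be sorted by the nest variables, and the levels follow the coding listed above. Check the SUDAAN manual for the options available in your version.

```sas
/* Sketch only: y, stratum, psu and wt are placeholder names. */
proc regress data = temp01 filetype = sas design = wr;
  nest stratum psu;
  weight wt;
  subgroup srsex racehpra;
  levels 2 4;              /* number of levels of srsex and racehpra, as coded above */
  reflevel srsex = 1;      /* use males (1) instead of the default last category (2) */
  model y = srsex racehpra;
run;
```

Here racehpra keeps its default reference category (the highest numbered level), so only the srsex coefficient changes its interpretation.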

The following examples show what will happen if you use a dichotomous
variable coded as 1/2 as the dependent variable in a logistic regression in
SUDAAN. As you can see, SUDAAN, like other statistical packages, requires
that the dependent variable in a logistic regression be coded as 0/1.
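A data-step recode of a 1/2 outcome into 0/1 might look like the sketch below. All names here (outcome12, outcome01, id, yesno12, yesno01) are hypothetical, and treating the value 1 as the event of interest is an assumption you should check against your own codebook.

```sas
/* Hypothetical toy data: yesno12 is a dichotomous outcome coded 1/2. */
data outcome12;
  input id yesno12;
  datalines;
1 1
2 2
3 .
4 2
;
run;

data outcome01;
  set outcome12;
  if yesno12 = 1 then yesno01 = 1;        /* assumed event of interest */
  else if yesno12 = 2 then yesno01 = 0;
  else yesno01 = .;                       /* anything else stays missing */
run;
```

The 0/1 version is for use as the dependent variable only; as described in the dummy variables section, it should not be listed on the subgroup statement.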

Using the recode statement

You can use the recode statement with all procedures in SUDAAN (except
proc records). This statement is especially useful when you need to
create a categorical variable from a continuous variable. The original
continuous variable is recoded "on the fly", and the recoded variable is not
added to your data set; rather, it exists only for the duration of the
procedure. In the first example below, a 0/1 variable is created from the
continuous variable ab23. A cut-off value of 50 is given, so in the
recoded variable, values less than 50 will be coded 0 and values greater than
or equal to 50 will be coded 1. Please see page 164 of the SUDAAN manual
for more information regarding the recode statement.

Note that proc descript does not consider 0 to be a missing value. On the var
statement, you need to specify the variable one time for each level of that
variable that appears on the catlevel statement. On the catlevel
statement, you need to specify the value of each level of the variable that you
want displayed in the output.

In the example below, the recode statement is used to create a
three-level variable from the continuous variable ab23. In the
recoded variable, values less than 20 will be coded as 0, values less than or
equal to 30 will be coded as 1, and values less than 70 will be coded as 2.

The example below shows how you can use the recode statement to
recode a 0/1 variable into a 1/2 variable. Although this recoding is not
needed for proc descript, that procedure is used here because the recoding is
so clearly shown in its output.

The example below shows how you can use the recode statement to
recode a 1/2 variable into a 0/1 variable. According to the SUDAAN
website, you cannot use the recode statement to recode a value of 2 to
0 (2 = 0).

The contrast statement

A new page will be developed describing this statement.

The subpopn statement

Below is an example of the subpopn statement. This statement
should be used whenever you want to analyze only a subpopulation in your
data. You should NOT subset your data in a data step before running the
analysis, as this can cause a wide variety of problems, from incorrect results
to difficulties running the procedure at all. See pages 166-169 of the
SUDAAN manual for more
information regarding the subpopn statement, how to use it, and how
missing values are handled. See especially the note in the middle of
page 169 for a more complete explanation of why the subpopn statement
should be used instead of subsetting the data first.
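A minimal sketch of the subpopn statement follows, restricting the analysis to women (srsex = 2, per the coding given earlier on this page). The design option, the design variables stratum, psu and wt, and the label text are placeholders, and the data are assumed to be sorted by the nest variables.

```sas
/* Sketch only: stratum, psu and wt are placeholder design variables. */
proc descript data = temp01 filetype = sas design = wr;
  nest stratum psu;
  weight wt;
  /* Analyze women only; the full data set is still read for variance estimation. */
  subpopn srsex = 2 / name = "Women only";
  var ab23;
run;
```

Because the whole data set is passed to the procedure, the complete sample design remains available to SUDAAN when it computes standard errors, which is what subsetting in a data step would discard.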

The test statement

You can use the test statement to obtain different types of chi-squared
tests. Please see pages 278-279 of the manual for a description of the
tests available in the crosstab procedure. The nose option is used
on the proc crosstab statement to suppress the display of the standard
errors. We have done this to make the output more readable.