Factor variables

Stata handles factor (categorical) variables elegantly. You can
prefix a variable with i. to specify indicators for each level
(category) of the variable. You can put a # between two variables to
create an interaction: indicators for each combination of the categories
of the variables. You can put ## instead to specify a full factorial
of the variables, that is, main effects for each variable plus their
interaction. To interact a continuous variable with a factor variable,
prefix the continuous variable with c. (for example, c.bmi). You can
specify up to eight-way interactions.
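
The operators described above can be tried on any dataset. The sketch below is an illustration only: it assumes Stata's shipped auto dataset (loaded with sysuse) in place of the cholesterol data used on this page, so the variable names mpg, rep78, foreign, and weight are stand-ins, not part of the example that follows.

```stata
* Assumption: the auto dataset stands in for the cholesterol data.
sysuse auto, clear

* i. : one indicator per level of rep78 (the lowest level is the base)
regress mpg i.rep78

* #  : interaction only, here a separate weight slope for each level of foreign
regress mpg i.foreign#c.weight

* ## : full factorial, main effects plus the interaction
regress mpg i.foreign##c.weight

* fvexpand lists the individual terms that a factor-variable
* specification stands for and leaves them in r(varlist)
fvexpand i.foreign##c.weight
display "`r(varlist)'"
```

Comparing the second and third regressions shows the practical difference between # and ##: only the ## form adds the main effects.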

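As an example, we run a linear regression of cholesterol level on a full
factorial of smoking status and age group, together with continuous body mass
index (bmi) and its interaction with smoking status. On its own, bmi is treated
as continuous; inside an interaction, a variable is treated as a factor variable
unless it is prefixed with c., which is why agegrp needs no i. but bmi needs c.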

. regress cholesterol i.smoker##agegrp bmi i.smoker#c.bmi

      Source |       SS           df       MS      Number of obs   =     4,049
-------------+----------------------------------   F(9, 4039)      =     15.30
       Model |  137.845627         9  15.3161808   Prob > F        =    0.0000
    Residual |  4044.55849     4,039   1.0013762   R-squared       =    0.0330
-------------+----------------------------------   Adj R-squared   =    0.0308
       Total |  4182.40412     4,048   1.0332026   Root MSE        =    1.0007

------------------------------------------------------------------------------
 cholesterol |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      smoker |
     smoker  |  -.7699108    .337665    -2.28   0.023    -1.431921   -.1079012
             |
      agegrp |
      45-49  |   .1554985   .0620537     2.51   0.012     .0338391    .2771579
      50-54  |   .1838839   .0618467     2.97   0.003     .0626303    .3051375
      55-59  |   .1746813   .0763244     2.29   0.022     .0250433    .3243193
             |
      smoker#|
      agegrp |
     smoker #|
      45-49  |   -.118553   .1367914    -0.87   0.386    -.3867396    .1496336
     smoker #|
      50-54  |  -.1332379   .1363604    -0.98   0.329    -.4005796    .1341038
     smoker #|
      55-59  |  -.2466412   .1717679    -1.44   0.151    -.5834009    .0901185
             |
         bmi |   .0253916   .0059336     4.28   0.000     .0137585    .0370246
             |
smoker#c.bmi |
     smoker  |   .0501707   .0129223     3.88   0.000     .0248358    .0755055
             |
       _cons |   5.437234   .1520921    35.75   0.000     5.139049    5.735418
------------------------------------------------------------------------------

We could have used parenthesis binding to type the same model more briefly:

. regress cholesterol smoker##(agegrp c.bmi)
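
One way to convince yourself that parenthesis binding is only shorthand is to fit both spellings and compare the stored results. This is a minimal sketch that again assumes the shipped auto dataset as a stand-in; the names spelled and bound are arbitrary labels chosen for illustration.

```stata
* Assumption: the auto dataset stands in for the cholesterol data.
sysuse auto, clear

* Every term written out
regress mpg i.foreign c.weight c.length i.foreign#c.weight i.foreign#c.length
estimates store spelled

* Same model with parenthesis binding: ## is distributed over the list
regress mpg i.foreign##(c.weight c.length)
estimates store bound

* Side-by-side coefficients and standard errors
estimates table spelled bound, b(%9.4f) se
```

The two columns of the resulting table should agree, since both commands specify the same set of terms.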

Base levels can be changed on the fly: i.agegrp uses the default base
level (the lowest level, here 1), whereas b3.agegrp makes 3 the base level.
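
A brief sketch of the ways to choose a base level, assuming the shipped auto dataset and its five-level variable rep78 as stand-ins. The ib3. prefix used here is the longer spelling of the b3. prefix mentioned above.

```stata
* Assumption: the auto dataset stands in for the cholesterol data.
sysuse auto, clear

* Default: the lowest level of rep78 is the base
regress mpg i.rep78

* Level 3 as the base, for this command only
regress mpg ib3.rep78

* The highest level as the base
regress mpg ib(last).rep78

* Make level 3 the default base for rep78 in the dataset in memory
fvset base 3 rep78
regress mpg i.rep78

* Restore the original default
fvset clear rep78
```

The prefix forms affect a single command, whereas fvset changes the default until it is cleared or the data are reloaded.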

The indicator variables for the levels are never added to your dataset, which saves memory, especially when a variable has many levels or appears in interactions.
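
This can be checked directly by counting the variables before and after estimation. The sketch assumes the shipped auto dataset; k_before is a local macro name chosen for illustration.

```stata
* Assumption: the auto dataset stands in for the cholesterol data.
sysuse auto, clear

* c(k) holds the number of variables currently in the dataset
local k_before = c(k)

quietly regress mpg i.rep78##c.weight

* The assertion passes silently if no variables were added
assert c(k) == `k_before'
describe, short
```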

Factor variables are integrated deeply into Stata’s processing of variable
lists, providing a consistent way of interacting with both estimation and
postestimation commands.
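
As an illustration of that consistency, the same factor-variable notation used to fit a model can be reused in postestimation commands. This is a sketch under the assumption that the shipped auto dataset stands in for the cholesterol data.

```stata
* Assumption: the auto dataset stands in for the cholesterol data.
sysuse auto, clear
regress mpg i.foreign##c.weight

* Joint test of the interaction terms
testparm i.foreign#c.weight

* Predictive margins for each level of foreign
margins foreign

* Slope of weight within each level of foreign
margins foreign, dydx(weight)

* Slope of weight for foreign cars: main effect plus interaction
lincom weight + 1.foreign#c.weight
```

Coefficients are referred to with the same level.variable notation that appears in the estimation output, such as 1.foreign#c.weight.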