
Since version 0.5.0, statsmodels allows users to fit statistical
models using R-style formulas. Internally, statsmodels uses the
patsy package to convert formulas and
data to the matrices that are used in model fitting. The formula
framework is quite powerful; this tutorial only scratches the surface. A
full description of the formula language can be found in the patsy
docs.

Notice that we called statsmodels.formula.api instead of the usual
statsmodels.api. The formula.api hosts many of the same
functions found in api (e.g. OLS, GLM), but it also holds lower case
counterparts for most of these models. In general, lower case models
accept formula and data arguments, whereas upper case ones take
endog and exog design matrices. formula accepts a string
that describes the model in terms of a patsy formula. data takes
a pandas data frame.

dir(smf) will print a list of available models.
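
To narrow that listing down to the formula interfaces, one option is to filter for public lower case names. This is only a rough sketch; the filter is a heuristic and may pick up other lower case names depending on the installed version.

```python
import statsmodels.formula.api as smf

# Keep only public, all-lower-case names (a heuristic for formula interfaces).
formula_models = [
    name for name in dir(smf)
    if name.islower() and not name.startswith("_")
]
print(formula_models)
```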

Formula-compatible models have the following generic call signature:
(formula, data, subset=None, *args, **kwargs)
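
The subset argument can be illustrated with a short sketch. It assumes that a boolean mask aligned with the data frame is an acceptable value for subset; the toy data frame df is hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical toy data.
rng = np.random.default_rng(1)
df = pd.DataFrame({"x": rng.normal(size=40)})
df["y"] = 0.5 * df["x"] + rng.normal(scale=0.1, size=40)

# Boolean mask selecting the rows to use in the fit.
mask = df["x"] > 0

# Fit on the masked rows via the subset argument ...
res_subset = smf.ols("y ~ x", data=df, subset=mask).fit()

# ... which should match filtering the data frame by hand.
res_manual = smf.ols("y ~ x", data=df[mask]).fit()

print(res_subset.nobs, res_manual.nobs)
```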

Looking at the summary printed above, notice that patsy determined
that elements of Region were text strings, so it treated Region as a
categorical variable. patsy's default is also to include an
intercept, so one of the Region categories was automatically dropped.
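
The same behavior can be reproduced on a small made-up data set. The data frame below is hypothetical; only the column name Region is borrowed from the text.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical toy data with a string-valued Region column.
df = pd.DataFrame({
    "Region": ["N", "S", "E", "N", "S", "E", "N", "S", "E"],
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0],
    "y": [1.1, 2.3, 2.9, 4.2, 5.1, 5.8, 7.3, 8.0, 9.1],
})

res = smf.ols("y ~ x + Region", data=df).fit()

# Region has three levels, but only two dummy columns appear next to the
# intercept; the first level in sorted order ("E") serves as the reference.
print(res.params.index.tolist())
```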

If Region had been an integer variable that we wanted to treat
explicitly as categorical, we could have done so by using the C()
operator:
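
A minimal sketch of the C() operator, using a hypothetical integer column named group in place of Region:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical toy data: group is stored as integers 1, 2, 3.
df = pd.DataFrame({
    "group": [1, 2, 3, 1, 2, 3, 1, 2, 3],
    "y": [1.0, 2.1, 2.9, 1.2, 1.9, 3.1, 0.9, 2.0, 3.0],
})

# Without C(), group is treated as numeric: a single slope coefficient.
res_numeric = smf.ols("y ~ group", data=df).fit()

# With C(), group is treated as categorical: one dummy per non-reference level.
res_categorical = smf.ols("y ~ C(group)", data=df).fit()

print(res_numeric.params.index.tolist())
print(res_categorical.params.index.tolist())
```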

Notice that all of the above examples use the calling namespace to
look for the functions to apply. The namespace used can be controlled
via the eval_env keyword. For example, you may want to give a custom
namespace using patsy.EvalEnvironment, or you may want to use a
"clean" namespace, which we provide by passing eval_env=-1. The
default is to use the caller's namespace. This can have unexpected
consequences if, for example, someone has a variable named C in the
user namespace or in the data structure passed to patsy, and C is used
in the formula to handle a categorical variable. See the Patsy API
Reference for more information.
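
The effect of the namespace can be sketched as follows. The helper halve and the toy data are hypothetical, and the exact exception raised under a clean namespace may differ between versions, so the sketch catches a generic Exception.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def halve(values):
    """Hypothetical helper that exists only in the caller's namespace."""
    return values / 2.0


# Hypothetical toy data.
rng = np.random.default_rng(2)
df = pd.DataFrame({"x": rng.uniform(1.0, 10.0, size=30)})
df["y"] = 3.0 * df["x"] + rng.normal(scale=0.1, size=30)

# Default: the formula can call halve because the caller's namespace is used.
res = smf.ols("y ~ halve(x)", data=df).fit()
print(res.params.index.tolist())

# Clean namespace: halve is no longer visible, so building the model fails.
try:
    smf.ols("y ~ halve(x)", data=df, eval_env=-1)
    clean_lookup_failed = False
except Exception as exc:
    clean_lookup_failed = True
    print("lookup failed:", type(exc).__name__)
```

Columns of the data frame remain available under eval_env=-1; only names from the surrounding Python namespace are hidden.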

Even if a given statsmodels function does not support formulas, you
can still use patsy's formula language to produce design matrices.
Those matrices can then be fed to the fitting function as endog and
exog arguments.