Main Features

glm: Generalized linear models with support for all of the one-parameter
exponential family distributions.

discrete choice models: Poisson, probit, logit, and multinomial logit models.

rlm: Robust linear models with support for several M-estimators.

tsa: Time series analysis models, including ARMA, AR, and VAR models.

nonparametric: (Univariate) kernel density estimators.

datasets: Datasets to be distributed and used for examples and in testing.

PyDTA: Tools for reading Stata .dta files into numpy arrays.

stats: A wide range of statistical tests.

sandbox: There is also a sandbox containing code for generalized additive
models (untested), mixed effects models and Cox proportional hazards models
(both untested and still dependent on the nipy formula framework), generating
descriptive statistics, and printing table output to ASCII, LaTeX, and HTML.
There is also experimental code for systems of equations regression, time
series models, panel data estimators, and information theoretic measures.
None of this code is considered "production ready".
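The rlm entry above refers to M-estimators. As a rough illustration of the idea only, and not the scikits.statsmodels code, here is a minimal pure-Python sketch of a Huber M-estimator for a straight-line fit, computed by iteratively reweighted least squares (IRLS). The function names, the tuning constant t = 1.345, and the MAD-based scale estimate are assumptions chosen for the sketch.

```python
import statistics


def huber_weight(r, t=1.345):
    """Huber weight for a standardized residual r: 1 inside [-t, t], t/|r| outside."""
    a = abs(r)
    return 1.0 if a <= t else t / a


def wls_line(x, y, w):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ybar - b * xbar, b


def mad_scale(resid):
    """Median absolute deviation, rescaled to estimate a normal standard deviation."""
    med = statistics.median(resid)
    return statistics.median(abs(r - med) for r in resid) / 0.6745


def huber_line(x, y, t=1.345, tol=1e-8, maxiter=100):
    """Huber M-estimate of (intercept, slope) via IRLS, starting from OLS.

    Hypothetical helper for illustration; assumes a MAD scale re-estimated
    at every iteration.
    """
    a, b = wls_line(x, y, [1.0] * len(x))
    for _ in range(maxiter):
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        s = mad_scale(resid)
        if s == 0.0:
            break  # residual scale collapsed; nothing left to reweight
        w = [huber_weight(r / s, t) for r in resid]
        a_new, b_new = wls_line(x, y, w)
        converged = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b


# Line y = 2 + 3x with small noise and one gross outlier at x = 9.
x = list(range(10))
noise = [0.3, -0.2, 0.1, -0.3, 0.2, -0.1, 0.25, -0.25, 0.15, 0.0]
y = [2 + 3 * xi + e for xi, e in zip(x, noise)]
y[9] += 30

a_ols, b_ols = wls_line(x, y, [1.0] * len(x))
a_rob, b_rob = huber_line(x, y)
print(f"OLS slope:   {b_ols:.3f}")  # pulled up to about 4.63 by the outlier
print(f"Huber slope: {b_rob:.3f}")  # stays close to the true slope of 3
```

With one gross outlier, ordinary least squares is pulled far from the true slope, while the Huber fit stays close to it: residuals larger than t scale units get weight t/|r| instead of 1, so a single bad point has bounded influence.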

Where to get it

Development branches are on GitHub. This is where to get the most up-to-date
code, in the master branch. Experimental code is hosted there in branches and
in developer forks, and it is merged into master often. We try to make sure
that the master branch is always stable.

Windows Help

The source distribution for Windows includes an HTML Help file (statsmodels.chm).
It can be opened from the Python interpreter:

>>> import scikits.statsmodels.api as sm
>>> sm.open_help()

Discussion and Development

All chatter will take place on the or scipy-user mailing list. We are very
interested in receiving feedback about usability, suggestions for improvements,
and bug reports via the mailing list or the bug tracker at

to discuss development and design issues that are deemed to be too specialized
for the scipy-dev/user list.

Python 3

scikits.statsmodels has been ported to and tested with Python 3.2. A Python 3
version of the code can be obtained by running 2to3.py over the entire
statsmodels source. The numerical core of statsmodels works almost without
changes; however, there can be problems with data input and plotting.
The Stata file reader and writer in iolib.foreign have not been ported yet,
and there are still some problems with the matplotlib version for Python 3
that was used in testing. Running the test suite with Python 3.2 shows some
errors related to foreign and matplotlib.

Release History

0.3.1

Removed academic-only WFS dataset.

Fixed an easy_install issue on Windows.

0.3.0

Changes that break backwards compatibility

Added api.py for importing. The new convention for importing is:

import scikits.statsmodels.api as sm

Importing directly from individual modules now avoids unnecessary imports and
speeds up importing when a library or user needs only specific functions.
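The point about avoiding unnecessary imports can be checked with a small, entirely hypothetical package. The sketch below assumes a layout in the spirit of the api.py convention: an empty package __init__.py, a lightweight module, a module standing in for something slow to import, and an api.py that imports everything. The names demo_pkg, regression, and heavy are invented for illustration and are not the real statsmodels modules.

```python
import importlib
import pathlib
import sys
import tempfile

# Hypothetical package layout (not the real statsmodels tree):
#   demo_pkg/__init__.py   -> empty, so importing the package itself is cheap
#   demo_pkg/regression.py -> one lightweight module
#   demo_pkg/heavy.py      -> stands in for a slow-to-import module
#   demo_pkg/api.py        -> convenience module that imports everything
FILES = {
    "__init__.py": "",
    "regression.py": "def ols():\n    return 'ols'\n",
    "heavy.py": "LOADED = True\n",
    "api.py": "from demo_pkg.regression import ols\nfrom demo_pkg import heavy\n",
}


def build_package(root):
    """Write the hypothetical demo_pkg package into directory root."""
    pkg = pathlib.Path(root) / "demo_pkg"
    pkg.mkdir()
    for name, source in FILES.items():
        (pkg / name).write_text(source)


def loaded_after(module_name, root):
    """Import module_name from a clean state; return the demo_pkg modules now loaded."""
    for name in [m for m in sys.modules if m.startswith("demo_pkg")]:
        del sys.modules[name]
    importlib.invalidate_caches()
    sys.path.insert(0, root)
    try:
        importlib.import_module(module_name)
    finally:
        sys.path.remove(root)
    return sorted(m for m in sys.modules if m.startswith("demo_pkg"))


with tempfile.TemporaryDirectory() as root:
    build_package(root)
    direct = loaded_after("demo_pkg.regression", root)
    via_api = loaded_after("demo_pkg.api", root)

print(direct)   # ['demo_pkg', 'demo_pkg.regression']
print(via_api)  # ['demo_pkg', 'demo_pkg.api', 'demo_pkg.heavy', 'demo_pkg.regression']
```

Importing demo_pkg.regression loads only the package and that one module, whereas importing demo_pkg.api also pulls in demo_pkg.heavy. This works only because __init__.py stays empty: importing any submodule first runs the package's __init__.py, so if it imported everything itself, every direct import would pay the same cost.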