bnpy : Bayesian nonparametric machine learning for Python.

About

This Python module provides code for training popular clustering models on large datasets. We focus on Bayesian nonparametric models based on the Dirichlet process, but also provide their parametric counterparts.

bnpy supports the latest online learning algorithms as well as standard offline methods.

Supported probabilistic models

Mixture models

FiniteMixtureModel : fixed number of clusters

DPMixtureModel : infinite number of clusters, via the Dirichlet process
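One standard way to see how the Dirichlet process yields an unbounded number of clusters is the stick-breaking construction: each new cluster breaks off a Beta(1, alpha) fraction of whatever probability mass remains. The sketch below samples such weights in plain Python, truncated at a finite level. The function name `stick_breaking_weights`, the concentration `alpha`, and the truncation level are illustrative assumptions, not part of bnpy's API.

```python
import random


def stick_breaking_weights(alpha, truncation, rng=None):
    """Sample mixture weights from a truncated stick-breaking process.

    Hypothetical helper for illustration only (not a bnpy function).
    Each step breaks off a Beta(1, alpha) fraction of the remaining stick;
    the final weight takes all leftover mass so the weights sum to 1.
    """
    rng = rng or random.Random()
    weights = []
    remaining = 1.0
    for _ in range(truncation - 1):
        fraction = rng.betavariate(1.0, alpha)  # v_k ~ Beta(1, alpha)
        weights.append(remaining * fraction)    # pi_k = v_k * prod_{j<k} (1 - v_j)
        remaining *= 1.0 - fraction
    weights.append(remaining)  # leftover mass goes to the last component
    return weights


if __name__ == "__main__":
    pi = stick_breaking_weights(alpha=1.0, truncation=20, rng=random.Random(0))
    print([round(w, 3) for w in pi[:5]], round(sum(pi), 6))
```

Smaller `alpha` concentrates mass on the first few clusters, while larger `alpha` spreads it over many. Truncation is one common way variational methods handle the infinite model; how bnpy itself represents it is not described here.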

Topic models (aka admixture models)

FiniteTopicModel : fixed number of topics. This is latent Dirichlet allocation (LDA).
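For readers new to LDA, its textbook generative process can be written in a few lines: each document draws topic proportions from a Dirichlet, then each word draws a topic and a word from that topic. The sketch below only samples synthetic documents; it does not perform inference. The function `sample_lda_corpus` and its arguments are hypothetical and do not reflect bnpy's interface.

```python
import random


def sample_lda_corpus(n_docs, doc_length, topics, alpha, rng=None):
    """Sample documents from the standard LDA generative process.

    Hypothetical helper for illustration only (not a bnpy function).
    topics: list of K topic-word distributions, each a list of V probabilities.
    alpha:  symmetric Dirichlet concentration for per-document topic weights.
    Returns a list of documents, each a list of word ids in range(V).
    """
    rng = rng or random.Random()
    K = len(topics)
    V = len(topics[0])
    corpus = []
    for _ in range(n_docs):
        # theta_d ~ Dirichlet(alpha, ..., alpha), built from normalized Gamma draws
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(K)]
        total = sum(gammas)
        theta = [g / total for g in gammas]
        doc = []
        for _ in range(doc_length):
            z = rng.choices(range(K), weights=theta)[0]      # topic for this word
            w = rng.choices(range(V), weights=topics[z])[0]  # word drawn from topic z
            doc.append(w)
        corpus.append(doc)
    return corpus
```

With two topics over a four-word vocabulary, for example `[[0.5, 0.5, 0, 0], [0, 0, 0.5, 0.5]]`, each sampled document mixes words 0 and 1 with words 2 and 3 in proportions set by its own `theta`.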

Demos

Quick Start

You can use bnpy from the terminal, or from within Python. Both options require specifying a dataset, an allocation model, an observation model (likelihood), and an algorithm. Optional keyword arguments with reasonable defaults allow control of specific model hyperparameters, algorithm parameters, etc.

Below, we show how to call bnpy to train an 8-component Gaussian mixture model on the default AsteriskK8 toy dataset (shown below).
In both cases, log information is printed to stdout, and all learned model parameters are saved to disk.
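For intuition about what fitting a finite Gaussian mixture involves, the sketch below runs textbook expectation-maximization (EM) on 1-D data in plain Python. It is an illustrative stand-in, not bnpy's implementation or API: the function `fit_gmm_1d`, the quantile-based initialization, and the variance floor are assumptions of this sketch, and AsteriskK8 itself is 2-D.

```python
import math


def fit_gmm_1d(data, K, n_iters=100):
    """Fit a K-component 1-D Gaussian mixture with plain EM.

    Hypothetical helper for illustration only (not a bnpy function).
    Returns (weights, means, variances).
    """
    N = len(data)
    xs = sorted(data)
    # Initialize means at evenly spaced quantiles (a simple deterministic choice).
    means = [xs[int((k + 0.5) * N / K)] for k in range(K)]
    mean_all = sum(data) / N
    var_all = sum((x - mean_all) ** 2 for x in data) / N
    variances = [var_all] * K
    weights = [1.0 / K] * K
    for _ in range(n_iters):
        # E-step: responsibility of component k for point x, computed in log space
        resp = []
        for x in data:
            logp = [math.log(weights[k])
                    - 0.5 * math.log(2 * math.pi * variances[k])
                    - 0.5 * (x - means[k]) ** 2 / variances[k]
                    for k in range(K)]
            m = max(logp)
            p = [math.exp(lp - m) for lp in logp]  # subtract max for numerical stability
            s = sum(p)
            resp.append([pk / s for pk in p])
        # M-step: re-estimate weights, means, and variances from responsibilities
        for k in range(K):
            nk = sum(r[k] for r in resp) + 1e-10
            weights[k] = nk / N
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)  # floor guards against collapse to zero
    return weights, means, variances
```

The E-step assigns each point a soft cluster membership and the M-step refits each Gaussian to its weighted points; iterating the two never decreases the data log-likelihood.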

This conference paper introduces our new memoized variational algorithm, which is the cornerstone of scalable inference that can also effectively explore model complexity.
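A central bookkeeping idea in memoized variational inference is to cache each data batch's summary statistics, so that revisiting a batch swaps out its old contribution exactly instead of blending it in with a decaying step size. The minimal sketch below shows only that caching step; the class name `MemoizedStats` and the use of plain lists as statistic vectors are illustrative assumptions, not bnpy's internals.

```python
class MemoizedStats:
    """Cache per-batch summary statistics and maintain their running sum.

    Hypothetical class for illustration only (not part of bnpy).
    Revisiting batch b replaces its previous contribution exactly:
        total <- total - stats_b(old) + stats_b(new)
    so `total` always equals the sum of the latest stats from every batch.
    """

    def __init__(self, dim):
        self.total = [0.0] * dim
        self.cache = {}  # batch id -> most recent stats vector for that batch

    def update(self, batch_id, new_stats):
        old = self.cache.get(batch_id, [0.0] * len(self.total))
        self.total = [t - o + n for t, o, n in zip(self.total, old, new_stats)]
        self.cache[batch_id] = list(new_stats)
        return self.total
```

For a mixture model the vectors could hold expected per-cluster counts; after each batch update, global parameters would be recomputed from `total`, which reflects the whole dataset even though only one batch was just processed.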

For background reading to understand the broader context of this field, see our Resources wiki page.

Target Audience

Primarily, we intend bnpy to be a platform for researchers.
By gathering many learning algorithms and popular models in one convenient, modular repository, we hope to make it easier to compare and contrast approaches.
We also hope that the modular organization of bnpy enables researchers to try out new modeling ideas without reinventing the wheel.