resources/code/Dirichlet Process Mixture Tutorial/DPMM.m

classdef DPMM < handle% DPMM - Dirichlet Process Mixture Model% % A class for inferring 1D gaussian mixture components from data using% a dirichlet process prior. % Cooked up for a Pfunk lab meeting tutorial during a hot week in% August 2011% % OVERVIEW:% Step 1: create an initial distribution over the thetas (n draws from% G_{0} using poly-urn process. Since we haven't observed any data yet,% the expected number of clusters is roughly \alpha% log(1+\frac{n}{\alpha}). Note that each of these draws corresponds first% to a draw of c_{i} (which identifies the cluster to which the atom% belongs), and then, if it belongs to a new cluster, to a draw of the% parameters theta_{i} which describe this cluster.

% steps for resampling one of the theta i's:% 1: compute the conditional distribution P(theta_{i} | theta_{-i})% (theta_{i} removed).% - if we we sample a known cluster, great, just move on% - if we sample a new cluster, then draw it's parameters from the% POSTERIOR distribution based on the prior G_{0} and the datapoint x_{i}% in question (I.E., don't forget to condition on the datapoint)%% 2: for all clusters, update the theta^{*}_{j} according to the POSTERIOR% distribution based on the prior G_{0} and all the data points currently% associated with each cluster. Technically we should probably do this% after we resample each datapoint (for the clusters that changed), but% it's probably okay to do it once per sweep.%% 3: Resample alpha from its gamma posterior given the number of data% points and components%% ...% % (Repeat until bored)% (plot stuff)%% Author: Jon Scholz% Date: 9/1/2011

function sample_prior(obj)% Samples component parameters from DP prior, only% according to current component proportions and base% distribution (see resample_component for version that depends% on a data value as well)
a = obj.alpha;
n = sum(obj.N);

function resample_component(obj, i)% Sample the component associated with the ith data point,% conditional on xi, the component proportions, and the% component parameters.% This will either assign to another existing cluster, or draw% a new one from the base posterior given the ith data point

% sample a component for this pointifrand < a/(n+a)% get an available cluster location
freeidx = obj.get_free_idx;% draw new sample from H|x,c
obj.Phi(freeidx,:) = obj.sample_base_conditional(obj.X(i)); % append params for a new component% if sum(isnan(obj.Phi(freeidx,:)))>0% error('isnan in Phi params! (update_component_params)');% end% if obj.Phi(freeidx,2)<0% error('got negative precision! (%f)', obj.Phi(freeidx,2));% end
obj.N(freeidx) = 1; % set the count for this component to 1
obj.C(i) = freeidx; % associate this data point

function resample_alpha(obj)% Resamples the main DP alpha parameter, conditional on the% current distribution of clusters. I used a softmax trick% here to control how likely we were to sample the ML alpha% (turned out to work better)

function post = gibbs_twostep(obj, n_sweeps)% Implements Neal (2000) algorithm 2: a two-step gibbs sampling% procedure for computing the posterior DPMM given a set of% observed data. Phase one resamples all indicators, and phase% two resamples component parameters (separating these leads to% faster mixing - see neal 2000)% Note: this differs from Rasmussen 2000 in that we don't% perform the full bayesian update of the base parameters for% the mean (u,v) and the variance/precision (a,b).