
DynamicDiscreteModels.jl

Back-end support for statistical inference with any model that can be described as a "partially observed" (discrete) Markov chain, such as a Hidden Markov Model.

Model construction and data simulation:

julia> # use a constructor defined in a front-end package:
julia> model = myawesomemodel()
julia> coef!(model, trueparameter)
julia> # or simply:
julia> model = myawesomemodel(trueparameter)
julia> # generate data from the model:
julia> data = rand(model, 1000);          # 1 time series with 1000 observations
julia> data = rand(model, 100, 10);       # 10 time series with 100 observations each
julia> data = rand(model, [100,120,115]); # 3 time series with 100, 120 and 115 obs.

The job of a front-end package MyModel.jl is simply to define coef!(mymodel, myparameter), which maps myparameter to DynamicDiscreteModels.jl's canonical representation. All of the above methods then become available. Optionally, a front-end package may also extend em() and/or mle() with specialized optimization wrapping DynamicDiscreteModels.jl's estep() and loglikelihood().

Installation

julia> Pkg.add("DynamicDiscreteModels")

Usage

For the purpose of this package, a "dynamic discrete model" is a discrete Markov chain z=(x,y), where x=1...dx is latent and y=1...dy is observed. The package defines an abstract type DynamicDiscreteModel with two fields:

m is a (dx,dy,dx,dy) array that holds the Markov probabilities m[x,y,x',y'] of moving from (x,y) today to (x',y') tomorrow.

mu is a (dx,dy) array that holds the joint initial distribution of the chain.
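As an illustration (the type name, constructor, and dimensions here are hypothetical, not part of the package), a front-end subtype holding these two fields might look like:

```julia
using DynamicDiscreteModels

# Hypothetical 2-state, 2-observation front-end model (dx = dy = 2).
mutable struct ToyModel <: DynamicDiscreteModel
    m::Array{Float64,4}   # m[x,y,x',y']: Markov transition probabilities
    mu::Array{Float64,2}  # mu[x,y]: joint initial distribution of the chain
end

# Allocate m (filled later by coef!) and start from the uniform joint distribution:
toymodel() = ToyModel(Array{Float64}(undef, 2, 2, 2, 2), fill(0.25, 2, 2))
```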

Any front-end implementation of a DynamicDiscreteModel will specify a mapping from a statistical parameter θ to the transition array m in a coef!(model, parameter) function. Being a back-end package, DynamicDiscreteModels.jl is agnostic about the specific mapping, but examples/toymodel.jl provides a simple example. There, (x,y) is a hidden Markov model where x moves from today to tomorrow according to the Markov transition matrix A(θ), and y is drawn conditional on x according to the emission matrix B(θ).
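The mapping just described can be sketched as follows. This is a hedged reconstruction, not the package's actual toy model: the parametrization θ = (p, q), with p the probability of staying in the same latent state and q the probability of a "correct" observation, is assumed for illustration.

```julia
# Hedged sketch of coef! for a 2-state toy HMM.
function coef!(model, theta::NTuple{2,Float64})
    p, q = theta
    A = [p 1-p; 1-p p]   # latent transition matrix A(theta)
    B = [q 1-q; 1-q q]   # emission matrix B(theta)
    for x in 1:2, y in 1:2, xp in 1:2, yp in 1:2
        # y' is drawn conditional on x' alone, so m does not vary with today's y:
        model.m[x, y, xp, yp] = A[x, xp] * B[xp, yp]
    end
    model
end
```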

With the model and some data, we can find the most likely parameter value behind the data, according to the model, by computing the maximum likelihood estimator. Two methods are available: direct numerical optimization of the log-likelihood via mle(), and the EM algorithm via em(). To help the optimizer, it is a good idea to first reparametrize the model on R rather than on [0,1].
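One common way to do this, sketched here under the assumption that the [0,1] parametrization takes a tuple while the unconstrained one takes an Array, is to map η ∈ R² to θ ∈ (0,1)² through the logistic function:

```julia
logistic(x) = 1 / (1 + exp(-x))

# eta::Array is the unconstrained parametrization used by the optimizer.
# It is mapped into (0,1)^2 and forwarded to the theta parametrization,
# assumed to be defined elsewhere as coef!(model, theta::NTuple{2,Float64}).
function coef!(model, eta::Array{Float64,1})
    coef!(model, (logistic(eta[1]), logistic(eta[2])))
end
```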

Notice how coef!() uses multiple dispatch to choose between the eta and theta parametrizations. mle() and em() dispatch on parameter::Array by default, so keep this signature for the parametrization that plays nicely with numerical optimization.
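A typical estimation call might then look like the following. The exact argument order of mle() and em() is an assumption here, so treat this as a sketch rather than the package's definitive API:

```julia
# Assumed signatures: model, data, and a starting value in the unconstrained
# (eta) parametrization; each returns the estimated parameter.
etahat_mle = mle(model, data, [0.0, 0.0])
etahat_em  = em(model, data, [0.0, 0.0])

# Map the estimates back to the [0,1] scale for interpretation:
thetahat = map(logistic, etahat_mle)
```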

Good news: up to numerical precision, mle() and em() agree on the maximizer of the likelihood function. It also turns out that, with this sample size, we can pinpoint the true parameter value (0.65, 0.5) rather well. This is not the case with a smaller sample size.