Notice that the model has a number of parameters. Each one is an object in its own right and can be accessed separately, in two different ways: directly, or via a regular expression.

In [6]:

m.rbf.lengthscale

Out[6]:

  index  |  GP_regression.rbf.lengthscale  |  constraints  |  priors
  [0]    |                     1.00000000  |      +ve      |

In [7]:

m['.*lengthscale']

Out[7]:

  index  |  GP_regression.rbf.lengthscale  |  constraints  |  priors
  [0]    |                     1.00000000  |      +ve      |

We can plot the current model, with its default parameters in the following way.

In [8]:

_ = m.plot()


Of course, the default parameter values are not the optimal settings, so we need to optimize them.

Note that all of the parameters of the model are restricted to be positive, so each of them carries a positive constraint (indicated by +ve). This is achieved by letting the optimizer work with the parameters in unconstrained real space: a transformation $\log(\exp(\theta) + 1)$ is applied before the parameter is handed back to the model, and gradients are transformed accordingly.
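Constraints can also be changed per parameter. The following is a minimal sketch against the model m defined above; the bound and noise values are illustrative, not taken from the notebook.

m.rbf.lengthscale.constrain_bounded(0.1, 10.)    # box constraint on the lengthscale
m.Gaussian_noise.variance.constrain_fixed(0.05)  # hold the noise variance fixed during optimization
m.rbf.lengthscale.constrain_positive()           # restore the default +ve constraint
m.Gaussian_noise.variance.constrain_positive()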

To find the optimal parameters we must optimize the model. A number of optimizers are available, such as 'scg', 'lbfgs', 'org-bfgs' and 'fmin_tnc', as well as stochastic optimizers such as 'adadelta' and 'rprop' if the optional 'climin' package is installed.

In [9]:

# messages=1 shows the progress of the optimizer; this needs ipywidgets to be installed
m.optimize(messages=1, ipython_notebook=True)
m


Here we can see that the optimizer was not able to go above the bounds.
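Optimization from a single starting point can get stuck in a poor local optimum; a common remedy is to restart from several random initialisations and keep the best run. A minimal sketch (the number of restarts is illustrative):

m.optimize_restarts(num_restarts=5)  # re-initialise the parameters randomly and keep the best result
print(m)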

Instead of a constraint you may wish to set a prior, in which case the prior over the hyperparameter is taken into account during optimization. HMC is available in GPy, which allows the hyperparameters to be integrated out, but here we will show a simple MAP solution.

In [15]:

# Reset model
m.unconstrain()
m[:] = 1.0
m.constrain_positive()
# Make a gamma prior and set it as the prior for the lengthscale
gamma_prior = GPy.priors.Gamma.from_EV(1.5, 0.7)
gamma_prior.plot()
m.rbf.lengthscale.set_prior(gamma_prior)
m

A GPRegression model is a wrapper around a more flexible 'GP' model that expects inputs $X$ and outputs $Y$, both of which must be 2d numpy arrays. It also expects you to provide your own likelihood, your own kernel and your own inference method. We will look at alternative models shortly, but for now we will show what the wrapper is doing.
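As a sketch, the GPRegression wrapper used above is roughly equivalent to building the core GP model explicitly with a Gaussian likelihood and exact inference (here X and Y are assumed to be the 2d numpy arrays used to build m earlier in the notebook):

kern = GPy.kern.RBF(input_dim=1)
lik = GPy.likelihoods.Gaussian()
exact = GPy.inference.latent_function_inference.ExactGaussianInference()
m_explicit = GPy.core.GP(X, Y, kernel=kern, likelihood=lik, inference_method=exact)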

In GPy you can easily combine covariance functions you have created using the sum and product operators, + and *. So, for example, if we wish to combine an exponentiated quadratic covariance with a Matern 5/2, we can write something like the sketch below.
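A minimal sketch of such a combination (the variable names are illustrative, not a cell from the original notebook):

k_rbf = GPy.kern.RBF(input_dim=1)        # exponentiated quadratic (RBF)
k_mat = GPy.kern.Matern52(input_dim=1)   # Matern 5/2
k_sum = k_rbf + k_mat                    # additive combination
k_prod = k_rbf * k_mat                   # product combination
print(k_sum)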

It may be that you only want one kernel to work on one set of dimensions, and another kernel on another set; that is, the two kernels work on different domains of the input. In GPy this capability is provided by the active_dims argument to the kernel, which tells the kernel which input dimensions to work on. Let's return to a two dimensional example.

In [28]:

# sample inputs and outputs from a 2d model
X2 = np.random.uniform(-3., 3., (50, 2))
Y2 = np.sin(X2[:, 0:1]) + np.sin(X2[:, 1:2]) + np.random.randn(50, 1) * 0.05
# define a sum kernel, where the first dimension is modelled with an RBF,
# and the second dimension is modelled with a linear kernel
ker = GPy.kern.RBF(input_dim=1, active_dims=[0]) + GPy.kern.Linear(input_dim=1, active_dims=[1])
# create simple GP model
m2 = GPy.models.GPRegression(X2, Y2, ker)
m2.optimize()
_ = m2.plot()

If we want to plot the marginal of the model where we fix one input to a certain value, we can use "fixed_inputs", the format for which is a list of tuples specifying the dimension and the fixed value.

Here we fix dimension 1 (the domain of the linear kernel) to 0.2.

In [29]:

_ = m2.plot(fixed_inputs=[(1, 0.2)])

Then let's fix dimension 0 (the domain of the RBF kernel) to 1.0.

In [30]:

_ = m2.plot(fixed_inputs=[(0, 1.0)])

As we would expect, the data doesn't necessarily sit on the posterior distribution, as this is a projection of all the data onto one dimension.

Consider the Olympic marathon data that we use as a running example in many of our other notebooks. We noted that there was an outlier in 1904. Today we'll see if we can deal with that outlier by considering a non-Gaussian likelihood. Noise sampled from a Student-t density is heavier tailed than that sampled from a Gaussian. However, it cannot be trivially assimilated into the Gaussian process. Below we use the Laplace approximation to incorporate this noise model.

We might want to use a non-Gaussian likelihood here. This also means that our inference method can no longer be exact, and we have to use one of the approximate methods. Let's use a Student-t distribution, whose heavy tails allow for the outliers without drastically affecting the posterior mean of the GP.
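A minimal sketch of such a model, assuming X and Y hold the marathon years and times as 2d numpy arrays (the degrees-of-freedom value and kernel choice are illustrative):

t_lik = GPy.likelihoods.StudentT(deg_free=3.0)               # heavy-tailed likelihood
laplace = GPy.inference.latent_function_inference.Laplace()  # approximate inference method
kern = GPy.kern.RBF(input_dim=1)
m_stut = GPy.core.GP(X, Y, kernel=kern, likelihood=t_lik, inference_method=laplace)
m_stut.optimize()
_ = m_stut.plot()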

If we instead wanted to perform classification, we might want to use Expectation Propagation (EP). Currently EP only works with a couple of likelihoods, notably the Bernoulli likelihood.

In [34]:

from matplotlib.patches import Polygon
from matplotlib.collections import PatchCollection
import cPickle as pickle
import urllib

urllib.urlretrieve('http://staffwww.dcs.sheffield.ac.uk/people/M.Zwiessele/gpss/lab2/EastTimor.pickle', 'EastTimor2.pickle')

# Load the data
with open("./EastTimor2.pickle", "rb") as f:
    Xeast, Yeast, polygons = pickle.load(f)

Now we will create a map of East Timor and, using GPy, plot the data on top of it. A classification model can be defined in a similar way to the Laplace regression model, but now using GPy.inference.latent_function_inference.EP(). A wrapper for a classification model using EP is given by GPy.models.GPClassification().
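As a sketch of that wrapper in use on the loaded data (the kernel choice and the optimize call are illustrative, not from the original notebook):

kern = GPy.kern.RBF(input_dim=2)
m_class = GPy.models.GPClassification(Xeast, Yeast, kernel=kern)  # Bernoulli likelihood with EP inference
m_class.optimize()
print(m_class)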