Abstract:

A system (100) for analysing and synthesising a plurality of sources of
sample data (310, 320) by automated learning and regression. The system
includes data storage (110) with a stored multi-task covariance function,
and an evaluation processor (102) in communication with the data storage
(110). The evaluation processor (102) performs regression using the
stored sample data and multi-task covariance function and synthesises
prediction data for use in graphical display or digital control.

Claims:

1. A system for analyzing and synthesizing data from a plurality of
sources of sample data by Gaussian process learning and regression, the
system comprising: data storage with a stored multi-task covariance
function and associated hyperparameters, and an evaluation processor in
communication with the data storage that: performs Gaussian process
regression using the stored sample data and multi-task covariance
function with the hyperparameters and synthesizes prediction data for use
in graphical display or digital control, wherein the multi-task covariance
function is a combination of a plurality of stationary covariance
functions.

2. The system of claim 1 further comprising a training processor to
determine the hyperparameters by analyzing the sample data and the
multi-task covariance function.

3. The system of claim 1 wherein the sampled measurement data is derived
from measurement of a plurality of quantities dependent and distributed
over a spatial region or temporal period.

4. The system of claim 3 wherein the sampled measurement data is derived
from sensors measuring a plurality of quantities at spatially distributed
locations within a region.

5. The system of claim 4 wherein the sensors measure quantities related to
geology and/or rock characteristics within the region.

6. The system of claim 1 wherein the multi-task covariance function is
determined by a selected combination of separate stationary covariance
functions for each task corresponding to a separate source of sampled
measurement data.

7. The system of claim 6 wherein the covariance functions for each
separate task are the same.

8. The system of claim 6 wherein the covariance functions for each
separate task are different.

9. The system of claim 6 wherein at least one of the covariance functions
combined into the multi-task covariance function is a squared-exponential
covariance function.

10. The system of claim 6 wherein at least one of the covariance functions
combined into the multi-task covariance function is a Sparse covariance
function.

11. The system of claim 6 wherein at least one of the covariance functions
combined into the multi-task covariance function is a Matern covariance
function.

12. The system of claim 1 wherein the cross-covariance function is
determined by selecting a stationary covariance function for each data
source task, and combining the plurality of covariance functions using
Fourier transform and convolution techniques.

13. A method of computerized data analysis and synthesis for estimation of
a desired first quantity comprising: measuring the first quantity and at
least one other second quantity within a domain of interest to generate
first and second sampled datasets, storing the sampled datasets, selecting
first and second stationary covariance functions for application to the
first and second datasets, determining a multi-task covariance function
determined from the selected first and second covariance
functions, training a multi-task Gaussian process by computing and storing
optimized hyperparameter values associated with the multi-task covariance
function using the stored first and second datasets, and performing
Gaussian process regression using the selected multi-task covariance
function, computed and stored hyperparameters and stored datasets to
predict unknown values of the first quantity within the domain of
interest.

14. The method of claim 13 wherein the first and second quantities are
spatially distributed measurable quantities.

15. The method of claim 14 wherein the first and second quantities are
derived from geological characteristics within a body of earth.

16. The method of claim 13 wherein the first and second covariance
functions are the same.

17. The method of claim 13 wherein the first and second covariance
functions are different.

18. A method for determining a Gaussian process for regression of a
plurality of related tasks comprising: receiving a data set associated
with each one of the plurality of related tasks; receiving one covariance
function associated with each one of the related tasks; and using the data
sets and covariance functions to determine a multi-task covariance
function, for use with the multi-task Gaussian process.

19. The method of claim 18 wherein the multi-task covariance function is
determined in a training phase.

20. The method of claim 18 wherein the multi-task covariance function K is
determined from a basis function, g, associated with each covariance
function, using the relationship described as follows:

K((x,i),(x',j)) = \int_{-\infty}^{\infty} g_i(x-u)\, g_j(x'-u)\, du

where i and j identify the task number and (x, i), (x', j)
represent the points x and x' from the task i and j respectively.

21. A method for evaluating a task from a Gaussian process regression
model, wherein the task is one of a plurality of dependent tasks, and the
Gaussian process regression model includes a Gaussian process, the
Gaussian process being associated with a covariance function, the
covariance function being a multi-task covariance function.

22. The method of claim 21 wherein the multi-task covariance function is
determined by a method comprising: receiving a data set associated with
each one of the plurality of related tasks; receiving one covariance
function associated with each one of the related tasks; and using the data
sets and covariance functions to determine a multi-task covariance
function, for use with the multi-task Gaussian process.

23. A computer program comprising machine-readable program code for
controlling the operation of a data processing apparatus on which the
program code executes to perform a method for determining a Gaussian
process for regression of a plurality of related tasks
comprising: receiving a data set associated with each one of the plurality
of related tasks; receiving one covariance function associated with each
one of the related tasks; and using the data sets and covariance functions
to determine a multi-task covariance function, for use with the
multi-task Gaussian process.

24. A computer program product comprising machine-readable program code
recorded on a machine-readable recording medium, for controlling the
operation of a data processing apparatus on which the program code
executes to perform a method for determining a Gaussian process for
regression of a plurality of related tasks comprising: receiving a data
set associated with each one of the plurality of related tasks; receiving
one covariance function associated with each one of the related tasks;
and using the data sets and covariance functions to determine a multi-task
covariance function, for use with the multi-task Gaussian process.

25. A system for analyzing a plurality of data sets, each data set
associated with a single-task covariance function, the system
comprising: a multi-task Gaussian process training processor that analyzes
the plurality of data sets simultaneously to determine a multi-task
covariance function, wherein the multi-task covariance function is a
combination of the single-task covariance functions.

26. A system for synthesizing a data set from a test input data set,
wherein the data set comprises data from one of a plurality of data
types, each data type being associated with a single-task covariance
function, the system comprising: a multi-task Gaussian process associated
with a multi-task covariance function, wherein the multi-task covariance
function is a combination of the single-task covariance functions; and a
Gaussian process evaluation processor that inputs the test input data
set, and uses the multi-task Gaussian process to synthesize the data set.

27. A method for computer regression of a plurality of related tasks, the
method comprising: receiving a data set associated with each one of the
plurality of related modelling tasks; assigning a data set kernel for each
of the data sets; simultaneously modelling the data sets using a kernel
process in which the kernel is a convolution of the data set kernels.

28. The method of claim 27, wherein the data set kernel for one of the
plurality of data sets is different from the data set kernel for another
of the plurality of data sets.

29. A method for computer regression of a plurality of related tasks, the
method comprising: receiving values for inputs X, targets y, covariance
function K, noise level σ_n^2, and test input X_*, wherein
X, y and X_* are in the form of block vectors and K is in the form of a
block matrix comprising covariance functions for each input X along its
diagonal and cross-covariance functions formed by a convolution of
covariance functions outside of its diagonal; applying the covariance
function K to the inputs X, targets y, noise level σ_n^2,
and test input X_*, in a predictive process and outputting a model of the
inputs X.

30. A computer system or computer readable medium including instructions
for a method comprising: receiving a data set associated with each one of
a plurality of related modelling tasks; simultaneously modelling the data
sets using a kernel process in which the kernel is a convolution of
kernels assigned to each data set.

31. A computer system or computer readable medium including instructions
for a method comprising: implementing regression of a plurality of related
tasks, by: receiving values for inputs X, targets y, covariance function
K, noise level σ_n^2, and test input X_*, wherein X, y and
X_* are in the form of block vectors and K is in the form of a block
matrix comprising covariance functions for each input X along its
diagonal and cross-covariance functions formed by a convolution of
covariance functions outside of its diagonal; applying the covariance
function K to the inputs X, targets y, covariance function K, noise level
σ_n^2, and test input X_*, in a predictive process and
outputting a model of the inputs X.

Description:

FIELD OF THE INVENTION

[0001] This invention relates to a method and system for data analysis and
data synthesis using a smoothing kernel/basis function, as is used in
Gaussian processes and other predictive methods and processes. Examples
of applications include, but are not limited to, mining, environmental
sciences, hydrology, economics and robotics.

BACKGROUND OF THE INVENTION

[0002] Computer data modelling, such as for data embodying a spatial
representation of a desired characteristic, is frequently useful in
fields such as mining and environmental sciences. In the case of mining
as an example, it is often desirable to determine a representation
of the spatial distribution of minerals and ores within a body of earth
to model and predict the geometry and geology of material in the ground.
The in-ground model can then be used for mine planning, drill hole
location, drilling operations, blasting, excavation control, direction of
excavated material and resource management, amongst other things.

[0003] To model an in-ground ore body, for example, sample data can be
generated from measurements of mineral concentrations, or related
quantities, at discrete locations within a three-dimensional spatial
domain including the ore body. The sample data can then be analysed and,
using a method of interpolation, synthesised into a model that can be
used to make predictions of mineral concentrations at spatial locations
distinct from those that were measured. A mathematical technique that has
been found useful in this application is regression using the Gaussian
process (GP) which is a stochastic process based on the normal (Gaussian)
distribution and can be used to good effect as a powerful non-parametric
learning technique for spatial modelling. Described by an appropriate
covariance function, the GP can be used to infer continuous values within
the spatial domain from the distribution of sample measurements. GPs and
their application are described in Gaussian Processes for Machine
Learning (MIT Press, 2006) by C. E. Rasmussen and C. K. I. Williams, the
contents of which are incorporated herein by reference.

SUMMARY OF THE INVENTION

[0004] According to a first aspect of the invention, there is provided a
system for analysing and synthesising data from a plurality of sources of
sample data by Gaussian process learning and regression, the system
including data storage with a stored multi-task covariance function and
associated hyperparameters, and an evaluation processor in communication
with the data storage. The evaluation processor performs Gaussian process
regression using the stored sample data and multi-task covariance
function with the hyperparameters and synthesises prediction data for use
in graphical display or digital control. The multi-task covariance
function is a combination of a plurality of stationary covariance
functions.

[0005] In one embodiment, the system further includes a training processor
to determine the hyperparameters by analysing the sample data and the
multi-task covariance function.

[0006] In one embodiment, the sampled measurement data is derived from
measurement of a plurality of quantities dependent and distributed over a
spatial region or temporal period. The sampled measurement data may be
derived from sensors measuring a plurality of quantities at spatially
distributed locations within a region. The sensors may measure quantities
related to geology and/or rock characteristics within the region.

[0007] In one embodiment, the multi-task covariance function is determined
by a selected combination of separate stationary covariance functions for
each task corresponding to a separate source of sampled measurement data.
The covariance functions for each separate task may be the same.
Alternatively, the covariance functions for each separate task may be
different.

[0008] In one embodiment, at least one of the covariance functions combined
into the multi-task covariance function is a squared-exponential
covariance function.

[0009] In one embodiment, at least one of the covariance functions combined
into the multi-task covariance function is a Sparse covariance function.

[0010] In one embodiment, at least one of the covariance functions combined
into the multi-task covariance function is a Matern covariance function.

[0011] In one embodiment, the cross-covariance function is determined by
selecting a stationary covariance function for each data source task, and
combining the plurality of covariance functions using Fourier transform
and convolution techniques.

[0012] According to a second aspect of the invention, there is provided a
method of computerised data analysis and synthesis for estimation of a
desired first quantity. The method includes measuring the first quantity
and at least one other second quantity within a domain of interest to
generate first and second sampled datasets, storing the sampled datasets
and selecting first and second stationary covariance functions for
application to the first and second datasets. The method then includes
determining a multi-task covariance function determined from the selected
first and second covariance functions, training a multi-task Gaussian
process by computing and storing optimised hyperparameter values
associated with the multi-task covariance function using the stored first
and second datasets, and performing Gaussian process regression using the
selected multi-task covariance function, computed and stored
hyperparameters and stored datasets to predict unknown values of the
first quantity within the domain of interest.

[0013] In one embodiment, the first and second quantities are spatially
distributed measurable quantities. The first and second quantities may
be derived from geological characteristics within a body of earth.

[0014] In one embodiment, the first and second covariance functions are the
same. Alternatively, the first and second covariance functions are
different.

[0015] According to a third aspect of the invention, there is provided a
method for determining a Gaussian process for regression of a plurality
of related tasks including the steps of receiving a data set associated
with each one of the plurality of related tasks, receiving one covariance
function associated with each one of the related tasks, and using the
data sets and covariance functions to determine a multi-task covariance
function, for use with the multi-task Gaussian process.

[0016] In one embodiment, the multi-task covariance function is determined
in a training phase.

[0017] In one embodiment, the multi-task covariance function K is
determined from a basis function, g, associated with each covariance
function, using the relationship described as follows:

K((x,i),(x',j)) = \int_{-\infty}^{\infty} g_i(x-u)\, g_j(x'-u)\, du

where i and j identify the task number and (x, i), (x', j) represent the
points x and x' from the task i and j respectively.
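To make the convolution relationship concrete, the following Python sketch assumes one-dimensional Gaussian basis functions g_i(r) = exp(-r^2 / (2 s_i^2)) with a task-specific width s_i. The widths, function names and test values are illustrative assumptions, not values taken from this specification. The sketch evaluates the integral numerically, compares it with the closed form that holds for Gaussian basis functions, and assembles a small two-task covariance matrix from it.

```python
import numpy as np


def basis(r, s):
    """Hypothetical Gaussian basis function g(r) of width s."""
    return np.exp(-r**2 / (2.0 * s**2))


def cross_cov_numeric(x, xp, s_i, s_j, half_width=50.0, n=200_001):
    """Riemann-sum approximation of the integral of g_i(x - u) g_j(x' - u) du."""
    u = np.linspace(-half_width, half_width, n)
    du = u[1] - u[0]
    return np.sum(basis(x - u, s_i) * basis(xp - u, s_j)) * du


def cross_cov_closed(x, xp, s_i, s_j):
    """Closed form of the same integral when both basis functions are Gaussian."""
    a, b = s_i**2, s_j**2
    return np.sqrt(2.0 * np.pi * a * b / (a + b)) * np.exp(-(x - xp)**2 / (2.0 * (a + b)))


def multitask_cov(X, tasks, widths):
    """Covariance matrix over points X, where tasks[p] is the task index of point p."""
    s = np.asarray(widths)[np.asarray(tasks)]
    return cross_cov_closed(X[:, None], X[None, :], s[:, None], s[None, :])


# Two tasks with different (assumed) basis widths.
widths = [0.5, 2.0]
X = np.array([0.0, 0.5, 1.0, 0.2, 0.9])
tasks = [0, 0, 0, 1, 1]
K = multitask_cov(X, tasks, widths)

print(cross_cov_numeric(0.3, -1.1, 0.5, 2.0), cross_cov_closed(0.3, -1.1, 0.5, 2.0))
print(np.linalg.eigvalsh(K).min())   # non-negative: K is a Gram matrix of basis functions
```

For i = j the closed form reduces to sqrt(π) s_i exp(-(x - x')^2 / (4 s_i^2)), a squared-exponential covariance with length-scale √2 s_i. Because every entry of K is an inner product of basis functions, K is positive semi-definite by construction, which the eigenvalue check illustrates.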

[0018] According to a fourth aspect of the invention, there is provided a
method for evaluating a task from a Gaussian process regression model,
wherein the task is one of a plurality of dependent tasks, and the
Gaussian process regression model includes a Gaussian process, the
Gaussian process being associated with a covariance function, the
covariance function being a multi-task covariance function.

[0019] According to a fifth aspect of the invention, there is provided a
system for analysing a plurality of data sets, each data set associated
with a single-task covariance function. The system includes a multi-task
Gaussian process training processor that analyses the plurality of data
sets simultaneously to determine a multi-task covariance function. The
multi-task covariance function is a combination of the single-task
covariance functions.

[0020] According to a sixth aspect of the invention, there is provided a
system for synthesising a data set from a test input data set, wherein
the data set comprises data from one of a plurality of data types, each
data type being associated with a single-task covariance function. The
system includes a multi-task Gaussian process associated with a
multi-task covariance function, the multi-task covariance function being
a combination of the single-task covariance functions, and a Gaussian
process evaluation processor that inputs the test input data set and
uses the multi-task Gaussian process to synthesise the data set.

[0021] According to other aspects of the invention, there is provided a
method for computer regression of a plurality of related tasks, or a
computer system for such regression, including the steps of receiving a
data set associated with each one of the plurality of related modelling
tasks, assigning a data set kernel for each of the data sets, and
simultaneously modelling the data sets using a kernel process in which
the kernel is a convolution of the data set kernels.

[0022] In some embodiments, the data set kernel for one of the plurality of
data sets is different from the data set kernel for another of the
plurality of data sets.

[0023] According to still other aspects of the invention, there is provided
a method for computer regression of a plurality of related tasks, or a
computer system for such regression. Values for inputs X, targets y,
covariance function K, noise level σ_n^2, and test input X_*
are received, wherein X, y and X_* are in the form of block vectors and K
is in the form of a block matrix comprising covariance functions for each
input X along its diagonal and cross-covariance functions formed by a
convolution of covariance functions outside of its diagonal. The
covariance function K is applied to the inputs X, targets y, noise level
σ_n^2, and test input X_*, in a predictive process and an
output of a model of the inputs X is generated.
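A minimal Python sketch of this block-vector / block-matrix formulation follows. It assumes two tasks, one-dimensional inputs, Gaussian basis functions with illustrative widths (so every block of K has a closed-form convolution) and a single shared noise level. The names `conv_cov`, `block_cov` and `mtgp_predict`, and all numbers, are hypothetical. The test point belongs to task 0 but lies where only task-1 data were observed, so any reduction in its predictive variance comes from the off-diagonal cross-covariance blocks.

```python
import numpy as np


def conv_cov(x, xp, s_i, s_j):
    """Convolution of two Gaussian basis functions of widths s_i and s_j."""
    a, b = s_i**2, s_j**2
    return np.sqrt(2.0 * np.pi * a * b / (a + b)) * np.exp(-(x - xp)**2 / (2.0 * (a + b)))


def block_cov(xa, ta, xb, tb, widths):
    """Block matrix: within-task blocks on the diagonal, cross-covariances off it."""
    w = np.asarray(widths)
    return conv_cov(xa[:, None], xb[None, :], w[ta][:, None], w[tb][None, :])


def mtgp_predict(X, t, y, X_star, t_star, widths, noise_var):
    """Predictive mean and variance for test inputs X_star belonging to tasks t_star."""
    w = np.asarray(widths)
    K = block_cov(X, t, X, t, widths) + noise_var * np.eye(len(X))
    K_s = block_cov(X, t, X_star, t_star, widths)                 # n x m
    k_ss = conv_cov(X_star, X_star, w[t_star], w[t_star])         # prior variances
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, K_s)
    return K_s.T @ alpha, k_ss - np.sum(v**2, axis=0)


# Block vectors: task 0 sampled only near x = 0, task 1 sampled densely on [0, 5].
x0, x1 = np.array([0.0, 1.0]), np.linspace(0.0, 5.0, 11)
X = np.concatenate([x0, x1])
t = np.concatenate([np.zeros(len(x0), dtype=int), np.ones(len(x1), dtype=int)])
y = np.sin(X)                                  # illustrative targets only
widths, noise_var = [0.5, 1.0], 1e-2

X_star, t_star = np.array([4.0]), np.array([0])
mean, var = mtgp_predict(X, t, y, X_star, t_star, widths, noise_var)
prior_var = conv_cov(4.0, 4.0, 0.5, 0.5)

# Same test point using only the task-0 data (no cross-task information).
_, var_indep = mtgp_predict(x0, np.zeros(2, dtype=int), np.sin(x0), X_star, t_star,
                            widths, noise_var)
print(mean, var, var_indep, prior_var)
```

With task-0 data alone the variance at x = 4 stays at the prior value, whereas the joint model uses the nearby task-1 samples through the cross-covariance blocks and reports a much smaller variance.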

[0024] According to further aspects of the invention, there is provided a
computer program and a computer program product comprising
machine-readable program code for controlling the operation of a data
processing apparatus on which the program code executes to perform the
method described herein.

[0025] Further aspects of the present invention and further embodiments of
the aspects described in the preceding paragraphs will become apparent
from the following description, given by way of example and with
reference to the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

[0026] In the drawings:

[0027] FIG. 1 is a representative diagram of an example computing system
which may be used to implement a data modelling system in accordance with
an embodiment of the invention;

[0028] FIG. 2 is a diagrammatic illustration of a mining drill hole
pattern;

[0029] FIG. 3 is a flow chart for data analysis and data synthesis using
multi-task Gaussian processes according to one embodiment of the
invention;

[0030] FIG. 4 is a flow chart showing a training phase for a spatial data
modelling process, according to one embodiment of the invention;

[0031] FIG. 5 is a diagrammatic representation of an evaluation phase for
the spatial data modelling process, according to one embodiment of the
invention;

[0032] FIG. 6 is a plot showing the output from a single-task
Gaussian process regression;

[0033] FIG. 7 is a flow chart for the multi-task GP regression method
according to one embodiment of the invention;

[0034] FIG. 8 is a flow chart for the multi-task GP regression method
according to one embodiment of the invention;

[0035] FIGS. 9 a) and b) are two plots showing the output from a
multi-task Gaussian process regression according to one embodiment of the
invention;

[0037] FIGS. 11A to 11C graphically illustrate the use of a multi-kernel
methodology in an example having two dependent tasks, showing the
predictive mean and variance for independent, multi-task and
multi-kernel GPs respectively.

DETAILED DESCRIPTION OF THE EMBODIMENTS

[0038] It will be understood that the invention disclosed and defined in
this specification extends to all alternative combinations of two or more
of the individual features mentioned or evident from the text or
drawings. All of these different combinations constitute various
alternative aspects of the invention.

[0039] In an estimation problem such as ore grade prediction in mining,
some relationship can exist between grades of different minerals being
predicted. Modelling these relationships can significantly improve the
prediction quality, reduce the overall uncertainty for each estimation
task and provide means for estimation with partial data. A technique in
geostatistics for this purpose is known as co-kriging, in which the
correlations between variables need to be specified manually.

[0040] The problem of simultaneously learning multiple tasks has received
increasing attention in the field of machine learning in recent years.
This research is motivated by many applications in which different
quantities must be estimated from a set of input/output data and these
quantities have unknown intrinsic inter-dependences. The problem can be
framed as learning a set of functions, where each function corresponds
to a particular task and is represented by its own data set. These tasks
are inter-dependent in that they share some common underlying structure.
By exploiting this shared structure, each task can be learned more
efficiently, and empirical studies indicate that learning the tasks
simultaneously can be significantly more beneficial than learning them
one by one in isolation.

[0041] The present invention may be applied to ore grade modelling as
described below in a non-limiting example of its implementation. Other
applications include environmental sciences, hydrology, economics and
robotics.

1. System Overview

[0042] Referring to FIG. 1, an embodiment of a data modelling system can be
implemented with the aid of appropriate computer hardware and software in
the form of a computing system 100. The computing system 100 can comprise
a processor 102, memory 104 and input/output 106. These components
communicate via a bus 108. The memory 104 stores instructions executed by
the processor 102 to perform the methods as described herein. Data
storage 110 can be connected to the system 100 to store input or output
data. The input/output 106 provides an interface for access to the
instructions and the stored data. It will be understood that this
description of a computing system is only one example of possible systems
in which the invention may be implemented and other systems may have
different architectures.

[0043] FIG. 2 is a diagrammatic illustration of an orthogonally bounded
three-dimensional section of earth 200 incorporating ore of potential
interest for mining. The distribution of ore (not shown) within the body
of earth 200 may be of particular interest. The amount of ore in the
earth can be determined at intervals through an array of drill holes 220
bored from the surface 240 by a movable drill rig 260. The concentration
of ore can be measured from samples of material taken from the bore holes
220, at various depths, to generate a dataset representing a
three-dimensional (3D) spatial array of discrete measurements. In order to
infer values of ore concentration at locations not actually measured, the
dataset can be applied to GP learning and regression for the purposes of
interpolation or extrapolation.

2. Gaussian Processes for Regression

[0044] Regression is supervised learning of input-output mappings from
empirical data called the training data. Each input-output mapping is
referred to as a task. If there are multiple inputs associated with
multiple outputs, the problem becomes a multi-task regression problem.
Once this mapping has been modelled, for example using Bayesian
modelling, it is possible to predict output values for new input data,
called test data.

[0045] Gaussian processes provide a powerful framework for
learning models of spatially correlated and uncertain data. A GP
framework is used in Bayesian modelling to describe the distribution of
outputs for functions used for mapping from an input x to an output f(x).
GP regression provides a robust means of estimation and interpolation of
spatial information that can handle incomplete sensor data (training
data) effectively. GPs are non-parametric approaches in that they do not
specify an explicit functional model between the input and output.

[0046] A GP is a collection of random variables, any finite number of which
have a joint Gaussian distribution. A GP is completely specified by its
mean and covariance functions. The mean function m(x) and covariance
function k(x, x') of a real process f(x) are defined as:

m(x) = \mathbb{E}[f(x)]    (1)

k(x,x') = \mathbb{E}[(f(x) - m(x))(f(x') - m(x'))]    (2)

such that the GP is written as

f(x) \sim \mathcal{GP}(m(x), k(x,x'))    (3).
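Definitions (1) and (2) can be checked empirically. The Python sketch below assumes a zero-mean GP with a unit squared-exponential covariance (an illustrative choice; `se_kernel` is a hypothetical helper). It draws many joint samples of f at three inputs, which are jointly Gaussian by definition, and estimates m(x) and k(x, x') as sample averages.

```python
import numpy as np


def se_kernel(xa, xb, length=1.0, signal_var=1.0):
    """Squared-exponential covariance k(x, x') evaluated for all pairs of inputs."""
    d = xa[:, None] - xb[None, :]
    return signal_var * np.exp(-0.5 * d**2 / length**2)


x = np.array([0.0, 0.5, 2.0])
K = se_kernel(x, x)

# Any finite set of GP values is jointly Gaussian, so sample f(x) directly.
rng = np.random.default_rng(0)
F = rng.multivariate_normal(np.zeros(len(x)), K, size=200_000)   # one draw per row

m_hat = F.mean(axis=0)                          # sample estimate of E[f(x)], eq. (1)
centred = F - m_hat
k_hat = centred.T @ centred / (len(F) - 1)      # sample estimate of eq. (2)
print(np.round(m_hat, 3))
print(np.round(k_hat - K, 3))                   # close to zero everywhere
```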

[0047] The mean and covariance functions together describe a distribution
over possible functions used for estimation. In the context of modelling
in-ground resource distribution, for example, each input x represents a
point in 3D space, x≡(x, y, z), and the output, f(x), corresponding
to each x is a measurement of ore concentration.

2.1 Covariance Functions

[0048] Although not necessary, for the sake of convenience the mean
function m(x) may be assumed to be zero by scaling the data appropriately
such that it has a mean of zero. This leaves the covariance function to
describe the GP. The covariance function models the covariance between
the random variables which, here, correspond to sensor measured data.

[0049] As part of a non-parametric model, the covariance functions used for
GP regression have some free parameters that can be varied, and are
optimised using the training data. These parameters are called
hyperparameters.

[0050] There are numerous covariance functions that can be used to model
the spatial variation between the data points. A popular covariance
function is the squared-exponential covariance function given as

k_y(x_p, x_q) = \sigma_f^2 \exp\left(-\frac{(x_p - x_q)^2}{2l^2}\right) + \sigma_n^2 \delta_{pq}

where k_y is the covariance function; l is the length-scale, a measure
of how quickly the f(x) value changes in relation to the x value;
σ_f^2 is the signal variance and σ_n^2 is the noise variance in the data
being modelled. The symbol δ_pq represents a Kronecker delta defined on
indices p and q. The parameters l, σ_f and σ_n are referred to as the
hyperparameters and specify what sort of values the parameters might
take. The squared-exponential covariance function, being a function of
|x - x'|, is stationary (invariant to translation).
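A minimal Python sketch of the squared-exponential covariance σ_f^2 exp(-(x_p - x_q)^2 / (2 l^2)) + σ_n^2 δ_pq follows, generalised (as in the hyperparameter set θ of paragraph [0054]) to one length-scale per spatial axis. The function name `se_cov`, the `same_points` flag (which stands in for the Kronecker delta by adding noise only when both point sets are the same training set) and the numbers are illustrative assumptions. The usage lines check stationarity: translating every input by the same offset leaves the covariance matrix unchanged.

```python
import numpy as np


def se_cov(Xp, Xq, lengths, sigma_f, sigma_n=0.0, same_points=False):
    """Squared-exponential covariance between point sets Xp (n x d) and Xq (m x d).

    sigma_n**2 * delta_pq is added only when Xp and Xq are the same training set.
    """
    diff = (Xp[:, None, :] - Xq[None, :, :]) / np.asarray(lengths, dtype=float)
    K = sigma_f**2 * np.exp(-0.5 * np.sum(diff**2, axis=-1))
    if same_points:
        K = K + sigma_n**2 * np.eye(len(Xp))
    return K


rng = np.random.default_rng(1)
X = rng.uniform(0.0, 100.0, size=(6, 3))        # hypothetical (x, y, z) sample locations
lengths, sigma_f, sigma_n = [20.0, 20.0, 5.0], 1.5, 0.1

K = se_cov(X, X, lengths, sigma_f, sigma_n, same_points=True)
offset = np.array([10.0, -3.0, 7.0])
K_shifted = se_cov(X + offset, X + offset, lengths, sigma_f, sigma_n, same_points=True)
print(np.allclose(K, K_shifted))                 # stationarity: translation-invariant
```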

2.2 Hyperparameters

[0051] Training the GP for a given dataset means determining and optimizing
the hyperparameters of the underlying covariance function.

[0052] Hyperparameters are determined from the data to be modelled. The
hyperparameters can be learnt from the training data using a manual
process, i.e. using a trial and error process. The hyperparameters can
also be learnt using a machine learning process. Typical methods include
using leave-one-out cross-validation (LOOCV), also called rotation
estimation, and Bayesian learning such as Maximum Likelihood Estimation.
In this example, a Maximum Likelihood Estimation method is used.
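For comparison with the Maximum Likelihood approach adopted here, leave-one-out cross-validation does not require n separate refits for a GP. With K_y the noisy covariance matrix, the leave-one-out predictive mean and variance for target y_i are y_i - [K_y^{-1} y]_i / [K_y^{-1}]_ii and 1 / [K_y^{-1}]_ii (Rasmussen and Williams, Chapter 5). The Python sketch below checks these closed forms against explicit refitting. The data, hyperparameters and function names are illustrative assumptions.

```python
import numpy as np


def se_noisy_cov(x, length, sigma_f, sigma_n):
    """K_y = K_f + sigma_n^2 I for a 1-D squared-exponential covariance."""
    d = x[:, None] - x[None, :]
    return sigma_f**2 * np.exp(-0.5 * d**2 / length**2) + sigma_n**2 * np.eye(len(x))


def loo_closed_form(K_y, y):
    """Leave-one-out predictive means, variances and summed log predictive probability."""
    K_inv = np.linalg.inv(K_y)
    var = 1.0 / np.diag(K_inv)
    mean = y - (K_inv @ y) * var
    log_prob = -0.5 * np.log(2.0 * np.pi * var) - 0.5 * (y - mean)**2 / var
    return mean, var, log_prob.sum()


def loo_refit(K_y, y):
    """The same means and variances by explicitly conditioning on all points except i."""
    n = len(y)
    mean, var = np.empty(n), np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        k_i = K_y[keep, i]
        w = np.linalg.solve(K_y[np.ix_(keep, keep)], k_i)
        mean[i] = w @ y[keep]
        var[i] = K_y[i, i] - k_i @ w
    return mean, var


rng = np.random.default_rng(2)
x = np.linspace(0.0, 5.0, 12)
y = np.sin(x) + 0.1 * rng.standard_normal(len(x))
K_y = se_noisy_cov(x, length=1.0, sigma_f=1.0, sigma_n=0.1)

mean_cf, var_cf, loo_score = loo_closed_form(K_y, y)
mean_rf, var_rf = loo_refit(K_y, y)
print(loo_score)   # a LOOCV objective that could be maximised over hyperparameters
```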

[0053] The log marginal likelihood of the training output (y) given the
training input (X) for a set of hyperparameters θ is given by

\log p(y \mid X, \theta) = -\frac{1}{2} y^\top K_y^{-1} y - \frac{1}{2} \log |K_y| - \frac{n}{2} \log 2\pi    (6)

where K_y = K_f + σ_n^2 I is the covariance matrix for the noisy targets
y and n is the number of training points. The log marginal likelihood has
three terms: the first describes the data fit, the second penalizes model
complexity and the last is simply a normalization constant. Thus,
training the model involves searching for the set of hyperparameters that
gives the best data fit while avoiding overly complex models. Occam's
razor is thus built into the system, and the complexity penalty in the
learning objective itself guards against overfitting.
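The log marginal likelihood -½ yᵀ K_y⁻¹ y - ½ log|K_y| - (n/2) log 2π is usually evaluated through a Cholesky factorisation K_y = L Lᵀ, since log|K_y| = 2 Σ log L_ii and K_y⁻¹ y can be obtained by two triangular solves. The Python sketch below returns the three terms separately, so that the data-fit / complexity / normalisation decomposition described above is visible, and checks the total against a direct evaluation. The function name and data are illustrative assumptions.

```python
import numpy as np


def log_marginal_likelihood(K_f, y, sigma_n):
    """Return log p(y | X, theta) and its three terms for K_y = K_f + sigma_n^2 I."""
    n = len(y)
    L = np.linalg.cholesky(K_f + sigma_n**2 * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))    # K_y^{-1} y
    data_fit = -0.5 * y @ alpha
    complexity = -np.sum(np.log(np.diag(L)))                # equals -0.5 * log|K_y|
    normalisation = -0.5 * n * np.log(2.0 * np.pi)
    return data_fit + complexity + normalisation, (data_fit, complexity, normalisation)


rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0.0, 10.0, 25))
y = np.sin(x) + 0.1 * rng.standard_normal(len(x))
K_f = np.exp(-0.5 * (x[:, None] - x[None, :])**2)            # SE with l = 1, sigma_f = 1
lml, terms = log_marginal_likelihood(K_f, y, sigma_n=0.1)

# Direct evaluation of the same expression, for comparison only.
K_y = K_f + 0.01 * np.eye(len(x))
_, logdet = np.linalg.slogdet(K_y)
direct = -0.5 * y @ np.linalg.solve(K_y, y) - 0.5 * logdet - 0.5 * len(x) * np.log(2 * np.pi)
print(lml, direct)
```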

[0054] Using Maximum Likelihood Estimation, training the GP model on a
given set of data amounts to finding the optimal set of hyperparameters
that maximize the log marginal likelihood (eq. 6). For the
squared-exponential covariance function with a separate length-scale for
each spatial axis, optimizing the hyperparameters entails finding the
optimal set of values for θ = {l_x, l_y, l_z, σ_f, σ_n}. Optimization can
be done using standard off-the-shelf optimization approaches. For
example, a combination of stochastic search (simulated annealing) and
gradient descent (quasi-Newton optimization with BFGS Hessian update) has
been found to be successful. Using a gradient-based optimization approach
has the advantage that convergence is achieved much faster. A
description and further information about these optimization techniques
and others can be found in the text Numerical Optimization, by J. Nocedal
and S. Wright (Springer, 2006).
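The paragraph above describes simulated annealing combined with BFGS. As a simpler stand-in, the Python sketch below uses random restarts of SciPy's L-BFGS-B optimiser (`scipy.optimize.minimize`) on the negative log marginal likelihood, optimising the hyperparameters in log space to keep them positive. The one-dimensional inputs, bounds, number of restarts, data and function names are illustrative assumptions. Gradients are approximated here by finite differences, whereas an analytic gradient would normally be supplied for speed.

```python
import numpy as np
from scipy.optimize import minimize


def neg_lml(log_theta, x, y):
    """Negative log marginal likelihood, 1-D SE covariance; log_theta = log(l, sigma_f, sigma_n)."""
    length, sigma_f, sigma_n = np.exp(log_theta)
    d = x[:, None] - x[None, :]
    K_y = sigma_f**2 * np.exp(-0.5 * d**2 / length**2) + (sigma_n**2 + 1e-9) * np.eye(len(x))
    L = np.linalg.cholesky(K_y)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * len(x) * np.log(2.0 * np.pi)


def fit_hyperparameters(x, y, n_restarts=5, seed=0):
    """Random-restart L-BFGS-B, a stand-in for the stochastic + quasi-Newton search."""
    bounds = [(-3.0, 3.0), (-3.0, 3.0), (-5.0, 1.0)]   # log l, log sigma_f, log sigma_n
    lo, hi = np.array(bounds).T
    rng = np.random.default_rng(seed)
    starts = [np.zeros(3)] + [rng.uniform(lo, hi) for _ in range(n_restarts)]
    best = None
    for x0 in starts:
        res = minimize(neg_lml, x0, args=(x, y), method="L-BFGS-B", bounds=bounds)
        if best is None or res.fun < best.fun:
            best = res
    return np.exp(best.x), best.fun


rng = np.random.default_rng(4)
x = np.linspace(0.0, 10.0, 40)
y = np.sin(x) + 0.1 * rng.standard_normal(len(x))
theta, nll = fit_hyperparameters(x, y)
print("l, sigma_f, sigma_n =", np.round(theta, 3), "NLL =", round(nll, 3))
```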

2.3 Regression

[0055] The learned GP model is used to estimate the quantity of interest
(e.g. ore concentration) within a volume of interest, characterized by a
grid of points at a desired resolution. This is achieved by performing
Gaussian Process regression at the set of test points, given the training
dataset and the GP covariance function with the learned hyperparameters.

cov(y_p, y_q) = k(x_p, x_q) + \sigma_n^2 \delta_{pq}

where δ_pq is a Kronecker delta defined on p, q that equals 1 if p = q
and 0 otherwise.

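[0057] The joint distribution of any finite number of random variables of a
GP is Gaussian. Thus, the joint distribution of the training outputs y
and test outputs f_* given this prior can be specified by

\begin{bmatrix} y \\ f_* \end{bmatrix} \sim \mathcal{N}\left(0, \begin{bmatrix} K(X,X) + \sigma_n^2 I & K(X,X_*) \\ K(X_*,X) & K(X_*,X_*) \end{bmatrix}\right)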

[0059]Denoting $K(X, X)$ by $K$ and $K(X, X_*)$ by $K_*$, for a single
test point $x_*$, $k(x_*) = \mathbf{k}_*$ is used to denote the vector of
covariances between the test point and the set of all training points.
The above equations can then be rewritten for a single test point as:

$$\bar{f}_* = \mathbf{k}_*^{T}\,(K + \sigma_n^2 I)^{-1}\,\mathbf{y} \qquad (10)$$

and

$$\mathbb{V}[f_*] = k(x_*, x_*) - \mathbf{k}_*^{T}\,(K + \sigma_n^2 I)^{-1}\,\mathbf{k}_* \qquad (11).$$
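Equations (10) and (11) can be evaluated with a Cholesky factorization of $K + \sigma_n^2 I$ rather than an explicit matrix inverse. The following NumPy sketch assumes an isotropic squared-exponential covariance, for which $k(x_*, x_*) = \sigma_f^2$; the kernel choice, function names and hyperparameter values are illustrative assumptions.

```python
import numpy as np


def se_kernel(A, B, length=1.0, sigma_f=1.0):
    """Isotropic squared-exponential covariance between the rows of A and B."""
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return sigma_f**2 * np.exp(-0.5 * sq / length**2)


def gp_predict(X, y, X_star, length=1.0, sigma_f=1.0, sigma_n=0.1):
    """Predictive mean (eq. 10) and variance (eq. 11) at each row of X_star."""
    K = se_kernel(X, X, length, sigma_f)
    L = np.linalg.cholesky(K + sigma_n**2 * np.eye(len(X)))  # K + sigma_n^2 I = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))      # (K + sigma_n^2 I)^{-1} y
    K_star = se_kernel(X, X_star, length, sigma_f)           # column j is k_* for test point j
    mean = K_star.T @ alpha                                  # eq. (10)
    v = np.linalg.solve(L, K_star)                           # v^T v = k_*^T (K + sigma_n^2 I)^{-1} k_*
    var = sigma_f**2 - np.sum(v**2, axis=0)                  # eq. (11), k(x_*, x_*) = sigma_f^2
    return mean, var


X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.sin(X[:, 0])
X_star = np.array([[1.5], [10.0]])   # one point inside the data, one far outside it
mean, var = gp_predict(X, y, X_star)
print(mean, var)
```

Far from the data the variance returns to the prior value $\sigma_f^2$, which is the behaviour illustrated in FIG. 6. In production code `scipy.linalg.solve_triangular` would exploit the triangular structure of the Cholesky factor; `np.linalg.solve` is used here only to keep the example to NumPy.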

[0060]Equations (10) and (11) provide the basis for the estimation
process. The GP estimates obtained are best linear unbiased estimates
for the respective test points. Uncertainty is handled by incorporating
the sensor noise model in the training data. The representation produced
is multi-resolution in that a spatial model can be generated at any
desired resolution using the GP regression equations presented above.
Thus, the proposed approach is a probabilistic, multi-resolution one that
aptly handles spatially correlated information.

[0061]FIG. 6 is a graphical representation of a single-task Gaussian
process modelling one-dimensional data measurements, shown as `+` symbols
in the drawing. The solid line represents the continuous best estimate
for the model, and the width of the shaded region represents the
uncertainty of the prediction. The figure shows that GP regression
produces uncertain outcomes, i.e. results with high variance, in regions
where the data points are sparse.

3. Regression with Interdependent Tasks

[0062]Sometimes measurements are taken of multiple characteristics within
a spatial domain that are dependent in some way. Iron ore deposits, for
example, are frequently accompanied by silicon dioxide in some dependent
manner, and the concentrations of each can be measured separately from
sample material obtained from drill holes. A model of the ore deposit
may be generated by applying a standard single-task GP to the sample
measurements of iron concentrations. It is also possible to exploit the
dependence between the iron and silicon dioxide concentrations. To
achieve this, an algorithm is provided that learns the dependence from
the training data in a GP framework by learning multiple dependent GP
tasks simultaneously.

[0063]Single-task covariance functions can be used to apply GP regression
to only a single task (i.e. a single output function) at a time. If there
are many tasks to learn and estimate, single-task covariance functions
treat the tasks separately, and information present in one task is not
used to improve the model for another task. Multi-task GPs make it
possible to consider different tasks in a single GP regression and to use
the intrinsic connections between them to produce better results. The new
multi-task covariance functions developed in this invention have the
advantage of making it possible for multi-task GPs to:

[0064](1) have different parameters (e.g. length scales) for each
individual task, and

[0065](2) have different covariance functions for each individual task.

Furthermore, the sets of input data points can differ between tasks.
These new possibilities are useful because the different tasks to be
learnt and estimated together may be scaled differently, or may call for
different covariance functions because of their different inner
structures.

[0066]A multi-task GP framework involves analysing the multiple datasets
simultaneously to learn hyperparameters of a multi-task covariance
function that simultaneously models the covariance between the different
datasets as well as the covariance amongst data samples within datasets.
However, covariance functions suitable for single-task GPs, like the
squared-exponential, Sparse and Matern (described further hereinafter)
are not directly applicable where multiple GP tasks are to be combined.
What is required is a manner of combining single-task covariance
functions to be suitable for use in multi-task applications. A method for
determining such multi-task covariance functions and applying them is
described herein. Mathematical derivations are shown in the appendices to
the specification.

[0067]FIG. 3 is a flow chart for a data analysis and data synthesis system
using multi-task Gaussian processes, adapted for use in the mining
scenario depicted in FIG. 2. The implemented method can accommodate
multiple data types, and is described herein, by way of example, with two
data types. The system includes first and second rock characteristic
measurement sources that sample characteristics of the material
encountered in forming the drill holes 220 (FIG. 2). The rock
characteristics measured can be derived during formation of the drill
holes by the drill rig 260 by sampling sensors such as accelerometers,
tachometers, pressure transducers and torque sensors and classifying
rocks in terms of rock factors (hardness, fragmentation) and geology.
Other applicable measurement techniques may include down-hole sensing
such as natural gamma, and chemical assays, possibly in-situ. Whatever
the measured quantity, the measurement is accompanied by spatial position
information recorded by the drill rig 260, for example using GPS and/or
other positioning methods that provide 3D location information
corresponding to each measurement sample.

3.1 The Multi-Task Training Phase

[0068]The two different types of measurement sensor data 310, 320
generated by the sensors, including the corresponding spatial positioning
information, are provided to a training processor. The sensor data
provides the training data required for the regression. The multi-task
training step 330 uses the sensor data 310, 320 to train the multi-task
GP model. The training step 330 determines a non-parametric,
probabilistic, multi-scale representation of the data for use in
modelling the in-ground spatial distribution of ore, which in turn can be
used for prediction in the multi-task evaluation step 340. Details of
specific operational procedures carried out by the training processor are
described below with reference to FIG. 4.

[0069]FIG. 4 is a flow chart diagram showing the multi-task training phase
procedure 330 for the ore distribution data modelling process. The
training phase 330 begins with obtaining the sensor measurement data at
step 410 from an appropriate source, in this case drill sensors and/or
chemical and radiological assay measurements with corresponding 3D
spatial positioning information. The positioning information and the
sensor data together are the observed inputs and observed outputs,
respectively, that comprise the training data used for the regression.

[0070]For the sake of the current example, one sensor measures and
produces data representing a quantity representative of iron content
(310) whilst another measures a quantity representing silicon dioxide
content (320). The measurements relating to iron and silicon dioxide
spatial distribution are distinct but dependent in some unknown way.

[0071]For ease of storage and retrieval the data analysed and synthesised
by the GP regression method described herein can be saved in the form of
a hierarchical data structure known as a KD-Tree. The use of such a data
structure provides the training and evaluation processors with rapid
access to the sampled measurement data on demand. After the data has been
input to the training processor at step 410 it is converted to KD-Tree
data at step 420 and stored at step 430.

[0072]The data storage step is followed by a multi-task GP learning
procedure at step 440, with the objective of learning a representation of
the spatial data. The learning procedure is aimed at determining the
hyperparameter values of the covariance function associated with the GP.
This is done with a Maximum Likelihood Estimation method that is used to
optimise the hyperparameters associated with the GP covariance function.
The covariance function hyperparameters provide a coarse description of
the spatial model, and can be used together with the sensor measurement
data to generate detailed model data at any desired resolution, including
a statistically sound uncertainty estimate. The optimized covariance
function hyperparameters are stored in step 450, together with the
KD-Tree sample data structure, for use by the evaluation procedure.

[0073]Although the method of obtaining the multi-task GP described here is
similar to a standard method of obtaining a single task GP, there are
some differences. In the case of a single task GP we have:

[0074]a single set of input points $X = [x_1, x_2, \ldots, x_n]^T$;

[0075]a single set of targets $\mathbf{y} = [y_1, y_2, \ldots, y_n]^T$;

[0076]a single scalar noise level $\sigma^2$; and

[0077]a single set of test inputs $X_* = [x_{*1}, x_{*2}, \ldots, x_{*p}]^T$.

This results in a single covariance matrix $K$.

[0088]Once the multi-task model has been established, it can be used to
estimate new output values for a new set of test inputs.

[0089]An evaluation processor is used to execute the evaluation step 340,
which entails utilising the measurement data together with multi-task
Gaussian process model data according to a desired modelling grid
resolution. The grid points at this resolution form the test data for the
evaluation process. Specific operational details of the evaluation
processor are provided below with reference to FIG. 5.

[0090]FIG. 5 is a diagrammatic representation of the evaluation phase
procedure 340 for the data modelling process. The multi-task GP
evaluation process 530 entails using the model 510 to estimate output
values 540 that correspond to the test input values 520. The model is
described by the multi-task covariance function that was determined in
step 330 of FIG. 3.

[0091]Since the Gaussian process representation obtained is a continuous
domain one, applying the model for any desired resolution amounts to
sampling the model at that resolution. A grid in the area of interest, at
the desired resolution, is formed. The required grid resolution provides
the test input values 520 for the evaluation process 530.

[0092]The objective is to use the learnt spatial model to conduct
estimation at individual points in this grid. Each point in the grid is
interpolated with respect to the model determined in the previous step
and the nearest training data around that point. For this step, using a
KD-Tree for storing the data naturally and efficiently provides access to
the nearest known spatial data. This together with the learnt model
provides an interpolation estimate for the desired location in the grid.
The estimate is also accompanied by an uncertainty measure that is
computed simultaneously in a statistically sound manner.
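The following sketch illustrates this neighbourhood-based evaluation, assuming SciPy's `cKDTree` as the KD-Tree and, for simplicity, a single-task isotropic squared-exponential kernel. A regular grid is formed at the desired resolution, and each grid point is estimated from only its k nearest stored samples using the single-test-point forms of eqs. (10) and (11). The neighbour count, hyperparameter values, grid spacing, function names and synthetic data are hypothetical; in the described system the hyperparameters would come from the training step.

```python
import numpy as np
from scipy.spatial import cKDTree


def se_kernel(A, B, length, sigma_f):
    """Isotropic squared-exponential covariance between the rows of A and B."""
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return sigma_f**2 * np.exp(-0.5 * sq / length**2)


def local_gp_estimate(tree, X, y, x_star, k=20, length=1.0, sigma_f=1.0, sigma_n=0.1):
    """Mean and variance at x_star using only its k nearest stored samples."""
    _, idx = tree.query(x_star, k=k)          # indices of the k nearest training points
    Xn, yn = X[idx], y[idx]
    K = se_kernel(Xn, Xn, length, sigma_f) + sigma_n**2 * np.eye(k)
    k_star = se_kernel(Xn, x_star[None, :], length, sigma_f)[:, 0]
    mean = k_star @ np.linalg.solve(K, yn)                       # eq. (10), local data only
    var = sigma_f**2 - k_star @ np.linalg.solve(K, k_star)       # eq. (11), local data only
    return mean, var


rng = np.random.default_rng(1)
X = rng.uniform(0.0, 10.0, size=(500, 3))     # hypothetical sample positions
y = np.sin(X[:, 0]) * np.cos(X[:, 1])          # hypothetical measured quantity
tree = cKDTree(X)                              # built once, stored alongside the hyperparameters

# Regular evaluation grid at a hypothetical resolution of 1.0 units per axis.
axis = np.arange(0.5, 10.0, 1.0)
grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 3)
estimates = np.array([local_gp_estimate(tree, X, y, p) for p in grid])  # rows: (mean, variance)
```

Restricting each solve to k neighbours keeps the cost per grid point fixed at a k × k system, at the price of ignoring distant samples whose covariance with the grid point is small anyway.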

[0093]The output 540 of the multi-task GP evaluation 530 is a digital
representation (shown in FIG. 3 as data that is displayed 350 or is used
as a control input 360) of a spatially distributed quantity (e.g. Fe) at
the chosen resolution and region of interest together with an appropriate
measure of uncertainty for every point in the map.

[0094]Evaluation of the GP can be done using a standard prediction
algorithm, for example by executing the following steps:

[0096]Once the ore spatial distribution model data has been generated in
the evaluation step 340 it can be displayed graphically for human viewing
350, or used in digital form 360 as input for computer controlled
operations, for example.

4. Determining Multi-Task Covariance Functions

[0097]What happens in the multi-task training phase described above can be
understood within the general framework for calculating inter-task
cross-covariance functions for stationary covariance functions, based on
the methods of Fourier analysis, as described in this section. New
cross-covariance functions are derived for different single task
covariance functions; they are calculated in analytical form and can be
directly applied.

[0098]Using the methods of Fourier analysis, a general framework is
developed for calculating the cross-covariance functions for any two
stationary covariance functions. The resulting covariance matrix, of size
$(N_1 + N_2 + \cdots + N_M) \times (N_1 + N_2 + \cdots + N_M)$, where $M$
is the number of tasks and $N_1, N_2, \ldots, N_M$ are the numbers of
input points in each task, can be shown to be positive semi-definite and
is therefore suitable for use in multi-task Gaussian processes.
Analytical expressions are also provided for the cross-covariance
functions between different covariance functions.

4.1 Defining the Multi-Task Covariance Function

[0099]It is possible to consider several dependent tasks simultaneously.
As an example, and with reference to FIG. 7, the case of two dependent
tasks is described here, each task associated with a different covariance
function. Each covariance function is selected in step 702. The basis
functions g1(x) and g2(x) of the covariance functions
K1(x, x') and K2(x, x') can be determined by using Fourier
analysis as described in Appendix A and shown in step 704. The basis
functions are used to construct the multi-task covariance function for
these two covariance functions as shown in step 706.

[0100]Constructing the multi-task covariance function includes finding the
cross-covariance function between the two covariance functions. Suppose
$K_1$ and $K_2$ are single-task stationary covariance functions. It is
shown in Appendix A that $K_1$ and $K_2$ can be represented in the
following form:

$$K_1(x, x') = \int_{-\infty}^{\infty} g_1(x - u)\, g_1(x' - u)\, du \qquad (12)$$

$$K_2(x, x') = \int_{-\infty}^{\infty} g_2(x - u)\, g_2(x' - u)\, du \qquad (13)$$

[0101]All stationary covariance functions can be expressed in this form.
Consequently, the multi-task covariance function that describes the
multi-task GP (step 708) can be defined as:

$$K((x, i), (x', j)) = \int_{-\infty}^{\infty} g_i(x - u)\, g_j(x' - u)\, du \qquad (14)$$

where $i$ and $j$ identify the task number and $(x, i)$, $(x', j)$
represent the points $x$ and $x'$ from tasks $i$ and $j$, respectively.
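Eq. (14) can be checked numerically. The sketch below assumes the squared-exponential basis function $g(x) = (2/(\pi l^2))^{1/4}\exp(-x^2/l^2)$, which by eq. (12) reproduces the unit-variance squared-exponential covariance $\exp(-(x-x')^2/(2l^2))$. Completing the square in eq. (14) for two such basis functions with length scales $l_i$ and $l_j$ gives

$$K((x,i),(x',j)) = \sqrt{\frac{2 l_i l_j}{l_i^2 + l_j^2}}\,\exp\!\left(-\frac{(x-x')^2}{l_i^2 + l_j^2}\right),$$

which reduces to eq. (12) when $l_i = l_j$. The code compares this closed form, derived here for illustration, against trapezoidal quadrature of the integral; the function names and integration limits are assumptions.

```python
import numpy as np


def se_basis(x, length):
    """Assumed SE basis function; its self-convolution is exp(-tau^2 / (2 length^2))."""
    return (2.0 / (np.pi * length**2)) ** 0.25 * np.exp(-x**2 / length**2)


def multitask_cov_quadrature(x, xp, l_i, l_j, half_width=50.0, n=200001):
    """Evaluate eq. (14) by trapezoidal quadrature on a wide, finite u-grid."""
    u = np.linspace(-half_width, half_width, n)
    f = se_basis(x - u, l_i) * se_basis(xp - u, l_j)
    du = u[1] - u[0]
    return du * (f.sum() - 0.5 * (f[0] + f[-1]))


def se_cross_cov(tau, l_i, l_j):
    """Closed form of eq. (14) for two SE basis functions (derived above)."""
    s = l_i**2 + l_j**2
    return np.sqrt(2.0 * l_i * l_j / s) * np.exp(-tau**2 / s)


for tau in (0.0, 0.7, 2.5):
    numeric = multitask_cov_quadrature(tau, 0.0, l_i=1.0, l_j=2.0)
    print(f"tau={tau}: quadrature={numeric:.8f}  closed form={se_cross_cov(tau, 1.0, 2.0):.8f}")
```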

[0102]The proof in Appendix B shows that the multi-task covariance
function $K((x, i), (x', j))$ is positive semi-definite (PSD) for any
number of tasks and can therefore be used directly in multi-task GPs.
$K_1(x, x')$ and $K_2(x, x')$ can be the same covariance function with the
same or different characteristic lengths, or they can be different
covariance functions.

[0103]The multi-task covariance function of eq. (14) (as described in
Appendix B) can be understood as having the following general form for n
tasks:

$$\begin{pmatrix} C_{11} & \cdots & C_{1n} \\ \vdots & \ddots & \vdots \\ C_{n1} & \cdots & C_{nn} \end{pmatrix}$$

where the diagonal of this matrix, $C_{11}, C_{22}, \ldots, C_{nn}$, is
provided by the covariance functions of each of the $n$ tasks. The
off-diagonal terms represent the cross-covariance functions that describe
the interdependence between the tasks.
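To make the block structure concrete, the sketch below assembles the matrix for $n = 3$ one-dimensional tasks, each with its own squared-exponential length scale and its own set of input points. It assumes unit signal variances and uses the cross term obtained from eq. (14) for basis functions $g_i(x) = (2/(\pi l_i^2))^{1/4}\exp(-x^2/l_i^2)$ (an assumption; see `se_cross_cov`). The task sizes, length scales and function names are illustrative. The diagonal blocks $C_{ii}$ reduce to ordinary single-task squared-exponential covariances, and the final eigenvalue check is a numerical sanity test of positive semi-definiteness, not a proof.

```python
import numpy as np


def se_cross_cov(tau, l_i, l_j):
    """Eq. (14) evaluated for two SE basis functions (unit signal variance assumed)."""
    s = l_i**2 + l_j**2
    return np.sqrt(2.0 * l_i * l_j / s) * np.exp(-tau**2 / s)


def multitask_cov_matrix(inputs, lengths):
    """Assemble the block matrix [C_ij] for 1-D inputs of n tasks.

    inputs[i] holds the N_i input points of task i; lengths[i] is its length scale.
    """
    blocks = [
        [se_cross_cov(xi[:, None] - xj[None, :], li, lj) for xj, lj in zip(inputs, lengths)]
        for xi, li in zip(inputs, lengths)
    ]
    return np.block(blocks)


rng = np.random.default_rng(2)
inputs = [rng.uniform(0, 5, 6), rng.uniform(0, 5, 4), rng.uniform(0, 5, 5)]  # N_1, N_2, N_3
lengths = [0.5, 1.0, 2.0]
C = multitask_cov_matrix(inputs, lengths)
print(C.shape)                                   # (15, 15)
print(np.linalg.eigvalsh(C).min() >= -1e-10)     # expected: True (PSD up to round-off)
```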

[0104]In step 706 shown in FIG. 7, the multi-task covariance function can
be found from the basis functions of the individual covariance functions
by using eq. (14). As an example consider the case when there are two
tasks with associated covariance functions, K1(x, x') and K2(x,
x'), which are squared exponential covariance functions with different
characteristic lengths:

[0105]Applying the proposed procedure and calculating the integral present
in the multi-task covariance function definition of eq. (14), provides
the following multi-task version of the squared exponential covariance
function:

In general, the model is a convolution process with two smoothing kernels
(basis functions), assuming the influence of a single latent function. It
is also possible to extend this to multiple latent functions using the
process described in M. Alvarez and N. D. Lawrence, "Sparse Convolved
Gaussian Processes for Multi-output Regression," in D. Koller, Y. Bengio,
D. Schuurmans and L. Bottou (eds.), NIPS, MIT Press, 2009.

4.2 Three Example Covariance Functions

[0106]In this section the cross-covariance functions of three example
covariance functions will be calculated.

[0110]From eq. (12)-(14) and (19) it follows that the cross-covariance
function of the Sparse covariance function and any other covariance
function can be written in the following form of an integral with finite
limits:

[0111]Eq. (20) demonstrates an important consequence of the vanishing
property of the Sparse covariance function: because the basis function of
the Sparse covariance function vanishes outside the interval
$x \in (-l_S/2, l_S/2)$, the cross-covariance function with it is an
integral over only a finite interval, which can easily be computed
numerically. If the basis function of task $j$ does not have a very
complicated form, the integral in eq. (20) can be calculated
analytically, which significantly speeds up calculations.
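A sketch of the finite-interval integral follows. It assumes a Sparse basis function of the form $g_S(x) = \sqrt{8/(3 l_S)}\,\cos^2(\pi x/l_S)$ for $|x| < l_S/2$ and zero otherwise. This normalisation is chosen here for illustration so that the variance is 1, and its self-convolution, which can be checked by direct integration, is $k_S(\tau) = \tfrac{2+\cos(2\pi r)}{3}(1 - r) + \tfrac{\sin(2\pi r)}{2\pi}$ for $r = |\tau|/l_S < 1$ and zero otherwise. The integration is restricted to the support of $g_S(x-u)$, i.e. $u \in [x - l_S/2, x + l_S/2]$. The squared-exponential basis used as an example for task $j$ and all function names are likewise assumptions.

```python
import numpy as np


def sparse_basis(x, l_s):
    """Assumed Sparse basis function (unit variance); zero outside (-l_s/2, l_s/2)."""
    inside = np.abs(x) < l_s / 2
    return np.where(inside, np.sqrt(8.0 / (3.0 * l_s)) * np.cos(np.pi * x / l_s) ** 2, 0.0)


def se_basis(x, length):
    """Assumed SE basis function (unit variance), used as an example task j."""
    return (2.0 / (np.pi * length**2)) ** 0.25 * np.exp(-x**2 / length**2)


def cross_cov_with_sparse(x, xp, l_s, basis_j, n=20001):
    """Cross-covariance between the Sparse task at x and task j at xp.

    The integrand vanishes unless |x - u| < l_s / 2, so the integral over the
    real line reduces to the finite interval [x - l_s/2, x + l_s/2].
    """
    u = np.linspace(x - l_s / 2, x + l_s / 2, n)
    f = sparse_basis(x - u, l_s) * basis_j(xp - u)
    du = u[1] - u[0]
    return du * (f.sum() - 0.5 * (f[0] + f[-1]))   # trapezoidal rule


def sparse_cov(tau, l_s):
    """Closed-form Sparse covariance, i.e. the self-convolution of sparse_basis."""
    r = abs(tau) / l_s
    if r >= 1.0:
        return 0.0
    return (2 + np.cos(2 * np.pi * r)) / 3 * (1 - r) + np.sin(2 * np.pi * r) / (2 * np.pi)


l_s = 2.0
print(cross_cov_with_sparse(0.8, 0.0, l_s, lambda v: sparse_basis(v, l_s)), sparse_cov(0.8, l_s))
print(cross_cov_with_sparse(0.8, 0.0, l_s, lambda v: se_basis(v, 1.0)))   # Sparse x SE
```

Because the integration interval has length $l_S$ whatever the other task's basis function, the cost of each evaluation is fixed. For Sparse × Sparse the result is exactly zero once $|x - x'| \ge l_S$, because the two supports no longer overlap.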

[0112]From eq. (20) it follows that the cross-covariance function of the
task j with the Sparse covariance function will vanish outside of some
finite interval if and only if the basis function of the task j vanishes
outside of some finite interval.

2) Squared Exponential and Matern Covariance Functions

[0113]The other two example covariance functions considered here are the
following:

[0114]For these covariance functions, the steps described below correspond
to the second (704) and third (706) steps of the process shown in FIG. 7.

[0115]To find the basis function of the squared exponential and Matern
covariance functions we use the Fourier analysis technique presented in
Appendix A. Applying Fourier transformation to these functions one has
that

[0117]The next step is to derive the inverse Fourier transformations of
$g^*_{SE}(s)$ and $g^*_M(s)$. Comparing eqs. (23)-(24) and (25)-(26), one
can see that $g^*_{SE}(s)$ and $g^*_M(s)$ can be obtained from
$K^*_{SE}(s)$ and $K^*_M(s)$, respectively, by applying the following
changes to the parameters:

[0120]Using the associations between eq. (21), (22) and (23), (24)
together with the conversion formulas between the images of covariance
functions and the images of basis functions presented in eq. (27)-(29)
after some algebraic manipulations the following expressions for the
basis functions are obtained:

[0122]$K_{SE\times SE}$, $K_{S\times S}$ and $K_{SE\times S}$ are
calculated in closed form, $K_{M\times S}$ has finite limits of
integration, and the integral in $K_{M\times SE}$ converges very quickly,
as its integrand decays to zero squared-exponentially. Therefore all the
presented cross-covariance functions are suitable for direct use in
multi-task GP learning and inference.

[0123]There are many mathematical equivalents and approximations of the
aforementioned cross-covariance functions that may be used for data
analysis. The cross-covariance functions $K_{SE\times SE}$,
$K_{S\times S}$ and $K_{SE\times S}$ in a different form, and a Matern
3/2 × Matern 3/2 cross-covariance function, are listed in Appendix D.

[0124]When eq. 32-36 are used following the first step 702 in the process
shown in FIG. 7, then the second 704 and third steps 706 can be omitted.
FIG. 8 shows that, in this case, an alternative method 800 is used
wherein the second 704 and third steps 706 are replaced by a step 802. In
step 802 the multi-task covariance function is looked up if any of the
example covariance functions of this section are used, for which the
cross-covariance functions are given by eq. 32-36.

[0125]Details of the derivation of $K_{S\times S}$ and $K_{SE\times S}$
and the definition of $J_{S\times S}$ are presented in Appendix C.

5. Results of Using Multi-Task GP Regression

[0126]In ore grade prediction the interdependence between grades of
different minerals can be used to improve the prediction quality, reduce
the overall uncertainty for each estimation task and provide means for
estimation with partial data. The estimated function represented in FIG.
6, for example, could have a reduced variance if a second set of data
measurements were known that was in some way related to the first. FIG.
9(a) graphically shows the same data modelled using a multi-task GP that
considers cross-covariance with an additional dataset, graphically
illustrated in FIG. 9(b). This figure demonstrates that the multi-task GP
learns intrinsic inter-task connections in different regions and
therefore leads to more confident results (i.e. results with less
variance) even in the regions with low density of data points.

[0127]FIG. 10 graphically demonstrates how a three-dimensional multi-task
GP with the proposed covariance function can provide information about
the regions where data is missing or is not complete. FIG. 10a) shows the
single task GP regression results for iron with about 30% of its data
removed, and for silicon dioxide with full data, i.e. with information
from all the drill holes. The drawing shows only the front views of the
3D in-ground resource estimation results. The first part of FIG. 10a)
clearly demonstrates that the single task GP is unable to provide
reasonable estimations in the region 1002 where the data is missing. For
FIG. 10b) a two-task GP was used to learn iron and silicon dioxide
distributions simultaneously. The GP regression with the proposed
multi-task approach learns the intrinsic connections between the grade
distributions of iron and silicon dioxide where the data for both of them
is available and based on that connection estimates the distribution of
iron for the 30% of the volume where the data is actually missing. The
results can be seen by comparing region 1004 in FIG. 10b) with region
1002. These plots demonstrate that the proposed approach is able to
provide good estimation even in the case when a significant portion of
the data is missing.
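The effect shown in FIG. 10 can be illustrated on a toy one-dimensional problem. In the sketch below (hypothetical synthetic data, not the drill-hole data of FIG. 10), the first task (index 0) is observed only for $x \ge 0$, while the second task (index 1) is observed on $[-5, 5]$. Both tasks use squared-exponential basis functions with different length scales, combined through the cross term that eq. (14) gives for basis functions $g_i(x) = (2/(\pi l_i^2))^{1/4}\exp(-x^2/l_i^2)$, with unit signal variance assumed. The predictive variance of the first task inside its data gap is compared for a single-task GP (first-task data only) and the two-task GP; the length scales, noise level and function names are illustrative.

```python
import numpy as np


def se_cross_cov(tau, l_i, l_j):
    """SE x SE term from eq. (14), unit signal variance assumed."""
    s = l_i**2 + l_j**2
    return np.sqrt(2.0 * l_i * l_j / s) * np.exp(-tau**2 / s)


def multitask_cov(xa, ta, xb, tb, lengths):
    """Covariance between points xa of tasks ta and points xb of tasks tb."""
    la, lb = lengths[ta][:, None], lengths[tb][None, :]
    return se_cross_cov(xa[:, None] - xb[None, :], la, lb)


def predict(x, t, y, x_star, t_star, lengths, sigma_n=0.05):
    """Eqs. (10)-(11) with the multi-task covariance in place of the single-task one."""
    K = multitask_cov(x, t, x, t, lengths) + sigma_n**2 * np.eye(len(x))
    K_s = multitask_cov(x, t, x_star, t_star, lengths)
    mean = K_s.T @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)  # prior variance is 1
    return mean, var


lengths = np.array([1.0, 1.5])          # hypothetical length scales of tasks 0 and 1
x_a = np.linspace(0.0, 5.0, 15)         # task 0: observed only for x >= 0
x_b = np.linspace(-5.0, 5.0, 30)        # task 1: observed everywhere
y_a, y_b = np.sin(x_a), np.sin(x_b)     # hypothetical observations
x_star = np.array([-4.0, -3.0])         # locations inside the task-0 gap
t_star = np.zeros(2, dtype=int)

# Single-task GP: task-0 data only.
_, var_single = predict(x_a, np.zeros(15, dtype=int), y_a, x_star, t_star, lengths)
# Two-task GP: task-0 and task-1 data together.
x_all = np.concatenate([x_a, x_b])
t_all = np.concatenate([np.zeros(15, dtype=int), np.ones(30, dtype=int)])
mean_multi, var_multi = predict(x_all, t_all, np.concatenate([y_a, y_b]), x_star, t_star, lengths)
print("single-task variance:", var_single)
print("two-task variance:   ", var_multi)
```

Because the two-task GP conditions on a superset of the observations under the same joint prior, its predictive variance for the first task can never exceed the single-task value. Here it is far smaller in the gap, where the single-task variance approaches the prior value of 1 away from the first task's data.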

[0128]Another experiment demonstrates the benefits of using the
multi-kernel methodology in an artificial 1-D problem for two dependent
tasks. The observations for the first task are generated from a negated
sine function corrupted with Gaussian noise. Only the observations for
the second part of the function are used, and the objective is to infer
the first part from observations of the second task. Observations for the
second task were generated from a sine function with some additional
complexity, to make the function less smooth, and corrupted by Gaussian
noise. A comparison between independent GP predictions, a multi-task GP
with squared exponential kernels for both tasks, and the multi-kernel GP
(squared exponential kernel for the first task and Matern 3/2 for the
second) is presented in FIGS. 11A to 11C. It can be observed in FIG. 11C
that the multi-kernel GP models the second function more accurately,
which in turn gives a better prediction for the first task. In FIG. 11
the dots represent the observations and the dashed line represents the
ground truth for task 1. The width of the shaded region around the lines
indicates the prediction uncertainty.

[0129]Despite the simplicity of this experiment it simulates a very common
phenomenon in grade estimation for mining. Some elements have a much
higher concentration variability but follow the same trend as others.
Being able to aptly model these dependencies from noisy x-ray lab samples
is essential for an accurate final product.

[0130]This is empirically demonstrated in a further experiment. 1363
samples from an iron ore mine were collected and analyzed in a laboratory
with x-ray instruments to determine the concentration of three
components: iron, silica and alumina. Iron is the main product, but
assessing the concentration of the contaminants silica and alumina is
equally important. The samples were collected from exploration holes
about 200 m deep, distributed over an area of 6 km². Each hole was
divided into 2 m sections for laboratory assessment, and the lab result
for each section was an observation in the dataset. The final dataset
consisted of 4089 data points (1363 samples × 3 components) from 31
exploration holes. Two holes were held out as test data; for these holes
the concentration of silica given iron and alumina was predicted. The
experiment was repeated employing different multi-task covariance
functions, with either a squared exponential or a Matern kernel for each
task, combined with the cross-covariance terms presented in Appendix D.
The results are summarized in Table 1, which demonstrates that the
dependencies between iron, silica and alumina are better captured by the
Matern 3/2×Matern 3/2×SqExp multi-kernel covariance function.

[0131]In a still further experiment, GPs with different multi-kernel
covariance functions were applied to the Jura dataset, a benchmark
dataset in geostatistics. It consists of a training set with 259 samples
in an area of 14.5 km² and a testing set with 100 samples. The task
is to predict the concentration of cadmium (Cd), lead (Pb) and zinc (Zn)
at new locations. The proposed multi-kernel covariance functions enable
considering different kernels for each of the materials thus maximizing
the predictive qualities of the GP. The 259 training samples were used at
the learning stage and the 100 testing samples were used to evaluate the
predictive qualities of the models. The square root mean square error
(SMSE) for all possible triplet combinations of SqExp and Matern 3/2
kernels are presented in Table 2. The results demonstrate that the
dependencies between cadmium, lead and zinc are better captured by the
Matern 3/2×SqExp×SqExp triplet-kernel.

In a still further experiment a concrete slump dataset was considered.
This dataset contains 103 data points with seven input dimensions and 3
outputs describing the influence of the constituent parts of concrete on
the overall properties of the concrete. The seven input dimensions are
cement, slag, fly ash, water, SP (superplasticizer), coarse aggregate and
fine aggregate, and the outputs are slump, flow and 28-day compressive
strength of concrete.
83 data points were used for learning and 20 data points were used for
testing. The square root mean square error (SMSE) for all possible
triplet combinations of SqExp and Matern 3/2 kernels for this dataset are
presented in Table 3. The results demonstrate that the dependencies
between slump, flow and 28-day compressive strength of concrete are
better captured by the SqExp×Matern 3/2×Matern 3/2
triplet-kernel.

One aspect of the invention provides a novel methodology to construct
cross covariance terms for a multi-task Gaussian process. This
methodology allows the use of multiple covariance functions for the same
multi-task prediction problem. If a stationary covariance function can be
written as a convolution of two identical basis functions, a cross
covariance term can always be defined, resulting in a positive
semi-definite multi-task covariance matrix. A general methodology to find
the basis function is then developed based on Fourier analysis.

[0132]Analytical solutions for six combinations of covariance functions
are provided, three of which combine different covariance functions. The
analytical forms for the cross covariance terms can be applied directly
to GP prediction problems and are also useful for other kernel machines.

[0133]A multi-task sparse covariance function is presented which provides
a computationally efficient (and exact) way of performing inference in
large datasets. Note, however, that approximate techniques can also be
used.

[0134]The approach may be extended to non-stationary covariance functions,
possibly combining non-stationary and stationary kernels. This can be
useful in applications involving space and time domains, such as
pollution estimation or weather forecasting.

[0135]The presented method not only provides possibilities for better
fitting the data representing multiple quantities but also makes it
possible to recover missing data. It provides means for estimating
missing data in different regions for different tasks based on the
intrinsic inter-task connections and information about other tasks in
these regions (e.g. if the information for grades of some materials is
missing for some drill holes, it can be inferred based on the information
about the grades of other materials in these drill holes and the
intrinsic connections between distributions of all these materials
learned using the proposed approach).

[0136]Although the foregoing description relates to specific mine-related
models where the proposed method can be used directly in in-ground
resource estimation (i.e. simultaneous learning of different materials'
grade distribution taking into consideration their intrinsic
inter-dependences), it will be readily appreciated that the spatial data
modelling methodologies described herein are not limited to this
application and can be used in many areas including geophysics, mining,
hydrology, reservoir engineering, multi agent robotics (e.g. simultaneous
learning of information provided by different sensors mounted to several
vehicles and/or developing a control system that utilises a model of the
dependencies between the control outputs for a plurality of actuators)
and financial predictions (e.g. simultaneous learning of variances in
exchange rates of different currencies or simultaneous learning of the
dynamics of different share prices taking into consideration intrinsic
inter-task connections).

[0137]It will be understood that the term `comprises` (and grammatical
variants thereof) as used in this specification is equivalent to the term
`includes` and is not to be taken as excluding the existence of
additional elements, features or steps.

[0138]It will be understood that the invention disclosed and defined in
this specification extends to all alternative combinations of two or more
of the individual features mentioned or evident from the text or
drawings. All of these different combinations constitute various
alternative aspects of the invention.

APPENDIX A

General Framework Based on Fourier Analysis

[0139]Suppose that $K(\tau)$ is a stationary covariance function in
$\mathbb{R}^D$ with a spectral density $S(s)$. In this case $K(\tau)$ and
$S(s)$ are Fourier duals of each other, i.e.

$$K(\tau) = \mathcal{F}^{-1}_{s \to \tau}[S(s)](\tau), \qquad S(s) = \mathcal{F}_{\tau \to s}[K(\tau)](s) \qquad (37)$$

where $\tau = x - x'$ and the direct and inverse Fourier transformations
are defined as follows:

[0141]Applying the Fourier transformation to eq. (46) and using the fact
that the Fourier transformation of the convolution of two functions is
equal to $\sqrt{2\pi}$ times the product of the Fourier transformations
of the functions being convolved, i.e.

$$(g_1 * g_2)^*(s) = \sqrt{2\pi}\, g_1^*(s)\, g_2^*(s),$$

where $*$ between two functions denotes convolution and the superscript
$*$ denotes the Fourier image, one has that

$$K^*(s) = \sqrt{2\pi}\,\bigl(g^*(s)\bigr)^2 \qquad (47).$$
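Eq. (47) suggests a direct numerical route from a covariance function to its basis function: take the square root of the suitably scaled spectrum and invert the transform. The NumPy sketch below is a discrete analogue, in which the grid spacing `dx` of the FFT's circular convolution plays the role of the $\sqrt{2\pi}$ factor. It recovers the basis function of a unit-variance squared-exponential covariance and compares it with the closed form $g(x) = (2/(\pi l^2))^{1/4}\exp(-x^2/l^2)$, which is stated here as an assumption (it satisfies eq. (12) for that covariance). The grid size, spacing and cut-off are illustrative choices.

```python
import numpy as np

n, dx, length = 2048, 0.02, 1.0
tau = dx * (np.arange(n) - n // 2)                  # grid centred on tau = 0
K = np.exp(-tau**2 / (2.0 * length**2))             # unit-variance SE covariance

# Discrete analogue of eq. (47): DFT(K) = dx * DFT(g)**2, hence DFT(g) = sqrt(DFT(K) / dx).
K_hat = np.fft.fft(np.fft.ifftshift(K)).real        # K is even, so its DFT is real
K_hat[K_hat < 1e-12 * K_hat.max()] = 0.0            # discard round-off noise (incl. tiny negatives)
g = np.fft.fftshift(np.fft.ifft(np.sqrt(K_hat / dx)).real)

# Assumed closed-form SE basis function for comparison.
g_exact = (2.0 / (np.pi * length**2)) ** 0.25 * np.exp(-tau**2 / length**2)
print("max |g - g_exact| =", np.max(np.abs(g - g_exact)))
```

Taking the non-negative square root selects the even basis function; the cut-off discards round-off noise in the spectrum, which would otherwise produce spurious small negative values under the square root.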

Using eq. (47) and (39) one can calculate the basis function using the
covariance function as follows:

where $l_{SE}$, $l_M$ and $l_S$ are the length scales for the squared
exponential, Matern 3/2 and sparse covariance functions respectively, and
$H(x)$ is the Heaviside unit step function. From these definitions, the
following cross covariance functions can be derived:

Multidimensional and anisotropic extensions to the other models are
possible by taking the product of the cross covariance terms defined for
each input dimension. The examples above do not consider parameters for
the amplitude (signal variance) of the covariance functions. These can,
however, be added by multiplying blocks of the multi-task covariance
matrix by coefficients from a PSD matrix.
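Both extensions can be sketched together. Below, each task has its own per-dimension length scales; the multi-dimensional cross term is the product over dimensions of the one-dimensional squared-exponential terms obtained from eq. (14) for basis functions $g_i(x) = (2/(\pi l_i^2))^{1/4}\exp(-x^2/l_i^2)$ (an assumption); and block $(i, j)$ is then multiplied by $B_{ij}$ from a PSD amplitude matrix $B$. By the Schur product theorem the result stays PSD, and a negative off-diagonal entry in $B$ allows negatively correlated tasks. The inputs, length scales, matrix $B$ and function names are hypothetical.

```python
import numpy as np


def se_cross_cov_1d(tau, l_i, l_j):
    """One-dimensional SE x SE cross term (unit amplitude)."""
    s = l_i**2 + l_j**2
    return np.sqrt(2.0 * l_i * l_j / s) * np.exp(-tau**2 / s)


def cross_cov_anisotropic(Xi, Xj, L_i, L_j):
    """Product over input dimensions of per-dimension SE x SE cross terms.

    Xi: (Ni, D) inputs of task i; L_i: (D,) length scales of task i.
    """
    out = np.ones((len(Xi), len(Xj)))
    for d in range(Xi.shape[1]):
        out *= se_cross_cov_1d(Xi[:, d, None] - Xj[None, :, d], L_i[d], L_j[d])
    return out


def scaled_multitask_cov(inputs, lengths, B):
    """Multiply block (i, j) by B[i, j], where B is a PSD task-amplitude matrix."""
    n = len(inputs)
    return np.block([[B[i, j] * cross_cov_anisotropic(inputs[i], inputs[j], lengths[i], lengths[j])
                      for j in range(n)] for i in range(n)])


rng = np.random.default_rng(3)
inputs = [rng.uniform(0, 5, (8, 3)), rng.uniform(0, 5, (5, 3))]   # two tasks in 3-D
lengths = [np.array([1.0, 2.0, 0.5]), np.array([1.5, 1.0, 1.0])]
B = np.array([[2.0, -0.9], [-0.9, 0.5]])                          # PSD: trace > 0, det = 0.19 > 0
C = scaled_multitask_cov(inputs, lengths, B)
print(C.shape)                                                    # (13, 13)
```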