Abstract:

A system and method of generating a dynamic visualization of a
multi-dimensional dataset of data-points are disclosed. The method is an
adaptation of the Grand Tour approach, but instead of using all possible
projections comprising at least one data point, some of which may not be
useful, the method includes generating a set of candidate projections
from the space of all possible projections. The set of candidate
projections is approximated with a one dimensional smoothed curve and the
dynamic visualization is generated, based on a sequence of projections
taken along the smoothed curve.

Claims:

1. A method of generating a dynamic visualization of a multi-dimensional
dataset comprising: generating a set of candidate projections from the
space of all possible projections for a multi-dimensional set of
data-points, based on the multi-dimensional dataset; approximating the set
of candidate projections with a one dimensional smoothed curve;
and generating the dynamic visualization based on a sequence of
projections taken along the smoothed curve.

2. The method of claim 1, wherein the generating a set of candidate
projections includes sampling the possible projections to identify a set
of projections expected to be of interest to a user.

3. The method of claim 1, wherein the generating of the set of candidate
projections comprises generating constraints based on user input and
generating the set of candidate projections based at least in part on the
user-based constraints.

4. The method of claim 3, wherein projections from the space of
projections are sampled, based on a Metropolis-Hastings algorithm, to
identify a distribution of candidate projections which take into
consideration the constraints.

5. The method of claim 1, wherein the generating of the set of candidate
projections comprises local linear embedding which for each of the
data-points, identifies a single projection for a neighborhood comprising
the data-point and its nearest neighbors.

6. The method of claim 5, wherein the identification of the linear
projection also takes into account any labels applied to the data-points
in the neighborhood.

7. The method of claim 1, wherein the approximating of the finite set of
projections with the one dimensional smoothed curve includes defining a
number of centroids which, in sequence, are to form the smoothed curve,
and optimizing an objective function which includes a first term which
takes into account the distance of each candidate projection from its
nearest centroid and a second term which favors smoothness of the curve.

8. The method of claim 7, wherein the objective function is optimized by
expectation maximization.

9. The method of claim 7, wherein the objective function includes a
weighting factor which is selectable to influence the respective
importance of the first and second terms in the optimization.

10. The method of claim 7, wherein there are at least 10 centroids.

11. The method of claim 7, wherein the generating of the dynamic
visualization includes generating a sequence of projections which includes
projections at the centroids and optionally projections identified by
interpolation between pairs of centroids.

12. The method of claim 1, wherein the projections are linear projections.

13. The method of claim 1, wherein the projections are 2 dimensional
projections.

14. The method of claim 1, further comprising displaying the dynamic
visualization on a display.

15. The method of claim 14, further comprising receiving labels applied by
a user to data-points while viewing the dynamic visualization on the
display and generating a new dynamic visualization based on constraints
derived from the labels.

16. A computer program product which encodes instructions, which, when
implemented by a computer, perform the method of claim 1.

17. A dynamic visualization of a dataset generated by the method of claim
1.

18. A computer system comprising: memory which stores a dynamic
visualization component which includes instructions for performing the
method of claim 1; and a processor, communicatively linked with the memory
for executing the instructions.

19. The system of claim 18, further comprising memory which stores the
data set in communication with the processor.

20. The system of claim 18, further comprising a display communicatively
linked with the processor, which is caused to display the animation.

21. A computer-implemented method for generating a dynamic visualization
of a dataset having more than three dimensions comprising: receiving
constraints input by a user; based on the constraints, identifying a set
of candidate linear projections for the dataset; approximating the set of
candidate linear projections as a smoothed one dimensional curve;
and automatically generating a dynamic visualization based on a sequence
of projections taken along the smoothed curve.

22. An automated method for assisting a user in labeling data points of a
dataset comprising: generating a dynamic visualization of a dataset based
on a sequence of projections, each projection comprising at least one
data-point from the dataset; receiving labels applied by a user to
data-points displayed in the dynamic visualization; based on the labels,
generating constraints; based on the constraints, identifying a set of
candidate linear projections for the dataset; approximating the set of
candidate linear projections as a smoothed one dimensional curve;
and generating a new dynamic visualization based on a sequence of
projections taken along the smoothed curve.

Description:

BACKGROUND

[0001]The exemplary embodiment relates to a method and system for the
display of multi-dimensional data. It finds particular application in
connection with dynamically determining and presenting appearance and
spatial attribute values of entities of the multi-dimensional data over a
sequence to assist in the recognition of patterns and trends within the
data.

[0002]Data visualization and analysis is a difficult task when the
dimensionality of data is high and the shape of clusters is complex.
Recently, a number of data visualization methods have been proposed. Most
of the methods are only applicable to a simple dataset, for example
smooth manifolds or clustered data-points. Standard data analysis tools
use mathematical operations to reduce the dimensionality of the data to a
more manageable dimensionality (e.g., multidimensional scaling (MDS),
principal component analysis (PCA), cluster analysis, projection pursuit
methods, neural network algorithms, and the like) or for transforming
data for visualization. These methods provide a static view of the data,
which is efficient for a simple dataset.

[0003]In some cases involving large, noisy or non-linear datasets, static
visualization methods are not able to give an intuitive understanding of
the spatial organization of the data. One approach is to use a dynamic
visualization method, that is to say a system that outputs an animation
consisting of a series of smoothly changing projections of a data-point
cloud that encompasses all data-points. With such settings, visualization
is similar to watching a movie, and thus makes use of time as an
additional dimension. However, this dimension is very specific and
requires a dedicated method to be really understood by users. A standard
method for dynamic visualization is known as the Grand Tour. In the Grand
Tour, sequences of 2D or 3D projections are displayed (See, ASIMOV, D.
The grand tour: a tool for viewing multidimensional data. SIAM Journal
on Scientific and Statistical Computing, 6(1):128-143, January 1985).
Instead of choosing an arbitrary projection to visualize the data, every
possible projection is approximately visualized using multiple images in
a movie-like animation. A space-filling curve is used in traversing the
projection space, i.e., a series of projections for which, for every
possible projection, the series contains at least one element in a small
neighborhood, and the sequence of projections is smooth, so that two
contiguous projections in the series give similar images. In the
classical implementation, a step and space-filling curve are defined, a
plane is moved along this curve and the data projected. The user browses
the animation using the time dimension scale by which the projections are
indexed.

[0004]The series of projections in the Grand Tour does not depend on the
data being visualized. However, viewing the huge space of 2D projections
of a multidimensional dataset as a video can be prohibitively time
consuming and not really informative when the number of dimensions is
large. As a result, the Grand Tour is generally impractical for more than
10 dimensions.

[0005]Other dynamic visualization methods may involve a more advanced
framework that includes interaction with the user.

[0006]In order to reduce the huge search space for projection
visualization, a projection pursuit guided tour has been proposed which
combines Grand Tour and projection pursuit (See COOK, D., BUJA, A.,
CABRERA, J., AND HURLEY, H. Grand tour and projection pursuit, J. of
Computational and Graphical Statistics 4, pp. 155-172 (1995)). The method
of projection pursuit finds the projections that optimize a criterion
called the projection pursuit index. This criterion should reveal the
most details about the structure (clusters, surfaces, etc.) of the
dataset (See FRIEDMAN, J., AND TUKEY, J. A projection pursuit algorithm
for exploratory data analysis. In IEEE Transactions on Computers, pp.
881-890 (1974)). This combination is a useful visualization tool for some
applications but does not allow a user to participate in the process.

[0007]Interaction techniques can empower the user's perception of
information. A set of interaction techniques, such as aggregation,
rotation, linking and brushing, interactive selection, and the like may
improve the visualization process (See, DOS SANTOS, S. R., A framework
for the visualization of multidimensional and multivariate data. Ph.D.
Dissertation, University of Leeds, United Kingdom (2004)).

[0009]Another approach is known as Targeted Projection Pursuit (TPP) (See,
FAITH, J. Targeted projection pursuit for interactive exploration of
high-dimensional data sets. In IV '07: Proceedings of the 11th
International Conference Information Visualization, IEEE Computer
Society, pp. 286-292 (Washington D.C., 2007)). Unlike VISTA, the basis of
TPP is that the user manipulates their view of the data directly, rather
than manipulating the projection that produces that view. TPP is an
interactive exploration tool where the user defines a target, and the
system finds a projection that best approximates that target.

[0010]Both of these alternatives to the Grand Tour approach are relatively
complex and require a highly trained user.

INCORPORATION BY REFERENCE

[0011]U.S. Pat. No. 7,265,755 issued Sep. 4, 2007, entitled METHOD AND
SYSTEM FOR DYNAMIC VISUALIZATION OF MULTI-DIMENSIONAL DATA, by Peterson,
discloses a method in a computer system for automatically presenting a
dynamic visualization of data in a multi-dimensional space of greater
than three dimensions, the data having a plurality of attributes. The
method includes receiving a plurality of mappings of data attributes to
visualization dimensions, wherein the visualization dimensions include at
least one appearance dimension, a plurality of spatial dimensions, and at
least one sequencing dimension, determining a plurality of data entities
from the data, each data entity associated with a portion of the data,
and for each determined data entity, at a time of visualization of the
data, automatically and dynamically generating a series of
representations of the data entity in the multi-dimensional space and
automatically and dynamically presenting the generated series of
representations, the representations based upon values of each of the
data attributes of the data associated with the data entity that have
been mapped to the appearance, spatial, and sequencing dimensions, to
portray changes in the data entity over values of the sequencing
dimension so that trends in the data can be identified.

[0012]U.S. Pat. No. 6,100,901, issued Aug. 8, 2000, entitled METHOD AND
APPARATUS FOR CLUSTER EXPLORATION AND VISUALIZATION, by Mohda, et al.,
discloses a method and apparatus for visualizing a multi-dimensional data
set in which the multi-dimensional data set is clustered into k clusters,
w, each cluster having a centroid. One of two distinct current centroids
and three distinct non-collinear current centroids is selected. A current
2-dimensional cluster projection is generated, based on the selected
current centroids. Two distinct target centroids are selected (or three
non-collinear target centroids), at least one of which is different from
the current centroids. An intermediate 2-dimensional cluster projection
is generated, based on the current centroids and the target centroids.

BRIEF DESCRIPTION

[0013]In accordance with one aspect of the exemplary embodiment, a method
of generating a dynamic visualization of a multi-dimensional dataset
includes generating a set of candidate projections from the space of all
possible projections for a multi-dimensional set of data-points, the set
of candidate projections being based on the multi-dimensional dataset.
Each of the possible projections includes at least one data point. The
method further includes approximating the set of candidate projections
with a one dimensional smoothed curve and generating the dynamic
visualization based on a sequence of projections taken along the smoothed
curve.

[0014]In another aspect, a computer-implemented method for generating a
dynamic visualization of a dataset having three or more
dimensions includes receiving constraints input by a user, based on the
constraints, identifying a set of candidate linear projections for the
dataset, approximating the set of candidate linear projections as a
smoothed one dimensional curve, and automatically generating a dynamic
visualization based on a sequence of projections taken along the smoothed
curve.

[0015]In another aspect, an automated method for assisting a user in
labeling data points of a dataset includes generating a dynamic
visualization of a dataset based on a sequence of projections, each
projection comprising at least one data-point from the dataset, receiving
labels applied by a user to data-points displayed in the dynamic
visualization, based on the labels, generating constraints, based on the
constraints, identifying a set of candidate linear projections for the
dataset, approximating the set of candidate linear projections as a
smoothed one dimensional curve, and generating a new dynamic
visualization based on a sequence of projections taken along the smoothed
curve.

[0020]FIGS. 5A-F illustrate projections from a sequence of projections
generated by the exemplary method on a real dataset (Lymph) using a
semi-supervised approach; and

[0021]FIGS. 6A-F illustrate projections from a sequence of projections
generated by the exemplary method on the same dataset (Lymph) using an
unsupervised approach, demonstrating a difference in the projections
which are generated by supervised and unsupervised approaches.

DETAILED DESCRIPTION

[0022]Aspects of the exemplary embodiment relate to a system and
computer-implemented method for automatically presenting a dynamic
visualization of data-points in a multi-dimensional space of typically
three or more dimensions. The exemplary method, referred to herein as the
Adaptive Grand Tour (AGT), provides a dynamic visualization framework for
multidimensional data analysis. One objective of the system is to allow a
user to explore the complexity of a dataset by smooth animations of
data-points. This kind of data visualization is a natural way to handle
complex datasets with high dimensionality and complex shape. A tractable
alternative to Grand Tour visualization has been developed and extended
to permit the user to guide the smooth animation of data-points according
to his specific interests. The exemplary method is based on the
extraction of many optimal local 2D projections and the generation of
data-point movies with a 1-dimensional Bayesian Self-Organizing Map that
finds a fixed-length smooth path through the generated projections. In
tests of the relevance of the approach on real and artificial datasets,
the method has been shown to enable the management of complex datasets
which can benefit from a user's guidance.

[0023]In the exemplary method, the Adaptive Grand Tour approach is an
adaptation of the Grand Tour approach which improves visualization of
high dimensional (possibly infinite-dimensional) datasets, by using the
structure of the data. The method assumes that, in general, the effective
number of dimensions of a dataset is much lower than the number of
features used to describe it. By allowing a user to select the most
relevant features (or by automatically selecting projections likely to be
interesting), the method is able to target only a subset of the (more
relevant) projections that Grand Tour would otherwise explore.

[0024]In one embodiment, the method provides an automatic exploration of a
dataset, based on the selected features, e.g., as a sequence of
projections displayed to a user as a movie. In other embodiments, the
method takes into account user feedback during initial exploration to
guide subsequent visualizations toward views that are of greater interest
to the user. In some aspects, the method makes use of a kernel
representation of the data to handle non-linear projections.

[0025]An implementation of such a system has been tested on exemplary
multidimensional datasets, as described in further detail below. It
combines, in a natural way, user interaction and the visualization of an
animation. This makes a powerful tool to explore and analyze data, to
uncover underlying structure, extract important variables, and detect
outliers and anomalies.

[0026]The method takes advantage of the exploratory power of Grand Tour
visualization and resolves its main drawbacks: exploration of
high-dimensional datasets and consideration of the user interest.

[0027]As with the Grand Tour approach, the exemplary method and system
allow a user to visualize a dataset by a smooth animation of 2D
projections (or simulated 3D projections). The basic steps of the method
are illustrated in FIG. 1, and may be implemented with a computer system,
as shown in FIG. 2. The method assumes that a multidimensional data set,
e.g., having greater than three dimensions, has been input and stored.
The data set, for purposes of illustration, may comprise data-points
having a set of features which can be represented by feature values in a
number of dimensions. In some embodiments, each data-point may correspond
to a record, such as a document, image, thumbnail image, or the like. The
record may be linked to the data point and revealed when a user clicks on
an active area of the screen corresponding to the data-point or may be
displayed automatically when a projection is displayed.

[0028]In general, the method is applicable to data sets having hundreds
or even thousands of data-points, although fewer data-points may be
considered. The method begins at S100.

[0029]At S102, based on the dataset, a finite subset of projections is
generated that may be expected to be of interest to the user, e.g., based
on user-selected criteria. Each projection may be a linear projection,
although non-linear projections are also contemplated. From the set of
all possible projections, the finite subset of projections may be
generated by automated methods which focus on data-points or groups of
data-points likely to be of interest or by using a user-directed
selection of features designed to reduce the number of projections to a
subset of the possible projections. Each projection includes a set of the
data-points (at least one data point) and may be a 2 dimensional
projection in which x and y axes represent first and second of the
possible dimensions (features) of the data set respectively or a
simulated 3D projection where a third dimension is graphically
represented.

[0030]At S104, the set of projections generated at S102 is approximated by
a 1-dimensional smoothed curve (see, for example, FIG. 3). The
approximation may be performed by optimizing an objective function which
takes into account two factors: a) the proximity of the data-points to
the curve (with the object of bringing the curve as close as possible to
the data-points of interest) and b) the smoothness of the curve (which
increases as the length of the curve is reduced). For example, a
weighting factor may be used to weight the relative influence of a first
term of the function related to the proximity of data-points and a second
term related to the curve smoothness. As will be appreciated, when the
solution of the objective function is obtained through an iterative
process, such as expectation maximization, a true optimum may never be
achieved and the term "optimization" is intended to cover such cases
where the method approaches but does not fully achieve a true optimal
value.

[0031]At S106, an animation (dynamic visualization) can be assembled using
a fixed number of projections taken uniformly along the smoothed curve.
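
As an illustration of step S106, the following minimal sketch assembles animation frames from an ordered set of centroid projections, adding linearly interpolated projections between neighboring centroids. The function name `frames_along_curve`, the parameter `steps_between`, and the choice of plain linear interpolation are assumptions made for this example and are not taken from the disclosure.

```python
import numpy as np

def frames_along_curve(centroids, X, steps_between=4):
    """Project X at each centroid and at interpolated points between neighbors.

    centroids : (T, d, 2) array of projection matrices ordered along the curve.
    X         : (n, d) array of data-points.
    Returns an array of shape ((T - 1) * steps_between + 1, n, 2).
    """
    frames = []
    for a, b in zip(centroids[:-1], centroids[1:]):
        for s in range(steps_between):
            w = s / steps_between
            # Linear blend of two neighboring projections (illustrative choice).
            frames.append(X @ ((1.0 - w) * a + w * b))
    frames.append(X @ centroids[-1])
    return np.stack(frames)

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))             # n = 10 points, d = 4 features
centroids = rng.normal(size=(3, 4, 2))   # T = 3 hypothetical centroid projections
frames = frames_along_curve(centroids, X, steps_between=4)
```

Each element of `frames` is one image of the animation; indexing the frames by position along the curve plays the role of the time dimension.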

[0032]At S108, the animation is displayed to the user.

[0033]Optionally, at S110, a user reviewing the animation selects
data-points and annotates them with class labels or selects new
constraints to add. The method then proceeds to step S102, where the
class labels/constraints are automatically used to form constraints on
the sampling of projections which are used in identifying a new set of
candidate projections from the set of all possible projections.

[0034]The method ends at S112.

[0035]FIG. 2 illustrates a computer system for implementing the exemplary
method. The system includes data memory 10 for storing a data set 12
being processed. A dynamic visualization component (DVC) 14 performs
steps S102-S108. The DVC 14 may be implemented as hardware or software or
a combination thereof. In the exemplary embodiment, the DVC 14 comprises
software instructions, stored in main memory 16. A processor 18, such as
the CPU of a computer 20, which hosts the DVC, controls the overall
operation of the computer system by execution of processing instructions
stored in memory 16. The instructions comprising the DVC 14 are executed
by processor 18. Components 10, 16, 18, of the computer system may be
connected by a data control bus 22. The computer 20 includes an
input/output device 24, which is linked by communication links 26, 28 to
one or more of a display 30, such as a computer monitor, and a user input
device 32, such as a keyboard, keypad, cursor control device, or touch or
writable screen, and/or a cursor control device 26, such as mouse,
trackball, or the like, for communicating user input information and
command selections to the processor 18, such as annotations for the
data-points, constraints, or selection of one of the projections in the
sequence for viewing. Links 26, 28 may be wired or wireless communication
links and may be direct links or connections through a network, such as a
local area network or wide area network, such as the Internet.

[0036]As will be appreciated, computer 20 may comprise one or more
computing devices, such as a personal computer, PDA, laptop computer,
server computer, or combination thereof. Memories 10, 16 may be integral
or separate and may represent any type of computer readable medium such
as random access memory (RAM), read only memory (ROM), magnetic disk or
tape, optical disk, flash memory, or holographic memory. In one
embodiment, the memories 10, 16 comprise a combination of random access
memory and read only memory. In some embodiments, the processor 18 and
memory 10 and/or 16 may be combined in a single chip.

[0037]The method illustrated in FIG. 1 may be implemented in a computer
program product that may be executed on a computer. The computer program
product may be a tangible computer-readable recording medium on which a
control program is recorded, such as a disk or hard drive. Common forms
of computer-readable media include, for example, floppy disks, flexible
disks, hard disks, magnetic tape, or any other magnetic storage medium,
CD-ROM, DVD, or any other optical medium, a RAM, a PROM, an EPROM, a
FLASH-EPROM, or other memory chip or cartridge, or any other medium from
which a computer can read and use. In other embodiments, the method may
be implemented in a transmittable carrier wave in which the control
program is embodied as a data signal, e.g., transmission media, such as
acoustic or light waves, such as those generated during radio wave and
infrared data communications, and the like.

[0038]The exemplary method may be implemented on one or more general
purpose computers, special purpose computer(s), a programmed
microprocessor or microcontroller and peripheral integrated circuit
elements, an ASIC or other integrated circuit, a digital signal
processor, a hardwired electronic or logic circuit such as a discrete
element circuit, a programmable logic device such as a PLD, PLA, FPGA, or
PAL, or the like. In general, any device, capable of implementing a
finite state machine that is in turn capable of implementing the
flowchart shown in FIG. 1, can be used to implement the method for
generating a dynamic visualization of data in a multi-dimensional space
of three or more dimensions as a sequence of images of 2D or 3D
projections.

[0039]Further details of the system and method are described below.

1. PROJECTIONS GENERATION (S102)

[0040]The aim of the first step is to sample, from all possible
projections comprising at least one data point, a subset of the
projections which focus on showing projections containing data-points
which are more likely to be of interest to the user. This may be achieved
by providing automatically generated constraints to the sampling process,
user generated constraints, or a combination thereof.

[0041]Assume that the dataset $X \in \mathbb{R}^{n \times d}$ consists of
n data-points $(x_1, x_2, \ldots, x_n)$ where every data-point
$x_i \in \mathbb{R}^{d}$ can be described by d features, e.g., as a
features vector. d thus represents the number of dimensions of the
data-point cloud. For simplicity, only linear projections are considered
here (see below for non-linear projections). Let β denote the linear
projection: β is a d×2 matrix (or d×3 matrix in the case
of a 3D representation) and the projected vector is
$\Pi(x, \beta) = \beta^{T} x$, where $^{T}$ denotes the transposition
operator. The objective of the first step is to generate a
set of N candidate projections $\beta_1, \beta_2, \ldots, \beta_N$
that roughly cover the space of "interesting" projections
and which is only a subset of all possible projections B. Two types of
approach to generate interesting projections (projections in areas of
high user preference) are proposed. The first one is based on linear
dimensionality reduction, the second one is based on sampling strategies:
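
For concreteness, the linear projection defined above can be sketched as a single matrix product applied to the whole dataset. The function name `project` and the example matrix are illustrative assumptions only.

```python
import numpy as np

def project(X, beta):
    """Apply Pi(x, beta) = beta^T x to every row of X.

    X    : (n, d) array, one data-point per row.
    beta : (d, 2) array (or (d, 3) for a 3D view).
    Returns an (n, 2) array of projected coordinates.
    """
    return X @ beta

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))   # n = 5 points, d = 4 features
beta = np.zeros((4, 2))
beta[0, 0] = 1.0              # first view axis shows feature 0
beta[2, 1] = 1.0              # second view axis shows feature 2
Z = project(X, beta)
```

With this axis-aligned choice of β, the projection simply selects two of the original features; a general β mixes all d features into each view axis.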

1.1 Generating Projections Using Local Linear Embedding

[0042]The standard method to project high dimensional data-points in 2 or
3 dimensions is Principal Component Analysis (PCA). When PCA is applied
to data-points and their neighbors only, the method is called Local
Linear Embedding (LLE). With this method, for every point, its local
neighborhood is identified using a pre-defined distance metric, and the
linear projection over this neighborhood that minimizes the squared
reconstruction error when projecting the points back into the original
space is computed. In the present embodiment, three proposals for LLE are
described to generate the projections $\beta_1, \beta_2, \ldots, \beta_N$
for unsupervised, supervised, and semi-supervised
applications, respectively. For each of these methods, there is one
projection associated with every point.

[0043]1.1.1 Unsupervised setting: In this approach, exactly the same number
of projections is generated as the number of data points n. The PCA
method is applied on the neighborhood of each of the n available
data-points. Each of these projections is used in the next step.
Concerning the neighborhood, it is limited to the K closest data-points,
as determined by a distance metric. The diversity of the possible
projections depends on the size of the neighborhood K. Note that if K=n-1
were to be selected, all the data-points are neighbors to each other so
that the n projections are the same and equal to the PCA solution on the
full dataset. In an exemplary embodiment, K≦n/4, for example,
10≦K≦100. For moderate values of K (e.g., K=20), the
computed projections can be useful for visualizing the local relative
positions of similar data-points.
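
The unsupervised setting can be sketched as follows: one PCA is computed per data-point over that point and its K nearest neighbors, and the two leading principal axes form the candidate projection. This is a minimal sketch that assumes a Euclidean distance metric; the function name `local_pca_projections` is hypothetical.

```python
import numpy as np

def local_pca_projections(X, K=20):
    """Return one (d, 2) projection per data-point from PCA on its K nearest neighbors."""
    n, d = X.shape
    # Pairwise squared Euclidean distances (the metric is an assumption).
    sq = np.sum(X ** 2, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    betas = np.empty((n, d, 2))
    for i in range(n):
        # The point itself (distance 0) plus its K closest neighbors.
        idx = np.argsort(dist2[i])[:K + 1]
        local = X[idx] - X[idx].mean(axis=0)
        # Right singular vectors of the centered neighborhood are its principal axes.
        _, _, vt = np.linalg.svd(local, full_matrices=False)
        betas[i] = vt[:2].T
    return betas

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 6))
betas = local_pca_projections(X, K=10)   # 60 candidate projections, one per point
```

Choosing K=n-1 makes every neighborhood equal to the full dataset, so all n projections span the same plane as the global PCA solution.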

[0044]1.1.2 Supervised setting: When data-points are already labeled, it
may be helpful to take this information into account when computing the
candidate projections. A K-neighborhood may still be used, but PCA can be
used on the class mean vectors instead of the original data-points. Also,
as for the Fisher Linear Discriminant (FLD) method, the Mahalanobis
distance can be used to compute the principal component (see Mahalanobis,
P. C. "On the generalised distance in statistics". Proceedings of the
National Institute of Sciences of India 2 (1): 49-55 (1936)). For large
neighborhoods, the method is equivalent to the standard FLD.

[0045]1.1.3 Semi-supervised setting: In practice, few points are labeled,
especially in the interactive setting presented in section 4.1 below. A
tradeoff can be defined between the unsupervised and the supervised
approach by considering that every unlabeled point is in a separate class
and then applying the local FLD method described for the supervised setting.
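
One way to picture the semi-supervised setting is the sketch below, which computes a two-dimensional Fisher discriminant projection for a single neighborhood after giving every unlabeled point its own singleton class. The function name `local_fld_projection`, the convention that -1 marks an unlabeled point, and the ridge term `reg` added to the within-class scatter are assumptions of this example and are not specified in the description.

```python
import numpy as np

def local_fld_projection(X_nbhd, labels_nbhd, reg=1e-3):
    """2D Fisher discriminant projection for one neighborhood.

    labels_nbhd holds an integer class per point, with -1 meaning unlabeled;
    every unlabeled point is treated as its own singleton class.
    """
    labels = np.asarray(labels_nbhd).copy()
    unlabeled = np.flatnonzero(labels < 0)
    # Give each unlabeled point a fresh class id that no labeled point uses.
    labels[unlabeled] = labels.max() + 1 + np.arange(unlabeled.size)

    d = X_nbhd.shape[1]
    mean = X_nbhd.mean(axis=0)
    Sw = reg * np.eye(d)      # within-class scatter, regularized (assumption)
    Sb = np.zeros((d, d))     # between-class scatter
    for c in np.unique(labels):
        Xc = X_nbhd[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mean, mc - mean)

    # Whiten with Sw = L L^T, then solve a symmetric eigenproblem.
    L = np.linalg.cholesky(Sw)
    Linv = np.linalg.inv(L)
    M = Linv @ Sb @ Linv.T
    _, vecs = np.linalg.eigh(M)             # eigenvalues in ascending order
    W = Linv.T @ vecs[:, ::-1][:, :2]       # two most discriminative directions
    return W / np.linalg.norm(W, axis=0)    # unit-length columns

rng = np.random.default_rng(0)
X_nbhd = rng.normal(size=(200, 4))
X_nbhd[:100, 0] -= 5.0                      # class 0 shifted along feature 0
X_nbhd[100:, 0] += 5.0                      # class 1 shifted the other way
labels = np.array([0] * 100 + [1] * 100)
W = local_fld_projection(X_nbhd, labels)
```

When all points carry labels, the same function covers the supervised setting; when none do, every point is its own class and only the between-class scatter carries information.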

1.2 Generating Projections by Sampling Random Projections

[0046]A set of L constraints $C = \{c_1, \ldots, c_L\}$ given by the user
is defined. Projections are then selected automatically, based on these
constraints. Every constraint $c_l$, $l = 1, \ldots, L$, is a function from
$\mathbb{R}^{n \times 2}$ (a given 2D projection of the data-points) to [0,1]. If
$c_l(z) = 1$, where z is a candidate projection in $\mathbb{R}^{n \times 2}$, then
the lth constraint is satisfied and the projection z is valid for it. On
the other hand, the value $c_l(z) = 0$ means that the constraint is not
satisfied: this projection is not considered valid. For example, a user
may define a proximity constraint: two given points should have
projections that are no more than a pre-specified distance apart. A value of
$c_l$ between 0 and 1 can be interpreted as the degree of acceptance
for the constraint.
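
The proximity constraint mentioned above might be written as follows. This is a sketch only: the factory name `proximity_constraint`, the `softness` parameter, and the exponential decay used for the soft variant are illustrative assumptions, since the description does not prescribe a functional form.

```python
import numpy as np

def proximity_constraint(i, j, max_dist, softness=0.0):
    """Build c(z) for 'points i and j should be projected within max_dist'.

    softness = 0 gives a hard constraint (values 0 or 1); softness > 0
    decays smoothly beyond max_dist (an illustrative choice).
    """
    def c(z):
        gap = np.linalg.norm(z[i] - z[j])
        if gap <= max_dist:
            return 1.0
        if softness == 0.0:
            return 0.0
        return float(np.exp(-(gap - max_dist) / softness))
    return c

z = np.array([[0.0, 0.0], [3.0, 4.0], [0.5, 0.0]])   # a 2D projection of 3 points
hard = proximity_constraint(0, 1, max_dist=1.0)
soft = proximity_constraint(0, 1, max_dist=1.0, softness=4.0)
```

Here points 0 and 1 are 5 units apart in the projection, so `hard(z)` rejects it outright while `soft(z)` returns an intermediate degree of acceptance.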

[0047]The exemplary method aims to sample more frequently those
projections for which the degree of acceptance is highest and less
frequently those projections where the degree of acceptance is lowest.

[0048]N different candidate projections $\beta_1, \ldots, \beta_N$
are then sampled, such that the visualization of the projected data
satisfies the user constraints and gives a good overview of the structure
of the data.

[0049]Formally, let S be the set of projections satisfying the
constraints:

If only hard constraints are set, the distribution Q is the uniform law
over S, that is to say the probability of occurrence under Q of a point
in a subset s of S is proportional to

$$\frac{|s|}{|S|}$$

where |s| denotes the cardinality of the set s and |S| denotes the
cardinality of the set S. If the user chooses soft constraints, the
distribution Q is proportional to

$$\prod_{l=1}^{L} c_l\big(\Pi(X, \beta)\big).$$

[0050]Exemplary sampling strategies include:

[0051]1. Accept/reject

[0052]2. Metropolis-Hastings

[0053]In the accept-reject strategy, the space of projections B is sampled
uniformly. If the sampled projection does not satisfy the constraints,
then a new projection is generated until acceptance is achieved. This
works well only for low-dimensional spaces (in practice, for values of
d which are smaller than 10).
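
A minimal sketch of the accept-reject strategy for hard constraints is given below. It assumes that a random projection is drawn as an orthonormal d×2 frame obtained by QR decomposition of a Gaussian matrix, which is one common way to sample projection planes and is not stated in the description; the names `random_projection`, `accept_reject`, and `max_tries` are likewise hypothetical.

```python
import numpy as np

def random_projection(d, rng):
    """Draw a (d, 2) matrix with orthonormal columns via QR of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.normal(size=(d, 2)))
    return q

def accept_reject(X, constraints, n_samples, rng, max_tries=100_000):
    """Collect n_samples projections for which every hard constraint returns 1."""
    d = X.shape[1]
    accepted = []
    for _ in range(max_tries):
        beta = random_projection(d, rng)
        z = X @ beta
        if all(c(z) == 1 for c in constraints):
            accepted.append(beta)
            if len(accepted) == n_samples:
                return np.stack(accepted)
    raise RuntimeError("acceptance rate too low; consider another sampler")

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
X[1] = X[0] + np.array([2.0, 0.0, 0.0, 0.0, 0.0])   # points 0 and 1 differ in feature 0
# Hard proximity constraint: points 0 and 1 must project within distance 1.
near = lambda z: 1.0 if np.linalg.norm(z[0] - z[1]) <= 1.0 else 0.0
samples = accept_reject(X, [near], n_samples=10, rng=rng)
```

The `max_tries` guard reflects the limitation noted above: as d grows or the constraints tighten, the fraction of accepted draws can become too small for this strategy to be practical.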

[0054]For higher dimensions, the Metropolis-Hastings algorithm is a more
applicable method. This method starts with the selection of an initial
projection $\beta_0$ satisfying the constraints. The algorithm is then
iterated for indices τ=0, 1, 2, 3, . . . , as follows:

[0055]1. Generate a new projection $\tilde{\beta}$ using a proposal
distribution $q(\tilde{\beta} \mid \beta_\tau)$. In one embodiment
a Gaussian distribution is used for q, with mean $\beta_\tau$ and
covariance $\sigma^2$;

[0059]In the case of hard constraints, this procedure is guaranteed to
sample the set S uniformly. The value of σ2 may be chosen adaptively
to obtain a predetermined acceptance rate, e.g., of approximately 25%.
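
For illustration, a standard random-walk Metropolis-Hastings sampler with a Gaussian proposal might be sketched as follows. The acceptance rule shown is the textbook rule for a symmetric proposal and is an assumption of this sketch rather than a restatement of the exemplary steps; the weight callable (returning the unnormalized value of Q for a projection) and all names are hypothetical.

```python
import numpy as np

def metropolis_hastings(weight, beta0, n_steps, sigma, rng):
    """Random-walk Metropolis sampler for an unnormalized density `weight`.

    weight(beta) returns a non-negative number; for hard constraints it is
    1.0 when beta satisfies every constraint and 0.0 otherwise.
    Returns the chain of projections and the observed acceptance rate.
    """
    beta = np.array(beta0, dtype=float)
    w = weight(beta)
    if w <= 0.0:
        raise ValueError("beta0 must satisfy the constraints")
    samples, n_accepted = [], 0
    for _ in range(n_steps):
        # Gaussian proposal centered on the current projection.
        proposal = beta + sigma * rng.standard_normal(beta.shape)
        w_new = weight(proposal)
        # Symmetric proposal: accept with probability min(1, w_new / w).
        if rng.random() * w < w_new:
            beta, w = proposal, w_new
            n_accepted += 1
        samples.append(beta.copy())
    return samples, n_accepted / n_steps
```

The returned acceptance rate could be used to adjust sigma between short runs, increasing it when the rate is above the target (e.g., about 25%) and decreasing it when the rate is below.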

2. ONE-DIMENSIONAL APPROXIMATION (S104)

[0060]In the second step of the method, a series of projections is
computed that approximates the high dimensional data-cloud of N candidate
projections by a smoothed segment (curve).

[0063]where: T represents the number of centroids on the smoothed curve (a
fixed parameter, which may be, for example from about 10 to 1000 or more,
e.g., about 100 for ease of computation);

[0064]μt are the centroid mean locations;

[0065]λ is a learned parameter and is the inverse variance of the
centroids (using the exponential of the negative value in the first term
allows for optimization by maximization of the value, rather than
minimization). If λ is large the expression considers points which
are relatively far from the curve; and

[0066]α is a weighting factor. The value of α is set manually
to trade off smoothness against data point closeness in the objective of
Eqn. (1). The value of α can be, for example, 0<α<∞.
The closer α is to ∞, the closer the curve is required to be
to a straight line.

[0067]The optimal values of λ and μt are learned using
Expectation Maximization (EM). As noted above, the first term of the BSOM
objective function is a function of the covering of the space, taking
into account the user preferences. It aims to optimize the proximity of
the data-points to the closest centroid on the curve (See FIG. 3). The
curve has a fixed number of centroids spaced along its length. The
quantity ‖βi-μt‖ represents the distance between a given
projection (as represented by a data point) from the set of candidate
projections and the closest centroid μt. The second term (after
+) aims to optimize the smoothness. In the exemplary function this is
done by considering, for each three successive centroids μt-1,
μt, and μt+1, how closely these points are aligned along a
straight line.
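
One common way to measure how closely three successive centroids are aligned is the squared second difference, which vanishes when the middle centroid lies midway between its neighbors. The sketch below uses that measure purely as an illustrative assumption; it is not asserted to be the smoothness term of Eqn. (1).

```python
import numpy as np

def curvature_penalty(mu):
    """Sum of squared second differences of successive centroids.

    mu is a (T, D) array of centroid locations ordered along the curve.
    The result is 0.0 for equally spaced centroids on a straight line and
    grows as the polyline through the centroids bends.
    """
    second_diff = mu[:-2] - 2.0 * mu[1:-1] + mu[2:]
    return float(np.sum(second_diff ** 2))
```

In an objective of this general type, such a penalty would be scaled by the weighting factor α, so that larger values of α favor straighter curves.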

[0068]The EM algorithm provides a general approach to learning in the
presence of unobservable variables. In the present case, the hidden
variables are the assignment of the data-points to the clusters (defined
by their centroids). This algorithm begins with an arbitrary initial
hypothesis and then repeatedly calculates the expected values of the
hidden variables (assuming the current hypothesis is correct)
(Expectation), and then updates the parameters based on the expectations
over the hidden variables (Maximization). This procedure converges to a
local maximum of the likelihood.
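
The alternation between the two steps can be sketched for a generic mixture of isotropic Gaussians sharing one inverse variance λ and having equal mixing weights. This is a simplified stand-in chosen for illustration: it omits the smoothness term, so it is not the BSOM update itself, and the function names are hypothetical.

```python
import numpy as np

def e_step(points, mu, lam):
    """Expected assignments (responsibilities) of each point to each centroid."""
    sq = ((points[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    logits = -0.5 * lam * sq
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    resp = np.exp(logits)
    return resp / resp.sum(axis=1, keepdims=True)

def m_step(points, resp):
    """Update the centroids and the shared inverse variance (no smoothing)."""
    mu = (resp.T @ points) / resp.sum(axis=0)[:, None]
    sq = ((points[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    # Maximum-likelihood inverse variance: N * D / expected squared distance.
    lam = points.size / float((resp * sq).sum())
    return mu, lam
```

Each candidate projection would first be flattened into a row of points; the two functions are then called in turn until the centroids stop moving.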

[0069]FIG. 3 is an exemplary Bayesian Self-Organizing Map which can be
generated during the exemplary method. Data-points corresponding to
projections in higher dimensional datasets are illustrated by dots 40.
In the example, for ease of illustration, a set of 2 dimensional data is
used. The data is approximated by a 1-dimensional smoothed curve 42. The
smoothed curve 42 has a beginning at 44 and end at 46. Spaced along the
curve are centroids 48. In the AGT method, the centroids are the set of
projections that will be used to build the 2D smooth animation. BSOM
allows a non-redundant and sequential set of projections to be obtained.
Additional projections for the animation can be obtained by interpolation
between adjacent centroids. As can be seen in FIG. 3, areas shown
generally as A and B, which represent the least interesting projections,
are spaced from the curve 42 and are typically under-represented in the
sequence of projections which form the animation.

[0070]An example of a solution obtained in the AGT framework using a
multi-dimensional data-set is shown in FIG. 4. As for FIG. 3, the data is
approximated by a 1-dimensional smoothed curve 42. It can be seen that
there are more cluster centroids (the circles) where the density of
projections (the squares) is high. Also the global coverage of the
candidate projections is reasonable. It will be appreciated that,
although the curve 42 appears to pass through the same point in space
more than once, this is an artifact of the two dimensional
representation. As for FIG. 3, the curve has a beginning 44 and end 46
with centroids 48 spaced along the curve and never passes through the
same point in the multidimensional space more than once.

3. CREATION OF THE ANIMATION (S106)

[0071]The curve 42 obtained from the previous step is a piecewise linear
curve and is used to select the projections for the animation. Each
selected projection can correspond to one video frame. For example, a
given number M of projections is computed by splitting the curve into
M segments. The mean of each of these segments corresponds to a
projection that is applied to every data-point of the data-set to
obtain a low-dimensional plot. For example, if M is equal to the number
of centroids, then there is one image computed for every centroid of the
curve. M can be more than the number of centroids, by interpolating
between them.
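
The selection of M projections along the piecewise linear curve might be sketched as below. The sketch assumes equal arc-length segments and takes the arc-length midpoint of each segment as a stand-in for the segment mean; both choices, and the function name, are assumptions made for illustration.

```python
import numpy as np

def frames_along_curve(mu, m):
    """Return m points on the polyline through the centroids mu.

    mu is a (T, D) array of successive, distinct centroids (each one a
    flattened projection). The polyline is split into m pieces of equal arc
    length and the midpoint of each piece is found by linear interpolation.
    """
    seg_len = np.linalg.norm(np.diff(mu, axis=0), axis=1)
    arc = np.concatenate([[0.0], np.cumsum(seg_len)])
    targets = (np.arange(m) + 0.5) * arc[-1] / m
    # Interpolate every coordinate of the projection along the arc length.
    return np.column_stack(
        [np.interp(targets, arc, mu[:, k]) for k in range(mu.shape[1])]
    )
```

Each returned row can be reshaped into a projection and applied to the data-points to render one frame, so that choosing m larger than the number of centroids yields interpolated frames between adjacent centroids.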

[0072]The M segments may be equally spaced along the curve. An image is
generated for each projection (there is one projection for every segment)
using image rendering software (which may form a part of the DVC). The
set of generated images in the sequence defines the movie that can be
shown to the user via the display at S108. As for the Grand Tour, the
index of the projections is the time dimension. For improved
visualization, colors can be used to distinguish labeled and unlabeled
data-points, with specific colors corresponding to class labels.

[0073]As for the Grand Tour, the space-filling curve is used in the
projection space to provide a series of projections for which, for every
possible projection, the series contains at least one element in a small
neighborhood, and the sequence of projections is smooth, so that two
contiguous projections in the series give similar images. However, unlike
the Grand Tour, the method does not require that the entire space of the
data set be sampled by the projections.

4. EXTENSIONS OF THE EXEMPLARY METHOD

4.1 Interactivity

[0074]The Adaptive Grand Tour method thus described may be useful as an
interactive tool to quickly visualize complex and multidimensional
datasets. The user can guide the generation of the final animation either
by annotating items or by adding new constraints. Users work directly on
data-points with the current projection. For example, a first animation
is generated by the method described above. The user views the animation
and selects an interesting projection to work on. The user may examine a
data point and manually label it. The label then becomes a constraint,
which influences the generation of the animation when the method is
repeated. Of course, a user may decide to label several or all
data-points in a selected projection or projections before repeating the
method. As a simple example, a user may label some data-points as "cats"
and others as "dogs" based on a review of images corresponding to the
data-points. An automated learning system (not shown) can be trained on
the labeled data-points to label other data-points automatically, based
on the similarity of their features. The projection helps the user to
identify the best data-points to label for training the system. (See, for
example, U.S. patent application Ser. No. 12/080,414, filed Apr. 2, 2008,
entitled MODEL UNCERTAINTY VISUALIZATION FOR ACTIVE LEARNING, by Loic
Lecerf, the disclosure of which is incorporated in its entirety by
reference).

[0075]The annotation may be the data class label, but a user can also
define specific annotations in order to guide the AGT according to his
interest. With the generic constraints alternative, the user may use
another kind of guidance: a constraint may be expressed in terms of a
minimal or maximal distance between two or more selected items.

[0076]The overall method tends to favor retaining projections in which
data-points from the same cluster are close to each other. A user stops
the animation when he finds an interesting projection. He can then add or
remove constraints and/or labels and launch a new animation. This
iterative process is a powerful tool for exploring and discovering
interesting clusters or structure in a dataset.

[0077]The interaction with the user occurs once the movie is visualized:
some data-points are selected and annotated into categories. The
annotations are used to generate a new movie. The current animation may
be stopped at any time and annotations may be removed or refined in order
to have a better insight into a specific part of the dataset.

4.2 Non-Linear Projections

[0078]The exemplary method has been described in terms of only linear
projections of the data. It is possible to extend the approach to
non-linear projections by replacing the n×d data matrix (n is the
number of points, d the number of dimensions) by an n×n kernel
matrix with entries K(xi, xj) where K is a kernel function.
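
By way of example, an n×n kernel matrix for a Gaussian kernel could be computed as sketched below. The bandwidth parameter and its default value are assumptions of the sketch.

```python
import numpy as np

def gaussian_kernel_matrix(X, bandwidth=1.0):
    """Return the (n, n) matrix with entries exp(-||xi - xj||^2 / (2 h^2)).

    X is the (n, d) data matrix and h is the bandwidth (assumed parameter).
    """
    sq_norms = (X ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip tiny negative round-off
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))
```

The resulting matrix takes the place of the data matrix, so that each data-point is described by its n kernel values rather than by its d original features.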

[0079]Without intending to limit the scope of the exemplary embodiment,
the following examples demonstrate the applicability of the method to
existing multidimensional datasets.

5. EXAMPLES

[0080]Three data sets of the standard UCI collection were used for testing
the method: Iris, Lymph, and Segment (See D. J. NEWMAN, S. HETTICH, C. L.
BLAKE, and C. J. MERZ, UCI repository of machine learning databases, 1998,
for a description of these data sets).

[0081]Movies were prepared to compare the different modes of AGT described
in sections 1.1 and 1.2 above and show the advantages of taking into
account the labels of annotated items. For initial tests, a set of twenty
items (represented by data-points) was used and the AGT was generated
in both the semi-supervised mode and the unsupervised mode. Random annotations
were used to simulate the user guidance. For improved visualization,
colors are used to distinguish labeled and unlabeled data-points, with
specific colors corresponding to class labels (For ease of illustration
in monochrome, different shapes are used for the data-points).

[0082]In the semi-supervised mode, the twenty labeled data-points were used
to adapt the Grand Tour to user interest. FLD was used to find an optimal 2D
projection of each neighborhood. The K-Neighborhood of an item consists of
the nearest labeled item for each of the c classes plus the K-c nearest
items.

[0083]In the unsupervised mode PCA was used instead of FLD for
generating the projections. It could be seen by examination of the
animations that semi-supervised AGT focused the dynamic visualization on
user interest. For each projection of the movies, the labeled data-points
are as far apart as possible. On the other hand, unsupervised AGT gives a
broader overview of the dataset.

[0084]FIGS. 5 and 6 show screen shots of AGT with the Lymph dataset.
Data-points shown as dots represent non-annotated items. Labeled
data-points are marked with an x or + according to their classes. FIGS.
5A-F arose from application of the semi-supervised AGT method. FIGS. 6A-F
arose from application of the unsupervised AGT method.

[0085]As previously noted, Metropolis-Hastings is an alternative method for
the generation of good projections. In the animations generated with this
method, 2 pairs of data-points were chosen randomly and the AGT was
generated. The valid projections correspond to the set of projected
datasets such that the distance between every pair of labeled data-points
in different (same) clusters is greater (smaller) than a pre-specified
threshold (which was set to 25% of the dataset diameter). This complex
space was then sampled with the Metropolis-Hastings method.
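
A validity test of this kind might be sketched as follows, for illustration only. The sketch assumes that the diameter is the largest pairwise Euclidean distance in the array passed to it and that distances are measured in the projected data; the function names and argument layout are hypothetical.

```python
import numpy as np

def dataset_diameter(X):
    """Largest pairwise Euclidean distance between rows of X."""
    diffs = X[:, None, :] - X[None, :, :]
    return float(np.sqrt((diffs ** 2).sum(axis=2)).max())

def is_valid_projection(z, labeled_idx, labels, threshold):
    """Check the same-cluster / different-cluster distance constraints.

    z is the (n, 2) projected data set, labeled_idx lists the indices of the
    labeled points, and labels[k] is the cluster label of labeled_idx[k].
    """
    for a in range(len(labeled_idx)):
        for b in range(a + 1, len(labeled_idx)):
            dist = np.linalg.norm(z[labeled_idx[a]] - z[labeled_idx[b]])
            same = labels[a] == labels[b]
            if same and dist >= threshold:
                return False  # same cluster but projected too far apart
            if not same and dist <= threshold:
                return False  # different clusters but projected too close
    return True
```

A threshold equal to 0.25 times the value returned by dataset_diameter would correspond to the setting described above, and the resulting predicate could serve as the hard constraint for a sampler.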

[0086]In the videos, it could be seen that the AGT is restricted in order
to respect the constraints.

[0087]In the case of non-linear projections, the original projections were
mapped into a higher-dimensional space. A Gaussian kernel transformation
was applied. Thus an AGT was built on the transformed data.

[0088]In summary, the exemplary system and method have several advantages
over existing methods.

[0089]It allows the exploration of high dimensional data, by a set of
continuous 2D projections that try to fill the space of every possible
projection. In contrast, standard data visualization methods reduce the
dimension to display a static view of the data, resulting in a loss of
information.

[0090]In the interactive approaches of the exemplary method, the user is
allowed to work directly on data-points rather than on parameters. This
is a very intuitive way to guide the exploration process.

[0091]Some methods are based on a view that is optimal for a certain
criterion (e.g., PCA minimizes the Euclidean projection error). With the
exemplary method, a video sequence can be guided by the user's intuitions
rather than a fixed criterion.

[0092]It will be appreciated that various of the above-disclosed and other
features and functions, or alternatives thereof, may be desirably
combined into many other different systems or applications. Also that
various presently unforeseen or unanticipated alternatives,
modifications, variations or improvements therein may be subsequently
made by those skilled in the art which are also intended to be
encompassed by the following claims.