2009

Kernel methods are among the most successful tools in machine learning and are used in challenging data analysis problems in many disciplines. Here we provide examples where kernel methods have proven to be powerful tools for analyzing behavioral data, especially for identifying features in categorization experiments. We also demonstrate that kernel methods relate to perceptrons and exemplar models of categorization. Hence, we argue that kernel methods have neural and psychological plausibility, and theoretical results concerning their behavior are therefore potentially relevant for human category learning. In particular, we believe kernel methods have the potential to provide explanations ranging from the implementational via the algorithmic to the computational level.

Exemplar theories of categorization depend on similarity for explaining subjects ability to
generalize to new stimuli. A major criticism of exemplar theories concerns their lack of abstraction
mechanisms and thus, seemingly, generalization ability. Here, we use insights from
machine learning to demonstrate that exemplar models can actually generalize very well. Kernel
methods in machine learning are akin to exemplar models and very successful in real-world
applications. Their generalization performance depends crucially on the chosen similaritymeasure.
While similarity plays an important role in describing generalization behavior it is not
the only factor that controls generalization performance. In machine learning, kernel methods
are often combined with regularization techniques to ensure good generalization. These same
techniques are easily incorporated in exemplar models. We show that the Generalized Context
Model (Nosofsky, 1986) and ALCOVE (Kruschke, 1992) are closely related to a statistical
model called kernel logistic regression. We argue that generalization is central to the enterprise
of understanding categorization behavior and suggest how insights from machine learning can
offer some guidance. Keywords: kernel, similarity, regularization, generalization, categorization.

Similarity is used as an explanatory construct throughout psychology and multidimensional scaling (MDS) is the most popular way to assess similarity. In MDS, similarity is intimately connected to the idea of a geometric representation of stimuli in a perceptual space. Whilst connecting similarity and closeness of stimuli in a geometric representation may be intuitively plausible, Tversky and Gati [Tversky, A., Gati, I. (1982). Similarity, separability, and the triangle inequality. Psychological Review, 89(2), 123154] have reported data which are inconsistent with the usual geometric representations that are based on segmental additivity. We show that similarity measures based on Shepards universal law of generalization [Shepard, R. N. (1987). Toward a universal law of generalization for psychologica science. Science, 237(4820), 13171323] lead to an inner product representation in a reproducing kernel Hilbert space. In such a space stimuli are represented by their similarity to all other stimuli. This representation, based on Shepards law, has a natural metric that does not have additive segments whilst still retaining the intuitive notion of connecting similarity and distance between stimuli. Furthermore, this representation has the psychologically appealing property that the distance between stimuli is bounded.

The abilities to learn and to categorize are fundamental for cognitive systems, be it animals or machines, and therefore have attracted attention from engineers and psychologists alike. Modern machine learning methods and psychological models of categorization are remarkably similar, partly because these two fields share a common history in artificial neural networks and reinforcement learning. However, machine learning is now an independent and mature field that has moved beyond psychologically or neurally inspired algorithms towards providing foundations for a theory of learning that is rooted in statistics and functional analysis. Much of this research is potentially interesting for psychological theories of learning and categorization but also hardly accessible for psychologists. Here, we provide a tutorial introduction to a popular class of machine learning tools, called kernel methods. These methods are closely related to perceptrons, radial-basis-function neural networks and exemplar theories of catego
rization. Recent theoretical advances in machine learning are closely tied to the idea that the similarity of patterns can be encapsulated in a positive definite kernel. Such a positive definite kernel can define a reproducing kernel Hilbert space which allows one to use powerful tools from functional analysis for the analysis of learning algorithms. We give basic explanations of some key conceptsthe so-called kernel trick, the representer theorem and regularizationwhich may open up the possibility that insights from machine learning can feed back into psychology.

Elimination by aspects (EBA) is a probabilistic
choice model describing how humans decide between several options.
The options from which the choice is made are characterized by
binary features and associated weights. For instance, when choosing
which mobile phone to buy the features to consider may be: long
lasting battery, color screen, etc. Existing methods for inferring
the parameters of the model assume pre-specified features. However,
the features that lead to the observed choices are not always known.
Here, we present a non-parametric Bayesian model to infer the
features of the options and the corresponding weights from choice
data. We use the Indian buffet process (IBP) as a prior over the
features. Inference using Markov chain Monte Carlo (MCMC) in
conjugate IBP models has been previously described. The main
contribution of this paper is an MCMC algorithm for the EBA model
that can also be used in inference for other non-conjugate IBP
models---this may broaden the use of IBP priors considerably.

2005

Kernel-methods are popular tools in machine learning and statistics that can be implemented in a simple feed-forward neural network. They have strong connections to several psychological theories. For example, Shepard‘s universal law of generalization can be given a kernel interpretation. This leads to an inner product and a metric on the psychological space that is different from the usual Minkowski norm. The metric has psychologically interesting properties: It is bounded from above and does not have additive segments. As categorization models often rely on Shepard‘s law as a model for psychological similarity some of them can be recast as kernel-methods. In particular, ALCOVE is shown to be closely related to kernel logistic regression. The relationship to the Generalized Context Model is also discussed. It is argued that functional analysis which is routinely used in machine learning provides valuable insights also for psychology.

In psychophysical studies, the psychometric function is used to model the relation between physical stimulus intensity and the observers ability to detect or discriminate between stimuli of different intensities. In this study, we propose the use of Bayesian inference to extract the information contained in experimental data to estimate the parameters of psychometric functions. Because Bayesian inference cannot be performed analytically, we describe how a Markov chain Monte Carlo method can be used to generate samples from the posterior distribution over parameters. These samples are used to estimate Bayesian confidence intervals and other characteristics of the posterior distribution. In addition, we discuss the parameterization of psychometric functions and the role of prior distributions in the analysis. The proposed approach is exemplified using artificially generated data and in a case study for real experimental data. Furthermore, we compare our approach with traditional methods based on maximum likelihood parameter estimation combined with bootstrap techniques for confidence interval estimation and find the Bayesian approach to be superior.

In psychophysical studies the psychometric function is used to model the relation between the physical stimulus intensity and the observer's ability to detect or discriminate between stimuli of different intensities. In this report we propose the use of Bayesian inference to extract the information contained in experimental data estimate the parameters of psychometric functions. Since Bayesian inference cannot be performed analytically we describe how a Markov chain Monte Carlo method can be used to generate samples from the posterior distribution over parameters. These samples are used to estimate Bayesian confidence intervals and other characteristics of the posterior distribution. In addition we discuss the
parameterisation of psychometric functions and the role of prior distributions in the analysis. The proposed approach is exemplified using artificially generate
d data and in a case study for real experimental data. Furthermore, we compare our approach with traditional methods based on maximum-likelihood parameter estimation combined with bootstrap techniques for confidence interval estimation. The appendix provides a description of an implementation for the R environment for statistical computing and provides the code for reproducing the results discussed in the experiment section.

In psychophysical studies of perception the psychometric function is used to model the relation between the physical stimulus intensity and the observer's ability to detect or discriminate between stimuli of different intensities. We propose the use of Bayesian inference to extract the information contained in experimental data to learn about the parameters of psychometric functions. Since Bayesian inference cannot be performed analytically we use a Markov chain Monte Carlo method to generate samples from the posterior distribution over parameters. These samples can be used to estimate Bayesian confidence intervals and other characteristics of the posterior distribution. We compare our approach with traditional methods based on maximum-likelihood parameter estimation combined with parametric bootstrap techniques for confidence interval estimation. Experiments indicate that Bayesian inference methods are superior to bootstrap-based methods and are thus the method of choice for estimating the psychometric function and its confidence-intervals.

We explored several ways to improve the efficiency of measuring psychometric functions without resorting to adaptive procedures. a) The number m of alternatives in an m-alternative-forced-choice (m-AFC) task improves the efficiency of the method of constant stimuli. b) When alternatives are presented simultaneously on different positions on a screen rather than sequentially time can be saved and memory load for the subject can be reduced. c) A touch-screen can further help to make the experimental procedure more intuitive. We tested these ideas in the measurement of contrast sensitivity and compared them to results obtained by sequential presentation in two-interval-forced-choice (2-IFC). Qualitatively all methods (m-AFC and 2-IFC) recovered the characterictic shape of the contrast sensitivity function in three subjects. The m-AFC paradigm only took about 60% of the time of the 2-IFC task. We tried m=2,4,8 and found 4-AFC to give the best model fits and 2-AFC to have the least bias.