Leslie Valiant, the T. Jefferson Coolidge Professor of
Computer Science and Applied Mathematics in Harvard University's
School of Engineering and Applied Sciences, is the recipient of
the 2010 ACM A.M. Turing Award. Valiant is best known for the
"probably approximately correct" model of machine learning, which
provided an essential framework for studying the learning process
and led to major advances in areas from artificial intelligence
to computer vision. (A profile of Valiant, "Beauty and Elegance,"
appears on p. 14.)

Let's talk about your background. What drew you to computer
science?

Growing up I was intrigued by both physics and mathematics,
and perhaps most of all by the connection between the two. At
Cambridge I studied mathematics. I had various tutors, both
graduate students and faculty, and when I could I tried to find
out what they were working on. Computer science was a way of
trying a different direction. My Ph.D. thesis at Warwick was on
questions of computability suggested by my advisor, Mike
Paterson. In retrospect, this was a good area to cut one's teeth
on as it was so quintessentially about computation. Just how rich
the field is became clear to me only much later. I consider it my
very good fortune to have found my way into it.

How did you first get interested in machine
learning?

After my Ph.D., I worked in algorithms and complexity theory.
I observed work in artificial intelligence only from a distance.
I had accepted the argument that humans provided an existence
proof that intelligent behavior is possible by computational
mechanisms. However, I recall being puzzled that so many people
could be working with such confidence in an area where the
concepts seemed so ill-defined. Several topics were studied
within AI at that time: reasoning, expert systems, planning,
natural language, robotics, vision, and learning. But
which of these was the most basic? Which one had some fundamental
definition? I convinced myself somehow that the answer must be
learning, but I am not sure how I got there.

Describe how you came to develop the "probably
approximately correct," or PAC, model of machine
learning.

The main problem, as I saw it, was to specify what needs to be
achieved if a learning algorithm is to be declared effective. It
was clear that this definition would need to be quantitative if
it was to mean anything, since with infinite data and computation
there is no real problem. Existing statistical notions, such as
Bayesian inference, did not seem enough because they did not
distinguish quantitatively between what is learnable in practice
and what is not. The final formulation emerged quite rapidly. The
state of complexity theory at the time made this possible for
someone asking the right questions. The main point of PAC
learning is that while it has a statistical component, because
uncertainty needs to be allowed for, it is the computational
aspect that gives it substance, since without that essentially
everything would be classified as learnable.
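
To make the quantitative requirement concrete, here is a minimal
illustrative sketch in Python, not taken from the interview: a PAC
learner for the toy class of one-dimensional thresholds, where
drawing m >= (1/epsilon) * ln(1/delta) examples suffices for the
hypothesis to have error at most epsilon with probability at least
1 - delta. The oracle and its threshold of 0.3 are assumptions made
for the demo.

    import math
    import random

    def pac_learn_threshold(sample_oracle, epsilon, delta):
        # Toy PAC learner for f(x) = 1 iff x >= c, x uniform on [0, 1].
        # m >= (1/epsilon) * ln(1/delta) samples ensure that, with
        # probability >= 1 - delta, the error region [c, s) has
        # probability mass at most epsilon.
        m = math.ceil((1.0 / epsilon) * math.log(1.0 / delta))
        s = 1.0  # smallest positive example seen so far
        for _ in range(m):
            x, label = sample_oracle()
            if label == 1 and x < s:
                s = x
        return lambda x: 1 if x >= s else 0

    # Hypothetical target concept with true threshold 0.3.
    def oracle():
        x = random.random()
        return x, (1 if x >= 0.3 else 0)

    h = pac_learn_threshold(oracle, epsilon=0.05, delta=0.01)
    wrong = sum(h(x) != y for x, y in (oracle() for _ in range(10000)))
    print("empirical error:", wrong / 10000)  # typically well below 0.05

The computational point is visible even in this toy: the statistics
bound the sample size, but it is the existence of an efficient
algorithm consuming those samples that makes the class learnable.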

Can you tell me about some of the computational models you
developed in conjunction with PAC?

I'll describe the one model that led to consequences far
greater than I could have originally imagined. In the course of
work that sought to explain why certain easy-looking learning
problems, such as learning regular languages, had proved so
intractable, Michael Kearns and I formulated the notion of weak
learning. This was essentially a gambler's version of PAC
learning, where it was sufficient to make predictions with
accuracy just better than random guessing, rather than with near
certainty. The question of whether this was as powerful as PAC
learning became a natural open problem. Successive contributions
by Kearns, Schapire, and Freund not only resolved this question
in the very unexpected positive direction, but yielded the
boosting technique, perhaps the most powerful general learning
method now known. This technique, somewhat magically, has been
used to improve the performance of almost any basic learning
algorithm on real datasets. This is an excellent example of how a
theoretical question about abstract models can get at the heart
of an important phenomenon and yield rich practical
dividends.
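
As a concrete illustration of boosting, here is a compact sketch of
AdaBoost, the algorithm that emerged from the Kearns-Schapire-Freund
line of work; the decision-stump weak learner and the data format
are assumptions made for this example, not details from the
interview.

    import math

    def adaboost(examples, weak_learn, rounds=20):
        # examples: list of (x, y) pairs with labels y in {-1, +1}.
        # weak_learn(examples, w) must return h with weighted error < 1/2.
        n = len(examples)
        w = [1.0 / n] * n                    # uniform starting weights
        ensemble = []                        # list of (alpha, h) pairs
        for _ in range(rounds):
            h = weak_learn(examples, w)
            err = max(sum(wi for wi, (x, y) in zip(w, examples)
                          if h(x) != y), 1e-12)
            alpha = 0.5 * math.log((1.0 - err) / err)
            ensemble.append((alpha, h))
            # Re-weight so misclassified examples count for more.
            w = [wi * math.exp(-alpha * y * h(x))
                 for wi, (x, y) in zip(w, examples)]
            total = sum(w)
            w = [wi / total for wi in w]
        return lambda x: 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1

    def stump_learner(examples, w):
        # Weakest of weak learners: the best 1-D threshold stump.
        best = None
        for t in sorted({x for x, _ in examples}):
            for sign in (1, -1):
                err = sum(wi for wi, (x, y) in zip(w, examples)
                          if (sign if x >= t else -sign) != y)
                if best is None or err < best[0]:
                    best = (err, t, sign)
        _, t, sign = best
        return lambda x, t=t, s=sign: s if x >= t else -s

    # Usage on a toy 1-D dataset (an interval, which no single stump fits):
    data = [(x / 100.0, 1 if 0.3 <= x / 100.0 <= 0.7 else -1)
            for x in range(100)]
    strong = adaboost(data, stump_learner, rounds=40)

Each round forces the weak learner to concentrate on the examples
the current ensemble gets wrong, which is how accuracy just better
than random guessing is amplified toward PAC-level accuracy.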

You made a number of important contributions to parallel
computing in the 1980s. Can you tell me about your more recent
work in that arena?

The root problem with parallel machines has not been that any
one is inherently bad, but more that it is difficult to make
sense of them as a totality. The most striking fact about
sequential machines is that they are all the same: they
emulate the von Neumann abstraction. This sameness has been vital
to the growth of computing since it has enabled us to write
portable software for this abstraction and make machines to
emulate it. Thus the von Neumann model has served as the vital
"bridging model" between the software and hardware.

I have long been interested in the question of whether a
similarly effective bridging model exists for parallel machines
and what that might be. Parallelism has now arrived with multiple
cores in our laptops. My view is that the irresistible driving
forces that have brought this about have been the imperatives of
physics. It follows, I think, that we have to look to the driving
physical limitations to see what is most fundamental about
parallel computing. Computation, communication, synchronization,
and memory all need various resources, which in turn are limited
by the physical arrangements of the devices. The sought-after
bridging model has to account for these costs. We have been
spoiled over the last 60-odd years by having an abstraction as
simple as the von Neumann model suffice. Life will become a
little more complicated now. The absence of an accepted bridging
model is evident even at the theoretical level of algorithms.

My most recent work addresses these issues via a particular
proposed bridging model, called the Multi-BSP, which tries to
capture the physical limitations of machines as simply as
possible. It uses barrier synchronization in a hierarchical
manner and idealizes a machine as a set of numerical parameters
that specify a point in the space of possible machines. It
deliberately omits the many architectural details of current
machines that are not inevitably suggested by physics.
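
For concreteness, a sketch of how such a machine description might
look in code is below; the per-level parameters (p, g, L, m) follow
Valiant's Multi-BSP papers, but the example machine and the cost
formula are illustrative assumptions.

    from dataclasses import dataclass

    @dataclass
    class Level:
        p: int      # number of subcomponents at this level
        g: float    # per-word cost of communicating with the level above
        L: float    # cost of a barrier synchronization at this level
        m: int      # memory/cache available at this level

    # A made-up two-level machine: 4 chips, each with 8 cores.
    machine = [
        Level(p=8, g=1.0,  L=50.0,   m=1 << 20),  # cores within a chip
        Level(p=4, g=10.0, L=5000.0, m=1 << 30),  # chips in the machine
    ]

    def superstep_cost(level, compute, words):
        # One BSP-style superstep at a given level: local computation,
        # then communication, then a barrier synchronization.
        return compute + words * level.g + level.L

An algorithm analyzed against the parameter vector, rather than
against any one machine, is portable across the whole space of
machines in exactly the bridging-model sense.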

Let's talk about your most recent work in computational
neuroscience. Can you explain the "neuroidal model" you developed
in your book Circuits of the Mind?

The neuroidal model is designed to describe how basic
computational tasks are carried out in the brain at the level of
neurons. We do not know what changes the brain undergoes when it
memorizes a new word or makes a new association. We need some
language for describing the alternative algorithms that a network
of neurons may be implementing. Memorizing a new word is easy
enough for a computer, but it is still mysterious how the brain
does it. Devising algorithms that do not obviously contradict
biology is not easy, because biology appears very constrained.
Each neuron is connected to only a small fraction of the others,
and its influence on each of those is believed to be weak on the
average.
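
As a purely illustrative toy, not Valiant's actual neuroidal model,
the sketch below sets up a sparse random network of the kind he
describes and forms a candidate representation for a new
association out of neurons that happen to receive input from both
of two stored items, loosely in the spirit of the JOIN primitive
from Circuits of the Mind; all sizes and the wiring scheme are
assumptions.

    import random
    from collections import Counter

    N = 100_000      # neurons (hypothetical)
    FANOUT = 0.001   # each neuron synapses onto ~0.1% of the others
    ITEM_SIZE = 50   # an item is a small set of firing neurons

    def random_item(rng):
        return set(rng.sample(range(N), ITEM_SIZE))

    def neighbors(neuron):
        # Deterministic pseudo-wiring: a fixed sparse random fan-out.
        return random.Random(neuron).sample(range(N), int(N * FANOUT))

    def join(item_a, item_b, k=1):
        # Neurons receiving at least k synapses from each item become
        # the candidate representation of the new association.
        in_a = Counter(d for s in item_a for d in neighbors(s))
        in_b = Counter(d for s in item_b for d in neighbors(s))
        return {n for n in in_a if in_a[n] >= k and in_b[n] >= k}

    rng = random.Random(0)
    a, b = random_item(rng), random_item(rng)
    print(len(join(a, b)))  # a few hundred neurons, in expectation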

You also developed a formal system called robust logics
that seeks to reconcile what you've described as "the apparent
logical nature of reasoning and the statistical nature of
learning."

Robust logic is aimed at tasks which we do not understand how
to do on any computational device, let alone the brain. The tasks
in question are those of reasoning using information that is
learned from experience and therefore uncertain. Humans manage to
make effective decisions even in an uncertain world and even with
only partial knowledge. This skill is close to the heart of
intelligence, I believe, and understanding it better raises some
fundamental scientific questions.

How does computation in the brain differ from computation
in a computer?

At the level of neurons the organization is clearly very
different: there appears to be no global addressing
mechanism for a start. However, there is a good chance, I
believe, that at the next level up from neurons, some basic
primitives are realized which we will be able to describe in more
familiar computational terms. This is the enterprise that the
neuroidal model is designed for.

The brain has some general computational character, which we
ought to be able to decipher in the foreseeable future. It also
contains more specific information acquired through evolution and
learning that may be more difficult to uncover. Evolution and
learning are subtle adaptive processes, and to emulate their
results, we have to understand both the algorithms they use and
the environments in which the evolution and learning took
place.
