1. Motion Perception

Broad, Clay, Russell, Foster and other realists hold that motion
(and other forms of change) can be directly perceived. This
notion receives some support from findings relating to the workings of
our perceptual systems in general, and the visual system in
particular.

Just as there are regions of the brain’s visual systems that
specialize in colour (e.g. V4), there are other regions – V3 and
V5 (or MT) – that specialize in motion detection.
What is more, some of the systems and pathways devoted to motion are
– in evolutionary terms – among the more primitive parts of
the brain.[43]
Region V5 is particularly intriguing in
this respect. It seems that all its neurons are
concerned with motion in one direction or another, and none with colour
or shape; and unlike V3, the neurons in V5 are concerned with
large-scale motion detection, e.g., of whole objects, rather than mere
edges. Also, damage to V5 is associated with cerebral
akinetopsia: the severely degraded ability to perceive motion, as
found in the patient L.M. (Zeki, 1991, 2004; Rizzo et
al, 1995). The latter’s predicament was characterized
thus:

The visual disorder complained of by the patient was a loss of
movement in all three dimensions. She had difficulty, for
example, in pouring tea or coffee into a cup because the fluid appeared
to be frozen, like a glacier. In addition, she could not stop
pouring at the right time since she was unable to perceive the movement
on the cup (or a pot) when the fluid rose. … In a room where
more than two people were walking she felt very insecure and unwell,
and usually left the room immediately, because ‘people were
suddenly here or there but I have not seen them moving.’ …
She could not cross the street because of her inability to judge the
speed of a car, but she could identify the car without
difficulty. (Zihl, von Cramon & Mai, 1983: 315)

The difference between L.M.’s experience and our own could not
be clearer or more dramatic: whereas we are able to see things
moving in a smooth, continuous manner, L.M. has lost this
ability.

There is plentiful further evidence for the contention that our
brains are more than willing to generate experiences of motion.
Perhaps the most familiar is the fact that we see moving images on
cinema and television screens (and computer monitors). This may
not seem in the least surprising: aren’t these devices expressly
designed to show moving images? In fact, as already noted, the
images shown are stills, and the motion we perceive is
entirely supplied by our brains. Two main mechanisms are at work
in such cases. Suppose two spots are shown, one after the other
at different locations on a screen; if the interval between them is
sufficiently short, we will not discern the succession: the spots will
seem to occur simultaneously. This is due to the well-studied
phenomenon of visible persistence. When we are shown a
brief visual stimulus, the resulting visual experience is typically a
good deal longer than the stimulus itself: e.g., the visible
persistence of a single 1 msec flash can vary between 100 msec and 400
msec, depending on the type of flash and the adaptive state of
the eye.[44]
This effect is one reason why the brief gaps between successive images
on a cinema screen tend not to be seen. It also explains why it is
possible to write one’s name in the night sky with a moving torch (as
noted by both Leonardo and Newton). However, visible persistence
alone does not explain how it is that we see motion in the clear and
distinct way in which we do. If I wave my hand slowly back and
forth in front of my eyes in broad daylight, I do not see it followed
by a trail of lingering ghostly predecessors: my hand is cleanly
delineated, yet moving. This is explained by an effect
first noted by Exner. Returning to our spots-on-a-screen, if the
interstimulus interval is increased somewhat, so that the spots
alternate at a rate of around 20–40 frames per second (fps), something
more interesting and dramatic happens: we see
the spot moving smoothly back and forth between its left and right
positions, despite the fact that all that is really appearing on the
screen are two spots of light, at fixed locations, flashing on and
off. This is the already-mentioned phi phenomenon, also
known as ‘apparent motion’. (The latter designation
could be misleading: the motion as it appears is entirely
indistinguishable from the real thing.) Evidently, our brains are
more than happy to supply us with experiences of motion at the least
opportunity. And happily, the effect is not confined to spots of
light: it extends to sequences of complex images (e.g., photographs
taken in rapid succession of a swarming crowd of people). The
frequencies which suffice to generate smooth apparent motion are not
great: only 24 fps are shown on cinema screens, whereas television uses
30 fps and computer monitors 60 fps or above.[45]
Online examples of the simple two-spot illustration of the phi
phenomenon are easy to find, and are well worth experimenting
with.[46]
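
To give a sense of scale, the frame rates just mentioned can be converted into the time that elapses between successive still images. The short Python sketch below does only this arithmetic; the function name frame_interval_msec is an illustrative choice, not something drawn from the literature cited here.

```python
def frame_interval_msec(fps: float) -> float:
    """Return the time, in msec, between the onsets of successive frames."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# Frame rates quoted in the text above.
for label, fps in [("cinema", 24), ("television", 30), ("computer monitor", 60)]:
    print(f"{label}: {fps} fps -> {frame_interval_msec(fps):.1f} msec per frame")
```

At 24 fps a new still arrives roughly every 42 msec, which is considerably shorter than the 100–400 msec of visible persistence reported above for a single brief flash.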

No less dramatic, but somewhat less ubiquitous, is the phenomenon of
biological motion. As Johansson (1973) showed, a small number of
moving dots, appropriately arranged, will give rise to
a very vivid impression of a moving human figure. Again, online
examples are available – and
striking.[47]
Morgan sums up thus:

Human vision lies somewhere between the extremes of the [blurry]
daguerreotype and the time-frozen electronic flash. We are not normally
conscious of a blur in moving objects: nor do we see them frozen in
space-time. Instead, we see recognisable objects in motion.
Motion is a sensation that cannot be communicated by a single snapshot,
but somehow, the sensation of motion can occur without seeing an object
in many places at the same time. Motion is a specific sensation,
like colour or smell, which cannot be analysed into separate,
stationary sensations. (2003: 61)

Since perceived motion is usually a property of perceptible
objects – we cannot discern motion in the absence of
moving things – it is probably more correct to talk of motion as
a property or feature of sensations, rather than as a sensation in
its own right. But in
other respects what Morgan says here seems plausible.

2. Delays: how long does it take for a stimulus to reach (or produce) consciousness?

It is uncontroversial that the phenomenology of perception is such
that we seem to be in immediate perceptual contact with our
surroundings. ‘Immediate’ here has two relevant
connotations. Firstly, perceptual experience is seemingly
unmediated: we are not ordinarily aware of anything (any
mental representations or images) coming between ourselves and what we
see, hear and touch. Secondly, perceptual experience is seemingly
instantaneous in this sense: we ordinarily assume that we
perceive events as they happen, with absolutely no time-lag or
delay.

This assumption does not sit easily with what we know about the
workings of our perceptual systems. Light and sound both travel
at a finite speed – if you look at a distant star you are seeing
it as it was years earlier, irrespective of how present to you
the star might seem. And when the starlight finally reaches our
eyes there are several further hurdles to be crossed before any visual
experience is produced in response to it: the starlight has to
trigger the light-sensitive cells in our retinas, these cells have to
transmit signals through the optic nerve, and these signals have to be
processed by the visual centres of the brain. All this takes
time.

How much time? The delay is more difficult to measure than
might initially be thought. The obvious approach is to ask
a subject to react to a signal as soon as they perceive it – by
pressing a button, say – and then subtract the amount of time it
takes messages leaving the brain to result in a muscular
movement. However, the reliability of this approach is undermined
by the fact that we are able to react to stimuli before they become
conscious (‘blindsight’ is a familiar instance of
this). To circumvent the problem some experimental ingenuity is
required. Libet’s well-known results, deriving from direct
stimulation of the brain during neurosurgical operations, suggest that
it typically takes around half a second (500 msec) for a stimulus to
work its way through to consciousness (Libet 1993, 2004).
However, these results have also been criticized (Churchland 1981;
Gomes 1998). Pockett (2002, 2005) suggests that while as much as
500 msec may be required if complicated judgements are being made
concerning the data, in other cases stimuli can produce basic
sensations in as little as 50–80 msec. This is broadly in line
with Efron (1967), who estimates that a minimum of 60–70 msec of neural
processing time is required for simple auditory and visual stimuli
reaching the brain to result in experience. In the visual case,
Koch (2004: 260) estimates that around a quarter of a second is
typically needed to properly see an object (in the sense of
recognizing a thing as a thing of a particular
kind).

The answer to our starting question seems to be ‘it
depends’. And while those with an interest in responding to
changes in their environment in a fast, real-time manner will be
heartened to learn that Libet’s 500 msec estimate may often be on
the long side, it remains the case that a car moving at 100 kph
traverses a fair distance in 200–250 msec, which even under the best of
conditions is the sort of time needed to see and respond to a traffic
light changing to red, or someone stepping onto the road. Even at
more modest speeds processing delays will have a significant impact: a
delay of just 100 msec means that the apparent position of a
medium-paced moving ball – say 30 mph – will lag behind its
real position by over a metre.
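
The arithmetic behind these figures is simply speed multiplied by delay. The following Python sketch reproduces it under the assumption of constant speed; the helper name lag_distance_m and the conversion constants are illustrative, not taken from any of the works cited.

```python
KPH_TO_MPS = 1000.0 / 3600.0    # kilometres per hour -> metres per second
MPH_TO_MPS = 1609.344 / 3600.0  # miles per hour -> metres per second


def lag_distance_m(speed_mps: float, delay_msec: float) -> float:
    """Distance in metres covered at constant speed during a processing delay."""
    return speed_mps * delay_msec / 1000.0


# A car at 100 kph over the 200-250 msec needed to see and respond.
car_low = lag_distance_m(100 * KPH_TO_MPS, 200)
car_high = lag_distance_m(100 * KPH_TO_MPS, 250)

# A ball at 30 mph with a 100 msec perceptual delay.
ball = lag_distance_m(30 * MPH_TO_MPS, 100)

print(f"car: {car_low:.2f} to {car_high:.2f} m")
print(f"ball: {ball:.2f} m")
```

On these assumptions the car covers between about 5.6 and 6.9 metres, and the ball's apparent position trails its real position by about 1.3 metres.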

3. Thresholds of Simultaneity, Succession and Integration

James posed this question:

what then is the minimum amount of duration which we can distinctly
feel? The smallest figure experimentally obtained was by Exner,
who distinctly heard the doubleness of two successive clicks of a
Savart’s wheel, and of two successive snaps of an electric spark,
when their interval was made as small as about one five-hundredth of a
second [2msec]. With the eye, perception is less delicate.
Two sparks, made to fall beside each other in rapid succession on the
centre of the retina, ceased to be recognized as successive by Exner
when their interval fell below 0.044 second [44msec] (1890:
613)

These 19th century figures have largely survived the test
of time. Pockett (2003) recently attempted to replicate
Exner’s findings using contemporary equipment. When shown
two 1 msec flashes of LED light in succession her subjects only began
to see two flashes (rather than one) when the illuminations were
separated by at least 45–50 msec – very much in line with
Exner’s results. In the auditory case, although subjects could
tell that a two-click stimulus was (in some manner) different from a
1-click stimulus when the clicks were separated by as little as 2 msec
– the stimuli were again 1 msec in duration – they could
only begin to discern two clearly distinct sounds when the separation
rose to 10–20 msec, depending on the subject. This result
suggests Exner somewhat exaggerated our capacity for auditory
discrimination. Pöppel’s measurements also suggest as
much: irrespective of modality he found that distinct events require a
separation of at least 30 msec to be perceived as successive (1997:
57). Hirsch and Sherrick’s experiments also point in this
direction: they found that a mere 2 msec separation between sounds was
sufficient for subjects to judge that two sounds were occurring rather
than one, but an interval of the order of 20 msec was required before
subjects could reliably discern the order of the sounds (1961:
425). However, as Fraisse (1984) notes, since it took a good deal
of practice before Hirsch and Sherrick’s subjects achieved this
score, a somewhat higher figure may come closer to the norm.
Hence Pöppel’s more cautious verdict may well be closer to
the mark for most cases. Summarizing, the picture is something
like this:

Pairs of stimuli which are separated by less than the coincidence
threshold are experienced as one rather than two. Stimuli which
are separated by more than the coincidence threshold but less than the
succession threshold are experienced as two rather than one, but their
order is indistinct. It is only when the succession threshold is
reached and surpassed that stimuli appear to have a distinct temporal
ordering (Ruhnau 1995). The fact that the succession
threshold is much the same for all sensory modalities suggests that
cross-modal integrative mechanisms may well exist.
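
The three-way division just summarized can be put schematically as follows. This Python sketch is only an illustration: the default threshold values (2 msec and 30 msec) are placeholders picked from the auditory figures mentioned above, the actual values vary with modality, subject and practice, and the function name classify_pair is hypothetical.

```python
def classify_pair(interval_msec: float,
                  coincidence_msec: float = 2.0,
                  succession_msec: float = 30.0) -> str:
    """Classify how two brief stimuli separated by interval_msec are experienced.

    Both thresholds are assumed placeholder values, not settled constants.
    """
    if interval_msec < 0:
        raise ValueError("interval must be non-negative")
    if coincidence_msec > succession_msec:
        raise ValueError("coincidence threshold cannot exceed succession threshold")
    if interval_msec < coincidence_msec:
        return "one event"
    # The text does not say what happens exactly at a threshold; this
    # sketch assigns a boundary value to the higher category.
    if interval_msec < succession_msec:
        return "two events, order indistinct"
    return "two events, distinct order"


for gap in (1, 10, 45):
    print(f"{gap} msec -> {classify_pair(gap)}")
```

For the visual case one might instead pass a coincidence threshold of around 45 msec, in line with the Exner and Pockett results, together with a suitably higher succession threshold.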

Why do our brains treat stimuli which arrive over brief intervals as
simultaneous? It is by no means just a matter of
insensitivity. Not only do sound and light travel at very
different speeds, our eyes and ears work at different speeds too (our
ears are faster). Consequently, our brains have a good deal to
take into account when attempting to work out what happens when and
where on the basis of the information they receive from millisecond to
millisecond. For more on how they manage to do as well as they do
see King (2005), Kopinska and Harris (2004), Stone et al
(2001), Stetson et al (2006); see Callender (2008) for an
interesting exploration of the philosophical relevance of perceptual
‘simultaneity windows’.

In addition to the limits already mentioned there is what is
sometimes called the ‘integration threshold’.
Distinct brief stimuli which occur within the confines of the latter
can be blended (or integrated) so as to produce a single experience,
the character of which can be surprising. If a small red disk is
shown for 10 msec and is immediately followed by a 10 msec exposure of
a small green disk at the same location, the resulting experience is of
a single yellow flash. If a 20 msec blue light is
followed by a 20 msec yellow light, a single white flash is
seen (Efron 1967, 1973). It is not clear how long this integration
period lasts, but it is probably less than a quarter of a second (Koch
2004: 256).

This sort of integration is sometimes construed as a special case of
the more general phenomenon of backward masking. Visual
masking per se occurs when the appearance of one stimulus
– the target – is affected by the visibility of a
second stimulus, the mask. Backward masking is so-called
because the mask occurs after the target. In one illustration of this
effect subjects are shown brief exposures of two different shapes at
the same location, and asked to name each of the shapes. When the
interval between the exposures is between 50 and 100 msec, the second shape
is accurately identified around 70% of the time, whereas accuracy for
the shape seen first lies around 30% (Bachman and Allik, 1976).
The effect is typically at its strongest when the mask follows the
target by approximately 100 msec, and diminishes rapidly
thereafter. For a recent review of this topic see Breitmeyer and
Ogmen (2000).

4. Minimally Brief Experiences of Duration

Do the various findings concerning coincidence and succession
thresholds tell us anything about the temporal extent of minimally
brief experiences of duration or succession? Probably, but there
are complications.

We have just seen that stimuli separated by as little as 30
msec can give rise to an experience of succession. However, it
would be a mistake to move directly from this result to the conclusion
that the resulting experiences also have
an objective duration of the order of 30 msec (and a corresponding
phenomenal duration of approximately 30 msec*). As we have also
seen, due to the phenomenon known as visible persistence, the
experience resulting from a single 1 msec flash of light can vary
between 100 msec and 400 msec, depending on the brightness of the flash
and how adapted the eye is to the relevant brightness
conditions.

So what should we conclude? The literature contains a range of
estimates. Pöppel tells us that Ernst Mach, who had an interest
in the discrimination of different temporal durations, ‘observed
that there is no experience of duration for intervals that are shorter
than 40 msec. Stimuli with 40 msec duration or shorter are experienced
as “time points” ’ (2004: 296). But for the reason
just given, this does not tell us how long – in ordinary clock
time – these subjectively durationless experiences last. Drawing
on evidence from a range of experiments, Allport, Stroud and other
proponents of the ‘perceptual moment’ hypothesis (see
Section 5, below) estimated that typical experiential quanta were of
the order of 100 msec. Other estimates suggest a perceptual minimum of
longer duration. Efron was notably careful in distinguishing between the
durations of stimuli and the durations of the resulting sensory
experiences. His investigations of subjects asked to compare the
durations of immediately successive auditory and visual sensations led
him to conclude that ‘a minimal perceptual duration is produced
by all stimuli of 120–130 msec or less, and that
the duration of this minimum perception lies between 120 and
240 msec for vision and between 120 and 170 msec for audition.’
(1970b: 62). Coren, Ward and Enns sum up a confusing situation thus:
‘the best that we can say at this time is that, depending on the
specific task, the minimum perceptual duration … is probably
between 25 and 150 msec’ (2004: 351).

While Extensional models do not predict a minimum perceptual
duration, they have no difficulty accommodating it. What of
Retentional models? There is no difficulty here either.
Retentionalists may hold that individual specious presents are
experiential episodes with little or no temporal extension, but since
the contents of these experiences present (or represent)
temporally extended phenomena, they too can accommodate minimal
perceived durations.

5. Is Experience Continuous or Discrete?

In §5.2 we encountered the ‘discrete block’
conception: according to some philosophers (e.g., Bradley, Sprigge,
Whitehead) our streams of consciousness are composed of non-overlapping
sequences of Jamesian duration-blocks. To put it another way, our
experience is chunked or quantized. Although this model
has its merits – it offers a phenomenologically plausible account
of individual specious presents – it is also problematic: if
awareness of phenomenal continuity is confined to the interiors of
non-overlapping duration-blocks, it is hard to see how our experience
could be as continuous as it is often thought to be.

Something superficially akin to the discrete block model has been
entertained by a number of psychologists. According to proponents
of the ‘perceptual moment’ hypothesis, the workings of our
sensory systems are discrete rather than continuous. As Allport
puts it, the assumption is that ‘at some stage in the nervous
system, the sensory input is packaged for analysis into successive,
temporally discrete samples, or “chunks”. Underlying
this suggestion is the idea that the brain operates in some way
discontinuously in time on its inputs’ (1968: 395); also see
Stroud (1955). The duration of the chunks or moments (or frames)
is typically taken to be of the order of 100 msec.

Most proponents of this approach combine the claim that sensory
processing operates in a discrete manner with the further claim that
stimuli presented within any given perceptual moment appear
simultaneous (or are not distinguishable from one another at all); only
stimuli which are presented in distinct perceptual moments give rise to
experiences which seem successive. Thus construed, perceptual
moments are manifestly very different from the discrete specious
presents of Bradley, Sprigge and Whitehead: the contents of the latter
are typically not simultaneous, containing as they do
phenomenal duration and succession. However, as Allport suggests
(ibid. 396), there is no a priori reason why discrete
perceptual processing should lead to a complete loss of information
regarding the temporal order of stimuli occurring within individual
perceptual moments. If so, it is at least logically possible for
stimuli occurring within perceptual moments to give rise to experiences
of duration and
succession.[48]

Although there are results which can be taken as supportive of the
discrete processing model, over the past half century or so it has
gradually fallen from favour. Efron conducted a series of
experiments on auditory and visual experience to test two hypotheses:
(a) that a perception has a minimum duration, and (b) that perceptual
durations occur only in exact multiples of this minimum (the
quantization hypothesis). He found no support for the latter:
‘auditory and visual perceptions have a minimum duration,
produced by stimuli shorter than the critical duration, and that
perceptions evoked by [stimuli] longer than this critical value are continuously
graded with respect to duration’ (1970: 54). In recent
years the quantized conception has found renewed favour in some
quarters, e.g., Purves, Paydarfar & Andrews (1996) and VanRullen
& Koch (2003), but the supporting evidence is less than wholly
compelling. In a recent review Kline et al conclude
‘while quantized perception cannot be ruled out, there currently
exists little meaningful evidence in support of it’ (2004:
2658).
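
The difference between Efron's two hypotheses can be made vivid with a toy comparison. The Python sketch below is a deliberate simplification resting on several assumptions that are not in the sources: it uses a single placeholder minimum of 130 msec, it treats the critical stimulus duration as equal to that minimum, it supposes that above the minimum perceived duration simply tracks stimulus duration, and it models quantization as rounding up to the next whole multiple. The function names are hypothetical.

```python
import math


def perceived_min_duration(stimulus_msec: float, minimum_msec: float = 130.0) -> float:
    """Hypothesis (a) alone: a floor on duration, continuous grading above it."""
    return max(minimum_msec, stimulus_msec)


def perceived_quantized(stimulus_msec: float, minimum_msec: float = 130.0) -> float:
    """Hypothesis (b): durations occur only in whole multiples of the minimum."""
    multiples = max(1, math.ceil(stimulus_msec / minimum_msec))
    return multiples * minimum_msec


for stim in (50, 130, 200, 300):
    print(stim, perceived_min_duration(stim), perceived_quantized(stim))
```

On this toy model the two hypotheses agree for stimuli at or below the minimum and come apart above it: a 200 msec stimulus yields 200 msec on the first and 260 msec on the second. It is continuous grading of the former kind that Efron reports.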

In its standard form the perceptual moment hypothesis is in tension
with our ability to perceive motion. It is by no means clear how
we could perceive motion if (i) our streams of consciousness
are composed of discrete (non-overlapping) perceptual moments, and (ii)
the experiential contents of each perceptual moment are (phenomenally
speaking) simultaneous. For Crick and Koch, however, there is no
problem here at all. According to their ‘snapshot
hypothesis’ (2003: 122), not only does consciousness come in discrete
chunks, but the experience of motion is itself illusory:

Perception might well take place in discrete processing epochs,
perceptual moments, frames, or snapshots.
Your subjective life could be a ceaseless sequence of such frames
… Within one such moment, the perception of brightness, colour,
depth and motion would be constant. Think of motion painted onto
each snapshot … (Koch 2004: 264)

More recently, Herzog, Kammer and Scharnowski (2016) have outlined a
“two-stage” version of the discrete view for the case of
visual experience, according to which fine-grained perceptual
processing operates continuously at a sub-conscious level, but only
produces snapshot-like visual experiences periodically (e.g. every 400
msec or so). As we saw in §4.4, all such views face difficulties
on the phenomenological level: they need to provide a convincing
account of why, generally speaking, our experience seems so deeply
continuous, if in reality it is radically discontinuous.