The integration of speaker and listener responses: a
theory of verbal development.

Abstract:

We provide an empirically updated Skinnerian-based account of
verbal behavior development, describing how the speaker-as-own-listener
capability in children (the capability of children to behave as speaker
and listener within their own skin) accrues and how it is pivotal to
becoming verbal. The theory grew from (a) findings in experiments with
children with and without language delays and (b) findings from research
devoted to the identification of derived and emergent behavior (i.e.,
novel, creative, and spontaneous behavior). Experiments identified
preverbal instructional histories leading to separate listener and
speaker capabilities and experiences that joined the listener and the
speaker. Once this learned intercept is present, children engage in
conversational self-talk, engage in say-do correspondence, and acquire
new vocabulary without direct instruction. These developmental
capabilities make it possible for most complex behavior to be learned,
including reading, writing, emission of novel tenses and suffixes, and
the following of and construction of complex algorithms.

"We need separate but interlocking accounts of the behaviors
of both speaker and listener if our explanation of verbal behavior is to
be complete. ... In many important instances the listener is also
behaving at the same time as a speaker" (Skinner, 1957, p. 34).

Contemporary accounts suggest that to be truly verbal, the speaker
must also simultaneously behave as a listener (Barnes-Holmes,
Barnes-Holmes, & Cullinan, 2001; Greer & Ross, 2008; Horne &
Lowe, 1996). But how does that ability develop within the child's
life? A growing body of evidence suggests a developmental sequence by
which the initially independent speaker and listener classes of responding
come to be integrated within the individual. Verbal behavioral
development refers to children's experientially acquired
capabilities to learn and be taught new relations, to learn multiple
responses and multiple stimulus control from a single experience, to
learn at a faster pace, and to learn in ways they could not prior to the
attainment of verbal developmental capabilities. In this article we
describe findings and methods from a program of research that led to the
identification, and induction, of verbal developmental capabilities that
were missing in children, and these served as the basis for our theory.
The theory was made possible by, builds on, and complements research and
theory that identified sources of complex human behavior, including
verbal behavior (Barnes-Holmes et al., 2001; Barnes-Holmes, Healey,
& Hayes, 2000; Catania, 2007; Hayes, Barnes-Holmes, & Roche,
2001; Hayes & Hayes, 1989; Horne & Lowe, 1996; Sidman, 1986).

Findings and theories from each of these programs, and interactions
with scientists involved in each of these programs of research, led to a
greater emphasis in our research on the listener functions in verbal
behavior, and a resurgence of a longtime interest in
speaker-as-own-listener (Lodhi & Greer, 1989). Relational Frame
Theory (Hayes et al., 2001), Naming Theory (Horne & Lowe, 1996), and
Stimulus Equivalence Theory (Sidman, 1986, 1994) have suggested the
importance of the listener and related observing responses, as well as
potential experiential sources for emergent behavior. Experiments that
manipulated instructional histories subsequently identified preverbal
foundational developmental cusps, speaker and listener cusps, and verbal
capabilities or stages, together with protocols to advance them or
induce them in children in whom they were missing (Greer & Keohane,
2005; Greer & Ross, 2008).

The protocols that we identified from the experimental program
suggest new practices for advancing children's verbal development
and for changing how children can learn and be taught. Most of the
procedures from this research have been replicated
with children in CABAS® research and development schools (Selinski,
Greer, & Lodhi, 1991) numerous times with considerable success
(Greer & Keohane, 2005). A recent book (Greer & Ross, 2008)
describes practices that professionals can use to advance
children's verbal development. However, the book also contains an
empirically based theory of language development that may be of interest
to a wide range of psychologists who are not the primary audience for
the book. In this article we describe the theory emphasizing the joining
of speaker and listener, particularly speaker-as-own-listener (Skinner,
1957). The concept of the speaker-as-own-listener is at the heart of
anecdotal linguistic evidence about what some linguists as well as
contemporary scholars of verbal behavior believe is unique and novel
about language functions (Barnes-Holmes et al., 2001; Crystal, 2006;
Greer & Ross, 2008; Hayes et al., 2001; Horne & Lowe, 1996).

Indeed, this concept may be the critical gateway to complex human
verbal behavior. We argue that it is also a critical stage of behavioral
development, and a growing body of evidence suggests (a) the
foundational behavioral development that makes the intercept of
speaker-as-own-listener possible and (b) the subsequent learning and
development that is made possible by this intercept. We first present
the evidence that the speaker (a production response) and the listener
(an observing response) are initially developmentally independent.

Observation and Production

An individual becomes verbal when the speaker and listener
capabilities are joined within him or her. On the way to becoming
verbal, infants and young children come to observe certain stimuli
within their environment. Biological preparedness and adventitious
experiences with certain stimuli allow the stimuli to select out the
observing responses. These stimuli may be auditory, visual, tactile,
olfactory, or gustatory stimuli, and all senses may be involved in
observational responses (Keohane, Pereira-Delgado, & Greer, 2009).
Observational responses consist of listening, looking, touching,
smelling, and tasting. The listener response, as one type of
observational response, is of particular importance to our analysis and
thus deserves special attention.

Skinner (1957) spoke of the listener's role in verbal behavior
as providing "the conditions we have assumed in [italics added]
explaining the behavior of the speaker" (p. 34). More recent
accounts include analyses of the role of the assumed listener
(Barnes-Holmes et al., 2001; Greer & Ross, 2008; Hayes et al., 2001;
Horne & Lowe, 1996). The listener mediates and shapes the behavior of
the speaker, and sometimes that speaker is the listener himself or
herself. Thus, a verbal person is not a "processor" of
language or a "retriever" of information stored in long- or
short-term memory, but instead is one who observes stimuli in a specific
way and responds appropriately to them. For purposes of this analysis,
listening is interchangeable with reading, because both are part of the
observing side of verbal behavior. The listener and reader have their
senses extended by listening and reading.

Verbal production responses are, for the purposes of this analysis,
interchangeable with speaking and writing. The speaker produces behavior
that functions to mediate between the environment and the listener
(Skinner, 1957). A listener, or an audience, that serves to consequate
the speaker governs the speaker's behavior. Speaker behavior
differs from other behavior in that listeners mediate the contingencies
that affect the speaker. For example, instead of reaching for something,
a speaker can ask a listener to hand the item to the speaker. Hence the
listener mediates for the speaker by handing the item to her or him.

Skinner (1957) described verbal thinking as the "speaker and
the listener in the same skin" (p. 11). The degree to which a
speaker is able to mediate his or her own speaker behavior is dependent
on the degree to which the speaker listens to his or her own speaker
behavior. When the speaker and the listener become joined, there is
evidence to suggest that this joining is a function of experiences that
occasion the intercept (Greer & Ross, 2008; Greer, Stolfi,
Chavez-Brown, & Rivera-Valdes, 2005).

The Initial Independence of Observing and Producing Responses

One of the paradigmatic shifts (an expression we do not use
loosely) introduced by Skinner's (1957, 1986) theory of verbal
behavior was his position that the speaker and listener capabilities
were, at least initially, independent types of responding, including
their independent evoking, eliciting, and consequating controls. Other
theoretical treatments of the listener and speaker capabilities relegate
them to a single language entity with receptive and expressive
attributes (Crystal, 2006; Pinker, 1999). Because our focus is on
environmental sources rather than psychological constructs, we avoid the
use of the terms receptive and expressive, as we are talking about the
behavioral capabilities of listening and speaking between individuals
and within one's own skin--much more is involved here than
receiving and sending.

We define verbal behavior as the language functions of both speaker
and listener as the individual functions with others and within his or
her own skin. However, the "behavior beneath the skin" that we
discuss is a covert version of the overt, not a psychological construct
(Uttal, 2001). Study of the physiological behavior beneath the skin will
eventually be joined with the overt behavior (Barnes-Holmes et al.,
2005; Dickins, 2005) as simply different sides of the same phenomenon.
Both sides are important. Linguists provide necessary structural
analyses (Crystal, 2006), and that is their unique contribution to an
account of language. The unique contribution of verbal behavior analysis
is to provide an account of the behavior-environment relations relative
to how the speaker affects the behavior of others and how the
listener's environment is mediated by the speaker. The structural,
functional, and neurophysiological accounts are all necessary for a more
complete treatment of language. It should be noted that vocal or oral
language is not synonymous with our subject matter of verbal behavior,
either, because verbal behavior may occur in various topographies (e.g.,
sign language, Morse code, smoke signals, drum beats, logograph symbols,
hieroglyphics). However, as we will see later, vocal verbal behavior has
advantages for advancing verbal development over other topographies
(Karchmer & Mitchell, 2003; McGuinness, 2004; Robinson, 1995).

We treat listening and speaking as separately evolved physiological
capabilities, whose joining is key to several contemporary accounts of
verbal behavior and our developmental account. The presence of the
independent physiological underpinnings (Davidson, 1978) of verbal and
nonverbal behavior, including the implicit respondent and operant
control, was likely determined by natural selection. These underpinnings
then allowed for the adventitious emergence of verbal behavior from
cultural contingencies that drew on respondent and operant physiological
capabilities (Catania, 2001). This view of the role of environmental and
cultural contingencies in the evolution of language has gained
considerable cross-disciplinary acceptance (Culotta & Hanson, 2004),
whereby the science of verbal behavior joins with linguistics, anatomy,
physiology, anthropology, neuropsychology, and other disciplines. In our
study of language as behavior-environment relations, we seek to know how
independent behaviors, and their separate controls, were joined by
experience in the development of key verbal behavior capabilities within
the life span of the individual (Greer & Keohane, 2005/06).

Anecdotal and Empirical Evidence of Separate Listener and Speaker
Capabilities

One may distinguish the sounds of an unfamiliar language as
different from one's own but still not be capable of responding to
these sounds as either listener or speaker within that language. The
expression "It's Greek to me" says this well. Moreover,
one might even learn to translate the language by relating print in
another language to print in English (without the auditory component)
without being able to "understand" the language as a listener
or emit speaker responses (Hayes et al., 2001). To be truly verbal, the
listener and speaker must be joined (Barnes-Holmes et al., 2001). How the
speaker and listener functions come to be joined is central to an
account of verbal development.

Several reports have identified children with and without
disabilities for whom listener and speaker functions were initially
independent (Feliciano, 2006; Gilic, 2005; Greer & O'Sullivan,
2007; Greer, Stolfi, et al., 2005; Greer, Stolfi, & Pistoljevic,
2007; Horne, Lowe, & Randle, 2004; Lee, 1981; Lowe, Horne, &
Hughes, 2005; C. Sundberg & Sundberg, 1990; Tsiouri & Greer,
2007). It also appears that certain listener and speaker capabilities
develop differently and even at different rates within one's life
span. It is possible, and maybe even probable, that much of language
development involves the differential processes joining these two
capabilities.

Methods for Identifying Verbal Capabilities in Experiments

If we are to understand verbal development, it is necessary to
identify how the listener and speaker functions are formed and then
joined by experiences. Clearly this cannot be accomplished via
descriptive analyses of the covariance between age and verbal
capabilities alone. Experiments are necessary, but how can they be done?

One approach is to compare the behavior of different species with
that of humans (Heyes & Galef, 1996; Zentall, 1996). Another
approach is to induce language functions in primates (Premack, 1976,
2004; Savage-Rumbaugh, 1984) or simulate verbal behavior with other
species (Epstein, Lanza, & Skinner, 1980). However, these lines of
evidence do not tell us how verbal behavior develops within the life
span of the human.

The best scientific alternative would be to discover a tribe of
preverbal humans and test various environmental interventions that lead
to the emergence of verbal behavior. Interestingly, something like this
does exist. Our own species has both verbal and nonverbal members. Given
that fact, if we can identify ways to induce verbal behavior when it is
missing in members of our own species, we can provide within-species
evidence.

Within-Species Experiments on Incremental Steps

The potential for incremental experimental analyses became possible
with the development of behavior-analytic interventions to induce
speaker and listener functions in individuals who, most likely, would
not have become speakers and listeners (Guess & Baer, 1973; Lovaas,
1977; Ross & Greer, 2003; M. Sundberg, Michael, Partington, &
Sundberg, 1996; Tsiouri & Greer, 2003; Williams & Greer, 1993).
These interventions made it feasible to study the incremental steps
involved in how the listener and speaker functions intercept. Evidence
is now available to show that the two are initially independent but
become joined (Greer & Ross, 2008).

Listener and Speaker Distinctions and Verbal Development

The distinctions between listener and speaker were particularly
helpful in attempts to remediate language deficits. For example, it is
now common for children with severe verbal delays to be taught good
functional speech-production repertoires using well-known behavioral
procedures, particularly the emissions of mands, which are requests or
words that specify reinforcers (Skinner, 1957), and yet continue to lack
important listener responses. For example, a child may have mands in his
or her repertoire but not be influenced by the phonemic sounds emitted
by others (Greer, Chavez-Brown, Nirgudkar, Stolfi, & Rivera-Valdes,
2005). In Greer et al.'s study, eight children with mand
repertoires could not respond to simple vocal instructions when no
visual cues were present. In the experimental intervention, the only way
that the children could be reinforced was by accurately responding to
the consonant-vowel sounds of the experimenter's speech. After this
intervention, the children could respond as listeners. Also, for
children with autism spectrum disorders, it is possible that problems
with the more advanced listener capabilities contribute to what is
characterized as "social deficits," the lack of a "theory
of mind," or a kind of "choice to withdraw." We think
that part of these problems is tied to these individuals' lacking
the conditioned reinforcement control for functioning as a listener, and
other related observing responses (see Reilly-Lawson & Walsh, 2007,
for evidence to this effect).

The speaker's verbal function, as identified by Skinner
(1957), is for the speaker to have his or her environment
"mediated" by a listener. But the speaker mediates for the
listener, too (Greer & Ross, 2008; Hayes et al., 2001; Horne &
Lowe, 1996). The reinforcement for a listener is the extension of the
listener's senses from the behavior of the speaker (Skinner, 1957,
1986). The listener profits when a speaker warns the listener of
consequences, as in "It is raining," "The food is
terrible there," "You must see the huge moon," "Your
friend may not be a real friend," or "I am upset." In
addition, the acquisition of a new tact (Skinner's term for the
direct control of a stimulus, as in seeing a stimulus and saying its
"name") reinforces future listening to recruit tacts (as a
function of prior reinforcement for emitting tacts). It is also possible
that the reinforcement for being a listener needs to be present if one
is to have empathy (Barnes-Holmes et al., 2000; Reilly-Lawson &
Walsh, 2007).

Differences and Similarities Between Listening and Other Types of
Copying

Complex Vocal Copying in Speaker, Nonverbal, and Verbal Functions

Echoics are vocal responses that have point-to-point correspondence
with the vocal emissions of other speakers and that come to serve verbal
functions (Skinner, 1957). A child may point to a toy and attempt to
gain access to it. If a parent holds the toy while saying
"toy" and the child then says "toy" in order to gain
the toy, this is an example of an echoic response, in that the copying
moves to a mand function. Indeed, the very first vocal copying responses
are instances of what Skinner described as parroting. Parroting is not a
speaker operant, although parroting is dependent on hearing the
correspondence between what is heard and what is said. Parroting (e.g.,
repetitions of saying "mama" that have no functional effect
for the speaker relative to a listener) may then lead to serendipitous
effects on the behavior of a listener and result in speaker operants
such as mands and tacts. When a short history of vocal copying results
in speaker effects on a listener, the parroting shifts to echoic operant
functions. Whereas Skinner characterized these speaker operants as being
verbal, others have suggested that they are not fully verbal if the
speaker is not simultaneously a listener (Barnes-Holmes et al., 2001).

Very young children and children with language deficits may have
speaker operants (Ross & Greer, 2003; Tsiouri & Greer, 2003, 2007).
However, having these operants does not mean that a child has
the speaker-as-own-listener capability needed to be fully verbal
(Barnes-Holmes et al., 2000; Greer, Stolfi, et al., 2005; Greer, Stolfi,
et al., 2007; Horne & Lowe, 1996).

Nevertheless, the speaker operants are critical in advancing the
special mediated reinforcement that separates verbal observe-and-produce
effects from other types of human observation-and-production relations
(e.g., see-do, hear-sing, see-draw). The speaker operants are necessary
for establishing one of the mediating functions of verbal behavior. The
echoic is a necessary speaker operant, and it can be induced as a result
of special arrangements for joining see-do and hear-say as a higher
order copying class (Greer & Ross, 2008; Ross & Greer, 2003;
Tsiouri & Greer, 2003). A higher order operant results when two or
more previously independent operants join as an overarching operant
(Catania, 2007).

Ross and Greer (2003) and Tsiouri and Greer (2003) reported
procedures that resulted in children acquiring functional speech who had
never spoken, one of whom was 9 years old. This occurred when
opportunities to echo were preceded by teaching them generalized
imitation or the capability to imitate novel behavior. This new
capability for generalized imitation was then used as a part of an
intervention to induce speech. Several body-imitation responses (see-do)
were alternated with the opportunity to echo words in mand functions
under relevant deprivation conditions for emitting the mand
(echoic-to-mand). The observing-and-producing responses for imitation
are independent because seeing-and-doing involves (a) seeing someone
perform a response, (b) emitting the response, and (c) observing the
visual correspondence between one's own response and the response
of the other. This is not easy, as anyone who has ever tried to learn a
new dance step can attest. However, whereas hearing-and-saying is a
copying response, as is seeing-and-doing, acquiring the correspondence
between what is observed and one's own production is more difficult
for the hear-and-say relations because one only "observes" the
speech of another aurally. One can observe only the outcome because the
process remains hidden from view, for the most part. We must match the
observed speech with our own sounds, without the advantage of seeing how
it can be done (Vargas, 1982). Moreover, we must observe the
correspondence between our spoken sounds and the speech of others in the
process of emitting the echoic. Indeed, the echoic is similar to what
has been identified in the social learning research as emulation (Heyes
& Galef, 1996). In emulation, the product is copied, whereas in
imitation, the process is copied. However, what the see-do and hear-say
relations have in common is that they both involve correspondence
between observing and producing as an overarching class of operants.
Tsiouri and Greer (2007) found these two different
observing-and-producing classes to be initially independent. However, in
the experiments by Ross and Greer (2003) and Tsiouri and Greer (2003),
they were joined into an overarching operant as a result of multiple
exemplar rotations across the different classes (i.e., generalized
imitation and the echoic-to-mand conditions). The new relation is a
higher order cross-modal relation involving the joining of two different
classes of observing-and-producing responses.

Although higher order operants occur within either of the speaker
categories, special benefits accrue with the emergence of higher order
operants across speaker and listener responding that lead to being fully
verbal. This is made possible by more advanced listener roles in the
speaker-as-own-listener capability. A more advanced listener results
when a child incidentally acquires new vocabulary from hearing others
"label" or tact aspects of the environment. This special
listener capability of acquiring the tact, or label for an object,
without direct instruction is a much more advanced listener capability,
as we will describe later.

Differences in Reinforcement Effects of Verbal and Nonverbal
Observation and Production

Although observing and emulating are important for language
functions, observing and producing correspondences are found in other
types of complex human behavior, as in the making of music, dance, or
visual art. The reinforcement for the latter is direct, immediate, and
automatic. Automatic reinforcement occurs when behavior itself is the
reinforcer (e.g., a child swinging on a swing). However, language
functions are unique because the reinforcement is indirect and another
person mediates it. Creative and seemingly untaught outcomes (i.e.,
derived relations) occur in the arts and in language. As described
earlier, novel, creative, or spontaneous behavior is derived behavior,
in that it is not directly taught but instead emerges as a function of
the acquisition of certain types of stimulus control. This kind of
control emerges because of certain histories (Greer & Ross, 2008).
We suggest that derived relations are possible in each of these
categories of observing and producing responses. Thus, higher order
operants are not restricted to verbal functions. However, a critical
component of being verbal occurs when higher order operants involve
reinforcement that results from language functions.

Rotation of the Listener and Speaker

The rotation of speaker and listener responses between individuals,
or turn-taking, is readily observable, and there is an extensive
linguistic literature (e.g., Crystal, 2006) on the structure of these
episodes. There is also a verbal behavior literature identifying certain
types of rotations as particular types of interlocking verbal behavior
units themselves (Donley & Greer, 1993; Pistoljevic & Greer,
2006). The latter studies demonstrated what Skinner characterized as
episodic verbal behavior between individuals. When both listener and
speaker responses are reinforced for an individual in a dyad involving
turn-taking, it is an observable incidence of an episode in which both
speaker and listener responses for each of the individuals are
reinforced. Moreover, the rotation between speaker and listener within
the individual's own skin is also observable, as when young
children talk to themselves aloud while engaged in fantasy play (Lodhi
& Greer, 1989).

Audience Control

As most, but not all, children gain more experiences with
audiences, they cease to speak aloud to themselves in the presence of
others, due to the punishing effects of having an audience. Some
children and adults continue to talk aloud using contextually
inappropriate speech because they lack the contextual audience control.
The presence of the audience for older children punishes the overt
speaker and listener responses--a phenomenon that Skinner (1957)
referred to as one of many types of audience control.

In a controlled experiment, Hugh (2006) compared several
consequences for children's talking aloud in free-play settings
when they emitted what appeared to be contextually inappropriate speech.
She found that the contingent removal of or production of 3-s recordings
of the children's self-talk or music resulted in the
children's ceasing to talk aloud in free-play settings. Apparently,
these contingencies resulted in the child discriminating the presence of
an audience. The opposite happens when individuals sing aloud, as in
situations where they listen to recordings wearing earphones (often
followed by the appearance of embarrassment when the person realizes an
audience is present). However, we suggest that one's speaker and
listener responses within the skin do not stop; they simply become
covert--a progression not unlike the process whereby one goes from
reading aloud to reading silently. For example, when certain audiences
are not present, adults continue to emit both speaker and listener
responses aloud--for example, when they talk to their computer,
seat-belt alarm, a recalcitrant drawer or shirt button, or their pets.
Clearly, audience control, or the lack thereof, is at work here (Epting
& Critchfield, 2006). A more advanced level of self-talk occurs when
a writer, in the process of editing, rotates the covert speaker and
listener roles. The effective reader "listens" to what is
read, and an effective editor listens to what she or he writes as the
target audience would listen to what is written; that is, he or she
listens for the responses that would reinforce the target reader.

Gaining Reader and Listener Control

Several studies have demonstrated written and vocal verbally
governed behavior where the verbal stimulus control for performing
algorithms was isolated and shown to result in verbally governed problem
solving (Keohane & Greer, 2005; Marsico, 1998). In these
experiments, written directions or algorithms for performing problems
were provided to participants. In the Marsico study, middle school
students were taught algorithms to solve math problems until they could
learn new math operations solely by following print instructions. Probes
with novel algorithms for learning other subject matter showed that the
students were now under the control of verbal print stimuli. In
this experiment and the Keohane and Greer study, prior to acquiring
verbal stimulus control, written instructions did not result in use of
problem-solving operations, whereas afterwards they did. Such verbally
governed behavior is facilitated by the connection of written language
to phonemic sounds in speech. This is evident in the history of written
languages and the best research on reading and spelling, wherein
phonemic auditory control is shown to be essential (McGuinness, 2004;
Robinson, 1995). However, what of the child who, for native or
environmental reasons, lacks a listener capability or has a weak
listener repertoire? Or, what of the child who does not receive, or has
not yet received, the experiences necessary to join the listener and
speaker within the same skin?

It is very likely that many complex verbal problem-solving tasks
involve the joining of the listener and speaker repertoires within
one's own skin, or substitutes, if one is deaf. However, members of
the deaf community seldom have reading comprehension beyond the
sixth-grade level (Karchmer & Mitchell, 2003). The lack of phonemic
auditory components of reading is regarded as the source of the problem.
Thus, one speaks and, when able, listens to oneself, or goes from
listening to matching first instances of speaking with what is heard.

The Merging of Listener and Speaker

The evidence for how the listener and speaker are joined grew out
of research tracing the emergence of certain verbal developmental cusps
and capabilities; one of the most important is a phenomenon identified
as Naming (Horne & Lowe, 1996; Horne et al., 2004; Lowe et al., 2005).
The verbal developmental capability of Naming (capitalized herein to
distinguish it from common usage) is not merely saying or labeling
things using language; rather, it is a verbal developmental cusp
(Rosales-Ruiz & Baer, 1996) and a verbal developmental learning
capability that allows a child to simultaneously acquire speaker and
listener vocabularies incidentally. That is, the child acquires new
vocabulary without direct instruction and seemingly without
reinforcement.

Naming

Horne and Lowe (1996) first identified Naming as a verbal
developmental capability and provided a series of experiments on Naming
as a facilitator of emergent categorizations. Naming has been described
as the real beginning of what is essentially being verbal (Barnes-Holmes
et al., 2001; Hayes et al., 2001; Horne & Lowe, 1996). Horne and Lowe
also proposed a program of research on Naming as a dependent
variable--the origin of this developmental phenomenon.

Horne and Lowe (1996) identified Naming as a developmental
phenomenon, and much of the research on Naming has treated it as an
independent variable or as a process leading to categorical or derived
responding (Miguel, Petursdottir, Carr, & Michael, 2008). This
research tests whether being verbal (having Naming) is essential to, or
facilitates, certain types of derived relations. However, Naming itself
is emergent behavior involving derived relations, and it is a
developmental verbal cusp and capability (Fiorile & Greer, 2007;
Gilic, 2005; Greer & Keohane, 2005/06; Greer, Stolfi, et al., 2005;
Greer, Stolfi, & Pistoljevic, 2007). The research that we describe
focused on Naming as a developmental cusp and capability in cases where
children who lacked the Naming capability acquired it as a function of
experimental interventions. The experiments described below were pre-
and postintervention time-lagged multiple-probe designs, or designs that
combined experimental-control group designs with nested time-lagged
multiple-probe designs, and provided controls for maturation and
instructional histories. These studies were not demonstrations of Naming
as an example of emergent behavior or tests of the relation of Naming to
derived relational responding, although that occurred too; rather, they
were experimental tests for the effects of interventions on the
emergence of Naming itself.

The Experimental Analyses of the Acquisition of Naming: Procedures

Testing for the Presence or Absence of Naming. The test for the
presence or absence of Naming is done as follows. Visual stimuli (both
contrived and uncontrived pictures or symbols) with corresponding
contrived or uncontrived vocal labels (tacts, to use Skinner's term
for seeing and saying) are presented to a child in a match-to-sample
arrangement in which correct and incorrect matches are present. The
child is then given a card that matches one of the samples while the
experimenter says the tact for the stimulus (e.g., "match
zog," as a contrived tact for a contrived stimulus, or "match
hawk," as a tact of a particular bird). The child is then required
to place his or her sample card on the correct match. This experience
provides a controlled simulation of the conditions under which a child
with Naming would naturally acquire a new speaker response (e.g.,
"that's a hawk," as a tact) and the relevant listener
response (point to or look at the hawk when someone emits the tact or
says "hawk") without direct instruction. These conditions,
under which the child matches the correct visual stimuli while hearing
the experimenter say the tact for the stimulus, ensure the joint
attention of the child and the experimenter to the visual stimuli and
set the occasion for the child to learn the tact. If the child masters
the visual matching requirement while the tact is spoken by the
experimenter, but cannot emit both the speaker and listener response,
the child is identified as not having the Naming capability. The child
may have the listener component but not the speaker component (most
typically) or vice versa (in rare cases) and is then identified as
having either the listener or speaker half of Naming. If the child emits
both listener and speaker responses, she or he is identified as having
Naming as a verbal developmental cusp and capability. (We will describe
the distinctions between cusps and capabilities later.)

Naming as a Dependent Variable

Another common feature of these experiments is that individuals
without Naming became the participants in experiments designed to test
for the effects of interventions on the emergence of Naming as a
dependent variable. The conditions outlined in the prior paragraph
constitute the pretest. After the intervention, the children are tested
again for Naming using the same procedures as in the pretest described
above; however, in the posttest, the children do not repeat the matching
trials while hearing the spoken words for the visual stimuli. Rather,
the children must point to the stimuli as a listener and tact the
stimuli as a speaker--responses that they could not make prior to the
experimental intervention. If the child emits unreinforced correct
responses as a listener and speaker to 80% of the probe trials, he or
she also receives a novel set of stimuli. The child is then taught to
match the novel set of stimuli while hearing the experimenter say the
tact for the stimuli and then probed for untaught speaker and listener
responses. If he or she meets the 80% criterion on the initial probe set
and the novel set, the child is deemed to have the Naming capability.
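
The decision rule in the preceding paragraphs can be restated as a short sketch. It is an illustration only, not the scoring procedure used in the experiments: the function names, the tuple layout, the use of the 80% criterion to score each half of Naming separately, and the 20-trial probe size are all assumptions made for the example.

```python
from fractions import Fraction

# 80% probe criterion stated in the text; exact arithmetic avoids rounding issues.
CRITERION = Fraction(80, 100)


def meets_criterion(correct, trials, criterion=CRITERION):
    """Return True if the proportion of correct probe trials reaches the criterion."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    return Fraction(correct, trials) >= criterion


def classify_probe(listener_correct, speaker_correct, trials):
    """Label one probe set (assumption: each response type is scored separately)."""
    listener = meets_criterion(listener_correct, trials)
    speaker = meets_criterion(speaker_correct, trials)
    if listener and speaker:
        return "full Naming"
    if listener:
        return "listener half"
    if speaker:
        return "speaker half"
    return "no Naming"


def has_naming(initial_set, novel_set):
    """Criterion must be met on both the initial probe set and the novel set."""
    return (classify_probe(*initial_set) == "full Naming"
            and classify_probe(*novel_set) == "full Naming")


# Hypothetical data: (listener correct, speaker correct, trials per response type)
print(classify_probe(18, 10, 20))             # listener half
print(has_naming((16, 18, 20), (20, 17, 20)))  # True
```

Under these assumptions, a child who reaches criterion on the initial set but not on the novel set would not be counted as having the Naming capability.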

Our early intervention consisted of what we describe as multiple
exemplar instruction (MEI) across listener and speaker responding for
one or more instructional sets of stimuli (i.e., five different stimuli
presented four times each in 20-trial instructional sessions/blocks). The
child receives instructional trials on stimuli different from the
stimuli used in the pre- and postintervention probe trials. The
instructional trials consist of reinforcement for correct responses and
a correction for incorrect responses in which the child must emit the
correct response while attending to the stimulus but is not reinforced.
Providing the correction appears to be more efficient than differential
reinforcement alone, as Skinner (1968) found when he developed the frame
in programmed instruction (Kangas & Branch, 2008). The presence of
all of these components in instructional trials, as in the frame, meets
the requirements for what has been identified in several experiments as
learn units. Learn units consist of instructional presentations that
have all of the tested components, described above, that were found to
ensure mastery (Albers & Greer, 1991; Emurian, 2004; Emurian, Hu,
Wang, & Durham, 2000; Greer, 1994; Greer & McDonough, 1999;
Ingham & Greer, 1992; Selinske et al., 1992; Skinner, 1968).

Multiple Exemplar Instruction Across Listener and Speaker
Responding

Using learn units, we do visual match-to-sample presentations with
pictures or objects while the child hears the experimenter tact or say
the "name" for the picture or object. Next, a listener trial
may be presented in which the child is asked to point to a stimulus in
an array that includes a correct stimulus and incorrect stimuli;
however, the stimuli are counterbalanced such that they are not
presented in immediate serial proximity for the different speaker and
listener responses. This ensures that the visual stimuli will control
the response, along with vocal stimuli, for listener and match-to-sample
trials. Next, a speaker response is presented. The presentations
continue until the child masters all of the listener and speaker
responses for the instructional set (five pictures or objects). If one
response is mastered for all stimuli, it is still presented in rotated
fashion but not reinforced, with the responses not mastered until all
responses to the five stimuli are mastered for the instructional set.
Mastery is a criterion of 90% across two consecutive sessions (80-trial
sessions/blocks) for all speaker and listener responses to all five
stimuli in the instructional set.
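
The mastery criterion can be expressed as a small check over session records. This is a minimal sketch under stated assumptions: the data layout (one dictionary per session, mapping a response type to its correct and total trial counts), the function name, and the sample numbers are hypothetical and are not taken from the studies.

```python
def session_of_mastery(sessions, criterion_pct=90, consecutive=2):
    """Return the 1-based session number at which mastery is reached, or None.

    Mastery here means every response type in a session is at or above
    criterion_pct, for `consecutive` sessions in a row.
    """
    streak = 0
    for number, session in enumerate(sessions, start=1):
        # Integer comparison avoids floating-point error at exactly 90%.
        at_criterion = bool(session) and all(
            correct * 100 >= criterion_pct * total
            for correct, total in session.values()
        )
        streak = streak + 1 if at_criterion else 0
        if streak == consecutive:
            return number
    return None


# Hypothetical records: response type -> (correct trials, total trials)
records = [
    {"listener": (18, 20), "speaker": (15, 20)},  # speaker below 90%
    {"listener": (19, 20), "speaker": (18, 20)},  # first session at criterion
    {"listener": (20, 20), "speaker": (19, 20)},  # second consecutive session
]
print(session_of_mastery(records))  # 3
```

A session below criterion resets the count, so two nonconsecutive sessions at 90% would not satisfy the rule as sketched here.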

Experiments on the Induction of Naming: Findings

In the process of screening for children missing Naming in our
experimental analysis for the sources of the acquisition of Naming, we
coincidentally acquired some evidence about a correlation between age
and the presence of Naming. In screening for children with and without
Naming, Gilic (2005) tested 19 typically developing 2- and 3-year-olds.
She found that 9 out of 9 typically developing and sensory-intact
3-year-olds from upper middle class families (a demographic that
suggests they had rich language histories; Hart & Risley, 1995) had
both the speaker and listener responses for three-dimensional objects
after hearing an adult say the tacts for stimuli as the child learned to
match the stimuli as described above.

In the same sample, 8 out of 10 upper middle class 2-year-olds
could not do this. Some could respond as listeners but not as speakers,
or vice versa. We determined they lacked Naming if, as a result of the
Naming test, they (a) could respond to the stimuli as listeners but
could not say the words for the stimuli, (b) could respond as speakers
but not as listeners, or (c) could do neither. The children who lacked
Naming became candidates for the MEI intervention described earlier.
After the MEI interventions, they responded with both listener and
speaker responses to the original stimuli (unfamiliar objects) to which
they did not respond initially with the contrived "names." The
instructional history that led to Naming was shown to be experimentally
traceable to the multiple exemplar experience because the relevant pre-
and posttests and the interventions were time-lagged to control for
history and maturation. Moreover, in this and other experiments, the
design also included an experimental and control group component (Gilic,
2005; Greer, Stolfi, et al., 2007; Pistoljevic, 2008). In each of the
experimental and control group designs, 4 children who did not have
Naming remained in a matched control group that did not receive MEI
until the first 4 participants attained Naming. The control groups did
not attain Naming. Subsequently, the children in the control groups
received the MEI intervention that was introduced in a multiple-probe
time-lagged design, and they attained Naming. These experiments and
others that we shall describe showed that the capability accrued from
experiences, not age. Age often provides opportunities for relevant
experiences, but only if the child has the prerequisite experiences
that, in turn, allow her or him to make contact with the special
contingencies that lead to the emergence of Naming and other verbal
developmental capabilities (Greer & Ross, 2008).

Capabilities, Cusps, and Repertoires

We have come to make certain distinctions between capabilities,
cusps, and repertoires. Our distinctions are not universal in behavior
analysis, but differences in the outcomes associated with each suggest
that they are important distinctions.

Behavioral Developmental Cusps

One of the most important conceptual contributions to a behavioral
treatment of development was the notion of behavioral developmental
cusps, identified by Rosales-Ruiz and Baer (1996). Their identification
and description of behavioral cusps are key to the contemporary study of
development from the behavioral perspective:

Cusps That Are New Capabilities

For example, once a child learns to walk, she or he comes into
direct contact with new contingencies or experiences that result in new
learning opportunities. The child learns from direct contact with new
contingencies, or reinforcing or punishing experiences, that she or he
could not contact before. However, when the induction of a behavioral
developmental cusp also results in a child's being able to learn in
a way he or she could not before, we identify that as an experientially
derived verbal developmental capability and higher order or overarching
operant (Healy, Barnes-Holmes, & Smeets, 2000). For example, once
children have Naming, they acquire new words in speaker and listener
functions without direct instruction. Thus, they can learn in ways they
could not before. Once they have the Naming capability, their vocabulary
expands commensurate with their having experiences in which speakers
tact objects. Until children have Naming, they must be taught new
speaker and listener responses via direct instruction involving
differential reinforcement and corrections. While a developmental
capability is a cusp, not all cusps are developmental capabilities.

Capabilities allow children to learn in ways they could not before,
and they are crucial to language development. For example, McGuinness
(2004), in an exhaustive review of the literature on reading and
spelling, reported evidence that children need 55,000 words for normal
discourse and 86,000 words to be successful over the course of the
elementary school years. If this is the case, it is not likely that
these were learned via direct instruction. Hart and Risley (1996)
reported few incidences of direct instruction in language in their
longitudinal study of the development of language. This phenomenon,
whereby children seem to acquire language without direct
instruction, has led some, but not all, linguists to posit innate
psychological language constructs, and discount the role of learning,
based on anecdotal descriptions of emergent language (Pinker, 1999). Of
course, now that Naming and other verbal capabilities have been shown to
be the result of experimentally isolated experiences, these construct
theories seem to have less face validity (Barnes-Holmes et al., 2005;
Feliciano, 2006; Fiorile & Greer, 2007; Gilic, 2005; Greer,
Nirgudkar, & Park, 2003; Greer, Stolfi, et al., 2005, 2007; Greer
& Yuan, 2008; Miguel et al., 2008).

The Importance of Naming in Incidental Learning and Formal
Education

Much of what we learn incidentally and in classrooms must occur
from the following: (a) the Naming capability (Greer &
O'Sullivan, 2007), (b) the ability to learn from indirect contact
with learn units (Greer, Singer-Dudek, & Gautreaux, 2006), and (c)
the emergence of conditioned reinforcement from observation (Greer &
Singer-Dudek, 2008; Greer, Singer-Dudek, Longano, & Zrinzo, 2008).
Catania (2007) suggested that it is likely that the Naming repertoire is
also the basis for the ability to speak about things in their physical
absence, consistent with Skinner's term conditioned seeing. Naming
is also important for children's success in most educational
settings. Children can profit to some degree in typical classroom
settings if they have Naming, but if they do not, the lack of direct
learn units (direct reinforcement and corrections) in most educational
settings means that they cannot be successful (see Greer, 1994, for data
on the lack of direct instruction in classrooms).

For those who teach young typically developing children, it is
common to find that when children are taught, for example, to point to
colors as the teacher says the word for the color (a listener response),
they cannot say the "name" of the color (a speaker response)
even when they can match the stimuli visually (Engelmann & Carnine,
1991; Lee, 1981). This difference is true for the range of curricula,
such as pointing to and saying letters, phonemes, numbers, objects, and
shapes. The child's reaching a point at which he or she can emit
the speaker response after learning the point-to response and vice versa
is made possible, it would seem, by acquiring the Naming capability.
Chase, Johnson, and Sulzer-Azaroff (1985) identified what we believe is
a more advanced case of the persistence of the independence of the
speaker and listener, and problems in the joining of the two. They found
significant differences between production and selection responding in
mature and intellectually capable college undergraduate students, where
performances on multiple choice questions (a selection response more in
keeping with a listener response) and essay exams (a production or
writing response) were different. Moreover, some impoverished
middle-school-age children were found recently to also lack Naming
(Helou-Care, 2008). These students were similar to the children from
low-SES families who were identified in the Hart and Risley (1995)
longitudinal study as having low levels of verbal language experiences
compared to middle income families and professional families.

Other Speaker-as-Own-Listener Capabilities

Naming is a critical speaker-as-own-listener capability, but it is
only one of the three such capabilities that have been identified (Greer
& Keohane, 2005; Greer & Ross, 2008). The other two types
identified in the research include say-do correspondence (Paniagua &
Baer, 1982; Rogers-Warren & Baer, 1976) and self-talk conversational
units involving the speaker and listener in the same skin (Lodhi &
Greer, 1989). All three of these speaker-as-own-listener capabilities
are cases of the joining of the speaker and listener within the
individual.

When children have say and do correspondence (saying what you are
going to do and then doing it), we have some evidence that the
child's speaker repertoire is responded to by her or his listener
capability. There is something much more basic here than
"self-management." That is, if a child says, "I am going
to play with an item" and then proceeds to do so, we say there is
correspondence between what she or he says and what she or he does
(Paniagua & Baer, 1982); however, as Paniagua and Baer pointed out,
to be true say-do correspondence, the effect cannot be the result of
direct instruction. This is evidence that the phonemic sounds and the
vowel/consonant blends of words said by the individual have
correspondence with the individual's own listener responding. The
individual's own speaker stimuli join the individual's own
listener responses.

Lodhi and Greer (1989) identified speaker-as-own-listener
responding when they studied self-talk during the solitary play of
typically developing 5-year-olds. Independent observers, who were naive
to conditions, observed videotapes of children engaged in solitary
fantasy play. The children occasionally "directed" their
activities, demonstrating say-do correspondence. In addition, the
children talked aloud to anthropomorphic toys (stuffed animals, dolls,
and pictures of people and animals that were used to evoke
speaker-as-own-listener responding in fantasy play). In that fantasy
self-talk, they responded in both speaker and listener roles, changing
their tone of voice and function such that they completed conversational
units. Conversational units are units of verbal exchange or turn taking
between two or more individuals, or within a single individual, that
result in an exchange in which each party (or separate capability within
the individual) completes interlocking verbal operants as both speaker
and listener (Donley & Greer, 1993; Lodhi & Greer, 1989;
Pistoljevic, 2008; Pistoljevic & Greer, 2006).

For example, in one case of a self-talk conversational unit taken
from Lodhi and Greer (1989), the child speaks to a stuffed horse and
says, "Hi, horsy," and the horse responds in a different voice
as a listener-speaker, "Want to play with me?" Subsequently
the child responds, "Let's play in the dollhouse," the
horse and child move to the dollhouse, and play occurs (a listener
response). The original speaker responded to the presence of the horse,
the horse responded as listener and speaker, and the original voice
responded as a listener by moving to the play area. When the pair then
went to the dollhouse, they also demonstrated say-do correspondence and
perspective taking (Y. Barnes-Holmes et al., 2000; Luciano, Herruzo,
& Barnes-Holmes, 2001). Note in this case that the correspondence
was not the result of prior instruction (Paniagua & Baer, 1982).

We can observe the same phenomenon in adults as they serve as both
speaker and listener with their preverbal infants. A mother may produce
the question, "Does Mommy love you?" and respond for her baby,
"Yes, I know Mommy loves me very much." Although the
mother's overt speaker behavior for the most part may be under
audience control, this type of self-talk is culturally acceptable and,
to most people, endearing. Conversational units, and self-talk involving
correspondence between saying and doing, together with Naming are
components of the joining of the speaker and listener capabilities in
the individual. These, in turn, are foundational to most complex verbal
behavior.

Development of the Components of Naming

Figures 1 through 5 illustrate how one of these
speaker-as-own-listener capabilities, Naming, emerges as a result of the
transformation of stimulus control across speaker and listener and how
it can also incorporate abstraction or categorization (the example of
red used in Figures 1 and 2, or maple trees in Figure 3). By
"transformation of stimulus control" we mean the following.
Prior to the acquisition of Naming, the experience of hearing others
tacting visual or other sensory stimuli did not result in learning the
tact or listener response; or, one of the responses was learned but the
other was not. After Naming, the stimuli found in such experiences were
transformed to control the untaught speaker and listener components of
Naming. Note that this is not stimulus or response generalization;
rather, certain instructional histories transformed the control of these
stimuli from observation alone.

[FIGURE 1 OMITTED]

Figures 1, 2, and 3 provide examples of children without Naming,
children with either the listener or speaker components of Naming only,
and children with full Naming. Figure 3 shows how the stimuli in Figures
1 and 2 were transformed. Several studies using the experimental
procedures we described earlier have identified typically developing
children and children with language delays who lacked Naming, had
speaker or listener components of Naming, or had full Naming (Fiorile
& Greer, 2006; Gilic, 2005; Greer & O'Sullivan, 2006;
Greer, Stolfi, et al., 2005, 2007; Lowe, Horne, Harris, & Randle,
2002; Lowe et al., 2005). Also, several experiments have shown that
children who lacked Naming acquired it as a function of a
multiple-exemplar intervention or related interventions (Feliciano,
2006; Fiorile & Greer, 2007, two experiments; Gilic, 2005, two
experiments; Greer, Stolfi, et al., 2005, 2007, two experiments;
Helou-Care, 2008; Longano, 2008, two experiments; Nirgudkar, 2005;
Pistoljevic, 2008, two experiments).

[FIGURE 2 OMITTED]

[FIGURE 3 OMITTED]

When Naming Joins Print Control: Reading and Writing

Naming plays other critical roles. For example, as a verbal
developmental capability, Naming may be basic to the manner in which
print stimuli become a critical part of verbal functions (Lee-Park,
2005). When children learn to say the sounds of letters, or phonemes,
they can say the printed words that Skinner (1957) called textual
responding. As they speak the words in the textual response, they also
hear what is said. That is, if a child decodes (i.e., textually responds
to or says) the phonemes K-A-N-GA-R-OO and the child has the tact and
listener responses for kangaroo that accrued from hearing someone tact a
kangaroo, the listener component of Naming provides comprehension (see
Figure 4). Helou-Care (2008) demonstrated this relation in a recent
experiment. Students were selected for the experiment who (a) were
fluent phonemic textual responders (i.e., they "decoded" words
accurately and at 160 words per minute or faster) but (b) had poor
comprehension for contrived stories and (c) lacked Naming. On the
pretest with contrived stimuli and words, they could not answer
comprehension questions. Following the induction of Naming, the
participants demonstrated comprehension on their posttest performance
with the contrived story. In a related experiment, Reilly-Lawson (2008)
found that students who achieved Naming but lacked fluent phonemic
decoding had comprehension as a function of being taught fluent phonemic
decoding. The auditory stimulus and the listener capability play a key
role. One may substitute verbal signs for stimuli (e.g., American Sign
Language). Thus, a child may have Naming that involves observing signs
for stimuli along with observing the stimuli designated by the sign. All
of the other relations leading to reading and writing may follow a
similar path as described for the listening child with speech. However,
without the auditory phonemic control of listening, deaf children
achieve reading comprehension at only the sixth-grade level (Karchmer
& Mitchell, 2003). The listener is key in the Naming
capability-derived relations between print and textual responding, and
derived relations between hearing or saying and writing.

[FIGURE 4 OMITTED]

Figure 5 illustrates the possible emotional effect of
"conditioned seeing" (Skinner, 1957), which accrues from the
joining of Naming and phonemic responding to text. In the Naming
experience, emotional effects are also conditioned, and these can be
elicited by reading (Leader, Barnes-Holmes, & Smeets, 2000; Longano,
2008; Roche & Barnes-Holmes, 1997). The term frame seems to capture
the extensive potential of these relations (Hayes et al., 2001) relative
to the environmental sources for their formation. In our early
experiments we did not test for mutual or combinatorial entailment.
Testing for these constitutes the litmus test for a relational frame.
However, in a recent study, Reilly-Lawson (2008) did show the role of
mutual and combinatorial entailment.

Naming is not the only way in which readers achieve comprehension.
The second condition under which they may have comprehension occurs when
readers receive learn units for the novel words they read. That is, they
are taught the tacts, as when we learn new abstract terms from direct
instruction (e.g., instruction in the difference between types of
molecules). The third condition involves indirect contact, as in
observing others receive instructional contingencies for tacting and
emitting listener responses to a stimulus (Greer et al., 2006). An
example of this occurs when a lecturer questions other students and an
observing student observes reinforcement and corrections received by
others. Of course, if a lecturer simply lectures and illustrates, but
does not provide learn units or observational experiences of others
receiving learn units, students will learn only if they have Naming.

If the child has still another type of derived relational
responding across saying and writing, he or she can also spell a novel
word. That is, if the child has derived relations between the phonemic
sounds and the writing of letters for those sounds when he or she hears
a word, the child can spell the word in a written response or say the
letter "names" for the word. This is not a case of
generalization or transfer, as the responses are different. When
children are taught to spell by saying the phonemes, writing the letters
is an entirely different behavioral topography. Similarly, if taught to
write the letters, saying the letters phonemically is a different
response. Saying the letter names is still another response.

Greer, Yuan, and Gautreaux (2005) found that multiple-exemplar
instruction across saying and writing letters with a training subset of
dictated words led to children's producing either untaught written
responses or untaught vocal responses without direct instruction when
they could not do this prior to a multiple-exemplar intervention. During
the interventions, the children learned to respond to dictated words by
saying the letters and then writing the letters. Afterwards, they could
spell the words they originally could not in both written- and
spoken-response form. The two experiments, each with four participants,
used multiple-probe and time-lagged designs, again controlling for
maturation and history.

Development of Preverbal Cusps and Speaker and Listener
Capabilities That Allow the Joining of the Speaker and Listener

There is growing evidence regarding some of the prerequisite
developmental cusps and capabilities that make it possible for the
speaker and listener to be joined. Though space does not permit an
in-depth discussion here, a recent book and article describe these in
detail, along with the evidence base for them (Greer & Ross, 2008;
Keohane et al., in press). Here we provide a brief overview before
returning to a discussion of how the joining of speaker and listener in
the three speaker-as-own-listener functions leads to the most complex
human verbal behavior.

Numerous developmental foundations seem to provide the child with
the potential to benefit from coming in contact with experiences that
allow the joining of speaker and listener. For this part of our story we
draw on literature in developmental psychology, as well as work in
behavior analysis. We suggest that the process begins in the uterus,
where the pairing of the mother's voice with in-uterus feeding
apparently conditions her voice as a reinforcer for the observing
response of listening (Decasper & Spence, 1987). Decasper and
Spence, in well-controlled experimental work, found that newborn
infants orient to their mother's voice. When vision develops, the
voice of the mother is paired with her face, along with other sensory
experiences (olfactory, kinesthetic, or tactile sensory
experiences). Respondent and operant relations are joined in ways
described by Donahoe and Palmer (2004). When infants can see their
mother's face, feel her touch, detect scents, and taste the milk,
these sensory experiences provide introductions to see and do, as when
the children perform as their mothers do. Skinner (1957) made brief
reference to these processes as ostensive learning (see Stemmer, 1992,
for elucidation). In other words, Pavlovian conditioning processes play a
large role (Leader et al., 2000; Longano, 2008; Roche &
Barnes-Holmes, 1997). Numerous responses, and evoking and eliciting
stimuli for those responses, accrue. These lead to still other relations
that we suggest are behavioral developmental cusps that, in turn, lead
to verbal behavior. The acquisition of conditioned reinforcement for the
correspondence between observing and producing appears key.

Emitted Behavior and Sensory Experiences

Vocal sounds, along with other naturally selected emitted behavior
(e.g., swimming motions present before and after birth), are emitted
from the outset and come to be related to the aforementioned observing
responses (Donahoe & Palmer, 2004; Novak, 1996). Stereotypical play
with production or emission of motor movement, including vocal sounds
and observing, occurs simultaneously. The range of observing responses
accrues, including conditioned reinforcement for the correspondence
between what is observed and what is produced. Observations involving
the senses of smell, taste, oral mouthing, touch, and the relation
between being touched and touching progress (Luciano &
Polaino-Lorente, 1986; Meltzoff, 1996; Meltzoff & Moore, 1983).
Derived relations between the behavior of caretakers and the child
accrue, including touching and imitation, because of the conditioned
reinforcement for correspondence (Meltzoff, 1983; Pelaez-Nogueras et
al., 1997; Poulson, Kymissis, Reeve, Andreatos, & Reeve, 1991).

Movement and Observation

Movement and observing responses come under the control of both
visual and auditory stimuli, either separately or paired. These operants
occur and are maintained by generalized reinforcement in the form of
conditioned auditory, kinesthetic, olfactory, gustatory, and visual
stimuli (e.g., mother's voice, touch, and smile). Imitation
responses occur in response to caregivers' movements and gestures,
and parroting occurs in response to caregiver vocal sounds. It is
possible that the source for the emission of these and other copying
responses is conditioned reinforcement for the correspondence between
observing and producing responses. Generalized imitation is one of the
cusps and capabilities that accrue from conditioned reinforcement for
the correspondence between observing and producing.

Evidence for Pre-Speaker and Pre-Listener Cusps

These preverbal foundational cusps lead to the development of the
separate speaker and listener operants. They also precede the joining of
speaker and listener within the skin that constitutes being truly verbal
(Barnes-Holmes et al., 2001; Horne & Lowe, 1996). These cusps include
acquisition of reinforcement for observing responses. Several
experiments have demonstrated a functional relation between
reinforcement conditioning protocols and acceleration of learning
associated with these senses. Children who are severely delayed do not
orient or attend to voices, look at novel visual stimuli, or leaf
through children's books; rather, they engage in stereotypy or
"self-stimulation." Our work suggests that they have missed
the early and incidental conditioning of voices that occurs for
typically developing infants reported by Decasper and Spence (1987). For
example, we found that when we conditioned recordings of voices as
reinforcement for listening (measured by children's choosing to
listen to recordings of human voices in free play), children learned
listener discriminations that they could not learn prior to the
conditioning process (Keohane, Greer, & Ackerman, 2006c). Similarly,
visual match-to-sample learning occurred when visual stimuli attained
reinforcement for visual observing as a function of stimulus-stimulus
conditioning processes (Keohane, Greer, & Ackerman, 2006b). These
children could not master visual match-to-sample tasks without extensive
prompting procedures. After the visual stimuli were conditioned as
reinforcers for observing, the children could master visual
match-to-sample instruction solely via the use of learn units. Thus,
stimulus-stimulus pairing procedures were performed until the various
stimuli acquired conditioned reinforcement for observing; subsequently,
the children required 4 to 10 times fewer learn units to master relevant
instructional objectives. These findings are consistent with
Dinsmoor's (1983) basic findings of the observational facilitation
of discrimination learning in pigeons. These are cusps, because they
allow children to learn from stimuli they could not contact before.

Another developmental cusp that we believe is foundational to
verbal behavior is the cross-modal capacity for sameness (Engelmann
& Carnine, 1991; Keohane, Greer, & Ackerman, 2006a). Children
who have difficulty matching are often lacking this developmental
capability. The intervention that we use to induce this is a protocol
that we call cross-modal sensory matching. In that procedure, we rotate
having children match across the senses. They match, in rotated
presentations, scents, textures, olfactory stimuli, sounds, and visual
stimuli (i.e., multiple-exemplar instruction). Once they master these at
criterion levels, their learning across matching and other realms
accelerates significantly (Keohane et al., 2006a). Because the notion of
sameness across the senses is an arbitrary human invention, developing
the capacity for sameness may lay the foundation for the kinds of
cross-modal arbitrary applicable relations, or those relations that are
not controlled by the physical attributes of stimuli, that are found in
emergent verbal behavior and its subsequent potential in categorizing
functions. See Keohane et al. (2009) for a summary of this research.

Acquiring Listener Cusps

After children acquire the foundational observing cusps (capacity
for sameness, conditioned reinforcement for tabletop stimuli or
print/pictures, conditioned reinforcement for observing voices and
faces, and generalized imitation), they can be brought under the control
of the vowel/consonant commands of others, as in hearing what others say
and doing what they hear--a new cusp. When children respond
differentially and correctly to two or more arrangements of vowel and
consonant sounds produced by a speaker, we identify this as the
beginning of "listener literacy." We have used a protocol that
we call "listener emersion" to induce basic listener literacy,
which we tested using multiple-probe time-lagged designs to control for
maturation and history. In this procedure, we provide a sequence of
instructions such that the child must respond to sets of commands based
only on properties of speech. First, they master the sets (usually five
to six sets of five commands), with frequent recombination of
component sets that include a nonsense command, after which they must
respond to sets at a rate of 30 responses per minute. Finally, they respond to
recorded commands given by different voices (Greer, Chavez-Brown, et
al., 2005). Once they have achieved this basic listener literacy, they
acquire educational repertoires 4 to 10 times faster than before they
mastered listener literacy.

Phonemic Control

Chavez-Brown and Greer (in press), also in multiple-probe and
time-lagged experiments across six children, found that an intervention
involving the acquisition of selection responses for matching vocal
speech sounds of others led to echoics or clear speech by children
lacking those capabilities. The children were taught to activate
switches to accurately match recorded spoken words when the choice was
between an accurate match and an inaccurate match. The experimenter pressed a
switch sounding a spoken word that was the target sample; the
experimenter then pressed two switches, one with the correct match and
one with a nonmatching word. The child then pressed a switch for the
matching word to attain a correct response. Thus, mastery of auditory
selection responses appears to have assisted the children in matching
their own speaker responses that consisted of their emitting
point-to-point correspondence between hearing words and saying those
words. Hearing and matching the consonant/vowel sounds of others lead to
matching one's own emitted and heard vocal sounds (Chavez-Brown
& Greer, in press; Marion et al., 2003). Perhaps saying phonemic
sounds or phonemic blends at first results in automatic reinforcement
for the production of vocal responses with point-to-point correspondence
that has not yet acquired verbal functions. We refer to this type of
responding as parroting because the correspondence itself becomes
reinforcing as a result of the infant's experiences (M. Sundberg et
al., 1996). At the next stage, saying the vowel/consonant has effects on
the behavior of caretakers who mediate for the speaker. At this point,
the echoic is controlled by a history of having the echoed sounds affect
the behavior of the listener; the response is not controlled by
automatic reinforcement. The response now has a speaker function.
Instances of the emission of certain vowel/consonant sounds specify
reinforcers, and the class of mands is formed (Williams & Greer,
1993). In other cases, instances of emission of certain vowel/consonant
sounds result in attention from caretakers, and if attention is a
conditioned reinforcer, the tact operant function forms (Tsiouri &
Greer, 2003). At this point, the response learned under mand conditions
is not likely to function as a tact or vice versa, consistent with
findings by Lamarre and Holland (1985), Twyman (1996), and Williams and
Greer (1993).

Acquiring Speaker Cusps: Tact Speaker Operants

Once the listener literacy and the speaker capabilities are in
place for children with language delays, we can expand the tact
repertoire with intensive tact instruction. However, even with all of
the above listener and speaker repertoires, children without the Naming
capability acquire tacts, or listener responses, only with direct
instruction.

Probably the most important speaker response that is needed is the
tact (Gewirtz, 1969; Greer & Ross, 2008; Horne & Lowe, 1996;
Skinner, 1957). When attention from adults is a conditioned reinforcer,
which apparently originates from very early pairings or from
interventions like those we described previously, tacts can be acquired
via direct instruction. Attention and approval are often missing as
reinforcers in children with language delays (Greer, Singer-Dudek,
Longano, & Zrinzo, 2008); however, once attention is a reinforcer,
the emission of tacts is an efficient means for the child to attain
reinforcement. Although mands are useful, relevant unconditioned
establishing opportunities are limited to conditions of deprivation and
alleviation of aversive conditions. In homes with good caretakers,
aversive conditions are often avoided, and children are not under
frequent deprivation. However, tact responses are a limitless means for
reinforcement because the conditioned motivational conditions have to do
with momentary deprivation of social attention, whether that deprivation
is the result of direct deprivation (Tsiouri & Greer, 2003, 2007) or
observational deprivation (Greer & Singer-Dudek, 2008; Greer,
Singer-Dudek, Longano, et al., 2008; Singer-Dudek, Greer, &
Schmelzkopf, 2008).

In our developmental interventions with children for whom we have
provided the basic speaker and listener capabilities through
interventions, we emphasize the expansion of the tact repertoire. To do
this, we use the intensive tact protocol (Lydon, Healy, Leader, &
Keohane, 2008; Pereira-Delgado & Oblak, 2007; Pistoljevic &
Greer, 2006; Schauffler & Greer, 2006). In this procedure we
increase tact instruction in addition to maintaining pre-intervention
levels of instruction across all other curricular areas. In several
controlled experiments, we have found that this instruction has led to
significant increases in children's emission of spontaneous pure
tacts (tacting stimuli without being asked) in noninstructional
settings, such as lunchtime, free play, or transition (Lydon et al.,
2008; Pistoljevic & Greer, 2006; Schauffler & Greer, 2006).
Interestingly, the majority of the tacts emitted in the noninstructional
settings are not those that were taught in the tact instruction. In
another study, there was a significant increase in "wh"
questions in addition to increases in tacts (Reilly-Lawson & Walsh,
2007). These findings suggest to us that the intervention provides new
means for children to receive increased attention and even provides
establishing operations for "wh" questions that, in turn,
allow the child to recruit more tacts and subsequent attention by
emitting more frequent tact responses. At this point, the developmental
learning history has established attention as a conditioned reinforcer
for tacts and observational responding, a cusp that appears to develop
incidentally for typically developing children.

On the listener side, the child builds on prior cusps such that
after hearing the "name" or tact of stimuli, while jointly
attending to the stimuli spoken of by a speaker, she or he can respond
as a listener without direct instruction (e.g., "That's a
robin," followed by the child's pointing to a robin when asked
to do so; Feliciano, 2006; Horne et al., 2004). In the Feliciano
experiment, children with no or little speech acquired the listener half
of Naming as a result of multiple-exemplar instruction across the
matching and hearing condition and the listener responses of pointing,
suggesting that the listener half can be acquired prior to the
child's having a speaker response. Presumably, as children like
these acquire speaker responses, both the speaker and listener responses
of Naming accrue, and in fact, this did appear to be the case for one of
Feliciano's participants, who began to echo the listener
instructions. Covert echoics are suggested as one possible source of the
reinforcement for Naming (Longano, 2008; Lowenkron, 1991, 1998), while
stimulus-stimulus pairing is the possible source for why the echoic may
reinforce Naming (Longano, 2008; Stemmer, 1992). It is also possible
that both of these play a role. Full Naming then emerges as the listener
and speaker components of observing and producing are further joined.
After hearing tacts of objects, and later two-dimensional
representations, the child echoes the tact form and emits pure tacts and
impure tacts, as well as the listener component of Naming (Fiorile &
Greer, 2007; Gilic, 2005; Greer, Stolfi, et al., 2005, 2007).

In the above sequence, the listener components and other
observational components may proceed at different rates compared to the
speaker and other production components. However, with Naming and the
other speaker-as-own-listener capabilities (see and do and self-talk
conversational units), the verbal observing and producing are joined, at
least to some degree.

Transformation of Establishing Operations Across Mands and Tacts

Establishing operation is a term used to describe something that
momentarily alters the effectiveness of a stimulus as reinforcement
(Michael, 2004). The reinforcement for a mand is specified, whereas the
reinforcement for a tact is generalized and usually comes in the form of
feedback from a listener. However, the form of the behavior itself (the
actual word) may of course be identical for both the mand and the tact,
as described above. At some point in children's development,
learning a response as a tact results in the untaught capability to emit
the response under mand establishing operation conditions or vice versa
(Arntzen & Almas, 2002; Petursdottir, Carr, & Michael, 2005).

Nuzzolo-Gomez and Greer (2004) found in an experiment that
controlled for maturation and history that multiple-exemplar instruction
could lead to this capability. They identified children with language
delays for whom the mand and tact functions for responses were
independent. If children were taught a tact, they could not use the
response under the establishing operations for the mand or vice versa.
The experimenters then provided a multiple-exemplar instructional
intervention in which a set of speech responses was taught across both
types of establishing operations using contrived conditions. For
example, after the child had learned a tact function for a word but
could not use it as a mand, conditions were arranged in which the child
needed to emit the response under mand conditions. Similarly, responses
that had mand functions were placed under contrived establishing
operations for the tact function. The two functions were rotated for a
subset of trained responses until the responses had both mand and tact
functions controlled by the contextual establishing operations for the
respective functions. After the intervention, the children could use the
pretest words that were formerly independent responses in either
condition. In addition, following the MEI intervention, they were taught
novel responses in single functions and they could emit the untaught
functions. Greer et al. (2003) and Nirgudkar (2005) replicated these
findings, again in experiments controlling for maturation and history.
We characterize the onset of this capability as the acquisition of
"transformation of establishing operations" across mand and
tact functions. The derived relations between the mand and tact
functions were controlled by the relevant contextual establishing
operations after the intervention and were not controlled by them
before; thus, the responses were transformed from control by one type of
contextual condition to control by either condition.

Suffixes in the Tact Repertoire

Once the reinforcement contingencies for tacts are in place,
further expansion of a tact repertoire may occur when autoclitic frames
in the form of affixes are combined with tacts. Greer and Yuan (2008)
tested the effects of contextual control--pictures taught for
conditional stimulus control--on the emergence of novel past tense verb
forms in children who lacked that capability. The intervention required
the children to master the contextual control for emitting past tense in
multiple-exemplar instruction that rotated learn units across present
tense and past tense responses. They were taught to emit the present
tense when pictures were shown of children engaged in the verb actions
where the sun and a blue sky were present. They were also taught to add
the "-ed" ending when the pictures had dark skies and a moon.
This intervention resulted in abstraction of the "-ed"
autoclitic frame (a partially conditioned tag that changes the meaning
of other verbal behavior) to novel regular and irregular verbs.
Emissions of novel verb forms such as "He singed last night"
have been regarded as "benchmarks of arguments that language is
acquired by a neural network independent of experience" (Pinker,
1999, p. 190). In a similar study on suffixes, Speckman and Greer (2006)
implemented an MEI intervention for teaching a subset of positive and
comparative regular and irregular adjectives as well as
"contrived" adjective forms, again using a multiple-probe and
time-lagged experimental design. The procedure induced derived relations
for the "-er" autoclitic frame across regular, irregular, and
contrived adjectival forms (e.g., "blooby" and
"bloobier"). Both of these were experimental analyses that
controlled for maturation and history. The recombination of tact forms
with abstracted autoclitic frames as affixes allows for a significant
increase in children's vocabulary, as well as a more precise
speaker capability (i.e., autoclitic functions as in "the bigger
one"). These findings also weaken the face validity of linguistic
developmental theories that deny the role of learning (Pinker, 1999).

Readiness for Joining the Listener and Speaker

At this point, the child is prepared to contact the contingencies
for joining the listener and speaker. Verbal episodes involve rotation
of speaker and listener exchanges between individuals. Conversational
units may be one of the strongest measures of socialization, in that
they consist of verbal interactions in which each person in a
turn-taking exchange is reinforced as both speaker and listener. Donley
and Greer (1993) experimentally identified contextual conditions (a
setting where only verbal interaction with peers was possible) that
resulted in the emission of conversational units between peers with
mental retardation. Chu (1998) also experimentally demonstrated
contextual conditions for inducing and expanding conversational units
between children with autism and nonhandicapped siblings in two separate
experiments.

We suggest that the joining of the listener and speaker progresses
from listener-speaker rotations with others as a likely precedent for
the three major components of speaker-as-own-listener--say-do
correspondence, self-talk conversational units, and Naming. Several
experiments suggest that most, if not all, of the foundational
components described earlier must be present for Naming to emerge as a
result of the Naming MEI protocol (Longano, 2008; Pistoljevic, 2008;
Speckman-Collins, Park, & Greer, 2007). Introducing Naming
instruction to a child who does not orient to voices, faces, other
visual stimuli, or sounds is likely futile. Moreover, children who lack
auditory selection, fluent echoics, and tacts cannot likely profit from
the protocol to induce Naming. These appear, at present, to be the
developmental cusps on which Naming and other speaker-as-own-listener
capabilities are built. It is probable that many more remain to be
identified.

The rotation of speaker and listener roles, or turn-taking between
individuals, where the reinforcement for the listener and speaker roles
is present (Donley & Greer, 1993), prepares the way for joining of
the speaker and listener in the individual in self-talk (Lodhi &
Greer, 1989) and say-and-do correspondence (Paniagua & Baer, 1982).
The intercept of listener and speaker within the skin, in turn, makes
the subsequent more complex verbal behavior possible. These independent
observing and producing responses come to be joined by a sequence of
exemplar experiences and pairing experiences that result in the higher
order classes that we describe next.

More Complex Verbal Behavior Made Possible by the Intercept of
Speaker and Listener

The Reader and Writer Learn to Listen

Reading involves a form of listening, just as saying and doing
involve a form of listening. Phonemic sounds lead to word sounds, and
the Naming capability allows prior experiences to result in
comprehension and conditioned seeing. At the same time, the writer must
come to be controlled by the effects of his or her writing on a reader
(Reilly-Lawson & Greer, 2006), and the reader must read and do
accurately. Reilly-Lawson and Greer (2006), building on experimental
research by Madho (1997) and Marsico (1998), arranged an intervention in
which the writer was required to continue to rewrite until the behavior
of a reader corresponded with the objective the writer sought from the
reader. This intervention led to significant improvements in both the
functional and structural components of writing. Jadlowski (2000), in a
controlled experiment, found that editing others' writing acted to
decrease the number of rewrites participants needed to affect the
behavior of readers. We suggest that, in all three of these experiments,
the effects that accrued from these interventions were derived from
speaker-as-own-listener joining print and that the
speaker-as-own-listener plays the key role in self-editing. The speaker
must listen, so to speak, to what she or he has written, as the reader
would listen. Listening to what one has written is key to
self-editing (writer-as-own-reader). When the reader listens to his or her
textual response and has the Naming capability (i.e., the relevant tacts
are present or a minimal number are present), comprehension and its
reinforcement occur immediately. By "comprehension and its
reinforcement," we mean that the reader responds to precise
instructions for technically reinforced purposes, or the reader's
senses are extended for emotional effects.

Acquisition of self-editing (listening to what one has written
relevant to the audience the writer seeks to affect), verbally governed
responding (responding to written or spoken verbal stimuli), and
verbally governing responding (writing or speaking algorithms that evoke
simple responses or complex problem solving by listeners or speakers)
results from the joining of print to Naming, self-talk, and say-do
correspondence. These make problem solving using the methods of
authority, logic, and science (Peirce, 1935) possible because they are
types of verbally governing and verbally governed behavior.

Conclusion

We propose that our evidence on the identification and induction of
missing verbal capabilities, and experiments isolating the instructional
histories for doing so, suggest a developmental sequence, some of which
we describe above and in other articles (Greer & Keohane, 2005,
2006; Keohane et al., in press) and a book (Greer & Ross, 2008).
Moreover, the protocols derived from these experiments offer ways to
provide children with missing verbal capabilities.

The findings from Naming, Relational Frame Theory, Stimulus
Equivalence, mainstream developmental psychology and verbal development
research contributed to our theory of verbal development. Indeed, the
basic research on the existence and sources of various types of emergent
behavior provided us with the questions and the tools to identify and
induce what we propose are nonverbal and verbal developmental cusps and
capabilities. Rosales-Ruiz and Baer's (1996) paper on the concept
of behavioral developmental cusps, the concept of higher order operants
(Catania, 2007), Horne and Lowe's (1996) seminal work on Naming,
the pioneering work on emergent behavior (Sidman, 1986), major
contributions of Relational Frame Theory research, and our interaction
with behavior analysts in Ireland led us to procedures that
revolutionized what we could do with children. The new and more complete
account of verbal behavior that emerged from these various programs of
research made it possible for us to provide an account of verbal
development.

One of the reviewers asked us what our position was on the
different theories pertaining to emergent verbal behavior. Rather than
differences, we saw amazing consistency for our purposes--the induction
of verbal capabilities in children who were missing them. For example,
Naming is a higher order operant (Catania, 2007; Horne & Lowe, 1996)
or overarching operant (Hayes et al., 2001), and this suggested that
verbal developmental capabilities and some cusps were themselves higher
order operants. Barnes-Holmes et al. (2001) provided a convincing
argument that to be truly verbal the listener and speaker must be joined
in ways that make arbitrarily applicable relations possible--both
speaker and listener operants are foundational, but ultimately it is the
intercept that is fully verbal. Derived relations make transformation of
stimulus control possible, and the contribution of Sidman (1986) was
simply basic to all of this. Hayes et al. (2001) identified and
profoundly expanded the potential explanatory role of relational
responding and suggested sources in MEI histories. Horne and Lowe (1996)
discovered Naming as a verbal developmental capability, and this led us
to consider other steps in development as verbal developmental
capabilities, too. Lodhi and Greer (1989) identified self-talk
conversational units, and Paniagua and Baer (1982) identified say-and-do
correspondence early on. Foundations of verbal behavior, like the
development of observational stimulus control, also owe a great deal to
Pavlovian second-order conditioning that underlies complex derived
relations and allows verbal capabilities to emerge (Donahoe &
Palmer, 2004; Leader et al., 2000; Longano, 2008). There is now
considerable evidence about the role of the listener in verbal behavior,
making the account of verbal behavior more complete--we know much more
about the origins of so-called novel verbal behavior. Though there are
differences in interpretation about these various phenomena, we are
interested in findings and theories that work in inducing verbal
behavior.

We drew on the similarities in the findings of these programs of
research to deal with seemingly intractable learning problems. What we
viewed as convergent findings suggested that the learning problems that
we encountered regularly were, in fact, behavioral developmental
obstacles, namely missing higher order operants and relational
responding. Findings and interpretations from all of the work suggested
possible interventions to induce verbal capabilities in children who
were missing them. The interventions have consisted of multiple-exemplar
instruction across listener and speaker responses (Fiorile & Greer,
2007; Greer, Stolfi, et al., 2005, 2007), saying and writing (Greer,
Yuan, et al., 2005), contextual establishing-operations control for
manding and tacting (Nuzzolo-Gomez & Greer, 2004), contextual
control for literal and metaphoric expressions (Meincke-Matthews, 2005),
and contextual control for suffixes (Greer & Yuan, 2008; Speckman
& Greer, 2006). Other interventions included conditioning
reinforcement of stimuli for the range of observing responses (Dinsmoor,
1983; Donahoe & Palmer, 2004; Keohane, Greer, & Ackerman, 2006a,
2006b, 2006c; Longano & Greer, 2006; Tsai & Greer, 2006) that
resulted in accelerated learning and developmental cusps that were not
capabilities, yet seem to be foundational cusps that prepare the
potential for capabilities. We are exploring other possibilities and
there are likely many more. The range of findings across verbal
capabilities, and their foundations, made the proposal of a verbal
developmental theory irresistible.

However, any theory should be treated with caution, and ours is no
exception. For example, we presume much in extrapolating our findings
from work with children with language or learning delays to the verbal
development of typically developing children. Moreover, we do not have
the benefit of large-group studies that have been the mainstay of
traditional approaches to development. Population studies have been the
mainstay of developmental psychology, where the population is grouped by
age. Of course, group studies are necessary for studying populations and
testing for generality from representative samples of populations to the
populations themselves. However, they do not provide tests for
generality to individuals. Also, presuming that age constitutes the
population for a particular developmental capability/cusp is just that:
a presumption. Our objective is to isolate the role of experience.
Experimental analyses at the level of the individual are useful for
obtaining generality to individuals, and it is individuals who are our
targets. We have also done several studies that combined
experimental-control-group designs with single-case designs, and this
approach may provide the means for testing the generality of findings to
both individuals and populations.

Many developmental theories are driven by hypotheses and are
essentially deductive in nature. However, all theories need not be
deductive. While there are some compelling advantages of deductive
approaches, there are equal if not stronger advantages to an inductive
approach. The theory we propose grew out of a more inductive approach.
In such an approach, replications of the findings with individuals who
have similar cusps and capabilities expanded the generality to
individuals with similar characteristics. Moreover, the time-lagged
multiple-probe designs, thorough knowledge of the children's
entering capabilities, and close, continuous contact with the
participants following the experiments contributed to the generality and
validity of the procedures. We are confident that the educational and
developmental contributions are robust because we have replicated them
with many children.

However, the validity of our interpretation of this work remains
to be tested by research and close scrutiny by other scholars. Although
we did begin to test the applicability of Skinner's (1957) verbal
behavior theory to solving verbal deficits in a program of research that
began over 27 years ago, we did not set out to find a theory of verbal
development; rather, it found us. The early direct applications of
Skinner's theory were helpful in inducing speaker operants, in
particular, identifying the source and procedures to teach
"spontaneous speech." However, with the advent of Naming
theory, relational frame theory, stimulus equivalence theory, the notion
of higher order operants and behavioral developmental cusps, and the
incorporation of the listener role, a more complete account of verbal
behavior emerged, particularly the joining of the speaker and listener
within the skin. Our research then built on this more complete
understanding of verbal behavior to identify verbal developmental cusps
and capabilities and the means to induce them. The sequence of these
cusps and capabilities suggests a trajectory of verbal development in
children. Hopefully, this theory will prove a useful addition to the
body of scholarship devoted to understanding language and its evolution
and development.

CHAVEZ-BROWN, M., & GREER, R. D. (in press). The effects of the
acquisition of a generalized auditory word match-to-sample selection
repertoire on the emergence or improvement of the echoic repertoire. The
Analysis of Verbal Behavior.

FIORILE, C. A., & GREER, R. D. (2007). The induction of Naming
in children with no echoic-to-tact responses as a function of multiple
exemplar instruction. The Analysis of Verbal Behavior, 23, 71-88.

GEWIRTZ, J. L. (1969). Mechanisms of social learning: Some roles of
stimulation and behavior in early human development. In D. A. Goslin
(Ed.), Handbook of socialization theory and research (pp. 57-72).
Chicago: Rand-McNally.

GREER, R. D., NIRGUDKAR, A., & PARK, H. (2003, June). The
effect of multiple exemplar instruction on the transformation of mand
and tact functions. Paper presented at the annual international meeting
of the Association for Behavior Analysis, San Francisco.

GREER, R. D., & O'SULLIVAN, D. (2007, May). Preliminary
report of Naming for 2-dimensional stimuli in typically developing first
graders. Paper presented at the annual international meeting of the
Association for Behavior Analysis, San Diego.

GREER, R. D., & ROSS, D. E. (2004). Verbal behavior analysis: A
program of research in the induction and expansion of complex verbal
behavior. Journal of Early and Intensive Behavior Intervention, 1(2),
141-165. Retrieved February 20, 2005, from
http://www.jeibi.com/JEIBI-l-2.pdf

GREER, R. D., STOLFI, L., CHAVEZ-BROWN, M., & RIVERA-VALDEZ, C.
(2005). The emergence of the listener to speaker component of Naming in
children as a function of multiple exemplar instruction. The Analysis of
Verbal Behavior, 21, 123-134.

HELOU-CARE, Y. (2008). The effects of the acquisition of naming on
reading comprehension with academically delayed middle school students
with behavioral disorders. Unpublished doctoral dissertation, Columbia
University, New York.

KEOHANE, D. D., GREER, R. D., & ACKERMAN, S. A. (2006a, May).
Effects of teaching sameness across the senses on acquisition of
instructional objectives by pre-listeners and listeners and
emergent-speakers. Paper presented as part of a symposium at the annual
international meeting of the Association for Behavior Analysis, Atlanta, GA.

LONGANO, J. M. (2008). An investigation of the source of
reinforcement for Naming as a function of a second order classical
conditioning procedure. Unpublished doctoral dissertation, Columbia
University.

LONGANO, J., & GREER, R. D. (2006). The effects of a
stimulus-stimulus pairing procedure on the acquisition of conditioned
reinforcement for observing and manipulating stimuli by young children
with autism. Journal of Early and Intensive Behavior Intervention, 3,
135-150. Retrieved February 22 from
http://www.behavior-analyst-online.org

LUCIANO, M. C., & POLAINO-LORENTE, A. (1986). The effects of the
acquisition of prerequisite behavior of non-vocal children and vocal
imitation in children with severe retardation. The Psychological Record,
36, 315-332.

LYDON, H., HEALY, O., LEADER, G., & KEOHANE, D. D. (2008). The
effects of intensive tact instruction on three verbal operants in
non-instructional settings for two children with autism. Paper submitted
for publication.

PEREIRA-DELGADO, J. A., & OBLAK, M. (2007). The effects of
daily intensive tact instruction on the emission of pure mands and tacts
in non-instructional settings by three preschool children with
developmental delays. Journal of Early and Intensive Behavior
Intervention, 4(2), 392-411. Retrieved May 1, 2007, from
http://www.behavior-analyst-online.org

REILLY-LAWSON, T. (2008). Phonemic control as the source of derived
relations between naming and reading and writing. Unpublished doctoral
dissertation, Columbia University.

REILLY-LAWSON, T., & GREER, R. D. (2006). Teaching the function
of writing to middle school students with academic delays. Journal of
Early and Intensive Behavior Intervention, 3, 151-169.

REILLY-LAWSON, T., & WALSH, D. (2007). The effects of
observational training on the acquisition of reinforcement for
listening. Journal of Early and Intensive Behavior Intervention,
430-452. Retrieved May 1, 2007, from
http://www.behavior-analyst-online.org

TSIOURI, I., & GREER, R. D. (2007). Different social
reinforcement contingencies in inducing echoics to tacts through motor
imitation responding in children with severe language delays. Journal of
Early and Intensive Behavior Intervention, 3.

Columbia University Teachers College and Graduate School of Arts
and Sciences

JeanneMarie Speckman

The Fred S. Keller School

The authors may be contacted at Box 76, Teachers College, Columbia
University, New York, NY 10027, or at rdgl3@columbia.edu. We would like
to acknowledge the efforts of the editor and reviewers, who provided
uncommonly good feedback. We would like to give special thanks to
Reviewer B and the Editor for helping us clarify points in the paper
and for their encouragement.

To study Naming directly entails ... experimental investigation from
birth, of how the young child learns the behavioral relations
involved in Naming. This approach would certainly be more
parsimonious; it is also in the best tradition of behavior analysis.
Such a study would enable researchers to come to terms with the full
complexity of the phenomenon, both in terms of the conditions that
give rise to it and the interactions between the multisensory
stimulation and the multi-modal responding that it entails, including
emotional behavior and the effects of classical conditioning. (Horne
& Lowe, 1996, p. 238)

A cusp is a change that (1) is often difficult, tedious, subtle, or
otherwise problematic to accomplish, yet (2) if not made, means
little or no further development is possible in its realm (and
perhaps in several realms); but (3) once it is made, a significant
set of subsequent developments suddenly becomes easy or otherwise
highly probable which (4) brings the developing organism into contact
with other cusps crucial to further, more complex, or more refined
development on a thereby steadily expanding, steadily more
interactive realm. (p. 166)