Encounters with Noticing Part 2

If I’d actually drunk a bottle of tequila while trying to understand Schmidt’s Noticing Hypothesis last Tuesday, I would have woken up with a hangover, and these days the hangovers are so bad that I just can’t face them. So when I woke up the morning after, all was well; my surroundings were familiar, my wife was with me, there was nothing to make amends for. Reassuring, of course, but I confess to feeling nostalgia for my younger days. There’s nothing quite like the fun you have drinking; the Devil has all the best songs, they say, and I bet Hades had all the best cocktails. Easy to imagine getting the ferry across the Acheron, sitting around the lounge bar waiting to see where you were going (probably not to the Elysian fields!), banging back dry martinis with funny people like W.C. Fields (“I cook with wine. Sometimes I even add it to the food”) and Tommy Cooper (“I’m on a whisky diet. I’ve lost three days already”), grateful that you’d never been a mere sober mortal.

Downstairs, I made a nice big mug of tea and took it to the study. There on the desk and on the monitor was all this stuff about the Noticing Hypothesis. Not just Schmidt versus Truscott, and Gregg versus Krashen, and all the other SLA feuds, but also the famous Locke versus Leibniz debate and the equally famous Aristotle versus Plato debate about more or less the same thing. Aristotle wasn’t quite an empiricist, but certainly got the better of Plato on epistemology, while Leibniz is generally regarded as coming out on top against Locke. The Leibniz–Locke debate in particular still seems relevant today in the light of the latest challenge to nativist views on language learning, and I think Leibniz might have had some harsh words to say about the blurred lines between awareness, attention and consciousness in Schmidt’s attempts to develop the Noticing Hypothesis.

Just to reassure those who might be unduly swayed by the likes of Penny Ur (and Scott Thornbury on a bad day) into thinking that they shouldn’t worry their heads with all this theoretical stuff (just trust your instincts and polish your presentation skills), my motivation for sniffing around this particular theoretical stuff is to check on the foundations of our teaching. It’s a terrible job, the pay’s lousy, but somebody’s got to do it, right? Somebody’s got to check, that is, to see whether ‘noticing’ justifies all the explicit teaching done in its name. I suspect that the influential teacher trainers who rely on ‘noticing’ to justify their encouragement of everything from teaching a grammar-based syllabus to teaching as many lexical chunks as you can cram into a 90-minute class are talking baloney, and it should be made clear that their advice gets no support from any good research. On the face of it ‘noticing’ encourages bad teaching practice, and so needs to be carefully examined.

So here we go with Part 2. I left Part 1 face down on the carpet, exhausted by unsuccessful efforts to understand the Noticing Hypothesis. In the comments that followed, one particular problem was highlighted by Kevin Gregg, who said:

You can’t notice what is not in the input; and rules, for instance, or functions, are not in the input.

This prompted Thom to ask:

In what other way can anybody learn grammar if it is not by way of input?

Kevin’s on-going tussle with time (trains to catch, letters to write, shopping to do) prevented him from replying, so I’ll try.

Well, it depends where you’re coming from, as they say. Empiricists, or rather, “‘empiricist’ emergentists” as Gregg calls them, would say that input is the sufficient condition for learning an L2, and they’d probably caution against listening to any talk of mental grammars. Empiricists like Nick Ellis see all knowledge as coming from the information we get through our senses during our interaction with the environment, and with reference to language learning, the emergentists argue that we aren’t born with linguistic knowledge of any sort because we don’t need it. General learning devices (capable of making generalisations based on exemplars found in the input, for example) are all we need. In Nick Ellis’ words:

massively parallel systems of artificial neurons use simple learning processes to statistically abstract information from masses of input data. What evidence is there in the input stream from which simple learning mechanisms might abstract generalizations? The Saussurean linguistic sign as a set of mappings between phonological forms and conceptual meanings or communicative intentions gives a starting point. Learning to understand a language involves parsing the speech stream into chunks which reliably mark meaning.

… in the first instance, important aspects of language learning must concern the learning of phonological forms and the analysis of phonological sequences: the categorical units of speech perception, their particular sequences in particular words and their general sequential probabilities in the language….

In this view, phonology, lexis and syntax develop hierarchically by repeated cycles of differentiation and integration of chunks of sequences.
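The kind of sequential statistic Ellis has in mind can be made concrete with a toy sketch (my own illustration, emphatically not Ellis’s actual model): Saffran-style segmentation of a continuous syllable stream, in which a learner posits a chunk boundary wherever the transitional probability from one syllable to the next dips.

```python
from collections import Counter

# Toy illustration (mine, not Ellis's model): segment a continuous
# syllable stream into chunks wherever the transitional probability
# P(next syllable | current syllable) dips -- the kind of sequential
# statistic a general-purpose learner could abstract from raw input.

def transitional_probs(stream):
    bigrams = Counter(zip(stream, stream[1:]))
    firsts = Counter(stream[:-1])
    return {(a, b): n / firsts[a] for (a, b), n in bigrams.items()}

def segment(stream, threshold=0.8):
    tp = transitional_probs(stream)
    chunks, current = [], [stream[0]]
    for a, b in zip(stream, stream[1:]):
        if tp[(a, b)] < threshold:   # low predictability -> chunk boundary
            chunks.append("".join(current))
            current = []
        current.append(b)
    chunks.append("".join(current))
    return chunks

# Three nonsense "words" spoken as one continuous stream, no pauses:
words = {"T": ["tu", "pi", "ro"], "G": ["go", "la", "bu"], "B": ["bi", "da", "ku"]}
order = "TGBGTBTGB"
stream = [syll for w in order for syll in words[w]]
print(segment(stream))
```

Within-word transitions here are perfectly predictable (probability 1.0) while cross-word transitions are not, so the unbroken stream is cut back into the three “words” purely from their sequential probabilities, with no prior linguistic knowledge built in.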

On the other hand, nativists like Kevin Gregg, especially those who accept Chomsky’s principles and parameters UG theory, point to the knowledge young children have of language to argue that SLA is the result of an innate representational system specific to the language faculty acting on input in such a way that an L2 grammar is created. We are born with knowledge of various linguistic rules, constraints and principles. In interaction with the environment, which exposes us to ‘primary linguistic data’, we acquire a new, expanded body of linguistic knowledge, namely, knowledge of a specific language like English. This final state of the language faculty constitutes our ‘linguistic competence’, essential, but not sufficient for our ability to speak and understand a language. Additional knowledge about actual language use is acquired through other general learning mechanisms.

Whatever view we take of the SLA process, the question of how it starts (input) is obviously critical, but re-visiting Schmidt’s Noticing Hypothesis has led me to appreciate that the question of how it ends up is equally important. What finally gets acquired? To answer this question we need what Gregg calls a “property” theory of SLA – a theory of language, or, more precisely, of linguistic knowledge of the L2. What is the knowledge that is acquired when someone learns a second language? O’Grady (2005) notes that while the UG camp talk about problems sorting out categories and structures, the emergentists talk about sorting out words and their meanings, and this leads him to suggest that the disagreement about how we learn an L2 stems from a deeper disagreement about “the nature of language itself”. O’Grady (2005, p. 164) explains:

On the one hand, there are linguists who see language as a highly complex formal system that is best described by abstract rules that have no counterparts in other areas of cognition. …. Not surprisingly, there is a strong tendency for these researchers to favor the view that the acquisition device is designed specifically for language. On the other hand, there are many linguists who think that language has to be understood in terms of its communicative function. According to these researchers, strategies that facilitate communication – not abstract formal rules – determine how language works. Because communication involves many different types of considerations … this perspective tends to be associated with a bias toward a multipurpose acquisition device.

This excellent comment is echoed by Susanne Carroll (2001, p. 47), who distinguishes between

Classical structural theories of information processing which claim that mental processes are sensitive to structural distinctions encoded in mental representations. Input is a mental representation which has structure.

Classical connectionist approaches to linguistic cognition which deny the relevance of structural representations to linguistic cognition. For them, linguistic knowledge is encoded as activated neural nets and is only linked to acoustic events by association.

Carroll comments:

Anyone who is convinced that the last 100 years of linguistic research demonstrate that linguistic cognition is structure dependent — and not merely patterned — cannot adopt a classical connectionist approach to SLA.

O’Grady’s and Carroll’s remarks remind me that the majority of scholars who are currently looking closely at how input ends up as knowledge don’t articulate a coherent answer to the crucial question: “What is the linguistic knowledge that is acquired?”. Many years ago, I myself made some effort to kick this question into the long grass. Gregg’s repeated insistence on the need for a property theory of SLA which describes what is acquired prompted me to say in a book and in an article for Applied Linguistics that researchers could perfectly well get on with developing a theory of SLA without worrying about the damn property theory. In a short reply (I think he had a bus to catch that time), Gregg effortlessly dealt with my bleatings (the bus and, I like to think, our friendship saved me from the full Gregg treatment) and I’m now fully persuaded that he’s right to demand a property theory.

I think it’s the absence of a well-articulated property theory that makes it so difficult for Schmidt and others to explain how information from the environment ends up as linguistic knowledge of the L2. They accept that the knowledge acquired includes linguistic knowledge of, for example, the structure of an English verb phrase, and they insist that learning this knowledge depends on ‘noticing’ things in the input. But how, we must ask again, does ‘noticing’ audio stimuli from the environment lead to the acquisition of the linguistic knowledge demonstrated by proficient L2 users? Let’s take a quick look at the history of SLA research.

The shift from a behaviouristic to a mentalist view of language learning (sparked by Chomsky’s rebuttal of Skinner in 1957) prompted scholars in the field of psycholinguistics to see language learning as a process which goes on inside the brain and involves the workings of some kind of acquisition device. The as-yet-unobservable “black box” that we can refer to as an acquisition device is almost certainly not located in one particular part of the brain, might or might not be dedicated exclusively to language learning, might or might not make use of innate linguistic knowledge, but certainly does (somehow) enable us to receive, organise, store, retrieve and manipulate ‘input’ so as to facilitate learning the L2.

And there it is: ‘input’. The Merriam-Webster dictionary says that the term was first used in 1953, in the context of computer design, to refer to data sent to a computer for processing. In the study of SLA, Corder (1967) was the first to suggest that we acquire the rules of language in a predictable way, and that the order is independent of the order in which rules are taught in language classes. This led Corder to suggest that there was a difference between input and intake.

The simple fact of presenting a certain linguistic form to a learner in the classroom does not necessarily qualify it for the status of input, for the reason that input is ‘what goes in’ not what is available for going in, and we may reasonably suppose that it is the learner who controls this input, or more properly his intake. This may well be determined by the characteristics of his language acquisition mechanism. (p. 165).

Here, input is what’s available, and intake is what the learner decides to take in. It’s not clear to me what either ‘input’ or ‘intake’ refer to, and anyway, as Schmidt (1990) points out, Corder contradicts himself by saying in the first sentence that the learner controls intake, and by then saying in the second sentence that his language acquisition mechanism does. More importantly for our hunt, Schmidt goes on to say that it’s not clear whether intake is the subset of input that makes it into short term memory, or whether it’s that part of input that has been sufficiently processed to now form part of the learner’s interlanguage system. The way Schmidt expresses this second point is instructive. Schmidt says that Corder’s treatment of intake does not make any clear distinction between that part of input used to comprehend messages and that part used “for the learning of form” (Schmidt, 1990, p. 139). Schmidt also endorses Slobin’s (1985) distinction between processes involved in converting input into stored data for the construction of language, and processes used to organise stored data into linguistic systems. Schmidt is obviously aware (sorry) of the problem of clearly identifying not just the level of conscious attention/awareness involved in noticing, but also the problems of clearly defining what is noticed and what (if any) processing goes on when learners notice whatever it is they notice.

Moving on to Krashen, his input hypothesis draws on the “natural order” of L2 acquisition that Corder drew attention to, and supposes that learners progress along a pre-determined learning trajectory which is impervious to instruction and controlled by a language acquisition device. Acquisition, Krashen says, is triggered when learners receive L2 input that is one step beyond their current stage of linguistic competence. If a learner is at a stage ‘i’, then acquisition takes place when he/she is exposed to ‘Comprehensible Input’ which belongs to level ‘i + 1’. In Krashen’s model, learners only need comprehensible input and a low affective filter to acquire the L2, because once the i + 1 input is received, Chomsky’s LAD does the rest. Almost needless to say, the trouble with Krashen’s input hypothesis is that he nowhere explains what comprehensible input consists of, or tells us how to recognise it.

Unsurprisingly, Schmidt’s not very impressed with Krashen’s badly-defined hypothesis, but it’s not just the lack of definition that Schmidt objects to; crucially, Schmidt insists that SLA is triggered by conscious attention. Krashen’s comprehensible input is, says Schmidt, much better seen as intake, itself defined as that part of the input which is ‘noticed’. What learners actually do is consciously attend to (notice) certain parts of the input, and the noticed parts become intake. Furthermore, since the parts of the input which aren’t ‘noticed’ are lost, it follows that noticing is the necessary condition for learning an L2. In his 1990 paper, at least, the claim is not, as so many now want to interpret the Noticing Hypothesis, “More noticing leads to more learning”, but rather, the much stronger claim “Learning can’t take place without noticing”.

In the next post, I intend to look at processing models and try to pin down Schmidt’s “technical” definition of ‘noticing’, which he says is “equivalent” to Gass’ ‘apperception’. Hmmm. I’ll also look at Susanne Carroll’s very different view of input. She says:

The view that input is comprehended speech is mistaken and has arisen from an uncritical examination of the implications of Krashen’s (1985) claims to this effect. … Comprehending speech is something which happens as a consequence of a successful parse of the speech signal. Before one can successfully parse the L2, one must learn its grammatical properties. Krashen got it backwards!


14 thoughts on “Encounters with Noticing Part 2”

I’m drawn to trying to understand this thread because I’m seriously interested and, long retired, have the time. I offer the following comments for demolition or comment. (1) Surely the place to start the search for “the truth” is in the speculations arising from the literature of academic/scientific study or research into actual people learning languages. (I take it we are always going to have to make do with speculations, since we cannot get inside people’s brains to see what is going on there, the place where learning takes place. At best, neurolinguistics might be able to tell us which bits of the brain are activated for certain processes, but not what precisely happens.) (2) Did I observe a tiny bit of learning/acquisition yesterday where the pleasure principle played a role? My German-speaking great nephew (son of my German nephew), aged 8, comes to me once a week ostensibly for a clarinet lesson. I’ve done a deal with him – we alternate between musical practice, which he does not enjoy much, and short visits on my PC to the so-called virtual world of Second Life. My account there is in English. When his mother collected him yesterday he did not mention the clarinet; he told her: “I’ve learned some English today, but not in school: ‘Stop flying!’”
He’d picked this up because he’d asked me in German how he could get his avatar to stop flying and I’d replied in English: “Left click on ‘Stop flying’.” We can note that this question and answer were situational, contextualised, and there was a causal link – a click resulted in the observable figure, the avatar, dropping to the ground. Squeezing this mini example tight to get from it all I can, I notice he picked up this phrase in English (1) when he was focussed on something else – how to manipulate an avatar in Second Life – not on learning English, and (2) when he was enjoying himself.

1. A lot of SLA research is based on actual people learning languages. Schmidt’s Noticing Hypothesis was originally based on studies of two people: Wes learning English and Schmidt himself learning Portuguese. Our speculations about what’s going on in the brain / mind lead us to propose hypotheses which must be framed in such a way that empirical evidence can support or falsify them.

2. Whatever the explanation for how your great nephew picked up the English phrase, I think we can agree he did it incidentally, and, as Mike Long points out, the incidental learning of vocabulary doesn’t have to be noticed. Hope this helps, as they say, though I confess that since I myself am still not sure what noticing refers to, it doesn’t help me much.

Hi Geoff. Isn’t there a difficulty in demanding that hypotheses regarding mental events (‘noticing’, ‘understanding’, ‘knowing’ etc) be supported by empirical evidence? The only empirical evidence that could be adduced for the occurrence or otherwise of any of these mental events, surely, would have to lie in the behaviour of the person held to be noticing, understanding or knowing. Since that behaviour (in the present case, language acquisition) is plainly the very thing that we are seeking to explain, how can this fail to lead to circularity? This, I think, is why attempts to pin down these terms are so frustrating.

• Learner production. The problem here is how to identify what has been noticed.
• Learner reports in diaries. Schmidt cites Schmidt & Frota (1986), and Warden, Lapkin, Swain and Hart (1995). The problem here, as Schmidt himself points out, is that diaries span months, while cognitive processing of L2 input takes place in seconds. Furthermore, as Schmidt admits, keeping diaries requires not just noticing but reflexive self-awareness.
• Think-aloud protocols. Schmidt agrees with the objection made to such protocols that studies based on them cannot assume that the protocols include everything that is noticed. Schmidt cites Leow (1997) and Jourdenais, Ota, Stauffer, Boyson, and Doughty (1995), who used think-aloud protocols in focus-on-form instruction, and Schmidt concludes that such experiments cannot identify all the examples of target features that were noticed.
• Learner reports in a CALL context (Chapelle, 1998) and programs that track the interface between user and program – recording mouse clicks and eye movements (Crosby, 1998). Again, Schmidt concedes that it is still not possible to identify with any certainty what has been noticed.
• Merikle and Cheesman distinguish between the objective and subjective thresholds of perception. The clearest evidence that something has exceeded the subjective threshold and been consciously perceived or noticed is a concurrent verbal report, since nothing can be verbally reported other than the current contents of awareness. Schmidt argues that this is the best test of noticing, and that after the fact recall is also good evidence that something was noticed, providing that prior knowledge and guessing can be controlled. For example, if beginner level students of Spanish are presented with a series of Spanish utterances containing unfamiliar verb forms, are forced to recall immediately afterwards the forms that occurred in each utterance, and can do so, that is good evidence that they did notice them. On the other hand, it is not safe to assume that failure to do so means that they did not notice. It seems that it is easier to confirm that a particular form has not been noticed than that it has: failure to achieve above-chance performance in a forced-choice recognition test is a much better indication that the subjective threshold has not been exceeded and that noticing did not take place.
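The “above-chance performance” criterion at the end of that list can be made concrete with a small sketch (my own illustration, with invented numbers, not figures from any of the cited studies): an exact binomial test of whether a forced-choice recognition score beats guessing.

```python
from math import comb

# Illustrative only (made-up numbers): in a two-alternative forced-choice
# recognition test, chance is p = 0.5, so we ask how likely a score of
# k-or-more correct out of n trials would be under pure guessing.

def p_at_least(k, n, p=0.5):
    """Exact binomial tail: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 27/40 correct is unlikely under guessing (p < .05): some evidence that
# the forms exceeded the subjective threshold, i.e. were noticed.
print(round(p_at_least(27, 40), 4))

# 22/40 is entirely compatible with guessing -- and, as Schmidt notes,
# failure to beat chance is better evidence of *not* noticing than a
# high score is of noticing.
print(round(p_at_least(22, 40), 4))
```

The asymmetry Schmidt points to falls out naturally here: a chance-level score licenses the conclusion that the subjective threshold was not exceeded, whereas an above-chance score still leaves open prior knowledge and lucky guessing, which is why he insists these be controlled.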

Schmidt goes on to claim that the noticing hypothesis could be falsified by demonstrating the existence of subliminal learning either by showing positive priming of unattended and unnoticed novel stimuli or by showing learning in dual task studies in which central processing capacity is exhausted by the primary task. The problem in this case is that in positive priming studies one can never really be sure that subjects did not allocate any attention to what they could not later report, and similarly, in dual task experiments one cannot be sure that no attention is devoted to the secondary task. Jacoby, Lindsay, & Toth (1996, cited in Schmidt, 2001: 28) argue that the way to demonstrate true non-attentional learning is to use the logic of opposition, to arrange experiments where unconscious processes oppose the aims of conscious processes.

In conclusion, it seems that Schmidt’s noticing hypothesis rests on a construct that is difficult to test empirically; it is by no means easy to properly identify when noticing has and has not occurred.

Then there’s this from Schmidt 2010:

Most empirical studies have been supportive of the Noticing Hypothesis. For example, using a clever crossword puzzle task to manipulate the focus of learners’ attention when exposed to instances of Spanish stem-changing verbs, Leow (1997, 2000) found that those who exhibited a higher level of awareness (“understanding”) learned the most; those who noticed instances but attempted no generalization learned next most; and there was no learning in the absence of noticing instances. Mackey (2006) used multiple measures of noticing and development to investigate whether feedback promotes noticing of L2 forms in a classroom context and whether there is a relationship between learners’ reports of noticing and learning outcomes. The findings of this study were that learners reported more noticing when feedback was provided, and learners who exhibited more noticing developed more than those who exhibited less noticing. Izumi (2002) conducted an experimental study to compare the effects of output and enhanced input on noticing and development. Izumi found that output subjects demonstrated more noticing and more learning than did controls, and that enhanced-input subjects exhibited more noticing but not more learning.

All of which, I think, shows that Schmidt, a serious, thorough scholar, tried hard to provide evidence for his hypothesis to the satisfaction of many.

Why am I replying to myself? Oh, well, this is Ellis, copied from your part 2:

massively parallel systems of artificial neurons use simple learning processes to statistically abstract information from masses of input data. What evidence is there in the input stream from which simple learning mechanisms might abstract generalizations? The Saussurean linguistic sign as a set of mappings between phonological forms and conceptual meanings or communicative intentions gives a starting point. Learning to understand a language involves parsing the speech stream into chunks which reliably mark meaning.

… in the first instance, important aspects of language learning must concern the learning of phonological forms and the analysis of phonological sequences: the categorical units of speech perception, their particular sequences in particular words and their general sequential probabilities in the language….

In this view, phonology, lexis and syntax develop hierarchically by repeated cycles of differentiation and integration of chunks of sequences.

Can you tell me what this means? While you’re pondering the answer, I might point out that ‘artificial neurons’ is piffle; the similarity between ‘artificial neurons’ and real ones is about the same as that between octopi and string quartets. Damn, there goes that train again.

I gave that quote from N. Ellis precisely because I thought that using the Saussurean linguistic sign as a set of mappings between phonological forms and conceptual meanings or communicative intentions was a particularly imaginative way to begin explaining how massively parallel systems of artificial neurons use simple learning processes to statistically abstract information from masses of input data. I’m disappointed to hear that artificial neurons can’t be played on a violin.

“In his 1990 paper, at least, the claim is not, as so many now want to interpret the Noticing Hypothesis, ‘More noticing leads to more learning’, but rather, the much stronger claim ‘Learning can’t take place without noticing’.”

In 1989 (UHWPESL) and 1990 (ALx), noticing was necessary and sufficient, a very strong claim. But in the face of critiques by Truscott and others, and no doubt due to his own smarts and capacity for self-criticism, he subsequently modified his position, for which, see the 2010 paper referenced below, again (downloadable from his website). To save time, the following is from a paper of mine. It ends with direct quotes from an email exchange with Dick in summer, 2015.

“Like others before him (e.g., Truscott, 1998), Swan (2005) pointed out that knowledge of some constraints on rule application cannot be the product of noticing because there are no examples to notice. For instance, some verbs (give, offer, promise, etc.) can figure in the double-object construction (e.g., The Minister promised Eubanks-Smythe a knighthood), but others (donate, present, explain, etc.) cannot (∗Eubanks-Smythe donated the Party a million pounds), requiring a prepositional phrase, instead (Eubanks-Smythe donated a million pounds to the Party). Schmidt (2010, and elsewhere) readily acknowledged that learning in such cases indicates alternative learning mechanisms at work (theoretical accounts run the gamut from innate linguistic knowledge to statistical learning), but maintained that the noticing hypothesis still applies to the great majority of language learning. The noticing hypothesis holds that learning requires attention and awareness, but not intention or understanding. Understanding is facilitative, but not required. He has never denied the possibility of incidental learning, for example, of vocabulary and collocations, but remains deeply skeptical of all claims of learning without awareness (a position with which I disagree). However, while instances have to be attended to and noticed, he recognizes that generalizations from those instances can be either explicit or implicit, implicit learning being a basic human learning mechanism that automatically detects regularities across instances. Schmidt’s position can now be summarized (briefly, for lack of space) not as saying noticing is required (noticing of surface features, at least) for all aspects of language, but as “more noticing means more acquisition,” and “more attention and more awareness means more acquisition” (R.W. Schmidt, personal communication, July 26, 2015).” (Long, 2016, p. 16)

Thanks for this. I was going to deal with the concessions Schmidt makes both in the 2001 and 2010 papers in Part 3, and this helps. Gass made the same point as Swan, didn’t she? There still seems to me to be some confusion about how the claim that “the noticing hypothesis still applies to the great majority of language learning” should be understood.

“I think it’s the absence of a well-articulated property theory that makes it so difficult for Schmidt and others to explain how information from the environment ends up as linguistic knowledge of the L2.”

This is a key issue and as far as I know (would love to get pointers to other explanations) MOGUL is the only “theory” that has outlined how noticing could work – i.e. noticing plays a major role in meta-linguistic development and may play a much more +indirect+ role in core (phonology + syntax) modular language development

Truscott, as I’m sure you know, suggests that noticing should be restricted to the metalinguistic realm. And, as I’m sure you also know, Susanne Carroll is a big fan of Jackendoff. I intend to discuss all this in Part X of “Encounters” 😦