Environmental policy with regard to noise abatement has traditionally considered only whether the noise levels in a given setting are high enough to be deemed a source of annoyance, disturbance, or threat to well-being. However, laboratory studies using both simple and more complex work-related tasks have shown that task-irrelevant sound, regardless of its intensity, intrudes upon cognitive processing and disrupts performance substantially; furthermore, its damaging effect does not diminish with repeated exposure to the sound over time. For tasks that require short-term memory processing (particularly the short-term maintenance of order information), sound assumes disruptive power if it varies acoustically over its time course. However, other properties of sound (e.g., the semanticity of speech) can incur an additional cost if the primary task necessitates, or tends to evoke, the extraction of meaning. It will be argued that interference in each case is explained by reference to a conflict between two concurrent mental processes: that demanded by the task and that involuntarily applied to properties of the sound. Such harmful effects, as well as having direct consequences for the general well-being of those working in noisy environments, may have far-reaching consequences for health insofar as extraneous sound is a feature of many safety-critical work settings. Implications for noise abatement policy are highlighted.

The most notable functional characteristic of audition, relative to vision, is the absence of any simple physical means, within the peripheral auditory hardware, of modulating the likelihood of perceiving the information. That is, we cannot easily 'shut our ears' or 'listen away' in the same way as we can 'shut our eyes' or 'look away' from something (hence the dubbing of audition as the 'sentinel of the senses'). Some authors have argued that such a characteristic is highly functional in terms of survival and indeed may serve as an early warning device to trigger both the external (eye and head movement system) and internal systems (mechanisms of covert attentional selection) subserving the control of visually-based information (e.g., Spence & Driver, 1994, 1996).

However, the sentinel capacity of audition also comes with a price: Evidence from laboratory studies suggests that auditory information is organised into coherent streams even when it is unattended and that this process has the propensity to corrupt certain types of mental operations. Most notably, the studies indicate that sound need not be loud to be intrusive and that the effect is not fleeting. The incessant intrusiveness of even low intensity sound has potentially negative consequences for health both directly in terms of employee discomfort, stress, and annoyance, and more indirectly as a threat to performance in health/safety-critical settings. Thus, rather than focus on the theoretical implications of this phenomenon, which have been extensively discussed elsewhere (e.g., Baddeley, 1990; Jones, Beaman & Macken, 1996), our emphasis will be on disseminating the empirical facts about the deleterious effects of unwanted sound and on their practical implications for noise abatement.

Laboratory studies of the detrimental impact of sound

(i) The effects of white noise.

Although not the focus of the present review, it is worth noting first some of the laboratory findings pertaining to the effects of white noise (or aperiodic sound) on cognitive performance, if only to highlight the ways in which the key issues in the study of the intrusiveness of periodic sound are quite different. Much of the research on the effects of white noise or aperiodic sound (so called since it has no clear cycle of repetition and thus lacks a clear tonal quality) focused on establishing the threshold of intensity at which noise impaired cognitive performance. However, the findings were rather inconsistent and difficult to integrate into any coherent explanatory framework (see Jones & Broadbent, 1991; Smith & Jones, 1992, for reviews). For example, the presence of continuous white noise can either facilitate (Poulton, 1977) or impair (Broadbent, 1979) performance on vigilance tasks. Broadbent (1979) proposed that the detrimental effects on such tasks depend on the noise being above 95 dB(A). Moreover, other tasks such as mental arithmetic (e.g., Woodhead, 1964) were negatively affected by intermittent white noise, but only if the noise bursts were presented during the intake of information or during response execution. A similar picture emerged when the tasks used involved short-term memory (STM) processing: the noise had to be loud to produce even modest disruptive effects (e.g., Baddeley & Salame, 1982), or again, the timing between the items to be remembered and the burst of noise was crucially important (Salame & Wittersheim, 1978). And yet other studies found that continuous noise actually facilitated STM performance slightly (e.g., Hockey & Hamilton, 1970).

In sum, the rather inconsistent findings from this early literature suggest at least that, to disrupt performance, noise needs to be very loud and its presentation restricted to certain stages of the task. However, some authors have argued that a coherent profile of noise effects would be attainable if the complex interaction of noise with other variables were considered (e.g., Broadbent, 1979; Smith & Jones, 1992).

(ii) The irrelevant speech effect (ISE).

In contrast to the effects of white noise, it has consistently been found that the mere presence of periodic sound (e.g., speech, tones) exerts a marked detrimental effect on the performance of certain classes of cognitive task. Notably, this work departs from noise research in that it focuses on the composition of sound, in terms of its spectral qualities, rather than on its intensity.

Colle and Welsh (1976) were the first to report the now extensively documented finding that task-irrelevant speech disrupts serial recall performance (Jones, Madden & Miles, 1992; Salame & Baddeley, 1982, 1989; LeCompte, 1996). The serial recall task involves presenting participants with a sequence of seven to nine visually presented items (e.g., digits), which are to be recalled in the order of presentation either immediately following the last item or after a short retention interval (about 10 s) during which the participant is expected to keep rehearsing the sequence. This task is widely used in cognitive psychology as it provides a tool with which to elucidate the mechanisms responsible for the maintenance of the order of events, a capacity which is critical to many everyday activities (see e.g., Crowder & Greene, 2000; Gathercole & Baddeley, 1993; Lashley, 1951). The ISE, then, is the marked reduction in the accuracy of serial recall when background speech, which the participant is explicitly instructed to ignore, is presented during the presentation phase, during the retention interval, or during both these phases, relative to performance in a quiet control condition. This detrimental effect is robust, typically amounting to a 30-50% decrement; it does not appear to wane throughout the course of a single experiment (Jones, Macken & Mosdell, 1997) or across experimental sessions (Hellbruck, Kuwano & Namba, 1996), and only around one-eighth of individuals appear to be invulnerable to it. Indeed, some participants' error rates can increase by over 300% (Ellermeier & Zimmer, 1997). We turn now to findings which speak to the goal of delineating which aspects of sound do or do not play a role in mediating the ISE.

(iii) What property of speech gives it disruptive power?

(a) Intensity. Several factors can now, with reasonable confidence, be ruled out as determinants of the disruption. One of these is the sound pressure level of the speech. In contrast to the effects of aperiodic (or broad-band) noise, which generally seems only to disrupt performance above the level of 95 dB(A) (Broadbent, 1979), the effect of irrelevant speech appears to be independent of its intensity, at least within the range of 48 dB(A) to 76 dB(A), the approximate levels of a whisper and a shout respectively (Colle, 1980). This is the case whether the intensity level is manipulated within or between trials (Tremblay & Jones, 1999). Since the early work on noise research, higher levels have not been studied, owing to concerns about the danger to hearing, the unrepresentativeness of sounds at this level (at least for speech), and the likelihood that unspecific arousal factors would come into play above around 80 dB(A). Hence, the fact that sound can cause disruption at very low levels is one of the key findings to note with respect to its implications for noise abatement policy, an issue to which we shall return in Section 4.

(b) Meaning. An intuitively plausible explanation of the ISE is that the semanticity of speech is a potent source of disruption. The possible role of the semantic properties of speech in the ISE has been examined in two ways: in terms of the meaningfulness of speech per se, and in terms of its similarity in semantic content to the to-be-remembered material. With respect to the serial recall task at least, the evidence tends to suggest that the role of semanticity in both these senses is minimal. LeCompte, Neely and Wilson (1997) found an effect of meaningfulness, but meaning accounted for only around 12% of the irrelevant speech effect, indicating that semanticity is certainly not the primary source of disruption. Moreover, a number of other studies have failed to show any effect of meaning: a disruptive effect comparable to that of speech in a participant's native language has been found for speech in a language the participant does not understand and for speech played backwards (see Figure 1; Colle & Welsh, 1976; Jones, Miles & Page, 1990; Salame & Baddeley, 1982; see also LeCompte & Shaibe, 1997).

When semanticity is manipulated in terms of the similarity in semantic content between the irrelevant speech and the to-be-remembered material, again the evidence suggests that it plays no role. For example, Buchner, Irmen and Erdfelder (1996) found a comparable degree of disruption to the serial recall of visual digits regardless of whether the irrelevant speech was made up of auditory number-words, non-words comprising the phonemes of the numbers, or word combinations with phonemes that were similar to those of the to-be-remembered digits. A second experiment showed that presenting the same set of auditory number-words as that used as to-be-remembered items was no more disruptive than presenting numbers that were not in the to-be-remembered set. Overall then, evidence points away from semanticity as a potent factor in the ISE. However, it will be seen later in Section 3 that the semantic property of speech does have at least an additive disruptive effect on tasks that call for semantic processing.

(c) Propensity to interfere with the registration of to-be-remembered (TBR) items. One broad approach to the ISE has been to suppose that some property of the speech interferes with the encoding (or registration) of the TBR items; that is, some kind of sensory confusion or masking occurs despite the fact that the irrelevant and relevant materials are presented in different sensory modalities (Broadbent, 1982). However, note that the speech need not be presented during the input of the TBR items to be disruptive; comparable disruption occurs when speech is presented during the retention (or rehearsal) interval (see Figure 2).

This fact alone suggests that interference is occurring at a later stage of processing when the irrelevant and relevant materials have entered memory (Miles, Jones & Madden, 1991; see also Macken, Mosdell & Jones, 1999). Further evidence against this approach comes from the finding that irrelevant sound events presented between TBR items produce as much disruption as events presented concurrently with the TBR items (Jones, 1994; Salame & Baddeley, 1982).

Despite this, the interference-at-encoding approach to the ISE has recently resurfaced in two further forms. One account proposes essentially that irrelevant and relevant materials clash because they share the same temporal frame; that is, there is a difficulty in sampling the relevant items in memory because irrelevant items have entered the same memory search space at approximately the same time (LeCompte, 1996). However, a study by Macken et al. (1999) seriously undermines this account. They reasoned that such an account would predict that irrelevant items presented just prior to the TBR items would be as likely to share the same temporal frame as irrelevant items presented just after the list items. In turn, pre- and post-list items should be equipotent in their propensity to corrupt the accuracy of sampling the correct items, and hence serial recall performance. In fact, Macken et al. (1999) found a marked asymmetry: pre-list irrelevant items had no significant effect on recall while post-list items produced marked disruption. Such a finding clearly undermines the viability of the temporal frame-sharing explanation of the ISE.

The second account proposes that modality-independent features of the TBR items are overwritten (or masked) by those they share with the irrelevant items, thus inducing order errors (Neath, 2000). The idea that the degree of mutual interference is proportional to the degree of similarity in identity between relevant and irrelevant material is an old one (see the Phonology sub-section below). However, several lines of evidence pose great difficulty for this explanation (see Tremblay & Jones, 2000, for a full critique; see also Baddeley, 2000a). First, irrelevant tones, which clearly share no similarity with the verbal TBR items, can cause disruption (Jones & Macken, 1993). Second, spatial analogues of the verbal serial recall task, in which a sequence of locations of visual events is to be serially recalled, are also susceptible to disruption by irrelevant speech; again, there is no obvious similarity between relevant and irrelevant material in this case (Jones, Farrand, Stuart & Morris, 1995).

(d) Phonology. One of the earliest accounts of the ISE proposed that speech is disruptive because the phonemes within the irrelevant speech gain privileged and obligatory access to a speech-specialised memory store (see Baddeley, 1990), wherein they clash with similar phonemes produced while sub-vocally rehearsing the to-be-remembered items (Salame & Baddeley, 1982). Again, given that this account posits that the interference arises from a similarity in content between relevant and irrelevant items, it is clearly open to the same empirical challenges as the Neath (2000) model discussed above. That is, the account can explain neither the functional equivalence of speech and non-speech sounds in the ISE nor the fact that the serial recall of non-verbal items can be disrupted by irrelevant verbal items (Jones et al., 1995).

More direct evidence against this account comes from the finding that the degree of phonological similarity between verbal list items and irrelevant speech tokens does not predict the degree of disruption (Jones & Macken, 1995a). In this experiment, the TBR items were the syllables f, k, l, etc., and the irrelevant sequence consisted either of words that rhymed with (and were therefore phonologically similar to) these items (deaf, pay, bell, etc.) or of words that did not rhyme (hat, cow, nest, etc.). Contrary to the phonological similarity account, both types of sequence produced a comparable degree of disruption. Moreover, an irrelevant sequence containing words that were phonologically dissimilar to the TBR items but which rhymed with each other (door, war, more, etc.) produced significantly less disruption than either of the other two sequences, suggesting that it is the degree of physical (or acoustical) dissimilarity between items within the irrelevant stream that is pivotal to the ISE (see Figure 3).

This idea has been termed the changing-state hypothesis; further evidence for it is described below.

(e) Acoustical variation. Over the past ten years or so, strong evidence has accumulated showing that the property of speech that gives it disruptive potency is that it exhibits acoustical variation over its time course in terms of spectral qualities such as pitch and timbre (but not intensity; see Tremblay & Jones, 1999). Thus, the changing-state hypothesis holds that any sound (speech or non-speech) will disrupt performance if (and only if) the sound shows appreciable acoustical change from one segmentable entity to the next (Jones, Madden & Miles, 1992). For example, using speech as the irrelevant material, a repeated consonant, e.g., 'c, c, c, c,' produces little if any disruption, whereas an irrelevant stream consisting of different consonants, e.g., 'c, h, j, t,' produces marked disruption. Similarly, as noted in the previous sub-section, non-rhyming words are far more disruptive than rhyming words, presumably because the latter exhibit less acoustical change from one word to the next (Jones & Macken, 1995a). Importantly, tones produce the same pattern: a repeated tone of constant pitch produces little if any disruption, whereas a sequence of tones varying in pitch from one tone to the next produces marked disruption (Jones & Macken, 1993). As mentioned, the fact that changing-state tones produce disruption undermines the phonological similarity account of the ISE (Salame & Baddeley, 1982) and indeed any account that grants speech a special status in this effect (e.g., Neath, 2000). However, an attempt was made to accommodate the effect of non-speech within the phonological store account by supposing, in a somewhat ad hoc fashion, that pure tones were speech-like enough to gain access to the same store as the TBR verbal items and thereby have the propensity to clash with them.
However, in a very recent study revisiting the effects of aperiodic sound on serial recall performance, it was found that a sequence of bursts of broadband noise whose centre frequency changed from one burst to the next produced an ISE (Tremblay, Macken & Jones, in press). Such a finding further undermines the idea that the locus of interference in the ISE is within a discrete memory module which admits only speech and speech-like information. That is, speech appears to be disruptive merely by virtue of containing a high degree of acoustical variation over time, rather than by virtue of its meaningfulness, its lexicality, its phonological properties, or its similarity in any other respect (in terms of identity) to the TBR items. For the remainder of the review we therefore refer to the effect as the irrelevant sound effect (ISE), given the contention that speech and non-speech sounds are equipotent in their power to disrupt serial recall (Jones, 1993).

An intuitively appealing and parsimonious explanation of the ISE, and particularly of why changing rather than steady-state sound is disruptive, is that changing-state sounds are more likely to attract attention away from the recall task, that is, to elicit an orienting response (OR; Cowan, 1995). The OR has been characterised as the 'what is it?' response to novel or significant stimuli: an involuntary (or exogenous) shift of the attentional focus is triggered. The OR may be accompanied by behavioural changes (e.g., quieting, eye and head movements) and by physiological changes (e.g., slowed heart rate). Although such changes may not be observable in the irrelevant sound paradigm, it is possible that a diversion of attention could occur without the panoply of classical overt effects. More technically, the OR account of the ISE draws upon the idea that a mental model of a presented stimulus is formed and is progressively fashioned into a more faithful model as that stimulus is repeated (see Sokolov, 1963). The likelihood of an OR is a function of the degree of mismatch between a new stimulus and the mental model. So, with repeated presentation of the same stimulus, that stimulus becomes less and less likely to mismatch the model and in turn less likely to evoke an OR, i.e., habituation of the OR is said to occur. In contrast, a sequence of changing stimuli could be construed as constantly providing novel stimuli, which would preclude habituation of the OR, thus consistently diverting attention away from the task at hand and in turn disrupting recall performance (Cowan, 1995).
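The logic of the habituation mechanism just described can be conveyed with a toy simulation. The sketch below is our own illustration, not a model from the literature: stimuli are reduced to single numbers, the 'neural model' is a running value nudged toward each incoming stimulus, and the momentary mismatch stands in for the likelihood of an OR.

```python
# Toy illustration (our own, not from the studies reviewed) of Sokolov-style
# habituation: the orienting response (OR) is assumed proportional to the
# mismatch between each new stimulus and a running "neural model" that is
# updated toward every stimulus it encounters.

def mismatch_trace(stimuli, learning_rate=0.5):
    """Return the mismatch (a proxy for OR likelihood) evoked by each stimulus."""
    model = stimuli[0]          # initial model formed from the first event
    trace = []
    for s in stimuli:
        trace.append(abs(s - model))          # mismatch drives the OR
        model += learning_rate * (s - model)  # model refined toward stimulus
    return trace

# A repeated ("steady-state") stimulus: mismatch falls to zero -> habituation.
steady = mismatch_trace([5.0] * 8)

# An alternating ("changing-state") pair: mismatch never settles, so on this
# account ORs would keep being elicited throughout the sequence.
changing = mismatch_trace([5.0, 9.0] * 4)

print(steady)
print(changing)
```

As the surrounding text goes on to argue, the empirical findings do not in fact follow this pattern, which is precisely the problem for the OR account.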

However, on closer inspection, this explanation of the changing-state effect has problems accommodating a number of key empirical findings. First, the stimuli typically used in irrelevant sound experiments are repeated many times over the course of an experiment, even though the irrelevant stimuli within a trial may all differ from one another. Such repeated exposure should quickly allow a neural model of those stimuli to be fashioned, so that ORs should in turn become less likely over the course of an experiment. However, as noted earlier, the degree of disruption does not diminish over the course of an experiment (Hellbruck, Kuwano & Namba, 1996; Jones et al., 1997; Tremblay & Jones, 1998; but see Banbury & Berry, 1997) nor between subsequent experimental sessions in which the same stimuli are used to make up the irrelevant sequences (Ellermeier & Zimmer, 1997; Hellbruck et al., 1996). Second, the account does not predict the 'token dose effect' (Bridges & Jones, 1996). This refers to the finding that as the number of tokens per unit time within a trial increases, so too does the degree of disruption. If an OR becomes less likely the more times a stimulus is presented (owing to the increasingly better specified mental model), then performance should in fact be better, or certainly no worse, the higher the token dose. The third line of evidence relates to the size of the irrelevant token set. Following a logic similar to that of the token dose effect, the higher the number of different tokens used in the irrelevant stream within a trial, the more likely it is that an OR will be elicited (i.e., the slower the rate of habituation). That is, within a small-set sequence (e.g., 'a, b, a, b, a, b…') each token is repeated relatively more often than within a large-set sequence (e.g., 'a, b, c, d, a, b…'), thus fashioning its neural model more quickly and in turn leading to habituation of the OR.
The OR account of the ISE would therefore predict a monotonic increase in disruption as the token set size is increased. However, for both speech and non-speech items, a change between immediately successive tokens (a small set) is sufficient to produce a disruptive effect, with the addition of further tokens (a larger set) having no significant further effect on disruption (Tremblay & Jones, 1998).

Hence, the evidence points away from the idea that the attentional orienting response plays a role in the intrusiveness of changing-state sound. From a practical point of view, the main point to emphasise from the foregoing is that the presence of sound, whatever the setting, will constitute a constant source of disruption so long as the sound contains acoustical variation.

Rather than appealing to an increased propensity to capture attention, the changing-state account of Jones and colleagues proposes that changing-state sound is disruptive because it yields information about the order of its constituent elements, and the involuntary processing of this 'unwanted' order information clashes with the process of maintaining the order of (i.e., rehearsing) the TBR items (Jones, 1999; Jones et al., 1992; see Jones et al., 1996, for a detailed discussion). A changing-state sequence is thought to yield order information as a direct by-product of the process of auditory grouping (or streaming). Briefly, auditory streaming refers to the process whereby the gross acoustical signal reaching the brain is decomposed, using Gestalt-like principles such as similarity of pitch, location, timbre and so on, according to the different environmental objects that have contributed to that signal (see Bregman, 1990, for an extensive discussion, or Bregman, 1993, for an overview). Part of this streaming process involves the perceptual system comparing successive auditory events so as to decide whether or not the sound events emanate from the same environmental source: sound events that are relatively distinct from one another can still be fused into a single stream, but once the magnitude of difference reaches a certain point the events are segregated into separate streams (i.e., fusion gives way to fission). To illustrate this perceptual phenomenon, Figure 4 shows two sequences of tones in which the difference between successive tones is either relatively small (sequence a) or relatively large (sequence b). When participants listen attentively to these sequences, the first is heard as a single coherent stream of changing tones whereas the second is perceived as two streams, one containing a repeating high tone, the other a repeating low tone (Van Noorden, 1977).
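The fusion/fission principle just described can be caricatured in a few lines of code. The sketch below is our own simplification (real streaming integrates pitch, timbre, location and timing, not pitch alone): each incoming tone joins an existing stream if it lies within an assumed 'fission threshold' of that stream's most recent tone, and otherwise starts a new stream, reproducing the one-stream versus two-stream percepts of sequences (a) and (b).

```python
# Illustrative sketch (our own simplification, not a model from the source):
# successive tones are fused into an existing stream if they fall within a
# "fission threshold" of that stream's last tone; otherwise they segregate
# into (or join) a separate stream. Pitches are in arbitrary semitone units.

def stream_segregation(pitches, threshold):
    """Greedily group a tone sequence into streams by pitch proximity."""
    streams = []
    for p in pitches:
        for stream in streams:
            if abs(p - stream[-1]) <= threshold:   # close enough: fusion
                stream.append(p)
                break
        else:                                       # too distant: fission
            streams.append([p])
    return streams

# Sequence (a): small successive differences -> one coherent stream.
print(stream_segregation([60, 62, 61, 63, 62, 64], threshold=4))

# Sequence (b): large alternating jumps -> splits into a low and a high stream,
# each of which then contains only a repeating, unchanging tone.
print(stream_segregation([60, 72, 60, 72, 60, 72], threshold=4))
```

On the changing-state account, it is the first situation that yields usable order cues; once fission occurs, each resulting stream is effectively steady-state.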

The changing-state account posits that a side-effect of the process of fusing relatively distinct successive events into a single stream is that their order is automatically encoded. A further key assumption is that auditory streaming is a preattentive process (i.e., the sound does not have to be in the focus of attention) and that the discrete elements making up unattended sound, as well as attended sound, are therefore subject to streaming and hence seriation (the placing of events in temporal order; Jones, 1999; see also Bregman & Rudnicky, 1975). Thus, on this account, the ISE arises from a clash between two concurrent seriation processes: the deliberate process of seriating the TBR items is corrupted by the additional involuntary process of seriating the elements making up the changing-state irrelevant sound (Jones et al., 1996). It is therefore not the encoding of the TBR items that is vulnerable to interference but rather the rehearsal process that serves to re-vivify the order of the successive items. This explains why, once presentation of the TBR items has commenced, the time at which the irrelevant items are presented is relatively unimportant (see sub-section (c) above): rehearsal will take place both during and following presentation (if there is a delay before recall is required), and so serial recall should be similarly affected in each case, as indeed has been shown (see Figure 1).

It should be emphasised, however, that according to the auditory streaming framework only a sequence of elements emanating from the same stream yields information about the order of those elements. Furthermore, up to a point, the larger the differences between successive elements within a stream, the stronger the order information. However, this linear function breaks down at the point of fission: when the difference between successive elements reaches a critical point they are segregated into separate streams and information about their order is no longer forthcoming. The impoverishment of order information when successive elements are particularly distinct is evident when participants are asked to report the order of elements in an attended sequence of sounds: when the elements are relatively distinct, order report accuracy is high, but when the elements are highly distinct (e.g., a burst of white noise, a tone, a vowel sound and a buzz) accuracy drops dramatically (Broadbent & Ladefoged, 1959; see also Warren & Obuzek, 1972).

A key prediction of the changing-state account flows from the non-linear relationship between the mismatch between successive elements and the availability of order information: if the ISE is the result of the presence of a second source of order information, then the relationship between the degree of change in the irrelevant stream and its propensity to disrupt serial recall should also be non-monotonic. That is, as the degree of change between successive elements in the irrelevant stream increases (and hence the strength of irrelevant order cues), so too should the degree of disruption, but only up to the point of fission. This is indeed what is found. For example, using pitch differences to implement the mismatch between successive elements in an irrelevant sound stream, disruption increases linearly when the pitch difference between two successive tones is increased from 0 to 2 to 5 semitones, but diminishes again when the difference is increased to 10 semitones (Jones, Alford, Bridges, Tremblay & Macken, 1999; see Figure 5). That is, the point of inflection in this non-monotonic function corresponds to the point where fission of the highly distinct tones occurs (i.e., in the 10-semitone condition). This suggests that two distinct perceptual streams are formed to represent the unattended sequence; order information is in turn lacking, and thus the sequence is relatively impotent in its capacity to disrupt recall.
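For readers unfamiliar with musical units, the pitch separations above can be translated into frequency terms: on the equal-tempered scale, a separation of n semitones corresponds to a frequency ratio of 2^(n/12). The 440 Hz base tone in the sketch below is our own illustrative choice, not a value taken from the study.

```python
# Convert the semitone separations used in the pitch-difference manipulation
# into frequencies and ratios. A separation of n semitones on the
# equal-tempered scale corresponds to a frequency ratio of 2**(n/12).
# The 440 Hz base tone is an illustrative assumption, not from the study.

def semitones_to_frequency(base_hz, semitones):
    """Frequency of a tone `semitones` above a base frequency."""
    return base_hz * 2 ** (semitones / 12)

base = 440.0
for n in (0, 2, 5, 10):
    f = semitones_to_frequency(base, n)
    print(f"{n:2d} semitones: {f:7.1f} Hz (ratio {f / base:.3f})")
```

A 10-semitone separation thus corresponds to a frequency ratio approaching 2:1 (nearly an octave), which helps convey why such widely spaced tones segregate into separate streams.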

The modulation of the changing-state effect by streaming has also been demonstrated when the disparity between elements is implemented via differences in spatial location. If an irrelevant sound sequence consisting of three different syllables is presented in a repeating loop monophonically, the usual ISE is observed. However, if the three-syllable loop is manipulated such that one syllable is presented to the left ear, one to the right ear, and the other to both ears at once, disruption is significantly less marked. This finding is explicable if the unattended monophonic sequence is interpreted as a single stream and the unattended 'stereophonic' sequence as three separate streams, each consisting of one repeating item. The single stream would still contain sufficient variation to provide order information, whereas the three separate streams would contain no variation and therefore no information about order (Jones & Macken, 1995b; see also Jones, Saint-Aubin & Tremblay, 1999). Thus it appears that, for acoustical variation to be potent in disrupting performance, the variation must be produced by spectral changes superimposed on a common carrier. Everyday examples of auditory streams that meet this condition par excellence are speech from a single voice or varying notes from the same musical instrument.

Finally, a recent experiment by Tremblay, Macken, Culling and Jones (submitted) provides somewhat more direct evidence than hitherto that it is the availability of order information in the irrelevant sequence that mediates its disruptive potency.

They established first how well the order of events could be discerned from sequences when participants attended to those sequences. By varying the speed at which the events within a sequence were presented they established which sequences yielded strong order information (i.e., those for which participants' report of the order of events was relatively good) and which sequences yielded weak order information (i.e., where report of order was relatively poor). Tremblay et al. reasoned that if it is the availability of order information in an irrelevant sequence that mediates disruption of serial recall then those sequences that had been found to yield better order information when attended should be more disruptive when subsequently used as irrelevant sound. This is indeed what was found; a positive correlation was obtained between the accuracy of order report when the sequences were attended and the degree to which they subsequently disrupted serial recall.

(iv) What kinds of mental activity are disrupted by which properties of sound?

In the previous section it was proposed that the intrusiveness of sound is mediated by the degree of order information it yields. Moreover, we proposed that the ISE is the result of a clash between two processes of seriation; that is, the extraneous order information from changing-state sound will only be damaging to a concurrent task that requires the maintenance of temporal order information. In this section, therefore, we discuss in more detail the task part of this equation, i.e., the nature of the primary task that leaves it vulnerable to disruption.

(a) Tasks involving serial processing. As is clearly evident from this review so far, the examination of the ISE has been based predominantly on the disruptive effect of sound on the serial recall of visually presented verbal information. However, on the similarity-of-process approach underpinning the changing-state account, the identity of the items to be deliberately seriated, as with the identity of the irrelevant events, plays no causal role in the ISE. Rather, it is the process of deliberately seriating events, regardless of their nature, through the act of rehearsal that makes the serial recall task vulnerable to corruption by the order information yielded by changing-state sound. Thus any task that nominally involves serial processing (Beaman & Jones, 1997; Jones & Macken, 1993; Salame & Baddeley, 1990) or where rote rehearsal is the most efficient strategy (Beaman & Jones, 1998; LeCompte, 1994; Richardson, 1984) will be susceptible to disruption by any type of irrelevant sound so long as it has the property of acoustical variation over its time course. In support of this generalisation, a task involving serial recall of the positions of dots presented sequentially in different locations on a screen is significantly disrupted by irrelevant speech (Jones et al., 1995).

A study by Jones and Macken (1993) serves to illustrate how the degree of seriation in the primary task determines its susceptibility to changing-state sound. Two versions of a memory task were contrasted in which either memory for the order of TBR items was required, or memory for list membership irrespective of order (see also Beaman & Jones, 1997). For example, using a list of six items drawn without replacement from the days of the week (e.g., Tuesday, Thursday, Monday, Saturday, Wednesday, Friday), one version of the task taps knowledge of item identity without respect to order by asking for recall of the day missing from the set (so the correct response in the example is Sunday). This missing-item procedure, when using a fixed class of items such as days of the week, is thought to require no active rehearsal, relying instead on knowledge of the list items in long-term memory (see Buschke, 1963; but see LeCompte, 1996, for a different view). Another version of the same task uses a probe word to tap memory for order. After a short delay following list presentation, the participant is presented with a day of the week and is expected to recall the item that followed it in the list. So, given the same list of days as before but now presented with the probe Monday, the correct response would be Saturday. Critically, the effect of irrelevant sound is appreciably greater in the probe version than in the missing-item version (see Figure 6). This is just as might be expected on the grounds that the tasks differ in the degree to which they rely on maintaining order rather than item information. In fact most memory tasks involving the retrieval of lists appear to involve some degree of seriation, and for this reason are quite easily disrupted by irrelevant sound.
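The contrast between the two versions of the task can be made concrete with a small sketch (the function names are ours, not Jones and Macken's): the missing-item response can be derived from set membership alone, whereas the probe response is recoverable only if the order of presentation has been retained.

```python
DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday"]

def missing_item(presented):
    """Missing-item task: which member of the fixed set was not presented?
    The order of presentation is irrelevant - only membership matters."""
    remaining = set(DAYS) - set(presented)
    return remaining.pop()

def probe_recall(presented, probe):
    """Serial probe task: return the item that followed the probe.
    This depends entirely on the retained order of presentation."""
    return presented[presented.index(probe) + 1]

study_list = ["Tuesday", "Thursday", "Monday",
              "Saturday", "Wednesday", "Friday"]
print(missing_item(study_list))            # Sunday
print(probe_recall(study_list, "Monday"))  # Saturday
```

On the changing-state account, only the second function models a task vulnerable to irrelevant sound, because only it requires the preserved sequence.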

The particular sensitivity of seriation to disruption by irrelevant sound appears to be a general one: a range of tasks, despite being difficult and sensitive in other contexts to lapses of attention, prove much less prone to the effect of irrelevant sound. These include tasks with neither a memory nor a seriation component (e.g., sentence acceptability tests; Boyle & Coltheart, 1996), and a range of perceptual tasks (e.g., Baddeley & Salame, 1986; Burani, Vallar & Bottini, 1991). We would contend therefore that any cognitive task involving the STM processing of order will be vulnerable to the negative impact of sound that exhibits acoustical variation, regardless of whether that sound is made up of discrete tones, speech tokens, or noise bursts. Again, this generalisation must be qualified: disruption cannot be predicted solely on the basis of the acoustical parameters of the sound; this information must be supplemented by reference to the rules of auditory streaming (as discussed in section 1(ii) above). Necessarily these rules are approximate (but see Beauvois & Meddis, 1996, for an attempt to instantiate principles of streaming computationally).

(b) Comprehension tasks. The fact that the semantic property of speech does not mediate the degree of disruption in the serial recall task should not be taken to mean that this property has no effect on other primary tasks. In the serial recall task, the items are often rather arbitrary and devoid of rich semantic content (e.g., isolated letters or digits), and in any case semantic processing of the items is not a necessary component of the task at hand. As a result, there is no clash with any concurrent semantic processing that may be applied involuntarily to the irrelevant speech. Whether unattended information undergoes semantic processing is still a highly contentious issue in attention research (see e.g., Holender, 1986; Pashler, 1998). Nevertheless, the picture emerging from the few studies to have examined the effect of the meaning of irrelevant sound on tasks that necessitate or encourage semantic processing (see Neely & LeCompte, 1999) suggests that the meaning of speech has at least an additive disruptive effect over and above the acoustical variation it exhibits (Jones et al., 1990; Martin, Wogalter & Forlano, 1988; Oswald, Tremblay, & Jones, 2000). For example, Oswald et al. (2000) tested participants' comprehension of sentences in the presence of meaningful speech, meaningless (backward) speech, and in quiet. They found that performance was worse in both speech conditions relative to quiet but, most interestingly, that meaningful speech was more disruptive than meaningless speech. Thus, disruption to comprehension seemed to occur primarily because of the acoustical variation present in both speech conditions, but the effect was augmented by the additional property of semanticity present in the meaningful speech.

Such a finding can be accommodated within the changing-state account by supposing that segmentation of the auditory input into separate units - and in turn their seriation - can be based on different dimensions of that sound. At the most basic level the segmentation of sound could be based on just its changing physical attributes in terms of pitch and timbre while at a higher cognitive level segmentation may be based on semantic units (e.g., words, phrases). A particular task will be vulnerable to disruption to the extent that it calls upon seriation processes that can also be applied to the irrelevant sound. In the serial recall task there is no necessity for semantically-based seriation and so there is no clash with any preattentive seriation of the semantic units present in the irrelevant speech but only with the acoustically-based seriation of sound. However, a comprehension task is likely to involve seriation on both acoustic and semantic levels and is therefore liable to be corrupted by both the acoustically-based and semantically-based seriation of irrelevant speech.

2. The intrusiveness of sound in complex work-related tasks

The changing-state account of the ISE has been derived almost entirely from extensive examination of the impact of sound on simple laboratory-based tasks. Although using a simple task makes it easier to delineate the mental operations being deployed to perform it, this is often at the price of limiting the generalisability of the findings to real-world tasks. Recently, therefore, the examination of the intrusiveness of sound has been extended to more complex work-related tasks in order to highlight the practical implications of this kind of interference (see also Jones, 1995).

(a) The open-plan office. Background noise is reported to be among the most prevalent forms of interference in the open-plan office, causing stress and discomfort for workers and, in turn, low levels of performance (see Nemecek & Grandjean, 1973; Boyce, 1974; Sundstrom, Town & Rice, 1994, for survey-based studies). Moreover, the Washington-based Worldwatch Institute reported recently that so-called 'Sick Building Syndrome', to which extraneous sound is a major contributory factor, costs companies hundreds of billions of dollars through absenteeism. In a laboratory study designed to examine the effect of extraneous sound on work-related tasks, Banbury and Berry (1998) had participants perform two office-typical tasks, namely memory for prose and mental arithmetic. Consistent with survey-based reports of the negative effects of background sound, they found that performance of both tasks was deleteriously affected by the presence of irrelevant speech and other non-speech office noise (e.g., ringing phones, printer and fax machine noises). Again, counter to the intuitive notion that office noise would have to be reasonably loud to be detrimental, the sound stimuli used in these experiments typically had a mean sound level of 65 dB(A) (as in most irrelevant sound experiments), which is approximately the level of normal intelligible speech. The implications of this finding for sound control in the office, and indeed in any work domain where the work is largely cognitive and conducted in a noisy environment (e.g., customer call centres, industrial control centres), are discussed in Section 3.

(b) The flight deck. Amongst the non-speech sounds on the flight deck are random broadband noise, periodic harmonic and high-frequency sound from the aircraft itself, and intermittent sound bursts from weapon systems (in military aircraft). In addition, technological advances in aviation have introduced an increased amount of extraneous speech into the cockpit (e.g., computer-generated synthesised speech output systems; see Moore, 1989). Although auditory display systems and vocal control may serve to reduce high visual and manual workload demands in this safety-critical setting, such voice technology may actually introduce a source of performance disruption (Hart, 1988). Indeed, although the addition of speech/auditory systems shifts the nature of the pilot's activities from manual to cognitive, ironically it is precisely the latter form of activity that is most vulnerable to disruption by the sound outputs of those systems. Banbury and Jones (2000) recently examined such a possibility systematically. In one experiment they had participants attempt to recall auditory-verbal navigational information (longitude and latitude coordinates, e.g., "Longitude: 4825; Latitude: 3719"). Radio messages to other aircraft were presented during the retention and recall periods; participants were instructed to ignore these irrelevant messages and focus on remembering their own. This task was disrupted severely in the presence of the irrelevant radio messages compared to quiet (see Figure 7). In a second experiment participants were to recall the track history of a moving target on a plan-view radar display. Again, the results indicated that performance on this visual-spatial task was disrupted by irrelevant radio messages.

For the ergonomist in the aviation field, therefore, a consideration of the gross timing of auditory events relative to concurrent seriation-based tasks must be incorporated into cockpit design policies. At first glance, one might think that tasks involving seriation are relatively infrequent in this setting. However, effective monitoring of displays or system parameters over time necessitates maintaining the temporal order of this input so that trend information may be inferred. It is possible, then, that an auditory warning for a relatively minor event could induce errors in entering co-ordinates into navigation or weapon-delivery systems, with potentially serious results. One way to circumvent this problem may be through the active management of speech and sound sources in this setting. Digital storage, for example, allows the possibility of suppressing or postponing non-critical audio signals during critical phases of flight.
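As an illustration of that last suggestion, the following sketch (entirely hypothetical; the class, method, and message names are ours, not drawn from any real avionics system) defers non-critical audio signals while a critical flight phase is in progress and replays them once it ends.

```python
from collections import deque

class AudioScheduler:
    """Hypothetical sketch of active audio management: postpone
    non-critical signals during critical phases of flight."""

    def __init__(self):
        self.critical_phase = False
        self.deferred = deque()  # signals held back, in arrival order
        self.played = []         # stand-in for the audio channel

    def set_phase(self, critical):
        self.critical_phase = critical
        # On leaving a critical phase, replay deferred signals in order.
        if not critical:
            while self.deferred:
                self.played.append(self.deferred.popleft())

    def signal(self, message, critical_signal=False):
        # Critical signals always sound immediately; others are
        # deferred while a critical phase is in progress.
        if self.critical_phase and not critical_signal:
            self.deferred.append(message)
        else:
            self.played.append(message)

sched = AudioScheduler()
sched.set_phase(critical=True)                # e.g., final approach
sched.signal("fuel economy advisory")         # deferred
sched.signal("terrain warning", critical_signal=True)  # sounds at once
sched.set_phase(critical=False)               # advisory replayed now
print(sched.played)  # ['terrain warning', 'fuel economy advisory']
```

The design choice here mirrors the argument in the text: the goal is not to silence the cockpit but to time non-critical changing-state sound so that it does not coincide with seriation-dependent activity.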

(c) The air-traffic control tower. The air-traffic controller is constantly exposed to a high degree of extraneous sound, especially speech in the form of messages sent directly over the controller's headset, the speech of colleagues in the same environment, and messages relayed over loudspeakers from other control centres. Clearly, many of these messages will be relevant to a given controller, but many will also be irrelevant at any particular moment. Indeed, a common practice in this setting is to place the headset over only one ear so that the controller can potentially attend to any one of a multitude of concurrent streams of speech (Hopkin, 1995). Many cognitive operations in this setting rely heavily on seriation processes and may therefore be particularly susceptible to the disruptive effects of extraneous sound. For example, one typical operation is the re-organisation of flight paper strips into a new order, which involves prospective memory for the new order and the coherent sequencing of motor activity. Another is the temporal sequencing of cognitive activities in response to a constantly changing visual display. It is imperative, for example, that the controller can accurately maintain the temporal order of information pertaining to both the identity and position of aircraft (Endsley, 1995). Jones, Parmentier, Hughes et al. (submitted) therefore conducted a study in which participants had to recall in serial order either the identity of sequentially presented aircraft callsigns or the sequence of locations in which they were presented. In the presence of irrelevant radio speech, participants' recall errors increased by up to 20% regardless of whether it was the identities or the positions of aircraft that had to be recalled.

In sum, it is important that human factors practitioners working in these areas are made aware of the damaging effects of even low intensity sound, and in particular the interaction between task-type and the properties of sound that will assume disruptive power. In section 1 (iii) we provided a general profile of the effect of different properties of sound on different types of tasks. However, in order to predict whether sound will be intrusive in a given setting, one would have to undertake a fine-grained analysis of the kinds of sounds present in that environment coupled with a careful analysis of the processes involved in the types of tasks commonly performed in that setting.

3. General implications for noise abatement

In the preceding section we suggested some of the ways in which sound could disrupt performance in specific settings. In this final section we summarise the key findings from the work reviewed and take a more macroscopic view of their practical implications.

(a) The effect is independent of intensity. In line with the early preoccupation with pressure level in noise research, environmental policy in relation to noise abatement has been concerned almost exclusively with the intensity of noise. That is, it seems that a sound has had to be considered loud before it is deemed an environmental problem (Schultz, 1978; Fidell, Barber & Schultz, 1991). Given that any kind of sound that is acoustically changing over time can disrupt mental processing regardless of pressure level, the factor of acoustical change should no longer be neglected as a source of annoyance and disruption.

(b) The effect is not evanescent. Important also for environmental policy is the finding that the effect of low-intensity changing-state sound is not fleeting; the weight of evidence points away from the idea that habituation occurs for unwanted sound. Thus the power of changing-state sound to disrupt performance will not diminish across time, and the continuous presence of sound will therefore incur a cumulative cost overall.

(c) Speech and non-speech are equipotent. We would argue that non-speech and speech sounds will be equally damaging to certain cognitive operations so long as they are equal in terms of the acoustical variation they exhibit. This implies that simply removing or attenuating sources of speech in a given work environment will not adequately solve the problem. Indeed, such a 'solution' may even be counter-productive: removing one changing-state sound source and not another may actually increase the disruptive power of the sound that remains, because the changing-state information (and therefore the availability of order information) in that sound would then be more salient and more easily perceived (see discussion of the so-called 'babble' effect in section (e) below).

(d) Interference is by process, not content. The overwhelming evidence points away from the intuitive notion that interference in the ISE is a clash of information based on similarity in the content of the material. Rather, a view based on conflict due to similarity in the processes applied to the competing streams of information explains more of the data. The interference-by-process account of the intrusiveness of sound has general implications for the human factors community. Human-machine interface designers, for example, must take into account both the proficiencies and the limitations of human cognitive capacities in order to create the conditions that allow optimum performance. However, based on the widely accepted, but arguably misleading, conception of the mind as a highly modular entity (Baddeley, 1990, 2000b; Fodor, 1983; Wickens, 1992), there has been a tendency when designing such interfaces to assume that streams of information conveyed via different sensory modalities will draw on modality-specific mental resources and will therefore be relatively free from mutual interference. This is particularly assumed to be the case if the individual has to ignore one stream of information and concentrate on the other. However, the work reviewed in this paper suggests that, for some classes of cognitive operations, particularly those reliant on STM, the modality through which information enters is irrelevant: the ISE clearly demonstrates that the processing of information of visual origin can be corrupted substantially by the simultaneous processing of a second source of information despite the fact that this information has entered via the auditory modality. Similarly, interference by extraneous sound also transcends the different codes assumed to be applied to different dimensions of task-relevant information (e.g., verbal, spatial; see Wickens, 1992). That is, at least as far as the process of seriation is concerned, such processing is code-independent: both a sequence of verbal items and a sequence of spatial locations can be seriated, and both operations are susceptible to damage by extraneous sound whether that sound is verbal (speech) or non-verbal (tones, noise bursts).

(e) Possible general solutions. In environments such as the open-plan office, where the sound is not necessarily intrinsic to the work of a given individual (e.g., sound from others talking, from other employees' phones ringing, from operating equipment, from environmental equipment such as fans, and from ancillary functions such as cleaning and maintenance), there are generally two approaches to sound abatement. The insensitivity of the effect of extraneous sound to pressure level suggests that an effective improvement in efficiency can only be achieved by reducing the level of the sound to below the threshold of audibility. However, a much cheaper and more practical alternative is to mask the changes in energy within the irrelevant stream. The point may be illustrated by the 'babble effect'. If, using a monaural source, the number of voices contributing to the irrelevant stream is manipulated, the disruptive effect on memory increases as the number increases from one to two, and again from two to three. Above three voices, however, disruption is a decreasing function of the number of voices (Jones & Macken, 1995c). This effect is readily understood in terms of the masking of one sound by another. When the sound contains a relatively large number of voices, words and other cues to segmenting the speech are no longer individually distinguishable. In particular, there is evidence that the changes in energy at the boundaries of the sounds are important in determining the degree of disruption (for further discussion see Jones, Macken & Murray, 1993). Thus, we would expect the noise from a smaller office, for example, to have a greater capacity to disrupt work than noise from a larger office with more occupants, as the few sources of sound present do not mask each other. This expectation is supported by Keighley and Parkin (1981), who observed that the smaller the office, the more bothersome its noise climate was to its occupants.

One method commonly used to reduce disruption by background sounds in offices is to add a continuous noise signal [45 to 50 dB(A)] to mask most of the speech so that it becomes less audible, and most importantly, the boundaries of the sounds become inaudible. Alternatively, the provision of partitioning and sound insulating materials on floors and suspended ceilings can dampen down the sound and make it less intelligible. By altering the acoustics of an office, it may be possible to reduce the variability of the sound in this manner.
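The rationale of a continuous masker can be illustrated numerically: what matters is not the overall level it adds but the extent to which it flattens the moment-to-moment energy changes in the irrelevant stream. The envelope values below are invented purely for illustration.

```python
def modulation_depth(envelope):
    """(max - min) / (max + min): a crude index of how much the
    energy of a sound fluctuates over its time course."""
    return (max(envelope) - min(envelope)) / (max(envelope) + min(envelope))

# Hypothetical amplitude envelope of intermittent office speech
# (linear units), alternating between bursts and near-silence.
speech = [0.2, 0.9, 0.1, 0.8, 0.15, 0.85]

# Adding a steady broadband masker raises the floor, so the relative
# size of the energy changes (the changing-state cue) shrinks.
masker_level = 0.6
masked = [s + masker_level for s in speech]

print(round(modulation_depth(speech), 2))  # 0.8
print(round(modulation_depth(masked), 2))  # 0.36
```

The same total variation is still present in the masked stream, but as a proportion of overall energy the boundaries between sounds are far less pronounced, which on the changing-state account is what reduces their disruptive power.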

However, we acknowledge that such gross methods of sound control are less applicable to settings where some of the sounds (e.g., speech communications in the cockpit) are critical to the work, and where a policy of overall attenuation is clearly not feasible or appropriate. As noted earlier, for such settings, identifying potential disturbances to cognitive operations by sound would require a microscopic analysis of the interactions between sound type, the extent to which that sound affords perceptual streaming, and the particular nature of the mental processes demanded.

Conclusions

The study of irrelevant sound has illuminated a number of key features of the way in which sound intrudes into memory and corrupts the processing of task-relevant information. Sound appears to have obligatory access to memory; sound is recorded and processed even when attention is directed elsewhere. This obligatory access is accompanied by a range of organisational activities, one of which - seriation - has a general impact upon any other activity concurrently calling upon seriation. Moreover, additional properties of sound may assume disruptive power if (i) the cognitive task at hand necessitates a deeper level of processing (i.e., analysis of meaning) and (ii) the same dimension is present in the irrelevant sound (e.g., the semantic property of speech). From the practical viewpoint, the fact that this disruption does not depend upon the level of the sound (except, of course, that the sound must be above the absolute threshold of hearing) and does not diminish with continued exposure has very important implications for noise abatement in work environments as diverse as the office and safety- and health-critical settings such as the flight deck.

Beaman, C. P., & Jones, D. M. (1997). The role of serial order in the irrelevant speech effect: Tests of the changing state hypothesis. Journal of Experimental Psychology: Learning, Memory and Cognition, 23, 459-471.

Tremblay, S., & Jones, D. M. (1998). Role of habituation in the irrelevant sound effect: Evidence from the effects of token set size and rate of transition. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24, 659-671.