Steady-state visual evoked potentials have only been applied recently to the study of face perception. We used this method to study the spatial and temporal dynamics of expression perception in the human brain and test the prediction that, as in the case of identity perception, the optimal frequency for facial expression would also be in the range of 5–6 Hz. We presented facial expressions at different flickering frequencies (2–8 Hz) to human observers while recording their brain electrical activity. Our modified adaptation paradigm contrasted blocks with varying expressions versus blocks with a constant neutral expression, while facial identity was kept constant. The presentation of different expressions created a larger steady-state response only at 5 Hz, corresponding to a cycle of 200 ms, over right occipito-temporal electrodes. Source localization using a time-domain analysis showed that the effect localized to the right occipito-temporal cortex, including the superior temporal sulcus and fusiform gyrus.

Introduction

Facial expressions are central to nonverbal social exchange as markers of internal states and as signals of intentions. An abundance of research has explored their anatomic and physiological properties in both healthy and pathologic populations (see e.g., Eastwood, Smilek, & Merikle, 2001; Hodsoll, Viding, & Lavie, 2011; McTeague, Shumen, Wieser, Lang, & Keil, 2011; Phelps, Ling, & Carrasco, 2006). The importance of expressions has given them a prominent place in both cognitive and neural models of face processing, in which they are represented as being processed in parallel with, and independently of, face identity, possibly involving the superior temporal sulcus more than the fusiform face area (Allison, Puce, & McCarthy, 2000; Narumoto, Okada, Sadato, Fukui, & Yonekura, 2001).

The temporal dynamics of expression processing have been less investigated. How and when signals evolve within neural networks is important to our understanding of the integrated functioning of face processing systems. A methodology that has recently gained momentum in face perception research is the steady-state visual evoked potential (ssVEP; see e.g., Alonso-Prieto, Belle, Liu-Shuang, Norcia, & Rossion, 2013; Boremanse, Norcia, & Rossion, 2013, 2014; Gerlicher, van Loon, Scholte, Lamme, & van der Leij, 2014; Gruss, Wieser, Schweinberger, & Keil, 2012; Keil, Gruber, Muller, Moratti, & Stolarova, 2003; McTeague et al., 2011; Rossion, 2014; Rossion & Boremanse, 2008, 2011; Rossion, Prieto, Boremanse, Kuefner, & Van Belle, 2012). The ssVEP is an oscillatory brain response to stimuli being repeatedly presented or modulated at a regular temporal frequency, which was first applied to studies of low-level properties such as luminance (Regan, 1966, 1989; Van der Tweel, 1965). Because the temporal frequency of the electroencephalographic response matches that of the stimulus modulation, the stimulus-specific response can be reliably separated from noise and quantified by Fourier analyses that measure the response specifically at that frequency.

To isolate effects specific to certain facial properties, this technique can be married to an adaptation paradigm. Neural responses decline with repeated presentation of the same stimulus, as shown in both functional neuroimaging (Grill-Spector, Henson, & Martin, 2006; Grill-Spector & Malach, 2001) and electroencephalographic studies (see e.g., Alonso-Prieto et al., 2013; Itier & Taylor, 2002; Jacques, d'Arripe, & Rossion, 2007; Kovacs et al., 2006; Rossion & Boremanse, 2011; Rossion et al., 2012; Walther, Schweinberger, Kaiser, & Kovacs, 2013). By comparing responses from blocks in which face stimuli are identical with those from blocks in which one specific facial property is changed but all others remain the same, we can identify neural responses that are sensitive to that facial property.

Using this approach, a recent study examined electroencephalographic responses sensitive to facial identity at a range of temporal frequencies, and found that the optimal response to identity occurred with a temporal frequency of 5.88 Hz (Alonso-Prieto et al., 2013). Stimulation at 5.88 Hz has a cycle length of 170 ms, which parallels the finding in traditional event-related potential (ERP) methods that face-selective responses are evident in the N170 potential (Bentin, Allison, Puce, Perez, & McCarthy, 1996; Rossion & Jacques, 2011).

Whether facial expression processing has a similar optimal frequency remains unclear. The precise neural basis of the N170 potential also remains debated. Some suggest that it originates in the superior temporal sulcus (for a review, see Rossion & Jacques, 2011), while others note that it can be abolished by lesions that eliminate both the fusiform face area and occipital face area (Dalrymple, 2011; Alonso-Prieto, 2011). It may even reflect the aggregate activity of all three of these areas of the core face network. If so, one would predict that the optimal frequency for facial expression would also be in the range of 5–6 Hz. This prediction is also supported by previous studies showing that the most consistent ERP modulations by facial emotion appear around 200 ms after stimulus onset during the early posterior negativity (Calvo & Beltran, 2014; Olofsson, Nordin, Sequeira, & Polich, 2008; Rellecke, Palazova, Sommer, & Schacht, 2011; Sato, Kochiyama, Yoshikawa, & Matsumura, 2001; Schupp, Flaisch, Stockburger, & Junghofer, 2006; Schupp et al., 2004).

To test this prediction, the current study used an ssVEP approach with a modified adaptation paradigm that compared blocks with repeated neutral expressions versus blocks with varying facial expressions, all of the same facial identity, at a range of frequencies from 2–8 Hz. Finally, we applied source localization methods to determine the neural generators of the expression response at the stimulation frequency at which it reached its highest value.

Methods

Participants

Sixteen healthy subjects were initially recruited, but the data of one were excluded because of excessive noise in the recordings. The final cohort consisted of fifteen subjects (eight men, seven women; mean age 26.47 ± 5.76 years; all right-handed). Subjects were compensated $10 per hour for their time. All subjects had normal or corrected-to-normal visual acuity. The institutional review boards of Vancouver General Hospital and the University of British Columbia approved the protocol, and all subjects gave informed consent in accordance with the Declaration of Helsinki.

Procedure

Stimuli

Seven black and white images of the face of a Caucasian male unknown to the participants were obtained from the HVEM-FIVE database. In these images, the individual displayed seven emotional expressions (neutral, happy, angry, disgust, fear, sad, and surprise). All images were displayed in frontal view and cropped to remove external features (i.e., hair and ears) while preserving the overall shape (Figure 1).

We used six expressions regardless of emotional valence for several reasons. First, previous studies have shown that ssVEPs in response to facial expressions do not change as a function of valence (McTeague et al., 2011; Wieser, McTeague, & Keil, 2011). Also, competition effects between different expressions have not been observed in ssVEP studies (Wieser et al., 2011). Second, enhanced sensory processing of dynamic emotional stimuli is not specific to certain emotions (Recio, Schacht, & Sommer, 2014). Third, although different facial expressions might show slightly different patterns of cerebral activity, these show a high degree of overlap (see Fusar-Poli et al., 2009; Vytal & Hamann, 2010 for a review on this topic). Finally, using six as opposed to two different expressions would minimize repetition effects in our “different expressions” condition, and thus magnify the differences between this and the “identical neutral expression” condition.

All images were sized to 400 pixels in width with a resolution of 300 pixels per inch and equated for luminance and contrast using the SHINE toolbox (Willenbockel et al., 2010). The luminance equalization was also verified by the stimulation program. The base face (100% size) subtended 3.72 × 2.86 degrees of visual angle at a distance of 100 cm from the stimulation monitor.

Stimulation

Stimuli were displayed on a grey background (40 cd/m2) via a custom application (SinStim) running in Matlab (MathWorks Inc., Natick, MA). In the identical neutral expression condition, the same face image with neutral expression was presented repeatedly for 80 s. In the different expressions condition, the same face with a single randomly chosen expression was presented repeatedly for the first 10 s, after which the six different expressions (excluding neutral) were presented for the remaining 70 s in a random order. The initial 10-s baseline was used to ensure that the ssVEP had reached its maximal power and already started to decrease when different expressions were introduced (Rossion & Boremanse, 2011). We note that in a classic adaptation paradigm, the identical neutral expression condition would consist of six blocks, each with repeated presentation of a single one of the six expressions. We chose to use a modified adaptation paradigm instead for two reasons. First, by using a repeated neutral expression in the identical neutral expression condition, we hoped to enhance differences generated by expression in the comparison between identical and different expression conditions. Second, because we planned to examine effects at multiple stimulation frequencies, it would not be practical to run an experiment with six identical and six different expression conditions at each frequency. Regardless, the fact that facial identity is constant across the two conditions means that the only difference between our identical and different expression conditions is that the latter contains variations in expression.

The stimulation frequencies (F) used were selected from frequencies between 2 and 8 Hz that could be accurately displayed by the display monitor (integer divisors of the 60.05 Hz refresh rate): 2, 3, 4, 5, 6, 6.67, and 7.51 Hz, which corresponded to cycle durations of 500, 333, 250, 200, 167, 150, and 133 ms, respectively. The stimulus contrast modulation was driven by a sinusoidal function (Regan, 1989; Van der Tweel, 1965; Victor & Zemon, 1985). Therefore, in each condition, a face stimulus appeared and disappeared on the screen with a rate of stimulation that corresponded to a full cycle. Following the beginning of the stimulation sequence (background), each pixel reached the full luminance value of the face stimulus after half a cycle and decreased back to zero contrast after the remainder of the cycle.
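The relation between refresh rate, stimulation frequency, and contrast modulation can be illustrated with a minimal sketch. The frames-per-cycle values and the raised-cosine form of the contrast function are assumptions chosen to reproduce the frequencies and the contrast profile described above; they are not taken from the SinStim implementation.

```python
import math

REFRESH_HZ = 60.05  # monitor refresh rate reported in the text

# Assumed whole numbers of video frames per stimulation cycle (not stated in the text).
FRAMES_PER_CYCLE = [30, 20, 15, 12, 10, 9, 8]


def stimulation_frequency(frames_per_cycle, refresh_hz=REFRESH_HZ):
    """Frequency obtained when one cycle spans a whole number of frames."""
    return refresh_hz / frames_per_cycle


def contrast(t_seconds, freq_hz):
    """Assumed sinusoidal contrast: 0 at cycle onset, 1 at half cycle, 0 at cycle end."""
    return 0.5 * (1.0 - math.cos(2.0 * math.pi * freq_hz * t_seconds))


for n in FRAMES_PER_CYCLE:
    f = stimulation_frequency(n)
    print(f"{n:2d} frames/cycle -> {f:.3f} Hz, cycle = {1000.0 / f:.0f} ms")
```

Under these assumptions, 12 frames per cycle gives approximately 5 Hz, and the contrast at 5 Hz reaches its maximum 100 ms after cycle onset.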

We chose the above mentioned frequencies based on a postulated inverse correspondence between the optimal frequency in ssVEP studies and the timing of responses in ERP studies (Alonso-Prieto et al., 2013): Given that ERP studies have reported emotion-driven amplitude modulations between 120 ms and 500 ms (Olofsson et al., 2008; Schupp et al., 2006; Schupp et al., 2004), this would correspond to frequencies of 2 (1000/500) to 8 (1000/120) Hz.
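The postulated inverse correspondence amounts to a simple conversion between latency and frequency. The short sketch below uses hypothetical helper names; note that 1000/120 is approximately 8.3 Hz, which the text rounds to 8 Hz.

```python
def latency_to_frequency(latency_ms):
    """Stimulation frequency (Hz) whose cycle length equals the given latency."""
    return 1000.0 / latency_ms


def frequency_to_cycle(freq_hz):
    """Cycle length (ms) of a given stimulation frequency."""
    return 1000.0 / freq_hz


for latency in (500, 200, 170, 120):
    print(f"{latency} ms -> {latency_to_frequency(latency):.2f} Hz")
```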

To minimize effects of repetition due to low-level visual cues, the face image changed size randomly after every cycle, varying between 82% and 112% of the size of the original face, in both conditions.

There was one recording for each of the two conditions (identical neutral expression, different expressions) and each of the seven frequencies per participant. To control for the potential effects of fatigue, the order of the frequencies was randomized with both conditions being presented consecutively for that frequency. Testing was counterbalanced across subjects in two ways. First, subjects were paired in order of testing so that the random order of frequency assigned to the first subject was reversed for the second subject. Second, the first subject was randomized to start all pairs of frequency blocks with one particular condition (i.e., either identical or different expressions) and the second subject then started all pairs of frequency blocks with the other condition. To maintain attention across all conditions, participants were instructed to fixate on the center of the screen while monitoring a frame placed around the face. This frame changed from a square to a round shape briefly (400 ms) between 11 and 16 times during a stimulation run. The participant was instructed to press the space bar when they detected a change in the shape of the frame. Participants were at ceiling for this easy task, between 97% and 99% accuracy, with their mean reaction times ranging between 455 and 530 ms. At the beginning of each stimulus sequence and at each minimum contrast value, a trigger was sent from the parallel port of the display computer to the EEG recording computer.

Electrophysiological recording

Subjects were seated in a sound-attenuated room, 100 cm from a 17 in. computer monitor. Brain electrical activity was recorded using a standard 64-electrode cap (BioSemi ActiveTwo system; BioSemi, Amsterdam, the Netherlands) and five additional electrodes (three eye movement channels plus the two mastoids). All recordings were performed relative to two scalp electrodes located over medial parietal cortex (CMS/DRL), amplified with a gain of 0.5 and digitized on-line at a sampling rate of 256 samples per second. Offsets at each active electrode, which are a running average of the voltage measured between CMS and each active electrode and reflect half-cell potential of the electrode/gel/skin interface, were kept between ±40 mV at rest. Vertical eye movements were recorded using an electrode inferior to the right eye, while horizontal eye movements were recorded using electrodes on the right and left outer canthi.

Channels in which muscle potentials or amplifier blocking occurred were interpolated. Afterwards, each subject's signal was bandpass-filtered between 0.1 and 100 Hz (Butterworth filter, slope = 24 dB/oct). Finally, all channels were rereferenced to a common average reference.

For each 80-s trial, the first 12 s of EEG recording were removed. These 12 s corresponded to the presentation of the same expression plus two additional seconds, which were removed to exclude transient ERP components that may have been elicited by the sudden change of facial expression. The last 10 s of stimulation were excluded because these had more frequent eye blinks. Within the remaining recording, a final window of around 50 s of stimulation was used for analysis. This window was adjusted to be an integer number of cycles for every frequency (e.g., 52 s for stimulation of 6.67 Hz and 52.62 s for stimulation of 7.51 Hz) to avoid spectral leakage and to ensure a fair comparison across frequency rates.
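The window adjustment can be sketched as snapping a target duration to the nearest whole number of stimulation cycles. The rounding rule and the exact frequencies (60.05/9 and 60.05/8 Hz) are assumptions; the text reports only the resulting window lengths.

```python
def snap_to_whole_cycles(target_s, freq_hz):
    """Return (cycles, duration_s) for the whole-cycle window closest to target_s."""
    cycles = round(target_s * freq_hz)
    return cycles, cycles / freq_hz


# Assumed exact stimulation frequencies, derived from the 60.05 Hz refresh rate.
for label, freq, target in (("6.67 Hz", 60.05 / 9, 52.0), ("7.51 Hz", 60.05 / 8, 52.62)):
    cycles, duration = snap_to_whole_cycles(target, freq)
    print(f"{label}: {cycles} cycles, {duration:.3f} s")
```

A window containing a whole number of cycles places the stimulation frequency on (or very near) a DFT bin, which is what limits spectral leakage.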

A discrete Fourier transform (DFT; Matlab) was applied to the resulting temporal window, and EEG amplitude was extracted at a resolution of roughly 1/50 = 0.02 Hz. The signal–noise ratio was computed at each channel for all frequency bins between 0 and 100 Hz as the ratio of the amplitude at the frequency of interest to the average amplitude of the 20 neighboring frequency bins, skipping only the closest bins on each side (Rossion & Boremanse, 2011). The results are reported as signal–noise ratio values, but were qualitatively similar to the amplitude at the driving frequency.
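The signal–noise ratio computation can be sketched as follows. This is an illustration on synthetic data, not the authors' Matlab code: it assumes 10 neighboring bins on each side with the single adjacent bin on each side skipped, and it uses a 10-s window (bin spacing 0.1 Hz) rather than the roughly 50-s window of the study to keep the example fast.

```python
import cmath
import math
import random


def dft_amplitude(signal, k):
    """Amplitude of DFT bin k (bin spacing = 1 / window duration)."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal))
    return 2.0 * abs(acc) / n


def snr_at_bin(signal, k, n_side=10, n_skip=1):
    """Amplitude at bin k divided by the mean amplitude of 2 * n_side neighbors."""
    neighbours = [k + side * d
                  for side in (-1, 1)
                  for d in range(n_skip + 1, n_skip + n_side + 1)]
    noise = sum(dft_amplitude(signal, b) for b in neighbours) / len(neighbours)
    return dft_amplitude(signal, k) / noise


# Synthetic example: a 5 Hz response buried in Gaussian noise.
SAMPLING_HZ = 256          # sampling rate reported in the text
DURATION_S = 10            # shortened window (assumption, for speed)
rng = random.Random(0)
signal = [math.sin(2.0 * math.pi * 5.0 * i / SAMPLING_HZ) + rng.gauss(0.0, 1.0)
          for i in range(SAMPLING_HZ * DURATION_S)]

target_bin = 5 * DURATION_S  # 5 Hz divided by the 0.1 Hz bin spacing
snr = snr_at_bin(signal, target_bin)
print(f"SNR at 5 Hz: {snr:.1f}")
```

A value near 1 indicates that the bin of interest does not differ from the surrounding noise; values well above 1 indicate a response at that frequency.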

For statistical analysis, the signal–noise ratio values were obtained for each expression condition at the response frequency bin corresponding to the stimulus frequency, and the identical neutral expression and different expressions conditions were compared using a one-tailed t test for paired samples to identify the electrodes and frequencies that had significant expression effects. This analysis was performed for each stimulation frequency and electrode separately. To control for multiple comparisons, we calculated α values using the classic Bonferroni correction (i.e., dividing 0.05 by the number of comparisons to be made).
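The two ingredients of this analysis, the paired t statistic and the Bonferroni threshold, can be sketched as below. The signal–noise ratio values are hypothetical, and the count of 64 comparisons (one per scalp electrode) is an inference from the corrected threshold of 0.0008 reported in the Results, not a figure stated in the text.

```python
import math
import statistics


def paired_t(different, identical):
    """Paired-samples t statistic for (different - identical), df = n - 1."""
    diffs = [d - i for d, i in zip(different, identical)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison threshold under the classic Bonferroni correction."""
    return alpha / n_comparisons


# Hypothetical signal-noise ratios for five subjects at one electrode.
different = [3.1, 2.8, 3.6, 2.9, 3.3]
identical = [2.4, 2.5, 2.9, 2.6, 2.7]
print(f"t({len(different) - 1}) = {paired_t(different, identical):.2f}")
print(f"corrected alpha = {bonferroni_alpha(0.05, 64):.5f}")
```

For a one-tailed test, the resulting t value would be compared against the upper tail of the t distribution with n − 1 degrees of freedom at the corrected α.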

To provide confirmation of the effects of frequency, we performed a second analysis based on the electrodes that had significant effects above. These were the two neighboring electrodes P10 and PO8 in the right hemisphere. For completeness, we also included their equivalent left hemisphere electrodes (P9, PO7). The signal–noise ratios obtained for each condition (identical and different expressions) were first averaged over the electrodes above for each hemisphere separately and then subtracted (different expressions − identical neutral expression) to obtain the steady-state expression response (SSER) for each subject. Steady-state expression responses were compared using an ANOVA with two factors: stimulation frequency (2, 3, 4, 5, 6, 6.67, and 7.51 Hz) and hemisphere (left and right), with subject as a random effect. Effects were explored using t tests for dependent samples. The α value was corrected using Bonferroni correction.
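The steady-state expression response for one subject and one stimulation frequency can be sketched as below. The electrode groupings follow the text; the signal–noise ratio values and the function name are hypothetical.

```python
HEMISPHERE_ELECTRODES = {"right": ("P10", "PO8"), "left": ("P9", "PO7")}


def steady_state_expression_response(snr_different, snr_identical, electrodes):
    """Mean SNR over electrodes for each condition, then different minus identical."""
    mean_different = sum(snr_different[e] for e in electrodes) / len(electrodes)
    mean_identical = sum(snr_identical[e] for e in electrodes) / len(electrodes)
    return mean_different - mean_identical


# Hypothetical values for one subject at one stimulation frequency.
snr_different = {"P10": 3.0, "PO8": 2.0, "P9": 2.0, "PO7": 1.5}
snr_identical = {"P10": 2.0, "PO8": 1.5, "P9": 2.0, "PO7": 1.5}

for hemisphere, electrodes in HEMISPHERE_ELECTRODES.items():
    sser = steady_state_expression_response(snr_different, snr_identical, electrodes)
    print(f"{hemisphere}: SSER = {sser:.2f}")
```

Repeating this for every subject and frequency yields the values entered into the frequency by hemisphere ANOVA.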

Time domain analysis

This focused on the stimulation frequency that showed the highest effects in the frequency domain analysis. The signal was first bandpass-filtered around the frequency of interest (4.5–5.5 Hz, center frequency 5 Hz, Butterworth filter, slope = 24 dB/oct) and then all channels were rereferenced to a common average reference.

The resulting time series for each condition were statistically compared using BESA (Megis Software, Munich, Germany) to ensure statistical replication of the significant differences between experimental conditions in the frequency domain. Afterwards, we evaluated the source of this difference using L2 minimum-norm current estimates (L2 MNE; Hamalainen & Ilmoniemi, 1994). This method supplies a solution to localize the synchronized postsynaptic potentials of pyramidal cells generated in the cortex of the brain and recorded on the human scalp (Martin, 2000) without making a priori assumptions about underlying neural generators, by minimizing the source activation that can account for scalp potentials (Ilmoniemi, 1993). We utilized an idealized three-shell spherical head model with a radius of 85 mm and assumed scalp and skull thickness of 6 and 7 mm, respectively. The minimum norm was applied to the data across the latency interval 0–200 ms, spanning one entire cycle at a stimulation frequency of 5 Hz. Mean noise levels were computed as the 15% lowest values. Weightings were applied to the data as follows: (a) weighting at the individual channel level with a noise scale factor of 1.00; (b) depth weighting, such that the lead field of each regional source was scaled with the largest singular value of the singular value decomposition of the source's lead field; and (c) spatio-temporal weighting using the signal subspace correlation method of Mosher and Leahy (1998; dimension setting = 3). The obtained results were also reproduced using an additional source localization method, standardized low-resolution brain electromagnetic tomography (sLORETA; Pascual-Marqui, Esslen, Kochi, & Lehmann, 2002; Pascual-Marqui et al., 1999), as implemented in BESA.

For statistical analysis, we first confirmed whether the differences between conditions observed in the frequency domain could also be observed in the time-averaged waveforms. Individual trials were averaged for each participant (from 100 ms prior to stimulus onset to 200 ms after encompassing one complete cycle for a stimulation frequency of 5 Hz) and submitted to time point by time point one-tailed paired t test corrected for multiple comparisons using permutation tests in combination with data clustering as implemented in BESA software. This analysis is based on the idea that if a statistical effect is found over an extended time period in several neighboring channels, it is unlikely that this effect is found by chance (Bullmore et al., 1999; Maris & Oostenveld, 2007).
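The logic of a cluster-based permutation test can be sketched in a simplified form. The example below is not the BESA implementation: it clusters over time only (a single channel), uses random sign flips of each subject's difference waveform as the permutation scheme, and runs on synthetic data. The cluster-forming threshold of 1.761 is the one-tailed critical t value for 14 degrees of freedom at α = 0.05.

```python
import math
import random


def t_stat(samples):
    """One-sample t statistic of paired differences at one time point."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return mean / math.sqrt(var / n)


def max_cluster_mass(diffs, threshold):
    """Largest summed t over runs of consecutive time points with t > threshold."""
    best = run = 0.0
    for samples in zip(*diffs):  # iterate over time points
        t = t_stat(samples)
        run = run + t if t > threshold else 0.0
        best = max(best, run)
    return best


def cluster_permutation_p(diffs, threshold, n_perm=200, seed=0):
    """One-tailed p value of the largest observed cluster under random sign flips."""
    rng = random.Random(seed)
    observed = max_cluster_mass(diffs, threshold)
    exceed = 0
    for _ in range(n_perm):
        flipped = []
        for subject in diffs:
            sign = rng.choice((-1.0, 1.0))
            flipped.append([sign * v for v in subject])
        if max_cluster_mass(flipped, threshold) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Synthetic difference waveforms: 15 subjects, 50 time points, effect at points 20-34.
data_rng = random.Random(1)
diffs = [[(1.5 if 20 <= t < 35 else 0.0) + data_rng.gauss(0.0, 1.0) for t in range(50)]
         for _ in range(15)]

p_value = cluster_permutation_p(diffs, threshold=1.761)
print(f"cluster-level p = {p_value:.3f}")
```

Because only the largest cluster mass in each permutation enters the null distribution, the procedure controls the family-wise error rate across all time points with a single test.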

Results

Frequency domain analysis

For each frequency, large EEG responses for each of the two expression conditions were observed at the response bin whose frequency equaled the stimulation frequency (Figure 2), indicating that the brain synchronized precisely with the rate of visual stimulation.

Signal entrainment to the periodic stimulation. Signal–noise ratio values, averaged over all electrodes, are presented for each of the two expression conditions and for 10 response frequency bins centered on the stimulation frequency. Note that for all stimulation frequencies, the highest value occurred at a response frequency that corresponded to the stimulation frequency, demonstrating the efficacy of periodic stimulation in entraining the brain response.

Figure 2

Expression effects were observed only at 5 Hz. At this frequency, the effect was observed at P10, t(14) = 1.76, p = 0.0004, and PO8, t(14) = 1.76, p = 0.0006. It is worth noting that even with a conservative method to correct for multiple comparisons, the effect approached significance at P6, t(14) = 1.76, p = 0.001, and P8, t(14) = 1.76, p = 0.001 (Figures 3 and 4; α = 0.0008, classic Bonferroni correction for multiple comparisons). In all cases, the response as reflected in the signal–noise ratio for the different expressions condition was larger than that in the identical neutral expression condition, as predicted for an effect based on a response to expression.

Steady-state expression response for all frequencies. (A) Signal–noise ratio in both experimental conditions for the 5 Hz stimulation frequency is shown for the electrodes with a significant difference between identical and different expression conditions. (B) Steady-state expression response (signal–noise ratio to different emotions minus signal–noise ratio to identical emotions) at all stimulation frequencies, for all electrodes. The highest response is observed at 5 Hz, especially for electrodes P6, P8, P10, and PO8 (indicated in full red circles).

Figure 3

Frequency tuning function of the steady-state expression response, averaged over occipital electrodes. Data for the left hemisphere are shown on the left, and data for the right hemisphere on the right. Error bars show one standard error.

Figure 4

To confirm these results, a second analysis of frequency effects confined to results averaged over occipital electrodes P10 and PO8, identified above, and their left hemisphere counterparts (P9 and PO7) was performed. The ANOVA showed a main effect of stimulation frequency, F(6, 84) = 4.24, p < 0.001, and an interaction between stimulation frequency and hemisphere, F(6, 84) = 2.38, p = 0.03 (Figure 4).

These results were due to the fact that the differences between 5 Hz and all other frequencies were found only in the right hemisphere. That is, t tests for paired samples showed that the steady-state expression response at 5 Hz in the left hemisphere did not differ from the response at any other frequency in either hemisphere. In contrast, in the right hemisphere, the 5 Hz response was larger in the following comparisons: 3 Hz versus 5 Hz, t(14) = −4.56, p < 0.001; 4 Hz versus 5 Hz, t(14) = −3.12, p = 0.003; and 7 Hz versus 5 Hz, t(14) = 3.51, p = 0.003 (α = 0.003, classic Bonferroni correction for multiple comparisons). The difference was also statistically significant at α = 0.01 in the case of 2 Hz versus 5 Hz, t(14) = −2.81, p = 0.007; 6 Hz versus 5 Hz, t(14) = 2.65, p = 0.01; and 8 Hz versus 5 Hz, t(14) = 2.93, p = 0.005.

Finally, we did not observe effects at the harmonic frequencies of 5 Hz (see Figure 5).

Spectrum for stimulation frequency of 5 Hz. The SNR averaged over electrodes P8 and PO10 and obtained in response to both experimental conditions when the stimulation frequency was 5 Hz is shown for all frequencies from 1 Hz to 20 Hz. Note that the response at the harmonic frequencies of 10 Hz, 15 Hz, and 20 Hz is not as strong as at the fundamental frequency of 5 Hz.

Figure 5

Time domain analysis and source reconstruction of the steady-state expression response at 5 Hz

The differences between both experimental conditions reached statistical significance between 100 and 180 ms and peaked at 140 ms, 40 ms after the periodic stimulus had reached full contrast. Those differences were observed over occipito-temporal electrodes of the right hemisphere (P8, P10, P6, P4, PO4, PO8, and CP6) and left hemisphere (P5, P7, P9, PO7, and O1; Figure 6).

Comparison of both experimental conditions in the time domain, for 5 Hz frequency. (A) Grand averaged ssVEP waveforms to identical and different expressions as well as the difference (red line) between both conditions. The waveforms have been averaged over 400 ms to show two stimulation cycles for 5 Hz frequency. (B) Topographical distribution of the steady-state expression response, where blue indicates larger response in the different expressions condition (left), as well as the results of the statistical comparison between the two conditions (right). Note how the effects are highest over occipito-temporal electrodes of the right hemisphere.

Figure 6

The grand-averaged waveforms to identical and different expressions were subtracted to obtain the difference grand-averaged waveform. The identical and different expressions waveforms as well as the difference waveform were submitted to source reconstruction analysis using MNE. Because the time-varying ssVEP is a measure of continuous brain activation at the stimulation frequency (Regan, 1989), we specifically focused on the sources that explained the scalp recorded signal at 140 ms. Our analysis showed that these sources were located in the occipito-temporal cortex, encompassing the superior temporal sulcus and the fusiform gyrus (Figure 7). The sources encompassing the fusiform gyrus were active in both conditions, identical and different expressions, but more strongly in the different expressions condition. The sources encompassing the superior temporal sulcus were active mainly in the different expressions condition (Figure 7).

Results of the source localization algorithm. The 5 Hz steady-state expression response was localized to a neural network within the right occipito-temporal cortex including the superior temporal sulcus (STS) and the fusiform gyrus (FG). Note how the sources including the right FG were active in both conditions although stronger in the different expressions condition. Sources including the superior-temporal sulcus were active mainly in the different expressions condition.

Figure 7

Discussion

Using an ssVEP version of a modified neural adaptation paradigm, we found that the steady-state expression response is tuned to 5 Hz and is likely generated by activity in a right occipito-temporal network that includes the superior temporal sulcus and the fusiform gyrus.

It is unlikely that these observations are the result of attentional fluctuations and fatigue. First, our method counterbalanced the order of conditions across participants. Second, participants reached a high level of accuracy in the behavioral task implemented to control for attention (97%–99% accuracy). Third, previous studies have reported that the ssVEP attentional modulation is distributed bilaterally over an occipito-frontal network (Ding, Sperling, & Srinivasan, 2006), whereas there is extensive electrophysiological evidence that amplitude modulations over right occipito-temporal electrodes characterize face-specific neural responses (see e.g., Bentin et al., 1996; Rossion & Boremanse, 2008; Sergent, Ohta, & MacDonald, 1992).

The tuning of this effect at 5 Hz (200-ms cycle length) corresponds to the onset latency of the early posterior negativity, an ERP component that appears around 200 ms after stimulus onset and reaches its highest amplitude between 250 and 300 ms. The early posterior negativity amplitude increases for emotional relative to neutral stimuli (Sato et al., 2001; Schupp et al., 2006; Schupp et al., 2004) and is evident mainly over occipito-temporal electrodes, although it can be observed as a positive amplitude deflection over fronto-central sites when a mastoid reference is used (Junghofer, Peyk, Flaisch, & Schupp, 2006; Rellecke, Sommer, & Schacht, 2013). Modulations of the early posterior negativity by facial emotions have been observed even when emotion is irrelevant for the experimental task (Rellecke, Sommer, & Schacht, 2012), and this potential has been considered an index of the time at which encoding of emotional stimuli occurs in extrastriate visual cortex (Rellecke et al., 2012, 2013; Schupp et al., 2006; Schupp et al., 2004).

This correspondence between ERP latency and optimal ssVEP frequency accords with a previous study demonstrating that the optimal stimulation frequency of 5.88 Hz for facial identity discrimination has a cycle length that coincides with the latency of the face-selective N170 potential (Alonso-Prieto et al., 2013). It also supports the hypothesized relation between the stimulation rate that generates the largest ssVEP response and the time needed to process the stimulus. In other words, the stimulation rate that generates the largest ssVEP response is likely one that allows optimal engagement of the neural networks that process a given stimulus, because these networks are not only ready to fire every time a stimulus arrives, but also able to synchronize their responses and avoid both temporal dispersion, which occurs if cycle length is longer than the neural activation cycle, and destructive interference, which occurs if cycle length is shorter (Norcia, Appelbaum, Ales, Cottereau, & Rossion, 2015; Rossion, 2014).

The observation of a correspondence between ERP latency and optimal ssVEP driving frequency also supports the proposal that ssVEPs are generated by linear superposition of individual transient evoked potentials (for studies focused on sensory processes and low-level vision proposing this hypothesis see e.g., Capilla, Pazo-Alvarez, Darriba, Campo, & Gross, 2011; Galambos, Makeig, & Talmachoff, 1981; Heinrich, 2010; Santarelli et al., 1995). However, not all previous studies using more complex stimuli such as faces have found such a direct correspondence between the optimal stimulation rate in ssVEP and the latency of ERP components (see e.g., Gruss et al., 2012). As previously suggested, both linear and nonlinear mechanisms may be involved in the generation of ssVEPs (for studies focused on sensory processes and low-level vision proposing this hypothesis see e.g., Azzena et al., 1995; Conti, Santarelli, Grassi, Ottaviani, & Azzena, 1999; Regan, 1982, 1989; Stapells, Linden, Suffield, Hamel, & Picton, 1984), depending on factors such as driving frequency, stimulus characteristics, or perceptual demands (for a discussion on this topic see Norcia et al., 2015; Rossion, 2014). Hence, the steady-state response cannot always be predicted from the transient waveforms.
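To make the linear superposition hypothesis concrete, the following Python sketch sums time-shifted copies of a single transient waveform at each stimulation rate and measures the Fourier amplitude at the fundamental. Everything in it is an illustrative assumption: the transient is a hypothetical Gaussian-windowed oscillation and not a measured ERP, the sampling rate and duration are chosen for numerical convenience, and none of the names come from the study's analysis.

```python
import numpy as np

FS = 840          # assumed sampling rate (Hz); divisible by every rate from 2 to 8 Hz
DURATION_S = 10   # assumed length of the simulated recording (s)
N = FS * DURATION_S


def transient_response(n_samples: int = N, fs: int = FS) -> np.ndarray:
    """Hypothetical transient evoked potential in arbitrary units.

    A Gaussian-windowed 5 Hz oscillation centred 200 ms after onset.
    The shape is an illustrative assumption, not a fitted ERP.
    """
    t = np.arange(n_samples) / fs
    envelope = np.exp(-0.5 * ((t - 0.2) / 0.06) ** 2)
    return envelope * np.cos(2 * np.pi * 5.0 * (t - 0.2))


def superpose(rate_hz: int, transient: np.ndarray, fs: int = FS) -> np.ndarray:
    """Add one circularly shifted copy of the transient per stimulus onset."""
    if fs % rate_hz != 0:
        raise ValueError("sampling rate must be a multiple of the stimulation rate")
    period = fs // rate_hz  # samples per stimulation cycle
    response = np.zeros_like(transient)
    for onset in range(0, transient.size, period):
        response += np.roll(transient, onset)
    return response


def amplitude_at(signal: np.ndarray, freq_hz: float, fs: int = FS) -> float:
    """Single-sided Fourier amplitude at freq_hz, which must fall on an FFT bin."""
    spectrum = np.fft.rfft(signal)
    bin_index = int(round(freq_hz * signal.size / fs))
    return 2.0 * np.abs(spectrum[bin_index]) / signal.size


transient = transient_response()
for rate in range(2, 9):
    steady = superpose(rate, transient)
    print(f"{rate} Hz: amplitude at fundamental = {amplitude_at(steady, rate):.4f}")
```

Under a purely linear model of this kind, the amplitude at the fundamental equals the number of stimuli delivered multiplied by the transient's own Fourier amplitude at that frequency, so the predicted tuning curve is fixed entirely by the transient waveform. Deviations of measured tuning from such a prediction would be one indication that nonlinear mechanisms contribute.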

If expression effects in ssVEP were the same as identity effects, which were found to be optimal at 5.88 Hz, we would have expected the optimal frequency in our experiment to be at 6 Hz, not 5 Hz. This discrepancy suggests subtle differences in the timing of identity and expression processing for faces. It would support assertions that identity effects in the N170 and expression effects in the early posterior negativity are distinct effects in occipito-temporal electrodes (Rellecke et al., 2011; Schacht & Sommer, 2009). Because of their similar latencies and topographical distribution, the N170 and the early posterior negativity are often superimposed, which may account for disagreements in the literature about whether facial expression influences the N170 amplitude (see Rellecke et al., 2013, for a discussion on the topic).

Another important aspect to consider is the potential confound introduced by interactions between identity and expression processing. There is strong evidence that the mechanisms for these two forms of facial processing are not completely distinct (Baudouin, Martin, Tiberghien, Verlut, & Franck, 2002; Schweinberger, Burton, & Kelly, 1999; Schweinberger & Soukup, 1998; Stoesz & Jakobson, 2013). However, more relevant to the current work, behavioral adaptation studies have shown that, while alterations in identity between adapting and test stimuli reduce the magnitude of expression aftereffects, alterations in expression have no impact on the magnitude of identity aftereffects (Campbell & Burke, 2009; Fox & Barton, 2007; Fox, Oruc, & Barton, 2008; Vida & Mondloch, 2009). These data suggest that the changes in expression in our different-expressions condition would not alter the activity in identity representations. Hence, keeping facial identity constant in both the different-expressions and identical-expressions conditions should be sufficient to ensure that activity in identity representations contributes minimally to the subtraction analysis.

Our study localized the anatomical generators of the steady-state expression response at 5 Hz to right occipito-temporal cortex, including the fusiform gyrus as well as the superior temporal sulcus. This is in agreement with neuroimaging data suggesting that processing of facial expression may involve these regions (Allison et al., 2000; Ganel, Valyear, Goshen-Gottstein, & Goodale, 2005; Haxby, Hoffman, & Gobbini, 2000; Lang, Bradley, & Cuthbert, 1998; Phan, Wager, Taylor, & Liberzon, 2002). We did not find, however, activation in the prefrontal cortex or the limbic system, which are reportedly activated in a wide range of facial expression tasks (for a review, see Phan et al., 2002). This may reflect the inability of the source localization method to resolve deep sources such as the limbic structures (see e.g., Grech et al., 2008; Hauk, 2004). It may also reflect the fact that the prefrontal cortex does not contribute significantly to a steady-state expression response at 5 Hz. For example, frontal ERP modulations in response to fearful expressions have been found at around 120 ms, a latency that corresponds to the cycle length of an 8.33 Hz stimulation frequency (Eimer & Holmes, 2002).

There are technical implications of this work for ssVEP studies. One key point is that the magnitude and spatial distribution of the steady-state response are strongly related to the stimulation frequency, and that the optimal frequency in turn depends on the type of stimulus used and the recording site (Norcia et al., 2015; Rossion, 2014). It follows that ssVEP studies should use temporal frequencies optimal for the cognitive process being investigated (Rossion, 2014). If nonoptimal frequencies are selected, nonspecialized networks may be tagged or destructive interference of the neural responses may occur, leading to null results. Careful selection of stimulation frequency will therefore be critical to the value of data from future studies.

Acknowledgments

This research was supported by CIHR MOP-106511, the Canada Research Chair, and the Marianne Koerner Chair in Brain Diseases to J. B.

Figure 2

Signal entrainment to the periodic stimulation. Signal–noise ratio values, averaged over all electrodes, are presented for each of the two expression conditions and for 10 response frequency bins centered on the stimulation frequency. Note that for all stimulation frequencies, the highest value occurred at a response frequency that corresponded to the stimulation frequency, demonstrating the efficacy of periodic stimulation in entraining the brain response.

Figure 3

Steady-state expression response for all frequencies. (A) Signal–noise ratio in both experimental conditions for the 5 Hz stimulation frequency is shown for the electrodes with a significant difference between identical and different expression conditions. (B) Steady-state expression response (signal–noise ratio to different emotions minus signal–noise ratio to identical emotions) at all stimulation frequencies, for all electrodes. The highest response is observed at 5 Hz, especially for electrodes P6, P8, P10, and PO8 (indicated in full red circles).

Figure 4

Frequency tuning function of the steady-state expression response, averaged over occipital electrodes. Data for the left hemisphere are shown on the left and data for the right hemisphere on the right. Error bars show one standard error.

Figure 5

Spectrum for stimulation frequency of 5 Hz. The SNR averaged over electrodes P8 and PO10 and obtained in response to both experimental conditions when the stimulation frequency was 5 Hz is shown for all frequencies from 1 Hz to 20 Hz. Note that the response at the harmonic frequencies of 10 Hz, 15 Hz, and 20 Hz is not as strong as at the fundamental frequency of 5 Hz.

Figure 6

Comparison of both experimental conditions in the time domain, for 5 Hz frequency. (A) Grand averaged ssVEP waveforms to identical and different expressions as well as the difference (red line) between both conditions. The waveforms have been averaged over 400 ms to show two stimulation cycles for 5 Hz frequency. (B) Topographical distribution of the steady-state expression response, where blue indicates larger response in the different expressions condition (left), as well as the results of the statistical comparison between the two conditions (right). Note how the effects are highest over occipito-temporal electrodes of the right hemisphere.

Figure 7

Results of the source localization algorithm. The 5 Hz steady-state expression response was localized to a neural network within the right occipito-temporal cortex including the superior temporal sulcus (STS) and the fusiform gyrus (FG). Note that the sources including the right FG were active in both conditions, although more strongly in the different expressions condition. Sources including the STS were active mainly in the different expressions condition.