Received April 4, 2017; Revised June 24, 2017; Accepted June 24, 2017.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

The P300 wave in electroencephalography (EEG) has been widely used for brain–computer interfaces (BCIs). The purpose of this study is to reduce the large latency in detecting P300 wave using sequential detection theory. The probability of a P300 wave from each EEG measurement is calculated using an ensemble of support vector machines with simple post-processing. The sequential detection of a P300 wave is formulated as either binary or multiple hypothesis tests. A decision is made as soon as the accumulated probability of a P300 wave reaches decision boundaries given by Wald’s approximation. The experimental results agreed with the theory and often showed fewer errors than predicted. With binary hypotheses, the probabilities of a miss and a false positive were close to and often lower than theoretical predictions. For multiple hypothesis tests, sequential detection required much fewer samples than fixed-sample-size detection to achieve the same accuracy. Thus, sequential detection of P300 waves enables a high accuracy and low latency of BCIs.

The P300 wave is a selective increase in the electroencephalogram (EEG) around 300 ms after a stimulus onset [1, 2], and its amplitude depends on the subject’s attention to the given stimulus. The P300 wave has been widely used for brain–computer interfaces (BCIs) [3, 4]. For example, the P300 speller [5] is designed to infer a user’s intention solely from brain signals and is a valuable tool for individuals with physical disabilities [6–8]. A user focuses on a character among 36 candidates arranged in a 6 × 6 matrix (Figure 1(a)). While the user maintains his or her attention on a target character, rows and columns of this matrix intensify in a random order. The EEG corresponding to the target character contains a P300 wave (Figure 1(b)), which is detected by the P300 speller.

In P300-based BCIs, accurately detecting a P300 wave is essential for achieving high accuracy. Unfortunately, detecting a P300 wave from an EEG is challenging because of the low signal-to-noise ratio of EEGs and the irregularity of brain activities. Thus, in existing BCIs, P300 waves are detected from average EEG measurements after a large number of repeated trials.

Such a long latency in detecting a P300 wave is a critical limitation of P300-based BCIs [9–11]. For example, one trial of the P300 speller contains 12 stimuli (six rows and six columns), and more than 90% accuracy is achieved by combining 15 repeated trials for each stimulus using state-of-the-art machine learning techniques [12].

Because of the delay of the P300 wave itself (approximately 300 ms) and potential interferences between responses to consecutive stimuli, stimuli must be separated by at least 1 second. Thus, repeating 15 trials for each stimulus involves presenting 180 stimuli to detect only one character, which takes approximately 2–3 minutes.

Thus, recent studies have focused on reducing the number of repeated trials for detecting a P300 wave while maintaining the accuracy of BCIs [13–15]. In [13], stimulus sets were adaptively switched depending on a performance criterion. Introducing such flexibility to the BCI system could potentially decrease the latency, but selecting the optimal criterion remained elusive. In [14, 15], a Bayesian approach was adopted by continuously updating the probabilities of target characters. Such a probabilistic model offered optimal control of the number of repetitions. However, full Bayesian models require a careful choice of hyperparameters and incur high computational complexity.

In this study, a rather different approach is introduced to efficiently adjust the number of trials with clearly defined optimality criteria. Specifically, sequential detection [16] is used where the number of trials is adaptively adjusted according to the reliability of individual trials. For conventional detection, probabilities of candidate hypotheses are compared after repeating a fixed number of trials. In contrast, sequential detection keeps track of the probabilities of candidate hypotheses with an increasing number of trials and makes a decision as soon as these probabilities satisfy predetermined criteria. Thus, with sequential detection, the number of trials required for reliable detection is a random variable, and its average is much smaller than that required by the fixed-sample-size detection to achieve the same accuracy as conventional detection [16].

The remainder of this paper is organized as follows. Section 2 describes the calculation of P300 probabilities for individual trials. In Section 3, sequential detection of a P300 wave is formulated as binary and multiple hypothesis tests. The theory developed in Section 3 is compared with experimental results in Section 4. Finally, Section 5 provides conclusions.

2. Calculation of P300 Probabilities from Individual Trials

In order to perform sequential detection, the probabilities of candidate hypotheses must be accurately calculated for individual trials. With fixed-sample-size detection, the hypothesis with the maximum probability is chosen; thus, the relative differences among the hypotheses’ probabilities suffice to make a decision. In contrast, sequential detection compares the probabilities of the hypotheses against predetermined values at each trial. Thus, these probabilities must be estimated for each trial.

In order to calculate the P300 probability from an EEG, a support vector machine (SVM) [17, 18] is used with the following two modifications. First, multiple SVMs are trained using different subsets of training data to form an ensemble of SVMs. In general, combining multiple classifiers to make a decision simplifies training of constituent classifiers and improves the classification accuracy [19, 20]. The same holds true for SVMs; in fact, an ensemble of SVMs achieves the best accuracy for the P300 speller [12]. Second, the SVM is modified to produce the probability of the target label for a given trial. To this end, a clever way of transforming the SVM output to a probability is adopted [21].

Given training samples xi ∈ ℝ^d with class labels yi ∈ {1, −1}, training an SVM is equivalent to finding a mapping from ℝ^d into {1, −1}. In conventional SVMs, the class label of a sample x ∈ ℝ^d is determined by the sign of an intermediate output f(x), which is given by [17, 18]:

(1)  f(x) = ∑_i γi yi K(x, xi) + γ0,

where γi and γ0 are parameters to be trained, and K(·, ·) is the kernel (intuitively, a generalization of the inner product to a high-dimensional space). The estimated class label of x is determined by the sign of f(x):

(2)  ŷ(x) = 1 if f(x) > 0, and ŷ(x) = −1 otherwise.

In real applications, the given training dataset is not separable, and some errors are unavoidable. For such cases, the penalty for errors is adjusted by a user-controlled parameter C [17, 18].

An ensemble of SVMs is trained as follows. The training samples are divided into M partitions, and an SVM is trained for each partition. A natural way to combine the outputs of multiple SVMs is to sum individual SVM outputs and make a decision on the basis of the sign of the sum:

(3)  fe(x) = ∑_{m=1}^{M} fm(x),

(4)  ŷ(x) = 1 if fe(x) > 0, and ŷ(x) = −1 otherwise,

where fm(x) is the mth SVM’s output for sample x from Eq. 1 and ŷ(x) in Eq. 4 is the estimated label of sample x from the ensemble of SVMs. When the individual SVMs in the ensemble are linear (i.e., the kernel in Eq. 1 is the simple inner product), the resulting combined SVM is another linear SVM. Thus, the ensemble of linear SVMs in conjunction with Eq. 3 is used for efficient training and to provide a better interpretation.
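As a concrete illustration of Eqs. 3 and 4, the combination of linear SVMs can be sketched in Python; the weight vectors and biases below are hypothetical stand-ins for trained SVMs, not values from this study:

```python
import numpy as np

# Each trained linear SVM m is represented by a (hypothetical) weight
# vector w_m and bias b_m, so that f_m(x) = w_m . x + b_m.
svms = [
    (np.array([1.0, -0.5]), 0.2),
    (np.array([0.8, 0.1]), -0.1),
    (np.array([1.2, -0.3]), 0.0),
]

def f_ensemble(x, svms):
    """Eq. 3: sum of the individual linear SVM outputs f_m(x)."""
    return sum(w @ x + b for w, b in svms)

def y_hat(x, svms):
    """Eq. 4: estimated label is the sign of the summed output."""
    return 1 if f_ensemble(x, svms) > 0 else -1

# A sum of linear SVMs is itself linear: f_e(x) = (sum_m w_m) . x + sum_m b_m,
# which is the equivalent single linear SVM mentioned in the text.
w_e = sum(w for w, _ in svms)
b_e = sum(b for _, b in svms)
```

For x = (0.5, 1.0), the toy parameters give fe(x) = 0.9 > 0, so ŷ(x) = 1, and the combined (w_e, b_e) produce the same output as summing the individual SVMs.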

The relationship between the SVM output and the probability of a P300 wave is derived as follows. Let us denote the log odds of a P300 wave from an EEG x by

(5)  log( P(y = 1 | x) / P(y = −1 | x) ) ≡ g(x),

where y is the class label indicating the presence (y = 1) or absence (y = −1) of a P300 wave. These log odds are related to the SVM output f(x) by a simple linear transform proposed by Platt [21]. (The signs of the parameters in Eq. 6 are opposite to those in the original proposal in [21], where f(x) and g(x) are related through a negative a, which is less natural for interpreting g(x). With the parameterization in Eq. 6, the SVM output is related to the log odds without a sign change, which gives g(x) a direct interpretation as the log odds.)

(6)g(x)=af(x)+b,

where a and b are parameters obtained by cross-validation. Thus, combining Eqs. 5 and 6 yields the explicit calculation of the probability of the P300 wave (y = 1) for a given EEG:

(7)  P(y = 1 | x) = 1 / (1 + e^{−g(x)}) = 1 / (1 + e^{−a f(x) − b}) ≡ g̃(x).

It is worth noting that this transformation from the SVM output to the conditional probability in Eq. 7 is simply the sigmoid function with some scale a and translation b. Using this transform, the SVM output, which takes an arbitrary real number, is normalized to a conditional probability in the unit interval.
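This post-processing step is a one-line computation; a minimal Python sketch of Eq. 7, with a and b as placeholders for the cross-validated parameters, is:

```python
import math

def p300_probability(f_x, a, b):
    """Eq. 7: sigmoid of g(x) = a f(x) + b, mapping the real-valued
    SVM output f(x) to the conditional probability P(y = 1 | x)."""
    return 1.0 / (1.0 + math.exp(-(a * f_x + b)))
```

With a > 0, a large positive SVM output maps to a probability near 1 and a large negative output to a probability near 0, matching the interpretation of g(x) as the log odds.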

3. Sequential Detection of P300 Waves

After the probability of a P300 wave is calculated from each EEG measurement as described above, this probability is used for sequential detection based on either binary or multiple hypothesis tests.

3.1 Sequential Detection with Binary Hypotheses

The detection of a P300 wave from repeated EEG measurements (trials) for the same stimulus is formulated as a binary hypothesis test as follows. From Nmax repeated EEG measurements {xn|n = 1, 2, …, Nmax} for the same stimulus, the SVM produces a sequence of output values {f(xn)|n = 1, 2, …, Nmax}, where n indexes the trial. The goal is to choose one hypothesis between H1 : y = 1 and H0 : y = −1. The log odds based on the first N trials are calculated using the post-processing in Eqs. 5 and 6:

(8)  πN = ∑_{n=1}^{N} log( P(y = 1 | xn) / P(y = −1 | xn) ) = ∑_{n=1}^{N} ( a f(xn) + b ).
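The accumulated log odds in Eq. 8 amount to a cumulative sum of the per-trial transformed SVM outputs; a minimal sketch:

```python
import numpy as np

def accumulated_log_odds(f_values, a, b):
    """Eq. 8: pi_N for N = 1..Nmax, computed as the cumulative sum
    of the per-trial log-odds terms a f(x_n) + b."""
    return np.cumsum(a * np.asarray(f_values, dtype=float) + b)
```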

The difference between detection with a fixed sample size and sequential detection is explained as follows. The traditional binary hypothesis test with a fixed sample size N makes a decision after observing a fixed number N of samples:

(9)  choose H1 if πN > 0, and H0 otherwise.

In contrast, sequential detection with binary hypotheses (SDBH) asks at each trial whether the accumulated πN is sufficient to make a decision before collecting more samples. At trial N, SDBH makes a decision based on πN as follows:

(10)  choose H1 if πN ≥ πH; choose H0 if πN ≤ πL; otherwise, collect one more sample,

where πH and πL are decision boundaries determined by the target error probabilities [16]. When only a finite number of samples can be measured and the decision has been deferred until the last measurement, a forced decision is made from the final log odds: choose H1 if πNmax > 0 and H0 otherwise.

The intuition behind the decision rule in Eq. 10 is that when πN is sufficiently high (low), SDBH declares H1 (H0) without collecting any more samples (respectively). Otherwise, if the evidence is insufficient, SDBH collects more samples.
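The SDBH loop can be sketched as follows. Because Eq. 11 is not reproduced in this excerpt, the boundaries below are an assumption: the standard Wald approximation under the paper’s conventions Pf = α and Pm = 1 − β, namely πH ≈ log(β/α) and πL ≈ log((1 − β)/(1 − α)). The truncation at Nmax follows the forced-decision rule described above.

```python
import math

def wald_boundaries(alpha, beta):
    """Decision boundaries from the standard Wald approximation,
    assuming (as in the text) target Pf = alpha and Pm = 1 - beta."""
    pi_H = math.log(beta / alpha)
    pi_L = math.log((1.0 - beta) / (1.0 - alpha))
    return pi_L, pi_H

def sdbh(increments, alpha, beta):
    """Sequential detection with binary hypotheses (decision rule of Eq. 10).

    increments: the per-trial log-odds terms a f(x_n) + b, n = 1..Nmax.
    Returns (decision, trials used), where decision is +1 (H1) or -1 (H0).
    """
    pi_L, pi_H = wald_boundaries(alpha, beta)
    pi_N = 0.0
    for n, inc in enumerate(increments, start=1):
        pi_N += inc
        if pi_N >= pi_H:
            return 1, n   # enough evidence for a P300 wave: declare H1
        if pi_N <= pi_L:
            return -1, n  # enough evidence against: declare H0
    # Truncated test: forced decision at N = Nmax by the sign of pi_N.
    return (1 if pi_N > 0 else -1), len(increments)
```

With α = 0.01 and β = 0.93 (the values used in Section 4.3), πH ≈ 4.53 and πL ≈ −2.65, so strong per-trial evidence terminates the test after only a few trials.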

3.2 Sequential Detection with Multiple Hypotheses

Next, let us consider a multiple hypothesis test for detecting a P300 wave among multiple EEG measurements, which corresponds to choosing one row (or column) among six candidates in the P300 speller. To be specific, EEGs are measured for each stimulus, which is repeated up to Nmax trials. These measurements are denoted by {xn^s | s = 1, 2, …, S, n = 1, 2, …, Nmax}, where s and n index the stimulus and trial, respectively. Hypothesis Hs states that stimulus s gives rise to a P300 wave and the others do not: ys = 1 and yt = −1 for t ≠ s. The goal is to find the hypothesis with the maximum probability among the S hypotheses. To this end, let us define the vector of log odds after N trials by

πN = (πN^1, πN^2, …, πN^S),  where πN^s = ∑_{n=1}^{N} ( a f(xn^s) + b ).

Note that the decision rule of SDMH in Eq. 15 has only an upper bound as opposed to SDBH in Eq. 10, which has both lower and upper bounds. Another technical difference between multiple and binary hypothesis tests arises in the definition of errors. The binary hypothesis test involves two types of errors: misses and false positives. However, those errors are not separable for multiple hypothesis tests. Therefore, the probability of correct detection is measured for SDMH.
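Because the intermediate equations (through Eq. 15) are not reproduced in this excerpt, the following sketch assumes a common form of the SDMH stopping rule consistent with the description above: stop as soon as the largest accumulated log odds reaches the single upper boundary πH and declare the corresponding stimulus, with a forced decision by the maximum at N = Nmax.

```python
import numpy as np

def sdmh(increments, pi_H):
    """Sequential detection with multiple hypotheses (sketch; see lead-in).

    increments: array of shape (Nmax, S); entry [n, s] is a f(x_n^s) + b,
    the per-trial log-odds term for stimulus s.
    Returns (detected stimulus index, trials used).
    """
    pi = np.zeros(increments.shape[1])     # accumulated log odds per stimulus
    for n, row in enumerate(increments, start=1):
        pi += row
        if pi.max() >= pi_H:               # assumed stopping rule (upper bound only)
            return int(pi.argmax()), n
    return int(pi.argmax()), increments.shape[0]  # forced decision at Nmax
```

Raising πH demands more accumulated evidence before stopping, which increases the average number of trials E[N], matching the behavior reported in Figures 5 and 6.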

4. Results

4.1 Data Set

P300 waves were detected for BCI data set II (the P300 speller paradigm) from the BCI Competition III 2004 [23]. This dataset consisted of EEG recordings of healthy Subjects A and B using the P300 speller. In an epoch, a subject was asked to focus attention on one character among 36 characters, displayed as shown in Figure 1(a). In a trial, while the subject maintained focus on the target character, six rows and six columns were intensified in a random order, and corresponding EEGs were measured. This trial was repeated 15 times for each epoch. Training and testing data sets contained 85 and 100 epochs, respectively.

4.2 P300 Probabilities from Individual Trials

The ensemble of SVMs was trained as follows. First, the training data set was randomly partitioned into 17 sets, and one SVM was trained on each partition. Next, the parameters a and b in Eq. 6 were estimated for the combined SVM on the whole training set using the algorithm in [21].

Using the SVMs trained above, log odds were calculated for the 100 epochs in the test data that were not used for training. Figure 2 shows a typical test result for an epoch. In Figure 2(a), the log odds are plotted as a function of the trial index, where each color corresponds to a particular stimulus. The dashed line represents the reference value of log(π1/π0) = −log 5. Figure 2(a) demonstrates the high level of uncertainty in each trial; i.e., simply comparing the log odds with the reference at each trial would result in frequent errors. In contrast, Figure 2(b) shows that the accumulated log odds tended to increase for y = 1 (purple and yellow) and decrease for y = −1; thus, detection accuracy increased as N increased. For example, if one made a decision at N = 2, the blue and purple lines had the two largest values, and the former was a false positive. On the other hand, at N = 15, the yellow and purple lines stood out from the rest and resulted in correct detections.

4.3 Sequential Detection with Binary Hypotheses

A P300 wave was detected from a given EEG signal in response to a single stimulus. The number of trials was dynamically adjusted to meet the target error probabilities Pm and Pf. The feasibility of the sequential detection of P300 waves was tested experimentally by checking whether the measured Pm and Pf achieved their target values.

The measured values of Pm and Pf with binary hypotheses were consistent with the theoretical predictions. Figure 3 shows the results of SDBH for a range of β and a fixed α = 0.01 (the target Pm was varied while the target Pf was fixed at 0.01). In Figure 3(a), as β increased, the upper bound πH increased, indicating a stricter requirement for a sample to be declared positive. The decrease in the lower bound πL in Figure 3(a) was due to its dependence on β in Eq. 11. Figure 3(b) shows that the average number of trials (E[N]) of SDBH increased as β increased. Figure 3(c) shows the measured Pm (solid) compared with the target 1 − β (dashed). Figure 3(d) shows the measured Pf (solid) compared with the target α = 0.01 (dashed). Overall, the measured values of Pm and Pf were in good agreement with the target values. When β was small, SDBH obtained smaller values of Pm and Pf than the targets. When β ≈ 1, the performance slightly degraded from what the theory predicted.

This deviation for a large β can be interpreted as the effect of the limited number of trials (Nmax = 15). Sequential detection with Wald’s approximation assumes an unlimited number of samples (Nmax = ∞). However, in the given dataset, Nmax was limited to 15. As shown in Figure 3(b), E[N] ≈ 15 for β = 1, indicating that SDBH was forced to make a decision at N = 15, even though more samples were required. This lack of samples accounted for the degradation of Pm and Pf for a large β ≈ 1.

To compare the performance of SDBH with that of fixed-sample-size detection, the true positive rates and false positive rates of 100 test epochs are plotted on the receiver operating characteristic curve in Figure 4. The parameters of SDBH (α = 0.01, β = 0.93) were chosen to produce E[N] = 7, which matched the number of samples used for fixed-sample-size detection (Nmax = 7). Figure 4 shows that SDBH (blue circles) tended to have higher true positive rates and lower false positive rates than fixed-sample-size detection (black squares), indicating the superiority of SDBH.

4.4 Sequential Detection with Multiple Hypothesis Tests

Next, P300 waves were sequentially detected with multiple hypotheses. Each trial of the test data comprised EEG recordings for six row and six column stimuli. Two SDMHs were performed independently for the rows and columns to detect the row and the column with P300 waves. Correct detection was achieved when both the row and column estimates were correct. The probability of correct detection, PC, was measured over the 100 epochs of test data.

The PC and E[N] for sequential detection were compared with those of fixed-sample-size detection. Figure 5 shows the SDMH results for Subject A. As the decision boundary πH increased, E[N] increased and then saturated at 15 (Figure 5(a)). As πH increased to πH = 5, PC increased and reached the maximum value of 0.98 (Figure 5(b)). Above this point, PC slightly decreased and then stayed at 0.97. In Figure 5(c), PC is plotted as a function of the average number of trials (solid), compared to PC for a fixed sample size N (dashed). The former lay above the latter when E[N] ≥ 4, indicating that SDMH required fewer trials than fixed-sample-size detection to obtain the same PC value.

In Figure 6, SDMH for Subject B shows similar patterns to that of Subject A (Figure 5). As the decision threshold πH increased, E[N] and PC increased (Figures 6(a) and 6(b)). The PC of SDMH (solid) was greater than that of fixed-sample-size detection (dashed) for E[N] ≥ 4 (Figure 6(c)).

5. Discussion

Detecting a P300 wave from EEGs requires repeated measurements because of low signal-to-noise ratios, which results in a large latency in BCIs. Sequential detection overcomes this limitation by reducing the number of repeated trials to achieve the same accuracy. Specifically, the probabilities of P300 waves across trials are monitored, and a decision is made as soon as the probabilities meet predetermined criteria. During this procedure, probabilities of P300 waves are calculated using an ensemble of SVMs with simple post-processing.

Experimental results with the P300 speller data showed that sequential detection outperforms fixed-sample-size detection. The accuracy of SDBH was consistent with theoretical results. Sequential detection achieved lower probabilities of misses and false positives than the given target values. SDMH required fewer trials than fixed-sample-size detection to achieve the same detection accuracy.

For Subject A, SDMH achieved higher accuracy with fewer trials than fixed-sample-size detection. This implies that, in some cases, making an earlier decision and ignoring the remaining signals improves accuracy. This presumably results from the inherent stochasticity of neural activities in the brain or the subject’s wandering attention in later trials.

Therefore, adaptively adjusting the number of repeated trials using sequential detection is a promising solution for increasing BCI throughput without sacrificing accuracy.

Acknowledgements

This work was supported by Incheon National University Research Grant in 2017.

Conflict of Interest

No potential conflict of interest relevant to this article was reported.

(a) In the P300 speller, 36 characters were displayed in a 6×6 matrix. While a user focused attention on a target character, rows and columns were intensified in a random order, and corresponding EEGs were measured. Panel (a) shows an example when the third row was intensified. (b) When 15 repeated EEG measurements for the same stimulus were averaged (Subject A, channel Cz), the EEG corresponding to the target character (solid blue) had a larger amplitude 0.2–0.5 seconds after the stimulus onset than that for non-target characters (dashed red). This selective difference in an EEG is called the P300 wave.

When the ensemble of SVMs was used to estimate the log odds of the test data, the estimated log odds at each trial showed a high level of uncertainty (a), but the accumulated log odds πN enabled more accurate detection (b). In panel (a), the log odds of different stimuli are shown in different colors together with the reference (dashed line) given by the prior probabilities: log(π1/π0) = −log 5. In panel (b), the accumulated log odds πN of positive (y = 1) and negative (y = −1) samples became separable as N increased.

Sequential detection of P300 waves with binary hypotheses agreed well with theory. From the target probabilities Pm = P(ŷ = −1 | y = 1) = 1 − β and Pf = P(ŷ = 1 | y = −1) = α, the decision boundaries were determined according to Wald’s approximation in Eq. 11. When β increased with a fixed α = 0.01, πH increased but πL decreased (a). Consequently, the average number of trials used for sequential detection increased (b). The measured Pm (solid line in (c)) agreed with or was often lower than the target 1 − β (dashed line in (c)), and the measured Pf (solid line in (d)) stayed close to the target value of 0.01 (dashed line in (d)). Thus, the accuracy of sequential detection achieved the target probabilities of misses and false positives.

Sequential detection with multiple hypotheses required fewer repeated trials than detection with fixed trials (Subject A). As the decision boundary πH increased, the average number of trials E[N] and the probability of correct detection PC increased ((a) and (b)). In panel (c), PC of sequential detection is plotted as a function of E[N] (solid line) and compared with PC of fixed-sample-size detection (dashed line). To obtain a given PC > 0.6, sequential detection required fewer trials than detection with a fixed number of trials.

Sequential detection with multiple hypotheses for Subject B showed results similar to those for Subject A (Figure 5). As πH increased, the average number of trials E[N] and the probability of correct detection PC increased ((a) and (b)). On average, sequential detection (solid line) required fewer trials than fixed-sample-size detection (dashed line) to obtain the same PC > 0.7 (c).

Yongseok Yoo is an assistant professor in the Department of Electronics Engineering at Incheon National University. He received his Ph.D. degree in Electrical and Computer Engineering from the University of Texas at Austin. He received M.S. and B.S. degrees in Electrical Engineering from Seoul National University. He has worked for Samsung Advanced Institute of Technology and Electronics and Telecommunications Research Institute. His research interests include computational neuroscience and neural signal processing.