Hybrid brain-computer interfaces (BCIs) have recently attracted much attention. In this study, we developed a hybrid BCI speller that simultaneously uses information from hand electromyography (EMG) and steady-state visual evoked potentials (SSVEP). This cross-modal design increases the number of targets and thereby enhances the information transfer rate (ITR). A 60-target hybrid BCI speller was built. A frame-based sampled sinusoidal stimulation method was used to generate the flickering stimuli on an LCD screen. The 60 targets were equally divided into 4 sections, and each section used the same frequency range. The EMG signal was used to distinguish the sections: subjects were required to make a fist 0, 1, 2, or 3 times when the target was shown in section 1, 2, 3, or 4, respectively. By extracting the envelope of the EMG signal and counting its peaks, we could determine which section the target was in. Canonical correlation analysis (CCA) was used to classify the SSVEP signal. The offline results showed that the ITR reached its maximum when the time window was set to 2 s. With a 2 s time window, the average classification accuracy of the proposed hybrid BCI system was 80.5% and the ITR was 83.2 bit/min, whereas the ITR was 32.7 bit/min for the EMG-only condition and 58.2 bit/min for the SSVEP-only condition, which shows that the hybrid system performed better than either single-modal system.

Hybrid BCIs combine multiple approaches in an effort to take advantage of the strengths that each BCI has on its own (Allison et al. 2010; Pfurtscheller et al. 2010a; Leeb et al. 2011; Lalitharatne et al. 2013; Amiri et al. 2013; Xu et al. 2013; Yin et al. 2013). Generally, two kinds of BCIs can be fused to form a hybrid BCI. For example, an SSVEP-motor imagery hybrid BCI combined information from SSVEP and motor imagery to enhance classification accuracy (Allison et al. 2010). An SSVEP-motor imagery hybrid BCI can also be used for orthosis control (Pfurtscheller et al. 2010b). Another kind of hybrid BCI is the P300-SSVEP hybrid BCI. After target stimuli, the SSVEP was suppressed and replaced by P300 potentials, a phenomenon called SSVEP blocking. By using SSVEP blocking, the hybrid speller achieved higher accuracy and ITR than a P300 speller on its own (Xu et al. 2013). SSVEP stimuli could also be superimposed onto the P300 stimuli to increase the difference among targets; this kind of hybrid BCI system also enhanced the accuracy and ITR significantly (Yin et al. 2013). A P300-motor imagery hybrid BCI is another possible combination (Rebsamen et al. 2008). P300 is suitable for discrete control applications and motor imagery is often used for continuous control, so the combination of these two types of BCI systems could support more complicated and practical applications.

Another type of hybrid BCI system combines one BCI system with another system based on other physiological signals such as the electromyogram (EMG). Although it is debatable whether this type of system should be called a hybrid BCI, it allows disabled people to use all of their residual functions and can enhance the performance of the system. Thus, more and more researchers support this kind of hybrid BCI for practical use (Nijholt et al. 2011; Amiri et al. 2013; Lalitharatne et al. 2013). An EMG-motor imagery hybrid BCI was developed to achieve better and more stable performance than either single condition (Leeb et al. 2011). Subjects were asked to move their left or right hand. The results showed that the accuracy of the hybrid system, with different kinds of fusion methods, was higher than that of the single modalities. The EMG-P300 hybrid system is another kind of EMG-based hybrid BCI system. In (Holz et al. 2013), researchers used the EMG signal to cancel spelling errors that occurred when using a P300-based speller. The efficiency of the hybrid BCI system was evaluated in terms of time for selection, percentage of errors, and user frustration. The results illustrated that the hybrid system improved the performance in all three aspects.

In this study, we designed a hybrid BCI speller that combines information from the hand EMG signal and SSVEP. The main advantages of SSVEP compared to other BCI systems are its high signal-to-noise ratio (SNR), little user training, and high information transfer rate (ITR) (Gao et al. 2003; Wang et al. 2010; Bin et al. 2009; Chen et al. 2013). However, SSVEP shows a good response only within a limited frequency range, which limits the number of targets. Researchers have tried several methods to increase the number of targets, such as phase tagging and using intermodulation frequencies (Jia et al. 2011; Pan et al. 2011; Chen et al. 2013). In this study, we used the EMG signal to increase the number of targets. This combination is advantageous because both modalities can be recognized within a fairly short time and the interaction between them is negligible. Some researchers use different gestures to represent different commands (Chen et al. 2007; Zhang et al. 2009). However, these methods required training before use, and the extracted features were not stable because of muscle fatigue and variation in gesture strength. We therefore used features from the EMG envelope, namely the number of times a single gesture was repeated, to represent different commands. All of the targets were equally divided into 4 sections, and each section had the same frequency range. When a target was shown in a particular section, subjects were required to stare at the target and simultaneously make a fist the number of times corresponding to that section. By counting the peaks in the EMG envelope, we could deduce which section the target was in. Canonical correlation analysis (CCA) was used to classify the SSVEP signal in order to determine which target in that section the subject was focusing on. Offline studies were conducted with 10 subjects to investigate the feasibility of our hybrid method and to determine the optimal parameters for future online studies.

A. Subjects

Ten healthy subjects (four males and six females; mean age 25.6 ± 2.55 years) volunteered to participate in the experiment. The number of subjects was comparable to that of other relevant studies (Holz et al. 2013; Xu et al. 2013; Hwang et al. 2012; Volosyak 2011). All subjects had normal or corrected-to-normal vision. Each subject signed an informed consent form prior to the experiment and was paid for participation.

B. Data acquisition

EEG and EMG data were simultaneously recorded using a Neuroscan system (Neuroscan Inc.) with a sampling rate of 1000 Hz. Nine EEG electrode sites (Pz, POz, PO3, PO5, PO4, PO6, O1, Oz, O2) were selected because of their higher SSVEP response (Chen et al. 2014). EEG electrodes were placed according to the 10–20 system and the reference electrode was located at the vertex. EEG electrode impedances were kept below 10 kΩ. Two-channel forearm EMG signals were recorded at the same time. Figure 1 illustrates the placement of the two EMG electrodes. One electrode was used for re-referencing, while the other was used to obtain the EMG signal. EMG electrode impedances were kept below 50 kΩ. During the recording, the subjects were seated on a comfortable chair in a quiet room.

A total of sixty (6 × 10) targets were presented on the screen during the experiment. A typing window was located at the top of the screen. Each target was presented within a 100 × 100 pixel square and contained a particular character (26 English letters, 10 digits, 18 punctuation marks, and 6 operators). The targets were arranged into 4 sections, with each section containing 15 targets. The distance between adjacent targets in the same section was 50 pixels. The distance between targets that were horizontally or vertically adjacent but in different sections was 260 pixels and 100 pixels, respectively. The flicker frequencies of the targets in one section ranged from 6 Hz to 11.6 Hz with a frequency interval of 0.4 Hz, and all sections used the same frequency range. Previous studies have reported that SSVEP responses can be clearly observed in this frequency range (Gao et al. 2003; Wang et al. 2010). We designed the frequency arrangement shown in Figure 2 so that adjacent targets could be classified more effectively. We used a sampled sinusoidal stimulation method to present the flickering visual stimuli (Chen et al. 2014). Given a screen refresh rate F, the stimulus luminance of a target with flicker frequency f in the ith frame was

$$s(f, i) = \frac{1}{2}\left\{1 + \sin\left[2\pi f\left(\frac{i}{F}\right)\right]\right\} \qquad (1)$$
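The sampled sinusoidal stimulation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 60 Hz refresh rate, the luminance scale of 0 (dark) to 1 (brightest), and all function names are hypothetical and are not taken from the study.

```python
import math


def stimulus_frequencies(start_hz=6.0, step_hz=0.4, count=15):
    """Flicker frequencies used within one section: 6.0, 6.4, ..., 11.6 Hz."""
    return [round(start_hz + k * step_hz, 1) for k in range(count)]


def frame_luminance(freq_hz, frame_index, refresh_rate_hz=60.0):
    """Luminance in [0, 1] of a target in a given frame.

    Samples a sinusoid of frequency `freq_hz` at the frame times i / F.
    The 60 Hz default refresh rate is an assumption made for this sketch.
    """
    phase = 2.0 * math.pi * freq_hz * frame_index / refresh_rate_hz
    return 0.5 * (1.0 + math.sin(phase))


def luminance_sequence(freq_hz, duration_s, refresh_rate_hz=60.0):
    """Per-frame luminance values for one stimulus period."""
    n_frames = int(round(duration_s * refresh_rate_hz))
    return [frame_luminance(freq_hz, i, refresh_rate_hz) for i in range(n_frames)]


if __name__ == "__main__":
    freqs = stimulus_frequencies()
    print(freqs)
    # First few frames of the 6 Hz target during a 5 s stimulus period.
    print([round(v, 3) for v in luminance_sequence(freqs[0], 5.0)[:5]])
```

Because the luminance is computed per frame rather than by toggling the target on and off every fixed number of frames, frequencies that do not divide the refresh rate evenly (such as 6.4 Hz) can still be rendered.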

The offline experiment contained 2 blocks of 60 6-s trials. All targets flickered at their assigned frequencies during the 5 s stimulus period. A red triangle below one target indicated the target that the subject needed to stare at. The cued target was selected randomly and each target was chosen only once. During the flickering period, subjects were also required to make a fist a given number of times as fast as possible. The number of repetitions for section 1 to section 4 was 0, 1, 2, and 3, respectively. Different numbers of repetitions lead to different EMG envelopes, so the EMG data indicated which section the subject was staring at, and the SSVEP data were used to classify the targets within that section. The subject could then rest for 1 s before the next trial began. During the rest period, the target that the subject was required to stare at turned blue to indicate the next target. The offline session allowed us to optimize the stimulus time to achieve the best performance.

D. Data analysis

A mean filtering algorithm was used to classify the EMG data (Lin et al. 2014). First, the EMG reference was subtracted from the signal recorded at EMG electrode 1 in order to eliminate the ECG noise. Then a 5 Hz highpass FIR filter was applied to the EMG data to remove the baseline shift. The mean filtering algorithm is given by equation (2),

$$E(t) = \frac{1}{W}\sum_{n=t-W+1}^{t} \left|x(n)\right| \qquad (2)$$

where $E(t)$ refers to the value of the envelope at sample point $t$, $x(n)$ refers to the value of the EMG signal at sample point $n$, and the window size $W = 100$. A second mean filter was then applied to the envelope to further smooth the curve. Afterwards, a threshold method was used to count the peaks of the envelope signal. A peak started when the envelope signal rose above the threshold and continued until it fell below the threshold. The threshold was set to half of the maximum value of the envelope. Peaks shorter than 50 sample points were rejected to eliminate the interference of noise. The number of gestures was then taken to be the number of peaks. When the whole envelope curve was below a given threshold, the number of gesture repetitions was zero. Figure 3 illustrates the raw EMG signal, the envelopes of the signal, and the threshold method.

Figure 3

Illustration of the raw EMG signal, the envelope of the EMG signal, and the threshold method. The number of repetitions was 3 in the figure. By subtracting the EMG reference channel from the EMG electrode channel, the ECG artifact could be eliminated. The envelope of the EMG data was then obtained with a mean filter. The number of peaks in the EMG envelope was 3, which was equal to the number of repetitions, so the section containing the target could be identified.
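The envelope and peak-counting steps can be sketched as follows. This is a minimal illustration, not the study's implementation: it assumes a causal moving average of the rectified signal, skips the second smoothing filter, and uses a hypothetical `rest_level` for the "whole envelope below a given threshold" rule, whose value the text does not state. The window of 100 samples, the half-maximum threshold, and the 50-sample minimum peak length follow the description above. `synthetic_emg` is a made-up test signal.

```python
def moving_average(values, window):
    """Causal mean filter: mean of the last `window` samples (fewer at the start)."""
    out, running = [], 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the sample that left the window
        out.append(running / min(i + 1, window))
    return out


def count_envelope_peaks(emg, window=100, min_len=50, rest_level=0.1):
    """Estimate the number of fist clenches from one re-referenced EMG trial."""
    # Envelope: mean filter applied to the rectified signal.
    envelope = moving_average([abs(v) for v in emg], window)
    peak = max(envelope)
    if peak < rest_level:  # assumed absolute level: no clench in this trial
        return 0
    threshold = 0.5 * peak  # half of the maximum envelope value
    count, run = 0, 0
    for v in envelope:
        if v > threshold:
            run += 1
        else:
            if run >= min_len:  # reject peaks shorter than min_len samples
                count += 1
            run = 0
    if run >= min_len:  # a peak that is still active at the end of the trial
        count += 1
    return count


def synthetic_emg(bursts, n_samples=2000, burst_amp=1.0, rest_amp=0.01):
    """Hypothetical test signal: alternating-sign samples, large inside bursts."""
    signal = []
    for i in range(n_samples):
        amp = burst_amp if any(a <= i < b for a, b in bursts) else rest_amp
        signal.append(amp if i % 2 == 0 else -amp)
    return signal


if __name__ == "__main__":
    trial = synthetic_emg([(200, 500), (800, 1100), (1400, 1700)])
    print(count_envelope_peaks(trial))  # three bursts, so section 4
```

Under the protocol described above, the returned count (0 to 3) maps directly to section 1 to 4.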

To classify the EEG data, we used canonical correlation analysis (CCA) (Lin et al. 2006; Bin et al. 2009; Chen et al. 2014). CCA was implemented using the canoncorr function in Matlab. The reference signals were composed of sine and cosine pairs at the stimulus frequency and at its second and third harmonics.
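As an illustration of the reference signals only, the sketch below builds the six reference rows (sine and cosine at the fundamental and at the second and third harmonics) for one stimulus frequency. The function name and argument names are hypothetical; the canonical correlation itself (computed with canoncorr in the study) is not reproduced here.

```python
import math


def cca_reference(freq_hz, n_samples, fs_hz=1000.0, n_harmonics=3):
    """Reference signals for one stimulus frequency.

    Returns 2 * n_harmonics rows, ordered
    [sin(f), cos(f), sin(2f), cos(2f), sin(3f), cos(3f)],
    each sampled at fs_hz for n_samples points.
    """
    rows = []
    for h in range(1, n_harmonics + 1):
        w = 2.0 * math.pi * h * freq_hz / fs_hz  # radians per sample
        rows.append([math.sin(w * n) for n in range(n_samples)])
        rows.append([math.cos(w * n) for n in range(n_samples)])
    return rows


if __name__ == "__main__":
    # One reference set per candidate frequency for a 2 s window at 1000 Hz.
    candidates = [round(6.0 + 0.4 * k, 1) for k in range(15)]
    references = {f: cca_reference(f, n_samples=2000) for f in candidates}
    print(len(references), len(references[6.0]), len(references[6.0][0]))
```

In a CCA-based classifier, the multichannel EEG segment would be correlated with each of these reference sets, and the frequency giving the largest canonical correlation would be taken as the target.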

In the offline experiment, the raw EEG data were segmented into 5 s segments. Then a 1–40 Hz bandpass FIR filter was used to eliminate low- and high-frequency noise. The SSVEP response at each stimulus frequency was estimated by calculating the frequency spectrum with the fft function. The classification accuracy was calculated in each block for each subject. The ITR was then calculated to evaluate the system performance using the method defined by Wolpaw et al. (Wolpaw et al. 1998; Yuan et al. 2013). The ITR calculation is given by equation (3),

$$\mathrm{ITR} = \left[\log_2 N + P\log_2 P + (1-P)\log_2\left(\frac{1-P}{N-1}\right)\right] \times \frac{60}{T} \qquad (3)$$

where $N$ is the number of targets, $P$ is the mean accuracy averaged over all targets, and $T$ (seconds/target) is the time for a selection. $T$ contained two parts, gaze time and rest time; in this study, the rest time was 1 s. To evaluate the performance of each of the two single-modal systems, we also calculated the classification accuracy and ITR for each modality. The EEG data were ignored in the EMG accuracy calculation: if the gesture number determined from the EMG signal matched the corresponding section, the trial was counted as correct. In the SSVEP accuracy calculation, a trial was a success if the identified target had the same frequency as the actual target, regardless of which section the identified target was in. In the hybrid accuracy calculation, a trial was considered a success only if the identified target was exactly the same as the real target in that trial. N in equation (3) was 4 for the EMG-only modality, 15 for the SSVEP-only modality, and 60 for the hybrid system. To investigate the influence of the time window length on system performance, accuracy and ITR were calculated separately for epoch lengths from 0.5 s to 5 s in steps of 0.5 s. A two-tailed t-test was then conducted on the ITR values to test whether the hybrid system performed better. Lastly, the classification accuracy for each EMG command was calculated and a two-tailed t-test was done for each pair of EMG commands. The same procedure was also applied to the SSVEP classification.
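Equation (3) is the standard Wolpaw ITR formula, and a short Python sketch makes the arithmetic concrete. The function name is hypothetical; the selection time of 3 s in the usage example assumes a 2 s gaze window plus the 1 s rest time described above.

```python
import math


def itr_bits_per_min(n_targets, accuracy, selection_time_s):
    """Wolpaw information transfer rate in bit/min.

    n_targets        -- number of selectable targets (N)
    accuracy         -- classification accuracy as a fraction in [0, 1] (P)
    selection_time_s -- time per selection in seconds (T), gaze plus rest
    """
    if n_targets < 2:
        raise ValueError("n_targets must be at least 2")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    bits = math.log2(n_targets)
    if accuracy > 0.0:
        bits += accuracy * math.log2(accuracy)
    if accuracy < 1.0:
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_targets - 1))
    return bits * 60.0 / selection_time_s


if __name__ == "__main__":
    # 60 targets, 2 s gaze + 1 s rest.
    print(round(itr_bits_per_min(60, 1.0, 3.0), 1))    # perfect accuracy
    print(round(itr_bits_per_min(60, 0.875, 3.0), 1))  # 87.5% accuracy
```

With 60 targets and a 3 s selection time, perfect accuracy gives about 118.1 bit/min and 87.5% accuracy gives about 92.6 bit/min.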

Figure 4 shows the frequency spectra of one representative subject for each target in section 1 at Oz electrode site with a frequency resolution of 0.2 Hz in one block. Similar results were observed in other sections, other blocks, and other subjects. The SSVEP response of each target could be clearly seen as peaks at the corresponding frequencies and harmonics, which suggested that the system we designed could evoke SSVEP and classify different targets with different flickering frequencies effectively.

Figure 4

Frequency spectra of signals recorded at Oz electrode of a single block on one representative subject. The red circles stand for the corresponding main frequencies.
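A 0.2 Hz frequency resolution corresponds to a 5 s segment, since the resolution of a discrete Fourier transform is the reciprocal of the segment duration. The sketch below evaluates the spectrum amplitude at a single frequency by direct projection, which is equivalent to reading one FFT bin when the frequency falls on the bin grid. The function names, the 250 Hz sampling rate, and the synthetic 8.4 Hz test signal are illustrative assumptions, not the study's analysis code.

```python
import math


def frequency_resolution_hz(duration_s):
    """Spacing between DFT bins for a segment of the given duration."""
    return 1.0 / duration_s


def dft_amplitude(signal, freq_hz, fs_hz):
    """Amplitude of the sinusoidal component of `signal` at `freq_hz`."""
    n = len(signal)
    re = sum(x * math.cos(2.0 * math.pi * freq_hz * k / fs_hz)
             for k, x in enumerate(signal))
    im = sum(x * math.sin(2.0 * math.pi * freq_hz * k / fs_hz)
             for k, x in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n


if __name__ == "__main__":
    fs, duration = 250.0, 5.0
    n = int(fs * duration)
    # Synthetic "SSVEP": a pure 8.4 Hz sinusoid with amplitude 3.
    sig = [3.0 * math.sin(2.0 * math.pi * 8.4 * k / fs) for k in range(n)]
    print(frequency_resolution_hz(duration))
    for f in (8.0, 8.4, 8.8):
        print(f, round(dft_amplitude(sig, f, fs), 6))
```

Because the stimulus frequencies are spaced 0.4 Hz apart, every one of them falls exactly on a bin of the 0.2 Hz grid, so neighbouring targets do not leak into each other's bins in this idealized case.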

Figure 5 illustrates the classification accuracy and ITR as a function of the length of the time window. The classification accuracy increased as the time window grew longer until it reached a plateau. The classification accuracy of the hybrid system was lower than that of the other two conditions because a trial was classified correctly in the hybrid system only when both the EMG and the SSVEP classifications were correct. The accuracy of all three curves increased much more slowly after 2 s. The EMG curve was especially flat after 2 s because the hand movement was almost always completed within 2 s. As shown in Table 1, with a time window of 2 s, the mean classification accuracy was 80.8%, 85.6%, and 94.8% for the hybrid system, the SSVEP-only modality, and the EMG-only modality, respectively. The ITR also reached its largest value with a time window of 2 s. The ITR of the hybrid system was significantly higher than that of either single modality despite the lower classification accuracy, because the number of targets was much larger (hybrid and EMG: t(19) = 10.25, p < 0.001; hybrid and SSVEP: t(19) = 10.84, p < 0.001). The mean ITRs for the hybrid, SSVEP-only, and EMG-only conditions were 83.7 bit/min, 58.0 bit/min, and 33.1 bit/min, respectively. Based on the offline result, we chose a 2 s time window as the stimulus time for the future online experiment.
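The relation between the hybrid accuracy and the two single-modality accuracies can be checked with a one-line calculation. The sketch below assumes that EMG and SSVEP errors are statistically independent; that assumption is introduced here for illustration and is not tested in the study. The function name is hypothetical.

```python
def expected_joint_accuracy(emg_accuracy, ssvep_accuracy):
    """Probability that both classifiers are correct, assuming independent errors."""
    return emg_accuracy * ssvep_accuracy


if __name__ == "__main__":
    # Mean accuracies at a 2 s window: EMG only 94.8%, SSVEP only 85.6%.
    joint = expected_joint_accuracy(0.948, 0.856)
    print(round(100.0 * joint, 1))  # compare with the reported hybrid accuracy of 80.8%
```

The product of the two mean accuracies is about 81.1%, close to the reported 80.8%, which is what one would expect if the two modalities interfere little with each other.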

Figure 5

Relationship between the length of the time window and the classification accuracy and ITR. Error bars represented standard errors of the means.

Table 1

Accuracy and ITR for each subject with a 2 s time window

| Subject | Accuracy (%): Hybrid | Accuracy (%): EMG only | Accuracy (%): SSVEP only | ITR (bit/min): Hybrid | ITR (bit/min): EMG only | ITR (bit/min): SSVEP only |
|---|---|---|---|---|---|---|
| S1 | 87.5 | 88.3 | 99.2 | 92.6 | 26.0 | 76.3 |
| S2 | 100 | 100 | 100 | 118.1 | 40.0 | 78.1 |
| S3 | 64.2 | 94.2 | 68.3 | 57.5 | 31.9 | 36.7 |
| S4 | 81.7 | 99.2 | 82.5 | 82.9 | 38.5 | 51.5 |
| S5 | 85.8 | 99.2 | 86.7 | 89.7 | 38.5 | 56.7 |
| S6 | 81.7 | 87.5 | 94.2 | 83.1 | 25.2 | 67.4 |
| S7 | 76.7 | 96.7 | 80.0 | 75.1 | 34.7 | 48.6 |
| S8 | 87.5 | 95.8 | 91.7 | 92.8 | 33.9 | 63.6 |
| S9 | 46.7 | 90.0 | 53.3 | 35.4 | 27.9 | 22.7 |
| S10 | 96.7 | 96.7 | 100 | 110.1 | 34.9 | 78.1 |
| Average (mean ± SEM) | 80.8 ± 15.6 | 94.8 ± 4.6 | 85.6 ± 15.2 | 83.7 ± 24.0 | 33.1 ± 5.3 | 58.0 ± 18.5 |

Figure 6 illustrates the classification accuracy for each EMG command (number of gesture repetitions) and each SSVEP frequency with a time window of 2 s. For the EMG modality, the classification accuracies were 98.3%, 95.7%, 94.3%, and 90.7% for 0, 1, 2, and 3 gesture repetitions, respectively. The classification accuracy declined as the number of repetitions increased, because 2 s might not be enough for some subjects to complete the required movement. Another reason for the decline was that some low peaks of the EMG envelope might not have been recognized because the strength of the gesture was not consistent across repetitions. Among all pairs of repetition counts, only the difference between 0 and 3 repetitions was statistically significant (t(9) = 3.28, p < 0.01). For the SSVEP modality, the classification accuracy exceeded 80% for all 15 frequencies except 6.4 Hz (72.5%). These results show that a system with a 2 s time window is practical for future online testing.

Figure 6

The classification accuracy for (A) each EMG command (number of gesture repetitions) and (B) each SSVEP frequency with a time window of 2 s. The dashed line represents the chance level. * indicates significance at p < 0.05.

In this study, we proposed a novel hybrid BCI speller. From the results of the offline experiment, we demonstrated the feasibility of this hybrid BCI system and obtained the optimal length of the time window for future online experiment.

A. Feasibility of this hybrid BCI system

A hybrid BCI system was proposed to enhance the system performance. Besides combining two types of BCI approaches, such as SSVEP, P300, and motor imagery, a cross-modal BCI system that combines a BCI with another kind of physiological signal such as EMG is also a practical method, by which patients can use their remaining muscular function to improve the BCI system (Leeb et al. 2011; Holz et al. 2013). In this study, the hybrid system we proposed achieved a significantly higher ITR than either of its single-modal systems. One might argue that this hybrid BCI is not applicable to paralyzed patients who have totally lost control of their hands, but some of these patients might still be able to control their facial muscles. The method could then still be applied by replacing fist clenching with teeth gritting. This hybrid BCI speller may be well suited to Parkinson’s patients, because they might have lost the fine motor ability needed to use a real keyboard but can still make a fist several times easily. Moreover, anyone could use this hybrid BCI system for entertainment or under conditions where a keyboard is not available.

B. Information transfer rate

According to formula (3), there are three ways to obtain a high ITR: (1) increasing the number of targets, (2) improving the accuracy of target selection, and (3) decreasing the time needed to recognize each target. Compared to a single-modal BCI system, the hybrid BCI system we designed enlarges the number of targets with a small sacrifice in accuracy, and thereby increases the ITR. The total number of targets was 60, which was higher than in other studies. The actual ITR with a 2 s time window in the offline experiment was 83.7 bit/min, whereas the theoretical maximum ITR of the hybrid speller at an accuracy of 100% was 118.1 bit/min. In future work, the classification accuracy of SSVEP and EMG should be enhanced to further increase the ITR. To increase the accuracy of SSVEP, the stimulus frequencies should be chosen more carefully and the area of each target might be enlarged. To increase the accuracy of EMG classification, a more robust classification algorithm should be employed. On the other hand, the time needed to recognize each target was not very short in this study. Formula (3) shows that recognition time plays a more important role than the number of targets. Even though the number of targets was relatively high in this study, the longer recognition time kept the ITR below that reported in (Chen et al. 2014). The recognition time for one target included two parts, stimulus time and rest time. A stimulus time of 2 s gave the best offline result, so only the rest time can be reduced in this study. If the rest time was decreased to 0.3 s and the classification accuracy remained the same, the ITR would reach 118.7 bit/min. However, a short rest time might lead to user fatigue more easily and would not be practical in real use, so the rest time was still set to 1 s in this study.

C. Comparison with other hybrid BCI systems and BCI spellers

In (Xu et al. 2013; Yin et al. 2013), researchers presented hybrid BCIs based on P300 and SSVEP. The ITRs in these studies were 34.2 bit/min and 56.4 bit/min, respectively. Compared to these hybrid BCI systems, our system achieved a much higher ITR of 83.7 bit/min. Another type of hybrid BCI system is based on SSVEP and motor imagery. In (Allison et al. 2010), the mean classification accuracy of the hybrid system with two targets was only 81%, while our system achieved a classification accuracy of 80.8% with 60 targets, far more targets than in the SSVEP and motor-imagery hybrid BCI system.

In (Leeb et al. 2011), a motor-imagery and EMG hybrid system was presented. The result showed that their method could enhance the classification accuracy compared to the motor-imagery system and the EMG system on their own. However, the mean accuracy for EMG activity alone was 87% and the fusion approach had only a slightly higher classification accuracy (91%), which suggests that their hybrid system relied mainly on EMG and that EEG did not have much influence. Another EMG-based hybrid system was shown in (Holz et al. 2013). In their P300-EMG hybrid speller, EMG was used only to correct spelling errors. The result showed that the performance (expressed as time for selection and number of errors) was enhanced compared to the non-hybrid speller. However, EMG represented only one target in that study, so it could enhance the system performance only slightly. Compared to these two EMG-based hybrid BCI systems, the method we proposed has a higher ITR. The fusion method we used was effective and could enlarge the number of targets significantly. Moreover, P300-based and motor-imagery-based hybrid systems require training sessions, which take additional time and may fatigue users more easily. Furthermore, these systems require significant mental effort, which may also aggravate users’ fatigue. Other gesture-based EMG recognition methods need training sessions before each test. In this study, however, neither the EMG portion nor the SSVEP portion required training sessions, and the system could evoke a strong SSVEP response without much effort from the user.

In previous studies, several BCI spellers using a single BCI modality were introduced. BCI spellers based on P300 and motor imagery required training before use and great mental effort to achieve the intended goal. These types of spellers cannot obtain a high ITR. SSVEP-based BCI spellers can reach a relatively higher ITR, but the number of targets is limited by the usable frequency range. In (Hwang et al. 2012), an SSVEP-based BCI speller with 30 targets was introduced, and a mean accuracy of 87.58% and an ITR of 40.72 bit/min for 6 subjects were reported. In (Volosyak 2011), the mean ITR of 7 participants using an SSVEP-based BCI speller with 5 targets was 61.7 bit/min. Our system has more targets and a higher ITR than these systems.

In this study, we designed a hybrid BCI speller based on the EMG envelope and SSVEP. All targets were divided into 4 sections; EMG was used to classify which section the target was in, and SSVEP was used to classify the particular target within the section. The offline results obtained from ten healthy volunteers confirmed that the targets of the hybrid BCI speller could be classified effectively enough for a practical BCI system. Specifically, the offline results showed that an average classification accuracy of 80.5% and an information transfer rate of 83.2 bit/min were achieved using the proposed hybrid BCI system, whereas the ITR was 32.7 bit/min for the EMG-only condition and 58.2 bit/min for the SSVEP-only condition. The hybrid system therefore performed better than either single-modal system.

Acknowledgment

This work was supported by Huawei Technologies Co., Ltd., National Basic Research Program (973) of China (No. 2011CB933204), National Natural Science Foundation of China under Grant 90820304, 91120007, Chinese 863 Project: 2012AA011601.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

KL participated in the design of the study, performed in the data collection, performed in the data analysis, and drafted the manuscript. XC and XH participated in the modification of the study and revised the manuscript critically. QD participated in the acquisition of funding and the modification of the study. XG participated in the design of the study, revised the manuscript, and supervised the research group. All authors read and approved the final manuscript.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.