Neural decoding of sentences using synchronization between EEG and speech rhythm

Hiroki Watanabe (1661021)


While brain-computer interfaces (BCIs) enable users to spell letters on a monitor using brain activity, existing spelling systems require several seconds to select a single letter. In contrast, because neural decoding of speech predicts speech information directly from brain activity, it enables real-time communication with an interlocutor. Such decoding, however, remains at the basic-research stage because accuracy is insufficient even with a limited number of classes.

With the aim of bringing neural decoding of speech closer to practical use, this thesis proposes electroencephalogram (EEG)-based sentence classification methods. In existing neural decoding of speech, constructing higher linguistic units (e.g., phrases and sentences) by combining lower ones (e.g., words) is highly difficult because accuracy in multinomial classification is insufficient. However, expressing intentions using only lower linguistic units is prone to misinterpretation (e.g., given "water" alone, it is difficult to tell whether the user wants to drink water or to water plants). Brain-based sentence classification, in contrast, improves practicality because sentences are less ambiguous (e.g., "I would like to drink water" instead of "water").

The sentence decoding in this thesis is based on EEG phase information, because previous magnetoencephalography (MEG) research showed that template matching could classify three sentences using theta-band (4-8 Hz) phase patterns. This classification method exploits phase synchronization during speech perception: theta oscillations entrain to the speech envelope, so that stimulus-specific phase patterns are induced.
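To make the template-matching idea concrete, the sketch below (an illustration, not code from the thesis) band-passes a signal into the theta range, extracts the instantaneous phase with the Hilbert transform, and assigns a trial to the sentence template with the highest circular phase similarity. Function names, the filter order, and the similarity measure are assumptions made for illustration.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def theta_phase_pattern(eeg, fs, band=(4.0, 8.0)):
    """Band-pass the signal into the theta range and return its
    instantaneous phase via the analytic signal (Hilbert transform)."""
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, eeg, axis=-1)
    return np.angle(hilbert(filtered, axis=-1))

def classify_by_template(trial_phase, templates):
    """Assign the trial to the template with the highest mean circular
    similarity (mean cosine of the phase difference)."""
    sims = [np.mean(np.cos(trial_phase - t)) for t in templates]
    return int(np.argmax(sims))
```

Because phase is a circular quantity, the similarity is computed as the mean cosine of the phase difference rather than a Euclidean distance on raw angles.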

In Research 1, I extended the previous MEG-based classification by (1-1) using EEG, which is better suited to practical BCIs, and (1-2) evaluating subject-independent models. To improve accuracy, I proposed (1-3) four classifiers: template matching (baseline), logistic regression, support vector machine, and random forest, and (1-4) novel features, including phase patterns in a higher frequency band, with the aim of exploiting phase synchronization in the low-gamma band.
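One way to feed multi-band phase patterns into a trainable classifier is sketched below; this is a minimal illustration under my own assumptions (the band edges, cos/sin phase encoding, and scikit-learn logistic regression are not taken from the thesis).

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert
from sklearn.linear_model import LogisticRegression

def phase_features(eeg, fs, bands=((4.0, 8.0), (30.0, 45.0))):
    """Concatenate cos/sin of the instantaneous phase from each band
    (e.g., theta and low gamma). Encoding each angle as (cos, sin)
    avoids the wrap-around discontinuity of raw phase angles."""
    feats = []
    for lo, hi in bands:
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        phase = np.angle(hilbert(filtfilt(b, a, eeg)))
        feats.append(np.cos(phase))
        feats.append(np.sin(phase))
    return np.concatenate(feats)

def train_sentence_classifier(trials, labels, fs):
    """Fit a logistic regression classifier on multi-band phase features."""
    X = np.array([phase_features(tr, fs) for tr in trials])
    return LogisticRegression(max_iter=1000).fit(X, labels)
```

The same feature matrix could equally be passed to a support vector machine or random forest, matching the four classifiers compared in Research 1.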

Research 2 applied the method of Research 1 to imagined sentence classification. Because the modulation of neural dynamics during imagined speech remains unclear, (2-1) I first investigated in Experiment 2 whether neural oscillations during articulatory imagination of speech synchronize with the imagined speech rhythm, using nonsense speech materials. Because the acoustics of imagined speech are not observable, I substituted the corresponding overt speech: I regressed overt nonsense speech envelopes from EEG recorded while participants imagined the articulatory movements of the nonsense speech, and calculated the correlation between the EEG-based regressed envelope and the actual speech envelope. (2-2) To assess classification performance based on this synchronization, a logistic regression classifier was trained on the phase angles of the EEG-based regressed envelope. (3-1) Experiment 3 extended the classification method to real, meaningful sentences decoded from EEG during imagined speech.
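The envelope-regression step can be sketched as a simple lagged linear decoder: stack time-lagged copies of the EEG as regressors, fit least-squares weights against the overt speech envelope, and score reconstruction by Pearson correlation. This is a minimal sketch under that assumption; the thesis's actual regression model and lag range may differ.

```python
import numpy as np

def lagged_design(eeg, n_lags):
    """Stack time-lagged copies of the EEG (channels x time) so each
    time point is predicted from the preceding n_lags samples."""
    n_ch, n_t = eeg.shape
    X = np.zeros((n_t, n_ch * n_lags))
    for lag in range(n_lags):
        X[lag:, lag * n_ch:(lag + 1) * n_ch] = eeg[:, :n_t - lag].T
    return X

def regress_envelope(eeg, envelope, n_lags=10):
    """Least-squares reconstruction of the speech envelope from lagged EEG."""
    X = lagged_design(eeg, n_lags)
    w, *_ = np.linalg.lstsq(X, envelope, rcond=None)
    return X @ w

def envelope_correlation(pred, env):
    """Pearson correlation between the regressed and actual envelopes."""
    return np.corrcoef(pred, env)[0, 1]
```

In practice the decoder would be fit on training segments and the correlation evaluated on held-out data; the phase angles of the regressed envelope then serve as classifier input, as in (2-2).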

Research 1 showed that while most classifiers performed similarly, the proposed features significantly improved accuracy compared with the baseline feature (best: 55.9%). In Research 2, Experiment 2 demonstrated a significant correlation between the EEG-based regressed envelope and the overt speech envelope, indicating phase synchronization between EEG during imagined speech and its overt counterpart. The average accuracy of imagined sentence classification reached 48.7%. In Experiment 3, the maximum classification accuracy across the two participants was 51.2%.

This thesis advances neural decoding of speech by revealing that (a) phase patterns in higher frequency bands (>30 Hz) improve classification performance, (b) the classification models can be applied to unseen users without pre-training, (c) phase synchronization also emerges during imagined speech, and (d) this synchronization provides useful features for classifying imagined sentences.