An open-access EEG dataset for speech decoding: Exploring the role of articulation and coarticulation

Kavli Affiliate: Terrence Sejnowski

| Authors: João Pedro Carvalho Moriera, Vinícius Rezende Carvalho, Eduardo Mazoni Andrade Marçal Mendes, Aria Fallah, Terrence J. Sejnowski, Claudia Lainscsek and Lindy B Comstock

| Summary:

Abstract Electroencephalography (EEG) holds promise for brain-computer interface (BCI) devices as a non-invasive measure of neural activity. With increased attention to EEG-based BCI systems, publicly available datasets that can represent the complex tasks required for naturalistic speech decoding are necessary to establish a common standard of performance within the BCI community. Effective solutions must overcome various kinds of noise in the EEG signal and remain reliable across sessions and subjects without overfitting to a specific dataset or task. We present two validated datasets (N=8 and N=16) for classification at the phoneme and word level and by the articulatory properties of phonemes. EEG signals were recorded from 64 channels while subjects listened to and repeated six consonants and five vowels. Individual phonemes were combined in different phonetic environments to produce coarticulated variation in forty consonant-vowel pairs, twenty real words, and twenty pseudowords. Phoneme pairs and words were presented during a control condition and during transcranial magnetic stimulation targeted to inhibit or augment the EEG signal associated with specific articulatory processes. Competing Interest Statement The authors have declared no competing interest.