Blind Spatial Subtraction Array for Hands-Free Speech Recognition

Yu Takahashi (0551076)

In this thesis, I propose a novel blind spatial subtraction array (BSSA) to realize hands-free speech recognition. Hands-free speech recognition is essential for a natural human-machine interface. However, accurate recognition is difficult to achieve because noise and room reverberation always degrade speech quality in such a system, so this interference must be suppressed. In general, the spatial subtraction array (SSA) and blind source separation based on independent component analysis (ICA) are the most popular noise reduction approaches for hands-free speech recognition.

First, I point out the problems of the conventional noise reduction methods, SSA and ICA. The conventional SSA includes a noise estimation part based on a null beamformer (NBF), but the NBF always suffers from microphone-element errors and room reverberation in real environments. As a result, the noise estimation performance degrades, and the overall performance degrades with it. As for ICA, my preliminary experiment reveals that the conventional ICA is better at estimating the noise than at directly estimating the speech in real environments, where the target speech can be approximated as a point source but real noises are often not point sources.
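The sensitivity of the NBF to element errors can be illustrated with a toy single-frequency, two-microphone example. This is only a sketch under stated assumptions, not the configuration studied in the thesis: the function name `nbf_noise_estimate`, the phase value, and the 10% gain mismatch are hypothetical, and reverberation is ignored. With ideal elements the null cancels the target completely; with a gain error on one element, part of the target leaks into the "noise" estimate.

```python
import numpy as np

def nbf_noise_estimate(x1, x2, target_phase):
    """Two-microphone null beamformer at one frequency bin (illustrative).

    Assumes the target reaches microphone 2 with a phase lag of
    `target_phase` relative to microphone 1, so compensating that lag
    and subtracting places a null on the target direction.
    """
    return x1 - x2 * np.exp(1j * target_phase)

# Hypothetical target component and inter-microphone phase lag.
s = 1.0 + 0.5j
phase = 0.7

x1 = s
x2_ideal = s * np.exp(-1j * phase)   # perfectly matched element
x2_mismatch = 1.1 * x2_ideal         # assumed 10% gain error on element 2

# Target energy remaining in the noise estimate (should be zero ideally).
leak_ideal = abs(nbf_noise_estimate(x1, x2_ideal, phase))
leak_mismatch = abs(nbf_noise_estimate(x1, x2_mismatch, phase))
```

In this toy case the leaked target amplitude equals the gain error times the target amplitude, so any target speech in the noise estimate would later be subtracted from the enhanced signal.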

Next, I propose BSSA on the basis of these findings. In the proposed BSSA, noise reduction is achieved by subtracting the power spectrum of the noise estimated by ICA from the power spectrum of a partly target-speech-enhanced signal obtained with a delay-and-sum (DS) beamformer. ICA can adapt blindly to microphone-element errors and room reverberation, so it can work as a more accurate noise estimator than NBF. Moreover, because the proposed BSSA uses ICA as a noise estimator, it can work in realistic acoustic scenarios where the conventional ICA cannot. Furthermore, the proposed BSSA is robust against the permutation problem inherent in ICA. Admittedly, the ICA-based noise estimator part of the proposed BSSA is still subject to the permutation problem. However, the proposed BSSA can efficiently reduce the negative effect of permutation owing to the over-subtraction in the spectral subtraction and the defocusing property of DS.
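The subtraction step described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function name `spatial_subtract`, the over-subtraction value `beta`, and the flooring rule with parameter `floor` are assumptions introduced here. The inputs are assumed to be complex spectra of the DS output and of the ICA-based noise estimate for the same frame.

```python
import numpy as np

def spatial_subtract(ds_spec, noise_spec, beta=2.0, floor=0.1):
    """Power-domain spectral subtraction with over-subtraction (sketch).

    ds_spec    : complex spectrum of the DS beamformer output
    noise_spec : complex spectrum of the estimated noise
    beta       : over-subtraction parameter (assumed value)
    floor      : flooring parameter for bins where the subtraction
                 would go negative (assumed rule)
    Returns the estimated target amplitude spectrum.
    """
    ds_pow = np.abs(ds_spec) ** 2
    noise_pow = np.abs(noise_spec) ** 2
    diff = ds_pow - beta * noise_pow
    floored = floor * np.abs(ds_spec)
    # Keep the subtracted amplitude where it is positive; otherwise
    # fall back to a small fraction of the DS amplitude.
    return np.where(diff > 0.0, np.sqrt(np.maximum(diff, 0.0)), floored)

# Two-bin example: the first bin is speech-dominant, the second is not.
ds = np.array([3.0 + 4.0j, 1.0 + 0.0j])
noise = np.array([1.0 + 0.0j, 1.0 + 0.0j])
out = spatial_subtract(ds, noise)
```

A value of `beta` above 1 removes more than the estimated noise power, which is one way to read the over-subtraction that the text credits with reducing the effect of residual permutation errors.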

Finally, the effectiveness of the proposed BSSA is shown through experimental results. Experiment I demonstrates the advantage of using ICA as a noise estimator by analyzing the noise estimation accuracy. Experiment II reveals the permutation robustness of the proposed BSSA in an artificial simulation. Experiment III indicates that the speech recognition performance of the proposed BSSA surpasses that of the conventional methods. These results confirm that the proposed BSSA is well suited to noise-robust hands-free speech recognition systems.