Theoretical analysis of musical noise generation in nonlinear noise reduction methods based on higher-order statistics

Takayuki Inoue (0951011)


In this thesis, I provide a new theoretical analysis, based on higher-order statistics, of the amount of musical noise generated by typical nonlinear noise reduction methods. Until now, it has not been clarified mathematically how much musical noise is generated by the conventional nonlinear noise reduction methods used in speech communication systems. To address this problem, an objective metric based on higher-order statistics was developed in a past study; however, a theoretical analysis of the amount of musical noise generated by many conventional nonlinear noise reduction methods remains an open problem. Therefore, in this research, I formulate, on the basis of higher-order statistics, the amount of musical noise generated in generalized spectral subtraction (SS), iterative SS, and the Wiener filtering (WF) family, all of which are commonly used nonlinear noise reduction methods.

First, I analyze generalized SS. SS has an exponent parameter that determines the spectral domain in which the subtraction is performed. A preliminary investigation shows that SS is most commonly performed in the power spectral domain, that is, with an exponent value of two. However, there have been no theoretical studies on the advantages of SS in the power spectral domain. Therefore, in this thesis, I provide a theoretical analysis of the amount of musical noise in generalized SS, which includes a flexible exponent parameter, and clarify through mathematical analysis and evaluation experiments that less musical noise is generated in a spectral domain with a lower exponent.
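As an illustration only, the role of the exponent parameter can be sketched for a single frame of spectral magnitudes. The following is a minimal sketch under stated assumptions, not the formulation used in the thesis: the function name `generalized_ss`, the oversubtraction coefficient `beta`, the flooring coefficient `floor`, and the pre-computed noise moment `noise_moment` (an estimate of the mean of the noise magnitude raised to `exponent`) are all hypothetical names introduced here.

```python
def generalized_ss(observed_mags, noise_moment, exponent=2.0, beta=1.0, floor=0.0):
    """Apply spectral subtraction in the domain given by `exponent`.

    observed_mags : magnitudes |X| of the noisy spectrum (one value per bin)
    noise_moment  : assumed estimate of E[|N|**exponent] for the noise
    exponent      : 2.0 gives the power spectral domain, 1.0 the amplitude domain
    beta          : hypothetical oversubtraction coefficient
    floor         : hypothetical flooring coefficient applied when the
                    subtraction result is not positive
    """
    enhanced = []
    for x in observed_mags:
        # Subtract the scaled noise estimate in the chosen spectral domain.
        diff = x ** exponent - beta * noise_moment
        if diff > 0.0:
            # Map the result back to a magnitude.
            enhanced.append(diff ** (1.0 / exponent))
        else:
            # Flooring: keep a scaled copy of the observed magnitude.
            enhanced.append(floor * x)
    return enhanced


# Power-domain SS (exponent 2) and amplitude-domain SS (exponent 1)
# applied to the same two bins.
power_domain = generalized_ss([5.0, 2.0], noise_moment=9.0, exponent=2.0, floor=0.1)
amplitude_domain = generalized_ss([5.0, 2.0], noise_moment=3.0, exponent=1.0, floor=0.1)
```

In this sketch, changing `exponent` only changes the domain in which the subtraction happens; how that choice affects the amount of musical noise is the subject of the analysis described above and is not reproduced by the code.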

Next, I analyze iterative SS, which can achieve high-quality noise reduction with low musical noise by recursively applying weak SS. The analysis clarifies that the iterative method is not always advantageous and that its parameter values should be set carefully. Nevertheless, my theory proves mathematically that iterative SS with optimal parameters is advantageous for achieving speech enhancement free of musical noise, which had only been shown experimentally in previous studies. Moreover, optimized iterative SS is shown to be superior to commonly used noise reduction methods.
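The idea of recursively applying weak SS can be sketched as a simple loop. This is a hedged illustration rather than the procedure analyzed in the thesis: it assumes that the noise moment is re-estimated before every pass from a noise-only reference that is passed through the same processing chain, and the names `weak_ss`, `iterative_ss`, `noise_only_mags`, `beta`, and `floor` are hypothetical.

```python
def weak_ss(mags, noise_moment, exponent=2.0, beta=0.5, floor=0.0):
    """One pass of SS with a small (weak) subtraction coefficient `beta`."""
    out = []
    for x in mags:
        diff = x ** exponent - beta * noise_moment
        # Keep the subtracted value if positive, otherwise apply flooring.
        out.append(diff ** (1.0 / exponent) if diff > 0.0 else floor * x)
    return out


def iterative_ss(mags, noise_only_mags, iterations=3, exponent=2.0, beta=0.5, floor=0.0):
    """Apply weak SS repeatedly, re-estimating the noise before each pass.

    mags            : magnitudes of the noisy spectrum
    noise_only_mags : magnitudes observed in a noise-only segment (assumed
                      to be available, e.g. from a speech pause)
    """
    current = list(mags)
    noise_ref = list(noise_only_mags)
    for _ in range(iterations):
        # Re-estimate the residual noise moment from the processed reference.
        noise_moment = sum(n ** exponent for n in noise_ref) / len(noise_ref)
        current = weak_ss(current, noise_moment, exponent, beta, floor)
        noise_ref = weak_ss(noise_ref, noise_moment, exponent, beta, floor)
    return current


# Two weak passes over a single bin, with a two-sample noise-only reference.
enhanced = iterative_ss([5.0], [2.0, 2.0], iterations=2, beta=0.5)
```

The sketch makes the tunable quantities explicit (`iterations`, `beta`, `floor`, `exponent`); it does not by itself show which settings are advantageous.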

Finally, I analyze the WF family and clarify, through mathematical analysis and evaluation experiments, how the amount of musical noise generated compares among the WF methods. In addition, I show that, as with generalized SS, using lower exponent parameters in WF yields an enhanced speech signal with less musical noise. Moreover, I reveal the relation between SS and WF in terms of the amount of musical noise generated, which has not been clarified in past studies.