Music Signal Separation Combining Directional Clustering and Nonnegative Matrix Factorization with Spectrogram Restoration

Daichi Kitamura (1251035)


In this thesis, to address the music signal separation problem, I propose a new hybrid method that concatenates directional clustering and supervised nonnegative matrix factorization (NMF) with spectrogram restoration. The aim is to extract a specific sound from a multichannel music signal that consists of multiple instrumental sounds. Recently, electronic data such as music files, which can be distributed over the Internet owing to progress in information technology, have become the main format for obtaining music. Hence, users can easily obtain and edit music, resulting in the active creation of new content. Against this background, music signal separation technologies have attracted much attention. Music signal separation aims to extract a specific target signal from music signals that contain multiple instrumental sounds. Feasible applications include audio remixing by users, automatic music transcription, and musical instrument education.

In previous studies, music signal separation based on NMF has been a very active area of research. Various methods using NMF have been proposed, but many problems remain, e.g., poor convergence of the NMF update rules and a lack of robustness. To solve these problems, I propose a new supervised NMF (SNMF) with spectrogram restoration and a hybrid method that applies the proposed SNMF after directional clustering. Via extrapolation of the supervised spectral bases, this SNMF with spectrogram restoration attempts both target signal separation and reconstruction of the target components lost through the binary masking performed in the preceding directional clustering.
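
As a rough illustration of the idea above, the following Python sketch factorizes a binary-masked spectrogram with fixed (supervised) target bases and then reads the target estimate off the full basis product, including the masked bins. It is a simplification under stated assumptions, not the exact cost function or update rules of the thesis: it assumes a masked beta-divergence cost with standard multiplicative updates, and the names `masked_snmf`, `beta_divergence`, `F`, `G`, `H`, `U`, and `mask` are hypothetical choices made for this example.

```python
import numpy as np

EPS = 1e-12  # small constant to avoid division by zero and log(0)


def beta_divergence(Y, X, beta, mask):
    """Sum of the beta-divergence over observed bins only (mask == 1).

    beta = 2 gives half the squared Euclidean distance, beta = 1 the
    generalized Kullback-Leibler divergence. Assumes beta > 0.
    """
    Y = Y + EPS
    X = X + EPS
    if beta == 1:
        d = Y * np.log(Y / X) - Y + X
    else:
        d = (Y**beta + (beta - 1) * X**beta
             - beta * Y * X**(beta - 1)) / (beta * (beta - 1))
    return float(np.sum(mask * d))


def masked_snmf(Y, F, mask, n_other=4, beta=1.0, n_iter=100, seed=0):
    """Fit Y ~ F @ G + H @ U on observed bins, keeping the bases F fixed.

    Y    : nonnegative spectrogram after binary masking (freq x frames)
    F    : supervised target bases trained beforehand (freq x n_target)
    mask : 1 where a bin survived the binary mask, 0 where it was removed
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = Y.shape
    G = rng.random((F.shape[1], n_frames)) + EPS  # target activations
    H = rng.random((n_freq, n_other)) + EPS       # bases for other sounds
    U = rng.random((n_other, n_frames)) + EPS     # activations for H
    costs = [beta_divergence(Y, F @ G + H @ U, beta, mask)]
    for _ in range(n_iter):
        # Multiplicative updates; masked bins contribute nothing to the cost.
        X = F @ G + H @ U + EPS
        G *= (F.T @ (mask * Y * X**(beta - 2))) / (F.T @ (mask * X**(beta - 1)) + EPS)
        X = F @ G + H @ U + EPS
        H *= ((mask * Y * X**(beta - 2)) @ U.T) / ((mask * X**(beta - 1)) @ U.T + EPS)
        X = F @ G + H @ U + EPS
        U *= (H.T @ (mask * Y * X**(beta - 2))) / (H.T @ (mask * X**(beta - 1)) + EPS)
        costs.append(beta_divergence(Y, F @ G + H @ U, beta, mask))
    return G, H, U, costs


# Toy usage with synthetic data (not real music signals).
rng = np.random.default_rng(1)
F = rng.random((16, 3))                           # pretend pre-trained bases
Y = F @ rng.random((3, 10))                       # synthetic target spectrogram
mask = (rng.random(Y.shape) > 0.3).astype(float)  # roughly 30% of bins removed
G, H, U, costs = masked_snmf(Y * mask, F, mask)
restored = F @ G  # target estimate, also defined at the masked bins
```

Because `F` spans full spectra, the product `F @ G` takes values at the removed bins as well, which is the sense in which the supervised bases extrapolate the lost components. The `beta` argument is exposed only to make the choice of divergence explicit in the sketch.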

Next, I provide a theoretical analysis of the basis extrapolation ability and reveal the mechanism behind the marked shift of the optimal divergence in SNMF with spectrogram restoration, as well as the trade-off between separation and extrapolation abilities. Evaluation experiments on separation using artificial and real-recorded music signals show the effectiveness of the proposed hybrid method.

Finally, based on the above findings, I propose a new scheme for frame-wise divergence selection in the proposed hybrid method, which separates the target signal using the optimal divergence in each frame (multi-divergence). The results of an evaluation experiment show that the proposed hybrid method with multi-divergence can always achieve high performance under any spatial conditions, indicating improved robustness of the proposed method.