Speech disability is a serious obstacle to communication between people. It creates communication barriers and degrades the quality of life of those affected. Laryngectomees are people who have undergone an operation to remove the larynx, including the vocal folds, for reasons such as injury or laryngeal cancer. An electrolarynx is a medical device that helps them produce reasonably intelligible speech, called electrolaryngeal (EL) speech, by mechanically generating artificial excitation signals in place of vocal fold vibrations. Unfortunately, EL speech sounds mechanical and robotic because the excitation signals are monotonic.
To make EL speech more natural, this dissertation addresses statistical fundamental frequency (F0) pattern prediction, a component of statistical voice conversion, through two approaches. The first is a software-based EL speech enhancement system with three stages: analysis of the recorded EL speech, enhancement of the extracted acoustic parameters, and synthesis of the enhanced EL speech. Although this framework can perform complex enhancement, it is unsuitable for some situations, such as face-to-face conversation, because the enhanced EL speech is played back through a loudspeaker. The second is a hardware-based EL speech enhancement system that directly controls the F0 pattern of the excitation signals generated by the electrolarynx. This framework is effective in any situation, including face-to-face conversation, and allows laryngectomees to produce enhanced EL speech from their own mouths. Experimental evaluations demonstrate that the proposed systems successfully address the mechanical, robotic quality of EL speech and dramatically improve its naturalness.