Abstract: |
Conversation scene recognition has recently been studied to support tasks such as automatic annotation, minute taking, and meeting assistance. Because participants speak spontaneously, recorded conversations contain frequent speaker overlaps as well as ambient noise. Speech signal processing techniques play an important role in handling such complicated recordings. In this lecture, I will focus on multi-channel speech enhancement and "who spoke when" estimation (speaker diarization) techniques for conversation scene analysis. I will also introduce prototype meeting recognition and meeting assistance systems.
|