Controllable neural conversation model considering conversation structure and context

Seiya Kawano


A neural conversation model is an end-to-end scheme that generates system responses from user utterances. Unlike traditional dialogue systems based on a pipeline architecture, it can generate high-quality responses when large amounts of training data are available. In this thesis, we focus on three problems of current neural conversation models.

In our first study, we focused on the problem of controllability. Because neural conversation models rely on an end-to-end architecture, it is difficult to control their generation with human heuristics, knowledge bases, or dialogue models. We proposed a conditional neural conversation model whose responses are controlled by semantic representations that reflect dialogue structure, in particular dialogue acts. Such semantic representations enable a dialogue system to generate responses that are consistent with its dialogue goals. Our experimental results showed that the proposed model generated promising responses in terms of controllability and naturalness compared with strong conventional models.

In our second study, we incorporated entrainment, an attractive phenomenon of human conversation, into neural conversation models. Entrainment is a well-known conversational phenomenon in which dialogue participants mutually synchronize with regard to various aspects; it has been suggested that it is closely related to the quality of human-human dialogues. We first analyzed the relationship between dialogue quality and entrainment using an automatic entrainment evaluation measure. We found that entrainment improved participant satisfaction in both human-human and human-machine dialogues. We then proposed a conditional neural conversation model that controls generation according to a given degree of entrainment. The experimental results showed that the proposed entrainable neural conversation model generated responses that were comparable to or more natural than those of conventional models, and that it satisfactorily controlled the entrainment of the generated responses.

In our third study, we focused on the floor structure in dialogues. Most dialogue systems do not assume that a dialogue has multiple floors. Expanding the research scope to multi-floor dialogues will contribute to building autonomous robots that solve real-world problems through dialogues across multiple floors. In this study, as a first step, we proposed the first baseline model that automatically identifies multi-floor dialogues in an object exploration task in a house. We experimentally evaluated our proposed model's performance and discussed its limitations and future directions.