Learning and adaptation of end-to-end multimodal dialog management

Nguyen The Tung


In a goal-oriented dialog system, the dialog management process is of utmost importance, since it decides whether the system can successfully achieve its goal. In many tasks, information from multiple modalities is necessary for dialog management. My research focuses on the learning and adaptation of end-to-end multimodal dialog management. In this research, I tackle two major challenges for multimodal dialog management: the fusion of multimodal information and data sparsity. The research can be divided into three major parts.

In the first part, I show that existing multimodal fusion methods are inefficient and suffer from several issues. I propose a novel fusion method, called Hierarchical Tensor Fusion, and show that it resolves the issues of existing works. Experimental results on a deception detection task show that the proposed method significantly outperforms existing multimodal fusion methods.
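To make the idea concrete, below is a minimal sketch of what hierarchical outer-product fusion could look like, assuming modalities are fused in stages rather than in a single joint tensor. The two-stage structure, layer names, and dimensions are illustrative assumptions for exposition, not the thesis's exact formulation.

    # Sketch of staged (hierarchical) outer-product fusion; assumptions only.
    import torch
    import torch.nn as nn

    class HierarchicalTensorFusion(nn.Module):
        def __init__(self, d_text, d_audio, d_video, d_hidden):
            super().__init__()
            # Projects the first pairwise fusion tensor to a compact vector.
            self.pair_proj = nn.Linear(d_text * d_audio, d_hidden)
            # Projects the second-stage fusion with the remaining modality.
            self.out_proj = nn.Linear(d_hidden * d_video, d_hidden)

        def forward(self, text, audio, video):
            # Stage 1: the outer product of two modality embeddings captures
            # their pairwise interactions explicitly.
            pair = torch.einsum('bi,bj->bij', text, audio).flatten(1)
            h = torch.relu(self.pair_proj(pair))
            # Stage 2: fuse the intermediate representation with the third
            # modality, again via an outer product.
            full = torch.einsum('bi,bj->bij', h, video).flatten(1)
            return torch.relu(self.out_proj(full))

Fusing in stages keeps each intermediate tensor small, which is one plausible way such a method could avoid the blow-up of a single three-way tensor product.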

In the second part, I discuss multimodal dialog management for a health consultation task. I present two dialog management models, one built with a modular approach and the other with an end-to-end approach, and illustrate how the proposed Hierarchical Tensor Fusion is incorporated into the end-to-end model. I then explain the weaknesses of the modular approach, namely its inflexibility and the difficulty of restructuring the model, and illustrate how the end-to-end approach overcomes them. An evaluation of dialog managers built with the two approaches showed no considerable difference in their performance.
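As one illustration of how a fusion module can plug into an end-to-end dialog manager, the sketch below passes per-turn fused features through a recurrent dialog-state tracker whose final state scores the next system action. The architecture and all names are assumptions for exposition, not the model evaluated in the thesis.

    # Sketch of an end-to-end dialog manager built on a fusion module.
    import torch
    import torch.nn as nn

    class EndToEndDialogManager(nn.Module):
        def __init__(self, fusion, d_hidden, n_actions):
            super().__init__()
            self.fusion = fusion  # e.g., a HierarchicalTensorFusion-style module
            self.dialog_rnn = nn.GRU(d_hidden, d_hidden, batch_first=True)
            self.action_head = nn.Linear(d_hidden, n_actions)

        def forward(self, text_seq, audio_seq, video_seq):
            # Fuse the modalities at every turn, then track the dialog state
            # across turns with a recurrent encoder.
            turns = [self.fusion(t, a, v)
                     for t, a, v in zip(text_seq, audio_seq, video_seq)]
            states, _ = self.dialog_rnn(torch.stack(turns, dim=1))
            # The final dialog state selects the next system action.
            return self.action_head(states[:, -1])

Because the fusion module and the policy head sit in one differentiable graph, the whole manager can be trained end to end, which is what lets the end-to-end approach sidestep the restructuring cost of a modular pipeline.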

The final part explains how the data sparsity problem can be alleviated through policy adaptation. Existing approaches to dialog policy adaptation require considerable time and effort because they use reinforcement learning to train a new policy for the target task from scratch. In this research, I show how a dialog policy can be obtained without reinforcement learning training on the target task. In contrast to existing works, the proposed method learns the relation between the action sets of the source and target tasks in the form of a probability distribution. It can therefore derive a policy for the target task immediately, which significantly reduces adaptation time. Experiments show that the proposed method learns a new policy for the target task much more quickly, and that the learned policy outperforms policies created by fine-tuning when the amount of available data for the target task is limited.
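The core of this adaptation idea, as described above, admits a compact reading: if a conditional distribution P(a_t | a_s) relates the source and target action sets, a target policy follows by marginalizing the source policy over that distribution. The sketch below illustrates the derivation for tabular policies; the tabular setting, variable names, and the assumption that the mapping has already been estimated are mine.

    # Sketch of deriving a target-task policy from a learned action mapping.
    import numpy as np

    def derive_target_policy(source_policy, action_map):
        # source_policy: (n_states, n_src_actions); each row is pi_src(a_s | s).
        # action_map: (n_src_actions, n_tgt_actions); row a_s is P(a_t | a_s).
        # pi_tgt(a_t | s) = sum over a_s of pi_src(a_s | s) * P(a_t | a_s),
        # so the target policy is a single matrix product; no reinforcement
        # learning on the target task is needed at this step.
        return source_policy @ action_map

Since the action mapping is estimated once, adaptation reduces to this product, which is consistent with the claim that a target-task policy can be derived immediately rather than retrained from scratch.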