This thesis describes the design and evaluation of a new statistical model for building robust dialog systems. Conventional chat-oriented dialog systems require carefully hand-crafted rules, which demand considerable human effort, especially when the dialog must accommodate a wide variety of topics. Current statistical approaches to chat-oriented dialog must cope with the vast range of human language. As a result, a system can end up giving an unrelated response when it cannot find a match for the user query in its conversation database. Moreover, relying on an unfiltered conversation database also results in unnatural conversations.
Given these challenges, in this work we aim to scale up the conventional statistical model for chat-oriented dialog systems. To address unnatural responses, we use real human-to-human conversation examples from movie scripts and Twitter conversations. Our goal is to build a conversational agent that can interact with users as naturally as possible while reducing the time required for database design and collection. We also address the case in which the user query is not covered by the database (out of example; OOE), which we approach through response generation. We propose a new statistical model for building robust dialog systems that uses neural networks to either retrieve or generate a dialog response based on existing data sources.
System performance was evaluated using both objective and subjective metrics. The results show that, compared with standard example-based dialog baselines, the proposed approaches are better able to handle user inputs that are not well covered by the database. Our experimental results also show that the proposed filtering approach effectively improves performance.