Abstract |
Systems with natural language interfaces, such as conversational interfaces, are useful for human users in human-system collaboration tasks. Interactive image editing is one such task and a promising application for non-skilled users: users who want to create an image they have in mind can ask the system to create it, just as they would ask a skilled worker. This thesis presents an interactive image editing system based on neural image generative models that proactively communicates with users to create the desired image.
The interactive image editing task is challenging in two respects: 1) the system has to handle various editing requests expressed by users in natural language, and 2) the system has to handle the uncertainty of the generated images that arises from the diversity of editing requests. For the first problem, we propose an interactive image editing framework based on neural image generative models. The framework trains a model to automatically learn the relationships between changes in images and natural language editing requests. We demonstrate that our model can successfully edit a given image according to the editing requests.
For the second problem, a naive solution is to show multiple images generated by multiple editing models and ask the user, every time, to confirm which image is most relevant to the editing request. However, this strategy makes the interaction redundant. To solve this problem, we propose a proactive confirmation method that enables the system to confirm with the user when it is uncertain about which image better matches the editing request. We define an uncertainty score using the entropy of the generated image and use it to decide whether the system should ask for confirmation. We demonstrate that our method requires fewer confirmations from users while achieving better image quality over the course of the dialogues.
|
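The entropy-based confirmation decision described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the entropy is computed, so everything here is an assumption made for illustration: each pixel is taken to carry a categorical probability distribution, the uncertainty score is taken to be the mean Shannon entropy over pixels, and the names `mean_entropy`, `choose_action`, and the threshold value are hypothetical rather than taken from the thesis.

```python
import math
from typing import Sequence


def mean_entropy(pixel_probs: Sequence[Sequence[float]]) -> float:
    """Mean Shannon entropy (in nats) over per-pixel categorical distributions.

    Assumption: each inner sequence is a probability distribution for one pixel.
    """
    total = 0.0
    for probs in pixel_probs:
        # Terms with p == 0 contribute nothing to the entropy.
        total -= sum(p * math.log(p) for p in probs if p > 0.0)
    return total / len(pixel_probs)


def choose_action(pixel_probs: Sequence[Sequence[float]],
                  threshold: float = 0.5) -> str:
    """Return "confirm" when the uncertainty score exceeds the threshold.

    The threshold is a hypothetical hyperparameter, not a value from the thesis.
    """
    return "confirm" if mean_entropy(pixel_probs) > threshold else "show"


# A confident output (entropy 0) versus a maximally uncertain binary output.
confident = [[1.0, 0.0]] * 4
uncertain = [[0.5, 0.5]] * 4
print(choose_action(confident))
print(choose_action(uncertain))
```

Under these assumptions, the system would ask the user for confirmation only for outputs whose score exceeds the threshold, which is one way the number of confirmations could be reduced relative to confirming at every turn.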