Data-driven Knowledge Processing (National Institute of Information and Communications Technology (NICT))
A study on intelligent dialog systems using big data
The NICT Data-driven Intelligent System Research Center (DIRECT) strives to develop natural language processing systems that contribute to society. In particular, we are currently developing the dialog systems WEKDA and SOCDA. WEKDA is a spoken dialog system that can chat with users on a wide range of topics and answer spoken factoid and non-factoid questions using deep learning technologies and 4 billion web pages. SOCDA communicates with millions of disaster victims through a smartphone chat application (LINE), collecting disaster-related information from them and providing such information to them. We are also trying to apply the technologies in WEKDA to spoken dialog systems that converse with elderly people to help them lead healthy and fulfilling everyday lives. In this research area, we pursue not only the further improvement of the above dialog systems but also the development of general technologies that enable intelligent conversations and debates using big data. Examples of research topics include "dialog strategies for educational purposes" and "automatic dialog strategy modification from user interaction". The latter aims at developing dialog systems that can automatically change their dialog strategies according to users' requests.
A study on question answering and hypothesis generation using big data
This research area focuses on 1) improving technologies for factoid and non-factoid question answering using knowledge obtained from a huge number of web pages, 2) creating a new type of question answering task that has never been addressed in the field of natural language processing, and 3) developing technologies for generating innovative hypotheses utilizing a huge amount of knowledge obtained from big data.
DIRECT has already developed the Japanese question answering system WISDOM X (https://wisdom-nict.jp/#top). This system answers questions such as "why do solar flares occur?" and "what will happen if global warming persists?" using 4 billion web pages. Using it, we also succeeded in generating hypotheses that foresaw facts reported in some scientific research papers. Here, "hypotheses" are not limited to scientific hypotheses: stories in novels can also be regarded as a type of hypothesis. Would it not be amazing if a dialog system could, on its own, start telling a story that was automatically constructed from hypotheses? Examples of research topics include "question answering methods that can provide multi-sentence answers to complex questions" and "story generation using question answering methods and big data".
A study on fundamental natural language processing technologies that are applicable to big data
The above two research areas require syntactic analysis, semantic analysis, and context analysis of texts. These technologies have been studied for a long time in the field of natural language processing but, in most cases, satisfactory performance has not yet been achieved. In this research area, we develop fundamental technologies of this kind that can be applied to big data. Examples of research topics include "general purpose zero anaphora resolution".
DIRECT currently employs dozens of human annotators who create high-quality datasets for new tasks related to the above technologies. In addition, we have collected a huge amount of raw text by crawling the web (more than 20 billion Japanese web pages and 1 billion English web pages) and are developing question answering, dialog, and other technologies. We also have various versions of pre-trained state-of-the-art language models such as BERT. Our facility is equipped with more than 500 CPU servers and more than 500 GPGPUs. Members of the Data-driven Knowledge Processing laboratory can utilize these resources and equipment for their research activities.