Multilingual Natural Language Processing (National Institute of Information and Communications Technology (NICT))
Research Staff
Professor: Eiichiro SUMITA (eiichiro.sumita@nict.go.jp)
Associate Professor: Chenchen DING (chenchen.ding@nict.go.jp)
To the site: https://astrec.nict.go.jp/naist/
Research Areas
This laboratory conducts research and development on new natural language processing technologies with the aim of making the languages of the world “equal”. Some languages, such as English, are widely used around the world, while many others are used only in limited areas and situations. The maturity of language processing technology also differs from language to language. We will use state-of-the-art technologies and large-scale computing resources to reduce these inequalities between languages.
The main research directions are listed below. For research results so far, please refer to the papers published on the websites of the faculty members in charge. Many research subjects are possible beyond those listed below, so please feel free to consult with us.
Construction of multilingual data
The purpose of this direction is to develop morphological and syntactic systems that can be applied to multiple languages in a unified manner, and to build the training data needed for multilingual processing based on these systems. We pay attention not only to languages such as English and Japanese, for which a large amount of data has already been prepared, but also to the many languages for which language processing has not yet been explored. NICT has been focusing on preparing language resources for the Southeast Asian region. In this lab, we will look at languages not only in Asia but across the entire world.
Analysis of multilingual information
Using linguistic data, we apply machine learning methods to morphological, syntactic, and semantic analysis tasks. The analysis of linguistic information has two axes: diverse tasks and diverse languages. Conventional research had to deal with specific tasks and specific languages individually. With the development of state-of-the-art deep learning technology and large-scale language models, simultaneous learning of multiple languages and tasks has become possible. This laboratory handles languages with large amounts of data and also explores methods that can analyze languages with little or no data.
Generation of multilingual information
Having a computer generate human-readable, easy-to-understand text is one of the most important tasks in language processing. Automatic translation of multilingual text is an application in which our lab is strong, and it has permeated business and daily life. However, many problems remain to be solved. We will focus especially on coping with the variety of languages, robustness against variation in the input, and explainability of translation results. The translation engine “Minna no Jido Hon'yaku @ TexTra” developed by NICT will be utilized.
Cooperation with sound and image processing
In addition to speech recognition and optical character recognition, we will also work on fusing image and audio information with text information, and automatic translation using this information.
Key Features
We aim to develop students who are familiar with both “natural language” and “information engineering technology”. In the modern world, where AI technology continues to evolve on the back of huge growth in data and computation, researchers and developers who understand both are essential. We welcome students from various fields and train them in ways that build on their individual knowledge.
Internship activities at NICT are possible. Through hands-on training at research sites aimed at implementing AI technology in society, and through discussions with various stakeholders, intern students will be able to gain a broad perspective.