Colloquium B Presentations

Date and time: Friday, June 10, Period 3 (13:30-15:00)

Venue: L1

Chair: 柏 祐太郎

田口 智大 M, 2nd presentation, Natural Language Processing (渡辺 太郎, 中村 哲, 進藤 裕之)
Title: Universal Dependencies for Low-Resource Code-Switching Languages: A Case Study of Tatar
Abstract: This research reports the creation of a new Universal Dependencies (UD) treebank for Tatar, a low-resource language that exhibits code-switching. The contribution is two-fold. First, we created the first UD treebank for Tatar. Second, language tags are annotated at the morpheme level within each word, so the treebank covers code-switching that occurs inside a word (intra-word code-switching), a phenomenon commonly found in minority languages around the world. Our experiments showed that incorporating intra-word code-switching into UD improves accuracy on the span-level language identification task by supplying additional POS information. The study thus confirms the merit and potential of segment-level code-switching annotation, which is applicable to other language varieties with intra-word code-switching.
Language of the presentation: English
 
井手 佑翼 M, 1st presentation, Natural Language Processing (渡辺 太郎, 中村 哲, 進藤 裕之)

Title: Context-Aware Grammatical Error Correction

Abstract: Grammatical error correction (GEC) is the task of automatically correcting errors in sentences. Typical GEC systems process each sentence independently, ignoring its context. While a previous study addressed this issue by considering the sentences preceding the current source sentence, we propose a model that also encodes future context, i.e., the sentences that follow the source.

Language of the presentation: Japanese

 
SARHANGZADEH ARMIN M, 1st presentation, Natural Language Processing (渡辺 太郎, 中村 哲, 進藤 裕之)

Title: Towards an Error-Tolerant Input Method for Japanese

Abstract: Typing constitutes a noticeable part of our daily lives. For languages with more complex writing systems, such as Japanese, the typing experience also involves an essential component called the input method (editor), which enables us to input a large number of characters with a limited set of keys via processes such as kana-kanji conversion. Accordingly, research on Japanese input methods dates back more than three decades. Yet despite this long history and recent advances in closely related fields such as machine translation, recent work on the subject has been quite limited, and both the literature and existing in-production input methods leave noticeable gaps, especially in handling typos and misspellings. In this work, we hope to address these shortcomings and make the Japanese input experience more robust.

Language of the presentation: English

 
古賀 貴士 M, 1st presentation, Social Computing (荒牧 英治, 中村 哲, 若宮 翔子)
Title: Construction of Error Correction Models for Electronic Medical Records
Abstract: Electronic medical records (EMRs), whose adoption has advanced in recent years, enable comprehensive tracking of patient dynamics and have contributed to significant improvements in physicians' work. However, cases have been reported in which physicians entered incorrect information into EMR systems, leading to medical accidents. In this study, we aim to construct a model that detects whether an anomalous entry in an EMR is an input error.
Language of the presentation: Japanese
 
WU DONGMING M, 1st presentation, Augmented Human Communication (中村 哲☆, 荒牧 英治, 鳥澤 健太郎, 飯田 龍一)

Title: Pretraining Language Models with Category and Infobox Information in Wikipedia for Multi-hop Question Answering

Abstract: Multi-hop question answering (QA) usually requires retrieving multiple evidence documents, each of which often has little lexical overlap with the others or with the original question. Beyond retrieval, QA systems also need to extract a span of text as the answer from those evidence documents. Because of this limited lexical overlap, performing multi-hop QA well requires additional clues for retrieving documents and extracting answers. Several previous studies leveraged hyperlink information in Wikipedia as such clues and showed its effectiveness on this task: Asai et al. (2019) improved retrieval performance by using a hyperlink-based document graph, and Yasunaga et al. (2022) improved extraction performance by fine-tuning language models pretrained on sentences from hyperlinked articles. Besides hyperlinks, Wikipedia contains abundant extra-linguistic information in the form of categories and infoboxes. Pairs of articles that belong to the same category, or whose categories stand in a category-subcategory relation, may be helpful for pretraining language models. An infobox contains basic information about the topic of a Wikipedia article, such as nationality and family members, which is often hyperlinked to other documents but was not utilized in the previous work. We plan to develop a novel pretraining scheme for language models that utilizes categories and infoboxes as well as hyperlinks, and to apply models pretrained with this scheme to HotpotQA, one of the most popular multi-hop QA datasets.
References: Asai, Akari, et al. "Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering." ICLR 2020 (arXiv:1911.10470). Yasunaga, Michihiro, Jure Leskovec, and Percy Liang. "LinkBERT: Pretraining Language Models with Document Links." ACL 2022 (arXiv:2203.15827).

Language of the presentation: English