| 芳賀 あかり | D, Interim Presentation | Natural Language Processing | 渡辺 太郎 | 荒牧 英治 | 大内 啓樹 |
title: Modeling Children's Errors Using Small Language Models
abstract: Language models have advanced rapidly in recent years. However, they still require substantially more training data than humans. Therefore, data-efficient approaches inspired by children's language acquisition have attracted increasing attention. In this study, we hypothesize that language models that reproduce errors observed in child language acquisition exhibit learning processes that are closer to those of humans and are more data-efficient. To test this hypothesis, we construct an evaluation dataset to examine whether language models exhibit overregularization, a typical error in child language acquisition. Using this dataset, we evaluate models of different sizes and architectures and investigate which models exhibit childlike error tendencies during training.
language of the presentation: Japanese
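The overregularization probe described in this abstract is typically run as a minimal-pair evaluation: the model scores a sentence with the correct irregular form against the same sentence with the overregularized (childlike) form, and an error is counted when the model prefers the latter. Below is a minimal runnable sketch of that logic. The sentence pairs and the `TOY_LOGPROBS` scoring table are hypothetical stand-ins; in an actual study each sentence would be scored by a trained language model (e.g. via summed token log-probabilities).

```python
# Hedged sketch of a minimal-pair overregularization probe.
# MINIMAL_PAIRS and TOY_LOGPROBS are illustrative assumptions, not data
# from the study; a real language model would supply the scores.

# Each pair: (correct irregular form, overregularized child error)
MINIMAL_PAIRS = [
    ("Yesterday she went home.", "Yesterday she goed home."),
    ("He broke the toy.", "He breaked the toy."),
]

# Stand-in for whole-sentence log-probabilities from a language model.
TOY_LOGPROBS = {
    "Yesterday she went home.": -12.3,
    "Yesterday she goed home.": -15.8,
    "He broke the toy.": -10.1,
    "He breaked the toy.": -9.7,  # this toy model prefers the child error
}

def overregularization_rate(pairs, logprob):
    """Fraction of pairs where the model assigns a higher score to the
    overregularized (childlike) form than to the correct irregular form."""
    errors = sum(1 for correct, over in pairs if logprob(over) > logprob(correct))
    return errors / len(pairs)

rate = overregularization_rate(MINIMAL_PAIRS, TOY_LOGPROBS.__getitem__)
print(rate)  # 0.5 with the toy table above
```

Tracking this rate across training checkpoints and model sizes is one way to ask, as the abstract does, which models show childlike error tendencies during training.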
| MARTINEZ PEGUERO ARTURO | D, Interim Presentation | Natural Language Processing | 渡辺 太郎 | 荒牧 英治 | 上垣外 英剛 |
title: Reframed text generation
abstract: A glass that is half empty can also be seen as being half full. Carefully rephrased wording can adjust our frame of reference and shift our point of view of the same fact. In other words, through meaning-preserving text style transfer, a change of perspective can be conveyed. My research seeks to develop a large language model-based text generation system that takes text as input, identifies relevant existing frames, and generates a reframing of the input with a persuasive, appropriate, and context-sensitive rephrasing.
language of the presentation: English
| FREDERIKUS HUDI | D, Interim Presentation | Natural Language Processing | 渡辺 太郎 | 荒牧 英治 | 上垣外 英剛 |
title: LecTrans: A Multimodal Translation Benchmark for Online Academic Lectures
abstract: While existing Machine Translation (MT) benchmarks primarily focus on text-only or tightly aligned inputs, academic lectures require translation across spoken and visual modalities in long-form, knowledge-intensive settings, making them a challenging and informative testbed for multimodal language understanding. We introduce LecTrans, a large-scale benchmark for evaluating multimodal translation of approximately 350 hours of expert-taught online academic lectures. LecTrans includes professionally verified transcripts and presentation slides, along with an evaluation subset of 17 video segments totalling 162 minutes across 17 subjects, which further includes human-corrected transcripts, domain-term annotations, and professional translations into 7 languages. We formulate lecture translation as a dual-task evaluation problem that distinguishes transcription translation from slide translation, enabling modality-aware analysis of translation behavior and error propagation and revealing challenges in long-context and domain-specific translation. LecTrans serves as a benchmark for assessing the capabilities of multimodal translation systems on academic content.
language of the presentation: English