Colloquium B Presentations

Date and time: Thursday, September 14, 3rd period (13:30-15:00)


Venue: L1

Chair: KAN Yirong
LEE SANGMYEONG M, 2nd presentation, Augmented Human Communication Laboratory, 中村 哲, 吉野 幸一郎 (Visiting Professor), 品川 政太朗
title: Ameliorating the CLIP's classification ability by leveraging lingual structural information
abstract: The most fundamental premise of a Vision-and-Language model is that it accurately understands the meaning and intention of the user. However, because the model's inputs are plain text, recent reports have shown that the model fails to understand the structural information of language, which leaves it vulnerable to structural ambiguity, attribute mismatching, etc. Our research aims to tackle this problem by providing structural information as input to the CLIP model, one of the most versatile and powerful text encoding models in the field. Our first phase, which used syntax trees in the form of bracketed text, showed an advantage over plain text, but a limitation remained: the model did not interpret the structural information as intended. Our second and current phase constructs a computation graph from the semantic role labels of the sentence's constituents. With this approach, we expect better classification ability as well as comprehension.
language of the presentation: Japanese
 
岡本 夏旺 M, 2nd presentation, Augmented Human Communication Laboratory, 中村 哲, 吉野 幸一郎 (Visiting Professor), 品川 政太朗
title: Building a diversity prediction model to predict the diversity of image sets from text.
abstract: In text-to-image generation models, the manual prompt engineering needed to obtain the ideal image takes considerable time, because images must be generated repeatedly while the input text is refined by trial and error. To solve this problem, this study focuses on the diversity of the image set and attempts to build a diversity prediction model that predicts the diversity of the image set from the text. With such a model, it should be possible to search for appropriate input text without generating images. This presentation reports on a new diversity evaluation index proposed for building diversity prediction models. Specifically, while existing diversity evaluation indices cannot correctly evaluate the diversity of image sets that do not belong to a class label, the proposed method evaluates diversity more accurately than existing methods by avoiding the use of classification models.
language of the presentation: Japanese
 
濱田 裕太 M, 2nd presentation, Augmented Human Communication Laboratory, 中村 哲, 渡辺 太郎, 品川 政太朗
title: Structure-aware Text-to-Image using Scene Graphs Similarity
abstract: One of the challenges for text-to-image synthesis models is that relationships are difficult to control when drawing complex scenes. To solve this problem, an image generation evaluation metric suited to understanding spatial structure is needed. Therefore, I propose a method that uses scene graph similarity as an evaluation metric to measure the spatial structure of synthetic images. Specifically, a scene graph is generated from each input sentence and from each output image. Next, I calculate the similarity between the input and output graphs and use it as the evaluation metric. In this presentation, I report the results of a comparison between existing evaluation metrics and evaluation metrics based on scene graph similarity for synthetic images.
language of the presentation: Japanese
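Presentation title: Text-to-Image Considering Spatial Structure via Scene Graph Similarity
Presentation abstract: One of the challenges of text-to-image generation models is that relationships are difficult to control when generating complex scenes. Solving this requires an image generation evaluation metric suited to capturing spatial structure. This study therefore attempts to address the problem by proposing a method that uses scene graph similarity as an evaluation metric for the spatial structure of generated images. Specifically, a scene graph is generated from the input sentence and from the output image. The similarity between the resulting input and output graphs is then computed and used as the evaluation metric. This presentation reports the results of comparing existing evaluation metrics with the evaluation metric based on scene graph similarity on generated images.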
 
土肥 康輔 D, interim presentation, Augmented Human Communication Laboratory, 中村 哲, 渡辺 太郎, 須藤 克仁
title: Automated Essay Scoring Using Positive Grammatical Features
abstract: In recent English language teaching in Japan, writing and speaking abilities have been receiving more attention. Writing and speaking tests are conducted in entrance exams as well as in classrooms. However, scoring them requires human raters and is time-consuming. Automated scoring can be a solution to these problems. Writing scores are typically given based on aspects of essays such as content, organization, grammar, or vocabulary. It is known that some linguistic properties are characteristic and indicative of L2 proficiency, and human raters make use of them while scoring. In this study, we used positive linguistic features (i.e., grammar that learners can use correctly) as well as error features as additional inputs for a scoring model. The results showed that the scoring performance of the model improved when the grammar features were used.
language of the presentation: Japanese