Seminar Presentations

Date and time: Tuesday, September 25, 5th period (16:50-18:20)

Room: L1

Chair: 加藤有己

坂口慶祐	1151052: M, second presentation	松本裕治, 関浩之, 新保仁, 小町守
title: Joint English Spelling Error Correction and POS Tagging for Language Learners' Writing
abstract: We propose an approach to correcting spelling errors and assigning part-of-speech (POS) tags simultaneously for sentences written by learners of English as a second language (ESL). ESL writing contains several types of errors, such as preposition, determiner, verb, noun, and spelling errors. Spelling errors often interfere with POS tagging and syntactic parsing, which makes the detection and correction of other errors very difficult. In studies of grammatical error detection and correction in ESL writing, spelling correction has been treated as a preprocessing step in a pipeline. However, several types of spelling errors in ESL writing are difficult to correct during preprocessing, for example homophones (e.g., hear/here), confusion (quiet/quite), split (now a day/nowadays), merge (swimingpool/swimming pool), inflection (please/pleased), and derivation (badly/bad), where the incorrect word is itself in the vocabulary and grammatical information is needed to disambiguate it. To correct these spelling errors, as well as typical typographical errors (*begginning/beginning), we propose joint POS tagging and spelling error correction with a CRF (Conditional Random Field)-based model. Our approach achieves significantly higher accuracy for both POS tagging and spelling correction than existing approaches that use either individual or pipeline analysis. We also show that the joint model can handle novel types of misspelling in ESL writing.
language of the presentation: Japanese
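To illustrate why joint decoding helps with in-vocabulary errors such as hear/here, here is a minimal sketch assuming a first-order model whose states are (word candidate, POS tag) pairs, decoded with Viterbi search. The candidate lists, the `emit`/`trans` score tables, and the function name are hypothetical toy inputs; this is not the CRF features, weights, or training of the presented model, only the shape of joint decoding.

```python
from typing import Dict, List, Optional, Tuple

State = Tuple[str, str]  # (corrected word, POS tag)
PENALTY = -10.0  # score for any (word, tag) pair or tag bigram not listed

def viterbi_joint(
    candidates: List[List[str]],
    tags: List[str],
    emit: Dict[State, float],
    trans: Dict[Tuple[str, str], float],
) -> List[State]:
    """Jointly choose a spelling correction and a POS tag for every token.

    candidates[i] lists spelling candidates for token i (including the
    token itself), emit[(w, t)] scores word w with tag t, and
    trans[(t1, t2)] scores the tag bigram t1 -> t2.
    """
    # best[i][state] = (best score of a path ending in state, previous state)
    best: List[Dict[State, Tuple[float, Optional[State]]]] = [
        {(w, t): (emit.get((w, t), PENALTY), None) for w in candidates[0] for t in tags}
    ]
    for i in range(1, len(candidates)):
        column: Dict[State, Tuple[float, Optional[State]]] = {}
        for w in candidates[i]:
            for t in tags:
                e = emit.get((w, t), PENALTY)
                # Extend every path ending at position i-1 and keep the best one.
                column[(w, t)] = max(
                    (score + trans.get((prev[1], t), PENALTY) + e, prev)
                    for prev, (score, _) in best[i - 1].items()
                )
        best.append(column)
    # Backtrack from the highest-scoring final state.
    state = max(best[-1], key=lambda s: best[-1][s][0])
    path = [state]
    for i in range(len(candidates) - 1, 0, -1):
        state = best[i][state][1]
        path.append(state)
    return path[::-1]
```

For example, with candidates [["I"], ["live"], ["hear", "here"]], emission scores of 0 for (I, PRP), (live, VBP), (hear, VBP), and (here, RB), and transition scores PRP->VBP = 1, VBP->RB = 1, VBP->VBP = -2, the function returns [("I", "PRP"), ("live", "VBP"), ("here", "RB")]: the tag context selects "here" even though "hear" is a valid word that a context-free spell checker would leave unchanged.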

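濱口拓男	1151083: M, second presentation	松本裕治, 関浩之, 新保仁, Kevin Duh
title: Nested Structured Model for Multi-Task Learning
abstract: Multi-task learning is a scheme that divides a dataset into several subsets and learns parameters for each of them. By sharing parameters across subsets instead of learning them independently, high accuracy can be expected even with little data. However, previous models handled only simple structures for parameter sharing. I introduce a model that is expected to capture finer structure by using a hierarchical structure.
language of the presentation: Japanese
title (Japanese version): A Multi-Task Learning Model Using a Hierarchical Structure
abstract (Japanese version): Multi-task learning is a model that splits one dataset into several parts and fits an individual learner to each part. Rather than training each learner separately, sharing the parameters of the learners makes high accuracy possible even with little data. However, existing models handled only simple structures for parameter sharing. This presentation therefore introduces a model that, by incorporating a hierarchical structure, can be expected to find a more suitable sharing structure.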
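One common way to express hierarchical parameter sharing, sketched here as an assumption rather than as the presented model: each task's weight vector is the sum of a global (root) component, a component shared by the task's group, and a task-specific component. The group and task components carry an L2 penalty, so a task with little data stays close to its group and to the root. The linear-regression tasks, the `group_of` tree, and the `lam`/`lr`/`epochs` values are all hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def fit_hierarchical(
    data: Dict[str, List[Tuple[Vector, float]]],
    group_of: Dict[str, str],
    dim: int,
    lam: float = 0.1,
    lr: float = 0.02,
    epochs: int = 2000,
) -> Dict[str, Vector]:
    """Fit w_t = w_root + w_group(t) + w_task(t) by gradient descent.

    Each task's loss is the mean squared error of a linear model; group and
    task offsets are L2-penalized, the root is not. Returns the combined
    weight vector of each task.
    """
    root = [0.0] * dim
    groups = {g: [0.0] * dim for g in set(group_of.values())}
    tasks = {t: [0.0] * dim for t in data}
    for _ in range(epochs):
        g_root = [0.0] * dim
        g_group = {g: [0.0] * dim for g in groups}
        g_task = {t: [0.0] * dim for t in tasks}
        for t, samples in data.items():
            g = group_of[t]
            w = [r + gg + tt for r, gg, tt in zip(root, groups[g], tasks[t])]
            for x, y in samples:
                err = dot(w, x) - y
                for k in range(dim):
                    # The same data gradient flows to every level of the hierarchy.
                    step = 2 * err * x[k] / len(samples)
                    g_root[k] += step
                    g_group[g][k] += step
                    g_task[t][k] += step
        for k in range(dim):
            root[k] -= lr * g_root[k]
        for g, vec in groups.items():
            for k in range(dim):
                vec[k] -= lr * (g_group[g][k] + 2 * lam * vec[k])
        for t, vec in tasks.items():
            for k in range(dim):
                vec[k] -= lr * (g_task[t][k] + 2 * lam * vec[k])
    return {
        t: [r + gg + tt for r, gg, tt in zip(root, groups[group_of[t]], tasks[t])]
        for t in tasks
    }
```

Deeper trees follow the same pattern: each additional level adds one more summed, penalized component. The learning rate must be small enough for the number of tasks and the feature scale; the default here is only suitable for small toy inputs.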

藤野拓也	1151092: M, second presentation	松本裕治, 関浩之, 新保仁, 小町守
title: Word Segmentation for Error Correction of Japanese Learners' Text
abstract: Word segmentation is often the first step in detecting and correcting errors in text written by learners of Japanese. However, word segmenters designed for error-free text tend to fail on learners' text, which lowers the accuracy of error detection and correction. In this study, we propose a method for segmenting Japanese learners' text into words in a way suited to error detection and correction. In this presentation, we describe the method and the results of training a word segmenter on data from a language-learning SNS.
language of the presentation: Japanese
title (Japanese version): Word Segmentation for Error Correction of Compositions by Learners of Japanese
abstract (Japanese version): In tasks that detect and correct errors in compositions by learners of Japanese, sentences are often segmented into words as preprocessing. However, word segmenters for general text tend to fail on these sentences, which lowers the accuracy of error detection and correction. This study therefore aims to segment sentences by learners of Japanese into words in a form suited to error detection and correction. The presentation reports progress on training a word segmenter with SNS data.

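福嶋誠	1161010: D, interim presentation	池田和司, 松本裕治, 川人光男
title: Development of a method to jointly reconstruct current sources and source-space effective connectivity over the whole brain from MEG
abstract: MEG, a recording of the tiny magnetic fields produced by the brain, is widely used in human neuroimaging studies owing to its non-invasiveness and its high temporal resolution on the order of milliseconds. However, the magnetic fields measured outside the head often spread broadly across the MEG sensors even when the brain activity is spatially focal and could be modeled as a small set of current sources. Thus, to discover the cortical locations of brain activity as well as the functional networks driving its temporal evolution, it is necessary to reconstruct current sources and source-space effective connectivity over the whole brain from the measurements. The previous approach is the so-called two-stage method, in which all current sources are first estimated and the effective connectivity is then estimated only among the active sources. The two-stage method has a problem: errors in the current source reconstruction can propagate to the subsequent effective connectivity estimation. To solve this error propagation problem, we propose a method that jointly reconstructs current sources and source-space effective connectivity. We applied the joint method and the two-stage method to synthetic MEG data generated from a network model composed of neural mass models. When the signal-to-noise ratio (SNR) was high, there was no essential difference between the estimates. When the SNR was low, some false-positive source activity and connectivity estimated by the two-stage method were not estimated by the joint method. These results indicate that jointly reconstructing source activity and connectivity over the whole brain improves the accuracy of both current source reconstruction and source-space effective connectivity estimation.
language of the presentation: Japanese
title (Japanese version): Development of a method for simultaneously estimating whole-brain current source intensities and the functional connectivity among current sources from MEG
abstract (Japanese version): MEG is a signal obtained by measuring the weak magnetic fields generated by the brain. Because it can be measured non-invasively and has a high temporal resolution on the order of milliseconds, it is widely used in human brain function imaging research. However, the magnetic fields observed outside the head by MEG are spatially spread out, even when the brain activity is spatially focal enough to be modeled by a small number of current sources. Therefore, to reveal where human brain activity occurs, and the pattern of functional connectivity in the brain that drives the temporal evolution of that activity, the current source intensities over the whole brain and the functional connectivity among the current sources must be estimated from the observed magnetic fields. So far, a two-stage approach has been taken, in which the current sources of the whole brain are first estimated and functional connectivity is then estimated only among the current sources that showed activity. With this approach, however, if the initially estimated current sources contain errors, those errors propagate to the subsequently estimated functional connectivity. To solve this problem, this study proposes a method that estimates current sources and functional connectivity simultaneously. We applied the simultaneous and two-stage methods to simulated MEG data generated from a network model composed of neural mass models. When the signal-to-noise ratio was high, there was no essential difference between the estimates of the two methods; when it was low, false positives in the activity and functional connectivity estimated by the two-stage method were not estimated by the simultaneous method. This shows that estimating whole-brain current source intensities and their functional connectivity simultaneously improves the accuracy of both current source estimation and functional connectivity estimation.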
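For readers outside the field, a generic formulation may help frame the two-stage versus joint distinction. It is an assumption, not necessarily the model used in this work: sensor data b(t) are a linear mixture of source currents j(t) through a lead-field matrix L, and effective connectivity among sources is modeled as a multivariate autoregressive (MAR) process with coefficient matrices A_p.

```latex
\begin{aligned}
\mathbf{b}(t) &= L\,\mathbf{j}(t) + \boldsymbol{\varepsilon}(t), \\
\mathbf{j}(t) &= \sum_{p=1}^{P} A_p\,\mathbf{j}(t-p) + \boldsymbol{\eta}(t).
\end{aligned}
```

Under this formulation, a two-stage method estimates j(t) from the first equation alone and then fits A_p to that estimate, so source errors pass directly into the connectivity; a joint method estimates j(t) and all A_p together, so the MAR dynamics also constrain the source estimates.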

Room: L2

Chair: 佐藤哲大

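大井学	1151136: M, second presentation	金谷重彦, 湊小太郎, Md.Altaf-Ul-Amin
title: Role of codon usage bias in metabolism of Arabidopsis thaliana
abstract: Codon choice is adapted to the tRNA pool for efficient protein translation. In unicellular organisms, highly expressed genes show strict use of particular codons. This is in contrast to multicellular organisms, where codon choice differs among genes in the same species. For example, in unicellular organisms the G+C content of the whole genome is reflected in the third codon position, whereas in human genes the preference for A+T or G+C fluctuates. The diversity of codon choice is attributed to the variety of gene functions and the complexity of genetic information transmission. Secondary metabolism in plants is adapted to suit survival strategies. It is therefore expected that codon usage bias in plant genes depends on the metabolism in which they are involved. In this study, we analyzed codon usage in Arabidopsis thaliana to examine the relation between codon choice and metabolism. We measured codon usage bias in 27206 coding sequences obtained from The Arabidopsis Information Resource (TAIR) and used principal component analysis (PCA) to characterize it. The results show that the first principal component (PC1) is associated with G+C content, whereas PC2 relates to arginine codon preference, and we observed different distributions along PC1 for genes involved in primary metabolism and genes involved in secondary metabolism. This suggests that translation efficiency may be adjusted in each type of metabolism by altering tRNA preference, thereby controlling protein production levels. In particular, ribosomal genes and genes involved in phenylpropanoid and brassinosteroid metabolism show similar distributions, which may indicate that secondary metabolism produces large amounts of metabolites.
language of the presentation: Japanese
title (Japanese version): Elucidating the role of codon bias in the metabolism of Arabidopsis thaliana
abstract (Japanese version): Multiple codons that encode the same amino acid are called synonymous codons. The usage frequencies of synonymous codons show a characteristic, species-specific bias. In particular, in prokaryotes and lower eukaryotes such as yeast, constantly expressed genes such as ribosomal genes choose codons matched to tRNA abundance so that proteins are translated efficiently. In higher eukaryotes, on the other hand, the choice of synonymous codons differs among genes of the same species. For example, in prokaryotes and lower eukaryotes the G+C content of the whole genome is reflected in the third codon position, whereas in human genes the preference for AT or GC has been reported to fluctuate. The diversity of codon choice within a species is attributed to the diverse gene functions of higher eukaryotes and their complex regulatory mechanisms for transmitting genetic information. Especially in plants, which carry out diverse secondary metabolism according to their survival strategies, the usage frequency of synonymous codons is expected to depend on the metabolism in which a gene is involved. In this study, therefore, to understand the relationship between codon choice and metabolism in plants, we analyzed codon usage in Arabidopsis thaliana, a representative model plant. We computed synonymous codon usage frequencies for the 27206 CDSs (coding sequences) registered in TAIR (The Arabidopsis Information Resource) and performed principal component analysis. The first principal component correlated with the G+C content of the genes, and arginine codons were found to strongly influence the second principal component. Furthermore, we examined the tendency along each principal component using the functional classification of genes by the Arabidopsis Gene Classifier. Along the first principal component, genes involved in primary metabolism, which is essential for survival, and genes involved in secondary metabolism showed different tendencies. This suggests that each type of metabolism may regulate protein production by changing tRNA preference to adjust the efficiency of protein production. In particular, genes involved in phenylpropanoid and brassinosteroid metabolism showed tendencies similar to those of ribosomal genes, which is thought to indicate that secondary metabolism produces large amounts of metabolites.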
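Because PC1 is reported to track G+C content, and the third codon position is singled out, a small sketch of how overall GC and third-position GC (GC3) can be computed per coding sequence may be useful. It assumes an in-frame CDS given as an A/C/G/T string; the function name is illustrative and not taken from the study.

```python
from typing import Tuple

def gc_content(cds: str) -> Tuple[float, float]:
    """Return (GC, GC3) for an in-frame coding sequence.

    GC is the fraction of G or C over the whole sequence; GC3 is the
    fraction of G or C at the third position of each complete codon.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon for GC3
    if usable == 0:
        raise ValueError("sequence is shorter than one codon")
    gc = sum(base in "GC" for base in seq) / len(seq)
    third_positions = seq[2:usable:3]
    gc3 = sum(base in "GC" for base in third_positions) / len(third_positions)
    return gc, gc3
```

For "ATGGCCAAA" this gives GC = 4/9 and GC3 = 2/3 (third positions G, C, A). Computing such values for every CDS allows them to be correlated with PC1 scores.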

太田公平	1151137: M, second presentation	金谷重彦, 湊小太郎, Md.Altaf-Ul-Amin
title: Analysis of rare codons in microbial genomes
abstract: Rare codons are known to affect the efficiency of protein synthesis. They are difficult to analyze experimentally because the genes and regulatory mechanisms that use them are rarely expressed. To address this, we gathered a large number of microbial genomes. We characterized rare codons using the relative synonymous codon usage (RSCU) score, which quantifies codon bias. In the presentation, I report the analysis methods and their results.
language of the presentation: Japanese
title (Japanese version): Rare codon analysis in microbial genomes
abstract (Japanese version): Research on rare codons involved in the regulation of expression is difficult because the genes and regulatory mechanisms that use them are rarely expressed. We therefore characterize rare codons by analyzing a large number of microbial genomes. As the rare codon analysis method, we used RSCU (relative synonymous codon usage), an index that quantifies codon bias. In the presentation, I report the analysis method and the results obtained.
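As a concrete reference for the RSCU score, the sketch below uses its standard definition: the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid, so 1.0 means no bias and values well below 1.0 mark codons used less often than expected. It assumes the standard genetic code and in-frame CDS strings, and excludes stop codons; the function name is illustrative, not the study's code.

```python
from collections import Counter
from itertools import product
from typing import Dict, List

# Standard genetic code: codons enumerated in TCAG order, third base fastest.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def rscu(sequences: List[str]) -> Dict[str, float]:
    """Compute RSCU for every sense codon over a set of coding sequences.

    RSCU = observed count / (mean count over the amino acid's synonymous
    codons). Amino acids that never occur are omitted (RSCU undefined).
    """
    counts: Counter = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in CODE:  # skip codons containing ambiguous bases
                counts[codon] += 1
    families: Dict[str, List[str]] = {}
    for codon, aa in CODE.items():
        if aa != "*":  # stop codons are left out of the synonymous families
            families.setdefault(aa, []).append(codon)
    result: Dict[str, float] = {}
    for aa, codons in families.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        mean = total / len(codons)
        for c in codons:
            result[c] = counts[c] / mean
    return result
```

For the single sequence ATG CTG CTG CTG TTA, leucine has six synonymous codons and four observations, so CTG gets 3 / (4/6) = 4.5, TTA gets 1.5, and the unused leucine codons get 0.0; ATG (methionine, a single-codon family) is always 1.0.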

中谷淳至	1151139: M, second presentation	金谷重彦, 湊小太郎, Md.Altaf-Ul-Amin
title: Sequence analysis of plant terpene synthases based on the characteristics of the compounds
abstract: In drug discovery, chemoinformatics approaches are used to efficiently analyze the characteristics of drug candidates with information technology. Terpenes are widely used in pharmaceuticals and industrial materials. However, it is difficult to obtain an ample supply of these compounds from plants because their synthesis mechanisms are not clear. There is thus a need for stable artificial production of terpenes. The purpose of this study is to understand the reactions in terpene synthesis. Here we present a sequence analysis of terpene synthases based on the characteristics of terpene compounds. First, using chemoinformatics techniques, we calculated the molecular weight, number of rings, number of rotatable bonds, number of hydrogen bond acceptors, number of hydrogen bond donors, polar surface area, and logP value from the structural formulas. We then performed principal component analysis on these characteristic data. In addition, we report on the progress of the enzyme sequence analysis guided by the results of the principal component analysis.
language of the presentation: Japanese
title (Japanese version): Sequence analysis of plant terpene synthases based on compound characteristics
abstract (Japanese version): In drug discovery research, chemoinformatics methods are used to analyze compound characteristics efficiently with information technology. Terpenes are widely used as pharmaceuticals and industrial raw materials, but their content in plants is low, so stable artificial production is needed. The aim of this study is to elucidate the reaction mechanisms of terpene synthesis through sequence analysis of the synthases based on the characteristics of terpene compounds. Using chemoinformatics methods, we computed the molecular weight, number of rings, number of rotatable bonds, number of hydrogen bond acceptors, number of hydrogen bond donors, polar surface area, and LogP value from the structural formulas of the compounds, and we describe the results of principal component analysis on these characteristic data. Furthermore, we report the progress of the enzyme sequence analysis guided by the results of the principal component analysis.
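The PCA step can be sketched as follows, assuming the seven descriptors have already been computed with a chemoinformatics toolkit (not shown) and stored as one row per compound. Because the descriptors have very different units (molecular weight versus counts versus logP), each column is standardized first; PC1 is then the top eigenvector of the resulting correlation matrix, found here by power iteration. This is an illustrative pure-Python sketch; a real analysis would typically use a linear algebra library and keep several components.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]

def standardize(rows: Matrix) -> Matrix:
    """Scale each descriptor column to zero mean and unit variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    # A constant column has zero spread; dividing by 1.0 leaves it all zeros.
    sds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) or 1.0
        for j in range(d)
    ]
    return [[(r[j] - means[j]) / sds[j] for j in range(d)] for r in rows]

def first_component(rows: Matrix, iters: int = 500) -> Tuple[List[float], List[float]]:
    """Return (PC1 loadings, PC1 scores) via power iteration on the covariance
    of the standardized data. Assumes the start vector [1, ..., 1] is not
    orthogonal to PC1."""
    z = standardize(rows)
    n, d = len(z), len(z[0])
    cov = [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(d)] for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(d)) for r in z]
    return v, scores
```

The loadings show which descriptors dominate PC1, and the scores place each compound along it. Compounds with similar scores can then serve as groups for comparing the corresponding synthase sequences.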

Room: L3

Chair: 伊原彰紀

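岩村祐佳	1151016: M, second presentation	松本健一, 山田敬嗣, 小西琢
title: Development of a ride-sharing system that provides different information based on personal traits
abstract: This study aims to develop a system that encourages ride-sharing. In particular, I propose a mechanism that, at the stage of matching a driver with passengers, changes the information presented according to the passengers' personal traits. In Japan, ride-sharing is less widespread than in Europe and North America. In existing matching services, the information published by drivers does not take into account the information passengers need. I therefore investigated what information passengers need. I will present the results of a factor analysis, a system proposal based on them, and future issues.
language of the presentation: Japanese
title (Japanese version): Development of a system that promotes ride-sharing by providing information based on personal traits
abstract (Japanese version): The aim of this study is to develop a system that promotes ride-sharing. In particular, I propose a mechanism that, at the stage of matching drivers with ride seekers (users), changes the information presented according to the users' personal traits. Ride-sharing is less widespread in Japan than in Europe and North America. One possible cause is that, in existing matching services, the information published by drivers may not take users' needs into account. In this study, I therefore used a questionnaire survey to identify what information users want when searching for a ride partner. In this presentation, I describe the results of a factor analysis of the questionnaire, a system proposal based on them, and future issues.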

石村慎悟	1151010: M, second presentation	飯田元, 松本健一, 市川昊平, 吉田則裕
title: Supporting performance comparison of code clone detection tools
abstract: Duplicated code (code clones) created by copy and paste is one of the factors that degrade the maintainability of source code. Finding code clones manually by visually inspecting large-scale software projects would require enormous effort and is clearly unrealistic. To address this issue, various automatic code clone detection tools have been proposed. Comparing the performance of these tools is indispensable for the future development of code clone detection technology. Currently, however, the authors of these tools often evaluate performance with their own, differing metrics, so the results are not comparable. In this study, we propose a common format that can represent the detection results of every tool. Our final goal is to build a comparison and aggregation tool based on this common format to support the evaluation of code clone detection tools.
language of the presentation: Japanese
title (Japanese version): Supporting performance comparison of code clone detection tools
abstract (Japanese version): One factor that degrades the maintainability of source code is duplicated code (code clones) created by copy and paste. Finding code clones manually in large-scale source code requires enormous effort and is unrealistic. To eliminate this effort, various automatic code clone detection tools have been proposed. Comparing the performance of these tools is an essential task for the future development of code clone detection technology. Currently, however, performance is often evaluated by each tool's developers with their own metrics, and few results are given in a comparable form. In this study, we therefore define a common format that can comprehensively represent the detection result formats of the individual tools. Furthermore, we aim to build a comparison and aggregation tool that uses this common format, to support evaluation work.
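The abstract does not specify the common format, so the following is a hypothetical sketch of the idea: every tool's output is converted into clone pairs of (file, start line, end line) fragments, pairs are normalized so that (a, b) and (b, a) are the same pair, and a tool is then compared against a reference set with precision and recall. The field names and the exact-match criterion are assumptions; clone benchmarks commonly use looser, overlap-based matching between fragments.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple

@dataclass(frozen=True, order=True)
class Fragment:
    """A code fragment: file path and inclusive line range (hypothetical schema)."""
    path: str
    start: int
    end: int

ClonePair = Tuple[Fragment, Fragment]

def normalize(pairs: Iterable[ClonePair]) -> Set[ClonePair]:
    """Order the fragments in each pair so (a, b) and (b, a) compare equal."""
    return {tuple(sorted(p)) for p in pairs}

def precision_recall(
    found: Iterable[ClonePair], reference: Iterable[ClonePair]
) -> Tuple[float, float]:
    """Compare one tool's clone pairs with a reference set by exact match."""
    f, r = normalize(found), normalize(reference)
    hits = len(f & r)
    precision = hits / len(f) if f else 0.0
    recall = hits / len(r) if r else 0.0
    return precision, recall
```

Once several tools' outputs are converted into the same `Fragment`-pair representation, their precision and recall against a shared reference can be tabulated side by side, which is the kind of comparison the proposed aggregation tool aims to support.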

濱崎一樹	1151084: M, second presentation	飯田元, 松本健一, 市川昊平, 吉田則裕
title: An Analysis of Review Process Quality for Open Source Projects
abstract: Software review is a quality assurance process for discovering coding rule violations and defects in design documents and source code. If carried out correctly, it leads to the discovery and correction of defects at early stages of the software development process. It has been reported that about 60% of defects can be discovered by software reviews. Previous studies have analyzed only the defects found during a particular review and have not accounted for escaped defects. Escaped defects are defects that passed a review but are later either found in a subsequent review or reported as new defects. In this study, we evaluate peer review quality based on escaped defects. We propose a method for extracting them using data integrated from code review management, version control, and bug management systems. In this presentation, we describe how the proposed method can be used to investigate review process quality.
language of the presentation: Japanese
title (Japanese version): Analysis of the review process in open source software
abstract (Japanese version): In software development, review is a process in which people read design documents and source code and inspect them for problems such as design errors, coding mistakes, and coding rule violations. Reviews enable early discovery and correction of defects, and it has been reported that about 60% of defects can be found by review. Because records were insufficient, previous studies evaluated review process quality by focusing on the defects found during review. In this study, we therefore focus on defects that passed review and propose a method for extracting them using information from review management systems, version control systems, and bug management systems, which have come into use in recent years. We also describe a method for investigating review process quality using the proposed method.
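A hypothetical sketch of the extraction idea, assuming the three systems have already been integrated into simple records: each approved review lists the files it covered, and each later bug report lists the files its fix modified. A bug is flagged as a candidate escaped defect of a review when it was reported after the review's approval and its fix touches a file that review covered. The record types and the file-overlap heuristic are illustrative assumptions; an actual integration would link records through commit IDs and issue numbers and would need to filter out coincidental overlaps.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Set

@dataclass
class Review:
    review_id: str
    approved_on: date
    files: Set[str]  # files covered by the approved change

@dataclass
class Bug:
    bug_id: str
    reported_on: date
    fixed_files: Set[str]  # files modified by the bug fix

def escaped_defects(reviews: List[Review], bugs: List[Bug]) -> Dict[str, List[str]]:
    """Map each review ID to bugs reported after its approval whose fixes
    touch files the review covered (candidate escaped defects)."""
    result: Dict[str, List[str]] = {r.review_id: [] for r in reviews}
    for bug in bugs:
        for review in reviews:
            if review.approved_on < bug.reported_on and review.files & bug.fixed_files:
                result[review.review_id].append(bug.bug_id)
    return result
```

Counting candidate escaped defects per review, relative to the defects found during that review, gives one possible indicator of how much a review missed.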

山田悠太	1151112: M, second presentation	飯田元, 松本健一, 宮崎純, 市川昊平
title: Visualization of developers' activity history based on topic analysis
abstract: In software engineering, developers' activities are considered to influence software quality. In practice, however, keeping a record of each developer's extensive activities is a troublesome task. Normally, software projects record development artifacts, such as source code and bug tracking reports, in software repositories. To estimate developers' activities, I propose a method using topic analysis. First, I extract topics from the comments and identifiers in the source code by means of latent Dirichlet allocation (LDA). Next, based on these results, I estimate each developer's activities. I will present the results of applying the proposed technique to an open source software project.
language of the presentation: Japanese
title (Japanese version): Visualization of developer activity history through topic analysis
abstract (Japanese version): In software engineering, the activities of individual developers in a software development project are thought to affect software quality. However, it is difficult to record every activity of each developer. Software development normally uses configuration management systems, which record the artifacts produced during development, such as source code and the history of defect fixes. We therefore extract topics, using latent semantic analysis, from the comments and identifiers in the source code stored in the management system. Next, we estimate developers' activities by matching the topics of the artifacts at the time of each change against the developer who made that change. In the presentation, I introduce the results of applying the method to an open source software project.
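The second step, matching the topics of changed artifacts with the developers who changed them, can be sketched as follows. It assumes a topic model (e.g., LDA) has already produced a topic distribution for each file snapshot at each revision; a developer's activity profile is then the normalized sum of the distributions of the snapshots they changed. The record layout and names are hypothetical, not the presented method's data model.

```python
from typing import Dict, List, Tuple

def developer_profiles(
    changes: List[Tuple[str, str, str]],
    topics: Dict[Tuple[str, str], List[float]],
) -> Dict[str, List[float]]:
    """Build a normalized topic profile for each developer.

    changes: (developer, revision, file path) records from version control.
    topics:  topic distribution of each (revision, file path) snapshot,
             e.g. inferred by a topic model from comments and identifiers.
    """
    sums: Dict[str, List[float]] = {}
    for developer, revision, path in changes:
        dist = topics[(revision, path)]
        acc = sums.setdefault(developer, [0.0] * len(dist))
        for k, p in enumerate(dist):
            acc[k] += p
    # Normalize so each profile sums to 1 and developers are comparable.
    return {dev: [x / sum(acc) for x in acc] for dev, acc in sums.items()}
```

Restricting `changes` to a time window before aggregating yields per-period profiles, which could then be plotted over time as an activity history.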