A Grammar-Based Approach to RNA Pseudoknotted Structure Prediction for Aligned Sequences

Nobuyoshi Mizoguchi (0951120)


RNA secondary structure prediction is one of the major topics in bioinformatics. Methods based on parsing algorithms for formal grammars are a promising approach to this problem. In this thesis, I propose a prediction method that uses a formal grammar called SMCFG (stochastic multiple context-free grammar). This grammar can express pseudoknots, an important substructure found in RNA. The method is based on comparative sequence analysis: it accepts a set of RNA sequences as input and predicts their common secondary structure. Unlike an existing method that uses SMCFG, the proposed method can determine the parameters of the grammar rules without training data, that is, without a large set of RNA sequences annotated with their secondary structures. Finally, I conducted experiments to compare its performance with that of {\it hxmatch}, one of the best-known existing methods for predicting pseudoknotted structures by comparative sequence analysis. The F-measures of the proposed method for eight RNA families are comparable to those of hxmatch.