蓬莱 尚幸 "Studies on Prediction of Protein Function Based on Oligopeptides"

Presentation Abstract

Predicting protein function from sequence is one of the important research topics in bioinformatics. Separately, the statistical characteristics of oligopeptides, which are relatively short subsequences of proteins, have been investigated. The main result of this research is a new prediction method based on oligopeptides, and the research demonstrates that oligopeptides make it possible to develop an effective method for predicting a variety of protein functions. A known function of a protein is regarded as being inherited by its oligopeptides, and the correspondence between oligopeptides and the function is calculated over the whole set of proteins. The proposed method then uses this correspondence to predict unknown functions of proteins automatically.
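
The idea of inheriting a function to oligopeptides and then predicting from the resulting correspondence can be sketched as follows. This is only a minimal illustration, not the scoring used in the research: the abstract does not give the formula, so the correspondence here is assumed to be the fraction of proteins containing an oligopeptide that are annotated with the function, and the prediction score is assumed to be the mean of those fractions. The function names `oligopeptides`, `train`, and `score` are hypothetical.

```python
from collections import defaultdict


def oligopeptides(sequence, k):
    """Return the set of distinct length-k subsequences of a protein sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def train(proteins, k):
    """Build an oligopeptide-to-function correspondence table.

    proteins: iterable of (sequence, has_function) pairs.
    Assumption: the correspondence of an oligopeptide is the fraction of
    proteins containing it that are annotated with the function.
    """
    total = defaultdict(int)
    positive = defaultdict(int)
    for sequence, has_function in proteins:
        for pep in oligopeptides(sequence, k):
            total[pep] += 1
            if has_function:
                positive[pep] += 1
    return {pep: positive[pep] / total[pep] for pep in total}


def score(sequence, table, k):
    """Assumed score: mean correspondence over the query's known oligopeptides.

    Returns 0.0 when none of the query's oligopeptides appear in the table.
    """
    values = [table[p] for p in oligopeptides(sequence, k) if p in table]
    return sum(values) / len(values) if values else 0.0


# Toy data, invented purely for illustration.
training = [("MKTAYIAK", True), ("MKTAYLLL", True), ("GGSGGSGG", False)]
table = train(training, k=5)
query_score = score("MKTAYQQQ", table, k=5)
```

A protein would be predicted to have the function when its score exceeds a chosen threshold; sweeping that threshold is what produces a recall-precision curve.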

Broad Applicability and High Performance: The prediction performance of the method is measured for several functions, including GO terms and enzyme activities, using recall-precision graphs over all 28,520 human proteins registered in RefSeq. For most GO terms, it achieves 70% recall at 80% precision. The proposed method is applicable at a broad range of levels, from a specific enzyme to a large class of enzymes such as 'transferases' (EC 2.-.-.-). For some enzyme activities, the maximum f-measure exceeds 0.9. These evaluation results suggest that the proposed method is quite effective for a variety of protein functions.
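
For readers unfamiliar with the metrics, the sketch below shows how precision, recall, and f-measure could be computed at a single score threshold. It assumes the f-measure is the balanced harmonic mean of precision and recall (F1), which the abstract does not state explicitly; the function names are hypothetical.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall(scores, labels, threshold):
    """Compute precision and recall when predicting 'has function'
    for every protein whose score is at or above the threshold.

    scores: list of floats; labels: list of bools (True = has the function).
    """
    predicted = [s >= threshold for s in scores]
    tp = sum(p and l for p, l in zip(predicted, labels))
    fp = sum(p and not l for p, l in zip(predicted, labels))
    fn = sum((not p) and l for p, l in zip(predicted, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy scores and labels, invented purely for illustration.
p, r = precision_recall([0.9, 0.8, 0.4, 0.2], [True, False, True, False], 0.5)
```

Under the F1 assumption, 70% recall at 80% precision corresponds to an f-measure of about 0.747. The maximum f-measure is the largest value obtained as the threshold is varied.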

Better than Other Methods: To assess the performance of the proposed method objectively, the research includes a comparison with existing prediction methods based on homology search and pattern matching. For instance, in predicting protein-tyrosine kinases, the f-measures of homology search and pattern matching are 0.860 and 0.297, respectively, while the proposed method based on oligopeptides reaches a maximum f-measure of 0.932. These evaluation results suggest that the proposed oligopeptide-based method is more effective than methods based on homology search and pattern matching.

Consideration on Length of Oligopeptides: The research also characterises the relation between the length of oligopeptides and the prediction of protein functions. Prediction performance is measured for oligopeptide lengths from 1 to 9. The results suggest that: 1) oligopeptides shorter than 4 are clearly less effective, 2) oligopeptides longer than 4 are almost equally effective, and 3) oligopeptides of length 4 are intermediate. Prediction based on oligopeptides utilises the coexistence of oligopeptides among proteins. Longer oligopeptides are more versatile than shorter ones because they are more varied, while the degree of coexistence is inversely related to length. These evaluation results suggest that a length of 5 is quite effective and versatile.
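
The relation between length, variety, and coexistence can be illustrated with a small sketch. It assumes the 20 standard amino acids, so that the number of possible oligopeptides of length k is 20^k, and it uses the Jaccard index of two proteins' oligopeptide sets as a stand-in for "degree of coexistence"; that measure is an assumption made here for illustration and is not taken from the research.

```python
def possible_oligopeptides(k, alphabet_size=20):
    """Number of distinct oligopeptides of length k over the given alphabet."""
    return alphabet_size ** k


def kmers(sequence, k):
    """Set of distinct length-k subsequences of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def shared_fraction(seq_a, seq_b, k):
    """Jaccard index of the two sequences' length-k oligopeptide sets
    (an assumed proxy for how strongly oligopeptides coexist)."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy sequences, invented purely for illustration.
for k in range(1, 6):
    print(k, possible_oligopeptides(k), shared_fraction("MKTAYIAK", "MKTAYLLL", k))
```

At length 5 there are 3,200,000 possible oligopeptides, so each one is far more specific than a single residue, while the fraction shared between the two toy sequences drops from 5/7 at length 1 to 1/7 at length 5.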