Analysis of a large-scale protein-protein interaction network for predicting uncharacterized protein functions.


Although genomic and proteomic approaches have accumulated a huge amount of data that provide clues to protein function, a large number of proteins from the available fully sequenced genomes remain functionally uncharacterized. Ascribing function to these proteins is one of the most challenging problems of the post-genomic era. All omic approaches have intrinsic shortcomings, such as false positives and false negatives; furthermore, data emerging from any single omic approach can provide only crude indications of protein function. Therefore, methods for integrating data from various genomic and proteomic approaches are needed to build more robust biological hypotheses. To predict uncharacterized protein functions with high confidence, I introduce a strategy that integrates protein-protein interaction data and domain information.

Approaches that use protein-protein interaction data are considered among the most powerful for inferring the functions of uncharacterized proteins, because these data represent physical interactions between proteins. However, interpreting comprehensive protein-protein interaction data to predict uncharacterized protein functions has been challenging because the data contain many false positives and false negatives.

To overcome this problem, I have developed a method that extracts functionally similar proteins with high confidence by integrating protein-protein interaction data and domain information. I used this method to analyze publicly available data from Saccharomyces cerevisiae. I identified 1,042 functional associations involving 765 proteins, of which 86 (11.2%) had no previously ascribed function. In addition, I identified seven clusters of uncharacterized proteins that potentially represent new functions.
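
The abstract does not spell out how interaction data and domain information are combined, so the following is only an illustrative sketch of one way such an integration could work, not the method used in this study. It assumes a hypothetical rule: an interaction is retained only if at least one pair of domains carried by the two proteins recurs across a minimum number of observed interactions. The function names, the `min_support` parameter, and the toy protein and domain labels are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def domain_pairs(a, b, domains):
    """Return the unordered domain pairs formed between proteins a and b."""
    return {
        tuple(sorted(pair))
        for pair in product(domains.get(a, ()), domains.get(b, ()))
    }


def filter_interactions(interactions, domains, min_support=2):
    """Keep interactions backed by a domain pair seen in >= min_support interactions.

    Hypothetical filtering rule for illustration only; proteins without any
    domain annotation yield no domain pairs and are therefore dropped.
    """
    # Count, for every domain pair, how many interactions contain it.
    support = Counter()
    for a, b in interactions:
        support.update(domain_pairs(a, b, domains))

    # Retain an interaction if any of its domain pairs is sufficiently recurrent.
    return [
        (a, b)
        for a, b in interactions
        if any(support[p] >= min_support for p in domain_pairs(a, b, domains))
    ]


# Toy data (invented labels, not taken from the yeast data set).
interactions = [("P1", "P2"), ("P3", "P4"), ("P5", "P6")]
domains = {
    "P1": {"kinase"}, "P2": {"SH3"},
    "P3": {"SH3"}, "P4": {"kinase"},
    "P5": {"WD40"}, "P6": {"RING"},
}

kept = filter_interactions(interactions, domains)
print(kept)
```

In this toy example the first two interactions share the same domain pair and are retained, whereas the third is supported by a domain pair seen only once and is discarded. A recurrence threshold of this kind is one simple way to trade coverage for confidence when the raw interaction data are noisy.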

My method extracts functionally similar protein pairs more accurately than conventional methods, and it allows functions to be predicted for previously uncharacterized proteins with high confidence. I found that integrating protein-protein interaction data with domain information is a rational and reliable approach for predicting protein function. The approach can be applied to protein-protein interaction data from any species.