Prediction of metabolite activities by repetitive clustering of structural-similarity-based networks

若松 信孝


A number of studies have investigated the relations between structures and activities of metabolites.

It has been proposed that structural similarity between metabolites implies activity similarity between them.

In light of this proposal, we present a method for predicting the activities of secondary metabolites based on the principle of association.

First, we determined the structural similarity scores between targeted metabolite pairs using the COMPLIG algorithm.

To increase the likelihood of obtaining clusters in which metabolites of known activity are overrepresented, we calculated structural similarity only for metabolite pairs in which the activity of at least one metabolite is known, and then selected the pairs whose similarity score exceeded a threshold (s > 0.95).
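
The pair selection described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `select_edges`, the dictionary layout, and the toy scores are assumptions, and the similarity scores are taken as given rather than computed with COMPLIG.

```python
def select_edges(scores, known, threshold=0.95):
    """Keep pairs with score > threshold in which at least one metabolite is annotated.

    scores: dict mapping (metabolite_a, metabolite_b) -> similarity score (assumed layout)
    known:  dict mapping metabolite -> set of known activities (assumed layout)
    """
    edges = []
    for (a, b), s in scores.items():
        if s > threshold and (a in known or b in known):
            edges.append((a, b))
    return edges


# Toy data, invented for illustration only.
scores = {
    ("m1", "m2"): 0.97,  # kept: m1 is annotated and the score exceeds 0.95
    ("m2", "m3"): 0.96,  # dropped: neither metabolite is annotated
    ("m3", "m4"): 0.99,  # dropped: neither metabolite is annotated
    ("m1", "m4"): 0.80,  # dropped: score does not exceed the threshold
}
known = {"m1": {"antibacterial"}}
edges = select_edges(scores, known)
```

The resulting edge list is the network that would then be passed to the clustering step.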

The network of such metabolite pairs was then clustered using the DPClusO algorithm.

Statistically significant cluster-activity pairs were then selected using the hypergeometric test.
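
As an illustration of this step, the one-sided hypergeometric p-value for a cluster-activity pair can be computed as below. The function name and parameter names are assumptions, as is the choice of background population (here, all metabolites in the network); the abstract does not state the significance level, so none is fixed in the sketch.

```python
from math import comb


def hypergeom_pvalue(N, K, n, k):
    """Probability of seeing at least k metabolites with a given activity in a cluster.

    N: metabolites in the background population (assumed: the whole network)
    K: metabolites in the population annotated with the activity
    n: metabolites in the cluster
    k: metabolites in the cluster annotated with the activity
    """
    upper = min(K, n)
    total = comb(N, n)
    # Sum the upper tail P(X = i) for i = k .. min(K, n).
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Toy numbers: 10 metabolites, 5 with the activity, and a cluster of 5
# in which all 5 members carry that activity.
p = hypergeom_pvalue(N=10, K=5, n=5, k=5)
```

A cluster-activity pair would be retained when its p-value falls below whatever cutoff the analysis adopts.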

The biological activities of unannotated metabolites were then predicted from the activities of the annotated metabolites in the clusters in which those activities were statistically overrepresented.

After the first round of biological activity prediction, we treated the metabolites with newly predicted activities as known metabolites and repeated the prediction process until no new predictions were made.
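
The repetition can be sketched as a loop that runs until a round adds nothing new. Everything here is hypothetical scaffolding: `iterate_predictions` and `toy_predict_once` are invented names, and the toy predictor simply copies activities across edges, standing in for the full similarity, DPClusO clustering, and hypergeometric testing round.

```python
def iterate_predictions(known, predict_once):
    """Repeat prediction, treating new predictions as known, until none are made.

    known:        dict mapping metabolite -> set of activities
    predict_once: callable taking the current annotations and returning a dict
                  of predicted activities (a stand-in for one full round)
    """
    known = {m: set(acts) for m, acts in known.items()}
    rounds = 0
    while True:
        predicted = predict_once(known)
        new = {m: acts for m, acts in predicted.items() if m not in known}
        if not new:
            return known, rounds
        known.update(new)
        rounds += 1


# Toy network: a chain m1 - m2 - m3 - m4 in which only m1 is annotated.
edges = [("m1", "m2"), ("m2", "m3"), ("m3", "m4")]


def toy_predict_once(known):
    """Toy stand-in: an unannotated metabolite inherits its annotated neighbour's activities."""
    predicted = {}
    for a, b in edges:
        for src, dst in ((a, b), (b, a)):
            if src in known and dst not in known:
                predicted.setdefault(dst, set()).update(known[src])
    return predicted


annotated, rounds = iterate_predictions({"m1": {"antibacterial"}}, toy_predict_once)
```

In this toy chain, each round annotates one further metabolite, which shows why repeating the process can reach metabolites that a single round cannot.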

In total, we predicted the biological activities of 945 metabolites whose activities were previously unknown.

Furthermore, we investigated activity-activity relationships by examining the trends in the activity predictions made in this study.