Development of an Integrated System for Clustering of Simple and Bipartite Graphs and its Application to Different Biological Data

Mohammad Bozlul Karim


Presentation abstract

We present our research work in five chapters.

The first chapter describes the biclustering algorithm BiClusO, which operates on the matrix representation of a simple bipartite graph. At the end of the chapter, we also introduce two parameters that control the overlap between biclusters.

In the second chapter, we compare BiClusO with five other biclustering algorithms using two biological datasets and four synthetic datasets. We explain the comparison methods and how the synthetic datasets were created. For the biological datasets, we use the hypergeometric test and GO-term analysis to assess the enrichment of each bicluster. For the synthetic datasets, we use the average module recovery score and the average cluster relevance score to assess the similarity between the actual biclusters and the generated biclusters.
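The two synthetic-data scores can be illustrated with a minimal sketch. It assumes that the similarity between two biclusters is the Jaccard index over their matrix cells and that each score is an average of best matches; the exact definitions used in the chapter may differ, and the names `cell_jaccard`, `average_best_match`, `recovery_score` and `relevance_score` are hypothetical.

```python
from typing import FrozenSet, List, Tuple

# A bicluster is a pair (row nodes, column nodes) of the bipartite graph.
Bicluster = Tuple[FrozenSet[str], FrozenSet[str]]


def cell_jaccard(x: Bicluster, y: Bicluster) -> float:
    """Jaccard index over the matrix cells covered by two biclusters (assumed measure)."""
    shared = len(x[0] & y[0]) * len(x[1] & y[1])
    total = len(x[0]) * len(x[1]) + len(y[0]) * len(y[1]) - shared
    return shared / total if total else 0.0


def average_best_match(source: List[Bicluster], target: List[Bicluster]) -> float:
    """Average, over the source biclusters, of the best similarity found in target."""
    if not source or not target:
        return 0.0
    return sum(max(cell_jaccard(s, t) for t in target) for s in source) / len(source)


def recovery_score(actual: List[Bicluster], generated: List[Bicluster]) -> float:
    """How well the actual (planted) biclusters are recovered by the generated ones."""
    return average_best_match(actual, generated)


def relevance_score(actual: List[Bicluster], generated: List[Bicluster]) -> float:
    """How well the generated biclusters correspond to actual ones."""
    return average_best_match(generated, actual)


# Toy data: one planted bicluster, found exactly, plus one spurious bicluster.
actual = [(frozenset({"r1", "r2"}), frozenset({"c1", "c2"}))]
generated = [
    (frozenset({"r1", "r2"}), frozenset({"c1", "c2"})),
    (frozenset({"r3"}), frozenset({"c3"})),
]
print(recovery_score(actual, generated), relevance_score(actual, generated))
```

In this toy case the planted bicluster is fully recovered, while the spurious bicluster lowers the relevance score, which is why the two scores are reported together.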

In the third chapter, we apply BiClusO together with the DPClusO algorithm to species-VOC (volatile organic compound) bipartite data. We apply Fisher's exact test to the species side of the biclusters and show that taxonomic classification is consistent with VOC-based classification. We also create structurally similar VOC groups (SSVGs) by applying DPClusO, map them to the VOC side of each bicluster, and show that plant-fungi, plant-bacteria, and eukaryotes-prokaryotes produce distinctive types of VOCs. We also assess the diversity of VOC pathways across different kingdoms of species.
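As an illustration of the statistical test on the species side, the following minimal sketch computes a one-sided Fisher's exact test p-value, which is equivalent to the upper tail of the hypergeometric distribution. It assumes a one-sided (over-representation) test; the function name `enrichment_p_value` and the numbers in the example are hypothetical and are not taken from the thesis data.

```python
from math import comb


def enrichment_p_value(population: int, class_size: int,
                       cluster_size: int, class_in_cluster: int) -> float:
    """P(X >= class_in_cluster) when cluster_size species are drawn without
    replacement from `population` species, `class_size` of which belong to
    the taxonomic class of interest (one-sided Fisher's exact test)."""
    upper = min(cluster_size, class_size)
    # Count the ways to obtain at least `class_in_cluster` members of the class.
    favourable = sum(
        comb(class_size, i) * comb(population - class_size, cluster_size - i)
        for i in range(class_in_cluster, upper + 1)
    )
    return favourable / comb(population, cluster_size)


# Hypothetical example: 10 species in total, 3 of them fungi,
# and a bicluster of 3 species that are all fungi.
p = enrichment_p_value(population=10, class_size=3,
                       cluster_size=3, class_in_cluster=3)
print(p)
```

A small p-value indicates that the taxonomic class is over-represented in the bicluster relative to chance, which is the sense in which a VOC-based grouping can be called consistent with taxonomy.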

In the fourth chapter, we apply BiClusO to miRNA-mRNA bipartite networks and isolate the sub-MRMs (miRNA regulatory modules) of inflammatory bowel disease (IBD). We use four different sources of miRNA-mRNA bipartite networks. After isolating the sub-MRMs, we rank each miRNA by its relevance score and its occupancy in each dataset. We also evaluate the relevance of our top-ranking miRNAs to IBD or related diseases using published literature and miRNA-disease network analysis.

In the fifth chapter, we describe the implementation details of the GUI-based system for DPClusO and BiClusO. We explain user interaction and parameter selection using two illustrative examples. The system provides simple clustering, biclustering, hierarchical relations among simple clusters, hierarchical relations among biclusters, cluster joining, cluster filtering, visualization of the cluster set using different layouts, partial visualization of a single cluster, etc.

Abstract

Network analysis, particularly graph clustering, has become a useful and important technique in data mining applications. It provides a global view of the data structure, in which densely connected data are grouped based on their common properties. We propose a novel biclustering approach called BiClusO. We compare our biclustering algorithm with five other algorithms using biological and synthetic data and evaluate their performance. Our algorithm outperforms the five selected biclustering algorithms. We also present new integrated software that implements the DPClusO and BiClusO algorithms for simple and bipartite graph clustering. This tool provides the user with GUI-based facilities for simple and bipartite graph clustering, along with filtering and amalgamation of clusters, hierarchical node analysis, node distribution among the cluster set, and visualization of all or part of a large cluster set. We used this tool to analyze the bipartite relations between species and volatile organic compounds (VOCs). VOCs emitted by different species have huge environmental and ecological impacts. The biosynthesis of VOCs depends on different metabolic pathways, based on which the species can be categorized. Our experiment shows that VOC-based classification is consistent with taxonomy-based classification of the species. We assessed the diversity of VOC pathways across different species classes by using structurally similar VOC groups (SSVGs). We also analyzed mRNA-miRNA bipartite relations. In such relations, a miRNA regulatory module (MRM) is a subset of miRNA-target interactions (MTIs) in which a group of miRNAs act cooperatively, regulating a set of genes to control different biological processes. We focused mainly on detecting MRMs from MTIs involving IBD-related genes. We evaluated the relevance of the miRNAs to IBD by counting their occurrences in different MRMs and their interactions with known IBD genes. Finally, we identified several important IBD-related miRNAs.