Protein sequence modeling and transcription regulation network analysis towards big data biology

Kibinge Nelson Kipchirchir ( 1361017 )


Computer science plays a key role in the analysis of biological data. For example, to assess the similarity between genes (DNA), sequences of interest are mapped to each other using computational tools. As the cost of instruments such as those used in DNA sequencing has fallen, the amount of available data has increased correspondingly, further strengthening the need for computer science in biology. One recent trend in computational biology is the use of data from multiple sources and across various levels of biological study to understand integrated systems. This is especially true of RNA and protein studies.

In this work, we introduce two examples of integrated systems biology applications. We first describe an innovative approach to representing protein sequences in computer applications and examine how this tool can be used to incorporate the biological properties of amino acids into the computational analysis of proteins. We then describe an integrated pipeline for examining transcription regulation networks from gene expression data. This approach demonstrates the usefulness of biological objectivity in designing computer tools for analysing molecular data.