Graph-based protein-protein interaction prediction in Saccharomyces cerevisiae

by Paradesi, Martin Samuel

Abstract (Summary)
The term 'protein-protein interaction (PPI)' refers to the study of associations between proteins as manifested through biochemical processes such as formation of structures, signal transduction, transport, and phosphorylation. PPI play an important role in the study of biological processes. Many PPI have been discovered over the years and several databases have been created to store the information about these interactions. von Mering (2002) states that about 80,000 interactions between yeast proteins are currently available from various high-throughput interaction detection methods. Determining PPI using high-throughput methods is not only expensive and time-consuming, but also generates a high number of false positives and false negatives. Therefore, there is a need for computational approaches that can help in the process of identifying real protein interactions.

Several methods have been designed to address the task of predicting protein-protein interactions using machine learning. Most of them use features extracted from protein sequences (e.g., amino acids composition) or associated with protein sequences directly (e.g., GO annotation). Others use relational and structural features extracted from the PPI network, along with the features related to the protein sequence. When using the PPI network to design features, several node and topological features can be extracted directly from the associated graph.

In this thesis, important graph features of a protein interaction network that help in predicting protein interactions are identified. Two previously published datasets are used in this study. A third dataset has been created by combining three PPI databases. Several classifiers are applied on the graph attributes extracted from protein interaction networks of these three datasets. A detailed study has been performed in this present work to determine if graph attributes extracted from a protein interaction network are more predictive than biological features of protein interactions. The results indicate that the performance criteria (such as Sensitivity, Specificity and AUC score) improve when graph features are combined with biological features.

Bibliographical Information:


School:Kansas State University

School Location:USA - Kansas

Source Type:Master's Thesis

Keywords:protein interactions machine learning bioinformatics computer science 0984


Date of Publication:01/01/2008

© 2009 All Rights Reserved.