Towards inferring biologically informative protein-protein interactions
Abstract (Summary)
With the completion of the Human Genome Project, the study of proteins
and their functions has become a major focus of current biological research. Of particular
interest are their interactions, which are very important in determining cellular functions
because proteins seldom act alone. High throughput experiments have produced a large
volume of information about pairwise protein-protein interactions. However, the data
contain many false negatives (i.e., real interactions that were not observed) and false
positives (i.e., reported interactions that do not occur). Our aim in analyzing the pairwise
interaction data is to mine the coherent information and predict unobserved interactions
from the experimental interaction data.
As proteins are assumed to interact through their domains, which are considered
to be the building blocks of proteins, a domain-based approach for inferring interactions is
adopted. We propose a new learning framework that models the problem of interaction
inference as a constraint satisfiability problem and solves it as a linear program. To handle
the cases where multiple domains contribute to one interaction, a hyperclique-pattern-based
method is used to select domain combinations, each of which is then treated as a single
unit of interaction.
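The selection of domain combinations can be illustrated with a small sketch. It assumes the standard h-confidence measure for hyperclique patterns (the support of a domain set divided by the largest support of any single domain in it), treats each protein as a set of domains, and uses made-up domain labels D1 to D4. The function names and the threshold are illustrative assumptions, not the thesis's actual implementation.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of proteins whose domain set contains every domain in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)

def h_confidence(itemset, transactions):
    """h-confidence = supp(itemset) / max over single domains d of supp({d})."""
    max_single = max(support([d], transactions) for d in itemset)
    if max_single == 0:
        return 0.0
    return support(itemset, transactions) / max_single

def hyperclique_patterns(transactions, size, min_hconf):
    """Brute-force search: all domain combinations of the given size
    whose h-confidence reaches min_hconf (fine for toy data only)."""
    domains = sorted(set().union(*transactions))
    return [
        combo
        for combo in combinations(domains, size)
        if h_confidence(combo, transactions) >= min_hconf
    ]

# Hypothetical toy data: each protein is the set of domains it contains.
proteins = [
    {"D1", "D2", "D3"},
    {"D1", "D2"},
    {"D1", "D2", "D4"},
    {"D3"},
    {"D3", "D4"},
]

# Domain pairs that co-occur tightly enough to be treated as one unit.
units = hyperclique_patterns(proteins, size=2, min_hconf=0.8)
print(units)
```

In this toy data D1 and D2 always appear together, so the pair reaches an h-confidence of 1.0 and is the only combination kept. Every other pair co-occurs in just one protein and falls below the threshold.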
Domain-based approaches require a reasonable assignment of domains. However,
the ambiguity of domain definitions adds another layer of difficulty to the inference.
We thus investigate the consensus of domain definitions through the comparative mapping
of two types of domain definitions. In cases of disagreement, the functional and
evolutionary characteristics of the domains are examined to determine which domain
definition is biologically more informative.
One limitation shared by all domain-based interaction inference methods is that
domain composition is treated as the sole determining factor for interactions. However,
the presence of a pair of interacting domains in a pair of proteins only establishes the
potential for the two proteins to interact; in a real biological setting, it does
not guarantee that they will. We therefore use protein expression
profiles to filter out spurious interactions. Because each protein may participate
in a number of biological processes and thus will interact with different proteins at different
cellular stages, locally co-expressed protein clusters are discovered by biclustering
the time-series gene expression data.
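The idea of filtering by local co-expression can be sketched as follows. This is not a biclustering algorithm, which would discover subsets of genes and subsets of time points jointly. It is only a simplified pairwise check, assuming that two proteins count as locally co-expressed if their profiles have a high Pearson correlation over at least one contiguous window of time points. The window length, the correlation threshold, the protein labels and the expression values are all hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def locally_coexpressed(x, y, window, min_corr):
    """True if some contiguous window of time points has correlation >= min_corr."""
    return any(
        pearson(x[s:s + window], y[s:s + window]) >= min_corr
        for s in range(len(x) - window + 1)
    )

def filter_interactions(predicted, profiles, window=4, min_corr=0.9):
    """Keep only predicted pairs whose expression profiles are locally co-expressed."""
    return [
        (a, b)
        for a, b in predicted
        if locally_coexpressed(profiles[a], profiles[b], window, min_corr)
    ]

# Hypothetical time-series expression profiles (8 time points each).
profiles = {
    "A": [1, 2, 3, 4, 4, 1, 5, 2],
    "B": [2, 4, 6, 8, 1, 7, 2, 6],  # tracks A only over the first four time points
    "C": [7, 6, 5, 4, 4, 7, 3, 6],  # mirror image of A at every time point
}

# Hypothetical domain-based predictions to be filtered.
predicted = [("A", "B"), ("A", "C")]
kept = filter_interactions(predicted, profiles)
print(kept)
```

Here the pair (A, B) is kept even though the two profiles are not correlated over the whole series, because they move together during the first four time points. The pair (A, C) is dropped because no window shows co-expression.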
Bibliographical Information:
School: Pennsylvania State University
School Location: USA - Pennsylvania
Source Type: Master's Thesis