Clustering protein-protein interactions based on conserved domain similarities
Abstract
Protein interactions govern most cellular processes, including signal transduction, transcriptional regulation and metabolism. Saccharomyces cerevisiae is estimated to have 16,000 protein interactions. Apparently only a small number of these interactions were formed ab initio (invention); the rest were formed through gene duplications and exon shuffling (birth). Domains form the functional units of a protein and are responsible for most interaction births, since they can be recombined and rearranged much more easily than new interactions can be invented. Therefore, groups of functionally similar, homologous interactions that evolved through births are expected to share a certain domain signature. Several high-throughput techniques can detect interacting protein pairs, resulting in a rapidly growing corpus of protein interactions. Although there have been several efforts to computationally integrate these data with the literature and with other high-throughput data such as gene expression, the annotation of this corpus is inadequate for deriving interaction mechanisms and outcomes. Finding interaction homologies would allow us to annotate an unannotated interaction based on known, already annotated interactions, or to predict new ones. In this study we propose a probabilistic model for assigning interactions to homologous groups according to their conserved domain similarities. Based on this model, we have developed and implemented an Expectation-Maximization algorithm for finding the most likely grouping of an interaction set. We tested our algorithm on synthetic and real data, and our initial results are promising. Finally, we propose several directions for improving this work.
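The abstract does not spell out the probabilistic model, so the following is only a minimal sketch of how Expectation-Maximization can group interactions by domain content, not the model proposed in this work. It assumes, purely for illustration, that each interaction is encoded as a binary vector marking which conserved domains occur in the interacting pair, and that each homologous group is a Bernoulli distribution over those domains. The function name `em_bernoulli_mixture`, the four unnamed domain columns, and the toy data are all hypothetical.

```python
import math
import random


def em_bernoulli_mixture(X, k, n_iter=50, seed=0, eps=1e-6):
    """Fit a k-component Bernoulli mixture to binary rows X with EM.

    Illustrative assumption: X[i][j] is 1 if domain j occurs in interaction i.
    Returns (weights, theta, resp), where resp[i][c] is the posterior
    probability that interaction i belongs to group c.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    weights = [1.0 / k] * k
    # Random initial domain probabilities, kept away from 0 and 1.
    theta = [[rng.uniform(0.25, 0.75) for _ in range(d)] for _ in range(k)]
    resp = [[0.0] * k for _ in range(n)]

    for _ in range(n_iter):
        # E-step: posterior group membership of each interaction,
        # computed in log space for numerical stability.
        for i, x in enumerate(X):
            logp = []
            for c in range(k):
                lp = math.log(weights[c])
                for j in range(d):
                    p = theta[c][j]
                    lp += math.log(p) if x[j] else math.log(1.0 - p)
                logp.append(lp)
            m = max(logp)
            unnorm = [math.exp(v - m) for v in logp]
            z = sum(unnorm)
            resp[i] = [u / z for u in unnorm]

        # M-step: re-estimate group weights and per-domain probabilities.
        # eps is a small smoothing term that keeps every value strictly
        # inside (0, 1), so the logarithms above stay finite.
        for c in range(k):
            nc = sum(resp[i][c] for i in range(n))
            weights[c] = (nc + eps) / (n + k * eps)
            for j in range(d):
                on = sum(resp[i][c] for i in range(n) if X[i][j])
                theta[c][j] = (on + eps) / (nc + 2 * eps)

    return weights, theta, resp


# Toy data (made up): six interactions described by four domain columns.
X = [[1, 1, 0, 0]] * 3 + [[0, 0, 1, 1]] * 3
weights, theta, resp = em_bernoulli_mixture(X, k=2)
# Hard assignment: each interaction goes to its most probable group.
labels = [max(range(2), key=lambda c: r[c]) for r in resp]
```

The soft responsibilities in `resp` are what makes this a "most likely grouping" in the EM sense; the hard `labels` are derived from them only at the end. A real application would need a domain encoding and likelihood suited to interaction pairs, plus several random restarts, since EM only finds a local optimum.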