dc.contributor.advisor: Çiçek, A. Ercüment
dc.contributor.author: Kuru, Halil İbrahim
dc.date.accessioned: 2019-02-11T07:43:09Z
dc.date.available: 2019-02-11T07:43:09Z
dc.date.copyright: 2019-02
dc.date.issued: 2019-02
dc.date.submitted: 2019-02-08
dc.identifier.uri: http://hdl.handle.net/11693/49202
dc.description: Cataloged from PDF version of article. [en_US]
dc.description: Thesis (M.S.): Bilkent University, Department of Computer Engineering, İhsan Doğramacı Bilkent University, 2019. [en_US]
dc.description: Includes bibliographical references (leaves 54-73). [en_US]
dc.description.abstract: Protein-protein interaction (PPI) networks represent the possible set of interactions among proteins and thereby the genes that code for them. By integrating isolated signals on single genes, such as mutations or differential expression patterns, PPI networks have enabled various biological discoveries so far. Furthermore, even the connectivity patterns of proteins in such networks have proven highly informative for various prediction tasks involving proteins or genes. These tasks, however, require task-specific feature engineering. Graph embedding techniques, which learn a deep representation of the nodes in the network, provide a powerful alternative and obviate the need for this extensive feature engineering. In this study, we use graph embedding techniques on PPI networks in two independent machine learning tasks. The first part of the present work focuses on predicting gene essentiality. Using two different node embedding techniques, node2vec and DeepWalk, we present a classifier that uses only node embeddings as input and show that it can achieve an AUC of up to 88% in predicting human gene essentiality. The second part of the thesis proposes a novel representation of patients based on the pairwise rank order of patient protein expression values and protein interactions, which we abbreviate as PRER. Specifically, we use protein expression values to generate a patient-specific gene embedding that represents the expression of a protein relative to the other proteins in its neighborhood. The neighborhood is derived using a biased random-walk strategy. We first check whether a given protein is less or more expressed than the other proteins in its neighborhood for a specific tumor. Based on this, we generate a representation that not only captures the dysregulation patterns among the proteins but also accounts for the molecular interactions.
To test the effectiveness of this representation, we use PRER for the problem of patient survival prediction. Compared with representing patients by their individual protein expression features, the PRER representation demonstrates significantly superior predictive performance in 8 out of 10 cancer types. Proteins that emerge as important in PRER, as opposed to individual expression values, provide a valuable set of biomarkers with high prognostic value. Additionally, they highlight other proteins whose dysregulation patterns should be investigated further. [en_US]
dc.description.statementofresponsibility: by Halil İbrahim Kuru [en_US]
dc.format.extent: xvi, 94 leaves : charts (some color) ; 30 cm. [en_US]
dc.language.iso: English [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.subject: Graph representations [en_US]
dc.subject: Node embeddings [en_US]
dc.subject: Gene essentiality [en_US]
dc.subject: Network topological features [en_US]
dc.subject: Survival prediction [en_US]
dc.subject: Cancer [en_US]
dc.subject: Protein-protein interaction network [en_US]
dc.title: Graph embeddings on protein interaction networks [en_US]
dc.title.alternative: Protein etkileşim ağlarında çizge gömülümleri [en_US]
dc.type: Thesis [en_US]
dc.department: Department of Computer Engineering [en_US]
dc.publisher: Bilkent University [en_US]
dc.description.degree: M.S. [en_US]
dc.identifier.itemid: B159669

