Graph embeddings on protein interaction networks

Kuru, Halil İbrahim

Graph embeddings on protein interaction networks

Files

thesis.pdf (1.37 MB)

Date

2019-02

Authors

Kuru, Halil İbrahim

Advisor

Çiçek, A. Ercüment

BUIR Usage Stats

16
views

69
downloads

Abstract

Protein-protein interaction (PPI) networks represent the possible set of interactions among proteins and thereby the genes that code for them. By integrating isolated signals on single genes such as mutations or differential expression patterns, PPI networks have enabled various biological discoveries so far. Furthermore, even the connectivity patterns of proteins in such networks have been proven to be highly informative for various prediction tasks involving proteins or genes. These tasks; however, require task specific feature engineering. Graph embedding techniques that learn a deep representation of the nodes on the network, provides a powerful alternative and obviate the need for this extensive feature engineering on the network. In this study we use graph embedding techniques on PPI networks in two independent machine learning tasks. The first part of the present work focuses on predicting gene essentiality. Using two different node embedding techniques, node2vec and DeepWalk, we present a classifier which only uses node embeddings as input and show that it can achieve up to 88 % AUC score in predicting human gene essentiality. The second part of the thesis proposes a novel representation of patients based on pairwise rank order of patient protein expression values and protein interactions, which we abbreviate as PRER. Specifically, we use the protein expression values of proteins, and generate a patient specific gene embedding to represent relative expression of a protein with other proteins in the neighborhood of that protein. The neighborhood is derived using a biased random-walk strategy. We first check whether a given protein is less or more expressed compared to the other proteins in their neighborhood for a specific tumor. Based on this we generate a representation that not only captures the dysregulation patterns among the proteins but also accounts for the molecular interactions. To test the effectiveness of this representation, we use PRER for the problem of patient survival prediction. When compared against the representation of patients with their individual protein expression features, PRER representation demonstrates significantly superior predictive performance in 8 out of 10 cancer types. Proteins that emerge as important in the PRER as opposed to individual expression values provide a valuable set of biomarkers with high prognostic value. Additionally, they highlight other proteins that should be further investigated for the dysregulation patterns.

Keywords

Graph representations, Node embeddings, Gene essentiality, Network topological features, Survival prediction, Cancer, Protein-protein interaction network

Degree Discipline

Computer Engineering

Degree Level

Master's

Degree Name

MS (Master of Science)

Permalink

http://hdl.handle.net/11693/49202

Collections

Graduate School of Engineering and Science

Language

English

Type

Thesis

Full item page

Graph embeddings on protein interaction networks

Files

Date

Authors

Editor(s)

Advisor

Supervisor

Co-Advisor

Co-Supervisor

Instructor

BUIR Usage Stats

Series

Abstract

Source Title

Publisher

Course

Other identifiers

Book Title

Keywords

Degree Discipline

Degree Level

Degree Name

Citation

Permalink

Published Version (Please cite this version)

Collections

Language

Type

Graph embeddings on protein interaction networks

Files

Date

Authors

Editor(s)

Advisor

Supervisor

Co-Advisor

Co-Supervisor

Instructor

BUIR Usage Stats

Share

Series

Abstract

Source Title

Publisher

Course

Other identifiers

Book Title

Keywords

Degree Discipline

Degree Level

Degree Name

Citation

Permalink

Published Version (Please cite this version)

Collections

Language

Type