Graph embeddings on protein interaction networks
Author(s)
Advisor
Çiçek, A. Ercüment
Date
2019-02
Publisher
Bilkent University
Language
English
Type
Thesis
Abstract
Protein-protein interaction (PPI) networks represent the possible set of interactions
among proteins and, thereby, among the genes that code for them. By integrating
isolated signals on single genes, such as mutations or differential expression patterns,
PPI networks have enabled various biological discoveries. Furthermore,
even the connectivity patterns of proteins in such networks alone have proven
to be highly informative for various prediction tasks involving proteins or genes.
These tasks, however, require task-specific feature engineering. Graph embedding
techniques, which learn a deep representation of the nodes in the network, provide a
powerful alternative and obviate the need for extensive feature engineering on
the network. In this study, we use graph embedding techniques on PPI networks in
two independent machine learning tasks. The first part of the present work focuses
on predicting gene essentiality. Using two different node embedding techniques,
node2vec and DeepWalk, we present a classifier that uses only node embeddings
as input and show that it achieves an AUC of up to 88% in predicting human
gene essentiality.
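Both node2vec and DeepWalk begin by sampling random walks over the network: DeepWalk steps uniformly to a neighbor, while node2vec biases each step with a return parameter `p` and an in-out parameter `q`. The following is a minimal, stdlib-only sketch of such a walk generator, assuming an unweighted, undirected toy graph with hypothetical protein labels (`P1`, ...). The helper names (`build_graph`, `node2vec_walk`, `generate_walks`) are illustrative and not taken from the thesis. Setting `p = q = 1` makes every step uniform, which reduces to DeepWalk-style walks.

```python
import random
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected, unweighted adjacency map from (u, v) edge pairs."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def node2vec_walk(graph: Graph, start: str, length: int,
                  p: float, q: float, rng: random.Random) -> List[str]:
    """Sample one second-order biased walk (node2vec-style, unweighted edges)."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(graph[cur])  # sorted so a fixed seed is reproducible
        if not neighbors:
            break  # isolated node: the walk cannot continue
        if len(walk) == 1:
            # No previous node yet, so the first step is uniform.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)  # step back to the previous node
            elif nxt in graph[prev]:
                weights.append(1.0)      # stay at distance 1 from prev
            else:
                weights.append(1.0 / q)  # move outward, distance 2 from prev
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def generate_walks(graph: Graph, num_walks: int, length: int,
                   p: float = 1.0, q: float = 1.0, seed: int = 0) -> List[List[str]]:
    """Start num_walks walks from every node; p = q = 1 gives DeepWalk-style uniform walks."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)  # visit start nodes in a random order on each pass
        for node in nodes:
            walks.append(node2vec_walk(graph, node, length, p, q, rng))
    return walks


if __name__ == "__main__":
    # Toy PPI graph with hypothetical protein labels.
    toy = build_graph([("P1", "P2"), ("P2", "P3"), ("P3", "P1"), ("P3", "P4")])
    for w in generate_walks(toy, num_walks=1, length=5, p=0.5, q=2.0, seed=1):
        print(" -> ".join(w))
```

In a full pipeline, the walks would be treated as sentences for a skip-gram model (for example, gensim's `Word2Vec` with `sg=1`), and the resulting node vectors would be the only input to the essentiality classifier evaluated by AUC. Those steps are omitted here, and the hyperparameters used in the thesis are not stated in this abstract.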
The second part of the thesis proposes a novel representation of patients based
on the pairwise rank order of patient protein expression values and protein interactions,
which we abbreviate as PRER. Specifically, we use protein expression
values to generate a patient-specific gene embedding that represents the
expression of each protein relative to the other proteins in its
neighborhood. The neighborhood is derived using a biased random-walk strategy. We
first check whether a given protein is less or more expressed than the other
proteins in its neighborhood for a specific tumor. Based on this, we generate a
representation that not only captures the dysregulation patterns among the proteins
but also accounts for the molecular interactions. To test the effectiveness of
this representation, we apply PRER to the problem of patient survival prediction.
Compared with representing patients by their individual protein
expression features, the PRER representation demonstrates significantly superior
predictive performance in 8 out of 10 cancer types. Proteins that emerge as important
in PRER, as opposed to individual expression values, provide a valuable
set of biomarkers with high prognostic value. Additionally, they highlight other
proteins that should be further investigated for their dysregulation patterns.
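One possible reading of the PRER construction can be sketched under explicit assumptions. Each protein's neighborhood is taken to be the `k` proteins that co-occur with it most often within a window of the biased random walks. Each (protein, neighbor) pair then becomes a binary feature that is 1 when the protein is more highly expressed than the neighbor in a given patient and 0 otherwise, with ties counted as 0. The neighborhood rule, the tie handling, and the function names (`walk_neighborhoods`, `prer_features`) are hypothetical; the abstract does not specify them. The walks could come from a node2vec-style generator; here they are hard-coded toy input.

```python
from collections import Counter
from typing import Dict, List, Tuple


def walk_neighborhoods(walks: List[List[str]], window: int, k: int) -> Dict[str, List[str]]:
    """Hypothetical neighborhood rule: the k proteins that co-occur most often
    with each protein within `window` steps of it in the random walks."""
    counts: Dict[str, Counter] = {}
    for walk in walks:
        for i, u in enumerate(walk):
            lo, hi = max(0, i - window), min(len(walk), i + window + 1)
            for j in range(lo, hi):
                v = walk[j]
                if v != u:
                    counts.setdefault(u, Counter())[v] += 1
    # Rank by co-occurrence count, breaking ties alphabetically for determinism.
    return {
        u: [v for v, _ in sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for u, c in counts.items()
    }


def prer_features(expression: Dict[str, float],
                  neighborhoods: Dict[str, List[str]]) -> Dict[Tuple[str, str], int]:
    """Binary pairwise-rank features for one patient:
    1 if the protein is more expressed than the neighbor, else 0 (ties give 0)."""
    features: Dict[Tuple[str, str], int] = {}
    for protein, neighbors in sorted(neighborhoods.items()):
        if protein not in expression:
            continue  # protein not measured for this patient
        for nb in neighbors:
            if nb in expression:
                features[(protein, nb)] = int(expression[protein] > expression[nb])
    return features


if __name__ == "__main__":
    # Hard-coded toy walks and one hypothetical patient's expression values.
    walks = [["P1", "P2", "P3"], ["P2", "P1", "P4"]]
    neighborhoods = walk_neighborhoods(walks, window=1, k=2)
    patient = {"P1": 2.0, "P2": 5.0, "P3": 1.0, "P4": 2.0}
    print(prer_features(patient, neighborhoods))
```

Stacking these feature dictionaries across patients (one column per protein pair) would yield a patient-by-feature matrix for a survival model. Because each feature only compares two proteins within the same patient, it is unchanged by any strictly increasing transformation of that patient's expression profile, which is one motivation for rank-based rather than raw-value features.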
Keywords
Graph representations
Node embeddings
Gene essentiality
Network topological features
Survival prediction
Cancer
Protein-protein interaction network