DeepKinZero: zero-shot learning for predicting kinase phosphorylation sites
Author
Deznabi, Iman
Advisor
Çiçek, Abdullah Ercüment
Date
2018-08
Publisher
Bilkent University
Language
English
Type
Thesis
Abstract
Protein kinases are a large family of enzymes that catalyze the phosphorylation
of other proteins. By acting as molecular switches for protein activity, the phosphorylation
events regulate intracellular signal transduction, thereby assuming a
central role in a broad range of cellular activities. On the other hand, aberrant
kinase function is implicated in many diseases. Understanding the normal and
malfunctioning signaling in the cell entails the identification of phosphorylation
sites and the characterization of their interactions with kinases. Recent advances
in mass spectrometry enable rapid identification of phosphosites at the proteome
level. Alternatively, there are many computational models that predict phosphosites
in a given input protein sequence. Once a phosphosite is identified, either
experimentally or computationally, knowing which kinase would catalyze the
phosphorylation on this particular site becomes the next question. Although a
subset of available computational methods provides kinase-specific predictions
for phosphorylation sites, due to the need for training data in such supervised
methods, these tools can provide predictions only for kinases for which a substantial
number of the phosphosites are already known. A particular problem that
has not received any attention is the prediction of new sites for kinases with few
or no a priori known sites. None of the current computational methods that
rely on the classical supervised learning setting can predict additional sites for
these kinases. We present DeepKinZero, the first zero-shot learning approach
that can predict phosphosites for kinases with no known phosphosite information.
DeepKinZero takes a peptide sequence centered at the phosphorylation site and
learns the embeddings of these phosphosite sequences via a bi-directional recurrent
neural network, whereas kinase embeddings are based on protein sequence vector
representations and the taxonomy of kinases based on their functional properties.
Through a compatibility function that associates the representations of the site
sequences and the kinases, DeepKinZero transfers knowledge from kinases with
many known sites to those kinases with no known sites. Our computational experiments
show that DeepKinZero achieves a 30-fold increase in accuracy compared to
baseline models. DeepKinZero complements existing approaches by expanding the
knowledge of kinases through mapping of the phosphorylation sites pertaining to
understudied kinases with no prior information, which are increasingly investigated
as novel drug targets.
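
The compatibility-function idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes a bilinear score F(x, y) = x^T W y, replaces the learned bi-directional recurrent embedding with a plain one-hot encoding of the peptide, and uses random vectors in place of the real kinase embeddings. The peptide, the kinase names, the embedding size and all function names are hypothetical and chosen only for illustration.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot_peptide(peptide):
    """Flatten a peptide into a one-hot vector.

    Assumption: this stands in for the learned phosphosite embedding;
    the abstract describes a bi-directional recurrent network instead.
    """
    vec = np.zeros((len(peptide), len(AMINO_ACIDS)))
    for pos, aa in enumerate(peptide):
        idx = AMINO_ACIDS.find(aa)
        if idx >= 0:  # unknown or padding characters stay all-zero
            vec[pos, idx] = 1.0
    return vec.ravel()


def compatibility(site_vec, kinase_mat, W):
    """Bilinear score x^T W y for every kinase embedding y (one per row)."""
    return kinase_mat @ (W.T @ site_vec)


def predict_kinase(site_vec, kinase_names, kinase_mat, W):
    """Return the best-scoring kinase and a softmax over all candidates.

    The candidates may include kinases that had no training sites, since
    a kinase only needs an embedding row to be scored.
    """
    scores = compatibility(site_vec, kinase_mat, W)
    probs = np.exp(scores - scores.max())  # numerically stable softmax
    probs /= probs.sum()
    return kinase_names[int(np.argmax(probs))], probs


# Hypothetical inputs: a 15-residue window centered on the candidate site,
# three placeholder kinases, and an untrained random W.
rng = np.random.default_rng(0)
x = one_hot_peptide("MKRPASPTKLDEGSY")
kinase_names = ["KINASE_A", "KINASE_B", "KINASE_C"]
kinase_mat = rng.normal(size=(3, 8))
W = 0.1 * rng.normal(size=(x.size, 8))

best, probs = predict_kinase(x, kinase_names, kinase_mat, W)
print(best, probs)
```

In a trained model W would be learned from kinases with many known sites. Because a new kinase enters only through its embedding row, it can be scored without any site of its own, which is the sense in which such a setup is zero-shot.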