DeepKinZero: zero-shot learning for predicting kinase phosphorylation sites
Çiçek, Abdullah Ercüment
Please cite this item using this persistent URL: http://hdl.handle.net/11693/47751
Protein kinases are a large family of enzymes that catalyze the phosphorylation of other proteins. By acting as molecular switches for protein activity, phosphorylation events regulate intracellular signal transduction and thereby play a central role in a broad range of cellular activities. Aberrant kinase function, in turn, is implicated in many diseases. Understanding normal and malfunctioning signaling in the cell entails identifying phosphorylation sites and characterizing their interactions with kinases. Recent advances in mass spectrometry enable rapid identification of phosphosites at the proteome level. Alternatively, many computational models predict phosphosites in a given input protein sequence. Once a phosphosite is identified, either experimentally or computationally, the next question is which kinase catalyzes the phosphorylation of that particular site. Although a subset of the available computational methods provides kinase-specific predictions for phosphorylation sites, these supervised methods need training data and can therefore provide predictions only for kinases for which a substantial number of phosphosites are already known. A particular problem that has not received any attention is the prediction of new sites for kinases with few or no a priori known sites. None of the current computational methods that rely on the classical supervised learning setting can predict additional sites for these kinases. We present DeepKinZero, the first zero-shot learning approach that can predict phosphosites for kinases with no known phosphosite information.
DeepKinZero takes a peptide sequence centered at the phosphorylation site and learns the embeddings of these phosphosite sequences via a bi-directional recurrent neural network, whereas kinase embeddings are based on protein sequence vector representations and the taxonomy of kinases based on their functional properties. Through a compatibility function that associates the representations of the site sequences and the kinases, DeepKinZero transfers knowledge from kinases with many known sites to those kinases with no known sites. Our computational experiments show that DeepKinZero achieves a 30-fold increase in accuracy compared to baseline models. DeepKinZero complements existing approaches by expanding the knowledge of kinases through mapping of the phosphorylation sites pertaining to understudied kinases with no prior information, which are increasingly investigated as novel drug targets.
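The two steps described above, taking a peptide centered at the phosphosite and scoring it against kinase embeddings with a compatibility function, can be illustrated with a minimal sketch. This is not the DeepKinZero implementation. The function names, the flank size of 7 residues, the `_` padding character, and the bilinear form F(x, y) = θ(x)ᵀ W φ(y) are all assumptions chosen for illustration. The abstract does not specify the form of the compatibility function, and in the actual model the site embedding comes from a bi-directional recurrent network rather than being supplied by hand.

```python
import numpy as np


def site_window(protein_seq, site_index, flank=7, pad="_"):
    """Return the peptide of length 2*flank + 1 centered at site_index (0-based).

    Positions that fall outside the protein are filled with `pad`.
    The flank size and pad character are illustrative assumptions.
    """
    padded = pad * flank + protein_seq + pad * flank
    center = site_index + flank
    return padded[center - flank:center + flank + 1]


def compatibility_scores(site_embedding, W, kinase_embeddings):
    """Score one site against every kinase with a bilinear form.

    site_embedding:    shape (d,)   embedding of the phosphosite peptide
    W:                 shape (d, k) learned compatibility matrix (assumed form)
    kinase_embeddings: shape (n, k) one row per candidate kinase
    Returns an array of n scores, one per kinase.
    """
    return site_embedding @ W @ kinase_embeddings.T


def predict_kinase(site_embedding, W, kinase_embeddings, kinase_names):
    """Pick the kinase with the highest compatibility score.

    In a zero-shot setting the candidate rows may belong to kinases that
    contributed no sites during training; only their embeddings are needed.
    """
    scores = compatibility_scores(site_embedding, W, kinase_embeddings)
    return kinase_names[int(np.argmax(scores))]
```

For example, `site_window("MKTAYSAKQR", 5, flank=2)` returns the five-residue peptide centered on the serine at index 5. Because the prediction step only needs an embedding for each candidate kinase, a kinase with no known sites can still be ranked, which is the property that zero-shot learning relies on.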