Browsing by Subject "Computational biology"
Item Open Access
Algorithms for effective querying of compound graph-based pathway databases (BioMed Central Ltd., 2009-11-16)
Doğrusöz, Uğur; Çetintaş, Ahmet; Demir, Emek; Babur, Özgün
Background: Graph-based pathway ontologies and databases are widely used to represent data about cellular processes. This representation makes it possible to programmatically integrate cellular networks and to investigate them using the well-understood concepts of graph theory in order to predict their structural and dynamic properties. An extension of this graph representation, namely hierarchically structured or compound graphs, in which a member of a biological network may recursively contain a sub-network of a somehow logically similar group of biological objects, provides many additional benefits for analysis of biological pathways, including reduction of complexity by decomposition into distinct components or modules. In this regard, it is essential to effectively query such integrated large compound networks to extract the sub-networks of interest with the help of efficient algorithms and software tools. Results: Towards this goal, we developed a querying framework, along with a number of graph-theoretic algorithms from simple neighborhood queries to shortest paths to feedback loops, that is applicable to all sorts of graph-based pathway databases, from PPIs (protein-protein interactions) to metabolic and signaling pathways. The framework is unique in that it can account for compound or nested structures and ubiquitous entities present in the pathway data. In addition, the queries may be related to each other through "AND" and "OR" operators, and can be recursively organized into a tree, in which the result of one query might be a source and/or target for another, to form more complex queries.
The algorithms were implemented within the querying component of a new version of the software tool PATIKAweb (Pathway Analysis Tool for Integration and Knowledge Acquisition) and have proven useful for answering a number of biologically significant questions for large graph-based pathway databases. Conclusion: The PATIKA Project Web site is http://www.patika.org. PATIKAweb version 2.1 is available at http://web.patika.org. © 2009 Dogrusoz et al; licensee BioMed Central Ltd.

Item Open Access
The BioPAX community standard for pathway data sharing (Nature Publishing Group, 2010-09)
Demir, Emek; Cary, M. P.; Paley, S.; Fukuda, K.; Lemer, C.; Vastrik, I.; Wu, G.; D'Eustachio, P.; Schaefer, C.; Luciano, J.; Schacherer, F.; Martinez-Flores, I.; Hu, Z.; Jimenez-Jacinto, V.; Joshi-Tope, G.; Kandasamy, K.; Lopez-Fuentes, A. C.; Mi, H.; Pichler, E.; Rodchenkov, I.; Splendiani, A.; Tkachev, S.; Zucker, J.; Gopinath, G.; Rajasimha, H.; Ramakrishnan, R.; Shah, I.; Syed, M.; Anwar, N.; Babur, Özgün; Blinov, M.; Brauner, E.; Corwin, D.; Donaldson, S.; Gibbons, F.; Goldberg, R.; Hornbeck, P.; Luna, A.; Murray-Rust, P.; Neumann, E.; Reubenacker, O.; Samwald, M.; Iersel, Martijn van; Wimalaratne, S.; Allen, K.; Braun, B.; Whirl-Carrillo, M.; Cheung, Kei-Hoi; Dahlquist, K.; Finney, A.; Gillespie, M.; Glass, E.; Gong, L.; Haw, R.; Honig, M.; Hubaut, O.; Kane, D.; Krupa, S.; Kutmon, M.; Leonard, J.; Marks, D.; Merberg, D.; Petri, V.; Pico, A.; Ravenscroft, D.; Ren, L.; Shah, N.; Sunshine, M.; Tang, R.; Whaley, R.; Letovksy, S.; Buetow, K. H.; Rzhetsky, A.; Schachter, V.; Sobral, B. S.; Doğrusöz, Uğur; McWeeney, S.; Aladjem, M.; Birney, E.; Collado-Vides, J.; Goto, S.; Hucka, M.; Novère, Nicolas Le; Maltsev, N.; Pandey, A.; Thomas, P.; Wingender, E.; Karp, P. D.; Sander, C.; Bader, G. D.
Biological Pathway Exchange (BioPAX) is a standard language to represent biological pathways at the molecular and cellular level and to facilitate the exchange of pathway data.
The rapid growth of the volume of pathway data has spurred the development of databases and computational tools to aid interpretation; however, use of these data is hampered by the current fragmentation of pathway information across many databases with incompatible formats. BioPAX, which was created through a community process, solves this problem by making pathway data substantially easier to collect, index, interpret and share. BioPAX can represent metabolic and signaling pathways, molecular and genetic interactions and gene regulation networks. Using BioPAX, millions of interactions, organized into thousands of pathways, from many organisms are available from a growing number of databases. This large amount of pathway data in a computable form will support visualization, analysis and biological discovery. © 2010 Nature America, Inc. All rights reserved.

Item Open Access
Causality analysis in biological networks (Bilkent University, 2010)
Babur, Özgün
Systems biology is a rapidly emerging field, shaped in the last two decades or so, which promises to help understand and cure several complex diseases such as cancer. To gain insight into the system (specifically, the molecular network in the cell), we need to work on the following four fundamental aspects: experimental and computational methods to gather knowledge about the system, mathematical models for representing the knowledge, analysis methods for answering questions on the model, and software tools for working on these. In this thesis, we propose new approaches related to all these aspects. We define new terms and concepts that help us analyze cellular processes, such as positive and negative paths, upstream and downstream relations, and distance in process graphs.
We propose algorithms that search for functional relations between molecules and answer several biologically interesting questions related to the network, such as neighborhoods, paths of interest, and common targets or regulators of molecules. In addition, we introduce ChiBE, a pathway editor for visualizing and analyzing BioPAX networks. The tool converts BioPAX graphs to drawable process diagrams and provides the mentioned novel analysis algorithms. Users can query pathways in the Pathway Commons database and create sub-networks that focus on specific relations of interest. We also describe a microarray data analysis component, PATIKAmad, built into ChiBE and PATIKAweb, which integrates expression experiment data with networks. PATIKAmad helps those tools to represent experiment values on network elements and to search for causal relations in the network that potentially explain dependent expressions. Causative path search depends on the presence of transcriptional relations in the model, which, however, are underrepresented in most databases. This is mainly due to insufficient knowledge in the literature. We finally propose a method for identifying and classifying modulators of transcription factors, to help complete the missing transcriptional relations in the pathway databases. The method works with a large amount of expression data, and looks for evidence of modulation for triplets of genes, i.e. modulator - factor - target. Modulator candidates are chosen among the interacting proteins of transcription factors. We expect to observe that expression of the target gene depends on the interaction between factor and modulator. According to the observed dependency type, we further classify the modulation. When tested, our method finds modulators of Androgen Receptor; our top-scoring result modulators are supported by other evidence in the literature. We also observe that the modulation event and modulation type highly depend on the specific target gene.
This finding contradicts the expectations of the molecular biology community, which often assumes that a modulator has one type of effect regardless of the target gene.

Item Open Access
ChiBE: interactive visualization and manipulation of BioPAX pathway models (Oxford University Press, 2010-02-01)
Babur, Özgün; Doğrusöz, Uğur; Demir, Emek; Sander, C.
SUMMARY: Representing models of cellular processes or pathways in a graphically rich form facilitates interpretation of biological observations and generation of new hypotheses. Solving biological problems using large pathway datasets requires software that can combine data mapping, querying and visualization as well as providing access to diverse data resources on the Internet. ChiBE is an open source software application that features user-friendly multi-view display, navigation and manipulation of pathway models in BioPAX format. Pathway views are rendered in a feature-rich format, and may be laid out and edited with state-of-the-art visualization methods, including compound or nested structures for visualizing cellular compartments and molecular complexes. Users can easily query and visualize pathways through an integrated Pathway Commons query tool and analyze molecular profiles in pathway context. AVAILABILITY: http://www.bilkent.edu.tr/%7Ebcbi/chibe.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Item Open Access
Integrating biological pathways and genomic profiles with ChiBE 2 (Bilkent University, 2013)
Çakır, Merve
Biological pathways store information about spatial and temporal organization of interactions taking place in an organism. They hold valuable information that can assist the scientific community in understanding the details of a particular mechanism or deciphering the reasons for disruption when the system goes wrong. However, extracting knowledge from these pathways is not trivial as they can be huge and complicated.
Additionally, simple visualization of pathways will only reveal limited knowledge, whereas their integration with experimental results can identify distinct and intriguing relationships. Therefore, it is critical to have tools that are specialized in analyzing and understanding biological pathways. ChiBE is one such tool that can visualize, manipulate and analyze pathway data stored in BioPAX format. While preparing the second version of the tool, there have been improvements regarding pathway searches, high throughput data integration, and database connections. Visual notation has also been updated in order to follow standards in visualizations defined by the SBGN community. Previously defined pathway query algorithms have been adapted to be compatible with the BioPAX model. New query types have also been designed to offer a wider range of options. With these queries, ChiBE now offers a variety of ways of pathway decomposition and thorough analysis of complex pathway views. There have also been improvements in the integration of high throughput experimental results. To offer easy access to expression microarrays, a gateway to the GEO database has been added. The cBio Cancer Genomics Portal is also now reachable within ChiBE in order to obtain information about the genomic status of various cancer cells. After simply asking for an identifier of a particular experiment, ChiBE retrieves the results from databases and then integrates them with the available pathway view through color codes. Furthermore, a connection to the DAVID database is available, in case users want to annotate a list of genes with respect to biological terms associated with them.
With these new features and improvements, ChiBE 2 has become a comprehensive tool that offers a wide range of analysis options with a genomics-oriented workflow to deepen our understanding of biological pathways.

Item Open Access
k-Shell decomposition reveals structural properties of the gene coexpression network for neurodevelopment (TÜBİTAK, 2017)
Çiçek, A. Ercüment
Neurodevelopment is a dynamic and complex process, which involves interactions of thousands of genes. Understanding the mechanisms of brain development is important for uncovering the genetic architectures of neurodevelopmental disorders such as autism spectrum disorder and intellectual disability. The BrainSpan dataset is an important resource for studying the transcriptional mechanisms governing neurodevelopment. It contains RNA-seq and microarray data for 13 developmental periods in 8-16 brain regions. Various important studies used this dataset, in particular to generate gene coexpression networks. The topology of the BrainSpan gene coexpression network yielded various important gene clusters, which are found to play key roles in diseases. In this work, we analyze the topology of the BrainSpan gene coexpression network using the k-shell decomposition method. k-Shell decomposition is an unsupervised method to decompose a network into layers (shells) using the connectivity information and to detect a nucleus that is central to overall connectivity. Our results show that there are 267 layers in the BrainSpan gene coexpression network. The nucleus contains 2584 genes, which are related to chromatin modification function. We compared and contrasted the structure with the autonomous system level Internet.
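The layer-by-layer peeling that k-shell decomposition performs can be illustrated with a short sketch. This is a generic version of the technique under stated assumptions, not the implementation used in the paper: the function name `k_shell_decomposition`, the edge-list input, and the toy graph are hypothetical, and the nucleus is taken here to be the set of nodes in the highest shell.

```python
from collections import defaultdict


def k_shell_decomposition(edges):
    """Return {node: shell index} for an undirected graph given as (u, v) pairs.

    Hypothetical helper for illustration: nodes are peeled in rounds, and a
    node removed while the threshold is k is assigned to shell k.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    degree = {node: len(neigh) for node, neigh in adj.items()}
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        # Nodes whose current degree is at most k belong to shell k.
        peel = [n for n in remaining if degree[n] <= k]
        if not peel:
            k += 1  # nothing left to peel at this threshold
            continue
        while peel:
            node = peel.pop()
            if node not in remaining:
                continue  # already removed via a duplicate entry
            remaining.discard(node)
            shell[node] = k
            # Removing a node lowers its neighbours' degrees, which may
            # pull them into the same shell.
            for other in adj[node]:
                if other in remaining:
                    degree[other] -= 1
                    if degree[other] <= k:
                        peel.append(other)
    return shell


# Toy graph (assumed data): a triangle a-b-c with a pendant node d.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")]
shells = k_shell_decomposition(toy_edges)
top = max(shells.values())
nucleus = {n for n, k in shells.items() if k == top}
print(shells, nucleus)
```

On a real coexpression network the edge list would come from thresholded gene-gene correlations; how that threshold is chosen is outside the scope of this sketch.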
We found that despite similarities in percolation transition and crust size distribution, there are also differences: the BrainSpan coexpression network has a significantly large nucleus and only a very small number of genes need to access the nucleus first, to be able to connect to other genes in the crust above the nucleus. © TÜBİTAK.

Item Open Access
PATIKAmad: putting microarray data into pathway context (Wiley-VCH Verlag GmbH & Co. KGaA, 2008-06)
Babur, Özgün; Colak, Recep; Demir, Emek; Doğrusöz, Uğur
High-throughput experiments, most significantly DNA microarrays, provide us with system-scale profiles. Connecting these data with existing biological networks poses a formidable challenge to uncover facts about a cell's proteome. Studies and tools with this purpose are limited to networks with simple structure, such as protein-protein interaction graphs, or do not go much beyond simply displaying values on the network. We have built a microarray data analysis tool, named PATIKAmad, which can be used to associate microarray data with the pathway models in mechanistic detail, and provides facilities for visualization, clustering, querying, and navigation of biological graphs related to loaded microarray experiments. PATIKAmad is freely available to noncommercial users as a new module of PATIKAweb at http://web.patika.org. © 2008 Wiley-VCH Verlag GmbH & Co. KGaA.

Item Open Access
A privacy-preserving solution for compressed storage and selective retrieval of genomic data (Cold Spring Harbor Laboratory Press, 2016)
Huang, Z.; Ayday, E.; Lin, H.; Aiyar, R. S.; Molyneaux, A.; Xu, Z.; Fellay, J.; Steinmetz, L. M.; Hubaux, Jean-Pierre
In clinical genomics, the continuous evolution of bioinformatic algorithms and sequencing platforms makes it beneficial to store patients' complete aligned genomic data in addition to variant calls relative to a reference sequence.
Due to the large size of human genome sequence data files (varying from 30 GB to 200 GB depending on coverage), two major challenges facing genomics laboratories are the costs of storage and the efficiency of the initial data processing. In addition, privacy of genomic data is becoming an increasingly serious concern, yet no standard data storage solutions exist that enable compression, encryption, and selective retrieval. Here we present a privacy-preserving solution named SECRAM (Selective retrieval on Encrypted and Compressed Reference-oriented Alignment Map) for the secure storage of compressed aligned genomic data. Our solution enables selective retrieval of encrypted data and improves the efficiency of downstream analysis (e.g., variant calling). Compared with BAM, the de facto standard for storing aligned genomic data, SECRAM uses 18% less storage. Compared with CRAM, one of the most compressed nonencrypted formats (using 34% less storage than BAM), SECRAM maintains efficient compression and downstream data processing, while allowing for unprecedented levels of security in genomic data storage. Compared with previous work, the distinguishing features of SECRAM are that (1) it is position-based instead of read-based, and (2) it allows random querying of a subregion from a BAM-like file in an encrypted form. Our method thus offers a space-saving, privacy-preserving, and effective solution for the storage of clinical genomic data.

Item Open Access
Software support for SBGN maps: SBGN-ML and LibSBGN (Oxford University Press, 2012)
Iersel, Martijn P. van; Villéger, A. C.; Czauderna, T.; Boyd, S. E.; Bergmann, F. T.; Luna, A.; Demir, E.; Sorokin, A.; Dogrusoz, U.; Matsuoka, Y.; Funahashi, A.; Aladjem, M. I.; Mi, H.; Moodie, S. L.; Kitano, H.; Le novère, N.; Schreiber, F.
Motivation: LibSBGN is a software library for reading, writing and manipulating Systems Biology Graphical Notation (SBGN) maps stored using the recently developed SBGN-ML file format.
The library (available in C++ and Java) makes it easy for developers to add SBGN support to their tools, whereas the file format facilitates the exchange of maps between compatible software applications. The library also supports validation of maps, which simplifies the task of ensuring compliance with the detailed SBGN specifications. With this effort we hope to increase the adoption of SBGN in bioinformatics tools, ultimately enabling more researchers to visualize biological knowledge in a precise and unambiguous manner. © The Author(s) 2012. Published by Oxford University Press.

Item Open Access
Two learning approaches for protein name extraction (Academic Press, 2009)
Tatar, S.; Cicekli, I.
Protein name extraction, one of the basic tasks in automatic extraction of information from biological texts, remains challenging. In this paper, we explore the use of two different machine learning techniques and present the results of the conducted experiments. In the first method, a bigram language model is used to extract protein names. In the second, we use an automatic rule learning method that can identify protein names located in biological texts. In both cases, we generalize protein names by using hierarchically categorized syntactic token types. We conducted our experiments on two different datasets. Our first method, based on the bigram language model, achieved an F-score of 67.7% on the YAPEX dataset and 66.8% on the GENIA corpus. The developed rule learning method obtained an F-score of 61.8% on the YAPEX dataset and 61.0% on the GENIA corpus. The results of the comparative experiments demonstrate that both techniques are applicable to the task of automatic protein name extraction, a prerequisite for the large-scale processing of biomedical literature. © 2009 Elsevier Inc. All rights reserved.
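For readers unfamiliar with the F-score reported in the last abstract, the sketch below shows how such a value is typically computed from extraction counts. It assumes the balanced F1 variant (the abstract does not say which variant was used), and the function name `f_score` and the counts in the usage line are hypothetical, not figures from the paper.

```python
def f_score(true_pos, false_pos, false_neg, beta=1.0):
    """Compute the F-beta score from raw counts (beta=1 gives balanced F1).

    Hypothetical helper: true_pos are correctly extracted names, false_pos
    are spurious extractions, false_neg are names that were missed.
    """
    extracted = true_pos + false_pos
    expected = true_pos + false_neg
    precision = true_pos / extracted if extracted else 0.0
    recall = true_pos / expected if expected else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Made-up counts for illustration only: 3 correct, 1 spurious, 2 missed.
print(round(f_score(3, 1, 2), 4))
```

Because the score is a harmonic mean, a system cannot reach a high value by maximizing precision or recall alone.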