Browsing by Subject "Information storage and retrieval"
Now showing 1 - 7 of 7
Results Per Page
Sort Options
Item Open Access Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal(American Association for the Advancement of Science (A A A S), 2013) Gao J.; Aksoy, B. A.; Dogrusoz, U.; Dresdner, G.; Gross, B.; Sumer, S. O.; Sun, Y.; Jacobsen, A.; Sinha, R.; Larsson, E.; Cerami, E.; Sander, C.; Schultz, N.The cBioPortal for Cancer Genomics (http://cbioportal.org) provides a Web resource for exploring, visualizing, and analyzing multidimensional cancer genomics data. The portal reduces molecular profiling data from cancer tissues and cell lines into readily understandable genetic, epigenetic, gene expression, and proteomic events. The query interface combined with customized data storage enables researchers to interactively explore genetic alterations across samples, genes, and pathways and, when available in the underlying data, to link these to clinical outcomes. The portal provides graphical summaries of gene-level data from multiple platforms, network visualization and analysis, survival analysis, patient-centric queries, and software programmatic access. The intuitive Web interface of the portal makes complex cancer genomics profiles accessible to researchers and clinicians without requiring bioinformatics expertise, thus facilitating biological discoveries. Here, we provide a practical guide to the analysis and visualization features of the cBioPortal for Cancer Genomics. © 2013 American Association for the Advancement of Science.Item Open Access An ontology for collaborative construction and analysis of cellular pathways(Oxford University Press, 2004-02-12) Demir, Emek; Babur, Özgün; Doğrusöz, Uğur; Gürsoy, Atilla; Ayaz, Aslı; Güleşır, Gürcan; Nişancı, Gürkan; Çetin Atalay, RengülMotivation: As the scientific curiosity in genome studies shifts toward identification of functions of the genomes in large scale, data produced about cellular processes at molecular level has been accumulating with an accelerating rate. In this regard, it is essential to be able to store, integrate, access and analyze this data effectively with the help of software tools. Clearly this requires a strong ontology that is intuitive, comprehensive and uncomplicated. Results: We define an ontology for an intuitive, comprehensive and uncomplicated representation of cellular events. The ontology presented here enables integration of fragmented or incomplete pathway information via collaboration, and supports manipulation of the stored data. In addition, it facilitates concurrent modifications to the data while maintaining its validity and consistency. Furthermore, novel structures for representation of multiple levels of abstraction for pathways and homologies is provided. Lastly, our ontology supports efficient querying of large amounts of data. We have also developed a software tool named pathway analysis tool for integration and knowledge acquisition (PATIKA) providing an integrated, multi-user environment for visualizing and manipulating network of cellular events. PATIKA implements the basics of our ontology. © Oxford University Press 2004; All rights reserved.Item Open Access PATIKA: an integrated visual environment for collaborative construction and analysis of cellular pathways(Oxford University Press, 2002-06) Demir, Emek; Babur, Özgün; Doğrusöz, Uğur; Gürsoy, Atilla; Nişancı, Gürkan; Çetin Atalay, Rengül; Öztürk, MehmetMotivation: Availability of the sequences of entire genomes shifts the scientific curiosity towards the identification of function of the genomes in large scale as in genome studies. In the near future, data produced about cellular processes at molecular level will accumulate with an accelerating rate as a result of proteomics studies. In this regard, it is essential to develop tools for storing, integrating, accessing, and analyzing this data effectively. Results: We define an ontology for a comprehensive representation of cellular events. The ontology presented here enables integration of fragmented or incomplete pathway information and supports manipulation and incorporation of the stored data, as well as multiple levels of abstraction. Based on this ontology, we present the architecture of an integrated environment named PATIKA (Pathway Analysis Tool for Integration and Knowledge Acquisition). PATIKA is composed of a server-side, scalable, object-oriented database and client-side editors to provide an integrated, multi-user environment for visualizing and manipulating network of cellular events. This tool features automated pathway layout, functional computation support, advanced querying and a user-friendly graphical interface. We expect that PATIKA will be a valuable tool for rapid knowledge acquisition, microarray generated large-scale data interpretation, disease gene identification, and drug development.Item Open Access PATIKA: an integrated visual environment for collaborative construction and analysis of cellular pathways(American Society for Biochemistry and Molecular Biology(ASBMB), 2002-09) Demir, Emek; Babur, Özgün; Doğrusöz, Uğur; Gürsoy, Atilla; Nişancı, Gürkan; Çetin Atalay, Rengül; Öztürk, MehmetItem Open Access PATIKAweb: a Web interface for analyzing biological pathways through advanced querying and visualization(Oxford University Press, 2006-02-01) Doğrusöz, Uğur; Erson, E. Zeynep; Giral, Erhan; Demir, Emek; Babur, Özgün; Çetintaş, Ahmet; Çolak, RecepSummary: PATIKAweb provides a Web interface for retrieving and analyzing biological pathways in the PATIKA database, which contains data integrated from various prominent public pathway databases. It features a user-friendly interface, dynamic visualization and automated layout, advanced graph-theoretic queries for extracting biologically important phenomena, local persistence capability and exporting facilities to various pathway exchange formats. © The Author 2005. Published by Oxford University Press. All rights reserved.Item Open Access A privacy-preserving solution for compressed storage and selective retrieval of genomic data(Cold Spring Harbor Laboratory Press, 2016) Huang Z.; Ayday, E.; Lin, H.; Aiyar, R. S.; Molyneaux, A.; Xu, Z.; Fellay, J.; Steinmetz, L. M.; Hubaux, Jean-PierreIn clinical genomics, the continuous evolution of bioinformatic algorithms and sequencing platforms makes it beneficial to store patients' complete aligned genomic data in addition to variant calls relative to a reference sequence. Due to the large size of human genome sequence data files (varying from 30 GB to 200 GB depending on coverage), two major challenges facing genomics laboratories are the costs of storage and the efficiency of the initial data processing. In addition, privacy of genomic data is becoming an increasingly serious concern, yet no standard data storage solutions exist that enable compression, encryption, and selective retrieval. Here we present a privacy-preserving solution named SECRAM (Selective retrieval on Encrypted and Compressed Reference-oriented Alignment Map) for the secure storage of compressed aligned genomic data. Our solution enables selective retrieval of encrypted data and improves the efficiency of downstream analysis (e.g., variant calling). Compared withBAM, thede factostandard for storing aligned genomic data, SECRAM uses 18%less storage. Compared with CRAM, one of the most compressed nonencrypted formats (using 34% less storage than BAM), SECRAM maintains efficient compression and downstream data processing, while allowing for unprecedented levels of security in genomic data storage. Compared with previous work, the distinguishing features of SECRAM are that (1) it is position-based insteadofread-based,and(2)itallowsrandomqueryingofasubregionfromaBAM-likefileinanencryptedform.Ourmethod thus offers a space-saving, privacy-preserving, and effective solution for the storage of clinical genomic data.Item Open Access Two learning approaches for protein name extraction(Academic Press, 2009) Tatar, S.; Cicekli, I.Protein name extraction, one of the basic tasks in automatic extraction of information from biological texts, remains challenging. In this paper, we explore the use of two different machine learning techniques and present the results of the conducted experiments. In the first method, Bigram language model is used to extract protein names. In the latter, we use an automatic rule learning method that can identify protein names located in the biological texts. In both cases, we generalize protein names by using hierarchically categorized syntactic token types. We conducted our experiments on two different datasets. Our first method based on Bigram language model achieved an F-score of 67.7% on the YAPEX dataset and 66.8% on the GENIA corpus. The developed rule learning method obtained 61.8% F-score value on the YAPEX dataset and 61.0% on the GENIA corpus. The results of the comparative experiments demonstrate that both techniques are applicable to the task of automatic protein name extraction, a prerequisite for the large-scale processing of biomedical literature. © 2009 Elsevier Inc. All rights reserved.