Browsing by Subject "Databases"
Now showing 1 - 11 of 11
Item Restricted
A yellow pages of theory and criticism (1995) Carney, Ray

Item Open Access
mESAdb: microRNA expression and sequence analysis database (Oxford University Press, 2011) Kaya, Koray D.; Karakülah, G.; Yakıcıer, Cengiz M.; Acar, Aybar C.; Konu, Özlen
MicroRNA expression and sequence analysis database (http://konulab.fen.bilkent.edu.tr/mirna/) (mESAdb) is a regularly updated database for the multivariate analysis of sequences and expression of microRNAs from multiple taxa. mESAdb is modular and has a user interface implemented in PHP and JavaScript and coupled with statistical analysis and visualization packages written for the R language. The database primarily comprises mature microRNA sequences and their target data, along with selected human, mouse and zebrafish expression data sets. mESAdb analysis modules allow (i) mining of microRNA expression data sets for subsets of microRNAs selected manually or by motif; (ii) pair-wise multivariate analysis of expression data sets within and between taxa; and (iii) association of microRNA subsets with annotation databases, HUGE Navigator, KEGG and GO. The use of existing and customized R packages facilitates future addition of data sets and analysis tools. Furthermore, the ability to upload and analyze user-specified data sets makes mESAdb an interactive and expandable analysis tool for microRNA sequence and expression data.

Item Open Access
Modeling and querying temporal data (IGI Global, 2005) Tansel, Abdullah Uz; Rivero, L. C.; Doorn, J. H.; Ferraggine, V. E.
The paper discusses modelling temporal variation of data in the context of the relational data model. This is done by employing time-stamps, which leads to two possible approaches: adding time-stamps to tuples and using first normal-form (1NF) relations, and adding time-stamps to attributes and using non-first normal form (N1NF) relations. Modelling temporal data according to each approach, together with the respective data-manipulation operations, is covered.
This is followed by a discussion that includes the advantages and expressive power of each approach, the classification of temporal databases, and the handling of retroactive and postactive changes.

Item Open Access
An ontology for collaborative construction and analysis of cellular pathways (Oxford University Press, 2004-02-12) Demir, Emek; Babur, Özgün; Doğrusöz, Uğur; Gürsoy, Atilla; Ayaz, Aslı; Güleşır, Gürcan; Nişancı, Gürkan; Çetin Atalay, Rengül
Motivation: As the scientific curiosity in genome studies shifts toward identification of functions of the genomes in large scale, data produced about cellular processes at molecular level has been accumulating at an accelerating rate. In this regard, it is essential to be able to store, integrate, access and analyze this data effectively with the help of software tools. Clearly this requires a strong ontology that is intuitive, comprehensive and uncomplicated. Results: We define an ontology for an intuitive, comprehensive and uncomplicated representation of cellular events. The ontology presented here enables integration of fragmented or incomplete pathway information via collaboration, and supports manipulation of the stored data. In addition, it facilitates concurrent modifications to the data while maintaining its validity and consistency. Furthermore, novel structures for representation of multiple levels of abstraction for pathways and homologies are provided. Lastly, our ontology supports efficient querying of large amounts of data. We have also developed a software tool named pathway analysis tool for integration and knowledge acquisition (PATIKA) providing an integrated, multi-user environment for visualizing and manipulating networks of cellular events. PATIKA implements the basics of our ontology.
© Oxford University Press 2004; All rights reserved.

Item Open Access
Partitioning models for scaling distributed graph computations (Bilkent University, 2019-08) Demirci, Gündüz Vehbi
The focus of this thesis is intelligent partitioning models and methods for scaling the performance of parallel graph computations on distributed-memory systems. Distributed databases utilize graph partitioning to provide servers with data-locality and workload-balance. Some queries performed on a database may form cascades due to the queries triggering each other. The current partitioning methods consider the graph structure and logs of query workload. We introduce the cascade-aware graph partitioning problem with the objective of minimizing the overall cost of communication operations between servers during cascade processes. We propose a randomized algorithm that integrates the graph structure and cascade processes to use as input for large-scale partitioning. Experiments on graphs representing real social networks demonstrate the effectiveness of the proposed solution in terms of the partitioning objectives. Sparse-general-matrix-multiplication (SpGEMM) is a key computational kernel used in scientific computing and high-performance graph computations. We propose an SpGEMM algorithm for the Accumulo database which enables high performance distributed parallelism through its iterator framework. The proposed algorithm provides write-locality and avoids scanning input matrices multiple times by utilizing Accumulo's batch scanning capability and node-level parallelism structures. We also propose a matrix partitioning scheme that reduces the total communication volume and provides a workload-balance among servers. Extensive experiments performed on both real-world and synthetic sparse matrices show that the proposed algorithm and matrix partitioning scheme provide significant performance improvements. Scalability of parallel SpGEMM algorithms is heavily communication bound.
Multidimensional partitioning of SpGEMM's workload is essential to achieve higher scalability. We propose hypergraph models that utilize the arrangement of processors and also attain a multidimensional partitioning on SpGEMM's workload. Thorough experimentation performed on both realistic as well as synthetically generated SpGEMM instances demonstrates the effectiveness of the proposed partitioning models.

Item Open Access
PATIKA: an integrated visual environment for collaborative construction and analysis of cellular pathways (Oxford University Press, 2002-06) Demir, Emek; Babur, Özgün; Doğrusöz, Uğur; Gürsoy, Atilla; Nişancı, Gürkan; Çetin Atalay, Rengül; Öztürk, Mehmet
Motivation: Availability of the sequences of entire genomes shifts the scientific curiosity towards the identification of function of the genomes in large scale as in genome studies. In the near future, data produced about cellular processes at molecular level will accumulate with an accelerating rate as a result of proteomics studies. In this regard, it is essential to develop tools for storing, integrating, accessing, and analyzing this data effectively. Results: We define an ontology for a comprehensive representation of cellular events. The ontology presented here enables integration of fragmented or incomplete pathway information and supports manipulation and incorporation of the stored data, as well as multiple levels of abstraction. Based on this ontology, we present the architecture of an integrated environment named PATIKA (Pathway Analysis Tool for Integration and Knowledge Acquisition). PATIKA is composed of a server-side, scalable, object-oriented database and client-side editors to provide an integrated, multi-user environment for visualizing and manipulating network of cellular events. This tool features automated pathway layout, functional computation support, advanced querying and a user-friendly graphical interface.
We expect that PATIKA will be a valuable tool for rapid knowledge acquisition, microarray generated large-scale data interpretation, disease gene identification, and drug development.

Item Open Access
PATIKA: an integrated visual environment for collaborative construction and analysis of cellular pathways (American Society for Biochemistry and Molecular Biology (ASBMB), 2002-09) Demir, Emek; Babur, Özgün; Doğrusöz, Uğur; Gürsoy, Atilla; Nişancı, Gürkan; Çetin Atalay, Rengül; Öztürk, Mehmet

Item Open Access
PATIKAweb: a Web interface for analyzing biological pathways through advanced querying and visualization (Oxford University Press, 2006-02-01) Doğrusöz, Uğur; Erson, E. Zeynep; Giral, Erhan; Demir, Emek; Babur, Özgün; Çetintaş, Ahmet; Çolak, Recep
Summary: PATIKAweb provides a Web interface for retrieving and analyzing biological pathways in the PATIKA database, which contains data integrated from various prominent public pathway databases. It features a user-friendly interface, dynamic visualization and automated layout, advanced graph-theoretic queries for extracting biologically important phenomena, local persistence capability and exporting facilities to various pathway exchange formats. © The Author 2005. Published by Oxford University Press. All rights reserved.

Item Open Access
Scaling sparse matrix-matrix multiplication in the accumulo database (Springer, 2020) Demirci, Gündüz Vehbi; Aykanat, Cevdet
We propose and implement a sparse matrix-matrix multiplication (SpGEMM) algorithm running on top of Accumulo’s iterator framework which enables high performance distributed parallelism. The proposed algorithm provides write-locality while ingesting the output matrix back to database via utilizing row-by-row parallel SpGEMM. The proposed solution also alleviates scanning of input matrices multiple times by making use of Accumulo’s batch scanning capability which is used for accessing multiple ranges of key-value pairs in parallel.
Even though the use of batch-scanning introduces some latency overheads, these overheads are alleviated by the proposed solution and by using node-level parallelism structures. We also propose a matrix partitioning scheme which reduces the total communication volume and provides a balance of workload among servers. The results of extensive experiments performed on both real-world and synthetic sparse matrices show that the proposed algorithm scales significantly better than the outer-product parallel SpGEMM algorithm available in the Graphulo library. By applying the proposed matrix partitioning, the performance of the proposed algorithm is further improved considerably.

Item Open Access
Time-by-example query language for historical databases (IEEE, 1989) Tansel, A. U.; Arkun, M. E.; Ozsoyoglu, G.
Time-by-Example (TBE) is a user-friendly query language designed specifically for historical relational databases. It follows the graphical structure and the example query concept of QBE, and employs the hierarchical arrangement of subqueries of Abe and STBE. Similar to STBE, TBE handles set- and simple-valued attributes. In addition, to handle time, TBE is capable of manipulating triplet- and set-triplet-valued attributes. The underlying data model used in TBE is an extended relational data model in which nonfirst normal form relations and attribute time stamping (in contrast to tuple time stamping) are used. A triplet is a 3-tuple whose components are the lower and upper bounds of a time interval and a value valid over the interval. A triplet is used as a timestamped value of a time-dependent attribute. Set-valued time-dependent attributes are modeled by sets of triplets. To process TBE queries and to define a historical relational algebra (HRA), standard operators of the relational algebra and the pack/unpack operators of [Zl] are augmented by triplet-decomposition, triplet-formation, slice, and drop-time operators.
Methodologies for translating TBE queries into HRA expressions and for constructing their parse trees are presented.

Item Open Access
Vertical framing of superimposed signature files using partial evaluation of queries (Elsevier, 1997) Kocberber, S.; Can, F.
A new signature file method, Multi-Frame Signature File (MFSF), is introduced by extending the bit-sliced signature file method. In MFSF, a signature file is divided into variable sized vertical frames with different on-bit densities to optimize the response time using a partial query evaluation methodology. In query evaluation the on-bits of the lower on-bit density frames are used first. As the number of query terms increases, the number of query signature on-bits in the lower on-bit density frames increases and the query stopping condition is reached in fewer evaluation steps. Therefore, in MFSF, the query evaluation time decreases for increasing numbers of query terms. Under the sequentiality assumption of disk blocks, in a PC environment with 30 ms average disk seek time, MFSF provides a projected worst-case response time of 3.54 seconds for a database size of one million records in a uniform distribution multi-term query environment with 1–5 terms per query. Due to partial evaluation, this desired response time is guaranteed for queries with several terms. The comparison of MFSF with the inverted file approach shows that MFSF provides promising research opportunities.
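
For readers unfamiliar with the bit-sliced signature file method that the last abstract builds on, the following is a minimal sketch of the plain technique, not of MFSF itself. It does not model vertical frames, on-bit densities, or the paper's stopping condition. The signature width `F`, the bits-per-term count `M`, the SHA-256-based hashing, and all function names are assumptions chosen for illustration. Each term turns on a few bits, a record's signature is the superimposition (bitwise OR) of its term signatures, and the file is stored as one bit slice per signature position so that a query only reads the slices of its own on-bits.

```python
import hashlib

F = 64  # signature width in bits (illustrative assumption)
M = 3   # bits a term tries to set (illustrative assumption)


def term_bits(term):
    """Return the set of bit positions that a term turns on."""
    digest = hashlib.sha256(term.encode("utf-8")).digest()
    # Use M two-byte chunks of the digest as positions; collisions may
    # leave fewer than M distinct bits, which is harmless here.
    return {int.from_bytes(digest[2 * i:2 * i + 2], "big") % F for i in range(M)}


def build_slices(records):
    """Build F bit slices; bit r of slices[b] is on if record r has bit b on."""
    slices = [0] * F
    for r, terms in enumerate(records):
        for term in terms:
            for b in term_bits(term):
                slices[b] |= 1 << r
    return slices


def candidates(slices, n_records, query_terms):
    """AND the slices of the query's on-bits; stop once no record is left."""
    result = (1 << n_records) - 1  # start with every record as a candidate
    for term in query_terms:
        for b in term_bits(term):
            result &= slices[b]
            if result == 0:
                return []
    return [r for r in range(n_records) if (result >> r) & 1]


def search(records, slices, query_terms):
    """Drop false matches by checking each candidate against its record."""
    wanted = set(query_terms)
    return [r for r in candidates(slices, len(records), query_terms)
            if wanted <= set(records[r])]


records = [
    ["temporal", "database"],
    ["signature", "file", "query"],
    ["graph", "partitioning"],
]
slices = build_slices(records)
print(search(records, slices, ["signature", "query"]))  # prints [1]
```

Because different terms can turn on the same bits, `candidates` may return records that do not contain the query terms, which is why `search` verifies each candidate against the stored record. The early return when no candidate remains is only a simple stand-in for the idea of ending evaluation early; the stopping condition described in the abstract is specific to MFSF and is not reproduced here.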