Department of Computer Engineering
Browsing Department of Computer Engineering by Type "Review"
Now showing 1 - 13 of 13
Item Open Access
Anonymity on the internet: why the price may be too high (Association for Computing Machinery, 2002)
Davenport, D.
By allowing anonymous Net communication, the fabric of our society is at risk.

Item Open Access
Community-driven roadmap for integrated disease maps (Oxford University Press, 2018)
Ostaszewski, M.; Gebel, S.; Kuperstein, I.; Mazein, A.; Zinovyev, A.; Doğrusöz, Uğur; Hasenauer, J.; Fleming, R. M. T.; Novere, N. L.; Gawron, P.; Ligon, T.; Niarakis, A.; Nickerson, D.; Weindl, D.; Balling, R.; Barillot, E.; Auffray, C.; Schneider, R.
The Disease Maps Project builds on a network of scientific and clinical groups that exchange best practices, share information and develop systems biomedicine tools. The project aims for an integrated, highly curated and user-friendly platform for disease-related knowledge. The primary focus of disease maps is on interconnected signaling, metabolic and gene regulatory network pathways represented in standard formats. The involvement of domain experts ensures that the key disease hallmarks are covered and that relevant, up-to-date knowledge is adequately represented. Expert-curated and computer-readable, disease maps may serve as a compendium of knowledge, allow for data-supported hypothesis generation, or serve as a scaffold for the generation of predictive mathematical models. This article summarizes the 2nd Disease Maps Community meeting, highlighting its important topics and outcomes. We outline milestones on the roadmap for the future development of disease maps, including creating and maintaining standardized disease maps; sharing parts of maps that encode common human disease mechanisms; providing technical solutions for complexity management of maps; and Web tools for in-depth exploration of such maps. A dedicated discussion focused on mathematical modeling approaches, as one of the main goals of disease map development is the generation of mathematically interpretable representations to predict disease comorbidity or drug response and to suggest drug repositioning, altogether supporting clinical decisions.

Item Open Access
A comparison of logical and physical parallel I/O patterns (SAGE Publications Inc., 1998)
Simitci, H.; Reed, D. A.
Although there are several extant studies of parallel scientific application request patterns, there is little experimental data on the correlation of physical I/O patterns with application I/O stimuli. To understand these correlations, the authors instrumented the SCSI device drivers of the Intel Paragon OSF/1 operating system to record key physical I/O activities and correlated these data with the I/O patterns of scientific applications captured via the Pablo analysis toolkit. This analysis shows that disk hardware features profoundly affect the distribution of request delays and that current parallel file systems respond to parallel application I/O patterns in nonscalable ways.

Item Open Access
Editorial: Alan Turing and artificial intelligence (Springer, 2000)
Akman, V.; Blackburn, P.

Item Open Access
Exploiting interclass rules for focused crawling (IEEE, 2004)
Altingövde, I. S.; Ulusoy, Özgür
A baseline crawler based on a focused-crawling approach was developed at Bilkent University. A focused crawler is an agent that targets a particular topic, visiting and gathering only a relevant, narrow Web segment while trying not to waste resources on irrelevant material. The rule-based Web-crawling approach uses linkage statistics among topics to improve the baseline focused crawler's harvest rate and coverage. The crawler also employs a canonical topic taxonomy to train a naïve-Bayesian classifier, which then helps determine the relevancy of crawled pages.
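To illustrate the classifier component described in this abstract, here is a minimal sketch of a naïve-Bayesian relevancy filter. It is an illustration only, not the paper's implementation: it assumes scikit-learn is available and reduces the topic taxonomy to a toy two-topic set of hypothetical labeled pages.

```python
# Minimal sketch of a focused crawler's relevancy filter (illustration only,
# not the paper's code). Assumes scikit-learn; a real crawler trains on a
# canonical topic taxonomy rather than this toy two-topic example.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical training pages labeled by topic.
train_pages = [
    "deep learning neural networks training",
    "convolutional networks for image classification",
    "football league match results and scores",
    "basketball playoffs season standings",
]
train_topics = ["machine-learning", "machine-learning", "sports", "sports"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(train_pages)
classifier = MultinomialNB()
classifier.fit(X, train_topics)

def is_relevant(page_text, target_topic, threshold=0.5):
    """Follow a page's outlinks only if its predicted topic probability is high enough."""
    probs = classifier.predict_proba(vectorizer.transform([page_text]))[0]
    topic_index = list(classifier.classes_).index(target_topic)
    return probs[topic_index] >= threshold

# Likely True given the toy training data above.
print(is_relevant("gradient descent for training networks", "machine-learning"))
```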
Item Open Access
Nanopore sequencing technology and tools for genome assembly: computational analysis of the current state, bottlenecks and future directions (Oxford University Press, 2018-04)
Cali, D. S.; Kim, J. S.; Ghose, S.; Alkan, Can; Mutlu, O.
Nanopore sequencing technology has the potential to render other sequencing technologies obsolete with its ability to generate long reads and provide portability. However, the technology's high error rates pose a challenge for generating accurate genome assemblies. The tools used for nanopore sequence analysis are of critical importance, as they must overcome these high error rates. Our goal in this work is to comprehensively analyze current publicly available tools for nanopore sequence analysis to understand their advantages, disadvantages and performance bottlenecks. Understanding where current tools fall short is important for developing better ones. To this end, we (1) analyze the multiple steps and the associated tools in the genome assembly pipeline using nanopore sequence data, and (2) provide guidelines for determining the appropriate tools for each step. Based on our analyses, we make four key observations: (1) the choice of basecalling tool plays a critical role in overcoming the high error rates of nanopore sequencing technology. (2) The read-to-read overlap finding tools GraphMap and Minimap perform similarly in terms of accuracy; however, Minimap uses less memory and is faster than GraphMap. (3) There is a trade-off between accuracy and performance when deciding on the appropriate tool for the assembly step. The fast but less accurate assembler Miniasm can be used for a quick initial assembly, and further polishing can be applied on top of it to increase accuracy, which leads to faster overall assembly. (4) The state-of-the-art polishing tool, Racon, generates high-quality consensus sequences while providing a significant speedup over another polishing tool, Nanopolish. We analyze various combinations of different tools and expose the trade-offs between accuracy, performance, memory usage and scalability. We conclude that our observations can guide researchers and practitioners in making conscious and effective choices for each step of the genome assembly pipeline using nanopore sequence data. Also, the bottlenecks we have identified can help developers improve current tools or build new ones that are both accurate and fast, to overcome the high error rates of nanopore sequencing technology.
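To make the pipeline steps concrete, here is a minimal sketch of how the overlap, assembly and polishing tools named in this abstract are typically chained together. The command lines follow each tool's commonly documented usage and should be treated as assumptions; they are not the exact versions or configurations benchmarked in the article.

```python
# Sketch of the pipeline analyzed in the article:
# overlap finding (Minimap) -> assembly (Miniasm) -> polishing (Racon).
# Command lines follow each tool's commonly documented usage; the exact
# flags and tool versions benchmarked in the paper may differ.
import subprocess

def run(cmd):
    print(f"[pipeline] {cmd}")
    subprocess.run(cmd, shell=True, check=True)

reads = "reads.fastq"  # basecalled reads; per the paper, basecaller choice matters

# 1) All-vs-all read overlaps with Minimap (PAF output).
run(f"minimap -Sw5 -L100 -m0 {reads} {reads} > overlaps.paf")

# 2) Layout with Miniasm (GFA output), then pull contig sequences into FASTA.
run(f"miniasm -f {reads} overlaps.paf > assembly.gfa")
run("awk '/^S/{print \">\"$2\"\\n\"$3}' assembly.gfa > assembly.fasta")

# 3) Map reads back to the draft assembly and polish the consensus with Racon.
run(f"minimap assembly.fasta {reads} > mappings.paf")
run(f"racon {reads} mappings.paf assembly.fasta > polished.fasta")
```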
Item Open Access
An overview of regression techniques for knowledge discovery (Cambridge University Press, 1999)
Uysal, İ.; Güvenir, H. A.
Predicting or learning numeric features is called regression in the statistical literature, and it is the subject of research in both machine learning and statistics. This paper reviews the important regression techniques and algorithms developed by both communities. Regression is important for many applications, since many real-life problems can be modeled as regression problems. The review includes Locally Weighted Regression (LWR), rule-based regression, Projection Pursuit Regression (PPR), instance-based regression, Multivariate Adaptive Regression Splines (MARS), and recursive partitioning regression methods that induce regression trees (CART, RETIS and M5).
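To make one of the surveyed techniques concrete: Locally Weighted Regression fits a separate weighted least-squares model around each query point. The following is a minimal numpy sketch of that idea, written for illustration rather than taken from the review.

```python
# Minimal sketch of Locally Weighted Regression (LWR), one of the surveyed
# methods: each prediction fits a weighted least-squares line around the query.
import numpy as np

def lwr_predict(x_train, y_train, x_query, tau=0.5):
    """Predict y at x_query using Gaussian kernel weights of bandwidth tau."""
    # Weight each training point by its distance to the query.
    w = np.exp(-((x_train - x_query) ** 2) / (2 * tau ** 2))
    # Design matrix with an intercept column.
    X = np.column_stack([np.ones_like(x_train), x_train])
    W = np.diag(w)
    # Solve the weighted normal equations: (X^T W X) beta = X^T W y.
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y_train)
    return beta[0] + beta[1] * x_query

rng = np.random.default_rng(0)
x = np.linspace(0, 6, 50)
y = np.sin(x) + 0.1 * rng.standard_normal(50)
print(lwr_predict(x, y, 1.5))  # close to sin(1.5), roughly 1.0
```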
Item Open Access
Realizing the potential of blockchain technologies in genomics (Cold Spring Harbor Laboratory Press, 2018)
Özercan, Halil İbrahim; İleri, A. M.; Ayday, Erman; Alkan, Can
Genomics data introduce a substantial computational burden as well as data privacy and ownership issues. Data sets generated by high-throughput sequencing platforms require immense amounts of computational resources to align to reference genomes and to call and annotate genomic variants. This problem is even more pronounced if reanalysis is needed for new versions of reference genomes, which may impose heavy loads on existing computational infrastructures. Additionally, after the compute-intensive analyses are completed, the results are either kept in centralized repositories with access control or distributed among stakeholders using standard file transfer protocols. This poses two main problems: (1) centralized servers become gatekeepers of the data, essentially acting as an unnecessary mediator between the actual data owners and data users; and (2) servers may create single points of failure both in terms of service availability and data privacy. Therefore, there is a need for secure and decentralized platforms for data distribution with user-level data governance. A new technology, blockchain, may help ameliorate some of these problems. In broad terms, blockchain technology enables decentralized, immutable, incorruptible public ledgers. In this Perspective, we aim to introduce current developments toward using blockchain to address several problems in omics, and to provide an outlook on possible future implications of blockchain technology for the life sciences.

Item Open Access
A review of code reviewer recommendation studies: Challenges and future directions (Elsevier, 2021-04-14)
Çetin, H. Alperen; Doğan, Emre; Tüzün, Eray
Code review is the process of inspecting code changes by a developer who is not involved in developing the changeset. One of the first and most important steps of the code review process is selecting code reviewer(s) for a given code change. To maximize the benefits of the code review process, appropriate reviewer selection is essential. Code reviewer recommendation has been an active research area over the last few years, and many recommendation models have been proposed in the literature. In this study, we conduct a systematic literature review by inspecting 29 primary studies published from 2009 to 2020. Based on the outcomes of our review: (1) the most preferred approaches are heuristic approaches, closely followed by machine learning approaches; (2) the majority of the studies use open source projects to evaluate their models; (3) the majority of the studies prefer incremental training set validation techniques; (4) most studies suffer from reproducibility problems; (5) model generalizability and dataset integrity are the most common validity threats for the models; and (6) refining models and conducting additional experiments are the most common future work discussions in the studies.

Item Open Access
Ripping the text apart at different seams (Stanford University Humanities Center, 1994)
Akman, V.; Pound, E.; Eliot, T. S.
This is a brief reply to Herbert A. Simon's fine paper "Literary Criticism: A Cognitive Approach", Stanford Humanities Review, Special Supplement ("Bridging the Gap: Where Cognitive Science Meets Literary Criticism"), vol. 4, no. 1, pp. 1-26, Spring 1994.

Item Open Access
Stance detection: a survey (Association for Computing Machinery, 2020)
Küçük, D.; Can, Fazlı
Automatic elicitation of semantic information from natural language texts is an important research problem with many practical application areas. Especially after the recent proliferation of online content through channels such as social media sites, news portals, and forums, solutions to problems such as sentiment analysis, sarcasm/controversy/veracity/rumour/fake news detection, and argument mining have gained increasing impact and significance, as evidenced by the large volume of related scientific publications. In this article, we tackle an important problem from the same family and present a survey of stance detection in social media posts and (online) regular texts. Although stance detection is defined in different ways in different application settings, the most common definition is "automatic classification of the stance of the producer of a piece of text, towards a target, into one of these three classes: {Favor, Against, Neither}." Our survey includes definitions of related problems and concepts, classifications of the proposed approaches so far, descriptions of the relevant datasets and tools, and related outstanding issues. Stance detection is a recent natural language processing topic with diverse application areas, and our survey article on this newly emerging topic should serve as a significant resource for interested researchers and practitioners.
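The three-class definition quoted in this abstract maps naturally onto standard text classification. Below is a toy sketch under that framing, assuming scikit-learn and using invented example posts; production stance detectors typically also encode the target explicitly rather than classifying the text alone.

```python
# Toy sketch of stance detection as three-class text classification
# ({Favor, Against, Neither}), following the definition quoted in the survey.
# Assumes scikit-learn; examples are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical posts about a single target, labeled with stance.
posts = [
    "this policy will create jobs and help families",
    "fully support the proposal, long overdue",
    "this policy is a disaster and must be stopped",
    "terrible idea, it will hurt small businesses",
    "the committee meets on tuesday to discuss it",
    "here is a link to the full text of the bill",
]
stances = ["Favor", "Favor", "Against", "Against", "Neither", "Neither"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(posts, stances)
# Likely ['Favor'] given the toy training data above.
print(model.predict(["what a great proposal, count me in"]))
```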
Item Open Access
Systems biology graphical notation: process description language Level 1 Version 2.0 (De Gruyter, 2019)
Rougny, A.; Touré, V.; Moodie, S.; Balaur, I.; Czauderna, T.; Borlinghaus, H.; Doğrusöz, Uğur; Mazein, A.; Dräger, A.; Blinov, M. L.; Villéger, A.; Haw, R.; Demir, E.; Mi, H.; Sorokin, A.; Schreiber, F.; Luna, A.
The Systems Biology Graphical Notation (SBGN) is an international community effort that aims to standardise the visualisation of pathways and networks for readers with diverse scientific backgrounds, as well as to support an efficient and accurate exchange of biological knowledge between disparate research communities, industry, and other players in systems biology. SBGN comprises the three languages Entity Relationship, Activity Flow, and Process Description (PD) to cover biological and biochemical systems at distinct levels of detail. PD is closest to the metabolic and regulatory pathways found in biological literature and textbooks. Its well-defined semantics offer superior precision in expressing biological knowledge. PD represents mechanistic and temporal dependencies of biological interactions and transformations as a graph. Its different types of nodes include entity pools (e.g. metabolites, proteins, genes and complexes) and processes (e.g. reactions, associations and influences). The edges describe relationships between the nodes (e.g. consumption, production, stimulation and inhibition). This document details Level 1 Version 2.0 of the PD specification, including several improvements, in particular: 1) the addition of the equivalence operator, subunit, and annotation glyphs; 2) modifications to the usage of submaps; and 3) updates to clarify the use of various glyphs (i.e. multimer, empty set, and state variable).

Item Open Access
Technology dictates algorithms: recent developments in read alignment (BioMed Central, 2021-08-26)
Alser, Mohammed; Rotman, J.; Deshpande, D.; Taraszka, K.; Shi, H.; Baykal, P. I.; Yang, H. T.; Xue, V.; Knyazev, S.; Singer, B. D.; Balliu, B.; Koslicki, D.; Skums, P.; Zelikovsky, A.; Alkan, Can; Mutlu, Onur; Mangul, S.
Aligning sequencing reads onto a reference is an essential step of the majority of genomic analysis pipelines. Computational algorithms for read alignment have evolved in accordance with technological advances, leading to today's diverse array of alignment methods. We provide a systematic survey of algorithmic foundations and methodologies across 107 alignment methods, for both short and long reads. We provide a rigorous experimental evaluation of 11 read aligners to demonstrate the effect of these underlying algorithms on the speed and efficiency of read alignment. We discuss how general alignment algorithms have been tailored to the specific needs of various domains in biology.
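As a reminder of the dynamic-programming core that many of the surveyed aligners build on, here is a minimal fitting-alignment sketch: the whole read is aligned somewhere inside the reference, with free start and end positions on the reference side. This is an illustration of the general technique, not code from any surveyed tool; production aligners layer indexing (e.g. FM-indexes or minimizers) and heuristics on top.

```python
# Minimal sketch of aligning a short read against a reference by dynamic
# programming (fitting alignment: the whole read, anywhere in the reference).
# Illustration of the shared algorithmic core, not any surveyed aligner.
def fit_align(read, ref, match=1, mismatch=-1, gap=-2):
    m, n = len(read), len(ref)
    # dp[i][j]: best score aligning read[:i], ending at reference position j.
    # Row 0 is all zeros, so the alignment may start anywhere in the reference.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] + gap  # read consumed against nothing
        for j in range(1, n + 1):
            diag = dp[i - 1][j - 1] + (match if read[i - 1] == ref[j - 1] else mismatch)
            dp[i][j] = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    # Free end: take the best score anywhere in the last row.
    best_j = max(range(n + 1), key=lambda j: dp[m][j])
    return dp[m][best_j], best_j  # best score and reference end position

score, end = fit_align("ACGT", "TTACGTTT")
print(score, end)  # expect 4, 6: exact match ending at reference index 6
```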