Browsing by Subject "Clustering"
Now showing 1 - 20 of 30
Item Open Access A planar facility location–allocation problem with fixed and/or variable cost structures for rural electrification (2023-06) Akbaş, Beste; Kocaman, Ayşe Selin
One major impediment to developing countries’ economic growth is the lack of access to affordable, sustainable, and reliable modern energy systems. Even today, hundreds of millions of people live in rural areas and do not have access to essential electricity services. In this study, we present a planar facility location–allocation problem for planning decentralized energy systems in rural development. We consider nano-grid and micro-grid systems to electrify rural households. While micro-grids serve multiple households with a common generation facility, nano-grids are small-scale systems serving individual consumers. The households served by micro-grids are connected to the generation facilities with low-voltage cables, for which we employ a distance limit constraint due to technical concerns, including power loss and allowable voltage levels. In this problem, we minimize the total investment cost that consists of the facility opening and the low-voltage cable costs. In order to capture the diversity of cost structures in renewable energy investments, we consider three versions of the objective function where we incorporate different combinations of fixed and variable cost components for facilities. For this problem, we provide mixed-integer quadratically constrained problem formulations and propose model-based and clustering-based heuristic approaches. Model-based approaches are multi-stage, in which we solve the discrete counterparts of the problem and employ alternative selection methods for the candidate facility locations. Clustering-based approaches utilize faster clustering techniques to identify the type and location of the facilities.
We conduct computational experiments on real-life instances from villages in Sub-Saharan Africa and perform a comparative analysis of the suggested heuristic approaches.

Item Open Access Artificial intelligence-based hybrid anomaly detection and clinical decision support techniques for automated detection of cardiovascular diseases and Covid-19 (2023-10) Terzi, Merve Begüm
Coronary artery diseases are the leading cause of death worldwide, and early diagnosis is crucial for timely treatment. To address this, we present a novel automated artificial intelligence-based hybrid anomaly detection technique composed of various signal processing, feature extraction, supervised, and unsupervised machine learning methods. By jointly and simultaneously analyzing 12-lead electrocardiogram (ECG) and cardiac sympathetic nerve activity (CSNA) data, the automated artificial intelligence-based hybrid anomaly detection technique performs fast, early, and accurate diagnosis of coronary artery diseases. To develop and evaluate the proposed automated artificial intelligence-based hybrid anomaly detection technique, we utilized the fully labeled STAFF III and PTBD databases, which contain 12-lead wideband raw recordings noninvasively acquired from 260 subjects. Using the wideband raw recordings in these databases, we developed a signal processing technique that simultaneously detects the 12-lead ECG and CSNA signals of all subjects. Subsequently, using the pre-processed 12-lead ECG and CSNA signals, we developed a time-domain feature extraction technique that extracts the statistical CSNA and ECG features critical for the reliable diagnosis of coronary artery diseases. Using the extracted discriminative features, we developed a supervised classification technique based on artificial neural networks that simultaneously detects anomalies in the 12-lead ECG and CSNA data.
Furthermore, we developed an unsupervised clustering technique based on the Gaussian mixture model and Neyman-Pearson criterion that performs robust detection of the outliers corresponding to coronary artery diseases. By using the automated artificial intelligence-based hybrid anomaly detection technique, we have demonstrated a significant association between the increase in the amplitude of CSNA signal and anomalies in ECG signal during coronary artery diseases. The automated artificial intelligence-based hybrid anomaly detection technique performed highly reliable detection of coronary artery diseases with a sensitivity of 98.48%, specificity of 97.73%, accuracy of 98.11%, positive predictive value (PPV) of 97.74%, negative predictive value (NPV) of 98.47%, and F1-score of 98.11%. Hence, the artificial intelligence-based hybrid anomaly detection technique has superior performance compared to the gold standard diagnostic test ECG in diagnosing coronary artery diseases. Additionally, it outperformed other techniques developed in this study that separately utilize either only CSNA data or only ECG data. Therefore, it significantly increases the detection performance of coronary artery diseases by taking advantage of the diversity in different data types and leveraging their strengths. Furthermore, its performance is comparatively better than that of most previously proposed machine and deep learning methods that exclusively used ECG data to diagnose or classify coronary artery diseases. It also has a very short implementation time, which is highly desirable for real-time detection of coronary artery diseases in clinical practice. The proposed automated artificial intelligence-based hybrid anomaly detection technique may serve as an efficient decision-support system to increase physicians' success in achieving fast, early, and accurate diagnosis of coronary artery diseases.
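As a reader's aid, the F1-score quoted above can be cross-checked from the reported sensitivity and PPV, since F1 is the harmonic mean of precision (PPV) and recall (sensitivity). The sketch below is only an illustration of that arithmetic; the function name `f1_score` is an assumption made here and is not taken from the thesis.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision (PPV) and recall (sensitivity)."""
    if precision + recall == 0:
        return 0.0  # convention chosen here for the degenerate case
    return 2 * precision * recall / (precision + recall)


# Values quoted in the abstract, expressed as fractions.
ppv = 0.9774
sensitivity = 0.9848

# Prints 98.11, which matches the reported F1-score.
print(round(100 * f1_score(ppv, sensitivity), 2))
```

The same check can be applied to any of the precision and sensitivity pairs reported in this abstract.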
It may be highly beneficial and valuable, particularly for asymptomatic coronary artery disease patients, for whom the diagnostic information provided by ECG alone is not sufficient to reliably diagnose the disease. Hence, it may significantly improve patient outcomes, enable timely treatments, and reduce the mortality associated with cardiovascular diseases.
Secondly, we propose a new automated artificial intelligence-based hybrid clinical decision support technique that jointly analyzes reverse transcriptase polymerase chain reaction (RT-PCR) curves, thorax computed tomography images, and laboratory data to perform fast and accurate diagnosis of Coronavirus disease 2019 (COVID-19). For this purpose, we retrospectively created the fully labeled Ankara University Faculty of Medicine COVID-19 (AUFM-CoV) database, which contains a wide variety of medical data, including RT-PCR curves, thorax computed tomography images, and laboratory data. The AUFM-CoV is the most comprehensive database that includes thorax computed tomography images of COVID-19 pneumonia (CVP), other viral and bacterial pneumonias (VBP), and parenchymal lung diseases (PLD), all of which present significant challenges for differential diagnosis. We developed a new automated artificial intelligence-based hybrid clinical decision support technique, which is an ensemble learning technique consisting of two preprocessing methods, a long short-term memory network-based deep learning method, a convolutional neural network-based deep learning method, and an artificial neural network-based machine learning method. By jointly analyzing RT-PCR curves, thorax computed tomography images, and laboratory data, the proposed automated artificial intelligence-based hybrid clinical decision support technique benefits from the diversity in different data types that are critical for the reliable detection of COVID-19 and leverages their strengths.
The multi-class classification performance results of the proposed convolutional neural network-based deep learning method on the AUFM-CoV database showed that it achieved highly reliable detection of COVID-19 with a sensitivity of 91.9%, specificity of 92.5%, precision of 80.4%, and F1-score of 86%. Therefore, it outperformed thorax computed tomography in terms of the specificity of COVID-19 diagnosis. Moreover, the convolutional neural network-based deep learning method has been shown to very successfully distinguish COVID-19 pneumonia (CVP) from other viral and bacterial pneumonias (VBP) and parenchymal lung diseases (PLD), which exhibit very similar radiological findings. Therefore, it has great potential to be successfully used in the differential diagnosis of pulmonary diseases containing ground-glass opacities. The binary classification performance results of the proposed convolutional neural network-based deep learning method showed that it achieved a sensitivity of 91.5%, specificity of 94.8%, precision of 85.6%, and F1-score of 88.4% in diagnosing COVID-19. Hence, it has comparable sensitivity to thorax computed tomography in diagnosing COVID-19. Additionally, the binary classification performance results of the proposed long short-term memory network-based deep learning method on the AUFM-CoV database showed that it performed highly reliable detection of COVID-19 with a sensitivity of 96.6%, specificity of 99.2%, precision of 98.1%, and F1-score of 97.3%. Thus, it outperformed the gold standard RT-PCR test in terms of the sensitivity of COVID-19 diagnosis. Furthermore, the multi-class classification performance results of the proposed automated artificial intelligence-based hybrid clinical decision support technique on the AUFM-CoV database showed that it diagnosed COVID-19 with a sensitivity of 66.3%, specificity of 94.9%, precision of 80%, and F1-score of 73%.
Hence, it has been shown to very successfully perform the differential diagnosis of COVID-19 pneumonia (CVP) and other pneumonias. The binary classification performance results of the automated artificial intelligence-based hybrid clinical decision support technique revealed that it diagnosed COVID-19 with a sensitivity of 90%, specificity of 92.8%, precision of 91.8%, and F1-score of 90.9%. Therefore, it exhibits superior sensitivity and specificity compared to laboratory data in COVID-19 diagnosis. The performance results of the proposed automated artificial intelligence-based hybrid clinical decision support technique on the AUFM-CoV database demonstrate its ability to provide highly reliable diagnosis of COVID-19 by jointly analyzing RT-PCR data, thorax computed tomography images, and laboratory data. Consequently, it may significantly increase the success of physicians in diagnosing COVID-19, assist them in rapidly isolating and treating COVID-19 patients, and reduce their workload in daily clinical practice.

Item Open Access Balancing energy loads in wireless sensor networks through uniformly quantized energy levels-based clustering (IEEE, 2010) Ali, Syed Amjad; Sevgi, Cüneyt; Kocyigit, A.
Clustering is considered a common and effective method to prolong the lifetime of a wireless sensor network. This paper provides a new insight into the cluster formation process based on uniformly quantizing the residual energy of the sensor nodes. The unified simulation framework provided herein helps reveal not only an optimum number of clusters but also the number of quantization levels required to maximize the network's lifetime by improving energy load balancing for both homogeneous and heterogeneous sensor networks. The provided simulation results clearly show that the uniformly quantized energy level-based clustering provides improved load balancing and, hence, a longer network lifetime than existing methods.
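To make the idea of uniformly quantizing residual energy concrete, the following minimal sketch maps each node's residual energy to one of a fixed number of equal-width levels and groups nodes by level. It is an illustration only: the names `quantize_energy` and `group_by_level`, the parameter `e_max`, and the grouping step are assumptions made here, and the paper's actual cluster formation rules are not described in the abstract.

```python
from collections import defaultdict


def quantize_energy(residual: float, e_max: float, levels: int) -> int:
    """Map residual energy in [0, e_max] to an integer level 0..levels-1
    using equal-width (uniform) quantization bins."""
    if levels < 1 or e_max <= 0:
        raise ValueError("levels must be >= 1 and e_max must be positive")
    clipped = min(max(residual, 0.0), e_max)
    # A node holding exactly e_max falls into the top level.
    return min(int(clipped / e_max * levels), levels - 1)


def group_by_level(energies: dict, e_max: float, levels: int) -> dict:
    """Group node ids by their quantized energy level (hypothetical helper)."""
    groups = defaultdict(list)
    for node_id, energy in energies.items():
        groups[quantize_energy(energy, e_max, levels)].append(node_id)
    return dict(groups)


# Example: three nodes, four quantization levels, e_max normalized to 1.0.
print(group_by_level({"a": 0.9, "b": 0.1, "c": 0.95}, e_max=1.0, levels=4))
```

With more levels, nodes are distinguished more finely by remaining energy, which is the trade-off the abstract says the simulation framework explores.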
© 2010 IEEE.

Item Open Access CAP-RNAseq: an online platform for RNA-seq data clustering, annotation and prioritization based on gene essentiality and congruence between mRNA and protein levels (2024-04) Özdeniz, Merve Vural
In recent years, there has been a remarkable growth in the application of RNA-seq in both clinical and molecular biology research contexts. The analysis and interpretation of these RNA-seq data demand a good knowledge of bioinformatics. Many different applications are available to perform the analysis, but more comprehensive applications are needed, especially for researchers without coding experience. Therefore, I developed a novel all-in-one RNA-seq analysis tool, CAP-RNAseq (http://konulabapps.bilkent.edu.tr:3838/CAPRNAseq/), which provides valuable analysis for co-expression cluster prioritization and annotation. In particular, CAP-RNAseq clusters genes based on their expression patterns and annotates mirror clusters that display inverse patterns with network-based visualizations, before prioritizing clusters and/or genes based on "gene essentiality", protein levels, and the degree of congruence between the mRNA and protein levels of genes. Furthermore, to illustrate the use of CAP-RNAseq in this thesis, I reanalyzed a number of published RNA-seq datasets and identified novel pathways modulated by NTRK2 overexpression (GSE136868) in neural stem cells, and also showed the significance of the essential genes/pathways in senescent cell clearance, focusing on NTRK2 (fibroblast; GSE190998) and THBD (Huh7; GSE228941) siRNA models.
In addition, I analyzed our lab’s novel RNA-seq data obtained from breast cancer cell lines in CAP-RNAseq, and the findings revealed a) the complex associations between the steroid hormones Drospirenone, Aldosterone, and Estrogen in hormone-positive T47D and mineralocorticoid receptor-overexpressing MCF-7 cells; and b) significant differences in essential and non-essential gene expression of the isogenic MCF7 cells overexpressing wildtype or mutant TP53. I also studied a public breast cancer dataset (GSE201085), demonstrating CAP-RNAseq’s ability to identify novel breast cancer markers exhibiting high mRNA-protein level correlations. In conclusion, this thesis not only demonstrates the use and power of CAP-RNAseq as a tool to identify essential genes and pathways by analyzing RNA-seq data, but also provides new insights into the roles of essential genes in glioma, senescence and breast cancer.

Item Open Access Categorization in a hierarchically structured text database (2001) Kutlu, Ferhat
Over the past two decades there has been a huge increase in the amount of data stored in databases and in the on-line flow of data, driven by improvements in the Internet. This increase created a need for intelligent tools to manage data of that size and its flow. The hierarchical approach is the best way to satisfy these needs, and it is widespread among people dealing with databases and the Internet. The Usenet newsgroups system is one of the on-line databases that have built-in hierarchical structures. Our point of departure is this hierarchical structure, which makes categorization tasks easier and faster. In fact, most Internet search engines also exploit the inherent hierarchy of the Internet. The growing size of data makes most of the traditional categorization algorithms obsolete.
Thus we developed a brand-new categorization learning algorithm which constructs an index tree out of the Usenet news database and then decides the related newsgroups of a new news article by categorizing it over the index tree. In the learning phase it follows an agglomerative, bottom-up hierarchical approach. In the categorization phase it performs an overlapping and supervised categorization. The k Nearest Neighbor categorization algorithm is used as a baseline for comparing the complexity and accuracy of our algorithm. This comparison not only contrasts two different algorithms but also contrasts the hierarchical approach with the flat approach, similarity measures with distance measures, and the importance of accuracy with the importance of speed. Our algorithm prefers the hierarchical approach and a similarity measure, and greatly outperforms the k Nearest Neighbor categorization algorithm in speed with minimal loss of accuracy.

Item Open Access Cluster based collaborative filtering with inverted indexing (2005) Subakan, Özlem Nurcan
Collectively, a population contains vast amounts of knowledge, and modern communication technologies increase the ease of communication. However, it is not feasible for a single person to aggregate the knowledge held in thousands or millions of data items and extract useful information from it. Collaborative information systems are attempts to harness the knowledge of a population and to present it in a simple, fast and fair manner. Collaborative filtering has been successfully used in domains where the information content is not easily parsable and traditional information filtering techniques are difficult to apply. Collaborative filtering works over a database of ratings that users have given to items. The computational complexity of these methods grows linearly with the number of customers, which can reach several million in typical commercial applications.
To address the scalability concern, we have developed an efficient collaborative filtering technique by applying user clustering and using a specific inverted index structure (the so-called cluster-skipping inverted index structure) that is tailored for clustered environments. We show that the predictive accuracy of the system is comparable with that of collaborative filtering algorithms without clustering, while efficiency is substantially improved.

Item Open Access Cluster searching strategies for collaborative recommendation systems (2013) Altingovde, I. S.; Subakan, Ö. N.; Ulusoy, Özgür
In-memory nearest neighbor computation is a typical collaborative filtering approach for high recommendation accuracy. However, this approach is not scalable given the huge number of customers and items in typical commercial applications. Cluster-based collaborative filtering techniques can be a remedy for the efficiency problem, but they usually provide relatively lower accuracy figures, since they may become over-generalized and produce less-personalized recommendations. Our research explores an individualistic strategy which initially clusters the users and then exploits the members within clusters, not just the cluster representatives, during the recommendation generation stage. We provide an efficient implementation of this strategy by adapting a specifically tailored cluster-skipping inverted index structure. Experimental results reveal that the individualistic strategy with the cluster-skipping index is a good compromise that yields high accuracy and reasonable scalability figures. © 2012 Elsevier Ltd. All rights reserved.

Item Open Access A cluster-based external plagiarism and parallel corpora detection method (2011) Karbeyaz, Ceyhun Efe
Today, different editions and translations of the same literary text can be found. Intuitively, translations that are based on the same literary text are expected to possess significantly similar structure.
In the same way, a text that is suspected of plagiarism may possess structural similarities with the text that is believed to be the source of the plagiarism. Textual plagiarism implies the use of an author’s text, work, or idea in another textual work without giving a reference or without obtaining the permission of the original text’s author. Today, existing intrinsic and external plagiarism detection methods tend to detect plagiarism cases within a given dataset so that the algorithms run in a reasonable amount of time. Hence, a reference document set is built so that these algorithms can search for plagiarism cases successfully. In this thesis, a method for detecting and quantifying external plagiarism and parallel corpora is introduced. For this purpose, we use structural similarities to analyze the plagiarism detection problem and to quantify the similarity between given texts. In this method, suspicious and source texts are partitioned into corresponding blocks. Each block is represented as a group of documents, where a document consists of a fixed number of words. Then, blocks are indexed and clustered by using the cover coefficient clustering algorithm. Cluster formations for both texts are then analyzed and their similarities are measured. The results over the PAN’09 plagiarism dataset and over different versions of the famous literary classic Leylâ and Mecnun show that the proposed method successfully detects and quantifies the structurally similar plagiarism cases and succeeds in detecting the parallel corpora.

Item Open Access Clustering spatial networks for aggregate query processing: a hypergraph approach (Elsevier Ltd, 2008-03) Demir, E.; Aykanat, Cevdet; Cambazoglu, B. B.
In spatial networks, clustering adjacent data to disk pages is highly likely to reduce the number of disk page accesses made by the aggregate network operations during query processing.
For this purpose, different techniques based on the clustering graph model are proposed in the literature. In this work, we show that the state-of-the-art clustering graph model is not able to correctly capture the disk access costs of aggregate network operations. Moreover, we propose a novel clustering hypergraph model that correctly captures the disk access costs of these operations. The proposed model aims to minimize the total number of disk page accesses in aggregate network operations. Based on this model, we further propose two adaptive recursive bipartitioning schemes to reduce the number of allocated disk pages while trying to minimize the number of disk page accesses. We evaluate our clustering hypergraph model and recursive bipartitioning schemes on a wide range of road network datasets. The results of the conducted experiments show that the proposed model is quite effective in reducing the number of disk accesses incurred by the network operations. © 2007 Elsevier B.V. All rights reserved.

Item Open Access Data decomposition techniques for parallel tree-based k-means clustering (2002) Şen, Cenk
The main computation in k-means clustering is the calculation of distances between cluster centroids and patterns. As the number of patterns and the number of centroids increase, the time needed to complete the computations increases. This computational load requires high performance computers and/or algorithmic improvements. The parallel tree-based k-means algorithm on distributed memory machines combines algorithmic improvements with the high computation capacity of parallel computers to deal with huge datasets. Its performance is affected by the data decomposition technique used. In this thesis, we present novel data decomposition techniques to improve the performance of the parallel tree-based k-means algorithm on distributed memory machines.
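The cost described above can be seen in the assignment step of plain k-means, sketched below: every pattern is compared with every centroid, so one pass over n patterns and k centroids performs n × k distance calculations. This is a generic, sequential sketch and not the parallel tree-based algorithm of the thesis; the function name `assign_patterns` and the explicit counter are assumptions added for illustration.

```python
import math


def assign_patterns(patterns, centroids):
    """Assign each pattern to its nearest centroid (Euclidean distance).

    Returns the list of centroid indices and the number of distance
    calculations performed, which is len(patterns) * len(centroids).
    """
    labels = []
    n_distance_calcs = 0
    for pattern in patterns:
        best_index, best_distance = -1, math.inf
        for index, centroid in enumerate(centroids):
            distance = math.dist(pattern, centroid)
            n_distance_calcs += 1
            if distance < best_distance:
                best_index, best_distance = index, distance
        labels.append(best_index)
    return labels, n_distance_calcs


labels, calcs = assign_patterns(
    [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)],
    [(0.0, 0.0), (10.0, 10.0)],
)
print(labels, calcs)  # three patterns, two centroids: six distance calculations
```

Tree-based variants aim to skip many of these comparisons by pruning centroids that cannot be nearest for a whole region of the data.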
The proposed tree-based decomposition techniques try to decrease the total number of distance calculations by assigning compact subspaces to processors. A compact subspace improves the performance of the pruning function of the tree-based k-means algorithm. We have implemented the algorithm and conducted experiments on a PC cluster. Our experimental results demonstrate that the tree-based decomposition technique outperforms the random decomposition and stripwise decomposition techniques.

Item Open Access Disrupted network topology in patients with stable and progressive mild cognitive impairment and Alzheimer's disease (Oxford University Press, 2016) Pereira, J. B.; Mijalkov, M.; Kakaei, E.; Mecocci, P.; Vellas, B.; Tsolaki, M.; Kłoszewska, I.; Soininen, H.; Spenger, C.; Lovestone, S.; Simmons, A.; Wahlund, L.-O.; Volpe, G.; Westman, E.
Recent findings suggest that Alzheimer's disease (AD) is a disconnection syndrome characterized by abnormalities in large-scale networks. However, the alterations that occur in network topology during the prodromal stages of AD, particularly in patients with stable mild cognitive impairment (MCI) and those that show a slow or faster progression to dementia, are still poorly understood. In this study, we used graph theory to assess the organization of structural MRI networks in stable MCI (sMCI) subjects, late MCI converters (lMCIc), early MCI converters (eMCIc), and AD patients from 2 large multicenter cohorts: ADNI and AddNeuroMed. Our findings showed an abnormal global network organization in all patient groups, as reflected by an increased path length, reduced transitivity, and increased modularity compared with controls. In addition, lMCIc, eMCIc, and AD patients showed a decreased path length and mean clustering compared with the sMCI group.
At the local level, there were nodal clustering decreases mostly in AD patients, while the nodal closeness centrality detected abnormalities across all patient groups, showing overlapping changes in the hippocampi and amygdala and nonoverlapping changes in parietal, entorhinal, and orbitofrontal regions. These findings suggest that the prodromal and clinical stages of AD are associated with an abnormal network topology.

Item Open Access Efficiency and effectiveness of query processing in cluster-based retrieval (Elsevier, 2004) Can, F.; Altingövde, I. S.; Demir, E.
Our research shows that for large databases, without considerable additional storage overhead, cluster-based retrieval (CBR) can compete with the time efficiency and effectiveness of the inverted index-based full search (FS). The proposed CBR method employs a storage structure that blends the cluster membership information into the inverted file posting lists. This approach significantly reduces the cost of similarity calculations for document ranking during query processing and improves efficiency. For example, in terms of in-memory computations, our new approach can reduce query processing time to 39% of FS. The experiments confirm that the approach is scalable and system performance improves with increasing database size. In the experiments, we use the cover coefficient-based clustering methodology (C3M) and the Financial Times database of TREC, which contains 210158 documents of size 564 MB defined by 229748 terms with a total of 29545234 inverted index elements. This study provides CBR efficiency and effectiveness experiments using the largest corpus in an environment that employs no user interaction or user behavior assumption for clustering. © 2003 Elsevier Ltd.
All rights reserved.

Item Open Access Efficient successor retrieval operations for aggregate query processing on clustered road networks (Elsevier Inc., 2010) Demir, E.; Aykanat, Cevdet
Get-Successors (GS), which retrieves all successors of a junction, is a kernel operation used to facilitate aggregate computations in road network queries. Efficient implementation of the GS operation is crucial since the disk access cost of this operation constitutes a considerable portion of the total query processing cost. Firstly, we propose a new successor retrieval operation, Get-Unevaluated-Successors (GUS), which retrieves only the unevaluated successors of a given junction. The GUS operation is an efficient implementation of the GS operation, where the candidate successors to be retrieved are pruned according to the properties and state of the algorithm. Secondly, we propose a hypergraph-based model for clustering junctions that are successively retrieved by the GUS operations to the same pages. The proposed model utilizes query logs to correctly capture the disk access cost of GUS operations. The proposed GUS operation and associated clustering model are evaluated for two different instances of GUS operations, which typically arise in Dijkstra's single source shortest path algorithm and the incremental network expansion framework. Our simulation results show that the proposed successor retrieval operation together with the proposed clustering hypergraph model is quite effective in reducing the number of disk accesses in query processing. © 2010 Published by Elsevier Inc.

Item Open Access EHPBS: Energy harvesting prediction based scheduling in wireless sensor networks (IEEE, 2013) Akgun, B.; Aykın, Irmak
The clustering algorithms designed for traditional sensor networks have been adapted for energy harvesting sensor networks (EHWSN). However, in these algorithms, the intra-cluster MAC protocols to be used were either not defined at all or they were TDMA based.
These TDMA based MAC protocols are not specified beyond the fact that cluster heads assign time slots to their members in a random manner. In this paper, we modify this TDMA based scheduling as follows: members request a time slot depending on their energy prediction, and the cluster heads assign these slots to members. This method is expected to increase the network lifetime, which we demonstrate through simulations. © 2013 IEEE.

Item Open Access Farklı yapay sinir ağı temelli sınıflandırıcılar ile insan hareketi tanımlama [Human activity recognition with different artificial neural network based classifiers] (IEEE, 2017-05) Çatalbaş, Burak; Morgül, Ömer; Çatalbaş, Bahadır
Human activity recognition is a popular research topic because of its importance and the difficulty of reaching high classification rates with a limited feature vector. As the inertial measurement units embedded in smartphones make individuals' movements more measurable, the amount of data collected in this field is growing, which makes it possible to design more successful classifiers. Artificial neural networks can perform better than conventional classifiers on classification problems. In this study, various artificial neural network structures were tried in order to propose an artificial neural network-based classifier for the University of California, Irvine (UCI) data set, and the success rates obtained with these classifiers were compared with the results reported in the literature for the same data set.

Item Open Access Hypergraph models and algorithms for data-pattern-based clustering (Springer, 2004) Ozdal, M. M.; Aykanat, Cevdet
In traditional approaches for clustering market basket type data, relations among transactions are modeled according to the items occurring in these transactions. However, an individual item might induce different relations in different contexts.
Since such contexts might be captured by interesting patterns in the overall data, we represent each transaction as a set of patterns through modifying the conventional pattern semantics. By clustering the patterns in the dataset, we infer a clustering of the transactions represented this way. For this, we propose a novel hypergraph model to represent the relations among the patterns. Instead of a local measure that depends only on common items among patterns, we propose a global measure that is based on the co-occurrences of these patterns in the overall data. The success of existing hypergraph partitioning based algorithms in other domains depends on sparsity of the hypergraph and explicit objective metrics. For this, we propose a two-phase clustering approach for the above hypergraph, which is expected to be dense. In the first phase, the vertices of the hypergraph are merged in a multilevel algorithm to obtain a large number of high quality clusters. Here, we propose new quality metrics for merging decisions in hypergraph clustering specifically for this domain. In order to enable the use of existing metrics in the second phase, we introduce a vertex-to-cluster affinity concept to devise a method for constructing a sparse hypergraph based on the obtained clustering. The experiments we have performed show the effectiveness of the proposed framework.

Item Open Access Identification of cancer patient subgroups via pathway based multi-view graph kernel clustering (2017-07) Ünal, Ali Burak
Characterizing patient genomic alterations through next-generation sequencing technologies opens up new opportunities for refining cancer subtypes. Different omics data provide different views into the molecular biology of the tumors. However, tumor cells exhibit high levels of heterogeneity, and different patients harbor different combinations of molecular alterations. On the other hand, different alterations may perturb the same biological pathways.
In this work, we propose a novel clustering procedure that quantifies the similarities of patients from their alteration profiles on pathways via a novel graph kernel. For each pathway and patient pair, a vertex labeled undirected graph is constructed based on the patient molecular alterations and the pathway interactions. The proposed smoothed shortest path graph kernel (smSPK) assesses the similarity of a pair of patients with respect to a pathway by comparing their vertex labeled graphs. Our clustering procedure involves two steps. In the first step, the smSPK kernel matrices for each pathway and data type are computed for patient pairs to construct multiple kernel matrices, and in the ensuing step, these kernel matrices are input to a multi-view kernel clustering algorithm to stratify patients. We apply our methodology to 361 renal cell carcinoma patients, using somatic mutations and gene and protein expression data. This approach yields subgroups of patients that differ significantly in their survival times (p-value 1.5 × 10⁻⁸). The proposed methodology allows integrating other types of omics data and provides insight into disrupted pathways in each patient subgroup.

Item Open Access Koroner arter hastalığının destek vektör makineleri ve Gauss karışım modeli ile tespiti [Detection of coronary artery disease using support vector machines and a Gaussian mixture model] (IEEE, 2019-04) Terzi, Merve Begüm; Arıkan, Orhan
In this study, a technique was developed that detects anomalies in the ECG using current signal processing and machine learning methods in order to achieve robust detection of coronary artery disease (CAD). To this end, using the wideband recordings in the European ST-T database, a novel feature extraction technique was developed that obtains the ECG features critical for the reliable detection of CAD. Using the extracted features, a supervised learning technique based on support vector machines (SVM) and kernel functions was developed that performs robust detection of CAD. For cases where ischemic ECG data are lacking, an unsupervised learning technique based on the Gaussian mixture model (GMM) was developed that performs robust detection of CAD using only baseline ECG data. A Neyman-Pearson type approach was developed to robustly detect the outliers that represent CAD. The performance results of the proposed technique on the European ST-T database show that it provides highly reliable CAD detection.

Item Open Access A link-based storage scheme for efficient aggregate query processing on clustered road networks (Elsevier Ltd, 2010) Demir, E.; Aykanat, Cevdet; Cambazoglu, B. B.
The need to have efficient storage schemes for spatial networks is apparent when the volume of query processing in some road networks (e.g., the navigation systems) is considered. Specifically, under the assumption that the road network is stored in a central server, the adjacent data elements in the network must be clustered on the disk in such a way that the number of disk page accesses is kept minimal during the processing of network queries. In this work, we introduce the link-based storage scheme for clustered road networks and compare it with the previously proposed junction-based storage scheme. In order to investigate the performance of aggregate network queries in clustered road networks, we extend our recently proposed clustering hypergraph model from junction-based storage to link-based storage. We propose techniques for additional storage savings in bidirectional networks that make the link-based storage scheme even more preferable in terms of storage efficiency. We evaluate the performance of our link-based storage scheme against the junction-based storage scheme both theoretically and empirically. The results of the experiments conducted on a wide range of road network datasets show that the link-based storage scheme is preferable in terms of both storage and query processing efficiency.
© 2009 Elsevier B.V. All rights reserved.
Item Open Access On-line new event detection and clustering using the concepts of the cover coefficient-based clustering methodology (2002) Vural, Ahmet
In this study, we use the concepts of the cover coefficient-based clustering methodology (C3M) for on-line new event detection and event clustering. The main idea of the study is to use the seed selection process of the C3M algorithm to detect new events. Since C3M works in a retrospective manner, we modify the algorithm to work in an on-line environment. Furthermore, in order to prevent oversized event clusters and to give all documents an equal chance to be the seed of a new event, we employ the window size concept. Since we want to control the number of seed documents, we introduce a threshold concept to the event clustering algorithm. We also use the threshold concept, with a slight modification, in on-line event detection. In the experiments we use the TDT1 corpus, which is also used in the original topic detection and tracking study. In event clustering and event detection, we use both binary and weighted versions of the TDT1 corpus. With the binary implementation, we obtain better results. When we compare our on-line event detection results to those of the UMASS approach, we obtain better performance in terms of false alarm rates.
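The interplay of a threshold and a window in on-line new event detection can be illustrated with a minimal single-pass sketch. This is not the C3M-based method of the abstract above: it swaps in plain cosine similarity over term counts in place of the cover coefficient and seed power concepts, and it applies the window to the most recent seed documents, which is an assumption made for illustration. The function names, the `threshold` and `window_size` values, and the sample stream are all hypothetical.

```python
from collections import Counter, deque
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def detect_new_events(docs, threshold=0.3, window_size=3):
    """Return one flag per document: True if it is treated as a new event.

    A document is flagged as new when its best similarity to the seeds
    currently in the window falls below `threshold`. Only the
    `window_size` most recent seeds are kept (illustrative assumption),
    so older events eventually stop absorbing incoming documents.
    """
    seeds = deque(maxlen=window_size)  # most recent seed documents
    flags = []
    for text in docs:
        vec = Counter(text.lower().split())
        best = max((cosine(vec, s) for s in seeds), default=0.0)
        is_new = best < threshold
        if is_new:
            seeds.append(vec)  # the document becomes the seed of a new event
        flags.append(is_new)
    return flags


# Hypothetical document stream, in arrival order.
stream = [
    "earthquake hits city",
    "city earthquake damage rises",
    "election results announced",
    "earthquake rescue in city continues",
]
print(detect_new_events(stream))
```

In this sketch, raising `threshold` produces more seeds (and more false alarms), while shrinking `window_size` makes the detector forget older events sooner; with `window_size=1` the last document in the sample stream is flagged as new because the earthquake seed has already left the window.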