Browsing by Subject "Parallel programming (Computer science)"
Item Open Access: Improving the performance of similarity joins using graphics processing unit (2012). Korkmaz, Zeynep.

The similarity join is an important operation in data mining, used in many applications from varying domains. A similarity join operator takes one or two sets of data points and outputs pairs of points whose distances in the data space are within a certain threshold value, ε. The baseline nested loop approach computes the distances between all pairs of objects. For large sets of objects, this paradigm yields prohibitively long query times, so accelerating the operator becomes important. The computing capability of recent GPUs, together with a general-purpose parallel computing architecture (CUDA), has attracted many researchers. With this motivation, we propose two similarity join algorithms for the Graphics Processing Unit (GPU). To exploit the advantages of general-purpose GPU computing, we first propose an improved nested loop join algorithm (GPU-INLJ) for the specific environment of the GPU. We also present a partitioning-based join algorithm (KMEANS-JOIN) that guarantees each partition can be joined independently without missing any join pair. Our experiments demonstrate massive performance gains and the suitability of our algorithms for large datasets.

Item Open Access: Independent task assignment for heterogeneous systems (2013). Tabak, E. Kartal.

We study the problem of assigning nonuniform tasks onto heterogeneous systems, investigating two distinct problems in this context. The first is the one-dimensional partitioning of nonuniform workload arrays with optimal load balancing. The second is the assignment of nonuniform independent tasks onto heterogeneous systems. For one-dimensional partitioning of nonuniform workload arrays, we investigate two cases: chain-on-chain partitioning (CCP), where the order of the processors is specified, and chain partitioning (CP), where processor permutation is allowed. We present polynomial-time algorithms that solve the CCP problem optimally, while we prove that the CP problem is NP-complete. Our empirical studies show that our proposed exact algorithms for the CCP problem produce substantially better results than the state-of-the-art heuristics while their solution times remain comparable. For the independent task assignment problem, we investigate improving the performance of the well-known and widely used constructive heuristics MinMin, MaxMin, and Sufferage. All three heuristics are known to run in O(KN²) time when assigning N tasks to K processors. In this thesis, we present an algorithmic improvement that asymptotically decreases the running time complexity of MinMin to O(KN log N) without affecting its solution quality. Furthermore, we combine the newly proposed MinMin algorithm with MaxMin as well as Sufferage, obtaining two hybrid algorithms. The motivation behind the former hybrid is to address the drawback of MaxMin on problem instances with highly skewed cost distributions while also improving the running time of MaxMin. The latter hybrid improves the running time of Sufferage without degrading its solution quality. The proposed algorithms are easy to implement, and we illustrate them through detailed pseudocodes.
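For orientation, the following is a minimal sketch of the classical O(KN²) MinMin baseline that the thesis accelerates, not the thesis's O(KN log N) variant. The names (etc, ready, owner) are illustrative assumptions, not taken from the thesis.

```cpp
#include <vector>
#include <limits>
#include <cstddef>

// Classical MinMin heuristic (O(K*N^2) baseline). Assumes K >= 1, N >= 1.
// etc[k][n] = expected time to compute task n on processor k (assumed name).
std::vector<int> minMin(const std::vector<std::vector<double>>& etc) {
    const std::size_t K = etc.size(), N = etc[0].size();
    std::vector<double> ready(K, 0.0);   // current finish time of each processor
    std::vector<int> owner(N, -1);       // owner[n] = processor assigned to task n
    std::vector<bool> assigned(N, false);

    for (std::size_t iter = 0; iter < N; ++iter) {
        double best = std::numeric_limits<double>::infinity();
        std::size_t bestTask = 0, bestProc = 0;
        // For every unassigned task, find its minimum completion time over all
        // processors, then keep the task whose minimum is smallest overall.
        for (std::size_t n = 0; n < N; ++n) {
            if (assigned[n]) continue;
            for (std::size_t k = 0; k < K; ++k) {
                const double completion = ready[k] + etc[k][n];
                if (completion < best) {
                    best = completion;
                    bestTask = n;
                    bestProc = k;
                }
            }
        }
        assigned[bestTask] = true;
        owner[bestTask] = static_cast<int>(bestProc);
        ready[bestProc] = best;          // the chosen processor now finishes later
    }
    return owner;
}
```

The double loop inside the outer assignment loop is exactly where the quadratic factor in N comes from; the thesis's improvement removes it by avoiding the full rescan on every iteration.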
The experimental results over a large number of real-life datasets show that the proposed fast MinMin algorithm and the proposed hybrid algorithms perform significantly better than their traditional counterparts as well as more recent state-of-the-art assignment heuristics. For the large datasets used in the experiments, MinMin, MaxMin, and Sufferage, as well as recent state-of-the-art heuristics, require days, weeks, or even months to produce a solution, whereas all of the proposed algorithms produce solutions within only two or three minutes. For the independent task assignment problem, we also investigate adopting the multi-level framework, which has been successfully utilized in several applications, including graph and hypergraph partitioning. For the coarsening phase of the multi-level framework, we present an efficient matching algorithm that runs in O(KN) time in most cases. For the uncoarsening phase, we present two refinement algorithms: an efficient O(KN)-time move-based refinement and an efficient O(K²N log N)-time swap-based refinement. Our results indicate that the multi-level approach improves the quality of task assignments while also improving the running time, especially for large datasets. As a realistic distributed application of the independent task assignment problem, we introduce the site-to-crawler assignment problem, where a large number of geographically distributed web servers are crawled by a multi-site distributed crawling system and the objective is to minimize the duration of the crawl. We show that this problem can be modeled as an independent task assignment problem. As a solution, we evaluate a large number of state-of-the-art task assignment heuristics selected from the literature, as well as their improved versions and the newly developed multi-level task assignment algorithm. We compare the performance of the different approaches through simulations on very large, real-life web datasets. Our results indicate that multi-site web crawling efficiency can be considerably improved using the independent task assignment approach when compared to relatively easy-to-implement yet naive baselines.

Item Open Access: Massively parallel mapping of next generation sequence reads using GPU (2012). Korkmaz, Mustafa.

High-throughput sequencing (HTS) methods have already started to fundamentally revolutionize genome research through low-cost and high-throughput genome sequencing. However, the sheer size of the data imposes various computational challenges. For example, each run of the Illumina HiSeq2000 produces over 7-8 billion short reads and over 600 Gbp of sequence data in less than 10 days. For most applications, analysis of HTS data starts with read mapping, i.e., finding the locations of these short sequence reads in a reference genome assembly. The similarity between two sequences can be determined by computing their optimal global alignment using a dynamic programming method called the Needleman-Wunsch algorithm. The Needleman-Wunsch algorithm is widely used in hash-based DNA read mapping algorithms because of its guaranteed sensitivity. However, the quadratic time complexity of this algorithm makes it highly time-consuming and the main bottleneck in the analysis. In addition, the short length of reads (about 100 base pairs) and the large size of mammalian genomes (3.1 Gbp for human) worsen the situation by requiring several hundreds to tens of thousands of Needleman-Wunsch calculations per read.
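To make the quadratic cost concrete, here is a minimal sequential sketch of the Needleman-Wunsch score computation; the scoring constants are illustrative assumptions, and the thesis's contribution lies in running many such alignments in parallel on the GPU rather than in this recurrence itself.

```cpp
#include <string>
#include <vector>
#include <algorithm>

// Needleman-Wunsch global alignment score, computed sequentially to make the
// O(|a| * |b|) cost explicit. Scoring values are illustrative assumptions.
int needlemanWunsch(const std::string& a, const std::string& b,
                    int match = 1, int mismatch = -1, int gap = -2) {
    const std::size_t m = a.size(), n = b.size();
    // dp[i][j] = best score aligning the prefixes a[0..i) and b[0..j).
    std::vector<std::vector<int>> dp(m + 1, std::vector<int>(n + 1, 0));
    for (std::size_t i = 1; i <= m; ++i) dp[i][0] = static_cast<int>(i) * gap;
    for (std::size_t j = 1; j <= n; ++j) dp[0][j] = static_cast<int>(j) * gap;
    for (std::size_t i = 1; i <= m; ++i) {
        for (std::size_t j = 1; j <= n; ++j) {
            const int diag = dp[i - 1][j - 1]
                           + (a[i - 1] == b[j - 1] ? match : mismatch);
            // Best of: align both characters, gap in b, or gap in a.
            dp[i][j] = std::max({diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap});
        }
    }
    return dp[m][n];
}
```

With reads of roughly 100 base pairs, each call fills a table of about 10,000 cells, and a single read may trigger thousands of such calls, which is why the abstract identifies this step as the bottleneck.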
The fastest approach proposed so far avoids Needleman-Wunsch and maps the data described above in 70 CPU days, with lower sensitivity. More sensitive mapping approaches are even slower. We propose that efficient parallel implementations of string comparison will dramatically improve the running time of this process. With this motivation, we propose to develop enhanced algorithms that exploit the parallel architecture of GPUs.

Item Open Access: Parallel hardware and software implementations for electromagnetic computations (2005). Bozbulut, Ali Rıza.

The multilevel fast multipole algorithm (MLFMA) is an accurate frequency-domain electromagnetics solver that significantly reduces computational complexity and memory requirements. Despite the advantages of the MLFMA, the maximum size of an electromagnetic problem that can be solved on a single-processor computer is still limited by the hardware resources of the system, i.e., memory and processor speed. In order to go beyond the hardware limitations of single-processor systems, parallelization of the MLFMA, which is not a trivial task, is suggested. This requires parallel implementations of both hardware and software. For this purpose, we constructed our own parallel computer clusters and parallelized our MLFMA program using the message-passing paradigm to solve electromagnetics problems. To balance the workload and memory requirements across the processors of multiprocessor systems, efficient load-balancing techniques and algorithms are included in this parallel code. As a result, we can solve large-scale electromagnetics problems accurately and rapidly with the parallel MLFMA solver on parallel clusters.
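As an aside, the following is a minimal sketch of the message-passing paradigm the abstract refers to: a simple block partition of work items over MPI ranks followed by a reduction. It only illustrates the programming model; the actual MLFMA parallelization and its load-balancing strategies are far more elaborate, and the work-item count here is purely illustrative.

```cpp
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  // this process's id
    MPI_Comm_size(MPI_COMM_WORLD, &size);  // total number of processes

    const long N = 1000000;                // total work items (illustrative)
    // Static block partition: each rank takes a contiguous chunk.
    const long chunk = (N + size - 1) / size;
    const long begin = static_cast<long>(rank) * chunk;
    const long end = (begin + chunk < N) ? begin + chunk : N;

    double local = 0.0;
    for (long i = begin; i < end; ++i) {
        local += 1.0;                      // stand-in for per-item computation
    }

    // Combine the partial results on rank 0.
    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) {
        std::printf("processed %.0f items\n", total);
    }

    MPI_Finalize();
    return 0;
}
```

A static block partition like this balances load only when all items cost the same; the nonuniform per-box costs in an MLFMA tree are precisely why the thesis needs dedicated load-balancing techniques.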