Browsing by Subject "parallel computing"

Open Access
Effective preconditioners for iterative solutions of large-scale surface-integral-equation problems
(2010) Malas, Tahir
A popular method to study electromagnetic scattering and radiation in three-dimensional electromagnetics problems is to solve discretized surface integral equations, which give rise to dense linear systems. Iterative solution of such linear systems using Krylov subspace methods and the multilevel fast multipole algorithm (MLFMA) has been a very attractive approach for large problems because of its reduced complexity. This scheme works well, however, only if the number of iterations required for convergence of the iterative solver is not too high. Unfortunately, this is not the case for many practical problems. In particular, discretizations of open-surface problems and complex real-life targets yield ill-conditioned linear systems. The iterative solutions of such problems are not tractable without preconditioners, which can be roughly defined as easily invertible approximations of the system matrices. In this dissertation, we present our efforts to design effective preconditioners for large-scale surface-integral-equation problems. We first address incomplete LU (ILU) preconditioning, which is the most commonly used and well-established preconditioning method. We show how to use these preconditioners in a black-box form and in a safe manner. Despite their important advantages, ILU preconditioners are inherently sequential. Hence, for parallel solutions, a sparse-approximate-inverse (SAI) preconditioner has been developed. We propose a novel load-balancing scheme for SAI, which is crucial for parallel scalability. Then, we improve the performance of the SAI preconditioner by using it for the iterative solution of the near-field matrix system, which is used to precondition the dense linear system in an inner-outer solution scheme. The last preconditioner we develop for perfectly-electric-conductor (PEC) problems uses the same inner-outer solution scheme, but employs an approximate version of MLFMA for inner solutions.
In this way, we succeed in solving many complex real-life problems, including helicopters and metamaterial structures, with moderate iteration counts and short solution times. Finally, we consider preconditioning of linear systems obtained from the discretization of dielectric problems. Unlike the PEC case, those linear systems have a partitioned structure. We exploit the partitioned structure for preconditioning by employing Schur complement reduction. In this way, we develop effective preconditioners that render difficult real-life problems, such as dielectric photonic crystals, solvable.
Open Access
Independent task assignment for heterogeneous systems
(2013) Tabak, E Kartal
We study the problem of assigning nonuniform tasks onto heterogeneous systems. We investigate two distinct problems in this context. The first problem is the one-dimensional partitioning of nonuniform workload arrays with optimal load balancing. The second problem is the assignment of nonuniform independent tasks onto heterogeneous systems. For one-dimensional partitioning of nonuniform workload arrays, we investigate two cases: chain-on-chain partitioning (CCP), where the order of the processors is specified, and chain partitioning (CP), where processor permutation is allowed. We present polynomial-time algorithms to solve the CCP problem optimally, while we prove that the CP problem is NP-complete. Our empirical studies show that our proposed exact algorithms for the CCP problem produce substantially better results than the state-of-the-art heuristics while the solution times remain comparable. For the independent task assignment problem, we investigate improving the performance of the well-known and widely used constructive heuristics MinMin, MaxMin and Sufferage. All three heuristics are known to run in O(KN^2) time in assigning N tasks to K processors. In this thesis, we present our work on an algorithmic improvement that asymptotically decreases the running time complexity of MinMin to O(KN log N) without affecting its solution quality. Furthermore, we combine the newly proposed MinMin algorithm with MaxMin as well as Sufferage, obtaining two hybrid algorithms. The motivation behind the former hybrid algorithm is to address the drawback of MaxMin in solving problem instances with highly skewed cost distributions while also improving the running time performance of MaxMin. The latter hybrid algorithm improves the running time performance of Sufferage without degrading its solution quality. The proposed algorithms are easy to implement and we illustrate them through detailed pseudocode.
The experimental results over a large number of real-life datasets show that the proposed fast MinMin algorithm and the proposed hybrid algorithms perform significantly better than their traditional counterparts as well as more recent state-of-the-art assignment heuristics. For the large datasets used in the experiments, MinMin, MaxMin, and Sufferage, as well as recent state-of-the-art heuristics, require days, weeks, or even months to produce a solution, whereas all of the proposed algorithms produce solutions within only two or three minutes. For the independent task assignment problem, we also investigate adopting the multi-level framework, which has been successfully utilized in several applications including graph and hypergraph partitioning. For the coarsening phase of the multi-level framework, we present an efficient matching algorithm which runs in O(KN) time in most cases. For the uncoarsening phase, we present two refinement algorithms: an efficient O(KN)-time move-based refinement and an efficient O(K^2 N log N)-time swap-based refinement. Our results indicate that the multi-level approach improves the quality of task assignments, while also improving the running time performance, especially for large datasets. As a realistic distributed application of the independent task assignment problem, we introduce the site-to-crawler assignment problem, where a large number of geographically distributed web servers are crawled by a multi-site distributed crawling system and the objective is to minimize the duration of the crawl. We show that this problem can be modeled as an independent task assignment problem. As a solution to the problem, we evaluate a large number of state-of-the-art task assignment heuristics selected from the literature as well as the improved versions and the newly developed multi-level task assignment algorithm. We compare the performance of different approaches through simulations on very large, real-life web datasets.
Our results indicate that multi-site web crawling efficiency can be considerably improved using the independent task assignment approach when compared to relatively easy-to-implement yet naive baselines.
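For readers unfamiliar with the MinMin heuristic named in this abstract, the following is a minimal sketch of the textbook version that runs in O(KN^2) time. It is not the thesis's improved O(KN log N) algorithm, whose details the abstract does not give. The function name `min_min` and the cost-matrix layout `etc[t][p]` (execution time of task t on processor p) are assumptions made for illustration.

```python
def min_min(etc, num_procs):
    """Textbook MinMin: repeatedly commit the (task, processor) pair
    with the smallest completion time among all unassigned tasks.

    etc[t][p] is the (assumed) execution time of task t on processor p.
    Returns (assignment, makespan), where assignment[t] is a processor index.
    """
    num_tasks = len(etc)
    ready = [0] * num_procs           # time at which each processor becomes free
    assignment = [None] * num_tasks
    unassigned = list(range(num_tasks))

    while unassigned:                 # N rounds
        best = None                   # (completion_time, task, processor)
        for t in unassigned:          # up to N tasks per round
            for p in range(num_procs):    # K processors per task
                completion = ready[p] + etc[t][p]
                if best is None or completion < best[0]:
                    best = (completion, t, p)
        completion, t, p = best
        assignment[t] = p
        ready[p] = completion
        unassigned.remove(t)

    return assignment, max(ready)


if __name__ == "__main__":
    costs = [[3, 5], [2, 4], [6, 1]]  # 3 tasks, 2 processors
    print(min_min(costs, 2))
```

The three nested loops (N rounds, up to N remaining tasks, K processors) are where the O(KN^2) bound quoted in the abstract comes from.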
Open Access
Out-of-core implementation of the parallel multilevel fast multipole algorithm
(2013) Karaosmanoğlu, Barışcan
We developed an out-of-core (OC) implementation of the parallel multilevel fast multipole algorithm (MLFMA) to solve electromagnetic problems with reduced memory. The main purpose of the OC method is to reduce in-core memory (primary storage) usage by using mass storage (secondary storage) units. Depending on the OC implementation, the in-core data may be left in one piece or divided into partitions. If the latter, the partitions are written out to the mass storage unit(s) and read into in-core memory when required. In this way, memory reduction is achieved. However, the proposed method causes time delays because reading and writing large data using mass storage units is slow. In our case, repetitive access to data partitions from the mass storage increases the total time of the iterative solution part of MLFMA. Such time delays can be minimized by selecting the right data type and optimizing the sizes of the data partitions. We run the optimization tests on different types of mass storage devices, such as hard disks and solid state drives. This thesis explores OC implementation of the parallel MLFMA. To be more precise, it presents the results of optimization tests done on different partition sizes and shows how computation time is minimized despite the time delays. This thesis also presents full-wave solutions of scattering problems involving hundreds of millions of unknowns by employing an OC-implemented parallel MLFMA.
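The partition-and-reload idea described in this abstract can be illustrated with a small, generic sketch. This is not the thesis's MLFMA implementation. It only shows a vector being split into fixed-size partitions, written to secondary storage, and read back one partition at a time during a dot product. The helper names `write_partitions` and `out_of_core_dot` and the file naming scheme are hypothetical.

```python
import os
from array import array


def write_partitions(data, part_size, directory):
    """Split `data` into chunks of `part_size` doubles and write each to disk."""
    paths = []
    for start in range(0, len(data), part_size):
        path = os.path.join(directory, f"part_{start // part_size}.bin")
        with open(path, "wb") as f:
            array("d", data[start:start + part_size]).tofile(f)
        paths.append(path)
    return paths


def out_of_core_dot(paths, x, part_size):
    """Dot product of the on-disk vector with in-core `x`,
    holding only one partition in memory at a time."""
    total = 0.0
    for k, path in enumerate(paths):
        part = array("d")
        count = os.path.getsize(path) // part.itemsize
        with open(path, "rb") as f:
            part.fromfile(f, count)       # read one partition into memory
        offset = k * part_size
        total += sum(a * b for a, b in zip(part, x[offset:offset + count]))
    return total
```

In a sketch like this, `part_size` is the tuning knob: smaller partitions lower peak memory but increase the number of disk reads, which is the kind of trade-off the abstract's partition-size optimization tests address.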