Browsing by Subject "High performance computing"
Now showing 1 - 7 of 7
Item Open Access: Architecture framework for mapping parallel algorithms to parallel computing platforms (CEUR-WS, 2013). Tekinerdogan, Bedir; Arkin, E.

Mapping parallel algorithms to parallel computing platforms requires several activities, such as the analysis of the parallel algorithm, the definition of the logical configuration of the platform, and the mapping of the algorithm to the logical configuration. Unfortunately, current parallel computing approaches lack a precise modeling approach for supporting this mapping process, which impedes the communication and analysis of mapping decisions. In this paper we present an architecture framework for modeling the various views that are related to the mapping process. An architecture framework organizes and structures the proposed architectural viewpoints. We propose a coherent set of five viewpoints for supporting the mapping of parallel algorithms to parallel computing platforms. We illustrate the architecture framework with the mapping of an array increment algorithm to a parallel computing platform. Copyright © 2013 for the individual papers by the papers' authors.

Item Open Access: Foreword: 1st International Workshop on High Performance Computing for Big Data (IEEE, 2014-09). Kaya, Kamer; Gedik, Buğra; Çatalyürek, Ümit V.

The 1st International Workshop on High Performance Computing for Big Data (HPC4BD) was held on September 10, 2014, in conjunction with the 43rd International Conference on Parallel Processing (ICPP-2014). The workshop aimed to bring together high performance computing (HPC) experts and experts from various application domains to discuss their Big Data problems. Four works were accepted for presentation at this year's workshop; this foreword presents a summary of them. © 2014 IEEE.

Item Open Access: Improving efficiency of parallel vertex-centric algorithms for irregular graphs (IEEE Computer Society, 2019). Özdal, Muhammet Mustafa

Memory access is known to be the main bottleneck for shared-memory parallel graph applications, especially for large and irregular graphs. The propagation blocking (PB) idea was proposed recently to improve the parallel performance of PageRank and sparse matrix-vector multiplication operations. The idea is based on separating parallel computation into two phases, binning and accumulation, such that random memory accesses are replaced with contiguous accesses. In this paper, we propose an algorithm that allows these two phases to execute concurrently, together with several improvements to increase parallel throughput, reduce memory overhead, and improve work efficiency. Our experimental results show that the proposed algorithms improve shared-memory parallel throughput by a factor of up to 2× compared to the original PB algorithms. We also show that the memory overhead can be reduced significantly (from 170 percent down to less than 5 percent) without significant degradation of performance. Finally, we demonstrate that our concurrent execution model allows asynchronous parallel execution, leading to significant work-efficiency gains in addition to the throughput improvements.
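The binning/accumulation split at the heart of PB is easiest to see in code. The following is a minimal, single-threaded C sketch of the general propagation-blocking pattern only, not the paper's concurrent algorithm; the bin geometry, sizes, and all names are illustrative assumptions.

```c
/* Minimal single-threaded sketch of the propagation blocking (PB) idea:
   instead of scattering each update acc[dst] += v to a random destination,
   updates are first appended to per-range bins (contiguous writes), then
   each bin is drained so all writes hit one small slice of acc at a time.
   Names and the fixed bin geometry are illustrative, not the paper's code. */
#include <stdio.h>
#include <stdlib.h>

#define N_DST     1024          /* number of destination vertices        */
#define BIN_BITS  8             /* destinations per bin = 2^BIN_BITS     */
#define N_BINS    (N_DST >> BIN_BITS)

typedef struct { int dst; double val; } Update;

int main(void) {
    double *acc = calloc(N_DST, sizeof *acc);
    /* one growable bin per destination range */
    Update *bin[N_BINS]; int cnt[N_BINS], cap[N_BINS];
    for (int b = 0; b < N_BINS; b++) {
        cap[b] = 16; cnt[b] = 0;
        bin[b] = malloc(cap[b] * sizeof **bin);
    }

    /* Phase 1 (binning): route each update to the bin that owns its
       destination range; appends are sequential, hence cache-friendly. */
    for (int i = 0; i < 100000; i++) {
        Update u = { rand() % N_DST, 1.0 };   /* stand-in for edge work */
        int b = u.dst >> BIN_BITS;
        if (cnt[b] == cap[b])
            bin[b] = realloc(bin[b], (cap[b] *= 2) * sizeof **bin);
        bin[b][cnt[b]++] = u;
    }

    /* Phase 2 (accumulation): drain each bin; all writes now fall into
       one 2^BIN_BITS-wide slice of acc, replacing random access. */
    for (int b = 0; b < N_BINS; b++) {
        for (int i = 0; i < cnt[b]; i++)
            acc[bin[b][i].dst] += bin[b][i].val;
        free(bin[b]);
    }

    printf("acc[0] = %g\n", acc[0]);
    free(acc);
    return 0;
}
```

In a parallel setting, each thread would own its own bins in phase 1 and a disjoint slice of acc in phase 2, which is what removes contention; the paper's contribution, per the abstract, is letting these two phases run concurrently rather than as a barrier-separated pair.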
Item Open Access: Model-driven approach for supporting the mapping of parallel algorithms to parallel computing platforms (Springer, Berlin, Heidelberg, 2013). Arkin, E.; Tekinerdogan, Bedir; Imre, K.M.

The trend from single-processor to parallel computer architectures has increased the importance of parallel computing. To support parallel computing, it is important to map parallel algorithms to a computing platform that consists of multiple parallel processing nodes. In general, different alternative mappings can be defined, and these perform differently with respect to quality requirements such as power consumption, efficiency, and memory usage. The mapping process can be carried out manually for platforms with a limited number of processing nodes; however, for exascale computing, in which hundreds of thousands of processing nodes are applied, the mapping process soon becomes intractable. To assist the parallel computing engineer, we provide a model-driven approach to analyze, model, and select feasible mappings. We describe the developed toolset that implements this approach, together with the required metamodels and model transformations, and illustrate the approach for the well-known complete exchange algorithm in parallel computing. © 2013 Springer-Verlag.

Item Open Access: Model-driven transformations for mapping parallel algorithms on parallel computing platforms (MDHPCL, 2013). Arkin, E.; Tekinerdoğan, Bedir

One of the important problems in parallel computing is the mapping of a parallel algorithm to a parallel computing platform, whereby the corresponding code must be implemented for each processing node. For platforms with a limited number of processing nodes this can be done manually. However, if the parallel computing platform consists of hundreds of thousands of processing nodes, then manual coding of the parallel algorithms becomes intractable and error-prone. Moreover, a change of the parallel computing platform requires considerable coding effort and time. In this paper we present a model-driven approach for generating the code of selected parallel algorithms to be mapped on parallel computing platforms. We describe the required platform-independent metamodel, and the model-to-model and model-to-text transformation patterns. We illustrate our approach for the parallel matrix multiplication algorithm. Copyright © 2013 for the individual papers by the papers' authors.

Item Open Access: Parallelization of Sparse Matrix Kernels for big data applications (Springer, 2016). Selvitopu, Oğuz; Akbudak, Kadir; Aykanat, Cevdet; Pop, F.; Kołodziej, J.; Di Martino, B.

Analysis of big data on large-scale distributed systems often necessitates efficient parallel graph algorithms that are used to explore the relationships between individual components. Graph algorithms use the basic adjacency-list representation for graphs, which can also be viewed as a sparse matrix. This correspondence between the representation of graphs and sparse matrices makes it possible to express many important graph algorithms in terms of basic sparse matrix operations, where the literature for optimization is more mature. For example, graph analytics libraries such as Pegasus and Combinatorial BLAS use sparse matrix kernels for a wide variety of operations on graphs. In this work, we focus on two such important sparse matrix kernels: sparse matrix-sparse matrix multiplication (SpGEMM) and sparse matrix-dense matrix multiplication (SpMM). We propose partitioning models for efficient parallelization of these kernels on large-scale distributed systems. Our models aim at reducing communication volume while balancing computational load, two vital performance metrics on distributed systems. We show that by exploiting the sparsity patterns of the matrices through our models, the parallel performance of SpGEMM and SpMM operations can be significantly improved.
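For readers unfamiliar with the kernels, the following is a minimal sequential C sketch of SpMM (C = A·B with sparse A in compressed sparse row form and dense B); the hard-coded matrix and all names are illustrative, and the paper's partitioning models themselves are not shown.

```c
/* Minimal C sketch of the SpMM kernel (sparse matrix times dense matrix,
   C = A * B) using the compressed sparse row (CSR) format. This is the
   sequential kernel that partitioning models distribute across processors;
   the tiny hard-coded matrix is purely illustrative. */
#include <stdio.h>

#define M 3   /* rows of A (and C)    */
#define N 3   /* cols of A, rows of B */
#define K 2   /* cols of B (and C)    */

int main(void) {
    /* A = [1 0 2; 0 3 0; 4 0 5] in CSR: row pointers, column ids, values */
    int    rowptr[M + 1] = {0, 2, 3, 5};
    int    colidx[5]     = {0, 2, 1, 0, 2};
    double val[5]        = {1, 2, 3, 4, 5};

    double B[N][K] = {{1, 2}, {3, 4}, {5, 6}};
    double C[M][K] = {{0}};

    /* For each nonzero A(i,j), add A(i,j) * B(j,:) into C(i,:).
       Row i of C depends only on row i of A, which is why row-wise
       partitions of A parallelize this kernel naturally. */
    for (int i = 0; i < M; i++)
        for (int p = rowptr[i]; p < rowptr[i + 1]; p++)
            for (int k = 0; k < K; k++)
                C[i][k] += val[p] * B[colidx[p]][k];

    for (int i = 0; i < M; i++)
        printf("C[%d] = [%g %g]\n", i, C[i][0], C[i][1]);
    return 0;
}
```

In a row-wise distribution, for instance, each processor owns a block of rows of A and C, and communication stems from the rows of B that its nonzeros reference; partitioning models of the kind proposed here choose the partition so that this volume stays low while per-processor nonzero counts stay balanced.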
Item Open Access: Performance improvement on latency-bound parallel HPC applications by message sharing between processors (2021-02). Duymuş, Mustafa

The performance of parallelized High Performance Computing (HPC) applications is tied to the efficiency of the underlying processor-to-processor communication. In latency-bound applications, performance is bottlenecked by the processor that sends the maximum number of messages to the other processors. To reduce the latency overhead, we propose a two-phase message-sharing algorithm in which the bottleneck processor (the processor sending the maximum number of messages) is paired with another processor. In the first phase, the bottleneck processor is paired with the processor that has the maximum number of common outgoing messages; in the second phase, it is paired with the processor that has the minimum number of outgoing messages. In both phases, the processor pair shares the common outgoing messages between them, reducing their total number of outgoing messages and, in particular, the number of outgoing messages of the bottleneck processor. We use sparse matrix-vector multiplication as the kernel application and a 512-processor setting for the experiments. The proposed message-sharing algorithm achieves a reduction of 84% in the number of messages sent by the bottleneck processor and a reduction of 60% in the total number of messages in the system.
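As a rough illustration of the selection logic in the two phases, here is a toy C sketch under strong simplifying assumptions: the communication pattern is a boolean matrix, "sharing" is modeled as splitting a pair's common destinations evenly (with the owner assumed to piggyback the partner's payload), and the same bottleneck processor is reused in both phases. None of this is the thesis implementation; it only mirrors the pairing criteria stated in the abstract.

```c
/* Toy sketch of the two-phase message-sharing heuristic: send_[i][j] = 1
   means processor i has a message for processor j, and share() splits a
   pair's common destinations evenly between its members. The selection
   logic mirrors the abstract; the tiny pattern and all names are
   illustrative assumptions. */
#include <stdio.h>

#define P 6

static int send_[P][P];   /* send_[i][j] = 1 if i sends a message to j */

static int out_deg(int i) {
    int d = 0;
    for (int j = 0; j < P; j++) d += send_[i][j];
    return d;
}

static int bottleneck(void) {
    int b = 0;
    for (int i = 1; i < P; i++) if (out_deg(i) > out_deg(b)) b = i;
    return b;
}

/* Split the common destinations of p and q: alternate ownership so each
   keeps about half; the owner is assumed to carry the other's payload. */
static void share(int p, int q) {
    int turn = 0;
    for (int j = 0; j < P; j++)
        if (send_[p][j] && send_[q][j]) {
            if (turn++ % 2) send_[p][j] = 0; else send_[q][j] = 0;
        }
}

int main(void) {
    /* illustrative pattern: processor 0 is the heavy sender */
    for (int j = 1; j < P; j++) send_[0][j] = 1;
    send_[1][2] = send_[1][3] = send_[1][4] = 1;
    send_[2][3] = 1;

    int b = bottleneck();
    printf("bottleneck %d sends %d messages\n", b, out_deg(b));

    /* Phase 1: pair the bottleneck with the processor sharing the most
       common outgoing destinations, then share them. */
    int best = -1, best_common = -1;
    for (int q = 0; q < P; q++) {
        if (q == b) continue;
        int c = 0;
        for (int j = 0; j < P; j++) c += send_[b][j] && send_[q][j];
        if (c > best_common) { best_common = c; best = q; }
    }
    share(b, best);

    /* Phase 2: pair the (same) bottleneck with the processor that now has
       the fewest outgoing messages, and share again. */
    int min_q = (b == 0) ? 1 : 0;
    for (int q = 0; q < P; q++)
        if (q != b && out_deg(q) < out_deg(min_q)) min_q = q;
    share(b, min_q);

    printf("after sharing, processor %d sends %d messages\n", b, out_deg(b));
    return 0;
}
```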