Browsing by Subject "Parallel programming"

Open Access
Architecture framework for mapping parallel algorithms to parallel computing platforms
(CEUR-WS, 2013) Tekinerdogan, Bedir; Arkin, E.
Mapping parallel algorithms to parallel computing platforms requires several activities, such as the analysis of the parallel algorithm, the definition of the logical configuration of the platform, and the mapping of the algorithm to the logical configuration platform. Unfortunately, in current parallel computing approaches there do not seem to be precise modeling approaches for supporting the mapping process. The lack of a clear and precise modeling approach for parallel computing impedes the communication and analysis of the decisions for supporting the mapping of parallel algorithms to parallel computing platforms. In this paper we present an architecture framework for modeling the various views that are related to the mapping process. An architectural framework organizes and structures the proposed architectural viewpoints. We propose a coherent set of five viewpoints for supporting the mapping of parallel algorithms to parallel computing platforms. We illustrate the architecture framework for the mapping of an array increment algorithm to a parallel computing platform. Copyright © 2013 for the individual papers by the papers' authors.
Open Access
Efficient heterogeneous parallel programming for compressed sensing based direction of arrival estimation
(John Wiley & Sons Ltd., 2021-07) Fişne, A.; Kılıç, Berkan; Güngör, Alper; Özsoy, A.
In the direction of arrival (DoA) estimation, typically sensor arrays are used where the number of required sensors can be large depending on the application. With the help of compressed sensing (CS), hardware complexity of the sensor array system can be reduced since reliable estimations are possible by using the compressed measurements where the compression is done by measurement matrices. After the compression, DoAs are reconstructed by using sparsity promoting algorithms such as alternating direction method of multipliers (ADMM). For the given procedure, both the measurement matrix design and the reconstruction algorithm may include computationally intensive operations, which are addressed in this study. The presented simulation results imply the feasibility of the system in real-time processing with energy efficient implementations. We propose employing parallel programming to satisfy the real-time processing requirements. While the measurement matrix design has been accelerated 16× with CPU based parallel version with respect to the fastest serial implementation, ADMM based DoA estimation has been improved 1.1× with GPU based parallel version compared to the fastest CPU parallel implementation. In addition, we achieved, to the best of our knowledge, the first energy-efficient real-time DoA estimation on embedded Jetson GPGPUs in 15 W power consumption without affecting the DoA accuracy performance.
Open Access
An improved spring embedder layout algorithm for compound graphs
(2012) Karaçelik, Alper
Interactive graph editing plays an important role in information visualization systems. For qualified analysis of the given data, an automated layout calculation is needed. There have been numerous results published about automatic layout of simple graphs, where the vertices are depicted as points in a 2D or 3D plane and edges as straight lines connecting those points. But simple graphs are insufficient to cover most real-life information. Relational information is often clustered or hierarchically organized into groups or nested structures. The compound spring embedder (CoSE) of the Chisio project is a layout algorithm based on a force-directed layout scheme for undirected compound graphs with non-uniform node sizes. In order to satisfy the end user, the layout calculation process has to finish fast, and the resulting layout should be pleasing to the eye. Therefore, several methods were developed for improving both the running time and the visual quality of the layout. With the purpose of improving the visual quality of CoSE, we adapted a multi-level scaling strategy. For improving the performance of CoSE, the grid-variant algorithm proposed by Fruchterman and Reingold and a parallel force calculation strategy using a graphics processing unit (GPU) were also adopted. Additionally, tuning of parameters such as the spring constant and cooling factor was considered, as they affect the behavior of the physical system dramatically. Our experiments show that after some tuning and adaptation of the methods above, running time decreased and the visual quality of the layout improved significantly.
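The grid-variant idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes the commonly described Fruchterman-Reingold formulation (grid cells of side 2k, repulsive force magnitude k²/d, forces ignored beyond distance 2k) on simple 2D points; it is not the thesis's CoSE implementation, and the function name `grid_repulsion` is hypothetical.

```python
import math
from collections import defaultdict

def grid_repulsion(positions, k):
    """Repulsive displacement per node, checking only the 3x3 block of
    neighbouring grid cells instead of all node pairs.

    Assumptions (not from the abstract): 2D points, cell side 2*k,
    force magnitude k*k/d, cutoff at distance 2*k.
    """
    cell = 2 * k
    grid = defaultdict(list)
    # Bucket every node index into its grid cell.
    for i, (x, y) in enumerate(positions):
        grid[(math.floor(x / cell), math.floor(y / cell))].append(i)

    disp = [[0.0, 0.0] for _ in positions]
    for i, (x, y) in enumerate(positions):
        cx, cy = math.floor(x / cell), math.floor(y / cell)
        for gx in (cx - 1, cx, cx + 1):
            for gy in (cy - 1, cy, cy + 1):
                for j in grid.get((gx, gy), ()):
                    if j == i:
                        continue
                    dx = x - positions[j][0]
                    dy = y - positions[j][1]
                    d = math.hypot(dx, dy)
                    # Skip coincident nodes and nodes beyond the cutoff.
                    if 0 < d < cell:
                        f = k * k / d
                        disp[i][0] += dx / d * f
                        disp[i][1] += dy / d * f
    return disp

disp = grid_repulsion([(0.0, 0.0), (1.0, 0.0), (100.0, 100.0)], k=1.0)
```

Because each node inspects only nearby cells, the cost per iteration depends on local density rather than on the total number of node pairs. The per-node loop is also independent across nodes, which is what makes this step a candidate for GPU parallelization.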
Open Access
A message ordering problem in parallel programs
(Springer, 2004) Uçar, B.; Aykanat, Cevdet
We consider a certain class of parallel program segments in which the order of messages sent affects the completion time. We give a characterization of these parallel program segments and propose a solution to minimize the completion time. With a sample parallel program, we experimentally evaluate the effect of the solution on a PC cluster. © Springer-Verlag 2004.
Open Access
Peachy parallel assignments (EduPar 2019)
(Institute of Electrical and Electronics Engineers Inc., 2019) Öztürk, Özcan; Glick, B.; Mache, J.; Bunde, D. P.
Peachy Parallel Assignments are a resource for instructors teaching parallel and distributed programming. These are high-quality assignments, previously tested in class, that are readily adoptable. This collection of assignments includes face recognition, finding the electrical potential of a square wire, and heat diffusion. All of these come with sample assignment sheets and the necessary starter code.
Open Access
Safe data parallelism for general streaming
(Institute of Electrical and Electronics Engineers, 2015) Schneider, S.; Hirzel, M.; Gedik, B.; Wu, Kun-Lung
Streaming applications process possibly infinite streams of data and often have both high throughput and low latency requirements. They are comprised of operator graphs that produce and consume data tuples. General streaming applications use stateful, selective, and user-defined operators. The stream programming model naturally exposes task and pipeline parallelism, enabling it to exploit parallel systems of all kinds, including large clusters. However, data parallelism must either be manually introduced by programmers, or extracted as an optimization by compilers. Previous data parallel optimizations did not apply to selective, stateful and user-defined operators. This article presents a compiler and runtime system that automatically extracts data parallelism for general stream processing. Data-parallelization is safe if the transformed program has the same semantics as the original sequential version. The compiler forms parallel regions while considering operator selectivity, state, partitioning, and graph dependencies. The distributed runtime system ensures that tuples always exit parallel regions in the same order they would without data parallelism, using the most efficient strategy as identified by the compiler. Our experiments using 100 cores across 14 machines show linear scalability for parallel regions that are computation-bound, and near linear scalability when tuples are shuffled across parallel regions.
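The safety condition described in this abstract (same output, in the same order, as the sequential program) can be illustrated with a small sketch. It assumes one common approach: tuples are tagged with sequence numbers, routed to channels by key so that per-key state stays on one channel, and merged back by sequence number. All names here (`split_by_key`, `run_channel`, `ordered_merge`, `running_sum`) are hypothetical, the channels run one after another rather than concurrently, and this is not the article's compiler or runtime.

```python
import heapq

def split_by_key(tuples, n_channels, key):
    """Tag each tuple with its sequence number and route it by key."""
    channels = [[] for _ in range(n_channels)]
    for seq, t in enumerate(tuples):
        channels[hash(key(t)) % n_channels].append((seq, t))
    return channels

def run_channel(channel, op):
    """Apply a stateful, selective operator to one channel.

    op(state, t) returns a result, or None to drop the tuple.
    """
    state = {}
    out = []
    for seq, t in channel:
        result = op(state, t)
        if result is not None:
            out.append((seq, result))
    return out

def ordered_merge(channel_outputs):
    """Merge per-channel outputs back into original sequence order."""
    return [r for _, r in heapq.merge(*channel_outputs)]

def running_sum(state, t):
    """Example operator: per-key running sum, emitting only for even values."""
    k, v = t
    state[k] = state.get(k, 0) + v
    return (k, state[k]) if v % 2 == 0 else None

data = [(i % 5, i) for i in range(50)]
sequential = ordered_merge([run_channel(list(enumerate(data)), running_sum)])
parallel = ordered_merge(
    [run_channel(c, running_sum) for c in split_by_key(data, 3, key=lambda t: t[0])]
)
```

In this sketch `sequential` and `parallel` hold the same results in the same order, because the operator's state is partitioned by the same key used for routing. An operator whose state spans keys would not be safe to parallelize this way.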