Browsing by Subject "Chip multiprocessor"

Now showing 1 - 7 of 7

Open Access
Adaptive compute-phase prediction and thread prioritization to mitigate memory access latency
(ACM, 2014-06) Aktürk, İsmail; Öztürk, Özcan
The full potential of chip multiprocessors remains unex- ploited due to the thread oblivious memory access sched- ulers used in off-chip main memory controllers. This is especially pronounced in embedded systems due to limita- Tions in memory. We propose an adaptive compute-phase prediction and thread prioritization algorithm for memory access scheduling for embedded chip multiprocessors. The proposed algorithm eficiently categorize threads based on execution characteristics and provides fine-grained priori- Tization that allows to differentiate threads and prioritize their memory access requests accordingly. The threads in compute phase are prioritized among the threads in mem- ory phase. Furthermore, the threads in compute phase are prioritized among themselves based on the potential of mak- ing more progress in their execution. Compared to the prior works First-Ready First-Come First-Serve (FR-FCFS) and Compute-phase Prediction with Writeback-Refresh Overlap (CP-WO), the proposed algorithm reduces the execution time of the generated workloads up to 23.6% and 12.9%, respectively. Copyright 2014 ACM.
Open Access
Application-specific heterogeneous network-on-chip design
(Oxford University Press, 2014) Demirbas, D.; Akturk, I.; Ozturk, O.; Güdükbay, Uğur
As a result of increasing communication demands, application-specific and scalable Network-on-Chips (NoCs) have emerged to connect processing cores and subsystems in Multiprocessor System-on-Chips. A challenge in application-specific NoC design is to find the right balance among different tradeoffs, such as communication latency, power consumption and chip area. We propose a novel approach that generates latency-aware heterogeneous NoC topology. Experimental results show that our approach improves the total communication latency up to 27% with modest power consumption. © 2013 The Author 2013. Published by Oxford University Press on behalf of The British Computer Society.
Open Access
Code scheduling for optimizing parallelism and data locality
(Springer, 2010-08-09) Yemliha, T.; Kandemir, M.; Öztürk, Özcan; Kultursay, E.; Muralidhara, S. P.
As chip multiprocessors proliferate, programming support for these devices is likely to receive a lot of attention in the near future. Parallelism and data locality are two critical issues in a chip multiprocessor environment. Unfortunately, most of the published work in the literature focuses only on one of these problems, and this can prevent one from achieving the best possible performance. The main goal of this paper is to propose and evaluate a compiler-directed code parallelization scheme, which considers both parallelism and data locality at the same time. Our compiler captures the inherent parallelism and data reuse in the application code being analyzed using a novel representation called the locality-parallelism graph (LPG). Our partitioning/scheduling algorithm assigns the nodes of this graph to the processors in the architecture and schedules them for execution. We implemented this algorithm and evaluated its effectiveness using a set of benchmark codes. The results collected so far indicate that our approach improves overall execution latency significantly. In this paper, we also introduce an ILP (Integer Linear Programming) based formulation of the problem, and implement the schedule obtained by the ILP solver. The results indicate that our approach gets within 4% of the ILP solution. © 2010 Springer-Verlag.
Open Access
Heterogeneous network-on-chip design through evolutionary computing
(Taylor & Francis, 2010) Ozturk, O.; Demirbas, D.
This article explores the use of biologically inspired evolutionary computational techniques for designing and optimising heterogeneous network-on-chip (NoC) architectures, where the nodes of the NoC-based chip multiprocessor exhibit different properties such as performance, energy, temperature, area and communication bandwidth. Focusing primarily on array-dominated applications and heterogeneous execution environments, the proposed approach tries to optimise the distribution of the nodes for a given NoC area under the constraints present in the environment. This article is the first one, to our knowledge, that explores the possibility of employing evolutionary computational techniques for optimally placing the heterogeneous nodes in an NoC. We also compare our approach with an optimal integer linear programming (ILP) approach using a commercial ILP tool. The results collected so far are very encouraging and indicate that the proposed approach generates close results to the ILP-based approach with minimal execution latencies. © 2010 Taylor & Francis.
Open Access
ILP-based communication reduction for heterogeneous 3D network-on-chips
(IEEE, 2013-02-03) Aktürk, İsmail; Öztürk, Özcan
Network-on-Chip (NoC) architectures and three-dimensional integrated circuits (3D ICs) have been introduced as attractive options for overcoming the barriers in interconnect scaling while increasing the number of cores. Combining these two approaches is expected to yield better performance and higher scalability. This paper explores the possibility of combining these two techniques in a heterogeneity aware fashion. We explore how heterogeneous processors can be mapped onto the given 3D chip area to minimize the data access costs. Our initial results indicate that the proposed approach generates promising results within tolerable solution times. © 2013 IEEE.
Open Access
On-chip memory space partitioning for chip multiprocessors using polyhedral algebra
(The Institution of Engineering and Technology, 2010) Ozturk, O.; Kandemir, M.; Irwin, M. J.
One of the most important issues in designing a chip multiprocessor is to decide its on-chip memory organisation. While it is possible to design an application-specific memory architecture, this may not necessarily be the best option, in particular when storage demands of individual processors and/or their data sharing patterns can change from one point in execution to another for the same application. Here, two problems are formulated. First, we show how a polyhedral method can be used to design, for array-based data-intensive embedded applications, an application-specific hybrid memory architecture that has both shared and private components. We evaluate the resulting memory configurations using a set of benchmarks and compare them to pure private and pure shared memory on-chip multiprocessor architectures. The second approach proposed consider dynamic configuration of software-managed on-chip memory space to adapt to the runtime variations in data storage demand and interprocessor sharing patterns. The proposed framework is fully implemented using an optimising compiler, a polyhedral tool, and a memory partitioner (based on integer linear programming), and is tested using a suite of eight data-intensive embedded applications. © 2010 © The Institution of Engineering and Technology.
Open Access
Shared scratch pad memory space management across applications
(Inderscience Publishers, 2009) Ozturk, Ozcan; Kandemir, M.; Son, S. W.; Kolcu, I.
Scratch Pad Memories (SPMs) have received considerable attention lately as on-chip memory building blocks. The main characteristic that distinguishes an SPM from a conventional cache memory is that the data flow is controlled by software. The main focus of this paper is the management of an SPM space shared by multiple applications that can potentially share data. The proposed approach has three major components; a compiler analysis phase, a runtime space partitioner, and a local partitioning phase. Our experimental results show that the proposed approach leads to minimum completion time among all alternate memory partitioning schemes tested.