BUIR logo
Communities & Collections
All of BUIR
  • English
  • Türkçe
Log In
Please note that log in via username/password is only available to Repository staff.
Have you forgotten your password?
  1. Home
  2. Browse by Subject

Browsing by Subject "Chip Multiprocessor"

Filter results by typing the first few letters
Now showing 1 - 4 of 4
  • Results Per Page
  • Sort Options
  • Loading...
    Thumbnail Image
    ItemOpen Access
    Adaptive prefetching for shared cache based chip multiprocessors
    (IEEE, 2009-04) Kandemir, M.; Zhang, Y.; Öztürk, Özcan
    Chip multiprocessors (CMPs) present a unique scenario for software data prefetching with subtle tradeoffs between memory bandwidth and performance. In a shared L2 based CMP, multiple cores compete for the shared on-chip cache space and limited off-chip pin bandwidth. Purely software based prefetching techniques tend to increase this contention, leading to degradation in performance. In some cases, prefetches can become harmful by kicking out useful data from the shared cache whose next usage is earlier than the prefetched data, and the fraction of such harmful prefetches usually increases when we increase the number of cores used for executing a multi-threaded application code. In this paper, we propose two complementary techniques to address the problem of harmful prefetches in the context of shared L2 based CMPs. These techniques, namely, suppressing select data prefetches (if they are found to be harmful) and pinning select data in the L2 cache (if they are found to be frequent victim of harmful prefetches), are evaluated in this paper using two embedded application codes. Our experiments demonstrate that these two techniques are very effective in mitigating the impact of harmful prefetches, and as a result, we extract significant benefits from software prefetching even with large core counts. © 2009 EDAA.
  • Loading...
    Thumbnail Image
    ItemOpen Access
    Dynamic thread and data mapping for NoC based CMPs
    (IEEE, 2009-07) Kandemir, M.; Öztürk, Özcan; Muralidhara, S. P.
    Thread mapping and data mapping are two important problems in the context of NoC (network-on-chip) based CMPs (chip multiprocessors). While a compiler can determine suitable mappings for data and threads, such static mappings may not work well for multithreaded applications that go through different execution phases during their execution, each phase with potentially different data access patterns than others. Instead, a dynamic mapping strategy, if its overheads can be kept low, may be a more promising option. In this work, we present dynamic (runtime) thread and data mappings for NoC based CMPs. The goal of these mappings is to reduce the distance between the location of the core that requests data and the core whose local memory contains that requested data. In our experiments, we evaluate our proposed thread mapping and data mapping in isolation as well as in an integrated manner. Copyright 2009 ACM.
  • Loading...
    Thumbnail Image
    ItemOpen Access
    Multicore education through simulation
    (IEEE, 2009-07) Öztürk, Özcan
    This paper presents the experiences using a commercial full system simulation platform - Simics - in a graduate Chip Multiprocessors class. The Simics platform enables students and researchers to do research on computer architecture, operating systems, and hardware/software cosimulation. It provides the ability to simulate machines that are not physically available. This platform has been used in Chip Multiprocessors course to help graduate and undergraduate students in related areas. This course deals with both hardware and software issues in Chip Multiprocessors, and concludes with a team project at the end of the semester. The simulation-based approach was successful when student feedback and final projects are considered. ©2009 IEE.
  • Loading...
    Thumbnail Image
    ItemOpen Access
    Process variation aware thread mapping for chip multiprocessors
    (IEEE, 2009-04) Hong, S.; Narayanan, S. H. K.; Kandemir, M.; Özturk, Özcan
    With the increasing scaling of manufacturing technology, process variation is a phenomenon that has become more prevalent. As a result, in the context of Chip Multiprocessors (CMPs) for example, it is possible that identically-designed processor cores on the chip have non-identical peak frequencies and power consumptions. To cope with such a design, each processor can be assumed to run at the frequency of the slowest processor, resulting in wasted computational capability. This paper considers an alternate approach and proposes an algorithm that intelligently maps (and remaps) computations onto available processors so that each processor runs at its peak frequency. In other words, by dynamically changing the thread-to-processor mapping at runtime, our approach allows each processor to maximize its performance, rather than simply using chip-wide lowest frequency amongst all cores and highest cache latency. Experimental evidence shows that, as compared to a process variation agnostic thread mapping strategy, our proposed scheme achieves as much as 29% improvement in overall execution latency, average improvement being 13% over the benchmarks tested. We also demonstrate in this paper that our savings are consistent across different processor counts, latency maps, and latency distributions.With the increasing scaling of manufacturing technology, process variation is a phenomenon that has become more prevalent. As a result, in the context of Chip Multiprocessors (CMPs) for example, it is possible that identically-designed processor cores on the chip have non-identical peak frequencies and power consumptions. To cope with such a design, each processor can be assumed to run at the frequency of the slowest processor, resulting in wasted computational capability. This paper considers an alternate approach and proposes an algorithm that intelligently maps (and remaps) computations onto available processors so that each processor runs at its peak frequency. In other words, by dynamically changing the thread-to-processor mapping at runtime, our approach allows each processor to maximize its performance, rather than simply using chip-wide lowest frequency amongst all cores and highest cache latency. Experimental evidence shows that, as compared to a process variation agnostic thread mapping strategy, our proposed scheme achieves as much as 29% improvement in overall execution latency, average improvement being 13% over the benchmarks tested. We also demonstrate in this paper that our savings are consistent across different processor counts, latency maps, and latency distributions. © 2009 EDAA.

About the University

  • Academics
  • Research
  • Library
  • Students
  • Stars
  • Moodle
  • WebMail

Using the Library

  • Collections overview
  • Borrow, renew, return
  • Connect from off campus
  • Interlibrary loan
  • Hours
  • Plan
  • Intranet (Staff Only)

Research Tools

  • EndNote
  • Grammarly
  • iThenticate
  • Mango Languages
  • Mendeley
  • Turnitin
  • Show more ..

Contact

  • Bilkent University
  • Main Campus Library
  • Phone: +90(312) 290-1298
  • Email: dspace@bilkent.edu.tr

Bilkent University Library © 2015-2025 BUIR

  • Privacy policy
  • Send Feedback