Optimizing shared cache behavior of chip multiprocessors

dc.citation.epage: 516
dc.citation.spage: 505
dc.contributor.author: Kandemir, M.
dc.contributor.author: Muralidhara, S. P.
dc.contributor.author: Narayanan, S. H. K.
dc.contributor.author: Zhang, Y.
dc.contributor.author: Öztürk, Özcan
dc.coverage.spatial: New York, USA
dc.date.accessioned: 2016-02-08T12:25:54Z
dc.date.available: 2016-02-08T12:25:54Z
dc.date.issued: 2009-12
dc.department: Department of Computer Engineering
dc.description: Conference name: MICRO 42 Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, 2009
dc.description: Date of conference: 12-16 December 2009
dc.description.abstract: One of the critical problems associated with emerging chip multiprocessors (CMPs) is the management of on-chip shared cache space. Unfortunately, single-processor-centric data locality optimization schemes may not work well in the CMP case, as data accesses from multiple cores can create conflicts in the shared cache space. The main contribution of this paper is a compiler-directed code restructuring scheme for enhancing locality of shared data in CMPs. The proposed scheme targets the last-level shared cache that exists in many commercial CMPs and has two components, namely, allocation, which determines the set of loop iterations assigned to each core, and scheduling, which determines the order in which the iterations assigned to a core are executed. Our scheme restructures the application code such that the different cores operate on shared data blocks at the same time, to the extent allowed by data dependencies. This helps to reduce reuse distances for the shared data and improves on-chip cache performance. We evaluated our approach using the Splash-2 and Parsec applications through both simulations and experiments on two commercial multi-core machines. Our experimental evaluation indicates that the proposed data locality optimization scheme reduces inter-core conflict misses in the shared cache by 67% on average when both allocation and scheduling are used. Also, the execution time improvements we achieve (29% on average) are very close to the optimal savings that could be achieved using a hypothetical scheme. Copyright 2009 ACM.
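The two components named in the abstract, allocation (which iterations each core gets) and scheduling (in what order each core runs them), can be illustrated with a minimal sketch. This is not the paper's actual compiler algorithm; the block-distribution policy and the `block_of` mapping below are hypothetical stand-ins chosen only to show how ordering iterations by shared data block makes cores touch the same block at the same time, shrinking inter-core reuse distance.

```python
# Illustrative sketch (hypothetical policies, not the paper's algorithm):
# "allocation" block-distributes a parallel loop's iterations over cores;
# "scheduling" reorders each core's iterations by the shared data block
# they access, so all cores sweep the shared blocks in the same order.

def allocate(num_iters, num_cores):
    """Assign a contiguous chunk of iterations to each core."""
    chunk = (num_iters + num_cores - 1) // num_cores
    return [list(range(c * chunk, min((c + 1) * chunk, num_iters)))
            for c in range(num_cores)]

def schedule(per_core_iters, block_of):
    """Order each core's iterations by the shared block they touch.
    sorted() is stable, so iterations on the same block keep their
    original relative order."""
    return [sorted(iters, key=block_of) for iters in per_core_iters]

# Example: 8 iterations on 2 cores; iteration i touches shared block i % 2.
alloc = allocate(8, 2)                      # [[0, 1, 2, 3], [4, 5, 6, 7]]
sched = schedule(alloc, lambda i: i % 2)    # [[0, 2, 1, 3], [4, 6, 5, 7]]
```

In the scheduled order, at every step both cores access the same shared block (block 0 for the first two steps, block 1 for the last two), whereas the original order interleaves blocks and widens the inter-core reuse distance.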
dc.description.provenance: Made available in DSpace on 2016-02-08T12:25:54Z (GMT). No. of bitstreams: 1; bilkent-research-paper.pdf: 70227 bytes, checksum: 26e812c6f5156f83f0e77b261a471b5a (MD5). Previous issue date: 2009
dc.identifier.doi: 10.1145/1669112.1669176
dc.identifier.uri: http://hdl.handle.net/11693/28639
dc.language.iso: English
dc.publisher: ACM
dc.relation.isversionof: http://dx.doi.org/10.1145/1669112.1669176
dc.source.title: MICRO 42 Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, 2009
dc.subject: Algorithm
dc.subject: Experimentation
dc.subject: Processors
dc.subject: Design styles
dc.subject: Memory structure
dc.subject: Performance
dc.subject: Programming language
dc.subject: Computer software
dc.subject: Design
dc.subject: Experiments
dc.subject: Linguistics
dc.subject: Microprocessor chips
dc.subject: Multiprocessing systems
dc.subject: Optimization
dc.subject: Program compilers
dc.subject: Cache memory
dc.subject: Compilers
dc.title: Optimizing shared cache behavior of chip multiprocessors
dc.type: Conference Paper

Files

Original bundle
Name: Optimizing shared cache behavior of chip multiprocessors.pdf
Size: 7.44 MB
Format: Adobe Portable Document Format
Description: Full printable version