dc.contributor.author: Kandemir, M. [en_US]
dc.contributor.author: Muralidhara, S.P. [en_US]
dc.contributor.author: Narayanan, S.H.K. [en_US]
dc.contributor.author: Zhang, Y. [en_US]
dc.contributor.author: Ozturk, O. [en_US]
dc.date.accessioned: 2016-02-08T12:25:54Z
dc.date.available: 2016-02-08T12:25:54Z
dc.date.issued: 2009 [en_US]
dc.identifier.issn: 10724451
dc.identifier.uri: http://hdl.handle.net/11693/28639
dc.description.abstract: One of the critical problems associated with emerging chip multiprocessors (CMPs) is the management of on-chip shared cache space. Unfortunately, single-processor-centric data locality optimization schemes may not work well in the CMP case, as data accesses from multiple cores can create conflicts in the shared cache space. The main contribution of this paper is a compiler-directed code restructuring scheme for enhancing locality of shared data in CMPs. The proposed scheme targets the last-level shared cache that exists in many commercial CMPs and has two components: allocation, which determines the set of loop iterations assigned to each core, and scheduling, which determines the order in which the iterations assigned to a core are executed. Our scheme restructures the application code such that the different cores operate on shared data blocks at the same time, to the extent allowed by data dependencies. This helps to reduce reuse distances for the shared data and improves on-chip cache performance. We evaluated our approach using the Splash-2 and Parsec applications through both simulations and experiments on two commercial multi-core machines. Our experimental evaluation indicates that the proposed data locality optimization scheme reduces inter-core conflict misses in the shared cache by 67% on average when both allocation and scheduling are used. Also, the execution time improvements we achieve (29% on average) are very close to the optimal savings that could be achieved using a hypothetical scheme. Copyright 2009 ACM. [en_US]
dc.language.iso: English [en_US]
dc.source.title: Proceedings of the Annual International Symposium on Microarchitecture, MICRO [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1145/1669112.1669176 [en_US]
dc.subject: Algorithm [en_US]
dc.subject: B.3.2 [memory structures]: design styles - cache memories [en_US]
dc.subject: D.3.4 [programming languages]: processors - compilers [en_US]
dc.subject: Design [en_US]
dc.subject: Experimentation [en_US]
dc.subject: Performance [en_US]
dc.subject: Design styles [en_US]
dc.subject: Memory structure [en_US]
dc.subject: Programming language [en_US]
dc.subject: Computer software [en_US]
dc.subject: Experiments [en_US]
dc.subject: Linguistics [en_US]
dc.subject: Microprocessor chips [en_US]
dc.subject: Multiprocessing systems [en_US]
dc.subject: Optimization [en_US]
dc.subject: Program compilers [en_US]
dc.subject: Cache memory [en_US]
dc.title: Optimizing shared cache behavior of chip multiprocessors [en_US]
dc.type: Conference Paper [en_US]
dc.department: Department of Computer Engineering
dc.citation.spage: 505 [en_US]
dc.citation.epage: 516 [en_US]
dc.identifier.doi: 10.1145/1669112.1669176 [en_US]

