Optimizing shared cache behavior of chip multiprocessors

dc.citation.epage: 516
dc.citation.spage: 505
dc.contributor.author: Kandemir, M.
dc.contributor.author: Muralidhara, S. P.
dc.contributor.author: Narayanan, S. H. K.
dc.contributor.author: Zhang, Y.
dc.contributor.author: Öztürk, Özcan
dc.coverage.spatial: New York, USA
dc.date.accessioned: 2016-02-08T12:25:54Z
dc.date.available: 2016-02-08T12:25:54Z
dc.date.issued: 2009-12
dc.department: Department of Computer Engineering
dc.description: Conference name: MICRO 42 Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, 2009
dc.description: Date of conference: 12-16 December 2009
dc.description.abstract: One of the critical problems associated with emerging chip multiprocessors (CMPs) is the management of on-chip shared cache space. Unfortunately, single-processor-centric data locality optimization schemes may not work well in the CMP case, as data accesses from multiple cores can create conflicts in the shared cache space. The main contribution of this paper is a compiler-directed code restructuring scheme for enhancing locality of shared data in CMPs. The proposed scheme targets the last-level shared cache that exists in many commercial CMPs and has two components, namely, allocation, which determines the set of loop iterations assigned to each core, and scheduling, which determines the order in which the iterations assigned to a core are executed. Our scheme restructures the application code such that the different cores operate on shared data blocks at the same time, to the extent allowed by data dependencies. This helps to reduce reuse distances for the shared data and improves on-chip cache performance. We evaluated our approach using the Splash-2 and Parsec applications through both simulations and experiments on two commercial multi-core machines. Our experimental evaluation indicates that the proposed data locality optimization scheme reduces inter-core conflict misses in the shared cache by 67% on average when both allocation and scheduling are used. Also, the execution time improvements we achieve (29% on average) are very close to the optimal savings that could be achieved using a hypothetical scheme. Copyright 2009 ACM.
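The two components named in the abstract, allocation (which iterations each core gets) and scheduling (in what order each core runs them), can be illustrated with a minimal sketch. This is not the paper's actual compiler algorithm; the block-distribution policy and the `block_of` mapping below are hypothetical stand-ins chosen only to show how ordering iterations by shared data block makes cores touch the same block at the same time, shrinking inter-core reuse distance.

```python
# Illustrative sketch (hypothetical policies, not the paper's algorithm):
# "allocation" block-distributes a parallel loop's iterations over cores;
# "scheduling" reorders each core's iterations by the shared data block
# they access, so all cores sweep the shared blocks in the same order.

def allocate(num_iters, num_cores):
    """Assign a contiguous chunk of iterations to each core."""
    chunk = (num_iters + num_cores - 1) // num_cores
    return [list(range(c * chunk, min((c + 1) * chunk, num_iters)))
            for c in range(num_cores)]

def schedule(per_core_iters, block_of):
    """Order each core's iterations by the shared block they touch.
    sorted() is stable, so iterations on the same block keep their
    original relative order."""
    return [sorted(iters, key=block_of) for iters in per_core_iters]

# Example: 8 iterations on 2 cores; iteration i touches shared block i % 2.
alloc = allocate(8, 2)                      # [[0, 1, 2, 3], [4, 5, 6, 7]]
sched = schedule(alloc, lambda i: i % 2)    # [[0, 2, 1, 3], [4, 6, 5, 7]]
```

In the scheduled order, at every step both cores access the same shared block (block 0 for the first two steps, block 1 for the last two), whereas the original order interleaves blocks and widens the inter-core reuse distance.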
dc.description.provenance: Made available in DSpace on 2016-02-08T12:25:54Z (GMT). No. of bitstreams: 1; bilkent-research-paper.pdf: 70227 bytes, checksum: 26e812c6f5156f83f0e77b261a471b5a (MD5). Previous issue date: 2009
dc.identifier.doi: 10.1145/1669112.1669176
dc.identifier.uri: http://hdl.handle.net/11693/28639
dc.language.iso: English
dc.publisher: ACM
dc.relation.isversionof: http://dx.doi.org/10.1145/1669112.1669176
dc.source.title: MICRO 42 Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, 2009
dc.subject: Algorithm
dc.subject: Experimentation
dc.subject: Processors
dc.subject: Design styles
dc.subject: Memory structure
dc.subject: Performance
dc.subject: Programming language
dc.subject: Computer software
dc.subject: Design
dc.subject: Experiments
dc.subject: Linguistics
dc.subject: Microprocessor chips
dc.subject: Multiprocessing systems
dc.subject: Optimization
dc.subject: Program compilers
dc.subject: Cache memory
dc.subject: Compilers
dc.title: Optimizing shared cache behavior of chip multiprocessors
dc.type: Conference Paper

Files

Original bundle
Name: Optimizing shared cache behavior of chip multiprocessors.pdf
Size: 7.44 MB
Format: Adobe Portable Document Format
Description: Full printable version