Boosting performance of directory-based cache coherence protocols with coherence bypass at subpage granularity and a novel on-chip page table

dc.citation.epage 187 en_US
dc.citation.spage 180 en_US
dc.contributor.author Soltaniyeh, M. en_US
dc.contributor.author Kadayıf, I. en_US
dc.contributor.author Öztürk, Özcan en_US
dc.coverage.spatial Como, Italy
dc.date.accessioned 2018-04-12T11:44:01Z
dc.date.available 2018-04-12T11:44:01Z
dc.date.issued 2016-05 en_US
dc.department Department of Computer Engineering en_US
dc.description Date of Conference: 16-19 May, 2016
dc.description Conference name: CF '16 Proceedings of the ACM International Conference on Computing Frontiers
dc.description.abstract Chip multiprocessors (CMPs) require effective cache coherence protocols as well as fast virtual-to-physical address translation mechanisms for high performance. Directory-based cache coherence protocols are the state-of-the-art approaches in many-core CMPs to keep the data blocks coherent at the last-level private caches. However, the area overhead and high associativity requirement of the directory structures may not scale well with an increasing number of cores. As shown in some prior studies, a significant percentage of data blocks are accessed by only one core; therefore, it is not necessary to keep track of these in the directory structure. In this study, we have two major contributions. First, we show that compared to the classification of cache blocks at page granularity as done in some previous studies, data block classification at subpage level helps to detect considerably more private data blocks. Consequently, it significantly reduces the percentage of blocks required to be tracked in the directory compared to similar page-level classification approaches. This, in turn, enables smaller directory caches with lower associativity to be used in CMPs without hurting performance, thereby helping the directory structure to scale gracefully with the increasing number of cores. Memory block classification at subpage level, however, may increase the frequency of the Operating System's (OS) involvement in updating the maintenance bits belonging to subpages stored in page table entries, nullifying some portion of the performance benefits of subpage-level data classification. To overcome this, we propose a distributed on-chip page table as our second contribution. © 2016 Copyright held by the owner/author(s). en_US
dc.identifier.doi 10.1145/2903150.2903175 en_US
dc.identifier.uri http://hdl.handle.net/11693/37563
dc.language.iso English en_US
dc.publisher ACM en_US
dc.relation.isversionof https://doi.org/10.1145/2903150.2903175 en_US
dc.source.title CF '16 Proceedings of the ACM International Conference on Computing Frontiers en_US
dc.subject Cache coherence en_US
dc.subject Directory cache en_US
dc.subject Many-core system en_US
dc.subject Page table en_US
dc.subject Virtual memory en_US
dc.subject Classification (of information) en_US
dc.subject Multiprocessing systems en_US
dc.subject Physical addresses en_US
dc.subject Virtual addresses en_US
dc.subject Cache Coherence en_US
dc.subject Directory caches en_US
dc.subject Many core en_US
dc.subject Cache memory en_US
dc.title Boosting performance of directory-based cache coherence protocols with coherence bypass at subpage granularity and a novel on-chip page table en_US
dc.type Conference Paper en_US
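
The abstract's first claim, that classifying data at subpage rather than page granularity detects more private blocks, can be illustrated with a small sketch. This is not the authors' mechanism: it is a static, whole-trace classification under assumed parameters (64-byte blocks, 4 KiB pages, four subpages per page), and the function name `private_blocks` and the access trace are hypothetical.

```python
from collections import defaultdict

BLOCK_SIZE = 64      # bytes per cache block (assumed, not from the record)
PAGE_SIZE = 4096     # bytes per page (assumed, not from the record)


def private_blocks(trace, region_size):
    """Return block numbers whose enclosing region was touched by exactly one core.

    trace is an iterable of (core_id, byte_address) pairs; region_size is the
    classification granularity in bytes (a page or a subpage).
    """
    sharers = defaultdict(set)   # region number -> set of cores that accessed it
    blocks = set()               # every block number seen in the trace
    for core, addr in trace:
        sharers[addr // region_size].add(core)
        blocks.add(addr // BLOCK_SIZE)
    return {
        b for b in blocks
        if len(sharers[(b * BLOCK_SIZE) // region_size]) == 1
    }


# Hypothetical trace: two cores use disjoint subpages of page 0 and
# overlap on only one subpage of page 1.
trace = [
    (0, 0x0000), (0, 0x0040),   # core 0, page 0, subpage 0
    (1, 0x0C00),                # core 1, page 0, subpage 3
    (0, 0x1000), (1, 0x1040),   # both cores, page 1, subpage 0
    (1, 0x1800),                # core 1, page 1, subpage 2
]

page_private = private_blocks(trace, PAGE_SIZE)
subpage_private = private_blocks(trace, PAGE_SIZE // 4)
print(len(page_private), len(subpage_private))
```

In this trace both pages are touched by two cores, so page-level classification marks no block as private, while subpage-level classification marks four of the six accessed blocks as private. Only the two blocks in the genuinely shared subpage would still need directory tracking.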
