Classifying data blocks at subpage granularity with an on-chip page table to improve coherence in tiled CMPs

buir.contributor.author: Öztürk, Özcan
dc.citation.epage: 819 [en_US]
dc.citation.issueNumber: 4 [en_US]
dc.citation.spage: 806 [en_US]
dc.citation.volumeNumber: 37 [en_US]
dc.contributor.author: Soltaniyeh, M. [en_US]
dc.contributor.author: Kadayif, I. [en_US]
dc.contributor.author: Öztürk, Özcan [en_US]
dc.date.accessioned: 2019-02-21T16:05:20Z
dc.date.available: 2019-02-21T16:05:20Z
dc.date.issued: 2018 [en_US]
dc.department: Department of Computer Engineering [en_US]
dc.description.abstract: As shown in some prior studies, a significant percentage of data blocks accessed in parallel codes are private, and not keeping track of those blocks can improve the effectiveness of directory structures in chip multiprocessors (CMPs). This paper makes two major contributions. First, we show that, compared to the classification of cache blocks at page granularity, data block classification (DBC) at subpage level helps to detect considerably more private data blocks. Based on this idea, we propose two different approaches for enhancing the effectiveness of directory caches in tiled CMPs. In the first approach, which is called quasi-dynamic subpage level DBC (QDBC), a data block is assumed to be private from the beginning of the program execution and stays private as long as the corresponding subpage is accessed by only one core. Our second approach, which is called dynamic subpage level DBC, makes a data block private again after all blocks within the corresponding subpage are evicted from the private cache hierarchy. Memory block classification at subpage level, however, may increase the frequency of operating system involvement in updating the maintenance bits in page table entries. To overcome this, we propose, as a second contribution, a distributed table called the on-chip page table (o-CPT), which stores recently accessed page translations in the system. Our simulation results show that, compared to page level data classification, the QDBC and DBC approaches relying on the o-CPT can detect significantly more private data blocks and considerably improve system performance.
dc.description.provenance: Made available in DSpace on 2019-02-21T16:05:20Z (GMT). No. of bitstreams: 1 Bilkent-research-paper.pdf: 222869 bytes, checksum: 842af2b9bd649e7f548593affdbafbb3 (MD5) Previous issue date: 2018 [en]
dc.identifier.doi: 10.1109/TCAD.2017.2729280
dc.identifier.issn: 0278-0070
dc.identifier.uri: http://hdl.handle.net/11693/50246
dc.language.iso: English
dc.publisher: Institute of Electrical and Electronics Engineers
dc.relation.isversionof: https://doi.org/10.1109/TCAD.2017.2729280
dc.source.title: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems [en_US]
dc.subject: Address translation [en_US]
dc.subject: Cache coherence [en_US]
dc.subject: Chip multiprocessor (CMP) [en_US]
dc.subject: Directory cache [en_US]
dc.subject: Page table [en_US]
dc.subject: Translation look-aside buffer (TLB) [en_US]
dc.title: Classifying data blocks at subpage granularity with an on-chip page table to improve coherence in tiled CMPs [en_US]
dc.type: Article [en_US]
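
The two classification policies summarized in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class and method names (`SubpageClassifier`, `access`, `evict`), the 64-byte block and 1 KB subpage sizes, and the choice to reset a subpage whenever its last cached block is evicted are all assumptions made for illustration. The sketch only shows the difference between the quasi-dynamic policy (a subpage that becomes shared stays shared) and the dynamic policy (a subpage can become private again once none of its blocks remain in any private cache).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

BLOCK_SIZE = 64       # bytes per cache block (assumed value)
SUBPAGE_SIZE = 1024   # bytes per subpage (assumed value)


@dataclass
class SubpageState:
    owner: Optional[int] = None   # first core that touched the subpage
    shared: bool = False          # True once a second core accesses it
    # (core, block index) pairs currently held in private caches
    cached: Set[Tuple[int, int]] = field(default_factory=set)


class SubpageClassifier:
    """Hypothetical model of subpage-level private/shared classification."""

    def __init__(self, dynamic: bool) -> None:
        self.dynamic = dynamic    # False: quasi-dynamic, True: dynamic
        self.subpages: Dict[int, SubpageState] = {}

    def access(self, core: int, addr: int) -> str:
        state = self.subpages.setdefault(addr // SUBPAGE_SIZE, SubpageState())
        if state.owner is None:
            state.owner = core            # first touch: private to this core
        elif state.owner != core:
            state.shared = True           # a second core makes it shared
        state.cached.add((core, addr // BLOCK_SIZE))
        return "shared" if state.shared else "private"

    def evict(self, core: int, addr: int) -> None:
        state = self.subpages.get(addr // SUBPAGE_SIZE)
        if state is None:
            return
        state.cached.discard((core, addr // BLOCK_SIZE))
        # Dynamic policy only (assumed simplification): once no block of the
        # subpage is left in any private cache, forget its history so the
        # next access classifies it as private again.
        if self.dynamic and not state.cached:
            state.owner = None
            state.shared = False


if __name__ == "__main__":
    for dynamic in (False, True):
        c = SubpageClassifier(dynamic)
        c.access(0, 0x1000)       # core 0 touches the subpage first
        c.access(1, 0x1040)       # core 1 touches the same subpage
        c.evict(0, 0x1000)
        c.evict(1, 0x1040)        # subpage is now fully evicted
        print("dynamic" if dynamic else "quasi-dynamic", c.access(1, 0x1000))
```

Run as a script, the quasi-dynamic classifier reports the final access as shared, while the dynamic one reports it as private. An access by another core to address `0x1400` would be classified as private under either policy, because it falls in a different 1 KB subpage of the same 4 KB page; a page-level classifier would have marked the whole page shared.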

Files

Original bundle
Name: Classifying_Data_Blocks_at_Subpage_Granularity.pdf
Size: 2.48 MB
Format: Adobe Portable Document Format
Description: Full printable version