Analysis of scratch-pad memory-based processor architecture for graph applications
Abstract
In graph analytics applications, main memory accesses are a bottleneck because graphs exhibit poor spatial and temporal locality in the caches and the higher levels of the memory hierarchy. Although miss status handling registers (MSHRs) in caches slightly mitigate this bottleneck, the problem becomes more significant for large graphs. The MSHRs, which rely on an out-of-order processor's reorder buffer, quickly saturate as memory requests pile up because of the limited instruction window size. To tackle the memory bottleneck for graph applications, the use of a Scratchpad Memory (SPM) together with custom instructions is proposed. This model is implemented and tested on a custom in-order processor using the x86 architecture, extended to accommodate the custom instructions. The custom instructions provide non-blocking access to data in main memory while overlapping with other non-blocking instructions in the CPU pipeline. The design is evaluated on an industry-level simulator, gem5, using the graph kernels from the GAP Benchmark Suite. The system shows a speedup of up to 7x for PageRank and an average speedup of 1.5x for the other graph kernels, such as Single-Source Shortest Path, Connected Components, and Triangle Counting.
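To illustrate the irregular access pattern the abstract refers to, the following is a minimal sketch of a pull-style PageRank over a graph stored in compressed sparse row (CSR) form. It is not the implementation from this work and does not model the SPM or the custom instructions; the function name, parameter names, and the CSR layout of incoming edges are assumptions made for illustration. Vertices with no outgoing edges simply contribute nothing here, which is a simplification.

```python
def pagerank_pull(offsets, neighbors, out_degree, damping=0.85, iterations=20):
    """Pull-style PageRank over a CSR graph of incoming edges (illustrative sketch).

    offsets[u]..offsets[u+1] delimits the slice of `neighbors` holding the
    in-neighbors of vertex u; out_degree[v] is the number of edges leaving v.
    """
    n = len(offsets) - 1
    scores = [1.0 / n] * n
    base = (1.0 - damping) / n

    for _ in range(iterations):
        # Share of its score that each vertex sends along every outgoing edge.
        contrib = [
            scores[v] / out_degree[v] if out_degree[v] else 0.0
            for v in range(n)
        ]
        new_scores = [0.0] * n
        for u in range(n):
            total = 0.0
            # `offsets` and `neighbors` are walked sequentially (good locality).
            for e in range(offsets[u], offsets[u + 1]):
                v = neighbors[e]
                # contrib[v] is a data-dependent read at an arbitrary index:
                # this is the access with poor spatial and temporal locality.
                total += contrib[v]
            new_scores[u] = base + damping * total
        scores = new_scores
    return scores


# Hypothetical 3-vertex cycle 0 -> 1 -> 2 -> 0, stored by incoming edges.
example_scores = pagerank_pull(
    offsets=[0, 1, 2, 3], neighbors=[2, 0, 1], out_degree=[1, 1, 1]
)
```

In this sketch the indirect read `contrib[v]` is the kind of load that a non-blocking prefetch into a scratchpad would aim to overlap with other work, since the index `v` is only known after `neighbors[e]` has been read.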