Analysis of scratch-pad memory-based processor architecture for graph applications
Embargo Lift Date: 2022-03-28
In graph analytics applications, main-memory accesses are a bottleneck because graph workloads exhibit poor spatial and temporal locality in the caches and the rest of the memory hierarchy. Miss status handling registers (MSHRs) in the caches mitigate this bottleneck slightly, but the problem becomes more significant for large graphs. The MSHRs, which rely on the reorder buffer of an out-of-order processor, quickly become saturated as memory requests pile up, because the instruction window is limited in size. To tackle this memory bottleneck in graph applications, this work proposes a Scratchpad Memory (SPM) combined with custom instructions. The model is implemented and tested on a custom in-order processor based on the x86 architecture, extended to accommodate the corresponding custom instructions. These instructions provide non-blocking access to data in main memory, allowing the accesses to overlap with other non-blocking instructions in the CPU pipeline. The design is evaluated on an industry-level simulator, gem5, using graph kernels from the GAP Benchmark Suite. The system achieves a speedup of up to 7x for PageRank and an average speedup of 1.5x for other graph kernels such as Single-Source Shortest Path, Connected Components, and Triangle Counting.
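The saturation argument can be made concrete with a toy cycle-count model. This is a sketch under stated assumptions, not the gem5 setup used in this work: it assumes independent loads with a fixed latency, in-order issue of at most one load per cycle, and a hard cap on loads in flight. The cap stands in for the MSHR and instruction-window limit; a larger cap stands in, hypothetically, for non-blocking scratchpad fills. The function name `gather_cycles` and all parameter values are illustrative assumptions.

```python
import heapq


def gather_cycles(num_requests: int, latency: int, max_outstanding: int) -> int:
    """Toy model (an assumption, not the thesis simulator): cycles needed to
    complete `num_requests` independent loads, each taking `latency` cycles,
    when one load can issue per cycle and at most `max_outstanding` loads may
    be in flight at once (a stand-in for MSHR / instruction-window limits).
    """
    if max_outstanding < 1:
        raise ValueError("max_outstanding must be at least 1")
    in_flight: list[int] = []  # min-heap of completion cycles
    cycle = 0  # earliest cycle at which the next load may issue
    for _ in range(num_requests):
        if len(in_flight) == max_outstanding:
            # Every slot is busy: stall until the oldest load returns.
            cycle = max(cycle, heapq.heappop(in_flight))
        heapq.heappush(in_flight, cycle + latency)
        cycle += 1  # in-order issue: at most one load per cycle
    # Total time is when the last outstanding load completes.
    return max(in_flight, default=0)


if __name__ == "__main__":
    # Hypothetical parameters chosen only for illustration.
    n, lat = 1000, 200
    window_limited = gather_cycles(n, lat, max_outstanding=10)
    spm_like = gather_cycles(n, lat, max_outstanding=64)
    print(window_limited, spm_like, round(window_limited / spm_like, 2))
```

With 1000 loads of 200-cycle latency, a cap of 10 in-flight loads yields 20009 cycles, while a cap of 64 yields 3239 cycles: once the cap is reached, the pipeline stalls for a full memory latency per batch. The model deliberately ignores data dependences, bandwidth limits, and SPM capacity, so its numbers illustrate only the trend and say nothing about the speedups reported in the abstract.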
Keywords
Domain-specific processor
Custom x86 instructions
Iterative graph applications