Compiler directed network-on-chip reliability enhancement for chip multiprocessors
Chip multiprocessors (CMPs) are expected to be the building blocks for future computer systems. While architecting these emerging CMPs is a challenging problem on its own, programming them is even more challenging. As the number of cores accommodated in chip multiprocessors increases, network-on-chip (NoC) type communication fabrics are expected to replace traditional point-to-point buses. Most of the prior software related work so far targeting CMPs focus on performance and power aspects. However, as technology scales, components of a CMP are being increasingly exposed to both transient and permanent hardware failures. This paper presents and evaluates a compiler-directed power-performance aware reliability enhancement scheme for network-on-chip (NoC) based chip multiprocessors (CMPs). The proposed scheme improves on-chip communication reliability by duplicating messages traveling across CMP nodes such that, for each original message, its duplicate uses a different set of communication links as much as possible (to satisfy performance constraint). In addition, our approach tries to reuse communication links across the different phases of the program to maximize link shutdown opportunities for the NoC (to satisfy power constraint). Our results show that the proposed approach is very effective in improving on-chip network reliability, without causing excessive power or performance degradation. In our experiments, we also evaluate the performance oriented and energy oriented versions of our compiler-directed reliability enhancement scheme, and compare it to two pure hardware based fault tolerant routing schemes. © 2010 ACM.