Micro-Architectural features as soft-error markers in embedded safety-critical systems: preliminary study

Series

Abstract

Radiation-induced soft errors are one of the most challenging issues in Safety Critical Real-Time Embedded System (SACRES) reliability, usually handled using different flavors of Double Modular Redundancy (DMR) techniques. This solution is becoming unaffordable due to the complexity of modern micro-processors in all domains. This paper addresses the promising field of using Artificial Intelligence (AI) based hardware detectors for soft errors. To create such cores and make them general enough to work with different software applications, micro-Architectural attributes are a fascinating option as candidate fault detection features. Several processors already track these features through dedicated Performance Monitoring Unit (PMU). However, there is an open question to understand to what extent they are enough to detect faulty executions. Exploiting the capability of gem5 to simulate real computing systems, perform fault injection experiments, and profile micro-Architectural attributes (i.e., gem5 Stats), this paper presents the results of a comprehensive analysis regarding the potential attributes to detect soft errors and the associated models that can be trained with these features.

Source Title

2023 IEEE European Test Symposium (ETS)

Publisher

Institute of Electrical and Electronics Engineers Inc.

Course

Other identifiers

Book Title

Degree Discipline

Degree Level

Degree Name

Citation

Published Version (Please cite this version)

Language

en