Private reliability environments for efficient fault-tolerance in CGRAs




Jafri SMAH, Piestrak SJ, Hemani A, Paul K, Plosila J, Tenhunen H

Publisher: Springer New York LLC

Year: 2014

Journal: Design Automation for Embedded Systems

Journal abbreviation: DES AUTOM EMBED SYST

Volume: 18

Issue: 3-4

Start page: 295

End page: 327

Number of pages: 33

ISSN: 0929-5585

eISSN: 1572-8080

DOI: https://doi.org/10.1007/s10617-014-9129-6



In the era of platforms hosting multiple applications with variable reliability needs, worst-case, platform-wide fault-tolerance decisions are neither optimal nor desirable. To address this problem, designs commonly employ adaptive fault-tolerance strategies that give each application the reliability level it actually needs. However, in the CGRA domain, existing schemes either only allow switching between different levels of modular redundancy (duplication, triplication, etc.) or protect only one region of the device (e.g., configuration memory, computation, or data memory). To complement these strategies, we propose private fault-tolerance environments that, in addition to modular redundancy, also provide low-cost sub-modular redundancy (e.g., residue mod 3) capable of handling both permanent and temporary faults in configuration memory, computation, communication, and data memory. We also present adaptive configuration scrubbing techniques that prevent fault accumulation in the configuration memory. Simulation results for selected algorithms (FFT, matrix multiplication, and FIR filter) show that the proposed approach provides flexible protection, with energy overheads ranging from 3.125% to 107% depending on the reliability level. Synthesis results confirm that, compared with state-of-the-art adaptive reliability techniques, the proposed architecture reduces the area overhead of the self-checking (58%) and fault-tolerant (7.1%) versions.
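The abstract mentions residue mod 3 as a low-cost, sub-modular form of redundancy. The sketch below is a generic illustration of residue checking, not the authors' CGRA hardware. It compares the residue of an arithmetic result with the residue predicted from the operands' residues; a mismatch flags a fault. The function names (`residue_hw`, `checked_add`, `checked_mul`, `flip_bit`) are hypothetical. The bit-serial residue routine relies on the fact that 2^k is congruent to (-1)^k mod 3, which is part of why a mod 3 checker is cheap in hardware.

```python
# Hypothetical sketch of residue mod 3 error detection (illustrative only,
# not the paper's implementation).

MOD = 3


def residue_hw(x: int) -> int:
    """Residue of a non-negative integer mod 3, computed the way cheap hardware
    can: since 2**k = (-1)**k (mod 3), it is (sum of even bits - sum of odd bits) mod 3."""
    even = odd = 0
    k = 0
    while x:
        if k % 2 == 0:
            even += x & 1
        else:
            odd += x & 1
        x >>= 1
        k += 1
    return (even - odd) % MOD  # Python's % always returns a value in 0..2 here


def checked_add(a: int, b: int, result: int) -> bool:
    """True if `result` passes the mod-3 check for a + b."""
    return residue_hw(result) == (residue_hw(a) + residue_hw(b)) % MOD


def checked_mul(a: int, b: int, result: int) -> bool:
    """True if `result` passes the mod-3 check for a * b."""
    return residue_hw(result) == (residue_hw(a) * residue_hw(b)) % MOD


def flip_bit(x: int, k: int) -> int:
    """Model a single-bit fault (permanent or transient) in bit k."""
    return x ^ (1 << k)


if __name__ == "__main__":
    a, b = 1234, 5678
    good = a + b
    print(checked_add(a, b, good))  # fault-free result passes
    # A single-bit flip changes the value by 2**k, never a multiple of 3,
    # so every single-bit error is detected.
    print(all(not checked_add(a, b, flip_bit(good, k)) for k in range(32)))
    # Limitation: an error that shifts the result by a multiple of 3 escapes.
    print(checked_add(a, b, good + 3))
```

The design trade-off this illustrates: a mod 3 checker needs only a 2-bit residue path instead of a full duplicate of the datapath, but it detects rather than corrects errors and misses any error that is a multiple of 3, which is why such sub-modular schemes sit below full modular redundancy in protection level.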



