A1 Refereed original research article in a scientific journal

Private reliability environments for efficient fault-tolerance in CGRAs




Authors: Jafri SMAH, Piestrak SJ, Hemani A, Paul K, Plosila J, Tenhunen H

Publisher: Springer New York LLC

Publication year: 2014

Journal: Design Automation for Embedded Systems

Journal name in source: DESIGN AUTOMATION FOR EMBEDDED SYSTEMS

Journal acronym: DES AUTOM EMBED SYST

Volume: 18

Issue: 3-4

First page: 295

Last page: 327

Number of pages: 33

ISSN: 0929-5585

eISSN: 1572-8080

DOI: https://doi.org/10.1007/s10617-014-9129-6


Abstract

In the era of platforms hosting multiple applications with variable reliability needs, worst-case platform-wide fault-tolerance decisions are neither optimal nor desirable. As a solution to this problem, designs commonly employ adaptive fault-tolerance strategies that provide each application with the reliability level it actually needs. However, in the CGRA domain, the existing schemes either only allow shifting between different levels of modular redundancy (duplication, triplication, etc.) or protect only a particular region of a device (e.g. configuration memory, computation, or data memory). To complement these strategies, we propose private fault-tolerance environments which, in addition to modular redundancy, also provide low-cost sub-modular (e.g. residue mod 3) redundancy capable of handling both permanent and temporary faults in configuration memory, computation, communication, and data memory. We also present adaptive configuration scrubbing techniques which prevent fault accumulation in the configuration memory. Simulation results using a few selected algorithms (FFT, matrix multiplication, and FIR filter) show that the proposed approach is capable of providing flexible protection with energy overhead ranging from 3.125% to 107% for different reliability levels. Synthesis results have confirmed that the proposed architecture reduces the area overhead for self-checking (58%) and fault-tolerant (7.1%) versions, compared to the state-of-the-art adaptive reliability techniques.
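
The residue mod 3 redundancy mentioned in the abstract can be illustrated with a minimal software sketch. This is not the architecture from the article, only the general arithmetic idea behind residue checking: a small check value (the operand modulo 3) is computed alongside the main result, and a mismatch signals a fault. The function names and the `fault_mask` parameter are hypothetical and exist only to simulate bit flips on non-negative integers.

```python
# Illustrative sketch of residue mod 3 checking (hypothetical names,
# not the hardware design described in the article).

def residue3(x: int) -> int:
    """Return the mod 3 residue used as the check value."""
    return x % 3


def checked_add(a: int, b: int, fault_mask: int = 0) -> tuple[int, bool]:
    """Add a and b; fault_mask XORs bit flips into the result to simulate a fault.

    Returns (result, ok), where ok is False when the residue check fails.
    """
    result = (a + b) ^ fault_mask
    expected = (residue3(a) + residue3(b)) % 3
    return result, residue3(result) == expected


def checked_mul(a: int, b: int, fault_mask: int = 0) -> tuple[int, bool]:
    """Multiply a and b with the same residue check applied to the product."""
    result = (a * b) ^ fault_mask
    expected = (residue3(a) * residue3(b)) % 3
    return result, residue3(result) == expected


if __name__ == "__main__":
    print(checked_add(5, 7))                # fault-free addition
    print(checked_add(5, 7, fault_mask=1))  # single bit flip in the sum
    print(checked_mul(6, 7, fault_mask=4))  # single bit flip in the product
```

A single bit flip changes the result by a power of two, which is never a multiple of 3, so this check catches every single-bit error in the result. Multi-bit errors whose net change is a multiple of 3 pass unnoticed, which is one reason such a scheme is cheaper but weaker than full duplication or triplication.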




Last updated on 2024-26-11 at 12:15