A1 Refereed original research article in a scientific journal
Private reliability environments for efficient fault-tolerance in CGRAs
Authors: Jafri SMAH, Piestrak SJ, Hemani A, Paul K, Plosila J, Tenhunen H
Publisher: Springer New York LLC
Publication year: 2014
Journal: Design Automation for Embedded Systems
Journal name in database: DESIGN AUTOMATION FOR EMBEDDED SYSTEMS
Journal acronym: DES AUTOM EMBED SYST
Volume: 18
Issue: 3-4
First page: 295
Last page: 327
Number of pages: 33
ISSN: 0929-5585
eISSN: 1572-8080
DOI: https://doi.org/10.1007/s10617-014-9129-6
In an era of platforms hosting multiple applications with varying reliability needs, worst-case platform-wide fault-tolerance decisions are neither optimal nor desirable. To address this problem, designs commonly employ adaptive fault-tolerance strategies that provide each application with the reliability level it actually needs. However, in the CGRA domain, existing schemes either allow switching only between different levels of modular redundancy (duplication, triplication, etc.) or protect only a particular region of the device (e.g. configuration memory, computation, or data memory). To complement these strategies, we propose private fault-tolerance environments which, in addition to modular redundancy, provide low-cost sub-modular (e.g. residue mod 3) redundancy capable of handling both permanent and temporary faults in configuration memory, computation, communication, and data memory. We also present adaptive configuration scrubbing techniques that prevent fault accumulation in the configuration memory. Simulation results for selected algorithms (FFT, matrix multiplication, and FIR filter) show that the proposed approach provides flexible protection, with energy overhead ranging from 3.125 % to 107 % across reliability levels. Synthesis results confirm that the proposed architecture reduces the area overhead of the self-checking (58 %) and fault-tolerant (7.1 %) versions compared to state-of-the-art adaptive reliability techniques.
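The abstract names residue mod 3 as an example of sub-modular redundancy. The following is a minimal software sketch of the general idea behind residue checking, not the hardware design described in the article: the residue of the result is predicted from the residues of the operands and compared with the residue actually observed. The function names and the `fault_bit` parameter are hypothetical and exist only for illustration.

```python
from typing import Optional, Tuple


def residue3(x: int) -> int:
    """Return the mod-3 residue used as the check symbol."""
    return x % 3


def checked_add(a: int, b: int, fault_bit: Optional[int] = None) -> Tuple[int, bool]:
    """Add two non-negative integers and verify the result with a mod-3 check.

    fault_bit is a hypothetical knob that flips one bit of the result
    to model a fault in the adder. Returns (result, check_passed).
    """
    result = a + b
    if fault_bit is not None:
        result ^= 1 << fault_bit  # illustrative single-bit fault injection
    predicted = (residue3(a) + residue3(b)) % 3
    return result, residue3(result) == predicted


def checked_mul(a: int, b: int, fault_bit: Optional[int] = None) -> Tuple[int, bool]:
    """Multiply two non-negative integers and verify the result with a mod-3 check."""
    result = a * b
    if fault_bit is not None:
        result ^= 1 << fault_bit  # illustrative single-bit fault injection
    predicted = (residue3(a) * residue3(b)) % 3
    return result, residue3(result) == predicted


if __name__ == "__main__":
    print(checked_add(25, 17))               # fault-free addition
    print(checked_add(25, 17, fault_bit=3))  # one flipped bit in the sum
    print(checked_mul(6, 7, fault_bit=0))    # one flipped bit in the product
```

In this sketch every single-bit flip of the result is detected, because a flip changes the value by a power of two and no power of two is divisible by 3. The check costs a two-bit residue per operand rather than a full duplicate of the operation, which is the sense in which such codes are cheaper than modular redundancy.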