A1 Refereed original research article in a scientific journal
Tolerating transient illegal turn faults in NoCs
Authors: Huang LT, Zhang XF, Ebrahimi M, Li GJ
Publisher: ELSEVIER SCIENCE BV
Publication year: 2016
Journal: Microprocessors and Microsystems
Journal name in source: MICROPROCESSORS AND MICROSYSTEMS
Journal acronym: MICROPROCESS MICROSY
Volume: 43
Issue: SI
First page : 104
Last page: 115
Number of pages: 12
ISSN: 0141-9331
eISSN: 1872-9436
DOI: https://doi.org/10.1016/j.micpro.2016.01.016(external)
Abstract
Network-on-Chip (NoC) is becoming a competitive solution to connect hundreds of processing elements in modern computing platforms. Under the trend of shrinking feature sizes, circuits are likely to suffer from faults which lead to degraded performance and erroneous behaviour. Compared to permanent faults, transient faults happen even more frequently and seriously while they are hidden within complex on chip behaviours. One of the serious consequences caused by transient faults is taking illegal turns by the packets after the damage of control logic in on-chip routers which may lead to a deadlock situation and eventually crashing the entire system. To avoid this situation, in this paper, we propose a comprehensive scheme called ODT including an improved router architecture, an illegal-turn-resilient routing algorithm, online fault-detect units and a fault classification method. By applying ODT, more turns are supported on routing level and the deadlock situations can be significantly reduced. Experimental results indicate up to 22% increase of the survived packets in the network when 4% of routing computation units in failure. The extra area overhead and power consumption of ODT method is around 9.22% and 9.63%. (C) 2016 Elsevier B.V. All rights reserved.
Network-on-Chip (NoC) is becoming a competitive solution to connect hundreds of processing elements in modern computing platforms. Under the trend of shrinking feature sizes, circuits are likely to suffer from faults which lead to degraded performance and erroneous behaviour. Compared to permanent faults, transient faults happen even more frequently and seriously while they are hidden within complex on chip behaviours. One of the serious consequences caused by transient faults is taking illegal turns by the packets after the damage of control logic in on-chip routers which may lead to a deadlock situation and eventually crashing the entire system. To avoid this situation, in this paper, we propose a comprehensive scheme called ODT including an improved router architecture, an illegal-turn-resilient routing algorithm, online fault-detect units and a fault classification method. By applying ODT, more turns are supported on routing level and the deadlock situations can be significantly reduced. Experimental results indicate up to 22% increase of the survived packets in the network when 4% of routing computation units in failure. The extra area overhead and power consumption of ODT method is around 9.22% and 9.63%. (C) 2016 Elsevier B.V. All rights reserved.