A1 Refereed original research article in a scientific journal
Performance/Reliability-Aware Resource Management for Many-Cores in Dark Silicon Era
Authors: Haghbayan M, Miele A, Rahmani A, Liljeberg P, Tenhunen H
Publisher: IEEE Computer Society
Publication year: 2017
Journal: IEEE Transactions on Computers
Journal name in source: IEEE Transactions on Computers
Volume: 66
Issue: 9
First page: 1599
Last page: 1612
Number of pages: 14
ISSN: 0018-9340
eISSN: 1557-9956
DOI: https://doi.org/10.1109/TC.2017.2691009
Aggressive technology scaling has enabled the fabrication of many-core architectures while triggering challenges such as limited power budget and increased reliability issues, like aging phenomena. Dynamic power management and runtime mapping strategies can be utilized in such systems to achieve optimal performance while satisfying power constraints. However, lifetime reliability is generally neglected. We propose a novel lifetime reliability/performance-aware resource co-management approach for many-core architectures in the dark silicon era. The approach is based on a two-layered architecture, composed of a long-term runtime reliability controller and a short-term runtime mapping and resource management unit. The former evaluates the cores' aging status w.r.t. a target reference specified by the designer, and performs recovery actions on highly stressed cores by means of power capping. The aging status is utilized in runtime application mapping to maximize system performance while fulfilling reliability requirements and honoring the power budget. Experimental evaluation demonstrates the effectiveness of the proposed strategy, which outperforms most recent state-of-the-art contributions.