Performance/Reliability-Aware Resource Management for Many-Cores in Dark Silicon Era




Haghbayan M, Miele A, Rahmani A, Liljeberg P, Tenhunen H

Publisher: IEEE Computer Society

2017

IEEE Transactions on Computers

Volume: 66

Issue: 9

First page: 1599

Last page: 1612

Number of pages: 14

ISSN: 0018-9340

eISSN: 1557-9956

DOI: https://doi.org/10.1109/TC.2017.2691009



Aggressive technology scaling has enabled the fabrication of many-core architectures while triggering challenges such as limited power budget and increased reliability issues, like aging phenomena. Dynamic power management and runtime mapping strategies can be utilized in such systems to achieve optimal performance while satisfying power constraints. However, lifetime reliability is generally neglected. We propose a novel lifetime reliability/performance-aware resource co-management approach for many-core architectures in the dark silicon era. The approach is based on a two-layered architecture, composed of a long-term runtime reliability controller and a short-term runtime mapping and resource management unit. The former evaluates the cores' aging status w.r.t. a target reference specified by the designer, and performs recovery actions on highly stressed cores by means of power capping. The aging status is utilized in runtime application mapping to maximize system performance while fulfilling reliability requirements and honoring the power budget. Experimental evaluation demonstrates the effectiveness of the proposed strategy, which outperforms most recent state-of-the-art contributions.
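
The two-layer split described in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: the `Core` fields, the scalar aging value, the fixed nominal and reduced power caps, and the greedy least-aged-first mapping rule are all assumptions made here for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Core:
    core_id: int
    aging: float            # hypothetical aging estimate, arbitrary units
    power_cap: float = 0.0  # per-core power cap in watts (set by the controller)
    busy: bool = False


def reliability_controller(cores, target_aging, nominal_cap, reduced_cap):
    """Long-term layer (assumed policy): cap power on cores aged past the reference."""
    for core in cores:
        core.power_cap = reduced_cap if core.aging > target_aging else nominal_cap


def map_application(cores, n_tasks, task_power, power_budget, power_in_use=0.0):
    """Short-term layer (assumed policy): greedily pick free, least-aged cores
    whose cap admits the task, without exceeding the chip power budget."""
    chosen = []
    for core in sorted(cores, key=lambda c: c.aging):
        if len(chosen) == n_tasks:
            break
        if core.busy or task_power > core.power_cap:
            continue  # core is occupied or currently capped below the task's need
        if power_in_use + task_power > power_budget:
            break     # chip-level budget exhausted
        chosen.append(core)
        power_in_use += task_power
    if len(chosen) < n_tasks:
        return None   # mapping rejected; no core state is modified
    for core in chosen:
        core.busy = True
    return [core.core_id for core in chosen]
```

In this sketch the long-term layer only adjusts caps, and the short-term layer reads both the caps and the aging values, which mirrors the abstract's statement that aging status feeds into runtime mapping. A real controller would estimate aging from a reliability model rather than take it as a given number.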



Last updated on 2024-26-11 at 21:45