A4 Refereed article in a conference publication
Tango: Low Latency Multi-DNN Inference on Heterogeneous Edge Platforms
Authors: Taufique, Zain; Vyas, Aman; Miele, Antonio; Liljeberg, Pasi; Kanduri, Anil
Editors: N/A
Conference name: IEEE International Conference on Computer Design
Publication year: 2024
Journal: Proceedings : IEEE International Conference on Computer Design
Book title: 2024 IEEE 42nd International Conference on Computer Design (ICCD)
Volume: 42
First page: 300
Last page: 307
ISBN: 979-8-3503-8041-5
eISBN: 979-8-3503-8040-8
ISSN: 1063-6404
eISSN: 2576-6996
DOI: https://doi.org/10.1109/ICCD63220.2024.00053
Web address: https://ieeexplore.ieee.org/document/10817997
Self-archived copy’s web address: https://research.utu.fi/converis/portal/detail/Publication/477606436
There is an increasing demand to run DNN applications on edge platforms for low-latency inference. Executing multi-DNN workloads with diverse compute and latency requirements on resource-constrained heterogeneous edge platforms poses a significant scheduling challenge. In this work, we present Tango, a framework for orchestrating multi-DNN inference on heterogeneous edge platforms. Our approach uses a Proximal Policy-based Reinforcement Learning agent to jointly optimize cluster selection, accuracy configuration, and frequency scaling, minimizing inference latency within a tolerable accuracy loss. We implemented the Tango framework as a portable middleware and deployed it on real Jetson TX edge hardware. Our evaluation against relevant multi-DNN scheduling strategies demonstrates 61 % lower latency and 48.4 % lower energy consumption at a maximum accuracy loss of 1.59 %.
Funding information in the publication:
This work is funded by the European Union’s Horizon 2020 Research and Innovation Program (APROPOS) under the Marie Curie grant No. 956090.