A4 Refereed article in a conference publication

Tango: Low Latency Multi-DNN Inference on Heterogeneous Edge Platforms

Authors: Taufique, Zain; Vyas, Aman; Miele, Antonio; Liljeberg, Pasi; Kanduri, Anil

Editors: N/A

Conference name: IEEE International Conference on Computer Design

Publication year: 2024

Journal: Proceedings : IEEE International Conference on Computer Design

Book title: 2024 IEEE 42nd International Conference on Computer Design (ICCD)

Volume: 42

First page: 300

Last page: 307

ISBN: 979-8-3503-8041-5

eISBN: 979-8-3503-8040-8

ISSN: 1063-6404

eISSN: 2576-6996

DOI: https://doi.org/10.1109/ICCD63220.2024.00053

Web address: https://ieeexplore.ieee.org/document/10817997

Self-archived copy’s web address: https://research.utu.fi/converis/portal/detail/Publication/477606436

Abstract

There is an increasing demand to run DNN applications on edge platforms for low-latency inference. Executing multi-DNN workloads with diverse compute and latency requirements on resource-constrained heterogeneous edge platforms poses a significant scheduling challenge. In this work, we present Tango, a framework for orchestrating multi-DNN inference on heterogeneous edge platforms. Our approach uses a Proximal Policy-based Reinforcement Learning agent to jointly optimize cluster selection, accuracy configuration, and frequency scaling, minimizing inference latency within a tolerable accuracy loss. We implemented Tango as a portable middleware and deployed it on a real Jetson TX edge platform. Our evaluation against relevant multi-DNN scheduling strategies demonstrates 61% lower latency and 48.4% lower energy consumption at a maximum accuracy loss of 1.59%.

Funding information in the publication:
This work is funded by the European Union’s Horizon 2020 Research and Innovation Program (APROPOS) under the Marie Curie grant No. 956090.