A1 Refereed original research article in a scientific journal

DAI-NET: Toward communication-aware collaborative training for the industrial edge




Authors Mwase Christine, Jin Yi, Westerlund Tomi, Tenhunen Hannu, Zou Zhuo

Publisher Elsevier BV

Publication year 2024

Journal Future Generation Computer Systems

Journal name in source Future Generation Computer Systems

Volume 155

First page 193

Last page 203

ISSN 0167-739X

eISSN 1872-7115

DOI https://doi.org/10.1016/j.future.2024.01.027

Web address https://doi.org/10.1016/j.future.2024.01.027


Abstract
The industrial edge generates an abundance of spatially distributed and dynamic data that needs to remain on-site for privacy and security reasons. Collaborative training at the edge can leverage this data to refine pre-trained models locally for specific industrial tasks and environments and have them adapt to local changes for enhanced performance, agility, and resilience. However, communication between the devices during training is a key bottleneck and is not modelled by existing frameworks such as MXNet, PyTorch and TensorFlow. This paper introduces DAI-NET, a co-simulation framework for examining communication and its associated costs, and provides results from an implementation using Python, OMNeT++ and INET. To validate it and showcase its utility, the developed platform is applied in the analysis of (i) the performance and cost of collaboratively training a Multilayer Perceptron model, and (ii) the influence of computational heterogeneity. Communication costs generated during the training are captured at the device and system levels. In computationally heterogeneous clusters, the root cause of stragglers is exposed. In addition, the key performance contributors are identified to be a cluster’s computation capability and the variation in the relative computation capabilities of its devices. This study is particularly useful for artificial intelligence of things (AIoT) systems, whose bandwidth and energy resources are limited. It paves the way for more practical research on communication-efficient algorithms, network protocols and architectures for the AIoT edge.



Last updated on 2024-11-26 at 19:12