A1 Refereed original research article in a scientific journal

DAI-NET: Toward communication-aware collaborative training for the industrial edge




Authors: Mwase Christine, Jin Yi, Westerlund Tomi, Tenhunen Hannu, Zou Zhuo

Publisher: Elsevier BV

Publication year: 2024

Journal: Future Generation Computer Systems

Journal name in source database: Future Generation Computer Systems

Volume: 155

First page: 193

Last page: 203

ISSN: 0167-739X

eISSN: 1872-7115

DOI: https://doi.org/10.1016/j.future.2024.01.027

Web address: https://doi.org/10.1016/j.future.2024.01.027


Abstract
The industrial edge generates an abundance of spatially distributed and dynamic data that must remain on-site for privacy and security reasons. Collaborative training at the edge can leverage this data to refine pre-trained models locally for specific industrial tasks and environments, and to let them adapt to local changes for enhanced performance, agility, and resilience. However, communication between the devices during training is a key bottleneck, and it is not modelled by existing frameworks such as MXNet, PyTorch and TensorFlow. This paper introduces DAI-NET, a co-simulation framework for examining communication and its associated costs, and presents results from an implementation using Python, OMNeT++ and INET. To validate the platform and showcase its utility, it is applied to the analysis of (i) the performance and cost of collaboratively training a Multilayer Perceptron model, and (ii) the influence of computational heterogeneity. Communication costs generated during training are captured at both the device and system levels. In computationally heterogeneous clusters, the root cause of stragglers is exposed. In addition, the key performance contributors are identified as a cluster's computation capability and the variation in the relative computation capabilities of its devices. This study is particularly useful for artificial intelligence of things (AIoT) systems, whose bandwidth and energy resources are limited. It paves the way for more practical research on communication-efficient algorithms, network protocols and architectures for the AIoT edge.
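The straggler effect and the per-round communication cost described in the abstract can be illustrated with a back-of-the-envelope cost model. The sketch below is not DAI-NET, which models the network in detail with OMNeT++ and INET. It assumes a synchronous parameter-server round, float32 parameters, a fixed and equal link rate per device, and no overlap between computation and communication. The names `Device`, `round_cost` and `mlp_param_count`, and all numbers in the example, are hypothetical.

```python
from dataclasses import dataclass

BYTES_PER_PARAM = 4  # assumption: float32 parameters and gradients


def mlp_param_count(layer_sizes):
    """Weights plus biases of a fully connected MLP, e.g. [784, 128, 10]."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


@dataclass
class Device:
    name: str
    samples_per_second: float  # hypothetical compute capability of the device


def round_cost(devices, layer_sizes, batch_per_device, link_bytes_per_second):
    """Estimate one synchronous round under a simple parameter-server model.

    Each device computes a gradient on its local batch, uploads it, and
    downloads the updated model. Compute and communication are assumed not
    to overlap, and every device has its own link of equal, fixed rate.
    """
    model_bytes = mlp_param_count(layer_sizes) * BYTES_PER_PARAM
    compute = {d.name: batch_per_device / d.samples_per_second for d in devices}
    straggler = max(compute, key=compute.get)  # slowest device gates the round
    slowest = compute[straggler]
    comm_time = 2 * model_bytes / link_bytes_per_second  # upload + download
    return {
        "bytes_per_round": 2 * model_bytes * len(devices),
        "round_time": slowest + comm_time,
        "straggler": straggler,
        # Time each faster device spends waiting for the straggler.
        "idle_time": {name: slowest - t for name, t in compute.items()},
    }


if __name__ == "__main__":
    cluster = [Device("fast", 1000.0), Device("medium", 500.0), Device("slow", 250.0)]
    r = round_cost(cluster, [784, 128, 10], batch_per_device=100,
                   link_bytes_per_second=1e6)
    print(r["straggler"], r["bytes_per_round"], round(r["round_time"], 3))
    # prints: slow 2442480 1.214
```

Under these assumptions the round time is gated by the slowest device, and the `idle_time` values grow with the spread of compute capabilities in the cluster. This is consistent in spirit with the abstract's finding that both a cluster's computation capability and the variation across its devices drive performance, but the real costs depend on the network behaviour that DAI-NET simulates.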
