
A4 Peer-reviewed article in conference proceedings

FPGA-Based Hardware Acceleration for Deep Learning in Mobile Robotics

Authors: Al-Ameri, Yasir; Nguyen, Minh; Westerlund, Tomi

Editors: Nurmi, Jari; Rodrigues, Joachim; Pezzarossa, Luca; Åberg, Viktor; Behmanesh, Baktash

Conference name: IEEE Nordic Circuits and Systems Conference

Publication year: 2024

Proceedings title: 2024 IEEE Nordic Circuits and Systems Conference (NORCAS)

First page: 1

Last page: 7

ISBN: 979-8-3315-1767-0

eISBN: 979-8-3315-1766-3

DOI: https://doi.org/10.1109/NorCAS64408.2024.10752450

Web address: https://ieeexplore.ieee.org/document/10752450

Abstract

The growing demand for real-time, low-power hardware that can run compute-intensive applications has exposed the limitations of the conventional multicore general-purpose processor architecture. To meet this demand, edge computing hardware accelerators have come to the forefront, notably for deep learning and robotic systems. This paper surveys leading hardware accelerators and compares the performance, accuracy, and power consumption of a GPU-based and an FPGA-based platform, both designed specifically for edge computing applications. The experiments used three deep neural network models, AlexNet, GoogLeNet, and ResNet-18, trained to perform binary image classification in a known environment. Our results show that the FPGA-based platform, namely a Kria KV260 Vision AI Starter Kit, achieved inference speeds up to 9.5 times faster than the GPU-based Jetson Nano Developer Kit. The KV260 also delivered up to five times the efficiency of the Jetson Nano in terms of inference speed per watt, with an accuracy drop of only 5.4% caused by the quantization that the FPGA requires. However, with the AlexNet model the Jetson Nano achieved a 1.6 times higher inference rate than the KV260, and deploying models to it proved less challenging.
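The abstract attributes the accuracy drop to the quantization that FPGA deployment requires. As a rough illustration only, the sketch below shows a generic symmetric per-tensor quantization scheme, assuming 8-bit integers (a common target for FPGA inference engines): every weight is mapped to an integer in [-127, 127] using a single scale factor, so each value picks up a rounding error of at most half a quantization step. The function names `quantize_int8` and `dequantize` are hypothetical, the random weights are a stand-in, and this is not necessarily the scheme or toolchain used in the paper, which this record does not describe.

```python
import random


def quantize_int8(values: list[float]) -> tuple[list[int], float]:
    """Hypothetical helper: symmetric per-tensor quantization, x ~= scale * q."""
    max_abs = max(abs(v) for v in values)
    # One scale for the whole tensor so the largest magnitude maps to 127.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest step, then clamp to the symmetric 8-bit range.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Hypothetical helper: map the integers back to approximate real values."""
    return [qi * scale for qi in q]


random.seed(0)
# Stand-in "trained" weights; real network weights would come from a model file.
weights = [random.gauss(0.0, 0.05) for _ in range(4096)]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Round-to-nearest keeps every error within half a quantization step.
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(f"scale={scale:.6f} max_error={max_err:.6f} bound={scale / 2:.6f}")
```

A single per-tensor scale keeps the integer arithmetic simple; per-channel scales usually reduce the error at the cost of extra bookkeeping. How much accuracy a network loses depends on how sensitive it is to these bounded perturbations.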