A4 Refereed article in a conference publication

FPGA-Based Hardware Acceleration for Deep Learning in Mobile Robotics

Authors: Al-Ameri, Yasir; Nguyen, Minh; Westerlund, Tomi

Editors: Nurmi, Jari; Rodrigues, Joachim; Pezzarossa, Luca; Åberg, Viktor; Behmanesh, Baktash

Conference name: IEEE Nordic Circuits and Systems Conference

Publication year: 2024

Book title: 2024 IEEE Nordic Circuits and Systems Conference (NORCAS)

First page: 1

Last page: 7

ISBN: 979-8-3315-1767-0

eISBN: 979-8-3315-1766-3

DOI: https://doi.org/10.1109/NorCAS64408.2024.10752450

Publication's open availability at the time of reporting: No Open Access

Publication channel's open availability: No Open Access publication channel

Web address: https://ieeexplore.ieee.org/document/10752450

Abstract

The increasing demand for real-time, low-power hardware processing systems capable of running compute-intensive applications has accentuated the inadequacy of conventional multicore general-purpose processor architectures. To meet this demand, edge computing hardware accelerators have come to the forefront, notably in deep learning and robotic systems. This paper explores prominent hardware accelerators and examines the performance, accuracy, and power consumption of a GPU-based and an FPGA-based platform designed for edge computing applications. The experiments were conducted using three deep neural network models, namely AlexNet, GoogLeNet, and ResNet-18, trained to perform binary image classification in a known environment. Our results demonstrate that the FPGA-based platform, a Kria KV260 Vision AI starter kit, achieved inference up to 9.5 times faster than the GPU-based Jetson Nano developer kit. The KV260 was also up to five times more efficient than the Jetson Nano in terms of inference speed per watt, with only a 5.4% drop in accuracy caused by the quantization process required by the FPGA. However, the Jetson Nano ran the AlexNet model 1.6 times faster than the KV260, and its deployment process proved less challenging.