RGB and Depth Image Fusion for Object Detection Using Deep Learning - UTU Research Portal

A3 Refereed book chapter or chapter in a compilation book

RGB and Depth Image Fusion for Object Detection Using Deep Learning

Authors: Farahnakian Fahimeh, Heikkonen Jukka

Editors: M. Arif Wani, Bhiksha Raj, Feng Luo, Dejing Dou

Publication year: 2021

Book title: Deep Learning Applications, Volume 3

Series title: Advances in Intelligent Systems and Computing

Volume: 1395

First page: 73

Last page: 93

ISBN: 978-981-16-3356-0

eISBN: 978-981-16-3357-7

DOI: https://doi.org/10.1007/978-981-16-3357-7_3

Abstract

Object detection, a core task in computer vision, aims to locate and classify objects of interest in a scene. Most existing object detection methods operate on RGB images. However, RGB images cannot directly provide depth information, which would help an object detector achieve better performance in complex environments. To address this problem, we present an early fusion architecture that performs object detection by combining RGB and depth images. The architecture first employs an unsupervised depth estimation technique to automatically infer a dense depth image from a single RGB input image. Then, the depth image is concatenated with the RGB image at a very low abstraction level, and a deep learning model performs object detection on the combined input. Finally, the architecture predicts multiple 2D bounding boxes to localize the objects. To generate the depth image, we investigate how four well-known depth estimation methods affect the performance of our fusion architecture. Moreover, we compare the fusion architecture with two uni-modal architectures that use only the RGB image or only the depth image for object detection. The experimental results on the KITTI dataset show that our RGB-depth fusion approach outperforms the uni-modal architectures.
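The early fusion step described in the abstract, concatenating a depth image with an RGB image at a low abstraction level, can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `early_fuse`, the 8-bit RGB input, the min-max normalization of depth, and the channel-last layout are all assumptions made for illustration, and the abstract does not specify any of them. The depth map is assumed to come from a separate depth estimator that is not shown here.

```python
import numpy as np


def early_fuse(rgb, depth):
    """Stack an H x W x 3 RGB image and an H x W depth map into H x W x 4.

    Hypothetical helper: the scaling choices below are assumptions,
    not details taken from the chapter.
    """
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("rgb must have shape (H, W, 3)")
    if depth.shape != rgb.shape[:2]:
        raise ValueError("depth must have shape (H, W) matching rgb")

    # Assumption: rgb holds 8-bit values, so scale them to [0, 1].
    rgb_f = rgb.astype(np.float32) / 255.0

    # Assumption: min-max normalize depth to [0, 1] so all channels share a range.
    d = depth.astype(np.float32)
    span = d.max() - d.min()
    d_norm = (d - d.min()) / span if span > 0 else np.zeros_like(d)

    # Early fusion: append depth as a fourth input channel.
    return np.concatenate([rgb_f, d_norm[..., None]], axis=2)


# Usage with synthetic data standing in for a real image and an estimated depth map.
rgb = np.zeros((4, 6, 3), dtype=np.uint8)
depth = np.arange(24, dtype=np.float32).reshape(4, 6)
fused = early_fuse(rgb, depth)
```

A detector consuming such an input would need a first layer that accepts four channels instead of three. How the chapter's model handles this is not stated in the abstract.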